Gluon: Linear Layout and Improvement of Triton

OpenAI recently released Gluon, a tool built on Triton to fit Blackwell's warp-based execution model.