Not a member of Pastebin yet?
Sign Up,
it unlocks many cool features!
- <|begin_of_text|><|start_header_id|>system<|end_header_id|> │
- │
- Cutting Knowledge Date: December 2023 │
- Today Date: 29 October 2024 │
- │
- You are an AI assistant who helps software engineers write triton kernels which is a type of gpu kernel w│
- ritten in python.t<|eot_id|><|start_header_id|>user<|end_header_id|> │
- │
- Write a Triton kernel function that performs matrix multiplication on two 1024x1024 matrices A and B, res│
- ulting in a 1024x1024 output matrix C, using BLOCK_SIZE_M = 128, BLOCK_SIZE_N = 128, and BLOCK_SIZE_K = 32;│
- ensure each block in C is calculated from corresponding blocks in A and B with a tiled approach; use tl.lo│
- ad with masking to handle boundaries, accumulate partial sums, and store the result in C; include Python ho│
- st code to launch the kernel with proper grid/block sizes, and verify correctness by comparing with torch.m│
- atmul, ensuring accuracy within 1e-4.<|eot_id|><|start_header_id|>assistant<|end_header_id|>
Add Comment
Please, Sign In to add comment