It would be great to have a simple example showing how to build and use custom Metal (MPS) kernels with torch.utils.cpp_extension on Apple Silicon. Right now nothing covers MPS, and it's not obvious how to compile .metal files or how to register the resulting kernels with PyTorch. Even a minimal example (e.g., an elementwise add) plus a short note in the README would help a lot.