nix-builder: redesign kernel testing#676
Conversation
In development shells, set `LOCAL_KERNELS` so that one can `get_kernel` the kernel using its `repo-id` and get the local build.
|
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update. |
sayakpaul
left a comment
There was a problem hiding this comment.
Nice!
I think it'd be nice to have a short section in local-dev.md or build.md covering:
- what the command does under the hood (the local_install target, arch vs. noarch)
- the supported env vars (DEBUG, MAX_JOBS, NVCC_THREADS, ccache/sccache).
|
|
||
| Writing kernels is a hard problem, so be specific to agents. A robust prompt will declare all core attributes, including: | ||
|
|
||
| - the library, for example `transformers` or `diffusers` |
There was a problem hiding this comment.
How were doc related formatting changes started showing up? Any version updates?
There was a problem hiding this comment.
Possibly my editor reformatting these superfluous EOL whitespaces. If it's compatible with hf doc builder, we could consider using prettier to standardize formatting.
| ├── benchmark_rmsnorm.py # Micro-benchmark script | ||
| ├── build.toml # kernel-builder config | ||
| ├── setup.py # pip install -e . | ||
| ├── setup.py # python setup.py build_kernel |
There was a problem hiding this comment.
Does this mean that the existing skill files need to be updated?
There was a problem hiding this comment.
I think so, I'll make a separate PR for that to move this forward.
| import torch | ||
| import torch.nn.functional as F | ||
|
|
||
| relu = kernels.get_kernel("kernels-community/relu", version=1) |
There was a problem hiding this comment.
This assumes that the kernel has been pushed to the Hub and the tests require loading remotely and NOT locally? Should we perhaps make that point a bit clearer?
There was a problem hiding this comment.
It doesn't, see the note about LOCAL_KERNELS below.
| debug = int(os.environ.get("DEBUG", 0)) | ||
| cfg = "Debug" if debug else "Release" |
There was a problem hiding this comment.
Do we need to document this env var?
There was a problem hiding this comment.
This is part of the standardized CMake/setuptools integration stuff. I am not sure if we really want to document it? We should probably come up with a more principled way of enabling debugging if that's what we want.
Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
sayakpaul
left a comment
There was a problem hiding this comment.
Thanks!
Just one question:
#676 (comment)
This PR combines several changes to improve testing of kernels.
What we did so far: in tests for e.g. flash-attn3, we imported
flash_attn3in tests. However, this does not properly test the real-world use case of kernels, dynamically loaded kernels. It also lead to issues in some cases because it relies on the compatibility module being used. The compatibility module sometimes triggers a reload of a part of a kernel (first through the randomized name and then again through the regular name), resulting in operator registration clashes, etc.What we will do instead: use
get_kernelsin tests. In development/tests shells setLOCAL_KERNELSto ensure that the local kernel gets tested.Concrete changes:
LOCAL_KERNELSinnix developandnix develop .#test.setup.py build_kernelsubcommand to arch builds as a replacement forpip install. This subcommand was already supported by noarch kernels.setup.py build_kernelin place of pip install.get_kernelto the Kernel tests section.