MacOS-only version with MPS support#127
Conversation
Rewriting for support of Apple silicon
dkoes
left a comment
There was a problem hiding this comment.
This seems to replace CUDA with Metal, which is not what is desired. Can the choice of backend be a compile time switch?
|
It may not be easy, but I can always try. I'll have a look at it and come back to you later. |
|
I restored CUDA compatibility alongside Metal, with a choice of backend at installation. So that you know, I couldn't test the CUDA side. |
|
Hi @fnachon I traced your commits back from Gnina. |
|
Actually I've just checked further. Just a little background I have not ever used CUDA before but just a glance at grid_maker.cpp Line 358, shows float3 datatypes everywhere + exposing CUDA datatypes via public facing API. Any serious effort to port this to metal or any other platform like HIPS (close to gold standard), will by definition not be backwards compatible a major architectural refactor is required. |
.claude/settings.local.json is a per-developer permission allowlist with local filesystem paths; it has no business in version control.
The public API reused CUDA's own vector_types.h names (float3, uint2, ...) even in non-CUDA builds, since common.h just redefined structs with the same names when LIBMOLGRID_USE_CUDA was off. That makes the API look CUDA-coupled when it isn't. Replace the fallback with a single, unconditional Vec3/UVec2 pair plus a to_cuda() conversion helper for the CUDA kernel-launch boundary; downstream renames follow in subsequent commits.
Mechanical rename of float3/uint2 to Vec3/UVec2 across the public headers and their host (.cpp) implementations. The genuine CUDA __global__/__device__ kernel functions declared as friends in grid_maker.h keep native float3, since they're real device code compiled by nvcc, not part of this host API.
These Objective-C++ files are compiled by clang as ordinary host C++, not Metal Shading Language, so they follow the same Vec3/UVec2 rename as the rest of the host API. src/metal/shaders.metal is untouched: that's genuine MSL with its own native float3.
GridMaker/GridInterpolater/Transform's host-callable forward/backward methods now take Vec3 (matching their header declarations) and convert to native CUDA float3 at the actual kernel-launch boundary via to_cuda(). Free __global__/__device__ kernels that only do raw GPU buffer arithmetic keep native float3 throughout. Kernels that call Vec3-based shared host/device logic (Quaternion::rotate/transform, cart2grid/grid2cart) use Vec3 directly instead, since it's a trivial POD valid in both host and device code under nvcc. I don't have a CUDA toolchain to compile-test this locally (macOS); this needs verification via the project's CI or on a CUDA machine before merging upstream.
molgrid.float3 was a public Python class, not just an internal type, so the C++ rename alone would have broken existing user code and tests. Bind it as molgrid.Vec3 and alias float3 = Vec3 in python/__init__.py so molgrid.float3(...) and tuple call sites keep working unchanged.
Host, Metal (.mm), and CUDA (.cu) test files all call directly into the renamed public API (constructors, get_grid_dims, forward/backward, ...), so they need the same float3->Vec3 rename. test_transform.py is left unchanged: it exercises the molgrid.float3 back-compat alias as a regression check.
|
Thanks for the detailed look, both points were fair and I've pushed fixes: The CUDA datatypes in the public API — also a fair point, and worth separating into two parts:
I've replaced that with a single backend-neutral Split into 6 commits so it's reviewable in stages (type definition → public API → Metal dispatcher → CUDA boundary → Python bindings → tests) rather than one large diff. Caveat: I don't have a CUDA toolchain to compile-test the |
ManagedGrid's CPU allocation placed a tiny header struct before the data pointer, which on the non-CUDA (Metal) build shrank to 1 byte and broke the page alignment that MTLBuffer's zero-copy wrapping requires. GPU dispatches (Transform, GridMaker, GridInterpolater) silently wrote into a buffer that was never visible to the CPU-side pointer, which is what was causing the apply_transform/Vec3 regression and several other GPU-path test failures. Fix: keep the header in the same allocation but place it after the data instead of before, so the data pointer stays exactly page-aligned. Also: - python/CMakeLists.txt: resolve Python symbols via -undefined,dynamic_lookup on macOS instead of linking libpython directly, since some distributions (e.g. conda) statically embed the interpreter rather than exposing it as a dylib; linking it directly pulled in a second CPython runtime and crashed on import. - torch_bindings.py: MolIterDataset.__setstate__ (used when DataLoader workers are spawned, which macOS does by default) reconstructed self.examples but never recomputed self._num_labels. - test_coordinateset.py: deduplicate two same-named test functions (the second was silently shadowing the first), and make the CoordinateSet leak-detection test robust to OpenBabel's legitimate one-time/lazy cache allocations instead of asserting byte-exact RSS equality. Co-Authored-By: Claude Sonnet 5 <noreply@anthropic.com>
gninasrc (molgridder.cpp/h, torch_model.cpp) now uses libmolgrid's Vec3/UVec2 types, which only exist on the fnachon/libmolgrid fork (gnina/libmolgrid#127, not yet merged upstream). The non-Metal ExternalProject_Add was still pulling upstream gnina/libmolgrid, which has no Vec3 type, breaking the CUDA/Linux CI build. Point it at the fork like the Metal branch already does, until gnina#127 merges. Co-Authored-By: Claude Sonnet 5 <noreply@anthropic.com>
Both come from CUDA-templated code now getting instantiated inside plain .cpp translation units (compiled by g++ directly), not just .cu files (compiled by nvcc, which is more permissive by default): - common.h: to_cuda() called make_float3(), which lives in <vector_functions.h>. nvcc implicitly includes that for .cu files; a host-only g++ compile of a .cpp file doesn't get it, so the name was undeclared. Build the ::float3 via aggregate init instead. - managed_grid.h: cudaMalloc(&gpu_info->gpu_ptr, ...) passes a float**/double** where cudaMalloc wants void**, an implicit pointer-to-pointer conversion that's illegal in strict C++. nvcc passes -fpermissive to its host-compiler stage for .cu files, masking this; plain g++ on a .cpp file does not. Add the explicit cast. Confirmed via the PR's own CI log (gnina/libmolgrid run 28481163579): GCC 11.4.0 + CUDA 12.6.68 hit exactly these two errors while compiling src/libmolgrid.cpp. Co-Authored-By: Claude Sonnet 5 <noreply@anthropic.com>
This is a MacOS-only version created with Claude.
Dependencies must be installed with Homebrew.
Xcode command Line Tools is necessary to compile Metal kernels