Skip to content

MacOS-only version with MPS support#127

Open
fnachon wants to merge 13 commits into
gnina:masterfrom
fnachon:master
Open

MacOS-only version with MPS support#127
fnachon wants to merge 13 commits into
gnina:masterfrom
fnachon:master

Conversation

@fnachon

@fnachon fnachon commented Mar 31, 2026

Copy link
Copy Markdown

This is a MacOS-only version created with Claude.
Dependencies must be installed with Homebrew.
Xcode command Line Tools is necessary to compile Metal kernels

@dkoes dkoes left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems to replace CUDA with Metal, which is not what is desired. Can the choice of backend be a compile time switch?

@fnachon

fnachon commented Mar 31, 2026

Copy link
Copy Markdown
Author

It may not be easy, but I can always try. I'll have a look at it and come back to you later.
Another option is to create a macOS-specific branch. Not ideal, but better than nothing.
Note that for gnina, I pointed to my fork for MacOS building.

@fnachon

fnachon commented Apr 5, 2026

Copy link
Copy Markdown
Author

I restored CUDA compatibility alongside Metal, with a choice of backend at installation. So that you know, I couldn't test the CUDA side.

@phillipweinstock

Copy link
Copy Markdown

Hi @fnachon I traced your commits back from Gnina.
First off, you attached claudes brainstorming documentation so even if someone merged it in, it would drag all the junk in. Quite frankly this pull is not even possible to audit in the slightest. I commend you wanting to try and vibe code your way around CUDA, but even trained software engineers such as myself know that this isnt a viable solution.
Seeing as you don't want your work to goto waste, I suggest you keep your changes as a patch and follow this package upstream. Same with Gnina. That way you can continue working until someone else figures out how to swap out CUDA. Probably the best way to do it is first shim/wrap all CUDA logic into an abstracted interface, that way the backend becomes swappable instead of a mess of include statements and defines. Its still a very large job, your probably looking at a major rewrite of this library I.E the kind of thing the original authors would probably need a grant to do.

@phillipweinstock

Copy link
Copy Markdown

Actually I've just checked further. Just a little background I have not ever used CUDA before but just a glance at grid_maker.cpp Line 358, shows float3 datatypes everywhere + exposing CUDA datatypes via public facing API. Any serious effort to port this to metal or any other platform like HIPS (close to gold standard), will by definition not be backwards compatible a major architectural refactor is required.
Regarding CUDA datatypes, a quick step to start cleaning up the internals would be replacing the float3 and other CUDA specific data types with a abstracted Vec3 datatype.

fnachon added 7 commits June 30, 2026 23:09
.claude/settings.local.json is a per-developer permission allowlist
with local filesystem paths; it has no business in version control.
The public API reused CUDA's own vector_types.h names (float3, uint2,
...) even in non-CUDA builds, since common.h just redefined structs
with the same names when LIBMOLGRID_USE_CUDA was off. That makes the
API look CUDA-coupled when it isn't. Replace the fallback with a
single, unconditional Vec3/UVec2 pair plus a to_cuda() conversion
helper for the CUDA kernel-launch boundary; downstream renames follow
in subsequent commits.
Mechanical rename of float3/uint2 to Vec3/UVec2 across the public
headers and their host (.cpp) implementations. The genuine CUDA
__global__/__device__ kernel functions declared as friends in
grid_maker.h keep native float3, since they're real device code
compiled by nvcc, not part of this host API.
These Objective-C++ files are compiled by clang as ordinary host C++,
not Metal Shading Language, so they follow the same Vec3/UVec2 rename
as the rest of the host API. src/metal/shaders.metal is untouched:
that's genuine MSL with its own native float3.
GridMaker/GridInterpolater/Transform's host-callable forward/backward
methods now take Vec3 (matching their header declarations) and
convert to native CUDA float3 at the actual kernel-launch boundary via
to_cuda(). Free __global__/__device__ kernels that only do raw GPU
buffer arithmetic keep native float3 throughout. Kernels that call
Vec3-based shared host/device logic (Quaternion::rotate/transform,
cart2grid/grid2cart) use Vec3 directly instead, since it's a trivial
POD valid in both host and device code under nvcc.

I don't have a CUDA toolchain to compile-test this locally (macOS);
this needs verification via the project's CI or on a CUDA machine
before merging upstream.
molgrid.float3 was a public Python class, not just an internal type,
so the C++ rename alone would have broken existing user code and
tests. Bind it as molgrid.Vec3 and alias float3 = Vec3 in
python/__init__.py so molgrid.float3(...) and tuple call sites keep
working unchanged.
Host, Metal (.mm), and CUDA (.cu) test files all call directly into
the renamed public API (constructors, get_grid_dims, forward/backward,
...), so they need the same float3->Vec3 rename. test_transform.py is
left unchanged: it exercises the molgrid.float3 back-compat alias as a
regression check.
@fnachon

fnachon commented Jun 30, 2026

Copy link
Copy Markdown
Author

Thanks for the detailed look, both points were fair and I've pushed fixes:

The .claude/ file — that was a local dev-tool config file (a permission allowlist referencing my own filesystem paths) that had no business in this diff. Removed and added .claude/ to .gitignore (0c522c7).

CUDA datatypes in the public API — also a fair point, and worth separating into two parts:

  • The non-CUDA build path already isolated CUDA via LIBMOLGRID_USE_CUDA (see common.h) — when CUDA is off, float3/uint2/etc. were just local structs, not the real CUDA types. So this was never the "every platform needs its own copy of CUDA logic" problem the rewrite framing suggested.
  • The actual bug was naming: those local structs reused CUDA's names, so the public API (grid_maker.h and friends) looked CUDA-coupled even when it wasn't — exactly what you ran into at grid_maker.cpp.

I've replaced that with a single backend-neutral Vec3/UVec2 (defined unconditionally in common.h, no #ifdef), used throughout the public C++ API, the Metal dispatcher layer, and the Python bindings (molgrid.Vec3, with molgrid.float3 kept as an alias for existing user code). Genuine CUDA float3 is now confined to the actual .cu kernel files and the kernel-launch boundary (one explicit to_cuda() conversion call per launch site); the real Metal Shading Language float3 in shaders.metal is untouched since that's a different compiler/language entirely.

Split into 6 commits so it's reviewable in stages (type definition → public API → Metal dispatcher → CUDA boundary → Python bindings → tests) rather than one large diff.

Caveat: I don't have a CUDA toolchain to compile-test the .cu changes locally (macOS). I built and ran the full C++/Metal/Python test suite here and confirmed no regressions versus the pre-change baseline (a couple of pre-existing failures — an unrelated Metal forward-pass numerical bug and missing torch/psutil in my env — are unchanged either way). The CUDA-side changes should get exercised by CI's CUDA compile step; happy to have someone with a CUDA box sanity-check before this lands anywhere.

ManagedGrid's CPU allocation placed a tiny header struct before the
data pointer, which on the non-CUDA (Metal) build shrank to 1 byte and
broke the page alignment that MTLBuffer's zero-copy wrapping requires.
GPU dispatches (Transform, GridMaker, GridInterpolater) silently wrote
into a buffer that was never visible to the CPU-side pointer, which is
what was causing the apply_transform/Vec3 regression and several other
GPU-path test failures. Fix: keep the header in the same allocation but
place it after the data instead of before, so the data pointer stays
exactly page-aligned.

Also:
- python/CMakeLists.txt: resolve Python symbols via -undefined,dynamic_lookup
  on macOS instead of linking libpython directly, since some distributions
  (e.g. conda) statically embed the interpreter rather than exposing it as
  a dylib; linking it directly pulled in a second CPython runtime and
  crashed on import.
- torch_bindings.py: MolIterDataset.__setstate__ (used when DataLoader
  workers are spawned, which macOS does by default) reconstructed
  self.examples but never recomputed self._num_labels.
- test_coordinateset.py: deduplicate two same-named test functions (the
  second was silently shadowing the first), and make the CoordinateSet
  leak-detection test robust to OpenBabel's legitimate one-time/lazy
  cache allocations instead of asserting byte-exact RSS equality.

Co-Authored-By: Claude Sonnet 5 <noreply@anthropic.com>
fnachon added a commit to fnachon/gnina that referenced this pull request Jul 1, 2026
gninasrc (molgridder.cpp/h, torch_model.cpp) now uses libmolgrid's
Vec3/UVec2 types, which only exist on the fnachon/libmolgrid fork
(gnina/libmolgrid#127, not yet merged upstream). The non-Metal
ExternalProject_Add was still pulling upstream gnina/libmolgrid,
which has no Vec3 type, breaking the CUDA/Linux CI build. Point it
at the fork like the Metal branch already does, until gnina#127 merges.

Co-Authored-By: Claude Sonnet 5 <noreply@anthropic.com>
Both come from CUDA-templated code now getting instantiated inside
plain .cpp translation units (compiled by g++ directly), not just
.cu files (compiled by nvcc, which is more permissive by default):

- common.h: to_cuda() called make_float3(), which lives in
  <vector_functions.h>. nvcc implicitly includes that for .cu files;
  a host-only g++ compile of a .cpp file doesn't get it, so the name
  was undeclared. Build the ::float3 via aggregate init instead.

- managed_grid.h: cudaMalloc(&gpu_info->gpu_ptr, ...) passes a
  float**/double** where cudaMalloc wants void**, an implicit
  pointer-to-pointer conversion that's illegal in strict C++. nvcc
  passes -fpermissive to its host-compiler stage for .cu files,
  masking this; plain g++ on a .cpp file does not. Add the explicit
  cast.

Confirmed via the PR's own CI log (gnina/libmolgrid run 28481163579):
GCC 11.4.0 + CUDA 12.6.68 hit exactly these two errors while
compiling src/libmolgrid.cpp.

Co-Authored-By: Claude Sonnet 5 <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants