
Exclude broken cuda-toolkit wheels on Windows #1884

Closed
rwgk wants to merge 1 commit into NVIDIA:main from rwgk:exclude_broken_cuda-toolkit_versions_windows_only

Conversation

@rwgk
Collaborator

@rwgk rwgk commented Apr 9, 2026

Summary

  • exclude cuda-toolkit 12.9.2 and 13.0.3 only on Windows in cuda_pathfinder, cuda_core, and cuda_bindings
  • keep the existing floating 12.* and 13.* behavior on Linux, since the Linux matrix did not show the same breakage
  • preserve flexibility to pick up later good patch releases while avoiding the two known-bad Windows resolutions

Background

During CI for PR #1817, a group of Windows test jobs started failing in cuda.pathfinder strict mode after new cuda-toolkit patch releases became available on the package index.

Compared with the last successful CI run against main, the affected Windows jobs changed from resolving:

  • cuda-toolkit 12.9.1 to 12.9.2
  • cuda-toolkit 13.0.2 to 13.0.3

Those failing Windows jobs then stopped installing the full set of CTK DLL-providing packages such as cublas, cufft, curand, cusolver, cusparse, npp, and nvjpeg, and cuda.pathfinder strict checks failed as a result.

The strongest current hypothesis is that these two newly published cuda-toolkit patch releases have broken Windows dependency metadata for extras resolution. Linux did not show the same regression.

What This PR Changes

  • cuda_pathfinder

    • split the cu12 and cu13 toolkit requirements by platform
    • exclude 12.9.2 and 13.0.3 only when sys_platform == 'win32'
    • leave non-Windows resolution on the original 12.* / 13.* ranges
  • cuda_core

    • apply the same Windows-only exclusions to test-cu12, test-cu13, test-cu12-ft, and test-cu13-ft
  • cuda_bindings

    • apply the same Windows-only exclusion to the all extra for the toolkit packages it installs
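
The platform split described above can be expressed with PEP 508 environment markers. The sketch below builds the pair of requirement strings for one CUDA major version. The helper name `toolkit_requirements`, the exact specifier layout, and the omission of any extras on `cuda-toolkit` are assumptions for illustration; this is not the PR's actual diff.

```python
# Illustrative only: the helper name and exact specifier layout are assumptions,
# not copied from the PR. Extras on cuda-toolkit are omitted for brevity.
BROKEN_ON_WINDOWS = {12: "12.9.2", 13: "13.0.3"}


def toolkit_requirements(major: int) -> list[str]:
    """Return PEP 508 requirement strings for one CUDA major version."""
    broken = BROKEN_ON_WINDOWS[major]
    return [
        # Windows: float within the major series but skip the known-bad release.
        f"cuda-toolkit=={major}.*,!={broken}; sys_platform == 'win32'",
        # Everything else: keep the original floating-major range.
        f"cuda-toolkit=={major}.*; sys_platform != 'win32'",
    ]


if __name__ == "__main__":
    for major in (12, 13):
        for requirement in toolkit_requirements(major):
            print(requirement)
```

Because the two markers are mutually exclusive, an installer should select exactly one of the two lines on any given platform.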

Why Windows-Only

The exclusion-based workaround was first applied uniformly (Linux & Windows) under #1817. That is a significantly simpler change, but strictly speaking the observed breakage is Windows-specific:

  • the failing cuda.pathfinder jobs were Windows jobs
  • Linux jobs using floating 12.* / 13.* specs continued to pass

Restricting the exclusions to Windows makes the workaround narrower and keeps Linux on the normal floating-major behavior.
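
To make the effect of the Windows-only exclusion concrete, here is a minimal sketch of "pick the newest allowed release". It is a simplified stand-in for what an installer's resolver does, not pip's actual algorithm. The function names and the candidate lists are hypothetical, and 12.9.3 appears only as a stand-in for a possible future fixed release.

```python
# Simplified model of floating-major resolution with per-platform exclusions.
# Names and candidate lists are hypothetical; real resolvers do much more.
WINDOWS_EXCLUSIONS = frozenset({"12.9.2", "13.0.3"})


def parse(version: str) -> tuple[int, ...]:
    """Turn a dotted numeric version such as '12.9.1' into a comparable tuple."""
    return tuple(int(part) for part in version.split("."))


def pick_toolkit(candidates: list[str], major: int, platform: str):
    """Return the newest candidate in the given major series, or None."""
    excluded = WINDOWS_EXCLUSIONS if platform == "win32" else frozenset()
    allowed = [
        version
        for version in candidates
        if parse(version)[0] == major and version not in excluded
    ]
    return max(allowed, key=parse, default=None)


if __name__ == "__main__":
    available = ["12.9.1", "12.9.2"]
    print(pick_toolkit(available, 12, "win32"))  # 12.9.1
    print(pick_toolkit(available, 12, "linux"))  # 12.9.2
    # A later fixed release (hypothetical 12.9.3) would be picked up again.
    print(pick_toolkit(available + ["12.9.3"], 12, "win32"))  # 12.9.3
```

The last call shows why an exclusion is preferable to an upper bound here: Windows would resume floating as soon as a newer release in the same major series is published.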

@rwgk rwgk added this to the cuda.pathfinder next milestone Apr 9, 2026
@rwgk rwgk self-assigned this Apr 9, 2026
@rwgk rwgk added the P0 (High priority - Must do!) and CI/CD (CI/CD infrastructure) labels Apr 9, 2026
@github-actions

github-actions bot commented Apr 9, 2026

Contributor

@mdboom mdboom left a comment

This looks good to me if the expectation is that 13.0.3 and 12.9.2 are "permanently broken" and upstream will fix this by making a 13.0.4 and 12.9.3 release.

I also have no objection to the simpler approach that would also apply to Linux, especially since I expect the CTK team would always make a new release for all platforms simultaneously.

@rwgk rwgk marked this pull request as draft April 9, 2026 16:13
@copy-pr-bot
Contributor

copy-pr-bot bot commented Apr 9, 2026

Auto-sync is disabled for draft pull requests in this repository. Workflows must be run manually.

Contributors can view more details about this message here.

@rwgk
Collaborator Author

rwgk commented Apr 9, 2026

Back to draft mode: it looks like we will not need this.

@rwgk
Collaborator Author

rwgk commented Apr 9, 2026

Closing after seeing that cuda-toolkit 12.9.2 and 13.0.3 were yanked on PyPI:

https://pypi.org/project/cuda-toolkit/#history
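
Yanking makes a source-level exclusion unnecessary because, under PEP 592, installers skip yanked files unless a requirement pins that exact version. As a rough sketch, the yanked status can also be checked programmatically. The code assumes the `releases` mapping of PyPI's project JSON endpoint, where each file entry carries a `yanked` flag; both helper names are hypothetical, and `fetch_releases` needs network access.

```python
import json
import urllib.request


def yanked_versions(releases: dict) -> list[str]:
    """Return versions whose uploaded files are all marked as yanked.

    `releases` is assumed to map version strings to lists of file dicts,
    as in the 'releases' field of PyPI's project JSON response.
    The result is sorted lexicographically, not by version order.
    """
    return sorted(
        version
        for version, files in releases.items()
        if files and all(entry.get("yanked", False) for entry in files)
    )


def fetch_releases(project: str) -> dict:
    """Fetch the 'releases' mapping for a project (requires network access)."""
    url = f"https://pypi.org/pypi/{project}/json"
    with urllib.request.urlopen(url, timeout=30) as response:
        return json.load(response)["releases"]


if __name__ == "__main__":
    print(yanked_versions(fetch_releases("cuda-toolkit")))
```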

@rwgk rwgk closed this Apr 9, 2026
@rwgk rwgk deleted the exclude_broken_cuda-toolkit_versions_windows_only branch April 9, 2026 17:10
github-actions bot pushed a commit that referenced this pull request Apr 10, 2026
Removed preview folders for the following PRs:
- PR #1833
- PR #1884

Labels

CI/CD (CI/CD infrastructure), P0 (High priority - Must do!)


2 participants