fix: Direct users to the CUDA metapackage guides for compilation #2416

Open: matthewfeickert wants to merge 4 commits into main from fix/remove-outdated-cuda-info

Conversation

matthewfeickert
Member

@matthewfeickert matthewfeickert commented Jan 8, 2025

Resolves #1927

  • The current (2025-01-08) FAQ incorrectly states that compilation of CUDA code using conda-forge provided packages is not possible. Compilation support is provided through the cuda-compiler metapackage, and additional cuda metapackages provide further development and runtime dependencies. The documentation for these metapackages is being developed alongside them and currently exists as multiple user guides in the cuda feedstock.
  • This change intentionally removes nearly all specific information, not even mentioning the cuda-compiler metapackage, and instead directs readers to the user guides.
  • In keeping with directing all CUDA build information to the cuda-feedstock user guides, remove almost all information from the CUDA builds section of the maintainer Knowledge Base and instead direct people to the user guides.
  • Remove any explicit CUDA version number shown as an example to avoid recommending outdated versions. Instead, use <CUDA VERSION> for something like 11.8 and cuda<version> for something like cuda118, or point to the cuda-feedstock user guides.
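
As a rough illustration of the placeholder convention in the last bullet, the sketch below derives a `cuda<version>` build-string fragment from a `<CUDA VERSION>` value and prints (without running) a matching override command. The helper names `cuda_build_tag` and `override_install_cmd` are hypothetical, and the assumption that the build string is simply `cuda` followed by the version digits may not hold for every feedstock; the cuda-feedstock user guides remain the authoritative reference.

```shell
#!/usr/bin/env bash
# Hypothetical helpers, for illustration only (not part of conda or conda-forge).

# Turn a CUDA version such as "11.8" into a build-string fragment such as "cuda118".
# Assumption: the fragment is "cuda" followed by the version with dots removed.
cuda_build_tag() {
  printf 'cuda%s\n' "$(printf '%s' "$1" | tr -d '.')"
}

# Print (do not execute) an install command that overrides the __cuda virtual package.
override_install_cmd() {
  local pkg="$1" version="$2"
  printf 'CONDA_OVERRIDE_CUDA="%s" conda install "%s=*=%s*" -c conda-forge\n' \
    "$version" "$pkg" "$(cuda_build_tag "$version")"
}

# Dry run: shows the command a user would type for TensorFlow and CUDA 11.8.
override_install_cmd tensorflow 11.8
```

Printing the command instead of running it keeps the sketch safe to try on any machine, including one without conda installed.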

PR Checklist:

  • note any issues closed by this PR with closing keywords
  • [N/A] if you are adding a new page under docs/ or community/, you have added it to the sidebar in the corresponding _sidebar.json file
  • [N/A] put any other relevant information below

Amends PRs:

Relevant renders of this PR:

@matthewfeickert matthewfeickert requested a review from a team as a code owner January 8, 2025 07:56

netlify bot commented Jan 8, 2025

Deploy Preview for conda-forge-previews ready!

| Name | Link |
| --- | --- |
| 🔨 Latest commit | 416cf59 |
| 🔍 Latest deploy log | https://app.netlify.com/sites/conda-forge-previews/deploys/6780519aeb92e300084102ad |
| 😎 Deploy Preview | https://deploy-preview-2416--conda-forge-previews.netlify.app |
Lighthouse
1 paths audited
Performance: 54
Accessibility: 96
Best Practices: 100
SEO: 89
PWA: -

@matthewfeickert
Member Author

matthewfeickert commented Jan 8, 2025

cc @conda-forge/cuda for review as well as everyone who was involved in the Issue #1927 discussion: @jakirkham @pentschev @h-vetinari @vyasr @betatim @traversaro

Member

@jakirkham jakirkham left a comment

Thanks Matthew! 🙏

Included some minor rewording suggestions below

@jakirkham
Member

Think we can update these to just say CUDA or <CUDA version>

## Installing CUDA-enabled packages like TensorFlow and PyTorch
In conda-forge, some packages are available with GPU support. These packages not only take significantly longer to compile and build, but they also result in rather large binaries that users then download. As an effort to maximize accessibility for users with lower connection and/or storage bandwidth, there is an ongoing effort to limit installing packages compiled for GPUs unnecessarily on CPU-only machines by default. This is accomplished by adding a run dependency, `__cuda`, that detects if the local machine has a GPU. However, this introduces challenges to users who may prefer to still download and use GPU-enabled packages even on a non-GPU machine. For example, login nodes on HPCs often do not have GPUs and their compute counterparts with GPUs often do not have internet access. In this case, a user can override the default setting via the environment variable `CONDA_OVERRIDE_CUDA` to install GPU packages on the login node to be used later on the compute node. At the time of writing (February 2022), we have concluded this safe default behavior is best for most of conda-forge users, with an easy override option available and documented. Please let us know if you have thoughts on or issues with this.
In order to override the default behavior, a user can set the environment variable `CONDA_OVERRIDE_CUDA` like below to install TensorFlow with GPU support even on a machine with CPU only.
```shell-session
CONDA_OVERRIDE_CUDA="11.2" conda install "tensorflow==2.7.0=cuda112*" -c conda-forge
# OR
CONDA_OVERRIDE_CUDA="11.2" mamba install "tensorflow==2.7.0=cuda112*" -c conda-forge
```
:::note
You should select the cudatoolkit version most appropriate for your GPU; currently, we have "10.2", "11.0", "11.1", and "11.2" builds available, where the "11.2" builds are compatible with all cudatoolkits>=11.2. At the time of writing (Mar 2022), there seems to be a bug in how the CUDA builds are resolved by `mamba`, defaulting to `cudatoolkit==10.2`; thus, it is prudent to be as explicit as possible like above or by adding `cudatoolkit>=11.2` or similar to the line above.
:::
For context, installing the TensorFlow 2.7.0 CUDA-enabled variant, `tensorflow==2.7.0=cuda*`, results in approximately 2 GB of packages to download while the CPU variant, `tensorflow=2.7.0=cpu*`, results in approximately 200 MB to download. That is a significant amount of bandwidth and storage wasted if one only needs the CPU-only variant!

- Linux: `linux64.yaml` plus the CUDA (10.2, 11.0, 11.1 and 11.2) variants.

where `<VARIANT>` is one of the file names in the `.ci_support/` directory, e.g. `linux64`, `osx64`, and `linux64_cuda102`.


Also think we can cut this section and link to those docs instead

<a id="cuda"></a>
<a id="cuda-builds"></a>
## CUDA builds
Although the provisioned CI machines do not feature a GPU, conda-forge does provide mechanisms
to build CUDA-enabled packages. These mechanisms involve several packages:

@matthewfeickert matthewfeickert force-pushed the fix/remove-outdated-cuda-info branch 2 times, most recently from f53144c to 1cf8a66 Compare January 8, 2025 19:04
Member

@jakirkham jakirkham left a comment

Thanks Matthew! 🙏

Generally this looks reasonable

Tried to answer questions below and provide minor suggestions. Please let me know if you have any questions 🙂

@bdice
Contributor

bdice commented Jan 9, 2025

Thanks @matthewfeickert for your work on this! It's greatly beneficial to the conda-forge community. And thank you to @jakirkham for the reviews and iteration. 🙇‍♂️

@matthewfeickert matthewfeickert force-pushed the fix/remove-outdated-cuda-info branch from 7b26463 to df17beb Compare January 9, 2025 20:28
@@ -138,19 +138,19 @@ echo "CONDA_SUBDIR: $CONDA_SUBDIR" # Should print "CONDA_SUBDIR: osx-64"

## Installing CUDA-enabled packages like TensorFlow and PyTorch

In conda-forge, some packages are available with GPU support. These packages not only take significantly longer to compile and build, but they also result in rather large binaries that users then download. As an effort to maximize accessibility for users with lower connection and/or storage bandwidth, there is an ongoing effort to limit installing packages compiled for GPUs unnecessarily on CPU-only machines by default. This is accomplished by adding a [virtual package](https://docs.conda.io/projects/conda/en/latest/user-guide/tasks/manage-virtual.html) run dependency, `__cuda`, that detects if the local machine has a GPU. However, this introduces challenges to users who may prefer to still download and use GPU-enabled packages even on a non-GPU machine. For example, login nodes on HPCs often do not have GPUs and their compute counterparts with GPUs often do not have internet access. In this case, a user can override the default setting via the environment variable `CONDA_OVERRIDE_CUDA` to install GPU packages on the login node to be used later on the compute node. At the time of writing (February 2022), we have concluded this safe default behavior is best for most of conda-forge users, with an easy override option available and documented. Please let us know if you have thoughts on or issues with this.
Member Author

@bdice Link to virtual packages docs added here as this is the first place that __cuda shows up.
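
To see what the `__cuda` virtual package resolves to on a given machine, one option is to inspect the `virtual packages` entries printed by `conda info`. The sketch below is a hypothetical helper (`cuda_virtual_version` is not a conda command) that extracts the `__cuda` version from that text, assuming entries have the form `__cuda=<version>=<build>`; it prints nothing when no such entry is present.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read `conda info` text on stdin and print the __cuda
# version, if any. Assumes entries look like "__cuda=<version>=<build>".
cuda_virtual_version() {
  sed -n 's/.*__cuda=\([^=]*\)=.*/\1/p'
}

# Only query conda when it is actually installed.
if command -v conda >/dev/null 2>&1; then
  # Version detected on this machine (no output expected without a GPU driver).
  conda info | cuda_virtual_version
fi
```

Running the same pipeline with `CONDA_OVERRIDE_CUDA="<CUDA VERSION>"` set should, according to the virtual package docs, report the overridden value instead of the detected one, which is a quick way to confirm an override before installing anything.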

matthewfeickert and others added 3 commits January 9, 2025 13:56
* The current (2025-01-08) FAQ incorrectly states that compilation of CUDA code
  using conda-forge provided packages is not possible. Compilation support
  is provided through the cuda-compiler metapackage and additional cuda
  metapackages provide additional development and runtime dependencies.
  The documentation for these metapackages is currently being developed along
  with their evolution and currently exists as multiple user guides in the
  cuda feedstock.
   - c.f. https://github.com/conda-forge/cuda-feedstock/blob/main/recipe/README.md
* This change intentionally removes nearly all specific information, not even
  mentioning the cuda-compiler metapackage, and instead directs readers to the
  user guides.

Co-authored-by: jakirkham <[email protected]>
* Remove any explicit CUDA version number shown as an example to avoid
  recommending outdated versions. Instead, use <CUDA VERSION> for something
  like '11.2' and cuda<version> for something like 'cuda112', or point
  to the cuda-feedstock user guides.
* Add link to virtual package docs.
   - c.f. https://docs.conda.io/projects/conda/en/latest/user-guide/tasks/manage-virtual.html

Co-authored-by: jakirkham <[email protected]>
* In keeping with directing all CUDA build information to the cuda-feedstock
  user guides, remove almost all information from the CUDA builds section of
  the maintainer Knowledge Base and instead direct people to the maintainer guide.
   - c.f. https://github.com/conda-forge/cuda-feedstock/blob/main/recipe/doc/recipe_guide.md
@matthewfeickert matthewfeickert force-pushed the fix/remove-outdated-cuda-info branch from df17beb to f35015a Compare January 9, 2025 21:00
Member

@jakirkham jakirkham left a comment

Thanks Matthew! 🙏

This is looking pretty good

Had a comment on one section below

* Remove mentions of CUDA 11.2 support for rerenders as CUDA 11.2 was dropped
  on 2024-04-22. Replace this information with the January 2025 supported
  CUDA versions of 11.8 and 12.
   - c.f. https://conda-forge.org/news/2024/03/06/dropping-cuda-112/

Co-authored-by: jakirkham <[email protected]>
@jakirkham
Member

@kkraus14 @bdice @vyasr , could you please look over this and share your thoughts? 🙂

Labels
None yet
Development

Successfully merging this pull request may close these issues.

CUDA SDK packages & usage docs
4 participants