Enable Choice of NVML Device #38

Open
ranocha opened this issue Mar 27, 2019 · 4 comments

@ranocha
Member

ranocha commented Mar 27, 2019

Up to now, the NVML device is hard-coded as index 0 in two places:

nvmlDeviceGetHandleByIndex(0, &device);
and
nvmlDeviceGetHandleByIndex(0, &device);

We should add some command line option to choose another device or even other devices. A somewhat simple option would be to allow logging on only one device. However, I would prefer the ability to enable logging on an arbitrary number of devices.

As described here, it would be better to use nvmlDeviceGetHandleByUUID or nvmlDeviceGetHandleByPciBusId.
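A minimal sketch of what such a lookup could look like, replacing the fixed index with a stable identifier. The NVML documentation notes that the enumeration order is not guaranteed to be consistent between reboots, which is the reason for preferring UUID or PCI bus ID. The helper names `looks_like_uuid` and `open_nvml_device` are hypothetical and not part of toolkitICL; the sketch also assumes that UUID strings start with `GPU-` (the form printed by `nvidia-smi -L`) and that anything else is a PCI bus ID. The NVML part is only compiled where `nvml.h` is available.

```cpp
#include <string>

#if __has_include(<nvml.h>)
#include <nvml.h>
#define HAVE_NVML_HEADER 1
#endif

// Hypothetical helper: treat identifiers starting with "GPU-" as UUIDs;
// anything else is assumed to be a PCI bus ID such as "0000:01:00.0".
inline bool looks_like_uuid(const std::string& id) {
    return id.rfind("GPU-", 0) == 0;  // true only if "GPU-" is a prefix
}

#ifdef HAVE_NVML_HEADER
// Hypothetical wrapper: resolves an identifier to an NVML device handle.
// Assumes nvmlInit() has already been called successfully.
inline nvmlReturn_t open_nvml_device(const std::string& id, nvmlDevice_t* device) {
    if (looks_like_uuid(id)) {
        return nvmlDeviceGetHandleByUUID(id.c_str(), device);
    }
    return nvmlDeviceGetHandleByPciBusId(id.c_str(), device);
}
#endif
```

The caller would compare the result with `NVML_SUCCESS` and could report failures via `nvmlErrorString`.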

CC @Kostaszki

@ranocha
Member Author

ranocha commented Mar 27, 2019

Another option would be to log the power/temperature of all devices, similar to the approach for Intel (packages 0 and 1).

@Kostaszki
Member

When logging the power/temperature of all devices, you still need an option to correlate the OpenCL device in use with the NVML device. Considering this, I would prefer the command line option.

@ranocha
Member Author

ranocha commented Mar 27, 2019

In that case, one possibility might be to query the UUID via

$ nvidia-smi -L
GPU 0: GeForce GTX 1070 Ti (UUID: GPU-7350c62a-efab-c59a-a51f-f99f19ccbf6b)

Then, we can have the general calling syntax toolkitICL -d 0 -nvidia_power 100 [optional uuids] -c config.h5.

  1. If no UUID is specified, we should log all devices. If we consider the current behavior a bug, this change is fine for a regular new release. Otherwise, changing the behavior would require going to version 2.0.0.
  2. If at least one UUID is specified, the devices having these UUIDs should be used for logging.

We can use names such as power0, power1 to enumerate the devices (in the order used by nvml in case 1 or in the given order in case 2). The UUID (and possibly other data) could be added to the description.
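The two cases and the naming scheme above can be sketched as follows. This is only an illustration under stated assumptions: `NvidiaPowerOption`, `parse_nvidia_power`, and `name_devices` are hypothetical names, not toolkitICL code; UUIDs are assumed to be all tokens after the number until the next token starting with `-`; and the list of available UUIDs is assumed to be supplied by the caller in NVML enumeration order.

```cpp
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical container for the parsed option.
struct NvidiaPowerOption {
    bool enabled = false;
    unsigned value = 0;              // number following -nvidia_power (100 in the example)
    std::vector<std::string> uuids;  // empty means "log all devices"
};

// Parses "-nvidia_power <number> [uuid ...]". UUIDs are collected until the
// next token that starts with '-'. std::stoul throws if <number> is not numeric.
NvidiaPowerOption parse_nvidia_power(const std::vector<std::string>& args) {
    NvidiaPowerOption opt;
    for (std::size_t i = 0; i < args.size(); ++i) {
        if (args[i] != "-nvidia_power" || i + 1 >= args.size()) continue;
        opt.enabled = true;
        opt.value = static_cast<unsigned>(std::stoul(args[i + 1]));
        for (std::size_t j = i + 2; j < args.size() && args[j][0] != '-'; ++j) {
            opt.uuids.push_back(args[j]);
        }
        break;
    }
    return opt;
}

// Returns (dataset name, UUID) pairs.
// Case 1: no UUID requested -> all available devices, in enumeration order.
// Case 2: at least one UUID requested -> those devices, in the given order.
std::vector<std::pair<std::string, std::string>>
name_devices(const std::vector<std::string>& available,
             const std::vector<std::string>& requested) {
    const auto& chosen = requested.empty() ? available : requested;
    std::vector<std::pair<std::string, std::string>> out;
    for (std::size_t i = 0; i < chosen.size(); ++i) {
        out.emplace_back("power" + std::to_string(i), chosen[i]);
    }
    return out;
}
```

The UUID stored next to each name is what could go into the dataset description. A real implementation would also need to check that each requested UUID actually resolves to a device.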

@ranocha
Member Author

ranocha commented Mar 31, 2019

In #39, @philipheinisch implemented a sensible default value for NVML. Maybe we want to enable additional logging of specifically chosen devices for a more general power logging library?
