Error when running MLPerf inference with --device rocm = main.py: error: argument --device: invalid choice: 'rocm' (choose from 'cpu', 'cuda:0') #649

Open
altairBASIC opened this issue Jan 15, 2025 · 0 comments


I am running the MLPerf inference benchmark for the Llama2-70b-99 model on a cluster with 6 MI210 GPUs. Below is the command I am using with CM:

cm run script --tags=run-mlperf,inference,_find-performance,_full,_r5.0-dev --model=llama2-70b-99 --implementation=reference --framework=pytorch --category=datacenter --scenario=Offline --execution_mode=test --device=rocm --quiet --test_query_count=10 --env.LLAMA2_CHECKPOINT_PATH=/home/intern01/Llama-2-70b-chat-hf

When I run the script with the `--device rocm` option, I get the error in the title: `rocm` is not recognized as a valid device, since the script only accepts `cpu` or `cuda:0`. Here is the full output:

```
CM script::benchmark-program/run.sh

Run Directory: /home/intern01/CM/repos/local/cache/12bee67ce1d840d4/inference/language/llama2-70b

CMD: /home/intern01/CM/repos/local/cache/def32291fe4247de/mlperf/bin/python3 main.py  --scenario Offline --dataset-path /home/intern01/CM/repos/local/cache/b4603ed8799641d8/open_orca/open_orca_gpt4_tokenized_llama.sampled_24576.pkl.gz --device rocm   --total-sample-count 10 --user-conf '/home/intern01/CM/repos/mlcommons@mlperf-automations/script/generate-mlperf-inference-user-conf/tmp/8b4fd7479b754685ab7d620e3a9af93e.conf' --output-log-dir /home/intern01/CM/repos/local/cache/0b04afd372744cef/test_results/gn005-reference-rocm-pytorch-v2.6.0.dev20241122-scc24-base/llama2-70b-99/offline/performance/run_1 --dtype float16 --model-path /home/intern01/Llama-2-70b-chat-hf 2>&1 | tee '/home/intern01/CM/repos/local/cache/0b04afd372744cef/test_results/gn005-reference-rocm-pytorch-v2.6.0.dev20241122-scc24-base/llama2-70b-99/offline/performance/run_1/console.out'; echo \${PIPESTATUS[0]} > exitstatus

INFO:root:         ! cd /home/intern01/CM/repos/local/cache/dd75d90466a24ac1
INFO:root:         ! call /home/intern01/CM/repos/mlcommons@mlperf-automations/script/benchmark-program/run.sh from tmp-run.sh

/home/intern01/CM/repos/local/cache/def32291fe4247de/mlperf/bin/python3 main.py  --scenario Offline --dataset-path /home/intern01/CM/repos/local/cache/b4603ed8799641d8/open_orca/open_orca_gpt4_tokenized_llama.sampled_24576.pkl.gz --device rocm   --total-sample-count 10 --user-conf '/home/intern01/CM/repos/mlcommons@mlperf-automations/script/generate-mlperf-inference-user-conf/tmp/8b4fd7479b754685ab7d620e3a9af93e.conf' --output-log-dir /home/intern01/CM/repos/local/cache/0b04afd372744cef/test_results/gn005-reference-rocm-pytorch-v2.6.0.dev20241122-scc24-base/llama2-70b-99/offline/performance/run_1 --dtype float16 --model-path /home/intern01/Llama-2-70b-chat-hf 2>&1 | tee '/home/intern01/CM/repos/local/cache/0b04afd372744cef/test_results/gn005-reference-rocm-pytorch-v2.6.0.dev20241122-scc24-base/llama2-70b-99/offline/performance/run_1/console.out'; echo ${PIPESTATUS[0]} > exitstatus
usage: main.py [-h] [--scenario {Offline,Server}] [--model-path MODEL_PATH]
               [--dataset-path DATASET_PATH] [--accuracy] [--dtype DTYPE]
               [--device {cpu,cuda:0}] [--audit-conf AUDIT_CONF]
               [--user-conf USER_CONF]
               [--total-sample-count TOTAL_SAMPLE_COUNT]
               [--batch-size BATCH_SIZE] [--output-log-dir OUTPUT_LOG_DIR]
               [--enable-log-trace] [--num-workers NUM_WORKERS] [--vllm]
               [--api-model-name API_MODEL_NAME] [--api-server API_SERVER]
main.py: error: argument --device: invalid choice: 'rocm' (choose from 'cpu', 'cuda:0')

CM error: Portable CM script failed (name = benchmark-program, return code = 512)
```

Could you please advise on how to enable or fix ROCm support for this benchmark? Thanks!
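For context, the failure comes from the `--device` argument parser in the reference `main.py`, whose `choices` list only contains `cpu` and `cuda:0`. As a possible local workaround (a sketch only, not an upstream fix; the exact argument definition in `main.py` may differ), one could extend the choices and translate `rocm` to the `cuda` device string, since ROCm builds of PyTorch expose AMD GPUs through the CUDA device namespace:

```python
import argparse

# Sketch of a local patch to the --device argument in main.py.
# The "rocm" choice and the translation below are assumptions, not
# upstream behavior: PyTorch built for ROCm addresses AMD GPUs via
# the "cuda" device string, so "rocm" can be mapped to "cuda:0".
parser = argparse.ArgumentParser()
parser.add_argument(
    "--device",
    type=str,
    default="cpu",
    choices=["cpu", "cuda:0", "rocm"],  # "rocm" added locally
    help="device to run the benchmark on",
)

args = parser.parse_args(["--device", "rocm"])
if args.device == "rocm":
    # Reuse the CUDA code path on ROCm-enabled PyTorch.
    args.device = "cuda:0"

print(args.device)  # -> cuda:0
```

Whether the rest of the reference implementation then works on MI210 GPUs depends on the installed PyTorch build actually being a ROCm build, which the `-rocm-pytorch-` tag in the results directory suggests is the case here.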
