
Windows 11: Can't run LLama #1919

Open
denmuslimov opened this issue Jan 10, 2025 · 0 comments
I'm experimenting with different models.
Some refuse to run.

Windows 11, RTX 4070 12 GB

The following commands:
python generate.py --base_model=TheBloke/Llama-2-7b-Chat-GPTQ --load_gptq="model" --use_safetensors=True --prompt_type=llama2 --save_dir='save'
python generate.py --base_model=TheBloke/Nous-Hermes-13B-GPTQ --score_model=None --load_gptq=model --use_safetensors=True --prompt_type=instruct --langchain_mode=UserData

return this error:

...
  File "C:\Users\Worker\Documents\AI\h2ogpt\generate.py", line 20, in <module>
    entrypoint_main()
  File "C:\Users\Worker\Documents\AI\h2ogpt\generate.py", line 16, in entrypoint_main
    H2O_Fire(main)
  File "C:\Users\Worker\Documents\AI\h2ogpt\src\utils.py", line 79, in H2O_Fire
    fire.Fire(component=component, command=args)
  File "C:\Users\Worker\miniconda3\envs\h2ogpt\lib\site-packages\fire\core.py", line 135, in Fire
    component_trace = _Fire(component, args, parsed_flag_args, context, name)
  File "C:\Users\Worker\miniconda3\envs\h2ogpt\lib\site-packages\fire\core.py", line 468, in _Fire
    component, remaining_args = _CallAndUpdateTrace(
  File "C:\Users\Worker\miniconda3\envs\h2ogpt\lib\site-packages\fire\core.py", line 684, in _CallAndUpdateTrace
    component = fn(*varargs, **kwargs)
  File "C:\Users\Worker\Documents\AI\h2ogpt\src\gen.py", line 2329, in main
    model_state_trial = model_lock_to_state(model_dict, cache_model_state=False, **kwargs_model_lock_to_state)
  File "C:\Users\Worker\Documents\AI\h2ogpt\src\model_utils.py", line 1689, in model_lock_to_state
    return __model_lock_to_state(model_dict1, **kwargs)
  File "C:\Users\Worker\Documents\AI\h2ogpt\src\model_utils.py", line 1767, in __model_lock_to_state
    model0, tokenizer0, device = get_model_retry(reward_type=False,
  File "C:\Users\Worker\Documents\AI\h2ogpt\src\model_utils.py", line 425, in get_model_retry
    model1, tokenizer1, device1 = get_model(**kwargs)
  File "C:\Users\Worker\Documents\AI\h2ogpt\src\model_utils.py", line 1208, in get_model
    return get_hf_model(load_8bit=load_8bit,
  File "C:\Users\Worker\Documents\AI\h2ogpt\src\model_utils.py", line 1534, in get_hf_model
    model = exllama_set_max_input_length(model, tokenizer.model_max_length)
  File "C:\Users\Worker\miniconda3\envs\h2ogpt\lib\site-packages\auto_gptq\utils\exllama_utils.py", line 15, in exllama_set_max_input_length
    from exllama_kernels import prepare_buffers, cleanup_buffers_cuda
ImportError: DLL load failed while importing exllama_kernels: The specified module could not be found.
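For anyone hitting the same error: the traceback shows the failure happens when auto_gptq tries to import its compiled `exllama_kernels` extension, so a quick way to narrow it down is to attempt the imports directly in the same conda environment. This is a hypothetical diagnostic helper, not part of h2ogpt or auto_gptq:

```python
import importlib


def try_import(module_name):
    """Attempt to import a module; return 'ok' or the ImportError message."""
    try:
        importlib.import_module(module_name)
        return "ok"
    except ImportError as exc:
        # On Windows, 'DLL load failed' here usually points at a missing
        # CUDA runtime DLL or a torch/auto-gptq CUDA version mismatch.
        return f"import failed: {exc}"


# The failing module from the traceback, plus the package that wraps it.
for name in ("exllama_kernels", "auto_gptq"):
    print(f"{name}: {try_import(name)}")
```

If `exllama_kernels` fails while `auto_gptq` imports fine, the prebuilt kernel wheel likely doesn't match the installed CUDA/torch build; reinstalling auto-gptq with a wheel built for the matching CUDA version is a common fix.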