I trained LLaVA (base LLM: LLaMA 3) with the Liger kernel (https://github.com/linkedin/Liger-Kernel). The training loss curve was similar to that of a run without the Liger kernel. However, the model trained with the Liger kernel scored lower on MLLM benchmarks such as ChartQA. Since LLaMA 3 is supported by Liger, I didn't expect any issues. Has anyone else tried training LLaVA with the Liger kernel?
Reproduce
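import torch
from liger_kernel.transformers import apply_liger_kernel_to_llama
# Assumed import path from the LLaVA repo; adjust to your checkout.
from llava.model import LlavaLlamaForCausalLM

# Patch the Hugging Face LLaMA modules before the model is instantiated.
print("Apply liger_kernel_to_llama")
apply_liger_kernel_to_llama()
model = LlavaLlamaForCausalLM.from_pretrained(
    "meta-llama/Meta-Llama-3-8B",
    attn_implementation="flash_attention_2",
    torch_dtype=torch.bfloat16,
)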
Versions
transformers = 4.45.1
torch = 2.4.0
GPU = A100
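Since the loss curves match but the benchmark scores differ, one way to narrow this down might be to run the same batch through a patched and an unpatched model in eval mode and compare the logits. The helper below is only a sketch of that comparison step: `compare_outputs` is a hypothetical name, the tolerances are assumptions (bf16 runs will not match exactly), and it takes plain Python floats so that it stays independent of any particular framework.

```python
import math
from typing import Dict, Sequence


def compare_outputs(
    reference: Sequence[float],
    candidate: Sequence[float],
    atol: float = 1e-2,  # assumed absolute tolerance, loose enough for bf16
    rtol: float = 1e-2,  # assumed relative tolerance
) -> Dict[str, float]:
    """Compare two flattened outputs element-wise (e.g. logits for one batch)."""
    if len(reference) != len(candidate):
        raise ValueError("outputs must have the same number of elements")

    max_abs_diff = 0.0
    mismatches = 0
    for ref, cand in zip(reference, candidate):
        diff = abs(ref - cand)
        if math.isnan(diff):
            # A NaN on either side always counts as a mismatch.
            mismatches += 1
            continue
        max_abs_diff = max(max_abs_diff, diff)
        if diff > atol + rtol * abs(ref):
            mismatches += 1

    return {
        "max_abs_diff": max_abs_diff,
        "mismatches": mismatches,
        "total": len(reference),
    }


# Hypothetical usage with PyTorch tensors from the two models:
# report = compare_outputs(
#     ref_logits.float().flatten().tolist(),
#     liger_logits.float().flatten().tolist(),
# )
```

A large number of mismatches on identical inputs would point at the patched forward path rather than at the training data or the evaluation setup; a clean comparison would suggest looking elsewhere.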