Releases: ggerganov/llama.cpp
b4458
b4457
llama: add support for QRWKV6 model architecture (#11001)
* WIP: Add support for RWKV6Qwen2
* RWKV: Some graph simplification
* Add support for RWKV6Qwen2 with CPU and CUDA GLA
* RWKV6[QWEN2]: Concat lerp weights together to reduce CPU overhead
* Fix some typos
* Code format changes
* Fix wkv test & add gla test
* Fix CUDA warning
* Update README.md
* Update ggml/src/ggml-cuda/gla.cu
* Fix fused lerp weights loading with RWKV6
* Better sanity check skipping for QRWKV6 in llama-quant (thanks @compilade)

Signed-off-by: Molly Sophia <[email protected]>
Co-authored-by: Georgi Gerganov <[email protected]>
Co-authored-by: compilade <[email protected]>
b4456
SYCL: Refactor ggml_sycl_compute_forward (#11121)
* SYCL: refactor ggml_sycl_compute_forward
* SYCL: add back GGML_USED(dst) to ggml_sycl_cpy
* SYCL: add function name to noop debug
* SYCL: some device info print refactoring and add details of XMX availability
b4453
model: Add support for PhiMoE arch (#11003)
* model: support phimoe
* python linter
* doc: minor
* doc: add phimoe as supported model

Co-authored-by: ThiloteE <[email protected]>
b4451
llama-chat : add phi 4 template (#11148)
b4450
fix: add missing msg in static_assert (#11143) Signed-off-by: hydai <[email protected]>
b4447
ci : use actions from ggml-org (#11140)
b4446
lora : improve compat with `mergekit-extract-lora` (#11131) * (wip) support mergekit-extracted lora * support mergekit-extract-lora * use lora->get_scale * correct comment * correct norm name & condition * add some hints
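Release b4447 improves loading of adapters produced by `mergekit-extract-lora`. A rough usage sketch, assuming the upstream conversion script and `--lora` flag; all paths and the adapter/model names below are placeholders:

```shell
# Hypothetical paths/names; convert a mergekit-extracted adapter to GGUF,
# then load it at inference time alongside the base model.
python convert_lora_to_gguf.py ./extracted-lora \
    --base ./base-model-dir --outfile lora.gguf
./llama-cli -m base-model.gguf --lora lora.gguf -p "Hello"
```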
b4445
llama : avoid hardcoded QK_K (#11061)
b4443
ggml : allow loading backend with env variable (ggml/1059) ref: #1058
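Release b4443 lets a ggml backend shared library be selected through an environment variable instead of being linked in. A rough sketch of the idea; the variable name and library path below are assumptions, so check the `ggml-backend` sources in your checkout for the exact spelling:

```shell
# Hypothetical: variable name and paths are assumptions, not confirmed
# by this changelog entry.
export GGML_BACKEND_PATH=/opt/ggml/libggml-cuda.so   # point the loader at a backend build
./llama-cli -m model.gguf -p "Hello"                 # backend is picked up at startup
```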