Releases: ggerganov/llama.cpp

b4458

10 Jan 06:22
c3f9d25
Vulkan: Fix float16 use on devices without float16 support + fix subg…

b4457

10 Jan 02:48
ee7136c
llama: add support for QRWKV6 model architecture (#11001)

* WIP: Add support for RWKV6Qwen2

Signed-off-by: Molly Sophia <[email protected]>

* RWKV: Some graph simplification

Signed-off-by: Molly Sophia <[email protected]>

* Add support for RWKV6Qwen2 with cpu and cuda GLA

Signed-off-by: Molly Sophia <[email protected]>

* RWKV6[QWEN2]: Concat lerp weights together to reduce cpu overhead

Signed-off-by: Molly Sophia <[email protected]>

* Fix some typos

Signed-off-by: Molly Sophia <[email protected]>

* code format changes

Signed-off-by: Molly Sophia <[email protected]>

* Fix wkv test & add gla test

Signed-off-by: Molly Sophia <[email protected]>

* Fix cuda warning

Signed-off-by: Molly Sophia <[email protected]>

* Update README.md

Signed-off-by: Molly Sophia <[email protected]>

* Update ggml/src/ggml-cuda/gla.cu

Co-authored-by: Georgi Gerganov <[email protected]>

* Fix fused lerp weights loading with RWKV6

Signed-off-by: Molly Sophia <[email protected]>

* better sanity check skipping for QRWKV6 in llama-quant

thanks @compilade

Signed-off-by: Molly Sophia <[email protected]>
Co-authored-by: compilade <[email protected]>

---------

Signed-off-by: Molly Sophia <[email protected]>
Co-authored-by: Georgi Gerganov <[email protected]>
Co-authored-by: compilade <[email protected]>

b4456

10 Jan 00:52
c6860cc
SYCL: Refactor ggml_sycl_compute_forward (#11121)

* SYCL: refactor ggml_sycl_compute_forward

* SYCL: add back GGML_USED(dst) to ggml_sycl_cpy

* SYCL: add function name to noop debug

* SYCL: Some device info print refactoring and add details of XMX availability

b4453

09 Jan 10:57
f8feb4b
model: Add support for PhiMoE arch (#11003)

* model: support phimoe

* python linter

* doc: minor

Co-authored-by: ThiloteE <[email protected]>

* doc: minor

Co-authored-by: ThiloteE <[email protected]>

* doc: add phimoe as supported model

ggml-ci

---------

Co-authored-by: ThiloteE <[email protected]>

b4451

09 Jan 09:51
d9feae1
llama-chat : add phi 4 template (#11148)

b4450

08 Jan 20:47
8d59d91
fix: add missing msg in static_assert (#11143)

Signed-off-by: hydai <[email protected]>

b4447

08 Jan 16:11
f7cd133
ci : use actions from ggml-org (#11140)

b4446

08 Jan 15:49
4d2b3d8
lora : improve compat with `mergekit-extract-lora` (#11131)

* (wip) support mergekit-extracted lora

* support mergekit-extract-lora

* use lora->get_scale

* correct comment

* correct norm name & condition

* add some hints

b4445

08 Jan 15:42
c07d437
llama : avoid hardcoded QK_K (#11061)

ggml-ci

b4443

08 Jan 12:24
c792dcf
ggml : allow loading backend with env variable (ggml/1059)

ref: #1058