Serve multiple models with llamacpp server #10431
- Or just start multiple instances lol
- I start each model in a separate server instance, using a different port for each model. Then I make an API call to whichever running server holds the model the array of messages is intended for. I can also use the very nice web UI that @ngxson made on any running server!
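A minimal sketch of that setup, assuming the stock `llama-server` binary with its `-m`/`--port` flags, its `/health` endpoint, and its OpenAI-compatible `/v1/chat/completions` endpoint; the model aliases, GGUF paths, and port numbers below are made-up examples, not anything from this thread:

```python
# Sketch: one llama-server process per model, each on its own port, plus a
# small helper that sends an OpenAI-style chat request to the right one.
# Model aliases, file paths, and ports are placeholder examples.
import json
import subprocess
import time
import urllib.error
import urllib.request

SERVERS = {
    # alias -> (GGUF file, port) -- adjust to your own models
    "llama-3-8b": ("models/llama-3-8b-instruct.Q4_K_M.gguf", 8080),
    "qwen-2.5-7b": ("models/qwen2.5-7b-instruct.Q4_K_M.gguf", 8081),
}

def start_servers():
    """Launch one llama-server instance per model."""
    return [
        subprocess.Popen(["llama-server", "-m", path, "--port", str(port)])
        for path, port in SERVERS.values()
    ]

def wait_ready(port, timeout=300):
    """Poll the server's /health endpoint until the model has finished loading."""
    deadline = time.time() + timeout
    while time.time() < deadline:
        try:
            with urllib.request.urlopen(f"http://127.0.0.1:{port}/health") as r:
                if r.status == 200:
                    return
        except urllib.error.URLError:
            pass
        time.sleep(1)
    raise TimeoutError(f"server on port {port} never became ready")

def chat(alias, messages):
    """Send an OpenAI-compatible chat request to the server holding `alias`."""
    _, port = SERVERS[alias]
    req = urllib.request.Request(
        f"http://127.0.0.1:{port}/v1/chat/completions",
        data=json.dumps({"messages": messages}).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["choices"][0]["message"]["content"]

if __name__ == "__main__":
    procs = start_servers()
    try:
        for _, port in SERVERS.values():
            wait_ready(port)
        print(chat("llama-3-8b", [{"role": "user", "content": "Hello!"}]))
    finally:
        for p in procs:
            p.terminate()
```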
- I saw this project on r/LocalLLaMA as a workaround: https://github.com/mostlygeek/llama-swap
- I would say either add a proxy server in front of it, or just use a project like llama-swap; then each server keeps its own config options. The proxy would only need some simple routing, some simple per-server queuing, and options such as starting a new model as a second server (if you have the memory available), and then you can just reuse all the existing server options.
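To make the routing idea concrete, here is a rough sketch of such a proxy (this is not llama-swap itself): it reads the `"model"` field of an OpenAI-style request and forwards the body to the llama-server instance registered for that model. The alias-to-port map and the listening port are assumptions, and a real version would also need the per-server queuing and on-demand startup mentioned above:

```python
# Minimal routing-proxy sketch: map the "model" field of an OpenAI-style
# request to a backend llama-server port and forward the request there.
# Ports and aliases are placeholders; queuing and error handling omitted.
import json
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

BACKENDS = {              # model alias -> backend llama-server port (assumed)
    "llama-3-8b": 8080,
    "qwen-2.5-7b": 8081,
}

class RoutingProxy(BaseHTTPRequestHandler):
    def do_POST(self):
        body = self.rfile.read(int(self.headers.get("Content-Length", 0)))
        model = json.loads(body).get("model", "")
        port = BACKENDS.get(model)
        if port is None:
            self.send_error(404, f"unknown model: {model}")
            return
        # Forward the request body unchanged to the chosen backend.
        req = urllib.request.Request(
            f"http://127.0.0.1:{port}{self.path}",
            data=body, headers={"Content-Type": "application/json"})
        with urllib.request.urlopen(req) as resp:
            payload = resp.read()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(payload)))
        self.end_headers()
        self.wfile.write(payload)

if __name__ == "__main__":
    # Clients point at the proxy's port and select a backend via "model".
    HTTPServer(("127.0.0.1", 9000), RoutingProxy).serve_forever()
```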
- I haven't tried it, but there is also this: https://github.com/perk11/large-model-proxy. It looks like you can even run multiple backends and have them unload automatically on a least-recently-used basis, etc.
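For illustration only (this is not how large-model-proxy is implemented internally), the least-recently-used idea could look roughly like this: keep a bounded set of llama-server processes, start one on demand, and terminate the instance that was used longest ago when the budget is exceeded. Aliases, paths, and ports are placeholders:

```python
# LRU sketch: at most MAX_LOADED llama-server processes stay running;
# requesting a model not currently loaded evicts the least recently used one.
import subprocess
from collections import OrderedDict

MAX_LOADED = 2
MODELS = {  # alias -> (GGUF file, port); hypothetical examples
    "llama-3-8b": ("models/llama-3-8b-instruct.Q4_K_M.gguf", 8080),
    "qwen-2.5-7b": ("models/qwen2.5-7b-instruct.Q4_K_M.gguf", 8081),
    "mistral-7b": ("models/mistral-7b-instruct.Q4_K_M.gguf", 8082),
}

loaded: "OrderedDict[str, subprocess.Popen]" = OrderedDict()

def acquire(alias: str) -> int:
    """Ensure a server for `alias` is running and return its port."""
    model_path, port = MODELS[alias]
    if alias in loaded:
        loaded.move_to_end(alias)                    # mark as most recently used
        return port
    if len(loaded) >= MAX_LOADED:
        _lru_alias, proc = loaded.popitem(last=False)  # evict the LRU instance
        proc.terminate()
        proc.wait()
    loaded[alias] = subprocess.Popen(
        ["llama-server", "-m", model_path, "--port", str(port)])
    # In practice you would also wait for the new server to finish loading here.
    return port
```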
- Still seems like a nice feature, if the will to implement it is there.
- Is there a way to serve more than one model at the same time? (I don't think so.) Are there any plans to add this feature?