roadmap: llamacpp-engine to align with llama.cpp upstream #1728

Open
dan-menlo opened this issue Nov 26, 2024 · 2 comments

@dan-menlo

dan-menlo commented Nov 26, 2024

Goal

  • Goal: Can we have a minimalist fork of llama.cpp as llamacpp-engine?
    • cortex.cpp's desktop focus means Drogon's features are unused
    • We should contribute our vision and multimodal work upstream as a form of llama.cpp server
    • Very clear Engines abstraction (e.g. to support OpenVINO etc. in the future)
  • Goal: Contribute upwards to llama.cpp
    • Vision, multimodal
    • May not be possible if the vision and audio encoders are Python-runtime based

Can we consider refactoring llamacpp-engine to use the server implementation, and maintain a fork with our improvements to speech, vision, etc.? This is especially relevant if we do a C++ implementation of whisperVQ in the future.
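
To make the "Engines abstraction" goal concrete, here is a minimal sketch of what such an interface could look like. Every name in it (`Engine`, `EchoEngine`, `EngineRegistry`, and the method names) is hypothetical and not taken from cortex.cpp or llama.cpp; `EchoEngine` is only a stand-in backend to show the shape.

```cpp
#include <map>
#include <memory>
#include <stdexcept>
#include <string>
#include <utility>

// Hypothetical interface: these names are illustrative only and do not
// come from cortex.cpp or llama.cpp.
class Engine {
 public:
  virtual ~Engine() = default;
  virtual std::string Name() const = 0;
  virtual bool LoadModel(const std::string& model_path) = 0;
  virtual std::string Complete(const std::string& prompt) = 0;
};

// Stand-in backend that echoes the prompt. A llama.cpp- or OpenVINO-backed
// engine would implement the same three methods.
class EchoEngine : public Engine {
 public:
  std::string Name() const override { return "echo"; }
  bool LoadModel(const std::string& model_path) override {
    loaded_ = !model_path.empty();
    return loaded_;
  }
  std::string Complete(const std::string& prompt) override {
    if (!loaded_) throw std::runtime_error("model not loaded");
    return prompt;
  }

 private:
  bool loaded_ = false;
};

// Maps an engine name to its implementation, so callers never depend on a
// concrete backend type.
class EngineRegistry {
 public:
  void Register(std::unique_ptr<Engine> engine) {
    std::string name = engine->Name();
    engines_[name] = std::move(engine);
  }
  // Returns nullptr when no engine is registered under that name.
  Engine* Find(const std::string& name) const {
    auto it = engines_.find(name);
    return it == engines_.end() ? nullptr : it->second.get();
  }

 private:
  std::map<std::string, std::unique_ptr<Engine>> engines_;
};
```

With this shape, adding another backend later would mean registering one more `Engine` implementation rather than changing callers.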

@dan-menlo dan-menlo added the type: epic A major feature or initiative label Nov 26, 2024
@github-project-automation github-project-automation bot moved this to Investigating in Jan & Cortex Nov 26, 2024
@vansangpfiev

I agree that we should align with the llama.cpp upstream, but I have several concerns:

  • Drogon is part of cortex.cpp; we have already removed it from the llama-cpp engine. If we remove Drogon from cortex.cpp, we need to find a replacement, which will be costly.
  • Repository Structure: Forking the server implementation will necessitate changes to our repository structure, since we currently use llama.cpp as a submodule.
  • Our current version differs significantly from the upstream version, which will require considerable time for refactoring.

@gabrielle-ong gabrielle-ong added this to the v1.0.5 milestone Nov 28, 2024
@gabrielle-ong gabrielle-ong removed this from the v1.0.5 milestone Nov 28, 2024
@github-project-automation github-project-automation bot moved this from Investigating to QA in Jan & Cortex Dec 15, 2024
@dan-menlo dan-menlo reopened this Dec 15, 2024
@github-project-automation github-project-automation bot moved this from QA to In Progress in Jan & Cortex Dec 15, 2024
@dan-menlo dan-menlo changed the title from "epic: llamacpp-engine to align with llama.cpp upstream" to "roadmap: llamacpp-engine to align with llama.cpp upstream" Dec 15, 2024
@vansangpfiev

vansangpfiev commented Dec 23, 2024

Tasklist:

  • Fork llama.cpp and try to use llama.cpp server
  • Split the vision model flow from the chat model flow
  • llama.cpp server should be a new process
  • Test with the OpenAI API-compatible features that Alex and James added
  • CI
  • Docs

Related tickets need to be tested and verified:

Approach 1: cortex.llamacpp spawns llama.cpp server as a new process

  • pros: can directly use the llama.cpp server binary
  • cons: spawning a new process is expensive, and the process lifetime is hard to control
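
As a rough illustration of the lifetime concern in Approach 1, the sketch below wraps a child process in an RAII object so it is terminated and reaped when the owner goes away. It is POSIX-only, and `ChildProcess` is a hypothetical helper, not existing cortex.llamacpp code.

```cpp
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical helper (POSIX only): owns one child process and ties its
// lifetime to this object.
class ChildProcess {
 public:
  explicit ChildProcess(const std::vector<std::string>& argv) {
    if (argv.empty()) throw std::invalid_argument("argv must not be empty");
    // Build the null-terminated argument array before forking.
    std::vector<char*> c_argv;
    for (const auto& arg : argv) {
      c_argv.push_back(const_cast<char*>(arg.c_str()));
    }
    c_argv.push_back(nullptr);

    pid_ = fork();
    if (pid_ < 0) throw std::runtime_error("fork failed");
    if (pid_ == 0) {
      execvp(c_argv[0], c_argv.data());
      _exit(127);  // Only reached when exec fails.
    }
  }
  ChildProcess(const ChildProcess&) = delete;
  ChildProcess& operator=(const ChildProcess&) = delete;

  // Terminates and reaps the child if it is still owned.
  ~ChildProcess() { Stop(); }

  // Non-blocking liveness check; reaps the child if it has already exited.
  bool IsRunning() {
    if (pid_ <= 0) return false;
    int status = 0;
    pid_t result = waitpid(pid_, &status, WNOHANG);
    if (result == 0) return true;
    pid_ = -1;  // Child exited (and was reaped) or waitpid failed.
    return false;
  }

  // Sends SIGTERM and blocks until the child exits. A production version
  // would likely escalate to SIGKILL after a timeout.
  void Stop() {
    if (pid_ <= 0) return;
    kill(pid_, SIGTERM);
    int status = 0;
    waitpid(pid_, &status, 0);
    pid_ = -1;
  }

 private:
  pid_t pid_ = -1;
};
```

In this approach, `argv` would carry the path to the llama.cpp server binary and its arguments. The sketch does not cover the other half of the cost mentioned above: waiting for the server to become ready and talking to it over HTTP.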


Approach 2: Build llama.cpp server as a library and load it into cortex.llamacpp process

  • pros: llama.cpp server can be embedded into cortex.llamacpp
  • cons: will need to apply a patch to build llama.cpp server as a library
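
For comparison, loading a server built as a shared library into the current process (Approach 2) could be sketched as follows. This is POSIX-only and uses `dlopen`/`dlsym`; `SharedLibrary` is a hypothetical wrapper, and the library path and exported entry point would have to come from the patch mentioned above, so none are assumed here.

```cpp
#include <dlfcn.h>

#include <stdexcept>
#include <string>

// Hypothetical wrapper (POSIX only): loads a shared library for the
// lifetime of this object and resolves exported symbols from it.
class SharedLibrary {
 public:
  explicit SharedLibrary(const std::string& path)
      : handle_(dlopen(path.c_str(), RTLD_NOW | RTLD_LOCAL)) {
    if (!handle_) {
      const char* err = dlerror();
      throw std::runtime_error(err ? err : "dlopen failed");
    }
  }
  SharedLibrary(const SharedLibrary&) = delete;
  SharedLibrary& operator=(const SharedLibrary&) = delete;
  ~SharedLibrary() { dlclose(handle_); }

  // Returns nullptr when the symbol is not exported by the library.
  // Fn must be a function pointer type matching the real signature.
  template <typename Fn>
  Fn Resolve(const std::string& symbol) const {
    return reinterpret_cast<Fn>(dlsym(handle_, symbol.c_str()));
  }

 private:
  void* handle_;
};
```

The trade-off against Approach 1 is visible here: there is no child process to supervise, but a crash inside the loaded library takes the host process down with it, and the exported symbols must be kept stable across upstream updates.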


@vansangpfiev vansangpfiev moved this from In Progress to Eng Review in Jan & Cortex Jan 2, 2025
@dan-menlo dan-menlo moved this from Eng Review to QA in Jan & Cortex Jan 10, 2025