Replies: 3 comments 1 reply
-
Llama was designed for contexts of at most 2048 tokens; this is a fundamental design decision by Meta and we can't do anything about it.
-
You can see it working by setting a context length of 512 (-c 512) and n_keep to the length of the prompt. It will forget half of what you talked about, but the sliding context works as expected, so it will remember the most recent parts of the conversation. You can also see this implemented in main.cpp (I don't remember the exact line number) :)
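In case it helps, here is a minimal sketch of an invocation along those lines; the model path, prompt file, and the 64-token keep value are illustrative assumptions, not values from the comment above:

```sh
# Small context (-c 512) so the sliding window kicks in quickly.
# --keep preserves the first N tokens of the initial prompt when the context
# is shifted; set it to (roughly) the token length of your prompt.
# Model and prompt paths are placeholders.
./main -m ./models/13B/ggml-model-q4_0.bin \
    -c 512 \
    --keep 64 \
    -n 256 \
    -i \
    -f prompts/chat-with-bob.txt \
    -r "User:"
```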
-
I'm getting good results with the following custom startup script for 13B models, specifically with gpt4-x-alpaca.
With this script you have the option to add some tuning commands to the initial prompt. It usually behaves well if you don't overcomplicate the prompt too much.
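The script itself isn't reproduced in this thread, but a rough sketch of what such a startup script could look like is below; the model path, sampling parameters, and the instruction-style preamble are placeholder assumptions, not the poster's exact settings:

```sh
#!/bin/bash
# Hypothetical launch script for a 13B gpt4-x-alpaca model (paths and values
# are illustrative). The -p prompt carries the "tuning commands" as a preamble.
./main -m ./models/gpt4-x-alpaca-13b/ggml-model-q4_0.bin \
    -c 2048 \
    --keep 128 \
    -n 256 \
    -i \
    --color \
    --temp 0.7 \
    --repeat_penalty 1.1 \
    -r "### Human:" \
    -p "### Instruction: You are a concise, helpful assistant. Answer factually and avoid rambling.
### Human: Hello!
### Assistant:"
```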
-
I've been playing with all the llama/alpaca models in interactive mode as a chatbot, and after around 800 words the output freezes for ~10 minutes before continuing. However, when it does continue, it's as if the starting prompt and everything before it never existed. The model will even start hallucinating input from my end of the conversation, ignoring the reverse prompt.
I've played with every parameter and setting but cannot seem to fix this behavior. I assumed llama.cpp used a rolling window with the option to keep the first N tokens. Maybe it's just a current limitation but I've had a hard time finding others with a similar issue. (or maybe I'm not looking for the right keywords)