Perplexity and memory use comparisons would be useful #9

Open
JohannesGaessler opened this issue Aug 17, 2023 · 4 comments

@JohannesGaessler

Currently the README does not necessarily provide a like-for-like comparison because 4-bit quantizations can be of different quality depending on the implementation details. For example, in llama.cpp q4_0 is faster than q4_K_M, but the quantization format is less efficient in terms of size. So it would be useful to include measurements of memory usage as well as a measure of output quality (e.g. perplexity on a large corpus of text) to put the speed numbers into context.
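
For reference, a minimal sketch of the two proposed measurements (not tied to any particular runtime; where the corpus tokenization and per-token log-likelihoods come from is assumed to be handled by whatever evaluation harness is used):

```python
import math
import os

def perplexity(token_log_likelihoods):
    """Perplexity = exp of the mean negative log-likelihood over the corpus tokens."""
    n = len(token_log_likelihoods)
    return math.exp(-sum(token_log_likelihoods) / n)

def model_size_gib(path):
    # Size of the quantized weights on disk; runtime memory additionally
    # includes the KV cache and activations.
    return os.path.getsize(path) / 2**30
```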

@junrushao
Member

Will do

@shwu-nyunai

+1 on perplexity. Any timeline on this?
Thanks.

@JohannesGaessler
Author

I don't know about the timeline, but by now llama.cpp has support for calculating the KL divergence relative to FP16, see ggerganov/llama.cpp#5076. This would be a better metric for comparison than perplexity.
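
For anyone wanting to reproduce this outside llama.cpp: the idea is to save the FP16 model's logits for a fixed evaluation text and then compare the quantized model's token distributions against them. A rough sketch of the underlying computation (assuming both runs dump logits of shape `[num_tokens, vocab_size]` to hypothetical `.npy` files; the exact llama.cpp flags are in the linked PR):

```python
import numpy as np
from scipy.special import log_softmax

def mean_kl_divergence(fp16_logits, quant_logits):
    """Mean KL(p_fp16 || p_quant) over all token positions."""
    logp = log_softmax(fp16_logits, axis=-1)   # reference (FP16) distribution
    logq = log_softmax(quant_logits, axis=-1)  # quantized-model distribution
    kl = np.sum(np.exp(logp) * (logp - logq), axis=-1)
    return float(kl.mean())

fp16 = np.load("logits_fp16.npy")   # hypothetical dump from the FP16 run
quant = np.load("logits_q4.npy")    # hypothetical dump from the quantized run
print("mean KL divergence:", mean_kl_divergence(fp16, quant))
```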

@shwu-nyunai

Sure. I am using the perplexity scores for a paper, hence I need PPL values.
Also, how should I go about actually calculating the scores? I doubt I'll be able to run the llama.cpp script directly to get the scores on MLC models. I have been trying to find a way to change mlc_chat, but no progress so far.

If you have the scripts that you used on MLC-quantised models, they would be of great help.

I'm trying to capture the generated logits for a prompt input, but no luck so far.
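
In case it helps once the logits can be captured: computing perplexity only needs the per-position logits and the corresponding target token ids. A generic sketch (the hypothetical `.npy` dumps stand in for whatever export mechanism ends up working with mlc_chat):

```python
import numpy as np
from scipy.special import log_softmax

def perplexity_from_logits(logits, target_ids):
    """logits: [num_tokens, vocab_size]; target_ids: [num_tokens] next-token ids."""
    logp = log_softmax(logits, axis=-1)
    # Per-token negative log-likelihood of the true next token.
    nll = -logp[np.arange(len(target_ids)), target_ids]
    return float(np.exp(nll.mean()))

logits = np.load("mlc_logits.npy")    # hypothetical per-position logits dump
targets = np.load("target_ids.npy")   # hypothetical ground-truth next-token ids
print("perplexity:", perplexity_from_logits(logits, targets))
```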
