
List of Large Language Models and APIs

toncho11 edited this page Apr 27, 2023 · 52 revisions

Models

  • cohere API - trial / paid
    • it provides straightforward context stuffing / few-shot learning
    • no model download
  • OpenAI provides a Python API for GPT-3 - trial / paid
  • GODEL - downloadable and usable with Hugging Face
  • LLaMA - a 65-billion-parameter model from Meta AI
  • LLaMA Alpaca - the LLaMA model fine-tuned on instruction data, resulting in different weights
    • Good video
    • The Stanford Alpaca team managed to come up with a state-of-the-art instruct model by fine-tuning LLaMA on a fairly small dataset (that was actually produced programmatically by GPT-3)
      • Stanford Alpaca - code and documentation to train Stanford's Alpaca models, and generate the data
    • C++ code
    • Alpaca-LoRA - a 7B-parameter LLaMA model fine-tuned to follow instructions. It is trained on the Stanford Alpaca dataset and makes use of the Hugging Face LLaMA implementation.
  • GPT-Neo 2.7B - a transformer model designed using EleutherAI's replication of the GPT-3 architecture. The model is available on HuggingFace. Although it can be used for different tasks, the model is best at what it was pretrained for, which is generating texts from a prompt.
  • GPT-J - a good example of a very capable model that only works correctly with few-shot learning. GPT-J 6B was trained on the Pile, a large-scale curated dataset created by EleutherAI.
  • Dolly - Databricks’ large language model, created by fine-tuning GPT-J on a focused corpus of 50k records (the Stanford Alpaca dataset); it shows high-quality instruction-following behavior not characteristic of the foundation model on which it is based.
  • GPT4ALL
    • paper
    • code - Demo, data and code to train an assistant-style large language model with ~800k GPT-3.5-Turbo generations based on LLaMA
  • Vicuna-13B - an open-source chatbot trained by fine-tuning LLaMA on user-shared conversations collected from ShareGPT
    • description
    • code github
    • requires you to first obtain the original LLaMA weights from Meta and then apply a delta with the weights provided by Vicuna. A script is provided that applies the delta and converts the weights to the latest Hugging Face format.
    • The current version is 13B; a 7B version is not available. It requires either 60 GB of RAM or 28 GB of GPU memory.
  • Baize
    • It uses ChatGPT to engage in a conversation with itself in order to automatically generate a high-quality multi-turn chat corpus, then fine-tunes Meta’s open-source large language model, LLaMA, on it, resulting in a new model called Baize. This open-source chat model has exceptional performance and can run on a single GPU.
    • paper, github, demo
  • Koala - a chatbot trained by fine-tuning Meta’s LLaMA on dialogue data gathered from the web
    • 13B
    • again based on LLaMA
    • fine-tuned on dialogue
    • trained with EasyLM
  • ChatGLM
    • 6B
  • Hugging Chat
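Several entries above (cohere, GPT-J) work best with context stuffing / few-shot learning rather than fine-tuning. A minimal sketch of how such a prompt is assembled; the sentiment examples and the `build_few_shot_prompt` helper are hypothetical, and in practice the resulting string would be sent to the provider's completion API:

```python
# Sketch of few-shot prompting ("context stuffing"): labelled examples are
# stuffed into the context, then the new query is appended so the model
# completes the missing label. All example data here is illustrative.

def build_few_shot_prompt(examples, query):
    """Format (text, label) pairs followed by the unlabelled query."""
    lines = []
    for text, label in examples:
        lines.append(f"Review: {text}\nSentiment: {label}\n")
    lines.append(f"Review: {query}\nSentiment:")
    return "\n".join(lines)

examples = [
    ("The food was excellent.", "positive"),
    ("Terrible service, never again.", "negative"),
]
prompt = build_few_shot_prompt(examples, "A pleasant surprise overall.")
print(prompt)
```

The prompt ends mid-pattern ("Sentiment:"), which nudges the model to continue with just the label.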
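Alpaca-LoRA above fine-tunes LLaMA with LoRA (low-rank adaptation). A sketch of the core idea with illustrative shapes, not the actual Alpaca-LoRA code: the pretrained weight W stays frozen, and only a low-rank product B @ A is trained, so the number of trainable parameters drops from d*k to r*(d+k):

```python
import numpy as np

# LoRA sketch: effective weight = W + B @ A, where A (r x k) and B (d x r)
# are small trainable matrices and r << d, k. Shapes are illustrative.
rng = np.random.default_rng(0)
d, k, r = 8, 8, 2

W = rng.standard_normal((d, k))        # frozen pretrained weight
A = rng.standard_normal((r, k)) * 0.01 # small random init
B = np.zeros((d, r))                   # zero init: the update starts as a no-op

W_eff = W + B @ A                      # weight used in the forward pass
print(np.allclose(W_eff, W))           # True: zero-initialised B means no change yet
```

Zero-initialising B is the standard trick: fine-tuning starts exactly at the pretrained model and moves away only as B is trained.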
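The Vicuna entry notes that its weights are released as a delta against the original LLaMA weights. A toy sketch of what "applying a delta" means; plain lists stand in for real tensors, and the parameter names are illustrative:

```python
# Sketch of delta-weight application: the published delta checkpoint holds
# (fine-tuned - base) for each parameter, so the fine-tuned weights are
# recovered by element-wise addition. Values here are made up.

def apply_delta(base_weights, delta_weights):
    merged = {}
    for name, base in base_weights.items():
        delta = delta_weights[name]
        merged[name] = [b + d for b, d in zip(base, delta)]
    return merged

base = {"layer0.weight": [0.1, -0.2, 0.3]}
delta = {"layer0.weight": [0.05, 0.05, -0.1]}
merged = apply_delta(base, delta)
print(merged)
```

Releasing only the delta lets the fine-tuned model be distributed without redistributing the original LLaMA weights themselves.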

Tools

Timeline of Open and Proprietary Large Language Models

https://chat.lmsys.org/