[Partner] Gemini Embeddings (#14690)
Add support for Gemini embeddings in the langchain-google-genai package
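For a quick sense of the surface this commit adds, here is a minimal sketch mirroring the docs notebook and module docstring in the diff below:

```python
from langchain_google_genai import GoogleGenerativeAIEmbeddings

# The new embeddings class wraps Google's embedding-001 model.
embeddings = GoogleGenerativeAIEmbeddings(model="models/embedding-001")

# Embed a single query (defaults to the retrieval_query task type).
vector = embeddings.embed_query("hello, world!")

# Embed a batch of documents (defaults to the retrieval_document task type).
vectors = embeddings.embed_documents(["Today is Monday", "Today is Tuesday"])
```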
Showing 13 changed files with 606 additions and 55 deletions.
docs/docs/integrations/text_embedding/google_generative_ai.ipynb (220 additions, 0 deletions)
@@ -0,0 +1,220 @@
{
"cells": [
{
"cell_type": "markdown",
"id": "afab8b36-10bb-4795-bc98-75ab2d2081bb",
"metadata": {},
"source": [
"# Google Generative AI Embeddings\n",
"\n",
"Connect to Google's generative AI embeddings service using the `GoogleGenerativeAIEmbeddings` class, found in the [langchain-google-genai](https://pypi.org/project/langchain-google-genai/) package."
]
},
{
"cell_type": "markdown",
"id": "63545b38-9d56-4312-8f61-8d4f1e7a3b1b",
"metadata": {},
"source": [
"## Installation"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "d2f6a3cd-379f-4dff-a449-d3a9f3196f2a",
"metadata": {},
"outputs": [],
"source": [
"%pip install -U langchain-google-genai"
]
},
{
"cell_type": "markdown",
"id": "25f3f88e-164e-400d-b371-9fa488baba19",
"metadata": {},
"source": [
"## Credentials"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "ec89153f-8999-4aab-a21b-0bfba1cc3893",
"metadata": {},
"outputs": [],
"source": [
"import getpass\n",
"import os\n",
"\n",
"if \"GOOGLE_API_KEY\" not in os.environ:\n", | ||
" os.environ[\"GOOGLE_API_KEY\"] = getpass(\"Provide your Google API key here\")" | ||
]
},
{
"cell_type": "markdown",
"id": "f2437b22-e364-418a-8c13-490a026cb7b5",
"metadata": {},
"source": [
"## Usage"
]
},
{
"cell_type": "code",
"execution_count": 6,
"id": "eedc551e-a1f3-4fd8-8d65-4e0784c4441b",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"[0.05636945, 0.0048285457, -0.0762591, -0.023642512, 0.05329321]"
]
},
"execution_count": 6,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"from langchain_google_genai import GoogleGenerativeAIEmbeddings\n",
"\n",
"embeddings = GoogleGenerativeAIEmbeddings(model=\"models/embedding-001\")\n",
"vector = embeddings.embed_query(\"hello, world!\")\n",
"vector[:5]"
]
},
{
"cell_type": "markdown",
"id": "2b2bed60-e7bd-4e48-83d6-1c87001f98bd",
"metadata": {},
"source": [
"## Batch\n",
"\n",
"You can also embed multiple strings at once for a processing speedup:"
]
},
{
"cell_type": "code",
"execution_count": 7,
"id": "6ec53aba-404f-4778-acd9-5d6664e79ed2",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"(3, 768)"
]
},
"execution_count": 7,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"vectors = embeddings.embed_documents(\n",
"    [\n",
"        \"Today is Monday\",\n",
"        \"Today is Tuesday\",\n",
"        \"Today is April Fools day\",\n",
"    ]\n",
")\n",
"len(vectors), len(vectors[0])"
]
},
{
"cell_type": "markdown",
"id": "1482486f-5617-498a-8a44-1974d3212dda",
"metadata": {},
"source": [
"## Task type\n",
"`GoogleGenerativeAIEmbeddings` optionally support a `task_type`, which currently must be one of:\n", | ||
"\n", | ||
"- task_type_unspecified\n", | ||
"- retrieval_query\n", | ||
"- retrieval_document\n", | ||
"- semantic_similarity\n", | ||
"- classification\n", | ||
"- clustering\n", | ||
"\n", | ||
"By default, we use `retrieval_document` in the `embed_documents` method and `retrieval_query` in the `embed_query` method. If you provide a task type, we will use that for all methods." | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": 15, | ||
"id": "a223bb25-2b1b-418e-a570-2f543083132e", | ||
"metadata": {}, | ||
"outputs": [ | ||
{ | ||
"name": "stdout", | ||
"output_type": "stream", | ||
"text": [ | ||
"Note: you may need to restart the kernel to use updated packages.\n" | ||
] | ||
} | ||
], | ||
"source": [ | ||
"%pip install --quiet matplotlib scikit-learn" | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": 33, | ||
"id": "f1f077db-8eb4-49f7-8866-471a8528dcdb", | ||
"metadata": {}, | ||
"outputs": [], | ||
"source": [ | ||
"query_embeddings = GoogleGenerativeAIEmbeddings(\n", | ||
" model=\"models/embedding-001\", task_type=\"retrieval_query\"\n", | ||
")\n", | ||
"doc_embeddings = GoogleGenerativeAIEmbeddings(\n", | ||
" model=\"models/embedding-001\", task_type=\"retrieval_document\"\n", | ||
")" | ||
] | ||
}, | ||
{ | ||
"cell_type": "markdown", | ||
"id": "79bd4a5e-75ba-413c-befa-86167c938caf", | ||
"metadata": {}, | ||
"source": [ | ||
"All of these will be embedded with the 'retrieval_query' task set\n", | ||
"```python\n", | ||
"query_vecs = [query_embeddings.embed_query(q) for q in [query, query_2, answer_1]]\n", | ||
"```\n", | ||
"All of these will be embedded with the 'retrieval_document' task set\n", | ||
"```python\n", | ||
"doc_vecs = [doc_embeddings.embed_query(q) for q in [query, query_2, answer_1]]\n", | ||
"```" | ||
] | ||
}, | ||
{ | ||
"cell_type": "markdown", | ||
"id": "9e1fae5e-0f84-4812-89f5-7d4d71affbc1", | ||
"metadata": {}, | ||
"source": [ | ||
"In retrieval, relative distance matters. In the image above, you can see the difference in similarity scores between the \"relevant doc\" and \"simil stronger delta between the similar query and relevant doc on the latter case." | ||
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.9.1"
}
},
"nbformat": 4,
"nbformat_minor": 5
}
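The final markdown cell refers to a figure that is not included in this diff. As a rough sketch of the comparison it describes, the similarity scores can be computed directly with scikit-learn (installed in the notebook above); the example strings below are illustrative assumptions, since the notebook's own `query`, `query_2`, and `answer_1` cells are not shown here.

```python
from langchain_google_genai import GoogleGenerativeAIEmbeddings
from sklearn.metrics.pairwise import cosine_similarity

# Task-specific embedders, as constructed in the notebook above.
query_embeddings = GoogleGenerativeAIEmbeddings(
    model="models/embedding-001", task_type="retrieval_query"
)
doc_embeddings = GoogleGenerativeAIEmbeddings(
    model="models/embedding-001", task_type="retrieval_document"
)

# Illustrative strings (assumed; the notebook's own variables are not shown in this diff).
query = "What is the capital of France?"
relevant_doc = "Paris is the capital and largest city of France."
irrelevant_doc = "Photosynthesis converts sunlight into chemical energy."

q_vec = [query_embeddings.embed_query(query)]
d_vecs = doc_embeddings.embed_documents([relevant_doc, irrelevant_doc])

# A (1, 2) matrix of cosine similarities: the query vs. each document.
# The gap between the relevant and irrelevant document is the "delta"
# the markdown cell above describes.
print(cosine_similarity(q_vec, d_vecs))
```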
libs/partners/google-genai/langchain_google_genai/__init__.py (44 additions, 1 deletion)
@@ -1,3 +1,46 @@
"""**LangChain Google Generative AI Integration**

This module integrates Google's Generative AI models, specifically the Gemini series, with the LangChain framework. It provides classes for interacting with chat models and generating embeddings, leveraging Google's advanced AI capabilities.

**Chat Models**

The `ChatGoogleGenerativeAI` class is the primary interface for interacting with Google's Gemini chat models. It allows users to send and receive messages using a specified Gemini model, suitable for various conversational AI applications.

**Embeddings**

The `GoogleGenerativeAIEmbeddings` class provides functionalities to generate embeddings using Google's models.
These embeddings can be used for a range of NLP tasks, including semantic analysis, similarity comparisons, and more.

**Installation**

To install the package, use pip:

```bash
pip install -U langchain-google-genai
```

## Using Chat Models

After setting up your environment with the required API key, you can interact with the Google Gemini models.

```python
from langchain_google_genai import ChatGoogleGenerativeAI

llm = ChatGoogleGenerativeAI(model="gemini-pro")
llm.invoke("Sing a ballad of LangChain.")
```

## Embedding Generation

The package also supports creating embeddings with Google's models, useful for textual similarity and other NLP applications.

```python
from langchain_google_genai import GoogleGenerativeAIEmbeddings

embeddings = GoogleGenerativeAIEmbeddings(model="models/embedding-001")
embeddings.embed_query("hello, world!")
```
"""  # noqa: E501

from langchain_google_genai.chat_models import ChatGoogleGenerativeAI
from langchain_google_genai.embeddings import GoogleGenerativeAIEmbeddings

__all__ = ["ChatGoogleGenerativeAI"]
__all__ = ["ChatGoogleGenerativeAI", "GoogleGenerativeAIEmbeddings"]
@@ -0,0 +1,4 @@
class GoogleGenerativeAIError(Exception):
    """
    Custom exception class for errors associated with the `Google GenAI` API.
    """