[Partner] Gemini Embeddings (#14690)
Add support for Gemini embeddings in the langchain-google-genai package
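For a quick sense of the surface this commit adds, here is a minimal sketch mirroring the docs notebook and module docstring in the diff below:

```python
from langchain_google_genai import GoogleGenerativeAIEmbeddings

# The new embeddings class wraps Google's embedding-001 model.
embeddings = GoogleGenerativeAIEmbeddings(model="models/embedding-001")

# Embed a single query (defaults to the retrieval_query task type).
vector = embeddings.embed_query("hello, world!")

# Embed a batch of documents (defaults to the retrieval_document task type).
vectors = embeddings.embed_documents(["Today is Monday", "Today is Tuesday"])
```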
Showing 13 changed files with 606 additions and 55 deletions.
docs/docs/integrations/text_embedding/google_generative_ai.ipynb (220 additions, 0 deletions)
@@ -0,0 +1,220 @@
{
"cells": [
{
"cell_type": "markdown",
"id": "afab8b36-10bb-4795-bc98-75ab2d2081bb",
"metadata": {},
"source": [
"# Google Generative AI Embeddings\n",
"\n",
"Connect to Google's generative AI embeddings service using the `GoogleGenerativeAIEmbeddings` class, found in the [langchain-google-genai](https://pypi.org/project/langchain-google-genai/) package."
]
},
{
"cell_type": "markdown",
"id": "63545b38-9d56-4312-8f61-8d4f1e7a3b1b",
"metadata": {},
"source": [
"## Installation"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "d2f6a3cd-379f-4dff-a449-d3a9f3196f2a",
"metadata": {},
"outputs": [],
"source": [
"%pip install -U langchain-google-genai"
]
},
{
"cell_type": "markdown",
"id": "25f3f88e-164e-400d-b371-9fa488baba19",
"metadata": {},
"source": [
"## Credentials"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "ec89153f-8999-4aab-a21b-0bfba1cc3893",
"metadata": {},
"outputs": [],
"source": [
"import getpass\n",
"import os\n",
"\n",
"if \"GOOGLE_API_KEY\" not in os.environ:\n", | ||
" os.environ[\"GOOGLE_API_KEY\"] = getpass(\"Provide your Google API key here\")" | ||
]
},
{
"cell_type": "markdown",
"id": "f2437b22-e364-418a-8c13-490a026cb7b5",
"metadata": {},
"source": [
"## Usage"
]
},
{
"cell_type": "code",
"execution_count": 6,
"id": "eedc551e-a1f3-4fd8-8d65-4e0784c4441b",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"[0.05636945, 0.0048285457, -0.0762591, -0.023642512, 0.05329321]"
]
},
"execution_count": 6,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"from langchain_google_genai import GoogleGenerativeAIEmbeddings\n",
"\n",
"embeddings = GoogleGenerativeAIEmbeddings(model=\"models/embedding-001\")\n",
"vector = embeddings.embed_query(\"hello, world!\")\n",
"vector[:5]"
]
},
{
"cell_type": "markdown",
"id": "2b2bed60-e7bd-4e48-83d6-1c87001f98bd",
"metadata": {},
"source": [
"## Batch\n",
"\n",
"You can also embed multiple strings at once for a processing speedup:"
]
},
{
"cell_type": "code",
"execution_count": 7,
"id": "6ec53aba-404f-4778-acd9-5d6664e79ed2",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"(3, 768)"
]
},
"execution_count": 7,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"vectors = embeddings.embed_documents(\n",
"    [\n",
"        \"Today is Monday\",\n",
"        \"Today is Tuesday\",\n",
"        \"Today is April Fools day\",\n",
"    ]\n",
")\n",
"len(vectors), len(vectors[0])"
]
},
{
"cell_type": "markdown",
"id": "1482486f-5617-498a-8a44-1974d3212dda",
"metadata": {},
"source": [
"## Task type\n",
"`GoogleGenerativeAIEmbeddings` optionally support a `task_type`, which currently must be one of:\n", | ||
"\n", | ||
"- task_type_unspecified\n", | ||
"- retrieval_query\n", | ||
"- retrieval_document\n", | ||
"- semantic_similarity\n", | ||
"- classification\n", | ||
"- clustering\n", | ||
"\n", | ||
"By default, we use `retrieval_document` in the `embed_documents` method and `retrieval_query` in the `embed_query` method. If you provide a task type, we will use that for all methods." | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": 15, | ||
"id": "a223bb25-2b1b-418e-a570-2f543083132e", | ||
"metadata": {}, | ||
"outputs": [ | ||
{ | ||
"name": "stdout", | ||
"output_type": "stream", | ||
"text": [ | ||
"Note: you may need to restart the kernel to use updated packages.\n" | ||
] | ||
} | ||
], | ||
"source": [ | ||
"%pip install --quiet matplotlib scikit-learn" | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": 33, | ||
"id": "f1f077db-8eb4-49f7-8866-471a8528dcdb", | ||
"metadata": {}, | ||
"outputs": [], | ||
"source": [ | ||
"query_embeddings = GoogleGenerativeAIEmbeddings(\n", | ||
" model=\"models/embedding-001\", task_type=\"retrieval_query\"\n", | ||
")\n", | ||
"doc_embeddings = GoogleGenerativeAIEmbeddings(\n", | ||
" model=\"models/embedding-001\", task_type=\"retrieval_document\"\n", | ||
")" | ||
] | ||
}, | ||
{ | ||
"cell_type": "markdown", | ||
"id": "79bd4a5e-75ba-413c-befa-86167c938caf", | ||
"metadata": {}, | ||
"source": [ | ||
"All of these will be embedded with the 'retrieval_query' task set\n", | ||
"```python\n", | ||
"query_vecs = [query_embeddings.embed_query(q) for q in [query, query_2, answer_1]]\n", | ||
"```\n", | ||
"All of these will be embedded with the 'retrieval_document' task set\n", | ||
"```python\n", | ||
"doc_vecs = [doc_embeddings.embed_query(q) for q in [query, query_2, answer_1]]\n", | ||
"```" | ||
] | ||
}, | ||
{ | ||
"cell_type": "markdown", | ||
"id": "9e1fae5e-0f84-4812-89f5-7d4d71affbc1", | ||
"metadata": {}, | ||
"source": [ | ||
"In retrieval, relative distance matters. In the image above, you can see the difference in similarity scores between the \"relevant doc\" and \"simil stronger delta between the similar query and relevant doc on the latter case." | ||
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.9.1"
}
},
"nbformat": 4,
"nbformat_minor": 5
}
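The final markdown cell refers to a figure that is not included in this diff. As a rough sketch of the comparison it describes, the similarity scores can be computed directly with scikit-learn (installed in the notebook above); the example strings below are illustrative assumptions, since the notebook's own `query`, `query_2`, and `answer_1` cells are not shown here.

```python
from langchain_google_genai import GoogleGenerativeAIEmbeddings
from sklearn.metrics.pairwise import cosine_similarity

# Task-specific embedders, as constructed in the notebook above.
query_embeddings = GoogleGenerativeAIEmbeddings(
    model="models/embedding-001", task_type="retrieval_query"
)
doc_embeddings = GoogleGenerativeAIEmbeddings(
    model="models/embedding-001", task_type="retrieval_document"
)

# Illustrative strings (assumed; the notebook's own variables are not shown in this diff).
query = "What is the capital of France?"
relevant_doc = "Paris is the capital and largest city of France."
irrelevant_doc = "Photosynthesis converts sunlight into chemical energy."

q_vec = [query_embeddings.embed_query(query)]
d_vecs = doc_embeddings.embed_documents([relevant_doc, irrelevant_doc])

# A (1, 2) matrix of cosine similarities: the query vs. each document.
# The gap between the relevant and irrelevant document is the "delta"
# the markdown cell above describes.
print(cosine_similarity(q_vec, d_vecs))
```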
libs/partners/google-genai/langchain_google_genai/__init__.py (44 additions, 1 deletion)
@@ -1,3 +1,46 @@
"""**LangChain Google Generative AI Integration**

This module integrates Google's Generative AI models, specifically the Gemini series, with the LangChain framework. It provides classes for interacting with chat models and generating embeddings, leveraging Google's advanced AI capabilities.

**Chat Models**

The `ChatGoogleGenerativeAI` class is the primary interface for interacting with Google's Gemini chat models. It allows users to send and receive messages using a specified Gemini model, suitable for various conversational AI applications.

**Embeddings**

The `GoogleGenerativeAIEmbeddings` class provides functionalities to generate embeddings using Google's models.
These embeddings can be used for a range of NLP tasks, including semantic analysis, similarity comparisons, and more.

**Installation**

To install the package, use pip:

```bash
pip install -U langchain-google-genai
```

## Using Chat Models

After setting up your environment with the required API key, you can interact with the Google Gemini models.

```python
from langchain_google_genai import ChatGoogleGenerativeAI

llm = ChatGoogleGenerativeAI(model="gemini-pro")
llm.invoke("Sing a ballad of LangChain.")
```

## Embedding Generation

The package also supports creating embeddings with Google's models, useful for textual similarity and other NLP applications.

```python
from langchain_google_genai import GoogleGenerativeAIEmbeddings

embeddings = GoogleGenerativeAIEmbeddings(model="models/embedding-001")
embeddings.embed_query("hello, world!")
```
"""  # noqa: E501

from langchain_google_genai.chat_models import ChatGoogleGenerativeAI
from langchain_google_genai.embeddings import GoogleGenerativeAIEmbeddings

__all__ = ["ChatGoogleGenerativeAI"]
__all__ = ["ChatGoogleGenerativeAI", "GoogleGenerativeAIEmbeddings"]
@@ -0,0 +1,4 @@
class GoogleGenerativeAIError(Exception):
    """
    Custom exception class for errors associated with the `Google GenAI` API.
    """