ibm: Add support for Embedding Models (#20647)
--------- Co-authored-by: Erick Friis <[email protected]>
1 parent 7380981, commit 75ffe51
Showing 9 changed files with 804 additions and 246 deletions.
243 changes: 243 additions & 0 deletions
docs/docs/integrations/text_embedding/ibm_watsonx.ipynb
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,243 @@ | ||
{ | ||
"cells": [ | ||
{ | ||
"cell_type": "markdown", | ||
"metadata": {}, | ||
"source": [ | ||
"# IBM watsonx.ai\n", | ||
"\n", | ||
">WatsonxEmbeddings is a wrapper for IBM [watsonx.ai](https://www.ibm.com/products/watsonx-ai) foundation models.\n", | ||
"\n", | ||
"This example shows how to communicate with `watsonx.ai` models using `LangChain`." | ||
] | ||
}, | ||
{ | ||
"cell_type": "markdown", | ||
"metadata": {}, | ||
"source": [ | ||
"## Setting up\n", | ||
"\n", | ||
"Install the package `langchain-ibm`." | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": null, | ||
"metadata": {}, | ||
"outputs": [], | ||
"source": [ | ||
"!pip install -qU langchain-ibm" | ||
] | ||
}, | ||
{ | ||
"cell_type": "markdown", | ||
"metadata": {}, | ||
"source": [ | ||
"This cell defines the WML credentials required to work with watsonx Embeddings.\n", | ||
"\n", | ||
"**Action:** Provide the IBM Cloud user API key. For details, see\n", | ||
"[documentation](https://cloud.ibm.com/docs/account?topic=account-userapikey&interface=ui)." | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": 1, | ||
"metadata": {}, | ||
"outputs": [], | ||
"source": [ | ||
"import os\n", | ||
"from getpass import getpass\n", | ||
"\n", | ||
"watsonx_api_key = getpass()\n", | ||
"os.environ[\"WATSONX_APIKEY\"] = watsonx_api_key" | ||
] | ||
}, | ||
{ | ||
"cell_type": "markdown", | ||
"metadata": {}, | ||
"source": [ | ||
"Additionally, you can pass other secrets as environment variables." | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": null, | ||
"metadata": {}, | ||
"outputs": [], | ||
"source": [ | ||
"import os\n", | ||
"\n", | ||
"os.environ[\"WATSONX_URL\"] = \"your service instance url\"\n", | ||
"os.environ[\"WATSONX_TOKEN\"] = \"your token for accessing the CPD cluster\"\n", | ||
"os.environ[\"WATSONX_PASSWORD\"] = \"your password for accessing the CPD cluster\"\n", | ||
"os.environ[\"WATSONX_USERNAME\"] = \"your username for accessing the CPD cluster\"\n", | ||
"os.environ[\"WATSONX_INSTANCE_ID\"] = \"your instance_id for accessing the CPD cluster\"" | ||
] | ||
}, | ||
{ | ||
"cell_type": "markdown", | ||
"metadata": {}, | ||
"source": [ | ||
"## Load the model\n", | ||
"\n", | ||
"You might need to adjust model `parameters` for different models." | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": 2, | ||
"metadata": {}, | ||
"outputs": [], | ||
"source": [ | ||
"from ibm_watsonx_ai.metanames import EmbedTextParamsMetaNames\n", | ||
"\n", | ||
"embed_params = {\n", | ||
" EmbedTextParamsMetaNames.TRUNCATE_INPUT_TOKENS: 3,\n", | ||
" EmbedTextParamsMetaNames.RETURN_OPTIONS: {\"input_text\": True},\n", | ||
"}" | ||
] | ||
}, | ||
{ | ||
"cell_type": "markdown", | ||
"metadata": {}, | ||
"source": [ | ||
"Initialize the `WatsonxEmbeddings` class with the previously set parameters.\n", | ||
"\n", | ||
"\n", | ||
"**Note**: \n", | ||
"\n", | ||
"- To provide context for the API call, you must add `project_id` or `space_id`. For more information, see the [documentation](https://www.ibm.com/docs/en/watsonx-as-a-service?topic=projects).\n", | ||
"- Depending on the region of your provisioned service instance, use one of the URLs described [here](https://ibm.github.io/watsonx-ai-python-sdk/setup_cloud.html#authentication).\n", | ||
"\n", | ||
"In this example, we’ll use the `project_id` and the Dallas URL.\n", | ||
"\n", | ||
"\n", | ||
"You need to specify the `model_id` that will be used for inferencing." | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": 3, | ||
"metadata": {}, | ||
"outputs": [], | ||
"source": [ | ||
"from langchain_ibm import WatsonxEmbeddings\n", | ||
"\n", | ||
"watsonx_embedding = WatsonxEmbeddings(\n", | ||
" model_id=\"ibm/slate-125m-english-rtrvr\",\n", | ||
" url=\"https://us-south.ml.cloud.ibm.com\",\n", | ||
" project_id=\"PASTE YOUR PROJECT_ID HERE\",\n", | ||
" params=embed_params,\n", | ||
")" | ||
] | ||
}, | ||
{ | ||
"cell_type": "markdown", | ||
"metadata": {}, | ||
"source": [ | ||
"Alternatively, you can use Cloud Pak for Data credentials. For details, see the [documentation](https://ibm.github.io/watsonx-ai-python-sdk/setup_cpd.html)." | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": null, | ||
"metadata": {}, | ||
"outputs": [], | ||
"source": [ | ||
"watsonx_embedding = WatsonxEmbeddings(\n", | ||
" model_id=\"ibm/slate-125m-english-rtrvr\",\n", | ||
" url=\"PASTE YOUR URL HERE\",\n", | ||
" username=\"PASTE YOUR USERNAME HERE\",\n", | ||
" password=\"PASTE YOUR PASSWORD HERE\",\n", | ||
" instance_id=\"openshift\",\n", | ||
" version=\"5.0\",\n", | ||
" project_id=\"PASTE YOUR PROJECT_ID HERE\",\n", | ||
" params=embed_params,\n", | ||
")" | ||
] | ||
}, | ||
{ | ||
"cell_type": "markdown", | ||
"metadata": {}, | ||
"source": [ | ||
"## Usage\n", | ||
"\n", | ||
"### Embed query" | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": 4, | ||
"metadata": {}, | ||
"outputs": [ | ||
{ | ||
"data": { | ||
"text/plain": [ | ||
"[0.0094472, -0.024981909, -0.026013248, -0.040483925, -0.057804465]" | ||
] | ||
}, | ||
"execution_count": 4, | ||
"metadata": {}, | ||
"output_type": "execute_result" | ||
} | ||
], | ||
"source": [ | ||
"text = \"This is a test document.\"\n", | ||
"\n", | ||
"query_result = watsonx_embedding.embed_query(text)\n", | ||
"query_result[:5]" | ||
] | ||
}, | ||
{ | ||
"cell_type": "markdown", | ||
"metadata": {}, | ||
"source": [ | ||
"### Embed documents" | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": 5, | ||
"metadata": {}, | ||
"outputs": [ | ||
{ | ||
"data": { | ||
"text/plain": [ | ||
"[0.009447193, -0.024981918, -0.026013244, -0.040483937, -0.057804447]" | ||
] | ||
}, | ||
"execution_count": 5, | ||
"metadata": {}, | ||
"output_type": "execute_result" | ||
} | ||
], | ||
"source": [ | ||
"texts = [\"This is a content of the document\", \"This is another document\"]\n", | ||
"\n", | ||
"doc_result = watsonx_embedding.embed_documents(texts)\n", | ||
"doc_result[0][:5]" | ||
] | ||
} | ||
], | ||
"metadata": { | ||
"kernelspec": { | ||
"display_name": "langchain", | ||
"language": "python", | ||
"name": "python3" | ||
}, | ||
"language_info": { | ||
"codemirror_mode": { | ||
"name": "ipython", | ||
"version": 3 | ||
}, | ||
"file_extension": ".py", | ||
"mimetype": "text/x-python", | ||
"name": "python", | ||
"nbconvert_exporter": "python", | ||
"pygments_lexer": "ipython3", | ||
"version": "3.10.13" | ||
} | ||
}, | ||
"nbformat": 4, | ||
"nbformat_minor": 2 | ||
} |
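A note on the two outputs recorded in the notebook above: the first five components returned by `embed_query` and by `embed_documents` agree to roughly seven decimal places, even though the input texts differ. This is consistent with `TRUNCATE_INPUT_TOKENS: 3` cutting both inputs down to the same leading tokens, although that is an inference and not something the commit states. The sketch below checks the agreement with a plain-Python cosine similarity. The `cosine_similarity` helper is an illustrative assumption and not part of `langchain-ibm`, and the vectors are only the five-component prefixes printed in the notebook, not full embeddings.

```python
import math


def cosine_similarity(a, b):
    """Return the cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


# First five components printed by the notebook cells (prefixes only).
query_head = [0.0094472, -0.024981909, -0.026013248, -0.040483925, -0.057804465]
doc_head = [0.009447193, -0.024981918, -0.026013244, -0.040483937, -0.057804447]

similarity = cosine_similarity(query_head, doc_head)
print(f"cosine similarity of the printed prefixes: {similarity:.6f}")
```

With a real deployment, the same helper could be applied to the full lists returned by `embed_query` and `embed_documents` to rank documents against a query. A larger truncation limit would probably be needed before the scores become informative.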
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,3 +1,4 @@ | ||
+from langchain_ibm.embeddings import WatsonxEmbeddings | ||
 from langchain_ibm.llms import WatsonxLLM | ||
 | ||
-__all__ = ["WatsonxLLM"] | ||
+__all__ = ["WatsonxLLM", "WatsonxEmbeddings"] |
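The hunk above does two things: the new import makes `WatsonxEmbeddings` reachable from the package root, and the extended `__all__` adds it to the names exported by a star import. The following minimal sketch illustrates the `__all__` part without installing anything. `demo_pkg` and `not_exported` are hypothetical stand-ins built in memory, and the two classes are empty placeholders, not the real `langchain_ibm` classes.

```python
import sys
import types

# Source of a hypothetical stand-in module that mirrors the shape of the hunk.
source = '''
WatsonxLLM = type("WatsonxLLM", (), {})
WatsonxEmbeddings = type("WatsonxEmbeddings", (), {})
not_exported = 1
__all__ = ["WatsonxLLM", "WatsonxEmbeddings"]
'''

# Build the module in memory and register it so that it can be imported.
demo = types.ModuleType("demo_pkg")
exec(source, demo.__dict__)
sys.modules["demo_pkg"] = demo

# A star import copies only the names listed in __all__.
namespace = {}
exec("from demo_pkg import *", namespace)
exported = sorted(name for name in namespace if not name.startswith("__"))
print(exported)
```

Names missing from `__all__` can still be imported explicitly, so the list only affects star imports and tools that read it to determine the public API.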