Merge branch 'master' into master

langchain-ai · Jan 18, 2025 · e4747f5
2 parents 2f6dabc + 1cd4d8d
Showing 100 changed files with 2,604 additions and 5,020 deletions.
diff --git a/.github/scripts/prep_api_docs_build.py b/.github/scripts/prep_api_docs_build.py
@@ -64,19 +64,29 @@ def main():
     try:
         # Load packages configuration
         package_yaml = load_packages_yaml()
-        packages = [
+
+        # Clean target directories
+        clean_target_directories([
+            p
+            for p in package_yaml["packages"]
+            if p["repo"].startswith("langchain-ai/")
+            and p["repo"] != "langchain-ai/langchain"
+        ])
+
+        # Move libraries to their new locations
+        move_libraries([
             p
             for p in package_yaml["packages"]
             if not p.get("disabled", False)
             and p["repo"].startswith("langchain-ai/")
             and p["repo"] != "langchain-ai/langchain"
-        ]
+        ])
 
-        # Clean target directories
-        clean_target_directories(packages)
-
-        # Move libraries to their new locations
-        move_libraries(packages)
+        # Delete ones without a pyproject.toml
+        for partner in Path("langchain/libs/partners").iterdir():
+            if partner.is_dir() and not (partner / "pyproject.toml").exists():
+                print(f"Removing {partner} as it does not have a pyproject.toml")
+                shutil.rmtree(partner)
 
         print("Library sync completed successfully!")
 

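The final step in the hunk above removes partner directories that have no `pyproject.toml`. A minimal, self-contained sketch of that pattern follows. The helper name `prune_dirs_without_pyproject` and the throwaway directory layout are hypothetical and exist only for illustration; they are not part of the script.

```python
import shutil
import tempfile
from pathlib import Path


def prune_dirs_without_pyproject(root: Path) -> list[str]:
    """Remove each immediate subdirectory of `root` that lacks a pyproject.toml.

    Hypothetical helper for illustration; returns the removed directory names.
    """
    removed = []
    # sorted() materializes the listing, so nothing is deleted mid-iteration.
    for child in sorted(root.iterdir()):
        if child.is_dir() and not (child / "pyproject.toml").exists():
            shutil.rmtree(child)
            removed.append(child.name)
    return removed


# Demonstration on a temporary tree with made-up package names.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "pkg_a").mkdir()
    (root / "pkg_a" / "pyproject.toml").write_text("[project]\nname = 'pkg_a'\n")
    (root / "pkg_b").mkdir()  # no pyproject.toml, so it should be removed
    removed = prune_dirs_without_pyproject(root)
    remaining = sorted(p.name for p in root.iterdir())
```

Returning the removed names, rather than only printing them, makes the cleanup easy to check in a test.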
diff --git a/.github/workflows/api_doc_build.yml b/.github/workflows/api_doc_build.yml
@@ -72,7 +72,7 @@ jobs:
       - name: Install dependencies
         working-directory: langchain
         run: |
-          python -m uv pip install $(ls ./libs/partners | xargs -I {} echo "./libs/partners/{}")
+          python -m uv pip install $(ls ./libs/partners | xargs -I {} echo "./libs/partners/{}") --overrides ./docs/vercel_overrides.txt
           python -m uv pip install libs/core libs/langchain libs/text-splitters libs/community libs/experimental libs/standard-tests
           python -m uv pip install -r docs/api_reference/requirements.txt
 

diff --git a/cookbook/mongodb-langchain-cache-memory.ipynb b/cookbook/mongodb-langchain-cache-memory.ipynb
@@ -156,7 +156,7 @@
    "metadata": {},
    "outputs": [],
    "source": [
-    "# Ensure you have an HF_TOKEN in your development enviornment:\n",
+    "# Ensure you have an HF_TOKEN in your development environment:\n",
     "# access tokens can be created or copied from the Hugging Face platform (https://huggingface.co/docs/hub/en/security-tokens)\n",
     "\n",
     "# Load MongoDB's embedded_movies dataset from Hugging Face\n",

diff --git a/docs/Makefile b/docs/Makefile
@@ -27,7 +27,7 @@ install-py-deps:
 	$(PYTHON) -m pip install -q --upgrade pip
 	$(PYTHON) -m pip install -q --upgrade uv
 	$(PYTHON) -m uv pip install -q --pre -r vercel_requirements.txt
-	$(PYTHON) -m uv pip install -q --pre $$($(PYTHON) scripts/partner_deps_list.py)
+	$(PYTHON) -m uv pip install -q --pre $$($(PYTHON) scripts/partner_deps_list.py) --overrides vercel_overrides.txt
 
 generate-files:
 	mkdir -p $(INTERMEDIATE_DIR)

diff --git a/docs/docs/concepts/retrievers.mdx b/docs/docs/concepts/retrievers.mdx
@@ -90,7 +90,7 @@ LangChain has retrievers for many popular lexical search algorithms / engines.
 ### Vector store 
 
 [Vector stores](/docs/concepts/vectorstores/) are a powerful and efficient way to index and retrieve unstructured data. 
-An vectorstore can be used as a retriever by calling the `as_retriever()` method.
+A vectorstore can be used as a retriever by calling the `as_retriever()` method.
 
 ```python
 vectorstore = MyVectorStore()

diff --git a/docs/docs/concepts/vectorstores.mdx b/docs/docs/concepts/vectorstores.mdx
@@ -151,10 +151,10 @@ Many vectorstores support [the `k`](/docs/integrations/vectorstores/pinecone/#qu
 ### Metadata filtering
 
 While vector stores implement a search algorithm to efficiently search over *all* the embedded documents to find the most similar ones, many also support filtering on metadata.
-This allows structured filters to reduce the size of the similarity search space. These two concepts work well together:
+Metadata filtering narrows the search by applying specific conditions, such as retrieving only documents from a particular source or date range. These two concepts work well together:
 
-1. **Semantic search**: Query the unstructured data directly, often using via embedding or keyword similarity.
-2. **Metadata search**: Apply structured query to the metadata, filering specific documents.
+1. **Semantic search**: Query the unstructured data directly, often via embedding or keyword similarity.
+2. **Metadata search**: Apply a structured query to the metadata, filtering for specific documents.
 
 Vector store support for metadata filtering is typically dependent on the underlying vector store implementation.
 

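The way semantic search and metadata search combine, as described in the hunk above, can be sketched in plain Python. This is a toy illustration, not a LangChain or vector store API: the `filtered_search` function, the dictionary layout of each document, and the two-dimensional embeddings are all assumptions made for the example. Real vector stores typically apply the filter inside the index rather than in a list comprehension.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def filtered_search(docs, query_vec, metadata_filter, k=2):
    """Hypothetical helper: metadata filter first, then similarity ranking."""
    # Metadata search: keep documents matching every key/value in the filter.
    candidates = [
        d
        for d in docs
        if all(d["metadata"].get(key) == value for key, value in metadata_filter.items())
    ]
    # Semantic search: rank the remaining documents by similarity to the query.
    ranked = sorted(
        candidates, key=lambda d: cosine(d["embedding"], query_vec), reverse=True
    )
    return [d["id"] for d in ranked[:k]]


# Made-up documents with toy embeddings.
docs = [
    {"id": "a", "embedding": [1.0, 0.0], "metadata": {"source": "blog"}},
    {"id": "b", "embedding": [0.9, 0.1], "metadata": {"source": "wiki"}},
    {"id": "c", "embedding": [0.0, 1.0], "metadata": {"source": "blog"}},
]
result = filtered_search(docs, [1.0, 0.0], {"source": "blog"})
```

Filtering before ranking shrinks the candidate set, so document `b` is never scored even though it is close to the query.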
diff --git a/docs/docs/how_to/document_loader_markdown.ipynb b/docs/docs/how_to/document_loader_markdown.ipynb
@@ -16,7 +16,7 @@
     "- Basic usage;\n",
     "- Parsing of Markdown into elements such as titles, list items, and text.\n",
     "\n",
-    "LangChain implements an [UnstructuredMarkdownLoader](https://python.langchain.com/api_reference/community/document_loaders/langchain_community.document_loaders.markdown.UnstructuredMarkdownLoader.html) object which requires the [Unstructured](https://unstructured-io.github.io/unstructured/) package. First we install it:"
+    "LangChain implements an [UnstructuredMarkdownLoader](https://python.langchain.com/api_reference/community/document_loaders/langchain_community.document_loaders.markdown.UnstructuredMarkdownLoader.html) object which requires the [Unstructured](https://docs.unstructured.io/welcome/) package. First we install it:"
    ]
   },
   {

diff --git a/docs/docs/how_to/tool_results_pass_to_model.ipynb b/docs/docs/how_to/tool_results_pass_to_model.ipynb
@@ -16,7 +16,7 @@
     "\n",
     ":::\n",
     "\n",
-    "Some models are capable of [**tool calling**](/docs/concepts/tool_calling) - generating arguments that conform to a specific user-provided schema. This guide will demonstrate how to use those tool cals to actually call a function and properly pass the results back to the model.\n",
+    "Some models are capable of [**tool calling**](/docs/concepts/tool_calling) - generating arguments that conform to a specific user-provided schema. This guide will demonstrate how to use those tool calls to actually call a function and properly pass the results back to the model.\n",
     "\n",
     "![Diagram of a tool call invocation](/img/tool_invocation.png)\n",
     "\n",

diff --git a/docs/docs/integrations/document_loaders/hyperbrowser.ipynb b/docs/docs/integrations/document_loaders/hyperbrowser.ipynb
@@ -0,0 +1,221 @@
+{
+ "cells": [
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "# HyperbrowserLoader"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "[Hyperbrowser](https://hyperbrowser.ai) is a platform for running and scaling headless browsers. It lets you launch and manage browser sessions at scale and provides easy-to-use solutions for any web scraping needs, such as scraping a single page or crawling an entire site.\n",
+    "\n",
+    "Key Features:\n",
+    "- Instant Scalability - Spin up hundreds of browser sessions in seconds without infrastructure headaches\n",
+    "- Simple Integration - Works seamlessly with popular tools like Puppeteer and Playwright\n",
+    "- Powerful APIs - Easy-to-use APIs for scraping/crawling any site, and much more\n",
+    "- Bypass Anti-Bot Measures - Built-in stealth mode, ad blocking, automatic CAPTCHA solving, and rotating proxies\n",
+    "\n",
+    "This notebook provides a quick overview for getting started with the Hyperbrowser [document loader](https://python.langchain.com/docs/concepts/#document-loaders).\n",
+    "\n",
+    "For more information about Hyperbrowser, visit the [Hyperbrowser website](https://hyperbrowser.ai) or the [Hyperbrowser docs](https://docs.hyperbrowser.ai).\n",
+    "\n",
+    "## Overview\n",
+    "### Integration details\n",
+    "\n",
+    "| Class | Package | Local | Serializable | JS support|\n",
+    "| :--- | :--- | :---: | :---: |  :---: |\n",
+    "| HyperbrowserLoader | langchain-hyperbrowser | ❌ | ❌ | ❌ | \n",
+    "### Loader features\n",
+    "| Source | Document Lazy Loading | Native Async Support |\n",
+    "| :---: | :---: | :---: | \n",
+    "| HyperbrowserLoader | ✅ | ✅ | \n",
+    "\n",
+    "## Setup\n",
+    "\n",
+    "To access the Hyperbrowser document loader, you'll need to install the `langchain-hyperbrowser` integration package, create a Hyperbrowser account, and get an API key.\n",
+    "\n",
+    "### Credentials\n",
+    "\n",
+    "Head to [Hyperbrowser](https://app.hyperbrowser.ai/) to sign up and generate an API key. Once you've done this, set the `HYPERBROWSER_API_KEY` environment variable.\n"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "### Installation\n",
+    "\n",
+    "Install **langchain-hyperbrowser**."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "%pip install -qU langchain-hyperbrowser"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## Initialization\n",
+    "\n",
+    "Now we can instantiate our loader object and load documents:\n"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "from langchain_hyperbrowser import HyperbrowserLoader\n",
+    "\n",
+    "loader = HyperbrowserLoader(\n",
+    "    urls=\"https://example.com\",\n",
+    "    api_key=\"YOUR_API_KEY\",\n",
+    ")"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## Load"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [
+    {
+     "data": {
+      "text/plain": [
+       "Document(metadata={'title': 'Example Domain', 'viewport': 'width=device-width, initial-scale=1', 'sourceURL': 'https://example.com'}, page_content='Example Domain\\n\\n# Example Domain\\n\\nThis domain is for use in illustrative examples in documents. You may use this\\ndomain in literature without prior coordination or asking for permission.\\n\\n[More information...](https://www.iana.org/domains/example)')"
+      ]
+     },
+     "execution_count": null,
+     "metadata": {},
+     "output_type": "execute_result"
+    }
+   ],
+   "source": [
+    "docs = loader.load()\n",
+    "docs[0]"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "print(docs[0].metadata)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## Lazy Load"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "page = []\n",
+    "for doc in loader.lazy_load():\n",
+    "    page.append(doc)\n",
+    "    if len(page) >= 10:\n",
+    "        # do some paged operation, e.g.\n",
+    "        # index.upsert(page)\n",
+    "\n",
+    "        page = []"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## Advanced Usage\n",
+    "\n",
+    "You can specify the operation to be performed by the loader. The default operation is `scrape`. For `scrape`, you can provide a single URL or a list of URLs to be scraped. For `crawl`, you can only provide a single URL. The `crawl` operation will crawl the provided page and subpages and return a document for each page."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "loader = HyperbrowserLoader(\n",
+    "    urls=\"https://hyperbrowser.ai\", api_key=\"YOUR_API_KEY\", operation=\"crawl\"\n",
+    ")"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "Optional params for the loader can also be provided in the `params` argument. For more information on the supported params, visit https://docs.hyperbrowser.ai/reference/sdks/python/scrape#start-scrape-job-and-wait or https://docs.hyperbrowser.ai/reference/sdks/python/crawl#start-crawl-job-and-wait."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "loader = HyperbrowserLoader(\n",
+    "    urls=\"https://example.com\",\n",
+    "    api_key=\"YOUR_API_KEY\",\n",
+    "    operation=\"scrape\",\n",
+    "    params={\"scrape_options\": {\"include_tags\": [\"h1\", \"h2\", \"p\"]}},\n",
+    ")"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## API reference\n",
+    "\n",
+    "- [GitHub](https://github.com/hyperbrowserai/langchain-hyperbrowser/)\n",
+    "- [PyPI](https://pypi.org/project/langchain-hyperbrowser/)\n",
+    "- [Hyperbrowser Docs](https://docs.hyperbrowser.ai/)"
+   ]
+  }
+ ],
+ "metadata": {
+  "kernelspec": {
+   "display_name": "Python 3 (ipykernel)",
+   "language": "python",
+   "name": "python3"
+  },
+  "language_info": {
+   "codemirror_mode": {
+    "name": "ipython",
+    "version": 3
+   },
+   "file_extension": ".py",
+   "mimetype": "text/x-python",
+   "name": "python",
+   "nbconvert_exporter": "python",
+   "pygments_lexer": "ipython3",
+   "version": "3.9.16"
+  }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 4
+}
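The "Lazy Load" cell in the notebook above collects documents into pages of ten, but as written, a final partial page is never handed to the paged operation. A minimal sketch of one way to handle that follows. The `paged` generator is a hypothetical helper, and `range(25)` stands in for `loader.lazy_load()` so the example runs without an API key.

```python
from typing import Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def paged(items: Iterable[T], page_size: int) -> Iterator[List[T]]:
    """Yield lists of up to `page_size` items, including the final short page.

    Hypothetical helper for illustration; not part of langchain-hyperbrowser.
    """
    page: List[T] = []
    for item in items:
        page.append(item)
        if len(page) >= page_size:
            yield page
            page = []
    # Flush whatever is left so the last documents are not dropped.
    if page:
        yield page


# range(25) is a stand-in for loader.lazy_load().
pages = list(paged(range(25), 10))
```

With a real loader, each yielded page could be passed to the paged operation mentioned in the notebook comment, such as an index upsert.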