Skip to content

Commit

Permalink
Merge branch 'master' into master
Browse files Browse the repository at this point in the history
  • Loading branch information
MartinChen1973 authored Jan 18, 2025
2 parents 2f6dabc + 1cd4d8d commit e4747f5
Show file tree
Hide file tree
Showing 100 changed files with 2,604 additions and 5,020 deletions.
24 changes: 17 additions & 7 deletions .github/scripts/prep_api_docs_build.py
Original file line number Diff line number Diff line change
Expand Up @@ -64,19 +64,29 @@ def main():
try:
# Load packages configuration
package_yaml = load_packages_yaml()
packages = [

# Clean target directories
clean_target_directories([
p
for p in package_yaml["packages"]
if p["repo"].startswith("langchain-ai/")
and p["repo"] != "langchain-ai/langchain"
])

# Move libraries to their new locations
move_libraries([
p
for p in package_yaml["packages"]
if not p.get("disabled", False)
and p["repo"].startswith("langchain-ai/")
and p["repo"] != "langchain-ai/langchain"
]
])

# Clean target directories
clean_target_directories(packages)

# Move libraries to their new locations
move_libraries(packages)
# Delete ones without a pyproject.toml
for partner in Path("langchain/libs/partners").iterdir():
if partner.is_dir() and not (partner / "pyproject.toml").exists():
print(f"Removing {partner} as it does not have a pyproject.toml")
shutil.rmtree(partner)

print("Library sync completed successfully!")

Expand Down
2 changes: 1 addition & 1 deletion .github/workflows/api_doc_build.yml
Original file line number Diff line number Diff line change
Expand Up @@ -72,7 +72,7 @@ jobs:
- name: Install dependencies
working-directory: langchain
run: |
python -m uv pip install $(ls ./libs/partners | xargs -I {} echo "./libs/partners/{}")
python -m uv pip install $(ls ./libs/partners | xargs -I {} echo "./libs/partners/{}") --overrides ./docs/vercel_overrides.txt
python -m uv pip install libs/core libs/langchain libs/text-splitters libs/community libs/experimental libs/standard-tests
python -m uv pip install -r docs/api_reference/requirements.txt
Expand Down
2 changes: 1 addition & 1 deletion cookbook/mongodb-langchain-cache-memory.ipynb
Original file line number Diff line number Diff line change
Expand Up @@ -156,7 +156,7 @@
"metadata": {},
"outputs": [],
"source": [
"# Ensure you have an HF_TOKEN in your development enviornment:\n",
"# Ensure you have an HF_TOKEN in your development environment:\n",
"# access tokens can be created or copied from the Hugging Face platform (https://huggingface.co/docs/hub/en/security-tokens)\n",
"\n",
"# Load MongoDB's embedded_movies dataset from Hugging Face\n",
Expand Down
2 changes: 1 addition & 1 deletion docs/Makefile
Original file line number Diff line number Diff line change
Expand Up @@ -27,7 +27,7 @@ install-py-deps:
$(PYTHON) -m pip install -q --upgrade pip
$(PYTHON) -m pip install -q --upgrade uv
$(PYTHON) -m uv pip install -q --pre -r vercel_requirements.txt
$(PYTHON) -m uv pip install -q --pre $$($(PYTHON) scripts/partner_deps_list.py)
$(PYTHON) -m uv pip install -q --pre $$($(PYTHON) scripts/partner_deps_list.py) --overrides vercel_overrides.txt

generate-files:
mkdir -p $(INTERMEDIATE_DIR)
Expand Down
2 changes: 1 addition & 1 deletion docs/docs/concepts/retrievers.mdx
Original file line number Diff line number Diff line change
Expand Up @@ -90,7 +90,7 @@ LangChain has retrievers for many popular lexical search algorithms / engines.
### Vector store

[Vector stores](/docs/concepts/vectorstores/) are a powerful and efficient way to index and retrieve unstructured data.
An vectorstore can be used as a retriever by calling the `as_retriever()` method.
A vectorstore can be used as a retriever by calling the `as_retriever()` method.

```python
vectorstore = MyVectorStore()
Expand Down
6 changes: 3 additions & 3 deletions docs/docs/concepts/vectorstores.mdx
Original file line number Diff line number Diff line change
Expand Up @@ -151,10 +151,10 @@ Many vectorstores support [the `k`](/docs/integrations/vectorstores/pinecone/#qu
### Metadata filtering

While vectorstore implement a search algorithm to efficiently search over *all* the embedded documents to find the most similar ones, many also support filtering on metadata.
This allows structured filters to reduce the size of the similarity search space. These two concepts work well together:
Metadata filtering helps narrow down the search by applying specific conditions such as retrieving documents from a particular source or date range. These two concepts work well together:

1. **Semantic search**: Query the unstructured data directly, often using via embedding or keyword similarity.
2. **Metadata search**: Apply structured query to the metadata, filering specific documents.
1. **Semantic search**: Query the unstructured data directly, often via embedding or keyword similarity.
2. **Metadata search**: Apply structured query to the metadata, filtering specific documents.

Vector store support for metadata filtering is typically dependent on the underlying vector store implementation.

Expand Down
2 changes: 1 addition & 1 deletion docs/docs/how_to/document_loader_markdown.ipynb
Original file line number Diff line number Diff line change
Expand Up @@ -16,7 +16,7 @@
"- Basic usage;\n",
"- Parsing of Markdown into elements such as titles, list items, and text.\n",
"\n",
"LangChain implements an [UnstructuredMarkdownLoader](https://python.langchain.com/api_reference/community/document_loaders/langchain_community.document_loaders.markdown.UnstructuredMarkdownLoader.html) object which requires the [Unstructured](https://unstructured-io.github.io/unstructured/) package. First we install it:"
"LangChain implements an [UnstructuredMarkdownLoader](https://python.langchain.com/api_reference/community/document_loaders/langchain_community.document_loaders.markdown.UnstructuredMarkdownLoader.html) object which requires the [Unstructured](https://docs.unstructured.io/welcome/) package. First we install it:"
]
},
{
Expand Down
2 changes: 1 addition & 1 deletion docs/docs/how_to/tool_results_pass_to_model.ipynb
Original file line number Diff line number Diff line change
Expand Up @@ -16,7 +16,7 @@
"\n",
":::\n",
"\n",
"Some models are capable of [**tool calling**](/docs/concepts/tool_calling) - generating arguments that conform to a specific user-provided schema. This guide will demonstrate how to use those tool cals to actually call a function and properly pass the results back to the model.\n",
"Some models are capable of [**tool calling**](/docs/concepts/tool_calling) - generating arguments that conform to a specific user-provided schema. This guide will demonstrate how to use those tool calls to actually call a function and properly pass the results back to the model.\n",
"\n",
"![Diagram of a tool call invocation](/img/tool_invocation.png)\n",
"\n",
Expand Down
221 changes: 221 additions & 0 deletions docs/docs/integrations/document_loaders/hyperbrowser.ipynb
Original file line number Diff line number Diff line change
@@ -0,0 +1,221 @@
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# HyperbrowserLoader"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"[Hyperbrowser](https://hyperbrowser.ai) is a platform for running and scaling headless browsers. It lets you launch and manage browser sessions at scale and provides easy to use solutions for any webscraping needs, such as scraping a single page or crawling an entire site.\n",
"\n",
"Key Features:\n",
"- Instant Scalability - Spin up hundreds of browser sessions in seconds without infrastructure headaches\n",
"- Simple Integration - Works seamlessly with popular tools like Puppeteer and Playwright\n",
"- Powerful APIs - Easy to use APIs for scraping/crawling any site, and much more\n",
"- Bypass Anti-Bot Measures - Built-in stealth mode, ad blocking, automatic CAPTCHA solving, and rotating proxies\n",
"\n",
"This notebook provides a quick overview for getting started with Hyperbrowser [document loader](https://python.langchain.com/docs/concepts/#document-loaders).\n",
"\n",
"For more information about Hyperbrowser, please visit the [Hyperbrowser website](https://hyperbrowser.ai) or if you want to check out the docs, you can visit the [Hyperbrowser docs](https://docs.hyperbrowser.ai).\n",
"\n",
"## Overview\n",
"### Integration details\n",
"\n",
"| Class | Package | Local | Serializable | JS support|\n",
"| :--- | :--- | :---: | :---: | :---: |\n",
"| HyperbrowserLoader | langchain-hyperbrowser | ❌ | ❌ | ❌ | \n",
"### Loader features\n",
"| Source | Document Lazy Loading | Native Async Support |\n",
"| :---: | :---: | :---: | \n",
"| HyperbrowserLoader | ✅ | ✅ | \n",
"\n",
"## Setup\n",
"\n",
"To access Hyperbrowser document loader you'll need to install the `langchain-hyperbrowser` integration package, and create a Hyperbrowser account and get an API key.\n",
"\n",
"### Credentials\n",
"\n",
"Head to [Hyperbrowser](https://app.hyperbrowser.ai/) to sign up and generate an API key. Once you've done this set the HYPERBROWSER_API_KEY environment variable:\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Installation\n",
"\n",
"Install **langchain-hyperbrowser**."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"%pip install -qU langchain-hyperbrowser"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Initialization\n",
"\n",
"Now we can instantiate our model object and load documents:\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from langchain_hyperbrowser import HyperbrowserLoader\n",
"\n",
"loader = HyperbrowserLoader(\n",
" urls=\"https://example.com\",\n",
" api_key=\"YOUR_API_KEY\",\n",
")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Load"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"Document(metadata={'title': 'Example Domain', 'viewport': 'width=device-width, initial-scale=1', 'sourceURL': 'https://example.com'}, page_content='Example Domain\\n\\n# Example Domain\\n\\nThis domain is for use in illustrative examples in documents. You may use this\\ndomain in literature without prior coordination or asking for permission.\\n\\n[More information...](https://www.iana.org/domains/example)')"
]
},
"execution_count": null,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"docs = loader.load()\n",
"docs[0]"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"print(docs[0].metadata)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Lazy Load"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"page = []\n",
"for doc in loader.lazy_load():\n",
" page.append(doc)\n",
" if len(page) >= 10:\n",
" # do some paged operation, e.g.\n",
" # index.upsert(page)\n",
"\n",
" page = []"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Advanced Usage\n",
"\n",
"You can specify the operation to be performed by the loader. The default operation is `scrape`. For `scrape`, you can provide a single URL or a list of URLs to be scraped. For `crawl`, you can only provide a single URL. The `crawl` operation will crawl the provided page and subpages and return a document for each page."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"loader = HyperbrowserLoader(\n",
" urls=\"https://hyperbrowser.ai\", api_key=\"YOUR_API_KEY\", operation=\"crawl\"\n",
")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Optional params for the loader can also be provided in the `params` argument. For more information on the supported params, visit https://docs.hyperbrowser.ai/reference/sdks/python/scrape#start-scrape-job-and-wait or https://docs.hyperbrowser.ai/reference/sdks/python/crawl#start-crawl-job-and-wait."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"loader = HyperbrowserLoader(\n",
" urls=\"https://example.com\",\n",
" api_key=\"YOUR_API_KEY\",\n",
" operation=\"scrape\",\n",
" params={\"scrape_options\": {\"include_tags\": [\"h1\", \"h2\", \"p\"]}},\n",
")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## API reference\n",
"\n",
"- [GitHub](https://github.com/hyperbrowserai/langchain-hyperbrowser/)\n",
"- [PyPi](https://pypi.org/project/langchain-hyperbrowser/)\n",
"- [Hyperbrowser Docs](https://docs.hyperbrowser.ai/)"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.9.16"
}
},
"nbformat": 4,
"nbformat_minor": 4
}
Loading

0 comments on commit e4747f5

Please sign in to comment.