A sophisticated chatbot implementation using Retrieval-Augmented Generation (RAG) with FastAPI, MongoDB, and Hugging Face models. This chatbot can process multiple document types, maintain conversation history, and provide context-aware responses.
## Document Processing
- Support for multiple file uploads
- Automatic document type detection
- Chunked text processing with configurable size
- Vector embeddings for efficient retrieval
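The chunked processing above can be sketched as a simple sliding-window splitter. This is a minimal pure-Python illustration, not the project's actual splitter (which likely comes from LangChain); the 400/25 values mirror the defaults listed later in this README.

```python
def split_text(text: str, chunk_size: int = 400, chunk_overlap: int = 25) -> list[str]:
    """Split text into overlapping chunks (sizes in characters)."""
    if chunk_size <= chunk_overlap:
        raise ValueError("chunk_size must exceed chunk_overlap")
    chunks = []
    step = chunk_size - chunk_overlap
    for start in range(0, len(text), step):
        chunk = text[start:start + chunk_size]
        if chunk:
            chunks.append(chunk)
    return chunks

chunks = split_text("x" * 1000, chunk_size=400, chunk_overlap=25)
print(len(chunks))  # → 3
```

The overlap keeps a little shared context between neighboring chunks so sentences cut at a boundary still appear whole in at least one chunk.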
## Chat Capabilities
- Context-aware conversations
- Session management
- History-aware retrieval system
- Concise, three-sentence maximum responses
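The session management described above can be sketched as an in-memory history store. This is illustrative only (the project itself presumably persists history in MongoDB), and all names here are hypothetical:

```python
from collections import defaultdict

class SessionStore:
    """In-memory chat history keyed by session id (illustrative sketch)."""

    def __init__(self, max_turns: int = 10):
        self.max_turns = max_turns
        self._history: dict[str, list[tuple[str, str]]] = defaultdict(list)

    def add(self, session_id: str, role: str, message: str) -> None:
        self._history[session_id].append((role, message))
        # Keep only the most recent turns so prompts stay within context limits.
        self._history[session_id] = self._history[session_id][-self.max_turns:]

    def get(self, session_id: str) -> list[tuple[str, str]]:
        return list(self._history[session_id])

store = SessionStore(max_turns=4)
store.add("abc", "user", "What is RAG?")
store.add("abc", "assistant", "Retrieval-Augmented Generation.")
print(len(store.get("abc")))  # → 2
```

Trimming to the most recent turns is one simple way to keep history-aware retrieval prompts bounded.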
## Technical Stack
- FastAPI for the backend API
- MongoDB with vector search capabilities
- Hugging Face models for embeddings and chat
- LangChain for chain management
**Embedding Model:** `sentence-transformers/all-mpnet-base-v2`
- Runs locally
- Used for generating document embeddings
- Downloads automatically to the `models` directory
**LLM:** `TinyLlama/TinyLlama-1.1B-Chat-v1.0`
- Accessed through the Hugging Face Hub Inference API
- Used for generating responses
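As a sketch of what gets sent to the hosted model, the helper below formats a conversation in the Zephyr-style chat template that TinyLlama-1.1B-Chat was trained with. The function name and system prompt are assumptions; the resulting string would then be passed to the Inference API (e.g. `huggingface_hub`'s text-generation client, not shown here):

```python
def build_prompt(history: list[tuple[str, str]], system: str) -> str:
    """Format a conversation using TinyLlama-Chat's Zephyr-style template
    (<|system|>/<|user|>/<|assistant|> markers, </s> end-of-turn tokens)."""
    parts = [f"<|system|>\n{system}</s>"]
    for role, msg in history:
        parts.append(f"<|{role}|>\n{msg}</s>")
    # Trailing assistant marker tells the model to continue the conversation.
    parts.append("<|assistant|>\n")
    return "\n".join(parts)

prompt = build_prompt(
    [("user", "What does this document say about indexing?")],
    system="Answer in at most three sentences using the provided context.",
)
print(prompt.startswith("<|system|>"))  # → True
```

The three-sentence limit mentioned above is enforced through the system prompt rather than the model itself.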
## Install Dependencies

```shell
pip install -r requirements.txt
```
## Configuration
- Rename `config.env.sample` to `config.env`
- Fill in the required environment variables
## MongoDB Setup
MongoDB would not let us create the index programmatically, so the vector search index must be created manually through the Atlas console. Follow this guide to create the index.
- Create a vector search index manually through the Atlas console
- Use the following configuration for the `vector_store` collection:

```json
{
  "type": "vector",
  "path": "embedding",
  "numDimensions": 768,
  "similarity": "cosine"
}
```
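The retrieval behind this cosine index can be illustrated in pure Python. This is a toy sketch: real queries run through Atlas's vector search on the 768-dimensional embeddings, and the short vectors below are stand-ins. The default `k=4` mirrors `RETRIEVER_K_PARAM`.

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def top_k(query: list[float], docs: dict[str, list[float]], k: int = 4) -> list[str]:
    """Return the ids of the k documents most similar to the query."""
    ranked = sorted(docs, key=lambda d: cosine_similarity(query, docs[d]), reverse=True)
    return ranked[:k]

docs = {
    "a": [1.0, 0.0, 0.0],
    "b": [0.9, 0.1, 0.0],
    "c": [0.0, 1.0, 0.0],
}
print(top_k([1.0, 0.0, 0.0], docs, k=2))  # → ['a', 'b']
```

Cosine similarity ignores vector magnitude, which is why it is a common choice for comparing text embeddings.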
## Start the Server

```shell
python ./main.py
```

Access the API by navigating to http://localhost:8080/docs for Swagger documentation.
Key configurations in `defaults.py`:
- `SPLITTER_CHUNK_SIZE`: 400
- `SPLITTER_CHUNK_OVERLAP`: 25
- `RETRIEVER_K_PARAM`: 4
- `MAX_READ_LINES_FOR_TEXT_FILE`: 40
The system may experience some latency due to:
- Initial model download and loading
- Document processing time
- Inference API response time
In our testing, adding a document of 45+ pages to the vector store took over two minutes, and processing a message with the LLM took almost a minute.