Simple RAG Server Chatbot

A sophisticated chatbot implementation using Retrieval-Augmented Generation (RAG) with FastAPI, MongoDB, and Hugging Face models. This chatbot can process multiple document types, maintain conversation history, and provide context-aware responses.

Features

  • Document Processing

    • Support for multiple file uploads
    • Automatic document type detection
    • Chunked text processing with configurable size
    • Vector embeddings for efficient retrieval
  • Chat Capabilities

    • Context-aware conversations
    • Session management
    • History-aware retrieval system (sketched after this list)
    • Concise, three-sentence maximum responses
  • Technical Stack

    • FastAPI for the backend API
    • MongoDB with vector search capabilities
    • Hugging Face models for embeddings and chat
    • LangChain for chain management
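
As a concrete illustration of the history-aware retrieval flow, here is a minimal sketch using LangChain's standard chain constructors. It is not the repo's exact code: the prompt wording is invented, and it assumes an llm and retriever built as in the sketches in the following sections.

    from langchain.chains import create_history_aware_retriever, create_retrieval_chain
    from langchain.chains.combine_documents import create_stuff_documents_chain
    from langchain_core.prompts import ChatPromptTemplate, MessagesPlaceholder

    # Step 1: rewrite the latest question so it stands alone, using the chat history.
    condense_prompt = ChatPromptTemplate.from_messages([
        ("system", "Rephrase the latest user question so it is self-contained."),
        MessagesPlaceholder("chat_history"),
        ("human", "{input}"),
    ])

    # Step 2: answer from the retrieved chunks, keeping responses to three sentences.
    answer_prompt = ChatPromptTemplate.from_messages([
        ("system", "Answer using only the context below, in three sentences or fewer.\n\n{context}"),
        MessagesPlaceholder("chat_history"),
        ("human", "{input}"),
    ])

    history_aware_retriever = create_history_aware_retriever(llm, retriever, condense_prompt)
    rag_chain = create_retrieval_chain(
        history_aware_retriever,
        create_stuff_documents_chain(llm, answer_prompt),
    )

    # Session management would wrap this chain, e.g. with RunnableWithMessageHistory.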

Models Used

  • Embedding Model: sentence-transformers/all-mpnet-base-v2

    • Runs locally
    • Used for generating document embeddings
    • Downloads automatically to the models directory
  • LLM: TinyLlama/TinyLlama-1.1B-Chat-v1.0

    • Accessed through Hugging Face Hub Inference API
    • Used for generating responses
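
In LangChain terms, wiring these two models together might look like the sketch below. This is an assumption-laden sketch, not the repo's code: the import paths assume the langchain-huggingface package, and the token variable name is a guess at what config.env provides.

    import os

    from langchain_huggingface import HuggingFaceEmbeddings, HuggingFaceEndpoint

    # Local embedding model; weights download into ./models on first use.
    embeddings = HuggingFaceEmbeddings(
        model_name="sentence-transformers/all-mpnet-base-v2",
        cache_folder="./models",
    )

    # Remote chat model, called through the Hugging Face Hub Inference API.
    llm = HuggingFaceEndpoint(
        repo_id="TinyLlama/TinyLlama-1.1B-Chat-v1.0",
        huggingfacehub_api_token=os.environ["HUGGINGFACEHUB_API_TOKEN"],  # assumed env var
        max_new_tokens=256,
    )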

Setup Instructions

  1. Install Dependencies

    pip install -r requirements.txt
  2. Configuration

    • Rename config.env.sample to config.env
    • Fill in the required environment variables
  3. MongoDB Setup

    A known issue with MongoDB prevents the index from being created programmatically, so the vector search index must be created manually through the Atlas console (see MongoDB's documentation on creating Atlas Vector Search indexes).

    • In the Atlas console, create a vector search index on the vector_store collection
    • Use the following index definition:

     {
         "fields": [
             {
                 "type": "vector",
                 "path": "embedding",
                 "numDimensions": 768,
                 "similarity": "cosine"
             }
         ]
     }
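
Once the index exists, the collection can back a LangChain vector store. A minimal sketch, assuming the langchain-mongodb integration and hypothetical database and index names (chatbot, vector_index):

    import os

    from pymongo import MongoClient
    from langchain_mongodb import MongoDBAtlasVectorSearch

    client = MongoClient(os.environ["MONGODB_URI"])  # assumed env var from config.env
    collection = client["chatbot"]["vector_store"]   # database name is hypothetical

    vector_store = MongoDBAtlasVectorSearch(
        collection=collection,
        embedding=embeddings,       # the all-mpnet-base-v2 embeddings (768 dimensions)
        index_name="vector_index",  # must match the name given in the Atlas console
    )

    # Embeds each chunk and stores the vector in the "embedding" field the index covers.
    vector_store.add_documents(chunks)
    retriever = vector_store.as_retriever(search_kwargs={"k": 4})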

Usage

  • Start the Server

    python ./main.py
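
Once the server is running (FastAPI defaults to http://localhost:8000), requests can be sent from any HTTP client. The endpoint paths and payload shape below are hypothetical; check main.py for the actual routes.

    import requests

    # Hypothetical routes: a multi-file upload endpoint and a session-aware chat endpoint.
    with open("report.pdf", "rb") as f:
        requests.post("http://localhost:8000/upload", files={"files": f})

    resp = requests.post(
        "http://localhost:8000/chat",
        json={"session_id": "demo", "message": "Summarize the uploaded report."},
    )
    print(resp.json())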

Configuration

Key configurations in defaults.py:

  • SPLITTER_CHUNK_SIZE: 400
  • SPLITTER_CHUNK_OVERLAP: 25
  • RETRIEVER_K_PARAM: 4
  • MAX_READ_LINES_FOR_TEXT_FILE: 40
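
As a sketch of where these values plug in (the comments are interpretations of the names, not documented semantics):

    from langchain_text_splitters import RecursiveCharacterTextSplitter

    SPLITTER_CHUNK_SIZE = 400          # size of each chunk handed to the embedder
    SPLITTER_CHUNK_OVERLAP = 25        # overlap between adjacent chunks, preserving context
    RETRIEVER_K_PARAM = 4              # number of chunks retrieved per query
    MAX_READ_LINES_FOR_TEXT_FILE = 40  # cap on lines read from a plain-text upload per pass

    splitter = RecursiveCharacterTextSplitter(
        chunk_size=SPLITTER_CHUNK_SIZE,
        chunk_overlap=SPLITTER_CHUNK_OVERLAP,
    )
    chunks = splitter.split_documents(docs)  # docs: previously loaded Document objects

    retriever = vector_store.as_retriever(search_kwargs={"k": RETRIEVER_K_PARAM})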

Performance Note

The system may experience some latency due to:

  • Initial model download and loading
  • Document processing time
  • Inference API response time

On my machine, adding a 45+ page document to the vector store took more than 2 minutes, and processing messages with the LLM took almost 1 minute.
