DocChat is a Streamlit-based chatbot application that lets you interact with PDF documents conversationally. It uses a locally stored SentenceTransformer model for semantic search and a local GPT-Neo model for generating contextual responses.
- Upload PDFs: Easily upload PDF documents.
- Semantic Search: Search the document content using state-of-the-art embeddings.
- Conversational Responses: Ask questions and get meaningful responses from the PDF's content.
- Completely Offline: No API usage; models are stored and run locally.
Ensure you have Python 3.8 or later installed. Then, install the required dependencies:
```bash
pip install -r requirements.txt
```
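The exact dependency list lives in requirements.txt; a typical set for this kind of stack (the package names below are assumptions, not the project's pinned list) would be something like:

```text
# Hypothetical dependency list; adjust names and versions to match requirements.txt
streamlit
sentence-transformers
transformers
torch
faiss-cpu
pypdf
```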
Before running the application, download the necessary models locally by running the model-install.py script:
```bash
python model-install.py
```
This script downloads and stores:
- The SentenceTransformer model for semantic search.
- The GPT-Neo model for conversational response generation.
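The script itself isn't reproduced here, but a minimal sketch of such a downloader, assuming the sentence-transformers and transformers libraries and a models/ directory, might look like this:

```python
# Hypothetical sketch of model-install.py; model names and save paths are assumptions.
from pathlib import Path
from sentence_transformers import SentenceTransformer
from transformers import AutoModelForCausalLM, AutoTokenizer

MODELS_DIR = Path("models")
MODELS_DIR.mkdir(exist_ok=True)

# Download and save the embedding model used for semantic search.
embedder = SentenceTransformer("all-MiniLM-L6-v2")
embedder.save(str(MODELS_DIR / "all-MiniLM-L6-v2"))

# Download and save the GPT-Neo model used for response generation.
gpt_name = "EleutherAI/gpt-neo-125M"
AutoTokenizer.from_pretrained(gpt_name).save_pretrained(str(MODELS_DIR / "gpt-neo-125M"))
AutoModelForCausalLM.from_pretrained(gpt_name).save_pretrained(str(MODELS_DIR / "gpt-neo-125M"))
```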
Once the models are downloaded, run the Streamlit app to start DocChat:
```bash
streamlit run app.py
```
- On the main interface, click on the Upload PDF button to upload your document.
- Use the chat interface to ask questions about the uploaded PDF.
- Your messages and the bot's responses will appear in a conversational format.
- The app maintains a history of your conversation with the document for reference.
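The upload-and-chat flow described above follows the standard Streamlit pattern; a simplified sketch, in which answer_question is a hypothetical placeholder rather than the actual code in app.py, is shown below:

```python
# Simplified sketch of the upload-and-chat flow; answer_question() is a hypothetical
# placeholder for the retrieval + generation pipeline, not the actual code in app.py.
import streamlit as st

def answer_question(pdf_file, question):
    """Placeholder: embed the question, retrieve relevant chunks, and generate a reply."""
    return "(answer generated from the PDF would go here)"

st.title("DocChat")
uploaded_pdf = st.file_uploader("Upload PDF", type="pdf")

# Keep the conversation history across Streamlit reruns.
if "history" not in st.session_state:
    st.session_state.history = []

question = st.chat_input("Ask a question about the document")
if uploaded_pdf and question:
    answer = answer_question(uploaded_pdf, question)
    st.session_state.history.append(("user", question))
    st.session_state.history.append(("assistant", answer))

# Replay the whole conversation in chat format.
for role, message in st.session_state.history:
    with st.chat_message(role):
        st.write(message)
```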
```
.
├── app.py              # Main Streamlit application
├── model-install.py    # Script to download and set up models locally
├── requirements.txt    # List of required Python packages
├── README.md           # Project documentation
└── models/             # Directory for storing downloaded models
```
- Semantic Search:
  - The PDF text is split into chunks, and each chunk is embedded using the SentenceTransformer model.
  - FAISS (a similarity search library) performs efficient nearest-neighbor searches over these embeddings to find the chunks most relevant to your query.
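As an illustration of this retrieval step, a minimal sketch, assuming pypdf for text extraction and the locally saved SentenceTransformer model, could look like this:

```python
# Minimal retrieval sketch; the file name, chunk size, and model path are assumptions.
import faiss
from pypdf import PdfReader
from sentence_transformers import SentenceTransformer

def chunk_pdf(path, chunk_size=500):
    """Extract the PDF text and split it into fixed-size character chunks."""
    text = " ".join(page.extract_text() or "" for page in PdfReader(path).pages)
    return [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]

chunks = chunk_pdf("example.pdf")
embedder = SentenceTransformer("models/all-MiniLM-L6-v2")

# Embed every chunk and index the vectors for nearest-neighbor search.
embeddings = embedder.encode(chunks).astype("float32")
index = faiss.IndexFlatL2(embeddings.shape[1])
index.add(embeddings)

# Retrieve the chunks closest to the query embedding.
query = "What is this document about?"
query_vec = embedder.encode([query]).astype("float32")
_, ids = index.search(query_vec, 3)
top_chunks = [chunks[i] for i in ids[0]]
```

IndexFlatL2 performs exact search, which is more than fast enough for a single document's worth of chunks.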
- Contextual Response Generation:
  - The retrieved chunks are passed as context to the GPT-Neo model.
  - GPT-Neo generates a concise response tailored to your query.
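A bare-bones sketch of this generation step, assuming the GPT-Neo model saved by model-install.py and a simple question-plus-context prompt, might be:

```python
# Bare-bones generation sketch; the prompt format and local model path are assumptions.
from transformers import pipeline

generator = pipeline("text-generation", model="models/gpt-neo-125M")

def generate_answer(question, context_chunks):
    """Build a prompt from the retrieved chunks and let GPT-Neo complete it."""
    context = "\n".join(context_chunks)
    prompt = f"Context:\n{context}\n\nQuestion: {question}\nAnswer:"
    result = generator(prompt, max_new_tokens=100, do_sample=False)
    # The pipeline returns the prompt plus the completion; keep only the completion.
    return result[0]["generated_text"][len(prompt):].strip()
```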
- Ensure sufficient disk space for storing models (~1GB).
- For higher-quality responses, larger GPT models such as GPT-2 or GPT-J can be swapped in, at the cost of additional disk space and memory.
- If you encounter errors, ensure all dependencies are correctly installed and the models are downloaded without interruptions.