This exercise introduces modern approaches to Question Answering (QA) using Retrieval-Augmented Generation (RAG) with LLMs and vector databases.
Objectives (8 points):
- Set up the QA environment (a setup sketch follows this item):
  - Install OLLAMA and select an appropriate LLM
  - Configure the Qdrant vector database (or a vector DB of your choosing)
  - Install the necessary Python packages for embedding generation
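A minimal setup sketch, assuming Ollama is running locally on its default port, Qdrant is started via Docker, and the models `llama3` and `nomic-embed-text` have been pulled; the package list and model choices are assumptions to adapt:

```python
# Suggested packages (PyPI names; verify current versions):
#   pip install langchain langchain-community langchain-ollama langchain-qdrant qdrant-client pypdf
# Pull models into Ollama beforehand, e.g.:
#   ollama pull llama3
#   ollama pull nomic-embed-text
# Start Qdrant, e.g.:
#   docker run -p 6333:6333 qdrant/qdrant

from qdrant_client import QdrantClient
from langchain_ollama import OllamaEmbeddings

# Quick connectivity check for both services.
client = QdrantClient(url="http://localhost:6333")
print(client.get_collections())  # lists existing (possibly zero) collections

embeddings = OllamaEmbeddings(model="nomic-embed-text")  # assumed embedding model
print(len(embeddings.embed_query("hello world")))  # embedding dimensionality, e.g. 768
```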
- Find a PDF file of your choosing, e.g. a publication or a CV file.
- Implement the procedures necessary for the RAG pipeline using the LangChain library (an ingestion sketch follows this item):
  - Load the PDF file using `PyPDFLoader`.
  - Split the documents into appropriate chunks using `RecursiveCharacterTextSplitter`.
  - Generate and store the embeddings in the Qdrant database.
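A hedged ingestion sketch for these three steps; the file path, collection name, and chunking parameters are placeholders you should tune for your document:

```python
from langchain_community.document_loaders import PyPDFLoader
from langchain_ollama import OllamaEmbeddings
from langchain_qdrant import QdrantVectorStore
from langchain_text_splitters import RecursiveCharacterTextSplitter

# Load the PDF (path is a placeholder).
docs = PyPDFLoader("my_document.pdf").load()

# Split into overlapping chunks; these sizes are common starting points, not optima.
splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=200)
chunks = splitter.split_documents(docs)

# Embed the chunks and store them in Qdrant in one call.
vectorstore = QdrantVectorStore.from_documents(
    chunks,
    embedding=OllamaEmbeddings(model="nomic-embed-text"),
    url="http://localhost:6333",
    collection_name="qa_exercise",  # assumed collection name
)
```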
- Design and implement the RAG pipeline with LCEL. As a reference, use the detailed guide created by the LangChain community: RAG. The next steps should involve (a combined sketch follows the hint below):
  - Create query embedding generation
  - Implement semantic search in Qdrant
  - Design prompt templates for context integration
  - Build response generation with the LLM

  Hint: You don't need to build this from scratch; many of these steps are already automated by the LCEL pipeline definition.
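A minimal LCEL sketch following the pattern of the community RAG guide, reusing `vectorstore` from the ingestion step; the retriever settings, prompt wording, and model name are assumptions. It also provides the semantic-search retriever and basic QA prompt required by the next two items:

```python
from langchain_core.output_parsers import StrOutputParser
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.runnables import RunnablePassthrough
from langchain_ollama import ChatOllama

# Semantic search over the stored chunks; k is the number of chunks to retrieve.
retriever = vectorstore.as_retriever(search_kwargs={"k": 4})

# A basic QA prompt that grounds the model in the retrieved context.
prompt = ChatPromptTemplate.from_template(
    "Answer the question using only the context below. "
    "If the answer is not in the context, say you don't know.\n\n"
    "Context:\n{context}\n\nQuestion: {question}"
)

llm = ChatOllama(model="llama3")  # assumed model name


def format_docs(docs):
    """Concatenate retrieved chunks into a single context string."""
    return "\n\n".join(doc.page_content for doc in docs)


# LCEL wires the steps together: retrieval -> prompt -> LLM -> plain text.
# Query embedding generation happens implicitly inside the retriever.
rag_chain = (
    {"context": retriever | format_docs, "question": RunnablePassthrough()}
    | prompt
    | llm
    | StrOutputParser()
)

print(rag_chain.invoke("What is the document about?"))
```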
- Implement basic retrieval strategies (semantic search); the LCEL sketch above already wires in a semantic-search retriever.
- Create a basic QA prompt (also shown in the LCEL sketch above).
- Determine 5 evaluation queries:
  - Choose questions whose answers you have verified yourself.
- Compare the performance of RAG responses vs. pure LLM responses (a comparison sketch follows this list).
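A sketch of the evaluation step, reusing `rag_chain` and `llm` from the pipeline sketch above; the queries are placeholders for five questions whose answers you have verified against your own PDF:

```python
# Replace these with 5 queries whose correct answers you have confirmed by hand.
queries = [
    "Who is the author of the document?",
    "What is the main topic of the second section?",
    # ...
]

for q in queries:
    rag_answer = rag_chain.invoke(q)      # grounded in retrieved chunks
    pure_answer = llm.invoke(q).content   # same LLM, no retrieval
    print(f"Q: {q}\nRAG: {rag_answer}\nPure LLM: {pure_answer}\n")
```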
Questions (2 points):
- How does RAG improve the quality and reliability of LLM responses compared to pure LLM generation?
- What are the key factors affecting RAG performance (chunk size, embedding quality, prompt design)?
- How does the choice of vector database and embedding model impact system performance?
- What are the main challenges in implementing a production-ready RAG system?
- How can the system be improved to handle complex queries requiring multiple document lookups?
Hints:
- Careful chunk size selection is crucial for relevant context retrieval
- Consider implementing re-ranking of retrieved documents
- Prompt engineering significantly impacts answer quality
- Caching can greatly improve system performance during development
- Consider using metadata filtering to improve retrieval precision (a filtering sketch follows this list)
- The choice of embedding model affects both accuracy and speed
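For the metadata-filtering hint, a sketch assuming the langchain-qdrant default payload layout, where document metadata is stored under the `metadata` payload key (`PyPDFLoader` records each chunk's page number there); the page value is illustrative:

```python
from qdrant_client import models

# Restrict retrieval to chunks from a specific page of the PDF.
page_filter = models.Filter(
    must=[
        models.FieldCondition(
            key="metadata.page",               # default langchain-qdrant payload path
            match=models.MatchValue(value=0),  # first page, as an example
        )
    ]
)
retriever = vectorstore.as_retriever(search_kwargs={"k": 4, "filter": page_filter})
```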