
Escrow 1024.17 Doc Chat Assistant

Simplified chat assistant design diagram (see the design diagram image in the repository).

How to interact with the Escrow 1024.17 Doc Chat Assistant

Set the environment variables via a .env file

Create a .env file in the root of the repo and set the following environment variables:

OPENAI_API_KEY=<your openai api key>
GROQ_API_KEY=<your groq api key> # required only if you run the prompt evals, which use the Mixtral-8b model from the Groq API to evaluate the prompts against OpenAI's gpt-3.5-turbo

The app reads the API keys from this .env file when calling the OpenAI and Groq APIs; you do not need to export them manually.
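
For reference, here is a minimal sketch of how the keys can be picked up from the .env file at startup (assuming python-dotenv; the app's actual loading code may differ):

import os
from dotenv import load_dotenv  # assumes python-dotenv is installed

load_dotenv()  # reads key=value pairs from .env in the current directory

openai_key = os.environ["OPENAI_API_KEY"]
groq_key = os.environ.get("GROQ_API_KEY")  # only needed for the prompt evals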

Run the chat assistant

The easiest way to interact with the chat assistant is via the Streamlit app. To run it without worrying about dependencies, you can use Docker; the app is built and run in a container with Docker Compose.

From the root of the repo run the following command:

docker compose up --build

The command above builds the container and runs the Streamlit app. The app will be available at http://localhost:8501 in your browser.

Please check the terminal for logs and any errors.

NOTE: It will take time to build the container and for the Streamlit app to start. The embedding model is large, so loading the model, creating the index, and starting the app all take time. (To save time and resources the vector database is not hosted in the cloud; it is created on the fly when the app starts. This is not best practice and should be avoided in production.)
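
If you want to avoid rebuilding the index on every start, one option is to persist it to disk and reload it on later runs. A minimal sketch with LlamaIndex (illustrative only; the directory name and loading logic are assumptions, not the app's current behaviour):

import os
from llama_index.core import (
    SimpleDirectoryReader,
    StorageContext,
    VectorStoreIndex,
    load_index_from_storage,
)

PERSIST_DIR = "./storage"  # example location, not used by the app today

if not os.path.exists(PERSIST_DIR):
    # first run: build the index and write it to disk
    documents = SimpleDirectoryReader("data").load_data()
    index = VectorStoreIndex.from_documents(documents)
    index.storage_context.persist(persist_dir=PERSIST_DIR)
else:
    # later runs: reload the saved index instead of re-embedding everything
    index = load_index_from_storage(StorageContext.from_defaults(persist_dir=PERSIST_DIR))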

You can run the app without Docker as well.

First install the dependencies

$ poetry install #first install the dependencies
$ poetry shell #activate the virtual environment
Chat Via Terminal
$ python chat_assistant.py 

Screenshot: chat assistant running in the terminal.

Chat via Streamlit App
$ streamlit run app.py 

Screenshot: chat assistant running in the Streamlit app.

Escrow 1024.17 data preparation

Please refer to the escrow_data_retriever.ipynb notebook for the data preparation steps. The notebook explains how the data was retrieved from the Escrow 1024.17 website and how it was preprocessed to create the dataset for RAG indexing.
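
As a rough illustration of the kind of preprocessing involved (the URL and file paths below are placeholders; see the notebook for the actual retrieval and cleaning steps):

import requests
from bs4 import BeautifulSoup

# placeholder URL - the notebook documents the real source of the 1024.17 text
resp = requests.get("https://example.com/escrow-1024-17")
resp.raise_for_status()

# strip the HTML down to plain text for later chunking and indexing
text = BeautifulSoup(resp.text, "html.parser").get_text(separator="\n")

with open("data/escrow_1024_17.txt", "w") as f:
    f.write(text)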

RAG evaluation data

Final RAG eval results: RAG evaluation results
I ran experiments on direct RAG along with advanced retrieval methods such as Sentence Window and Auto-Merging Retrieval. The experiments also varied parameters to find the best configuration for retrieving Escrow 1024.17 documents. The evaluations were run on these queries; the results are as follows:

RAG evaluation results chart (image in the repository).

The best configuration (good balance of answer and context relevance, and groundedness) was found to be:

  • Sentence retrieval window: 1
  • Chunk size: 128
  • Effective retrieved context length (node): 384 characters

Notes: although not the cheapest configuration, it was the most effective in terms of groundedness, answer relevance, and context relevance.
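
A minimal sketch of how this winning sentence-window setup can be wired together with LlamaIndex (illustrative only: it shows the window_size=1 retrieval; the chunk-size and context-length settings from the experiments are not reproduced here):

from llama_index.core import SimpleDirectoryReader, VectorStoreIndex
from llama_index.core.node_parser import SentenceWindowNodeParser
from llama_index.core.postprocessor import MetadataReplacementPostProcessor

# parse documents into single sentences, keeping 1 sentence of context on each side
node_parser = SentenceWindowNodeParser.from_defaults(
    window_size=1,
    window_metadata_key="window",
    original_text_metadata_key="original_text",
)

documents = SimpleDirectoryReader("data").load_data()
nodes = node_parser.get_nodes_from_documents(documents)
index = VectorStoreIndex(nodes)

# at query time, swap each retrieved sentence for its surrounding window
query_engine = index.as_query_engine(
    similarity_top_k=3,
    node_postprocessors=[MetadataReplacementPostProcessor(target_metadata_key="window")],
)
print(query_engine.query("When is an escrow account analysis required?"))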

Prompt Template

The prompt template is structured as follows (drawing on the papers referenced below):

# Role
(Role-play prompting is an effective strategy where we assign the model a specific role to play during the interaction. This helps the model "immerse" itself in the role and provide more accurate and relevant answers) ref 1

# Task
(A direct description of what we want the model to do. One technique that works well is chain-of-thought prompting ref 2, which guides the model through the task step by step)

# Specifics (provide the most important notes regarding the task. Integrating Emotional Stimuli ref 3 has been shown to increase response quality and accuracy)

# Context
(The environment in which the task is to be performed. Fairness-guided Few-shot Prompting ref 4 has shown that providing context helps the model understand the task better)

# Examples
(Giving a few Q/A pairs of example questions and answers can help the model understand the task better. This is good practice to follow; Rethinking the Role of Demonstrations ref 6 explains it in detail)

# Notes
(Additional and repeated notes that can help the model do the task better. The Lost in the Middle paper ref 7 shows that LLMs remember the start and the end of the context better than the middle, so it is important to briefly repeat the task and the context in the notes section. Although newer models are better at finding a needle in a haystack, this is still good practice to follow)
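
Putting those sections together, a prompt skeleton along these lines can be used (the wording below is illustrative, not the exact template shipped with the assistant):

PROMPT_TEMPLATE = """
# Role
You are an expert assistant on escrow accounts under Regulation X, 12 CFR 1024.17.

# Task
Answer the user's question using only the retrieved context. Reason through the
relevant provisions step by step before giving the final answer.

# Specifics
Accurate answers matter a great deal to the user, so ground every statement in
the provided context and say so if the context does not contain the answer.

# Context
{context_str}

# Examples
Q: What is an escrow account analysis?
A: <grounded answer citing the relevant provision>

# Notes
Remember: answer only from the context above and cite the relevant provision.

Question: {query_str}
Answer:
"""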

Prompt Engineering References

  1. Better Zero-Shot Reasoning with Role-Play Prompting
  2. Chain-of-Thought Prompting Elicits Reasoning in Large Language Models
  3. Large Language Models Understand and Can be Enhanced by Emotional Stimuli
  4. Fairness-guided Few-shot Prompting for Large Language Models
  5. Language Models are Few-Shot Learners
  6. Rethinking the Role of Demonstrations: What Makes In-Context Learning Work?
  7. Lost in the Middle: How Language Models Use Long Contexts

Prompt Evaluation

We use promptfoo to evaluate our prompts. To run the evaluation, first install promptfoo:

$ bun add promptfoo # or npm install -g promptfoo

Then run the eval from the root folder of this repo as follows:

$ cd prompt_eval_cloud
$ promptfoo eval

Make sure GROQ_API_KEY and OPENAI_API_KEY are set in the .env file, as this eval uses models from the OpenAI API (gpt-3.5-turbo) and the Groq API (mixtral-8b).

Note: To save time and resources, the evaluation is not thorough and only a few prompts are evaluated. (Screenshot: promptfoo evaluation.)

To get a detailed view of the evaluation, run the following command:

$ promptfoo view -y

A new tab will open in your browser with the detailed evaluation view. (Screenshot: promptfoo evaluation.)

Fine-tuning

Refer to the following notebook to see how the dataset for fine-tuning the model was generated.

Refer to the following notebook to see how the model was fine-tuned on the Escrow 1024.17 documents.
(Note: this is a Colab notebook, which made it easy to run the experiments on Google's cloud with powerful GPUs.)

The fine-tuned Gemma model is available on Hugging Face.

Download the model, place it in the fine_tuned_model folder of the repo, and run the following command from the root of the repo to create the Ollama model:

$ ollama create escrow_gemma -f ./ModelfileGemma

To interact with the model, run the following command:

$ ollama run escrow_gemma:latest 
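
For programmatic access, the same model can also be called through the Ollama Python client. A minimal sketch, assuming the escrow_gemma model was created as above and the Ollama server is running:

import ollama  # pip install ollama

response = ollama.chat(
    model="escrow_gemma:latest",
    messages=[{"role": "user", "content": "What does 1024.17 say about escrow account analysis?"}],
)
print(response["message"]["content"])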

NOTE: The fine-tuning dataset consisted only of positive Q/A pairs, with no Q/A pairs where the retrieved context is not relevant. To get better performance we need to include such negative Q/A pairs as well, along with some chat data. This will help the model understand the context better and provide more accurate responses, as intended for this application.

Fine-Tuned Model Evaluation

Please check out the link to see the evaluation of the fine-tuned model. The model was evaluated on these test queries and compared to the open-source models 'llama3-8b' and 'gemma-8b'.

In the evaluation, the fine-tuned model is named 'escrow_gemma:latest'.

Deploy the fine-tuned model to AWS (for future reference)

import json
import sagemaker
import boto3
from sagemaker.huggingface import HuggingFaceModel, get_huggingface_llm_image_uri

# use the notebook's execution role if available, otherwise look it up by name
try:
    role = sagemaker.get_execution_role()
except ValueError:
    iam = boto3.client('iam')
    role = iam.get_role(RoleName='sagemaker_execution_role')['Role']['Arn']

# Hub Model configuration. https://huggingface.co/models
hub = {
    'HF_MODEL_ID': 'pyrotank41/gemma-7b-it-escrow-merged-gguf',
    'SM_NUM_GPUS': json.dumps(1)
}

# create Hugging Face Model Class
huggingface_model = HuggingFaceModel(
    image_uri=get_huggingface_llm_image_uri("huggingface", version="1.4.2"),
    env=hub,
    role=role,
)

# deploy model to SageMaker Inference
predictor = huggingface_model.deploy(
    initial_instance_count=1,
    instance_type="ml.g5.2xlarge",
    container_startup_health_check_timeout=300,
)

# send request
predictor.predict({
    "inputs": "What is the escrow 1024.17 document?",
})
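
When you are done experimenting, delete the endpoint so it does not keep running (and billing):

# clean up the SageMaker resources created above
predictor.delete_model()
predictor.delete_endpoint()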
