RAG on an Excel sheet using LlamaParse and GPT-4o-mini
By now, many of us know how to perform RAG (Retrieval-Augmented Generation) on a PDF. A typical RAG pipeline first loads the document and then divides it into smaller chunks.
Embeddings are created for these chunks and stored in a vector database.
When a user query arrives, a semantic search is run against the database and the relevant chunks are retrieved.
These chunks are sent to the LLM along with the user query so it can generate the answer.
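For intuition, the pipeline described above can be sketched in plain Python, with a toy bag-of-words similarity standing in for a real embedding model (an illustration only, not code from this tutorial's notebook):

```python
from collections import Counter
import math

def embed(text):
    """Toy 'embedding': a bag-of-words count vector."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two count vectors."""
    dot = sum(a[w] * b[w] for w in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def chunk(document, size=50):
    """Divide a large document into chunks of roughly `size` words."""
    words = document.split()
    return [" ".join(words[i:i + size]) for i in range(0, len(words), size)]

def retrieve(chunks, query, top_k=2):
    """Return the chunks most similar to the query; in a real pipeline
    these would be sent to the LLM together with the query."""
    q = embed(query)
    return sorted(chunks, key=lambda c: cosine(embed(c), q), reverse=True)[:top_k]
```

A real system swaps `embed` for an embedding model and the sorted list for a vector database, but the load → chunk → embed → retrieve flow is the same.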
Now the questions are:
- How can we perform the same operations on an Excel sheet?
- How can we load the data?
- How can we divide the Excel sheet into smaller chunks?
Namaste and Welcome to Build It Yourself.
In this tutorial, we will talk about how to perform RAG on an Excel sheet using LlamaParse and GPT-4o-mini.
If you are a senior IT professional looking to learn AI and LLMs in simple language, check out the most up-to-date courses and other details – https://www.aimletc.com/online-instructor-led-ai-llm-coaching-for-it-technical-professionals/
Code for RAG on an Excel sheet using LlamaParse and GPT-4o-mini
You can find the code notebook here – https://github.com/tayaln/RAG-on-an-excel-sheet-using-GPT4-o-mini
To use it, download and then upload it to your Google Drive and open it as a Google Colab file.
So, how do we do it?
We will use two main classes – LlamaParse and MarkdownElementNodeParser.
LlamaParse will parse the Excel sheet and load its data.
MarkdownElementNodeParser will do the heavy lifting: it will first divide the data into small nodes and then index those nodes for fast retrieval.
Let us see this in detail in the code walkthrough below:
Pre-requisites
– An open mind to learn new things
– OpenAI API key
– LlamaIndex Cloud API key (get it here – https://cloud.llamaindex.ai/)
Code Walkthrough
Step 1 – Install Llama-Index and Llama-Parse.
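In a Colab cell, the installation typically looks like the following (package names assume the current LlamaIndex packaging, where `llama-index` pulls in the OpenAI integration; check the notebook for the exact cell):

```shell
pip install llama-index llama-parse
```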
Step 2 – Now let us see what classes we need to perform RAG on an Excel sheet.
nest_asyncio – to let LlamaParse work asynchronously
OpenAI – as we are using its model
VectorStoreIndex – to store the embeddings we will create
Image – to display images in Google Colab
Markdown – to display Excel data in markdown format
LlamaParse – to parse the Excel sheet
MarkdownElementNodeParser – to divide the data into nodes and create index
Once we understand how the Excel sheet is parsed and how the data is divided, the rest of the steps are easy to follow.
Step 3 – Set up your OpenAI API key.
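A minimal sketch of this step; the literal key string below is a placeholder, not a real key:

```python
import os

# Hypothetical placeholder -- paste your real key in Colab instead,
# ideally read via getpass so it is never stored in the notebook.
os.environ["OPENAI_API_KEY"] = "sk-..."
```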
Step 4 – Set up your LlamaIndex API key and define the parser.
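A sketch of the parser definition, assuming `result_type="markdown"` (the import is kept inside a function so the sketch loads even without `llama-parse` installed, and `make_parser` is our name, not one from the notebook):

```python
def make_parser(llama_cloud_api_key: str):
    """Build a LlamaParse parser that returns the sheet as markdown."""
    # Local import so this sketch can be defined without llama-parse installed.
    from llama_parse import LlamaParse

    return LlamaParse(
        api_key=llama_cloud_api_key,  # key from https://cloud.llamaindex.ai/
        result_type="markdown",       # markdown output suits MarkdownElementNodeParser
    )
```

You would then call `make_parser(...).load_data("your_sheet.xlsx")` to get the parsed documents.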
Step 5 – Define the node_parser using MarkdownElementNodeParser.
After dividing the data into nodes, build the index using VectorStoreIndex.
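Step 5 can be sketched as below; `build_index` is our own name for the combined step, the imports are local so the sketch loads without `llama-index` installed, and the exact keyword arguments may differ from the notebook:

```python
def build_index(documents):
    """Divide parsed markdown into nodes and index them for retrieval."""
    from llama_index.core import VectorStoreIndex
    from llama_index.core.node_parser import MarkdownElementNodeParser
    from llama_index.llms.openai import OpenAI

    node_parser = MarkdownElementNodeParser(
        llm=OpenAI(model="gpt-4o-mini"),  # LLM used to summarize table elements
        num_workers=4,
    )
    nodes = node_parser.get_nodes_from_documents(documents)
    # Separate plain-text nodes from table/object nodes, then index both.
    base_nodes, objects = node_parser.get_nodes_and_objects(nodes)
    return VectorStoreIndex(nodes=base_nodes + objects)
```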
Step 6 – Now everything is ready. Let's ask a question and see whether the LLM can generate the right answer.
Step 7 – We can also check which nodes the LLM used to generate the answer.
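Steps 6 and 7 together look roughly like this (`ask` is our helper name; `response.source_nodes` is how LlamaIndex exposes the retrieved chunks behind an answer):

```python
def ask(index, question: str):
    """Query the index and show which nodes the answer came from."""
    query_engine = index.as_query_engine()
    response = query_engine.query(question)
    print(response)  # the generated answer
    # Each source node carries the retrieved chunk text and its similarity score.
    for node in response.source_nodes:
        print(f"score={node.score}: {node.text[:200]}")
    return response
```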
You can also try another example mentioned in the notebook.
Hope you liked the tutorial. If you have any queries, feel free to reach out to me on LinkedIn – https://www.linkedin.com/in/nikhileshtayal/
The most up-to-date AI + LLM Coaching
In case you are looking to learn AI + Gen AI in an instructor-led live class environment, check out these courses
Happy learning!