RAG on an Excel sheet using LlamaParse and GPT4-o-mini

Reading Time: 3 minutes

By now, many of us know how to perform RAG (Retrieval Augmented Generation) on a PDF. The RAG pipeline consists of loading the document first and then dividing the large document into smaller chunks.

Embeddings are created from these chunks and stored in a vector database.

When a user query arrives, a semantic search is run against the database and the relevant chunks are retrieved.

The chunks are sent to the LLM along with the user query for it to generate the answer.
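To make these four steps concrete, here is a minimal toy sketch in plain Python. It is only an illustration: the "embedding" is a simple word count rather than a neural model, the "vector database" is a Python list, and all function names (chunk_text, embed, retrieve, build_prompt) are made up for this example.

```python
import math
import re
from collections import Counter


def chunk_text(text: str, chunk_size: int = 10) -> list[str]:
    """Split text into chunks of at most `chunk_size` words."""
    words = text.split()
    return [" ".join(words[i:i + chunk_size]) for i in range(0, len(words), chunk_size)]


def embed(text: str) -> Counter:
    """Toy embedding: lowercase word counts (a real pipeline uses a neural model)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two word-count vectors."""
    dot = sum(a[token] * b[token] for token in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve(query: str, store: list[tuple[str, Counter]], top_k: int = 1) -> list[str]:
    """Return the `top_k` chunks most similar to the query."""
    query_vector = embed(query)
    ranked = sorted(store, key=lambda item: cosine(query_vector, item[1]), reverse=True)
    return [chunk for chunk, _ in ranked[:top_k]]


def build_prompt(query: str, chunks: list[str]) -> str:
    """Combine the retrieved chunks and the user query into one prompt for the LLM."""
    context = "\n".join(chunks)
    return f"Context:\n{context}\n\nQuestion: {query}"


document = (
    "Invoices are due within thirty days of the billing date. "
    "Refunds are processed by the finance team every Friday. "
    "Support tickets are answered within one business day."
)

# Load -> chunk -> embed -> store
store = [(chunk, embed(chunk)) for chunk in chunk_text(document)]

# Query -> retrieve -> prompt (the prompt would then be sent to the LLM)
question = "When are refunds processed?"
top_chunks = retrieve(question, store)
prompt = build_prompt(question, top_chunks)
print(prompt)
```

In a real pipeline each of these toy pieces is replaced by a library component, but the flow of data stays the same.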

Now the questions are:

  • How can we perform the same operations on an Excel sheet?
  • How can we load the data?
  • How can we divide the Excel sheet into smaller chunks?

Namaste and Welcome to Build It Yourself.

In this tutorial, we will talk about how to perform RAG on an Excel sheet using LlamaParse and GPT4-o-mini.

If you are a senior IT professional looking to learn AI + LLM in simple language, check out the most up-to-date courses and other details – https://www.aimletc.com/online-instructor-led-ai-llm-coaching-for-it-technical-professionals/

Code for RAG on an Excel sheet using LlamaParse and GPT4-o-mini

You can find the code notebook here – https://github.com/tayaln/RAG-on-an-excel-sheet-using-GPT4-o-mini

To use it, download the notebook, upload it to your Google Drive, and open it in Google Colab.

So, how do we do it?

We will use 2 main classes – LlamaParse and MarkdownElementNodeParser.

LlamaParse will parse the Excel sheet and load its data.

MarkdownElementNodeParser will do the heavy lifting. It first divides the parsed data into small nodes, which are then indexed for fast retrieval.

Let us see this in detail in the code walkthrough below:
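To get a feel for what "dividing the data into nodes" means, here is a simplified sketch in plain Python. It is not the library's implementation: split_markdown_elements is a hypothetical helper that only groups markdown lines into text and table elements, whereas the real MarkdownElementNodeParser does considerably more. The sample sheet is made up.

```python
def split_markdown_elements(markdown: str) -> list[dict]:
    """Group consecutive non-blank lines into 'table' or 'text' elements."""
    elements: list[dict] = []
    for line in markdown.splitlines():
        if not line.strip():
            continue  # blank lines only separate elements
        kind = "table" if line.lstrip().startswith("|") else "text"
        if elements and elements[-1]["type"] == kind:
            # Same kind as the previous line: extend the current element
            elements[-1]["content"] += "\n" + line
        else:
            elements.append({"type": kind, "content": line})
    return elements


# A made-up sheet, already converted to markdown
sample = """# Sales

| Region | Revenue |
|--------|---------|
| North  | 120     |
| South  | 95      |

Figures are in thousands."""

elements = split_markdown_elements(sample)
for element in elements:
    first_line = element["content"].splitlines()[0]
    print(element["type"], "->", first_line)
```

The point is that a table stays together as one unit instead of being cut in the middle, which is what makes this approach suitable for spreadsheets.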

Prerequisites

– An open mind to learn new things

– OpenAI API key

– LlamaIndex Cloud API key (get it from here – https://cloud.llamaindex.ai/)

Code Walkthrough

Step 1 – Install Llama-Index and Llama-Parse.
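In Colab, the install is usually a single cell such as `!pip install llama-index llama-parse`. As a small sanity check, the sketch below reports which packages are still missing. The helper name missing_packages and the list of module names are assumptions made for this example.

```python
import importlib.util

# Top-level module names the walkthrough relies on (assumed for this sketch)
REQUIRED_MODULES = ["llama_index", "llama_parse", "nest_asyncio"]


def missing_packages(module_names: list[str]) -> list[str]:
    """Return the module names that cannot be found in the current environment."""
    return [name for name in module_names if importlib.util.find_spec(name) is None]


missing = missing_packages(REQUIRED_MODULES)
if missing:
    print("Missing modules:", ", ".join(missing))
    print("In Colab, try: !pip install llama-index llama-parse")
else:
    print("All required modules are installed.")
```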

Step 2 – Now let us see what classes we need to perform RAG on an Excel sheet.

nest_asyncio – to let LlamaParse's asynchronous calls run inside the notebook, which already has an event loop running

OpenAI – as we are using its model

VectorStoreIndex – to store the embeddings we will create

Image – to display images in Google Colab

Markdown – to display Excel data in markdown format

LlamaParse – to parse the Excel sheet

MarkdownElementNodeParser – to divide the data into nodes for indexing

Once we understand how the Excel sheet is parsed and how the data is divided, the rest of the steps are easy to follow.
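Of the items above, nest_asyncio is the least obvious one. The sketch below uses only the standard library to reproduce the problem it solves: a notebook already runs an event loop, and starting a second one inside it fails. The functions parse_job and notebook_cell are stand-ins invented for this illustration.

```python
import asyncio


async def parse_job() -> str:
    """Stand-in for an asynchronous parsing call (hypothetical)."""
    await asyncio.sleep(0)
    return "parsed"


async def notebook_cell() -> str:
    """Simulates a notebook cell, where an event loop is already running."""
    job = parse_job()
    try:
        # Trying to start a second event loop inside the running one
        return asyncio.run(job)
    except RuntimeError as exc:
        job.close()  # avoid a "never awaited" warning for the unused coroutine
        return f"RuntimeError: {exc}"


result = asyncio.run(notebook_cell())
print(result)
```

With nest_asyncio installed, calling `nest_asyncio.apply()` once at the top of the notebook patches the event loop so that this kind of nesting is allowed.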

Step 3 – Set up your OpenAI API key
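One common way to do this without pasting the key into the notebook is to read it from an environment variable and prompt only if it is missing. This is a sketch, not necessarily what the notebook does, and ensure_api_key is a helper name made up for this example. The OpenAI client reads its key from OPENAI_API_KEY.

```python
import os
from getpass import getpass


def ensure_api_key(env_var: str) -> str:
    """Return the key stored in `env_var`, prompting for it once if it is not set."""
    key = os.environ.get(env_var)
    if not key:
        key = getpass(f"Enter {env_var}: ")  # input is hidden while typing
        os.environ[env_var] = key
    return key


# Usage in a notebook cell:
# ensure_api_key("OPENAI_API_KEY")
```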

Step 4 – Set up your LlamaIndex Cloud API key and define the parser.
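A minimal sketch of this step could look like the following. It assumes the llama-parse package is installed and that the LlamaIndex Cloud key is available in the LLAMA_CLOUD_API_KEY environment variable. The wrapper parse_excel and the file name in the usage comment are hypothetical.

```python
from pathlib import Path


def parse_excel(path: str):
    """Parse an Excel file with LlamaParse and return the parsed documents."""
    file_path = Path(path)
    if not file_path.is_file():
        # Fail fast before calling the remote parsing service
        raise FileNotFoundError(f"No such file: {file_path}")

    # Imported here so a wrong path is reported even before the library loads
    from llama_parse import LlamaParse

    # result_type="markdown" asks for markdown output, which keeps tables intact;
    # the API key is read from the LLAMA_CLOUD_API_KEY environment variable
    parser = LlamaParse(result_type="markdown")
    return parser.load_data(str(file_path))


# Usage (hypothetical file name):
# documents = parse_excel("sales_data.xlsx")
# print(documents[0].text[:500])
```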

Step 5 – Define the node_parser using MarkdownElementNodeParser

After dividing the data into nodes, build the index over them using VectorStoreIndex
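Put together, this step might be sketched as below. It assumes llama-index is installed, that OPENAI_API_KEY is set, and that documents is the list returned by the parser. The wrapper build_index and the num_workers value are choices made for this example, and "gpt-4o-mini" is the model identifier used with the OpenAI API.

```python
def build_index(documents, model: str = "gpt-4o-mini"):
    """Split parsed documents into nodes and build a vector index over them."""
    if not documents:
        raise ValueError("documents must not be empty")

    # Imported here so the input check above runs even before the libraries load
    from llama_index.core import VectorStoreIndex
    from llama_index.core.node_parser import MarkdownElementNodeParser
    from llama_index.llms.openai import OpenAI

    llm = OpenAI(model=model)
    node_parser = MarkdownElementNodeParser(llm=llm, num_workers=4)

    # Divide the markdown into nodes, then separate plain nodes from table objects
    nodes = node_parser.get_nodes_from_documents(documents)
    base_nodes, objects = node_parser.get_nodes_and_objects(nodes)

    # Embeddings are created and stored when the index is built
    return VectorStoreIndex(nodes=base_nodes + objects)


# Usage:
# index = build_index(documents)
```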

Step 6 – Now everything is ready. Let’s ask a question and see whether the LLM can generate the right answer.
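Asking a question usually means turning the index into a query engine and calling query on it. The sketch below assumes index is a LlamaIndex VectorStoreIndex. The helper ask, the top_k default, and the sample question are illustrative choices, not taken from the notebook.

```python
def ask(index, question: str, llm=None, top_k: int = 5):
    """Retrieve the `top_k` most relevant nodes and let the LLM answer the question."""
    question = question.strip()
    if not question:
        raise ValueError("question must not be empty")

    engine_kwargs = {"similarity_top_k": top_k}
    if llm is not None:
        # Pass the LLM explicitly, otherwise the library default is used
        engine_kwargs["llm"] = llm

    query_engine = index.as_query_engine(**engine_kwargs)
    return query_engine.query(question)


# Usage (assumes llama-index is installed and `index` was built earlier):
# from llama_index.llms.openai import OpenAI
# response = ask(index, "What was the total revenue?", llm=OpenAI(model="gpt-4o-mini"))
# print(response)
```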

Step 7 – We can also check which nodes the LLM used to generate the answer
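In LlamaIndex, the response object carries the retrieved nodes in its source_nodes attribute, each with a similarity score. A small sketch for inspecting them is shown below. The helper summarize_sources and the preview length are made up for this example.

```python
def summarize_sources(response, max_chars: int = 200) -> list[dict]:
    """List the nodes behind an answer with their rank, score, and a short preview."""
    summary = []
    for rank, hit in enumerate(response.source_nodes, start=1):
        summary.append(
            {
                "rank": rank,
                "score": hit.score,
                # Only a preview, since table nodes can be long
                "preview": hit.node.get_content()[:max_chars],
            }
        )
    return summary


# Usage (assumes `response` came from a LlamaIndex query engine):
# for source in summarize_sources(response):
#     print(source["rank"], source["score"], source["preview"])
```

Checking the sources this way is a quick test of whether the answer really came from the right part of the sheet.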

You can also try another example mentioned in the notebook.

Hope you liked the tutorial. If you have any queries, feel free to reach out to me on LinkedIn – https://www.linkedin.com/in/nikhileshtayal/

The most up-to-date AI + LLM Coaching

In case you are looking to learn AI + Gen AI in an instructor-led live class environment, check out these courses

Happy learning!

Nikhilesh Tayal