Most up-to-date AI & Gen AI coaching for senior IT professionals
Can we access a database using an LLM? Learn how to design a Database AI Agent.
Can you access a database using a Large language model?
Before you say “Yes”, read on.
A part of the question is – can LLM write SQL queries? The answer is obviously yes.
However, there are 2 problems with accessing a database using LLMs.
- Does LLM know the exact table and column names
- How do we know whether the query LLM has generated is right or wrong
The first challenge is easy to understand. The LLM does not have information about the exact table name and column names.
And regarding the second challenge, when we ask an LLM to write a code, it can generate that and then we have an option to check whether it’s right or wrong.
However, when we ask an LLM to generate a query to access a database, writing a query is an intermediary step. We are not getting an SQL query as an output but the result is execution of that query.
And there could be a case that even if the query is wrong, we could get a result. So, how will we ensure that the query is right?
With these 2 challenges, I ask the question again – Can we access a database using an LLM?
Well, the answer is yes. However, the solution is not as straightforward as asking an LLM to write a query and access the database.
But we need to make an AI agent to do that.
In this article, we will talk about how a Database AI agent can write SQL queries and access the database.
Database AI Agent Solution – Think about how a “data analyst”
If you ask a Data analyst a the same job of accessing an unknown database then what would be his/her approach?
Data analysts would normally follow these 3 major steps
- They will first understand what the database looks like, what are the tables, their column names, and their data type. etc.
- They make exploratory queries to the ‘top k’ rows, or look at summary stats
- If they hit an error, they would analyze the error and write a new query.
So, we need to design our Agent in such a way that can imitate the above behaviour and steps. Let’s dive in.
Building a Database AI agent
Step 1 – We will use SQLDatabaseToolkit from Langchain.
What is the purpose you ask?
SQLDatabaseToolkit is used to interact with a database using natural language.
Step 2 – Writing the instructions and role for our Agent
We need to write instructions for our agents. You can write your own prompt. The below is for reference purposes. However, we would like you to note a few instructions:
- You are an agent designed to interact with an SQL database.
- Given an input question, create a syntactically correct {dialect} query to run, then look at the results of the query and return the answer.
- You have access to tools for interacting with the database.
- You MUST double check your query before executing it. If you get an error while executing a query, rewrite the query and try again.
- DO NOT make any DML statements (INSERT, UPDATE, DELETE, DROP etc.) to the database.
- DO NOT MAKE UP AN ANSWER OR USE PRIOR KNOWLEDGE, ONLY USE THE RESULTS OF THE CALCULATIONS YOU HAVE DONE.
- ALWAYS, as part of your final answer, explain how you got to the answer on a section that starts with: “Explanation:”. Include the SQL query as part of the explanation section.
Step 3 – Writing the format instructions for our Agent
The common way to write this is
- Question
- Thought
- Action
- Action Input
- Observation
- Final Answer
Question is what Agent needs to answer. This is what we want to achieve through the agent.
Thought is what the Agent needs to think to answer the question. It needs to decide whether to use a tool or not.
Action is when the Agent accesses the tool. It could be accessing Google search or a database or a Python Interpreter or any other website.
Action Input is what the Agent access the tool with.
Observation is when the Agent gets the information after accessing the tool
Thought is where the Agent self-analyses the answer. If it thinks the answer is not correct, it could again access the tool to get the correct answer.
Final Answer – Once the agent is sure that it has got the correct answer then it goes ahead and provides the final answer
You could also give an example or two for the agent to understand this complete flow and follow this.
Step 4 – Defining the Agent
Now, let us see how this Agent will go about it’s work
How Database AI agent would work
Agent’s Step 1 – Check the Table
As we discussed earlier, the first thing a data analyst would do is understand the database, i.e. tables, column names, data types, etc.
So, that is what the Agent would also do
Agent’s Step 2 – Check the first few rows
After that agent would go ahead and check the first few rows of data. In the below example, you can see, that it has access first 3 rows from the relevant table
Agent’s Step 3 – Verify the query
As we can see, the agent has first written an SQL query and then it got it verified from the query checker.
Agent’s Step 4 – Provide the output
Once the query is verified, as a final step, the agent will provide the final answer by executing the query.
Conclusion
Writing code is pretty straightforward using an LLM while writing an SQL query or accessing a database is not. LLM needs to think and work like a data analyst.
AI + Gen AI Course for Senior IT Professionals
In case you are looking to learn AI + Gen AI in an instructor-led live class environment, check out these dedicated courses for senior IT professionals here
Pricing for AI courses for senior IT professionals – https://www.aimletc.com/ai-ml-etc-course-offerings-pricing/
My Name is Nikhilesh and if you have any feedback/ suggestion on this article, please feel free to connect with me – https://www.linkedin.com/in/nikhileshtayal/
Disclaimer – The code is taken from Deep Learning AI’s Building Your Own Database Agent course. We are just using it for educational purposes. No copyright infringement is intended. In case any part of content belongs to you or someone you know, please contact us and we will give you credit or remove your content.