diff --git a/adi_function_app/GETTING_STARTED.md b/adi_function_app/GETTING_STARTED.md new file mode 100644 index 00000000..331158ff --- /dev/null +++ b/adi_function_app/GETTING_STARTED.md @@ -0,0 +1,10 @@ +# Getting Started with Document Intelligence Function App + +To get started, perform the following steps: + +1. Set up Azure OpenAI in your subscription with **gpt-4o-mini** & an embedding model, a Python Function App, AI Search and a storage account. +2. Clone this repository and deploy the AI Search rag documents indexes from `deploy_ai_search`. +3. Run `uv sync` within the adi_function_app directory to install dependencies. +4. Configure the environment variables of the function app based on the provided sample. +5. Package your Azure Function and upload it to your Function App. +6. Upload a document for indexing or send a direct HTTP request to the Azure Function. diff --git a/adi_function_app/README.md b/adi_function_app/README.md index 673d8a68..9b43e508 100644 --- a/adi_function_app/README.md +++ b/adi_function_app/README.md @@ -42,6 +42,9 @@ The properties returned from the ADI Custom Skill and Chunking are then used to - Keyphrase extraction - Vectorisation +> [!NOTE] +> See `GETTING_STARTED.md` for a step-by-step guide on how to use the accelerator. + ## Sample Output Using the [Phi-3 Technical Report: A Highly Capable Language Model Locally on Your Phone](https://arxiv.org/pdf/2404.14219) as an example, the following output can be obtained for page 7: diff --git a/text_2_sql/GETTING_STARTED.md b/text_2_sql/GETTING_STARTED.md new file mode 100644 index 00000000..a39db016 --- /dev/null +++ b/text_2_sql/GETTING_STARTED.md @@ -0,0 +1,11 @@ +# Getting Started with Agentic Text2SQL Component + +To get started, perform the following steps: + +1. Set up Azure OpenAI in your subscription with **gpt-4o-mini** & an embedding model, alongside a SQL Server sample database, AI Search and a storage account. +2. 
Clone this repository and deploy the AI Search text2sql indexes from `deploy_ai_search`. +3. Run `uv sync` within the text_2_sql directory to install dependencies. +4. Configure the .env file based on the provided sample. +5. Generate a data dictionary for your target server using the instructions in `data_dictionary`. +6. Upload these data dictionaries to the relevant containers in your storage account. Wait for them to be automatically indexed. +7. Navigate to the `autogen` directory to view the AutoGen implementation. Follow the steps in `Iteration 5 - Agentic Vector Based Text2SQL.ipynb` to get started. diff --git a/text_2_sql/README.md b/text_2_sql/README.md index fc25680b..c3c2b0f1 100644 --- a/text_2_sql/README.md +++ b/text_2_sql/README.md @@ -4,7 +4,9 @@ This portion of the repo contains code to implement a multi-shot approach to Tex The sample provided works with Azure SQL Server, although it has been easily adapted to other SQL sources such as Snowflake. -**Three iterations on the approach are provided for SQL query generation. A prompt based approach and a two vector database based approaches. See Multi-Shot Approach for more details** +> [!NOTE] +> +> - Previous versions of this approach have now been moved to `previous_iterations/semantic_kernel`. These will not be updated. ## High Level Workflow @@ -12,6 +14,9 @@ The following diagram shows a workflow for how the Text2SQL plugin would be inco ![High level workflow for a plugin driven RAG application](../images/Plugin%20Based%20RAG%20Flow.png "High Level Workflow") +> [!NOTE] +> See `GETTING_STARTED.md` for a step-by-step guide on how to use the accelerator. + ## Why Text2SQL instead of indexing the database contents? Generating SQL queries and executing them to provide context for the RAG application provided several benefits in the use case this was designed for. @@ -35,13 +40,7 @@ To solve these issues, a Multi-Shot approach is developed. 
Below is the iteratio ![Comparison between a common Text2SQL approach and a Multi-Shot Text2SQL approach.](./images/Text2SQL%20Approaches.png "Multi Shot SQL Approaches") -Three different iterations are presented and code provided for: - - **Iteration 2:** Injection of a brief description of the available entities is injected into the prompt. This limits the number of tokens used and avoids filling the prompt with confusing schema information. - - **Iteration 3:** Indexing the entity definitions in a vector database, such as AI Search, and querying it to retrieve the most relevant entities for the key terms from the query. - - **Iteration 4:** Keeping an index of commonly asked questions and which schema / SQL query they resolve to - this index is generated by the LLM when it encounters a question that has not been previously asked. Additionally, indexing the entity definitions in a vector database, such as AI Search _(same as Iteration 3)_. First querying this index to see if a similar SQL query can be obtained _(if high probability of exact SQL query match, the results can be pre-fetched)_. If not, falling back to the schema index, and querying it to retrieve the most relevant entities for the key terms from the query. - - **Iteration 5:** Moves the Iteration 4 approach into a multi-agent approach for improved reasoning and query generation. With separation into agents, different agents can focus on one task only, and provide a better overall flow and response quality. See more details below. - -All approaches limit the number of tokens used and avoids filling the prompt with confusing schema information. +Our approach has evolved, as the system has matured, into a multi-agent approach that brings improved reasoning, speed and instruction-following capabilities. With separation into agents, each agent can focus on one task only, providing a better overall flow and response quality. 
Using Auto-Function calling capabilities, the LLM is able to retrieve from the plugin the full schema information for the views / tables that it considers useful for answering the question. Once retrieved, the full SQL query can then be generated. The schemas for multiple views / tables can be retrieved to allow the LLM to perform joins and other complex queries. @@ -61,6 +60,10 @@ As the query cache is shared between users (no data is stored in the cache), a n ![Vector Based with Query Cache Logical Flow.](./images/Agentic%20Text2SQL%20Query%20Cache.png "Agentic Vector Based with Query Cache Logical Flow") +#### Parallel execution + +After the first agent has rewritten and decomposed the user input, we execute each of the individual questions in parallel to generate an answer in the shortest possible time. + ### Caching Strategy The cache strategy implementation is a simple way to prove that the system works. You can adopt several different strategies for cache population. Below are some of the strategies that could be used: @@ -70,53 +73,12 @@ The cache strategy implementation is a simple way to prove that the system works - **Positive Indication System:** Only update the cache when a user positively reacts to a question e.g. a thumbs up from the UI or doesn't ask a follow-up question. - **Always update:** Always add all questions into the cache when they are asked. The sample code in the repository currently implements this approach, but this could lead to poor SQL queries reaching the cache. One of the other caching strategies would be a better production version. -### Comparison of Iterations -| | Common Text2SQL Approach | Prompt Based Multi-Shot Text2SQL Approach | Vector Based Multi-Shot Text2SQL Approach | Vector Based Multi-Shot Text2SQL Approach With Query Cache | Agentic Vector Based Multi-Shot Text2SQL Approach With Query Cache | -|-|-|-|-|-|-| -|**Advantages** | Fast for a limited number of entities. | Significant reduction in token usage. 
| Significant reduction in token usage. | Significant reduction in token usage. -| | | | Scales well to multiple entities. | Scales well to multiple entities. | Scales well to multiple entities with small agents. | -| | | | Uses a vector approach to detect the best fitting entity which is faster than using an LLM. Matching is offloaded to AI Search. | Uses a vector approach to detect the best fitting entity which is faster than using an LLM. Matching is offloaded to AI Search. | Uses a vector approach to detect the best fitting entity which is faster than using an LLM. Matching is offloaded to AI Search. | -| | | | | Significantly faster to answer similar questions as best fitting entity detection is skipped. Observed tests resulted in almost half the time for final output compared to the previous iteration. | Significantly faster to answer similar questions as best fitting entity detection is skipped. Observed tests resulted in almost half the time for final output compared to the previous iteration. | -| | | | | Significantly faster execution time for known questions. Total execution time can be reduced by skipping the query generation step. | Significantly faster execution time for known questions. Total execution time can be reduced by skipping the query generation step. | -| | | | | | Instruction following and accuracy is improved by decomposing the task into smaller tasks. | -| | | | | | Handles query decomposition for complex questions. | -|**Disadvantages** | Slows down significantly as the number of entities increases. | Uses LLM to detect the best fitting entity which is slow compared to a vector approach. | AI Search adds additional cost to the solution. | Slower than other approaches for the first time a question with no similar questions in the cache is asked. | Slower than other approaches for the first time a question with no similar questions in the cache is asked. | -| | Consumes a significant number of tokens as number of entities increases. 
| As number of entities increases, token usage will grow but at a lesser rate than Iteration 1. | | AI Search adds additional cost to the solution. | AI Search and multiple agents adds additional cost to the solution. | -| | LLM struggled to differentiate which table to choose with the large amount of information passed. | | | | -|**Code Availability**| | | | | -| Semantic Kernel | Yes :heavy_check_mark: | Yes :heavy_check_mark: | Yes :heavy_check_mark: | Yes :heavy_check_mark: | | -| LangChain | | | | | | -| AutoGen | | | | | Yes :heavy_check_mark: | - -### Complete Execution Time Comparison for Approaches - -To compare the different in complete execution time, the following questions were tested 25 times each for 4 different approaches. - -Approaches: -- Prompt-based Multi-Shot (Iteration 2) -- Vector-Based Multi-Shot (Iteration 3) -- Vector-Based Multi-Shot with Query Cache (Iteration 4) -- Vector-Based Multi-shot with Pre Run Query Cache (Iteration 4) - -Questions: -- What is the total revenue in June 2008? -- Give me the total number of orders in 2008? -- Which country did had the highest number of orders in June 2008? - -The graph below shows the response times for the experimentation on a Known Question Set (i.e. the cache has already been populated with the query mapping by the LLM). gpt-4o was used as the completion LLM for this experiment. The response time is the complete execution time including: - -- Prompt Preparation -- Question Understanding -- Cache Index Requests _(if applicable)_ -- SQL Query Execution -- Interpretation and generation of answer in the correct format - -![Response Time Distribution](./images/Known%20Question%20Response%20Time.png "Response Time Distribution By Approach") - -The vector-based cache approaches consistently outperform those that just use a Prompt-Based or Vector-Based approach by a significant margin. 
Given that it is highly likely the same Text2SQL questions will be repeated often, storing the question-sql mapping leads to **significant performance increases** that are beneficial, despite the initial additional latency (between 1 - 2 seconds from testing) when a question is asked the first time. - ## Sample Output +> [!NOTE] +> +> - Full payloads for input / outputs can be found in `text_2_sql_core/src/text_2_sql_core/payloads/interaction_payloads.py`. + ### What is the top performing product by quantity of units sold? #### SQL Query Generated @@ -130,14 +92,12 @@ The vector-based cache approaches consistently outperform those that just use a "answer": "The top-performing product by quantity of units sold is the **Classic Vest, S** from the **Classic Vest** product model, with a total of 87 units sold [1][2].", "sources": [ { - "title": "Sales Order Detail", - "chunk": "| ProductID | TotalUnitsSold |\n|-----------|----------------|\n| 864 | 87 |\n", - "reference": "SELECT TOP 1 ProductID, SUM(OrderQty) AS TotalUnitsSold FROM SalesLT.SalesOrderDetail GROUP BY ProductID ORDER BY TotalUnitsSold DESC;" + "sql_rows": "| ProductID | TotalUnitsSold |\n|-----------|----------------|\n| 864 | 87 |\n", + "sql_query": "SELECT TOP 1 ProductID, SUM(OrderQty) AS TotalUnitsSold FROM SalesLT.SalesOrderDetail GROUP BY ProductID ORDER BY TotalUnitsSold DESC;" }, { - "title": "Product and Description", - "chunk": "| Name | ProductModel |\n|----------------|---------------|\n| Classic Vest, S| Classic Vest |\n", - "reference": "SELECT Name, ProductModel FROM SalesLT.vProductAndDescription WHERE ProductID = 864;" + "sql_rows": "| Name | ProductModel |\n|----------------|---------------|\n| Classic Vest, S| Classic Vest |\n", + "sql_query": "SELECT Name, ProductModel FROM SalesLT.vProductAndDescription WHERE ProductID = 864;" } ] } @@ -159,6 +119,10 @@ The top-performing product by quantity of units sold is the **Classic Vest, S** |----------------|---------------| | Classic Vest, 
S| Classic Vest | +## Disambiguation Requests + +If the LLM is unable to understand or answer the question asked, it can ask the user follow-up questions via a DisambiguationRequest. In cases where multiple columns may be the correct one, or where the user may be referring to several different filter values, the LLM can produce a series of options for the end user to select from. + ## Data Dictionary ### entities.json @@ -233,28 +197,9 @@ Below is a sample entry for a view / table that we which to expose to the LLM. T See `./data_dictionary` for more details on how the data dictionary is structured and ways to **automatically generate it**. -## Prompt Based SQL Plugin (Iteration 2) - -This approach works well for a small number of entities (tested on up to 20 entities with hundreds of columns). It performed well on the testing, with correct metadata, we achieved 100% accuracy on the test set. - -Whilst a simple and high performing approach, the downside of this approach is the increase in number of tokens as the number of entities increases. Additionally, we found that the LLM started to get "confused" on which columns belong to which entities as the number of entities increased. - -## Vector Based SQL Plugin (Iterations 3 & 4) - -This approach allows the system to scale without significantly increasing the number of tokens used within the system prompt. Indexing and running an AI Search instance consumes additional cost, compared to the prompt based approach. - -If the query cache is enabled, we used a vector search to find the similar previously asked questions and the queries / schemas they map to. In the case of a high probability of a match, the results can be pre-run with the stored query and passed to the LLM alongside the query. If the results can answer the question, query generation can be skipped all together, speeding up the total execution time. 
- -In the case of an unknown question, there is a minor increase in latency but the query index cache could be pre-populated before it is released to users with common questions. - -The following environmental variables control the behaviour of the Vector Based Text2SQL generation: - -- **Text2Sql__UseQueryCache** - controls whether the query cached index is checked before using the standard schema index. -- **Text2Sql__PreRunQueryCache** - controls whether the top result from the query cache index (if enabled) is pre-fetched against the data source to include the results in the prompt. - ## Agentic Vector Based Approach (Iteration 5) -This approach builds on the the Vector Based SQL Plugin approach, but adds a agentic approach to the solution. +This approach builds on the Vector Based SQL Plugin approach that was previously developed, but adds an agentic approach to the solution. This agentic system contains the following agents: @@ -267,16 +212,6 @@ This agentic system contains the following agents: The combination of these agents allows the system to answer complex questions, whilst staying under the token limits when including the database schemas. The query cache ensures that previously asked questions can be answered quickly to avoid degrading user experience. -## Code Availability - -| | Common Text2SQL Approach | Prompt Based Multi-Shot Text2SQL Approach | Vector Based Multi-Shot Text2SQL Approach | Vector Based Multi-Shot Text2SQL Approach With Query Cache | Agentic Vector Based Multi-Shot Text2SQL Approach With Query Cache | -|-|-|-|-|-|-| -| Semantic Kernel | Yes :heavy_check_mark: | Yes :heavy_check_mark: | Yes :heavy_check_mark: | Yes :heavy_check_mark: | | -| LangChain | | | | | | -| AutoGen | | | | | Yes :heavy_check_mark: | - -See the relevant directory for the code in the provided framework. - ## Tips for good Text2SQL performance. 
- Pre-assemble views to avoid the LLM having to make complex joins between multiple tables diff --git a/text_2_sql/autogen/README.md b/text_2_sql/autogen/README.md index f76df088..68e2da90 100644 --- a/text_2_sql/autogen/README.md +++ b/text_2_sql/autogen/README.md @@ -163,8 +163,7 @@ The system produces standardized JSON output through the Answer and Sources Agen "sources": [ { "sql_query": "The SQL query used", - "sql_rows": ["Array of result rows"], - "markdown_table": "Formatted markdown table of results" + "sql_rows": ["Array of result rows"] } ] } diff --git a/text_2_sql/images/Agentic Text2SQL Query Cache.png b/text_2_sql/images/Agentic Text2SQL Query Cache.png index f91f57db..a893bdaf 100644 Binary files a/text_2_sql/images/Agentic Text2SQL Query Cache.png and b/text_2_sql/images/Agentic Text2SQL Query Cache.png differ diff --git a/text_2_sql/images/Text2SQL Approaches.png b/text_2_sql/images/Text2SQL Approaches.png index 2dd23892..29578df2 100644 Binary files a/text_2_sql/images/Text2SQL Approaches.png and b/text_2_sql/images/Text2SQL Approaches.png differ diff --git a/text_2_sql/previous_iterations/semantic_kernel_text_2_sql/Iteration 2 - Prompt Based Text2SQL.ipynb b/text_2_sql/previous_iterations/semantic_kernel/Iteration 2 - Prompt Based Text2SQL.ipynb similarity index 100% rename from text_2_sql/previous_iterations/semantic_kernel_text_2_sql/Iteration 2 - Prompt Based Text2SQL.ipynb rename to text_2_sql/previous_iterations/semantic_kernel/Iteration 2 - Prompt Based Text2SQL.ipynb diff --git a/text_2_sql/previous_iterations/semantic_kernel_text_2_sql/Iterations 3 & 4 - Vector Based Text2SQL.ipynb b/text_2_sql/previous_iterations/semantic_kernel/Iterations 3 & 4 - Vector Based Text2SQL.ipynb similarity index 100% rename from text_2_sql/previous_iterations/semantic_kernel_text_2_sql/Iterations 3 & 4 - Vector Based Text2SQL.ipynb rename to text_2_sql/previous_iterations/semantic_kernel/Iterations 3 & 4 - Vector Based Text2SQL.ipynb diff --git 
a/text_2_sql/previous_iterations/semantic_kernel/README.md b/text_2_sql/previous_iterations/semantic_kernel/README.md new file mode 100644 index 00000000..c23396aa --- /dev/null +++ b/text_2_sql/previous_iterations/semantic_kernel/README.md @@ -0,0 +1,147 @@ +# Multi-Shot Text2SQL Component - Semantic Kernel + +The implementation is written for [Semantic Kernel](https://github.com/microsoft/semantic-kernel) in Python, although it can easily be adapted for C#. + +**The provided Semantic Kernel code implements Iterations 2, 3 & 4. This section of the repo is an archive of previous iterations. No updates or changes will be made.** + +## Full Logical Flow for Vector Based Approach + +The following diagram shows the logical flow within the Vector Based plugin. In an ideal scenario, the questions will follow the **Pre-Fetched Cache Results Path** which leads to the quickest answer generation. In cases where the question is not known, the plugin will fall back to the other paths accordingly and generate the SQL query using the LLM. + +As the query cache is shared between users (no data is stored in the cache), a new user can benefit from the pre-mapped question and schema resolution in the index. There are multiple possible strategies for updating the query cache; see the possible options in the Text2SQL README. + +**Database results were deliberately not stored within the cache. Storing them would have removed one of the key benefits of the Text2SQL plugin, the ability to get near-real time information inside a RAG application. Instead, the query is stored so that the most-recent results can be obtained quickly. Additionally, this retains the ability to apply Row or Column Level Security.** + +![Vector Based with Query Cache Logical Flow.](../images/Text2SQL%20Query%20Cache.png "Vector Based with Query Cache Logical Flow") + +## Previous Approaches + + - **Iteration 2:** A brief description of the available entities is injected into the prompt. 
This limits the number of tokens used and avoids filling the prompt with confusing schema information. + - **Iteration 3:** Indexing the entity definitions in a vector database, such as AI Search, and querying it to retrieve the most relevant entities for the key terms from the query. + - **Iteration 4:** Keeping an index of commonly asked questions and which schema / SQL query they resolve to - this index is generated by the LLM when it encounters a question that has not been previously asked. Additionally, indexing the entity definitions in a vector database, such as AI Search _(same as Iteration 3)_. First querying this index to see if a similar SQL query can be obtained _(if high probability of exact SQL query match, the results can be pre-fetched)_. If not, falling back to the schema index, and querying it to retrieve the most relevant entities for the key terms from the query. + +### Comparison of Iterations +| | Common Text2SQL Approach | Prompt Based Multi-Shot Text2SQL Approach | Vector Based Multi-Shot Text2SQL Approach | Vector Based Multi-Shot Text2SQL Approach With Query Cache | Agentic Vector Based Multi-Shot Text2SQL Approach With Query Cache | +|-|-|-|-|-|-| +|**Advantages** | Fast for a limited number of entities. | Significant reduction in token usage. | Significant reduction in token usage. | Significant reduction in token usage. +| | | | Scales well to multiple entities. | Scales well to multiple entities. | Scales well to multiple entities with small agents. | +| | | | Uses a vector approach to detect the best fitting entity which is faster than using an LLM. Matching is offloaded to AI Search. | Uses a vector approach to detect the best fitting entity which is faster than using an LLM. Matching is offloaded to AI Search. | Uses a vector approach to detect the best fitting entity which is faster than using an LLM. Matching is offloaded to AI Search. 
| +| | | | | Significantly faster to answer similar questions as best fitting entity detection is skipped. Observed tests resulted in almost half the time for final output compared to the previous iteration. | Significantly faster to answer similar questions as best fitting entity detection is skipped. Observed tests resulted in almost half the time for final output compared to the previous iteration. | +| | | | | Significantly faster execution time for known questions. Total execution time can be reduced by skipping the query generation step. | Significantly faster execution time for known questions. Total execution time can be reduced by skipping the query generation step. | +| | | | | | Instruction following and accuracy is improved by decomposing the task into smaller tasks. | +| | | | | | Handles query decomposition for complex questions. | +|**Disadvantages** | Slows down significantly as the number of entities increases. | Uses LLM to detect the best fitting entity which is slow compared to a vector approach. | AI Search adds additional cost to the solution. | Slower than other approaches for the first time a question with no similar questions in the cache is asked. | Slower than other approaches for the first time a question with no similar questions in the cache is asked. | +| | Consumes a significant number of tokens as number of entities increases. | As number of entities increases, token usage will grow but at a lesser rate than Iteration 1. | | AI Search adds additional cost to the solution. | AI Search and multiple agents adds additional cost to the solution. | +| | LLM struggled to differentiate which table to choose with the large amount of information passed. 
| | | | +|**Code Availability**| | | | | +| Semantic Kernel | Yes :heavy_check_mark: | Yes :heavy_check_mark: | Yes :heavy_check_mark: | Yes :heavy_check_mark: | | +| LangChain | | | | | | +| AutoGen | | | | | Yes :heavy_check_mark: | + +### Complete Execution Time Comparison for Approaches + +To compare the difference in complete execution time, the following questions were tested 25 times each for 4 different approaches. + +Approaches: +- Prompt-based Multi-Shot (Iteration 2) +- Vector-Based Multi-Shot (Iteration 3) +- Vector-Based Multi-Shot with Query Cache (Iteration 4) +- Vector-Based Multi-shot with Pre Run Query Cache (Iteration 4) + +Questions: +- What is the total revenue in June 2008? +- Give me the total number of orders in 2008? +- Which country had the highest number of orders in June 2008? + +The graph below shows the response times for the experimentation on a Known Question Set (i.e. the cache has already been populated with the query mapping by the LLM). gpt-4o was used as the completion LLM for this experiment. The response time is the complete execution time including: + +- Prompt Preparation +- Question Understanding +- Cache Index Requests _(if applicable)_ +- SQL Query Execution +- Interpretation and generation of answer in the correct format + +![Response Time Distribution](./../../images/Known%20Question%20Response%20Time.png "Response Time Distribution By Approach") + +The vector-based cache approaches consistently outperform those that just use a Prompt-Based or Vector-Based approach by a significant margin. Given that it is highly likely the same Text2SQL questions will be repeated often, storing the question-SQL mapping leads to **significant performance increases** that are beneficial, despite the initial additional latency (between 1 - 2 seconds from testing) when a question is asked the first time. 
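The repeated-question timing described above can be sketched with a small harness like the following (a minimal illustration, not the repository's `time_comparison_script.py`; the `run_question` callable is a stand-in for any of the four approaches):

```python
import statistics
import time


def time_approach(run_question, questions, repeats=25):
    """Return the median end-to-end execution time per question, in seconds.

    run_question is any callable that takes a question string and returns an
    answer; it stands in for one of the Text2SQL approaches being compared.
    """
    medians = {}
    for question in questions:
        timings = []
        for _ in range(repeats):
            start = time.perf_counter()
            run_question(question)
            timings.append(time.perf_counter() - start)
        medians[question] = statistics.median(timings)
    return medians
```

The median (rather than the mean) keeps one slow outlier call, such as a cold cache lookup, from skewing the comparison.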
+ +## Prompt Based SQL Plugin (Iteration 2) + +This approach works well for a small number of entities (tested on up to 20 entities with hundreds of columns). It performed well in testing; with correct metadata, we achieved 100% accuracy on the test set. + +Whilst a simple and high-performing approach, the downside of this approach is the increase in number of tokens as the number of entities increases. Additionally, we found that the LLM started to get "confused" on which columns belong to which entities as the number of entities increased. + +## Vector Based SQL Plugin (Iterations 3 & 4) + +This approach allows the system to scale without significantly increasing the number of tokens used within the system prompt. Indexing and running an AI Search instance consumes additional cost, compared to the prompt based approach. + +If the query cache is enabled, we used a vector search to find similar previously asked questions and the queries / schemas they map to. In the case of a high probability of a match, the results can be pre-run with the stored query and passed to the LLM alongside the query. If the results can answer the question, query generation can be skipped altogether, speeding up the total execution time. + +In the case of an unknown question, there is a minor increase in latency but the query index cache could be pre-populated before it is released to users with common questions. + +The following environmental variables control the behaviour of the Vector Based Text2SQL generation: + +- **Text2Sql__UseQueryCache** - controls whether the query cache index is checked before using the standard schema index. +- **Text2Sql__PreRunQueryCache** - controls whether the top result from the query cache index (if enabled) is pre-fetched against the data source to include the results in the prompt. 
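As a rough sketch, these two flags could be read as follows; the parsing helper and the accepted truthy spellings are assumptions, only the variable names come from the list above:

```python
import os


def env_flag(name: str, default: str = "false") -> bool:
    # Treat common truthy spellings found in .env files as True.
    return os.environ.get(name, default).strip().lower() in {"1", "true", "yes"}


# Flags controlling the Vector Based Text2SQL behaviour.
use_query_cache = env_flag("Text2Sql__UseQueryCache")
pre_run_query_cache = env_flag("Text2Sql__PreRunQueryCache")

# Pre-running cached queries only applies when the cache itself is enabled.
pre_run_query_cache = pre_run_query_cache and use_query_cache
```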
+ +## Provided Notebooks & Scripts + +- `./Iteration 2 - Prompt Based Text2SQL.ipynb` provides an example of how to utilise the Prompt Based Text2SQL plugin to query the database. +- `./Iterations 3 & 4 - Vector Based Text2SQL.ipynb` provides an example of how to utilise the Vector Based Text2SQL plugin to query the database. The query cache plugin will be enabled or disabled depending on the environment variables. +- `./time_comparison_script.py` provides a utility script for performing time-based comparisons between the different approaches. + +### ai-search.py + +This util file contains helper functions for interacting with AI Search. + +## Plugins + +### prompt_based_sql_plugin.py + +The `./plugins/prompt_based_sql_plugin/prompt_based_sql_plugin.py` contains 3 key methods to power the Prompt Based Text2SQL engine. + +#### system_prompt() + +This method takes the loaded `entities.json` file and generates a system prompt based on it. Here, the **EntityName** and **Description** are used to build a list of available entities for the LLM to select. + +This is then inserted into a pre-made Text2SQL generation prompt that already contains optimised and working instructions for the LLM. This system prompt for the plugin is added to the main prompt file at runtime. + +The **target_engine** is passed to the prompt, along with **engine_specific_rules** to ensure that the SQL queries generated work on the target engine. + +#### get_entity_schema() + +This method is called by the Semantic Kernel framework automatically, when instructed to do so by the LLM, to fetch the full schema definitions for a given entity. This returns a JSON string of the chosen entity which allows the LLM to understand the column definitions and their associated metadata. This can be called in parallel for multiple entities. 
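The entity list described for `system_prompt()` could be assembled along these lines. This is a hedged sketch, not the plugin's actual implementation: the prompt wording is invented, and only the **EntityName** and **Description** fields come from the description above:

```python
import json


def build_entity_overview(entities: list[dict]) -> str:
    """Build the 'available entities' portion of the system prompt.

    Each entry is assumed to carry the EntityName and Description fields
    described above for entities.json.
    """
    lines = ["You may query the following entities:"]
    for entity in entities:
        lines.append(f"- {entity['EntityName']}: {entity['Description']}")
    return "\n".join(lines)


# Hypothetical single-entity entities.json content for illustration.
entities = json.loads(
    '[{"EntityName": "SalesOrderDetail", "Description": "Line items for each sales order."}]'
)
```

Keeping only the name and a one-line description per entity is what holds the token count down until the LLM asks for a full schema.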
+ +#### run_sql_query() + +This method is called by the Semantic Kernel framework automatically, when instructed to do so by the LLM, to run a SQL query against the given database. It returns a JSON string containing a row-wise dump of the results returned. These results are then interpreted to answer the question. + +### vector_based_sql_plugin.py + +The `./plugins/vector_based_sql_plugin/vector_based_sql_plugin.py` contains 3 key methods to power the Vector Based Text2SQL engine. + +#### system_prompt() + +This method simply returns a pre-made system prompt that contains optimised and working instructions for the LLM. This system prompt for the plugin is added to the main prompt file at runtime. + +The **target_engine** is passed to the prompt, along with **engine_specific_rules** to ensure that the SQL queries generated work on the target engine. + +**If the query cache is enabled, the prompt is adjusted to instruct the LLM to look at the cached data and results first, before calling `get_entity_schema()`.** + +#### get_entity_schema() + +This method is called by the Semantic Kernel framework automatically, when instructed to do so by the LLM, to search the AI Search instance with the given text. The LLM is able to pass the key terms from the user query, and retrieve a ranked list of the most suitable entities to answer the question. + +The search text passed is vectorised against the entity-level **Description** columns. A hybrid Semantic Reranking search is applied against the **EntityName**, **Entity**, **Columns/Name** fields. + +#### fetch_queries_from_cache() + +The vector-based approach with query cache uses the `fetch_queries_from_cache()` method to fetch the most relevant previous query and inject it into the prompt before the initial LLM call. The use of Auto-Function Calling here is avoided to reduce the response time as the cache index will always be used first. 
+ +If the score of the top result is higher than the defined threshold, the query will be executed against the target data source and the results included in the prompt. This allows us to prompt the LLM to evaluate whether it can use these results to answer the question, **without further SQL query generation**, to speed up the process. + +#### run_sql_query() + +This method is called automatically by the Semantic Kernel framework, when instructed to do so by the LLM, to run a SQL query against the given database. It returns a JSON string containing a row-wise dump of the returned results. These results are then interpreted to answer the question. + +Additionally, if any of the cache functionality is enabled, this method will update the query cache index based on the SQL query run and the schemas used in execution. diff --git a/text_2_sql/previous_iterations/semantic_kernel_text_2_sql/plugins/prompt_based_sql_plugin/__init__.py b/text_2_sql/previous_iterations/semantic_kernel/plugins/prompt_based_sql_plugin/__init__.py similarity index 100% rename from text_2_sql/previous_iterations/semantic_kernel_text_2_sql/plugins/prompt_based_sql_plugin/__init__.py rename to text_2_sql/previous_iterations/semantic_kernel/plugins/prompt_based_sql_plugin/__init__.py diff --git a/text_2_sql/previous_iterations/semantic_kernel_text_2_sql/plugins/prompt_based_sql_plugin/prompt_based_sql_plugin.py b/text_2_sql/previous_iterations/semantic_kernel/plugins/prompt_based_sql_plugin/prompt_based_sql_plugin.py similarity index 100% rename from text_2_sql/previous_iterations/semantic_kernel_text_2_sql/plugins/prompt_based_sql_plugin/prompt_based_sql_plugin.py rename to text_2_sql/previous_iterations/semantic_kernel/plugins/prompt_based_sql_plugin/prompt_based_sql_plugin.py diff --git a/text_2_sql/previous_iterations/semantic_kernel_text_2_sql/plugins/vector_based_sql_plugin/__init__.py b/text_2_sql/previous_iterations/semantic_kernel/plugins/vector_based_sql_plugin/__init__.py similarity
index 100% rename from text_2_sql/previous_iterations/semantic_kernel_text_2_sql/plugins/vector_based_sql_plugin/__init__.py rename to text_2_sql/previous_iterations/semantic_kernel/plugins/vector_based_sql_plugin/__init__.py diff --git a/text_2_sql/previous_iterations/semantic_kernel_text_2_sql/plugins/vector_based_sql_plugin/vector_based_sql_plugin.py b/text_2_sql/previous_iterations/semantic_kernel/plugins/vector_based_sql_plugin/vector_based_sql_plugin.py similarity index 100% rename from text_2_sql/previous_iterations/semantic_kernel_text_2_sql/plugins/vector_based_sql_plugin/vector_based_sql_plugin.py rename to text_2_sql/previous_iterations/semantic_kernel/plugins/vector_based_sql_plugin/vector_based_sql_plugin.py diff --git a/text_2_sql/previous_iterations/semantic_kernel_text_2_sql/prompt.yaml b/text_2_sql/previous_iterations/semantic_kernel/prompt.yaml similarity index 100% rename from text_2_sql/previous_iterations/semantic_kernel_text_2_sql/prompt.yaml rename to text_2_sql/previous_iterations/semantic_kernel/prompt.yaml diff --git a/text_2_sql/previous_iterations/semantic_kernel_text_2_sql/time_comparison_script.py b/text_2_sql/previous_iterations/semantic_kernel/time_comparison_script.py similarity index 100% rename from text_2_sql/previous_iterations/semantic_kernel_text_2_sql/time_comparison_script.py rename to text_2_sql/previous_iterations/semantic_kernel/time_comparison_script.py diff --git a/text_2_sql/previous_iterations/semantic_kernel_text_2_sql/README.md b/text_2_sql/previous_iterations/semantic_kernel_text_2_sql/README.md deleted file mode 100644 index 94ee633a..00000000 --- a/text_2_sql/previous_iterations/semantic_kernel_text_2_sql/README.md +++ /dev/null @@ -1,77 +0,0 @@ -# Multi-Shot Text2SQL Component - Semantic Kernel - -The implementation is written for [Semantic Kernel](https://github.com/microsoft/semantic-kernel) in Python, although it can easily be adapted for C#. 
- -**The provided Semantic Kernel code implements Iterations 2, 3 & 4. This section of the repo is an archive of previous iterations. No updates or changes will be made.** - -## Full Logical Flow for Vector Based Approach - -The following diagram shows the logical flow within the Vector Based plugin. In an ideal scenario, the questions will follow the _Pre-Fetched Cache Results Path** which leads to the quickest answer generation. In cases where the question is not known, the plugin will fall back the other paths accordingly and generate the SQL query using the LLMs. - -As the query cache is shared between users (no data is stored in the cache), a new user can benefit from the pre-mapped question and schema resolution in the index. There are multiple possible strategies for updating the query cache, see the possible options in the Text2SQL README. - -**Database results were deliberately not stored within the cache. Storing them would have removed one of the key benefits of the Text2SQL plugin, the ability to get near-real time information inside a RAG application. Instead, the query is stored so that the most-recent results can be obtained quickly. Additionally, this retains the ability to apply Row or Column Level Security.** - -![Vector Based with Query Cache Logical Flow.](../images/Text2SQL%20Query%20Cache.png "Vector Based with Query Cache Logical Flow") - -## Provided Notebooks & Scripts - -- `./Iteration 2 - Prompt Based Text2SQL.ipynb` provides example of how to utilise the Prompt Based Text2SQL plugin to query the database. -- `./Iterations 3 & 4 - Vector Based Text2SQL.ipynb` provides example of how to utilise the Vector Based Text2SQL plugin to query the database. The query cache plugin will be enabled or disabled depending on the environmental parameters. -- `./time_comparison_script.py` provides a utility script for performing time based comparisons between the different approaches. 
- -### ai-search.py - -This util file contains helper functions for interacting with AI Search. - -## Plugins - -### prompt_based_sql_plugin.py - -The `./plugins/prompt_based_sql_plugin/prompt_based_sql_plugin.py` contains 3 key methods to power the Prompt Based Text2SQL engine. - -#### system_prompt() - -This method takes the loaded `entities.json` file and generates a system prompt based on it. Here, the **EntityName** and **Description** are used to build a list of available entities for the LLM to select. - -This is then inserted into a pre-made Text2SQL generation prompt that already contains optimised and working instructions for the LLM. This system prompt for the plugin is added to the main prompt file at runtime. - -The **target_engine** is passed to the prompt, along with **engine_specific_rules** to ensure that the SQL queries generated work on the target engine. - -#### get_entity_schema() - -This method is called by the Semantic Kernel framework automatically, when instructed to do so by the LLM, to fetch the full schema definitions for a given entity. This returns a JSON string of the chosen entity which allows the LLM to understand the column definitions and their associated metadata. This can be called in parallel for multiple entities. - -#### run_sql_query() - -This method is called by the Semantic Kernel framework automatically, when instructed to do so by the LLM, to run a SQL query against the given database. It returns a JSON string containing a row wise dump of the results returned. These results are then interpreted to answer the question. - -### vector_based_sql_plugin.py - -The `./plugins/vector_based_sql_plugin/vector_based_sql_plugin.py` contains 3 key methods to power the Vector Based Text2SQL engine. - -#### system_prompt() - -This method simply returns a pre-made system prompt that contains optimised and working instructions for the LLM. This system prompt for the plugin is added to the main prompt file at runtime. 
- -The **target_engine** is passed to the prompt, along with **engine_specific_rules** to ensure that the SQL queries generated work on the target engine. - -**If the query cache is enabled, the prompt is adjusted to instruct the LLM to look at the cached data and results first, before calling `get_entity_schema()`.** - -#### get_entity_schema() - -This method is called by the Semantic Kernel framework automatically, when instructed to do so by the LLM, to search the AI Search instance with the given text. The LLM is able to pass the key terms from the user query, and retrieve a ranked list of the most suitable entities to answer the question. - -The search text passed is vectorised against the entity level **Description** columns. A hybrid Semantic Reranking search is applied against the **EntityName**, **Entity**, **Columns/Name** fields. - -#### fetch_queries_from_cache() - -The vector based with query cache uses the `fetch_queries_from_cache()` method to fetch the most relevant previous query and injects it into the prompt before the initial LLM call. The use of Auto-Function Calling here is avoided to reduce the response time as the cache index will always be used first. - -If the score of the top result is higher than the defined threshold, the query will be executed against the target data source and the results included in the prompt. This allows us to prompt the LLM to evaluated whether it can use these results to answer the question, **without further SQL Query generation** to speed up the process. - -#### run_sql_query() - -This method is called by the Semantic Kernel framework automatically, when instructed to do so by the LLM, to run a SQL query against the given database. It returns a JSON string containing a row wise dump of the results returned. These results are then interpreted to answer the question. 
- -Additionally, if any of the cache functionality is enabled, this method will update the query cache index based on the SQL query run, and the schemas used in execution. diff --git a/text_2_sql/text_2_sql_core/README.md b/text_2_sql/text_2_sql_core/README.md index e69de29b..d93adf4c 100644 --- a/text_2_sql/text_2_sql_core/README.md +++ b/text_2_sql/text_2_sql_core/README.md @@ -0,0 +1,3 @@ +# Text2SQL Core + +This portion of the repository contains the core prompts, code and config used to power the Text2SQL agentic flow. As much of the code as possible is kept separate from the AutoGen implementation so that it can easily be rewritten for another framework in the future.