RAG Query¶
In this tutorial, you'll build a simple Retrieval-Augmented Generation (RAG) query flow using WSO2 Integrator: BI. You'll create an HTTP service that retrieves relevant information from a previously ingested vector knowledge base and uses a Large Language Model (LLM) to generate a context-aware response.
By the end of this tutorial, you'll have a working integration that takes a user query, retrieves relevant chunks from the knowledge base, and returns a natural language answer using the configured LLM.
Prerequisites¶
To get started, make sure you have:
- Completed the RAG Ingestion Tutorial.
Using the Same Project
Since this tutorial uses an In-Memory Vector Store (as configured in the ingestion tutorial), the ingested data is only available within the same integration project and runtime session. You'll need to add the HTTP service to the same `rag_ingestion` project you created in the previous tutorial, rather than creating a new project.
If you used an external vector store like Pinecone, Milvus, or Weaviate in the ingestion tutorial, you can create a separate project and configure the same external vector store connection.
Step 1: Open your existing integration project¶
- Open the `rag_ingestion` project that you created in the RAG Ingestion Tutorial.
- When you run this integration, the ingestion automation will execute first, automatically loading your data into the in-memory vector store before the HTTP service becomes available.
Step 2: Create an HTTP service¶
- In the design view, click on the Add Artifact button.
- Select HTTP Service under the Integration as API category.
- Choose Create and use the default HTTP listener from the Listener dropdown.
- Select Design from Scratch as the Service contract option.
- Specify the Service base path as `/`.
- Click Create to create the service.
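Behind the visual editor, WSO2 Integrator: BI keeps the service as Ballerina source. At this point the generated skeleton looks roughly like the sketch below; the default resource body and the exact listener wiring produced by the tool may differ.

```ballerina
import ballerina/http;

// Service on the default HTTP listener (port 9090) with base path `/`.
service / on new http:Listener(9090) {

    // Default resource created with the service; it is reworked in the next step.
    resource function get greeting() returns string {
        return "Hello, World!";
    }
}
```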
Step 3: Update the resource method¶
- The service will have a default resource named `greeting` with the GET method. Click the edit button next to the `/greeting` resource.
- Change the HTTP method to POST.
- Rename the resource to `query`.
- Add a payload parameter named `userQuery` of type `string`.
- Keep the other settings at their defaults.
- Click Save to apply the changes.
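After saving, the resource in the generated source should resemble the following sketch (payload binding is shown explicitly with `@http:Payload`; the tool's exact output may vary):

```ballerina
import ballerina/http;

service / on new http:Listener(9090) {

    // The default `greeting` resource, renamed to `query`, switched to POST,
    // and given a string payload parameter named `userQuery`.
    resource function post query(@http:Payload string userQuery) returns string|error {
        // The retrieval, augmentation, and generation logic is added in Steps 4-7.
        return userQuery;
    }
}
```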
Step 4: Retrieve data from the knowledge base¶
Since you're working in the same project where you completed the ingestion tutorial, the vector knowledge base `knowledgeBase` that you created earlier is already available. You can use it to retrieve relevant chunks based on the user query.
- Click on the newly created `POST` resource to open it in the flow diagram view.
- Hover over the flow line and click the + icon.
- Select Knowledge Bases under the AI section.
- In the Knowledge Bases section, click on `knowledgeBase`.
- Click on retrieve to open the configuration panel.
- Set the Query input to the `userQuery` variable.
- Set the Result to `context` to store the matched chunks in a variable named `context`.
- Click Save to complete the retrieval step.
Using External Vector Stores
If you have an external vector store (Pinecone, Milvus, Weaviate, etc.) with pre-ingested content, you can create a new vector knowledge base by clicking + Add Vector Knowledge Base and following the instructions in Step 5 of the RAG Ingestion Tutorial. Make sure to configure the same vector store and embedding provider settings that were used during ingestion.
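In the underlying Ballerina source, this step adds a single retrieval statement to the resource body, along the lines of the sketch below. The method name and the `ai:QueryMatch[]` return type are assumptions based on the Ballerina `ai` module; check the code generated in your project for the exact form.

```ballerina
// Assumed shape: query the vector knowledge base with the user's question
// and keep the matched chunks in the `context` variable.
ai:QueryMatch[] context = check knowledgeBase.retrieve(userQuery);
```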
Step 5: Augment the user query with retrieved content¶
WSO2 Integrator: BI includes a built-in function to augment the user query with retrieved context from the knowledge base. We'll use that in this step.
- Hover over the flow line and click the + icon.
- Select Augment Query under the AI section.
- Set Context to `context`.
- Set Query to `userQuery`.
- Set Result to `augmentedUserMsg`.
- Click Save to complete the augmentation step.
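Conceptually, this adds one more statement to the resource body, roughly as sketched below; the `ai:augmentUserQuery` function and the `ai:ChatUserMessage` result type are assumptions based on the Ballerina `ai` module.

```ballerina
// Assumed shape: merge the retrieved chunks and the original question
// into a single augmented user message for the LLM.
ai:ChatUserMessage augmentedUserMsg = ai:augmentUserQuery(context, userQuery);
```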
Step 6: Connect to an LLM provider¶
After augmenting the query with retrieved context, we can now pass it to an LLM for a grounded response. WSO2 Integrator: BI provides an abstraction called Model Provider to connect with various LLM services.
- Hover over the flow line and click the + icon.
- Select Model Provider under the AI section.
- Click + Add Model Provider to create a new instance.
- Select `Default Model Provider (WSO2)`, a WSO2-hosted LLM, for this tutorial.
- Set the Model Provider Name to `defaultModel`.
- Click Save to complete the configuration.
Step 7: Generate the response¶
Now send the augmented query to the LLM to generate the grounded response.
- Click on `defaultModel` under the Model Providers section in the side panel.
- Select the `generate` action.
- Set the Prompt to the expression `check augmentedUserMsg.content.ensureType()`.
- Set the Result variable to `response`.
- Set the Expected Type to `string`.
- Click Save.
Understanding the Expression
The expression `check augmentedUserMsg.content.ensureType()` extracts the augmented query content and ensures it's in the correct string format that the LLM expects. The `check` keyword propagates any type conversion error to the caller instead of continuing with an invalid value.
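In the generated source this maps to a call on the model provider, roughly like the sketch below. The remote `generate` method and its exact signature are assumptions; the prompt expression is the one configured above, and the `string` result type corresponds to the Expected Type setting.

```ballerina
// Assumed shape: send the augmented prompt to the LLM and bind the answer as a string.
string response = check defaultModel->generate(check augmentedUserMsg.content.ensureType());
```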
Step 8: Return the response from the service resource¶
- Hover over the flow line and click the + icon.
- Under the Control section, click on Return.
- Set Expression to `response`.
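Putting the steps together, the `query` resource should now read roughly like the sketch below. It relies on the module-level `knowledgeBase` (created in the ingestion tutorial) and `defaultModel` (created in Step 6) that the tool generated, and on the same assumed Ballerina `ai` module names as the earlier sketches.

```ballerina
// Sketch of the assembled resource; `knowledgeBase` and `defaultModel` are
// module-level symbols generated in the earlier steps.
resource function post query(@http:Payload string userQuery) returns string|error {
    // Step 4: retrieve chunks relevant to the question from the knowledge base.
    ai:QueryMatch[] context = check knowledgeBase.retrieve(userQuery);

    // Step 5: augment the question with the retrieved context.
    ai:ChatUserMessage augmentedUserMsg = ai:augmentUserQuery(context, userQuery);

    // Step 7: generate a grounded answer with the configured LLM.
    string response = check defaultModel->generate(check augmentedUserMsg.content.ensureType());

    // Step 8: return the generated answer to the caller.
    return response;
}
```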
Step 9: Configure default WSO2 providers and run the integration¶
- As the workflow uses the `Default Model Provider (WSO2)` and `Default Embedding Provider (WSO2)`, you need to configure their settings:
    - Press `Ctrl/Cmd + Shift + P` to open the VS Code command palette.
    - Run the command `Ballerina: Configure default WSO2 model provider`. This will automatically generate the required configuration entries.
- Click the Run button in the top right corner to start the integration.
- The integration will compile and launch in the embedded Ballerina runtime. The ingestion automation will run first, followed by the HTTP service.
- You can also test the service using tools like Postman or curl:

```
curl -X POST http://localhost:9090/query \
  -H "Content-Type: application/json" \
  -d '"Who should I contact for refund approval?"'
```

- To stop the integration, click the Stop (⏹️) button or press `Shift + F5`.