RAG Query¶
In this tutorial, you'll build a simple Retrieval-Augmented Generation (RAG) query flow using WSO2 Integrator: BI. You'll create an HTTP service that retrieves relevant information from a previously ingested vector knowledge base and uses a Large Language Model (LLM) to generate a context-aware response.
By the end of this tutorial, you'll have a working integration that takes a user query, retrieves relevant chunks from the knowledge base, and returns a natural language answer using the configured LLM.
Prerequisites¶
To get started, make sure you have:
- Completed the RAG Ingestion Tutorial.
Using the Same Project
Since this tutorial uses an In-Memory Vector Store (as configured in the ingestion tutorial), the ingested data is only available within the same integration project and runtime session. You'll need to add the HTTP service to the same `rag_ingestion` project you created in the previous tutorial, rather than creating a new project.
If you used an external vector store like Pinecone, Milvus, or Weaviate in the ingestion tutorial, you can create a separate project and configure the same external vector store connection.
Step 1: Open your existing integration project¶
- Open the `rag_ingestion` project that you created in the RAG Ingestion Tutorial.
- When you run this integration, the ingestion automation will execute first, automatically loading your data into the in-memory vector store before the HTTP service becomes available.
Step 2: Create an HTTP service¶
- In the design view, click on the Add Artifact button.
- Select HTTP Service under the Integration as API category.
- Choose Create and use the default HTTP listener from the Listener dropdown.
- Select Design from Scratch as the Service contract option.
- Specify the Service base path as `/`.
- Click Create to create the service.
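Behind the visual editor, WSO2 Integrator: BI keeps the service as Ballerina source. At this point the generated skeleton looks roughly like the sketch below; the default resource body and the exact listener wiring produced by the tool may differ.

```ballerina
import ballerina/http;

// Service on the default HTTP listener (port 9090) with base path `/`.
service / on new http:Listener(9090) {

    // Default resource created with the service; it is reworked in the next step.
    resource function get greeting() returns string {
        return "Hello, World!";
    }
}
```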
Step 3: Update the resource method¶
- The service will have a default resource named `greeting` with the GET method. Click the edit button next to the `/greeting` resource.
- Change the HTTP method to POST.
- Rename the resource to `query`.
- Add a payload parameter named `userQuery` of type `string`.
- Keep the other settings at their defaults.
- Click Save to apply the changes.
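After saving, the resource in the generated source should resemble the following sketch (payload binding is shown explicitly with `@http:Payload`; the tool's exact output may vary):

```ballerina
import ballerina/http;

service / on new http:Listener(9090) {

    // The default `greeting` resource, renamed to `query`, switched to POST,
    // and given a string payload parameter named `userQuery`.
    resource function post query(@http:Payload string userQuery) returns string|error {
        // The retrieval, augmentation, and generation logic is added in Steps 4-7.
        return userQuery;
    }
}
```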
Step 4: Retrieve data from the knowledge base¶
Since you're working in the same project where you completed the ingestion tutorial, the vector knowledge base `knowledgeBase` that you created earlier is already available. You can use it to retrieve relevant chunks based on the user query.
- Click on the newly created `POST` resource to open it in the flow diagram view.
- Hover over the flow line and click the + icon.
- Select Knowledge Bases under the AI section.
- In the Knowledge Bases section, click on `knowledgeBase`.
- Click on retrieve to open the configuration panel.
- Set the Query input to the `userQuery` variable.
- Set the Result to `context` to store the matched chunks in a variable named `context`.
- Click Save to complete the retrieval step.
Using External Vector Stores
If you have an external vector store (Pinecone, Milvus, Weaviate, etc.) with pre-ingested content, you can create a new vector knowledge base by clicking + Add Vector Knowledge Base and following the instructions in Step 5 of the RAG Ingestion Tutorial. Make sure to configure the same vector store and embedding provider settings that were used during ingestion.
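In the underlying Ballerina source, this step adds a single retrieval statement to the resource body, along the lines of the sketch below. The method name and the `ai:QueryMatch[]` return type are assumptions based on the Ballerina `ai` module; check the code generated in your project for the exact form.

```ballerina
// Assumed shape: query the vector knowledge base with the user's question
// and keep the matched chunks in the `context` variable.
ai:QueryMatch[] context = check knowledgeBase.retrieve(userQuery);
```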
Step 5: Augment the user query with retrieved content¶
WSO2 Integrator: BI includes a built-in function to augment the user query with retrieved context from the knowledge base. We'll use that in this step.
- Hover over the flow line and click the + icon.
- Select Augment Query under the AI section.
- Set Context to `context`.
- Set Query to `userQuery`.
- Set Result to `augmentedUserMsg`.
- Click Save to complete the augmentation step.
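Conceptually, this adds one more statement to the resource body, roughly as sketched below; the `ai:augmentUserQuery` function and the `ai:ChatUserMessage` result type are assumptions based on the Ballerina `ai` module.

```ballerina
// Assumed shape: merge the retrieved chunks and the original question
// into a single augmented user message for the LLM.
ai:ChatUserMessage augmentedUserMsg = ai:augmentUserQuery(context, userQuery);
```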
Step 6: Connect to an LLM provider¶
After augmenting the query with retrieved context, we can now pass it to an LLM for a grounded response. WSO2 Integrator: BI provides an abstraction called Model Provider to connect with various LLM services.
- Hover over the flow line and click the + icon.
- Select Model Provider under the AI section.
- Click + Add Model Provider to create a new instance.
- Select `Default Model Provider (WSO2)`, a WSO2-hosted LLM, for this tutorial.
- Set the Model Provider Name to `defaultModel`.
- Click Save to complete the configuration.
Step 7: Generate the response¶
Now send the augmented query to the LLM to generate the grounded response.
- Click on `defaultModel` under the Model Providers section in the side panel.
- Select the `generate` action.
- Set the Prompt to the expression `check augmentedUserMsg.content.ensureType()`.
- Set the Result variable to `response`.
- Set the Expected Type to `string`.
- Click Save.
Understanding the Expression
The expression `check augmentedUserMsg.content.ensureType()` extracts the augmented query content and ensures it's in the correct string format that the LLM expects. The `check` keyword propagates any type conversion error to the caller instead of continuing with an invalid value.
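In the generated source this maps to a call on the model provider, roughly like the sketch below. The remote `generate` method and its exact signature are assumptions; the prompt expression is the one configured above, and the `string` result type corresponds to the Expected Type setting.

```ballerina
// Assumed shape: send the augmented prompt to the LLM and bind the answer as a string.
string response = check defaultModel->generate(check augmentedUserMsg.content.ensureType());
```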
Step 8: Return the response from the service resource¶
- Hover over the flow line and click the + icon.
- Under the Control section, click on Return.
- Set Expression to `response`.
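Putting the steps together, the `query` resource should now read roughly like the sketch below. It relies on the module-level `knowledgeBase` (created in the ingestion tutorial) and `defaultModel` (created in Step 6) that the tool generated, and on the same assumed Ballerina `ai` module names as the earlier sketches.

```ballerina
// Sketch of the assembled resource; `knowledgeBase` and `defaultModel` are
// module-level symbols generated in the earlier steps.
resource function post query(@http:Payload string userQuery) returns string|error {
    // Step 4: retrieve chunks relevant to the question from the knowledge base.
    ai:QueryMatch[] context = check knowledgeBase.retrieve(userQuery);

    // Step 5: augment the question with the retrieved context.
    ai:ChatUserMessage augmentedUserMsg = ai:augmentUserQuery(context, userQuery);

    // Step 7: generate a grounded answer with the configured LLM.
    string response = check defaultModel->generate(check augmentedUserMsg.content.ensureType());

    // Step 8: return the generated answer to the caller.
    return response;
}
```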
Step 9: Configure default WSO2 providers and run the integration¶
- As the workflow uses the `Default Model Provider (WSO2)` and `Default Embedding Provider (WSO2)`, you need to configure their settings:
    - Press `Ctrl/Cmd + Shift + P` to open the VS Code command palette.
    - Run the command `Ballerina: Configure default WSO2 model provider`. This will automatically generate the required configuration entries.
- Click the Run button in the top right corner to start the integration.
- The integration will compile and launch in the embedded Ballerina runtime. The ingestion automation will run first, followed by the HTTP service.
- You can also test the service using tools like Postman or curl:

```
curl -X POST http://localhost:9090/query \
  -H "Content-Type: application/json" \
  -d '"Who should I contact for refund approval?"'
```

- To stop the integration, click the Stop (⏹️) button or press `Shift + F5`.