Build a RAG Application¶

This tutorial guides you through creating a Retrieval-Augmented Generation (RAG) system using WSO2 Integrator: BI. While there are several ways to structure a RAG workflow, we’ll focus on a typical two-phase approach: ingestion and retrieval.

RAG ingestion¶

This step is managed through Devant and it focuses on preparing documents for efficient retrieval in the RAG system.

Chunk the information into smaller, meaningful sections
Convert each chunk into embeddings using an embedding model
Store embeddings in the vector database for efficient retrieval

We assume that you've already used Devant to process and ingest the documents. Devant handles the entire ingestion process independently of the main application flow. The following steps of the tutorial focus solely on RAG retrieval.

RAG retrieval¶

This tutorial focuses on implementing the retrieval component of a Retrieval-Augmented Generation (RAG) system using the WSO2 Integrator: BI.

Convert the user's question into embeddings
Perform a similarity search in the vector database
Fetch the most relevant chunks
Include only the relevant data in the prompt
Generate a fact-grounded answer using the LLM

By the end of this tutorial, you'll have a working RAG system that can retrieve relevant information and generate accurate, grounded responses using pre-ingested documents.

Prerequisites¶

Access to Pinecone vector database (requires API key and service URL)
Access to Azure OpenAI (requires API key and endpoint URL)
Access to Devant

Step 1: Create an HTTP service¶

In the design view, click on the Add Artifact button.
Select HTTP Service under the Integration as API category.
Select the Create and use the default HTTP listener (port:9090) option from the Listeners dropdown.
Select the Design from Scratch option as the Service Contract and use /personalAssistant as the Service base path.
Click on the Create button to create the new service with the specified configurations.
The service will have a default resource named greeting with the GET method.
Click the Edit FunctionModel button in front of /greeting resource.
Change the resource HTTP method to POST.
Change the resource name to chat.
Click on Add Parameter under the Parameters and specify the parameters you need. Select the Param Type as QUERY and specify request as the name and ChatRequestMessage as the type.
Change the 200 response return type to string.
Click on the Save button to update the resource with the specified configurations.

Note

Here we use a modular approach for the resource logic for the /chat resource. You may use your own logic calling directly in the /chat service without creating functions separately.

This approach allows for flexibility in implementation - you can either:

Follow the modular pattern shown in this tutorial for better organization and maintainability
Implement your logic directly within the /chat resource function based on your specific requirements

Step 2: Implementation of RAG¶

2.1 Retrieve embeddings for user query

Follow these steps to create a function that retrieves embeddings using Azure OpenAI:

2.1.1 Create an embeddings function

Click the + button in the Integrator side panel under the Functions section.
Provide the required details to create the function. Use getEmbeddings as the function name and specify the parameters and return types.

2.1.2 Add embeddings connection

Click the + button and select the + Add Connection in the side panel.
Select the connector Embeddings - ballerinax/azure.openai.embeddings.

2.1.3 Configure the embeddings connector

In the configuration of the connector, under the Config select the Add Expression to open the Expression Helper window.
In the Expression Helper, navigate to Configurables, click the Create new configurable variable. Here we create azure_api_key and azure_service_url.
Select the ConnectionConfig under the Construct Record in the Expression Helper window.
Change the BearerTokenConfig to ApiKeysConfig in the auth.
Select the Configurables and click the azure_api_key.
Expand the Advanced Configurations section. Under the ServiceUrl select the Add Expression to open the Expression Helper window.
In the Expression Helper, navigate to Configurables, select on azure_service_url as the value for ServiceUrl and click Save button.

2.1.4 Implement the embeddings function logic

Click the + button and select the Declare Variable under the Statement.
Create variable name as embeddingsBody and specify its type and expression.
Click the + button and select the embeddingsClient.
Configure the client with the DeploymentId, payload and API version.
Configure the function to convert the returned decimal embeddings to float values.
Return the final float array.

2.2 Retrieve relevant chunks from vector database

Follow these steps to create a function that retrieves similar vectors from Pinecone using vector embeddings:

2.2.1 Add Pinecone vector connection

Click the + button in the Integrator side panel under the Connections section.
Select the connector Vector - ballerinax/pinecone.vector.

2.2.2 Configure the connector

In the configuration of the connector, under the ApiKeyConfig select the Add Expression to open the Expression Helper window.
Select the Configurables and click the Create new configurable variable. Here we create pinecone_api_key and pinecone_url.
Select the ConnectionConfig under the Construct Record in the Expression Helper window.
Click the ApiKeysConfig in the auth, select the Configurables and click the pinecone_api_key.
Enter the pinecone_url as ServiceUrl and save it.

2.2.3 Create a retriever function

Click the + button in the Integrator side panel under the Functions section.
Provide the required details to create the function. Use retrieveData as the function name and specify the parameters and return types.

2.2.4 Implement the retriever function logic

Click the + button and select the vectorClient.
Select Query from the vectorClient dropdown.
Configure the vector client and specify the payload. Here, we use { topK: 4} for the record QueryRequest.
Extract the matches array from the QueryResponse.
Handle null response scenarios with appropriate error handling.
Return the relevant matching array from the client response.

2.3 Augment queries with relevant chunks

Follow these steps to create a function that augments queries with relevant text chunks from vector search results:

2.3.1 Create an augment function

Click the + button in the Integrator side panel under the Functions section.
Create the function with augment as the function name and specify the parameter type and return type.

2.3.2 Implement the augment function logic

Create an empty string variable named context.
Add a foreach loop to process each match in the input array.
Extract metadata from each match and convert to the appropriate type.
Concatenate the text from metadata to the context string.
Return the aggregated context string with all relevant text chunks.

2.4 Generate response using the context

2.4.1 Add chat client connection

Click the + button in the Integrator side panel under the Connections section.
Select the connector Chat - ballerinax/azure.openai.chat.
In the configuration of the connector, under the Config select the ConnectionConfig under the Construct Record in the Expression Helper window.
Change the BearerTokenConfig to ApiKeysConfig in the auth.
Select the Configurables and click the azure_api_key.
Expand the Advanced Configurations and Enter the azure_service_url as ServiceUrl and save it.

Model Flexibility

While this tutorial demonstrates Azure OpenAI integration, the same principles apply to other AI providers. You can adapt this implementation to work with:

OpenAI API
Anthropic's Claude API
Google's PaLM API
Local models (via APIs like Ollama)
Other cloud AI services

Simply replace the connector and adjust the API configuration parameters according to your chosen provider's requirements.

2.4.2 Create a generate function

Click the + button in the Integrator side panel under the Functions section.
Create the function with generateText as the function name and specify the parameters and return types.

2.4.3 Implement the generate function logic

Create variables such as systemPrompt and chatRequest.
Click the + button and select the chatClient.
Select Creates a completion for the chat message from the chatClient dropdown.
Configure the client and specify the DeploymentId, API version, and payload.
Return the chat response from the client.

Step 3: Create the combined LLM function¶

3.1 Create the LLM function

Click the + button in the Integrator side panel under the Functions section.
Create the function with llmChat as the function name and specify the parameters and return types.

3.2 Implement the function logic

This function orchestrates the entire RAG (Retrieval-Augmented Generation):

Get Embeddings: Call the getEmbeddings function with the user query to convert it into vector embeddings.
Retrieve Data: Use the embeddings to query the vector database through the retrieveData function to get relevant document chunks.
Augment Context: Process the retrieved chunks using the augment function to create a consolidated context string.
Generate Response: Call the generateText function with both the original query and the augmented context to generate the final response.
Return Result: Return the generated response string.

This completes the end-to-end RAG where user queries are processed through embeddings, vector search, context augmentation, and LLM generation before returning intelligent responses through the HTTP API.

Step 4: Integrate with HTTP service¶

4.1 Update the chat resource

Go back to the HTTP service created in Step 1. In the /chat resource implementation:

Call the llmChat function with the user's query.
Return the chat response.

Step 5: Run the integration and query the RAG¶

Click on the Run button in the top-right corner to run the integration.
If you have added any variables to the project, you’ll be prompted to update their values in the Config.toml file. Configure them to continue with the execution of the request.

Query the RAG by sending the curl request below.

curl --location 'http://localhost:9090/personalAssistant/chat' \
--header 'Content-Type: application/json' \
--data '{"message": "What is the process for reporting safety concerns?"}'

Response May Vary

Since this integration involves an LLM (Large Language Model) call, the response values may not always be identical across different executions.

Your RAG system is now ready to answer questions using retrieved context from your vector database!