Innovative AI Solutions | AI Development, Web & Mobile Apps – Delhi, India

Building a High-Performance RAG Chatbot on AWS: A Step-by-Step Guide


 What Is RAG and Why AWS?

The RAG Workflow – A Simple Explanation

 
 
Step | What Happens | AWS Service
Ingestion | You upload documents (PDFs, websites, text files) | S3, Web Crawler
Chunking | Documents are split into smaller pieces | Bedrock Knowledge Bases
Embedding | Each chunk is converted into a vector (a mathematical representation of meaning) | Amazon Titan Embeddings, Nova Multimodal
Storage | Vectors are stored in a vector database | Aurora pgvector, OpenSearch Serverless, S3 Vectors
Query | The user's question is converted to a vector and similar chunks are retrieved | Bedrock Knowledge Bases
Generation | Retrieved chunks + user question are sent to an LLM for the answer | Anthropic Claude, Amazon Nova, DeepSeek
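
To make the six steps concrete, here is a minimal, self-contained sketch of the same flow in plain Python. It is a toy under stated assumptions: the "embedding" is a simple bag-of-words count rather than a real model such as Titan, the index is an in-memory list rather than a vector database, and the sample documents and function names are hypothetical. The last step only builds the prompt that would be sent to an LLM.

```python
import math
import re
from collections import Counter

def embed(text: str) -> Counter:
    """Toy embedding: word counts. A real pipeline would call an embedding model."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def build_index(chunks: list[str]) -> list[tuple[str, Counter]]:
    """Embedding + storage: keep each chunk next to its vector."""
    return [(chunk, embed(chunk)) for chunk in chunks]

def retrieve(index: list[tuple[str, Counter]], question: str, k: int = 2) -> list[str]:
    """Query: embed the question and return the k most similar chunks."""
    q = embed(question)
    ranked = sorted(index, key=lambda item: cosine(q, item[1]), reverse=True)
    return [chunk for chunk, _ in ranked[:k]]

def build_prompt(question: str, context_chunks: list[str]) -> str:
    """Generation input: retrieved chunks plus the user question."""
    context = "\n".join(f"- {c}" for c in context_chunks)
    return f"Answer using only this context:\n{context}\n\nQuestion: {question}"

# Hypothetical, already-chunked documents
chunks = [
    "Returns are accepted within 30 days of purchase.",
    "To reset your password, open Settings and choose Reset Password.",
    "Our support line is open Monday to Friday.",
]
index = build_index(chunks)
question = "How do I reset my password?"
top = retrieve(index, question, k=1)
prompt = build_prompt(question, top)
print(prompt)
```

Bedrock Knowledge Bases performs the chunking, embedding, storage, and retrieval steps for you; the sketch only shows what each step contributes.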

"RAG bridges the gap between what an LLM knows from training and what your business knows from its own data. The LLM provides the language skills. Your data provides the knowledge."

Why AWS for RAG?

 
 
Advantage | Why It Matters
Fully managed | Bedrock Knowledge Bases handles chunking, embedding, and vector storage – no infrastructure to manage
Model choice | Use Titan, Claude, Nova, DeepSeek, Llama – switch models without code changes
Serverless | Pay only for what you use – scale to zero when idle
Security | Your data stays in your VPC, encrypted with KMS, never used to train models
Proven at scale | Ring, Amazon's home security subsidiary, uses Bedrock Knowledge Bases across 10 international locales

Step 3: Two Paths – Choose Your Starting Point

 
 
Path | Best For | Time | Cost | Complexity
Path 1: Simple Knowledge Base (No Code) | Beginners, quick prototype, internal knowledge base | 20 minutes | Pay-as-you-go (Bedrock, OpenSearch) | Low
Path 2: Production-Ready Pipeline (Code + Infrastructure) | Customer-facing chatbot, custom RAG, high volume | 1-2 days | $2-10/month idle, $0.001-0.003 per query | Medium-High

"Start with Path 1 to understand the concepts. Move to Path 2 when you need production scale, customizations, or cost optimization."


Step 4: Path 1 – Build a RAG Chatbot in 20 Minutes (No Code)

This path requires zero coding. You will use the Amazon Bedrock console to create a knowledge base and test it immediately.

Prerequisites

 
 
Requirement | Details
AWS account | Any region where Bedrock is available (for example us-east-1, us-west-2, eu-west-1, ap-southeast-1)
Model access | Enable access to Titan Embeddings and Claude or Nova in Bedrock Model Access
S3 bucket | Create a bucket for your source documents

Step 4.1: Enable Model Access

 
 
Action | Instructions
Open Bedrock console | Navigate to Amazon Bedrock in your AWS account
Model access | Click "Model Access" in the left navigation
Enable models | Select "Amazon Titan Text Embeddings V2" and "Claude 3" or "Nova"
Submit | Wait 2-3 minutes for access to be granted

Step 4.2: Prepare and Upload Documents

 
 
Document Type | Best Practices | Size Limit
Text (.txt) | Clean formatting, no special characters | 10MB per file
Markdown (.md) | Use for documentation | 10MB per file
PDF (.pdf) | Ensure text is selectable (not scanned-only) | 10MB per file
HTML (.html) | Crawl websites directly – no upload needed | N/A

Supported formats: plain text, Markdown, PDF, HTML, CSV, JSON, PowerPoint, Word, Excel

Step 4.3: Create Knowledge Base in Bedrock Console

 
 
Step | Action | Notes
1 | Open Bedrock console → Knowledge Bases → Create Knowledge Base |
2 | Enter name (e.g., company-kb) | Use descriptive names
3 | Select S3 as data source | Choose the bucket with your documents
4 | For Chunking strategy, select "Default" | 300 token chunks with 20% overlap
5 | For Embeddings, select "Titan Text Embeddings V2" | 1024 dimensions, can be reduced to 256
6 | For Vector store, select "OpenSearch Serverless" | Fully managed, auto-scales
7 | Review and create | Takes 2-5 minutes to create and sync
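
To see what "300 token chunks with 20% overlap" means in practice, here is a minimal sketch of fixed-size chunking with overlap. It is an illustration, not how Bedrock implements chunking: whitespace-separated words stand in for model tokens, and the function name `chunk_tokens` is hypothetical.

```python
def chunk_tokens(tokens: list, chunk_size: int = 300, overlap_ratio: float = 0.2) -> list[list]:
    """Split a token list into fixed-size chunks where each chunk repeats
    the last `overlap_ratio` share of the previous chunk."""
    if chunk_size <= 0 or not 0 <= overlap_ratio < 1:
        raise ValueError("chunk_size must be positive and overlap_ratio in [0, 1)")
    overlap = round(chunk_size * overlap_ratio)
    step = max(1, chunk_size - overlap)  # how far the window advances each time
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + chunk_size])
        if start + chunk_size >= len(tokens):
            break  # the last window already reached the end of the document
    return chunks

# Assumption: words approximate tokens closely enough for a demonstration
words = ("policy " * 700).split()
chunks = chunk_tokens(words)
print([len(c) for c in chunks])
```

With the defaults, each window advances 240 tokens, so neighbouring chunks share 60 tokens. The overlap is what keeps a sentence that straddles a boundary retrievable from at least one chunk.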

Step 4.4: Test Your Knowledge Base

 
 
Action | Instructions
Open test interface | In Knowledge Base details, click "Test"
Select model | Choose Claude 3 or Nova Lite
Ask questions | "What is our return policy?" "How do I reset my password?"
Review answers | Check that responses cite the source documents

"In testing, Ring found that cross-Region latency accounted for less than 10% of total response time. This allows centralized architecture without per-Region deployments."

Path 1 Cost Estimate

 
 
Service | Monthly Cost (Idle) | Cost per Query
OpenSearch Serverless | ~350 USD/month minimum (can be reduced with idle settings) | Included
Bedrock Embeddings | 0 USD (no ingestion after the initial sync) | 0 USD
Bedrock LLM (Claude/Nova) | 0 USD | $0.001-0.003
S3 Storage | $0.023/GB | $0

Important: OpenSearch Serverless has a minimum monthly cost of approximately 350 USD. For low-volume workloads, consider Path 2 with S3 Vectors.
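
The trade-off is easier to judge with the arithmetic written out. The sketch below adds a fixed monthly charge to a per-query charge using the figures quoted in this guide. The specific values are assumptions for illustration: $3 is one point in the $2-10 idle range quoted for Path 2, and $0.002 is the midpoint of the per-query range; actual bills depend on models, token counts, and region.

```python
def monthly_cost(fixed_usd: float, per_query_usd: float, queries: int) -> float:
    """Fixed monthly charge plus usage-based charge, in USD."""
    return fixed_usd + per_query_usd * queries

PATH1_FIXED_USD = 350.0  # OpenSearch Serverless minimum quoted in this guide
PATH2_FIXED_USD = 3.0    # assumed idle cost for an S3 Vectors pipeline
PER_QUERY_USD = 0.002    # assumed midpoint of the $0.001-0.003 range

for queries in (1_000, 10_000, 100_000):
    path1 = monthly_cost(PATH1_FIXED_USD, PER_QUERY_USD, queries)
    path2 = monthly_cost(PATH2_FIXED_USD, PER_QUERY_USD, queries)
    print(f"{queries:>7} queries/month: Path 1 ~ ${path1:,.2f}, Path 2 ~ ${path2:,.2f}")
```

Under these assumptions the fixed charge dominates at low volume, which is why the vector store choice matters more than the model choice for small knowledge bases.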


Step 5: Path 2 – Production-Ready Serverless RAG Pipeline

This path uses an open-source, serverless architecture that scales to zero when idle. Monthly costs for a small knowledge base can be as low as $2-10 USD.

Architecture Overview

text
┌─────────────────────────────────────────────────────────────────────────────┐
│                    PRODUCTION RAG PIPELINE ARCHITECTURE                     │
├─────────────────────────────────────────────────────────────────────────────┤
│                                                                             │
│   ┌──────────┐    ┌──────────┐    ┌──────────────┐    ┌──────────────┐      │
│   │   User   │◄──►│  API     │◄──►│   Lambda     │◄──►│   Bedrock    │      │
│   │  Frontend│    │ Gateway  │    │  Orchestrator│    │  Knowledge   │      │
│   └──────────┘    └──────────┘    └──────────────┘    │     Base     │      │
│                                                       └──────┬───────┘      │
│                                                              │              │
│   ┌──────────────┐    ┌──────────────┐    ┌──────────────┐   │              │
│   │     S3       │    │   Step       │    │   DynamoDB   │   │              │
│   │  Documents   │───►│  Functions   │───►│   Metadata   │   │              │
│   └──────────────┘    └──────────────┘    └──────────────┘   │              │
│                                                              │              │
│                                       ┌──────────────────────┘              │
│                                       ▼                                     │
│                              ┌──────────────────┐                           │
│                              │   OpenSearch     │                           │
│                              │  Serverless /    │                           │
│                              │    S3 Vectors    │                           │
│                              └──────────────────┘                           │
│                                                                             │
└─────────────────────────────────────────────────────────────────────────────┘


Step 5.1: Set Up Your AWS Environment

 
 
Action | Instructions | Cost
Create S3 bucket | aws s3 mb s3://your-rag-documents --region us-east-1 | $0.023/GB
Upload documents | aws s3 cp ./documents/ s3://your-rag-documents/ --recursive | Pay for storage
Create IAM role | Role with Bedrock, S3, OpenSearch, Lambda permissions | Free
Enable Bedrock model access | Console → Model Access → Enable Titan + Claude | Pay per token
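
The IAM role row is the one most likely to need detail. As a rough sketch, the knowledge base's service role needs a trust policy that lets Bedrock assume it, plus permissions to invoke the embedding model, read the document bucket, and (if you use OpenSearch Serverless) call the collection. The helper names below are hypothetical and the statements are a starting point under those assumptions, not a complete least-privilege policy for your account.

```python
import json
from typing import Optional

def kb_trust_policy() -> dict:
    """Trust policy allowing the Bedrock service to assume the role."""
    return {
        "Version": "2012-10-17",
        "Statement": [{
            "Effect": "Allow",
            "Principal": {"Service": "bedrock.amazonaws.com"},
            "Action": "sts:AssumeRole",
        }],
    }

def kb_permissions_policy(bucket_name: str, embedding_model_arn: str,
                          collection_arn: Optional[str] = None) -> dict:
    """Permissions for embedding, reading documents, and writing vectors."""
    statements = [
        {"Effect": "Allow", "Action": ["bedrock:InvokeModel"],
         "Resource": [embedding_model_arn]},
        {"Effect": "Allow", "Action": ["s3:ListBucket"],
         "Resource": [f"arn:aws:s3:::{bucket_name}"]},
        {"Effect": "Allow", "Action": ["s3:GetObject"],
         "Resource": [f"arn:aws:s3:::{bucket_name}/*"]},
    ]
    if collection_arn:  # only needed when OpenSearch Serverless is the vector store
        statements.append({"Effect": "Allow", "Action": ["aoss:APIAccessAll"],
                           "Resource": [collection_arn]})
    return {"Version": "2012-10-17", "Statement": statements}

policy = kb_permissions_policy(
    "your-rag-documents",
    "arn:aws:bedrock:us-east-1::foundation-model/amazon.titan-embed-text-v2:0",
)
print(json.dumps(kb_trust_policy(), indent=2))
print(json.dumps(policy, indent=2))
```

The printed JSON can be pasted into the IAM console or passed to your infrastructure tool of choice. The Lambda function that answers queries needs its own, separate execution role.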

Step 5.2: Create Knowledge Base (Infrastructure as Code)

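Using AWS CDK in Python (the same resources can also be defined in Terraform):

python
# AWS CDK - knowledge_base_stack.py
from aws_cdk import Stack, aws_iam as iam, aws_s3 as s3
from aws_cdk.aws_bedrock import CfnKnowledgeBase, CfnDataSource
from constructs import Construct

EMBEDDING_MODEL_ARN = "arn:aws:bedrock:us-east-1::foundation-model/amazon.titan-embed-text-v2:0"

class RagKnowledgeBaseStack(Stack):
    def __init__(self, scope: Construct, id: str, *, collection_arn: str, **kwargs):
        super().__init__(scope, id, **kwargs)

        # Create S3 bucket for documents
        documents_bucket = s3.Bucket(self, "DocumentsBucket")

        # Service role that Bedrock assumes to read documents and write vectors
        kb_role = iam.Role(self, "KnowledgeBaseRole",
            assumed_by=iam.ServicePrincipal("bedrock.amazonaws.com"))
        documents_bucket.grant_read(kb_role)
        kb_role.add_to_policy(iam.PolicyStatement(
            actions=["bedrock:InvokeModel"], resources=[EMBEDDING_MODEL_ARN]))
        kb_role.add_to_policy(iam.PolicyStatement(
            actions=["aoss:APIAccessAll"], resources=[collection_arn]))

        # Create knowledge base. The OpenSearch Serverless collection and its
        # vector index ("rag-index") must already exist; pass the collection ARN in.
        knowledge_base = CfnKnowledgeBase(self, "RagKnowledgeBase",
            name="company-rag-kb",
            description="RAG knowledge base for company documents",
            role_arn=kb_role.role_arn,
            knowledge_base_configuration={
                "type": "VECTOR",
                "vectorKnowledgeBaseConfiguration": {
                    "embeddingModelArn": EMBEDDING_MODEL_ARN
                }
            },
            storage_configuration={
                "type": "OPENSEARCH_SERVERLESS",
                "opensearchServerlessConfiguration": {
                    "collectionArn": collection_arn,
                    "vectorIndexName": "rag-index",
                    "fieldMapping": {
                        "vectorField": "embedding",  # field name in your index (assumed here)
                        "metadataField": "metadata",
                        "textField": "text"
                    }
                }
            }
        )

        # Register the documents bucket as the knowledge base's data source
        CfnDataSource(self, "DocumentsDataSource",
            name="company-documents",
            knowledge_base_id=knowledge_base.attr_knowledge_base_id,
            data_source_configuration={
                "type": "S3",
                "s3Configuration": {"bucketArn": documents_bucket.bucket_arn}
            }
        )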

Step 5.3: Implement Query API with Lambda and API Gateway

python
# lambda_function.py - Query handler
import json

import boto3

bedrock_agent_runtime = boto3.client('bedrock-agent-runtime')

KNOWLEDGE_BASE_ID = 'your-knowledge-base-id'
MODEL_ARN = 'arn:aws:bedrock:us-east-1::foundation-model/anthropic.claude-3-sonnet-20240229-v1:0'

def _response(status_code, payload):
    """Build an API Gateway proxy response with a JSON body."""
    return {
        'statusCode': status_code,
        'headers': {'Content-Type': 'application/json'},
        'body': json.dumps(payload, default=str)
    }

def lambda_handler(event, context):
    # API Gateway passes the request body as a JSON string (or None if absent)
    try:
        body = json.loads(event.get('body') or '{}')
    except json.JSONDecodeError:
        return _response(400, {'error': 'Request body must be valid JSON'})

    user_query = (body.get('query') or '').strip() if isinstance(body, dict) else ''
    if not user_query:
        return _response(400, {'error': 'Missing "query" field'})

    # Retrieve and generate using Knowledge Bases
    response = bedrock_agent_runtime.retrieve_and_generate(
        input={'text': user_query},
        retrieveAndGenerateConfiguration={
            'type': 'KNOWLEDGE_BASE',
            'knowledgeBaseConfiguration': {
                'knowledgeBaseId': KNOWLEDGE_BASE_ID,
                'modelArn': MODEL_ARN,
                'retrievalConfiguration': {
                    'vectorSearchConfiguration': {
                        'numberOfResults': 5
                    }
                }
            }
        }
    )

    return _response(200, {
        'answer': response['output']['text'],
        'citations': response.get('citations', [])
    })


Step 5.4: Add Metadata Filtering for Multi-Locale Support

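Ring's production architecture uses metadata filtering to serve Region-specific content from a single centralized system. For example, a knowledge base might store content tagged with contentLocale, with each document's metadata file stored under a path such as:

text
{locale}/Service.Ring.{Upsert/Delete}.{unique_identifier}.metadata.json

At query time, the filter restricts retrieval to chunks whose contentLocale matches the user's locale:

python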
# Query with metadata filtering
response = bedrock_agent_runtime.retrieve_and_generate(
    input={'text': user_query},
    retrieveAndGenerateConfiguration={
        'type': 'KNOWLEDGE_BASE',
        'knowledgeBaseConfiguration': {
            'knowledgeBaseId': KNOWLEDGE_BASE_ID,
            'modelArn': MODEL_ARN,
            'retrievalConfiguration': {
                'vectorSearchConfiguration': {
                    'numberOfResults': 5,
                    'filter': {
                        'equals': {
                            'key': 'contentLocale',
                            'value': 'en-US'  # Or dynamically from user profile
                        }
                    }
                }
            }
        }
    }
)
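
The filter only works if documents carry the contentLocale attribute at ingestion time. For S3 data sources, Bedrock Knowledge Bases reads metadata from a sidecar file named after the document with a .metadata.json suffix and a metadataAttributes object inside. The sketch below builds that sidecar and derives the query filter from a user's locale. The helper names, the fallback to a default locale, and the list of supported locales are assumptions for illustration, not Ring's implementation.

```python
import json
from typing import Optional

SUPPORTED_LOCALES = {"en-US", "en-GB", "de-DE"}  # illustrative subset
DEFAULT_LOCALE = "en-US"                          # assumed fallback

def metadata_sidecar(document_key: str, locale: str) -> tuple[str, str]:
    """Return (S3 key, JSON body) for the metadata file stored next to a document."""
    body = {"metadataAttributes": {"contentLocale": locale}}
    return f"{document_key}.metadata.json", json.dumps(body)

def locale_filter(user_locale: Optional[str]) -> dict:
    """Build the retrieval filter, falling back to the default for unknown locales."""
    locale = user_locale if user_locale in SUPPORTED_LOCALES else DEFAULT_LOCALE
    return {"equals": {"key": "contentLocale", "value": locale}}

key, body = metadata_sidecar("de-DE/doorbell-installation.pdf", "de-DE")
print(key)
print(body)
print(locale_filter("de-DE"))
```

The dictionary returned by `locale_filter` is what goes under 'filter' in vectorSearchConfiguration. Upload the sidecar to the same bucket and prefix as the document, then re-sync the data source.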


Step 5.5: Add Source Citations and Collapsible References

python
# Enhanced Lambda response with citations
response = bedrock_agent_runtime.retrieve_and_generate(
    input={'text': user_query},
    retrieveAndGenerateConfiguration={
        'type': 'KNOWLEDGE_BASE',
        'knowledgeBaseConfiguration': {
            'knowledgeBaseId': KNOWLEDGE_BASE_ID,
            'modelArn': MODEL_ARN,
            'retrievalConfiguration': {
                'vectorSearchConfiguration': {
                    'numberOfResults': 5
                }
            },
            'generationConfiguration': {
                'inferenceConfig': {
                    'textInferenceConfig': {
                        'temperature': 0.2,  # low temperature keeps answers close to the retrieved text
                        'maxTokens': 500
                    }
                }
            }
        }
    }
)

# Collect the source passages behind each cited part of the answer.
# retrieve_and_generate returns them under citations -> retrievedReferences;
# relevance scores are only returned by the separate retrieve API.
citations = []
for citation in response.get('citations', []):
    for reference in citation.get('retrievedReferences', []):
        location = reference.get('location', {})
        citations.append({
            'source': location.get('s3Location', {}).get('uri'),
            'content': reference.get('content', {}).get('text', '')
        })
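
For the collapsible part, one option is to render the sources as an HTML details element that the frontend can drop under the answer. The sketch below is one possible rendering, assuming the response carries a citations list whose entries hold retrievedReferences (the shape retrieve_and_generate returns). The function names and the 200-character excerpt length are arbitrary choices.

```python
import html

def extract_sources(response: dict) -> list[dict]:
    """Flatten citations into a de-duplicated list of {source, content} entries."""
    seen, sources = set(), []
    for citation in response.get("citations", []):
        for ref in citation.get("retrievedReferences", []):
            uri = ref.get("location", {}).get("s3Location", {}).get("uri", "unknown source")
            if uri in seen:
                continue  # show each document once, even if cited several times
            seen.add(uri)
            sources.append({"source": uri, "content": ref.get("content", {}).get("text", "")})
    return sources

def render_references(sources: list[dict], max_chars: int = 200) -> str:
    """Render sources as a collapsed <details> block; empty string if there are none."""
    if not sources:
        return ""
    items = "".join(
        f"<li><strong>{html.escape(s['source'])}</strong>: "
        f"{html.escape(s['content'][:max_chars])}</li>"
        for s in sources
    )
    return f"<details><summary>Sources ({len(sources)})</summary><ul>{items}</ul></details>"

# Hypothetical response fragment for demonstration
sample = {"citations": [{"retrievedReferences": [
    {"content": {"text": "Returns are accepted within 30 days."},
     "location": {"s3Location": {"uri": "s3://your-rag-documents/returns.md"}}},
]}]}
print(render_references(extract_sources(sample)))
```

Escaping the excerpt matters because document text ends up inside your page; never insert retrieved content into HTML unescaped.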



Step 6: Optimizing for Cost and Performance

Cost Optimization Strategies

 
 
Strategy | Implementation | Savings
Scale to zero | Use S3 Vectors instead of OpenSearch Serverless | $350/month → $3/month
Reduce embedding dimensions | Titan V2 at 256 dimensions vs 1024 retains 97% accuracy | 75% storage reduction
Cache frequent queries | Use ElastiCache or DynamoDB for repeated questions | 50-70% fewer Bedrock calls
Use Spot Instances | For batch processing, not real-time | 70-90% off compute
Monitor with Cost Explorer | Set budgets and alerts | Prevent surprise bills
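
Caching is the strategy that needs the most design thought, so here is a minimal sketch of the idea. It keeps answers in an in-memory dictionary as a stand-in for ElastiCache or DynamoDB, normalizes the question so trivial differences in case and spacing still hit the cache, and expires entries after a TTL so stale answers do not outlive document updates. The class name, the one-hour TTL, and the key scheme are assumptions.

```python
import hashlib
import time
from typing import Callable

class QueryCache:
    """Tiny TTL cache keyed on the normalized question (and optional locale)."""

    def __init__(self, ttl_seconds: float = 3600, clock: Callable[[], float] = time.time):
        self._ttl = ttl_seconds
        self._clock = clock
        self._store: dict[str, tuple[float, str]] = {}

    @staticmethod
    def key(query: str, locale: str = "") -> str:
        normalized = " ".join(query.lower().split())  # collapse case and whitespace
        return hashlib.sha256(f"{locale}|{normalized}".encode("utf-8")).hexdigest()

    def get_or_compute(self, query: str, compute: Callable[[str], str], locale: str = "") -> str:
        k = self.key(query, locale)
        now = self._clock()
        entry = self._store.get(k)
        if entry is not None and now - entry[0] < self._ttl:
            return entry[1]  # cache hit: no Bedrock call
        answer = compute(query)  # cache miss: call the RAG pipeline
        self._store[k] = (now, answer)
        return answer

def fake_rag(query: str) -> str:
    """Stand-in for the retrieve_and_generate call."""
    return f"answer to: {query}"

cache = QueryCache(ttl_seconds=3600)
print(cache.get_or_compute("What is our return policy?", fake_rag))
print(cache.get_or_compute("what is our   return policy?", fake_rag))  # served from cache
```

Exact-match caching only helps when users repeat the same questions; include the locale (or any other filter) in the key so one audience never receives another's cached answer.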

Performance Optimization Strategies

 
 
Strategy | Implementation | Latency Improvement
Optimize chunk size | Experiment with 256-512 tokens; test semantic coherence | 20-30% faster retrieval
Use smaller embedding dimensions | 256 vs 1024 dimensions | Faster vector search
Pre-filter by metadata | Apply filters before vector search | Reduces search space
Select faster model for simple queries | Route to Nova Lite or Haiku; use Claude Sonnet for complex | 2-3x faster for simple Q&A
Enable response streaming | Stream tokens to user as they generate | Perceived latency reduction
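
Model routing can start as a simple heuristic before you invest in a classifier. The sketch below sends short, direct questions to a faster model and longer or comparative questions to a stronger one. The keyword list and the 12-word threshold are arbitrary assumptions you would tune against your own traffic; the model IDs are the Bedrock identifiers for Claude 3 Haiku and Claude 3 Sonnet.

```python
FAST_MODEL_ID = "anthropic.claude-3-haiku-20240307-v1:0"
STRONG_MODEL_ID = "anthropic.claude-3-sonnet-20240229-v1:0"

# Words that suggest the question needs reasoning rather than lookup (assumed list)
COMPLEX_HINTS = ("compare", "why", "explain", "difference", "step by step")

def choose_model(query: str, max_simple_words: int = 12) -> str:
    """Pick a model ID based on question length and a few keyword hints."""
    text = query.lower()
    is_long = len(text.split()) > max_simple_words
    looks_complex = any(hint in text for hint in COMPLEX_HINTS)
    return STRONG_MODEL_ID if (is_long or looks_complex) else FAST_MODEL_ID

def model_arn(model_id: str, region: str = "us-east-1") -> str:
    """Foundation-model ARN in the format used for modelArn."""
    return f"arn:aws:bedrock:{region}::foundation-model/{model_id}"

for question in ("What is our return policy?",
                 "Explain the difference between the basic and premium plans"):
    print(question, "->", model_arn(choose_model(question)))
```

Pass the resulting ARN as modelArn in the knowledge base configuration. Log which model handled each query so you can check whether the heuristic is misrouting hard questions.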

Ring's Performance Architecture

Ring's production RAG chatbot requirements specify:

  • Average end-to-end latency: 7-8 seconds

  • Cross-Region latency accounts for: Less than 10% of total response time

  • This allowed: Centralized architecture rather than per-Region deployment



Step 7: Advanced Features for Production

Feature 1: Multi-Locale Support with Metadata Filtering

Ring's architecture uses metadata-driven filtering to serve Region-specific content from a single centralized system, reducing cost per additional locale by 21%.

 
 
Component | Implementation
Content tagging | Each document tagged with contentLocale (en-US, en-GB, de-DE, etc.)
Ingestion pipeline | Step Functions orchestrates daily knowledge base creation
Evaluation pipeline | LLM-as-a-judge compares versions and promotes highest-performing
Query filtering | Lambda applies metadata filter based on user's locale

Feature 2: Versioning and Rollback

Ring maintains 30 days of version history for knowledge bases:

  • Daily sync creates new version

  • Evaluation pipeline tests retrieval accuracy

  • Highest-performing version promoted to production

  • Rollback available within 30 days
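
The bookkeeping behind this pattern can be sketched in a few lines. The example below is not Ring's implementation; it is a hypothetical in-memory model showing the three operations the list describes: pruning versions older than the retention window, promoting the highest-scoring version, and rolling back to the previous one. The scores stand in for whatever your evaluation pipeline produces.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Optional

@dataclass(frozen=True)
class KbVersion:
    version_id: str
    created: date
    score: float  # evaluation score, e.g. from an LLM-as-a-judge run

def prune(versions: list[KbVersion], today: date, retention_days: int = 30) -> list[KbVersion]:
    """Keep only versions created within the retention window."""
    cutoff = today - timedelta(days=retention_days)
    return [v for v in versions if v.created >= cutoff]

def promote(versions: list[KbVersion]) -> KbVersion:
    """Pick the highest-scoring version; ties go to the newer one."""
    if not versions:
        raise ValueError("no versions available to promote")
    return max(versions, key=lambda v: (v.score, v.created))

def rollback(versions: list[KbVersion], current: KbVersion) -> Optional[KbVersion]:
    """Return the most recent version older than the current one, if any."""
    older = [v for v in versions if v.created < current.created]
    return max(older, key=lambda v: v.created) if older else None

# Hypothetical history
history = [
    KbVersion("v1", date(2024, 12, 15), 0.95),
    KbVersion("v2", date(2025, 1, 29), 0.81),
    KbVersion("v3", date(2025, 1, 30), 0.88),
    KbVersion("v4", date(2025, 1, 31), 0.84),
]
kept = prune(history, today=date(2025, 1, 31))
live = promote(kept)
print("live:", live.version_id, "rollback target:", rollback(kept, live))
```

In a real deployment the version records would live in a table such as DynamoDB and "promotion" would mean pointing the query Lambda at the chosen knowledge base ID.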

Feature 3: Multi-Modal Support (Images, Video, Audio)

RAGStack-Lambda, an open-source serverless RAG pipeline, supports:

  • Images: Amazon Nova Multimodal Embeddings for visual search; Textract for OCR of scanned images

  • Video: Transcribe for speech-to-text, split into 30-second searchable chunks with speaker identification

  • Audio: Transcribe for transcription, timestamp indexing


Feature 4: Web Crawler for Documentation

Bedrock Knowledge Bases can crawl websites directly without S3 upload. The data source configuration takes seed URLs, a crawl scope, and regular-expression filters:

json
{
  "dataSourceConfiguration": {
    "type": "WEB",
    "webConfiguration": {
      "sourceConfiguration": {
        "urlConfiguration": {
          "seedUrls": [
            {"url": "https://docs.aws.amazon.com/AmazonS3/latest/userguide/"}
          ]
        }
      },
      "crawlerConfiguration": {
        "scope": "HOST_ONLY",
        "inclusionFilters": [".*"],
        "exclusionFilters": [".*\\.pdf"]
      }
    }
  }
}

Feature 5: Agentic RAG with Amazon Bedrock Agents

For complex workflows (order recommendations, multi-step actions), combine Knowledge Bases with Bedrock Agents:

 
 
Component | Purpose
Knowledge Base | Stores product info, policies, documentation
Agent | Orchestrates multi-step tasks (look up order, check inventory, recommend product)
Action Groups | Connect to backend APIs (order status, inventory check, purchase)
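
An action group is typically backed by a Lambda function that the agent invokes with the API path and parameters it decided on. Below is a minimal sketch of such a handler for an order-status lookup, assuming an action group defined with an API schema. The in-memory ORDERS dictionary, the orderId parameter, and the path are hypothetical stand-ins for a real backend.

```python
import json

# Hypothetical stand-in for a backend order service
ORDERS = {"1001": "shipped", "1002": "processing"}

def lambda_handler(event, context):
    """Handle an agent action-group invocation and return the order status."""
    # The agent passes parameters as a list of {name, type, value} entries
    params = {p["name"]: p["value"] for p in event.get("parameters", [])}
    order_id = params.get("orderId", "")
    status = ORDERS.get(order_id)

    body = {"orderId": order_id, "status": status or "not found"}
    return {
        "messageVersion": "1.0",
        "response": {
            "actionGroup": event.get("actionGroup"),
            "apiPath": event.get("apiPath"),
            "httpMethod": event.get("httpMethod"),
            "httpStatusCode": 200 if status else 404,
            "responseBody": {"application/json": {"body": json.dumps(body)}},
        },
    }

# Local invocation with a hand-written event
sample_event = {
    "actionGroup": "orders",
    "apiPath": "/orders/{orderId}",
    "httpMethod": "GET",
    "parameters": [{"name": "orderId", "type": "string", "value": "1001"}],
}
print(json.dumps(lambda_handler(sample_event, None), indent=2))
```

The agent reads the response body and folds it into its answer, so keep the payload small and descriptive. Check the current Bedrock Agents documentation for the exact event fields before relying on this shape.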



Step 8: Real-World Case Study – Ring's Multi-Locale RAG Chatbot

The Challenge

Ring needed to provide accurate, contextually relevant support across 10 international locales without creating separate infrastructure for each Region. Each territory needed Region-specific product information, from voltage specifications to regulatory compliance details.

The Solution

Ring built a RAG-based chatbot on Amazon Bedrock Knowledge Bases with:

 
 
Component | Implementation
Metadata filtering | Content tagged with contentLocale; query filters by user's locale
Two-phase content management | Ingestion & Evaluation workflow + Promotion workflow
Daily evaluation | LLM-as-a-judge compares version performance
Centralized architecture | Single knowledge base serves all locales

The Results

 
 
Metric | Result
Cost reduction per additional locale | 21%
Locales supported | 10 international locales
Content updates per week | Approximately 200
Version retention | 30 days



Step 9: Implementation Roadmap

Week 1: Foundation

 
 
Day | Task | Deliverable
1-2 | Set up AWS account, enable Bedrock model access, create S3 bucket | Configured environment
3-4 | Upload sample documents, create Knowledge Base via console | Working prototype
5-7 | Test with questions, evaluate answer quality | Validation results

Week 2: Production Ready

 
 
Day | Task | Deliverable
8-9 | Implement Lambda + API Gateway for query endpoint | REST API
10-11 | Add metadata filtering (if multi-tenant/locale) | Filtered retrieval
12-13 | Add source citations to responses | Enhanced UX
14 | Set up monitoring (CloudWatch, Cost Explorer) | Production readiness

Step 10: Common Pitfalls and How to Avoid Them

 
 
Pitfall | Symptom | Solution
Unoptimized chunking | Retrieved chunks are irrelevant or incomplete | Experiment with chunk size (256-512 tokens) and overlap (10-20%)
No metadata filtering | Retrieval returns results from wrong category/locale | Add metadata fields and filter in queries
OpenSearch cost surprise | Monthly bill $300+ even with low usage | Use S3 Vectors for low-volume; configure idle settings
Stale documents | Answers don't reflect latest policies | Implement daily or weekly document sync
No evaluation pipeline | Don't know if answers are improving | Use LLM-as-a-judge to score versions

Step 11: Frequently Asked Questions

Q1: What is the cheapest way to run RAG on AWS?

Use S3 Vectors for your vector store, not OpenSearch Serverless. S3 Vectors charges object storage rates ($0.023/GB) rather than a minimum monthly fee. RAGStack-Lambda demonstrates a serverless pipeline that costs $2-10 per month idle.

Q2: How do I choose between OpenSearch Serverless and S3 Vectors?

 
 
Workload | Recommendation
Low volume (<1,000 queries/month), cost-sensitive | S3 Vectors
Moderate volume, needs low latency | OpenSearch Serverless
Enterprise scale, needs advanced search features | OpenSearch (provisioned)

Q3: How do I keep my knowledge base updated?

  • Method 1: S3 event notifications trigger sync on new uploads

  • Method 2: Scheduled Step Functions workflow (daily sync) 

  • Method 3: Web crawler re-crawl on schedule 
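
Method 1 can be sketched as a small Lambda function subscribed to the bucket's event notifications. The version below starts an ingestion job through the bedrock-agent client's start_ingestion_job call whenever the event lists changed objects. The client is injected so the logic can be exercised without AWS; the IDs and the `make_handler` helper are hypothetical, and the sketch does not handle the case where a sync is already running.

```python
def changed_keys(s3_event: dict) -> list[str]:
    """Extract object keys from an S3 event notification."""
    return [record["s3"]["object"]["key"] for record in s3_event.get("Records", [])]

def start_sync(bedrock_agent_client, knowledge_base_id: str, data_source_id: str) -> str:
    """Start a knowledge base ingestion job and return its ID."""
    response = bedrock_agent_client.start_ingestion_job(
        knowledgeBaseId=knowledge_base_id,
        dataSourceId=data_source_id,
    )
    return response["ingestionJob"]["ingestionJobId"]

def make_handler(bedrock_agent_client, knowledge_base_id: str, data_source_id: str):
    """Build a Lambda handler bound to one knowledge base and data source."""
    def handler(event, context):
        keys = changed_keys(event)
        if not keys:
            return {"started": False, "keys": []}
        job_id = start_sync(bedrock_agent_client, knowledge_base_id, data_source_id)
        return {"started": True, "jobId": job_id, "keys": keys}
    return handler

# In Lambda (assumed wiring, placeholder IDs):
#   import boto3
#   lambda_handler = make_handler(boto3.client("bedrock-agent"),
#                                 "your-knowledge-base-id", "your-data-source-id")
```

If uploads arrive in bursts, consider buffering events (for example through a queue or a short schedule) so each burst triggers one sync rather than one per file.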

Q4: Which LLM should I use for RAG on Bedrock?

 
 
Model | Best For | Cost
Claude 3 Sonnet | Complex Q&A, reasoning | Medium
Claude 3 Haiku | Fast, cost-effective responses | Low
Amazon Nova Lite | Balanced performance/cost | Very low
Amazon Titan Text | Simple Q&A, embeddings | Very low
DeepSeek | Specialized knowledge domains | Pay-per-token

Q5: Can I use RAG with scanned PDFs?

Yes. Textract handles OCR for scanned PDFs and images. Textract charges approximately $1.50 per 1,000 pages for standard text detection.

Q6: How do I debug retrieval quality?

  • Enable CloudWatch logs for Bedrock Knowledge Bases

  • Log retrieved chunks with relevance scores

  • Use a test set of Q&A pairs to measure retrieval accuracy

  • Implement LLM-as-a-judge evaluation pipeline 
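
The test-set approach is the cheapest place to start. The sketch below computes a hit rate: the share of test questions for which the expected source document appears among the top k retrieved results. The retriever is passed in as a function so the metric can be tried offline; in practice it would wrap the bedrock-agent-runtime retrieve call and return the source URIs of the results. The test cases and the fake retriever are hypothetical.

```python
from typing import Callable

def hit_rate_at_k(test_set: list[dict], retrieve: Callable[[str], list[str]], k: int = 5) -> float:
    """Fraction of questions whose expected source is in the top-k retrieved sources."""
    if not test_set:
        return 0.0
    hits = 0
    for case in test_set:
        retrieved = retrieve(case["question"])[:k]
        if case["expected_source"] in retrieved:
            hits += 1
    return hits / len(test_set)

# Hypothetical test set: each question is paired with the document that should answer it
test_set = [
    {"question": "What is our return policy?", "expected_source": "s3://kb/returns.md"},
    {"question": "How do I reset my password?", "expected_source": "s3://kb/account.md"},
]

def fake_retrieve(question: str) -> list[str]:
    """Stand-in retriever that only ever finds the returns document."""
    return ["s3://kb/returns.md", "s3://kb/shipping.md"]

print(f"hit rate@5: {hit_rate_at_k(test_set, fake_retrieve):.2f}")
```

Track this number across chunking and embedding changes. A falling hit rate points at retrieval, while a stable hit rate with worse answers points at the prompt or the model.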

Q7: How do I prevent hallucinations?

  • Set temperature low (0.2-0.5)

  • Use RAG retrieval threshold – only answer if relevant chunks found

  • Add system prompt: "Only answer from the provided context. If unsure, say 'I don't know.'"

  • Cite sources in responses
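
The retrieval threshold and the system prompt can be combined in one guard step. The sketch below assumes you call retrieval and generation separately, so you have relevance scores to inspect. It drops chunks below a minimum score, refuses when nothing relevant remains, and otherwise builds a prompt that restricts the model to the retrieved context. The 0.5 threshold, the result shape, and the `generate` callback are assumptions to adapt.

```python
from typing import Callable

SYSTEM_PROMPT = "Only answer from the provided context. If unsure, say 'I don't know.'"

def filter_relevant(results: list[dict], min_score: float = 0.5) -> list[dict]:
    """Keep only retrieved chunks whose relevance score meets the threshold."""
    return [r for r in results if r["score"] >= min_score]

def answer_or_refuse(question: str, results: list[dict],
                     generate: Callable[[str], str], min_score: float = 0.5) -> str:
    """Refuse when no chunk is relevant enough; otherwise answer from context only."""
    relevant = filter_relevant(results, min_score)
    if not relevant:
        return "I don't know."  # no LLM call, so nothing to hallucinate
    context = "\n\n".join(r["text"] for r in relevant)
    prompt = f"{SYSTEM_PROMPT}\n\nContext:\n{context}\n\nQuestion: {question}"
    return generate(prompt)

# Hypothetical retrieval results and a stand-in for the LLM call
results = [
    {"text": "Returns are accepted within 30 days of purchase.", "score": 0.82},
    {"text": "Our office is closed on public holidays.", "score": 0.21},
]
print(answer_or_refuse("What is our return policy?", results, generate=lambda p: p))
```

Pick the threshold empirically: score ranges differ between vector stores and embedding models, so calibrate it against questions you know are out of scope.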

Q8: What is the difference between Agents and Knowledge Bases?

 
 
Knowledge Bases | Agents for Bedrock
Retrieves relevant documents | Reasons, plans, takes actions
Answers questions | Executes multi-step tasks
Best for Q&A | Best for workflows (order status, booking, etc.)

They can be combined: the Knowledge Base provides information; the Agent orchestrates actions.


Step 12: Final Tagline

"Your company has documents. Policies. Manuals. Support articles. What if an AI could read all of them and answer any question instantly? RAG on AWS makes it possible – for less than the cost of a coffee per month."


Ready to Build Your RAG Chatbot?

You don't need to be an AI expert. You need the right architecture and a clear roadmap. Let us help you build it.

Contact Us

Phone: +91 7464 099 059 / +91 96899 67356
Email: info@innovativeais.com
Address: Netaji Subhash Place, Pitampura, Delhi – 110034
Website: https://innovativeais.com

 
 
 
 
 