What Is RAG and Why AWS?

The RAG Workflow – A Simple Explanation

Step	What Happens	AWS Service
Ingestion	You upload documents (PDFs, websites, text files)	S3, Web Crawler
Chunking	Documents are split into smaller pieces	Bedrock Knowledge Bases
Embedding	Each chunk is converted into a vector (a mathematical representation of meaning)	Amazon Titan Embeddings, Nova Multimodal
Storage	Vectors are stored in a vector database	Aurora pgvector, OpenSearch Serverless, S3 Vectors
Query	The user's question is converted to a vector and the most similar chunks are retrieved	Bedrock Knowledge Bases
Generation	Retrieved chunks + user question sent to LLM for answer	Anthropic Claude, Amazon Nova, DeepSeek
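
To make the workflow concrete, here is a toy sketch of the embed, retrieve, and prompt-building steps in plain Python. It is an illustration only: the word-count `embed` function is a stand-in assumption for a real embedding model such as Titan, and the function names and sample chunks are hypothetical.

```python
import math
import re
from collections import Counter

def embed(text):
    """Toy 'embedding': word counts. A real pipeline would call an embedding model."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))

def cosine(a, b):
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def retrieve(question, chunks, k=2):
    """Return the k chunks most similar to the question."""
    q = embed(question)
    return sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)[:k]

def build_prompt(question, context_chunks):
    """Combine retrieved chunks and the question into a single LLM prompt."""
    context = "\n".join(f"- {c}" for c in context_chunks)
    return f"Answer using only this context:\n{context}\n\nQuestion: {question}"

chunks = [
    "Returns are accepted within 30 days of purchase.",
    "To reset your password, open Settings and choose Security.",
    "Our office is closed on public holidays.",
]
top = retrieve("How do I reset my password?", chunks, k=1)
prompt = build_prompt("How do I reset my password?", top)
```

In a managed setup, Bedrock Knowledge Bases performs these steps for you; the sketch only shows what happens conceptually at each stage.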

"RAG bridges the gap between what an LLM knows from training and what your business knows from its own data. The LLM provides the language skills. Your data provides the knowledge."

Why AWS for RAG?

Advantage	Why It Matters
Fully managed	Bedrock Knowledge Bases handles chunking, embedding, and vector storage – no infrastructure to manage
Model choice	Use Titan, Claude, Nova, DeepSeek, Llama – switch models by changing the model ID rather than rewriting the pipeline
Serverless	Pay only for what you use – scale to zero when idle
Security	Your data stays in your VPC, encrypted with KMS, never used to train models
Proven at scale	Ring, Amazon's home security subsidiary, uses Bedrock Knowledge Bases across 10 international locales

Step 3: Two Paths – Choose Your Starting Point

Path	Best For	Time	Cost	Complexity
Path 1: Simple Knowledge Base (No Code)	Beginners, quick prototype, internal knowledge base	20 minutes	Pay-as-you-go (Bedrock, OpenSearch)	Low
Path 2: Production-Ready Pipeline (Code + Infrastructure)	Customer-facing chatbot, custom RAG, high volume	1-2 days	$2-10/month idle, $0.001-0.003 per query	Medium-High

"Start with Path 1 to understand the concepts. Move to Path 2 when you need production scale, customizations, or cost optimization."

Step 4: Path 1 – Build a RAG Chatbot in 20 Minutes (No Code)

This path requires zero coding. You will use the Amazon Bedrock console to create a knowledge base and test it immediately.

Prerequisites

Requirement	Details
AWS account	Any region where Bedrock is available (us-east-1, us-west-2, eu-west-1, ap-southeast-1)
Model access	Enable access to Titan Embeddings and Claude or Nova in Bedrock Model Access
S3 bucket	Create a bucket for your source documents

Step 4.1: Enable Model Access

Action	Instructions
Open Bedrock console	Navigate to Amazon Bedrock in your AWS account
Model access	Click "Model Access" in the left navigation
Enable models	Select "Amazon Titan Text Embeddings V2" and "Claude 3" or "Nova"
Submit	Wait 2-3 minutes for access to be granted

Step 4.2: Prepare and Upload Documents

Document Type	Best Practices	Size Limit
Text (`.txt`)	Clean formatting, no special characters	50 MB per file
Markdown (`.md`)	Use for documentation	50 MB per file
PDF (`.pdf`)	Ensure text is selectable (not scanned-only)	50 MB per file
HTML (`.html`)	Crawl websites directly – no upload needed	N/A

Supported formats: plain text, markdown, PDF, HTML, CSV, JSON, PowerPoint, Word, Excel
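
Before uploading, it can help to check a local folder against these rules. The following is a minimal sketch, not an official tool: the extension set covers only the four types in the table, and the size limit is passed in by the caller so you can use whatever per-file quota applies to your account. The function name and the demo files are hypothetical.

```python
import tempfile
from pathlib import Path

# Assumption: only the four document types from the table above are accepted.
ALLOWED_EXTENSIONS = {".txt", ".md", ".pdf", ".html"}

def check_documents(folder, max_bytes):
    """Return (accepted, rejected); rejected entries are (name, reason) tuples."""
    accepted, rejected = [], []
    for path in sorted(Path(folder).rglob("*")):
        if not path.is_file():
            continue
        size = path.stat().st_size
        if path.suffix.lower() not in ALLOWED_EXTENSIONS:
            rejected.append((path.name, "unsupported extension"))
        elif size > max_bytes:
            rejected.append((path.name, "file too large"))
        elif size == 0:
            rejected.append((path.name, "empty file"))
        else:
            accepted.append(path.name)
    return accepted, rejected

# Demo with throwaway files and a deliberately tiny limit.
with tempfile.TemporaryDirectory() as tmp:
    Path(tmp, "faq.md").write_text("# FAQ\nReturns within 30 days.")
    Path(tmp, "notes.exe").write_text("binary")
    Path(tmp, "big.txt").write_text("x" * 2048)
    accepted, rejected = check_documents(tmp, max_bytes=1024)
```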

Step 4.3: Create Knowledge Base in Bedrock Console

Step	Action	Notes
1	Open Bedrock console → Knowledge Bases → Create Knowledge Base
2	Enter name (e.g., `company-kb`)	Use descriptive names
3	Select S3 as data source	Choose the bucket with your documents
4	For Chunking strategy, select "Default"	300 token chunks with 20% overlap
5	For Embeddings, select "Titan Text Embeddings V2"	1024 dimensions, can be reduced to 256
6	For Vector store, select "OpenSearch Serverless"	Fully managed, auto-scales
7	Review and create	Takes 2-5 minutes to create and sync
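
To see what a chunking strategy with overlap does to a document, here is a rough sketch. It assumes whitespace-separated words as a stand-in for model tokens, so real chunk boundaries in Bedrock will differ; the function name is hypothetical.

```python
def chunk_tokens(text, chunk_size=300, overlap_ratio=0.2):
    """Split text into windows of `chunk_size` words, each sharing
    roughly `overlap_ratio` of its words with the previous window."""
    if chunk_size <= 0 or not 0 <= overlap_ratio < 1:
        raise ValueError("chunk_size must be positive and overlap_ratio in [0, 1)")
    tokens = text.split()
    overlap = round(chunk_size * overlap_ratio)
    step = max(1, chunk_size - overlap)
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(" ".join(tokens[start:start + chunk_size]))
        if start + chunk_size >= len(tokens):
            break  # the last window already reached the end of the text
    return chunks

sample = " ".join(f"w{i}" for i in range(10))
demo_chunks = chunk_tokens(sample, chunk_size=5, overlap_ratio=0.2)
```

With a chunk size of 5 and 20% overlap, each window repeats the last word of the previous one, which is the same idea the default strategy applies at a larger scale.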

Step 4.4: Test Your Knowledge Base

Action	Instructions
Open test interface	In Knowledge Base details, click "Test"
Select model	Choose Claude 3 or Nova Lite
Ask questions	"What is our return policy?" "How do I reset my password?"
Review answers	Check that responses cite the source documents

"In testing, Ring found that cross-Region latency accounted for less than 10% of total response time. This allows centralized architecture without per-Region deployments."

Path 1 Cost Estimate

Service	Monthly Cost (Idle)	Cost per Query
OpenSearch Serverless	~350 USD/month minimum (can be reduced with idle settings)	Included
Bedrock Embeddings	0 USD (no ingestion after initial)	0 USD
Bedrock LLM (Claude/Nova)	0 USD	$0.001-0.003
S3 Storage	$0.02/GB	$0

Important: OpenSearch Serverless has a minimum monthly cost of approximately 350 USD. For low-volume workloads, consider Path 2 with S3 Vectors.
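
The trade-off is easier to judge with a quick calculation. This sketch combines a fixed monthly floor with a per-query charge; the constants are the rough estimates quoted in this guide, not official pricing, and the function name is hypothetical.

```python
def monthly_cost(fixed_usd, per_query_usd, queries_per_month):
    """Total monthly cost: fixed floor plus usage-based LLM charges."""
    return fixed_usd + per_query_usd * queries_per_month

# Figures taken from this guide; treat them as rough estimates.
OPENSEARCH_FLOOR_USD = 350.0   # approximate OpenSearch Serverless minimum
S3_VECTORS_IDLE_USD = 10.0     # upper end of the $2-10 idle range for Path 2
PER_QUERY_USD = 0.003          # upper end of the $0.001-0.003 range

for queries in (100, 1_000, 10_000):
    path1 = monthly_cost(OPENSEARCH_FLOOR_USD, PER_QUERY_USD, queries)
    path2 = monthly_cost(S3_VECTORS_IDLE_USD, PER_QUERY_USD, queries)
    print(f"{queries:>6} queries: Path 1 ~${path1:.2f}, Path 2 ~${path2:.2f}")
```

At low volumes the fixed floor dominates, which is why the vector store choice matters more than the model choice for small workloads.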

Step 5: Path 2 – Production-Ready Serverless RAG Pipeline

This path uses an open-source, serverless architecture that scales to zero when idle. Monthly costs for a small knowledge base can be as low as $2-10 USD.

Architecture Overview

text

┌─────────────────────────────────────────────────────────────────────────────┐
│                    PRODUCTION RAG PIPELINE ARCHITECTURE                     │
├─────────────────────────────────────────────────────────────────────────────┤
│                                                                             │
│   ┌──────────┐    ┌──────────┐    ┌──────────────┐    ┌──────────────┐      │
│   │   User   │◄──►│  API     │◄──►│   Lambda     │◄──►│   Bedrock    │      │
│   │  Frontend│    │ Gateway  │    │  Orchestrator│    │  Knowledge   │      │
│   └──────────┘    └──────────┘    └──────────────┘    │     Base     │      │
│                                                       └──────┬───────┘      │
│                                                              │              │
│   ┌──────────────┐    ┌──────────────┐    ┌──────────────┐   │              │
│   │     S3       │    │   Step       │    │   DynamoDB   │   │              │
│   │  Documents   │───►│  Functions   │───►│   Metadata   │   │              │
│   └──────────────┘    └──────────────┘    └──────────────┘   │              │
│                                                              │              │
│                                       ┌──────────────────────┘              │
│                                       ▼                                     │
│                              ┌──────────────────┐                           │
│                              │   OpenSearch     │                           │
│                              │  Serverless /    │                           │
│                              │    S3 Vectors    │                           │
│                              └──────────────────┘                           │
│                                                                             │
└─────────────────────────────────────────────────────────────────────────────┘

Step 5.1: Set Up Your AWS Environment

Action	Instructions	Cost
Create S3 bucket	`aws s3 mb s3://your-rag-documents --region us-east-1`	$0.023/GB
Upload documents	`aws s3 cp ./documents/ s3://your-rag-documents/ --recursive`	Pay for storage
Create IAM role	Role with Bedrock, S3, OpenSearch, Lambda permissions	Free
Enable Bedrock model access	Console → Model Access → Enable Titan + Claude	Pay per token
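
The IAM role row is the least obvious step, so here is a sketch of the two policy documents a knowledge base service role typically needs. The helper names, bucket name, and ARNs are placeholders chosen for illustration; tighten the resources to your own account before using anything like this.

```python
import json

def kb_trust_policy():
    """Trust policy letting the Bedrock service assume the role."""
    return {
        "Version": "2012-10-17",
        "Statement": [{
            "Effect": "Allow",
            "Principal": {"Service": "bedrock.amazonaws.com"},
            "Action": "sts:AssumeRole",
        }],
    }

def kb_permissions_policy(bucket_name, embedding_model_arn, collection_arn):
    """Permissions to read documents, call the embedding model, and write vectors."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {"Effect": "Allow", "Action": ["s3:ListBucket"],
             "Resource": [f"arn:aws:s3:::{bucket_name}"]},
            {"Effect": "Allow", "Action": ["s3:GetObject"],
             "Resource": [f"arn:aws:s3:::{bucket_name}/*"]},
            {"Effect": "Allow", "Action": ["bedrock:InvokeModel"],
             "Resource": [embedding_model_arn]},
            {"Effect": "Allow", "Action": ["aoss:APIAccessAll"],
             "Resource": [collection_arn]},
        ],
    }

policy = kb_permissions_policy(
    "your-rag-documents",
    "arn:aws:bedrock:us-east-1::foundation-model/amazon.titan-embed-text-v2:0",
    "arn:aws:aoss:us-east-1:123456789012:collection/example-id",  # placeholder ARN
)
print(json.dumps(policy, indent=2))
```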

Step 5.2: Create Knowledge Base (Infrastructure as Code)

Using AWS CDK or Terraform:

python

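# AWS CDK - knowledge_base_stack.py
from aws_cdk import Stack
from aws_cdk import aws_iam as iam
from aws_cdk import aws_s3 as s3
from aws_cdk.aws_bedrock import CfnKnowledgeBase, CfnDataSource

class RagKnowledgeBaseStack(Stack):
    # collection_arn: ARN of an existing OpenSearch Serverless collection that
    # already contains the "rag-index" vector index (created outside this stack).
    def __init__(self, scope, id, *, collection_arn, **kwargs):
        super().__init__(scope, id, **kwargs)

        # Create S3 bucket for documents
        documents_bucket = s3.Bucket(self, "DocumentsBucket")

        # Service role that Bedrock assumes to read the documents. It also needs
        # bedrock:InvokeModel on the embedding model and aoss:APIAccessAll on
        # the collection (those policies are omitted here for brevity).
        kb_role = iam.Role(self, "KnowledgeBaseRole",
            assumed_by=iam.ServicePrincipal("bedrock.amazonaws.com"))
        documents_bucket.grant_read(kb_role)

        # Create knowledge base
        knowledge_base = CfnKnowledgeBase(self, "RagKnowledgeBase",
            name="company-rag-kb",
            description="RAG knowledge base for company documents",
            role_arn=kb_role.role_arn,
            knowledge_base_configuration={
                "type": "VECTOR",
                "vectorKnowledgeBaseConfiguration": {
                    "embeddingModelArn": "arn:aws:bedrock:us-east-1::foundation-model/amazon.titan-embed-text-v2:0"
                }
            },
            storage_configuration={
                "type": "OPENSEARCH_SERVERLESS",
                "opensearchServerlessConfiguration": {
                    "collectionArn": collection_arn,
                    "vectorIndexName": "rag-index",
                    "fieldMapping": {
                        "metadataField": "metadata",
                        "textField": "text",
                        # Assumed field name; must match the vector field in the index
                        "vectorField": "vector"
                    }
                }
            }
        )

        # Attach the documents bucket as the knowledge base's data source
        CfnDataSource(self, "DocumentsDataSource",
            name="company-documents",
            knowledge_base_id=knowledge_base.attr_knowledge_base_id,
            data_source_configuration={
                "type": "S3",
                "s3Configuration": {"bucketArn": documents_bucket.bucket_arn}
            }
        )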

Step 5.3: Implement Query API with Lambda and API Gateway

python

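# lambda_function.py - Query handler
import json

import boto3

bedrock_agent_runtime = boto3.client('bedrock-agent-runtime')

KNOWLEDGE_BASE_ID = 'your-knowledge-base-id'
MODEL_ARN = 'arn:aws:bedrock:us-east-1::foundation-model/anthropic.claude-3-sonnet-20240229-v1:0'

def _response(status_code, payload):
    # Shape expected by an API Gateway Lambda proxy integration
    return {
        'statusCode': status_code,
        'headers': {'Content-Type': 'application/json'},
        'body': json.dumps(payload)
    }

def lambda_handler(event, context):
    # The request payload arrives as a JSON string in event['body'] (may be None)
    try:
        body = json.loads(event.get('body') or '{}')
    except json.JSONDecodeError:
        return _response(400, {'error': 'Request body must be valid JSON'})
    if not isinstance(body, dict):
        return _response(400, {'error': 'Request body must be a JSON object'})

    user_query = (body.get('query') or '').strip()
    if not user_query:
        return _response(400, {'error': "Missing 'query' field"})

    # Retrieve and generate using Knowledge Bases
    response = bedrock_agent_runtime.retrieve_and_generate(
        input={'text': user_query},
        retrieveAndGenerateConfiguration={
            'type': 'KNOWLEDGE_BASE',
            'knowledgeBaseConfiguration': {
                'knowledgeBaseId': KNOWLEDGE_BASE_ID,
                'modelArn': MODEL_ARN,
                'retrievalConfiguration': {
                    'vectorSearchConfiguration': {
                        'numberOfResults': 5
                    }
                }
            }
        }
    )

    return _response(200, {
        'answer': response['output']['text'],
        'citations': response.get('citations', [])
    })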

Source: Adapted from

Step 5.4: Add Metadata Filtering for Multi-Locale Support

Ring's production architecture uses metadata filtering to serve Region-specific content from a single centralized system. For example, a knowledge base might store content tagged with contentLocale:

text

{locale}/Service.Ring.{Upsert/Delete}.{unique_identifier}.metadata.json

python

# Query with metadata filtering
response = bedrock_agent_runtime.retrieve_and_generate(
    input={'text': user_query},
    retrieveAndGenerateConfiguration={
        'type': 'KNOWLEDGE_BASE',
        'knowledgeBaseConfiguration': {
            'knowledgeBaseId': KNOWLEDGE_BASE_ID,
            'modelArn': MODEL_ARN,
            'retrievalConfiguration': {
                'vectorSearchConfiguration': {
                    'numberOfResults': 5,
                    'filter': {
                        'equals': {
                            'key': 'contentLocale',
                            'value': 'en-US'  # Or dynamically from user profile
                        }
                    }
                }
            }
        }
    }
)
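
For the filter above to match anything, each document needs a metadata sidecar file next to it in S3. The sketch below builds such a sidecar using the generic Bedrock convention of appending `.metadata.json` to the document key, which is simpler than Ring's own naming pattern shown earlier. The helper names, the fallback to `en-US`, and the supported-locale set are assumptions for illustration.

```python
import json

def metadata_sidecar(document_key, locale):
    """Return (sidecar_key, sidecar_body) for a document stored in S3."""
    body = {"metadataAttributes": {"contentLocale": locale}}
    return f"{document_key}.metadata.json", json.dumps(body)

def locale_filter(user_locale, supported_locales, default="en-US"):
    """Build the retrieval filter, falling back to a default locale."""
    value = user_locale if user_locale in supported_locales else default
    return {"equals": {"key": "contentLocale", "value": value}}

sidecar_key, sidecar_body = metadata_sidecar("en-GB/doorbell-setup.md", "en-GB")
retrieval_filter = locale_filter("de-DE", {"en-US", "en-GB", "de-DE"})
```

The dictionary returned by `locale_filter` can be passed as the `filter` value in `vectorSearchConfiguration`.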

Step 5.5: Add Source Citations and Collapsible References

python

# Enhanced Lambda response with citations
response = bedrock_agent_runtime.retrieve_and_generate(
    input={'text': user_query},
    retrieveAndGenerateConfiguration={
        'type': 'KNOWLEDGE_BASE',
        'knowledgeBaseConfiguration': {
            'knowledgeBaseId': KNOWLEDGE_BASE_ID,
            'modelArn': MODEL_ARN,
            'retrievalConfiguration': {
                'vectorSearchConfiguration': {
                    'numberOfResults': 5
                }
            },
            'generationConfiguration': {
                'inferenceConfig': {
                    'textInferenceConfig': {
                        'temperature': 0.7,
                        'maxTokens': 500
                    }
                }
            }
        }
    }
)

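# Process citations: retrieve_and_generate returns the retrieved passages
# under response['citations'][*]['retrievedReferences']
citations = []
for citation in response.get('citations', []):
    for ref in citation.get('retrievedReferences', []):
        citations.append({
            'source': ref.get('location', {}).get('s3Location', {}).get('uri'),
            'content': ref['content']['text']
        })
# Note: relevance scores are returned by the separate retrieve() API,
# not by retrieve_and_generate()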
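
For the collapsible references mentioned in this step's title, one option is to render each citation as an HTML `<details>` element that the frontend can display as an expandable row. This is a minimal sketch under the assumption that each citation is a dictionary with `source` and `content` keys; the function name is hypothetical.

```python
from html import escape

def render_references(citations):
    """Render citations as collapsible HTML <details> elements."""
    items = []
    for i, c in enumerate(citations, start=1):
        items.append(
            f"<details><summary>[{i}] {escape(c['source'])}</summary>"
            f"<p>{escape(c['content'])}</p></details>"
        )
    return "\n".join(items)

references_html = render_references([
    {"source": "s3://your-rag-documents/returns.md", "content": "Returns accepted in <30 days."},
])
```

Escaping the text matters here because retrieved passages come from your documents and may contain characters that would otherwise break the page.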

Step 6: Optimizing for Cost and Performance

Cost Optimization Strategies

Strategy	Implementation	Savings
Scale to zero	Use S3 Vectors instead of OpenSearch Serverless	$350/month → $3/month
Reduce embedding dimensions	Titan V2 at 256 dimensions vs 1024 retains 97% accuracy	75% storage reduction
Cache frequent queries	Use ElastiCache or DynamoDB for repeated questions	50-70% fewer Bedrock calls
Use Spot capacity	For batch processing, not real-time	70-90% off compute
Monitor with Cost Explorer	Set budgets and alerts	Prevent surprise bills
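
As an illustration of the query caching row, here is a small in-memory cache with a time-to-live. It is a sketch only: a production setup would back this with ElastiCache or DynamoDB as the table suggests, and the class and function names are hypothetical.

```python
import hashlib
import time

class QueryCache:
    """In-memory answer cache keyed by a normalized form of the query."""

    def __init__(self, ttl_seconds=3600, clock=time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock
        self._store = {}

    @staticmethod
    def _key(query):
        # Ignore case and extra whitespace so near-identical questions share an entry
        normalized = " ".join(query.lower().split())
        return hashlib.sha256(normalized.encode("utf-8")).hexdigest()

    def get(self, query):
        key = self._key(query)
        entry = self._store.get(key)
        if entry is None:
            return None
        expires_at, answer = entry
        if self.clock() >= expires_at:
            del self._store[key]  # entry is stale
            return None
        return answer

    def put(self, query, answer):
        self._store[self._key(query)] = (self.clock() + self.ttl, answer)

def answer_with_cache(query, cache, generate):
    """Return a cached answer if present, else call `generate` and store the result."""
    cached = cache.get(query)
    if cached is not None:
        return cached
    answer = generate(query)
    cache.put(query, answer)
    return answer
```

Here `generate` would be whatever function calls Bedrock; passing it in keeps the cache logic testable without AWS access.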

Performance Optimization Strategies

Strategy	Implementation	Latency Improvement
Optimize chunk size	Experiment with 256-512 tokens; test semantic coherence	20-30% faster retrieval
Use smaller embedding dimensions	256 vs 1024 dimensions	Faster vector search
Pre-filter by metadata	Apply filters before vector search	Reduces search space
Select faster model for simple queries	Route to Nova Lite or Haiku; use Claude Sonnet for complex	2-3x faster for simple Q&A
Enable response streaming	Stream tokens to user as they generate	Perceived latency reduction
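
The model routing row can be sketched as a simple heuristic. The word-count threshold and hint words below are arbitrary assumptions you would tune against your own traffic, and the model IDs should be checked against what is enabled in your account.

```python
# Assumed model IDs; confirm the exact identifiers in the Bedrock console.
SIMPLE_MODEL_ID = "amazon.nova-lite-v1:0"
COMPLEX_MODEL_ID = "anthropic.claude-3-sonnet-20240229-v1:0"

# Hypothetical signals that a question needs more reasoning.
COMPLEX_HINTS = ("compare", "explain", "difference", "step by step", "trade-off")

def choose_model(query, max_simple_words=12):
    """Route short, direct questions to the faster model."""
    text = query.lower()
    is_complex = len(text.split()) > max_simple_words or any(h in text for h in COMPLEX_HINTS)
    return COMPLEX_MODEL_ID if is_complex else SIMPLE_MODEL_ID

simple_choice = choose_model("What is the return policy?")
complex_choice = choose_model("Explain the difference between the two subscription plans")
```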

Ring's Performance Architecture

Ring's production RAG chatbot requirements specify:

Average end-to-end latency: 7-8 seconds
Cross-Region latency accounts for: Less than 10% of total response time
This allowed: Centralized architecture rather than per-Region deployment

Step 7: Advanced Features for Production

Feature 1: Multi-Locale Support with Metadata Filtering

Ring's architecture uses metadata-driven filtering to serve Region-specific content from a single centralized system, reducing cost per additional locale by 21%.

Component	Implementation
Content tagging	Each document tagged with `contentLocale` (en-US, en-GB, de-DE, etc.)
Ingestion pipeline	Step Functions orchestrates daily knowledge base creation
Evaluation pipeline	LLM-as-a-judge compares versions and promotes highest-performing
Query filtering	Lambda applies metadata filter based on user's locale

Feature 2: Versioning and Rollback

Ring maintains 30 days of version history for knowledge bases:

Daily sync creates new version
Evaluation pipeline tests retrieval accuracy
Highest-performing version promoted to production
Rollback available within 30 days
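
The promote and rollback logic described above can be sketched with a small version registry. This is a hypothetical illustration of the idea, not Ring's implementation: the `KbVersion` type, the scoring scale, and the function names are all assumptions.

```python
from dataclasses import dataclass
from datetime import date, timedelta

@dataclass(frozen=True)
class KbVersion:
    version_id: str
    created: date
    score: float  # evaluation score, for example from an LLM-as-a-judge run

def prune(versions, today, retention_days=30):
    """Keep only versions created within the retention window."""
    cutoff = today - timedelta(days=retention_days)
    return [v for v in versions if v.created >= cutoff]

def promote(versions):
    """Pick the highest-scoring version; ties go to the newest."""
    return max(versions, key=lambda v: (v.score, v.created))

def rollback_target(versions, current):
    """Return the most recent version older than `current`, if any."""
    older = [v for v in versions if v.created < current.created]
    return max(older, key=lambda v: v.created) if older else None

v1 = KbVersion("kb-001", date(2024, 12, 20), 0.90)
v2 = KbVersion("kb-002", date(2025, 1, 10), 0.80)
v3 = KbVersion("kb-003", date(2025, 1, 30), 0.70)
kept = prune([v1, v2, v3], today=date(2025, 1, 31))
production = promote(kept)
previous = rollback_target(kept, v3)
```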

Feature 3: Multi-Modal Support (Images, Video, Audio)

RAGStack-Lambda supports:

Images: Amazon Nova Multimodal Embeddings for visual search; Textract for OCR of scanned images
Video: Transcribe for speech-to-text, split into 30-second searchable chunks with speaker identification
Audio: Transcribe for transcription, timestamp indexing
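
To illustrate the 30-second chunking idea for video and audio, here is a sketch that groups transcript segments into fixed time windows. The input shape (dictionaries with `start`, `speaker`, and `text`) is an assumption made for this example and does not mirror the raw Transcribe output or RAGStack-Lambda's actual code.

```python
def window_segments(segments, window_seconds=30.0):
    """Group transcript segments into fixed-length windows by start time."""
    windows = {}
    for seg in sorted(segments, key=lambda s: s["start"]):
        index = int(seg["start"] // window_seconds)
        win = windows.setdefault(
            index, {"start": index * window_seconds, "speakers": [], "text": []}
        )
        if seg["speaker"] not in win["speakers"]:
            win["speakers"].append(seg["speaker"])
        win["text"].append(seg["text"])
    return [
        {"start": w["start"], "speakers": w["speakers"], "text": " ".join(w["text"])}
        for _, w in sorted(windows.items())
    ]

transcript = [
    {"start": 2.0, "speaker": "spk_0", "text": "Hello"},
    {"start": 12.5, "speaker": "spk_1", "text": "Hi there"},
    {"start": 31.0, "speaker": "spk_0", "text": "Next topic"},
]
searchable_chunks = window_segments(transcript)
```

Each resulting chunk keeps its start time, so a search hit can link back to the right moment in the recording.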

Feature 4: Web Crawler for Documentation

Bedrock Knowledge Bases can crawl websites directly without S3 upload:

json

{
  "dataSourceConfiguration": {
    "type": "WEB",
    "webConfiguration": {
      "sourceConfiguration": {
        "urlConfiguration": {
          "seedUrls": [
            {"url": "https://docs.aws.amazon.com/AmazonS3/latest/userguide/"}
          ]
        }
      },
      "crawlerConfiguration": {
        "scope": "HOST_ONLY",
        "inclusionFilters": [".*"],
        "exclusionFilters": [".*\\.pdf$"]
      }
    }
  }
}

Feature 5: Agentic RAG with Amazon Bedrock Agents

For complex workflows (order recommendations, multi-step actions), combine Knowledge Bases with Bedrock Agents:

Component	Purpose
Knowledge Base	Stores product info, policies, documentation
Agent	Orchestrates multi-step tasks (look up order, check inventory, recommend product)
Action Groups	Connect to backend APIs (order status, inventory check, purchase)
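
An action group is usually backed by a Lambda function that the agent calls with the parameters it extracted. Below is a minimal sketch of such a handler for an action group defined with an API schema. The `ORDERS` dictionary, the `orderId` parameter, and the `/orders/status` path are hypothetical stand-ins for a real backend.

```python
import json

# Hypothetical in-memory stand-in for an order system.
ORDERS = {"A100": {"status": "shipped"}}

def lambda_handler(event, context):
    # The agent passes extracted parameters as a list of {name, type, value} items
    params = {p["name"]: p["value"] for p in event.get("parameters", [])}
    order = ORDERS.get(params.get("orderId"))
    status_code = 200 if order else 404
    body = order if order else {"error": "order not found"}
    # Echo the routing fields back so the agent can match the response to its request
    return {
        "messageVersion": "1.0",
        "response": {
            "actionGroup": event["actionGroup"],
            "apiPath": event["apiPath"],
            "httpMethod": event["httpMethod"],
            "httpStatusCode": status_code,
            "responseBody": {"application/json": {"body": json.dumps(body)}},
        },
    }

sample_event = {
    "actionGroup": "OrderActions",
    "apiPath": "/orders/status",
    "httpMethod": "GET",
    "parameters": [{"name": "orderId", "type": "string", "value": "A100"}],
}
result = lambda_handler(sample_event, None)
```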

Step 8: Real-World Case Study – Ring's Multi-Locale RAG Chatbot

The Challenge

Ring needed to provide accurate, contextually relevant support across 10 international locales without creating separate infrastructure for each Region. Each territory needed Region-specific product information, from voltage specifications to regulatory compliance details.

The Solution

Ring built a RAG-based chatbot on Amazon Bedrock Knowledge Bases with:

Component	Implementation
Metadata filtering	Content tagged with `contentLocale`; query filters by user's locale
Two-phase content management	Ingestion & Evaluation workflow + Promotion workflow
Daily evaluation	LLM-as-a-judge compares version performance
Centralized architecture	Single knowledge base serves all locales

The Results

Metric	Result
Cost reduction per additional locale	21%
Locales supported	10 international Regions
Content updates per week	Approximately 200
Version retention	30 days

Step 9: Implementation Roadmap

Week 1: Foundation

Day	Task	Deliverable
1-2	Set up AWS account, enable Bedrock model access, create S3 bucket	Configured environment
3-4	Upload sample documents, create Knowledge Base via console	Working prototype
5-7	Test with questions, evaluate answer quality	Validation results

Week 2: Production Ready

Day	Task	Deliverable
8-9	Implement Lambda + API Gateway for query endpoint	REST API
10-11	Add metadata filtering (if multi-tenant/locale)	Filtered retrieval
12-13	Add source citations to responses	Enhanced UX
14	Set up monitoring (CloudWatch, Cost Explorer)	Production readiness

Step 10: Common Pitfalls and How to Avoid Them

Pitfall	Symptom	Solution
Unoptimized chunking	Retrieved chunks are irrelevant or incomplete	Experiment with chunk size (256-512) and overlap (10-20%)
No metadata filtering	Retrieval returns results from wrong category/locale	Add metadata fields and filter in queries
OpenSearch cost surprise	Monthly bill $300+ even with low usage	Use S3 Vectors for low-volume; configure idle settings
Stale documents	Answers don't reflect latest policies	Implement daily or weekly document sync
No evaluation pipeline	Don't know if answers are improving	Use LLM-as-a-judge to score versions

Step 11: Frequently Asked Questions

Q1: What is the cheapest way to run RAG on AWS?

Use S3 Vectors for your vector store, not OpenSearch Serverless. S3 Vectors charges object storage rates ($0.023/GB) rather than a minimum monthly fee. RAGStack-Lambda demonstrates a serverless pipeline that costs $2-10 per month idle.

Q2: How do I choose between OpenSearch Serverless and S3 Vectors?

Workload	Recommendation
Low volume (<1,000 queries/month), cost-sensitive	S3 Vectors
Moderate volume, needs low latency	OpenSearch Serverless
Enterprise scale, needs advanced search features	OpenSearch (provisioned)

Q3: How do I keep my knowledge base updated?

Method 1: S3 event notifications trigger sync on new uploads
Method 2: Scheduled Step Functions workflow (daily sync)
Method 3: Web crawler re-crawl on schedule
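
Method 1 can be sketched as a Lambda function subscribed to S3 event notifications. The environment variable names and the suffix list are assumptions for this example; `boto3` is imported inside the handler so the filtering helper can be exercised without AWS access.

```python
import os
from urllib.parse import unquote_plus

# Assumed set of document types that should trigger a sync.
DOCUMENT_SUFFIXES = (".pdf", ".md", ".txt", ".html")

def should_sync(s3_event):
    """True if any uploaded object looks like a source document."""
    for record in s3_event.get("Records", []):
        key = unquote_plus(record["s3"]["object"]["key"])  # S3 event keys are URL-encoded
        if key.lower().endswith(DOCUMENT_SUFFIXES):
            return True
    return False

def lambda_handler(event, context):
    if not should_sync(event):
        return {"started": False}
    import boto3  # lazy import: only needed when a sync is actually started
    client = boto3.client("bedrock-agent")
    job = client.start_ingestion_job(
        knowledgeBaseId=os.environ["KNOWLEDGE_BASE_ID"],
        dataSourceId=os.environ["DATA_SOURCE_ID"],
    )
    return {"started": True, "jobId": job["ingestionJob"]["ingestionJobId"]}
```

If many files arrive at once, consider batching the notifications so you do not start a new ingestion job for every single upload.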

Q4: Which LLM should I use for RAG on Bedrock?

Model	Best For	Cost
Claude 3 Sonnet	Complex Q&A, reasoning	Medium
Claude 3 Haiku	Fast, cost-effective responses	Low
Amazon Nova Lite	Balanced performance/cost	Very low
Amazon Titan Text	Simple Q&A, embeddings	Very low
DeepSeek	Specialized knowledge domains	Pay-per-token

Q5: Can I use RAG with scanned PDFs?

Yes. Textract handles OCR for scanned PDFs and images. Textract charges approximately $1.50 per 1,000 pages for standard text detection.

Q6: How do I debug retrieval quality?

Enable CloudWatch logs for Bedrock Knowledge Bases
Log retrieved chunks with relevance scores
Use a test set of Q&A pairs to measure retrieval accuracy
Implement LLM-as-a-judge evaluation pipeline
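
Measuring retrieval accuracy against a test set can start with two simple metrics: hit rate at k and mean reciprocal rank. The sketch below assumes you have, for each test question, the ranked list of retrieved document IDs and the one document that should have been found; the function names and sample data are hypothetical.

```python
def hit_rate_at_k(results, expected, k=5):
    """Fraction of questions whose expected document appears in the top k."""
    if not results:
        return 0.0
    hits = sum(1 for q, docs in results.items() if expected[q] in docs[:k])
    return hits / len(results)

def mean_reciprocal_rank(results, expected):
    """Average of 1/rank of the expected document (0 when it is missing)."""
    if not results:
        return 0.0
    total = 0.0
    for q, docs in results.items():
        if expected[q] in docs:
            total += 1.0 / (docs.index(expected[q]) + 1)
    return total / len(results)

retrieved = {
    "How do I return an item?": ["returns.md", "shipping.md"],
    "How do I reset my password?": ["billing.md", "account.md"],
}
expected_docs = {
    "How do I return an item?": "returns.md",
    "How do I reset my password?": "account.md",
}
hit_at_1 = hit_rate_at_k(retrieved, expected_docs, k=1)
mrr = mean_reciprocal_rank(retrieved, expected_docs)
```

Tracking these numbers before and after a chunking or embedding change shows whether retrieval actually improved.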

Q7: How do I prevent hallucinations?

Set temperature low (0.2-0.5)
Use RAG retrieval threshold – only answer if relevant chunks found
Add system prompt: "Only answer from the provided context. If unsure, say 'I don't know.'"
Cite sources in responses
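
The retrieval threshold idea can be sketched as a guard that refuses to build a prompt when nothing relevant was found. It assumes results shaped like those returned by the `retrieve` API, where each item has `content.text` and a `score`. The 0.5 cutoff is an arbitrary starting point, and the function names are hypothetical.

```python
SYSTEM_RULE = "Only answer from the provided context. If unsure, say 'I don't know.'"

def relevant_passages(retrieval_results, min_score=0.5):
    """Keep only passages whose relevance score meets the threshold."""
    return [
        r["content"]["text"]
        for r in retrieval_results
        if r.get("score", 0.0) >= min_score
    ]

def grounded_prompt(question, retrieval_results, min_score=0.5):
    """Return a prompt, or None when no passage is relevant enough to answer."""
    passages = relevant_passages(retrieval_results, min_score)
    if not passages:
        return None
    context = "\n".join(f"[{i}] {p}" for i, p in enumerate(passages, start=1))
    return f"{SYSTEM_RULE}\n\nContext:\n{context}\n\nQuestion: {question}"

sample_results = [
    {"content": {"text": "Returns within 30 days."}, "score": 0.82},
    {"content": {"text": "Office hours are 9 to 5."}, "score": 0.20},
]
guarded = grounded_prompt("What is the return policy?", sample_results)
```

When `grounded_prompt` returns `None`, the application can reply "I don't know" directly instead of calling the model.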

Q8: What is the difference between Agents and Knowledge Bases?

Knowledge Bases	Agents for Bedrock
Retrieves relevant documents	Reasons, plans, takes actions
Answers questions	Executes multi-step tasks
Best for Q&A	Best for workflows (order status, booking, etc.)

They can be combined: Knowledge Base provides information; Agent orchestrates actions.

Step 12: Final Tagline

"Your company has documents. Policies. Manuals. Support articles. What if an AI could read all of them and answer any question instantly? RAG on AWS makes it possible – for less than the cost of a coffee per month."

Short version:
Step-by-step guide to building a high-performance RAG chatbot on AWS – from 20-minute Knowledge Base setup to production-ready, serverless pipeline. Costs less than $3/month idle.


Ready to Build Your RAG Chatbot?

You don't need to be an AI expert. You need the right architecture and a clear roadmap. Let us help you build it.

Contact Us

Phone: +91 7464 099 059 / +91 96899 67356
Email: info@innovativeais.com
Address: Netaji Subhash Place, Pitampura, Delhi – 110034
Website: https://innovativeais.com

Building a High-Performance RAG Chatbot on AWS: A Step-by-Step Guide
