What Is RAG and Why AWS?
The RAG Workflow – A Simple Explanation
| Step | What Happens | AWS Service |
|---|---|---|
| Ingestion | You upload documents (PDFs, websites, text files) | S3, Web Crawler |
| Chunking | Documents are split into smaller pieces | Bedrock Knowledge Bases |
| Embedding | Each chunk is converted into a vector (a mathematical representation of meaning) | Amazon Titan Embeddings, Nova Multimodal |
| Storage | Vectors are stored in a vector database | Aurora pgvector, OpenSearch Serverless, S3 Vectors |
| Query | User asks a question, converted to vector, similar chunks retrieved | Bedrock Knowledge Bases |
| Generation | Retrieved chunks + user question sent to LLM for answer | Anthropic Claude, Amazon Nova, DeepSeek |
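The query and generation steps in the table can be sketched in a few lines of plain Python. This is a toy illustration only: the `embed` function below is a bag-of-words counter standing in for a real embedding model such as Titan, and every function name here is hypothetical rather than part of any AWS SDK.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    """Toy embedding: word counts (a stand-in for a real embedding model)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve(query: str, chunks: list[str], k: int = 2) -> list[str]:
    """Query step: embed the question and return the k most similar chunks."""
    q = embed(query)
    return sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)[:k]


def build_prompt(query: str, context: list[str]) -> str:
    """Generation step input: retrieved chunks plus the user question."""
    joined = "\n".join(f"- {c}" for c in context)
    return f"Answer using only this context:\n{joined}\n\nQuestion: {query}"


chunks = [
    "Returns are accepted within 30 days of purchase.",
    "Passwords can be reset from the account settings page.",
    "Shipping takes 3 to 5 business days.",
]
question = "How do I reset my password?"
top = retrieve(question, chunks, k=1)
prompt = build_prompt(question, top)
```

In a real pipeline the managed service performs the embedding and similarity search; the sketch only shows why "similar meaning" reduces to comparing vectors.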
"RAG bridges the gap between what an LLM knows from training and what your business knows from its own data. The LLM provides the language skills. Your data provides the knowledge."
Why AWS for RAG?
| Advantage | Why It Matters |
|---|---|
| Fully managed | Bedrock Knowledge Bases handles chunking, embedding, and vector storage – no infrastructure to manage |
| Model choice | Use Titan, Claude, Nova, DeepSeek, or Llama – switching models is a configuration change (a different model ARN), not a rewrite |
| Serverless | Pay only for what you use – scale to zero when idle |
| Security | Your data stays in your VPC, encrypted with KMS, never used to train models |
| Proven at scale | Ring, Amazon's home security subsidiary, uses Bedrock Knowledge Bases across 10 international locales |
Step 3: Two Paths – Choose Your Starting Point
| Path | Best For | Time | Cost | Complexity |
|---|---|---|---|---|
| Path 1: Simple Knowledge Base (No Code) | Beginners, quick prototype, internal knowledge base | 20 minutes | Pay-as-you-go (Bedrock, OpenSearch) | Low |
| Path 2: Production-Ready Pipeline (Code + Infrastructure) | Customer-facing chatbot, custom RAG, high volume | 1-2 days | $2-10/month idle, $0.001-0.003 per query | Medium-High |
"Start with Path 1 to understand the concepts. Move to Path 2 when you need production scale, customizations, or cost optimization."
Step 4: Path 1 – Build a RAG Chatbot in 20 Minutes (No Code)
This path requires zero coding. You will use the Amazon Bedrock console to create a knowledge base and test it immediately.
Prerequisites
| Requirement | Details |
|---|---|
| AWS account | Any region where Bedrock is available (us-east-1, us-west-2, eu-west-1, ap-southeast-1) |
| Model access | Enable access to Titan Embeddings and Claude or Nova in Bedrock Model Access |
| S3 bucket | Create a bucket for your source documents |
Step 4.1: Enable Model Access
| Action | Instructions |
|---|---|
| Open Bedrock console | Navigate to Amazon Bedrock in your AWS account |
| Model access | Click "Model Access" in the left navigation |
| Enable models | Select "Amazon Titan Text Embeddings V2" and "Claude 3" or "Nova" |
| Submit | Wait 2-3 minutes for access to be granted |
Step 4.2: Prepare and Upload Documents
| Document Type | Best Practices | Size Limit |
|---|---|---|
| Text (.txt) | Clean formatting, no special characters | 10MB per file |
| Markdown (.md) | Use for documentation | 10MB per file |
| PDF (.pdf) | Ensure text is selectable (not scanned-only) | 10MB per file |
| HTML (.html) | Crawl websites directly – no upload needed | N/A |
Supported formats: plain text, markdown, PDF, HTML, CSV, JSON, PowerPoint, Word, Excel
Step 4.3: Create Knowledge Base in Bedrock Console
| Step | Action | Notes |
|---|---|---|
| 1 | Open Bedrock console → Knowledge Bases → Create Knowledge Base | |
| 2 | Enter name (e.g., company-kb) | Use descriptive names |
| 3 | Select S3 as data source | Choose the bucket with your documents |
| 4 | For Chunking strategy, select "Default" | 300 token chunks with 20% overlap |
| 5 | For Embeddings, select "Titan Text Embeddings V2" | 1024 dimensions, can be reduced to 256 |
| 6 | For Vector store, select "OpenSearch Serverless" | Fully managed, auto-scales |
| 7 | Review and create | Takes 2-5 minutes to create and sync |
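To make the "300 token chunks with 20% overlap" setting concrete, here is a minimal sketch of fixed-size chunking with overlap. It assumes tokens are already available as a list (a whitespace split is used below as a rough stand-in for the model's tokenizer), and `chunk_tokens` is a hypothetical helper, not the code Bedrock runs internally.

```python
def chunk_tokens(tokens: list[str], size: int = 300, overlap_pct: float = 0.20) -> list[list[str]]:
    """Split tokens into windows of `size`, each sharing `overlap_pct` with the previous one."""
    if size <= 0 or not 0 <= overlap_pct < 1:
        raise ValueError("size must be positive and overlap_pct in [0, 1)")
    overlap = round(size * overlap_pct)   # 60 tokens for the defaults
    step = max(1, size - overlap)         # each window starts 240 tokens after the last
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + size])
        if start + size >= len(tokens):   # the last window already reached the end
            break
    return chunks


# Whitespace split is only an approximation of real tokenization.
document = " ".join(f"word{i}" for i in range(1000))
pieces = chunk_tokens(document.split())
```

The overlap means a sentence that straddles a chunk boundary still appears whole in at least one chunk, at the price of storing some tokens twice.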
Step 4.4: Test Your Knowledge Base
| Action | Instructions |
|---|---|
| Open test interface | In Knowledge Base details, click "Test" |
| Select model | Choose Claude 3 or Nova Lite |
| Ask questions | "What is our return policy?" "How do I reset my password?" |
| Review answers | Check that responses cite the source documents |
"In testing, Ring found that cross-Region latency accounted for less than 10% of total response time. This allows centralized architecture without per-Region deployments."
Path 1 Cost Estimate
| Service | Monthly Cost (Idle) | Cost per Query |
|---|---|---|
| OpenSearch Serverless | ~$350/month minimum (can be reduced with idle settings) | Included |
| Bedrock Embeddings | $0 (no ingestion after initial sync) | $0 |
| Bedrock LLM (Claude/Nova) | $0 | $0.001-0.003 |
| S3 Storage | ~$0.023/GB | $0 |
Important: OpenSearch Serverless has a minimum monthly cost of approximately 350 USD. For low-volume workloads, consider Path 2 with S3 Vectors.
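A quick way to see how the fixed minimum dominates at low volume is to model monthly cost as idle cost plus per-query cost. The figures below are taken from this guide (about $350 idle for Path 1, the upper end of the $2-10 idle range for Path 2, and the upper end of the per-query range); the function and constant names are illustrative assumptions.

```python
def monthly_cost(idle_usd: float, per_query_usd: float, queries: int) -> float:
    """Total monthly cost = fixed idle cost + variable per-query cost."""
    return idle_usd + per_query_usd * queries


PATH1_IDLE = 350.0   # OpenSearch Serverless minimum quoted in this guide
PATH2_IDLE = 10.0    # upper end of the $2-10 idle range quoted for Path 2
PER_QUERY = 0.003    # upper end of the per-query LLM cost range

for queries in (1_000, 10_000, 100_000):
    path1 = monthly_cost(PATH1_IDLE, PER_QUERY, queries)
    path2 = monthly_cost(PATH2_IDLE, PER_QUERY, queries)
    print(f"{queries:>7} queries/month: Path 1 ${path1:,.2f} vs Path 2 ${path2:,.2f}")
```

Because both paths pay the same per-query rate in this simplified model, the gap between them stays constant; what changes with volume is how much that gap matters relative to the total bill.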
Step 5: Path 2 – Production-Ready Serverless RAG Pipeline
This path uses an open-source, serverless architecture that scales to zero when idle. Monthly costs for a small knowledge base can be as low as $2-10.
Architecture Overview
    ┌─────────────────────────────────────────────────────────────────────────────┐
    │                    PRODUCTION RAG PIPELINE ARCHITECTURE                     │
    ├─────────────────────────────────────────────────────────────────────────────┤
    │                                                                             │
    │  ┌──────────┐    ┌──────────┐    ┌──────────────┐    ┌──────────────┐       │
    │  │   User   │◄──►│   API    │◄──►│    Lambda    │◄──►│   Bedrock    │       │
    │  │ Frontend │    │ Gateway  │    │ Orchestrator │    │  Knowledge   │       │
    │  └──────────┘    └──────────┘    └──────────────┘    │     Base     │       │
    │                                                      └──────┬───────┘       │
    │                                                             │               │
    │  ┌──────────────┐    ┌──────────────┐    ┌──────────────┐   │               │
    │  │      S3      │    │     Step     │    │   DynamoDB   │   │               │
    │  │  Documents   │───►│  Functions   │───►│   Metadata   │   │               │
    │  └──────────────┘    └──────────────┘    └──────────────┘   │               │
    │                                                             │               │
    │                                      ┌──────────────────────┘               │
    │                                      ▼                                      │
    │                             ┌──────────────────┐                            │
    │                             │    OpenSearch    │                            │
    │                             │   Serverless /   │                            │
    │                             │    S3 Vectors    │                            │
    │                             └──────────────────┘                            │
    │                                                                             │
    └─────────────────────────────────────────────────────────────────────────────┘
Step 5.1: Set Up Your AWS Environment
| Action | Instructions | Cost |
|---|---|---|
| Create S3 bucket | aws s3 mb s3://your-rag-documents --region us-east-1 | $0.023/GB |
| Upload documents | aws s3 cp ./documents/ s3://your-rag-documents/ --recursive | Pay for storage |
| Create IAM role | Role with Bedrock, S3, OpenSearch, Lambda permissions | Free |
| Enable Bedrock model access | Console → Model Access → Enable Titan + Claude | Pay per token |
Step 5.2: Create Knowledge Base (Infrastructure as Code)
Using AWS CDK or Terraform:
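    # AWS CDK - knowledge_base_stack.py
    from aws_cdk import Stack, aws_s3 as s3
    from aws_cdk.aws_bedrock import CfnKnowledgeBase, CfnDataSource


    class RagKnowledgeBaseStack(Stack):
        # kb_role_arn and collection_arn are assumed to be created elsewhere:
        # an IAM role Bedrock can assume, and an OpenSearch Serverless collection.
        def __init__(self, scope, id, *, kb_role_arn, collection_arn, **kwargs):
            super().__init__(scope, id, **kwargs)

            # Create S3 bucket for documents
            documents_bucket = s3.Bucket(self, "DocumentsBucket")

            # Create knowledge base
            knowledge_base = CfnKnowledgeBase(self, "RagKnowledgeBase",
                name="company-rag-kb",
                description="RAG knowledge base for company documents",
                role_arn=kb_role_arn,
                knowledge_base_configuration={
                    "type": "VECTOR",
                    "vectorKnowledgeBaseConfiguration": {
                        "embeddingModelArn": "arn:aws:bedrock:us-east-1::foundation-model/amazon.titan-embed-text-v2:0"
                    }
                },
                storage_configuration={
                    "type": "OPENSEARCH_SERVERLESS",
                    "opensearchServerlessConfiguration": {
                        "collectionArn": collection_arn,
                        "vectorIndexName": "rag-index",
                        "fieldMapping": {
                            "metadataField": "metadata",
                            "textField": "text",
                            # The field mapping also needs a vector field; "embedding" is
                            # an assumed name and must match the field in the vector index.
                            "vectorField": "embedding"
                        }
                    }
                }
            )

            # Register the bucket as the knowledge base's S3 data source
            CfnDataSource(self, "DocumentsDataSource",
                name="company-documents",
                knowledge_base_id=knowledge_base.attr_knowledge_base_id,
                data_source_configuration={
                    "type": "S3",
                    "s3Configuration": {"bucketArn": documents_bucket.bucket_arn}
                }
            )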
Step 5.3: Implement Query API with Lambda and API Gateway
    # lambda_function.py - Query handler
    import json

    import boto3

    bedrock_agent_runtime = boto3.client('bedrock-agent-runtime')

    KNOWLEDGE_BASE_ID = 'your-knowledge-base-id'
    MODEL_ARN = 'arn:aws:bedrock:us-east-1::foundation-model/anthropic.claude-3-sonnet-20240229-v1:0'


    def lambda_handler(event, context):
        # API Gateway proxy events carry the request payload as a JSON string
        body = json.loads(event.get('body') or '{}')
        user_query = body.get('query', '').strip()
        if not user_query:
            return {
                'statusCode': 400,
                'headers': {'Content-Type': 'application/json'},
                'body': json.dumps({'error': 'Missing "query" in request body'})
            }

        # Retrieve and generate using Knowledge Bases
        response = bedrock_agent_runtime.retrieve_and_generate(
            input={'text': user_query},
            retrieveAndGenerateConfiguration={
                'type': 'KNOWLEDGE_BASE',
                'knowledgeBaseConfiguration': {
                    'knowledgeBaseId': KNOWLEDGE_BASE_ID,
                    'modelArn': MODEL_ARN,
                    'retrievalConfiguration': {
                        'vectorSearchConfiguration': {
                            'numberOfResults': 5
                        }
                    }
                }
            }
        )

        return {
            'statusCode': 200,
            'headers': {'Content-Type': 'application/json'},
            'body': json.dumps({
                'answer': response['output']['text'],
                'citations': response.get('citations', [])
            })
        }
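A query handler like this can be exercised locally before it is deployed, without AWS credentials, by injecting the client instead of creating it at import time. The sketch below is one way to do that under stated assumptions: `FakeBedrockRuntime` and `make_handler` are hypothetical names, and the fake returns a canned response shaped like the `output` and `citations` fields the handler reads.

```python
import json


class FakeBedrockRuntime:
    """Hypothetical stand-in for the bedrock-agent-runtime client."""

    def __init__(self):
        self.calls = []

    def retrieve_and_generate(self, **kwargs):
        self.calls.append(kwargs)
        return {"output": {"text": "Returns are accepted within 30 days."}, "citations": []}


def make_handler(client, knowledge_base_id, model_arn):
    """Build a Lambda-style handler around an injected client."""
    def handler(event, context=None):
        body = json.loads(event.get("body") or "{}")
        query = body.get("query", "").strip()
        if not query:
            return {"statusCode": 400, "body": json.dumps({"error": "Missing query"})}
        response = client.retrieve_and_generate(
            input={"text": query},
            retrieveAndGenerateConfiguration={
                "type": "KNOWLEDGE_BASE",
                "knowledgeBaseConfiguration": {
                    "knowledgeBaseId": knowledge_base_id,
                    "modelArn": model_arn,
                },
            },
        )
        payload = {"answer": response["output"]["text"], "citations": response.get("citations", [])}
        return {"statusCode": 200, "body": json.dumps(payload)}
    return handler


fake = FakeBedrockRuntime()
handler = make_handler(fake, "kb-123", "arn:aws:bedrock:us-east-1::foundation-model/example-model")
ok = handler({"body": json.dumps({"query": "What is our return policy?"})})
bad = handler({"body": None})
```

In production the same factory would receive `boto3.client('bedrock-agent-runtime')`; the identifiers `kb-123` and `example-model` are placeholders.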
Step 5.4: Add Metadata Filtering for Multi-Locale Support
Ring's production architecture uses metadata filtering to serve Region-specific content from a single centralized system. For example, a knowledge base might store content tagged with contentLocale:
{locale}/Service.Ring.{Upsert/Delete}.{unique_identifier}.metadata.json
    # Query with metadata filtering
    response = bedrock_agent_runtime.retrieve_and_generate(
        input={'text': user_query},
        retrieveAndGenerateConfiguration={
            'type': 'KNOWLEDGE_BASE',
            'knowledgeBaseConfiguration': {
                'knowledgeBaseId': KNOWLEDGE_BASE_ID,
                'modelArn': MODEL_ARN,
                'retrievalConfiguration': {
                    'vectorSearchConfiguration': {
                        'numberOfResults': 5,
                        'filter': {
                            'equals': {
                                'key': 'contentLocale',
                                'value': 'en-US'  # Or dynamically from user profile
                            }
                        }
                    }
                }
            }
        }
    )
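Two small helpers cover both halves of locale filtering: writing the metadata that tags a document, and building the filter from a user's locale. This is a sketch under assumptions: it relies on the Knowledge Bases convention of a `.metadata.json` sidecar file containing a `metadataAttributes` object, and the supported-locale set, the fallback locale, and the function names are all illustrative.

```python
import json

SUPPORTED_LOCALES = {"en-US", "en-GB", "de-DE"}  # illustrative subset
DEFAULT_LOCALE = "en-US"                         # assumed fallback


def metadata_sidecar(locale: str) -> str:
    """JSON body for a '<document>.metadata.json' file stored next to the document in S3."""
    return json.dumps({"metadataAttributes": {"contentLocale": locale}}, indent=2)


def locale_filter(user_locale) -> dict:
    """Build the 'filter' value for vectorSearchConfiguration, falling back when unsupported."""
    locale = user_locale if user_locale in SUPPORTED_LOCALES else DEFAULT_LOCALE
    return {"equals": {"key": "contentLocale", "value": locale}}


print(metadata_sidecar("de-DE"))
print(locale_filter("fr-FR"))
```

Falling back to a default locale is a design choice; an alternative is to reject unsupported locales so that users never see content written for another market.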
Step 5.5: Add Source Citations and Collapsible References
    # Enhanced Lambda response with citations
    response = bedrock_agent_runtime.retrieve_and_generate(
        input={'text': user_query},
        retrieveAndGenerateConfiguration={
            'type': 'KNOWLEDGE_BASE',
            'knowledgeBaseConfiguration': {
                'knowledgeBaseId': KNOWLEDGE_BASE_ID,
                'modelArn': MODEL_ARN,
                'retrievalConfiguration': {
                    'vectorSearchConfiguration': {
                        'numberOfResults': 5
                    }
                },
                'generationConfiguration': {
                    'inferenceConfig': {
                        'textInferenceConfig': {
                            'temperature': 0.7,
                            'maxTokens': 500
                        }
                    }
                }
            }
        }
    )

    # Process citations: each citation lists the retrieved chunks that
    # support one span of the generated answer
    citations = []
    for citation in response.get('citations', []):
        for ref in citation.get('retrievedReferences', []):
            citations.append({
                # Location is only an S3 URI for S3 data sources
                'source': ref.get('location', {}).get('s3Location', {}).get('uri'),
                'content': ref['content']['text']
            })
    # Relevance scores are not part of this response; use the separate
    # Retrieve API if you need a score per chunk.
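For the collapsible references mentioned in this step's title, one option is to render the cited sources as an HTML `<details>` element that the frontend can display collapsed by default. The sketch below assumes the response carries a `citations` list whose entries hold `retrievedReferences` with `content.text` and `location.s3Location.uri`; `render_references` is a hypothetical helper.

```python
from html import escape


def render_references(response: dict) -> str:
    """Render unique cited sources as a collapsible HTML block (empty string if none)."""
    seen, items = set(), []
    for citation in response.get("citations", []):
        for ref in citation.get("retrievedReferences", []):
            uri = ref.get("location", {}).get("s3Location", {}).get("uri", "unknown source")
            if uri in seen:  # list each document once, even if cited several times
                continue
            seen.add(uri)
            snippet = ref.get("content", {}).get("text", "")[:200]
            items.append(f"<li><strong>{escape(uri)}</strong>: {escape(snippet)}</li>")
    if not items:
        return ""
    return f"<details><summary>Sources ({len(items)})</summary><ol>{''.join(items)}</ol></details>"


sample = {
    "citations": [
        {"retrievedReferences": [
            {"content": {"text": "Returns within 30 days."},
             "location": {"s3Location": {"uri": "s3://docs/returns.md"}}},
            {"content": {"text": "Refunds take 5 days."},
             "location": {"s3Location": {"uri": "s3://docs/returns.md"}}},
        ]},
    ]
}
html_block = render_references(sample)
```

Escaping the URI and snippet matters because document text is untrusted input once it reaches a browser.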
Step 6: Optimizing for Cost and Performance
Cost Optimization Strategies
| Strategy | Implementation | Savings |
|---|---|---|
| Scale to zero | Use S3 Vectors instead of OpenSearch Serverless | $350/month → $3/month |
| Reduce embedding dimensions | Titan V2 at 256 dimensions vs 1024 retains 97% accuracy | 75% storage reduction |
| Cache frequent queries | Use ElastiCache or DynamoDB for repeated questions | 50-70% fewer Bedrock calls |
| Use Spot capacity | For batch processing (e.g., bulk ingestion on EC2 or Fargate Spot), not real-time queries | 70-90% off compute |
| Monitor with Cost Explorer | Set budgets and alerts | Prevent surprise bills |
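The "cache frequent queries" strategy can be prototyped with an in-memory dictionary before committing to ElastiCache or DynamoDB. This is a minimal sketch, assuming exact-match caching on a normalized query string with a time-to-live; `QueryCache` is a hypothetical class, and a production version would swap the dictionary for a shared store.

```python
import hashlib
import time


class QueryCache:
    """Exact-match answer cache keyed by a hash of the normalized query."""

    def __init__(self, ttl_seconds: float = 3600, clock=time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock
        self._store = {}  # key -> (stored_at, answer)

    @staticmethod
    def _key(query: str) -> str:
        normalized = " ".join(query.lower().split())  # ignore case and extra whitespace
        return hashlib.sha256(normalized.encode("utf-8")).hexdigest()

    def get_or_compute(self, query: str, compute):
        """Return a fresh cached answer, or call compute(query) and cache the result."""
        key = self._key(query)
        now = self._clock()
        hit = self._store.get(key)
        if hit is not None and now - hit[0] < self._ttl:
            return hit[1]
        answer = compute(query)
        self._store[key] = (now, answer)
        return answer


cache = QueryCache(ttl_seconds=600)
answer = cache.get_or_compute("What is our return policy?", lambda q: f"(model answer for: {q})")
```

The TTL is what keeps cached answers from outliving a document update; set it no longer than your sync interval.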
Performance Optimization Strategies
| Strategy | Implementation | Latency Improvement |
|---|---|---|
| Optimize chunk size | Experiment with 256-512 tokens; test semantic coherence | 20-30% faster retrieval |
| Use smaller embedding dimensions | 256 vs 1024 dimensions | Faster vector search |
| Pre-filter by metadata | Apply filters before vector search | Reduces search space |
| Select faster model for simple queries | Route to Nova Lite or Haiku; use Claude Sonnet for complex | 2-3x faster for simple Q&A |
| Enable response streaming | Stream tokens to user as they generate | Perceived latency reduction |
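Routing simple questions to a faster model can start as a plain heuristic. The sketch below is an assumption-heavy illustration: the word-count threshold and keyword list are invented for the example and would need tuning against real traffic, and the model IDs are shown only as plausible values for Nova Lite and Claude 3 Sonnet.

```python
FAST_MODEL = "amazon.nova-lite-v1:0"                       # assumed ID for Nova Lite
STRONG_MODEL = "anthropic.claude-3-sonnet-20240229-v1:0"   # Sonnet ID used earlier in this guide
COMPLEX_HINTS = ("compare", "why", "explain", "difference", "step by step")


def choose_model(query: str, max_simple_words: int = 12) -> str:
    """Pick the stronger model for long questions or ones that ask for reasoning."""
    text = query.lower()
    is_long = len(text.split()) > max_simple_words
    asks_for_reasoning = any(hint in text for hint in COMPLEX_HINTS)
    return STRONG_MODEL if is_long or asks_for_reasoning else FAST_MODEL


simple = choose_model("What is the return window?")
complex_ = choose_model("Explain the difference between the Basic and Plus plans")
```

A keyword heuristic misroutes some queries; logging the chosen model alongside user feedback makes it possible to replace it later with a small classifier.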
Ring's Performance Architecture
Ring's production RAG chatbot requirements specify:
- Average end-to-end latency: 7-8 seconds
- Cross-Region latency accounts for: less than 10% of total response time
- This allowed: centralized architecture rather than per-Region deployment
Step 7: Advanced Features for Production
Feature 1: Multi-Locale Support with Metadata Filtering
Ring's architecture uses metadata-driven filtering to serve Region-specific content from a single centralized system, reducing cost per additional locale by 21%.
| Component | Implementation |
|---|---|
| Content tagging | Each document tagged with contentLocale (en-US, en-GB, de-DE, etc.) |
| Ingestion pipeline | Step Functions orchestrates daily knowledge base creation |
| Evaluation pipeline | LLM-as-a-judge compares versions and promotes highest-performing |
| Query filtering | Lambda applies metadata filter based on user's locale |
Feature 2: Versioning and Rollback
Ring maintains 30 days of version history for knowledge bases:
- Daily sync creates a new version
- Evaluation pipeline tests retrieval accuracy
- Highest-performing version is promoted to production
- Rollback is available within 30 days
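The promote-and-retain logic described above can be sketched as a small version registry. This is not Ring's implementation; it is a minimal illustration that assumes each version has a creation date and a single evaluation score (for example from an LLM-as-a-judge run), with hypothetical names throughout.

```python
from dataclasses import dataclass
from datetime import date, timedelta

RETENTION_DAYS = 30


@dataclass(frozen=True)
class KbVersion:
    version_id: str
    created: date
    eval_score: float  # assumed to be a single score in [0, 1] from the evaluation pipeline


def prune(versions, today):
    """Keep only versions created inside the retention window."""
    cutoff = today - timedelta(days=RETENTION_DAYS)
    return [v for v in versions if v.created >= cutoff]


def promote(versions, today):
    """Return the highest-scoring retained version; ties go to the newest."""
    candidates = prune(versions, today)
    if not candidates:
        raise ValueError("no versions inside the retention window")
    return max(candidates, key=lambda v: (v.eval_score, v.created))


today = date(2025, 1, 31)
history = [
    KbVersion("v-old", date(2024, 12, 1), 0.99),   # outside the 30-day window
    KbVersion("v-29", date(2025, 1, 29), 0.80),
    KbVersion("v-30", date(2025, 1, 30), 0.85),
]
live = promote(history, today)
```

Rollback then amounts to promoting any other version that `prune` still returns.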
Feature 3: Multi-Modal Support (Images, Video, Audio)
RAGStack-Lambda supports:
- Images: Amazon Nova Multimodal Embeddings for visual search; Textract for OCR of scanned images
- Video: Transcribe for speech-to-text, split into 30-second searchable chunks with speaker identification
- Audio: Transcribe for transcription, with timestamp indexing
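Splitting a transcript into 30-second searchable chunks is mostly a grouping problem. The sketch below assumes the transcript has already been reduced to `(start_seconds, speaker, text)` tuples; real Amazon Transcribe output is a richer JSON document, so this input shape and the function name are simplifications for illustration.

```python
def window_segments(segments, window_seconds: float = 30.0):
    """Group (start_seconds, speaker, text) tuples into fixed-length time windows."""
    windows = {}
    for start, speaker, text in segments:
        index = int(start // window_seconds)  # which 30-second window this line falls in
        windows.setdefault(index, []).append(f"{speaker}: {text}")
    return [
        {"start": i * window_seconds, "end": (i + 1) * window_seconds, "text": " ".join(lines)}
        for i, lines in sorted(windows.items())
    ]


segments = [
    (0.0, "spk_0", "Welcome to the setup guide."),
    (12.5, "spk_1", "Thanks for having me."),
    (31.0, "spk_0", "Let's start with the doorbell."),
]
chunks = window_segments(segments)
```

Keeping `start` and `end` on each chunk lets the chatbot link an answer back to the exact moment in the video or audio file.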
Feature 4: Web Crawler for Documentation
Bedrock Knowledge Bases can crawl websites directly without S3 upload:
    {
      "dataSourceConfiguration": {
        "type": "WEB",
        "webConfiguration": {
          "sourceConfiguration": {
            "urlConfiguration": {
              "seedUrls": [
                {"url": "https://docs.aws.amazon.com/AmazonS3/latest/userguide/"}
              ]
            }
          },
          "crawlerConfiguration": {
            "scope": "HOST_ONLY",
            "inclusionFilters": [".*"],
            "exclusionFilters": [".*\\.pdf$"]
          }
        }
      }
    }
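Crawler include and exclude patterns are worth checking locally before a crawl, because a glob such as `*.pdf` is not a valid regular expression. The sketch below assumes the patterns are regular expressions and that an exclusion match takes precedence over an inclusion match; the helper names are hypothetical and the matching semantics of the managed crawler may differ in detail.

```python
import re


def is_valid_regex(pattern: str) -> bool:
    """True if the pattern compiles as a regular expression."""
    try:
        re.compile(pattern)
        return True
    except re.error:
        return False


def should_crawl(url: str, inclusion: list[str], exclusion: list[str]) -> bool:
    """Exclusions win; otherwise the URL must match at least one inclusion pattern."""
    if any(re.search(p, url) for p in exclusion):
        return False
    return any(re.search(p, url) for p in inclusion)


include = [".*"]
exclude = [r".*\.pdf$"]
base = "https://docs.aws.amazon.com/AmazonS3/latest/userguide/"
for page in ("Welcome.html", "s3-userguide.pdf"):
    print(page, should_crawl(base + page, include, exclude))
```

Running the patterns against a handful of known URLs first is cheaper than discovering after a full crawl that the filter excluded everything or nothing.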
Feature 5: Agentic RAG with Amazon Bedrock Agents
For complex workflows (order recommendations, multi-step actions), combine Knowledge Bases with Bedrock Agents:
| Component | Purpose |
|---|---|
| Knowledge Base | Stores product info, policies, documentation |
| Agent | Orchestrates multi-step tasks (look up order, check inventory, recommend product) |
| Action Groups | Connect to backend APIs (order status, inventory check, purchase) |
Step 8: Real-World Case Study – Ring's Multi-Locale RAG Chatbot
The Challenge
Ring needed to provide accurate, contextually relevant support across 10 international locales without creating separate infrastructure for each Region. Each territory needed Region-specific product information, from voltage specifications to regulatory compliance details.
The Solution
Ring built a RAG-based chatbot on Amazon Bedrock Knowledge Bases with:
| Component | Implementation |
|---|---|
| Metadata filtering | Content tagged with contentLocale; query filters by user's locale |
| Two-phase content management | Ingestion & Evaluation workflow + Promotion workflow |
| Daily evaluation | LLM-as-a-judge compares version performance |
| Centralized architecture | Single knowledge base serves all locales |
The Results
| Metric | Result |
|---|---|
| Cost reduction per additional locale | 21% |
| Locales supported | 10 international Regions |
| Content updates per week | Approximately 200 |
| Version retention | 30 days |
Step 9: Implementation Roadmap
Week 1: Foundation
| Day | Task | Deliverable |
|---|---|---|
| 1-2 | Set up AWS account, enable Bedrock model access, create S3 bucket | Configured environment |
| 3-4 | Upload sample documents, create Knowledge Base via console | Working prototype |
| 5-7 | Test with questions, evaluate answer quality | Validation results |
Week 2: Production Ready
| Day | Task | Deliverable |
|---|---|---|
| 8-9 | Implement Lambda + API Gateway for query endpoint | REST API |
| 10-11 | Add metadata filtering (if multi-tenant/locale) | Filtered retrieval |
| 12-13 | Add source citations to responses | Enhanced UX |
| 14 | Set up monitoring (CloudWatch, Cost Explorer) | Production readiness |
Step 10: Common Pitfalls and How to Avoid Them
| Pitfall | Symptom | Solution |
|---|---|---|
| Unoptimized chunking | Retrieved chunks are irrelevant or incomplete | Experiment with chunk size (256-512) and overlap (10-20%) |
| No metadata filtering | Retrieval returns results from wrong category/locale | Add metadata fields and filter in queries |
| OpenSearch cost surprise | Monthly bill $300+ even with low usage | Use S3 Vectors for low-volume; configure idle settings |
| Stale documents | Answers don't reflect latest policies | Implement daily or weekly document sync |
| No evaluation pipeline | Don't know if answers are improving | Use LLM-as-a-judge to score versions |
Step 11: Frequently Asked Questions
Q1: What is the cheapest way to run RAG on AWS?
Use S3 Vectors for your vector store, not OpenSearch Serverless. S3 Vectors charges object storage rates ($0.023/GB) rather than a minimum monthly fee. RAGStack-Lambda demonstrates a serverless pipeline that costs $2-10 per month idle.
Q2: How do I choose between OpenSearch Serverless and S3 Vectors?
| Workload | Recommendation |
|---|---|
| Low volume (<1,000 queries/month), cost-sensitive | S3 Vectors |
| Moderate volume, needs low latency | OpenSearch Serverless |
| Enterprise scale, needs advanced search features | OpenSearch (provisioned) |
Q3: How do I keep my knowledge base updated?
- Method 1: S3 event notifications trigger sync on new uploads
- Method 2: Scheduled Step Functions workflow (daily sync)
- Method 3: Web crawler re-crawl on schedule
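Method 1 can be sketched as a Lambda handler that starts an ingestion job whenever S3 reports new objects. The sketch assumes the `bedrock-agent` client's `start_ingestion_job` call with `knowledgeBaseId` and `dataSourceId`; the client is injected so the logic can run without AWS, and `FakeAgentClient`, `make_sync_handler`, and the IDs are hypothetical.

```python
class FakeAgentClient:
    """Hypothetical stand-in for boto3.client('bedrock-agent'), for local runs."""

    def __init__(self):
        self.calls = []

    def start_ingestion_job(self, **kwargs):
        self.calls.append(kwargs)
        return {"ingestionJob": {"ingestionJobId": "job-1", "status": "STARTING"}}


def make_sync_handler(bedrock_agent, knowledge_base_id: str, data_source_id: str):
    """Build a handler that triggers one sync per S3 event batch."""
    def handler(event, context=None):
        keys = [record["s3"]["object"]["key"] for record in event.get("Records", [])]
        if not keys:
            return {"started": False, "keys": []}
        job = bedrock_agent.start_ingestion_job(
            knowledgeBaseId=knowledge_base_id,
            dataSourceId=data_source_id,
        )
        return {"started": True, "keys": keys, "jobId": job["ingestionJob"]["ingestionJobId"]}
    return handler


client = FakeAgentClient()
sync = make_sync_handler(client, "kb-123", "ds-456")
result = sync({"Records": [{"s3": {"object": {"key": "policies/returns.md"}}}]})
```

With many uploads arriving together, triggering a sync per event can collide with a job that is still running, so batching events or falling back to the scheduled method may be the safer choice.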
Q4: Which LLM should I use for RAG on Bedrock?
| Model | Best For | Cost |
|---|---|---|
| Claude 3 Sonnet | Complex Q&A, reasoning | Medium |
| Claude 3 Haiku | Fast, cost-effective responses | Low |
| Amazon Nova Lite | Balanced performance/cost | Very low |
| Amazon Titan Text | Simple Q&A, embeddings | Very low |
| DeepSeek | Specialized knowledge domains | Pay-per-token |
Q5: Can I use RAG with scanned PDFs?
Yes. Textract handles OCR for scanned PDFs and images. Textract charges approximately $1.50 per 1,000 pages for standard text detection.
Q6: How do I debug retrieval quality?
- Enable CloudWatch logs for Bedrock Knowledge Bases
- Log retrieved chunks with relevance scores
- Use a test set of Q&A pairs to measure retrieval accuracy
- Implement LLM-as-a-judge evaluation pipeline
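Measuring retrieval accuracy with a test set can be as simple as a hit rate: for each question, did the expected source document appear in the top k results? The sketch below assumes a `retrieve` callable that returns a ranked list of source identifiers; the function name and the tiny test set are illustrative.

```python
def hit_rate_at_k(test_set, retrieve, k: int = 5) -> float:
    """Fraction of questions whose expected source appears in the top-k retrieved sources."""
    if not test_set:
        return 0.0
    hits = sum(1 for question, expected in test_set if expected in retrieve(question)[:k])
    return hits / len(test_set)


# Stand-in retriever: a lookup table instead of a real knowledge base call.
fake_results = {
    "What is our return policy?": ["s3://docs/returns.md", "s3://docs/shipping.md"],
    "How do I reset my password?": ["s3://docs/billing.md", "s3://docs/account.md"],
}
test_set = [
    ("What is our return policy?", "s3://docs/returns.md"),
    ("How do I reset my password?", "s3://docs/account.md"),
]
score_top1 = hit_rate_at_k(test_set, lambda q: fake_results[q], k=1)
score_top2 = hit_rate_at_k(test_set, lambda q: fake_results[q], k=2)
```

Tracking this number before and after a chunking or embedding change shows whether retrieval improved, independently of how well the LLM phrases the answer.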
Q7: How do I prevent hallucinations?
- Set temperature low (0.2-0.5)
- Use RAG retrieval threshold – only answer if relevant chunks found
- Add system prompt: "Only answer from the provided context. If unsure, say 'I don't know.'"
- Cite sources in responses
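The retrieval threshold and system prompt from this list can be combined into a small guard that refuses to call the model when nothing relevant was found. This is a sketch under assumptions: chunks arrive as `(text, score)` pairs, the 0.5 cutoff is an arbitrary example value to tune on your own data, and `generate` is any callable that sends a prompt to an LLM.

```python
SYSTEM_PROMPT = "Only answer from the provided context. If unsure, say 'I don't know.'"
FALLBACK = "I don't know."


def grounded_prompt(question: str, scored_chunks, min_score: float = 0.5):
    """Return a grounded prompt, or None when no chunk clears the relevance threshold."""
    relevant = [text for text, score in scored_chunks if score >= min_score]
    if not relevant:
        return None
    context = "\n\n".join(relevant)
    return f"{SYSTEM_PROMPT}\n\nContext:\n{context}\n\nQuestion: {question}"


def answer(question: str, scored_chunks, generate, min_score: float = 0.5) -> str:
    """Skip generation entirely when retrieval found nothing relevant."""
    prompt = grounded_prompt(question, scored_chunks, min_score)
    return FALLBACK if prompt is None else generate(prompt)


chunks = [("Returns are accepted within 30 days.", 0.82), ("Our office is in Delhi.", 0.12)]
reply = answer("What is our return policy?", chunks, generate=lambda p: "Within 30 days.")
refusal = answer("What is the CEO's salary?", [("Our office is in Delhi.", 0.12)], generate=lambda p: "?")
```

Refusing before generation also saves the cost of a model call for questions the knowledge base cannot answer.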
Q8: What is the difference between Agents and Knowledge Bases?
| Knowledge Bases | Agents for Bedrock |
|---|---|
| Retrieves relevant documents | Reasons, plans, takes actions |
| Answers questions | Executes multi-step tasks |
| Best for Q&A | Best for workflows (order status, booking, etc.) |
They can be combined: Knowledge Base provides information; Agent orchestrates actions.
Step 12: Final Tagline
"Your company has documents. Policies. Manuals. Support articles. What if an AI could read all of them and answer any question instantly? RAG on AWS makes it possible – for less than the cost of a coffee per month."
Ready to Build Your RAG Chatbot?
You don't need to be an AI expert. You need the right architecture and a clear roadmap. Let us help you build it.
Contact Us
Phone: +91 7464 099 059 / +91 96899 67356
Email: info@innovativeais.com
Address: Netaji Subhash Place, Pitampura, Delhi – 110034
Website: https://innovativeais.com