What is RAG?
RAG combines the power of large language models with your own data. Instead of relying solely on its training data, the AI retrieves relevant content from your website before generating each response.
Retrieve
When a user asks a question, we search your vectorized content for the most relevant chunks.
Augment
We inject those relevant chunks as context into the LLM prompt.
Generate
The LLM generates an answer using ONLY the provided context.
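The three steps above can be sketched end to end. This is a simplified illustration, not ChattyBox's actual implementation: the retriever here is a toy word-overlap scorer standing in for real vector search, and `generate` is a stub where the LLM call would go.

```python
def retrieve(query, chunks, k=3):
    """Score each chunk by word overlap with the query (a toy
    stand-in for vector search) and return the top-k matches."""
    q_words = set(query.lower().split())
    scored = sorted(chunks,
                    key=lambda c: len(q_words & set(c.lower().split())),
                    reverse=True)
    return scored[:k]

def augment(query, context_chunks):
    """Inject the retrieved chunks into the LLM prompt as context."""
    context = "\n\n".join(context_chunks)
    return (f"Answer using ONLY the context below.\n\n"
            f"Context:\n{context}\n\nQuestion: {query}")

def generate(prompt):
    """Stub: a real system would send `prompt` to a language model here."""
    return f"[LLM answer grounded in the prompt: {prompt[:40]}...]"

chunks = [
    "Our refund policy allows returns within 30 days.",
    "Shipping takes 3-5 business days.",
    "We offer 24/7 customer support via chat.",
]
question = "What is the refund policy?"
prompt = augment(question, retrieve(question, chunks))
answer = generate(prompt)
```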
ChattyBox RAG Architecture
Vector Database
Your content is converted to embeddings and stored in a high-performance vector database for fast semantic search.
Embedding Model
We use state-of-the-art embedding models to understand the semantic meaning of your content.
Fast Retrieval
Typical retrieval time is under 100ms. We find the top 3-5 most relevant chunks for every query.
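Top-k retrieval can be sketched as a cosine-similarity scan over stored embedding vectors. The 3-dimensional vectors and chunk names below are illustrative only; real embedding models output hundreds of dimensions, and a production vector database uses an approximate-nearest-neighbor index rather than this brute-force loop.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def top_k(query_vec, index, k=3):
    """Score every stored chunk against the query embedding
    and keep the k highest-scoring ones."""
    scored = sorted(index.items(),
                    key=lambda kv: cosine(query_vec, kv[1]),
                    reverse=True)
    return [chunk for chunk, _ in scored[:k]]

# Toy 3-dimensional "embeddings" keyed by chunk name.
index = {
    "pricing page": [0.9, 0.1, 0.0],
    "refund policy": [0.1, 0.9, 0.1],
    "company history": [0.0, 0.2, 0.9],
}
results = top_k([0.85, 0.15, 0.05], index, k=2)
# → ["pricing page", "refund policy"]
```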
Strict Prompting
We use carefully crafted system prompts that instruct the LLM to only use provided context.
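A system prompt of this kind might look like the following. The wording is purely illustrative, not ChattyBox's actual prompt.

```python
# Illustrative system prompt template; the braces are filled in
# with the retrieved chunks at query time.
SYSTEM_PROMPT = """You are a support assistant for this website.
Answer the user's question using ONLY the context provided below.
If the context does not contain the answer, say you don't know.
Do not use outside knowledge. Cite the source URL for each claim.

Context:
{context}
"""

prompt = SYSTEM_PROMPT.format(
    context="Refunds are accepted within 30 days. (source: /refunds)"
)
```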
Citation Links
Every response includes links to source pages, so users can verify the information.
Fallback Handling
When no relevant content is found, the bot honestly says "I don't know" instead of hallucinating.
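One common way to implement this fallback is a relevance threshold on the retrieval scores: if no chunk clears the cutoff, the bot declines to answer rather than letting the LLM guess. The 0.75 threshold and fallback wording below are illustrative values, not ChattyBox's actual settings.

```python
FALLBACK = "I don't know. I couldn't find anything about that on this site."
MIN_SCORE = 0.75  # illustrative relevance cutoff

def answer_or_fallback(scored_chunks, generate):
    """If no retrieved chunk clears the relevance threshold,
    return the honest fallback instead of generating an answer."""
    relevant = [chunk for chunk, score in scored_chunks if score >= MIN_SCORE]
    if not relevant:
        return FALLBACK
    return generate(relevant)

# No chunk scores above the cutoff, so the bot declines to answer.
result = answer_or_fallback(
    [("shipping info", 0.41), ("pricing", 0.38)],
    lambda chunks: "...",
)
```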
