What is RAG? How AI Answers from Your Own Documents

1. The Problem RAG Solves

When you ask an AI model a question, it answers from memory: everything it learned during training. That training data has a cutoff date and doesn't include your private documents. When the model lacks the facts it needs, it may still produce a confident but wrong answer (this is called "hallucination").

Imagine asking an AI: "What is the attendance policy in our college handbook?" The AI has never seen your college handbook. So it will either make something up or admit it doesn't know.

RAG (Retrieval-Augmented Generation) solves this by giving the AI access to a specific set of documents at the moment it answers. Instead of relying purely on memory, the system first searches your documents, retrieves the most relevant sections, and then passes those sections to the model so its answer is grounded in your text.

Simple Analogy

Imagine an exam where you're allowed to use your notes. Without notes (no RAG) — you answer from memory, and you might guess wrong. With notes (RAG) — you search your notes first, find the relevant page, and then write a precise answer. RAG gives the AI its "open-book notes."

2. The RAG Pipeline — How It Works

RAG has two phases: an indexing phase (done once, when you add documents) and a retrieval phase (done every time someone asks a question).

Load Documents: Read your PDFs, text files, or web pages into memory.

Split into Chunks: Break documents into small overlapping sections (e.g., 500 words each). This is because AI models can only process a limited amount of text at once.

Create Embeddings: Convert each chunk into a list of numbers (a "vector") that captures its meaning. Chunks about similar topics end up with similar number patterns.

Store in a Vector Database: Save all the vectors into a searchable database like ChromaDB or FAISS.

Embed the Question (query time): When a user asks a question, convert it into a vector the same way.

Retrieve Relevant Chunks: Find the document chunks whose vectors are closest to the question vector. These are the most semantically relevant sections.

Generate with Context: Send the retrieved chunks + the original question to the LLM. The model answers using the provided context, not memory alone.
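
The embed, retrieve, and generate steps above can be sketched with a toy example. This is only an illustration under simplifying assumptions: the embed function here just counts words, whereas a real pipeline uses a trained embedding model, and the final prompt is printed instead of being sent to an LLM. The function names (embed, cosine, retrieve, build_prompt) and the sample chunks are made up for this sketch.

```python
import math
import re
from collections import Counter

def embed(text):
    """Toy stand-in for an embedding: a word-count vector (assumption, not a real model)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))

def cosine(a, b):
    """Cosine similarity between two word-count vectors; 0.0 if either is empty."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def retrieve(question, chunks, top_k=2):
    """Return the top_k chunks whose vectors are closest to the question vector."""
    q_vec = embed(question)
    ranked = sorted(chunks, key=lambda c: cosine(q_vec, embed(c)), reverse=True)
    return ranked[:top_k]

def build_prompt(question, context_chunks):
    """Combine the retrieved chunks and the question into one prompt string."""
    context = "\n\n".join(context_chunks)
    return f"Context:\n{context}\n\nQuestion: {question}\nAnswer:"

# Hypothetical chunks standing in for an indexed handbook
chunks = [
    "Students must maintain 75 percent attendance in every course.",
    "The library is open from 9 am to 8 pm on weekdays.",
    "Unit 3 covers microprocessors and assembly language.",
]

question = "What attendance percent is required?"
top = retrieve(question, chunks, top_k=1)
prompt = build_prompt(question, top)
print(prompt)
```

With these sample chunks, the attendance sentence shares the most words with the question, so it is the chunk that ends up in the prompt. A real embedding model would also match questions that use different words with the same meaning, which word counting cannot do.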

3. Build a Simple RAG System in Python

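We'll use ChromaDB (a free, local vector database) for storage and retrieval, and Google Gemini for generation. ChromaDB embeds the chunks and the question itself using its default embedding model, so the script has no separate embedding step. You need a Gemini API key, which the script reads from the GEMINI_API_KEY environment variable. No paid services are needed as long as you stay within the Gemini API's free usage limits.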

Install Libraries

Terminal

pip install chromadb google-generativeai pypdf

Full RAG Script

Python — simple_rag.py

import os
import chromadb
import google.generativeai as genai
from pypdf import PdfReader

# ── Config ────────────────────────────────────────
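# Raises KeyError if GEMINI_API_KEY is not set in your environment
genai.configure(api_key=os.environ["GEMINI_API_KEY"])
# Example model name; swap in whichever Gemini model your API key can access
model = genai.GenerativeModel("gemini-1.5-flash")
# chromadb.Client() keeps the index in memory only, so it is rebuilt on every run
chroma = chromadb.Client()
collection = chroma.get_or_create_collection("my_docs")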

# ── Step 1: Load and chunk a PDF ──────────────────
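def load_pdf(path, chunk_size=400):
    """Read a PDF and split its text into chunks of about chunk_size words."""
    reader = PdfReader(path)
    # Fall back to "" for pages with no extractable text (e.g. scanned images)
    text = " ".join((page.extract_text() or "") for page in reader.pages)
    words = text.split()
    # Fixed-size word windows, back to back (no overlap between chunks)
    return [" ".join(words[i:i + chunk_size]) for i in range(0, len(words), chunk_size)]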

# ── Step 2: Index the document ────────────────────
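def index_document(pdf_path):
    chunks = load_pdf(pdf_path)
    if not chunks:
        raise ValueError(f"No extractable text found in {pdf_path}")
    # Prefix IDs with the file name so a second PDF doesn't collide with the first
    name = os.path.basename(pdf_path)
    ids = [f"{name}_chunk_{i}" for i in range(len(chunks))]
    # ChromaDB embeds the chunks with its default embedding model
    collection.add(documents=chunks, ids=ids)
    print(f"Indexed {len(chunks)} chunks from {pdf_path}")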

# ── Step 3: Answer a question using RAG ───────────
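def ask(question, top_k=3):
    """Retrieve the top_k closest chunks and ask Gemini to answer from them."""
    # ChromaDB embeds the question and returns the nearest stored chunks
    results = collection.query(query_texts=[question], n_results=top_k)
    # results["documents"] holds one list of chunks per query; we sent one query
    context = "\n\n".join(results["documents"][0])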
    prompt = f"""Use the following context to answer the question.
If the context doesn't contain the answer, say "Not found in document."

Context:
{context}

Question: {question}
Answer:"""
    response = model.generate_content(prompt)
    return response.text

# ── Run ───────────────────────────────────────────
index_document("your_document.pdf")

while True:
    q = input("\nAsk a question (or 'quit'): ").strip()
    if q.lower() == "quit":
        break
    print("\n" + ask(q))

Replace your_document.pdf with any PDF — a textbook, a datasheet, your college syllabus, lab manual, or project report.
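
Section 2 describes overlapping chunks, while load_pdf above cuts the text into back-to-back windows with no overlap. One possible way to add overlap is sketched below. The function name chunk_words and the default overlap of 50 words are assumptions for illustration, not part of the script above.

```python
def chunk_words(text, chunk_size=400, overlap=50):
    """Split text into word windows where each window repeats the last
    `overlap` words of the previous one (hypothetical helper)."""
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be >= 0 and smaller than chunk_size")
    words = text.split()
    step = chunk_size - overlap  # how far each new window advances
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break  # this window already reached the end of the text
    return chunks

# Tiny demo: windows of 4 words, each sharing 1 word with the previous window
demo = chunk_words("a b c d e f g h i j", chunk_size=4, overlap=1)
print(demo)  # ['a b c d', 'd e f g', 'g h i j']
```

To try it in the script, load_pdf could return chunk_words(text, chunk_size) in place of its own splitting logic. Overlap means a sentence that straddles a chunk boundary still appears whole in at least one chunk, at the cost of storing some words twice.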

Project Idea

Use this to build a "Chat with your Syllabus" tool: index your semester syllabus PDF and ask it "What topics are in Unit 3?" or "Which unit covers microprocessors?" It makes a practical, easy-to-explain demo for your portfolio.

4. Popular RAG Tools and Frameworks

LangChain: a widely used framework for building RAG apps; provides components for chunking, embedding, retrieval, and generation
LlamaIndex: focused on document Q&A; often considered easier for beginners than LangChain
ChromaDB: free, open-source vector database; runs locally on your laptop, no account needed
FAISS: Meta's open-source vector similarity search library; designed to stay fast on large document sets
Pinecone: managed cloud vector database; an option for production apps with many users when you don't want to host the database yourself

5. Real Project Ideas Using RAG

Chat with your college placement brochure to get company-specific info
Q&A bot for a manufacturer's product datasheet
Study assistant that answers from your lecture notes and past question papers
Customer support bot for a local business
Legal document assistant (e.g., rent agreement explainer)
