RAG Agent with LangChain
Build a retrieval-augmented generation (RAG) agent that runs entirely on your laptop — Postgres with pgvector for embeddings, Redis for response caching, and LangChain orchestrating it all. No cloud vector database required.
What you'll build
A FastAPI service that:
- Ingests documents and stores embeddings in Postgres via pgvector
- Retrieves relevant context for user queries using semantic search
- Generates answers with an LLM using the retrieved context
- Caches frequent queries in Redis for sub-second responses
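Semantic search here means nearest-neighbor lookup over embedding vectors. A toy sketch of the idea with hand-made 3-dimensional vectors — the real service uses high-dimensional OpenAI embeddings, and pgvector performs the equivalent comparison inside Postgres:

```python
import math

def cosine(a, b):
    # Cosine similarity: 1.0 for identical directions, ~0 for unrelated ones
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

# Toy "embeddings" — illustrative values only
docs = {"cats": [0.9, 0.1, 0.0], "finance": [0.0, 0.2, 0.9]}
query = [0.8, 0.2, 0.1]

best = max(docs, key=lambda name: cosine(query, docs[name]))
print(best)  # → cats
```

pgvector exposes the same operation as a SQL distance operator, so retrieval stays inside Postgres rather than in application code.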
┌─────────────┐     ┌──────────────────┐     ┌──────────────┐
│   Browser   │────▶│   FastAPI + LC   │────▶│  OpenAI API  │
│    :8000    │◀────│    RAG Agent     │◀────│  (external)  │
└─────────────┘     └───┬──────────┬───┘     └──────────────┘
                        │          │
                   ┌────▼───┐  ┌───▼──┐
                   │Postgres│  │Redis │
                   │pgvector│  │cache │
                   └────────┘  └──────┘
Project structure
rag-agent/
├── Dockerfile
├── requirements.txt
├── main.py              # FastAPI app + RAG chain
├── ingest.py            # Document ingestion script
└── .github/
    └── workflows/
        └── dev-deploy.yml
requirements.txt
fastapi==0.115.0
uvicorn[standard]==0.30.0
langchain==0.3.0
langchain-openai==0.2.0
langchain-postgres==0.0.12
pgvector==0.3.5
psycopg2-binary==2.9.9
redis==5.0.0
python-multipart==0.0.9
Dockerfile
FROM python:3.12-slim
WORKDIR /app
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
COPY . .
CMD ["uvicorn", "main:app", "--host", "0.0.0.0", "--port", "8000"]
main.py
import os
import hashlib
import json

import redis
from fastapi import FastAPI, UploadFile
from langchain_openai import ChatOpenAI, OpenAIEmbeddings
from langchain_postgres import PGVector
from langchain.chains import RetrievalQA
from langchain_text_splitters import RecursiveCharacterTextSplitter

app = FastAPI(title="RAG Agent")

# Connection URLs are auto-injected by kindling
DATABASE_URL = os.environ["DATABASE_URL"]
REDIS_URL = os.environ["REDIS_URL"]
OPENAI_API_KEY = os.environ["OPENAI_API_KEY"]

# Initialize components
embeddings = OpenAIEmbeddings(api_key=OPENAI_API_KEY)
vectorstore = PGVector(
    connection=DATABASE_URL,
    embeddings=embeddings,
    collection_name="documents",
)
llm = ChatOpenAI(model="gpt-4o-mini", api_key=OPENAI_API_KEY)
cache = redis.from_url(REDIS_URL)

@app.post("/ingest")
async def ingest(file: UploadFile):
    """Split a document into chunks and store embeddings."""
    content = (await file.read()).decode("utf-8")
    splitter = RecursiveCharacterTextSplitter(chunk_size=500, chunk_overlap=50)
    chunks = splitter.split_text(content)
    vectorstore.add_texts(chunks, metadatas=[{"source": file.filename}] * len(chunks))
    return {"chunks": len(chunks), "source": file.filename}

@app.get("/ask")
async def ask(q: str):
    """Answer a question using RAG with Redis caching."""
    cache_key = f"rag:{hashlib.sha256(q.encode()).hexdigest()[:16]}"
    cached = cache.get(cache_key)
    if cached:
        return json.loads(cached)
    qa = RetrievalQA.from_chain_type(
        llm=llm,
        retriever=vectorstore.as_retriever(search_kwargs={"k": 4}),
    )
    result = qa.invoke({"query": q})
    response = {"question": q, "answer": result["result"]}
    cache.setex(cache_key, 3600, json.dumps(response))
    return response

@app.get("/health")
async def health():
    return {"status": "ok"}
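The /ask endpoint keys its Redis cache on a truncated SHA-256 of the question text, so identical questions of any length map to a short, fixed-size key. Isolated from the endpoint, the scheme looks like this:

```python
import hashlib

def cache_key(question: str) -> str:
    # Deterministic key: "rag:" prefix + first 16 hex chars of the SHA-256 digest
    return f"rag:{hashlib.sha256(question.encode()).hexdigest()[:16]}"

k1 = cache_key("what does this project do")
k2 = cache_key("what does this project do")
print(k1 == k2)  # → True: same question always hits the same cache entry
```

Truncating to 16 hex characters (64 bits) keeps keys compact; the collision risk is negligible at any realistic cache size.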
kindling setup
1. Store your API key
kindling secrets set OPENAI_API_KEY sk-your-key-here
2. Deploy with the workflow
The dependencies block is where kindling shines — Postgres and Redis
are auto-provisioned with zero configuration, and connection URLs are
injected into your container automatically.
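Inside the container the app simply reads `DATABASE_URL` and `REDIS_URL` from its environment. The values below are hypothetical — the actual hosts and credentials are whatever kindling provisions — but they show the shape the app expects:

```shell
# Illustrative only; real values are injected by kindling at deploy time
DATABASE_URL=postgresql://app:secret@postgres:5432/app
REDIS_URL=redis://redis:6379/0
```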
# .github/workflows/dev-deploy.yml
name: dev-deploy

on:
  push:
    branches: [main]
  workflow_dispatch:

env:
  REGISTRY: registry:5000
  TAG: ${{ github.actor }}-${{ github.sha }}

jobs:
  deploy:
    runs-on: [self-hosted, "${{ github.actor }}"]
    steps:
      - uses: actions/checkout@v4
      - run: rm -rf /builds/*
      - name: Build RAG agent image
        uses: kindling-sh/kindling/.github/actions/kindling-build@main
        with:
          name: rag-agent
          context: ${{ github.workspace }}
          image: "${{ env.REGISTRY }}/rag-agent:${{ env.TAG }}"
      - name: Deploy RAG agent
        uses: kindling-sh/kindling/.github/actions/kindling-deploy@main
        with:
          name: ${{ github.actor }}-rag-agent
          image: "${{ env.REGISTRY }}/rag-agent:${{ env.TAG }}"
          port: "8000"
          ingress-host: "${{ github.actor }}-rag.localhost"
          health-check-path: "/health"
          dependencies: |
            - type: postgres
            - type: redis
          env: |
            - name: OPENAI_API_KEY
              valueFrom:
                secretKeyRef:
                  name: kindling-secret-openai-api-key
                  key: value
3. Try it
# Ingest a document
curl -F "file=@README.md" http://<you>-rag.localhost/ingest
# Ask a question
curl "http://<you>-rag.localhost/ask?q=what+does+this+project+do"
Iterate with sync
Edit main.py locally — change the chunk size, swap the retriever,
add a reranker — and see it live in seconds:
kindling sync -n <you>-rag-agent -d .
# Edit main.py → changes appear instantly
# Ctrl+C → deployment rolls back
Why local matters for RAG
- Embedding latency — pgvector runs on localhost, so vector search is single-digit milliseconds instead of a cloud round-trip
- Free iteration — tune chunk sizes, overlap, retrieval k values, and prompt templates without burning cloud credits
- Data stays local — ingest proprietary docs without sending them to a third-party vector database
- Full observability — kindling logs shows exactly what's happening in Postgres, Redis, and your app simultaneously
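Chunk size and overlap are the first retrieval knobs worth tuning. A toy fixed-window chunker — deliberately simpler than LangChain's RecursiveCharacterTextSplitter, which also prefers paragraph and sentence boundaries — shows how the two parameters interact:

```python
def chunk(text: str, chunk_size: int = 500, chunk_overlap: int = 50) -> list[str]:
    # Each window starts (chunk_size - chunk_overlap) characters after the
    # previous one, so consecutive chunks share chunk_overlap characters.
    step = chunk_size - chunk_overlap
    return [text[i:i + chunk_size] for i in range(0, len(text), step)]

chunks = chunk("x" * 1200)
print([len(c) for c in chunks])  # → [500, 500, 300]
```

Larger chunks mean fewer, broader retrieval hits; more overlap preserves context across boundaries at the cost of extra embeddings to store and search.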
Next steps
- Add Jaeger for tracing LangChain spans: add - type: jaeger to the dependencies block
- Use kindling expose to test with OAuth-protected endpoints
- Scale up to a multi-service architecture with a separate ingestion worker