A step-by-step tutorial for spinning up a private, offline RAG app that answers application-security questions and cites the OWASP Top 10.

A nice way to learn AI tooling is to take a topic you already know and turn it into a small, working tool — something you can run on your own laptop, with no cloud bills and no API keys.

In this project we'll build an OWASP Security Advisor: a local Retrieval-Augmented Generation (RAG) app that ingests the OWASP Top 10 documents and answers security questions in plain English, complete with citations and similarity scores.

The whole thing runs offline. The whole thing is scaffolded by GitHub Copilot from a single, detailed prompt. And the whole thing takes less than an afternoon.

Here's how to build it.

What you'll have at the end

A Streamlit web app, running at http://localhost:8501, that:

  • Loads the OWASP Top 10 documentation from a local folder
  • Splits it into chunks, embeds each chunk, and stores the vectors in FAISS
  • Accepts plain-English security questions in a chat-style UI
  • Retrieves the most relevant OWASP passages
  • Sends those passages plus your question to a local Mistral model
  • Returns a structured answer — vulnerability name, severity, explanation, prevention steps — with the source chunks shown alongside

No API keys. No data leaving your machine.

The stack, briefly

If you're brand new to any of these pieces, here's the one-sentence version of each:

  • VS Code — Microsoft's free code editor. The place you write everything.
  • GitHub Copilot — an AI pair programmer that lives inside VS Code. It auto-completes code, answers questions in a chat sidebar, and (in Agent mode) can scaffold entire projects from a spec.
  • Streamlit — a Python library that turns scripts into web UIs with about three lines of code.
  • Ollama — a tool that runs language models locally. Think "Docker for LLMs".
  • Mistral — a capable open-weight language model that runs comfortably on a laptop.
  • sentence-transformers — produces "embeddings", i.e. converts text into vectors so we can search by meaning.
  • FAISS — Facebook's library for fast similarity search over those vectors.

Don't worry if it's a lot. You won't write most of this from scratch.

A 30-second primer on RAG

RAG stands for Retrieval-Augmented Generation. The idea is simple:

  1. The user asks a question.
  2. You search your own document collection for the most relevant snippets.
  3. You stitch those snippets into a prompt and send it to the LLM.
  4. The LLM answers from the snippets you gave it, as the prompt instructs.

Why bother? Two reasons. First, the prompt steers the model toward your trusted documents, so it makes fewer things up. Second, you can show the source: every answer comes with the chunks that produced it. That's how our app cites OWASP Top 10 sections for every reply.
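
The four steps above can be sketched in a few lines. This is a toy illustration, not the app's code: retrieval here is plain word overlap instead of embeddings, and the helper names (tokenize, retrieve, build_prompt) and the sample snippets are made up for the example.

```python
import re

def tokenize(text):
    """Lowercase a string and return its set of alphanumeric words."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def retrieve(question, snippets, top_k=2):
    """Step 2: rank snippets by how many words they share with the question."""
    q_words = tokenize(question)
    ranked = sorted(snippets, key=lambda s: len(q_words & tokenize(s)), reverse=True)
    return ranked[:top_k]

def build_prompt(question, context):
    """Step 3: stitch the retrieved snippets and the question into one prompt."""
    numbered = "\n".join(f"[{i}] {text}" for i, text in enumerate(context, start=1))
    return (
        "Answer using only the context below and cite snippet numbers.\n\n"
        f"Context:\n{numbered}\n\nQuestion: {question}"
    )

# Made-up snippets standing in for the real document collection
snippets = [
    "Broken access control lets users act outside their intended permissions.",
    "Cryptographic failures expose sensitive data in transit or at rest.",
    "Injection flaws occur when untrusted data is sent to an interpreter.",
]
question = "How does broken access control happen?"  # step 1
prompt = build_prompt(question, retrieve(question, snippets, top_k=1))
print(prompt)  # step 4 would send this string to the LLM
```

The real app swaps the word-overlap ranking for embedding similarity, but the shape of the loop stays the same.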

Prerequisites

Before starting, make sure you have:

  • Windows 10 or 11 (the scripts in this post target Windows; macOS/Linux work with minor tweaks)
  • Python 3.10–3.12 — Python 3.13 is not yet supported by some of our pinned dependencies (faiss-cpu in particular). I'd recommend 3.12.
  • VS Code installed
  • GitHub Copilot extension in VS Code, signed in to a paid or trial Copilot account
  • About 6 GB free disk space (the Mistral model alone is ~4.4 GB)

A note on operating systems: the commands below use Windows PowerShell, but everything works on macOS and Linux too — swap py for python3, .\run.bat for a bash equivalent, and the activation script path for source .venv312/bin/activate.
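
If you want to confirm the interpreter before installing anything, a small check like the one below should do. It is a sketch that hard-codes the 3.10 to 3.12 range from the list above; the name is_supported is just for illustration.

```python
import sys

SUPPORTED_MINORS = {10, 11, 12}  # Python 3.10 to 3.12, per the prerequisites

def is_supported(version=sys.version_info):
    """Return True when `version` is a Python 3 release in the supported range."""
    major, minor = version[0], version[1]
    return major == 3 and minor in SUPPORTED_MINORS

if __name__ == "__main__":
    status = "supported" if is_supported() else "outside the tested range"
    print(f"Python {sys.version_info.major}.{sys.version_info.minor} is {status}")
```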

Step 1 — Install Ollama and pull Mistral

Ollama is the engine that will run our language model locally. Install it once, then pull the model file.

Visit https://ollama.com/download/windows and either click "Download for Windows" or paste the PowerShell one-liner into an admin PowerShell window.

Once Ollama is installed, open PowerShell and run:

ollama pull mistral

You'll see something like this:

PS C:\> ollama pull mistral
pulling manifest
pulling f5074b1221da: 100%   4.4 GB
pulling 43070e2d4e53: 100%   11 KB
pulling 1ff5b64b61b9: 100%   799 B
pulling ed11eda7790d: 100%   30 B
pulling 1064e17101bd: 100%   487 B
verifying sha256 digest
writing manifest
success

The first download is the only slow step. After this, the model is cached on your machine and starts in seconds.

Step 2 — Verify Ollama is reachable

The Ollama daemon listens on port 11434. Let's confirm it's alive:

curl http://localhost:11434/api/tags

You should get back HTTP 200 OK and a JSON payload that lists mistral:latest among the available models. If the connection is refused, open the Ollama tray app; it has to be running for the API to respond.

Quick sanity check from the shell:

ollama list

You should see mistral:latest in the table.
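
The same check can be done from Python, which is roughly what the app needs for its status panel. This is a minimal sketch using only the standard library; it assumes the default port above and the "models" list that /api/tags returns, and the function names are illustrative.

```python
import json
import urllib.error
import urllib.request

OLLAMA_URL = "http://localhost:11434"  # default address used in this post

def model_names(tags_payload):
    """Extract model names from a parsed /api/tags response."""
    return [m.get("name", "") for m in tags_payload.get("models", [])]

def ollama_has_model(model="mistral", base_url=OLLAMA_URL, timeout=3):
    """Return True if Ollama answers and lists `model` under any tag."""
    try:
        with urllib.request.urlopen(f"{base_url}/api/tags", timeout=timeout) as resp:
            payload = json.load(resp)
    except (urllib.error.URLError, OSError, ValueError):
        return False  # daemon not running, unreachable, or the reply was not JSON
    return any(name.split(":")[0] == model for name in model_names(payload))

if __name__ == "__main__":
    print("mistral available:", ollama_has_model())
```

It returns False instead of raising when the daemon is down, so the caller can show a friendly message.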

Step 3 — Grab the OWASP Top 10 documents

Our corpus is the OWASP Top 10 2021 markdown files. The easiest way to grab them is the download-directory.github.io trick:

  1. Go to https://download-directory.github.io/
  2. Paste this URL into the box: https://github.com/OWASP/Top10/tree/master/2021/docs/en
  3. Hit Download. You'll get a ZIP with about 33 markdown files.

Make a folder called documents/ in your project root, and drop the markdown files in there. The structure should look like:

owasp-security-advisor/
├── documents/
│   ├── A01_2021-Broken_Access_Control.md
│   ├── A02_2021-Cryptographic_Failures.md
│   └── ... (31 more)

That's our knowledge base.
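
To confirm the folder is populated, or to see roughly what a loader module has to do, here is a minimal sketch. The function name load_markdown is hypothetical, and the code Copilot generates for document_loader.py will likely look different.

```python
from pathlib import Path

def load_markdown(folder="documents"):
    """Return a list of (filename, text) pairs for every .md file in folder."""
    docs = []
    for path in sorted(Path(folder).glob("*.md")):
        docs.append((path.name, path.read_text(encoding="utf-8")))
    return docs

if __name__ == "__main__":
    docs = load_markdown()
    print(f"Loaded {len(docs)} markdown files")
```

If it reports 0 files, check that you are running it from the project root and that the files were unzipped into documents/ directly.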

Step 4 — Open VS Code and prompt GitHub Copilot

This is the magic step. Open VS Code in your project folder, open the Copilot Chat / Agent panel, and paste the following spec. The clearer the spec, the better the scaffold — vague prompts produce vague code.

Build a Retrieval-Augmented Generation (RAG) system called
"OWASP Security Advisor" with a professional Streamlit GUI.

Stack:
- Data source: OWASP Top 10 2021 markdown files in documents/
- Backend: Ollama running Mistral (local, offline)
- Embeddings: sentence-transformers (all-MiniLM-L6-v2)
- Vector DB: FAISS
- Frontend: Streamlit chat-style UI

Requirements:
- Sidebar with Build/Rebuild Index, model selection, top-K slider
- Chat-style query interface
- Show vulnerability name, severity, explanation, prevention steps
- Display source chunks and similarity scores with each answer
- Export conversation as TXT and PDF
- Handle the case where Ollama is not running with a friendly message
- Use st.cache_resource to cache the embedder and the FAISS index
- Provide a run.bat that creates a venv, installs deps, and launches the app

Generate these files:
- streamlit_app.py
- rag_backend.py
- document_loader.py
- embedder.py
- vector_db.py
- retriever.py
- config.py
- requirements.txt
- .streamlit/config.toml
- run.bat

Pin versions in requirements.txt:
streamlit==1.28.1, langchain==0.1.0, sentence-transformers==2.2.2,
faiss-cpu==1.7.4, ollama==0.1.3, pypdf==3.17.1, python-dotenv==1.0.0,
tiktoken==0.5.2, requests==2.31.0

Hit enter. Copilot Agent goes to work: creating files, editing them, and iterating. A typical run produces the seven Python modules listed in the spec, a pinned requirements.txt, a run.bat launcher, and a .streamlit/config.toml for theming.

Read the code before you trust it. Copilot writes confidently, not necessarily correctly. Skim each file, run a quick syntax check (for example, python -m py_compile streamlit_app.py), and fix anything that smells off.
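
One way to check every generated file in one go is to parse them all with Python's ast module. The sketch below only catches syntax errors, not logic bugs, and the function name syntax_errors is an illustration, not part of the generated project.

```python
import ast
from pathlib import Path

def syntax_errors(folder="."):
    """Parse every top-level .py file in folder; return {filename: error message}."""
    problems = {}
    for path in sorted(Path(folder).glob("*.py")):
        try:
            ast.parse(path.read_text(encoding="utf-8"), filename=str(path))
        except SyntaxError as exc:
            problems[path.name] = f"line {exc.lineno}: {exc.msg}"
    return problems

if __name__ == "__main__":
    errors = syntax_errors()
    for name, message in errors.items():
        print(f"{name}: {message}")
    print("OK" if not errors else f"{len(errors)} file(s) failed to parse")
```

Run it from the project root. A clean result means the files parse, nothing more; you still need to read them.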

Step 5 — Run the app

There are two ways to launch.

The easy way: double-click run.bat. The script will:

  1. Detect your Python launcher (py)
  2. Create a .venv312 virtual environment if missing
  3. Activate it
  4. Install dependencies from requirements.txt
  5. Launch Streamlit on http://localhost:8501

The manual way:

py -3.12 -m venv .venv312
.venv312\Scripts\Activate.ps1
python -m pip install --upgrade pip
python -m pip install -r requirements.txt
python -m streamlit run streamlit_app.py

Either path opens a browser tab to the running app.

Step 6 — Build the index and start asking questions

The first time you open the app, click **Build / Rebuild Index** in the sidebar. The backend will:

  • Load every file in documents/
  • Normalize and chunk the text (default: 900 chars with 180 char overlap)
  • Embed each chunk with all-MiniLM-L6-v2
  • Write the vectors and metadata into vector_store/

For 33 OWASP files you'll end up with about 615 chunks. The system status panel confirms Ollama: running, Vector Store: indexed, and shows the counts.
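
The chunking step is simpler than it sounds. Here is a minimal sketch of fixed-size chunking with overlap, using the 900 and 180 character defaults mentioned above. The generated document_loader.py may split on different boundaries (headings or sentences, say), so treat this as an illustration of the idea only.

```python
def chunk_text(text, size=900, overlap=180):
    """Split text into windows of `size` chars, each sharing `overlap` chars with the previous one."""
    if not 0 <= overlap < size:
        raise ValueError("overlap must be non-negative and smaller than size")
    step = size - overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + size])
        if start + size >= len(text):
            break  # this window already reached the end of the text
    return chunks

sample = "x" * 2000
print([len(c) for c in chunk_text(sample)])  # [900, 900, 560]
```

The overlap means a sentence that straddles a chunk boundary still appears whole in at least one chunk, which helps retrieval.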

Now ask it something:

"Explain Broken Access Control and how to prevent it."

After a few seconds you'll get a structured answer back: severity (High), an explanation, a list of prevention steps, and, crucially, the OWASP chunks that produced it, each with a cosine similarity score so you can see how closely it matched your question.
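
If the similarity scores feel like a black box, this pure-Python sketch shows what they measure. It is a stand-in for what FAISS does far more efficiently, not the app's code; the two-dimensional vectors are made up, while real all-MiniLM-L6-v2 embeddings have 384 dimensions.

```python
import math

def cosine(a, b):
    """Cosine similarity: 1.0 means same direction, 0.0 means unrelated."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def top_k(query_vec, chunk_vecs, k=3):
    """Return the k best (score, chunk_index) pairs, highest score first."""
    scored = [(cosine(query_vec, vec), idx) for idx, vec in enumerate(chunk_vecs)]
    return sorted(scored, reverse=True)[:k]

# Toy embeddings: chunk 0 points the same way as the query, chunk 2 is orthogonal
chunk_vecs = [[1.0, 0.0], [0.6, 0.8], [0.0, 1.0]]
print(top_k([1.0, 0.0], chunk_vecs, k=2))
```

A score near 1.0 means the chunk is close in meaning to the question. It says nothing about whether the final answer is correct.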

Try a few more:

"What is the top LLM vulnerability?"

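It pulls the OWASP LLM Top 10 chunk on Unbounded Consumption, marks it High severity, and lists mitigations: resource limits, model pruning, quantization. This question only gets a grounded answer if the OWASP Top 10 for LLM Applications files are in documents/ as well, since the 2021 files from Step 3 do not cover LLM risks. Retrieval ranks chunks by similarity to the question, not by position in the list, so check the cited chunk before treating the result as the list's first entry.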

"How does SQL Injection work, and how do we remediate it?"

The answer comes back marked Critical severity, with the expected remediations: use parameterized queries, validate input server-side, use an ORM. Sources are visible right below the answer.

What to do when things break

A few common failures and their fixes:

  • "Streamlit is not recognized" — your terminal is using the system Python, not the venv. Either use run.bat or run python -m streamlit run streamlit_app.py.
  • "Ollama timeout" but the tray app is running — the model probably hasn't been pulled. Run ollama list and confirm mistral:latest is there. If not, ollama pull mistral.
  • faiss-cpu won't install on Python 3.13 — known issue. Drop back to Python 3.12.
  • Empty or low-confidence answers — your documents folder might be empty, or you forgot to click Build Index. Open the sidebar, click Build / Rebuild Index, and try again.
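
Most of these failures are easier to diagnose if the backend reports them clearly. Below is a sketch of calling Ollama's /api/generate endpoint with the standard library and turning connection problems into a readable message. The function names and message wording are my own, and the generated rag_backend.py may well use the ollama Python package instead.

```python
import json
import urllib.error
import urllib.request

def build_generate_request(prompt, model="mistral", base_url="http://localhost:11434"):
    """Build a non-streaming POST request for Ollama's /api/generate endpoint."""
    body = json.dumps({"model": model, "prompt": prompt, "stream": False}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/api/generate",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )

def ask_ollama(prompt, timeout=120, **kwargs):
    """Return (ok, text): the model's answer, or a friendly message on failure."""
    request = build_generate_request(prompt, **kwargs)
    try:
        with urllib.request.urlopen(request, timeout=timeout) as resp:
            return True, json.load(resp).get("response", "")
    except urllib.error.HTTPError as exc:
        # Ollama answered but with an error status, likely a model that is not pulled
        return False, f"Ollama returned HTTP {exc.code}. Try: ollama pull mistral"
    except (urllib.error.URLError, OSError):
        return False, "Could not reach Ollama, or the request timed out. Is the tray app running?"

if __name__ == "__main__":
    ok, text = ask_ollama("Name one OWASP Top 10 category.")
    print(text)
```

The generous timeout is deliberate: the first query after startup is slow while the model loads.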

Things worth taking away

Three lessons from this build:

  1. Prompts beat code. A vague prompt ("build me a RAG app") produced about 100 lines of confused Python. The structured spec above produced a working project. The leverage from writing the prompt well is enormous.
  2. RAG is more approachable than it sounds. Once you split it into "embed", "store", "retrieve", and "generate", each part is just a small Python module. You don't need a vector database company — FAISS in a folder works fine for thousands of chunks.
  3. Local LLMs are real now. Two years ago you needed a GPU farm. Today, Mistral on Ollama runs on a classroom laptop. The first query is slow because the model warms up; subsequent queries on the same model are quick.

Read what Copilot writes. This is non-negotiable. Copilot is a faster typist than you are, but it doesn't know your intent. Treat its output the way you'd treat a confident intern's first draft: read it, test it, own the bugs.

Where to go next

A few directions worth exploring once the basic app is running:

  • Swap Mistral for another model (phi3 is smaller, llama3:8b is similar in size) and benchmark answer quality vs latency
  • Add a second corpus — your company's secure-coding standards, OWASP Cheat Sheets, or PCI-DSS controls
  • Add a regression test set (a list of questions with expected vulnerability tags) and evaluate the RAG on every change
  • Move the index to a persistent vector DB if you outgrow FAISS
  • Wrap it as a desktop app (PyInstaller, electron-builder, etc.) so non-technical colleagues can use it
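
The regression test idea from the list above needs very little code to get started. Here is a minimal sketch, assuming the pipeline can be called as a function that takes a question and returns the answer text. The test cases and the fake_answer stand-in are invented for the example.

```python
def evaluate(cases, answer_fn):
    """Return the fraction of cases whose answer mentions the expected tag."""
    hits = 0
    for question, expected_tag in cases:
        answer = answer_fn(question)
        if expected_tag.lower() in answer.lower():
            hits += 1
    return hits / len(cases) if cases else 0.0

# Hypothetical test set: (question, tag the answer should mention)
CASES = [
    ("How do I prevent SQL injection?", "Injection"),
    ("Users can view other users' records. What is this?", "Broken Access Control"),
]

def fake_answer(question):
    """Stand-in for the real RAG pipeline so the sketch runs on its own."""
    return "Vulnerability: Injection" if "injection" in question.lower() else "Unknown"

print(evaluate(CASES, fake_answer))  # 0.5
```

Swap fake_answer for the real pipeline call and rerun the script whenever you change the chunk size, the model, or the prompt.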

Whichever direction you pick, you now have a working starting point — built in an afternoon, running on your machine, with every answer cited to its source.

Go build something.
