
Setting up llama.cpp on Debian for Offline AI Chat
Introduction
Large Language Models (LLMs) like ChatGPT have revolutionized how we interact with machines — but most of them run in the cloud, sending your data to third-party servers and requiring a constant internet connection.
Want full control, privacy, and no OpenAI API bills?
Meet llama.cpp, a fast C/C++ inference engine for Meta’s LLaMA models (and many others). In this tutorial, we’ll walk through setting up llama.cpp on Debian — no internet required after the initial install. Great for self-hosted AI, air-gapped systems, and off-grid enthusiasts.
What You’ll Need
| Item | Details |
|---|---|
| OS | Debian 12 (Bookworm) or Ubuntu 22.04+ |
| RAM | 8–16 GB (for 7B model) |
| CPU | Modern x86_64 or ARM64 (Apple M1 works too) |
| Tools | git, cmake, g++, Python (optional) |
| Model | LLaMA 2 or Mistral (converted to GGUF) |
This guide assumes you’ve acquired LLaMA models legally and follow Meta’s license terms.
Step 1: Install Dependencies
sudo apt update && sudo apt install build-essential cmake git
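To confirm the toolchain installed correctly, a quick POSIX-shell check:

```shell
# Report which required build tools are on PATH.
for tool in git cmake g++ make; do
  if command -v "$tool" >/dev/null 2>&1; then
    echo "found: $tool"
  else
    echo "missing: $tool"
  fi
done
```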
Step 2: Clone llama.cpp
git clone https://github.com/ggerganov/llama.cpp.git
cd llama.cpp
make
Optional: build with OpenBLAS for faster CPU inference (requires the libopenblas-dev package):
make LLAMA_OPENBLAS=1
Step 3: Prepare Your Model (GGUF Format)
- Download a LLaMA 2 or Mistral model converted to .gguf format.
- Hugging Face hosts many GGUF conversions (official LLaMA 2 weights require accepting Meta’s license first)
- Example: llama-2-7b-chat.gguf
- Move your .gguf model into llama.cpp/models/
mkdir -p models && mv ~/Downloads/llama-2-7b-chat.gguf models/
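Before firing up the model, it can save time to sanity-check the download: valid GGUF files start with the four ASCII bytes GGUF. A minimal check (the path is an example):

```shell
# GGUF files begin with the ASCII magic "GGUF"; inspect the first 4 bytes.
f=models/llama-2-7b-chat.gguf
if [ "$(head -c 4 "$f" 2>/dev/null)" = "GGUF" ]; then
  echo "looks like a valid GGUF file"
else
  echo "not a GGUF file (or missing): $f"
fi
```

A truncated or HTML-error download will fail this check immediately, which beats waiting for the loader to error out.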
Step 4: Run the Chat!
Basic terminal interaction:
./main -m models/llama-2-7b-chat.gguf -p "Hello!" -n 128
To get an interactive prompt, use interactive mode:
./main -m models/llama-2-7b-chat.gguf -i --color
Optional: Use a Web UI (Ollama / LocalAI)
If you want a more user-friendly interface:
Option 1: Ollama
curl -fsSL https://ollama.com/install.sh | sh
Then run:
ollama run llama2
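Note that ollama run llama2 pulls weights from Ollama’s registry, so for a fully offline box you can instead import a local .gguf file via a Modelfile (the file path and parameter value below are illustrative):

```
FROM ./llama-2-7b-chat.gguf
PARAMETER temperature 0.7
```

Then register and run it with ollama create llama2-local -f Modelfile followed by ollama run llama2-local.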
Option 2: LocalAI
Supports OpenAI-compatible API + Whisper + embeddings!
Bonus: Fully Offline Setup
Want this on a fully air-gapped system?
- Download all dependencies + models on a connected machine
- Transfer via USB
- Build everything from source
- Create scripts for launching and interacting
Combine with Whisper.cpp for full offline speech-to-text AI!
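The “create scripts” step can be as simple as a tiny wrapper. A sketch, assuming the classic ./main binary and an example model path:

```shell
#!/bin/sh
# run-llama.sh: minimal launch wrapper (model path and flags are examples).
MODEL="${MODEL:-models/llama-2-7b-chat.gguf}"
THREADS="${THREADS:-4}"

if [ -f "$MODEL" ]; then
  # Start an interactive chat session with the chosen model.
  ./main -m "$MODEL" -t "$THREADS" -i --color
else
  echo "model not found: $MODEL" >&2
fi
```

Overriding defaults is then just MODEL=models/mistral-7b.gguf ./run-llama.sh.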
Troubleshooting
| Problem | Solution |
|---|---|
| Out of memory | Use a smaller model (e.g. 3B) or a more heavily quantized one (e.g. Q4 instead of Q8) |
| Model won’t load | Check GGUF format version |
| Permission denied | Run chmod +x on binaries |
| Slow performance | Rebuild with OpenBLAS, confirm AVX2 is being used, and tune the thread count (-t) |
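For the slow-performance case in particular, thread count is the first knob to try; matching it to your CPU core count is a sensible starting point:

```shell
# Query the number of available CPU cores and use it as the thread count.
THREADS=$(nproc)
echo "detected $THREADS cores"
# Example invocation (assumes the classic ./main binary and an example model path):
# ./main -m models/llama-2-7b-chat.gguf -t "$THREADS" -i
```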
Conclusion
You’ve now got a fully private, offline AI chatbot running locally with no API keys, no data leaks, and full control. Welcome to the future of open, self-hosted AI.
You can even pair this with Piper TTS and Whisper.cpp for a voice assistant that doesn’t phone home.
