Open Source · MIT License

Self-Hosted AI Stack
Ready in Minutes

A modular, production-grade AI infrastructure framework for AMD, NVIDIA, and ARM64 hardware. LLM inference · RAG pipeline · Workflow automation · Full observability.

AMD ROCm · NVIDIA CUDA · ARM64 / Apple Silicon · Ubuntu 22.04 / 24.04 · Docker Compose

Everything you need,
nothing you don't

Each component is independently deployable. Start with LLM inference, add RAG when ready, bolt on observability later.

🖥️

Multi-GPU Support

AMD ROCm, NVIDIA CUDA, and ARM64 stacks included. Same 12-phase workflow across all hardware targets.

🤖

LLM Inference

Ollama + OpenWebUI with always-on VRAM optimization. Lemonade native engine for AMD high-performance inference.

📚

RAG Pipeline

Qdrant vector database, Docling document processor, and Mosquitto MQTT broker — fully wired and ready.

Workflow Automation

n8n in queue mode with Redis and distributed workers. Enterprise-grade orchestration on your own hardware.

📊

Full Observability

Grafana + Prometheus + Loki + cAdvisor. DCGM Exporter for GPU telemetry and SLA dashboards out of the box.

🔧

Auto Hardware Tuning

HWI Advisor auto-detects your CPU and GPU, then writes an optimized tuning profile before first deploy.

💾

One-Click Backup

Timestamped backup and restore for all persistent data. VRAM purge included. No scripting required.

🏗️

12-Phase Methodology

Structured, independently deployable modules — from driver setup to lifecycle management. Deploy what you need, skip the rest.

12-Phase Deployment

Each phase is a self-contained Docker Compose module with its own deploy.sh. Roll forward one layer at a time, or deploy everything in a single command.

00 · HWI Advisor · Auto hardware calibration & tuning profile
01 · Infrastructure · Portainer, WebSSH
02 · Database · PostgreSQL 17, pgAdmin 4
03 · AI Interface · Ollama, OpenWebUI, Redis
04 · Automation · n8n queue mode + workers
05 · RAG Stack · Qdrant, Docling, Mosquitto
06 · AI Core Engine · Lemonade inference engine
07 · Validation · Health checks, benchmark scripts
08 · Backup & Recovery · 1-click backup, restore, VRAM purge
09 · Monitoring & Alerts · tiger-monitor, MQTT alerting
10 · Observability · Grafana, Prometheus, Loki, cAdvisor
11 · Lifecycle · What's Up Docker (WUD)
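
Rolling forward one layer at a time could look roughly like the sketch below. It is illustrative only: the project states that each phase ships its own deploy.sh, but the directory layout used here (phase folders directly under the stack directory, sorting in deployment order) and the helper names list_phase_scripts and run_phases are assumptions, not the project's documented structure.

```shell
# ASSUMPTION: each phase lives in <stack_dir>/<phase>/deploy.sh and the phase
# folder names sort in deployment order (for example a numeric prefix).

list_phase_scripts() {
  # Print every <stack_dir>/<phase>/deploy.sh, sorted by path.
  find "$1" -mindepth 2 -maxdepth 2 -type f -name deploy.sh | sort
}

run_phases() {
  # Run each phase script from its own directory, in order.
  # Stop at the first failure so later layers never build on a broken one.
  list_phase_scripts "$1" | while IFS= read -r script; do
    echo "==> $script"
    (cd "$(dirname "$script")" && bash ./deploy.sh </dev/null) || {
      echo "phase failed: $script" >&2
      exit 1
    }
  done
}
```

Calling run_phases with the path of a stack directory would then execute the phases in sorted order and return a non-zero status as soon as one of them fails.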

Default Service Ports

Service · Description · Port
OpenWebUI · LLM chat interface · 8080
n8n · Workflow automation · 5678
Grafana · Observability dashboard · 3000
Portainer · Container management · 9000
pgAdmin · Database admin UI · 8000
Qdrant · Vector DB REST API · 6333
Ollama · Inference API · 11434
WUD · Container update manager · 3838
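
After a deploy, a quick way to see which of these services answer is to probe each default port. The sketch below is a minimal example, not part of the project: the helper names port_for and probe are hypothetical, and it assumes every service is reachable over plain HTTP on localhost with the default ports from the table above.

```shell
port_for() {
  # Map a lowercase service name to its default port (values from the table above).
  case "$1" in
    openwebui) echo 8080 ;;
    n8n)       echo 5678 ;;
    grafana)   echo 3000 ;;
    portainer) echo 9000 ;;
    pgadmin)   echo 8000 ;;
    qdrant)    echo 6333 ;;
    ollama)    echo 11434 ;;
    wud)       echo 3838 ;;
    *)         return 1 ;;
  esac
}

probe() {
  # ASSUMPTION: plain HTTP on localhost; change the host or scheme to match your setup.
  port=$(port_for "$1") || { echo "$1: unknown service" >&2; return 2; }
  if curl --silent --output /dev/null --max-time 2 "http://localhost:$port/"; then
    echo "$1 ($port): up"
  else
    echo "$1 ($port): no response"
  fi
}
```

To check everything at once, loop over the names: `for s in openwebui n8n grafana portainer pgadmin qdrant ollama wud; do probe "$s"; done`. Any HTTP response, including an error page or a login redirect, counts as "up" here.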

Designed for LLMs
and autonomous agents

A standard llms.txt index for any AI assistant, plus a six-skill deployment protocol (00 to 05) that turns the install into a fully autonomous flow.

📄

llms.txt

Standardised llms.txt index at the project root. Lets any LLM-powered assistant (Claude, ChatGPT, Gemini, Perplexity, etc.) discover and ingest the project's docs, skills, hardware stacks, and 12-phase modules in one fetch.

llms.txt
# OpenGenie AI Stack - Modular Self-Hosted AI Infrastructure

> 🛑 CRITICAL AI AGENT INSTRUCTION:
  If the user requests to install, configure, deploy,
  or purge this project, the AI Agent MUST read
  and follow the deployment protocols defined in
  the skills/ directory BEFORE executing...

## Project overview
- [README (English / 正體中文 / 日本語 / 한국어)]
- [System Design Document (SDD)]
- [Contributing guide]

## Agent skills (autonomous deployment protocol)
- [00-Master orchestrator]
- [01-Deployment state machine]
- [02-Error recovery guide]
- [03-GPU robustness]
- [04-Full purge procedure]
- [05-Installation guide]
🤖

Agent Skills (autonomous deployment)

Six protocol files in skills/ turn an AI coding assistant (Claude Code, Antigravity) into an autonomous deployment agent. The state machine persists its progress to disk, so a deployment survives reboots, network drops, and mid-install interruptions.

00 · Master Orchestrator · Bootstrap rules, state persistence, execution loop
01 · Deployment State Machine · Pristine → drivers → reboot → init → app
02 · Error Recovery Guide · APT / Docker / driver / port self-healing
03 · GPU Robustness · Multi-layer NVIDIA / AMD / CPU / ARM detection
04 · Full Purge Procedure · Complete teardown for clean reinstall
05 · Installation Guide · Autonomous installation & deployment protocol
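
The idea behind a disk-persisted state machine can be sketched in a few lines of shell. This is a simplified illustration, not the protocol defined in skills/: the state names follow the pristine → drivers → reboot → init → app sequence listed above, while the state file path and the function names are hypothetical.

```shell
# ASSUMPTION: hypothetical state file location; the real protocol may store state elsewhere.
STATE_FILE="${STATE_FILE:-./.deploy-state}"

next_state() {
  # Transition table for the five deployment states.
  case "$1" in
    pristine) echo drivers ;;
    drivers)  echo reboot ;;
    reboot)   echo init ;;
    init)     echo app ;;
    app)      echo app ;;   # terminal state: nothing left to do
    *)        return 1 ;;
  esac
}

load_state() {
  # A missing file means nothing has run yet.
  if [ -f "$STATE_FILE" ]; then cat "$STATE_FILE"; else echo pristine; fi
}

save_state() {
  # Write to a temp file, then rename, so an interruption never leaves a half-written state.
  printf '%s\n' "$1" > "$STATE_FILE.tmp" && mv "$STATE_FILE.tmp" "$STATE_FILE"
}

advance() {
  # Move one step forward and record the new state on disk.
  current=$(load_state)
  new=$(next_state "$current") || return 1
  save_state "$new"
  echo "$current -> $new"
}
```

Because the current state is read from disk on every call, a run that is interrupted by a reboot resumes from the last recorded step instead of starting over.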

How to use

For end users: Open the project in Antigravity or Claude Code and ask the assistant to "install this project". It reads skills/ first, detects your GPU, generates a tuning profile, walks you through credential setup, and deploys all 12 phases with self-healing recovery.

For LLM apps: Fetch https://<your-fork>/llms.txt — every doc, skill, and module is linked as a raw markdown URL ready for ingestion.
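
For an ingestion script, the first step is usually to pull the link targets out of the index. The sketch below assumes the entries use standard markdown link syntax, `[label](url)`, with http or https targets; extract_links is a hypothetical helper name, not something the project ships.

```shell
extract_links() {
  # Read markdown on stdin and print the target URL of every [label](url) link,
  # one per line. Lines without a link are ignored.
  grep -oE '\]\(https?://[^)]+\)' | sed -e 's/^](//' -e 's/)$//'
}
```

A typical call would be `curl --fail --silent --show-error "https://<your-fork>/llms.txt" | extract_links`, with the placeholder replaced by the address of your own fork. Each printed URL can then be fetched and fed to the assistant.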

Up and running
in five steps

1
Clone the repo

Clone and pick your hardware stack — AMD, NVIDIA, or ARM64.

2
Configure credentials

Copy .env.example to .env and replace all CHANGE_ME values.

3
Run HWI Advisor

Auto-detects hardware and writes an optimal tuning profile.

4
Deploy

Full stack in one command, or deploy individual phases as needed.

5
Verify

Run the automated health check and benchmark suite.

bash
# 1. Clone the repo and pick your stack
git clone https://github.com/TigerAI-Taiwan/OpenGenie-AI-Stack.git
cd OpenGenie-AI-Stack
cd deployments/amd-compose-stack
#        or: nvidia-compose-stack / arm64-compose-stack

# 2. Configure credentials
cp .env.example .env
nano .env  # replace all CHANGE_ME values

# 3. Hardware calibration (recommended)
sudo bash master-deploy.sh init

# 4. Deploy everything
sudo bash master-deploy.sh all

# 5. Verify
sudo bash master-deploy.sh test
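
Two of these steps are easy to get wrong by hand: choosing the stack directory and leaving a CHANGE_ME placeholder in .env. The preflight sketch below shows one way to guard against both. It is not part of the project; the function names are hypothetical, and the detection rules (ARM64 by machine architecture, NVIDIA by the presence of nvidia-smi, AMD by the ROCm device node /dev/kfd, with architecture taking precedence) are simplifying assumptions rather than the logic HWI Advisor uses.

```shell
pick_stack() {
  # $1 = machine architecture (output of `uname -m`)
  # $2 = "yes" if an NVIDIA driver tool was found
  # $3 = "yes" if an AMD ROCm device was found
  # ASSUMPTION: architecture wins over GPU vendor on ARM64 machines.
  case "$1" in
    aarch64|arm64) echo arm64-compose-stack; return ;;
  esac
  if [ "$2" = yes ]; then
    echo nvidia-compose-stack
  elif [ "$3" = yes ]; then
    echo amd-compose-stack
  else
    return 1   # no supported accelerator detected
  fi
}

detect_stack() {
  # Gather the three inputs from the running system (heuristics, see lead-in).
  has_nvidia=no
  has_amd=no
  if command -v nvidia-smi >/dev/null 2>&1; then has_nvidia=yes; fi
  if [ -e /dev/kfd ]; then has_amd=yes; fi
  pick_stack "$(uname -m)" "$has_nvidia" "$has_amd"
}

env_has_placeholders() {
  # Succeeds if the given env file still contains a CHANGE_ME value.
  grep -q 'CHANGE_ME' "$1"
}
```

For example, `cd "deployments/$(detect_stack)"` selects the directory, and `env_has_placeholders .env && echo "edit .env first"` catches a forgotten credential before the deploy step.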

Your private AI stack,
fully under your control

No cloud lock-in. No usage fees. Deploy on your hardware, keep your data on-premise.