Open Source · MIT License

Self-Hosted AI Stack
Ready in Minutes

A modular, production-grade AI infrastructure framework for AMD, NVIDIA, and ARM64 hardware. LLM inference · RAG pipeline · Workflow automation · Full observability.

View on GitHub · Quick Start
AMD ROCm · NVIDIA CUDA · ARM64 / Apple Silicon · Ubuntu 22.04 / 24.04 · Docker Compose

Everything you need,
nothing you don't

Each component is independently deployable. Start with LLM inference, add RAG when ready, bolt on observability later.

🖥️

Multi-GPU Support

AMD ROCm, NVIDIA CUDA, and ARM64 stacks included. Same 12-phase workflow across all hardware targets.
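
Before picking a stack, it's worth confirming the host actually sees its GPU; the vendor tools below ship with the respective drivers:

nvidia-smi                      # NVIDIA: driver and VRAM visibility
rocm-smi                        # AMD: ROCm device listing
uname -m && lscpu | head -n 5   # ARM64: confirm the architecture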

🤖

LLM Inference

Ollama + OpenWebUI with always-on VRAM optimization. Lemonade as the native engine for high-performance AMD inference.
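
Once the AI Interface phase is up, a quick smoke test against the Ollama API (default port 11434, per the ports table below) — the model name here is only an example:

# Pull a small model, then request a one-off completion (model name is illustrative)
curl http://localhost:11434/api/pull -d '{"model": "llama3.2"}'
curl http://localhost:11434/api/generate \
  -d '{"model": "llama3.2", "prompt": "Say hello", "stream": false}'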

📚

RAG Pipeline

Qdrant vector database, Docling document processor, and Mosquitto MQTT broker — fully wired and ready.
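
A quick check that the RAG services answer (Qdrant's port is in the table below; the MQTT port 1883 and anonymous access are assumptions — adjust to your broker config):

# Qdrant REST API: list collections (empty on a fresh install)
curl http://localhost:6333/collections

# Mosquitto: publish a test message (topic name is illustrative)
mosquitto_pub -h localhost -p 1883 -t opengenie/test -m "hello"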

⚙️

Workflow Automation

n8n in queue mode with Redis and distributed workers. Enterprise-grade orchestration on your own hardware.
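
Once the automation phase is running, you can probe n8n's health endpoint (port 5678 per the ports table below) and ping the Redis queue backend — the container name below is an assumption:

# n8n liveness probe — returns {"status":"ok"} when the main process is healthy
curl http://localhost:5678/healthz

# Redis queue backend — container name is illustrative; check `docker ps`
docker exec opengenie-redis redis-cli ping   # expect PONG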

📊

Full Observability

Grafana + Prometheus + Loki + cAdvisor. DCGM Exporter for GPU telemetry and SLA dashboards out of the box.
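
Each observability service exposes a standard health endpoint; Grafana's port is in the table below, while the Prometheus, Loki, and DCGM ports are upstream defaults and an assumption for this stack:

curl http://localhost:3000/api/health         # Grafana
curl http://localhost:9090/-/healthy          # Prometheus (assumed default port)
curl http://localhost:3100/ready              # Loki (assumed default port)
curl -s http://localhost:9400/metrics | head  # DCGM GPU metrics (assumed default port)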

🔧

Auto Hardware Tuning

HWI Advisor auto-detects your CPU and GPU, then writes an optimized tuning profile before first deploy.
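
The advisor runs as phase 00 through the master script — the same command as step 4 of the quick start below:

# Detect CPU/GPU and write the tuning profile before first deploy
sudo bash master-deploy.sh init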

💾

One-Click Backup

Timestamped backup and restore for all persistent data. VRAM purge included. No scripting required.
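
The actual entry points live in the phase 08 module; the sketch below is purely illustrative — the subcommand names are assumptions, not confirmed commands:

# Hypothetical usage — check the phase 08 module for the real commands
sudo bash master-deploy.sh backup    # assumed subcommand: timestamped backup
sudo bash master-deploy.sh restore   # assumed subcommand: restore from backup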

🏗️

12-Phase Methodology

Structured, independently deployable modules — from driver setup to lifecycle management. Deploy what you need, skip the rest.

12-Phase Deployment

Each phase is a self-contained Docker Compose module with its own deploy.sh. Roll forward one layer at a time, or deploy everything in a single command.

00  HWI Advisor · Auto hardware calibration & tuning profile
01  Infrastructure · Portainer, WebSSH
02  Database · PostgreSQL 17, pgAdmin 4
03  AI Interface · Ollama, OpenWebUI, Redis
04  Automation · n8n queue mode + workers
05  RAG Stack · Qdrant, Docling, Mosquitto
06  AI Core Engine · Lemonade inference engine
07  Validation · Health checks, benchmark scripts
08  Backup & Recovery · 1-click backup, restore, VRAM purge
09  Monitoring & Alerts · tiger-monitor, MQTT alerting
10  Observability · Grafana, Prometheus, Loki, cAdvisor
11  Lifecycle · What's Up Docker (WUD)
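
To roll forward a single layer, each phase ships its own deploy.sh; a minimal sketch, assuming phase folders follow the numbering above (the folder name is illustrative — check the repo layout):

# Deploy only the RAG stack (phase 05) — path is an assumption
cd deployments/amd-compose-stack/05-rag-stack
sudo bash deploy.sh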

Default Service Ports

Service      Description                 Port
OpenWebUI    LLM chat interface          8080
n8n          Workflow automation         5678
Grafana      Observability dashboard     3000
Portainer    Container management        9000
pgAdmin      Database admin UI           8000
Qdrant       Vector DB REST API          6333
Ollama       Inference API               11434
WUD          Container update manager    3838
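
After a full deploy, a quick way to sweep these ports from the host — a minimal sketch using bash's built-in /dev/tcp, with the port list mirroring the table above:

for port in 8080 5678 3000 9000 8000 6333 11434 3838; do
  (echo > /dev/tcp/localhost/$port) 2>/dev/null \
    && echo "port $port: open" || echo "port $port: closed"
done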

Up and running
in five steps

1
Clone the repo

Clone and pick your hardware stack — AMD, NVIDIA, or ARM64.

2
Configure credentials

Copy .env.example to .env and replace all CHANGE_ME values.

3
Run HWI Advisor

Auto-detects hardware and writes an optimal tuning profile.

4
Deploy

Full stack in one command, or deploy individual phases as needed.

5
Verify

Run the automated health check and benchmark suite.

# 1. Clone
git clone https://github.com/TigerAI-Taiwan/OpenGenie-AI-Stack.git
cd OpenGenie-AI-Stack

# 2. Pick your stack
cd deployments/amd-compose-stack
#        or: nvidia-compose-stack / arm64-compose-stack

# 3. Configure
cp .env.example .env
nano .env  # replace CHANGE_ME values

# 4. Hardware calibration (recommended)
sudo bash master-deploy.sh init

# 5. Deploy everything
sudo bash master-deploy.sh all

# 6. Verify
sudo bash master-deploy.sh test

Your private AI stack,
fully under your control

No cloud lock-in. No usage fees. Deploy on your hardware, keep your data on-premise.