Open Source · MIT License

Self-Hosted AI Stack
Ready in Minutes

A modular, production-grade AI infrastructure framework for AMD, NVIDIA, and ARM64 hardware. LLM inference · RAG pipeline · Workflow automation · Full observability.

View on GitHub · Quick Start
AMD ROCm · NVIDIA CUDA · ARM64 / Apple Silicon · Ubuntu 22.04 / 24.04 · Docker Compose

Everything you need,
nothing you don't

Each component is independently deployable. Start with LLM inference, add RAG when ready, bolt on observability later.

🖥️

Multi-GPU Support

AMD ROCm, NVIDIA CUDA, and ARM64 stacks included. Same 12-phase workflow across all hardware targets.
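
Before picking a stack, it's worth confirming the host actually sees its GPU; the vendor tools below ship with the respective drivers:

nvidia-smi                      # NVIDIA: driver and VRAM visibility
rocm-smi                        # AMD: ROCm device listing
uname -m && lscpu | head -n 5   # ARM64: confirm the architecture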

🤖

LLM Inference

Ollama + OpenWebUI with always-on VRAM optimization. Lemonade as the native engine for high-performance AMD inference.
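
Once the AI Interface phase is up, a quick smoke test against the Ollama API (default port 11434, per the ports table below) — the model name here is only an example:

# Pull a small model, then request a one-off completion (model name is illustrative)
curl http://localhost:11434/api/pull -d '{"model": "llama3.2"}'
curl http://localhost:11434/api/generate \
  -d '{"model": "llama3.2", "prompt": "Say hello", "stream": false}'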

📚

RAG Pipeline

Qdrant vector database, Docling document processor, and Mosquitto MQTT broker — fully wired and ready.
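
A quick check that the RAG services answer (Qdrant's port is in the table below; the MQTT port 1883 and anonymous access are assumptions — adjust to your broker config):

# Qdrant REST API: list collections (empty on a fresh install)
curl http://localhost:6333/collections

# Mosquitto: publish a test message (topic name is illustrative)
mosquitto_pub -h localhost -p 1883 -t opengenie/test -m "hello"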

⚙️

Workflow Automation

n8n in queue mode with Redis and distributed workers. Enterprise-grade orchestration on your own hardware.
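
Once the automation phase is running, you can probe n8n's health endpoint (port 5678 per the ports table below) and ping the Redis queue backend — the container name below is an assumption:

# n8n liveness probe — returns {"status":"ok"} when the main process is healthy
curl http://localhost:5678/healthz

# Redis queue backend — container name is illustrative; check `docker ps`
docker exec opengenie-redis redis-cli ping   # expect PONG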

📊

Full Observability

Grafana + Prometheus + Loki + cAdvisor. DCGM Exporter for GPU telemetry and SLA dashboards out of the box.
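
Each observability service exposes a standard health endpoint; Grafana's port is in the table below, while the Prometheus, Loki, and DCGM ports are upstream defaults and an assumption for this stack:

curl http://localhost:3000/api/health         # Grafana
curl http://localhost:9090/-/healthy          # Prometheus (assumed default port)
curl http://localhost:3100/ready              # Loki (assumed default port)
curl -s http://localhost:9400/metrics | head  # DCGM GPU metrics (assumed default port)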

🔧

Auto Hardware Tuning

HWI Advisor auto-detects your CPU and GPU, then writes an optimized tuning profile before first deploy.
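
The advisor runs as phase 00 through the master script — the same command as step 4 of the quick start below:

# Detect CPU/GPU and write the tuning profile before first deploy
sudo bash master-deploy.sh init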

💾

One-Click Backup

Timestamped backup and restore for all persistent data. VRAM purge included. No scripting required.
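
The actual entry points live in the phase 08 module; the sketch below is purely illustrative — the subcommand names are assumptions, not confirmed commands:

# Hypothetical usage — check the phase 08 module for the real commands
sudo bash master-deploy.sh backup    # assumed subcommand: timestamped backup
sudo bash master-deploy.sh restore   # assumed subcommand: restore from backup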

🏗️

12-Phase Methodology

Structured, independently deployable modules — from driver setup to lifecycle management. Deploy what you need, skip the rest.

12-Phase Deployment

Each phase is a self-contained Docker Compose module with its own deploy.sh. Roll forward one layer at a time, or deploy everything in a single command.

00  HWI Advisor · Auto hardware calibration & tuning profile
01  Infrastructure · Portainer, WebSSH
02  Database · PostgreSQL 17, pgAdmin 4
03  AI Interface · Ollama, OpenWebUI, Redis
04  Automation · n8n queue mode + workers
05  RAG Stack · Qdrant, Docling, Mosquitto
06  AI Core Engine · Lemonade inference engine
07  Validation · Health checks, benchmark scripts
08  Backup & Recovery · 1-click backup, restore, VRAM purge
09  Monitoring & Alerts · tiger-monitor, MQTT alerting
10  Observability · Grafana, Prometheus, Loki, cAdvisor
11  Lifecycle · What's Up Docker (WUD)
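
To roll forward a single layer, each phase ships its own deploy.sh; a minimal sketch, assuming phase folders follow the numbering above (the folder name is illustrative — check the repo layout):

# Deploy only the RAG stack (phase 05) — path is an assumption
cd deployments/amd-compose-stack/05-rag-stack
sudo bash deploy.sh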

Default Service Ports

Service      Description                 Port
OpenWebUI    LLM chat interface          8080
n8n          Workflow automation         5678
Grafana      Observability dashboard     3000
Portainer    Container management        9000
pgAdmin      Database admin UI           8000
Qdrant       Vector DB REST API          6333
Ollama       Inference API               11434
WUD          Container update manager    3838
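
After a full deploy, a quick way to sweep these ports from the host — a minimal sketch using bash's built-in /dev/tcp, with the port list mirroring the table above:

for port in 8080 5678 3000 9000 8000 6333 11434 3838; do
  (echo > /dev/tcp/localhost/$port) 2>/dev/null \
    && echo "port $port: open" || echo "port $port: closed"
done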

Up and running
in five steps

1
Clone the repo

Clone and pick your hardware stack — AMD, NVIDIA, or ARM64.

2
Configure credentials

Copy .env.example to .env and replace all CHANGE_ME values.

3
Run HWI Advisor

Auto-detects hardware and writes an optimal tuning profile.

4
Deploy

Full stack in one command, or deploy individual phases as needed.

5
Verify

Run the automated health check and benchmark suite.

# 1. Clone
git clone https://github.com/TigerAI-Taiwan/OpenGenie-AI-Stack.git
cd OpenGenie-AI-Stack

# 2. Pick your stack
cd deployments/amd-compose-stack
#        or: nvidia-compose-stack / arm64-compose-stack

# 3. Configure
cp .env.example .env
nano .env  # replace CHANGE_ME values

# 4. Hardware calibration (recommended)
sudo bash master-deploy.sh init

# 5. Deploy everything
sudo bash master-deploy.sh all

# 6. Verify
sudo bash master-deploy.sh test

Your private AI stack,
fully under your control

No cloud lock-in. No usage fees. Deploy on your hardware, keep your data on-premise.