# ComfyUI RunPod Serverless Project

## Project Overview

Build a RunPod Serverless endpoint running ComfyUI with SageAttention 2.2 for image/video generation. A self-hosted frontend will call the RunPod API.

## Architecture

- **RunPod Serverless**: Hosts ComfyUI worker with GPU inference
- **Network Volume**: Mounts at `/userdata`
- **Gitea Registry**: Hosts Docker image
- **Frontend**: Self-hosted on home server, calls RunPod API over HTTPS

## Reference Environment (extracted from working pod)

### Base System

- Ubuntu 22.04.5 LTS (Jammy)
- Python 3.12.12
- CUDA 12.8 (nvcc 12.8.93)
- cuDNN 9.8.0.87
- NCCL 2.25.1

### PyTorch Stack

- torch==2.8.0+cu128
- torchvision==0.23.0+cu128
- torchaudio==2.8.0+cu128
- triton==3.4.0

### Key Dependencies

- transformers==4.56.2
- diffusers==0.35.2
- accelerate==1.11.0
- safetensors==0.6.2
- onnxruntime-gpu==1.23.2
- opencv-python==4.12.0.88
- mediapipe==0.10.14
- insightface==0.7.3
- spandrel==0.4.1
- kornia==0.8.2
- einops==0.8.1
- timm==1.0.22
- peft==0.17.1
- gguf==0.17.1
- av==16.0.1 (video)
- imageio-ffmpeg==0.6.0

### Nunchaku (prebuilt wheel)

```
nunchaku @ https://github.com/nunchaku-tech/nunchaku/releases/download/v1.0.2/nunchaku-1.0.2+torch2.8-cp312-cp312-linux_x86_64.whl
```

### ComfyUI

- Location: `/workspace/ComfyUI`
- Uses venv at `/workspace/ComfyUI/venv`
- Commit: 532e2850794c7b497174a0a42ac0cb1fe5b62499 (Dec 24, 2025)

### Custom Nodes (from CUSTOM_NODES env var + actual install)

```
ltdrdata/ComfyUI-Manager
jnxmx/ComfyUI_HuggingFace_Downloader
kijai/ComfyUI-KJNodes
Fannovel16/comfyui_controlnet_aux
crystian/ComfyUI-Crystools
Kosinkadink/ComfyUI-VideoHelperSuite
willmiao/ComfyUI-Lora-Manager
city96/ComfyUI-GGUF
Fannovel16/ComfyUI-Frame-Interpolation
nunchaku-tech/ComfyUI-nunchaku
evanspearman/ComfyMath
ssitu/ComfyUI_UltimateSDUpscale
```

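`install_custom_nodes.sh` mostly reduces to expanding each `owner/repo` entry into a `git clone` under ComfyUI's `custom_nodes` directory. A sketch of that expansion (shown in Python for testability; the real script is shell, and the base path is taken from the reference layout):

```python
# Subset of the node list above; the real script iterates over all entries.
NODES = [
    "ltdrdata/ComfyUI-Manager",
    "city96/ComfyUI-GGUF",
]

def clone_command(repo: str, base: str = "/workspace/ComfyUI/custom_nodes") -> str:
    """Expand an owner/repo entry into the git clone command the script runs."""
    name = repo.split("/")[1]
    return f"git clone --depth 1 https://github.com/{repo}.git {base}/{name}"

for repo in NODES:
    print(clone_command(repo))
```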
### Environment Variables (relevant)

```bash
HF_HOME=/workspace/.cache/huggingface
HF_HUB_ENABLE_HF_TRANSFER=1
TRANSFORMERS_CACHE=/workspace/.cache/huggingface/transformers
PYTHONUNBUFFERED=1
LD_LIBRARY_PATH=/usr/local/cuda/lib64
LIBRARY_PATH=/usr/local/cuda/lib64/stubs
```

### Network Volume Mount

- Mount point: `/userdata`

## Technical Requirements
### SageAttention 2.2 (Critical)

Must be compiled from source with no build isolation:

```bash
git clone https://github.com/thu-ml/SageAttention.git
cd SageAttention
pip install triton
export EXT_PARALLEL=4 NVCC_APPEND_FLAGS="--threads 8" MAX_JOBS=32
pip install --no-build-isolation -e .
```

### Network Volume Structure

```
/userdata/
├── models/
│   ├── checkpoints/
│   ├── loras/
│   ├── vae/
│   ├── controlnet/
│   ├── clip/
│   └── upscale_models/
└── .cache/
    └── huggingface/
```

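This skeleton is cheap to create idempotently on first boot so symlinks and caches never hit a missing directory. A minimal sketch; directory names come from the tree above, and the root is parameterized (the real volume mounts at `/userdata`):

```python
import tempfile
from pathlib import Path

# Model subdirectories from the volume layout above.
MODEL_DIRS = ["checkpoints", "loras", "vae", "controlnet", "clip", "upscale_models"]

def init_volume(root: str) -> None:
    """Create the expected directory skeleton on the network volume (idempotent)."""
    base = Path(root)
    for d in MODEL_DIRS:
        (base / "models" / d).mkdir(parents=True, exist_ok=True)
    (base / ".cache" / "huggingface").mkdir(parents=True, exist_ok=True)

# Demo against a throwaway root; in the container this would be "/userdata".
demo_root = tempfile.mkdtemp()
init_volume(demo_root)
init_volume(demo_root)  # second call is a no-op thanks to exist_ok=True
```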
### Handler Requirements

- Accept JSON input: `{"image": "base64", "prompt": "string", "workflow": {}}`
- Upload the image to ComfyUI if provided
- Inject the prompt into the workflow at the specified node
- Queue the workflow, poll for completion
- Return output as base64:
  - Images: PNG/JPEG base64
  - Videos: MP4 base64 (or a presigned URL if >10MB)
- Detect the output type from the workflow's output node
- Timeout handling (max 600s for video generation)

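The pure parts of these requirements (prompt injection, output-type detection, bounded polling) can be sketched independently of the ComfyUI HTTP calls. The node class names and the `text` input key are assumptions to verify against the actual workflow JSON; `VHS_VideoCombine` is the VideoHelperSuite output node, and `SaveVideo` is a guessed core equivalent:

```python
import json
import time

def inject_prompt(workflow: dict, node_id: str, prompt: str) -> dict:
    """Return a copy of the workflow with the prompt written into one node's text input."""
    wf = json.loads(json.dumps(workflow))  # deep copy via JSON round-trip
    wf[node_id].setdefault("inputs", {})["text"] = prompt
    return wf

# Assumed video-producing output nodes; adjust to the real workflow.
VIDEO_OUTPUT_NODES = {"VHS_VideoCombine", "SaveVideo"}

def detect_output_kind(workflow: dict) -> str:
    """Classify a workflow as producing video or image from its node class types."""
    classes = {node.get("class_type") for node in workflow.values()}
    return "video" if classes & VIDEO_OUTPUT_NODES else "image"

def poll(fetch_status, deadline_s: float = 600.0, interval_s: float = 1.0):
    """Call fetch_status() until it returns non-None or the deadline passes."""
    start = time.monotonic()
    while time.monotonic() - start < deadline_s:
        result = fetch_status()
        if result is not None:
            return result
        time.sleep(interval_s)
    raise TimeoutError(f"generation did not finish within {deadline_s}s")
```

In the real handler, `fetch_status` would wrap a `GET` against ComfyUI's history endpoint for the queued prompt ID; keeping it injectable makes the timeout logic unit-testable.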
### Dockerfile Requirements

- Base: `nvidia/cuda:12.8.1-devel-ubuntu22.04` (or equivalent with CUDA 12.8 devel)
- Python 3.12
- PyTorch 2.8.0+cu128 from the PyTorch index
- Install nunchaku from the GitHub wheel
- Compile SageAttention with `--no-build-isolation`
- Symlink model directories to `/userdata`
- Clone and install all custom nodes
- Install ffmpeg for video handling
- Expose the handler as the entrypoint

## File Structure

```
/project
├── Dockerfile
├── handler.py
├── requirements.txt
├── scripts/
│   └── install_custom_nodes.sh
├── workflows/
│   └── default_workflow_api.json
└── README.md
```

## Tasks

1. Create Dockerfile matching reference environment (CUDA 12.8, Python 3.12, PyTorch 2.8)
2. Create requirements.txt from extracted pip freeze (pruned to essentials)
3. Create install_custom_nodes.sh for all listed custom nodes
4. Create handler.py with ComfyUI API integration (image + video output support)
5. Document deployment steps in README.md

## Notes

- Nick is a Principal Systems Engineer and prefers direct technical communication
- Target deployment: RunPod Serverless with a 5090 GPU
- Development machine: RTX 3080 (forward compatible)
- Registry: self-hosted Gitea
- Output will likely be video; ensure ffmpeg is installed and the handler detects the output type
- The reference pod uses a venv; the serverless image can install packages globally

## Claude Code Init Command

```
Read PROJECT.md fully. Build the Dockerfile first, matching the reference environment exactly: CUDA 12.8.1, Python 3.12, PyTorch 2.8.0+cu128, triton 3.4.0. Install nunchaku from the GitHub wheel URL. Compile SageAttention 2.2 with --no-build-isolation. Install all custom nodes listed. Symlink model paths to /userdata. Do not use a venv in the container.
```