# ComfyUI RunPod Serverless Project

## Project Overview

Build a RunPod Serverless endpoint running ComfyUI with SageAttention 2.2 for image/video generation. A self-hosted frontend will call the RunPod API.

## Architecture

- **RunPod Serverless**: Hosts the ComfyUI worker with GPU inference
- **Network Volume**: Mounts at `/userdata`
- **Gitea Registry**: Hosts the Docker image
- **Frontend**: Self-hosted on home server, calls the RunPod API over HTTPS

## Reference Environment (extracted from working pod)

### Base System

- Ubuntu 22.04.5 LTS (Jammy)
- Python 3.12.12
- CUDA 12.8 (nvcc 12.8.93)
- cuDNN 9.8.0.87
- NCCL 2.25.1

### PyTorch Stack

- torch==2.8.0+cu128
- torchvision==0.23.0+cu128
- torchaudio==2.8.0+cu128
- triton==3.4.0

### Key Dependencies

- transformers==4.56.2
- diffusers==0.35.2
- accelerate==1.11.0
- safetensors==0.6.2
- onnxruntime-gpu==1.23.2
- opencv-python==4.12.0.88
- mediapipe==0.10.14
- insightface==0.7.3
- spandrel==0.4.1
- kornia==0.8.2
- einops==0.8.1
- timm==1.0.22
- peft==0.17.1
- gguf==0.17.1
- av==16.0.1 (video)
- imageio-ffmpeg==0.6.0

### Nunchaku (prebuilt wheel)

```
nunchaku @ https://github.com/nunchaku-tech/nunchaku/releases/download/v1.0.2/nunchaku-1.0.2+torch2.8-cp312-cp312-linux_x86_64.whl
```

### ComfyUI

- Location: `/workspace/ComfyUI`
- Uses venv at `/workspace/ComfyUI/venv`
- Commit: 532e2850794c7b497174a0a42ac0cb1fe5b62499 (Dec 24, 2025)

### Custom Nodes (from CUSTOM_NODES env var + actual install)

```
ltdrdata/ComfyUI-Manager
jnxmx/ComfyUI_HuggingFace_Downloader
kijai/ComfyUI-KJNodes
Fannovel16/comfyui_controlnet_aux
crystian/ComfyUI-Crystools
Kosinkadink/ComfyUI-VideoHelperSuite
willmiao/ComfyUI-Lora-Manager
city96/ComfyUI-GGUF
Fannovel16/ComfyUI-Frame-Interpolation
nunchaku-tech/ComfyUI-nunchaku
evanspearman/ComfyMath
ssitu/ComfyUI_UltimateSDUpscale
```

### Environment Variables (relevant)

```bash
HF_HOME=/workspace/.cache/huggingface
HF_HUB_ENABLE_HF_TRANSFER=1
TRANSFORMERS_CACHE=/workspace/.cache/huggingface/transformers
PYTHONUNBUFFERED=1
LD_LIBRARY_PATH=/usr/local/cuda/lib64
LIBRARY_PATH=/usr/local/cuda/lib64/stubs
```

### Network Volume Mount

- Mount point: `/userdata`

## Technical Requirements

### SageAttention 2.2 (Critical)

Must be compiled from source with build isolation disabled:

```bash
git clone https://github.com/thu-ml/SageAttention.git
cd SageAttention
pip install triton
export EXT_PARALLEL=4 NVCC_APPEND_FLAGS="--threads 8" MAX_JOBS=32
pip install --no-build-isolation -e .
```

### Network Volume Structure

```
/userdata/
├── models/
│   ├── checkpoints/
│   ├── loras/
│   ├── vae/
│   ├── controlnet/
│   ├── clip/
│   └── upscale_models/
└── .cache/
    └── huggingface/
```

### Handler Requirements

- Accept JSON input: `{"image": "base64", "prompt": "string", "workflow": {}}`
- Upload the image to ComfyUI if provided
- Inject the prompt into the workflow at the specified node
- Queue the workflow, poll for completion
- Return output as base64:
  - Images: PNG/JPEG base64
  - Videos: MP4 base64 (or a presigned URL if >10 MB)
- Detect the output type from the workflow's output node
- Timeout handling (max 600 s for video generation)

### Dockerfile Requirements

- Base: `nvidia/cuda:12.8.1-devel-ubuntu22.04` (or equivalent with CUDA 12.8 devel)
- Python 3.12
- PyTorch 2.8.0+cu128 from the PyTorch index
- Install nunchaku from the GitHub wheel
- Compile SageAttention with `--no-build-isolation`
- Symlink model directories to `/userdata`
- Clone and install all custom nodes
- Install ffmpeg for video handling
- Expose the handler as the entrypoint

## File Structure

```
/project
├── Dockerfile
├── handler.py
├── requirements.txt
├── scripts/
│   └── install_custom_nodes.sh
├── workflows/
│   └── default_workflow_api.json
└── README.md
```

## Tasks

1. Create a Dockerfile matching the reference environment (CUDA 12.8, Python 3.12, PyTorch 2.8)
2. Create requirements.txt from the extracted pip freeze (pruned to essentials)
3. Create install_custom_nodes.sh for all listed custom nodes
4. Create handler.py with ComfyUI API integration (image + video output support)
5.
Document deployment steps in README.md

## Notes

- Nick is a Principal Systems Engineer; prefers direct technical communication
- Target deployment: RunPod Serverless with a 5090 GPU
- Development machine: RTX 3080 (forward compatible)
- Registry: self-hosted Gitea
- Output will likely be video; ensure ffmpeg is installed and the handler detects the output type
- The reference pod uses a venv; the serverless image can install globally

## Claude Code Init Command

```
Read PROJECT.md fully. Build the Dockerfile first, matching the reference environment exactly: CUDA 12.8.1, Python 3.12, PyTorch 2.8.0+cu128, triton 3.4.0. Install nunchaku from the GitHub wheel URL. Compile SageAttention 2.2 with --no-build-isolation. Install all custom nodes listed. Symlink model paths to /userdata. Do not use a venv in the container.
```
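The handler loop described in the requirements (inject prompt, queue, poll `/history`, return base64) can be sketched as below. This is a minimal illustration of the stock ComfyUI HTTP API flow, not the final handler.py: the `/prompt` and `/history/{id}` endpoints and the `prompt_id` response field match upstream ComfyUI, but `COMFY_URL`, the helper names, the `"text"` input key, and the 2-second polling interval are assumptions to adjust for the actual workflow JSON.

```python
import base64
import json
import time
import urllib.request

COMFY_URL = "http://127.0.0.1:8188"  # assumed local ComfyUI port inside the worker


def inject_prompt(workflow: dict, node_id: str, prompt: str) -> dict:
    """Overwrite the text input of one node in an API-format workflow (non-destructively)."""
    patched = json.loads(json.dumps(workflow))  # cheap deep copy
    patched[node_id]["inputs"]["text"] = prompt  # assumes the node exposes a "text" input
    return patched


def queue_workflow(workflow: dict) -> str:
    """POST the workflow to ComfyUI's /prompt endpoint and return the prompt id."""
    req = urllib.request.Request(
        f"{COMFY_URL}/prompt",
        data=json.dumps({"prompt": workflow}).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["prompt_id"]


def wait_for_outputs(prompt_id: str, timeout: float = 600.0) -> dict:
    """Poll /history until the job appears or the 600 s video budget is exhausted."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        with urllib.request.urlopen(f"{COMFY_URL}/history/{prompt_id}") as resp:
            history = json.load(resp)
        if prompt_id in history:
            return history[prompt_id]["outputs"]
        time.sleep(2)
    raise TimeoutError(f"workflow {prompt_id} did not finish in {timeout}s")


def encode_file(path: str) -> str:
    """Base64-encode an output file (PNG/JPEG/MP4) for the JSON response."""
    with open(path, "rb") as f:
        return base64.b64encode(f.read()).decode()
```

In the real handler these pieces would be wrapped in a `runpod.serverless.start({"handler": ...})` entrypoint, with the image/video decision made from the file extension of the output node's files and the >10 MB presigned-URL fallback applied before encoding.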