55 Commits

Author SHA1 Message Date
Debian
85a07fcc5f Add ComfyUI output capture for crash debugging
All checks were successful
Build and Push Docker Image / build (push) Successful in 3m25s
- Add background thread to read ComfyUI stdout in real-time
- Store last 200 lines in circular buffer
- Echo output to RunPod logs with [ComfyUI] prefix
- Include last 100 lines in error responses for debugging
- Add comfyui_output field to error responses

This will help diagnose why ComfyUI crashes during generation.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-11 03:15:10 +00:00
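
Note on 85a07fcc5f: the capture described above can be sketched roughly as follows. This is a minimal illustration, not the code from handler.py. The buffer size (200), the error tail (100), the `[ComfyUI]` prefix, and the `comfyui_output` field come from the commit message; the class name `OutputCapture`, the helper `error_response`, and the demo child process are hypothetical.

```python
import collections
import subprocess
import sys
import threading

BUFFER_LINES = 200      # lines kept in memory (from the commit message)
ERROR_TAIL_LINES = 100  # lines attached to error responses (from the commit message)


class OutputCapture:
    """Reads a child's stdout on a background thread into a ring buffer (hypothetical name)."""

    def __init__(self, proc, prefix="[ComfyUI]"):
        self._proc = proc
        self._prefix = prefix
        # deque(maxlen=N) silently drops the oldest line once N lines are stored.
        self._lines = collections.deque(maxlen=BUFFER_LINES)
        self._lock = threading.Lock()
        self._thread = threading.Thread(target=self._pump, daemon=True)
        self._thread.start()

    def _pump(self):
        # Iterating the pipe yields lines as the child writes them.
        for raw in self._proc.stdout:
            line = raw.rstrip("\n")
            with self._lock:
                self._lines.append(line)
            # Echo to our own stdout so the line shows up in the worker logs.
            print(f"{self._prefix} {line}", flush=True)

    def join(self, timeout=None):
        self._thread.join(timeout)

    def tail(self, n=ERROR_TAIL_LINES):
        with self._lock:
            return list(self._lines)[-n:]


def error_response(message, capture):
    """Build an error payload that carries the most recent child output."""
    return {"error": message, "comfyui_output": capture.tail()}


def demo():
    # Stand-in for ComfyUI: a child process that prints 300 lines and exits.
    child = subprocess.Popen(
        [sys.executable, "-c", "for i in range(300): print('line', i)"],
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,
        text=True,
    )
    capture = OutputCapture(child)
    child.wait()
    capture.join()
    return error_response("ComfyUI crashed", capture)
```

Because the reader runs on its own thread, the pipe keeps draining even while the handler is blocked waiting on a generation request, so the last lines before a crash are still available.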
Debian
672381ddd0 Revert "Fix RTX 5090 crash: use sdpa attention instead of sageattn"
Some checks failed
Build and Push Docker Image / build (push) Has been cancelled
This reverts commit 1e60401679.
2026-01-11 03:13:01 +00:00
Debian
1e60401679 Fix RTX 5090 crash: use sdpa attention instead of sageattn
Some checks failed
Build and Push Docker Image / build (push) Has been cancelled
SageAttention was only compiled for A100 (sm80) and H100 (sm90).
RTX 5090 (Blackwell sm120) has no compatible kernel, causing ComfyUI
to crash during generation with "Connection reset by peer".

Switch to PyTorch's native SDPA which works on all architectures.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-11 03:10:06 +00:00
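
Note on 1e60401679 (reverted by 672381ddd0): the reasoning in this message is that a SageAttention build only contains kernels for the architectures it was compiled for, so any other GPU needs a fallback. A minimal sketch of that selection, assuming the mode names `sageattn` and `sdpa` from the commit title; the function `choose_attention_mode` and the constant are hypothetical and not taken from the repository.

```python
# Architectures the SageAttention build was compiled for, per the commit
# message: A100 (sm80) and H100 (sm90), written as (major, minor) tuples.
SAGE_COMPILED_ARCHS = frozenset({(8, 0), (9, 0)})


def choose_attention_mode(capability, compiled_archs=SAGE_COMPILED_ARCHS):
    """Return "sageattn" only when a kernel exists for this GPU, else "sdpa".

    `capability` is a (major, minor) compute-capability tuple. At runtime it
    could come from torch.cuda.get_device_capability(); it is passed in here
    so the sketch runs without a GPU or PyTorch installed.
    """
    return "sageattn" if tuple(capability) in compiled_archs else "sdpa"


# RTX 5090 reports compute capability 12.0 (sm120), which is not in the
# compiled set, so the fallback is chosen.
mode_5090 = choose_attention_mode((12, 0))
mode_a100 = choose_attention_mode((8, 0))
```

The commit itself switched unconditionally to SDPA; a capability check like this is one alternative that would keep SageAttention on the GPUs it was built for.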
Debian
3c421cf7b8 Add job logging and increase timeout to 20 minutes
All checks were successful
Build and Push Frontend Docker Image / build (push) Successful in 39s
Build and Push Docker Image / build (push) Successful in 31m7s
- Add JobLogger class to handler.py for structured timestamped logging
- Increase MAX_TIMEOUT from 600s to 1200s (20 minutes)
- Add logs column to generated_content table via migration
- Store and display job execution logs in gallery UI
- Add Logs button to gallery items with modal display

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-08 02:10:55 +00:00
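
Note on 3c421cf7b8: a structured, timestamped job logger of the kind described could look roughly like this. Only the class name `JobLogger` comes from the commit message; the field names, the `clock` parameter, and the `as_text` method are assumptions made for illustration.

```python
import time
from datetime import datetime, timezone


class JobLogger:
    """Collects timestamped log entries for one job (illustrative sketch)."""

    def __init__(self, job_id, clock=time.time):
        self.job_id = job_id
        self._clock = clock          # injectable so tests can use a fake clock
        self._start = clock()
        self.entries = []

    def log(self, message, level="INFO"):
        now = self._clock()
        entry = {
            "timestamp": datetime.fromtimestamp(now, tz=timezone.utc).isoformat(),
            "elapsed_s": round(now - self._start, 3),
            "level": level,
            "message": message,
        }
        self.entries.append(entry)
        return entry

    def as_text(self):
        """Render entries as plain lines, e.g. for storing in a logs column."""
        return "\n".join(
            f"[{e['timestamp']}] +{e['elapsed_s']:.1f}s {e['level']}: {e['message']}"
            for e in self.entries
        )
```

Keeping the entries as dictionaries until the end leaves the choice of storage format (JSON or plain text) to the caller.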
Debian
52dd0d8766 Add video viewer modal for gallery videos
All checks were successful
Build and Push Frontend Docker Image / build (push) Successful in 30s
- Click on video thumbnail to open in large viewer
- Video plays on hover, pauses when mouse leaves
- Close viewer by clicking X or clicking outside video

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-08 01:08:13 +00:00
Debian
ad4114ab82 Update default negative prompt and add image clear button
All checks were successful
Build and Push Frontend Docker Image / build (push) Successful in 29s
- Replace short negative prompt with comprehensive list
- Add X button to clear selected image before generating
- Allow selecting a new image by clearing the current one

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-08 00:38:30 +00:00
Debian
55af3da1ae Add privacy toggle to blur media on generate and gallery pages
All checks were successful
Build and Push Frontend Docker Image / build (push) Successful in 30s
- Add "Hide Media" / "Show Media" button to both sections
- Blur images and videos when privacy mode is active
- Persist privacy preference in localStorage per section

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-08 00:36:28 +00:00
Debian
ff54cf7363 Replace all inline onclick handlers with addEventListener
All checks were successful
Build and Push Frontend Docker Image / build (push) Successful in 57s
Inline onclick handlers on async functions fail silently when
promises reject. This affected delete buttons, edit buttons,
modal close/cancel buttons, and pagination.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-08 00:28:32 +00:00
Debian
965559f88d Remove debug find command that times out on large volumes
All checks were successful
Build and Push Docker Image / build (push) Successful in 32m5s
The find command searching for .safetensors across /runpod-volume
was timing out after 30 seconds on volumes with many files.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-07 22:20:30 +00:00
Debian
95dd159e89 Fix Delete button not responding to clicks in gallery
All checks were successful
Build and Push Frontend Docker Image / build (push) Successful in 29s
Same issue as View Progress button - replace inline onclick handler
with proper addEventListener to fix silent failures from async
promise rejections.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-07 21:54:18 +00:00
Debian
a7cb20fb37 Fix View Progress button not responding to clicks
All checks were successful
Build and Push Frontend Docker Image / build (push) Successful in 1m0s
Replace inline onclick handlers with proper addEventListener to fix
silent failures from async promise rejections. Add try-catch error
handling to show errors to user instead of failing silently.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-07 21:27:25 +00:00
Debian
8f050b41a0 Fix stuck processing jobs and increase timeouts
All checks were successful
Build and Push Frontend Docker Image / build (push) Successful in 57s
Build and Push Docker Image / build (push) Successful in 30m18s
Background Job Processor:
- Add src/services/jobProcessor.ts that polls RunPod every 30s for stuck jobs
- Automatically completes or fails jobs that were abandoned (user navigated away)
- Times out jobs after 25 minutes

Client-Side Resume:
- Add GET /api/generate/pending endpoint to fetch user's processing jobs
- Add checkPendingJobs() that runs on login/page load
- Show notification banner when user has jobs generating in background
- Add "View Progress" button to resume polling for a job

Timeout Increases (10min → 25min):
- src/utils/validators.ts: request validation max/default
- src/config.ts: RUNPOD_MAX_TIMEOUT_MS default
- public/js/app.js: client-side polling maxTime
- src/services/jobProcessor.ts: background processor timeout

CI/CD Optimization:
- Add paths-ignore to backend build.yaml to skip rebuilds on frontend-only changes

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-07 05:36:53 +00:00
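
Note on 8f050b41a0: the background processor in the repository is TypeScript (src/services/jobProcessor.ts); the decision it makes on each poll is sketched below in Python for consistency with the other notes. The 30-second interval and 25-minute timeout come from the commit message. The function name `resolve_job` and the remote status strings are assumptions and should be checked against the RunPod API before use.

```python
POLL_INTERVAL_S = 30        # from the commit message
JOB_TIMEOUT_S = 25 * 60     # from the commit message

# Assumed terminal status strings reported by the remote API.
REMOTE_DONE = {"COMPLETED"}
REMOTE_FAILED = {"FAILED", "CANCELLED", "TIMED_OUT"}


def resolve_job(remote_status, started_at, now):
    """Decide what to do with a locally 'processing' job on one poll.

    Returns "completed", "failed", or "processing" (leave it for the next poll).
    Times are seconds since the epoch.
    """
    if remote_status in REMOTE_DONE:
        return "completed"
    if remote_status in REMOTE_FAILED:
        return "failed"
    # Still running remotely: give up only once the local deadline has passed.
    if now - started_at > JOB_TIMEOUT_S:
        return "failed"
    return "processing"
```

Because the check runs server-side on a timer, a job reaches a final state even if the browser tab that started it has been closed.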
Debian
0758b866bd Move frontend workflow to root .gitea/workflows
All checks were successful
Build and Push Frontend Docker Image / build (push) Successful in 1m4s
Build and Push Docker Image / build (push) Successful in 31m0s
Gitea Actions only detects workflows at repo root, not subdirectories.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-07 04:59:36 +00:00
Debian
890543fb77 Add frontend service with auth, MFA, and content management
Some checks failed
Build and Push Docker Image / build (push) Has been cancelled
- Node.js/Express backend with TypeScript
- SQLite database for users, sessions, and content metadata
- Authentication with TOTP and WebAuthn MFA support
- Admin user auto-created on first startup
- User content gallery with view/delete functionality
- RunPod API proxy (keeps API keys server-side)
- Docker setup with CI/CD for Gitea registry

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-07 04:57:08 +00:00
Debian
8a5610a1e4 Remove torch.compile from model loaders entirely
All checks were successful
Build and Push Docker Image / build (push) Successful in 3m26s
Empty backend string still triggered inductor. Removing compile_args
connection from WanVideoModelLoader nodes (131, 132) ensures no
torch.compile is applied to the transformer blocks.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-06 08:35:24 +00:00
Debian
ab5b853521 Disable torch.compile to fix RTX 5090 CUDA error
All checks were successful
Build and Push Docker Image / build (push) Successful in 31m16s
The inductor backend was causing cudaErrorInvalidValue during Triton
kernel autotuning on Blackwell architecture. Setting backend to empty
string disables torch.compile entirely.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-06 00:22:52 +00:00
Debian
2b64ac96d2 Fix torch compile backend: 'disabled' no longer valid option
All checks were successful
Build and Push Docker Image / build (push) Successful in 30m37s
🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-04 23:02:41 +00:00
Debian
a54312396b Disable torch.compile and fix video display
All checks were successful
Build and Push Docker Image / build (push) Successful in 33m17s
- Disable torch.compile (inductor -> disabled) to reduce cold start time
- Fix handler to detect video type from file extension, not output key
- Fix HTML to check filename extension for video display

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-04 08:40:03 +00:00
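
Note on a54312396b: deciding the media type from the file extension rather than from the output key can be sketched as below. The extension list and the function name `media_type` are assumptions, not taken from handler.py.

```python
import os

# Assumed set of video container extensions; extend as needed.
VIDEO_EXTENSIONS = {".mp4", ".webm", ".mov", ".mkv"}


def media_type(filename):
    """Classify an output file as "video" or "image" by its extension."""
    ext = os.path.splitext(filename)[1].lower()
    return "video" if ext in VIDEO_EXTENSIONS else "image"
```

This keeps the classification correct when a node reports a video under a key that suggests an image, which is the situation the commit message describes.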
Debian
bcb39c615d Enable tiled_vae to prevent OOM on portrait images
Some checks failed
Build and Push Docker Image / build (push) Failing after 29m0s
Also add async polling to HTML test page for long-running jobs.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-04 02:23:46 +00:00
Debian
c04e4b6250 Update for RTX 5090 (Blackwell sm_120) support
Some checks failed
Build and Push Docker Image / build (push) Failing after 29m16s
- Switch to PyTorch nightly with CUDA 12.8 (required for sm_120)
- Target TORCH_CUDA_ARCH_LIST="12.0" for Blackwell
- Remove nunchaku (incompatible with PyTorch nightly)
- Use latest SageAttention (has sm_120 kernel support)

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-03 22:01:58 +00:00
Debian
9f71e6db57 Limit SageAttention to A100/H100 due to cross-compile issues
Some checks failed
Build and Push Docker Image / build (push) Has been cancelled
The sm90 kernels use wgmma instructions that can't be compiled for
sm86/sm89 targets. Restricting to 8.0 (A100) and 9.0 (H100) only.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-03 21:57:09 +00:00
Debian
352128aa39 Pin SageAttention to commit that supports GPU-less builds
Some checks failed
Build and Push Docker Image / build (push) Failing after 21m35s
Recent commits broke TORCH_CUDA_ARCH_LIST support, requiring a GPU
during build. Pin to 2aecfa8 which respects the env var.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-03 20:30:17 +00:00
Debian
c61aca4074 Reduce build parallelism to avoid OOM during SageAttention compile
Some checks failed
Build and Push Docker Image / build (push) Has been cancelled
🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-03 20:17:10 +00:00
Debian
99fdda5b2b Add multi-GPU support and HTML test interface
Some checks failed
Build and Push Docker Image / build (push) Failing after 16m23s
- Update SageAttention CUDA arch list to support A100, A10, RTX 4090, L40, H100/H200
- Add interactive HTML test page for RunPod API testing

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-03 10:25:08 +00:00
Nick
f69bbc2f45 Fix model symlinks to use /runpod-volume/ComfyUI/models/
All checks were successful
Build and Push Docker Image / build (push) Successful in 4m3s
Models are stored in /runpod-volume/ComfyUI/models/ on the network
volume, not /runpod-volume/models/. Updated all symlinks to match.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-28 00:07:22 +13:00
Nick
47e312a58b Fix RIFE model download URL
All checks were successful
Build and Push Docker Image / build (push) Successful in 4m11s
Use correct URL from styler00dollar/VSGAN-tensorrt-docker releases.
Also fix path to ckpts/rife/ subdirectory.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-27 23:53:44 +13:00
Nick
342ef24bba Pre-download RIFE v4.7 model for frame interpolation
Some checks failed
Build and Push Docker Image / build (push) Failing after 3m47s
The RIFE model is small (~15MB) and required for the workflow.
Pre-downloading avoids runtime download delays.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-27 23:46:57 +13:00
Nick
aa95315e3f Add diffusion_models and text_encoders symlinks
Some checks failed
Build and Push Docker Image / build (push) Has been cancelled
WanVideo models are stored in diffusion_models/, and the CLIP text
encoder is in text_encoders/. These were missing from the symlink setup.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-27 23:45:13 +13:00
Nick
6a92eff814 Add model path debug output on startup
Some checks failed
Build and Push Docker Image / build (push) Has been cancelled
🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-27 23:44:31 +13:00
Nick
b7cecfd69e Use API format workflow instead of frontend format
All checks were successful
Build and Push Docker Image / build (push) Successful in 4m2s
The frontend-to-API conversion was using outdated widget names that
don't match the current WanVideo node API. Using the exported API
format workflow directly bypasses this issue.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-27 23:28:15 +13:00
Nick
dba11a9f45 Skip bypassed nodes (mode 4) in workflow conversion
All checks were successful
Build and Push Docker Image / build (push) Successful in 4m2s
Bypassed/muted nodes should not be included in the API workflow,
and connections from bypassed nodes should be ignored.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-27 23:19:24 +13:00
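
Note on dba11a9f45: a minimal sketch of skipping bypassed nodes during conversion. It assumes the frontend workflow layout in which each node carries `id`, `type`, `mode`, and an `inputs` list with `name` and `link`, and each entry of `links` starts with `[link_id, source_node_id, source_slot, ...]`. The function name is hypothetical, and widget values are left out to keep the example short.

```python
BYPASSED_MODE = 4  # node mode named in the commit title


def convert_links_skipping_bypassed(workflow):
    """Build an API-style dict from a frontend workflow, dropping bypassed nodes.

    Only linked inputs are converted; a link whose source node is bypassed
    is ignored rather than pointing at a node that no longer exists.
    """
    active = {
        node["id"]
        for node in workflow["nodes"]
        if node.get("mode", 0) != BYPASSED_MODE
    }
    # link_id -> (source_node_id, source_slot)
    link_sources = {link[0]: (link[1], link[2]) for link in workflow.get("links", [])}

    api = {}
    for node in workflow["nodes"]:
        if node["id"] not in active:
            continue
        inputs = {}
        for inp in node.get("inputs", []):
            source = link_sources.get(inp.get("link"))
            if source is None or source[0] not in active:
                continue
            inputs[inp["name"]] = [str(source[0]), source[1]]
        api[str(node["id"])] = {"class_type": node["type"], "inputs": inputs}
    return api
```

A later commit (b7cecfd69e) sidesteps the conversion entirely by shipping the workflow in API format.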
Nick
56e8e164ab Add queue response debug logging for node errors
Some checks failed
Build and Push Docker Image / build (push) Has been cancelled
🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-27 23:18:03 +13:00
Nick
8a69e45b26 Debug output chain nodes with full inputs
All checks were successful
Build and Push Docker Image / build (push) Successful in 4m3s
🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-27 23:06:12 +13:00
Nick
12e2ef3230 Add workflow node connection debug logging
All checks were successful
Build and Push Docker Image / build (push) Successful in 4m6s
🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-27 18:21:56 +13:00
Nick
85e38bc3ec Remove worker restart step from CI
All checks were successful
Build and Push Docker Image / build (push) Successful in 45s
Rolling release from template update is fast enough (~1 min).

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-27 18:13:27 +13:00
Nick
f86acde2e5 Fix RunPod API calls to use correct endpoints
All checks were successful
Build and Push Docker Image / build (push) Successful in 51s
- Use REST API to update template's docker image
- Use saveEndpoint mutation with required name field
- Cycle workers to 0 then back to force image refresh

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-27 10:26:46 +13:00
Nick
d0140fa2b3 Add RunPod endpoint update and worker purge to CI
All checks were successful
Build and Push Docker Image / build (push) Successful in 45s
Updates the serverless endpoint with new image tag and purges
existing workers to force restart with the new image after build.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-27 10:20:45 +13:00
Nick
70720c07d6 Add debug logging for workflow output history
All checks were successful
Build and Push Docker Image / build (push) Successful in 4m2s
🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-27 10:15:19 +13:00
Nick
8eedb6b45a Add widget mappings for all WanVideo workflow nodes
All checks were successful
Build and Push Docker Image / build (push) Successful in 4m4s
The convert_frontend_to_api() function was missing mappings for most
node types, causing "Required input is missing" errors when the API
received workflows in frontend format.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-27 10:03:06 +13:00
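
Note on 8eedb6b45a: the frontend format stores widget values as a positional list, while the API format expects named inputs, so the converter needs a per-node-type list of names. A rough sketch follows; the node types and widget names in the table are made-up placeholders, not the actual WanVideo mappings, and `widgets_to_inputs` is a hypothetical helper rather than part of `convert_frontend_to_api()`.

```python
# Hypothetical mapping: node type -> widget names in the order the frontend
# stores them in "widgets_values".
WIDGET_NAMES = {
    "ExampleSampler": ["steps", "cfg", "seed"],
    "ExampleLoader": ["model_name"],
}


def widgets_to_inputs(node, widget_names=WIDGET_NAMES):
    """Turn a frontend node's positional widget values into named inputs."""
    names = widget_names.get(node["type"])
    if names is None:
        # Fail at conversion time instead of letting the API reject the
        # workflow later with a missing-input error.
        raise KeyError(f"no widget mapping for node type {node['type']!r}")
    return dict(zip(names, node.get("widgets_values", [])))
```

Raising on an unmapped node type surfaces the gap during conversion, which is easier to trace than a "Required input is missing" error from the API.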
Nick
4b32558675 Add ComfyUI-Custom-Scripts and ComfyUI-Easy-Use nodes
All checks were successful
Build and Push Docker Image / build (push) Successful in 7m7s
Fixes workflow error: MathExpression|pysssss node not found.
These nodes are required by the Wan22-I2V-Remix workflow.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-27 00:45:44 +13:00
Nick
283955f1f7 Add ComfyUI-WanVideoWrapper for WanVideo nodes
All checks were successful
Build and Push Docker Image / build (push) Successful in 7m19s
🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-27 00:31:53 +13:00
Nick
69e91dd7f9 Switch to Docker Hub registry
All checks were successful
Build and Push Docker Image / build (push) Successful in 12m25s
🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-26 23:33:32 +13:00
Nick
ed5bf01972 Use /runpod-volume mount point for RunPod network volumes
All checks were successful
Build and Push Docker Image / build (push) Successful in 3m47s
RunPod mounts network volumes at /runpod-volume, not /userdata.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-26 23:01:08 +13:00
Nick
929059f812 Target only H200 (sm_90) for SageAttention build
All checks were successful
Build and Push Docker Image / build (push) Successful in 25m31s
Blackwell (sm_100) may not be fully supported yet.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-26 22:12:59 +13:00
Nick
fef4b8d7ee Reduce CUDA compilation parallelism for 16GB RAM
Some checks failed
Build and Push Docker Image / build (push) Failing after 15m53s
🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-26 16:58:39 +13:00
Nick
7a8b59f471 Target H200 and RTX 5090 GPU architectures
Some checks failed
Build and Push Docker Image / build (push) Failing after 47m51s
🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-26 12:11:31 +13:00
Nick
bdc1d769e8 Set TORCH_CUDA_ARCH_LIST for SageAttention build
Some checks failed
Build and Push Docker Image / build (push) Has been cancelled
Build runner has no GPU, so specify target architectures explicitly.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-26 12:09:57 +13:00
Nick
ab73c2d9be Optimize Dockerfile layers and reduce disk usage
Some checks failed
Build and Push Docker Image / build (push) Failing after 10m52s
- Combine PyTorch + triton install into single layer
- Add pip cache cleanup after each install step
- Change SageAttention to regular install and remove source after build
- Consolidate custom node dependencies into single layer
- Add CLAUDE.md, i2v-workflow.json, update handler.py and PROJECT.md

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-26 11:55:56 +13:00
Nick
accb698fd3 Add .gitattributes to enforce LF line endings
Some checks failed
Build and Push Docker Image / build (push) Failing after 9m42s
2025-12-26 10:23:45 +13:00
Nick
469911f2a7 Fix pip bootstrap for Python 3.12
Some checks failed
Build and Push Docker Image / build (push) Has been cancelled
2025-12-26 10:22:02 +13:00