- Run gc.collect() then torch.cuda.empty_cache() in unload for reliable VRAM release
- POST /admin/clear-vram endpoint unloads all models and reports GPU memory
- VRAMManager.clear_all() method for programmatic VRAM cleanup
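
For reference, a minimal sketch of the cleanup order described above. It is
not the code in this commit: release_vram and the plain dict of models are
hypothetical stand-ins for VRAMManager.clear_all(), and torch is treated as
optional so the snippet also runs on a machine without it.

```python
import gc

try:
    import torch  # optional here; the sketch skips the CUDA step without it
except ImportError:
    torch = None


def release_vram(models: dict) -> dict:
    """Hypothetical helper: drop all model references, then free memory."""
    unloaded = list(models)
    models.clear()  # drop references so the tensors become collectable
    gc.collect()    # collect reference cycles that may still pin tensors

    stats = {"unloaded": unloaded, "allocated_bytes": 0, "reserved_bytes": 0}
    if torch is not None and torch.cuda.is_available():
        # Return cached, unused blocks from PyTorch's allocator to the driver.
        torch.cuda.empty_cache()
        stats["allocated_bytes"] = torch.cuda.memory_allocated()
        stats["reserved_bytes"] = torch.cuda.memory_reserved()
    return stats
```

gc.collect() runs first because empty_cache() can only release blocks whose
tensors have already been freed.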

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>