DesTEngSsv006_swd

Author	SHA256	Message	Date
tlg	d615bb4553	fix: Chatterbox uses separate classes per variant, remove turbo. ChatterboxTTS and ChatterboxMultilingualTTS are separate classes. Turbo variant doesn't exist in chatterbox-tts 0.1.7. Multilingual generate() requires language_id parameter. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-05 21:43:40 +02:00
tlg	f24a225baf	fix: resolve GGUF paths through HF cache, add model_id to GGUF config. llama-cpp-python backend now uses huggingface_hub to resolve GGUF file paths within the HF cache structure instead of assuming flat /models/ directory. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-05 21:33:36 +02:00
tlg	38e1523d7e	feat: proper VRAM cleanup and admin clear-vram endpoint. gc.collect() + torch.cuda.empty_cache() in unload for reliable VRAM release; POST /admin/clear-vram endpoint unloads all models and reports GPU memory; VRAMManager.clear_all() method for programmatic VRAM cleanup. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-05 21:03:39 +02:00
tlg	aa7a160118	fix: proper VRAM cleanup on model unload + CUDA alloc config. Force gc.collect() before torch.cuda.empty_cache() to ensure all model references are released; set PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True in container. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-05 17:59:23 +02:00
tlg	17818a3860	feat: FastAPI app assembly with all routes and backend wiring Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-05 10:04:56 +02:00
tlg	d55c80ae35	feat: API routes for models, chat, transcription, speech, and admin Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-05 10:04:45 +02:00
tlg	ef44bc09b9	feat: Chatterbox TTS backend with turbo/multilingual/default variants	2026-04-04 09:40:42 +02:00
tlg	c6677dcab3	feat: llama-cpp-python backend with GGUF, vision, and tool support	2026-04-04 09:40:40 +02:00
tlg	de25b5e2a7	feat: transformers ASR backend for cohere-transcribe	2026-04-04 09:40:39 +02:00
tlg	449e37d318	feat: abstract base class for model backends Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-04 09:29:35 +02:00
tlg	813bbe0ad0	fix: VRAM eviction cascades through all tiers for large LLM loads. The original eviction logic blocked ASR eviction even when an LLM genuinely needed all 16GB VRAM (e.g., gpt-oss-20b at 13GB). Now uses two-pass eviction: first evicts lower/same priority, then cascades to higher priority as last resort. Added tests for ASR-survives and full-cascade scenarios. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-04 09:22:14 +02:00
tlg	d7a091df8c	feat: VRAM manager with priority-based model eviction. Tracks GPU VRAM usage (16GB) and handles model loading/unloading with priority-based eviction: LLM (lowest) -> TTS -> ASR (highest, protected). Uses asyncio Lock for concurrency safety. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-04 09:14:41 +02:00
tlg	969bcb3292	feat: API key authentication dependency. Implements the create_api_key_dependency() FastAPI dependency, which validates Bearer tokens against a configured list of ApiKey objects (401 on missing, malformed, or unknown tokens). Includes 5 TDD tests covering all cases. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-04 07:31:30 +02:00
tlg	c4eaf5088b	feat: model registry with virtual-to-physical resolution Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-04 07:31:10 +02:00
tlg	690ad46d88	feat: config loading for models.yaml and api_keys.yaml Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-04 07:30:13 +02:00
tlg	a64f32b590	feat: project scaffolding with config files and test fixtures Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-04 07:23:14 +02:00

16 Commits
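
Note on commits d7a091df8c and 813bbe0ad0: the two-pass eviction they describe could look roughly like the sketch below. This is not the repository's code. The names Priority, LoadedModel and plan_eviction are hypothetical, and the per-model sizes in the usage example are made up; only the priority order (LLM lowest, then TTS, then ASR) and the 16GB budget come from the commit messages. The asyncio Lock mentioned in d7a091df8c is left out to keep the sketch short.

```python
from dataclasses import dataclass
from enum import IntEnum
from typing import List, Optional


class Priority(IntEnum):
    """Eviction order taken from the commit message: lowest value goes first."""
    LLM = 0
    TTS = 1
    ASR = 2


@dataclass
class LoadedModel:
    # Hypothetical record of a model currently resident in VRAM.
    name: str
    priority: Priority
    vram_gb: float


def plan_eviction(
    loaded: List[LoadedModel],
    incoming_priority: Priority,
    needed_gb: float,
    total_gb: float = 16.0,
) -> Optional[List[str]]:
    """Return the names to evict so that needed_gb fits, or None if it never can."""
    if needed_gb > total_gb:
        return None

    free = total_gb - sum(m.vram_gb for m in loaded)
    ordered = sorted(loaded, key=lambda m: m.priority)

    # Pass 1: models with lower or equal priority than the incoming one.
    first_pass = [m for m in ordered if m.priority <= incoming_priority]
    # Pass 2: higher-priority models, touched only as a last resort.
    second_pass = [m for m in ordered if m.priority > incoming_priority]

    victims: List[str] = []
    for model in first_pass + second_pass:
        if free >= needed_gb:
            break
        victims.append(model.name)
        free += model.vram_gb
    return victims
```

A usage example mirroring the two test scenarios named in 813bbe0ad0 (sizes are illustrative):

```python
resident = [
    LoadedModel("asr", Priority.ASR, 4.0),
    LoadedModel("tts", Priority.TTS, 3.0),
    LoadedModel("old-llm", Priority.LLM, 8.0),
]
# ASR survives: replacing the old LLM frees enough room on its own.
print(plan_eviction(resident, Priority.LLM, 9.0))

# Full cascade: a 13GB LLM with only TTS and ASR resident evicts both.
print(plan_eviction(resident[:2], Priority.LLM, 13.0))
```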