DesTEngSsv006_swd

tlg/DesTEngSsv006_swd



Commit Graph

tlg

813bbe0ad0

fix: VRAM eviction cascades through all tiers for large LLM loads

The original eviction logic blocked ASR eviction even when an LLM
genuinely needed all 16GB VRAM (e.g., gpt-oss-20b at 13GB). Now uses
two-pass eviction: first evicts lower/same priority, then cascades to
higher priority as last resort. Added tests for ASR-survives and
full-cascade scenarios.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

2026-04-04 09:22:14 +02:00
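The two-pass rule in this commit message can be illustrated with a small planning function. This is a minimal sketch, not the repository's code. `Priority`, `LoadedModel`, `plan_eviction`, the model names, and every size except the 16 GB budget and the 13 GB gpt-oss-20b figure (both from the message) are hypothetical. Sizes are whole GB so the arithmetic stays exact.

```python
from dataclasses import dataclass
from enum import IntEnum


class Priority(IntEnum):
    # Tier order from the commit log: LLM (lowest) -> TTS -> ASR (highest).
    LLM = 0
    TTS = 1
    ASR = 2


@dataclass(frozen=True)
class LoadedModel:
    name: str
    priority: Priority
    size_gb: int  # hypothetical: whole GB keeps the arithmetic exact in this sketch


def plan_eviction(
    loaded: list[LoadedModel], capacity_gb: int, priority: Priority, size_gb: int
) -> list[str]:
    """Return the names to evict so a new model of `size_gb` fits in `capacity_gb`.

    Pass 1 considers only models at the same or lower priority than the request;
    pass 2 cascades to higher-priority models as a last resort. Within each pass,
    lower-priority models are evicted first (ties keep load order).
    """
    if size_gb > capacity_gb:
        raise ValueError(f"{size_gb} GB exceeds the {capacity_gb} GB VRAM budget")
    free = capacity_gb - sum(m.size_gb for m in loaded)
    by_priority = sorted(loaded, key=lambda m: m.priority)  # stable sort
    passes = (
        [m for m in by_priority if m.priority <= priority],  # pass 1: lower/same
        [m for m in by_priority if m.priority > priority],  # pass 2: cascade upward
    )
    evicted: list[str] = []
    for candidates in passes:
        for model in candidates:
            if free >= size_gb:
                return evicted
            evicted.append(model.name)
            free += model.size_gb
    # Evicting everything frees the whole budget, which the size check guarantees is enough.
    return evicted
```

With hypothetical residents of 2 GB (ASR), 3 GB (TTS) and a 6 GB smaller LLM, loading the 13 GB model evicts the smaller LLM and then TTS, and ASR survives. If the ASR model is 4 GB instead, the cascade has to reach ASR as well.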

tlg

d7a091df8c

feat: VRAM manager with priority-based model eviction

Tracks GPU VRAM usage (16GB) and handles model loading/unloading with
priority-based eviction: LLM (lowest) -> TTS -> ASR (highest, protected).
Uses asyncio Lock for concurrency safety.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

2026-04-04 09:14:41 +02:00
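A manager matching this description might look like the sketch below. It guards load and unload with a single `asyncio.Lock` and never picks an ASR model as an eviction victim. The class and method names, the whole-GB accounting, and raising `MemoryError` when a model cannot fit are assumptions, not the repository's API.

```python
import asyncio
from enum import IntEnum


class Priority(IntEnum):
    LLM = 0  # evicted first
    TTS = 1
    ASR = 2  # protected: never chosen as an eviction victim in this sketch


class VramManager:
    """Hypothetical sketch of a priority-based VRAM manager (sizes in whole GB)."""

    def __init__(self, capacity_gb: int = 16) -> None:
        self.capacity_gb = capacity_gb
        # name -> (priority, size_gb); dicts preserve load order
        self.loaded: dict[str, tuple[Priority, int]] = {}
        # One lock serializes load/unload so concurrent tasks never race on the bookkeeping.
        self._lock = asyncio.Lock()

    @property
    def used_gb(self) -> int:
        return sum(size for _, size in self.loaded.values())

    async def load(self, name: str, priority: Priority, size_gb: int) -> list[str]:
        """Load `name`, evicting non-ASR models (LLM before TTS) if needed; return evicted names."""
        async with self._lock:
            if name in self.loaded:
                return []  # already resident
            # Non-ASR models, lowest priority first; sorted() is stable, so ties keep load order.
            candidates = sorted(
                (n for n, (p, _) in self.loaded.items() if p != Priority.ASR),
                key=lambda n: self.loaded[n][0],
            )
            free = self.capacity_gb - self.used_gb
            victims: list[str] = []
            for victim in candidates:
                if free >= size_gb:
                    break
                victims.append(victim)
                free += self.loaded[victim][1]
            if free < size_gb:
                # Nothing has been evicted yet, so state stays consistent on failure.
                raise MemoryError(f"cannot fit {name} ({size_gb} GB): ASR models are protected")
            for victim in victims:
                del self.loaded[victim]  # a real manager would also release the GPU weights here
            self.loaded[name] = (priority, size_gb)
            return victims

    async def unload(self, name: str) -> None:
        async with self._lock:
            self.loaded.pop(name, None)
```

With a hypothetical 4 GB ASR model, a 3 GB TTS model and an 8 GB LLM resident, a 13 GB gpt-oss-20b load raises `MemoryError`: evicting the LLM and TTS frees only 12 GB, and ASR is protected. That is the situation commit 813bbe0ad0 addresses.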

2 Commits