DesTEngSsv006_swd

SHA256

tlg 813bbe0ad0 fix: VRAM eviction cascades through all tiers for large LLM loads

The original eviction logic refused to evict ASR even when an LLM
genuinely needed all 16 GB of VRAM (e.g., gpt-oss-20b at 13 GB). Eviction
now runs in two passes: the first evicts models of lower or equal
priority, and the second cascades to higher-priority models as a last
resort. Added tests for the ASR-survives and full-cascade scenarios.
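
As an illustration of the two-pass idea described in this message, here is a minimal sketch in Python. It is not the code from this commit: the `LoadedModel` type, the `plan_eviction` function, the numeric priority values, and the model sizes are all hypothetical, chosen only to show how a lower/same-priority pass followed by a last-resort higher-priority pass could behave.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class LoadedModel:
    name: str
    vram_gb: float
    priority: int  # hypothetical scale: higher means "keep resident longer"


def plan_eviction(
    loaded: List[LoadedModel],
    incoming: LoadedModel,
    total_vram_gb: float,
) -> Optional[List[str]]:
    """Return the names to evict so `incoming` fits, or None if it cannot fit."""
    free = total_vram_gb - sum(m.vram_gb for m in loaded)
    victims: List[str] = []
    if free >= incoming.vram_gb:
        return victims  # already fits, nothing to evict

    # Pass 1: models with lower or equal priority, least important first.
    first_pass = sorted(
        (m for m in loaded if m.priority <= incoming.priority),
        key=lambda m: m.priority,
    )
    # Pass 2 (last resort): higher-priority models, least important first.
    second_pass = sorted(
        (m for m in loaded if m.priority > incoming.priority),
        key=lambda m: m.priority,
    )

    for model in first_pass + second_pass:
        victims.append(model.name)
        free += model.vram_gb
        if free >= incoming.vram_gb:
            return victims
    return None  # even evicting everything is not enough


# Hypothetical tiers on a 16 GB card: ASR highest, then TTS, then LLMs.
loaded = [
    LoadedModel("asr", 2.0, priority=3),
    LoadedModel("tts", 3.0, priority=2),
    LoadedModel("llm-small", 8.0, priority=1),
]

# ASR survives: evicting the same-priority LLM is enough.
print(plan_eviction(loaded, LoadedModel("llm-medium", 10.0, priority=1), 16.0))

# Full cascade: a very large load pushes eviction through every tier.
print(plan_eviction(loaded, LoadedModel("llm-huge", 15.0, priority=1), 16.0))
```

In this sketch the higher tiers are only touched after every lower or equal tier has been exhausted, which matches the "last resort" wording above; the real implementation may order or batch evictions differently.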

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

2026-04-04 09:22:14 +02:00

kischdle/llmux  fix: VRAM eviction cascades through all tiers for large LLM loads  2026-04-04 09:22:14 +02:00
.gitignore  Initial commit with .gitignore  2026-03-31 17:58:54 +02:00