Tracks GPU VRAM usage (16GB) and handles model loading/unloading with
priority-based eviction: LLM (lowest) -> TTS -> ASR (highest, protected).
Uses asyncio Lock for concurrency safety.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Implements create_api_key_dependency() FastAPI dependency that validates
Bearer tokens against a configured list of ApiKey objects (401 on missing,
malformed, or unknown tokens). Includes 5 TDD tests covering all cases.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>