8 Commits

Author SHA256 Message Date
tlg
da35e94b16 fix: add triton kernels for MXFP4, fix GGUF KV cache quantization
- Add 'kernels' package to Dockerfile for native MXFP4 execution
  (fixes gpt-oss-20b OOM: 15.2GB→13.5GB)
- Reduce GGUF n_ctx from 8192 to 4096 and quantize KV cache to Q8_0
  to reduce VRAM usage
- Use GGML_TYPE_Q8_0 constant instead of string for type_k/type_v

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-05 22:49:16 +02:00
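
As a rough illustration of why the n_ctx and KV cache changes in da35e94b16 reduce VRAM usage, the sketch below estimates the KV cache size before and after. It assumes an F16 cache as the baseline and a hypothetical model shape (32 layers, 8 KV heads, head dimension 128) that is not taken from the commit; only the two context lengths and the Q8_0 type come from the message. Q8_0 stores each block of 32 values as 32 int8 values plus one fp16 scale, which works out to 34 bytes per 32 elements.

```python
def kv_cache_bytes(n_layers, n_ctx, n_kv_heads, head_dim, bytes_per_element):
    """Estimate KV cache size: K and V each hold n_ctx * n_kv_heads * head_dim values per layer."""
    return 2 * n_layers * n_ctx * n_kv_heads * head_dim * bytes_per_element


F16_BYTES = 2.0        # assumed baseline cache type
Q8_0_BYTES = 34 / 32   # 32 int8 values + one fp16 scale per 32-element block

# Hypothetical model shape, chosen only for illustration.
dims = dict(n_layers=32, n_kv_heads=8, head_dim=128)

before = kv_cache_bytes(n_ctx=8192, bytes_per_element=F16_BYTES, **dims)
after = kv_cache_bytes(n_ctx=4096, bytes_per_element=Q8_0_BYTES, **dims)

print(f"before: {before / 2**20:.0f} MiB")
print(f"after:  {after / 2**20:.0f} MiB")
print(f"ratio:  {after / before:.3f}")
```

Under these assumed dimensions, halving the context accounts for a factor of 2 and the Q8_0 cache type for most of the remainder.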
tlg
aa7a160118 fix: proper VRAM cleanup on model unload + CUDA alloc config
- Force gc.collect() before torch.cuda.empty_cache() to ensure all
  model references are released
- Set PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True in container

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-05 17:59:23 +02:00
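
The unload order described in aa7a160118 can be sketched as follows. This is a minimal sketch, not the repository's code: `ModelHolder` and `release_vram` are hypothetical names, and the PyTorch import is optional so the example also runs where PyTorch is not installed. The point is the ordering: drop the reference, run the garbage collector so reference cycles are broken, and only then ask PyTorch to release its cached CUDA blocks.

```python
import gc

try:
    import torch
except ImportError:  # lets the sketch run without PyTorch installed
    torch = None


def release_vram() -> int:
    """Collect garbage first, then release cached CUDA memory if CUDA is available.

    Returns the number of unreachable objects found by the collector.
    """
    collected = gc.collect()
    if torch is not None and torch.cuda.is_available():
        torch.cuda.empty_cache()
    return collected


class ModelHolder:
    """Hypothetical owner of a loaded model."""

    def __init__(self):
        self.model = None

    def unload(self) -> None:
        # Drop our reference before collecting; otherwise the model stays alive.
        self.model = None
        release_vram()
```

Calling `torch.cuda.empty_cache()` without the preceding `gc.collect()` can leave memory allocated when the model is still reachable through a reference cycle, since the allocator only frees blocks that no live tensor is using.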
tlg
d3285bad8a fix: add accelerate package for transformers device_map support
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-05 17:19:17 +02:00
tlg
f2f73d204c fix: Dockerfile multi-stage build with working dependency resolution
- Multi-stage: devel image builds llama-cpp-python with CUDA, runtime
  image gets the compiled library via COPY
- chatterbox-tts installed --no-deps to prevent torch 2.6 downgrade
- librosa and diskcache added as explicit chatterbox/llama-cpp deps
- All imports verified with GPU passthrough

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-05 15:46:34 +02:00
tlg
d6a3fe5427 fix: Dockerfile uses explicit pip install, skip pre-installed packages
Removed librosa (unused) as well as torch and pyyaml (already in the
base image) from the install list. This avoids a numpy rebuild conflict.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-05 14:10:07 +02:00
tlg
8816a06369 fix: add --break-system-packages for pip in container
The PyTorch base image ships a PEP 668 externally managed Python, so
pip refuses system-wide installs without this flag.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-05 14:07:14 +02:00
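
For context on 8816a06369: PEP 668 marks an environment as externally managed by placing a file named `EXTERNALLY-MANAGED` in the interpreter's stdlib directory. A minimal sketch of how one might check for that marker is shown below; the function name and the `stdlib_dir` override (useful for testing) are illustrative and not part of the repository.

```python
import sysconfig
from pathlib import Path
from typing import Optional


def is_externally_managed(stdlib_dir: Optional[str] = None) -> bool:
    """Return True if the PEP 668 marker file exists in the stdlib directory.

    stdlib_dir defaults to the running interpreter's stdlib path.
    """
    base = Path(stdlib_dir or sysconfig.get_path("stdlib"))
    return (base / "EXTERNALLY-MANAGED").is_file()


if __name__ == "__main__":
    print("externally managed:", is_externally_managed())
```

When this returns True, a plain `pip install` into the system environment is rejected unless `--break-system-packages` is passed or a virtual environment is used.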
tlg
d5a98879c9 fix: use full Docker Hub registry path in Dockerfile
Podman requires the docker.io/ prefix when unqualified-search-registries
is not configured.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-05 13:04:53 +02:00
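
To illustrate what "full registry path" means in d5a98879c9, the sketch below expands a short image name the way Docker Hub names are conventionally qualified. It is an illustration only: `qualify_image` is a hypothetical helper, the image names in the usage lines are examples rather than the ones used in the Dockerfile, and Podman itself does not apply this expansion when no search registries are configured, which is why the prefix has to be written out.

```python
def qualify_image(name: str) -> str:
    """Expand a short image name to a fully qualified Docker Hub reference.

    A name is treated as already qualified when the part before the first
    slash looks like a registry host (contains '.' or ':' or is 'localhost').
    """
    first, sep, _rest = name.partition("/")
    has_registry = bool(sep) and ("." in first or ":" in first or first == "localhost")
    if has_registry:
        return name
    if not sep:
        # Single-component names live under the 'library' namespace on Docker Hub.
        return f"docker.io/library/{name}"
    return f"docker.io/{name}"


if __name__ == "__main__":
    for example in ("pytorch/pytorch:latest", "ubuntu:22.04", "quay.io/org/img"):
        print(example, "->", qualify_image(example))
```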
tlg
1a26d34ea5 feat: Dockerfile, model download script, and pod creation script
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-05 10:09:34 +02:00