- Thinking/Instruct toggle via Jinja template patching in the llama-cpp backend: creates separate handlers for thinking-enabled and thinking-disabled modes
- Replace lovedheart/Qwen3.5-9B-FP8 (safetensors, 15.8 GB, OOM) with unsloth/Qwen3.5-9B-GGUF Q8_0 (9.2 GB, fits)
- Enable flash_attn in llama-cpp for better performance
- GGUF path resolution falls back to a flat gguf/ directory

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
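The GGUF fallback could be sketched roughly as follows. This is an illustrative helper, not the backend's actual code; the function name `resolve_gguf_path` and the exact lookup order are assumptions.

```python
from pathlib import Path


def resolve_gguf_path(repo_dir: str, filename: str) -> Path:
    """Resolve a GGUF file path, falling back to a flat gguf/ directory.

    Hypothetical sketch of the fallback described in the commit: try the
    expected relative path first, then look for the bare filename under
    a flat gguf/ subdirectory of the repo root.
    """
    base = Path(repo_dir)
    candidate = base / filename
    if candidate.is_file():
        return candidate
    # Fall back to a flat gguf/ directory containing just the file name.
    fallback = base / "gguf" / Path(filename).name
    if fallback.is_file():
        return fallback
    raise FileNotFoundError(f"{filename} not found under {repo_dir}")
```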