Added three prompt files

2026-04-30 12:30:14 +02:00
parent f004d00c28
commit ded8d3c5fb
3 changed files with 269 additions and 0 deletions
--- a/AI-Models_Research.md
+++ b/AI-Models_Research.md
@@ -0,0 +1,78 @@
+# AI models research
+
+## Quantization impact on Qwen 27B
+
+### My on-premise setup
+
+Just as a side note: I run an NVIDIA RTX 5070 Ti with 16 GB of VRAM, and its Blackwell architecture allows performance improvements with 4-bit quantized AI models.
+
+### Motivation
+
+I want to find out how much quantization degrades intelligence and improves speed, based on real-world reports and comparisons.
+
+### AI model
+
+The model I would like to see compared is 'Qwen3.6-27B', but since this model is fairly new, comparisons of 'Qwen3.5-27B' would also be OK.
+I am interested in both reasoning / thinking mode and instruct mode.
+
+### Quantizations
+
+I want comparisons between the original 16-bit model weights, 8-bit quantization, and (most important) 4-bit quantization.
+
+### Your task
+
+Please perform deep research to find the requested experience reports and comparisons.
+
+### Ask questions first
+
+Before starting, ask me between 2 and 5 questions to completely understand the situation and your task.
+
+---
+
+Instruction following, terminal coding, logic reasoning
+
+Only Qwen models with 27B parameters or fewer
+
+---
+
+My on-premise setup was provided only as a side note. No need to take it into account for the deep research.
+
+To 1.: No model offloading at all.
+To 2.: My inference framework plans are not relevant for the deep research.
+To 3.: Instruction following, terminal coding, logic reasoning.
+To 4.: Comparisons with FP4 would be great, yes, try to find such reports.
+
+---
+
+You are wrong. Here are the Hugging Face model pages showing that the models exist (they were evidently released after your knowledge cutoff date):
+- [Qwen/Qwen3.6-27B](https://huggingface.co/Qwen/Qwen3.6-27B)
+- [Qwen/Qwen3.5-27B](https://huggingface.co/Qwen/Qwen3.5-27B)
+
+---
+
+## To 1. **Precision formats you care about most**
+
+- With “16‑bit”, I mean the original Qwen model which has BF16 tensors.
+- With “8‑bit”, I mean all of GPTQ‑Int8, AWQ‑Int8, and GGUF‑Q8_0.
+- With “4‑bit”, I mean all of GPTQ‑Int4, AWQ‑Int4, GGUF‑Q4_K_M or other variants like NVFP4.
+
+I am open to whichever 4‑bit/8‑bit quantization is best‑studied for Qwen 27B models.
+
+## To 2. **Workload focus: reasoning vs coding vs general chat**
+
+I care most about these two: *reasoning* and *code generation / debugging*.
+
+## To 3. **Metric priorities**
+
+For “intelligence loss”, I want *standard eval scores* and *task‑specific pass‑rates*.
+For “speed”, I care about both *first‑token latency* and *throughput*.
+
+## To 4. **Inference stack hints**
+
+For the deep research, my plans for the inference stack are not relevant. Reports on any stack are of interest and might influence my inference stack preference.
+
+## To 5. **Local‑only vs “cloud‑style” scores**
+
+I'm also okay with *multi‑GPU BF16 numbers* that illustrate the “ceiling” of un‑quantized performance.
+
+---