Model Gallery

5 models from 1 repository

qwopus3.5-9b-coder-mtp
# Qwopus3.5-9B-v3.5

## Model Overview & v3.5 Design

Qwopus3.5-9B-v3.5 is a **data-scaled continuation** of the Qwopus3.5-9B-v3 model. The training data in v3.5 is expanded to cover a broader range of domains, including mathematics, programming, puzzle-solving, multilingual dialogue, instruction-following, multi-turn interactions, and STEM-related tasks.

Qwopus3.5-9B-v3.5 is a reasoning-enhanced model based on **Qwen3.5-9B**, designed for:

- Structured reasoning
- Tool-augmented workflows
- Multi-step agentic tasks
- Token-efficient inference

Compared with Qwopus3.5-9B-v3, **v3.5 does not introduce a new architecture, RL stage, or template redesign**. This version is trained with approximately **2× more SFT data**.

## Motivation & Generalization Insight

The motivation behind v3.5 comes from a simple observation:

> This work is motivated by the hypothesis that scaling high-quality SFT data may further enhance the generalization ability of large language models.

In earlier Qwopus3.5 experiments, structured reasoning was observed to improve both **accuracy and efficiency**: ...

Repository: localai
License: apache-2.0

qwopus3.6-27b-v2-mtp
๐Ÿช Qwopus3.6-27B-v2-MTP MTP Release Multi-Token Prediction reasoning model fine-tuned from Qwen3.6-27B ๐Ÿงฌ Trace Inversion & Negentropy ๐Ÿง  27B Parameters โšก Speculative Decoding ๐Ÿ› ๏ธ Coding / DevOps / Math ๐Ÿ’ก What is Qwopus3.6-27B-v2-MTP? ๐Ÿช Qwopus3.6-27B-v2-MTP is a speed-oriented reasoning release built on top of Qwen3.6-27B. It keeps the Qwopus line's focus on reconstructed reasoning traces, coding discipline, DevOps procedures, and mathematical derivations, while adding Multi-Token Prediction for faster generation. The goal is simple: preserve the depth and structure of a 27B reasoning model while making real interactive use noticeably faster. โšก MTP DecodingAuxiliary future-token prediction improves throughput on long reasoning, code, math, and strict-format prompts. ๐Ÿงฉ Structured ReasoningInherits the Qwopus training recipe built around reconstructed step-by-step reasoning trajectories. ๐Ÿงช GB10 TestedValidated on a 30-question local benchmark across Logic, Coding, DevOps, Math, and Edge tasks. ๐Ÿš€ Practical SpeedDesigned for workflows where strong answers matter, but waiting several extra minutes per task does not. ...

Repository: localai
License: apache-2.0

gemma-4-e2b-it:sglang-mtp
Google Gemma 4 E2B-IT served by SGLang with Multi-Token Prediction (MTP) speculative decoding. The companion drafter google/gemma-4-E2B-it-assistant lets the target accept several tokens per step. Flags are a 1:1 transcription of the SGLang cookbook's MTP command (NEXTN algorithm, num_steps=5, num_draft_tokens=6, eagle_topk=1, mem_fraction_static=0.85). The E2B variant has 5B total / 2B effective parameters and targets the smaller end of consumer GPUs.

Repository: localai
License: gemma
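
The MTP settings quoted for gemma-4-e2b-it:sglang-mtp can be sketched as a launch command. This is a minimal illustration, not the gallery's actual configuration: the flag names are SGLang's usual server arguments for speculative decoding as best understood here, and the target model ID `google/gemma-4-E2B-it` is an assumption (only the drafter ID appears in the entry).

```python
import shlex


def sglang_mtp_command(target_model: str, draft_model: str) -> list[str]:
    """Build an SGLang launch command using the MTP values quoted in the entry."""
    return [
        "python", "-m", "sglang.launch_server",
        "--model-path", target_model,
        "--speculative-algorithm", "NEXTN",
        "--speculative-draft-model-path", draft_model,
        "--speculative-num-steps", "5",
        "--speculative-num-draft-tokens", "6",
        "--speculative-eagle-topk", "1",
        "--mem-fraction-static", "0.85",
    ]


# Assumption: the target model ID is inferred from the drafter name.
cmd = sglang_mtp_command(
    "google/gemma-4-E2B-it",
    "google/gemma-4-E2B-it-assistant",
)
print(shlex.join(cmd))
```

The same helper would cover the E4B entry by swapping in the E4B target and its `google/gemma-4-E4B-it-assistant` drafter, since both entries list identical flag values.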

gemma-4-e4b-it:sglang-mtp
Google Gemma 4 E4B-IT served by SGLang with Multi-Token Prediction (MTP) speculative decoding. The companion drafter google/gemma-4-E4B-it-assistant lets the target accept several tokens per step. Flags are a 1:1 transcription of the SGLang cookbook's MTP command (NEXTN algorithm, num_steps=5, num_draft_tokens=6, eagle_topk=1, mem_fraction_static=0.85). The E4B variant has 8B total / 4B effective parameters, making it the natural pick for consumer GPUs in the 16–24 GB range.

Repository: localai
License: gemma

mimo-7b-mtp:sglang
Xiaomi MiMo-7B-RL served by SGLang with built-in Multi-Token Prediction (MTP) heads (no separate drafter needed) plus online fp8 weight quantization to fit on a 16 GB consumer GPU. The model card reports ~90% acceptance. Verified end-to-end at ~88 tok/s on an RTX 5070 Ti (16 GB). Note: mem_fraction_static is lowered to 0.7 (vs SGLang's 0.85 default) because the MTP draft worker's vocab embedding is loaded unquantised (~1.2 GiB), which otherwise causes an out-of-memory error in the static reservation.

Repository: localai
License: mit
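
The mem_fraction_static note for mimo-7b-mtp:sglang comes down to simple arithmetic, sketched below. This is a deliberately simplified model: it assumes the full 16 GB is available as 16 GiB and treats the ~1.2 GiB draft embedding as competing for whatever lies outside the static reservation. SGLang's real allocator is more involved, so the numbers are only indicative.

```python
def static_budget_gib(total_gib: float, mem_fraction_static: float) -> tuple[float, float]:
    """Return (static reservation, remaining headroom) in GiB."""
    static = total_gib * mem_fraction_static
    return static, total_gib - static


TOTAL_GIB = 16.0       # assumption: the whole card is visible to the server
DRAFT_EMBED_GIB = 1.2  # unquantised draft vocab embedding, per the entry

for fraction in (0.85, 0.7):
    static, headroom = static_budget_gib(TOTAL_GIB, fraction)
    print(
        f"mem_fraction_static={fraction}: "
        f"static={static:.1f} GiB, headroom={headroom:.1f} GiB, "
        f"after draft embedding={headroom - DRAFT_EMBED_GIB:.1f} GiB"
    )
```

Under these assumptions, lowering the fraction from 0.85 to 0.7 doubles the memory left outside the static reservation, which is the slack the entry says the draft worker needs.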