LocalAI - Models

kat-coder-v2.5-dev

KAT-Coder-V2.5-Dev is an Apache-2.0 agentic coding model from Kwaipilot, post-trained from Qwen3.6-35B-A3B. It has 35 billion total parameters with 3 billion activated per token, a 262K-token context window, and text-only weights tuned for repository-level coding and tool use. This entry offers standard Q4_K_M and Q8_0 GGUF quantizations alongside APEX mixed-precision variants with quality, balanced, compact, and mini profiles. The APEX I-profiles use importance-matrix calibration.

Links

Tags

kat-coder-v2.5-dev-q8

KAT-Coder-V2.5-Dev is an Apache-2.0 agentic coding model from Kwaipilot, post-trained from Qwen3.6-35B-A3B. It has 35 billion total parameters with 3 billion activated per token, a 262K-token context window, and text-only weights tuned for repository-level coding and tool use. This entry uses the higher-quality Q8_0 GGUF quantization.

Links

Tags

kat-coder-v2.5-dev-apex-i-quality

KAT-Coder-V2.5-Dev is an Apache-2.0 agentic coding model from Kwaipilot, post-trained from Qwen3.6-35B-A3B. It has 35 billion total parameters with 3 billion activated per token, a 262K-token context window, and text-only weights tuned for repository-level coding and tool use. This entry offers standard Q4_K_M and Q8_0 GGUF quantizations alongside APEX mixed-precision variants with quality, balanced, compact, and mini profiles. The APEX I-profiles use importance-matrix calibration.

Links

Tags

kat-coder-v2.5-dev-apex-i-balanced

KAT-Coder-V2.5-Dev is an Apache-2.0 agentic coding model from Kwaipilot, post-trained from Qwen3.6-35B-A3B. It has 35 billion total parameters with 3 billion activated per token, a 262K-token context window, and text-only weights tuned for repository-level coding and tool use. This entry offers standard Q4_K_M and Q8_0 GGUF quantizations alongside APEX mixed-precision variants with quality, balanced, compact, and mini profiles. The APEX I-profiles use importance-matrix calibration.

Links

Tags

kat-coder-v2.5-dev-apex-i-compact

KAT-Coder-V2.5-Dev is an Apache-2.0 agentic coding model from Kwaipilot, post-trained from Qwen3.6-35B-A3B. It has 35 billion total parameters with 3 billion activated per token, a 262K-token context window, and text-only weights tuned for repository-level coding and tool use. This entry offers standard Q4_K_M and Q8_0 GGUF quantizations alongside APEX mixed-precision variants with quality, balanced, compact, and mini profiles. The APEX I-profiles use importance-matrix calibration.

Links

Tags

kat-coder-v2.5-dev-apex-i-mini

KAT-Coder-V2.5-Dev is an Apache-2.0 agentic coding model from Kwaipilot, post-trained from Qwen3.6-35B-A3B. It has 35 billion total parameters with 3 billion activated per token, a 262K-token context window, and text-only weights tuned for repository-level coding and tool use. This entry offers standard Q4_K_M and Q8_0 GGUF quantizations alongside APEX mixed-precision variants with quality, balanced, compact, and mini profiles. The APEX I-profiles use importance-matrix calibration.

Links

Tags

kat-coder-v2.5-dev-apex-quality

KAT-Coder-V2.5-Dev is an Apache-2.0 agentic coding model from Kwaipilot, post-trained from Qwen3.6-35B-A3B. It has 35 billion total parameters with 3 billion activated per token, a 262K-token context window, and text-only weights tuned for repository-level coding and tool use. This entry offers standard Q4_K_M and Q8_0 GGUF quantizations alongside APEX mixed-precision variants with quality, balanced, compact, and mini profiles. The APEX I-profiles use importance-matrix calibration.

Links

Tags

kat-coder-v2.5-dev-apex-balanced

KAT-Coder-V2.5-Dev is an Apache-2.0 agentic coding model from Kwaipilot, post-trained from Qwen3.6-35B-A3B. It has 35 billion total parameters with 3 billion activated per token, a 262K-token context window, and text-only weights tuned for repository-level coding and tool use. This entry offers standard Q4_K_M and Q8_0 GGUF quantizations alongside APEX mixed-precision variants with quality, balanced, compact, and mini profiles. The APEX I-profiles use importance-matrix calibration.

Links

Tags

kat-coder-v2.5-dev-apex-compact

KAT-Coder-V2.5-Dev is an Apache-2.0 agentic coding model from Kwaipilot, post-trained from Qwen3.6-35B-A3B. It has 35 billion total parameters with 3 billion activated per token, a 262K-token context window, and text-only weights tuned for repository-level coding and tool use. This entry offers standard Q4_K_M and Q8_0 GGUF quantizations alongside APEX mixed-precision variants with quality, balanced, compact, and mini profiles. The APEX I-profiles use importance-matrix calibration.

Links

Tags

qwopus3.6-35b-a3b-coder-mtp

# 🌟 Qwopus3.6-35B-A3B-v1 ## 💡 Base Model Overview **Qwen3.6-35B-A3B** is an advanced hybrid sparse MoE (Mixture-of-Experts) model developed by Alibaba Cloud. It features 35B total parameters with only 3B active parameters per token, ensuring high inference efficiency. Architecturally, it combines Gated DeltaNet linear attention with standard gated attention layers, routing tokens across **256 experts**. It natively supports a massive **262k context window** and is specifically designed for high-performance agentic coding, deep reasoning, and multimodal tasks. ## 🚀 Model Refinement & Logic Tuning （Qwopus3.6-35B-A3B-v1） 🪐**Qwopus3.6-35B-A3B-v1** is a reasoning-enhanced MoE (Mixture of Experts) model fine-tuned on top of **Qwen3.6-35B-A3B**. ### 🛠 Training Strategy The fine-tuning process for this model is structured into **three distinct stages of distributed SFT (Supervised Fine-Tuning)**, progressively scaling reasoning complexity and data diversity. This systematic approach ensures the model inherits the base MoE capabilities while sharpening its logic-handling depth. ...

Links

https://huggingface.co/Jackrong/Qwopus3.6-35B-A3B-Coder-MTP-GGUF

Tags

qwopus3.6-27b-coder-compat-mtp

🪐 Qwopus-3.6-27B-Coder Coder SFT Release Agentic Coding & Tool-Use Reasoning Model Fine-Tuned on Qwopus3.6-27B-v2 🧬 Trace Inversion & Negentropy 🧠 27B Dense Model ⚡ Agentic Coding 🛠️ Tool Calling & Agent 🏆 SWE-bench Verified: 67.0% (off-thinking) 💡 What is Qwopus-3.6-27B-Coder? 🪐 Qwopus-3.6-27B-Coder is a reasoning-enhanced agentic coding model built on top of Qwopus3.6-27B-v2. It inherits the powerful reasoning foundation of the v2 base — which achieved 87.43% MMLU-Pro and 75.25% SWE-bench Verified — and further specializes it for agentic code generation, structured tool calling, debugging, and instruction-following in developer workflows. The model is designed to excel at repository-level coding tasks, multi-turn tool orchestration, and complex logical reasoning under realistic agent environments. 🧩 Agentic Coding Optimized for repository-level coding, debugging, patch generation, and structured multi-step development workflows. 🛠️ Tool Calling Learns from real agent trajectories with tool definitions, tool calls, and environment feedback for robust multi-turn execution. ...

Links

https://huggingface.co/Jackrong/Qwopus3.6-27B-Coder-Compat-MTP-GGUF

Tags

qwopus3.6-27b-coder-mtp-nvfp4

🪐 Qwopus-3.6-27B-Coder Coder SFT Release Agentic Coding & Tool-Use Reasoning Model Fine-Tuned on Qwopus3.6-27B-v2 🧬 Trace Inversion & Negentropy 🧠 27B Dense Model ⚡ Agentic Coding 🛠️ Tool Calling & Agent 🏆 SWE-bench Verified: 67.0% (off-thinking) 💡 What is Qwopus-3.6-27B-Coder? 🪐 Qwopus-3.6-27B-Coder is a reasoning-enhanced agentic coding model built on top of Qwopus3.6-27B-v2. It inherits the powerful reasoning foundation of the v2 base — which achieved 87.43% MMLU-Pro (300ex) and 75.25% SWE-bench Verified — and further specializes it for agentic code generation, structured tool calling, debugging, and instruction-following in developer workflows. The model is designed to excel at repository-level coding tasks, multi-turn tool orchestration, and complex logical reasoning under realistic agent environments. 🧩 Agentic Coding Optimized for repository-level coding, debugging, patch generation, and structured multi-step development workflows. 🛠️ Tool Calling Learns from real agent trajectories with tool definitions, tool calls, and environment feedback for robust multi-turn execution. ...

Links

https://huggingface.co/michaelw9999/Qwopus3.6-27B-Coder-MTP-NVFP4-GGUF

Tags

gemma-4-12b-agentic-fable5-composer2.5-v2-3.5x-tau2

Hugging Face | GitHub | Launch Blog | Documentation License: Apache 2.0 | Authors: Google DeepMind > [!Note] > This model card is for the Gemma 4 12B Unified model, which is part of the Gemma 4 family of open models. Built with the same multimodal functionality as Gemma 4 E2B and E4B (text, audio, image, and video inputs), it brings native audio and vision understanding directly to local environments without the need for separate encoders. This unified approach to multimodality makes the model encoder-free, offering a deployment size that is perfect for consumer devices and streamlined local execution. Gemma is a family of open models built by Google DeepMind. Gemma 4 models are multimodal, handling text and image input (with audio supported on E2B, E4B, and 12B) and generating text output. This release includes open-weights models in both pre-trained and instruction-tuned variants. Gemma 4 features a context window of up to 256K tokens and maintains multilingual support in over 140 languages. ...

Links

https://huggingface.co/yuxinlu1/gemma-4-12B-agentic-fable5-composer2.5-v2-3.5x-tau2-GGUF

Tags

gemma-4-12b-coder-fable5-composer2.5-v1

Hugging Face | GitHub | Launch Blog | Documentation License: Apache 2.0 | Authors: Google DeepMind > [!Note] > This model card is for the Gemma 4 12B Unified model, which is part of the Gemma 4 family of open models. Built with the same multimodal functionality as Gemma 4 E2B and E4B (text, audio, image, and video inputs), it brings native audio and vision understanding directly to local environments without the need for separate encoders. This unified approach to multimodality makes the model encoder-free, offering a deployment size that is perfect for consumer devices and streamlined local execution. Gemma is a family of open models built by Google DeepMind. Gemma 4 models are multimodal, handling text and image input (with audio supported on E2B, E4B, and 12B) and generating text output. This release includes open-weights models in both pre-trained and instruction-tuned variants. Gemma 4 features a context window of up to 256K tokens and maintains multilingual support in over 140 languages. ...

Links

https://huggingface.co/yuxinlu1/gemma-4-12B-coder-fable5-composer2.5-v1-GGUF

Tags

qwopus3.6-27b-coder-mtp

🪐 Qwopus3.6-27B-v2 SFT Release Reasoning-Enhanced Dense Language Model Fine-Tuned on Qwen3.6-27B 🧬 Trace Inversion & Negentropy 🧠 27B Parameters 🔥 3-Stage Curriculum SFT 🛠️ Vision & Tool-use Support 💡 What is Qwopus3.6-27B-v2? 🪐 Qwopus3.6-27B-v2 is a reasoning-enhanced dense language model built on top of Qwen3.6-27B. By leveraging a multi-stage curriculum learning pipeline and augmented with Trace Inversion datasets (claude-opus-4.6/4.7-traceInversion), it reverse-engineers the compressed "Reasoning Bubbles" of commercial LLMs into structured, step-by-step synthetic reasoning traces, successfully eliminating logical shortcuts and knowledge fractures. 🧩 Structured Reasoning Injects reconstructed deep CoT chains to eliminate logical shortcuts via Trace Inversion. 🪶 Style Consistency Enforces strict constraints on the format and convergence of <think> tags. 🔁 Distillation Alignment Ensures high-quality cross-source SFT data alignment to narrow the capacity gap. ⚡ RL Scalability Sets up a stable formatting pipeline optimized for downstream Reinforcement Learning (RL). ## 💡 1. Base Model, Training Library & Cooperation ...

Links

https://huggingface.co/Jackrong/Qwopus3.6-27B-Coder-MTP-GGUF

Tags

step-3.7-flash

**[ModelPage]**: https://static.stepfun.com/blog/step-3.7-flash/ ## 1. Introduction Step 3.7 Flash is a 198B-parameter sparse Mixture-of-Experts (MoE) vision-language model that combines a 196B-parameter language backbone with a 1.8B-parameter vision encoder for native image understanding. Engineered for high-frequency production workloads, it activates approximately 11B parameters per token and delivers a throughput of up to 400 tokens per second. Step 3.7 Flash supports a 256k context window and offers three selectable reasoning levels (low, medium, and high) so developers can easily balance speed, cost, and cognitive depth. We built Step 3.7 Flash for developers who need to scale agentic workflows that combine perception, search, and reasoning. It is designed to handle intensive tasks such as parsing massive financial reports in one pass, running multi-step search loops with cross-source verification, or operating concurrent coding agents in high-throughput pipelines. ## 2. Capabilities & Performance ### Multimodal Perception and Verification ...

Links

https://huggingface.co/unsloth/Step-3.7-Flash-GGUF

Tags

qwopus3.5-9b-coder-mtp

# 🌟 Qwopus3.5-9B-v3.5 ## 💡 Model Overview & v3.5 Design Qwopus3.5-9B-v3.5 is a **data-scaled continuation** of the Qwopus3.5-9B-v3 model. The training data in v3.5 is expanded to cover a broader range of domains, including mathematics, programming, puzzle-solving, multilingual dialogue, instruction-following, multi-turn interactions, and STEM-related tasks. Qwopus3.5-9B-v3.5 is a reasoning-enhanced model based on **Qwen3.5-9B**, designed for: - 🧩 Structured reasoning - 🔧 Tool-augmented workflows - 🔁 Multi-step agentic tasks - ⚡ Token-efficient inference Compared with Qwopus3.5-9B-v3, **3.5 version does not introduce a new architecture, RL stage, or template redesign**. This version is trained with approximately **2× more SFT data**. ## 🎯 Motivation & Generalization Insight The motivation behind v3.5 comes from a simple observation: > This work is motivated by the hypothesis that scaling high-quality SFT data may further enhance the generalization ability of large language models. In earlier Qwopus3.5 experiments, structured reasoning was observed to improve both **accuracy and efficiency**: ...

Links

https://huggingface.co/Jackrong/Qwopus3.5-9B-Coder-MTP-GGUF

Tags

qwen3.6-35b-a3b-claude-4.6-opus-reasoning-distilled

# 🔥 Qwen3.6-35B-A3B-Claude-4.6-Opus-Reasoning-Distilled A reasoning SFT fine-tune of `Qwen/Qwen3.6-35B-A3B` on chain-of-thought (CoT) distillation mostly sourced from Claude Opus 4.6. The goal is to preserve Qwen3.6's strong agentic coding and reasoning base while nudging the model toward structured Claude Opus-style reasoning traces and more stable long-form problem solving. The training path is text-only. The Qwen3.6 base architecture includes a vision encoder, but this fine-tuning run did not train on image or video examples. - **Developed by:** @hesamation - **Base model:** `Qwen/Qwen3.6-35B-A3B` - **License:** apache-2.0 This fine-tuning run is inspired by Jackrong/Qwen3.5-27B-Claude-4.6-Opus-Reasoning-Distilled, including the notebook/training workflow style and Claude Opus reasoning-distillation direction. [](https://x.com/Hesamation) [](https://discord.gg/vtJykN3t) ## Benchmark Results The MMLU-Pro pass used 70 total questions per model: `--limit 5` across 14 MMLU-Pro subjects. Treat this as a smoke/comparative check, not a release-quality full benchmark. ...

Links

https://huggingface.co/hesamation/Qwen3.6-35B-A3B-Claude-4.6-Opus-Reasoning-Distilled-GGUF

Tags

longcat-video-avatar-1.5

LongCat-Video-Avatar-1.5 served by LocalAI's dedicated CUDA backend. Turns speech plus a prompt into an avatar video, optionally conditioning on a portrait, and continues across multiple segments for longer audio. Avatar generation also loads tokenizer, text encoder, and VAE components from LongCat-Video. Plan for very large downloads and substantial NVIDIA GPU or unified memory; CPU and macOS execution are unsupported.

Links

Tags

magpie-tts-cpp-357m-q8_0

Magpie TTS Multilingual 357M (C++/ggml, magpie-tts.cpp), Q8_0 (~624 MB), near-lossless and the fastest decode. Native C++ text-to-speech for NVIDIA's Magpie TTS Multilingual (encoder + autoregressive decoder over NanoCodec tokens) from one self-contained GGUF (model, codec, tokenizer, G2P dictionaries). 22.05kHz mono, 5 baked voices (set `voice` to Aria, Jason, John, Leo, Sofia or 0-4), 9+ languages (en, es, de, fr, it, pt-BR, hi, vi, ko, plus Arabic variants). Parity-gated per component against the NeMo reference.

Links

Tags

qwen3-coder-next-mxfp4_moe

The model is a quantized version of **Qwen/Qwen3-Coder-Next** (base model) using the **MXFP4** quantization scheme. It is optimized for efficiency while retaining performance, suitable for deployment in applications requiring lightweight inference. The quantized version is tailored for specific tasks, with parameters like temperature=1.0 and top_p=0.95 recommended for generation.

Links

https://huggingface.co/noctrex/Qwen3-Coder-Next-MXFP4_MOE-GGUF

Tags

Model Gallery

Filter by type:

Filter by tags:

kat-coder-v2.5-dev

kat-coder-v2.5-dev-q8

kat-coder-v2.5-dev-apex-i-quality

kat-coder-v2.5-dev-apex-i-balanced

kat-coder-v2.5-dev-apex-i-compact

kat-coder-v2.5-dev-apex-i-mini

kat-coder-v2.5-dev-apex-quality

kat-coder-v2.5-dev-apex-balanced

kat-coder-v2.5-dev-apex-compact

qwopus3.6-35b-a3b-coder-mtp

qwopus3.6-27b-coder-compat-mtp

qwopus3.6-27b-coder-mtp-nvfp4

gemma-4-12b-agentic-fable5-composer2.5-v2-3.5x-tau2

gemma-4-12b-coder-fable5-composer2.5-v1

qwopus3.6-27b-coder-mtp

step-3.7-flash

qwopus3.5-9b-coder-mtp

qwen3.6-35b-a3b-claude-4.6-opus-reasoning-distilled

longcat-video-avatar-1.5

magpie-tts-cpp-357m-q8_0

qwen3-coder-next-mxfp4_moe