LocalAI - Models

kimi-k3

📰 Tech Blog | 📄 Full Report ## 1. Model Introduction Kimi K3 is an open-weight, native multimodal agentic model and our most capable model to date. It is a 2.8T-parameter model built on Kimi Delta Attention (KDA) and Attention Residuals (AttnRes), with native vision capabilities and a 1-million-token context window. It is the world's first open 3T-class model, designed for frontier intelligence across long-horizon coding, knowledge work, and reasoning. ...

Links

https://huggingface.co/unsloth/Kimi-K3-GGUF

Tags

kat-coder-v2.5-dev

KAT-Coder-V2.5-Dev is an Apache-2.0 agentic coding model from Kwaipilot, post-trained from Qwen3.6-35B-A3B. It has 35 billion total parameters with 3 billion activated per token, a 262K-token context window, and text-only weights tuned for repository-level coding and tool use. This entry offers standard Q4_K_M and Q8_0 GGUF quantizations alongside APEX mixed-precision variants with quality, balanced, compact, and mini profiles. The APEX I-profiles use importance-matrix calibration.

Links

Tags

kat-coder-v2.5-dev-q8

KAT-Coder-V2.5-Dev is an Apache-2.0 agentic coding model from Kwaipilot, post-trained from Qwen3.6-35B-A3B. It has 35 billion total parameters with 3 billion activated per token, a 262K-token context window, and text-only weights tuned for repository-level coding and tool use. This entry uses the higher-quality Q8_0 GGUF quantization.

Links

Tags

kat-coder-v2.5-dev-apex-i-quality

KAT-Coder-V2.5-Dev is an Apache-2.0 agentic coding model from Kwaipilot, post-trained from Qwen3.6-35B-A3B. It has 35 billion total parameters with 3 billion activated per token, a 262K-token context window, and text-only weights tuned for repository-level coding and tool use. This entry offers standard Q4_K_M and Q8_0 GGUF quantizations alongside APEX mixed-precision variants with quality, balanced, compact, and mini profiles. The APEX I-profiles use importance-matrix calibration.

Links

Tags

kat-coder-v2.5-dev-apex-i-balanced

KAT-Coder-V2.5-Dev is an Apache-2.0 agentic coding model from Kwaipilot, post-trained from Qwen3.6-35B-A3B. It has 35 billion total parameters with 3 billion activated per token, a 262K-token context window, and text-only weights tuned for repository-level coding and tool use. This entry offers standard Q4_K_M and Q8_0 GGUF quantizations alongside APEX mixed-precision variants with quality, balanced, compact, and mini profiles. The APEX I-profiles use importance-matrix calibration.

Links

Tags

kat-coder-v2.5-dev-apex-i-compact

KAT-Coder-V2.5-Dev is an Apache-2.0 agentic coding model from Kwaipilot, post-trained from Qwen3.6-35B-A3B. It has 35 billion total parameters with 3 billion activated per token, a 262K-token context window, and text-only weights tuned for repository-level coding and tool use. This entry offers standard Q4_K_M and Q8_0 GGUF quantizations alongside APEX mixed-precision variants with quality, balanced, compact, and mini profiles. The APEX I-profiles use importance-matrix calibration.

Links

Tags

kat-coder-v2.5-dev-apex-i-mini

KAT-Coder-V2.5-Dev is an Apache-2.0 agentic coding model from Kwaipilot, post-trained from Qwen3.6-35B-A3B. It has 35 billion total parameters with 3 billion activated per token, a 262K-token context window, and text-only weights tuned for repository-level coding and tool use. This entry offers standard Q4_K_M and Q8_0 GGUF quantizations alongside APEX mixed-precision variants with quality, balanced, compact, and mini profiles. The APEX I-profiles use importance-matrix calibration.

Links

Tags

kat-coder-v2.5-dev-apex-quality

KAT-Coder-V2.5-Dev is an Apache-2.0 agentic coding model from Kwaipilot, post-trained from Qwen3.6-35B-A3B. It has 35 billion total parameters with 3 billion activated per token, a 262K-token context window, and text-only weights tuned for repository-level coding and tool use. This entry offers standard Q4_K_M and Q8_0 GGUF quantizations alongside APEX mixed-precision variants with quality, balanced, compact, and mini profiles. The APEX I-profiles use importance-matrix calibration.

Links

Tags

kat-coder-v2.5-dev-apex-balanced

KAT-Coder-V2.5-Dev is an Apache-2.0 agentic coding model from Kwaipilot, post-trained from Qwen3.6-35B-A3B. It has 35 billion total parameters with 3 billion activated per token, a 262K-token context window, and text-only weights tuned for repository-level coding and tool use. This entry offers standard Q4_K_M and Q8_0 GGUF quantizations alongside APEX mixed-precision variants with quality, balanced, compact, and mini profiles. The APEX I-profiles use importance-matrix calibration.

Links

Tags

kat-coder-v2.5-dev-apex-compact

KAT-Coder-V2.5-Dev is an Apache-2.0 agentic coding model from Kwaipilot, post-trained from Qwen3.6-35B-A3B. It has 35 billion total parameters with 3 billion activated per token, a 262K-token context window, and text-only weights tuned for repository-level coding and tool use. This entry offers standard Q4_K_M and Q8_0 GGUF quantizations alongside APEX mixed-precision variants with quality, balanced, compact, and mini profiles. The APEX I-profiles use importance-matrix calibration.

Links

Tags

inkling

# Inkling BF16 | NVFP4 | Playground | Tinker Cookbook | Acceptable Use ## 1. General Information Inkling is a general-purpose multimodal model that accepts text, image and audio inputs and generates text outputs. It is intended for use in English and other languages, and across multiple coding languages. The model is designed to be used by developers building AI-powered applications, including agentic and tool-use systems, coding assistants, chatbots, and retrieval-augmented generation systems, and is suitable for general-purpose conversational use, instruction-following, and other natural language and multimodal tasks. It is released with open weights to support research, fine-tuning and integration into third-party products by downstream developers. **Languages:** English, with general multilingual capabilities across other languages. ## 2. Getting Started Try Inkling on the Tinker Playground or access via API using the Tinker Cookbook. Inkling supports local deployment using the following open-source libraries: * SGLang (recipe, PR) * vLLM (recipe, PR) * TokenSpeed (recipe, PR) * Unsloth (recipe, PR) * Huggingface (recipe, PR) ...

Links

https://huggingface.co/unsloth/inkling-GGUF

Tags

qwopus3.6-35b-a3b-coder-mtp

# 🌟 Qwopus3.6-35B-A3B-v1 ## 💡 Base Model Overview **Qwen3.6-35B-A3B** is an advanced hybrid sparse MoE (Mixture-of-Experts) model developed by Alibaba Cloud. It features 35B total parameters with only 3B active parameters per token, ensuring high inference efficiency. Architecturally, it combines Gated DeltaNet linear attention with standard gated attention layers, routing tokens across **256 experts**. It natively supports a massive **262k context window** and is specifically designed for high-performance agentic coding, deep reasoning, and multimodal tasks. ## 🚀 Model Refinement & Logic Tuning （Qwopus3.6-35B-A3B-v1） 🪐**Qwopus3.6-35B-A3B-v1** is a reasoning-enhanced MoE (Mixture of Experts) model fine-tuned on top of **Qwen3.6-35B-A3B**. ### 🛠 Training Strategy The fine-tuning process for this model is structured into **three distinct stages of distributed SFT (Supervised Fine-Tuning)**, progressively scaling reasoning complexity and data diversity. This systematic approach ensures the model inherits the base MoE capabilities while sharpening its logic-handling depth. ...

Links

https://huggingface.co/Jackrong/Qwopus3.6-35B-A3B-Coder-MTP-GGUF

Tags

ornith-1.0-9b-mtp

[](https://deep-reinforce.com/ornith.html) # Ornith-1.0-9B Aloha! 🌺 Today, we are releasing Ornith-1.0, a self-improving family of open-source models for agentic coding. Highlights: - **State-of-the-Art Coding Agents**: Available in 9B-Dense, 31B-Dense, 35B-MoE, and 397B-MoE (post-trained on top of Gemma 4 and Qwen 3.5), achieving state-of-the-art performance among open-source models of comparable size on coding benchmarks such as Terminal-Bench 2.1, SWE-Bench, NL2Repo and OpenClaw. - **Self-Improving Training Framework**: Ornith-1.0 employs RL to learn to generate not only solution rollouts, but also the scallfold that drive those rollouts. By jointly optimizing the scaffold and the resulting solution, the model discovers better search trajectories and generates higher-quality solutions. - **Licence**: MIT licensed, globally accessible, and free from regional limitations. ## Ornith 1.0 9B This model card documents **Ornith-1.0-9B**, the most lightweight member of the Ornith family, designed for efficient single-GPU deployment. ### Benchmarks Ornith-1.0-9B Qwen3.5-9B Qwen3.5-35B Gemma4-12B Gemma4-31B Agentic Coding ...

Links

https://huggingface.co/protoLabsAI/Ornith-1.0-9B-MTP-GGUF

Tags

qwen-agentworld-35b-a3b

# Qwen-AgentWorld-35B-A3B 📑 Technical Report | 📖 Blog | 🤗 Hugging Face | 🤖 ModelScope | 💻 GitHub | 🖥️ Demo > [!Note] > This repository contains the model weights and configuration files for **Qwen-AgentWorld-35B-A3B**, a native language world model trained for agentic environment simulation. > > These artifacts are compatible with Hugging Face Transformers, vLLM, SGLang, etc. **Qwen-AgentWorld** is the first language world model to cover seven agent interaction domains within a single model. It simulates agentic environments via long chain-of-thought reasoning, predicting the next environment state given an agent's action and interaction history. Trained through a three-stage pipeline — CPT injects environment knowledge, SFT activates next-state-prediction reasoning, RL sharpens simulation fidelity — Qwen-AgentWorld is a **native world model**: environment modeling is the training objective from the CPT stage onward, not a post-hoc add-on. ## Highlights ...

Links

https://huggingface.co/unsloth/Qwen-AgentWorld-35B-A3B-GGUF

Tags

ornith-1.0-9b

[](https://deep-reinforce.com/ornith.html) # Ornith-1.0-9B-GGUF Aloha! 🌺 Today, we are releasing Ornith-1.0, a self-improving family of open-source models for agentic coding. Highlights: - **State-of-the-Art Coding Agents**: Available in 9B-Dense, 31B-Dense, 35B-MoE, and 397B-MoE (post-trained on top of Gemma 4 and Qwen 3.5), achieving state-of-the-art performance among open-source models of comparable size on coding benchmarks such as Terminal-Bench 2.1, SWE-Bench, NL2Repo and OpenClaw. - **Self-Improving Training Framework**: Ornith-1.0 employs RL to learn to generate not only solution rollouts, but also the scallfold that drive those rollouts. By jointly optimizing the scaffold and the resulting solution, the model discovers better search trajectories and generates higher-quality solutions. - **Licence**: MIT licensed, globally accessible, and free from regional limitations. ## Ornith 1.0 9B This model card documents **Ornith-1.0-9B**, the most lightweight member of the Ornith family, designed for efficient single-GPU deployment. ### Benchmarks Ornith-1.0-9B Qwen3.5-9B Qwen3.5-35B Gemma4-12B Gemma4-31B Agentic Coding ...

Links

https://huggingface.co/deepreinforce-ai/Ornith-1.0-9B-GGUF

Tags

ornith-1.0-35b

[](https://deep-reinforce.com/ornith.html) # Ornith-1.0-35B-GGUF Aloha! 🌺 Today, we are releasing Ornith-1.0, a self-improving family of open-source models for agentic coding. Highlights: - **State-of-the-Art Coding Agents**: Available in 9B-Dense, 31B-Dense, 35B-MoE, and 397B-MoE (post-trained on top of Gemma 4 and Qwen 3.5), achieving state-of-the-art performance among open-source models of comparable size on coding benchmarks such as Terminal-Bench 2.1, SWE-Bench, NL2Repo and OpenClaw. - **Self-Improving Training Framework**: Ornith-1.0 employs RL to learn to generate not only solution rollouts, but also the scallfold that drive those rollouts. By jointly optimizing the scaffold and the resulting solution, the model discovers better search trajectories and generates higher-quality solutions. - **Licence**: MIT licensed, globally accessible, and free from regional limitations. ## Ornith 1.0 35B This model card documents **Ornith-1.0-35B**, the lightweight member of the Ornith family, designed for efficient single-GPU deployment. ### Benchmarks Ornith-1.0-35B Qwen3.5-35B Qwen3.6-35B Gemma4-31B Qwen3.5-397B Agentic Coding ...

Links

https://huggingface.co/deepreinforce-ai/Ornith-1.0-35B-GGUF

Tags

laguna-xs-2.1

Laguna XS 2.1 is Poolside's 33B-parameter, 3B-active Mixture-of-Experts model for agentic coding and long-horizon work on local machines. It supports tool use, interleaved reasoning, and a native 262K-token context window. This default entry uses the official 20.3 GB Q4_K_M GGUF. License: OpenMDW 1.1.

Links

Tags

laguna-s-2.1-q8

Laguna S 2.1 is Poolside's 118B-parameter, 8B-active Mixture-of-Experts model for agentic software engineering. It supports tool use and a native one-million-token context window; the official GGUF recommends 256K context for best output quality. This entry uses the 129 GB Q8_0 build, with routed experts quantized to Q8_0 and the signal path kept in BF16. License: OpenMDW 1.1.

Links

Tags

laguna-s-2.1

Laguna S 2.1 is Poolside's 118B-parameter, 8B-active Mixture-of-Experts model for agentic software engineering. It supports tool use and a native one-million-token context window; the official GGUF recommends 256K context for best output quality. This default entry uses the current 96 GB Q4_K_M artifact, with imatrix-quantized routed experts and a Q8_0 signal path. License: OpenMDW 1.1.

Links

Tags

qwopus3.6-27b-coder-compat-mtp

🪐 Qwopus-3.6-27B-Coder Coder SFT Release Agentic Coding & Tool-Use Reasoning Model Fine-Tuned on Qwopus3.6-27B-v2 🧬 Trace Inversion & Negentropy 🧠 27B Dense Model ⚡ Agentic Coding 🛠️ Tool Calling & Agent 🏆 SWE-bench Verified: 67.0% (off-thinking) 💡 What is Qwopus-3.6-27B-Coder? 🪐 Qwopus-3.6-27B-Coder is a reasoning-enhanced agentic coding model built on top of Qwopus3.6-27B-v2. It inherits the powerful reasoning foundation of the v2 base — which achieved 87.43% MMLU-Pro and 75.25% SWE-bench Verified — and further specializes it for agentic code generation, structured tool calling, debugging, and instruction-following in developer workflows. The model is designed to excel at repository-level coding tasks, multi-turn tool orchestration, and complex logical reasoning under realistic agent environments. 🧩 Agentic Coding Optimized for repository-level coding, debugging, patch generation, and structured multi-step development workflows. 🛠️ Tool Calling Learns from real agent trajectories with tool definitions, tool calls, and environment feedback for robust multi-turn execution. ...

Links

https://huggingface.co/Jackrong/Qwopus3.6-27B-Coder-Compat-MTP-GGUF

Tags

kimi-k2.7-code

## 1. Model Introduction Kimi K2.7 Code is a coding-focused agentic model built upon Kimi K2.6. With substantial improvements on real-world long-horizon coding tasks, it strengthens end-to-end task completion across complex software engineering workflows while improving token efficiency, reducing thinking-token usage by approximately 30% compared with Kimi K2.6. ## 2. Model Summary ## 3. Evaluation Results Benchmark Kimi K2.6 Kimi K2.7 Code GPT-5.5 Claude Opus 4.8 Coding Kimi Code Bench v2 50.9 62.0 69.0 67.4 Program Bench 48.3 53.6 69.1 63.8 MLS Bench Lite 26.7 35.1 35.5 42.8 Agentic Kimi Claw 24/7 Bench 42.9 46.9 52.8 50.4 MCP Atlas 69.4 76.0 79.4 81.3 MCP Mark Verified 72.8 81.1 92.9 76.4 Footnotes ...

Links

https://huggingface.co/unsloth/Kimi-K2.7-Code-GGUF

Tags

Model Gallery

Filter by type:

Filter by tags:

kimi-k3

kat-coder-v2.5-dev

kat-coder-v2.5-dev-q8

kat-coder-v2.5-dev-apex-i-quality

kat-coder-v2.5-dev-apex-i-balanced

kat-coder-v2.5-dev-apex-i-compact

kat-coder-v2.5-dev-apex-i-mini

kat-coder-v2.5-dev-apex-quality

kat-coder-v2.5-dev-apex-balanced

kat-coder-v2.5-dev-apex-compact

inkling

qwopus3.6-35b-a3b-coder-mtp

ornith-1.0-9b-mtp

qwen-agentworld-35b-a3b

ornith-1.0-9b

ornith-1.0-35b

laguna-xs-2.1

laguna-s-2.1-q8

laguna-s-2.1

qwopus3.6-27b-coder-compat-mtp

kimi-k2.7-code