Model Gallery

61 models from 1 repositories

Filter by type:

Filter by tags:

kimi-k2.6
🤗  huggingchat  |  📰  Tech Blog ## 1. Model Introduction Kimi K2.6 is an open-source, native multimodal agentic model that advances practical capabilities in long-horizon coding, coding-driven design, proactive autonomous execution, and swarm-based task orchestration. ### Key Features - **Long-Horizon Coding**: K2.6 achieves significant improvements on complex, end-to-end coding tasks, generalizing robustly across programming languages (Rust, Go, Python) and domains spanning front-end, DevOps, and performance optimization. - **Coding-Driven Design**: K2.6 is capable of transforming simple prompts and visual inputs into production-ready interfaces and lightweight full-stack workflows, generating structured layouts, interactive elements, and rich animations with deliberate aesthetic precision. - **Elevated Agent Swarm**: Scaling horizontally to 300 sub-agents executing 4,000 coordinated steps, K2.6 can dynamically decompose tasks into parallel, domain-specialized subtasks, delivering end-to-end outputs from documents to websites to spreadsheets in a single autonomous run. - **Proactive & Open Orchestration**: For autonomous tasks, K2.6 demonstra ...

Repository: localaiLicense: modified-mit

nanbeige4.1-3b-q8
Nanbeige4.1-3B is built upon Nanbeige4-3B-Base and represents an enhanced iteration of our previous reasoning model, Nanbeige4-3B-Thinking-2511, achieved through further post-training optimization with supervised fine-tuning (SFT) and reinforcement learning (RL). As a highly competitive open-source model at a small parameter scale, Nanbeige4.1-3B illustrates that compact models can simultaneously achieve robust reasoning, preference alignment, and effective agentic behaviors. Key features: Strong Reasoning: Capable of solving complex, multi-step problems through sustained and coherent reasoning within a single forward pass, reliably producing correct answers on benchmarks like LiveCodeBench-Pro, IMO-Answer-Bench, and AIME 2026 I. Robust Preference Alignment: Outperforms same-scale models (e.g., Qwen3-4B-2507, Nanbeige4-3B-2511) and larger models (e.g., Qwen3-30B-A3B, Qwen3-32B) on Arena-Hard-v2 and Multi-Challenge. Agentic Capability: First general small model to natively support deep-search tasks and sustain complex problem-solving with >500 rounds of tool invocations; excels in benchmarks like xBench-DeepSearch (75), Browse-Comp (39), and others.

Repository: localaiLicense: apache-2.0

nanbeige4.1-3b-q4
Nanbeige4.1-3B is built upon Nanbeige4-3B-Base and represents an enhanced iteration of our previous reasoning model, Nanbeige4-3B-Thinking-2511, achieved through further post-training optimization with supervised fine-tuning (SFT) and reinforcement learning (RL). As a highly competitive open-source model at a small parameter scale, Nanbeige4.1-3B illustrates that compact models can simultaneously achieve robust reasoning, preference alignment, and effective agentic behaviors. Key features: Strong Reasoning: Capable of solving complex, multi-step problems through sustained and coherent reasoning within a single forward pass, reliably producing correct answers on benchmarks like LiveCodeBench-Pro, IMO-Answer-Bench, and AIME 2026 I. Robust Preference Alignment: Outperforms same-scale models (e.g., Qwen3-4B-2507, Nanbeige4-3B-2511) and larger models (e.g., Qwen3-30B-A3B, Qwen3-32B) on Arena-Hard-v2 and Multi-Challenge. Agentic Capability: First general small model to natively support deep-search tasks and sustain complex problem-solving with >500 rounds of tool invocations; excels in benchmarks like xBench-DeepSearch (75), Browse-Comp (39), and others.

Repository: localaiLicense: apache-2.0

vllm-omni-qwen3-omni-30b
Qwen3-Omni-30B-A3B-Instruct via vLLM-Omni - A large multimodal model (30B active, 3B activated per token) from Alibaba Qwen team. Supports text, image, audio, and video understanding with text and speech output. Features native multimodal understanding across all modalities.

Repository: localaiLicense: apache-2.0

acestep-cpp-turbo
ACE-Step 1.5 Turbo (C++ / GGML) — native C++ music generation from text descriptions and lyrics. Two-stage pipeline: text-to-code (Qwen3 LM) + code-to-audio (DiT-VAE). Stereo 48kHz output. Uses Q8_0 quantized models for a good balance of quality and speed.

Repository: localaiLicense: mit

vibevoice-cpp
VibeVoice Realtime 0.5B (C++ / GGML, Q8_0) - native C++ port of Microsoft VibeVoice via the vibevoice-cpp backend. 24kHz mono TTS with voice cloning from a single reference voice prompt. Default voice prompt: en-Carter_man.

Repository: localaiLicense: mit

qwen3-tts-cpp
Qwen3-TTS 0.6B (C++ / GGML) — native C++ text-to-speech from text input. Generates 24kHz mono audio. Supports 10 languages (en, zh, ja, ko, de, fr, es, it, pt, ru). Uses F16 GGUF models (~2 GB total).

Repository: localaiLicense: apache-2.0

qwen3-omni-30b-a3b-instruct
Qwen3-Omni is the natively end-to-end multilingual omni-modal foundation model. It processes text, images, audio, and video, and delivers real-time streaming responses in both text and natural speech. This GGUF build runs on llama.cpp with the bundled mmproj for multimodal inputs.

Repository: localaiLicense: apache-2.0

qwen3-omni-30b-a3b-thinking
Qwen3-Omni-30B-A3B-Thinking is the reasoning-enhanced variant of Qwen3-Omni, a natively end-to-end multilingual omni-modal foundation model. It processes text, images, and audio and produces chain-of-thought reasoning before the final answer. This GGUF build runs on llama.cpp with the bundled mmproj.

Repository: localaiLicense: apache-2.0

gpt-oss-20b
Welcome to the gpt-oss series, OpenAI’s open-weight models designed for powerful reasoning, agentic tasks, and versatile developer use cases. We’re releasing two flavors of the open models: gpt-oss-120b — for production, general purpose, high reasoning use cases that fits into a single H100 GPU (117B parameters with 5.1B active parameters) gpt-oss-20b — for lower latency, and local or specialized use cases (21B parameters with 3.6B active parameters) Both models were trained on our harmony response format and should only be used with the harmony format as it will not work correctly otherwise. This model card is dedicated to the smaller gpt-oss-20b model. Check out gpt-oss-120b for the larger model. Highlights Permissive Apache 2.0 license: Build freely without copyleft restrictions or patent risk—ideal for experimentation, customization, and commercial deployment. Configurable reasoning effort: Easily adjust the reasoning effort (low, medium, high) based on your specific use case and latency needs. Full chain-of-thought: Gain complete access to the model’s reasoning process, facilitating easier debugging and increased trust in outputs. It’s not intended to be shown to end users. Fine-tunable: Fully customize models to your specific use case through parameter fine-tuning. Agentic capabilities: Use the models’ native capabilities for function calling, web browsing, Python code execution, and Structured Outputs. Native MXFP4 quantization: The models are trained with native MXFP4 precision for the MoE layer, making gpt-oss-120b run on a single H100 GPU and the gpt-oss-20b model run within 16GB of memory.

Repository: localaiLicense: apache-2.0

gpt-oss-120b
Welcome to the gpt-oss series, OpenAI’s open-weight models designed for powerful reasoning, agentic tasks, and versatile developer use cases. We’re releasing two flavors of the open models: gpt-oss-120b — for production, general purpose, high reasoning use cases that fits into a single H100 GPU (117B parameters with 5.1B active parameters) gpt-oss-20b — for lower latency, and local or specialized use cases (21B parameters with 3.6B active parameters) Both models were trained on our harmony response format and should only be used with the harmony format as it will not work correctly otherwise. This model card is dedicated to the smaller gpt-oss-20b model. Check out gpt-oss-120b for the larger model. Highlights Permissive Apache 2.0 license: Build freely without copyleft restrictions or patent risk—ideal for experimentation, customization, and commercial deployment. Configurable reasoning effort: Easily adjust the reasoning effort (low, medium, high) based on your specific use case and latency needs. Full chain-of-thought: Gain complete access to the model’s reasoning process, facilitating easier debugging and increased trust in outputs. It’s not intended to be shown to end users. Fine-tunable: Fully customize models to your specific use case through parameter fine-tuning. Agentic capabilities: Use the models’ native capabilities for function calling, web browsing, Python code execution, and Structured Outputs. Native MXFP4 quantization: The models are trained with native MXFP4 precision for the MoE layer, making gpt-oss-120b run on a single H100 GPU and the gpt-oss-20b model run within 16GB of memory.

Repository: localaiLicense: apache-2.0

rfdetr-cpp-nano
RF-DETR Nano object detection model, served via the native rfdetr.cpp backend (ggml + purego, no Python). Q8_0 quantization is the recommended default for CPU: same accuracy as F16/F32, ~20MB on disk, fastest CPU latency. Pure C++/ggml runtime; no Python dependencies. Drop-in for the /v1/detection endpoint.

Repository: localaiLicense: apache-2.0

rfdetr-cpp-base
RF-DETR Base object detection model, served via the native rfdetr.cpp backend. F16 quantization is recommended on CPU: identical accuracy to F32, half the size, fastest.

Repository: localaiLicense: apache-2.0

rfdetr-cpp-small
RF-DETR Small object detection model (DINOv2-small backbone, 512px input, 3 decoder layers), served via the native rfdetr.cpp backend (ggml + purego, no Python). A step up from Nano in accuracy while staying lightweight on CPU. F16 quantization is the recommended default: identical accuracy to F32 at roughly half the size. Drop-in for the /v1/detection endpoint.

Repository: localaiLicense: apache-2.0

rfdetr-cpp-medium
RF-DETR Medium object detection model (DINOv2-small backbone, 576px input, 4 decoder layers), served via the native rfdetr.cpp backend. Balanced detection quality vs. CPU latency — recommended when Base is not accurate enough but Large is too slow. F16 quantization is the recommended default: identical accuracy to F32, half the size. Drop-in for the /v1/detection endpoint.

Repository: localaiLicense: apache-2.0

rfdetr-cpp-large
RF-DETR Large object detection model (DINOv2-small backbone, 704px input, 4 decoder layers), served via the native rfdetr.cpp backend. Highest-accuracy detection variant — best for offline workflows and high-resolution inputs where CPU latency is secondary to recall. F16 quantization is the recommended default: identical accuracy to F32, half the size. Drop-in for the /v1/detection endpoint.

Repository: localaiLicense: apache-2.0

rfdetr-cpp-seg-nano
RF-DETR Seg-Nano instance segmentation model (DINOv2-small backbone, 312px input, 4 decoder layers, 100 queries), served via the native rfdetr.cpp backend. Smallest segmentation variant — fastest CPU latency, ideal for edge deployment. Returns both bounding boxes and per-instance masks via the /v1/detection endpoint. F16 quantization is the recommended default: identical accuracy to F32, half the size.

Repository: localaiLicense: apache-2.0

rfdetr-cpp-seg-small
RF-DETR Seg-Small instance segmentation model (DINOv2-small backbone, 384px input, 4 decoder layers, 100 queries), served via the native rfdetr.cpp backend. Step up from Seg-Nano in mask quality while staying CPU-friendly. Returns both bounding boxes and per-instance masks via the /v1/detection endpoint. F16 quantization is the recommended default: identical accuracy to F32, half the size.

Repository: localaiLicense: apache-2.0

rfdetr-cpp-seg-medium
RF-DETR Seg-Medium instance segmentation model (DINOv2-small backbone, 432px input, 5 decoder layers, 200 queries), served via the native rfdetr.cpp backend. Balanced segmentation quality vs. CPU latency — recommended for everyday segmentation workloads. Returns both bounding boxes and per-instance masks via the /v1/detection endpoint. F16 quantization is the recommended default.

Repository: localaiLicense: apache-2.0

rfdetr-cpp-seg-large
RF-DETR Seg-Large instance segmentation model (DINOv2-small backbone, 504px input, 5 decoder layers, 200 queries), served via the native rfdetr.cpp backend. Higher-resolution input than Seg-Medium for sharper mask boundaries. Returns both bounding boxes and per-instance masks via the /v1/detection endpoint. F16 quantization is the recommended default: identical accuracy to F32, half the size.

Repository: localaiLicense: apache-2.0

rfdetr-cpp-seg-xlarge
RF-DETR Seg-XLarge instance segmentation model (DINOv2-small backbone, 624px input, 6 decoder layers, 300 queries), served via the native rfdetr.cpp backend. High-capacity segmentation variant with more queries and deeper decoder — best for dense scenes with many instances. Returns both bounding boxes and per-instance masks via the /v1/detection endpoint. F16 quantization is the recommended default.

Repository: localaiLicense: apache-2.0

Page 1