Model Gallery

74 models from 1 repository

gemmable-4-12b-mtp

## Gemmable 4 12B

Gemmable 4 12B is a GGUF export of Gemma 4 12B fine-tuned on Fable-5 style reasoning and assistant traces.

## Highlights

- Base model: `google/gemma-4-12B`
- Format: GGUF
- Training style: Fable-5 style reasoning and assistant traces
- Distribution: fp16 GGUF plus matching assistant GGUFs for each quant
- Intended use: local inference, coding, reasoning, and assistant workflows

## How to use

### llama.cpp

Standard load:

```bash
llama-server -m "gemmable-4-12b-fp16.gguf"
```

Speculative / draft-MTP load:

```bash
llama-server -m "gemmable-4-12b-Q4_K_M.gguf" \
  --spec-draft-model "gemmable-4-12b-Q4_K_M-mtp.gguf" \
  --spec-type draft-mtp \
  --spec-draft-n-max 4
```

Use the matching fp16 or quantized main file with its `-mtp` companion.

### LM Studio

1. Search for this repo and download the target file and its mtp file.
2. Load the target.
3. Load settings → Speculative Decoding → select the mtp file.

(Requires a llama.cpp runtime with Gemma 4 MTP support from ggml-org/llama.cpp#23398. LocalAI's pinned llama.cpp backend already carries it, so this entry runs draft-mtp out of the box.)

## GGUF / local inference notes

...

Repository: localai
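
For the `gemmable-4-12b-mtp` entry above, the pairing of a main file with its `-mtp` companion can be sketched in Python. This is a minimal sketch, assuming every companion follows the naming seen in the card's one example (`<name>.gguf` paired with `<name>-mtp.gguf`). The helper names `mtp_companion` and `draft_mtp_command` are hypothetical, and only the flags shown in the card are used.

```python
from pathlib import Path


def mtp_companion(main_file: str) -> str:
    """Return the assumed companion name: <stem>-mtp.gguf beside the main file."""
    path = Path(main_file)
    if path.suffix != ".gguf":
        raise ValueError(f"expected a .gguf file, got {main_file!r}")
    return str(path.with_name(f"{path.stem}-mtp{path.suffix}"))


def draft_mtp_command(main_file: str, draft_n_max: int = 4) -> list[str]:
    """Build the llama-server argument list for the draft-MTP load in the card."""
    return [
        "llama-server",
        "-m", main_file,
        "--spec-draft-model", mtp_companion(main_file),
        "--spec-type", "draft-mtp",
        "--spec-draft-n-max", str(draft_n_max),
    ]


command = draft_mtp_command("gemmable-4-12b-Q4_K_M.gguf")
print(" ".join(command))
```

Building the argument list rather than a shell string keeps the quoting out of the picture if the list is later passed to `subprocess.run`.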

ced-base-f16

CED (Consistent Ensemble Distillation, Xiaomi) is a sound-event classifier that tags everyday sounds (baby cry, footsteps, glass breaking, alarms, dog bark, ...) into the 527-class AudioSet ontology. This is the f16 GGUF for the ced backend (a standalone C++/ggml port). Recommended default: fastest on CPU and near-lossless. Use POST /v1/audio/classification, or the realtime websocket API for live recognition.

Repository: localai
License: apache-2.0
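
A client of `ced-base-f16` will usually want only the few highest-scoring tags out of the 527 classes. The sketch below assumes the classification result has already been parsed into a label-to-score mapping. That shape, the example scores, and the `top_k_labels` helper are hypothetical and do not reflect a documented response schema.

```python
import heapq


def top_k_labels(scores: dict[str, float], k: int = 3) -> list[tuple[str, float]]:
    """Return the k highest-scoring (label, score) pairs, best first."""
    return heapq.nlargest(k, scores.items(), key=lambda item: item[1])


# Hypothetical scores; a real result would cover the 527 AudioSet classes.
scores = {"Dog bark": 0.91, "Footsteps": 0.12, "Alarm": 0.40, "Baby cry": 0.03}
best = top_k_labels(scores, k=2)
print(best)
```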

vibevoice-cpp

VibeVoice Realtime 0.5B (C++ / GGML, Q8_0) - native C++ port of Microsoft VibeVoice via the vibevoice-cpp backend. 24kHz mono TTS with a selectable precomputed voice prompt. Default voice prompt: en-Carter_man. This realtime variant does not accept raw Voice Library reference WAVs.

Repository: localai
License: mit

vibevoice-cpp-asr

VibeVoice ASR 7B (C++ / GGML, Q4_K) - long-form speech-to-text with speaker diarization. Returns per-speaker JSON segments with start/end timestamps. English-only. ~10 GB download.

Repository: localai
License: mit
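
The per-speaker segments returned by `vibevoice-cpp-asr` lend themselves to simple post-processing such as totalling speaking time. This is a minimal sketch, assuming each segment is a dict with `speaker`, `start`, `end` (seconds), and `text` keys. Those field names and the `speaking_time` helper are hypothetical, since the entry does not spell out the JSON schema.

```python
from collections import defaultdict


def speaking_time(segments: list[dict]) -> dict[str, float]:
    """Sum segment durations in seconds for each speaker."""
    totals: dict[str, float] = defaultdict(float)
    for segment in segments:
        totals[segment["speaker"]] += segment["end"] - segment["start"]
    return dict(totals)


# Hypothetical diarized output with assumed field names.
segments = [
    {"speaker": "S0", "start": 0.0, "end": 2.5, "text": "Hello."},
    {"speaker": "S1", "start": 2.5, "end": 4.0, "text": "Hi there."},
    {"speaker": "S0", "start": 4.0, "end": 5.0, "text": "How are you?"},
]
totals = speaking_time(segments)
print(totals)
```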

moss-tts-cpp-v1_5-q8_0

MOSS-TTS-Local v1.5 (C++/ggml, moss-tts.cpp), Q8_0 (~6 GB), near-lossless. Native C++ text-to-speech for the OpenMOSS MOSS-TTS-Local Transformer v1.5 (a GPT-J local transformer decoded through the MOSS-Audio-Tokenizer-v2 neural codec), 48kHz stereo output with reference-audio voice cloning (set `voice` to a reference .wav). Verified per component against the reference PyTorch implementation and about 2x faster per frame on CPU.

Repository: localai
License: apache-2.0

moss-tts-cpp-v1_5-f16

MOSS-TTS-Local v1.5 (C++/ggml, moss-tts.cpp), F16 (~9.4 GB), code-exact versus the reference. 48kHz stereo text-to-speech with reference-audio voice cloning.

Repository: localai
License: apache-2.0

magpie-tts-cpp-357m-q8_0

Magpie TTS Multilingual 357M (C++/ggml, magpie-tts.cpp), Q8_0 (~624 MB), near-lossless and the fastest decode. Native C++ text-to-speech for NVIDIA's Magpie TTS Multilingual (encoder + autoregressive decoder over NanoCodec tokens) from one self-contained GGUF (model, codec, tokenizer, G2P dictionaries). 22.05kHz mono, 5 baked voices (set `voice` to Aria, Jason, John, Leo, Sofia or 0-4), 9+ languages (en, es, de, fr, it, pt-BR, hi, vi, ko, plus Arabic variants). Parity-gated per component against the NeMo reference.

Repository: localai
License: other

magpie-tts-cpp-357m-f16

Magpie TTS Multilingual 357M (C++/ggml, magpie-tts.cpp), F16 (~784 MB). 22.05kHz mono text-to-speech with 5 baked voices and 9+ languages.

Repository: localai
License: other

face-detect-buffalo-l

Face recognition with insightface's `buffalo_l` pack (SCRFD-10GF detector + ResNet50 ArcFace 512-d embedder), ported to C++/ggml and shipped as a single GGUF for the `face-detect` backend. Highest accuracy of the buffalo line. No Python / onnxruntime / torch runtime: face-detect.cpp reads the detector and embedder architecture (`facedetect.arch`) directly from the GGUF metadata, so installing this entry is all that is needed to select buffalo_l. Drives the Embedding / Detect / FaceVerify / FaceAnalyze gRPC rpcs and the /v1/face/{verify,analyze,embed,detect} REST endpoints. This GGUF also embeds the MiniFASNet anti-spoof ensemble, available via the FaceVerify `anti_spoof` request flag. NON-COMMERCIAL RESEARCH USE ONLY: for commercial use see `face-detect-yunet-sface`.

Repository: localai
License: insightface-non-commercial

face-detect-buffalo-m

Face recognition with insightface's `buffalo_m` pack (SCRFD-2.5GF detector + ResNet50 ArcFace embedder), converted to a C++/ggml GGUF for the `face-detect` backend. Same recognition accuracy as `buffalo_l` with a cheaper detector: a good balance on mid-range hardware. The architecture (`facedetect.arch`) is read from the GGUF metadata, so this entry alone selects the buffalo_m engine. This GGUF also embeds the MiniFASNet anti-spoof ensemble, available via the FaceVerify `anti_spoof` request flag. NON-COMMERCIAL RESEARCH USE ONLY.

Repository: localai
License: insightface-non-commercial

face-detect-buffalo-s

Face recognition with insightface's `buffalo_s` pack (SCRFD-500MF detector + MBF 512-d embedder), converted to a C++/ggml GGUF for the `face-detect` backend. Small and CPU-friendly: a good fit for mid-range and edge deployments. The architecture (`facedetect.arch`) is read from the GGUF metadata, so this entry alone selects the buffalo_s engine. This GGUF also embeds the MiniFASNet anti-spoof ensemble, available via the FaceVerify `anti_spoof` request flag. NON-COMMERCIAL RESEARCH USE ONLY.

Repository: localai
License: insightface-non-commercial

face-detect-buffalo-sc

Face recognition with insightface's `buffalo_sc` pack (SCRFD-500MF detector + a small ArcFace embedder), converted to a C++/ggml GGUF for the `face-detect` backend. This is the smallest insightface pack: the lightest option for low-resource and edge deployments. The architecture (`facedetect.arch`) is read from the GGUF metadata, so this entry alone selects the buffalo_sc engine. If this GGUF embeds the MiniFASNet anti-spoof ensemble, it is available via the FaceVerify `anti_spoof` request flag. NON-COMMERCIAL RESEARCH USE ONLY.

Repository: localai
License: insightface-non-commercial

face-detect-antelopev2

Face recognition with insightface's `antelopev2` pack (SCRFD-10GF detector + ArcFace glint360k R100, 512-d embedder), converted to a C++/ggml GGUF for the `face-detect` backend. The higher-accuracy insightface pack: heavier, but the best fit when recognition quality matters more than speed. The architecture (`facedetect.arch`) is read from the GGUF metadata, so this entry alone selects the antelopev2 engine. If this GGUF embeds the MiniFASNet anti-spoof ensemble, it is available via the FaceVerify `anti_spoof` request flag. NON-COMMERCIAL RESEARCH USE ONLY.

Repository: localai
License: insightface-non-commercial

face-detect-yunet-sface

Face recognition with OpenCV Zoo weights: YuNet detector + SFace 128-d recognizer, converted to a C++/ggml GGUF for the `face-detect` backend. APACHE 2.0: safe for commercial use. Lower accuracy than the buffalo packs and no demographic head, but the commercial-friendly alternative to the insightface buffalo line. The architecture (`facedetect.arch`) is read from the GGUF metadata, so this entry alone selects the YuNet + SFace engine.

Repository: localai
License: apache-2.0

voice-detect-ecapa-tdnn

Speaker (voice) recognition with SpeechBrain's ECAPA-TDNN trained on VoxCeleb, ported to C++/ggml and shipped as a single GGUF for the `voice-detect` backend. 192-d L2-normalised embeddings, ~1.9% Equal Error Rate on VoxCeleb1-O. APACHE 2.0 - commercial-safe. No Python / torch runtime: voice-detect.cpp reads the embedding architecture (`voicedetect.arch`) directly from the GGUF metadata, so installing this entry is all that is needed to select ECAPA-TDNN. Drives the VoiceVerify / VoiceEmbed gRPC rpcs and the /v1/voice/{verify,embed,register,identify,forget} REST endpoints.

Repository: localai
License: apache-2.0
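
Because the `voice-detect-ecapa-tdnn` embeddings are L2-normalised, comparing two utterances reduces to a dot product, which equals cosine similarity for unit vectors. This is a minimal sketch, assuming the embeddings have already been obtained as plain float lists. The `same_speaker` helper and its 0.5 threshold are hypothetical and would need tuning on real data, and the 2-d vectors stand in for the 192-d embeddings.

```python
import math


def l2_normalise(vec: list[float]) -> list[float]:
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0.0:
        raise ValueError("cannot normalise a zero vector")
    return [x / norm for x in vec]


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Dot product of two unit vectors, which is their cosine similarity."""
    if len(a) != len(b):
        raise ValueError("embeddings must have the same dimension")
    return sum(x * y for x, y in zip(a, b))


def same_speaker(a: list[float], b: list[float], threshold: float = 0.5) -> bool:
    """Decide a match with an assumed, untuned similarity threshold."""
    return cosine_similarity(a, b) >= threshold


# Toy 2-d vectors for illustration only.
enrolled = l2_normalise([3.0, 4.0])
probe = l2_normalise([4.0, -3.0])
print(same_speaker(enrolled, enrolled), same_speaker(enrolled, probe))
```

A higher threshold trades more false rejections for fewer false accepts, so the right value depends on the deployment.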

voice-detect-wespeaker-resnet34

Speaker recognition with WeSpeaker's ResNet34 trained on VoxCeleb, converted to a C++/ggml GGUF for the `voice-detect` backend. 256-d embeddings, CPU-friendly and runtime-free (no onnxruntime or torch). CC-BY-4.0. Use when you want WeSpeaker's ResNet34 topology instead of ECAPA-TDNN. The embedding architecture (`voicedetect.arch`) is read from the GGUF metadata, so this entry alone selects the engine.

Repository: localai
License: cc-by-4.0

voice-detect-eres2net

Speaker recognition with 3D-Speaker's ERes2Net trained on VoxCeleb, converted to a C++/ggml GGUF for the `voice-detect` backend. 192-d embeddings with strong verification accuracy. APACHE 2.0. The embedding architecture (`voicedetect.arch`) is read from the GGUF metadata, so this entry alone selects the ERes2Net engine.

Repository: localai
License: apache-2.0

voice-detect-campplus

Speaker recognition with 3D-Speaker's CAM++ trained on VoxCeleb, converted to a C++/ggml GGUF for the `voice-detect` backend. 192-d embeddings, a fast context-aware masking topology well-suited to CPU and edge deployments. APACHE 2.0. The embedding architecture (`voicedetect.arch`) is read from the GGUF metadata, so this entry alone selects the CAM++ engine.

Repository: localai
License: apache-2.0

voice-detect-emotion-wav2vec2

Voice analysis (age / gender / emotion) with audEERING's wav2vec2 model, converted to a C++/ggml GGUF for the `voice-detect` backend. Drives the VoiceAnalyze gRPC rpc and the /v1/voice/analyze REST endpoint, returning a continuous age estimate plus gender and emotion class scores for a single utterance. CC-BY-NC-SA-4.0 - research / non-commercial use only. The analysis architecture (`voicedetect.arch`) is read from the GGUF metadata, so this entry alone selects the wav2vec2 analyze head.

Repository: localai
License: cc-by-nc-sa-4.0

voice-detect-age-gender-wav2vec2

wav2vec2-large-robust age + gender analysis head (audeering/wav2vec2-large-robust-24-ft-age-gender), converted to a C++/ggml GGUF for the `voice-detect` backend. Drives the VoiceAnalyze gRPC rpc and the /v1/voice/analyze REST endpoint, returning a continuous age estimate plus gender class scores for a single utterance. CC-BY-NC-SA-4.0 - research / non-commercial use only. The analysis architecture (`voicedetect.arch`) is read from the GGUF metadata, so this entry alone selects the wav2vec2 analyze head.

Repository: localai
License: cc-by-nc-sa-4.0

rfdetr-cpp-nano

RF-DETR Nano object detection model, served via the native rfdetr.cpp backend (ggml + purego; a pure C++/ggml runtime with no Python dependencies). Q8_0 quantization is the recommended default for CPU: same accuracy as F16/F32, ~20MB on disk, fastest CPU latency. Drop-in for the /v1/detection endpoint.

Repository: localai
License: apache-2.0
