Repository: localai
License: apache-2.0
Streaming English ASR: sherpa-onnx zipformer transducer (int8-quantized, chunk size 16, left context 128). Low-latency real-time transcription with endpoint detection via sherpa-onnx's online recognizer. English-only; for multilingual offline ASR, see omnilingual-0.3b-ctc-q8-sherpa.
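As an illustration of what endpoint detection means in a streaming recognizer, here is a minimal sketch that declares an utterance finished after a run of trailing silence. It is a toy model, not sherpa-onnx's API or its actual rules: the function name `find_endpoints`, the per-frame energy input, and both thresholds are assumptions made for this example.

```python
from typing import List, Sequence


def find_endpoints(
    frame_energies: Sequence[float],
    silence_threshold: float = 0.01,       # hypothetical: energy below this counts as silence
    min_trailing_silence_frames: int = 3,  # hypothetical: silent frames needed to end an utterance
) -> List[int]:
    """Return the frame indices at which an utterance is considered finished."""
    endpoints: List[int] = []
    silent_run = 0        # consecutive silent frames seen so far
    heard_speech = False  # only emit an endpoint after some speech was heard

    for index, energy in enumerate(frame_energies):
        if energy >= silence_threshold:
            heard_speech = True
            silent_run = 0
        else:
            silent_run += 1
            if heard_speech and silent_run >= min_trailing_silence_frames:
                endpoints.append(index)
                # Reset state so the next utterance is detected independently.
                heard_speech = False
                silent_run = 0
    return endpoints


energies = [0.0, 0.5, 0.6, 0.0, 0.0, 0.0, 0.0, 0.7, 0.0, 0.0, 0.0]
print(find_endpoints(energies))  # [5, 10]
```

A real streaming recognizer would typically base this decision on decoder state rather than raw energy, so treat the sketch only as a way to picture the trade-off: a longer silence requirement gives fewer false endpoints but higher latency.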
Repository: localai
License: apache-2.0
Qwen3-Omni is a natively end-to-end, multilingual, omni-modal foundation model. It processes text, images, audio, and video, and delivers real-time streaming responses in both text and natural speech. This GGUF build runs on llama.cpp with the bundled mmproj for multimodal inputs.
Kokoros is a pure Rust TTS backend using the Kokoro v1.0 ONNX model (82M parameters). It provides fast, high-quality streaming TTS in American English with the af_heart voice.
Repository: localai
License: cc-by-4.0
Cache-aware streaming RNNT FastConformer (120M parameters) with end-of-utterance (EOU) detection, intended for streaming transcription. F16 GGUF for the parakeet-cpp backend (a C++/ggml port of NVIDIA NeMo Parakeet), byte-identical to NeMo at WER 0. Faster than NeMo on CPU and GPU.
Moonshine Streaming Tiny speech recognition. Runs via the CrispASR backend with an explicit backend selector and a companion tokenizer. Default GGUF size ~31 MB.