Model Gallery

52 models from 1 repositories

Filter by type:

Filter by tags:

nemo-parakeet-tdt-0.6b

NVIDIA NeMo Parakeet TDT 0.6B v3 is an automatic speech recognition (ASR) model from NVIDIA's NeMo toolkit. Parakeet models are state-of-the-art ASR models trained on large-scale English audio data.

Repository: localaiLicense: cc-by-4.0

voxtral-mini-4b-realtime

Voxtral Mini 4B Realtime is a speech-to-text model from Mistral AI. It is a 4B parameter model optimized for fast, accurate audio transcription with low latency, making it ideal for real-time applications. The model uses the Voxtral architecture for efficient audio processing.

Repository: localaiLicense: apache-2.0

moonshine-tiny

Moonshine Tiny is a lightweight speech-to-text model optimized for fast transcription. It is designed for efficient on-device ASR with high accuracy relative to its size.

Repository: localaiLicense: apache-2.0

whisperx-tiny

WhisperX Tiny is a fast and accurate speech recognition model with speaker diarization capabilities. Built on OpenAI's Whisper with additional features for alignment and speaker segmentation.

Repository: localaiLicense: mit

omnilingual-0.3b-ctc-q8-sherpa

Omnilingual ASR CTC 300M (int8) is a multilingual automatic speech recognition model supporting 1,600+ languages. Based on Meta's omniASR_CTC_300M architecture (Wav2Vec2 with CTC head), quantized to int8 for efficient inference. Uses the sherpa-onnx backend with ONNX Runtime.

Repository: localaiLicense: apache-2.0

streaming-zipformer-en-sherpa

Streaming English ASR: sherpa-onnx zipformer transducer (int8, chunk-16 left-128). Low-latency real-time transcription with endpoint detection via sherpa-onnx's online recognizer. English-only; for multilingual offline ASR see omnilingual-0.3b-ctc-q8-sherpa.

Repository: localaiLicense: apache-2.0

vibevoice-cpp-asr

VibeVoice ASR 7B (C++ / GGML, Q4_K) - long-form speech-to-text with speaker diarization. Returns per-speaker JSON segments with start/end timestamps. English-only. ~10 GB download.

Repository: localaiLicense: mit

parakeet-cpp-tdt_ctc-110m

Hybrid TDT+CTC FastConformer, 110M. F16 GGUF for the parakeet-cpp backend (C++/ggml port of NVIDIA NeMo Parakeet), byte-identical to NeMo at WER 0. Faster than NeMo on CPU and GPU.

Repository: localaiLicense: cc-by-4.0

parakeet-cpp-realtime_eou_120m-v1

Cache-aware streaming RNNT FastConformer with end-of-utterance (EOU) detection, 120M. Use with streaming transcription. F16 GGUF for the parakeet-cpp backend (C++/ggml port of NVIDIA NeMo Parakeet), byte-identical to NeMo at WER 0. Faster than NeMo on CPU and GPU.

Repository: localaiLicense: cc-by-4.0

parakeet-cpp-ctc-0.6b

CTC FastConformer, 0.6B. F16 GGUF for the parakeet-cpp backend (C++/ggml port of NVIDIA NeMo Parakeet), byte-identical to NeMo at WER 0. Faster than NeMo on CPU and GPU.

Repository: localaiLicense: cc-by-4.0

parakeet-cpp-rnnt-0.6b

RNNT FastConformer, 0.6B. F16 GGUF for the parakeet-cpp backend (C++/ggml port of NVIDIA NeMo Parakeet), byte-identical to NeMo at WER 0. Faster than NeMo on CPU and GPU.

Repository: localaiLicense: cc-by-4.0

parakeet-cpp-tdt-0.6b-v2

TDT FastConformer, 0.6B (v2). F16 GGUF for the parakeet-cpp backend (C++/ggml port of NVIDIA NeMo Parakeet), byte-identical to NeMo at WER 0. Faster than NeMo on CPU and GPU.

Repository: localaiLicense: cc-by-4.0

parakeet-cpp-tdt-0.6b-v3

TDT FastConformer, 0.6B (v3, multilingual). F16 GGUF for the parakeet-cpp backend (C++/ggml port of NVIDIA NeMo Parakeet), byte-identical to NeMo at WER 0. Faster than NeMo on CPU and GPU.

Repository: localaiLicense: cc-by-4.0

parakeet-cpp-ctc-1.1b

CTC FastConformer, 1.1B. F16 GGUF for the parakeet-cpp backend (C++/ggml port of NVIDIA NeMo Parakeet), byte-identical to NeMo at WER 0. Faster than NeMo on CPU and GPU.

Repository: localaiLicense: cc-by-4.0

parakeet-cpp-rnnt-1.1b

RNNT FastConformer, 1.1B. F16 GGUF for the parakeet-cpp backend (C++/ggml port of NVIDIA NeMo Parakeet), byte-identical to NeMo at WER 0. Faster than NeMo on CPU and GPU.

Repository: localaiLicense: cc-by-4.0

parakeet-cpp-tdt-1.1b

TDT FastConformer, 1.1B. F16 GGUF for the parakeet-cpp backend (C++/ggml port of NVIDIA NeMo Parakeet), byte-identical to NeMo at WER 0. Faster than NeMo on CPU and GPU.

Repository: localaiLicense: cc-by-4.0

parakeet-cpp-tdt_ctc-1.1b

Hybrid TDT+CTC FastConformer, 1.1B. F16 GGUF for the parakeet-cpp backend (C++/ggml port of NVIDIA NeMo Parakeet), byte-identical to NeMo at WER 0. Faster than NeMo on CPU and GPU.

Repository: localaiLicense: cc-by-4.0

parakeet-cpp-nemotron-3.5-asr-streaming-0.6b

Multilingual (40+ locales), prompt-conditioned, cache-aware streaming FastConformer RNN-T, 0.6B. Q8_0 GGUF for the parakeet-cpp backend (C++/ggml port of NVIDIA NeMo). Byte-identical to NeMo at WER 0 offline and streaming, about 2.5x faster than NeMo on CPU with no GPU. Select a language with the request "language" field (for example en, de, es, ja-JP), or leave it empty for automatic detection. License OpenMDW-1.1.

Repository: localaiLicense: other

moss-transcribe-cpp-0.9b

MOSS-Transcribe-Diarize 0.9B, an end-to-end audio understanding model for joint multi-speaker transcription, speaker diarization and timestamping in a single pass. Q5_K GGUF for the moss-transcribe-cpp backend (C++/ggml port of OpenMOSS MOSS-Transcribe-Diarize), byte-identical to the reference at ~1/6 the size. Faster than the reference PyTorch runtime on CPU (1.6 to 2.2x).

Repository: localaiLicense: apache-2.0

parakeet-crispasr

NVIDIA Parakeet TDT 0.6B v3 (FastConformer + Token-and-Duration Transducer), 25-language ASR. Runs via the CrispASR backend. Default GGUF size ~467 MB.

Repository: localai

parakeet-v2-crispasr

NVIDIA Parakeet TDT 0.6B v2 (FastConformer + TDT), English-only ASR. Runs via the CrispASR backend. Default GGUF size ~468 MB.

Repository: localai

Page 1