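NVIDIA NeMo Parakeet TDT 0.6B v3 is an automatic speech recognition (ASR) model from NVIDIA's NeMo toolkit. Parakeet models are high-accuracy ASR models trained on large-scale audio data; the v3 checkpoint is multilingual (see parakeet-cpp-tdt-0.6b-v3), while earlier checkpoints such as v2 target English.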
Repository: localai
License: cc-by-4.0
nemo-parakeet-tdt-0.6b
Voxtral Mini 4B Realtime is a speech-to-text model from Mistral AI. It is a 4B parameter model optimized for fast, accurate audio transcription with low latency, making it ideal for real-time applications. The model uses the Voxtral architecture for efficient audio processing.
Repository: localai
License: apache-2.0
voxtral-mini-4b-realtime
Moonshine Tiny is a lightweight speech-to-text model optimized for fast transcription. It is designed for efficient on-device ASR with high accuracy relative to its size.
Repository: localai
License: apache-2.0
moonshine-tiny
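As a rough illustration of how an entry name from this listing might be used, the sketch below builds (but does not send) a transcription request. It assumes the model has been installed in a LocalAI instance that exposes its OpenAI-compatible `/v1/audio/transcriptions` endpoint; the base URL `http://localhost:8080`, the file name `sample.wav`, and the helper name `build_transcription_request` are illustrative assumptions, not part of the listing.

```python
import uuid
from urllib.request import Request


def build_transcription_request(
    model: str,
    filename: str,
    audio: bytes,
    base_url: str = "http://localhost:8080",  # assumed host and port
) -> Request:
    """Build a multipart/form-data POST carrying a model name and an audio file."""
    boundary = uuid.uuid4().hex
    body = b"".join(
        [
            # Form field naming the model, e.g. an entry name from this listing.
            f"--{boundary}\r\n"
            f'Content-Disposition: form-data; name="model"\r\n\r\n'
            f"{model}\r\n".encode(),
            # Form field carrying the raw audio bytes.
            f"--{boundary}\r\n"
            f'Content-Disposition: form-data; name="file"; filename="{filename}"\r\n'
            f"Content-Type: application/octet-stream\r\n\r\n".encode(),
            audio,
            f"\r\n--{boundary}--\r\n".encode(),
        ]
    )
    return Request(
        f"{base_url}/v1/audio/transcriptions",
        data=body,
        headers={"Content-Type": f"multipart/form-data; boundary={boundary}"},
        method="POST",
    )


# Placeholder bytes stand in for a real WAV file.
request = build_transcription_request("moonshine-tiny", "sample.wav", b"RIFF")
print(request.get_method(), request.full_url)
```

Passing the returned object to `urllib.request.urlopen` would send it; that step is left out here so the sketch runs without a server.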
WhisperX Tiny is a fast and accurate speech recognition model with speaker diarization capabilities. It is built on OpenAI's Whisper and adds alignment and speaker segmentation.
Repository: localai
License: mit
whisperx-tiny
Omnilingual ASR CTC 300M (int8) is a multilingual automatic speech recognition model supporting 1,600+ languages. Based on Meta's omniASR_CTC_300M architecture (Wav2Vec2 with CTC head), quantized to int8 for efficient inference. Uses the sherpa-onnx backend with ONNX Runtime.
Repository: localai
License: apache-2.0
omnilingual-0.3b-ctc-q8-sherpa
Streaming English ASR: sherpa-onnx zipformer transducer (int8, chunk-16 left-128). Low-latency real-time transcription with endpoint detection via sherpa-onnx's online recognizer. English-only; for multilingual offline ASR see omnilingual-0.3b-ctc-q8-sherpa.
Repository: localai
License: apache-2.0
streaming-zipformer-en-sherpa
Hybrid TDT+CTC FastConformer, 110M. F16 GGUF for the parakeet-cpp backend (C++/ggml port of NVIDIA NeMo Parakeet), byte-identical to NeMo at WER 0. Faster than NeMo on CPU and GPU.
Repository: localai
License: cc-by-4.0
parakeet-cpp-tdt_ctc-110m
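One reading of "WER 0" in the parakeet-cpp entries is a word error rate of zero when the port's transcript is scored against NeMo's transcript for the same audio. The sketch below shows a minimal WER computation under that assumption, using word-level edit distance; the function name and the sample sentences are illustrative and do not come from the parakeet-cpp project.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1      # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


# Hypothetical transcripts: identical strings score 0, one wrong word does not.
same = word_error_rate("the cat sat on the mat", "the cat sat on the mat")
one_error = word_error_rate("the cat sat on the mat", "the cat sat on a mat")
print(same, one_error)
```

Identical transcripts necessarily give a WER of 0, so "byte-identical" is the stronger of the two claims; a WER of 0 alone would still allow differences in whitespace.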
Cache-aware streaming RNNT FastConformer with end-of-utterance (EOU) detection, 120M. Use with streaming transcription. F16 GGUF for the parakeet-cpp backend (C++/ggml port of NVIDIA NeMo Parakeet), byte-identical to NeMo at WER 0. Faster than NeMo on CPU and GPU.
Repository: localai
License: cc-by-4.0
parakeet-cpp-realtime_eou_120m-v1
CTC FastConformer, 0.6B. F16 GGUF for the parakeet-cpp backend (C++/ggml port of NVIDIA NeMo Parakeet), byte-identical to NeMo at WER 0. Faster than NeMo on CPU and GPU.
Repository: localai
License: cc-by-4.0
parakeet-cpp-ctc-0.6b
RNNT FastConformer, 0.6B. F16 GGUF for the parakeet-cpp backend (C++/ggml port of NVIDIA NeMo Parakeet), byte-identical to NeMo at WER 0. Faster than NeMo on CPU and GPU.
Repository: localai
License: cc-by-4.0
parakeet-cpp-rnnt-0.6b
TDT FastConformer, 0.6B (v2). F16 GGUF for the parakeet-cpp backend (C++/ggml port of NVIDIA NeMo Parakeet), byte-identical to NeMo at WER 0. Faster than NeMo on CPU and GPU.
Repository: localai
License: cc-by-4.0
parakeet-cpp-tdt-0.6b-v2
TDT FastConformer, 0.6B (v3, multilingual). F16 GGUF for the parakeet-cpp backend (C++/ggml port of NVIDIA NeMo Parakeet), byte-identical to NeMo at WER 0. Faster than NeMo on CPU and GPU.
Repository: localai
License: cc-by-4.0
parakeet-cpp-tdt-0.6b-v3
CTC FastConformer, 1.1B. F16 GGUF for the parakeet-cpp backend (C++/ggml port of NVIDIA NeMo Parakeet), byte-identical to NeMo at WER 0. Faster than NeMo on CPU and GPU.
Repository: localai
License: cc-by-4.0
parakeet-cpp-ctc-1.1b
RNNT FastConformer, 1.1B. F16 GGUF for the parakeet-cpp backend (C++/ggml port of NVIDIA NeMo Parakeet), byte-identical to NeMo at WER 0. Faster than NeMo on CPU and GPU.
Repository: localai
License: cc-by-4.0
parakeet-cpp-rnnt-1.1b
TDT FastConformer, 1.1B. F16 GGUF for the parakeet-cpp backend (C++/ggml port of NVIDIA NeMo Parakeet), byte-identical to NeMo at WER 0. Faster than NeMo on CPU and GPU.
Repository: localai
License: cc-by-4.0
parakeet-cpp-tdt-1.1b
Hybrid TDT+CTC FastConformer, 1.1B. F16 GGUF for the parakeet-cpp backend (C++/ggml port of NVIDIA NeMo Parakeet), byte-identical to NeMo at WER 0. Faster than NeMo on CPU and GPU.
Repository: localai
License: cc-by-4.0
parakeet-cpp-tdt_ctc-1.1b
NVIDIA Parakeet TDT 0.6B Japanese ASR (F16 by default; this model is sensitive to Q4_K quantisation). Runs via the CrispASR backend. Default GGUF size ~1.24 GB.
Repository: localai
parakeet-ja-crispasr
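The listed default size can be sanity-checked with simple arithmetic: F16 stores each weight in 2 bytes, so a nominal 0.6B-parameter model needs roughly 1.2 GB for its weights. The sketch below assumes decimal gigabytes and treats "0.6B" as exactly 600 million parameters; the gap to the listed ~1.24 GB plausibly comes from the rounded parameter count and file metadata, though the listing does not say. The helper name is illustrative.

```python
def estimate_weights_gb(num_params: float, bytes_per_param: float = 2.0) -> float:
    """Approximate weight storage in decimal gigabytes (F16 uses 2 bytes per weight)."""
    return num_params * bytes_per_param / 1e9


# Nominal parameter count taken from the "0.6B" in the entry above.
f16_estimate_gb = estimate_weights_gb(0.6e9)
print(f"{f16_estimate_gb:.2f} GB")
```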