Model Gallery

26 models from 1 repository

streaming-zipformer-en-sherpa
Streaming English ASR: sherpa-onnx zipformer transducer (int8, chunk-16 left-128). Low-latency real-time transcription with endpoint detection via sherpa-onnx's online recognizer. English-only; for multilingual offline ASR see omnilingual-0.3b-ctc-q8-sherpa.

Repository: localai | License: apache-2.0

lfm2.5-audio-1.5b-asr
LFM2.5-Audio-1.5B in ASR mode. System prompt `Perform ASR.` is prepended; output is capitalised and punctuated. Wire this entry as a transcription model on the /v1/audio/transcriptions endpoint.

Repository: localai | License: LFM-Open-License-v1.0
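
As a rough illustration of how the lfm2.5-audio-1.5b-asr entry above might be called, the sketch below builds (but does not send) a multipart request for the /v1/audio/transcriptions endpoint. It assumes the endpoint accepts the OpenAI-style form fields `file` and `model`; the base URL, the helper name `build_transcription_request`, and the default filename are hypothetical and not taken from the gallery.

```python
import urllib.request
import uuid

# Assumption: a server listening on this address; adjust for your deployment.
BASE_URL = "http://localhost:8080"


def build_transcription_request(model, audio_bytes, filename="audio.wav",
                                base_url=BASE_URL):
    """Build a multipart/form-data POST for /v1/audio/transcriptions.

    Hypothetical helper: the form field names `model` and `file` follow the
    OpenAI-style transcription API and are assumed to apply here.
    """
    boundary = uuid.uuid4().hex
    parts = [
        # Plain text field carrying the gallery model name.
        (f"--{boundary}\r\n"
         'Content-Disposition: form-data; name="model"\r\n\r\n'
         f"{model}\r\n").encode("utf-8"),
        # File field header, followed by the raw audio bytes.
        (f"--{boundary}\r\n"
         f'Content-Disposition: form-data; name="file"; filename="{filename}"\r\n'
         "Content-Type: application/octet-stream\r\n\r\n").encode("utf-8"),
        audio_bytes,
        # Closing boundary terminates the multipart body.
        f"\r\n--{boundary}--\r\n".encode("utf-8"),
    ]
    return urllib.request.Request(
        url=f"{base_url}/v1/audio/transcriptions",
        data=b"".join(parts),
        headers={"Content-Type": f"multipart/form-data; boundary={boundary}"},
        method="POST",
    )
```

Passing the returned object to `urllib.request.urlopen` would send it to a running server. Building the request separately from sending it keeps the sketch easy to inspect without a live backend.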

rfdetr-cpp-nano
RF-DETR Nano object detection model, served via the native rfdetr.cpp backend (ggml + purego, no Python). Q8_0 quantization is the recommended default for CPU: same accuracy as F16/F32, ~20 MB on disk, fastest CPU latency. Drop-in for the /v1/detection endpoint.

Repository: localai | License: apache-2.0
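
A minimal sketch of what a call to the /v1/detection endpoint named in the rfdetr-cpp-nano entry above could look like. It only builds the request and does not send it. The JSON field names `model` and `image`, the base64 encoding of the image, the base URL, and the helper name `build_detection_request` are assumptions for illustration; check the server's API documentation before relying on them.

```python
import base64
import json
import urllib.request

# Assumption: a server listening on this address; adjust for your deployment.
BASE_URL = "http://localhost:8080"


def build_detection_request(model, image_bytes, base_url=BASE_URL):
    """Build a JSON POST for /v1/detection (hypothetical helper)."""
    payload = {
        "model": model,
        # Assumed field name and encoding: the image as a base64 string.
        "image": base64.b64encode(image_bytes).decode("ascii"),
    }
    return urllib.request.Request(
        url=f"{base_url}/v1/detection",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

The same helper should work for the other detection and segmentation entries by changing the `model` argument, provided they share the request format assumed here.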

rfdetr-cpp-small
RF-DETR Small object detection model (DINOv2-small backbone, 512px input, 3 decoder layers), served via the native rfdetr.cpp backend (ggml + purego, no Python). A step up from Nano in accuracy while staying lightweight on CPU. F16 quantization is the recommended default: identical accuracy to F32 at roughly half the size. Drop-in for the /v1/detection endpoint.

Repository: localai | License: apache-2.0

rfdetr-cpp-medium
RF-DETR Medium object detection model (DINOv2-small backbone, 576px input, 4 decoder layers), served via the native rfdetr.cpp backend. Balanced detection quality vs. CPU latency — recommended when Base is not accurate enough but Large is too slow. F16 quantization is the recommended default: identical accuracy to F32, half the size. Drop-in for the /v1/detection endpoint.

Repository: localai | License: apache-2.0

rfdetr-cpp-large
RF-DETR Large object detection model (DINOv2-small backbone, 704px input, 4 decoder layers), served via the native rfdetr.cpp backend. Highest-accuracy detection variant — best for offline workflows and high-resolution inputs where CPU latency is secondary to recall. F16 quantization is the recommended default: identical accuracy to F32, half the size. Drop-in for the /v1/detection endpoint.

Repository: localai | License: apache-2.0

rfdetr-cpp-seg-nano
RF-DETR Seg-Nano instance segmentation model (DINOv2-small backbone, 312px input, 4 decoder layers, 100 queries), served via the native rfdetr.cpp backend. Smallest segmentation variant — fastest CPU latency, ideal for edge deployment. Returns both bounding boxes and per-instance masks via the /v1/detection endpoint. F16 quantization is the recommended default: identical accuracy to F32, half the size.

Repository: localai | License: apache-2.0

rfdetr-cpp-seg-small
RF-DETR Seg-Small instance segmentation model (DINOv2-small backbone, 384px input, 4 decoder layers, 100 queries), served via the native rfdetr.cpp backend. Step up from Seg-Nano in mask quality while staying CPU-friendly. Returns both bounding boxes and per-instance masks via the /v1/detection endpoint. F16 quantization is the recommended default: identical accuracy to F32, half the size.

Repository: localai | License: apache-2.0

rfdetr-cpp-seg-medium
RF-DETR Seg-Medium instance segmentation model (DINOv2-small backbone, 432px input, 5 decoder layers, 200 queries), served via the native rfdetr.cpp backend. Balanced segmentation quality vs. CPU latency — recommended for everyday segmentation workloads. Returns both bounding boxes and per-instance masks via the /v1/detection endpoint. F16 quantization is the recommended default.

Repository: localai | License: apache-2.0

rfdetr-cpp-seg-large
RF-DETR Seg-Large instance segmentation model (DINOv2-small backbone, 504px input, 5 decoder layers, 200 queries), served via the native rfdetr.cpp backend. Higher-resolution input than Seg-Medium for sharper mask boundaries. Returns both bounding boxes and per-instance masks via the /v1/detection endpoint. F16 quantization is the recommended default: identical accuracy to F32, half the size.

Repository: localai | License: apache-2.0

rfdetr-cpp-seg-xlarge
RF-DETR Seg-XLarge instance segmentation model (DINOv2-small backbone, 624px input, 6 decoder layers, 300 queries), served via the native rfdetr.cpp backend. High-capacity segmentation variant with more queries and deeper decoder — best for dense scenes with many instances. Returns both bounding boxes and per-instance masks via the /v1/detection endpoint. F16 quantization is the recommended default.

Repository: localai | License: apache-2.0

rfdetr-cpp-seg-2xlarge
RF-DETR Seg-2XLarge instance segmentation model (DINOv2-small backbone, 768px input, 6 decoder layers, 300 queries), served via the native rfdetr.cpp backend. Highest-accuracy segmentation variant — best for offline workflows and high-resolution inputs where CPU latency is secondary to mask quality. Returns both bounding boxes and per-instance masks via the /v1/detection endpoint. F16 quantization is the recommended default: identical accuracy to F32, half the size.

Repository: localai | License: apache-2.0

edgetam
EdgeTAM is an ultra-efficient variant of the Segment Anything Model (SAM) for image segmentation. It uses a RepViT backbone and is only ~16MB quantized (Q4_0), making it ideal for edge deployment. Supports point-prompted and box-prompted image segmentation via the /v1/detection endpoint. Powered by sam3.cpp (C/C++ with GGML).

Repository: localai | License: apache-2.0

nbeerbower_qwen3-gutenberg-encore-14b
nbeerbower/Xiaolong-Qwen3-14B fine-tuned on jondurbin/gutenberg-dpo-v0.1, nbeerbower/gutenberg2-dpo, nbeerbower/gutenberg-moderne-dpo, nbeerbower/synthetic-fiction-dpo, nbeerbower/Arkhaios-DPO, nbeerbower/Purpura-DPO, and nbeerbower/Schule-DPO.

Repository: localai | License: apache-2.0

fireball-meta-llama-3.2-8b-instruct-agent-003-128k-code-dpo
This model is a quantized version of EpistemeAI/Fireball-Meta-Llama-3.2-8B-Instruct-agent-003-128k-code-DPO, an experimental fine-tune on a DPO dataset intended to make Llama 3.1 8B an agentic coder. It has built-in agent features such as search, calculator, and ReAct. Other notable features include self-learning using Unsloth, RAG applications, and memory. The context window is 128K. It can be integrated into projects using libraries such as Transformers and vLLM, and is suitable for use with LangChain or LlamaIndex. The model is developed by EpistemeAI and licensed under the Apache 2.0 license.

Repository: localai | License: apache-2.0

llama-3.1-techne-rp-8b-v1
athirdpath/Llama-3.1-Instruct_NSFW-pretrained_e1-plus_reddit was further trained in the order below:
SFT: Doctor-Shotgun/no-robots-sharegpt, grimulkan/LimaRP-augmented, Inv/c2-logs-cleaned-deslopped
DPO: jondurbin/truthy-dpo-v0.1, Undi95/Weyaxi-humanish-dpo-project-noemoji, athirdpath/DPO_Pairs-Roleplay-Llama3-NSFW

Repository: localai | License: llama3.1

calme-2.3-legalkit-8b-i1
This model is an advanced iteration of the powerful meta-llama/Meta-Llama-3.1-8B-Instruct, specifically fine-tuned to enhance its capabilities in the legal domain. The fine-tuning process utilized a synthetically generated dataset derived from the French LegalKit, a comprehensive legal language resource. To create this specialized dataset, I used the NousResearch/Nous-Hermes-2-Mixtral-8x7B-DPO model in conjunction with Hugging Face's Inference Endpoint. This approach allowed for the generation of high-quality, synthetic data that incorporates Chain of Thought (CoT) and advanced reasoning in its responses. The resulting model combines the robust foundation of Llama-3.1-8B with tailored legal knowledge and enhanced reasoning capabilities. This makes it particularly well-suited for tasks requiring in-depth legal analysis, interpretation, and application of French legal concepts.

Repository: localai | License: llama3.1

fireball-llama-3.11-8b-v1orpo
Developed by: EpistemeAI
License: apache-2.0
Fine-tuned from model: unsloth/Meta-Llama-3.1-8B-Instruct-bnb-4bit
Fine-tuning methods: DPO (Direct Preference Optimization) and ORPO (Odds Ratio Preference Optimization)

Repository: localai | License: apache-2.0

humanish-roleplay-llama-3.1-8b-i1
A Llama-3.1 model DPO-tuned to behave more "humanish", i.e., avoiding all the AI assistant slop. It also works for role-play (RP). To achieve this, the model was fine-tuned over a series of datasets: (1) general conversations from Claude Opus, from Undi95/Meta-Llama-3.1-8B-Claude; (2) Undi95/Weyaxi-humanish-dpo-project-noemoji, to make the model react as a human, rejecting assistant-like or overly neutral responses; (3) ResplendentAI/NSFW_RP_Format_DPO, to steer the model towards using the *action* format in RP settings. This works best if you also use the format naturally in your first message (see example).

Repository: localai | License: apache-2.0

llama3.1-flammades-70b
nbeerbower/Llama3.1-Gutenberg-Doppel-70B finetuned on flammenai/Date-DPO-NoAsterisks and jondurbin/truthy-dpo-v0.1.

Repository: localai | License: llama3.1

llama3.1-gutenberg-doppel-70b
mlabonne/Hermes-3-Llama-3.1-70B-lorablated finetuned on jondurbin/gutenberg-dpo-v0.1 and nbeerbower/gutenberg2-dpo.

Repository: localai | License: llama3.1
