Model Gallery

4 models from 1 repository

kalomaze_qwen3-16b-a3b
A man-made horror beyond your comprehension. But no, seriously, this is my experiment to (1) measure, per layer, the probability that any given expert will activate (over my personal set of fairly diverse calibration data), and (2) prune the 64 least used of the 128 experts in each layer (with the router reordered and re-indexed per layer). It can still write semi-coherently without any additional training or distillation on top of what it inherits from the original 30B MoE. The .txt files with the original measurements are provided in the repo along with the exported weights. The expert measurements were taken with a hacked version of vLLM, and I then wrote a bespoke script to selectively export the weights according to those measurements.

Repository: localai
License: apache-2.0
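
The pruning step described for kalomaze_qwen3-16b-a3b can be illustrated with a minimal sketch. It is not the author's export script: the function names, the toy activation counts, and the list-based "weights" are all assumptions made for illustration, and a real implementation would operate on the router and expert tensors of each MoE layer.

```python
# Hypothetical sketch of per-layer expert pruning. None of these names come
# from the repository; the data below is invented for illustration.

def select_experts(activation_counts, keep):
    """Return the indices of the `keep` most frequently activated experts,
    in ascending index order so the surviving experts keep their relative order."""
    ranked = sorted(
        range(len(activation_counts)),
        key=lambda i: activation_counts[i],
        reverse=True,
    )
    return sorted(ranked[:keep])


def prune_layer(router_rows, expert_weights, activation_counts, keep):
    """Drop the least used experts and re-index the router rows to match."""
    kept = select_experts(activation_counts, keep)
    new_router = [router_rows[i] for i in kept]      # one router row per surviving expert
    new_experts = [expert_weights[i] for i in kept]  # surviving experts, renumbered 0..keep-1
    return kept, new_router, new_experts


# Toy layer with 8 experts; keep half of them, mirroring the 64-of-128 ratio.
counts = [50, 3, 40, 1, 0, 22, 7, 35]
router = [[float(i)] for i in range(8)]
experts = [f"expert_{i}" for i in range(8)]

kept, new_router, new_experts = prune_layer(router, experts, counts, keep=4)
print(kept)  # [0, 2, 5, 7]
```

The same selection would be repeated independently for every layer, since the description says usage is measured and experts are pruned per layer.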

bge-m3-colbert
BAAI/bge-m3 loaded by the rerankers backend in ColBERT (late-interaction MaxSim) mode. Pairs with the `colbert` router classifier to score policy descriptions against the prompt without an LLM round-trip — robust on abstract or short labels where next-token scoring with Arch-Router-style models is noisy.

Repository: localai
License: mit
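
The late-interaction MaxSim scoring mentioned for bge-m3-colbert can be sketched as follows. This is a simplified illustration, not the rerankers backend or LocalAI code: the tiny two-dimensional vectors stand in for per-token embeddings that the model would produce, and treating the prompt as the query and each policy description as the document is an assumption.

```python
import math

# Hypothetical sketch of ColBERT-style MaxSim scoring with toy embeddings.

def normalize(vec):
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm else list(vec)


def maxsim(query_tokens, doc_tokens):
    """For each query token, take its best cosine similarity against any
    document token, then sum those maxima over the query tokens."""
    q = [normalize(v) for v in query_tokens]
    d = [normalize(v) for v in doc_tokens]
    return sum(
        max(sum(a * b for a, b in zip(qv, dv)) for dv in d)
        for qv in q
    )


def best_policy(prompt_tokens, policies):
    """Return the policy name whose description scores highest against the prompt."""
    return max(policies, key=lambda name: maxsim(prompt_tokens, policies[name]))


# Made-up token embeddings for a prompt and two policy descriptions.
prompt = [[1.0, 0.0], [0.0, 1.0]]
policies = {
    "code": [[1.0, 0.1], [0.1, 1.0]],
    "chitchat": [[-1.0, 0.0], [0.0, -1.0]],
}
print(best_policy(prompt, policies))  # code
```

Because scoring reduces to dot products over precomputed token embeddings, no text generation is involved, which is consistent with the "without an LLM round-trip" remark above.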

arch-router-1.5b-q4
Arch-Router-1.5B is a compact router LLM from Katanemo, fine-tuned from Qwen2.5-1.5B-Instruct. Given a prompt and a set of user-defined route policies (domain + action), it picks the best-matching policy name so requests can be dispatched to the appropriate downstream model. Designed for low-latency, high-throughput use inside the Arch proxy, it pairs with LocalAI's router classifier as a preference-aligned alternative to embedding/ColBERT-based routing on concrete, well-described policies.

Repository: localai
License: other
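
The dispatch flow described for Arch-Router (pick a policy name, then send the request to the matching downstream model) might look roughly like the sketch below. Everything in it is hypothetical: the route table, the model names, and the fallback are invented, and `pick_policy` is a crude word-overlap stand-in for the router LLM, not how Arch-Router or LocalAI's router classifier actually scores policies.

```python
# Hypothetical route table: policy names, descriptions and model names are made up.
ROUTES = {
    "code_generation": {
        "description": "write or fix source code",
        "model": "coder-model",
    },
    "general_chat": {
        "description": "casual conversation and small talk",
        "model": "chat-model",
    },
}
FALLBACK_MODEL = "chat-model"  # assumed default when no policy matches


def pick_policy(prompt, routes):
    """Stand-in for the router model: return the policy whose description
    shares the most words with the prompt, or None if nothing overlaps."""
    words = set(prompt.lower().split())
    scores = {
        name: len(words & set(route["description"].split()))
        for name, route in routes.items()
    }
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else None


def dispatch(prompt, routes=ROUTES):
    """Map the chosen policy name to a downstream model, with a fallback."""
    policy = pick_policy(prompt, routes)
    if policy is None or policy not in routes:
        return FALLBACK_MODEL
    return routes[policy]["model"]


print(dispatch("please fix this source file"))  # coder-model
```

In a real deployment the stand-in would be replaced by a call to the router model, which returns the policy name; the lookup and fallback logic would stay the same.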

arch-router-1.5b-q8
Arch-Router-1.5B is a compact router LLM from Katanemo, fine-tuned from Qwen2.5-1.5B-Instruct. Given a prompt and a set of user-defined route policies (domain + action), it picks the best-matching policy name so requests can be dispatched to the appropriate downstream model. Designed for low-latency, high-throughput use inside the Arch proxy, it pairs with LocalAI's router classifier as a preference-aligned alternative to embedding/ColBERT-based routing on concrete, well-described policies.

Repository: localai
License: other