vLLM Semantic Router


πŸ“š Complete Documentation | πŸš€ Quick Start | πŸ“£ Blog | πŸ“– Publications


Latest News πŸ”₯


Goals

We are building system-level intelligence for Mixture-of-Models (MoM), bringing collective intelligence into LLM systems by answering the following questions:

  1. How to capture the missing signals in requests, responses, and context?
  2. How to combine those signals to make better decisions?
  3. How to make different models collaborate more efficiently?
  4. How to protect both the real world and the LLM system from jailbreaks, PII leaks, and hallucinations?
  5. How to collect valuable signals and build a self-learning system?


Where it lives

It lives between the real world and models:

(Figure: the router sits between the real world and the models)

Architecture

A quick overview of the current architecture:

(Figure: architecture overview)

Quick Start

Installation

Tip

We recommend setting up a Python virtual environment to manage dependencies.

$ python -m venv vsr
$ source vsr/bin/activate
$ pip install vllm-sr

The installation succeeded if you see the following help message:

$ vllm-sr

       _ _     __  __       ____  ____
__   _| | |_ _|  \/  |     / ___||  _ \
\ \ / / | | | | |\/| |_____\___ \| |_) |
 \ V /| | | |_| | |  |_____|___) |  _ <
  \_/ |_|_|\__,_|_|  |     |____/|_| \_\

vLLM Semantic Router - Intelligent routing for vLLM

Usage: vllm-sr [OPTIONS] COMMAND [ARGS]...

  vLLM Semantic Router CLI - Intelligent routing and caching for vLLM
  endpoints.

Options:
  --version  Show version and exit.
  --help     Show this message and exit.

Commands:
  config  Print generated configuration.
  init    Initialize vLLM Semantic Router configuration.
  logs    Show logs from vLLM Semantic Router service.
  serve   Start vLLM Semantic Router.
  status  Show status of vLLM Semantic Router services.
  stop    Stop vLLM Semantic Router.
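
A typical session uses only the subcommands listed above (a minimal sketch with all flags omitted; each subcommand documents its own options via --help):

# Generate an initial configuration
vllm-sr init

# Print the generated configuration to verify it
vllm-sr config

# Start the router, confirm it is running, and follow its logs
vllm-sr serve
vllm-sr status
vllm-sr logs

# Shut everything down when finished
vllm-sr stop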

Tip

You can set the HF_ENDPOINT, HF_TOKEN, and HF_HOME environment variables to configure the Hugging Face endpoint, access token, and cache location.

# Set environment variables (optional)
export HF_ENDPOINT=https://huggingface.co  # Or use mirror: https://hf-mirror.com
export HF_TOKEN=your_token_here  # Only for gated models
export HF_HOME=/path/to/cache  # Optional: custom cache directory

# Start the service - models download automatically
# Environment variables are automatically passed to the container
vllm-sr serve
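
Once the service is up, it should accept OpenAI-compatible chat completion requests. A minimal smoke test is sketched below; the listen address (localhost:8801) and the "auto" model name are assumptions rather than values from this README, so check the output of vllm-sr config for your actual endpoint and model names:

# Assumption: the router listens on localhost:8801 with an OpenAI-compatible
# API and routes automatically when "model" is set to "auto"; verify both
# against the output of `vllm-sr config` before running.
curl -sS http://localhost:8801/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "auto",
    "messages": [{"role": "user", "content": "What is 2 + 2?"}]
  }'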

Documentation πŸ“–

For comprehensive documentation including detailed setup instructions, architecture guides, and API references, visit:

Complete Documentation at Read the Docs


Community πŸ‘‹

For questions, feedback, or to contribute, please join #semantic-router channel in vLLM Slack.

Community Meetings πŸ“…

We host bi-weekly community meetings to sync up with contributors across different time zones.

Join us to discuss the latest developments, share ideas, and collaborate on the project!

Citation

If you find Semantic Router helpful in your research or projects, please consider citing it:

@misc{semanticrouter2025,
  title={vLLM Semantic Router},
  author={vLLM Semantic Router Team},
  year={2025},
  howpublished={\url{https://github.com/vllm-project/semantic-router}},
}

Star History πŸ”₯

We open-sourced the project on Aug 31, 2025. We love open source and collaboration ❀️


Sponsors πŸ‘‹

We are grateful to our sponsors who support us:


AMD provides us with GPU resources and ROCmβ„’ software for training and researching frontier router models, enhancing end-to-end testing, and building an online model playground.