Feature Request: Implement Deadline-Based Priority IntraFlow Dispatch Policy #1913

@googs1025

Hello maintainers,

I'd like to propose implementing a new intra-flow dispatch policy that prioritizes requests based on their deadline urgency (i.e., SLO-driven scheduling).

Motivation

In production LLM serving, different requests often have different latency requirements:

  • Interactive user queries may require sub-second responses.

The current FCFS policy orders requests purely by arrival time, so a time-sensitive request can be stuck behind long-running or low-priority requests that arrived earlier in the same flow (e.g., same model). A deadline-aware policy would improve SLO compliance and user experience.

Proposed Design

  • Policy Name: DeadlinePriority
  • Mechanism:
    • Compute absolute deadline as EnqueueTime() + EffectiveTTL()
    • Prioritize requests with earlier absolute deadlines
    • Use FCFS as tie-breaker for requests with identical deadlines
  • Queue Requirement: CapabilityPriorityConfigurable (e.g., heap-based priority queue)
  • Backward Compatibility: Requests without a TTL are treated as lowest priority and are ordered among themselves by FCFS, so existing traffic keeps its current behavior.

Benefits

  • Enables per-request SLO enforcement
  • Improves tail latency for time-sensitive workloads
  • Fully leverages existing EffectiveTTL and EnqueueTime metadata

I’m happy to contribute an initial implementation if this aligns with the project’s direction. Please let me know your thoughts!

Thank you!

Labels: needs-triage
