Implement NSCache to cache MLX ModelContext #90

noorbhatia · 2026-01-13T15:34:28Z

Implements simple caching for MLXLanguage.
A possible solution for #89.
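For reference, here is a minimal sketch of the basic idea. It is not the code from this PR: LoadedModel is a hypothetical stand-in for MLX's ModelContext, and ModelCache is a hypothetical wrapper. Loaded models are stored in an NSCache keyed by model identifier, so a repeat request skips the load while the system can still evict entries when memory is tight.

```swift
import Foundation

// Hypothetical stand-in for MLX's ModelContext (the real type is not used here).
final class LoadedModel {
    let id: String
    init(id: String) { self.id = id }
}

// Hypothetical NSCache wrapper keyed by model identifier.
final class ModelCache {
    private let cache = NSCache<NSString, LoadedModel>()

    init(countLimit: Int = 2) {
        // countLimit is a hint, not a strict limit; NSCache may evict entries at any time.
        cache.countLimit = countLimit
    }

    // Returns the cached model for `id`, or calls `load` and caches the result.
    func model(for id: String, load: (String) throws -> LoadedModel) rethrows -> LoadedModel {
        if let cached = cache.object(forKey: id as NSString) {
            return cached
        }
        let loaded = try load(id)
        cache.setObject(loaded, forKey: id as NSString)
        return loaded
    }

    func removeAll() {
        cache.removeAllObjects()
    }
}
```

This version does nothing to stop two concurrent callers from both missing the cache and loading the same model twice.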

noorbhatia · 2026-01-13T15:37:46Z

@mattt, could you please take a look at this? I tested the PR with https://github.com/mattt/chat-ui-swift and I’m seeing an issue with how conversation history is handled: the model appears to respond to the previous (n-1) prompt rather than the latest one, which suggests the chat state is getting out of sync somewhere.

That said, the cache itself seems to be working fine.

mattt · 2026-01-13T18:42:41Z

@noorbhatia Oh, nice. Thank you for opening this PR in response to the issue you filed earlier. I'm wrapping up work on a Swift implementation of Xet. Once I cut that initial release, I'll take a look at this next.

mattt · 2026-01-15T12:27:24Z

Hi @noorbhatia. Thanks for your patience. I just pushed ee7e4dc, which extends this approach with an actor-coordinated cache that coalesces concurrent model loads per key. That way, we avoid duplicate work while still getting the same eviction behavior from NSCache. The implementation has a bit more overhead and complexity, but that seems like a reasonable trade-off.
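A minimal sketch of per-key load coalescing, assuming hypothetical LoadedModel and CoalescingModelCache types (this is not the code in ee7e4dc). The actor keeps one in-flight Task per key, and later callers for the same key await that task instead of starting a second load. The stand-in LoadedModel is deliberately Sendable so it can flow through a Task and out of the actor; as mattt's follow-up comment explains, the real ModelContext is not.

```swift
import Foundation

// Hypothetical Sendable stand-in for a loaded model.
final class LoadedModel: Sendable {
    let id: String
    init(id: String) { self.id = id }
}

// Hypothetical actor-coordinated cache: NSCache for storage and eviction,
// plus a table of in-flight loads so concurrent requests share one load.
actor CoalescingModelCache {
    private let cache = NSCache<NSString, LoadedModel>()
    private var inFlight: [String: Task<LoadedModel, Error>] = [:]

    func model(
        for id: String,
        load: @escaping @Sendable (String) async throws -> LoadedModel
    ) async throws -> LoadedModel {
        if let cached = cache.object(forKey: id as NSString) {
            return cached
        }
        // Another caller is already loading this key: share its result.
        if let task = inFlight[id] {
            return try await task.value
        }
        let task = Task { try await load(id) }
        inFlight[id] = task
        // Clear the in-flight entry whether the load succeeds or throws.
        defer { inFlight[id] = nil }
        let loaded = try await task.value
        cache.setObject(loaded, forKey: id as NSString)
        return loaded
    }
}
```

Only the cache and the in-flight table are protected by the actor; the load itself runs in the caller-supplied closure.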

How does that look to you?

mattt · 2026-01-15T12:37:31Z

FWIW, this implementation uses classes and locks. I tried a pure actor approach, but it ran into Swift 6’s strict Sendable rules: ModelContext is non‑Sendable because it holds any LanguageModel, any UserInputProcessor, and Tokenizer, so you can’t move it across actor boundaries. Even if you keep it inside an actor, calls like processor.prepare(input:) are nonisolated, so Swift still flags "sending risks data race" when you pass actor‑isolated values into those APIs. Streaming also pushes you toward crossing the boundary.
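A rough sketch of what a class-and-lock version can look like, assuming a hypothetical non-Sendable LoadedModel in place of ModelContext and a hypothetical LockedModelCache (not the PR's implementation). An NSCondition guards the set of keys currently being loaded: threads asking for a key that is already loading wait until that load finishes, then read the result from the NSCache. The @unchecked Sendable conformance is the programmer's promise that the lock makes this safe; the compiler does not verify it.

```swift
import Foundation

// Hypothetical non-Sendable stand-in for ModelContext.
final class LoadedModel {
    let id: String
    init(id: String) { self.id = id }
}

// Hypothetical lock-based cache that coalesces concurrent loads per key.
final class LockedModelCache: @unchecked Sendable {
    private let cache = NSCache<NSString, LoadedModel>()
    private let condition = NSCondition()
    private var loading: Set<String> = []

    func model(for id: String, load: (String) throws -> LoadedModel) rethrows -> LoadedModel {
        condition.lock()
        // Wait while another thread is loading the same key.
        while loading.contains(id) {
            condition.wait()
        }
        if let cached = cache.object(forKey: id as NSString) {
            condition.unlock()
            return cached
        }
        loading.insert(id)
        condition.unlock()

        // On exit (success or throw), release the key and wake any waiters.
        defer {
            condition.lock()
            loading.remove(id)
            condition.broadcast()
            condition.unlock()
        }
        // Load outside the lock so requests for other keys are not blocked.
        let loaded = try load(id)
        cache.setObject(loaded, forKey: id as NSString)
        return loaded
    }
}
```

If a load throws, waiting threads wake up, miss the cache, and one of them retries the load.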

To make this work cleanly, MLX would need to make ModelContext and its components Sendable with real thread‑safety guarantees, or provide actor‑safe APIs so all interaction stays within an actor. Without those upstream changes, we're stuck either using @unchecked Sendable or keeping all MLX work inside an actor and only emitting Sendable outputs.
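To illustrate the second option, here is a sketch that keeps non-Sendable state inside an actor and only emits String values. FakeEngine and GenerationSession are hypothetical stand-ins for the real ModelContext, processor, and tokenizer, not MLX APIs. Because the engine is created in the actor's initializer and never returned, it never crosses an isolation boundary; callers only see Sendable outputs, either as an array or as an AsyncStream.

```swift
import Foundation

// Hypothetical non-Sendable engine standing in for ModelContext and friends.
final class FakeEngine {
    private let vocabulary: [String]
    init(vocabulary: [String]) { self.vocabulary = vocabulary }

    // Returns the token for `step`, or nil when generation is done.
    func nextToken(at step: Int) -> String? {
        step < vocabulary.count ? vocabulary[step] : nil
    }
}

// All non-Sendable state stays isolated to this actor; only Strings leave it.
actor GenerationSession {
    private let engine: FakeEngine

    init(vocabulary: [String]) {
        // Built inside the actor, so the engine is never sent across a boundary.
        self.engine = FakeEngine(vocabulary: vocabulary)
    }

    // Non-streaming API: returns a Sendable array of tokens.
    func generate(maxTokens: Int) -> [String] {
        var output: [String] = []
        var step = 0
        while step < maxTokens, let token = engine.nextToken(at: step) {
            output.append(token)
            step += 1
        }
        return output
    }

    // Streaming API: tokens are produced on the actor and yielded as Strings.
    nonisolated func stream(maxTokens: Int) -> AsyncStream<String> {
        AsyncStream { continuation in
            let task = Task { await self.produce(maxTokens: maxTokens, into: continuation) }
            // Stop producing if the consumer stops listening.
            continuation.onTermination = { _ in task.cancel() }
        }
    }

    private func produce(maxTokens: Int, into continuation: AsyncStream<String>.Continuation) {
        var step = 0
        while step < maxTokens, !Task.isCancelled, let token = engine.nextToken(at: step) {
            continuation.yield(token)
            step += 1
        }
        continuation.finish()
    }
}
```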

(Alternatively, it may just be that I'm not smart enough to figure out how to make this work)

Implement NSCache to cache MLX ModelContext (650e3a0)

Implement actor-coalesced MLX model cache backed by NSCache (ee7e4dc)

mattt force-pushed the noor/mlx-cache branch from 0a95762 to ee7e4dc on January 15, 2026 12:24

2 participants
