@noorbhatia
Contributor

Implement simple caching for MLXLanguage
A possible solution for #89

@noorbhatia
Contributor Author

noorbhatia commented Jan 13, 2026

@mattt could you please take a look at this? I tested the PR with https://github.com/mattt/chat-ui-swift and I’m seeing an issue with conversation history handling. The model appears to be responding to the previous (n-1) prompt rather than the latest one, which suggests the chat state may be getting out of sync somewhere.

The cache itself seems to be working fine, though.

@mattt
Owner

mattt commented Jan 13, 2026

@noorbhatia Oh, nice. Thank you for opening this PR in response to the issue you filed earlier. I'm wrapping up work on a Swift implementation of Xet. Once I cut that initial release, I'll take a look at this next.

@mattt
Owner

mattt commented Jan 15, 2026

Hi @noorbhatia. Thanks for your patience. I just pushed ee7e4dc, which extends this approach with an actor-coordinated cache that coalesces concurrent model loads per key. That way, we avoid duplicate work while still benefiting from NSCache's eviction behavior. This adds some overhead and complexity to the implementation, but that seems like a reasonable trade-off.

How does that look to you?
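
For readers following along, here is a minimal sketch of what "coalescing concurrent loads per key" can look like. It is not the code in ee7e4dc: the names `CoalescingCache` and `value(for:load:)` are hypothetical, a plain dictionary stands in for NSCache (so there is no eviction), and values are assumed to be `Sendable`, which the real model context is not.

```swift
import Foundation

/// Hypothetical sketch: callers asking for the same key while a load is
/// in flight share one Task instead of starting duplicate loads.
actor CoalescingCache<Key: Hashable & Sendable, Value: Sendable> {
    // Finished values. A real cache would likely use NSCache for eviction.
    private var values: [Key: Value] = [:]
    // Loads that have started but not yet finished, one per key.
    private var inFlight: [Key: Task<Value, Error>] = [:]

    func value(
        for key: Key,
        load: @escaping @Sendable () async throws -> Value
    ) async throws -> Value {
        // Fast path: the value is already cached.
        if let cached = values[key] { return cached }
        // A load for this key is already running, so wait for it.
        if let task = inFlight[key] { return try await task.value }

        // First caller for this key: start the load and record it
        // before the first suspension point, so later callers see it.
        let task = Task { try await load() }
        inFlight[key] = task
        do {
            let value = try await task.value
            values[key] = value
            inFlight[key] = nil
            return value
        } catch {
            // Drop the failed task so a later call can retry.
            inFlight[key] = nil
            throw error
        }
    }
}
```

Because the in-flight task is stored before the method first suspends, a second caller that enters the actor while the load is running finds the task and awaits it rather than loading again.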

@mattt
Owner

mattt commented Jan 15, 2026

FWIW, this implementation uses classes and locks. I tried a pure actor approach, but it ran into Swift 6’s strict Sendable rules: ModelContext is non‑Sendable because it holds any LanguageModel, any UserInputProcessor, and Tokenizer, so you can’t move it across actor boundaries. Even if you keep it inside an actor, calls like processor.prepare(input:) are nonisolated, so Swift still flags "sending risks data race" when you pass actor‑isolated values into those APIs. Streaming also pushes you toward crossing the boundary.

To make this work cleanly, MLX would need to make ModelContext and its components Sendable with real thread‑safety guarantees, or provide actor‑safe APIs so all interaction stays within an actor. Without those upstream changes, we're stuck either using @unchecked Sendable or keeping all MLX work inside an actor and only emitting Sendable outputs.

(Alternatively, it may just be that I'm not smart enough to figure out how to make this work)
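
The second option above (keep all model work inside an actor and emit only Sendable outputs) can be sketched roughly as follows. `FakeContext` is a hypothetical stand-in for a non-Sendable type such as ModelContext, and `ModelSession` is an illustrative name, not an API from MLX or this PR.

```swift
/// Hypothetical stand-in for a non-Sendable model context:
/// a class with mutable state and no synchronization.
final class FakeContext {
    var history: [String] = []

    func respond(to prompt: String) -> String {
        history.append(prompt)
        return "echo: \(prompt)"
    }
}

/// The non-Sendable context is created inside the actor and never
/// leaves it. Only Sendable values (String, Int) cross the boundary.
actor ModelSession {
    private let context = FakeContext()

    func respond(to prompt: String) -> String {
        // Runs on the actor, so access to `context` is serialized.
        context.respond(to: prompt)
    }

    func turnCount() -> Int {
        context.history.count
    }
}
```

This compiles cleanly only because `FakeContext.respond(to:)` is a plain synchronous method. The comment above describes the harder case, where the upstream APIs are nonisolated and async, and this sketch does not address it.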
