@JohnathanFL

First, apologies for the guerrilla PR. I've been using the library for some time and kept thinking it would be nice to have per-string data, so I eventually decided to implement it for fun first and find out second whether it's acceptable to merge back in.

This refactors the internals of the crate to allow multiple independent caches whose ustrs/tokens are distinct types, closing #30. Each cache may additionally store its own datatype alongside the string's hash/length, deriving that data when the string is first interned. The main implementations of global helpers like string_cache_iter are also moved into the trait itself, so you can just say Dataless::string_cache_iter() or Dataless::num_entries(), and the old global functions redirect to those.

Because a type parameter is now required pervasively, and to maintain compatibility with existing code, I switched it up so Ustr is merely a typedef for a new internal InternedString<N> type using a dataless (()-storing) namespace. The idea is that anyone who wants more namespaces/data would define their own facade, like:

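use std::sync::LazyLock;
// Bins, StringCacheNs, and InternedString come from this crate.

// Global cache backing the FooNs namespace.
static FOO_NS: LazyLock<Bins<FooNs>> = LazyLock::new(|| Bins::new());
struct FooNs;
impl StringCacheNs for FooNs {
    // Stored alongside each interned string.
    type Data = char;
    // Derives the data when a string is first interned.
    fn derive_cache_data(string: &str) -> Self::Data {
        // The empty string has no last char, so fall back to '\0' rather than panic.
        string.chars().last().unwrap_or('\0')
    }
    fn cache() -> &'static Bins<Self> {
        &FOO_NS
    }
}
pub type MyStr = InternedString<FooNs>;
pub fn mystr(s: &str) -> MyStr { MyStr::from(s) }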
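
To show the mechanism in isolation, here is a minimal self-contained toy model of the same idea. It is only a sketch: ToyNs, ToyCache, Entry, Interned, CharCountNs, and DERIVE_CALLS are hypothetical names made up for illustration, and a Mutex-guarded HashMap stands in for the crate's real cache, so none of this reflects the actual internals. It demonstrates two points from the description: data is derived only the first time a string is interned, and helpers such as num_entries can live on the namespace trait.

```rust
use std::collections::HashMap;
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::{LazyLock, Mutex};

/// Toy counterpart of a namespace trait (hypothetical, not the crate's API).
pub trait ToyNs: Sized + 'static {
    type Data: Send + Sync + 'static;
    fn derive_cache_data(string: &str) -> Self::Data;
    fn cache() -> &'static ToyCache<Self>;

    /// Helper implemented on the trait itself, so callers write `Ns::num_entries()`.
    fn num_entries() -> usize {
        Self::cache().entries.lock().unwrap().len()
    }
}

/// One interned string plus the data derived for it.
pub struct Entry<N: ToyNs> {
    text: String,
    data: N::Data,
}

/// Toy cache: a locked map from string contents to leaked, never-freed entries.
pub struct ToyCache<N: ToyNs> {
    entries: Mutex<HashMap<String, &'static Entry<N>>>,
}

impl<N: ToyNs> ToyCache<N> {
    pub fn new() -> Self {
        ToyCache { entries: Mutex::new(HashMap::new()) }
    }
}

/// Cheap, copyable handle; handles from different namespaces are different types.
pub struct Interned<N: ToyNs>(&'static Entry<N>);

impl<N: ToyNs> Clone for Interned<N> {
    fn clone(&self) -> Self { *self }
}
impl<N: ToyNs> Copy for Interned<N> {}

impl<N: ToyNs> From<&str> for Interned<N> {
    fn from(s: &str) -> Self {
        let mut map = N::cache().entries.lock().unwrap();
        if let Some(&entry) = map.get(s) {
            return Interned(entry); // already interned: no derivation
        }
        // First sighting: derive the data once and leak the entry for 'static access.
        let entry: &'static Entry<N> = Box::leak(Box::new(Entry {
            text: s.to_owned(),
            data: N::derive_cache_data(s),
        }));
        map.insert(s.to_owned(), entry);
        Interned(entry)
    }
}

impl<N: ToyNs> Interned<N> {
    pub fn as_str(&self) -> &'static str {
        let entry: &'static Entry<N> = self.0;
        &entry.text
    }
    pub fn data(&self) -> &'static N::Data {
        let entry: &'static Entry<N> = self.0;
        &entry.data
    }
}

/// Counts derivations so the "derive once" behaviour is observable.
pub static DERIVE_CALLS: AtomicUsize = AtomicUsize::new(0);

static CHAR_COUNT_CACHE: LazyLock<ToyCache<CharCountNs>> = LazyLock::new(|| ToyCache::new());

/// Example namespace that stores each string's char count.
pub struct CharCountNs;
impl ToyNs for CharCountNs {
    type Data = usize;
    fn derive_cache_data(string: &str) -> usize {
        DERIVE_CALLS.fetch_add(1, Ordering::SeqCst);
        string.chars().count()
    }
    fn cache() -> &'static ToyCache<Self> {
        &CHAR_COUNT_CACHE
    }
}
```

Interning the same text twice through Interned::<CharCountNs>::from returns handles to the same entry, and derive_cache_data runs only for the first call.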

I'll leave this as a draft PR for the moment as I look for other things to clean up / use it myself / eventually run through the benchmarks to see if there's any impact from the trait indirection. Let me know if there's anything you'd want changed or if this just doesn't seem like a good fit to merge in.

- Adds a new StringCacheNs trait which defines an associated data-type
  for the namespace and functions to retrieve the global cache and derive data from a new string.
- Defines a default `Dataless` namespace which uses `()` as its data.
- Re-parameterizes `Ustr`, `Bins`, etc with type params of that trait.
  - `Ustr` defaults to `Dataless`, preserving existing semantics/size/etc.
- Moves implementations for some helper methods inside the trait.
- Adds simple examples for how to make a new namespace.

TODO:
- Likely some more cleanup to fix API compat.
- Should things be moved around a bit? Right now, stringcache.rs references back up to lib.rs.
- Maybe some more ergonomics/renames.

Now uses InternedString<N> as the base type, allowing existing
code to continue using Ustr:: methods without extra ::<N>:: annoyances.
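
As a rough illustration of why an alias over a generic base type keeps call sites clean, here is a small self-contained sketch. Namespace, Handle, Tagged, PlainStr, TaggedStr, and from_static are hypothetical names invented for this example (only Dataless echoes the PR), and the handle wraps a plain &'static str rather than the crate's real representation.

```rust
use std::marker::PhantomData;

/// Hypothetical stand-in for a namespace trait.
pub trait Namespace {
    type Data;
}

/// Namespace that stores no extra data.
pub struct Dataless;
impl Namespace for Dataless {
    type Data = ();
}

/// A second namespace, only here to show that handles are distinct types.
pub struct Tagged;
impl Namespace for Tagged {
    type Data = char;
}

/// Generic base type. The marker is zero-sized, so it adds nothing to the handle.
pub struct Handle<N: Namespace> {
    text: &'static str,
    _ns: PhantomData<N>,
}

impl<N: Namespace> Handle<N> {
    pub fn from_static(text: &'static str) -> Self {
        Handle { text, _ns: PhantomData }
    }
    pub fn as_str(&self) -> &'static str {
        self.text
    }
}

/// The alias fixes the parameter, so callers never spell out `::<N>`.
pub type PlainStr = Handle<Dataless>;
/// A different namespace yields a different, non-interchangeable type.
pub type TaggedStr = Handle<Tagged>;
```

With the alias in place, PlainStr::from_static("abc") needs no type annotation, and because PhantomData is zero-sized the handle stays the same size as the reference it wraps.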