
@Boshen (Member) commented Sep 27, 2025

Summary

Implement delta-compressed token storage, reducing memory usage by 58% for a sourcemap with 1 million tokens.

Motivation

Sourcemaps can contain millions of tokens, which consumes significant memory. As requested, this PR implements delta compression to store tokens more compactly while keeping the public API unchanged.

Implementation

CompressedTokens Structure

  • First token: stored uncompressed (24 bytes)
  • Subsequent tokens: stored as deltas from the previous token
  • Variable-length encoding: 1, 2, or 4 bytes per field, depending on delta size
  • Index: stores a byte offset every 256 tokens for O(1) random access (seek to the nearest checkpoint, then decode at most 255 tokens)
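A minimal sketch of this layout, under stated assumptions: it models each token as a single `u32` field (the real `Token` has several), uses a one-byte tag per value instead of the shared header byte described under Encoding Format, and assumes every 256th token is written in absolute form. That last assumption matters, because a byte offset alone only enables random access if the value at that offset can be decoded without its predecessor. `CompressedColumn`, `CHECKPOINT_INTERVAL`, and the tag constants are hypothetical names, not the PR's code.

```rust
/// Hypothetical, simplified model of the CompressedTokens layout: each "token"
/// is a single u32 field (e.g. a generated column), not a full Token.
const CHECKPOINT_INTERVAL: usize = 256;

// One tag byte per value (a simplification of the PR's 2-bits-per-field header).
const TAG_I8: u8 = 0;
const TAG_I16: u8 = 1;
const TAG_I32: u8 = 2;
const TAG_ABS: u8 = 3;

struct CompressedColumn {
    bytes: Vec<u8>,
    index: Vec<usize>, // byte offset of every CHECKPOINT_INTERVAL-th value
    len: usize,
}

impl CompressedColumn {
    fn new(values: &[u32]) -> Self {
        let mut bytes: Vec<u8> = Vec::new();
        let mut index: Vec<usize> = Vec::new();
        let mut prev = 0u32;
        for (i, &v) in values.iter().enumerate() {
            let delta = v as i64 - prev as i64;
            if i % CHECKPOINT_INTERVAL == 0 {
                // Checkpoint (including the first value): stored absolutely,
                // so decoding can start here without any earlier value.
                index.push(bytes.len());
                bytes.push(TAG_ABS);
                bytes.extend_from_slice(&v.to_le_bytes());
            } else if let Ok(d) = i8::try_from(delta) {
                bytes.push(TAG_I8);
                bytes.extend_from_slice(&d.to_le_bytes());
            } else if let Ok(d) = i16::try_from(delta) {
                bytes.push(TAG_I16);
                bytes.extend_from_slice(&d.to_le_bytes());
            } else if let Ok(d) = i32::try_from(delta) {
                bytes.push(TAG_I32);
                bytes.extend_from_slice(&d.to_le_bytes());
            } else {
                // Delta does not fit in i32: fall back to the absolute value.
                bytes.push(TAG_ABS);
                bytes.extend_from_slice(&v.to_le_bytes());
            }
            prev = v;
        }
        Self { bytes, index, len: values.len() }
    }

    /// Decodes the value starting at `pos`; returns (value, offset of the next value).
    fn decode_at(&self, pos: usize, prev: u32) -> (u32, usize) {
        let b = &self.bytes;
        let apply = |d: i64| (prev as i64 + d) as u32;
        match b[pos] {
            TAG_I8 => (apply(i8::from_le_bytes([b[pos + 1]]) as i64), pos + 2),
            TAG_I16 => (apply(i16::from_le_bytes([b[pos + 1], b[pos + 2]]) as i64), pos + 3),
            TAG_I32 => {
                let d = i32::from_le_bytes(b[pos + 1..pos + 5].try_into().unwrap());
                (apply(d as i64), pos + 5)
            }
            _ => (u32::from_le_bytes(b[pos + 1..pos + 5].try_into().unwrap()), pos + 5),
        }
    }

    /// Random access: jump to the nearest checkpoint, then decode at most 255 deltas.
    fn get(&self, i: usize) -> Option<u32> {
        if i >= self.len {
            return None;
        }
        let start = i / CHECKPOINT_INTERVAL * CHECKPOINT_INTERVAL;
        let mut pos = self.index[i / CHECKPOINT_INTERVAL];
        let mut value = 0;
        for _ in start..=i {
            (value, pos) = self.decode_at(pos, value);
        }
        Some(value)
    }
}
```

In this sketch the cost of `get` is bounded by the checkpoint interval rather than the total token count; a larger interval would shrink the index but lengthen each lookup.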

Encoding Format

Header byte (2 bits per field):
  00: i8 delta (-128 to 127)
  01: i16 delta (-32768 to 32767)  
  10: i32 delta (full range)
  11: u32 absolute value (fallback)
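The header-byte scheme can be sketched as follows, assuming one header byte covers four `u32` fields (4 × 2 bits = 8 bits) and each field's payload follows the header in field order. How the PR groups the remaining `Token` fields (such as source and name IDs) is not stated here; `encode_fields` and `decode_fields` are hypothetical names, and little-endian payloads are also an assumption.

```rust
// 2-bit tags, as listed above.
const TAG_I8: u8 = 0b00;
const TAG_I16: u8 = 0b01;
const TAG_I32: u8 = 0b10;
const TAG_ABS: u8 = 0b11;

/// Appends one header byte plus a variable-length payload for four fields.
fn encode_fields(prev: [u32; 4], cur: [u32; 4], out: &mut Vec<u8>) {
    let header_pos = out.len();
    out.push(0); // placeholder, filled in once all tags are known
    let mut header = 0u8;
    for (i, (&p, &c)) in prev.iter().zip(cur.iter()).enumerate() {
        let delta = c as i64 - p as i64;
        let tag = if let Ok(d) = i8::try_from(delta) {
            out.extend_from_slice(&d.to_le_bytes());
            TAG_I8
        } else if let Ok(d) = i16::try_from(delta) {
            out.extend_from_slice(&d.to_le_bytes());
            TAG_I16
        } else if let Ok(d) = i32::try_from(delta) {
            out.extend_from_slice(&d.to_le_bytes());
            TAG_I32
        } else {
            // A u32 delta can exceed the i32 range: store the absolute value.
            out.extend_from_slice(&c.to_le_bytes());
            TAG_ABS
        };
        header |= tag << (i * 2); // field i uses bits 2i..2i+1
    }
    out[header_pos] = header;
}

/// Reads one header + payload group at `*pos`, advancing `*pos` past it.
fn decode_fields(prev: [u32; 4], bytes: &[u8], pos: &mut usize) -> [u32; 4] {
    let header = bytes[*pos];
    *pos += 1;
    let mut cur = [0u32; 4];
    for i in 0..4 {
        let tag = (header >> (i * 2)) & 0b11;
        let size = match tag {
            TAG_I8 => 1,
            TAG_I16 => 2,
            _ => 4,
        };
        let raw = &bytes[*pos..*pos + size];
        *pos += size;
        let base = prev[i] as i64;
        cur[i] = match tag {
            TAG_I8 => (base + i8::from_le_bytes([raw[0]]) as i64) as u32,
            TAG_I16 => (base + i16::from_le_bytes([raw[0], raw[1]]) as i64) as u32,
            TAG_I32 => (base + i32::from_le_bytes(raw.try_into().unwrap()) as i64) as u32,
            _ => u32::from_le_bytes(raw.try_into().unwrap()),
        };
    }
    cur
}
```

With small deltas, a group of four fields costs 5 bytes (header plus four `i8` payloads) instead of 16 bytes for four raw `u32` values.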

Results

Memory Savings

  • 1 million tokens: 23MB → 10MB (58% reduction)
  • Box<[Token]> alone (no compression): 8 bytes saved per SourceMap
  • With compression: ~13MB saved per million tokens
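Assuming the uncompressed baseline is 24 bytes per token (the size quoted above for the first token) and that "MB" here means MiB, the rounded figures are consistent with one another:

$$
\begin{aligned}
10^6 \times 24\ \text{B} &= 24{,}000{,}000\ \text{B} \approx 22.9\ \text{MiB} \approx 23\ \text{MB} \\
22.9\ \text{MiB} \times (1 - 0.58) &\approx 9.6\ \text{MiB} \approx 10\ \text{MB} \\
22.9\ \text{MiB} - 9.6\ \text{MiB} &\approx 13.3\ \text{MiB saved} \\
24\ \text{B} \times (1 - 0.58) &\approx 10\ \text{B per token on average}
\end{aligned}
$$

Computing the reduction from the already-rounded 23MB and 10MB instead gives 1 − 10/23 ≈ 56.5%, slightly below the quoted 58%.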

Performance Trade-offs

Benchmark results (change in run time, new vs. old):
- SourceMap::from_json_string: +25% (1.05µs vs 836ns)
- SourceMap::to_json: +88% (418ns vs 220ns)
- SourceMap::generate_lookup_table: +1434% (230ns vs 15ns)
- Sequential iteration: ~25% slower

The regressions come from decompression overhead. This trade-off is acceptable for:

  • Large applications with many sourcemaps
  • Memory-constrained environments
  • Cases where sourcemap operations are infrequent

Testing

  • ✅ All existing tests pass
  • ✅ Added compression/decompression tests
  • ✅ API remains unchanged
  • ✅ Backward compatible

Alternative Approach

If the performance trade-off is too high, the simpler Box<[Token]> optimization (commit 86cb878) saves 8 bytes per SourceMap with no performance impact.
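The 8-byte figure matches the container overhead on a 64-bit target, assuming the field being replaced is a `Vec<Token>`: a `Vec` holds a pointer, capacity, and length, while a `Box<[Token]>` holds only a pointer and length. `Token` below is a stand-in (its contents do not affect the container size), and `SourceMapVec`, `SourceMapBoxed`, and `freeze` are hypothetical names.

```rust
use std::mem::size_of;

/// Stand-in for the real Token type.
#[allow(dead_code)]
struct Token([u32; 6]);

/// Hypothetical "before" and "after" layouts of the tokens field.
#[allow(dead_code)]
struct SourceMapVec {
    tokens: Vec<Token>, // pointer + capacity + length
}

#[allow(dead_code)]
struct SourceMapBoxed {
    tokens: Box<[Token]>, // pointer + length
}

/// Converts a finished token list into a boxed slice, dropping the capacity
/// field and releasing any spare capacity.
fn freeze(tokens: Vec<Token>) -> Box<[Token]> {
    tokens.into_boxed_slice()
}

/// Per-SourceMap saving in bytes: one machine word (8 bytes on 64-bit).
fn saving() -> usize {
    size_of::<SourceMapVec>() - size_of::<SourceMapBoxed>()
}
```

Because the token list is not modified after construction, the capacity field of a `Vec` carries no useful information, which is why this change has no runtime cost.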

Breaking Changes

None - the API remains unchanged. The compression is an internal implementation detail.

🤖 Generated with Claude Code

Implement compressed token storage using delta encoding to significantly
reduce memory usage for large sourcemaps with millions of tokens.

Changes:
- Add CompressedTokens struct with variable-length delta encoding
- Store first token uncompressed, then deltas for subsequent tokens
- Use 1-4 bytes per field based on delta size (vs always 4 bytes)
- Add index every 256 tokens for reasonable random access
- Convert SourceMap to use CompressedTokens internally

Memory savings:
- 58% reduction for 1M tokens (23MB → 10MB)
- Scales well with larger sourcemaps

Performance trade-offs:
- Sequential iteration: ~25% slower (acceptable for encoding)
- Lookup table generation: Slower due to decompression
- Good trade-off for memory-constrained environments

All tests pass and API remains unchanged.

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <[email protected]>
@codspeed-hq bot commented Sep 27, 2025

CodSpeed Performance Report

Merging #177 will degrade performance by up to 79.41%

Comparing delta (1dba1c6) with main (3e2b510)

Summary

⚡ 1 improvement
❌ 4 regressions

⚠️ Please fix the performance issues or acknowledge them on CodSpeed.

Benchmarks breakdown

Benchmark                         BASE      HEAD      Change (negative = slower)
from_json_string                  16.9 µs   19.6 µs   -13.68%
generate_lookup_table             1.4 µs    6.6 µs    -79.41%
to_json                           5.7 µs    8.7 µs    -33.99%
to_json_string                    5.3 µs    8.3 µs    -35.66%
add_name_add_source_and_content   1.7 µs    1.6 µs    +1.8%

@Boshen closed this Sep 27, 2025
@Boshen deleted the delta branch September 27, 2025 13:57