@alamb alamb commented Jan 14, 2026

This is a proposed implementation of the ALP encoding described in "ALP: Adaptive Lossless floating-Point Compression" (SIGMOD 2024, https://dl.acm.org/doi/10.1145/3626717).

It is based on (and is largely a reformatted version of) @prtkgaur's ALP Encoding Specification Google Doc.

See rendered preview here: https://github.com/alamb/parquet-format/blob/alamb/alp/Encodings.md#adaptive-lossless-floating-point-adaptive_lossless_floating_point--10

Rationale for this change

This encoding has the following properties:

  • It targets real-world floating-point (IEEE 754) data.
  • It achieves high compression ratios (close to those of ZSTD).
  • It is much faster to decompress than ZSTD (and other floating-point compression algorithms).
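
For readers unfamiliar with ALP, the core idea from the paper can be sketched roughly as follows: many real-world doubles are short decimals, so they can be scaled by a power of ten and stored as integers, with any value that does not round-trip exactly kept verbatim as an exception. This is a simplified illustration only, not the encoding specified in this PR. The function names, the single fixed `exponent`, and the `0` placeholder for exceptions are assumptions made for the example; the paper additionally chooses the exponent (and a second factor) per vector and then bit-packs the integers.

```python
import math


def alp_encode(values, exponent):
    """Scale each float by 10**exponent and keep it as an integer when lossless.

    Returns (encoded, exceptions). `exceptions` maps a position to the original
    float for values that cannot be reproduced exactly from their integer.
    """
    scale = 10 ** exponent
    encoded = []
    exceptions = {}
    for i, v in enumerate(values):
        scaled = v * scale
        if math.isfinite(scaled):
            candidate = round(scaled)
            decoded = candidate / scale
            # Lossless only if the value and its sign (to catch -0.0) survive.
            if decoded == v and math.copysign(1.0, decoded) == math.copysign(1.0, v):
                encoded.append(candidate)
                continue
        # NaN, infinities, -0.0 and non-decimal values are stored verbatim.
        encoded.append(0)  # placeholder chosen for this sketch
        exceptions[i] = v
    return encoded, exceptions


def alp_decode(encoded, exceptions, exponent):
    """Invert alp_encode: divide by the scale, then patch in the exceptions."""
    scale = 10 ** exponent
    out = [d / scale for d in encoded]
    for i, v in exceptions.items():
        out[i] = v
    return out
```

In this sketch the small integers are what a later stage would compress compactly, while decoding is a single division per value plus a patch step for the exceptions.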

See Mailing List Discussion: https://lists.apache.org/thread/tjtln1mmjqfoql1ls2dw9xpdk91r1909

[Screenshot of results]

Source: ALP Results Document

(TODO: summarize the mailing list discussion here)

What changes are included in this PR?

Do these changes have PoC implementations?

Yes

@alamb alamb marked this pull request as ready for review January 14, 2026 20:17
@alamb alamb changed the title from "GH-533: Add ALP Encoding" to "GH-533: Adaptive Lossless Floating-Point (ALP) Encoding" Jan 14, 2026

Development

Successfully merging this pull request may close these issues.

[Proposal] Add ALP encoding support in parquet file format