Releases: delta-io/delta-rs
python-v1.2.1: lazy writes
Performance improvements
- feat: in-flight, streaming `PartitionWriter` by @abhiaagarwal in #3857
- fix: use single writer for all partition streams by @ion-elgreco in #3870
What's Changed
- feat: datafusion based kernel engine by @roeap in #3831
- fix: update pyproject.toml by @wagenrace in #3854
- chore: upgrade datafusion, arrow and parquet by @dentiny in #3856
- feat: allow RecordBatchWriter to pass through pass-through-commit-properties by @rtyler in #3858
- perf: support pushing physical filters down through DeltaScan by @alexwilcoxson-rel in #3859
- chore: remove some deprecated methods by @roeap in #3861
- fix: resolve some warnings by @roeap in #3862
- chore: deprecate file_actions on state by @roeap in #3863
- refactor: consolidate datafusion session setup by @roeap in #3860
- fix(core): handle Result type after get_actions sync conversion by @yousefsaad12 in #3846
- chore: change the core and meta crate versions for release by @rtyler in #3864
- chore: use form based issue templates by @roeap in #3865
- chore: add python deprecation warnings by @roeap in #3869
- feat(bench): add TPC-DS benchmarks by @abhiaagarwal in #3845
- fix: add regression test for working with dotted-named columns in Python by @rtyler in #3873
- fix: add a regression test while I'm tooting around by @rtyler in #3874
- feat: allow for lazy loading files in operations by @roeap in #3872
New Contributors
- @wagenrace made their first contribution in #3854
- @dentiny made their first contribution in #3856
- @yousefsaad12 made their first contribution in #3846
Full Changelog: python-v1.2.0...python-v1.2.1
python-v1.2.0
What's Changed
- feat!: kernel log replay by @roeap in #3660
- chore: update hdfs object store to 0.15 by @Kimahriman in #3681
- fix(pandas): implement-automatic-conversion-for-pandas-null-types by @fvaleye in #3695
- fix(format): fix formatting in Python for conversion file by @fvaleye in #3705
- chore: remove unused dependencies by @rtyler in #3698
- fix: enabling correctly pulling partition values out of column mapped tables by @rtyler in #3706
- feat!: use kernel predicates on file streams by @roeap in #3669
- fix: reintroduce the 100 commit checkpoint interval by @rtyler in #3708
- chore: follow up changes on rust-v0.28.0 by @rtyler in #3712
- chore(cargo): add cargo-machete to detect and remove unused dependencies by @fvaleye in #3713
- chore: update kernel to 0.15.1 by @roeap in #3714
- feat(storage): expand user with tilde in local path by @fvaleye in #3717
- chore: bump to a minor version for a small core release with the new kernel by @rtyler in #3718
- feat: domain metadata read support by @roeap in #3678
- chore!: remove deprecated methods by @roeap in #3715
- refactor: move table provider to dedicated mod by @roeap in #3726
- feat(url): use Url in Rust for accessing to DeltaTable, use only string-based api in Python by @fvaleye in #3707
- chore(ci): cache rust dependencies in the CI by @fvaleye in #3728
- refactor: avoid explicit mutex in MergeBarrier by @roeap in #3734
- fix: re-export the DecimalType for consumers by @rtyler in #3738
- fix: `write_deltalake` with `mode="overwrite"` mode and `schema_mode=None` does not overwrite schema metadata by @FrankPortman in #3747
- feat: add per column Parquet Encoding support for Delta Table column by @niltecedu in #3737
- fix: better error handles in unity client by @hntd187 in #3752
- feat(datafusion): add insert_into operation with DataFusion by @fvaleye in #3762
- fix: ensure that invalid URLs are bubbled up as errors when parsed by @rtyler in #3766
- feat: change history() to return an Iterator by @rtyler in #3764
- chore: update docs by @ion-elgreco in #3761
- feat: update to DataFusion 50, pyo3 24, pyo3-arrow 0.11 by @alamb in #3749
- feat: allow OptimizeBuilder to accept SessionConfig for finer-grained control of execution by @rtyler in #3763
- feat(unity-catalog): support credentials via storage options by @fvaleye in #3769
- fix: check if eligible to read by @ion-elgreco in #3771
- chore: upgrade the aws dependencies in deltalake-aws by @rtyler in #3772
- chore: pin cargo-machete action to the sha right before a regression by @rtyler in #3779
- chore: fix some typos in comment by @juejinyuxitu in #3781
- chore: upgrade to delta-kernel-rs 0.16.0 and remove more dependencies by @rtyler in #3773
- feat: get the delta table row count based on the table history by @ohadmata in #3732
- fix: somehow the right test value didn't make it into the pr by @rtyler in #3788
- fix: correct RecordBatchWriter interior schema mutation outside of evolution by @rtyler in #3783
- fix: use a safe checkpoint when cleaning up metadata by @corwinjoy in #3748
- chore(deps): update sqlparser requirement from 0.56.0 to 0.59.0 by @dependabot[bot] in #3792
- chore(ci): add automatic cache cleanup for closed main branch PRs by @fvaleye in #3793
- chore(deps): update foyer requirement from 0.17.2 to 0.20.0 by @dependabot[bot] in #3791
- refactor: use EagerSnapshot in datafusion module by @roeap in #3796
- feat: allow passing a `SessionState` into a `OptimizeBuilder` by @abhi-airspace-intelligence in #3802
- refactor: remove table_url from Snapshot by @roeap in #3803
- fix: maintaining load config from state by @ion-elgreco in #3805
- refactor: consolidate extension planners by @roeap in #3804
- ci: split out integration tests by @roeap in #3806
- feat: access tombstones via TombstoneView by @roeap in #3809
- refactor: use EagerSnapshot in vacuum operation by @roeap in #3812
- fix: avoid overflow for large table state by @roeap in #3801
- refactor: avoid downcasting to SessionState by @roeap in #3813
- feat: add deletion_vector_descriptor method by @zeevm in #3721
- refactor: move find_files into dedicated mod by @roeap in #3815
- chore: remove unreferenced file by @roeap in #3819
- feat: shim kernel Scan and ScanBuilder by @roeap in #3818
- chore(deps): update datatest-stable requirement from 0.2 to 0.3 by @dependabot[bot] in #3817
- feat: expose arrow schema on snapshots by @roeap in #3822
- fix: update deprecation versions to next release by @roeap in #3828
- chore: unify inconsistent `SessionState` in datafusion operations by @abhi-airspace-intelligence in #3816
- fix(rust): protect recent uncommitted files in vacuum full mode by @vsmanish1772 in #3835
- feat: enable ability to do writes through Unity Catalog by @hntd187 in #3834
- chore(performance): optimize JSON parsing in get_actions and snapshot reading by @fvaleye in #3830
- refactor(bench): remove baseline while keeping the json_parsing benchmark by @fvaleye in #3838
- perf(path): only clone string for the path by @fvaleye in #3841
- feat(tracing): add tracing spans to all I/O sections by @fvaleye in #3795
- feat(bench): add new benchmarking script, harness, and profiling guide by @abhiaagarwal in #3840
- chore: bump version from 1.1.4 to 1.2.0 by @ion-elgreco in #3842
New Contributors
- @FrankPortman made their first contribution in #3747
- @niltecedu made their first contribution in #3737
- @juejinyuxitu made their first contribution in #3781
- @ohadmata made their first contribution in #3732
- @abhi-airspace-intelligence made their first contribution in #3802
- @vsmanish1772 made their first contribution in #3835
Full Changelog: python-v1.1.4...python-v1.2.0
python-v1.1.4
What's Changed
- chore: bump version for release by @rtyler in #3633
- feat: add keep_versions parameter to vacuum command for python by @corwinjoy in #3635
- chore: remove deprecated use of kernel's Table by @rtyler in #3639
- chore: use pytest-xdist for speeding up python tests by @rtyler in #3642
- fix: aws special paths encoding by @roeap in #3656
- fix: handle checking partition filters in array/list when converting … by @smeyerre in #3657
- ci: run integration tests against next branches by @roeap in #3658
- feat: support converting parquet with non-microsecond timestamps to d… by @smeyerre in #3654
- fix: use RFC3896 percent encoding with delta protocol correctness by @ion-elgreco in #3661
- chore: bump python by @ion-elgreco in #3664
Full Changelog: python-v1.1.3...python-v1.1.4
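For readers unfamiliar with the percent-encoding touched by #3661, the following is a minimal sketch of URI percent-encoding as defined in RFC 3986, using only the Python standard library. It is not the delta-rs implementation; the helper name `encode_path` and the sample path are hypothetical and serve only to illustrate the general technique.

```python
from urllib.parse import quote, unquote


def encode_path(path: str) -> str:
    """Percent-encode a relative file path, keeping '/' as the separator.

    Hypothetical helper for illustration; not part of deltalake.
    """
    # quote() never encodes letters, digits, '-', '.', '_' or '~';
    # every other character outside `safe` becomes a %XX escape.
    return quote(path, safe="/")


# Hypothetical path containing '=', a space, and '#'
raw = "date=2024-01-01/part 0#1.parquet"
encoded = encode_path(raw)
print(encoded)

# Decoding restores the original path exactly
assert unquote(encoded) == raw
```

Which characters are kept unescaped (here only `/`) is a design choice of this sketch; the Delta protocol and delta-rs may treat specific characters differently.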
python-v1.1.3
What's Changed
- refactor: make "cloud" feature in object_store optional by @zeevm in #3590
- fix: creating new DeltaTable with invalid table name path no longer creates empty directory by @smeyerre in #3504
- fix: fix typo to fix CI typo check by @alamb in #3604
- fix: scan time was always 0 for merge metrics by @rtyler in #3596
- chore: minor API changes after integration testing by @rtyler in #3598
- fix: switch the url schemes for Azure integration tests by @rtyler in #3614
- fix: allow non-string primitive types for partition filters when converting to pyarrow dataset by @smeyerre in #3613
- refactor: make match_partitions and new_metadata public by @zeevm in #3605
- fix: ensure new checkpoints can be written after old checkpoints by @rtyler in #3616
- docs: fix broken daft links (how daft) by @rtyler in #3617
- fix: allow writing to DeltaTable objects across Python threads by @rtyler in #3618
- fix: ensure openssl-sys doesn't creep into the dependency via the kernel default engine by @rtyler in #3619
- chore: update to DataFusion `49.0.0` by @alamb in #3603
- chore(deps): update rstest requirement from 0.25.0 to 0.26.1 by @dependabot[bot] in #3627
- fix: coerce polars.Array into a suitable Arrow list type by @rtyler in #3623
- fix: make the docs link checking more useful/less faily by @rtyler in #3630
- fix: avoid parsing generationExpressions as JSON by @rtyler in #3632
- feat: build musl wheels upon release by @rtyler in #3631
Full Changelog: python-v1.1.0...python-v1.1.3
python-v1.1.0
What's Changed
- refactor: compute stats schema with kernel types by @roeap in #3514
- chore: set java version to 21 for pyspark 4.0 by @ion-elgreco in #3524
- refactor: remove unecessary uses of datafusion subcrates by @alamb in #3521
- chore: pin aws crates by @ion-elgreco in #3532
- fix: using state provided in args in merge op by @gtrawinski in #3522
- fix: remove forced table update from python writer by @ohanf in #3515
- feat: make TableConfig accessible by @ion-elgreco in #3518
- chore: update the minor version to reflect a behavior change by @rtyler in #3542
- chore: update arrow/parquet to 55.2.0 by @alamb in #3558
- chore: clean up licenses in python project which are causing build issues by @rtyler in #3560
- fix: version binary search by @aditanase in #3549
- chore: update to DataFusion `48.0.0` / arrow to 55.2.0 by @alamb in #3520
- chore: upgrade to delta_kernel 0.12.x by @rtyler in #3561
- docs: ensure create_checkpoint() is visible in the Python API docs by @itamarst in #3564
- fix: use proper DeltaTableState for vacuum commits by @jeromegn in #3550
- chore: remove redundant words in comment by @shangchenglumetro in #3568
- refactor: move schema code to kernel module by @roeap in #3569
- feat: convert partition filters to kernel predicates by @roeap in #3570
- chore: latest clippy by @roeap in #3571
- chore: remove the deltalake-sql crate by @rtyler in #3582
- feat: write `engineInfo` with delta-rs version by @zachschuermann in #3584
- chore: bump patch versions for another relaese by @rtyler in #3585
- feat: vacuum with version retention by @corwinjoy in #3537
- chore: bump minor version for rust crate by @rtyler in #3586
- refactor!: use delta-kernel Protocol and Metadata actions by @roeap in #3581
- chore: generate a more recentish updated changelog by @rtyler in #3588
New Contributors
- @ohanf made their first contribution in #3515
- @itamarst made their first contribution in #3564
- @jeromegn made their first contribution in #3550
- @shangchenglumetro made their first contribution in #3568
Full Changelog: python-v1.0.2...python-v1.1.0
rust-v0.27.0
Implemented enhancements:
- Feature: Vacuum with version retention #3530
- Any way to prune the delta_log or support shallow clones #3565
- Upgrade Arrow version to 55.1.0 #3540
- Add config option to suppress `deltalake_core::writer::stats` warnings about bytes columns #3519
- Remove pyarrow dependency (make opt-in), replace with arro3 for core components #3455
- Don't retry lakefs commit or merge on `412` response (precondition failed) #3429
- Use `object_store` spawnService #3427
- Alter table description #3401
- Remove put if absent options injection #3310
- v1.0 Release tracking issue #3250
- feat: add a table description and name to the Delta Table from Python #3464 (fvaleye)
Fixed bugs:
- Python building broken on main due to maturin issue #3559
- TypeError: write_deltalake() got an unexpected keyword argument 'schema' (deltalake/polars) #3546
- SchemaMismatchError on empty ArrayType field while contains_null=True #3544
- Can't open a delta-table: Unsupported reader features required: DeletionVectors #3543
- Attempting to write a transaction 3 but the underlying table has been updated to 3 #3534
- DeltaOps not recognizing abfss scheme for Azure #3523
- Query execution time difference between `QueryBuilder` and using DataFusion directly. #3517
- bug: timezone not preserved & raise exc on merge operation #3507
- allow_unsafe_rename option stopped working in version 1 #3493
- predicate appears to ignore partition and stats in pruning #3491
- `max_rows_per_file` ignored when writing with rust engine #3490
- delta-rs includes pending versions written by spark #3422
Merged pull requests:
- chore: bump minor version for rust crate #3586 (rtyler)
- refactor!: use delta-kernel Protocol and Metadata actions #3581 (roeap)
- feat: vacuum with version retention #3537 (corwinjoy)
- chore: bump patch versions for another relaese #3585 (rtyler)
- feat: write `engineInfo` with delta-rs version #3584 (zachschuermann)
- chore: remove the deltalake-sql crate #3582 (rtyler)
- chore: latest clippy #3571 (roeap)
- feat: convert partition filters to kernel predicates #3570 (roeap)
- refactor: move schema code to kernel module #3569 (roeap)
- chore: remove redundant words in comment #3568 (shangchenglumetro)
- docs: ensure create_checkpoint() is visible in the Python API docs #3564 (itamarst)
- chore: upgrade to delta_kernel 0.12.x #3561 (rtyler)
- chore: clean up licenses in python project which are causing build issues #3560 (rtyler)
- chore: update arrow/parquet to 55.2.0 #3558 (alamb)
- fix: use proper DeltaTableState for vacuum commits #3550 (jeromegn)
- fix: version binary search #3549 (aditanase)
- chore: update the minor version to reflect a behavior change #3542 (rtyler)
- chore: pin aws crates #3532 (ion-elgreco)
- chore: set java version to 21 for pyspark 4.0 #3524 (ion-elgreco)
- fix: using state provided in args in merge op #3522 (gtrawinski)
- refactor: remove unecessary uses of datafusion subcrates #3521 (alamb)
- chore: update to DataFusion `48.0.0` / arrow to 55.2.0 #3520 (alamb)
- feat: make TableConfig accessible #3518 (ion-elgreco)
- fix: remove forced table update from python writer #3515 (ohanf)
- refactor: compute stats schema with kernel types #3514 (roeap)
- feat: add convenience extension for kernel engine types #3510 (roeap)
- refactor: move LazyTableProvider into python crate #3509 (roeap)
- fix: setting wrong schema in table provider for `merge` #3508 (ion-elgreco)
- fix: constraint parsing, roundtripping #3503 (ion-elgreco)
- refactor!: have DeltaTable::version return an Option #3500 (roeap)
- chore!: remove get_earliest_version #3499 (roeap)
- chore: prepare for the next python release #3498 (rtyler)
- ci: improve coverage collection #3497 (roeap)
- chore: update runner #3494 (ion-elgreco)
- docs: update link to df #3489 (rluvaton)
- refactor!: remove and deprecate some python methods #3488 (roeap)
- fix: ensure projecting only columns that exist in new files afte sche… #3487 (alexwilcoxson-rel)
- chore: exclude Invariants from the default writer v2 feature set #3486 (rtyler)
- test: improve storage config testing #3485 (roeap)
- refactor!: get transaction versions for specific applications #3484 (roeap)
- docs: fix bullet list formatting in dagster docs #3483 (avriiil)
- fix: set casting safe param to False #3481 (ion-elgreco)
- chore: update kernel to 0.11 #3480 (roeap)
- chore: update migration docs #3479 (ion-elgreco)
- chore: remove unused stats_parsed field #3475 (roeap)
- refactor: remove protocol error #3473 (roeap)
- chore: more typos #3471 (roeap)
- chore: remove unused time_utils #3470 (roeap)
- chore: set correct markers...
python-v1.0.2
What's Changed
- fix: setting wrong schema in table provider for `merge` by @ion-elgreco in #3508
- refactor: move LazyTableProvider into python crate by @roeap in #3509
- feat: add convenience extension for kernel engine types by @roeap in #3510
Full Changelog: python-v1.0.1...python-v1.0.2
python-v1.0.1
Bug Fixes
- fix: constraint parsing, roundtripping by @ion-elgreco in #3503
Other Changes
- docs: update link to df by @rluvaton in #3489
- chore: update runner by @ion-elgreco in #3494
- ci: improve coverage collection by @roeap in #3497
- chore: prepare for the next python release by @rtyler in #3498
- chore!: remove get_earliest_version by @roeap in #3499
- refactor!: have DeltaTable::version return an Option by @roeap in #3500
Full Changelog: python-v1.0.0...python-v1.0.1
python-v1.0.0: Zero to One
It only took us 5 years, but we made it! For upgrade instructions, see the 1.0.0 migration guide in the docs (added in #3443).
Performance improvements
- refactor: async writer + multi-part by @ion-elgreco in #3255
- perf: use lazy sync reader by @ion-elgreco in #3338
New features
- feat: remove optimize operations when building without Apache Datafusion by @rtyler in #3290
- feat(api): add rustls and native-tls features by @zeevm in #3335
- feat!: update storage configuration system by @roeap in #3383
- feat: derive macro for config implementations by @roeap in #3389
- feat: upgrade to DataFusion 47.0.0 by @alamb in #3378
- feat: introduce VacuumMode::Full for cleaning up orphaned files by @rtyler in #3368
- feat: during LakeFS file operations, skip merge when 0 changes by @smeyerre in #3346
- feat: added a check for gc code to run by @JustinRush80 in #3419
- feat: spawn io with spawn service by @ion-elgreco in #3426
- feat: optimize datafusion predicate pushdown and partition pruning by @rtyler in #3436
- feat: expose kernel Engine on LogStore by @roeap in #3446
- refactor: remove pyarrow dependency by @ion-elgreco in #3459
- feat: write checkpoints with kernel by @roeap in #3466
- feat: add a table description and name to the Delta Table from Python by @fvaleye in #3464
- refactor!: remove and deprecate some python methods by @roeap in #3488
Bug Fixes
- fix: use field physical name when resolving partition columns by @zeevm in #3349
- fix(pandas): retain pyarrow decimal datatype in to_pandas() by adding types_mapper to prevent precision loss by @Abhishek1005 in #3296
- fix: prevent panics when peek_next_commit() encounters invalid data by @rtyler in #3308
- fix: serialize empty deletionVector in add actions as absent by @rtyler in #3309
- fix: stats column binary_column has unsupported type binary by @omkar-foss in #3146
- fix: check for all known valid delta files in is_deltatable by @umartin in #3318
- fix: block_in_place to allow nested tasks by @ion-elgreco in #3324
- fix: parse snapshot by @ion-elgreco in #3355
- fix: added restored metadata as action to the next committed version by @Nordalf in #3303
- fix: parse unconventional logs by @roeap in #3373
- fix: clippy warnings by @alamb in #3390
- fix: the default target size should be 100MB by @HiromuHota in #3404
- fix: if field contains space in constraint expression, checks will fail by @Nordalf in #3374
- fix: build Unity Catalog crate without DataFusion by @linhr in #3420
- fix: drop column update by @ion-elgreco in #3416
- fix: ignore temp log entries by @corwinjoy in #3423
- fix: use more accurate log path parsing by @roeap in #3461
- fix: correct spelling errors found by CI spell checker by @fvaleye in #3465
- fix: schema conversion, add conversion test cases by @ion-elgreco in #3468
- fix: set casting safe param to False by @ion-elgreco in #3481
- fix: ensure projecting only columns that exist in new files afte sche… by @alexwilcoxson-rel in #3487
Other Changes
- refactor: drop pyarrow support, restructure python modules by @ion-elgreco in #3285
- chore: bump python version for release by @rtyler in #3291
- chore: use flags for apple arm64 by @ion-elgreco in #3213
- chore: upgrade the kernel version and bump our majorish versions too by @rtyler in #3289
- chore: upgrade to DataFusion 46.0.0 by @alamb in #3261
- refactor: add 'cloud' feature to 'core' to enable 'cloud' on 'object_store' only when needed by @zeevm in #3332
- docs: update dataFusion integration example by @riziles in #3343
- refactor(python): improve typing, linting by @ion-elgreco in #3344
- chore: remove pyarrow upper by @ion-elgreco in #3325
- chore: improve io error msg by @ion-elgreco in #3328
- docs: update merge-tables.md with "Optimizing Merge Performance" section by @ldacey in #3351
- docs: add example how to authenticate using Azure CLI for Azure ADSL integration by @DanielBertocci in #3357
- chore: remove cdf feature by @ion-elgreco in #3365
- fix: correct Python docs for incremental compaction on OPTIMIZE by @roykim98 in #3301
- chore: fix some minor build warnings by @rtyler in #3366
- refactor: move transaction module to kernel by @roeap in #3380
- chore: clippy by @roeap in #3379
- chore: move proofs into dedicated folder by @roeap in #3381
- refactor!: move storage module into logstore by @roeap in #3382
- chore: put a couple symbols behind the right feature gate by @rtyler in #3393
- chore: update delta_kernel to 0.10.0 by @zachschuermann in #3403
- refactor: make "cloud" feature in object_store optional by @zeevm in #3398
- chore: bump versions of rust crates for another release party by @rtyler in #3406
- chore: commit the contents of the 0.26.0 release by @rtyler in #3408
- chore: reduce scope of feature flags and compilation requirements for subcrates by @rtyler in #3409
- chore(deps): update sqlparser requirement from 0.53.0 to 0.56.0 by @dependabot in #3413
- chore(deps): update foyer requirement from 0.16.1 to 0.17.0 by @dependabot in #3412
- chore: bringing dat integration testing in ahead of kernel replay by @rtyler in #3411
- chore: missed a version bump for core by @rtyler in #3415
- chore: include license file in deltalake-derive crate by @ankane in #3417
- chore(deps): bump foyer to v0.17.2 to prevent from wrong result by @MrCroxx in #3428
- chore: bump crate versions which are due for release by @rtyler in #3430
- chore: rely on the testing during coverage generation to speed up tests by @rtyler in #3431
- chore: make codecov more vigorously enforced to help ensure quality by @rtyler in #3434
- chore: prepare py-1.0 release by @ion-elgreco in #3435
- chore: experiment with using sccache in GitHub Actions by @rtyler in #3437
- chore: remove unused code and deps by @roeap in #3441
- chore: minor table module refactors by @rtyler in #3442
- docs: add 1.0.0 migration guide by @ion-elgreco in #3443
- refactor: more specific factory parameter names by @roeap in #3445
- refactor: use LogStore in Snapshot / LogSegment APIs by @roeap in #3452
- test: avoid circular dependency with core/test crates by @roeap in #3450
- chore: ensuring default builds work without datafusion by @rtyler in #3453
- ci: add spellchecker to pr tests by @roeap in #3457
- chore: mark more tests which require datafusion by @rtyler in #3458
- refactor: use full paths in log processing by @roeap in #3456
- chore: set correct markers by @ion-elgreco in https://github.com/delta-io/...
python-v0.25.5
What's Changed
Full Changelog: python-v0.25.4...python-v0.25.5