Releases: Eventual-Inc/Daft
v0.6.14
What's Changed 🚀
✨ Features
- feat: embed text metrics @colin-ho (#5583)
- feat: Add description and attributes to custom udf metrics @colin-ho (#5574)
- feat(flotilla): Aggregate Completed Worker Metrics in StatsManager @srilman (#5531)
- feat: add amplification metric for explode operator in native runner @samstokes (#5565)
🐛 Bug Fixes
- fix: Fix empty dataframe `show` issue @caican00 (#5595)
- fix: Fix openai test metrics fixture @colin-ho (#5593)
- fix: resolve docgen disk space failures by removing unused tools @ykdojo (#5589)
- fix: fix imports in explode.rs @colin-ho (#5573)
- fix: Add support for parsing STRUCT with parentheses syntax @Lucas61000 (#5449)
- fix: Dashboard Verbose Tracing Error @srilman (#5567)
- fix: Skip prompt metrics tests on ray runner @colin-ho (#5564)
📖 Documentation
- docs: Update AI functions usage patterns @everettVT (#5568)
🔧 Maintenance
Full Changelog: v0.6.13...v0.6.14
v0.6.13
What's Changed 🚀
💥 Breaking Changes
- refactor!: Remove support for creating File objects from bytes @universalmind303 (#5556)
✨ Features
- feat: Prompt metrics @colin-ho (#5549)
- feat: Async udf metrics @colin-ho (#5541)
- feat: Support specifying dimensions for text embedding @samstokes (#5543)
- feat: support customized retries error message of S3 request @stayrascal (#5447)
- feat: UDF metrics @colin-ho (#5507)
- feat: Support product @luoyuxia (#5515)
- feat: OTEL Metrics from Swordfish @srilman (#5454)
- feat: add tos object source @stayrascal (#5372)
- feat: Bind the name of the running UDF to the UDFActor @plotor (#5514)
- feat: Support text documents in prompt @colin-ho (#5520)
🐛 Bug Fixes
- fix: sorting on a literal value and aggregation with order-by @kevinzwang (#5547)
- fix: Limit with Offset returns unexpected result when reading Lance dataset @plotor (#5540)
- fix: Add absolute diff threshold to embed_text integration test @colin-ho (#5527)
- fix: test_embed_text_with_none_values with the OpenAI provider fails @desmondcheongzx (#5534)
- fix: Check for numpy dependency in prompt @colin-ho (#5521)
- fix: Handle Nones when embedding text with openai @desmondcheongzx (#5513)
- fix: Fix broken benchmark blog link @colin-ho (#5522)
♻️ Refactor
- refactor!: Remove support for creating File objects from bytes @universalmind303 (#5556)
📖 Documentation
- docs: fix migration guide @kevinzwang (#5563)
- docs: standardize key features casing to sentence case @ykdojo (#5559)
- docs: legacy UDF migration guide @kevinzwang (#5562)
- docs: simplify getting started tip in introduction @ykdojo (#5560)
- docs: update blog icon from bookmark to blog @ykdojo (#5557)
🔧 Maintenance
Full Changelog: v0.6.12...v0.6.13
v0.6.12
What's Changed 🚀
✨ Features
🐛 Bug Fixes
🚀 Performance
📖 Documentation
👷 CI
Full Changelog: v0.6.11...v0.6.12
v0.6.11
What's Changed 🚀
✨ Features
- feat: PostgresCatalog and PostgresTable followups @desmondcheongzx (#5508)
- feat: Add Catalog and Table implementations for PostgreSQL @desmondcheongzx (#5487)
- feat: make maintain_order configurable @stayrascal (#5505)
- feat: chat completions api for prompt function @colin-ho (#5497)
🐛 Bug Fixes
Full Changelog: v0.6.10...v0.6.11
v0.6.10
What's Changed 🚀
✨ Features
- feat: add --addr flag to daft-dashboard cli @VOID001 (#5444)
- feat: Support multiple image and file inputs for prompt function @colin-ho (#5481)
🐛 Bug Fixes
- fix: removes checking model directly for embedding dimensions @rchowell (#5445)
- fix: return-dtype for embed_text/image @universalmind303 (#5496)
- fix: Lower json inflation factor @colin-ho (#5461)
📖 Documentation
- docs: adds daft.func and daft.cls usage with migration page @everettVT (#5475)
🔧 Maintenance
- chore: Drop Python 3.9 @srilman (#5479)
- chore: remove extra from build command @stayrascal (#5493)
Full Changelog: v0.6.9...v0.6.10
v0.6.9
What's Changed 🚀
✨ Features
- feat: experimental vllm provider @kevinzwang (#5443)
- feat: Common Crawl x Daft tutorial using Qwen3 @malcolmgreaves (#5472)
- feat: better display for FileArrays @universalmind303 (#5482)
- feat: add a new subtype of file for video ops @universalmind303 (#5346)
- feat(dashboard): Starting Page for No Queries @srilman (#5452)
- feat: Flotilla OTEL Stats @srilman (#5463)
- feat: JSON Serialization for All Plans @srilman (#5356)
- feat: add mechanism for creating async rust based functions @universalmind303 (#5455)
- feat: Async batch func @colin-ho (#5459)
🐛 Bug Fixes
- fix: Don't update lockfile in package builds @srilman (#5484)
- fix: Undetach flotilla runner @colin-ho (#5473)
🚀 Performance
- perf: make DataType.infer 99.99% faster for core datatypes (image, file) @universalmind303 (#5469)
📖 Documentation
👷 CI
- ci: fix bun install by using node 20 @kevinzwang (#5491)
🔧 Maintenance
Full Changelog: v0.6.8...v0.6.9
v0.6.8
What's Changed 🚀
✨ Features
- feat: Allow images in prompt @colin-ho (#5466)
- feat: add pre-existence checks for lance_data_sink @huleilei (#5381)
- feat: Support setting `actor_udf_ready_timeout` via Env @plotor (#5426)
- feat: retryable udfs @universalmind303 (#5392)
- feat: Add a Bigtable data sink @desmondcheongzx (#5431)
- feat: classify_image expression @universalmind303 (#5428)
- feat: Add support for Metrics tab in quickstart Ray dashboard @jeevb (#5429)
- feat(lance): distributed FTS index creation via Daft UDF with fragment-level parallelism @huleilei (#5236)
- feat: add mimetype detection for daft.file @universalmind303 (#5411)
🐛 Bug Fixes
- fix: report a more reasonable error message when select * from <some_keywords> @VOID001 (#5440)
- fix: Fix async udf with `use_process` @colin-ho (#5457)
- fix: Drop table error in current active session @plotor (#5439)
- fix: add retry on "unable to open file" @kevinzwang (#5442)
- fix: Allow publishing quickstart helm chart to GHCR @jeevb (#5437)
- fix: convert num_rows to int when query count(*) from clickhouse @dujl (#5421)
- fix: Actually clone the repo before publishing quickstart helm chart @jeevb (#5433)
- fix: file reads for huggingface @universalmind303 (#5427)
- fix: Make benchmarking Ray cluster setup commands idempotent @jeevb (#5425)
- fix(lance): correct limit pushdown semantics with filters @huleilei (#5408)
🚀 Performance
- perf: Call async udfs asynchronously @colin-ho (#5451)
- perf: Elide shuffle for window if already partitioned @colin-ho (#5450)
- perf: defer allocation when creating series from literals @universalmind303 (#5391)
- perf: Double workers per udf actor handle @colin-ho (#5415)
♻️ Refactor
- refactor: Make helper function for calling async python functions from rust @colin-ho (#5432)
- refactor: combine sentence_transformers + transformers, and clean up … @universalmind303 (#5422)
📖 Documentation
- docs: adds ai functions, ai providers, contributing, and docstrings with nav @everettVT (#5438)
- docs: warning for Common Crawl dataset API instability @malcolmgreaves (#5436)
- docs: update the example to access S3-compatible services @huleilei (#5405)
- docs(connectors): add connector page for Lance format @huleilei (#5397)
Full Changelog: v0.6.7...v0.6.8
v0.6.7
What's Changed 🚀
💥 Breaking Changes
- feat!: Catch transient errors on turbopuffer writes @desmondcheongzx (#5380)
✨ Features
- feat: add viz for embedding @samster25 (#5419)
- feat!: Catch transient errors on turbopuffer writes @desmondcheongzx (#5380)
- feat(dashboard): Cleanup Queries Page @srilman (#5416)
- feat: Extend hash variants for xxhash @srilman (#5276)
- feat: prompt @colin-ho (#5394)
- feat: Add case function for better SQL-style conditional expressions @rasanpreetsingh3 (#5383)
🐛 Bug Fixes
- fix: Reduce number udfs by 1 in multi udf test @colin-ho (#5414)
- fix: Wrap azure fsspec in pafs.FSSpecHandler @colin-ho (#5412)
- fix(flotilla): Set flotilla actor cpu requests to 1 @colin-ho (#5404)
- fix: Fix prompt integration tests @colin-ho (#5401)
- fix: Fix Operator Finalization in Swordfish Stat Manager @srilman (#5398)
🚀 Performance
♻️ Refactor
📖 Documentation
- docs: Fix some document errors @plotor (#5409)
- docs: update minhash example to use cc dataset @everettVT (#5390)
- docs: fix daft.File usage examples @kevinzwang (#5403)
👷 CI
- ci: Remove Tests for the Old Ray Runner @srilman (#5374)
- ci: disable running tpch profiling on push @kevinzwang (#5384)
🔧 Maintenance
- chore: bump pyo3 dependency @universalmind303 (#5410)
- chore: revert #5383 @kevinzwang (#5396)
- chore: optimize operator naming @Jay-ju (#5204)
Full Changelog: v0.6.6...v0.6.7
v0.6.6
What's Changed 🚀
💥 Breaking Changes
- docs!: update docstrings for various functions @universalmind303 (#5344)
✨ Features
- feat: Explicit AWS vs. HTTP mode for common crawl dataset @malcolmgreaves (#5379)
- feat: pydantic model type conversion @kevinzwang (#5370)
- feat(dashboard): Individual Query Page @srilman (#5367)
- feat(flotilla): Flotilla sort merge join @colin-ho (#5369)
- feat: batch UDF with `@daft.func.batch` @kevinzwang (#5362)
- feat(dashboard): Queries Page @srilman (#5257)
- feat: more tensor conversions @kevinzwang (#5357)
- feat: @daft.cls decorator for new class UDFs @kevinzwang (#5350)
- feat: Detect concurrency / num gpus for model apis @colin-ho (#5342)
- feat: Flotilla linear scheduler @colin-ho (#4378)
- feat: Lazy `from_glob_path` @colin-ho (#5235)
🐛 Bug Fixes
- fix: Use sum supertype for `list_sum` type inference @colin-ho (#5366)
- fix: Use default io config in read video if not passed in @colin-ho (#5364)
- fix: support serialize and deserialize LazyImport @stayrascal (#5361)
- fix(file): python expects bytes instead of None @universalmind303 (#5348)
- fix: read_video_frames handles EOF gracefully @rchowell (#5343)
🚀 Performance
- perf(flotilla): Throttle worker refresh and autoscaling @colin-ho (#5351)
- perf: Elide shuffle for distinct if input is already partitioned @colin-ho (#5354)
- perf: use bincode instead of python for io_conf serialization in FileArray @universalmind303 (#5340)
- perf: Only Serialize Required Cols in Process UDFs @srilman (#5069)
📖 Documentation
- docs: Update Common Crawl Dataset docs to make AWS region explicit @desmondcheongzx (#5373)
- docs: add Flotilla blog post links to AI benchmarks @ykdojo (#5359)
- docs: Revamp optimization docs @colin-ho (#5347)
- docs!: update docstrings for various functions @universalmind303 (#5344)
👷 CI
- ci: Fix ai benchmark workflow @colin-ho (#5363)
- ci: Pin pydantic version in `provision.py` for iceberg tests @colin-ho (#5352)
- ci: Add ai benchmarks ci @colin-ho (#5337)
Full Changelog: v0.6.5...v0.6.6
v0.6.5
What's Changed 🚀
💥 Breaking Changes
- refactor!: make daft.File immutable @universalmind303 (#5288)
✨ Features
- feat: add `use_process` flag for `@daft.func(...)` @universalmind303 (#5323)
- feat: Dashboard Query Subscriber @srilman (#5266)
- feat: Subscriber Framework @srilman (#5210)
- feat: make file-array serializable @universalmind303 (#5304)
- feat: add count() pushdown optimization in Iceberg datasource @huleilei (#5029)
🐛 Bug Fixes
- fix: Fix the make docs warnings @colin-ho (#5328)
- fix: Iterate on the patched [email protected] @desmondcheongzx (#5322)
- fix: decimal format for handling scientific notation @rchowell (#5303)
- fix: Use patched [email protected] for AKS Workload Identity credentials to continue working > 24 hours @desmondcheongzx (#5299)
🚀 Performance
- perf: more literal optimizations @universalmind303 (#5314)
- perf: Support parallel CSV parsing when files contain carriage returns @desmondcheongzx (#5319)
♻️ Refactor
- refactor!: make daft.File immutable @universalmind303 (#5288)
📖 Documentation
- docs: add casting matrix @kevinzwang (#5333)
- docs: add daft.func docs page and APIs @kevinzwang (#5335)
- docs: Update links for running Daft in distributed mode @desmondcheongzx (#5334)
- docs: Fix broken links on minhash example @colin-ho (#5326)
- docs: Add architecture docs @colin-ho (#5320)
- docs: Add docs to broken link checker @colin-ho (#5324)
- docs: Clean up AGENTS.md structure @ykdojo (#5321)
- docs: Add Kubernetes quickstart to Daft docs @jeevb (#5318)
- docs: Add docs and values reference for quickstart chart @jeevb (#5313)
- docs: Fix broken link in Common Crawl dataset docs @desmondcheongzx (#5301)
- docs: Document Common Crawl dataset @desmondcheongzx (#5300)
👷 CI
- ci: Only fail broken link checker on 404s @colin-ho (#5327)
- ci: fix property test column name @kevinzwang (#5325)
🔧 Maintenance
- chore: Refactor DistributedPipelineNode to implement TreeDisplay @srilman (#5315)
- chore: Enable interactive html for `df.__repr_html__` @colin-ho (#5312)
Full Changelog: v0.6.4...v0.6.5