RFC to introduce an IO Client abstraction layer #5533
Replies: 1 comment
@desmondcheongzx Nice proposal. It would make it easier for developers to add more storage backends. I have some questions about the proposed architectures above:
To summarize: Daft currently seems to need an object interface rather than a filesystem interface. I think the current design of the ObjectSource trait is concise (though the docs could be improved to describe its semantics), so we could reuse the trait and decouple each storage backend into a separate crate. But the
Background
Daft currently has several custom IO connectors (S3, GCS, etc.), each integrated directly into the main codebase. This creates friction for contributors who wish to add new storage backends (like ByteDance’s TOS) or even proprietary IO clients, and increases the maintenance burden for the core team.
To improve modularity and sustainability, we propose a unified IO abstraction layer based on fsspec or OpenDAL, which allows third-party connectors to live and evolve independently in an external library outside of Daft core.
Goals
Decouple core Daft from storage backends
Core Daft should depend only on an abstract file system interface, and potentially a select few IO clients like S3 that we can test and maintain easily.
Empower external contributors
Developers can build and publish new connectors like daft-tos or daft-hdfs as standalone packages.
Leverage community ecosystems
Use existing fsspec/OpenDAL backends where possible, and provide a thin Daft-specific adapter.
Simplify long-term maintenance
New storage systems can be added without changing Daft’s main codebase. Key stakeholders can also own and iterate on their connectors with full discretion.
Proposed Architecture
Option A: fsspec-based
Option B: OpenDAL-based
Migration Plan
Phase 1: Maintain existing built-in clients for S3, GCS, etc.
Phase 2: Introduce an experimental fsspec/OpenDAL adapter and migrate the TOS client onto it.
Phase 3: Deprecate built-in clients once equivalent external connectors exist.
Phase 4: Document the plugin interface and create a registry for community-maintained connectors.
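To make the registry idea in Phase 4 more concrete, here is a minimal sketch of how connectors might be looked up by URL scheme. Everything in it is an assumption for illustration: `register_connector`, `resolve_connector`, and `FakeTosClient` are hypothetical names, not existing Daft, fsspec, or OpenDAL APIs, and a real design might rely on package entry points instead of an in-process dictionary.

```python
from typing import Callable, Dict
from urllib.parse import urlparse

# Hypothetical registry: URL scheme (e.g. "tos") -> factory that builds a client.
_CONNECTORS: Dict[str, Callable[[], object]] = {}


def register_connector(scheme: str, factory: Callable[[], object]) -> None:
    """Register a client factory for a URL scheme; reject duplicates."""
    scheme = scheme.lower()
    if scheme in _CONNECTORS:
        raise ValueError(f"connector already registered for scheme {scheme!r}")
    _CONNECTORS[scheme] = factory


def resolve_connector(url: str) -> object:
    """Build the client registered for the scheme of `url`."""
    scheme = urlparse(url).scheme.lower()
    try:
        factory = _CONNECTORS[scheme]
    except KeyError:
        raise LookupError(f"no connector registered for scheme {scheme!r}") from None
    return factory()


class FakeTosClient:
    """Stand-in for what an external package such as daft-tos might provide."""

    name = "tos"


# An external package would perform this registration when it is imported.
register_connector("tos", FakeTosClient)
client = resolve_connector("tos://bucket/path/data.parquet")
```

With this shape, core code only ever calls `resolve_connector`, so adding a new backend would not require changes to the main codebase.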
Next Steps
Align on fsspec vs. OpenDAL as the preferred base layer.
Define the minimal interface (open/read/write/list) that Daft expects.
Collaborate with ByteDance to pilot the first external connector (daft-tos).
Publish a design doc + example repo to guide future contributors.
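As a starting point for the interface discussion, the open/read/write/list surface could be sketched roughly as follows. This is only an illustration under stated assumptions: `IOClient` and `InMemoryClient` are hypothetical names, the method signatures are guesses rather than an agreed design, and the in-memory backend exists purely to show that a connector can satisfy the interface without touching core code.

```python
import io
from typing import BinaryIO, Dict, List, Protocol


class IOClient(Protocol):
    """Hypothetical minimal interface a connector would implement."""

    def open(self, path: str, mode: str = "rb") -> BinaryIO: ...
    def read(self, path: str) -> bytes: ...
    def write(self, path: str, data: bytes) -> None: ...
    def list(self, prefix: str) -> List[str]: ...


class InMemoryClient:
    """Toy backend that stores objects in a dict keyed by full path."""

    def __init__(self) -> None:
        self._objects: Dict[str, bytes] = {}

    def open(self, path: str, mode: str = "rb") -> BinaryIO:
        # Only read-only binary access is sketched here.
        if mode != "rb":
            raise ValueError("this sketch only supports mode='rb'")
        return io.BytesIO(self.read(path))

    def read(self, path: str) -> bytes:
        try:
            return self._objects[path]
        except KeyError:
            raise FileNotFoundError(path) from None

    def write(self, path: str, data: bytes) -> None:
        self._objects[path] = bytes(data)

    def list(self, prefix: str) -> List[str]:
        # Object stores list by key prefix rather than by directory.
        return sorted(p for p in self._objects if p.startswith(prefix))


# Structural typing: InMemoryClient satisfies IOClient without inheriting from it.
client: IOClient = InMemoryClient()
client.write("mem://bucket/a.csv", b"x,y\n1,2\n")
client.write("mem://bucket/b.csv", b"x,y\n3,4\n")
```

Using a `Protocol` keeps connectors free of any import-time dependency on a base class, which fits the goal of letting them live in separate packages.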