@XiaoHongbo-Hope XiaoHongbo-Hope commented Jan 30, 2026

Problem

When a user updates a column in only one shard (for example, ShardTableUpdator runs on shard 0 only and writes a new column d), a full table read fails:

pyarrow.lib.ArrowInvalid: Schema at index 1 was different: d: int32 vs d: null

Only that shard's files contain the new column; the other files do not. Concatenating the resulting batches then hits a schema mismatch, and the read crashes. To fix this, the PR adds support for reading a data evolution table after a per-shard update.

Tests

API and Format

Documentation

@XiaoHongbo-Hope XiaoHongbo-Hope marked this pull request as ready for review January 30, 2026 08:19
@XiaoHongbo-Hope XiaoHongbo-Hope changed the title [python/hotfix] fix data-evolution read after partial shard update [python] support data evolution shard read Jan 31, 2026
@XiaoHongbo-Hope XiaoHongbo-Hope marked this pull request as draft January 31, 2026 07:48
@XiaoHongbo-Hope XiaoHongbo-Hope marked this pull request as ready for review January 31, 2026 08:54
@XiaoHongbo-Hope XiaoHongbo-Hope changed the title [python] support data evolution shard read [python] support read after update by shard of data evolution table Feb 1, 2026
  row_tracking_enabled: bool,
- system_fields: dict):
+ system_fields: dict,
+ requested_field_names: Optional[List[str]] = None):

Should this just use fields: List[DataField]?

"""Ensure _ROW_ID and _SEQUENCE_NUMBER are not null (per SpecialFields)."""
fields = []
for field in schema:
if field.name == SpecialFields.ROW_ID.name or field.name == SpecialFields.SEQUENCE_NUMBER.name:

Why can it be nullable?

@XiaoHongbo-Hope XiaoHongbo-Hope Feb 2, 2026

This is a bug: the nullability of the row-tracking system fields is lost during _assign_row_tracking. I opened a separate PR, #7174, to fix it.

@XiaoHongbo-Hope XiaoHongbo-Hope marked this pull request as draft February 3, 2026 03:51