
Conversation

@allisonport-db (Collaborator) commented Dec 2, 2025

Which Delta project/connector is this regarding?

  • Spark
  • Standalone
  • Flink
  • Kernel
  • Other (fill in here)

Description

PART OF #5326

Contains the following changes:

  • Removes Spark 3.5 support
  • Adds explicit Spark 4.0 support
  • Removes a "master" build for now
  • Merges the shims for the 3.5 vs. 4.0 breaking changes into the source code

In a future PR

  • we will add Spark 4.1.0-SNAPSHOT support (in preparation for the Spark 4.1 release)
  • we will add back a "master" build tracking Spark master
    (these will require adding new shims, but in different areas)

How was this patch tested?

Unit tests + ran integration tests locally (python, scala + pip)

Tracking open TODOs at #5326

@allisonport-db changed the title from "[Spark][Infra][WIP] Drop support for Spark 3.5 in master" to "[Spark][Infra] Drop support for Spark 3.5 in master" on Dec 3, 2025
numberOfAddFiles = checkpointDataIter.getNumberOfAddActions();
} catch (FileAlreadyExistsException faee) {
throw new CheckpointAlreadyExistsException(version);
} catch (IOException io) {
Collaborator Author

Upgrading the Hadoop version changes the exception class thrown here.

Collaborator

Hm .. I wonder what the change was?

Collaborator Author

I'm not sure what changed in Hadoop, but instead of seeing a FileAlreadyExistsException we now see an IOException whose cause is a FileAlreadyExistsException. We have this tested (at least one test fails without this fix).
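For reference, a minimal sketch (not the exact change in this PR) of handling both behaviors: the older Hadoop path that throws FileAlreadyExistsException directly, and the newer path that wraps it as the cause of an IOException. The writeCheckpointFile helper and the local CheckpointAlreadyExistsException class below are stand-ins for the Kernel code in the hunk above; the assumed exception type is Hadoop's org.apache.hadoop.fs.FileAlreadyExistsException.

import java.io.IOException;
import org.apache.hadoop.fs.FileAlreadyExistsException; // assumed exception type; it extends IOException

class CheckpointWriteSketch {
  // Local stand-in for Kernel's CheckpointAlreadyExistsException, only to keep this sketch self-contained.
  static class CheckpointAlreadyExistsException extends RuntimeException {
    CheckpointAlreadyExistsException(long version) {
      super("A checkpoint already exists for version " + version);
    }
  }

  // Hypothetical helper standing in for the checkpoint write in the hunk above.
  static void writeCheckpointFile(long version) throws IOException { /* ... */ }

  static void writeCheckpoint(long version) throws IOException {
    try {
      writeCheckpointFile(version);
    } catch (FileAlreadyExistsException faee) {
      // Older Hadoop: the conflict surfaces directly as FileAlreadyExistsException.
      throw new CheckpointAlreadyExistsException(version);
    } catch (IOException io) {
      // Newer Hadoop: the same conflict arrives as an IOException whose cause is a
      // FileAlreadyExistsException, so inspect the cause before rethrowing.
      if (io.getCause() instanceof FileAlreadyExistsException) {
        throw new CheckpointAlreadyExistsException(version);
      }
      throw io;
    }
  }
}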

Collaborator Author

They seem similar enough that I didn't look further into it; it seems like a minor API difference.

@allisonport-db (Collaborator Author)

In theory I could split the source-code shims and the test-code shims into separate PRs if that would help. Let me know if that would make review easier (I'm not sure whether anyone wants to review the shim changes closely, or whether passing tests and a clean compile are enough).

echo "❌ Cache MISS - will download dependencies"
fi
- name: Run tests
# Run unit tests with JDK 17. These unit tests depend on Spark, and Spark 4.0+ is JDK 17.
Collaborator

Would this comment be better placed beside java-version: "17"?

@@ -1,59 +0,0 @@
name: "Delta Spark Master"
Collaborator

I suggest we update the PR title to say we're dropping support for Spark 3.5 and Spark master compilation?

Collaborator Author

I mean, we haven't actually been compiling against Spark master in a while (we're using a very stale snapshot). But I can make the title clearer.

Collaborator

Hm. Sorry, I'm still confused. Here we are deleting our job to compile against Spark "master", right? (Perhaps it was a stale master.)

But does "Drop support for Spark 3.5 and formally pin to released Spark 4.0.1" reflect that?

That seems like an important highlight, sorry, and I want to make sure my understanding is correct.

Collaborator Author

I think calling it Spark master before was misleading; in fact, in the previous PR we renamed the Spark version spec to spark40Snapshot instead of master. Saying we are removing Spark master support would also be misleading, considering we were never actually compiling against Spark master. We will fix that in future PRs.

Collaborator Author

It would be more correct to say spark_master_test.yaml was incorrectly named this whole time.

@allisonport-db changed the title from "[Spark][Infra] Drop support for Spark 3.5 in master" to "[Spark][Infra] Drop support for Spark 3.5 and formally pin to released Spark 4.0.1" on Dec 6, 2025

// Changes in 4.1.0
// TODO: change in type hierarchy due to removal of DeltaThrowableConditionShim
ProblemFilters.exclude[MissingTypesProblem]("io.delta.exceptions.*")
Collaborator Author

@reviewers this seems safe to me, considering no one should be catching DeltaThrowableConditionShim, but I'd like additional opinions.
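To make the compatibility argument concrete, a hedged illustration (not project code) of which callers the exclusion affects. DeltaConcurrentException below is a local stand-in for one of the concrete public classes in io.delta.exceptions; only code that referenced the shim type directly would notice the hierarchy change.

class MimaExclusionSketch {
  // Local stand-in for a concrete public exception class in io.delta.exceptions.
  static class DeltaConcurrentException extends RuntimeException {}

  // Hypothetical operation that may throw the exception above.
  static void commit() { /* ... */ }

  public static void main(String[] args) {
    try {
      commit();
    } catch (DeltaConcurrentException e) {
      // Callers like this only name the concrete exception class, so they keep
      // compiling and matching after DeltaThrowableConditionShim is dropped from
      // the type hierarchy.
    }
    // Only callers that referenced the shim type itself, e.g.
    //   catch (DeltaThrowableConditionShim e) { ... }
    // would notice the change, which is the pattern the comment above argues
    // no one should be using.
  }
}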

).configureUnidoc()

/*
TODO: readd delta-iceberg on Spark 4.0+
Collaborator Author

@lzlfred Hey Fred, we will be releasing on both Spark 4.0 and Spark 4.1 in the next release; we will need to update this build to work for that.

Collaborator Author

also tracking the todo at #5326

).configureUnidoc()

/*
TODO: compilation broken for Spark 4.0
@allisonport-db (Collaborator Author) commented Dec 9, 2025

tracking at #5326

@linzhou-db @littlegrasscao FYI can you please look into fixing this once I merge this PR

val lookupSparkVersion: PartialFunction[(Int, Int), String] = {
// version 4.0.0-preview1
case (major, minor) if major >= 4 => "4.0.0-preview1"
// TODO: how to run integration tests for multiple Spark versions
Collaborator Author

tracking at #5326

with open("python/README.md", "r", encoding="utf-8") as fh:
long_description = fh.read()

# TODO: once we support multiple Spark versions update this to be compatible with both
Collaborator Author

tracking at #5326
