Description
Feature request
Which Delta project/connector is this regarding?
- Spark
- Standalone
- Flink
- Kernel
- Other (fill in here)
Overview
Improve observability of Delta Lake maintenance operations by adding an explicit type for VACUUM commands in the Delta transaction log, to differentiate between VACUUM FULL and VACUUM LITE.
This would allow users and tooling that read the transaction log to distinguish full VACUUM runs from VACUUM LITE runs in a simple, structured way, without having to infer the type of vacuum from other parameters or engine-specific behavior.
Motivation
Today, VACUUM operations are logged in the Delta transaction log via commitInfo entries such as:
- operation = "VACUUM START" / "VACUUM END"
- operationParameters containing retention and other runtime parameters
However, there is no explicit, stable field that indicates whether the command that produced those commits was:
- VACUUM <table> RETAIN N HOURS (full vacuum), or
- VACUUM <table> LITE RETAIN N HOURS (lite vacuum).
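For reference, below is a minimal Python sketch of what a log-reading tool can see today. It assumes the standard Delta layout, where each commit is a file of newline-delimited JSON actions named by its zero-padded version (for example 00000000000000000003.json) in the table's _delta_log directory. It collects only commitInfo actions whose operation is VACUUM START or VACUUM END. The function name vacuum_commits is hypothetical.

```python
import json
from pathlib import Path

VACUUM_OPS = {"VACUUM START", "VACUUM END"}


def vacuum_commits(log_dir):
    """Yield (version, operation, operationParameters) for VACUUM commits.

    Hypothetical helper. Assumes commit files are named <version>.json and
    hold one JSON action per line; other files (checkpoints, .crc files,
    _last_checkpoint) are skipped because their stem is not all digits.
    """
    commit_files = sorted(
        (p for p in Path(log_dir).glob("*.json") if p.stem.isdigit()),
        key=lambda p: int(p.stem),
    )
    for path in commit_files:
        with path.open(encoding="utf-8") as f:
            for line in f:
                if not line.strip():
                    continue  # tolerate blank lines
                info = json.loads(line).get("commitInfo")
                if info and info.get("operation") in VACUUM_OPS:
                    yield (
                        int(path.stem),
                        info["operation"],
                        info.get("operationParameters", {}),
                    )
```

Because operationParameters carries only retention-related values today, a tool built this way has no field it could use to label a run as full or lite.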
Further details
The idea is to add an explicit parameter, opType, to the operationParameters of the VACUUM START commit, with one of the following values:
- FULL (for VACUUM)
- LITE (for VACUUM LITE)
The expected history output would look something like this:
+------------+---------------------------------------------------------------------------------------------------------------+----------------------------------------------------+
|operation |operationParameters |operationMetrics |
+------------+---------------------------------------------------------------------------------------------------------------+----------------------------------------------------+
|VACUUM START|{"opType":"LITE","defaultRetentionMillis":604800000,"retentionCheckEnabled":false,"specifiedRetentionMillis":0}|{"numFilesToDelete":"0","sizeOfDataToDelete":"0"} |
|VACUUM END |{"status":"COMPLETED"} |{"numDeletedFiles":"0","numVacuumedDirectories":"1"}|
+------------+---------------------------------------------------------------------------------------------------------------+----------------------------------------------------+
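A sketch of how a consumer could read the proposed field from a single commit line, assuming opType ends up in the VACUUM START commitInfo's operationParameters exactly as proposed above. Commits written before such a change would carry no opType, so the sketch reports them as UNKNOWN rather than guessing. The function name vacuum_type and the UNKNOWN fallback are illustrative only, and the quote stripping is a defensive assumption in case the engine stores parameter values JSON-encoded.

```python
import json
from typing import Optional

KNOWN_TYPES = {"FULL", "LITE"}


def vacuum_type(action_line: str) -> Optional[str]:
    """Return FULL, LITE or UNKNOWN for a VACUUM START commitInfo line.

    Returns None for any other action. 'opType' is the parameter proposed
    in this issue; it is not assumed to exist in current Delta releases.
    """
    info = json.loads(action_line).get("commitInfo")
    if not info or info.get("operation") != "VACUUM START":
        return None
    params = info.get("operationParameters") or {}
    # Defensive: accept both LITE and a JSON-encoded "\"LITE\"".
    op_type = str(params.get("opType", "")).strip('"').upper()
    # Older commits without opType fall through to UNKNOWN.
    return op_type if op_type in KNOWN_TYPES else "UNKNOWN"
```

Treating a missing opType as UNKNOWN keeps the reader usable on tables that mix commits from before and after the change.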
Compatibility / impact
- Backward compatible: only adds an optional parameter; existing readers will continue to work.
- Low risk: does not change VACUUM behavior, only the metadata stored in the transaction log.
- High value for observability: simplifies downstream log processing, monitoring, and governance checks.
Willingness to contribute
The Delta Lake Community encourages new feature contributions. Would you or another member of your organization be willing to contribute an implementation of this feature?
- Yes. I can contribute this feature independently.
- Yes. I would be willing to contribute this feature with guidance from the Delta Lake community.
- No. I cannot contribute this feature at this time.