Open
Labels: enhancement (New feature or request)
Description
Feature request
Which Delta project/connector is this regarding?
- Spark
- Standalone
- Flink
- Kernel
- Other (fill in here)
Overview
Improve VACUUM progress logging and metrics so that users can see how far a VACUUM has progressed when they run either of the following:
- FULL VACUUM (filesystem-based directory listing), which finds eligible stale files that are not referenced in the transaction log
- LITE VACUUM (Delta log-based scan of RemoveFile / CDF actions)
Motivation
Today, when a user triggers a VACUUM command, only the following log lines are emitted:
25/12/02 04:51:29 INFO VacuumCommand: Starting garbage collection (dryRun = false) of untracked files older than 2 Dec 2025 04:51:29 GMT in hdfs://hadoop.spark:9000/tmp/delta_vacuum_progress
25/12/02 04:51:41 INFO VacuumCommand: Deleting untracked files and empty directories in hdfs://hadoop.spark:9000/tmp/delta_vacuum_progress. The amount of data to be deleted is 447370000324 (in bytes)
-- After ~1 hr 30 minutes ----
25/12/02 06:21:45 INFO VacuumCommand: Deleted 250000 files (44737324 bytes) and directories in a total of 1 directories. Vacuum stats: DeltaVacuumStats(false,Some(0),604800000,1764651089738,1,2500,2500,44737324,10938,1886,1764651089735,1764651105425,0,50,Some(0),Some(50),LITE)
The logs above show that the VACUUM completed successfully after about 90 minutes, but during that time nothing was logged to indicate what was happening or how many files had been listed so far. This makes it difficult to track the progress of the VACUUM command.
Proposed improvements
- Add a progress thread that monitors the VACUUM command and reports the number of files listed so far, every 10 minutes by default, so that users can see what is happening in the backend.
- Write these progress reports to the logs so that the time taken can be tracked against the number of files listed.
- Implement the progress tracking separately for FULL VACUUM and LITE VACUUM, since the two modes find files differently.
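To make the proposal concrete, here is a minimal sketch of the kind of progress thread described above. It is written in Python purely for illustration (Delta's VacuumCommand itself is Scala), and every name in it (`ProgressReporter`, `add`, `interval_seconds`, the log message format) is hypothetical and not part of any existing Delta API. The assumption is that the FULL path would call `add` as directory listings return files, while the LITE path would call it as log actions are scanned.

```python
import logging
import threading
import time

logger = logging.getLogger("vacuum_progress")


class ProgressReporter:
    """Hypothetical helper: logs a running file count at a fixed interval."""

    def __init__(self, mode, interval_seconds=600.0, log=logger.info):
        self.mode = mode  # "FULL" or "LITE"; used only as a label in the log line
        self.interval_seconds = interval_seconds  # 600 s = the proposed 10-minute default
        self._log = log
        self._count = 0
        self._lock = threading.Lock()
        self._stop = threading.Event()
        self._started_at = None
        self._thread = threading.Thread(target=self._run, daemon=True)

    def add(self, n=1):
        """Called by the listing/scan code each time n more files are found."""
        with self._lock:
            self._count += n

    def count(self):
        with self._lock:
            return self._count

    def _report(self, stage):
        elapsed = time.monotonic() - self._started_at
        self._log(
            f"{self.mode} VACUUM {stage}: {self.count()} files listed "
            f"after {elapsed:.0f} s"
        )

    def _run(self):
        # Event.wait returns False on timeout and True once stop is set,
        # so the loop logs once per interval and exits promptly on stop.
        while not self._stop.wait(self.interval_seconds):
            self._report("progress")

    def __enter__(self):
        self._started_at = time.monotonic()
        self._thread.start()
        return self

    def __exit__(self, exc_type, exc, tb):
        self._stop.set()
        self._thread.join()
        self._report("finished")
        return False  # do not swallow exceptions from the wrapped work


if __name__ == "__main__":
    logging.basicConfig(level=logging.INFO)
    # Short interval so the demo finishes quickly; real use would keep the default.
    with ProgressReporter("FULL", interval_seconds=0.05) as progress:
        for _ in range(5):
            time.sleep(0.03)    # stand-in for one directory listing call
            progress.add(1000)  # stand-in for the files that call returned
```

Keeping the counter behind a single `add` call is one way to share the reporting thread between both modes while still letting FULL and LITE decide for themselves what counts as "listed".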
Willingness to contribute
The Delta Lake Community encourages new feature contributions. Would you or another member of your organization be willing to contribute an implementation of this feature?
- Yes. I can contribute this feature independently.
- Yes. I would be willing to contribute this feature with guidance from the Delta Lake community.
- No. I cannot contribute this feature at this time.