Description
Feature request
Which Delta project/connector is this regarding?
- Spark
- Standalone
- Flink
- Kernel
Overview
Improve observability of Delta Lake VACUUM operations by including the table identifier (table ID / table path) in all VACUUM-related log messages. This helps users see exactly which table is being vacuumed when multiple Delta tables are processed within the same job. The table path is logged when the VACUUM job starts, but that alone is not very debug-friendly.
Motivation
Currently, when a user runs VACUUM on multiple tables, the log messages do not include any table identifier. When multiple tables are processed in a single job, users cannot determine which table is being cleaned up from the logs alone:
Current logs:
25/11/27 05:05:14 INFO VacuumCommand: Starting garbage collection (dryRun = false) of untracked files older than 20 Nov 2025 05:05:14 GMT in /tmp/delta/vacuum
25/11/27 05:05:26 INFO VacuumCommand: Deleting untracked files and empty directories in /tmp/delta/vacuum. The amount of data to be deleted is 0 (in bytes)
25/11/27 05:05:29 INFO VacuumCommand: Deleted 0 files (0 bytes) and directories in a total of 1 directories. Vacuum stats: DeltaVacuumStats(false,None,604800000,1763615114082,1,4,0,0,10038,1158,1764219914030,1764219929008,8,8,8,false,0,0,7,None,None,FULL)
25/11/27 05:05:30 INFO VacuumCommand: Starting garbage collection (dryRun = false) of untracked files older than 20 Nov 2025 05:05:30 GMT in /tmp/delta/vacuum_2
25/11/27 05:05:36 INFO VacuumCommand: Deleting untracked files and empty directories in /tmp/delta/vacuum_2. The amount of data to be deleted is 10638 (in bytes)
25/11/27 05:05:37 INFO VacuumCommand: Deleted 9 files (10638 bytes) and directories in a total of 1 directories. Vacuum stats: DeltaVacuumStats(false,None,604800000,1763615130357,1,10,9,10638,4660,1394,1764219930305,1764219937437,8,8,8,false,0,0,31,None,None,FULL)
25/11/27 05:05:38 INFO VacuumCommand: Starting garbage collection (dryRun = false) of untracked files older than 20 Nov 2025 05:05:38 GMT in /tmp/delta/orders
25/11/27 05:05:44 INFO VacuumCommand: Deleting untracked files and empty directories in /tmp/delta/orders The amount of data to be deleted is 0 (in bytes)
25/11/27 05:05:45 INFO VacuumCommand: Deleted 0 files (0 bytes) and directories in a total of 1 directories. Vacuum stats: DeltaVacuumStats(false,None,604800000,1763615138428,1,4,0,0,4887,938,1764219938390,1764219945316,8,8,8,false,0,0,5,None,None,FULL)
The above logs make it hard to:
- Attribute long‑running VACUUM operations to specific tables
- Debug problems when a particular table’s VACUUM fails or behaves unexpectedly
- Distinguish multiple VACUUM runs across different tables
Further details
The implementation would add a [tableId=<table_id>] prefix to all VACUUM log messages. The table ID is already available in VacuumCommand.scala as:
val snapshot = table.update()
deltaLog.protocolWrite(snapshot.protocol)
val tableId = snapshot.metadata.id
- All changes would be confined to VacuumCommand.scala
Expected output after the change:
INFO VacuumCommand: [tableId=abcd1234] Starting garbage collection ...
INFO VacuumCommand: [tableId=abcd1234] Deleting untracked files and empty directories in ...
INFO VacuumCommand: [tableId=abcd1234] Deleted N files (...) Vacuum stats: ...
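As a rough illustration of the proposed prefix, here is a minimal standalone sketch. It does not touch Delta Lake code: `VacuumLogPrefix`, `withTableId`, and `tagged` are hypothetical names invented for this example, `println` stands in for the real logger, and the table ID is a placeholder for the value that `snapshot.metadata.id` would supply.

```scala
// Hypothetical helper for illustration only; not part of Delta Lake.
object VacuumLogPrefix {
  /** Prepends the proposed [tableId=...] tag to a log message. */
  def withTableId(tableId: String, message: String): String =
    s"[tableId=$tableId] $message"

  /** Wraps a logging function so every message it receives carries the tag. */
  def tagged(tableId: String, log: String => Unit): String => Unit =
    message => log(withTableId(tableId, message))
}

object VacuumLogPrefixDemo {
  def main(args: Array[String]): Unit = {
    // Placeholder ID; in VacuumCommand this would come from snapshot.metadata.id.
    val tableId = "abcd1234"
    // println stands in for the real logging call (an assumption for this sketch).
    val logInfo = VacuumLogPrefix.tagged(tableId, msg => println(msg))
    logInfo("Starting garbage collection ...")
    logInfo("Deleted N files (...) Vacuum stats: ...")
  }
}
```

Wrapping the logging function once per VACUUM run, rather than editing each message string, would keep the prefix consistent across all messages, though the actual change could just as well inline the tag at each call site.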
Willingness to contribute
The Delta Lake Community encourages new feature contributions. Would you or another member of your organization be willing to contribute an implementation of this feature?
- Yes. I can contribute this feature independently.
- Yes. I would be willing to contribute this feature with guidance from the Delta Lake community.
- No. I cannot contribute this feature at this time.