overview
Jobs that enable DELETE_ON_CANCELLATION for externalized checkpoints fail during upgrades when the operator attempts to find an externalized checkpoint. The checkpoint directory still exists, but its _metadata file has been deleted, so the job fails to start because it is unable to find the _metadata file.
When looking for externalized checkpoints, we should ensure that the checkpoint contains a _metadata file before starting the job from it.
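
The check described above could look roughly like the following. This is only an illustrative sketch, not the operator's actual code: it assumes the checkpoint directory is visible on a local filesystem and uses Flink's `chk-<n>` subdirectory naming. The function name `find_latest_usable_checkpoint` is hypothetical.

```python
import os
import re
from typing import Optional

# Flink writes each checkpoint into a subdirectory named chk-<number>.
_CHK_DIR = re.compile(r"^chk-(\d+)$")


def find_latest_usable_checkpoint(job_checkpoint_dir: str) -> Optional[str]:
    """Return the newest chk-<n> directory that still has a _metadata file.

    Hypothetical helper: assumes the checkpoint storage is reachable as a
    local path. Returns None when no checkpoint directory is usable.
    """
    candidates = []
    for entry in os.listdir(job_checkpoint_dir):
        match = _CHK_DIR.match(entry)
        path = os.path.join(job_checkpoint_dir, entry)
        if match and os.path.isdir(path):
            candidates.append((int(match.group(1)), path))

    # Walk from the highest checkpoint number down and skip any directory
    # whose _metadata file is missing (for example, after cancellation cleanup).
    for _, path in sorted(candidates, reverse=True):
        if os.path.isfile(os.path.join(path, "_metadata")):
            return path
    return None
```

If the function returns None, the caller would need to decide whether to start the job without state or to fail the upgrade; that policy is outside the scope of this sketch.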