Hey,
I am using the operator image docker.io/lyft/flinkk8soperator:1355d206b5fb4efd6f6e4ccf24085a87a29443c5,
running on AWS EKS version 1.21.
Sometimes the job manager floods its log with the message below. Once that starts, I am unable to redeploy the flinkapp without it ending up in the "DeployFailed" state.
log: 2022-07-04 06:03:35,466 ERROR org.apache.flink.runtime.rest.handler.job.JobDetailsHandler [] - Exception occurred in REST handler: Job <HASH> not found
At the same time, the task manager log is empty (which makes sense).
In the operator logs I see the following message for multiple flink apps:
{"json":{"app_name":"esp-process-666","ns":"int-streaming","phase":"Running"},"level":"warning","msg":"Failed to reconcile resource <NAMESPACE>/<APP NAME>: GetJobOverview call failed with status 404 Not Found and message ''","ts":"2022-07-04T06:08:35Z"}
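The 404 in the operator log suggests the job manager no longer knows the job ID being polled. A minimal sketch for confirming that against the Flink REST API's /jobs/overview endpoint, assuming the job manager REST port is reachable (for example through a port-forward); the base URL and the helper names find_job and fetch_overview are illustrative and not part of the operator:

```python
import json
import urllib.request


def find_job(overview: dict, job_id: str):
    """Return the overview entry whose "jid" matches job_id, or None if it is not listed."""
    for job in overview.get("jobs", []):
        if job.get("jid") == job_id:
            return job
    return None


def fetch_overview(base_url: str) -> dict:
    """GET <base_url>/jobs/overview and return the parsed JSON body.

    base_url is an assumption, e.g. "http://localhost:8081" behind a port-forward.
    """
    with urllib.request.urlopen(f"{base_url}/jobs/overview", timeout=10) as resp:
        return json.load(resp)
```

Calling find_job(fetch_overview(base_url), job_id) with the job ID from the error message returns None when the job manager does not list that job, which would match the "Job <HASH> not found" error above.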
Is this a known issue?
How do I recover from this without deleting and redeploying the flink app?