Exception occurred in REST handler: Job X not found #256

@liad5h

Hey,

I am using the operator at version docker.io/lyft/flinkk8soperator:1355d206b5fb4efd6f6e4ccf24085a87a29443c5,
running on AWS EKS version 1.21.

Sometimes the job manager floods its log with the message below. Once that starts, I am unable to redeploy the Flink app without it reaching the "DeployFailed" state.

log: 2022-07-04 06:03:35,466 ERROR org.apache.flink.runtime.rest.handler.job.JobDetailsHandler [] - Exception occurred in REST handler: Job <HASH> not found

At the same time, the task manager has no logs at all (which makes sense).

In the operator logs I see the following message for multiple Flink apps:
{"json":{"app_name":"esp-process-666","ns":"int-streaming","phase":"Running"},"level":"warning","msg":"Failed to reconcile resource <NAMESPACE>/<APP NAME>: GetJobOverview call failed with status 404 Not Found and message ''","ts":"2022-07-04T06:08:35Z"}
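One way to narrow this down is to check whether the JobManager still knows the job ID that appears in the error. The sketch below is only an illustration, not something taken from the operator: it assumes the JobManager REST endpoint is reachable at a hypothetical base URL (for example through a port-forward to port 8081) and reads Flink's `/jobs/overview` endpoint. The helper names `fetch_overview` and `find_job` are made up for this example.

```python
import json
import urllib.request


def find_job(overview: dict, job_id: str):
    """Return the overview entry whose "jid" equals job_id, or None if absent."""
    for job in overview.get("jobs", []):
        if job.get("jid") == job_id:
            return job
    return None


def fetch_overview(base_url: str, timeout: float = 5.0) -> dict:
    """GET <base_url>/jobs/overview from the JobManager and decode the JSON body."""
    with urllib.request.urlopen(f"{base_url}/jobs/overview", timeout=timeout) as resp:
        return json.load(resp)


# Usage (hypothetical URL and job ID, replace with your own values):
#   overview = fetch_overview("http://localhost:8081")
#   job = find_job(overview, "<HASH>")
#   print("JobManager does not know this job" if job is None else job.get("state"))
```

If the job ID from the error is missing from the overview while the operator keeps polling it, that would be consistent with the 404 shown in the operator log, though it does not by itself explain why the job disappeared.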

Is this a known issue?
How do I recover from this without deleting and redeploying the Flink app?
