Fix istio cert rotation bug for issue #4744 #5820
base: release/v3.9
Signed-off-by: Jonathan Bailey <[email protected]>
3d181ae to c2e36f2
Hey @jonathanelbailey, are you on the CNCF Slack by any chance? I'd like to chat about this one... 🙂
kflynn left a comment
I'm leaning toward taking this but there are some corner cases I'm thinking about -- nice work in any case, many thanks! 🙂 I'm most curious, I think, about the test case, and about what testing you've put this through in general?
Thanks!! and happy to talk on Slack if you'd rather.
# We have a cache. Start by assuming that we'll need to reset it,
# OK. If we don't have a cache and there are no deltas, just skip all this crap.
if (cache and fetcher.deltas) is not None:
# We have a cache and deltas is non null. Start by assuming that we'll need to reset it,
Given that fetcher.deltas is indeed always non-None, I think that this comment and the default for reset_cache below need changing. Probably we can just default reset_cache to True.
# Yes. We're going to walk over them all and assemble a list
# of things to delete and a count of errors while processing our
# list.
# Yes. We're going to walk over them all and assemble a list
# Yes. We're going to walk over them all and assemble a list
# We're going to walk over all the deltas and assemble a list
]
)
def test_check_deltas(self, name, cache_entry, deltas, expected, caplog):
Silly question: does this test fail without your change? 🙂
# OK. If we don't have a cache and there are no deltas, just skip all this crap.
if (cache and fetcher.deltas) is not None:
(Restoring the comment that @the-wondersmith accidentally deleted 😂):
This logic feels a little weird – overall I think you're going for "if we have a cache and there are deltas", and I think that'd be better expressed as:
if (cache is not None) and fetcher.deltas:
My reasoning is that fetcher.deltas actually cannot ever be None – it's either an empty list or a list with some things in it.
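To make the difference between the two expressions concrete, here is a minimal, self-contained sketch. It is not the Emissary code: `FakeCache`, `original_condition`, and `suggested_condition` are hypothetical names introduced only for illustration, and the sketch assumes, as stated above, that the deltas value is always a list and never None.

```python
from typing import Any, Optional


class FakeCache:
    """Hypothetical stand-in for the real cache object (plain objects are truthy)."""


def original_condition(cache: Optional[Any], deltas: list) -> bool:
    # Mirrors the expression under review: `(cache and deltas) is not None`.
    # `cache and deltas` evaluates to `cache` when cache is falsy, else to `deltas`,
    # so with a never-None deltas list this reduces to `cache is not None`.
    return (cache and deltas) is not None


def suggested_condition(cache: Optional[Any], deltas: list) -> bool:
    # Mirrors the suggestion: `(cache is not None) and deltas`,
    # wrapped in bool() so the helper returns a plain True/False.
    return (cache is not None) and bool(deltas)


# Hypothetical delta payload; the real delta shape is not shown in this thread.
some_delta = {"kind": "Secret"}

cases = [
    ("no cache, no deltas", None, []),
    ("no cache, deltas", None, [some_delta]),
    ("cache, no deltas", FakeCache(), []),
    ("cache, deltas", FakeCache(), [some_delta]),
]

for label, cache, deltas in cases:
    print(
        f"{label:22} "
        f"original={original_condition(cache, deltas)!s:5} "
        f"suggested={suggested_condition(cache, deltas)!s:5}"
    )
```

The two expressions disagree only in the "cache, no deltas" row: the original enters the block and the suggestion skips it. Since the PR description concerns a cache entry that changes without producing a delta, that row seems to be the case worth pinning down explicitly in the test.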
@jonathanelbailey Hello! Would you be willing to take a look at @kflynn's comments? Additionally, could you please target this MR at the |
Description
This PR fixes a known issue when Emissary is configured to connect to an Istio mTLS network. When Emissary's Istio sidecar initiates a cert rotation, the istio-certs secret generates a cache entry but does not generate a delta, since the istio-certs cache entry does not map to an actual resource on the Kubernetes cluster. This can eventually resolve itself if a later delta is generated that initiates a change, but at other times it results in an unrecoverable error that can cause traffic disruption without custom health checks.
Related Issues
#4744
Testing
Checklist
Does my change need to be backported to a previous release?
I made sure to update CHANGELOG.md.
Remember, the CHANGELOG needs to mention:
This is unlikely to impact how Ambassador performs at scale.
Remember, things that might have an impact at scale include:
My change is adequately tested.
Remember when considering testing:
I updated CONTRIBUTING.md with any special dev tricks I had to use to work on this code efficiently.
The changes in this PR have been reviewed for security concerns and adherence to security best practices.