-
Notifications
You must be signed in to change notification settings - Fork 1.5k
[AI-6250] Add support for bhist metrics #22030
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Codecov Report❌ Patch coverage is Additional details and impacted files🚀 New features to boost your workflow:
|
| if start_time == end_time: | ||
| self.log.trace("Start time %s is equal to end time %s, going back 1 minute", start_time, end_time) | ||
| # the highest granularity is 1 minute, so we need to go back 1 minute if collection interval < 60 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Given this has a 1 minute granularity, would this be treated in the same way as a badmin perfmon? As in would this normally be set up in an instance with a min collection interval of 60?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah yeah, I mentioned this dilemma in the description- I opted to choose to run it at the default collection interval as I didn't want to make the configuration more complex and I wanted these metrics to be enabled by default. But if we tell the user to put it in the instance with badmin_perfmon then that would be ideal solution, just a little more complicated for the user to setup. I don't have a strong preference so I'm ok with having these off by default and recommended to be collected with perfmon!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't have a strong preference either. But I guess if it's not detrimental, we can err on the side of simplicity.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm thinking about it more and I think I prefer running it with badmin perfmon as we'll be making unnecessary queries otherwise
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK I updated the default, the documentation is almost ready as well: #22069. Thanks!
Review from cswatt is dismissed. Related teams and files:
- documentation
- ibm_spectrum_lsf/assets/configuration/spec.yaml
- ibm_spectrum_lsf/datadog_checks/ibm_spectrum_lsf/data/conf.yaml.example
What does this PR do?
Add support for metrics from bhist, or completed job metrics.
We will keep track of the last check run time so that we only calculate metrics for jobs that recently completed. There might be some overlap, however, as the bhist command only supports a time range with a 1 minute granularity, so if the collection interval is less than a minute, we will get metrics from the past minute. If averaging by
job IDthis should not impact metric correctness. However, we could alternatively suggest that customers set a higher collection interval for bhist metrics.Motivation
Review checklist (to be filled by reviewers)
qa/skip-qalabel if the PR doesn't need to be tested during QA.backport/<branch-name>label to the PR and it will automatically open a backport PR once this one is merged