@sarah-witt
Contributor

@sarah-witt sarah-witt commented Dec 2, 2025

What does this PR do?

Add support for completed-job metrics collected from bhist.
We will keep track of the last check run time so that we only calculate metrics for jobs that completed since then. Some overlap is possible, however: the bhist command only supports a time range with 1 minute granularity, so if the collection interval is less than a minute, each run returns metrics from the past minute. When averaging by job ID, this should not affect metric correctness. Alternatively, we could suggest that customers set a higher collection interval for bhist metrics.
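
For illustration only, the windowing described above could be sketched roughly as follows. The names `bhist_window` and `average_by_job` are hypothetical and are not taken from this PR. The sketch assumes timestamps are truncated to whole minutes before being handed to bhist, and that samples arrive as (job ID, value) pairs.

```python
from datetime import datetime, timedelta
from typing import Dict, Iterable, Tuple


def bhist_window(last_run: datetime, now: datetime) -> Tuple[datetime, datetime]:
    """Hypothetical helper: return a (start, end) window in whole minutes."""
    # Assumption: the time range only has 1 minute granularity, so drop seconds.
    start = last_run.replace(second=0, microsecond=0)
    end = now.replace(second=0, microsecond=0)
    if start == end:
        # Both runs fall in the same minute (collection interval < 60 seconds),
        # so go back 1 minute to avoid querying an empty range.
        start = end - timedelta(minutes=1)
    return start, end


def average_by_job(samples: Iterable[Tuple[int, float]]) -> Dict[int, float]:
    """Hypothetical helper: average values per job ID.

    A job reported in two overlapping windows yields the same value twice,
    so its per-job average is unchanged by the overlap.
    """
    totals: Dict[int, Tuple[float, int]] = {}
    for job_id, value in samples:
        total, count = totals.get(job_id, (0.0, 0))
        totals[job_id] = (total + value, count + 1)
    return {job_id: total / count for job_id, (total, count) in totals.items()}
```

With a 15 second collection interval, two consecutive runs in the same minute produce the same one-minute window, which is the overlap mentioned above. Averaging by job ID keeps the repeated samples from skewing the result.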

Motivation

Review checklist (to be filled by reviewers)

  • Feature or bugfix MUST have appropriate tests (unit, integration, e2e)
  • Add the qa/skip-qa label if the PR doesn't need to be tested during QA.
  • If you need to backport this PR to another branch, you can add the backport/<branch-name> label to the PR and it will automatically open a backport PR once this one is merged

@codecov

codecov bot commented Dec 2, 2025

Codecov Report

❌ Patch coverage is 98.21429% with 1 line in your changes missing coverage. Please review.
✅ Project coverage is 89.06%. Comparing base (09942f6) to head (c274c61).
⚠️ Report is 16 commits behind head on master.


@sarah-witt sarah-witt marked this pull request as ready for review December 3, 2025 16:23
@sarah-witt sarah-witt requested review from a team as code owners December 3, 2025 16:23
@sarah-witt sarah-witt changed the title from "Add support for bhist metrics" to "[AI-6250] Add support for bhist metrics" Dec 3, 2025
cswatt previously approved these changes Dec 3, 2025
Comment on lines +570 to +572
if start_time == end_time:
    self.log.trace("Start time %s is equal to end time %s, going back 1 minute", start_time, end_time)
    # the highest granularity is 1 minute, so we need to go back 1 minute if collection interval < 60
Contributor

Given the 1 minute granularity, would this be treated the same way as badmin perfmon? As in, would this normally be set up in an instance with a min collection interval of 60?

Contributor Author

Ah yeah, I mentioned this dilemma in the description. I opted to run it at the default collection interval because I didn't want to make the configuration more complex and I wanted these metrics to be enabled by default. Telling the user to put it in the instance with badmin_perfmon would be the ideal solution, just a little more complicated for the user to set up. I don't have a strong preference, so I'm OK with having these off by default and recommending that they be collected with perfmon!

Contributor

I don't have a strong preference either. But I guess if it's not detrimental, we can err on the side of simplicity.

Contributor Author

I've thought about it more, and I think I prefer running it with badmin perfmon, as we'd be making unnecessary queries otherwise.

Contributor Author

OK, I updated the default. The documentation is almost ready as well: #22069. Thanks!

@temporal-github-worker-1 temporal-github-worker-1 bot dismissed cswatt’s stale review December 5, 2025 21:00

Review from cswatt is dismissed. Related teams and files:

  • documentation
    • ibm_spectrum_lsf/assets/configuration/spec.yaml
    • ibm_spectrum_lsf/datadog_checks/ibm_spectrum_lsf/data/conf.yaml.example
@sarah-witt sarah-witt added this pull request to the merge queue Dec 5, 2025
Merged via the queue into master with commit 6d8718a Dec 5, 2025
57 checks passed
@sarah-witt sarah-witt deleted the sarah/add-bhist-metrics branch December 5, 2025 21:43
4 participants