NO-JIRA: Claude tool - etcd troubleshooting skill #32
[APPROVALNOTIFIER] This PR is APPROVED
This pull request has been approved by: fonta-rh
@fonta-rh: This pull request explicitly references no jira issue.
a4c5145 to 50f415d
e527c8e to 8b1da20
clobrano
left a comment
I left some comments
8b1da20 to 8f455fb
Relocate scripts and playbooks from .claude/commands/etcd/ to helpers/etcd/ so they can be used by any tool, not just the Claude skill. This aligns with the existing helpers/ directory structure.

- Move 3 scripts to helpers/etcd/
- Move 2 playbooks to helpers/etcd/playbooks/
- Update internal path calculations (REPO_ROOT, playbook paths)
- Update all documentation references

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <[email protected]>
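
The REPO_ROOT recalculation mentioned in this commit could look roughly like the sketch below. It assumes a script sitting two directory levels below the repository root (as helpers/etcd/ would be); the function name `repo_root_from` and the `PLAYBOOK_DIR` variable are hypothetical and not taken from the PR.

```shell
#!/usr/bin/env bash
# Sketch only: derive the repository root from a script's directory,
# assuming the script lives two levels below the root (helpers/etcd/).
repo_root_from() {
  # $1: directory containing the script, e.g. <repo>/helpers/etcd
  # The subshell keeps the caller's working directory unchanged.
  (cd "$1/../.." && pwd -P)
}

# Hypothetical usage inside a script under helpers/etcd/:
#   SCRIPT_DIR="$(cd "$(dirname "$0")" && pwd -P)"
#   REPO_ROOT="$(repo_root_from "$SCRIPT_DIR")"
#   PLAYBOOK_DIR="$REPO_ROOT/helpers/etcd/playbooks"
```

Resolving the path from the script's own location, rather than from the current working directory, keeps the scripts usable no matter where they are invoked from.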
Add guidance on choosing between quick manual triage and full diagnostic collection. Quick triage is recommended for initial assessment, with the comprehensive script reserved for complex issues where root cause is unclear.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <[email protected]>
Instead of tracking a static copy of the podman-etcd resource agent, add a script to fetch it from the ClusterLabs repository when needed. This ensures the reference stays current with upstream changes.

- Add helpers/etcd/fetch-podman-etcd.sh to fetch from GitHub
- Add podman-etcd.txt to .gitignore
- Update documentation references

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <[email protected]>
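
For illustration, a fetch-on-demand helper along these lines might be sketched as follows. This is not the contents of helpers/etcd/fetch-podman-etcd.sh: the function name `fetch_podman_etcd` is hypothetical, and `DEFAULT_URL` is an assumed upstream location that should be checked against the ClusterLabs repository before use.

```shell
#!/usr/bin/env bash
# Sketch only: download a reference copy of the podman-etcd resource agent.
# DEFAULT_URL is an assumed location, not taken from the PR.
DEFAULT_URL="https://raw.githubusercontent.com/ClusterLabs/resource-agents/main/heartbeat/podman-etcd"

fetch_podman_etcd() {
  # $1: destination file (default: podman-etcd.txt)
  # $2: source URL (default: DEFAULT_URL)
  local dest="${1:-podman-etcd.txt}"
  local url="${2:-$DEFAULT_URL}"
  local tmp
  tmp="$(mktemp)" || return 1
  # -f: fail on HTTP errors, -sS: quiet but show errors, -L: follow redirects
  if curl -fsSL -o "$tmp" "$url" && [ -s "$tmp" ]; then
    # Only replace the destination once a non-empty download succeeded.
    mv "$tmp" "$dest"
  else
    rm -f "$tmp"
    echo "failed to fetch $url" >&2
    return 1
  fi
}
```

Downloading to a temporary file first means a failed fetch never leaves a truncated podman-etcd.txt behind.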
The etcd container logs are not excessively large, so collect all of them for more complete diagnostics.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <[email protected]>
Instead of assuming the first node in inventory is the leader, the playbook now:

1. Checks which node has etcd running and is the actual leader
2. Falls back to first node with running etcd if no leader found
3. Falls back to inventory order only if no etcd is running

This prevents data loss from incorrectly designating a follower as the recovery leader.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <[email protected]>
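
The three-step fallback described in this commit can be sketched as a small selection function. The playbook itself implements this with Ansible tasks; the shell version below only illustrates the ordering. The input format (one `node running is_leader` record per line, in inventory order) and the name `select_recovery_leader` are assumptions made for the example.

```shell
#!/usr/bin/env bash
# Sketch only: choose a recovery leader from records read on stdin.
# Each line: "<node> <running:true|false> <is_leader:true|false>",
# listed in inventory order. This record format is hypothetical.
select_recovery_leader() {
  local first_node="" first_running="" node running is_leader
  while read -r node running is_leader || [ -n "$node" ]; do
    [ -z "$node" ] && continue
    # Remember inventory order for the last-resort fallback (step 3).
    [ -z "$first_node" ] && first_node="$node"
    if [ "$running" = "true" ]; then
      # Step 1: a node with etcd running that is the actual leader wins.
      if [ "$is_leader" = "true" ]; then
        echo "$node"
        return 0
      fi
      # Candidate for step 2: first node with etcd running.
      [ -z "$first_running" ] && first_running="$node"
    fi
  done
  if [ -n "$first_running" ]; then
    echo "$first_running"   # Step 2: no leader found, etcd running somewhere
  else
    echo "$first_node"      # Step 3: no etcd running, use inventory order
  fi
}
```

For example, feeding it `master-0 true false` followed by `master-1 true true` selects master-1, even though master-0 comes first in the inventory.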
clobrano
left a comment
I left a few comments, but overall it looks good. Great work!
- Create .claude/settings.json with read-only diagnostic permissions
- Update PERMISSIONS.md to reference actual config file
- Permissions auto-approve safe operations: file reading, git queries, Ansible status checks, OpenShift read-only commands, diagnostic dirs

Co-Authored-By: Claude Opus 4.5 <[email protected]>
- Add etcdctl member list as alternative diagnostic for split-brain
- Remove obsolete "Learner Stuck" section (podman-etcd now handles promotion)
- Renumber remaining sections (4-6)
- Add prerequisite note about running Claude Code from repo directory

Co-Authored-By: Claude Opus 4.5 <[email protected]>
Add etcd troubleshooting skill for Claude Code
Adds a comprehensive Claude Code skill that helps troubleshoot etcd issues on two-node fencing clusters.
The skill enables automated diagnosis and remediation of common etcd/Pacemaker problems.
New feature: Claude Code Skill (.claude/commands/etcd/):
Diagnostic Tools:
Helper:
Documentation:
Tested with: