Commit a7ee017

Order tasks and README
Signed-off-by: Alberto Gutierrez <[email protected]>
1 parent 96c3575 commit a7ee017

137 files changed: +474 -40 lines changed

evals/claude-code/eval-inline.yaml

Lines changed: 1 addition & 1 deletion
@@ -12,7 +12,7 @@ config:
  apiKeyKey: JUDGE_API_KEY
  modelNameKey: JUDGE_MODEL_NAME
  taskSets:
- - glob: ../tasks/*/*.yaml
+ - glob: ../tasks/*/*/*.yaml
  assertions:
  toolsUsed:
  - server: kubernetes
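
The extra `*` in the pattern follows from the new layout: scenarios now sit one level deeper, under a family directory such as `kiali/`, so a pattern with a single directory wildcard no longer reaches them. A minimal sketch using plain shell globbing in a throwaway directory; the harness may use its own glob implementation, and `count_matches` plus the file names under `$demo` are hypothetical:

```bash
# Sketch only: shows why the pattern needs one more directory level.
# Paths under $demo are made up for illustration.

# count_matches PATTERN: print how many existing files the glob expands to.
count_matches() {
  # Leave $1 unquoted on purpose so the shell expands the glob.
  set -- $1
  # An unmatched glob stays literal, so check that the first result exists.
  if [ -e "$1" ]; then echo "$#"; else echo 0; fi
}

demo="$(mktemp -d)"
mkdir -p "$demo/tasks/kiali/istio-delete" "$demo/tasks/kubernetes/create-pod"
touch "$demo/tasks/kiali/istio-delete/istio-delete.yaml"
touch "$demo/tasks/kubernetes/create-pod/create-pod.yaml"

count_matches "$demo/tasks/*/*.yaml"     # old pattern, prints 0
count_matches "$demo/tasks/*/*/*.yaml"   # new pattern, prints 2
```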

evals/claude-code/eval.yaml

Lines changed: 1 addition & 1 deletion
@@ -12,7 +12,7 @@ config:
  apiKeyKey: JUDGE_API_KEY
  modelNameKey: JUDGE_MODEL_NAME
  taskSets:
- - glob: ../tasks/*/*.yaml
+ - glob: ../tasks/*/*/*.yaml
  assertions:
  toolsUsed:
  - server: kubernetes

evals/openai-agent/eval-inline.yaml

Lines changed: 1 addition & 1 deletion
@@ -16,7 +16,7 @@ config:
  apiKeyKey: JUDGE_API_KEY
  modelNameKey: JUDGE_MODEL_NAME
  taskSets:
- - glob: ../tasks/*/*.yaml
+ - glob: ../tasks/*/*/*.yaml
  assertions:
  toolsUsed:
  - server: kubernetes

evals/openai-agent/eval.yaml

Lines changed: 1 addition & 1 deletion
@@ -12,7 +12,7 @@ config:
  apiKeyKey: JUDGE_API_KEY
  modelNameKey: JUDGE_MODEL_NAME
  taskSets:
- - glob: ../tasks/*/*.yaml
+ - glob: ../tasks/*/*/*.yaml
  assertions:
  toolsUsed:
  - server: kubernetes

evals/tasks/README.md

Lines changed: 36 additions & 0 deletions
@@ -0,0 +1,36 @@
# MCP Task Library

This directory hosts the reusable task scenarios that power MCP evaluations for the Kubernetes MCP Server. Each task captures a realistic cluster workflow (setup, agent-driven actions, verification, and cleanup) so different agents can be compared against the same benchmark.

## Task Families

- [Kubernetes tasks](kubernetes/) – core cluster workflows such as creating pods, fixing deployments, managing RBAC, or debugging state issues.
- [Kiali tasks](kiali/) – service-mesh and observability workflows that exercise the Kiali MCP toolset (Istio config, topology, mesh health, tracing).

## Anatomy of a Task

Every subdirectory under `kubernetes/` or `kiali/` defines a single scenario:

1. `*.yaml` – declarative description consumed by the evaluation harness (prompts, success criteria, required tools).
2. `setup.sh` / `verify.sh` / `cleanup.sh` – shell hooks (optional) that prime the cluster, assert post-conditions, and reset resources so tasks stay idempotent.
3. `artifacts/` – supporting manifests, scripts, or payloads referenced by the task definition.

## Adding a New Task

1. Pick the closest family and create a new subfolder.
2. Author the task YAML referencing MCP tools, expected observations, and any artifacts.
3. Provide helper scripts if the scenario needs deterministic setup or verification.
4. Document nuances in a local `README.md` so future contributors and eval authors can replay the scenario manually.

Well-scoped, deterministic tasks make it easier to compare agents and regressions over time, so keep inputs minimal, outputs explicit, and always clean up what you create.
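
The first two steps above can be sketched as a small scaffold function. This is an illustration, not a script shipped in the repository: `new_task` is a hypothetical helper, and the YAML skeleton reuses the field names seen in the task files of this commit, with `prompt` nested under `steps` as an assumption, since the diff view drops indentation.

```bash
# Sketch only: scaffold a scenario folder in the layout described above.
# new_task is a hypothetical helper, not part of the repository.

# new_task ROOT FAMILY NAME: create ROOT/FAMILY/NAME with a starter YAML
# and an artifacts/ directory, then print the path of the YAML file.
new_task() {
  root="$1"; family="$2"; name="$3"
  dir="$root/$family/$name"
  # Never clobber an existing scenario.
  if [ -e "$dir" ]; then
    echo "refusing to overwrite $dir" >&2
    return 1
  fi
  mkdir -p "$dir/artifacts"
  # Assumed nesting: prompt lives under steps.
  cat > "$dir/$name.yaml" <<EOF
kind: Task
metadata:
  name: "$name"
  difficulty: easy
steps:
  prompt:
    inline: TODO describe what the agent should do
EOF
  echo "$dir/$name.yaml"
}
```

For example, `new_task evals/tasks kiali status-foo` would create `evals/tasks/kiali/status-foo/status-foo.yaml`, which the `../tasks/*/*/*.yaml` glob in the eval configs then picks up.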

## Adding a New Task Stack

When a new MCP toolset lands, keep its evaluations isolated by creating a sibling directory under `tasks/` named after the toolset (for example, `tasks/<name>/`). Populate it with:

1. A scoped `README.md` describing the toolset focus and prerequisite context.
2. One subfolder per scenario that follows the same layout described above (`*.yaml`, scripts, `artifacts/`).
3. Any shared fixtures the stack needs (place them in a `shared/` subdirectory if multiple scenarios reuse them).

This structure keeps task stacks discoverable and lets eval harnesses target toolset-specific workflows without mixing concerns from the core Kubernetes or Kiali libraries.

evals/tasks/kiali-obs-unhealthy-namespaces/kiali-obs-unhealthy-namespaces.yaml

Lines changed: 0 additions & 17 deletions
This file was deleted.

evals/tasks/kiali/Makefile

Lines changed: 6 additions & 0 deletions
@@ -0,0 +1,6 @@
SHELL := /usr/bin/bash

.PHONY: update-tasks

update-tasks:
	@./scripts/update_tasks.sh

evals/tasks/kiali/README.md

Lines changed: 55 additions & 0 deletions
@@ -0,0 +1,55 @@
# Kiali Task Stack

Kiali-focused MCP tasks live here. Each folder under this directory represents a self-contained scenario that exercises the Kiali toolset (Istio config, topology, observability, troubleshooting).

## Adding a New Task

1. Create a new subdirectory (e.g., `status-foo/`) and place the scenario YAML plus any helper scripts or artifacts inside it.
2. Make sure the YAML’s `metadata` block includes `name`, `category`, and `difficulty` so it shows up correctly in the catalog below.
3. Keep prompts concise and action-oriented; verification commands should rely on Kiali MCP tools whenever possible.

## Updating the Catalog

After adding or editing tasks, regenerate this README’s catalog with:

```bash
make update-tasks
```

The `update-tasks` target runs `scripts/update_tasks.sh`, which parses every scenario and rewrites the section below automatically. Always run it before committing so the list stays in sync.
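
`scripts/update_tasks.sh` itself is not shown in this diff. One common way to rewrite a generated section is to replace everything between the two marker comments; the sketch below assumes that approach, and `replace_catalog` is a hypothetical name rather than the script's actual code.

```bash
# Sketch only: replace the text between the TASKS-START/TASKS-END markers.
# The real update_tasks.sh may work differently.

# replace_catalog README BODY_FILE: print README with the marked section
# swapped for the contents of BODY_FILE. Both markers are kept.
replace_catalog() {
  awk -v body="$2" '
    /<!-- TASKS-START -->/ {
      print                      # keep the start marker
      while ((getline line < body) > 0) print line
      close(body)
      skipping = 1               # drop the old section
      next
    }
    /<!-- TASKS-END -->/ { skipping = 0 }   # falls through and is printed
    !skipping { print }
  ' "$1"
}
```

Because the function writes to standard output, a caller would redirect it to a temporary file and move that over the README, for example `replace_catalog README.md catalog.md > README.tmp && mv README.tmp README.md`.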

## Tasks defined
<!-- TASKS-START -->
- High-Level Observability & Health
  - [easy] obs-unhealthy-namespaces (Unhealthy Namespaces)
    **Prompt:** *Are there any unhealthy namespaces in my mesh right now?*
  - [easy] show-topology (Show topology bookinfo)
    **Prompt:** *Show me the topology of the bookinfo namespace.*
  - [easy] status-kiali-istio (Status Kiali and Istio)
    **Prompt:** *Give me a status report on the interaction between Kiali and Istio components*
- Istio Configuration & Management
  - [easy] istio-list (List all VS in bookinfo namespace)
    **Prompt:** *List all VirtualServices in the bookinfo namespace and check if they have any validation errors*
  - [medium] istio-create (Create a gateway)
    **Prompt:** *Create a Gateway named my-gateway in the istio-system namespace.*
  - [medium] istio-delete (Remove fault Injection)
    **Prompt:** *Fix my namespace bookinfo to remove the fault injection.*
  - [medium] istio-patch (Patch my traffic)
    **Prompt:** *I need to shift 50% of traffic to v2 of the reviews service. Apply a patch to the existing VirtualService.*
- Resource Inspection
  - [easy] resource-get-namespaces (Get mesh namespaces)
    **Prompt:** *Check namespaces in my mesh.*
  - [easy] resource-get-service-detail (Get service detail)
    **Prompt:** *Get the full details and health status for the details service*
  - [easy] resource-list-workloads (List workloads without sidecar)
    **Prompt:** *List all workloads in the bookinfo namespace that have missing sidecars.*
  - [easy] resource-mesh-status (Status of my mesh)
    **Prompt:** *Check my mesh.*
- Troubleshooting & Debugging
  - [easy] troubleshooting-latency-traces (Get latency workload)
    **Prompt:** *Analyze the latency for the reviews workload over the last 30 minutes?*
  - [easy] troubleshooting-log (Get log productpage due 500)
    **Prompt:** *Why is the productpage service returning 500 errors?*
  - [easy] troubleshooting-trace-lagging (Check traces for a service)
    **Prompt:** *I see a spike in duration for ratings. Can you check the traces to see which span is lagging?*
<!-- TASKS-END -->

Lines changed: 69 additions & 0 deletions
@@ -0,0 +1,69 @@
kind: Task
metadata:
  name: "Remove fault Injection"
  difficulty: easy
steps:
  setup:
    inline: |-
      #!/usr/bin/env bash
      set -euo pipefail
      cat <<'EOF' | kubectl apply -f -
      apiVersion: networking.istio.io/v1
      kind: DestinationRule
      metadata:
        namespace: bookinfo
        name: ratings
        labels:
          gevals.kiali.io/test: delete-fault-injection
      spec:
        host: ratings.bookinfo.svc.cluster.local
        subsets:
        - name: v1
          labels:
            version: v1
      ---
      apiVersion: networking.istio.io/v1
      kind: VirtualService
      metadata:
        namespace: bookinfo
        name: ratings
        labels:
          gevals.kiali.io/test: delete-fault-injection
      spec:
        hosts:
        - ratings.bookinfo.svc.cluster.local
        http:
        - route:
          - destination:
              host: ratings.bookinfo.svc.cluster.local
              subset: v1
            weight: 100
          fault:
            abort:
              percentage:
                value: 100
              httpStatus: 503
      EOF
  verify:
    inline: |-
      #!/usr/bin/env bash
      vs_fault_names="$(kubectl get virtualservice -n "${NAMESPACE:-bookinfo}" -o json \
        | jq -r '[.items[] | select(any(.spec.http[]?; has("fault"))) | .metadata.name] | .[]?')"
      if [[ -n "${vs_fault_names}" ]]; then
        exit 1
      fi

      # Verify DestinationRule 'ratings' does not exist (created during setup)
      if kubectl get destinationrule ratings -n "${NAMESPACE:-bookinfo}" >/dev/null 2>&1; then
        exit 1
      fi
  cleanup:
    inline: |-
      #!/usr/bin/env bash
      set -euo pipefail
      NS="bookinfo"
      LABEL="gevals.kiali.io/test=delete-fault-injection"
      kubectl delete virtualservice -n "$NS" -l "$LABEL" --ignore-not-found
      kubectl delete destinationrule -n "$NS" -l "$LABEL" --ignore-not-found
  prompt:
    inline: Fix my namespace bookinfo to remove the fault injection.

Lines changed: 15 additions & 0 deletions
@@ -0,0 +1,15 @@
kind: Task
metadata:
  name: "get-namespaces"
  difficulty: easy
steps:
  setup:
    inline: |-
      #!/usr/bin/env bash
  verify:
    contains: "bookinfo"
  cleanup:
    inline: |-
      #!/usr/bin/env bash
  prompt:
    inline: Check namespaces in my mesh.
