Commit d6d2108

Order tasks and README
Signed-off-by: Alberto Gutierrez <[email protected]>
1 parent 96c3575 commit d6d2108

132 files changed: 245 additions and 558 deletions
evals/claude-code/eval-inline.yaml

Lines changed: 1 addition & 1 deletion
@@ -12,7 +12,7 @@ config:
 apiKeyKey: JUDGE_API_KEY
 modelNameKey: JUDGE_MODEL_NAME
 taskSets:
-- glob: ../tasks/*/*.yaml
+- glob: ../tasks/*/*/*.yaml
 assertions:
 toolsUsed:
 - server: kubernetes

evals/claude-code/eval.yaml

Lines changed: 1 addition & 1 deletion
@@ -12,7 +12,7 @@ config:
 apiKeyKey: JUDGE_API_KEY
 modelNameKey: JUDGE_MODEL_NAME
 taskSets:
-- glob: ../tasks/*/*.yaml
+- glob: ../tasks/*/*/*.yaml
 assertions:
 toolsUsed:
 - server: kubernetes

evals/openai-agent/eval-inline.yaml

Lines changed: 1 addition & 1 deletion
@@ -16,7 +16,7 @@ config:
 apiKeyKey: JUDGE_API_KEY
 modelNameKey: JUDGE_MODEL_NAME
 taskSets:
-- glob: ../tasks/*/*.yaml
+- glob: ../tasks/*/*/*.yaml
 assertions:
 toolsUsed:
 - server: kubernetes

evals/openai-agent/eval.yaml

Lines changed: 1 addition & 1 deletion
@@ -12,7 +12,7 @@ config:
 apiKeyKey: JUDGE_API_KEY
 modelNameKey: JUDGE_MODEL_NAME
 taskSets:
-- glob: ../tasks/*/*.yaml
+- glob: ../tasks/*/*/*.yaml
 assertions:
 toolsUsed:
 - server: kubernetes
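
The four hunks above all add one directory level to the task glob. The sketch below shows why that matters after tasks move into family subdirectories. It assumes the harness expands globs the way Python's `pathlib` does, where `*` never crosses a `/`; the harness's real matcher may differ. The nested scenario name `kiali/istio-create` and both helper functions are hypothetical and exist only for this demonstration.

```python
import tempfile
from pathlib import Path


def match_task_files(tasks_root: Path, pattern: str) -> list[str]:
    """Return files under tasks_root that match pattern, as sorted relative POSIX paths."""
    return sorted(p.relative_to(tasks_root).as_posix() for p in tasks_root.glob(pattern))


def build_demo_tree(root: Path) -> Path:
    """Create a throwaway tasks/ tree holding one flat and one nested scenario."""
    tasks = root / "tasks"
    for rel in (
        "kiali-istio-create/kiali-istio-create.yaml",  # flat layout removed by this commit
        "kiali/istio-create/istio-create.yaml",  # family/scenario layout (hypothetical name)
        "README.md",  # sits at the tasks root, so neither pattern should pick it up
    ):
        path = tasks / rel
        path.parent.mkdir(parents=True, exist_ok=True)
        path.touch()
    return tasks


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        tasks = build_demo_tree(Path(tmp))
        print("old pattern:", match_task_files(tasks, "*/*.yaml"))
        print("new pattern:", match_task_files(tasks, "*/*/*.yaml"))
```

Under these assumptions the old pattern only sees the flat scenario and the new pattern only sees the nested one, which is consistent with all four eval configs changing together.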

evals/tasks/README.md

Lines changed: 36 additions & 0 deletions
@@ -0,0 +1,36 @@
+# MCP Task Library
+
+This directory hosts the reusable task scenarios that power MCP evaluations for the Kubernetes MCP Server. Each task captures a realistic cluster workflow (setup, agent-driven actions, verification, and cleanup) so different agents can be compared against the same benchmark.
+
+## Task Families
+
+- [Kubernetes tasks](kubernetes/) – core cluster workflows such as creating pods, fixing deployments, managing RBAC, or debugging state issues.
+- [Kiali tasks](kiali/) – service-mesh and observability workflows that exercise the Kiali MCP toolset (Istio config, topology, mesh health, tracing).
+
+## Anatomy of a Task
+
+Every subdirectory under `kubernetes/` or `kiali/` defines a single scenario:
+
+1. `*.yaml` – declarative description consumed by the evaluation harness (prompts, success criteria, required tools).
+2. `setup.sh` / `verify.sh` / `cleanup.sh` – optional shell hooks that prime the cluster, assert post-conditions, and reset resources so tasks stay idempotent.
+3. `artifacts/` – supporting manifests, scripts, or payloads referenced by the task definition.
+
+## Adding a New Task
+
+1. Pick the closest family and create a new subfolder.
+2. Author the task YAML referencing MCP tools, expected observations, and any artifacts.
+3. Provide helper scripts if the scenario needs deterministic setup or verification.
+4. Document nuances in a local `README.md` so future contributors and eval authors can replay the scenario manually.
+
+Well-scoped, deterministic tasks make it easier to compare agents and track regressions over time, so keep inputs minimal, outputs explicit, and always clean up what you create.
+
+## Adding a New Task Stack
+
+When a new MCP toolset lands, keep its evaluations isolated by creating a sibling directory under `tasks/` named after the toolset (`tasks/<name>/`). Populate it with:
+
+1. A scoped `README.md` describing the toolset focus and prerequisite context.
+2. One subfolder per scenario that follows the same layout described above (`*.yaml`, scripts, `artifacts/`).
+3. Any shared fixtures the stack needs (place them in a `shared/` subdirectory if multiple scenarios reuse them).
+
+This structure keeps task stacks discoverable and lets eval harnesses target toolset-specific workflows without mixing concerns from the core Kubernetes or Kiali libraries.
+
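
The layout this README describes (a task YAML, optional hook scripts, an `artifacts/` directory, and an optional `shared/` fixtures directory per family) can be checked mechanically. The following is a minimal sketch of such a check, not part of this commit: `describe_scenario` and `find_incomplete_scenarios` are hypothetical helpers, and treating a scenario without any `*.yaml` as incomplete is an assumption drawn from the README's wording rather than a rule the harness is known to enforce.

```python
from pathlib import Path

# Hook names taken from the README; all three are described as optional.
OPTIONAL_HOOKS = ("setup.sh", "verify.sh", "cleanup.sh")


def describe_scenario(scenario_dir: Path) -> dict:
    """Summarize which of the documented pieces a scenario directory contains."""
    return {
        "definitions": sorted(p.name for p in scenario_dir.glob("*.yaml")),
        "hooks": [name for name in OPTIONAL_HOOKS if (scenario_dir / name).is_file()],
        "has_artifacts": (scenario_dir / "artifacts").is_dir(),
    }


def find_incomplete_scenarios(tasks_root: Path) -> list[str]:
    """List family/scenario directories that contain no task YAML."""
    missing = []
    for family in sorted(p for p in tasks_root.iterdir() if p.is_dir()):
        for scenario in sorted(p for p in family.iterdir() if p.is_dir()):
            if scenario.name == "shared":  # shared fixtures, not a scenario
                continue
            if not describe_scenario(scenario)["definitions"]:
                missing.append(scenario.relative_to(tasks_root).as_posix())
    return missing
```

Calling `find_incomplete_scenarios(Path("evals/tasks"))` from the repository root would return the relative paths of scenario folders that the `../tasks/*/*/*.yaml` glob cannot pick up, which may help catch a misplaced task before an eval run.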

evals/tasks/kiali-istio-create/kiali-istio-create.yaml

Lines changed: 0 additions & 31 deletions
This file was deleted.

evals/tasks/kiali-istio-delete/kiali-istio-delete.yaml

Lines changed: 0 additions & 70 deletions
This file was deleted.

evals/tasks/kiali-istio-list/kiali-istio-list.yaml

Lines changed: 0 additions & 48 deletions
This file was deleted.

evals/tasks/kiali-istio-patch/kiali-istio-patch.yaml

Lines changed: 0 additions & 110 deletions
This file was deleted.

evals/tasks/kiali-obs-unhealthy-namespaces/kiali-obs-unhealthy-namespaces.yaml

Lines changed: 0 additions & 17 deletions
This file was deleted.
