containers
diff --git a/evals/claude-code/eval-inline.yaml b/evals/claude-code/eval-inline.yaml
Lines changed: 1 addition & 1 deletion
diff --git a/evals/claude-code/eval.yaml b/evals/claude-code/eval.yaml
Lines changed: 1 addition & 1 deletion
diff --git a/evals/openai-agent/eval-inline.yaml b/evals/openai-agent/eval-inline.yaml
Lines changed: 1 addition & 1 deletion
diff --git a/evals/openai-agent/eval.yaml b/evals/openai-agent/eval.yaml
Lines changed: 1 addition & 1 deletion
diff --git a/evals/tasks/README.md b/evals/tasks/README.md
Lines changed: 36 additions & 0 deletions
diff --git a/evals/tasks/kiali-obs-unhealthy-namespaces/kiali-obs-unhealthy-namespaces.yaml b/evals/tasks/kiali-obs-unhealthy-namespaces/kiali-obs-unhealthy-namespaces.yaml
Lines changed: 0 additions & 17 deletions
diff --git a/evals/tasks/kiali/Makefile b/evals/tasks/kiali/Makefile
Lines changed: 6 additions & 0 deletions
diff --git a/evals/tasks/kiali/README.md b/evals/tasks/kiali/README.md
Lines changed: 55 additions & 0 deletions
diff --git a/evals/tasks/kiali/delete-faultInjection/delete-faultInjection.yaml b/evals/tasks/kiali/delete-faultInjection/delete-faultInjection.yaml
Lines changed: 69 additions & 0 deletions
diff --git a/evals/tasks/kiali/get-namespaces/get-namespaces.yaml b/evals/tasks/kiali/get-namespaces/get-namespaces.yaml
Lines changed: 15 additions & 0 deletions
@@ -12,7 +12,7 @@ config:
       apiKeyKey: JUDGE_API_KEY
       modelNameKey: JUDGE_MODEL_NAME
   taskSets:
-    - glob: ../tasks/*/*.yaml
+    - glob: ../tasks/*/*/*.yaml
       assertions:
         toolsUsed:
           - server: kubernetes
 
@@ -12,7 +12,7 @@ config:
       apiKeyKey: JUDGE_API_KEY
       modelNameKey: JUDGE_MODEL_NAME
   taskSets:
-    - glob: ../tasks/*/*.yaml
+    - glob: ../tasks/*/*/*.yaml
       assertions:
         toolsUsed:
           - server: kubernetes
 
@@ -16,7 +16,7 @@ config:
       apiKeyKey: JUDGE_API_KEY
       modelNameKey: JUDGE_MODEL_NAME
   taskSets:
-    - glob: ../tasks/*/*.yaml
+    - glob: ../tasks/*/*/*.yaml
       assertions:
         toolsUsed:
           - server: kubernetes
 
@@ -12,7 +12,7 @@ config:
       apiKeyKey: JUDGE_API_KEY
       modelNameKey: JUDGE_MODEL_NAME
   taskSets:
-    - glob: ../tasks/*/*.yaml
+    - glob: ../tasks/*/*/*.yaml
       assertions:
         toolsUsed:
           - server: kubernetes
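Review note: the extra `*` in the new glob accounts for the family directory (`kubernetes/`, `kiali/`) that now sits between `tasks/` and each scenario folder. The sketch below illustrates the effect, assuming the harness expands globs the way a POSIX shell does, with each `*` matching exactly one path segment. The temp tree, the `evals-agent` directory name, and the `count_matches` helper are illustrative and not part of this PR.

```shell
#!/usr/bin/env bash
set -eu

# Count how many of the given paths exist. An unmatched glob is passed
# through literally, so it fails the -e test and is not counted.
count_matches() {
  n=0
  for path in "$@"; do
    if [ -e "$path" ]; then
      n=$((n + 1))
    fi
  done
  echo "$n"
}

# Throwaway tree mirroring tasks/<family>/<scenario>/<scenario>.yaml.
demo_root="$(mktemp -d)"
mkdir -p "$demo_root/tasks/kiali/get-namespaces" "$demo_root/evals-agent"
touch "$demo_root/tasks/kiali/get-namespaces/get-namespaces.yaml"

# Evaluate both patterns from a sibling of tasks/, as the eval configs do.
cd "$demo_root/evals-agent"
old_count="$(count_matches ../tasks/*/*.yaml)"
new_count="$(count_matches ../tasks/*/*/*.yaml)"
echo "old glob: $old_count match(es), new glob: $new_count match(es)"
```

Under the same assumption, any task YAML left at the old depth (`tasks/<scenario>/<scenario>.yaml`) would no longer be picked up by the new pattern.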
 
@@ -0,0 +1,36 @@
+# MCP Task Library
+
+This directory hosts the reusable task scenarios that power MCP evaluations for the Kubernetes MCP Server. Each task captures a realistic cluster workflow (setup, agent-driven actions, verification, and cleanup) so different agents can be compared against the same benchmark.
+
+## Task Families
+
+- [Kubernetes tasks](kubernetes/) – core cluster workflows such as creating pods, fixing deployments, managing RBAC, or debugging state issues.
+- [Kiali tasks](kiali/) – service-mesh and observability workflows that exercise the Kiali MCP toolset (Istio config, topology, mesh health, tracing). 
+
+## Anatomy of a Task
+
+Every subdirectory under `kubernetes/` or `kiali/` defines a single scenario:
+
+1. `*.yaml` – declarative description consumed by the evaluation harness (prompts, success criteria, required tools).
+2. `setup.sh` / `verify.sh` / `cleanup.sh` – shell hooks (optional) that prime the cluster, assert post-conditions, and reset resources so tasks stay idempotent.
+3. `artifacts/` – supporting manifests, scripts, or payloads referenced by the task definition.
+
+## Adding a New Task
+
+1. Pick the closest family and create a new subfolder.
+2. Author the task YAML referencing MCP tools, expected observations, and any artifacts.
+3. Provide helper scripts if the scenario needs deterministic setup or verification.
+4. Document nuances in a local `README.md` so future contributors and eval authors can replay the scenario manually.
+
+Well-scoped, deterministic tasks make it easier to compare agents and regressions over time, so keep inputs minimal, outputs explicit, and always clean up what you create.
+
+## Adding a New Task Stack
+
+When a new MCP toolset lands, keep its evaluations isolated by creating a sibling directory under `tasks/` named after the toolset (for example, `tasks/<name>/`). Populate it with:
+
+1. A scoped `README.md` describing the toolset focus and prerequisite context.
+2. One subfolder per scenario that follows the same layout described above (`*.yaml`, scripts, `artifacts/`).
+3. Any shared fixtures the stack needs (place them in a `shared/` subdirectory if multiple scenarios reuse them).
+
+This structure keeps task stacks discoverable and lets eval harnesses target toolset-specific workflows without mixing concerns from the core Kubernetes or Kiali libraries.
+
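Review note: the "Adding a New Task" steps in this README could be scaffolded roughly as follows. This is a sketch only. The `scaffold_task` helper is hypothetical, and the skeleton simply mirrors the fields visible in `get-namespaces.yaml` from this PR (`kind`, `metadata`, `steps`); the harness may accept or require more.

```shell
#!/usr/bin/env bash
set -eu

# scaffold_task FAMILY SCENARIO [ROOT]
# Creates ROOT/FAMILY/SCENARIO/ with an artifacts/ folder and a skeleton YAML,
# then prints the new directory. ROOT defaults to "tasks".
scaffold_task() {
  family="$1"
  scenario="$2"
  root="${3:-tasks}"
  dir="$root/$family/$scenario"
  mkdir -p "$dir/artifacts"
  cat > "$dir/$scenario.yaml" <<EOF
kind: Task
metadata:
  name: "$scenario"
  difficulty: easy
steps:
  setup:
    inline: |-
      #!/usr/bin/env bash
  verify:
    contains: "TODO"
  cleanup:
    inline: |-
      #!/usr/bin/env bash
  prompt:
    inline: TODO describe what the agent should do.
EOF
  echo "$dir"
}

# Demo in a throwaway directory so nothing in the repository is touched.
demo_root="$(mktemp -d)"
new_dir="$(scaffold_task kiali status-foo "$demo_root")"
echo "created $new_dir"
```

Helper scripts (`setup.sh`, `verify.sh`, `cleanup.sh`) are left out on purpose, since the README marks them as optional.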
@@ -0,0 +1,6 @@
+SHELL := /usr/bin/bash
+
+.PHONY: update-tasks
+
+update-tasks:
+	@./scripts/update_tasks.sh
@@ -0,0 +1,55 @@
+# Kiali Task Stack
+
+Kiali-focused MCP tasks live here. Each folder under this directory represents a self-contained scenario that exercises the Kiali toolset (Istio config, topology, observability, troubleshooting).
+
+## Adding a New Task
+
+1. Create a new subdirectory (e.g., `status-foo/`) and place the scenario YAML plus any helper scripts or artifacts inside it.
+2. Make sure the YAML’s `metadata` block includes `name`, `category`, and `difficulty` so it shows up correctly in the catalog below.
+3. Keep prompts concise and action-oriented; verification commands should rely on Kiali MCP tools whenever possible.
+
+## Updating the Catalog
+
+After adding or editing tasks, regenerate this README’s catalog with:
+
+```bash
+make update-tasks
+```
+
+The `update-tasks` target runs `scripts/update_tasks.sh`, which parses every scenario and rewrites the section below automatically. Always run it before committing so the list stays in sync.
+
+## Tasks defined
+<!-- TASKS-START -->
+- High-Level Observability & Health
+  - [easy] obs-unhealthy-namespaces (Unhealthy Namespaces)
+        **Prompt:** *Are there any unhealthy namespaces in my mesh right now?*
+  - [easy] show-topology (Show topology bookinfo)
+        **Prompt:** *Show me the topology of the bookinfo namespace.*
+  - [easy] status-kiali-istio (Status Kiali and Istio)
+        **Prompt:** *Give me a status report on the interaction between Kiali and Istio components*
+- Istio Configuration & Management
+  - [easy] istio-list (List all VS in bookinfo namespace)
+        **Prompt:** *List all VirtualServices in the bookinfo namespace and check if they have any validation errors*
+  - [medium] istio-create (Create a gateway)
+        **Prompt:** *Create a Gateway named my-gateway in the istio-system namespace.*
+  - [medium] istio-delete (Remove fault Injection)
+        **Prompt:** *Fix my namespace bookinfo to remove the fault injection.*
+  - [medium] istio-patch (Patch my traffic)
+        **Prompt:** *I need to shift 50% of traffic to v2 of the reviews service. Apply a patch to the existing VirtualService.*
+- Resource Inspection
+  - [easy] resource-get-namespaces (Get mesh namespaces)
+        **Prompt:** *Check namespaces in my mesh.*
+  - [easy] resource-get-service-detail (Get service detail)
+        **Prompt:** *Get the full details and health status for the details service*
+  - [easy] resource-list-workloads (List workloads without sidecar)
+        **Prompt:** *List all workloads in the bookinfo namespace that have missing sidecars.*
+  - [easy] resource-mesh-status (Status of my mesh)
+        **Prompt:** *Check my mesh.*
+- Troubleshooting & Debugging
+  - [easy] troubleshooting-latency-traces (Get latency workload)
+        **Prompt:** *Analyze the latency for the reviews workload over the last 30 minutes?*
+  - [easy] troubleshooting-log (Get log productpage due 500)
+        **Prompt:** *Why is the productpage service returning 500 errors?*
+  - [easy] troubleshooting-trace-lagging (Check traces for a service)
+        **Prompt:** *I see a spike in duration for ratings. Can you check the traces to see which span is lagging?*
+<!-- TASKS-END -->
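Review note: `scripts/update_tasks.sh` is not part of this diff, so the following is only a guess at the marker-based rewrite the README describes. It is a minimal sketch that swaps everything between `<!-- TASKS-START -->` and `<!-- TASKS-END -->` for freshly generated lines. The `replace_catalog` helper and the sample files are hypothetical.

```shell
#!/usr/bin/env bash
set -eu

# replace_catalog README CATALOG
# Prints README with the lines between the two markers swapped for CATALOG.
replace_catalog() {
  awk -v catalog="$2" '
    /<!-- TASKS-START -->/ {
      print                                   # keep the start marker
      while ((getline line < catalog) > 0) print line
      close(catalog)
      skipping = 1                            # drop the stale entries
      next
    }
    /<!-- TASKS-END -->/ { skipping = 0 }     # resume at the end marker
    !skipping { print }
  ' "$1"
}

# Throwaway inputs: a README with one stale entry and a regenerated catalog.
work_dir="$(mktemp -d)"
printf '%s\n' '## Tasks defined' '<!-- TASKS-START -->' '- stale entry' '<!-- TASKS-END -->' > "$work_dir/README.md"
printf '%s\n' '- Resource Inspection' '  - [easy] resource-get-namespaces (Get mesh namespaces)' > "$work_dir/catalog.txt"

updated="$(replace_catalog "$work_dir/README.md" "$work_dir/catalog.txt")"
printf '%s\n' "$updated"
```

Keeping both markers in the output means the rewrite can be repeated safely, which matches the README's advice to rerun the target before every commit.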
@@ -0,0 +1,69 @@
+kind: Task
+metadata:
+  name: "Remove fault Injection"
+  difficulty: easy
+steps:
+  setup: 
+    inline: |-
+      #!/usr/bin/env bash
+      set -euo pipefail
+      cat <<'EOF' | kubectl apply -f -
+      apiVersion: networking.istio.io/v1
+      kind: DestinationRule
+      metadata:
+        namespace: bookinfo
+        name: ratings
+        labels:
+          gevals.kiali.io/test: delete-fault-injection
+      spec:
+        host: ratings.bookinfo.svc.cluster.local
+        subsets:
+        - name: v1
+          labels:
+            version: v1
+      ---
+      apiVersion: networking.istio.io/v1
+      kind: VirtualService
+      metadata:
+        namespace: bookinfo
+        name: ratings
+        labels:
+          gevals.kiali.io/test: delete-fault-injection
+      spec:
+        hosts:
+        - ratings.bookinfo.svc.cluster.local
+        http:
+        - route:
+          - destination:
+              host: ratings.bookinfo.svc.cluster.local
+              subset: v1
+            weight: 100
+          fault:
+            abort:
+              percentage:
+                value: 100
+              httpStatus: 503
+      EOF
+  verify:
+    inline: |-
+      #!/usr/bin/env bash
+      vs_fault_names="$(kubectl get virtualservice -n "${NAMESPACE:-bookinfo}" -o json \
+        | jq -r '[.items[] | select(any(.spec.http[]?; has("fault"))) | .metadata.name] | .[]?')"
+      if [[ -n "${vs_fault_names}" ]]; then
+        exit 1
+      fi
+
+      # Verify DestinationRule 'ratings' does not exist (created during setup)
+      if kubectl get destinationrule ratings -n "${NAMESPACE:-bookinfo}" >/dev/null 2>&1; then
+        exit 1
+      fi
+  cleanup: 
+    inline: |-
+      #!/usr/bin/env bash
+      set -euo pipefail
+      NS="bookinfo"
+      LABEL="gevals.kiali.io/test=delete-fault-injection"
+      kubectl delete virtualservice -n "$NS" -l "$LABEL" --ignore-not-found
+      kubectl delete destinationrule -n "$NS" -l "$LABEL" --ignore-not-found
+  prompt:
+    inline: Fix my namespace bookinfo to remove the fault injection.
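Review note: the `jq` filter in the verify step can be exercised without a cluster. The sketch below feeds it a hand-written stand-in for `kubectl get virtualservice -o json` output; the sample JSON is invented for illustration and trimmed to the fields the filter reads. The demonstration is skipped when `jq` is not installed.

```shell
#!/usr/bin/env bash
set -eu

# Invented sample: one VirtualService with a fault block, one without.
sample_json='{"items":[
  {"metadata":{"name":"ratings"},"spec":{"http":[{"fault":{"abort":{"httpStatus":503}}}]}},
  {"metadata":{"name":"reviews"},"spec":{"http":[{"route":[]}]}}
]}'

# Same filter as the verify step: names of VirtualServices where any
# http route carries a "fault" key.
fault_filter='[.items[] | select(any(.spec.http[]?; has("fault"))) | .metadata.name] | .[]?'

if command -v jq >/dev/null 2>&1; then
  vs_fault_names="$(printf '%s' "$sample_json" | jq -r "$fault_filter")"
  echo "VirtualServices with fault injection: ${vs_fault_names:-none}"
else
  vs_fault_names=""
  echo "jq not installed; skipping the demonstration"
fi
```

With this sample only `ratings` is reported, so the verify step would exit 1 until the fault block is removed.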
@@ -0,0 +1,15 @@
+kind: Task
+metadata:
+  name: "get-namespaces"
+  difficulty: easy
+steps:
+  setup:
+    inline: |-
+      #!/usr/bin/env bash
+  verify:
+    contains: "bookinfo"
+  cleanup:
+    inline: |-
+      #!/usr/bin/env bash
+  prompt:
+    inline: Check namespaces in my mesh.
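Review note: unlike the fault-injection task, this task verifies with `contains` rather than a script. The diff does not show how the harness evaluates `contains`, so the sketch below assumes a plain, case-sensitive substring check against the agent's final response. The `response_contains` helper and the sample response are hypothetical.

```shell
#!/usr/bin/env bash
set -eu

# Succeeds when the response includes the expected substring.
# Hypothetical helper; the real harness may normalize or match differently.
response_contains() {
  case "$1" in
    *"$2"*) return 0 ;;
    *) return 1 ;;
  esac
}

# Made-up agent response used only for this demonstration.
agent_response="Namespaces in the mesh: bookinfo, istio-system"
if response_contains "$agent_response" "bookinfo"; then
  verdict="pass"
else
  verdict="fail"
fi
echo "verify: $verdict"
```

If the check works this way, a response that merely mentions `bookinfo` in passing would also pass, so a stricter verification may be worth considering for this task.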