
Need to improve the Regression error criteria #95

@epernod

Description


Asking an "expert":

**Why your current metric is weak**

You currently do:
full_dist = || data_ref - data ||₂
error_by_dof = full_dist / N
accumulate over frames

This is equivalent to an average global L2 error.
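A minimal numpy sketch of this averaged metric, following the pseudocode above (the function name and the frame-list inputs `frames_ref` / `frames` are assumptions for illustration, not the actual test harness):

```python
import numpy as np

def averaged_l2_error(frames_ref, frames):
    """Current metric: per-frame L2 distance divided by N, averaged over frames."""
    total = 0.0
    for data_ref, data in zip(frames_ref, frames):
        full_dist = np.linalg.norm(data_ref - data)  # || data_ref - data ||2
        total += full_dist / data.size               # error_by_dof
    return total / len(frames)
```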

**Problems (you identified correctly)**

1. Late divergence is diluted
   a. Early frames ≈ zero error
   b. Large final error disappears in the average
2. Sparse divergence is diluted
   a. Few DOFs explode
   b. Thousands remain stable → error masked
3. No notion of worst-case

Regression tests should fail on any serious deviation.
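To make the dilution concrete, a small hypothetical example: a single DOF diverging by 10 among 10,000 stable DOFs barely moves the averaged metric, while the per-DOF maximum exposes it immediately:

```python
import numpy as np

ref = np.zeros(10_000)
data = np.zeros(10_000)
data[0] = 10.0                            # one DOF explodes, the rest stay stable

full_dist = np.linalg.norm(ref - data)    # = 10.0
error_by_dof = full_dist / data.size      # = 0.001 -> looks harmless
max_dof_error = np.abs(ref - data).max()  # = 10.0  -> the actual deviation
```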

Solutions to discuss:

  1. Max per-frame error (L∞ over time)
    max_frame_error = max(frame_errors) where frame_error = || data_ref - data ||₂

  2. Max per-DOF error (true worst case)

per_dof_error = np.abs(data_diff)
max_dof_error = per_dof_error.max()

Then over time: global_max_dof_error = max(global_max_dof_error, max_dof_error)
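Putting the two snippets above together, tracking the worst per-DOF error over the whole simulation could look like this (the function name and frame-list inputs are hypothetical):

```python
import numpy as np

def worst_case_dof_error(frames_ref, frames):
    """Worst absolute per-DOF deviation seen over all frames (true worst case)."""
    global_max_dof_error = 0.0
    for data_ref, data in zip(frames_ref, frames):
        per_dof_error = np.abs(data_ref - data)
        max_dof_error = per_dof_error.max()
        global_max_dof_error = max(global_max_dof_error, max_dof_error)
    return global_max_dof_error
```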

  3. 1st metric: use the median or a high percentile over DOFs and time.
abs_diff = np.abs(data_diff)

frame_median = np.median(abs_diff)
frame_p95 = np.percentile(abs_diff, 95)

Over time:

global_median = max(global_median, frame_median)
global_p95 = max(global_p95, frame_p95)
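The per-frame statistics for this metric can be sketched as a small helper (the name `frame_stats` is an assumption for illustration):

```python
import numpy as np

def frame_stats(data_ref, data):
    """Median and 95th-percentile absolute per-DOF error for one frame."""
    abs_diff = np.abs(data_ref - data)
    return np.median(abs_diff), np.percentile(abs_diff, 95)
```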
  4. 2nd metric: max per-DOF error, but with safeguards:
frame_max = abs_diff.max()
global_max = max(global_max, frame_max)
FAIL if: global_max > K × epsilon

Over simulation:

max_median = max(max_median, frame_median)
max_p95 = max(max_p95, frame_p95)
max_max = max(max_max, frame_max)

FAIL if:

  • max_p95 > eps_p95
    OR
  • max_max > eps_catastrophic

WARN if:

  • max_median > eps_median
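The combined FAIL/WARN criteria above can be sketched as a single check (the function name is hypothetical and the default thresholds `eps_median` / `eps_p95` / `eps_catastrophic` are placeholder values to be tuned per scene, not proposed defaults):

```python
import numpy as np

def check_regression(frames_ref, frames,
                     eps_median=1e-6, eps_p95=1e-4, eps_catastrophic=1e-2):
    """Return 'FAIL', 'WARN', or 'PASS' per the criteria discussed above."""
    max_median = max_p95 = max_max = 0.0
    for data_ref, data in zip(frames_ref, frames):
        abs_diff = np.abs(data_ref - data)
        max_median = max(max_median, np.median(abs_diff))
        max_p95 = max(max_p95, np.percentile(abs_diff, 95))
        max_max = max(max_max, abs_diff.max())
    if max_p95 > eps_p95 or max_max > eps_catastrophic:
        return "FAIL"   # sparse or catastrophic divergence
    if max_median > eps_median:
        return "WARN"   # broad but mild drift
    return "PASS"
```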
