
Need to improve the Regression error criteria #95

@epernod

Description


Asking an "expert":

**Why your current metric is weak**

You currently do:
full_dist = || data_ref - data ||₂
error_by_dof = full_dist / N
accumulate over frames

This is equivalent to an average global L2 error.
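A minimal numpy sketch of this averaged metric, following the pseudocode above (the function name and the frame-list inputs `frames_ref` / `frames` are assumptions for illustration, not the actual test harness):

```python
import numpy as np

def averaged_l2_error(frames_ref, frames):
    """Current metric: per-frame L2 distance divided by N, averaged over frames."""
    total = 0.0
    for data_ref, data in zip(frames_ref, frames):
        full_dist = np.linalg.norm(data_ref - data)  # || data_ref - data ||2
        total += full_dist / data.size               # error_by_dof
    return total / len(frames)
```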

**Problems (you identified correctly)**

1. Late divergence is diluted
   a. Early frames ≈ zero error
   b. Large final error disappears in the average
2. Sparse divergence is diluted
   a. Few DOFs explode
   b. Thousands remain stable → error masked
3. No notion of worst-case

Regression tests should fail on any serious deviation.
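To make the dilution concrete, a small hypothetical example: a single DOF diverging by 10 among 10,000 stable DOFs barely moves the averaged metric, while the per-DOF maximum exposes it immediately:

```python
import numpy as np

ref = np.zeros(10_000)
data = np.zeros(10_000)
data[0] = 10.0                            # one DOF explodes, the rest stay stable

full_dist = np.linalg.norm(ref - data)    # = 10.0
error_by_dof = full_dist / data.size      # = 0.001 -> looks harmless
max_dof_error = np.abs(ref - data).max()  # = 10.0  -> the actual deviation
```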

Solutions to discuss:

  1. Max per-frame error (L∞ over time)
    max_frame_error = max(frame_errors) where frame_error = || data_ref - data ||₂

  2. Max per-DOF error (true worst case)

per_dof_error = np.abs(data_diff)
max_dof_error = per_dof_error.max()

Then over time: global_max_dof_error = max(global_max_dof_error, max_dof_error)
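Putting the two snippets above together, tracking the worst per-DOF error over the whole simulation could look like this (the function name and frame-list inputs are hypothetical):

```python
import numpy as np

def worst_case_dof_error(frames_ref, frames):
    """Worst absolute per-DOF deviation seen over all frames (true worst case)."""
    global_max_dof_error = 0.0
    for data_ref, data in zip(frames_ref, frames):
        per_dof_error = np.abs(data_ref - data)
        max_dof_error = per_dof_error.max()
        global_max_dof_error = max(global_max_dof_error, max_dof_error)
    return global_max_dof_error
```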

  3. 1st metric: use the median or a high percentile over DOFs and time.
abs_diff = np.abs(data_diff)

frame_median = np.median(abs_diff)
frame_p95 = np.percentile(abs_diff, 95)

Over time:

global_median = max(global_median, frame_median)
global_p95 = max(global_p95, frame_p95)
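The per-frame statistics for this metric can be sketched as a small helper (the name `frame_stats` is an assumption for illustration):

```python
import numpy as np

def frame_stats(data_ref, data):
    """Median and 95th-percentile absolute per-DOF error for one frame."""
    abs_diff = np.abs(data_ref - data)
    return np.median(abs_diff), np.percentile(abs_diff, 95)
```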
  4. 2nd metric: max per-DOF error, but with safeguards:
frame_max = abs_diff.max()
global_max = max(global_max, frame_max)
FAIL if: global_max > K × epsilon

Over simulation:

max_median = max(max_median, frame_median)
max_p95 = max(max_p95, frame_p95)
max_max = max(max_max, frame_max)

FAIL if:

  • max_p95 > eps_p95
    OR
  • max_max > eps_catastrophic

WARN if:

  • max_median > eps_median
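The combined FAIL/WARN criteria above can be sketched as a single check (the function name is hypothetical and the default thresholds `eps_median` / `eps_p95` / `eps_catastrophic` are placeholder values to be tuned per scene, not proposed defaults):

```python
import numpy as np

def check_regression(frames_ref, frames,
                     eps_median=1e-6, eps_p95=1e-4, eps_catastrophic=1e-2):
    """Return 'FAIL', 'WARN', or 'PASS' per the criteria discussed above."""
    max_median = max_p95 = max_max = 0.0
    for data_ref, data in zip(frames_ref, frames):
        abs_diff = np.abs(data_ref - data)
        max_median = max(max_median, np.median(abs_diff))
        max_p95 = max(max_p95, np.percentile(abs_diff, 95))
        max_max = max(max_max, abs_diff.max())
    if max_p95 > eps_p95 or max_max > eps_catastrophic:
        return "FAIL"   # sparse or catastrophic divergence
    if max_median > eps_median:
        return "WARN"   # broad but mild drift
    return "PASS"
```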
