
Missing ground truth and lie_text #4

@MeirKaD

Description


Hi Team,

Thank you for creating this important repo!

Summary:

  • In evals/analysis/quality_analyzer.py:90-93, the analyzer forces success = 0.0 whenever both truth_text and lie_text are empty, regardless of HTTP status/content.
  • summarize then computes success_rate as sum(1 for r in results if r.success) / len(results), so these rows count in the denominator but never in the numerator.
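For reference, here is a minimal sketch of the behavior described above. The names (`score_row`, `scrape_ok`) are illustrative, not the actual code in `quality_analyzer.py`:

```python
def score_row(truth_text: str, lie_text: str, scrape_ok: bool) -> float:
    # Illustrative reconstruction of the logic around
    # quality_analyzer.py:90-93: rows with no ground truth are
    # forced to 0.0 even when the scrape itself succeeded
    # (e.g. HTTP 200 with content).
    if not truth_text and not lie_text:
        return 0.0
    return 1.0 if scrape_ok else 0.0
```

Because `summarize` divides by `len(results)`, every such row lands in the denominator but can never land in the numerator.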

Impact

  • In datasets/1-0-0.csv there are 135 rows where both truth_text and lie_text are empty.
  • All scrapes for those rows are automatically marked unsuccessful, which drags down the reported success_rate even when the scraper returns 200 + content.
  • Quality metrics (recall/precision/F1) aren’t meaningful for those rows either, since there’s no ground truth to compare.

What to fix

  • Either drop/skip rows with empty truth_text AND lie_text during evaluation, or
  • Backfill the missing snippets so the rows can be scored, or
  • Change the success calculation to avoid penalizing success_rate for rows with no ground truth (e.g., exclude them from the denominator).

I can provide the list of affected task IDs/URLs if helpful.
