Missing ground truth and lie_text #4

Hi Team,

Thank you for creating this important repo!

Summary

- In `evals/analysis/quality_analyzer.py:90-93`, the analyzer forces `success = 0.0` whenever both `truth_text` and `lie_text` are empty, regardless of the HTTP status or the returned content.
- `summarize` then computes `success_rate` as `sum(1 for r in results if r.success) / len(results)`, so these rows count in the denominator but never in the numerator.
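To make the effect concrete, here is a minimal, self-contained sketch of the behavior described above. It does not import the repo: `Row`, `status_code`, `content`, and the "HTTP 200 with content" baseline are hypothetical stand-ins. Only the empty-ground-truth override and the `summarize` formula are taken from this report.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Row:
    # Hypothetical per-row record; fields other than truth_text, lie_text
    # and success are assumptions made for illustration only.
    status_code: int
    content: str
    truth_text: str
    lie_text: str
    success: float = 0.0


def score(row: Row) -> Row:
    # Assumed baseline: a scrape succeeds when it returns HTTP 200 with content.
    row.success = 1.0 if row.status_code == 200 and row.content else 0.0
    # Behavior described for quality_analyzer.py:90-93: no ground truth
    # forces success to 0.0, overriding the baseline above.
    if not row.truth_text and not row.lie_text:
        row.success = 0.0
    return row


def success_rate(results: List[Row]) -> float:
    # Formula used by `summarize`: 0.0 is falsy, so forced failures land
    # in the denominator but never in the numerator.
    return sum(1 for r in results if r.success) / len(results)


rows = [
    score(Row(200, "<html>ok</html>", "a fact", "")),
    score(Row(200, "<html>ok</html>", "", "")),  # no ground truth
]
print(success_rate(rows))  # 0.5, although both scrapes returned 200 + content
```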

Impact
- In `datasets/1-0-0.csv` there are 135 rows where both `truth_text` and `lie_text` are empty.
- All scrapes for those rows are automatically marked unsuccessful, which drags down the reported `success_rate` even when the scraper returns HTTP 200 with content.
- Quality metrics (precision, recall, F1) are not meaningful for those rows either, since there is no ground truth to compare against.
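A quick way to reproduce the count, as a sketch: it assumes `datasets/1-0-0.csv` has header columns named `truth_text` and `lie_text`, and it treats whitespace-only cells as empty (the analyzer may use a stricter check, which could change the count). The inline sample rows and the `url` column are made up for illustration.

```python
import csv
import io
from typing import Dict, List, TextIO


def rows_without_ground_truth(f: TextIO) -> List[Dict[str, str]]:
    """Return CSV rows whose truth_text and lie_text are both empty.

    Assumes header columns named truth_text and lie_text; missing or
    whitespace-only cells are treated as empty.
    """
    reader = csv.DictReader(f)
    return [
        row
        for row in reader
        if not (row.get("truth_text") or "").strip()
        and not (row.get("lie_text") or "").strip()
    ]


# Against the real dataset (path taken from this issue):
# with open("datasets/1-0-0.csv", newline="", encoding="utf-8") as f:
#     print(len(rows_without_ground_truth(f)))

sample = io.StringIO(
    "url,truth_text,lie_text\n"
    "https://example.com/a,some fact,\n"
    "https://example.com/b,,\n"
)
print(len(rows_without_ground_truth(sample)))  # 1
```

The same function also yields the affected rows themselves, so their IDs or URLs can be listed directly.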

What to fix
- Either drop or skip rows where both `truth_text` and `lie_text` are empty during evaluation, or
- Backfill the missing snippets so the rows can be scored, or
- Change the success calculation so that rows with no ground truth do not penalize `success_rate` (e.g., exclude them from the denominator).
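Options 1 and 3 could look roughly like this. This is a hypothetical sketch, not the repo's code: `Result`, its `has_ground_truth` flag, `row_has_ground_truth`, and `success_rate_excluding_unscored` are all assumed names.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Result:
    # Hypothetical result record. has_ground_truth is an assumed flag the
    # analyzer would set when truth_text or lie_text is non-empty.
    success: float
    has_ground_truth: bool


def row_has_ground_truth(row: Dict[str, str]) -> bool:
    # Option 1: filter dataset rows with this before running the evaluation.
    truth = (row.get("truth_text") or "").strip()
    lie = (row.get("lie_text") or "").strip()
    return bool(truth or lie)


def success_rate_excluding_unscored(results: List[Result]) -> Optional[float]:
    # Option 3: keep every row, but only count rows with ground truth
    # in both the numerator and the denominator.
    scored = [r for r in results if r.has_ground_truth]
    if not scored:
        return None  # nothing scorable; avoids dividing by zero
    return sum(1 for r in scored if r.success) / len(scored)


dataset = [
    {"truth_text": "some fact", "lie_text": ""},
    {"truth_text": "", "lie_text": ""},
]
print(len([row for row in dataset if row_has_ground_truth(row)]))  # 1

results = [Result(1.0, True), Result(0.0, True), Result(0.0, False)]
print(success_rate_excluding_unscored(results))  # 0.5
```

Returning `None` when no row is scorable sidesteps a division by zero, which the current `len(results)` denominator would also hit on an empty result list.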

I can provide the list of affected task IDs/URLs if helpful.



Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Missing ground trugh and lie_text #4

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Missing ground trugh and lie_text #4

Description

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions