Description
Hi Team,
Thank you for creating this important repo!
Summary:
- In `evals/analysis/quality_analyzer.py:90-93`, the analyzer forces `success = 0.0` whenever both `truth_text` and `lie_text` are empty, regardless of HTTP status/content. `summarize` then computes success_rate as `sum(1 for r in results if r.success) / len(results)`, so these rows count in the denominator but never in the numerator.
Impact
- In `datasets/1-0-0.csv` there are 135 rows where both truth_text and lie_text are empty.
- All scrapes for those rows are automatically marked unsuccessful, which drags down the reported success_rate even when the scraper returns 200 + content.
- Quality metrics (recall/precision/F1) aren’t meaningful for those rows either, since there’s no ground truth to compare.
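
For anyone who wants to re-check the count, here is a minimal sketch of how such rows could be found. It assumes the CSV has columns named `truth_text` and `lie_text` and treats whitespace-only values as empty; the `url` column, the sample data, and the function name are hypothetical and only there for illustration.

```python
import csv
import io


def count_rows_without_ground_truth(csv_file):
    """Return (missing, total): rows where truth_text and lie_text are both empty."""
    reader = csv.DictReader(csv_file)
    total = 0
    missing = 0
    for row in reader:
        total += 1
        # Treat absent or whitespace-only values as empty (an assumption).
        truth = (row.get("truth_text") or "").strip()
        lie = (row.get("lie_text") or "").strip()
        if not truth and not lie:
            missing += 1
    return missing, total


# Hypothetical sample data standing in for the real dataset.
sample = io.StringIO(
    "url,truth_text,lie_text\n"
    "https://example.com/a,some snippet,\n"
    "https://example.com/b,,\n"
    "https://example.com/c,,   \n"
)
missing, total = count_rows_without_ground_truth(sample)
print(f"{missing} of {total} rows have no ground truth")
```

To run it against the real file, pass an open file handle instead of the `StringIO` sample, e.g. `open("datasets/1-0-0.csv", newline="", encoding="utf-8")`.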
What to fix
- Either drop/skip rows with empty truth_text AND lie_text during evaluation, or
- Backfill the missing snippets so the rows can be scored, or
- Change the success calculation to avoid penalizing success_rate for rows with no ground truth (e.g., exclude them from the denominator).
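
A rough sketch of the third option, in case it helps. This is not the repo's actual code: `Result` is a hypothetical stand-in for the analyzer's per-row result, and I am assuming each result can see its own `truth_text` and `lie_text`. Rows without ground truth are left out of both the numerator and the denominator.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Result:
    # Hypothetical stand-in for the analyzer's per-row result object.
    success: float
    truth_text: str = ""
    lie_text: str = ""


def has_ground_truth(result: Result) -> bool:
    """A row is scorable if at least one of the two snippets is non-empty."""
    return bool(result.truth_text.strip() or result.lie_text.strip())


def success_rate(results: List[Result]) -> Optional[float]:
    """Success rate over scorable rows only; None if there are none."""
    scorable = [r for r in results if has_ground_truth(r)]
    if not scorable:
        return None
    return sum(1 for r in scorable if r.success) / len(scorable)


results = [
    Result(success=1.0, truth_text="expected snippet"),
    Result(success=0.0, truth_text="expected snippet", lie_text="decoy"),
    Result(success=0.0),  # no ground truth, forced to 0.0 today
]
print(success_rate(results))
```

With the current formula the three rows above would give 1/3; excluding the row without ground truth gives 1/2. Returning `None` for an empty scorable set is just one choice; reporting the number of skipped rows alongside the rate might also be useful.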
I can provide the list of affected task IDs/URLs if helpful.