Open Genes AI Benchmark is a framework for benchmarking and evaluating AI models on scientific data extraction tasks from research articles. It supports multiple AI models, automated benchmarking, customizable tasks, detailed scoring, and visualization of results.
- Supports OpenAI and Gemini models for scientific data extraction
- Automated benchmarking pipeline for reproducible evaluation
- Customizable tasks and assessments
- Comprehensive scoring and evaluation metrics:
  - Factual accuracy
  - Completeness
  - Precision
  - Hallucination rate
  - Plausible error rate
  - Format compliance
  - Numerical accuracy
  - Uncertainty handling
- Visualization scripts for performance metrics
- Utilities for document conversion (DOC/DOCX to PDF)
- `biobench/`: Core benchmarking logic, models, tasks, assessments, and scripts
- `biobench/scripts/`: Utility scripts (charts, document conversion)
- `biobench/models/`: Model wrappers for OpenAI and Gemini
- `biobench/tasks/`, `biobench/assessments/`, `biobench/scorers/`: Task and scoring logic
- `data/`: Data and supplementary files
- `docs/`: Documentation
- Python version: 3.12+
- Install dependencies: `pip install -r requirements.txt`
Set the following environment variables as needed:
- `BASE_URL`: Base URL for OpenAI-compatible API (OpenAI model)
- `API_KEY`: API key for OpenAI-compatible API
- `GEMINI_API_KEY`: API key for Gemini model
- `MAX_TOKENS`: Maximum number of tokens in response (default: 75)
- `TEMPERATURE`: Model creativity from 0.0 to 1.0 (default: 0.5)
- `DATABASE_URL`: PostgreSQL connection
- `MYSQL_HOST`: MySQL host
- `MYSQL_USER`: MySQL user
- `MYSQL_PASSWORD`: MySQL password
- `MYSQL_DATABASE`: MySQL database name

These variables are used for task generation in `biobench/task_generator/generator.py`.
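
As an illustration of how the model-related variables listed above (`BASE_URL`, `API_KEY`, `GEMINI_API_KEY`, `MAX_TOKENS`, `TEMPERATURE`) might be read with their documented defaults, here is a minimal sketch. The `ModelSettings` class and `load_model_settings` function are hypothetical names invented for this example; they are not taken from the project code, and the range check on `TEMPERATURE` is an assumption based on the documented 0.0 to 1.0 range.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class ModelSettings:
    """Hypothetical container for the model-related environment variables."""
    base_url: Optional[str]
    api_key: Optional[str]
    gemini_api_key: Optional[str]
    max_tokens: int
    temperature: float


def load_model_settings(env: Mapping[str, str] = os.environ) -> ModelSettings:
    """Read settings from a mapping (os.environ by default), applying the documented defaults."""
    temperature = float(env.get("TEMPERATURE", "0.5"))
    # Assumption: values outside the documented 0.0 to 1.0 range are rejected.
    if not 0.0 <= temperature <= 1.0:
        raise ValueError(f"TEMPERATURE must be between 0.0 and 1.0, got {temperature}")
    return ModelSettings(
        base_url=env.get("BASE_URL"),
        api_key=env.get("API_KEY"),
        gemini_api_key=env.get("GEMINI_API_KEY"),
        max_tokens=int(env.get("MAX_TOKENS", "75")),
        temperature=temperature,
    )
```

Calling `load_model_settings()` with no arguments reads from the current process environment; passing a plain dictionary instead makes the defaults easy to check in isolation.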
- `soffice` (LibreOffice): Required for document conversion. Ensure `soffice` is in your PATH, or pass its path to the script as an argument.
- `python -m biobench.pipeline`
- `python -m biobench.pipeline_gemini`

Add your own pipeline for other model types.
- `python biobench/scripts/charts.py`: Outputs performance heatmaps to `biobench/charts_output/`
- `python biobench/scripts/convert_docs_to_pdf.py`: Converts all `.doc`/`.docx` files in `data/supp/` to PDF using LibreOffice (`soffice`)
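
For readers curious what a LibreOffice-based conversion involves, the following is a rough sketch and not the contents of `convert_docs_to_pdf.py`. It relies on LibreOffice's headless `--convert-to pdf` and `--outdir` command-line options; the function names (`build_convert_command`, `find_documents`, `convert_all`) are hypothetical, and writing the PDFs next to the source files is an assumption.

```python
import subprocess
from pathlib import Path
from typing import List


def build_convert_command(source: Path, out_dir: Path, soffice: str = "soffice") -> List[str]:
    """Build the LibreOffice command line that converts one document to PDF."""
    return [soffice, "--headless", "--convert-to", "pdf", "--outdir", str(out_dir), str(source)]


def find_documents(folder: Path) -> List[Path]:
    """Return the .doc/.docx files directly inside folder, sorted by path."""
    return sorted(
        p for p in folder.iterdir()
        if p.is_file() and p.suffix.lower() in {".doc", ".docx"}
    )


def convert_all(folder: Path, soffice: str = "soffice") -> None:
    """Convert every .doc/.docx file in folder, writing PDFs to the same folder (assumed)."""
    for doc in find_documents(folder):
        # check=True raises CalledProcessError if soffice exits with an error.
        subprocess.run(build_convert_command(doc, folder, soffice), check=True)
```

With LibreOffice installed, `convert_all(Path("data/supp"))` would process the supplementary files; if `soffice` is not on the PATH, its full path can be passed as the second argument.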
- precision: Correct facts in extraction
- completeness: Coverage of key information
- hallucination_rate: Fabricated information rate
- plausible_error_rate: Realistic but incorrect information rate
- format_compliance: Output format correctness
- numerical_accuracy: Correctness of numerical values
- uncertainty_handling: Proper handling of missing/uncertain data
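
To make the first two metrics concrete, here is a simplified sketch that treats an extraction and its reference as sets of fact strings. This is an assumption made for illustration, not the project's scoring method: the actual scorers may work quite differently, and `score_extraction` is a hypothetical name. The sketch does not attempt `hallucination_rate` or `plausible_error_rate`, since telling fabricated facts from realistic-but-wrong ones requires judgment beyond set comparison.

```python
from typing import Dict, Set


def score_extraction(extracted: Set[str], reference: Set[str]) -> Dict[str, float]:
    """Compute set-based precision and completeness for one extraction."""
    correct = extracted & reference
    # precision: share of extracted facts that appear in the reference.
    precision = len(correct) / len(extracted) if extracted else 0.0
    # completeness: share of reference facts that were extracted.
    completeness = len(correct) / len(reference) if reference else 0.0
    return {"precision": precision, "completeness": completeness}
```

For example, if a model extracts four facts of which two match a three-fact reference, this sketch gives a precision of 0.5 and a completeness of about 0.67.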
Contributions are welcome! Please open issues or pull requests for improvements, bug fixes, or new features.