Description
QwQ supports a maximum of 32768 tokens, but the default max_tokens in scripts/run_web_thinker.py is 81920. What length was used when evaluating the QwQ model on the different benchmarks?
python scripts/run_web_thinker_report.py \
--dataset_name glaive \
--split test \
--concurrent_limit 32 \
--search_engine "serper" \
--serper_api_key "YOUR_GOOGLE_SERPER_API" \
--api_base_url "YOUR_API_BASE_URL" \
--model_name "QwQ-32B" \
--aux_api_base_url "YOUR_AUX_API_BASE_URL" \
--aux_model_name "Qwen2.5-32B-Instruct" \
--tokenizer_path "PATH_TO_YOUR_TOKENIZER" \
--aux_tokenizer_path "PATH_TO_YOUR_AUX_TOKENIZER"
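
To make the mismatch concrete, here is a minimal sketch of the arithmetic behind the question. It assumes the serving backend counts prompt tokens plus generated tokens against the 32768 limit; the function name `clamp_max_tokens` and the 2000-token prompt length are hypothetical and do not come from the repository.

```python
QWQ_CONTEXT_WINDOW = 32768    # limit quoted in this issue
DEFAULT_MAX_TOKENS = 81920    # default quoted in this issue


def clamp_max_tokens(requested: int, context_window: int, prompt_tokens: int) -> int:
    """Return the largest generation budget that still fits in the context window.

    Hypothetical helper for illustration only; it is not part of the scripts.
    """
    if prompt_tokens >= context_window:
        raise ValueError("prompt alone already fills the context window")
    # Whatever the prompt does not use is the most the model could generate.
    available = context_window - prompt_tokens
    return min(requested, available)


if __name__ == "__main__":
    # Assumed prompt length of 2000 tokens, purely for illustration.
    budget = clamp_max_tokens(DEFAULT_MAX_TOKENS, QWQ_CONTEXT_WINDOW, 2000)
    print(budget)
```

Under that assumption, the 81920 default could never be reached in a single generation, which is why I am asking what effective length the reported numbers used.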