This repository contains the dataset for the paper Over-Searching in Search-Augmented Large Language Models (EACL 2026).
OverSearchQA is a benchmark designed to evaluate when language models should abstain from using retrieval/search tools. The benchmark tests three scenarios where reliance on search can be detrimental:
- AU (Answer Unknown): Questions about genuinely unknowable information (future events, universal unknowns)
- FP (False Premise): Questions containing false presuppositions that search cannot resolve
- UC (Underspecified Context): Questions lacking sufficient context for a definitive answer
| Category | Total | Should Abstain | Should Not Abstain |
|---|---|---|---|
| AU | 292 | 146 | 146 |
| FP | 384 | 192 | 192 |
| UC | 512 | 256 | 256 |
| Total | 1,188 | 594 | 594 |
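The per-category counts in the table can be recomputed from the data files. The following is a minimal sketch, assuming each file holds one JSON object per line with the `category` and `should_abstain` fields; the `tally` helper and the in-memory `sample` records are hypothetical and only stand in for a real file such as AU.json.

```python
import json
from collections import Counter


def tally(lines):
    """Count (category, should_abstain) pairs over an iterable of JSON-lines strings."""
    counts = Counter()
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        record = json.loads(line)
        counts[(record["category"], bool(record["should_abstain"]))] += 1
    return counts


# Hypothetical in-memory sample; real records come from AU.json, FP.json, UC.json.
sample = [
    '{"category": "AU", "should_abstain": true, "question": "q1"}',
    '{"category": "AU", "should_abstain": false, "question": "q2"}',
    '{"category": "FP", "should_abstain": true, "question": "q3"}',
]
counts = tally(sample)
print(counts[("AU", True)], counts[("AU", False)], counts[("FP", True)])
```

To run it on a real file, pass an open file handle instead of the list, for example `tally(open("AU.json", encoding="utf-8"))`, since iterating a text file yields one line at a time.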
Each data file (AU.json, FP.json, UC.json) is in JSON Lines format, one JSON object per question, with the following fields:
{
"category": "AU | FP | UC",
"should_abstain": true | false,
"question": "The question text",
"answer": "Target answer or explanation",
"id": "Unique hash identifier",
"data_source": "Source dataset name",
"original_data_info": "Metadata from the source dataset"
}

Example entry:

{
"category": "AU",
"should_abstain": true,
"question": "What will be the top performing stock in the next 15 years?",
"answer": "This question cannot be answered definitively due to unsolved problems or future unknowns. The model should point out the unanswerability and abstain from providing an answer.",
"id": "e40cee2a6dd6",
"data_source": "kuq_unsolved_future_abstain",
"original_data_info": "{\"KUQ_source\": \"turk\", \"KUQ_category\": \"future unknown\"}"
}

The benchmark is designed to evaluate:
- Abstention Accuracy: Whether the model correctly identifies when to abstain from searching
- Search Efficiency: Whether the model avoids unnecessary search calls
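The two evaluation axes above could be scored roughly as follows. This is a sketch under stated assumptions, not the paper's evaluation code: the `Prediction` record, its `abstained` and `num_searches` fields, and the use of mean search calls as a stand-in for search efficiency are all hypothetical choices made for illustration. Only `should_abstain` comes from the dataset itself.

```python
from dataclasses import dataclass


@dataclass
class Prediction:
    # Hypothetical per-question model output; not a format defined by the benchmark.
    should_abstain: bool  # gold label from the dataset
    abstained: bool       # whether the model abstained (assumed to be logged by the harness)
    num_searches: int     # number of search calls the model issued (assumed to be logged)


def abstention_accuracy(preds):
    """Fraction of questions where the abstention decision matches the gold label."""
    if not preds:
        return 0.0
    correct = sum(p.abstained == p.should_abstain for p in preds)
    return correct / len(preds)


def mean_searches(preds):
    """Average number of search calls per question (a simple proxy for search cost)."""
    if not preds:
        return 0.0
    return sum(p.num_searches for p in preds) / len(preds)


preds = [
    Prediction(should_abstain=True, abstained=True, num_searches=0),
    Prediction(should_abstain=True, abstained=False, num_searches=3),
    Prediction(should_abstain=False, abstained=False, num_searches=1),
    Prediction(should_abstain=False, abstained=True, num_searches=0),
]
print(abstention_accuracy(preds))  # 0.5
print(mean_searches(preds))        # 1.0
```

Restricting `preds` to the `should_abstain=True` subset before calling `mean_searches` gives a rough view of how many search calls are spent on questions that search cannot resolve.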
If you find this work helpful, please consider citing:
@inproceedings{oversearchqa2026,
title={Over-Searching in Search-Augmented Large Language Models},
author={Xie, Roy and Gopinath, Deepak and Qiu, David and Lin, Dong and Sun, Haitian and Potdar, Saloni and Dhingra, Bhuwan},
booktitle={Proceedings of the 19th Conference of the European Chapter of the Association for Computational Linguistics (EACL)},
year={2026},
url={https://arxiv.org/abs/2601.05503}
}