OverSearchQA: Over-Searching in Search-Augmented Large Language Models


This repository contains the dataset for the paper Over-Searching in Search-Augmented Large Language Models (EACL 2026).

Overview

OverSearchQA is a benchmark designed to evaluate when language models should abstain from using retrieval/search tools. The benchmark tests three scenarios where reliance on search can be detrimental:

  • AU (Answerable Unknown): Questions about genuinely unknowable information (future events, universal unknowns)
  • FP (False Premise): Questions containing false presuppositions that search cannot resolve
  • UC (Underspecified Context): Questions lacking sufficient context for a definitive answer

Dataset Statistics

Category    Total    Should Abstain    Should Not Abstain
AU            292               146                   146
FP            384               192                   192
UC            512               256                   256
Total       1,188               594                   594

Data Format

Each data file (AU.json, FP.json, UC.json) is in JSON Lines format: one JSON object per line, with the following fields:

{
  "category": "AU | FP | UC",
  "should_abstain": true | false,
  "question": "The question text",
  "answer": "Target answer or explanation",
  "id": "Unique hash identifier",
  "data_source": "Source dataset name",
  "original_data_info": "Metadata from the source dataset"
}
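A minimal Python sketch for loading one of these files, assuming the JSON Lines layout described above; the function name and validation logic are illustrative, not part of the released code:

```python
import json

def load_oversearchqa(path):
    """Load a JSON Lines data file into a list of dicts, checking expected fields."""
    required = {"category", "should_abstain", "question", "answer",
                "id", "data_source", "original_data_info"}
    entries = []
    with open(path, encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if not line:
                continue  # skip blank lines
            entry = json.loads(line)
            missing = required - entry.keys()
            if missing:
                raise ValueError(f"entry {entry.get('id')} missing fields: {missing}")
            entries.append(entry)
    return entries
```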

Example Entry

{
  "category": "AU",
  "should_abstain": true,
  "question": "What will be the top performing stock in the next 15 years?",
  "answer": "This question cannot be answered definitively due to unsolved problems or future unknowns. The model should point out the unanswerability and abstain from providing an answer.",
  "id": "e40cee2a6dd6",
  "data_source": "kuq_unsolved_future_abstain",
  "original_data_info": "{\"KUQ_source\": \"turk\", \"KUQ_category\": \"future unknown\"}"
}

Evaluation

The benchmark is designed to evaluate:

  1. Abstention Accuracy: Whether the model correctly identifies when to abstain from searching
  2. Search Efficiency: Whether the model avoids unnecessary search calls
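The two metrics above can be sketched as follows. The per-example record format (a predicted `abstained` flag and a `num_search_calls` count alongside the dataset's `should_abstain` label) is an assumption for illustration, not a format defined by this repository:

```python
def abstention_accuracy(examples):
    """Fraction of examples where the model's abstain decision matches the label."""
    correct = sum(1 for ex in examples if ex["abstained"] == ex["should_abstain"])
    return correct / len(examples)

def search_efficiency(examples):
    """Fraction of should-abstain examples handled without issuing any search call."""
    abstain_cases = [ex for ex in examples if ex["should_abstain"]]
    no_search = sum(1 for ex in abstain_cases if ex["num_search_calls"] == 0)
    return no_search / len(abstain_cases)
```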

Citation

If you find this work helpful, please consider citing:

@inproceedings{oversearchqa2026,
  title={Over-Searching in Search-Augmented Large Language Models},
  author={Xie, Roy and Gopinath, Deepak and Qiu, David and Lin, Dong and Sun, Haitian and Potdar, Saloni and Dhingra, Bhuwan},
  booktitle={Proceedings of the 19th Conference of the European Chapter of the Association for Computational Linguistics (EACL)},
  year={2026},
  url={https://arxiv.org/abs/2601.05503}
}
