OverSearchQA: Over-Searching in Search-Augmented Large Language Models


This repository contains the dataset for the paper Over-Searching in Search-Augmented Large Language Models (EACL 2026).

Overview

OverSearchQA is a benchmark designed to evaluate when language models should abstain from using retrieval/search tools. The benchmark tests three scenarios where reliance on search can be detrimental:

  • AU (Answerable Unknown): Questions about genuinely unknowable information (future events, universal unknowns)
  • FP (False Premise): Questions containing false presuppositions that search cannot resolve
  • UC (Underspecified Context): Questions lacking sufficient context for a definitive answer

Dataset Statistics

Category    Total    Should Abstain    Should Not Abstain
AU            292               146                   146
FP            384               192                   192
UC            512               256                   256
Total       1,188               594                   594

Data Format

Each data file (AU.json, FP.json, UC.json) is in JSON Lines format: one JSON object per line, with the following fields:

{
  "category": "AU | FP | UC",
  "should_abstain": true | false,
  "question": "The question text",
  "answer": "Target answer or explanation",
  "id": "Unique hash identifier",
  "data_source": "Source dataset name",
  "original_data_info": "Metadata from the source dataset"
}
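A minimal Python sketch for loading one of these files, assuming the JSON Lines layout described above; the function name and validation logic are illustrative, not part of the released code:

```python
import json

def load_oversearchqa(path):
    """Load a JSON Lines data file into a list of dicts, checking expected fields."""
    required = {"category", "should_abstain", "question", "answer",
                "id", "data_source", "original_data_info"}
    entries = []
    with open(path, encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if not line:
                continue  # skip blank lines
            entry = json.loads(line)
            missing = required - entry.keys()
            if missing:
                raise ValueError(f"entry {entry.get('id')} missing fields: {missing}")
            entries.append(entry)
    return entries
```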

Example Entry

{
  "category": "AU",
  "should_abstain": true,
  "question": "What will be the top performing stock in the next 15 years?",
  "answer": "This question cannot be answered definitively due to unsolved problems or future unknowns. The model should point out the unanswerability and abstain from providing an answer.",
  "id": "e40cee2a6dd6",
  "data_source": "kuq_unsolved_future_abstain",
  "original_data_info": "{\"KUQ_source\": \"turk\", \"KUQ_category\": \"future unknown\"}"
}

Evaluation

The benchmark is designed to evaluate:

  1. Abstention Accuracy: Whether the model correctly identifies when to abstain from searching
  2. Search Efficiency: Whether the model avoids unnecessary search calls
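The two metrics above can be sketched as follows. The per-example record format (a predicted `abstained` flag and a `num_search_calls` count alongside the dataset's `should_abstain` label) is an assumption for illustration, not a format defined by this repository:

```python
def abstention_accuracy(examples):
    """Fraction of examples where the model's abstain decision matches the label."""
    correct = sum(1 for ex in examples if ex["abstained"] == ex["should_abstain"])
    return correct / len(examples)

def search_efficiency(examples):
    """Fraction of should-abstain examples handled without issuing any search call."""
    abstain_cases = [ex for ex in examples if ex["should_abstain"]]
    no_search = sum(1 for ex in abstain_cases if ex["num_search_calls"] == 0)
    return no_search / len(abstain_cases)
```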

Citation

If you find this work helpful, please consider citing:

@inproceedings{oversearchqa2026,
  title={Over-Searching in Search-Augmented Large Language Models},
  author={Xie, Roy and Gopinath, Deepak and Qiu, David and Lin, Dong and Sun, Haitian and Potdar, Saloni and Dhingra, Bhuwan},
  booktitle={Proceedings of the 19th Conference of the European Chapter of the Association for Computational Linguistics (EACL)},
  year={2026},
  url={https://arxiv.org/abs/2601.05503}
}
