Visual Riddles: a Commonsense and
World Knowledge Challenge for
Large Vision and Language Models

Ben Gurion University of the Negev,   Bar-Ilan University,
The Hebrew University of Jerusalem, Google Research,   Tel Aviv University


arXiv
🤗 Dataset
🤗 Explorer
Evaluation Notebook


Why is he doing this?




Look at the nightstand

The image depicts a man scratching his arm in a bedroom, with a mosquito on the nightstand near the bed. Therefore, the man is probably scratching his arm because of a mosquito bite.


What is this local doing?



Look at his cheek

This local is most likely Italian, based on the Colosseum in the background. He appears to be eating while pressing his finger into his cheek. In Italy, this gesture made while eating usually means "buono", i.e., that you find the food tasty. Therefore, he is most likely saying that the food is delicious.

Sara is a resort owner in Krabi, Thailand. Could this be her resort?



Look at the mountains

The image shows the exterior of a Thai-style house with a large yard containing grass and a big pool. In the far background there are Alpine mountains with snow-covered peaks. Since Krabi has a tropical climate and no snow-capped mountains, this is most likely not her resort.

BibTeX

@misc{bittonguetta2024visualriddlescommonsenseworld,
      title={Visual Riddles: a Commonsense and World Knowledge Challenge for Large Vision and Language Models},
      author={Nitzan Bitton-Guetta and Aviv Slobodkin and Aviya Maimon and Eliya Habba and Royi Rassin and Yonatan Bitton and Idan Szpektor and Amir Globerson and Yuval Elovici},
      year={2024},
      eprint={2407.19474},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2407.19474},
}