Ebhaam at SemEval-2023 Task 1: A CLIP-Based Approach for Comparing Cross-modality and Unimodality in Visual Word Sense Disambiguation

Zeinab Taghavi, Parsa Haghighi Naeini*, Mohammad Ali Sadraei*, Soroush Gooran, Ehsaneddin Asgari, Hamid Reza Rabiee, Hossein Sameti

*Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

1 Citation (Scopus)

Abstract

This paper presents an approach to tackle the task of Visual Word Sense Disambiguation (Visual-WSD), which involves determining the most appropriate image to represent a given polysemous word in one of its particular senses. The proposed approach leverages the CLIP model, prompt engineering, and text-to-image models such as GLIDE and DALL-E 2 for both image retrieval and generation. To evaluate our approach, we participated in the SemEval 2023 shared task on “Visual Word Sense Disambiguation (Visual-WSD)” using a zero-shot learning setting, where we compared the accuracy of different combinations of tools, including “Simple prompt-based” methods and “Generated prompt-based” methods for prompt engineering using completion models, and text-to-image models for changing input modality from text to image. Moreover, we explored the benefits of cross-modality evaluation between text and candidate images using CLIP. Our experimental results demonstrate that the proposed approach reaches better results than cross-modality approaches, highlighting the potential of prompt engineering and text-to-image models to improve accuracy in Visual-WSD tasks. We assessed our approach in a zero-shot learning scenario and attained an accuracy of 68.75% in our best attempt.
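The distinction the abstract draws between cross-modality and unimodality comparison can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes embeddings have already been produced by some encoder such as CLIP, and the toy three-dimensional vectors, the function names, and the variable names are all hypothetical. In the cross-modality setting the query is the embedding of the text prompt; in the unimodality setting the query is the embedding of an image generated from that prompt. In both cases the candidate images are ranked by cosine similarity to the query.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def rank_candidates(query_embedding, candidate_embeddings):
    """Return candidate indices ordered from most to least similar to the query."""
    scores = [cosine(query_embedding, c) for c in candidate_embeddings]
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)


# Toy vectors standing in for image embeddings (assumption: real ones would
# come from an image encoder such as CLIP's).
candidates = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.6, 0.8, 0.0]]

# Cross-modality: the query is a (hypothetical) text-prompt embedding.
text_query = [0.1, 0.9, 0.0]

# Unimodality: the query is a (hypothetical) embedding of an image generated
# from the prompt by a text-to-image model.
generated_image_query = [0.9, 0.1, 0.0]

cross_modal_ranking = rank_candidates(text_query, candidates)
unimodal_ranking = rank_candidates(generated_image_query, candidates)
```

The ranking step is the same in both settings; only the modality of the query embedding changes, which is what makes the two setups directly comparable.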

Original language: English
Title of host publication: 17th International Workshop on Semantic Evaluation, SemEval 2023 - Proceedings of the Workshop
Editors: Atul Kr. Ojha, A. Seza Dogruoz, Giovanni Da San Martino, Harish Tayyar Madabushi, Ritesh Kumar, Elisa Sartori
Publisher: Association for Computational Linguistics
Pages: 1960-1964
Number of pages: 5
ISBN (Electronic): 9781959429999
Publication status: Published - 2023
Externally published: Yes
Event: 17th International Workshop on Semantic Evaluation, SemEval 2023, co-located with the 61st Annual Meeting of the Association for Computational Linguistics, ACL 2023 - Hybrid, Toronto, Canada
Duration: 13 Jul 2023 to 14 Jul 2023

Publication series

Name: 17th International Workshop on Semantic Evaluation, SemEval 2023 - Proceedings of the Workshop

Conference

Conference: 17th International Workshop on Semantic Evaluation, SemEval 2023, co-located with the 61st Annual Meeting of the Association for Computational Linguistics, ACL 2023
Country/Territory: Canada
City: Hybrid, Toronto
Period: 13/07/23 to 14/07/23
