TY - GEN
T1 - Ebhaam at SemEval-2023 Task 1: A CLIP-Based Approach for Comparing Cross-modality and Unimodality in Visual Word Sense Disambiguation
T2 - 17th International Workshop on Semantic Evaluation, SemEval 2023, co-located with the 61st Annual Meeting of the Association for Computational Linguistics, ACL 2023
AU - Taghavi, Zeinab
AU - Naeini, Parsa Haghighi
AU - Sadraei, Mohammad Ali
AU - Gooran, Soroush
AU - Asgari, Ehsaneddin
AU - Rabiee, Hamid Reza
AU - Sameti, Hossein
N1 - Publisher Copyright:
© 2023 Association for Computational Linguistics.
PY - 2023
Y1 - 2023
AB - This paper presents an approach to Visual Word Sense Disambiguation (Visual-WSD), the task of selecting the image that best represents a given polysemous word in a particular sense. The proposed approach leverages the CLIP model, prompt engineering, and text-to-image models such as GLIDE and DALL-E 2 for both image retrieval and image generation. To evaluate the approach, we participated in the SemEval 2023 shared task on “Visual Word Sense Disambiguation (Visual-WSD)” in a zero-shot learning setting, comparing the accuracy of different combinations of tools: “simple prompt-based” and “generated prompt-based” methods, which use completion models for prompt engineering, and text-to-image models, which shift the input modality from text to image. We also explored the benefits of cross-modality evaluation between the text and the candidate images using CLIP. Our experimental results show that the proposed approach achieves better results than cross-modality approaches, highlighting the potential of prompt engineering and text-to-image models to improve accuracy on Visual-WSD tasks. In this zero-shot setting, our best run attained an accuracy of 68.75%.
UR - http://www.scopus.com/inward/record.url?scp=85175398468&partnerID=8YFLogxK
M3 - Conference contribution
AN - SCOPUS:85175398468
T3 - 17th International Workshop on Semantic Evaluation, SemEval 2023 - Proceedings of the Workshop
SP - 1960
EP - 1964
BT - 17th International Workshop on Semantic Evaluation, SemEval 2023 - Proceedings of the Workshop
A2 - Ojha, Atul Kr.
A2 - Doğruöz, A. Seza
A2 - Da San Martino, Giovanni
A2 - Madabushi, Harish Tayyar
A2 - Kumar, Ritesh
A2 - Sartori, Elisa
PB - Association for Computational Linguistics
Y2 - 13 July 2023 through 14 July 2023
ER -
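
Below the record, a minimal sketch of the zero-shot, CLIP-based ranking step the abstract describes: scoring a set of candidate images against a sense-disambiguating text prompt and picking the best match. It assumes the Hugging Face transformers CLIP API; the checkpoint name, the rank_candidates helper, and the example prompt are illustrative assumptions, not details taken from the paper.

import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

# Illustrative checkpoint; the paper does not specify which CLIP weights were used.
MODEL_NAME = "openai/clip-vit-base-patch32"
model = CLIPModel.from_pretrained(MODEL_NAME)
processor = CLIPProcessor.from_pretrained(MODEL_NAME)

def rank_candidates(prompt, image_paths):
    """Score each candidate image against a sense-disambiguating prompt
    and return (path, score) pairs sorted by CLIP similarity, best first."""
    images = [Image.open(p).convert("RGB") for p in image_paths]
    inputs = processor(text=[prompt], images=images, return_tensors="pt", padding=True)
    with torch.no_grad():
        outputs = model(**inputs)
    # logits_per_image has shape (num_images, 1): one text-image similarity per candidate.
    scores = outputs.logits_per_image.squeeze(-1).tolist()
    return sorted(zip(image_paths, scores), key=lambda pair: pair[1], reverse=True)

# Usage: steer the polysemous word "bank" toward its financial sense.
# ranked = rank_candidates(
#     "a photo of a bank, a financial institution",
#     ["candidate_0.jpg", "candidate_1.jpg", "candidate_2.jpg"],
# )
# ranked[0][0] is the highest-scoring candidate image.

The same ranking call covers both variants the abstract contrasts: a “simple prompt-based” run passes the raw context phrase as the prompt, while a “generated prompt-based” run would first expand that phrase with a completion model before scoring.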