TY - JOUR
T1 - Visualizing Ambiguity
T2 - Analyzing Linguistic Ambiguity Resolution in Text-to-Image Models
AU - Elsharif, Wala
AU - Alzubaidi, Mahmood
AU - She, James
AU - Agus, Marco
N1 - Publisher Copyright:
© 2025 by the authors.
PY - 2025/1
Y1 - 2025/1
AB - Text-to-image models have demonstrated remarkable progress in generating visual content from textual descriptions. However, the presence of linguistic ambiguity in the text prompts poses a potential challenge to these models, possibly leading to undesired or inaccurate outputs. This work conducts a preliminary study and provides insights into how text-to-image diffusion models resolve linguistic ambiguity through a series of experiments. We investigate a set of prompts that exhibit different types of linguistic ambiguities with different models and the images they generate, focusing on how the models’ interpretations of linguistic ambiguity compare to those of humans. In addition, we present a curated dataset of ambiguous prompts and their corresponding images known as the Visual Linguistic Ambiguity Benchmark (V-LAB) dataset. Furthermore, we report a number of limitations and failure modes caused by linguistic ambiguity in text-to-image models and propose prompt engineering guidelines to minimize the impact of ambiguity. The findings of this exploratory study contribute to the ongoing improvement of text-to-image models and provide valuable insights for future advancements in the field.
KW - computational linguistics
KW - diffusion models
KW - linguistic ambiguity
KW - natural language processing
KW - prompt engineering
KW - text-to-image models
UR - http://www.scopus.com/inward/record.url?scp=85216092433&partnerID=8YFLogxK
U2 - 10.3390/computers14010019
DO - 10.3390/computers14010019
M3 - Article
AN - SCOPUS:85216092433
SN - 2073-431X
VL - 14
JO - Computers
JF - Computers
IS - 1
M1 - 19
ER -