Visualizing Ambiguity: Analyzing Linguistic Ambiguity Resolution in Text-to-Image Models

Wala Elsharif*, Mahmood Alzubaidi, James She, Marco Agus*

*Corresponding author for this work

Research output: Contribution to journal › Article › peer-review

Abstract

Text-to-image models have demonstrated remarkable progress in generating visual content from textual descriptions. However, linguistic ambiguity in text prompts poses a potential challenge to these models, possibly leading to undesired or inaccurate outputs. This work conducts a preliminary study and provides insights into how text-to-image diffusion models resolve linguistic ambiguity through a series of experiments. We investigate a set of prompts exhibiting different types of linguistic ambiguity across several models, analyzing the images they generate and focusing on how the models' interpretations of linguistic ambiguity compare to those of humans. In addition, we present a curated dataset of ambiguous prompts and their corresponding images, the Visual Linguistic Ambiguity Benchmark (V-LAB). Furthermore, we report a number of limitations and failure modes caused by linguistic ambiguity in text-to-image models and propose prompt engineering guidelines to minimize the impact of ambiguity. The findings of this exploratory study contribute to the ongoing improvement of text-to-image models and provide valuable insights for future advancements in the field.
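
To illustrate the kind of probing the abstract describes, the following minimal Python sketch generates several images per ambiguous prompt with a text-to-image diffusion pipeline from the Hugging Face diffusers library. The model checkpoint, example prompts, and sampling settings are illustrative assumptions only; they are not taken from the paper or the V-LAB dataset.

    # Minimal sketch (not the authors' code): probe a text-to-image diffusion
    # model with linguistically ambiguous prompts and save the outputs so the
    # model's interpretations can later be compared with human readings.
    import torch
    from diffusers import StableDiffusionPipeline

    # Hypothetical prompts grouped by ambiguity type (not V-LAB items).
    PROMPTS = {
        "lexical": "a bat on the table",                        # animal vs. sports equipment
        "syntactic": "a man watching a dog with a telescope",   # who holds the telescope?
    }

    # Assumed model checkpoint for demonstration purposes.
    pipe = StableDiffusionPipeline.from_pretrained(
        "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
    ).to("cuda")

    for ambiguity_type, prompt in PROMPTS.items():
        # Sample several images per prompt to observe which interpretation
        # the model tends to favor.
        images = pipe(prompt, num_images_per_prompt=4).images
        for i, image in enumerate(images):
            image.save(f"{ambiguity_type}_{i}.png")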

Original language: English
Article number: 19
Journal: Computers
Volume: 14
Issue number: 1
DOIs
Publication status: Published - Jan 2025

Keywords

  • computational linguistics
  • diffusion models
  • linguistic ambiguity
  • natural language processing
  • prompt engineering
  • text-to-image models
