TY - GEN
T1 - ImEW
T2 - 1st Workshop on Large Generative Models Meet Multimodal Applications, LGM3A 2023
AU - Mohiuddin, Tasnim
AU - Zhang, Tianyi
AU - Nie, Maowen
AU - Huang, Jing
AU - Chen, Qianqian
AU - Shi, Wei
N1 - Publisher Copyright:
© 2023 ACM.
PY - 2023/11/2
Y1 - 2023/11/2
N2 - The ability to edit images in a realistic and visually appealing manner is a fundamental requirement in various computer vision applications. In this paper, we present ImEW, a unified framework designed for solving image editing tasks. ImEW utilizes off-the-shelf foundation models to address four essential editing tasks: object removal, object translation, object replacement, and generative fill beyond the image frame. These tasks are accomplished by leveraging the capabilities of state-of-the-art foundation models, namely the Segment Anything Model, Grounding DINO, LaMa, and Stable Diffusion. These models have undergone extensive training on large-scale datasets and have exhibited exceptional performance in understanding image context, object manipulation, and texture synthesis. Through extensive experimentation, we demonstrate the effectiveness and versatility of ImEW in accomplishing image editing tasks across a wide range of real-world scenarios. The proposed framework opens up new possibilities for realistic and visually appealing image editing and enables diverse applications requiring sophisticated image modifications. Additionally, we discuss the limitations and outline potential directions for future research in the field of image editing using off-the-shelf foundation models, enabling continued advancements in this domain.
AB - The ability to edit images in a realistic and visually appealing manner is a fundamental requirement in various computer vision applications. In this paper, we present ImEW, a unified framework designed for solving image editing tasks. ImEW utilizes off-the-shelf foundation models to address four essential editing tasks: object removal, object translation, object replacement, and generative fill beyond the image frame. These tasks are accomplished by leveraging the capabilities of state-of-the-art foundation models, namely the Segment Anything Model, Grounding DINO, LaMa, and Stable Diffusion. These models have undergone extensive training on large-scale datasets and have exhibited exceptional performance in understanding image context, object manipulation, and texture synthesis. Through extensive experimentation, we demonstrate the effectiveness and versatility of ImEW in accomplishing image editing tasks across a wide range of real-world scenarios. The proposed framework opens up new possibilities for realistic and visually appealing image editing and enables diverse applications requiring sophisticated image modifications. Additionally, we discuss the limitations and outline potential directions for future research in the field of image editing using off-the-shelf foundation models, enabling continued advancements in this domain.
KW - diffusion models
KW - generative models
KW - image editing
KW - segment anything model
UR - http://www.scopus.com/inward/record.url?scp=85180417488&partnerID=8YFLogxK
U2 - 10.1145/3607827.3616840
DO - 10.1145/3607827.3616840
M3 - Conference contribution
AN - SCOPUS:85180417488
T3 - LGM3A 2023 - Proceedings of the 1st Workshop on Large Generative Models Meet Multimodal Applications, Co-located with: MM 2023
SP - 34
EP - 44
BT - LGM3A 2023 - Proceedings of the 1st Workshop on Large Generative Models Meet Multimodal Applications, Co-located with: MM 2023
PB - Association for Computing Machinery, Inc
Y2 - 2 November 2023
ER -