An Early Investigation into the Utility of Multimodal Large Language Models in Medical Imaging

Sulaiman Khan*, Md Rafiul Biswas*, Alina Murad, Hazrat Ali, Zubair Shah*

*Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

Abstract

Recent developments in multimodal large language models (MLLMs) have spurred significant interest in their potential applications across various medical imaging domains. On the one hand, there is a temptation to use these generative models to synthesize realistic-looking medical image data; on the other hand, the ability to identify synthetic images within a pool of data is equally important. In this study, we explore the potential of the Gemini (gemini-1.0-pro-vision-latest) and GPT-4V (gpt-4-vision-preview) models for medical image analysis using two modalities of medical image data. Using both synthetic and real imaging data, Gemini and GPT-4V are first used to classify images as real or synthetic, and then to interpret and analyze the input images. Experimental results demonstrate that both Gemini and GPT-4V could perform some interpretation of the input images. In this specific experiment, Gemini performed slightly better than GPT-4V on the classification task, whereas the responses from GPT-4V were mostly generic in nature. The early investigation presented in this work provides insights into the potential of MLLMs to assist with the classification and interpretation of retinal fundoscopy and lung X-ray images. We also identify key limitations of this early investigation of MLLMs for specialized tasks in medical image analysis.

Original language: English
Title of host publication: 2024 IEEE International Conference on Information Reuse and Integration for Data Science, IRI 2024
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 234-239
Number of pages: 6
ISBN (Electronic): 9798350351187
ISBN (Print): 979-8-3503-5119-4
Publication status: Published - 2024
Event: 25th IEEE International Conference on Information Reuse and Integration for Data Science, IRI 2024 - San Jose, United States
Duration: 7 Aug 2024 → 9 Aug 2024

Publication series

Name: IEEE International Conference on Information Reuse and Integration

Conference

Conference: 25th IEEE International Conference on Information Reuse and Integration for Data Science, IRI 2024
Country/Territory: United States
City: San Jose
Period: 7/08/24 → 9/08/24

Keywords

  • ChatGPT
  • Gemini AI
  • LLM
  • Lung
  • Multimodal data
  • Retina
