Pushing Boundaries: Exploring Zero Shot Object Classification with Large Multimodal Models

Ashhadul Islam, Md Rafiul Biswas, Wajdi Zaghouani, Samir Brahim Belhaouari, Zubair Shah

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution > peer-review

2 Citations (Scopus)

Abstract

The synergy of language and vision models has given rise to Large Language and Vision Assistant models (LLVAs), designed to engage users in rich conversational experiences intertwined with image-based queries. These comprehensive multimodal models seamlessly integrate vision encoders with Large Language Models (LLMs), expanding their applications in general-purpose language and visual comprehension. The advent of Large Multimodal Models (LMMs) heralds a new era in Artificial Intelligence (AI) assistance, extending the horizons of AI utilization. This paper takes a unique perspective on LMMs, exploring their efficacy in performing image classification tasks using tailored prompts designed for specific datasets. We also investigate the LLVAs' zero-shot learning capabilities. Our study includes a benchmarking analysis across four diverse datasets: MNIST, Cats Vs. Dogs, Hymenoptera (Ants Vs. Bees), and an unconventional dataset comprising Pox Vs. Non-Pox skin images. The results of our experiments demonstrate the model's remarkable performance, achieving classification accuracies of 85%, 100%, 77%, and 79% for the respective datasets without any fine-tuning. To bolster our analysis, we assess the model's performance post fine-tuning for specific tasks. In one instance, fine-tuning is conducted over a dataset comprising images of faces of children with and without autism. Prior to fine-tuning, the model demonstrated a test accuracy of 55%, which significantly improved to 83% post fine-tuning. These results, coupled with our prior findings, underscore the transformative potential of LLVAs and their versatile applications in real-world scenarios.
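As a rough illustration of the evaluation the abstract describes (a dataset-specific prompt, a free-text answer from the model, and an accuracy score), the following is a minimal sketch only. It is not the authors' code: `build_prompt`, `parse_label`, `zero_shot_accuracy`, the prompt wording, and the `query_model` callable are all hypothetical names introduced here, and `query_model` stands in for whatever LMM interface is actually used.

```python
import re
from typing import Callable, Iterable, Optional, Sequence, Tuple


def build_prompt(labels: Sequence[str]) -> str:
    """Build a dataset-specific prompt that lists the allowed class names.

    The wording is an assumption for illustration, not the paper's prompt.
    """
    options = ", ".join(labels)
    return f"Classify the image. Answer with exactly one of: {options}."


def parse_label(response: str, labels: Sequence[str]) -> Optional[str]:
    """Return the one label named in the response, or None if zero or several match.

    Labels are matched as whole tokens, so "pox" is not found inside "non-pox".
    """
    text = response.lower()
    hits = [
        label
        for label in labels
        if re.search(rf"(?<![\w-]){re.escape(label.lower())}(?![\w-])", text)
    ]
    return hits[0] if len(hits) == 1 else None


def zero_shot_accuracy(
    samples: Iterable[Tuple[str, str]],
    labels: Sequence[str],
    query_model: Callable[[str, str], str],
) -> float:
    """Fraction of (image, true_label) samples the model labels correctly.

    `query_model(image, prompt)` is a hypothetical stand-in that returns the
    model's text answer; ambiguous or unparseable answers count as errors.
    """
    prompt = build_prompt(labels)
    correct = 0
    total = 0
    for image, truth in samples:
        predicted = parse_label(query_model(image, prompt), labels)
        correct += int(predicted == truth)
        total += 1
    return correct / total if total else 0.0
```

Counting ambiguous answers as errors is a conservative design choice made for this sketch; a real evaluation might instead re-prompt the model or report such cases separately.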

Original language: English
Title of host publication: Proceedings - 2023 10th International Conference on Social Networks Analysis, Management and Security, SNAMS 2023
Publisher: Institute of Electrical and Electronics Engineers Inc.
ISBN (Electronic): 9798350318906
Publication status: Published - 2023
Event: 10th International Conference on Social Networks Analysis, Management and Security, SNAMS 2023 - Abu Dhabi, United Arab Emirates
Duration: 21 Nov 2023 to 24 Nov 2023

Publication series

Name: Proceedings - 2023 10th International Conference on Social Networks Analysis, Management and Security, SNAMS 2023

Conference

Conference: 10th International Conference on Social Networks Analysis, Management and Security, SNAMS 2023
Country/Territory: United Arab Emirates
City: Abu Dhabi
Period: 21/11/23 to 24/11/23

Keywords

  • Classification
  • Large Language Models
  • Large Multimodal Models
  • Prompt Engineering
