Project Details
Abstract
A novel deep neural framework is proposed to enhance interpretability, address ethical and safety considerations, and leverage the capabilities of pre-trained foundation models and large language models (LLMs) for the integration of Responsible AI (RAI) schemes at the user end. The model, named @TuneLLM, functions as a sandbox middleware between users and LLMs. It provides a better understanding of how instructions influence classification goals through a set of standard evaluation benchmarks grounded in human knowledge, to ensure that model outcomes adhere to human values: fairness disparities (demographic parity, equalized odds, equal opportunity), local feature values for interpretability, and psycholinguistic benchmarks for sub-group bias and toxicity, personality traits, and gender differences, including lexical/semantic evaluations. @TuneLLM utilizes the knowledge representation of LLMs to extract “concept operative words” (cop-words), defined linguistically as a set of words for an individual concept (e.g., as in Roget’s Thesaurus), to help interpret the basis of LLMs’ collective knowledge. By analyzing these components, we can infer their impact on an LLM’s knowledge representation for the model response, i.e., classification tasks.

While the emergence of LLMs has driven the adoption of AI, there is limited clarity about the extent to which tools developed by major technology companies align with local and global ethical and social values. This research gap has raised critical concerns, e.g., unchecked plagiarism, the spread of misinformation, and violations of public privacy. Few research initiatives in Qatar and the Muslim world address the impact of machine learning (ML) on society and the need for adaptation.

This project addresses the pressing need to monitor and evaluate ethical and safety considerations in AI by focusing on value-aligned LLM design. It aims to mitigate potential risks and bridge gaps, providing robust solutions for challenges in LLMs. The proposal aligns with the National AI Strategy for Qatar, which emphasizes ethics as a basic pillar of Qatar’s future as an “AI+X” nation, as proposed by the Qatar Center for Artificial Intelligence (QCAI).

The proposed framework has the following goals:

- Develop a set of standard evaluation benchmarks to quantify ethical and safety concerns by measuring the influence of stereotyped, discordant, hateful, and adversarial concepts (e.g., data poisoning, prompt injection, and paraphrase attacks) on LLMs’ classification decisions.
- Monitor the influence of concepts within the @TuneLLM network to explore the potential for modifying the alignment of an LLM’s decisions, within a given constraint threshold, solely through enforcement or revision of the set of alignment instructions, without altering the LLM’s parameters.
- Expand the capacity of LLMs by integrating the proposed sandbox to create value-aligned AI-based social and business services.
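To make the group-fairness benchmarks named above concrete, the sketch below computes the demographic parity, equal opportunity, and equalized odds gaps for a binary classifier. It is a minimal Python illustration assuming binary labels and a binary sensitive attribute; the function and variable names are hypothetical and are not part of @TuneLLM.

```python
# Minimal sketch of the group-fairness gaps named in the abstract,
# assuming binary predictions and a binary sensitive attribute.
# Names here are illustrative, not part of the @TuneLLM framework.
import numpy as np

def rate(mask, values):
    """Mean of `values` restricted to rows where `mask` is True."""
    return values[mask].mean()

def fairness_gaps(y_true, y_pred, group):
    """Absolute between-group gaps for three standard fairness metrics."""
    y_true, y_pred, group = map(np.asarray, (y_true, y_pred, group))
    g0, g1 = group == 0, group == 1

    # Demographic parity: P(Y_hat = 1 | A = 0) vs. P(Y_hat = 1 | A = 1)
    dp_gap = abs(rate(g0, y_pred) - rate(g1, y_pred))

    # Equal opportunity: true-positive rates, conditioned on Y = 1
    tpr0 = rate(g0 & (y_true == 1), y_pred)
    tpr1 = rate(g1 & (y_true == 1), y_pred)
    eo_gap = abs(tpr0 - tpr1)

    # Equalized odds: worst gap across TPR (Y = 1) and FPR (Y = 0)
    fpr0 = rate(g0 & (y_true == 0), y_pred)
    fpr1 = rate(g1 & (y_true == 0), y_pred)
    eodds_gap = max(eo_gap, abs(fpr0 - fpr1))

    return {"demographic_parity": dp_gap,
            "equal_opportunity": eo_gap,
            "equalized_odds": eodds_gap}

# Toy usage: gaps of 0 would mean the classifier treats both groups alike.
y_true = [1, 0, 1, 1, 0, 1, 0, 0]
y_pred = [1, 0, 1, 0, 0, 1, 1, 0]
group  = [0, 0, 0, 0, 1, 1, 1, 1]
print(fairness_gaps(y_true, y_pred, group))
```

In a middleware setting such as the one described, gaps like these could be recomputed after each revision of the alignment instructions to check whether the LLM's classification decisions have moved toward or away from the constraint threshold.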
Submitting Institute Name
Hamad Bin Khalifa University (HBKU)
| Sponsor's Award Number | ARG01-0525-230348 |
| --- | --- |
| Proposal ID | EX-QNRF-ARG-40 |
| Status | Active |
| Effective start/end date | 1/04/24 → 1/04/26 |
Collaborative partners
- Hamad Bin Khalifa University (lead)
- Qatar University
- Fordham University
Primary Theme
- None
Primary Subtheme
- None
Secondary Theme
- None
Secondary Subtheme
- None
Keywords
- Hydrogen, Energy Transition, CO2, Machine learning, Reservoir Management and Simulation