Abstract
Cyberbullying is a pervasive and growing concern in today's digital era. With the increasing prevalence of social media platforms such as Twitter, online abusive behavior has become a significant issue that often leads to unpleasant experiences for users. Manual detection of abnormal and bullying behavior on social media is inherently not scalable. Moreover, most existing studies on cyberbullying detection have been conducted in English, and very limited work has been done on Urdu (a widely used language in Asia). This paper presents an approach for detecting cyberbullying in Roman Urdu tweets and identifying abuser profiles on Twitter. First, we develop a text corpus of Roman Urdu tweets with user profile data. We then employ a Gated Recurrent Unit (GRU) model with word2vec word embeddings to develop a cyberbullying detection model. Furthermore, we present a temporal abusive tweet probability analysis method that analyzes the number of bullying and non-bullying tweets sent by individuals within a specific time interval. To evaluate performance, we compare the GRU-based approach with other machine learning models. The results show that the GRU model with lexical normalization gives the best results, with an accuracy of 97% and an F1-measure of 97%.
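The abstract does not spell out how the temporal abusive tweet probability analysis is computed. As a rough illustration only, the sketch below assumes it can be approximated by the fraction of each user's tweets labeled as bullying within fixed-width time windows. The function name `abusive_fraction_by_window`, the `(user, timestamp, is_bullying)` tuple layout, and the one-day default window are hypothetical choices made for this example, not details taken from the paper.

```python
from collections import defaultdict
from datetime import datetime, timedelta


def abusive_fraction_by_window(tweets, window=timedelta(days=1)):
    """Estimate, per user and time window, the share of bullying tweets.

    tweets: iterable of (user, timestamp, is_bullying) tuples, where
    is_bullying is the label produced by some classifier (assumed here).
    Returns {user: {window_start: bullying_tweets / all_tweets}}.
    """
    epoch = datetime(1970, 1, 1)
    # counts[user][window_start] = [bullying_count, total_count]
    counts = defaultdict(lambda: defaultdict(lambda: [0, 0]))
    for user, ts, is_bullying in tweets:
        # Align the timestamp to the start of its fixed-width window.
        start = ts - (ts - epoch) % window
        bucket = counts[user][start]
        bucket[0] += int(is_bullying)
        bucket[1] += 1
    return {
        user: {start: b / n for start, (b, n) in sorted(windows.items())}
        for user, windows in counts.items()
    }
```

Under this assumption, a user whose fraction stays high across many consecutive windows would be a candidate abuser profile. Any threshold for flagging such a user would have to be chosen separately and is not specified here.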
Original language | English |
---|---|
Pages (from-to) | 123339-123351 |
Number of pages | 13 |
Journal | IEEE Access |
Volume | 12 |
Publication status | Published - 16 Aug 2024 |
Keywords
- Abuser profile identification
- Blogs
- Cyberbullying
- Cyberbullying detection
- Data models
- Deep learning
- Detection algorithms
- Feature extraction
- Hate speech
- Identification of persons
- Machine learning
- Roman Urdu
- Social media
- Social networking (online)
- Support vector machines