TY - JOUR
T1 - A Multi-Faceted Approach to Trending Topic Attack Detection Using Semantic Similarity and Large-Scale Datasets
AU - Kraidia, Insaf
AU - Ghenai, Afifa
AU - Belhaouari, Samir Brahim
N1 - Publisher Copyright: © 2013 IEEE.
PY - 2025
Y1 - 2025
N2 - Twitter's widespread popularity has made it a prime target for malicious actors exploiting trending hashtags to disseminate harmful content. This study marks the first systematic exploration of semantic consistency in tweets to detect trending topic attacks. Unlike previous approaches, we emphasize the semantic aspect of tweets, leveraging advanced techniques such as semantic similarity estimation using WordNet and contextual understanding through Sentence-Transformers. To support this methodology, we curated large-scale, high-quality datasets comprising 7,000 Arabic and 28,000 English tweets, applying tailored preprocessing steps to ensure efficiency and accuracy. A novel data augmentation technique further enriched the quality and diversity of these datasets. We evaluated our approach using a comprehensive framework that assessed textual, image, and overall similarity. Five machine learning models - Random Forest, Decision Tree, K-Neighbors, Gradient Boosting, and XGBoost - were tested, with results benchmarked against nine baseline methods across different linguistic datasets and learning scenarios. Our approach demonstrated superior performance, achieving F1-scores of 96% for English and 97% for Arabic, with accuracy improvements ranging from 2% to 14% for English and 5% to 28% for Arabic. These results establish a new benchmark for detecting trending topic attacks across languages, highlighting the robustness and effectiveness of our method in combating malicious activities on social platforms.
KW - Trending topic attacks
KW - detection
KW - hashtag
KW - semantic similarity
KW - Twitter
UR - http://www.scopus.com/inward/record.url?scp=85216975068&partnerID=8YFLogxK
U2 - 10.1109/ACCESS.2025.3535996
DO - 10.1109/ACCESS.2025.3535996
M3 - Article
AN - SCOPUS:85216975068
SN - 2169-3536
VL - 13
SP - 21005
EP - 21028
JO - IEEE Access
JF - IEEE Access
ER -