TY - GEN
T1 - Content-Agnostic Detection of Phishing Domains using Certificate Transparency and Passive DNS
AU - Alsabah, Mashael
AU - Nabeel, Mohamed
AU - Boshmaf, Yazan
AU - Choo, Euijin
N1 - Publisher Copyright: © 2022 ACM.
PY - 2022/10/26
Y1 - 2022/10/26
N2 - Existing phishing detection techniques mainly rely on blacklists or content-based analysis, which are not only evadable, but also exhibit considerable detection delays as they are reactive in nature. We observe through our deep dive analysis that artifacts of phishing are manifested in various sources of intelligence related to a domain even before its contents are online. In particular, we study various novel patterns and characteristics computed from viable sources of data including Certificate Transparency Logs, and passive DNS records. To compare benign and phishing domains, we construct thoroughly-verified realistic benign and phishing datasets. Our analysis shows clear differences between benign and phishing domains that can pave the way for content-agnostic approaches to predict phishing domains even before the contents of these webpages are up and running. To demonstrate the usefulness of our analysis, we train a classifier with distinctive features, and we show that we can (1) perform content-agnostic predictions with a very low FPR of 0.3%, and high precision (98%) and recall (90%), and (2) predict phishing domains days before they are discovered by state-of-the-art content-based tools such as VirusTotal.
KW - certificate transparency
KW - machine learning
KW - passive DNS
KW - phishing domains detection
UR - http://www.scopus.com/inward/record.url?scp=85142543072&partnerID=8YFLogxK
U2 - 10.1145/3545948.3545958
DO - 10.1145/3545948.3545958
M3 - Conference contribution
AN - SCOPUS:85142543072
T3 - ACM International Conference Proceeding Series
SP - 446
EP - 459
BT - Proceedings of 25th International Symposium on Research in Attacks, Intrusions and Defenses, RAID 2022
PB - Association for Computing Machinery
T2 - 25th International Symposium on Research in Attacks, Intrusions and Defenses, RAID 2022
Y2 - 26 October 2022 through 28 October 2022
ER -