By Jinendra Khobare, Solution Architect, Sensfrx, Secure Layer7
Phishing has become a prevalent threat in India and globally. Cybercriminals target computers and networks, with activities ranging from direct system attacks to technology-aided crimes. They attempt to steal sensitive information such as financial details, passwords, and personal data through fake websites, emails, texts, or messages. Rapid digitalisation of services and a growing number of internet users have made India a prime target for phishing attacks. This growing crime takes different forms, such as email scams, fake websites, and misleading ads. To prevent the financial loss and data breaches that phishing causes, it is important to adopt advanced detection and prevention measures.
According to a fraud survey, phishing constitutes 0.4% of identity crime reclassifications, corresponding to 57,800 victims.
Advanced techniques in phishing detection
Researchers are exploring advanced methods to improve phishing detection, including combinations of weak classifiers, ensemble-based classifiers, and machine learning algorithms.
The effectiveness of anti-phishing tools is tested through usability studies, while large-scale online learning is used to identify suspicious URLs. Automatic detection methods, DNS-poisoning-based phishing detection, and combinations of multiple classifiers are all part of the evolving phishing defense strategies.
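To make the idea of combining weak classifiers concrete, here is a minimal sketch in Python. The four rules, their thresholds, and the function names are hypothetical, chosen only for illustration; they are not drawn from any particular study or product. Each rule is a weak signal on its own, and a URL is flagged only when at least half of them agree.

```python
from urllib.parse import urlparse

# Hypothetical weak rules: each returns True when a URL looks suspicious.
def has_ip_host(url):
    host = urlparse(url).hostname or ""
    parts = host.split(".")
    return len(parts) == 4 and all(p.isdigit() for p in parts)

def has_at_symbol(url):
    return "@" in url

def is_long(url, limit=75):  # the limit is an assumed threshold
    return len(url) > limit

def has_many_subdomains(url, limit=3):  # the limit is an assumed threshold
    host = urlparse(url).hostname or ""
    return host.count(".") > limit

WEAK_CLASSIFIERS = [has_ip_host, has_at_symbol, is_long, has_many_subdomains]

def majority_vote(url, classifiers=WEAK_CLASSIFIERS):
    """Flag the URL when at least half of the weak rules vote 'suspicious'."""
    votes = sum(1 for clf in classifiers if clf(url))
    return votes * 2 >= len(classifiers)
```

In practice the individual voters would more likely be trained models than hand-written rules, and the votes could be weighted by each model's measured accuracy.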
Problem background
Phishing websites are a growing cybersecurity threat, risking online financial services and data security. They take advantage of weaknesses in web servers by hacking into existing ones or setting up new ones. Chinese phishers often create new website domains, while American phishers prefer using compromised sites. Researchers are trying to make phishing detection more accurate using various methods like Linear Regression, K-nearest neighbor (KNN), Naïve Bayes, Support Vector Machine (SVM), and Artificial Neural Network (ANN). KNN and ANN have shown themselves to be quite accurate. However, there are still challenges, especially with small datasets and the need for detecting phishing in real-time. Current methods include filtering URLs and whitelisting, with combinations of classifiers proving to be very accurate.
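As an illustration of one of the methods named above, the following is a minimal K-nearest neighbor sketch in Python. The two features (URL length and number of dots in the host) and all training values are made up for the example; a real system would use many more features, scale them, and train on a much larger dataset.

```python
import math
from collections import Counter

def knn_predict(train, query, k=3):
    """train: list of (feature_vector, label) pairs; query: a feature vector.

    Returns the most common label among the k training points closest
    to the query, measured by Euclidean distance.
    """
    nearest = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    labels = Counter(label for _, label in nearest)
    return labels.most_common(1)[0][0]

# Hypothetical training data: (url_length, dots_in_host) -> label
TRAIN = [
    ((20, 1), "legitimate"),
    ((25, 1), "legitimate"),
    ((22, 2), "legitimate"),
    ((90, 5), "phishing"),
    ((110, 6), "phishing"),
    ((95, 4), "phishing"),
]
```

Because KNN compares a new URL with stored examples, its accuracy depends heavily on how representative those examples are, which is one reason small datasets remain a challenge.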
Problem statement
Phishing detection methods have a high false alarm rate and poor detection accuracy, particularly when new phishing strategies are used. Furthermore, because registering a new domain has become easier, the most widely used technique, the blacklist-based method, is ineffective at reacting to phishing attempts: no comprehensive blacklist can guarantee a flawless, up-to-date database. Some solutions have employed page content inspection to compensate for stale lists and to reduce false negatives, but the accuracy of the various content inspection algorithms varies. Ensemble methods, which integrate the error-detection rates and accuracy of diverse algorithms, emerge as a promising alternative. This study focuses on four key research areas:
A. How should a raw dataset be processed for phishing detection?
B. How can we enhance the algorithms for phishing website detection?
C. What strategies can lower the false-negative rate in phishing website detection algorithms?
D. Which combinations of classifiers yield the highest phishing detection rates?
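The false alarm rate, false-negative rate, and accuracy discussed in this section can all be computed from the same four counts. The sketch below assumes labels encoded as 1 for phishing and 0 for legitimate; the function name and the encoding are illustrative choices, not part of any specific tool.

```python
def detection_metrics(y_true, y_pred):
    """y_true / y_pred: equal-length sequences of 1 (phishing) and 0 (legitimate)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # phishing caught
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # legitimate passed
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false alarms
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # phishing missed
    total = tp + tn + fp + fn
    return {
        "accuracy": (tp + tn) / total if total else 0.0,
        "false_negative_rate": fn / (fn + tp) if (fn + tp) else 0.0,
        "false_positive_rate": fp / (fp + tn) if (fp + tn) else 0.0,
    }
```

Reporting the false-negative rate separately matters because a detector can score high on accuracy while still missing a large share of phishing sites when legitimate sites dominate the dataset.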
Categories of phishing attacks
1. Spear phishing
Standard phishing campaigns typically involve unfamiliar websites, making them easier to identify. However, spear-phishing campaigns are more sophisticated, with emails appearing to be from familiar websites or tailored to the recipient’s interests, making them harder to spot as scams.
• Spear phishing is the preferred method of attack for 65% of attackers.
• Approximately 71% of all targeted attacks are conducted through spear phishing.
• In 2012, nearly 90% of cyber-attacks were executed through spear phishing.
2. Extension and credential phishing
In extension and credential phishing, common file extensions are frequently exploited in schemes that steal users’ sign-in data. In this type of attack, users are sometimes tricked into downloading malicious files by completing fake CAPTCHAs. In 2021, 52% of companies had their credentials compromised, leading to unauthorised access to confidential and private information.
3. Existing anti-phishing approaches
According to a review published by the Anti-Phishing Working Group (APWG), there were at least 67,677 phishing attacks in the last six months of 2010. Various anti-phishing approaches have been designed; they fall into non-content-based, content-based, visual similarity-based, and character-based methods.
4. Non-content-based approaches
Non-content-based approaches are categorized into URL and host information-based classification, blacklisting, and whitelisting methods. URL-based schemes classify URLs based on lexical and host features, achieving success rates between 95% and 99%. Blacklisting uses reports from users or companies to detect phishing websites, while whitelisting identifies known good sites. Whitelisting approaches use server-side validation to add additional authentication metrics to client browsers. However, URL blacklisting fails to identify most phishing incidents, especially those targeting specific users (spear-phishing). Examples of whitelisting tools include dynamic security skins, TrustBar, and SRD.
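The non-content-based checks described here can be sketched in a few lines of Python. The list contents, feature names, and function names below are hypothetical and serve only to show the mechanics: a lookup against known-good and known-bad hosts, plus a handful of lexical features that a URL classifier could consume.

```python
from urllib.parse import urlparse

# Hypothetical lists; real deployments rely on continuously updated feeds.
WHITELIST = {"example.com"}
BLACKLIST = {"login-example.test"}

def lexical_features(url):
    """Extract simple lexical features from a URL (illustrative selection)."""
    host = urlparse(url).hostname or ""
    return {
        "url_length": len(url),
        "host_dots": host.count("."),
        "has_at": "@" in url,
        "has_hyphen_in_host": "-" in host,
        "digit_count": sum(ch.isdigit() for ch in url),
    }

def list_verdict(url):
    """Return 'allow', 'block', or 'unknown' based on list membership."""
    host = (urlparse(url).hostname or "").lower()
    if host in WHITELIST:
        return "allow"
    if host in BLACKLIST:
        return "block"
    return "unknown"
```

A newly registered phishing domain lands in the "unknown" branch, which is exactly the gap that feature-based classification is meant to cover.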
5. Content-based approaches
Content-based approaches detect phishing attacks by investigating site contents. Some researchers explore fingerprinting and fuzzy logic-based approaches that use hashes of websites to identify phishing sites, while some anti-phishing filters examine various page features, including URL, page rank, WHOIS information, and content.
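A simple form of content fingerprinting can be sketched as follows. This is an assumption-laden illustration rather than any published algorithm: it strips tags with a basic regular expression, normalises case and whitespace, and compares a SHA-256 hash of the remaining text against a hypothetical set of known phishing fingerprints.

```python
import hashlib
import re

def fingerprint(html):
    """Hash the visible text of a page after removing tags and normalising it."""
    text = re.sub(r"<[^>]+>", " ", html)   # crude tag removal, not a full HTML parser
    text = " ".join(text.lower().split())  # normalise case and whitespace
    return hashlib.sha256(text.encode("utf-8")).hexdigest()

# Hypothetical store of fingerprints taken from previously reported phishing pages.
KNOWN_PHISHING_FINGERPRINTS = {
    fingerprint("<h1>Verify your account</h1><p>Enter password</p>"),
}

def matches_known_phish(html):
    return fingerprint(html) in KNOWN_PHISHING_FINGERPRINTS
```

An exact hash only matches pages whose normalised text is identical, so a single changed word defeats it; this brittleness is one motivation for the fuzzy matching approaches mentioned above.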
How the phishing detector in SensFRX works: A summary
The phishing detector in SensFRX serves as an advanced barrier against cyber threats, specifically designed for users with limited technical expertise. It operates in real-time, examining URLs during user interactions and meticulously analysing their structures and content for potential signs of phishing. The detector’s primary advantage is its integration with a machine learning model that continuously learns from a diverse dataset of phishing URLs.
Importantly, the integration of this phishing detection mechanism into SensFrx is designed to be user-friendly. The system seamlessly integrates into existing security frameworks, making it accessible to users without specialised cybersecurity knowledge. The user interface is intuitive, offering a straightforward experience while providing a powerful defense against phishing attempts.
Conclusion
In conclusion, phishing is a complex online threat with diverse tactics and significant implications for user security and data integrity. On the internet, malware-hosting websites are more prevalent than phishing websites. Consequently, a cybersecurity strategy that concentrates on phishing mitigation but leaves out mitigations for higher-likelihood threats may overlook drive-by download attacks and malware distribution websites. Bulgaria, Ukraine, and Indonesia are among the places that have previously had higher-than-usual numbers of phishing sites.