Predictive analytics using machine learning techniques has a major and evolving role to play in cyber security. Such systems self-learn normal patterns by observing an organization’s data flows under normal business operations, and then match the data flows seen when an event occurs against those patterns.
The nearly continuous news stream of information security breaches around the world is something that modern society was not prepared for in the rush to benefit from advances in information technologies. As the recent WannaCry worm vividly demonstrated, info-security problems can now impact every aspect of modern society, from hospitals, banks, and telecoms to governments and individuals. While our lives are being enriched by IT products and services, our personal and social experiences are being challenged by the rapid adoption of new technologies, and this will only get more complex with the advent of the Internet of Things, where our refrigerators, cars, even water purifiers could all be used by attackers. The sheer scale of the problem makes it critical for big data analytics to be deployed to uncover unusual data and behavior patterns across all kinds of systems.
The battle against security breaches is fought along the four dimensions of Prevention, Preparation, Detection, and Response. Over the last decade, the security industry seems to have largely given up on Prevention, but that is a topic for another day. It is in the dimensions of Preparation and Detection that Big Data Analytics capabilities are being used to “look for the needle in the haystack”, to identify anomalous patterns and to connect the dots across diverse systems and data sets.
An enterprise’s data may be categorized into transaction and interaction data, entity data, systems operations data, reference data, business rules, and activity log data. Each type of data moves at a different frequency, may be structured or unstructured, and may be centralized, distributed, or even dispersed beyond the enterprise firewall. Combing through such massive data haystacks in real time is a huge challenge by itself. The easy buzzword response is big-data technologies, but it’s not easy to actually put such solutions into practice.
Even knowing what an anomalous pattern looks like is very difficult because attackers don’t advertise themselves or their methods! Sometimes we may know what to look for, and other times we may know what NOT to look for. It takes a savvy combination of business domain experience, cyber security smarts, and data science expertise to collaboratively apply advanced analytics techniques and find data patterns that signify potential attacks. These findings lead to actions through the organization’s plans for breach Preparation or Detection. In the case of many newer advanced attacks, there may be no discernible anomaly in any single dataset, but when data scientists and security experts look across multiple datasets, they start to connect the dots and find anomalies indicating the presence of APTs (Advanced Persistent Threats), sleeper-bots, or intelligence-gathering bots.
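To make the idea of connecting the dots across datasets concrete, here is a minimal Python sketch. It assumes each data source has already been reduced to a per-entity suspicion score between 0 and 1; the function name, the source names, and both thresholds are hypothetical choices for illustration, not part of any particular product.

```python
from collections import defaultdict


def correlate_weak_signals(sources, per_source_threshold=0.8, combined_threshold=1.5):
    """Flag entities that look unremarkable in every single source but
    suspicious once their scores are added up across sources.

    `sources` maps a source name to a dict of {entity: score in [0, 1]}.
    Both thresholds are illustrative assumptions.
    """
    combined = defaultdict(float)
    strongest = defaultdict(float)
    for scores in sources.values():
        for entity, score in scores.items():
            combined[entity] += score
            strongest[entity] = max(strongest[entity], score)

    flagged = [
        entity
        for entity, total in combined.items()
        # No single source would have raised an alert on its own...
        if strongest[entity] < per_source_threshold
        # ...but the combined evidence crosses the line.
        and total >= combined_threshold
    ]
    return sorted(flagged)


if __name__ == "__main__":
    sources = {
        "vpn_logs": {"host-a": 0.6, "host-b": 0.1},
        "dns_logs": {"host-a": 0.5, "host-b": 0.2},
        "file_access": {"host-a": 0.6, "host-b": 0.1},
    }
    print(correlate_weak_signals(sources))
```

In this toy data, host-a never exceeds 0.6 in any one source, so a per-dataset check would stay silent, yet the summed evidence is enough to flag it. A real deployment would need far more careful scoring and entity resolution than simple addition.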
Predictive analytics using machine learning techniques has a major and evolving role to play in this field. Such systems will self-learn normal patterns by observing an organization’s data flows under normal business operations. For example, the system may build pattern models for the normal data flow patterns in the first few days after a new employee joins the marketing department, during busy festival sales periods, or in the last 3 months before a supplier relationship is terminated. The system will then continually match data flows triggered by a specific business event against its stored patterns to find anomalies and even predict likely breaches, as well as to evolve the pattern models themselves.
Such a system might have noted the unusual behavior of the WannaCry payload while it was gathering and encrypting files. This unusual behavior would then have triggered early alerts or even isolated the activity, thus preventing serious damage and proliferation. Of course, this would only be possible if the organization had a Big Data Analytics framework in place.
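The learn-a-baseline-then-match approach described above can be sketched in a few lines of Python. This is a deliberately simple illustration, assuming a data flow can be summarized as one number per observation (for example, megabytes transferred per day) and using a z-score as a stand-in for whatever machine learning model a real system would use. The class name, the business event label, and the threshold are hypothetical.

```python
from collections import defaultdict
from statistics import mean, stdev


class BaselineModel:
    """Learns a per-business-event baseline and scores new observations."""

    def __init__(self, threshold=3.0):
        # Number of standard deviations treated as anomalous (an assumption).
        self.threshold = threshold
        self._history = defaultdict(list)

    def observe(self, event, value):
        """Record a value seen under normal operations for this event."""
        self._history[event].append(value)

    def score(self, event, value):
        """Return the z-score of `value`, or None if there is no baseline yet."""
        history = self._history.get(event, [])
        if len(history) < 2:
            return None
        mu, sigma = mean(history), stdev(history)
        if sigma == 0:
            return 0.0 if value == mu else float("inf")
        return abs(value - mu) / sigma

    def is_anomalous(self, event, value):
        z = self.score(event, value)
        return z is not None and z > self.threshold


if __name__ == "__main__":
    model = BaselineModel()
    # Daily data volume (MB) seen for earlier new hires in marketing.
    for mb in [120, 135, 110, 128, 140, 125]:
        model.observe("new_marketing_hire", mb)

    print(model.is_anomalous("new_marketing_hire", 130))  # False
    print(model.is_anomalous("new_marketing_hire", 900))  # True
```

Feeding non-anomalous observations back through `observe` is one simple way to let the pattern model evolve over time. A production system would model many features at once and account for seasonality rather than relying on a single mean and standard deviation.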
Even before such advanced approaches become mainstream, Big Data Analytics has made an impact in some areas of cyber security. The creation of contextualized tagging for DLP (Data Loss Prevention) on massive enterprise-wide networks and data assets is one example. The prioritization and visualization of security exceptions is another. In this scenario, Big Data Analytics is used to extract and prioritize insights from the large streams of alerts and logs that would otherwise swamp a security team, and to present the insights visually in the form of easily actionable dashboards.
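As a rough illustration of alert prioritization, the Python sketch below ranks alerts by multiplying the severity reported by the detecting tool with the business criticality of the affected asset. The `Alert` fields, the scoring formula, and the criticality weights are all assumptions made for the example; a real security team would tune these to its own environment.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Alert:
    source: str    # tool that raised the alert, e.g. "ids" or "dlp"
    asset: str     # affected system or data asset
    severity: int  # 1 (low) to 5 (critical), as reported by the tool


def prioritize(alerts, asset_criticality, top_n=3):
    """Return the `top_n` alerts ranked by severity x asset criticality.

    Assets missing from `asset_criticality` get a default weight of 1.
    """
    def priority(alert):
        return alert.severity * asset_criticality.get(alert.asset, 1)

    return sorted(alerts, key=priority, reverse=True)[:top_n]


if __name__ == "__main__":
    criticality = {"payroll-db": 5, "laptop-17": 2, "kiosk-2": 1}
    alerts = [
        Alert("ids", "laptop-17", 4),
        Alert("dlp", "payroll-db", 3),
        Alert("av", "kiosk-2", 5),
        Alert("fw", "payroll-db", 2),
    ]
    for alert in prioritize(alerts, criticality, top_n=2):
        print(alert.source, alert.asset, alert.severity)
```

Here the two payroll database alerts outrank the higher-severity kiosk alert because the asset matters more to the business, which is the kind of context a dashboard would need to surface.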
The potential attackers, their motives, and the nature of their attacks can often be very different for a bank, an automated car company, a pharma or consumer goods major, or a government department. Until more generic solutions become available for breach Prevention, there is little possibility of a one-size-fits-all, off-the-shelf solution for Preparation or Detection.
In fact, the only truly effective way to find “the needle in the haystack” of data is to combine business domain knowledge with big data, analytics, and cyber security expertise. Today, these advanced multi-disciplinary skillsets are simply not available in one place, neither in enterprises nor in traditional information consulting, technology, and security firms. As the industry grapples with critical cyber security challenges, the winners will be those organizations that successfully create and deploy such inter-disciplinary teams to leverage the full potential of Big Data Analytics.
Authored by Ardaman Kohli, Principal, Axtria and Ashish Sharma, Principal, Axtria