{"id":315863,"date":"2024-03-12T09:55:14","date_gmt":"2024-03-12T16:55:14","guid":{"rendered":"https:\/\/www.paloaltonetworks.com\/blog\/?p=315863"},"modified":"2024-03-12T09:57:17","modified_gmt":"2024-03-12T16:57:17","slug":"challenges-for-ai-in-cybersecurity","status":"publish","type":"post","link":"https:\/\/www2.paloaltonetworks.com\/blog\/2024\/03\/challenges-for-ai-in-cybersecurity\/","title":{"rendered":"5 Unique Challenges for AI in Cybersecurity"},"content":{"rendered":"<p>AI tends to be understood as one coherent field of study and application, where similar solutions apply to all use cases. In reality, applying AI in real-world environments with high precision requires specialization in the specific field of study, and each use case has unique challenges. Applied AI in cybersecurity has many such challenges, and we will look at the few we consider most important.<\/p>\n<h3><a id=\"post-315863-_tfihsg8aei7k\"><\/a> One \u2014 Lack of Labeled Data<\/h3>\n<p>Unlike many other fields, data and labels are scarce in the cybersecurity space and usually require highly skilled labor to generate. A random set of logs in most cybersecurity logging systems will most likely contain zero labels. Nobody labeled a user downloading a document as malicious or benign; nobody recorded whether a login was legitimate. This is unique to cybersecurity. In many other fields of applied AI, labels are abundant and allow for techniques that leverage them.<\/p>\n<p>Because of the lack of labels, most detection approaches use unsupervised learning, such as clustering or anomaly detection, which doesn\u2019t require any labels. But that has considerable downsides.<\/p>\n<h3><a id=\"post-315863-_5re0n2924zle\"><\/a> Two \u2014 Anomalous Is Not Malicious<\/h3>\n<p>Following up on the last point, many approaches use anomaly detection and clustering to detect suspicious activities. 
While these techniques have some merit, they have the unfortunate side effect of flagging many benign activities.<\/p>\n<p>Any mature network environment contains many assets and activities that are anomalous by design, such as vulnerability scanners, domain controllers and service accounts. These assets create considerable noise for anomaly detection systems, as well as alert fatigue for the SOC analysts reviewing the alerts those systems generate. Attackers, by contrast, usually stay below the detection threshold and go unnoticed, because the level of anomalous activity they need to achieve their goals is often considerably lower than what these assets produce.<\/p>\n<figure id=\"attachment_315878\" aria-describedby=\"caption-attachment-315878\" style=\"width: 600px\" class=\"wp-caption aligncenter\"><div style=\"max-width:100%\" data-width=\"600\"><span class=\"ar-custom\" style=\"padding-bottom:61.83%;\"><img loading=\"lazy\" decoding=\"async\"  class=\"wp-image-315878 lozad\"  data-src=\"https:\/\/www.paloaltonetworks.com\/blog\/wp-content\/uploads\/2024\/03\/word-image-315863-1.png\" alt=\"Chart of number of devices versus ports accessed for a typical network.\" width=\"600\" height=\"371\" \/><\/span><\/div><figcaption id=\"caption-attachment-315878\" class=\"wp-caption-text\">Visualization of simplistic anomaly detection algorithms to detect port scans.<\/figcaption><\/figure>\n<p>On the other hand, supervised learning systems can remediate this issue and filter out anomalous-by-design activities and assets, even when they use unsupervised techniques as part of the model. But they require labels, and we\u2019ve established that those are hard to find.<\/p>\n<h3><a id=\"post-315863-_uixy5luv5rdw\"><\/a> Three \u2014 Domain Adaptation and Concept Drift Are Abundant<\/h3>\n<p>Domain adaptation and concept drift are key issues in data science. 
Models are usually trained on a subset of data, often in a simulation of the real world. When the real-world data shifts over time and the model loses touch with it, leading to poor precision and recall, this is called \u201cconcept drift.\u201d Alternatively, when a model doesn\u2019t perform consistently across different environments, this is a \u201cdomain adaptation\u201d problem.<\/p>\n<figure id=\"attachment_315891\" aria-describedby=\"caption-attachment-315891\" style=\"width: 736px\" class=\"wp-caption aligncenter\"><div style=\"max-width:100%\" data-width=\"736\"><span class=\"ar-custom\" style=\"padding-bottom:49.86%;\"><img loading=\"lazy\" decoding=\"async\"  class=\"wp-image-315891 lozad\"  data-src=\"https:\/\/www.paloaltonetworks.com\/blog\/wp-content\/uploads\/2024\/03\/word-image-315863-2.png\" alt=\"Original data versus concept drift. \" width=\"736\" height=\"367\" \/><\/span><\/div><figcaption id=\"caption-attachment-315891\" class=\"wp-caption-text\">Visual representation of the concept drift.<\/figcaption><\/figure>\n<p>In the cybersecurity space, the world is always changing as both attackers and defenders try to stay ahead of one another, leading to considerable concept drift. Reviewing the MITRE definition of <a href=\"https:\/\/attack.mitre.org\/techniques\/T1055\/\" rel=\"nofollow,noopener\" >process injection<\/a> shows that the meaning of the term has changed considerably in the last couple of years, with new subtechniques added along the way. That will probably change again as attackers evolve. Models trained to detect such activity require retraining, or they become obsolete.<\/p>\n<p>Additionally, models trained in one environment don\u2019t necessarily generalize well to others. Due to the large set of configurations in real-world environments, models trained for cybersecurity issues tend to have considerable domain adaptation issues. 
Imagine a model trained in a lab environment: that model has never seen examples of the myriad configurations possible for a specific application, let alone how an application\u2019s behavior might change because of other installed applications.<\/p>\n<h3><a id=\"post-315863-_ve9u38nxveix\"><\/a> Four \u2014 Domain Expertise Is Critical and Hard to Find<\/h3>\n<p>Unlike many other domains, validating models requires unique cybersecurity expertise. Classifying whether a traffic light is green or red doesn\u2019t need a specialist, whereas classifying whether a file is malicious requires a malware analysis expert. Building AI models for cybersecurity requires trained experts who can validate the results and label cases to assess key performance indicators (KPIs). Because those experts are scarce and supervised learning is the golden path for cybersecurity AI, this creates another key challenge to doing AI correctly in this space.<\/p>\n<h3><a id=\"post-315863-_kqri806q71r0\"><\/a> Five \u2014 Explainability Is Key for Successful Incident Response<\/h3>\n<p>Even a model with high precision and recall is not a good model if its output isn\u2019t clear. Incident response requires a clear understanding of what actually happened to properly respond to the threat at hand. Models are just tools that help detect an attack, but without an explanation of what happened, their detections don\u2019t translate into actual security value for analysts. This creates challenges for unsupervised learning, whose model behavior is harder to explain. 
It also sets a high bar for any supervised model, which must provide a proper explanation of what happened, why it\u2019s important, and how the activity was detected.<\/p>\n<figure id=\"attachment_315904\" aria-describedby=\"caption-attachment-315904\" style=\"width: 549px\" class=\"wp-caption aligncenter\"><div style=\"max-width:100%\" data-width=\"549\"><span class=\"ar-custom\" style=\"padding-bottom:84.88%;\"><img loading=\"lazy\" decoding=\"async\"  class=\"wp-image-315904 lozad\"  data-src=\"https:\/\/www.paloaltonetworks.com\/blog\/wp-content\/uploads\/2024\/03\/word-image-315863-3.png\" alt=\"SmartScore explains why a score was set, based on the following insights. \" width=\"549\" height=\"466\" \/><\/span><\/div><figcaption id=\"caption-attachment-315904\" class=\"wp-caption-text\">Explainability in Cortex SmartScore helps an analyst understand why a priority score was given.<\/figcaption><\/figure>\n<h2><a id=\"post-315863-_m4ew3vpr82l\"><\/a> Cortex \u2014 Cybersecurity AI Applied at Scale<\/h2>\n<p>Cortex addresses unlabeled data with a <a href=\"https:\/\/patents.google.com\/patent\/US11468358B2\" rel=\"nofollow,noopener\" >patented, semi-supervised learning technique<\/a> and multiple other techniques that leverage the scale of data the Cortex platform collects. Our entire stack of prevention, detection and prioritization systems, including Local Analysis, <a href=\"https:\/\/docs-cortex.paloaltonetworks.com\/r\/Cortex-XDR\/Cortex-XDR-Pro-Administrator-Guide\/Analytics-Concepts\">Cortex Analytics<\/a> and <a href=\"https:\/\/www.paloaltonetworks.com\/blog\/security-operations\/beating-alert-fatigue-with-cortex-xdr-smartscore-technology\/\">SmartScore<\/a>, leverages supervised learning that aims to detect malicious data and ignore anomalous-by-design data that is benign. 
Furthermore, we have invested considerably in explainability and transparency with <a href=\"https:\/\/docs-cortex.paloaltonetworks.com\/r\/Cortex-XDR-Analytics-Alert-Reference-by-data-source\">documentation<\/a> and <a href=\"https:\/\/www.paloaltonetworks.com\/blog\/security-operations\/unlocking-the-black-box-transparency-for-ml-based-incident-risk-scoring\/#:~:text=SmartScore%20Explainability%20in%20a%20Nutshell,handle%20it%20quickly%20and%20efficiently.\">explainability models<\/a> where needed.<\/p>\n<h3><a id=\"post-315863-_5vavumnr74ar\"><\/a> Key Takeaways<\/h3>\n<p><strong>Specialization Is Pivotal:<\/strong> Understand that applying AI in cybersecurity requires specialization in the specific field and use case. Each use case has unique challenges, and a one-size-fits-all approach doesn't work. Tailor your AI solutions to the specific cybersecurity challenges you face.<\/p>\n<p><strong>Lack of Labeled Data:<\/strong> Unlike many other fields, cybersecurity often lacks labeled data, making supervised learning challenging. Embrace unsupervised learning techniques, like clustering and anomaly detection, but be aware that they can generate false positives, contributing to alert fatigue.<\/p>\n<p><strong>Domain Adaptation and Concept Drift:<\/strong> Recognize that the cybersecurity landscape is evolving, leading to concept drift and domain adaptation issues. Models trained on outdated or limited data may become obsolete. Regularly retrain models and consider the dynamic nature of the threat landscape.<\/p>\n<p><strong>Domain Expertise Is Essential:<\/strong> Building AI models for cybersecurity requires domain expertise. Validate models with cybersecurity experts who can assess key performance indicators. Scarcity of such experts can be a challenge, but their input is crucial for effective AI implementation.<\/p>\n<p><strong>Explainability Matters<\/strong>: In incident response, explainability is crucial. 
Models must not only detect threats but also provide clear explanations of what happened, why it's important, and how they detected the activity. Invest in AI solutions that prioritize explainability for successful incident response.<\/p>\n<p><b>Like What You Read? Stay Up-to-Date by <\/b><a href=\"https:\/\/www.paloaltonetworks.com\/blog\/security-operations\/subscribe\/\"><b>Subscribing<\/b><\/a><b> to our SecOps Blogs<\/b><\/p>\n<p><b>Learn More About AI\u2019s Impact on Cybersecurity<\/b><\/p>\n<p><a href=\"https:\/\/symphony.paloaltonetworks.com\/?utm_source=content-corp-blog&amp;utm_medium=web&amp;utm_campaign=symphony24&amp;utm_content=\">Register for Symphony 2024<\/a><span style=\"font-weight: 400;\">, April 17-18, to explore the latest advancements in AI-driven security, where machine learning algorithms predict, detect and respond to threats faster and more effectively than ever.<\/span><\/p>\n","protected":false},"excerpt":{"rendered":"<p>There are challenges for AI in cybersecurity in real-world environments with high precision, requiring specialization in the specific field of study. 
<\/p>\n","protected":false},"author":723,"featured_media":315925,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"_jetpack_memberships_contains_paid_content":false,"footnotes":""},"categories":[6717],"tags":[9727,6738,8773],"coauthors":[8258],"class_list":["post-315863","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-products-and-services","tag-ai-in-cybersecurity","tag-cortex","tag-smartscore","sec_ops_category-must-read-articles"],"jetpack_featured_media_url":"https:\/\/www2.paloaltonetworks.com\/blog\/wp-content\/uploads\/2024\/03\/13_cortex_AI-blog_5-challenges_blog_4_3.jpg","jetpack_sharing_enabled":true,"_links":{"self":[{"href":"https:\/\/www2.paloaltonetworks.com\/blog\/wp-json\/wp\/v2\/posts\/315863","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www2.paloaltonetworks.com\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www2.paloaltonetworks.com\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www2.paloaltonetworks.com\/blog\/wp-json\/wp\/v2\/users\/723"}],"replies":[{"embeddable":true,"href":"https:\/\/www2.paloaltonetworks.com\/blog\/wp-json\/wp\/v2\/comments?post=315863"}],"version-history":[{"count":8,"href":"https:\/\/www2.paloaltonetworks.com\/blog\/wp-json\/wp\/v2\/posts\/315863\/revisions"}],"predecessor-version":[{"id":315924,"href":"https:\/\/www2.paloaltonetworks.com\/blog\/wp-json\/wp\/v2\/posts\/315863\/revisions\/315924"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/www2.paloaltonetworks.com\/blog\/wp-json\/wp\/v2\/media\/315925"}],"wp:attachment":[{"href":"https:\/\/www2.paloaltonetworks.com\/blog\/wp-json\/wp\/v2\/media?parent=315863"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www2.paloaltonetworks.com\/blog\/wp-json\/wp\/v2\/categories?post=315863"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www2.paloaltonetworks.com\
/blog\/wp-json\/wp\/v2\/tags?post=315863"},{"taxonomy":"author","embeddable":true,"href":"https:\/\/www2.paloaltonetworks.com\/blog\/wp-json\/wp\/v2\/coauthors?post=315863"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}