{"id":2134,"date":"2018-01-15T20:23:43","date_gmt":"2018-01-16T04:23:43","guid":{"rendered":"https:\/\/new.acalvio.com\/?p=2134"},"modified":"2018-01-15T20:23:43","modified_gmt":"2018-01-16T04:23:43","slug":"ransomware-command-and-control-detection-using-machine-learning","status":"publish","type":"post","link":"https:\/\/acalvio.p2staging.us\/index.php\/2018\/01\/15\/ransomware-command-and-control-detection-using-machine-learning\/","title":{"rendered":"Ransomware Command and Control Detection using Machine Learning"},"content":{"rendered":"<p>Authors: Deepak Gujraniya, Mohammad Waseem, Balamurali AR, and Satnam Singh<br \/>\nSince the first attack in 1989 [1], ransomware has steadily gained popularity among attackers. In 2017 in particular, it created havoc across industries, including government offices, public-sector departments, and hospitals. Apart from the financial strain that ransomware brings, it also disrupts everyday public life. For instance, the WannaCry attack on hospitals, police stations, and government offices hindered the daily life of ordinary citizens in numerous countries [2]. To make things worse, ransomware is now available as a service on the darknet, so even a novice attacker can use it to launch an attack. As a result, the same entity may be attacked more than once.<br \/>\nLike other malware, ransomware has a kill chain. It typically includes luring the victim via phishing or other means, loading the payload (installing the ransomware on the target host), and finally spreading and detonating the ransomware (encrypting the host&#8217;s data and demanding a ransom via a ransom note). The attack starts when a user clicks a malicious web link or opens a file attached to a phishing email, at which point the ransomware is installed on the target machine. Depending on the strain, the detonation can happen before it spreads. 
To encrypt the machine&#8217;s data, ransomware needs an encryption key. It may or may not use a Command and Control (C&amp;C) server to obtain that key. Ransomware without C&amp;C uses hardcoded or locally generated encryption keys, and uses the same keys for all infected hosts. In this case, security experts can reverse engineer the malware binaries and may find the keys. Ransomware that uses C&amp;C, however, gets its encryption keys from a C&amp;C server hosted by the attacker. CryptoLocker, WannaCry, TeslaCrypt, Cerber, and Locky are some of the ransomware families that use C&amp;C, which makes it nearly impossible for defenders to recover the encryption keys from the ransomware itself.<\/p>\n<p style=\"text-align: center;\"><img loading=\"lazy\" class=\"alignnone size-full wp-image-6484 aligncenter\" src=\"https:\/\/acalvio.p2staging.us\/wp-content\/uploads\/2018\/01\/cc-image7.png\" alt=\"\" width=\"613\" height=\"280\" srcset=\"https:\/\/acalvio.p2staging.us\/wp-content\/uploads\/2018\/01\/cc-image7.png 613w, https:\/\/acalvio.p2staging.us\/wp-content\/uploads\/2018\/01\/cc-image7-300x137.png 300w\" sizes=\"(max-width: 613px) 100vw, 613px\" \/> Figure 1: Ransomware attack using command &amp; control (C&amp;C) for encryption key management<\/p>\n<p>Ransomware uses both asymmetric and symmetric encryption techniques, e.g. RSA and AES. Attackers are becoming more sophisticated and use the two in combination: an AES key hardcoded within the payload encrypts the files on the infected machine, an RSA public-private key pair is then generated and used to encrypt the AES key, and the private RSA key is uploaded to the C&amp;C server.<br \/>\nIn early variants of ransomware, the C&amp;C server addresses were hardcoded in the malware binaries, so it was easy for defenders to find these addresses and block them. 
Once these addresses were blocked, the ransomware could no longer reach its C&amp;C server, so it could not spread the infection or encrypt files. To evade such security measures, ransomware started using Domain Generation Algorithm (DGA)-based techniques to connect to C&amp;C servers. With a DGA, attackers generate dynamic domain names and point them at their C&amp;C servers. Using DGA-based C&amp;C, attackers can easily evade perimeter-based security tools such as firewalls, IDS\/IPS, and even threat intelligence feeds.<br \/>\nBecause the ransomware contacts its C&amp;C server through a domain name, detecting and blocking that domain immediately can stop the attack from spreading to other machines. For example, thisisyourchangeqq.com and gvludcvhcrjwmgq.in are two C&amp;C domains used by the TeslaCrypt and Locky ransomware, respectively.<br \/>\nRansomware connects to the C&amp;C server using DNS queries. To establish the connection, it issues DNS resolution queries for the domains it generates, and these queries are captured in the system&#8217;s DNS logs. By analyzing the DNS logs, we can detect the domains used for C&amp;C. In machine learning (ML) terms, this can be posed as a classification problem with two classes: benign domains and malicious C&amp;C domains. Several machine learning classifiers, such as Random Forest [3], Support Vector Machine (SVM) [4], and Artificial Neural Networks, can be used. Using discriminative and informative features from the DNS logs, one can build a classification model to detect C&amp;C domains.<br \/>\nWe trained a Random Forest classifier to detect domains generated by DGAs. Features such as bigram and trigram scores are informative and help discriminate C&amp;C domains from benign domains. A bigram score indicates how likely a bigram is to occur in a normal English word [5]. This score is lower for a DGA-generated domain. 
We computed trigram_benign and trigram_malicious scores, which are the fractions of trigrams present in the benign and malicious corpora, respectively. The entropy of a domain also differs between the malicious and benign classes, so we used Shannon entropy [6] as another feature to differentiate between them.<br \/>\nExample domain: <strong>google.co.in<\/strong><br \/>\n<strong>bigrams<\/strong> [&#8216;$g&#8217;, &#8216;go&#8217;, &#8216;oo&#8217;, &#8216;og&#8217;, &#8216;gl&#8217;, &#8216;le&#8217;, &#8216;e$&#8217;, &#8216;$c&#8217;, &#8216;co&#8217;, &#8216;o$&#8217;, &#8216;$i&#8217;, &#8216;in&#8217;, &#8216;n$&#8217;]<br \/>\n<strong>trigrams<\/strong> [&#8216;$go&#8217;, &#8216;goo&#8217;, &#8216;oog&#8217;, &#8216;ogl&#8217;, &#8216;gle&#8217;, &#8216;le$&#8217;, &#8216;$co&#8217;, &#8216;co$&#8217;, &#8216;$in&#8217;, &#8216;in$&#8217;]<br \/>\nFigure 2: An example of how bigrams and trigrams are extracted from a domain<br \/>\nThe histograms below show how these features discriminate between benign and malicious domains. 
Some features are more discriminative than others; however, all of them complement each other and improve the classification.<br \/>\n<img loading=\"lazy\" class=\"alignnone wp-image-6486 size-medium\" src=\"https:\/\/acalvio.p2staging.us\/wp-content\/uploads\/2018\/01\/cc-image9-300x212.png\" alt=\"\" width=\"300\" height=\"212\" srcset=\"https:\/\/acalvio.p2staging.us\/wp-content\/uploads\/2018\/01\/cc-image9-300x212.png 300w, https:\/\/acalvio.p2staging.us\/wp-content\/uploads\/2018\/01\/cc-image9.png 400w\" sizes=\"(max-width: 300px) 100vw, 300px\" \/> <img loading=\"lazy\" class=\"alignnone wp-image-6480 size-medium\" src=\"https:\/\/acalvio.p2staging.us\/wp-content\/uploads\/2018\/01\/cc-image3-300x210.png\" alt=\"\" width=\"300\" height=\"210\" srcset=\"https:\/\/acalvio.p2staging.us\/wp-content\/uploads\/2018\/01\/cc-image3-300x210.png 300w, https:\/\/acalvio.p2staging.us\/wp-content\/uploads\/2018\/01\/cc-image3.png 401w\" sizes=\"(max-width: 300px) 100vw, 300px\" \/> <img loading=\"lazy\" class=\"alignnone wp-image-6479 size-medium\" src=\"https:\/\/acalvio.p2staging.us\/wp-content\/uploads\/2018\/01\/cc-image2-300x211.png\" alt=\"\" width=\"300\" height=\"211\" srcset=\"https:\/\/acalvio.p2staging.us\/wp-content\/uploads\/2018\/01\/cc-image2-300x211.png 300w, https:\/\/acalvio.p2staging.us\/wp-content\/uploads\/2018\/01\/cc-image2.png 401w\" sizes=\"(max-width: 300px) 100vw, 300px\" \/><img loading=\"lazy\" class=\"alignnone wp-image-6483 size-medium\" src=\"https:\/\/acalvio.p2staging.us\/wp-content\/uploads\/2018\/01\/cc-image6-300x211.png\" alt=\"\" width=\"300\" height=\"211\" srcset=\"https:\/\/acalvio.p2staging.us\/wp-content\/uploads\/2018\/01\/cc-image6-300x211.png 300w, https:\/\/acalvio.p2staging.us\/wp-content\/uploads\/2018\/01\/cc-image6.png 401w\" sizes=\"(max-width: 300px) 100vw, 300px\" \/><\/p>\n<p style=\"text-align: center;\">Figure 3: Frequency distribution plots of entropy, bigram, and trigram features<\/p>\n<p><strong><i>Example: 
Benign domains<\/i><\/strong><\/p>\n<table>\n<tbody>\n<tr>\n<td><strong>url<\/strong><\/td>\n<td><strong>bigram_score<\/strong><\/td>\n<td><strong>entropy<\/strong><\/td>\n<td><strong>trigram_benign<\/strong><\/td>\n<td><strong>trigram_malicious<\/strong><\/td>\n<td><strong>class label<\/strong><\/td>\n<\/tr>\n<tr>\n<td>google.co.in<\/td>\n<td>7.28<\/td>\n<td>0.44<\/td>\n<td>1.0<\/td>\n<td>0.0<\/td>\n<td>benign<\/td>\n<\/tr>\n<tr>\n<td>bloomberg.com<\/td>\n<td>7.94<\/td>\n<td>0.30<\/td>\n<td>1.0<\/td>\n<td>0.0<\/td>\n<td>benign<\/td>\n<\/tr>\n<tr>\n<td>conservativetribune.com<\/td>\n<td>7.53<\/td>\n<td>0.77<\/td>\n<td>1.0<\/td>\n<td>0.0<\/td>\n<td>benign<\/td>\n<\/tr>\n<tr>\n<td>howstuffworks.com<\/td>\n<td>8.21<\/td>\n<td>0.35<\/td>\n<td>1.0<\/td>\n<td>0.0<\/td>\n<td>benign<\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<p><strong><i>Example: Malicious domains<\/i><\/strong><\/p>\n<table>\n<tbody>\n<tr>\n<td><strong>url<\/strong><\/td>\n<td><strong>bigram_score<\/strong><\/td>\n<td><strong>entropy<\/strong><\/td>\n<td><strong>trigram_benign<\/strong><\/td>\n<td><strong>trigram_malicious<\/strong><\/td>\n<td><strong>class label<\/strong><\/td>\n<\/tr>\n<tr>\n<td>52uo5k3t73ypjije.zzis8p.bid<\/td>\n<td>10.02<\/td>\n<td>0.18<\/td>\n<td>0.54<\/td>\n<td>0.45<\/td>\n<td>malicious<\/td>\n<\/tr>\n<tr>\n<td>equityaccountants.nl<\/td>\n<td>7.88<\/td>\n<td>0.56<\/td>\n<td>1.00<\/td>\n<td>0.0<\/td>\n<td>malicious<\/td>\n<\/tr>\n<tr>\n<td>3qbyaoohkcqkzrz6.tordonator.li<\/td>\n<td>8.65<\/td>\n<td>0.52<\/td>\n<td>0.68<\/td>\n<td>0.32<\/td>\n<td>malicious<\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<p><img loading=\"lazy\" class=\"aligncenter wp-image-6485\" src=\"https:\/\/acalvio.p2staging.us\/wp-content\/uploads\/2018\/01\/cc-image8.png\" alt=\"\" width=\"600\" height=\"426\" srcset=\"https:\/\/acalvio.p2staging.us\/wp-content\/uploads\/2018\/01\/cc-image8.png 794w, https:\/\/acalvio.p2staging.us\/wp-content\/uploads\/2018\/01\/cc-image8-300x213.png 300w, https:\/\/acalvio.p2staging.us\/wp-content\/uploads\/2018\/01\/cc-image8-768x546.png 768w\" sizes=\"(max-width: 600px) 100vw, 600px\" \/><\/p>\n<p 
style=\"text-align: center;\">Figure 4: Precision-Recall curve for the classifier<\/p>\n<p><img loading=\"lazy\" class=\"aligncenter wp-image-6478\" src=\"https:\/\/acalvio.p2staging.us\/wp-content\/uploads\/2018\/01\/cc-image1-1024x479.png\" alt=\"\" width=\"600\" height=\"281\" srcset=\"https:\/\/acalvio.p2staging.us\/wp-content\/uploads\/2018\/01\/cc-image1-1024x479.png 1024w, https:\/\/acalvio.p2staging.us\/wp-content\/uploads\/2018\/01\/cc-image1-300x140.png 300w, https:\/\/acalvio.p2staging.us\/wp-content\/uploads\/2018\/01\/cc-image1-768x359.png 768w, https:\/\/acalvio.p2staging.us\/wp-content\/uploads\/2018\/01\/cc-image1.png 1334w\" sizes=\"(max-width: 600px) 100vw, 600px\" \/><\/p>\n<p style=\"text-align: center;\">Figure 5: Code snippet of C&amp;C Detection Classifier<\/p>\n<p><img loading=\"lazy\" class=\"aligncenter wp-image-6482\" src=\"https:\/\/acalvio.p2staging.us\/wp-content\/uploads\/2018\/01\/cc-image5-1024x246.png\" alt=\"\" width=\"600\" height=\"144\" srcset=\"https:\/\/acalvio.p2staging.us\/wp-content\/uploads\/2018\/01\/cc-image5-1024x246.png 1024w, https:\/\/acalvio.p2staging.us\/wp-content\/uploads\/2018\/01\/cc-image5-300x72.png 300w, https:\/\/acalvio.p2staging.us\/wp-content\/uploads\/2018\/01\/cc-image5-768x185.png 768w, https:\/\/acalvio.p2staging.us\/wp-content\/uploads\/2018\/01\/cc-image5.png 1338w\" sizes=\"(max-width: 600px) 100vw, 600px\" \/><img loading=\"lazy\" class=\"size-full wp-image-1801 aligncenter\" src=\"https:\/\/new.acalvio.com\/wp-content\/uploads\/2018\/03\/f5.png\" alt=\"\" width=\"658\" height=\"158\" srcset=\"https:\/\/acalvio.p2staging.us\/wp-content\/uploads\/2018\/03\/f5.png 658w, https:\/\/acalvio.p2staging.us\/wp-content\/uploads\/2018\/03\/f5-300x72.png 300w\" sizes=\"(max-width: 658px) 100vw, 658px\" \/><\/p>\n<p style=\"text-align: center;\">Figure 6: Test run on some benign domains and some C&amp;C domains<\/p>\n<p>In the above example, &#8220;google.com&#8221; and 
&#8220;howstuffworks.com&#8221; are benign domains, and the other domains are used by the Locky [7] ransomware for C&amp;C. The domain &#8220;fofsslkwvwee.de&#8221; received the maximum malicious score (1.0), whereas the other three malicious domains, which look like normal domains, received scores below 1.0. Typically, ML-based C&amp;C detection is deployed at the perimeter to monitor every DNS query, which requires big-data infrastructure to process a high volume of DNS logs. Acalvio&#8217;s approach to C&amp;C detection is different and more effective because it is event-driven, in contrast to the traditional boiling-the-ocean approach in which every DNS query must be monitored. We use deception to detect the ransomware and then leverage machine learning to detect the C&amp;C domains, so there is no need to monitor all DNS traffic all the time. Once detected, these domains can be blocked to stop the ransomware from spreading within the organization. Because we analyse domains only when our deception-based solution ShadowPlex-R [8] detects a ransomware attack, the false positive rate is very low. 
<img loading=\"lazy\" class=\"aligncenter size-full wp-image-1802\" src=\"https:\/\/new.acalvio.com\/wp-content\/uploads\/2018\/03\/f6.png\" alt=\"\" width=\"640\" height=\"389\" srcset=\"https:\/\/acalvio.p2staging.us\/wp-content\/uploads\/2018\/03\/f6.png 640w, https:\/\/acalvio.p2staging.us\/wp-content\/uploads\/2018\/03\/f6-300x182.png 300w\" sizes=\"(max-width: 640px) 100vw, 640px\" \/><br \/>\n<img loading=\"lazy\" class=\"aligncenter wp-image-6481\" src=\"https:\/\/acalvio.p2staging.us\/wp-content\/uploads\/2018\/01\/cc-image4-1024x622.png\" alt=\"\" width=\"600\" height=\"364\" srcset=\"https:\/\/acalvio.p2staging.us\/wp-content\/uploads\/2018\/01\/cc-image4-1024x622.png 1024w, https:\/\/acalvio.p2staging.us\/wp-content\/uploads\/2018\/01\/cc-image4-300x182.png 300w, https:\/\/acalvio.p2staging.us\/wp-content\/uploads\/2018\/01\/cc-image4-768x466.png 768w, https:\/\/acalvio.p2staging.us\/wp-content\/uploads\/2018\/01\/cc-image4.png 1186w\" sizes=\"(max-width: 600px) 100vw, 600px\" \/><\/p>\n<p style=\"text-align: center;\">Figure 7: Low False-Positive rate against various ransomware families<\/p>\n<div class=\"section post-body\">\n<p>We tested our approach on nearly 20 different ransomware families; the results are summarised in Figure 7. &#8220;Detected C&amp;C domains&#8221; is the number of domains detected by our solution, and &#8220;Actual C&amp;C domains&#8221; is the actual number of C&amp;C domains used by the ransomware. Our solution can achieve a true detection rate of nearly 100%, i.e. it detects nearly all the ransomware, with a false positive rate of nearly 2.5%. The results demonstrate the power of combining deception with machine learning for C&amp;C detection.<\/p>\n<h2>Conclusion:<\/h2>\n<p>Ransomware attacks are evolving at an unprecedented pace, and it is becoming impossible to detect them beforehand. 
In this blog, we explained how ransomware uses C&amp;C to obtain the keys that encrypt user data, and how one can extract features from domains and train an ML classifier to detect C&amp;C domains. Many current techniques for detecting C&amp;C monitor logs continuously and inspect every domain request, which leads to a high number of false positives and is computationally expensive. With Acalvio&#8217;s deception-based solution ShadowPlex-R, we can detect a ransomware attack in real time and use an ML-based classifier to detect the C&amp;C domains. A demonstration of a ransomware attack and the C&amp;C detection is available in our webinar [9], hosted by Acalvio and Splunk.<\/p>\n<p><strong>References:<\/strong><br \/>\n[1]: Lord, Nord (2017, July 17), A history of ransomware attacks: the biggest and worst ransomware attacks of all time, URL: <a href=\"https:\/\/digitalguardian.com\/blog\/history-ransomware-attacks-biggest-and-worst-ransomware-attacks-all-time\">https:\/\/digitalguardian.com\/blog\/history-ransomware-attacks-biggest-and-worst-ransomware-attacks-all-time<\/a><br \/>\n[2]: SANS whitepaper 2017, https:\/\/www.sans.org\/reading-room\/whitepapers\/threats\/sensitive-data-risk-2017-data-protection-survey-37950<br \/>\n[3]: Breiman, L., 2001. Random forests. <i>Machine learning<\/i>, <i>45<\/i>(1), pp.5-32.<br \/>\n[4]: Burges, C.J., 1998. A tutorial on support vector machines for pattern recognition. <i>Data mining and knowledge discovery<\/i>, <i>2<\/i>(2), pp.121-167.<br \/>\n[5]: Cheng Qi, Xiaojun Chen, Cui Xu, Jinqiao Shi, Peipeng Liu, A Bigram based Real Time DNS Tunnel Detection Approach, In Procedia Computer Science, Volume 17, 2013, Pages 852-860<br \/>\n[6]: Shannon, C.E., 1951. Prediction and entropy of printed English. <i>Bell Labs Technical Journal<\/i>, <i>30<\/i>(1), pp.50-64.<br \/>\n
[7]: Locky. <i>Wikipedia<\/i>. Retrieved November 19, 2017, from https:\/\/en.wikipedia.org\/wiki\/Locky<br \/>\n[8]: Shadowplex-r, Retrieved November 19, 2017, from https:\/\/www.acalvio.com\/shadowplex-r\/<br \/>\n[9]: Splunk webinar, Retrieved November 19, 2017, from https:\/\/www.splunk.com\/blog\/2017\/08\/18\/webinar-learn-how-to-use-deception-to-defend-against-ransomware.html<\/p>\n<\/div>\n","protected":false},"excerpt":{"rendered":"<p>Authors: Deepak Gujraniya, Mohammad Waseem, Balamurali AR, and Satnam Singh Since the first attack in 1989 [1], ransomware attacks have gained popularity. Especially in 2017, it has created havoc in every possible industry, including the government offices, public-sector departments, and hospitals. Apart from the financial strain that ransomware can bring, it also affects everyday aspects [&hellip;]<\/p>\n","protected":false},"author":2,"featured_media":4533,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":[],"categories":[3],"tags":[60,87,131],"_links":{"self":[{"href":"https:\/\/acalvio.p2staging.us\/index.php\/wp-json\/wp\/v2\/posts\/2134"}],"collection":[{"href":"https:\/\/acalvio.p2staging.us\/index.php\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/acalvio.p2staging.us\/index.php\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/acalvio.p2staging.us\/index.php\/wp-json\/wp\/v2\/users\/2"}],"replies":[{"embeddable":true,"href":"https:\/\/acalvio.p2staging.us\/index.php\/wp-json\/wp\/v2\/comments?post=2134"}],"version-history":[{"count":0,"href":"https:\/\/acalvio.p2staging.us\/index.php\/wp-json\/wp\/v2\/posts\/2134\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/acalvio.p2staging.us\/index.php\/wp-json\/wp\/v2\/media\/4533"}],"wp:attachment":[{"href":"https:\/\/acalvio.p2staging.us\/index.php\/wp-json\/wp\/v2\/media?parent=2134"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/acalvio.p2staging.us\/in
dex.php\/wp-json\/wp\/v2\/categories?post=2134"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/acalvio.p2staging.us\/index.php\/wp-json\/wp\/v2\/tags?post=2134"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}