Cybercrime is among the most prevalent problems in cyberspace, and it costs enterprises and firms a fortune. Cybersecurity issues have grown considerably over the past few years. Any data in cyberspace is vulnerable to theft and leaks by hackers today.
Image source: September 2019 Cyber Attacks Statistics
Cybercrimes can be categorized into hacking, child pornography, cyberstalking, DDoS, virus dissemination, software piracy, IRC crimes, bots, credit card fraud, phishing, and more. Data mining and analytics can help detect threats and stop cybercrimes. Looking at the motivations behind these cyber attacks, cybercrime tops the list at 84.3%, and malware is the most common attack type at 47.3%.
Image source: September 2019 Cyber Attacks Statistics
Several data mining and machine learning techniques are used in cybersecurity today. Disciplines such as cyber forensics can also benefit from machine learning. Clustering techniques, for example, can find patterns in log files during a forensic investigation. By bringing deep learning cognitive computing into cyber forensics, we can achieve better cybersecurity.
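To make the clustering idea concrete, here is a minimal sketch of k-means grouping log-derived records. Everything in it is an assumption for illustration: the two features (failed logins and outbound megabytes per hour), the sample values, and the choice of k-means itself, which is only one of many clustering techniques.

```python
from math import dist

def k_means(points, k, iterations=20):
    """Plain k-means. Returns (centroids, labels).

    Initial centroids are simply the first k points, which keeps the
    sketch deterministic; real tools usually pick them more carefully.
    """
    centroids = [tuple(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        labels = [min(range(k), key=lambda c: dist(p, centroids[c])) for p in points]
        # Update step: move each centroid to the mean of its members.
        new_centroids = []
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                new_centroids.append(tuple(sum(axis) / len(members) for axis in zip(*members)))
            else:
                new_centroids.append(centroids[c])  # keep an empty cluster where it was
        if new_centroids == centroids:
            break  # converged
        centroids = new_centroids
    return centroids, labels

# Hypothetical per-session features pulled from log files:
# (failed logins per hour, megabytes sent out per hour)
sessions = [
    (1, 2.0), (40, 95.0), (0, 1.5), (2, 3.0), (38, 110.0), (1, 2.5), (42, 90.0),
]
centroids, labels = k_means(sessions, k=2)
print(labels)  # [0, 1, 0, 0, 1, 0, 1]
```

The sessions with many failed logins and heavy outbound traffic land in their own cluster, which is the kind of pattern an investigator might then look at more closely.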
The World of Cyber Forensics:
Cyber forensics is a sub-domain of cybersecurity that deals with the thorough investigation of a system using several components and software tools. It extracts evidence of cybercrimes, which is then presented in a court of law as part of the criminal procedure for the cybercrime in question. The evidence is extracted through a phase-wise investigation:
- Expressing Evidence
- Analyzing Evidence
- Abstracting Evidence
- Fixing Evidence
- Discovering Evidence
Cyber forensics has to analyze large amounts of data to find evidence of cybercrime. Manual processes cannot handle this volume without introducing errors, which is why deep learning techniques can help investigate the system and find, analyze, and identify data about the attack.
Deep Learning Paradigm:
The deep learning paradigm is vast and has a lot of potential in cyber forensics and cybersecurity. The computational capabilities of deep learning techniques have been delivering great results. A subfield of machine learning within the artificial intelligence paradigm, deep learning uses architectures such as Convolutional Neural Networks (CNN), autoencoders, and Restricted Boltzmann Machines to deliver superior performance when learning from data.
Apart from being a popular research field, deep learning offers fast computing and processing capabilities for large amounts of data, which is why it is useful to bring deep learning cognitive computing techniques into cyber forensics. Many enterprises around the world are looking to capitalize on deep learning capabilities. Some firms hire Android and iOS developers to design their applications around deep learning algorithms. Deep neural networks (DNNs) can be used to uncover visual patterns through robust learning and to analyze huge volumes of data.
Applied to cyber forensics, deep neural networks can help investigators identify potential digital evidence of cybercrime.
Deep Learning Cyber Forensics (DLCF) Framework:
This framework uses the power of cognitive computing to improve the forensic capabilities of an investigation. The DLCF framework brings the capabilities of deep learning to cyber forensics by introducing robust learning techniques based on cognitive computing and processing engines.
High-level DLCF Framework:
There are five layers to this framework:
- Initialization Process.
- Potential Digital Evidence (PDE) Data Sources Identification.
- Deep Learning Enabled Cyber Forensic Investigation Engine.
- Forensic Reporting and Presentation.
- Decision Making and Case Closure.
1. Initialization Process: This layer is the first response when an attack takes place. It covers planning and preparing the system for investigation of the incident, so it is a post-incident mechanism. The nature of the activities in this layer allows machine learning techniques to be used for planning and scheduling the first responders’ tasks.
2. Potential Digital Evidence (PDE) Data Sources Identification: Whenever a cybercrime occurs, several types of PDE can be captured. Capturing a PDE from an unreliable source can be difficult, and the result may not be sound enough to present in court. PDEs can be found in a myriad of sources, such as social media, internet search engines, e-commerce platforms, online cinemas, video footage, and smart sensors. For this layer, the machine learning technique of clustering can be used to group the data and find patterns in the clusters of PDEs.
3. Deep Learning Enabled Cyber Forensic Investigation Engine: This layer is concerned with the investigative part of DLCF and includes functions such as evidence acquisition, analysis, and preservation. Evidence is acquired from several sources, which must be reliable, and proper preservation of that evidence is essential if it is to be presented in court.
4. Forensic Reporting and Presentation: After the investigative process is complete, an evidence report is prepared for presentation to the respective stakeholders. This layer uses deep learning classification algorithms, as classification helps draw the conclusions of the report. The report contains the following points:
- A detailed analysis of all the PDE captured
- Proof and justification of the source of each captured item of evidence
- A detailed description of each captured item of evidence and how it was preserved
- Links and relationships that exist between the sources and the evidence captured
- Detailed descriptions of the attacker's intentions toward the targeted victims
- Explanations of the effects of the attack on the targeted victims
- Any other information relevant to the investigation at hand
5. Decision Making and Case Closure: This is the last layer of the framework, in which law enforcement, juries, and courts reach a decision based on the evidence reports. This process cannot be automated because it depends on human judgment.
Full-stack DLCF Framework:
This framework has four major phases:
1. Evidence Acquisition:
With the growing volume of data and number of sources, collection and acquisition have become difficult. Different methods are used to capture potential evidence, depending on the source from which it is captured. Deep learning algorithms can reduce errors during evidence acquisition by digging deep into the data source and finding specific artifacts that match predefined criteria.
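As a rough illustration of "finding specific artifacts on predefined criteria", the sketch below scans raw text for a few artifact types. It is a plain rule-based stand-in using regular expressions, not a deep learning model, and the criteria, the function name `extract_artifacts`, and the sample log lines are all assumptions made up for this example.

```python
import re

# Hypothetical predefined criteria: artifact name -> pattern.
# The patterns are deliberately loose (the IPv4 one does not range-check octets).
CRITERIA = {
    "ipv4": re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b"),
    "email": re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b"),
    "md5": re.compile(r"\b[a-fA-F0-9]{32}\b"),
}

def extract_artifacts(text):
    """Return {artifact name: sorted unique matches} for every criterion that matched."""
    found = {}
    for name, pattern in CRITERIA.items():
        matches = sorted(set(pattern.findall(text)))
        if matches:
            found[name] = matches
    return found

# Made-up log excerpt standing in for an acquired data source.
sample = (
    "2019-09-14 10:02:11 login failed for admin@example.com from 203.0.113.7\n"
    "2019-09-14 10:02:15 dropped file hash d41d8cd98f00b204e9800998ecf8427e\n"
)
artifacts = extract_artifacts(sample)
for name, values in artifacts.items():
    print(name, values)
```

A learned model would replace the fixed patterns with criteria inferred from labeled examples, but the shape of the task stays the same: raw data goes in, and candidate artifacts come out.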
2. Evidence Preservation:
This phase deals with the preservation of evidence, which is the backbone of any cybercrime investigation. Proper preservation and storage of the evidence are important if investigators are to present it in court. For this reason, an algorithm can be employed to establish preservation protocols that keep the evidence free of tampering or alteration.
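One common way to show that evidence has not been altered is to record a cryptographic hash when it is acquired and recompute it later. The sketch below assumes SHA-256 and uses made-up evidence bytes; the function names are hypothetical and this is not a protocol prescribed by the framework.

```python
import hashlib

def fingerprint(evidence: bytes) -> str:
    """Return the SHA-256 hex digest of an evidence item."""
    return hashlib.sha256(evidence).hexdigest()

def is_unaltered(evidence: bytes, recorded_digest: str) -> bool:
    """True if the evidence still matches the digest recorded at acquisition."""
    return fingerprint(evidence) == recorded_digest

# Made-up evidence item; in practice this would be a disk image or log file.
original = b"2019-09-14 10:02:11 login failed for admin from 203.0.113.7\n"
recorded = fingerprint(original)  # stored in the case record at acquisition time

# Simulate tampering by changing the source address.
tampered = original.replace(b"203.0.113.7", b"198.51.100.9")

intact_ok = is_unaltered(original, recorded)
tampered_ok = is_unaltered(tampered, recorded)
print(intact_ok, tampered_ok)  # True False
```

Any change to the evidence, however small, produces a different digest, so a matching digest supports the claim that the item is the same one that was originally captured.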
3. Evidence Analysis:
This phase includes functions such as analyzing the evidence for anomalies, looking for patterns, and forming a hypothesis about what happened, what the characteristics of the incident are, and who should be held responsible. This is a very complex process, and deep learning algorithms can help reduce the complexity through cognitive computing and techniques such as classification, prediction, and k-nearest neighbors.
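Of the techniques named above, k-nearest neighbors is the easiest to show in a few lines. The sketch below labels a new event by majority vote among its closest labeled examples. The features, labels, and sample values are assumptions for illustration only.

```python
from collections import Counter
from math import dist

def knn_predict(training, query, k=3):
    """Label `query` by majority vote among its k nearest training points."""
    nearest = sorted(training, key=lambda item: dist(item[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

# Hypothetical labeled events: (failed logins per hour, MB sent out per hour) -> label
training = [
    ((1, 2.0), "benign"), ((0, 1.5), "benign"), ((2, 3.0), "benign"),
    ((40, 95.0), "suspicious"), ((38, 110.0), "suspicious"), ((42, 90.0), "suspicious"),
]

prediction_a = knn_predict(training, (3, 4.0))
prediction_b = knn_predict(training, (39, 100.0))
print(prediction_a, prediction_b)  # benign suspicious
```

The features here are left unscaled to keep the sketch short. With real data, features measured in very different units would normally be normalized first so that no single feature dominates the distance.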
4. Evidence Interpretation:
Interpretation of the potential digital evidence is crucial to the whole process of cyber forensics. Algorithms such as classification and clustering, among others, help investigators interpret PDEs better and more efficiently.
With more innovations and emerging technologies, the risk of cyberattacks and cybercrimes is increasing, and so are the ways to tackle these incidents. Technologies like blockchain are powering a democratized form of data sharing, and as promising as that looks, it carries risks that need to be recognized and addressed with innovations like deep learning and machine learning.