Advancing Automation in Digital Forensic Investigations
Stockholm University, Faculty of Social Sciences, Department of Computer and Systems Sciences. ORCID iD: 0000-0002-5115-1453
2018 (English). Doctoral thesis, comprehensive summary (Other academic)
Abstract [en]

Digital Forensics is used to aid traditional preventive security mechanisms when they fail to curtail sophisticated and stealthy cybercrime events. The Digital Forensic Investigation process is largely manual in nature, or at best quasi-automated, requiring a highly skilled labour force and involving a sizeable time investment. Industry standard tools are evidence-centric, automate only a few precursory tasks (e.g. parsing and indexing) and have limited capabilities for integrating multiple evidence sources. Furthermore, these tools are always human-driven.

These challenges are exacerbated in the increasingly computerized and highly networked environment of today. Volumes of digital evidence to be collected and analyzed have increased, and so has the diversity of digital evidence sources involved in a typical case. This further handicaps digital forensics practitioners, labs and law enforcement agencies, causing delays in investigations and legal systems due to backlogs of cases. Improved efficiency of the digital investigation process is needed, in terms of increasing the speed and reducing the human effort expended. This study aims at achieving this time and effort reduction, by advancing automation within the digital forensic investigation process.

Using a Design Science research approach, artifacts are designed and developed to address these practical problems. In summary, the requirements and architecture of a system for automating digital investigations in highly networked environments are designed. The architecture initially focuses on automation of the identification and acquisition of digital evidence, while later versions focus on full automation and self-organization of devices for all phases of the digital investigation process. Part of the remote evidence acquisition capability of this system architecture is implemented as a proof of concept. The speed and reliability of capturing digital evidence from remote mobile devices over a client-server paradigm are evaluated. A method for the uniform representation and integration of multiple diverse evidence sources for enabling automated correlation, simple reasoning and querying is developed and tested. This method is aimed at automating the analysis phase of digital investigations. Machine Learning (ML)-based triage methods are developed and tested to evaluate the feasibility and performance of using such techniques to automate the identification of priority digital evidence fragments. Models from these ML methods are evaluated in identifying network protocols within DNS tunneled network traffic. A large dataset is also created for future research in ML-based triage for identifying suspicious processes for memory forensics.

From an ex ante evaluation, the designed system architecture enables individual devices to participate in the entire digital investigation process, contributing their processing power towards alleviating the burden on the human analyst. Experiments show that remote evidence acquisition of mobile devices over networks is feasible; however, a single-TCP-connection paradigm scales poorly. A proof of concept experiment demonstrates the viability of the automated integration, correlation and reasoning over multiple diverse evidence sources using semantic web technologies. Experimentation also shows that ML-based triage methods can enable prioritization of certain digital evidence sources, for acquisition or analysis, with up to 95% accuracy.

The artifacts developed in this study provide concrete ways to enhance automation in the digital forensic investigation process to increase the investigation speed and reduce the amount of costly human intervention needed.

 

Place, publisher, year, edition, pages
Stockholm: Department of Computer and Systems Sciences, Stockholm University, 2018, p. 149
Series
Report Series / Department of Computer & Systems Sciences, ISSN 1101-8526 ; 18-002
Keywords [en]
Digital Forensics, Machine Learning, Computer Forensics, Network Forensics, Predictive Modelling, Distributed Systems, Mobile Devices, Mobile Forensics, Memory Forensics, Android, Semantic Web, Hypervisors, Virtualization, Remote Acquisition, Evidence Analysis, Correlation, P2P, Bittorrent
National Category
Computer Systems Communication Systems Telecommunications Computer Sciences Computer Engineering Information Systems
Research subject
Computer and Systems Sciences
Identifiers
URN: urn:nbn:se:su:diva-161555
ISBN: 978-91-7797-521-2 (print)
ISBN: 978-91-7797-520-5 (electronic)
OAI: oai:DiVA.org:su-161555
DiVA, id: diva2:1259778
Public defence
2018-12-17, L30, NOD-huset, Borgarfjordsgatan 12, Kista, 14:00 (English)
Available from: 2018-11-22 Created: 2018-10-30 Last updated: 2018-11-16. Bibliographically approved
List of papers
1. LEIA: The Live Evidence Information Aggregator: Towards Efficient Cyber-Law Enforcement
2013 (English). In: World Congress on Internet Security (WorldCIS), IEEE Computer Society, 2013, p. 156-161. Conference paper, Published paper (Refereed)
Abstract [en]

Given the complexity and velocity of the interactions among vastly heterogeneous elements on the Internet, the colossal amounts of information generated and exchanged, the increasingly evasive nature of new forms of electronic crimes, and the relative immaturity of current Digital Forensics tools, Law Enforcement Agencies are easily outpaced and overwhelmed by the types of electronic crimes experienced today. In this paper, we describe the architecture of a comprehensive automated Digital Investigation platform termed the Live Evidence Information Aggregator (LEIA). It makes use of the strong points of hypervisor technologies, large scale distributed file systems, the resource description framework (RDF), peer-to-peer networks, and innovative collaborative mechanisms in order to introduce a level of speed, accuracy and efficiency to match the imminent age of massively distributed cybercrime in the context of the Internet of Things.

Place, publisher, year, edition, pages
IEEE Computer Society, 2013
Keywords
Digital Forensics, Cybercrime, Digital Evidence, Big Data, Hadoop, Hypervisors, P2P, Collaborative Live Investigation
National Category
Information Systems
Research subject
Computer and Systems Sciences
Identifiers
urn:nbn:se:su:diva-114705 (URN); 10.1109/WorldCIS.2013.6751038 (DOI); 978-1-908320-22-3 (ISBN)
Conference
World Congress on Internet Security (WorldCIS), London, 9-12 December, 2013
Available from: 2015-03-09 Created: 2015-03-09 Last updated: 2018-11-02. Bibliographically approved
2. On the Network Performance of Digital Evidence Acquisition of Small Scale Devices over Public Networks
2015 (English). In: The Journal of Digital Forensics, Security and Law, ISSN 1558-7215, E-ISSN 1558-7223, Vol. 10, no 3, p. 59-86. Article in journal (Refereed) Published
Abstract [en]

While cybercrime proliferates, becoming more complex and surreptitious on the Internet, the tools and techniques used in performing digital investigations are still largely lagging behind, effectively slowing down law enforcement agencies at large. Real-time remote acquisition of digital evidence over the Internet is still an elusive ideal in the combat against cybercrime. In this paper we briefly describe the architecture of a comprehensive proactive digital investigation system termed the Live Evidence Information Aggregator (LEIA). This system aims at collecting digital evidence from potentially any device in real time over the Internet. Particular focus is placed on the importance of the efficiency of the network communication in the evidence acquisition phase, in order to retrieve potentially evidentiary information remotely and with immediacy. Through a proof of concept implementation, we demonstrate the live, remote evidence capturing capabilities of such a system on small scale devices, highlighting the necessity for better throughput and availability envisioned through the use of Peer-to-Peer overlays.

Keywords
Digital Forensics, Digital Evidence, Remote acquisition, Proactive forensics, Mobile devices, P2P, Network performance, Availability
National Category
Information Systems
Research subject
Computer and Systems Sciences
Identifiers
urn:nbn:se:su:diva-122847 (URN); 000363877200004 ()
Available from: 2015-11-11 Created: 2015-11-10 Last updated: 2018-11-02. Bibliographically approved
3. Improving Distributed Forensics and Incident Response in Loosely Controlled Networked Environments
2016 (English). In: International Journal of Security and Its Applications, ISSN 1738-9976, Vol. 10, no 1, p. 385-414. Article in journal (Refereed) Published
Abstract [en]

Mobile devices and virtualized appliances in the Internet of Things can be end nodes on varying networks owned by different parties over time, while still seamlessly participating in licit or illicit activities. Digital Forensics and Incident Response (DFIR) tools today struggle to perform digital investigations in such loosely controlled networked environments as they face several challenges including: scarcity of resources, availability, trust, privacy, data volumes, velocity and variety. In this paper we analyze the state of research in DFIR in networked environments, identifying the challenges facing DFIR tools particularly in loosely controlled network environments. We present the requirements for a system to address these challenges at the various steps of the typical digital investigation methodology. From this we identify the need for support from Peer to Peer (P2P) overlays and discuss their relative merits and drawbacks in order to identify those that would best support DFIR in loosely controlled networked environments. Finally we incorporate both structured and unstructured P2P overlays in various capacities in our architecture in order to organize devices in loosely controlled networks, using context information, thus enabling efficient capture, analysis and reporting of artifacts of use in digital investigations.

Keywords
Digital Forensics, Incident Response, P2P Overlays, Open Distributed Systems, Uncontrolled Environment, Internet of Things
National Category
Computer Sciences
Research subject
Information Systems Security
Identifiers
urn:nbn:se:su:diva-128806 (URN); 10.14257/ijsia.2016.10.1.35 (DOI); 000376639500035 ()
Available from: 2016-04-04 Created: 2016-04-04 Last updated: 2018-10-30. Bibliographically approved
4. Semantic Representation and Integration of Digital Evidence
2013 (English). In: Procedia Computer Science, ISSN 1877-0509, E-ISSN 1877-0509, Vol. 22, p. 1266-1275. Article in journal (Refereed) Published
Abstract [en]

The ever-increasing complexity and sophistication of computer and network attacks challenge society's dependence on digital infrastructure. Digital investigations recover and reconstruct the digital trails of such events and may employ practices from various subfields (computer, network forensics), each with its own set of techniques and tools. Integration of evidence from heterogeneous sources of data (e.g. disk images, network packet captures, logs) is often a manual and time-consuming process relying significantly on the investigator's expertise. In this paper, we propose and develop an approach, based on the Semantic Web framework, for ontologically representing and integrating digital evidence. The presented approach enhances existing forensic analysis techniques by providing partial and eventually full automation of the investigative process.
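
The integration idea in this abstract can be illustrated with a minimal sketch, assuming evidence from each source has already been reduced to subject-predicate-object triples. It uses plain Python sets rather than a Semantic Web library, and the `ex:` identifiers and the `ex:hasIP` predicate are hypothetical placeholders, not the ontology developed in the paper.

```python
from typing import Dict, Iterable, List, Optional, Set, Tuple

Triple = Tuple[str, str, str]  # (subject, predicate, object)


def merge_sources(*sources: Iterable[Triple]) -> Set[Triple]:
    """Union the triples from several evidence sources into one graph."""
    graph: Set[Triple] = set()
    for source in sources:
        graph.update(source)
    return graph


def match(graph: Set[Triple], s: Optional[str] = None,
          p: Optional[str] = None, o: Optional[str] = None) -> List[Triple]:
    """Return the triples matching a pattern; None acts as a wildcard."""
    return [t for t in graph
            if (s is None or t[0] == s)
            and (p is None or t[1] == p)
            and (o is None or t[2] == o)]


def correlate_by_value(graph: Set[Triple], predicate: str) -> Dict[str, Set[str]]:
    """Group subjects that share the same object value for `predicate`.

    Only values seen on more than one subject are kept, since those are
    the cross-source links an analyst would want to look at first.
    """
    groups: Dict[str, Set[str]] = {}
    for subject, pred, obj in graph:
        if pred == predicate:
            groups.setdefault(obj, set()).add(subject)
    return {value: subjects for value, subjects in groups.items()
            if len(subjects) > 1}
```

For example, if a packet capture and a log file each contribute a triple with the same `ex:hasIP` value, `correlate_by_value(graph, "ex:hasIP")` would return that address mapped to both subjects. A production system would more likely use an RDF store and SPARQL queries for this step.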

Keywords
Digital evidence, Ontology, Semantic Web, Evidence Integration, Knowledge Representation
National Category
Information Systems
Research subject
Computer and Systems Sciences
Identifiers
urn:nbn:se:su:diva-97234 (URN); 10.1016/j.procs.2013.09.214 (DOI)
Conference
17th International Conference in Knowledge Based and Intelligent Information and Engineering Systems - KES 2013, Kitakyushu, Japan, September 9 - 12, 2013
Available from: 2013-12-05 Created: 2013-12-05 Last updated: 2018-11-02. Bibliographically approved
5. Information-Entropy-Based DNS Tunnel Prediction
2018 (English). In: Advances in Digital Forensics XIV: Revised Selected Papers / [ed] Gilbert Peterson, Sujeet Shenoi, Springer, 2018, p. 127-140. Conference paper, Published paper (Refereed)
Abstract [en]

DNS tunneling techniques are often used for malicious purposes. Network security mechanisms have struggled to detect DNS tunneling. Network forensic analysis has been proposed as a solution, but it is slow, invasive and tedious as network forensic analysis tools struggle to deal with undocumented and new network tunneling techniques.

This chapter presents a method for supporting forensic analysis by automating the inference of tunneled protocols. The internal packet structure of DNS tunneling techniques is analyzed and the information entropy of various network protocols and their DNS tunneled equivalents is characterized. This provides the basis for a protocol prediction method that uses entropy distribution averaging. Experiments demonstrate that the method has a prediction accuracy of 75%. The method also preserves privacy because it only computes the information entropy and does not parse the actual tunneled content.
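
As a rough illustration of the quantity this abstract relies on, the sketch below computes the Shannon entropy of a byte string and of a DNS query name. It assumes entropy is measured over byte frequencies in bits per byte; the function names are hypothetical, and the chapter's entropy distribution averaging step is not reproduced here.

```python
import math
from collections import Counter


def shannon_entropy(data: bytes) -> float:
    """Return the Shannon entropy of `data` in bits per byte (0.0 if empty)."""
    if not data:
        return 0.0
    counts = Counter(data)
    total = len(data)
    # H = -sum(p * log2(p)) over the observed byte values
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def query_name_entropy(qname: str) -> float:
    """Entropy of a DNS query name, computed over its ASCII bytes."""
    return shannon_entropy(qname.encode("ascii", errors="ignore"))
```

Because only byte frequencies are counted, the payload is never decoded or parsed, which is consistent with the privacy argument made in the abstract.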

Place, publisher, year, edition, pages
Springer, 2018
Series
IFIP Advances in Information and Communication Technology, ISSN 1868-4238, E-ISSN 1868-422X ; 532
Keywords
Network forensics, DNS tunneling, information entropy
National Category
Information Systems
Research subject
Computer and Systems Sciences
Identifiers
urn:nbn:se:su:diva-136620 (URN); 10.1007/978-3-319-99277-8_8 (DOI); 978-3-319-99276-1 (ISBN); 978-3-319-99277-8 (ISBN)
Conference
14th IFIP WG 11.9 International Conference, New Delhi, India, January 3-5, 2018
Available from: 2016-12-12 Created: 2016-12-12 Last updated: 2018-10-30. Bibliographically approved
6. Harnessing Predictive Models for Assisting Network Forensic Investigations of DNS Tunnels
2017 (English). In: Annual ADFSL Conference on Digital Forensics, Security and Law: Proceedings, 2017, article id 7. Conference paper, Published paper (Refereed)
Abstract [en]

In recent times, DNS tunneling techniques have been used for malicious purposes; however, network security mechanisms struggle to detect them. Network forensic analysis has been proven effective, but is slow and effort intensive as Network Forensics Analysis Tools struggle to deal with undocumented or new network tunneling techniques. In this paper, we present a machine learning approach, based on feature subsets of network traffic evidence, to aid forensic analysis through automating the inference of protocols carried within DNS tunneling techniques. We explore four network traffic protocols, namely, HTTP, HTTPS, FTP, and POP3. Three features are extracted from the DNS tunneled traffic: IP packet length, DNS Query Name Entropy and DNS Query Name Length. We benchmark the performance of four classification models, i.e., decision trees, support vector machines, k-nearest neighbours, and neural networks, on a data set of DNS tunneled traffic. Classification accuracy of 95% is achieved and the feature set reduces the original evidence data size by 74%. More importantly, our findings provide strong evidence that predictive modeling machine learning techniques can be used to identify network protocols within DNS tunneled traffic in real-time with high accuracy from a relatively small-sized feature-set, without necessarily infringing on privacy from the outset, nor having to collect complete DNS Tunneling sessions.
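
The three features named in this abstract can be sketched as a small feature vector, followed by a bare-bones k-nearest-neighbours vote standing in for one of the four benchmarked model families. This is an illustrative sketch only: `TunnelFeatures`, `extract_features` and `knn_predict` are hypothetical names, entropy is assumed to be computed over the characters of the query name, and the paper's actual preprocessing and model settings are not given in this record.

```python
import math
from collections import Counter
from typing import List, NamedTuple, Sequence, Tuple


class TunnelFeatures(NamedTuple):
    ip_packet_length: int
    qname_entropy: float
    qname_length: int


def _entropy(text: str) -> float:
    """Shannon entropy of `text` in bits per character (0.0 if empty)."""
    if not text:
        return 0.0
    counts = Counter(text)
    total = len(text)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def extract_features(ip_packet_length: int, qname: str) -> TunnelFeatures:
    """Build the three-feature vector for one DNS-tunneled packet."""
    return TunnelFeatures(ip_packet_length, _entropy(qname), len(qname))


def knn_predict(sample: Sequence[float],
                training: List[Tuple[Sequence[float], str]],
                k: int = 3) -> str:
    """Return the majority label among the k training rows nearest to `sample`.

    Distances are plain Euclidean distances on unscaled features.
    """
    nearest = sorted(training, key=lambda row: math.dist(sample, row[0]))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]
```

Since the features are left unscaled here, packet length would dominate the distance in practice; a real pipeline would normalise the features and use labelled training data from captured tunnel sessions.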

Keywords
Network Forensics, Machine Learning, Predictive Models, DNS Tunneling, Protocol Tunneling, Digital Investigations
National Category
Computer Sciences
Research subject
Computer and Systems Sciences
Identifiers
urn:nbn:se:su:diva-149265 (URN)
Conference
12th ADFSL Conference on Digital Forensics, Security and Law (2017), Daytona Beach, Florida, May 15 - 16, 2017
Available from: 2017-11-24 Created: 2017-11-24 Last updated: 2018-10-30. Bibliographically approved
7. Coriander: A Toolset for Generating Realistic Android Digital Evidence Datasets
2017 (English). In: Digital Forensics and Cyber Crime: Proceedings / [ed] Petr Matoušek, Martin Schmiedecker, Springer, 2017, p. 228-233. Conference paper, Published paper (Refereed)
Abstract [en]

Triage has been suggested as a means to prioritize and identify sources and artifacts of evidence that might be of most interest when faced with large amounts of digital evidence. Memory Forensics has long relied on simple string matching to triage evidence sources. In this paper, we describe the early developments in our study of Machine Learning-based triage for Memory Forensics. To start off, there are no large datasets of memory captures available. We thus develop a toolset to enable the automated creation of realistic Android process memory dumps. Using our toolset we generate a dataset of 2375 process memory string dumps from both malicious and benign Android applications, classified by VirusTotal, and sourced from the AndroZoo project. Our dataset and toolset are made available online to help promote research in this field and related areas.
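
To make the notion of a "memory string dump" concrete, the sketch below pulls runs of printable ASCII characters out of a raw byte buffer, similar in spirit to the Unix `strings` utility. How the Coriander toolset actually extracts strings is not stated in this record, so the function name, the ASCII-only assumption and the minimum length of 4 are all illustrative choices.

```python
import re
from typing import List


def extract_strings(dump: bytes, min_length: int = 4) -> List[str]:
    """Return runs of at least `min_length` printable ASCII bytes found in `dump`."""
    # 0x20-0x7e covers space through tilde, i.e. printable ASCII
    pattern = re.compile(rb"[\x20-\x7e]{%d,}" % min_length)
    return [run.decode("ascii") for run in pattern.findall(dump)]
```

The resulting list of strings is the kind of input that simple string matching, or a text-based classifier, could then triage.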

Place, publisher, year, edition, pages
Springer, 2017
Series
Lecture Notes of the Institute for Computer Sciences, Social Informatics and Telecommunications Engineering, ISSN 1867-8211 ; 216
Keywords
Android Forensics, Digital Forensics, Mobile Forensics, Memory Forensics, Digital Evidence, Datasets, Metadata, Machine Learning, Triage
National Category
Computer Sciences
Research subject
Computer and Systems Sciences
Identifiers
urn:nbn:se:su:diva-149260 (URN); 10.1007/978-3-319-73697-6_18 (DOI); 978-3-319-73696-9 (ISBN); 978-3-319-73697-6 (ISBN)
Conference
9th International Conference, ICDF2C 2017, Prague, Czech Republic, October 9-11, 2017
Available from: 2017-11-24 Created: 2017-11-24 Last updated: 2018-10-30. Bibliographically approved

Open Access in DiVA

Advancing Automation in Digital Forensic Investigations (4172 kB), 36 downloads
File information
File name: FULLTEXT01.pdf
File size: 4172 kB
Checksum (SHA-512): 604a33aaa4a0289aeca659764dc76a35050d8a28c822b8db98dcc645e513da9a24ee4daa17e01d1e9edb65ff56290b252d9c3653d13cf21306cda34dd0aa4168
Type: fulltext
Mimetype: application/pdf
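
A downloaded copy of the full text can be checked against the SHA-512 checksum listed above. The following is a minimal sketch using Python's standard `hashlib`; the function names and the chunk size are arbitrary choices, and the local file path is whatever the reader saved the PDF as.

```python
import hashlib


def sha512_of_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA-512 digest of the file at `path`, read in chunks."""
    digest = hashlib.sha512()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_checksum(path: str, expected_hex: str) -> bool:
    """Compare a file's SHA-512 digest with a published hex checksum."""
    return sha512_of_file(path) == expected_hex.strip().lower()
```

Calling `matches_checksum` with the saved file's path and the checksum string from the file information should return `True` for an unmodified download.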

Search in DiVA

By author/editor
Homem, Irvin
By organisation
Department of Computer and Systems Sciences