Research Output
Approaches to the classification of high entropy file fragments.
  In this paper we propose novel approaches to the problem of classifying high entropy file fragments. We achieve 97% correct classification for encrypted fragments and 78% for compressed fragments. Although classification of file fragments is central to the science of Digital Forensics, high entropy types have been regarded as a problem. Roussev and Garfinkel [1] argue that existing methods will not work on high entropy fragments because they have no discernible patterns to exploit. We propose two methods that do not rely on such patterns. In the first, the NIST statistical test suite is used to detect randomness in 4KB fragments. The test results were analysed using Support Vector Machines, k-Nearest-Neighbour analysis and Artificial Neural Networks (ANN), and we compare the performance of each of these analysis methods. Optimum results were obtained using an ANN for analysis, giving 94% and 74% correct classification rates for encrypted and compressed fragments respectively. In the second, we use the compressibility of a fragment as a measure of its randomness. Correct classification was 76% and 70% for encrypted and compressed fragments respectively. Although it gave poorer results for encrypted fragments, we believe that this method has more potential for future work. We have used subsets of the publicly available GovDocs1 Million File Corpus so that any future research may make valid comparisons with the results obtained here.
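
  As a rough illustration of the two kinds of randomness measure the abstract describes, the sketch below computes one compressibility feature and one statistical-test feature for a 4KB fragment. It is not the authors' implementation: the choice of zlib as the compressor, the use of only the frequency (monobit) test from the NIST suite, and the function names compression_ratio, monobit_p_value and fragment_features are all assumptions made for this example. In the paper's setting, features of this kind would be passed to a classifier such as an ANN.

```python
import math
import zlib

FRAGMENT_SIZE = 4096  # 4KB fragments, as stated in the abstract


def compression_ratio(fragment: bytes) -> float:
    """Compressed size divided by original size (assumed compressor: zlib, level 9).

    Values near or above 1.0 mean the fragment is essentially incompressible.
    """
    return len(zlib.compress(fragment, 9)) / len(fragment)


def monobit_p_value(fragment: bytes) -> float:
    """p-value of the frequency (monobit) test from NIST SP 800-22.

    Small p-values indicate an imbalance between 0 and 1 bits,
    i.e. evidence against randomness.
    """
    n = 8 * len(fragment)                              # number of bits
    ones = sum(bin(byte).count("1") for byte in fragment)
    s_obs = abs(2 * ones - n) / math.sqrt(n)           # |#ones - #zeros| / sqrt(n)
    return math.erfc(s_obs / math.sqrt(2))


def fragment_features(fragment: bytes) -> tuple:
    """Hypothetical feature vector for a downstream classifier."""
    return (compression_ratio(fragment), monobit_p_value(fragment))
```

  Calling fragment_features(data[:FRAGMENT_SIZE]) on a fragment of an encrypted file would be expected to give a compression ratio close to 1, whereas a fragment of zero bytes gives a ratio near 0 and a monobit p-value of effectively 0.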

  • Date:

    03 October 2013

  • Library of Congress:

    QA75 Electronic computers. Computer science

  • Dewey Decimal Classification:

    005.8 Data security


Penrose, P., Macfarlane, R., & Buchanan, W. J. (2013). Approaches to the classification of high entropy file fragments. Digital Investigation, 10(4), 372-384.



Digital forensics; File fragments; Encrypted files; File forensics; Encryption detection
