Efficient File Fragments Classification using Depthwise Separable Convolutions

Abstract

Classifying file fragments is an important task in digital forensics. Researchers have used several approaches to classify file fragments without relying on metadata. Recently, deep learning algorithms, including convolutional neural networks and feed-forward neural networks, have been used to build classification models for this task. This paper proposes a model based on depthwise separable convolutional neural networks for the efficient classification of file fragments. In our evaluation, the proposed model is faster and more accurate than state-of-the-art models on 75 file fragment types. In particular, our model achieves an accuracy of 78.45% on the FFT-75 dataset with about 100K parameters and 167M FLOPs, which is 24x faster and 4-5x smaller than the state-of-the-art classifier in the literature.

Technologies:

Inception Depthwise Separable Conv Block

The model consists of multiple Inception Depthwise Separable blocks followed by a 1x1 convolution for classification.
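To illustrate why depthwise separable convolutions keep a model small, the sketch below counts the weights of a standard 1-D convolution and of its depthwise separable counterpart (a per-channel depthwise convolution followed by a 1x1 pointwise convolution). The layer sizes used here (32 input channels, 64 output channels, kernel size 11) are hypothetical values chosen for illustration and are not taken from the model.

```python
def standard_conv1d_params(c_in: int, c_out: int, kernel: int, bias: bool = True) -> int:
    """Weights of a standard 1-D convolution: each output channel
    has one kernel per input channel."""
    return kernel * c_in * c_out + (c_out if bias else 0)


def separable_conv1d_params(c_in: int, c_out: int, kernel: int, bias: bool = True) -> int:
    """Weights of a depthwise separable 1-D convolution."""
    # Depthwise step: one kernel per input channel, no channel mixing.
    depthwise = kernel * c_in + (c_in if bias else 0)
    # Pointwise step: a 1x1 convolution that mixes channels.
    pointwise = c_in * c_out + (c_out if bias else 0)
    return depthwise + pointwise


if __name__ == "__main__":
    # Hypothetical layer sizes, for illustration only.
    c_in, c_out, kernel = 32, 64, 11
    std = standard_conv1d_params(c_in, c_out, kernel)
    sep = separable_conv1d_params(c_in, c_out, kernel)
    print(f"standard:  {std} weights")
    print(f"separable: {sep} weights")
    print(f"reduction: {std / sep:.2f}x")
```

For these assumed sizes the separable layer needs roughly 9x fewer weights; the saving grows with the kernel size and the number of output channels.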


Dataset

We used the FFT-75 dataset, which is composed of 75 file types organized into 6 different scenarios, with variants for 512-byte and 4096-byte blocks.
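As a rough illustration of what a fixed-size block is, the sketch below cuts a byte string into 512-byte or 4096-byte fragments. The helper name `split_into_blocks` is hypothetical, and dropping the trailing partial block is an assumption; this is not the procedure used to build FFT-75.

```python
from typing import List


def split_into_blocks(data: bytes, block_size: int = 4096) -> List[bytes]:
    """Cut `data` into consecutive blocks of exactly `block_size` bytes.

    Assumption: a trailing partial block is discarded so that every
    fragment has the same length.
    """
    if block_size <= 0:
        raise ValueError("block_size must be positive")
    n_full = len(data) // block_size
    return [data[i * block_size:(i + 1) * block_size] for i in range(n_full)]


if __name__ == "__main__":
    payload = bytes(range(256)) * 40  # 10,240 bytes of dummy content
    print(len(split_into_blocks(payload, 4096)))  # number of 4096-byte blocks
    print(len(split_into_blocks(payload, 512)))   # number of 512-byte blocks
```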


Results on FFT-75 dataset Scenario 1 (all 75 classes)

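| Model | Neural Network | Block Size (bytes) | # Params | Accuracy (%) | Speed [ms/block] | Speed [min/GB] |
|---|---|---|---|---|---|---|
| Our Model | Depthwise Separable CNN | 4096 | 103,083 | 78.45 | 2.65 | 0.055 |
| Our Model | Depthwise Separable CNN | 512 | 103,083 | 65.89 | 2.78 | 0.382 |
| FiFTy | 1-D CNN | 4096 | 449,867 | 77.04 | 38.189 | 1.366 |
| FiFTy | 1-D CNN | 512 | 289,995 | 65.66 | 38.67 | 3.052 |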
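The relative gains can be read off the Scenario 1 results with a little arithmetic. The sketch below copies the reported numbers into a dictionary and computes FiFTy-to-ours ratios; the dictionary layout and the `ratio` helper are illustrative choices, not part of the released code.

```python
# Scenario 1 results, copied from the reported figures.
results = {
    ("ours", 4096):  {"params": 103_083, "ms_per_block": 2.65,   "min_per_gb": 0.055},
    ("ours", 512):   {"params": 103_083, "ms_per_block": 2.78,   "min_per_gb": 0.382},
    ("fifty", 4096): {"params": 449_867, "ms_per_block": 38.189, "min_per_gb": 1.366},
    ("fifty", 512):  {"params": 289_995, "ms_per_block": 38.67,  "min_per_gb": 3.052},
}


def ratio(metric: str, block_size: int) -> float:
    """How many times larger the FiFTy value is than ours for one metric."""
    return results[("fifty", block_size)][metric] / results[("ours", block_size)][metric]


if __name__ == "__main__":
    for block_size in (4096, 512):
        for metric in ("params", "ms_per_block", "min_per_gb"):
            print(f"{block_size:>4} bytes, {metric:<12}: {ratio(metric, block_size):.2f}x")
```

For 4096-byte blocks, the min/GB ratio falls between 24x and 25x and the parameter ratio between 4x and 5x, which appears to be the basis for the figures quoted in the abstract.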

Comparison of floating-point operations (MFLOPs) between FiFTy and our model

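| Scenario | Fragment Size (bytes) | MFLOPs (ours) | MFLOPs (FiFTy) |
|---|---|---|---|
| 1 | 4096 | 167.83 | 1047.59 |
| 1 | 512 | 21.00 | 1801.71 |
| 2 | 4096 | 167.82 | 1327.90 |
| 2 | 512 | 20.99 | 918.06 |
| 3 | 4096 | 167.81 | 647.78 |
| 3 | 512 | 20.99 | 3579.57 |
| 4 | 4096 | 167.81 | 2378.52 |
| 4 | 512 | 20.98 | 1576.71 |
| 5 | 4096 | 167.81 | 488.37 |
| 5 | 512 | 20.98 | 2330.48 |
| 6 | 4096 | 167.81 | 1126.00 |
| 6 | 512 | 20.98 | 611.30 |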
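To summarize the FLOP comparison, the sketch below computes the FiFTy-to-ours ratio for every scenario and fragment size and reports the smallest and largest reduction. The numbers are copied from the comparison above; the variable names are illustrative.

```python
# (scenario, fragment size in bytes) -> (MFLOPs ours, MFLOPs FiFTy)
mflops = {
    (1, 4096): (167.83, 1047.59), (1, 512): (21.00, 1801.71),
    (2, 4096): (167.82, 1327.90), (2, 512): (20.99, 918.06),
    (3, 4096): (167.81, 647.78),  (3, 512): (20.99, 3579.57),
    (4, 4096): (167.81, 2378.52), (4, 512): (20.98, 1576.71),
    (5, 4096): (167.81, 488.37),  (5, 512): (20.98, 2330.48),
    (6, 4096): (167.81, 1126.00), (6, 512): (20.98, 611.30),
}

# Ratio > 1 means FiFTy needs more floating-point operations than our model.
ratios = {key: fifty / ours for key, (ours, fifty) in mflops.items()}

smallest = min(ratios, key=ratios.get)
largest = max(ratios, key=ratios.get)

if __name__ == "__main__":
    for (scenario, size), r in sorted(ratios.items()):
        print(f"scenario {scenario}, {size:>4} bytes: {r:7.2f}x fewer FLOPs")
    print("smallest reduction:", smallest, f"{ratios[smallest]:.2f}x")
    print("largest reduction: ", largest, f"{ratios[largest]:.2f}x")
```

Our model's cost is nearly constant across scenarios, so the spread in these ratios comes almost entirely from the FiFTy values, which differ from scenario to scenario.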

Confusion Matrices for Scenario 1 (4096-byte and 512-byte blocks)