File Fragment Classification

File fragment classification is an important part of digital forensics. Researchers have used several approaches to classify file fragments without relying on metadata. Recently, deep learning models, including convolutional and feed-forward neural networks, have been applied to this task. This paper proposes a model based on depthwise separable convolutional neural networks for the efficient classification of file fragments. In our evaluation, the proposed model is faster and more accurate than state-of-the-art models on 75 file fragment types. In particular, it achieves an accuracy of 78.45% on the FFT-75 dataset with about 100K parameters and 167M FLOPs, which is 24x faster and 4-5x smaller than the state-of-the-art classifier in the literature.

Technologies:

Inception Depthwise Separable Conv Block

The model consists of multiple Inception depthwise separable blocks followed by a 1x1 convolution for classification.
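To illustrate why depthwise separable convolutions keep the parameter count low, the following minimal sketch compares the weights in a standard 1-D convolution with those in a depthwise convolution followed by a 1x1 pointwise convolution. The layer sizes (`C_IN`, `C_OUT`, `KERNEL`) and the function names are hypothetical and chosen only for illustration; they are not taken from the model in this repository.

```python
def conv1d_params(c_in: int, c_out: int, kernel: int, bias: bool = True) -> int:
    """Number of weights in a standard 1-D convolution."""
    return kernel * c_in * c_out + (c_out if bias else 0)


def separable_conv1d_params(c_in: int, c_out: int, kernel: int, bias: bool = True) -> int:
    """Number of weights in a depthwise conv followed by a 1x1 pointwise conv."""
    # Depthwise: one kernel per input channel, no mixing across channels.
    depthwise = kernel * c_in + (c_in if bias else 0)
    # Pointwise: a 1x1 convolution that mixes channels.
    pointwise = c_in * c_out + (c_out if bias else 0)
    return depthwise + pointwise


# Hypothetical layer sizes, not the sizes used by the actual model.
C_IN, C_OUT, KERNEL = 64, 128, 11

standard = conv1d_params(C_IN, C_OUT, KERNEL)
separable = separable_conv1d_params(C_IN, C_OUT, KERNEL)
print(standard, separable, round(standard / separable, 2))
```

For these assumed sizes the separable layer needs roughly a tenth of the weights of the standard layer; the saving grows with the kernel size and the number of output channels.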

Dataset

We used the FFT-75 dataset, which is composed of 75 file types organized into 6 scenarios, each with 512-byte and 4096-byte block variants.
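As a rough illustration of what a 512-byte or 4096-byte block looks like as model input, the sketch below cuts a byte buffer into fixed-size fragments and converts each fragment into a list of byte values. This is not the dataset's own preprocessing: the function names are hypothetical, and dropping the trailing partial block is an assumption made here for simplicity.

```python
from typing import List


def split_into_blocks(data: bytes, block_size: int) -> List[bytes]:
    """Cut data into consecutive fixed-size blocks, dropping any short tail."""
    if block_size <= 0:
        raise ValueError("block_size must be positive")
    n_full = len(data) // block_size
    return [data[i * block_size:(i + 1) * block_size] for i in range(n_full)]


def to_byte_values(block: bytes) -> List[int]:
    """Represent a block as integers in the range 0..255."""
    return list(block)


# Synthetic 10240-byte buffer standing in for the contents of a file.
sample = bytes(range(256)) * 40

for size in (512, 4096):
    blocks = split_into_blocks(sample, size)
    print(size, len(blocks), to_byte_values(blocks[0])[:4])
```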

Results on FFT-75 dataset Scenario 1 (all 75 classes)

Model	Neural Network	Block Size [bytes]	# Params	Accuracy [%]	Speed [ms/block]	Speed [min/GB]
Our Model	Depthwise Separable CNN	4096	103,083	78.45	2.65	0.055
Our Model	Depthwise Separable CNN	512	103,083	65.89	2.78	0.382
FiFTy	1-D CNN	4096	449,867	77.04	38.189	1.366
FiFTy	1-D CNN	512	289,995	65.66	38.67	3.052
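The ratios between the two models can be derived directly from the table above. The short sketch below divides each FiFTy value by the corresponding value for our model; the dictionary layout and the names `RESULTS`, `FIELDS`, and `ratios` are illustrative only.

```python
# Values copied from the Scenario 1 results table above:
# (number of parameters, speed in ms/block, speed in min/GB).
RESULTS = {
    4096: {"ours": (103_083, 2.65, 0.055), "fifty": (449_867, 38.189, 1.366)},
    512: {"ours": (103_083, 2.78, 0.382), "fifty": (289_995, 38.67, 3.052)},
}
FIELDS = ("params", "ms_per_block", "min_per_gb")


def ratios(block_size: int) -> dict:
    """FiFTy value divided by our value for each column."""
    ours = RESULTS[block_size]["ours"]
    fifty = RESULTS[block_size]["fifty"]
    return {name: f / o for name, f, o in zip(FIELDS, fifty, ours)}


for size in (4096, 512):
    print(size, {k: round(v, 2) for k, v in ratios(size).items()})
```

With these numbers, the min/GB ratio for 4096-byte blocks is about 24.8 and the parameter ratio about 4.4, which appears consistent with the "24x faster" and "4-5x smaller" figures in the abstract. The per-block latency ratio is lower, at about 14.4.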

Scenario	Fragment Size [bytes]	FLOPs (ours) [M]	FLOPs (FiFTy) [M]
1	4096	167.83	1047.59
1	512	21.00	1801.71
2	4096	167.82	1327.90
2	512	20.99	918.06
3	4096	167.81	647.78
3	512	20.99	3579.57
4	4096	167.81	2378.52
4	512	20.98	1576.71
5	4096	167.81	488.37
5	512	20.98	2330.48
6	4096	167.81	1126.00
6	512	20.98	611.30
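To see how the FLOPs gap varies across scenarios, the sketch below computes the FiFTy-to-ours ratio for every row of the table above and reports the smallest and largest cases. The variable names are illustrative and not part of the repository.

```python
# (scenario, fragment size in bytes, MFLOPs ours, MFLOPs FiFTy), copied from the table above.
FLOPS = [
    (1, 4096, 167.83, 1047.59), (1, 512, 21.00, 1801.71),
    (2, 4096, 167.82, 1327.90), (2, 512, 20.99, 918.06),
    (3, 4096, 167.81, 647.78), (3, 512, 20.99, 3579.57),
    (4, 4096, 167.81, 2378.52), (4, 512, 20.98, 1576.71),
    (5, 4096, 167.81, 488.37), (5, 512, 20.98, 2330.48),
    (6, 4096, 167.81, 1126.00), (6, 512, 20.98, 611.30),
]

# Ratio of FiFTy FLOPs to our FLOPs, keyed by (scenario, fragment size).
ratio_by_case = {(s, size): fifty / ours for s, size, ours, fifty in FLOPS}

smallest = min(ratio_by_case, key=ratio_by_case.get)
largest = max(ratio_by_case, key=ratio_by_case.get)
print(smallest, round(ratio_by_case[smallest], 1))
print(largest, round(ratio_by_case[largest], 1))
```

With these values the ratio ranges from about 2.9x (Scenario 5, 4096 bytes) to about 170x (Scenario 3, 512 bytes).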

Efficient File Fragments Classification using Depthwise Separable Convolutions

Comparison between FiFTy and our model for floating point operations (Mega FLOPs)

Confusion Matrix for Scenario 1 (4096 and 512 bytes block)