Overview
Malicious .exe Detection: Using classifiers to determine whether a Windows portable executable file (.exe) is malicious or benign.
A “Portable Executable” (PE) is the file format that Windows (32-bit and 64-bit) uses for executables and DLLs, most commonly seen as .exe files. This program trains a classifier using scikit-learn and writes pickle files for the classifier and its features. The trained model can then be used to classify PE files, outputting “malicious” or “clean.”
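As a rough illustration of the train-then-pickle flow described above, the sketch below fits a scikit-learn classifier on synthetic data, writes the model and its feature-name list to pickle files, then loads both back and classifies one sample. The file names (`classifier.pkl`, `features.pkl`), the feature names, the synthetic data, and the convention that label 1 means “malicious” are all assumptions for illustration; none of them are taken from `learnmodel.py` or `checkfile.py`.

```python
import os
import pickle
import tempfile

from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Synthetic stand-in for features extracted from PE files; real feature
# extraction (parsing the executables) is not shown here.
X, y = make_classification(n_samples=200, n_features=8, random_state=0)
feature_names = [f"feature_{i}" for i in range(X.shape[1])]  # hypothetical names

clf = RandomForestClassifier(n_estimators=50, random_state=0)
clf.fit(X, y)

# Hypothetical output paths; the actual scripts may use different names.
out_dir = tempfile.mkdtemp()
model_path = os.path.join(out_dir, "classifier.pkl")
features_path = os.path.join(out_dir, "features.pkl")

# Persist the fitted model and the feature list side by side.
with open(model_path, "wb") as fh:
    pickle.dump(clf, fh)
with open(features_path, "wb") as fh:
    pickle.dump(feature_names, fh)

# Later (for example, in a checking script), load both and classify a sample.
with open(model_path, "rb") as fh:
    loaded_clf = pickle.load(fh)
with open(features_path, "rb") as fh:
    loaded_features = pickle.load(fh)

# Assumption: label 1 means malicious, label 0 means clean.
label = loaded_clf.predict(X[:1])[0]
print("malicious" if label == 1 else "clean")
```

Storing the feature list next to the model lets the checking step build its input vector in the same column order the model was trained on. Pickle files should only be loaded from sources you trust, since unpickling can execute arbitrary code.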
Usage
- python learnmodel.py [model]
- model can be: AdaBoost, DecisionTree, GNB, GradientBoosting, KNN, RandomForest, NONE
- Specifying NONE as the model trains all of them and then selects whichever has the highest precision.
- Manual:
python checkfile.py exe-dir/[file]
for file in exe-dir/*; do python checkfile.py "$file"; done
- Automatic:
./verify.sh
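The NONE option under learnmodel.py (train every model, keep the one with the highest precision) can be sketched roughly as follows. This is a minimal sketch and not the code in `learnmodel.py`: the mapping of GNB to `GaussianNB` and KNN to `KNeighborsClassifier`, the default hyperparameters, the synthetic data, and the choice to measure precision on a held-out split are all assumptions.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import (
    AdaBoostClassifier,
    GradientBoostingClassifier,
    RandomForestClassifier,
)
from sklearn.metrics import precision_score
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

# Assumed mapping from the model names in the usage list to estimators.
candidates = {
    "AdaBoost": AdaBoostClassifier(random_state=0),
    "DecisionTree": DecisionTreeClassifier(random_state=0),
    "GNB": GaussianNB(),
    "GradientBoosting": GradientBoostingClassifier(random_state=0),
    "KNN": KNeighborsClassifier(),
    "RandomForest": RandomForestClassifier(random_state=0),
}

# Synthetic stand-in for labelled PE features (1 = malicious is an assumption).
X, y = make_classification(n_samples=300, n_features=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# Fit each candidate and record its precision on the held-out split.
scores = {}
for name, model in candidates.items():
    model.fit(X_train, y_train)
    predictions = model.predict(X_test)
    scores[name] = precision_score(y_test, predictions, zero_division=0)

# Keep whichever model scored the highest precision.
best_name = max(scores, key=scores.get)
print(best_name, round(scores[best_name], 3))
```

Precision is the fraction of files flagged as malicious that really are malicious, so selecting on it favors models with few false alarms. It says nothing about how many malicious files are missed.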
Observations
In practice, the results vary a lot. I applied the models to some common files and got mixed results. This was interesting to try, but signature-based methods are probably still the better way to find malicious programs.