3- Malware Detection using Deep Learning – Science & Environment Implementation

We start by describing the environment in which we will deploy our deep-learning detection system.

The environment consists of:

  • Python 3+
  • Ember 0.10.0 (Feature Files)
  • TensorFlow 2
  • LIEF 0.10.1

The minimum hardware requirements are:

  • CPU: Intel Core Xeon 3.10 GHz
  • RAM: 64 GB DDR4
  • GPU: Nvidia GTX 960Ti
  • Hard drive: at least 15 GB for the dataset

Methodology:

First, we extract the dataset, convert it into vectorized form, and save it by calling the following function from the EMBER features code:

ember.create_vectorized_features(path_to_dir, scale=1.)

Creating the vectorized data files takes some time. Once they exist, we read them back to access the dataset and keep it for training:

ember.read_vectorized_features(path_to_data_objects, scale=1.)
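Once the vectorized arrays are loaded, a common first step is to drop rows that have no label. In the public EMBER releases, unlabeled training samples carry the label -1, while 0 means benign and 1 means malicious. The sketch below shows that filtering with NumPy on small synthetic arrays that stand in for the arrays read from disk; the function name drop_unlabeled and the array sizes are illustrative assumptions, not part of the EMBER code.

```python
import numpy as np

def drop_unlabeled(X, y):
    """Keep only rows whose label is 0 (benign) or 1 (malicious)."""
    mask = y != -1
    return X[mask], y[mask]

# Synthetic stand-in for the loaded arrays: 6 samples, 4 features each.
X = np.arange(24, dtype=np.float32).reshape(6, 4)
y = np.array([0, -1, 1, 1, -1, 0], dtype=np.float32)

X_labeled, y_labeled = drop_unlabeled(X, y)
print(X_labeled.shape, y_labeled)
```

With the real dataset, the same mask would be applied to the training features and labels before any splitting or scaling.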

The resulting data objects are then passed to our model for training.

Before passing the dataset to the model, we split it into training, testing, and validation data. We then normalize the data by scaling it with the scaler's transform (scaler_transform). Finally, each of the training, testing, and validation arrays is reshaped into 3-dimensional form with the NumPy function expand_dims(data, axis=-1).
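The preparation steps can be sketched roughly as follows. This is a minimal NumPy-only illustration, not the code used in the experiments: the function name prepare_splits, the 20% validation fraction, and the hand-written standardization (used here as a stand-in for whatever scaler object scaler_transform belongs to) are all assumptions.

```python
import numpy as np

def prepare_splits(X_train_full, y_train_full, X_test, val_fraction=0.2, seed=0):
    """Split off validation data, standardize, and add a trailing channel axis."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(len(X_train_full))
    n_val = int(len(X_train_full) * val_fraction)
    val_idx, train_idx = order[:n_val], order[n_val:]

    X_train, y_train = X_train_full[train_idx], y_train_full[train_idx]
    X_val, y_val = X_train_full[val_idx], y_train_full[val_idx]

    # Fit the scaling statistics on the training split only, then reuse them.
    mean = X_train.mean(axis=0)
    std = X_train.std(axis=0)
    std[std == 0] = 1.0  # avoid division by zero for constant features

    def to_3d(a):
        # (samples, features) -> (samples, features, 1)
        return np.expand_dims((a - mean) / std, axis=-1)

    return (to_3d(X_train), y_train), (to_3d(X_val), y_val), to_3d(X_test)

# Synthetic data standing in for the vectorized features.
rng = np.random.default_rng(1)
X_full = rng.normal(size=(10, 3))
y_full = rng.integers(0, 2, size=10)
X_test = rng.normal(size=(5, 3))

(X_tr, y_tr), (X_va, y_va), X_te = prepare_splits(X_full, y_full, X_test)
print(X_tr.shape, X_va.shape, X_te.shape)
```

Fitting the mean and standard deviation on the training split alone keeps information from the validation and testing data out of the scaling step.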

Our model is trained on the training features and labels, with the validation data used to monitor each epoch, and the result of the best-performing epoch is saved. After training, we can double-check the results with plots, which can be displayed or saved to a directory if needed.

The model structure is as follows:

[Figure: architecture of the CNN-1D model]
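To give a feel for what a CNN-1D with best-epoch saving can look like in TensorFlow 2, here is a minimal Keras sketch. The layer sizes, kernel widths, strides, optimizer, batch size, and the function names build_cnn1d and train_with_checkpoint are all assumptions chosen for illustration; they do not reproduce the architecture used in this series.

```python
import tensorflow as tf

def build_cnn1d(input_dim):
    """Small 1D CNN for binary classification (layer sizes are illustrative)."""
    model = tf.keras.Sequential([
        tf.keras.layers.Input(shape=(input_dim, 1)),
        tf.keras.layers.Conv1D(32, kernel_size=8, strides=4, activation="relu"),
        tf.keras.layers.Conv1D(64, kernel_size=8, strides=4, activation="relu"),
        tf.keras.layers.GlobalMaxPooling1D(),
        tf.keras.layers.Dense(64, activation="relu"),
        tf.keras.layers.Dense(1, activation="sigmoid"),  # malware probability
    ])
    model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
    return model

def train_with_checkpoint(model, X_train, y_train, X_val, y_val,
                          path="model.h5", epochs=10):
    """Train and keep only the epoch with the lowest validation loss."""
    checkpoint = tf.keras.callbacks.ModelCheckpoint(
        path, monitor="val_loss", save_best_only=True)
    return model.fit(X_train, y_train,
                     validation_data=(X_val, y_val),
                     epochs=epochs, batch_size=256,
                     callbacks=[checkpoint])

# Build the model for a short, made-up feature length and inspect it.
model = build_cnn1d(input_dim=64)
model.summary()
```

The History object returned by train_with_checkpoint holds the per-epoch loss and accuracy values, which is what the training plots would be drawn from.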

After training, the best model's data and weights are saved in a directory. We can then load the model.h5 file and evaluate it by passing the test features to measure accuracy. Once the evaluation looks sound, we test the neural network by passing it an actual .exe file.

We then choose a decision threshold for the prediction score (in our experiments, accuracy is measured at a threshold value of 0.273 or 0.1).
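How the threshold turns scores into decisions can be sketched as below. The scores and labels are made-up numbers, and the helper names classify_scores and rates are assumptions for illustration; only the two threshold values come from the text.

```python
import numpy as np

def classify_scores(scores, threshold):
    """Return 1 (malware) where the score reaches the threshold, else 0 (benign)."""
    return (np.asarray(scores) >= threshold).astype(int)

def rates(scores, labels, threshold):
    """Detection rate on malware and false positive rate on benign files."""
    preds = classify_scores(scores, threshold)
    labels = np.asarray(labels)
    detection_rate = preds[labels == 1].mean()
    false_positive_rate = preds[labels == 0].mean()
    return detection_rate, false_positive_rate

# Made-up model outputs and true labels (1 = malware, 0 = benign).
scores = [0.05, 0.15, 0.30, 0.90, 0.20, 0.02]
labels = [0, 0, 0, 1, 1, 0]

for t in (0.273, 0.1):
    print(t, rates(scores, labels, t))
```

On this toy data, lowering the threshold from 0.273 to 0.1 catches more malware but also flags more benign files, which is the trade-off the threshold controls.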

The file's features are extracted with the EMBER features code (linked below), and the model then makes a prediction on them, classifying the file as either benign or malware.
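A rough sketch of the extraction step for a single file is shown below. It assumes the upstream elastic/ember package, whose PEFeatureExtractor class exposes a feature_vector method that takes the raw file bytes; the helper names extract_features and to_model_input are hypothetical, and the import is kept inside the function because ember and LIEF must be installed for it to work. Any scaling applied during training would also need to be applied to the extracted vector before prediction.

```python
import numpy as np

def extract_features(exe_path):
    """Read a PE file and return its EMBER feature vector as float32."""
    import ember  # imported lazily: requires the ember and lief packages

    with open(exe_path, "rb") as f:
        raw_bytes = f.read()
    extractor = ember.PEFeatureExtractor()
    return np.array(extractor.feature_vector(raw_bytes), dtype=np.float32)

def to_model_input(features):
    """Reshape one feature vector to (1, n_features, 1) for a 1D CNN."""
    return np.expand_dims(np.expand_dims(features, axis=0), axis=-1)

# Shape check with a placeholder vector of 8 zeros instead of real features.
batch = to_model_input(np.zeros(8, dtype=np.float32))
print(batch.shape)
```

The reshaped batch is what would be handed to the loaded model's predict method, and the resulting score compared against the chosen threshold.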

This completes the theoretical part of the tutorial on malware detection using deep learning.

For further learning, stay tuned.

References:

Malware Detection by Eating a Whole EXE, arXiv:1710.09435 (arxiv.org)

EMBER: An Open Dataset for Training Static PE Malware Machine Learning Models (arxiv.org)

Links for Reference Data:

GitHub – elastic/ember

GitHub – tamnguyenvan/malnet: Malware Detection using Convolutional Neural Networks
