Anomaly Detection in Data Science

Author: Luke Hines

1 Anomaly Detection in Data Science: One-class Classification with Privileged Information for Malware Detection. Pavel Erofeev, IITP RAS, Airbus Group Russia

2 Find the Panda

3 Anomaly Detection: Hadlum vs Hadlum. The birth of a child to Mrs. Hadlum happened 349 days after Mr. Hadlum left for military service. The average human pregnancy lasts 280 days (40 weeks). Statistically, 349 days is an outlier.
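To make the arithmetic on this slide concrete, here is a minimal sketch that measures how far the observed duration lies from the mean. The mean (280 days) and the observation (349 days) come from the slide. The standard deviation is not given anywhere in the deck, so `ASSUMED_STD_DAYS` is a purely illustrative assumption and the resulting z-score should not be read as a real statistic.

```python
def z_score(value, mean, std):
    """Number of standard deviations by which `value` departs from `mean`."""
    if std <= 0:
        raise ValueError("std must be positive")
    return (value - mean) / std

MEAN_DAYS = 280          # from the slide: 40 weeks
OBSERVED_DAYS = 349      # from the slide: the Hadlum case
ASSUMED_STD_DAYS = 14.0  # ASSUMPTION for illustration only, not from the slides

excess_days = OBSERVED_DAYS - MEAN_DAYS
z = z_score(OBSERVED_DAYS, MEAN_DAYS, ASSUMED_STD_DAYS)
print(f"{excess_days} days over the mean, z = {z:.2f} under the assumed spread")
```

Whatever spread is assumed, the observation sits 69 days above the mean; the z-score only rescales that gap.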

4 An outlier is an observation which deviates so much from other observations as to arouse suspicions that it was generated by a different mechanism (Hawkins, 1980).

5 Defining Anomaly Detection: Observations are described by digital representation vectors. The data are a mixture of “nominal” and “abnormal” points. Anomalous points are generated by a different generative process than the nominal points.

6 Possible Settings in CS. Supervised (known attacks): training data are labeled “nominal” or “anomaly”. Clean (zero-day attacks): training data are all “nominal”; test data may be contaminated with “anomaly” points. Unsupervised (unknown attacks): training data consist of a mixture of “nominal” and “anomaly” points.

7 Real World Data Problems: Data is multivariate. There is usually more than one generating mechanism underlying the “normal” data. Anomalies may represent a different class of objects, so there are many of them. What counts as an anomaly is domain specific. Normality evolves over time.

8 Anomaly Taxonomy Point Anomaly

9 Anomaly Taxonomy Contextual Anomaly

10 Anomaly Taxonomy Causal Anomaly

11 Taxonomy

12 Imbalanced classification: Normal data: a lot of samples. Abnormal data: very few. Standard methods do not work as expected. Standard metrics do not apply.
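The claim that standard metrics do not apply can be illustrated with a small sketch. The class counts below (990 normal, 10 abnormal) are hypothetical and chosen only to show the effect: a classifier that always predicts "normal" gets high accuracy while detecting no anomalies at all.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Return (tp, fp, fn, tn), treating `positive` as the anomaly label."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    return tp, fp, fn, tn

def accuracy(y_true, y_pred):
    tp, fp, fn, tn = confusion_counts(y_true, y_pred)
    return (tp + tn) / (tp + fp + fn + tn)

def recall(y_true, y_pred):
    tp, _, fn, _ = confusion_counts(y_true, y_pred)
    return tp / (tp + fn) if (tp + fn) else 0.0

# Hypothetical imbalanced data set: 990 normal (0) and 10 abnormal (1) samples.
y_true = [0] * 990 + [1] * 10
# A trivial "classifier" that labels everything as normal.
y_pred = [0] * 1000

acc = accuracy(y_true, y_pred)
rec = recall(y_true, y_pred)
print(f"accuracy = {acc:.2f}, anomaly recall = {rec:.2f}")
```

Metrics computed on the rare class, such as recall or precision, expose the failure that accuracy hides.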

13 Imbalanced classification: Class weights: proved not to be helpful in most cases. Resampling methods: oversampling (bootstrap, SMOTE, etc.) and undersampling. How to choose which method to use? How to choose the resampling parameter? We compared several methods and proposed a meta-model that on average gives the best results [Papanov, Erofeev, Burnaev, 2015].
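As a rough illustration of the simplest resampling method named on this slide, the sketch below performs bootstrap oversampling: minority samples are drawn with replacement until the minority class reaches a chosen fraction of the majority class. It is not SMOTE and not the meta-model from the cited paper. The function name `random_oversample` and the parameter `target_ratio` are hypothetical names introduced here; `target_ratio` plays the role of the resampling parameter the slide asks about.

```python
import random

def random_oversample(X, y, minority_label, target_ratio=1.0, seed=0):
    """Duplicate minority samples (drawn with replacement) until the
    minority count reaches target_ratio * majority count."""
    rng = random.Random(seed)
    minority = [i for i, label in enumerate(y) if label == minority_label]
    if not minority:
        raise ValueError("no samples with the minority label")
    majority_count = len(y) - len(minority)
    wanted = int(target_ratio * majority_count)
    extra = max(0, wanted - len(minority))
    picks = [rng.choice(minority) for _ in range(extra)]
    X_new = list(X) + [X[i] for i in picks]
    y_new = list(y) + [y[i] for i in picks]
    return X_new, y_new

# Hypothetical toy data: 18 normal samples (0) and 2 abnormal samples (1).
X = [[float(i)] for i in range(20)]
y = [0] * 18 + [1] * 2
X_bal, y_bal = random_oversample(X, y, minority_label=1, target_ratio=1.0)
print(y_bal.count(0), y_bal.count(1))
```

Only the training split should be resampled; evaluating on resampled data would distort the class balance the model will face in practice.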

14 Statistics-based models: Assume a generation procedure for the normal data (e.g. a Gaussian distribution). PCA is commonly used to extract the directions of greatest variance in the data. PCA-based anomaly detection works well for highly correlated environments.
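One common way to turn PCA into an anomaly detector is to score each point by its reconstruction error after projecting onto the leading principal components. The slides do not say which PCA-based score was used, so the sketch below is an assumption about the general technique, with synthetic data and a hypothetical helper name `pca_reconstruction_error`. It requires NumPy.

```python
import numpy as np

def pca_reconstruction_error(X_train, X_test, n_components):
    """Distance between each test point and its projection onto the
    top principal components fitted on X_train."""
    mean = X_train.mean(axis=0)
    # Rows of vt are principal directions, ordered by explained variance.
    _, _, vt = np.linalg.svd(X_train - mean, full_matrices=False)
    components = vt[:n_components]
    centered = X_test - mean
    projected = centered @ components.T @ components
    return np.linalg.norm(centered - projected, axis=1)

# Synthetic, highly correlated "normal" data: the second feature is about
# twice the first, plus a little noise.
rng = np.random.default_rng(0)
t = rng.normal(size=200)
X_train = np.column_stack([t, 2.0 * t + 0.05 * rng.normal(size=200)])

# The first test point follows the correlation, the second one breaks it.
X_test = np.array([[1.0, 2.0], [1.0, -2.0]])
errors = pca_reconstruction_error(X_train, X_test, n_components=1)
print(errors)
```

The point that violates the correlation structure receives a much larger error, which is why this approach suits strongly correlated features.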

15 Density-based models: SVM-based and nearest-neighbour-based. How to choose the best kernel parameter?
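For the nearest-neighbour family mentioned here, a minimal sketch is to score a point by its distance to its k-th nearest training point: large distances indicate low local density. The slides do not specify the exact score, so this choice, the helper name `knn_anomaly_score`, and the toy points are all assumptions. It uses `math.dist`, available from Python 3.8.

```python
import math

def knn_anomaly_score(point, train, k=3):
    """Distance from `point` to its k-th nearest neighbour in `train`.
    Larger values mean the point lies in a sparser region."""
    if not 1 <= k <= len(train):
        raise ValueError("k must be between 1 and len(train)")
    distances = sorted(math.dist(point, other) for other in train)
    return distances[k - 1]

# Hypothetical "nominal" training points forming one tight cluster.
train = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (0.1, 0.1), (0.05, 0.05)]

near_score = knn_anomaly_score((0.05, 0.0), train, k=3)
far_score = knn_anomaly_score((3.0, 3.0), train, k=3)
print(near_score, far_score)
```

The neighbourhood size k plays a role similar to the kernel parameter raised on the slide: it sets the scale at which density is measured.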

16 One-class SVM with Privileged Information. Evgeny Burnaev, Dmitry Smolyakov (Skoltech, IITP RAS)

17 One-Class SVM

18 One-Class SVM

19 One-Class SVM
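The content of the One-Class SVM slides did not survive extraction. For orientation, the standard one-class SVM formulation (Schölkopf et al.) is sketched below; whether the slides used exactly this notation is an assumption. Here $x_1, \dots, x_n$ are the training points, $\Phi$ is the feature map, $\nu \in (0, 1]$ is the hyper-parameter bounding the fraction of training points treated as outliers, $\rho$ is the offset, and $\xi_i$ are slack variables.

```latex
\min_{w,\ \xi,\ \rho} \quad \frac{1}{2}\lVert w \rVert^{2}
  + \frac{1}{\nu n}\sum_{i=1}^{n} \xi_i - \rho
\qquad \text{subject to} \qquad
\langle w, \Phi(x_i) \rangle \ge \rho - \xi_i, \quad \xi_i \ge 0, \quad i = 1, \dots, n

% Decision function, written with a kernel K(x, x') = <Phi(x), Phi(x')>
% and dual coefficients alpha_i:
f(x) = \operatorname{sign}\!\left( \sum_{i=1}^{n} \alpha_i K(x_i, x) - \rho \right)
```

Points with $f(x) = -1$ fall outside the learned region and are flagged as anomalies.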

20 One-Class SVM Kernel Trick

21 Kernel Trick

22 Hyper-parameter Influence

23 Decision Functions

24 Learning with Privileged Info. Example: image classification with a textual description

25 Learning with Privileged Info

26 Learning with Privileged Info

27 Learning with Privileged Info

28 Microsoft Malware Classification Challenge: Kaggle.com competition data (2015)

29 Problem Description. 9 malware families: Ramnit, Lollipop, Kelihos ver3, Vundo, Simda, Tracur, Kelihos ver1, Obfuscator.ACY, Gatak. Raw data: hexadecimal representation of the raw binary content; metadata extracted from the binaries, including function calls, strings, etc.

30 Features. Original features: information from binary files, such as byte frequencies, number of different N-grams, etc. Privileged features: information from the disassembled code, such as command frequencies and number of calls to external DLLs; bytecode as an image, with features based on image texture, which are commonly used for image classification.

31 Features

32 Experimental Setup

33 Results

34 Thanks! Any questions?
