Ensemble learning techniques for cyber security applications
Created by
Pisani, Francesco Sergio
Crupi, Felice
Folino, Gianluigi
Description
PhD Program in Information and Communication Engineering For Pervasive Intelligent Environments, Cycle XXIX
Cyber security involves protecting information and systems from major cyber threats;
high-level techniques, such as data mining, are frequently used to fight
cybercriminals efficiently, to alleviate the effects of their actions, or to prevent
those actions altogether.
In particular, classification can be used effectively in many cyber security applications,
e.g. in intrusion detection systems, in the analysis of user behavior, and in risk
and attack analysis.
However, the complexity and diversity of modern systems have opened a wide
range of new issues that are difficult to address.
In fact, security software has to deal with missing data, privacy limitations and
heterogeneous sources. Therefore, it is very unlikely that a single classification
algorithm will perform well on all types of data, especially in the presence
of changes and under real-time and scalability constraints.
To this end, this thesis proposes a framework based on the ensemble paradigm
to cope with these problems. Ensemble learning is a paradigm in which multiple learners
are trained for the same task by a learning algorithm, and the predictions of the
learners are combined to classify new, unseen instances. The ensemble method
helps to reduce the variance of the error, the bias, and the dependence on a single
dataset; furthermore, it can be built incrementally and lends itself to distributed
implementations. It is also particularly suitable for distributed intrusion detection,
because it allows a network profile to be built by combining different classifiers that
together provide complementary information. However, building the
ensemble can be computationally expensive because, when new data arrive, the training
phase has to be restarted. For this reason, the framework uses Genetic
Programming to evolve a function that combines the classifiers composing the
ensemble, an approach with some attractive characteristics. First, the models composing the
ensemble can each be trained on only a portion of the training set, and then they can be
combined and used without any extra training phase. Moreover, the models can be
specialized for a single class, and they can be designed to handle the difficult problems
of unbalanced classes and missing data. When the data change, the function can be recomputed
incrementally, with moderate computational effort, and, in a streaming environment,
drift strategies can be used to update the models. In addition, all the phases of
the algorithm are distributed and can exploit the advantages of running on
parallel/distributed architectures to cope with real-time constraints.
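As an illustration of the idea of evolving a combining function over already trained classifiers, the following is a minimal sketch and not the algorithm used in the thesis. It assumes a hypothetical set of non-trainable combiner functions (`avg`, `max`, `min`), represents a combiner as a small expression tree whose leaves are base-classifier indices, and replaces full Genetic Programming with a simplified mutation-only evolutionary loop. The toy data and all names (`evolve`, `toy_dataset`, and so on) are assumptions made for the example.

```python
import random
from statistics import mean

# Assumed set of non-trainable combiners: each merges several
# class-probability vectors into one, element-wise.
FUNCTIONS = {
    "avg": lambda vecs: [mean(col) for col in zip(*vecs)],
    "max": lambda vecs: [max(col) for col in zip(*vecs)],
    "min": lambda vecs: [min(col) for col in zip(*vecs)],
}

def random_tree(n_classifiers, rng, depth=2):
    """Random expression tree; an int leaf is a base-classifier index."""
    if depth == 0 or rng.random() < 0.3:
        return rng.randrange(n_classifiers)
    name = rng.choice(sorted(FUNCTIONS))
    arity = rng.choice([2, 3])
    return (name, [random_tree(n_classifiers, rng, depth - 1) for _ in range(arity)])

def evaluate(tree, outputs):
    """Apply the tree to the probability vectors produced by the base models."""
    if isinstance(tree, int):
        return outputs[tree]
    name, children = tree
    return FUNCTIONS[name]([evaluate(child, outputs) for child in children])

def predict(tree, outputs):
    probs = evaluate(tree, outputs)
    return probs.index(max(probs))

def accuracy(tree, dataset):
    """dataset is a list of (base_model_outputs, true_label) pairs."""
    return mean(1.0 if predict(tree, outs) == y else 0.0 for outs, y in dataset)

def mutate(tree, n_classifiers, rng):
    """Replace one randomly chosen subtree with a freshly generated one."""
    if isinstance(tree, int) or rng.random() < 0.3:
        return random_tree(n_classifiers, rng)
    name, children = tree
    i = rng.randrange(len(children))
    new_children = list(children)
    new_children[i] = mutate(children[i], n_classifiers, rng)
    return (name, new_children)

def evolve(dataset, n_classifiers, generations=30, pop_size=20, seed=0):
    """Mutation-only search with elitism. Only the combiner changes;
    the base models are never retrained."""
    rng = random.Random(seed)
    # Seed the population with every single classifier, then random trees.
    population = list(range(n_classifiers))
    population += [random_tree(n_classifiers, rng)
                   for _ in range(pop_size - n_classifiers)]
    for _ in range(generations):
        population.sort(key=lambda t: accuracy(t, dataset), reverse=True)
        survivors = population[: pop_size // 2]
        offspring = [mutate(rng.choice(survivors), n_classifiers, rng)
                     for _ in survivors]
        population = survivors + offspring
    return max(population, key=lambda t: accuracy(t, dataset))

def toy_dataset(n=200, seed=1):
    """Synthetic outputs of three noisy base classifiers on a 2-class task."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        y = rng.randrange(2)
        outputs = []
        for _ in range(3):
            # p is the probability a base model assigns to the true class.
            p = min(max(rng.gauss(0.65, 0.2), 0.0), 1.0)
            outputs.append([p, 1 - p] if y == 0 else [1 - p, p])
        data.append((outputs, y))
    return data

if __name__ == "__main__":
    data = toy_dataset()
    best = evolve(data, n_classifiers=3)
    print(best, accuracy(best, data))
```

Because the search only rearranges fixed combiner functions over stored model outputs, recomputing the combiner after a change in the data does not require touching the base models, which is the property the description above relies on.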
The framework is oriented towards, and specialized for, cyber security applications.
For this reason, the algorithm is designed to work with missing data, unbalanced
classes, models specialized for particular tasks, and models working on streaming data.
Two typical scenarios in the cyber security domain are presented, and some experiments
are conducted on artificial and real datasets to test the effectiveness of the
approach. The first scenario deals with user behavior: the actions taken by users
can lead to data breaches, and the resulting damage can be very costly. The second
scenario deals with intrusion detection systems. In this research area, the ensemble
paradigm is a very new technique, and researchers have yet to fully understand the
advantages of this solution.
Università della Calabria
Subject
Computer security; Machine learning
Relation
ING-INF/05;