Although the KDD99 dataset is more than 15 years old, it is still widely used in academic research. To investigate its use in machine learning research (MLR) and intrusion detection systems (IDS), this study reviews 149 research articles from 65 journals indexed in the Science Citation Index Expanded and the Emerging Sources Citation Index over the six years from 2010 to 2015. Including papers from other indexes and from conferences would triple the number of studies. The number of published studies shows that KDD99 is the most widely used dataset in IDS and machine learning research, and that it is the de facto dataset for these areas. To show how KDD99 and its related sub-dataset, NSL-KDD, have recently been used in IDS and MLR, the following descriptive statistics on the reviewed studies are given: the main contributions of the articles, the applied algorithms, the compared classification algorithms, the software toolboxes, the size and type of the datasets used for training and testing, and the type of output classes (binary or multi-class). In addition to these statistics, a checklist is provided for future researchers working in this area.
KDD99 is the dataset used for the Third International Knowledge Discovery and Data Mining Tools Competition, which was held in conjunction with KDD-99, the Fifth International Conference on Knowledge Discovery and Data Mining. The task of the competition was to build a network intrusion detector: a predictive model capable of distinguishing between "bad" connections, called intrusions or attacks, and "good" normal connections. The database contains a standard set of data to be audited, which includes a wide variety of intrusions simulated in a military network environment.
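The reviewed studies frame this task either as binary classification (normal versus attack) or as multi-class classification. As a minimal sketch of the two schemes, the Python snippet below maps raw KDD99 connection labels to a binary target and to the commonly used five-class grouping (normal, DoS, Probe, R2L, U2R). It assumes labels are written as in the raw data files, with a trailing period (e.g. `smurf.`), and it covers only the 22 attack types that appear in the training data; attack types found only in the test data would raise an error. The names `ATTACK_CATEGORY`, `to_binary`, and `to_category` are hypothetical helpers for illustration, not part of any library.

```python
from collections import Counter

# Hypothetical helper: grouping of the 22 KDD99 training attack types into the
# four attack categories. Test-only attack types are deliberately not listed.
ATTACK_CATEGORY = {
    # Denial of service
    "back": "dos", "land": "dos", "neptune": "dos", "pod": "dos",
    "smurf": "dos", "teardrop": "dos",
    # Probing
    "ipsweep": "probe", "nmap": "probe", "portsweep": "probe", "satan": "probe",
    # Remote to local
    "ftp_write": "r2l", "guess_passwd": "r2l", "imap": "r2l", "multihop": "r2l",
    "phf": "r2l", "spy": "r2l", "warezclient": "r2l", "warezmaster": "r2l",
    # User to root
    "buffer_overflow": "u2r", "loadmodule": "u2r", "perl": "u2r", "rootkit": "u2r",
}


def normalize(label: str) -> str:
    """Strip whitespace and the trailing period assumed in raw KDD99 labels."""
    return label.strip().rstrip(".")


def to_binary(label: str) -> int:
    """Binary target: 0 for a normal connection, 1 for any attack."""
    return 0 if normalize(label) == "normal" else 1


def to_category(label: str) -> str:
    """Five-class target: normal, dos, probe, r2l or u2r."""
    name = normalize(label)
    if name == "normal":
        return "normal"
    try:
        return ATTACK_CATEGORY[name]
    except KeyError:
        # Fail loudly instead of silently mislabeling an unknown attack type.
        raise ValueError(f"unknown KDD99 label: {label!r}") from None


if __name__ == "__main__":
    # Illustrative label values only, not drawn from the actual dataset.
    sample = ["normal.", "smurf.", "neptune.", "normal.",
              "ipsweep.", "guess_passwd.", "rootkit."]
    print([to_binary(x) for x in sample])  # -> [0, 1, 1, 0, 1, 1, 1]
    print(Counter(to_category(x) for x in sample))
```

Raising an error on unknown labels is a design choice: it forces a decision about how to treat attack types outside the training list, rather than quietly folding them into a default class.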