Introduction: In today’s internet-driven world, Distributed Denial of Service (DDoS) attacks have become a major threat to the stability and security of networks. These attacks overwhelm systems by flooding them with massive amounts of traffic, making them slow or unavailable to users. Detecting such attacks in real time is very important to prevent service disruption, data loss, or financial damage. Traditional rule-based systems often fail to catch new or evolving attack patterns. This thesis explores how advanced Artificial Intelligence (AI) techniques, specifically Machine Learning (ML) and Deep Learning (DL), can be used to detect DDoS attacks effectively and automatically, both in offline data and in real-time scenarios.
Research Question: The research question this thesis addresses is: How well do Deep Learning and Machine Learning models detect DDoS attacks on two benchmark datasets and in a real-time attack detection scenario?
Method: To answer the research question, the study tested eight algorithms: four ML models (Random Forest, SVM, Logistic Regression, and Decision Tree) and four DL models (CNN, RNN, LSTM, and CNN-LSTM). These models were trained and evaluated on two widely used benchmark datasets: CICIDS-2017 and CICDDoS-2019.
Before training the models, a comprehensive literature review was carried out to provide the scientific justification for the approach. Several data preprocessing steps were then applied: rows with missing values were removed, categorical values were encoded as numbers, and features were normalized. Next, the 15 most important features were selected using a tree-based feature ranking method. The cleaned and processed data were split into training, validation, and testing sets. Each model was trained for binary classification to identify whether a given traffic flow was normal or an attack.
To evaluate how well these models work in practice, a small-scale simulated environment was created. DDoS attacks were launched between two virtual machines using tools such as hping3 (for SYN floods) and slowloris (for slow-rate HTTP attacks). The trained Random Forest model was saved as a pickle file and loaded by a Python-based monitoring tool that scanned live traffic and triggered alarms when attacks were detected.
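One way such a monitoring loop could be structured is sketched below. It is an illustration under stated assumptions rather than the tool built in the thesis: the sliding-window vote, the `SlidingAlarm` class, the `load_model` helper, and the `StandInModel` used in the demo are all hypothetical, and flow feature extraction from packets is left out. The only requirement on the model is a scikit-learn style `predict` method.

```python
import pickle
from collections import deque


def load_model(path):
    """Load a pickled classifier (only unpickle files from a trusted source)."""
    with open(path, "rb") as fh:
        return pickle.load(fh)


class SlidingAlarm:
    """Raise an alarm when enough recent flows are classified as attacks."""

    def __init__(self, model, window=50, threshold=0.5):
        self.model = model                  # any object exposing predict(rows)
        self.recent = deque(maxlen=window)  # last `window` predictions
        self.threshold = threshold          # attack share that triggers an alarm

    def observe(self, flow_features):
        """Classify one flow and return True if the alarm should fire."""
        label = int(self.model.predict([flow_features])[0])
        self.recent.append(label)
        attack_share = sum(self.recent) / len(self.recent)
        return attack_share >= self.threshold


class StandInModel:
    """Hypothetical stand-in for the trained classifier, used only in this demo."""

    def predict(self, rows):
        # Flag a flow as an attack when its first feature (a SYN rate) is high.
        return [1 if row[0] > 100 else 0 for row in rows]


alarm = SlidingAlarm(StandInModel(), window=4, threshold=0.5)
flows = [[5], [8], [3], [7], [900], [1200]]  # four benign flows, then a flood
fired = [alarm.observe(f) for f in flows]
print(fired)
```

Voting over a window rather than alarming on a single flow is a design choice of this sketch: it trades a short detection delay for fewer alarms caused by isolated misclassifications.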
Results: All models achieved high offline accuracy, above 99%, in identifying DDoS attacks on both datasets, using the 15 most informative flow features selected by an Extra Trees-based ranking. Among the deep learning models, CNN performed best on CICIDS-2017, with an accuracy of 99.92%, while CNN-LSTM was the most balanced performer across both datasets. The Random Forest model stood out by delivering the highest or near-highest scores on both datasets: 99.97% accuracy on CICIDS-2017 and 99.91% on CICDDoS-2019. It also showed excellent precision and recall, meaning it caught attacks reliably without raising many false alarms. In the real-time simulation, the Random Forest model detected live SYN flood and slowloris attacks in ≈ 2 s on a standard CPU, using < 4 GB RAM, and the monitoring tool raised alarms shortly after each attack began. This confirmed that Random Forest could operate in real time, with very few false positives and high responsiveness.
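For reference, the reported metrics relate to the confusion counts of a binary detector as in the short sketch below. The helper name `binary_metrics` and the toy labels are hypothetical and are not results from the thesis; 1 denotes an attack flow and 0 a normal flow.

```python
def binary_metrics(y_true, y_pred):
    """Compute accuracy, precision, recall, and false positive rate."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # attacks caught
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # normal flows passed
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false alarms
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # missed attacks

    def ratio(num, den):
        # Report 0.0 when a denominator is empty instead of dividing by zero.
        return num / den if den else 0.0

    return {
        "accuracy": ratio(tp + tn, len(pairs)),
        "precision": ratio(tp, tp + fp),
        "recall": ratio(tp, tp + fn),
        "false_positive_rate": ratio(fp, fp + tn),
    }


# Toy example: 3 attack flows and 5 normal flows.
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 1, 0, 0, 0, 0]
print(binary_metrics(y_true, y_pred))
```

High precision corresponds to few false alarms, and high recall corresponds to few missed attacks, which is why both are reported alongside accuracy.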
Discussion: These findings suggest that both ML and DL models can effectively detect DDoS attacks, but some models offer practical advantages over others. Deep learning models such as CNN and LSTM are excellent at learning complex patterns from large datasets, especially when the data include subtle or evolving attack behaviors. However, they are computationally expensive and may require specialized hardware or longer training times. The Random Forest model, a traditional ML approach, achieved similar or even better results with much less complexity. It was faster to train, easier to interpret, and more lightweight, making it well suited to real-time applications on standard machines without Graphics Processing Units (GPUs). This suggests that in environments where speed and resource use are critical, Random Forest is a strong candidate.
Finally, the thesis discusses limitations (binary labels, two datasets, 1 Gbps simulation lab) and outlines future work on multiclass DDoS identification, encrypted traffic, and large-scale deployments.