Network traffic analysis using machine learning: an unsupervised approach to understand and slice your network

  • Published: 04 November 2021
  • Volume 77, pages 297–309 (2022)


  • Ons Aouedi 1 ,
  • Kandaraj Piamrat   ORCID: orcid.org/0000-0002-2343-0850 1 ,
  • Salima Hamma 1 &
  • J. K. Menuka Perera 1  


Recent developments in smart devices have led to an explosion in data generation and heterogeneity, which requires new network solutions for better analyzing and understanding traffic. These solutions should be intelligent and scalable in order to handle the huge amount of data automatically. With the progress of high-performance computing (HPC), it has become feasible to deploy machine learning (ML) to solve complex problems, and its efficiency has been validated in several domains (e.g., healthcare or computer vision). At the same time, network slicing (NS) has drawn significant attention from both industry and academia, as it is essential for addressing the diversity of service requirements. Therefore, the adoption of ML within NS management is an interesting issue. In this paper, we focus on analyzing network data with the objective of defining network slices according to traffic flow behaviors. For dimensionality reduction, feature selection is applied to select the most relevant features (15 out of 87) from a real dataset of more than 3 million instances. Then, K-means clustering is applied to better understand and distinguish traffic behaviors. The results demonstrate a good correlation among instances in the same cluster generated by the unsupervised learning. This solution can be further integrated in a real environment using network function virtualization.

https://www.kaggle.com/jsrojas/ip-network-traffic-flows-labeled-with-87-apps


Author information

Authors and affiliations.

Laboratoire des Sciences du Numerique de Nantes, Nantes, France

Ons Aouedi, Kandaraj Piamrat, Salima Hamma & J. K. Menuka Perera


Corresponding author

Correspondence to Kandaraj Piamrat .


About this article

Aouedi, O., Piamrat, K., Hamma, S. et al. Network traffic analysis using machine learning: an unsupervised approach to understand and slice your network. Ann. Telecommun. 77 , 297–309 (2022). https://doi.org/10.1007/s12243-021-00889-1


Received : 25 July 2020

Accepted : 14 September 2021

Published : 04 November 2021

Issue Date : June 2022

DOI : https://doi.org/10.1007/s12243-021-00889-1


  • Machine learning
  • Feature selection
  • Unsupervised learning
  • Network traffic
  • Traffic analysis
  • Network slicing

  • Open access
  • Published: 28 April 2021

Traffic analysis for 5G network slice based on machine learning

  • Feng Xie 1 ,
  • Dongxue Wei 1 &
  • Zhencheng Wang 2  

EURASIP Journal on Wireless Communications and Networking volume 2021, Article number: 108 (2021)


With the rise of 5G and the Internet of things, network slice, a key technology of 5G, cuts a physical network into multiple virtual end-to-end networks, each of which obtains logically independent network resources to support richer services. 5G mobile data and sensor data converge to form ever-growing network traffic. This traffic explosion has produced a mixed network in which viruses, worms, network theft and malicious attacks are also present. How to distinguish traffic types, block malicious traffic and make effective use of sensor data against the background of 5G network slice is the question this study addresses, and it is also where the significance of the study lies.

With the advent of the era of 5G and artificial intelligence, machine learning plays an important role in image recognition, language processing, speech recognition and other fields. In this paper, the home Internet of things is selected as the research scenario. A sensor network is constructed in which ZigBee devices are connected to the Internet to obtain temperature, humidity, appliance power, smoke concentration and other sensor data in the home. Machine learning is then used to analyze the traffic data and identify whether the traffic is valid, which ensures the security of the home network, and to summarize and analyze the sensor data, which ensures the safety of the home physical environment. This study not only has theoretical significance for machine learning, but also has broad application prospects in the smart home industry.

1 Introduction

At present, there are few studies that combine machine learning, traffic analysis, network slice and the Internet of things, and the use of sensor data alongside Internet traffic in classification has not been implemented [ 1 , 2 , 3 , 4 ]. To solve this problem, against the background of 5G network slice, this paper proposes a home traffic analysis system combined with the Internet of things [ 5 , 6 ].

Firstly, this paper introduces network slice, which plays a key role in 5G technology, and then the key concepts of machine learning and traffic analysis, including their origin, development, features and forms [ 7 , 8 , 9 , 10 ]. With the wide application of artificial intelligence in various industries, people can use the high-speed processing ability of computers to train on a large amount of data, obtain a model that performs traffic classification and finally apply it in practical situations [ 11 ]. This paper introduces the design and implementation of the home traffic analysis system combined with the Internet of things in detail and demonstrates its characteristics along the way. The first step is to create the Internet of things inside the smart home using the ZigBee sensor network architecture: each sensor device reports data through a ZigBee node, the coordinator gathers the data and passes it to the gateway, and the gateway forwards it to the server, where traffic classification is carried out [ 12 , 13 , 14 , 15 ]. In order to design and implement the system, this paper conducts a related experimental study. In the experiment, the flow dataset used for training is obtained, its characteristics are examined by statistical analysis, and it is filtered and cleaned. Machine learning algorithms, including decision tree, random forest and regression, are then trained on the flow data samples, and test samples are used to assess model performance. In order to verify the actual effect of the model, this paper uses packet capture software to capture real traffic data and sends it to the model for evaluation, measuring the effect of the model by the accuracy of its judgments.

In order to compare how different algorithms perform in feature selection, this paper describes feature selection with two different algorithms and measures them through experiments. The experimental data show that the Chi-square filter method achieves better accuracy and better overall performance.

2 Related concepts and work

2.1 Network slice

A network slice can be adapted to the requirements of each user. Slicing treats different traffic and resources differently, so network operators can meet the different service requirements of different tenant types [ 16 , 17 , 18 ]. Network slice is one of the key technologies of 5G, and network function virtualization (NFV) is an essential technology for it. NFV separates the software and hardware parts of traditional network equipment: the hardware is deployed as unified servers, and the software is carried by different network functions (NFs), which allows services to be assembled flexibly. Network slices share the same physical network and communication links while exchanging data over virtually independent subnetworks, which better meets the 5G goal of connecting everything.

2.2 Machine learning

Machine learning is an important realization method in the field of artificial intelligence. Its development can be summarized in the following stages:

In the early 1950s, machine learning began to sprout. The concept of the perceptron was proposed, but the perceptron can only handle linear classification problems and cannot represent XOR logic. In the 1960s and 1970s, symbolic learning techniques based on logical representation flourished, mainly studying logic-based inductive learning systems [ 19 , 20 ]. From the 1980s to the mid-1990s, the emergence of decision trees made it easy for classification algorithms to express complex data relationships as learned knowledge, but the learning process then faces a hypothesis space that is too large and complex [ 21 , 22 ]. The last stage runs from the mid-1990s to today: faced with large amounts of data, statistical learning and deep learning came into being.

At present, research in the field of machine learning mainly focuses on three aspects. The first is task-oriented research, which focuses on analyzing and improving the performance of learning systems on a set of predefined tasks. The second is building cognitive models, which focuses on the human learning process and simulates it with high-performance computers. The last is theoretical analysis, which explores learning algorithms and their effects theoretically. In the past decade, machine learning has achieved very effective results in industrial applications, mainly in weather forecasting, image recognition, voice and handwriting recognition, stock market analysis, pattern recognition and other fields. Major companies have launched their own machine learning platforms, such as Google's TensorFlow framework, Microsoft Azure Machine Learning Studio, the open-source platform MLflow, Baidu Machine Learning (BML), Ali PAI and JD NeuCube. Machine learning has become a main research direction of major companies.

Machine learning algorithms can make effective predictions from the sensor data collected by the wireless sensor network, and can then monitor abnormalities of the home appliances at the sensor locations, forecast the future trend of the home environment and so on, enabling intelligent services such as smart home energy-saving reminders [ 23 , 24 ]. The combination of smart home and machine learning therefore has research value. In the foreseeable future, a large number of smart home systems that apply machine learning will appear on the market, providing people with a more convenient and intelligent home living space [ 25 ].

2.3 Traffic mining analysis

Traffic mining analysis is a technology that captures network traffic, extracts traffic characteristics and continuously improves the identification algorithm as the network environment changes [ 26 , 27 ]. To date, the main methods of traffic identification are based on port number mapping, on network behavior characteristics, and on machine learning.

2.3.1 Traffic identification based on port number mapping

The traffic identification method based on port number mapping identifies network applications by checking the source and destination port numbers of network packets and mapping them to applications according to the port number rules of the corresponding network protocol or application. Kim et al. pointed out that port classification is effective in identifying some applications, such as WWW, DNS and MAIL, with both accuracy and recall higher than 90% [ 28 ]. However, current P2P applications use port hopping and port masquerading to avoid traffic detection. Bleul et al. analyzed the direct-connect network and found that 70% of the observed ports were used only once [ 29 ]. Schneider et al. found that when port classification is used to identify UDP traffic, the byte accuracy is only 24%. Port-based traffic identification therefore no longer meets current needs, and its limitations are becoming more and more obvious. First, port numbers are not defined for all applications, especially newer ones, so a one-to-one correspondence between port numbers and applications is not always possible. Second, some common protocols do not use fixed port numbers for data transmission. In addition, services of multiple network protocols can be packaged as a common application and use the same port number. Port mapping cannot solve these problems, so the accuracy and reliability of this method are declining, and it can no longer meet the requirements of network traffic classification and identification.
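As a minimal sketch of port-based identification, the lookup below maps a flow's ports to an application label. The table `WELL_KNOWN_PORTS` and the function `classify_by_port` are hypothetical names introduced for illustration, and the handful of ports listed is only a small sample of standard assignments, not a rule base from the paper.

```python
# Illustrative sample of standard port assignments (assumption: a real
# system would load a much larger table, e.g. from the IANA registry).
WELL_KNOWN_PORTS = {
    80: "WWW",
    443: "WWW",
    53: "DNS",
    25: "MAIL",
    110: "MAIL",
    143: "MAIL",
    21: "FTP",
}


def classify_by_port(src_port: int, dst_port: int) -> str:
    """Return an application label for a flow, or "UNKNOWN" if no port matches."""
    # Check the destination port first: servers usually listen on the
    # registered port while clients use an ephemeral one.
    for port in (dst_port, src_port):
        label = WELL_KNOWN_PORTS.get(port)
        if label is not None:
            return label
    return "UNKNOWN"


print(classify_by_port(51432, 80))    # client talking to a web server
print(classify_by_port(51432, 6881))  # unregistered port: no match
```

The second call illustrates the limitation discussed above: a flow that uses an unregistered or hopping port cannot be labelled by this method at all.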

2.3.2 Traffic identification based on payload

The traffic classification and identification method based on payload determines the network traffic category by checking whether the payload of a network packet matches an entry in a feature identification library. Because port mapping has low accuracy in traffic identification, Sen proposed a deep packet inspection method based on application-layer protocol feature fields [ 30 ]. This method requires a rule base of application-layer features to be established in advance, and it analyzes the key control information in the payload to check whether it matches a rule in the rule base. In practice, network traffic classification with this method can therefore be regarded as a process of pattern matching. With the increase in network bandwidth, the influx of large amounts of data into the Internet, the emergence of new network applications and the continuous updating of existing ones, the rules that need to be stored in the rule base are expanding rapidly, and the processing and storage costs of the system are increasing. More importantly, complete payload analysis is not only expensive to compute, but may also raise user privacy disputes and data security leaks [ 31 ]. The method has therefore encountered some resistance in its development.
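The pattern-matching view of payload-based identification can be sketched as follows. The rule base `SIGNATURES` and the function `classify_by_payload` are hypothetical names, and the three signatures (HTTP request line, SSH banner, BitTorrent handshake) are illustrative examples chosen here, not rules taken from the cited work.

```python
import re

# Tiny illustrative rule base: (label, compiled pattern on the first payload bytes).
SIGNATURES = [
    ("HTTP", re.compile(rb"^(GET|POST|HEAD|PUT|DELETE) \S+ HTTP/1\.[01]\r\n")),
    ("SSH", re.compile(rb"^SSH-\d\.\d+-")),
    ("BITTORRENT", re.compile(rb"^\x13BitTorrent protocol")),
]


def classify_by_payload(payload: bytes) -> str:
    """Return the label of the first matching signature, else "UNKNOWN"."""
    for label, pattern in SIGNATURES:
        if pattern.search(payload):
            return label
    return "UNKNOWN"


print(classify_by_payload(b"GET /index.html HTTP/1.1\r\nHost: example.org\r\n\r\n"))
print(classify_by_payload(b"SSH-2.0-OpenSSH_8.9\r\n"))
print(classify_by_payload(b"\x16\x03\x01\x02\x00"))  # encrypted or unlisted traffic
```

Every payload is compared against every rule until one matches, which is why the cost grows with the size of the rule base, and encrypted payloads simply fall through to "UNKNOWN".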

2.3.3 Traffic identification based on machine learning

Machine learning is capable of data mining and can extract implicit, regular and useful information from large volumes of data. Network traffic contains huge amounts of complex data, and academia is now focusing on traffic identification based on machine learning. Machine learning offers many algorithms to choose from, and network traffic offers many features to choose from; Andrew Moore et al. provided 248 traffic features. Traffic classification and recognition methods based on machine learning can be roughly divided into those based on supervised learning, those based on unsupervised learning and those based on reinforcement learning.

ZigBee is a technical standard for building personal area networks that communicate in the 2.4 GHz band. Its MAC and physical layers follow the IEEE 802.15.4 standard. It has the advantages of simple implementation, low power consumption, automatic network formation and support for a variety of topologies to meet different functional requirements. The disadvantages are that product development is difficult, the development cycle is long, and the product cost is high [ 5 , 16 , 19 ]. A ZigBee-based wireless sensor network can be used in many areas, such as energy-saving research and data collection, processing and transmission.

Like the ZigBee protocol, Z-Stack uses a hierarchical software architecture, in which the hardware abstraction layer (HAL) manages tasks using a time slice polling mechanism and provides a multitask management mechanism. It provides driver interfaces for a variety of peripheral modules, including timers, GPIO, universal asynchronous receiver/transmitter (UART), analog-to-digital converters, etc. It also provides other extended services. The operating system abstraction layer (OSAL) provides application developers with the lower-level interfaces and services required by upper-level applications.

3.1 ZigBee networking

This paper implements ZigBee communication based on Z-Stack, the ZigBee protocol stack introduced by TI. It supports a variety of microcontrollers, including the CC2530 system-on-chip, the CC2520 with the MSP430 series and the LM3S9B96 in the Stellaris series. This protocol stack supports a variety of network topologies and is widely used in the ZigBee industry.

The protocol stack defines how communication hardware and software communicate across layers in different hierarchies. For the data sender, the information packets sent by the user pass through each protocol layer in order from high to low, and each layer's entity adds its own unique identity information in a defined format, reaching the physical layer, transferring in the physical link as a binary stream and reaching the data receiver. When the data receiver receives the data stream, the data packet passes through each protocol layer in order from lowest to highest, and the entities of each layer extract the data information from the data packet in a predefined format that needs to be processed at this layer, and finally reach the application layer.
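The send and receive paths described above can be sketched as a toy encapsulation routine. The layer names APS, NWK and MAC are the ZigBee layers below the application; the bracketed text headers and the functions `encapsulate` and `decapsulate` are assumptions made for illustration and do not reflect the real Z-Stack frame formats.

```python
# Layers below the application, listed from highest to lowest.
LAYERS = ["APS", "NWK", "MAC"]


def encapsulate(payload: bytes) -> bytes:
    """Sender side: each layer, from high to low, prepends its own header."""
    frame = payload
    for layer in LAYERS:
        frame = f"[{layer}]".encode() + frame
    return frame  # this byte string is what the physical layer would transmit


def decapsulate(frame: bytes) -> bytes:
    """Receiver side: each layer, from low to high, strips the header it owns."""
    for layer in reversed(LAYERS):
        header = f"[{layer}]".encode()
        if not frame.startswith(header):
            raise ValueError(f"missing {layer} header")
        frame = frame[len(header):]
    return frame  # what remains is delivered to the application layer


wire = encapsulate(b"temp=21.5")
print(wire)
print(decapsulate(wire))
```

The outermost header belongs to the lowest layer, so the receiver processes headers in the reverse of the order in which the sender added them.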

Figure 1 shows the network structure of ZigBee. It contains two important roles, coordinator and terminal node, which together constitute the simplest ZigBee communication process. The internal network uses 2.4 GHz wireless communication. The external network consists of peripheral devices such as sensors and the Internet. Control of household appliances and environmental monitoring are achieved through the interaction between the internal and external networks.

The coordinator is the relay of the entire ZigBee network: it scans the current network conditions, chooses an appropriate channel and network ID, and then starts the ZigBee network. It also assists in configuring security parameters and application bindings within the network. In short, the coordinator is primarily responsible for starting and configuring the network. Once it has completed this work, it can switch to the router role or leave the network without any impact on the network as a whole.

The terminal node is not responsible for the overall operation of the network. It only needs to be able to sleep and wake up: it sleeps when idle to extend standby time and wakes quickly once it receives the wake-up command from the coordinator.

figure 1

The topology of ZigBee network. Figure shows the network structure of ZigBee; it contains two important roles: coordinator and terminal node, which together constitute the simplest ZigBee communication process

3.2 Establishing a traffic model based on machine learning

In machine learning for traffic analysis, assessing effectiveness requires data for training and testing. This paper uses Cambridge University's Moore dataset as the training and test set for traffic classification. The dataset was collected with a high-performance network monitor that provides time stamps with a resolution better than 35 ns. It consists of many objects, each described by a set of features. This dataset was selected because it is close to real network conditions and contains a large amount of manually classified data. Each object in each dataset represents a single TCP flow between a client and a server [ 16 ]. The features of each object include the class label, which was derived elsewhere, and many derived features that serve as inputs to probabilistic classification techniques. The features are derived from header information alone, while the class labels are derived using content-based analysis.

The dataset contains 10 sub-datasets, totaling 377,536 flows and 249 features. The 11 traffic types involved include WWW, FTP, DATABASE, P2P, SERVICE, MAIL and ATTACK. The characteristics of each subset are shown in Table 1 .

In this table, duration refers to the time from the beginning of monitoring to the end of the data flow. Each sub-dataset has a different number of flows and a different duration. All flows are TCP flows, so they have clear start and end markers. Each flow corresponds to one traffic type. With machine learning, classification models can be trained on these flows and then used to classify actual traffic.
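A minimal sketch of training and testing flow classifiers is given below. It assumes scikit-learn is available and uses synthetic data from `make_classification` as a stand-in for the Moore dataset, so the sample counts, feature counts, class count and accuracies are illustrative only and say nothing about the paper's results. Decision tree and random forest are two of the algorithms the paper names.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for flow records: 1,000 flows, 20 numeric features,
# 3 traffic classes (all of these sizes are assumptions).
X, y = make_classification(
    n_samples=1000, n_features=20, n_informative=8,
    n_classes=3, random_state=0,
)

# Hold out 30% of the flows as test samples, keeping class proportions.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0
)

models = {
    "decision tree": DecisionTreeClassifier(random_state=0),
    "random forest": RandomForestClassifier(n_estimators=100, random_state=0),
}

accuracies = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    accuracies[name] = accuracy_score(y_test, model.predict(X_test))
    print(f"{name}: test accuracy = {accuracies[name]:.3f}")
```

With real data, `X` would hold the per-flow features and `y` the traffic type of each flow; the held-out split plays the role of the test samples used to assess the model.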

3.3 Data pre-processing

A dataset is rarely perfect. It may mix data types, such as text, numbers and time series, or continuous and discrete values. Data quality may also be poor: noise, anomalies, missing values, errors, inconsistent scales, duplicates, skewed distributions, and too much or too little data. For the data to fit the model, the Moore dataset needs to be pre-processed: inaccurate or unsuitable records are detected and then corrected or deleted [ 8 , 13 ]. Data pre-processing methods include removing unique attributes, handling missing values, attribute encoding, data standardization and regularization, feature selection, principal component analysis and so on.

In machine learning, most algorithms, such as logistic regression, support vector machine (SVM) and k -nearest neighbor, can only process numeric data and cannot process text. In sklearn, apart from the algorithms designed for text, all algorithms require numeric arrays or matrices as training input and cannot accept text data. Some of the data in this dataset contain the characters Y and N, which machine learning algorithms cannot process directly; attribute encoding maps Y to 1 and N to 0. When the maximum segment size cannot be determined during a network connection, the dataset records it as '?', so some continuous features contain '?' entries. We fill these with the feature mean plus Gaussian white noise.
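The two pre-processing steps just described can be sketched with NumPy. The function names `encode_yes_no` and `fill_missing_with_noisy_mean` are hypothetical, and the noise level (`noise_scale` times the standard deviation of the observed values) is an assumption, since the text does not state the variance of the Gaussian noise.

```python
import numpy as np


def encode_yes_no(column):
    """Attribute encoding: map 'Y' to 1.0 and 'N' to 0.0."""
    mapping = {"Y": 1.0, "N": 0.0}
    return np.array([mapping[value] for value in column], dtype=float)


def fill_missing_with_noisy_mean(column, noise_scale=0.01, seed=0):
    """Replace '?' entries with the mean of the observed values plus Gaussian noise."""
    rng = np.random.default_rng(seed)
    values = np.array(
        [np.nan if value == "?" else float(value) for value in column],
        dtype=float,
    )
    missing = np.isnan(values)
    observed = values[~missing]
    # Assumed noise level: a small fraction of the observed standard deviation.
    noise = rng.normal(0.0, noise_scale * observed.std(), size=int(missing.sum()))
    values[missing] = observed.mean() + noise
    return values


print(encode_yes_no(["Y", "N", "Y"]))
print(fill_missing_with_noisy_mean(["1460", "?", "1380", "?"]))
```

Adding a little noise keeps the filled entries from being exactly identical, so the filled feature does not collapse onto a single repeated value.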

From Fig. 2 , it can be seen that the standard deviation and mean values of some features in the dataset are unusually large, reaching 10e17 and 10e15. For such feature data, data regularization is used: each sample is scaled individually to unit norm. The specific process is as follows:

Dataset D is defined as: \(D = \{ (\vec{x}_{1} ,y_{1} ),(\vec{x}_{2} ,y_{2} ), \ldots ,(\vec{x}_{n} ,y_{n} )\} ,\quad \vec{x}_{i} = (x^{(1)}_{i} ,x^{(2)}_{i} , \ldots ,x^{(d)}_{i} )^{{\text{T}}}\) .

Calculate the L p norm: \(L_{p} (\vec{x}_{i} ) = \left( {|x^{(1)}_{i} |^{p} + |x^{(2)}_{i} |^{p} + \cdots + |x^{(d)}_{i} |^{p} } \right)^{\frac{1}{p}}\) .

Calculate sample regularization: \(\vec{x}_{i} = \left( {\frac{{x^{(1)}_{i} }}{{L_{p} (\vec{x}_{i} )}},\frac{{x^{(2)}_{i} }}{{L_{p} (\vec{x}_{i} )}}, \ldots ,\frac{{x^{(d)}_{i} }}{{L_{p} (\vec{x}_{i} )}}} \right)^{{\text{T}}}\) .
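The per-sample scaling can be sketched in NumPy as follows. The function name `lp_normalize` and the choice to leave all-zero samples unchanged are assumptions made here; the text does not say how zero-norm samples are handled.

```python
import numpy as np


def lp_normalize(X, p=2):
    """Scale each row (sample) of X to unit L_p norm."""
    X = np.asarray(X, dtype=float)
    # L_p norm of every sample: (sum_j |x_j|^p)^(1/p)
    norms = np.sum(np.abs(X) ** p, axis=1) ** (1.0 / p)
    # Assumption: an all-zero sample has norm 0 and is left unchanged.
    norms[norms == 0.0] = 1.0
    return X / norms[:, np.newaxis]


X = np.array([[3.0, 4.0], [0.0, 0.0], [1e17, 0.0]])
print(lp_normalize(X, p=2))
print(lp_normalize(X, p=1))
```

After scaling, every component of a nonzero sample lies in [-1, 1], so a feature with values around 1e17 no longer dominates the others. scikit-learn's `sklearn.preprocessing.normalize` offers the same per-sample scaling for the `l1`, `l2` and `max` norms.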

figure 2

Statistical characteristics of raw data. a The mean values of features in the dataset. b The standard deviation of features in the dataset. c The 25% quartile values of features in the dataset. d The 50% quartile values of features in the dataset

This paper recalculates the statistical characteristics after filling in or replacing abnormal feature values and normalizing the data. Figure 3 describes the normalized dataset with some statistical features, including the mean, standard deviation, 25% quartile and median.

figure 3

Statistical characteristics of normalized data. a The mean values of features in the normalized dataset. b The standard deviation of features in the normalized dataset. c The 25% quartile values of features in the normalized dataset. d The 50% quartile values of features in the normalized dataset

3.4 Data feature processing

When data pre-processing is complete, we need to select meaningful features to input into the machine learning algorithm and model for training. Exploratory analysis of the data often reveals that too many features have been introduced to model and analyze directly. The original features then need further screening so that only important features are retained. Generally, features are selected from two perspectives:

Whether a feature is divergent or not: If a feature does not diverge, for example, if its variance is small, then the samples differ little on this feature. Most of its values may be the same, or even all of them, in which case the feature does nothing to differentiate samples.

Relevance of features to objectives: Features that are highly relevant to the objective should be selected. Apart from the variance method, the methods described in this paper are concerned with correlation.

According to the form of feature selection, there are three feature selection methods:

Filter: The filter method scores each feature according to its divergence or correlation, sets a threshold or the number of features to be selected, and selects features accordingly.

Wrapper: The wrapper method selects or excludes several features at a time based on an objective function, for example recursive feature elimination, which trains a base model over multiple rounds. After each round of training, the features with the smallest weight coefficients are eliminated, and the next round of training uses the remaining set of features.

Embedded: The embedded method first trains a machine learning model, obtains the weight coefficient of each feature and selects features in descending order of coefficient. It is similar to the filter method, except that the quality of the features is determined by training.

To find the best traffic classification model, this paper compares several feature selection algorithms.

3.4.1 Variance filtering

The optimal hyperparameter (here, the variance threshold) could be chosen by drawing a learning curve and locating the model's best point. However, this takes a lot of time and improves the model only marginally. In this paper, variance filtering with a threshold of 0.001 is therefore used first to eliminate features that are obviously not needed, and a stronger feature selection method is then applied to reduce the number of features further. Variance filtering removes the features whose variance is below the threshold, leaving 240 features.

After variance filtering, the next step is to select the features that are related to the target label and therefore carry useful information. A feature unrelated to the label only wastes memory and may add noise to the model. Three common methods can be used to assess the correlation between features and labels: the Chi-square test, the F test and mutual information.
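Variance filtering as described here can be sketched as follows. This is a minimal standard-library illustration, not the authors' code: the function name `variance_filter` and the toy data are hypothetical, and the use of the population variance is an assumption.

```python
from statistics import pvariance


def variance_filter(rows, threshold=0.001):
    """Return the indices of the columns whose variance is at least threshold.

    rows is a list of samples, each a list of feature values.
    Assumption: the population variance is used.
    """
    columns = list(zip(*rows))  # transpose: one tuple per feature
    return [j for j, col in enumerate(columns) if pvariance(col) >= threshold]


# Toy data: feature 0 varies, feature 1 is constant, feature 2 barely moves.
rows = [
    [0.10, 1.0, 0.5000],
    [0.90, 1.0, 0.5001],
    [0.40, 1.0, 0.4999],
    [0.70, 1.0, 0.5000],
]
kept = variance_filter(rows)  # indices of the surviving features
reduced = [[row[j] for j in kept] for row in rows]
```

Because the threshold is compared with raw variances, the result depends on feature scale, which is one reason to normalize the data before filtering.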

3.4.2 Chi-square filtration

Chi-square filtering is a correlation filter designed for discrete labels. The Chi-square test computes the Chi-square statistic between each non-negative feature and the label and ranks the features by this statistic from high to low. The K highest-scoring features are then selected, which removes the features that are most likely independent of the label and irrelevant to classification. In addition, if the Chi-square test detects that all values of a feature are identical, it prompts us to apply variance filtering first. The choice of K strongly affects model performance, so the best K has to be searched for.
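A rough sketch of Chi-square scoring and top-K selection is given below. It assumes the common formulation for non-negative features, in which the observed value for a class is the sum of the feature over the samples of that class and the expected value is the class frequency times the feature total. The function names and the toy traffic labels are hypothetical.

```python
def chi2_scores(rows, labels):
    """Return one Chi-square score per feature (higher means more label-dependent)."""
    if any(v < 0 for row in rows for v in row):
        raise ValueError("Chi-square scoring requires non-negative features")
    n = len(rows)
    classes = sorted(set(labels))
    class_prob = {c: labels.count(c) / n for c in classes}
    scores = []
    for j in range(len(rows[0])):
        total = sum(row[j] for row in rows)
        score = 0.0
        for c in classes:
            observed = sum(row[j] for row, y in zip(rows, labels) if y == c)
            expected = class_prob[c] * total
            if expected > 0:
                score += (observed - expected) ** 2 / expected
        scores.append(score)
    return scores


def select_k_best(scores, k):
    """Return the indices of the k highest-scoring features, in column order."""
    ranked = sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)
    return sorted(ranked[:k])


# Toy data: feature 0 is concentrated in one class, feature 1 is constant.
rows = [[5, 1], [4, 1], [0, 1], [1, 1]]
labels = ["web", "web", "mail", "mail"]
scores = chi2_scores(rows, labels)
best = select_k_best(scores, k=1)
```

The constant feature scores zero, which mirrors the remark above that such features should already have been removed by variance filtering.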

The F test, also known as ANOVA or the variance homogeneity test, is a filtering method that captures the linear relationship between each feature and the label. It can be used for regression or classification: F test classification applies when the label is a discrete variable, and F test regression when it is continuous. The output statistics can be used directly to decide what K to set. Note that the F test is most stable when the data follow a normal distribution, so the data should be transformed toward a normal distribution before F test filtering. In essence, the F test looks for a linear relationship between two sets of data under the null hypothesis that no significant linear relationship exists. It returns two statistics, F and P . As with Chi-square filtering, we select features with P values below 0.05 or 0.01, which are significantly linearly related to the label; features with P values above that level are considered to have no significant linear relationship with the label and should be deleted.
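For the classification case, the F statistic of a single feature can be sketched as a one-way ANOVA: the variance between class means divided by the variance within classes. The sketch below computes only F; obtaining the P value additionally requires the cumulative F distribution, which the Python standard library does not provide, so it is left out. The function name `anova_f` is hypothetical.

```python
from statistics import fmean


def anova_f(values, labels):
    """Return the one-way ANOVA F statistic of one feature against class labels."""
    groups = {}
    for v, y in zip(values, labels):
        groups.setdefault(y, []).append(v)
    n, k = len(values), len(groups)
    if k < 2:
        raise ValueError("at least two classes are required")
    grand_mean = fmean(values)
    # Variation of the class means around the overall mean.
    ss_between = sum(len(g) * (fmean(g) - grand_mean) ** 2 for g in groups.values())
    # Variation of the samples around their own class mean.
    ss_within = sum((v - fmean(g)) ** 2 for g in groups.values() for v in g)
    if ss_within == 0.0:
        return float("inf") if ss_between > 0 else 0.0
    return (ss_between / (k - 1)) / (ss_within / (n - k))


labels = ["a", "a", "a", "b", "b", "b"]
f_separating = anova_f([1.0, 2.0, 3.0, 7.0, 8.0, 9.0], labels)  # class means differ
f_useless = anova_f([1.0, 2.0, 3.0, 1.0, 2.0, 3.0], labels)     # class means equal
```

A large F means the class means are far apart relative to the spread inside each class, so the feature helps separate the classes.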

Mutual information is a filtering method used to capture any relationship (both linear and nonlinear) between each feature and the label. Like the F test, it can be used for both regression and classification, in the form of mutual information classification and mutual information regression. Both are used in the same way and with the same parameters as the F test, but the mutual information method is more powerful: the F test can only find linear relationships, whereas mutual information can find any relationship. Mutual information does not return statistics similar to P or F values. It returns an estimate of the amount of mutual information between each feature and the target, which is non-negative and, once normalized, lies in [0, 1]. A value of 0 indicates that the two variables are independent, and a normalized value of 1 indicates that the two variables are fully dependent.
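For discrete variables, mutual information can be computed directly from the joint and marginal frequencies. The sketch below does this and also shows one possible normalization to [0, 1], dividing by the smaller of the two entropies; that choice of normalization is an assumption, and continuous features would first need binning or a different estimator. The function names are hypothetical.

```python
from collections import Counter
from math import log


def entropy(values):
    """Shannon entropy (in nats) of a sequence of discrete values."""
    n = len(values)
    return -sum((c / n) * log(c / n) for c in Counter(values).values())


def mutual_information(xs, ys):
    """Mutual information (in nats) between two discrete sequences."""
    n = len(xs)
    px, py, pxy = Counter(xs), Counter(ys), Counter(zip(xs, ys))
    return sum(
        (c / n) * log((c / n) / ((px[x] / n) * (py[y] / n)))
        for (x, y), c in pxy.items()
    )


def normalized_mi(xs, ys):
    """Mutual information divided by the smaller entropy, so it lies in [0, 1]."""
    denom = min(entropy(xs), entropy(ys))
    return mutual_information(xs, ys) / denom if denom > 0 else 0.0


label = [0, 0, 1, 1]
dependent = normalized_mi([0, 0, 1, 1], label)    # feature determines the label
independent = normalized_mi([0, 1, 0, 1], label)  # feature says nothing about it
```

Unlike the F statistic, this score does not assume any particular shape of relationship between the feature and the label.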

3.4.3 Lasso

Lasso seeks the smallest residual sum of squares subject to the sum of the absolute values of the model coefficients being less than a constant. In variable selection it is often preferred over stepwise regression, principal component regression, ridge regression and partial least squares, and it overcomes some shortcomings of traditional model selection methods. Lasso regression is a regularization method and a shrinkage (compressed) estimator: by adding a penalty function it shrinks some coefficients and sets others exactly to zero. It thus retains the advantage of subset selection and gives a biased estimate suited to data with multicollinearity. Because a feature whose coefficient becomes zero drops out of the model, Lasso can be used for feature selection. Model selection is essentially the search for a sparse representation of a model, which can be done by optimizing a loss + penalty function. Compared with least squares estimation and the locally optimal estimates of stepwise regression, Lasso selects features well and handles multicollinearity among features effectively. In its standard penalized form, its objective function can be expressed as: \(\hat{\beta } = \arg \min_{\beta } \left\{ {\sum\nolimits_{i = 1}^{n} {\left( {y_{i} - \beta_{0} - \sum\nolimits_{j = 1}^{d} {\beta_{j} x^{(j)}_{i} } } \right)^{2} } + \lambda \sum\nolimits_{j = 1}^{d} {|\beta_{j} |} } \right\}\) , where \(y_{i}\) is the target value of sample i , \(\beta_{j}\) is the coefficient of feature j and \(\lambda \ge 0\) controls the strength of the penalty.
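To illustrate how an L1 penalty sets coefficients exactly to zero, the sketch below minimizes \(\frac{1}{2n}\|y - X\beta \|_{2}^{2} + \lambda \|\beta \|_{1}\) by coordinate descent with soft-thresholding. This is one common way to solve Lasso, not necessarily the solver used in this paper. It assumes centered data (no intercept), a fixed number of sweeps instead of a convergence test, and hypothetical function names and toy data.

```python
def soft_threshold(rho, lam):
    """Shrink rho toward zero by lam; values within [-lam, lam] become exactly 0."""
    if rho > lam:
        return rho - lam
    if rho < -lam:
        return rho + lam
    return 0.0


def lasso_coordinate_descent(X, y, lam, n_iter=100):
    """Return Lasso coefficients for centered data X (rows of samples) and targets y."""
    n, d = len(X), len(X[0])
    beta = [0.0] * d
    for _ in range(n_iter):
        for j in range(d):
            rho, z = 0.0, 0.0
            for i in range(n):
                # Prediction from all features except j (the partial residual).
                pred_without_j = sum(X[i][k] * beta[k] for k in range(d) if k != j)
                rho += X[i][j] * (y[i] - pred_without_j)
                z += X[i][j] ** 2
            rho, z = rho / n, z / n
            beta[j] = soft_threshold(rho, lam) / z if z > 0 else 0.0
    return beta


# Toy data: y = 2 * x0 + 0.2 * x1, so feature 1 carries only a weak signal.
X = [[1.0, 1.0], [-1.0, 1.0], [1.0, -1.0], [-1.0, -1.0]]
y = [2.2, -1.8, 1.8, -2.2]
beta = lasso_coordinate_descent(X, y, lam=0.5)
selected = [j for j, b in enumerate(beta) if b != 0.0]
```

With this penalty the weak feature is dropped and the strong coefficient is shrunk from 2 to about 1.5, which shows both the selection effect and the bias that the text mentions.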

3.5 Model training

Training a machine learning model starts by defining a loss function. Input samples are fed in and predictions are obtained by forward propagation. Comparing the predictions with the true labels gives the loss value, and backpropagation is then used to update the weights. This is iterated until the loss is small and the accuracy reaches the desired value; the parameters at that point are the ones the model needs, and the model is built. This paper divides the dataset into a training set and a test set at a ratio of 8 to 2. The training data is first used to train a preliminary model, and the test data is then used to check whether the model overfits.
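The 8 to 2 split can be sketched as below. The shuffling, the fixed seed and the function name `train_test_split` are assumptions made for the illustration; the text does not say whether the split was random or stratified.

```python
import random


def train_test_split(samples, labels, test_ratio=0.2, seed=0):
    """Shuffle the sample indices and split them into a training and a test part."""
    indices = list(range(len(samples)))
    random.Random(seed).shuffle(indices)  # seeded for reproducibility
    n_test = round(len(indices) * test_ratio)
    test_idx, train_idx = indices[:n_test], indices[n_test:]
    train = [(samples[i], labels[i]) for i in train_idx]
    test = [(samples[i], labels[i]) for i in test_idx]
    return train, test


samples = [[float(i)] for i in range(10)]
labels = [i % 2 for i in range(10)]
train, test = train_test_split(samples, labels)
# A training accuracy far above the test accuracy would hint at overfitting.
```

Keeping the test part untouched during training is what makes the later accuracy comparison meaningful.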

4 Results and discussion

4.1 Chi-square filtration test

Given the characteristics of the Moore dataset, this paper chooses the Chi-square filtering method, sets appropriate threshold parameters, treats K as the parameter to be tuned, and plots a learning curve with the accuracy of model training as the evaluation index.

As Fig.  4 shows, the accuracy of the model increases rapidly with K in the initial stage. At K  = 12 the accuracy decreases, and it then fluctuates slightly as K increases further. According to the curve, the best K value is 25, so modeling with only these 25 features is enough to obtain a good classification model. Finally, a random forest model trained with cross-validation at K  = 25 reaches an accuracy of 0.9944.

figure 4

Learning curve. The accuracy of the model increases rapidly with K in the initial stage. Feature number: the number of features in the dataset. Accuracy: the accuracy of the trained model
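The search for the best K can be sketched as a loop over candidate values that scores the model on the top K ranked features each time. To stay self-contained, the sketch swaps in a 1-nearest-neighbour classifier with leave-one-out accuracy in place of the cross-validated random forest used in this paper, and it assumes the feature ranking has already been computed. All names and the toy data are hypothetical.

```python
def loo_accuracy(rows, labels, features):
    """Leave-one-out accuracy of a 1-nearest-neighbour classifier on the given features."""
    correct = 0
    for i, (row, label) in enumerate(zip(rows, labels)):
        best_dist, best_label = None, None
        for j, (other, other_label) in enumerate(zip(rows, labels)):
            if i == j:
                continue  # never compare a sample with itself
            dist = sum((row[f] - other[f]) ** 2 for f in features)
            if best_dist is None or dist < best_dist:
                best_dist, best_label = dist, other_label
        correct += best_label == label
    return correct / len(rows)


def learning_curve(rows, labels, ranked_features, ks):
    """Map each K to the accuracy obtained with the K best-ranked features."""
    return {k: loo_accuracy(rows, labels, ranked_features[:k]) for k in ks}


# Toy data: feature 0 separates the classes, feature 1 is large-scale noise.
rows = [[0.0, 10.0], [0.1, -10.0], [0.2, 0.0],
        [1.0, 10.1], [1.1, -10.1], [1.2, 0.1]]
labels = ["a", "a", "a", "b", "b", "b"]
curve = learning_curve(rows, labels, ranked_features=[0, 1], ks=[1, 2])
best_k = max(curve, key=curve.get)
```

In this toy case adding the second feature lowers the accuracy, which is the same kind of dip a learning curve over K is meant to reveal.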

4.2 LassoCV test

This article uses the LassoCV class in the linear_model module of the Python scikit-learn library to obtain the weight coefficient of each feature under the optimal regularization parameter, and thereby to perform feature selection on the dataset. It selects the top five features by absolute value of the coefficient, as shown in Fig.  5 .

figure 5

Coefficients in the Lasso model. The x -axis shows the weight coefficient under the optimal regularization parameter. The y -axis shows the features whose weight coefficients are larger than the others

It can be seen from Fig.  5 that the features with high weights include FFT_ba_Freq4, RTT_max_ba, RTT_avg_b_a, var_data_ip_ab, FFT_ab_Freq4, SYN_pkts_sent_ab, FFT_ba_Freq3, FFT_all_Freq4, FFT_ab_Freq. In particular, the FFT features of the packet inter-arrival time (IAT) are especially prominent and are very important for identifying the type of traffic. In this paper, the 20 features with the largest absolute weights are selected for model training, and the accuracy is only 0.755, much lower than with Chi-square filtering. The reason is that Lasso is designed for multicollinearity problems and limits the impact of multicollinearity; it performs better on datasets with linear relationships. Its poor performance here suggests that the Moore dataset is nonlinear.

4.3 Capturing traffic for classification

As shown in Fig.  6 , we use capture software to record the traffic over a certain period of time and compute statistics on the packets in that period. At a later stage, we will extract the important features of this traffic following the Moore dataset, input them into the model to identify the types of traffic passing in the period, and obtain per-category traffic statistics for each periodic time window.

figure 6

IO traffic. The x -axis shows the capture time. The y -axis shows the number of packets per second
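The per-second packet counts plotted in such an IO graph can be derived from packet timestamps as sketched below. The sketch assumes the timestamps have already been exported from the capture tool as seconds; the function name `packets_per_second` and the sample values are hypothetical.

```python
from collections import Counter


def packets_per_second(timestamps):
    """Count packets in consecutive one-second bins, starting at the first packet."""
    if not timestamps:
        return []
    start = min(timestamps)
    counts = Counter(int(t - start) for t in timestamps)
    # Seconds in which no packet arrived are reported as 0.
    return [counts[s] for s in range(max(counts) + 1)]


# Arrival times in seconds: three packets in the first second, none in the third.
timestamps = [10.0, 10.2, 10.9, 11.5, 13.1]
io_series = packets_per_second(timestamps)
```

The same binning extends to longer windows by dividing the relative time by the window length before truncating.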

5 Conclusions

The Internet of things and 5G have gradually become part of the daily life of the general public, and more and more intelligent home appliances have entered the home. With the frequent use of the Internet in daily life, network slicing is becoming more mature, and data traffic has increased dramatically. This paper takes the home as the research scenario and, combined with the Internet of things, designs a home traffic analysis system. The system helps family members understand the household's Internet traffic statistics, identify intrusions by malware or attacks, and judge from the sensor traffic uploaded by home appliances whether anything is abnormal, which addresses the problem of traffic islands. It is extensible, accurate and easy to integrate. We compare different machine learning algorithms for feature selection by experiment. Different algorithms perform differently on different datasets, and no algorithm is best in every case; in this paper, because the dataset is nonlinear, the Chi-square filtering algorithm has a clear advantage. The resulting accuracy of 0.9944 provides a good reference model for the later classification of real traffic.

Availability of data and materials

The datasets used and/or analyzed during the current study are available from the corresponding author on reasonable request.

Abbreviations

DBSCAN: Density-based spatial clustering with noise

MAC: Media access control

SFS: Sequential forward selection

EM: Expectation maximization

HAL: Hardware abstraction layer

IAT: Inter arrival time

OSAL: Operating system abstraction layer


Acknowledgements

The authors thank the editor and anonymous reviewers for their helpful comments and valuable suggestions.

Author information

Authors and affiliations.

School of Artificial Intelligence, Nanning College for Vocational Technology, Nanning, 530008, Guangxi, China

Feng Xie & Dongxue Wei

School of Computer, Electronics and Information, Guangxi University, Nanning, 530004, China

Zhencheng Wang


Contributions

All authors take part in the discussion of the work described in this paper. All authors read and approved the final manuscript.

Corresponding author

Correspondence to Feng Xie .

Ethics declarations

Competing interests.

The authors declare that they have no competing interests.

Additional information

Publisher's note.

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/ .


About this article

Cite this article.

Xie, F., Wei, D. & Wang, Z. Traffic analysis for 5G network slice based on machine learning. J Wireless Com Network 2021 , 108 (2021). https://doi.org/10.1186/s13638-021-01991-7


Received : 11 December 2020

Accepted : 16 April 2021

Published : 28 April 2021

DOI : https://doi.org/10.1186/s13638-021-01991-7


  • Network slice
  • Machine learning
  • Traffic analysis



COMMENTS

  1. Review Deep Learning for Network Traffic Monitoring and Analysis (NTMA): A Survey

    Several survey papers about traffic data acquisition methods, approaches ... The work in [29] by Conti et al. conducts an in-depth survey on network traffic analysis. They categorize the relevant works into three criteria: (1) the aim of the analysis, (2) the point in the network where the traffic is monitored, and (3) the selected mobile ...

  2. A review of network traffic analysis and prediction techniques

    This paper presents a. review of several techniques proposed, used and. practiced for network traffic analysis and prediction. The distinctiveness and restrictions of previ ous. researches are ...

  3. network traffic analysis Latest Research Papers

    Recently, a majority of security operations centers (SOCs) have been facing a critical issue of increased adoption of transport layer security (TLS) encryption on the Internet, in network traffic analysis (NTA). To this end, in this survey article, we present existing research on NTA and related areas, primarily focusing on TLS-encrypted ...

  4. Network Traffic Analysis, Importance, Techniques: A Review

    Abstract: Since the release of tcpdump in 1988 the network traffic data are been captured, analyzed and used for network security related decision making. But as the technologies evolved various types of methods can be used for network traffic analysis. The prominent once are Data Mining techniques, Statistical techniques and Visualization techniques, which are surveyed and studied in the ...

  5. Network traffic analysis using machine learning: an unsupervised

    The existing research with the most similar context to this paper is presented in [] where the authors have discussed software defined networks (SDN), network function virtualization (NFV), Machine learning, and big data driven network slicing for 5G.In this work, they have proposed an architecture to classify network traffic and used those decisions for network slicing.

  6. Advanced Techniques in Network Traffic Analysis ...

    The escalating frequency and sophistication of cyber-attacks have placed a spotlight on the importance of Quality of Service (QoS) and robust network security mechanisms. Effective traffic analysis and distribution are critical for maintaining the integrity of network applications and safeguarding data. Our research focuses on the development and assessment of a network intrusion detection ...

  7. A Review of Network Traffic Analysis and Prediction Techniques

    researchers for network traffic analysis. Figure 1. Generic structure of network traffic analysis 1. DARPA data set KDD cup data has been the most widely used for evaluating of network traffic analysis with respect to intrusion detection. This data set is presented by Stolfo et al. [76]. It is constructed based on the data

  8. Machine Learning for Traffic Analysis: A Review

    Machine learning (ML) shows effective capabilities in solving network problems. A review of the techniques used in the traffic analysis is presented in this paper. © 2020 The Authors. ... The traffic statistics from network traffic analysis helps in understanding and evaluating networks utilization, download\upload speeds and type, size ...

  9. NTARC: A Data Model for the Systematic Review of Network Traffic ...

    The increased interest in secure and reliable communications has turned the analysis of network traffic data into a predominant topic. A high number of research papers propose methods to classify traffic, detect anomalies, or identify attacks. Although the goals and methodologies are commonly similar, we lack initiatives to categorize the data, methods, and findings systematically. In this ...

  10. Data Transformation Schemes for CNN-Based Network Traffic Analysis: A

    There is a striking change in the number of research papers that are devoted to the analysis of network traffic by CNN models (see Figure 1). To enhance the analysis, we distinguish three possible subjects of the articles: malware detection, traffic classification, and the junction of both.

  11. Network traffic classification: Techniques, datasets, and challenges

    Providing future directions of possible research in the area of network traffic classification. ... For each category, the paper provided an analysis of its workflow, advantages, disadvantages and deployed features. Although discussing different network traffic classification techniques, the conducted surveys suffer from multiple shortcomings ...

  12. PDF Towards Reproducible Network Traffic Analysis

    to meet our requirements and standardize network traffic analysis research. Rather than focus on a standardized feature set, which previous work has called for, pcapML standardizes network traffic analysis research at the dataset level [12, 18]. pcapML does this by enabling researchers to encode metadata and traffic definitions (i.e.,

  13. Machine Learning-Powered Encrypted Network Traffic Analysis: A

    Traffic analysis is the process of monitoring network activities, discovering specific patterns, and gleaning valuable information from network traffic. It can be applied in various fields such as network assert probing and anomaly detection. With the advent of network traffic encryption, however, traffic analysis becomes an arduous task. Due to the invisibility of packet payload, traditional ...

  14. Network Traffic Analysis using Machine Learning: an unsupervised

    { Cluster analysis: We focus on the analysis of each cluster (i.e., future slice de nition) and explain its behavior according to the property of selected features. The rest of the paper is organized as follows: Section 2 presents related work of similar researches that use ML for tra c analysis and network slic-ing.

  15. A Review of Network Traffic Analysis and Prediction Techniques

    Network traffic analysis and prediction is a proactive approach to ensure secure, reliable and qualitative network communication. ... This paper will review past research conducted on hybrid network traffic prediction models with a summary of the strengths and limitations of existing hybrid network prediction models which use optimization and ...

  16. DataSpace: A Generic Framework For Network Traffic Analysis

    Issue Date: 2022. Publisher: Princeton, NJ : Princeton University. Abstract: Researchers and practitioners rely on network traffic analysis techniques for a variety of critical network security and network management tasks. Ever-increasing traffic volumes and encryption rates have rendered traditional, signature-based solutions less effective.

  17. A Methodical Review on Network traffic monitoring and Analysis tools

    Abstract. Network traffic monitoring is observation of the inflow and outflow of traffic moving in-across the network. The continuous monitoring is required for various purposes such as intrusion ...

  18. PDF Network Traffic Analysis and Prediction Using Machine Learning

    International Journal of Research Publication and Reviews, Vol 4, no 8, pp 2071-2075, August 2023. Feature Extraction: ML algorithms can automatically extract relevant features from raw network traffic data, reducing the need for manual feature engineering. This capability is particularly useful in capturing intricate characteristics that might be challenging to define explicitly.

  19. Traffic analysis for 5G network slice based on machine learning

    At present, there are few studies on the combination of machine learning, traffic analysis, network slice and the Internet of things, and the function of adding sensor data to the Internet traffic participation classification is not implemented [1,2,3,4]. To solve this problem, under the background of 5G network slice, this paper proposes a home traffic analysis system combined with Internet of ...

  20. Complex-network-based traffic network analysis and dynamics: A

    Therefore, this paper aims to provide a comprehensive review covering the following three aspects: (1) the state-of-the-art research on complex-network-based traffic network analysis and dynamics, (2) the critical topological metrics and statistical properties of traffic networks, and (3) the main issues in research on traffic ...

  21. Network traffic analysis using machine learning: an unsupervised

    In this research, a support vector machine learning technique was used to classify normal and abnormal traffic. Network traffic analysis has been done to detect and prevent the network from malicious traffic. Static and dynamic analysis of malware has been done.

  22. Network Traffic Analysis and Prediction

    The analysis and forecasting of network traffic is a means of reliable and secure network communication. Many techniques have been proposed for network traffic congestion analysis like soft computing, regression, etc. The present study is a survey of various works carried out in the field of network traffic analysis and prediction.

  24. Network Traffic Monitoring and Analysis Using Packet Sniffer

    Traffic analysis using the internet is an activity to record data from user activities in using the Internet. This study aims to obtain data about the results of traffic in a graphical form so that the number of users who access the internet and use bandwidth can be determined. In this study, researchers also noted the peak internet usage time at Telkom Vocational School Pekanbaru. The ...