
  • Survey paper
  • Open access
  • Published: 06 June 2019

Intelligent video surveillance: a review through deep learning techniques for crowd analysis

  • G. Sreenu (ORCID: orcid.org/0000-0002-2298-9177)
  • M. A. Saleem Durai

Journal of Big Data, volume 6, Article number: 48 (2019)


Big data applications occupy a major share of effort in industry and research. Among the widespread sources of big data, video streams from CCTV cameras are as important as social media data, sensor data, agricultural data, medical data and data from space research. Surveillance videos are a major contributor to unstructured big data. CCTV cameras are deployed in all places where security is of high importance, and manual surveillance is tedious and time consuming. Security can be defined differently in different contexts, such as theft identification, violence detection or the chance of an explosion. In crowded public places the term security covers almost all types of abnormal events. Among them, violence detection is difficult to handle since it involves group activity. Analyzing anomalous or abnormal activity in a crowd video scene is very difficult due to several real-world constraints. This paper presents a deep-rooted survey that starts from object recognition and action recognition and proceeds to crowd analysis and, finally, violence detection in a crowd environment. The majority of the papers reviewed in this survey are based on deep learning techniques, and various deep learning methods are compared in terms of their algorithms and models. The main focus of this survey is the application of deep learning techniques in detecting the exact count, the persons involved and the activity taking place in a large crowd under all climatic conditions. The paper discusses the underlying deep learning implementation technology involved in various crowd video analysis methods. Real-time processing, an important issue that is yet to be explored further in this field, is also considered. Few methods handle all these issues simultaneously. The issues in existing methods are identified and summarized, and future directions are given to reduce the obstacles identified. The survey also provides a bibliographic summary of papers from ScienceDirect, IEEE Xplore and the ACM digital library.

Bibliographic summary of papers in different digital repositories

A bibliographic summary of papers published in the area "surveillance video analysis through deep learning" in digital repositories such as ScienceDirect, IEEE Xplore and ACM is demonstrated graphically.

ScienceDirect

ScienceDirect lists around 1851 papers in this area. Figure 1 shows the year-wise statistics.

Figure 1: Year-wise paper statistics of "surveillance video analysis by deep learning" in ScienceDirect

Table 1 lists the titles of 25 papers published in the same area.

Table 2 lists the ScienceDirect journals in which the above-mentioned papers were published.

Keywords indicate the main disciplines of a paper, so an analysis was conducted of the keywords used in the published papers. Table 3 lists the most frequently used keywords and their frequencies.

ACM digital library

The ACM digital library includes 20,975 papers in the given area. Table 4 lists the details of the most recently published surveillance video analysis papers in the deep learning field.

IEEE Xplore

Table 5 shows details of papers published in the given area in the IEEE Xplore digital library.

Violence detection among crowd

The survey so far has treated surveillance video analysis as a general topic. Going deeper into the area, more focus is given to violence detection in crowd behavior analysis.

Table 6 lists papers specific to "violence detection in crowd behavior" from the three repositories mentioned above.

Introduction

Artificial intelligence paves the way for computers to think like humans, and machine learning smooths that path by adding training and learning components. The availability of huge datasets and high-performance computers has led to deep learning, which automatically extracts features, or the factors of variation, that distinguish objects from one another. Among the various data sources contributing terabytes of big data, video surveillance data has much social relevance in today's world. The widespread availability of surveillance data from cameras installed in residential areas, industrial plants, educational institutions and commercial firms contributes private data, while cameras placed in public places such as city centers, public conveyances and religious places contribute public data.

Analysis of surveillance videos involves a series of modules such as object recognition, action recognition and classification of identified actions into categories like anomalous or normal. This survey gives specific focus to solutions based on deep learning architectures. Among the various deep learning architectures, the models commonly used for surveillance analysis are CNNs, auto-encoders and their combinations. The paper "Video surveillance systems: current status and future trends" [ 14 ] compares 20 recently published papers in the area of surveillance video analysis. The paper begins by identifying the main outcomes of video analysis, discusses the application areas where surveillance cameras are unavoidable, reveals the current status and trends in video analysis through a literature review, and finally states explicitly the vital points that need more consideration in the near future.

Surveillance video analysis: relevance in present world

The main objectives identified, which illustrate the relevance of the topic, are listed below.

  • Continuous monitoring of videos is difficult and tiresome for humans.
  • Intelligent surveillance video analysis is a solution to this laborious human task.
  • Intelligence should be visible in all real-world scenarios.
  • Maximum accuracy is needed in object identification and action recognition.
  • Tasks like crowd analysis still need a lot of improvement.
  • The time taken for response generation is highly important in real-world situations.
  • Prediction of certain movements, actions or violence is highly useful in emergency situations like stampedes.
  • Huge amounts of data are available in video form.

The majority of papers covered in this survey give importance to object recognition and action detection. Some papers use procedures similar to binary classification, deciding whether an action is anomalous or not. Methods for crowd analysis and violence detection are also included. The application areas identified are covered in the next section.

Application areas identified

The contexts identified are listed as application areas. A major part of the existing work provides solutions specific to a context.

  • Traffic signals and main junctions
  • Residential areas
  • Crowd-pulling meetings
  • Festivals at religious institutions
  • Inside office buildings

Among the listed contexts, crowd analysis is the most difficult: all types of actions, behaviors and movements need to be identified.

Surveillance video data as Big Data

Big video data has evolved in the form of an increasing number of public cameras placed in public places; a huge number of networked public cameras are positioned worldwide. These cameras generate a heavy data stream that can be creatively exploited for capturing behaviors. Considering the huge amount of data that can be recorded over time, facilities for data warehousing and data analysis become a vital concern. A single high-definition video camera can produce around 10 GB of data per day [ 87 ].
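To make the scale concrete, here is a minimal back-of-the-envelope sketch in Python based on the 10 GB/day figure above; the camera count and retention period are illustrative assumptions, not figures from the survey.

```python
# Back-of-the-envelope storage estimate based on the ~10 GB/day
# per-camera figure cited above. Camera count and retention period
# are hypothetical, not numbers from the survey.
GB_PER_CAMERA_PER_DAY = 10
cameras = 1_000                  # hypothetical city-scale deployment
retention_days = 30              # hypothetical retention policy

raw_gb = GB_PER_CAMERA_PER_DAY * cameras * retention_days
print(f"Raw footage to retain: {raw_gb / 1024:.1f} TB")  # ~293 TB
```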

Allotting the space needed to store large amounts of surveillance video for a long time is difficult. Instead of keeping the raw data, it is more useful to keep the analysis results, which reduces the required storage space. Deep learning techniques involve two main components, training and learning, and both achieve the highest accuracy with huge amounts of data.

The main advantages of training with huge amounts of data are listed below. It is possible to accommodate variety in data representation, and the data can be divided equally into training and testing sets. Various datasets available for analysis are listed below. The datasets include not only video sequences but also frames, since the analysis part mainly covers frames extracted from videos; datasets consisting of images are therefore also useful.

The datasets widely used for implementing various kinds of applications are listed in Table 7 below. The list is not specific to a particular application, even though each dataset is listed against one.

Methods identified/reviewed other than deep learning

The methods identified are classified into two categories: those based on deep learning and those not based on deep learning. This section reviews methods other than deep learning.

SVAS deals with the automatic recognition and deduction of complex events. The event detection procedure consists of two levels: low level and high level. Low-level analysis detects people and objects, and its results feed the high-level analysis, that is, event detection. The architecture proposed in the model includes five main modules:

  • Event model learning
  • Action model learning
  • Action detection
  • Complex event model learning
  • Complex event detection

The proposed model is the interval-based spatio-temporal model (IBSTM), a hybrid event model. In addition, methods such as threshold models, Bayesian networks, bags of actions, highly cohesive intervals and Markov logic networks are used.

The SVAS method can be improved to deal with moving-camera and multi-camera datasets. Further enhancements are needed for dealing with complex events, specifically in areas like calibration and noise elimination.

Multiple anomalous activity detection in videos [ 88 ] is a rule-based system. The features are identified as motion patterns. Anomalous events are detected either by training the system or by following the dominant set property.

In the dominant set concept, events are detected as normal based on dominant behavior, while anomalous events are decided based on less dominant behavior. The advantage of a rule-based system is that new events are easy to recognize by modifying some rules. The main steps involved in the recognition system are:

  • Pre-processing
  • Feature extraction
  • Object tracking
  • Behavior understanding

Video segmentation is used for preprocessing, and background modeling is implemented through a Gaussian mixture model (GMM). External rules are required for object recognition. The system is implemented in Matlab 2014. The areas needing further attention are doubtful activities and situations where multiple objects overlap.
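As a rough illustration of the GMM-based background modeling step (a sketch using OpenCV's MOG2 implementation rather than the authors' Matlab 2014 system; the input file name and parameters are placeholders):

```python
import cv2

# GMM-based background subtraction in the spirit of the preprocessing
# stage described above. "surveillance.mp4" is a placeholder input.
cap = cv2.VideoCapture("surveillance.mp4")
subtractor = cv2.createBackgroundSubtractorMOG2(history=500, varThreshold=16)

while True:
    ok, frame = cap.read()
    if not ok:
        break
    # Foreground mask: pixels poorly explained by the per-pixel
    # Gaussian mixture are marked as moving foreground.
    fg_mask = subtractor.apply(frame)
    # Suppress noise and extract candidate object blobs.
    kernel = cv2.getStructuringElement(cv2.MORPH_ELLIPSE, (5, 5))
    fg_mask = cv2.morphologyEx(fg_mask, cv2.MORPH_OPEN, kernel)
    contours, _ = cv2.findContours(fg_mask, cv2.RETR_EXTERNAL,
                                   cv2.CHAIN_APPROX_SIMPLE)
    blobs = [cv2.boundingRect(c) for c in contours
             if cv2.contourArea(c) > 200]   # candidate objects per frame

cap.release()
```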

Mining anomalous events against frequent sequences in surveillance videos from commercial environments [ 89 ] focuses on abnormal events linked to frequent chains of events. The main benefit of identifying such events is the early deployment of resources in particular areas. The implementation is done in Matlab; the inputs are already-observed events and identified frequent series of events. The main task of this method is to recognize events that are unlikely to follow a given sequential pattern while satisfying user-specified parameters.

The method focuses on event-level analysis, and it would be interesting to extend attention to the entity and action levels. At the same time, going to such a granular level makes the process costly.

Video feature descriptor combining motion and appearance cues with length-invariant characteristics [ 90 ] proposes a feature descriptor. Many trajectory-based methods have been used in numerous installations, but those methods face problems related to occlusion. As a solution, the descriptor uses an optical flow based method.

As per the algorithm, the training set is divided into snippets. Images are extracted from each snippet, optical flow is calculated, and the covariance of the optical flow is then computed. A one-class SVM is used to learn the samples, and the same procedure is performed for testing.
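A hedged sketch of this style of pipeline, assuming OpenCV's Farneback optical flow and scikit-learn's one-class SVM as stand-ins for the paper's components; the synthetic snippets, descriptor shape and nu parameter are illustrative:

```python
import cv2
import numpy as np
from sklearn.svm import OneClassSVM

def snippet_descriptor(frames):
    """Covariance of dense optical flow over a snippet of grayscale
    frames, flattened into a fixed-length vector (a stand-in for the
    paper's covariance feature)."""
    flows = []
    for prev, nxt in zip(frames, frames[1:]):
        flow = cv2.calcOpticalFlowFarneback(prev, nxt, None,
                                            0.5, 3, 15, 3, 5, 1.2, 0)
        flows.append(flow.reshape(-1, 2))           # (pixels, [dx, dy])
    stacked = np.concatenate(flows, axis=0)
    cov = np.cov(stacked, rowvar=False)             # 2x2 flow covariance
    return cov[np.triu_indices(2)]                  # 3-dim descriptor

# Synthetic stand-in data: a list of snippets, each a list of frames.
rng = np.random.default_rng(0)
snippets = [[rng.integers(0, 255, (64, 64), dtype=np.uint8)
             for _ in range(5)] for _ in range(20)]

X = np.stack([snippet_descriptor(s) for s in snippets])
ocsvm = OneClassSVM(kernel="rbf", nu=0.1).fit(X)    # train on normal only
scores = -ocsvm.decision_function(X)                # higher = more anomalous
```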

The model can be extended in the future to handle local abnormal event detection through the proposed feature, which is related to the objectness method.

Multiple hierarchical Dirichlet processes for anomaly detection in traffic [ 91 ] is aimed mainly at understanding real-world traffic situations. The anomalies are due mainly to global patterns, which involve the entire frame, rather than local patterns. The concept of superpixels is included: superpixels are grouped into regions of interest, and an optical flow based method calculates the motion in each superpixel. Points of interest are then extracted in the active superpixels and tracked by a Kanade–Lucas–Tomasi (KLT) tracker.
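A minimal KLT-style tracking sketch using OpenCV's Shi-Tomasi detector and pyramidal Lucas-Kanade, in the spirit of the point tracking step above; the frames and parameters are synthetic stand-ins:

```python
import cv2
import numpy as np

# Track points of interest between two grayscale frames with KLT.
rng = np.random.default_rng(0)
prev = rng.integers(0, 255, (240, 320), dtype=np.uint8)
curr = np.roll(prev, shift=2, axis=1)   # fake 2-pixel horizontal motion

# Shi-Tomasi corners serve as the points of interest.
pts = cv2.goodFeaturesToTrack(prev, maxCorners=100,
                              qualityLevel=0.01, minDistance=7)

# Pyramidal Lucas-Kanade tracks each point into the next frame.
next_pts, status, err = cv2.calcOpticalFlowPyrLK(prev, curr, pts, None,
                                                 winSize=(15, 15), maxLevel=2)
mask = status.ravel() == 1
motion = next_pts[mask] - pts[mask]     # per-point displacement vectors
```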

The method handles videos involving complex patterns better and at lower cost, but it does not mention videos taken in the rainy season or in bad weather conditions.

Intelligent video surveillance beyond robust background modeling [ 92 ] handles complex environments with sudden illumination changes and also reduces false alerts. There are two main components: an intruder detection system (IDS) and a problematic scene descriptor (PSD).

In the first stage, the intruder detection system detects objects; a classifier verifies the result and identifies problematic scenes. In the second stage, the problematic scene descriptor handles the positives generated by the IDS, using global features to avoid false positives.

Though the method deals with complex scenes, it does not mention bad weather conditions.

Towards abnormal trajectory and event detection in video surveillance [ 93 ] works as an integrated pipeline. Existing methods use either trajectory-based or pixel-based approaches; this proposal incorporates both. The proposal includes components such as:

  • Object and group tracking
  • Grid-based analysis
  • Trajectory filtering
  • Abnormal behavior detection using action descriptors

The method can identify abnormal behavior of both individuals and groups. It can be enhanced by adapting it to work in a real-time environment.

RIMOC, a feature to discriminate unstructured motions: application to violence detection for video surveillance [ 94 ]. There is no unique definition of violent behavior; such behaviors show large variance in body poses. The method works by taking the eigenvalues of histograms of optical flow.

The input video undergoes dense sampling, and local spatio-temporal volumes (STVs) are created around each sampled point. The frames of each STV are coded as histograms of optical flow, from which the eigenvalues are computed. The papers already published in the surveillance area span a large set; among them, the methods that are unique in either implementation or target application are listed in Table 8 below.
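A rough sketch of the flavor of such a feature, not the authors' exact RIMOC formulation; the bin count, volume size and synthetic flow fields are illustrative assumptions:

```python
import cv2
import numpy as np

def hof(flow, bins=8):
    """Histogram of optical flow orientations, weighted by magnitude."""
    mag, ang = cv2.cartToPolar(flow[..., 0], flow[..., 1])
    hist, _ = np.histogram(ang, bins=bins, range=(0, 2 * np.pi), weights=mag)
    return hist / (hist.sum() + 1e-8)

# Synthetic stand-in: consecutive flow fields of one local
# spatio-temporal volume (in practice, computed around sampled points).
rng = np.random.default_rng(0)
volume_flows = rng.normal(size=(5, 16, 16, 2)).astype(np.float32)

H = np.stack([hof(f) for f in volume_flows])     # (frames, bins)
# Eigenvalues of the HOF covariance: a compact signature of how
# unstructured the motion within the volume is.
eigvals = np.linalg.eigvalsh(np.cov(H, rowvar=False))
```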

The methods already described and listed above are able to perform the following steps:

  • Object detection
  • Object discrimination
  • Action recognition

However, these methods are not efficient at selecting good features in general. The lag identified in these methods is the absence of automatic feature identification, an issue that can be solved by applying deep learning concepts.

The evolution of artificial intelligence from rule-based systems to automatic feature identification passes through machine learning, representation learning and, finally, deep learning.

Real-time processing in video analysis

Real-time violence detection framework for football stadium comprising big data analysis and deep learning through bidirectional LSTM [ 103 ] predicts the violent behavior of a crowd in real time. Real-time processing speed is achieved through the Spark framework. The model architecture includes the Apache Spark framework, Spark streaming, the histogram of oriented gradients (HOG) function and a bidirectional LSTM (BDLSTM). The model takes streams of video from diverse sources as input. The videos are converted into non-overlapping frames, and features are extracted from these groups of frames through the HOG function. The images are manually grouped into different classes, and the BDLSTM is trained on them. The Spark framework handles the streaming data in micro-batch mode; both stream and batch processing are supported.
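A hedged sketch of the HOG-plus-bidirectional-LSTM idea (omitting the Spark streaming layer), assuming scikit-image's HOG and Keras; all shapes, sizes and the synthetic clips are illustrative, not the paper's configuration:

```python
import numpy as np
from skimage.feature import hog
from tensorflow.keras import layers, models

FRAMES, H, W = 16, 64, 64

def hog_sequence(frames):
    """One HOG descriptor per frame -> (FRAMES, feature_dim) sequence."""
    return np.stack([hog(f, orientations=9, pixels_per_cell=(16, 16),
                         cells_per_block=(2, 2)) for f in frames])

rng = np.random.default_rng(0)
clips = rng.random((8, FRAMES, H, W))              # synthetic video clips
X = np.stack([hog_sequence(c) for c in clips])     # (8, FRAMES, D)
y = rng.integers(0, 2, size=8)                     # violent / non-violent

model = models.Sequential([
    layers.Input(shape=X.shape[1:]),
    layers.Bidirectional(layers.LSTM(64)),         # temporal modeling
    layers.Dense(1, activation="sigmoid"),         # violence probability
])
model.compile(optimizer="adam", loss="binary_crossentropy")
model.fit(X, y, epochs=1, verbose=0)
```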

Intelligent video surveillance for real-time detection of suicide attempts [ 104 ] is an effort to prevent suicide by hanging in prisons. The method uses the depth streams offered by an RGB-D camera, and the body joint points are analyzed to represent suicidal behavior.

Spatio-temporal texture modeling for real-time crowd anomaly detection [ 105 ]. Spatio-temporal texture is a combination of spatio-temporal slices and spatio-temporal volumes. The information present in these slices is abstracted through wavelet transforms, and a Gaussian approximation model is applied to the texture patterns to distinguish normal from abnormal behaviors.

Deep learning models in surveillance

Deep convolutional framework for abnormal behavior detection in a smart surveillance system [ 106 ] includes three sections:

  • Human subject detection and discrimination
  • A posture classification module
  • An abnormal behavior detection module

The models used for these sections include, correspondingly:

  • The You Only Look Once (YOLO) network
  • Long short-term memory (LSTM)

For object discrimination, a Kalman filter based object entity discrimination algorithm is used. The posture classification stage recognizes 10 types of poses. The RNN uses backpropagation through time (BPTT) to update its weights.

The main issue identified in the method is that similar activities, like pointing and punching, are difficult to distinguish.

Detecting anomalous events in videos by learning deep representations of appearance and motion [ 107 ] proposes a new model named AMDN. The model automatically learns feature representations, using stacked de-noising autoencoders to learn appearance and motion features both separately and jointly. After learning, multiple one-class SVMs are trained to predict the anomaly score of each input; these scores are later combined to detect abnormal events within a double fusion framework. The computational overhead at testing time is too high for real-time processing.

A study of deep convolutional auto-encoders for anomaly detection in videos [ 12 ] proposes a structure that is a mixture of autoencoders and CNNs. An autoencoder includes an encoder part and a decoder part; the encoder includes convolutional and pooling layers, while the decoder includes deconvolutional and unpooling layers. The architecture allows a combination of low-level frames with high-level appearance and motion features. Anomaly scores are represented through reconstruction errors.
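A minimal convolutional autoencoder sketch in this spirit, with per-frame reconstruction error as the anomaly score; the layer sizes and synthetic data are illustrative assumptions, not the architecture from [ 12 ]:

```python
import numpy as np
from tensorflow.keras import layers, models

# Encoder: conv + pooling; decoder: conv + upsampling. Trained to
# reconstruct normal frames only; badly reconstructed frames score high.
inp = layers.Input(shape=(64, 64, 1))
x = layers.Conv2D(16, 3, activation="relu", padding="same")(inp)
x = layers.MaxPooling2D(2)(x)
x = layers.Conv2D(8, 3, activation="relu", padding="same")(x)
x = layers.MaxPooling2D(2)(x)                       # bottleneck 16x16x8
x = layers.Conv2D(8, 3, activation="relu", padding="same")(x)
x = layers.UpSampling2D(2)(x)
x = layers.Conv2D(16, 3, activation="relu", padding="same")(x)
x = layers.UpSampling2D(2)(x)
out = layers.Conv2D(1, 3, activation="sigmoid", padding="same")(x)

ae = models.Model(inp, out)
ae.compile(optimizer="adam", loss="mse")

normal = np.random.default_rng(0).random((32, 64, 64, 1))  # stand-in frames
ae.fit(normal, normal, epochs=1, verbose=0)

recon = ae.predict(normal, verbose=0)
anomaly_score = np.mean((normal - recon) ** 2, axis=(1, 2, 3))
```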

Going deeper with convolutions [ 108 ] suggests improvements over the traditional neural network. Fully connected layers are replaced by sparse ones, adding sparsity into the architecture. The paper suggests dimensionality reduction to curb the growing demand for computational resources: computation is reduced by applying 1 × 1 convolutions before the 5 × 5 convolutions. The method does not mention execution time, and no conclusion can be drawn about the crowd size it can handle successfully.
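A sketch of the idea in Keras: an Inception-style block in which 1 × 1 convolutions reduce channel depth before the expensive 3 × 3 and 5 × 5 paths. Filter counts are illustrative, not GoogLeNet's exact values:

```python
from tensorflow.keras import layers

def inception_block(x, f1, f3_in, f3, f5_in, f5, pool_proj):
    """1x1 convolutions shrink the channel dimension before the
    costly 3x3/5x5 convolutions, cutting the multiply count."""
    b1 = layers.Conv2D(f1, 1, padding="same", activation="relu")(x)
    b3 = layers.Conv2D(f3_in, 1, padding="same", activation="relu")(x)
    b3 = layers.Conv2D(f3, 3, padding="same", activation="relu")(b3)
    b5 = layers.Conv2D(f5_in, 1, padding="same", activation="relu")(x)
    b5 = layers.Conv2D(f5, 5, padding="same", activation="relu")(b5)
    bp = layers.MaxPooling2D(3, strides=1, padding="same")(x)
    bp = layers.Conv2D(pool_proj, 1, padding="same", activation="relu")(bp)
    return layers.Concatenate()([b1, b3, b5, bp])

# Example: reducing 256 input channels to 16 before the 5x5 path
# shrinks that path's computation by a factor of 16.
inp = layers.Input(shape=(32, 32, 256))
out = inception_block(inp, f1=64, f3_in=96, f3=128,
                      f5_in=16, f5=32, pool_proj=32)
```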

Deep learning for visual understanding: a review [ 109 ] reviews the fundamental models in deep learning. The models and techniques described are CNNs, RBMs, autoencoders and sparse coding. The paper also mentions drawbacks of deep learning models, such as the underlying theory not yet being well understood.

Deep learning methods other than those discussed above are listed in Table 9.

The methods reviewed in the above sections are good at automatic feature generation, and all handle individual entities and groups of limited size well.

The majority of real-world problems arise in crowds, and the above-mentioned methods are not effective at handling crowd scenes. The next section reviews intelligent methods for analyzing crowd video scenes.

Review in the field of crowd analysis

The review includes methods with a deep learning background and methods without one.

Spatial temporal convolutional neural networks for anomaly detection and localization in crowded scenes [ 114 ] shows that crowd analysis is challenging for the following reasons:

  • Large numbers of pedestrians
  • Close proximity
  • Volatility of individual appearance
  • Frequent partial occlusions
  • Irregular motion patterns in the crowd
  • Dangerous activities like crowd panic
  • The need for frame-level and pixel-level detection

The paper suggests an optical flow based solution. The CNN has eight layers, and training is based on BVLC Caffe. Parameters are randomly initialized, and the system is trained through stochastic gradient descent based backpropagation. The implementation considers four different datasets: UCSD, UMN, Subway and U-turn. For UCSD, the implementation details include frame-level and pixel-level criteria: the frame-level criterion concentrates on the temporal domain, while the pixel-level criterion considers both the spatial and temporal domains. The metrics used to evaluate performance include the equal error rate (EER) and the detection rate (DR).
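For reference, a small sketch of how the EER can be computed from an ROC curve with scikit-learn; the scores and labels here are synthetic stand-ins:

```python
import numpy as np
from sklearn.metrics import roc_curve

# EER: the operating point where false positive rate equals
# false negative rate on the ROC curve.
rng = np.random.default_rng(0)
labels = rng.integers(0, 2, size=200)                 # 1 = anomalous frame
scores = labels * 0.5 + rng.random(200) * 0.8         # higher = more anomalous

fpr, tpr, thresholds = roc_curve(labels, scores)
fnr = 1 - tpr
idx = np.argmin(np.abs(fpr - fnr))                    # closest crossing point
eer = (fpr[idx] + fnr[idx]) / 2
print(f"EER ~ {eer:.3f} at threshold {thresholds[idx]:.3f}")
```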

Online real-time crowd behavior detection in video sequences [ 115 ] suggests FSCB, behavior detection through feature tracking and image segmentation. The procedure involves the following steps:

  • Feature detection and temporal filtering
  • Image segmentation and blob extraction
  • Activity detection
  • Activity map
  • Activity analysis

The main advantage is that this method needs no training stage. The method is quantitatively analyzed through ROC curve generation, and its computational speed is evaluated through the frame rate. The datasets considered for the experiments include UMN, PETS2009, AGORASET and the Rome Marathon.

Deep learning for scene-independent crowd analysis [ 82 ] proposes a scene-independent method that includes the following procedures:

  • Crowd segmentation and detection
  • Crowd tracking
  • Crowd counting
  • Pedestrian travel time estimation
  • Crowd attribute recognition
  • Crowd behavior analysis
  • Abnormality detection in a crowd

Attribute recognition is done through a slicing CNN. A 2D CNN model learns appearance features and represents them as a cuboid, within which three temporal filters are identified. A classifier is then applied to the concatenated feature vector extracted from the cuboid. Crowd counting and crowd density estimation are treated as a regression problem. Crowd attribute recognition is applied to the WWW Crowd dataset; the evaluation metrics used are AUC and AP.
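A minimal sketch of crowd counting framed as regression, as described above; the architecture, input size and synthetic data are illustrative assumptions rather than the slicing CNN of [ 82 ]:

```python
import numpy as np
from tensorflow.keras import layers, models

# A small CNN mapping a frame directly to a scalar person count.
model = models.Sequential([
    layers.Input(shape=(128, 128, 1)),
    layers.Conv2D(16, 3, activation="relu", padding="same"),
    layers.MaxPooling2D(2),
    layers.Conv2D(32, 3, activation="relu", padding="same"),
    layers.MaxPooling2D(2),
    layers.GlobalAveragePooling2D(),
    layers.Dense(1, activation="relu"),   # non-negative count output
])
model.compile(optimizer="adam", loss="mse", metrics=["mae"])

rng = np.random.default_rng(0)
frames = rng.random((16, 128, 128, 1))          # synthetic stand-in frames
counts = rng.integers(0, 100, size=16).astype(float)
model.fit(frames, counts, epochs=1, verbose=0)  # regression on counts
```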

The analysis of high-density crowds in videos [ 80 ] describes methods like data-driven crowd analysis and density-aware tracking. Data-driven analysis learns crowd motion patterns from a large collection of crowd videos in an offline manner; the learned patterns can then be transferred to applications. The solution is a two-step procedure: global crowded scene matching and local crowd patch matching. Figure 2 illustrates the two-step procedure.

Figure 2: a Test video, b results of global matching, c a query crowd patch, d matching crowd patches [ 80 ]

The database selected for experimental evaluation includes 520 unique videos at 720 × 480 resolution. The main evaluation is tracking unusual and unexpected actions of individuals in a crowd. Experiments prove that data-driven tracking is better than batch-mode tracking. Density-based person detection and tracking includes steps like a baseline detector, geometric filtering and tracking using a density-aware detector.

A review on classifying abnormal behavior in crowd scene [ 77 ] mainly demonstrates four key approaches: the hidden Markov model (HMM), GMM, optical flow and STT. GMM itself is enhanced with different techniques to capture abnormal behaviours. The enhanced versions of GMM are:

  • GMM and Markov random field
  • Gaussian Poisson mixture model
  • GMM and support vector machine

The GMM architecture includes components like a local descriptor, a global descriptor, classifiers and, finally, a fusion strategy. The distinction between normal and abnormal behaviour is evaluated based on the Mahalanobis distance. The GMM–MRF model is divided into two sections: the first identifies motion patterns through GMM, while crowd context modelling is done through MRF. GPMM adds one extra feature, the count of occurrences of the observed behaviour; EM is also used for training at a later stage of GPMM. GMM–SVM incorporates features such as crowd collectiveness, crowd density and crowd conflict for abnormality detection.

HMM also has variants, such as HM and OSVMs.

The hidden Markov model is a density-aware detection method used to detect motion-based abnormality. The method generates a foreground mask and a perspective mask through an ORB detector. GM-HMM involves four major steps. In the first step, GMBM is used to identify foreground pixels, leading to blob generation. In the second stage, PCA–HOG and motion HOG are used for feature extraction. The third stage applies k-means clustering to separately cluster the features generated through PCA–HOG and motion HOG. In the final stage, the HMM processes the continuous information of the moving target through the application of GM. In SLT-HMM, short local trajectories are used along with the HMM to achieve better localization of moving objects. MOHMM uses KLT in a first phase to generate trajectories, to which clustering is applied; the second phase uses MOHMM to represent the trajectories and define usual and unusual frames. OSVM solves the nonlinearity problem by mapping high-dimensional features into a linear space using a kernel function.

In optical flow based methods, the enhancements made are categorized into techniques such as HOFH, HOFME, HMOFP and MOFE.

In HOFH, video frames are divided into several same-size patches, optical flows are extracted and divided into eight directions, and expectation and variance features are then used to characterize the optical flow between frames. The HOFME descriptor is used at the final stage of abnormal behaviour detection: first the frame difference is calculated, then the optical flow pattern is extracted, and finally the spatio-temporal description using HOFME is completed. HMOFP extracts the optical flow from each frame, divides it into patches and segments the flows into a number of bins; the maximum-amplitude flows are concatenated to form the global HMOFP. The MOFE method converts frames into blobs and extracts the optical flow in all the blobs; these flows are then clustered into different groups. In STT, crowd tracking and abnormal behaviour detection are done by combining the spatial and temporal dimensions of the features.

Crowd behaviour analysis from fixed and moving cameras [ 78 ] covers topics like microscopic and macroscopic crowd modeling, crowd behavior and crowd density analysis, and datasets for crowd behavior analysis. Large crowds are handled through macroscopic approaches, where agents are treated as a whole; in microscopic approaches, agents are handled individually. Motion information representing the crowd can be collected through fixed and moving cameras. CNN-based methods like end-to-end deep CNN, the Hydra-CNN architecture, switching CNN, the cascade CNN architecture, 3D CNN and spatio-temporal CNN are discussed for crowd behaviour analysis. Different datasets useful specifically for crowd behaviour analysis are also described in the chapter. The metrics used are MOTA (multiple object tracking accuracy) and MOTP (multiple object tracking precision); these metrics consider the multi-target scenarios usually present in crowd scenes. The datasets used for experimental evaluation are UCSD, Violent Flows, CUHK, UCF50, Rodriguez's, The Mall and the WorldExpo dataset.

Zero-shot crowd behavior recognition [ 79 ] suggests recognizers with little or no training data. The basic idea behind the approach is attribute-context co-occurrence: prediction of a behavioural attribute is based on its relationship with known attributes. The method encompasses steps like probabilistic zero-shot prediction, which calculates the conditional probability of the relation between known and novel attributes. The next steps include learning attribute relatedness from text corpora and context learning from visual co-occurrence. Figure 3 illustrates the results.

Figure 3: Demonstration of crowd videos ranked in accordance with prediction values [ 79 ]

Computer vision based crowd disaster avoidance system: a survey [ 81 ] covers different perspectives of crowd scene analysis, such as the number of cameras employed and the target of interest, along with crowd behavior analysis, people counting, crowd density estimation, person re-identification, crowd evacuation, forensic analysis of crowd disasters and computations on crowd analysis. A brief summary of benchmark datasets is also given.

Fast face detection in violent video scenes [ 83 ] suggests an architecture with three steps: a violent scene detector, a normalization algorithm and, finally, a face detector. The ViF descriptor, with Horn–Schunck as the optical flow algorithm, is used for violent scene detection. The normalization procedure includes gamma intensity correction, difference of Gaussians, local histogram coincidence and local normal distribution. Face detection involves two stages: first segmenting skin regions, then checking each component of the face.

Rejecting motion outliers for efficient crowd anomaly detection [ 54 ] provides a solution consisting of two phases: feature extraction and anomaly classification. Feature extraction is flow-based. The pipeline steps are: the input video is divided into frames, the frames are divided into superpixels, a histogram is extracted for each superpixel, histograms are aggregated spatially, and finally the combined histograms from consecutive frames are concatenated to produce the final feature. Anomalies can be detected through existing classification algorithms. The implementation uses the UCSD dataset, which has two subsets with resolutions of 158 × 238 and 240 × 360. Normal behavior was used to train k-means and KUGDA, while both normal and abnormal behavior were used to train a linear SVM. The hardware includes an Artix-7 xc7a200t FPGA from Xilinx, Xilinx ISE and XPower Analyzer.

Deep metric learning for crowdedness regression [ 84 ] includes a deep network model in which feature learning and distance measurement are done concurrently. Metric learning is used to learn a fine distance measurement. The proposed model is implemented with the TensorFlow package, with the rectified linear unit as the activation function and gradient descent as the training method. Performance is evaluated through mean squared error and mean absolute error. The WorldExpo and ShanghaiTech datasets are used for experimental evaluation.

A deep spatiotemporal perspective for understanding crowd behavior [ 61 ] is a combination of convolutional layers and long short-term memory. Spatial information is captured through the convolutional layers, and temporal motion dynamics are captured through the LSTM. The method forecasts pedestrian paths, estimates destinations and, finally, categorizes the behavior of individuals according to motion pattern. The path forecasting technique includes two stacked ConvLSTM layers with 128 hidden states. The ConvLSTM kernel size is 3 × 3, with a stride of 1 and zero padding, and the model takes up a single convolutional layer with a 1 × 1 kernel size. Crowd behavior classification is achieved through a combination of three layers: an average spatial pooling layer, a fully connected layer and a softmax layer.
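A sketch of the stacked ConvLSTM configuration described above in Keras; the input shape and class count are illustrative assumptions:

```python
from tensorflow.keras import layers, models

# Two stacked ConvLSTM layers with 128 hidden states, 3x3 kernels,
# stride 1 and zero padding, followed by a 1x1 convolution and the
# pooling + fully connected + softmax classification head.
model = models.Sequential([
    layers.Input(shape=(10, 32, 32, 1)),        # (time, height, width, ch)
    layers.ConvLSTM2D(128, (3, 3), strides=1, padding="same",
                      return_sequences=True),
    layers.ConvLSTM2D(128, (3, 3), strides=1, padding="same"),
    layers.Conv2D(64, (1, 1), activation="relu"),  # 1x1 convolution layer
    layers.GlobalAveragePooling2D(),               # average spatial pooling
    layers.Dense(8, activation="softmax"),         # 8 = hypothetical classes
])
model.summary()
```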

Crowded scene understanding by deeply learned volumetric slices [ 85 ] suggests a deep model and different fusion approaches. The architecture involves convolutional layers, a global sum pooling layer and fully connected layers; slice fusion and weight sharing schemes are required by the architecture. A new multitask learning deep model is proposed to jointly learn motion features and appearance features and successfully combine them. A new concept of crowd motion channels is designed as input to the model: the motion channels are inspired by temporal slices that clearly demonstrate the temporal growth of contents in crowd videos. In addition, the authors conduct wide-ranging evaluations with multiple deep structures under various data fusion and weight sharing schemes to learn temporal features. The network is configured with convolutional, pooling and fully connected layers with activation functions such as the rectified linear unit and the sigmoid function. Three different kinds of slice fusion techniques are applied to measure the efficiency of the proposed input channels.

Crowd scene understanding from video: a survey [ 86 ] mainly deals with crowd counting. Approaches to crowd counting are categorized into six: pixel-level analysis, texture-level analysis, object-level analysis, line counting, density mapping, and joint detection and counting. Edge features are analyzed through pixel-level analysis, and image patches through texture-level analysis. Object-level analysis, which identifies individual subjects in a scene, is more accurate than pixel- and texture-level analysis. Line counting counts the people who cross a particular line.

Table 10 lists some more crowd analysis methods.

Results observed from the survey and future directions

An accuracy analysis, conducted for some of the methods discussed above based on various evaluation criteria like AUC, precision and recall, is presented below.

Rejecting motion outliers for efficient crowd anomaly detection [ 54 ] compares different methods, as shown in Fig. 4. KUGDA is the classifier proposed in that work [ 54 ].

Figure 4: Comparing KUGDA with k-means [ 54 ]

Fast face detection in violent video scenes [ 83 ] uses a ViF descriptor for violent scene detection. Figure 5 shows the evaluation of an SVM classifier using an ROC curve.

Figure 5: Receiver operating characteristic of a classifier with the ViF descriptor [ 83 ]

Figure 6 presents a comparison of detection performance by different methods [ 80 ]. The comparison shows the improvement of the density-aware detector over the other methods.

Figure 6: Comparing detection performance of the density-aware detector with different methods [ 80 ]

From the analysis of existing methods, the following shortcomings were identified. Real-world problems raise objectives such as:

  • Time complexity
  • Bad weather conditions
  • Real-world dynamics
  • Overlapping of objects

Existing methods handle these problems separately; no method addresses all of these objectives in a single proposal.

To support effective intelligent crowd video analysis in real time, a method should be able to provide solutions to all these problems. Traditional methods cannot generate efficient, economical solutions in a time-bounded manner.

The availability of high-performance computational resources like GPUs allows the implementation of deep learning based solutions for fast processing of big data. Existing deep learning architectures or models can be combined, keeping their good features and removing unwanted ones.

This paper reviews intelligent surveillance video analysis techniques. The reviewed papers cover a wide variety of applications, and the techniques, tools and datasets identified are listed in tables. The survey begins with video surveillance analysis from a general perspective and then moves towards crowd analysis. Crowd analysis is difficult because crowd size is large and dynamic in real-world scenarios, and identifying each entity and its behavior is a difficult task. Methods for analyzing crowd behavior were discussed, and the issues identified in existing methods were listed as future directions toward an efficient solution.

Abbreviations

SVAS: Surveillance Video Analysis System

IBSTM: Interval-Based Spatio-Temporal Model

KLT: Kanade–Lucas–Tomasi

GMM: Gaussian Mixture Model

SVM: Support Vector Machine

DAAL: Deep activation-based attribute learning

HMM: Hidden Markov Model

YOLO: You Only Look Once

LSTM: Long short-term memory

AUC: Area under the curve

ViF: Violent flow descriptor

Kardas K, Cicekli NK. SVAS: surveillance video analysis system. Expert Syst Appl. 2017;89:343–61.


Wang Y, Shuai Y, Zhu Y, Zhang J, An P. Jointly learning perceptually heterogeneous features for blind 3D video quality assessment. Neurocomputing. 2019;332:298–304 (ISSN 0925-2312).

Tzelepis C, Galanopoulos D, Mezaris V, Patras I. Learning to detect video events from zero or very few video examples. Image Vis Comput. 2016;53:35–44 (ISSN 0262-8856) .

Fakhar B, Kanan HR, Behrad A. Learning an event-oriented and discriminative dictionary based on an adaptive label-consistent K-SVD method for event detection in soccer videos. J Vis Commun Image Represent. 2018;55:489–503 (ISSN 1047-3203) .

Luo X, Li H, Cao D, Yu Y, Yang X, Huang T. Towards efficient and objective work sampling: recognizing workers’ activities in site surveillance videos with two-stream convolutional networks. Autom Constr. 2018;94:360–70 (ISSN 0926-5805) .

Wang D, Tang J, Zhu W, Li H, Xin J, He D. Dairy goat detection based on Faster R-CNN from surveillance video. Comput Electron Agric. 2018;154:443–9 (ISSN 0168-1699) .

Shao L, Cai Z, Liu L, Lu K. Performance evaluation of deep feature learning for RGB-D image/video classification. Inf Sci. 2017;385:266–83 (ISSN 0020-0255) .

Ahmed SA, Dogra DP, Kar S, Roy PP. Surveillance scene representation and trajectory abnormality detection using aggregation of multiple concepts. Expert Syst Appl. 2018;101:43–55 (ISSN 0957-4174) .

Arunnehru J, Chamundeeswari G, Prasanna Bharathi S. Human action recognition using 3D convolutional neural networks with 3D motion cuboids in surveillance videos. Procedia Comput Sci. 2018;133:471–7 (ISSN 1877-0509) .

Guraya FF, Cheikh FA. Neural networks based visual attention model for surveillance videos. Neurocomputing. 2015;149(Part C):1348–59 (ISSN 0925-2312) .

Pathak AR, Pandey M, Rautaray S. Application of deep learning for object detection. Procedia Comput Sci. 2018;132:1706–17 (ISSN 1877-0509) .

Ribeiro M, Lazzaretti AE, Lopes HS. A study of deep convolutional auto-encoders for anomaly detection in videos. Pattern Recogn Lett. 2018;105:13–22.

Huang W, Ding H, Chen G. A novel deep multi-channel residual networks-based metric learning method for moving human localization in video surveillance. Signal Process. 2018;142:104–13 (ISSN 0165-1684) .

Tsakanikas V, Dagiuklas T. Video surveillance systems-current status and future trends. Comput Electr Eng. In press, corrected proof, Available online 14 November 2017.

Wang Y, Zhang D, Liu Y, Dai B, Lee LH. Enhancing transportation systems via deep learning: a survey. Transport Res Part C Emerg Technol. 2018. https://doi.org/10.1016/j.trc.2018.12.004 (ISSN 0968-090X) .

Huang H, Xu Y, Huang Y, Yang Q, Zhou Z. Pedestrian tracking by learning deep features. J Vis Commun Image Represent. 2018;57:172–5 (ISSN 1047-3203) .

Yuan Y, Zhao Y, Wang Q. Action recognition using spatial-optical data organization and sequential learning framework. Neurocomputing. 2018;315:221–33 (ISSN 0925-2312) .

Perez M, Avila S, Moreira D, Moraes D, Testoni V, Valle E, Goldenstein S, Rocha A. Video pornography detection through deep learning techniques and motion information. Neurocomputing. 2017;230:279–93 (ISSN 0925-2312) .

Pang S, del Coz JJ, Yu Z, Luaces O, Díez J. Deep learning to frame objects for visual target tracking. Eng Appl Artif Intell. 2017;65:406–20 (ISSN 0952-1976) .

Wei X, Du J, Liang M, Ye L. Boosting deep attribute learning via support vector regression for fast moving crowd counting. Pattern Recogn Lett. 2017. https://doi.org/10.1016/j.patrec.2017.12.002 .

Xu M, Fang H, Lv P, Cui L, Zhang S, Zhou B. D-stc: deep learning with spatio-temporal constraints for train drivers detection from videos. Pattern Recogn Lett. 2017. https://doi.org/10.1016/j.patrec.2017.09.040 (ISSN 0167-8655) .

Hassan MM, Uddin MZ, Mohamed A, Almogren A. A robust human activity recognition system using smartphone sensors and deep learning. Future Gener Comput Syst. 2018;81:307–13 (ISSN 0167-739X) .

Wu G, Lu W, Gao G, Zhao C, Liu J. Regional deep learning model for visual tracking. Neurocomputing. 2016;175:310–23 (ISSN 0925-2312) .

Nasir M, Muhammad K, Lloret J, Sangaiah AK, Sajjad M. Fog computing enabled cost-effective distributed summarization of surveillance videos for smart cities. J Parallel Comput. 2018. https://doi.org/10.1016/j.jpdc.2018.11.004 (ISSN 0743-7315) .

Najva N, Bijoy KE. SIFT and tensor based object detection and classification in videos using deep neural networks. Procedia Comput Sci. 2016;93:351–8 (ISSN 1877-0509) .

Yu Z, Li T, Yu N, Pan Y, Chen H, Liu B. Reconstruction of hidden representation for Robust feature extraction. ACM Trans Intell Syst Technol. 2019;10(2):18.

Mammadli R, Wolf F, Jannesari A. The art of getting deep neural networks in shape. ACM Trans Archit Code Optim. 2019;15:62.

Zhou T, Tucker R, Flynn J, Fyffe G, Snavely N. Stereo magnification: learning view synthesis using multiplane images. ACM Trans Graph. 2018;37:65


Fan Z, Song X, Xia T, Jiang R, Shibasaki R, Sakuramachi R. Online Deep Ensemble Learning for Predicting Citywide Human Mobility. Proc ACM Interact Mob Wearable Ubiquitous Technol. 2018;2:105.

Hanocka R, Fish N, Wang Z, Giryes R, Fleishman S, Cohen-Or D. ALIGNet: partial-shape agnostic alignment via unsupervised learning. ACM Trans Graph. 2018;38:1.

Xu M, Qian F, Mei Q, Huang K, Liu X. DeepType: on-device deep learning for input personalization service with minimal privacy concern. Proc ACM Interact Mob Wearable Ubiquitous Technol. 2018;2:197.

Potok TE, Schuman C, Young S, Patton R, Spedalieri F, Liu J, Yao KT, Rose G, Chakma G. A study of complex deep learning networks on high-performance, neuromorphic, and quantum computers. J Emerg Technol Comput Syst. 2018;14:19.

Pouyanfar S, Sadiq S, Yan Y, Tian H, Tao Y, Reyes MP, Shyu ML, Chen SC, Iyengar SS. A survey on deep learning: algorithms, techniques, and applications. ACM Comput Surv. 2018;51:92.

Tian Y, Lee GH, He H, Hsu CY, Katabi D. RF-based fall monitoring using convolutional neural networks. Proc ACM Interact Mob Wearable Ubiquitous Technol. 2018;2:137.

Roy P, Song SL, Krishnamoorthy S, Vishnu A, Sengupta D, Liu X. NUMA-Caffe: NUMA-aware deep learning neural networks. ACM Trans Archit Code Optim. 2018;15:24.

Lovering C, Lu A, Nguyen C, Nguyen H, Hurley D, Agu E. Fact or fiction. Proc ACM Hum-Comput Interact. 2018;2:111.

Ben-Hamu H, Maron H, Kezurer I, Avineri G, Lipman Y. Multi-chart generative surface modeling. ACM Trans Graph. 2018;37:215

Ge W, Gong B, Yu Y. Image super-resolution via deterministic-stochastic synthesis and local statistical rectification. ACM Trans Graph. 2018;37:260

Hedman P, Philip J, Price T, Frahm JM, Drettakis G, Brostow G. Deep blending for free-viewpoint image-based rendering. ACM Trans Graph. 2018;37:257

Sundararajan K, Woodard DL. Deep learning for biometrics: a survey. ACM Comput Surv. 2018;51:65.

Kim H, Kim T, Kim J, Kim JJ. Deep neural network optimized to resistive memory with nonlinear current–voltage characteristics. J Emerg Technol Comput Syst. 2018;14:15.

Wang C, Yang H, Bartz C, Meinel C. Image captioning with deep bidirectional LSTMs and multi-task learning. ACM Trans Multimedia Comput Commun Appl. 2018;14:40.

Yao S, Zhao Y, Shao H, Zhang A, Zhang C, Li S, Abdelzaher T. RDeepSense: Reliable Deep Mobile Computing Models with Uncertainty Estimations. Proc ACM Interact Mob Wearable Ubiquitous Technol. 2018;1:173.

Liu D, Cui W, Jin K, Guo Y, Qu H. DeepTracker: visualizing the training process of convolutional neural networks. ACM Trans Intell Syst Technol. 2018;10:6.

Yi L, Huang H, Liu D, Kalogerakis E, Su H, Guibas L. Deep part induction from articulated object pairs. ACM Trans Graph. 2018. https://doi.org/10.1145/3272127.3275027 .

Zhao N, Cao Y, Lau RW. What characterizes personalities of graphic designs? ACM Trans Graph. 2018;37:116.

Tan J, Wan X, Liu H, Xiao J. QuoteRec: toward quote recommendation for writing. ACM Trans Inf Syst. 2018;36:34.

Qu Y, Fang B, Zhang W, Tang R, Niu M, Guo H, Yu Y, He X. Product-based neural networks for user response prediction over multi-field categorical data. ACM Trans Inf Syst. 2018;37:5.

Yin K, Huang H, Cohen-Or D, Zhang H. P2P-NET: bidirectional point displacement net for shape transform. ACM Trans Graph. 2018;37:152.

Yao S, Zhao Y, Shao H, Zhang C, Zhang A, Hu S, Liu D, Liu S, Su L, Abdelzaher T. SenseGAN: enabling deep learning for internet of things with a semi-supervised framework. Proc ACM Interact Mob Wearable Ubiquitous Technol. 2018;2:144.

Saito S, Hu L, Ma C, Ibayashi H, Luo L, Li H. 3D hair synthesis using volumetric variational autoencoders. ACM Trans Graph. 2018. https://doi.org/10.1145/3272127.3275019 .

Chen A, Wu M, Zhang Y, Li N, Lu J, Gao S, Yu J. Deep surface light fields. Proc ACM Comput Graph Interact Tech. 2018;1:14.

Chu W, Xue H, Yao C, Cai D. Sparse coding guided spatiotemporal feature learning for abnormal event detection in large videos. IEEE Trans Multimedia. 2019;21(1):246–55.

Khan MUK, Park H, Kyung C. Rejecting motion outliers for efficient crowd anomaly detection. IEEE Trans Inf Forensics Secur. 2019;14(2):541–56.

Tao D, Guo Y, Yu B, Pang J, Yu Z. Deep multi-view feature learning for person re-identification. IEEE Trans Circuits Syst Video Technol. 2018;28(10):2657–66.

Zhang D, Wu W, Cheng H, Zhang R, Dong Z, Cai Z. Image-to-video person re-identification with temporally memorized similarity learning. IEEE Trans Circuits Syst Video Technol. 2018;28(10):2622–32.

Serrano I, Deniz O, Espinosa-Aranda JL, Bueno G. Fight recognition in video using hough forests and 2D convolutional neural network. IEEE Trans Image Process. 2018;27(10):4787–97. https://doi.org/10.1109/tip.2018.2845742 .


Li Y, Li X, Zhang Y, Liu M, Wang W. Anomalous sound detection using deep audio representation and a blstm network for audio surveillance of roads. IEEE Access. 2018;6:58043–55.

Muhammad K, Ahmad J, Mehmood I, Rho S, Baik SW. Convolutional neural networks based fire detection in surveillance videos. IEEE Access. 2018;6:18174–83.

Ullah A, Ahmad J, Muhammad K, Sajjad M, Baik SW. Action recognition in video sequences using deep bi-directional LSTM with CNN features. IEEE Access. 2018;6:1155–66.

Li Y. A deep spatiotemporal perspective for understanding crowd behavior. IEEE Trans Multimedia. 2018;20(12):3289–97.

Pamula T. Road traffic conditions classification based on multilevel filtering of image content using convolutional neural networks. IEEE Intell Transp Syst Mag. 2018;10(3):11–21.

Vandersmissen B, et al. Indoor person identification using a low-power FMCW radar. IEEE Trans Geosci Remote Sens. 2018;56(7):3941–52.

Min W, Yao L, Lin Z, Liu L. Support vector machine approach to fall recognition based on simplified expression of human skeleton action and fast detection of start key frame using torso angle. IET Comput Vision. 2018;12(8):1133–40.

Perwaiz N, Fraz MM, Shahzad M. Person re-identification using hybrid representation reinforced by metric learning. IEEE Access. 2018;6:77334–49.

Olague G, Hernández DE, Clemente E, Chan-Ley M. Evolving head tracking routines with brain programming. IEEE Access. 2018;6:26254–70.

Dilawari A, Khan MUG, Farooq A, Rehman Z, Rho S, Mehmood I. Natural language description of video streams using task-specific feature encoding. IEEE Access. 2018;6:16639–45.

Zeng D, Zhu M. Background subtraction using multiscale fully convolutional network. IEEE Access. 2018;6:16010–21.

Goswami G, Vatsa M, Singh R. Face verification via learned representation on feature-rich video frames. IEEE Trans Inf Forensics Secur. 2017;12(7):1686–98.

Keçeli AS, Kaya A. Violent activity detection with transfer learning method. Electron Lett. 2017;53(15):1047–8.

Lu W, et al. Unsupervised sequential outlier detection with deep architectures. IEEE Trans Image Process. 2017;26(9):4321–30.

Feizi A. High-level feature extraction for classification and person re-identification. IEEE Sens J. 2017;17(21):7064–73.

Lee Y, Chen S, Hwang J, Hung Y. An ensemble of invariant features for person reidentification. IEEE Trans Circuits Syst Video Technol. 2017;27(3):470–83.

Uddin MZ, Khaksar W, Torresen J. Facial expression recognition using salient features and convolutional neural network. IEEE Access. 2017;5:26146–61.

Mukherjee SS, Robertson NM. Deep head pose: Gaze-direction estimation in multimodal video. IEEE Trans Multimedia. 2015;17(11):2094–107.

Hayat M, Bennamoun M, An S. Deep reconstruction models for image set classification. IEEE Trans Pattern Anal Mach Intell. 2015;37(4):713–27.

Afiq AA, Zakariya MA, Saad MN, Nurfarzana AA, Khir MHM, Fadzil AF, Jale A, Gunawan W, Izuddin ZAA, Faizari M. A review on classifying abnormal behavior in crowd scene. J Vis Commun Image Represent. 2019;58:285–303.

Bour P, Cribelier E, Argyriou V. Chapter 14—Crowd behavior analysis from fixed and moving cameras. In: Computer vision and pattern recognition, multimodal behavior analysis in the wild. Cambridge: Academic Press; 2019. pp. 289–322.


Xu X, Gong S, Hospedales TM. Chapter 15—Zero-shot crowd behavior recognition. In: Group and crowd behavior for computer vision. Cambridge: Academic Press; 2017:341–369.

Rodriguez M, Sivic J, Laptev I. Chapter 5—The analysis of high density crowds in videos. In: Group and crowd behavior for computer vision. Cambridge: Academic Press. 2017. pp. 89–113.

Yogameena B, Nagananthini C. Computer vision based crowd disaster avoidance system: a survey. Int J Disaster Risk Reduct. 2017;22:95–129.

Wang X, Loy CC. Chapter 10—Deep learning for scene-independent crowd analysis. In: Group and crowd behavior for computer vision. Cambridge: Academic Press; 2017. pp. 209–52.

Arceda VM, Fabián KF, Laura PL, Tito JR, Cáceres JG. Fast face detection in violent video scenes. Electron Notes Theor Comput Sci. 2016;329:5–26.

Wang Q, Wan J, Yuan Y. Deep metric learning for crowdedness regression. IEEE Trans Circuits Syst Video Technol. 2018;28(10):2633–43.

Shao J, Loy CC, Kang K, Wang X. Crowded scene understanding by deeply learned volumetric slices. IEEE Trans Circuits Syst Video Technol. 2017;27(3):613–23.

Grant JM, Flynn PJ. Crowd scene understanding from video: a survey. ACM Trans Multimedia Comput Commun Appl. 2017;13(2):19.

Tay L, Jebb AT, Woo SE. Video capture of human behaviors: toward a Big Data approach. Curr Opin Behav Sci. 2017;18:17–22 (ISSN 2352-1546) .

Chaudhary S, Khan MA, Bhatnagar C. Multiple anomalous activity detection in videos. Procedia Comput Sci. 2018;125:336–45.

Anwar F, Petrounias I, Morris T, Kodogiannis V. Mining anomalous events against frequent sequences in surveillance videos from commercial environments. Expert Syst Appl. 2012;39(4):4511–31.

Wang T, Qiao M, Chen Y, Chen J, Snoussi H. Video feature descriptor combining motion and appearance cues with length-invariant characteristics. Optik. 2018;157:1143–54.

Kaltsa V, Briassouli A, Kompatsiaris I, Strintzis MG. Multiple Hierarchical Dirichlet Processes for anomaly detection in traffic. Comput Vis Image Underst. 2018;169:28–39.

Cermeño E, Pérez A, Sigüenza JA. Intelligent video surveillance beyond robust background modeling. Expert Syst Appl. 2018;91:138–49.

Coşar S, Donatiello G, Bogorny V, Garate C, Alvares LO, Brémond F. Toward abnormal trajectory and event detection in video surveillance. IEEE Trans Circuits Syst Video Technol. 2017;27(3):683–95.

Ribeiro PC, Audigier R, Pham QC. RIMOC, a feature to discriminate unstructured motions: application to violence detection for video-surveillance. Comput Vis Image Underst. 2016;144:121–43.

Şaykol E, Güdükbay U, Ulusoy Ö. Scenario-based query processing for video-surveillance archives. Eng Appl Artif Intell. 2010;23(3):331–45.

Castanon G, Jodoin PM, Saligrama V, Caron A. Activity retrieval in large surveillance videos. In: Academic Press library in signal processing. Vol. 4. London: Elsevier; 2014.

Cheng HY, Hwang JN. Integrated video object tracking with applications in trajectory-based event detection. J Vis Commun Image Represent. 2011;22(7):673–85.

Hong X, Huang Y, Ma W, Varadarajan S, Miller P, Liu W, Romero MJ, del Rincon JM, Zhou H. Evidential event inference in transport video surveillance. Comput Vis Image Underst. 2016;144:276–97.

Wang T, Qiao M, Deng Y, Zhou Y, Wang H, Lyu Q, Snoussi H. Abnormal event detection based on analysis of movement information of video sequence. Optik. 2018;152:50–60.

Ullah H, Altamimi AB, Uzair M, Ullah M. Anomalous entities detection and localization in pedestrian flows. Neurocomputing. 2018;290:74–86.

Roy D, Mohan CK. Snatch theft detection in unconstrained surveillance videos using action attribute modelling. Pattern Recogn Lett. 2018;108:56–61.

Lee WK, Leong CF, Lai WK, Leow LK, Yap TH. ArchCam: real time expert system for suspicious behaviour detection in ATM site. Expert Syst Appl. 2018;109:12–24.

Dinesh Jackson Samuel R, Fenil E, Manogaran G, Vivekananda GN, Thanjaivadivel T, Jeeva S, Ahilan A. Real time violence detection framework for football stadium comprising of big data analysis and deep learning through bidirectional LSTM. Comput Netw. 2019;151:191–200 (ISSN 1389-1286) .

Bouachir W, Gouiaa R, Li B, Noumeir R. Intelligent video surveillance for real-time detection of suicide attempts. Pattern Recogn Lett. 2018;110:1–7 (ISSN 0167-8655) .

Wang J, Xu Z. Spatio-temporal texture modelling for real-time crowd anomaly detection. Comput Vis Image Underst. 2016;144:177–87 (ISSN 1077-3142) .

Ko KE, Sim KB. Deep convolutional framework for abnormal behavior detection in a smart surveillance system. Eng Appl Artif Intell. 2018;67:226–34.

Dan X, Yan Y, Ricci E, Sebe N. Detecting anomalous events in videos by learning deep representations of appearance and motion. Comput Vis Image Underst. 2017;156:117–27.

Szegedy C, Liu W, Jia Y, Sermanet P, Reed S, Anguelov D, Erhan D, Vanhoucke V, Rabinovich A. Going deeper with convolutions. In: 2015 IEEE conference on computer vision and pattern recognition (CVPR). 2015.

Guo Y, Liu Y, Oerlemans A, Lao S, Lew MS. Deep learning for visual understanding: a review. Neurocomputing. 2016;187(26):27–48.

Babaee M, Dinh DT, Rigoll G. A deep convolutional neural network for video sequence background subtraction. Pattern Recogn. 2018;76:635–49.

Xue H, Liu Y, Cai D, He X. Tracking people in RGBD videos using deep learning and motion clues. Neurocomputing. 2016;204:70–6.

Dong Z, Jing C, Pei M, Jia Y. Deep CNN based binary hash video representations for face retrieval. Pattern Recogn. 2018;81:357–69.

Zhang C, Tian Y, Guo X, Liu J. DAAL: deep activation-based attribute learning for action recognition in depth videos. Comput Vis Image Underst. 2018;167:37–49.

Zhou S, Shen W, Zeng D, Fang M, Zhang Z. Spatial–temporal convolutional neural networks for anomaly detection and localization in crowded scenes. Signal Process Image Commun. 2016;47:358–68.

Pennisi A, Bloisi DD, Iocchi L. Online real-time crowd behavior detection in video sequences. Comput Vis Image Underst. 2016;144:166–76.

Feliciani C, Nishinari K. Measurement of congestion and intrinsic risk in pedestrian crowds. Transp Res Part C Emerg Technol. 2018;91:124–55.

Wang X, He X, Wu X, Xie C, Li Y. A classification method based on streak flow for abnormal crowd behaviors. Optik Int J Light Electron Optics. 2016;127(4):2386–92.

Kumar S, Datta D, Singh SK, Sangaiah AK. An intelligent decision computing paradigm for crowd monitoring in the smart city. J Parallel Distrib Comput. 2018;118(2):344–58.

Feng Y, Yuan Y, Lu X. Learning deep event models for crowd anomaly detection. Neurocomputing. 2017;219:548–56.


Acknowledgements

Not applicable.

Author information

Authors and Affiliations

VIT, Vellore, 632014, Tamil Nadu, India

G. Sreenu & M. A. Saleem Durai


Contributions

GS and MASD selected and analyzed different papers for getting more in depth view about current scenarios of the problem and its solutions. Both authors read and approved the final manuscript.

Corresponding author

Correspondence to G. Sreenu.

Ethics declarations

Competing interests

The authors declare that they have no competing interests.

Availability of data and materials

Data sharing is not applicable to this article as no datasets were generated or analysed during the current study.

Additional information

Publisher's note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.


About this article

Cite this article

Sreenu, G., Saleem Durai, M.A. Intelligent video surveillance: a review through deep learning techniques for crowd analysis. J Big Data 6, 48 (2019). https://doi.org/10.1186/s40537-019-0212-5


Received: 07 December 2018

Accepted: 28 May 2019

Published: 06 June 2019

DOI: https://doi.org/10.1186/s40537-019-0212-5


Keywords

  • Video surveillance
  • Deep learning
  • Crowd analysis

EURASIP Journal on Image and Video Processing

Analysis of thermal videos for detection of lie during interrogation

The lie-detection tests are traditionally carried out by well-trained experts using polygraph machines. However, it is time-consuming, invasive, and, overall, a cumbersome process, not admissible by the court ...


Semi-automated computer vision-based tracking of multiple industrial entities: a framework and dataset creation approach

This contribution presents the TOMIE framework (Tracking Of Multiple Industrial Entities), a framework for the continuous tracking of industrial entities (e.g., pallets, crates, barrels) over a network of, in ...

Fast CU size decision and intra-prediction mode decision method for H.266/VVC

H.266/Versatile Video Coding (VVC) is the most recent video coding standard developed by the Joint Video Experts Team (JVET). The quad-tree with nested multi-type tree (QTMT) architecture that improves the com...

Assessment framework for deepfake detection in real-world situations

Detecting digital face manipulation in images and video has attracted extensive attention due to the potential risk to public trust. To counteract the malicious usage of such techniques, deep learning-based de...

Edge-aware nonlinear diffusion-driven regularization model for despeckling synthetic aperture radar images

Speckle noise corrupts synthetic aperture radar (SAR) images and limits their applications in sensitive scientific and engineering fields. This challenge has attracted several scholars because of the wide dema...

Multimodal few-shot classification without attribute embedding

Multimodal few-shot learning aims to exploit complementary information inherent in multiple modalities for vision tasks in low data scenarios. Most of the current research focuses on a suitable embedding space...

Secure image transmission through LTE wireless communications systems

Secure transmission of images over wireless communications systems can be done using RSA, the most known and efficient cryptographic algorithm, and OFDMA, the most preferred signal processing choice in wireles...

An optimized capsule neural networks for tomato leaf disease classification

Plant diseases have a significant impact on leaves, with each disease exhibiting specific spots characterized by unique colors and locations. Therefore, it is crucial to develop a method for detecting these di...

Multi-layer features template update object tracking algorithm based on SiamFC++

SiamFC++ only extracts the object feature of the first frame as a tracking template, and only uses the highest level feature maps in both the classification branch and the regression branch, so that the respec...

Robust steganography in practical communication: a comparative study

To realize the act of covert communication in a public channel, steganography is proposed. In the current study, modern adaptive steganography plays a dominant role due to its high undetectability. However, th...

Multi-attention-based approach for deepfake face and expression swap detection and localization

Advancements in facial manipulation technology have resulted in highly realistic and indistinguishable face and expression swap videos. However, this has also raised concerns regarding the security risks assoc...

Semantic segmentation of textured mosaics

This paper investigates deep learning (DL)-based semantic segmentation of textured mosaics. Existing popular datasets for mosaic texture segmentation, designed prior to the DL era, have several limitations: (1...

Comparison of synthetic dataset generation methods for medical intervention rooms using medical clothing detection as an example

The availability of real data from areas with high privacy requirements, such as the medical intervention space is low and the acquisition complex in terms of data protection. To enable research for assistance...

Phase congruency based on derivatives of circular symmetric Gaussian function: an efficient feature map for image quality assessment

Image quality assessment (IQA) has become a hot issue in the area of image processing, which aims to evaluate image quality automatically by a metric being consistent with subjective evaluation. The first stag...

Correction: Printing and scanning investigation for image counter forensics

The original article was published in EURASIP Journal on Image and Video Processing 2022:2

An early CU partition mode decision algorithm in VVC based on variogram for virtual reality 360 degree videos

360-degree videos have become increasingly popular with the application of virtual reality (VR) technology. To encode such kind of videos with ultra-high resolution, an efficient and real-time video encoder be...

Learning a crowd-powered perceptual distance metric for facial blendshapes

It is known that purely geometric distance metrics cannot reflect the human perception of facial expressions. A novel perceptually based distance metric designed for 3D facial blendshape models is proposed in ...

Studies in differentiating psoriasis from other dermatoses using small data set and transfer learning

Psoriasis is a common skin disorder that should be differentiated from other dermatoses if an effective treatment has to be applied. Regions of Interests, or scans for short, of diseased skin are processed by ...

Heterogeneous scene matching based on the gradient direction distribution field

Heterogeneous scene matching is a key technology in the field of computer vision. The image rotation problem is popular and difficult in the field of heterogeneous scene matching. In this paper, a heterogeneou...

FitDepth: fast and lite 16-bit depth image compression algorithm

This article presents a fast parallel lossless technique and a lossy image compression technique for 16-bit single-channel images. Nowadays, such techniques are “a must” in robotics and other areas where sever...

Vehicle logo detection using an IoAverage loss on dataset VLD100K-61

Vehicle Logo Detection (VLD) is of great significance to Intelligent Transportation Systems (ITS). Although many methods have been proposed for VLD, it remains a challenging problem. To improve the VLD accurac...

Correction: Research on application of multimedia image processing technology based on wavelet transform

The original article was published in EURASIP Journal on Image and Video Processing 2019:24

Correction: Geolocation of covert communication entity on the Internet for post-steganalysis

The original article was published in EURASIP Journal on Image and Video Processing 2020:15

Reversible designs for extreme memory cost reduction of CNN training

Training Convolutional Neural Networks (CNN) is a resource-intensive task that requires specialized hardware for efficient computation. One of the most limiting bottlenecks of CNN training is the memory cost a...

Data and image storage on synthetic DNA: existing solutions and challenges

Storage of digital data is becoming challenging for humanity due to the relatively short life-span of storage devices. Furthermore, the exponential increase in the generation of digital data is creating the ne...

Retraction Note: Research on path guidance of logistics transport vehicle based on image recognition and image processing in port area

A novel secured Euclidean space points algorithm for blind spatial image watermarking

Digital raw images obtained from the data set of various organizations require authentication, copyright protection, and security with simple processing. New Euclidean space point’s algorithm is proposed to au...

Retraction Note: Research on professional talent training technology based on multimedia remote image analysis

Retraction Note: Analysis of sports image detection technology based on machine learning

Retraction Note: Research on image correction method of network education assignment based on wavelet transform

Retraction Note: Performance analysis of ethylene-propylene diene monomer sound-absorbing materials based on image processing recognition

Retraction Note to: Translation analysis of English address image recognition based on image recognition

Retraction Note: Image processing algorithm of Hartmann method aberration automatic measurement system with tensor product model

Retraction Note to: Research on English translation distortion detection based on image evolution

Retraction Note: A method for spectral image registration based on feature maximum submatrix

Fine-grained precise-bone age assessment by integrating prior knowledge and recursive feature pyramid network

Bone age assessment (BAA) evaluates individual skeletal maturity by comparing the characteristics of skeletal development to the standard in a specific population. The X-ray image examination for bone age is t...

Palpation localization of radial artery based on 3-dimensional convolutional neural networks

Palpation localization is essential for detecting physiological parameters of the radial artery for pulse diagnosis of Traditional Chinese Medicine (TCM). Detecting signal or applying pressure at the wrong loc...

Weakly supervised spatial–temporal attention network driven by tracking and consistency loss for action detection

This study proposes a novel network model for video action tube detection. This model is based on a location-interactive weakly supervised spatial–temporal attention mechanism driven by multiple loss functions...

Performance analysis of different DCNN models in remote sensing image object detection

In recent years, deep learning, especially deep convolutional neural networks (DCNN), has made great progress. Many researchers use different DCNN models to detect remote sensing targets. Different DCNN models...

Multi-orientation local ternary pattern-based feature extraction for forensic dentistry

Accurate and automated identification of the deceased victims with dental radiographs plays a significant role in forensic dentistry. The image processing techniques such as segmentation and feature extraction...

Face image synthesis from facial parts

Recently, inspired by the growing power of deep convolutional neural networks (CNNs) and generative adversarial networks (GANs), facial image editing has received increasing attention and has produced a series...

An image-guided network for depth edge enhancement

With the rapid development of 3D coding and display technologies, numerous applications are emerging to target human immersive entertainments. To achieve a prime 3D visual experience, high accuracy depth maps ...

Automatic kidney segmentation using 2.5D ResUNet and 2.5D DenseUNet for malignant potential analysis in complex renal cyst based on CT images

Bosniak renal cyst classification has been widely used in determining the complexity of a renal cyst. However, it turns out that about half of patients undergoing surgery for Bosniak category III, take surgica...

Adaptive response maps fusion of correlation filters with anti-occlusion mechanism for visual object tracking

Despite the impressive performance of correlation filter-based trackers in terms of robustness and accuracy, the trackers have room for improvement. The majority of existing trackers use a single feature or fi...

Random CNN structure: tool to increase generalization ability in deep learning

The paper presents a novel approach for designing the CNN structure of improved generalization capability in the presence of a small population of learning data. Unlike the classical methods for building CNN, ...

Printing and scanning investigation for image counter forensics

Examining the authenticity of images has become increasingly important as manipulation tools become more accessible and advanced. Recent work has shown that while CNN-based image manipulation detectors can suc...

The Correction to this article has been published in EURASIP Journal on Image and Video Processing 2023:10

Reduced reference image and video quality assessments: review of methods

With the growing demand for image and video-based applications, the requirements of consistent quality assessment metrics of image and video have increased. Different approaches have been proposed in the liter...

Perceptual hashing method for video content authentication with maximized robustness

Perceptual video hashing represents video perceptual content by compact hash. The binary hash is sensitive to content distortion manipulations, but robust to perceptual content preserving operations. Currently...

A study on implementation of real-time intelligent video surveillance system based on embedded module

Conventional surveillance systems for preventing accidents and incidents do not identify 95% thereof after 22 min when one person monitors a plurality of closed circuit televisions (CCTV). To address this issu...

HR-MPF: high-resolution representation network with multi-scale progressive fusion for pulmonary nodule segmentation and classification

Accurate segmentation and classification of pulmonary nodules are of great significance to early detection and diagnosis of lung diseases, which can reduce the risk of developing lung cancer and improve patien...



MIT Technology Review


OpenAI’s new GPT-4o lets people interact using voice or video in the same model

The company’s new free flagship “omnimodel” looks like a supercharged version of assistants like Siri or Alexa.

By James O'Donnell

[Image: screenshot from a video of Greg Brockman using two instances of GPT-4o on two phones to collaborate with each other]

OpenAI just debuted GPT-4o, a new kind of AI model that you can communicate with in real time via live voice conversation, video streams from your phone, and text. The model is rolling out over the next few weeks and will be free for all users through both the GPT app and the web interface, according to the company. Users who subscribe to OpenAI’s paid tiers, which start at $20 per month, will be able to make more requests. 

OpenAI CTO Mira Murati led the live demonstration of the new release one day before Google is expected to unveil its own AI advancements at its flagship I/O conference on Tuesday, May 14. 

GPT-4 offered similar capabilities, giving users multiple ways to interact with OpenAI’s AI offerings. But it siloed them in separate models, leading to longer response times and presumably higher computing costs. GPT-4o has now merged those capabilities into a single model, which Murati called an “omnimodel.” That means faster responses and smoother transitions between tasks, she said.

The result, the company’s demonstration suggests, is a conversational assistant much in the vein of Siri or Alexa but capable of fielding much more complex prompts.

“We’re looking at the future of interaction between ourselves and the machines,” Murati said of the demo. “We think that GPT-4o is really shifting that paradigm into the future of collaboration, where this interaction becomes much more natural.”

Barret Zoph and Mark Chen, both researchers at OpenAI, walked through a number of applications for the new model. Most impressive was its facility with live conversation. You could interrupt the model during its responses, and it would stop, listen, and adjust course. 

OpenAI showed off the ability to change the model’s tone, too. Chen asked the model to read a bedtime story “about robots and love,” quickly jumping in to demand a more dramatic voice. The model got progressively more theatrical until Murati demanded that it pivot quickly to a convincing robot voice (which it excelled at). While there were predictably some short pauses during the conversation while the model reasoned through what to say next, it stood out as a remarkably naturally paced AI conversation. 

The model can reason through visual problems in real time as well. Using his phone, Zoph filmed himself writing an algebra equation (3x + 1 = 4) on a sheet of paper, having GPT-4o follow along. He instructed it not to provide answers, but instead to guide him much as a teacher would.

“The first step is to get all the terms with x on one side,” the model said in a friendly tone. “So, what do you think we should do with that plus one?”

Like previous generations of GPT, GPT-4o will store records of users’ interactions with it, meaning the model “has a sense of continuity across all your conversations,” according to Murati. Other new highlights include live translation, the ability to search through your conversations with the model, and the power to look up information in real time. 

As is the nature of a live demo, there were hiccups and glitches. GPT-4o’s voice might jump in awkwardly during the conversation. It appeared to comment on one of the presenters’ outfits even though it wasn’t asked to. But it recovered well when the demonstrators told the model it had erred. It seems to be able to respond quickly and helpfully across several mediums that other models have not yet merged as effectively. 

Previously, many of OpenAI’s most powerful features, like reasoning through image and video, were behind a paywall. GPT-4o marks the first time they’ll be opened up to the wider public, though it’s not yet clear how many interactions you’ll be able to have with the model before being charged. OpenAI says paying subscribers will “continue to have up to five times the capacity limits of our free users.” 

Additional reporting by Will Douglas Heaven.

Artificial intelligence

Sam Altman says helpful agents are poised to become AI's killer function

Open AI’s CEO says we won’t need new hardware or lots more training data to get there.

Is robotics about to have its own ChatGPT moment?

Researchers are using generative AI and other techniques to teach robots new skills—including tasks they could perform in homes.

By Melissa Heikkilä

What’s next for generative video

OpenAI's Sora has raised the bar for AI moviemaking. Here are four things to bear in mind as we wrap our heads around what's coming.

By Will Douglas Heaven

An AI startup made a hyperrealistic deepfake of me that’s so good it’s scary

Synthesia's new technology is impressive but raises big questions about a world where we increasingly can’t tell what’s real.



NeurIPS 2024, the Thirty-eighth Annual Conference on Neural Information Processing Systems, will be held at the Vancouver Convention Center

Monday Dec 9 through Sunday Dec 15. Monday is an industry expo.


Registration


Our Hotel Reservation page is currently under construction and will be released shortly. NeurIPS has contracted Hotel guest rooms for the Conference at group pricing, requiring reservations only through this page. Please do not make room reservations through any other channel, as it only impedes us from putting on the best Conference for you. We thank you for your assistance in helping us protect the NeurIPS conference.

Announcements

  • The call for High School Projects has been released
  • The Call For Papers has been released
  • See the Visa Information page for changes to the visa process for 2024.


Organizing Committee

General Chair, Program Chair, Workshop Chair, Workshop Chair Assistant, Tutorial Chair, Competition Chair, Data and Benchmark Chair, Diversity, Inclusion and Accessibility Chair, Affinity Chair, Ethics Review Chair, Communication Chair, Social Chair, Journal Chair, Creative AI Chair, Workflow Manager, Logistics and IT

Mission Statement

The Neural Information Processing Systems Foundation is a non-profit corporation whose purpose is to foster the exchange of research advances in Artificial Intelligence and Machine Learning, principally by hosting an annual interdisciplinary academic conference with the highest ethical standards for a diverse and inclusive community.

About the Conference

The conference was founded in 1987 and is now a multi-track interdisciplinary annual meeting that includes invited talks, demonstrations, symposia, and oral and poster presentations of refereed papers. Along with the conference is a professional exposition focusing on machine learning in practice, a series of tutorials, and topical workshops that provide a less formal setting for the exchange of ideas.



Computer Science > Computation and Language

Title: Optimization Techniques for Sentiment Analysis Based on LLM (GPT-3)

Abstract: With the rapid development of natural language processing (NLP) technology, large-scale pre-trained language models such as GPT-3 have become popular research objects in the NLP field. This paper aims to explore sentiment analysis optimization techniques based on large pre-trained language models such as GPT-3 to improve model performance and further promote the development of NLP. After introducing the importance of sentiment analysis and the limitations of traditional methods, the paper presents GPT-3 and fine-tuning techniques and explains their applications in sentiment analysis in detail. The experimental results show that fine-tuning can optimize the GPT-3 model and obtain good performance on the sentiment analysis task. This study provides an important reference for future sentiment analysis using large-scale language models.



Various tomato infection discrimination using spectroscopy

  • Original Paper
  • Open access
  • Published: 17 May 2024


  • Bogdan Ruszczak   ORCID: orcid.org/0000-0003-1089-1778 1 ,
  • Krzysztof Smykała   ORCID: orcid.org/0000-0003-1970-5388 1 , 2 ,
  • Michał Tomaszewski   ORCID: orcid.org/0000-0001-6672-3971 1 &
  • Pedro Javier Navarro Lorente   ORCID: orcid.org/0000-0001-8367-2934 3  

Diagnosing plant diseases is a difficult task, but it can be made easier with the use of advanced instrumentation and the latest machine learning techniques. This paper is a further development of a previous study by the authors, which has been extended to provide a classification method for tomato diseases and to indicate the spectral ranges of greatest importance for this process. As tomatoes are one of the most popular and most consumed vegetables, and diseases of this crop can reduce yields by up to 80% every year, their detection is a vital topic. This manuscript describes research in which spectroscopy was used to develop methods for discriminating between selected tomato diseases. The following frequently occurring diseases were investigated: anthracnose, bacterial speck, early blight, late blight, and Septoria Leaf Spot. The study used a dataset consisting of 3877 measurements taken with the ASD FieldSpec 4 Hi-Res spectroradiometer in the 350–2500 nm range from 2019/09/10 to 2019/12/20. The highest classification efficiency (\(F_1\) score) of 0.896 was obtained for the logistic regression-based model evaluated on Septoria Leaf Spot disease records.


1 Introduction

Tomato (Solanum lycopersicum) is one of the most grown vegetables in the world. The yearly global yield of this plant reaches 160 million tons, and it is the second most-consumed vegetable in the world [1]. However, tomatoes are very vulnerable to fungal, bacterial, and viral diseases such as late blight, powdery mildew, or early blight [2]. Some of these diseases, like Alternaria solani, can reduce yield by 80% under conditions favorable for the pathogen [3, 4]. Farmers counteract the effects of these diseases in various ways, e.g., by fungicide control or by planting varieties partially resistant to diseases. A key element of disease control, however, is a correct diagnosis. Currently, the standard diagnosis of these diseases is based on a microscopic assessment of plant tissue damage and on assessing the number of pathogen spores. A non-invasive, reliable, and fast method of disease recognition is therefore indispensable in modern agriculture [5].

The use of spectroscopy, along with digital signal processing algorithms and artificial intelligence, is increasingly popular among researchers [5, 6, 7]. Spectral data are used on a large scale in remote sensing thanks to space agencies such as NASA and ESA. The data are useful in, among others, crisis management, climate change monitoring, and agriculture. In agriculture, useful applications include indices [8] that enable the assessment of plant vegetation [8, 9], water stress [10, 75], soil moisture [11, 12], and other soil properties [13].

The present study is a continuation of the authors' previous work [14], which discussed the possibility of early detection of several tomato pathogens using spectroscopy. As an extension, we further developed the machine learning models, aiming to increase the method's precision. Specifically, in contrast to the previous work, three main goals were set for the present study:

Goal 1: finding the best classifier allowing one to distinguish diseased plants from healthy ones based on spectroscopy in 6 experiment variants:

CS (control samples) versus objects infected by AN - anthracnose (Colletotrichum coccodes),

CS versus objects infected by BS - bacterial speck (Pseudomonas syringae),

CS versus objects infected by EB - early blight (Alternaria solani),

CS versus objects infected by LB - late blight (Phytophthora infestans),

CS versus objects infected by SL - Septoria Leaf Spot (Septoria lycopersici),

CS versus randomly selected measurements of infected objects (AN, BS, EB, LB, SL).

Goal 2: data processing method development, i.e., creating a data processing procedure that achieves the highest improvement in binary classification results,

Goal 3: verifying the hypothesis that selected spectral bands are especially important in the disease analysis process.

The data used for this study consist of 3877 spectra of leaf reflectance taken by the ASD FieldSpec 4 Hi-Res spectroradiometer in the spectral range 350–2500 nm, plus metadata for each measurement such as disease, date of measurement, date of infection, etc. The dataset contained control samples (CS) and infected samples (AN, BS, EB, LB, SL) at various stages of infection, from the early stage, where symptoms are not visible to the naked eye, to the late stage, which is a novelty compared to research conducted so far.

There are a number of mature machine learning algorithms that could be employed to investigate spectral information for classification purposes. In Sect. 4 we enumerate the methods currently used for a number of pathogen detection applications, including linear models, tree-based methods, ensemble machine learning, and neural networks.

2 Diagnostic methods for Solanum lycopersicum diseases

The issue of detecting plant diseases attracts the interest of many researchers, which is reflected in numerous publications [5, 9, 15, 16, 17, 18]. Among other studies, one can find research targeting pathogens of Asian soybean [19], wheat infected with Fusarium head blight [20], and sugar beet [21], or even phenotyping disease resistance of crops [22]. A significant part of this research concerns tomato crops [14, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40].

Conventional disease diagnosis methods involve the observation of disease symptoms. However, the symptoms of the various diseases can be markedly similar to each other, which makes it difficult even for experienced farmers to correctly diagnose the disease. The detailed examination requires, therefore, the collection of plant material and laboratory tests [ 41 ]. In the case of fungal diseases, the pathogen has to be isolated and cultured in vitro. The identification is confirmed using microscopic techniques by a taxonomist based on spore morphology and conidiogenesis [ 41 ]. Biochemical tests can determine bacterial diseases, while viruses are identified based on genetic material, transmission assays, and their host range [ 42 ].

Invasive methods, including the enzyme-linked immunosorbent assay, polymerase chain reaction, and immunofluorescent assay, are imperfect in terms of cost, speed, and accuracy, while near-infrared spectroscopy and hyperspectral imaging allow for equally good, cost-effective, and non-invasive plant diagnostics [43], not limited to tomato cultivation, with reported applications to, e.g., potato [44] or rice [45, 46], as well as to other plant-related phenomena, such as chlorophyll modeling [47, 48] or plant phenotyping [49, 50].

Some authors note that the strongly developing agrotechnical and food processing industries require the development of non-invasive diagnostic methods [43, 51]. Digital cameras [31, 32, 33, 34, 52, 53, 54], multispectral imaging [30, 55, 56], and spectroscopy [26, 27, 28, 35, 37, 57] have been used in research on this issue.

Attempts to diagnose crops and identify Solanum lycopersicum diseases on the basis of spectral measurements were made by, among others:

P1 - Zhang et al. [ 24 ];

P2 - Wang et al. [ 36 ];

P3 - Jones et al. [ 38 ];

P4 - Moghadam P. et al. [ 26 ];

P5 - Xie C. et al. [ 35 ];

P6 - Lu J. et al. [ 37 ].

P7 - Pereira J. et al. [ 40 ]

In the mentioned publications, various spectral measurements were used, from the ultraviolet (UV) through the visible range (VIS) and near-infrared (NIR) to the short-wave infrared (SWIR). In publications P1 and P2, a GER spectroradiometer working in the 400–2500 nm spectral range was used. In P3, a Cary 500 spectroradiometer measuring within the 200–2500 nm spectral range was employed. A set consisting of 2 hyperspectral Headwall cameras, imaging in the 400–1000 nm and 900–2500 nm ranges, was used in P4. In P5, a hyperspectral camera imaging within the 380–1023 nm spectral range was used. Results presented in P6 were based on spectroradiometer measurements in the 400–1050 nm spectral range.

A synthetic summary of the characteristics of the conducted measurements, research, and results is presented in Table 1.

The dataset used in publications P1 and P2 consisted of spectral data of 60 plants and 1074 measurement points. Additionally, 195 spectroscopic images from the AVIRIS programme, covering 700–1130 nm wavelengths, were used in P2. For P1, the average values for six spectral ranges were calculated (600–690 nm, 750–930 nm, 950–1030 nm, 1040–1130 nm, 1450–1850 nm, 2000–2400 nm), and novel spectral disease indices were proposed. The indices were quotients of two selected ranges. The result of the P1 publication was the discovery of the most relevant wavelengths for late blight detection on tomatoes, while P2 focused on the creation of a backpropagation artificial neural network for resolving the same issue. The authors of P1 indicate the most important spectral ranges (750–930 nm in the first place, then 950–1030 nm and 1040–1130 nm) for the process of late blight detection in tomato crops. The researchers noted five specific, narrow wavelengths characterized by a high reflectance difference between diseased and healthy plants (625 nm, 850 nm, 1050 nm, 1500 nm, and 2100 nm). Based on that observation, they developed novel vegetation indices (VI). The authors of P2 indicate spectral ranges to distinguish healthy and infected objects: 750–1350 nm for field measurements and 700–1105 nm for AVIRIS imaging spectroscopy. The study was conducted on plants with symptoms visible to the naked eye.

In the P3 study, analysis of a correlation coefficient spectrum, analysis of a B-matrix from partial least squares regression (PLS), and a stepwise multiple linear regression (SMLR) procedure were used to diagnose bacterial leaf spot of tomato. The studied dataset consisted of 156 spectra. The authors of the P3 publication noted a significant correlation between disease severity and spectral absorbance at several wavelengths: 384 nm, 626 nm, 691 nm, and 761 nm. The spectral area of 750–760 nm was identified as meaningful in all the approaches mentioned above. Additionally, the researchers indicated wavelengths 395 nm, 400 nm, 630 nm, and 633–635 nm as significant.

The authors of the P4 publication implemented three different approaches to the issue of distinguishing diseased and healthy plants. The goal set by the researchers in this study was to distinguish control plants from those infected by tomato spotted wilt virus (TSWV) using hyperspectral imaging in the 400–2500 nm spectral range. The first approach was a feature extraction method based on 12 existing vegetation indices. In the second approach, the whole available spectral range was used. In the third approach, probabilistic topic modeling (PTM) was used. PTM is a method usually used in natural language processing (NLP) for determining the topic of a document. The researchers treated the hyperspectral image as a document, and the measurement values for each wavelength were treated as words in the document.

The dataset used in P4 consisted of imaging of 30 diseased and 30 control plants of 1 variety of tomato. Plants were observed for 21 days. During this time, six measurement sessions were taken from the 3rd to the 21st day after inoculation. During the sessions, 133 diseased leaves and 103 control leaves were measured. The Kullback-Leibler divergence, also known as relative entropy, was used in the study to estimate the distance between the measurement distributions of diseased and control objects.
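P4 does not show the computation itself, but relative entropy between two empirical reflectance distributions can be sketched in a few lines; the histogram binning, variable names, and toy data below are illustrative assumptions rather than details from that study.

```python
import numpy as np
from scipy.stats import entropy  # entropy(p, q) returns the KL divergence D(p || q)

def kl_divergence(a: np.ndarray, b: np.ndarray, bins: int = 50) -> float:
    """Estimate D(a || b) between two sets of reflectance values via shared histograms."""
    lo, hi = min(a.min(), b.min()), max(a.max(), b.max())
    p, _ = np.histogram(a, bins=bins, range=(lo, hi))
    q, _ = np.histogram(b, bins=bins, range=(lo, hi))
    eps = 1e-12  # keep empty bins finite; entropy() normalizes the inputs to sum to 1
    return float(entropy(p + eps, q + eps))

rng = np.random.default_rng(0)
control = rng.normal(0.45, 0.05, 500)   # toy reflectance samples for healthy leaves
diseased = rng.normal(0.40, 0.07, 500)  # toy reflectance samples for infected leaves
print(kl_divergence(control, diseased))
```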

The best results were achieved using the full wavelength range. The method based on full SWIR wavelength attained an \(F_1\) score equal to 0.92. For VNIR, the \(F_1\) score reached 0.94. However, all analyzed approaches obtained an \(F_1\) score equal to or higher than 0.8. The authors pointed out that no specified wavelength is the most significant for disease classification issues.

The study described in P5 focused on using spectroradiometer measurements in the 380–1023 nm spectral range to train a multiclass model for early blight, late blight, and control plants. The study concentrated mainly on wavelengths 442, 508, 573, 696, and 715 nm. To achieve this, 310 measurements were used: 120 measurements of healthy plants, 120 of early blight, and 70 of plants infected by late blight. The obtained classification accuracy ranged from 97.1 to 100%.

The P6 publication covers the study of 57 spectral vegetation indices (SVIs). Based on the conducted studies, the researchers indicated the four most significant SVIs, which refer to 6 considerable wavelengths, i.e., 445, 450, 690, 707, 1070, and 1200 nm. A total of 445 measurements were used in the study, and the aim of the work was a classification model distinguishing four classes: control (74 measurements), infected but asymptomatic plants (77 measurements), diseased plants with visible symptoms at an early stage of disease (148 measurements), and diseased plants with visible symptoms at a late stage of disease (146 measurements). The obtained accuracy reached 100% for the control, asymptomatic, and late-stage classes.

The authors of P7 report 15 additional bands between 455 and 666 nm as the most relevant, compared to the results presented above. However, all of the wavelengths mentioned in P7 lie in the visible spectrum, 434.9–680.02 nm, as the authors focused on this specific spectral range and attribute it to the absorption of chlorophylls (430 to 480 nm and 640 to 700 nm), carotenoid pigments (450 to 480 nm and 600 to 650 nm), and xanthophylls (520 to 580 nm). For the diseases studied (bacterial speck, bacterial spot), the most important wavelengths were found to be in the blue-green and red VIS regions of the electromagnetic spectrum.

3 Materials and methods

The following part of the manuscript aims to detail the investigation procedure. The whole research involved four main components: the plant experiments, the spectroscopy acquisitions, the development of the machine learning models and the drawing of the necessary conclusions. This chapter provides the required information on how this procedure was carried out and how it could be reproduced.

3.1 The experiment description

The ASD FieldSpec 4 Hi-Res spectroradiometer was used to perform the acquisitions. The device measures reflectance, transmission, radiance, or irradiance of the tested sample in the spectral range of 350–2500 nm, with a spectral resolution of 3 nm at 700 nm and 8 nm at 1400 nm and 2100 nm. The spectral sampling width is 1.4 nm for the UV-VNIR range (350–1000 nm) and 1.1 nm for the VNIR-SWIR range (1001–2500 nm). The device is built of 3 detectors: one 512-element silicon near-infrared sensor (VNIR: 350–1000 nm) and two InGaAs photodiode-based, 2-stage TE-cooled graded detectors (SWIR\(_1\): 1000–1800 nm and SWIR\(_2\): 1800–2500 nm). The device is characterized by decent performance, including a wavelength reproducibility of 0.1 nm, a wavelength accuracy of 0.5 nm for the average error of the wavelength calibration fit and ±1 nm for any one line, and a Noise Equivalent Radiance of 1.0\(\times 10^{-9}\) W/cm\(^2\)/nm/sr, 1.4\(\times 10^{-9}\) W/cm\(^2\)/nm/sr, and 2.2\(\times 10^{-9}\) W/cm\(^2\)/nm/sr for the VNIR detector at 700 nm, SWIR\(_1\) at 1400 nm, and SWIR\(_2\) at 2100 nm, respectively. In this study, reflectance measurements were used.

The measurements, the basis for the analyzed data set, were made from September 10, 2019 until December 20, 2019. Leaves were first removed from the investigated plants, which at that time had between 30 and 40 leaves, and immediately taken to the measuring station to avoid deterioration. This also means that the following measurements on different days were taken from different leaves, but were taken from the same plants, previously inoculated, and treated in the same way in separate vegetative chambers for each pathogen. Measurements were performed in laboratory conditions in vegetation chambers (phytotrons). Figure  1 shows the time of sowing (light green), planting (green), and infecting plants (purple), as well as the period of taking measurements on individual phytotrons (blue).

Two varieties of tomatoes were used in the experiment: Benito and Polfast. Six test plants were prepared for each variety. Plants were infected with the five pathogens identified in the introduction to this article. In addition, a control plant was prepared that was not treated with any pathogen. The test was carried out in 3 measurement cycles. The inoculation (infection) in the first cycle took place on September 10, 2019; in the second cycle, on November 12, 2019; and in the third cycle, on December 9, 2019. The first symptoms of infection visible to the naked eye appeared 3–5 days after infection.

Figure 1: Experiment Gantt chart presenting sowing, planting, and infection days on a timeline, together with the sampling periods; every strip represents one measurement day.

An artificial light source in the form of two identical halogen lamps was used to illuminate the tested objects properly. On two sides, we set up two Ushio Eurostar Reflekto MR16 bulbs with a color temperature of 3000 K, a narrow \(12^{\circ }\) spot beam spread, and a luminous intensity of 11,000 cd.

A spectroradiometer calibration is required before every measurement session. The calibration consists of measuring white reference and dark current, where white reference refers to an object with nearly 100% reflectance and dark current refers to the current generated within a detector in the absence of any external photons. Spectralon reference panel made of polytetrafluoroethylene (PTFE) and sintered halon was used as the reference white. Its reflectance in the 400–1500 nm range is over 99% and over 95% in the range of 250–2500 nm [ 58 ].

A single reflectance measurement is the result of averaging five successive spectral curves. The measurements were made from three distances of the measuring instrument from the infected leaves: 5, 30, and 60 cm. The analysis of the obtained measurements showed that the reflectance differs depending on the height of the measurement. Light scattering and the parameters of the spectrometer's optical fiber cause the reflectance measurements of the same Solanum lycopersicum leaves taken from greater distances (30 and 60 cm) to be subject to significant external disturbances. We found at a later stage of the project that the accuracy of leaf targeting and the repeatability of scans were higher when the instrument was placed 5 cm above the sample. The resulting observed area has a diameter of 2.2 cm, which aids precise leaf targeting. The characteristics of the various scan heights were also highly distinct and could hinder machine learning modeling. Therefore, measurements taken at a height of 5 cm from the tested object were used (Fig. 2).

Figure 2: Spectrum of all measurements normalized for a height of 5 cm (control measurements in green, infected objects in red).

3.2 Structure of the dataset

The initial data processing consisted of isolating incorrect data, such as incorrect calibration measurements and objects that were not the subject of the project (e.g., background of the tested object) from the correctly performed measurements. Then the metadata encoded in the measurement reference numbers was extracted and added to the data set, creating a model matrix (tidy dataset) [ 59 ].

As a result, the number of primary measurements was reduced from 72,156 to 58,186 by screening out the measurement errors, and then to 11,634 by determining the medians of the measurement series. Error detection was based on a straightforward visual inspection of the performed spectroscopy scans, filtering the obvious outliers and out-of-range spectral curves out of the dataset. Finally, the measurements made from a height of 5 cm were selected as the least contaminated by the background and, at the same time, the most reliable in the research process, which resulted in 3877 reliable, unique measurements fully described with metadata. The whole process of the experiment and data collection is depicted in Fig. 3.
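The reduction of each measurement series to its median can be sketched with pandas; the table layout, column names, and grouping keys below are hypothetical stand-ins for the actual metadata decoded from the measurement reference numbers.

```python
import pandas as pd

# Toy stand-in for the raw scan table: metadata columns plus reflectance
# bands; the column names and grouping keys are illustrative assumptions.
scans = pd.DataFrame({
    "object_id":  [1, 1, 1, 2, 2, 2],
    "height_cm":  [5, 5, 5, 30, 30, 30],
    "band_550nm": [0.42, 0.44, 0.43, 0.40, 0.41, 0.39],
    "band_850nm": [0.71, 0.70, 0.72, 0.65, 0.66, 0.64],
})

meta_keys = ["object_id", "height_cm"]
band_cols = [c for c in scans.columns if c not in meta_keys]

# Collapse each measurement series to its per-band median, then keep
# only the least noisy 5 cm acquisitions.
medians = scans.groupby(meta_keys, as_index=False)[band_cols].median()
tidy = medians[medians["height_cm"] == 5].reset_index(drop=True)
print(tidy)
```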

This study’s calculations and data processing have been done using Python-based open-source software (licensed under GPLv3), applying Sci-Kit Learn, NumPy, SciPy, Pandas, Seaborn, and Matplotlib scientific libraries.

Figure 3: Diagram of the measurement data recording process (from measurements to a tidy dataset).

4 Classification using machine learning algorithms and their evaluation

Several studies have reported promising results for plant disease identification, mainly from images and in situations where the symptoms are already visible, using convolutional neural networks for tomato [60, 61, 62, 63, 64] as well as for other plants such as potato [65], cucumber [66], or rice [67] (the last study also exploited thermal imagery to improve modeling performance and to detect disease symptoms that cannot be observed externally). Some researchers, similarly to the presented study, explored the potential of hyperspectral information for detecting both symptomatic and non-symptomatic cases, in that case for a single pathogen [68]. Some extended the investigation by also testing segmentation algorithms for pathogens on visually monitored tomato fields [69, 70].

Numerous machine learning techniques have been found useful for plant disease classification. Even linear methods, like the ridge classifier (RC) [71], logistic regression (LR), and linear support vector classification (linear SVC), can provide a sufficient model in some reported cases [28]. More complex ensemble machine learning techniques, which combine several weaker sub-models, are advised in a number of papers [72, 73, 74]. These are, among others, random forest (RF) [28, 73, 74], modified RF [75], the light gradient boosting machine (LGBM) [72, 76], and the extreme gradient boosting classifier (XGB) [74]. Some of these classifiers were tested on data collected in a similar fashion from tomato samples, and the application of ensemble machine learning models concluded with promising results [28, 73]. Therefore, this set of models was selected for further investigation in this study.

The range of machine learning methods for detecting plant diseases is quite extensive. Researchers have employed a variety of methods, including disease indices (P1 [24]), linear modeling (P3 [38]), probabilistic approaches (P4 [26]), and nearest neighbor classifiers (P5 [35]). These methods provide a useful set of tools for interpreting results later on. For certain investigative purposes, it may be necessary to employ more complex architectures when the searched patterns are not easily identifiable. These architectures include neural networks (P2 [36]), support vector machines [77], and convolutional neural networks [78], which support the discovery of more abstract pattern representations.

To efficiently conduct the experiment and evaluate the resulting models' performance, a cross-validation procedure should be applied [79, 80]. In order to compare the performance of the prepared models, we decided to investigate the \(F_1\) metric, which allows us to indirectly check the precision and recall results, and which is denoted in Eq. 3. For results calculated using the validation set, let TP be the number of true positives (correctly indicated infected samples), TN the number of true negatives (correctly indicated non-infected samples), FP the number of false positives (incorrectly classified non-infected samples), and FN the number of false negatives, that is, infected samples missed by the examined classifier. And let:

$$\mathrm{precision} = \frac{TP}{TP + FP}, \quad (1)$$

$$\mathrm{recall} = \frac{TP}{TP + FN}; \quad (2)$$

then the \(F_1\) metric, which gathers all the necessary information from both of those metrics, can be denoted as:

$$F_1 = 2 \cdot \frac{\mathrm{precision} \cdot \mathrm{recall}}{\mathrm{precision} + \mathrm{recall}}. \quad (3)$$

Such an approach provides more detailed information about model performance than simple accuracy, which takes into account only the number of correct predictions divided by the total number of predictions and does not reveal the whole complexity of the case. The usage of such a metric for pathogen classification evaluation was also proposed in [81].
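Since the study relies on Sci-Kit Learn (see Sect. 3.2), the metric is also available off the shelf; a minimal check, with toy labels, that the library implementation matches Eq. 3:

```python
from sklearn.metrics import f1_score, precision_score, recall_score

y_true = [1, 1, 0, 1, 0, 0, 1]  # 1 = infected, 0 = control (toy labels)
y_pred = [1, 0, 0, 1, 0, 1, 1]

p = precision_score(y_true, y_pred)  # TP / (TP + FP), Eq. 1
r = recall_score(y_true, y_pred)     # TP / (TP + FN), Eq. 2
f1 = f1_score(y_true, y_pred)        # Eq. 3
assert abs(f1 - 2 * p * r / (p + r)) < 1e-12
print(f"precision={p:.3f} recall={r:.3f} F1={f1:.3f}")
```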

The investigation described had two main objectives; therefore, the results section provides two separate parts to support both. The first is concerned with disease classification, and the second focuses on the search for spectral bands that are relevant to specific diseases.

Initial classification trials were performed in order to select machine learning algorithms for further research. The best results were obtained by the following classifiers: LGBM—light gradient boosting machine, linear SVC—linear support vector classification, LR—logistic regression, RF—random forest, RC—ridge classifier, XGB—extreme gradient boosting. These algorithms were used in the actual in-depth study.

In order to increase the effectiveness of the classification of Solanum lycopersicum cultivation diseases, a step-by-step approach to the problem was chosen. The first classification [STEP 1, Fig. 4] was performed on the base set (measurements obtained with the spectroradiometer without additional processing). In the next step, the set was normalized (min-max normalization, stretching the data to the range [0, 1]), and the classification was performed again using the previously selected algorithms [STEP 2, Fig. 4].
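A minimal sketch of the STEP 2 normalization with scikit-learn's MinMaxScaler; the array shapes are placeholders, and fitting the scaler on the training split only is our assumption of good practice rather than a detail stated in the paper (note also that MinMaxScaler stretches each band independently):

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

rng = np.random.default_rng(0)
X_train = rng.random((100, 2151))  # placeholder spectra: 350-2500 nm sampled at 1 nm
X_test = rng.random((50, 2151))

scaler = MinMaxScaler(feature_range=(0, 1))
X_train_norm = scaler.fit_transform(X_train)  # each band stretched to [0, 1]
X_test_norm = scaler.transform(X_test)        # reuse the training minima/maxima
```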

Figure 4: Diagram of the subsequent stages of the research experiment.

Then, because the distributor of the measuring equipment indicated a high, difficult-to-determine measurement uncertainty in the initial (350–475 nm) and final (2400–2500 nm) ranges of the measurement spectrum, it was decided to remove these ranges from the data set, and the classification process was repeated [STEP 3, Fig. 4]. We chose not to examine these two narrow spectral regions at either end of the spectra obtained with our spectrometer, as we identified these two parts of the measurements as weaker. The readings in the first spectral region reflect the characteristics of the halogen light source used for this study, which has lower emission in the UV-to-blue part of the spectrum. At the other end, the measurement process is weaker due to the characteristics of the employed spectrometer, which uses an InGaAs sensor. Therefore, in order to avoid providing data with a lower signal-to-noise ratio, we omit these two extreme parts of the measurements. As a consequence, we consider the omission of these spectra a limitation of this study. Nevertheless, in a following investigation we plan to address them, including additional blue LED array illumination, which has been reported helpful [82], as well as additional calibration targets included in the process [83].

The next step was to limit the feature space in the classification process. For this purpose, the Recursive Feature Elimination (RFE) method was used, with the overall goal of reducing the feature space to 50 spectral bands at the end of STEP 4 and, if necessary, a deeper reduction in STEP 5. We therefore applied two stages of the elimination process. The first feature elimination stage [STEP 4, Fig. 4] used RFE to limit the number of features to 50 bands, recursively eliminating 50 features in each subsequent iteration. As a result, 50 spectral features (bands) were selected for later inclusion in the final classifier, for which the \(F_1\) measure was calculated.
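A sketch of STEP 4 using scikit-learn's RFE with the parameters named in the text (50 bands kept, 50 features eliminated per iteration); the logistic regression estimator and the toy data are illustrative choices, not the paper's exact setup:

```python
import numpy as np
from sklearn.feature_selection import RFE
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X = rng.random((120, 2000))        # placeholder spectra (one column per band)
y = rng.integers(0, 2, size=120)   # 0 = control, 1 = infected (toy labels)

# STEP 4: keep 50 spectral bands, removing 50 features per iteration.
selector = RFE(
    estimator=LogisticRegression(max_iter=1000),
    n_features_to_select=50,
    step=50,
)
selector.fit(X, y)
selected_bands = selector.get_support(indices=True)  # column indices of the kept bands
X_selected = selector.transform(X)
print(X_selected.shape)  # (120, 50)
```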

The second stage [STEP 5, Fig. 4] applied dynamic elimination of the features, with the difference from STEP 4 that the \(F_1\) score was determined after the elimination of each data feature. Based on the \(F_1\) score, the best elimination stage was determined for each of the considered classification cases. The step of 50 features per iteration was initially chosen arbitrarily, as it both provided the expected performance improvements and did not significantly slow down the entire process. We later tested other options for this parameter, but for shorter steps the number of required computations increased substantially, slowing down the process, and for larger step sizes we noted a loss of accuracy. Therefore, we stayed with the 50-feature step.
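STEP 5 scores the model after every elimination stage; scikit-learn's RFECV implements a closely related scheme (cross-validated scoring at each elimination step, keeping the best feature count), so it can stand in as a sketch, although the paper's exact procedure may differ:

```python
import numpy as np
from sklearn.feature_selection import RFECV
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X = rng.random((120, 2000))        # placeholder spectra
y = rng.integers(0, 2, size=120)

# Cross-validated F1 is recorded after each elimination of 50 features,
# and the feature count with the best score is kept.
rfecv = RFECV(
    estimator=LogisticRegression(max_iter=1000),
    step=50,
    min_features_to_select=50,
    scoring="f1",
    cv=5,
)
rfecv.fit(X, y)
print("best number of bands:", rfecv.n_features_)
```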

During this process, the cross-validation method was used. We performed data splitting for cross-validation with respect to the plant diseases studied. We aimed to distribute the samples for each disease and control group evenly between the subsets. This also meant that the vegetative chambers were equally represented, as the plants inoculated with different diseases were planted separately. Secondly, we also stratified the data samples collected in the following days across the subsets. However, we restricted the placement of scans of the same leaf in the same fold to avoid later processing very similar scans in the same algorithm training phase.
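If the intent is to keep all scans of one leaf inside a single fold while stratifying by disease, which is the usual leakage-avoiding reading of this passage, scikit-learn's StratifiedGroupKFold implements exactly that; the leaf identifiers and data below are hypothetical:

```python
import numpy as np
from sklearn.model_selection import StratifiedGroupKFold

rng = np.random.default_rng(0)
n_scans = 300
X = rng.random((n_scans, 2000))               # placeholder spectra
y = rng.integers(0, 2, size=n_scans)          # disease / control labels
leaf_ids = rng.integers(0, 60, size=n_scans)  # hypothetical per-leaf identifiers

cv = StratifiedGroupKFold(n_splits=3, shuffle=True, random_state=0)
for fold, (train_idx, val_idx) in enumerate(cv.split(X, y, groups=leaf_ids)):
    # No leaf appears on both sides of a split, so near-duplicate scans
    # cannot leak between training and validation.
    assert not set(leaf_ids[train_idx]) & set(leaf_ids[val_idx])
    print(f"fold {fold}: {len(train_idx)} train / {len(val_idx)} validation scans")
```

StratifiedGroupKFold requires scikit-learn 1.0 or newer; with older versions, GroupKFold provides the grouping without the stratification.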

The last stage [STEP 6, Fig. 4] was to fine-tune the hyperparameters of the individual classifiers for the best step from STEP 5, using random search over the hyperparameter space with cross-validation. To address the disease detection task, we applied several well-established machine learning algorithms that fulfill two main criteria: they achieve high performance and they support model interpretability by exposing feature importances. An essential step in developing these detectors was the search for an advantageous hyperparameter configuration; we therefore tested the following ranges for the following models (a sketch of the search follows the list).

LGBM: with learning rates ranging from \(10^{-6}\) to \(10^{-1}\), a minimal number of required samples per trained tree leaf between 5 and 100 with step 5, and a maximum allowed depth of each trained tree from 2 to 52 with step 5,

Linear SVC: with regularisation parameter C between 1 and 1000 and with \(L_1\) or \(L_2\) regularisation,

LR: with regularisation parameter C between 1 and 1000 and with \(L_1\) or \(L_2\) regularisation,

RF: with the maximum number of submodels included from 200 to 2000 with step 100, with the maximum allowed depth of each trained tree from 2 to 52 with step 5, with the minimum required number of samples at each tree leaf node between 1 and 16, and minimum number of samples needed to perform a node split between 2 and 10,

RC: with learning rates ranging from \(10^{-4}\) to \(10^{0}\),

XGB: with learning rates ranging from \(10^{-4}\) to \(10^{-1}\), with the allowed fraction of samples used per tree between 1/5 and 4/5 with step 1/5, with the maximum allowed depth of each trained tree from 2 to 52 with step 5, and with the maximum number of submodels included from 200 to 2000 with step 100.
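As a hedged sketch of the STEP 6 random search for one of the models (logistic regression), using scikit-learn's RandomizedSearchCV; the distributions mirror the ranges listed above, while the remaining settings are assumptions:

```python
# Hypothetical sketch of STEP 6: randomized hyperparameter search with
# cross-validation for the logistic-regression detector.
from scipy.stats import loguniform
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import RandomizedSearchCV

param_distributions = {
    "C": loguniform(1, 1000),   # regularisation parameter C in [1, 1000]
    "penalty": ["l1", "l2"],    # L1 or L2 regularisation
}
search = RandomizedSearchCV(
    LogisticRegression(solver="saga", max_iter=5000),  # saga supports L1 and L2
    param_distributions=param_distributions,
    n_iter=50, scoring="f1", cv=3, random_state=0,
)
search.fit(X_train_sel, y_train)
print(search.best_params_, search.best_score_)
```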

The results presented below correspond to the activities performed at each stage. The results for all the evaluated models are presented in Table 2 and correspond, respectively, to:

result 0—tidy dataset,

result 1—tidy dataset and min-max normalization,

result 2—tidy dataset and elimination of the marginal ranges of the reflectance spectrum,

result 3—tidy dataset, min-max normalization, and the full recursive feature elimination process,

result 4—tidy dataset, min-max normalization, and the best result obtained in the recursive feature elimination process,

result 5—tidy dataset, min-max normalization, the best result obtained in the recursive feature elimination process, and fine-tuning of the classifier hyperparameters.

The ALL column presents the binary classification results (\(F_1\) score) distinguishing healthy plants from sick plants, regardless of the disease (CS vs AN + BS + EB + LB + SL). The following columns contain the results of classifying the individual diseases against the control sample (CS vs AN, CS vs BS, CS vs EB, CS vs LB, and CS vs SL, respectively).
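For reference, the \(F_1\) score reported throughout is the standard harmonic mean of precision and recall (a textbook definition, not restated in the source):

\(F_1 = 2 \cdot \dfrac{\text{precision} \cdot \text{recall}}{\text{precision} + \text{recall}}\)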

All classifiers were trained on the same training dataset (2/3 of all measurements; n = 854 samples) and tested on a separate test dataset (1/3 of the collected measurements; n = 420 samples) that did not participate in the learning process. The training and test sets for all experiment variants were balanced, in the sense that they consisted of 50% samples of infected plants and 50% samples of the control group.
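A minimal sketch of such a split, assuming scikit-learn's train_test_split with stratification to preserve the 50/50 class balance (the random seed is an assumption):

```python
# Hypothetical sketch of the 2/3 train, 1/3 test split with class balance.
from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=1 / 3, stratify=y, random_state=0,
)
```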

Fig. 5 The importance of particular ranges of the reflectance spectrum in the process of disease classification. The white dashed line marks the overlapping spectral ranges for all the analyzed diseases

For the part of the process concerned with sub-models for individual infections (i.e., CS vs. AN, and so on), detection is based only on the data for that pathogen and the control set. Therefore, at this stage a smaller subset of samples is used sequentially, combining only the data for the specific infection with the control data.

In almost every experiment stage, the classification performance improved; only STEP 3 (elimination of the marginal ranges of the reflectance spectrum) resulted in worse scores. For the ALL classification (CS vs. AN + BS + EB + LB + SL), the outcome was an \(F_1\) score of 0.879.

The detection of the BS, EB, and LB diseases yielded slightly lower scores (0.872, 0.877, and 0.866, respectively), while the remaining diseases were detected with greater efficiency (AN: 0.894, SL: 0.896). After performing the entire procedure described, including the hyperparameter tuning, the largest improvements were observed for logistic regression (in 4 cases) and linear SVC (in 2 cases). The smallest improvement in classification efficiency, 0.084, was obtained for AN; the greatest improvement, 0.144, was achieved for BS.

6 Discussion

Another goal of the study was to determine the participation of individual spectral bands (model features) in the decision-making process. Depending on the classifier used, either feature weights (the "coef_" attribute) or feature significance coefficients (the "feature_importances_" attribute) were extracted. These coefficients can be treated as indicators of feature significance for classifiers such as, inter alia, logistic regression or the ridge classifier. The feature weights were used for the RidgeClassifier, linear SVC, and logistic regression classifiers, while the feature significance coefficients were used for the Random Forest, XGBClassifier, and LGBMClassifier classifiers.
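A small sketch of how these two kinds of significance values can be read off a fitted scikit-learn-style model (the helper name is hypothetical):

```python
# Hypothetical helper: return per-band significance for either family of
# models. Linear models expose `coef_`; tree ensembles expose
# `feature_importances_`.
import numpy as np

def band_significance(model):
    if hasattr(model, "coef_"):          # RidgeClassifier, linear SVC, LR
        return np.abs(model.coef_).ravel()
    return model.feature_importances_    # Random Forest, XGB, LGBM
```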

The recursive feature elimination (RFE) method used in the study relies on the feature significance levels and feature weights mentioned above. The method iteratively eliminates the least significant features of the data set; in the case of the considered data set, these were data columns representing successive wavelengths of the reflectance spectrum. In each iteration, the fifty features that showed the least significance in the decision-making process were removed from the set, and the classifier was retrained on the smaller data set. The elimination process ended when the set had been reduced to 50 features. Depending on the data set under consideration, the whole process consisted of 39 or 44 elimination steps. Each feature was assigned a ranking index indicating the stage at which it was eliminated. For each stage, the \(F_1\) score on the test data set was also determined.

Based on the \(F_1\) score, the step with the highest value of the indicator was identified. In this way, all the bands involved in the decision-making process were listed. It was then determined how often particular features were taken into account by a given classifier, and on this basis a graph of the share of spectral features in the decision-making process was created.

As part of an in-depth analysis, the adjacent spectral ranges were grouped (50 bands) and the most frequently repeated spectral ranges were identified (the 70% most rarely occurring bands were removed).
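An illustrative sketch of this grouping, assuming the selected band wavelengths are pooled, binned into roughly 43 nm wide ranges, and thresholded to keep only the most frequent 30% (the function name and parameter values are inferred from the text, not taken from the authors' code):

```python
# Hypothetical sketch: bin selected wavelengths into ~43 nm ranges and
# keep only the most frequently hit ranges (top 30%).
import numpy as np

def frequent_ranges(selected_bands_nm, width_nm=43.0, keep_fraction=0.3):
    bands = np.asarray(selected_bands_nm, dtype=float)
    edges = np.arange(bands.min(), bands.max() + width_nm, width_nm)
    counts, edges = np.histogram(bands, bins=edges)
    threshold = np.quantile(counts[counts > 0], 1.0 - keep_fraction)
    centers = (edges[:-1] + edges[1:]) / 2.0
    return centers[counts >= threshold]
```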

On the basis of the conducted research, the following spectrum ranges were obtained for the analyzed diseases (the numbers given identify the determined ranges of the reflectance spectrum, each around 43 nm wide):

AN: 371, 413, 455, 708, 751, 793, 835, 961, 1004, 1046, 1088, 1805, 1847, 1889, 2438 [nm],

BS: 371, 413, 455, 666, 708, 751, 793, 835, 961, 1004, 1847, 1889, 1932, 2395, 2438 [nm],

EB: 371, 413, 455, 666, 708, 751, 793, 835, 919, 961, 1004, 1046, 1847, 2269, 2438 [nm],

LB: 371, 413, 455, 666, 708, 751, 793, 835, 961, 1004, 1046, 1847, 1889, 2395, 2438 [nm],

SL: 371, 455, 708, 751, 793, 835, 919, 961, 1004, 1805, 1847, 1889, 1974, 2142, 2269, 2438 [nm],

ALL: 371, 413, 455, 498, 666, 708, 751, 793, 835, 919, 961, 1004, 1847, 2395, 2438 [nm].

The above-mentioned spectral reflectance ranges are illustrated in Fig.  5 .

As shown in Fig. 5, several ranges of the reflectance spectrum are common to all analyzed diseases and to the case where the control was compared with a pooled set of samples of all diseases. These are the following ranges: 371, 455, 708, 751, 793, 835, 961, 1004, 1847, 2438 [nm]. The conducted study also indicated spectral ranges specific to each of the analyzed diseases; the relevant summary is presented in Table 3.

The conducted study indicated several specific ranges of the reflectance spectrum, including:

Spectral bands influential in the classification of three or four examined diseases (413, 666, 1046, 1889 [nm]),

Bands characteristic of two examined diseases: EB, SL (919, 2269 [nm]), AN, SL (1805 [nm]), BS, LB (2395 [nm]),

Spectral ranges specific for one analyzed disease: 1088 [nm] (AN), 1932 [nm] (BS), 1974 and 2142 [nm] (SL).

Fig. 6 Comparative report concerning the spectral bands indicated in the literature review (P1–P6: publications described in the introduction to the article; LB, BS, EB: results of the experiment described in this article)

Additionally, the spectral ranges mentioned in the literature review were compared with the results obtained during the research for LB, BS, and EB diseases. The summary of the results is visualized in Fig.  6 .

The spectral band ranges indicated in the literature review were compared with the ranges obtained during the described research. The visible and near-infrared range appears both in the cited publications and in the authors' results. The conducted research also indicated other bands of the reflectance spectrum, significant in the classification of the studied Solanum lycopersicum diseases, located outside the areas mentioned above. These include:

for LB: 1847, 1889, 2395, 2438 [nm] (the P1 publication indicates: 2100 [nm]),

for BS: 1847, 1889, 1932, 2395, 2438 [nm],

for EB: 1847, 2269, 2438 [nm].

7 Conclusion

The research confirmed that spectroscopic reflectance measurements can be used to detect some Solanum lycopersicum diseases. Regarding objective 1, the best results were obtained for the Logistic Regression and Linear SVC classifiers. The Logistic Regression classifier achieved the highest \(F_1\) score for Septoria Leaf Spot disease, amounting to 0.896 after hyperparameter adjustment.

The procedure proposed in the article (STEPS 1–6), the analysis of which was the second main objective of the study, improved the classification results in almost every analyzed case; for the Bacterial Speck disease, the most significant improvement of the \(F_1\) score metric was obtained (from 0.728 to 0.877).

The performed analysis of the significance of the spectral bands in disease classification (objective no. 3) indicated overlapping reflectance ranges for the various diseases (Fig. 5). Additionally, distinct and specific reflectance ranges were identified, which were particularly important in classifying specific diseases (Table 3). The indicated ranges can be used to develop new methods and measuring instruments dedicated to diagnosing the corresponding Solanum lycopersicum diseases and, after further research, to the diagnosis of other crops.

The method described in the article can also be applied to other crops and their diseases; however, this requires developing appropriate, dedicated data sets consisting of measurements made under laboratory or field conditions. Being based on spectroscopy, the presented diagnostic method could potentially be transferred to aviation platforms (i.e., unmanned aerial vehicles, manned aircraft, satellites) [24]. The conducted literature review and the research described in the article show that collecting the most extensive possible set of measurement data, covering various disease cases, may allow the creation of fast and reliable methods of plant disease diagnostics. In subsequent scientific work, the authors plan to attempt to classify diseased plants regardless of the stage of disease development, i.e., at the stage when symptoms are not yet visible (analysis of classification effectiveness in the days following infection). It also seems advisable to use the constructed dataset to analyze the effectiveness of disease classification using various artificial neural network (ANN) models, including deep learning.

Availability of data and materials

The data that support the findings of this study are available from the authors but restrictions apply to the availability of these data. Data are, however, available from the corresponding author upon reasonable request and with permission from the QZ Solutions.

Foolad, M.R.: Genome mapping and molecular breeding of tomato. Int. J. Plant Genom. (2007). https://doi.org/10.1155/2007/64358


Nawrocka, B., Robak, J., Ślusarski, C., Macias, W.: Choroby i Szkodniki Pomidora W Polu i Pod Osłonami. Wydawnictwo Plantpress Sp. z o.o, Kraków (2001)


Datar, V.V., Mayee, C.D.: Conidial dispersal of Alternaria solani in tomato. Indian Phytopathol. 35 , 68–70 (1982)

Grigolli, J.F.J., Kubota, M.M., Alves, D.P., Rodrigues, G.B., Cardoso, C.R., Silva, D.J.H., Mizubuti, E.S.G.: Characterization of tomato accessions for resistance to early blight. Crop Breed. Appl. Biotechnol. 11 (2), 174–180 (2011). https://doi.org/10.1590/S1984-70332011000200010

Jin, X., Jie, L., Wang, S., Qi, H., Li, S.: Classifying wheat hyperspectral pixels of healthy heads and Fusarium head blight disease using a deep neural network in the wild field. Remote Sens. 10 (3), 395 (2018). https://doi.org/10.3390/rs10030395

Khan, A., Vibhute, A.D., Mali, S., Patil, C.H.: A systematic review on hyperspectral imaging technology with a machine and deep learning methodology for agricultural applications. Ecol. Inf. 69 , 101678 (2022). https://doi.org/10.1016/j.ecoinf.2022.101678

Polder, G., Gowen, A.: The hype in spectral imaging. J. Spectr. Imaging 9 (1), 4 (2020). https://doi.org/10.1255/jsi.2020.a4

Tomaszewski, M., Gasz, R., Smykała, K.: Monitoring vegetation changes using satellite imaging - NDVI and RVI4S1 indicators. In: Paszkiel, S. (ed.) Control, Computer Engineering and Neuroscience, pp. 268–278. Springer, Cham (2021)


Jones, H.G., Vaughan, R.A.: Remote Sensing of Vegetation: Principles. Techniques and Applications, Oxford University Press, Oxford (2010)

Ihuoma, S.O., Madramootoo, C.A.: Recent advances in crop water stress detection. Comput. Electron. Agric. 141 , 267–275 (2017). https://doi.org/10.1016/j.compag.2017.07.026

Finn, M.P., Lewis, M.D., Bosch, D.D., Giraldo, M., Yamamoto, K., Sullivan, D.G., Kincaid, R., Luna, R., Allam, G.K., Kvien, C., Williams, M.S.: Remote sensing of soil moisture using airborne hyperspectral data. GIScience Remote Sens. 48 (4), 522–540 (2011). https://doi.org/10.2747/1548-1603.48.4.522

Ruszczak, B., Boguszewska-Mańkowska, D.: Deep potato - the hyperspectral imagery of potato cultivation with reference agronomic measurements dataset: towards potato physiological features modeling. Data Brief 42 , 108087 (2022). https://doi.org/10.1016/j.dib.2022.108087

Singh, S., Kasana, S.S.: Estimation of soil properties from the EU spectral library using long short-term memory networks. Geoderma Reg. 18 , 00233 (2019). https://doi.org/10.1016/j.geodrs.2019.e00233

Tomaszewski, M., Nalepa, J., Moliszewska, E., Ruszczak, B., Smykała, K.: Early detection of Solanum lycopersicum diseases from temporally-aggregated hyperspectral measurements using machine learning. Sci. Rep. 13 (1), 7671 (2023). https://doi.org/10.1038/s41598-023-34079-x

Marino, S., Beauseroy, P., Smolarz, A.: Weakly-supervised learning approach for potato defects segmentation. Eng. Appl. Artific. Intell. 85 , 337–346 (2019). https://doi.org/10.1016/j.engappai.2019.06.024

Reis-Pereira, M., Tosin, R., Martins, R., Santos, F., Tavares, F., Cunha, M.: Kiwi plant canker diagnosis using hyperspectral signal processing and machine learning: detecting symptoms caused by Pseudomonas syringae pv. Actinidiae . Plants. (2022) https://doi.org/10.3390/plants11162154

Naik, B.N., Malmathanraj, R., Palanisamy, P.: Detection and classification of chilli leaf disease using a squeeze-and-excitation-based CNN model. Ecol. Inf. 69 , 101663 (2022). https://doi.org/10.1016/j.ecoinf.2022.101663

Keceli, A.S., Kaya, A., Catal, C., Tekinerdogan, B.: Deep learning-based multi-task prediction system for plant disease and species detection. Ecol. Inf. 69 , 101679 (2022). https://doi.org/10.1016/j.ecoinf.2022.101679

Furlanetto, R.H., Nanni, M.R., Mizuno, M.S., Crusiol, L.G.T., da Silva, C.R.: Identification and classification of Asian soybean rust using leaf-based hyperspectral reflectance. Int. J. Remote Sens. 42 (11), 4177–4198 (2021). https://doi.org/10.1080/01431161.2021.1890855

Alisaac, E., Behmann, J., Kuska, M.T., Dehne, H.-W., Mahlein, A.-K.: Hyperspectral quantification of wheat resistance to Fusarium head blight: comparison of two Fusarium species. Eur. J. Plant Pathol. 152 (4), 869–884 (2018). https://doi.org/10.1007/s10658-018-1505-9

Mahlein, A.-K.: Detection, identification, and quantification of fungal diseases of sugar beet leaves using imaging and non-imaging hyperspectral techniques. PhD thesis, Rheinische Friedrich-Wilhelms-Universität Bonn (2011). https://hdl.handle.net/20.500.11811/4713

Mahlein, A.-K., Kuska, M.T., Thomas, S., Wahabzada, M., Behmann, J., Rascher, U., Kersting, K.: Quantitative and qualitative phenotyping of disease resistance of crops by hyperspectral sensors: seamless interlocking of phytopathology, sensors, and machine learning is needed!. Curr. Opin. Plant Biol. 50 , 156–162 (2019) https://doi.org/10.1016/j.pbi.2019.06.007 . Biotic interactions

Kaur, P., Harnal, S., Gautam, V., Singh, M.P., Singh, S.P.: An approach for characterization of infected area in tomato leaf disease based on deep learning and object detection technique. Eng. Appl. Artific. Intell. 115 , 105210 (2022). https://doi.org/10.1016/j.engappai.2022.105210

Zhang, M., Qin, Z.: Spectral analysis of tomato late blight infections for remote sensing of tomato disease stress in California. In: IEEE International Geoscience and Remote Sensing Symposium, 2004. IGARSS ’04. Proceedings. 2004, vol. 6, pp. 4091–4094. IEEE, Anchorage, AK, USA (2004). https://doi.org/10.1109/IGARSS.2004.1370031 . http://ieeexplore.ieee.org/document/1370031/

Xie, C., Yang, C., He, Y.: Hyperspectral imaging for classification of healthy and gray mold diseased tomato leaves with different infection severities. Comput. Electron. Agric. 135 , 154–162 (2017). https://doi.org/10.1016/j.compag.2016.12.015

Moghadam, P., et al.: Plant disease detection using hyperspectral imaging. In: 2017 International Conference on Digital Image Computing: Techniques and Applications (DICTA), pp. 1–8. IEEE, Sydney (2017). https://doi.org/10.1109/DICTA.2017.8227476 . http://ieeexplore.ieee.org/document/8227476/

Smykała, K., Ruszczak, B., Dziubański, K.: Application of ensemble learning to detect Alternaria solani infection on tomatoes cultivated under foil tunnels. Intell. Environ. (2020). https://doi.org/10.3233/AISE200033

Ruszczak, B., Smykała, K., Dziubański, K.: The detection of Alternaria solani infection on tomatoes using ensemble learning. J. Ambient Intell. Smart Environ. 12 (5), 407–418 (2020). https://doi.org/10.3233/AIS-200573

Tasrif Anubhove, M.S., Ashrafi, N., Saleque, A.M., Akter, M., Saif, S.U.: Machine learning algorithm based disease detection in tomato with automated image telemetry for vertical farming. In: 2020 International Conference on Computational Performance Evaluation (ComPE), pp. 250–254. IEEE, Shillong, India (2020). https://doi.org/10.1109/ComPE49325.2020.9200129 . https://ieeexplore.ieee.org/document/9200129

Fahrentrapp, J., Ria, F., Geilhausen, M., Panassiti, B.: Detection of gray mold leaf infections prior to visual symptom appearance using a five-band multispectral sensor. Front. Plant Sci. 10 (May), 1–14 (2019). https://doi.org/10.3389/fpls.2019.00628

Ferentinos, K.P.: Deep learning models for plant disease detection and diagnosis. Comput. Electron. Agric. 145 (February), 311–318 (2018). https://doi.org/10.1016/j.compag.2018.01.009

Brahimi, M., Boukhalfa, K., Moussaoui, A.: Deep learning for tomato diseases: classification and symptoms visualization. Appl. Artific. Intell. 31 (4), 299–315 (2017). https://doi.org/10.1080/08839514.2017.1315516

Mohanty, S.P., Hughes, D.P., Salathé, M.: Using deep learning for image-based plant disease detection. Front. Plant Sci. (2016). https://doi.org/10.3389/fpls.2016.01419 . arXiv:1604.03169

Adhikari, S., KC, E., Balkumari, L., Shrestha, B., Baiju, B.: Tomato plant diseases detection system using image processing. In: Kantipur Engineering College Conference, pp. 81–86 (2018)

Xie, C., Shao, Y., Li, X., He, Y.: Detection of early blight and late blight diseases on tomato leaves using hyperspectral imaging. Sci. Rep. 5 (1), 16564 (2015). https://doi.org/10.1038/srep16564

Wang, X., Zhang, M., Zhu, J., Geng, S.: Spectral prediction of Phytophthora infestans infection on tomatoes using artificial neural network (ANN). Int. J. Remote Sens. 29 (6), 1693–1706 (2008). https://doi.org/10.1080/01431160701281007

Lu, J., Ehsani, R., Shi, Y., Castro, A.I., Wang, S.: Detection of multi-tomato leaf diseases (late blight, target and bacterial spots) in different stages by using a spectral-based sensor. Sci. Rep. 8 (1), 2793 (2018). https://doi.org/10.1038/s41598-018-21191-6

Jones, C.D., Jones, J.B., Lee, W.S.: Diagnosis of bacterial spot of tomato using spectral signatures. Comput. Electron. Agric. 74 (2), 329–335 (2010). https://doi.org/10.1016/j.compag.2010.09.008

Patil, M.A., Manur, M.: Enhanced radial basis function neural network for tomato plant disease leaf image segmentation. Ecol. Inf. 70 , 101752 (2022). https://doi.org/10.1016/j.ecoinf.2022.101752

Reis Pereira, M., Santos, F.N., Tavares, F., Cunha, M.: Enhancing host-pathogen phenotyping dynamics: early detection of tomato bacterial diseases using hyperspectral point measurement and predictive modeling. Front. Plant. Sci. (2023). https://doi.org/10.3389/fpls.2023.1242201

Sharma, P., Sharma, S.: Paradigm shift in plant disease diagnostics: a journey from conventional diagnostics to nano-diagnostics. In: Fungal biology, pp. 237–264. Springer, Cham (2016). https://doi.org/10.1007/978-3-319-27312-9_11 . http://link.springer.com/10.1007/978-3-319-27312-9_11

Golhani, K., Balasundram, S.K., Vadamalai, G., Pradhan, B.: A review of neural networks in plant disease detection using hyperspectral data. Inf. Process. Agric. 5 (3), 354–371 (2018). https://doi.org/10.1016/j.inpa.2018.05.002

Sakudo, A., Suganuma, Y., et al.: Near-infrared spectroscopy: promising diagnostic tool for viral infections. Biochem. Biophys. Res. Commun. 341 (2), 279–284 (2006). https://doi.org/10.1016/j.bbrc.2005.12.153

Garhwal, A.S., Pullanagari, R.R., Li, M., Reis, M.M., Archer, R.: Hyperspectral imaging for identification of zebra chip disease in potatoes. Biosyst. Eng. 197 , 306–317 (2020). https://doi.org/10.1016/j.biosystemseng.2020.07.005

Zhang, J., Tian, Y., Yan, L., Wang, B., Wang, L., Xu, J., Wu, K.: Diagnosing the symptoms of sheath blight disease on rice stalk with an in-situ hyperspectral imaging technique. Biosyst. Eng. 209 , 94–105 (2021). https://doi.org/10.1016/j.biosystemseng.2021.06.020

Singh, A., Kaur, J., Singh, K., Singh, M.L.: Deep transfer learning-based automated detection of blast disease in paddy crop. Signal Image Video Process. (2023). https://doi.org/10.1007/s11760-023-02735-4

Ruszczak, B., Wijata, A.M., Nalepa, J.: Unbiasing the estimation of chlorophyll from hyperspectral images: a benchmark dataset, validation procedure and baseline results. Remote Sens. (2022). https://doi.org/10.3390/rs14215526

Ruszczak, B., Wijata, A.M., Nalepa, J.: Estimating chlorophyll content from hyperspectral data using gradient features. In: Computational Science - ICCS 2023 - 23nd International Conference, Prague, Czech Republic, 3-5 July, 2023, Proceedings. Lecture Notes in Computer Science. Springer, Cham (2023). https://doi.org/10.1007/978-3-031-36021-3_18

Ruszczak, B.: Reducing high-dimensional feature set of hyperspectral measurements for plant phenotype classification. GECCO ’23. ACM, Lisbon (2023). https://doi.org/10.1145/3583133.3596941

Navarro, P.J., Miller, L., Díaz-Galián, M.V., Gila-Navarro, A., Aguila, D.J., Egea-Cortines, M.: A novel ground truth multispectral image dataset with weight, anthocyanins, and brix index measures of grape berries tested for its utility in machine learning pipelines. GigaScience (2022). https://doi.org/10.1093/gigascience/giac052

Desai, M., Kumar Jain, A., Jain, N.K., Jethwa, K.: Detection and classification of fruit disease : a review. Int. Res. J. Eng. Technol. (2016)

Sladojevic, S., Arsenovic, M., Anderla, A., Culibrk, D., Stefanovic, D.: Deep neural networks based recognition of plant diseases by leaf image classification. Comput. Intell. Neurosci. (2016). https://doi.org/10.1155/2016/3289801

Islam, M., Anh Dinh, Wahid, K., Bhowmik, P.: Detection of potato diseases using image segmentation and multiclass support vector machine. In: 2017 IEEE 30th Canadian Conference on Electrical and Computer Engineering (CCECE), pp. 1–4. IEEE, London, Canada (2017). https://doi.org/10.1109/CCECE.2017.7946594 . http://ieeexplore.ieee.org/document/7946594/

Wang, Q., Qi, F., Sun, M., Qu, J., Xue, J.: Identification of tomato disease types and detection of infected areas based on deep convolutional neural networks and object detection techniques. Comput. Intell. Neurosci. (2019). https://doi.org/10.1155/2019/9142753

Moshou, D., Bravo, C., West, J., Wahlen, S., McCartney, A., Ramon, H.: Automatic detection of ‘yellow rust’ in wheat using reflectance measurements and neural networks. Comput. Electron. Agric. 44 (3), 173–188 (2004). https://doi.org/10.1016/j.compag.2004.04.003

Navarro, P.J., Miller, L., Gila-Navarro, A., Díaz-Galián, M.V., Aguila, D.J., Egea-Cortines, M.: 3deepm: an ad hoc architecture based on deep learning methods for multispectral image classification. Remote Sens. (2021). https://doi.org/10.3390/rs13040729

Van De Vijver, R., et al.: In-field detection of Alternaria solani in potato crops using hyperspectral imaging. Comput. Electron. Agric. 168 , 105106 (2020). https://doi.org/10.1016/j.compag.2019.105106

Georgiev, G.T., Butler, J.J.: Long-term calibration monitoring of Spectralon diffusers BRDF in the air-ultraviolet. Appl. Opt. 46 (32), 7892 (2007). https://doi.org/10.1364/AO.46.007892

Wickham, H.: Tidy data. J. Stat. Softw. 59 (10), 1–23 (2014)

Wang, B., et al.: An ultra-lightweight efficient network for image-based plant disease and pest infection detection. Precis. Agric. 24 (5), 1836–1861 (2023). https://doi.org/10.1007/s11119-023-10020-0

Thangaraj, R., et al.: A deep convolution neural network model based on feature concatenation approach for classification of tomato leaf disease. Multimed. Tools Appl. (2023). https://doi.org/10.1007/s11042-023-16347-0

Kumar, A., Patel, V.K.: Classification and identification of disease in potato leaf using hierarchical based deep learning convolutional neural network. Multimed. Tools Appl. 82 (20), 31101–31127 (2023). https://doi.org/10.1007/s11042-023-14663-z

Kurmi, Y., Gangwar, S., Agrawal, D., Kumar, S., Srivastava, H.S.: Leaf image analysis-based crop diseases classification. Signal Image Video Process 15 (3), 589–597 (2021). https://doi.org/10.1007/s11760-020-01780-7

Bora, R., Parasar, D., Charhate, S.: A detection of tomato plant diseases using deep learning MNDLNN classifier. Signal Image Video Process 17 (7), 3255–3263 (2023). https://doi.org/10.1007/s11760-023-02498-y

Appeltans, S., Pieters, J.G., Mouazen, A.M.: Potential of laboratory hyperspectral data for in-field detection of phytophthora infestans on potato. Precis. Agric. 23 (3), 876–893 (2022). https://doi.org/10.1007/s11119-021-09865-0

Omer, S.M., Ghafoor, K.Z., Askar, S.K.: Lightweight improved yolov5 model for cucumber leaf disease and pest detection based on deep learning. Signal Image Video Process (2023). https://doi.org/10.1007/s11760-023-02865-9

Bhakta, I., et al.: A novel plant disease prediction model based on thermal images using modified deep convolutional neural network. Precis. Agric. 24 (1), 23–39 (2023). https://doi.org/10.1007/s11119-022-09927-x


Haagsma, M., Hagerty, C.H., Kroese, D.R., Selker, J.S.: Detection of soil-borne wheat mosaic virus using hyperspectral imaging: from lab to field scans and from hyperspectral to multispectral data. Precision Agric. 24 (3), 1030–1048 (2023). https://doi.org/10.1007/s11119-022-09986-0

Kaur, P., Harnal, S., Gautam, V., Singh, M.P., Singh, S.P.: Performance analysis of segmentation models to detect leaf diseases in tomato plant. Multimed. Tools Appl. (2023). https://doi.org/10.1007/s11042-023-16238-4

Kaur, P., Harnal, S., Gautam, V., Singh, M.P., Singh, S.P.: Hybrid deep learning model for multi biotic lesions detection in solanum lycopersicum leaves. Multimed. Tools Appl. (2023). https://doi.org/10.1007/s11042-023-15940-7

Kang, J., Jin, R., Li, X., Zhang, Y., Zhu, Z.: Spatial upscaling of sparse soil moisture observations based on ridge regression. Remote Sens. (2018). https://doi.org/10.3390/rs10020192

Xu, C., Ding, J., Qiao, Y., Zhang, L.: Tomato disease and pest diagnosis method based on the Stacking of prescription data. Comput. Electron. Agric. 197 , 106997 (2022). https://doi.org/10.1016/j.compag.2022.106997

Panchal, P., Raman, V.C., Mantri, S.: Plant diseases detection and classification using machine learning models. In: CSITSS 2019 - 2019 4th International Conference on Computational Systems and Information Technology for Sustainable Solution, Proceedings. Institute of Electrical and Electronics Engineers Inc. (2019). https://doi.org/10.1109/CSITSS47250.2019.9031029

Boguszewska-Mańkowska, D., Ruszczak, B., Zarzyńska, K.: Classification of potato varieties drought stress tolerance using supervised learning. Appl. Sci. (2022). https://doi.org/10.3390/app12041939

Srinivas, L.N.B., Bharathy, A.M.V., Ramakuri, S.K., Sethy, A., Kumar, R.: An optimized machine learning framework for crop disease detection. Multimed. Tools Appl. (2023). https://doi.org/10.1007/s11042-023-15446-2

Ruszczak, B., Boguszewska-Mańkowska, D.: Soil moisture a posteriori measurements enhancement using ensemble learning. Sensors (2022). https://doi.org/10.3390/s22124591

Xie, Y., Plett, D., Liu, H.: The promise of hyperspectral imaging for the early detection of crown rot in wheat. AgriEngineering 3 (4), 924–941 (2021). https://doi.org/10.3390/agriengineering3040058

Nguyen, C., Sagan, V., Maimaitiyiming, M., Maimaitijiang, M., Bhadra, S., Kwasniewski, M.T.: Early detection of plant viral disease using hyperspectral imaging and deep learning. Sensors (Basel) 21 (3), 742 (2021)

Mourtzinis, S., Esker, P.D., Specht, J.E., Conley, S.P.: Advancing agricultural research using machine learning algorithms. Sci. Rep. 11 (1), 17879 (2021). https://doi.org/10.1038/s41598-021-97380-7

Wicaksono, P., Aryaguna, P.A., Lazuardi, W.: Benthic habitat mapping model and cross validation using machine-learning classification algorithms. Remote Sens. (2019). https://doi.org/10.3390/rs11111279

Nagasubramanian, K., et al.: Hyperspectral band selection using genetic algorithm and support vector machines for early identification of charcoal rot disease in soybean stems. Plant Methods 14 (1) (2018). https://doi.org/10.1186/s13007-018-0349-9 . arXiv:1710.04681

Mahlein, A.-K., Hammersley, S., Oerke, E.-C., Dehne, H.-W., Goldbach, H., Grieve, B.: Supplemental blue led lighting array to improve the signal quality in hyperspectral imaging of plants. Sensors 15 (6), 12834–12840 (2015). https://doi.org/10.3390/s150612834

Crusiol, L.G.T., Nanni, M.R., Silva, G.F.C., et al.: Semi professional digital camera calibration techniques for VIS/NIR spectral data acquisition from an unmanned aerial vehicle. Int J Remote Sens 38 (8–10), 2717–2736 (2017). https://doi.org/10.1080/01431161.2016.1264032


This work was partially supported by The National Centre for Research and Development of Poland under project POIR.01.01.01-00.1317/17 and by QZ Solutions sp. z o.o.

Author information

Authors and Affiliations

Department of Computer Science, Faculty of Electrical Engineering, Automatic Control and Informatics, Opole University of Technology, Prószkowska Street 76, 45-758, Opole, Opolskie, Poland

Bogdan Ruszczak, Krzysztof Smykała & Michał Tomaszewski

Research and Development Department, QZ Solutions Sp. z o.o., Technologiczna 2 Street, 45-839, Opole, Opolskie, Poland

Krzysztof Smykała

Escuela Técnica Superior de Ingeniería de Telecomunicación (DSIE), Universidad Politécnica de Cartagena, Campus Muralla del Mar, 30202, Cartagena, Region of Murcia, Spain

Pedro Javier Navarro Lorente


Contributions

BR: Conceptualization, methodology, software, validation, investigation, writing—original draft, Writing—review and editing, supervision, funding acquisition. KS: conceptualization, methodology, software, validation, formal analysis, literature review, investigation, writing—original draft, visualization, funding acquisition. MT: conceptualization, methodology, validation, formal analysis, literature review, investigation, writing—original draft, supervision. PJN: validation, writing—original draft, writing - review and editing, funding acquisition.

Corresponding author

Correspondence to Bogdan Ruszczak .

Ethics declarations

Conflict of interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Ethical approval

Not applicable.

Additional information

Publisher's note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/ .


About this article

Ruszczak, B., Smykała, K., Tomaszewski, M. et al. Various tomato infection discrimination using spectroscopy. SIViP (2024). https://doi.org/10.1007/s11760-024-03247-5


Received: 06 December 2023

Revised: 08 April 2024

Accepted: 26 April 2024

Published: 17 May 2024

DOI: https://doi.org/10.1007/s11760-024-03247-5


Keywords

  • Plant diseases classification
  • Machine learning
  • Spectroscopy
  • Hyperspectral measurements


