
  • Review Article
  • Open access
  • Published: 22 April 2020

Deep learning in mental health outcome research: a scoping review

  • Chang Su 1 ,
  • Zhenxing Xu 1 ,
  • Jyotishman Pathak 1 &
  • Fei Wang 1  

Translational Psychiatry volume  10 , Article number:  116 ( 2020 ) Cite this article


  • Psychiatric disorders

Mental illnesses, such as depression, are highly prevalent and have been shown to impact an individual’s physical health. Recently, artificial intelligence (AI) methods have been introduced to assist mental health providers, including psychiatrists and psychologists, for decision-making based on patients’ historical data (e.g., medical records, behavioral data, social media usage, etc.). Deep learning (DL), as one of the most recent generation of AI technologies, has demonstrated superior performance in many real-world applications ranging from computer vision to healthcare. The goal of this study is to review existing research on applications of DL algorithms in mental health outcome research. Specifically, we first briefly overview the state-of-the-art DL techniques. Then we review the literature relevant to DL applications in mental health outcomes. According to the application scenarios, we categorize these relevant articles into four groups: diagnosis and prognosis based on clinical data, analysis of genetics and genomics data for understanding mental health conditions, vocal and visual expression data analysis for disease detection, and estimation of risk of mental illness using social media data. Finally, we discuss challenges in using DL algorithms to improve our understanding of mental health conditions and suggest several promising directions for their applications in improving mental health diagnosis and treatment.


Introduction

Mental illness is a type of health condition that changes a person’s mind, emotions, or behavior (or all three), and has been shown to impact an individual’s physical health 1,2. Mental health issues such as depression, schizophrenia, attention-deficit hyperactivity disorder (ADHD), and autism spectrum disorder (ASD) are highly prevalent today, and it is estimated that around 450 million people worldwide suffer from such problems 1. In addition to adults, children and adolescents under the age of 18 also face the risk of mental health disorders. Moreover, mental illnesses are among the most serious and prevalent public health problems. For example, depression is a leading cause of disability and can lead to an increased risk of suicidal ideation and suicide attempts 2.

Early detection of mental health problems is an essential step toward better understanding mental health conditions and providing better patient care. Unlike the diagnosis of other chronic conditions, which relies on laboratory tests and measurements, mental illnesses are typically diagnosed based on an individual’s self-report on questionnaires designed to detect specific patterns of feelings or social interactions 3. Due to the increasing availability of data pertaining to an individual’s mental health status, artificial intelligence (AI) and machine learning (ML) technologies are being applied to improve our understanding of mental health conditions and to assist mental health providers in clinical decision-making 4,5,6. As one of the latest advances in AI and ML, deep learning (DL), which transforms data through layers of nonlinear computational processing units, provides a new paradigm for effectively gaining knowledge from complex data 7. In recent years, DL algorithms have demonstrated superior performance in many data-rich application scenarios, including healthcare 8,9,10.

In a previous study, Shatte et al. 11 explored the application of ML techniques in mental health. They reviewed the literature in four main application domains: diagnosis, prognosis, and treatment; public health; and research and clinical administration. In another study, Durstewitz et al. 9 explored the emerging application of DL techniques in psychiatry. They focused on DL in studies of brain dynamics and subjects’ behaviors, and discussed how interpretable computational models can be embedded in a statistical context. In contrast, this study aims to provide a scoping review of existing research applying DL methodologies to the analysis of different types of data related to mental health conditions. The reviewed articles are organized into four main groups according to the type of data analyzed: (1) clinical data, (2) genetic and genomics data, (3) vocal and visual expression data, and (4) social media data. Finally, we discuss the challenges faced by current studies, as well as future research directions toward bridging the gap between the application of DL algorithms and patient care.

Deep learning overview

ML aims at developing computational algorithms or statistical models that can automatically infer hidden patterns from data 12,13. Recent years have witnessed an increasing number of ML models being developed to analyze healthcare data 4. However, conventional ML approaches require a significant amount of feature engineering to obtain good performance in most application scenarios, a step that is usually resource- and time-consuming.

As the newest wave of ML and AI technologies, DL approaches aim at the development of an end-to-end mechanism that maps the input raw features directly into the outputs through a multi-layer network structure that is able to capture the hidden patterns within the data. In this section, we will review several popular DL model architectures, including deep feedforward neural network (DFNN), recurrent neural network (RNN) 14 , convolutional neural network (CNN) 15 , and autoencoder 16 . Figure 1 provides an overview of these architectures.

figure 1

a Deep feedforward neural network (DFNN). It is the basic design of DL models. Commonly, a DFNN contains multiple hidden layers. b A recurrent neural network (RNN) processes sequence data. To encode history information, each recurrent neuron receives the input element and the state vector of the predecessor neuron, and yields a hidden state that is fed to the successor neuron. For example, not only the individual elements but also the dependencies within the sequence x 1 → x 2 → x 3 → x 4 → x 5 are encoded by the RNN architecture. c Convolutional neural network (CNN). Between the input layer (e.g., an input neuroimage) and the output layer, a CNN commonly contains three types of layers: the convolutional layer, which generates feature maps by sliding convolutional kernels over the previous layer; the pooling layer, which reduces the dimensionality of the previous convolutional layer; and the fully connected layer, which makes the prediction. For illustrative purposes, this example has only one layer of each type; a real-world CNN would have multiple convolutional and pooling layers (usually in an alternating manner) and one fully connected layer. d An autoencoder consists of two components: the encoder, which learns to compress the input data into a latent representation layer by layer, and the decoder, which, inverse to the encoder, learns to reconstruct the data at the output layer. The learned compressed representations can be fed to the downstream predictive model.

Deep feedforward neural network

The artificial neural network (ANN) was proposed with the intention of mimicking how the human brain works; its basic element is the artificial neuron depicted in Fig. 2a. Mathematically, an artificial neuron is a nonlinear transformation unit, which takes the weighted summation of all inputs and feeds the result to an activation function, such as sigmoid, rectifier (i.e., rectified linear unit [ReLU]), or hyperbolic tangent (Fig. 2b). An ANN is composed of multiple artificial neurons with different connection architectures. The simplest ANN architecture is the feedforward neural network (FNN), which stacks the neurons layer by layer in a feedforward manner (Fig. 1a), with the neurons across adjacent layers fully connected to each other. The first layer of the FNN is the input layer, in which each unit receives one dimension of the data vector. The last layer is the output layer, which outputs the probabilities of a subject belonging to each class (in classification). The layers between the input and output layers are the hidden layers; a DFNN usually contains multiple hidden layers. As shown in Fig. 2a, there is a weight parameter associated with each edge in the DFNN, which needs to be optimized by minimizing a training loss measured on a specific training dataset (usually through backpropagation 17). After the optimal set of parameters is learned, the DFNN can be used to predict the target value (e.g., class) of any testing data vector. Therefore, a DFNN can be viewed as an end-to-end process that transforms a specific raw data vector to its target, layer by layer. Compared with traditional ML models, DFNNs have shown superior performance in many data mining tasks and have been introduced to the analysis of clinical and genetic data to predict mental health conditions. We will discuss the applications of these methods further in the Results section.
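As a minimal illustration of this end-to-end view, the forward pass of a small DFNN can be sketched in NumPy. The layer sizes, random weights, and activation choices below are purely hypothetical; in a trained model these parameters would have been optimized by backpropagation rather than drawn at random.

```python
import numpy as np

rng = np.random.default_rng(0)

def relu(z):
    # Rectified linear unit: one of the activation functions named above.
    return np.maximum(0.0, z)

def softmax(z):
    # Turns the output layer's weighted sums into class probabilities.
    e = np.exp(z - z.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

# Hypothetical layer sizes: 8 input features, two hidden layers, 3 classes.
sizes = [8, 16, 16, 3]
weights = [rng.normal(0, 0.1, (m, n)) for m, n in zip(sizes[:-1], sizes[1:])]
biases = [np.zeros(n) for n in sizes[1:]]

def dfnn_forward(x):
    """Layer-by-layer transformation of a raw input vector into class probabilities."""
    h = x
    for W, b in zip(weights[:-1], biases[:-1]):
        h = relu(h @ W + b)                        # hidden layer: weighted sum + nonlinearity
    return softmax(h @ weights[-1] + biases[-1])   # output layer: class probabilities

x = rng.normal(size=8)   # a toy raw data vector
p = dfnn_forward(x)      # probabilities over the 3 classes, summing to one
```

Each hidden layer applies exactly the weighted-sum-plus-activation computation of the artificial neuron, and the softmax output gives the per-class probabilities mentioned in the text.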

figure 2

a An illustration of the basic unit of neural networks, i.e., the artificial neuron. Each input x i is associated with a weight w i. The weighted sum of all inputs Σ w i x i is fed to a nonlinear activation function f to generate the output y j of the j-th neuron, i.e., y j = f(Σ w i x i ). b Illustrations of widely used nonlinear activation functions.

Recurrent neural network

RNNs were designed to analyze sequential data such as natural language, speech, and video. Given an input sequence, an RNN processes its elements one at a time by feeding each to a recurrent neuron. To encode the historical information along the sequence, each recurrent neuron receives the input element at the corresponding time point and the output of the neuron at the previous time stamp, and its output is in turn provided to the neuron at the next time stamp (this is where the term “recurrent” comes from). An example RNN architecture is shown in Fig. 1b, where the input is a sequence of words (a sentence). The recurrence link (i.e., the edge linking different neurons) enables the RNN to capture the latent semantic dependencies among words and the syntax of the sentence. In recent years, different variants of the RNN, such as long short-term memory (LSTM) 18 and the gated recurrent unit 19, have been proposed; the main difference among these models is how the input is mapped to the output within the recurrent neuron. RNN models have demonstrated state-of-the-art performance in various applications, especially natural language processing (NLP; e.g., machine translation and text-based classification); hence, they hold great promise for processing clinical notes and social media posts to detect mental health conditions, as discussed below.
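The recurrence described above can be sketched as a simple state-update rule. The dimensions and the tanh cell below are illustrative assumptions (a plain RNN cell rather than an LSTM or gated recurrent unit), and the weights are random rather than learned:

```python
import numpy as np

rng = np.random.default_rng(1)
d_in, d_h = 5, 8                       # input and hidden-state sizes (illustrative)
W_x = rng.normal(0, 0.1, (d_in, d_h))  # input-to-hidden weights
W_h = rng.normal(0, 0.1, (d_h, d_h))   # hidden-to-hidden (recurrence) weights
b = np.zeros(d_h)

def rnn_forward(sequence):
    """h_t = tanh(x_t W_x + h_{t-1} W_h + b): each state carries history forward."""
    h = np.zeros(d_h)                  # initial hidden state
    states = []
    for x_t in sequence:               # one element of the sequence at a time
        h = np.tanh(x_t @ W_x + h @ W_h + b)
        states.append(h)
    return np.stack(states)

seq = rng.normal(size=(6, d_in))       # e.g., embeddings of a 6-word sentence
H = rnn_forward(seq)                   # one hidden state per time step
```

Because each state depends on the previous one, the final state summarizes the whole sequence in order, which is what lets RNNs capture dependencies among words.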

Convolutional neural network

CNN is a specific type of deep neural network originally designed for image analysis 15 , where each pixel corresponds to a specific input dimension describing the image. Similar to a DFNN, CNN also maps these input image pixels to the corresponding target (e.g., image class) through layers of nonlinear transformations. Different from DFNN, where only fully connected layers are considered, there are typically three types of layers in a CNN: a convolution–activation layer, a pooling layer, and a fully connected layer (Fig. 1c ). The convolution–activation layer first convolves the entire feature map obtained from previous layer with small two-dimensional convolution filters. The results from each convolution filter are activated through a nonlinear activation function in the same way as a DFNN. A pooling layer reduces the size of the feature map through sub-sampling. The fully connected layer is analogous to the hidden layer in a DFNN, where each neuron is connected to all neurons of the previous layer. The convolution–activation layer extracts locally invariant patterns from the feature maps. The pooling layer effectively reduces the feature dimensionality to avoid model overfitting. The fully connected layer explores the global feature interactions as in DFNNs. Different combinations of these three types of layers constitute different CNN architectures. Because of the various characteristics of images such as local self-similarity, compositionality, and translational and deformation invariance, CNN has demonstrated state-of-the-art performance in many computer vision tasks 7 . Hence, the CNN models are promising in processing clinical images and expression data (e.g., facial expression images) to detect mental health conditions. We will discuss the application of these methods in the Results section.
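The convolution–activation and pooling operations described above can be sketched on a toy single-channel image. The image, kernel, and sizes here are hypothetical, and a real CNN would learn many such kernels across multiple layers:

```python
import numpy as np

def conv2d(img, kernel):
    """Valid 2D convolution: slide a small filter over the image to build a feature map."""
    kh, kw = kernel.shape
    out_h, out_w = img.shape[0] - kh + 1, img.shape[1] - kw + 1
    out = np.empty((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            out[i, j] = np.sum(img[i:i + kh, j:j + kw] * kernel)
    return out

def max_pool2x2(fmap):
    """Non-overlapping 2x2 max pooling halves each spatial dimension (sub-sampling)."""
    h, w = fmap.shape[0] // 2 * 2, fmap.shape[1] // 2 * 2
    return fmap[:h, :w].reshape(h // 2, 2, w // 2, 2).max(axis=(1, 3))

rng = np.random.default_rng(2)
image = rng.normal(size=(8, 8))        # toy 8x8 single-channel "image"
kernel = rng.normal(size=(3, 3))       # one convolution filter (learned in practice)
fmap = np.maximum(0.0, conv2d(image, kernel))  # convolution + ReLU activation
pooled = max_pool2x2(fmap)             # reduced feature map passed onward
```

The 8 × 8 input becomes a 6 × 6 feature map after valid convolution and a 3 × 3 map after pooling, illustrating how pooling reduces feature dimensionality before the fully connected layer.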

Autoencoder

The autoencoder is a special variant of the DFNN aimed at learning new (usually more compact) data representations that can optimally reconstruct the original data vectors 16,20. An autoencoder typically consists of two components (Fig. 1d): (1) the encoder, which learns new representations (usually with reduced dimensionality) from the input data through a multi-layer FNN; and (2) the decoder, which is exactly the reverse of the encoder and reconstructs the data in their original space from the representations derived by the encoder. The parameters of the autoencoder are learned by minimizing the reconstruction loss. The autoencoder has demonstrated the capacity to extract meaningful features from raw data without any supervision information. In studies of mental health outcomes, the use of autoencoders has yielded desirable improvements in analyzing clinical and expression image data, which will be detailed in the Results section.
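The encoder–decoder structure and its reconstruction loss can be sketched as follows. This single-layer version with randomly initialized weights and one illustrative gradient step on the decoder is only a toy; the dimensionalities and learning rate are arbitrary assumptions:

```python
import numpy as np

rng = np.random.default_rng(3)
d, k = 20, 4                            # original and latent dimensionality (illustrative)
W_enc = rng.normal(0, 0.1, (k, d))      # encoder weights
W_dec = rng.normal(0, 0.1, (d, k))      # decoder weights

x = rng.normal(size=d)                  # a toy data vector
z = np.tanh(W_enc @ x)                  # encoder: compress x into a latent code
x_hat = W_dec @ z                       # decoder: reconstruct x from the code
loss = np.sum((x_hat - x) ** 2)         # reconstruction loss to be minimized

# One gradient step on the decoder weights (dL/dW_dec = 2 (x_hat - x) z^T)
# lowers the reconstruction loss, illustrating how training proceeds.
W_dec -= 0.01 * 2.0 * np.outer(x_hat - x, z)
loss_after = np.sum((W_dec @ z - x) ** 2)
```

In practice both encoder and decoder are multi-layer networks trained jointly, and the latent code z is what gets passed to a downstream predictive model.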

Methods

The processing and reporting of the results of this review were guided by the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) guidelines 21. To thoroughly review the literature, a two-step method was used to retrieve all studies on relevant topics. First, we searched computerized bibliographic databases, including PubMed and Web of Science; the search strategy is detailed in Supplementary Appendix 1. The literature search comprised articles published until April 2019. Next, a snowball technique was applied to identify additional studies. Furthermore, we manually searched other resources, including Google Scholar and IEEE Xplore, to find additional relevant articles.

Figure 3 presents the study selection process. All articles were evaluated carefully, and studies were excluded if: (1) the main outcome was not a mental health condition; (2) the model involved was not a DL algorithm; (3) the full text of the article was not accessible; or (4) the article was not written in English.

figure 3

In total, 57 studies that met our eligibility criteria, covering clinical data analysis, genetic data analysis, vocal and visual expression data analysis, and social media data analysis, were included in this review.

Results

A total of 57 articles met our eligibility criteria. Most of the reviewed articles were published between 2014 and 2019. To clearly summarize these articles, we grouped them into four categories according to the types of data analyzed: (1) clinical data, (2) genetic and genomics data, (3) vocal and visual expression data, and (4) social media data. Table 1 summarizes the characteristics of the selected studies.

Clinical data

Neuroimages

Previous studies have shown that neuroimages can record evidence of neuropsychiatric disorders 22,23. Two common types of neuroimage data analyzed in mental health studies are functional magnetic resonance imaging (fMRI) and structural MRI (sMRI) data. In fMRI data, brain activity is measured by identifying changes associated with blood flow, based on the fact that cerebral blood flow and neuronal activation are coupled 24. In sMRI data, the neurological aspects of the brain are described by structural textures, which capture information about the spatial arrangement of voxel intensities in 3D. Recently, DL technologies have been applied to the analysis of both fMRI and sMRI data.

One application of DL to fMRI and sMRI data is the identification of ADHD 25,26,27,28,29,30,31. To learn meaningful information from the neuroimages, CNN and deep belief network (DBN) models were used: the CNN models were mainly used to identify local spatial patterns, while the DBN models were used to obtain a deep hierarchical representation of the neuroimages. Different patterns were discovered between patients with ADHD and controls in the prefrontal cortex and cingulate cortex. Several studies also analyzed sMRIs to investigate schizophrenia 32,33,34,35,36, using DFNN, DBN, and autoencoder models. These studies reported abnormal patterns in cortical regions and the cortical–striatal–cerebellar circuit in the brains of schizophrenia patients, especially in the frontal, temporal, parietal, and insular cortices, and in some subcortical regions, including the corpus callosum, putamen, and cerebellum. Moreover, the use of DL on neuroimages has also targeted other mental health disorders. Geng et al. 37 proposed using a CNN and an autoencoder to acquire meaningful features from the original time series of fMRI data for predicting depression. Two studies 31,38 integrated the fMRI and sMRI data modalities to develop predictive models for ASD, and observed significant relationships between fMRI and sMRI data with regard to ASD prediction.

Challenges and opportunities

The aforementioned studies have demonstrated that using DL techniques to analyze neuroimages can provide evidence about mental health problems that can be translated into clinical practice and facilitate the diagnosis of mental illness. However, multiple challenges need to be addressed to achieve this objective. First, DL architectures generally require large data samples to train the models, which may pose a difficulty in neuroimaging analysis because of the scarcity of such data 39. Second, imaging data typically lie in a high-dimensional space; e.g., even a 64 × 64 2D neuroimage results in 4096 features. This leads to a risk of overfitting by the DL models. To address this, most existing studies utilized MRI data preprocessing tools, such as Statistical Parametric Mapping ( https://www.fil.ion.ucl.ac.uk/spm/ ), the Data Processing Assistant for Resting-State fMRI 40, and the fMRI Preprocessing Pipeline 41, to extract useful features before feeding the data to the DL models. Even though an intuitive attribute of DL is its capacity to learn meaningful features from raw data, feature engineering tools are needed in settings with small sample sizes and high dimensionality, such as neuroimage analysis, where they mitigate the overfitting risk of DL models. As reported in some of the selected studies 28,31,35,37, DL models can benefit from feature engineering techniques and have been shown to outperform traditional ML models in the prediction of multiple conditions, such as depression, schizophrenia, and ADHD. However, such tools extract features based on prior knowledge and hence may omit information that is meaningful for mental health outcome research but not yet known. An alternative is to use a CNN to automatically extract information from the raw data. As reported previously 10, CNNs perform well in processing raw neuroimage data. Among the studies reviewed here, three 29,30,37 involved CNN layers and achieved desirable performance.

Electroencephalogram data

Electroencephalogram (EEG) signals are low-cost, high-temporal-resolution recordings acquired with small devices containing up to several hundred channels, and their analysis has gained significant attention in the study of brain disorders 42. Because the EEG signal is a kind of streaming data that is dense and continuous, it is challenging for traditional feature-engineering-based methods to obtain sufficient information from the raw EEG data to make accurate predictions. To address this, DL models have recently been employed to analyze raw EEG signal data.

Four of the reviewed articles used DL to understand mental health conditions based on the analysis of EEG signals. Acharya et al. 43 used a CNN to extract features from the input EEG signals. They found that EEG signals from the right hemisphere of the human brain are more distinctive for the detection of depression than those from the left hemisphere, providing evidence that depression is associated with a hyperactive right hemisphere. Mohan et al. 44 modeled the raw EEG signals with a DFNN to obtain information about human brain waves. They found that the signals collected from the central (C3 and C4) regions are marginally higher than those from other brain regions and can be used to distinguish depressed from normal subjects based on brain wave signals. Zhang et al. 45 proposed a concatenated structure of a deep recurrent network and a 3D CNN to obtain EEG features across different tasks. They reported that the DL model can capture the spectral changes of EEG hemispheric asymmetry to distinguish different mental workloads effectively. Li et al. 46 presented a computer-aided detection system that extracts multiple types of information (e.g., spectral, spatial, and temporal) to recognize mild depression based on a CNN architecture. The authors found that both the spectral and the temporal information of EEG are crucial for the prediction of depression.

Despite the initial success in applying DL algorithms to EEG data for studying multiple mental health conditions, several challenges remain. One major challenge is that raw EEG data gathered from sensors contain a certain degree of erroneous, noisy, and redundant information caused by discharged batteries, failures in sensor readings, and intermittent communication loss in wireless sensor networks 47, which may hinder the model in extracting meaningful information from the noise. Multiple preprocessing steps (e.g., data denoising, data interpolation, data transformation, and data segmentation) are therefore necessary to handle the raw EEG signal before feeding it to the DL models. Besides, owing to the dense characteristics of raw EEG data, analysis of the streaming data is computationally expensive, which poses a challenge for model architecture selection: a proper model should be designed with relatively few training parameters. This is one reason why the reviewed studies are mainly based on the CNN architecture.
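Two of the preprocessing steps mentioned above, interpolation of dropped samples and segmentation into fixed-length windows, can be sketched on a toy multichannel stream. The sampling rate, channel count, and window parameters are arbitrary choices for illustration, not taken from any reviewed study:

```python
import numpy as np

rng = np.random.default_rng(4)
fs = 128                                 # assumed sampling rate (Hz)
raw = rng.normal(size=(4, fs * 10))      # 4 channels, 10 s of toy EEG

# Data interpolation: repair a simulated sensor dropout (NaN samples) on channel 0.
raw[0, 100:110] = np.nan
idx = np.arange(raw.shape[1])
bad = np.isnan(raw[0])
raw[0, bad] = np.interp(idx[bad], idx[~bad], raw[0, ~bad])

def segment(signal, win_s=2, step_s=1):
    """Data segmentation: cut the continuous stream into overlapping fixed windows."""
    win, step = win_s * fs, step_s * fs
    starts = range(0, signal.shape[1] - win + 1, step)
    return np.stack([signal[:, s:s + win] for s in starts])

windows = segment(raw)                   # (n_windows, channels, samples) for a DL model
```

Each resulting window is a fixed-size array, which is the form a CNN-style model expects as input.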

Electronic health records

Electronic health records (EHRs) are systematic collections of longitudinal, patient-centered records. Patients’ EHRs consist of both structured and unstructured data: the structured data include information about a patient’s diagnosis, medications, and laboratory test results, and the unstructured data include information in clinical notes. Recently, DL models have been applied to analyze EHR data to study mental health disorders 48 .

The first and foremost issue in analyzing structured EHR data is how to appropriately handle the longitudinal records. Traditional ML models address this by collapsing a patient’s records within a certain time window into a vector comprising summary statistics of the features in different dimensions 49. For instance, to estimate the probability of suicide death, Choi et al. 50 leveraged a DFNN to model baseline characteristics. One major limitation of these studies is the omission of temporality among the clinical events within the EHR. To overcome this issue, RNNs are more commonly used for EHR data analysis, as an RNN naturally handles time-series data. DeepCare 51, a long short-term memory (LSTM)-based DL model, encodes a patient’s long-term health state trajectory to predict the future outcomes of depressive episodes. Because the LSTM architecture appropriately captures disease progression by modeling the illness history and the medical interventions, DeepCare achieved over 15% improvement in prediction compared with conventional ML methods. In addition, Lin et al. 52 designed two DFNN models for the prediction of antidepressant treatment response and remission. The authors reported that the proposed DFNN can achieve an area under the receiver operating characteristic curve (AUC) of 0.823 in predicting antidepressant response.
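The window-collapsing strategy used by traditional ML models can be sketched in a few lines. The record format, look-back window, and chosen summary statistics below are hypothetical, and a real pipeline would compute such a vector per feature and per patient:

```python
from statistics import mean

# Hypothetical longitudinal records for one patient: (day, lab_value) pairs.
records = [(1, 5.2), (30, 6.1), (90, 5.8), (200, 7.0)]

def collapse(records, window_days=180):
    """Collapse events inside a fixed look-back window into a summary-statistics vector."""
    values = [v for day, v in records if day <= window_days]
    return [len(values), min(values), max(values), mean(values)]  # count/min/max/mean

features = collapse(records)  # the fixed-length vector a conventional ML model consumes
```

Note that the resulting vector discards the order and spacing of events, which is exactly the loss of temporality that motivates the RNN-based approaches described above.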

Analyzing the unstructured clinical notes in EHRs relates to the long-standing topic of NLP. To extract meaningful knowledge from text, conventional NLP approaches mostly define rules or regular expressions before the analysis. However, it is challenging to enumerate all possible rules or regular expressions. Owing to the recent advances of DL in NLP tasks, DL models have been developed to mine clinical text data from EHRs to study mental health conditions. Geraci et al. 53 utilized term frequency–inverse document frequency (TF-IDF) to represent clinical documents by their words and developed a DFNN model to identify individuals with depression. One major limitation of such an approach is that the semantics and syntax of the sentences are lost. In this context, CNNs 54 and RNNs 55 have shown superiority in modeling syntax for text-based prediction. In particular, CNNs have been used to mine neuropsychiatric notes for predicting psychiatric symptom severity 56,57. Tran and Kavuluru 58 used an RNN to analyze the history of present illness in neuropsychiatric notes for predicting mental health conditions. The model engaged an attention mechanism 55, which can specify the importance of individual words in the prediction, making the model more interpretable than their previous CNN model 56.
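A bag-of-words TF-IDF representation of the kind mentioned above can be sketched as follows. The toy notes are synthetic, and the formula is one common TF-IDF variant rather than the exact weighting used in any reviewed study; the sketch also makes the stated limitation visible, since word order (and hence syntax) never enters the representation:

```python
import math
from collections import Counter

# Toy synthetic "clinical notes" (not real patient data).
notes = [
    "patient reports low mood and poor sleep",
    "patient denies low mood",
    "sleep normal mood stable",
]

def tfidf(docs):
    """Bag-of-words TF-IDF: terms frequent in a document but rare in the corpus score high."""
    vocab = sorted({w for d in docs for w in d.split()})
    n = len(docs)
    df = Counter(w for d in docs for w in set(d.split()))   # document frequency
    idf = {w: math.log(n / df[w]) for w in vocab}           # inverse document frequency
    vectors = []
    for d in docs:
        tf = Counter(d.split())                             # term frequency
        vectors.append([tf[w] / len(d.split()) * idf[w] for w in vocab])
    return vocab, vectors

vocab, X = tfidf(notes)
```

A term such as "mood" that occurs in every note gets an inverse document frequency of zero and so contributes nothing to any vector, while rarer, more discriminative terms dominate.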

Although DL has achieved promising results in EHR analysis, several challenges remain unsolved. On the one hand, unlike the diagnosis of physical health conditions such as diabetes, the diagnosis of mental health conditions lacks direct quantitative tests, such as a blood chemistry test, a buccal swab, or urinalysis. Instead, clinicians evaluate signs and symptoms through patient interviews and questionnaires, during which they gather information based on the patient’s self-report. Collecting and deriving inferences from such data rely heavily on the experience and subjectivity of the clinician. This may leave signals buried in noise and affect the robustness of the DL model. One potential way to address this challenge is to comprehensively integrate multimodal clinical information, including structured and unstructured EHR information as well as neuroimaging and EEG data. Another way is to incorporate existing medical knowledge, which can guide model training in the right direction. For instance, biomedical knowledge bases contain massive numbers of verified interactions between biomedical entities, e.g., diseases, genes, and drugs 59. Incorporating such information introduces meaningful medical constraints and may help to reduce the effects of noise on the model training process. On the other hand, deploying a DL model trained on one EHR system in another system is challenging, because EHR data collection and representation are rarely standardized across hospitals and clinics. To address this issue, national and international collaborative efforts such as Observational Health Data Sciences and Informatics ( https://ohdsi.org ) have developed common data models, such as OMOP, to standardize EHR data representation for conducting observational data analysis 60.

Genetic data

Multiple studies have found that mental disorders, e.g., depression, can be associated with genetic factors 61,62. Conventional statistical studies in genetics and genomics, such as genome-wide association studies, have identified many common and rare genetic variants, such as single-nucleotide polymorphisms (SNPs), associated with mental health disorders 63,64. Yet, the effects of these genetic factors are small, and many more remain undiscovered. With recent developments in next-generation sequencing techniques, a massive volume of high-throughput genome and exome sequencing data is being generated, enabling researchers to study patients with mental health disorders by examining all types of genetic variation across an individual’s genome. In recent years, DL 65,66 has been applied to identify genetic risk factors associated with mental illness, leveraging the capacity of DL to identify highly complex patterns in large datasets. Khan and Wang 67 integrated genetic annotations, known brain expression quantitative trait loci, and enhancer/promoter peaks to generate feature vectors of variants, and developed a DFNN, named ncDeepBrain, to prioritize non-coding variants associated with mental disorders. To further prioritize susceptibility genes, they designed another deep model, iMEGES 68, which integrates the ncDeepBrain score, general gene scores, and disease-specific scores to estimate gene risk. Wang et al. 69 developed a novel deep architecture that combines the deep Boltzmann machine architecture 70 with conditional and lateral connections derived from the gene regulatory network. The model provided insights into intermediate phenotypes and their connections to high-level phenotypes (disease traits). Laksshman et al. 71 used exome sequencing data to predict bipolar disorder outcomes of patients. They developed a CNN and used the convolution mechanism to capture correlations of neighboring loci within the chromosome.

Although the use of genetic data in DL for studying mental health conditions shows promise, multiple challenges need to be addressed. For DL-based risk variant/gene prioritization efforts, one major challenge is the limited availability of labeled data. On the one hand, positive samples are scarce, as the known risk SNPs or genes associated with mental health conditions are limited; for example, only about 108 risk loci have reached genome-wide significance in ASD. On the other hand, the negative samples (i.e., SNPs, variants, or genes) may not be truly negative, as it is not yet clear whether they are associated with mental illness. Moreover, it is also challenging to develop DL models for analyzing patients’ sequencing data for mental illness prediction, as sequencing data are extremely high-dimensional (over five million SNPs in the human genome). More prior domain knowledge is needed to guide the DL model in extracting patterns from the high-dimensional genomic space.

Vocal and visual expression data

The use of vocal (voice or speech) and visual (video or image of facial or body behaviors) expression data has gained the attention of many studies of mental health disorders. Modeling the evolution of people’s emotional states from these modalities has been used to identify mental health status. In essence, voice data are continuous and dense signals, whereas video data are sequences of frames, i.e., images. Conventional ML models for analyzing such data require a sophisticated feature-extraction process. Owing to the recent success of DL in computer vision and sequence data modeling, such models have been introduced to analyze vocal and/or visual expression data. Most of the articles reviewed in this work predict mental health disorders based on two public datasets: (i) the Chi-Mei corpus, collected by using six emotional videos to elicit facial expressions and speech responses from subjects with bipolar disorder, unipolar depression, and healthy controls; 72 and (ii) the International Audio/Visual Emotion Recognition Challenges (AVEC) depression dataset 73 , 74 , 75 , collected within a human–computer interaction scenario. The proposed models include CNNs, RNNs, autoencoders, as well as hybrid models built on these. In particular, CNNs were leveraged to encode temporal and spectral features from voice signals 76 , 77 , 78 , 79 , 80 and static facial or physical expression features from video frames 79 , 81 , 82 , 83 , 84 . Autoencoders were used to learn low-dimensional representations of people’s vocal 85 , 86 and visual expressions 87 , 88 , and RNNs were used to characterize the temporal evolution of emotion based on the CNN-learned features and/or other handcrafted features 76 , 81 , 84 , 85 , 86 , 87 , 88 , 89 , 90 . A few studies focused on analyzing static images using a CNN architecture to predict mental health status. Prasetio et al. 91 identified stress types (e.g., neutral, low stress, and high stress) from frontal facial images. Their proposed CNN model outperformed conventional ML models by 7% in terms of prediction accuracy. Jaiswal et al. 92 investigated the relationship between facial expressions/gestures and neurodevelopmental conditions. They reported accuracy over 0.93 in the diagnostic prediction of ADHD and ASD using a CNN architecture. In addition, thermal images that track a person’s breathing patterns were also fed to a deep model to estimate psychological stress level (mental overload) 93 .
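The core operation by which CNNs encode temporal features from a voice signal, as described above, can be sketched with a single convolution–activation–pooling stage. This is a minimal numpy illustration, not any reviewed study's architecture: the signal is synthetic and the kernel is fixed rather than learned.

```python
import numpy as np

def conv1d(signal, kernel, stride=1):
    """Valid 1-D convolution: slide the kernel along the signal."""
    n = (len(signal) - len(kernel)) // stride + 1
    return np.array([np.dot(signal[i * stride:i * stride + len(kernel)], kernel)
                     for i in range(n)])

def relu(x):
    return np.maximum(x, 0.0)

def max_pool(x, width=2):
    """Non-overlapping max pooling halves the temporal resolution."""
    trimmed = x[:len(x) - len(x) % width]
    return trimmed.reshape(-1, width).max(axis=1)

# Toy "voice" signal: a sine burst embedded in noise.
rng = np.random.default_rng(0)
t = np.linspace(0, 1, 128)
signal = np.sin(2 * np.pi * 8 * t) + 0.1 * rng.standard_normal(128)

# A single fixed edge-detector kernel; a trained CNN would learn many of these.
kernel = np.array([-1.0, 0.0, 1.0])
features = max_pool(relu(conv1d(signal, kernel)))
```

Stacking several such stages, with learned kernels, yields the compact spectral–temporal feature maps that the reviewed models pass on to recurrent or fully connected layers.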

From the above summary, we can observe that analyzing vocal and visual expression data can capture the pattern of subjects’ emotion evolution to predict mental health conditions. Despite the promising initial results, challenges remain for developing DL models in this field. One major challenge is linking vocal and visual expression data with patients’ clinical data, given the difficulties involved in collecting such expression data during clinical practice. Current studies analyzed vocal and visual expressions over individual datasets. Without clinical guidance, the developed prediction models have limited clinical meaning. Linking patients’ expression information with clinical variables may help to improve both the interpretability and the robustness of the model. For example, Gupta et al. 94 designed a DFNN for affective prediction from audio and video modalities. The model incorporated depression severity as a parameter, linking the effects of depression on subjects’ affective expressions. Another challenge is the limited sample size. For example, the Chi-Mei dataset contains vocal–visual data from only 45 individuals (15 with bipolar disorder, 15 with unipolar disorder, and 15 healthy controls). There is also a lack of “emotion labels” for people’s vocal and visual expressions. Apart from improving the datasets, an alternative way to address this challenge is transfer learning, which transfers knowledge gained from one dataset (usually a more general one) to the target dataset. For example, some studies trained autoencoders on public emotion databases such as eNTERFACE 95 to generate emotion profiles (EPs). Other studies 83 , 84 pre-trained CNNs on general facial-expression datasets 96 , 97 to extract face appearance features.

Social media data

With the widespread proliferation of social media platforms, such as Twitter and Reddit, individuals increasingly and publicly share information about their mood, behavior, and any ailments they might be suffering from. Such social media data have been used to identify users’ mental health states (e.g., psychological stress and suicidal ideation) 6 .

In this study, the articles that used DL to analyze social media data mainly focused on stress detection 98 , 99 , 100 , 101 , depression identification 102 , 103 , 104 , 105 , 106 , and estimation of suicide risk 103 , 105 , 107 , 108 , 109 . In general, the core concept across these works is to mine the textual, and where applicable graphical, content of users’ social media posts to discover cues for mental health disorders. In this context, RNNs and CNNs were largely used by the researchers. In particular, RNNs often incorporate an attention mechanism to specify the importance of the input elements in the classification process 55 , which provides some interpretability for the predictive results. For example, Ive et al. 103 proposed a hierarchical RNN architecture with an attention mechanism to predict the classes of posts (including depression, autism, suicidewatch, anxiety, etc.). The authors observed that, benefitting from the attention mechanism, the model can predict risky text efficiently and extract the text elements crucial for making decisions. Coppersmith et al. 107 used an LSTM to discover quantifiable signals of suicide attempts in social media posts. The proposed model can capture contextual information between words and the nuances of language related to suicide.
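The attention mechanism described above can be sketched as a weighted pooling over an RNN's per-token hidden states. This is an illustrative numpy toy, not the architecture of any reviewed study: the hidden states and the context vector are random stand-ins for quantities that would be produced by an LSTM and learned jointly with the classifier.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def attention_pool(hidden_states, context):
    """Score each time step's hidden state against a context vector,
    normalize the scores into weights, and sum. The classifier sees one
    pooled vector, and the weights indicate which tokens drove the decision."""
    scores = hidden_states @ context   # one relevance score per time step
    weights = softmax(scores)          # distribution over tokens
    pooled = weights @ hidden_states   # weighted sum over time
    return pooled, weights

rng = np.random.default_rng(1)
H = rng.standard_normal((6, 4))  # 6 tokens, 4-dim hidden states (e.g., from an LSTM)
c = rng.standard_normal(4)       # context vector, learned in a real model

pooled, weights = attention_pool(H, c)
```

Inspecting `weights` after training is what lets such models highlight the text elements crucial for a prediction.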

Apart from text, users also post images on social media. The properties of these images (e.g., color theme, saturation, and brightness) provide cues reflecting users’ mental health status. In addition, the millions of interactions and relationships among users reflect the social environment of individuals, which is itself a risk factor for mental illness. An increasing number of studies have attempted to combine these two types of information with text content for predictive modeling. For example, Lin et al. 99 leveraged an autoencoder to extract low-level and middle-level representations from texts, images, and comments based on psychological and art theories. They further extended their work with a hybrid model based on a CNN that integrates post content and social interactions 101 . The results suggested that the social structures of stressed users’ friends tended to be less connected than those of users without stress.

The aforementioned studies have demonstrated that social media data have the potential to reveal users with mental health problems. However, there are multiple challenges in the analysis of social media data. First, given that social media data are typically de-identified, there is no straightforward way to confirm the “true positives” and “true negatives” for a given mental health condition. Enabling the linkage of users’ social media data with their EHR data, with appropriate consent and privacy protection, is challenging to scale, but has been done in a few settings 110 . In addition, most previous studies mainly analyzed textual and image data from social media platforms and did not consider the social network of users. In one study, Rosenquist et al. 111 reported that the symptoms of depression are highly correlated inside a circle of friends, indicating that social network analysis is likely a promising way to study the prevalence of mental health problems. However, comprehensively modeling text information together with network structure remains challenging; in this context, graph convolutional networks 112 have been developed for mining networked data. Moreover, although it is possible to discover online users with mental illness through social media analysis, translating this innovation into practical applications that offer aid to users, such as real-time interventions, is still largely needed 113 .
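A single layer of the graph convolutional network referenced above can be sketched as neighborhood-averaged feature mixing, following the symmetric normalization of Kipf and Welling's GCN formulation. The toy friendship graph and per-user features below are invented for illustration only.

```python
import numpy as np

rng = np.random.default_rng(7)

# Toy friendship graph over 4 users: symmetric adjacency with self-loops.
A = np.array([[1, 1, 0, 0],
              [1, 1, 1, 0],
              [0, 1, 1, 1],
              [0, 0, 1, 1]], dtype=float)

# Symmetric normalization D^{-1/2} A D^{-1/2}, element-wise A[i,j]/sqrt(d_i d_j).
d = A.sum(axis=1)
A_norm = A / np.sqrt(np.outer(d, d))

H = rng.standard_normal((4, 5))  # per-user features (e.g., derived from their posts)
W = rng.standard_normal((5, 3))  # learnable layer weights (random here)

# One GCN layer: each user's new representation mixes its neighbors' features,
# so network structure and post content are modeled jointly.
H_next = np.maximum(A_norm @ H @ W, 0.0)  # ReLU activation
```

Stacking such layers lets information propagate over longer paths in the friendship graph before a final classifier predicts each user's mental health status.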

Discussion: findings, open issues, and future directions

Principal findings

The purpose of this study is to investigate the current state of applications of DL techniques in studying mental health outcomes. Out of 2261 articles identified based on our search terms, 57 studies met our inclusion criteria and were reviewed. Studies that involved DL models but did not highlight the role of the DL algorithms in the analysis were excluded. From the above results, we observed a growing number of studies using DL models to study mental health outcomes. In particular, multiple studies have developed disease risk prediction models using both clinical and non-clinical data, and have achieved promising initial results.

DL models “learn” somewhat like a human brain, relying on multiple layers of interconnected computing neurons. Consequently, training a deep neural network requires learning a large number of parameters (i.e., the weights associated with the links between neurons within the network). This is one reason why DL has achieved great success in fields where a massive volume of data can be easily collected, such as computer vision and text mining. Yet, in the health domain, the availability of large-scale data is very limited. For most of the studies selected in this review, the sample sizes are below 10^4. Data availability is even scarcer in the fields of neuroimaging, EEG, and gene expression, as such data reside in a very high-dimensional space. This leads to the “curse of dimensionality” 114 , which challenges the optimization of the model parameters.

One potential way to address this challenge is to reduce the dimensionality of the data by feature engineering before feeding information to the DL models. On one hand, feature extraction approaches can be used to obtain different types of features from the raw data. For example, several studies reported in this review have used preprocessing tools to extract features from neuroimaging data. On the other hand, feature selection, which is commonly used in conventional ML models, is also an option for reducing data dimensionality. However, feature selection approaches are not often used in DL application scenarios, as one of the appealing attributes of DL is its capacity to learn meaningful features from “all” available data. An alternative way to address the issue of limited data is transfer learning, where the objective is to improve learning of a new task through the transfer of knowledge from a related task that has already been learned 115 . The basic idea is that data representations learned in the earlier layers are more general, whereas those learned in the later layers are more specific to the prediction task 116 . In particular, one can first pre-train a deep neural network on a large-scale “source” dataset, then stack fully connected layers on top of the network and fine-tune it on the small “target” dataset in a standard backpropagation manner. Usually, samples in the “source” dataset are more general (e.g., general image data), whereas those in the “target” dataset are specific to the task (e.g., medical image data). A popular example of the success of transfer learning in the health domain is the dermatologist-level classification of skin cancer 117 . The authors took Google’s Inception v3 CNN architecture, pre-trained on 1.28 million general images, and fine-tuned it on a clinical image dataset.
The model achieved very high performance in skin cancer classification on epidermal (AUC = 0.96), melanocytic (AUC = 0.96), and melanocytic–dermoscopic images (AUC = 0.94). In facial expression-based depression prediction, Zhu et al. 83 pre-trained a CNN on a public face recognition dataset to model static facial appearance, which overcomes the lack of facial expression label information. Chao et al. 84 also pre-trained a CNN to encode facial expression information. In both studies, the transfer scheme was demonstrated to improve prediction performance.
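The pre-train-then-fine-tune scheme described above reduces, in its simplest form, to freezing the early layers and training only a new task-specific head on the small target dataset. The numpy sketch below illustrates that scheme with a random stand-in for the pretrained extractor and synthetic target data; it is a toy, not the Inception v3 pipeline.

```python
import numpy as np

rng = np.random.default_rng(2)

# "Pretrained" feature extractor: early-layer weights learned on a large
# source dataset (random here, standing in for general representations).
W_frozen = rng.standard_normal((8, 4))

def extract(x):
    """Frozen early layers: never updated during fine-tuning."""
    return np.tanh(x @ W_frozen)

# Small "target" dataset (e.g., clinical images); labels are synthetic,
# generated to be linearly separable in the frozen feature space.
X = rng.standard_normal((40, 8))
F = extract(X)
y = (F @ np.array([1.0, -1.0, 0.5, 0.0]) > 0).astype(float)

# Fine-tuning: gradient descent on a new logistic head only.
w_head = np.zeros(4)
for _ in range(500):
    p = 1 / (1 + np.exp(-(F @ w_head)))
    w_head -= 0.5 * F.T @ (p - y) / len(y)

acc = ((1 / (1 + np.exp(-(F @ w_head))) > 0.5) == y).mean()
```

In full fine-tuning, the frozen layers would also receive small gradient updates once the head has converged; freezing them entirely, as here, is the common first stage when the target dataset is tiny.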

Diagnosis and prediction issues

Unlike the diagnosis of physical conditions, which can be based on laboratory tests, diagnoses of mental illness typically rely on mental health professionals’ judgment and patient self-report data. As a result, such a diagnostic system may not accurately capture the psychological deficits and symptom progression needed to provide appropriate therapeutic interventions 118 , 119 . This issue accordingly limits the ability of prediction models to assist clinicians in decision-making. Except for several studies using unsupervised autoencoders to learn low-dimensional representations, most studies reviewed here reported using supervised DL models, which need a training set containing “true” (i.e., expert-provided) labels to optimize the model parameters before the model is used to predict labels of new subjects. Inevitably, the quality of the expert-provided diagnostic labels used for training sets the upper bound for the prediction performance of the model.

One intuitive route to address this issue is to use an unsupervised learning scheme that, instead of learning to predict clinical outcomes, aims at learning compact yet informative representations of the raw data. A typical example is the autoencoder (as shown in Fig. 1d ), which encodes the raw data into a low-dimensional space from which the raw data can be reconstructed. Some of the studies reviewed have proposed to leverage autoencoders to improve our understanding of mental health outcomes. A constraint of the autoencoder is that the input data should be preprocessed into vectors, which may lead to information loss for image and sequence data. To address this, the convolutional autoencoder 120 and the LSTM autoencoder 121 have recently been developed; these integrate convolution layers and recurrent layers, respectively, with the autoencoder architecture, enabling informative low-dimensional representations to be learned from raw image data and sequence data. For instance, Baytas et al. 122 developed a variation of the LSTM autoencoder on patient EHRs and grouped Parkinson’s disease patients into meaningful subtypes. Another potential way is to predict other clinical outcomes instead of diagnostic labels. For example, several of the selected studies proposed to predict symptom severity scores 56 , 57 , 77 , 82 , 84 , 87 , 89 . In addition, Du et al. 108 attempted to identify suicide-related psychiatric stressors from users’ posts on Twitter, which plays an important role in the early prevention of suicidal behaviors. Furthermore, training models to predict future outcomes, such as treatment response, emotion assessments, and relapse time, is also a promising future direction.
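The encode-then-reconstruct objective described above can be sketched with the simplest possible case: a linear autoencoder trained by gradient descent on synthetic data with low-dimensional structure. This is a minimal illustration of the principle, not any reviewed study's model; real autoencoders use nonlinear (or convolutional/recurrent) layers.

```python
import numpy as np

rng = np.random.default_rng(3)

# Toy high-dimensional data generated from a 2-d latent structure plus noise.
Z_true = rng.standard_normal((200, 2))
X = Z_true @ rng.standard_normal((2, 10)) + 0.01 * rng.standard_normal((200, 10))

# Encoder W_e compresses each 10-d sample to a 2-d code; decoder W_d reconstructs.
W_e = 0.1 * rng.standard_normal((10, 2))
W_d = 0.1 * rng.standard_normal((2, 10))
lr = 0.02
for _ in range(3000):
    code = X @ W_e                 # low-dimensional representation
    X_hat = code @ W_d             # reconstruction
    err = X_hat - X
    # Gradients of the mean squared reconstruction error.
    W_d -= lr * code.T @ err / len(X)
    W_e -= lr * X.T @ (err @ W_d.T) / len(X)

mse = float(np.mean((X - (X @ W_e) @ W_d) ** 2))
```

After training, `X @ W_e` gives the compact representation that downstream analyses (e.g., clustering patients into subtypes) would operate on, with no diagnostic labels required.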

Multimodal modeling

The field of mental health is heterogeneous. On one hand, mental illness refers to a variety of disorders that affect people’s emotions and behaviors. On the other hand, although the exact causes of most mental illnesses are unknown to date, it is becoming increasingly clear that the risk factors for these diseases are multifactorial, as multiple genetic, environmental, and social factors interact to influence an individual’s mental health 123 , 124 . As a result of this domain heterogeneity, researchers have the chance to study mental health problems from different perspectives, ranging from molecular, genomic, clinical, medical imaging, and physiological-signal data to facial and body expressions and online behavior. Integrative modeling of such multimodal data means comprehensively considering different aspects of the disease, and is thus likely to yield deep insight into mental health. In this context, DL models have been developed for multimodal modeling. As shown in Fig. 4 , the hierarchical structure of DL makes it naturally compatible with multimodal integration. In particular, one can model each modality with a specific network and combine them via the final fully connected layers, such that parameters can be jointly learned in a typical backpropagation manner. In this review, we found an increasing number of studies that have attempted multimodal modeling. For example, Zou et al. 28 developed a multimodal model composed of two CNNs modeling the fMRI and sMRI modalities, respectively. The model achieved 69.15% accuracy in predicting ADHD, outperforming the unimodal models (66.04% for the fMRI-based and 65.86% for the sMRI-based model). Yang et al. 79 proposed a multimodal model combining vocal and visual expression for depression cognition. The model achieved 39% lower prediction error than the unimodal models.

figure 4

One can model each modality with a specific network and combine them using the final fully-connected layers. In this way, parameters of the entire neural network can be jointly learned in a typical backpropagation manner.
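The fusion scheme of Fig. 4 can be sketched in a forward pass: a modality-specific encoder per input, concatenation, then a shared head. The encoders below are single layers with random weights, standing in for the full CNNs that a study such as Zou et al.'s would train; the feature dimensions are invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(4)

def encoder(x, W):
    """Modality-specific sub-network (a single hidden layer here for brevity)."""
    return np.tanh(x @ W)

# Two modalities for a batch of 5 subjects, e.g., fMRI- and sMRI-derived features.
x_fmri = rng.standard_normal((5, 16))
x_smri = rng.standard_normal((5, 12))

W_fmri = rng.standard_normal((16, 8))
W_smri = rng.standard_normal((12, 8))
W_fc = rng.standard_normal((16, 1))   # fusion head over the concatenation

# Fuse: encode each modality, concatenate, pass through the shared head.
h = np.concatenate([encoder(x_fmri, W_fmri), encoder(x_smri, W_smri)], axis=1)
logits = h @ W_fc
p = 1 / (1 + np.exp(-logits))         # joint prediction per subject
```

Because the loss is computed on the joint output, backpropagation in a trained version of this model would update both encoders at once, which is what lets each modality's representation adapt to the other.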

Model interpretability

Due to their end-to-end design, DL models usually appear to be “black boxes”: they take raw data (e.g., MRI images, free text of clinical notes, and EEG signals) as input and yield output to reach a conclusion (e.g., the risk of a mental health disorder) without clear explanations of their inner workings. Although this might not be an issue in other application domains, such as identifying animals in images, in health care not only the model’s prediction performance but also the clues it uses for making the decision are important. For example, in neuroimage-based depression identification, beyond estimating the probability that a patient suffers from mental health deficits, clinicians would focus more on recognizing the abnormal regions or patterns of the brain associated with the disease. This is crucial for convincing clinical experts about the actions recommended by the predictive model, as well as for guiding appropriate interventions. In addition, as discussed above, the introduction of multimodal modeling makes it even more challenging to keep models interpretable. Attempts have been made to open the “black box” of DL 59 , 125 , 126 , 127 . Currently, there are two general directions for interpretable modeling: one involves systematically modifying the input and measuring any resulting changes in the output, as well as in the activation of the artificial neurons in the hidden layers. Such a strategy is usually used with CNNs to identify the specific regions of an image captured by a convolutional layer 128 . The other is to derive tools that determine the contribution of one or more features of the input data to the output. In this case, widely used tools include Shapley Additive Explanations 129 , LIME 127 , and DeepLIFT 130 , which are able to assign each feature an importance score for the specific prediction task.
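The first direction above, systematically modifying the input and measuring the change in the output, can be sketched as occlusion analysis. The "model" below is a fixed linear scorer chosen so the result is easy to verify by hand; in practice the same loop would wrap the trained network, and the input would be an image or signal rather than this toy vector.

```python
import numpy as np

# Stand-in "model": a fixed linear scorer over an 8-element input. It relies
# only on positions 2 and 3, so those regions should dominate the attribution.
w = np.array([0.0, 0.0, 3.0, 3.0, 0.0, 0.0, 0.0, 0.0])

def model(x):
    return float(x @ w)

x = np.ones(8)
baseline = model(x)

# Occlusion: zero out each region in turn and record the drop in the score.
importance = np.array([baseline - model(np.where(np.arange(8) == i, 0.0, x))
                       for i in range(8)])
# importance is large exactly where the model relies on the input (indices 2, 3).
```

Applied to a CNN, the occluded "region" would be a patch of pixels, and plotting the drops over all patch positions yields the saliency maps used to show clinicians which brain regions drove a prediction.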

Connection to therapeutic interventions

According to the studies reviewed, it is now possible to detect patients with mental illness based on different types of data. Compared with traditional ML techniques, most of the reviewed DL models reported higher prediction accuracy. The findings suggest that DL models are likely to assist clinicians in improved diagnosis of mental health conditions. However, associating the diagnosis of a condition with evidence-based interventions and treatment, including identification of appropriate medication 131 , prediction of treatment response 52 , and estimation of relapse risk 132 , still remains a challenge. Among the reviewed studies, only one 52 proposed to address these issues. Thus, further efforts are needed to link DL techniques with the therapeutic intervention of mental illness.

Domain knowledge

Another important direction is to incorporate domain knowledge. Existing biomedical knowledge bases are invaluable sources for solving healthcare problems 133 , 134 . Incorporating domain knowledge could address the limitations of data volume, problems of data quality, and model generalizability. For example, the Unified Medical Language System 135 can help to identify medical entities in text, and gene–gene interaction databases 136 could help to identify meaningful patterns in genomic profiles.

Recent years have witnessed the increasing use of DL algorithms in healthcare and medicine. In this study, we reviewed existing studies on DL applications in mental health outcome research. The results available in the reviewed literature illustrate the applicability and promise of DL in improving the diagnosis and treatment of patients with mental health conditions. This review also highlights the multiple challenges that remain in making DL algorithms clinically actionable for routine care, as well as promising future directions in this field.

World Health Organization. The World Health Report 2001: Mental Health: New Understanding, New Hope (World Health Organization, Switzerland, 2001).

Marcus, M., Yasamy, M. T., van Ommeren, M., Chisholm, D. & Saxena, S. Depression: A Global Public Health Concern (World Federation of Mental Health, World Health Organisation, Perth, 2012).

Hamilton, M. Development of a rating scale for primary depressive illness. Br. J. Soc. Clin. Psychol. 6 , 278–296 (1967).

Dwyer, D. B., Falkai, P. & Koutsouleris, N. Machine learning approaches for clinical psychology and psychiatry. Annu. Rev. Clin. Psychol. 14 , 91–118 (2018).

Lovejoy, C. A., Buch, V. & Maruthappu, M. Technology and mental health: the role of artificial intelligence. Eur. Psychiatry 55 , 1–3 (2019).

Wongkoblap, A., Vadillo, M. A. & Curcin, V. Researching mental health disorders in the era of social media: systematic review. J. Med. Internet Res. 19 , e228 (2017).

LeCun, Y., Bengio, Y. & Hinton, G. Deep learning. Nature 521 , 436 (2015).

Miotto, R., Wang, F., Wang, S., Jiang, X. & Dudley, J. T. Deep learning for healthcare: review, opportunities and challenges. Brief. Bioinformatics 19 , 1236–1246 (2017).

Durstewitz, D., Koppe, G. & Meyer-Lindenberg, A. Deep neural networks in psychiatry. Mol. Psychiatry 24 , 1583–1598 (2019).

Vieira, S., Pinaya, W. H. & Mechelli, A. Using deep learning to investigate the neuroimaging correlates of psychiatric and neurological disorders: methods and applications. Neurosci. Biobehav. Rev. 74 , 58–75 (2017).

Shatte, A. B., Hutchinson, D. M. & Teague, S. J. Machine learning in mental health: a scoping review of methods and applications. Psychol. Med. 49 , 1426–1448 (2019).

Murphy, K. P. Machine Learning: A Probabilistic Perspective (MIT Press, Cambridge, 2012).

Bishop, C. M. Pattern Recognition and Machine Learning (Information Science and Statistics) (Springer-Verlag, Berlin, 2007).

Bengio, Y., Simard, P. & Frasconi, P. Learning long-term dependencies with gradient descent is difficult. IEEE Trans. Neural Netw. Learn. Syst. 5 , 157–166 (1994).

LeCun, Y., Bottou, L., Bengio, Y. & Haffner, P. Gradient-based learning applied to document recognition. Proc. IEEE 86 , 2278–2324 (1998).

Vincent, P., Larochelle, H., Lajoie, I., Bengio, Y. & Manzagol, P. A. Stacked denoising autoencoders: learning useful representations in a deep network with a local denoising criterion. J. Mach. Learn. Res. 11 , 3371–3408 (2010).

Rumelhart, D. E., Hinton, G. E. & Williams, R. J. Learning representations by back-propagating errors. Cogn. modeling. 5 , 1 (1988).

Hochreiter, S. & Schmidhuber, J. Long short-term memory. Neural Comput. 9 , 1735–1780 (1997).

Cho, K., Van Merriënboer, B., Bahdanau, D. & Bengio, Y. On the properties of neural machine translation: encoder-decoder approaches. In Proc . SSST-8, Eighth Workshop on Syntax, Semantics and Structure in Statistical Translation 103–111 (Doha, Qatar, 2014).

Liou, C., Cheng, W., Liou, J. & Liou, D. Autoencoder for words. Neurocomputing 139 , 84–96 (2014).

Moher, D., Liberati, A., Tetzlaff, J. & Altman, D. G. Preferred reporting items for systematic reviews and meta-analyses: the PRISMA statement. Ann. Intern. Med. 151 , 264–269 (2009).

Schnack, H. G. et al. Can structural MRI aid in clinical classification? A machine learning study in two independent samples of patients with schizophrenia, bipolar disorder and healthy subjects. Neuroimage 84 , 299–306 (2014).

O’Toole, A. J. et al. Theoretical, statistical, and practical perspectives on pattern-based classification approaches to the analysis of functional neuroimaging data. J. Cogn. Neurosci. 19 , 1735–1752 (2007).

Logothetis, N. K., Pauls, J., Augath, M., Trinath, T. & Oeltermann, A. Neurophysiological investigation of the basis of the fMRI signal. Nature 412 , 150 (2001).

Kuang, D. & He, L. Classification on ADHD with deep learning. In Proc . Int. Conference on Cloud Computing and Big Data 27–32 (Wuhan, China, 2014).

Kuang, D., Guo, X., An, X., Zhao, Y. & He, L. Discrimination of ADHD based on fMRI data with deep belief network. In Proc . Int. Conference on Intelligent Computing 225–232 (Taiyuan, China, 2014).

Farzi, S., Kianian, S. & Rastkhadive, I. Diagnosis of attention deficit hyperactivity disorder using deep belief network based on greedy approach. In Proc . 5th Int. Symposium on Computational and Business Intelligence 96–99 (Dubai, United Arab Emirates, 2017).

Zou, L., Zheng, J. & McKeown, M. J. Deep learning based automatic diagnoses of attention deficit hyperactive disorder. In Proc . 2017 IEEE Global Conference on Signal and Information Processing (GlobalSIP) 962–966 (Montreal, Canada, 2017).

Riaz A. et al. Deep fMRI: an end-to-end deep network for classification of fMRI data. In Proc . 2018 IEEE 15th Int. Symposium on Biomedical Imaging . 1419–1422 (Washington, DC, USA, 2018).

Zou, L., Zheng, J., Miao, C., Mckeown, M. J. & Wang, Z. J. 3D CNN based automatic diagnosis of attention deficit hyperactivity disorder using functional and structural MRI. IEEE Access. 5 , 23626–23636 (2017).

Sen, B., Borle, N. C., Greiner, R. & Brown, M. R. A general prediction model for the detection of ADHD and Autism using structural and functional MRI. PLoS ONE 13 , e0194856 (2018).

Zeng, L. et al. Multi-site diagnostic classification of schizophrenia using discriminant deep learning with functional connectivity MRI. EBioMedicine 30 , 74–85 (2018).

Pinaya, W. H. et al. Using deep belief network modelling to characterize differences in brain morphometry in schizophrenia. Sci. Rep. 6 , 38897 (2016).

Pinaya, W. H., Mechelli, A. & Sato, J. R. Using deep autoencoders to identify abnormal brain structural patterns in neuropsychiatric disorders: a large-scale multi-sample study. Hum. Brain Mapp. 40 , 944–954 (2019).

Ulloa, A., Plis, S., Erhardt, E. & Calhoun, V. Synthetic structural magnetic resonance image generator improves deep learning prediction of schizophrenia. In Proc . 25th IEEE Int. Workshop on Machine Learning for Signal Processing (MLSP) 1–6 (Boston, MA, USA, 2015).

Matsubara, T., Tashiro, T. & Uehara, K. Deep neural generative model of functional MRI images for psychiatric disorder diagnosis. IEEE Trans. Biomed. Eng . 99 (2019).

Geng, X. & Xu, J. Application of autoencoder in depression diagnosis. In 2017 3rd Int. Conference on Computer Science and Mechanical Automation (Wuhan, China, 2017).

Aghdam, M. A., Sharifi, A. & Pedram, M. M. Combination of rs-fMRI and sMRI data to discriminate autism spectrum disorders in young children using deep belief network. J. Digit. Imaging 31 , 895–903 (2018).

Shen, D., Wu, G. & Suk, H. -I. Deep learning in medical image analysis. Annu. Rev. Biomed. Eng. 19 , 221–248 (2017).

Yan, C. & Zang, Y. DPARSF: a MATLAB toolbox for “pipeline” data analysis of resting-state fMRI. Front. Syst. Neurosci. 4 , 13 (2010).

Esteban, O. et al. fMRIPrep: a robust preprocessing pipeline for functional MRI. Nat. Methods 16 , 111–116 (2019).

Herrmann, C. & Demiralp, T. Human EEG gamma oscillations in neuropsychiatric disorders. Clin. Neurophysiol. 116 , 2719–2733 (2005).

Acharya, U. R. et al. Automated EEG-based screening of depression using deep convolutional neural network. Comput. Meth. Prog. Biol. 161 , 103–113 (2018).

Mohan, Y., Chee, S. S., Xin, D. K. P. & Foong, L. P. Artificial neural network for classification of depressive and normal EEG. In Proc . 2016 IEEE EMBS Conference on Biomedical Engineering and Sciences 286–290 (Kuala Lumpur, Malaysia, 2016).

Zhang, P., Wang, X., Zhang, W. & Chen, J. Learning spatial–spectral–temporal EEG features with recurrent 3D convolutional neural networks for cross-task mental workload assessment. IEEE Trans. Neural Syst. Rehabil. Eng. 27 , 31–42 (2018).

Li, X. et al. EEG-based mild depression recognition using convolutional neural network. Med. Biol. Eng. Comput . 47 , 1341–1352 (2019).

Patel, S., Park, H., Bonato, P., Chan, L. & Rodgers, M. A review of wearable sensors and systems with application in rehabilitation. J. Neuroeng. Rehabil. 9 , 21 (2012).

Smoller, J. W. The use of electronic health records for psychiatric phenotyping and genomics. Am. J. Med. Genet. B Neuropsychiatr. Genet. 177 , 601–612 (2018).

Wu, J., Roy, J. & Stewart, W. F. Prediction modeling using EHR data: challenges, strategies, and a comparison of machine learning approaches. Med. Care. 48 , S106–S113 (2010).

Choi, S. B., Lee, W., Yoon, J. H., Won, J. U. & Kim, D. W. Ten-year prediction of suicide death using Cox regression and machine learning in a nationwide retrospective cohort study in South Korea. J. Affect. Disord. 231 , 8–14 (2018).


Acknowledgements

The work is supported by NSF 1750326, R01 MH112148, R01 MH105384, R01 MH119177, R01 MH121922, and P50 MH113838.

Author information

Authors and Affiliations

Department of Healthcare Policy and Research, Weill Cornell Medicine, New York, NY, USA

Chang Su, Zhenxing Xu, Jyotishman Pathak & Fei Wang


Contributions

C.S., Z.X. and F.W. planned and structured the whole paper. C.S. and Z.X. conducted the literature review and drafted the manuscript. J.P. and F.W. reviewed and edited the manuscript.

Corresponding author

Correspondence to Fei Wang .

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary information

Supplemental material

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/ .


About this article

Cite this article.

Su, C., Xu, Z., Pathak, J. et al. Deep learning in mental health outcome research: a scoping review. Transl Psychiatry 10 , 116 (2020). https://doi.org/10.1038/s41398-020-0780-3

Download citation

Received: 31 August 2019

Revised: 17 February 2020

Accepted: 26 February 2020

Published: 22 April 2020

DOI: https://doi.org/10.1038/s41398-020-0780-3


  • Brief report
  • Open access
  • Published: 03 January 2023

Single classifier vs. ensemble machine learning approaches for mental health prediction

  • Jetli Chung 1 &
  • Jason Teo   ORCID: orcid.org/0000-0003-2415-5915 2 , 3  

Brain Informatics volume  10 , Article number:  1 ( 2023 ) Cite this article

4588 Accesses

17 Citations

Metrics details

Early prediction of mental health issues among individuals is paramount for early diagnosis and treatment by mental health professionals. One of the promising approaches to achieving fully automated computer-based approaches for predicting mental health problems is via machine learning. As such, this study aims to empirically evaluate several popular machine learning algorithms in classifying and predicting mental health problems based on a given data set, both from a single classifier approach as well as an ensemble machine learning approach. The data set contains responses to a survey questionnaire that was conducted by Open Sourcing Mental Illness (OSMI). Machine learning algorithms investigated in this study include Logistic Regression, Gradient Boosting, Neural Networks, K-Nearest Neighbours, and Support Vector Machine, as well as an ensemble approach using these algorithms. Comparisons were also made against more recent machine learning approaches, namely Extreme Gradient Boosting and Deep Neural Networks. Overall, Gradient Boosting achieved the highest overall accuracy of 88.80% followed by Neural Networks with 88.00%. This was followed by Extreme Gradient Boosting and Deep Neural Networks at 87.20% and 86.40%, respectively. The ensemble classifier achieved 85.60% while the remaining classifiers achieved between 82.40 and 84.00%. The findings indicate that Gradient Boosting provided the highest classification accuracy for this particular mental health bi-classification prediction task. In general, it was also demonstrated that the prediction results produced by all of the machine learning approaches studied here were able to achieve more than 80% accuracy, thereby indicating a highly promising approach for mental health professionals toward automated clinical diagnosis.

1 Introduction

Mental illness is a health problem that significantly affects how a person feels, thinks, behaves, and interacts with other people. Mental illnesses vary in type and severity; major types include depression, anxiety, schizophrenia, bipolar mood disorder and personality disorders. Advances in science and medicine have produced very effective treatments, and technology has also made it possible to predict illnesses at a very early stage.

Machine learning is a technique that aims to construct systems that improve through experience, using advanced statistical and probabilistic techniques. It is believed to be a significantly useful tool for assisting in mental health prediction. There are various machine learning techniques, and research into generating optimal results is ongoing. Although no single learning algorithm universally performs best across all domains, it is still pertinent to identify which class of algorithms performs best for a particular task environment [ 1 ].

In a recent study, Chekroud et al. developed a machine learning algorithm to predict clinical remission from a 12-week course of citalopram [ 2 ]. The data set was collected from 1949 patients who experienced level 1 depression, and 25 variables were selected from it to improve the prediction outcome. The gradient boosting method was then deployed for the prediction because of its ability to combine weak predictive models as it is built, and an accuracy of 64.6% was obtained.

In a research paper by Sumathi and Poorna, the authors predicted mental health problems among children using various machine learning techniques [ 3 ]. The mental health problems that commonly occur among children are attention problems, academic problems, anxiety, attention deficit hyperactivity disorder and pervasive developmental disorder. The data set was obtained from a clinical psychologist and contains 60 instances in text document format. Several features and attributes were selected for the classification and prediction of mental health problems, and several machine learning techniques were applied. In the experiment, the Average One-Dependence Estimator (AODE) recorded an accuracy of 71%, while Neural Networks showed the highest accuracy at 78%. The Logical Analysis Tree (LAT) recorded an accuracy of 70%, the multi-class classifier 58%, and the Radial Basis Function Network (RBFN) 57%. Both K-star and the Functional Tree (FT) recorded 42% accuracy. From these experiments, neural networks performed best among the algorithms.

A related study by Galatzer-Levy et al. predicted another mental illness, post-traumatic stress disorder (PTSD), using a support vector machine [ 4 ]. The data set comprised longitudinal data on 152 subjects gathered during emergency room admission following a traumatic incident. PTSD symptoms were identified using latent growth mixture modelling, and the result was then used to predict PTSD with a support vector machine. After applying the support vector machine via MATLAB (MATrix LABoratory), the accuracy was shown to be 64.0%.

A research paper by Sau and Bhakta in 2019 reported the prediction of depression and anxiety among seafarers [ 5 ]. Seafarers are easily exposed to mental health problems, typically depression and anxiety, so machine learning technology has been useful in predicting and diagnosing them for early treatment. The authors obtained a data set of 470 seafarers through interviews. Five classifiers, Categorical Boosting (CatBoost), Random Forest, Logistic Regression, Naive Bayes and Support Vector Machine, were trained with tenfold cross-validation. To determine the strength of the machine learning algorithms, a data set with 56 instances was deployed on the trained models. CatBoost, a boosting algorithm, performed best on the training data set with tenfold cross-validation. On the test data set, the CatBoost algorithm outperformed the other machine learning algorithms with a predictive accuracy of 89.3% and precision of 89.0%, while logistic regression also performed very well with a predictive accuracy of 87.5% and precision of 84.0%.

The study carried out by Resom et al. showed that mental health problems can be predicted by machine learning using audio features [ 6 ]. For predictions from text extraction, XGBoost, a boosting algorithm, was the best performer with a mean F1 score of 50%, followed by K-Nearest Neighbours at 49%. Gaussian Processes and Logistic Regression performed reasonably well, recording 48%. Random Forest recorded a mean F1 score of 44%, followed by Neural Networks at 42%. Support Vector Machine scored the lowest mean F1 score, reported at 49%.

Young et al. utilized network analysis and machine learning approaches to classify 48 schizophrenia patients and 24 healthy controls [ 7 ]. Network properties were estimated from graphs rebuilt using probabilistic brain tractography, and machine learning was then applied to label schizophrenia patients and healthy controls. Based on the results, the highest accuracy was achieved by the Random Forest model at 68.6%, followed by Multinomial Naive Bayes at 66.9%. Extreme Gradient Boosting (XGBoost) produced an accuracy of 66.3%, while Support Vector Machine produced 58.2%. Most of the machine learning models showed encouraging levels of performance in classifying schizophrenia patients and healthy controls.

A recent study reported by Tate et al. in 2020 investigated machine learning approaches for predicting mental health problems in children [ 8 ]. The authors used 474 predictors extracted from parental reports and registration data, and model performance was assessed with the area under the receiver operating characteristic curve (AUC). Random Forest and Support Vector Machine achieved the highest AUC score of 0.754, while the other models tested, Logistic Regression, XGBoost and Neural Network, achieved similar AUC scores above 0.700.

A screening tool known as EarlyDetect was used by Liu et al. in 2021 to examine a mental illness, bipolar disorder, in mental health centres by applying machine learning approaches [ 9 ]. The data set contains 955 participants who completed self-report clinical questionnaires and interviews. Using the screening tool, the authors obtained an accuracy of 80.6%, with a sensitivity of 73.7% and a specificity of 87.5%. With the fully combined EarlyDetect model, they improved accuracy by 6.9% and sensitivity by 14.5% while maintaining specificity.

Previous studies and research suggest that prediction at an early stage can help to prevent and treat illnesses before they become chronic and lead to significantly more serious issues. Therefore, this paper applies machine learning algorithms to classify and predict mental health problems from a questionnaire data set in order to generate valuable results to this end. Machine learning algorithms such as Logistic Regression, Gradient Boosting and Neural Networks are empirically and systematically tested for predicting and classifying mental health problems. Furthermore, K-Nearest Neighbours, Support Vector Machine and ensemble approaches are introduced for comparison with these algorithms; the majority voting classifier approach is used for the ensemble approach in this paper. For further exploration, Deep Neural Networks and Extreme Gradient Boosting are also tested and compared in this comparative study. Finally, the performances of the machine learning algorithms are analyzed and the major findings summarized. We hope this paper contributes a systematic and comprehensive research analysis that gives practitioners valuable information about the mental health field to help determine clinical diagnoses effectively. For instance, a practitioner could gain insights into the performance of machine learning and apply it in a predictive clinical system to help determine mental health status accurately and precisely.

The remainder of this paper is organized as follows. Section 1 contains a general overview of this research and a summary of related past studies. Section 2 presents the techniques and procedures used to conduct the experiments. Section 3 reports the outcomes of the conducted experiments, while Section 4 further explores and analyzes the results. Lastly, Section 5 concludes the research and outlines future work.

The open data set “OSMI Mental Health in Tech Survey” was obtained through an online survey conducted by professional experts of Open Sourcing Mental Illness (OSMI) in 2014 [ 10 ]. The survey consists of various questions regarding the respondents’ mental health and their opinions on mental health. The original raw data can be accessed on the OSMI website, and the data are covered by a Creative Commons Attribution License, which allows free adaptation and sharing of the survey results. The survey aims to measure respondents’ attitudes toward mental health in their workplaces in the tech industry.

In general, the data set for this project contains many missing, inconsistent and unnecessary values, so data cleaning is necessary to make it suitable for the machine learning models. The columns of comments, state, timestamp and country are removed from the data because they are unnecessary, and the remaining columns are renamed with short, straightforward labels. The data set contains unique and excessive values, especially in columns such as gender, self-employed and work interfere. For gender, unrelated answers are removed and the responses are categorized into three groups: male, female and others. Missing values in the self-employed column are replaced with the answer “No”, while missing values in the work interfere column are replaced with the answer “Don’t know”.
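The paper does not publish its preprocessing code, so the following pandas sketch only illustrates the cleaning steps described above on a hypothetical miniature of the survey; the real column names, category spellings and imputation order may differ.

```python
import pandas as pd

# Hypothetical miniature of the OSMI survey data (column names are assumptions).
df = pd.DataFrame({
    "Timestamp": ["2014-08-27", "2014-08-27", "2014-08-28"],
    "Country": ["US", "UK", "US"],
    "comments": [None, "n/a", None],
    "Gender": ["Male", "female", "M"],
    "self_employed": [None, "Yes", "No"],
    "work_interfere": ["Often", None, "Sometimes"],
})

# Drop the columns judged unnecessary for modelling.
df = df.drop(columns=["Timestamp", "Country", "comments"])

# Collapse free-text gender entries into three categories.
gender_map = {"male": "male", "m": "male", "female": "female", "f": "female"}
df["Gender"] = (df["Gender"].str.strip().str.lower()
                .map(gender_map).fillna("other"))

# Impute missing values as described in the text.
df["self_employed"] = df["self_employed"].fillna("No")
df["work_interfere"] = df["work_interfere"].fillna("Don't know")

print(df)
```

In a real pipeline the gender map would need many more spelling variants than shown here.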

The next step is transforming the data set into a format that is understandable and readable for the machine learning models. A label encoder is applied to encode the data set into suitable data and features, and testing confirms that no missing values or data remain, so the data set is ready for the machine learning models. After performing feature selection with the Extra Trees Classifier to reduce the chance of over-fitting, the features of age, gender, family history, benefits, care options, anonymity, leave and work interference are selected for training. Here, family history indicates whether a respondent has a family member with mental illness; benefits indicates whether the respondent’s employer provides mental health benefits; care options asks whether the respondent knows the mental health care options provided by their employer; anonymity reflects whether the respondent believes their company or employer will protect their privacy and can be trusted with knowledge of their mental health status; leave captures how difficult it is for the respondent to take medical leave for a mental health condition; and work interference captures whether mental health could interfere with their work. The selected features are categorical except for age, which is numerical.
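A minimal scikit-learn sketch of the encoding and Extra-Trees-based feature ranking described above, using two toy stand-in columns; the actual features, data and hyperparameters used in the study are assumptions here.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import LabelEncoder
from sklearn.ensemble import ExtraTreesClassifier

# Toy categorical data standing in for the cleaned survey (names are hypothetical).
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "gender": rng.choice(["male", "female", "other"], 200),
    "family_history": rng.choice(["Yes", "No"], 200),
    "treatment": rng.choice(["Yes", "No"], 200),
})

# Label-encode each categorical column in place, keeping one encoder per column.
encoders = {}
for col in df.columns:
    encoders[col] = LabelEncoder()
    df[col] = encoders[col].fit_transform(df[col])

X, y = df.drop(columns=["treatment"]), df["treatment"]

# Rank features by impurity-based importance, as Extra Trees selection does.
model = ExtraTreesClassifier(n_estimators=100, random_state=0).fit(X, y)
for name, score in zip(X.columns, model.feature_importances_):
    print(f"{name}: {score:.3f}")
```

Features with very low importance would then be dropped before training the final classifiers.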

This research is conducted to classify and predict binary mental health problems, where no is 0 and yes is 1. The training data set containing the component values is used to determine the suitable class based on the predictors, which in this project include family history, care options, gender, age and others. The target variable, treatment, is selected for the training data set to predict mental health problems. A preliminary experiment is first conducted using a 70–30 split, where 70% of the data is used as the training data set and the remaining 30% as the testing data set.
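The 70–30 split can be reproduced with scikit-learn's `train_test_split`; the synthetic arrays below merely stand in for the encoded survey features and the binary treatment target, and stratification is an assumption not stated in the text.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Stand-in data: eight selected features and a binary treatment target (1 = yes, 0 = no).
rng = np.random.default_rng(42)
X = rng.normal(size=(1000, 8))
y = rng.integers(0, 2, size=1000)

# 70% training, 30% testing; stratify keeps the class balance in both splits.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.30, stratify=y, random_state=42)

print(len(X_train), len(X_test))  # 700 300
```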

The performance evaluation is based on the experimental results obtained: accuracy, sensitivity, specificity and precision are all computed from the confusion matrix. The machine learning algorithms are then compared on accuracy to determine the best algorithm for classifying and predicting mental health problems.

In addition, a full comparative experiment is conducted using repeated k-fold cross-validation. A single run of k-fold cross-validation may yield a noisy estimate of algorithm performance; repeating it reduces this noise and the variability associated with a single run [ 11 ]. Several researchers have noted that repeating k-fold cross-validation helps stabilize the accuracy estimates [ 11 , 12 ], and previous studies recommend a higher number of repeats to stabilize the model selection process [ 13 , 14 ]. In this experiment, the number of splits is set to ten, the default and most popular value among studies [ 15 , 16 ], and each tenfold run is repeated 14 times to minimize stochasticity in the results.
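The scheme described above, ten splits repeated 14 times, can be expressed with scikit-learn's `RepeatedStratifiedKFold`; the synthetic data and the logistic-regression estimator below are placeholders.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import RepeatedStratifiedKFold, cross_val_score

X, y = make_classification(n_samples=300, random_state=0)

# 10 folds x 14 repeats = 140 fold scores, averaged into one estimate.
cv = RepeatedStratifiedKFold(n_splits=10, n_repeats=14, random_state=1)
scores = cross_val_score(LogisticRegression(max_iter=1000), X, y, cv=cv)
print(len(scores), round(scores.mean(), 3))
```

Averaging over the 140 fold scores is what damps the run-to-run variability the paragraph describes.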

Logistic Regression, Gradient Boosting and Neural Networks are included because they are commonly used for classification in the medical field. K-Nearest Neighbours and Support Vector Machine are added to the experiment for comparison. Finally, the Voting Classifier is included as a representative ensemble approach, with Deep Neural Networks and Extreme Gradient Boosting representing more recent machine learning approaches.
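The line-up above can be assembled in scikit-learn roughly as follows. This is an illustrative sketch: whether the paper's Voting Classifier used hard or soft voting is not stated (hard voting is assumed here), and XGBoost, which lives in the separate `xgboost` package, is omitted.

```python
from sklearn.ensemble import GradientBoostingClassifier, VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier
from sklearn.neural_network import MLPClassifier
from sklearn.svm import SVC

single = [
    ("lr", LogisticRegression(max_iter=1000)),
    ("gb", GradientBoostingClassifier()),
    ("nn", MLPClassifier(max_iter=500)),
    ("knn", KNeighborsClassifier()),
    ("svm", SVC()),
]
# Majority-vote ensemble over the single classifiers ("hard" voting assumed).
ensemble = VotingClassifier(estimators=single, voting="hard")
```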

Moreover, Gradient Boosting and Neural Networks were selected for additional parameter tuning to improve accuracy. For Gradient Boosting, the number of estimators is set to 1000 and the learning rate to 0.0001. The maximum depth of each tree is set to 17, the minimum number of samples required to split an internal node to 10, and the minimum number of samples required at a leaf node to 5. The subsample parameter, the fraction of samples used to fit each individual base learner, is set to 0.5, and the exponential loss function is optimized. For the Neural Networks algorithm, the hidden layer sizes are set to 30, 50 and 13, and the learning rate is set to a constant value.
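Rendered as scikit-learn constructors, the tuned configurations read as follows. Parameter names are sklearn's; note that the `exponential` loss for `GradientBoostingClassifier` was deprecated in scikit-learn 1.3, so recent versions may require a different loss, and any parameter not mentioned in the text is left at its default here.

```python
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.neural_network import MLPClassifier

gb_tuned = GradientBoostingClassifier(
    n_estimators=1000,
    learning_rate=0.0001,
    max_depth=17,
    min_samples_split=10,   # min samples to split an internal node
    min_samples_leaf=5,     # min samples required at a leaf node
    subsample=0.5,          # fraction of samples per base learner
    loss="exponential",     # deprecated in sklearn >= 1.3
)

nn_tuned = MLPClassifier(
    hidden_layer_sizes=(30, 50, 13),
    learning_rate="constant",
)
```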

Initially, six machine learning models were tested in this study: Logistic Regression, Gradient Boosting, Neural Networks, K-Nearest Neighbours, Support Vector Machine and Voting Classifier. The results obtained from the first set of experiments with the default settings are summarized below. Subsequently, Deep Neural Networks (DNN) and Extreme Gradient Boosting (XGBoost) were added for further exploration and comparison against the initial set of classifiers.

Table 1 presents the summary of the performance evaluation for the machine learning algorithms in terms of accuracy, precision, sensitivity and specificity [ 17 ] during the preliminary experiment with the initial settings.

In this case, accuracy is defined as the sum of true positives and true negatives divided by the total number of predictions. Precision, the probability that a positive classification is correct, is computed as the number of true positives divided by the total number of true positives and false positives. Sensitivity, also known as recall, is the percentage of true positive cases that are correctly classified, showing how well the algorithms classify positive cases. Specificity is the percentage of true negative cases that are classified as negative, measuring how well the algorithms classify negative cases.
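These four definitions can be checked directly against a confusion matrix; the small prediction vectors below are illustrative only.

```python
import numpy as np
from sklearn.metrics import confusion_matrix

y_true = np.array([1, 1, 1, 0, 0, 0, 1, 0])
y_pred = np.array([1, 1, 0, 0, 1, 0, 1, 0])

tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
accuracy    = (tp + tn) / (tp + tn + fp + fn)  # correct / all predictions
precision   = tp / (tp + fp)   # correct positive calls / all positive calls
sensitivity = tp / (tp + fn)   # recall: positives correctly classified
specificity = tn / (tn + fp)   # negatives correctly classified
print(accuracy, precision, sensitivity, specificity)  # → 0.75 0.75 0.75 0.75
```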

From the result obtained, Voting Classifier obtains the highest accuracy with a score of 81.75%. It is discovered that Gradient Boosting and K-Nearest Neighbours can achieve the same value of accuracy which is 81.22%. Then, the Support Vector Machine obtains accuracy lower than Gradient Boosting and K-Nearest Neighbours with a percentage of 80.69, followed by Logistic Regression with a score of 79.63%. Neural Networks achieve the lowest accuracy with a percentage of 78.57%.

Moreover, the highest precision is recorded by K-Nearest Neighbours at 78.43%, followed by Logistic Regression at 76.19%. Gradient Boosting obtains a precision of 76.13%, higher than the Voting Classifier and Support Vector Machine, which record 75.87% and 74.15%, respectively. Neural Networks score the lowest precision at 73.45%.

In terms of sensitivity, it is noted that the Support Vector Machine records the highest percentage with a score of 93.58% compared to Voting Classifier with a score of 92.51%. Neural Networks record a lower percentage of 88.77% than Gradient Boosting in the sensitivity score which is 90.37%. On the other hand, Logistic Regression and K-Nearest Neighbours present the lowest and the same percentage of the score in the sensitivity which is 85.56%.

For the specificity, K-Nearest Neighbours obtain the highest percentage score which is 76.96%. Logistic Regression can record a higher specificity score with a percentage of 73.82% than Gradient Boosting which can obtain a percentage of 72.25%. Next, Voting Classifier managed to achieve a percentage of 71.20% in the specificity score. Meanwhile, Neural Networks and Support Vector Machine show a lower percentage compared to the other algorithms which are 68.59% and 68.06%, respectively.

This section presents and examines the results of the final experiment. Here, repeated k-fold cross-validation is applied in the classification to obtain average scores, and Gradient Boosting and Neural Networks are run with the additional parameter tuning to improve prediction accuracy.

Table 2 presents the final experiment, in which the machine learning algorithms are tested with repeated tenfold cross-validation. The highest accuracy is achieved by the Gradient Boosting algorithm with additional parameter tuning, at 88.80%. Next, the Neural Networks algorithm with additional parameter tuning scores 88.00%, followed by the Voting Classifier at 85.60%. Logistic Regression and K-Nearest Neighbours obtain the same accuracy of 84.00%. Meanwhile, the Support Vector Machine achieves 82.40%, the lowest accuracy in the final experiment.

In terms of precision, the K-Nearest Neighbours algorithm achieves the highest percentage with a value of 84.85%. Support Vector Machine algorithm obtains a lower precision score than K-Nearest Neighbours with a value of 84.38%. Moreover, the additional parameter tuning for the Gradient Boosting and Neural Networks can obtain precision with the percentage of 84.21% and 84.00%, respectively, which are higher than the Logistic Regression algorithm. Meanwhile, the Logistic Regression algorithm obtains the value of 82.86% in terms of precision.

Gradient Boosting with additional parameter tuning achieves a remarkable sensitivity of 96.97%. Neural Networks with additional parameter tuning obtain a slightly lower 95.45%, and the Voting Classifier scores 90.91%. Besides that, the Logistic Regression algorithm, at 87.88%, scores better than K-Nearest Neighbours at 84.85%, while the Support Vector Machine obtains the lowest sensitivity, 81.82%.

In terms of specificity, the highest percentage with a value of 83.05% is achieved by the K-Nearest Neighbours algorithm and Support Vector Machine algorithm. Meanwhile, algorithms that are Logistic Regression, Voting Classifier, Gradient Boosting with additional parameter tuning and Neural Networks with additional parameter tuning obtain the same percentage of 79.66% in specificity.

3.1 Deep neural networks and extreme gradient boosting

For further exploration and comparison, DNN and XGBoost have been included in the experiment. The overall results demonstrate that these algorithms performed comparably well against the other classifiers evaluated in this experiment.

As shown in Table 1 , XGBoost achieves a higher accuracy of 80.69% compared to 79.89% for DNN. In terms of precision, XGBoost obtains 75.22%, while DNN scores the lowest precision among the algorithms at 73.62%. However, DNN performs slightly better than XGBoost in sensitivity, at 92.51% against 90.91%. Finally, DNN obtains the lowest specificity at 67.54%, while XGBoost scores 70.68%.

From Table 2 , the final results show that XGBoost increased its accuracy to 87.20%, with DNN slightly lower at 86.40%. In addition, XGBoost displays the second-highest precision at 84.72%, while DNN scores the lowest precision at 80.25%. However, DNN presents the highest sensitivity among the algorithms at 98.47%, with XGBoost lower at 92.42%. With a specificity of 81.36%, XGBoost clearly performs better than DNN, which scores 72.88%.

4 Discussion

Fig. 1: Comparison chart for machine learning algorithms in accuracy and precision

Fig. 2: Comparison chart for machine learning algorithms in sensitivity and specificity

Figures 1 and 2 show the comparison of the machine learning algorithms in performances for both preliminary and final experiments. The machine learning algorithms have been compared for both experiments in terms of accuracy, precision, sensitivity and specificity. In this case, the comparison is being conducted to identify and examine the changes in the performance of the machine learning algorithms from both experiments.

The comparison charts show that most of the machine learning algorithms improved from the preliminary experiment to the final experiment. With the additional parameter tuning, Gradient Boosting increased its accuracy score by 7.58%, reaching 88.80%, the highest accuracy in the final experiment. Meanwhile, Neural Networks show the most drastic change, with accuracy rising by 9.43% between the two experiments; with the help of additional parameter tuning, Neural Networks reach 88.00%, slightly below Gradient Boosting.

In terms of precision, K-Nearest Neighbours achieve the highest score of 84.85%, an increase of 6.42%. Gradient Boosting, Neural Networks and Support Vector Machine also show significant changes in precision. Gradient Boosting with additional parameter tuning reaches 84.21% in the final experiment, up from 76.13% in the preliminary experiment, a boost of 8.08%. Neural Networks with additional parameter tuning show an even greater impact, increasing by 10.55% to 84.00% in the final experiment. Meanwhile, the Support Vector Machine reaches 84.38%, a 10.23% increase from the preliminary to the final experiment, and the Voting Classifier improves its precision by 7.43% between the two experiments.

Next, the sensitivity comparison shows a slight increase for most of the machine learning algorithms, with the exceptions of K-Nearest Neighbours and Support Vector Machine. K-Nearest Neighbours decline slightly, by 0.71%. The Support Vector Machine shows the largest decline, 11.76%, dropping from 93.58% in the preliminary experiment to 81.82% in the final experiment. Logistic Regression, Gradient Boosting and Neural Networks present acceptable increases. Gradient Boosting improves its sensitivity by 6.60%, from 90.37% to 96.97%, while Neural Networks gain 6.68%, rising from 88.77% in the preliminary experiment to 95.45% in the final experiment. The Voting Classifier, however, shows a small decline of 1.6%, from 92.51% to 90.91%.

In terms of specificity, most of the machine learning algorithms show substantial changes. From the charts, Gradient Boosting raises its specificity from 72.25% to 79.66%, an increase of 7.41%. Neural Networks show a significant increase of 11.07%, from 68.59% to 79.66%. Meanwhile, the Support Vector Machine records the largest change, increasing its specificity by 14.99%. Not only that, the Voting Classifier reaches 79.66% specificity in the final result, an increase of 8.46% over the preliminary result.

When comparing the single classifiers with the ensemble approach, the experiment shows that the ensemble approach fails to outperform Gradient Boosting in accuracy. However, its score of 85.60% is slightly higher than most of the single classifiers, including Logistic Regression, K-Nearest Neighbours and Support Vector Machine. With a precision of 83.30%, the ensemble approach is outperformed by most of the single classifiers except Logistic Regression, which it beats by 0.44%. In terms of sensitivity, the ensemble approach is, perhaps unexpectedly, higher than Logistic Regression, K-Nearest Neighbours and Support Vector Machine, although Gradient Boosting and Neural Networks achieve greater scores than its 90.91%. Moreover, the ensemble approach achieves a specificity of 79.66%, the same as most of the single classifiers, except K-Nearest Neighbours and Support Vector Machine, which are higher by 3.39%. Although the ensemble approach produces a satisfying performance for predicting mental health problems, with high accuracy and sensitivity, it remains difficult for researchers to interpret.

4.1 Comparison of deep neural networks and extreme gradient boosting

This section further compares DNN and XGBoost, including the improvement in prediction results from the preliminary to the final experiment. DNN and XGBoost are also compared against the best classifiers, Gradient Boosting and Neural Networks. Figures 1 and 2 display the changes for DNN and XGBoost from the first to the final result in terms of accuracy, precision, sensitivity and specificity.

In terms of accuracy, both algorithms show significant improvements: DNN increased its accuracy by 6.51% from the first experiment to the final result, and XGBoost shows the same 6.51% increment.

Next, XGBoost shows a significant boost in precision of 9.5%, reaching 84.72% in the final result. DNN shows a smaller improvement, increasing by 6.63% to 80.25%.

When it comes to sensitivity, DNN reaches the highest value, increasing by 5.96% from the first to the final result, whereas XGBoost shows only a minor increment of 1.51%, reaching 92.42% in the final experiment.

In terms of specificity, the figures show that XGBoost increases by 10.68% to reach 81.36% in the final experiment, while DNN shows a smaller increase of 5.34%, to 72.88%.

The final results show that Gradient Boosting and Neural Networks perform better overall than DNN and XGBoost: the accuracy achieved by Gradient Boosting and Neural Networks is higher than that of XGBoost and DNN. In precision, XGBoost is slightly better than Gradient Boosting and Neural Networks, while DNN shows the lowest precision. However, DNN achieves the highest sensitivity of all the algorithms, with XGBoost scoring lower than Gradient Boosting and Neural Networks. For specificity, XGBoost achieves a better score than Gradient Boosting and Neural Networks, while DNN shows the lowest specificity in the final results.

5 Conclusion and future work

From the final results, the Gradient Boosting algorithm with additional parameter tuning achieved the best performance in terms of accuracy. All the machine learning algorithms tested in this research achieve a satisfying accuracy in the classification of mental health problems, and Neural Networks with additional parameter tuning show the most drastic and significant improvement across the experiments. Higher accuracy provides greater reliability in identifying and addressing mental health problems. In the additional comparisons using the more recent approaches of DNN and XGBoost, the experimental results showed that although both were highly promising classifiers for predicting mental health problems, they did not outperform Gradient Boosting and Neural Networks on this particular binary classification task.

The advancement of machine learning and artificial intelligence has brought deep learning, which maps input features directly to outputs through a multi-layer network structure and can therefore capture hidden patterns within the data. Deep learning approaches have become very popular in the study of mental health problems. For instance, Mohan and others applied a deep feed-forward neural network to raw electroencephalogram signals to obtain information about human brain waves [ 18 ]; from the collected signals they found that the central regions were only insignificantly higher than the other brain regions, and the resulting information can be used to differentiate depressed from normal subjects. Deep learning has also been applied to neuroimaging for mental health: for predicting depression, Geng et al. proposed a convolutional neural network and auto-encoder to extract important features from functional magnetic resonance imaging data [ 19 ]. However, applying deep learning models to mental health problems faces several challenges. First, deep learning requires a large volume of samples to train efficiently, which is a problem where data are hard to collect. Assembling large, diverse training data is also challenging, since data redundancy, missing values and other deficiencies must be handled. In addition, deep learning models are difficult to interpret and are often labelled black boxes, which makes it harder to convince clinical practitioners of the recommended actions and procedures generated by a predictive model. Practitioners may therefore be reluctant to rely on deep learning for mental health prediction when a model produces good outputs without clear information about its inner workings.

In general, this research paper has focused on the application of machine learning approaches to predicting mental health problems. The empirical testing showed that the Gradient Boosting algorithm performed best among the individual and ensemble machine learning approaches investigated here, achieving up to 88.8% accuracy. The results of this study can therefore be useful to the mental health community, particularly in the medical field, as an automated, computer-based aid to the clinical diagnosis of mental health issues. Researchers and medical practitioners could build on this work in real-world clinical studies, using it as guidance to identify or diagnose mental health problems efficiently and effectively. Future work to improve prediction performance is planned using generative adversarial networks (GANs) and transformer neural networks.

Availability of data and materials

The datasets used and/or analysed during the current study are available from the corresponding author on reasonable request.

Abbreviations

AUC: Area under the receiver operating characteristic curve

AODE: Average one-dependence estimator

CatBoost: Categorical boosting

DNN: Deep neural networks

FT: Functional tree

GAN: Generative adversarial network

LAT: Logical analysis tree

MATLAB: MATrix LABoratory

OSMI: Open sourcing mental illness

PTSD: Post-traumatic stress disorder

RBFN: Radial basis function network

XGBoost: Extreme gradient boosting

Wolpert DH, Macready WG (1995) No free lunch theorems for search. Technical Report SFI-TR-95-02-010, Santa Fe Institute

Mourad CA, Joseph ZR, Zarrar S, Ralitza G, Johnson Marcia K, Trivedi Madhukar H, Cannon Tyrone D, Harrison KJ, Robert CP (2016) Cross-trial prediction of treatment outcome in depression: a machine learning approach. Lancet Psychiatry 3(3):243–250


Sumathi MR, Poorna B (2016) Prediction of mental health problems among children using machine learning techniques. Int J Adv Comput Sci Appl 7(1):552–557

Galatzer-Levy IR, Ma S, Statnikov A, Yehuda R, Shalev AY (2017) Utilization of machine learning for prediction of post-traumatic stress: a re-examination of cortisol in the prediction and pathways to non-remitting PTSD. Transl Psychiatry 7(3):e1070–e1070

Sau Arkaprabha, Bhakta Ishita (2019) Screening of anxiety and depression among seafarers using machine learning technology. Informat Med Unlocked 16:100228

Zenebe RA, Xu AJ, Yuxin W, Yufei G, Anthony FM (2019) Machine learning for mental health detection. Technical report, 100 Institute Road, Worcester MA 01609-2280 USA, March

Tak JY, Woo JS, Seung-Hyun S, Harin K, Yangsik K, Jungsun L (2020) Diagnosing schizophrenia with network analysis and a machine learning method. Int J Methods Psychiatr Res 29(1):e1818

Tate Ashley E, McCabe Ryan C, Larsson Henrik, Lundström Sebastian, Lichtenstein Paul, Kuja-Halkola Ralf (2020) Predicting mental health problems in adolescence using machine learning techniques. PLOS ONE 15(4):e0230389

Liu Yang S, Chokka Stefani, Cao Bo, Chokka Pratap R (2021) Screening for bipolar disorder in a tertiary mental health centre using EarlyDetect: a machine learning-based pilot study. J Affect Disord Rep 6:100215

Open Sourcing Mental Illness (2014)

Kim Ji-Hyun (2009) Estimating classification error rate: Repeated cross-validation, repeated hold-out and bootstrap. Comput Statis Data Anal 53(11):3735–3745


Gitte V, Hendrik B (2012) On estimating model accuracy with repeated cross-validation. p. 39–44. De Baets, Bernard

Molinaro AM, Simon R, Pfeiffer RM (2005) Prediction error estimation: a comparison of resampling methods. Bioinformatics 21(15):3301–3307

Songthip O, Lensing Shelly Y, Spencer Horace J, Kodell Ralph L (2012) Estimating misclassification error: a closer look at cross-validation based methods. BMC Res Notes 5(1):656

Witten Ian H, Eibe F, Hall Mark A (2011) Data mining: practical machine learning tools and techniques

Bouckaert Remco R (2003) Choosing between two learning algorithms based on calibrated tests. In: Proceedings of the Twentieth International Conference on International Conference on Machine Learning, ICML’03, pp 51–58. AAAI Press

Sujatha J, Rajagopalan SP (2017) Performance evaluation of machine learning algorithms in the classification of Parkinson’s disease using voice attributes. Int J Appl Eng Res 12:10669–10675


Yogeswaran M, Seng CS, Pei XDK, Poh FL (2016) Artificial neural network for classification of depressive and normal in EEG. In: 2016 IEEE EMBS Conference on Biomedical Engineering and Sciences (IECBES). IEEE

Geng X-F, Xu J-H (2017) Application of autoencoder in depression diagnosis. DEStech Trans Comput Sci Eng (csma), pp 146–151


Acknowledgements

The corresponding author is supported by a research grant from the Ministry of Higher Education, Malaysia [Fundamental Research Grant Scheme (FRGS), Dana Penyelidikan, Kementerian Pengajian Tinggi, FRGS/1/2019/ICT02/UMS/01/1]. The APC was funded by Universiti Malaysia Sabah.

Author information

Authors and Affiliations

Faculty of Computing and Informatics, Universiti Malaysia Sabah, Jalan UMS, 88400, Kota Kinabalu, Sabah, Malaysia

Jetli Chung

Advanced Machine Intelligence Research Group, Faculty of Computing and Informatics, Universiti Malaysia Sabah, Jalan UMS, 88400, Kota Kinabalu, Sabah, Malaysia

Evolutionary Computing Laboratory, Faculty of Computing and Informatics, Universiti Malaysia Sabah, Jalan UMS, 88400, Kota Kinabalu, Sabah, Malaysia


Contributions

JC: writing—original draft. JT: writing—review and editing, supervision, funding acquisition. Both authors read and approved the final manuscript.

Corresponding author

Correspondence to Jason Teo .

Ethics declarations

Competing interests

The authors declare that they have no competing interests.

Additional information

Publisher's note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/ .


About this article

Cite this article

Chung, J., Teo, J. Single classifier vs. ensemble machine learning approaches for mental health prediction. Brain Inf. 10 , 1 (2023). https://doi.org/10.1186/s40708-022-00180-6

Download citation

Received : 16 December 2021

Accepted : 13 November 2022

Published : 03 January 2023

DOI : https://doi.org/10.1186/s40708-022-00180-6




Predicting mental health problems in adolescence using machine learning techniques

Ashley E. Tate

1 Department of Medical Epidemiology and Biostatics, Karolinska Institutet, Stockholm, Sweden

Ryan C. McCabe

2 Spotify, Stockholm, Sweden

Henrik Larsson

3 School of Medical Sciences, Örebro University, Örebro, Sweden

Sebastian Lundström

4 Centre for Ethics, Law and Mental Health (CELAM), University of Gothenburg, Gothenburg, Sweden

5 Gillberg Neuropsychiatry Centre, University of Gothenburg, Gothenburg, Sweden

Paul Lichtenstein

Ralf Kuja-Halkola

Associated data

We regret to say that we are unable to share even de-identified data, as we are legally bound by the Swedish Secrecy Act. Data from the national Swedish registers and twin register were used for this study and made available under ethical approval. The data used for this study include: the Swedish Twin Registry, National Patient Register, Multi-Generation Register, Medical Birth Register, Prescribed Drug Register, and the Longitudinal Integration Database for Health Insurance and Labor Market Studies. Researchers may apply for access to these data sources through the Swedish Research Ethics Boards (etikprovningsmyndigheten.se; kansli@cepn.se) and from the primary data owners: the Swedish Twin Registry (str-research@meb.ki.se), Statistics Sweden (scb@scb.se), and the National Board of Health and Welfare (socialstyrelsen@socialstyrelsen.se), in accordance with Swedish law.

Predicting which children will go on to develop mental health symptoms as adolescents is critical for early intervention and preventing future, severe negative outcomes. Although many aspects of a child’s life, personality, and symptoms have been flagged as indicators, there is currently no model created to screen the general population for the risk of developing mental health problems. Additionally, the advent of machine learning techniques represents an exciting way to potentially improve upon the standard prediction modelling technique, logistic regression. Therefore, we aimed to I.) develop a model that can predict mental health problems in mid-adolescence II.) investigate if machine learning techniques (random forest, support vector machines, neural network, and XGBoost) will outperform logistic regression.

In 7,638 twins from the Child and Adolescent Twin Study in Sweden we used 474 predictors derived from parental report and register data. The outcome, mental health problems, was determined by the Strengths and Difficulties Questionnaire. Model performance was determined by the area under the receiver operating characteristic curve (AUC).

Although model performance varied somewhat, the confidence intervals overlapped for each model, indicating that the superiority of the random forest model was non-significant (AUC = 0.739, 95% CI 0.708–0.769); it was followed closely by support vector machines (AUC = 0.735, 95% CI 0.707–0.764).

Ultimately, our top performing model would not be suitable for clinical use; however, it lays important groundwork for future models seeking to predict general mental health outcomes. Future studies should make use of parent-rated assessments when possible. Additionally, it may not be necessary for similar studies to forgo logistic regression in favor of other, more complex methods.

Introduction

Childhood onset psychopathology can carry a heavy burden of negative outcomes that persist through adolescence and into adulthood. These outcomes are often severe: criminal convictions, low educational attainment, unemployment, and increased risk of suicide attempts [ 1 , 2 ]. As many of the documented risk factors for mental illnesses in adolescence can be mitigated by early interventions [ 3 ], research establishing the most informative mental health indicators could help more precisely identify the proper traits for intervention targets.

There are several well-researched indicators in childhood that are associated with the development of mental health problems. Psychopathological traits in early childhood often indicate a higher risk for persistent mental health problems in adolescence and adulthood [ 4 ], with even subthreshold symptoms indicating future adversity and a general predisposition to mental illnesses [ 5 – 7 ]. Internalizing and externalizing symptoms in childhood are both frequently associated with a higher risk of mental illness diagnosis later in life [ 5 , 8 ]. Specifically, impulsivity has been associated with susceptibility to developing mental illnesses and suicide [ 9 , 10 ]. Moreover, neurodevelopmental disorders, such as autism or ADHD, indicate lifelong diagnosis and frequent psychiatric comorbidities [ 11 ]. Similarly, learning difficulties can also indicate future mental health adversity and are frequently seen in children with neurodevelopmental disorders [ 12 , 13 ].

Additionally, parental mental health, such as anxiety or depression, has been found to correlate with childhood internalizing and externalizing symptoms, likely due to a shared biologic (genetic) etiology [ 14 , 15 ]. Thus, parental mental health may serve as an indicator of a more general predisposition for mental illness in lieu of genetic data. Genetic etiology is important to account for, as most childhood psychiatric disorders overlap at both the phenotypic and etiological level [ 15 ]. Similarly, living in a lower SES neighborhood has been associated with an increase in internalizing problems and ADHD, although the mechanisms of this association are debated [ 16 , 17 ]. Factors associated with the neonatal environment and birth have been associated with later adverse mental health and neurodevelopmental disorders [ 18 , 19 ]. Moreover, chronic physical illness or disability can have a profound effect on mental health [ 20 ].

Taken together, reported factors in childhood associated with adolescent mental illness reflect intricate developmental pathways at almost every level. Understandably, most studies have not properly integrated risk factors from varying domains. Modern advancements in prediction modeling with machine learning may, in part, provide a cost-efficient solution to this problem.

Machine learning in mental health

Supervised machine learning, used for classification or prediction modelling, has the advantage of accounting for complex relationships between variables that may not have been previously identified. Thus, as datasets become larger and the variables more complex, machine learning techniques may become a useful tool within psychiatry to properly disentangle variables associated with outcomes for patients [ 21 , 22 ].

A majority of studies using machine learning within psychiatry have focused on classification or diagnosis [ 23 , 24 ]. However, criticism has been raised that these studies are prone to underperform due to a lack of insight into the underlying assumptions of the various machine learning techniques or into the psychiatric disorders and corresponding diagnostic processes [ 25 ], highlighting the difficulty in creating and validating such models. That said, advancements have been made in the field using tree-based models to predict suicide in adolescents and in the U.S. military [ 26 , 27 ]. Beyond their proven efficacy, tree-based models provide information on how extensively a variable was used by the model, or variable importance, which gives some insight into the models’ classification process. This indicates that, while the way forward is arduous, properly conducted machine learning techniques can be interpretable and improve the efficacy of clinical decision making.

The primary aim of this study is to develop a model that can predict mental health problems in mid-adolescence. Additionally, we aim to compare various machine learning techniques with standard logistic regression to determine which performs best using combined questionnaire and register data. We expect the techniques used to perform with similar accuracy, in line with the “No Free Lunch Theorem” [ 28 , 29 ].

Participants

Participants came from the Child and Adolescent Twin Study in Sweden (CATSS), an ongoing, longitudinal study containing 15,156 twin pairs born in Sweden. During the first wave, the twins’ parents were contacted close to their 9th or 12th birthdays for a phone interview; this wave had a response rate of 80% [ 30 ], while the second wave at age 15 had a response rate of ~55%. This sample population was chosen due to the depth of information available, including questionnaire and register data. Using the unique identification number given to all Swedes, we linked several Swedish national registers to the CATSS data: the National Patient Register (NPR) [ 31 ], the Multi-Generation Register (for identification of parents) [ 32 ], the Medical Birth Register (MBR) [ 33 ], the Prescribed Drug Register (PDR) [ 34 ], as well as the Longitudinal Integration Database for Health Insurance and Labor Market Studies (LISA) [ 35 ]. A total of 7,638 participants born between 1994 and 1999 who completed data collection at age 9 or 12 and again at age 15 were eligible for inclusion and were used in the analysis.

The study was approved by the Regional Ethical Review Board in Stockholm (the CATSS study, Dnr 02–289, 03–672, 2010/597-31/1, 2009/739-31/5, 2010/1410-31/1, 2015/1947-31/4; linkage to national registers, Dnr 2013/862–31/5).

The outcome measure of adolescent mental health problems was collected at age 15 via the Strengths and Difficulties Questionnaire (SDQ) [ 36 ]. We used the SDQ to obtain parent-rated emotional symptoms, conduct problems, prosocial behavior, hyperactivity/inattention, and peer relationship problems. A binary variable was created based on a combination of the parent-reported subscales, not including prosocial behavior, with a cut-off score validated for the Swedish population, corresponding to approximately 10% scoring above the cut-off and thus rated as having mental health problems [ 37 ]. Predictors were collected at age 9/12 or earlier from questionnaires administered through CATSS and from registers. We included a wide range of predictors based on previous findings of association with adolescent mental health outcomes and/or childhood mental health. Predictors encompassed everything from birth information and physical illness to mental health symptoms and environmental factors such as neighborhood and parental income. Informants included both register and parent-reported information. A total of 474 variables were initially included in the dataset; a complete list can be found in S1 File .
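The outcome derivation above can be sketched as follows. This is an illustrative Python sketch (the study's coding was done in R); it assumes the standard SDQ scoring, in which each of the four difficulty subscales ranges 0–10, and uses the Swedish-validated cut-off of 11 described in the sensitivity analysis.

```python
# Illustrative sketch: binary outcome from parent-rated SDQ subscales.
# Prosocial behavior is excluded, per the text; each subscale scores 0-10.
def sdq_outcome(emotional, conduct, hyperactivity, peer, cutoff=11):
    """Return 1 if the SDQ total difficulties score meets the cut-off."""
    for score in (emotional, conduct, hyperactivity, peer):
        if not 0 <= score <= 10:
            raise ValueError("each SDQ subscale ranges from 0 to 10")
    total = emotional + conduct + hyperactivity + peer
    return int(total >= cutoff)

print(sdq_outcome(4, 3, 5, 2))  # total 14 -> 1 (mental health problems)
print(sdq_outcome(1, 2, 3, 1))  # total 7 -> 0
```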

Data pre-processing

Variables with more than 50% missingness were removed from the analysis (202 variables excluded). Redundant variables were also removed (134 excluded). Additionally, variables with no variance were removed (32 excluded), and those with near-zero variance were combined into one variable where possible, e.g. dust, mold, and pollen allergy collapsed into allergy [ 38 ]. Ultimately, 85 variables were determined to be suitable for analysis. As most machine learning techniques require complete datasets, missing values were imputed with tree-based imputation using the R package mice [ 39 ].
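The first two filtering rules can be sketched as follows. This is an illustrative Python sketch with a toy dataset and made-up column names; the study performed these steps in R, with imputation handled separately by the mice package.

```python
# Illustrative toy data: one usable variable, one with >50% missingness,
# and one with zero variance. Column names are hypothetical.
data = {
    "birth_weight": [3.1, 3.4, 2.9, 3.6, 3.2],
    "mostly_missing": [None, None, None, 1.0, 2.0],  # 60% missing
    "constant": [1, 1, 1, 1, 1],                     # no variance
}

def preprocess(columns, max_missing=0.5):
    kept = {}
    for name, values in columns.items():
        missing = sum(v is None for v in values) / len(values)
        if missing > max_missing:
            continue                      # rule 1: too much missingness
        observed = {v for v in values if v is not None}
        if len(observed) <= 1:
            continue                      # rule 2: no variance
        kept[name] = values
    return kept

print(sorted(preprocess(data)))  # ['birth_weight']
```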

Statistical analysis

All analyses were performed in R. First, a learning curve was plotted with the entire dataset in order to check if our study was sufficiently powered.

Then, we split our data into a training-set (60% of the sample), a tune-set (10%), and a test-set (30%). Splitting data allows for a more accurate determination of how the model will perform on a new dataset and helps alleviate overfitting, i.e., fitting the training data so closely that the model predicts other datasets poorly. Stratified random sampling was used to ensure that twin pairs would not be separated between the datasets, thus avoiding potential overfitting. Additionally, we preserved an equal distribution of the outcome between each set. Descriptive statistics were created for each set to determine the quality of the partition ( Table 2 ).
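The key constraint above is that the twin pair, not the individual, is the sampling unit, so that no pair is split across sets. A minimal sketch (illustrative Python with a hypothetical 100 pairs; the study's split was also stratified on the outcome, which is omitted here for brevity):

```python
import random

# 60/10/30 split at the level of twin pairs, so both members of a pair
# always land in the same set. Pair IDs are hypothetical.
random.seed(1)
pair_ids = list(range(100))
random.shuffle(pair_ids)

n = len(pair_ids)
train_pairs = set(pair_ids[: int(0.6 * n)])
tune_pairs = set(pair_ids[int(0.6 * n): int(0.7 * n)])
test_pairs = set(pair_ids[int(0.7 * n):])

# each individual inherits the set assignment of their pair
individuals = [(pid, twin) for pid in range(100) for twin in (1, 2)]
train = [ind for ind in individuals if ind[0] in train_pairs]

print(len(train_pairs), len(tune_pairs), len(test_pairs))  # 60 10 30
```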

We artificially inflated the number of cases in the training-set through a Synthetic Minority Over-sampling Technique (SMOTE), as implemented in the R-package SMOTEBoost [ 40 ], because positive cases were relatively rare. This phenomenon is commonly termed class imbalance [ 41 ] and can cause the model to predict all outcomes as the majority class.
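SMOTE's core idea, interpolating a synthetic case between a minority sample and one of its nearest minority-class neighbours, can be sketched as follows. This is an illustrative Python sketch, not the R SMOTEBoost implementation the study used; the toy data are hypothetical.

```python
import math
import random

# Minimal SMOTE sketch: each synthetic point lies on the segment between
# a real minority case and one of its k nearest minority-class neighbours.
def smote(minority, n_new, k=2, seed=0):
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        a = rng.choice(minority)
        # k nearest other minority points by Euclidean distance
        others = sorted(
            (p for p in minority if p is not a),
            key=lambda p: math.dist(a, p),
        )[:k]
        b = rng.choice(others)
        gap = rng.random()  # random position along the segment a -> b
        synthetic.append(tuple(x + gap * (y - x) for x, y in zip(a, b)))
    return synthetic

minority = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
new = smote(minority, n_new=6)
print(len(new))  # 6 synthetic minority cases
```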

The performance of predictions from the considered models was determined by the area under the receiver operating characteristic curve (AUC). We created prediction models using several machine learning techniques: random forest, XGBoost, logistic regression, neural network, and support vector machines ( Table 1 ) to determine which produced the best fitting model for the test set. Using cross-validation, each technique trained multiple models on the training set and tested their performance on a subset of the training set. The model with the lowest error was then tested using the tune set. Once the performance on the tune set was deemed satisfactory, the final models were fitted to the test set. Parameter tuning was guided in part by standard practice when available; however, the majority of the tuning took place through the random search function in the R package mlr [ 42 , 43 ]. Random search was completed using cross-validation with 3 iterations, 50 times. Variable importance was calculated for the tree-based models: random forest and XGBoost. Confidence intervals at 95% were created for each AUC by bootstrapping predictions 10,000 times. Positive and negative predictive values were obtained for the best performing model.
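The bootstrapped AUC confidence interval can be sketched as follows. This is an illustrative Python sketch with toy labels and scores (the study used R and 10,000 resamples; fewer are used here for brevity); the rank-based `auc` helper is a plain implementation, not a library call.

```python
import random

# Rank-based AUC: fraction of positive/negative pairs ranked correctly.
def auc(labels, scores):
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((p > q) + 0.5 * (p == q) for p in pos for q in neg)
    return wins / (len(pos) * len(neg))

# Percentile bootstrap: resample predictions with replacement, recompute
# the AUC, and take the 2.5th and 97.5th percentiles.
def bootstrap_auc_ci(labels, scores, n_boot=2000, seed=0):
    rng = random.Random(seed)
    n = len(labels)
    stats = []
    for _ in range(n_boot):
        idx = rng.choices(range(n), k=n)
        ys = [labels[i] for i in idx]
        if 0 < sum(ys) < n:  # need both classes in the resample
            stats.append(auc(ys, [scores[i] for i in idx]))
    stats.sort()
    return stats[int(0.025 * len(stats))], stats[int(0.975 * len(stats))]

labels = [1, 0, 1, 0, 1, 0, 0, 1, 0, 0]
scores = [0.9, 0.2, 0.35, 0.4, 0.6, 0.3, 0.5, 0.8, 0.1, 0.45]
lo, hi = bootstrap_auc_ci(labels, scores)
print(round(auc(labels, scores), 3))  # 0.875
```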

*mlr [ 42 ] was also used for all techniques

Sensitivity analysis

The SDQ, used to derive our outcome variable, has several suggested cut-offs based on different criteria and sample populations. Although we used a cut-off of 11, based on capturing the highest 10% in a Swedish sample [ 37 ], it is possible that this cut-off does not represent a distinct subgroup of psychopathology, ultimately hampering model performance. To assess whether model performance was affected by the cut-off used, we created a new model using the best performing technique with a more stringent cut-off of 17, based on capturing the highest 10% of scorers in a UK sample in the original publication [ 36 ].

The datasets were deemed to be well separated ( Table 2 ). Our classes were fairly imbalanced, as only 12% of our sample reached the cut-off; we mitigated the effects of this through a combination of over- and under-sampling on the training set using SMOTEBoost. Next, the learning curve revealed that the models performed well without additional data or hyper-parameter modifications, with the exception of the neural network, which required additional data preparation, e.g. centering and scaling of continuous variables ( Fig 1 ).

Fig 1. The learning curve specifying the performance of each technique without any data or hyper-parameter modification (y axis) given the total percent of the dataset (x axis) used to train the models.

Model tuning

We then fit models using all considered techniques; the AUCs from the tune-set of the final models for each technique can be found in Fig 2 . A full list of the optimal parameters and the ranges tried for each model can be found in S1 – S4 Tables. No model was found to be significantly superior; however, random forest and support vector machine (SVM) had the highest AUCs, both 0.754 (95% CI 0.698–0.804 and 0.701–0.802, respectively). The rest of the models performed similarly, with AUCs above 0.700 ( Fig 2 & Table 3 ).

Fig 2. The AUC performance for each technique using the tune set.

The created models were then used to predict the outcome in the test set. The lack of a statistically significantly better model remained. The random forest model performed slightly better at predicting the test set than SVM, with AUCs of 0.739 (95% CI 0.708–0.769) and 0.735 (95% CI 0.707–0.764), respectively ( Table 4 & Fig 3 ); however, the CI of each AUC overlapped the estimate of the other, indicating a non-significant difference.

Fig 3. The AUC performance for each technique using the test set.

The probability threshold for the majority class was set to 0.8, meaning that the model classified participants as having mental health problems if the probability of belonging to that class was greater than 0.2. Our top model had a positive predictive value of 15%, while the negative predictive value was 96%. This corresponds to a sensitivity of 0.91 and a specificity of 0.30, and classified 15% of the test set with the outcome.
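As a worked check, the reported predictive values follow from the sensitivity, specificity, and the sample prevalence of roughly 12% via Bayes' rule (an illustrative Python sketch, not part of the original analysis):

```python
# PPV and NPV from sensitivity, specificity, and prevalence (Bayes' rule).
# Inputs are the values reported in the text: sens 0.91, spec 0.30,
# prevalence ~12% of the sample above the cut-off.
def ppv_npv(sens, spec, prev):
    ppv = sens * prev / (sens * prev + (1 - spec) * (1 - prev))
    npv = spec * (1 - prev) / (spec * (1 - prev) + (1 - sens) * prev)
    return ppv, npv

ppv, npv = ppv_npv(sens=0.91, spec=0.30, prev=0.12)
print(f"PPV = {ppv:.2f}, NPV = {npv:.2f}")  # PPV = 0.15, NPV = 0.96
```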

The more stringent cut-off based on a UK sample [ 36 ] categorized roughly 3% of our sample as having mental health issues. We trained a random forest model based on this new cut-off and found a test AUC of 0.765 (95% CI 0.698–0.826). Although the AUC was marginally better, the confidence interval overlapped with that of the top performing model with the Swedish cut-off, indicating no meaningful difference.

Variable importance

The variable importance for random forest revealed that the parent-reported mental health items ranked highly, as well as neighborhood quality, gestational age, and parity ( Table 5 ). This indicates that model accuracy decreased significantly when these particular variables were permuted, i.e. randomly exchanged between individuals, during the analysis.
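Permutation importance as described here, the drop in accuracy when one variable's values are randomly exchanged between individuals, can be sketched as follows. This is an illustrative Python sketch with a toy rule standing in for the model; the study computed importance within its random forest and XGBoost fits in R.

```python
import random

def accuracy(model, X, y):
    return sum(model(row) == label for row, label in zip(X, y)) / len(y)

# Importance of column `col` = baseline accuracy minus accuracy after
# randomly exchanging that column's values between individuals.
def permutation_importance(model, X, y, col, seed=0):
    rng = random.Random(seed)
    baseline = accuracy(model, X, y)
    shuffled = [row[col] for row in X]
    rng.shuffle(shuffled)
    X_perm = [row[:col] + (v,) + row[col + 1:] for row, v in zip(X, shuffled)]
    return baseline - accuracy(model, X_perm, y)

# Toy data: the outcome depends on feature 0 only; feature 1 is noise.
X = [(i % 2, i % 3) for i in range(60)]
y = [row[0] for row in X]
model = lambda row: row[0]

print(permutation_importance(model, X, y, col=0))  # positive: informative
print(permutation_importance(model, X, y, col=1))  # 0.0: unused by model
```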

1 Autism—Tics, AD/HD and other Comorbidities inventory [ 58 ]

2 the Longitudinal Integration Database for Health Insurance and Labor Market Studies[ 35 ]

3 Medical Birth Register [ 33 ]

Using a large range of data from parent reports and numerous Swedish national registers, this study predicted adolescent mental health reasonably well, with a maximum AUC of 0.739 on the test set (using the random forest model). Although the AUC indicates an adequate model, it is not accurate enough for clinical use. While the negative predictive value of 96% indicates clinical-level sensitivity, the positive predictive value of this model is only 15%. This indicates that only a small percentage of the children flagged will actually reach our pre-specified cut-off for mental health problems, which should be compared to the prevalence in the sample of 10%.

The variable importance derived from the random forest model indicated that the model did not overly rely on any single variable; thus the model would be relatively stable with the removal of any one variable, including those stable over time. The highest ranked variables were parent-reported mental health symptoms, such as impulsivity, inattention, and emotional symptoms, which were important predictive factors for poor mental health at age 15. Register information on neighborhood quality, parity, and gestational age at birth was also deemed important. These findings fit with the existing literature [ 17 , 18 , 44 ] and could potentially be used by clinicians, parents, or educators to identify at-risk children for potential intervention.

The highest ranking variables were either parent-rated or could easily be reported by parents; this indicates that register information, which can be expensive or difficult for researchers to obtain, may not be necessary for a successful psychiatric risk model. Thus, future studies predicting adolescent mental health may want to place a greater emphasis on assessments from caregivers. Moreover, this provides further encouragement for parental involvement in clinicians’ assessment of childhood and adolescent psychiatric prognosis and emotional well-being. Additionally, future studies with similar aims should focus on using symptom ratings for mental health, including neurodevelopmental disorders, in their models.

Sensitivity analysis showed that the model performance was slightly improved, although not significantly, with a more extreme cut-off (sensitivity analysis AUC = 0.765, 95% CI 0.698–0.826; random forest AUC = 0.739, 95% CI 0.708–0.769). This indicates that future studies can use cut-offs validated for their country or the original study based on preference. Additionally, this provides some evidence that the more extreme cases do not represent a distinct severe class.

In line with the No Free Lunch Theorem, all models performed with relatively similar accuracy [ 29 ]. A recent systematic review found no clear predictive performance advantage of using machine learning techniques instead of logistic regression across a range of clinical prediction studies [ 45 ]. In our study, the similar performance to logistic regression may partially be attributed to the relatively linear associations between the predictors and the outcome, as evidenced by the lack of significance for non-linear associations in our logistic regression model. When the data have a mostly linear relationship to the outcome, machine learning models will perform very similarly to standard regression [ 46 ]. Although random forest performed slightly better than the compared models, it may be unnecessary for studies with similar datasets and aims to use complex machine learning techniques instead of logistic regression when weighed against the time spent learning the techniques, computational time, and the interpretability of the model.

The strengths of this study include the comprehensive analysis of a wide variety of factors associated with adolescent mental health. Further, the use of parental reports indicates that these risk factors are identifiable by non-clinicians, suggesting a low-cost future solution for large scale mental health screens. The results need to be viewed in light of several limitations. First, because we used a twin sample, our findings may not be generalizable to singletons, as twins might differ systematically from singletons. However, previous literature has found little difference in mental health between singletons and twins [ 47 ]. That said, zygosity did not rank as highly important, indicating that the model did not rely on the similarity between twins. On a similar note, our study results may not generalize outside of Sweden or Scandinavia, as all of our participants were Swedish born and we did not validate our results in an external sample. Second, the outcome as well as the most important variables were all parent-reported; this may have introduced an association due to reporting bias. Additionally, because we used mixed data types (continuous, categorical, and binary) in our model, it is possible that the variable importance could have been biased; however, this effect is likely to be mitigated as we did not sample with replacement [ 48 ]. Finally, the response rate between data collections was 55% [ 30 ], so it is likely that the nonresponders had elevated psychopathology symptoms compared to responders. Additionally, the performance of the model would likely improve with a larger sample size.

In summation, our models had a reasonable AUC, but no model had statistically significantly higher performance than the others. Although supervised machine learning techniques are currently generating considerable interest across scientific fields, it may not be necessary for most studies to forgo logistic regression, especially for studies with smaller datasets featuring primarily linear relationships. Additionally, our results provide further support for diligent screening of neurodevelopmental symptoms and learning difficulties in children for later psychiatric vulnerabilities. Although machine learning techniques seem promising for the integration of risks across different domains for the prediction of mental health problems in adolescence, implementation in clinical use seems premature. Nevertheless, as early treatment for these and other mental health symptoms has been found to largely mitigate negative outcomes and symptoms [ 49 , 50 ], there is hope for prevention of mental health problems in adolescence with properly timed interventions.

Supporting information

A list of variables considered for our model.

Optimal and explored parameters for the support vector machine model.

Optimal and explored parameters for the neural network model.

Optimal and explored parameters for the random forest model.

Optimal and explored parameters for the XGBoost model.

Acknowledgments

The authors would like to thank Alexander Hatoum for his contribution to the code.

Funding Statement

The Child and Adolescent Twin Study in Sweden study was supported by the Swedish Council for Working Life, funds under the ALF agreement, the Söderström Königska Foundation and the Swedish Research Council (Medicine, Humanities and Social Science; grant number 2017-02552, and SIMSAM). SL, PL This work has received funding from the European Union’s Horizon 2020 Research and Innovation Programme under the Marie Sklodowska-Curie CAPICE Project grant agreement number 721567. ( https://www.capice-project.eu/ ) AT, PL, SL We acknowledge financial support from the Swedish Research Council for Health, Working Life and Welfare (project 2012-1678; PL), the Swedish Research Council (2016-01989; PL), as well as the Swedish Initiative for Research on Microdata in the Social And Medical Sciences (SIMSAM) framework (340-2013-5867; PL). We acknowledge The Swedish Twin Registry for access to data. The Swedish Twin Registry is managed by Karolinska Institutet and receives funding through the Swedish Research Council under the grant no 2017-00641. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

Data Availability

  • PLoS One. 2020; 15(4): e0230389.

Decision Letter 0

PONE-D-19-24985

Dear Ms Tate,

Thank you for submitting your manuscript to PLOS ONE. After careful consideration, we feel that it has merit but does not fully meet PLOS ONE’s publication criteria as it currently stands. Therefore, we invite you to submit a revised version of the manuscript that addresses the points raised during the review process.

We would appreciate receiving your revised manuscript by Jan 18 2020 11:59PM. When you are ready to submit your revision, log on to https://www.editorialmanager.com/pone/ and select the 'Submissions Needing Revision' folder to locate your manuscript file.

If you would like to make changes to your financial disclosure, please include your updated statement in your cover letter.

To enhance the reproducibility of your results, we recommend that if applicable you deposit your laboratory protocols in protocols.io, where a protocol can be assigned its own identifier (DOI) such that it can be cited independently in the future. For instructions see: http://journals.plos.org/plosone/s/submission-guidelines#loc-laboratory-protocols

Please include the following items when submitting your revised manuscript:

  • A rebuttal letter that responds to each point raised by the academic editor and reviewer(s). This letter should be uploaded as separate file and labeled 'Response to Reviewers'.
  • A marked-up copy of your manuscript that highlights changes made to the original version. This file should be uploaded as separate file and labeled 'Revised Manuscript with Track Changes'.
  • An unmarked version of your revised paper without tracked changes. This file should be uploaded as separate file and labeled 'Manuscript'.

Please note while forming your response, if your article is accepted, you may have the opportunity to make the peer review history publicly available. The record will include editor decision letters (with reviews) and your responses to reviewer comments. If eligible, we will contact you to opt in or out.

We look forward to receiving your revised manuscript.

Kind regards,

Parisa Rashidi

Academic Editor

Journal Requirements:

1. When submitting your revision, we need you to address these additional requirements.

Please ensure that your manuscript meets PLOS ONE's style requirements, including those for file naming. The PLOS ONE style templates can be found at

http://www.journals.plos.org/plosone/s/file?id=wjVg/PLOSOne_formatting_sample_main_body.pdf and http://www.journals.plos.org/plosone/s/file?id=ba62/PLOSOne_formatting_sample_title_authors_affiliations.pdf

2. Thank you for stating the following in the Acknowledgments Section of your manuscript:

The Swedish Twin Registry is managed by Karolinska Institutet and receives funding through the Swedish Research Council under the grant no 2017-00641.

We note that you have provided funding information that is not currently declared in your Funding Statement. However, funding information should not appear in the Acknowledgments section or other areas of your manuscript. We will only publish funding information present in the Funding Statement section of the online submission form.

Please remove any funding-related text from the manuscript and let us know how you would like to update your Funding Statement. Currently, your Funding Statement reads as follows:

The Child and Adolescent Twin Study in Sweden study was supported by the Swedish Council for Working Life, funds under the ALF agreement, the Söderström Königska Foundation and the Swedish Research Council (Medicine, Humanities and Social Science; grant number 2017-02552, and SIMSAM). SL, PL

This work has received funding from the European Union’s Horizon 2020 Research and Innovation Programme under the Marie Sklodowska-Curie CAPICE Project grant agreement number 721567. ( https://www.capice-project.eu/ ) AT, PL, SL

We acknowledge financial support from the Swedish Research Council for Health, Working Life and Welfare (project 2012-1678; PL), the Swedish Research Council (2016-01989; PL), as well as the the Swedish Initiative for Research on Microdata in the Social And Medical Sciences (SIMSAM) framework (340-2013-5867; PL)

The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

3. Thank you for stating the following in the Competing Interests section:

I have read the journal's policy and the authors of this manuscript have the following competing interests:

H. Larsson has served as a speaker for Evolan Pharma and Shire and has received research grants from Shire; all outside the submitted work. P. Lichtenstein has served as a speaker for Medice, also outside the submitted work.

R. McCabe serves as a data scientist for Spotify outside of the submitted work.

All other authors declare that no competing interests exist

Please confirm that this does not alter your adherence to all PLOS ONE policies on sharing data and materials, by including the following statement: "This does not alter our adherence to  PLOS ONE policies on sharing data and materials.” (as detailed online in our guide for authors http://journals.plos.org/plosone/s/competing-interests ).  If there are restrictions on sharing of data and/or materials, please state these. Please note that we cannot proceed with consideration of your article until this information has been declared.

Please include your updated Competing Interests statement in your cover letter; we will change the online submission form on your behalf.

Please know it is PLOS ONE policy for corresponding authors to declare, on behalf of all authors, all potential competing interests for the purposes of transparency. PLOS defines a competing interest as anything that interferes with, or could reasonably be perceived as interfering with, the full and objective presentation, peer review, editorial decision-making, or publication of research or non-research articles submitted to one of the journals. Competing interests can be financial or non-financial, professional, or personal. Competing interests can arise in relationship to an organization or another person. Please follow this link to our website for more details on competing interests: http://journals.plos.org/plosone/s/competing-interests

4. We note that you have indicated that data from this study are available upon request. PLOS only allows data to be available upon request if there are legal or ethical restrictions on sharing data publicly. For more information on unacceptable data access restrictions, please see http://journals.plos.org/plosone/s/data-availability#loc-unacceptable-data-access-restrictions .

In your revised cover letter, please address the following prompts:

a) If there are ethical or legal restrictions on sharing a de-identified data set, please explain them in detail (e.g., data contain potentially sensitive information, data are owned by a third-party organization, etc.) and who has imposed them (e.g., an ethics committee). Please also provide contact information for a data access committee, ethics committee, or other institutional body to which data requests may be sent.

b) If there are no restrictions, please upload the minimal anonymized data set necessary to replicate your study findings as either Supporting Information files or to a stable, public repository and provide us with the relevant URLs, DOIs, or accession numbers. For a list of acceptable repositories, please see http://journals.plos.org/plosone/s/data-availability#loc-recommended-repositories .

We will update your Data Availability statement on your behalf to reflect the information you provide.

Additional Editor Comments:

Based on the reviewers' comments, a minor revision is recommended for this manuscript. Please address reviewers' comments as appropriate.


Reviewers' comments:

Reviewer's Responses to Questions

Comments to the Author

1. Is the manuscript technically sound, and do the data support the conclusions?

The manuscript must describe a technically sound piece of scientific research with data that supports the conclusions. Experiments must have been conducted rigorously, with appropriate controls, replication, and sample sizes. The conclusions must be drawn appropriately based on the data presented.

Reviewer #1: Yes

Reviewer #2: Yes

2. Has the statistical analysis been performed appropriately and rigorously?

Reviewer #2: I Don't Know

3. Have the authors made all data underlying the findings in their manuscript fully available?

The PLOS Data policy requires authors to make all data underlying the findings described in their manuscript fully available without restriction, with rare exception (please refer to the Data Availability Statement in the manuscript PDF file). The data should be provided as part of the manuscript or its supporting information, or deposited to a public repository. For example, in addition to summary statistics, the data points behind means, medians and variance measures should be available. If there are restrictions on publicly sharing data—e.g. participant privacy or use of data from a third party—those must be specified.

Reviewer #1: No

Reviewer #2: No

4. Is the manuscript presented in an intelligible fashion and written in standard English?

PLOS ONE does not copyedit accepted manuscripts, so the language in submitted articles must be clear, correct, and unambiguous. Any typographical or grammatical errors should be corrected at revision, so please note any specific errors here.

5. Review Comments to the Author

Please use the space provided to explain your answers to the questions above. You may also include additional comments for the author, including concerns about dual publication, research ethics, or publication ethics. (Please upload your review as an attachment if it exceeds 20,000 characters)

Reviewer #1: Overall this is a very well-written paper with clear descriptions of motivation, methods, conclusions, and limitations. The authors were very thorough in describing variables and parameters used in the prediction models. I have only minor comments that I feel would improve clarity:

1. Typo in abstract: "METODS" --> "METHODS"

2. The authors state that machine learning models are "black box", but typically this is in reference to deep learning models. Most of the models used in this paper would be considered conventional and "interpretable"

3. The authors state that CATSS participants are "described in detail elsewhere". This should at least be summarized in the current manuscript.

4. Was there any particular reason that 50% was the missingness threshold for removing variables? I think it would be nice to examine possible missingness patterns, e.g. particular variables missing for certain subgroups.

5. Since the described models are not computationally expensive, it might be nice to perform nested cross validation as opposed to a fixed train/val/test split.

6. The class imbalance should be mentioned in the main manuscript.

7. Table 2 should be referenced for the following line: "Descriptive statistics were created for each set to determine the quality of the partition"

8. Authors should slightly reword the description of training procedure. I assume "fit was determined by finding the maximum AUC" is referring to AUC on the tune set, but this should be explicitly mentioned. It almost reads like models were first trained on the training set before moving on to the tune set, but both of these should be used simultaneously in the cross validation procedure.

9. Why weren't feature importances explored for models like logistic regression or SVM? It is certainly possible.

10. I assume the "best performing model" is based on tune set performance (and not test set), but this should be explicitly mentioned.

11. I find the description of the neural network to be problematic, several important parameters were not mentioned (# of layers, optimizer and its parameters, dropout, etc.). Furthermore, the final hidden dimension of 3 seems very low.

12. The authors should construct a supplemental table of the ranges of parameters explored in the random search.

Reviewer #2: This is a study of predictors of mental health issues in a sample of 7,638 Swedish twins. Predictors were collected on them at ages 9-12 and the mental health criterion data were collected at age 15. Although governmental data on Swedish twins is used in this study, the fact that they are twins is irrelevant and appears to pose no source of bias regarding the results. Of 474 variables collected from various governmental data sources, 85 survived scientific scrutiny and were included in the machine learning and regression models reported. Findings suggested that both kinds of analyses produced AUC scores above .7. Apparently these values are not adequate for clinical application, but they are certainly informative for behavioral scientists. Two very important findings from this investigation are (a) logistic regression was adequate for this work, so machine learning analyses may be unnecessary in similar future studies, and (b) the most powerful predictors of mental health issues among these Swedish teens came from parent reports, which are far faster and easier data to collect than most of the other predictors. These two findings are important to share because they provide a green light to the work of investigators in this area who may not be proficient in machine learning and who may only have access to data from parents. In addition, based on these findings, extramural funders may seek to fund these more affordable projects, instead of rejecting them in favor of funding projects that use more costly machine learning analyses (with the need for a lot of data) and governmental data sources.

I am not a machine learning expert, so I cannot speak to the statistical conclusion validity of those analyses, but I am competent in logistic regression and saw no issues in those analyses.

In several places, the authors need to be careful not to elevate or hint at elevating nonsignificant effects to significance. When the authors say two values are different, then later say they are not significantly different, they blur the conclusion. Statistically the numbers are not different. Better simply to say the two values did not significantly differ and leave it at that. I am no fan of null hypothesis statistical testing, but the authors chose that approach, and some scientific communities still use that approach, so the authors need to remain true to that approach, which posits that findings either are or are not different based on statistical significance.

Because participants being twins was immaterial to the scientific questions addressed in this study, the authors should explain why they used a twin sample. It seems as though they could have gotten data on a far larger sample of Swedish children if they did not restrict their focus to twins, who on average represent less than 3% of a population. It could be that the kinds of data collected on Swedish twins simply are not collected on their non-twin counterparts. If that's so, the authors should say that.

The ms. would be improved by a section that very specifically enumerated important next steps in predicting teen mental health issues, given these twin data. What do the authors think would be good ways for scholars to increase the AUC to levels appropriate for clinical use, for example? Other scholars would very much appreciate this kind of insight to guide their work.

An important limit to this work is cultural. Findings based on Swedes and Swedish culture may not broadly generalize, especially with regard to outcomes as socially defined as mental health concerns. So in addition to the five limitations briefly included in the Discussion, I suggest the authors add concerns about generalizability beyond Sweden and other very similar and similarly homogeneous nations.

6. PLOS authors have the option to publish the peer review history of their article. If published, this will include your full peer review and any attached files.

If you choose “no”, your identity will remain anonymous but your review may still be made public.

Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.

[NOTE: If reviewer comments were submitted as an attachment file, they will be attached to this email and accessible via the submission site. Please log into your account, locate the manuscript record, and check for the action link "View Attachments". If this link does not appear, there are no attachment files to be viewed.]

While revising your submission, please upload your figure files to the Preflight Analysis and Conversion Engine (PACE) digital diagnostic tool, https://pacev2.apexcovantage.com/ . PACE helps ensure that figures meet PLOS requirements. To use PACE, you must first register as a user. Registration is free. Then, login and navigate to the UPLOAD tab, where you will find detailed instructions on how to use the tool. If you encounter any issues or have any questions when using PACE, please email us at figures@plos.org . Please note that Supporting Information files do not need this step.

Author response to Decision Letter 0

23 Jan 2020

Dear Editor,

First, we would like to thank you for your timely feedback and thorough review of the paper. Our response can be read in the following key:

Reviewer’s comment Revisions Response

Lines numbers correspond to the document without tracked changes

Journal Comment 1. When submitting your revision, we need you to address these additional requirements.

Answer 1: Thank you for bringing this to our attention. We have unbolded the title and have added a new line for the corresponding author's email. We have also changed the supporting file names to include the file type, e.g. "Fig1" -> "Fig1.tif". Please let us know if any additional style requirements were unintentionally omitted.

Journal Comment 2. Thank you for stating the following in the Acknowledgments Section of your manuscript:

Answer 2. We have removed the funding information from lines 296 – 298. We would like to kindly request that our funding statement be updated to include the lines:

We acknowledge The Swedish Twin Registry for access to data. The Swedish Twin Registry is managed by Karolinska Institutet and receives funding through the Swedish Research Council under the grant no 2017-00641.

Journal Comment 3. Thank you for stating the following in the Competing Interests section:

Please confirm that this does not alter your adherence to all PLOS ONE policies on sharing data and materials, by including the following statement: "This does not alter our adherence to PLOS ONE policies on sharing data and materials.” (as detailed online in our guide for authors http://journals.plos.org/plosone/s/competing-interests ). If there are restrictions on sharing of data and/or materials, please state these. Please note that we cannot proceed with consideration of your article until this information has been declared.

Answer 3. Thank you for bringing this to our attention. We have included an updated version of the competing interests statement in our cover letter.

Journal Comment 4. We note that you have indicated that data from this study are available upon request. PLOS only allows data to be available upon request if there are legal or ethical restrictions on sharing data publicly. For more information on unacceptable data access restrictions, please see http://journals.plos.org/plosone/s/data-availability#loc-unacceptable-data-access-restrictions .

Answer 4. Thank you for addressing this. We have added the following response to this prompt in our cover letter:

Moreover, we regret to add that we are unable to share even de-identified data, as we are legally bound by the Swedish Secrecy Act. Data from the national Swedish registers and the twin register were used for this study and made available by ethical approval. The data used for this study include: the Swedish Twin Registry, National Patient Register, Multi-Generation Register, Medical Birth Register, Prescribed Drug Register, and the Longitudinal Integration Database for Health Insurance and Labor Market Studies. Researchers may apply for access to these data sources through the Swedish Research Ethics Boards (etikprovningsmyndigheten.se; kansli@cepn.se ) and from the primary data owners: the Swedish Twin Registry ( str-research@meb.ki.se ), Statistics Sweden ( scb@scb.se ), and the National Board of Health and Welfare ( socialstyrelsen@socialstyrelsen.se ), in accordance with Swedish law.

The authors would like to sincerely thank you for the kind evaluation and encouragement on the manuscript.

Question 1: Typo in abstract: "METODS" --> "METHODS"

Answer 1: The typo in line 34 has been corrected.

Question 2: The authors state that machine learning models are "black box", but typically this is in reference to deep learning models. Most of the models used in this paper would be considered conventional and "interpretable"

Answer 2: Thank you for this comment, we agree that the reference to “black box” was misplaced. We have changed the text accordingly, lines 92 – 94 have been changed to:

Beyond their proven efficacy, tree-based models provide information on how extensively a variable was used by the model, or variable importance, which gives some insight into the models' classification process.
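The variable-importance idea in the revised text can be illustrated with scikit-learn (the study itself was run in R; this Python sketch on synthetic data is only illustrative, not the authors' pipeline):

```python
# Hypothetical illustration of tree-based variable importance,
# not the study's actual R/caret workflow.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=300, n_features=10,
                           n_informative=3, random_state=0)
rf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)

# Each feature gets a mean-decrease-in-impurity score; scores sum to 1.
importances = rf.feature_importances_
ranked = sorted(enumerate(importances), key=lambda t: -t[1])
for idx, score in ranked[:3]:
    print(f"feature {idx}: {score:.3f}")
```

Ranking the scores, as above, is how "parent-reported symptoms ranked highest" statements later in this letter are typically produced.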

Question 3: The authors state that CATSS participants are "described in detail elsewhere". This should at least be summarized in the current manuscript.

Answer 3: Additional information has been added on lines 104 – 106:

Participants came from the Child and Adolescent Twin Study in Sweden (CATSS), an ongoing, longitudinal study containing 15,156 twin pairs born in Sweden. During the first wave, the twins' parents were contacted close to the twins' 9th or 12th birthdays for a phone interview; this wave had a response rate of 80% (36), while the second wave, at age 15, had a response rate of ~55%.

Question 4: Was there any particular reason that 50% was the missingness threshold for removing variables? I think it would be nice to examine possible missingness patterns, e.g. particular variables missing for certain subgroups.

Answer 4: Thank you for this comment. The 50% missingness threshold was chosen somewhat arbitrarily; the main aim was to keep variables with acceptable coverage in the prediction model, and we felt that heavily imputed variables ultimately would not lead to better results than removing them. CATSS is an ongoing longitudinal study, and some of the questions were changed or added over the years. This means that some questions had a high rate of missingness because only a small percentage of our sample was asked them. Additionally, there were several gated questions that also had a high degree of missingness. Our approach automatically excludes these questions.

A distribution of the missingness in our data is visualized in the below figure:

As can be seen, the cut-off of 50% missingness (chosen a priori) removes a set of variables with 90% or more missingness and a set of variables with 70-80% missingness (more borderline quality of variables). We believe our choice, albeit somewhat arbitrary, achieves a good balance between retaining variables with sufficient coverage/quality while not being overly conservative.
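The 50% rule itself is straightforward to express in code; a minimal pandas sketch with hypothetical column names (not the CATSS variables):

```python
import numpy as np
import pandas as pd

# Toy data frame with one heavily missing column (names are hypothetical).
df = pd.DataFrame({
    "parent_report": [1.0, 2.0, np.nan, 4.0],              # 25% missing
    "gated_item":    [np.nan, np.nan, np.nan, 7.0],        # 75% missing
})

# Drop variables with more than 50% missingness, as in the manuscript.
keep = df.columns[df.isna().mean() <= 0.5]
df_filtered = df[keep]
print(list(df_filtered.columns))  # gated_item is removed
```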

Question 5; Since the described models are not computationally expensive, it might be nice to perform nested cross validation as opposed to a fixed train/val/test split.

Answer 5: This is a great topic for discussion; thank you for bringing it up. One problem with this approach would be the potential splitting of twins between the subsets in nested cross-validation, which could lead to overfitting. Without the tune set as a "safety net", we thought it would be hard to catch potential overfitting before moving to the test set. This concerned the authors, as we wanted to avoid training a new model after moving to the test set. To our knowledge, no such control for this issue within nested cross-validation exists within the common ML packages in R (please let us know if you know of a potential solution). Notably, caret has groupKFold, but in practice this did not turn out to be helpful [1].

We performed an additional sensitivity analysis by training a random forest model with nested cross-validation. We split our data into a train and test set (70/30 split) and found performance (AUC = 0.743, 95% CI 0.712–0.773) similar to our top-performing model (AUC = 0.739, 95% CI 0.708–0.769). This result indicates that the choice of tuning approach comes down to personal preference, although the authors would contend that nested cross-validation seems to be the more streamlined option.
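For readers wondering what a group-aware split looks like in practice, scikit-learn's GroupKFold provides the control discussed here; the toy sketch below (hypothetical data, not the authors' R pipeline) keeps each twin pair within a single fold:

```python
import numpy as np
from sklearn.model_selection import GroupKFold

# Toy setup: 10 children forming 5 twin pairs; the pair id is the group,
# so both twins always land in the same fold (hypothetical data).
X = np.arange(10).reshape(-1, 1)
y = np.array([0, 1] * 5)
pair_id = np.repeat(np.arange(5), 2)  # [0, 0, 1, 1, 2, 2, 3, 3, 4, 4]

for train_idx, test_idx in GroupKFold(n_splits=5).split(X, y, groups=pair_id):
    # No pair is ever split across train and test.
    assert set(pair_id[train_idx]).isdisjoint(set(pair_id[test_idx]))
```

Nesting this splitter inside an outer loop would give the group-respecting nested cross-validation the reviewer asked about.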

Question 6: The class imbalance should be mentioned in the main manuscript.

Answer 6: Thank you for this suggestion, we agree and an additional sentence has been added to the results section on lines 178 – 180:

Our classes were fairly imbalanced, as only 12% of our sample reached the cut-off; we mitigated the effects of this through a combination of over- and under-sampling on the training set using SMOTEBoost.
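The combination of over- and under-sampling mentioned here can be illustrated with a minimal NumPy sketch (the study used SMOTEBoost; plain random resampling merely stands in for it, and the numbers are hypothetical):

```python
import numpy as np

rng = np.random.default_rng(0)

# Imbalanced toy labels: ~12% positives, echoing the manuscript's class ratio.
y = np.array([1] * 12 + [0] * 88)
pos, neg = np.flatnonzero(y == 1), np.flatnonzero(y == 0)

# Rebalance the training set to 50/50 by oversampling the minority class
# (with replacement) and undersampling the majority class (without).
target = 50
pos_idx = rng.choice(pos, size=target, replace=True)
neg_idx = rng.choice(neg, size=target, replace=False)
balanced = np.concatenate([pos_idx, neg_idx])
print(y[balanced].mean())  # 0.5
```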

Question 7: Table 2 should be referenced for the following line: "Descriptive statistics were created for each set to determine the quality of the partition"

Answer 7: The suggested edit has been made on line 147.

Question 8: Authors should slightly reword the description of training procedure. I assume "fit was determined by finding the maximum AUC" is referring to AUC on the tune set, but this should be explicitly mentioned. It almost reads like models were first trained on the training set before moving on to the tune set, but both of these should be used simultaneously in the cross validation procedure.

Answer 8: Thank you for spotting this omission. Edits were made on lines 154 – 160:

We created prediction models using several machine learning techniques: random forest, XGBoost, logistic regression, neural network, and support vector machine (Table 1), to determine which produced the best-fitting model for a test set. Using cross-validation, each technique trained multiple models on the training set and tested their performance on a subset of it. The model with the lowest error was then tested using the tune set. Once the performance on the tune set was deemed satisfactory, the final models were fitted to the test set.

Question 9: Why weren't feature importances explored for models like logistic regression or SVM? It is certainly possible.

Answer 9: While possible, we felt this was not worth delving into for two reasons: the non-superiority of any one model, and the fact that the feature importances showed results similar to the random forest model (parent-reported mental health symptoms ranked highest). Since random forest was the slightly better model, we chose to report variable importance only for that model.

Question 10: I assume the "best performing model" is based on tune set performance (and not test set), but this should be explicitly mentioned.

Answer 10: Thank you for letting us clarify this. The set we are referring to is in fact the test set.

This has been clarified on lines 154 – 155:

We created prediction models using several machine learning techniques: random forest, XGBoost, logistic regression, neural network and support vector machines (Table 1) to determine which produced the best fitting model for a test set.

Additionally on lines 224 – 226:

Using a large range of data from parent reports and register data from numerous Swedish national registers, this study predicted adolescent mental health reasonably well, with a maximum AUC of 0.739 on the test set (using the random forest model).
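Confidence intervals like those quoted for the AUC in this letter are commonly obtained by bootstrapping the held-out test set; the sketch below shows a percentile bootstrap on synthetic scores (one standard option, not necessarily the authors' exact method):

```python
import numpy as np
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)

# Synthetic test-set labels and scores (purely illustrative).
y_true = rng.integers(0, 2, size=500)
y_score = y_true * 0.6 + rng.normal(0, 0.5, size=500)

# Percentile bootstrap of the AUC over resampled test sets.
aucs = []
for _ in range(1000):
    idx = rng.integers(0, len(y_true), size=len(y_true))
    if len(set(y_true[idx])) < 2:
        continue  # AUC is undefined on a single-class resample
    aucs.append(roc_auc_score(y_true[idx], y_score[idx]))
lo, hi = np.percentile(aucs, [2.5, 97.5])
print(f"AUC 95% CI: ({lo:.3f}, {hi:.3f})")
```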

Question 11: I find the description of the neural network to be problematic, several important parameters were not mentioned (# of layers, optimizer and its parameters, dropout, etc.). Furthermore, the final hidden dimension of 3 seems very low.

Answer 11: Thank you for your feedback; this particular portion of the analysis gave the authors the most trouble! The parameters not mentioned were not adjusted or modified in our analysis. Given our relatively linear data and small number of participants (from an ML standpoint), a larger number of hidden dimensions was unnecessary [2]; moreover, we tried a range of hidden dimensions, and 3 was determined to lead to the best-fitting model. See Answer 12 below for further information about the hyper-parameters tested.

Question 12: The authors should construct a supplemental table of the ranges of parameters explored in the random search.

Answer 12: The suggested edit has been added to tables S1 – S4 in the supporting information.

This edit has also been reflected on line 191 – 192:

A full list of the optimal parameters and the ranges tried for each model can be found in S1-S4 tables.

Question 13: In several places, the authors need to be careful not to elevate or hint at elevating nonsignificant effects to significance. When the authors say two values are different, then later say they are not significantly different, they blur the conclusion. Statistically the numbers are not different. Better simply to say the two values did not significantly differ and leave it at that. I am no fan of null hypothesis statistical testing, but the authors chose that approach, and some scientific communities still use that approach, so the authors need to remain true to that approach, which posits that findings either are or are not different based on statistical significance.

Answer 13: Thank you for this valuable comment, we agree that a clearer distinction needed to be made. We’ve edited several lines to further clarify that there were no significant differences between the models.

Line 224 – 226:

In lines 248–249 this sentence was deleted:

The model created with random forest closely followed by support vector machine had the highest AUCs in the test set. However,

Additionally line 281 – 282 was changed to:

In summation, our models had a reasonable AUC, but no model had statistically significant higher performance than the other.

Question 14: Because participants being twins was immaterial to the scientific questions addressed in this study, the authors should explain why they used a twin sample. It seems as though they could have gotten data on a far larger sample of Swedish children if they did not restrict their focus to twins, who on average represent less than 3% of a population. It could be that the kinds of data collected on Swedish twins simply are not collected on their non-twin counterparts. If that's so, the authors should say that.

Answer 14: Thank you for this comment; this is indeed the case. The depth of information, e.g. longitudinal questionnaires, obtained from the Child and Adolescent Twin Study in Sweden is simply not available in singleton samples easily accessible to the authors.

An edit at lines 107 - 108 has been added:

This sample population was chosen due to the depth of information available, including questionnaire and register data.

Question 15: The ms. would be improved by a section that very specifically enumerated important next steps in predicting teen mental health issues, given these twin data. What do the authors think would be good ways for scholars to increase the AOC to levels appropriate for clinical use, for example? Other scholars would very much appreciate this kind of insight to guide their work.

Answer 15: Thank you for this suggestion. We’ve added this information to lines 245 - 247:

Additionally, future studies with similar aims should focus on using symptom ratings for mental health, including neurodevelopmental disorders, for their model.

Question 16: An important limit to this work is cultural. Findings based on Swedes and Swedish culture may not broadly generalize, especially with regard to outcomes as socially defined as mental health concerns. So in addition to the five limitations briefly included in the Discussion, I suggest the authors add concerns about generalizability beyond Sweden and other very similar and similarly homogeneous nations.

Answer 16: We’ve added this limitation to lines 272 – 273:

On a similar note, our study results may not generalize outside of Sweden or Scandinavia, as all of our participants were Swedish born and we did not validate our results in an external sample.

1. Kuhn M. Caret: classification and regression training. Astrophysics Source Code Library 2015

2. Dreiseitl S, Ohno-Machado L. Logistic regression and artificial neural network classification models: a methodology review. Journal of biomedical informatics 2002;35(5-6):352-59

Submitted filename: Response Letter 20200123.docx

Decision Letter 1

28 Feb 2020

PONE-D-19-24985R1

Dear Dr. Tate,

We are pleased to inform you that your manuscript has been judged scientifically suitable for publication and will be formally accepted for publication once it complies with all outstanding technical requirements.

Within one week, you will receive an e-mail containing information on the amendments required prior to publication. When all required modifications have been addressed, you will receive a formal acceptance letter and your manuscript will proceed to our production department and be scheduled for publication.

Shortly after the formal acceptance letter is sent, an invoice for payment will follow. To ensure an efficient production and billing process, please log into Editorial Manager at https://www.editorialmanager.com/pone/ , click the "Update My Information" link at the top of the page, and update your user information. If you have any billing related questions, please contact our Author Billing department directly at authorbilling@plos.org .

If your institution or institutions have a press office, please notify them about your upcoming paper to enable them to help maximize its impact. If they will be preparing press materials for this manuscript, you must inform our press team as soon as possible and no later than 48 hours after receiving the formal acceptance. Your manuscript will remain under strict press embargo until 2 pm Eastern Time on the date of publication. For more information, please contact onepress@plos.org .

With kind regards,

Wajid Mumtaz

Additional Editor Comments (optional):

1. If the authors have adequately addressed your comments raised in a previous round of review and you feel that this manuscript is now acceptable for publication, you may indicate that here to bypass the “Comments to the Author” section, enter your conflict of interest statement in the “Confidential to Editor” section, and submit your "Accept" recommendation.

Reviewer #1: All comments have been addressed

2. Is the manuscript technically sound, and do the data support the conclusions?

3. Has the statistical analysis been performed appropriately and rigorously?

4. Have the authors made all data underlying the findings in their manuscript fully available?

5. Is the manuscript presented in an intelligible fashion and written in standard English?

6. Review Comments to the Author

Reviewer #1: The authors have fully addressed my concerns and made satisfactory improvements to the revised manuscript.

7. PLOS authors have the option to publish the peer review history of their article. If published, this will include your full peer review and any attached files.

Acceptance letter

Dear Dr. Tate:

I am pleased to inform you that your manuscript has been deemed suitable for publication in PLOS ONE. Congratulations! Your manuscript is now with our production department.

If your institution or institutions have a press office, please notify them about your upcoming paper at this point, to enable them to help maximize its impact. If they will be preparing press materials for this manuscript, please inform our press team within the next 48 hours. Your manuscript will remain under strict press embargo until 2 pm Eastern Time on the date of publication. For more information please contact onepress@plos.org .

For any other questions or concerns, please email plosone@plos.org .

Thank you for submitting your work to PLOS ONE.

PLOS ONE Editorial Office Staff

on behalf of

Dr. Wajid Mumtaz

Mental Health Prediction Among Students Using Machine Learning Techniques

  • Conference paper
  • First Online: 26 April 2023


  • Savita Sahu 7 &
  • Tribid Debbarma 7  

Part of the book series: Smart Innovation, Systems and Technologies ((SIST,volume 326))

Included in the following conference series:

  • International Conference on Frontiers of Intelligent Computing: Theory and Applications


Mental health problems in students are increasing worldwide. In this research, we consider several mental illnesses in students: stress, anxiety, post-traumatic stress disorder (PTSD), attention deficit hyperactivity disorder (ADHD), and depression. If students' mental health problems can be diagnosed early, they can be treated at an earlier stage. Machine learning techniques are now well suited to the analysis of medical data and the diagnosis of mental health problems. In this work we apply several machine learning techniques, namely logistic regression, decision trees, random forests, K-nearest neighbors (KNN) classifiers, and neural networks, and compare their accuracy on different measures. Data sets were collected for training and testing the performance of the techniques. Twelve factors, including biological, psychological, and physical factors, were identified as important for predicting mental health from the data set. By applying feature selection algorithms we were able to improve the accuracy of the proposed model. We find that the neural network model is the most accurate for this type of prediction.

Tribid Debbarma contributed equally to this work.
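A comparison of the kind the abstract describes can be sketched with scikit-learn on synthetic data (the paper's actual dataset, twelve features, and tuned models are not reproduced here; everything below is illustrative):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

# Synthetic stand-in for the twelve-factor survey data described above.
X, y = make_classification(n_samples=400, n_features=12,
                           n_informative=6, random_state=0)

models = {
    "logistic_regression": LogisticRegression(max_iter=1000),
    "random_forest": RandomForestClassifier(n_estimators=100, random_state=0),
    "knn": KNeighborsClassifier(n_neighbors=5),
}

# 5-fold cross-validated accuracy for each candidate model.
scores = {name: cross_val_score(m, X, y, cv=5).mean()
          for name, m in models.items()}
for name, acc in sorted(scores.items(), key=lambda t: -t[1]):
    print(f"{name}: {acc:.3f}")
```

A neural network (e.g. `sklearn.neural_network.MLPClassifier`) could be added to the `models` dict in the same way to reproduce the full comparison.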



Author information

Authors and Affiliations

Computer Science and Engineering, National Institute of Technology Agartala, Agartala, Tripura, 799046, India

Savita Sahu & Tribid Debbarma


Corresponding author

Correspondence to Savita Sahu .

Editor information

Editors and Affiliations

Department of Electronics Engineering, Faculty of Engineering and Technology, Veer Bahadur Singh Purvanchal University, Jaunpur, Uttar Pradesh, India

Vikrant Bhateja

School of Science and Technology, Middlesex University London, London, UK

Xin-She Yang

Western Norway University of Applied Sciences, Bergen, Norway

Jerry Chun-Wei Lin

Department of Computer Science and Engineering, National Institute of Technology Agartala, Agartala, West Tripura, India

Ranjita Das


Copyright information

© 2023 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.

About this paper

Cite this paper

Sahu, S., Debbarma, T. (2023). Mental Health Prediction Among Students Using Machine Learning Techniques. In: Bhateja, V., Yang, XS., Lin, J.CW., Das, R. (eds) Evolution in Computational Intelligence. FICTA 2022. Smart Innovation, Systems and Technologies, vol 326. Springer, Singapore. https://doi.org/10.1007/978-981-19-7513-4_46


DOI : https://doi.org/10.1007/978-981-19-7513-4_46

Published : 26 April 2023

Publisher Name : Springer, Singapore

Print ISBN : 978-981-19-7512-7

Online ISBN : 978-981-19-7513-4

eBook Packages: Intelligent Technologies and Robotics (R0)



ORIGINAL RESEARCH article

Prediction of Mental Health in Medical Workers During COVID-19 Based on Machine Learning

Xiaofeng Wang

  • 1 Northeast Asian Research Center, Jilin University, Changchun, China
  • 2 Kuancheng Health Commission, Changchun, China
  • 3 Department of Social Medicine and Health Management, School of Public Health, Jilin University, Changchun, China

Mental health prediction is one of the most essential parts of reducing the probability of serious mental illness. Meanwhile, mental health prediction can provide a theoretical basis for public health departments to work out psychological intervention plans for medical workers. The purpose of this paper is to predict the mental health of medical workers based on machine learning using 32 factors. We collected these 32 factors from 5,108 Chinese medical workers through a questionnaire survey, and the results of the Self-reporting Inventory were used to characterize mental health. In this study, we propose a novel prediction model based on an optimization algorithm and a neural network, which can select and rank the most important factors that affect the mental health of medical workers. Besides, we use stepwise logistic regression, the binary bat algorithm, the hybrid improved dragonfly algorithm and the proposed prediction model to predict the mental health of medical workers. The results show that the prediction accuracy of the proposed model is 92.55%, which is better than that of the existing algorithms. This method can be used to predict the mental health of medical workers globally. In addition, the proposed method can also inform appropriate work plans for medical workers.

Introduction

Although the definition of mental health is not uniform in academic circles, the research significance of mental health is self-evident. Mental health has been widely used in psychology ( 1 ), sociology ( 2 ), psychiatry ( 3 ), pedagogy ( 4 , 5 ), genetics ( 6 ), and other fields.

Currently, some representative scales are usually used to measure mental health, such as the Self-reporting Inventory (SCL-90) (7), the Minnesota Multiphasic Personality Inventory (MMPI) (8), the Self-Rating Anxiety Scale (SAS) (9), the Self-Rating Depression Scale (SDS) (10), the Eysenck Personality Questionnaire (EPQ) (11), and the Sixteen Personality Factor Questionnaire (16PF) (12). The above scales are widely used internationally because they are grounded in psychological theory and can transform abstract mental health concepts into observable, specific indicators. However, these scales have several shortcomings. First, because measuring mental status involves many factors, the different emphases of the scales lead to differences in evaluation criteria. Second, the scales rely on self-evaluation, which inevitably leads respondents to hold something back. Third, in emergency situations, obtaining scale results for judging mental status takes considerable time. Although the diagnosis and intervention of mental symptoms are significant, prevention is even more important. Therefore, using existing information to predict mental health is of great significance.

Mental health prediction is conducive to detecting mental disorders in advance, reducing the incidence of serious mental illnesses, and helping the health system provide people with targeted health care services (13). In particular, the mental health of medical workers is seriously threatened by the global spread of COVID-19, and these workers are prone to anxiety and depression (14). United Nations Secretary-General António Guterres indicated in his "Message on COVID-19 and the need for action on mental health" (15) that mental health services must be shifted to the community and included in universal health coverage plans. Based on a survey conducted by the WHO, the COVID-19 pandemic has disrupted major mental health services in 93% of countries worldwide (16), while the demand for mental health services in many countries is urgent. In addition, the delta variant has appeared in at least 98 countries and regions and continues to mutate and evolve. Almost all new cases of COVID-19 are the delta variant (17), which is becoming the dominant strain in many countries. The delta variant is likely to further exacerbate the fears of the public and of medical workers. Therefore, predicting the potential psychological symptoms of medical workers contributes to the mental health of medical personnel and helps maintain the efficiency of global medical institutions.

The existing mental health prediction methods are divided into statistical model methods and artificial intelligence algorithms.

Among the statistical model methods used for mental health prediction, structural equation models are widely used (18–20). Moving average methods are also common in health prediction; the Autoregressive Integrated Moving Average model (ARIMA) (21–23) and Exponential Smoothing (ES) (24, 25) are representative examples. The negative binomial model (NBM) (26, 27) and fractional polynomials (28) also provide new approaches to health prediction. However, although the above methods can analyze the data statistically, they cannot identify the inherent relationship between the feature variables and the prediction results. Therefore, the accuracy of mental health prediction based on statistical models is low.

For the purpose of improving the accuracy of mental health prediction, machine learning technology has been used in mental health prediction research since the 1980s. Basavappa et al. (29) proposed a depth-first search method based on a reverse search strategy in 1996, which is used to diagnose depression or dementia; they developed an expert system based on the subjects' behavior, cognition, emotional symptoms, and neuropsychological assessment results. Gil and Manuel (30) proposed a system based on Artificial Neural Networks (ANN) and Support Vector Machines (SVM) in 2009 to diagnose Parkinson's disease; the system improves the accuracy of diagnosis and reduces its cost. Seixas et al. (31) proposed a Bayesian Network (BN) model in 2014 to diagnose dementia and Alzheimer's disease; the experimental results show that, compared with most other well-known classifiers, the BN decision model performs better. Dabek and Caban (32) proposed a neural network model in 2015 to assess the likelihood of psychological illness.

However, three problems remain unsolved in the methods mentioned above. First, statistical models struggle to handle the impact of random interference factors on mental health because of their inherent limitations. Therefore, statistical models cannot reflect the high uncertainty of mental health or the non-linear relationship between feature variables and prediction results, which leads to low prediction accuracy. Second, existing machine learning methods for mental health prediction focus only on prediction accuracy without considering the importance of individual feature variables to mental health. Therefore, influence weights cannot be assigned according to the degree to which feature variables affect the prediction results, and these methods cannot provide a theoretical basis for health departments to work out psychological intervention plans for medical workers. Third, datasets used for mental health prediction usually contain a large number of irrelevant or redundant features. Statistical model methods merely choose statistically significant features rather than important ones, and thus cannot eliminate the irrelevant and redundant features that influence prediction results. Consequently, mental health prediction methods based on statistical models not only have low prediction accuracy but also waste computing time.

To deal with the above problems, the article proposes a novel mental health prediction algorithm called the Improved Global Chaos Bat Back Propagation Neural Network (IGCBA-BPNN). The purpose of this article is to monitor the mental health of medical workers in time to reduce the incidence of mental illness of medical workers, and to rationalize the distribution of global public health resources. Therefore, IGCBA-BPNN is applied to the mental health prediction of Chinese medical workers. The experimental results show that, compared with the existing mental health prediction methods, IGCBA-BPNN not only improves the accuracy of mental health prediction, but also selects the fewest feature variables.

The contribution of this paper is a new mental health prediction algorithm. The proposed algorithm can more effectively predict the mental health of medical workers during COVID-19, and at the same time provides a theoretical basis for global public health departments to work out psychological intervention plans.

The remaining content of this article is arranged as follows: in section Materials and Methods, we introduce the data and methods of this research; in section Results, the effectiveness of the proposed algorithm is evaluated; finally, the discussion and conclusion are presented in detail.

Materials and Methods

Data Preparation

Using the dataset from the "Mental Health Status of Medical Workers During COVID-19" survey conducted in Changchun, Jilin Province, China from June 1, 2020 to June 7, 2020, this paper predicts the mental health of Chinese medical workers during COVID-19. The subjects of the survey are medical workers who participated in epidemic prevention and control. According to the population status and the characteristics of geographical distribution, we selected 150 of the 220 grass-roots medical units in Changchun and then randomly selected 35 medical workers in each unit. The questionnaire was conducted online, and 5,260 questionnaires were obtained. Based on the research needs, 152 unqualified samples were eliminated, leaving a final sample size of 5,108. There are 32 variables in the questionnaire. In designing the questionnaire, we collected as much as possible of the subjects' basic information and of the variables that may affect the mental status of medical workers during COVID-19. Studies have shown that the measurable factors affecting mental status mainly cover five respects: demography (33, 34), family (35, 36), employment (37, 38), lifestyle (39, 40), and the work/living environment related to COVID-19 (41, 42). Based on the existing literature and the actual situation of medical workers during COVID-19, 32 factors were decided upon.

The description of variables is presented in Table 1 . The data and its description are published on GitHub ( https://github.com/Hu-Li/mental-health-dataset ).


Table 1 . The description of variables.

This study had been reviewed and approved by the Ethics Committee of the School of Public Health, Jilin University. This study does not involve questions about the identity of the respondents. An informed consent page was provided on the first page of the questionnaire for confirmation. All participants voluntarily joined this study with informed consent.

Feature Selection

Bat algorithm.

The Bat Algorithm (BA) (43) proposed by Yang is widely used in many fields because of its simplicity, fast convergence and few parameters. It has been used by many scholars for feature selection (44, 45), and its excellent performance has been verified in comparison with other well-known algorithms such as the genetic algorithm (GA) and particle swarm optimization (PSO) (46). The bat algorithm uses echolocation principles to simulate the predation process of bats, and it is an effective search method for the global optimal solution. The original bat algorithm makes three idealized assumptions to simulate the predation behavior of bats:

First, bats use echolocation to perceive the distance between themselves and the target, and they can effectively distinguish targets from obstacles. Second, the $i$th bat flies randomly with speed $v_i$ at position $x_i$, and searches for targets with frequency $f_i$, wavelength $\lambda$ and loudness $A_i$. Bats adjust the pulse emission rate $r$ ($r \in [0,1]$) according to the distance between themselves and their prey. Third, the loudness decreases from a maximum $A_{\max}$ to a minimum $A_{\min}$.

Based on the above three idealized assumptions, the frequency, velocity and position of the bats in the search space are calculated as follows:

$$f_i = f_{\min} + (f_{\max} - f_{\min})\,\beta$$

$$v_i^{t+1} = v_i^t + (x_i^t - x_*)\,f_i$$

$$x_i^{t+1} = x_i^t + v_i^{t+1}$$

where $f_i$ is the pulse frequency of the $i$th bat, $f_{\min}$ and $f_{\max}$ are the minimum and maximum values of the pulse frequency, respectively, $\beta$ is a random number within [0,1], $v_i^{t+1}$ and $v_i^t$ are the flight speeds of the $i$th bat at the $(t+1)$th and $t$th iterations, respectively, $x_i^{t+1}$ and $x_i^t$ are the positions of the $i$th bat at the $(t+1)$th and $t$th iterations, respectively, and $x_*$ is the optimal position of the bats in the current population.

In the process of searching for prey, the initial ultrasonic loudness of a bat is large but its emission rate is low, which helps the bat search the entire space. When a bat finds prey, the loudness it emits is gradually reduced and its pulse emission rate is gradually increased, allowing the bat to locate the prey more accurately. The pulse emission rate and the loudness are calculated as follows:

$$r_i^{t+1} = r_i^0\left[1 - \exp(-\gamma t)\right]$$

$$A_i^{t+1} = \alpha A_i^t$$

where $r_i^{t+1}$ is the pulse emission rate of the $i$th bat at the $(t+1)$th iteration, $r_i^0$ is the maximum pulse emission rate of the $i$th bat, $\gamma$ ($\gamma > 0$) is the enhancement coefficient of the pulse frequency, $A_i^{t+1}$ and $A_i^t$ are the loudness emitted by the $i$th bat at the $(t+1)$th and $t$th iterations, respectively, and $\alpha$ ($\alpha \in [0,1]$) is the attenuation coefficient of the pulse loudness.
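As an illustration, the update rules above can be sketched in Python. This is a minimal sketch of the standard BA updates; the function names and parameter values are ours, not the paper's:

```python
import numpy as np

rng = np.random.default_rng(0)

def bat_step(x, v, x_best, f_min=0.0, f_max=1.0):
    """One standard bat-algorithm update: draw a frequency, then update
    velocity and position toward the current best position x_best."""
    beta = rng.random(x.shape)            # beta in [0, 1]
    f = f_min + (f_max - f_min) * beta    # pulse frequency
    v = v + (x - x_best) * f              # velocity update
    return x + v, v                       # new position, new velocity

def loudness_rate_step(A, r0, t, alpha=0.9, gamma=0.9):
    """Loudness decays and the pulse emission rate grows as prey is approached."""
    A = alpha * A                         # loudness decreases geometrically
    r = r0 * (1.0 - np.exp(-gamma * t))   # emission rate approaches its maximum r0
    return A, r

x, v = rng.random(5), np.zeros(5)
x, v = bat_step(x, v, x_best=np.full(5, x.min()))
A, r = loudness_rate_step(A=1.5, r0=0.5, t=10)
```

In a full optimizer these steps run inside a loop over the population, with the loudness and rate schedule controlling when a bat accepts a new solution.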

However, the bat algorithm easily falls into local optima, and its prediction accuracy is low. The population initialization of the bat algorithm is randomly generated and cannot cover the entire solution space, which greatly affects its performance.

Improved Global Chaos Bat Algorithm

In order to overcome the shortcomings of the bat algorithm, the Global Chaos Bat Algorithm (GCBA) (47) is introduced to eliminate redundant and irrelevant features in the dataset. As a heuristic optimization algorithm, GCBA is used for feature selection. First, in the initialization stage, a chaotic map is introduced to ensure that the bat population traverses the entire solution space as much as possible; the chaotic map is also conducive to enriching population diversity. Then, a fitness function based on accuracy and feature subset length is used to score the feature subset after each update. Finally, GCBA selects the feature subset with the highest score, which eliminates irrelevant and redundant features from the full set of feature variables.

To further improve the performance of GCBA, Improved Global Chaos Bat Algorithm (IGCBA) with higher accuracy and better performance is proposed, in which a nonlinear function based on the number of iterations is designed to balance IGCBA's exploitation and exploration capabilities. In the early stage of IGCBA, the algorithm is inclined toward the exploration capability. Global information is fully utilized to enable IGCBA to traverse the entire solution space as much as possible. In the later stage of IGCBA, the algorithm is inclined toward exploitation capability. Partial information is fully utilized to enable IGCBA to obtain the better solution through further exploitation.

Currently, the logistic method is widely used as a chaotic map. The initial population generated by this method is diverse and can traverse the entire solution space. Therefore, in this paper, the population is initialized using an improved logistic mapping method, whose mathematical model (48) is:

where $y_i^d$ ($i = 1, 2, \cdots, N$; $d = 1, 2, \cdots, D$; $y_i^d \in [0,1]$) is the chaotic variable, $N$ is the size of the bat population, and $D$ is the dimension of the initial population. Then, the position $x_i^d$ of a bat individual in the solution space is obtained by inverse mapping of $y_i^d$:

$$x_i^d = l_i + (u_i - l_i)\, y_i^d$$

where $l_i$ and $u_i$ are the minimum and maximum values of the variable range, respectively.
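A chaotic initialization of this kind can be sketched as follows. Note that this sketch uses the classic logistic map with control parameter mu = 4 as a stand-in for the paper's improved logistic mapping, whose exact form is not given here; the inverse mapping into the range [lo, hi] follows the equation above:

```python
import numpy as np

def chaotic_init(N, D, lo, hi, y0=0.37, mu=4.0):
    """Fill an N x D population with logistic-map chaos, then map each
    chaotic value y in [0, 1] into the variable range: x = lo + (hi - lo) * y."""
    y = np.empty((N, D))
    val = y0
    for i in range(N):
        for d in range(D):
            val = mu * val * (1.0 - val)   # logistic map iteration
            y[i, d] = val
    return lo + (hi - lo) * y

# A population of 10 bats over the paper's 32 candidate features.
pop = chaotic_init(N=10, D=32, lo=0.0, hi=1.0)
```

Compared with uniform random initialization, the chaotic sequence spreads the starting positions more evenly over the solution space.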

The local optimal position of each bat and the global optimal position of the population are recorded when the position of each bat is updated. The position of the $i$th bat at the $(t+1)$th iteration can be calculated as follows:

$$x_i^{t+1} = x_i^t + C_1 r_1 (P_i - x_i^t) + C_2 r_2 (P_g - x_i^t)$$

where $P_i$ is the local optimal position of the $i$th bat, $P_g$ is the global optimal position of the bat population, and $r_1$ and $r_2$ are two random numbers within [0,1].

$C_1$ is the control coefficient that balances the global exploration capability of IGCBA and represents the degree to which the historical optimal position of a bat individual affects the bat's current state; the larger $C_1$ is, the more the algorithm focuses on exploitation. $C_2$ is the control coefficient that balances the local exploitation capability of IGCBA and represents the degree to which the historical optimal position of the bat population affects the bat's current state; the larger $C_2$ is, the more the algorithm focuses on exploration.

In the preliminary stage of the algorithm, it is necessary to traverse the entire solution space as much as possible to ensure that the algorithm does not converge prematurely. Therefore, in the early stage of the algorithm, $C_2$ should be as large as possible and $C_1$ as small as possible; in the later stage, $C_1$ should be as large as possible and $C_2$ as small as possible. In this way, the algorithm achieves better performance. According to the above analysis, $C_1$ and $C_2$ are calculated as follows:

where t represents the current iteration times, T represents the maximum iteration times.

When initializing the bat population, we use a matrix of size $N \times D$, where $N$ is the size of the bat population and $D$ is the number of features. In this paper, a transfer equation is used to perform discrete binary operations on the bats' positions. The transfer equation is:

$$S\left(x_i^d(t)\right) = \frac{1}{1 + e^{-x_i^d(t)}}$$

where $x_i^d(t)$ is the position of the $i$th bat individual in the $d$th dimension at the $t$th iteration.

The position of a bat individual is then updated as:

$$x_i^d(t+1) = \begin{cases} 1, & \text{rand} < S\left(x_i^d(t)\right) \\ 0, & \text{otherwise} \end{cases}$$

where rand is a random number within [0,1].

When the $i$th bat's position in the $d$th dimension at the $t$th iteration is 0, the corresponding feature is not selected; when it is 1, the feature is selected.
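In code, the binarization step might look like the following sketch, which assumes a sigmoid-shaped transfer function (the paper's exact transfer equation is not spelled out here):

```python
import numpy as np

rng = np.random.default_rng(1)

def binarize(x):
    """Map a continuous bat position to a 0/1 feature mask:
    compute S(x) = 1 / (1 + e^{-x}), then set each bit to 1
    with probability S(x) by comparing against a uniform draw."""
    s = 1.0 / (1.0 + np.exp(-x))
    return (rng.random(x.shape) < s).astype(int)

# A position vector over 3 candidate features -> a selection mask.
mask = binarize(np.array([-10.0, 0.0, 10.0]))
```

Strongly negative positions almost never select their feature, strongly positive ones almost always do, and positions near zero are selected roughly half the time.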

Back Propagation Neural Network

Back Propagation Neural Network (BPNN) is particularly suited for solving the non-linear problems ( 49 ), so it is widely used in the field of health prediction ( 50 ). In the process of back propagation of prediction errors, the connection weights and bias are constantly adjusted. Finally, the output predicted by BPNN is constantly close to the expected output.

Before using BPNN for prediction, the network needs to be trained. Through training, the network will have associative memory and predictive capabilities. The main steps of the BPNN training process are:

Step 1: Initialize the network. Based on the input and output sequence ($X$, $Y$), the numbers of input layer nodes $s$ and output layer nodes $m$ are determined. The number of hidden layers and the number of hidden layer nodes $l$ are set by experience. The connection weights $\omega_{hj}$ ($h = 1, 2, \cdots, s$; $j = 1, 2, \cdots, l$) between the input and hidden layers, the connection weights $\omega_{jk}$ ($j = 1, 2, \cdots, l$; $k = 1, 2, \cdots, m$) between the hidden and output layers, the hidden layer bias values $a_j$ and the output layer bias values $b_k$ are initialized, and the learning rate $\eta$ and activation function $g(x)$ are given. In order to solve non-linear problems, the activation function is usually the Sigmoid function, defined as:

$$g(x) = \frac{1}{1 + e^{-x}}$$

Step 2: Compute the hidden layer output. The output $H_j$ of the hidden layer is calculated from the input vector $X$, $\omega_{hj}$ and $a_j$.

Step 3: Compute the output layer output. The prediction output $O_k$ of the BPNN is calculated from $H_j$, $\omega_{jk}$ and $b_k$.

Step 4: Calculate the prediction error. The prediction error $E_p$ of the $p$th sample is calculated from the predicted output $O_{pk}$ and the expected output $Y_{pk}$ of the $p$th sample.

Step 5: Calculate the reverse transmission values. The reverse transmission value $\delta_k$ of the output layer and the reverse transmission value $\delta_j$ of the hidden layer are calculated as follows:

$$\delta_k = (Y_k - O_k)\, O_k (1 - O_k)$$

$$\delta_j = H_j (1 - H_j) \sum_{k=1}^{m} \omega_{jk}\, \delta_k$$

Step 6: Update the weights. With learning rate $\eta$, the weights $\omega_{hj}$ and $\omega_{jk}$ are updated as follows:

$$\omega_{jk} = \omega_{jk} + \eta\, H_j\, \delta_k$$

$$\omega_{hj} = \omega_{hj} + \eta\, x_h\, \delta_j$$

Step 7: Update the bias values. The bias values $a_j$ and $b_k$ are updated based on $\delta_j$ and $\delta_k$.

Step 8: Determine whether the iteration is over; if not, return to Step 2.
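The training loop above can be sketched with NumPy. This is a minimal single-hidden-layer network with sigmoid activations and squared-error loss on toy data; the layer sizes, learning rate and labels are illustrative, not the paper's tuned settings:

```python
import numpy as np

rng = np.random.default_rng(42)
sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))

# Step 1: initialize a network with s inputs, l hidden nodes, m outputs.
s, l, m, eta = 9, 4, 1, 0.08
W1, a = rng.normal(0, 0.5, (s, l)), np.zeros(l)
W2, b = rng.normal(0, 0.5, (l, m)), np.zeros(m)

# Toy data: the label is 1 when the feature sum exceeds its midpoint.
X = rng.random((200, s))
Y = (X.sum(axis=1, keepdims=True) > s / 2).astype(float)

def forward(X):
    H = sigmoid(X @ W1 + a)   # Step 2: hidden-layer output
    O = sigmoid(H @ W2 + b)   # Step 3: output-layer prediction
    return H, O

initial_loss = float(np.mean((Y - forward(X)[1]) ** 2))

for _ in range(2000):
    H, O = forward(X)
    err = Y - O                           # Step 4: prediction error
    d_out = err * O * (1 - O)             # Step 5: output-layer delta
    d_hid = (d_out @ W2.T) * H * (1 - H)  # Step 5: hidden-layer delta
    W2 += eta * H.T @ d_out / len(X)      # Step 6: weight updates
    W1 += eta * X.T @ d_hid / len(X)
    b += eta * d_out.mean(axis=0)         # Step 7: bias updates
    a += eta * d_hid.mean(axis=0)

final_loss = float(np.mean((Y - forward(X)[1]) ** 2))
```

With gradient steps this small, the squared error decreases from its initial value over the iterations; in the paper, the network instead takes the IGCBA-selected features as input.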

Improved Global Chaos Bat Back Propagation Neural Network

Figure 1 illustrates the process of IGCBA-BPNN. First, all variables are initialized. Second, IGCBA is used for feature selection, choosing a subset that represents as much of the information in the original features as possible with as few features as possible. Existing research has shown that, compared with other classifiers, SVM has higher classification accuracy (51) and better stability (52); therefore, SVM is used to judge the quality of the feature subsets selected by IGCBA. Third, the features selected by IGCBA are used as the input of the BPNN, which reduces the model complexity of the BPNN.


Figure 1 . The flowchart of IGCBA-BPNN.
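The wrapper loop in Figure 1 scores each candidate feature subset by classifier accuracy and subset length. Below is a minimal sketch of such a fitness function; the weighting w = 0.99 is a common choice in wrapper feature selection, not a value reported by the paper:

```python
def fitness(accuracy, n_selected, n_total, w=0.99):
    """Score a feature subset: reward classification accuracy,
    lightly penalize the fraction of features kept."""
    return w * accuracy + (1 - w) * (1 - n_selected / n_total)

# Candidate subsets as (accuracy, number of selected features) pairs.
candidates = [(0.90, 9), (0.91, 32), (0.89, 4)]
best = max(candidates, key=lambda t: fitness(t[0], t[1], n_total=32))
```

With this weighting, accuracy dominates and subset size only breaks near-ties, which matches the wrapper's goal of shrinking the feature set without sacrificing accuracy.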

Parameter Settings

Table 2 shows the parameter settings of feature selection algorithms. In binary bat algorithm (BBA) ( 46 ), GCBA, IGCBA, A is the loudness of volume that the bat emits and is set to 1.5, r is the rate of emission of pulse and is set to 0.5, f max is the maximum value of the pulse frequency and is set to 1, and f min is the minimum value of the pulse frequency and is set to 0. In GCBA, C 1 is the control coefficient, represents the degree to which the historical optimal position of a bat individual has effect on the current state of the bat. C 1 is set to 1.49618. C 2 is the control coefficient, represents the degree to which the historical optimal position of the bat population has effect on the current state of the bat. C 2 is set to 1.49618. In hybrid improved dragonfly algorithm (HIDA) ( 53 ), s and a are the separation weight and the alignment weight, respectively, and they are both set to 0.1. c is the cohesion weight and is set to 0.7. f and e are the food factor and the enemy factor, respectively, and they are both set to 1. w is the inertia weight and is set to 0.9. In information gain binary butterfly optimization algorithm (IG-bBOA) ( 54 ), N represents the number of butterflies and is set to 10, p is the transition probability and is set to 0.8, a is the power exponent and is set to 0.1, C is the sensory modality and is set to 0.01-0.25. α, β, and δ are set to 0.99, 0.001, and 0.009, respectively. In hyper learning binary dragonfly algorithm (HLBDA) ( 55 ), the parameters of s , a , c , f , e , and w are consistent with HIDA. The pl is the personal learning rate and is set to 0.4, and gl is the global learning rate and is set to 0.7.


Table 2 . The parameter settings of feature selection algorithms.

After combining BPNN with SR, BBA, HIDA, GCBA, IGCBA, IG-bBOA, and HLBDA, the relevant parameters are set as in Tables 3, 4. q is the number of hidden layers and is set to 1. p is the maximum number of training epochs and is set to 1,000. g is the training goal and is set to 1e-4. η is the learning rate and is set to 0.08. l is the number of hidden layer nodes and, as a matter of experience, is often set to half the number of input layer nodes. The number of hidden layer nodes of SR-BPNN-4, BBA-BPNN-4, BBA-BPNN-8, HIDA-BPNN-4, HIDA-BPNN-16, GCBA-BPNN-4, GCBA-BPNN-9, IGCBA-BPNN-4, IG-bBOA-BPNN-4, IG-bBOA-BPNN-10, HLBDA-BPNN-4, and HLBDA-BPNN-14 is set to 4, 4, 8, 4, 16, 4, 9, 4, 4, 10, 4, and 14, respectively.


Table 3 . Common parameter settings in BPNN.


Table 4 . The number of hidden layer nodes in BPNN.

Experiment Results

In this section, we conduct experiments on the survey dataset to compare the IGCBA algorithm with the stepwise regression (SR) (56), BBA, HIDA, GCBA, IG-bBOA, and HLBDA methods. We also compare the IGCBA-BPNN algorithm with the SR-BPNN, BBA-BPNN, HIDA-BPNN, GCBA-BPNN, IG-bBOA-BPNN and HLBDA-BPNN methods on the same dataset. Given that BPNN, K-Nearest Neighbour (KNN) (57) and decision tree (DT) (58) are important classification methods, we also add comparison results of BPNN with KNN and DT. Table 5 shows the experimental results.


Table 5 . Comparison of prediction accuracy of different algorithms.

Among SR, BBA, HIDA, GCBA, IG-bBOA, and HLBDA, HIDA and HLBDA have the highest prediction accuracy, followed by IGCBA. However, the number of features finally selected by IGCBA is 23 and 20 fewer than those of HIDA and HLBDA, respectively, and is also smaller than that of the other methods. This comparison shows that IGCBA can remove as many irrelevant and redundant features as possible from the original feature set without reducing the prediction accuracy of the classifier.

The prediction accuracy of SR-BPNN-4 is 0.98% higher than that of SR. The prediction accuracy of BBA-BPNN-4 and BBA-BPNN-8 is 0.19 and 0.59% higher than that of BBA, respectively. The prediction accuracy of HIDA-BPNN-4 and HIDA-BPNN-16 is 2.15 and 2.55% higher than that of HIDA, respectively. The prediction accuracy of GCBA-BPNN-4 and GCBA-BPNN-9 is 1.71 and 2.49% higher than that of GCBA, respectively. The prediction accuracy of IG-bBOA-BPNN-4 and IG-bBOA-BPNN-10 is 1.51 and 1.90% higher than that of IG-bBOA, respectively. The prediction accuracy of HLBDA-BPNN-4 and HLBDA-BPNN-14 is 1.73 and 2.32% higher than that of HLBDA. The prediction accuracy of IGCBA-BPNN-4 is 4.04% higher than IGCBA. The above experimental results prove that compared with the feature selection algorithms, the feature selection algorithms combined with BPNN can improve the prediction accuracy.

The prediction accuracy of SR-BPNN-4, BBA-BPNN-4, BBA-BPNN-8, HIDA-BPNN-4, HIDA-BPNN-16, GCBA-BPNN-4, GCBA-BPNN-9, IG-bBOA-BPNN-4, IG-bBOA-BPNN-10, HLBDA-BPNN-4, and HLBDA-BPNN-14 is 88.43, 88.62, 89.02, 90.78, 91.18, 90.20, 90.98, 90.00, 90.39, 90.39, and 90.98%, respectively. The prediction accuracy of IGCBA-BPNN-4 is 92.55%, which is 4.12, 3.93, 3.53, 1.77, 1.37, 2.35, 1.57, 2.55, 2.16, 2.16, and 1.57% higher than that of SR-BPNN-4, BBA-BPNN-4, BBA-BPNN-8, HIDA-BPNN-4, HIDA-BPNN-16, GCBA-BPNN-4, GCBA-BPNN-9, IG-bBOA-BPNN-4, IG-bBOA-BPNN-10, HLBDA-BPNN-4, and HLBDA-BPNN-14. The experimental results of combining each feature selection algorithm with BPNN prove that IGCBA-BPNN-4's performance is better than other algorithms. At the same time, the prediction accuracy of IGCBA-KNN and IGCBA-DT is 87.52 and 79.56%, respectively. The prediction accuracy of IGCBA-BPNN-4 is 5.03 and 12.99% higher than that of IGCBA-KNN and IGCBA-DT, respectively. It can be proved that BPNN is better than KNN and DT for classification on survey dataset. Therefore, IGCBA-BPNN-4 model has good applicability in predicting the mental health of medical workers in public health events.
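The reported accuracy gaps can be checked directly from the percentages above:

```python
# Reported accuracies (%) of the baseline models on the survey dataset.
baselines = {
    "SR-BPNN-4": 88.43, "BBA-BPNN-4": 88.62, "BBA-BPNN-8": 89.02,
    "HIDA-BPNN-4": 90.78, "HIDA-BPNN-16": 91.18, "GCBA-BPNN-4": 90.20,
    "GCBA-BPNN-9": 90.98, "IG-bBOA-BPNN-4": 90.00, "IG-bBOA-BPNN-10": 90.39,
    "HLBDA-BPNN-4": 90.39, "HLBDA-BPNN-14": 90.98,
    "IGCBA-KNN": 87.52, "IGCBA-DT": 79.56,
}
igcba_bpnn_4 = 92.55
# Gap of IGCBA-BPNN-4 over each baseline, in percentage points.
gaps = {name: round(igcba_bpnn_4 - acc, 2) for name, acc in baselines.items()}
```

The computed gaps (e.g. 4.12 points over SR-BPNN-4 and 1.37 over HIDA-BPNN-16) agree with the percentages quoted in the text.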

To better verify the convergence performance of the IGCBA algorithm on the test dataset, Figure 2 shows the convergence curves of the six algorithms. Plotting classification accuracy against the number of iterations shows that accuracy increases monotonically at each iteration until it levels off. Figure 2 shows that GCBA converges faster than BBA, but it also shows that GCBA does not resolve BBA's tendency to fall into local optima. IGCBA falls into a local optimum at the 46th iteration and jumps out of it at the 66th iteration. Although IGCBA's ability to escape local optima is not as good as that of HIDA and HLBDA, it is significantly better than that of IG-bBOA, BBA and GCBA.


Figure 2 . Convergence curves of the six algorithms on the survey dataset.

Figure 3 shows how the number of features selected by each of the six algorithms on the survey dataset changes with the number of iterations. Because the non-linear equation balances IGCBA's exploitation and exploration capabilities, IGCBA retains strong exploitation ability in the later stage and therefore finds a smaller feature subset by the 65th iteration. Notably, although the prediction accuracies of HIDA and HLBDA in Figure 2 are each 0.12% higher than that of IGCBA, the final feature subset found by IGCBA contains 23 and 20 fewer features than those of HIDA and HLBDA, respectively. Together, Figures 2, 3 show that IGCBA has strong exploitation ability and superior performance in the later optimization stage.
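The exact non-linear balancing equation of IGCBA is not reproduced in this excerpt. As an illustration only, a hypothetical non-linear decay schedule (a Gaussian-shaped decay; the functional form and the constant 4.0 are assumptions, not the paper's equation) shows how a single control weight can keep exploration high in early iterations and shift toward exploitation late:

```python
import math

def nonlinear_weight(t, T):
    """Hypothetical non-linear control schedule: starts at 1
    (favoring global exploration) and decays non-linearly toward 0
    (favoring local exploitation) as iteration t approaches T."""
    return math.exp(-4.0 * (t / T) ** 2)

T = 100
weights = [nonlinear_weight(t, T) for t in range(T + 1)]
# Early iterations keep a high weight (broad search over feature
# subsets); late iterations shrink it, so the swarm refines the
# current best subset instead of wandering.
assert weights[0] == 1.0 and weights[-1] < 0.05
assert all(a >= b for a, b in zip(weights, weights[1:]))
```

A non-linear schedule like this spends proportionally more iterations in the exploitation regime than a linear decay would, which is consistent with the text's claim that IGCBA has strong exploitation ability in the later stage.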


Figure 3 . The feature numbers curves of the six algorithms on the survey dataset.

Analysis of the Degree of Influence of Feature Variables on Mental Health

Mean Impact Value (MIV) is currently considered one of the best algorithms for evaluating the correlation between input variables and output variables. Sorting the variables by the absolute value of their MIV determines the degree to which each input variable influences the network output. The sign of the MIV indicates the direction of the effect, and its absolute value indicates the relative importance of the effect.
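MIV perturbs each input feature up and down by a fixed fraction, pushes both perturbed copies of the data through the trained model, and averages the difference in outputs. The sketch below follows that standard procedure; the ±10% perturbation size and the linear stand-in predictor are illustrative assumptions, not the paper's trained BPNN:

```python
def mean_impact_value(predict, X, delta=0.10):
    """Mean Impact Value: perturb each feature by +/- delta (10% is
    a customary choice), run both perturbed datasets through the
    trained model, and average the difference in model outputs."""
    n_features = len(X[0])
    mivs = []
    for j in range(n_features):
        diffs = []
        for row in X:
            up = list(row); up[j] = row[j] * (1 + delta)
            down = list(row); down[j] = row[j] * (1 - delta)
            diffs.append(predict(up) - predict(down))
        mivs.append(sum(diffs) / len(diffs))
    return mivs

# Stand-in for a trained BPNN: a fixed linear scorer whose known
# weights let us check the resulting importance ranking.
weights = [3.0, -1.0, 0.5]
predict = lambda x: sum(w * v for w, v in zip(weights, x))

X = [[1.0, 2.0, 3.0], [2.0, 1.0, 1.0], [0.5, 3.0, 2.0]]
mivs = mean_impact_value(predict, X)
ranking = sorted(range(3), key=lambda j: abs(mivs[j]), reverse=True)
# Feature 0 (|w| = 3.0) should rank first, feature 2 (|w| = 0.5) last.
assert ranking == [0, 1, 2]
```

Sorting by the absolute MIV, as in the last two lines, is exactly how Table 6 orders the nine selected feature variables by importance.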

The IGCBA-BPNN-4 prediction model eliminates irrelevant and redundant features in the original dataset, decreases model running time, and improves the prediction accuracy of the classifier. At the same time, the feature variables that affect mental health are ranked by importance. Table 6 shows that IGCBA-BPNN-4 selects a total of nine feature variables that affect mental health: “Have patients with COVID-19 or not in the living place,” “age,” “employment type,” “Have patients with COVID-19 or not in the workplace,” “the work unit is a designated treatment point or not,” “changes in work intensity,” “usual sleep time,” “place of residence,” and “marital status.” We analyze these factors below in order of importance.


Table 6 . Feature variables that affect mental health: sorted by importance.

Variables in statistics are divided into numerical variables and categorical variables. When considering the impact of input variables on output variables, the sign of the MIV is meaningful only for numerical variables; it carries no meaning for categorical variables. Since most variables in this article are categorical, the sign of the MIV is not considered in this analysis.

In the community transmission stage of the epidemic, one study found that cluster transmission occurred in multiple communities and families, with each patient transmitting the infection to 2.2 people on average ( 59 ). When relatives, friends, or nearby people in the living place are identified as suspected or confirmed cases, people develop psychological problems such as fear and anxiety out of fear of infection.

Patients with COVID-19 are mostly elderly people. Under normal circumstances, the deterioration of body function with age lowers the health level of the elderly, and their relatively weak immune systems make them more vulnerable to disease. In the “Questions and Answers About COVID-19 and the Elderly” on the WHO official website, a clear answer is given to the question “Who is at risk of severe illness”: the elderly, and people of all ages diagnosed with conditions such as hypertension, heart disease, lung disease, diabetes, or cancer, are more likely than others to suffer severe illness ( 60 ).

Differences in employment type lead to differences in the psychological status of medical workers. Compared with formal medical personnel, temporarily hired medical personnel may show a stronger sense of anxiety and fear during COVID-19. On the one hand, lacking both human capital and social capital, temporary medical workers are more likely to hold low-skill, labor-intensive jobs, and the pressure of high labor intensity easily makes them prone to anxiety and hostility. On the other hand, most temporary medical workers are facing such a severe epidemic for the first time; they lack the work experience and mental preparation needed to deal with severe infectious diseases. At the same time, lacking an objective understanding of COVID-19, they remain in a highly alert state at work, and their anxiety and fear are more prominent.

The presence of patients with COVID-19 in the workplace, especially when the workplace is a designated treatment point for COVID-19, has a greater impact on the mental status of medical workers. Faced with high-intensity work pressure and the risk of infection, medical workers are more likely to become a high-risk group for psychological symptoms. Short and poor sleep during COVID-19 can cause sleep disorders, which are often accompanied by symptoms such as depression, tension, anxiety, hostility, and irritability ( 61 ). People without a spouse cannot get timely help or a sympathetic listener when they encounter difficulties, and are therefore prone to anxiety and depression ( 62 ). The farther a place of residence is from the city center, the lower its population density; COVID-19 spreads less rapidly in rural areas ( 63 ), so people living in rural areas fear it less than people living in cities.

Based on the above observations, we conclude that IGCBA-BPNN-4 outperforms the other algorithms. First, BPNN learns the non-linear relationship between feature variables and prediction results, which improves the accuracy of mental health prediction. The results in Table 5 indicate that feature selection algorithms combined with BPNN are more accurate than the same algorithms without BPNN, with an average increase of 2.46%; in particular, IGCBA-BPNN-4 is 4.04% more accurate than IGCBA alone. Second, the value calculated by MIV is used as an influence weight that quantifies how much each feature variable contributes to mental health. As Table 6 shows, the MIV calculation ranks the nine selected feature variables by importance; the top three factors are “whether there are patients with COVID-19 in the workplace,” “age,” and “employment type,” which matches our expectations. Third, GCBA eliminates irrelevant and redundant features from the original feature set, reducing BPNN's complexity. The results in Table 5 indicate that GCBA reduces the number of features in the survey dataset from 32 to 18; although GCBA selects more features than SR and BBA, it achieves higher prediction accuracy. Fourth, the non-linear equation in IGCBA balances its exploitation and exploration capabilities, which accelerates convergence and prevents IGCBA from falling into a local optimum. As Figures 2, 3 show, IGCBA avoids local optima because it retains some exploration capability in the later stage. As a result, IGCBA obtains a feature subset that represents as much of the original information as possible with as few features as possible: the number of features selected by IGCBA is only half the number selected by GCBA, and its prediction accuracy is higher.

It should be pointed out that although many people have been vaccinated against COVID-19, the epidemic is far from over due to the spread of mutant strains. COVID-19 directly endangers people's lives, so diagnosing it quickly and accurately is extremely important, and the recent methods proposed by Wang et al. ( 64 , 65 ) may help diagnose COVID-19 more quickly and effectively. In the fight against COVID-19, when the psychological symptoms of medical workers are discovered and addressed in time, the efficiency of the entire health system improves. The algorithm proposed in this article predicts the mental health of medical staff more effectively, and the research results can be used directly by global public health departments. However, our research has several limitations. First, the data were obtained through an online survey and this research is an observational study, so self-report problems and recall bias are unavoidable to some extent. Second, mental health is affected by personal, family, economic, social, environmental, and other factors, so the factors considered in this article are not comprehensive. Finally, some parameters in our algorithm are set manually: the neural network's parameters are chosen from experience rather than obtained through adaptive tuning or learning. We will address this in future work.

Conclusions

The accuracy of existing mental health prediction methods is low because the relationship between the feature variables and the prediction results is non-linear and the prediction dataset contains many irrelevant and redundant features. Moreover, current mental health prediction methods cannot estimate how important each feature variable is to the prediction. Therefore, this paper proposes IGCBA-BPNN. First, BPNN is introduced to handle the non-linear relationship between prediction results and feature variables, which improves the accuracy of mental health prediction. Second, MIV is introduced to calculate influence weights, which quantify how much each feature variable contributes to mental health. Third, GCBA is introduced to eliminate redundant and irrelevant features from the original feature set, which reduces BPNN's model complexity and improves its performance. Fourth, a non-linear equation is designed in IGCBA to speed up its convergence and prevent it from falling into a local optimum. Experimental results show that IGCBA-BPNN outperforms existing algorithms and obtains good results in mental health prediction.

However, IGCBA only reduces BPNN's input dimension; the BPNN's structure is not improved, and the parameters of the BP network are not optimized. How to determine the number of neural network nodes therefore remains an important challenge for future work.

In summary, with the development of swarm intelligence algorithms and neural network technology, methods that combine the two are playing an increasingly significant role in prediction. In future health prediction research, prediction methods based on swarm intelligence algorithms combined with neural networks will have broad application prospects.

Data Availability Statement

The datasets presented in this study can be found in an online repository: https://github.com/Hu-Li/mental-health-dataset .

Ethics Statement

The studies involving human participants were reviewed and approved by the Ethics Committee of the School of Public Health, Jilin University. The ethics committee waived the requirement of written informed consent for participation.

Author Contributions

XW and HL came up with the original idea. HL and TW designed this study and provided research methods. CS and XZ completed the data collection and performed the statistical analysis. TW conducted the experiments. XW supervised the research. HL drafted the manuscript. XW, HL, DG, and CD improved the manuscript. All authors contributed to the article and approved the final version.

Conflict of Interest

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Publisher's Note

All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article, or claim that may be made by its manufacturer, is not guaranteed or endorsed by the publisher.

Acknowledgments

We sincerely thank all participants for their support in this research.

Supplementary Material

The Supplementary Material for this article can be found online at: https://www.frontiersin.org/articles/10.3389/fpubh.2021.697850/full#supplementary-material

1. Angehrn A, Sapach MJNT, Ricciardelli R, MacPhee RS, Anderson GS, Carleton RN. Sleep quality and mental disorder symptoms among Canadian public safety personnel. Int J Environ Res Public Health . (2020) 17:2708. doi: 10.3390/ijerph17082708


2. Lee MH, Seo MK. Community integration of persons with mental disorders compared with the general population. Int J Environ Res Public Health . (2020) 17:1596. doi: 10.3390/ijerph17051596

3. Tang J, Wu L, Huang HL, Feng J, Yuan YF, Zhou YP, et al. Back propagation artificial neural network for community Alzheimer's disease screening in China. Neural Regen Res . (2013) 8:270-6. doi: 10.3969/j.issn.1673-5374.2013.03.010

4. Francis L, Barling J. Organizational injustice and psychological strain. Can J Behav Sci . (2005) 37:250-61. doi: 10.1037/h0087260


5. Ko HC. A sustainable approach to mental health education: an empirical study using Zhuangzi's self-adaptation. Sustainability . (2019) 11:3677. doi: 10.3390/su11133677

6. Pirooznia M, Seifuddin F, Judy J, Mahon PB, Potash JB, Zandi PP, et al. Data mining approaches for genome-wide association of mood disorders. Psychiat Genet . (2012) 22:55-61. doi: 10.1097/YPG.0b013e32834dc40d

7. Derogatis LR, Melisaratos N. The brief symptom inventory-an introductory report. Psychol Med . (1983) 13:595-605. doi: 10.1017/S0033291700048017

8. Hathaway SR, McKinley JC. A multiphasic personality schedule (Minnesota): I. construction of the schedule. J Psychol . (1940) 10:249-54. doi: 10.1080/00223980.1940.9917000

9. Zung WWK. A rating instrument for anxiety disorders. Psychosomatics . (1971) 12:371-9. doi: 10.1016/S0033-3182(71)71479-0

10. Zung WWK. A self-rating depression scale. Arch Gen Psychiat . (1965) 12:63-70. doi: 10.1001/archpsyc.1965.01720310065008

11. Eysenck SBG, Eysenck HJ, Barrett P. A revised version of the psychoticism scale. Pers Indiv Differ . (1985) 6:21-9. doi: 10.1016/0191-8869(85)90026-1

12. Cattell RB. Scree test for number of factors. Multivar Behav Res . (1966) 1:245-76. doi: 10.1207/s15327906mbr0102_10

13. Center for Global Development. Mapping and Realigning Incentives in the Global Health Supply Chain . Available online at: http://www.cgdev.org/doc/DemandForecasting/Principles.pdf (accessed September 24, 2020).

14. Cosic K, Popovic S, Sarlija M, Kesedzic I, Jovanovic T. Artificial intelligence in prediction of mental health disorders induced by the COVID-19 pandemic among health care workers. Croat Med J . (2020) 61:279-88. doi: 10.3325/cmj.2020.61.279

15. Inter-Agency Standing Committee. António Guterres (UN Secretary-General) on COVID-19 and the Need for Action on Mental Health . Available online at: https://interagencystandingcommittee.org/iasc-reference-group-mental-health-and-psychosocial-support-emergency-settings/antonio-guterres-un (accessed June 12, 2020).

16. World Health Organization. The Impact of COVID-19 on Mental, Neurological and Substance Use Services: Results of a Rapid Assessment . Available online at: https://www.who.int/docs/default-source/mental-health/ppt-who-covid19-mental-health-rapid-assessment-v10.pdf?sfvrsn=2f45b88a_2 (accessed August 20, 2020).


17. O'Dowd A. Covid-19: cases of delta variant rise by 79%, but rate of growth slows. BMJ . (2021) 373:n1596. doi: 10.1136/bmj.n1596

18. Yang X, Jin M, Zheng L. The prediction model of college students' life stressor on mental health. China J Health Psychol. (2018) 26:775-8. doi: 10.13342/j.cnki.cjhp.2018.05.001

19. Margraf J, Zhang XC, Lavallee KL, Schneider S. Longitudinal prediction of positive and negative mental health in Germany, Russia, and China. PLoS ONE . (2020) 15:e0234997. doi: 10.1371/journal.pone.0234997

20. Wilson MG, DeJoy DM, Vandenberg RJ, Richardson HA, McGrath AL. Work characteristics and employee health and well-being: test of a model of healthy work organization. J Occup Organ Psych . (2004) 77:565-88. doi: 10.1348/0963179042596522

21. Boyle J, Jessup M, Crilly J, Green D, Lind J, Wallis M, et al. Predicting emergency department admissions. Emerg Med J . (2012) 29:358-65. doi: 10.1136/emj.2010.103531

22. Champion R, Kinsman LD, Lee GA, Masman KA, May EA, Mills TM, et al. Forecasting emergency department presentations. Aust Health Rev . (2007) 31:83-90. doi: 10.1071/AH070083

23. Reis BY, Mandl KD. Time series modeling for syndromic surveillance. BMC Med Inform Decis Mak . (2003) 3:2. doi: 10.1186/1472-6947-3-2

24. Medina DC, Findley SE, Guindo B, Doumbia S. Forecasting non-stationary diarrhea, acute respiratory infection, and malaria time-series in Niono, Mali. PLoS ONE . (2007) 2:e1181. doi: 10.1371/journal.pone.0001181

25. Hyndman RJ, Koehler AB, Snyder RD, Grose S. A state space framework for automatic forecasting using exponential smoothing methods. Int J Forecasting . (2002) 18:439-54. doi: 10.1016/S0169-2070(01)00110-8

26. Soyiri I, Reidpath D, Sarran C. Determinants of asthma length of stay in London hospitals: individual versus area effects. Emerg Health Threats J . (2011) 4:143. doi: 10.3402/ehtj.v4i0.11179

27. Soyiri I, Reidpath D, Sarran C. Asthma length of stay in hospitals in London 2001-2006: demographic, diagnostic and temporal factors. PLoS ONE . (2011) 6:e27184. doi: 10.1371/journal.pone.0027184

28. Williams JS. Assessing the suitability of fractional polynomial methods in health services research: a perspective on the categorization epidemic. J Health Serv Res Po . (2011) 16:147-52. doi: 10.1258/jhsrp.2010.010063

29. Basavappa SR, Rao SL, Harish B. Expert system for dementia/depression diagnosis. Nimhans J. (1996) 14:99-106.

30. Gil D, Manuel DJ. Diagnosing Parkinson by using artificial neural networks and support vector machines. J Comput Sci Technol . (2009) 9:63-71. https://lup.lub.lu.se/record/1776690

31. Seixas FL, Zadrozny B, Laks J, Conci A, Saade DCM. A Bayesian network decision model for supporting the diagnosis of dementia, Alzheimer's disease and mild cognitive impairment. Comput Biol Med . (2014) 51:140-58. doi: 10.1016/j.compbiomed.2014.04.010

32. Dabek F, Caban JJ. A neural network based model for predicting psychological conditions. In: Proceedings of the 2015 8th International Conference on Brain Informatics and Health (BIH) . London (2015). p. 252-61.

33. Liu CY, Yang YZ, Zhang XM, Xu XY, Dou QL, Zhang WW. The prevalence and influencing factors in anxiety in medical workers fighting COVID-19 in China: a cross-sectional survey. Epidemiol Infect . (2020) 148:e98. doi: 10.1017/S0950268820001107

34. Lei L, Huang XM, Zhang S, Yang JR, Yang L, Xu M. Comparison of prevalence and associated factors of anxiety and depression among people affected by versus people unaffected by quarantine during the COVID-9 epidemic in Southwestern China. Med Sci Monitor . (2020) 26:e924609. doi: 10.12659/MSM.924609

35. Covinsky KE, Newcomer R, Fox P, Wood J, Sands L, Dane K, et al. Patient and caregiver characteristics associated with depression in caregivers of patients with dementia. J Gen Intern Med . (2003) 18:1006-14. doi: 10.1111/j.1525-1497.2003.30103.x

36. Lu JJ, Kong JX, Song JS, Li L, Wang HM. The health-related quality of life of nursing workers: a cross-sectional study in medical institutions. Int J Nues Pract . (2019) 25:e12754. doi: 10.1111/ijn.12754

37. Gomez MAL, Sabbath E, Boden L, Williams JAR, Hopcia K, Hashimoto D, et al. Organizational and psychosocial working conditions and their relationship with mental health outcomes in patient-care workers. J Occup Environ Med . (2019) 61:e480-5. doi: 10.1097/JOM.0000000000001736

38. Maunder RG, Lancee WJ, Balderson KE, Bennett JP, Borgundvaag B, Evans S. Long-term psychological and occupational effects of providing hospital healthcare during SARS outbreak. Emerg Infect Dis . (2006) 12:1924-32. doi: 10.3201/eid1212.060584

39. Lopresti AL, Hood SD, Drummond PD. A review of lifestyle factors that contribute to important pathways associated with major depression: diet, sleep and exercise. J Affect Disord . (2013) 148:12-27. doi: 10.1016/j.jad.2013.01.014

40. Hoare E, Milton K, Foster C, Allender S. The associations between sedentary behaviour and mental health among adolescents: a systematic review. Int J Behav Nutr Phy . (2016) 13:108. doi: 10.1186/s12966-016-0432-4

41. Lai JB, Ma SM, Wang Y, Cai ZX, Hu JB, Wei N, et al. Factors associated with mental health outcomes among health care workers exposed to coronavirus disease 2019. JAMA Netw Open . (2020) 3:e203976. doi: 10.1001/jamanetworkopen.2020.3976

42. Lu W, Wang H, Lin YX, Li L. Psychological status of medical workforce during the COVID-19 pandemic: a cross-sectional study. Psychiat Res . (2020) 288:112936. doi: 10.1016/j.psychres.2020.112936

43. Yang XS. A new metaheuristic bat-inspired algorithm. In: González JR, Pelta DA, Cruz C, Terrazas G, Krasnogor N, editors. Nature Inspired Cooperative Strategies for Optimization (NICSO 2010). Studies in Computational Intelligence, vol 284 . Berlin, Heidelberg: Springer (2010). p. 65-74. doi: 10.1007/978-3-642-12538-6_6

44. Nakamura RYM, Pereira LAM, Costa KA, Rodrigues D, Papa JP, Yang XS. BBA: a binary bat algorithm for feature selection. In: 2012 25th SIBGRAPI Conference on Graphics, Patterns and Images . Ouro Preto: IEEE (2012). p. 291-7. https://ieeexplore.ieee.org/abstract/document/6382769 .

45. Rodrigues D, Pereira LAM, Nakamura RYM, Costa KAP, Yang XS, Souza AN. A wrapper approach for feature selection and optimum-path forest based on bat algorithm. Expert Syst Appl . (2014) 41:2250-8. doi: 10.1016/j.eswa.2013.09.023

46. Mirjalili S, Mirjalili SM, Yang X. Binary bat algorithm. Neural Comput Appl . (2014) 25:663-81. doi: 10.1007/s00521-013-1525-5

47. Cui X, Li Y, Fan J. Global chaotic bat optimization algorithm. J Northeast Univ (Nat Sci). (2020) 41:488-91. doi: 10.12068/j.issn.1005-3026.2020.04.006

48. Kazimipour B, Li XD, Qin AK. A review of population initialization techniques for evolutionary algorithms. In: Proceedings of the 2014 IEEE Congress on Evolutionary Computation (CEC) . Beijing (2014). p. 2585-92.

49. Grossi E, Buscema M. Introduction to artificial neural networks. Eur J Gastroen Hepat . (2007) 19:1046-54. doi: 10.1097/MEG.0b013e3282f198a0

50. Ramesh AN, Kambhampati C, Monson JRT, Drew PJ. Artificial intelligence in medicine. Ann Roy Coll Surg . (2004) 86:334-8. doi: 10.1308/147870804290

51. Huang CL, Chen MC, Wang CJ. Credit scoring with a data mining approach based on support vector machines. Expert Syst Appl . (2007) 33:847-56. doi: 10.1016/j.eswa.2006.07.007

52. Subasi A. Classification of EMG signals using PSO optimized SVM for diagnosis of neuromuscular disorders. Comput Biol Med . (2013) 43:576-86. doi: 10.1016/j.compbiomed.2013.01.020

53. Cui X, Li Y, Fan J, Wang T, Zheng Y. A hybrid improved dragonfly algorithm for feature selection. IEEE Access . (2020) 8:155619-29. doi: 10.1109/ACCESS.2020.3012838

54. Sadeghian Z, Akbari E, Nematzadeh H. A hybrid feature selection method based on information theory and binary butterfly optimization algorithm. Eng Appl Artif Intel . (2021) 97:104079. doi: 10.1016/j.engappai.2020.104079

55. Too J, Mirjalili S. A hyper learning binary dragonfly algorithm for feature selection: a COVID-19 case study. Knowl Based Syst . (2021) 212:106553. doi: 10.1016/j.knosys.2020.106553

56. Lucero RJ, Lindberg DS, Fehlberg EA, Bjarnadottir RI, Li Y, Cimiotti JP, et al. A data-driven and practice-based approach to identify risk factors associated with hospital-acquired falls: applying manual and semi- and fully-automated methods. Int J Med Inform . (2019) 122:63-9. doi: 10.1016/j.ijmedinf.2018.11.006

57. Peng NB, Zhang YX, Zhao YH. A SVM-kNN method for quasar-star classification. Sci China Phys Mech . (2013) 56:1227-34. doi: 10.1007/s11433-013-5083-8

58. Bui DT, Pradhan B, Lofman O, Revhaug I. Landslide susceptibility assessment in Vietnam using support vector machines, decision tree, and naive bayes models. Math Probl Eng . (2012) 2012:974638. doi: 10.1155/2012/974638

59. Chan JFW, Yuan SF, Kok KH, To KKW, Chu H, Yang J, et al. A familial cluster of pneumonia associated with the 2019 novel coronavirus indicating person-to-person transmission: a study of a family cluster. Lancet . (2020) 395:514-23. doi: 10.1016/S0140-6736(20)30154-9

60. World Health Organization. Coronavirus Disease (COVID-19): Risks and Safety for Older People . Available online at: https://www.who.int/emergencies/diseases/novel-coronavirus-2019/question-and-answers-hub/q-a-detail/coronavirus-disease-covid-19-risks-and-safety-for-older-people (accessed September 22, 2020).

61. Metse AP, Fehily C, Clinton-McHarg T, Wynne O, Lawn S, Wiggers J, et al. Self-reported suboptimal sleep and receipt of sleep assessment and treatment among persons with and without a mental health condition in Australia: a cross sectional study. BMC Public Health . (2021) 21:463. doi: 10.1186/s12889-021-10504-6

62. Skapinakis P, Bellos S, Koupidis S, Grammatikopoulos I, Theodorakis PN, Mavreas V. Prevalence and sociodemographic associations of common mental disorders in a nationally representative sample of the general population of Greece. BMC Psychiatry. (2013) 13:163. doi: 10.1186/1471-244X-13-163

63. Eilersen A, Sneppen K. SARS-CoV-2 superspreading in cities vs the countryside. APMIS . (2021) 129:401-7. doi: 10.1111/apm.13120

64. Wang SH, Satapathy SC, Anderson D, Chen SX, Zhang YD. Deep fractional max pooling neural network for COVID-19 recognition. Front Public Health . (2021) 9:1117. doi: 10.3389/fpubh.2021.726144

65. Wang SH, Zhang Y, Cheng X, Zhang X, Zhang YD. PSSPNN: PatchShuffle Stochastic Pooling Neural Network for an explainable diagnosis of COVID-19 with multiple-way data augmentation. Comput Math Methods Med . (2021) 2021:6633755. doi: 10.1155/2021/6633755

Keywords: COVID-19, mental health, prediction, machine learning, artificial intelligence, neural network, public health

Citation: Wang X, Li H, Sun C, Zhang X, Wang T, Dong C and Guo D (2021) Prediction of Mental Health in Medical Workers During COVID-19 Based on Machine Learning. Front. Public Health 9:697850. doi: 10.3389/fpubh.2021.697850

Received: 20 April 2021; Accepted: 16 August 2021; Published: 07 September 2021.

Copyright © 2021 Wang, Li, Sun, Zhang, Wang, Dong and Guo. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY) . The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

*Correspondence: Chuanyong Sun, chuanyongsun@hotmail.com



  • Open access
  • Published: 27 May 2024

Role of machine learning algorithms in suicide risk prediction: a systematic review-meta analysis of clinical studies

  • Houriyeh Ehtemam 1 ,
  • Shabnam Sadeghi Esfahlani 1 ,
  • Alireza Sanaei 1 ,
  • Mohammad Mehdi Ghaemi 2 ,
  • Sadrieh Hajesmaeel-Gohari 3 ,
  • Rohaneh Rahimisadegh 2 ,
  • Kambiz Bahaadinbeigy 3 ,
  • Fahimeh Ghasemian 4 &
  • Hassan Shirvani 1  

BMC Medical Informatics and Decision Making volume  24 , Article number:  138 ( 2024 ) Cite this article

289 Accesses

Metrics details

Suicide is a complex and multifactorial public health problem. Understanding and addressing the various factors associated with suicide is crucial for prevention and intervention efforts. Machine learning (ML) could enhance the prediction of suicide attempts.

A systematic review was performed using PubMed, Scopus, Web of Science and SID databases. We aim to evaluate the performance of ML algorithms and summarize their effects, gather relevant and reliable information to synthesize existing evidence, identify knowledge gaps, and provide a comprehensive list of the suicide risk factors using mixed method approach.

Forty-one studies published between 2011 and 2022 that matched the inclusion criteria were deemed suitable. We included studies that used machine learning algorithms to predict suicide risk, excluding natural language processing (NLP) and image processing approaches.

The neural network (NN) algorithm exhibited the lowest accuracy at 0.70, whereas random forest demonstrated the highest accuracy, reaching 0.94. Among the Cox and random forest models, the minimum observed area under the curve (AUC) value was 0.54; in contrast, the XGBoost classifier yielded the highest AUC value, reaching 0.97. These algorithm-specific AUC values emphasize how well each algorithm captures the trade-off between sensitivity and specificity for suicide risk prediction.
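AUC has a direct probabilistic reading: it is the probability that a model assigns a higher risk score to a randomly chosen positive case than to a randomly chosen negative one. A minimal sketch of this rank-based computation (the labels and scores are toy values, not data from the reviewed studies):

```python
def auc(labels, scores):
    """AUC via the Mann-Whitney U statistic: the probability that a
    randomly chosen positive case receives a higher risk score than
    a randomly chosen negative case (ties count one half)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Perfect separation gives AUC = 1.0; a model no better than chance
# on these toy scores lands at 0.5 (the AUC of a random classifier).
assert auc([1, 1, 0, 0], [0.9, 0.8, 0.3, 0.1]) == 1.0
assert auc([1, 0, 1, 0], [0.8, 0.9, 0.2, 0.1]) == 0.5
```

Under this reading, the 0.54 minimum reported above is barely better than chance, while the XGBoost value of 0.97 means a positive case outranks a negative one about 97% of the time.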

Furthermore, our investigation identified several common suicide risk factors, including age, gender, substance abuse, depression, anxiety, alcohol consumption, marital status, income, education, and occupation. This comprehensive analysis contributes valuable insights into the multifaceted nature of suicide risk, providing a foundation for targeted preventive strategies and intervention efforts.

Conclusions

The effectiveness of ML algorithms and their application in predicting suicide risk has been controversial. There is a need for more studies on these algorithms in clinical settings, and the related ethical concerns require further clarification.

• Understanding various factors associated with suicide is crucial for prevention and intervention efforts.

• Machine learning (ML) could enhance the prediction of suicide attempts.

• The neural network (NN) algorithm exhibited the lowest accuracy of 0.70.

• The random forest demonstrated the highest accuracy for suicide prediction.

Peer Review reports

Introduction

Suicide is a global concern recognized by the World Health Organization (WHO): a life is lost to suicide every 40 s, making suicide prevention a pressing priority worldwide [ 1 ]. This form of violent death not only brings personal tragedy but also poses a significant threat to communities' socio-psychological well-being and stability [ 2 ]. While suicide is a complex phenomenon influenced by multiple factors, behavioral, lifestyle, and clinical factors can significantly contribute to an elevated risk of suicide [ 3 ]. For example, substance use is a significant behavioral risk factor for suicide [ 4 ], job and financial problems are important lifestyle-related risks [ 5 ], and mental disorders are crucial clinical factors associated with suicide risk [ 6 ]. Early identification of risk factors is crucial in predicting suicide [ 7 , 8 ].

Although the prevalence of suicide is exceptionally high among adolescents and young adults, specifically those aged 15 to 44, this pattern is not universal [ 9 ]. Research indicates that, in some countries, the lower risk of suicide among older individuals may be due to their enhanced resilience and greater capacity to cope with adversity, potentially reducing the likelihood of suicidal behavior [ 10 , 11 ]. Gender is another common factor: some studies have found that men are more likely to die by suicide, but this remains controversial because each gender is influenced by many other biological and environmental factors [ 12 ]. Suicide also imposes a financial burden on the healthcare system; for example, in Canada, New Zealand, and Ireland, the estimated direct and indirect costs of each suicide are approximately 443,000, 1.1 million, and 1.4 million pounds, respectively [ 13 , 14 , 15 ]. A comprehensive review of these works leads us to the conclusion that suicide is a global issue.
Consequently, it is imperative for countries worldwide to collaborate in addressing this concern [ 1 ]. There is growing interest in utilizing machine learning (ML) techniques to predict suicide risk and address this issue. ML is a combination of statistical and computational models that can learn from data and improve through experience [ 16 ]. It is categorized into two main types: supervised and unsupervised. In supervised learning, the model is trained on labeled data; in unsupervised learning, the model relies on unlabeled data [ 17 ]. Both supervised and unsupervised algorithms can be utilized for suicide prediction, depending on the type of database and the nature of the prediction.

Research by Walsh, Ribeiro, and Franklin (2017) demonstrated the superior performance of ML over conventional methods in accurately identifying suicide attempts [ 9 ]. ML methods have gained prominence due to their ability to extract valuable insights from diverse datasets and organize data efficiently [ 10 , 11 ]. While ML shows promise in predicting suicide events, it is vital to consider the varied outcomes produced by different ML algorithms. Several studies suggest that while there have been notable scientific advancements in leveraging digital technologies, such as ML algorithms, to prevent suicide attempts and identify at-risk individuals, there are still limitations in terms of training, knowledge, and the integration of databases [ 18 , 19 , 20 ]. Current suicide risk assessment methods rely heavily on subjective questioning, limiting their accuracy and predictive value [ 21 ]. As such, this study aims to systematically review previous research that has applied ML methods to predict suicide attempts and identify patients at high risk of suicide. The primary objectives are to evaluate the performance of various ML algorithms and summarize their effects on suicide prediction. Additionally, the study aims to identify the variables that serve as the most effective suicide risk factors.

Materials and methods

Search strategy and study selection

We adhered to the PRISMA (Preferred Reporting Items for Systematic Reviews and Meta-Analyses) guidelines to systematically identify, select, and assess relevant studies for inclusion in our review. Our search strategy focused on PubMed, Scopus, Web of Science and SID databases, and there were no limitations on the publication date, ensuring comprehensive coverage of the literature. The project was initiated on June 1, 2022, and concluded on August 8, 2023, with a focus on two domains: machine learning (ML) and suicide.

To capture relevant studies, our search strategy incorporated keywords such as “self-harm”, “self-injury”, “self-destruction”, “self-poisoning”, “self-mutilation”, “self-cutting”, “self-burning”, “suicid*”. Additionally, we explored using artificial intelligence and ML techniques to predict suicidal attempts by employing “AND” and “OR” operators. The management of literature was facilitated through Endnote X7.
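As a rough illustration of how such a Boolean search string can be assembled, the sketch below joins the self-harm keywords with OR and combines them with a second keyword group using AND. The ML/AI keyword list shown here is an assumption for illustration; the review does not reproduce its exact ML terms.

```python
# Keyword group taken from the review's stated search terms.
self_harm_terms = ["self-harm", "self-injury", "self-destruction", "self-poisoning",
                   "self-mutilation", "self-cutting", "self-burning", "suicid*"]
# Hypothetical second group; the review's exact ML/AI keywords are not listed.
ml_terms = ["machine learning", "artificial intelligence"]

def boolean_query(group_a, group_b):
    """Join each keyword group with OR, then combine the two groups with AND."""
    a = " OR ".join(f'"{t}"' for t in group_a)
    b = " OR ".join(f'"{t}"' for t in group_b)
    return f"({a}) AND ({b})"

query = boolean_query(self_harm_terms, ml_terms)
```

A string built this way can be pasted into the advanced-search field of databases such as PubMed or Scopus, which accept parenthesized AND/OR syntax.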

The study encompassed two primary outcomes: first, identifying the most effective ML algorithms based on their outcome measures, and second, identifying influential risk factors for predicting suicide. These outcomes were instrumental in achieving a comprehensive understanding of the field and informing our research objectives.

Inclusion & exclusion criteria

Inclusion criteria were applied to identify relevant studies for our review. The following criteria were considered:

Population: Studies that included participants from various age groups, including pediatrics, geriatrics, and all-age populations, were included.

Language: Only studies published in the English language were included.

Methods: Studies employing ML methods to predict suicide were included.

Publication format: Studies published as journal articles, theses, and dissertations were included.

Study design: Various study designs were considered for inclusion, including prospective, prospective cohort, retrospective, retrospective cohort, case-cohort, case-control, cohort, diagnostic/prognostic, longitudinal, longitudinal cohort, longitudinal prospective, prognostic, and randomized controlled trial studies.

Exclusion criteria were applied to select relevant studies for our analysis. Studies were excluded if they met the following criteria:

Population: Studies focusing specifically on military personnel and veterans were excluded. Including military personnel and veterans in our analysis could introduce unique variables and considerations related to their distinct healthcare needs, access to services, and experiences. For example, military personnel and veterans often have specific healthcare requirements stemming from their service-related experiences. These may encompass a range of issues, including physical injuries sustained during deployment, exposure to hazardous environments leading to unique health challenges, and complex medical histories shaped by their military service.

Moreover, their access to healthcare services can differ significantly from that of the general population. To maintain the homogeneity of our study population and to ensure the relevance and applicability of our findings to the specific context of hospitals, we have opted to exclude this subgroup.

Social media-based studies: Studies aiming to predict suicide attempts using ML among adults who posted or searched content related to suicide on internet platforms such as Twitter, Instagram, and Google were excluded.

Natural language processing (NLP) and image processing methods: Studies utilizing NLP and image processing techniques for predicting suicide attempts were excluded.

Publication type: Conference papers, reviews, letters to editors, book chapters, and commentary papers were excluded from the analysis.

By applying these inclusion and exclusion criteria, we aimed to select studies that align with the objectives and focus of our research.

Data collection process

Data extraction was conducted using Microsoft Excel 2016 spreadsheet. The following information was extracted from each included study:

Study title: The title of the research article.

Authors: The names of the authors involved in the study.

Year of publication: The year in which the study was published.

Country of study: The geographical location where the study was conducted.

Population: The target population or participants involved in the study.

Type of study: The research design employed in the study.

Sample size: The number of participants included in the study.

Study objective: The main objective or aim of the study.

Suicide risk factors: Factors or variables considered in predicting suicide risk.

ML models: The specific ML models used in the study.

Outcome measures: Various performance metrics used to evaluate the models, including area under the curve (AUC), sensitivity, specificity, accuracy, false negative rate, false positive rate, true positive rate, negative predictive value, positive predictive value, precision, and recall.
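All of these outcome measures (except AUC) can be derived from the four confusion-matrix counts. The following minimal Python sketch, using illustrative counts rather than data from any included study, shows the standard definitions:

```python
def classification_metrics(tp, fp, tn, fn):
    """Standard outcome measures computed from confusion-matrix counts."""
    return {
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
        "sensitivity": tp / (tp + fn),   # true positive rate / recall
        "specificity": tn / (tn + fp),
        "precision": tp / (tp + fp),     # positive predictive value
        "npv": tn / (tn + fn),           # negative predictive value
        "fpr": fp / (fp + tn),           # false positive rate
        "fnr": fn / (fn + tp),           # false negative rate (missed positives)
    }

# Illustrative counts only (not from the review's data)
m = classification_metrics(tp=80, fp=20, tn=170, fn=30)
```

Note that sensitivity and the false negative rate are complements (they sum to 1), as are specificity and the false positive rate.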

Quality assessment

The quality of the included articles was assessed using the Mixed Methods Appraisal Tool (MMAT 2018) following the search process. We adopted MMAT’s five sets of criteria to evaluate the quality of each type of study included in our analysis, namely qualitative, randomized controlled, nonrandomized, quantitative descriptive, and mixed methods studies [ 8 ]. This rigorous assessment process allowed us to evaluate the included studies’ methodological quality and ensure our findings’ reliability and validity.

Data analysis methods

During the quantitative phase, the extracted data were analyzed using STATA 14.1 statistical software to conduct meta-analytic procedures. We applied the Freeman-Tukey double arcsine transformation to estimate the pooled prevalence of study outcomes and their corresponding 95% confidence intervals (CI); this transformation also stabilizes variances when generating the CIs. A random-effects model based on DerSimonian and Laird’s method was employed in pooling the data to account for between-study variability; this model builds on inverse-variance fixed-effect weights [ 22 , 23 ].
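For readers unfamiliar with these procedures, the sketch below is a bare-bones Python implementation of the Freeman-Tukey double arcsine transform and DerSimonian-Laird pooling. The per-study counts are hypothetical, and the simple sin²(t/2) back-transform is an approximation (the exact inversion uses the harmonic mean sample size); this is an illustration, not the review's STATA code.

```python
import math

def ft_transform(events, n):
    """Freeman-Tukey double arcsine transform of a proportion.
    Returns (transformed value, approximate variance 1/(n + 0.5))."""
    t = (math.asin(math.sqrt(events / (n + 1)))
         + math.asin(math.sqrt((events + 1) / (n + 1))))
    return t, 1.0 / (n + 0.5)

def dersimonian_laird(effects, variances):
    """DerSimonian-Laird random-effects pooling.
    Returns (pooled effect, tau^2, Cochran's Q)."""
    w = [1.0 / v for v in variances]
    fixed = sum(wi * e for wi, e in zip(w, effects)) / sum(w)
    q = sum(wi * (e - fixed) ** 2 for wi, e in zip(w, effects))
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - (len(effects) - 1)) / c)   # between-study variance
    w_star = [1.0 / (v + tau2) for v in variances]  # random-effects weights
    pooled = sum(wi * e for wi, e in zip(w_star, effects)) / sum(w_star)
    return pooled, tau2, q

# Hypothetical per-study accuracy counts: (correct classifications, total)
studies = [(70, 100), (85, 100), (160, 200)]
ts, vs = zip(*(ft_transform(x, n) for x, n in studies))
pooled_t, tau2, q = dersimonian_laird(ts, vs)
# Approximate back-transform to the proportion scale
pooled_p = math.sin(pooled_t / 2) ** 2
```

The I² statistic reported throughout the results can then be obtained as max(0, (Q − df)/Q) × 100%.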

In the qualitative phase, the extracted data were imported into MAXQDA 20 software to facilitate meta-synthesis procedures. This critical stage involved coding the suicide risk factors from the final studies into themes: demographic (e.g., age, gender, marital status), clinical and behavioral (behaviors such as impulsivity, self-harm, or aggression, and clinical factors such as mental health diagnoses and conditions), lifestyle (aspects of an individual’s daily life, including habits and routines), laboratory and biomarkers (e.g., genetic markers, hormonal imbalances), and questionnaires (standardized scales and questionnaires that quantify psychological factors associated with suicide risk). Through this process, we aggregated the coded data to identify common suicide risk factors across all the studies, allowing for a comprehensive understanding of the topic.
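A coding scheme of this kind can be represented as a simple mapping from extracted factors to the five themes. The factor names below are illustrative (the "Beck scale" entry, for instance, is an assumed example instrument, not taken from the review):

```python
# Hypothetical coding scheme mapping extracted factors to the five themes
themes = {
    "demographic": {"age", "gender", "marital status"},
    "clinical_behavioral": {"depression", "anxiety", "impulsivity", "self-harm"},
    "lifestyle": {"alcohol consumption", "substance use", "occupation"},
    "laboratory_biomarkers": {"LDL", "serum cholesterol"},
    "questionnaires": {"Beck scale"},  # assumed example instrument
}

def code_factor(factor):
    """Assign an extracted risk factor to its theme, or None if uncoded."""
    for theme, members in themes.items():
        if factor in members:
            return theme
    return None
```

In practice the coded assignments are then aggregated across studies to find factors shared by multiple papers.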

Publication bias

We searched across multiple languages and databases to address study, citation, and database biases, and we enhanced our search strategy, resulting in the identification of 7,529 publications. This abundance of sources highlights the prevalence of multiple-citation publications within our dataset. Given the common practice of publishing study results in similar or consecutive articles, we utilized EndNote software to identify duplicates and mitigate the risk of multiple-publication bias.
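EndNote's duplicate detection can be approximated programmatically. The sketch below deduplicates hypothetical records on a normalized title-year key, a common heuristic for bibliographic deduplication (this is an illustration, not the review's actual procedure):

```python
import re

def normalize(title):
    """Crude normalization: lowercase, collapse punctuation and whitespace."""
    return re.sub(r"[^a-z0-9]+", " ", title.lower()).strip()

def deduplicate(records):
    """Keep the first record seen for each normalized (title, year) key."""
    seen, unique = set(), []
    for rec in records:
        key = (normalize(rec["title"]), rec.get("year"))
        if key not in seen:
            seen.add(key)
            unique.append(rec)
    return unique

# Hypothetical records: the first two differ only in case and punctuation
records = [
    {"title": "Machine Learning for Suicide Risk", "year": 2020},
    {"title": "machine learning for suicide risk.", "year": 2020},
    {"title": "A Different Study", "year": 2021},
]
unique = deduplicate(records)
```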

Figure 1 presents the PRISMA flow chart, which provides a concise review process overview. The initial search yielded 7,529 published studies. After removing 569 duplicate records, we screened the titles and abstracts of the remaining 6,624 papers. Based on this screening, 5,624 papers were excluded as they did not meet the inclusion criteria. Subsequently, the full texts of the remaining 369 studies were thoroughly assessed to determine their eligibility for inclusion in the analysis. Among these, 328 studies were deemed ineligible as they did not meet the predetermined criteria. Ultimately, 41 studies were selected for the meta-analysis and meta-synthesis, meeting the quality assessment criteria. Overall, the selected studies demonstrated satisfactory quality.

figure 1

PRISMA flow diagram for the selection of studies on ML algorithms used for the purpose of suicide prediction

The included studies had sample sizes ranging from 159 to 13,980,570, as reported in [ 24 , 25 ]. The mean sample size (M = 549,944.51) is the average number of participants across the studies included in our analysis. This value matters for generalizability: larger sample sizes contribute to more robust and reliable results, allowing broader applicability of our conclusions.

The standard deviation (SD = 2,242,858.72) reflects the variability in sample sizes across the individual studies: some studies had substantially larger or smaller samples than the mean, producing a wide dispersion of values. This variability contributes to the heterogeneity of the overall findings and underscores the need to consider sample-size diversity when interpreting the results. The median sample size is 13,420. Most of the studies were conducted in the United States and South Korea, with cohort and case-control designs being the most commonly employed. The participants in these studies predominantly represented the general population. The outcome measures and results of the data collection process are presented below.

Pooled prevalence of ML outcomes

Additional details of the included studies can be found in Table 1 (after the reference section). Note that the statistical analysis revealed no significant difference for the negative predictive value and the false positive rate (p > 0.05). To assess single-study influence on the overall meta-analysis, a sensitivity analysis was performed using a random-effects model; the result showed no evidence that any single study unduly influenced the overall estimates.

Accuracy

Accuracy refers to the ability of ML models to correctly differentiate between healthy and patient groups [ 56 ]. Of the 41 final studies, 13 reported accuracy. The reported accuracy rates varied across studies, as indicated in Panel A of Fig. 2 , with the lowest being 0.70 for a neural network (NN) in [ 30 ] and the highest being 0.94 for a random forest in [ 32 ]. The overall pooled prevalence of accuracy was 0.78 (I² = 56.32%; 95% CI: 0.73, 0.84), Table 2 .

figure 2

Panel A. Accuracy of the machine learning models; N studies = 13

Area under the curve

The area under the curve (AUC) is a metric used in this study to compare the performance of multiple classifiers [ 26 ]. In our analysis, thirty-two studies reported AUC values, as indicated in Fig. 3 , Panel B. Balbuena et al.’s (2022) study reported the lowest AUC of 0.54, based on Cox and random forest models. Choi et al. (2021) reported the highest AUC of 0.97, using the XGBoost classifier. The pooled prevalence of AUC across the studies was estimated to be 0.77 (I² = 95.86%; 95% CI: 0.74, 0.80), Table 2 .
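Equivalently, the AUC is the probability that a randomly chosen positive case is scored above a randomly chosen negative case, with ties counting one half (the Mann-Whitney interpretation). A minimal Python sketch with illustrative scores:

```python
def auc_from_scores(pos_scores, neg_scores):
    """AUC as the probability that a random positive case is ranked above a
    random negative case; ties contribute 0.5 (Mann-Whitney interpretation)."""
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))

# Illustrative risk scores for positive (attempt) and negative (no attempt) cases
auc = auc_from_scores([0.9, 0.8, 0.4], [0.5, 0.3, 0.2])
```

Under this interpretation, an AUC of 0.5 corresponds to random ranking and 1.0 to perfect separation, which is why it is a natural yardstick for comparing classifiers.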

figure 3

Panel B. AUC of the machine learning models; N studies = 32

Precision

Precision is the number of true positives divided by the sum of true positives and false positives [ 27 ]. In our analysis, three studies reported precision values, as depicted in Fig. 4 , Panel C. Two studies reported the highest precision of 0.93: the first, conducted by Choi et al., utilized the XGBoost classifier, and the second, by Kim et al. (2021), employed a random forest model. The lowest precision of 0.86 was documented in the Delgado-Gomez et al. (2016) study, which used a decision tree model. The pooled prevalence of precision was estimated to be 0.91 (I² = 0.001%; 95% CI: 0.85, 0.98), Table 2 .

figure 4

Panel C. Precision of the machine learning models; N studies = 3

Positive predictive value

Positive predictive value (PPV) represents the proportion of true positive cases among all positive predictions [ 27 ]. Among the studies included in our analysis, six reported PPV values, as depicted in Fig. 5 , Panel D. The PPV varied across the studies, ranging from 0.01 in Cho et al.’s studies conducted in 2020 and 2021, which utilized a random forest model, to 0.62 in Navarro’s (2021) study, also employing a random forest model. The pooled prevalence of PPV was estimated to be 0.10 (I² = 97.02%; 95% CI: 0.03, 0.21), Table 2 .

figure 5

Panel D. Positive predictive value of the machine learning models; N studies = 6

True positive rate

The true positive rate (TPR), also known as sensitivity, represents the proportion of actual positive cases correctly identified by the model [ 27 ]. In our analysis, only one study, conducted by Ballester et al. (2021) using a gradient tree boosting model, reported the TPR, as depicted in Fig. 6 , Panel E. The pooled prevalence of TPR in this study was estimated to be 0.77 (95% CI: 0.40, 1.34), Table 2 .

figure 6

Panel E. True positive rate of the machine learning models; N studies = 1

Sensitivity

Sensitivity, also known as the true positive rate, measures the proportion of actual positive cases (patients) correctly identified by the model [ 28 ]. In our analysis, fifteen studies provided data on sensitivity, as illustrated in Fig. 7 , Panel F. Sensitivity ranged from 0.43 in Navarro’s (2021) random forest study to 0.87 in Delgado-Gomez et al.’s (2016) decision tree study. The pooled prevalence of sensitivity was estimated to be 0.69 (I² = 95.94%; 95% CI: 0.60, 0.78), Table 2 .

figure 7

Panel F. Sensitivity of the machine learning models; N studies = 15

Specificity

Specificity measures the proportion of actual negative cases correctly identified by the model [ 28 ]. In our analysis, fifteen studies reported specificity rates, as illustrated in Fig. 8 , Panel G. Specificity ranged from 0.63 in Melhem et al.’s (2019) study using logistic regression to 0.90 in Barak-Corren et al.’s (2017) study using a Naive Bayes classifier. The pooled prevalence of specificity was estimated to be 0.81 (I² = 80.31%; 95% CI: 0.77, 0.86), Table 2 .

figure 8

Panel G. Specificity of the machine learning models; N studies = 15

Recall

Recall measures the proportion of true positive cases correctly identified by the model [ 27 ]. In our analysis, three studies reported recall rates, as depicted in Fig. 9 , Panel H, ranging from 0.11 in McKernan et al.’s (2019) study using bootstrapped L1-penalized regression to 0.95 in Kim et al.’s (2021) study using random forest. The pooled prevalence of recall was estimated to be 0.58 (I² = 98.43%; 95% CI: 0.15, 1.29), Table 2 .

figure 9

Panel H. Recall of the machine learning models; N studies = 3

False negative rate

The false negative rate represents the proportion of actual positive cases incorrectly classified as negative by the model [ 29 ]. Two studies provided data on false negative rates, with rates similar to each other, as shown in Fig. 10 , Panel I. These studies utilized random forest and binary logistic regression models. The pooled prevalence of the false negative rate was estimated to be 0.26 (I² = 0.001%; 95% CI: 0.24, 0.28), Table 2 .

figure 10

Panel I. False negative rate of the machine learning models; N studies = 2

Suicide risk factors

In our meta-synthesis, we examined the 41 included studies and identified 261 suicide risk factors. We implemented a rigorous extraction process to identify the most significant risk factors. While some studies presented vast datasets with over 2,500 candidate risk factors, the focus was on extracting those factors consistently cited as common and important indicators of suicide risk across multiple studies [ 30 , 31 ]. To ensure robustness, we excluded risk factors reported fewer than three times, resulting in a compilation of 55 frequently occurring risk factors. We focused on the more prevalent risk factors to enhance the generalizability of the findings to the broader population. Factors with low frequencies can introduce noise into the analysis, making it harder to identify true patterns; the minimum threshold helped filter out less relevant factors. This decision was based on a focus group session that included two psychiatrists and one emergency physician, who selected the most common variables (those repeated more than three times) based on their scientific knowledge and experience. These factors were categorized into five distinct categories, as outlined in Table 3 .
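The frequency threshold described above can be sketched with a simple counter over factor mentions pooled across studies; the factor names and counts here are hypothetical:

```python
from collections import Counter

def frequent_factors(factor_mentions, min_count=3):
    """Return the risk factors cited at least `min_count` times."""
    counts = Counter(factor_mentions)
    return {f: c for f, c in counts.items() if c >= min_count}

# Hypothetical pooled mentions from the included studies
mentions = ["age", "sex", "depression", "age", "depression", "age",
            "LDL", "depression", "sex", "sex", "income"]
kept = frequent_factors(mentions)
```

Factors mentioned only once or twice (here, "LDL" and "income") are filtered out, mirroring the review's rationale of suppressing low-frequency noise.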

Discussion

This study employed a systematic review, meta-analysis, and meta-synthesis approach to examine the pooled prevalence of ML outcomes for predicting suicide and to provide a comprehensive list of suicide risk factors. The intricate nature of suicide as a behavior is underscored by a diverse array of risk factors, spanning clinical variables to lifestyle influences [ 32 ]. Our study adopted a comprehensive approach, employing both qualitative and quantitative methods. Additionally, the review was limited to prospective, prospective cohort, retrospective, retrospective cohort, case-cohort, case-control, cohort, diagnostic/prognostic, longitudinal, longitudinal cohort, longitudinal prospective, prognostic, and randomized controlled trial designs, owing to the large number of studies in the final stage and to ensure methodological rigor. Ultimately, 41 studies meeting the quality assessment criteria were selected for the meta-analysis and meta-synthesis. Results revealed that the neural network (NN) algorithm had the lowest accuracy at 0.70, contrasting with the random forest, which exhibited the highest accuracy at 0.94. Furthermore, the XGBoost classifier demonstrated the highest area under the curve (AUC) value, reaching 0.97. These findings not only contribute to our understanding of suicide risk factors but also highlight the significance of methodological considerations and algorithmic performance in predictive models.

The findings of this study are consistent with previous research [ 33 , 34 ], which suggested that ML algorithms and the identification of innovative indicators play a valuable role in predicting suicide and detecting mental health issues. However, these findings contradict the results of [ 35 ], which indicated insufficient evidence to support the superior performance of ML over logistic regression in clinical prediction models. The included studies that used ML techniques to predict suicide attempts demonstrated good overall performance for the most commonly used algorithms, notably XGBoost. For example, the AUC values reported in these studies were consistently high, ranging approximately between 0.65 and 0.97. An AUC value of 0.5 indicates a random prediction, while a value of 1 represents a perfect prediction; AUC values around 0.97 for the XGBoost model suggest that the ML models classified individuals’ risk of suicide attempt with a high degree of accuracy. This is consistent with previous research [ 36 ] confirming acceptable performance of the XGBoost algorithm in assessing cognition in patients with major depressive disorder. This result may reflect the fact that XGBoost is an ensemble model that sequentially constructs models, each reducing the classification errors of the previous iteration. According to [ 37 ], certain ML algorithms, such as support vector machines (SVM) and decision trees (DT), are preferred over others due to their superior performance in predicting suicide risk. Furthermore, [ 38 ] confirmed that applying ML techniques to large databases holds great potential for facilitating suicide prediction, offering promising avenues for future research. The results of this study also align with the findings of [ 39 ], which highlighted the ability of ML to enhance suicide prediction models by incorporating a larger number of suicide risk factors.

The applicability of these methods to specific patient groups is invaluable. For example, [ 40 ] indicated that predicting whether a person has a mental illness is itself a significant challenge; if machine learning can offer clinicians a new avenue of hope, it is commendable. However, [ 41 ] found that although these models have demonstrated accuracy in overall classification, their ability to predict future events remains limited in the context of suicide prediction.

Consequently, it is important to note that the performance of ML algorithms can vary depending on various factors, including the quality and size of the dataset, the specific features used as input, the preprocessing steps applied, and the hyperparameters selected for the algorithms. Therefore, the overall performance of these algorithms in predicting suicide showed strong discriminatory power in distinguishing between individuals who are at risk of suicidal attempts and those who are not. Future research should continue exploring and refining ML approaches for suicide prediction, considering these factors to improve the accuracy and reliability of predictions.

The findings of our study revealed that factors such as age, sex, substance abuse, depression, anxiety, alcohol consumption, marital status, income, education, low-density lipoprotein (LDL), and occupation were the most prevalent risk factors in the included studies. Age plays a complex role in suicide, with several studies indicating a higher incidence of suicide among middle-aged and older adults. However, it is important to note that age is not the sole factor contributing to suicidal behavior [ 42 , 43 ]. The prevalence of suicide is also exceptionally high among young adults, specifically those aged 15 to 19, for whom it is the fourth leading cause of death worldwide [ 44 ]. Sex is a significant risk factor for suicide. In general, men are more likely to die by suicide than women, but women attempt suicide more often than men; this may be because men are more likely to use lethal methods [ 42 , 45 , 46 ].

According to the meta-synthesis results, there appears to be a significant correlation between substance abuse and depression and suicide. This correlation may arise because substance abuse can impair judgment and increase impulsivity, while a person who is depressed may experience feelings of hopelessness, helplessness, and despair, which can lead to suicidal thoughts or behaviors. These findings align with the studies conducted by [ 47 , 48 ]. Anxiety, as a mental health condition, can lead to various negative outcomes, including an increased risk of suicide [ 49 ]. Alcohol use can increase impulsivity and decrease inhibitions, leading to risky behaviors such as self-harm or suicide attempts [ 50 ]. One study [ 51 ] found that consuming alcohol while feeling sad or depressed could indicate suicidal behavior in adolescents who had not previously reported having thoughts of suicide before attempting it.

Marital status is a common suicide risk factor. Researchers have found that married individuals have lower suicide rates than their unmarried counterparts. This trend is observed in both men and women across different age groups and cultures [ 52 ]. Low income has been associated with an increased risk of suicide. The reasons for this link are complex and multifactorial, but some possible explanations include limited access to healthcare and mental health services, financial strain, and social isolation [ 53 ]. Lower education levels are also associated with higher suicide rates. This may be because lower education-level individuals have fewer job opportunities and may experience more financial stress [ 53 ]. In addition to the clinical and demographic factors discussed, it is crucial to recognize the significant role that certain biomarkers and laboratory factors play in the vulnerability to suicide. One notable example is the impact of low serum cholesterol levels, which have been found to significantly heighten the risk of suicide [ 54 ]. Some studies have shown that LDL level is an important factor in the incidence of suicide [ 55 ]. Moreover, some studies have indicated that individuals who have committed suicide had higher levels of LDL compared to non-attempters [ 56 ].

Machine learning (ML) techniques are well suited to predicting suicide risk, overcoming the constraints of traditional methods. However, ML requires sufficient and relevant data to train and validate models for early identification of risk factors and suicide prediction. We acknowledge the importance of anticipating and addressing immediate concerns related to suicide in a clinical setting; for this reason, some studies have focused on utilizing certain scales in psychiatric outpatients [ 57 ]. However, reliance solely on these scales may instill an unwarranted sense of assurance among healthcare providers. Hence, it is crucial to factor in data availability and the computational demands of handling extensive datasets and intricate models. Our evaluation underscores the proficiency of ML algorithms in uncovering concealed relationships and delivering precise predictions of suicide risk, contingent upon the judicious selection and meticulous evaluation of algorithms. This underscores the indispensable role of ML algorithms in exhaustively analyzing data and pinpointing crucial risk factors, and it advocates for further exploration in the field. This methodological breadth mirrors the multifaceted nature of suicide risk prediction, enhancing the generalizability of our findings. However, our study may be susceptible to limitations arising from the included studies and the meta-analysis methodology. Additionally, reliance on published literature may introduce publication bias, favoring studies with statistically significant results and potentially skewing the overall findings. We also suggest that future studies report τ² and the Q-statistic to assess heterogeneity. Despite these challenges, our study offers valuable insights into the role of machine learning algorithms in predicting suicide risk and sheds light on important risk factors associated with suicidal behavior. Future research should continue to tackle these methodological hurdles, striving for enhanced standardization and transparency in study reporting to fortify the reliability and reproducibility of findings in this crucial domain of inquiry.

Ethical considerations in the use of ML for suicide prediction

The use of machine learning (ML) for suicide prediction requires careful ethical consideration, as the well-being and rights of individuals and the privacy and confidentiality of their data are crucial. Participants should be fully informed about a study’s purpose, potential risks, and benefits, and have the right to withdraw their consent at any time. Understanding and interpreting the factors and variables that contribute to predictions is important; this transparency is required to gain the trust of both individuals at risk and healthcare professionals. It is equally important that ML algorithms do not replace human intervention and clinical judgment: human oversight is critical in interpreting and acting upon the predictions made by the algorithms. Healthcare professionals should make informed decisions based on ML predictions while considering the individual’s unique circumstances and context [ 58 ].

Conclusion

Suicide is a complex and multifaceted public health issue with significant implications for individuals and communities. Our study examined the application of ML techniques for predicting suicide risk. Our findings highlight the varied performance of ML algorithms in predicting suicide, indicating the need for further investigation and refinement.

Our analysis identified several general risk factors contributing to an individual’s heightened risk of suicide. These factors include age, sex, substance abuse, depression, anxiety, alcohol consumption, marital status, income, education, and occupation. It is important to recognize that these risk factors interact in complex ways, and their presence does not guarantee suicidal behavior. Nonetheless, understanding and addressing these risk factors can aid in developing targeted prevention and intervention strategies.

While ML algorithms have shown promise in predicting suicide risk, their performance can vary depending on the specific dataset and risk factors being considered. Further studies are warranted to explore using ML algorithms across diverse databases encompassing various risk factors. This would allow for a more comprehensive understanding of the predictive capabilities of ML in different contexts and populations.

Moreover, future research should focus on enhancing the interpretability and explainability of ML models in suicide prediction. Understanding the underlying mechanisms and variables contributing to predictions is essential for effective intervention and decision-making. Additionally, rigorous validation and evaluation of ML algorithms should be conducted to assess their accuracy, generalizability, and potential biases.

To advance the field of suicide prediction using ML, collaboration between researchers, clinicians, and policymakers is crucial. This interdisciplinary approach can foster the development of comprehensive and ethical frameworks for implementing ML algorithms in suicide prevention efforts. Ensuring that ML techniques are used responsibly, prioritizing patient well-being, privacy, and equitable outcomes is imperative.

In conclusion, our study sheds light on the potential of ML algorithms in predicting suicide risk. However, further research is needed to refine and validate these algorithms across different datasets and risk factors. By understanding the complexities of suicide and leveraging the power of ML, we can work towards more effective strategies for suicide prevention and intervention.

Availability of data and materials

All data generated or analysed during this study are included in this published article.

References

Organization WH. Live life: preventing suicide: implementation. World Health Organization; 2022.

Furqatovich UF, Sattorovich EZ. Suicide–as a global problem facing humanity. Web Scientist: Int Sci Res J. 2022;3(02):349–54.

Reddy M. Suicide incidence and epidemiology. New Delhi, India: SAGE Publications Sage India; 2010. pp. 77–82.

Diefenbach GJ, et al. Uncovering the role of substance use in suicide attempts using a mixed-methods approach. Suicide Life-Threatening Behav. 2024;54(1):70–82.

Cao Z, et al. Healthy lifestyle and the risk of depression recurrence requiring hospitalisation and mortality among adults with pre-existing depression: a prospective cohort study. BMJ Mental Health. 2024;27(1):e300915.

Brådvik L. Suicide risk and Mental disorders. Int J Environ Res Public Health. 2018;15(9):2028.

Fonseka TM, Bhat V, Kennedy SH. The utility of artificial intelligence in suicide risk prediction and the management of suicidal behaviors. Aust N Z J Psychiatry. 2019;53(10):954–64.

Pluye P, Robert E, Cargo M, Bartlett G, O'Cathain A, Griffiths F. A Mixed Methods Appraisal Tool for systematic mixed studies reviews. 2011. http://mixedmethodsappraisaltoolpublic.pbworks.com . Accessed November 15, 2013.

Amini P, et al. Evaluating the high risk groups for suicide: a comparison of logistic regression, support vector machine, decision Tree and Artificial neural network. Iran J Public Health. 2016;45(9):1179–87.

Lindesay J. Suicide in the elderly. Int J Geriatr Psychiatry. 1991;6(6):355–61. https://doi.org/10.1002/gps.930060605 .

Ormiston CK, et al. Trends in adolescent suicide by method in the US, 1999–2020. JAMA Netw Open. 2024;7(3):e244427.

Schunk DH, Meece JL. Self-efficacy development in adolescence. Self-efficacy Beliefs Adolescents. 2006;5(1):71–96.

O’Dea D, Tucker S. The cost of suicide to society. Wellington: Ministry of Health; 2005.

Clayton D, Barceló A. The cost of suicide mortality in New Brunswick, 1996. Chronic Dis Injuries Can. 1999;20(2):89.

Kennelly B. The economic cost of suicide in Ireland. Crisis. 2007;28(2):89–94.

Kayikci S, Khoshgoftaar TM. Blockchain meets machine learning: a survey. J Big Data. 2024;11(1):9.

Love BC. Comparing supervised and unsupervised category learning. Psychon Bull Rev. 2002;9(4):829–35.

Mendes-Santos C, et al. Understanding mental health professionals' perspectives and practices regarding the implementation of digital mental health: qualitative study. JMIR Form Res. 2022;6(4):e32558.

Bucci S, Schwannauer M, Berry N. The digital revolution and its impact on mental health care. Psychol Psychotherapy: Theory Res Pract. 2019;92(2):277–97.

Bucci S, Berry N, Morris R, Berry K, Haddock G, Lewis S, Edge D. They are not hard-to-reach clients. We have just got hard-to-reach services. Staff views of digital health tools in specialist mental health services. Front Psychiatr. 2019;10:344. https://doi.org/10.3389/fpsyt.2019.00344 .

Freeman MF, Tukey JW. Transformations related to the angular and the Square Root. Ann Math Stat. 1950;21(4):607–11.

DerSimonian R, Laird N. Meta-analysis in clinical trials. Control Clin Trials. 1986;7(3):177–88.

Bai S, et al. Potential biomarkers for diagnosing major depressive disorder patients with suicidal ideation. J Inflamm Res. 2021;14:495–503.

Coley RY, et al. Racial/Ethnic disparities in the performance of Prediction models for death by suicide after Mental Health visits. JAMA Psychiatry. 2021;78(7):726–34.

Hanley JA, McNeil BJ. The meaning and use of the area under a receiver operating characteristic (ROC) curve. Radiology. 1982;143(1):29–36.

Ting KM. Precision and recall. In: Sammut C, Webb GI, editors. Encyclopedia of machine learning. Boston, MA: Springer US; 2010. p. 781.

Baratloo A, Hosseini M, Negida A, El Ashal G. Part 1: simple definition and calculation of accuracy, sensitivity and specificity. Emerg (Tehran). 2015 Spring;3(2):48–9.

Riffenburgh RH. Statistics in medicine. Academic; 2012.

Jiang T, et al. Suicide prediction among men and women with depression: a population-based study. J Psychiatr Res. 2021;142:275–82.

Edgcomb JB, et al. Machine learning to differentiate risk of suicide attempt and self-harm after General Medical hospitalization of Women with Mental Illness. Med Care. 2021;59:S58–64.

de Beurs D, et al. A network perspective on suicidal behavior: understanding suicidality as a complex system. Suicide Life-Threatening Behav. 2021;51(1):115–26.

Burke TA, Ammerman BA, Jacobucci R. The use of machine learning in the study of suicidal and non-suicidal self-injurious thoughts and behaviors: a systematic review. J Affect Disord. 2019;245:869–84.

Chahar R, Dubey AK, Narang SK. A review and meta-analysis of machine intelligence approaches for mental health issues and depression detection. Int J Adv Technol Eng Explor. 2021;8(83):1279.

Christodoulou E, et al. A systematic review shows no performance benefit of machine learning over logistic regression for clinical prediction models. J Clin Epidemiol. 2019;110:12–22.

Zheng S, et al. Can cognition help predict suicide risk in patients with major depressive disorder? A machine learning study. BMC Psychiatry. 2022;22(1):580.

Castillo-Sánchez G, et al. Suicide risk Assessment using machine learning and Social Networks: a scoping review. J Med Syst. 2020;44(12):205.

Bernert RA, Hilberg AM, Melia R, Kim JP, Shah NH, Abnousi F. Artificial intelligence and suicide prevention: A systematic review of machine learning investigations. Int J Environ Res Public Health. 2020;17(16):5929. https://doi.org/10.3390/ijerph17165929 .

Corke M, et al. Meta-analysis of the strength of exploratory suicide prediction models; from clinicians to computers. BJPsych Open. 2021;7(1):e26.

Kumar P, Chauhan R, Stephan T, Shankar A, Thakur S. A Machine learning implementation for mental health care. Application: Smart Watch for depression detection. 2021 11th international conference on cloud computing, data science & engineering (Confluence), Noida, India, 2021. p. 568–574. https://doi.org/10.1109/Confluence51648.2021.9377199 .

Belsher BE, et al. Prediction models for suicide attempts and deaths: a systematic review and Simulation. JAMA Psychiatry. 2019;76(6):642–51.

Cho S-E, Geem ZW, Na K-S. Development of a suicide prediction model for the Elderly using Health Screening Data. Int J Environ Res Public Health. 2021;18(19):10150.

Edgcomb JB, et al. Predicting suicidal behavior and self-harm after general hospitalization of adults with serious mental illness. J Psychiatr Res. 2021;136:515–21.

Hughes JL, et al. Suicide in young people: screening, risk assessment, and intervention. BMJ. 2023;381:e070630.

Fradera M, et al. Can routine primary Care records help in detecting suicide risk? A Population-based case-control study in Barcelona. Archives Suicide Res. 2022;26(3):1395–409.

Choi SB, et al. Ten-year prediction of suicide death using Cox regression and machine learning in a nationwide retrospective cohort study in South Korea. J Affect Disord. 2018;231:8–14.

Chattopadhyay S, Daneshgar F. A study on suicidal risks in psychiatric adults. Int J BioMed Eng Technol. 2011;5(4):390–408.

Chang HB, et al. The role of substance use, smoking, and inflammation in risk for suicidal behavior. J Affect Disord. 2019;243:33–41.

Sareen J, et al. Anxiety disorders and risk for suicidal ideation and suicide attempts: a Population-based longitudinal study of adults. Arch Gen Psychiatry. 2005;62(11):1249–57.

Choi J, Cho S, Ko I, Han S. Identification of risk factors for suicidal ideation and attempt based on machine learning algorithms: A longitudinal survey in Korea (2007-2019). Int J Environ Res Public Health. 2021;18(23):12772. https://doi.org/10.3390/ijerph182312772 .

Schilling EA, et al. Adolescent alcohol use, suicidal ideation, and suicide attempts. J Adolesc Health. 2009;44(4):335–41.

Smith JC, Mercy JA, Conn JM. Marital status and the risk of suicide. Am J Public Health. 1988;78(1):78–80.

Berkelmans G, et al. Identifying socio-demographic risk factors for suicide using data on an individual level. BMC Public Health. 2021;21(1):1702.

Troisi A. Cholesterol in coronary heart disease and psychiatric disorders: same or opposite effects on morbidity risk? Neurosci Biobehavioral Reviews. 2009;33(2):125–32.

Loas G, et al. Relationships between Anhedonia, alexithymia, impulsivity, suicidal ideation, recent suicide attempt, C-reactive protein and serum lipid levels among 122 inpatients with mood or anxious disorders. Psychiatry Res. 2016;246:296–302.

Ma YJ, Zhou YJ, Wang DF, Li Y, Wang DM, Liu TQ, Zhang XY. Association of lipid profile and suicide attempts in a large sample of first episode drug-naive patients with major depressive disorder. Front Psychiatr. 2020;11:543632. https://doi.org/10.3389/fpsyt.2020.543632 .

Barzilay S, et al. Determinants and predictive value of Clinician Assessment of short-term suicide risk. Suicide Life-Threatening Behav. 2019;49(2):614–26.

Castillo-Sánchez G, Acosta MJ, Garcia-Zapirain B, De la Torre I, Franco-Martín M. Application of machine learning techniques to help in the feature selection related to hospital readmissions of suicidal behavior. Int J Ment Health Addict. 2022;18:1–22. https://doi.org/10.1007/s11469-022-00868-0 . Epub ahead of print.

Alexopoulos GS, et al. Modifiable predictors of suicidal ideation during psychotherapy for late-life major depression. A machine learning approach. Translational Psychiatry. 2021;11(1):536.

Balbuena LD, et al. Identifying long-term and imminent suicide predictors in a general population and a clinical sample with machine learning. BMC Psychiatry. 2022;22(1):120.

Ballester PL, et al. 5-year incidence of suicide-risk in youth: a gradient tree boosting and SHAP study. J Affect Disord. 2021;295:1049–56.

Barak-Corren Y, et al. Predicting suicidal behavior from longitudinal electronic health records. Am J Psychiatry. 2017;174(2):154–62.

Bhak Y, et al. Depression and suicide risk prediction models using blood-derived multi-omics data. Translational Psychiatry. 2019;9(1):262.

Cho S-E, Geem ZW, Na K-S. Prediction of suicide among 372,813 individuals under medical check-up. J Psychiatr Res. 2020;131:9–14.

Choi KS, et al. Deep graph neural network-based prediction of acute suicidal ideation in young adults. Sci Rep. 2021;11(1):1–11.

Etter DJ, et al. Suicide screening in primary care: use of an electronic screener to assess suicidality and improve Provider Follow-Up for adolescents. J Adolesc Health. 2018;62(2):191–7.

Ge F, Jiang J, Wang Y, Yuan C, Zhang W. Identifying suicidal ideation among Chinese patients with major depressive disorder: Evidence from a real-world hospital-based study in China. Neuropsychiatr Dis Treat. 2020;16:665–72.  https://doi.org/10.2147/NDT.S238286 .

Hill RM, Oosterhoff B, Do C. Using machine learning to identify suicide risk: a classification tree approach to prospectively identify adolescent suicide attempters. Archives Suicide Res. 2020;24(2):218–35.

Kim S, Lee H-K, Lee K. Detecting suicidal risk using MMPI-2 based on machine learning algorithm. Sci Rep. 2021;11(1):15310.

Haroz EE, et al. Comparing the predictive value of screening to the use of electronic health record data for detecting future suicidal thoughts and behavior in an urban pediatric emergency department: a preliminary analysis. Suicide Life-Threatening Behav. 2021;51(6):1189–202.

Melhem NM, et al. Severity and variability of depression symptoms Predicting suicide attempt in high-risk individuals. JAMA Psychiatry. 2019;76(6):603–13.

Machado CdS, et al. Prediction of suicide attempts in a prospective cohort study with a nationally representative sample of the US population. Psychol Med. 2021;52(14):2985–96.

Miranda O, et al. DeepBiomarker: identifying important lab tests from Electronic Medical Records for the prediction of suicide-related events among PTSD patients. J Personalized Med. 2022;12(4):524.

Zheng L, et al. Development of an early-warning system for high-risk patients for suicide attempt using deep learning and electronic health records. Translational Psychiatry. 2020;10(1):72.

Zhu R, et al. Discriminating suicide attempters and Predicting suicide risk using altered frontolimbic resting-state functional connectivity in patients with bipolar II disorder. Front Psychiatry. 2020;11:597770.

Setoyama D, et al. Plasma metabolites predict severity of depression and suicidal ideation in psychiatric patients-a multicenter pilot analysis. PLoS ONE. 2016;11(12):e0165267.

van Mens K, et al. Predicting future suicidal behaviour in young adults, with different machine learning techniques: a population-based longitudinal study. J Affect Disord. 2020;271:169–77.

Su C, et al. Machine learning for suicide risk prediction in children and adolescents with electronic health records. Translational Psychiatry. 2020;10(1):413.

Forkmann T, et al. Interpersonal theory of suicide: prospective examination. BJPsych Open. 2020;6(5):e113.

Adams RS, et al. Sex-specific risk profiles for suicide among persons with Substance Use disorders in Denmark. Addiction. 2021;116(10):2882–92.

Barak-Corren Y, et al. Validation of an electronic health record-based suicide risk prediction modeling approach across multiple health care systems. JAMA Netw Open. 2020;3(3):e201262.

Barros J, et al. Suicide detection in Chile: proposing a predictive model for suicide risk in a clinical sample of patients with mood disorders. Brazilian J Psychiatry. 2016;39:1–11.

Delgado-Gomez D, et al. Computerized adaptive test vs. decision trees: development of a support decision system to identify suicidal behavior. J Affect Disord. 2016;206:204–9.

Delgado-Gomez D, et al. Improving the accuracy of suicide attempter classification. Artif Intell Med. 2011;52(3):165–8.

Delgado-Gomez D, et al. Suicide attempters classification: toward predictive models of suicidal behavior. Neurocomputing. 2012;92:3–8.

DelPozo-Banos M, et al. Using neural networks with routine health records to identify suicide risk: feasibility study. JMIR Mental Health. 2018;5(2):e10144.

Gradus JL, et al. Predicting Sex-specific nonfatal suicide attempt risk using machine learning and data from Danish National registries. Am J Epidemiol. 2021;190(12):2517–27.

Harman G, et al. Prediction of suicidal ideation and attempt in 9 and 10 year-old children using transdiagnostic risk features. PLoS ONE. 2021;16(5):e0252114.

Navarro MC, et al. Machine learning assessment of early life factors predicting suicide attempt in adolescence or young adulthood. JAMA Netw Open. 2021;4(3):e211450.

McKernan LC, et al. Outpatient Engagement and predicted risk of suicide attempts in Fibromyalgia. Arthritis Care Res. 2019;71(9):1255–63.

Acknowledgements

The authors would like to thank Mohammad Hossein Mehrolhassani for his collaboration in this research.

Funding

This research did not receive any specific grant from funding agencies in the public, commercial, or not-for-profit sectors.

Author information

Authors and Affiliations

School of Engineering and the Built Environment, Anglia Ruskin University, Chelmsford, UK

Houriyeh Ehtemam, Shabnam Sadeghi Esfahlani, Alireza Sanaei & Hassan Shirvani

Health Services Management Research Center, Institute for Futures Studies in Health, Kerman University of Medical Sciences, Kerman, Iran

Mohammad Mehdi Ghaemi & Rohaneh Rahimisadegh

Medical Informatics Research Center, Institute for Futures Studies in Health, Kerman University of Medical Sciences, Kerman, Iran

Sadrieh Hajesmaeel-Gohari & Kambiz Bahaadinbeigy

Department of Computer Engineering, Faculty of Engineering, Shahid Bahonar University of Kerman, Kerman, Iran

Fahimeh Ghasemian

Contributions

Houriyeh Ehtemam: Conceptualization, Investigation, Methodology, Writing – original draft, Data management, Visualization, Meta-analysis & Meta-synthesis. Shabnam Sadeghi Esfahlani: Writing – review & editing. Alireza Sanaei: Meta-analysis. Mohammad Mehdi Ghaemi: Supervision. Sadrieh Hajesmaeel-Gohari: Writing – review & editing. Rohaneh Rahimisadegh: Writing – review & editing, Validation. Kambiz Bahaadinbeigy: Supervision. Fahimeh Ghasemian: editing. Hassan Shirvani: Review, Editing and Supervision.

Corresponding author

Correspondence to Mohammad Mehdi Ghaemi.

Ethics declarations

Ethics approval and consent to participate

Not applicable.

Consent for publication

Competing interests

The authors declare no competing interests.

Additional information

Publisher's note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/ . The Creative Commons Public Domain Dedication waiver ( http://creativecommons.org/publicdomain/zero/1.0/ ) applies to the data made available in this article, unless otherwise stated in a credit line to the data.

About this article

Cite this article

Ehtemam, H., Sadeghi Esfahlani, S., Sanaei, A. et al. Role of machine learning algorithms in suicide risk prediction: a systematic review-meta analysis of clinical studies. BMC Med Inform Decis Mak 24 , 138 (2024). https://doi.org/10.1186/s12911-024-02524-0

Received : 26 September 2023

Accepted : 30 April 2024

Published : 27 May 2024

DOI : https://doi.org/10.1186/s12911-024-02524-0

Keywords

  • Machine learning
  • Risk prediction
  • Suicide prevention
  • Meta-analysis
  • Meta-synthesis

BMC Medical Informatics and Decision Making

ISSN: 1472-6947
