• Survey paper
  • Open access
  • Published: 03 May 2022

A systematic review and research perspective on recommender systems

  • Deepjyoti Roy (ORCID: orcid.org/0000-0002-8020-7145)
  • Mala Dutta

Journal of Big Data volume 9, Article number: 59 (2022)

Recommender systems are efficient tools for filtering online information, which is widespread owing to the changing habits of computer users, personalization trends, and growing access to the internet. Even though recent recommender systems are good at giving precise recommendations, they suffer from various limitations and challenges such as scalability, cold-start and sparsity. Because many techniques exist, selecting among them becomes a complex task when building application-focused recommender systems. In addition, each technique comes with its own set of features, advantages and disadvantages, which raises further questions that should be addressed. This paper aims to conduct a systematic review of recent contributions in the domain of recommender systems, focusing on diverse applications such as books, movies and products. Initially, the various applications of each recommender system are analysed. Then, an algorithmic analysis of the recommender systems is performed and a taxonomy is framed that accounts for the components required for developing an effective recommender system. In addition, the datasets gathered, the simulation platform, and the performance metrics used in each contribution are evaluated and noted. Finally, this review provides a much-needed overview of the current state of research in this field and points out the existing gaps and challenges to help future researchers develop efficient recommender systems.

Introduction

Recent advancements in technology, along with the prevalence of online services, have made it possible to access huge amounts of online information quickly. Users can post reviews, comments, and ratings for various types of services and products available online. However, recent advancements in pervasive computing have resulted in an online data overload problem, which complicates the process of finding relevant and useful content over the internet. Recently established procedures with lower computational requirements can, however, guide users to relevant content much more easily and quickly. Because of this, the development of recommender systems has gained significant attention. In general, recommender systems act as information filtering tools, offering users suitable and personalized content or information. Their primary aim is to reduce the effort and time a user needs to search for relevant information over the internet.

Nowadays, recommender systems are increasingly used for a large number of applications such as web [ 1 , 67 , 70 ], books [ 2 ], e-learning [ 4 , 16 , 61 ], tourism [ 5 , 8 , 78 ], movies [ 66 ], music [ 79 ], e-commerce, news, specialized research resources [ 65 ], television programs [ 72 , 81 ], etc. It is therefore important to build high-quality recommender systems that provide personalized recommendations to users across these applications. Despite the various advances, the present generation of recommender systems requires further improvement to provide more efficient recommendations applicable to a broader range of applications. Further investigation of the latest works on recommender systems, covering diverse applications, is therefore required.

There is hardly any review paper that has categorically synthesized and reviewed the literature across all the classification fields and application domains of recommender systems. The few existing literature reviews in the field cover just a fraction of the articles or focus only on selected aspects such as system evaluation. Thus, they neither provide an overview of the application field and algorithmic categorization nor identify the most promising approaches. Review papers also often neglect to analyze the dataset descriptions and the simulation platforms used. This paper aims to fill this significant gap by reviewing and comparing existing articles on recommender systems based on a defined classification framework, their algorithmic categorization, simulation platforms used, applications focused, their features and challenges, dataset description and system performance. Finally, we provide researchers and practitioners with insight into the most promising directions for further investigation in the field of recommender systems under various applications.

In essence, recommender systems deal with two entities, users and items, where each user gives a rating (or preference value) to an item (or product). User ratings are generally collected by using implicit or explicit methods. Implicit ratings are collected indirectly from the user through the user’s interaction with the items. For example, a website may obtain implicit ratings for different items based on clickstream data or on the amount of time a user spends on a webpage. Explicit ratings, on the other hand, are given directly by the user by picking a value on some finite scale of points or labelled interval values. Most recommender systems gather user ratings through both explicit and implicit methods. The feedback or ratings provided by the user are arranged in a user-item matrix called the utility matrix, as presented in Table 1 .

The utility matrix often contains many missing values. The problem of recommender systems is mainly focused on finding the values which are missing in the utility matrix. This task is often difficult as the initial matrix is usually very sparse because users generally tend to rate only a small number of items. It may also be noted that we are interested in only the high user ratings because only such items would be suggested back to the users. The efficiency of a recommender system greatly depends on the type of algorithm used and the nature of the data source—which may be contextual, textual, visual etc.
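As a minimal sketch of the utility matrix described above, the following Python snippet stores ratings as a dictionary of dictionaries and measures how sparse the matrix is. The users, items and rating values are hypothetical and serve only as an illustration; they are not taken from Table 1.

```python
# Hypothetical explicit ratings as (user, item, rating) triples on a 1-5 scale.
ratings = [
    ("alice", "book_a", 5),
    ("alice", "book_c", 3),
    ("bob", "book_b", 4),
    ("carol", "book_a", 2),
    ("carol", "book_d", 5),
]

users = sorted({u for u, _, _ in ratings})
items = sorted({i for _, i, _ in ratings})

# Utility matrix as a dict of dicts; a missing key means an unknown rating.
utility = {u: {} for u in users}
for user, item, value in ratings:
    utility[user][item] = value


def sparsity(utility, users, items):
    """Fraction of user-item cells that hold no rating."""
    known = sum(len(row) for row in utility.values())
    return 1.0 - known / (len(users) * len(items))


def missing_cells(utility, users, items):
    """The (user, item) pairs whose rating a recommender would have to predict."""
    return [(u, i) for u in users for i in items if i not in utility[u]]


print(f"sparsity: {sparsity(utility, users, items):.2f}")
print(f"cells to predict: {len(missing_cells(utility, users, items))}")
```

Only 5 of the 12 cells are filled here, so 7 values remain to be predicted; real utility matrices are typically far sparser.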

Types of recommender systems

Recommender systems are broadly categorized into three different types viz. content-based recommender systems, collaborative recommender systems and hybrid recommender systems. A diagrammatic representation of the different types of recommender systems is given in Fig.  1 .

Content-based recommender system

In content-based recommender systems, all the data items are collected into different item profiles based on their description or features. For example, in the case of a book, the features will be author, publisher, etc. In the case of a movie, the features will be the movie director, actor, etc. When a user gives a positive rating to an item, then the other items present in that item profile are aggregated together to build a user profile. This user profile combines all the item profiles, whose items are rated positively by the user. Items present in this user profile are then recommended to the user, as shown in Fig.  2 .

One drawback of this approach is that it demands in-depth knowledge of the item features for an accurate recommendation. This knowledge or information may not always be available for all items. Also, this approach has limited capacity to expand on the users' existing choices or interests. However, this approach has many advantages. As user preferences tend to change with time, this approach can adapt quickly and dynamically to the changing preferences. Since a user profile is specific to one user, the algorithm does not require the profile details of any other users, because they have no influence on the recommendation process. This helps protect the security and privacy of user data. If new items have sufficient description, content-based techniques can overcome the cold-start problem, i.e., they can recommend an item even when that item has not been previously rated by any user. Content-based filtering approaches are more common in systems such as personalized news recommenders, publication recommenders and web page recommenders.
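The content-based process can be sketched as follows, assuming each item profile is a set of feature strings and the user profile is the sum of the feature counts of positively rated items. The item names, features and the choice of cosine similarity are illustrative assumptions, not a method prescribed by the reviewed works.

```python
import math
from collections import Counter

# Hypothetical item profiles: each item is described by a set of features.
item_profiles = {
    "movie_a": {"director:x", "genre:scifi"},
    "movie_b": {"director:x", "genre:thriller"},
    "movie_c": {"director:y", "genre:comedy"},
    "movie_d": {"director:z", "genre:scifi"},
}


def build_user_profile(liked_items):
    """Aggregate the feature counts of every positively rated item."""
    profile = Counter()
    for item in liked_items:
        profile.update(item_profiles[item])
    return profile


def cosine(profile, features):
    """Cosine similarity between a weighted profile and a binary feature set."""
    dot = sum(profile[f] for f in features)
    norm_p = math.sqrt(sum(v * v for v in profile.values()))
    norm_f = math.sqrt(len(features))
    return dot / (norm_p * norm_f) if norm_p and norm_f else 0.0


def recommend(liked_items, top_n=2):
    """Rank items the user has not rated by similarity to the user profile."""
    profile = build_user_profile(liked_items)
    candidates = [i for i in item_profiles if i not in liked_items]
    ranked = sorted(
        candidates,
        key=lambda i: cosine(profile, item_profiles[i]),
        reverse=True,
    )
    return ranked[:top_n]


print(recommend(["movie_a", "movie_b"], top_n=1))
```

Note that no other user's data is involved, and an unrated item such as `movie_d` can still be recommended as long as its features are known.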

Collaborative filtering-based recommender system

Collaborative approaches make use of a measure of similarity between users. The technique starts by finding a group of users X whose preferences, likes, and dislikes are similar to those of user A. X is called the neighbourhood of A. New items that are liked by most of the users in X are then recommended to user A. The efficiency of a collaborative algorithm depends on how accurately it can find the neighbourhood of the target user. Traditionally, collaborative filtering-based systems suffer from the cold-start problem and from privacy concerns, as there is a need to share user data. However, collaborative filtering approaches do not require any knowledge of item features for generating a recommendation. This approach can also help to expand on the user’s existing interests by discovering new items. Collaborative approaches are further divided into two types: memory-based approaches and model-based approaches.

Memory-based collaborative approaches recommend new items by taking into consideration the preferences of the target user’s neighbourhood. They make use of the utility matrix directly for prediction. In this approach, the first step is to build a model. The model is a function that takes the utility matrix as input.

Model = f (utility matrix)

Then recommendations are made based on a function that takes the model and user profile as input. Here we can make recommendations only to users whose user profile belongs to the utility matrix. Therefore, to make recommendations for a new user, the user profile must be added to the utility matrix, and the similarity matrix should be recomputed, which makes this technique computation heavy.

Recommendation = f (defined model, user profile) where user profile  ∈  utility matrix

Memory-based collaborative approaches are further sub-divided into two types: user-based collaborative filtering and item-based collaborative filtering. In the user-based approach, the user’s rating of a new item is calculated by finding other users from the user neighbourhood who have previously rated that same item. If a new item receives positive ratings from the user neighbourhood, the new item is recommended to the user. Figure  3 depicts the user-based filtering approach.

Fig. 3: User-based collaborative filtering
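The user-based approach can be sketched as below. The utility matrix is hypothetical, and the use of cosine similarity over co-rated items with a neighbourhood of size k = 2 is an assumption made for illustration; Pearson correlation or other measures are equally common.

```python
import math

# Hypothetical utility matrix: user -> {item: rating on a 1-5 scale}.
utility = {
    "a": {"i1": 5, "i2": 4, "i3": 1},
    "b": {"i1": 5, "i2": 5, "i3": 1, "i4": 5},
    "c": {"i1": 1, "i2": 2, "i3": 5, "i4": 1},
    "d": {"i1": 4, "i2": 4, "i4": 4},
}


def user_sim(u, v):
    """Cosine similarity over the items both users have rated."""
    common = set(utility[u]) & set(utility[v])
    if not common:
        return 0.0
    dot = sum(utility[u][i] * utility[v][i] for i in common)
    norm_u = math.sqrt(sum(utility[u][i] ** 2 for i in common))
    norm_v = math.sqrt(sum(utility[v][i] ** 2 for i in common))
    return dot / (norm_u * norm_v)


def predict(user, item, k=2):
    """Similarity-weighted average of the k most similar users' ratings."""
    raters = [v for v in utility if v != user and item in utility[v]]
    neighbours = sorted(raters, key=lambda v: user_sim(user, v), reverse=True)[:k]
    weights = [user_sim(user, v) for v in neighbours]
    if not neighbours or sum(weights) == 0:
        return None  # nobody in the neighbourhood has rated the item
    weighted = sum(w * utility[v][item] for w, v in zip(weights, neighbours))
    return weighted / sum(weights)


print(predict("a", "i4"))
```

Users `b` and `d` rate like user `a`, so they form the neighbourhood and their high ratings for `i4` produce a high predicted rating, whereas the dissimilar user `c` is left out.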

In the item-based approach, an item-neighbourhood is built consisting of all similar items which the user has rated previously. Then that user’s rating for a different new item is predicted by calculating the weighted average of all ratings present in a similar item-neighbourhood as shown in Fig.  4 .

Fig. 4: Item-based collaborative filtering
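A comparable sketch for the item-based approach follows. Here similarity is computed between items (over the users who rated both), and the prediction is the weighted average of the target user's own ratings on the most similar items. The data and the cosine measure are again illustrative assumptions.

```python
import math

# Hypothetical utility matrix: user -> {item: rating on a 1-5 scale}.
utility = {
    "a": {"i1": 5, "i2": 4, "i3": 1},
    "b": {"i1": 5, "i2": 5, "i3": 1, "i4": 5},
    "c": {"i1": 1, "i2": 2, "i3": 5, "i4": 1},
    "d": {"i1": 4, "i2": 4, "i4": 4},
}


def item_sim(i, j):
    """Cosine similarity between two items over the users who rated both."""
    common = [u for u in utility if i in utility[u] and j in utility[u]]
    if not common:
        return 0.0
    dot = sum(utility[u][i] * utility[u][j] for u in common)
    norm_i = math.sqrt(sum(utility[u][i] ** 2 for u in common))
    norm_j = math.sqrt(sum(utility[u][j] ** 2 for u in common))
    return dot / (norm_i * norm_j)


def predict(user, target, k=2):
    """Weighted average of the user's ratings on the k items most similar to target."""
    rated = [j for j in utility[user] if j != target]
    neighbours = sorted(rated, key=lambda j: item_sim(target, j), reverse=True)[:k]
    weights = [item_sim(target, j) for j in neighbours]
    if not neighbours or sum(weights) == 0:
        return None
    weighted = sum(w * utility[user][j] for w, j in zip(weights, neighbours))
    return weighted / sum(weights)


print(predict("a", "i4"))
```

Items `i1` and `i2` are rated almost identically to `i4` by the other users, so user `a`'s ratings of 5 and 4 on those items dominate the prediction.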

Model-based systems use various data mining and machine learning algorithms to develop a model for predicting the user’s rating for an unrated item. They do not rely on the complete dataset when recommendations are computed but extract features from the dataset to compute a model. Hence the name, model-based technique. These techniques also need two steps for prediction—the first step is to build the model, and the second step is to predict ratings using a function (f) which takes the model defined in the first step and the user profile as input.

Recommendation = f (defined model, user profile) where user profile  ∉  utility matrix

Model-based techniques do not require adding the user profile of a new user into the utility matrix before making predictions. We can make recommendations even to users that are not present in the model. Model-based systems are more efficient for group recommendations. They can quickly recommend a group of items by using the pre-trained model. The accuracy of this technique largely relies on the efficiency of the underlying learning algorithm used to create the model. Model-based techniques are capable of solving some traditional problems of recommender systems such as sparsity and scalability by employing dimensionality reduction techniques [ 86 ] and model learning techniques.
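One common model-based technique is matrix factorization, a form of dimensionality reduction. The sketch below is an illustration under stated assumptions rather than the method of any reviewed paper: it learns latent factors by stochastic gradient descent and then "folds in" a new user by fitting only that user's factor vector against the fixed item factors. The data, hyperparameters and the helper names `train_mf` and `fold_in` are hypothetical.

```python
import random


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def train_mf(ratings, n_factors=2, lr=0.02, reg=0.02, epochs=2000, seed=0):
    """Step 1: learn user factors P and item factors Q from known ratings."""
    rng = random.Random(seed)
    users = sorted({u for u, _, _ in ratings})
    items = sorted({i for _, i, _ in ratings})
    P = {u: [rng.uniform(0.1, 0.5) for _ in range(n_factors)] for u in users}
    Q = {i: [rng.uniform(0.1, 0.5) for _ in range(n_factors)] for i in items}
    for _ in range(epochs):
        for u, i, r in ratings:
            err = r - dot(P[u], Q[i])
            for f in range(n_factors):
                pu, qi = P[u][f], Q[i][f]
                P[u][f] += lr * (err * qi - reg * pu)
                Q[i][f] += lr * (err * pu - reg * qi)
    return P, Q


def rmse(ratings, P, Q):
    """Root mean squared error of the model on the given ratings."""
    total = sum((r - dot(P[u], Q[i])) ** 2 for u, i, r in ratings)
    return (total / len(ratings)) ** 0.5


def fold_in(profile, Q, n_factors=2, lr=0.02, reg=0.02, epochs=2000):
    """Step 2 for an unseen user: fit a factor vector while keeping Q fixed."""
    p = [0.1] * n_factors
    for _ in range(epochs):
        for i, r in profile.items():
            err = r - dot(p, Q[i])
            for f in range(n_factors):
                p[f] += lr * (err * Q[i][f] - reg * p[f])
    return p


# Hypothetical training ratings as (user, item, rating) triples.
ratings = [
    ("a", "i1", 5), ("a", "i2", 4), ("a", "i3", 1),
    ("b", "i1", 5), ("b", "i2", 5), ("b", "i3", 1), ("b", "i4", 5),
    ("c", "i1", 1), ("c", "i2", 2), ("c", "i3", 5), ("c", "i4", 1),
    ("d", "i1", 4), ("d", "i2", 4), ("d", "i4", 4),
]
P, Q = train_mf(ratings)

# A new user who is not part of the utility matrix used for training.
new_user = fold_in({"i1": 5, "i3": 1}, Q)
print(f"training RMSE: {rmse(ratings, P, Q):.3f}")
print(f"predicted rating of new user for i4: {dot(new_user, Q['i4']):.2f}")
```

The item factors are computed once and reused, so a prediction for the new user does not require recomputing a similarity matrix over the whole utility matrix.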

Hybrid filtering

A hybrid technique is an aggregation of two or more techniques employed together for addressing the limitations of individual recommender techniques. The incorporation of different techniques can be performed in various ways. A hybrid algorithm may incorporate the results achieved from separate techniques, or it can use content-based filtering in a collaborative method or use a collaborative filtering technique in a content-based method. This hybrid incorporation of different techniques generally results in increased performance and increased accuracy in many recommender applications. Some of the hybridization approaches are meta-level, feature-augmentation, feature-combination, mixed hybridization, cascade hybridization, switching hybridization and weighted hybridization [ 86 ]. Table 2 describes these approaches.
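Two of the hybridization approaches named above, weighted and switching, can be sketched in a few lines. The score dictionaries, the mixing weight `alpha` and the `min_ratings` threshold are hypothetical values chosen for illustration.

```python
def weighted_hybrid(cf_scores, cb_scores, alpha=0.7):
    """Weighted hybridization: linear blend of collaborative and content-based scores."""
    items = set(cf_scores) | set(cb_scores)
    return {
        i: alpha * cf_scores.get(i, 0.0) + (1 - alpha) * cb_scores.get(i, 0.0)
        for i in items
    }


def switching_hybrid(cf_scores, cb_scores, n_user_ratings, min_ratings=5):
    """Switching hybridization: use content-based scores until the user
    has enough ratings for collaborative filtering to be reliable."""
    return cb_scores if n_user_ratings < min_ratings else cf_scores


# Hypothetical normalized scores produced by two separate recommenders.
cf = {"i1": 0.9, "i2": 0.2}
cb = {"i2": 0.8, "i3": 0.5}

blended = weighted_hybrid(cf, cb, alpha=0.5)
print(max(blended, key=blended.get))
print(switching_hybrid(cf, cb, n_user_ratings=2))
```

In practice the weight would be tuned on held-out data, and the two recommenders' scores need to be on a comparable scale before blending.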

Recommender system challenges

This section briefly describes the various challenges present in current recommender systems and offers different solutions to overcome these challenges.

Cold start problem

The cold start problem appears when the recommender system cannot draw any inference from the existing data because that data is insufficient. Cold start refers to a condition in which the system cannot produce efficient recommendations for cold (or new) users who have not rated any item or have rated very few items. It generally arises when a new user enters the system or when new items (or products) are inserted into the database. Some solutions to this problem are as follows: (a) Ask new users to explicitly mention their item preference. (b) Ask a new user to rate some items at the beginning. (c) Collect demographic information (or meta-data) from the user and recommend items accordingly.
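Solution (c) can be illustrated with a simple fallback: a new user with no ratings receives the items that existing users in the same demographic group rated highly. The groups, ratings and the `min_rating` threshold below are hypothetical.

```python
from collections import defaultdict

# Hypothetical demographic groups and ratings of existing users.
demographics = {"u1": "student", "u2": "student", "u3": "retiree"}
ratings = [
    ("u1", "i1", 5),
    ("u2", "i1", 4),
    ("u2", "i2", 2),
    ("u3", "i3", 5),
]


def group_favourites(group, min_rating=4, top_n=3):
    """Items with a high mean rating among existing users of the given group."""
    by_item = defaultdict(list)
    for user, item, value in ratings:
        if demographics.get(user) == group:
            by_item[item].append(value)
    means = {item: sum(vals) / len(vals) for item, vals in by_item.items()}
    liked = [item for item, mean in means.items() if mean >= min_rating]
    return sorted(liked, key=lambda item: means[item], reverse=True)[:top_n]


# A brand-new user who only told us they are a student.
print(group_favourites("student"))
```

Once the new user has rated enough items, the system can switch to a personalized technique.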

Shilling attack problem

This problem arises when a malicious user fakes his identity and enters the system to give false item ratings [ 87 ]. Such a situation occurs when the malicious user wants to either increase or decrease some item’s popularity by causing a bias on selected target items. Shilling attacks greatly reduce the reliability of the system. One solution to this problem is to detect the attackers quickly and remove the fake ratings and fake user profiles from the system.

Synonymy problem

This problem arises when similar or related items have different entries or names, or when the same item is represented by two or more names in the system [ 78 ], for example, babywear and baby cloth. Many recommender systems fail to recognize that such entries refer to the same item, which reduces their recommendation accuracy. Methods used to alleviate this problem include demographic filtering, automatic term expansion and Singular Value Decomposition [ 76 ].

Latency problem

The latency problem is specific to collaborative filtering approaches and occurs when new items are frequently inserted into the database. It is characterized by the system’s failure to recommend new items, because in a collaborative filtering environment new items must be rated before they can be recommended. Using content-based filtering may resolve this issue, but it may introduce overspecialization and increase computing time, degrading system performance. To improve performance, the calculations can be done offline and clustering-based techniques can be used [ 76 ].

Sparsity problem

Data sparsity is a common problem in large scale data analysis, which arises when certain expected values are missing in the dataset. In the case of recommender systems, this situation occurs when the active users rate very few items. This reduces the recommendation accuracy. To alleviate this problem several techniques can be used such as demographic filtering, singular value decomposition and using model-based collaborative techniques.

Grey sheep problem

The grey sheep problem is specific to pure collaborative filtering approaches and occurs when the feedback given by a user does not match any user neighbourhood. In this situation, the system fails to accurately predict relevant items for that user. This problem can be resolved by using pure content-based approaches, where predictions are made based on the user’s profile and item properties.

Scalability problem

Recommender systems, especially those employing collaborative filtering techniques, require large amounts of training data, which causes scalability problems. The scalability problem arises when the amount of data used as input to a recommender system increases quickly. In this era of big data, more and more items and users are rapidly being added to the system, and this problem is becoming common in recommender systems. Two common approaches to the scalability problem are dimensionality reduction and clustering-based techniques, which search for similar users within small clusters instead of the complete database.
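The clustering idea can be sketched as follows: users are grouped with k-means, and the neighbourhood search for a target user is restricted to that user's cluster. The dense rating vectors are hypothetical, and seeding the centroids with the first k points is a simplification made to keep the example deterministic.

```python
import math


def kmeans(points, k, iters=10):
    """Plain k-means; the first k points seed the centroids (a simplification)."""
    centroids = [list(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iters):
        # Assign every point to its nearest centroid.
        labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


def candidate_neighbours(target_idx, labels):
    """Only users in the target's cluster are compared, not the whole database."""
    return [
        i for i, label in enumerate(labels)
        if label == labels[target_idx] and i != target_idx
    ]


# Hypothetical rating vectors (one row per user, one column per item).
user_vectors = [
    [5, 4, 1],
    [1, 2, 5],
    [5, 5, 1],
    [1, 1, 4],
    [4, 5, 2],
]
labels, _ = kmeans(user_vectors, k=2)
print(labels)
print(candidate_neighbours(0, labels))
```

The clustering can be computed offline, so that at recommendation time similarity is evaluated against a handful of cluster members rather than every user.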

Methodology

The purpose of this study is to understand the research trends in the field of recommender systems. The nature of research in recommender systems is such that it is difficult to confine each paper to a specific discipline. This is reflected in the fact that research papers on recommender systems are scattered across journals in fields such as computer science, management, marketing, information technology and information science. Hence, this literature review is conducted over a wide range of electronic journals and research databases such as ACM Portal, IEEE/IEE Library, Google Scholar and Science Direct [ 88 ].

The search for online research articles was performed using 6 descriptors: “Recommender systems”, “Recommendation systems”, “Movie Recommend*”, “Music Recommend*”, “Personalized Recommend*”, “Hybrid Recommend*”. The following types of documents were excluded from our research:

News articles.

Master’s dissertations.

Non-English papers.

Unpublished papers.

Research papers published before 2011.

We have screened a total of 350 articles based on their abstracts and content. However, only research papers that described how recommender systems can be applied were chosen. Finally, 60 papers were selected from top international journals indexed in Scopus or E-SCI in 2021. We now present the PRISMA flowchart of the inclusion and exclusion process in Fig.  5 .

Fig. 5: PRISMA flowchart of the inclusion and exclusion process. * Abstract and content not suitable to the study. ** The use or application of the recommender system is not specified.

Each paper was carefully reviewed and classified into 6 categories in the application fields and 3 categories in the techniques used to develop the system. The classification framework is presented in Fig.  6 .

Fig. 6: Classification framework

The largest share of relevant articles comes from Expert Systems with Applications (23%), followed by IEEE (17%), Knowledge-Based Systems (17%) and Others (43%). Table 3 depicts the article distribution by journal title and Table 4 depicts the sector-wise article distribution.

Both forward and backward searching techniques were implemented to establish that the review of 60 chosen articles can represent the domain literature. Hence, this paper can demonstrate its validity and reliability as a literature review.

Review on state-of-the-art recommender systems

This section presents a state-of-the-art literature review followed by a chronological review of the various existing recommender systems.

Literature review

In 2011, Castellano et al. [ 1 ] developed a “NEuro-fuzzy WEb Recommendation (NEWER)” system for exploiting the possibility of combining computational intelligence and user preference for suggesting interesting web pages to the user in a dynamic environment. It considered a set of fuzzy rules to express the correlations between user relevance and categories of pages. Crespo et al. [ 2 ] presented a recommender system for distance education over the internet. It aimed to recommend e-books to students using data from user interaction. The system was developed using a collaborative approach and focused on solving the data overload problem in big digital content. Lin et al. [ 3 ] put forward a recommender system for automatic vending machines using Genetic algorithm (GA), k-means, Decision Tree (DT) and Bayesian Network (BN). It aimed at recommending localized products by developing a hybrid model combining statistical methods, classification methods, clustering methods, and meta-heuristic methods. Wang and Wu [ 4 ] implemented a ubiquitous learning system for providing personalized learning assistance to the learners by combining the recommendation algorithm with a context-aware technique. It employed the Association Rule Mining (ARM) technique and aimed to increase the effectiveness of the learner’s learning. García-Crespo et al. [ 5 ] presented a “semantic hotel” recommender system by considering the experiences of consumers using a fuzzy logic approach. The system considered both hotel and customer characteristics. Dong et al. [ 6 ] proposed a structure for a service-concept recommender system using a semantic similarity model by integrating the techniques from the view of an ontology structure-oriented metric and a concept content-oriented metric. The system was able to deliver optimal performance when compared with similar recommender systems. Li et al. [ 7 ] developed a Fuzzy linguistic modelling-based recommender system for assisting users to find experts in knowledge management systems. The developed system was applied to the aircraft industry where it demonstrated efficient and feasible performance. Lorenzi et al. [ 8 ] presented an “assumption-based multiagent” system to make travel package recommendations using user preferences in the tourism industry. It performed different tasks like discovering, filtering, and integrating specific information for building a travel package following the user requirement. Huang et al. [ 9 ] proposed a context-aware recommender system through the extraction, evaluation and incorporation of contextual information gathered using the collaborative filtering and rough set model.

In 2012, Chen et al. [ 10 ] presented a diabetes medication recommender model by using “Semantic Web Rule Language (SWRL) and Java Expert System Shell (JESS)” for aggregating suitable prescriptions for the patients. It aimed at selecting the most suitable drugs from the list of specific drugs. Mohanraj et al. [ 11 ] developed the “Ontology-driven bee’s foraging approach (ODBFA)” to accurately predict the online navigations most likely to be visited by a user. The self-adaptive system is intended to capture the various requirements of the online user by using a scoring technique and by performing a similarity comparison. Hsu et al. [ 12 ] proposed a “personalized auxiliary material” recommender system that considers the specific course topics, individual learning styles, and complexity of the auxiliary materials using an artificial bee colony algorithm. Gemmell et al. [ 13 ] demonstrated a solution for the problem of resource recommendation in social annotation systems. The model was developed using a linear-weighted hybrid method which was capable of providing recommendations under different constraints. Choi et al. [ 14 ] proposed a “Hybrid Online-Product rEcommendation (HOPE) system” by integrating collaborative filtering through sequential pattern analysis-based recommendations and implicit ratings. Garibaldi et al. [ 15 ] put forward a technique for incorporating variability in a fuzzy inference model by using non-stationary fuzzy sets to replicate the variability of human decision-making. This model was applied to a decision problem for treatment recommendations of post-operative breast cancer.

In 2013, Salehi and Kamalabadi [ 16 ] proposed an e-learning material recommender system by “modelling of materials in a multidimensional space of material’s attribute”. It employed both content and collaborative filtering. Aher and Lobo [ 17 ] introduced a course recommender system using data mining techniques such as simple K-means clustering and Association Rule Mining (ARM) algorithm. The proposed e-learning system was successfully demonstrated for “MOOC (Massive Open Online Courses)”. Kardan and Ebrahimi [ 18 ] developed a hybrid recommender system for recommending posts in asynchronous discussion groups. The system was built combining both collaborative filtering and content-based filtering. It considered implicit user data to compute the user similarity with various groups, for recommending suitable posts and contents to its users. Chang et al. [ 19 ] adopted a cloud computing technology for building a TV program recommender system. The system designed for digital TV programs was implemented using Hadoop Fair Scheduler (HFC), K-means clustering and k-nearest neighbour (KNN) algorithms. It was successful in processing huge amounts of real-time user data. Lucas et al. [ 20 ] implemented a recommender model for assisting a tourism application by using associative classification and fuzzy logic to predict the context. Niu et al. [ 21 ] introduced “Affivir: An Affect-based Internet Video Recommendation System” which was developed by calculating user preferences and by using spectral clustering. This model recommended videos with similar effects, which was processed to get optimal results with dynamic adjustments of recommendation constraints.

In 2014, Liu et al. [ 22 ] implemented a new route recommendation model for offering personalized and real-time route recommendations for self-driven tourists to minimize the queuing time and traffic jams in famous tourist places. Recommendations were carried out by considering the preferences of users. Bakshi et al. [ 23 ] proposed an unsupervised learning-based recommender model for solving the scalability problem of recommender systems. The algorithm used transitive similarities along with Particle Swarm Optimization (PSO) technique for discovering the global neighbours. Kim and Shim [ 24 ] proposed a recommender system based on “latent Dirichlet allocation using probabilistic modelling for Twitter” that could recommend the top-K tweets for a user to read, and the top-K users to follow. The model parameters were learned from an inference technique by using the differential Expectation–Maximization (EM) algorithm. Wang et al. [ 25 ] developed a hybrid-movie recommender model by aggregating a genetic algorithm (GA) with improved K-means and Principal Component Analysis (PCA) technique. It was able to offer intelligent movie recommendations with personalized suggestions. Kolomvatsos et al. [ 26 ] proposed a recommender system by considering an optimal stopping theory for delivering books or music recommendations to the users. Gottschlich et al. [ 27 ] proposed a decision support system for stock investment recommendations. It computed the output by considering the overall crowd’s recommendations. Torshizi et al. [ 28 ] introduced a hybrid recommender system to determine the severity level of a medical condition. It could recommend suitable therapies for patients suffering from Benign Prostatic Hyperplasia.

In 2015, Zahálka et al. [ 29 ] proposed a venue recommender: “City Melange”. It was an interactive content-based model which used the convolutional deep-net features of the visual domain and the linear Support Vector Machine (SVM) model to capture the semantic information and extract latent topics. Sankar et al. [ 30 ] have proposed a stock recommender system based on the stock holding portfolio of trusted mutual funds. The system employed the collaborative filtering approach along with social network analysis for offering a decision support system to build a trust-based recommendation model. Chen et al. [ 31 ] have put forward a novel movie recommender system by applying the “artificial immune network to collaborative filtering” technique. It computed the affinity of an antigen and the affinity between an antibody and antigen. Based on this computation a similarity estimation formula was introduced which was used for the movie recommendation process. Wu et al. [ 32 ] have examined the technique of data fusion for increasing the efficiency of item recommender systems. It employed a hybrid linear combination model and used a collaborative tagging system. Yeh and Cheng [ 33 ] have proposed a recommender system for tourist attractions by constructing the “elicitation mechanism using the Delphi panel method and matrix construction mechanism using the repertory grids”, which was developed by considering the user preference and expert knowledge.

In 2016, Liao et al. [ 34 ] proposed a recommender model for online customers using a rough set association rule. The model computed the probable behavioural variations of online consumers and provided product category recommendations for e-commerce platforms. Li et al. [ 35 ] have suggested a movie recommender system based on user feedback collected from microblogs and social networks. It employed the sentiment-aware association rule mining algorithm for recommendations using the prior information of frequent program patterns, program metadata similarity and program view logs. Wu et al. [ 36 ] have developed a recommender system for social media platforms by aggregating the technique of Social Matrix Factorization (SMF) and Collaborative Topic Regression (CTR). The model was able to compute the ratings of users to items for making recommendations. For improving the recommendation quality, it gathered information from multiple sources such as item properties, social networks, feedback, etc. Adeniyi et al. [ 37 ] put forward a study of automated web-usage data mining and developed a recommender system that was tested in both real-time and online for identifying the visitor’s or client’s clickstream data.

In 2017, Rawat and Kankanhalli [ 38 ] proposed a viewpoint recommender system called “ClickSmart” for assisting mobile users to capture high-quality photographs at famous tourist places. Yang et al. [ 39 ] proposed a gradient boosting-based job recommendation system for satisfying the cost-sensitive requirements of the users. The hybrid algorithm aimed to reduce the rate of unnecessary job recommendations. Lee et al. [ 40 ] proposed a music streaming recommender system based on smartphone activity usage. The proposed system benefitted by using feature selection approaches with machine learning techniques such as Naive Bayes (NB), Support Vector Machine (SVM), Multi-layer Perceptron (MLP), Instance-based k-Nearest Neighbour (IBK), and Random Forest (RF) for performing the activity detection from the mobile signals. Wei et al. [ 41 ] proposed a new stacked denoising autoencoder (SDAE) based recommender system for cold items. The algorithm employed deep learning and a collaborative filtering method to predict the unknown ratings.

In 2018, Li et al. [ 42 ] have developed a recommendation algorithm using Weighted Linear Regression Models (WLRRS). The proposed system was put to experiment using the MovieLens dataset and it presented better classification and predictive accuracy. Mezei and Nikou [ 43 ] presented a mobile health and wellness recommender system based on fuzzy optimization. It could recommend a collection of actions to be taken by the user to improve the user’s health condition. Recommendations were made considering the user’s physical activities and preferences. Ayata et al. [ 44 ] proposed a music recommendation model based on the user emotions captured through wearable physiological sensors. The emotion detection algorithm employed different machine learning algorithms like SVM, RF, KNN and decision tree (DT) algorithms to predict the emotions from the changing electrical signals gathered from the wearable sensors. Zhao et al. [ 45 ] developed a multimodal learning-based, social-aware movie recommender system. The model was able to successfully resolve the sparsity problem of recommender systems. The algorithm developed a heterogeneous network by exploiting the movie-poster image and textual description of each movie based on the social relationships and user ratings.

In 2019, Hammou et al. [ 46 ] proposed a Big Data recommendation algorithm capable of handling large scale data. The system employed random forest and matrix factorization through a data partitioning scheme. It was then used for generating recommendations based on user rating and preference for each item. The proposed system outperformed existing systems in terms of accuracy and speed. Zhao et al. [ 47 ] put forward a hybrid initialization method for social network recommender systems. The algorithm employed a denoising autoencoder (DAE) neural network-based initialization method (ANNInit) and attribute mapping. Bhaskaran and Santhi [ 48 ] developed a hybrid, trust-based e-learning recommender system using cloud computing. The proposed algorithm was capable of learning online user activities by using the Firefly Algorithm (FA) and K-means clustering. Afolabi and Toivanen [ 59 ] suggested an integrated recommender model based on collaborative filtering. The proposed model, “Connected Health for Effective Management of Chronic Diseases”, aimed to integrate recommender systems for better decision-making in the process of disease management. He et al. [ 60 ] proposed a movie recommender system called “HI2Rec” which explored the usage of collaborative filtering and heterogeneous information for making movie recommendations. The model used the knowledge representation learning approach to embed movie-related information gathered from different sources.

In 2020, Han et al. [ 49 ] proposed an Internet of Things (IoT)-based cancer rehabilitation recommendation system using the Beetle Antennae Search (BAS) algorithm. It offered patients a solution to the problem of finding an optimal nutrition program, using the recurrence time as the objective function. Kang et al. [ 50 ] presented a recommender system for personalized advertisements in online broadcasting based on a tree model. Recommendations were generated in real-time by considering the user preferences to minimize the overhead of preference prediction and using a HashMap along with the tree characteristics. Ullah et al. [ 51 ] implemented an image-based service recommendation model for online shopping based on random forest and Convolutional Neural Networks (CNN). The model used JPEG coefficients to achieve an accurate prediction rate. Cai et al. [ 52 ] proposed a new hybrid recommender model using a many-objective evolutionary algorithm (MaOEA). The proposed algorithm was successful in optimizing the novelty, diversity, and accuracy of recommendations. Esteban et al. [ 53 ] implemented a hybrid multi-criteria recommendation system concerned with students’ academic performance, personal interests, and course selection. The system was developed using a Genetic Algorithm (GA) and aimed at helping university students. It combined both course information and student information for increasing system performance and the reliability of the recommendations. Mondal et al. [ 54 ] built a multilayer, graph data model-based doctor recommendation system by exploiting the concept of trust in the patient-doctor relationship. The proposed system showed good results in practical applications.

In 2021, Dhelim et al. [ 55 ] developed a personality-based product recommending model using the techniques of meta path discovery and user interest mining. This model showed better results when compared to session-based and deep learning models. Bhalse and Thakur [ 56 ] proposed a web-based movie recommendation system based on collaborative filtering using Singular Value Decomposition (SVD) and cosine similarity (CS) for addressing the sparsity problem of recommender systems. It suggested a recommendation list by considering the content information of movies. Similarly, to solve both sparsity and cold-start problems, Ke et al. [ 57 ] proposed a dynamic goods recommendation system based on reinforcement learning. The proposed system was capable of learning from the reduced entropy loss error on real-time applications. Chen et al. [ 58 ] presented a movie recommender model combining various techniques like user interest with category-level representation, neighbour-assisted representation, user interest with latent representation and item-level representation using a Feed-forward Neural Network (FNN).

Comparative chronological review

A comparative chronological review to compare the total contributions on various recommender systems in the past 10 years is given in Fig.  7 .

Fig. 7 Comparative chronological review of recommender systems under diverse applications

This review puts forward a comparison of the number of research works proposed in the domain of recommender systems from the year 2011 to 2021 using various deep learning and machine learning-based approaches. Research articles are categorized based on the recommender system classification framework as shown in Table 5 . The articles are ordered according to their year of publication. There are two key concepts: Application fields and techniques used. The application fields of recommender systems are divided into six different fields, viz. entertainment, health, tourism, web/e-commerce, education and social media/others.

Algorithmic categorization, simulation platforms and applications considered for various recommender systems

This section analyses different methods like deep learning, machine learning, clustering and meta-heuristic-based approaches used in the development of recommender systems. The algorithmic categorization of different recommender systems is given in Fig.  8 .

Fig. 8 Algorithmic categorization of different recommender systems

Categorization is done based on content-based, collaborative filtering-based, and optimization-based approaches. In [ 8 ], a content-based filtering technique was employed for increasing the ability to trust other agents and for improving the exchange of information by trust degree. In [ 16 ], it was applied to enhance the quality of recommendations using the account attributes of the material. It achieved better performance in terms of F1-score, recall and precision. In [ 18 ], this technique was able to capture the implicit user feedback, increasing the overall accuracy of the proposed model. The content-based filtering in [ 30 ] was able to increase the accuracy and performance of a stock recommender system by using the “trust factor” for making decisions.

Different collaborative filtering approaches are utilized in recent studies, which are categorized as follows:

Model-based techniques

The Neuro-Fuzzy [ 1 ] technique helps in discovering the association between user categories and item relevance. It is also simple to understand. K-Means Clustering [ 2 , 19 , 25 , 48 ] is efficient for large scale datasets. It is simple to implement and gives a fast convergence rate. It also offers automatic recovery from failures. The decision tree [ 2 , 44 ] technique is easy to interpret. It can be used for solving the classic regression and classification problems in recommender systems. Bayesian Network [ 3 ] is a probabilistic technique used to solve classification challenges. It is based on Bayes’ theorem and conditional probability. Association Rule Mining (ARM) techniques [ 4 , 17 , 35 ] extract rules for predicting the occurrence of an item by considering the existence of other items in a transaction. This method uses the association rules to create a more suitable representation of data and helps in increasing the model performance and storage efficiency. Fuzzy Logic [ 5 , 7 , 15 , 20 , 28 , 43 ] techniques use a set of flexible rules. They focus on solving complex real-time problems involving imprecise data. These techniques provide scalability and help in increasing the overall model performance for recommender systems. The semantic similarity [ 6 ] technique describes a topological similarity that defines the distance among concepts and terms through ontologies. It measures the similarity information for increasing the efficiency of recommender systems. Rough set [ 9 , 34 ] techniques use probability distributions for solving the challenges of existing recommender models. Semantic web rule language [ 10 ] can efficiently extract the dataset features and increase the model efficiency. Linear programming-based approaches [ 13 , 42 ] are employed for achieving quality decision making in recommender models. Sequential pattern analysis [ 14 ] is applied to find suitable patterns among data items. This helps in increasing model efficiency. The probabilistic model [ 24 ] is a well-known tool for handling uncertainty in risk computations and performance assessment. It offers better decision-making capabilities. The K-nearest neighbours (KNN) [ 19 , 37 , 44 ] technique provides fast computation, simplicity and ease of interpretation. It is well suited to classification and regression problems and offers good accuracy. Spectral clustering [ 21 ], also called graph clustering or similarity-based clustering, mainly focuses on reducing the space dimensionality in identifying the dataset items. The stochastic learning algorithm [ 26 ] solves the real-time challenges of recommender systems. Linear SVM [ 29 , 44 ] efficiently solves the high dimensional problems related to recommender systems. It is a memory-efficient method and works well with a large number of samples having relative separation among the classes. This method has been shown to perform well even when new or unfamiliar data is added. The Relational Functional Gradient Boosting [ 39 ] technique works efficiently on the relational dependency of data, which is useful for statistical relational learning in collaborative-based recommender systems. Ensemble learning [ 40 ] combines the predictions of two or more models and aims to achieve better performance than any of the single contributing models. It also helps in reducing overfitting problems, which are common in recommender systems.
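As an illustration of the K-nearest neighbours technique described above, the following is a minimal sketch of user-based collaborative filtering, not the method of any cited work. The function names, the toy rating dictionary and the choice of cosine similarity over co-rated items are all assumptions made for this example.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity computed over the items both users have rated.

    `a` and `b` are dicts mapping item id -> rating (hypothetical format).
    """
    common = set(a) & set(b)
    if not common:
        return 0.0
    dot = sum(a[i] * b[i] for i in common)
    norm_a = math.sqrt(sum(a[i] ** 2 for i in common))
    norm_b = math.sqrt(sum(b[i] ** 2 for i in common))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def predict_rating(ratings, user, item, k=2):
    """Similarity-weighted average of the k most similar users' ratings.

    Returns None when no neighbour has rated the item.
    """
    neighbours = [
        (cosine_similarity(ratings[user], ratings[other]), ratings[other][item])
        for other in ratings
        if other != user and item in ratings[other]
    ]
    top_k = sorted(neighbours, reverse=True)[:k]
    weight_sum = sum(sim for sim, _ in top_k)
    if weight_sum == 0:
        return None
    return sum(sim * rating for sim, rating in top_k) / weight_sum


# Toy data (assumed for illustration only).
ratings = {
    "alice": {"m1": 5, "m2": 3},
    "bob": {"m1": 5, "m2": 3, "m3": 4},
    "carol": {"m1": 1, "m2": 5, "m3": 2},
}
print(predict_rating(ratings, "alice", "m3", k=2))
```

In this toy data, the prediction for "alice" leans towards the rating given by "bob", whose past ratings match hers, rather than the rating given by "carol".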

SDAE [ 41 ] is used for learning the non-linear transformations with different filters for finding suitable data. This aids in increasing the performance of recommender models. Multimodal network learning [ 45 ] is efficient for multi-modal data, producing a combined representation of diverse modalities. Random forest [ 46 , 51 ] is a commonly used classifier. It has been shown to increase accuracy when handling big data. This technique trains a collection of decision trees on diverse data samples to minimize variance. ANNInit [ 47 ] is a type of artificial neural network-based technique that has the capability of self-learning and generating efficient results. It is independent of the data type and can learn data patterns automatically. HashMap [ 50 ] gives faster access to elements owing to the hashing methodology, which decreases the data processing time and increases the performance of the system. The CNN [ 51 ] technique can automatically extract the significant features of a dataset without any supervision. It is a computationally efficient method and provides accurate recommendations. This technique is also simple and fast to implement. The multilayer graph data model [ 54 ] is efficient for real-time applications; it minimizes the access time by mapping correlations as edges among nodes and provides superior performance. Singular Value Decomposition [ 56 ] can simplify the input data and increase the efficiency of recommendations by eliminating the noise present in data. Reinforcement learning [ 57 ] is efficient for practical scenarios of recommender systems having large data sizes. It is capable of boosting the model performance by increasing the model accuracy even for large scale datasets. FNN [ 58 ] is one of the artificial neural network techniques which can learn non-linear and complex relationships between items. It has demonstrated a good performance increase when employed in different recommender systems. Knowledge representation learning [ 60 ] systems aim to simplify the model development process by increasing the acquisition efficiency, inferential efficiency, inferential adequacy and representation adequacy. User-based approaches [ 2 , 55 , 59 ] specialize in detecting user-related meta-data which is employed to increase the overall model performance. This technique is more suitable for real-time applications where it can capture user feedback and use it to improve the user experience.
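The simplifying role attributed to Singular Value Decomposition above can be sketched with a rank-1 approximation of a small rating matrix. This is only an illustrative sketch: it recovers just the dominant singular triplet by power iteration (a swapped-in, simplified technique rather than a full SVD), and the function names and toy matrix are assumptions for this example.

```python
import math


def dominant_singular_triplet(matrix, iterations=100):
    """Return (sigma, u, v) for the largest singular value of `matrix`.

    Uses power iteration on A^T A; assumes a non-zero matrix given as a
    list of equal-length rows.
    """
    n_rows, n_cols = len(matrix), len(matrix[0])
    v = [1.0 / math.sqrt(n_cols)] * n_cols
    for _ in range(iterations):
        # u = A v, then w = A^T u, then renormalise w to get the next v.
        u = [sum(matrix[i][j] * v[j] for j in range(n_cols)) for i in range(n_rows)]
        w = [sum(matrix[i][j] * u[i] for i in range(n_rows)) for j in range(n_cols)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    u = [sum(matrix[i][j] * v[j] for j in range(n_cols)) for i in range(n_rows)]
    sigma = math.sqrt(sum(x * x for x in u))
    u = [x / sigma for x in u]
    return sigma, u, v


def rank1_reconstruction(sigma, u, v):
    """Rebuild the rank-1 matrix sigma * u * v^T."""
    return [[sigma * ui * vj for vj in v] for ui in u]


# Toy matrix (assumed): every row is a multiple of [1, 2], so it has rank 1.
A = [[2.0, 4.0], [1.0, 2.0], [3.0, 6.0]]
sigma, u, v = dominant_singular_triplet(A)
approx = rank1_reconstruction(sigma, u, v)
```

For a matrix that is exactly rank 1, as in the toy data, the reconstruction matches the input; for real rating data, keeping only the largest singular values discards the smaller components, which is the sense in which the decomposition is said to remove noise.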

Optimization-based techniques

The Foraging Bees [ 11 ] technique enables both functional and combinational optimization for random searching in recommender models. Artificial bee colony [ 12 ] is a swarm-based meta-heuristic technique that provides features like a faster convergence rate, the ability to handle objectives of a stochastic nature, ease of integration with other algorithms, fewer control parameters, strong robustness, high flexibility and simplicity. Particle Swarm Optimization [ 23 ] is a computational optimization technique that offers better computational efficiency and robustness to control parameters, and is simple to implement in recommender systems. The portfolio optimization algorithm [ 27 ] is a subclass of optimization algorithms that finds its application in stock investment recommender systems. It works well in real-time and helps in the diversification of the portfolio for maximum profit. The artificial immune system [ 31 ] is a computationally intelligent machine learning technique. This technique can learn new patterns in the data and optimize the overall system parameters. Expectation maximization (EM) [ 32 , 36 , 38 ] is an iterative algorithm that seeks maximum-likelihood parameter estimates when some of the variables are unobserved. Delphi panel and repertory grid [ 33 ] techniques offer efficient decision making by solving the dimensionality problem and data sparsity issues of recommender systems. The Firefly algorithm (FA) [ 48 ] provides fast results and increases recommendation efficiency. It is capable of reducing the number of iterations required to solve specific recommender problems. It also provides both local and global sets of solutions. Beetle Antennae Search (BAS) [ 49 ] offers superior search accuracy and keeps time complexity low, which improves the performance of recommendations. The many-objective evolutionary algorithm (MaOEA) [ 52 ] is applicable to real-time, multi-objective, search-related recommender systems. The introduction of a local search operator increases the convergence rate and yields suitable results. Genetic Algorithm (GA) [ 2 , 22 , 25 , 53 ] based techniques are used to solve the multi-objective optimization problems of recommender systems. They employ probabilistic transition rules and have a simpler operation that provides better recommender performance.
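To make the swarm-based optimization idea above concrete, the following is a minimal sketch of Particle Swarm Optimization. It is a textbook-style variant, not the implementation of any cited work; the function name, the parameter values and the toy objective (a hypothetical pair of hybrid blending weights with an assumed optimum at 0.3 and 0.7) are all assumptions for illustration.

```python
import random


def pso_minimize(objective, dim, bounds, n_particles=20, iterations=100,
                 inertia=0.7, c_personal=1.5, c_global=1.5, seed=0):
    """Minimise `objective` over a box using a basic global-best PSO."""
    rng = random.Random(seed)
    low, high = bounds
    positions = [[rng.uniform(low, high) for _ in range(dim)]
                 for _ in range(n_particles)]
    velocities = [[0.0] * dim for _ in range(n_particles)]
    personal_best = [p[:] for p in positions]
    personal_score = [objective(p) for p in positions]
    best_index = min(range(n_particles), key=lambda i: personal_score[i])
    global_best = personal_best[best_index][:]
    global_score = personal_score[best_index]

    for _ in range(iterations):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                # Velocity blends momentum, pull towards the particle's own
                # best position, and pull towards the swarm's best position.
                velocities[i][d] = (
                    inertia * velocities[i][d]
                    + c_personal * r1 * (personal_best[i][d] - positions[i][d])
                    + c_global * r2 * (global_best[d] - positions[i][d])
                )
                # Keep the particle inside the search box.
                positions[i][d] = min(high, max(low, positions[i][d] + velocities[i][d]))
            score = objective(positions[i])
            if score < personal_score[i]:
                personal_best[i], personal_score[i] = positions[i][:], score
                if score < global_score:
                    global_best, global_score = positions[i][:], score
    return global_best, global_score


def toy_objective(w):
    """Hypothetical error surface with its minimum at weights (0.3, 0.7)."""
    return (w[0] - 0.3) ** 2 + (w[1] - 0.7) ** 2


best_weights, best_error = pso_minimize(toy_objective, dim=2, bounds=(0.0, 1.0))
```

In a recommender setting the objective would typically be a validation error of the model being tuned; the quadratic used here only stands in for such a function.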

Features and challenges

The features and challenges of the existing recommender models are given in Table 6 .

Simulation platforms

The various simulation platforms used for developing different recommender systems with different applications are given in Fig.  9 .

Fig. 9 Simulation platforms used for developing different recommender systems

Here, the Java platform is used in 20% of the contributions, MATLAB in 7%, cross-validation with different numbers of folds in 8%, Python in 7%, and R in 3%, while TensorFlow, Weka and Android environments are each used in 1% of the contributions. Other simulation platforms like Facebook, web UI (User Interface), real-time environments, etc. are used in 50% of the contributions. Table 7 describes some simulation platforms commonly used for developing recommender systems.

Application focused and dataset description

This section provides an analysis of the applications targeted by a set of recent recommender systems and the details of the datasets they use.

An analysis of recent recommender systems shows that 11% of the contributions focus on healthcare, 10% on movie recommender systems, 5% on music recommender systems, 6% on e-learning recommender systems, 8% on online product recommender systems, 3% on book recommendations and 1% on job and knowledge management recommender systems. A further 5% of the contributions concentrate on social network recommender systems, 10% on tourist and hotel recommender systems, 6% on stock recommender systems, and 3% on video recommender systems. The remaining 12% of contributions are miscellaneous recommender systems like Twitter, venue-based recommender systems, etc. Similarly, different datasets are gathered for recommender systems based on their application types. A detailed description is provided in Table 8 .

Performance analysis of state-of-art recommender systems

The performance evaluation metrics used for the analysis of different recommender systems are listed in Table 9 . From the set of research works, 35% of the works use the recall measure, 16% employ Mean Absolute Error (MAE), 11% use Root Mean Square Error (RMSE), 41% consider precision, 30% analyse the F1-measure, 31% apply accuracy and 6% employ the coverage measure to validate the performance of the recommender systems. Moreover, some additional measures are also considered for validating the performance in a few applications.
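The most frequently used metrics above have standard definitions, which can be sketched as follows. The function names and the toy inputs are assumptions for illustration; the reviewed works may compute these measures with different conventions (for example, at a fixed list length).

```python
import math


def precision_recall_f1(recommended, relevant):
    """Precision, recall and F1 for one recommendation list.

    `recommended` is the list of items shown to the user and `relevant`
    is the set of items the user actually liked.
    """
    recommended, relevant = set(recommended), set(relevant)
    hits = len(recommended & relevant)
    precision = hits / len(recommended) if recommended else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


def mae(actual, predicted):
    """Mean Absolute Error between true and predicted ratings."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def rmse(actual, predicted):
    """Root Mean Square Error between true and predicted ratings."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))


# Toy inputs (assumed for illustration only).
p, r, f1 = precision_recall_f1(["a", "b", "c", "d"], ["a", "c", "e"])
print(p, r, f1)                      # precision 2/4, recall 2/3, F1 4/7
print(mae([4, 3, 5], [3, 3, 4]))     # (1 + 0 + 1) / 3
print(rmse([4, 3, 5], [3, 3, 4]))    # sqrt((1 + 0 + 1) / 3)
```

Precision, recall and F1 judge the recommended list itself, whereas MAE and RMSE judge predicted rating values, which is one reason results reported with different measures are hard to compare directly.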

Research gaps and challenges

In the recent decade, recommender systems have performed well in solving the problem of information overload and have become an appropriate tool for multiple areas such as psychology, mathematics, computer science, etc. [ 80 ]. However, current recommender systems face a variety of challenges, which are listed here and discussed below:

Deployment challenges such as cold start, scalability, sparsity, etc. are already discussed in Sect. 3.

Challenges faced when employing different recommender algorithms for different applications.

Challenges in collecting implicit user data.

Challenges in handling real-time user feedback.

Challenges faced in choosing the correct implementation techniques.

Challenges faced in measuring system performance.

Challenges in implementing recommender systems for diverse applications.

Numerous recommender algorithms have been proposed on novel emerging dimensions which focus on addressing the existing limitations of recommender systems. A good recommender system must increase the recommendation quality based on user preferences. However, a specific recommender algorithm is not always guaranteed to perform equally for different applications. This encourages the possibility of employing different recommender algorithms for different applications, which brings along a lot of challenges. There is a need for more research to alleviate these challenges. Also, there is a large scope of research in recommender applications that incorporate information from different interactive online sites like Facebook, Twitter, shopping sites, etc. Some other areas for emerging research may be in the fields of knowledge-based recommender systems, methods for seamlessly processing implicit user data and handling real-time user feedback to recommend items in a dynamic environment.

Other research areas such as deep learning-based recommender systems, demographic filtering, group recommenders, cross-domain techniques for recommender systems, and dimensionality reduction techniques also require further study [ 83 ]. Deep learning-based recommender systems have recently gained much popularity. Future research in this field can integrate well-performing deep learning models with new variants of hybrid meta-heuristic approaches.

During this review, it was observed that even though recent recommender systems have demonstrated good performance, there is no single standardized criterion or method which could be used to evaluate the performance of all recommender systems. System performance is generally measured by different evaluation metrics, which makes systems difficult to compare. The application of recommender systems in real-time applications is growing. User satisfaction and personalization play a very important role in the success of such recommender systems. There is a need for new evaluation criteria which can evaluate the level of user satisfaction in real-time. New research should focus on capturing real-time user feedback and using the information to change the recommendation process accordingly. This will aid in increasing the quality of recommendations.

Conclusion and future scope

Recommender systems have attracted the attention of researchers and academicians. In this paper, we have identified and carefully reviewed research papers on recommender systems focusing on diverse applications, which were published between 2011 and 2021. This review has gathered diverse details like application fields, techniques used, simulation tools used, applications targeted, performance metrics, datasets used, system features, and challenges of different recommender systems. Further, the research gaps and challenges were put forward to explore the future research perspective on recommender systems. Overall, this paper provides a comprehensive understanding of the trend of recommender systems-related research and provides researchers with insight and future direction on recommender systems. The results of this study have several practical and significant implications:

Based on the recent-past publication rates, we feel that the research of recommender systems will significantly grow in the future.

A large number of research papers were identified in movie recommendations, whereas very few were identified in health, tourism and education-related recommender systems. This is due to the availability of movie datasets in the public domain. Therefore, it is necessary to develop datasets in other fields also.

There is no standard measure to compute the performance of recommender systems. Among 60 papers, 21 used recall, 10 used MAE, 25 used precision, 18 used F1-measure, 19 used accuracy and only 7 used RMSE to calculate system performance. Very few systems were found to excel in two or more metrics.

Java and Python (with a combined contribution of 27%) are the most common programming languages used to develop recommender systems. This is due to the availability of a large number of standard Java and Python libraries which aid in the development process.

Recently, a large number of hybrid and optimization techniques have been proposed for recommender systems. The performance of a recommender system can be greatly improved by applying optimization techniques.

There is a large scope of research in using neural networks and deep learning-based methods for developing recommender systems. Systems developed using these methods are found to achieve high-performance accuracy.

This research will provide a guideline for future research in the domain of recommender systems. However, this research has some limitations. Firstly, due to the limited amount of manpower and time, we have only reviewed papers published in journals focusing on computer science, management and medicine. Secondly, we have reviewed only English papers. New research may extend this study to cover other journals and non-English papers. Finally, this review was conducted based on a search on only six descriptors: “Recommender systems”, “Recommendation systems”, “Movie Recommend*”, “Music Recommend*”, “Personalized Recommend*” and “Hybrid Recommend*”. Research papers that did not include these keywords were not considered. Future research can include adding some additional descriptors and keywords for searching. This will allow extending the research to cover more diverse articles on recommender systems.

Availability of data and materials

Not applicable.

Castellano G, Fanelli AM, Torsello MA. NEWER: A system for neuro-fuzzy web recommendation. Appl Soft Comput. 2011;11:793–806.

Crespo RG, Martínez OS, Lovelle JMC, García-Bustelo BCP, Gayo JEL, Pablos PO. Recommendation system based on user interaction data applied to intelligent electronic books. Comput Hum Behav. 2011;27:1445–9.

Lin FC, Yu HW, Hsu CH, Weng TC. Recommendation system for localized products in vending machines. Expert Syst Appl. 2011;38:9129–38.

Wang SL, Wu CY. Application of context-aware and personalized recommendation to implement an adaptive ubiquitous learning system. Expert Syst Appl. 2011;38:10831–8.

García-Crespo Á, López-Cuadrado JL, Colomo-Palacios R, González-Carrasco I, Ruiz-Mezcua B. Sem-Fit: A semantic based expert system to provide recommendations in the tourism domain. Expert Syst Appl. 2011;38:13310–9.

Dong H, Hussain FK, Chang E. A service concept recommendation system for enhancing the dependability of semantic service matchmakers in the service ecosystem environment. J Netw Comput Appl. 2011;34:619–31.

Li M, Liu L, Li CB. An approach to expert recommendation based on fuzzy linguistic method and fuzzy text classification in knowledge management systems. Expert Syst Appl. 2011;38:8586–96.

Lorenzi F, Bazzan ALC, Abel M, Ricci F. Improving recommendations through an assumption-based multiagent approach: An application in the tourism domain. Expert Syst Appl. 2011;38:14703–14.

Huang Z, Lu X, Duan H. Context-aware recommendation using rough set model and collaborative filtering. Artif Intell Rev. 2011;35:85–99.

Chen RC, Huang YH, Bau CT, Chen SM. A recommendation system based on domain ontology and SWRL for anti-diabetic drugs selection. Expert Syst Appl. 2012;39:3995–4006.

Mohanraj V, Chandrasekaran M, Senthilkumar J, Arumugam S, Suresh Y. Ontology driven bee’s foraging approach based self-adaptive online recommendation system. J Syst Softw. 2012;85:2439–50.

Hsu CC, Chen HC, Huang KK, Huang YM. A personalized auxiliary material recommendation system based on learning style on facebook applying an artificial bee colony algorithm. Comput Math Appl. 2012;64:1506–13.

Gemmell J, Schimoler T, Mobasher B, Burke R. Resource recommendation in social annotation systems: A linear-weighted hybrid approach. J Comput Syst Sci. 2012;78:1160–74.

Choi K, Yoo D, Kim G, Suh Y. A hybrid online-product recommendation system: Combining implicit rating-based collaborative filtering and sequential pattern analysis. Electron Commer Res Appl. 2012;11:309–17.

Garibaldi JM, Zhou SM, Wang XY, John RI, Ellis IO. Incorporation of expert variability into breast cancer treatment recommendation in designing clinical protocol guided fuzzy rule system models. J Biomed Inform. 2012;45:447–59.

Salehi M, Kamalabadi IN. A hybrid attribute–based recommender system for e–learning material recommendation. IERI Procedia. 2012;2:565–70.

Aher SB, Lobo LMRJ. Combination of machine learning algorithms for recommendation of courses in e-learning System based on historical data. Knowl-Based Syst. 2013;51:1–14.

Kardan AA, Ebrahimi M. A novel approach to hybrid recommendation systems based on association rules mining for content recommendation in asynchronous discussion groups. Inf Sci. 2013;219:93–110.

Chang JH, Lai CF, Wang MS, Wu TY. A cloud-based intelligent TV program recommendation system. Comput Electr Eng. 2013;39:2379–99.

Lucas JP, Luz N, Moreno MN, Anacleto R, Figueiredo AA, Martins C. A hybrid recommendation approach for a tourism system. Expert Syst Appl. 2013;40:3532–50.

Niu J, Zhu L, Zhao X, Li H. Affivir: An affect-based Internet video recommendation system. Neurocomputing. 2013;120:422–33.

Liu L, Xu J, Liao SS, Chen H. A real-time personalized route recommendation system for self-drive tourists based on vehicle to vehicle communication. Expert Syst Appl. 2014;41:3409–17.

Bakshi S, Jagadev AK, Dehuri S, Wang GN. Enhancing scalability and accuracy of recommendation systems using unsupervised learning and particle swarm optimization. Appl Soft Comput. 2014;15:21–9.

Kim Y, Shim K. TWILITE: A recommendation system for twitter using a probabilistic model based on latent Dirichlet allocation. Inf Syst. 2014;42:59–77.

Wang Z, Yu X, Feng N, Wang Z. An improved collaborative movie recommendation system using computational intelligence. J Vis Lang Comput. 2014;25:667–75.

Kolomvatsos K, Anagnostopoulos C, Hadjiefthymiades S. An efficient recommendation system based on the optimal stopping theory. Expert Syst Appl. 2014;41:6796–806.

Gottschlich J, Hinz O. A decision support system for stock investment recommendations using collective wisdom. Decis Support Syst. 2014;59:52–62.

Torshizi AD, Zarandi MHF, Torshizi GD, Eghbali K. A hybrid fuzzy-ontology based intelligent system to determine level of severity and treatment recommendation for benign prostatic hyperplasia. Comput Methods Programs Biomed. 2014;113:301–13.

Zahálka J, Rudinac S, Worring M. Interactive multimodal learning for venue recommendation. IEEE Trans Multimedia. 2015;17:2235–44.

Sankar CP, Vidyaraj R, Kumar KS. Trust based stock recommendation system – a social network analysis approach. Procedia Computer Sci. 2015;46:299–305.

Chen MH, Teng CH, Chang PC. Applying artificial immune systems to collaborative filtering for movie recommendation. Adv Eng Inform. 2015;29:830–9.

Wu H, Pei Y, Li B, Kang Z, Liu X, Li H. Item recommendation in collaborative tagging systems via heuristic data fusion. Knowl-Based Syst. 2015;75:124–40.


Acknowledgements

We thank our colleagues from Assam Down Town University who provided insight and expertise that greatly assisted this research, although they may not agree with all the interpretations and conclusions of this paper.

No funding was received to assist with the preparation of this manuscript.

Author information

Authors and affiliations.

Department of Computer Science & Engineering, Assam Down Town University, Panikhaiti, Guwahati, 781026, Assam, India

Deepjyoti Roy & Mala Dutta

Contributions

DR carried out the review study and analysis of the existing algorithms in the literature. MD has been involved in drafting the manuscript or revising it critically for important intellectual content. Both authors read and approved the final manuscript.

Corresponding author

Correspondence to Deepjyoti Roy .

Ethics declarations

Ethics approval and consent to participate, consent for publication, competing interests.

On behalf of all authors, the corresponding author states that there is no conflict of interest.

Additional information

Publisher's note.

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/ .

About this article

Cite this article.

Roy, D., Dutta, M. A systematic review and research perspective on recommender systems. J Big Data 9 , 59 (2022). https://doi.org/10.1186/s40537-022-00592-5

Received : 04 October 2021

Accepted : 28 March 2022

Published : 03 May 2022

DOI : https://doi.org/10.1186/s40537-022-00592-5

  • Recommender system
  • Machine learning
  • Content-based filtering
  • Collaborative filtering
  • Deep learning

Open Access

Peer-reviewed

Research Article

A collaborative approach for research paper recommender system

Roles Conceptualization, Data curation, Formal analysis, Methodology, Project administration, Writing – original draft, Writing – review & editing

Affiliations Department of Computer Science, Faculty of Computer Science and Information Technology, Bayero University, Kano, Nigeria, Department of Information Systems, Faculty of Computer Science and Information Technology, University of Malaya, Kuala Lumpur, Malaysia

Roles Supervision, Validation, Visualization, Writing – review & editing

Affiliation Department of Information Systems, Faculty of Computer Science and Information Technology, University of Malaya, Kuala Lumpur, Malaysia

Roles Funding acquisition, Resources

Affiliation Sekolah Tinggi Pariwisata Ambarrukmo, Yogyakarta, Indonesia

Affiliation Faculty of Information Technology and Business, Universitas Teknologi Yogyakarta, Yogyakarta, Indonesia

Roles Formal analysis, Investigation, Project administration, Resources, Validation, Visualization, Writing – review & editing

Affiliations Faculty of Information Technology and Business, Universitas Teknologi Yogyakarta, Yogyakarta, Indonesia, AMCS Research Center, Yogyakarta, Indonesia

  • Khalid Haruna, 
  • Maizatul Akmar Ismail, 
  • Damiasih Damiasih, 
  • Joko Sutopo, 
  • Tutut Herawan

  • Published: October 5, 2017
  • https://doi.org/10.1371/journal.pone.0184516
Research paper recommenders emerged over the last decade to ease finding publications related to researchers’ areas of interest. The challenge is not just to provide researchers with rich publications at any time, in any place and in any form, but also to offer the right publication to the right researcher in the right way. Several approaches to research paper recommendation exist. However, these approaches assume that the whole content of the papers to be recommended is freely accessible, which is not always true due to factors such as copyright restrictions. This paper presents a collaborative approach for a research paper recommender system. By leveraging the advantages of the collaborative filtering approach, we utilize publicly available contextual metadata to infer the hidden associations that exist between research papers in order to personalize recommendations. The novelty of our proposed approach is that it provides personalized recommendations regardless of the research field and regardless of the user’s expertise. Using a publicly available dataset, our proposed approach recorded a significant improvement over other baseline methods in both the overall performance and the ability to return relevant and useful publications at the top of the recommendation list.

Citation: Haruna K, Akmar Ismail M, Damiasih D, Sutopo J, Herawan T (2017) A collaborative approach for research paper recommender system. PLoS ONE 12(10): e0184516. https://doi.org/10.1371/journal.pone.0184516

Editor: Feng Xia, Dalian University of Technology, CHINA

Received: June 10, 2017; Accepted: August 27, 2017; Published: October 5, 2017

Copyright: © 2017 Haruna et al. This is an open access article distributed under the terms of the Creative Commons Attribution License , which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

Data Availability: All study files are available from: https://figshare.com/articles/Supporting_Information_Dataset_docx/5368408 .

Funding: The authors received no specific funding for this work.

Competing interests: The authors have declared that no competing interests exist.

1. Introduction

The overabundance of information available over the internet makes information seeking a difficult task. Researchers find it difficult to access and keep track of the most relevant and promising research papers in their areas of interest [ 1 ]. The easiest and most common way to search for related publications is to submit a query to a web search engine [ 2 ]. However, the quality of the results depends largely on how well the user fine-tunes the query, and this approach cannot personalize the search results.

Another classical approach used by most researchers is to follow the list of references in the documents they already possess [ 3 ]. Even though this approach can be quite effective in some instances, it does not guarantee full coverage of the relevant research papers and cannot trace papers published after the possessed paper. In addition, the list of references may not be publicly available and may therefore be hard for researchers to access.

An alternative approach proposed in the literature is the use of research paper recommender systems [ 4 , 5 ], which automatically suggest relevant papers to researchers based on initial information provided by the users that is more elaborate than a few keywords.

To provide more accurate and relevant recommendations, recommender systems incorporate the users’ contexts and the available contextual information of the consumed content [ 6 ]. Different researchers have proposed different kinds of user-provided information, such as a list of citations [ 7 ], the list of papers authored by an author [ 8 ], part of a paper’s text [ 2 ], a single paper [ 9 ], and so on. In these approaches, a user profile is constructed from this initial information to represent the interests of the user, and the system searches for items or other profiles similar to the one provided in order to generate recommendations. The challenge is not just to provide rich recommendations to researchers at any time, in any place and in any form, but also to offer the right paper to the right researcher in the right way [ 10 – 12 ].

The major limitation of the existing approaches is their assumption that the whole content of the papers to be recommended is freely accessible, which is not always true due to factors such as copyright restrictions. In an attempt to address this problem, Liu et al. [ 3 ] applied a collaborative approach to mine the hidden associations that exist between a target paper and its references in order to provide a unique and useful list of research papers as recommendations.

Motivated by [ 3 ], this paper presents a collaborative approach for a research paper recommender system. In addition to mining the hidden associations between a target paper and its references, we also take into account the hidden associations between the target paper’s citations (see section 3). As in [ 3 ], we do not use direct paper-citation relations, because a researcher who possesses a research paper already has direct or indirect access to its limited list of references and to its citations. Our aim is to identify the latent associations that exist between research papers from the perspective of paper-citation relations. In [ 3 ], a candidate paper qualifies for consideration if it cites any of the target paper’s references. In our proposed approach, a candidate paper qualifies for consideration if and only if it cites any of the target paper’s references and there exists another paper that cites both the candidate and the target paper. We then measure and weigh the extent of similarity between the target paper and the qualified candidate papers and recommend the top-N most similar papers, based on the assumption that if there is significant co-occurrence between the target paper and a qualified candidate paper, then the two are similar to some extent. This strictness in qualifying a candidate paper helps to enhance the overall performance of the approach and its ability to return relevant and useful recommendations at the top of the recommendation list.

The major contributions of our proposed approach are as follows:

  • We utilize publicly available contextual metadata to propose an independent research paper recommender that does not require an a priori user profile.
  • Our approach provides personalized recommendations regardless of the research field and regardless of user expertise.

The rest of the paper is organized as follows. We first present related work on recommending research papers. We then detail our proposed approach. Next, we describe our experiments, starting with the dataset and the baseline methods, followed by the evaluation procedures. We then discuss our findings and conclude the paper with brief remarks and future research directions.

2. Related work

Research paper recommenders, which aim to provide the best suggestions among all alternatives, emerged over the last decade to help researchers find works of interest in the vast ocean of online information. Collaborative filtering (CF) is one of the most successful techniques used in recommender systems [ 13 ]. It recommends items to target users based on what other similar users have previously preferred [ 14 – 16 ]. It has been used in various applications, such as recommending movies [ 17 ], audio CDs [ 18 ], e-commerce products [ 19 ], music [ 20 ], Usenet news [ 16 ], and research papers [ 7 , 21 – 24 ], among others (see [ 25 ]). Some researchers [ 13 , 21 , 26 ] have criticized the use of this technique to recommend scholarly papers. Specifically, the authors of [ 21 , 26 ] claimed that collaborative filtering is only effective in domains where the number of users seeking recommendations is higher than the number of items to be recommended, such as movies [ 27 ], music [ 28 ], and news [ 29 ]. The argument in [ 13 ] is that researchers are not willing to spend their valuable time providing explicit ratings for the research papers they consume, which leads to insufficient ratings. Furthermore, for a user to receive useful recommendations, a substantial number of ratings is required.

Nevertheless, despite these problems, a significant number of studies suggest relevant papers to researchers based on collaborative filtering by mining latent associations between scholarly papers. These associations are obtained either directly, by treating paper citations as rating scores [ 7 ], or implicitly, by monitoring the researchers’ actions [ 30 , 31 ]. Citation analysis techniques such as bibliographic coupling [ 32 ] and co-citation analysis [ 33 ] have also been used to identify papers similar to a target paper [ 34 ].

The relationships among research papers were categorized into direct and indirect relations in the survey in [ 35 ]. The survey identified three approaches for detecting the relationships between papers from the perspective of paper sources, namely citation context, citation analysis, and content-based approaches. The authors claimed that the content-based approach is less appropriate for detecting relationships across research papers because it cannot accommodate characteristics specific to research papers, such as authors and citations. It is therefore suitable only for identifying similarity relations across regular documents. The use of citation analysis, on the other hand, can generate more relations between research papers but cannot generate relations from semantic text. This weakness is addressed by the citation-context-based approach, which places more emphasis on determining important features in the text classification process to increase classification performance.

A context-based collaborative framework (CCF) that uses only easily obtained citation relations as source data was proposed in [ 3 ]. The framework employs an association-mining technique to obtain a representation of each paper from its citation context. A pairwise comparison is then performed to compute the extent of similarity between papers. The use of collaborative filtering was also explored in [ 7 ], where the citation web between scholarly papers was used to create a rating matrix. The aim was to use the paper-citation relation to recommend additional references for an input paper. To that end, the authors investigated six different algorithms for selecting citations. Using offline evaluation, they found large disparities in the accuracy returned by the six algorithms.

The authors of [ 6 ] hypothesized that an author’s previous publications constitute a clear signal of the latent interests of a researcher. The key part of their model was to enhance the user profile with information coming directly from the references of the researcher’s previous works as well as from the papers that cited them. However, the approach aggravates the well-known sparsity problem. To alleviate this problem, they extended their work in [ 8 ] to mine potential citation papers using imputed similarities obtained through collaborative filtering. They also refined the use of citing papers in characterizing a target candidate paper by using fragments of the citation and potential citation papers. While the approach works well for researchers in a single discipline, it generates poor results for multidisciplinary researchers. To address this, an adaptive neighbor selection approach was proposed in [ 2 ] to overcome the problems of imputation-based collaborative filtering. Whereas the works in [ 2 , 6 , 8 ] recommend papers relevant to the researcher’s interests, the same authors also addressed serendipitous scholarly paper recommendation in [ 36 ].

In another development, the increasing number of research communities and social networking sites such as LiveJournal and MySpace has brought new opportunities for research paper recommender systems. Research shows that users in online social networks tend to form tightly knit groups [ 37 ] with large, strongly connected components [ 38 ].

Several studies have considered social group formation and community membership in social networks and their use in recommender systems [ 39 – 46 ]. These studies utilize the influence of social properties to suggest relevant information to individual users or groups of users based on social ties, which can be strong or weak depending on the tie strength, that is, the closeness and interaction frequency between the information source and the recipient [ 47 , 48 ].

Recommendations from strong ties are believed to be more persuasive than those from weak ties [ 49 – 51 ], because information transferred through strong ties is likely to be perceived as more relevant and reliable. To be specific, the authors of [ 45 , 52 ] proposed an algorithm called socially aware recommendation of scholarly papers (SARSP), which utilizes social learning and networking among conference participants through the construction of relations in folksonomies and social ties. The algorithm recommends research papers issued by an active participant to other conference participants based on the computation of their social ties. This approach was extended in [ 53 ] to include personality behavior in addition to social relations among smart conference attendees. A more detailed survey on scholarly data is presented in [ 54 ].

The major challenge with the previous research is that all the contextual information from the recommended, referenced and cited papers must be fully accessible to the recommender, and this information is not always freely available due to factors such as copyright restrictions. Another major problem with existing research paper recommender systems is their dependency on an a priori user profile, which means that a system works well only when it already has a number of registered users; this is a major hurdle in the construction of a new recommender system. Furthermore, the recommendation coverage of most current paper recommenders is limited to a certain field of research, because the papers to be recommended are stored in advance and the system therefore cannot effectively scan entire databases to find connections between papers. Moreover, most existing research paper recommendation frameworks are designed to work on a single discipline only and therefore cannot address the needs of multidisciplinary scholars. While keyword-based information retrieval through search engines can scan all documents for relevant text, it also returns hundreds of irrelevant documents and cannot provide personalized results to individual researchers.

Different from the existing works, in this paper, we propose a new approach based on collaborative filtering that utilizes only publicly available contextual metadata to personalize recommendations based on the hidden associations that exist between research papers. Our proposed approach does not only provide personalized recommendations regardless of the research field and regardless of user expertise but also handles multi-disciplinary problems.

3. Proposed collaborative research paper recommendation approach

Even though some researchers [ 6 , 13 , 21 , 26 ] have claimed that the content-based approach is the most suitable one for the scholarly domain, other researchers [ 35 ] have questioned its suitability, arguing that it is suitable only for identifying similarity relations across regular documents and lacks important features needed to effectively detect relationships across research papers.

In this paper, we are motivated to leverage the advantages of the collaborative approach, as it has proved effective in the domains of movies [ 27 ], music [ 28 ], news [ 29 ], e-commerce [ 19 ], etc. The unsuitability of the collaborative approach for research paper recommenders has been attributed to the lack of ratings given to research papers by researchers [ 13 ]. To solve this problem, we mine rating scores between researchers and research papers from paper-citation relations. We use $C_{ij}$ to denote the citation score between paper $i$ and a cited paper $j$ in a paper-citation matrix $C$. If paper $i$ cites paper $j$, then $C_{ij} = 1$; otherwise $C_{ij} = 0$.
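As a minimal sketch of the paper-citation matrix described above, the following Python snippet builds a binary matrix from a small citation map. The paper identifiers, the `citations` dictionary and the `build_citation_matrix` helper are hypothetical and serve only as illustration; they are not taken from the dataset or from the authors' implementation.

```python
# Hypothetical toy data: each key is a citing paper, each value lists the papers it cites.
citations = {
    "p1": ["r1", "r2"],
    "p2": ["r2"],
    "p3": ["r1", "r3"],
}


def build_citation_matrix(citations):
    """Return (rows, cols, C) with C[i][j] = 1 if rows[i] cites cols[j], else 0."""
    rows = sorted(citations)
    cols = sorted({cited for refs in citations.values() for cited in refs})
    col_index = {paper: j for j, paper in enumerate(cols)}
    C = [[0] * len(cols) for _ in rows]
    for i, paper in enumerate(rows):
        for cited in citations[paper]:
            C[i][col_index[cited]] = 1
    return rows, cols, C


rows, cols, C = build_citation_matrix(citations)
for paper, row in zip(rows, C):
    print(paper, row)
```

Each row is the binary citation profile of one citing paper, which is the form needed when two papers are later compared attribute by attribute.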

We begin by transforming all the papers to be recommended (in our dataset) into a paper-citation relation matrix in which the rows and the columns respectively represent the papers and their citations. Our approach aims to deal with scenarios in which: (a) a researcher who finds an interesting paper after some initial searches wants to find more papers similar to it; (b) a student receives a paper from a supervisor in order to start research in the topic area it covers; (c) a reviewer wants to explore further based on a received paper that addresses a subject matter outside the reviewer’s specialty; (d) a researcher wants to explore further from his or her previous publication(s). In all these cases, we consider a situation where the references and citations of the possessed paper, which indicate the user’s preferences, are publicly available (which is usually the case in almost all the major academic databases).

Algorithm 1. Algorithm representing proposed approach.

Algorithm: Collaborative Research Paper Recommendation

Input: Target Paper

Output: Top-N Recommendation

Given a target paper $p_i$ as a query,

  • For each of the references $Rf_j$, extract all other papers $p_{ci}$ that also cited $Rf_j$, other than the target paper $p_i$.
  • For each of the citations $Cf_j$, extract all other papers $p_{ri}$ that $Cf_j$ referenced, other than the target paper $p_i$.
  • Qualify as candidate papers $p_c$ those papers in $p_{ci}$ that also appear among the $p_{ri}$, that is, that are referenced by at least one of the citations $Cf_j$.

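  • Measure the similarity between the target paper $p_i$ and each qualified candidate paper $p_c$ using the Jaccard similarity measure of Eq (1).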

  • Recommend the top-N most similar papers to the user.

We accept the user’s query in order to identify the target paper. Once the target paper is identified, we apply Algorithm 1. The algorithm retrieves all of the target paper’s references and citations. For each of the references, it extracts all other papers from the web (Google Scholar, to be precise) that also cite that reference. In addition, for each of the target paper’s citations, it extracts all other papers that the citation references (in other words, all the references of the target paper’s citations). We refer to these extracted papers as the target paper’s nearest neighbors. From the neighboring papers, we qualify as candidates those papers that are co-cited with the target paper and that cite at least one of the target paper’s references. We then measure the degree of similarity between these qualified candidate papers and the target paper by computing their collaborative similarity using the Jaccard similarity measure given by Eq (1). Finally, we recommend the top-N most similar papers to the researcher.

$$J(X, Y) = \frac{Z_{11}}{Z_{01} + Z_{10} + Z_{11}} \tag{1}$$

where $X$ and $Y$ are the binary attribute vectors of the two papers being compared, and

$Z_{11}$ represents the total number of attributes where $X$ and $Y$ both have a value of 1,

$Z_{01}$ represents the total number of attributes where the attribute of $X$ is 0 and the attribute of $Y$ is 1,

$Z_{10}$ represents the total number of attributes where the attribute of $X$ is 1 and the attribute of $Y$ is 0.
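The three counts defined above translate directly into code. The sketch below computes the Jaccard similarity of two equal-length binary vectors as the number of shared 1-attributes divided by the number of attributes set in at least one vector. The function name `jaccard` and the sample vectors are illustrative assumptions, as is the choice to return 0.0 when neither vector has any attribute set.

```python
def jaccard(x, y):
    """Jaccard similarity of two equal-length binary vectors."""
    if len(x) != len(y):
        raise ValueError("vectors must have the same length")
    z11 = sum(1 for a, b in zip(x, y) if a == 1 and b == 1)  # both 1
    z01 = sum(1 for a, b in zip(x, y) if a == 0 and b == 1)  # only y is 1
    z10 = sum(1 for a, b in zip(x, y) if a == 1 and b == 0)  # only x is 1
    denom = z01 + z10 + z11
    # Assumption: two all-zero vectors are given similarity 0.0.
    return z11 / denom if denom else 0.0


x = [1, 1, 0, 1, 0]
y = [1, 0, 0, 1, 1]
score = jaccard(x, y)  # z11 = 2, z10 = 1, z01 = 1, so 2 / 4 = 0.5
print(score)
```

Attributes that are 0 in both vectors do not enter the score, so the measure is unaffected by the large number of papers that neither of the two compared papers is linked to.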

To illustrate our approach further, Fig 1 shows a target paper ($p_i$) with references (Rf 1 to Rf N) and citations (Cit. 1 to Cit. N). Each reference of the target paper may also be cited by any of Rec. 1 to Rec. N and/or Cit. 1 to Cit. N, in addition to the target paper. Likewise, each citation of the target paper may also reference any of Rec. 1 to Rec. N and/or Rf 1 to Rf N, in addition to the target paper. Our approach qualifies those candidate papers (Rec. 1 to Rec. N) that are co-cited with the target paper and that cite at least one of the target paper’s references.

https://doi.org/10.1371/journal.pone.0184516.g001

For example, in Fig 1, Rec. 1 and Rec. 2 are co-cited with the target paper by Cit. 1. However, Rec. 2 has no connection to any of the target paper’s references and is therefore disqualified by step 3 of our proposed algorithm. Rec. 1, on the other hand, is not only co-cited with the target paper by Cit. 1 but also cites one of the target paper’s references, Ref. 1. As can be observed from Fig 1, only Rec. 1 and Rec. 3 are qualified candidate papers.
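The qualification rule can be sketched on a small citation graph. The sketch below is not the authors' Algorithm 1: the function name `qualify_candidates`, the dictionary layout (paper mapped to the set of papers it cites) and the toy graph are all hypothetical, and the link between Rec3 and Ref2 is an assumption made so that the example matches the outcome described for Fig 1. It also assumes that the "connection" to the target paper's references means the candidate cites one of them, as in the Rec. 1 example.

```python
def qualify_candidates(target, references):
    """Return papers co-cited with `target` that also cite one of its references.

    `references` maps each paper to the set of papers it cites.
    """
    target_refs = references.get(target, set())
    # Papers that cite the target (the target paper's citations).
    citing_papers = {p for p, refs in references.items() if target in refs}
    # Papers co-cited with the target: everything else those citing papers cite.
    co_cited = set()
    for paper in citing_papers:
        co_cited |= references[paper] - {target}
    # Keep only co-cited papers linked to at least one of the target's references.
    return {c for c in co_cited if references.get(c, set()) & target_refs}


# Hypothetical toy graph loosely mirroring the description of Fig 1.
references = {
    "target": {"Ref1", "Ref2"},
    "Cit1": {"target", "Rec1", "Rec2"},
    "Cit2": {"target", "Rec3"},
    "Rec1": {"Ref1"},
    "Rec2": set(),
    "Rec3": {"Ref2"},
}
print(sorted(qualify_candidates("target", references)))
```

In this toy graph Rec2 is co-cited with the target but cites none of its references, so it is filtered out before any similarity is computed.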

In the following section, we present the experimental setup.

4. Experimental setup

4.1 Dataset

We utilize the publicly available dataset presented in [ 2 ]. The dataset contains the publication lists of 50 researchers whose research interests span different fields of computer science, including information retrieval, software engineering, user interfaces, security, graphics, databases, operating systems, embedded systems and programming languages. We retrieved all of their references and citations and extracted from Google Scholar every other paper that cited any of the references, as well as all the references of each of the target paper’s citations. Some statistics of the dataset are presented in Table 1 .

https://doi.org/10.1371/journal.pone.0184516.t001

4.2 Baseline methods

To assess the effectiveness of our proposed framework, we compare the recommendation results with two baselines presented in [ 7 ] and [ 3 ]. The approach introduced in [ 7 ] treats the citation relation matrix as a rating score and generates recommendations based on common citations between the target paper and its neighboring papers. Given a target paper, the algorithm counts the number of times other citations were co-cited with it. The algorithm then recommends the citations with the highest total co-citations summed over all recommending papers. The assumption is that the more often two papers are co-cited, the higher their similarity. The approach in [ 3 ], in contrast, mines the hidden relationship between a target paper and all of its references. The task is to quantify the degree of closeness between the target paper and the other papers that also cited any of the target paper’s references. The rationale behind the approach is that, if two papers significantly co-occur with the same citing paper(s), then they should be similar to some extent.
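The counting step of the co-citation baseline can be illustrated with a short sketch. This is a simplified reading of the description above, not the implementation from [ 7 ]: the function name `cocitation_recommend`, the input layout (citing paper mapped to its reference list) and the toy data are hypothetical, and ties are broken alphabetically purely to keep the output deterministic.

```python
from collections import Counter


def cocitation_recommend(target, references, top_n=3):
    """Rank papers by how often they are co-cited with `target`."""
    counts = Counter()
    for citing_paper, refs in references.items():
        if target in refs:
            # Every other paper in this reference list is co-cited with the target.
            for other in refs:
                if other != target:
                    counts[other] += 1
    # Highest co-citation count first; alphabetical order breaks ties.
    ranked = sorted(counts, key=lambda paper: (-counts[paper], paper))
    return ranked[:top_n]


# Hypothetical reference lists: citing paper -> papers it cites.
references = {
    "A": ["T", "X", "Y"],
    "B": ["T", "X"],
    "C": ["T", "X", "Z"],
    "D": ["X", "Y"],
}
print(cocitation_recommend("T", references, top_n=2))
```

Paper D does not cite the target, so its reference list contributes nothing to the counts even though it mentions X and Y.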

4.3 Evaluation metrics

To evaluate the quality of our approach, for each target paper we performed 5-fold cross-validation on its references and citations, holding out 20% as the test set in each fold. We then assess the general performance using the three most commonly used evaluation metrics in retrieval systems: precision, recall and the F1 measure. Precision, given by Eq (2), measures the capability of the system to return as few irrelevant research papers as possible in response to the target paper request.

$$\text{Precision} = \frac{|\text{relevant papers} \cap \text{recommended papers}|}{|\text{recommended papers}|} \qquad (2)$$

On the other hand, recall, given by Eq (3), measures the capability of the system to return as many of the relevant research papers as possible in response to the target paper request.

$$\text{Recall} = \frac{|\text{relevant papers} \cap \text{recommended papers}|}{|\text{relevant papers}|} \qquad (3)$$

Moreover, the F1 measure, given by Eq (4), is the harmonic mean of precision and recall.

$$F_1 = \frac{2 \cdot \text{Precision} \cdot \text{Recall}}{\text{Precision} + \text{Recall}} \qquad (4)$$
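As a minimal sketch, the three set-based measures can be computed for a single recommendation list as follows. The function name `precision_recall_f1` and the paper identifiers are hypothetical, and returning 0 for empty inputs is an assumption rather than something the paper specifies.

```python
def precision_recall_f1(recommended, relevant):
    """Set-based precision, recall and F1 for one recommendation list."""
    recommended, relevant = set(recommended), set(relevant)
    hits = len(recommended & relevant)
    # Precision: share of the recommended papers that are relevant.
    precision = hits / len(recommended) if recommended else 0.0
    # Recall: share of the relevant papers that were recommended.
    recall = hits / len(relevant) if relevant else 0.0
    # F1: harmonic mean of the two, defined as 0 when both are 0.
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Hypothetical held-out test set and top-4 recommendation list.
recommended = ["p1", "p2", "p3", "p4"]
relevant = ["p2", "p4", "p7", "p8", "p9"]
print(precision_recall_f1(recommended, relevant))
```

Here two of the four recommended papers are relevant (precision 0.5), while two of the five relevant papers are found (recall 0.4).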

As users often scan only the documents at the top of the recommendation list, we consider it imperative to also measure the system’s ability to provide useful recommendations at the top of the list, using the two most widely used ranked information retrieval evaluation measures: Mean Average Precision (MAP) and Mean Reciprocal Rank (MRR).

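$$\text{MAP} = \frac{1}{|Q|} \sum_{q \in Q} \text{AP}(q), \qquad \text{MRR} = \frac{1}{|Q|} \sum_{q \in Q} \frac{1}{\text{rank}_q}$$

where $Q$ is the set of target-paper queries, $\text{AP}(q)$ is the average precision of the recommendation list returned for query $q$, and $\text{rank}_q$ is the position of the first relevant paper in that list.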
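The two ranked measures can be sketched as below. The function names and the toy queries are hypothetical. Average precision is normalized here by the number of relevant papers, which is one common convention; the paper does not state which normalization it uses, so this is an assumption.

```python
def average_precision(ranked, relevant):
    """Average precision of one ranked list (normalized by the number of relevant items)."""
    relevant = set(relevant)
    hits, total = 0, 0.0
    for rank, item in enumerate(ranked, start=1):
        if item in relevant:
            hits += 1
            total += hits / rank  # precision at this cut-off
    return total / len(relevant) if relevant else 0.0


def reciprocal_rank(ranked, relevant):
    """1 / rank of the first relevant item, or 0 if none is returned."""
    relevant = set(relevant)
    for rank, item in enumerate(ranked, start=1):
        if item in relevant:
            return 1.0 / rank
    return 0.0


def mean_over_queries(metric, runs):
    """Average a per-query metric over (ranked list, relevant set) pairs."""
    return sum(metric(ranked, relevant) for ranked, relevant in runs) / len(runs)


# Two hypothetical queries: (ranked recommendations, relevant papers).
runs = [
    (["a", "b", "c"], {"a", "c"}),
    (["x", "y", "z"], {"y"}),
]
print(mean_over_queries(average_precision, runs))
print(mean_over_queries(reciprocal_rank, runs))
```

A reciprocal rank of 1 or 0.5 for a query corresponds to the first relevant paper appearing at rank 1 or rank 2, respectively.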

5. Results and discussions

To be specific, the results of each evaluation metric in this section represent the overall averages over all 50 researchers in our dataset. We start by comparing the general performance of our proposed approach in returning relevant research papers against the baseline methods, based on the three most commonly used information retrieval evaluation metrics. Figs 2 – 4 show the comparisons based on precision, recall and F1, respectively. As can be seen from Fig 2 , in terms of precision our proposed approach significantly outperforms the baseline methods (Context-Based Collaborative Filtering (CCF), proposed by [ 3 ], and the Co-citation method, proposed by [ 7 ]) in returning relevant research papers for all values of N. This is because our approach is able to remove recommending papers that are less related to the target paper.

https://doi.org/10.1371/journal.pone.0184516.g002

https://doi.org/10.1371/journal.pone.0184516.g003

https://doi.org/10.1371/journal.pone.0184516.g004

Fig 3 depicts the comparison based on recall. As can be seen from the figure, the performance difference between our proposed approach and CCF is insignificant. In fact, the CCF method is even slightly better than our proposed approach when N = 5 and when N = 20. However, our proposed approach begins to show a significant difference as N increases, specifically when N is above 20. The low recall of our proposed approach is a result of the strict rules for qualifying a candidate paper. Our approach is only after the recommending papers most significantly related to the target paper, and therefore leaves many other less related papers unrecalled. Furthermore, Fig 4 depicts the harmonic mean of precision and recall (the F1 measure); according to the figure, the performance difference between our proposed approach and CCF is also insignificant for values of N less than or equal to 20. However, our approach begins to show significant improvement over CCF when N is greater than 20. On all three measures, the Co-citation method performs much worse than our proposed approach. This is because the Co-citation method does not infer the hidden associations in paper-citation relations, but instead applies direct relations between a target paper and its neighboring papers.

In summary, our proposed approach clearly outperforms the baseline methods on precision for all values of N. On the other hand, it performs worse than CCF for a recommendation list of 5 on the recall and F1 measures. The major reason behind the low recall of our proposed approach is the strict rules for qualifying a candidate paper.

Our proposed approach is designed to favor precision, which has more influence on user satisfaction than recall. Precision is the key element in the process of implementing a search solution [ 55 ]: poor precision damages the reputation of a search system and discourages its use, whereas high precision generally impresses search users [ 55 ]. That is why our proposed approach is only after the recommending papers most significantly related to the target paper (the result of which can easily be seen in Fig 2 ), leaving many other less related papers unrecalled. Recall, by contrast, is particularly important in applications where the user cannot afford to miss information, such as security or compliance applications, and it has less influence on user satisfaction than precision. Many searchers, especially on the Web, are satisfied by precise results even where recall is low [ 56 ]. Notwithstanding, our proposed approach starts to show large disparities with the baseline methods when N is above 5 for both the recall and F1 measures. Therefore, a large value of N is important in order to recall as many high-quality and useful recommendations as possible.

Because users usually scan only the top of the recommendation list, we also compare how well our approach returns relevant research papers at the top of the list. Figs 5 and 6 depict our comparisons based on Mean Average Precision (MAP) and Mean Reciprocal Rank (MRR), respectively. As can be seen from Fig 5 , our proposed method significantly outperforms the baseline methods on MAP in all cases in returning relevant recommendations at the top of the recommendation list. Moreover, the comparison based on MRR depicted in Fig 6 also reveals that our proposed approach clearly outperforms the baseline methods in all scenarios. It can easily be seen from the figure that our approach is able to return a relevant research paper at either rank 1 or rank 2 of the recommendation list for all queries.

https://doi.org/10.1371/journal.pone.0184516.g005

https://doi.org/10.1371/journal.pone.0184516.g006

As pointed out earlier, all these improvements are largely due to the strictness in qualifying a candidate paper, which removes papers that are less relevant to the target paper. This, in turn, increases the system’s ability to return relevant and useful recommendations at the top of the recommendation list.

6. Conclusion and future work

In this paper, we utilized publicly available contextual metadata to leverage the advantages of the collaborative filtering approach in recommending a set of related papers to a researcher based on paper-citation relations. The approach mines the hidden associations between a research paper and its references and citations using paper-citation relations. The rationale behind the approach is that, if two papers significantly co-occur with the same citing paper(s), then they should be similar to some extent.

As demonstrated using a publicly available dataset, our proposed method outperforms the baseline methods both in overall performance and in the ability to return relevant and useful research papers at the top of the recommendation list. Based on the three most commonly used information retrieval metrics, our proposed approach has significantly improved on the baseline methods in terms of precision, recall and F1. Our proposed approach has also recorded significant improvements over the baseline methods in providing relevant and useful recommendations at the top of the recommendation list, based on mean average precision (MAP) and mean reciprocal rank (MRR).

In addition to considering the collaborative relations among research papers, our next line of research is to also take into account the public contextual content, such as titles and abstracts of the recommending papers, for better performance.

Supporting information

S1 Dataset. The details of the complete dataset can be accessed via https://figshare.com/articles/supporting_information_dataset_docx/5368408 .

https://doi.org/10.1371/journal.pone.0184516.s001

  • 4. B. Gipp, J. Beel, and C. Hentschel, "Scienstein: A research paper recommender system," in Proceedings of the international conference on emerging trends in computing (icetic’09), 2009, pp. 309–315.
  • 5. J. Beel, S. Langer, M. Genzmehr, and A. Nürnberger, "Introducing Docear's research paper recommender system," in Proceedings of the 13th ACM/IEEE-CS joint conference on Digital libraries, 2013, pp. 459–460.
  • 6. K. Sugiyama and M.-Y. Kan, "Scholarly paper recommendation via user's recent research interests," in Proceedings of the 10th annual joint conference on Digital libraries, 2010, pp. 29–38.
  • 7. S. M. McNee, I. Albert, D. Cosley, P. Gopalkrishnan, S. K. Lam, A. M. Rashid, et al., "On the recommending of citations for research papers," in Proceedings of the 2002 ACM conference on Computer supported cooperative work, 2002, pp. 116–125.
  • 8. K. Sugiyama and M.-Y. Kan, "Exploiting potential citation papers in scholarly paper recommendation," in Proceedings of the 13th ACM/IEEE-CS joint conference on Digital libraries, 2013, pp. 153–162.
  • 9. C. Nascimento, A. H. Laender, A. S. da Silva, and M. A. Gonçalves, "A source independent framework for research paper recommendation," in Proceedings of the 11th annual international ACM/IEEE joint conference on Digital libraries, 2011, pp. 297–306.
  • 10. C. Prahalad, "Beyond CRM: CK Prahalad predicts customer context is the next big thing," American Management Association MwWorld, 2004.
  • 11. K.-L. Skillen, L. Chen, C. D. Nugent, M. P. Donnelly, W. Burns, and I. Solheim, "Ontological user profile modeling for context-aware application personalization," in Ubiquitous Computing and Ambient Intelligence, ed: Springer, 2012, pp. 261–268.
  • 12. Z. Yu, Y. Nakamura, S. Jang, S. Kajita, and K. Mase, "Ontology-based semantic recommendation for context-aware e-learning," in Ubiquitous Intelligence and Computing, ed: Springer, 2007, pp. 898–907.
  • 13. R. Torres, S. M. McNee, M. Abel, J. A. Konstan, and J. Riedl, "Enhancing digital libraries with TechLens," in Digital Libraries, 2004. Proceedings of the 2004 Joint ACM/IEEE Conference on, 2004, pp. 228–236.
  • 16. P. Resnick, N. Iacovou, M. Suchak, P. Bergstrom, and J. Riedl, "GroupLens: an open architecture for collaborative filtering of netnews," in Proceedings of the 1994 ACM conference on Computer supported cooperative work, 1994, pp. 175–186.
  • 18. U. Shardanand and P. Maes, "Social information filtering: algorithms for automating “word of mouth”," in Proceedings of the SIGCHI conference on Human factors in computing systems, 1995, pp. 210–217.
  • 20. N. Hariri, B. Mobasher, and R. Burke, "Context-aware music recommendation based on latent topic sequential patterns," in Proceedings of the sixth ACM conference on Recommender systems, 2012, pp. 131–138.
  • 21. N. Agarwal, E. Haque, H. Liu, and L. Parsons, "Research paper recommender systems: A subspace clustering approach," in International Conference on Web-Age Information Management, 2005, pp. 475–491.
  • 22. K. Chandrasekaran, S. Gauch, P. Lakkaraju, and H. P. Luong, "Concept-based document recommendations for citeseer authors," in International Conference on Adaptive Hypermedia and Adaptive Web-Based Systems, 2008, pp. 83–92.
  • 23. M. Gori and A. Pucci, "Research paper recommender systems: A random-walk based approach," in Web Intelligence, 2006. WI 2006. IEEE/WIC/ACM International Conference on, 2006, pp. 778–781.
  • 24. A. Kodakateri Pudhiyaveetil, S. Gauch, H. Luong, and J. Eno, "Conceptual recommender system for CiteSeerX," in Proceedings of the third ACM conference on Recommender systems, 2009, pp. 241–244.
  • 25. K. Haruna, M. A. Ismail, and S. M. Shuhidan, "Domain of Application in Context-Aware Recommender Systems: A Review," Knowledge Management International Conference (KMICe) 2016, 29–30 August 2016, Chiang Mai, Thailand 2016.
  • 27. A. Azaria, A. Hassidim, S. Kraus, A. Eshkol, O. Weintraub, and I. Netanely, "Movie recommender system for profit maximization," in Proceedings of the 7th ACM conference on Recommender systems, 2013, pp. 121–128.
  • 28. M. Schedl, P. Knees, B. McFee, D. Bogdanov, and M. Kaminskas, "Music recommender systems," in Recommender Systems Handbook, ed: Springer, 2015, pp. 453–492.
  • 30. D. M. Pennock, E. Horvitz, S. Lawrence, and C. L. Giles, "Collaborative filtering by personality diagnosis: A hybrid memory-and model-based approach," in Proceedings of the Sixteenth conference on Uncertainty in artificial intelligence, 2000, pp. 473–480.
  • 32. R. Fano, "Information theory and the retrieval of recorded information," in Documentation in Action, J. H. Shera, A. Kent, and J. W. Perry (Eds.), New York: Reinhold Publ. Co, 1956, pp. 238–244.
  • 33. I. V. Marshakova, "System of document connections based on references," Nauchno-Tekhnicheskaya Informatsiya Seriya 2-Informatsionnye Protsessy I Sistemy, pp. 3–8, 1973.
  • 34. C. L. Giles, K. D. Bollacker, and S. Lawrence, "CiteSeer: An automatic citation indexing system," in Proceedings of the third ACM conference on Digital libraries, 1998, pp. 89–98.
  • 35. Y. Sibaroni, D. H. Widyantoro, and M. L. Khodra, "Survey on research paper's relations," in Information Technology Systems and Innovation (ICITSI), 2015 International Conference on, 2015, pp. 1–6.
  • 36. K. Sugiyama and M.-Y. Kan, "Serendipitous recommendation for scholarly papers considering relations among researchers," in Proceedings of the 11th annual international ACM/IEEE joint conference on Digital libraries, 2011, pp. 307–310.
  • 38. R. Kumar, J. Novak, and A. Tomkins, "Structure and evolution of online social networks," in Link Mining: Models, Algorithms, and Applications, ed: Springer, 2010, pp. 337–357.
  • 39. F. C. T. Chua, H. W. Lauw, and E.-P. Lim, "Predicting item adoption using social correlation," in Proceedings of the 2011 SIAM International Conference on Data Mining, 2011, pp. 367–378.
  • 40. I. Konstas, V. Stathopoulos, and J. M. Jose, "On social networks and collaborative recommendation," in Proceedings of the 32nd international ACM SIGIR conference on Research and development in information retrieval, 2009, pp. 195–202.
  • 41. H. Ma, I. King, and M. R. Lyu, "Learning to recommend with social trust ensemble," in Proceedings of the 32nd international ACM SIGIR conference on Research and development in information retrieval, 2009, pp. 203–210.
  • 42. H. Ma, D. Zhou, C. Liu, M. R. Lyu, and I. King, "Recommender systems with social regularization," in Proceedings of the fourth ACM international conference on Web search and data mining, 2011, pp. 287–296.
  • 43. L. Backstrom, D. Huttenlocher, J. Kleinberg, and X. Lan, "Group formation in large social networks: membership, growth, and evolution," in Proceedings of the 12th ACM SIGKDD international conference on Knowledge discovery and data mining, 2006, pp. 44–54.
  • 52. F. Xia, N. Y. Asabere, H. Liu, N. Deonauth, and F. Li, "Folksonomy based socially-aware recommendation of scholarly papers for conference participants," in Proceedings of the 23rd International Conference on World Wide Web, 2014, pp. 781–786.

  • Sensors (Basel)

A Systematic Review of Recommender Systems and Their Applications in Cybersecurity

Aleksandra Pawlicka

1 ITTI Sp. z o.o., Rubież 46, 61-612 Poznań, Poland; mpawlicki@itti.com.pl (M.P.); rkozik@itti.com.pl (R.K.)

Marek Pawlicki

Rafał Kozik, Ryszard S. Choraś

2 Institute of Telecommunications and Computer Sciences, UTP University of Science and Technology, 85-796 Bydgoszcz, Poland; [email protected]

Associated Data

Not applicable.

This paper discusses the valuable role recommender systems may play in cybersecurity. First, the types of recommender systems are presented comprehensively, together with their advantages and disadvantages, possible applications and security concerns. Then, the paper collects and presents the state of the art concerning the use of recommender systems in cybersecurity, covering both existing solutions and future ideas. The contribution of this paper is two-fold. First, to the best of our knowledge, there has to date been no work collecting the applications of recommenders for cybersecurity. Second, this paper attempts to complete a comprehensive survey of recommender types, after noticing that other works usually mention two or three types at once and neglect the others.

1. Introduction

The digital revolution gave birth to cybercrime and brought about concerns over the security of citizens’ data, privacy and other digital assets. Cybersecurity became even more crucial at the outbreak of the Coronavirus (COVID-19) pandemic, when millions of people were forced to turn online almost overnight, without prior knowledge or experience. Never have so many people been so vulnerable to cyberattacks and online mischief [ 1 ].

When one must make choices without sufficient knowledge or experience of the alternatives, one has traditionally relied on advice and recommendations from other people. Such advice has come in several forms, from face-to-face conversations to film and book reviews in magazines, and from letters of recommendation to printed book guides. It is a natural, social process that the so-called recommender systems try to augment and assist, by giving personalized suggestions and preventing users from being overwhelmed by the amount of information [ 2 ].

Recommender systems have proven useful in innumerable applications, and each year new ways of employing the techniques are proposed. In their survey on the development of recommender system applications, the authors of [ 3 ] grouped the possible implementations into eight “e-” categories: e-government, e-business, e-commerce/e-shopping, e-library, e-tourism, e-resource services, and e-group activities. The application domains are shown in Figure 1 .

Figure 1. The application domains of recommender systems, based on the survey by [ 3 ].

However, they do not mention the possible applications of recommender systems in cybersecurity.

In cybersecurity, Security Operations Center (SOC) personnel face vast amounts of data coming from a variety of sources and accumulating at rapid speeds. The constant flow of information often overwhelms the analysts, making timely and adequate response and mitigation unsustainable [ 4 , 5 , 6 ]. So-called data triage automation is a well-known problem of SOCs [ 7 ], with the intensity of the domain and the characteristics of incident detection signals strongly degrading human performance [ 8 ]. However, the complexity of the task does not allow full automation at this point in time, which fuels an ongoing debate between the need to keep human operators in the loop despite their physical limitations and the current state of possible automation [ 4 , 9 , 10 ].

On top of all those issues, cybersecurity is not only the SOCs’ concern: small businesses, especially in the domain of e-commerce [ 11 ], are prone to falling victim to cyberattacks, as they do not have the budget and the necessary skillset to protect their own and their users’ assets [ 12 , 13 ].

One way of bridging the gap between the lack of resources allocated to cybersecurity, inadequate education, the limits of human capabilities and the technical state of automation in the domain is to build a system capable of recommending suitable cybersecurity response and mitigation measures. This solution would relieve the human operator of part of the load, and possibly make up for the deficiencies in education.

However, there have not been many cases of applying recommender systems in cybersecurity. Additionally, to the authors’ best knowledge, there has not been a survey gathering the cases where recommender systems were applied to aid cyberdefenders.

This is why a systematic review [ 14 ] has been conducted to identify the sources describing the applications of recommender systems in cybersecurity. The following work presents the results of the study.

The paper is structured as follows. In Section 2 , the review process has been described in detail and the Research Questions have been raised. Following this, the concept of recommender systems is discussed in detail, and the authors gather the list of recommender types. It is then followed by an outlook on their drawbacks and advantages, usual applications, and concerns about their security. Then, against this background, the results of the literature review are presented, i.e., the existing implementations of recommenders in cybersecurity are surveyed, along with some proposals thereof, followed by the final conclusions.

2. The Conduct of the Study

The desire to gather and describe the applications of recommender systems in cybersecurity has been the primary motivation for conducting the study. The fact that works on the applications of recommender systems tend not to mention this possible use has also contributed to it.

The pipeline of the study course has been presented in Figure 2 .

Figure 2. The pipeline of the study course.

At the beginning, the following research question was defined.

  • RQ1: What is the current state of the art regarding the application of recommender systems for cybersecurity?

Additionally, during the study, it turned out that the analyzed works often did not agree on how to divide recommender systems into types. Some of them mentioned only the basic types (e.g., [ 15 , 16 ]), or even fewer (as in [ 17 ]), while others mentioned other types in various combinations ([ 3 , 18 , 19 ], etc.). Thus, the authors of the study wished to aggregate as many types of recommender systems as possible, and another research question was formulated:

  • RQ2: What is the actual, up-to-date and most comprehensive division of the recommender system types?

The study contained in this paper gathers the applications of recommender systems in cybersecurity found in scientific literature. Before disclosing the cases, the context is set by completing a comprehensive survey of recommender systems, augmenting and unifying the taxonomies found across numerous previous surveys, also including the information found in method descriptions brought up in the investigated research pieces.

The study took place from February 2021 to May 2021. The literature items were gathered, analyzed and selected with a mix of bibliographic methodologies, including the snowballing method [ 20 ], pearl-growing [ 21 ] and citation searching, all adjusted to fit the criteria of the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) statement [ 22 ]. Pearl-growing is a methodology of citation and subject search which involves finding a highly relevant piece of research and using it to find more relevant sources by extracting both the bibliography and relevant keywords, which can then in turn be used in another wave of searches. It is akin to the snowballing method, which relies on consulting the bibliographies of numerous research pieces to find other relevant papers, and then consulting the bibliographies of those to find even more relevant items. The weak point of those methodologies is that they focus on going backwards in time. To offset this, the authors employed citation searching, which uses the ‘Citation’ or ‘Cited in’ tab made available by the major publishers to find research papers that cite the investigated piece, a procedure which yields more recent items.
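The difference between going backwards (snowballing through bibliographies) and going forwards (citation searching) can be illustrated with a breadth-first expansion over a citation graph. This is only a sketch of the idea, not the authors' procedure: the function `snowball`, the `is_relevant` screening predicate, the `max_rounds` limit and the toy graph are all hypothetical.

```python
from collections import deque


def snowball(seeds, neighbours, is_relevant, max_rounds=2):
    """Expand a set of seed papers breadth-first for a limited number of rounds."""
    found = set(seeds)
    frontier = deque((seed, 0) for seed in seeds)
    while frontier:
        paper, depth = frontier.popleft()
        if depth == max_rounds:
            continue  # do not expand papers found in the final round
        for other in neighbours(paper):
            if other not in found and is_relevant(other):
                found.add(other)
                frontier.append((other, depth + 1))
    return found


# Hypothetical graph: paper -> papers listed in its bibliography.
bibliography = {
    "seed": ["old1", "old2"],
    "old1": ["older"],
    "new1": ["seed"],
    "new2": ["new1"],
}
# Invert the graph to emulate a publisher's "Cited in" tab.
cited_by = {}
for paper, refs in bibliography.items():
    for ref in refs:
        cited_by.setdefault(ref, []).append(paper)


def backward(paper):
    """Backward step: follow a paper's bibliography to older work."""
    return bibliography.get(paper, [])


def forward(paper):
    """Forward step: follow the papers that cite it to newer work."""
    return cited_by.get(paper, [])


def accept_all(paper):
    """Placeholder screening step that keeps every paper."""
    return True


print(sorted(snowball(["seed"], backward, accept_all)))
print(sorted(snowball(["seed"], forward, accept_all)))
```

Run backwards, the expansion only reaches papers older than the seed; run forwards, it only reaches newer ones, which is why the two directions complement each other.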

First, the keywords used for searching the items were determined. They were the combinations of the words “recommender systems” and “cybersecurity”, “threat intelligence”, or “attack mitigation”. Additionally, this set was expanded to include the alternative spellings: “recommendation systems” and “cyber security”.
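The keyword combinations described above can be enumerated mechanically. In the sketch below, the term lists come from the text, while the variable names and the quoted `AND` query syntax are assumptions; each database has its own search syntax, and the paper does not state the exact query strings used.

```python
from itertools import product

# Terms taken from the text, including the alternative spellings.
recommender_terms = ["recommender systems", "recommendation systems"]
security_terms = [
    "cybersecurity",
    "cyber security",
    "threat intelligence",
    "attack mitigation",
]

# Assumption: a quoted-phrase AND query, which many search fields accept.
queries = [f'"{a}" AND "{b}"' for a, b in product(recommender_terms, security_terms)]

for query in queries:
    print(query)
```

Two recommender spellings crossed with four security terms give eight query strings to run against each database.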

Then, the publication databases were determined. The sources were searched in the following journal databases: Institute of Electrical and Electronics Engineers Xplore (IEEEXplore), SpringerLink, arXiv, Elsevier, Association for Computing Machinery (ACM) Digital Library and ScienceDirect. More general sources such as Google Scholar were not taken into account, as they mostly index the works from the aforementioned databases. The keywords were entered in the search fields of the selected databases. The initial, collective number of all the search engine hits was 1,700,067. This huge number resulted from the fact that some of the search engines returned an immense number of false positives; as the results became gradually less and less relevant, browsing stopped when the researchers decided the items had become irrelevant to the study. The breakdown of the initial hits is presented in Table 1 .

Table 1. Search engine hit breakdown.

At this stage, the titles, keywords and abstracts (if available) of the papers were analyzed.

The inclusion criteria for the papers were as follows: the papers had to describe the use of recommender systems in cybersecurity, be published in conferences or journals, and be written in English. During the study, several theses were found which appeared relevant, so they were also subjected to further analyses. The exclusion criteria were as follows: papers did not address the relation between recommender systems and cybersecurity, or were the duplicates of the previously identified items.

Altogether, as a result of the aforementioned bibliographic methods, 393 research items were marked as potentially relevant and further analyzed. Out of them, 86 works were selected for the survey due to their quality and appropriateness to the topics of this survey, among them 12 other surveys [ 3 , 15 , 16 , 17 , 18 , 23 , 24 , 25 , 26 , 27 , 28 , 29 ]. The papers most relevant to the application of recommender systems for attack mitigation are discussed in Section 4 .

The following section presents the concept of recommender systems and their types; this will serve as the background for the analysis’ results which will be introduced in Section 4 .

3. What Are Recommender Systems?

The simplest definition of recommender systems is that they are programs attempting to recommend the best items to particular users, with the user’s interest in the item being predicted using the data on the items, the users, and the relations between them. Another definition emphasizes the fact that recommender systems assist and augment the social process of using recommendations of others, to make choices, when an individual lacks sufficient personal knowledge or experience of the possible alternatives [ 2 ].

The recommended items are usually particular products or services, while the users may be both individuals and businesses [ 3 ]. Recommender systems are built to analyze data effectively and extract only the most relevant information from enormous volumes of data, thus avoiding information overload and making the service as personalized as possible. An effective recommender system can “guess”, or “predict”, a particular user’s interest or preference, based on the analysis of their behavior or the behaviors of other users. Systems of this kind became an independent research topic in the mid-1990s [ 3 ].

The application of recommendation systems offers several significant advantages. In the online shopping environment, they enable easier finding of items, thus cutting the transaction costs, and ultimately, as they contribute to selling more products, they help yield more revenue [ 15 ]. However, although recommender systems are mostly known for the e-commerce applications, they have been found useful in multiple other domains. For example, in scientific online catalogues, they may enhance the experience, by recommending publications which are beyond the searches. The recommender systems which have entered collective consciousness, i.e., the ones that people are most often aware of are, e.g., the systems used by Netflix [ 30 ], YouTube [ 31 ], Amazon [ 32 ], or Hulu [ 33 ]. In general, recommendation systems have been widely accepted tools for boosting the decision-making processes in various domains [ 15 ]. The following benefits of the application of recommender systems have been listed by [ 23 ]: increased revenue, boosted client satisfaction, better-fit personalization, fulfilling the people’s need for discovery, and scrupulous reporting.

3.1. Basic Terms

Although there exist plenty of algorithms and techniques which fall under the umbrella of recommender systems, they all have several elements in common. According to [ 34 ], each recommender system contains the following three elements:

  • items;
  • users;
  • transactions.

By items one understands the entity that is recommended by means of the system, i.e., its output [ 35 ]. Items have the attributes users are interested in. They also possess other structures, used to rate/value them. Then, the users are the ones, whose information is used by recommenders to make new item recommendations. Users have various goals and characteristics. Finally, a transaction means the potential interaction between a user and the system. The item rating is based on the set of transactions. In recommender systems, the information gathered from some transactions is used to make a new recommendation [ 34 ].

At the beginning of recommender systems, the features that researchers concentrated on were of two-dimensional, user x item nature. This was caused by the scarcer computational resources as well as by insufficient knowledge. Presently, additional data are gathered and used to make the predictions more accurate, such as demographic, temporal or social network data [ 36 ].

3.2. Creating a Recommender System: The Principles

When creating a recommender system, ref. [ 34 ] proposes that the process take into account the following dimensions of the recommendation problem:

  • “Users: who are the users of the system? What are their goals?
  • Data: What are the characteristics of the data the recommendations are based on?
  • Application: what is the application the recommender is part of? [ 34 ]”

3.3. Filtering Techniques

There are several techniques used for building recommendation systems. The most popular ones are: Collaborative Filtering (CF) [ 37 ], Content-Based (CB) [ 38 ] and Knowledge-Based (KB) [ 24 ]; they can also be combined to form various hybrids. The abovementioned types are the ones which most researchers agree on. However, there are other, less common methods that are mentioned by far fewer papers, or on whose classification the researchers disagree. This paper aims at mentioning at least most of the less-known and less-used techniques, and lists them separately from the main three types. Figure 3 shows the division applied for the sake of this work.

Figure 3. Recommender system types by filtering technique; the division devised by the authors for the sake of this work.

3.3.1. Collaborative Filtering (CF)

Collaborative filtering was called “the most mature and the most commonly implemented” technique [ 15 ]. It has mostly been applied in the analysis of customer preferences and purchase patterns, and of other content which cannot be described by metadata, such as music or movies. The basic task of collaborative filtering is to find users with similar preferences or tastes based on their opinions, the so-called nearest neighbors; in other words, to search for the items that the user may like, based on the reactions of the users who have a similar taste. Simply put, collaborative filtering analyses big groups of people and aims at finding much smaller sets of users who share their preferences with the user of interest. The items which were liked by the people from the set are the basis for building a ranked list of recommendations; the similarity of users may be calculated in several ways, and so may the recommendations based on this data [ 39 ]. Figure 4 presents the types of CF discussed in this paper.

Figure 4. The kinds of collaborative filtering discussed in this work.

In collaborative filtering, the results may either be predictions or recommendations. The former come in the form of a numerical value Rij , the predicted score of the item j for a particular user i . The latter come as a list of N items which will most likely be selected by a user [ 15 ].

As this kind of filtering is based on the actions of the existing users, i.e., the recommendation is produced depending on the observations of the actions a new user takes and comparing them with the actions and ratings of the existing users, in most CF techniques, the generated preferences are based on a user–item matrix [ 19 ]. Such a matrix is built of a set of items and a set of users who reacted to some items. As [ 39 ] explains, this reaction may be either explicit (numeric rating, or expressing “likes” or “dislikes”) or implicit (i.e., the user’s behavior: clicking the link, the fact of viewing the item, adding it to a wishlist, or spending a certain amount of time on the item). A sample of a matrix of this kind has been presented in Figure 5 .

Figure 5. A sample user–item matrix.

In this matrix, rows represent the ratings users gave, while columns mean the ratings that a particular item received. Therefore, the third user has rated the third item and gave it a rating of 1, etc. As it is hardly possible for every user to rate every possible item, and realistically, a person usually rates a few items, most matrix cells will remain empty. If most cells of the matrix are empty, it is called sparse; if most cells are filled, then it is said to be dense [ 39 ].
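The notion of sparsity can be made concrete with a small sketch. The matrix below is a hypothetical 4 × 5 example (it is not the matrix from Figure 5), and the helper name `sparsity` is introduced here purely for illustration; missing ratings are encoded as NaN.

```python
import numpy as np

# Hypothetical 4-user x 5-item rating matrix; np.nan marks "not rated".
ratings = np.array([
    [5.0, np.nan, np.nan, 1.0, np.nan],
    [np.nan, 4.0, np.nan, np.nan, np.nan],
    [np.nan, np.nan, 1.0, np.nan, 2.0],
    [3.0, np.nan, np.nan, np.nan, np.nan],
])


def sparsity(matrix: np.ndarray) -> float:
    """Return the fraction of cells that hold no rating."""
    return float(np.isnan(matrix).mean())


# 14 of the 20 cells are empty, so most of the matrix is unrated.
print(f"sparsity = {sparsity(ratings):.2f}")
```

Real-world rating matrices are typically far sparser than this toy example, which is why the empty cells, rather than the filled ones, dominate the design of CF algorithms.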

Based on the data, the nearest neighbors are uncovered. The collaborative recommender systems assume that strongly correlated users will have an interest in similar items. Such users are grouped into the so-called neighborhoods. Consequently, this kind of system will recommend buying the items which one of the similar users bought and the other did not, with the prediction value for that item being drawn from the proximity of this neighbor to the average rating of this user; in other words, a user is advised to choose the items that other users in their neighborhood found favorable [ 15 ].

To calculate the similarity between users or items, several methods may be employed. One of the most widely used ones is the Pearson’s Correlation Coefficient (PCC); it has also been proven to perform better than other metrics by several researchers (e.g., [ 40 ]). It is used to determine the strength of the linear relationship between two variables. It is expressed by Equation ( 1 ) [ 36 ]:

where I_uv is the set of the items that both users u and v have rated, while r_ui and r_vi are the ratings that users u and v, respectively, gave to item i. In this metric, the similarity is measured on a scale from −1 to 1, where −1 is a perfect negative correlation, 0 means no correlation, while 1 represents a perfect positive correlation. The users who display high positive correlation values have rated the items in a very similar way.
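For reference, the textbook form of the Pearson correlation over co-rated items can be written with the symbols used in the text. This is a sketch of the standard formula, not necessarily the exact notation of [ 36 ]; the symbols \bar{r}_u and \bar{r}_v are an assumption and denote the mean ratings of users u and v (some variants take the mean over all rated items, others only over the co-rated set I_{uv}).

```latex
\mathrm{sim}(u, v) =
\frac{\sum_{i \in I_{uv}} (r_{ui} - \bar{r}_u)\,(r_{vi} - \bar{r}_v)}
     {\sqrt{\sum_{i \in I_{uv}} (r_{ui} - \bar{r}_u)^2}\;
      \sqrt{\sum_{i \in I_{uv}} (r_{vi} - \bar{r}_v)^2}}
```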

Another function which is commonly used in similarity calculations is the cosine similarity, shown in Equation ( 2 ) [ 36 ].

Cosine similarity calculates the similarity by measuring the cosine of the angle between two vectors of an inner product space [ 41 ]. The input parameter u is a user, while v is an item. If the item matches the user exactly, the angle between the two vectors would be 0 degrees, the cosine of which is 1. An angle of 90 degrees gives a score of 0, while the most “dissimilar” vectors, pointing in opposite directions at 180 degrees, give a score of −1. All the values between −1 and 1 represent intermediate (dis)similarity [ 35 ].
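The behavior of the cosine measure at its characteristic angles can be checked with a minimal sketch. It uses the standard definition, the dot product divided by the product of the vector norms; the function name and the choice to return 0 for an all-zero vector are assumptions made for this illustration.

```python
import numpy as np


def cosine_similarity(a, b) -> float:
    """Cosine of the angle between vectors a and b."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    if denom == 0.0:
        # Assumption: treat an all-zero vector as "no similarity".
        return 0.0
    return float(a @ b / denom)


same_direction = cosine_similarity([1, 2], [2, 4])    # angle of 0 degrees
orthogonal = cosine_similarity([1, 0], [0, 1])        # angle of 90 degrees
opposite = cosine_similarity([1, 2], [-1, -2])        # angle of 180 degrees
print(same_direction, orthogonal, opposite)
```

With non-negative data such as ratings or word counts, the vectors can never point in opposite directions, so in practice the scores fall between 0 and 1.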

Other algorithms used for calculating similarity are, for example, the Euclidean distance, Manhattan distance, Spearman correlation, entropy-based uncertainty, mean-squared difference, Minkowski distance, etc. [ 35 , 41 ].

Collaborative filtering approaches fall into two main categories, memory-based and model-based recommendations. The memory-based techniques can be further divided into item-based and user-based ones; this paper also discusses additional, less-known ones. In turn, by the model-based techniques one usually means matrix factorization algorithms and deep learning [ 42 ].

Memory-Based Collaborative Filtering

In the memory-based collaborative filtering, the recommendations are made based on the whole collection of the rated items. The list of recommended items is built based on the items chosen and rated frequently by the users belonging to the same group. These techniques are composed of the following steps: data pre-processing, selecting the set of K users/items that show the greatest similarity to the particular user/the items they have already rated (a.k.a., selecting the neighborhood) and computing the recommendations, i.e., generating predictions and listing the top-N of them [ 41 ]. The similarity between users or items can be measured using several metrics, usually the Pearson correlation coefficient and cosine similarity.

In the user-based approach, the recommender engine matches users based on their taste in the product in question [ 43 ]. Simply put, in user-based collaborative filtering, the user U and the set of users similar to them are selected. Then, a rating is sought for an item that U has not yet rated. It is calculated from the ratings of N of the similar users who did rate the item.
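The user-based procedure can be sketched as follows. The ratings, the user names and the similarity-weighted average used for the final prediction are all assumptions for illustration; the text does not prescribe a specific aggregation formula, and other variants (e.g., mean-centered predictions) exist.

```python
from math import sqrt

# Hypothetical ratings: user -> {item: rating}. "ann" has not rated item "d".
ratings = {
    "ann": {"a": 5, "b": 3, "c": 4},
    "bob": {"a": 4, "b": 2, "c": 3, "d": 5},
    "cat": {"a": 1, "b": 5, "c": 2, "d": 1},
    "dan": {"a": 5, "b": 3, "c": 5, "d": 4},
}


def pearson(u, v):
    """Pearson correlation of users u and v over their co-rated items."""
    common = sorted(set(ratings[u]) & set(ratings[v]))
    if len(common) < 2:
        return 0.0
    mean_u = sum(ratings[u][i] for i in common) / len(common)
    mean_v = sum(ratings[v][i] for i in common) / len(common)
    num = sum((ratings[u][i] - mean_u) * (ratings[v][i] - mean_v) for i in common)
    den = sqrt(sum((ratings[u][i] - mean_u) ** 2 for i in common)) * sqrt(
        sum((ratings[v][i] - mean_v) ** 2 for i in common)
    )
    return num / den if den else 0.0


def predict(user, item, n=2):
    """Similarity-weighted average over the n most similar users who rated item."""
    candidates = [
        (pearson(user, v), v)
        for v in ratings
        if v != user and item in ratings[v]
    ]
    # Keep only positively correlated neighbors, most similar first.
    top = sorted((c for c in candidates if c[0] > 0), reverse=True)[:n]
    if not top:
        return None
    return sum(s * ratings[v][item] for s, v in top) / sum(s for s, _ in top)


print(predict("ann", "d"))
```

Here "bob" and "dan" rate in step with "ann" and both liked item "d", while "cat" is negatively correlated and is ignored, so the predicted rating lands between 4 and 5.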

Item-based recommendation rests on the concept that customers tend to choose items similar to the ones they expressed an interest in, and at the same time will not buy the items they are not interested in. In this kind of system, the user–item matrices are used as an input for finding the relations among various items. The analysis of how the items interact is the basis for generating a personalized recommendation.

The item-based algorithms tend to perform better than the user-based ones, as the latter are known to have scalability issues, i.e., when the user–item matrix is of substantial size, the computational time becomes very considerable. As the relations between items are more stable than those among users, item-based algorithms usually need less computational time to make correct predictions, or the computations may be performed offline. Then, the rating may be calculated from the Pearson’s correlation coefficient and the N nearest neighbors.
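A minimal sketch of the item-based variant is given below. It is an illustration under stated assumptions: the matrix is hypothetical, unrated cells are encoded as 0, and cosine similarity between item columns is swapped in for the Pearson coefficient mentioned above to keep the code short. The item–item similarity matrix is computed once, mimicking the offline step, and then reused for each prediction.

```python
import numpy as np

# Hypothetical ratings: rows = users, columns = items, 0 = not rated.
R = np.array([
    [5, 4, 0, 1],
    [4, 5, 1, 0],
    [1, 0, 5, 4],
    [0, 1, 4, 5],
], dtype=float)


def item_similarity(R: np.ndarray) -> np.ndarray:
    """Cosine similarity between item columns, with a zeroed diagonal."""
    norms = np.linalg.norm(R, axis=0)
    norms[norms == 0] = 1.0  # avoid division by zero for unrated items
    S = (R.T @ R) / np.outer(norms, norms)
    np.fill_diagonal(S, 0.0)
    return S


S = item_similarity(R)  # can be precomputed offline


def predict(user: int, item: int):
    """Weighted average of the user's own ratings, weighted by item similarity."""
    rated = R[user] > 0
    weights = S[item, rated]
    if weights.sum() == 0:
        return None
    return float(weights @ R[user, rated] / weights.sum())


print(predict(0, 2))
```

User 0 rated the item most similar to item 2 (item 3) poorly, so the predicted rating for item 2 is low, at roughly 2.1.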

Very seldom, the researchers classify other methods as memory-based collaborative filtering, such as Predictability Paths, cluster-based smoothing and trust inferences in [ 41 ], and so on.

Collaborative filtering faces several challenges, one of them being the so-called sparsity. It results from datasets lacking big amounts of data, which makes it hardly possible to make accurate recommendations. Another issue is called a cold start. It arises when a new user has rated too few items for the matrix to make accurate predictions based on the available ratings. It can also happen when the ratio of items to users is very high. In such a case, it may be impossible for the user to rate enough items for the recommender to make a prediction. Both sparsity and cold start result from the fact that it is hard to match users with a low number of ratings to similar neighbors. One possible solution to these problems is to apply the spreading activation technique. It consists of turning the data from the matrix into a graph and finding the relations between users and items. In a graph in which distance is the number of edges between the item and the user, the recommendations are built depending on how close the item is to the user [ 44 ]. Another possible solution makes recommendations based on similar user ratings and probability. The accuracy of suggestions improves by using the information which does exist for the users [ 45 ]. Finally, the accuracy may be improved by using a hybrid system, i.e., exploring other data sources, such as item attributes or demographic data [ 19 ].
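The graph-distance idea described above can be illustrated with a short sketch. It covers only the simplified reading given in the text (distance as the number of edges between a user and an item, found with a breadth-first search) and is not a full spreading activation algorithm; the interaction data and function names are hypothetical.

```python
from collections import deque

# Hypothetical interactions: user -> items the user has rated or bought.
interactions = {
    "u1": ["i1", "i2"],
    "u2": ["i2", "i3"],
    "u3": ["i3", "i4"],
}


def build_graph(interactions):
    """Turn the user-item data into an undirected bipartite graph."""
    graph = {}
    for user, items in interactions.items():
        for item in items:
            graph.setdefault(user, set()).add(item)
            graph.setdefault(item, set()).add(user)
    return graph


def distances_from(graph, start):
    """Breadth-first search: number of edges from start to every reachable node."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for neighbor in graph[node]:
            if neighbor not in dist:
                dist[neighbor] = dist[node] + 1
                queue.append(neighbor)
    return dist


def recommend(interactions, user):
    """Rank unseen, reachable items by their graph distance from the user."""
    dist = distances_from(build_graph(interactions), user)
    seen = set(interactions[user])
    all_items = {i for items in interactions.values() for i in items}
    candidates = [(dist[i], i) for i in all_items if i not in seen and i in dist]
    return [item for _, item in sorted(candidates)]


print(recommend(interactions, "u1"))
```

Even though "u1" shares no item with "u3", item "i4" is still reachable through the chain u1, i2, u2, i3, u3, which is how the graph view eases the sparsity problem.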

Another issue which can directly impact the recommendation process is that of the so-called grey-sheep customers, i.e., the customers whose tastes are either unique or exotic. Thus, the recommendations made for them could be of poor quality, and, conversely, the grey-sheep customers negatively affect the recommendations made for all the other customers [ 46 , 47 ].

Model-Based Collaborative Filtering

Some of the drawbacks of the item-based and user-based models have been addressed by the model-based ones. The Netflix competition in 2006–2009 was said to spark interest in this type of technique [ 48 ]. The advantage of this approach lies in the fact that the data are first processed in an offline manner, subsequently leading to building a model. This makes it possible to avoid real-time computations. Additionally, by this approach, machine learning methods are used to find user ratings of unrated items. Thus, the model-based recommenders, after training, can make very accurate recommendations [ 25 ]. However, it must be noted that without sufficient training, the models may be less accurate than the memory-based methods [ 41 ]. Generally speaking, the model-based approach involves reducing or compressing a user–item matrix which is substantial in size, but sparse. The technique is called dimensionality reduction. It relates to the cases where just a small part of the available items has user ratings. Memory-based techniques are not able to generate accurate predictions in such cases; on the other hand, model-based algorithms are designed to find latent factors (i.e., implicit, product-specific features) that help predict the lacking ratings [ 49 ]. As the user–item matrix consists of two dimensions (the number of users and the number of items), reducing the dimensions which are mostly empty contributes to boosted performance of the algorithm. The dimensionality reduction is usually performed by means of matrix factorization. It can also be done with autoencoders, clustering-based algorithms, etc.

Matrix factorization consists of breaking a large matrix down into a product of smaller ones [ 16 , 41 ]. The algorithms employed for factorizing matrices are Singular Value Decomposition (SVD), Principal Component Analysis (PCA), Non-negative Matrix Factorization (NMF), and so on [ 41 ]. Using these algorithms, features may be extracted for every product that has been rated. Then, a comparison is made between them and the items which do not have any ratings and finally, based on this, the rating is predicted [ 35 , 36 ].
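The factorization idea can be sketched with a truncated SVD. This is a deliberately naive illustration: the matrix is hypothetical and its unrated cells are simply filled with zeros, whereas practical recommenders usually fit the factors on the observed ratings only (for example with stochastic gradient descent or alternating least squares). The function name `low_rank_approx` is introduced here for illustration.

```python
import numpy as np

# Hypothetical ratings: rows = users, columns = items, 0 = not rated.
R = np.array([
    [5, 4, 0, 1],
    [4, 5, 1, 0],
    [1, 0, 5, 4],
    [0, 1, 4, 5],
], dtype=float)


def low_rank_approx(R: np.ndarray, k: int) -> np.ndarray:
    """Rebuild R from its k largest singular values (k latent factors)."""
    U, s, Vt = np.linalg.svd(R, full_matrices=False)
    return U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]


R_hat = low_rank_approx(R, k=2)
# Cells that were 0 (unrated) now carry estimates derived from the two factors.
print(np.round(R_hat, 2))
```

Keeping only two factors compresses the matrix into a "taste" space in which the previously empty cells receive non-trivial values that can be read as predicted ratings.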

By clustering-based algorithms, one usually means the k-Nearest Neighbors (kNN), a Machine Learning (ML) technique. The aim of this algorithm is to search for clusters of similar users, the similarity being based on the users’ past behavior (like their ratings, the items they had already bought, etc.). Although the user-based collaborative filtering is based on the same concept, with the kNN the similarities are found based on an unsupervised machine learning model. Additionally, the number of similar users is limited to k [ 42 ]. It is worth noting that some researchers argue this technique does not belong to the model-based recommenders, but should rather be classified as a memory-based one because, although it is a machine learning technique, it is of non-parametric nature [ 42 ].

3.3.2. Content-Based Filtering

The content-recommendation systems are said to be most effective for recommending text-based items, such as documents, news items or web pages [ 15 ].

In the content-based recommendation systems, the predictions are made based on the data on the items and past actions of users, and not on the other users’ choices [ 15 ]. The concept behind this kind of filtering is that a user tends to buy future items which in some ways are similar to the ones they previously bought. The similarity of items is calculated based on several of their features and/or attributes. The main challenge of this approach consists of the collection of the data about items. Lack thereof results in sparsity, as in the case of collaborative filtering. Although in content-based recommender systems there is no need for a vast number of users or item ratings (like in collaborative filtering), they require a proper amount of information to make accurate predictions. The features/attributes used in this type of filtering include the metadata or even the actual contents of documents [ 19 ].

Generally speaking, in CB recommender systems, two main techniques have been applied for making recommendations. One of them, the heuristic one, uses traditional tools such as cosine similarity measures. The other approach employs machine learning and statistical methods, using the past users’ data for learning the models which are then able to predict users’ interests [ 50 ]. With the heuristic method, when making a recommendation, a vector is built, in which 1 means a word is present within a document, and 0 indicates the document does not contain it. Following that, the vector is compared with other documents by the recommendation system. One of the challenges to this method is the fact that the vector favors longer documents, and that the frequency of a word in the document is not taken into account [ 51 ]. To solve this problem, the analysis of documents should be made by means of the technique called Term Frequency-Inverse Document Frequency (TF-IDF). TF takes into consideration the number of appearances of a word within a document. In turn, IDF attributes greater weight to the words which are present in few documents, thus helping to emphasize the difference between a document and the others [ 52 ]. Other means of modeling the relations between the items could be probabilistic models, such as the Naïve Bayes classifier [ 53 ], decision trees [ 54 ] or neural networks [ 55 ]. In these techniques, the model is learned using machine learning or statistical analysis techniques.
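The TF-IDF weighting can be sketched in a few lines. The documents are hypothetical, and the specific formulas (term count divided by document length for TF, the natural logarithm of N divided by the document frequency for IDF) are one common variant assumed here; libraries often apply additional smoothing or normalization.

```python
import math
from collections import Counter

# Hypothetical tokenized documents.
docs = {
    "d1": "the cat sat on the mat".split(),
    "d2": "the dog sat on the log".split(),
    "d3": "cats and dogs".split(),
}


def tf_idf(docs):
    """Return {document: {term: weight}} using tf = count/len and idf = ln(N/df)."""
    n_docs = len(docs)
    # Document frequency: in how many documents does each term occur?
    df = Counter(term for words in docs.values() for term in set(words))
    weights = {}
    for name, words in docs.items():
        counts = Counter(words)
        weights[name] = {
            term: (count / len(words)) * math.log(n_docs / df[term])
            for term, count in counts.items()
        }
    return weights


w = tf_idf(docs)
# "cat" occurs in one document only, so it outweighs the more frequent "the".
print(round(w["d1"]["cat"], 3), round(w["d1"]["the"], 3))
```

Although "the" appears twice in d1 and "cat" only once, "cat" receives the higher weight because it is absent from the other documents, which is exactly the correction the binary word vector lacks.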

The recommendation process begins with gathering the information on and the ratings of the items previously bought by a user. Then, the system searches for similar items. The similarity is determined as follows: using a similarity calculation, the items are collected into neighborhoods; in other words, neighborhoods are built by making a comparison between new items and the items already in the inventory. If a user bought an item, it counts as a vote for the neighborhood associated with the item. An item ought to be recommended to a user if they rated highly k of the nearest neighbors. This method is said to require relatively little data to make accurate predictions, and to be easily adaptable [ 51 ], i.e., if there are changes to the user’s profile, the recommender can swiftly adjust the recommendations [ 15 ].

It is helpful to keep long-term profiles of users, as it makes the recommendations more accurate, although short-term profiles are useful as well, e.g., when users tend to grow new interests in items. Apart from the problems mentioned above, the content-based filtering may face other challenges. The type and quality of the data associated with the items is of vital importance to CB recommenders. If the item is not text-based, extracting its features may prove to be a daunting task. In turn, with the text-based items, the systems consider words only, while other, more subjective attributes are neglected. A possible solution could be entering the attributes manually; however, this solution is not entirely realistic because of how time- and resource-consuming it is. The CB recommendation systems may experience sparsity as well, when there are not enough item attributes. There is also a concern if the content-based system begins to make suggestions which are too similar. For example, it may recommend to a user the same item they bought previously, but from a different company. To avoid this kind of problem, the system may either be supplied with some diversity, or the items which are too similar must be excluded by means of a filter [ 56 ]. It is also possible that a content-based recommender system experiences the cold start issue; however, as this type of recommender system needs just a few ratings or pieces of information on past actions, the problem is not as severe as with the collaborative filtering recommenders [ 51 ]. Lastly, content-based filtering may sometimes suffer from overspecialization, i.e., the situation in which the system starts recommending items which are very similar to one another, without suggesting novel items [ 50 ].

3.3.3. Knowledge-Based Filtering

As [ 57 ] put it, knowledge-based recommenders are those that use different knowledge sources from the ones used by collaborative filtering and content-based recommender systems. Unlike the recommendations described before, the knowledge-based ones do not rely on the user–item data. Their predictions are based on explicit rules about the problem domain, as well as on the attributes of items. They do not track the actions of users and do not collect ratings; instead, this type of system gathers specific requirements from the users. Therefore, there is no problem with sparsity, even in the case of the seldom-bought items [ 57 ].

Generally speaking, knowledge-based recommenders may be divided into two approaches, constraint-based and case-based ones. Some researchers, such as [ 58 ], classify utility-based recommenders as belonging to the knowledge-based ones. For the sake of this paper, utility-based recommendation systems have been discussed separately.

The constraint-based recommenders are so called because they compare the attributes of an item against the constraints, i.e., the requirements that the users give, or the constraints from the product domain. In other words, in this method, recommending equals satisfying the constraints, with the products which fulfil the constraints being the good recommendations [ 58 ]. This type of recommendation may also be made using a conjunctive query over the product database. In this approach, the user requirements for attributes are connected and a conjunctive query is created. Following this, a database query is made, and the items meeting the constraints are returned.
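A conjunctive query of this kind can be sketched without a database by treating each requirement as a predicate and joining them with a logical AND. The catalog, attribute names and thresholds below are hypothetical.

```python
# Hypothetical product catalog.
catalog = [
    {"name": "cam-a", "price": 180, "zoom": 5, "waterproof": True},
    {"name": "cam-b", "price": 250, "zoom": 8, "waterproof": True},
    {"name": "cam-c", "price": 120, "zoom": 3, "waterproof": False},
]

# User requirements expressed as predicates over a product.
requirements = [
    lambda p: p["price"] <= 200,
    lambda p: p["zoom"] >= 4,
]


def recommend(catalog, constraints):
    """Return the names of products that satisfy every constraint (logical AND)."""
    return [p["name"] for p in catalog if all(c(p) for c in constraints)]


print(recommend(catalog, requirements))

# Adding a constraint that no product can meet yields an empty result.
too_strict = requirements + [lambda p: p["price"] <= 100]
print(recommend(catalog, too_strict))
```

The second call shows the characteristic weakness of the approach: an over-constrained query returns nothing at all, rather than the nearest match.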

There are also case-based recommendation systems. They make predictions based on the similarities between items and the requirements. According to [ 59 ], the similarity of an item depends on the sum of the similarities of all its attributes, weighted by the requirements. Consequently, the distance between an attribute and a requirement determines which other items are of interest. In other words, local similarity is used for finding similar items. It is found by dividing the distance of the attribute of the item from the desired attribute by the total range of the attribute. Other case-based recommendations are made using a query-based paradigm, which is effective as long as users have specific requirements. The abovementioned case-based recommenders in fact depend on suggesting items that are close to the requirements, and do not need to fulfil all of them.
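The local and global similarity computation can be sketched as follows. Two assumptions go beyond the text: the normalized distance is subtracted from 1 so that a higher value means a closer match, and the weighted sum is divided by the total weight to keep the result between 0 and 1. The attribute names, ranges and weights are hypothetical.

```python
def local_similarity(value, target, lo, hi):
    """1 minus the distance to the target, normalized by the attribute's range."""
    return 1.0 - abs(value - target) / (hi - lo)


def global_similarity(item, requirements, ranges, weights):
    """Weighted average of the local similarities over the required attributes."""
    total_weight = sum(weights[a] for a in requirements)
    return sum(
        weights[a] * local_similarity(item[a], requirements[a], *ranges[a])
        for a in requirements
    ) / total_weight


# Hypothetical request: a camera for about 100 with 5x zoom; price matters twice as much.
requirements = {"price": 100, "zoom": 5}
ranges = {"price": (0, 500), "zoom": (1, 11)}
weights = {"price": 2.0, "zoom": 1.0}

near = {"name": "cam-a", "price": 150, "zoom": 4}
far = {"name": "cam-b", "price": 400, "zoom": 10}

print(global_similarity(near, requirements, ranges, weights))
print(global_similarity(far, requirements, ranges, weights))
```

Neither camera meets the requirements exactly, yet both receive a score and can be ranked, which is what distinguishes the case-based approach from strict constraint satisfaction.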

It is important to mention that although, following [ 60 ], this work classifies case-based recommenders as the knowledge-based ones, there are researchers such as [ 61 ], who classify them as content-based recommenders instead, or think of them as being a bigger, more general group, encompassing content-based and knowledge-based recommenders [ 62 ].

The limitations of the knowledge-based recommenders result from the inability to meet all the requirements the user has given. This may result in giving a null response. To avoid this, the system should be designed to suggest that the users relax the requirements, or even to do it automatically. This way, the system is eventually able to suggest an item which is as close as possible to fulfilling the original requirements. Another way of making this type of system give better predictions is to use the divide-and-conquer algorithm QuickXPlain [ 63 ]. The algorithm identifies the conflict between the user requirements and the potential items, i.e., the constraints that the system is not capable of fulfilling.

3.3.4. The Comparison of the Three Main Filtering Approaches

Each of the three aforementioned methods has its own advantages and flaws. The main, most prominent advantages and disadvantages, as well as the applications the techniques are the most suitable for, have been shown in Table 2 .

Table 2. The comparison of collaborative filtering, content-based and knowledge-based recommender systems; compiled by the authors based on the subject literature.

3.3.5. Hybrid Recommender Systems

As [ 3 ] remarks, the three main types of filtering play a dominant role in the majority of applications today; however, as the issues characteristic of the three aforementioned filtering techniques influence the recommendation’s quality in a negative way, hybrid filtering has been proposed [ 67 , 68 , 69 ].

The so-called hybrid recommender systems are usually based on the three main types of recommender systems. They can make more accurate predictions, as they combine different approaches to gather information. The final results depend heavily both on the used algorithms and the method of hybridization, i.e., the way and order in which the outcomes of an algorithm relate to the other ones. The recommenders that suffer from sparsity, i.e., CF and CB, are better for solving the issues where there is an abundance of data. On the other hand, knowledge-based recommenders cannot find associations between users and items. CF and CB systems can adjust to the changing needs of users, but the former systems outperform the latter ones in the cases when there is a lack of item attributes. Conversely, content-based ones can work better even without having a great number of user–item ratings to analyze.

To exploit the strengths of the systems, and not rely on their weaker points, hybrid recommenders are constructed to use various techniques on the same dataset. Afterwards, the resulting data are combined to make the final recommendations. For the combinations to give valid results, they must be given static weights. The weights may be influenced and changed, e.g., to reflect the users’ feedback.
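One way of combining the outputs of several techniques is a weighted sum of their scores. The sketch below assumes that each technique has already produced a score between 0 and 1 for every candidate item; the scores, the weights and the function name are hypothetical.

```python
def weighted_hybrid(scores_by_method, weights):
    """Combine per-method item scores into one ranking via a weighted sum."""
    combined = {}
    for method, scores in scores_by_method.items():
        for item, score in scores.items():
            combined[item] = combined.get(item, 0.0) + weights[method] * score
    return sorted(combined, key=combined.get, reverse=True)


# Hypothetical scores produced by two separate recommenders for the same user.
scores = {
    "cf": {"a": 0.9, "b": 0.2, "c": 0.5},
    "cb": {"a": 0.1, "b": 0.8, "c": 0.6},
}

# Trusting collaborative filtering more puts item "a" first ...
print(weighted_hybrid(scores, {"cf": 0.7, "cb": 0.3}))
# ... while shifting the weight to the content-based scores promotes item "b".
print(weighted_hybrid(scores, {"cf": 0.2, "cb": 0.8}))
```

The same data yield different rankings under different weights, which is why adjusting the weights, e.g., in response to user feedback, changes the behavior of the hybrid without retraining its components.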

One such hybrid might be a system which chooses the technique for computing recommendations depending on the given context or situation. Such a system is called a switching hybrid. If there were sparsity, it would first use a knowledge-based system and then, after the users’ rating, it would switch to collaborative filtering, and so on. The decision whether and when to switch the technique may be based on whether the default configuration is able to give a valid result [ 19 , 70 ].

A comprehensive list of the various hybridization techniques has been presented in Table 3 .

Table 3. The hybridization methods, based on [ 43 ].

Besides the techniques mentioned in the table above, there are also other manners of combining the filtering methods. As [ 71 ] remarks, it is possible to, e.g., make a unified recommendation system, while treating user rating prediction as the issue of machine learning, with probabilistic latent semantic analysis, or combine the similarities in a unified kernel space, where the predictions are made based on support vector learning, etc.

Another approach to combining recommendation methods is by applying graphs. In the database of this kind of recommender system, data are contained in nodes, which are linked together by edges. The links representing the relations may be either weighted or unweighted. Thus, the relationships between nodes are easy to retrieve, especially if the entities in the system are strongly connected [ 72 ]. Although it may take slightly more time to compute [ 73 ], this method is intuitive and available, and helps overcome issues such as data sparsity [ 71 ]. The graph-based recommendation systems have already been tested in various applications, such as a digital library [ 74 ], collaborative ranking [ 75 ], and recommendations of drugs [ 76 ], books [ 73 ], and movies [ 77 ]. Most of the aforementioned solutions are based on the Neo4j graph data platform; it has also been deemed the best choice among graph databases by [ 78 ].

3.3.6. Other Types of Recommender Systems

There are several other, more specialized and thus less widespread recommender system types. It must be noted that there is no common agreement as far as the ontology is concerned, i.e., some of the types mentioned below are classified as belonging to the abovementioned groups by some researchers. This may be due to the dynamically changing domain and state of the art, or the fact that several methods show features which may belong to more than one group/type. In addition, several filtering types have been distinguished according to the technology applied and not the features taken into account when making recommendations; thus, some overlapping occurs, or the same technique is classified as more than one category.

Sometimes called CIRS, the computational intelligence recommender systems are the ones which include Artificial Neural Networks (ANN), Bayesian techniques, clustering techniques, genetic algorithms, fuzzy set techniques, etc., in their recommendation models. Bayesian classifiers solve classification problems based on probability theory. They are often part of model-based recommenders, or help create a model for the content-based recommenders. With a Bayesian network being used for recommendations, the nodes correspond to items, while the states correspond to all the vote values possible. Thus, each item in the network will have a set of parent items, which will be its best predictors [ 3 ].

Artificial neural networks have also been used as part of recommendation engines. For example, ref. [ 79 ] have applied one in a personalized TV recommendation system. They trained an ANN of three layers with the back-propagation method. A hybrid movie recommender was presented by [ 80 ]. The trained ANN representing the preferences of individual users was responsible for content filtering.

To make the computational cost of finding k-nearest neighbors lower, clustering may be applied. Clustering consists of assigning items to groups. This way, the items within groups are more similar than the ones in other groups. With recommender systems, this may result in, e.g., smoothing the unrated data for users, by predicting the unrated items from a group of related items. Additionally, with the assumption that the nearest neighbor is within the Top-N most similar clusters to the active user, there is only the need for selecting the nearest neighbors in the Top-N clusters. This results in greater scalability of the system [ 3 , 81 ]. Furthermore, the technique can help tackle the cold start issue, by grouping items [ 82 ].

Genetic Algorithms (GA), i.e., stochastic search techniques, have mainly been applied in K-means clustering, for improved online shopping market segmentation, such as in [ 83 ]. Similarly, ref. [ 84 ] have used a GA method for obtaining an optimal similarity function. Finally, several techniques based on the fuzzy set theory have been used to handle the non-stochastic uncertainty, e.g., the information being imprecise, or the classes of objects not being sharp enough [ 3 ].

The rapid increase in the social networking tools has directly resulted in social network analysis becoming an important part of recommender systems. Recommender systems offer the possibility for the users to make social interactions among one another, such as comments, adding to friendlist, etc. Based on these interactions, recommendations can be made. The social network recommendations rely heavily on the concept of “trust”. In human interactions, a person’s decision (to buy something) is more likely to be influenced by friends’ opinions than by an advertisement. Trust, i.e., the level of how one user trusts others concerning a product, is helpful in making predictions where the data on similar neighbors would be too sparse otherwise. Indeed, a positive correlation between trust and user similarity has been found scientifically [ 85 ]. In addition, the authors of [ 3 ] discuss other social interactions and relations which are used for making recommendations, namely social bookmarks, physical context, social tag, “co-authorship” relations, “co-citations”, and more.

In recommender systems, context is understood as any kind of information which may characterize a situation or an entity, such as a person, place or an object that is relevant to the user–item interaction [ 86 ]. Context may thus mean, for example, time or the company of other people. Applying context in the recommendation process makes the results more personalized and appropriate. As the authors of [ 87 ] claim, the rating function is no longer two-dimensional, i.e., (R: User × Item → Rating); instead, it has become multi-dimensional (R: User × Item × Context → Rating).

Group recommendations are a method of making group suggestions “when group members are unable to gather for face-to-face negotiation, or their preferences are not clear despite meeting each other [ 3 ]”. They are used for recommending films, music, websites, events or travel. The process of combining people's preferences into a group recommendation may follow several strategies, based on the research of decision-making or social choice theory, such as the strategies of average, least misery, most pleasure, and so on [ 44 ], as well as the strategies of sum or approval voting.
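The aggregation strategies named above can be illustrated with a minimal sketch. It assumes that every group member already has a predicted rating for every candidate item; the function and variable names are hypothetical and not taken from [ 3 ] or [ 44 ].

```python
from statistics import mean

# Each strategy maps the list of individual ratings for one item to a group score.
STRATEGIES = {
    "average": mean,        # mean of the members' ratings
    "least_misery": min,    # the unhappiest member decides
    "most_pleasure": max,   # the happiest member decides
    "sum": sum,             # total of the members' ratings
}


def aggregate_group_ratings(ratings_by_member, strategy="average"):
    """Return {item: group score}. Assumes all members rated the same items."""
    combine = STRATEGIES[strategy]
    items = next(iter(ratings_by_member.values())).keys()
    return {
        item: combine([ratings[item] for ratings in ratings_by_member.values()])
        for item in items
    }


def recommend_for_group(ratings_by_member, strategy="average"):
    """Items ordered by descending group score; ties are broken by item name."""
    scores = aggregate_group_ratings(ratings_by_member, strategy)
    return sorted(scores, key=lambda item: (-scores[item], item))
```

With two members who rate `film_a` as 5 and 1 and `film_b` as 3 and 3, the least-misery strategy prefers `film_b`, whereas the most-pleasure strategy prefers `film_a`.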

Some researchers, such as [ 23 , 88 ], describe the demographic filtering as a separate filtering technique. By this method, the system gathers the information such as age, gender, education level, place of residence, as well as users’ opinions on items. Then, the similarities are found between the users’ ratings; finally, the data are filtered by users’ age or the area they live in. According to [ 18 ], these methods form similar correlations to the ones present in collaborative filtering, but unlike the collaborative and content-based techniques, they may not need a history of user ratings. However, they may raise some security issues, due to the nature of data they gather [ 23 ].

Lastly, there are utility-based recommenders. In them, the utility of an item for a user is calculated from the item's attributes, using the gathered information on the user's interest level in each attribute. As with the knowledge-based recommenders, the utility-based systems are not based on building long-term generalizations concerning the users. Rather, the recommendation is made based on the assessed match between the set of available options and the users’ needs. Specifically, the utility-based recommenders calculate the utility of each object to a user and then make recommendations based on that. The weight of each attribute may also be calculated by the system, lowering the load on users. To do so, the total utility must be determined. It is the sum of the values over all the attributes of an item, where each value is the attribute's weight multiplied by the similarity function. The system returns a list of items ranked according to their similarity level to the user requirements [ 59 ]. There are various approaches to what makes utility and how to compute it, but the general idea is that the utility function should be based on item ratings that the users offered to describe their preferences [ 37 ]. One of the main advantages of this filtering technique is that the utility computation can be influenced by some non-product attributes (e.g., product availability). This way, for a user who needs to receive an item as soon as possible, such a system could enable trading off price against delivery schedule [ 18 ]. As mentioned before, utility-based recommender systems are either seen as a separate method of filtering [ 18 ], or as being part of knowledge-based recommenders [ 19 ].
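The weighted-sum idea described above can be sketched as follows. This is only an illustration under stated assumptions: the `closeness` similarity function, the attribute names and the weights are hypothetical, and the cited works may define utility differently.

```python
def closeness(value, target):
    """Hypothetical similarity in [0, 1]: 1.0 for an exact match, lower otherwise."""
    if value == target:
        return 1.0
    return max(0.0, 1.0 - abs(value - target) / max(abs(value), abs(target)))


def total_utility(item_attributes, requirements, weights, similarity=closeness):
    """Sum over attributes of weight * similarity(item value, required value)."""
    return sum(
        weights[name] * similarity(item_attributes[name], requirements[name])
        for name in requirements
    )


def rank_items(catalog, requirements, weights, similarity=closeness):
    """Item identifiers ordered by descending total utility."""
    scores = {
        item_id: total_utility(attributes, requirements, weights, similarity)
        for item_id, attributes in catalog.items()
    }
    return sorted(scores, key=scores.get, reverse=True)
```

Raising the weight of a non-product attribute such as delivery time shifts the ranking towards items that arrive sooner, which is the price-versus-delivery trade-off mentioned in the text.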

4. The Result of the Study—The State of the Art of Recommender Systems for Cybersecurity

This chapter presents the results of the literature study by gathering the instances in which recommender systems were employed in cybersecurity. This makes it possible to formulate the answer to Research Question 1.

In their work, Polatidis et al. remark that recommender systems had not been used for attack prediction before [ 89 ]. Thus, they have proposed a method which combines collaborative filtering (CF) recommender systems with methods of discovering attack paths. The attack graphs were built based on data sourced from maritime supply chain infrastructure. Their tool was validated and evaluated experimentally, which confirmed its effectiveness. More specifically, the tool first finds all the paths that could be used to perform an attack. Then, a recommender system is applied to predict what attacks might take place within this network. It uses a “parameterized version of multi-level collaborative filtering method (…) although other methods could be applied according to the scenario and the available data [ 89 ]”. By this method, after CF has been applied, the list of k-nearest neighbors is rearranged to reflect the similarity value and the number of co-rated items. For attack classification, the characteristics from the abovementioned method were used. The authors first employed classical CF using the Pearson Correlation Coefficient, as shown in Equation ( 3 ) [ 89 ].

$$\mathrm{Sim}(a,b) = \frac{\sum_{p \in P} (r_{a,p} - \bar{r}_a)(r_{b,p} - \bar{r}_b)}{\sqrt{\sum_{p \in P} (r_{a,p} - \bar{r}_a)^2}\,\sqrt{\sum_{p \in P} (r_{b,p} - \bar{r}_b)^2}} \qquad (3)$$

There, $\mathrm{Sim}(a,b)$ is the similarity of users $a$ and $b$, $r_{a,p}$ is the rating of user $a$ for product $p$, $r_{b,p}$ is the rating of user $b$ for product $p$, $\bar{r}_a$ and $\bar{r}_b$ represent the users’ average ratings, and $P$ stands for the set of all products [ 89 ]. Then, the authors analyzed the similarity values and the co-rated vulnerabilities. Based on this, attacks were classified on a scale from very high to very low. The last step consisted of checking whether there were any attack paths between the assets. The authors mentioned that they did not use classical CF without additional parameters, as it would be less effective than the method they had proposed.
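The two steps described above, computing the Pearson similarity and then rearranging the neighbors by similarity and by the number of co-rated items, can be sketched as follows. This is an illustration only: the data layout and function names are assumptions, the sums run over co-rated items, and the specific parameters of the multi-level method in [ 89 ] are not reproduced.

```python
from math import sqrt


def pearson_similarity(ratings_a, ratings_b):
    """Return (similarity, number of co-rated items) for two {item: rating} dicts."""
    common = set(ratings_a) & set(ratings_b)
    if len(common) < 2:
        return 0.0, len(common)
    # Each user's average rating, taken over everything that user has rated.
    mean_a = sum(ratings_a.values()) / len(ratings_a)
    mean_b = sum(ratings_b.values()) / len(ratings_b)
    numerator = sum((ratings_a[p] - mean_a) * (ratings_b[p] - mean_b) for p in common)
    norm_a = sqrt(sum((ratings_a[p] - mean_a) ** 2 for p in common))
    norm_b = sqrt(sum((ratings_b[p] - mean_b) ** 2 for p in common))
    if norm_a == 0 or norm_b == 0:
        return 0.0, len(common)
    return numerator / (norm_a * norm_b), len(common)


def rank_neighbors(target, others, k=3):
    """k nearest neighbors, ordered by similarity, then by co-rated item count."""
    scored = []
    for name, ratings in others.items():
        similarity, co_rated = pearson_similarity(target, ratings)
        scored.append((name, similarity, co_rated))
    scored.sort(key=lambda row: (row[1], row[2]), reverse=True)
    return scored[:k]
```

In the cybersecurity setting of [ 89 ], the "items" would be vulnerabilities rather than products; the mapping from the resulting values to the very high to very low scale is left out here because the text does not give the thresholds.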

Lyons [ 19 ] also proposes using a recommender system in the cyberdefence domain. The main goal of this effort was to make the Observe => Orient => Decide => Act (OODA) loop of cybersecurity defenders faster, chiefly by speeding up the decision part. The author proposes combining existing, effective Intrusion Detection System (IDS) solutions with a recommender system that suggests the best steps to take in a given situation: the IDS provides the recommender system with the information it needs to predict the likelihood of certain events for given nodes of the network. The system returns a ranked list of actions; the highest-ranked one is supposed to mitigate the most events that the system predicted as likely to happen. The input is provided by the IDS, i.e., the Network Modeler proposed in [ 90 ]. It consists of three main parts: sensors, a database, and a modeler algorithm. The sensors are mostly Commercial-Off-The-Shelf (COTS) software, except for the custom-written host monitoring software (a Java-based program sending updates to the sensor machine). For observing the network, Snort and Nmap were used. The database combines the various data gathered by the system, i.e., from Snort, Nmap, vulnerability scores, host monitoring software, etc. The host monitoring software sends updates, such as CPU and memory usage data, to the database on the sensor machine; it also controls the antivirus software for every host. The task of the modeler is to collect and classify the data about the health of the network, i.e., to determine whether there is any threat to the network by analyzing the data from all the sensors, as well as to generate a list of possible actions which may be taken on client machines or firewall machines to mitigate the threat (e.g., update operating system, upgrade application, create/delete user account, disable port, block source IP address, etc.).
Then, the output XML file describing the network state is created so that every host receives data about the attacks, infections, user accounts, etc., as well as the actions which could be undertaken for that specific host.

The recommender system used in this model is of the collaborative type; it uses the user-based nearest neighbor recommendation algorithm, with the users and products replaced by nodes and their attributes. The similarity between the nodes and the attributes for each node is calculated using the Pearson correlation coefficient. The values range from −1 (strong negative correlation) to 1 (strong positive correlation). Then, based on the similarity values, the prediction is calculated. If the nodes are similar (i.e., have similar vulnerabilities, thus one may suppose they are prone to the same kinds of attacks), then the events occurring on them will result in higher predictions. As the author explains, “the predicted rating for nodes are the values used to determine vulnerabilities that the knowledge-based recommender system needs to consider when generating defensive actions”. Then, a knowledge-based recommender system is applied for making recommendations of defensive actions; the selected paradigm is a constraint-based recommender (the system depends on previously set knowledge of the actions used to mitigate or counter certain cyberattacks). When the recommender system starts, all the actions it knows are loaded from the XML file; the file is loaded only once, which makes the first computation take longer than the subsequent ones. A single action may mitigate various exploits; this is why the proposed action ought to counter the greatest possible number of the predicted attacks. Thus, the presented list of suggested actions is sorted according to the number of the mitigated attacks, and all the possible actions which may be taken to mitigate the threat are considered. However, in this solution, the final choice of the action is left to the cyberdefender.
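The sorting of defensive actions by the number of predicted attacks they mitigate can be sketched in a few lines. The knowledge base below is a hypothetical stand-in for the actions that the system in [ 19 ] loads from its XML file; the attack and action names are invented for illustration.

```python
def rank_actions(predicted_attacks, mitigations):
    """Order actions by how many of the predicted attacks each one counters.

    predicted_attacks: set of attack names predicted as likely for a node.
    mitigations: {action: set of attacks it mitigates} (hypothetical knowledge base).
    """
    covered = {
        action: len(attacks & predicted_attacks)
        for action, attacks in mitigations.items()
    }
    # Actions that mitigate none of the predicted attacks are not suggested.
    useful = [action for action, count in covered.items() if count > 0]
    # Descending by coverage; ties are broken alphabetically for a stable order.
    return sorted(useful, key=lambda action: (-covered[action], action))
```

The function only produces the ranked list; as in the described solution, choosing which action to carry out would remain with the human defender.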

After calculating all the action values, the system sorts them and presents them as recommendations, in descending order. The author remarks that to find out which recommendation algorithm proves the most effective, it is necessary to observe multiple kinds of them. The so-called hybrid recommender systems comprise collaborative, content-based and knowledge-based techniques, which may be combined in various ways. The author believes that the nature of cyber threats requires one to use a knowledge-based recommender, as the collaborative and content-based ones will be affected by the issue of sparsity. Thus, in their system, the author has combined a knowledge-based recommender with a collaborative technique.

In their paper, Sula et al. [ 35 ] have proposed applying a recommender system to support the mitigation of Distributed Denial of Service (DDoS) attacks, by helping in the decision-making process of the defender and suggesting the most appropriate cybersecurity solution. The system, called ProtecDdoS, takes into account the requirements of a customer, i.e., the service type (proactive/reactive), the type of a DDoS attack, the coverage region (city, country or even continent), deployment time (minutes, hours, days, weeks), leasing period, and the client’s budget. The last three parameters may also be given priority, e.g., if the budget’s priority has been set to high, the system shows the cheapest solutions first, while the remaining characteristics will be close to the selected ones. The recommendation engine uses the following similarity algorithms: cosine similarity, Euclidean distance, Manhattan distance, Minkowski distance and the Pearson correlation.

The client-side of the solution has an Application Programming Interface (API) implemented using React. All the abovementioned parameters can be set there. The attack types may be entered from the drop-down field, but can also be imported from attack log files. Once the user’s profile has been set, the recommendation is given, i.e., a list of suitable services is presented, along with their description. The recommendations are supported with visual representations of the results, in the form of plots, to provide clients with better grasp of how the recommender works. Moreover, to provide the maximum security, it is possible to check the service’s hash, to see whether it was manipulated or not and can be trusted. The users can add more services to the database, too.

The server-side of the solution was implemented using Flask 1.0.2. The recommendation process is based on two components: the Service Helper and the Recommendation Engine. The helper component is responsible for filtering irrelevant data from the dataset, according to the user’s preferred features. It also calculates the index of characteristics, by assigning an integer value to each characteristic, taking variables into consideration. Then, the Recommendation Engine calculates the customer index and the service index. Finally, the similarity score is calculated using various algorithms and the resulting list of services is returned, sorted by the similarity index.
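The filter-then-rank flow described above can be sketched as follows. This is not the ProtecDdoS code: the field names (`service_type`, `attack_types`, `index`) are hypothetical, and the Minkowski distance stands in for any of the measures listed in [ 35 ] (it reduces to the Manhattan distance for p = 1 and to the Euclidean distance for p = 2).

```python
def minkowski_distance(x, y, p=2):
    """Minkowski distance between two equal-length integer index vectors."""
    return sum(abs(a - b) ** p for a, b in zip(x, y)) ** (1 / p)


def filter_services(services, attack_type, service_type):
    """Helper step: drop services that cannot satisfy the hard requirements."""
    return [
        service
        for service in services
        if attack_type in service["attack_types"]
        and service["service_type"] == service_type
    ]


def recommend(services, customer, p=2):
    """Engine step: order the remaining services by distance to the customer index."""
    candidates = filter_services(
        services, customer["attack_type"], customer["service_type"]
    )
    return sorted(
        candidates,
        key=lambda service: minkowski_distance(service["index"], customer["index"], p),
    )
```

Because a distance is used, the closest service comes first; with a similarity measure such as cosine or Pearson the sort order would have to be reversed.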

Soldo et al. [ 91 ] have proposed a method of predicting future attacks/malicious activity from past behaviors, calling it the “predictive blacklisting”; blacklists being understood as logs of past attacks, attack sources, etc. The recommendation system part was inspired by the one used by Netflix. As the authors said, they “framed the problem as an implicit recommendation system, which paves the way to the application of powerful machine learning methods. (…). Given a set of attackers and a set of victims, a number r is associated with every (attack source, victim destination, time) triplet according to the logs: r can indicate, for example, the number of times an attacker has been reported to attack a victim over a time period. More generally, we interpret r as the rating, or preference, assigned by an attacker to a victim [ 91 ]”.
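The implicit rating r described in the quotation can be illustrated by counting log entries per (attack source, victim destination, time window) triplet. The log format and the one-day window below are assumptions made for this sketch, not details taken from [ 91 ].

```python
from collections import Counter


def build_ratings(log_entries, window_seconds=86400):
    """Count reported attacks per (source, victim, time window) triplet.

    log_entries: iterable of (source, victim, unix_timestamp) tuples
    (a hypothetical log format). The resulting count plays the role of
    the implicit rating r assigned by an attacker to a victim.
    """
    ratings = Counter()
    for source, victim, timestamp in log_entries:
        window = int(timestamp // window_seconds)
        ratings[(source, victim, window)] += 1
    return ratings
```

Triplets that never appear in the logs simply have a count of zero, which mirrors the sparsity of a typical rating matrix.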

The architecture of their model consists of three algorithms blended in a linear way: a time series model, responsible for accounting for the temporal dynamics, followed by two neighborhood-based models. The first of the two latter models is a modified kNN model; it predicts attacks by concentrating on finding similarities between the victims who were attacked by the same sources, ideally at the same time. The other algorithm belongs to the co-clustering type; its goal is to automatically discover a group of actors that attack a group of victims at the same time [ 91 ]. The method differs from a traditional recommender system in several aspects. First, the authors addressed the issue of the rating matrix varying over time due to the changes in attack intensity, whereas in traditional recommender systems the matrix remains static. Another significant difference is that in traditional recommender systems, the users provide the ratings themselves. In the case of this tool, the rating is implicit, as it is derived from the attacks reported in the logs.
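A linear blend of three component scores can be sketched as below. The exponentially weighted moving average used for the time series part, the smoothing factor and the blending weights are all assumptions made for illustration; the text above does not specify how [ 91 ] implements or weights the components.

```python
def ewma_forecast(counts, alpha=0.5):
    """Exponentially weighted moving average over past per-window attack counts.

    Assumption: this is one simple way to model temporal dynamics;
    alpha is a hypothetical smoothing factor.
    """
    forecast = counts[0]
    for count in counts[1:]:
        forecast = alpha * count + (1 - alpha) * forecast
    return forecast


def blended_score(time_series_score, knn_score, cocluster_score,
                  weights=(1 / 3, 1 / 3, 1 / 3)):
    """Linear blend of the three component predictions (hypothetical weights)."""
    w_ts, w_knn, w_cc = weights
    return w_ts * time_series_score + w_knn * knn_score + w_cc * cocluster_score
```

In practice the weights would be tuned on held-out data; here they are only placeholders.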

The tools were tested on real-life data from the dshield.org website, which captures hundreds of millions of security logs from a great number of websites. The authors claim that their solution, when compared to the state-of-the-art methods, proves significantly more accurate and more robust against poisoning of the dataset.

In their paper, Franco et al. [ 92 ] (as well as [ 40 ]) have presented the MENTOR system, which they describe as a “support tool for cybersecurity”. It was designed to recommend the most suitable protection measures (such as specific anti-malware tools, firewalls, etc.), as the market for such services is booming and end-users may not be able to find the best solution themselves, especially when they are under cyberattack, need an ad hoc response and are thus in a hurry. The system can recommend both the most suitable prevention and mitigation measures and does it according to various demands, i.e., taking into account the profile of a customer, their budget, as well as the properties of an attack. The architecture is structured as follows: first, the Service Requestor gathers the data concerning the attacked infrastructure as well as the attack itself from the monitor logs. The relevant data are also stored in a database for future analysis. Then, the information is sent to the Extractor and the recommendation process begins. It comprises several steps; first, the data are analyzed and the correlations with the attack type are found using the Classifier component, by comparing the data to previously identified and known attacks. This allows determining the best mitigation measure, by means of the Service Aggregator, which contacts various service providers and gathers the information on available services and their features, such as the region they cover, their price, etc. Based on this information, a list of protection services that could be used is created; then, the Aggregator is queried by the Retriever. It can address the clients’ demands fully or partially, providing the most desirable price, performance or technological solutions. Then, the data from the Retriever are sent to the Recommendation Engine. At this step, based on the requirements derived from the Retriever, it uses several different algorithms to find the most suitable solution.
Lastly, the customer input is used by the Recommendation Engine to find out which solution from the list is the best recommendation.

The system’s recommendation process was assessed by means of four commonly used measures capable of quantifying the similarity of two items: (1) Euclidean distance, (2) Manhattan distance, (3) cosine similarity and (4) Pearson correlation. Measuring the similarity in a geometric way is possible, as the available protection services and customers’ requirements are mapped as vectors in space. In other words, the similarity can be evaluated once their attributes are represented as the magnitude and direction of vectors in space.

The recommendation process starts when an integer array is created by indexing the parameters required by the client, and each service. Then, the properties of each service are indexed. Next, the profile of a customer is mapped as the Y vector, while the protection services as the X vector. This constitutes the input for the algorithms assessing similarity. Lastly, a similarity dictionary is built of the ratings. Service ID serves as a key here; owing to this, it can be used for the purposes of exporting or plotting the similarity.
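The process described above, comparing a customer vector with each service vector and storing the result under the service ID, can be sketched as follows. This is a minimal illustration, not MENTOR's code: the function names are hypothetical and the integer indexing of the attributes is assumed to have been done already.

```python
from math import sqrt


def euclidean(x, y):
    return sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def manhattan(x, y):
    return sum(abs(a - b) for a, b in zip(x, y))


def cosine(x, y):
    norm = sqrt(sum(a * a for a in x)) * sqrt(sum(b * b for b in y))
    return sum(a * b for a, b in zip(x, y)) / norm if norm else 0.0


def pearson(x, y):
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    dx = [a - mean_x for a in x]
    dy = [b - mean_y for b in y]
    norm = sqrt(sum(a * a for a in dx)) * sqrt(sum(b * b for b in dy))
    return sum(a * b for a, b in zip(dx, dy)) / norm if norm else 0.0


def similarity_dictionary(customer_vector, service_vectors, measure):
    """Map each service ID to the value of `measure` against the customer vector.

    For euclidean and manhattan a lower value means a closer match;
    for cosine and pearson a higher value does.
    """
    return {
        service_id: measure(vector, customer_vector)
        for service_id, vector in service_vectors.items()
    }
```

Keeping the service ID as the key makes it straightforward to export the values or to plot them per service, as the text notes.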

MENTOR’s prototype consisted of a web-based user interface and the Recommendation Engine. A customer may set the requirements and prioritize their demands (high => low) using a dashboard. The prioritization affects the final recommendation, as the remaining, lower-priority demands are neglected. The Recommendation Engine is also capable of running without the dashboard; in such a case the system acquires the necessary input through MENTOR’s API. Additionally, the choice of the recommendation algorithm may also be made by the end-user. To make the choice easier, MENTOR gives the user various kinds of information on the classification results, e.g., in the form of plots, a comparison of the vector representing the profile with the vectors of each service, etc. It may be a significant decision, as the results can vary greatly depending on the method used. This is because the features of a service are represented as a vector in space, so some properties (especially high-magnitude variables, such as price) may drastically change the vector’s direction and influence the rating. As a result, the services specified in the client’s profile, and thus favored in the recommendation process, may turn out to be worse than other possible choices. According to the authors, this is true for the algorithms based on distance (Euclidean, cosine, Manhattan); therefore, they suggest that the Pearson correlation may be more accurate here, as it is invariant to the elements’ magnitude. Another suggested way of overcoming the problem is to group the vectors of the services by attribute and then compare the service attributes with the client-specific attributes one-to-one. In that case, the average rating of the service’s attributes constitutes the final rating.
However, the authors notice that the attributes of the services which perform better than those specified by the customer will receive worse ratings, so their suggestion is to rearrange the input attributes to reflect the best conditions possible. This way the algorithm will provide the best option, not the one which is the closest to the request of an end-user.

Esposte et al. [ 66 ] have proposed a recommender system which would collect cybersecurity alerts from various external sources and recommend them to any person with a network administrator profile. The main idea behind this was that a network administrator may be flooded with security alerts, not all of which are relevant. Based on the administrator’s preferences and ratings, the alerts could be filtered using a recommender system. An interesting design decision is to make sure that some items will always be recommended, even if the user has not provided any requirements yet. The system is a hybrid one: first, the general ranking scores of items are calculated using the Bayesian Average; then, collaborative filtering and content-based filtering are applied. The greater the ranking score, the better ranked the item is going to be. Then, the ranking is computed again, after adding weights for the elements, e.g., adding the greatest weight to the critical votes, etc. In cases when there are no rated items, the most recent cybersecurity alerts are shown. Otherwise, the collaborative filtering and content-based approaches are applied. In the CF part, the similarity of users to their neighbors is calculated using the Jaccard similarity coefficient. The greater the coefficient, the greater the similarity. Then, the list of top-N recommendations for a user is computed. The other list is created using the content-based method, where the user’s interest in items is used to weight them, and the items the user is interested in have their tags added to the user’s interest. The higher the weight of the item, the more interest the user has in it. After performing collaborative filtering and the content-based recommendations, a mixed hybrid approach is then used to come up with the list of the top-N recommended items.
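Two of the building blocks named above, the Bayesian Average and the Jaccard similarity coefficient, can be sketched in their commonly used forms. The prior mean and prior weight are hypothetical parameters; the text does not state which values or which exact variant [ 66 ] uses.

```python
def bayesian_average(item_ratings, prior_mean, prior_weight):
    """Common form of the Bayesian average.

    The item's ratings are pulled towards `prior_mean` (e.g., the mean rating
    over all alerts) as if `prior_weight` extra ratings of that value existed.
    An item with no ratings therefore still receives a score.
    """
    return (prior_weight * prior_mean + sum(item_ratings)) / (
        prior_weight + len(item_ratings)
    )


def jaccard(set_a, set_b):
    """Jaccard coefficient: size of the intersection over size of the union."""
    union = set_a | set_b
    return len(set_a & set_b) / len(union) if union else 0.0
```

In the CF step, `jaccard` would be applied to the sets of alerts that two administrators have rated or interacted with; the fallback score for unrated items is in line with the requirement that something can always be recommended.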

As the authors say, their “work advances the state of the art in cyber security by proposing a new model for gathering relevant information on cyber security alerts based on recommender system methods [ 66 ]”. The model was evaluated in an offline experiment and the results show it could be applied in the cybersecurity alert recommendation process. It is worth noting that the authors redefine the elements of a recommender system to suit the needs of cybersecurity administrators: the “item” in this case means a security alert, and its content elements are “item attributes”. The “user” here is the network administrator. By a “transaction” in this context one understands the potential interaction between “users” and “items”.

Casey et al. [ 93 ] discuss a full implementation of a recommendation-verification system in the context of defense against malware. They argue that it is possible to employ machine learning methods in learning the trace features of malware families. They present the requirements such a system would have to meet, and emphasize the significance of its interpretability. In an experimental way they prove the feasibility of the solution they propose.

In their paper, Du et al. [ 94 ] present the problem of People-Readable Threat Intelligence (PRTI), the sheer volume of which may be overwhelming for cyberdefenders. They notice that making recommendations in this particular field is challenging, as the data tend to expire and become outdated very quickly, traditional knowledge graphs are not really suitable for this purpose owing to large amounts of noise in data, and the language of PRTI is highly condensed and specific. Thus, they proposed a knowledge graph created specifically for PRTI recommendation, equipped with a denoising entity extraction module. Then, they propose a framework which uses a Knowledge-aware Long Short-term Memory neural network (KLSTM) for providing external knowledge for PRTI recommendation, using information from the knowledge graph. They show experimentally that combining a Latent Dirichlet Allocation (LDA; a three-level hierarchical Bayesian model) topic model with a KG-aware LSTM is effective and more accurate than the same approach without external knowledge.

Sayan et al. [ 95 ] have presented the design of the architecture of an Intelligent Cyber Security Assistant (ICSA). Such a tool would aid human cyberdefenders in their tasks. The proposed architecture would detect attacks using machine learning and recommend the defense solutions. It gathers the network traffic data using existing monitoring tools. Then, the data are analyzed using the anomaly-based intrusion detection approach. The results are then used for making recommendations. The Intelligent Recommender Assistant (IRA) is a module that leverages machine learning methods for making recommendations. The authors decided to use the knowledge-based technique; thus, the knowledge scope is assessed, and a knowledge base must first be built. Then, they apply feature engineering to transform the raw data so that it better suits the predictive model. The learned model is then tested on unseen data, making recommendations concerning the mitigation of real-time attacks. The authors claim that through iterative machine learning, their system can make accurate predictions; they expect it to keep learning, adapting and improving, which will make it possible to let it take automated or semi-automated cyberdefense actions.

Panda et al. have proposed a recommender system whose aim would be to find which machine learning model is the most suitable for identifying attack models. The recommender system’s components were Naïve Bayes, Decision Trees, Ensemble learning, and AdaBoost [ 96 ].

Gadepally et al. argue that recommender systems show promise in cybersecurity applications, as they might significantly lower the time of responding to a threat [ 97 ]. In the cybersecurity domain, the analysts must process massive amounts of information, which may lead to overlooking crucial data. According to the authors, a recommendation system would help filter and prioritize information, as well as weigh multiple factors. They think that a recommender could be used to track hundreds of websites in an automated way and to learn from past user behavior. This would be used to prepare recommendations for the IT security team. If a vulnerability’s severity and possible impact were assessed, they could then be used to make a recommendation concerning patching the vulnerability, e.g., whether the patching might be postponed or should be immediate. Recommendation systems might also be used for tracking network anomalies, to draw the IT security teams’ attention to possible issues which could be solved in a proactive way. Finally, the authors remark that (as of 2016) future work would be needed to adapt the recommender technologies to meet the specific requirements of the cybersecurity domain.

5. Discussion of the Results

This work has first presented a detailed overview of the concept of recommender systems, their types and techniques, advantages and disadvantages, security concerns, as well as possible fields of application, pointing out that the use of recommender systems as an aid to cyberdefenders is hardly ever mentioned. Then, against this background, the state of the art of the applications of recommender systems in cybersecurity was presented, as gathered from the systematic review of the subject literature. The results of the conducted study helped answer the Research Questions.

5.1. The Answers to the Research Questions

The answer to Research Question 1: The comprehensive, broad study of the subject literature allowed the authors to conclude that recommender systems have indeed been applied for the sake of cybersecurity. They were also able to identify several specific applications, which have been presented in the previous Section of this paper. Most of the authors of the analyzed papers claim they were the first, or among the first, to apply this technology to cybersecurity; they believe it shows promise as a tool for cyberdefenders and in the long run could effectively assist the human operators in the loop, as recommender systems “have the potential in mission scenarios to shift computational support from being reactive to being predictive [ 97 ]”. It has also been remarked that more exploration should be made into using recommenders in the cyber domain [ 19 ].

The answer to Research Question 2: While analyzing the sources, the authors noticed that the papers did not present a unified taxonomy of recommender system types. By gathering and organizing the types presented in the analyzed papers, this work proposes a new, up-to-date list of recommender system types. Thus, this work has presented the most comprehensive list of the kinds of filtering among all the analyzed papers.

5.2. Threats to Validity

For the sake of this work, a substantial number of papers have been reviewed and considered for further analysis. Because the search engines of three of the five selected knowledge sources returned hundreds or thousands of false positives, the authors analyzed the hits until the results ceased to seem relevant. Furthermore, the study let the authors identify about a dozen works showcasing actual implementations of recommender systems for cybersecurity. The investigated works follow different approaches and do not lend themselves to comparison. This emphasizes the need for further research in this emerging field; a similar survey run in a few years could probably help answer the research questions more thoroughly.

6. Conclusions

This paper has presented the results of a broad, systematic study of the potential use of recommender systems in cybersecurity. Several hundred literature items were marked as potentially relevant and then carefully analyzed. Several papers presenting the implementations of recommenders in cybersecurity were found and described.

All in all, the study showed that recommender systems could indeed be applied to support the human cyberdefender in their decisions, and contribute to a safer, more secure cyberspace.

This emerging field still needs plenty of research and in-depth consideration. It might also be beneficial to further explore enhancing the systems, e.g., by the application of graphs, or combining the method with other cybersecurity tools.

Furthermore, in the course of the study, a list of all the mentioned recommender system types was created, making it the most up-to-date and comprehensive division, as of this day.

Author Contributions

Investigation, R.K.; Supervision, R.S.C.; Writing—original draft, A.P.; Writing—review and editing, M.P. All authors have read and agreed to the published version of the manuscript.

This work is supported by the Ensuresec project, which has received funding from the European Union’s Horizon 2020 research and innovation programme under grant agreement No. 883242.

Conflicts of Interest

The authors declare no conflict of interest.

Title: A LLM-based Controllable, Scalable, Human-Involved User Simulator Framework for Conversational Recommender Systems

Abstract: Conversational Recommender System (CRS) leverages real-time feedback from users to dynamically model their preferences, thereby enhancing the system's ability to provide personalized recommendations and improving the overall user experience. CRS has demonstrated significant promise, prompting researchers to concentrate their efforts on developing user simulators that are both more realistic and trustworthy. The emergence of Large Language Models (LLMs) has marked the onset of a new epoch in computational capabilities, exhibiting human-level intelligence in various tasks. Research efforts have been made to utilize LLMs for building user simulators to evaluate the performance of CRS. Although these efforts showcase innovation, they are accompanied by certain limitations. In this work, we introduce a Controllable, Scalable, and Human-Involved (CSHI) simulator framework that manages the behavior of user simulators across various stages via a plugin manager. CSHI customizes the simulation of user behavior and interactions to provide a more lifelike and convincing user interaction experience. Through experiments and case studies in two conversational recommendation scenarios, we show that our framework can adapt to a variety of conversational recommendation settings and effectively simulate users' personalized preferences. Consequently, our simulator is able to generate feedback that closely mirrors that of real users. This facilitates a reliable assessment of existing CRS studies and promotes the creation of high-quality conversational recommendation datasets.


Research Paper Recommender Systems on Big Scholarly Data

  • Conference paper
  • First Online: 27 July 2018

  • Tsung Teng Chen 15 &
  • Maria Lee 16  

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 11016)

Included in the following conference series:

  • Pacific Rim Knowledge Acquisition Workshop

2021 Accesses

11 Citations

Rapidly growing scholarly data has been termed Big Scholarly Data (BSD), which includes hundreds of millions of authors, papers, citations, and other scholarly information. The effective utilization of BSD may expedite various research-related activities, including research management, collaborator discovery, expert finding, and recommender systems. Past studies of research paper recommender systems used smaller datasets and produced inconclusive results. To facilitate research on the BSD challenge, we built an analytic platform and developed a research paper recommender system. The recommender system may help researchers find research papers closely matching their interests. The system can not only recommend suitable papers to individuals based on their profiles, but also recommend papers for a research field using the aggregated profiles of researchers in that field.

The BSD analytic platform is hosted on a computer cluster running a data center operating system and was initialized with the Microsoft Academic Graph (MAG) dataset, which includes citation information from more than 126 million academic articles and over 528 million citation relationships between these articles. The research paper recommender system was implemented in the Scala programming language, with algorithms supplemented by Spark MLlib. The performance of the recommender system is evaluated by the recall rate of the Top-N recommendations; the recall rates fall in the range of 0.3 to 0.6. Our recommender system currently bears the same limitation as other systems based on user-based collaborative filtering: the cold-start problem. This problem can be mitigated by supplementing the system with an item-based collaborative filtering mechanism.
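The abstract reports recall of the Top-N recommendations but does not spell out the evaluation protocol. As a rough illustration only, the sketch below assumes one common definition: for each user, the share of held-out relevant papers that appear among the first N recommended papers, averaged over users. The function names and the toy data are hypothetical and are not taken from the paper, whose system was written in Scala.

```python
from typing import Dict, List, Set


def recall_at_n(recommended: List[str], relevant: Set[str], n: int) -> float:
    """Share of a user's relevant items found in the top-n recommendations."""
    if not relevant:
        return 0.0
    # Use a set so that a duplicated recommendation is not counted twice.
    hits = len(set(recommended[:n]) & relevant)
    return hits / len(relevant)


def mean_recall_at_n(
    recs: Dict[str, List[str]], held_out: Dict[str, Set[str]], n: int
) -> float:
    """Average recall@n over users that have at least one held-out item."""
    scores = [
        recall_at_n(recs.get(user, []), items, n)
        for user, items in held_out.items()
        if items
    ]
    return sum(scores) / len(scores) if scores else 0.0


# Hypothetical toy data: ranked recommendations and held-out relevant papers.
recs = {
    "alice": ["p1", "p2", "p3", "p4"],
    "bob": ["p5", "p6", "p7", "p8"],
}
held_out = {
    "alice": {"p2", "p9"},  # one of two relevant papers is in the top 3
    "bob": {"p5", "p6"},    # both relevant papers are in the top 3
}

score = mean_recall_at_n(recs, held_out, n=3)
print(score)  # 0.75
```

Under this definition a user with no recommendations contributes a recall of zero, which is one way the cold-start limitation mentioned above would show up in the reported figure.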



Author information

Authors and affiliations.

National Taipei University, New Taipei City, Taiwan

Tsung Teng Chen

Shih Chien University, Taipei, Taiwan


Corresponding author

Correspondence to Maria Lee.

Editor information

Editors and affiliations.

University of Tsukuba, Tokyo, Japan

Kenichi Yoshida

Shih Chien University, Taipei City, Taiwan

Copyright information

© 2018 Springer Nature Switzerland AG

About this paper

Cite this paper.

Chen, T.T., Lee, M. (2018). Research Paper Recommender Systems on Big Scholarly Data. In: Yoshida, K., Lee, M. (eds) Knowledge Management and Acquisition for Intelligent Systems. PKAW 2018. Lecture Notes in Computer Science, vol 11016. Springer, Cham. https://doi.org/10.1007/978-3-319-97289-3_20


DOI: https://doi.org/10.1007/978-3-319-97289-3_20

Published: 27 July 2018

Publisher Name: Springer, Cham

Print ISBN: 978-3-319-97288-6

Online ISBN: 978-3-319-97289-3

eBook Packages: Computer Science, Computer Science (R0)


Research Paper Recommender Systems: A Random-Walk Based Approach


IMAGES

  1. Recommender system architecture.

  2. (PDF) Scienstein: A Research Paper Recommender System

  3. Classification of recommender systems

  4. (PDF) Recommender Systems: An Overview, Research Trends, and Future

  5. Taxonomy of recommender systems in the development phase.

  6. (PDF) Recommender Systems Challenges and Solutions Survey

VIDEO

  1. Fairness in Recommender Systems: Research Landscape and Future Directions

  2. 1.4.2. Embedding Users and Items

  3. 16.5 Personalized Ranking for Recommender Systems

  4. 16.10 Deep Factorization Machines

  5. #recommendation based on #topics of #interest

  6. [Lab Seminar] Leveraging Large Language Models in Conversational Recommender Systems (2023)

COMMENTS

  1. A systematic review and research perspective on recommender systems

    The nature of research in recommender systems is such that it is difficult to confine each paper to a specific discipline. This can be further understood by the fact that research papers on recommender systems are scattered across various journals such as computer science, management, marketing, information technology and information science ...

  2. (PDF) Recommender Systems: An Overview, Research Trends, and Future

    Recommender System (RS) has emerged as a major research interest that aims to help users to find items online by providing suggestions that closely match their interests. This paper provides a ...

  3. A systematic review on food recommender systems

    Most recommender systems use content-based filtering and predict recommendations with various machine learning algorithms. ... (25.4%) of the total 67 papers. Research from the past five years is retrieved, and a generally increasing trend of publications can be inferred. There is a dip in publications in 2019 and 2020. A total of 15 (22.4% ...

  4. Systematic Review of Recommendation Systems for Course Selection

    This systematic literature review (SLR) assesses various recommender system methodologies used to suggest course selection tracks, aiming to determine the most effective evidence-based approach ...

  5. Recommender systems: Trends and frontiers

    The six papers in this special issue push the current frontiers in recommender systems and address several of the challenges of open questions outlined above. In their article, Jannach and Chen (2022) elaborate why building a conversational recommender system is difficult, and consider such systems a "Grand AI Challenge".

  6. Artificial intelligence in recommender systems

    In this position paper, we review eight fields of AI, introduce their applications in recommender systems, discuss the open research issues, and give directions of possible future research on how AI techniques will be applied in recommender systems. This paper highlights how the recommender system can be enhanced by AI techniques and aims to ...

  7. ACM TRANSACTIONS ON RECOMMENDER SYSTEMS Home

    ACM Transactions on Recommender Systems (TORS) publishes high quality papers that address various aspects of recommender systems research, from algorithms to the user experience, to questions of the impact and value of such systems, on a quarterly basis. The journal takes a holistic view on the field and calls for contributions from different subfields of computer science and information ...

  8. Recent Developments in Recommender Systems: A Survey

    In this technical survey, we comprehensively summarize the latest advancements in the field of recommender systems. The objective of this study is to provide an overview of the current state-of-the-art in the field and highlight the latest trends in the development of recommender systems. The study starts with a comprehensive summary of the main taxonomy of recommender systems, including ...

  9. [2302.02579] Recommender Systems: A Primer

    In this paper, we first provide an overview of the traditional formulation of the recommendation problem. We then review the classical algorithmic paradigms for item retrieval and ranking and elaborate how such systems can be evaluated. Afterwards, we discuss a number of recent developments in recommender systems research, including research on ...

  10. Research-paper recommender systems: a literature survey

    In the last 16 years, more than 200 research articles were published about research-paper recommender systems. We reviewed these articles and present some descriptive statistics in this paper, as well as a discussion about the major advancements and shortcomings and an overview of the most common recommendation concepts and approaches. We found that more than half of the recommendation ...

  11. Scientific paper recommendation systems: a literature review of recent

    Beel, J., Gipp, B., Langer, S., Breitinger, C.: Research-paper recommender systems: a literature survey. Int. J. Digit. Libr. 17(4), 305–338 (2016). 10.1007/s00799-015-0156-; 17. Beel, J., Langer, S.: A Comparison of offline evaluations, online evaluations, and user studies in the context of research-paper recommender systems.

  12. Evaluating Recommender Systems: Survey and Framework

    A comparative analysis of offline and online evaluations and discussion of research paper recommender system evaluation. In Proceedings of the International Workshop on Reproducibility and Replication in Recommender Systems Evaluation (RepSys'13). ACM, 7–14.

  13. Systematic Review of Recommendation Systems for Course Selection

    Table 13 shows the dataset description and information for the 13 research papers that presented various novel approaches for recommender systems. Ten research papers out of thirteen provided information about the performed preprocessing steps. Additionally, the same number of papers provided information about the data-splitting method.

  14. Recommender systems in the healthcare domain: state-of-the-art and

    In this article, we provide a systematic overview of existing research on healthcare recommender systems. Different from existing related overview papers, our article provides insights into recommendation scenarios and recommendation approaches. Examples thereof are food recommendation, drug recommendation, health status prediction, healthcare ...

  15. Recommendation systems: Principles, methods and evaluation

    Recommender systems solve this problem by searching through large volume of dynamically generated information to provide users with personalized content and services. This paper explores the different characteristics and potentials of different prediction techniques in recommendation systems in order to serve as a compass for research and ...

  16. Deep Learning-Based Recommender Systems Research Progress: A

    Deep Learning-Based Recommender Systems (DLRS) represent a prominent research area in the academic community. This paper aims to conduct a bibliometric and visualization analysis using the VOSviewer software, based on 1,435 DLRS-related publications retrieved from the Web of Science database. By analyzing the existing literature, this study investigates the quantity of DLRS papers, their ...

  17. Recommender systems: An overview of different approaches to

    This paper presents an overview of the field of recommender systems and describes the present generation of recommendation methods. Recommender systems or recommendation systems (RSs) are a subset of information filtering system and are software tools and techniques providing suggestions to the user according to their need. Many popular Ecommerce sites widely use RSs to recommend news, music ...

  18. A collaborative approach for research paper recommender system

    Research paper recommenders emerged over the last decade to ease finding publications relating to researchers' area of interest. The challenge was not just to provide researchers with very rich publications at any time, any place and in any form but to also offer the right publication to the right researcher in the right way. Several approaches exist in handling paper recommender systems ...

  19. A Systematic Review of Recommender Systems and Their Applications in

    The answer to Research Question 2: While analyzing the sources, the authors noticed that the papers did not present a unified taxonomy of recommender system types. By gathering and organizing the types presented in the analyzed papers, this work proposes a new, up-to-date list of recommender system types.

  20. Research-paper recommender systems: a literature survey

    Finally, few research papers had an impact on research-paper recommender systems in practice. We also identified a lack of authority and long-term research interest in the field: 73% of the authors published no more than one paper on research-paper recommender systems, and there was little cooperation among different co-author groups.

  21. A Systematic Study on the Recommender Systems in the E-Commerce

    The present article illustrates a comprehensive and Systematic Literature Review (SLR) regarding the papers published in the field of e-commerce recommender systems. We reviewed the selected papers to identify the gaps and significant issues of the RSs' traditional methods, which guide the researchers to do future work. So, we provided the ...

  22. A LLM-based Controllable, Scalable, Human-Involved User Simulator

    Conversational Recommender System (CRS) leverages real-time feedback from users to dynamically model their preferences, thereby enhancing the system's ability to provide personalized recommendations and improving the overall user experience. CRS has demonstrated significant promise, prompting researchers to concentrate their efforts on developing user simulators that are both more realistic ...

  23. Research Paper Recommender Systems on Big Scholarly Data

    The paper column stores the number of papers published in the research field of machine learning and recommender systems, respectively. Taking the recommender systems field as an example, it includes 5,431 papers that were authored or co-authored by 10,281 distinct scholars and contained 32,545 references.

  24. Recipe Recommender System Using BERTopic Modelling Technique

    A systematic review on various recent contributions in the domain of recommender systems, focusing on diverse applications like books, movies, products, etc, provides a much-needed overview of the current state of research in this field.

  25. Research Paper Recommender Systems: A Random-Walk Based Approach

    Research Paper Recommender Systems: A Random-Walk Based Approach Abstract: Every day researchers from all over the world have to filter the huge mass of existing research papers with the crucial aim of finding out useful publications related to their current work. In this paper we propose a research paper recommending algorithm based on the ...

  26. Pay researchers to spot errors in published papers

    Our project, Estimating the Reliability and Robustness of Research (ERROR), pays specialists to check highly cited published papers, starting with the social and behavioural sciences (see go ...