Computer Science > Computers and Society

Title: A Review of Data Mining in Personalized Education: Current Trends and Future Prospects

Abstract: Personalized education, tailored to individual student needs, leverages educational technology and artificial intelligence (AI) in the digital age to enhance learning effectiveness. The integration of AI in educational platforms provides insights into academic performance, learning preferences, and behaviors, optimizing the personal learning process. Driven by data mining techniques, it not only benefits students but also provides educators and institutions with tools to craft customized learning experiences. To offer a comprehensive review of recent advancements in personalized educational data mining, this paper focuses on four primary scenarios: educational recommendation, cognitive diagnosis, knowledge tracing, and learning analysis. This paper presents a structured taxonomy for each area, compiles commonly used datasets, and identifies future research directions, emphasizing the role of data mining in enhancing personalized education and paving the way for future exploration and innovation.

Educational data mining: a systematic review of research and emerging trends

Information Discovery and Delivery

ISSN : 2398-6247

Article publication date: 19 May 2020

Issue publication date: 10 October 2020

Purpose

Educational data mining (EDM) and learning analytics, which are closely related subjects with different definitions and focuses, have enabled instructors to obtain a holistic view of student progress and to trigger corresponding decision-making. Furthermore, the automation part of EDM is close to the concept of artificial intelligence. Given the wide application of artificial intelligence in assorted fields, the authors examine the state of the art of related applications in education.

Design/methodology/approach

This study systematically reviewed 1,219 EDM studies retrieved from five digital databases using a strict search procedure. Although 33 earlier reviews attempted to synthesize the research literature, several research gaps were identified. A comprehensive and systematic review is needed to show what research trends can be revealed and what major research topics and open issues exist in EDM research.

Findings

Results show that EDM research has moved toward the early majority stage; EDM publications mainly fall into the “actual analysis” category; and machine learning and even deep learning algorithms have been widely adopted, but collecting large real-world data sets for EDM research is rare, especially in K-12. Four major research topics have been identified: prediction of performance, decision support for teachers and learners, detection of behaviors and learner modeling, and comparison or optimization of algorithms. Some open issues and future research directions in the EDM field are also put forward.

Research limitations/implications

Limitations for this search method include the likelihood of missing EDM research that was not captured through these portals.

Originality/value

This systematic review not only reports the research trends of EDM but also discusses open issues to direct future research. It concludes that the state of the art of EDM research is far from the ideal of artificial intelligence, and that the automatic support for teaching and learning in EDM may need improvement in future work.

  • Educational data mining
  • Learning analytics
  • Systematic review
  • Prediction of performance
  • Decision support
  • Artificial intelligence

Acknowledgements

Conflict of interest: The authors have declared no conflicts of interest for this article.

This study was supported by National Natural Science Foundation of China Under Grant No. 61877027.

Du, X. , Yang, J. , Hung, J.-L. and Shelton, B. (2020), "Educational data mining: a systematic review of research and emerging trends", Information Discovery and Delivery , Vol. 48 No. 4, pp. 225-236. https://doi.org/10.1108/IDD-09-2019-0070

Emerald Publishing Limited

Copyright © 2020, Emerald Publishing Limited

Data Mining in Education: A Review of Current Practices

Journal of Educational Data Mining

About the journal

The Journal of Educational Data Mining (JEDM; ISSN: 2157-2100) is published by the International Educational Data Mining Society (IEDMS). It is an international and interdisciplinary forum of research on computational approaches for analyzing electronic repositories of student data to answer educational questions. It is completely and permanently free and open-access to both authors and readers.

  • Processes or methodologies followed to analyse educational data
  • Integrating data mining with pedagogical theories
  • Describing the way findings are used for improving educational software or teacher support
  • Improving understanding of learners' domain representations
  • Improving assessment of learners' engagement in the learning tasks

Current Issue

Published: 2023-12-26

Editorial Acknowledgment

  • Examining algorithmic fairness for first-term college grade prediction models relying on pre-matriculation data
  • Modelling argument quality in technology-mediated peer instruction
  • Extended articles from the EDM 2023 conference
  • Automated search improves logistic knowledge tracing, surpassing deep learning in accuracy and explainability

A bibliometric analysis of Educational Data Mining studies in global perspective

  • Published: 02 September 2023
  • Volume 29, pages 8961–8985 (2024)

  • Gizem Dilan Boztaş (ORCID: orcid.org/0000-0002-4593-032X)
  • Muhammet Berigel (ORCID: orcid.org/0000-0001-5682-8956)
  • Fahriye Altınay (ORCID: orcid.org/0000-0002-3861-6447)

Educational Data Mining (EDM) is an interdisciplinary field that draws on computer science, education, and statistics. Mining educational data is crucial for the policymakers, researchers, and educators who shape future trends in education. An all-inclusive understanding of EDM studies requires a comprehensive examination of both the intellectual and the social structure of the field from a global perspective, together with its evolution over time, so that past, present, and future research directions can be adequately understood. In this respect, this study explores the performance analysis of EDM, the intellectual structure of EDM, the social structure of the area, and the temporal modeling of EDM through bibliometric analysis. The analysis covers the body of knowledge on educational data mining published in 2010–2021 in order to provide future directions in the field of education. The results show that the number of publications increased by 1325% (from 8 to 114) over the period. The most influential journals in the field are “Computers & Education”, “International Journal of Advanced Computer Science and Applications” and “Educational Technology & Society”; the most influential authors are “C. Romero”, “S. Ventura” and “R.S. Baker”; and the USA, Spain, and China appear to be the most influential countries in EDM. The engine themes identified were “CLASSIFICATION” and “SYSTEMS” in the first sub-period (2010–2013); “LEARNING ANALYTICS”, “PREDICTION OF ACADEMIC PERFORMANCE” and “SOCIAL MEDIA” in the second sub-period (2014–2017); and “PREDICTION OF ACADEMIC PERFORMANCE”, “MACHINE LEARNING”, “PROCESS MINING”, “PARTICIPATION”, “KNN”, “J48” and “BAYESIAN NETWORK” in the third and last sub-period (2018–2021). In addition, the thematic evolution map revealed that “DEEP-LEARNING (DL)” and “EMOTION” are emerging themes for future studies aimed at shaping the future of education based on sustainable goals.
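The growth figure quoted in the abstract can be checked with simple arithmetic. The snippet below is only a minimal sketch; the function name `percent_increase` is an illustrative assumption and does not come from the study.

```python
def percent_increase(start: float, end: float) -> float:
    """Return the relative increase from start to end, in percent."""
    if start == 0:
        raise ValueError("start must be non-zero")
    return (end - start) / start * 100.0


# Publication counts quoted in the abstract: 8 rising to 114.
growth = percent_increase(8, 114)
print(f"{growth:.0f}%")  # prints "1325%"
```

The result agrees with the reported 1325%: the count grew by 106 publications, which is 13.25 times the starting value of 8.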

Data availability

The datasets generated during and/or analyzed during the current study are available from the corresponding author upon reasonable request.

Adeniji, B. A. (2019). A bibliometric study on learning analytics. Master’s Thesis Long Island University, New York, NY, USA.

Akgün, E., & Patan Öztürk, B. (2020). Science mapping research on educational data mining: a bibliometric review of international publications. Future Visions Journal, 4 (14), 1–17. https://doi.org/10.29345/futvis.160

Almasri, A., Obaid, T., Abumandil, M. S. S., Eneizan, B., Mahmoud, A. Y., & Abu-Naser, S. S. (2023). Mining educational data to improve teachers’ performance. In M. Al-Emran, M. A. Al-Sharafi, & K. Shaalan (Eds.), International Conference on Information Systems and Intelligent Applications (pp. 243–255). Springer International Publishing. https://doi.org/10.1007/978-3-031-16865-9_20

Azevedo, A., & Azevedo, J. M. (2021). Learning analytics: A bibliometric analysis of the literature over the last decade. International Journal of Educational Research Open , 2 (November), 100084. https://doi.org/10.1016/j.ijedro.2021.100084

Baek, C., & Doleck, T. (2022). Educational Data Mining: A bibliometric analysis of an emerging field. IEEE Access: Practical Innovations, Open Solutions, 10 , 31289–31296. https://doi.org/10.1109/ACCESS.2022.3160457

Baker, R. S. J. D., & Yacef, K. (2009). The state of educational data mining in 2009: A review and future visions.  Journal of Educational Data Mining , 1 (1), 3–16. https://doi.org/10.5281/zenodo.3554657

Bakhshinategh, B., Zaiane, O. R., ElAtia, S., & Ipperciel, D. (2018). Educational data mining applications and tasks: A survey of the last 10 years. Education and Information Technologies , 23 (1), 537–553. https://doi.org/10.1007/s10639-017-9616-z

Callon, M., Courtial, J. P., & Laville, F. (1991). Co-word analysis as a tool for describing the network of interactions between basic and technological research: The case of polymer chemistry. Scientometrics , 22 (1), 155–205. https://doi.org/10.1007/BF02019280

Cobo, M. J., López-Herrera, A. G., Herrera-Viedma, E., & Herrera, F. (2011). An approach for detecting, quantifying, and visualizing the evolution of a research field: A practical application to the fuzzy Sets Theory field. Journal of Informetrics , 5 (1), 146–166. https://doi.org/10.1016/J.JOI.2010.10.002

Dormezil, S., Khoshgoftaar, T., & Robinson-Bryant, F. (2020). Differentiating between educational data mining and learning analytics: A bibliometric approach. CEUR Workshop Proceedings, 2592 (May), 17–22.

Du, X., Yang, J., Hung, J. L., & Shelton, B. (2020). Educational data mining: A systematic review of research and emerging trends. Information Discovery and Delivery , 48 (4), 225–236. https://doi.org/10.1108/IDD-09-2019-0070

Feng, G., Fan, M., & Ao, C. (2022). Exploration and visualization of learning behavior patterns from the perspective of Educational process mining. IEEE Access: Practical Innovations, Open Solutions, 10 , 65271–65283. https://doi.org/10.1109/ACCESS.2022.3184111

Ganepola, D. (2022). Assessment of learner emotions in online learning via educational process mining. Proceedings - Frontiers in Education Conference, FIE, 2022-Octob , 11–13. https://doi.org/10.1109/FIE56618.2022.9962490

Hamoud, A. K., Dahr, J. M., Najim, I. A., Kamel, M. B., Hashim, A. S., Awadh, W. A., & Humadi, A. M. (2021). Supervised learning algorithms in Educational Data Mining: A systematic review. Southeast Europe Journal of Soft Computing , 10 (1), 55–70. https://doi.org/10.21533/scjournal.v10i1.199.g191

Harzing, A. W. (2007). Publish or Perish. Retrieved October 9, 2022, from https://harzing.com/resources/publish-or-perish

Hernández-Blanco, A., Herrera-Flores, B., Tomás, D., & Navarro-Colorado, B. (2019). A systematic review of deep learning approaches to Educational Data Mining. Complexity , 2019. https://doi.org/10.1155/2019/1306039

Hung, H. C., Liu, I. F., Liang, C. T., & Su, Y. S. (2020). Applying educational data mining to explore students’ learning patterns in the flipped learning approach for coding education. Symmetry , 12 (2), 1–14. https://doi.org/10.3390/sym12020213

International Educational Data Mining Society (2011). International Educational Data Mining Society . International Educational Data Mining Society. Retrieved September 4, 2022, from https://educationaldatamining.org/

Jena, R. K. (2019). Sentiment mining in a collaborative learning environment: Capitalizing on big data. Behaviour and Information Technology , 38 (9), 986–1001. https://doi.org/10.1080/0144929X.2019.1625440

Lee, S. J., & Siau, K. (2001). A review of data mining techniques. Industrial Management & Data Systems , 401 (1), 41–46. https://doi.org/10.1108/02635570110365989

Mahboob, K., Asif, R., & Haider, N. (2023). Quality enhancement at higher education institutions by early identifying students at risk using data mining. Mehran University Research Journal of Engineering and Technology , 42 (1), 120–136. https://doi.org/10.22581/muet1982.2301.12

Martínez, M. A., Cobo, M. J., Herrera, M., & Herrera-Viedma, E. (2015). Analyzing the scientific evolution of social work using science mapping. Research on Social Work Practice , 25 (2), 257–277. https://doi.org/10.1177/1049731514522101

Mougiakou, S., Vinatsella, D., Sampson, D., Papamitsiou, Z., Giannakos, M., & Ifenthaler, D. (2023a). Educational data analytics for teachers and school leaders. Springer Nature . https://doi.org/10.1007/978-3-031-15266-5

Mougiakou, S., Vinatsella, D., Sampson, D., Papamitsiou, Z., Giannakos, M., & Ifenthaler, D. (2023b). Online and blended teaching and learning supported by educational data. In Educational Data Analytics for Teachers and School Leaders . Springer International Publishing. https://doi.org/10.1007/978-3-031-15266-5_1

Murgado-Armenteros, E. M., Gutiérrez-Salcedo, M., Torres-Ruiz, F. J., & Cobo, M. J. (2015). Analysing the conceptual evolution of qualitative marketing research through science mapping analysis. Scientometrics , 102 (1), 519–557. https://doi.org/10.1007/s11192-014-1443-z

Nilashi, M., Abumalloh, R. A., Zibarzani, M., Samad, S., Zogaan, W. A., Ismail, M. Y., Mohd, S., & Akib, N. A. M. (2022). What factors influence student’s satisfaction in massive open online courses? Findings from user-generated content using Educational Data Mining. Education and Information Technologies , 27 (7), 9401–9435. https://doi.org/10.1007/s10639-022-10997-7

Nkomo, L. M., & Nat, M. (2021). Student engagement patterns in a blended learning environment: An educational data mining approach. TechTrends , 65 (5), 808–817. https://doi.org/10.1007/s11528-021-00638-0

Noyons, E. C. M., Moed, H. F., & Luwel, M. (1999a). Combining mapping and citation analysis for evaluative bibliometric purposes: A bibliometric study. Journal of the American Society for Information Science , 50 (2), 115–131. https://doi.org/10.1002/(SICI)1097-4571(1999)50:2<115::AID-ASI3>3.0.CO;2-J

Noyons, E. C. M., Moed, H. F., & Van Raan, A. F. J. (1999b). Integrating research performance analysis and science mapping. Scientometrics , 46 (3), 591–604. https://doi.org/10.1007/BF02459614

Olson, D. L., & Delen, D. (2008). Advanced data mining techniques. Springer Science & Business Media. https://doi.org/10.1007/978-3-540-76917-0

Peña-Ayala, A. (2014). Educational data mining: A survey and a data mining-based analysis of recent works. Expert Systems with Applications, 41 (4 PART 1), 1432–1462. https://doi.org/10.1016/j.eswa.2013.08.042

Perianes-Rodriguez, A., Waltman, L., & van Eck, N. J. (2016). Constructing bibliometric networks: A comparison between full and fractional counting. Journal of Informetrics , 10 (4), 1178–1195. https://doi.org/10.1016/j.joi.2016.10.006

Romero, C., & Ventura, S. (2010). Educational Data Mining: a review of the state of the art. IEEE Transactions on Systems, Man, and Cybernetics, Part C (Applications and Reviews), 40 (6), 601–618. https://doi.org/10.1109/TSMCC.2010.2053532

Romero, C., & Ventura, S. (2013). Data mining in education. Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery, 3 (1), 12–27. https://doi.org/10.1002/widm.1075

Romero, C., & Ventura, S. (2020). Educational data mining and learning analytics: An updated survey. Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery , 10 (3), 1–21. https://doi.org/10.1002/widm.1355

Sánchez, A., Vidal-silva, C., Mancilla, G., Tupac-yupanqui, M., & Rubio, J. M. (2023). Sustainable e-Learning by data mining — Successful results in a Chilean University. Sustainability , 15 (2), 1–16. https://doi.org/10.3390/su15020895

Shafiq, D. A., Marjani, M., Habeeb, R. A. A., & Asirvatham, D. (2022). Student retention using Educational Data Mining and predictive analytics: A systematic literature review. IEEE Access , 10 , 72480–72503. https://doi.org/10.1109/ACCESS.2022.3188767

Small, H. (1999). Visualizing science by citation mapping. Journal of the American Society for Information Science, 50 (9), 799–813. https://doi.org/10.1002/(SICI)1097-4571(1999)50:9<799::AID-ASI9>3.0.CO;2-G

Talan, T., & Demirbilek, M. (2022). Bibliometric analysis of research on learning analytics based on web of science database. Informatics in Education.   https://doi.org/10.15388/infedu.2023.02

Waheed, H., Hassan, S. U., Aljohani, N. R., & Wasif, M. (2018). A bibliometric perspective of learning analytics research landscape. Behaviour and Information Technology , 37 (10–11), 941–957. https://doi.org/10.1080/0144929X.2018.1467967

White, H. D., & Griffith, B. C. (1981). Author cocitation: A literature measure of intellectual structure. Journal of the Association for Information Science and Technology (JASIST) , 32 (3), 163–237. https://doi.org/10.1002/asi.4630320302

Wu, Y., Gumabay, M. V. N., & Wang, J. (2023). Student achievement predictive analytics based on Educational Data Mining. International Conference on Big Data Engineering and Technology , 60–74. https://doi.org/10.1007/978-3-031-17548-0_6

Xiao, Z., Xu, X., Xing, H., Luo, S., Dai, P., & Zhan, D. (2021). RTFN: A robust temporal feature network for time series classification. Information Sciences , 571 , 65–86. https://doi.org/10.1016/j.ins.2021.04.053

Xiao, W., Ji, P., & Hu, J. (2022). A survey on educational data mining methods used for predicting students’ performance. Engineering Reports , 4 (5), 1–23. https://doi.org/10.1002/eng2.12482

Xing, H., Xiao, Z., Qu, R., Zhu, Z., & Zhao, B. (2022). An efficient federated distillation learning system for multitask time series classification. IEEE Transactions on Instrumentation and Measurement, 71 , 1–12. https://doi.org/10.1109/TIM.2022.3201203

Yu, B., Mao, W., Lv, Y., Zhang, C., & Xie, Y. (2022). A survey on federated learning in data mining. Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery , 12 (1), e1443. https://doi.org/10.1002/widm.1443

Zorić, A. B. (2020). Benefits of Educational Data Mining. Journal of International Business Research and Marketing , 6 (1), 12–16. https://doi.org/10.18775/jibrm.1849-8558.2015.61.3002

Zupic, I., & Čater, T. (2015). Bibliometric methods in management and organization. Organizational Research Methods, 18 (3), 429–472. https://doi.org/10.1177/1094428114562629

Author information

Authors and affiliations

Digital Transformation Office, Karadeniz Technical University, Trabzon, 61080, Turkey

Gizem Dilan Boztaş

Management Information Systems, Faculty of Economics and Administrative Sciences, Karadeniz Technical University, Trabzon, 61080, Turkey

Muhammet Berigel

Institute of Graduate Studies, Societal Research and Development Center, Near East University, Nicosia, Northern part Cyprus, Mersin 10, Turkey

Fahriye Altınay

Corresponding author

Correspondence to Gizem Dilan Boztaş.

Ethics declarations

Human and animal rights and informed consent

The study does not include any human participants or animals.

Conflict of interest

The authors declare that they have no competing interests.

Figure 10: Strategic diagram of all sub-periods

About this article

Boztaş, G.D., Berigel, M. & Altınay, F. A bibliometric analysis of Educational Data Mining studies in global perspective. Educ Inf Technol 29, 8961–8985 (2024). https://doi.org/10.1007/s10639-023-12170-0

Received: 12 December 2022

Accepted: 23 August 2023

Published: 02 September 2023

Issue Date: May 2024

DOI: https://doi.org/10.1007/s10639-023-12170-0

  • Bibliometric analysis
  • Scientific mapping
  • Educational Data Mining
  • Open access
  • Published: 03 March 2022

Educational data mining: prediction of students' academic performance using machine learning algorithms

  • Mustafa Yağcı (ORCID: orcid.org/0000-0003-2911-3909)

Smart Learning Environments, volume 9, Article number: 11 (2022)

Educational data mining has become an effective tool for exploring the hidden relationships in educational data and predicting students' academic achievements. This study proposes a new model based on machine learning algorithms to predict the final exam grades of undergraduate students, taking their midterm exam grades as the source data. The performances of the random forests, nearest neighbour, support vector machines, logistic regression, Naïve Bayes, and k-nearest neighbour algorithms were calculated and compared in predicting the students' final exam grades. The dataset consisted of the academic achievement grades of 1854 students who took the Turkish Language-I course at a state university in Turkey during the fall semester of 2019–2020. The results show that the proposed model achieved a classification accuracy of 70–75%. The predictions were made using only three types of parameters: midterm exam grades, department data, and faculty data. Such data-driven studies are very important for establishing a learning analytics framework in higher education and for contributing to decision-making processes. Finally, this study contributes to the early prediction of students at high risk of failure and determines the most effective machine learning methods.
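To make the idea of classifying final exam outcomes from midterm grades, department, and faculty more concrete, here is a minimal sketch of one of the listed techniques, k-nearest neighbours, written in plain Python. Everything in it is an assumption made for illustration: the toy records, the pass/fail labels, the mixed distance with a 0.05 penalty per categorical mismatch, and the choice of k = 3. It is not the study's model or data, and the accuracy it prints says nothing about the reported 70–75%.

```python
from collections import Counter

# Hypothetical toy records: (midterm grade, department, faculty, final outcome).
TRAIN = [
    (35, "math", "science", "fail"),
    (42, "math", "science", "fail"),
    (48, "history", "arts", "fail"),
    (55, "history", "arts", "pass"),
    (63, "math", "science", "pass"),
    (71, "history", "arts", "pass"),
    (80, "math", "science", "pass"),
    (90, "history", "arts", "pass"),
]
TEST = [
    (40, "history", "arts", "fail"),
    (58, "math", "science", "pass"),
    (76, "math", "science", "pass"),
    (50, "math", "science", "pass"),
    (88, "history", "arts", "pass"),
]


def distance(a, b):
    """Scaled grade gap plus a small penalty for each categorical mismatch."""
    grade_gap = abs(a[0] - b[0]) / 100.0
    mismatches = (a[1] != b[1]) + (a[2] != b[2])
    return grade_gap + 0.05 * mismatches


def knn_predict(train, query, k=3):
    """Majority vote among the k training records closest to the query."""
    neighbours = sorted(train, key=lambda row: distance(row, query))[:k]
    votes = Counter(row[3] for row in neighbours)
    return votes.most_common(1)[0][0]


def accuracy(train, test, k=3):
    """Fraction of test records whose predicted outcome matches the label."""
    hits = sum(knn_predict(train, row, k) == row[3] for row in test)
    return hits / len(test)


print(accuracy(TRAIN, TEST))  # 0.8 on this toy data
```

The fourth test record (midterm 50, labelled "pass") is misclassified because two of its three nearest neighbours failed, which is why the toy accuracy is 4 out of 5. A real comparison of several algorithms would normally rely on an established library and cross-validation rather than a hand-written classifier.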

Introduction

The application of data mining methods in the field of education has attracted great attention in recent years. Data mining (DM) is the discovery of new and potentially useful information or meaningful results from big data (Witten et al., 2011). It also aims to obtain new trends and new patterns from large datasets by using different classification algorithms (Baker & Inventado, 2014).

Educational data mining (EDM) is the use of traditional DM methods to solve problems related to education (Baker & Yacef, 2009; cited in Fernandes et al., 2019). EDM applies DM methods to educational data such as student information, educational records, exam results, student participation in class, and the frequency with which students ask questions. In recent years, EDM has become an effective tool used to identify hidden patterns in educational data, predict academic achievement, and improve the learning/teaching environment.

Learning analytics has gained a new dimension through the use of EDM (Waheed et al., 2020). Learning analytics covers the various aspects of collecting student information, better understanding the learning environment by examining and analysing that information, and revealing the best student/teacher performance (Long & Siemens, 2011). It is the compilation, measurement and reporting of data about students and their contexts in order to understand and optimize learning and the environments in which it takes place. It also helps institutions develop new strategies.

Another dimension of learning analytics is predicting student academic performance, uncovering patterns of system access and navigational actions, and identifying students who are potentially at risk of failing (Waheed et al., 2020). Learning management systems (LMS), student information systems (SIS), intelligent teaching systems (ITS), MOOCs, and other web-based education systems leave digital data that can be examined to evaluate students' possible behavior. Using EDM methods, these data can be employed to analyse the activities of successful students and of those who are at risk of failure, to develop corrective strategies based on student academic performance, and therefore to assist educators in the development of pedagogical methods (Casquero et al., 2016; Fidalgo-Blanco et al., 2015).
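As a rough illustration of how the digital traces left in an LMS might be summarised per student, the sketch below counts logged actions and flags students with very little activity. The log format, the action names, and the `min_events` threshold are all hypothetical; an actual EDM pipeline would typically feed such counts into a trained model rather than apply a fixed cut-off.

```python
from collections import defaultdict

# Hypothetical LMS log rows: (student_id, action). Field names are assumptions.
LOG = [
    ("s1", "login"), ("s1", "view_page"), ("s1", "submit_quiz"),
    ("s2", "login"),
    ("s3", "login"), ("s3", "view_page"), ("s3", "view_page"),
    ("s3", "submit_quiz"),
    ("s1", "view_page"),
]


def activity_counts(log):
    """Count how often each student performed each action."""
    counts = defaultdict(lambda: defaultdict(int))
    for student, action in log:
        counts[student][action] += 1
    return counts


def flag_low_activity(log, min_events=3):
    """Return the sorted ids of students with fewer than min_events log rows."""
    counts = activity_counts(log)
    return sorted(
        student
        for student, actions in counts.items()
        if sum(actions.values()) < min_events
    )


print(flag_low_activity(LOG))  # ['s2']
```

Per-action counts such as `activity_counts(LOG)["s3"]["view_page"]` are the kind of navigational features that could later serve as inputs to a performance-prediction model.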

The data collected on educational processes offer new opportunities to improve the learning experience and to optimize users' interaction with technological platforms (Shorfuzzaman et al., 2019). The processing of educational data yields improvements in many areas such as predicting student behaviour, analytical learning, and new approaches to education policies (Capuano & Toti, 2019; Viberg et al., 2018). This comprehensive collection of data will not only allow education authorities to make data-based policies, but also form the basis of artificial intelligence software developed for the learning process.

EDM enables educators to predict situations such as dropping out of school or declining interest in a course, to analyse internal factors affecting students' performance, and to apply statistical techniques to predict students' academic performance. A variety of DM methods are employed to predict student performance and to identify slow learners and dropouts (Hardman et al., 2013; Kaur et al., 2015). Early prediction is a new phenomenon that includes assessment methods to support students by proposing appropriate corrective strategies and policies in this field (Waheed et al., 2020).

Especially during the pandemic period, learning management systems were quickly put into practice and have become an indispensable part of higher education. As students use these systems, the log records they produce have become ever more accessible (Macfadyen & Dawson, 2010; Kotsiantis et al., 2013; Saqr et al., 2017). Universities should now improve their capacity to use these data to predict academic success and ensure student progress (Bernacki et al., 2020).

As a result, EDM provides educators with new information by discovering hidden patterns in educational data. With this approach, some aspects of the education system can be evaluated and improved to ensure the quality of education.

In various studies on EDM, e-learning systems have been successfully analysed (Lara et al., 2014). Some studies have also classified educational data (Chakraborty et al., 2016), while some have tried to predict student performance (Fernandes et al., 2019).

Asif et al. (2017) focused on two aspects of the performance of undergraduate students using DM methods. The first aspect is to predict the academic achievements of students at the end of a four-year study program. The second is to examine the development of students and combine it with the predictive results. They divided the students into low-achievement and high-achievement groups. They found that it is important for educators to focus on a small number of courses indicating particularly good or poor performance in order to offer timely warnings, support underperforming students, and offer advice and opportunities to high-performing students. Cruz-Jesus et al. (2020) predicted student academic performance with 16 demographic variables such as age, gender, class attendance, internet access, computer possession, and the number of courses taken. Random forest, logistic regression, k-nearest neighbours and support vector machines, which are among the machine learning methods, were able to predict students’ performance with accuracy ranging from 50 to 81%.

Fernandes et al. (2019) developed a model with the demographic characteristics of the students and the achievement grades obtained from the in-term activities. In that study, students' academic achievement was predicted with classification models based on Gradient Boosting Machine (GBM). The results showed that the best attributes for predicting achievement scores were the previous year's achievement scores and absenteeism. The authors found that demographic characteristics such as neighbourhood, school, and age were also potential indicators of success or failure. In addition, they argued that this model could guide the development of new policies to prevent failure. Similarly, by using the student data requested during registration and environmental factors, Hoffait and Schyns (2017) identified the students with the potential to fail. They found that students with potential difficulties could be classified more precisely by using DM methods. Moreover, their approach makes it possible to rank the students by level of risk. Rebai et al. (2020) proposed a machine learning-based model to identify the key factors affecting the academic performance of schools and to determine the relationship between these factors. Their regression trees showed that the most important factors associated with higher performance were school size, competition, class size, parental pressure, and gender proportions. In addition, according to the random forest results, school size and the percentage of girls had a powerful impact on the predictive accuracy of the model.

Ahmad and Shahzadi (2018) proposed a machine learning-based model to determine whether students were at risk regarding their academic performance. Using the students' learning skills, study habits, and academic interaction features, they made predictions with a classification accuracy of 85%. The researchers concluded that the proposed model could be used to identify academically unsuccessful students. Musso et al. (2020) proposed a machine learning model based on learning strategies, perception of social support, motivation, socio-demographics, health condition, and academic performance characteristics. With this model, they predicted academic performance and dropouts. They concluded that the variable with the highest effect on predicting GPA was learning strategies, while the variable with the greatest effect on determining dropouts was background information.

Waheed et al. (2020) designed a model with artificial neural networks on students' records related to their navigation through the LMS. The results showed that demographics and student clickstream activities had a significant impact on student performance. Students who navigated through courses performed better, whereas students' participation in the learning environment was not related to their performance. The authors concluded that the deep learning model could be an important tool in the early prediction of student performance. Xu et al. (2019) determined the relationship between the internet usage behaviors of university students and their academic performance, and predicted students’ performance with machine learning methods. The model they proposed predicted students' academic performance with a high level of accuracy. The results suggested that Internet connection frequency features were positively correlated with academic performance, whereas Internet traffic volume features were negatively correlated with it. In addition, they concluded that internet usage features played an important role in students' academic performance. Bernacki et al. (2020) examined whether the log records in the learning management system alone would be sufficient to predict achievement. They concluded that the behaviour-based prediction model successfully predicted 75% of those who would need to repeat a course. They also stated that, with this model, students who might be unsuccessful in subsequent semesters could be identified and supported. Burgos et al. (2018) predicted the achievement grades that students might get in subsequent semesters and designed a tool for students who were likely to fail. They found that the number of unsuccessful students decreased by 14% compared to previous years. A comparative analysis of studies predicting academic achievement grades using machine learning methods is given in Table 1.

A review of previous research that aimed to predict academic achievement indicates that researchers have applied a range of machine learning algorithms, including multiple, probit and logistic regression, neural networks, and C4.5 and J48 decision trees. However, random forests (Zabriskie et al., 2019 ), genetic programming (Xing et al., 2015 ), and Naive Bayes algorithms (Ornelas & Ordonez, 2017 ) were used in recent studies. The prediction accuracy of these models reaches very high levels.

Accurate prediction of student academic performance requires a deep understanding of the factors and features that affect student results and achievement (Alshanqiti & Namoun, 2020). For this purpose, Hellas et al. (2018) reviewed 357 articles on student performance detailing the impact of 29 features. These features were mainly related to psychomotor skills such as course and pre-course performance, student participation, student demographics such as gender, high school performance, and self-regulation. However, the dropout rates were mainly influenced by student motivation, habits, social and financial issues, lack of progress, and career transitions.

The literature review suggests that it is necessary to improve the quality of education by predicting the academic performance of students and supporting those who are in the risk group. In the literature, academic performance has been predicted with many different variables: various digital traces left by students on the internet (browsing, lesson time, percentage of participation) (Fernandes et al., 2019; Rubin et al., 2010; Waheed et al., 2020; Xu et al., 2019); students' demographic characteristics (gender, age, economic status, number of courses attended, internet access, etc.) (Bernacki et al., 2020; Rizvi et al., 2019; García-González & Skrita, 2019; Rebai et al., 2020; Cruz-Jesus et al., 2020; Aydemir, 2017); learning skills, study approaches, and study habits (Ahmad & Shahzadi, 2018); learning strategies, social support perception, motivation, socio-demography, health form, and academic performance characteristics (Costa-Mendes et al., 2020; Gök, 2017; Kılınç, 2015; Musso et al., 2020); and homework, projects, and quizzes (Kardaş & Güvenir, 2020). In almost all models developed in such studies, prediction accuracy ranges from 70 to 95%. However, collecting and processing such a variety of data takes a lot of time and requires expert knowledge. Similarly, Hoffait and Schyns (2017) suggested that collecting so much data is difficult and that socio-economic data are unnecessary. Moreover, these demographic or socio-economic data may not always give the right idea for preventing failure (Bernacki et al., 2020).

The study concerns predicting students’ academic achievement using grades only, with no demographic characteristics and no socio-economic data. It aimed to develop a new model based on machine learning algorithms to predict the final exam grades of undergraduate students from their midterm exam grades, Faculty, and Department.

For this purpose, the machine learning classification algorithms that performed best in predicting students’ academic achievement were identified. The Turkish Language-I course was chosen because it is a compulsory course that all students enrolled in the university must take. Using this model, students’ final exam grades were predicted. These models will enable the development of pedagogical interventions and new policies to improve students' academic performance. In this way, the number of potentially unsuccessful students can be reduced following the assessments made after each midterm.

This section describes the details of the dataset, pre-processing techniques, and machine learning algorithms employed in this study.

Educational institutions regularly store all available data about students in electronic form. Data are stored in databases for processing. These data can be of many types and volumes, from students’ demographics to their academic achievements. In this study, the data were taken from the Student Information System (SIS), where all student records are stored at a State University in Turkey. From these records, the midterm exam grades, final exam grades, Faculty, and Department of 1854 students who took the Turkish Language-I course in the 2019–2020 fall semester were selected as the dataset. Table 2 shows the distribution of students according to the academic unit. Moreover, the dataset is presented as Additional file 1.

Midterm and final exam grades range from 0 to 100. In this system, the end-of-semester achievement grade is calculated by taking 40% of the midterm exam and 60% of the final exam. Students with an achievement grade below 60 are unsuccessful and those above 60 are successful. The midterm exam is usually held in the middle of the academic semester and the final exam at the end of the semester. There are approximately 9 weeks (2.5 months) from the midterm exam to the final exam. In other words, the final exam predictions leave a two-and-a-half-month period for corrective actions for students who are at risk of failing. The study thus investigated how strongly a student's performance in the middle of the semester affects performance at the end of the semester.
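
The grading rule described above can be illustrated with a short sketch. The weights and the pass mark come from the text; the function names are hypothetical, and treating a grade of exactly 60 as a pass is an assumption, since the text only describes grades below and above 60.

```python
MIDTERM_WEIGHT = 0.4  # share of the midterm exam, as stated in the text
FINAL_WEIGHT = 0.6    # share of the final exam, as stated in the text
PASS_MARK = 60        # threshold stated in the text


def achievement_grade(midterm: float, final: float) -> float:
    """Weighted end-of-semester grade on the 0-100 scale."""
    return MIDTERM_WEIGHT * midterm + FINAL_WEIGHT * final


def is_successful(midterm: float, final: float) -> bool:
    # Assumption: a grade of exactly 60 counts as a pass; the text does not
    # say on which side the boundary falls.
    return achievement_grade(midterm, final) >= PASS_MARK


def min_final_to_pass(midterm: float) -> float:
    """Lowest final exam grade that still reaches the pass mark."""
    needed = (PASS_MARK - MIDTERM_WEIGHT * midterm) / FINAL_WEIGHT
    return max(0.0, needed)
```

For instance, under these assumptions a student with a midterm grade of 30 would need about 80 on the final exam, which is the kind of figure an early warning could communicate during the weeks between the two exams.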

Data identification and collection

At this phase, it is determined from which source the data will be obtained, which features of the data will be used, and whether the collected data are suitable for the purpose. Feature selection involves decreasing the number of variables used to predict a particular outcome. The goal is to facilitate the interpretability of the model, reduce complexity, increase the computational efficiency of algorithms, and avoid overfitting.

Establishing DM model and implementation of algorithm

RF, NN, LR, SVM, NB and kNN were employed to predict students' academic performance. The prediction accuracy was evaluated using tenfold cross validation. The DM process serves two main purposes. The first purpose is to make predictions by analyzing the data in the database (predictive model). The second one is to describe behaviors (descriptive model). In predictive models, a model is created by using data with known results. Then, using this model, the result values are predicted for datasets whose results are unknown. In descriptive models, the patterns in the existing data are defined to make decisions.
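
Tenfold cross validation, as used above, can be sketched in plain Python. This is a minimal illustration and not the Orange implementation used in the study: the shuffling seed, the helper names, and the caller-supplied `fit` and `predict` functions are all assumptions, and no stratification is applied.

```python
import random


def k_fold_indices(n_samples: int, k: int = 10, seed: int = 0):
    """Yield (train_idx, test_idx) pairs for k-fold cross-validation."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Fold i takes every k-th shuffled index, so fold sizes differ by at most 1.
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        test_idx = folds[i]
        train_idx = [j for m, fold in enumerate(folds) if m != i for j in fold]
        yield train_idx, test_idx


def cross_val_accuracy(fit, predict, X, y, k: int = 10) -> float:
    """Mean held-out accuracy; `fit` and `predict` are supplied by the caller."""
    scores = []
    for train_idx, test_idx in k_fold_indices(len(X), k):
        model = fit([X[j] for j in train_idx], [y[j] for j in train_idx])
        hits = sum(predict(model, X[j]) == y[j] for j in test_idx)
        scores.append(hits / len(test_idx))
    return sum(scores) / len(scores)
```

Each of the 1854 records would appear in exactly one test fold, so every prediction that enters the reported accuracy is made by a model that did not see that record during training.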

When the focus is on analysing the causes of success or failure, statistical methods such as logistic regression and time series can be employed (Ortiz & Dehon, 2008; Arias Ortiz & Dehon, 2013). However, when the focus is on forecasting, neural networks (Delen, 2010; Vandamme et al., 2007), support vector machines (Huang & Fang, 2013), decision trees (Delen, 2011; Nandeshwar et al., 2011), and random forests (Delen, 2010; Vandamme et al., 2007) are more efficient and give more accurate results. Statistical techniques aim to create a model that can successfully predict output values based on available input data. Machine learning methods, on the other hand, automatically create a model that matches the input data with the expected target values when a supervised optimization problem is given.

The performance of the model was measured by confusion matrix indicators. It is understood from the literature that there is no single classifier that works best for prediction results. Therefore, it is necessary to investigate which classifiers are more studied for the analysed data (Asif et al., 2017 ).

Experiments and results

The entire experimental phase was performed with the Orange machine learning software. Orange is a powerful and easy-to-use component-based DM programming tool for expert data scientists as well as for data science beginners. In Orange, data analysis is done by stacking widgets into workflows. Each widget performs a data retrieval, data pre-processing, visualization, modelling, or evaluation task. A workflow is a series of actions that will be performed on the platform to complete a specific task. Comprehensive data analysis charts can be created by combining different components in a workflow. Figure 1 shows the workflow diagram designed.

Figure 1. The workflow of the designed model

The dataset included midterm exam grades, final exam grades, Faculty, and Department of 1854 students taking the Turkish Language-I course in the 2019–2020 Fall Semester. The entire dataset is provided as Additional file 1 . Table 3 shows part of the dataset.

In the dataset, students' midterm exam grades, final exam grades, faculty, and department information were determined as features. Each measure contains data associated with a student. Midterm exam and final exam grade variables were explained under the heading "dataset". The faculty variable represents Faculties in Kırşehir Ahi Evran University and the department variable represents departments in faculties. In the development of the model, the midterm, the faculty, and the department information were determined as the independent variable and the final was determined as the dependent variable. Table 4 shows the variable model.

After the variable model was determined, the midterm exam grades and final exam grades were categorized according to the equal-width discretization model. Table 5 shows the criteria used in converting midterm exam grades and final exam grades into the categorical format.
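
Equal-width discretization can be sketched as follows. The helper names are hypothetical, and the lower bound of 10 is an inference rather than something the text states: four equal-width bins over 10 to 100 reproduce the cut points 32.5, 55, and 77.5 that are reported with the confusion matrices. Sending a value that lies exactly on a cut point to the upper bin is likewise an assumption.

```python
from bisect import bisect_right


def equal_width_edges(lo: float, hi: float, n_bins: int) -> list:
    """Interior cut points that split [lo, hi] into n_bins equal intervals."""
    width = (hi - lo) / n_bins
    return [lo + width * i for i in range(1, n_bins)]


def discretize(value: float, edges: list) -> int:
    """Return the 0-based bin index; a value on a cut point goes to the upper bin."""
    return bisect_right(edges, value)


# Assumption: observed grades span 10 to 100 (inferred from the reported
# cut points, not stated in the text).
edges = equal_width_edges(10, 100, 4)
```

With these edges, a grade of 80 falls into the highest of the four categories and a grade of 20 into the lowest.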

In Table 6 , the values in the final column are the actual values. The values in the RF, SVM, LR, KNN, NB, and NN columns are the values predicted by the proposed model. For example, according to Table 5 , std1’s actual final grade was in the range 55 to 77. While the predicted value of the RF, SVM, LR, NB, and NN models were in the range of, the predicted value of the kNN model was greater than 77.

Evaluation of the model performance

The performance of the model was evaluated with the confusion matrix, classification accuracy (CA), precision, recall, F-score (F1), and area under the ROC curve (AUC) metrics.

Confusion matrix

The confusion matrix shows the current situation in the dataset and the number of correct/incorrect predictions of the model. Table 7 shows the confusion matrix. The performance of the model is calculated by the number of correctly classified instances and incorrectly classified instances. The rows show the real numbers of the samples in the test set, and the columns represent the estimation of the model.

In Table 7, true positive (TP) and true negative (TN) show the number of correctly classified instances. False positive (FP) shows the number of instances predicted as 1 (positive) that actually belong to class 0 (negative). False negative (FN) shows the number of instances predicted as 0 (negative) that actually belong to class 1 (positive).
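
The four cells can be counted directly from paired lists of actual and predicted labels. The following is a minimal sketch for the two-class case; the function name and the 0/1 label coding are illustrative assumptions.

```python
def confusion_counts(actual, predicted, positive=1):
    """Return (TP, TN, FP, FN) for a binary labelling."""
    tp = tn = fp = fn = 0
    for a, p in zip(actual, predicted):
        if p == positive and a == positive:
            tp += 1
        elif p != positive and a != positive:
            tn += 1
        elif p == positive:
            fp += 1  # predicted positive, actually negative
        else:
            fn += 1  # predicted negative, actually positive
    return tp, tn, fp, fn
```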

Table 8 shows the confusion matrix for the RF algorithm. In the confusion matrix of 4 × 4 dimensions, the main diagonal shows the percentage of correctly predicted instances, and the matrix elements off the main diagonal show the percentage of incorrectly predicted instances.

Table 8 shows that 84.9% of those with an actual final grade greater than 77.5, 71.2% of those in the range 55–77.5, 65.4% of those in the range 32.5–55, and 60% of those with a grade less than 32.5 were predicted correctly. The confusion matrices of the other algorithms are shown in Tables 9, 10, 11, 12, and 13.
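
A multi-class confusion matrix expressed in percentages of each actual class, which is how the figures above read, can be computed as in the following sketch. The function name and the label values are hypothetical; only the row-wise normalisation follows the description in the text.

```python
def row_percent_confusion(actual, predicted, labels):
    """Confusion matrix with each row scaled to percentages of its actual class."""
    index = {label: i for i, label in enumerate(labels)}
    counts = [[0] * len(labels) for _ in labels]
    for a, p in zip(actual, predicted):
        counts[index[a]][index[p]] += 1
    matrix = []
    for row in counts:
        total = sum(row)
        matrix.append([100.0 * c / total if total else 0.0 for c in row])
    return matrix
```

Rows correspond to actual classes and columns to predicted classes, so each diagonal entry is the share of that class which was predicted correctly.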

Classification accuracy:  CA is the ratio of the correct predictions (TP + TN) to the total number of instances (TP + TN + FP + FN).

Precision: Precision is the ratio of the number of positive instances that are correctly classified to the total number of instances that are predicted positive. It takes a value in the range [0, 1].

Recall: Recall is the ratio of the number of correctly classified positive instances to the number of all instances whose actual class is positive. Recall is also called the true positive rate. It takes a value in the range [0, 1].

F-Criterion (F1):  There is an opposite relationship between precision and recall. Therefore, the harmonic mean of both criteria is calculated for more accurate and sensitive results. This is called the F-criterion.
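
The four definitions above translate directly into code. This is a sketch for the two-class case with a hypothetical function name; how the per-class values are averaged for the four grade categories of this study is not stated in the text and is not modelled here.

```python
def classification_metrics(tp: int, tn: int, fp: int, fn: int) -> dict:
    """CA, precision, recall and F1 from binary confusion-matrix counts."""
    total = tp + tn + fp + fn
    ca = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    # F1 is the harmonic mean of precision and recall.
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"CA": ca, "precision": precision, "recall": recall, "F1": f1}
```

As a worked case, counts of TP = 40, TN = 30, FP = 10, and FN = 20 give CA = 0.70, precision = 0.80, recall of about 0.67, and F1 of about 0.73.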

Receiver operating characteristics (ROC) curve

The AUC-ROC curve is used to evaluate the performance of a classification model. AUC-ROC is a widely used metric for evaluating machine learning algorithms, especially in cases where the datasets are unbalanced, and indicates how good the model is at predicting.

AUC: Area under the ROC curve

The larger the area covered, the better the machine learning algorithm is at distinguishing the given classes. The ideal AUC value is 1. The AUC, Classification Accuracy (CA), F-Criterion (F1), precision, and recall values of the models are shown in Table 14.
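
For two classes, the AUC equals the probability that a randomly chosen positive instance receives a higher score than a randomly chosen negative one. The sketch below computes it by direct pairwise comparison; the function name and the 0/1 label coding are assumptions, and the multi-class averaging applied in the study is not reproduced.

```python
def auc_from_scores(labels, scores) -> float:
    """AUC as the share of positive-negative pairs ranked correctly (ties count 0.5)."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))
```

A value of 0.5 corresponds to random ranking and a value of 1 to perfect separation of the two classes.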

The AUC values of the RF, NN, SVM, LR, NB, and kNN algorithms were 0.860, 0.863, 0.804, 0.826, 0.810, and 0.810, respectively. The classification accuracies of the RF, NN, SVM, LR, NB, and kNN algorithms were 0.746, 0.746, 0.735, 0.717, 0.713, and 0.699, respectively. According to these findings, for example, the RF algorithm achieved 74.6% accuracy; that is, 74.6% of the samples were classified correctly. In other words, the predicted classes agreed with the actual classes in roughly three out of four cases.

Discussion and conclusion

This study proposes a new model based on machine learning algorithms to predict the final exam grades of undergraduate students, taking their midterm exam grades as the source data. The performances of the random forests, neural networks, support vector machines, logistic regression, Naïve Bayes, and k-nearest neighbour algorithms were calculated and compared to predict the final exam grades of the students. This study focused on two parameters. The first was the prediction of academic performance based on previous achievement grades. The second was the comparison of the performance indicators of machine learning algorithms.

The results show that the proposed model achieved a classification accuracy of 70–75%. According to this result, it can be said that students' midterm exam grades are an important predictor of their final exam grades. RF, NN, SVM, LR, NB, and kNN are algorithms with a very high accuracy rate that can be used to predict students' final exam grades. Furthermore, the predictions were made using only three types of parameters: midterm exam grades, Department data, and Faculty data. The results of this study were compared with studies that predicted the academic achievement grades of students with various demographic and socio-economic variables. Hoffait and Schyns (2017) proposed a model that uses the academic achievement of students in previous years. With this model, they predicted whether students would be successful in the courses they would take in the new semester. They found that 12.2% of the students had a very high risk of failure, with a 90% confidence rate. Waheed et al. (2020) predicted the achievement of students with demographic and geographic characteristics. They found that these characteristics have a significant effect on students' academic performance, and predicted the failure or success of the students with 85% accuracy. Xu et al. (2019) found that internet usage data can distinguish and predict students' academic performance. Costa-Mendes et al. (2020) and Cruz-Jesus et al. (2020) predicted the academic achievement of students in the light of income, age, employment, cultural level indicators, place of residence, and socio-economic information. Similarly, Babić (2017) predicted students’ performance with an accuracy of 65% to 100% using artificial neural networks, classification trees, and support vector machines.

Another result of this study was that the RF, NN, and SVM algorithms had the highest classification accuracy, while kNN had the lowest. According to this result, it can be said that the RF, NN, and SVM algorithms give more accurate results in predicting the academic achievement grades of students. The results were compared with those of studies in which machine learning algorithms were employed to predict academic performance from various variables. For example, Hoffait and Schyns (2017) compared the performances of the LR, ANN, and RF algorithms in identifying students at high risk of academic failure from their various demographic characteristics. They ranked the algorithms from highest to lowest accuracy as LR, ANN, and RF. On the other hand, Waheed et al. (2020) found that the SVM algorithm performed better than the LR algorithm. According to Xu et al. (2019), the algorithm with the highest performance is SVM, followed by the NN algorithm, and the decision tree is the algorithm with the lowest performance.

The proposed model predicted the final exam grades of students with 73% accuracy. According to this result, it can be said that academic achievement can be predicted with this model in the future. By predicting students' future achievement grades, students can be given the chance to review their working methods and improve their performance. The importance of the proposed method can be better understood considering that there are approximately 2.5 months between the midterm exams and the final exams in higher education. Similarly, Bernacki et al. (2020) worked on an early warning model. They proposed a model to predict the academic achievements of students using their behavior data in the learning management system before the first exam. Their algorithm correctly identified 75% of students who failed to earn the grade of B or better needed to advance to the next course. Ahmad and Shahzadi (2018) predicted students at risk regarding academic performance with 85% accuracy by evaluating their study habits, learning skills, and academic interaction features. Cruz-Jesus et al. (2020) predicted students' end-of-semester grades with 16 independent variables. They concluded that students could be given the opportunity of early intervention.

As a result, students' academic performance was predicted using different predictors, different algorithms, and different approaches. The results confirm that machine learning algorithms can be used to predict students’ academic performance. More importantly, the prediction was made only with the parameters of midterm grade, faculty, and department. Teaching staff can benefit from the results of this research in the early recognition of students who have below- or above-average academic motivation. Later, for example, as Babić (2017) points out, they can match students with below-average academic motivation with students with above-average academic motivation and encourage them to work in groups or on projects. In this way, the students' motivation can be improved, and their active participation in learning can be ensured. In addition, such data-driven studies should assist higher education in establishing a learning analytics framework and contribute to decision-making processes.

Future research can be conducted by including other parameters as input variables and adding other machine learning algorithms to the modelling process. In addition, it is necessary to harness the effectiveness of DM methods to investigate students' learning behaviors, address their problems, optimize the educational environment, and enable data-driven decision making.

Availability of data and materials

The datasets used and/or analyzed during the current study are available from the corresponding author on reasonable request.

Abbreviations

Educational data mining

Random forests

Neural networks

Support vector machines

Logistic regression

Naïve Bayes

K-nearest neighbour

Decision trees

Artificial neural networks

Extremely randomized trees

Regression trees

Multilayer perceptron neural network

Feed-forward neural network

Adaptive resonance theory mapping

Learning management systems

Student information systems

Intelligent teaching systems

Classification accuracy

Area under roc curve

True positive

True negative

False positive

False negative

Receiver operating characteristics

Ahmad, Z., & Shahzadi, E. (2018). Prediction of students’ academic performance using artificial neural network. Bulletin of Education and Research, 40 (3), 157–164.

Alshanqiti, A., & Namoun, A. (2020). Predicting student performance and its influential factors using hybrid regression and multi-label classification. IEEE Access, 8 , 203827–203844. https://doi.org/10.1109/access.2020.3036572

Arias Ortiz, E., & Dehon, C. (2013). Roads to success in the Belgian French Community’s higher education system: predictors of dropout and degree completion at the Université Libre de Bruxelles. Research in Higher Education, 54 (6), 693–723. https://doi.org/10.1007/s11162-013-9290-y

Asif, R., Merceron, A., Ali, S. A., & Haider, N. G. (2017). Analyzing undergraduate students’ performance using educational data mining. Computers and Education, 113 , 177–194. https://doi.org/10.1016/j.compedu.2017.05.007

Aydemir, B. (2017). Predicting academic success of vocational high school students using data mining methods graduate . [Unpublished master’s thesis]. Pamukkale University Institute of Science.

Babić, I. D. (2017). Machine learning methods in predicting the student academic motivation. Croatian Operational Research Review, 8 (2), 443–461. https://doi.org/10.17535/crorr.2017.0028

Baker, R. S., & Inventado, P. S. (2014). Educational data mining and learning analytics. Learning analytics (pp. 61–75). Springer.

Baker, R. S., & Yacef, K. (2009). The state of educational data mining in 2009: A review and future visions. Journal of Educational Data Mining, 1 (1), 3–17.

Bernacki, M. L., Chavez, M. M., & Uesbeck, P. M. (2020). Predicting achievement and providing support before STEM majors begin to fail. Computers & Education, 158 (August), 103999. https://doi.org/10.1016/j.compedu.2020.103999

Burgos, C., Campanario, M. L., De, D., Lara, J. A., Lizcano, D., & Martínez, M. A. (2018). Data mining for modeling students’ performance: A tutoring action plan to prevent academic dropout. Computers and Electrical Engineering, 66 (2018), 541–556. https://doi.org/10.1016/j.compeleceng.2017.03.005

Capuano, N., & Toti, D. (2019). Experimentation of a smart learning system for law based on knowledge discovery and cognitive computing. Computers in Human Behavior, 92 , 459–467. https://doi.org/10.1016/j.chb.2018.03.034

Casquero, O., Ovelar, R., Romo, J., Benito, M., & Alberdi, M. (2016). Students’ personal networks in virtual and personal learning environments: A case study in higher education using learning analytics approach. Interactive Learning Environments, 24 (1), 49–67. https://doi.org/10.1080/10494820.2013.817441

Chakraborty, B., Chakma, K., & Mukherjee, A. (2016). A density-based clustering algorithm and experiments on student dataset with noises using Rough set theory. In Proceedings of 2nd IEEE international conference on engineering and technology, ICETECH 2016 , March (pp. 431–436). https://doi.org/10.1109/ICETECH.2016.7569290

Costa-Mendes, R., Oliveira, T., Castelli, M., & Cruz-Jesus, F. (2020). A machine learning approximation of the 2015 Portuguese high school student grades: A hybrid approach. Education and Information Technologies, 26 , 1527–1547. https://doi.org/10.1007/s10639-020-10316-y

Cruz-Jesus, F., Castelli, M., Oliveira, T., Mendes, R., Nunes, C., Sa-Velho, M., & Rosa-Louro, A. (2020). Using artificial intelligence methods to assess academic achievement in public high schools of a European Union country. Heliyon . https://doi.org/10.1016/j.heliyon.2020.e04081

Delen, D. (2010). A comparative analysis of machine learning techniques for student retention management. Decision Support Systems, 49 (4), 498–506. https://doi.org/10.1016/j.dss.2010.06.003

Delen, D. (2011). Predicting student attrition with data mining methods. Journal of College Student Retention: Research, Theory and Practice, 13 (1), 17–35. https://doi.org/10.2190/CS.13.1.b

Fernandes, E., Holanda, M., Victorino, M., Borges, V., Carvalho, R., & Van Erven, G. (2019). Educational data mining: Predictive analysis of academic performance of public school students in the capital of Brazil. Journal of Business Research, 94 (February 2018), 335–343. https://doi.org/10.1016/j.jbusres.2018.02.012

Fidalgo-Blanco, Á., Sein-Echaluce, M. L., García-Peñalvo, F. J., & Conde, M. Á. (2015). Using Learning Analytics to improve teamwork assessment. Computers in Human Behavior, 47 , 149–156. https://doi.org/10.1016/j.chb.2014.11.050

García-González, J. D., & Skrita, A. (2019). Predicting academic performance based on students’ family environment: Evidence for Colombia using classification trees. Psychology, Society and Education, 11 (3), 299–311. https://doi.org/10.25115/psye.v11i3.2056

Gök, M. (2017). Predicting academic achievement with machine learning methods. Gazi University Journal of Science Part c: Design and Technology, 5 (3), 139–148.

Hardman, J., Paucar-Caceres, A., & Fielding, A. (2013). Predicting students’ progression in higher education by using the random forest algorithm. Systems Research and Behavioral Science, 30 (2), 194–203. https://doi.org/10.1002/sres.2130

Hellas, A., Ihantola, P., Petersen, A., Ajanovski, V.V., Gutica, M., Hynninen, T., Knutas, A., Leinonen, J., Messom, C., & Liao, S.N. (2018). Predicting academic performance: a systematic literature review. In Proceedings companion of the 23rd annual ACM conference on innovation and technology in computer science education (pp. 175–199).

Hoffait, A., & Schyns, M. (2017). Early detection of university students with potential difficulties. Decision Support Systems, 101 (2017), 1–11. https://doi.org/10.1016/j.dss.2017.05.003

Huang, S., & Fang, N. (2013). Predicting student academic performance in an engineering dynamics course: A comparison of four types of predictive mathematical models. Computers and Education, 61 (1), 133–145. https://doi.org/10.1016/j.compedu.2012.08.015

Kardaş, K., & Güvenir, A. (2020). Analysis of the effects of Quizzes, homeworks and projects on final exam with different machine learning techniques. EMO Journal of Scientific, 10 (1), 22–29.

Kaur, P., Singh, M., & Josan, G. S. (2015). Classification and prediction based data mining algorithms to predict slow learners in education sector. Procedia Computer Science, 57 , 500–508. https://doi.org/10.1016/j.procs.2015.07.372

Kılınç, Ç. (2015). Examining the effects on university student success by data mining techniques. [Unpublished master’s thesis]. Eskişehir Osmangazi University Institute of Science.

Kotsiantis, S., Tselios, N., Filippidi, A., & Komis, V. (2013). Using learning analytics to identify successful learners in a blended learning course. International Journal of Technology Enhanced Learning, 5 (2), 133–150. https://doi.org/10.1504/IJTEL.2013.059088

Lara, J. A., Lizcano, D., Martínez, M. A., Pazos, J., & Riera, T. (2014). A system for knowledge discovery in e-learning environments within the European Higher Education Area—Application to student data from Open University of Madrid, UDIMA. Computers and Education, 72 , 23–36. https://doi.org/10.1016/j.compedu.2013.10.009

Long, P., & Siemens, G. (2011). Penetrating the fog: Analytics in learning and education. Educause Review, 46 (5), 31–40.

Macfadyen, L. P., & Dawson, S. (2010). Mining LMS data to develop an “early warning system” for educators: A proof of concept. Computers & Education, 54 (2), 588–599. https://doi.org/10.1016/j.compedu.2009.09.008

Musso, M. F., Hernández, C. F. R., & Cascallar, E. C. (2020). Predicting key educational outcomes in academic trajectories: A machine-learning approach. Higher Education, 80 (5), 875–894. https://doi.org/10.1007/s10734-020-00520-7

Nandeshwar, A., Menzies, T., & Nelson, A. (2011). Learning patterns of university student retention. Expert Systems with Applications, 38 (12), 14984–14996. https://doi.org/10.1016/j.eswa.2011.05.048

Ornelas, F., & Ordonez, C. (2017). Predicting student success: A naïve bayesian application to community college data. Technology, Knowledge and Learning, 22 (3), 299–315. https://doi.org/10.1007/s10758-017-9334-z

Ortiz, E. A., & Dehon, C. (2008). What are the factors of success at University? A case study in Belgium. Cesifo Economic Studies, 54 (2), 121–148. https://doi.org/10.1093/cesifo/ifn012

Rebai, S., Ben Yahia, F., & Essid, H. (2020). A graphically based machine learning approach to predict secondary schools performance in Tunisia. Socio-Economic Planning Sciences, 70 (August 2018), 100724. https://doi.org/10.1016/j.seps.2019.06.009

Rizvi, S., Rienties, B., & Ahmed, S. (2019). The role of demographics in online learning; A decision tree based approach. Computers & Education, 137 (August 2018), 32–47. https://doi.org/10.1016/j.compedu.2019.04.001

Rubin, B., Fernandes, R., Avgerinou, M. D., & Moore, J. (2010). The effect of learning management systems on student and faculty outcomes. The Internet and Higher Education, 13 (1–2), 82–83. https://doi.org/10.1016/j.iheduc.2009.10.008

Saqr, M., Fors, U., & Tedre, M. (2017). How learning analytics can early predict under-achieving students in a blended medical education course. Medical Teacher, 39 (7), 757–767. https://doi.org/10.1080/0142159X.2017.1309376

Shorfuzzaman, M., Hossain, M. S., Nazir, A., Muhammad, G., & Alamri, A. (2019). Harnessing the power of big data analytics in the cloud to support learning analytics in mobile learning environment. Computers in Human Behavior, 92 (February 2017), 578–588. https://doi.org/10.1016/j.chb.2018.07.002

Vandamme, J.-P., Meskens, N., & Superby, J.-F. (2007). Predicting academic performance by data mining methods. Education Economics, 15 (4), 405–419. https://doi.org/10.1080/09645290701409939

Viberg, O., Hatakka, M., Bälter, O., & Mavroudi, A. (2018). The current landscape of learning analytics in higher education. Computers in Human Behavior, 89 (July), 98–110. https://doi.org/10.1016/j.chb.2018.07.027

Waheed, H., Hassan, S. U., Aljohani, N. R., Hardman, J., Alelyani, S., & Nawaz, R. (2020). Predicting academic performance of students from VLE big data using deep learning models. Computers in Human Behavior, 104 (October 2019), 106189. https://doi.org/10.1016/j.chb.2019.106189

Witten, I. H., Frank, E., & Hall, M. A. (2011). Data mining practical machine learning tools and techniques (3rd ed.). Morgan Kaufmann.

Xing, W., Guo, R., Petakovic, E., & Goggins, S. (2015). Participation-based student final performance prediction model through interpretable Genetic Programming: Integrating learning analytics, educational data mining and theory. Computers in Human Behavior, 47 , 168–181.

Xu, X., Wang, J., Peng, H., & Wu, R. (2019). Prediction of academic performance associated with internet usage behaviors using machine learning algorithms. Computers in Human Behavior, 98 (January), 166–173. https://doi.org/10.1016/j.chb.2019.04.015

Zabriskie, C., Yang, J., DeVore, S., & Stewart, J. (2019). Using machine learning to predict physics course outcomes. Physical Review Physics Education Research, 15 (2), 020120. https://doi.org/10.1103/PhysRevPhysEducRes.15.020120

Download references

Acknowledgements

Not applicable.

Author information

Authors and affiliations.

Kırşehir Ahi Evran University, Faculty of Engineering and Architecture, 40100, Kırşehir, Turkey

Mustafa Yağcı

Contributions

All authors read and approved the final manuscript.

Corresponding author

Correspondence to Mustafa Yağcı.

Ethics declarations

Competing interests.

The authors declare that they have no competing interests.

Additional information

Publisher's note.

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary Information

Additional file 1:

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

About this article

Cite this article.

Yağcı, M. Educational data mining: prediction of students' academic performance using machine learning algorithms. Smart Learn. Environ. 9, 11 (2022). https://doi.org/10.1186/s40561-022-00192-z

Received: 15 November 2021

Accepted: 15 February 2022

Published: 03 March 2022

DOI: https://doi.org/10.1186/s40561-022-00192-z

  • Machine learning
  • Predicting achievement
  • Learning analytics
  • Early warning systems

ORIGINAL RESEARCH article

This article is part of the research topic.

Education for the Future: Learning and Teaching for Sustainable Development in Education

Blending Pedagogy: Equipping Student Teachers to Foster Transversal Competencies in Future-oriented Education

Provisionally Accepted

  • 1 Department of Education, Faculty of Educational Sciences, University of Helsinki, Finland

The final, formatted version of the article will be published soon.

Blended teaching and learning, combining online and face-to-face instruction, and shared reflection are gaining in popularity worldwide and present evolving challenges in the field of teacher training and education. There is also a growing need to focus on transversal competencies such as critical thinking and collaboration. This study is positioned at the intersection of blended education and transversal competencies in the context of a blended ECEC teacher-training program (1000+) at the University of Helsinki. Blended education is a novel approach to training teachers, and there is a desire to explore how such an approach supports the acquisition of transversal competencies and whether the associated methods offer something essential for the development of teacher training. The aim is to explore what transversal competencies this teacher-training program supports for future teachers, and how students reflect on their learning experiences. The data consist of documents from teacher-education curricula and essays from the students on the 1000+ program. They were content-analyzed from a scoping perspective. Students' experiences of studying enhanced the achievement of generic goals in teacher education, such as to develop critical and reflective thinking, interaction competence, collaboration skills, and independent and collective expertise. We highlight the importance of teacher development in preparing for education in the future during the teacher training. Emphasizing professional development, we challenge the conventional teaching paradigm by introducing a holistic approach.

Keywords: blended teacher training, transversal competencies, future of education, teacher education, early childhood education

Received: 19 Jan 2024; Accepted: 15 May 2024.

Copyright: © 2024 Niemi, Kangas and Köngäs. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

* Correspondence: Dr. Laura H. Niemi, Department of Education, Faculty of Educational Sciences, University of Helsinki, Helsinki, 00014, Uusimaa, Finland

medRxiv

Coming out of the ashes we rise: Experiences of culturally and linguistically diverse international nursing students at two Australian universities during the Covid-19 pandemic

  • ORCID record for Eric Lim
  • ORCID record for Linda Ng
  • ORCID record for Huaqiong Zhou
  • ORCID record for Ambili Nair
  • ORCID record for Fatch Kalembo

Background and aim: Research on international students conducted during the COVID-19 pandemic has persistently highlighted the vulnerabilities and challenges that they experienced when staying in the host country to continue with their studies. The findings from such research can inevitably create a negative image of international students and their ability to respond to challenges during unprecedented times. Therefore, this paper took a different stance and reported on a qualitative study that explored culturally and linguistically diverse (CaLD) international nursing students who overcame the challenges brought about by the pandemic to continue with their studies in Australia.

Method: A descriptive qualitative research design guided by the processes of constructivist grounded theory was selected to ascertain insights from participants' experiences of studying abroad in Australia during the COVID-19 pandemic.

Results: Three themes emerged from the collected data that described the participants' lived experiences, and they were: 1) Viewing international education as the pursuit of a better life, 2) Focusing on personal growth, and 3) Coming out of the ashes we rise.

Discussion: The findings highlight the importance of recognising the investments and sacrifices that CaLD international students and their families make in pursuit of international tertiary education. The findings also underscore the importance of acknowledging the qualities that CaLD international students have to achieve self-growth and ultimately self-efficacy as they stay in the host country during a pandemic.

Conclusion: Future research should focus on identifying strategies that are useful for CaLD international nursing students to experience personal growth and ultimately self-efficacy and continue with their studies in the host country during times of uncertainty such as a pandemic.

Competing Interest Statement

The authors have declared no competing interest.

Funding Statement

This study did not receive any funding.

Author Declarations

I confirm all relevant ethical guidelines have been followed, and any necessary IRB and/or ethics committee approvals have been obtained.

The details of the IRB/oversight body that provided approval or exemption for the research described are given below:

Ethical approval was obtained from Curtin University Human Research Ethics Office (HRE2022-0238) and The University of Southern Queensland Ethical Review Committee (H22REA114).

I confirm that all necessary patient/participant consent has been obtained and the appropriate institutional forms have been archived, and that any patient/participant/sample identifiers included were not known to anyone (e.g., hospital staff, patients or participants themselves) outside the research group so cannot be used to identify individuals.

I understand that all clinical trials and any other prospective interventional studies must be registered with an ICMJE-approved registry, such as ClinicalTrials.gov. I confirm that any such study reported in the manuscript has been registered and the trial registration ID is provided (note: if posting a prospective study registered retrospectively, please provide a statement in the trial ID field explaining why the study was not registered in advance).

I have followed all appropriate research reporting guidelines, such as any relevant EQUATOR Network research reporting checklist(s) and other pertinent material, if applicable.

Data Availability

All data produced in the present study are available upon reasonable request to the authors.

COMMENTS

  1. Mining Big Data in Education: Affordances and Challenges

    A broad range of data mining techniques can be utilized for big data in education, which Baker and Siemens (2014) broadly categorize into prediction methods, including inferential methods that model knowledge as it changes; structure discovery algorithms, with emphasis on discovering the structures of content and skills in an educational domain and the structures of social networks of learners ...

  2. Educational Data mining and Learning Analytics: An updated survey

    Educational Data Science (EDS) is defined as the use of data gathered from educational environments/settings for solving educational problems (Romero & Ventura, 2017). Data science is a concept to unify statistics, data analysis, machine learning and their related methods. This survey is an updated and improved version of the previous one ...

  3. PDF A Review of Data Mining in Personalized Education: Current Trends and

    data mining papers from related top-tier journals and conferences, among which 149 address these areas (88.7%), which highlights their prominence and frequent exploration in personalized educational data mining research. Specifically, educational recommendations (ER) are crucial for analyzing learners' preferences and customizing

  4. A Review of Data Mining in Personalized Education: Current Trends and

    To offer a comprehensive review of recent advancements in personalized educational data mining, this paper focuses on four primary scenarios: educational recommendation, cognitive diagnosis, knowledge tracing, and learning analysis. This paper presents a structured taxonomy for each area, compiles commonly used datasets, and identifies future ...

  5. PDF Overview of Data Mining's Potential Benefits and Limitations in ...

    methodology for education research. Key conceptual papers cited by these articles, some outside education, were also examined when appropriate. After relevant articles were identified, each was read carefully to understand its main claims about the utility of data mining in education research and justifications for each claim. In the initial ...

  6. Educational data mining and learning analytics: An updated survey

    In the last decade, this research area has evolved enormously and a wide range of related terms are now used in the bibliography such as Academic Analytics, Institutional Analytics, Teaching Analytics, Data-Driven Education, Data-Driven Decision-Making in Education, Big Data in Education, and Educational Data Science. This paper provides the ...

  7. [PDF] Data-Mining Research in Education

    Applying data mining in education, also known as educational data mining (EDM), enables a better understanding of how students learn and how to improve educational outcomes. The present paper is designed to justify the capabilities of data mining approaches in the field of education. The latest trends on EDM research are introduced in this ...

  8. A Systematic Review on Data Mining for Mathematics and Science Education

    Educational data mining is used to discover significant phenomena and resolve educational issues occurring in the context of teaching and learning. This study provides a systematic literature review of educational data mining in mathematics and science education. A total of 64 articles were reviewed in terms of the research topics and data mining techniques used. This review revealed that data ...

  9. Educational Data Mining: A Systematic Review on the Applications of

    Educational Data Mining (EDM) is a research field that focuses on extracting valuable insights and knowledge from data in the education sector. EDM exploits Data Mining (DM) techniques such as Clustering, Regression to analyze data and make predictions that help answer questions in education. Meanwhile, Deep Learning (DL) is a subfield of machine learning that uses neural networks to solve ...

  10. Educational data mining: a systematic review of research and emerging

    This study focused on systematically reviewing 1,219 EDM studies that were searched from five digital databases based on a strict search procedure. Although 33 reviews were attempted to synthesize research literature, several research gaps were identified. A comprehensive and systematic review report is needed to show us: what research trends ...

  11. Data mining in education

    Applying data mining (DM) in education is an emerging interdisciplinary research field also known as educational data mining (EDM). It is concerned with developing methods for exploring the unique types of data that come from educational environments.

  12. Educational data mining to predict students' academic ...

    Educational data mining is an emerging interdisciplinary research area involving both education and informatics. It has become an imperative research area due to many advantages that educational institutions can achieve. Along these lines, various data mining techniques have been used to improve learning outcomes by exploring large-scale data that come from educational settings. One of the ...

  13. Predicting Student Performance Using Data Mining and Learning ...

    Feature papers represent the most advanced research with significant potential for high impact in the field. ... Singh, I. An Automated Survey Designing Tool for Indirect Assessment in Outcome Based Education Using Data Mining. In Proceedings of the 2017 5th IEEE International Conference on MOOCs, Innovation and Technology in Education (MITE ...

  14. Data Mining in Education: A Review of Current Practices

    A significant amount of data is physically kept on hard drives or virtually stored in the cloud in the real world. Data is retained for a variety of purposes, such as learning, accessing, understanding, and so forth. Large amounts of data must be stored using an excellent infrastructure, which is quite expensive. Data mining tools were made available to help with this issue. Numerous ...

  15. (PDF) Data Mining in Education

    Educational data mining (EDM) is a method for extracting useful information that could potentially affect an organization. The increase of technology use in educational systems has led to the ...

  16. Journal of Educational Data Mining

    About the Journal. The Journal of Educational Data Mining (JEDM; ISSN: 2157-2100; see indexing) is published by the International Educational Data Mining Society (IEDMS). It is an international and interdisciplinary forum of research on computational approaches for analyzing electronic repositories of student data to answer educational questions.

  17. A bibliometric analysis of Educational Data Mining studies in global

    Educational Data Mining (EDM) is an interdisciplinary field that encapsulates different fields such as computer science, education, and statistics. It is crucial to make data mining in education to shape future trends in education for policymakers, researchers, and educators in terms of developments. To have an all-inclusive understanding of EDM studies, a comprehensive examination of both the ...

  18. Educational data mining: prediction of students' academic performance

    Educational data mining has become an effective tool for exploring the hidden relationships in educational data and predicting students' academic achievements. This study proposes a new model based on machine learning algorithms to predict the final exam grades of undergraduate students, taking their midterm exam grades as the source data. The performances of the random forests, nearest ...

  19. Education Data Science: Past, Present, Future

    What implications did this rise of data science as a transdisciplinary methodological toolkit have for the field of education? One means of illustrating the salience of data science in education research is to study its emergence in the Education Resources Information Center's (ERIC) publication corpus. In the corpus, the growth of data science in education can be identified by the adoption ...

  20. Artificial Neural Networks for Educational Data Mining in Higher

    Educational data mining (EDM) is the analysis of huge sets of learner-related (Barneveld, Arnold, and Campbell 2012; Siemens et al. 2011) with the aid of methods like KDD, business intelligence, educational data mining, social network analysis, operational research, machine learning, and information visualization with the aim ...

  21. (PDF) Data-Mining Research in Education

    Educational data mining (EDM) is an emerging discipline including but not limited to information retrieval, recommender systems, visual data analytics, social network. Data-Mining Research in ...

  22. Sentiment analysis and opinion mining on educational data: A survey

    Educational data mining assists educational institutions in measuring the teaching and learning process and improving their student recruitment and retention policies. Hussain et al. (2022) proposed a decision support system based on a multi-layered Aspect2Labels (A2L) approach. It is a three-layered topic modelling approach, the first layer ...

  23. [PDF] Online Education Big Data Management and Mining Based on

    This paper presents a systematic framework for online education big data management and mining, encompassing data collection, storage, preprocessing, analysis, and visualization, and discusses the role of intelligent technologies such as artificial intelligence, natural language processing, and predictive analytics in optimizing the process of data mining and knowledge discovery.

  24. DATA MINING IN EDUCATION FOR STUDENTS ACADEMIC ...

    In this paper we analyzed the potential use of data mining in education section and survey the most relevant work in this area. Data Mining can be used for dropout students, student's ...

  25. Frontiers

    The data consist of documents from teacher-education curricula and essays from the students on the 1000+ program. They were content-analyzed from a scoping perspective. Students' experiences of studying enhanced the achievement of generic goals in teacher education, such as to develop critical and reflective thinking, interaction competence ...

  26. Information Systems IE&IS

    In order to do that, the IS group helps organizations to: (i) understand the business needs and value propositions and accordingly design the required business and information system architecture; (ii) design, implement, and improve the operational processes and supporting (information) systems that address the business need, and (iii) use advanced data analytics methods and techniques to ...

  27. Binary Matrix Factorization and Completion via Integer Programming

    Binary matrix factorization is an essential tool for identifying discrete patterns in binary data. In this paper, we consider the rank-k binary matrix factorization problem (k-BMF) under Boolean arithmetic: we are given an n × m binary matrix X with ...

  28. Coming out of the ashes we rise: Experiences of culturally and

    Background and aim: Research on international students conducted during the COVID-19 pandemic has persistently highlighted the vulnerabilities and challenges that they experienced when staying in the host country to continue with their studies. The findings from such research can inevitably create a negative image of international students and their ability to respond to challenges during ...

  29. Healthcare

    Advances in anti-retroviral therapy (ART) have decreased mortality rates and subsequently led to a rise in the number of HIV-positive people living longer. The housing experiences of this new population of interest—older adults (50 years and older) living with HIV—are under-researched. Understanding the housing experiences and unmet needs of older people with HIV can better provide ...

  30. Applied Sciences

    To address the issue of data integrity and reliability caused by sparse vessel trajectory data, this paper proposes a multi-step restoration method for sparse vessel trajectory based on feature correlation. First, we preserved the overall trend of the trajectory by detecting and marking the sparse and abnormal vessel trajectories points and using the cubic spline interpolation method for ...