The Role of the Lecturer as Corrector in the Human-in-the-Loop (HITL) Model for Improving the Accuracy of Artificial Intelligence-Based Learning Evaluation

Authors

Keywords:

AI-based learning evaluation, Human-in-the-Loop (HITL), Artificial Intelligence, lecturer roles

Abstract

Artificial Intelligence (AI)-based learning evaluation is efficient but lacks nuance and risks bias, and how lecturer judgments should be integrated into such systems remains unclear. This study systematically reviews Human-in-the-Loop (HITL) models to map lecturer roles and measure the impact of their interventions. A literature review of the Google Scholar, IEEE, ACM Digital Library, Scopus, and ERIC databases (2016–2025) was conducted with empirical inclusion criteria; 15 studies were analyzed. The results show that while AI improves evaluation efficiency, three lecturer roles (initiator, supervisor, and facilitator) generally do not directly improve model accuracy. Conversely, the corrector role, which feeds lecturer feedback back into model retraining, has the greatest potential for accuracy improvement, although empirical evidence remains limited. A shift is therefore needed from a loosely defined "Human-in-the-Loop" toward a structured, Intelligence Augmentation-based feedback mechanism that enables lecturers to contribute to the continuous improvement of AI models.
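The corrector mechanism described in the abstract, in which lecturer feedback drives model retraining, can be sketched in a few lines. This is a minimal illustration, not the reviewed systems' actual implementation: all names (`Correction`, `hitl_round`, the additive bias term standing in for full retraining) are hypothetical.

```python
# Minimal sketch of the "corrector" role in a HITL evaluation loop.
# A simple additive calibration term stands in for real model retraining;
# all identifiers here are illustrative, not taken from the reviewed studies.

from dataclasses import dataclass
from statistics import mean


@dataclass
class Correction:
    essay_id: str
    ai_score: float        # score proposed by the AI grader
    lecturer_score: float  # corrected score entered by the lecturer


def hitl_round(bias: float, corrections: list[Correction]) -> float:
    """Update a calibration bias from one round of lecturer corrections.

    The mean signed error between lecturer and AI scores nudges the
    model's calibration toward the human judgments.
    """
    if not corrections:
        return bias
    mean_error = mean(c.lecturer_score - c.ai_score for c in corrections)
    return bias + mean_error


# One feedback round: the AI consistently under-scores by one point,
# so the lecturer corrections shift the calibration upward.
feedback = [
    Correction("e1", ai_score=70, lecturer_score=71),
    Correction("e2", ai_score=65, lecturer_score=66),
]
new_bias = hitl_round(0.0, feedback)  # → 1.0
```

In a real system the update step would retrain or fine-tune the scoring model on the corrected examples rather than adjust a single bias term, but the loop structure (score, correct, feed back, improve) is the same.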


References

Amann, J., Blasimme, A., Vayena, E., Frey, D., & Madai, V. I. (2020). Explainability for artificial intelligence in healthcare: A multidisciplinary perspective. BMC Medical Informatics and Decision Making, 20(1), 1–9. https://doi.org/10.1186/s12911-020-01332-6

Bai, J. Y. H., Zawacki-Richter, O., Bozkurt, A., Lee, K., Fanguy, M., Cefa Sari, B., & Marín, V. I. (2022). Automated Essay Scoring (AES) systems: Opportunities and challenges for open and distance education. Tenth Pan-Commonwealth Forum on Open Learning (PCF10), 1–10. https://doi.org/10.56059/pcf10.8339

Colonna, L. (2024). Teachers in the loop? An analysis of automatic assessment systems under Article 22 GDPR. International Data Privacy Law, 14(1), 3–18. https://doi.org/10.1093/idpl/ipad024

Correnti, R., Matsumura, L. C., Wang, E. L., Litman, D., & Zhang, H. (2022). Building a validity argument for an automated writing evaluation system (eRevise) as a formative assessment. Computers and Education Open, 3, 1–15. https://doi.org/10.1016/j.caeo.2022.100084

Dede, C., Etemadi, A., & Forshaw, T. (2021). Intelligence augmentation: Upskilling humans to complement AI (The Next Level Lab at the Harvard Graduate School of Education).

Dikli, S. (2006). An overview of automated scoring of essays. Journal of Technology, Learning, and Assessment, 5(1), 1–35.

Jiang, Z., Liu, M., Yin, Y., Yu, H., Cheng, Z., & Gu, Q. (2021). Learning from graph propagation via ordinal distillation for one-shot automated essay scoring. Proceedings of the Web Conference 2021 (WWW '21), 2347–2356. https://doi.org/10.1145/3442381.3450017

Kumar, V., & Boulanger, D. (2020). Explainable automated essay scoring: Deep learning really has pedagogical value. Frontiers in Education, 5, 1–22. https://doi.org/10.3389/feduc.2020.572367

Liang, L. (2025). Bridging Human Intelligence Augmentation (IA) and classroom practices via GenAI in learning engineering: A response to Dr. Dede's keynote speeches on IA 2022–2025. University of Sydney.

Litman, D., Zhang, H., Correnti, R., Matsumura, L. C., & Wang, E. (2021). A fairness evaluation of automated methods for scoring text evidence usage in writing. In International Conference on Artificial Intelligence in Education (AIED). Springer International Publishing. https://doi.org/10.1007/978-3-030-78292-4_21

Liu, R., & Koedinger, K. R. (2017). Closing the loop: Automated data-driven cognitive model discoveries lead to improved instruction and learning gains. Journal of Educational Data Mining, 9(1), 25–41.

Liu, Z., Guo, Y., & Mahmud, J. (2021). When and why does a model fail? A human-in-the-loop error detection framework for sentiment analysis. Proceedings of NAACL-HLT 2021: Industry Papers, 170–177. https://doi.org/10.18653/v1/2021.naacl-industry.22

Maadi, M., Khorshidi, H. A., & Aickelin, U. (2021). A review on human–AI interaction in machine learning and insights for medical applications. International Journal of Environmental Research and Public Health, 18(4), 1–27. https://doi.org/10.3390/ijerph18042121

Matsumura, L. C., Wang, E. L., Correnti, R., & Litman, D. (2022). Designing automated writing evaluation systems for ambitious instruction and classroom integration. In F. Ouyang, P. Jiao, & B. M. McLaren (Eds.), Artificial Intelligence in STEM Education: The Paradigmatic Shifts in Research, Education, and Technology (1st ed., pp. 195–208). CRC Press. https://doi.org/10.1201/9781003181187

Ribeiro, M. T., Singh, S., & Guestrin, C. (2016). "Why should I trust you?" Explaining the predictions of any classifier. Proceedings of the NAACL-HLT 2016 Demonstrations Session, 97–101. https://doi.org/10.18653/v1/n16-3020

Published

2025-08-25

How to Cite

Peran Dosen Sebagai Korektor dalam Model Human-in-the-Loop (HITL) untuk Meningkatkan Akurasi Evaluasi Pembelajaran Berbasis Artificial Intelligence. (2025). Advances in Education Journal, 2(1), 429–436. https://journal.al-afif.org/index.php/aej/article/view/208
