Saudi Dialect Sentence Variation in the MADAR Dataset as a Resource for Deep Learning

Authors

  • Fatima El Zahraa, Universitas Islam Negeri Syarif Hidayatullah Jakarta

DOI:

https://doi.org/10.64277/semnas.v2i1.231

Keywords:

MADAR Dataset, Deep Learning, Arabic Dialect Variation

Abstract

Learning resources that capture the diversity of authentic spoken texts in corpus form are scarce, and this scarcity is a challenge for data-driven learning. This study analyses sentence variation between the Jeddah and Riyadh dialects using the MADAR dataset, which contains 2,000 entries for each dialect. A quantitative approach based on descriptive and inferential statistics is used to examine sentence length and lexical formality. The results show that the Riyadh dialect has slightly longer sentences and a higher level of formality than the Jeddah dialect. The correlation between sentence length and lexical formality is weak but statistically significant, and the pattern is more pronounced in the Riyadh dialect. The study relates these findings to deep learning understood as a pedagogical approach, the opposite of surface-level processing, and not in the sense of machine learning or artificial intelligence. Such a corpus can foster analytical and interpretive skills in learning that is mindful, meaningful, joyful, and dignifying. The analysis highlights koineization in Saudi Arabia and supports the view that higher formality tends to be associated with longer sentences and lower context dependence. The study shows the potential of integrating Arabic dialect variation into data-driven learning and of diversifying learning resources.

Published

2025-06-29

How to Cite

Variasi Kalimat Dialek Saudi dalam Dataset MADAR sebagai Sumber Pembelajaran Mendalam. (2025). Prosiding Seminar Nasional Fakultas Ilmu Tarbiyah Dan Keguruan UIN Syarif Hidayatullah Jakarta, 2(1), 18-36. https://doi.org/10.64277/semnas.v2i1.231