Kazakh Language is Gaining Increasing Popularity, But Needs Greater Support to Sustain Interest, Says Expert

By Aibarshyn Akhmetkali in Society on 17 August 2022

NUR-SULTAN – People in Kazakhstan are becoming increasingly interested in learning the Kazakh language in the aftermath of recent political events in Kazakhstan and in the world, says Kanat Tasibekov, a Kazakh language advocate and the author of the “Situational Kazakh” series of self-tutorials, in an interview with the Kazakhstanskaya Pravda newspaper.

Kanat Tasibekov. Photo credit: pricom.kz

After the fall of the Soviet Union, Kazakhstan had a large share of Russian speakers consisting of ethnic Russians and Kazakhs raised in a predominantly Russian-speaking environment who did not speak or spoke very little Kazakh.

In accordance with the Constitution, the Kazakh language has the status of the state language, while the Russian language has the status of an official language.

According to Tasibekov, language is much more than a tool for communication; it is one of the fundamentals that foster a deep connection among people. “When you talk to a person in his native language, you talk to his heart…and we have more and more people who speak only the state language. When one or two words are spoken in Kazakh, it is perceived very positively. Emotional contact and trustful relations are established,” said Tasibekov.

The expert identified several groups of people who show increasing interest in improving their ability to speak in their mother tongue.

One of these groups is people who grew up in the Soviet Union, where their educational options were predominantly in Russian. They range in age from 40 to 70. They say that not knowing Kazakh did not inconvenience them in the past, but because they consciously link their future and the future of their families with Kazakhstan, it has become important for them to speak the state language.

Young managers and entrepreneurs also increasingly prefer to speak Kazakh, but they are driven by rather pragmatic goals: to run their businesses more easily, they need to talk to people in their native language.

Gulnara Zhuaspayeva, a lawyer who has run her business for over 15 years, said that the need to improve her ability to speak in the native language became more acute as she moved to Almaty and was approached by a predominantly Kazakh-speaking group.

Gulnara Zhuaspayeva. Photo credit: Zhuaspayeva’s Facebook page

During the pandemic, she decided to join the Mamile (mutual understanding) Kazakh language speaking club in the National Library of Kazakhstan in Nur-Sultan.

“After the quarantine was over, I continued my studies in the club. I began to meet more people like me who were products of Soviet education when native languages were in the status of an elective language. There are a lot of people like me and we meet every Saturday at the club to talk and help each other,” she said.

Speaking of her progress, Zhuaspayeva said that she speaks the language “fluently enough to give interviews in Kazakh and speak at trials.”

“But for now, I write documents in Russian and send them to a translator,” she added.

Demographics is also an important factor. The preliminary results of the 2021 census show that ethnic Kazakhs constitute more than 70 percent of the total population and the birth rate is traditionally higher in Kazakh-speaking families, according to Tasibekov.

He also said that the return of the kandas, a term used to describe ethnic Kazakhs returning from abroad, has contributed hugely to the growth of the Kazakh-speaking population. Since independence, over one million ethnic Kazakhs have returned to their historical homeland.

Unless a special effort is made, the interest in the language could slow down again, said Tasibekov.

According to him, a similar interest in the Kazakh language was observed after Kazakhstan gained independence, but the shortage of textbooks and teachers made people less enthusiastic about learning the language.

“Now there is a rise of interest again and our task is to sustain it and support those who are engaged in the popularization of the language. For example, we could help to produce video lessons. To do it professionally, you need substantial help from the state,” said Tasibekov.

  • © 2010-2024 The Astana Times

Commisceo Global Consulting Ltd.

Kazakhstan - Culture, Etiquette and Business Practices

What will you learn in this guide?

You will gain an understanding of a number of key areas including:

  • Religion and beliefs
  • Culture and society
  • Social etiquette and customs
  • Business culture and etiquette

You can't understand Kazakhs without understanding their connection to horses. Photo taken in Kolsai Lake by Rosie Blamey on Unsplash

Facts and Statistics

Location: Central Asia, northwest of China ; a small portion west of the Ural River in eastern-most Europe

Capital: Nur Sultan. The name of the capital city was changed in 2019 from Astana to honour the long-ruling Kazakh President Nursultan Nazarbayev.

Climate: continental, cold winters and hot summers, arid and semiarid

Population: 18+ million (2019 est.)

Ethnic Make-up: Kazakh (Qazaq) 53.4%, Russian 30%, Ukrainian 3.7%, Uzbek 2.5%, German 2.4%, Tatar 1.7%, Uygur 1.4%, other 4.9%

Religions: Muslim 47%, Russian Orthodox 44%, Protestant 2%, other 7%

Government: republic; authoritarian presidential rule

Language in Kazakhstan

  • Kazakhstan is a bilingual country: the Kazakh language, spoken by 64.4% of the population, has the status of the "state" language, while Russian, which is spoken by almost all Kazakhstanis, is declared the "official" language and is used routinely in business.
  • Kazakh (also Qazaq) is a Turkic language closely related to Nogai and Karakalpak.

A herd of sheep at sunset in Altyn Emel National Park. Photo by Charlotte Venema on Unsplash

Kazakh People, Culture and Society

A minority in their own land.

Kazakhstan is unique in that its people, the Kazakhs, did not form the majority of the population upon independence in 1991. Currently the northern part of the country is populated mostly with Ukrainian and Russian majorities while Kazakhs are more prevalent in the south. Other prevalent nationalities include Germans, Uzbeks, and Tatars, and over one hundred different nationalities reside in the country.

It is the goal of the government for the Kazakhs to become the majority of the population throughout the country. This can be seen in many overt and covert actions and policies. Many street names have reverted to their historical names. Kazakh has been declared the national language of the country (even though many native Kazakhs cannot speak their own language). Expatriated Kazakhs have been invited to return home and settle. Couples are encouraged to have large families.

It is important to note that the people of Kazakhstan, inclusive of all ethnic groups living in the country, are called Kazakhstani. Only people of the Kazakh ethnic group are called Kazakhs.

If you are not sure of someone’s ethnic background, it is safest to refer to them as Kazakhstanis.

The word "Kazakh" means "a free and independent nomad" in ancient Turkish. Kazakhs have travelled along the steppes of Kazakhstan from western China to the southern border of Russia for centuries.

For centuries Kazakhstan was a country of nomads and herders. Tribes were the basis of society; the tribe was constituted of family members and the family elders. Inter-tribal marriages were important in establishing security and peace. To this day, Kazakhs say, "the matchmaking lasts a thousand years, while the son-in-law lasts only a hundred." Arranged marriages are still the norm in many parts of the country.

A Patriarchal and Hierarchical Society

The Kazakhs developed a patriarchal view of the world. They banded together in extended family groups to battle the hardships of the environment and to protect their cattle and their families. This was officially called "ata-balasy", which means the joining of a grandfather’s sons into one tribe of extended family. The husband plays the primary role in family life and is ultimately responsible for the family’s survival.

Kazakhstan is also an extremely hierarchical society. Everyone has a distinct place in the hierarchy based upon family relationships. People are respected because of their age and position. Older people are viewed as wise and therefore they are granted respect. The "ways of the elders" is a popular expression that is used to explain why things are done in prescribed ways. Kazakhs expect either the eldest or the person with the highest position to make decisions that are in the best interest of the group.

Art on the side of an apartment block on Tulebaev Street, Almaty. Photo by Nurgissa Ussen on Unsplash

Kazakh Manners and Etiquette

Meeting people.

  • Greetings are rather formal due to the hierarchical nature of society.
  • The common greeting is the handshake, often done with both hands and a smile. Since many Kazakhs are Muslim, some men will not shake hands with women, so be sensitive to these religious differences.
  • Once you have developed a personal relationship, close friends of the same sex may prefer to hug rather than shake hands.
  • Most Kazakhs have a first and patronymic name (the father’s name followed by a suffix such as -ovich or -evich for "son of", or -ovna or -evna for "daughter of").
  • Wait until invited before using someone’s first name, although the invitation generally comes early in the relationship.

Gift Giving Etiquette

  • There is not a great deal of protocol in gift giving.
  • When invited to someone’s house for dinner, it is polite to bring something for the hostess such as pastries.
  • Practising Muslims do not touch alcohol, so do not give alcoholic beverages unless you know your host drinks.
  • Gifts are usually opened when received.

Dining Etiquette

  • Kazakhs are very hospitable people and enjoy hosting dinners at their homes.
  • You will be served tea and bread, even if you are not invited to a meal. Since Kazakhs consider bread to be sacred, serving bread is a sign of respect.
  • When served tea, your cup will often only be filled halfway. To fill the cup would mean that your host wanted you to leave.
  • It is not imperative that you arrive on time, although you should not arrive more than 30 minutes late without telephoning first.
  • Dress conservatively in clothing you might wear to the office. Kazakhs value dressing well over comfort. To dress too informally might insult your hosts.
  • Table manners are not terribly formal in Kazakhstan.
  • Table manners are Continental -- the fork is held in the left hand and the knife in the right while eating.
  • Some foods are meant to be eaten by hand.
  • Your host or another guest may serve you.
  • In more rural settings, you may sit on the floor.
  • You will be given a bowl to drink broth or tea. When you do not want any more, turn your bowl upside-down as an indication.
  • If alcoholic beverages are served, expect a fair amount of toasting.
  • Meals are social events. As such, they may take a great deal of time.
  • Leave something on your plate when you have finished eating. This demonstrates that you have had enough, whereas if you finish everything it means you are still hungry and you will be served more food.
  • Expect to be served second helpings.

A Sheep’s Head

  • In rural settings it is a sign of respect to offer the most honoured guest a boiled sheep's head on a beautiful plate.
  • The guest then divides the food among the guests in the following fashion:
  • The ear is given to the smallest child so that he or she will listen to and obey the elders.
  • The eyes are given to the two closest friends so that they will take care of the guest.
  • The upper palate is given to the daughter-in-law and the tongue to the host’s daughter so both women will hold their tongues.
  • The pelvic bones go to the second most respected guest.
  • The brisket is given to the son-in-law.

A typical Kazakh spread. Photo by Steven M. Otters (CC BY-NC 2.0)

Business Culture and Etiquette

Meeting and Greeting

  • The handshake is the common greeting. Two hands are often used.
  • Handshakes tend to be gentle.
  • Many Kazakh men will not shake hands with women. A woman should extend her hand, but if it is not accepted, she should not be insulted.
  • Maintain eye contact during the greeting.
  • Shake hands at the end of a meeting, prior to leaving.
  • If you meet someone several times in the same day, you should shake hands each time.
  • Wait to be introduced to everyone, usually in order of importance.
  • Academic and professional titles are used in business.
  • People are called by their title and surname.
  • Wait until invited before using someone’s first name.
  • Business cards are extremely important to establish one’s position, navigate bureaucracy and open doors.
  • Likewise, show the card of someone significant when trying to gain access or secure an appointment.
  • Business cards are exchanged without a great deal of ritual.
  • It is advisable to have your business cards printed in Russian on one side and English on the other.
  • Make certain that your title is included on your business card.

Communication Styles

Protecting relationships and people’s honour is important. As a result Kazakhs finesse what they say in order to deliver information in a sensitive and diplomatic manner. They tend to speak in a roundabout fashion rather than a linear fashion. They respond more favourably to gentle probing rather than direct questioning.

At the same time, many Kazakhs have a somewhat volatile demeanour and can raise their voice to get their point across. They are known for their fierce arguments. You may wish to retaliate in kind, but do so cautiously as there is a fine line between standing up for yourself and appearing overly aggressive.

Hierarchy is respected in Kazakhstan. Someone more senior is never ever contradicted or criticised, especially in public. You will be expected to treat senior Kazakhs in the same manner.

Business Meetings

Meeting styles vary by the type of business entity. Private industry is often more focused and westernized; things are a little bit more fluid. Public entities, on the other hand, follow lots of protocol and red-tape (leftovers from the Communist era). The latter may involve many more meetings and patience.

The hierarchical nature of the culture means that Kazakhs will want to meet people of similar rank. Therefore, it is important to forward the bios of all team members well in advance of any meeting.

T-shaped tables are often used for meetings so that both sides can be seated opposite each other. The top Kazakh at the meeting will sit at the head and his staff will be seated in decreasing order of rank. Your team should attempt to seat themselves in the same manner. In some companies, there is an emerging trend to seat peers next to each other to facilitate conversation.

There is generally a fair amount of small talk before business is discussed. This may take place over tea and sweets. Wait for the other party to bring the conversation to business.

Spend time in relationship building; as a family orientated people they want to be sure you are trustworthy, affable and reliable.

The most senior Kazakh at the meeting opens the discussion and introduces his team in rank order. Although meetings have a start time, they seldom have an ending time. Kazakhs are masters at delivering roundabout speeches, so it would be impractical to predetermine when a meeting will finish.

Management Style

  • Read our guide to Kazakh Management Culture for more specific information on this topic.

Scholar Commons

Senior Theses

‘Kazakh Means Freedom’ - Kazakh Language Policy and National Identity Before and During the Ukraine War

McLean T. Brown, University of South Carolina

Date of Award

Spring 2024

Degree Type

Languages, Literatures and Cultures

Director of Thesis

Dr. Judith Kalb

Second Reader

Dr. Magdalena Stawkowski

This thesis examines the link between language policy and national identity in Kazakhstan, tracing the relationship between the two across history and describing how they have been affected by the Ukraine War. The Kazakh government has put considerable effort into developing a national identity for contemporary Kazakhstan, but conflicting standards of production make it difficult for a cohesive, well-defined Kazakh national identity to be put forth. Through qualitative and quantitative analyses, phenomenological critical discourse analysis, and ethnographic research, this thesis strives to alleviate existing gaps in Central Asian studies research while arguing that language policy is a lens through which researchers can study the production of national identity in post-colonial and post-Soviet states. I begin with an exploration into the historical forms of Kazakh identity and their connection to the Kazakh language before examining the impact of Soviet language and nation-building processes on the production of Kazakh national identity. I then analyze the ways that the Ukraine War, and more specifically, the 2022 Russian invasion of Ukraine, have impacted contemporary language and nation-building policies on both the political and social levels, using ethnographic research conducted in Kazakhstan in July 2023 to investigate their impact on the average person in Kazakhstan.

Recommended Citation

Brown, McLean T., "‘Kazakh Means Freedom’ - Kazakh Language Policy and National Identity Before and During the Ukraine War" (2024). Senior Theses . 684. https://scholarcommons.sc.edu/senior_theses/684

© 2024, McLean T. Brown

Included in

Ethnic Studies Commons , Modern Languages Commons , Other Languages, Societies, and Cultures Commons

Present Continuous in Kazakh Grammar

Immersive methods using grammar theory for language learning

Understanding the Present Continuous in Kazakh Grammar

The Fundamentals of Kazakh Present Continuous Tense

The Present Continuous tense, a key aspect of the Kazakh language, allows speakers to describe ongoing actions or indicate future plans. This tense differs from the Simple Present tense, which primarily demonstrates habitual actions or general truths. In the following sections, we will explore the intricacies of forming and using the Present Continuous tense in the complex world of Kazakh grammar.

Nuances of Present Continuous Usage

While the primary function of the Present Continuous tense is to express ongoing actions, it also conveys the following nuances in the Kazakh language:

1. Immediacy: For actions occurring at the moment of speaking, the Present Continuous tense is employed, highlighting their real-time nature.

2. Temporary states: The tense is used to denote actions occurring for a limited period rather than repeatedly or habitually.

3. Developing situations: When situations change or progress during a specific time frame, the Present Continuous tense effectively conveys this evolving nature.

4. Future plans: Despite its primary focus on present actions, this tense also signifies foreseeable future events, like appointments or scheduled activities.

Enlighten Theses

Kazakh cinema and the nation: a critical analysis

Kamza, Assel (2021) Kazakh cinema and the nation: a critical analysis. PhD thesis, University of Glasgow.

Nation building is the process in question. This process is, as a rule, complicated in diverse countries, such as Kazakhstan. As a post-Soviet nation, it is still not sure how to define itself in the country and in the outside world. The crisis of the Kazakh identity is compromised by the manifold ethnic groups and cultures, juxtaposed by the clashes of Kazakh and Russian languages and different identities. In this regard, the role of cinema in the need for cultural certainty and the systematisation of national identity cannot be underestimated. Film is one way of offering knowledge of the nation to itself. Through cinema it is possible to imagine the history of the nation and construct modernity and to rebuild the nation. The current study investigates Kazakh cinema in transition. This thesis, for the first time, provides an assessment of Kazakh cinema production after the adoption of the new Cinema Law (2019) and the Eurasia International Film Festival (EurIFF) within a nation-building context. Also, little work using a theoretical framework has been done on the relationship between Kazakh film and nation building within the wider discussion of nationalism. This thesis adds to this small body of work by addressing Kazakh cinema’s role in nation building.

Through the analysis of Kazakh films framed through Anthony Smith’s ethno-symbolism concept, this thesis will look at how Kazakhstan is trying to define itself through cinema, how Kazakh films aid the country to reconstruct itself. In order to critically analyse the current Kazakh cinema landscape, this thesis has adopted a qualitative approach, utilising semi-structured interviews with 30 participants residing in the Kazakhstani cities of Almaty and Nur-Sultan. After carrying out my research in relation to the literature on nationalism and film studies, the analysis of the data establishes four primary themes. Firstly, I investigate Kazakh film policy, focusing on the way the Cinema Law may reshape the film industry in the country. Secondly, I consider the significance of the Eurasia International Film Festival (EurIFF) as well as the unusual challenges that this state-run festival had to face in order to organise itself effectively. The third theme explores the curation and programming of the festival, examining the festival’s approach to its audience and palette of films. Finally, in the fourth theme I demonstrate the influence of the film industry on both Kazakh cinema and the EurIFF with respect to image building.

Today, not many countries have successful cases of nation building through films. Kazakhstan is no exception. I conclude that Kazakh cinema and film policy is situated in between the discords of the old system (Kazakhfilm) and the new (the State Centre for Support of National Cinema). This thesis shows that the impact of Kazakh cinema on nation building is limited. Ultimately, I argue that if Kazakhstan had a stronger business-oriented approach to film policy, both domestic and international markets would be more reachable. As a result, cinematic nation building in Kazakhstan would be more successful.

  • Open access
  • Published: 03 June 2024

Applying large language models for automated essay scoring for non-native Japanese

  • Wenchao Li &
  • Haitao Liu

Humanities and Social Sciences Communications, volume 11, Article number: 723 (2024)

  • Language and linguistics

Recent advancements in artificial intelligence (AI) have led to an increased use of large language models (LLMs) for language assessment tasks such as automated essay scoring (AES), automated listening tests, and automated oral proficiency assessments. The application of LLMs for AES in the context of non-native Japanese, however, remains limited. This study explores the potential of LLM-based AES by comparing the efficiency of different models, i.e. two conventional machine training technology-based methods (Jess and JWriter), two LLMs (GPT and BERT), and one Japanese local LLM (Open-Calm large model). To conduct the evaluation, a dataset consisting of 1400 story-writing scripts authored by learners with 12 different first languages was used. Statistical analysis revealed that GPT-4 outperforms Jess and JWriter, BERT, and the Japanese language-specific trained Open-Calm large model in terms of annotation accuracy and predicting learning levels. Furthermore, by comparing 18 different models that utilize various prompts, the study emphasized the significance of prompts in achieving accurate and reliable evaluations using LLMs.

Conventional machine learning technology in AES

AES has experienced significant growth with the advancement of machine learning technologies in recent decades. In the earlier stages of AES development, conventional machine learning-based approaches were commonly used. These approaches involved the following procedures: (a) A dataset of essays is fed to the machine learning system. The dataset serves as the basis for training the model and establishing patterns and correlations between linguistic features and human ratings. (b) The machine learning model is trained using linguistic features that best represent human ratings and can effectively discriminate learners’ writing proficiency. These features include lexical richness (Lu, 2012; Kyle and Crossley, 2015; Kyle et al. 2021), syntactic complexity (Lu, 2010; Liu, 2008), and text cohesion (Crossley and McNamara, 2016), among others. Conventional machine learning approaches in AES require human intervention, such as manual correction and annotation of essays. This human involvement is necessary to create a labeled dataset for training the model. Several AES systems have been developed using conventional machine learning technologies. These include the Intelligent Essay Assessor (Landauer et al. 2003), the e-rater engine by Educational Testing Service (Attali and Burstein, 2006; Burstein, 2003), MyAccess with the IntelliMetric scoring engine by Vantage Learning (Elliot, 2003), and the Bayesian Essay Test Scoring system (Rudner and Liang, 2002). These systems have played a significant role in automating the essay scoring process and providing quick and consistent feedback to learners. However, conventional machine learning approaches rely on predetermined linguistic features and often require manual intervention, making them less flexible and potentially limiting their generalizability to different contexts.
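The feature-extraction step described above can be illustrated with a minimal Python sketch. It is not taken from any of the cited systems: the function name `extract_features`, the two features chosen (type-token ratio as a rough proxy for lexical richness, mean sentence length as a rough proxy for syntactic complexity), and the simple ASCII tokenizer are all assumptions made for illustration. Japanese text would need a morphological analyzer rather than this regular-expression tokenizer.

```python
import re


def extract_features(essay: str) -> dict:
    """Compute two toy surface features for an essay (illustrative only)."""
    # Split into sentences on ., ! or ? and drop empty fragments.
    sentences = [s for s in re.split(r"[.!?]+", essay) if s.strip()]
    # Lowercased word tokens (ASCII letters and apostrophes only).
    tokens = re.findall(r"[a-z']+", essay.lower())
    if not tokens or not sentences:
        return {"type_token_ratio": 0.0, "mean_sentence_length": 0.0}
    return {
        # Share of distinct words: a crude lexical-richness measure.
        "type_token_ratio": len(set(tokens)) / len(tokens),
        # Average number of tokens per sentence.
        "mean_sentence_length": len(tokens) / len(sentences),
    }


features = extract_features("The cat sat. The cat ran away quickly.")
print(features)
```

In a conventional pipeline, feature vectors like this one would be paired with human ratings to train the scoring model.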

In the context of the Japanese language, conventional machine learning-incorporated AES tools include Jess (Ishioka and Kameda, 2006) and JWriter (Lee and Hasebe, 2017). Jess assesses essays by deducting points from the perfect score, utilizing the Mainichi Daily News newspaper as a database. The evaluation criteria employed by Jess encompass various aspects, such as rhetorical elements (e.g., reading comprehension, vocabulary diversity, percentage of complex words, and percentage of passive sentences), organizational structures (e.g., forward and reverse connection structures), and content analysis (e.g., latent semantic indexing). JWriter employs linear regression analysis to assign weights to various measurement indices, such as average sentence length and total number of characters. These weights are then combined to derive the overall score. A pilot study involving the Jess model was conducted on 1320 essays at different proficiency levels, including primary, intermediate, and advanced. However, the results indicated that the Jess model failed to significantly distinguish between these essay levels. Out of the 16 measures used, four measures, namely median sentence length, median clause length, median number of phrases, and maximum number of phrases, did not show statistically significant differences between the levels. Additionally, two measures exhibited between-level differences but lacked linear progression: the number of attributive declined words and the Kanji/kana ratio. On the other hand, the remaining measures, including maximum sentence length, maximum clause length, number of attributive conjugated words, maximum number of consecutive infinitive forms, maximum number of conjunctive-particle clauses, k characteristic value, percentage of big words, and percentage of passive sentences, demonstrated statistically significant between-level differences and displayed linear progression.

Both Jess and JWriter exhibit notable limitations, including the manual selection of feature parameters and weights, which can introduce biases into the scoring process. The reliance on human annotators to label non-native language essays also introduces potential noise and variability in the scoring. Furthermore, an important concern is the possibility of system manipulation and cheating by learners who are aware of the regression equation utilized by the models (Hirao et al. 2020 ). These limitations emphasize the need for further advancements in AES systems to address these challenges.

Deep learning technology in AES

Deep learning has emerged as one of the approaches for improving the accuracy and effectiveness of AES. Deep learning-based AES methods utilize artificial neural networks that mimic the human brain’s functioning through layered algorithms and computational units. Unlike conventional machine learning, deep learning autonomously learns from the environment and past errors without human intervention. This enables deep learning models to establish nonlinear correlations, resulting in higher accuracy. Recent advancements in deep learning have led to the development of transformers, which are particularly effective in learning text representations. Noteworthy examples include bidirectional encoder representations from transformers (BERT) (Devlin et al. 2019 ) and the generative pretrained transformer (GPT) (OpenAI).

BERT is a language representation model that utilizes a transformer architecture and is trained on two tasks: masked language modeling and next-sentence prediction (Hirao et al. 2020; Vaswani et al. 2017). In the context of AES, BERT follows specific procedures, as illustrated in Fig. 1: (a) the tokenized prompts and essays are taken as input; (b) special tokens, such as [CLS] and [SEP], are added to mark the beginning and separation of prompts and essays; (c) the transformer encoder processes the prompt and essay sequences, resulting in hidden layer sequences; (d) the hidden layers corresponding to the [CLS] tokens (T[CLS]) represent distributed representations of the prompts and essays; and (e) a multilayer perceptron uses these distributed representations as input to obtain the final score (Hirao et al. 2020).

Figure 1: AES system with BERT (Hirao et al. 2020).

Training BERT on a substantial amount of sentence data through the Masked Language Model (MLM) allows it to capture contextual information within the hidden layers. Consequently, BERT is expected to be capable of identifying artificial essays as invalid and assigning them lower scores (Mizumoto and Eguchi, 2023). In the context of AES for nonnative Japanese learners, Hirao et al. (2020) combined the long short-term memory (LSTM) model proposed by Hochreiter and Schmidhuber (1997) with BERT to develop a tailored automated essay scoring system. The findings of their study revealed that the BERT model outperformed both the conventional machine learning approach utilizing character-type features such as “kanji” and “hiragana” and the standalone LSTM model. Takeuchi et al. (2021) presented an approach to Japanese AES that eliminates the requirement for pre-scored essays by relying solely on reference texts or a model answer for the essay task. They investigated multiple similarity evaluation methods, including frequency of morphemes, idf values calculated on Wikipedia, LSI, LDA, word-embedding vectors, and document vectors produced by BERT. The experimental findings revealed that the method utilizing the frequency of morphemes with idf values exhibited the strongest correlation with human-annotated scores across different essay tasks. The utilization of BERT in AES nonetheless encounters several limitations. First, essays often exceed the model’s maximum length limit. Second, only score labels are available for training, which restricts access to additional information.

Mizumoto and Eguchi (2023) were pioneers in employing the GPT model for AES in non-native English writing. Their study focused on evaluating the accuracy and reliability of AES using the GPT-3 text-davinci-003 model, analyzing a dataset of 12,100 essays from the corpus of nonnative written English (TOEFL11). The findings indicated that AES utilizing the GPT-3 model exhibited a certain degree of accuracy and reliability. They suggest that GPT-3-based AES systems hold the potential to provide support for human ratings. However, applying the GPT model to AES presents a unique natural language processing (NLP) task that involves considerations such as nonnative language proficiency, the influence of the learner’s first language on the output in the target language, and identifying linguistic features that best indicate writing quality in a specific language. These linguistic features may differ morphologically or syntactically from those present in the learners’ first language, as observed in (1)–(3).

(1) Isolating

我-送了-他-一本-书

Wǒ-sòngle-tā-yī běn-shū

1sg.-give.past-him-one.cl-book

“I gave him a book.”

(2) Agglutinative

彼-に-本-を-あげ-まし-た

Kare-ni-hon-o-age-mashi-ta

3sg.-dat-book-acc-give.honorification.past

(3) Inflectional

give, give-s, gave, given, giving

Additionally, the morphological agglutination and subject-object-verb (SOV) order in Japanese, along with its idiomatic expressions, pose additional challenges for applying language models in AES tasks (4).

(4) 足-が 棒-に なり-ました

Ashi-ga bo-ni nari-mashita

leg-nom stick-dat become-past

“My leg became like a stick (I am extremely tired).”

The example sentence demonstrates the morpho-syntactic structure of Japanese and the presence of an idiomatic expression. In this sentence, the verb “なる” (naru), meaning “to become”, appears at the end of the sentence. The verb stem “なり” (nari) is followed by morphemes indicating honorification (“ます”, masu) and tense (“た”, ta), showcasing agglutination. While the sentence can be literally translated as “my leg became like a stick”, it carries an idiomatic interpretation that implies “I am extremely tired”.

To overcome this issue, CyberAgent Inc. (2023) has developed the Open-Calm series of language models specifically designed for Japanese. Open-Calm consists of pre-trained models available in various sizes, such as Small, Medium, Large, and 7b. Figure 2 depicts the fundamental structure of the Open-Calm model. A key feature of this architecture is the incorporation of the LoRA adapter and the GPT-NeoX framework, which can enhance its language processing capabilities.

Figure 2: GPT-NeoX model architecture (Okgetheng and Takeuchi 2024).

In a recent study, Okgetheng and Takeuchi (2024) assessed the efficacy of Open-Calm language models in grading Japanese essays. The research utilized a dataset of approximately 300 essays, which were annotated by native Japanese educators. The findings of the study demonstrate the considerable potential of Open-Calm language models in automated Japanese essay scoring. Specifically, among the Open-Calm family, the Open-Calm Large model (referred to as OCLL) exhibited the highest performance. However, it is important to note that, as of the current date, the Open-Calm Large model does not offer public access to its server. Consequently, users are required to independently deploy and operate the environment for OCLL. In order to utilize OCLL, users must have a PC equipped with an NVIDIA GeForce RTX 3060 (8 or 12 GB VRAM).

In summary, while the potential of LLMs in automated scoring of nonnative Japanese essays has been demonstrated in two studies—BERT-driven AES (Hirao et al. 2020 ) and OCLL-based AES (Okgetheng and Takeuchi, 2024 )—the number of research efforts in this area remains limited.

Another significant challenge in applying LLMs to AES lies in prompt engineering and ensuring its reliability and effectiveness (Brown et al. 2020; Rae et al. 2021; Zhang et al. 2021). Various prompting strategies have been proposed, such as the zero-shot chain of thought (CoT) approach (Kojima et al. 2022), which involves manually crafting diverse and effective examples. However, manual efforts can lead to mistakes. To address this, Zhang et al. (2021) introduced an automatic CoT prompting method called Auto-CoT, which demonstrates matching or superior performance compared to the CoT paradigm. Another prompt framework is tree of thoughts, which enables a model to self-evaluate its progress at intermediate stages of problem-solving through deliberate reasoning (Yao et al. 2023).

Beyond linguistic studies, there has been a noticeable increase in the number of foreign workers in Japan and Japanese learners worldwide (Ministry of Health, Labor, and Welfare of Japan, 2022 ; Japan Foundation, 2021 ). However, existing assessment methods, such as the Japanese Language Proficiency Test (JLPT), J-CAT, and TTBJ Footnote 1 , primarily focus on reading, listening, vocabulary, and grammar skills, neglecting the evaluation of writing proficiency. As the number of workers and language learners continues to grow, there is a rising demand for an efficient AES system that can reduce costs and time for raters and be utilized for employment, examinations, and self-study purposes.

This study aims to explore the potential of LLM-based AES by comparing the effectiveness of five models: two LLMs (GPT Footnote 2 and BERT), one Japanese local LLM (OCLL), and two conventional machine learning-based methods (linguistic feature-based scoring tools - Jess and JWriter).

The research questions addressed in this study are as follows:

To what extent do the LLM-driven AES and linguistic feature-based AES, when used as automated tools to support human rating, accurately reflect test takers’ actual performance?

What influence does the prompt have on the accuracy and performance of LLM-based AES methods?

The subsequent sections of the manuscript cover the methodology, including the assessment measures for nonnative Japanese writing proficiency, criteria for prompts, and the dataset. The evaluation section focuses on the analysis of annotations and rating scores generated by LLM-driven and linguistic feature-based AES methods.

Methodology

The dataset utilized in this study was obtained from the International Corpus of Japanese as a Second Language (I-JAS) Footnote 3 . This corpus consisted of 1000 participants who represented 12 different first languages. For the study, the participants were given a story-writing task on a personal computer. They were required to write two stories based on the 4-panel illustrations titled “Picnic” and “The key” (see Appendix A). Background information for the participants was provided by the corpus, including their Japanese language proficiency levels assessed through two online tests: J-CAT and SPOT. These tests evaluated their reading, listening, vocabulary, and grammar abilities. The learners’ proficiency levels were categorized into six levels aligned with the Common European Framework of Reference for Languages (CEFR) and the Reference Framework for Japanese Language Education (RFJLE): A1, A2, B1, B2, C1, and C2. According to Lee et al. ( 2015 ), there is a high level of agreement (r = 0.86) between the J-CAT and SPOT assessments, indicating that the proficiency certifications provided by J-CAT are consistent with those of SPOT. However, it is important to note that the scores of J-CAT and SPOT do not have a one-to-one correspondence. In this study, the J-CAT scores were used as a benchmark to differentiate learners of different proficiency levels. A total of 1400 essays were utilized, representing the beginner (aligned with A1), A2, B1, B2, C1, and C2 levels based on the J-CAT scores. Table 1 provides information about the learners’ proficiency levels and their corresponding J-CAT and SPOT scores.

A dataset comprising a total of 1400 essays from the story writing tasks was collected. Among these, 714 essays were utilized to evaluate the reliability of the LLM-based AES method, while the remaining 686 essays were designated as development data to assess the LLM-based AES’s capability to distinguish participants with varying proficiency levels. The GPT 4 API was used in this study. A detailed explanation of the prompt-assessment criteria is provided in Section Prompt . All essays were sent to the model for measurement and scoring.

Measures of writing proficiency for nonnative Japanese

Japanese exhibits a morphologically agglutinative structure where morphemes are attached to the word stem to convey grammatical functions such as tense, aspect, voice, and honorifics, e.g. (5).

食べ-させ-られ-まし-た-か

tabe-sase-rare-mashi-ta-ka

[eat (stem)-causative-passive voice-honorification-tense. past-question marker]

Japanese employs nine case particles to indicate grammatical functions, among them the nominative case particle が (ga), the accusative case particle を (o), the genitive case particle の (no), the dative case particle に (ni), the locative/instrumental case particle で (de), the ablative case particle から (kara), the directional case particle へ (e), and the comitative case particle と (to). The agglutinative nature of the language, combined with the case particle system, provides an efficient means of distinguishing between active and passive voice, either through morphemes or case particles, e.g. 食べる taberu “eat conclusive.” (active voice); 食べられる taberareru “eat conclusive.” (passive voice). In the active voice, “パン を 食べる” (pan o taberu) translates to “to eat bread”. In the passive voice, on the other hand, it becomes “パン が 食べられた” (pan ga taberareta), which means “(the) bread was eaten”. Additionally, different conjugations of the same lemma are counted as one type in order to ensure a comprehensive assessment of the language features. For example, 食べる taberu “eat conclusive.”, 食べている tabeteiru “eat progress.”, and 食べた tabeta “eat past.” are counted as one type.

To incorporate these features, previous research (Suzuki, 1999 ; Watanabe et al. 1988 ; Ishioka, 2001 ; Ishioka and Kameda, 2006 ; Hirao et al. 2020 ) has identified complexity, fluency, and accuracy as crucial factors for evaluating writing quality. These criteria are assessed through various aspects, including lexical richness (lexical density, diversity, and sophistication), syntactic complexity, and cohesion (Kyle et al. 2021 ; Mizumoto and Eguchi, 2023 ; Ure, 1971 ; Halliday, 1985 ; Barkaoui and Hadidi, 2020 ; Zenker and Kyle, 2021 ; Kim et al. 2018 ; Lu, 2017 ; Ortega, 2015 ). Therefore, this study proposes five scoring categories: lexical richness, syntactic complexity, cohesion, content elaboration, and grammatical accuracy. A total of 16 measures were employed to capture these categories. The calculation process and specific details of these measures can be found in Table 2 .

The T-unit, first introduced by Hunt (1966), is a measure used for evaluating speech and composition. It serves as an indicator of syntactic development and represents the shortest units into which a piece of discourse can be divided without leaving any sentence fragments. In the context of Japanese language assessment, Sakoda and Hosoi (2020) utilized the T-unit as the basic unit to assess the accuracy and complexity of Japanese learners’ speaking and storytelling. The calculation of T-units in Japanese follows these principles:

A single main clause constitutes 1 T-unit, regardless of the presence or absence of dependent clauses, e.g. (6).

ケンとマリはピクニックに行きました (main clause): 1 T-unit.

If a sentence contains a main clause along with subclauses, each subclause is considered part of the same T-unit, e.g. (7).

天気が良かった の で (subclause)、ケンとマリはピクニックに行きました (main clause): 1 T-unit.

In the case of coordinate clauses, where multiple clauses are connected, each coordinated clause is counted separately. Thus, a sentence with coordinate clauses may have 2 T-units or more, e.g. (8).

ケンは地図で場所を探して (coordinate clause)、マリはサンドイッチを作りました (coordinate clause): 2 T-units.

Lexical diversity refers to the range of words used within a text (Engber, 1995 ; Kyle et al. 2021 ) and is considered a useful measure of the breadth of vocabulary in L n production (Jarvis, 2013a , 2013b ).

The type/token ratio (TTR) is widely recognized as a straightforward measure for calculating lexical diversity and has been employed in numerous studies. These studies have demonstrated a strong correlation between TTR and other methods of measuring lexical diversity (e.g., Bentz et al. 2016 ; Čech and Miroslav, 2018 ; Çöltekin and Taraka, 2018 ). TTR is computed by considering both the number of unique words (types) and the total number of words (tokens) in a given text. Given that the length of learners’ writing texts can vary, this study employs the moving average type-token ratio (MATTR) to mitigate the influence of text length. MATTR is calculated using a 50-word moving window. Initially, a TTR is determined for words 1–50 in an essay, followed by words 2–51, 3–52, and so on until the end of the essay is reached (Díez-Ortega and Kyle, 2023 ). The final MATTR scores were obtained by averaging the TTR scores for all 50-word windows. The following formula was employed to derive MATTR:

\({\rm{MATTR}}({\rm{W}})=\frac{{\sum }_{{\rm{i}}=1}^{{\rm{N}}-{\rm{W}}+1}{{\rm{F}}}_{{\rm{i}}}}{{\rm{W}}({\rm{N}}-{\rm{W}}+1)}\)

Here, N refers to the number of tokens in the corpus. W is the chosen window size in tokens (W < N; 50 in this study). \({F}_{i}\) is the number of types in the \(i\)-th window. The \({\rm{MATTR}}({\rm{W}})\) is the mean of a series of type-token ratios (TTRs) based on the word form for all windows. It is expected that individuals with higher language proficiency will produce texts with greater lexical diversity, as indicated by higher MATTR scores.
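The moving-window calculation can be sketched in a few lines of Python. This is a minimal illustration rather than the implementation used in the study: the function name `mattr` is hypothetical, the essay is assumed to be already tokenized (for Japanese this would require a morphological analyzer, which is not shown), and the fallback to plain TTR for texts shorter than the window is an assumption made here.

```python
def mattr(tokens, window=50):
    """Moving average type-token ratio over all windows of `window` tokens."""
    n = len(tokens)
    if n == 0:
        raise ValueError("the token list must not be empty")
    if n < window:
        # Assumption: fall back to plain TTR when the text is shorter than W.
        return len(set(tokens)) / n
    # One TTR per window: tokens 1..W, 2..W+1, ..., N-W+1..N.
    ttrs = [
        len(set(tokens[start:start + window])) / window
        for start in range(n - window + 1)
    ]
    return sum(ttrs) / len(ttrs)


# Toy example with a 2-token window: windows are (a, a) and (a, b).
print(mattr(["a", "a", "b"], window=2))  # 0.75
```

Because every window has the same length, the score does not shrink simply because an essay is longer, which is the motivation for preferring MATTR over raw TTR.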

Lexical density was captured by the ratio of the number of lexical words to the total number of words (Lu, 2012). Lexical sophistication refers to the utilization of advanced vocabulary, often evaluated through word frequency indices (Crossley et al. 2013; Haberman, 2008; Kyle and Crossley, 2015; Laufer and Nation, 1995; Lu, 2012; Read, 2000). In the context of writing, lexical sophistication can be interpreted as vocabulary breadth, which entails the appropriate usage of vocabulary items across various lexico-grammatical contexts and registers (Garner et al. 2019; Kim et al. 2018; Kyle et al. 2018). In Japanese specifically, words are considered lexically sophisticated if they are not included in the “Japanese Education Vocabulary List Ver 1.0”. Footnote 4 Consequently, lexical sophistication was calculated by determining the number of sophisticated word types relative to the total number of words per essay. Furthermore, it has been suggested that, in Japanese writing, sentences should ideally have a length of no more than 40 to 50 characters, as this promotes readability. Therefore, the median and maximum sentence length can be considered as useful indices for assessment (Ishioka and Kameda, 2006).

Syntactic complexity was assessed based on several measures, including the mean length of clauses, verb phrases per T-unit, clauses per T-unit, dependent clauses per T-unit, complex nominals per clause, adverbial clauses per clause, coordinate phrases per clause, and mean dependency distance (MDD). The MDD reflects the distance between the governor and dependent positions in a sentence. A larger dependency distance indicates a higher cognitive load and greater complexity in syntactic processing (Liu, 2008 ; Liu et al. 2017 ). The MDD has been established as an efficient metric for measuring syntactic complexity (Jiang, Quyang, and Liu, 2019 ; Li and Yan, 2021 ). To calculate the MDD, the position numbers of the governor and dependent are subtracted, assuming that words in a sentence are assigned in a linear order, such as W1 … Wi … Wn. In any dependency relationship between words Wa and Wb, Wa is the governor and Wb is the dependent. The MDD of the entire sentence was obtained by taking the absolute value of governor – dependent:

MDD = \(\frac{1}{n}{\sum }_{i=1}^{n}|{\rm{D}}{{\rm{D}}}_{i}|\)

In this formula, \(n\) represents the number of words in the sentence, and \({DD}_{i}\) is the dependency distance of the \({i}^{{th}}\) dependency relationship of a sentence. As an illustration, the sentence ‘Mary-ga John-ni keshigomu-o watashita’ is annotated as [Mary-top John-dat eraser-acc give-past], and its MDD would be 2. Table 3 provides the CSV file as a prompt for GPT 4.
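The worked example can be checked with a short Python sketch. The function name `mean_dependency_distance` is hypothetical, and two assumptions are made here that the text does not spell out: each noun plus its case particle occupies one position (so the sentence has four positions), and all three arguments depend directly on the verb. The mean is taken over the dependency relations, which reproduces the value of 2 stated above.

```python
def mean_dependency_distance(dependencies):
    """Mean of |governor position - dependent position| over all relations.

    `dependencies` is a list of (governor, dependent) pairs of 1-based
    linear positions within one sentence.
    """
    if not dependencies:
        raise ValueError("at least one dependency relation is required")
    return sum(abs(gov - dep) for gov, dep in dependencies) / len(dependencies)


# Mary-ga (1)  John-ni (2)  keshigomu-o (3)  watashita (4)
# Assumed analysis: the verb at position 4 governs the other three units.
deps = [(4, 1), (4, 2), (4, 3)]
print(mean_dependency_distance(deps))  # 2.0
```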

Cohesion (semantic similarity) and content elaboration aim to capture the ideas presented in test takers’ essays. Cohesion was assessed using three measures: Synonym overlap/paragraph (topic), Synonym overlap/paragraph (keywords), and word2vec cosine similarity. Content elaboration and development were measured as the number of metadiscourse markers (type)/number of words. To capture content closely, this study proposed a novel distance-based representation, encoding the cosine distance between the i-vectors of the learner’s essay and those of the essay task (topic and keywords). The learner’s essay is decoded into a word sequence and aligned to the essay task’s topic and keywords for log-likelihood measurement. The cosine distance reveals the content elaboration score in the learner’s essay. The mathematical equation of cosine similarity between target-reference vectors is shown in (11), assuming there are i essays and \(({L}_{i},\ldots ,{L}_{n})\) and \(({N}_{i},\ldots ,{N}_{n})\) are the vectors representing the learner’s essay and the task’s topic and keywords, respectively. The content elaboration distance between \({L}_{i}\) and \({N}_{i}\) was calculated as follows:

\(\cos \left(\theta \right)=\frac{{\rm{L}}\,\cdot\, {\rm{N}}}{\left|{\rm{L}}\right|{\rm{|N|}}}=\frac{\mathop{\sum }\nolimits_{i=1}^{n}{L}_{i}{N}_{i}}{\sqrt{\mathop{\sum }\nolimits_{i=1}^{n}{L}_{i}^{2}}\sqrt{\mathop{\sum }\nolimits_{i=1}^{n}{N}_{i}^{2}}}\)

A high similarity value indicates a low difference between the two recognition outcomes, which in turn suggests a high level of proficiency in content elaboration.
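As a small illustration of the cosine formula, the following Python sketch computes the similarity between two equal-length vectors. The function name `cosine_similarity` and the toy vectors are assumptions for demonstration only; how the essay and task vectors are actually produced (i-vectors or word2vec embeddings) is outside the scope of this sketch.

```python
import math


def cosine_similarity(learner_vec, task_vec):
    """cos(theta) = (L . N) / (|L| |N|) for two equal-length vectors."""
    if len(learner_vec) != len(task_vec):
        raise ValueError("vectors must have the same length")
    dot = sum(l * n for l, n in zip(learner_vec, task_vec))
    norm_l = math.sqrt(sum(l * l for l in learner_vec))
    norm_n = math.sqrt(sum(n * n for n in task_vec))
    if norm_l == 0 or norm_n == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_l * norm_n)


# Toy vectors: identical directions give 1.0, orthogonal directions give 0.0.
print(cosine_similarity([1.0, 0.0], [1.0, 0.0]))  # 1.0
print(cosine_similarity([1.0, 0.0], [0.0, 1.0]))  # 0.0
```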

To evaluate the effectiveness of the proposed measures in distinguishing different proficiency levels among nonnative Japanese speakers’ writing, we conducted a multi-faceted Rasch measurement analysis (Linacre, 1994 ). This approach applies measurement models to thoroughly analyze various factors that can influence test outcomes, including test takers’ proficiency, item difficulty, and rater severity, among others. The underlying principles and functionality of multi-faceted Rasch measurement are illustrated in (12).

\(\log \left(\frac{{P}_{{nijk}}}{{P}_{{nij}(k-1)}}\right)={B}_{n}-{D}_{i}-{C}_{j}-{F}_{k}\)

(12) defines the logarithmic transformation of the probability ratio \({P}_{{nijk}}/{P}_{{nij}(k-1)}\) as a function of multiple parameters. Here, n represents the test taker, i denotes a writing proficiency measure, j corresponds to the human rater, and k represents the proficiency score. The parameter \({B}_{n}\) signifies the proficiency level of test taker n (where n ranges from 1 to N). \({D}_{i}\) represents the difficulty parameter of test item i (where i ranges from 1 to L), while \({C}_{j}\) represents the severity of rater j (where j ranges from 1 to J). Additionally, \({F}_{k}\) represents the step difficulty for a test taker to move from score k-1 to k. \({P}_{{nijk}}\) refers to the probability of rater j assigning score k to test taker n for test item i. \({P}_{{nij}(k-1)}\) represents the likelihood of test taker n being assigned score k-1 by rater j for test item i. Each facet within the test is treated as an independent parameter and estimated within the same reference framework. To evaluate the consistency of scores obtained through both human and computer analysis, we utilized the Infit mean-square statistic. This statistic is a chi-square measure divided by the degrees of freedom and is weighted with information. It demonstrates higher sensitivity to unexpected patterns in responses to items near a person’s proficiency level (Linacre, 2002). Fit statistics are assessed based on predefined thresholds for acceptable fit. For the Infit MNSQ, which has a mean of 1.00, different thresholds have been suggested. Some propose stricter thresholds ranging from 0.7 to 1.3 (Bond et al. 2021), while others suggest more lenient thresholds ranging from 0.5 to 1.5 (Eckes, 2009). In this study, we adopted the criterion of 0.70–1.30 for the Infit MNSQ.
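To make the adjacent-category relation in (12) concrete, the sketch below turns a set of facet parameters into score probabilities. It only illustrates the model equation and is not the estimation procedure used in the study: the function name `category_probabilities` is hypothetical, the lowest score is treated as category 0, and the numeric values are arbitrary.

```python
import math


def category_probabilities(b, d, c, steps):
    """Score probabilities under log(P_k / P_(k-1)) = B - D - C - F_k.

    b: test-taker proficiency, d: measure difficulty, c: rater severity,
    steps: step difficulties F_1..F_K. Returns [P_0, P_1, ..., P_K].
    """
    # Cumulative log-odds relative to the lowest category (fixed at 0).
    logits = [0.0]
    for f_k in steps:
        logits.append(logits[-1] + (b - d - c - f_k))
    # Subtract the maximum before exponentiating for numerical stability.
    top = max(logits)
    weights = [math.exp(x - top) for x in logits]
    total = sum(weights)
    return [w / total for w in weights]


# Arbitrary illustrative values, not estimates from the study.
probs = category_probabilities(b=1.0, d=0.2, c=0.1, steps=[-1.0, 0.0, 1.0])
print([round(p, 3) for p in probs])
```

Each adjacent pair of probabilities returned by the function satisfies (12) by construction, and the probabilities sum to one.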

Moving forward, we can now proceed to assess the effectiveness of the 16 proposed measures based on five criteria for accurately distinguishing various levels of writing proficiency among non-native Japanese speakers. To conduct this evaluation, we utilized the development dataset from the I-JAS corpus, as described in Section Dataset . Table 4 provides a measurement report that presents the performance details of the 14 metrics under consideration. The measure separation was found to be 4.02, indicating a clear differentiation among the measures. The reliability index for the measure separation was 0.891, suggesting consistency in the measurement. Similarly, the person separation reliability index was 0.802, indicating the accuracy of the assessment in distinguishing between individuals. All 16 measures demonstrated Infit mean squares within a reasonable range, ranging from 0.76 to 1.28. The Synonym overlap/paragraph (topic) measure exhibited a relatively high outfit mean square of 1.46, although the Infit mean square falls within an acceptable range. The standard error for the measures ranged from 0.13 to 0.28, indicating the precision of the estimates.

Table 5 further illustrates the weights assigned to different linguistic measures for score prediction, with higher weights indicating stronger correlations between those measures and higher scores. Specifically, the following measures exhibited higher weights than the others: moving average type-token ratio per essay (0.0391); mean dependency distance (0.0388); mean length of clause, calculated by dividing the number of words by the number of clauses (0.0374); complex nominals per T-unit, calculated by dividing the number of complex nominals by the number of T-units (0.0379); coordinate phrases rate, calculated by dividing the number of coordinate phrases by the number of clauses (0.0325); and grammatical error rate, representing the number of errors per essay (0.0322).

Criteria (output indicator)

The criteria used to evaluate the writing ability in this study were based on CEFR, which follows a six-point scale ranging from A1 to C2. To assess the quality of Japanese writing, the scoring criteria from Table 6 were utilized. These criteria were derived from the IELTS writing standards and served as assessment guidelines and prompts for the written output.

A prompt is a question or detailed instruction that is provided to the model to obtain a proper response. After several pilot experiments, we decided to provide the measures (Section Measures of writing proficiency for nonnative Japanese) as the input prompt and use the criteria (Section Criteria (output indicator)) as the output indicator. Regarding the prompt language, given that the LLM was tasked with rating Japanese essays, would a prompt in Japanese work better Footnote 5 ? We conducted experiments comparing the performance of GPT-4 using both English and Japanese prompts. Additionally, we utilized the Japanese local model OCLL with Japanese prompts. Multiple trials were conducted using the same sample. Regardless of the prompt language used, we consistently obtained the same grading results with GPT-4, which assigned a grade of B1 to the writing sample. This suggested that GPT-4 is reliable and capable of producing consistent ratings regardless of the prompt language. On the other hand, when we used Japanese prompts with the Japanese local model OCLL, we encountered inconsistent grading results. Out of 10 attempts with OCLL, only 6 yielded consistent grading results (B1), while the remaining 4 showed different outcomes, including A1 and B2 grades. These findings indicated that the language of the prompt was not the determining factor for reliable AES. Instead, the size of the training data and the model parameters played crucial roles in achieving consistent and reliable AES results for the language model.

The following is the utilized prompt, which details all measures and requires the LLM to score the essays using holistic and trait scores.

Please evaluate Japanese essays written by Japanese learners and assign a score to each essay on a six-point scale, ranging from A1, A2, B1, B2, C1 to C2. Additionally, please provide trait scores and display the calculation process for each trait score. The scoring should be based on the following criteria:

Moving average type-token ratio.

Number of lexical words (token) divided by the total number of words per essay.

Number of sophisticated word types divided by the total number of words per essay.

Mean length of clause.

Verb phrases per T-unit.

Clauses per T-unit.

Dependent clauses per T-unit.

Complex nominals per clause.

Adverbial clauses per clause.

Coordinate phrases per clause.

Mean dependency distance.

Synonym overlap paragraph (topic and keywords).

Word2vec cosine similarity.

Connectives per essay.

Conjunctions per essay.

Number of metadiscourse markers (types) divided by the total number of words.

Number of errors per essay.

Japanese essay text

出かける前に二人が地図を見ている間に、サンドイッチを入れたバスケットに犬が入ってしまいました。それに気づかずに二人は楽しそうに出かけて行きました。やがて突然犬がバスケットから飛び出し、二人は驚きました。バスケット の 中を見ると、食べ物はすべて犬に食べられていて、二人は困ってしまいました。(ID_JJJ01_SW1)

The score of the example above was B1. Figure 3 provides an example of holistic and trait scores provided by GPT-4 (with a prompt indicating all measures) via Bing Footnote 6 .

Figure 3: Example of GPT-4 AES and feedback (with a prompt indicating all measures).
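For readers who want to reproduce the prompt layout, the following Python sketch assembles the instruction, the list of measures, and the essay text into a single string. It is an assumption-laden illustration: the function name `build_prompt`, the numbering of the criteria, and the line layout are choices made here, and the step of sending the prompt to a model is deliberately left out.

```python
INSTRUCTION = (
    "Please evaluate Japanese essays written by Japanese learners and assign "
    "a score to each essay on a six-point scale, ranging from A1, A2, B1, B2, "
    "C1 to C2. Additionally, please provide trait scores and display the "
    "calculation process for each trait score. The scoring should be based on "
    "the following criteria:"
)

CRITERIA = [
    "Moving average type-token ratio.",
    "Number of lexical words (token) divided by the total number of words per essay.",
    "Number of sophisticated word types divided by the total number of words per essay.",
    "Mean length of clause.",
    "Verb phrases per T-unit.",
    "Clauses per T-unit.",
    "Dependent clauses per T-unit.",
    "Complex nominals per clause.",
    "Adverbial clauses per clause.",
    "Coordinate phrases per clause.",
    "Mean dependency distance.",
    "Synonym overlap paragraph (topic and keywords).",
    "Word2vec cosine similarity.",
    "Connectives per essay.",
    "Conjunctions per essay.",
    "Number of metadiscourse markers (types) divided by the total number of words.",
    "Number of errors per essay.",
]


def build_prompt(essay, instruction=INSTRUCTION, criteria=CRITERIA):
    """Join the instruction, the numbered criteria, and the essay text."""
    lines = [instruction, ""]
    # Numbering the criteria is an assumption of this sketch.
    lines += [f"{idx}. {item}" for idx, item in enumerate(criteria, start=1)]
    lines += ["", "Japanese essay text", essay]
    return "\n".join(lines)


print(build_prompt("出かける前に二人が地図を見ている間に、..."))
```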

Statistical analysis

The aim of this study is to investigate the potential use of LLM for nonnative Japanese AES. It seeks to compare the scoring outcomes obtained from feature-based AES tools, which rely on conventional machine learning technology (i.e. Jess, JWriter), with those generated by AI-driven AES tools utilizing deep learning technology (BERT, GPT, OCLL). To assess the reliability of a computer-assisted annotation tool, the study initially established human-human agreement as the benchmark measure. Subsequently, the performance of the LLM-based method was evaluated by comparing it to human-human agreement.

To assess annotation agreement, the study employed the standard measures of precision, recall, and F-score (Brants 2000; Lu 2010), along with the quadratically weighted kappa (QWK), which evaluates the consistency of the annotation process. Let A and B denote the two human annotators. Comparing their annotations yields the precision, recall, and F-score defined in equations (13) to (15).

\({\rm{Recall}}(A,B)=\frac{{\rm{Number}}\,{\rm{of}}\,{\rm{identical}}\,{\rm{nodes}}\,{\rm{in}}\,A\,{\rm{and}}\,B}{{\rm{Number}}\,{\rm{of}}\,{\rm{nodes}}\,{\rm{in}}\,A}\)

\({\rm{Precision}}(A,\,B)=\frac{{\rm{Number}}\,{\rm{of}}\,{\rm{identical}}\,{\rm{nodes}}\,{\rm{in}}\,A\,{\rm{and}}\,B}{{\rm{Number}}\,{\rm{of}}\,{\rm{nodes}}\,{\rm{in}}\,B}\)

The F-score is the harmonic mean of recall and precision:

\(\text{F-score}=\frac{2\times ({\rm{Precision}}\times {\rm{Recall}})}{{\rm{Precision}}+{\rm{Recall}}}\)

The highest possible value of the F-score is 1.0, indicating perfect precision and recall, and the lowest possible value is 0, which occurs when either precision or recall is zero.
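As a worked illustration of the three measures, the sketch below treats each annotator's output as a set of nodes and counts the nodes that are identical in both. The function name `agreement` and the (start, end, label) node encoding are assumptions made for this example, and the node sets are invented.

```python
def agreement(nodes_a, nodes_b):
    """Return (precision, recall, f_score) for annotators A and B.

    Recall    = identical nodes / nodes in A
    Precision = identical nodes / nodes in B
    """
    a, b = set(nodes_a), set(nodes_b)
    identical = len(a & b)
    recall = identical / len(a) if a else 0.0
    precision = identical / len(b) if b else 0.0
    if precision + recall == 0:
        f_score = 0.0
    else:
        f_score = 2 * precision * recall / (precision + recall)
    return precision, recall, f_score


if __name__ == "__main__":
    # Invented example: nodes encoded as (start, end, label).
    annot_a = {(0, 3, "clause"), (3, 7, "clause"), (0, 7, "T-unit"), (7, 9, "clause")}
    annot_b = {(0, 3, "clause"), (3, 7, "clause"), (0, 7, "T-unit"),
               (7, 10, "clause"), (7, 10, "T-unit")}
    p, r, f = agreement(annot_a, annot_b)
    print(f"precision={p:.3f} recall={r:.3f} F={f:.3f}")
```

With three identical nodes, four nodes in A, and five nodes in B, recall is 3/4, precision is 3/5, and the F-score is their harmonic mean, 2/3.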

In accordance with Taghipour and Ng (2016), the calculation of QWK involves two steps:

Step 1: Construct a weight matrix W as follows:

\({W}_{{ij}}=\frac{{(i-j)}^{2}}{{(N-1)}^{2}}\)

Here, i represents the annotation made by the tool, j represents the annotation made by a human rater, and N denotes the total number of possible annotations. Matrix O is then computed, where \({O}_{i,j}\) is the number of items annotated i by the tool and j by the human annotator. E is the expected count matrix, which is normalized so that the sum of its elements equals the sum of the elements in O.

Step 2: With matrices O and E, the QWK is obtained as follows:

\(K=1-\frac{{\sum }_{i,j}{W}_{i,j}\,{O}_{i,j}}{{\sum }_{i,j}{W}_{i,j}\,{E}_{i,j}}\)
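The two steps can be sketched in plain Python as follows. This is an illustrative implementation under stated assumptions, not the study's script: the function name is hypothetical, annotations are assumed to be encoded as integers 0 to N − 1, and E is built as the outer product of the two rating histograms divided by the number of items, which is the definition used by Taghipour and Ng (2016) and makes the elements of E sum to the same total as those of O.

```python
def quadratic_weighted_kappa(tool, human, n_levels):
    """QWK between tool and human annotations coded as integers 0..n_levels-1."""
    if len(tool) != len(human) or not tool:
        raise ValueError("inputs must be non-empty and of equal length")
    if n_levels < 2:
        raise ValueError("at least two levels are required")
    n = len(tool)

    # Observed matrix O: O[i][j] counts items rated i by the tool and j by the human.
    observed = [[0] * n_levels for _ in range(n_levels)]
    for i, j in zip(tool, human):
        observed[i][j] += 1

    # Histograms of each rater's annotations (row and column sums of O).
    hist_tool = [sum(row) for row in observed]
    hist_human = [sum(observed[i][j] for i in range(n_levels)) for j in range(n_levels)]

    numerator = 0.0
    denominator = 0.0
    for i in range(n_levels):
        for j in range(n_levels):
            weight = (i - j) ** 2 / (n_levels - 1) ** 2
            expected = hist_tool[i] * hist_human[j] / n  # sums to n, like O
            numerator += weight * observed[i][j]
            denominator += weight * expected

    if denominator == 0:
        # Both raters used one and the same level for every item.
        return 1.0
    return 1.0 - numerator / denominator


if __name__ == "__main__":
    # Invented ratings on a six-level scale (A1..C2 coded as 0..5).
    tool = [2, 2, 3, 1, 4, 2]
    human = [2, 3, 3, 1, 4, 1]
    print(round(quadratic_weighted_kappa(tool, human, 6), 3))
```

Identical annotations give K = 1, annotations whose agreement matches the chance expectation give K = 0, and systematically opposite annotations give negative values.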

The value of the quadratic weighted kappa increases as the level of agreement improves. Further, to assess the accuracy of LLM scoring, the proportional reduction in mean squared error (PRMSE) was employed. The PRMSE approach uses the variability observed in human ratings to estimate the rater error, which is then subtracted from the variance of the human labels. This calculation provides an overall measure of agreement between the automated scores and true scores (Haberman et al. 2015; Loukina et al. 2020; Taghipour and Ng 2016). The computation of PRMSE involves the following steps:

Step 1: Calculate the mean squared errors (MSEs) for the scoring outcomes of the computer-assisted tool (MSE tool) and the human scoring outcomes (MSE human).

Step 2: Determine the PRMSE by comparing the MSE of the computer-assisted tool (MSE tool) with the MSE from human raters (MSE human), using the following formula:

\({\rm{PRMSE}}=1-\frac{({\rm{MSE}}\,{\rm{tool}})}{({\rm{MSE}}\,{\rm{human}})}=1-\frac{{\sum }_{i=1}^{n}{({{\rm{y}}}_{i}-{\hat{{\rm{y}}}}_{i})}^{2}}{{\sum }_{i=1}^{n}{({{\rm{y}}}_{i}-\hat{{\rm{y}}})}^{2}}\)

In the numerator, \({\hat{{\rm{y}}}}_{i}\) represents the scoring outcome predicted by a specific LLM-driven AES system for a given sample. The term \({{\rm{y}}}_{i}-{\hat{{\rm{y}}}}_{i}\) represents the difference between this predicted outcome and the mean value of all LLM-driven AES systems’ scoring outcomes. It quantifies the deviation of the specific LLM-driven AES system’s prediction from the average prediction of all LLM-driven AES systems. In the denominator, \({{\rm{y}}}_{i}-\hat{{\rm{y}}}\) represents the difference between the scoring outcome provided by a specific human rater for a given sample and the mean value of all human raters’ scoring outcomes. It measures the discrepancy between the specific human rater’s score and the average score given by all human raters. The PRMSE is then calculated by subtracting the ratio of the MSE tool to the MSE human from 1. PRMSE falls within the range of 0 to 1, with larger values indicating reduced errors in the LLM’s scoring compared to those of human raters. In other words, a higher PRMSE implies that the LLM’s scoring predicts the true scores more accurately (Loukina et al. 2020).

Kappa values were interpreted following Landis and Koch (1977): −1 indicates complete inconsistency and 0 indicates chance-level agreement; 0.00 to 0.20 indicates slight agreement, 0.21 to 0.40 fair agreement, 0.41 to 0.60 moderate agreement, 0.61 to 0.80 substantial agreement, and 0.81 to 1.00 almost perfect agreement.

All statistical analyses were performed using Python scripts.
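The ratio form of the PRMSE given above and the Landis and Koch (1977) bands can be sketched as follows. This is a minimal sketch under stated assumptions: the function names are hypothetical, the scores are invented, the quantity subtracted in the denominator is taken to be the mean of the human scores as the text describes, and the rater-error correction of the full PRMSE estimator (Loukina et al. 2020) is not implemented.

```python
def prmse_ratio(human, tool):
    """1 - (sum of squared tool errors) / (sum of squared deviations of human scores).

    Simplified ratio form only; it does not estimate or remove rater error.
    """
    if len(human) != len(tool) or not human:
        raise ValueError("inputs must be non-empty and of equal length")
    mean_human = sum(human) / len(human)
    sse_tool = sum((y - y_hat) ** 2 for y, y_hat in zip(human, tool))
    sse_human = sum((y - mean_human) ** 2 for y in human)
    if sse_human == 0:
        raise ValueError("human scores have zero variance")
    return 1.0 - sse_tool / sse_human


def landis_koch_label(kappa):
    """Map a kappa value to the Landis and Koch (1977) agreement category."""
    if kappa < 0:
        return "poor"  # Landis and Koch label values below 0 as poor
    if kappa <= 0.20:
        return "slight"
    if kappa <= 0.40:
        return "fair"
    if kappa <= 0.60:
        return "moderate"
    if kappa <= 0.80:
        return "substantial"
    return "almost perfect"


if __name__ == "__main__":
    # Invented scores on a numeric scale, for illustration only.
    human_scores = [1, 2, 3, 4, 5]
    tool_scores = [1, 2, 3, 4, 4]
    print(prmse_ratio(human_scores, tool_scores))
    print(landis_koch_label(0.65))
```

In the invented example the tool misses one essay by one point, so the squared tool error is 1 against a total squared deviation of 10 in the human scores, giving a ratio-form value of 0.9.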

Results and discussion

Annotation reliability of the LLM

This section focuses on assessing the reliability of the LLM’s annotation and scoring capabilities. To evaluate the reliability, several tests were conducted simultaneously, aiming to achieve the following objectives:

Assess the LLM’s ability to differentiate between test takers with varying levels of oral proficiency.

Determine the level of agreement between the annotations and scoring performed by the LLM and those done by human raters.

The evaluation of the results encompassed several metrics, including: precision, recall, F-Score, quadratically-weighted kappa, proportional reduction of mean squared error, Pearson correlation, and multi-faceted Rasch measurement.

Inter-annotator agreement (human–human annotator agreement)

We began with an agreement test between two human annotators. Two trained annotators were recruited to annotate the measures in the writing task data. A total of 714 scripts were used as test data. Each analysis lasted 300–360 min. Inter-annotator agreement was evaluated using the standard measures of precision, recall, and F-score, together with QWK. Table 7 presents the inter-annotator agreement for the various indicators. As shown, the inter-annotator agreement was fairly high, with F-scores ranging from 0.666 for grammatical errors to 1.0 for sentence and word number.

The QWK analysis further confirmed the inter-annotator agreement. The QWK values ranged from 0.695 (p = 0.001) for synonym overlap number (keyword) and grammatical errors to 0.950 (p = 0.000) for sentence and word number.

Agreement of annotation outcomes between human and LLM

To evaluate the consistency between human annotators and LLM annotators (BERT, GPT, OCLL) across the indices, the same test was conducted. The inter-annotator agreement (F-score) between LLM and human annotation is reported in Appendices B–D. The F-scores ranged from 0.706 for the number of grammatical errors (OCLL-human) to a perfect 1.000 for sentences, clauses, T-units, and words (GPT-human). These findings were further supported by the QWK analysis, which showed agreement levels ranging from 0.807 (p = 0.001) for metadiscourse markers (OCLL-human) to 0.962 (p = 0.000) for words (GPT-human). The findings demonstrate that the LLM annotation achieved a high level of accuracy in identifying measurement units and counts.

Reliability of LLM-driven AES’s scoring and discriminating proficiency levels

This section examines the reliability of the LLM-driven AES scoring through a comparison of the scoring outcomes produced by human raters and the LLM (Reliability of LLM-driven AES scoring). It also assesses the effectiveness of the LLM-based AES system in differentiating participants with varying proficiency levels (Reliability of LLM-driven AES discriminating proficiency levels).

Reliability of LLM-driven AES scoring

Table 8 summarizes the QWK coefficients between the scores computed by the human raters and GPT-4 for the individual essays from I-JAS (see Footnote 7). As shown, the QWK across measures ranged from k = 0.644 for word2vec cosine similarity to k = 0.819 for lexical density (number of lexical words (tokens)/number of words per essay). Table 9 further presents the Pearson correlations between the 16 writing proficiency measures scored by human raters and GPT-4 for the individual essays. The correlations ranged from 0.672 for syntactic complexity to 0.734 for grammatical accuracy. The correlations between the writing proficiency scores assigned by human raters and the BERT-based AES system ranged from 0.661 for syntactic complexity to 0.713 for grammatical accuracy. The correlations between the writing proficiency scores given by human raters and the OCLL-based AES system ranged from 0.654 for cohesion to 0.721 for grammatical accuracy. These findings indicate alignment between the assessments made by human raters and both the BERT-based and OCLL-based AES systems across various aspects of writing proficiency.
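For reference, the Pearson correlation between human and model scores on a single measure can be computed as in the sketch below. The function name `pearson_r` is hypothetical and the score lists are invented; they are not values from Table 9.

```python
import math


def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length score lists."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("inputs must have equal length of at least 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)


if __name__ == "__main__":
    # Invented trait scores for six essays, for illustration only.
    human = [3.0, 4.5, 2.0, 5.0, 3.5, 4.0]
    model = [2.5, 4.0, 2.5, 5.0, 3.0, 4.5]
    print(round(pearson_r(human, model), 3))
```

Unlike QWK, the Pearson coefficient is insensitive to a constant offset or rescaling between the two sets of scores, which is one reason to report both.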

Reliability of LLM-driven AES discriminating proficiency levels

After validating the reliability of the LLM’s annotation and scoring, the subsequent objective was to evaluate its ability to distinguish between various proficiency levels. For this analysis, a dataset of 686 individual essays was utilized. Table 10 presents a sample of the results, summarizing the means, standard deviations, and the outcomes of the one-way ANOVAs based on the measures assessed by the GPT-4 model. A post hoc multiple comparison test, specifically the Bonferroni test, was conducted to identify any potential differences between pairs of levels.
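The analysis described above can be illustrated with a small pure-Python sketch that computes the one-way ANOVA F statistic for one measure across three proficiency levels and applies the Bonferroni adjustment to a set of pairwise p-values. The function names, the measure values, and the p-values are all invented for illustration. Obtaining the p-value of the F statistic requires the F distribution, for which a library routine such as `scipy.stats.f_oneway` would normally be used.

```python
def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a list of groups of observations."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    if k < 2 or n_total <= k:
        raise ValueError("need at least two groups and more observations than groups")
    means = [sum(g) / len(g) for g in groups]
    grand_mean = sum(sum(g) for g in groups) / n_total

    # Between-group and within-group sums of squares.
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)

    df_between = k - 1
    df_within = n_total - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


def bonferroni_adjust(p_values):
    """Multiply each p-value by the number of comparisons, capping at 1."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]


if __name__ == "__main__":
    # Invented values of one measure at three proficiency levels.
    primary = [1.0, 2.0, 3.0]
    intermediate = [2.0, 3.0, 4.0]
    advanced = [5.0, 6.0, 7.0]
    print(one_way_anova_f([primary, intermediate, advanced]))

    # Invented unadjusted p-values for the three pairwise comparisons.
    print(bonferroni_adjust([0.01, 0.04, 0.5]))
```

With three levels there are three pairwise comparisons, so each unadjusted p-value is multiplied by three before being compared with the significance threshold.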

As the results reveal, seven measures showed a linear upward or downward progression across the three proficiency levels. These are marked in bold in Table 10 and comprise one measure of lexical richness, i.e. MATTR (lexical diversity); four measures of syntactic complexity, i.e. MDD (mean dependency distance), MLC (mean length of clause), CNT (complex nominals per T-unit), and CPC (coordinate phrases rate); one cohesion measure, i.e. word2vec cosine similarity; and GER (grammatical error rate). Regarding the ability of the sixteen measures to distinguish adjacent proficiency levels, the Bonferroni tests indicated statistically significant differences between the primary level and the intermediate level for MLC and GER. One measure of lexical richness, namely LD, along with four measures of syntactic complexity (VPT, CT, DCT, ACC), two measures of cohesion (SOPT, SOPK), and one measure of content elaboration (IMM), exhibited statistically significant differences between proficiency levels. However, these differences did not show a linear progression between adjacent proficiency levels. No significant difference was observed in lexical sophistication between proficiency levels.

To summarize, our study aimed to evaluate the reliability and differentiation capabilities of the LLM-driven AES method. For the first objective, we assessed the LLM’s ability to differentiate between test takers with varying levels of oral proficiency using precision, recall, F-Score, and quadratically-weighted kappa. Regarding the second objective, we compared the scoring outcomes generated by human raters and the LLM to determine the level of agreement. We employed quadratically-weighted kappa and Pearson correlations to compare the 16 writing proficiency measures for the individual essays. The results confirmed the feasibility of using the LLM for annotation and scoring in AES for nonnative Japanese. As a result, Research Question 1 has been addressed.

Comparison of BERT-, GPT-, OCLL-based AES, and linguistic-feature-based computation methods

This section compares the effectiveness of five AES methods for nonnative Japanese writing: three LLM-driven approaches (BERT, GPT, and OCLL) and two linguistic feature-based approaches (Jess and JWriter). The comparison was made by comparing the ratings obtained from each approach with human ratings. All ratings were derived from the dataset introduced in the Dataset section. The agreement between the automated methods and human ratings was assessed using QWK and PRMSE. The performance of each approach is summarized in Table 11.

The QWK coefficient values indicate that LLMs (GPT, BERT, OCLL) and human rating outcomes demonstrated higher agreement compared to feature-based AES methods (Jess and JWriter) in assessing writing proficiency criteria, including lexical richness, syntactic complexity, content, and grammatical accuracy. Among the LLMs, the GPT-4 driven AES and human rating outcomes showed the highest agreement in all criteria, except for syntactic complexity. The PRMSE values suggest that the GPT-based method outperformed linguistic feature-based methods and other LLM-based approaches. Moreover, an interesting finding emerged during the study: the agreement coefficient between GPT-4 and human scoring was even higher than the agreement between different human raters themselves. This discovery highlights the advantage of GPT-based AES over human rating. Ratings involve a series of processes, including reading the learners’ writing, evaluating the content and language, and assigning scores. Within this chain of processes, various biases can be introduced, stemming from factors such as rater biases, test design, and rating scales. These biases can impact the consistency and objectivity of human ratings. GPT-based AES may benefit from its ability to apply consistent and objective evaluation criteria. By prompting the GPT model with detailed writing scoring rubrics and linguistic features, potential biases in human ratings can be mitigated. The model follows a predefined set of guidelines and does not possess the same subjective biases that human raters may exhibit. This standardization in the evaluation process contributes to the higher agreement observed between GPT-4 and human scoring. Section Prompt strategy of the study delves further into the role of prompts in the application of LLMs to AES. It explores how the choice and implementation of prompts can impact the performance and reliability of LLM-based AES methods. 
Furthermore, it is important to acknowledge the strengths of the local model, i.e. the Japanese local model OCLL, which excels in processing certain idiomatic expressions. Nevertheless, our analysis indicated that GPT-4 surpasses local models in AES. This superior performance can be attributed to the larger parameter size of GPT-4, estimated to be between 500 billion and 1 trillion, which exceeds the sizes of both BERT and the local model OCLL.

Prompt strategy

In the context of prompt strategy, Mizumoto and Eguchi ( 2023 ) conducted a study where they applied the GPT-3 model to automatically score English essays in the TOEFL test. They found that the accuracy of the GPT model alone was moderate to fair. However, when they incorporated linguistic measures such as cohesion, syntactic complexity, and lexical features alongside the GPT model, the accuracy significantly improved. This highlights the importance of prompt engineering and providing the model with specific instructions to enhance its performance. In this study, a similar approach was taken to optimize the performance of LLMs. GPT-4, which outperformed BERT and OCLL, was selected as the candidate model. Model 1 was used as the baseline, representing GPT-4 without any additional prompting. Model 2, on the other hand, involved GPT-4 prompted with 16 measures that included scoring criteria, efficient linguistic features for writing assessment, and detailed measurement units and calculation formulas. The remaining models (Models 3 to 18) utilized GPT-4 prompted with individual measures. The performance of these 18 different models was assessed using the output indicators described in Section Criteria (output indicator) . By comparing the performances of these models, the study aimed to understand the impact of prompt engineering on the accuracy and effectiveness of GPT-4 in AES tasks.

Based on the PRMSE scores presented in Fig. 4 , it was observed that Model 1, representing GPT-4 without any additional prompting, achieved a fair level of performance. However, Model 2, which utilized GPT-4 prompted with all measures, outperformed all other models in terms of PRMSE score, achieving a score of 0.681. These results indicate that the inclusion of specific measures and prompts significantly enhanced the performance of GPT-4 in AES. Among the measures, syntactic complexity was found to play a particularly significant role in improving the accuracy of GPT-4 in assessing writing quality. Following that, lexical diversity emerged as another important factor contributing to the model’s effectiveness. The study suggests that a well-prompted GPT-4 can serve as a valuable tool to support human assessors in evaluating writing quality. By utilizing GPT-4 as an automated scoring tool, the evaluation biases associated with human raters can be minimized. This has the potential to empower teachers by allowing them to focus on designing writing tasks and guiding writing strategies, while leveraging the capabilities of GPT-4 for efficient and reliable scoring.

Figure 4. PRMSE scores of the 18 AES models.

This study aimed to investigate two main research questions: the feasibility of utilizing LLMs for AES and the impact of prompt engineering on the application of LLMs in AES.

To address the first objective, the study compared the effectiveness of five different models: GPT, BERT, the Japanese local LLM (OCLL), and two conventional machine learning-based AES tools (Jess and JWriter). The PRMSE values indicated that the GPT-4-based method outperformed other LLMs (BERT, OCLL) and linguistic feature-based computational methods (Jess and JWriter) across various writing proficiency criteria. Furthermore, the agreement coefficient between GPT-4 and human scoring surpassed the agreement among human raters themselves, highlighting the potential of using the GPT-4 tool to enhance AES by reducing biases and subjectivity, saving time, labor, and cost, and providing valuable feedback for self-study. Regarding the second goal, the role of prompt design was investigated by comparing 18 models, including a baseline model, a model prompted with all measures, and 16 models prompted with one measure at a time. GPT-4, which outperformed BERT and OCLL, was selected as the candidate model. The PRMSE scores of the models showed that GPT-4 prompted with all measures achieved the best performance, surpassing the baseline and other models.

In conclusion, this study has demonstrated the potential of LLMs in supporting human rating in assessments. By incorporating automation, we can save time and resources while reducing biases and subjectivity inherent in human rating processes. Automated language assessments offer the advantage of accessibility, providing equal opportunities and economic feasibility for individuals who lack access to traditional assessment centers or necessary resources. LLM-based language assessments provide valuable feedback and support to learners, aiding in the enhancement of their language proficiency and the achievement of their goals. This personalized feedback can cater to individual learner needs, facilitating a more tailored and effective language-learning experience.

There are three important areas that merit further exploration. First, prompt engineering requires attention to ensure optimal performance of LLM-based AES across different language types. This study revealed that GPT-4, when prompted with all measures, outperformed models prompted with fewer measures. Therefore, investigating and refining prompt strategies can enhance the effectiveness of LLMs in automated language assessments. Second, it is crucial to explore the application of LLMs in second-language assessment and learning for oral proficiency, as well as their potential in under-resourced languages. Recent advancements in self-supervised machine learning techniques have significantly improved automatic speech recognition (ASR) systems, opening up new possibilities for creating reliable ASR systems, particularly for under-resourced languages with limited data. However, challenges persist in the field of ASR. First, ASR assumes correct word pronunciation for automatic pronunciation evaluation, which proves challenging for learners in the early stages of language acquisition due to diverse accents influenced by their native languages. Accurately segmenting short words becomes problematic in such cases. Second, developing precise audio-text transcriptions for languages with non-native accented speech poses a formidable task. Last, assessing oral proficiency levels involves capturing various linguistic features, including fluency, pronunciation, accuracy, and complexity, which are not easily captured by current NLP technology.

Data availability

The dataset used in this study was obtained from the International Corpus of Japanese as a Second Language (I-JAS). The data are available at https://www2.ninjal.ac.jp/jll/lsaj/ihome2.html.

Footnote 1: J-CAT and TTBJ are two computerized adaptive tests used to assess Japanese language proficiency. SPOT is a specific component of the TTBJ test. J-CAT: https://www.j-cat2.org/html/ja/pages/interpret.html. SPOT: https://ttbj.cegloc.tsukuba.ac.jp/p1.html#SPOT.

Footnote 2: The study utilized a prompt-based GPT-4 model, developed by OpenAI, which has an impressive architecture with 1.8 trillion parameters across 120 layers. GPT-4 was trained on a vast dataset of 13 trillion tokens, using two stages: initial training on internet text datasets to predict the next token, and subsequent fine-tuning through reinforcement learning from human feedback.

Footnote 3: https://www2.ninjal.ac.jp/jll/lsaj/ihome2-en.html.

Footnote 4: http://jhlee.sakura.ne.jp/JEV/ by Japanese Learning Dictionary Support Group 2015.

Footnote 5: We express our sincere gratitude to the reviewer for bringing this matter to our attention.

Footnote 6: On February 7, 2023, Microsoft began rolling out a major overhaul to Bing that included a new chatbot feature based on OpenAI’s GPT-4 (Bing.com).

Footnote 7: Appendices E–F present the QWK coefficients between the scores computed by the human raters and the BERT and OCLL models.

Attali Y, Burstein J (2006) Automated essay scoring with e-rater® V.2. J. Technol., Learn. Assess., 4

Barkaoui K, Hadidi A (2020) Assessing Change in English Second Language Writing Performance (1st ed.). Routledge, New York. https://doi.org/10.4324/9781003092346

Bentz C, Tatyana R, Koplenig A, Tanja S (2016) A comparison between morphological complexity measures: Typological data vs. language corpora. In Proceedings of the workshop on computational linguistics for linguistic complexity (CL4LC), 142–153. Osaka, Japan: The COLING 2016 Organizing Committee

Bond TG, Yan Z, Heene M (2021) Applying the Rasch model: Fundamental measurement in the human sciences (4th ed). Routledge

Brants T (2000) Inter-annotator agreement for a German newspaper corpus. Proceedings of the Second International Conference on Language Resources and Evaluation (LREC’00), Athens, Greece, 31 May-2 June, European Language Resources Association

Brown TB, Mann B, Ryder N, et al. (2020) Language models are few-shot learners. Advances in Neural Information Processing Systems, Online, 6–12 December, Curran Associates, Inc., Red Hook, NY

Burstein J (2003) The E-rater scoring engine: Automated essay scoring with natural language processing. In Shermis MD and Burstein JC (ed) Automated Essay Scoring: A Cross-Disciplinary Perspective. Lawrence Erlbaum Associates, Mahwah, NJ

Čech R, Miroslav K (2018) Morphological richness of text. In Masako F, Václav C (ed) Taming the corpus: From inflection and lexis to interpretation, 63–77. Cham, Switzerland: Springer Nature

Çöltekin Ç, Taraka R (2018) Exploiting Universal Dependencies treebanks for measuring morphosyntactic complexity. In Aleksandrs B, Christian B (ed), Proceedings of first workshop on measuring language complexity, 1–7. Torun, Poland

Crossley SA, Cobb T, McNamara DS (2013) Comparing count-based and band-based indices of word frequency: Implications for active vocabulary research and pedagogical applications. System 41:965–981. https://doi.org/10.1016/j.system.2013.08.002

Crossley SA, McNamara DS (2016) Say more and be more coherent: How text elaboration and cohesion can increase writing quality. J. Writ. Res. 7:351–370

CyberAgent Inc (2023) Open-Calm series of Japanese language models. Retrieved from: https://www.cyberagent.co.jp/news/detail/id=28817

Devlin J, Chang MW, Lee K, Toutanova K (2019) BERT: Pre-training of deep bidirectional transformers for language understanding. Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics, Minneapolis, Minnesota, 2–7 June, pp. 4171–4186. Association for Computational Linguistics

Diez-Ortega M, Kyle K (2023) Measuring the development of lexical richness of L2 Spanish: a longitudinal learner corpus study. Studies in Second Language Acquisition 1-31

Eckes T (2009) On common ground? How raters perceive scoring criteria in oral proficiency testing. In Brown A, Hill K (ed) Language testing and evaluation 13: Tasks and criteria in performance assessment (pp. 43–73). Peter Lang Publishing

Elliot S (2003) IntelliMetric: from here to validity. In: Shermis MD, Burstein JC (ed) Automated Essay Scoring: A Cross-Disciplinary Perspective. Lawrence Erlbaum Associates, Mahwah, NJ

Engber CA (1995) The relationship of lexical proficiency to the quality of ESL compositions. J. Second Lang. Writ. 4:139–155

Garner J, Crossley SA, Kyle K (2019) N-gram measures and L2 writing proficiency. System 80:176–187. https://doi.org/10.1016/j.system.2018.12.001

Haberman SJ (2008) When can subscores have value? J. Educat. Behav. Stat., 33:204–229

Haberman SJ, Yao L, Sinharay S (2015) Prediction of true test scores from observed item scores and ancillary data. Brit. J. Math. Stat. Psychol. 68:363–385

Halliday MAK (1985) Spoken and Written Language. Deakin University Press, Melbourne, Australia

Hirao R, Arai M, Shimanaka H et al. (2020) Automated essay scoring system for nonnative Japanese learners. Proceedings of the 12th Conference on Language Resources and Evaluation (LREC 2020), pp. 1250–1257. European Language Resources Association

Hunt KW (1966) Recent Measures in Syntactic Development. Elementary English, 43(7), 732–739. http://www.jstor.org/stable/41386067

Ishioka T (2001) About e-rater, a computer-based automatic scoring system for essays [Konpyūta ni yoru essei no jidō saiten shisutemu e − rater ni tsuite]. University Entrance Examination. Forum [Daigaku nyūshi fōramu] 24:71–76

Hochreiter S, Schmidhuber J (1997) Long short-term memory. Neural Comput. 9(8):1735–1780

Ishioka T, Kameda M (2006) Automated Japanese essay scoring system based on articles written by experts. Proceedings of the 21st International Conference on Computational Linguistics and 44th Annual Meeting of the Association for Computational Linguistics, Sydney, Australia, 17–18 July 2006, pp. 233-240. Association for Computational Linguistics, USA

Japan Foundation (2021) Retrieved from: https://www.jpf.gp.jp/j/project/japanese/survey/result/dl/survey2021/all.pdf

Jarvis S (2013a) Defining and measuring lexical diversity. In Jarvis S, Daller M (ed) Vocabulary knowledge: Human ratings and automated measures (Vol. 47, pp. 13–44). John Benjamins. https://doi.org/10.1075/sibil.47.03ch1

Jarvis S (2013b) Capturing the diversity in lexical diversity. Lang. Learn. 63:87–106. https://doi.org/10.1111/j.1467-9922.2012.00739.x

Jiang J, Ouyang J, Liu H (2019) Interlanguage: A perspective of quantitative linguistic typology. Lang. Sci. 74:85–97

Kim M, Crossley SA, Kyle K (2018) Lexical sophistication as a multidimensional phenomenon: Relations to second language lexical proficiency, development, and writing quality. Mod. Lang. J. 102(1):120–141. https://doi.org/10.1111/modl.12447

Kojima T, Gu S, Reid M et al. (2022) Large language models are zero-shot reasoners. Advances in Neural Information Processing Systems, New Orleans, LA, 29 November-1 December, Curran Associates, Inc., Red Hook, NY

Kyle K, Crossley SA (2015) Automatically assessing lexical sophistication: Indices, tools, findings, and application. TESOL Q 49:757–786

Kyle K, Crossley SA, Berger CM (2018) The tool for the automatic analysis of lexical sophistication (TAALES): Version 2.0. Behav. Res. Methods 50:1030–1046. https://doi.org/10.3758/s13428-017-0924-4

Kyle K, Crossley SA, Jarvis S (2021) Assessing the validity of lexical diversity using direct judgements. Lang. Assess. Q. 18:154–170. https://doi.org/10.1080/15434303.2020.1844205

Landauer TK, Laham D, Foltz PW (2003) Automated essay scoring and annotation of essays with the Intelligent Essay Assessor. In Shermis MD, Burstein JC (ed), Automated Essay Scoring: A Cross-Disciplinary Perspective. Lawrence Erlbaum Associates, Mahwah, NJ

Landis JR, Koch GG (1977) The measurement of observer agreement for categorical data. Biometrics 159–174

Laufer B, Nation P (1995) Vocabulary size and use: Lexical richness in L2 written production. Appl. Linguist. 16:307–322. https://doi.org/10.1093/applin/16.3.307

Lee J, Hasebe Y (2017) jWriter Learner Text Evaluator, URL: https://jreadability.net/jwriter/

Lee J, Kobayashi N, Sakai T, Sakota K (2015) A Comparison of SPOT and J-CAT Based on Test Analysis [Tesuto bunseki ni motozuku ‘SPOT’ to ‘J-CAT’ no hikaku]. Research on the Acquisition of Second Language Japanese [Dainigengo to shite no nihongo no shūtoku kenkyū] (18) 53–69

Li W, Yan J (2021) Probability distribution of dependency distance based on a Treebank of Japanese EFL Learners’ Interlanguage. J. Quant. Linguist. 28(2):172–186. https://doi.org/10.1080/09296174.2020.1754611

Linacre JM (2002) Optimizing rating scale category effectiveness. J. Appl. Meas. 3(1):85–106

Linacre JM (1994) Constructing measurement with a Many-Facet Rasch Model. In Wilson M (ed) Objective measurement: Theory into practice, Volume 2 (pp. 129–144). Norwood, NJ: Ablex

Liu H (2008) Dependency distance as a metric of language comprehension difficulty. J. Cognitive Sci. 9:159–191

Liu H, Xu C, Liang J (2017) Dependency distance: A new perspective on syntactic patterns in natural languages. Phys. Life Rev. 21. https://doi.org/10.1016/j.plrev.2017.03.002

Loukina A, Madnani N, Cahill A, et al. (2020) Using PRMSE to evaluate automated scoring systems in the presence of label noise. Proceedings of the Fifteenth Workshop on Innovative Use of NLP for Building Educational Applications, Seattle, WA, USA → Online, 10 July, pp. 18–29. Association for Computational Linguistics

Lu X (2010) Automatic analysis of syntactic complexity in second language writing. Int. J. Corpus Linguist. 15:474–496

Lu X (2012) The relationship of lexical richness to the quality of ESL learners’ oral narratives. Mod. Lang. J. 96:190–208

Lu X (2017) Automated measurement of syntactic complexity in corpus-based L2 writing research and implications for writing assessment. Lang. Test. 34:493–511

Lu X, Hu R (2022) Sense-aware lexical sophistication indices and their relationship to second language writing quality. Behav. Res. Method. 54:1444–1460. https://doi.org/10.3758/s13428-021-01675-6

Ministry of Health, Labor, and Welfare of Japan (2022) Retrieved from: https://www.mhlw.go.jp/stf/newpage_30367.html

Mizumoto A, Eguchi M (2023) Exploring the potential of using an AI language model for automated essay scoring. Res. Methods Appl. Linguist. 3:100050

Okgetheng B, Takeuchi K (2024) Estimating Japanese Essay Grading Scores with Large Language Models. Proceedings of 30th Annual Conference of the Language Processing Society in Japan, March 2024

Ortega L (2015) Second language learning explained? SLA across 10 contemporary theories. In VanPatten B, Williams J (ed) Theories in Second Language Acquisition: An Introduction

Rae JW, Borgeaud S, Cai T, et al. (2021) Scaling Language Models: Methods, Analysis & Insights from Training Gopher. ArXiv, abs/2112.11446

Read J (2000) Assessing vocabulary. Cambridge University Press. https://doi.org/10.1017/CBO9780511732942

Rudner LM, Liang T (2002) Automated Essay Scoring Using Bayes’ Theorem. J. Technol., Learning and Assessment, 1 (2)

Sakoda K, Hosoi Y (2020) Accuracy and complexity of Japanese Language usage by SLA learners in different learning environments based on the analysis of I-JAS, a learners’ corpus of Japanese as L2. Math. Linguist. 32(7):403–418. https://doi.org/10.24701/mathling.32.7_403

Suzuki N (1999) Summary of survey results regarding comprehensive essay questions. Final report of “Joint Research on Comprehensive Examinations for the Aim of Evaluating Applicability to Each Specialized Field of Universities” for 1996-2000 [shōronbun sōgō mondai ni kansuru chōsa kekka no gaiyō. Heisei 8 - Heisei 12-nendo daigaku no kaku senmon bun’ya e no tekisei no hyōka o mokuteki to suru sōgō shiken no arikata ni kansuru kyōdō kenkyū’ saishū hōkoku-sho]. University Entrance Examination Section Center Research and Development Department [Daigaku nyūshi sentā kenkyū kaihatsubu], 21–32

Taghipour K, Ng HT (2016) A neural approach to automated essay scoring. Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing, Austin, Texas, 1–5 November, pp. 1882–1891. Association for Computational Linguistics

Takeuchi K, Ohno M, Motojin K, Taguchi M, Inada Y, Iizuka M, Abo T, Ueda H (2021) Development of essay scoring methods based on reference texts with construction of research-available Japanese essay data. In IPSJ J 62(9):1586–1604

Ure J (1971) Lexical density: A computational technique and some findings. In Coultard M (ed) Talking about Text. English Language Research, University of Birmingham, Birmingham, England

Vaswani A, Shazeer N, Parmar N, et al. (2017) Attention is all you need. In Advances in Neural Information Processing Systems, Long Beach, CA, 4–7 December, pp. 5998–6008, Curran Associates, Inc., Red Hook, NY

Watanabe H, Taira Y, Inoue Y (1988) Analysis of essay evaluation data [Shōronbun hyōka dēta no kaiseki]. Bulletin of the Faculty of Education, University of Tokyo [Tōkyōdaigaku kyōiku gakubu kiyō], Vol. 28, 143–164

Yao S, Yu D, Zhao J, et al. (2023) Tree of thoughts: Deliberate problem solving with large language models. Advances in Neural Information Processing Systems, 36

Zenker F, Kyle K (2021) Investigating minimum text lengths for lexical diversity indices. Assess. Writ. 47:100505. https://doi.org/10.1016/j.asw.2020.100505

Zhang Y, Warstadt A, Li X, et al. (2021) When do you need billions of words of pretraining data? Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing, Online, pp. 1112-1125. Association for Computational Linguistics. https://doi.org/10.18653/v1/2021.acl-long.90

Download references

This research was funded by the National Foundation of Social Sciences (22BYY186), awarded to Wenchao Li.

Author information

Authors and affiliations.

Department of Japanese Studies, Zhejiang University, Hangzhou, China

Department of Linguistics and Applied Linguistics, Zhejiang University, Hangzhou, China


Contributions

Wenchao Li was responsible for conceptualization, validation, formal analysis, investigation, data curation, visualization, and writing the draft. Haitao Liu was responsible for supervision.

Corresponding author

Correspondence to Wenchao Li.

Ethics declarations

Competing interests.

The authors declare no competing interests.

Ethical approval

Ethical approval was not required as the study did not involve human participants.

Informed consent

This article does not contain any studies with human participants performed by any of the authors.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary information

Supplemental material file #1

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.


About this article

Cite this article.

Li, W., Liu, H. Applying large language models for automated essay scoring for non-native Japanese. Humanit Soc Sci Commun 11, 723 (2024). https://doi.org/10.1057/s41599-024-03209-9


Received: 02 February 2024

Accepted: 16 May 2024

Published: 03 June 2024

DOI: https://doi.org/10.1057/s41599-024-03209-9


FIFA

Kazakhstan become first Europeans to reach Uzbekistan 2024

Birzhan Orazov and Douglas Junior got the goals as Kazakhstan beat the Netherlands to reach next year's FIFA Futsal World Cup.


Kazakhstan beat Netherlands 2-0 to reach Uzbekistan 2024

Birzhan Orazov and Douglas Junior got the goals

Kazakhstan reached the semi-finals at the last FIFA Futsal World Cup

Kazakhstan have become the first European nation to qualify for the FIFA Futsal World Cup Uzbekistan 2024™. The Hawks only required a draw at home to the Netherlands to guarantee finishing first in Group A in the UEFA qualifiers with a game to spare, but won 2-0 to make it maximum points from a possible 15. Birzhan Orazov opened the scoring after 22 seconds, before Douglas Junior doubled the advantage just after the six-minute mark, and that's how it finished.

Kazakhstan have emerged as a global force over the past decade, largely thanks to the naturalisation of Leo Higuita, Douglas Junior, Taynan and Leo Jaragua. At the last World Cup, they thrashed Thailand 7-0 in the last 16 and came from two goals down to eliminate IR Iran in the quarter-finals. Kazakhstan led eventual champions Portugal in the semi-finals, only to lose on penalties, before a thrilling defeat by Brazil in the third-place play-off.


The Hawks also finished third and fourth at the UEFA Futsal EUROs of 2016 and 2018 respectively, while Almaty outfit Kairat won the UEFA Futsal Champions League twice, upsetting Dynamo Moscow and Barcelona in the 2013 and 2015 finals respectively. Kazakhstan shares a border with Uzbekistan. The next Futsal World Cup will kick off in the latter country in September 2024.


Students, some barely adolescent and some well into adulthood, come from all over the world to study at the Curtis Institute of Music in Philadelphia.

They study with nearly monastic focus, with the numbers and skill to operate as a world-class orchestra and opera company.

But they’re still young people growing up, experiencing triumphs and struggles for the first time, just in an extraordinary environment.


At This School, the Students Live Entirely for Music

For a year, we followed five Curtis Institute of Music students as they made friends, pushed their artistry and stared down an uncertain future.


Photographs by James Estrin

Text by Joshua Barone

Reporting from Philadelphia

Delfin Demiray had packed too much. She was leaving her home in Ankara, Turkey, for the Curtis Institute of Music in Philadelphia. An 18-year-old who had never been to the United States, she didn’t know what to expect.

As she prepared for her flight in August, loading her suitcases with clothes and books, she was still surprised at the turn her life had taken. Demiray had played piano since she was 8, and had a gift for reproducing music she heard on TV at the keyboard; she also liked to improvise with friends and write melodies of her own. But she didn’t think of herself as a composer until a year ago, when she applied to Curtis and, to her shock, was accepted.

Her move to the United States would make her parents empty-nesters, but she tried not to think too much about the sadness of saying goodbye. “It’s just how life is,” said Demiray, now 19. “I feel like they are living their dreams through me.”

Her story is not so rare at Curtis, an extremely selective, tuition-free school whose roughly 150 students come from around the world to study with almost monastic focus. Even among conservatories, it is exceptional, with a wide age range — from preadolescence to post-baccalaureate adulthood — and a personalized approach, of schedules and repertoire, for musicians who live almost entirely for their art.

“We know what it feels like to have to go to bed early on a Saturday night because you have to wake up Sunday morning for a lesson,” said Dillon Scott, a viola student, “and we all know what it feels like to have a performance that was objectively good, but still could’ve been better.”

Some of the students are already professionals who perform outside school, as well as on the campus of Curtis, which maintains a full orchestra, an opera program and chamber music groups. Many of the musicians form friendships that lead to collaborations that endure throughout their careers. The list of alumni reads like a musical hall of fame, with titans like Leonard Bernstein and current stars like Lang Lang and Hilary Hahn.

During the 2023-24 year, The New York Times followed five students as they settled into new lives, pushed their artistry and planned as much as they could for an uncertain future.


SCOTT, A 20-YEAR-OLD from Lansdale, Pa., about an hour away from Philadelphia, grew up determined to attend Curtis. He still feels a sense of awe as he walks into its main building, a historical mansion on Rittenhouse Square. “These four years are going to have the potential to be absolutely instrumental and life-changing,” he said. “But it’s not going to be dropped on my lap.”

Few students, even few professionals, behave like Scott. His mind is a fire hose of ambition and enterprising passion. He approaches music critically, wondering how he can use Curtis’s resources to unearth the works of overlooked, often Black, composers and bring them to audiences beyond the tired demographics of classical music.

Having already spent countless hours in the library assembling a list of about 25 composers, noting all their works and locating their scores, Scott programmed a series of on- and off-campus concerts for the fall, accompanied by talks, and brought 14 other students on board. At community performances, he smiled at the sight of security and staff from school who had come with their families, and at how visibly different the audience looked from a typical Curtis performance.

Busy with concerts, too, was a 25-year-old French soprano named Juliette Tacchino. She started the fall semester staring down her final year and auditions, but other singing opportunities quickly arose as other singers dropped out of performances. On one program, she sang the role of Sophie in a scene from “Der Rosenkavalier” under the baton of Yannick Nézet-Séguin, the music director of the Metropolitan Opera and the Philadelphia Orchestra, who teaches at Curtis.

The experience was double-edged. Tacchino, a sensitive wellspring of calm, was also occupied with being a resident coordinator at Lenfest Hall, where she took care of younger students and organized events like a trip to an animal shelter and a screening of “Maestro.” But Tacchino missed the movie because she had the flu. She had already been feeling under the weather as the stress of her added work was taking its toll, and the flu made things worse. She lost her voice several times, and even when she did get a break, visiting her boyfriend in Montreal over Thanksgiving, she was preparing for auditions.

One of Nézet-Séguin’s students was Micah Gleason, 28, an easygoing yet fiercely skilled conductor and singer, also in her final year. She lived off-campus with her partner, in an apartment outfitted with a school-provided piano, a mirror for watching herself conduct and equipment for her side gig as a photographer.

Gleason conducting and singing in Berio’s ‘Folk Songs’

Like Scott, Gleason thinks about how to push beyond the conventions of performance. For a fall concert in which she was both conducting and singing Berio’s “Folk Songs,” she brought in a lighting designer and tried to hire a movement director. (There, she was less successful.) In her free time, she started emailing people she knew to line up work after Curtis.

In the orchestra for that concert was the 17-year-old flute student Julin Cheung. He had been at Curtis since he was middle-school age, and because he was a minor, he lived with his parents, originally from Hong Kong and Kazakhstan, on Rittenhouse Square. They had moved to Philadelphia for his education from Seattle, where they still traveled during school breaks to visit family.

Cheung, an only child with a mature sensibility and wry humor, is both independent and still very much a teenager. He has friends at Curtis but often eats dinner with his parents at their apartment. His mother helps with some of the logistics of his musical life, but otherwise he manages his own time, finding the space to work on his home-school education. During the school year, he also took German lessons because the language might come in handy when he finishes at Curtis in 2025; he would like to continue his studies in Europe.

Cheung in Jolivet’s ‘Chant de Linos’

In student housing, Demiray was quickly making new friends. She was closest with her roommate, a horn player. They would gather on staircases at Lenfest with other students to sing choral music for fun. After attending a party during her first week, she joined a group to organize one of her own, a masquerade for the holidays.

During the semester, she also finished a string quartet that she had started on the flight from Turkey. As she rehearsed it, she realized how open she was to her music changing in the hands of others; it was the kind of lesson that can’t really be taught in the classroom. “It reminded me,” she said, “that everything we have in music is a matter of perspective.”

FEW CURTIS STUDENTS truly take time off during the month between semesters. Demiray, back in Ankara, read Kant and watched movies, but also continued to compose. Gleason, getting an early start on spring work, took on a conducting project at Dallas Opera. Cheung, at least, made room for catching up with friends and family in Seattle, and skiing.

Scott had a difficult time winding down from the fall semester, which he found excitingly intense; life at home, he said, was like “a vacuum.” At first, he didn’t sleep well because he felt as though he should be doing something. After a few days, he felt himself relax as he took his dog, a Rhodesian Ridgeback called Nandi, for long walks.

Tacchino went home to France, but as a resident coordinator, had to return early to prepare Lenfest for the spring semester. She had also picked up a tour in Florida, where she had never been. She saw more alligators than she would have liked, and it was unpleasantly hot, but she felt refreshed when she got back to school for more auditions and a starring role in Poulenc’s one-act opera “Les Mamelles de Tirésias.”

She had long been looking forward to that; her father, who had recently died, knew Poulenc. Tacchino grew up hearing about the composer, and listening to his music, including four-hands piano works that her parents would play. To her, the opera sounded like home.

IN THE NEW SEMESTER, Cheung went on tour with other Curtis musicians. He liked the independence of it, which felt like a taste of professional life, for better or worse: Not having to worry about school, he could focus on music, even with a hectic schedule. One concert in Florida ended around 10 p.m.; he and his fellow students got back to their hotel at 11, fell asleep around midnight, and were ready to board a shuttle at 4:50 a.m. to catch a flight to Dallas. But during downtime, they would go to a beach, or when the weather was bad, play cards in their hotel rooms.

After an entrepreneurial fall, Scott shifted his attention to technique. He had been gently directed to do so by his teachers, who include Curtis’s president, Roberto Díaz. Scott believed, he said, that “the better I can play the viola, the more credibility I’m going to have to advocate for the things I want to do.”

Scott playing George Walker’s Viola Sonata

He also relaxed a little by reading at night, taking up the Ray Bradbury stories he had loved as a child. In practice rooms, though, he was hard at work on a Bach suite and George Walker’s Viola Sonata, from 1989. He reached out to Walker’s son, and tracked down the violist who had first recorded the piece and a scholar who had written about it. Scott repeatedly returned to the score to mark it up; he thought about what story Walker was trying to tell with the music. The school decided to record his performance, and asked Scott to bring it back for a new-music concert next year.

THE WEEK BEFORE “Les Mamelles de Tirésias” opened, Tacchino tested positive for Covid-19. After months of unreliable health, and audition after audition, she was feeling overwhelmed. She was frustrated by the mixed messages she seemed to be receiving: that she was so young, that she was starting to get old, that she sounded great, that she wasn’t quite right for something. A comment by the tenor Matthew Polenzani, who gave a talk at the school, resonated with her: “He said, ‘There are days when you’re going to have the most incredible audition of your life, and you’re not going to get anything, and another day, you’re going to sing the crappiest audition of your life and get four gigs.’”

Tacchino in ‘Les Mamelles de Tirésias’

Tacchino’s optimism held alongside her determination. She recovered in time for the Poulenc premiere, and decided to stay at Curtis an extra year, to perform in its centennial celebrations. In addition, she got into a young artist program in Paris, L’Atelier Lyrique, where she would work with the conductor David Stern.

Gleason’s persistence paid off, too. Because of her emails, she spent part of the spring semester working at the Juilliard School in New York on a production of Mozart’s “La Clemenza di Tito.” She signed with a manager, Intermusica, and continued to apply for conducting jobs. She and her partner decided that after graduation, they wanted to move to Chicago, where they used to live.

An excerpt from Demiray’s ‘Krizantem’

At a concert to showcase the work of composing students, Demiray presented her first piece for orchestra. She was the youngest on the program, and the evening was such a blur, she didn’t remember most of what she saw on video later. In the moment, she said, it felt like something simply happened and was over, but with some distance, she started to recognize how much progress was reflected in those 15 minutes.

TACCHINO HAD ONE more starring role left: the title fox in the Curtis production of Janacek’s “The Cunning Little Vixen.” It was yet another gig she had picked up after someone else dropped it, and it required her learning the material within a month. “But,” she said, “I feel like so many careers started out like that. It’s exciting.”

She received enthusiastic applause at the first performance, but the relief barely registered because after the run she would still have to present her master’s project. (The night of her final bows, she stayed up until 2 a.m. working on it.) Then she was done with the semester, though she had to stick around, in her other role, as resident coordinator. Comfortable with the year she’d had, she left to see her boyfriend in Montreal.

On the eve of graduation, Gleason presented a workshop performance of a chamber opera she was developing with Joanne Evans, a former classmate from Bard College and her duo partner. With the move to Chicago, she wasn’t sure whether she would walk at the graduation ceremony, but she was able to make it. “You only go to Curtis once,” she said.

Cheung played in Gleason’s workshop, before leaving Philadelphia to spend time in Seattle and audition for a piccolo seat at the Vancouver Symphony Orchestra. As a 17-year-old with a year of Curtis left, he wasn’t expecting much, but after two days, he was offered the job. “It’s an amazing opportunity,” he said, “but there’s a lot to be considered.”

It will be complicated, for example, if the orchestra wants him to start immediately, while he still has school (not to mention high school) to finish. If he could wait, he would take the position for a gap year he already had planned. But as he looked forward to the rest of the summer, including a program at the idyllic Verbier Festival in Switzerland, he wasn’t sure what would happen.

Scott landed a place at Verbier as well, in a different program. At the end of the semester, he took account of the year and congratulated himself on tripling his social media followers, playing the pieces he wanted to play and even starting to compose music of his own. He was already thinking about ideas for the next year, and the year after that.

As Demiray packed up her room, she felt sad to be leaving her new friends. At times, she had spent 24 hours straight with these people, experiencing things for the first time together. Back in Turkey, she was happy to see her parents, to have time to swim and to compose without a schedule. But she was also, in a way that surprised her, excited for the return of fall.

“Now,” she said, “I feel like I have two families.”

James Estrin is a photographer and writer who has been with The Times since 1992.

Joshua Barone is the assistant classical music and dance editor on the Culture Desk and a contributing classical music critic.


IMAGES

  1. (PDF) Kazakh linguistics in Kazakhstan: An outline

  2. 7 Kazakh language ideas

  3. (PDF) THE KAZAKH LANGUAGE HAS THE ASPECT CATEGORY IN ITS MATRIX

  4. Beginning Kazakh

  5. Language

  6. (PDF) Conceptualization of the Kazakh language in the Linguistic

VIDEO

  1. #5 lesson. Learn Kazakh language. Урок 5. Уроки казахского языка

  2. Kazakh lessons 15 (little bit about KZ)

  3. Казахский язык за месяц. 1

  4. #lesson71# Learn Kazakh language. Урок 71. Уроки казахского языка

  5. Learning KAZAKH LANGUAGE #english #kazakh

  6. Kazakh language dance|| funny video|| #youtubeshorts

COMMENTS

  1. Kazakh language

    Kazakh or Qazaq (pronounced [qɑzɑqˈʃɑ], [qɑˈzɑq tɪˈlɪ]) is a Turkic language of the Kipchak branch spoken in Central Asia by Kazakhs. It is closely related to Nogai, Kyrgyz and Karakalpak. It is the official language of Kazakhstan and a significant minority language in the Ili Kazakh Autonomous Prefecture ...

  2. Essay About Kazakh Language

    Essay About Kazakh Language. 916 Words4 Pages. The Kazakh language is an important part of Kazakh culture, because the cultural heritage and knowledge of our ancestors are stored in the roots of the Kazakh language. It is my contention that Kazakh language, which is at the heart of Kazakh culture has become more prevalent and has changed Kazakh ...

  3. PDF A Grammar of Kazakh Zura Dotton, Ph.D John Doyle Wagner

    1.1 Locale and Speakers. The Kazakh language is spoken by approximately 12 million people throughout Central Asia, the former Soviet Union, and Western China and Mongolia. Principally, it is the sole official language of the Republic of Kazakhstan, where it enjoys official status as the state language. It bears noting that, in addition to Kazakh ...

  4. Kazakh Language is Gaining Increasing Popularity, But Needs Greater

    NUR-SULTAN - People in Kazakhstan are increasingly becoming more interested in learning the Kazakh language in the aftermath of recent political events in Kazakhstan and in the world, says Kanat Tasibekov, a Kazakh language advocate and the author of the "Situational Kazakh" series of self-tutorials, in an interview to the Kazakhstanskaya Pravda newspaper.

  5. PDF Kazakh Language Learning Resources

    Kazakh Language Made Easy, by N. Kubaeva. Phrasebook and grammar, in Russian, Kazakh and English. Dated design but the illustrations can be useful. Kazakh Language: Grammar, Texts, Vocabulary, by Aijan Akhmetova. A beginner's textbook with fairly limited scope and basic design. Beginning Kazakh, by Ablahat Ibrahim.

  6. The Kazakh Language

    As of 1995 Kazakhstan had an estimated population of 17,377,000, Kazakhstan is 1,050,000 sq. miles and is located in central Asia. It borders Russia in the north, China in the east, Kyrgyzstan, Uzbekistan, and Turkmenistan in the south, and the Caspian Sea and European Russia in the west. Astana is the capital and Almaty is the largest city.

  7. The Kazakh Language

    900 Words, 4 Pages. Language characterizes the aggregate history of a community of people, their national identity, and the social lifestyle. A great Kazakh poet-warrior, Baurzhan Momyshuly, expressed his attitude to the state language with these words: "The loss of our native language equals the loss of history, ancestors and culture".

  8. Kazakhstan

    Kazakhstan is a bilingual country: the Kazakh language, spoken by 64.4% of the population, has the status of the "state" language, while Russian, which is spoken by almost all Kazakhstanis, is declared the "official" language, and is used routinely in business. Kazakh (also Qazaq) is a Turkic language closely related to Nogai and Karakalpak.

  9. The National Language Is the Wealth of The People

    The history of the Kazakh language reflects the entire history of the people. All actions, events and wise thoughts about the khans and biys that lived in different epochs, poets, speakers, heroes, all the wealth of people were preserved in the language of the nation. Wise words, the source of national wealth, from ancient times to modern poets and writers, reached the people through the ...

  10. PDF Kazakh language skills of Kazakh- medium students in a trilingual

    11-12 Grades students from. 2 Intellectual Schools. 30 Language and Content subject delivered in Kazakh L1 at 2 Intellectual Schools. Writing an essay. 38. Writing assignment was aligned with the assignments in External Summative Assessment (ESA), and the assessment criteria were tailored to test specification of the ESA.

  11. Culture of Kazakhstan

    The state language of the Republic of Kazakhstan is Kazakh. The state language is language used in public management, legislation, legal proceedings and paperwork management operating in all the field of public relations throughout the country. Our country is a multinational state and language policy in Kazakhstan has always been aimed at ...

  12. (PDF) Language nationalism and globalization: from distrust to

    The promotion of the Kazakh language can be considered an example of language nationalism, and is undertaken by all spheres of Kazakh society. This essay seeks to analyze the interaction between ...

  13. Kazakh language Essays

    Text And Discourse Essay 1115 Words | 5 Pages. Difference between text and discourse By Akishova Zamira Djanibekovna, Kazakh Ablai khan University of International Relations and World Languages Abstract The present article deals with the identity of two concepts such as "discourse" and "text".

  14. Language and Identity in Kazakhstan

    non-Kazakhs. The continuing debate on language demonstrates that even though Kazakhstan is now an independent country, fundamental questions remain about its identity. © 1998 The Regents of the University of California. Published by Elsevier Science Ltd. Keywords: language, identity, Kazakhstan, language policy, Central Asia. Introduction

  15. Kazakh language

    Kazakh language. Sort By: Page 1 of 50 - About 500 essays. Better Essays. Cultural Norms And Values Of Kazakhstan And Its People. 2081 Words; 8 Pages; Cultural Norms And Values Of Kazakhstan And Its People. Sometimes close mindedness stops the average American to look at the diversity of our country, after all, it is a country of immigrants ...

  16. History of Kazakh language graphics

    The formation and development of a language close to the modern Kazakh language took place in the 13th-14th centuries. One of the oldest types of letter script is the ancient Turkic script, which arose in the 6th-7th centuries. One of the early monuments of such writing was discovered in 1896-1897 in Kazakhstan near Aulie-Ata (now the city of ...

  17. Kazakh Language And Culture

    Because Kazakh is the state language, all Kazakhstani undergraduate students are required to take two courses (six credits) of the general Kazakh language. ... to compile meaningful texts on familiar or interesting topics and to write short essays on assigned topics by using word combinations and sentences learned in the course. KAZ1507 Upper ...

  18. Kazakh Means Freedom'

    This thesis examines the link between language policy and national identity in Kazakhstan, tracing the relationship between the two across history and describing how they have been affected by the Ukraine War. The Kazakh government has put considerable effort into developing a national identity for contemporary Kazakhstan, but conflicting standards of production make it difficult for a ...

  19. Present Continuous in Kazakh Grammar

    While the primary function of the Present Continuous tense is to express ongoing actions, it also conveys the following nuances in the Kazakh language: 1. Immediacy: For actions occurring at the moment of speaking, the present continuous tense is employed, highlighting their real-time nature. 2. Temporary states: The tense is used to denote ...

  20. About the KAZTEST system

    Kazakh language knowledge level assessment system - KAZTEST. The KAZTEST system was established in 2006 on the basis of the National Testing Center with the aim of assessing the Kazakh language level of citizens of the Republic of Kazakhstan and foreigners operating in the territory of Kazakhstan. KAZTEST is a home system of Kazakh language knowledge level assessment in accordance with the principles ...

  21. Kazakhstan

    Kazakhstan, largest country in Central Asia. It is bounded on the north by Russia, on the east by China, on the south by Kyrgyzstan, Uzbekistan, the Aral Sea, and Turkmenistan, and on the southwest by the Caspian Sea. It was a constituent republic of the Soviet Union and became independent in 1991.

  22. Kazakh cinema and the nation: a critical analysis

    Nation building is the process in question. This process is, as a rule, complicated in diverse countries, such as Kazakhstan. As a post-Soviet nation, it is still not sure how to define itself in the country and in the outside world. The crisis of the Kazakh identity is compromised by the manifold ethnic groups and cultures, juxtaposed by the clashes of Kazakh and Russian languages and ...

  23. Applying large language models for automated essay scoring for non

    Recent advancements in artificial intelligence (AI) have led to an increased use of large language models (LLMs) for language assessment tasks such as automated essay scoring (AES), automated ...

  24. Kazakhstan qualify for FIFA Futsal World Cup Uzbekistan 2024

    Kazakhstan beat Netherlands 2-0 to reach Uzbekistan 2024. Birzhan Orazov and Douglas Junior got the goals. Kazakhstan reached the semi-finals at the last FIFA Futsal World Cup

  25. At the Curtis Institute, Students Live Entirely for Music

    Delfin Demiray, a composing student, at the Curtis Institute of Music in Philadelphia. SCOTT, A 20-YEAR-OLD from Lansdale, Pa., about an hour away from Philadelphia, grew up determined to attend ...