PrepScholar

Choose Your Test

Sat / act prep online guides and tips, the 3 popular essay formats: which should you use.

author image

General Education

feature_canyonstars

Not sure which path your essay should follow? Formatting an essay may not be as interesting as choosing a topic to write about or carefully crafting elegant sentences, but it’s an extremely important part of creating a high-quality paper. In this article, we’ll explain essay formatting rules for three of the most popular essay styles: MLA, APA, and Chicago.

For each, we’ll do a high-level overview of what your essay’s structure and references should look like, then we include a comparison chart with nitty-gritty details for each style, such as which font you should use for each and whether they’re a proponent of the Oxford comma. We also include information on why essay formatting is important and what you should do if you’re not sure which style to use.

Why Is Your Essay Format Important?

Does it really matter which font size you use or exactly how you cite a source in your paper? It can! Style formats were developed as a way to standardize how pieces of writing and their works cited lists should look. 

Why is this necessary? Imagine you’re a teacher, researcher, or publisher who reviews dozens of papers a week. If the papers didn’t follow the same formatting rules, you could waste a lot of time trying to figure out which sources were used, if certain information is a direct quote or paraphrased, even who the paper’s author is. Having essay formatting rules to follow makes things easier for everyone involved. Writers can follow a set of guidelines without trying to decide for themselves which formatting choices are best, and readers don’t need to go hunting for the information they’re trying to find.

Next, we’ll discuss the three most common style formats for essays.

MLA Essay Format

MLA style was designed by the Modern Language Association, and it has become the most popular college essay format for students writing papers for class. It was originally developed for students and researchers in the literature and language fields to have a standardized way of formatting their papers, but it is now used by people in all disciplines, particularly humanities. MLA is often the style teachers prefer their students to use because it has simple, clear rules to follow without extraneous inclusions often not needed for school papers. For example, unlike APA or Chicago styles, MLA doesn’t require a title page for a paper, only a header in the upper left-hand corner of the page.

MLA style doesn’t have any specific requirements for how to write your essay, but an MLA format essay will typically follow the standard essay format of an introduction (ending with a thesis statement), several body paragraphs, and a conclusion.

One of the nice things about creating your works cited for MLA is that all references are structured the same way, regardless of whether they’re a book, newspaper, etc. It’s the only essay format style that makes citing references this easy! Here is a guide on how to cite any source in MLA format. When typing up your works cited, here are a few MLA format essay rules to keep in mind:

  • The works cited page should be the last paper of your paper.
  • This page should still be double-spaced and include the running header of your last name and page number.
  • It should begin with “Works Cited” at the top of the page, centered.
  • Your works cited should be organized in alphabetical order, based on the first word of the citation.

APA Essay Format

APA stands for the American Psychological Association. This format type is most often used for research papers, specifically those in behavioral sciences (such as psychology and neuroscience) and social sciences (ranging from archeology to economics). Because APA is often used for more research-focused papers, they have a more specific format to follow compared to, say, MLA style.

All APA style papers begin with a title page, which contains the title of the paper (in capital letters), your name, and your institutional affiliation (if you’re a student, then this is simply the name of the school you attend). The APA recommends the title of your paper not be longer than 12 words.

After your title page, your paper begins with an abstract. The abstract is a single paragraph, typically between 150 to 250 words, that sums up your research. It should include the topic you’re researching, research questions, methods, results, analysis, and a conclusion that touches on the significance of the research. Many people find it easier to write the abstract last, after completing the paper.

After the abstract comes the paper itself. APA essay format recommends papers be short, direct, and make their point clearly and concisely. This isn’t the time to use flowery language or extraneous descriptions. Your paper should include all the sections mentioned in the abstract, each expanded upon.

Following the paper is the list of references used. Unlike MLA style, in APA essay format, every source type is referenced differently. So the rules for referencing a book are different from those for referencing a journal article are different from those referencing an interview. Here’s a guide for how to reference different source types in APA format . Your references should begin on a new page that says “REFERENCES” at the top, centered. The references should be listed in alphabetical order.

body_bookshelves

Chicago Essay Format

Chicago style (sometimes referred to as “Turabian style”) was developed by the University of Chicago Press and is typically the least-used by students of the three major essay style formats. The Chicago Manual of Style (currently on its 17th edition) contains within its 1000+ pages every rule you need to know for this style. This is a very comprehensive style, with a rule for everything. It’s most often used in history-related fields, although many people refer to The Chicago Manual of Style for help with a tricky citation or essay format question. Many book authors use this style as well.

Like APA, Chicago style begins with a title page, and it has very specific format rules for doing this which are laid out in the chart below. After the title page may come an abstract, depending on whether you’re writing a research paper or not. Then comes the essay itself. The essay can either follow the introduction → body → conclusion format of MLA or the different sections included in the APA section. Again, this depends on whether you’re writing a paper on research you conducted or not.

Unlike MLA or APA, Chicago style typically uses footnotes or endnotes instead of in-text or parenthetical citations. You’ll place the superscript number at the end of the sentence (for a footnote) or end of the page (for an endnote), then have an abbreviated source reference at the bottom of the page. The sources will then be fully referenced at the end of the paper, in the order of their footnote/endnote numbers. The reference page should be titled “Bibliography” if you used footnotes/endnotes or “References” if you used parenthetical author/date in-text citations.

Comparison Chart

Below is a chart comparing different formatting rules for APA, Chicago, and MLA styles.

How Should You Format Your Essay If Your Teacher Hasn’t Specified a Format?

What if your teacher hasn’t specified which essay format they want you to use? The easiest way to solve this problem is simply to ask your teacher which essay format they prefer. However, if you can’t get ahold of them or they don’t have a preference, we recommend following MLA format. It’s the most commonly-used essay style for students writing papers that aren’t based on their own research, and its formatting rules are general enough that a teacher of any subject shouldn’t have a problem with an MLA format essay. The fact that this style has one of the simplest sets of rules for citing sources is an added bonus!

feature_argumentativeessay-1

What's Next?

Thinking about taking an AP English class? Read our guide on AP English classes to learn whether you should take AP English Language or AP English Literature (or both!)

Compound sentences are an importance sentence type to know. Read our guide on compound sentences for everything you need to know about compound, complex, and compound-complex sentences.

Need ideas for a research paper topic? Our guide to research paper topics has over 100 topics in ten categories so you can be sure to find the perfect topic for you.

author image

Christine graduated from Michigan State University with degrees in Environmental Biology and Geography and received her Master's from Duke University. In high school she scored in the 99th percentile on the SAT and was named a National Merit Finalist. She has taught English and biology in several countries.

Ask a Question Below

Have any questions about this article or other topics? Ask below and we'll reply!

Improve With Our Famous Guides

  • For All Students

The 5 Strategies You Must Be Using to Improve 160+ SAT Points

How to Get a Perfect 1600, by a Perfect Scorer

Series: How to Get 800 on Each SAT Section:

Score 800 on SAT Math

Score 800 on SAT Reading

Score 800 on SAT Writing

Series: How to Get to 600 on Each SAT Section:

Score 600 on SAT Math

Score 600 on SAT Reading

Score 600 on SAT Writing

Free Complete Official SAT Practice Tests

What SAT Target Score Should You Be Aiming For?

15 Strategies to Improve Your SAT Essay

The 5 Strategies You Must Be Using to Improve 4+ ACT Points

How to Get a Perfect 36 ACT, by a Perfect Scorer

Series: How to Get 36 on Each ACT Section:

36 on ACT English

36 on ACT Math

36 on ACT Reading

36 on ACT Science

Series: How to Get to 24 on Each ACT Section:

24 on ACT English

24 on ACT Math

24 on ACT Reading

24 on ACT Science

What ACT target score should you be aiming for?

ACT Vocabulary You Must Know

ACT Writing: 15 Tips to Raise Your Essay Score

How to Get Into Harvard and the Ivy League

How to Get a Perfect 4.0 GPA

How to Write an Amazing College Essay

What Exactly Are Colleges Looking For?

Is the ACT easier than the SAT? A Comprehensive Guide

Should you retake your SAT or ACT?

When should you take the SAT or ACT?

Stay Informed

Follow us on Facebook (icon)

Get the latest articles and test prep tips!

Looking for Graduate School Test Prep?

Check out our top-rated graduate blogs here:

GRE Online Prep Blog

GMAT Online Prep Blog

TOEFL Online Prep Blog

Holly R. "I am absolutely overjoyed and cannot thank you enough for helping me!”

10 paragraph essay format

Guide on How to Write a 5 Paragraph Essay Effortlessly

10 paragraph essay format

Defining What Is a 5 Paragraph Essay

Have you ever been assigned a five-paragraph essay and wondered what exactly it means? Don't worry; we all have been there. A five-paragraph essay is a standard academic writing format consisting of an introduction, three body paragraphs, and a conclusion.

In the introduction, you present your thesis statement, which is the main idea or argument you will discuss in your essay. The three body paragraphs present a separate supporting argument, while the conclusion summarizes the main points and restates the thesis differently.

While the five-paragraph essay is a tried and true format for many academic assignments, it's important to note that it's not the only way to write an essay. In fact, some educators argue that strict adherence to this format can stifle creativity and limit the development of more complex ideas.

However, mastering the five-paragraph essay is a valuable skill for any student, as it teaches the importance of structure and organization in writing. Also, it enables you to communicate your thoughts clearly and eloquently, which is crucial for effective communication in any area. So the next time you're faced with a five-paragraph essay assignment, embrace the challenge and use it as an opportunity to hone your writing skills.

And if you find it difficult to put your ideas into 5 paragraphs, ask our professional service - 'please write my essay ,' or ' write my paragraph ' and consider it done.

How to Write a 5 Paragraph Essay: General Tips

If you are struggling with how to write a 5 paragraph essay, don't worry! It's a common format that many students learn in their academic careers. Here are some tips from our admission essay writing service to help you write a successful five paragraph essay example:

How to Write a 5 Paragraph Essay Effortlessly

  • Start with a strong thesis statement : Among the 5 parts of essay, the thesis statement can be the most important. It presents the major topic you will debate throughout your essay while being explicit and simple.
  • Use topic sentences to introduce each paragraph : The major idea you will address in each of the three body paragraphs should be established in a concise subject sentence.
  • Use evidence to support your arguments : The evidence you present in your body paragraphs should back up your thesis. This can include facts, statistics, or examples from your research or personal experience.
  • Include transitions: Use transitional words and phrases to make the flow of your essay easier. Words like 'although,' 'in addition,' and 'on the other hand' are examples of these.
  • Write a strong conclusion: In addition to restating your thesis statement in a new way, your conclusion should highlight the key ideas of your essay. You might also leave the reader with a closing idea or query to reflect on.
  • Edit and proofread: When you've completed writing your essay, thoroughly revise and proofread it. Make sure your thoughts are brief and clear and proofread your writing for grammatical and spelling mistakes.

By following these tips, you can write strong and effective five paragraph essays examples that will impress your teacher or professor.

5 Paragraph Essay Format

Let's readdress the five-paragraph essay format and explain it in more detail. So, as already mentioned, it is a widely-used writing structure taught in many schools and universities. A five-paragraph essay comprises an introduction, three body paragraphs, and a conclusion, each playing a significant role in creating a well-structured and coherent essay.

The introduction serves as the opening paragraph of the essay and sets the tone for the entire piece. It should captivate the reader's attention, provide relevant background information, and include a clear and concise thesis statement that presents the primary argument of the essay. For example, if the essay topic is about the benefits of exercise, the introduction may look something like this:

'Regular exercise provides numerous health benefits, including increased energy levels, improved mental health, and reduced risk of chronic diseases.'

The body paragraphs are the meat of the essay and should provide evidence and examples to support the thesis statement. Each body paragraph should begin with a subject sentence that states the major idea of the paragraph. Then, the writer should provide evidence to support the topic sentence. This evidence can be in the form of statistics, facts, or examples. For instance, if the essay is discussing the health benefits of exercise, a body paragraph might look like this:

'One of the key benefits of exercise is improved mental health. Regular exercise has been demonstrated in studies to lessen depressive and anxious symptoms and enhance mood.'

The essay's final paragraph, the conclusion, should repeat the thesis statement and summarize the essay's important ideas. A concluding idea or query might be included to give the reader something to ponder. For example, a conclusion for an essay on the benefits of exercise might look like this:

'In conclusion, exercise provides numerous health benefits, from increased energy levels to reduced risk of chronic diseases. We may enhance both our physical and emotional health and enjoy happier, more satisfying lives by including exercise into our daily routines.'

Overall, the 5 paragraph essay format is useful for organizing thoughts and ideas clearly and concisely. By following this format, writers can present their arguments logically and effectively, which is easy for the reader to follow.

Types of 5 Paragraph Essay 

There are several types of five-paragraph essays, each with a slightly different focus or purpose. Here are some of the most common types of five-paragraph essays:

How to Write a 5 Paragraph Essay Effortlessly

  • Narrative essay : A narrative essay tells a story or recounts a personal experience. It typically includes a clear introductory paragraph, body sections that provide details about the story, and a conclusion that wraps up the narrative.
  • Descriptive essay: A descriptive essay uses sensory language to describe a person, place, or thing. It often includes a clear thesis statement that identifies the subject of the description and body paragraphs that provide specific details to support the thesis.
  • Expository essay: An expository essay offers details or clarifies a subject. It usually starts with a concise introduction that introduces the subject, is followed by body paragraphs that provide evidence and examples to back up the thesis, and ends with a summary of the key points.
  • Persuasive essay: A persuasive essay argues for a particular viewpoint or position. It has a thesis statement that is clear, body paragraphs that give evidence and arguments in favor of it, and a conclusion that summarizes the important ideas and restates the thesis.
  • Compare and contrast essay: An essay that compares and contrasts two or more subjects and looks at their similarities and differences. It usually starts out simply by introducing the topics being contrasted or compared, followed by body paragraphs that go into more depth on the similarities and differences, and a concluding paragraph that restates the important points.

Each type of five-paragraph essay has its own unique characteristics and requirements. When unsure how to write five paragraph essay, writers can choose the most appropriate structure for their topic by understanding the differences between these types.

5 Paragraph Essay Example Topics

Here are some potential topics for a 5 paragraph essay example. These essay topics are just a starting point and can be expanded upon to fit a wide range of writing essays and prompts.

  • The Impact of Social Media on Teenage Communication Skills.
  • How Daily Exercise Benefits Mental and Physical Health.
  • The Importance of Learning a Second Language.
  • The Effects of Global Warming on Marine Life.
  • The Role of Technology in Modern Education.
  • The Influence of Music on Youth Culture.
  • The Pros and Cons of Uniform Policies in Schools.
  • The Significance of Historical Monuments in Cultural Identity.
  • The Growing Importance of Cybersecurity.
  • The Evolution of the American Dream.
  • The Impact of Diet on Cognitive Functioning.
  • The Role of Art in Society.
  • The Future of Renewable Energy Sources.
  • The Effects of Urbanization on Wildlife.
  • The Importance of Financial Literacy for Young Adults.
  • The Influence of Advertising on Consumer Choices.
  • The Role of Books in the Digital Age.\
  • The Benefits and Challenges of Space Exploration.
  • The Impact of Climate Change on Agriculture.
  • The Ethical Implications of Genetic Modification.

Don't Let Essay Writing Stress You Out!

Order a high-quality, custom-written paper from our professional writing service and take the first step towards academic success!

General Grading Rubric for a 5 Paragraph Essay

The following is a general grading rubric that can be used to evaluate a five-paragraph essay:

Content (40%)

  • A thesis statement is clear and specific
  • The main points are well-developed and supported by evidence
  • Ideas are organized logically and coherently
  • Evidence and examples are relevant and support the main points
  • The essay demonstrates a strong understanding of the topic

Organization (20%)

  • The introduction effectively introduces the topic and thesis statement
  • Body paragraphs are well-structured and have clear topic sentences
  • Transitions between paragraphs are smooth and effective
  • The concluding sentence effectively summarizes the main points and restates the thesis statement

Language and Style (20%)

  • Writing is clear, concise, and easy to understand
  • Language is appropriate for the audience and purpose
  • Vocabulary is varied and appropriate
  • Grammar, spelling, and punctuation are correct

Critical Thinking (20%)

  • Student demonstrate an understanding of the topic beyond surface-level knowledge
  • Student present a unique perspective or argument
  • Student show evidence of critical thinking and analysis
  • Students write well-supported conclusions

Considering the above, the paper should demonstrate a thorough understanding of the topic, clear organization, strong essay writing skills, and critical thinking. By using this grading rubric, the teacher can evaluate the essay holistically and provide detailed feedback to the student on areas of strength and areas for improvement.

Five Paragraph Essay Examples

Wrapping up: things to remember.

In conclusion, writing a five paragraph essay example can seem daunting at first, but it doesn't have to be a difficult task. Following these simple steps and tips, you can break down the process into manageable parts and create a clear, concise, and well-organized essay.

Remember to start with a strong thesis statement, use topic sentences to guide your paragraphs, and provide evidence and analysis to support your ideas. Don't forget to revise and proofread your work to make sure it is error-free and coherent. With time and practice, you'll be able to write a 5 paragraph essay with ease and assurance. Whether you're writing for school, work, or personal projects, these skills will serve you well and help you to communicate your ideas effectively.

Meanwhile, you can save time and reduce the stress associated with academic assignments by trusting our research paper writing services to handle the writing for you. So go ahead, buy an essay , and see how easy it can be to meet all of your professors' complex requirements!

Ready to Take the Stress Out of Essay Writing? 

Order your 5 paragraph essay today and enjoy a high-quality, custom-written paper delivered promptly

Adam Jason

is an expert in nursing and healthcare, with a strong background in history, law, and literature. Holding advanced degrees in nursing and public health, his analytical approach and comprehensive knowledge help students navigate complex topics. On EssayPro blog, Adam provides insightful articles on everything from historical analysis to the intricacies of healthcare policies. In his downtime, he enjoys historical documentaries and volunteering at local clinics.

10 paragraph essay format

Related Articles

conclusion for an essay

Essay Papers Writing Online

Ultimate guide to writing a five paragraph essay.

How to write a five paragraph essay

Are you struggling with writing essays? Do you find yourself lost in a sea of ideas, unable to structure your thoughts cohesively? The five paragraph essay is a tried-and-true method that can guide you through the writing process with ease. By mastering this format, you can unlock the key to successful and organized writing.

In this article, we will break down the five paragraph essay into easy steps that anyone can follow. From crafting a strong thesis statement to effectively supporting your arguments, we will cover all the essential components of a well-written essay. Whether you are a beginner or a seasoned writer, these tips will help you hone your skills and express your ideas clearly.

Step-by-Step Guide to Mastering the Five Paragraph Essay

Writing a successful five paragraph essay can seem like a daunting task, but with the right approach and strategies, it can become much more manageable. Follow these steps to master the art of writing a powerful five paragraph essay:

  • Understand the structure: The five paragraph essay consists of an introduction, three body paragraphs, and a conclusion. Each paragraph serves a specific purpose in conveying your message effectively.
  • Brainstorm and plan: Before you start writing, take the time to brainstorm ideas and create an outline. This will help you organize your thoughts and ensure that your essay flows smoothly.
  • Write the introduction: Start your essay with a strong hook to grab the reader’s attention. Your introduction should also include a thesis statement, which is the main argument of your essay.
  • Develop the body paragraphs: Each body paragraph should focus on a single point that supports your thesis. Use evidence, examples, and analysis to strengthen your argument and make your points clear.
  • Conclude effectively: In your conclusion, summarize your main points and restate your thesis in a new way. Leave the reader with a thought-provoking statement or a call to action.

By following these steps and practicing regularly, you can become proficient in writing five paragraph essays that are clear, coherent, and impactful. Remember to revise and edit your work for grammar, punctuation, and clarity to ensure that your essay is polished and professional.

Understanding the Structure of a Five Paragraph Essay

Understanding the Structure of a Five Paragraph Essay

When writing a five paragraph essay, it is important to understand the basic structure that makes up this type of essay. The five paragraph essay consists of an introduction, three body paragraphs, and a conclusion.

Introduction: The introduction is the first paragraph of the essay and sets the tone for the rest of the piece. It should include a hook to grab the reader’s attention, a thesis statement that presents the main idea of the essay, and a brief overview of what will be discussed in the body paragraphs.

Body Paragraphs: The body paragraphs make up the core of the essay and each paragraph should focus on a single point that supports the thesis statement. These paragraphs should include a topic sentence that introduces the main idea, supporting details or evidence, and explanations or analysis of how the evidence supports the thesis.

Conclusion: The conclusion is the final paragraph of the essay and it should summarize the main points discussed in the body paragraphs. It should restate the thesis in different words, and provide a closing thought or reflection on the topic.

By understanding the structure of a five paragraph essay, writers can effectively organize their thoughts and present their ideas in a clear and coherent manner.

Choosing a Strong Thesis Statement

One of the most critical elements of a successful five-paragraph essay is a strong thesis statement. Your thesis statement should clearly and concisely present the main argument or point you will be making in your essay. It serves as the foundation for the entire essay, guiding the reader on what to expect and helping you stay focused throughout your writing.

When choosing a thesis statement, it’s important to make sure it is specific, debatable, and relevant to your topic. Avoid vague statements or generalizations, as they will weaken your argument and fail to provide a clear direction for your essay. Instead, choose a thesis statement that is narrow enough to be effectively supported within the confines of a five-paragraph essay, but broad enough to allow for meaningful discussion.

By choosing a strong thesis statement, you set yourself up for a successful essay that is well-organized, coherent, and persuasive. Take the time to carefully craft your thesis statement, as it will serve as the guiding force behind your entire essay.

Developing Supporting Arguments in Body Paragraphs

When crafting the body paragraphs of your five paragraph essay, it is crucial to develop strong and coherent supporting arguments that back up your thesis statement. Each body paragraph should focus on a single supporting argument that contributes to the overall discussion of your topic.

To effectively develop your supporting arguments, consider using a table to organize your ideas. Start by listing your main argument in the left column, and then provide evidence, examples, and analysis in the right column. This structured approach can help you ensure that each supporting argument is fully developed and logically presented.

Additionally, be sure to use transitional phrases to smoothly connect your supporting arguments within and between paragraphs. Words like “furthermore,” “in addition,” and “on the other hand” can help readers follow your train of thought and understand the progression of your ideas.

Remember, the body paragraphs are where you provide the meat of your argument, so take the time to develop each supporting argument thoroughly and clearly. By presenting compelling evidence and analysis, you can effectively persuade your readers and strengthen the overall impact of your essay.

Polishing Your Writing: Editing and Proofreading Tips

Editing and proofreading are crucial steps in the writing process that can make a significant difference in the clarity and effectiveness of your essay. Here are some tips to help you polish your writing:

1. Take a break before editing: After you finish writing your essay, take a break before starting the editing process. This will help you approach your work with fresh eyes and catch mistakes more easily.

2. Read your essay aloud: Reading your essay aloud can help you identify awkward phrasing, grammar errors, and inconsistencies. This technique can also help you evaluate the flow and coherence of your writing.

3. Use a spelling and grammar checker: Utilize spelling and grammar checkers available in word processing software to catch common errors. However, be mindful that these tools may not catch all mistakes, so it’s essential to manually review your essay as well.

4. Check for coherence and organization: Make sure your ideas flow logically and cohesively throughout your essay. Ensure that each paragraph connects smoothly to the next, and that your arguments are supported by relevant evidence.

5. Look for consistency: Check for consistency in your writing style, tone, and formatting. Ensure that you maintain a consistent voice and perspective throughout your essay to keep your argument coherent.

6. Seek feedback from others: Consider asking a peer, teacher, or tutor to review your essay and provide feedback. External perspectives can help you identify blind spots and areas for improvement in your writing.

7. Proofread carefully: Finally, proofread your essay carefully to catch any remaining errors in spelling, grammar, punctuation, and formatting. Pay attention to details and make any necessary revisions before submitting your final draft.

By following these editing and proofreading tips, you can refine your writing and ensure that your essay is polished and ready for submission.

Tips for Successful Writing: Practice and Feedback

Writing is a skill that improves with practice. The more you write, the better you will become. Set aside time each day to practice writing essays, paragraph by paragraph. This consistent practice will help you develop your writing skills and grow more confident in expressing your ideas.

Seek feedback from your teachers, peers, or mentors. Constructive criticism can help you identify areas for improvement and provide valuable insights into your writing. Take their suggestions into consideration and use them to refine your writing style and structure.

  • Set writing goals for yourself and track your progress. Whether it’s completing a certain number of essays in a week or improving your introductions, having specific goals will keep you motivated and focused on your writing development.
  • Read widely to expand your vocabulary and expose yourself to different writing styles. The more you read, the more you will learn about effective writing techniques and ways to engage your readers.
  • Revise and edit your essays carefully. Pay attention to sentence structure, grammar, punctuation, and spelling. A well-polished essay will demonstrate your attention to detail and dedication to producing high-quality work.

Related Post

How to master the art of writing expository essays and captivate your audience, convenient and reliable source to purchase college essays online, step-by-step guide to crafting a powerful literary analysis essay, unlock success with a comprehensive business research paper example guide, unlock your writing potential with writers college – transform your passion into profession, “unlocking the secrets of academic success – navigating the world of research papers in college”, master the art of sociological expression – elevate your writing skills in sociology.

The Writing Center • University of North Carolina at Chapel Hill

What this handout is about

This handout will help you understand how paragraphs are formed, how to develop stronger paragraphs, and how to completely and clearly express your ideas.

What is a paragraph?

Paragraphs are the building blocks of papers. Many students define paragraphs in terms of length: a paragraph is a group of at least five sentences, a paragraph is half a page long, etc. In reality, though, the unity and coherence of ideas among sentences is what constitutes a paragraph. A paragraph is defined as “a group of sentences or a single sentence that forms a unit” (Lunsford and Connors 116). Length and appearance do not determine whether a section in a paper is a paragraph. For instance, in some styles of writing, particularly journalistic styles, a paragraph can be just one sentence long. Ultimately, a paragraph is a sentence or group of sentences that support one main idea. In this handout, we will refer to this as the “controlling idea,” because it controls what happens in the rest of the paragraph.

How do I decide what to put in a paragraph?

Before you can begin to determine what the composition of a particular paragraph will be, you must first decide on an argument and a working thesis statement for your paper. What is the most important idea that you are trying to convey to your reader? The information in each paragraph must be related to that idea. In other words, your paragraphs should remind your reader that there is a recurrent relationship between your thesis and the information in each paragraph. A working thesis functions like a seed from which your paper, and your ideas, will grow. The whole process is an organic one—a natural progression from a seed to a full-blown paper where there are direct, familial relationships between all of the ideas in the paper.

The decision about what to put into your paragraphs begins with the germination of a seed of ideas; this “germination process” is better known as brainstorming . There are many techniques for brainstorming; whichever one you choose, this stage of paragraph development cannot be skipped. Building paragraphs can be like building a skyscraper: there must be a well-planned foundation that supports what you are building. Any cracks, inconsistencies, or other corruptions of the foundation can cause your whole paper to crumble.

So, let’s suppose that you have done some brainstorming to develop your thesis. What else should you keep in mind as you begin to create paragraphs? Every paragraph in a paper should be :

  • Unified : All of the sentences in a single paragraph should be related to a single controlling idea (often expressed in the topic sentence of the paragraph).
  • Clearly related to the thesis : The sentences should all refer to the central idea, or thesis, of the paper (Rosen and Behrens 119).
  • Coherent : The sentences should be arranged in a logical manner and should follow a definite plan for development (Rosen and Behrens 119).
  • Well-developed : Every idea discussed in the paragraph should be adequately explained and supported through evidence and details that work together to explain the paragraph’s controlling idea (Rosen and Behrens 119).

How do I organize a paragraph?

There are many different ways to organize a paragraph. The organization you choose will depend on the controlling idea of the paragraph. Below are a few possibilities for organization, with links to brief examples:

  • Narration : Tell a story. Go chronologically, from start to finish. ( See an example. )
  • Description : Provide specific details about what something looks, smells, tastes, sounds, or feels like. Organize spatially, in order of appearance, or by topic. ( See an example. )
  • Process : Explain how something works, step by step. Perhaps follow a sequence—first, second, third. ( See an example. )
  • Classification : Separate into groups or explain the various parts of a topic. ( See an example. )
  • Illustration : Give examples and explain how those examples support your point. (See an example in the 5-step process below.)

Illustration paragraph: a 5-step example

From the list above, let’s choose “illustration” as our rhetorical purpose. We’ll walk through a 5-step process for building a paragraph that illustrates a point in an argument. For each step there is an explanation and example. Our example paragraph will be about human misconceptions of piranhas.

Step 1. Decide on a controlling idea and create a topic sentence

Paragraph development begins with the formulation of the controlling idea. This idea directs the paragraph’s development. Often, the controlling idea of a paragraph will appear in the form of a topic sentence. In some cases, you may need more than one sentence to express a paragraph’s controlling idea.

Controlling idea and topic sentence — Despite the fact that piranhas are relatively harmless, many people continue to believe the pervasive myth that piranhas are dangerous to humans.

Step 2. Elaborate on the controlling idea

Paragraph development continues with an elaboration on the controlling idea, perhaps with an explanation, implication, or statement about significance. Our example offers a possible explanation for the pervasiveness of the myth.

Elaboration — This impression of piranhas is exacerbated by their mischaracterization in popular media.

Step 3. Give an example (or multiple examples)

Paragraph development progresses with an example (or more) that illustrates the claims made in the previous sentences.

Example — For example, the promotional poster for the 1978 horror film Piranha features an oversized piranha poised to bite the leg of an unsuspecting woman.

Step 4. Explain the example(s)

The next movement in paragraph development is an explanation of each example and its relevance to the topic sentence. The explanation should demonstrate the value of the example as evidence to support the major claim, or focus, in your paragraph.

Continue the pattern of giving examples and explaining them until all points/examples that the writer deems necessary have been made and explained. NONE of your examples should be left unexplained. You might be able to explain the relationship between the example and the topic sentence in the same sentence which introduced the example. More often, however, you will need to explain that relationship in a separate sentence.

Explanation for example — Such a terrifying representation easily captures the imagination and promotes unnecessary fear.

Notice that the example and explanation steps of this 5-step process (steps 3 and 4) can be repeated as needed. The idea is that you continue to use this pattern until you have completely developed the main idea of the paragraph.

Step 5. Complete the paragraph’s idea or transition into the next paragraph

The final movement in paragraph development involves tying up the loose ends of the paragraph. At this point, you can remind your reader about the relevance of the information to the larger paper, or you can make a concluding point for this example. You might, however, simply transition to the next paragraph.

Sentences for completing a paragraph — While the trope of the man-eating piranhas lends excitement to the adventure stories, it bears little resemblance to the real-life piranha. By paying more attention to fact than fiction, humans may finally be able to let go of this inaccurate belief.

Finished paragraph

Despite the fact that piranhas are relatively harmless, many people continue to believe the pervasive myth that piranhas are dangerous to humans. This impression of piranhas is exacerbated by their mischaracterization in popular media. For example, the promotional poster for the 1978 horror film Piranha features an oversized piranha poised to bite the leg of an unsuspecting woman. Such a terrifying representation easily captures the imagination and promotes unnecessary fear. While the trope of the man-eating piranhas lends excitement to the adventure stories, it bears little resemblance to the real-life piranha. By paying more attention to fact than fiction, humans may finally be able to let go of this inaccurate belief.

Troubleshooting paragraphs

Problem: the paragraph has no topic sentence.

Imagine each paragraph as a sandwich. The real content of the sandwich—the meat or other filling—is in the middle. It includes all the evidence you need to make the point. But it gets kind of messy to eat a sandwich without any bread. Your readers don’t know what to do with all the evidence you’ve given them. So, the top slice of bread (the first sentence of the paragraph) explains the topic (or controlling idea) of the paragraph. And, the bottom slice (the last sentence of the paragraph) tells the reader how the paragraph relates to the broader argument. In the original and revised paragraphs below, notice how a topic sentence expressing the controlling idea tells the reader the point of all the evidence.

Original paragraph

Piranhas rarely feed on large animals; they eat smaller fish and aquatic plants. When confronted with humans, piranhas’ first instinct is to flee, not attack. Their fear of humans makes sense. Far more piranhas are eaten by people than people are eaten by piranhas. If the fish are well-fed, they won’t bite humans.

Revised paragraph

Although most people consider piranhas to be quite dangerous, they are, for the most part, entirely harmless. Piranhas rarely feed on large animals; they eat smaller fish and aquatic plants. When confronted with humans, piranhas’ first instinct is to flee, not attack. Their fear of humans makes sense. Far more piranhas are eaten by people than people are eaten by piranhas. If the fish are well-fed, they won’t bite humans.

Once you have mastered the use of topic sentences, you may decide that the topic sentence for a particular paragraph really shouldn’t be the first sentence of the paragraph. This is fine—the topic sentence can actually go at the beginning, middle, or end of a paragraph; what’s important is that it is in there somewhere so that readers know what the main idea of the paragraph is and how it relates back to the thesis of your paper. Suppose that we wanted to start the piranha paragraph with a transition sentence—something that reminds the reader of what happened in the previous paragraph—rather than with the topic sentence. Let’s suppose that the previous paragraph was about all kinds of animals that people are afraid of, like sharks, snakes, and spiders. Our paragraph might look like this (the topic sentence is bold):

Like sharks, snakes, and spiders, piranhas are widely feared. Although most people consider piranhas to be quite dangerous, they are, for the most part, entirely harmless . Piranhas rarely feed on large animals; they eat smaller fish and aquatic plants. When confronted with humans, piranhas’ first instinct is to flee, not attack. Their fear of humans makes sense. Far more piranhas are eaten by people than people are eaten by piranhas. If the fish are well-fed, they won’t bite humans.

Problem: the paragraph has more than one controlling idea

If a paragraph has more than one main idea, consider eliminating sentences that relate to the second idea, or split the paragraph into two or more paragraphs, each with only one main idea. Watch our short video on reverse outlining to learn a quick way to test whether your paragraphs are unified. In the following paragraph, the final two sentences branch off into a different topic; so, the revised paragraph eliminates them and concludes with a sentence that reminds the reader of the paragraph’s main idea.

Although most people consider piranhas to be quite dangerous, they are, for the most part, entirely harmless. Piranhas rarely feed on large animals; they eat smaller fish and aquatic plants. When confronted with humans, piranhas’ first instinct is to flee, not attack. Their fear of humans makes sense. Far more piranhas are eaten by people than people are eaten by piranhas. A number of South American groups eat piranhas. They fry or grill the fish and then serve them with coconut milk or tucupi, a sauce made from fermented manioc juices.

Problem: transitions are needed within the paragraph

You are probably familiar with the idea that transitions may be needed between paragraphs or sections in a paper (see our handout on transitions ). Sometimes they are also helpful within the body of a single paragraph. Within a paragraph, transitions are often single words or short phrases that help to establish relationships between ideas and to create a logical progression of those ideas in a paragraph. This is especially likely to be true within paragraphs that discuss multiple examples. Let’s take a look at a version of our piranha paragraph that uses transitions to orient the reader:

Although most people consider piranhas to be quite dangerous, they are, except in two main situations, entirely harmless. Piranhas rarely feed on large animals; they eat smaller fish and aquatic plants. When confronted with humans, piranhas’ instinct is to flee, not attack. But there are two situations in which a piranha bite is likely. The first is when a frightened piranha is lifted out of the water—for example, if it has been caught in a fishing net. The second is when the water level in pools where piranhas are living falls too low. A large number of fish may be trapped in a single pool, and if they are hungry, they may attack anything that enters the water.

In this example, you can see how the phrases “the first” and “the second” help the reader follow the organization of the ideas in the paragraph.

Works consulted

We consulted these works while writing this handout. This is not a comprehensive list of resources on the handout’s topic, and we encourage you to do your own research to find additional publications. Please do not use this list as a model for the format of your own reference list, as it may not match the citation style you are using. For guidance on formatting citations, please see the UNC Libraries citation tutorial . We revise these tips periodically and welcome feedback.

Lunsford, Andrea. 2008. The St. Martin’s Handbook: Annotated Instructor’s Edition , 6th ed. New York: St. Martin’s.

Rosen, Leonard J., and Laurence Behrens. 2003. The Allyn & Bacon Handbook , 5th ed. New York: Longman.

You may reproduce it for non-commercial use if you use the entire handout and attribute the source: The Writing Center, University of North Carolina at Chapel Hill

Make a Gift

Five Paragraph Essay Outline

10 May, 2020

7 minutes read

Author:  Tomas White

The five paragraph essay exists as one of the most commonly assigned essays, especially for high school students. In fact, the five paragraph essay format is so popular, it is often used not only in the classroom but for exams and admission essays as well. If you’ve never written a five paragraph essay, or find that you simply need a good refresher, you’ve come to the right place! We have thrown out all the useless information and boiled it down to the essential info you need to both understand what a five paragraph essay is and how to write one to earn the grade you want.

five paragraph essay

What is a Five Paragraph Essay?

Unlike some misleading names, the five-paragraph essay is exactly what it sounds like: an essay that consists solely of five paragraphs. This type of essay is strictly about the structure. That’s what matters much more than the topic or questions to be discussed in the essay. The paragraphs in the essay paragraphs follow a very specific outline.

This kind of essay was separated from all other types with a sole purpose – teaching students about the concept of the essay by practicing it’s most basic structure variant. Even though any kind of essay can have five paragraph – from a  definition essay to story-based narrative essay ; five paragraph essay is never limited to the approach. It might look like any other essay, but the structure is the king here.

Meet the Five Paragraph Essay Outline

This type of essay contains three distinctly different kinds of paragraphs including (in order):

  • Introduction paragraph
  • Three body paragraphs
  • Conclusion paragraph

While all three types of paragraphs follow traditional grammar and syntax conventions, what goes into each paragraph varies based on its purpose.

Let’s take a look at what makes each of these paragraphs unique:

5 Paragraph Essay Outline Structure

Introduction

The introduction paragraph should have three key parts: an attention getter , background information , and a thesis statement . These elements should appear in this order; the thesis statement typically appears as the last sentence in the introduction because it acts as a transition to the body paragraphs which each work to support the argument outlined within the thesis. Let’s say that the topic of your five paragraph essay is the best type of pet to own. After doing some research, you decide to write about cats. The attention getter sentence should capture the audience’s attention and make them want to read more about cats. Attention getter sentences often fall into one of four categories:

  • An interesting fact
  • An engaging statistic
  • A relevant quotation
  • A personal anecdote

It’s important to note that a personal anecdote or story from your own life may not be appropriate for all types of essays — especially if an instructor has noted that no personal pronouns be used in the paper.

Related post: How to write an Essay introduction

After gaining the audience’s interest, the next few sentences, anywhere from 3 to 10 sentences, depending upon the essay, should be about background information. Such information may define specific vocabulary or generally provide background information relevant to the topic.

In a five-paragraph essay about cats, relevant background information could include when cats became domesticated, how many breeds of cats are available today, and where individuals can find cats as pets.

Finally, the last part of the introduction paragraph should be the thesis statement. A well-written thesis statement should include an argument and a roadmap on how to prove it.

In this case, a simple yet effective thesis statement could be: “Cats make the best pets because they are intelligent, friendly, and sociable.” The first part, “cats make the best pets” is the argument while the second part, “intelligent, friendly, and sociable” is the roadmap. This is called a roadmap because it outlines the ideas that will guide the paper in the body paragraphs.

According to this thesis statement, there will be a body paragraph that provides evidence that cats are intelligent, another proving that cats are friendly, and a third proving that cats are sociable.

Stuck on writing your outline? Our essay writer will help You!

Body Paragraphs

A typical body paragraph is anywhere from six to twenty sentences; the length of a body paragraph depends upon the amount of research, analysis, and discussion needed in each paragraph to support the argument set forth in the thesis statement. Typically, a body paragraph will follow an organization such as this:

  • Topic sentence
  • Background sentence(s)
  • Explanation

There could be more sentences if you have more quotations from research to include. Additionally, some topics may require several background sentences while others only require one.

Body Paragraphs

Remember: value your time and your teacher’s time; skip writing sentences to pad the length of your paper and ask yourself if each sentence contributes to proving the argument outlined within the thesis. If the sentence doesn’t, get rid of it!

The conclusion paragraph should be the final paragraph in the paper. It is often the shortest paragraph. Its purpose is to review the main points and prove to the audience that the writer has successfully argued his or her point. A conclusion paragraph should never introduce any new information. Most teachers prefer students to skip obvious phrases such as “In conclusion” or “Before ending” because these statements are understood by the reader.

Related post: Cause and Effect essay outline 

Following the conclusion paragraph, you will likely need to create a “Works Cited” and/or a “Bibliography” page if you included any type of research within the five paragraph essay outline. After each quotation or paraphrase in the essay should be a parenthetical citation, and each parenthetical citation should be referenced in the Works Cited and/or Bibliography page. Your teacher or professor should clearly communicate their preference; a Works Cited page exists as a reference for all the works quoted in the essay whereas the Bibliography page lists every source you consulted during the research process.

Tips from our writers

Tips and Tricks

If writing isn’t one of your favorite requirements of academic life, check out these 10 tips and tricks to navigate creating a five paragraph essay smoothly:

  • Begin early
  • Make an appointment with the teacher to discuss your ideas/progress and get feedback
  • Take good notes, and cite the sources as you go
  • Create a sentence outline before the draft
  • Edit the essay twice: once for content, once for grammar
  • Get at least one other person to provide constructive criticism
  • Make an appointment at the Writing Center if your campus has one
  • Hire a professional editing service to catch pesky grammar errors
  • Review the final essay against any provided rubric item by item
  • Check the paper’s formatting for spacing, margins, and headers/footers

Writing assignments are never sprints, but they don’t have to be marathons either — try to find a happy medium!

A life lesson in Romeo and Juliet taught by death

A life lesson in Romeo and Juliet taught by death

Due to human nature, we draw conclusions only when life gives us a lesson since the experience of others is not so effective and powerful. Therefore, when analyzing and sorting out common problems we face, we may trace a parallel with well-known book characters or real historical figures. Moreover, we often compare our situations with […]

Ethical Research Paper Topics

Ethical Research Paper Topics

Writing a research paper on ethics is not an easy task, especially if you do not possess excellent writing skills and do not like to contemplate controversial questions. But an ethics course is obligatory in all higher education institutions, and students have to look for a way out and be creative. When you find an […]

Art Research Paper Topics

Art Research Paper Topics

Students obtaining degrees in fine art and art & design programs most commonly need to write a paper on art topics. However, this subject is becoming more popular in educational institutions for expanding students’ horizons. Thus, both groups of receivers of education: those who are into arts and those who only get acquainted with art […]

Purdue Online Writing Lab Purdue OWL® College of Liberal Arts

Welcome to the Purdue Online Writing Lab

OWL logo

Welcome to the Purdue OWL

This page is brought to you by the OWL at Purdue University. When printing this page, you must include the entire legal notice.

Copyright ©1995-2018 by The Writing Lab & The OWL at Purdue and Purdue University. All rights reserved. This material may not be published, reproduced, broadcast, rewritten, or redistributed without permission. Use of this site constitutes acceptance of our terms and conditions of fair use.

The Online Writing Lab at Purdue University houses writing resources and instructional material, and we provide these as a free service of the Writing Lab at Purdue. Students, members of the community, and users worldwide will find information to assist with many writing projects. Teachers and trainers may use this material for in-class and out-of-class instruction.

The Purdue On-Campus Writing Lab and Purdue Online Writing Lab assist clients in their development as writers—no matter what their skill level—with on-campus consultations, online participation, and community engagement. The Purdue Writing Lab serves the Purdue, West Lafayette, campus and coordinates with local literacy initiatives. The Purdue OWL offers global support through online reference materials and services.

A Message From the Assistant Director of Content Development 

The Purdue OWL® is committed to supporting  students, instructors, and writers by offering a wide range of resources that are developed and revised with them in mind. To do this, the OWL team is always exploring possibilties for a better design, allowing accessibility and user experience to guide our process. As the OWL undergoes some changes, we welcome your feedback and suggestions by email at any time.

Please don't hesitate to contact us via our contact page  if you have any questions or comments.

All the best,

Social Media

Facebook twitter.

Calculate for all schools

Your chance of acceptance, your chancing factors, extracurriculars, 10 sentence essay example.

Hey! My English teacher assigned us a 10-sentence essay and I'm struggling with making my points clear and concise. Can anyone share a good example or some tips on how to write an effective short essay?

Sure! Let's go through an example 10-sentence essay on the topic "The Importance of Time Management for Students." I'll also provide some tips for writing a concise essay along the way.

1. Time management is a vital skill for students, as it helps them balance academics, extracurricular activities, and personal life.

(Tip: Start with a clear and concise topic sentence that introduces the main idea of your essay.)

2. Proper time management enables students to prioritize tasks and allocate appropriate time to each, reducing the risk of procrastination.

(Tip: Use each sentence to present a new point or expand on a previous one.)

3. By creating a schedule, students can allocate specific time blocks for classes, assignments, and breaks, ensuring that no deadline is missed.

(Tip: Be specific while detailing your points, this helps the reader to visualize the scenario.)

4. Time management also allows students to set short and long-term goals, motivating them to work consistently toward their objectives.

(Tip: Show how your main idea is applicable in different contexts or situations.)

5. Moreover, breaking down larger projects into smaller tasks helps students track their progress and tackle complex assignments with more ease.

(Tip: Use transitions like "moreover" to connect your ideas and maintain a good flow.)

6. Finding a balance between academic and personal life is crucial for students' mental well-being, and effective time management can prevent burnout.

(Tip: Highlight the additional benefits of your main idea.)

7. Students who manage their time wisely can also participate in extracurricular activities, expanding their experiences and skillsets.

(Tip: Use concrete examples to further illustrate your points.)

8. Furthermore, using study techniques such as the Pomodoro Technique can help students maximize their productivity during study sessions.

(Tip: Mention popular or widely-used tools related to your topic.)

9. Mastering time management skills early on prepares students for their future careers, as good organization is highly valued in the professional world.

(Tip: Show the long-lasting impact or relevance of your main idea.)

10. In conclusion, time management is essential for students to achieve academic success, maintain a healthy personal life, and develop crucial skills for the future.

(Tip: Finish with a strong and concise concluding sentence that wraps up your key points and restates the main idea.)

About CollegeVine’s Expert FAQ

CollegeVine’s Q&A seeks to offer informed perspectives on commonly asked admissions questions. Every answer is refined and validated by our team of admissions experts to ensure it resonates with trusted knowledge in the field.

Thank you for visiting nature.com. You are using a browser version with limited support for CSS. To obtain the best experience, we recommend you use a more up to date browser (or turn off compatibility mode in Internet Explorer). In the meantime, to ensure continued support, we are displaying the site without styles and JavaScript.

  • View all journals
  • My Account Login
  • Explore content
  • About the journal
  • Publish with us
  • Sign up for alerts
  • Open access
  • Published: 03 June 2024

Applying large language models for automated essay scoring for non-native Japanese

  • Wenchao Li 1 &
  • Haitao Liu 2  

Humanities and Social Sciences Communications volume  11 , Article number:  723 ( 2024 ) Cite this article

12 Accesses

1 Altmetric

Metrics details

  • Language and linguistics

Recent advancements in artificial intelligence (AI) have led to an increased use of large language models (LLMs) for language assessment tasks such as automated essay scoring (AES), automated listening tests, and automated oral proficiency assessments. The application of LLMs for AES in the context of non-native Japanese, however, remains limited. This study explores the potential of LLM-based AES by comparing the efficiency of different models, i.e. two conventional machine training technology-based methods (Jess and JWriter), two LLMs (GPT and BERT), and one Japanese local LLM (Open-Calm large model). To conduct the evaluation, a dataset consisting of 1400 story-writing scripts authored by learners with 12 different first languages was used. Statistical analysis revealed that GPT-4 outperforms Jess and JWriter, BERT, and the Japanese language-specific trained Open-Calm large model in terms of annotation accuracy and predicting learning levels. Furthermore, by comparing 18 different models that utilize various prompts, the study emphasized the significance of prompts in achieving accurate and reliable evaluations using LLMs.

Similar content being viewed by others

10 paragraph essay format

Scoring method of English composition integrating deep learning in higher vocational colleges

10 paragraph essay format

ChatGPT-3.5 as writing assistance in students’ essays

10 paragraph essay format

Detecting contract cheating through linguistic fingerprint

Conventional machine learning technology in aes.

AES has experienced significant growth with the advancement of machine learning technologies in recent decades. In the earlier stages of AES development, conventional machine learning-based approaches were commonly used. These approaches involved the following procedures: a) feeding the machine with a dataset. In this step, a dataset of essays is provided to the machine learning system. The dataset serves as the basis for training the model and establishing patterns and correlations between linguistic features and human ratings. b) the machine learning model is trained using linguistic features that best represent human ratings and can effectively discriminate learners’ writing proficiency. These features include lexical richness (Lu, 2012 ; Kyle and Crossley, 2015 ; Kyle et al. 2021 ), syntactic complexity (Lu, 2010 ; Liu, 2008 ), text cohesion (Crossley and McNamara, 2016 ), and among others. Conventional machine learning approaches in AES require human intervention, such as manual correction and annotation of essays. This human involvement was necessary to create a labeled dataset for training the model. Several AES systems have been developed using conventional machine learning technologies. These include the Intelligent Essay Assessor (Landauer et al. 2003 ), the e-rater engine by Educational Testing Service (Attali and Burstein, 2006 ; Burstein, 2003 ), MyAccess with the InterlliMetric scoring engine by Vantage Learning (Elliot, 2003 ), and the Bayesian Essay Test Scoring system (Rudner and Liang, 2002 ). These systems have played a significant role in automating the essay scoring process and providing quick and consistent feedback to learners. However, as touched upon earlier, conventional machine learning approaches rely on predetermined linguistic features and often require manual intervention, making them less flexible and potentially limiting their generalizability to different contexts.

In the context of the Japanese language, conventional machine learning-incorporated AES tools include Jess (Ishioka and Kameda, 2006 ) and JWriter (Lee and Hasebe, 2017 ). Jess assesses essays by deducting points from the perfect score, utilizing the Mainichi Daily News newspaper as a database. The evaluation criteria employed by Jess encompass various aspects, such as rhetorical elements (e.g., reading comprehension, vocabulary diversity, percentage of complex words, and percentage of passive sentences), organizational structures (e.g., forward and reverse connection structures), and content analysis (e.g., latent semantic indexing). JWriter employs linear regression analysis to assign weights to various measurement indices, such as average sentence length and total number of characters. These weights are then combined to derive the overall score. A pilot study involving the Jess model was conducted on 1320 essays at different proficiency levels, including primary, intermediate, and advanced. However, the results indicated that the Jess model failed to significantly distinguish between these essay levels. Out of the 16 measures used, four measures, namely median sentence length, median clause length, median number of phrases, and maximum number of phrases, did not show statistically significant differences between the levels. Additionally, two measures exhibited between-level differences but lacked linear progression: the number of attributives declined words and the Kanji/kana ratio. On the other hand, the remaining measures, including maximum sentence length, maximum clause length, number of attributive conjugated words, maximum number of consecutive infinitive forms, maximum number of conjunctive-particle clauses, k characteristic value, percentage of big words, and percentage of passive sentences, demonstrated statistically significant between-level differences and displayed linear progression.

Both Jess and JWriter exhibit notable limitations, including the manual selection of feature parameters and weights, which can introduce biases into the scoring process. The reliance on human annotators to label non-native language essays also introduces potential noise and variability in the scoring. Furthermore, an important concern is the possibility of system manipulation and cheating by learners who are aware of the regression equation utilized by the models (Hirao et al. 2020 ). These limitations emphasize the need for further advancements in AES systems to address these challenges.

Deep learning technology in AES

Deep learning has emerged as one of the approaches for improving the accuracy and effectiveness of AES. Deep learning-based AES methods utilize artificial neural networks that mimic the human brain’s functioning through layered algorithms and computational units. Unlike conventional machine learning, deep learning autonomously learns from the environment and past errors without human intervention. This enables deep learning models to establish nonlinear correlations, resulting in higher accuracy. Recent advancements in deep learning have led to the development of transformers, which are particularly effective in learning text representations. Noteworthy examples include bidirectional encoder representations from transformers (BERT) (Devlin et al. 2019 ) and the generative pretrained transformer (GPT) (OpenAI).

BERT is a linguistic representation model that utilizes a transformer architecture and is trained on two tasks: masked linguistic modeling and next-sentence prediction (Hirao et al. 2020 ; Vaswani et al. 2017 ). In the context of AES, BERT follows specific procedures, as illustrated in Fig. 1 : (a) the tokenized prompts and essays are taken as input; (b) special tokens, such as [CLS] and [SEP], are added to mark the beginning and separation of prompts and essays; (c) the transformer encoder processes the prompt and essay sequences, resulting in hidden layer sequences; (d) the hidden layers corresponding to the [CLS] tokens (T[CLS]) represent distributed representations of the prompts and essays; and (e) a multilayer perceptron uses these distributed representations as input to obtain the final score (Hirao et al. 2020 ).

figure 1

AES system with BERT (Hirao et al. 2020 ).

The training of BERT using a substantial amount of sentence data through the Masked Language Model (MLM) allows it to capture contextual information within the hidden layers. Consequently, BERT is expected to be capable of identifying artificial essays as invalid and assigning them lower scores (Mizumoto and Eguchi, 2023 ). In the context of AES for nonnative Japanese learners, Hirao et al. ( 2020 ) combined the long short-term memory (LSTM) model proposed by Hochreiter and Schmidhuber ( 1997 ) with BERT to develop a tailored automated Essay Scoring System. The findings of their study revealed that the BERT model outperformed both the conventional machine learning approach utilizing character-type features such as “kanji” and “hiragana”, as well as the standalone LSTM model. Takeuchi et al. ( 2021 ) presented an approach to Japanese AES that eliminates the requirement for pre-scored essays by relying solely on reference texts or a model answer for the essay task. They investigated multiple similarity evaluation methods, including frequency of morphemes, idf values calculated on Wikipedia, LSI, LDA, word-embedding vectors, and document vectors produced by BERT. The experimental findings revealed that the method utilizing the frequency of morphemes with idf values exhibited the strongest correlation with human-annotated scores across different essay tasks. The utilization of BERT in AES encounters several limitations. Firstly, essays often exceed the model’s maximum length limit. Second, only score labels are available for training, which restricts access to additional information.

Mizumoto and Eguchi ( 2023 ) were pioneers in employing the GPT model for AES in non-native English writing. Their study focused on evaluating the accuracy and reliability of AES using the GPT-3 text-davinci-003 model, analyzing a dataset of 12,100 essays from the corpus of nonnative written English (TOEFL11). The findings indicated that AES utilizing the GPT-3 model exhibited a certain degree of accuracy and reliability. They suggest that GPT-3-based AES systems hold the potential to provide support for human ratings. However, applying GPT model to AES presents a unique natural language processing (NLP) task that involves considerations such as nonnative language proficiency, the influence of the learner’s first language on the output in the target language, and identifying linguistic features that best indicate writing quality in a specific language. These linguistic features may differ morphologically or syntactically from those present in the learners’ first language, as observed in (1)–(3).

我-送了-他-一本-书

Wǒ-sòngle-tā-yī běn-shū

1 sg .-give. past- him-one .cl- book

“I gave him a book.”

Agglutinative

彼-に-本-を-あげ-まし-た

Kare-ni-hon-o-age-mashi-ta

3 sg .- dat -hon- acc- give.honorification. past

Inflectional

give, give-s, gave, given, giving

Additionally, the morphological agglutination and subject-object-verb (SOV) order in Japanese, along with its idiomatic expressions, pose additional challenges for applying language models in AES tasks (4).

足-が 棒-に なり-ました

Ashi-ga bo-ni nar-mashita

leg- nom stick- dat become- past

“My leg became like a stick (I am extremely tired).”

The example sentence provided demonstrates the morpho-syntactic structure of Japanese and the presence of an idiomatic expression. In this sentence, the verb “なる” (naru), meaning “to become”, appears at the end of the sentence. The verb stem “なり” (nari) is attached with morphemes indicating honorification (“ます” - mashu) and tense (“た” - ta), showcasing agglutination. While the sentence can be literally translated as “my leg became like a stick”, it carries an idiomatic interpretation that implies “I am extremely tired”.

To overcome this issue, CyberAgent Inc. ( 2023 ) has developed the Open-Calm series of language models specifically designed for Japanese. Open-Calm consists of pre-trained models available in various sizes, such as Small, Medium, Large, and 7b. Figure 2 depicts the fundamental structure of the Open-Calm model. A key feature of this architecture is the incorporation of the Lora Adapter and GPT-NeoX frameworks, which can enhance its language processing capabilities.

figure 2

GPT-NeoX Model Architecture (Okgetheng and Takeuchi 2024 ).

In a recent study conducted by Okgetheng and Takeuchi ( 2024 ), they assessed the efficacy of Open-Calm language models in grading Japanese essays. The research utilized a dataset of approximately 300 essays, which were annotated by native Japanese educators. The findings of the study demonstrate the considerable potential of Open-Calm language models in automated Japanese essay scoring. Specifically, among the Open-Calm family, the Open-Calm Large model (referred to as OCLL) exhibited the highest performance. However, it is important to note that, as of the current date, the Open-Calm Large model does not offer public access to its server. Consequently, users are required to independently deploy and operate the environment for OCLL. In order to utilize OCLL, users must have a PC equipped with an NVIDIA GeForce RTX 3060 (8 or 12 GB VRAM).

In summary, while the potential of LLMs in automated scoring of nonnative Japanese essays has been demonstrated in two studies—BERT-driven AES (Hirao et al. 2020 ) and OCLL-based AES (Okgetheng and Takeuchi, 2024 )—the number of research efforts in this area remains limited.

Another significant challenge in applying LLMs to AES lies in prompt engineering and ensuring its reliability and effectiveness (Brown et al. 2020 ; Rae et al. 2021 ; Zhang et al. 2021 ). Various prompting strategies have been proposed, such as the zero-shot chain of thought (CoT) approach (Kojima et al. 2022 ), which involves manually crafting diverse and effective examples. However, manual efforts can lead to mistakes. To address this, Zhang et al. ( 2021 ) introduced an automatic CoT prompting method called Auto-CoT, which demonstrates matching or superior performance compared to the CoT paradigm. Another prompt framework is trees of thoughts, enabling a model to self-evaluate its progress at intermediate stages of problem-solving through deliberate reasoning (Yao et al. 2023 ).

Beyond linguistic studies, there has been a noticeable increase in the number of foreign workers in Japan and Japanese learners worldwide (Ministry of Health, Labor, and Welfare of Japan, 2022 ; Japan Foundation, 2021 ). However, existing assessment methods, such as the Japanese Language Proficiency Test (JLPT), J-CAT, and TTBJ Footnote 1 , primarily focus on reading, listening, vocabulary, and grammar skills, neglecting the evaluation of writing proficiency. As the number of workers and language learners continues to grow, there is a rising demand for an efficient AES system that can reduce costs and time for raters and be utilized for employment, examinations, and self-study purposes.

This study aims to explore the potential of LLM-based AES by comparing the effectiveness of five models: two LLMs (GPT Footnote 2 and BERT), one Japanese local LLM (OCLL), and two conventional machine learning-based methods (linguistic feature-based scoring tools - Jess and JWriter).

The research questions addressed in this study are as follows:

To what extent do the LLM-driven AES and linguistic feature-based AES, when used as automated tools to support human rating, accurately reflect test takers’ actual performance?

What influence does the prompt have on the accuracy and performance of LLM-based AES methods?

The subsequent sections of the manuscript cover the methodology, including the assessment measures for nonnative Japanese writing proficiency, criteria for prompts, and the dataset. The evaluation section focuses on the analysis of annotations and rating scores generated by LLM-driven and linguistic feature-based AES methods.

Methodology

The dataset utilized in this study was obtained from the International Corpus of Japanese as a Second Language (I-JAS) Footnote 3 . This corpus consisted of 1000 participants who represented 12 different first languages. For the study, the participants were given a story-writing task on a personal computer. They were required to write two stories based on the 4-panel illustrations titled “Picnic” and “The key” (see Appendix A). Background information for the participants was provided by the corpus, including their Japanese language proficiency levels assessed through two online tests: J-CAT and SPOT. These tests evaluated their reading, listening, vocabulary, and grammar abilities. The learners’ proficiency levels were categorized into six levels aligned with the Common European Framework of Reference for Languages (CEFR) and the Reference Framework for Japanese Language Education (RFJLE): A1, A2, B1, B2, C1, and C2. According to Lee et al. ( 2015 ), there is a high level of agreement (r = 0.86) between the J-CAT and SPOT assessments, indicating that the proficiency certifications provided by J-CAT are consistent with those of SPOT. However, it is important to note that the scores of J-CAT and SPOT do not have a one-to-one correspondence. In this study, the J-CAT scores were used as a benchmark to differentiate learners of different proficiency levels. A total of 1400 essays were utilized, representing the beginner (aligned with A1), A2, B1, B2, C1, and C2 levels based on the J-CAT scores. Table 1 provides information about the learners’ proficiency levels and their corresponding J-CAT and SPOT scores.

A dataset comprising a total of 1400 essays from the story writing tasks was collected. Among these, 714 essays were utilized to evaluate the reliability of the LLM-based AES method, while the remaining 686 essays were designated as development data to assess the LLM-based AES’s capability to distinguish participants with varying proficiency levels. The GPT 4 API was used in this study. A detailed explanation of the prompt-assessment criteria is provided in Section Prompt . All essays were sent to the model for measurement and scoring.

Measures of writing proficiency for nonnative Japanese

Japanese exhibits a morphologically agglutinative structure where morphemes are attached to the word stem to convey grammatical functions such as tense, aspect, voice, and honorifics, e.g. (5).

食べ-させ-られ-まし-た-か

tabe-sase-rare-mashi-ta-ka

[eat (stem)-causative-passive voice-honorification-tense. past-question marker]

Japanese employs nine case particles to indicate grammatical functions: the nominative case particle が (ga), the accusative case particle を (o), the genitive case particle の (no), the dative case particle に (ni), the locative/instrumental case particle で (de), the ablative case particle から (kara), the directional case particle へ (e), and the comitative case particle と (to). The agglutinative nature of the language, combined with the case particle system, provides an efficient means of distinguishing between active and passive voice, either through morphemes or case particles, e.g. 食べる taberu “eat concusive . ” (active voice); 食べられる taberareru “eat concusive . ” (passive voice). In the active voice, “パン を 食べる” (pan o taberu) translates to “to eat bread”. On the other hand, in the passive voice, it becomes “パン が 食べられた” (pan ga taberareta), which means “(the) bread was eaten”. Additionally, it is important to note that different conjugations of the same lemma are considered as one type in order to ensure a comprehensive assessment of the language features. For example, e.g., 食べる taberu “eat concusive . ”; 食べている tabeteiru “eat progress .”; 食べた tabeta “eat past . ” as one type.

To incorporate these features, previous research (Suzuki, 1999 ; Watanabe et al. 1988 ; Ishioka, 2001 ; Ishioka and Kameda, 2006 ; Hirao et al. 2020 ) has identified complexity, fluency, and accuracy as crucial factors for evaluating writing quality. These criteria are assessed through various aspects, including lexical richness (lexical density, diversity, and sophistication), syntactic complexity, and cohesion (Kyle et al. 2021 ; Mizumoto and Eguchi, 2023 ; Ure, 1971 ; Halliday, 1985 ; Barkaoui and Hadidi, 2020 ; Zenker and Kyle, 2021 ; Kim et al. 2018 ; Lu, 2017 ; Ortega, 2015 ). Therefore, this study proposes five scoring categories: lexical richness, syntactic complexity, cohesion, content elaboration, and grammatical accuracy. A total of 16 measures were employed to capture these categories. The calculation process and specific details of these measures can be found in Table 2 .

T-unit, first introduced by Hunt ( 1966 ), is a measure used for evaluating speech and composition. It serves as an indicator of syntactic development and represents the shortest units into which a piece of discourse can be divided without leaving any sentence fragments. In the context of Japanese language assessment, Sakoda and Hosoi ( 2020 ) utilized T-unit as the basic unit to assess the accuracy and complexity of Japanese learners’ speaking and storytelling. The calculation of T-units in Japanese follows the following principles:

A single main clause constitutes 1 T-unit, regardless of the presence or absence of dependent clauses, e.g. (6).

ケンとマリはピクニックに行きました (main clause): 1 T-unit.

If a sentence contains a main clause along with subclauses, each subclause is considered part of the same T-unit, e.g. (7).

天気が良かった の で (subclause)、ケンとマリはピクニックに行きました (main clause): 1 T-unit.

In the case of coordinate clauses, where multiple clauses are connected, each coordinated clause is counted separately. Thus, a sentence with coordinate clauses may have 2 T-units or more, e.g. (8).

ケンは地図で場所を探して (coordinate clause)、マリはサンドイッチを作りました (coordinate clause): 2 T-units.

Lexical diversity refers to the range of words used within a text (Engber, 1995 ; Kyle et al. 2021 ) and is considered a useful measure of the breadth of vocabulary in L n production (Jarvis, 2013a , 2013b ).

The type/token ratio (TTR) is widely recognized as a straightforward measure for calculating lexical diversity and has been employed in numerous studies. These studies have demonstrated a strong correlation between TTR and other methods of measuring lexical diversity (e.g., Bentz et al. 2016 ; Čech and Miroslav, 2018 ; Çöltekin and Taraka, 2018 ). TTR is computed by considering both the number of unique words (types) and the total number of words (tokens) in a given text. Given that the length of learners’ writing texts can vary, this study employs the moving average type-token ratio (MATTR) to mitigate the influence of text length. MATTR is calculated using a 50-word moving window. Initially, a TTR is determined for words 1–50 in an essay, followed by words 2–51, 3–52, and so on until the end of the essay is reached (Díez-Ortega and Kyle, 2023 ). The final MATTR scores were obtained by averaging the TTR scores for all 50-word windows. The following formula was employed to derive MATTR:

\({\rm{MATTR}}({\rm{W}})=\frac{{\sum }_{{\rm{i}}=1}^{{\rm{N}}-{\rm{W}}+1}{{\rm{F}}}_{{\rm{i}}}}{{\rm{W}}({\rm{N}}-{\rm{W}}+1)}\)

Here, N refers to the number of tokens in the corpus. W is the randomly selected token size (W < N). \({F}_{i}\) is the number of types in each window. The \({\rm{MATTR}}({\rm{W}})\) is the mean of a series of type-token ratios (TTRs) based on the word form for all windows. It is expected that individuals with higher language proficiency will produce texts with greater lexical diversity, as indicated by higher MATTR scores.

Lexical density was captured by the ratio of the number of lexical words to the total number of words (Lu, 2012 ). Lexical sophistication refers to the utilization of advanced vocabulary, often evaluated through word frequency indices (Crossley et al. 2013 ; Haberman, 2008 ; Kyle and Crossley, 2015 ; Laufer and Nation, 1995 ; Lu, 2012 ; Read, 2000 ). In line of writing, lexical sophistication can be interpreted as vocabulary breadth, which entails the appropriate usage of vocabulary items across various lexicon-grammatical contexts and registers (Garner et al. 2019 ; Kim et al. 2018 ; Kyle et al. 2018 ). In Japanese specifically, words are considered lexically sophisticated if they are not included in the “Japanese Education Vocabulary List Ver 1.0”. Footnote 4 Consequently, lexical sophistication was calculated by determining the number of sophisticated word types relative to the total number of words per essay. Furthermore, it has been suggested that, in Japanese writing, sentences should ideally have a length of no more than 40 to 50 characters, as this promotes readability. Therefore, the median and maximum sentence length can be considered as useful indices for assessment (Ishioka and Kameda, 2006 ).

Syntactic complexity was assessed based on several measures, including the mean length of clauses, verb phrases per T-unit, clauses per T-unit, dependent clauses per T-unit, complex nominals per clause, adverbial clauses per clause, coordinate phrases per clause, and mean dependency distance (MDD). The MDD reflects the distance between the governor and dependent positions in a sentence. A larger dependency distance indicates a higher cognitive load and greater complexity in syntactic processing (Liu, 2008 ; Liu et al. 2017 ). The MDD has been established as an efficient metric for measuring syntactic complexity (Jiang, Quyang, and Liu, 2019 ; Li and Yan, 2021 ). To calculate the MDD, the position numbers of the governor and dependent are subtracted, assuming that words in a sentence are assigned in a linear order, such as W1 … Wi … Wn. In any dependency relationship between words Wa and Wb, Wa is the governor and Wb is the dependent. The MDD of the entire sentence was obtained by taking the absolute value of governor – dependent:

MDD = \(\frac{1}{n}{\sum }_{i=1}^{n}|{\rm{D}}{{\rm{D}}}_{i}|\)

In this formula, \(n\) represents the number of words in the sentence, and \({DD}i\) is the dependency distance of the \({i}^{{th}}\) dependency relationship of a sentence. Building on this, the annotation of sentence ‘Mary-ga-John-ni-keshigomu-o-watashita was [Mary- top -John- dat -eraser- acc -give- past] ’. The sentence’s MDD would be 2. Table 3 provides the CSV file as a prompt for GPT 4.

Cohesion (semantic similarity) and content elaboration aim to capture the ideas presented in test taker’s essays. Cohesion was assessed using three measures: Synonym overlap/paragraph (topic), Synonym overlap/paragraph (keywords), and word2vec cosine similarity. Content elaboration and development were measured as the number of metadiscourse markers (type)/number of words. To capture content closely, this study proposed a novel-distance based representation, by encoding the cosine distance between the essay (by learner) and essay task’s (topic and keyword) i -vectors. The learner’s essay is decoded into a word sequence, and aligned to the essay task’ topic and keyword for log-likelihood measurement. The cosine distance reveals the content elaboration score in the leaners’ essay. The mathematical equation of cosine similarity between target-reference vectors is shown in (11), assuming there are i essays and ( L i , …. L n ) and ( N i , …. N n ) are the vectors representing the learner and task’s topic and keyword respectively. The content elaboration distance between L i and N i was calculated as follows:

\(\cos \left(\theta \right)=\frac{{\rm{L}}\,\cdot\, {\rm{N}}}{\left|{\rm{L}}\right|{\rm{|N|}}}=\frac{\mathop{\sum }\nolimits_{i=1}^{n}{L}_{i}{N}_{i}}{\sqrt{\mathop{\sum }\nolimits_{i=1}^{n}{L}_{i}^{2}}\sqrt{\mathop{\sum }\nolimits_{i=1}^{n}{N}_{i}^{2}}}\)

A high similarity value indicates a low difference between the two recognition outcomes, which in turn suggests a high level of proficiency in content elaboration.

To evaluate the effectiveness of the proposed measures in distinguishing different proficiency levels among nonnative Japanese speakers’ writing, we conducted a multi-faceted Rasch measurement analysis (Linacre, 1994 ). This approach applies measurement models to thoroughly analyze various factors that can influence test outcomes, including test takers’ proficiency, item difficulty, and rater severity, among others. The underlying principles and functionality of multi-faceted Rasch measurement are illustrated in (12).

\(\log \left(\frac{{P}_{{nijk}}}{{P}_{{nij}(k-1)}}\right)={B}_{n}-{D}_{i}-{C}_{j}-{F}_{k}\)

(12) defines the logarithmic transformation of the probability ratio ( P nijk /P nij(k-1) )) as a function of multiple parameters. Here, n represents the test taker, i denotes a writing proficiency measure, j corresponds to the human rater, and k represents the proficiency score. The parameter B n signifies the proficiency level of test taker n (where n ranges from 1 to N). D j represents the difficulty parameter of test item i (where i ranges from 1 to L), while C j represents the severity of rater j (where j ranges from 1 to J). Additionally, F k represents the step difficulty for a test taker to move from score ‘k-1’ to k . P nijk refers to the probability of rater j assigning score k to test taker n for test item i . P nij(k-1) represents the likelihood of test taker n being assigned score ‘k-1’ by rater j for test item i . Each facet within the test is treated as an independent parameter and estimated within the same reference framework. To evaluate the consistency of scores obtained through both human and computer analysis, we utilized the Infit mean-square statistic. This statistic is a chi-square measure divided by the degrees of freedom and is weighted with information. It demonstrates higher sensitivity to unexpected patterns in responses to items near a person’s proficiency level (Linacre, 2002 ). Fit statistics are assessed based on predefined thresholds for acceptable fit. For the Infit MNSQ, which has a mean of 1.00, different thresholds have been suggested. Some propose stricter thresholds ranging from 0.7 to 1.3 (Bond et al. 2021 ), while others suggest more lenient thresholds ranging from 0.5 to 1.5 (Eckes, 2009 ). In this study, we adopted the criterion of 0.70–1.30 for the Infit MNSQ.

Moving forward, we can now proceed to assess the effectiveness of the 16 proposed measures based on five criteria for accurately distinguishing various levels of writing proficiency among non-native Japanese speakers. To conduct this evaluation, we utilized the development dataset from the I-JAS corpus, as described in Section Dataset . Table 4 provides a measurement report that presents the performance details of the 14 metrics under consideration. The measure separation was found to be 4.02, indicating a clear differentiation among the measures. The reliability index for the measure separation was 0.891, suggesting consistency in the measurement. Similarly, the person separation reliability index was 0.802, indicating the accuracy of the assessment in distinguishing between individuals. All 16 measures demonstrated Infit mean squares within a reasonable range, ranging from 0.76 to 1.28. The Synonym overlap/paragraph (topic) measure exhibited a relatively high outfit mean square of 1.46, although the Infit mean square falls within an acceptable range. The standard error for the measures ranged from 0.13 to 0.28, indicating the precision of the estimates.

Table 5 further illustrated the weights assigned to different linguistic measures for score prediction, with higher weights indicating stronger correlations between those measures and higher scores. Specifically, the following measures exhibited higher weights compared to others: moving average type token ratio per essay has a weight of 0.0391. Mean dependency distance had a weight of 0.0388. Mean length of clause, calculated by dividing the number of words by the number of clauses, had a weight of 0.0374. Complex nominals per T-unit, calculated by dividing the number of complex nominals by the number of T-units, had a weight of 0.0379. Coordinate phrases rate, calculated by dividing the number of coordinate phrases by the number of clauses, had a weight of 0.0325. Grammatical error rate, representing the number of errors per essay, had a weight of 0.0322.

Criteria (output indicator)

The criteria used to evaluate the writing ability in this study were based on CEFR, which follows a six-point scale ranging from A1 to C2. To assess the quality of Japanese writing, the scoring criteria from Table 6 were utilized. These criteria were derived from the IELTS writing standards and served as assessment guidelines and prompts for the written output.

A prompt is a question or detailed instruction that is provided to the model to obtain a proper response. After several pilot experiments, we decided to provide the measures (Section Measures of writing proficiency for nonnative Japanese ) as the input prompt and use the criteria (Section Criteria (output indicator) ) as the output indicator. Regarding the prompt language, considering that the LLM was tasked with rating Japanese essays, would prompt in Japanese works better Footnote 5 ? We conducted experiments comparing the performance of GPT-4 using both English and Japanese prompts. Additionally, we utilized the Japanese local model OCLL with Japanese prompts. Multiple trials were conducted using the same sample. Regardless of the prompt language used, we consistently obtained the same grading results with GPT-4, which assigned a grade of B1 to the writing sample. This suggested that GPT-4 is reliable and capable of producing consistent ratings regardless of the prompt language. On the other hand, when we used Japanese prompts with the Japanese local model “OCLL”, we encountered inconsistent grading results. Out of 10 attempts with OCLL, only 6 yielded consistent grading results (B1), while the remaining 4 showed different outcomes, including A1 and B2 grades. These findings indicated that the language of the prompt was not the determining factor for reliable AES. Instead, the size of the training data and the model parameters played crucial roles in achieving consistent and reliable AES results for the language model.

The following is the utilized prompt, which details all measures and requires the LLM to score the essays using holistic and trait scores.

Please evaluate Japanese essays written by Japanese learners and assign a score to each essay on a six-point scale, ranging from A1, A2, B1, B2, C1 to C2. Additionally, please provide trait scores and display the calculation process for each trait score. The scoring should be based on the following criteria:

Moving average type-token ratio.

Number of lexical words (token) divided by the total number of words per essay.

Number of sophisticated word types divided by the total number of words per essay.

Mean length of clause.

Verb phrases per T-unit.

Clauses per T-unit.

Dependent clauses per T-unit.

Complex nominals per clause.

Adverbial clauses per clause.

Coordinate phrases per clause.

Mean dependency distance.

Synonym overlap paragraph (topic and keywords).

Word2vec cosine similarity.

Connectives per essay.

Conjunctions per essay.

Number of metadiscourse markers (types) divided by the total number of words.

Number of errors per essay.

Japanese essay text

出かける前に二人が地図を見ている間に、サンドイッチを入れたバスケットに犬が入ってしまいました。それに気づかずに二人は楽しそうに出かけて行きました。やがて突然犬がバスケットから飛び出し、二人は驚きました。バスケット の 中を見ると、食べ物はすべて犬に食べられていて、二人は困ってしまいました。(ID_JJJ01_SW1)

The score of the example above was B1. Figure 3 provides an example of holistic and trait scores provided by GPT-4 (with a prompt indicating all measures) via Bing Footnote 6 .

figure 3

Example of GPT-4 AES and feedback (with a prompt indicating all measures).

Statistical analysis

The aim of this study is to investigate the potential use of LLM for nonnative Japanese AES. It seeks to compare the scoring outcomes obtained from feature-based AES tools, which rely on conventional machine learning technology (i.e. Jess, JWriter), with those generated by AI-driven AES tools utilizing deep learning technology (BERT, GPT, OCLL). To assess the reliability of a computer-assisted annotation tool, the study initially established human-human agreement as the benchmark measure. Subsequently, the performance of the LLM-based method was evaluated by comparing it to human-human agreement.

To assess annotation agreement, the study employed standard measures such as precision, recall, and F-score (Brants 2000 ; Lu 2010 ), along with the quadratically weighted kappa (QWK) to evaluate the consistency and agreement in the annotation process. Assume A and B represent human annotators. When comparing the annotations of the two annotators, the following results are obtained. The evaluation of precision, recall, and F-score metrics was illustrated in equations (13) to (15).

\({\rm{Recall}}(A,B)=\frac{{\rm{Number}}\,{\rm{of}}\,{\rm{identical}}\,{\rm{nodes}}\,{\rm{in}}\,A\,{\rm{and}}\,B}{{\rm{Number}}\,{\rm{of}}\,{\rm{nodes}}\,{\rm{in}}\,A}\)

\({\rm{Precision}}(A,\,B)=\frac{{\rm{Number}}\,{\rm{of}}\,{\rm{identical}}\,{\rm{nodes}}\,{\rm{in}}\,A\,{\rm{and}}\,B}{{\rm{Number}}\,{\rm{of}}\,{\rm{nodes}}\,{\rm{in}}\,B}\)

The F-score is the harmonic mean of recall and precision:

\({\rm{F}}-{\rm{score}}=\frac{2* ({\rm{Precision}}* {\rm{Recall}})}{{\rm{Precision}}+{\rm{Recall}}}\)

The highest possible value of an F-score is 1.0, indicating perfect precision and recall, and the lowest possible value is 0, if either precision or recall are zero.

In accordance with Taghipour and Ng ( 2016 ), the calculation of QWK involves two steps:

Step 1: Construct a weight matrix W as follows:

\({W}_{{ij}}=\frac{{(i-j)}^{2}}{{(N-1)}^{2}}\)

i represents the annotation made by the tool, while j represents the annotation made by a human rater. N denotes the total number of possible annotations. Matrix O is subsequently computed, where O_( i, j ) represents the count of data annotated by the tool ( i ) and the human annotator ( j ). On the other hand, E refers to the expected count matrix, which undergoes normalization to ensure that the sum of elements in E matches the sum of elements in O.

Step 2: With matrices O and E, the QWK is obtained as follows:

K = 1- \(\frac{\sum i,j{W}_{i,j}\,{O}_{i,j}}{\sum i,j{W}_{i,j}\,{E}_{i,j}}\)

The value of the quadratic weighted kappa increases as the level of agreement improves. Further, to assess the accuracy of LLM scoring, the proportional reductive mean square error (PRMSE) was employed. The PRMSE approach takes into account the variability observed in human ratings to estimate the rater error, which is then subtracted from the variance of the human labels. This calculation provides an overall measure of agreement between the automated scores and true scores (Haberman et al. 2015 ; Loukina et al. 2020 ; Taghipour and Ng, 2016 ). The computation of PRMSE involves the following steps:

Step 1: Calculate the mean squared errors (MSEs) for the scoring outcomes of the computer-assisted tool (MSE tool) and the human scoring outcomes (MSE human).

Step 2: Determine the PRMSE by comparing the MSE of the computer-assisted tool (MSE tool) with the MSE from human raters (MSE human), using the following formula:

\({\rm{PRMSE}}=1-\frac{({\rm{MSE}}\,{\rm{tool}})\,}{({\rm{MSE}}\,{\rm{human}})\,}=1-\,\frac{{\sum }_{i}^{n}=1{({{\rm{y}}}_{i}-{\hat{{\rm{y}}}}_{{\rm{i}}})}^{2}}{{\sum }_{i}^{n}=1{({{\rm{y}}}_{i}-\hat{{\rm{y}}})}^{2}}\)

In the numerator, ŷi represents the scoring outcome predicted by a specific LLM-driven AES system for a given sample. The term y i − ŷ i represents the difference between this predicted outcome and the mean value of all LLM-driven AES systems’ scoring outcomes. It quantifies the deviation of the specific LLM-driven AES system’s prediction from the average prediction of all LLM-driven AES systems. In the denominator, y i − ŷ represents the difference between the scoring outcome provided by a specific human rater for a given sample and the mean value of all human raters’ scoring outcomes. It measures the discrepancy between the specific human rater’s score and the average score given by all human raters. The PRMSE is then calculated by subtracting the ratio of the MSE tool to the MSE human from 1. PRMSE falls within the range of 0 to 1, with larger values indicating reduced errors in LLM’s scoring compared to those of human raters. In other words, a higher PRMSE implies that LLM’s scoring demonstrates greater accuracy in predicting the true scores (Loukina et al. 2020 ). The interpretation of kappa values, ranging from 0 to 1, is based on the work of Landis and Koch ( 1977 ). Specifically, the following categories are assigned to different ranges of kappa values: −1 indicates complete inconsistency, 0 indicates random agreement, 0.0 ~ 0.20 indicates extremely low level of agreement (slight), 0.21 ~ 0.40 indicates moderate level of agreement (fair), 0.41 ~ 0.60 indicates medium level of agreement (moderate), 0.61 ~ 0.80 indicates high level of agreement (substantial), 0.81 ~ 1 indicates almost perfect level of agreement. All statistical analyses were executed using Python script.

Results and discussion

Annotation reliability of the llm.

This section focuses on assessing the reliability of the LLM’s annotation and scoring capabilities. To evaluate the reliability, several tests were conducted simultaneously, aiming to achieve the following objectives:

Assess the LLM’s ability to differentiate between test takers with varying levels of oral proficiency.

Determine the level of agreement between the annotations and scoring performed by the LLM and those done by human raters.

The evaluation of the results encompassed several metrics, including: precision, recall, F-Score, quadratically-weighted kappa, proportional reduction of mean squared error, Pearson correlation, and multi-faceted Rasch measurement.

Inter-annotator agreement (human–human annotator agreement)

We started with an agreement test of the two human annotators. Two trained annotators were recruited to determine the writing task data measures. A total of 714 scripts, as the test data, was utilized. Each analysis lasted 300–360 min. Inter-annotator agreement was evaluated using the standard measures of precision, recall, and F-score and QWK. Table 7 presents the inter-annotator agreement for the various indicators. As shown, the inter-annotator agreement was fairly high, with F-scores ranging from 1.0 for sentence and word number to 0.666 for grammatical errors.

The findings from the QWK analysis provided further confirmation of the inter-annotator agreement. The QWK values covered a range from 0.950 ( p  = 0.000) for sentence and word number to 0.695 for synonym overlap number (keyword) and grammatical errors ( p  = 0.001).

Agreement of annotation outcomes between human and LLM

To evaluate the consistency between human annotators and LLM annotators (BERT, GPT, OCLL) across the indices, the same test was conducted. The results of the inter-annotator agreement (F-score) between LLM and human annotation are provided in Appendix B-D. The F-scores ranged from 0.706 for Grammatical error # for OCLL-human to a perfect 1.000 for GPT-human, for sentences, clauses, T-units, and words. These findings were further supported by the QWK analysis, which showed agreement levels ranging from 0.807 ( p  = 0.001) for metadiscourse markers for OCLL-human to 0.962 for words ( p  = 0.000) for GPT-human. The findings demonstrated that the LLM annotation achieved a significant level of accuracy in identifying measurement units and counts.

Reliability of LLM-driven AES’s scoring and discriminating proficiency levels

This section examines the reliability of the LLM-driven AES scoring through a comparison of the scoring outcomes produced by human raters and the LLM ( Reliability of LLM-driven AES scoring ). It also assesses the effectiveness of the LLM-based AES system in differentiating participants with varying proficiency levels ( Reliability of LLM-driven AES discriminating proficiency levels ).

Reliability of LLM-driven AES scoring

Table 8 summarizes the QWK coefficient analysis between the scores computed by the human raters and the GPT-4 for the individual essays from I-JAS Footnote 7 . As shown, the QWK of all measures ranged from k  = 0.819 for lexical density (number of lexical words (tokens)/number of words per essay) to k  = 0.644 for word2vec cosine similarity. Table 9 further presents the Pearson correlations between the 16 writing proficiency measures scored by human raters and GPT 4 for the individual essays. The correlations ranged from 0.672 for syntactic complexity to 0.734 for grammatical accuracy. The correlations between the writing proficiency scores assigned by human raters and the BERT-based AES system were found to range from 0.661 for syntactic complexity to 0.713 for grammatical accuracy. The correlations between the writing proficiency scores given by human raters and the OCLL-based AES system ranged from 0.654 for cohesion to 0.721 for grammatical accuracy. These findings indicated an alignment between the assessments made by human raters and both the BERT-based and OCLL-based AES systems in terms of various aspects of writing proficiency.

Reliability of LLM-driven AES discriminating proficiency levels

After validating the reliability of the LLM’s annotation and scoring, the subsequent objective was to evaluate its ability to distinguish between various proficiency levels. For this analysis, a dataset of 686 individual essays was utilized. Table 10 presents a sample of the results, summarizing the means, standard deviations, and the outcomes of the one-way ANOVAs based on the measures assessed by the GPT-4 model. A post hoc multiple comparison test, specifically the Bonferroni test, was conducted to identify any potential differences between pairs of levels.

As the results reveal, seven measures presented linear upward or downward progress across the three proficiency levels. These were marked in bold in Table 10 and comprise one measure of lexical richness, i.e. MATTR (lexical diversity); four measures of syntactic complexity, i.e. MDD (mean dependency distance), MLC (mean length of clause), CNT (complex nominals per T-unit), CPC (coordinate phrases rate); one cohesion measure, i.e. word2vec cosine similarity and GER (grammatical error rate). Regarding the ability of the sixteen measures to distinguish adjacent proficiency levels, the Bonferroni tests indicated that statistically significant differences exist between the primary level and the intermediate level for MLC and GER. One measure of lexical richness, namely LD, along with three measures of syntactic complexity (VPT, CT, DCT, ACC), two measures of cohesion (SOPT, SOPK), and one measure of content elaboration (IMM), exhibited statistically significant differences between proficiency levels. However, these differences did not demonstrate a linear progression between adjacent proficiency levels. No significant difference was observed in lexical sophistication between proficiency levels.

To summarize, our study aimed to evaluate the reliability and differentiation capabilities of the LLM-driven AES method. For the first objective, we assessed the LLM’s ability to differentiate between test takers with varying levels of oral proficiency using precision, recall, F-Score, and quadratically-weighted kappa. Regarding the second objective, we compared the scoring outcomes generated by human raters and the LLM to determine the level of agreement. We employed quadratically-weighted kappa and Pearson correlations to compare the 16 writing proficiency measures for the individual essays. The results confirmed the feasibility of using the LLM for annotation and scoring in AES for nonnative Japanese. As a result, Research Question 1 has been addressed.

Comparison of BERT-, GPT-, OCLL-based AES, and linguistic-feature-based computation methods

This section aims to compare the effectiveness of five AES methods for nonnative Japanese writing, i.e. LLM-driven approaches utilizing BERT, GPT, and OCLL, linguistic feature-based approaches using Jess and JWriter. The comparison was conducted by comparing the ratings obtained from each approach with human ratings. All ratings were derived from the dataset introduced in Dataset . To facilitate the comparison, the agreement between the automated methods and human ratings was assessed using QWK and PRMSE. The performance of each approach was summarized in Table 11 .

The QWK coefficient values indicate that LLMs (GPT, BERT, OCLL) and human rating outcomes demonstrated higher agreement compared to feature-based AES methods (Jess and JWriter) in assessing writing proficiency criteria, including lexical richness, syntactic complexity, content, and grammatical accuracy. Among the LLMs, the GPT-4 driven AES and human rating outcomes showed the highest agreement in all criteria, except for syntactic complexity. The PRMSE values suggest that the GPT-based method outperformed linguistic feature-based methods and other LLM-based approaches. Moreover, an interesting finding emerged during the study: the agreement coefficient between GPT-4 and human scoring was even higher than the agreement between different human raters themselves. This discovery highlights the advantage of GPT-based AES over human rating. Ratings involve a series of processes, including reading the learners’ writing, evaluating the content and language, and assigning scores. Within this chain of processes, various biases can be introduced, stemming from factors such as rater biases, test design, and rating scales. These biases can impact the consistency and objectivity of human ratings. GPT-based AES may benefit from its ability to apply consistent and objective evaluation criteria. By prompting the GPT model with detailed writing scoring rubrics and linguistic features, potential biases in human ratings can be mitigated. The model follows a predefined set of guidelines and does not possess the same subjective biases that human raters may exhibit. This standardization in the evaluation process contributes to the higher agreement observed between GPT-4 and human scoring. Section Prompt strategy of the study delves further into the role of prompts in the application of LLMs to AES. It explores how the choice and implementation of prompts can impact the performance and reliability of LLM-based AES methods. Furthermore, it is important to acknowledge the strengths of the local model, i.e. the Japanese local model OCLL, which excels in processing certain idiomatic expressions. Nevertheless, our analysis indicated that GPT-4 surpasses local models in AES. This superior performance can be attributed to the larger parameter size of GPT-4, estimated to be between 500 billion and 1 trillion, which exceeds the sizes of both BERT and the local model OCLL.

Prompt strategy

In the context of prompt strategy, Mizumoto and Eguchi ( 2023 ) conducted a study where they applied the GPT-3 model to automatically score English essays in the TOEFL test. They found that the accuracy of the GPT model alone was moderate to fair. However, when they incorporated linguistic measures such as cohesion, syntactic complexity, and lexical features alongside the GPT model, the accuracy significantly improved. This highlights the importance of prompt engineering and providing the model with specific instructions to enhance its performance. In this study, a similar approach was taken to optimize the performance of LLMs. GPT-4, which outperformed BERT and OCLL, was selected as the candidate model. Model 1 was used as the baseline, representing GPT-4 without any additional prompting. Model 2, on the other hand, involved GPT-4 prompted with 16 measures that included scoring criteria, efficient linguistic features for writing assessment, and detailed measurement units and calculation formulas. The remaining models (Models 3 to 18) utilized GPT-4 prompted with individual measures. The performance of these 18 different models was assessed using the output indicators described in Section Criteria (output indicator) . By comparing the performances of these models, the study aimed to understand the impact of prompt engineering on the accuracy and effectiveness of GPT-4 in AES tasks.

Based on the PRMSE scores presented in Fig. 4 , it was observed that Model 1, representing GPT-4 without any additional prompting, achieved a fair level of performance. However, Model 2, which utilized GPT-4 prompted with all measures, outperformed all other models in terms of PRMSE score, achieving a score of 0.681. These results indicate that the inclusion of specific measures and prompts significantly enhanced the performance of GPT-4 in AES. Among the measures, syntactic complexity was found to play a particularly significant role in improving the accuracy of GPT-4 in assessing writing quality. Following that, lexical diversity emerged as another important factor contributing to the model’s effectiveness. The study suggests that a well-prompted GPT-4 can serve as a valuable tool to support human assessors in evaluating writing quality. By utilizing GPT-4 as an automated scoring tool, the evaluation biases associated with human raters can be minimized. This has the potential to empower teachers by allowing them to focus on designing writing tasks and guiding writing strategies, while leveraging the capabilities of GPT-4 for efficient and reliable scoring.

figure 4

PRMSE scores of the 18 AES models.

This study aimed to investigate two main research questions: the feasibility of utilizing LLMs for AES and the impact of prompt engineering on the application of LLMs in AES.

To address the first objective, the study compared the effectiveness of five different models: GPT, BERT, the Japanese local LLM (OCLL), and two conventional machine learning-based AES tools (Jess and JWriter). The PRMSE values indicated that the GPT-4-based method outperformed other LLMs (BERT, OCLL) and linguistic feature-based computational methods (Jess and JWriter) across various writing proficiency criteria. Furthermore, the agreement coefficient between GPT-4 and human scoring surpassed the agreement among human raters themselves, highlighting the potential of using the GPT-4 tool to enhance AES by reducing biases and subjectivity, saving time, labor, and cost, and providing valuable feedback for self-study. Regarding the second goal, the role of prompt design was investigated by comparing 18 models, including a baseline model, a model prompted with all measures, and 16 models prompted with one measure at a time. GPT-4, which outperformed BERT and OCLL, was selected as the candidate model. The PRMSE scores of the models showed that GPT-4 prompted with all measures achieved the best performance, surpassing the baseline and other models.

In conclusion, this study has demonstrated the potential of LLMs in supporting human rating in assessments. By incorporating automation, we can save time and resources while reducing biases and subjectivity inherent in human rating processes. Automated language assessments offer the advantage of accessibility, providing equal opportunities and economic feasibility for individuals who lack access to traditional assessment centers or necessary resources. LLM-based language assessments provide valuable feedback and support to learners, aiding in the enhancement of their language proficiency and the achievement of their goals. This personalized feedback can cater to individual learner needs, facilitating a more tailored and effective language-learning experience.

There are three important areas that merit further exploration. First, prompt engineering requires attention to ensure optimal performance of LLM-based AES across different language types. This study revealed that GPT-4, when prompted with all measures, outperformed models prompted with fewer measures. Therefore, investigating and refining prompt strategies can enhance the effectiveness of LLMs in automated language assessments. Second, it is crucial to explore the application of LLMs in second-language assessment and learning for oral proficiency, as well as their potential in under-resourced languages. Recent advancements in self-supervised machine learning techniques have significantly improved automatic speech recognition (ASR) systems, opening up new possibilities for creating reliable ASR systems, particularly for under-resourced languages with limited data. However, challenges persist in the field of ASR. First, ASR assumes correct word pronunciation for automatic pronunciation evaluation, which proves challenging for learners in the early stages of language acquisition due to diverse accents influenced by their native languages. Accurately segmenting short words becomes problematic in such cases. Second, developing precise audio-text transcriptions for languages with non-native accented speech poses a formidable task. Last, assessing oral proficiency levels involves capturing various linguistic features, including fluency, pronunciation, accuracy, and complexity, which are not easily captured by current NLP technology.

Data availability

The dataset utilized was obtained from the International Corpus of Japanese as a Second Language (I-JAS). The data URLs: [ https://www2.ninjal.ac.jp/jll/lsaj/ihome2.html ].

J-CAT and TTBJ are two computerized adaptive tests used to assess Japanese language proficiency.

SPOT is a specific component of the TTBJ test.

J-CAT: https://www.j-cat2.org/html/ja/pages/interpret.html

SPOT: https://ttbj.cegloc.tsukuba.ac.jp/p1.html#SPOT .

The study utilized a prompt-based GPT-4 model, developed by OpenAI, which has an impressive architecture with 1.8 trillion parameters across 120 layers. GPT-4 was trained on a vast dataset of 13 trillion tokens, using two stages: initial training on internet text datasets to predict the next token, and subsequent fine-tuning through reinforcement learning from human feedback.

https://www2.ninjal.ac.jp/jll/lsaj/ihome2-en.html .

http://jhlee.sakura.ne.jp/JEV/ by Japanese Learning Dictionary Support Group 2015.

We express our sincere gratitude to the reviewer for bringing this matter to our attention.

On February 7, 2023, Microsoft began rolling out a major overhaul to Bing that included a new chatbot feature based on OpenAI’s GPT-4 (Bing.com).

Appendix E-F present the analysis results of the QWK coefficient between the scores computed by the human raters and the BERT, OCLL models.

Attali Y, Burstein J (2006) Automated essay scoring with e-rater® V.2. J. Technol., Learn. Assess., 4

Barkaoui K, Hadidi A (2020) Assessing Change in English Second Language Writing Performance (1st ed.). Routledge, New York. https://doi.org/10.4324/9781003092346

Bentz C, Tatyana R, Koplenig A, Tanja S (2016) A comparison between morphological complexity. measures: Typological data vs. language corpora. In Proceedings of the workshop on computational linguistics for linguistic complexity (CL4LC), 142–153. Osaka, Japan: The COLING 2016 Organizing Committee

Bond TG, Yan Z, Heene M (2021) Applying the Rasch model: Fundamental measurement in the human sciences (4th ed). Routledge

Brants T (2000) Inter-annotator agreement for a German newspaper corpus. Proceedings of the Second International Conference on Language Resources and Evaluation (LREC’00), Athens, Greece, 31 May-2 June, European Language Resources Association

Brown TB, Mann B, Ryder N, et al. (2020) Language models are few-shot learners. Advances in Neural Information Processing Systems, Online, 6–12 December, Curran Associates, Inc., Red Hook, NY

Burstein J (2003) The E-rater scoring engine: Automated essay scoring with natural language processing. In Shermis MD and Burstein JC (ed) Automated Essay Scoring: A Cross-Disciplinary Perspective. Lawrence Erlbaum Associates, Mahwah, NJ

Čech R, Miroslav K (2018) Morphological richness of text. In Masako F, Václav C (ed) Taming the corpus: From inflection and lexis to interpretation, 63–77. Cham, Switzerland: Springer Nature

Çöltekin Ç, Taraka, R (2018) Exploiting Universal Dependencies treebanks for measuring morphosyntactic complexity. In Aleksandrs B, Christian B (ed), Proceedings of first workshop on measuring language complexity, 1–7. Torun, Poland

Crossley SA, Cobb T, McNamara DS (2013) Comparing count-based and band-based indices of word frequency: Implications for active vocabulary research and pedagogical applications. System 41:965–981. https://doi.org/10.1016/j.system.2013.08.002

Article   Google Scholar  

Crossley SA, McNamara DS (2016) Say more and be more coherent: How text elaboration and cohesion can increase writing quality. J. Writ. Res. 7:351–370

CyberAgent Inc (2023) Open-Calm series of Japanese language models. Retrieved from: https://www.cyberagent.co.jp/news/detail/id=28817

Devlin J, Chang MW, Lee K, Toutanova K (2019) BERT: Pre-training of deep bidirectional transformers for language understanding. Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics, Minneapolis, Minnesota, 2–7 June, pp. 4171–4186. Association for Computational Linguistics

Diez-Ortega M, Kyle K (2023) Measuring the development of lexical richness of L2 Spanish: a longitudinal learner corpus study. Studies in Second Language Acquisition 1-31

Eckes T (2009) On common ground? How raters perceive scoring criteria in oral proficiency testing. In Brown A, Hill K (ed) Language testing and evaluation 13: Tasks and criteria in performance assessment (pp. 43–73). Peter Lang Publishing

Elliot S (2003) IntelliMetric: from here to validity. In: Shermis MD, Burstein JC (ed) Automated Essay Scoring: A Cross-Disciplinary Perspective. Lawrence Erlbaum Associates, Mahwah, NJ

Google Scholar  

Engber CA (1995) The relationship of lexical proficiency to the quality of ESL compositions. J. Second Lang. Writ. 4:139–155

Garner J, Crossley SA, Kyle K (2019) N-gram measures and L2 writing proficiency. System 80:176–187. https://doi.org/10.1016/j.system.2018.12.001

Haberman SJ (2008) When can subscores have value? J. Educat. Behav. Stat., 33:204–229

Haberman SJ, Yao L, Sinharay S (2015) Prediction of true test scores from observed item scores and ancillary data. Brit. J. Math. Stat. Psychol. 68:363–385

Halliday MAK (1985) Spoken and Written Language. Deakin University Press, Melbourne, Australia

Hirao R, Arai M, Shimanaka H et al. (2020) Automated essay scoring system for nonnative Japanese learners. Proceedings of the 12th Conference on Language Resources and Evaluation (LREC 2020), pp. 1250–1257. European Language Resources Association

Hunt KW (1966) Recent Measures in Syntactic Development. Elementary English, 43(7), 732–739. http://www.jstor.org/stable/41386067

Ishioka T (2001) About e-rater, a computer-based automatic scoring system for essays [Konpyūta ni yoru essei no jidō saiten shisutemu e − rater ni tsuite]. University Entrance Examination. Forum [Daigaku nyūshi fōramu] 24:71–76

Hochreiter S, Schmidhuber J (1997) Long short- term memory. Neural Comput. 9(8):1735–1780

Article   CAS   PubMed   Google Scholar  

Ishioka T, Kameda M (2006) Automated Japanese essay scoring system based on articles written by experts. Proceedings of the 21st International Conference on Computational Linguistics and 44th Annual Meeting of the Association for Computational Linguistics, Sydney, Australia, 17–18 July 2006, pp. 233-240. Association for Computational Linguistics, USA

Japan Foundation (2021) Retrieved from: https://www.jpf.gp.jp/j/project/japanese/survey/result/dl/survey2021/all.pdf

Jarvis S (2013a) Defining and measuring lexical diversity. In Jarvis S, Daller M (ed) Vocabulary knowledge: Human ratings and automated measures (Vol. 47, pp. 13–44). John Benjamins. https://doi.org/10.1075/sibil.47.03ch1

Jarvis S (2013b) Capturing the diversity in lexical diversity. Lang. Learn. 63:87–106. https://doi.org/10.1111/j.1467-9922.2012.00739.x

Jiang J, Quyang J, Liu H (2019) Interlanguage: A perspective of quantitative linguistic typology. Lang. Sci. 74:85–97

Kim M, Crossley SA, Kyle K (2018) Lexical sophistication as a multidimensional phenomenon: Relations to second language lexical proficiency, development, and writing quality. Mod. Lang. J. 102(1):120–141. https://doi.org/10.1111/modl.12447

Kojima T, Gu S, Reid M et al. (2022) Large language models are zero-shot reasoners. Advances in Neural Information Processing Systems, New Orleans, LA, 29 November-1 December, Curran Associates, Inc., Red Hook, NY

Kyle K, Crossley SA (2015) Automatically assessing lexical sophistication: Indices, tools, findings, and application. TESOL Q 49:757–786

Kyle K, Crossley SA, Berger CM (2018) The tool for the automatic analysis of lexical sophistication (TAALES): Version 2.0. Behav. Res. Methods 50:1030–1046. https://doi.org/10.3758/s13428-017-0924-4

Article   PubMed   Google Scholar  

Kyle K, Crossley SA, Jarvis S (2021) Assessing the validity of lexical diversity using direct judgements. Lang. Assess. Q. 18:154–170. https://doi.org/10.1080/15434303.2020.1844205

Landauer TK, Laham D, Foltz PW (2003) Automated essay scoring and annotation of essays with the Intelligent Essay Assessor. In Shermis MD, Burstein JC (ed), Automated Essay Scoring: A Cross-Disciplinary Perspective. Lawrence Erlbaum Associates, Mahwah, NJ

Landis JR, Koch GG (1977) The measurement of observer agreement for categorical data. Biometrics 159–174

Laufer B, Nation P (1995) Vocabulary size and use: Lexical richness in L2 written production. Appl. Linguist. 16:307–322. https://doi.org/10.1093/applin/16.3.307

Lee J, Hasebe Y (2017) jWriter Learner Text Evaluator, URL: https://jreadability.net/jwriter/

Lee J, Kobayashi N, Sakai T, Sakota K (2015) A Comparison of SPOT and J-CAT Based on Test Analysis [Tesuto bunseki ni motozuku ‘SPOT’ to ‘J-CAT’ no hikaku]. Research on the Acquisition of Second Language Japanese [Dainigengo to shite no nihongo no shūtoku kenkyū] (18) 53–69

Li W, Yan J (2021) Probability distribution of dependency distance based on a Treebank of. Japanese EFL Learners’ Interlanguage. J. Quant. Linguist. 28(2):172–186. https://doi.org/10.1080/09296174.2020.1754611

Article   MathSciNet   Google Scholar  

Linacre JM (2002) Optimizing rating scale category effectiveness. J. Appl. Meas. 3(1):85–106

PubMed   Google Scholar  

Linacre JM (1994) Constructing measurement with a Many-Facet Rasch Model. In Wilson M (ed) Objective measurement: Theory into practice, Volume 2 (pp. 129–144). Norwood, NJ: Ablex

Liu H (2008) Dependency distance as a metric of language comprehension difficulty. J. Cognitive Sci. 9:159–191

Liu H, Xu C, Liang J (2017) Dependency distance: A new perspective on syntactic patterns in natural languages. Phys. Life Rev. 21. https://doi.org/10.1016/j.plrev.2017.03.002

Loukina A, Madnani N, Cahill A, et al. (2020) Using PRMSE to evaluate automated scoring systems in the presence of label noise. Proceedings of the Fifteenth Workshop on Innovative Use of NLP for Building Educational Applications, Seattle, WA, USA → Online, 10 July, pp. 18–29. Association for Computational Linguistics

Lu X (2010) Automatic analysis of syntactic complexity in second language writing. Int. J. Corpus Linguist. 15:474–496

Lu X (2012) The relationship of lexical richness to the quality of ESL learners’ oral narratives. Mod. Lang. J. 96:190–208

Lu X (2017) Automated measurement of syntactic complexity in corpus-based L2 writing research and implications for writing assessment. Lang. Test. 34:493–511

Lu X, Hu R (2022) Sense-aware lexical sophistication indices and their relationship to second language writing quality. Behav. Res. Method. 54:1444–1460. https://doi.org/10.3758/s13428-021-01675-6

Ministry of Health, Labor, and Welfare of Japan (2022) Retrieved from: https://www.mhlw.go.jp/stf/newpage_30367.html

Mizumoto A, Eguchi M (2023) Exploring the potential of using an AI language model for automated essay scoring. Res. Methods Appl. Linguist. 3:100050

Okgetheng B, Takeuchi K (2024) Estimating Japanese Essay Grading Scores with Large Language Models. Proceedings of 30th Annual Conference of the Language Processing Society in Japan, March 2024

Ortega L (2015) Second language learning explained? SLA across 10 contemporary theories. In VanPatten B, Williams J (ed) Theories in Second Language Acquisition: An Introduction

Rae JW, Borgeaud S, Cai T, et al. (2021) Scaling Language Models: Methods, Analysis & Insights from Training Gopher. ArXiv, abs/2112.11446

Read J (2000) Assessing vocabulary. Cambridge University Press. https://doi.org/10.1017/CBO9780511732942

Rudner LM, Liang T (2002) Automated Essay Scoring Using Bayes’ Theorem. J. Technol., Learning and Assessment, 1 (2)

Sakoda K, Hosoi Y (2020) Accuracy and complexity of Japanese Language usage by SLA learners in different learning environments based on the analysis of I-JAS, a learners’ corpus of Japanese as L2. Math. Linguist. 32(7):403–418. https://doi.org/10.24701/mathling.32.7_403

Suzuki N (1999) Summary of survey results regarding comprehensive essay questions. Final report of “Joint Research on Comprehensive Examinations for the Aim of Evaluating Applicability to Each Specialized Field of Universities” for 1996-2000 [shōronbun sōgō mondai ni kansuru chōsa kekka no gaiyō. Heisei 8 - Heisei 12-nendo daigaku no kaku senmon bun’ya e no tekisei no hyōka o mokuteki to suru sōgō shiken no arikata ni kansuru kyōdō kenkyū’ saishū hōkoku-sho]. University Entrance Examination Section Center Research and Development Department [Daigaku nyūshi sentā kenkyū kaihatsubu], 21–32

Taghipour K, Ng HT (2016) A neural approach to automated essay scoring. Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing, Austin, Texas, 1–5 November, pp. 1882–1891. Association for Computational Linguistics

Takeuchi K, Ohno M, Motojin K, Taguchi M, Inada Y, Iizuka M, Abo T, Ueda H (2021) Development of essay scoring methods based on reference texts with construction of research-available Japanese essay data. In IPSJ J 62(9):1586–1604

Ure J (1971) Lexical density: A computational technique and some findings. In Coultard M (ed) Talking about Text. English Language Research, University of Birmingham, Birmingham, England

Vaswani A, Shazeer N, Parmar N, et al. (2017) Attention is all you need. In Advances in Neural Information Processing Systems, Long Beach, CA, 4–7 December, pp. 5998–6008, Curran Associates, Inc., Red Hook, NY

Watanabe H, Taira Y, Inoue Y (1988) Analysis of essay evaluation data [Shōronbun hyōka dēta no kaiseki]. Bulletin of the Faculty of Education, University of Tokyo [Tōkyōdaigaku kyōiku gakubu kiyō], Vol. 28, 143–164

Yao S, Yu D, Zhao J, et al. (2023) Tree of thoughts: Deliberate problem solving with large language models. Advances in Neural Information Processing Systems, 36

Zenker F, Kyle K (2021) Investigating minimum text lengths for lexical diversity indices. Assess. Writ. 47:100505. https://doi.org/10.1016/j.asw.2020.100505

Zhang Y, Warstadt A, Li X, et al. (2021) When do you need billions of words of pretraining data? Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing, Online, pp. 1112-1125. Association for Computational Linguistics. https://doi.org/10.18653/v1/2021.acl-long.90

Download references

This research was funded by National Foundation of Social Sciences (22BYY186) to Wenchao Li.

Author information

Authors and affiliations.

Department of Japanese Studies, Zhejiang University, Hangzhou, China

Department of Linguistics and Applied Linguistics, Zhejiang University, Hangzhou, China

You can also search for this author in PubMed   Google Scholar

Contributions

Wenchao Li is in charge of conceptualization, validation, formal analysis, investigation, data curation, visualization and writing the draft. Haitao Liu is in charge of supervision.

Corresponding author

Correspondence to Wenchao Li .

Ethics declarations

Competing interests.

The authors declare no competing interests.

Ethical approval

Ethical approval was not required as the study did not involve human participants.

Informed consent

This article does not contain any studies with human participants performed by any of the authors.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary information

Supplemental material file #1, rights and permissions.

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/ .

Reprints and permissions

About this article

Cite this article.

Li, W., Liu, H. Applying large language models for automated essay scoring for non-native Japanese. Humanit Soc Sci Commun 11 , 723 (2024). https://doi.org/10.1057/s41599-024-03209-9

Download citation

Received : 02 February 2024

Accepted : 16 May 2024

Published : 03 June 2024

DOI : https://doi.org/10.1057/s41599-024-03209-9

Share this article

Anyone you share the following link with will be able to read this content:

Sorry, a shareable link is not currently available for this article.

Provided by the Springer Nature SharedIt content-sharing initiative

Quick links

  • Explore articles by subject
  • Guide to authors
  • Editorial policies

10 paragraph essay format

COMMENTS

  1. Example of a Great Essay

    This example guides you through the structure of an essay. It shows how to build an effective introduction, focused paragraphs, clear transitions between ideas, and a strong conclusion. Each paragraph addresses a single central point, introduced by a topic sentence, and each point is directly related to the thesis statement.

  2. How to Structure an Essay

    The basic structure of an essay always consists of an introduction, a body, and a conclusion. But for many students, the most difficult part of structuring an essay is deciding how to organize information within the body. This article provides useful templates and tips to help you outline your essay, make decisions about your structure, and ...

  3. How to Write an Essay Outline

    Revised on July 23, 2023. An essay outline is a way of planning the structure of your essay before you start writing. It involves writing quick summary sentences or phrases for every point you will cover in each paragraph, giving you a picture of how your argument will unfold. You'll sometimes be asked to submit an essay outline as a separate ...

  4. A step-by-step guide for creating and formatting APA Style student papers

    This article walks through the formatting steps needed to create an APA Style student paper, starting with a basic setup that applies to the entire paper (margins, font, line spacing, paragraph alignment and indentation, and page headers). It then covers formatting for the major sections of a student paper: the title page, the text, tables and ...

  5. PDF Strategies for Essay Writing

    When you write an essay for a course you are taking, you are being asked not only to create a product (the essay) but, more importantly, to go through a process of thinking more deeply about a question or problem related to the course. By writing about a source or collection of sources, you will have the chance to wrestle with some of the

  6. The 3 Popular Essay Formats: Which Should You Use?

    Formatting an essay may not be as interesting as choosing a topic to write about or carefully crafting elegant sentences, but it's an extremely important part of creating a high-quality paper. In this article, we'll explain essay formatting rules for three of the most popular essay styles: MLA, APA, and Chicago.

  7. How to Craft a Stellar 5-Paragraph Essay: A Step-by-Step Guide

    Write the Introduction. Start the essay with a " hook "—an attention-grabbing statement that will get the reader's interest. This could be an interesting fact, a quote, or a question. After the hook, introduce your topic and end the introduction with a clear thesis statement that presents your main argument or point.

  8. PDF Basic Essay and Paragraph Format

    A basic essay consists of three main parts: introduction, body, and conclusion. Following this format will help you write and organize an essay. However, flexibility is important. While keeping this basic essay format in mind, let the topic and specific assignment guide the writing and organization.

  9. 5 Paragraph Essay: Guide, Topics, Outline, Examples, Tips

    5 Paragraph Essay Format. Let's readdress the five-paragraph essay format and explain it in more detail. So, as already mentioned, it is a widely-used writing structure taught in many schools and universities. A five-paragraph essay comprises an introduction, three body paragraphs, and a conclusion, each playing a significant role in creating a ...

  10. Mastering the Five Paragraph Essay: Easy Steps for Successful Writing

    5. Look for consistency: Check for consistency in your writing style, tone, and formatting. Ensure that you maintain a consistent voice and perspective throughout your essay to keep your argument coherent. 6. Seek feedback from others: Consider asking a peer, teacher, or tutor to review your essay and provide feedback.

  11. Academic Paragraph Structure

    Table of contents. Step 1: Identify the paragraph's purpose. Step 2: Show why the paragraph is relevant. Step 3: Give evidence. Step 4: Explain or interpret the evidence. Step 5: Conclude the paragraph. Step 6: Read through the whole paragraph. When to start a new paragraph.

  12. Paragraphs

    Paragraphs are the building blocks of papers. Many students define paragraphs in terms of length: a paragraph is a group of at least five sentences, a paragraph is half a page long, etc. In reality, though, the unity and coherence of ideas among sentences is what constitutes a paragraph. A paragraph is defined as "a group of sentences or a ...

  13. Five-paragraph essay

    The five-paragraph essay is a form of essay having five paragraphs : one concluding paragraph. The introduction serves to inform the reader of the basic premises, and then to state the author's thesis, or central idea. A thesis can also be used to point out the subject of each body paragraph. When a thesis essay is applied to this format, the ...

  14. Five Paragraph Essay Outline

    If writing isn't one of your favorite requirements of academic life, check out these 10 tips and tricks to navigate creating a five paragraph essay smoothly: Begin early. Make an appointment with the teacher to discuss your ideas/progress and get feedback. Take good notes, and cite the sources as you go. Create a sentence outline before the ...

  15. General Format

    Type your paper on a computer and print it out on standard, white 8.5 x 11-inch paper. Double-space the text of your paper and use a legible font (e.g. Times New Roman). Whatever font you choose, MLA recommends that the regular and italics type styles contrast enough that they are each distinct from one another.

  16. Welcome to the Purdue Online Writing Lab

    Learn how to write effectively for academic, professional, and personal purposes at the Purdue Online Writing Lab, a free resource for writers of all levels.

  17. 10 sentence essay example

    Sure! Let's go through an example 10-sentence essay on the topic "The Importance of Time Management for Students." I'll also provide some tips for writing a concise essay along the way. 1. Time management is a vital skill for students, as it helps them balance academics, extracurricular activities, and personal life. (Tip: Start with a clear and concise topic sentence that introduces the main ...

  18. Rethinking the 5-Paragraph Essay in the ChatGPT Era

    A typical five-paragraph essay asks students to pick a simple thesis, usually from a list of prompts, and compose a short introductory paragraph, followed by three paragraphs each laying out a ...

  19. How to Write an Expository Essay

    The structure of your expository essay will vary according to the scope of your assignment and the demands of your topic. It's worthwhile to plan out your structure before you start, using an essay outline. A common structure for a short expository essay consists of five paragraphs: An introduction, three body paragraphs, and a conclusion.

  20. Applying large language models for automated essay scoring for non

    Synonym overlap paragraph (topic and keywords). ... a dataset of 686 individual essays was utilized. Table 10 presents a sample of the results, summarizing the means, standard deviations, and the ...