WordSelector

18 Other Ways to Say “However” in an Essay


You’re in the midst of a formal essay, and it looks like you’ve used “however” far too many times. Well, you’ve come to the right place!

Below, we’ve compiled a list of great alternative terms that you can use when “however” starts to feel worn out. So, keep reading to find what you seek!

Other Ways to Say “However”

  • Nevertheless
  • Alternatively
  • Nonetheless
  • All the same
  • In spite of
  • Notwithstanding
  • On the other hand
  • In contrast

KEY TAKEAWAYS

  • It’s perfectly okay to use “however” in an essay.
  • “Nevertheless” is a good alternative to use to keep your paper diverse.
  • You can also use “alternatively” to avoid repetition of the word “however.”

Keep reading to see how we use our favorite synonyms for “however” in a couple of useful examples.

After that, we’ll consider whether it’s okay to use “however” in an essay. Is this considered bad practice?

If you’re wondering what to say instead of “however” in an essay, you might want to try “nevertheless.”

Like the original word, this term is used to introduce information that contrasts with a previous statement. “Nevertheless” and “however” do differ slightly in overall meaning.

However (or nevertheless), you’ll find that they can often be used interchangeably at the start of a sentence.

In other words, “nevertheless” is not a better word than “however” to use in formal or academic writing. But you can use this alternative to avoid repetition in your essay.

Finally, let’s see a couple of faux essay snippets making use of “nevertheless”:

Nevertheless, the ICO has provided several useful resources to guide sellers in their marketing pursuits.

After months of negotiations with unions, strikes broke out nevertheless.

“Alternatively” is another word to use instead of “however” in academic writing. Like the original phrase, it can be used at the start of a sentence.

Essentially, “alternatively” means “as another possibility.” As such, it can be used to present a counterpoint to a previous statement in a paper.

“However” is just as effective as “alternatively,” but you can use this synonym to keep your phrasing diverse and your paper more interesting.

Lastly, let’s see a few examples making use of this term:

Small businesses feel that they have no choice but to cease the use of cold-calling altogether or, alternatively, undergo a costly remodeling of their marketing in an attempt to comply.

Alternatively, we may observe adaptation to these new conditions amongst our specimens.

Can I Use “However” in an Essay?

It is perfectly okay to use “however” in an essay. However, we do advise that you use it with caution.

Although it is not a bad word by any means, it is very easy to overuse, and that repetition can count against your essay with any marker.

Therefore, it’s a good idea to use our list of synonyms to find other ways to say “however” when you have already used it.

Nevertheless, “however” is a perfectly polite word that can be used to introduce contrasting information or to transition to a new sentence. It is very effective, and you’re unlikely to find an academic paper that makes no use of it at all.

We hope you found our list of synonyms helpful. If you think you might need them the next time you’re drafting an essay, why not bookmark this page so you can find it again with ease?


Copyright WordHippo © 2024

Another Word for However | 100+ Synonyms for However in English

“However” is a transitional word used to introduce a contrasting statement or idea. In this article, we provide over 100 synonyms for “however,” ranging from formal to informal to creative.

Using synonyms can add variety and clarity to your writing or speech, and it’s important to consider the context and tone of your message when selecting the most appropriate alternative.

However Synonyms

  • Commonly used synonyms for “however”
  • Formal synonyms for “however”
  • Informal synonyms for “however”
  • Creative synonyms for “however”
  • Rarely used synonyms for “however”
  • 100+ different words to use instead of “however”
  • However Synonyms Infographic

Using synonyms in writing is a great way to avoid repetition and make your writing more interesting. One word that is commonly used in writing is “however.” While it is an important word that helps to indicate contrast, using it repeatedly can make your writing feel stale.

This article will provide 100+ synonyms for “however” in English that you can use to make your writing more varied and interesting.

The following are some of the commonly used synonyms for “however”:

  • But: Used to introduce a statement that contrasts with what has already been said.
  • Yet: Used to introduce a statement that contrasts with what has already been said.
  • Although: Used to introduce a statement that contrasts with what has already been said.
  • Nevertheless: Used to introduce a statement that contrasts with what has already been said.
  • Still: Used to introduce a statement that contrasts with what has already been said.
  • Nonetheless: Used to introduce a statement that contrasts with what has already been said.
  • Regardless: Used to introduce a statement that contrasts with what has already been said.
  • Despite this: Used to introduce a statement that contrasts with what has already been said.
  • In spite of this: Used to introduce a statement that contrasts with what has already been said.
  • On the other hand: Used to introduce a contrasting statement.
  • He studied hard for the exam; however, he still failed.
  • He promised to be on time; nevertheless, he arrived late.
  • She is a great speaker; nonetheless, her presentation lacked enthusiasm.
  • I know it’s risky; regardless, I think we should proceed.
  • The weather is bad; despite this, we decided to go camping.

The following are some of the formal synonyms for “however”:

  • Notwithstanding: Used to introduce a statement that contrasts with what has already been said.
  • Conversely: Used to introduce a statement that contrasts with what has already been said.
  • In contrast: Used to introduce a statement that contrasts with what has already been said.
  • By contrast: Used to introduce a statement that contrasts with what has already been said.
  • In any event: Used to introduce a statement that contrasts with what has already been said.
  • Notwithstanding the challenges, the project was completed on time.
  • The CEO was optimistic about the future; conversely, the shareholders were concerned.
  • The new product was a success; in contrast, the old product was not well received.
  • The team was well-prepared; by contrast, their opponents were not.
  • In any event, we need to be prepared for any scenario.

The following are some of the informal synonyms for “however”:

  • Anyways: Used to introduce a statement that contrasts with what has already been said.
  • Anyway: Used to introduce a statement that contrasts with what has already been said.
  • Regardless: Used to introduce a statement that contrasts with what has already been said.
  • Be that as it may: Used to introduce a statement that contrasts with what has already been said.
  • Even so: Used to introduce a statement that contrasts with what has already been said.
  • That being said: Used to introduce a statement that contrasts with what has already been said.
  • Still and all: Used to introduce a statement that contrasts with what has already been said.
  • That said: Used to introduce a statement that contrasts with what has already been said.
  • I know we’re behind schedule; anyways, we can still catch up.
  • He’s not the most reliable employee; anyway, we need to give him a chance.
  • The conditions were difficult; regardless, we completed the task.
  • Be that as it may, we need to find a solution to this problem.
  • The proposal has some flaws; even so, it’s worth considering.
  • That being said, we need to be careful not to overspend.
  • Still and all, we managed to finish the project on time.
  • That said, we need to weigh the pros and cons before making a decision.

The following are some of the creative synonyms for “however”:

  • On the flip side: Used to introduce a contrasting point of view or perspective.
  • On the contrary: Used to introduce a contrasting point of view or perspective.
  • That said: Used to introduce a contrasting point of view or perspective.
  • In any case: Used to introduce a contrasting point of view or perspective.
  • Then again: Used to introduce a contrasting point of view or perspective.
  • Having said that: Used to introduce a contrasting point of view or perspective.
  • Even though: Used to introduce a contrasting point of view or perspective.
  • Despite the fact that: Used to introduce a contrasting point of view or perspective.
  • The job is well-paid; on the flip side, it requires a lot of overtime.
  • He claimed to be an expert; on the contrary, he had no experience in the field.
  • That said, we need to consider other options before making a decision.
  • In any case, we need to address the issue as soon as possible.
  • The project is challenging; then again, it’s also rewarding.
  • Having said that, we need to be aware of the potential risks.
  • Even though we have a tight deadline, we can still deliver high-quality work.
  • Despite the fact that the market is competitive, we can still succeed.

The following are some of the rarely used synonyms for “however”:

  • Nonetheless: Used to introduce a statement that contrasts with what has already been said.
  • Howbeit: Used to introduce a statement that contrasts with what has already been said.
  • Notwithstanding: Used to introduce a statement that contrasts with what has already been said.
  • Albeit: Used to introduce a statement that contrasts with what has already been said.
  • The weather was bad; nonetheless, we went ahead with the outdoor event.
  • The product is expensive; howbeit, it’s worth the investment.
  • Notwithstanding the challenges, we managed to complete the project on time.
  • The results, albeit disappointing, taught us a lot from the experience.
1. Although: Used to introduce a contrasting statement or idea.
2. Nevertheless: Used to introduce a contrasting statement or idea.
3. Nonetheless: Used to introduce a contrasting statement or idea.
4. Yet: Used to introduce a contrasting statement or idea.
5. Still: Used to introduce a contrasting statement or idea.
6. Despite: Used to introduce a contrasting statement or idea.
7. In spite of: Used to introduce a contrasting statement or idea.
8. On the other hand: Used to introduce a contrasting point of view or perspective.
9. Alternatively: Used to introduce a contrasting option or choice.
10. In contrast: Used to introduce a contrasting statement or idea.
11. Conversely: Used to introduce a contrasting point of view or perspective.
12. Meanwhile: Used to introduce a contrasting statement or idea that is happening at the same time.
13. Whereas: Used to introduce a contrasting statement or idea.
14. Nevertheless: Used to introduce a contrasting statement or idea.
15. On the contrary: Used to introduce a contrasting point of view or perspective.
16. Nonetheless: Used to introduce a contrasting statement or idea.
17. Even so: Used to introduce a contrasting statement or idea.
18. Be that as it may: Used to introduce a contrasting statement or idea.
19. Anyhow: Used to introduce a contrasting statement or idea. (informal)
20. Anyway: Used to introduce a contrasting statement or idea. (informal)
21. Regardless: Used to introduce a contrasting statement or idea. (informal)
22. That said: Used to introduce a contrasting point of view or perspective. (informal)
23. Even though: Used to introduce a contrasting point of view or perspective. (creative)
24. Despite the fact that: Used to introduce a contrasting point of view or perspective. (creative)
25. On the flip side: Used to introduce a contrasting point of view or perspective. (creative)
26. In any case: Used to introduce a contrasting point of view or perspective. (creative)
27. Then again: Used to introduce a contrasting point of view or perspective. (creative)
28. Having said that: Used to introduce a contrasting point of view or perspective. (creative)
29. Natheless: Used to introduce a contrasting statement or idea. (rare)
30. Howbeit: Used to introduce a contrasting statement or idea. (rare)
31. Notwithstanding: Used to introduce a contrasting statement or idea. (rare)
32. Albeit: Used to introduce a contrasting statement or idea. (rare)
33. Although: Used to introduce a contrasting statement or idea. (formal)
34. Nevertheless: Used to introduce a contrasting statement or idea. (formal)
35. Nonetheless: Used to introduce a contrasting statement or idea. (formal)
36. Yet: Used to introduce a contrasting statement or idea. (formal)
37. Still: Used to introduce a contrasting statement or idea. (formal)
38. Despite: Used to introduce a contrasting statement or idea. (formal)
39. In spite of: Used to introduce a contrasting statement or idea. (formal)
40. Whereas: Used to introduce a contrasting statement or idea. (formal)
41. By contrast: Used to introduce a contrasting point of view or perspective. (formal)
42. Alternatively: Used to introduce a contrasting option or choice. (formal)
43. Conversely: Used to introduce a contrasting point of view or perspective. (formal)
44. In any event: Used to introduce a contrasting point of view or perspective. (formal)
45. In spite of the fact: Used to introduce a contrasting point of view or perspective. (formal)
46. In the face of: Used to introduce a contrasting point of view or perspective. (formal)
47. Nonetheless: Used to introduce a contrasting statement or idea. (formal)
48. On the contrary: Used to introduce a contrasting point of view or perspective. (formal)
49. Regardless: Used to introduce a contrasting statement or idea. (formal)
50. All the same: Used to introduce a contrasting statement or idea. (formal)
51. Even though: Used to introduce a contrasting point of view or perspective. (formal)
52. However: Used to introduce a contrasting statement or idea. (formal)
53. At any rate: Used to introduce a contrasting statement or idea. (informal)
54. Be that as it may: Used to introduce a contrasting statement or idea. (informal)
55. Even so: Used to introduce a contrasting statement or idea. (informal)
56. Anyhow: Used to introduce a contrasting statement or idea. (informal)
57. Anyway: Used to introduce a contrasting statement or idea. (informal)
58. Nonetheless: Used to introduce a contrasting statement or idea. (informal)
59. Regardless: Used to introduce a contrasting statement or idea. (informal)
60. That said: Used to introduce a contrasting point of view or perspective. (informal)
61. Then again: Used to introduce a contrasting point of view or perspective. (informal)
62. Despite the fact that: Used to introduce a contrasting point of view or perspective. (creative)
63. On the other hand: Used to introduce a contrasting point of view or perspective. (creative)
64. Otherwise: Used to introduce a contrasting option or choice. (creative)
65. That being said: Used to introduce a contrasting point of view or perspective. (creative)
66. In contrast: Used to introduce a contrasting point of view or perspective. (creative)
67. At the same time: Used to introduce a contrasting statement or idea. (creative)
68. Notwithstanding: Used to introduce a contrasting statement or idea. (creative)
69. Despite this: Used to introduce a contrasting statement or idea. (creative)
70. Then: Used to introduce a contrasting statement or idea. (creative)
71. And yet: Used to introduce a contrasting statement or idea. (creative)
72. Nonetheless: Used to introduce a contrasting statement or idea. (creative)
73. Contrarily: Used to introduce a contrasting point of view or perspective. (creative)
74. In either case: Used to introduce a contrasting option or choice. (creative)
75. In contrast to: Used to introduce a contrasting point of view or perspective. (creative)
76. In the end: Used to introduce a contrasting statement or idea. (creative)
77. At the end of the day: Used to introduce a contrasting statement or idea. (creative)
78. In fact: Used to introduce a contrasting statement or idea. (informal)
79. Actually: Used to introduce a contrasting statement or idea. (informal)
80. But then: Used to introduce a contrasting statement or idea. (informal)
81. In reality: Used to introduce a contrasting statement or idea. (informal)
82. Nevertheless: Used to introduce a contrasting statement or idea. (informal)
83. On the flip side: Used to introduce a contrasting point of view or perspective. (informal)
84. As much as: Used to introduce a contrasting statement or idea. (informal)
85. On the other side: Used to introduce a contrasting point of view or perspective. (informal)
86. To the contrary: Used to introduce a contrasting point of view or perspective. (informal)
87. Conversely: Used to introduce a contrasting point of view or perspective. (informal)
88. Despite everything: Used to introduce a contrasting statement or idea. (informal)
89. Not the less: Used to introduce a contrasting statement or idea. (informal)
90. Mind you: Used to introduce a contrasting point of view or perspective. (informal)
91. Either way: Used to introduce a contrasting option or choice. (informal)
92. Otherwise: Used to introduce a contrasting option or choice. (informal)
93. All the while: Used to introduce a contrasting statement or idea that has been happening continuously. (informal)
94. Be that as it may: Used to introduce a contrasting statement or idea, even though there might be some obstacles. (formal)
95. Even so: Used to introduce a contrasting statement or idea. (formal)
96. That said: Used to introduce a contrasting statement or idea. (formal)
97. By contrast: Used to introduce a contrasting point of view or perspective. (formal)
98. Nonetheless: Used to introduce a contrasting statement or idea. (formal)
99. Nevertheless: Used to introduce a contrasting statement or idea. (formal)
100. In any event: Used to introduce a contrasting statement or idea. (formal)
101. Alternatively: Used to introduce a contrasting option or choice. (formal)
102. By the same token: Used to introduce a contrasting point of view or perspective. (formal)
103. That being the case: Used to introduce a contrasting statement or idea. (formal)
104. In the final analysis: Used to introduce a contrasting statement or idea. (formal)
105. Even though: Used to introduce a contrasting statement or idea. (formal)
106. In spite of: Used to introduce a contrasting statement or idea. (formal)
107. Granted: Used to introduce a contrasting statement or idea. (formal)
108. For all that: Used to introduce a contrasting statement or idea. (formal)
109. In essence: Used to introduce a contrasting statement or idea. (formal)
110. In any case: Used to introduce a contrasting statement or idea. (formal)

However Synonyms Infographic



9 Words To Use Instead Of “However” (With Examples)

“However” is undoubtedly a great word to use to counter a previous point you’ve made in writing. It can be somewhat overdone, though, which is why we think it’s time to look at some synonyms for it. This article will explore the best alternatives to “however.”

Which Words Can I Use Instead Of “However”?

The preferred version is “but” because it’s the most common one seen in English. Most people know what it means, and it’s easy to read on a page (since it only features three letters).

Another way to say “however” is “but,” and it’s perhaps the most popular choice.

Nevertheless

“Nevertheless” and “nonetheless” are interchangeable. You can use either to introduce a counterargument to the previous point.

The correct punctuation choices include a semi-colon or a period before “nevertheless” and a comma after it.

Still

“Still” is one of the best choices to replace “however,” and many native speakers use the two interchangeably throughout their writing to keep things unique and creative.

“Still” works well to introduce a counterpoint and comes with the same punctuation needs as “however.” It means the same as “but” but is used at the start of an independent clause to introduce a new idea to an argument or qualify a previous one.

Yet

People often find “yet” easy to use because it’s quick to say and only consists of three letters (much like “but”).

Though

“Though” is another common choice that we often see instead of “however.” Many people think it works in the same way as “although”; when countering an argument, it’s simply a shorter version of “although.”

You will find “though” used quite a lot in English to introduce a counterpoint.

Despite That

“Despite that” is the first alternative way of saying “however” that uses more than one word in this list. We like it because it works well in more formal situations.

Be That As It May

Having Said That

Again, “having said that” is a formal choice, though it’s not quite as formal as some of the others.


Can You Say “But However”?

“But however” is never grammatically correct. It’s an example of reduplication, where we use two words that share the same meaning. Ultimately, “but however” means “but but” or “however however,” and both of those statements are grammatically incorrect.


Thesaurus for However

Related terms for “however”: synonyms, antonyms, and sentences with similar meaning.

  • nevertheless
  • nonetheless
  • despite that
  • be that as it may
  • notwithstanding
  • having said that
  • in any case
  • all the same
  • at any rate
  • in any event
  • at the same time
  • for all that
  • in spite of that
  • none the less

Opposite meaning

  • accordingly
  • correspondingly
  • appropriately
  • as a consequence
  • consistently
  • as a result
  • consequently
  • because of this
  • in consequence
  • for this reason
  • due to the fact
  • for that reason
  • in that case
  • due to this
  • in so doing
  • in that event


Sentence Examples

Proper usage in context:

  • “A bear, however hard he tries, grows tubby without exercise.” (Pooh’s Little Instruction Book)
  • “However motherhood comes to you, it’s a miracle.” (Valerie Harper)
  • “However you make your living is where your talent lies.” (Ernest Hemingway)
  • “No labor, however humble, is dishonoring.” (Talmud)
  • “Boldness is a mask for fear, however great.” (John Dryden)

Interesting Literature

22 of the Best Synonyms for ‘However’

By Dr Oliver Tearle (Loughborough University)

There are lots of strange ideas surrounding the word ‘however’. Some teachers tell their students they shouldn’t begin a new sentence with the word ‘but’, and should substitute the word ‘however’ instead.

However (as it were), this misses the fact that ‘but’ and ‘however’ are different classes of words, with ‘but’ being a conjunction and ‘however’ being an adverb.

Curiously, the Oxford English Dictionary (OED) names Shakespeare as the first citation of ‘however’ as a synonym for ‘but’ or ‘notwithstanding’: Shakespeare’s late history play Henry VIII, co-authored with John Fletcher, contains the lines: ‘All the Land knowes that: / How euer, yet there is no great breach.’ As the tautological ‘How euer, yet’ shows, ‘However’ is being used to mean ‘yet’ or ‘nevertheless’ here.

In any case, there’s nothing wrong with using ‘but’ at the start of a sentence – and much can go wrong if you misuse ‘however’, treating it as a simple synonym for ‘but’. Let’s take a closer look at some of the alternatives to the word ‘however’ and how they can be used in speech and writing.

Synonyms for ‘however’

Let’s start with BUT . This short, simple word is a conjunction, because it is used to join clauses together, much like ‘and’. Consider these two statements, involving going to look for the cat:

A: I looked in the garden and the cat was there.

B: I looked in the garden but the cat wasn’t there.

In both cases, the conjunction joins the two clauses together; in B, of course, the cat isn’t there, so the conjunction but is used. But we could also have used however here:

I looked in the garden; however, the cat wasn’t there.

However is an adverb rather than a conjunction, hence the comma that follows it. Adverbs are often words ending in -ly, such as happily or sadly or quickly, and one can imagine a slightly different sentence that might read:

I looked in the garden; happily, the cat was there.

So but is a simpler and more direct way of saying virtually the same thing as however in such examples.

Another short word, YET, serves a similar function, and can therefore serve as a synonym for however, as in ‘I looked in the garden, yet the cat wasn’t there’.

The word STILL works slightly differently from but and yet and is, in some ways, closer to however than either of those. Indeed, in syntax it is often literally closer to however , since the two are used together, as in this example from the historian Thomas Babington Macaulay in 1825: ‘Still, however, there was another extreme which, though far less dangerous, was also to be avoided.’

In such an example, ‘still’ means something similar to NONETHELESS (or NEVERTHELESS: they are synonyms of each other): that is, DESPITE THAT, THAT BEING SAID, ALL THE SAME, or JUST THE SAME.

For instance, ‘I know it’s useless buying a lottery ticket; still, someone’s got to win, haven’t they?’ A synonym for still in this sense (and for nonetheless/nevertheless) is NOTWITHSTANDING.

AFTER ALL also performs this function, as in Edmund Spenser’s The Faerie Queene (1590): ‘Yet after all, he victour did suruiue’ means essentially, ‘however, he survived as victor’.

Indeed, a suite of words which also convey this idea of just the same or notwithstanding are REGARDLESS , ANYHOW , ANYWAY , and EVEN SO . Remember to steer clear of ‘irregardless’, a word frowned upon because it makes no sense (the ir- prefix presumably negates the word regardless , so its meaning would be the opposite of ‘regardless’).

THOUGH and ALTHOUGH are two more words which can be used more or less interchangeably with however . For instance, ‘He’s a good singer; however, he’s no Frank Sinatra’ could be rewritten quite easily to read, ‘He’s a good singer, (al)though he’s no Frank Sinatra’.

Using though (or although ) arguably softens the blow of the criticism of the person’s singing in the second half of the sentence, in a way that however does not: however acts as performative throat-clearing before delivering the stinging indictment of the singer’s abilities, whereas though and although keep the emphasis slightly focused towards the start of the sentence, and the good news (‘He’s a good singer …’).

A slightly more archaic synonym for however is HOWBEIT . Whereas albeit has lasted, howbeit , which means roughly BE THAT AS IT MAY , has become largely obsolete and so is best avoided as an archaism or old-fashioned word. Of course, if you’re writing historical fiction, it may be just the term you’re looking for!

Antonyms for ‘however’

If however sees the clause which follows it swerving away from the clause which precedes it (as in the example ‘I looked in the garden; however , the cat wasn’t there’), then good antonyms for however see the two clauses agreeing with each other: the latter one follows naturally and smoothly from the former.

With this in mind, we might identify THEREFORE , THUS , and SO as antonyms for however .


Other Ways to Say HOWEVER: 42 Powerful Synonyms for However in English


Other Ways to Say However

Learn another word for however with these example sentences.

All the same, there is some truth in what he says.

Although it was late, there was still enough time to keep the rendezvous.

The boy is so fat, and yet he runs very fast.

Anyhow, we must find a way out of this impasse.

It’s going to be difficult. Anyway, we can try.

At any rate, the size of new fields is diminishing.

I can’t really explain it; at the same time, I’m not convinced.

I accept that he’s old and frail; be that as it may, he’s still a good politician.

A word is no arrow, but it can pierce the heart.

But despite that, Gloucestershire County Council still can’t find another school that will accept him as a full-time student.

He has many faults, but for all that I like him.

We waited on for another hour, but still she didn’t come.

This is better, but then again it costs more.

He gives permission, and, contrariwise, she refuses it.

Despite the bad weather, we enjoyed ourselves.

Despite the fact that you still owe me $100, I am willing to lend you another $100.

He refused to help me, despite the fact that I asked him several times.

It was raining; even so, we had to go out.

She hasn’t phoned, even though she said she would.

He has many faults, but for all that I like him.

He forgets most things, but having said that, he always remembers my birthday.

Howbeit, we never met again.

In any case, she couldn’t have held a conversation there.

They went on with their basketball match in spite of the rain.

In spite of everything, I still believe that people are really good at heart.

In spite of that, the availability both here and in Britain should be known to the public at large.

I put on my raincoat and big straw hat, but we got soaked just the same.

He was very tired; nevertheless, he went on walking.

Though he’s a fool, I like him nonetheless.

Notwithstanding that she is beautiful, she doesn’t think it goes for much.

On the flip side, accepting them and not having your needs met is not healthy either.

I want to go to the party; on the other hand, I ought to be studying.

He had worked very hard on the place; she, per contra, had little to do.

Regardless of how often I correct him, he always makes the same mistake.

Even though you dislike us, still and all you should be polite.

That said, there are still places to get free Internet access.

Though John and Andrew look exactly alike, they act quite differently.

She actually enjoys confrontation, whereas I prefer a quiet life.

Other men live to eat, while I eat to live.

The path was dark, yet I found my way.

Synonyms for However with Examples | Infographic





HOWEVER Synonym: 23 Useful Words to Use Instead of HOWEVER

Posted on Last updated: October 24, 2023


Learning synonyms in English is one of the best ways to bolster your vocabulary. In this lesson, you will learn a list of synonyms for HOWEVER with pictures and example sentences.


HOWEVER Synonym List

  • Nonetheless
  • Notwithstanding
  • All the same
  • Be that as it may
  • For all that
  • In spite of
  • On the other hand
  • Without regard to
  • At any rate
  • In any case
  • Just the same
  • Nevertheless

HOWEVER Synonyms with Examples

Learn synonyms or other words for HOWEVER with example sentences.

Though he’s a fool, I like him nonetheless.

He still went home, notwithstanding the midnight hour.

Withal, I always hold an optimistic attitude.

He still yearned after her, even after all these years.

I don’t need the literature at present. Thank you very much all the same.

Anyhow, I must insist that you cannot steal.

I know you don’t like him, but be that as it may, you can at least be polite to him.

The law cannot make all men equal, but they are all equal before the law.

She looked lovely, despite her strange apparel.

For all that, he was an attractive little creature with a sweetly expressive face.

In spite of their quarrel, they remain the best of friends.

I want to go to the party, but on the other hand, I ought to be studying.

The female is generally drab, the male per contra brilliant.

Though you cast out nature with a fork, it will still return.

The 2008 bonus plan was also designed to kick in without regard to paper losses.

I shall go and see the patient anyway.

Well, I’m not going home on foot, at any rate.

In any case, he is a friend of mine.

Don’t worry about looking handsome, or being strong and brave. Just as you love me unconditionally, I love you just the same.

Fanned fires and forced love never did well yet.

Synonym for However | Image



How to Use the Word “However”

3-minute read

  • 30th January 2016

The adverb “however” is one that causes some confusion, so it’s important to use it correctly in your academic writing.

But how is this term used? And how do you make its meaning clear in your written work? It’s all a matter of punctuation…

However (Whatever)

This sense of “however” typically means “to whatever extent” or “in whatever manner”:

I’ll catch you one day, however far you run!

It’s not a formal event, so dress however you want.

Note that in the examples above, there’s no punctuation between “however” and the thing it’s modifying (i.e., distance/mode of dress).

Another (less common) use of this term is as a synonym for “how.” More specifically, it means “how under the circumstances,” so is typically used when referring to something challenging:

However do proofreaders remember all those grammatical rules?

As above, you’ll notice there is no punctuation between “however” and the rest of the sentence.

However (Nevertheless)

When this term is used as a conjunctive adverb to connect two contrasting points, it should be followed by a comma:

I had planned to go out today. It was rainy, however, so I stayed inside.

The initial results were positive. Further testing, however, is still required.


Here, it is being used to contrast the latter sentence with the former. As such, we can reformulate these sentences to use “but” instead:

I had planned to go out today, but it was rainy, so I stayed inside.

The initial results were positive, but further testing is still required.

Can I Start a Sentence with However?

Since “however” can substitute for “but,” some claim it shouldn’t be used at the beginning of a sentence. Nevertheless, even if the idea that you shouldn’t use a conjunction like this were true, it wouldn’t apply in this case.

Unlike the coordinating conjunction “but,” “however” is not used to link two independent clauses in a single sentence. As such, if you want to use it to contrast two points, you need to make sure they are both complete sentences.

Beginning a sentence with “however” can even emphasize a contrast, since it flows more smoothly, foregrounds the comparison and ensures clarity:

The initial results were positive. However, further testing is still required.

But if you don’t want to use this term at the beginning of a new sentence, you can also connect two sentences with a semicolon:

The initial results were positive; however, further testing is still required.

However you choose to use “however,” however, make sure you punctuate correctly so that your reader will understand what you mean.



Synonyms of essay

  • as in article
  • as in attempt
  • as in to attempt

Thesaurus Definition of essay (Entry 1 of 2)

Synonyms & Similar Words

  • dissertation
  • composition
  • prolegomenon
  • undertaking
  • trial and error
  • experimentation

Thesaurus Definition of essay (Entry 2 of 2)

  • have a go at
  • try one's hand (at)

Antonyms & Near Antonyms

Synonym Chooser

How does the verb essay differ from other similar words?

Some common synonyms of essay are attempt, endeavor, strive, and try. While all these words mean “to make an effort to accomplish an end,” essay implies difficulty but also suggests tentative trying or experimenting.

When might attempt be a better fit than essay?

While the synonyms attempt and essay are close in meaning, attempt stresses the initiation or beginning of an effort.

Where would endeavor be a reasonable alternative to essay?

Although the words endeavor and essay have much in common, endeavor heightens the implications of exertion and difficulty.

When is strive a more appropriate choice than essay?

While in some cases nearly identical to essay, strive implies great exertion against great difficulty and specifically suggests persistent effort.

How do try and attempt relate to one another, in the sense of essay?

Try is often close to attempt but may stress effort or experiment made in the hope of testing or proving something.



noun as in written discourse

Strongest matches

  • dissertation

Strong matches

  • composition
  • disquisition
  • explication

noun as in try, attempt

  • undertaking

Weak matches

  • one's all
  • one's level best

verb as in try, attempt

  • have a crack
  • have a shot
  • make a run at
  • put to the test
  • take a stab at
  • take a whack at


Example Sentences

As several of my colleagues commented, the result is good enough that it could pass for an essay written by a first-year undergraduate, and even get a pretty decent grade.

GPT-3 also raises concerns about the future of essay writing in the education system.

This little essay helps focus on self-knowledge in what you’re best at, and how you should prioritize your time.

As Steven Feldstein argues in the opening essay, technonationalism plays a part in the strengthening of other autocracies too.

He’s written a collection of essays on civil engineering life titled Bridginess, and to this day he and Lauren go on “bridge dates,” where they enjoy a meal and admire the view of a nearby span.

I think a certain kind of compelling essay has a piece of that.

The current attack on the Jews,” he wrote in a 1937 essay, “targets not just this people of 15 million but mankind as such.

The impulse to interpret seems to me what makes personal essay writing compelling.

To be honest, I think a lot of good essay writing comes out of that.

Someone recently sent me an old Joan Didion essay on self-respect that appeared in Vogue.

There is more of the uplifted forefinger and the reiterated point than I should have allowed myself in an essay.

Consequently he was able to turn in a clear essay upon the subject, which, upon examination, the king found to be free from error.

It is no part of the present essay to attempt to detail the particulars of a code of social legislation.

But angels and ministers of grace defend us from ministers of religion who essay art criticism!

It is fit that the imagination, which is free to go through all things, should essay such excursions.

Related Words

Words related to essay are not direct synonyms, but are associated with the word essay . Browse related words to learn more about word associations.

verb as in point or direct at a goal

  • concentrate
  • contemplate
  • set one's sights on

noun as in piece of writing

  • think piece

verb as in try, make effort

  • do level best
  • exert oneself
  • give a fling
  • give a whirl
  • give best shot
  • give it a go
  • give it a try
  • give old college try
  • go the limit
  • have a go at
  • shoot the works
  • take best shot
  • try one's hand at


On this page you'll find 154 synonyms, antonyms, and words related to essay, such as: article, discussion, dissertation, manuscript, paper, and piece.

From Roget's 21st Century Thesaurus, Third Edition Copyright © 2013 by the Philip Lief Group.

  • Open access
  • Published: 03 June 2024

Applying large language models for automated essay scoring for non-native Japanese

  • Wenchao Li 1 &
  • Haitao Liu 2  

Humanities and Social Sciences Communications volume  11 , Article number:  723 ( 2024 ) Cite this article


  • Language and linguistics

Recent advancements in artificial intelligence (AI) have led to an increased use of large language models (LLMs) for language assessment tasks such as automated essay scoring (AES), automated listening tests, and automated oral proficiency assessments. The application of LLMs for AES in the context of non-native Japanese, however, remains limited. This study explores the potential of LLM-based AES by comparing the efficiency of different models, i.e. two conventional machine learning-based methods (Jess and JWriter), two LLMs (GPT and BERT), and one Japanese local LLM (Open-Calm large model). To conduct the evaluation, a dataset consisting of 1400 story-writing scripts authored by learners with 12 different first languages was used. Statistical analysis revealed that GPT-4 outperforms Jess and JWriter, BERT, and the Japanese language-specific trained Open-Calm large model in terms of annotation accuracy and predicting learning levels. Furthermore, by comparing 18 different models that utilize various prompts, the study emphasized the significance of prompts in achieving accurate and reliable evaluations using LLMs.


Conventional machine learning technology in AES

AES has experienced significant growth with the advancement of machine learning technologies in recent decades. In the earlier stages of AES development, conventional machine learning-based approaches were commonly used. These approaches involved the following procedures: a) feeding the machine with a dataset. In this step, a dataset of essays is provided to the machine learning system. The dataset serves as the basis for training the model and establishing patterns and correlations between linguistic features and human ratings. b) the machine learning model is trained using linguistic features that best represent human ratings and can effectively discriminate learners’ writing proficiency. These features include lexical richness (Lu, 2012; Kyle and Crossley, 2015; Kyle et al. 2021), syntactic complexity (Lu, 2010; Liu, 2008), and text cohesion (Crossley and McNamara, 2016), among others. Conventional machine learning approaches in AES require human intervention, such as manual correction and annotation of essays. This human involvement was necessary to create a labeled dataset for training the model. Several AES systems have been developed using conventional machine learning technologies. These include the Intelligent Essay Assessor (Landauer et al. 2003), the e-rater engine by Educational Testing Service (Attali and Burstein, 2006; Burstein, 2003), MyAccess with the IntelliMetric scoring engine by Vantage Learning (Elliot, 2003), and the Bayesian Essay Test Scoring system (Rudner and Liang, 2002). These systems have played a significant role in automating the essay scoring process and providing quick and consistent feedback to learners. However, as touched upon earlier, conventional machine learning approaches rely on predetermined linguistic features and often require manual intervention, making them less flexible and potentially limiting their generalizability to different contexts.
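At inference time, the conventional pipeline described above reduces to extracting linguistic features from an essay and combining them with trained weights. The sketch below illustrates that scoring step in Python; the three features, the toy essay, and the weight values are illustrative assumptions, not those of any system discussed here.

```python
# Minimal sketch of conventional feature-based essay scoring:
# extract simple linguistic features, then apply a weighted sum,
# as a trained linear-regression scorer would. All weights here
# are made-up values for demonstration only.

def extract_features(essay: str) -> dict:
    """Compute a few simple linguistic features of an essay."""
    tokens = essay.split()
    sentences = [s for s in essay.replace("!", ".").replace("?", ".").split(".")
                 if s.strip()]
    return {
        "total_tokens": len(tokens),
        "mean_sentence_length": len(tokens) / max(len(sentences), 1),
        "lexical_diversity": len({t.lower() for t in tokens}) / max(len(tokens), 1),
    }

def score(essay: str, weights: dict, intercept: float = 0.0) -> float:
    """Weighted sum of features, mimicking a trained linear model."""
    feats = extract_features(essay)
    return intercept + sum(weights[name] * value for name, value in feats.items())

weights = {"total_tokens": 0.05, "mean_sentence_length": 0.3, "lexical_diversity": 2.0}
essay = "The results were positive. However further testing is required."
print(round(score(essay, weights), 2))  # → 3.8
```

In a real system the weights would come from fitting the model on human-rated essays; here they simply stand in for that training step.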

In the context of the Japanese language, conventional machine learning-incorporated AES tools include Jess (Ishioka and Kameda, 2006) and JWriter (Lee and Hasebe, 2017). Jess assesses essays by deducting points from the perfect score, utilizing the Mainichi Daily News newspaper as a database. The evaluation criteria employed by Jess encompass various aspects, such as rhetorical elements (e.g., reading comprehension, vocabulary diversity, percentage of complex words, and percentage of passive sentences), organizational structures (e.g., forward and reverse connection structures), and content analysis (e.g., latent semantic indexing). JWriter employs linear regression analysis to assign weights to various measurement indices, such as average sentence length and total number of characters. These weights are then combined to derive the overall score. A pilot study involving the Jess model was conducted on 1320 essays at different proficiency levels, including primary, intermediate, and advanced. However, the results indicated that the Jess model failed to significantly distinguish between these essay levels. Out of the 16 measures used, four measures, namely median sentence length, median clause length, median number of phrases, and maximum number of phrases, did not show statistically significant differences between the levels. Additionally, two measures exhibited between-level differences but lacked linear progression: the number of attributive declined words and the kanji/kana ratio. On the other hand, the remaining measures, including maximum sentence length, maximum clause length, number of attributive conjugated words, maximum number of consecutive infinitive forms, maximum number of conjunctive-particle clauses, k characteristic value, percentage of big words, and percentage of passive sentences, demonstrated statistically significant between-level differences and displayed linear progression.

Both Jess and JWriter exhibit notable limitations, including the manual selection of feature parameters and weights, which can introduce biases into the scoring process. The reliance on human annotators to label non-native language essays also introduces potential noise and variability in the scoring. Furthermore, an important concern is the possibility of system manipulation and cheating by learners who are aware of the regression equation utilized by the models (Hirao et al. 2020 ). These limitations emphasize the need for further advancements in AES systems to address these challenges.

Deep learning technology in AES

Deep learning has emerged as one of the approaches for improving the accuracy and effectiveness of AES. Deep learning-based AES methods utilize artificial neural networks that mimic the human brain’s functioning through layered algorithms and computational units. Unlike conventional machine learning, deep learning autonomously learns from the environment and past errors without human intervention. This enables deep learning models to establish nonlinear correlations, resulting in higher accuracy. Recent advancements in deep learning have led to the development of transformers, which are particularly effective in learning text representations. Noteworthy examples include bidirectional encoder representations from transformers (BERT) (Devlin et al. 2019 ) and the generative pretrained transformer (GPT) (OpenAI).

BERT is a linguistic representation model that utilizes a transformer architecture and is trained on two tasks: masked linguistic modeling and next-sentence prediction (Hirao et al. 2020 ; Vaswani et al. 2017 ). In the context of AES, BERT follows specific procedures, as illustrated in Fig. 1 : (a) the tokenized prompts and essays are taken as input; (b) special tokens, such as [CLS] and [SEP], are added to mark the beginning and separation of prompts and essays; (c) the transformer encoder processes the prompt and essay sequences, resulting in hidden layer sequences; (d) the hidden layers corresponding to the [CLS] tokens (T[CLS]) represent distributed representations of the prompts and essays; and (e) a multilayer perceptron uses these distributed representations as input to obtain the final score (Hirao et al. 2020 ).

Figure 1: AES system with BERT (Hirao et al. 2020).
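Steps (a) and (b) of the BERT procedure above amount to assembling an input sequence wrapped in special tokens. The sketch below shows only that sequence layout; the whitespace tokenizer is a placeholder assumption standing in for BERT's actual WordPiece tokenizer.

```python
# Sketch of BERT-style input assembly for AES: the prompt and essay
# are tokenized, then marked with [CLS] (start) and [SEP] (separators),
# as in steps (a)-(b) described in the text. A whitespace split stands
# in for WordPiece tokenization here.

def build_bert_input(prompt: str, essay: str) -> list:
    tokenize = lambda text: text.lower().split()  # placeholder tokenizer
    return ["[CLS]"] + tokenize(prompt) + ["[SEP]"] + tokenize(essay) + ["[SEP]"]

tokens = build_bert_input("Describe the picture.", "A family has a picnic.")
print(tokens)
# The encoder's hidden state over [CLS] (position 0) would later feed
# the multilayer-perceptron scoring head described in step (e).
```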

The training of BERT using a substantial amount of sentence data through the Masked Language Model (MLM) allows it to capture contextual information within the hidden layers. Consequently, BERT is expected to be capable of identifying artificial essays as invalid and assigning them lower scores (Mizumoto and Eguchi, 2023). In the context of AES for nonnative Japanese learners, Hirao et al. (2020) combined the long short-term memory (LSTM) model proposed by Hochreiter and Schmidhuber (1997) with BERT to develop a tailored automated essay scoring system. The findings of their study revealed that the BERT model outperformed both the conventional machine learning approach utilizing character-type features such as “kanji” and “hiragana”, as well as the standalone LSTM model. Takeuchi et al. (2021) presented an approach to Japanese AES that eliminates the requirement for pre-scored essays by relying solely on reference texts or a model answer for the essay task. They investigated multiple similarity evaluation methods, including frequency of morphemes, idf values calculated on Wikipedia, LSI, LDA, word-embedding vectors, and document vectors produced by BERT. The experimental findings revealed that the method utilizing the frequency of morphemes with idf values exhibited the strongest correlation with human-annotated scores across different essay tasks. The utilization of BERT in AES encounters several limitations. First, essays often exceed the model’s maximum length limit. Second, only score labels are available for training, which restricts access to additional information.
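The reference-text approach attributed to Takeuchi et al. above scores an essay by its similarity to a model answer. A simplified word-level analogue is sketched below: it uses plain word counts and cosine similarity rather than idf-weighted morpheme frequencies, so it is an assumption-laden illustration of the idea, not their system.

```python
# Sketch of reference-text similarity scoring: represent the reference
# and the essay as bag-of-words count vectors and compare them with
# cosine similarity. Real systems would use morpheme frequencies
# weighted by idf; plain counts stand in here.
import math
from collections import Counter

def cosine_similarity(text_a: str, text_b: str) -> float:
    a, b = Counter(text_a.lower().split()), Counter(text_b.lower().split())
    shared = set(a) & set(b)
    dot = sum(a[w] * b[w] for w in shared)
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0

reference = "the family went on a picnic and lost the key"
essay = "the family lost the key on a picnic"
print(round(cosine_similarity(reference, essay), 3))
```

An essay closer to the model answer scores nearer to 1.0; unrelated texts score near 0.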

Mizumoto and Eguchi (2023) were pioneers in employing the GPT model for AES in non-native English writing. Their study focused on evaluating the accuracy and reliability of AES using the GPT-3 text-davinci-003 model, analyzing a dataset of 12,100 essays from the corpus of nonnative written English (TOEFL11). The findings indicated that AES utilizing the GPT-3 model exhibited a certain degree of accuracy and reliability. They suggest that GPT-3-based AES systems hold the potential to provide support for human ratings. However, applying the GPT model to AES presents a unique natural language processing (NLP) task that involves considerations such as nonnative language proficiency, the influence of the learner’s first language on the output in the target language, and identifying linguistic features that best indicate writing quality in a specific language. These linguistic features may differ morphologically or syntactically from those present in the learners’ first language, as observed in (1)–(3).

(1) Isolating (Chinese)

我-送了-他-一本-书

Wǒ-sòngle-tā-yī běn-shū

1SG-give.PAST-him-one.CL-book

“I gave him a book.”

(2) Agglutinative (Japanese)

彼-に-本-を-あげ-まし-た

Kare-ni-hon-o-age-mashi-ta

3SG-DAT-book-ACC-give.HON.PAST

(3) Inflectional (English)

give, give-s, gave, given, giving

Additionally, the morphological agglutination and subject-object-verb (SOV) order in Japanese, along with its idiomatic expressions, pose additional challenges for applying language models in AES tasks (4).

(4)

足-が 棒-に なり-ました

Ashi-ga bō-ni nari-mashita

leg-NOM stick-DAT become-PAST

“My leg became like a stick (I am extremely tired).”

The example sentence provided demonstrates the morpho-syntactic structure of Japanese and the presence of an idiomatic expression. In this sentence, the verb “なる” (naru), meaning “to become”, appears at the end of the sentence. The verb stem “なり” (nari) is attached with morphemes indicating honorification (“ます” - masu) and tense (“た” - ta), showcasing agglutination. While the sentence can be literally translated as “my leg became like a stick”, it carries an idiomatic interpretation that implies “I am extremely tired”.

To overcome this issue, CyberAgent Inc. ( 2023 ) has developed the Open-Calm series of language models specifically designed for Japanese. Open-Calm consists of pre-trained models available in various sizes, such as Small, Medium, Large, and 7b. Figure 2 depicts the fundamental structure of the Open-Calm model. A key feature of this architecture is the incorporation of the Lora Adapter and GPT-NeoX frameworks, which can enhance its language processing capabilities.

Figure 2: GPT-NeoX model architecture (Okgetheng and Takeuchi 2024).

In a recent study conducted by Okgetheng and Takeuchi ( 2024 ), they assessed the efficacy of Open-Calm language models in grading Japanese essays. The research utilized a dataset of approximately 300 essays, which were annotated by native Japanese educators. The findings of the study demonstrate the considerable potential of Open-Calm language models in automated Japanese essay scoring. Specifically, among the Open-Calm family, the Open-Calm Large model (referred to as OCLL) exhibited the highest performance. However, it is important to note that, as of the current date, the Open-Calm Large model does not offer public access to its server. Consequently, users are required to independently deploy and operate the environment for OCLL. In order to utilize OCLL, users must have a PC equipped with an NVIDIA GeForce RTX 3060 (8 or 12 GB VRAM).

In summary, while the potential of LLMs in automated scoring of nonnative Japanese essays has been demonstrated in two studies—BERT-driven AES (Hirao et al. 2020 ) and OCLL-based AES (Okgetheng and Takeuchi, 2024 )—the number of research efforts in this area remains limited.

Another significant challenge in applying LLMs to AES lies in prompt engineering and ensuring its reliability and effectiveness (Brown et al. 2020; Rae et al. 2021; Zhang et al. 2021). Various prompting strategies have been proposed, such as the zero-shot chain of thought (CoT) approach (Kojima et al. 2022), which involves manually crafting diverse and effective examples. However, manual efforts can lead to mistakes. To address this, Zhang et al. (2021) introduced an automatic CoT prompting method called Auto-CoT, which demonstrates matching or superior performance compared to the CoT paradigm. Another prompting framework is tree of thoughts, which enables a model to self-evaluate its progress at intermediate stages of problem-solving through deliberate reasoning (Yao et al. 2023).
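A zero-shot CoT prompt of the kind mentioned above can be assembled from a simple template. In the sketch below, the rubric wording, the 1-6 scale, and the trigger sentence are illustrative assumptions for demonstration, not the prompts evaluated in this study.

```python
# Sketch of a zero-shot chain-of-thought (CoT) scoring prompt:
# a rubric is listed, the essay is embedded, and the zero-shot
# CoT trigger phrase (Kojima et al. 2022) is appended.

def build_cot_prompt(essay: str, criteria: list) -> str:
    rubric = "\n".join(f"- {c}" for c in criteria)
    return (
        "You are a Japanese writing examiner. Score the essay below on a "
        "scale of 1-6 for each criterion:\n"
        f"{rubric}\n\n"
        f"Essay:\n{essay}\n\n"
        "Let's think step by step."  # zero-shot CoT trigger
    )

prompt = build_cot_prompt("昨日、友達とピクニックに行きました。",
                          ["lexical richness", "syntactic complexity", "cohesion"])
print(prompt)
```

Varying the rubric wording and the reasoning trigger is exactly the kind of prompt manipulation whose effect on scoring accuracy the study's 18 model variants compare.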

Beyond linguistic studies, there has been a noticeable increase in the number of foreign workers in Japan and Japanese learners worldwide (Ministry of Health, Labor, and Welfare of Japan, 2022; Japan Foundation, 2021). However, existing assessment methods, such as the Japanese Language Proficiency Test (JLPT), J-CAT, and TTBJ, primarily focus on reading, listening, vocabulary, and grammar skills, neglecting the evaluation of writing proficiency. As the number of workers and language learners continues to grow, there is a rising demand for an efficient AES system that can reduce costs and time for raters and be utilized for employment, examinations, and self-study purposes.

This study aims to explore the potential of LLM-based AES by comparing the effectiveness of five models: two LLMs (GPT and BERT), one Japanese local LLM (OCLL), and two conventional machine learning-based methods (the linguistic feature-based scoring tools Jess and JWriter).

The research questions addressed in this study are as follows:

To what extent do the LLM-driven AES and linguistic feature-based AES, when used as automated tools to support human rating, accurately reflect test takers’ actual performance?

What influence does the prompt have on the accuracy and performance of LLM-based AES methods?

The subsequent sections of the manuscript cover the methodology, including the assessment measures for nonnative Japanese writing proficiency, criteria for prompts, and the dataset. The evaluation section focuses on the analysis of annotations and rating scores generated by LLM-driven and linguistic feature-based AES methods.

Methodology

The dataset utilized in this study was obtained from the International Corpus of Japanese as a Second Language (I-JAS). This corpus consisted of 1000 participants who represented 12 different first languages. For the study, the participants were given a story-writing task on a personal computer. They were required to write two stories based on the 4-panel illustrations titled “Picnic” and “The key” (see Appendix A). Background information for the participants was provided by the corpus, including their Japanese language proficiency levels assessed through two online tests: J-CAT and SPOT. These tests evaluated their reading, listening, vocabulary, and grammar abilities. The learners’ proficiency levels were categorized into six levels aligned with the Common European Framework of Reference for Languages (CEFR) and the Reference Framework for Japanese Language Education (RFJLE): A1, A2, B1, B2, C1, and C2. According to Lee et al. (2015), there is a high level of agreement (r = 0.86) between the J-CAT and SPOT assessments, indicating that the proficiency certifications provided by J-CAT are consistent with those of SPOT. However, it is important to note that the scores of J-CAT and SPOT do not have a one-to-one correspondence. In this study, the J-CAT scores were used as a benchmark to differentiate learners of different proficiency levels. A total of 1400 essays were utilized, representing the beginner (aligned with A1), A2, B1, B2, C1, and C2 levels based on the J-CAT scores. Table 1 provides information about the learners’ proficiency levels and their corresponding J-CAT and SPOT scores.

A dataset comprising a total of 1400 essays from the story-writing tasks was collected. Among these, 714 essays were utilized to evaluate the reliability of the LLM-based AES method, while the remaining 686 essays were designated as development data to assess the LLM-based AES’s capability to distinguish participants with varying proficiency levels. The GPT-4 API was used in this study. A detailed explanation of the prompt-assessment criteria is provided in the Prompt section. All essays were sent to the model for measurement and scoring.
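Sending each essay to the GPT-4 API for scoring amounts to submitting a rubric-bearing prompt per essay. The sketch below only builds a hypothetical request payload (no network call is made); the system message and the number-only reply format are assumptions for illustration, not the study's actual prompt.

```python
# Sketch of a per-essay scoring request in the shape expected by a
# chat-completion API: a system message carrying the scoring
# instruction, the essay as the user message, and temperature 0
# for deterministic output. The instruction text is illustrative.

def build_scoring_request(essay: str, model: str = "gpt-4") -> dict:
    return {
        "model": model,
        "messages": [
            {"role": "system",
             "content": "Rate the following Japanese learner essay from 1 to 6 "
                        "and reply with the number only."},
            {"role": "user", "content": essay},
        ],
        "temperature": 0,  # deterministic scoring
    }

request = build_scoring_request("鍵をなくして、家に入れませんでした。")
print(request["model"], len(request["messages"]))  # → gpt-4 2
```

In a real pipeline, this payload would be passed to the API client and the returned number parsed as the essay's score.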

Measures of writing proficiency for nonnative Japanese

Japanese exhibits a morphologically agglutinative structure where morphemes are attached to the word stem to convey grammatical functions such as tense, aspect, voice, and honorifics, e.g. (5).

食べ-させ-られ-まし-た-か

tabe-sase-rare-mashi-ta-ka

[eat (stem)-causative-passive voice-honorification-tense. past-question marker]

Japanese employs a set of case particles to indicate grammatical functions: the nominative case particle が (ga), the accusative case particle を (o), the genitive case particle の (no), the dative case particle に (ni), the locative/instrumental case particle で (de), the ablative case particle から (kara), the directional case particle へ (e), and the comitative case particle と (to). The agglutinative nature of the language, combined with the case particle system, provides an efficient means of distinguishing between active and passive voice, either through morphemes or case particles, e.g. 食べる taberu “eat (conclusive)” (active voice) vs. 食べられる taberareru “be eaten (conclusive)” (passive voice). In the active voice, “パンを食べる” (pan o taberu) translates to “to eat bread”. In the passive voice, it becomes “パンが食べられた” (pan ga taberareta), which means “(the) bread was eaten”. Additionally, it is important to note that different conjugations of the same lemma are counted as one type in order to ensure a comprehensive assessment of the language features; for example, 食べる taberu “eat (conclusive)”, 食べている tabeteiru “eat (progressive)”, and 食べた tabeta “eat (past)” are all counted as one type.

To incorporate these features, previous research (Suzuki, 1999 ; Watanabe et al. 1988 ; Ishioka, 2001 ; Ishioka and Kameda, 2006 ; Hirao et al. 2020 ) has identified complexity, fluency, and accuracy as crucial factors for evaluating writing quality. These criteria are assessed through various aspects, including lexical richness (lexical density, diversity, and sophistication), syntactic complexity, and cohesion (Kyle et al. 2021 ; Mizumoto and Eguchi, 2023 ; Ure, 1971 ; Halliday, 1985 ; Barkaoui and Hadidi, 2020 ; Zenker and Kyle, 2021 ; Kim et al. 2018 ; Lu, 2017 ; Ortega, 2015 ). Therefore, this study proposes five scoring categories: lexical richness, syntactic complexity, cohesion, content elaboration, and grammatical accuracy. A total of 16 measures were employed to capture these categories. The calculation process and specific details of these measures can be found in Table 2 .

T-unit, first introduced by Hunt (1966), is a measure used for evaluating speech and composition. It serves as an indicator of syntactic development and represents the shortest units into which a piece of discourse can be divided without leaving any sentence fragments. In the context of Japanese language assessment, Sakoda and Hosoi (2020) utilized the T-unit as the basic unit to assess the accuracy and complexity of Japanese learners’ speaking and storytelling. The calculation of T-units in Japanese follows these principles:

A single main clause constitutes 1 T-unit, regardless of the presence or absence of dependent clauses, e.g. (6).

ケンとマリはピクニックに行きました (main clause): 1 T-unit.

If a sentence contains a main clause along with subclauses, each subclause is considered part of the same T-unit, e.g. (7).

天気が良かった の で (subclause)、ケンとマリはピクニックに行きました (main clause): 1 T-unit.

In the case of coordinate clauses, where multiple clauses are connected, each coordinated clause is counted separately. Thus, a sentence with coordinate clauses may have 2 T-units or more, e.g. (8).

ケンは地図で場所を探して (coordinate clause)、マリはサンドイッチを作りました (coordinate clause): 2 T-units.

Lexical diversity refers to the range of words used within a text (Engber, 1995; Kyle et al. 2021) and is considered a useful measure of the breadth of vocabulary in Ln production (Jarvis, 2013a, 2013b).

The type/token ratio (TTR) is widely recognized as a straightforward measure for calculating lexical diversity and has been employed in numerous studies. These studies have demonstrated a strong correlation between TTR and other methods of measuring lexical diversity (e.g., Bentz et al. 2016 ; Čech and Miroslav, 2018 ; Çöltekin and Taraka, 2018 ). TTR is computed by considering both the number of unique words (types) and the total number of words (tokens) in a given text. Given that the length of learners’ writing texts can vary, this study employs the moving average type-token ratio (MATTR) to mitigate the influence of text length. MATTR is calculated using a 50-word moving window. Initially, a TTR is determined for words 1–50 in an essay, followed by words 2–51, 3–52, and so on until the end of the essay is reached (Díez-Ortega and Kyle, 2023 ). The final MATTR scores were obtained by averaging the TTR scores for all 50-word windows. The following formula was employed to derive MATTR:

\({\rm{MATTR}}({\rm{W}})=\frac{{\sum }_{{\rm{i}}=1}^{{\rm{N}}-{\rm{W}}+1}{{\rm{F}}}_{{\rm{i}}}}{{\rm{W}}({\rm{N}}-{\rm{W}}+1)}\)

Here, N refers to the number of tokens in the text, and W is the chosen window size (W < N). \({F}_{i}\) is the number of types in the i-th window. \({\rm{MATTR}}({\rm{W}})\) is the mean of the series of type-token ratios (TTRs), based on word forms, across all windows. It is expected that individuals with higher language proficiency will produce texts with greater lexical diversity, as indicated by higher MATTR scores.
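As a concrete illustration, the MATTR procedure described above can be sketched in Python. This is a minimal sketch over a pre-tokenized essay; the 50-word moving window follows the description in the text, and the fallback for short texts is an assumption, not part of the original formula:

```python
def mattr(tokens, window=50):
    """Moving-average type-token ratio: average the TTR of every
    contiguous window of `window` tokens (words 1-50, 2-51, ...)."""
    n = len(tokens)
    if n < window:
        # Texts shorter than the window fall back to a plain TTR (assumption).
        return len(set(tokens)) / n
    ttrs = [len(set(tokens[i:i + window])) / window
            for i in range(n - window + 1)]
    return sum(ttrs) / len(ttrs)
```

Because each window has a fixed length, the score is far less sensitive to overall text length than a plain TTR computed over the whole essay.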

Lexical density was captured by the ratio of the number of lexical words to the total number of words (Lu, 2012). Lexical sophistication refers to the utilization of advanced vocabulary, often evaluated through word frequency indices (Crossley et al. 2013; Haberman, 2008; Kyle and Crossley, 2015; Laufer and Nation, 1995; Lu, 2012; Read, 2000). In the context of writing, lexical sophistication can be interpreted as vocabulary breadth, which entails the appropriate usage of vocabulary items across various lexico-grammatical contexts and registers (Garner et al. 2019; Kim et al. 2018; Kyle et al. 2018). In Japanese specifically, words are considered lexically sophisticated if they are not included in the “Japanese Education Vocabulary List Ver 1.0” (Footnote 4). Consequently, lexical sophistication was calculated as the number of sophisticated word types relative to the total number of words per essay. Furthermore, it has been suggested that, in Japanese writing, sentences should ideally be no longer than 40 to 50 characters, as this promotes readability. Therefore, the median and maximum sentence length can be considered useful indices for assessment (Ishioka and Kameda, 2006).
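The two lexical ratios just described reduce to simple counts; a minimal sketch follows. The POS tag set and the basic-vocabulary set here are illustrative stand-ins for a real morphological analyzer and the “Japanese Education Vocabulary List”, not resources used in the study:

```python
LEXICAL_POS = {"NOUN", "VERB", "ADJ", "ADV"}  # content-word classes (assumed tag set)

def lexical_density(tagged_tokens):
    """Ratio of lexical (content) words to the total number of words."""
    lexical = sum(1 for _, pos in tagged_tokens if pos in LEXICAL_POS)
    return lexical / len(tagged_tokens)

def lexical_sophistication(tokens, basic_vocab):
    """Ratio of word types absent from the basic vocabulary list
    to the total number of words in the essay."""
    sophisticated_types = {t for t in set(tokens) if t not in basic_vocab}
    return len(sophisticated_types) / len(tokens)
```

Note that density is computed over tokens while sophistication counts types in the numerator, as specified in the text.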

Syntactic complexity was assessed based on several measures, including the mean length of clauses, verb phrases per T-unit, clauses per T-unit, dependent clauses per T-unit, complex nominals per clause, adverbial clauses per clause, coordinate phrases per clause, and mean dependency distance (MDD). The MDD reflects the distance between the governor and dependent positions in a sentence. A larger dependency distance indicates a higher cognitive load and greater complexity in syntactic processing (Liu, 2008 ; Liu et al. 2017 ). The MDD has been established as an efficient metric for measuring syntactic complexity (Jiang, Quyang, and Liu, 2019 ; Li and Yan, 2021 ). To calculate the MDD, the position numbers of the governor and dependent are subtracted, assuming that words in a sentence are assigned in a linear order, such as W1 … Wi … Wn. In any dependency relationship between words Wa and Wb, Wa is the governor and Wb is the dependent. The MDD of the entire sentence was obtained by taking the absolute value of governor – dependent:

MDD = \(\frac{1}{n}{\sum }_{i=1}^{n}|{\rm{D}}{{\rm{D}}}_{i}|\)

In this formula, \(n\) represents the number of words in the sentence, and \({\rm{DD}}_{i}\) is the dependency distance of the \(i\)-th dependency relationship of a sentence. For example, for the sentence Mary-ga John-ni keshigomu-o watashita [Mary-nom John-dat eraser-acc give-past] “Mary gave John an eraser”, the MDD is 2. Table 3 provides the CSV file used as a prompt for GPT-4.
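Under the linear word-order assumption above, the MDD reduces to averaging |governor position − dependent position| over the sentence's dependency relationships. A minimal sketch, where head indices are 1-based and 0 marks the root word (which contributes no dependency):

```python
def mean_dependency_distance(heads):
    """heads[i] is the 1-based position of the governor of word i+1;
    0 marks the root word, which has no governor of its own."""
    distances = [abs(head - (i + 1))
                 for i, head in enumerate(heads)
                 if head != 0]
    return sum(distances) / len(distances)
```

Longer average distances indicate that dependents sit further from their governors, which the cited work associates with higher cognitive load during syntactic processing.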

Cohesion (semantic similarity) and content elaboration aim to capture the ideas presented in test takers’ essays. Cohesion was assessed using three measures: synonym overlap/paragraph (topic), synonym overlap/paragraph (keywords), and word2vec cosine similarity. Content elaboration and development were measured as the number of metadiscourse markers (types) divided by the number of words. To capture content closely, this study proposed a novel distance-based representation, encoding the cosine distance between the i-vectors of the learner’s essay and of the essay task (topic and keywords). The learner’s essay is decoded into a word sequence and aligned to the essay task’s topic and keywords for log-likelihood measurement. The cosine distance reveals the content elaboration score of the learner’s essay. The mathematical equation of cosine similarity between target and reference vectors is shown in (11), assuming there are i essays, where \((L_{i},\ldots ,L_{n})\) and \((N_{i},\ldots ,N_{n})\) are the vectors representing the learner’s essay and the task’s topic and keywords, respectively. The content elaboration distance between \(L_{i}\) and \(N_{i}\) was calculated as follows:

\(\cos \left(\theta \right)=\frac{{\rm{L}}\,\cdot\, {\rm{N}}}{\left|{\rm{L}}\right|{\rm{|N|}}}=\frac{\mathop{\sum }\nolimits_{i=1}^{n}{L}_{i}{N}_{i}}{\sqrt{\mathop{\sum }\nolimits_{i=1}^{n}{L}_{i}^{2}}\sqrt{\mathop{\sum }\nolimits_{i=1}^{n}{N}_{i}^{2}}}\)

A high similarity value indicates a low difference between the two recognition outcomes, which in turn suggests a high level of proficiency in content elaboration.
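The cosine similarity in (11) can be computed directly from the two vectors; a minimal pure-Python sketch:

```python
import math

def cosine_similarity(l_vec, n_vec):
    """Cosine of the angle between a learner-essay vector L and a
    task (topic/keyword) vector N, as in equation (11)."""
    dot = sum(l * n for l, n in zip(l_vec, n_vec))
    norm_l = math.sqrt(sum(l * l for l in l_vec))
    norm_n = math.sqrt(sum(n * n for n in n_vec))
    return dot / (norm_l * norm_n)
```

Values close to 1 indicate that the essay vector points in nearly the same direction as the task vector, i.e. close topical alignment.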

To evaluate the effectiveness of the proposed measures in distinguishing different proficiency levels among nonnative Japanese speakers’ writing, we conducted a multi-faceted Rasch measurement analysis (Linacre, 1994 ). This approach applies measurement models to thoroughly analyze various factors that can influence test outcomes, including test takers’ proficiency, item difficulty, and rater severity, among others. The underlying principles and functionality of multi-faceted Rasch measurement are illustrated in (12).

\(\log \left(\frac{{P}_{{nijk}}}{{P}_{{nij}(k-1)}}\right)={B}_{n}-{D}_{i}-{C}_{j}-{F}_{k}\)

(12) defines the logarithmic transformation of the probability ratio \((P_{nijk}/P_{nij(k-1)})\) as a function of multiple parameters. Here, n represents the test taker, i denotes a writing proficiency measure, j corresponds to the human rater, and k represents the proficiency score. The parameter \(B_{n}\) signifies the proficiency level of test taker n (where n ranges from 1 to N). \(D_{i}\) represents the difficulty parameter of test item i (where i ranges from 1 to L), while \(C_{j}\) represents the severity of rater j (where j ranges from 1 to J). Additionally, \(F_{k}\) represents the step difficulty for a test taker to move from score k−1 to k. \(P_{nijk}\) refers to the probability of rater j assigning score k to test taker n for test item i, and \(P_{nij(k-1)}\) the probability of the same rater assigning score k−1. Each facet within the test is treated as an independent parameter and estimated within the same reference framework. To evaluate the consistency of scores obtained through both human and computer analysis, we utilized the Infit mean-square statistic. This statistic is a chi-square measure divided by the degrees of freedom, weighted with information, and is more sensitive to unexpected patterns in responses to items near a person’s proficiency level (Linacre, 2002). Fit statistics are assessed against predefined thresholds for acceptable fit. For the Infit MNSQ, which has a mean of 1.00, different thresholds have been suggested: some propose stricter thresholds ranging from 0.7 to 1.3 (Bond et al. 2021), while others suggest more lenient thresholds ranging from 0.5 to 1.5 (Eckes, 2009). In this study, we adopted the criterion of 0.70–1.30 for the Infit MNSQ.
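For intuition, the adjacent-category form in (12) determines a full set of score-category probabilities once the facet parameters are fixed: exponentiating the cumulative logits and normalizing yields P(k) for every category. A minimal sketch (the parameter values in the test are illustrative, not estimates from this study):

```python
import math

def category_probabilities(b, d, c, thresholds):
    """Probabilities of each score category 0..K under the many-facet
    Rasch model log(P_k / P_{k-1}) = B - D - C - F_k, where
    thresholds = [F_1, ..., F_K]."""
    cumulative = [0.0]          # exponent for category 0
    running = 0.0
    for f in thresholds:
        running += b - d - c - f
        cumulative.append(running)
    exps = [math.exp(x) for x in cumulative]
    z = sum(exps)
    return [e / z for e in exps]
```

Higher test-taker proficiency b (relative to item difficulty d and rater severity c) shifts probability mass toward the higher score categories.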

Moving forward, we can now proceed to assess the effectiveness of the 16 proposed measures, based on five criteria, for accurately distinguishing various levels of writing proficiency among non-native Japanese speakers. To conduct this evaluation, we utilized the development dataset from the I-JAS corpus, as described in Section Dataset. Table 4 provides a measurement report that presents the performance details of the 16 measures under consideration. The measure separation was found to be 4.02, indicating a clear differentiation among the measures. The reliability index for the measure separation was 0.891, suggesting consistency in the measurement. Similarly, the person separation reliability index was 0.802, indicating the accuracy of the assessment in distinguishing between individuals. All 16 measures demonstrated Infit mean squares within a reasonable range, from 0.76 to 1.28. The synonym overlap/paragraph (topic) measure exhibited a relatively high Outfit mean square of 1.46, although its Infit mean square falls within the acceptable range. The standard error for the measures ranged from 0.13 to 0.28, indicating the precision of the estimates.

Table 5 further illustrates the weights assigned to different linguistic measures for score prediction, with higher weights indicating stronger correlations between those measures and higher scores. Specifically, the following measures exhibited higher weights than the others: the moving average type-token ratio per essay had a weight of 0.0391; mean dependency distance had a weight of 0.0388; mean length of clause, calculated by dividing the number of words by the number of clauses, had a weight of 0.0374; complex nominals per T-unit, calculated by dividing the number of complex nominals by the number of T-units, had a weight of 0.0379; coordinate phrase rate, calculated by dividing the number of coordinate phrases by the number of clauses, had a weight of 0.0325; and grammatical error rate, representing the number of errors per essay, had a weight of 0.0322.

Criteria (output indicator)

The criteria used to evaluate the writing ability in this study were based on CEFR, which follows a six-point scale ranging from A1 to C2. To assess the quality of Japanese writing, the scoring criteria from Table 6 were utilized. These criteria were derived from the IELTS writing standards and served as assessment guidelines and prompts for the written output.

A prompt is a question or detailed instruction provided to the model to obtain a proper response. After several pilot experiments, we decided to provide the measures (Section Measures of writing proficiency for nonnative Japanese) as the input prompt and use the criteria (Section Criteria (output indicator)) as the output indicator. Regarding the prompt language, considering that the LLM was tasked with rating Japanese essays, would a prompt written in Japanese work better (Footnote 5)? We conducted experiments comparing the performance of GPT-4 using both English and Japanese prompts. Additionally, we utilized the Japanese local model OCLL with Japanese prompts. Multiple trials were conducted using the same sample. Regardless of the prompt language used, we consistently obtained the same grading results with GPT-4, which assigned a grade of B1 to the writing sample. This suggested that GPT-4 is reliable and capable of producing consistent ratings regardless of the prompt language. On the other hand, when we used Japanese prompts with the Japanese local model OCLL, we encountered inconsistent grading results. Out of 10 attempts with OCLL, only 6 yielded consistent grading results (B1), while the remaining 4 showed different outcomes, including A1 and B2 grades. These findings indicated that the language of the prompt was not the determining factor for reliable AES. Instead, the size of the training data and the model parameters played crucial roles in achieving consistent and reliable AES results for the language model.

The following is the utilized prompt, which details all measures and requires the LLM to score the essays using holistic and trait scores.

Please evaluate Japanese essays written by Japanese learners and assign a score to each essay on a six-point scale, ranging from A1, A2, B1, B2, C1 to C2. Additionally, please provide trait scores and display the calculation process for each trait score. The scoring should be based on the following criteria:

Moving average type-token ratio.

Number of lexical words (token) divided by the total number of words per essay.

Number of sophisticated word types divided by the total number of words per essay.

Mean length of clause.

Verb phrases per T-unit.

Clauses per T-unit.

Dependent clauses per T-unit.

Complex nominals per clause.

Adverbial clauses per clause.

Coordinate phrases per clause.

Mean dependency distance.

Synonym overlap paragraph (topic and keywords).

Word2vec cosine similarity.

Connectives per essay.

Conjunctions per essay.

Number of metadiscourse markers (types) divided by the total number of words.

Number of errors per essay.

Japanese essay text

出かける前に二人が地図を見ている間に、サンドイッチを入れたバスケットに犬が入ってしまいました。それに気づかずに二人は楽しそうに出かけて行きました。やがて突然犬がバスケットから飛び出し、二人は驚きました。バスケットの中を見ると、食べ物はすべて犬に食べられていて、二人は困ってしまいました。(ID_JJJ01_SW1)

The score of the example above was B1. Figure 3 provides an example of holistic and trait scores provided by GPT-4 (with a prompt indicating all measures) via Bing (Footnote 6).

figure 3

Example of GPT-4 AES and feedback (with a prompt indicating all measures).

Statistical analysis

The aim of this study is to investigate the potential use of LLM for nonnative Japanese AES. It seeks to compare the scoring outcomes obtained from feature-based AES tools, which rely on conventional machine learning technology (i.e. Jess, JWriter), with those generated by AI-driven AES tools utilizing deep learning technology (BERT, GPT, OCLL). To assess the reliability of a computer-assisted annotation tool, the study initially established human-human agreement as the benchmark measure. Subsequently, the performance of the LLM-based method was evaluated by comparing it to human-human agreement.

To assess annotation agreement, the study employed standard measures such as precision, recall, and F-score (Brants, 2000; Lu, 2010), along with the quadratically weighted kappa (QWK), to evaluate the consistency and agreement in the annotation process. Assume A and B represent two human annotators; comparing their annotations yields the following results. The precision, recall, and F-score metrics are defined in equations (13) to (15).

\({\rm{Recall}}(A,B)=\frac{{\rm{Number}}\,{\rm{of}}\,{\rm{identical}}\,{\rm{nodes}}\,{\rm{in}}\,A\,{\rm{and}}\,B}{{\rm{Number}}\,{\rm{of}}\,{\rm{nodes}}\,{\rm{in}}\,A}\)

\({\rm{Precision}}(A,\,B)=\frac{{\rm{Number}}\,{\rm{of}}\,{\rm{identical}}\,{\rm{nodes}}\,{\rm{in}}\,A\,{\rm{and}}\,B}{{\rm{Number}}\,{\rm{of}}\,{\rm{nodes}}\,{\rm{in}}\,B}\)

The F-score is the harmonic mean of recall and precision:

\({\rm{F}}-{\rm{score}}=\frac{2* ({\rm{Precision}}* {\rm{Recall}})}{{\rm{Precision}}+{\rm{Recall}}}\)

The highest possible value of an F-score is 1.0, indicating perfect precision and recall, and the lowest possible value is 0, which occurs if either precision or recall is zero.
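Treating each annotator's output as a set of nodes, equations (13)–(15) can be sketched as follows, keeping the paper's convention of computing recall against annotator A's nodes and precision against annotator B's:

```python
def agreement_scores(nodes_a, nodes_b):
    """Precision, recall, and F-score between two annotators' node sets,
    following equations (13)-(15)."""
    identical = len(nodes_a & nodes_b)
    recall = identical / len(nodes_a)       # identical nodes / nodes in A
    precision = identical / len(nodes_b)    # identical nodes / nodes in B
    f_score = (0.0 if precision + recall == 0
               else 2 * precision * recall / (precision + recall))
    return precision, recall, f_score
```

The same computation applies whether the two annotation sources are two humans or a human and an LLM.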

In accordance with Taghipour and Ng ( 2016 ), the calculation of QWK involves two steps:

Step 1: Construct a weight matrix W as follows:

\({W}_{{ij}}=\frac{{(i-j)}^{2}}{{(N-1)}^{2}}\)

Here, i represents the annotation made by the tool, while j represents the annotation made by a human rater, and N denotes the total number of possible annotations. Matrix O is subsequently computed, where \(O_{i,j}\) represents the count of data annotated as i by the tool and as j by the human annotator. E refers to the expected count matrix, which is normalized so that the sum of its elements matches the sum of the elements in O.

Step 2: With matrices O and E, the QWK is obtained as follows:

\({\rm{K}}=1-\frac{{\sum }_{i,j}{W}_{i,j}\,{O}_{i,j}}{{\sum }_{i,j}{W}_{i,j}\,{E}_{i,j}}\)
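The two steps above can be sketched as a minimal implementation for integer ratings 0..N−1, where the expected matrix E is built from the two raters' score histograms and normalized to the total count as described:

```python
def quadratic_weighted_kappa(ratings_a, ratings_b, n_categories):
    """QWK between two lists of integer ratings in 0..n_categories-1."""
    n = n_categories
    # Step 1: quadratic weight matrix W
    w = [[(i - j) ** 2 / (n - 1) ** 2 for j in range(n)] for i in range(n)]
    # Observed co-occurrence matrix O
    o = [[0] * n for _ in range(n)]
    for x, y in zip(ratings_a, ratings_b):
        o[x][y] += 1
    # Expected matrix E from the marginal histograms, normalized to sum(O)
    hist_a = [ratings_a.count(k) for k in range(n)]
    hist_b = [ratings_b.count(k) for k in range(n)]
    total = len(ratings_a)
    e = [[hist_a[i] * hist_b[j] / total for j in range(n)] for i in range(n)]
    # Step 2: kappa
    num = sum(w[i][j] * o[i][j] for i in range(n) for j in range(n))
    den = sum(w[i][j] * e[i][j] for i in range(n) for j in range(n))
    return 1 - num / den
```

Because the weights grow quadratically with the distance between the two ratings, large disagreements are penalized much more heavily than near misses.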

The value of the quadratic weighted kappa increases as the level of agreement improves. Further, to assess the accuracy of LLM scoring, the proportional reductive mean square error (PRMSE) was employed. The PRMSE approach takes into account the variability observed in human ratings to estimate the rater error, which is then subtracted from the variance of the human labels. This calculation provides an overall measure of agreement between the automated scores and true scores (Haberman et al. 2015 ; Loukina et al. 2020 ; Taghipour and Ng, 2016 ). The computation of PRMSE involves the following steps:

Step 1: Calculate the mean squared errors (MSEs) for the scoring outcomes of the computer-assisted tool (MSE tool) and the human scoring outcomes (MSE human).

Step 2: Determine the PRMSE by comparing the MSE of the computer-assisted tool (MSE tool) with the MSE from human raters (MSE human), using the following formula:

\({\rm{PRMSE}}=1-\frac{{{\rm{MSE}}}_{{\rm{tool}}}}{{{\rm{MSE}}}_{{\rm{human}}}}=1-\frac{{\sum }_{i=1}^{n}{({y}_{i}-{\hat{y}}_{i})}^{2}}{{\sum }_{i=1}^{n}{({y}_{i}-\bar{y})}^{2}}\)

In the numerator, ŷ<sub>i</sub> represents the score predicted by the LLM-driven AES system for essay i, so y<sub>i</sub> − ŷ<sub>i</sub> is the deviation of the prediction from the corresponding human score y<sub>i</sub>. In the denominator, ȳ is the mean of all human raters’ scores, so y<sub>i</sub> − ȳ measures the deviation of each human score from that mean; the denominator therefore corresponds to the error of a baseline that always predicts the mean human score. The PRMSE is then calculated by subtracting the ratio of the MSE of the tool to the MSE of the human baseline from 1. PRMSE falls within the range of 0 to 1, with larger values indicating reduced errors in the LLM’s scoring compared to those of human raters. In other words, a higher PRMSE implies that the LLM’s scoring demonstrates greater accuracy in predicting the true scores (Loukina et al. 2020). The interpretation of kappa values is based on the work of Landis and Koch (1977). Specifically, the following categories are assigned to different ranges of kappa values: −1 indicates complete inconsistency; 0 indicates random agreement; 0.0–0.20 an extremely low level of agreement (slight); 0.21–0.40 a moderate level of agreement (fair); 0.41–0.60 a medium level of agreement (moderate); 0.61–0.80 a high level of agreement (substantial); and 0.81–1 an almost perfect level of agreement. All statistical analyses were executed using Python scripts.
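The two PRMSE steps can be sketched as follows. This is a simplified version that uses the mean human score as the baseline; the full PRMSE of Haberman et al. (2015) additionally corrects the denominator for rater error estimated from multiple human ratings:

```python
def prmse(human_scores, predicted_scores):
    """Proportional reduction in mean squared error:
    1 - MSE(tool) / MSE(baseline), where the baseline predicts
    the mean human score for every essay."""
    n = len(human_scores)
    mean_h = sum(human_scores) / n
    mse_tool = sum((y - p) ** 2
                   for y, p in zip(human_scores, predicted_scores)) / n
    mse_base = sum((y - mean_h) ** 2 for y in human_scores) / n
    return 1 - mse_tool / mse_base
```

A value of 1 means the tool reproduces the human scores exactly, while 0 means it does no better than always predicting the mean.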

Results and discussion

Annotation reliability of the LLM

This section focuses on assessing the reliability of the LLM’s annotation and scoring capabilities. To evaluate the reliability, several tests were conducted simultaneously, aiming to achieve the following objectives:

Assess the LLM’s ability to differentiate between test takers with varying levels of writing proficiency.

Determine the level of agreement between the annotations and scoring performed by the LLM and those done by human raters.

The evaluation of the results encompassed several metrics, including: precision, recall, F-Score, quadratically-weighted kappa, proportional reduction of mean squared error, Pearson correlation, and multi-faceted Rasch measurement.

Inter-annotator agreement (human–human annotator agreement)

We started with an agreement test between the two human annotators. Two trained annotators were recruited to determine the writing task data measures. A total of 714 scripts was utilized as the test data. Each analysis lasted 300–360 min. Inter-annotator agreement was evaluated using the standard measures of precision, recall, F-score, and QWK. Table 7 presents the inter-annotator agreement for the various indicators. As shown, the inter-annotator agreement was fairly high, with F-scores ranging from 1.0 for sentence and word number to 0.666 for grammatical errors.

The findings from the QWK analysis provided further confirmation of the inter-annotator agreement. The QWK values covered a range from 0.950 ( p  = 0.000) for sentence and word number to 0.695 for synonym overlap number (keyword) and grammatical errors ( p  = 0.001).

Agreement of annotation outcomes between human and LLM

To evaluate the consistency between human annotators and LLM annotators (BERT, GPT, OCLL) across the indices, the same test was conducted. The results of the inter-annotator agreement (F-score) between LLM and human annotation are provided in Appendix B–D. The F-scores ranged from 0.706 for grammatical error count for OCLL–human to a perfect 1.000 for GPT–human for sentences, clauses, T-units, and words. These findings were further supported by the QWK analysis, which showed agreement levels ranging from 0.807 (p = 0.001) for metadiscourse markers for OCLL–human to 0.962 (p = 0.000) for words for GPT–human. The findings demonstrated that the LLM annotation achieved a significant level of accuracy in identifying measurement units and counts.

Reliability of LLM-driven AES’s scoring and discriminating proficiency levels

This section examines the reliability of the LLM-driven AES scoring through a comparison of the scoring outcomes produced by human raters and the LLM ( Reliability of LLM-driven AES scoring ). It also assesses the effectiveness of the LLM-based AES system in differentiating participants with varying proficiency levels ( Reliability of LLM-driven AES discriminating proficiency levels ).

Reliability of LLM-driven AES scoring

Table 8 summarizes the QWK coefficient analysis between the scores computed by the human raters and GPT-4 for the individual essays from I-JAS (Footnote 7). As shown, the QWK of all measures ranged from k = 0.819 for lexical density (number of lexical words (tokens)/number of words per essay) to k = 0.644 for word2vec cosine similarity. Table 9 further presents the Pearson correlations between the 16 writing proficiency measures scored by human raters and GPT-4 for the individual essays. The correlations ranged from 0.672 for syntactic complexity to 0.734 for grammatical accuracy. The correlations between the writing proficiency scores assigned by human raters and the BERT-based AES system were found to range from 0.661 for syntactic complexity to 0.713 for grammatical accuracy. The correlations between the writing proficiency scores given by human raters and the OCLL-based AES system ranged from 0.654 for cohesion to 0.721 for grammatical accuracy. These findings indicated an alignment between the assessments made by human raters and both the BERT-based and OCLL-based AES systems in terms of various aspects of writing proficiency.

Reliability of LLM-driven AES discriminating proficiency levels

After validating the reliability of the LLM’s annotation and scoring, the subsequent objective was to evaluate its ability to distinguish between various proficiency levels. For this analysis, a dataset of 686 individual essays was utilized. Table 10 presents a sample of the results, summarizing the means, standard deviations, and the outcomes of the one-way ANOVAs based on the measures assessed by the GPT-4 model. A post hoc multiple comparison test, specifically the Bonferroni test, was conducted to identify any potential differences between pairs of levels.

As the results reveal, seven measures presented linear upward or downward progress across the three proficiency levels. These are marked in bold in Table 10 and comprise one measure of lexical richness, MATTR (lexical diversity); four measures of syntactic complexity, MDD (mean dependency distance), MLC (mean length of clause), CNT (complex nominals per T-unit), and CPC (coordinate phrase rate); one cohesion measure, word2vec cosine similarity; and GER (grammatical error rate). Regarding the ability of the sixteen measures to distinguish adjacent proficiency levels, the Bonferroni tests indicated that statistically significant differences exist between the primary level and the intermediate level for MLC and GER. One measure of lexical richness, namely LD, along with four measures of syntactic complexity (VPT, CT, DCT, ACC), two measures of cohesion (SOPT, SOPK), and one measure of content elaboration (IMM), exhibited statistically significant differences between proficiency levels. However, these differences did not demonstrate a linear progression between adjacent proficiency levels. No significant difference was observed in lexical sophistication between proficiency levels.

To summarize, our study aimed to evaluate the reliability and differentiation capabilities of the LLM-driven AES method. For the first objective, we assessed the LLM’s ability to differentiate between test takers with varying levels of writing proficiency using precision, recall, F-score, and quadratically weighted kappa. Regarding the second objective, we compared the scoring outcomes generated by human raters and the LLM to determine the level of agreement. We employed quadratically weighted kappa and Pearson correlations to compare the 16 writing proficiency measures for the individual essays. The results confirmed the feasibility of using the LLM for annotation and scoring in AES for nonnative Japanese. As a result, Research Question 1 has been addressed.

Comparison of BERT-, GPT-, OCLL-based AES, and linguistic-feature-based computation methods

This section aims to compare the effectiveness of five AES methods for nonnative Japanese writing: LLM-driven approaches utilizing BERT, GPT, and OCLL, and linguistic-feature-based approaches using Jess and JWriter. The comparison was conducted by comparing the ratings obtained from each approach with human ratings. All ratings were derived from the dataset introduced in Section Dataset. To facilitate the comparison, the agreement between the automated methods and human ratings was assessed using QWK and PRMSE. The performance of each approach is summarized in Table 11.

The QWK coefficient values indicate that LLMs (GPT, BERT, OCLL) and human rating outcomes demonstrated higher agreement compared to feature-based AES methods (Jess and JWriter) in assessing writing proficiency criteria, including lexical richness, syntactic complexity, content, and grammatical accuracy. Among the LLMs, the GPT-4 driven AES and human rating outcomes showed the highest agreement in all criteria, except for syntactic complexity. The PRMSE values suggest that the GPT-based method outperformed linguistic feature-based methods and other LLM-based approaches. Moreover, an interesting finding emerged during the study: the agreement coefficient between GPT-4 and human scoring was even higher than the agreement between different human raters themselves. This discovery highlights the advantage of GPT-based AES over human rating. Ratings involve a series of processes, including reading the learners’ writing, evaluating the content and language, and assigning scores. Within this chain of processes, various biases can be introduced, stemming from factors such as rater biases, test design, and rating scales. These biases can impact the consistency and objectivity of human ratings. GPT-based AES may benefit from its ability to apply consistent and objective evaluation criteria. By prompting the GPT model with detailed writing scoring rubrics and linguistic features, potential biases in human ratings can be mitigated. The model follows a predefined set of guidelines and does not possess the same subjective biases that human raters may exhibit. This standardization in the evaluation process contributes to the higher agreement observed between GPT-4 and human scoring. Section Prompt strategy of the study delves further into the role of prompts in the application of LLMs to AES. It explores how the choice and implementation of prompts can impact the performance and reliability of LLM-based AES methods. 
Furthermore, it is important to acknowledge the strengths of the local model: the Japanese local model OCLL excels at processing certain idiomatic expressions. Nevertheless, our analysis indicated that GPT-4 surpasses the local models in AES. This superior performance can be attributed to the larger parameter size of GPT-4, estimated to be between 500 billion and 1 trillion, which exceeds the sizes of both BERT and OCLL.

Prompt strategy

In the context of prompt strategy, Mizumoto and Eguchi (2023) applied the GPT-3 model to automatically score English essays from the TOEFL test. They found that the accuracy of the GPT model alone was moderate to fair; however, when they incorporated linguistic measures such as cohesion, syntactic complexity, and lexical features alongside the GPT model, accuracy improved significantly. This highlights the importance of prompt engineering and of providing the model with specific instructions to enhance its performance. In this study, a similar approach was taken to optimize the performance of LLMs. GPT-4, which outperformed BERT and OCLL, was selected as the candidate model. Model 1 served as the baseline, representing GPT-4 without any additional prompting. Model 2 involved GPT-4 prompted with 16 measures comprising scoring criteria, linguistic features efficient for writing assessment, and detailed measurement units and calculation formulas. The remaining models (Models 3 to 18) used GPT-4 prompted with individual measures. The performance of these 18 models was assessed using the output indicators described in the section Criteria (output indicator). By comparing their performance, the study aimed to understand the impact of prompt engineering on the accuracy and effectiveness of GPT-4 in AES tasks.

  

Model 1: GPT-4
Model 2: GPT-4 + 17 measures
Model 3: GPT-4 + MATTR
Model 4: GPT-4 + LD
Model 5: GPT-4 + LS
Model 6: GPT-4 + MLC
Model 7: GPT-4 + VPT
Model 8: GPT-4 + CT
Model 9: GPT-4 + DCT
Model 10: GPT-4 + CNT
Model 11: GPT-4 + ACC
Model 12: GPT-4 + CPC
Model 13: GPT-4 + MDD
Model 14: GPT-4 + SOPT
Model 15: GPT-4 + SOPK
Model 16: GPT-4 + word2vec
Model 17: GPT-4 + IMM
Model 18: GPT-4 + GER

 

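The 18 prompt configurations above can be enumerated mechanically: a baseline with no measures, one configuration combining all measures, and one configuration per individual measure. The sketch below uses the measure abbreviations from the list; how each measure is rendered into prompt text is left out.

```python
# Measure abbreviations taken from the model list above (Models 3-18).
MEASURES = ["MATTR", "LD", "LS", "MLC", "VPT", "CT", "DCT", "CNT",
            "ACC", "CPC", "MDD", "SOPT", "SOPK", "word2vec", "IMM", "GER"]

# Model 1 is the unprompted baseline; Model 2 combines every measure;
# Models 3-18 each receive a single measure.
configs = {"Model 1": [], "Model 2": list(MEASURES)}
for k, measure in enumerate(MEASURES, start=3):
    configs[f"Model {k}"] = [measure]
```

Enumerating the configurations this way makes the ablation explicit: the contribution of each measure can be read off by comparing its single-measure model against the baseline and the all-measures model.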
Based on the PRMSE scores presented in Fig. 4, Model 1 (GPT-4 without any additional prompting) achieved a fair level of performance. Model 2, which prompted GPT-4 with all measures, outperformed every other model, achieving a PRMSE score of 0.681. These results indicate that including specific measures and prompts significantly enhanced the performance of GPT-4 in AES. Among the measures, syntactic complexity played a particularly significant role in improving the accuracy of GPT-4 in assessing writing quality, followed by lexical diversity. The study suggests that a well-prompted GPT-4 can serve as a valuable tool to support human assessors in evaluating writing quality. Using GPT-4 as an automated scoring tool can minimize the evaluation biases associated with human raters, potentially freeing teachers to focus on designing writing tasks and guiding writing strategies while leveraging GPT-4 for efficient and reliable scoring.
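The PRMSE criterion used to rank these models can be sketched as follows. This is a simplified estimator in the spirit of Loukina et al. (2020), assuming exactly two human ratings per essay: rater error variance is estimated from their disagreement and subtracted from both the system's MSE and the score variance, so the system is evaluated against latent "true" scores rather than noisy observed ones. The numbers are invented for illustration.

```python
def prmse(pred, h1, h2):
    """Proportional reduction in mean squared error (two human ratings)."""
    n = len(pred)
    avg = [(x + y) / 2 for x, y in zip(h1, h2)]
    # Per-rating error variance estimated from rater disagreement
    var_err = sum((x - y) ** 2 for x, y in zip(h1, h2)) / n / 2
    # MSE against true scores: MSE against the average, minus averaging noise
    mse_true = sum((p - a) ** 2 for p, a in zip(pred, avg)) / n - var_err / 2
    # True-score variance: variance of the average, minus averaging noise
    mean_avg = sum(avg) / n
    var_true = sum((a - mean_avg) ** 2 for a in avg) / n - var_err / 2
    return 1.0 - mse_true / var_true

h1   = [3, 4, 2, 5, 3, 4]   # hypothetical first human rating
h2   = [3, 5, 2, 4, 3, 4]   # hypothetical second human rating
pred = [3, 4, 2, 5, 3, 4]   # hypothetical automated scores
```

A system tracking the human consensus approaches a PRMSE of 1, while a system no better than guessing the mean scores near 0; this is why PRMSE, unlike raw MSE, is not depressed by noise in the human labels.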

Figure 4: PRMSE scores of the 18 AES models.

This study aimed to investigate two main research questions: the feasibility of utilizing LLMs for AES and the impact of prompt engineering on the application of LLMs in AES.

To address the first objective, the study compared the effectiveness of five different models: GPT, BERT, the Japanese local LLM (OCLL), and two conventional machine learning-based AES tools (Jess and JWriter). The PRMSE values indicated that the GPT-4-based method outperformed other LLMs (BERT, OCLL) and linguistic feature-based computational methods (Jess and JWriter) across various writing proficiency criteria. Furthermore, the agreement coefficient between GPT-4 and human scoring surpassed the agreement among human raters themselves, highlighting the potential of using the GPT-4 tool to enhance AES by reducing biases and subjectivity, saving time, labor, and cost, and providing valuable feedback for self-study. Regarding the second goal, the role of prompt design was investigated by comparing 18 models, including a baseline model, a model prompted with all measures, and 16 models prompted with one measure at a time. GPT-4, which outperformed BERT and OCLL, was selected as the candidate model. The PRMSE scores of the models showed that GPT-4 prompted with all measures achieved the best performance, surpassing the baseline and other models.

In conclusion, this study has demonstrated the potential of LLMs in supporting human rating in assessments. By incorporating automation, we can save time and resources while reducing biases and subjectivity inherent in human rating processes. Automated language assessments offer the advantage of accessibility, providing equal opportunities and economic feasibility for individuals who lack access to traditional assessment centers or necessary resources. LLM-based language assessments provide valuable feedback and support to learners, aiding in the enhancement of their language proficiency and the achievement of their goals. This personalized feedback can cater to individual learner needs, facilitating a more tailored and effective language-learning experience.

Three important areas merit further exploration. First, prompt engineering requires attention to ensure optimal performance of LLM-based AES across different language types: this study showed that GPT-4 prompted with all measures outperformed models prompted with fewer, so investigating and refining prompt strategies can further enhance the effectiveness of LLMs in automated language assessment. Second, the application of LLMs to second-language assessment and learning of oral proficiency warrants study. Third, their potential in under-resourced languages deserves attention. Recent advances in self-supervised machine learning have significantly improved automatic speech recognition (ASR) systems, opening up new possibilities for building reliable ASR, particularly for under-resourced languages with limited data. Challenges persist in ASR, however. Automatic pronunciation evaluation assumes correct word pronunciation, which is problematic for learners in the early stages of acquisition whose accents are influenced by their native languages; accurately segmenting short words becomes difficult in such cases. Developing precise audio-text transcriptions for non-native accented speech is likewise a formidable task. Finally, assessing oral proficiency requires capturing linguistic features, including fluency, pronunciation, accuracy, and complexity, that current NLP technology does not easily capture.

Data availability

The dataset utilized was obtained from the International Corpus of Japanese as a Second Language (I-JAS), available at https://www2.ninjal.ac.jp/jll/lsaj/ihome2.html.

J-CAT and TTBJ are two computerized adaptive tests used to assess Japanese language proficiency.

SPOT is a specific component of the TTBJ test.

J-CAT: https://www.j-cat2.org/html/ja/pages/interpret.html

SPOT: https://ttbj.cegloc.tsukuba.ac.jp/p1.html#SPOT .

The study utilized a prompt-based GPT-4 model, developed by OpenAI, which has an impressive architecture with 1.8 trillion parameters across 120 layers. GPT-4 was trained on a vast dataset of 13 trillion tokens, using two stages: initial training on internet text datasets to predict the next token, and subsequent fine-tuning through reinforcement learning from human feedback.

https://www2.ninjal.ac.jp/jll/lsaj/ihome2-en.html .

http://jhlee.sakura.ne.jp/JEV/ by Japanese Learning Dictionary Support Group 2015.

We express our sincere gratitude to the reviewer for bringing this matter to our attention.

On February 7, 2023, Microsoft began rolling out a major overhaul to Bing that included a new chatbot feature based on OpenAI’s GPT-4 (Bing.com).

Appendices E and F present the QWK coefficients between the scores assigned by the human raters and those computed by the BERT and OCLL models.

Attali Y, Burstein J (2006) Automated essay scoring with e-rater® V.2. J. Technol., Learn. Assess., 4

Barkaoui K, Hadidi A (2020) Assessing Change in English Second Language Writing Performance (1st ed.). Routledge, New York. https://doi.org/10.4324/9781003092346

Bentz C, Ruzsics T, Koplenig A, Samardžić T (2016) A comparison between morphological complexity measures: Typological data vs. language corpora. In Proceedings of the Workshop on Computational Linguistics for Linguistic Complexity (CL4LC), 142–153. Osaka, Japan: The COLING 2016 Organizing Committee

Bond TG, Yan Z, Heene M (2021) Applying the Rasch model: Fundamental measurement in the human sciences (4th ed). Routledge

Brants T (2000) Inter-annotator agreement for a German newspaper corpus. Proceedings of the Second International Conference on Language Resources and Evaluation (LREC’00), Athens, Greece, 31 May-2 June, European Language Resources Association

Brown TB, Mann B, Ryder N, et al. (2020) Language models are few-shot learners. Advances in Neural Information Processing Systems, Online, 6–12 December, Curran Associates, Inc., Red Hook, NY

Burstein J (2003) The E-rater scoring engine: Automated essay scoring with natural language processing. In Shermis MD and Burstein JC (ed) Automated Essay Scoring: A Cross-Disciplinary Perspective. Lawrence Erlbaum Associates, Mahwah, NJ

Čech R, Kubát M (2018) Morphological richness of text. In Fidler M, Cvrček V (ed) Taming the corpus: From inflection and lexis to interpretation, 63–77. Cham, Switzerland: Springer Nature

Çöltekin Ç, Rama T (2018) Exploiting Universal Dependencies treebanks for measuring morphosyntactic complexity. In Berdicevskis A, Bentz C (ed) Proceedings of the First Workshop on Measuring Language Complexity, 1–7. Torun, Poland

Crossley SA, Cobb T, McNamara DS (2013) Comparing count-based and band-based indices of word frequency: Implications for active vocabulary research and pedagogical applications. System 41:965–981. https://doi.org/10.1016/j.system.2013.08.002


Crossley SA, McNamara DS (2016) Say more and be more coherent: How text elaboration and cohesion can increase writing quality. J. Writ. Res. 7:351–370

CyberAgent Inc (2023) Open-Calm series of Japanese language models. Retrieved from: https://www.cyberagent.co.jp/news/detail/id=28817

Devlin J, Chang MW, Lee K, Toutanova K (2019) BERT: Pre-training of deep bidirectional transformers for language understanding. Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics, Minneapolis, Minnesota, 2–7 June, pp. 4171–4186. Association for Computational Linguistics

Diez-Ortega M, Kyle K (2023) Measuring the development of lexical richness of L2 Spanish: a longitudinal learner corpus study. Studies in Second Language Acquisition 1-31

Eckes T (2009) On common ground? How raters perceive scoring criteria in oral proficiency testing. In Brown A, Hill K (ed) Language testing and evaluation 13: Tasks and criteria in performance assessment (pp. 43–73). Peter Lang Publishing

Elliot S (2003) IntelliMetric: from here to validity. In: Shermis MD, Burstein JC (ed) Automated Essay Scoring: A Cross-Disciplinary Perspective. Lawrence Erlbaum Associates, Mahwah, NJ


Engber CA (1995) The relationship of lexical proficiency to the quality of ESL compositions. J. Second Lang. Writ. 4:139–155

Garner J, Crossley SA, Kyle K (2019) N-gram measures and L2 writing proficiency. System 80:176–187. https://doi.org/10.1016/j.system.2018.12.001

Haberman SJ (2008) When can subscores have value? J. Educat. Behav. Stat., 33:204–229

Haberman SJ, Yao L, Sinharay S (2015) Prediction of true test scores from observed item scores and ancillary data. Brit. J. Math. Stat. Psychol. 68:363–385

Halliday MAK (1985) Spoken and Written Language. Deakin University Press, Melbourne, Australia

Hirao R, Arai M, Shimanaka H et al. (2020) Automated essay scoring system for nonnative Japanese learners. Proceedings of the 12th Conference on Language Resources and Evaluation (LREC 2020), pp. 1250–1257. European Language Resources Association

Hunt KW (1966) Recent Measures in Syntactic Development. Elementary English, 43(7), 732–739. http://www.jstor.org/stable/41386067

Ishioka T (2001) About e-rater, a computer-based automatic scoring system for essays [Konpyūta ni yoru essei no jidō saiten shisutemu e − rater ni tsuite]. University Entrance Examination. Forum [Daigaku nyūshi fōramu] 24:71–76

Hochreiter S, Schmidhuber J (1997) Long short- term memory. Neural Comput. 9(8):1735–1780


Ishioka T, Kameda M (2006) Automated Japanese essay scoring system based on articles written by experts. Proceedings of the 21st International Conference on Computational Linguistics and 44th Annual Meeting of the Association for Computational Linguistics, Sydney, Australia, 17–18 July 2006, pp. 233-240. Association for Computational Linguistics, USA

Japan Foundation (2021) Retrieved from: https://www.jpf.gp.jp/j/project/japanese/survey/result/dl/survey2021/all.pdf

Jarvis S (2013a) Defining and measuring lexical diversity. In Jarvis S, Daller M (ed) Vocabulary knowledge: Human ratings and automated measures (Vol. 47, pp. 13–44). John Benjamins. https://doi.org/10.1075/sibil.47.03ch1

Jarvis S (2013b) Capturing the diversity in lexical diversity. Lang. Learn. 63:87–106. https://doi.org/10.1111/j.1467-9922.2012.00739.x

Jiang J, Quyang J, Liu H (2019) Interlanguage: A perspective of quantitative linguistic typology. Lang. Sci. 74:85–97

Kim M, Crossley SA, Kyle K (2018) Lexical sophistication as a multidimensional phenomenon: Relations to second language lexical proficiency, development, and writing quality. Mod. Lang. J. 102(1):120–141. https://doi.org/10.1111/modl.12447

Kojima T, Gu S, Reid M et al. (2022) Large language models are zero-shot reasoners. Advances in Neural Information Processing Systems, New Orleans, LA, 29 November-1 December, Curran Associates, Inc., Red Hook, NY

Kyle K, Crossley SA (2015) Automatically assessing lexical sophistication: Indices, tools, findings, and application. TESOL Q 49:757–786

Kyle K, Crossley SA, Berger CM (2018) The tool for the automatic analysis of lexical sophistication (TAALES): Version 2.0. Behav. Res. Methods 50:1030–1046. https://doi.org/10.3758/s13428-017-0924-4


Kyle K, Crossley SA, Jarvis S (2021) Assessing the validity of lexical diversity using direct judgements. Lang. Assess. Q. 18:154–170. https://doi.org/10.1080/15434303.2020.1844205

Landauer TK, Laham D, Foltz PW (2003) Automated essay scoring and annotation of essays with the Intelligent Essay Assessor. In Shermis MD, Burstein JC (ed), Automated Essay Scoring: A Cross-Disciplinary Perspective. Lawrence Erlbaum Associates, Mahwah, NJ

Landis JR, Koch GG (1977) The measurement of observer agreement for categorical data. Biometrics 33:159–174

Laufer B, Nation P (1995) Vocabulary size and use: Lexical richness in L2 written production. Appl. Linguist. 16:307–322. https://doi.org/10.1093/applin/16.3.307

Lee J, Hasebe Y (2017) jWriter Learner Text Evaluator, URL: https://jreadability.net/jwriter/

Lee J, Kobayashi N, Sakai T, Sakota K (2015) A Comparison of SPOT and J-CAT Based on Test Analysis [Tesuto bunseki ni motozuku ‘SPOT’ to ‘J-CAT’ no hikaku]. Research on the Acquisition of Second Language Japanese [Dainigengo to shite no nihongo no shūtoku kenkyū] (18) 53–69

Li W, Yan J (2021) Probability distribution of dependency distance based on a treebank of Japanese EFL learners' interlanguage. J. Quant. Linguist. 28(2):172–186. https://doi.org/10.1080/09296174.2020.1754611


Linacre JM (2002) Optimizing rating scale category effectiveness. J. Appl. Meas. 3(1):85–106


Linacre JM (1994) Constructing measurement with a Many-Facet Rasch Model. In Wilson M (ed) Objective measurement: Theory into practice, Volume 2 (pp. 129–144). Norwood, NJ: Ablex

Liu H (2008) Dependency distance as a metric of language comprehension difficulty. J. Cognitive Sci. 9:159–191

Liu H, Xu C, Liang J (2017) Dependency distance: A new perspective on syntactic patterns in natural languages. Phys. Life Rev. 21. https://doi.org/10.1016/j.plrev.2017.03.002

Loukina A, Madnani N, Cahill A, et al. (2020) Using PRMSE to evaluate automated scoring systems in the presence of label noise. Proceedings of the Fifteenth Workshop on Innovative Use of NLP for Building Educational Applications, Seattle, WA, USA → Online, 10 July, pp. 18–29. Association for Computational Linguistics

Lu X (2010) Automatic analysis of syntactic complexity in second language writing. Int. J. Corpus Linguist. 15:474–496

Lu X (2012) The relationship of lexical richness to the quality of ESL learners’ oral narratives. Mod. Lang. J. 96:190–208

Lu X (2017) Automated measurement of syntactic complexity in corpus-based L2 writing research and implications for writing assessment. Lang. Test. 34:493–511

Lu X, Hu R (2022) Sense-aware lexical sophistication indices and their relationship to second language writing quality. Behav. Res. Method. 54:1444–1460. https://doi.org/10.3758/s13428-021-01675-6

Ministry of Health, Labor, and Welfare of Japan (2022) Retrieved from: https://www.mhlw.go.jp/stf/newpage_30367.html

Mizumoto A, Eguchi M (2023) Exploring the potential of using an AI language model for automated essay scoring. Res. Methods Appl. Linguist. 3:100050

Okgetheng B, Takeuchi K (2024) Estimating Japanese Essay Grading Scores with Large Language Models. Proceedings of 30th Annual Conference of the Language Processing Society in Japan, March 2024

Ortega L (2015) Second language learning explained? SLA across 10 contemporary theories. In VanPatten B, Williams J (ed) Theories in Second Language Acquisition: An Introduction

Rae JW, Borgeaud S, Cai T, et al. (2021) Scaling Language Models: Methods, Analysis & Insights from Training Gopher. ArXiv, abs/2112.11446

Read J (2000) Assessing vocabulary. Cambridge University Press. https://doi.org/10.1017/CBO9780511732942

Rudner LM, Liang T (2002) Automated Essay Scoring Using Bayes’ Theorem. J. Technol., Learning and Assessment, 1 (2)

Sakoda K, Hosoi Y (2020) Accuracy and complexity of Japanese Language usage by SLA learners in different learning environments based on the analysis of I-JAS, a learners’ corpus of Japanese as L2. Math. Linguist. 32(7):403–418. https://doi.org/10.24701/mathling.32.7_403

Suzuki N (1999) Summary of survey results regarding comprehensive essay questions. Final report of “Joint Research on Comprehensive Examinations for the Aim of Evaluating Applicability to Each Specialized Field of Universities” for 1996-2000 [shōronbun sōgō mondai ni kansuru chōsa kekka no gaiyō. Heisei 8 - Heisei 12-nendo daigaku no kaku senmon bun’ya e no tekisei no hyōka o mokuteki to suru sōgō shiken no arikata ni kansuru kyōdō kenkyū’ saishū hōkoku-sho]. University Entrance Examination Section Center Research and Development Department [Daigaku nyūshi sentā kenkyū kaihatsubu], 21–32

Taghipour K, Ng HT (2016) A neural approach to automated essay scoring. Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing, Austin, Texas, 1–5 November, pp. 1882–1891. Association for Computational Linguistics

Takeuchi K, Ohno M, Motojin K, Taguchi M, Inada Y, Iizuka M, Abo T, Ueda H (2021) Development of essay scoring methods based on reference texts with construction of research-available Japanese essay data. In IPSJ J 62(9):1586–1604

Ure J (1971) Lexical density: A computational technique and some findings. In Coultard M (ed) Talking about Text. English Language Research, University of Birmingham, Birmingham, England

Vaswani A, Shazeer N, Parmar N, et al. (2017) Attention is all you need. In Advances in Neural Information Processing Systems, Long Beach, CA, 4–7 December, pp. 5998–6008, Curran Associates, Inc., Red Hook, NY

Watanabe H, Taira Y, Inoue Y (1988) Analysis of essay evaluation data [Shōronbun hyōka dēta no kaiseki]. Bulletin of the Faculty of Education, University of Tokyo [Tōkyōdaigaku kyōiku gakubu kiyō], Vol. 28, 143–164

Yao S, Yu D, Zhao J, et al. (2023) Tree of thoughts: Deliberate problem solving with large language models. Advances in Neural Information Processing Systems, 36

Zenker F, Kyle K (2021) Investigating minimum text lengths for lexical diversity indices. Assess. Writ. 47:100505. https://doi.org/10.1016/j.asw.2020.100505

Zhang Y, Warstadt A, Li X, et al. (2021) When do you need billions of words of pretraining data? Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing, Online, pp. 1112-1125. Association for Computational Linguistics. https://doi.org/10.18653/v1/2021.acl-long.90


This research was funded by National Foundation of Social Sciences (22BYY186) to Wenchao Li.

Author information

Authors and Affiliations

Department of Japanese Studies, Zhejiang University, Hangzhou, China

Department of Linguistics and Applied Linguistics, Zhejiang University, Hangzhou, China


Contributions

Wenchao Li is in charge of conceptualization, validation, formal analysis, investigation, data curation, visualization and writing the draft. Haitao Liu is in charge of supervision.

Corresponding author

Correspondence to Wenchao Li .

Ethics declarations

Competing interests.

The authors declare no competing interests.

Ethical approval

Ethical approval was not required as the study did not involve human participants.

Informed consent

This article does not contain any studies with human participants performed by any of the authors.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary information

Supplemental material file #1

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/ .


About this article

Cite this article.

Li, W., Liu, H. Applying large language models for automated essay scoring for non-native Japanese. Humanit Soc Sci Commun 11 , 723 (2024). https://doi.org/10.1057/s41599-024-03209-9

Received : 02 February 2024

Accepted : 16 May 2024

Published : 03 June 2024

DOI : https://doi.org/10.1057/s41599-024-03209-9

