Most Read in Linguistics

From practical applications to the latest academic scholarship, Oxford’s range of linguistics research has unparalleled breadth and authority. Explore a collection of our most read articles and chapters from our linguistics portfolio, available to read for free online until December 2022.

  • Copyright © 2024 Oxford University Press

Linguistics Vanguard

A multimodal journal for the language sciences.

  • Online ISSN: 2199-174X
  • Type: Journal
  • Language: English
  • Publisher: De Gruyter Mouton
  • First published: June 20, 2015
  • Publication Frequency: 1 Issue per Year
  • Audience: researchers and students

Journal of Psycholinguistic Research

  • Rafael Art Javier

Latest articles

Norms for 718 Persian words in emotional dimensions, animacy, and familiarity.

  • Firouzeh Mahjoubnavaz
  • Setareh Mokhtari
  • Reza Khosrowabadi

Recognition of Emotional Prosody in Mandarin-Speaking Children: Effects of Age, Noise, and Working Memory

  • Xiaoxiang Chen

Negative Pragmatic Transfer in Bilinguals: Cross-Linguistic Influence in the Acquisition of Quantifiers

  • Greta Mazzaggio
  • Penka Stateva

Cognitive Fluency in L2: The Effect of Automatic and Controlled Lexical Processing on Speech Rate

  • Sanna Olkkonen
  • Patrick Snellings
  • Pekka Lintunen

Foreign Language Learners’ Uncertainty Experiences and Uncertainty Management

  • Aysun Dağtaş
  • Şehnaz Şahinkarakaş

Journal information

  • Current Contents/Social & Behavioral Sciences
  • Google Scholar
  • Japanese Science and Technology Agency (JST)
  • MLA International Bibliography
  • Norwegian Register for Scientific Journals and Series
  • OCLC WorldCat Discovery Service
  • Social Science Citation Index
  • TD Net Discovery Service
  • UGC-CARE List (India)

© Springer Science+Business Media, LLC, part of Springer Nature

Linguistics and Language: A Research Guide: Journal Articles & Dissertations

Finding Journal Articles and Dissertations

Primary online indexes and databases for linguistics.

  • LLBA (Linguistics and Language Behavior Abstracts): Covers all aspects of the study of language, including phonetics, phonology, morphology, syntax, and semantics. Documents indexed include journal articles, book reviews, books, book chapters, dissertations, and working papers.
  • Linguistic Bibliography Online: "Contains over 440,000 detailed bibliographical descriptions of linguistic publications on general and language-specific theoretical linguistics. While the bibliography aims to cover all languages of the world, particular attention is given to the inclusion of publications on endangered and lesser-studied languages. Publications in any language are collected, analyzed and annotated (using a state-of-the-art system of subject and language keywords) by an international team of linguists and bibliographers from all over the world." [Introduction]
  • MLA International Bibliography: An international index and database providing references to scholarly articles from over 4,000 journals in literature, folklore, literary theory, semiotics, and linguistics.
  • ProQuest Dissertations and Theses Global: Covers 1731 to the present. Many dissertations are available full-text online; others (citation-only records) are available through interlibrary loan.

Print Indexes for Linguistics

  • Bibliographie linguistischer Literatur (BLL). Call number: Uris Library Stacks Oversize Z 7003 .B58 ++. Cancelled after 2018. BLL covers articles in periodicals and essays in collective works, including conference and congress proceedings and festschriften. The number of periodicals it covers has gradually increased from 123 in Band 1 to about 770 titles in Band 12 (1986). Coverage is international in scope with a one- or two-year time lag. Besides a division on general linguistics it also includes divisions on English, German, and Romance linguistics. Each of these divisions is subdivided into a form section, a systematic section, and a language section (the latter missing of course in the general linguistics division). The systematic section of each division contains all the entries for that division classified under appropriate subject categories. These entries may also qualify for listing again in the form and/or language sections. This whole classification scheme is fully explained in the introduction which, beginning with Band 7, appears in both German and English, as do the headings. A cumulative author index and subject and name index complete each annual volume. This index and the Bibliographie Linguistique/Linguistic Bibliography are international in scope. The BLL, however, is more current and has the advantage of a subject index. On the other hand it does not begin to cover the variety of languages that the Bibliographie Linguistique does. (De Miller)
  • Bibliographie linguistique. Call number: Uris Library Stacks Z 7001 .P451. Library has 1984–2004. See also the online edition, Linguistic Bibliography Online, listed above.

Related Indexes and Databases

  • Language Teaching [journal]: Available online via Cambridge Core. The Cornell Library also has the print volumes from 1982–2006 (Library Annex Oversize PB 1 .L28). "... offers critical survey articles of recent research on specific topics, second and foreign languages and countries, and invites original research articles reporting on replication studies and meta-analyses. The journal also includes regional surveys of outstanding doctoral dissertations, topic-based research timelines, theme-based research agendas, recent plenary conference speeches, and research-in-progress reports." [Publisher]
  • PsycINFO: Access to the international literature in psychology and related behavioral and social sciences, including psychiatry, sociology, anthropology, education, pharmacology, and linguistics.
  • Sociology Source Ultimate: Offers coverage from all sub-disciplines of sociology.
  • Web of Science: Indexes journal articles in the sciences, social sciences, and humanities. It is also a citation database that allows cited-reference searching to identify articles that have cited a particular article or author.

Online Repositories

Increasingly, scholars are submitting their papers to open access archives. These digital repositories capture, store, index, preserve, and redistribute digital research material. Many materials archived in digital repositories are searchable by search engines such as Google, as opposed to being sequestered in proprietary databases such as JSTOR or ProQuest.

  • LingBuzz: An openly accessible repository of scholarly papers, discussions, and other documents for linguistics.
  • semanticsarchive.net: "For exchanging papers of interest to natural language semanticists and philosophers of language." Maintained by the Linguistic Society of America.
  • ROA (Rutgers Optimality Archive): "The Rutgers Optimality Archive is a distribution point for research in Optimality Theory and its conceptual affiliates." [Home page]
  • Last Updated: Jun 13, 2024 1:50 PM
  • URL: https://guides.library.cornell.edu/linguistics

Journal: Frontiers in Psychology

Trends and hot topics in linguistics studies from 2011 to 2021: A bibliometric analysis of highly cited papers

Associated data.

The datasets presented in this study can be found in online repositories. The names of the repository/repositories and accession number(s) can be found in the article/Supplementary Material.

High citation counts most often characterize quality research that reflects the foci of a discipline. This study aims to spotlight the most recent hot topics and emerging trends in the highly cited papers (HCPs) in the Web of Science categories of linguistics and language & linguistics through bibliometric analysis. The bibliometric information of the 143 HCPs identified by Essential Science Indicators was retrieved and used to identify and analyze influential contributors at the levels of journals, authors, and countries. The most frequently explored topics were identified through corpus analysis and manual checking. The retrieved topics can be grouped into five general categories: multilingual-related, language teaching and learning-related, psycho/pathological/cognitive linguistics-related, methods and tools-related, and others. Topics such as bi/multilingual(ism), translanguaging, language/writing development, models, emotions, foreign language enjoyment (FLE), cognition, and anxiety are among the most frequently explored. Multilingual and positive trends are discerned from the investigated HCPs. The findings inform linguistics researchers of the publication characteristics of HCPs in the field and help them identify the research trends and directions on which to focus future studies.

1. Introduction

Citations, as a rule, exhibit a skewed distribution across academic publications: a few papers accumulate an overwhelmingly large number of citations, while the majority are rarely, if ever, cited. Correspondingly, highly cited papers (HCPs) receive the greatest amount of attention in academia, as citations are commonly regarded as a strong indicator of research excellence. For academic professionals, following HCPs is an efficient way to stay current with developments in a field and to make better-informed decisions about potential research topics and directions. Academic institutions, government and private agencies, and science policymakers in general closely monitor this visible indicator to make more informed decisions on research funding allocation and science policy formulation. Against the backdrop of ever-growing academic output, attention has noticeably shifted from publication quantity to publication quality, and many countries are developing research policies to identify “excellent” universities, research groups, and researchers (Danell, 2011). In short, HCPs showcase high-quality research, encompass significant themes, and constitute a critical reference point in a research field; they are the “gold bullion of science” (Smith, 2007).

2. Literature review

Bibliometrics, a term coined by Pritchard (1969), refers to the application of mathematical methods to the analysis of academic publications. Essentially, it is a quantitative method for depicting publication patterns within a given field on the basis of a body of literature. There are many bibliometric studies on the natural and social sciences in general (Hsu and Ho, 2014; Zhu and Lei, 2022) and on specific disciplines such as management sciences (Liao et al., 2018), biomass research (Chen and Ho, 2015), computer sciences (Xie and Willett, 2013), and sport sciences (Mancebo et al., 2013; Ríos et al., 2013). In these studies, researchers used bibliometric methods to track developments, weigh research impact, and highlight emerging scientific fronts. In linguistics, bibliometric studies have all appeared within the past few years (van Doorslaer and Gambier, 2015; Lei and Liao, 2017; Gong et al., 2018; Lei and Liu, 2018, 2019). Most of them examined a sub-area of linguistics, such as corpus linguistics (Liao and Lei, 2017), translation studies (van Doorslaer and Gambier, 2015), and the teaching of Chinese as a second/foreign language (Gong et al., 2018), or individual journals such as System (Lei and Liu, 2018) and Porta Linguarum (Sabiote and Rodríguez, 2015). Although Lei and Liu (2019) investigated the discipline of linguistics as a whole, their research focused exclusively on applied linguistics and was restricted to a limited number of journals (42 in total), leaving publications in other linguistic subdisciplines and other qualified journals unexamined.

In recent years, a number of studies have examined “excellent” papers or HCPs. For example, Small (2004) surveyed the authors of HCPs on why they thought their papers were highly cited. Strong interest, novelty, utility, and the high importance of the work were among the most frequently mentioned reasons. Most authors also considered their HCPs to be based on the most important work of their academic careers. Aksnes (2003) investigated the characteristics of HCPs and found that they were generally authored by a large number of scientists, often involving international collaboration. Some researchers have even attempted to predict HCPs by building mathematical models, pointing to “the first mover advantage in scientific publication” (Newman, 2008, 2014): papers published earlier in a field are generally more likely to accumulate citations than those published later. Although these studies addressed HCPs from different perspectives, they share the belief that HCPs differ markedly from less-cited or uncited papers and thus deserve close attention in academic research (Aksnes, 2003; Blessinger and Hrycaj, 2010; Yan et al., 2022).

Although an increased focus on research quality can be observed across fields, opinions diverge on the scope of, and the inclusion criteria for, excellent papers. Are they ‘highly cited’, ‘top cited’, or ‘most frequently cited’ papers? Aksnes (2003) noted two approaches to defining a highly cited article, based on absolute and relative thresholds, respectively. An absolute threshold stipulates a minimum number of citations for identifying excellent papers, while a relative threshold uses percentile rank classes, for example, the top 10% most highly cited papers in a discipline, a publication year, or a publication set. Citation rates differ significantly across fields and disciplines: an HCP in the natural sciences generally accumulates more citations than its counterpart in the social sciences. It is therefore necessary to investigate HCPs from different fields separately or to adopt field-specific inclusion criteria to ensure valid comparisons.
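The two kinds of threshold described by Aksnes (2003) can be contrasted in a few lines. The sketch below is only illustrative. The function names, the citation counts, and the tie rule (all papers tied at the cut-off are kept) are assumptions and do not come from any cited study.

```python
def absolute_hcps(citations: list[int], min_citations: int) -> list[int]:
    """Absolute threshold: keep every paper with at least `min_citations` citations."""
    return [c for c in citations if c >= min_citations]


def relative_threshold(citations: list[int], top_percent: int) -> int:
    """Relative threshold: rank papers by citations (descending) and return the
    citation count at the cut-off rank for the top `top_percent` percent.
    The cut-off rank is rounded up so that at least one paper qualifies."""
    if not citations:
        raise ValueError("need at least one paper")
    ranked = sorted(citations, reverse=True)
    cutoff_rank = max(1, (len(ranked) * top_percent + 99) // 100)  # integer ceiling
    return ranked[cutoff_rank - 1]


def relative_hcps(citations: list[int], top_percent: int) -> list[int]:
    """Keep papers at or above the relative threshold (ties at the cut-off are kept)."""
    threshold = relative_threshold(citations, top_percent)
    return [c for c in citations if c >= threshold]


# Hypothetical citation counts for ten papers in one field and year.
cites = [120, 45, 30, 12, 9, 7, 5, 3, 1, 0]
print(absolute_hcps(cites, 25))   # [120, 45, 30]
print(relative_hcps(cites, 10))   # [120]
print(relative_hcps(cites, 20))   # [120, 45]
```

The relative cut-off is computed separately within each set of papers. The same rule therefore produces different raw citation minimums in different fields or years, and this is why it supports cross-field comparison better than a fixed count does.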

The present study was motivated by two considerations. First, the sheer number of publications of varying quality in a scientific field makes it difficult, or even impossible, to conduct a reliable and effective literature review. Focusing on high-quality publications, HCPs in particular, may lend more credibility to findings on trends. Second, HCPs can serve as a valuable platform for discovering information important to the development of a discipline and for understanding the past, present, and future of its scientific structure. Therefore, the present study investigates the hot topics and publication trends in the Web of Science categories of linguistics and language & linguistics (hereafter, linguistics) with bibliometric methods. It addresses the following three questions:

  • Who are the most productive and impactful contributors of HCPs in the WoS categories of linguistics and language & linguistics in terms of publication venues, authors, and countries?
  • What are the most frequently explored topics in HCPs?
  • What general research trends do the HCPs reveal?

3. Materials and methods

Unlike previous studies that used arbitrary inclusion thresholds (e.g., Blessinger and Hrycaj, 2010; Hsu and Ho, 2014), we rely on Essential Science Indicators (ESI) to identify HCPs. ESI was developed by Clarivate, a leading company in bibliometrics and scientometrics. Drawing on the complete WoS databases, it reveals emerging science trends as well as influential individuals, institutions, papers, journals, and countries across fields of inquiry. We chose ESI for three reasons. First, ESI adopts a strict inclusion criterion for HCP identification: a paper is selected as an HCP only when its citation count reaches the top 1% threshold for its research field (one of the 22 ESI subject categories) and publication year. Second, ESI is widely used and recognized for its reliability and authority in identifying top-charting work, generating “excellent” metrics including hot and highly cited papers. Third, ESI automatically updates its database to include the most recent HCPs, which makes it especially suitable for trend studies over a specified timeframe.

3.1. Data source

Data retrieval was completed through our university library portal on June 20, 2022. The retrieval strategy is described in Table 1. Bibliometric indicators for the important contributors at the journal, author, and country levels were obtained as follows: after the search was completed, we clicked the “Analyze Results” bar on the results page for a detailed descriptive analysis of the retrieved bibliometric data.

Table 1. Retrieval strategies (Clarivate Analytics Web of Science Core Collection).

  • Index: Social Science Citation Index (SSCI) and Arts & Humanities Citation Index (A&HCI)
  • Web of Science categories = linguistics or language & linguistics
  • Refined by: Highly Cited Papers

Several points about the search strategy should be noted. First, we searched two sub-databases of the WoS Core Collection: the Social Science Citation Index (SSCI) and the Arts & Humanities Citation Index (A&HCI). There was no need to include the Science Citation Index Expanded (SCI-EXPANDED), because publications in linguistics are almost exclusively indexed in SSCI and A&HCI journals. The WoS Core Collection was chosen as the data source because it is one of the most comprehensive and authoritative sources of bibliometric information in the world, and many previous studies have used it to retrieve bibliometric data. van Oorschot et al. (2018) and Ruggeri et al. (2019) even indicated that WoS meets the highest standards in terms of impact factor and citation counts and hence guarantees the validity of bibliometric analyses. Second, we did not restrict document types, because ESI considers only articles and reviews when selecting HCPs. Third, we did not set a date range, because the ESI HCP dataset is updated regularly to cover the most recent 10 years of publications.

The query returned a total of 143 HCPs published in 48 journals by 352 authors from 226 institutions. We then downloaded the raw bibliometric records of the 143 HCPs for follow-up analysis, including publication years, authors, publication titles, countries, affiliations, abstracts, and citation reports. A complete list of the 143 HCPs can be found in the Supplementary Material. We collected the most recent impact factor (IF) of each journal from the 2022 Journal Citation Reports (JCR).

3.2. Data analysis

3.2.1. Citation analysis

A citation threshold is the minimum number of citations a paper needs to fall within the top fraction or percentage of papers when the papers in a research field are ranked in descending order of citation counts. In ESI, the highly cited threshold is the minimum number of citations received by the top 1% of papers in each of the 10 database years. In other words, to enter the HCP list a paper must meet a minimum citation threshold that varies by research field and by year. Of the 22 research fields in ESI, Social Sciences, General is a broad field covering a number of WoS categories, including linguistics and language & linguistics. We checked the official ESI website (https://esi.clarivate.com/ThresholdsAction.action) to obtain the yearly highly cited thresholds for Social Sciences, General, as shown in Figure 1. As the figure shows, the longer a paper has been published, the more citations it must receive to meet the threshold. We then divided the raw citation count of each HCP by the highly cited threshold for its publication year to obtain its normalized citations.
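In the normalization step, each paper's raw citation count is divided by the threshold for its publication year. The minimal sketch below assumes the yearly thresholds have already been read from the ESI thresholds page. The `Paper` type, the titles, and all numbers are hypothetical placeholders, not the study's data.

```python
from dataclasses import dataclass


@dataclass
class Paper:
    title: str
    year: int
    citations: int


def normalize_citations(papers: list[Paper], thresholds: dict[int, int]) -> dict[str, float]:
    """Divide each paper's raw citations by the highly cited threshold of its
    publication year. A ratio >= 1.0 means the paper meets that year's threshold."""
    return {p.title: p.citations / thresholds[p.year] for p in papers}


# Hypothetical yearly thresholds: older papers face higher thresholds.
thresholds = {2012: 200, 2020: 40}
papers = [
    Paper("Older paper", 2012, 300),
    Paper("Recent paper", 2020, 60),
    Paper("Borderline paper", 2020, 44),
]
for title, score in normalize_citations(papers, thresholds).items():
    print(f"{title}: {score:.2f}")
```

In this toy example the 2012 and 2020 papers both score 1.50, even though their raw citation counts differ by a factor of five. Normalizing by year is meant to produce exactly this kind of comparability.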

Figure 1. Highly cited thresholds in the research field of Social Sciences, General.
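The threshold definition and the normalization step described above can be sketched in Python. This is a minimal illustration, not ESI's actual procedure: the top-1% cut-off uses a simple rank-based rule (how ESI handles ties and partial years is not described here), and the threshold values in the usage example are hypothetical placeholders, not the official ESI figures plotted in Figure 1.

```python
def top_percent_threshold(citations, percent=1):
    """Minimum citation count among the top `percent`% of papers.

    Papers are ranked by citations in descending order; the cut-off is the
    citation count of the last paper inside the top slice (at least one paper).
    """
    ranked = sorted(citations, reverse=True)
    if not ranked:
        raise ValueError("need at least one paper")
    # Integer ceiling of len * percent / 100, avoiding floating-point rounding.
    k = max(1, (len(ranked) * percent + 99) // 100)
    return ranked[k - 1]


def normalized_citations(raw_citations, pub_year, thresholds):
    """Divide a paper's raw citations by the highly cited threshold of its year."""
    return raw_citations / thresholds[pub_year]


if __name__ == "__main__":
    # Hypothetical field of 200 papers cited 0..199 times: the top 1% is 2 papers.
    print(top_percent_threshold(range(200)))  # 198
    # Hypothetical placeholder thresholds, NOT the ESI values in Figure 1.
    thresholds = {2013: 100.0, 2021: 10.0}
    print(normalized_citations(250, 2013, thresholds))  # 2.5
    print(normalized_citations(40, 2021, thresholds))   # 4.0
```

Because thresholds are lower for recent years, dividing by the year-specific threshold puts older and newer papers on a comparable scale: in this toy example the 2021 paper, with far fewer raw citations, ends up with the higher normalized score.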

3.2.2. Corpus analysis and manual checking

To determine the most frequently explored topics in these HCPs, we combined a corpus-based analysis of word frequency with manual checking. The underlying assumption is that the more frequently a word or phrase occurs in a purpose-built corpus, the more likely it is to constitute a research topic. We built an Abstract corpus from the abstracts of the 143 HCPs, totaling 24,800 tokens. Research topics were retrieved from the Abstract corpus as follows. First, the 143 abstracts were saved as separate .txt files in one folder. Second, AntConc (Anthony, 2022), a corpus analysis tool for concordancing and text analysis, was used to extract lists of 2- to 4-grams in decreasing order of frequency. We also generated a list of individual nouns, because single nouns can sometimes constitute research topics. Given the small size of the corpus, we adopted both a frequency criterion (3) and a range criterion (3) for topic candidacy; that is, a candidate item had to occur at least 3 times and in at least 3 different abstract files. The frequency threshold ensures that the candidate topics are important, while the range threshold ensures that they are not concentrated in only a few publications. We tested the frequency and range thresholds over several rounds to make sure that all potential topics were included. In total, we obtained 531 nouns, 1,330 2-grams, 331 3-grams, and 81 4-grams. Third, because most of the retrieved n-grams cannot function as meaningful research topics, we manually checked all the candidate items and discussed their status as potential research topics until full agreement was reached. Finally, we read all the abstracts of the 143 HCPs to further validate these items as research topics. This process yielded 118 topic items in total.
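The frequency and range criteria can be approximated with a short Python sketch. It is not a reimplementation of AntConc: the regular-expression tokenizer is an assumption and will not match AntConc's token definition exactly, and the separate noun list (which needs part-of-speech information) is not reproduced. Frequency counts every occurrence of an n-gram, while range counts the number of distinct abstracts that contain it.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase word tokens (a rough stand-in for AntConc's tokenizer)."""
    return re.findall(r"[a-z]+(?:[-'][a-z]+)*", text.lower())


def ngrams(tokens, n):
    """All contiguous n-token sequences, joined with spaces."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def candidate_ngrams(abstracts, n_values=(2, 3, 4), min_freq=3, min_range=3):
    """Return (ngram, frequency, range) triples that meet both thresholds,
    sorted by descending frequency and then alphabetically."""
    freq, rng = Counter(), Counter()
    for text in abstracts:  # one string per abstract file
        tokens = tokenize(text)
        grams = [g for n in n_values for g in ngrams(tokens, n)]
        freq.update(grams)       # every occurrence counts toward frequency
        rng.update(set(grams))   # each abstract counts at most once toward range
    kept = [(g, freq[g], rng[g]) for g in freq
            if freq[g] >= min_freq and rng[g] >= min_range]
    return sorted(kept, key=lambda item: (-item[1], item[0]))


if __name__ == "__main__":
    toy_abstracts = [  # hypothetical abstracts
        "Foreign language enjoyment predicts achievement.",
        "Foreign language enjoyment and anxiety were measured.",
        "We examined foreign language enjoyment.",
        "Working memory and working memory capacity: working memory.",
    ]
    for gram, f, r in candidate_ngrams(toy_abstracts):
        print(gram, f, r)
```

In the toy data, "working memory" passes the frequency threshold (3 occurrences) but fails the range threshold (1 abstract), which is exactly the kind of item the range criterion is meant to exclude.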

4.1. Main publication venues of HCPs

Of the 48 journals that published the 143 HCPs, 17 contributed at least 3 HCPs each (Table 2), together accounting for 71.33% of the examined HCPs (102/143). This indicates that HCPs are highly concentrated in a limited number of journals. The three largest outlets of HCPs are Bilingualism Language and Cognition (16), International Journal of Bilingual Education and Bilingualism (11), and Modern Language Journal (10). Because journals vary greatly in the number of papers they publish per year, and the number of HCPs a journal produces is associated with its publication volume, we divided the number of HCPs by the total number of papers (TP) published in the examined years (2011–2021) to obtain each journal's HCP percentage (N/TP %). The three journals with the highest N/TP percentages are Annual Review of Applied Linguistics (2.26%), Modern Language Journal (2.08%), and Bilingualism Language and Cognition (1.74%), indicating that papers published in these journals have a higher probability of entering the HCP list.

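Table 2. Top 17 publication venues of HCPs.

| Publication title | N | N% | TP | N/TP % (R) | TC | TC/HCP (R) | IF |
|---|---|---|---|---|---|---|---|
| Bilingualism Language and Cognition | 16 | 11.19 | 918 | 1.74 (3) | 1,699 | 106.19 (14) | 4.763 |
| International Journal of Bilingual Education and Bilingualism | 11 | 7.70 | 829 | 1.33 (6) | 349 | 31.7 (17) | 3.165 |
| Modern Language Journal | 10 | 7.00 | 480 | 2.08 (2) | 1,353 | 135.3 (12) | 7.5 |
| Journal of Memory and Language | 7 | 4.90 | 730 | 0.96 (10) | 5,865 | 837.86 (1) | 4.521 |
|  | 7 | 4.90 | 1,472 | 0.48 (15) | 533 | 76.14 (15) | 4.518 |
|  | 6 | 4.20 | 1,040 | 0.58 (13) | 1,161 | 193.50 (9) | 4.018 |
|  | 6 | 4.20 | 627 | 0.96 (10) | 1,186 | 197.67 (8) | 4.155 |
| Language Learning | 6 | 4.20 | 509 | 1.18 (7) | 975 | 162.50 (11) | 5.24 |
|  | 4 | 2.80 | 281 | 1.42 (5) | 538 | 134.50 (13) | 3.063 |
| Computational Linguistics | 4 | 2.80 | 354 | 1.13 (8) | 2,135 | 533.75 (2) | 7.778 |
| Journal of Pragmatics | 4 | 2.80 | 2,122 | 0.19 (17) | 1,215 | 303.75 (3) | 1.86 |
|  | 4 | 2.80 | 371 | 1.08 (9) | 859 | 214.75 (6) | 4.769 |
|  | 4 | 2.80 | 681 | 0.59 (12) | 213 | 53.25 (16) | 3.401 |
|  | 4 | 2.80 | 244 | 1.64 (4) | 1,137 | 284.25 (4) | 4.158 |
| Annual Review of Applied Linguistics | 3 | 2.10 | 133 | 2.26 (1) | 755 | 251.67 (5) | 3.87 |
| Computer Assisted Language Learning | 3 | 2.10 | 588 | 0.51 (14) | 644 | 214.67 (7) | 5.964 |
|  | 3 | 2.10 | 813 | 0.37 (16) | 549 | 183.00 (10) | 2.842 |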

N: the number of HCPs in each journal; N%: the percentage of each journal's HCPs in the total of 143 HCPs; TP: the total number of papers published in each journal in the examined timespan (2011–2021); N/TP %: the percentage of HCPs in each journal's total publications in the examined timespan; TC: total citations of each journal's HCPs; TC/HCP: average citations per HCP; R: journal ranking on the designated indicator; IF: impact factor in the 2022 JCR.
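The derived columns of Table 2 follow directly from N, TP, and TC. The Python sketch below recomputes N/TP % and TC/HCP and assigns competition-style ranks (tied values share a rank, as with the two journals at 0.96 (10) in Table 2). It uses the values copied from Table 2 for the four journals that the text identifies by name; the class and function names are illustrative and not part of the original study.

```python
from dataclasses import dataclass


@dataclass
class Journal:
    name: str
    n_hcps: int        # N: number of HCPs
    total_papers: int  # TP: papers published 2011-2021
    total_cites: int   # TC: total citations of the journal's HCPs


def derived_metrics(j):
    """N/TP % and TC/HCP, rounded to two decimals as in Table 2."""
    return {
        "N/TP%": round(100 * j.n_hcps / j.total_papers, 2),
        "TC/HCP": round(j.total_cites / j.n_hcps, 2),
    }


def competition_ranks(journals, key):
    """Rank 1 = highest value; tied values share the same rank."""
    values = {j.name: derived_metrics(j)[key] for j in journals}
    return {name: 1 + sum(other > v for other in values.values())
            for name, v in values.items()}


# Values copied from Table 2 for journals named in the text.
JOURNALS = [
    Journal("Bilingualism Language and Cognition", 16, 918, 1699),
    Journal("Modern Language Journal", 10, 480, 1353),
    Journal("Journal of Memory and Language", 7, 730, 5865),
    Journal("Annual Review of Applied Linguistics", 3, 133, 755),
]

if __name__ == "__main__":
    for j in JOURNALS:
        print(j.name, derived_metrics(j))
    print(competition_ranks(JOURNALS, "N/TP%"))
```

The ranks computed over this four-journal subset naturally differ from the ranks over all 17 journals (Journal of Memory and Language ranks 4th here on N/TP % but 10th in Table 2).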

To gauge the general impact of each journal's HCPs, we divided their total citations (TC) by the number of HCPs to obtain the average citations per HCP (TC/HCP). The three journals with the highest TC/HCP are Journal of Memory and Language (837.86), Computational Linguistics (533.75), and Journal of Pragmatics (303.75). This indicates that even within the same WoS categories, HCPs in different journals differ strikingly in their capacity to accumulate citations. For example, the TC/HCP of International Journal of Bilingual Education and Bilingualism (11 HCPs) is as low as 31.73, less than 4% of the highest TC/HCP, that of Journal of Memory and Language.

Regarding the latest journal impact factors (2022 JCR), the four journals with the highest IF are Computational Linguistics (7.778), Modern Language Journal (7.5), Computer Assisted Language Learning (5.964), and Language Learning (5.24). According to the JCR quartile rankings in the WoS category of linguistics, all the journals on the list belong to Q1 (the top 25%), suggesting that researchers are more inclined to publish in, and cite papers from, these prestigious high-impact journals.

4.2. Authors of HCPs

A total of 352 authors are listed in the 143 HCPs, of whom 33 appear in at least 2 HCPs (Table 3). Table 3 also provides other indicators of the authors' productivity and impact, including the total number of citations (TC), the number of citations per HCP (C/HCP), and the number of HCPs in which the author is the first or corresponding author (FA/CA). We include the FA/CA indicator because first authors and corresponding authors are usually considered to contribute the most and to deserve a greater share of credit in academic publications (Marui et al., 2004; Dance, 2012).

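Table 3. Authors with at least 2 HCPs.

| Author | Affiliation | N | FA/CA | TC | C/HCP |
|---|---|---|---|---|---|
| Dewaele JM | Birkbeck Univ London | 7 | 2 | 492 | 70.3 |
| Li C | Huazhong Univ Sci & Technol | 5 | 5 | 215 | 43 |
| Saito K | UCL | 5 | 2 | 576 | 115.2 |
|  | CUNY | 3 | 1 | 543 | 181 |
|  | Cape Breton Univ | 3 | 2 | 292 | 97.33 |
| Mondada L | Univ Basel | 3 | 3 | 392 | 130.7 |
| Norton B | Univ British Columbia | 3 | 1 | 915 | 305 |
|  | CUNY | 3 | 2 | 543 | 181 |
|  | No Arizona Univ | 3 | 1 | 676 | 225.3 |
|  | Univ Michigan | 2 | 1 | 375 | 187.5 |
|  | Univ Auckland | 2 | 0 | 98 | 49 |
| Li W | UCL | 2 | 2 | 956 | 478 |
|  | York Univ | 2 | 2 | 241 | 120.5 |
|  | Karl Franzens Univ Graz | 2 | 0 | 204 | 102 |
|  | Georgetown Univ | 2 | 1 | 395 | 197.5 |
| Vasishth S | Univ Potsdam | 2 | 0 | 694 | 347 |
|  | Univ Tubingen | 2 | 1 | 280 | 140 |
|  | Univ Ghent | 2 | 1 | 162 | 81 |
|  | Penn State Univ | 2 | 2 | 537 | 268.5 |
|  | Golestan Univ | 2 | 1 | 77 | 38.5 |
|  | Univ Nottingham | 2 | 1 | 281 | 140.5 |
|  | Univ New South Wales | 2 | 1 | 86 | 43 |
|  | Ningbo Univ | 2 | 2 | 61 | 30.5 |
|  | Amer Univ Sharjah | 2 | 0 | 204 | 102 |
|  | Xiamen Univ | 2 | 2 | 127 | 63.5 |
|  | Univ Potsdam | 2 | 0 | 694 | 347 |
|  | Hong Kong Polytech Univ | 2 | 2 | 148 | 74 |
|  | Univ Technol Sydney | 2 | 2 | 206 | 103 |
|  | Macquarie University | 2 | 2 | 226 | 113 |
|  | Univ Maryland | 2 | 1 | 292 | 146 |
|  | CUNY | 2 | 2 | 475 | 237.5 |
|  | UiT Arctic Univ Norway | 2 | 1 | 146 | 73 |
|  | Univ Nottingham | 2 | 0 | 124 | 62 |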

N: number of HCPs from each author; FA/CA: first author or corresponding author HCPs; TC: total citations of the HCPs from each author; C/HCP: average citations per HCP for each author.

In terms of the number of HCPs, Dewaele JM from Birkbeck Univ London tops the list with 7 HCPs (TC = 492), followed by Li C from Huazhong Univ Sci & Technol (#HCPs = 5; TC = 215) and Saito K from UCL (#HCPs = 5; TC = 576). Notably, both Li C and Saito K collaborate closely with Dewaele JM; for example, 3 of the 5 HCPs by Li C are co-authored with him. Their co-authored HCPs mostly concern foreign language learning emotions, such as boredom, anxiety, and enjoyment, as well as the measurement of these emotions and positive psychology.

In terms of TC, Li W from UCL stands out as the most influential scholar among the listed authors, with 956 citations from 2 HCPs, followed by Norton B from Univ British Columbia (TC = 915) and Vasishth S from Univ Potsdam (TC = 694). Their average citations per HCP are also among the highest of the listed authors (478, 305, and 347, respectively). Li W's 2 HCPs are his groundbreaking works on translanguaging, which have become almost essential reading for anyone engaged in translanguaging research (Li, 2011, 2018). Moreover, Li W is the sole author of both HCPs, which is extremely rare because HCPs are usually the product of multiple researchers. Norton B's HCPs explore core issues in applied linguistics, such as identity and investment, language learning, and social change, and are considered foundational work in the field (Norton and Toohey, 2011; Darvin and Norton, 2015).

In terms of FA/CA papers, Li C from Huazhong Univ Sci & Technol is prominent because she is the first author of all 5 of her HCPs. Her research on language learning emotions in the Chinese context is gaining widespread recognition (Li et al., 2018, 2019, 2021; Li, 2019, 2021). However, because she is a newly emerging researcher, most of her HCPs were published very recently and have therefore accumulated relatively few citations (TC = 215). Mondada L from Univ Basel follows closely as the sole author of her 3 HCPs. Her work is mostly devoted to conversation analysis, multimodality, and social interaction (Mondada, 2016, 2018, 2019).

Several points should be noted regarding the productive authors of HCPs. First, when calculating the number of HCPs for each author, we took into account only papers published in journals indexed in the investigated WoS categories (linguistics; language & linguistics), a compromise intended to preserve the linguistics orientation of the HCPs. For example, Brysbaert M from Ghent University had a total of 8 HCPs at the time of data retrieval, but 6 of them were published in the WoS category of psychology, were more psychologically oriented, and hence were not included in our study. In addition, all the authors of a paper were treated equally when we counted HCPs, regardless of author order. As a consequence, some influential authors may not appear on the list because they have comparatively fewer HCPs. Second, because some authors reported different affiliations at different career stages, we provide only their most recent affiliation for convenience. Third, having one's work selected as an HCP is highly competitive. The fact that most HCP authors do not appear in our list of productive authors does not diminish their great contributions to the field, and the rankings in Table 3 do not necessarily reflect the recognition authors have earned in academia at large.
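The counting rules described above (full counting of every listed author regardless of byline position, plus a separate FA/CA tally) can be sketched in Python. The paper records and the author labels A, B, and C are hypothetical, and sorting the output by N and then by name is an assumption made for illustration; Table 3 is not necessarily ordered this way.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Paper:
    authors: list        # byline order; authors[0] is the first author
    corresponding: set   # corresponding author(s)
    citations: int


def author_tallies(papers):
    """Full counting: each listed author gets 1 HCP and the paper's full citations."""
    stats = defaultdict(lambda: {"N": 0, "FA/CA": 0, "TC": 0})
    for p in papers:
        leads = {p.authors[0]} | set(p.corresponding)  # FA/CA counted once per paper
        for a in set(p.authors):
            s = stats[a]
            s["N"] += 1
            s["TC"] += p.citations
            if a in leads:
                s["FA/CA"] += 1
    for s in stats.values():
        s["C/HCP"] = round(s["TC"] / s["N"], 2)
    return dict(stats)


def productive_authors(stats, min_hcps=2):
    """Authors with at least `min_hcps` HCPs, most productive first."""
    return sorted((a for a, s in stats.items() if s["N"] >= min_hcps),
                  key=lambda a: (-stats[a]["N"], a))


if __name__ == "__main__":
    papers = [  # hypothetical HCPs
        Paper(["A", "B"], {"B"}, 100),
        Paper(["A", "C"], {"A"}, 50),
        Paper(["B"], set(), 30),
    ]
    stats = author_tallies(papers)
    print(productive_authors(stats))  # ['A', 'B']
    print(stats["A"])                 # {'N': 2, 'FA/CA': 2, 'TC': 150, 'C/HCP': 75.0}
```

Under full counting, each co-author receives the paper's full citation count, so summing TC over authors overstates the citations of the paper set; this is one reason the FA/CA column is reported alongside N.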

4.3. Productive countries of HCPs

In total, the 143 HCPs originated from 33 countries. The most productive countries, those contributing at least 3 HCPs, are listed in Table 4. The USA took an overwhelming lead with 59 HCPs, followed distantly by England with 35 HCPs. These two countries also had the highest total citations (TC = 15,770 and 9,840, respectively), reflecting their high productivity and strong influence as traditional powerhouses in linguistics research. In terms of average citations per HCP, Germany, England, and the USA were the top three countries (C/HCP = 281.67, 281.14, and 267.29, respectively). Although China ranked third with 19 HCPs, its C/HCP is the third lowest on the list (66.84). One important reason is that 13 of the 19 HCPs contributed by scholars in China were published in 2020 or 2021, and newly published HCPs may need more time to accumulate citations. Moreover, 18 of the 19 HCPs from China have a first and/or corresponding author based in China, indicating that scholars in China are becoming more independent and gaining a greater voice in English linguistics research.

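Table 4. Top 18 countries with at least 3 HCPs.

| Country | HCPs | HCPs % | TC | C/HCP | FA/CA |
|---|---|---|---|---|---|
| USA | 59 | 41.26 | 15,770 | 267.29 | 53 |
| England | 35 | 24.48 | 9,840 | 281.14 | 26 |
| China | 19 | 13.29 | 1,270 | 66.84 | 18 |
|  | 15 | 10.49 | 3,981 | 265.40 | 13 |
|  | 12 | 8.39 | 1,061 | 88.42 | 10 |
| Germany | 9 | 6.29 | 2,535 | 281.67 | 5 |
|  | 6 | 4.20 | 469 | 78.17 | 5 |
|  | 5 | 3.50 | 216 | 43.20 | 5 |
|  | 4 | 2.80 | 668 | 167.00 | 1 |
|  | 4 | 2.80 | 540 | 135.00 | 0 |
|  | 4 | 2.80 | 549 | 137.25 | 2 |
|  | 4 | 2.80 | 539 | 134.75 | 3 |
|  | 3 | 2.10 | 274 | 91.33 | 3 |
|  | 3 | 2.10 | 521 | 173.67 | 3 |
|  | 3 | 2.10 | 523 | 174.33 | 0 |
|  | 3 | 2.10 | 115 | 38.33 | 1 |
|  | 3 | 2.10 | 393 | 131.00 | 3 |
|  | 3 | 2.10 | 232 | 77.33 | 1 |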

Two points should be noted regarding the productive countries. First, we calculated HCP contributions at the country level rather than the region level; that is, HCP contributions from different regions of the same country were combined. For example, HCPs from Scotland were added to those from England, and HCPs from Hong Kong, Macau, and Taiwan were combined with those from Mainland China. This gives a clear country-level picture of the HCPs. Second, we manually checked the address information of the first and corresponding authors of each HCP. In some cases, the first or corresponding author reported affiliations in more than one country; every country in their address list was then treated equally in the FA/CA calculation. In other words, an HCP may be credited to more than one country because of the different country affiliations of its first and/or corresponding author.
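The two country-level counting rules can be illustrated with a Python sketch. The region-to-country mapping contains only the examples named above (a real analysis would need every region label that appears in the WoS address fields), and the two paper records are hypothetical. Each paper is credited at most once per country, both in the HCP count and in the FA/CA count.

```python
from collections import Counter

# Only the mappings mentioned in the text; any other label is kept unchanged.
REGION_TO_COUNTRY = {
    "Scotland": "England",
    "Hong Kong": "China",
    "Macau": "China",
    "Taiwan": "China",
}


def to_country(region):
    return REGION_TO_COUNTRY.get(region, region)


def country_counts(papers):
    """papers: dicts with 'all_regions' (addresses of all authors) and
    'lead_regions' (addresses of the first and corresponding authors).

    Returns (HCP counts, FA/CA counts) per country.
    """
    hcps, fa_ca = Counter(), Counter()
    for p in papers:
        # A set, so several addresses in one country count once per paper.
        hcps.update({to_country(r) for r in p["all_regions"]})
        # Every country in the FA/CA addresses is credited equally.
        fa_ca.update({to_country(r) for r in p["lead_regions"]})
    return hcps, fa_ca


if __name__ == "__main__":
    papers = [  # hypothetical address data
        {"all_regions": ["USA", "England", "Scotland"], "lead_regions": ["Scotland"]},
        {"all_regions": ["Hong Kong", "China", "USA"], "lead_regions": ["Hong Kong", "USA"]},
    ]
    hcps, fa_ca = country_counts(papers)
    print(hcps)   # USA is counted in both papers
    print(fa_ca)  # paper 2 credits both China and USA
```

Because one paper can be credited to several countries, the country counts in Table 4 sum to more than 143.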

4.4. Top 20 HCPs

Table 5 lists the top 20 HCPs with the highest normalized citations in decreasing order. These top-cited publications can help us better understand the development of the field and its research topics in recent years.

Table 5. Top 20 HCPs.

| # | RC | NC | Authors | Title (Publication Year) | Journal |
|---|---|---|---|---|---|
| 1 | 4,677 | 38.88 | Barr, DJ, et al. | Random effects structure for confirmatory hypothesis testing: keep it maximal (2013) | Journal of Memory and Language |
| 2 | 519 | 20.24 | Lee, JB; Azios, JH | Facilitator Behaviors Leading to Engagement and Disengagement in Aphasia Conversation Groups (2020) | American Journal of Speech-Language Pathology |
| 3 | 583 | 8.57 | Matuschek, H, et al. | Balancing type I error and power in linear mixed models (2017) | Journal of Memory and Language |
| 4 | 1,313 | 8.42 | Taboada, M, et al. | Lexicon-Based methods for sentiment analysis (2011) | Computational Linguistics |
| 5 | 374 | 7.06 | Li, W | Translanguaging as a Practical Theory of Language (2018) | Applied Linguistics |
| 6 | 136 | 5.44 | Alva-Manchego, F, et al. | Data-Driven sentence simplification: survey and benchmark (2020) | Computational Linguistics |
| 7 | 693 | 5.22 | Heritage, J | The epistemic engine: sequence organization and territories of knowledge (2012) | Research on Language and Social Interaction |
| 8 | 46 | 5.11 | Zhang, Q; Yang, T | Reflections on the medium of instruction for ethnic minorities in Xinjiang: the case of bilingual schools in Urumqi (2021) | International Journal of Bilingual Education and Bilingualism |
| 9 | 560 | 5.08 | Plonsky, L; Oswald, FL | How big is big? interpreting effect sizes in L2 research (2014) | Language Learning |
| 10 | 371 | 4.65 | Kuperberg, GR; Jaeger, TF | What do we mean by prediction in language comprehension? (2016) | Language Cognition and Neuroscience |
| 11 | 41 | 4.56 | Greenier, V, et al. | Emotion regulation and psychological well-being in teacher work engagement: a case of British and Iranian English… (2021) | System |
| 12 | 240 | 4.49 | Macaro, E, et al. | A systematic review of English medium instruction in higher education (2018) | Language Teaching |
| 13 | 406 | 4.26 | Otheguy, R, et al. | Clarifying translanguaging and deconstructing named languages: a perspective from linguistics (2015) | Applied Linguistics Review |
| 14 | 107 | 4.24 | Schad, DJ, et al. | How to capitalize on contrasts in linear (mixed) models: a tutorial (2020) | Journal of Memory and Language |
| 15 | 38 | 4.22 | Shirvan, ME; Taherian, T | Longitudinal examination of university students’ foreign language enjoyment and foreign language classroom anxiety… (2021) | International Journal of Bilingual Education and Bilingualism |
| 16 | 101 | 4.04 | MacIntyre, PD, et al. | Language teachers’ coping strategies during the Covid-19 conversion to online… (2020) | System |
| 17 | 320 | 4.03 | Atkinson, D, et al. | A transdisciplinary framework for SLA in a multilingual world (2016) | Modern Language Journal |
| 18 | 36 | 4.00 | Jin, YX; Zhang, LJ | The dimensions of foreign language classroom enjoyment and their effect on foreign language achievement (2021) | International Journal of Bilingual Education and Bilingualism |
| 19 | 35 | 3.89 | Derakhshan, A, et al. | Boredom in online classes in the Iranian EFL contexts: sources and solutions (2021) | System |
| 20 | 575 | 3.83 | Li, W | Moment analysis and translanguaging space: discursive construction of identities… (2011) | Journal of Pragmatics |

To save space, not all information about the HCPs is given. Overly long article titles have been abbreviated; for papers with two authors both are listed, and for papers with three or more authors only the first author is listed, followed by “et al.” RC: raw citations; NC: normalized citations.

By reading the titles and abstracts of these top HCPs, we categorized their topics into five groups: (i) statistical and analytical methods in (psycho)linguistics, such as sentiment analysis, sentence simplification techniques, effect sizes, and linear mixed models (#1, 3, 4, 6, 9, 14); (ii) language learning/teaching emotions, such as enjoyment, anxiety, boredom, and stress (#11, 15, 16, 18, 19); (iii) translanguaging or multilingualism (#5, 13, 17, 20); (iv) language perception (#2, 7, 10); and (v) medium of instruction (#8, 12). It is no surprise that 6 of the top 20 HCPs concern statistical methods in linguistics, because language researchers aspire to use statistics to make their research more scientific. In addition, the papers on language teaching/learning emotions on the list were all published in 2020 or 2021, indicating that this emerging topic may deserve more attention in future research. We also noticed that two COVID-19-related articles (#16, 19) explored the emotions teachers and students experienced during the pandemic, a timely response to an urgent need of the language learning and teaching community.

Of particular interest is that papers from journals indexed in multiple JCR categories seem to accumulate more citations. For example, Journal of Memory and Language, American Journal of Speech-Language Pathology, and Computational Linguistics are indexed in both SSCI and SCIE and contribute the top 4 HCPs, suggesting an advantage of these hybrid journals over conventional language journals in amassing citations. Moreover, unlike the finding of Yan et al. (2022) that most of the top HCPs in radiology are reviews, 19 of the top 20 HCPs here are research articles; the only review is Macaro et al. (2018).

4.5. Most frequently explored topics of HCPs

After obtaining the corpus-based topic items, we read all the titles and abstracts of the 143 HCPs to further validate them as research topics. Table 6 presents the top research topics with an observed frequency of 5 or above. We grouped these topics into five broad categories: multilingual-related, language learning/teaching-related, psycho/pathological/cognitive linguistics-related, methods and tools-related, and others. The observed frequency of each topic in the Abstract corpus is given in parentheses. We found that 34 of the 143 HCPs explore multilingual-related issues, the largest share among the named topic categories, attesting to the popularity of this area in the examined timespan. In addition, 30 of the 143 HCPs investigate language learning/teaching-related issues, with topics ranging from learners (e.g., EFL learners, individual differences) to multiple learning variables (e.g., learning strategy, motivation, agency). These findings will be validated by the analysis of the keywords.

Table 6. Categorization of the most explored research topics.

| Category | N | Hot topic items (frequency) |
|---|---|---|
| Multilingual-related | 34 | Multilingualism (127), translanguaging (42), heritage language/speakers/learners (31), language/education policy (6) |
| Language learning/teaching-related | 30 | Language/writing development (35), academic writing/vocabulary/publishing (22), learning strategy (20), motivation (17), individual differences (13), CLIL (11), agency (11), flipped classroom (9), self-efficacy (9), EFL learner (7), ELF (7), early language (7) |
| Psycho/pathological/cognitive linguistics-related | 25 | Emotion (47), FLE (42), cognition (39), anxiety (35), FLCA (30), stuttering (21), anxiety/language/fluency disorder (16), boredom (14), language impairment (14), brain (11), working memory (9), speech language pathology/therapy/pathologists (7), positive psychology (6), language ideology (5) |
| Methods and tools-related | 16 | Model (67), review (35), qualitative data (14), quantitative data (8), corpus-based studies/teaching (6), longitudinal study/analysis (5), sentiment analysis (5), meta-analysis (5), eye tracking (4), mixed method (4) |
| Others | 38 | Lexical (25), identity (21), social interaction/difficulties (17), semantic models/mapping (15), Covid-19 (9) |

N: the number of the HCPs in each topic category; ELF: English as a lingua franca; CLIL: content and language integrated learning; FLE: foreign language enjoyment; FLCA: foreign language classroom anxiety

Several points should be mentioned regarding topic candidacy. First, for similar topic expressions, we used a cover term and summed the frequency counts. For example, multilingualism is a cover term for bilinguals, bilingualism, plurilingualism, and multilingualism. Second, we combined the frequency counts of singular and plural forms (e.g., emotion and emotions) and of other variant forms (e.g., meta analysis and meta analyses). Third, some longer items (3-grams and 4-grams) could be subsumed under shorter ones (2-grams or single words) without loss of essential meaning (e.g., working memory from working memory capacity); in such cases, the shorter items were kept because of their higher frequency. Fourth, some highly frequent terms, such as applied linguistics, language use, and second language, were discarded because they were too general to be valuable topics in language research.
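The merging rules above can be sketched as a small Python function. The mapping and discard lists contain only the examples given in the text (the study's full lists were decided manually), the raw counts in the usage example are hypothetical, and we assume that a subsumed longer item is simply dropped rather than added to its shorter form.

```python
from collections import Counter

# Illustrative entries built from the examples in the text only.
COVER_TERMS = {
    "bilinguals": "multilingualism",
    "bilingualism": "multilingualism",
    "plurilingualism": "multilingualism",
    "emotions": "emotion",
    "meta analyses": "meta analysis",
}
# Assumed to be dropped, not added: every occurrence of
# "working memory capacity" already contains an occurrence of "working memory".
SUBSUMED = {"working memory capacity"}
TOO_GENERAL = {"applied linguistics", "language use", "second language"}


def merge_topics(raw_counts):
    """Map variants to cover terms, sum their counts, and drop discarded items."""
    merged = Counter()
    for item, count in raw_counts.items():
        if item in SUBSUMED or item in TOO_GENERAL:
            continue
        merged[COVER_TERMS.get(item, item)] += count
    return merged


if __name__ == "__main__":
    raw = {  # hypothetical counts
        "bilingualism": 10, "bilinguals": 4, "multilingualism": 3,
        "emotions": 5, "emotion": 2,
        "working memory": 9, "working memory capacity": 3,
        "applied linguistics": 20,
    }
    print(merge_topics(raw).most_common())
    # [('multilingualism', 17), ('working memory', 9), ('emotion', 7)]
```

Dropping rather than adding the subsumed item avoids double counting, since n-gram frequencies for the shorter form already include the occurrences embedded in the longer one.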

5. Discussion and implications

Based on 143 highly cited papers collected from the WoS categories of linguistics, the present study provides a bird’s-eye view of the publication landscape and the most recent research themes reflected in the HCPs in the linguistics field. Specifically, we identified the major contributors of HCPs in terms of journals, authors, and countries. We also spotlighted the research topics through a corpus-based analysis of the abstracts and a detailed analysis of the top HCPs. The study has produced several findings with important implications.

The first finding is that HCPs are highly concentrated in a limited number of journals and countries. Regarding journals, those in the spheres of bilingualism and applied linguistics (e.g., language teaching and learning) are likely to accumulate more citations and hence to produce more HCPs. Journals that focus on bilingualism from linguistic, psycholinguistic, and neuroscientific perspectives are the most frequent outlets of HCPs, as evidenced by the two most productive journals, Bilingualism Language and Cognition and International Journal of Bilingual Education and Bilingualism. This can be explained by the multidisciplinary nature of bilingual-related research and the development of cognitive measurement techniques. Analyzing the publication venues of HCPs has two benefits. On the one hand, it shows readers which sources of high-quality publications in this field to consult, as most of the significant and cutting-edge work is concentrated in these prestigious journals. On the other hand, it gives authors essential guidance on where to submit their work for greater visibility.

In terms of country distribution, traditional powerhouses in linguistics research such as the USA and England lead HCP publication in both the number of HCPs and their citations. However, developing countries such as China and Iran are becoming increasingly prominent, which may be traced to the funding and support provided by national language and development policies, as reported in recent studies (Ping et al., 2009; Lei and Liu, 2019). Take China as an example. Along with its economic development, China has given more impetus to academic output through increased investment in scientific research (Lei and Liao, 2017). Researchers in China are therefore highly motivated to publish in high-quality journals, both to win recognition in international academia and to cope with “publish or perish” pressure (Lee, 2014). These factors may explain the rise of China as an emerging research powerhouse in both the natural and social sciences, including English linguistics research.

The second finding is the multilingual trend in linguistics research. The dominant clustering of topics around multilingualism can be understood as a timely response to the surge of research interest in multilingualism (May, 2014). Thirty-four of the 143 HCPs contain words such as bilingualism, bilingual, multilingualism, or translanguaging in their titles, reflecting a strong multilingual tendency among the HCPs. Multilingual-related HCPs mainly involve three aspects: multilingualism from the perspectives of psycholinguistics and cognition (e.g., Luk et al., 2011; Leivada et al., 2020); multilingual teaching (e.g., Schissel et al., 2018; Ortega, 2019; Archila et al., 2021); and language policies related to multilingualism (e.g., Shen and Gao, 2018). Translanguaging, a term initially used to describe a pedagogical practice in bilingual classrooms and a frequently explored topic in the HCPs, has developed into an applied linguistics theory since Li’s Translanguaging as a Practical Theory of Language (Li, 2018). The most common collocates of translanguaging in the Abstract corpus are pedagogy/pedagogies, practices, and space/spaces. There are two main reasons for this multilingual turn. First, globalization, immigration, and overseas study programs have greatly stimulated the use and study of multiple languages in different linguistic contexts. Second, in many non-English-speaking countries, courses are delivered in languages (mostly English) other than the students’ mother tongue (Clark, 2017), so students are required to use multiple languages as resources to learn and understand subjects and ideas. The burgeoning body of literature on English Medium Instruction in higher education is in line with this rising interest in multilingualism. Given its inherently multidisciplinary nature, multilingualism, the topic du jour, can be expected to attract even more attention in the future.
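A window-based collocate count, of the kind that can surface the collocates of translanguaging reported above, can be sketched in Python. The three-token window, the stopword list, and the example fragments are assumptions for illustration, since the collocation settings used in AntConc are not specified; the sketch counts raw co-occurrence frequency only, without an association measure, and does not merge singular and plural forms such as pedagogy/pedagogies.

```python
import re
from collections import Counter


def collocates(texts, node, window=3, stopwords=frozenset()):
    """Count words within `window` tokens left or right of each occurrence of `node`."""
    counts = Counter()
    for text in texts:
        tokens = re.findall(r"[a-z]+", text.lower())
        for i, tok in enumerate(tokens):
            if tok != node:
                continue
            lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
            for j in range(lo, hi):
                w = tokens[j]
                # Skip the node itself (also when it repeats nearby) and stopwords.
                if j != i and w != node and w not in stopwords:
                    counts[w] += 1
    return counts


if __name__ == "__main__":
    texts = [  # hypothetical abstract fragments
        "Translanguaging pedagogy in the classroom",
        "A translanguaging space for learners",
        "Translanguaging practices and translanguaging pedagogy",
    ]
    stop = {"a", "and", "for", "in", "the"}
    print(collocates(texts, "translanguaging", stopwords=stop).most_common(2))
    # [('pedagogy', 2), ('practices', 2)]
```

A real analysis would typically rank candidates with an association statistic rather than raw counts, so that very frequent words do not dominate the collocate list.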

The third finding is the application of Positive Psychology (PP) in second language acquisition (SLA), that is, the positive trend in linguistic research. In our analysis, 20 of the 143 HCPs have words or phrases such as emotions, enjoyment, boredom, anxiety, and positive psychology in their titles, which may signal a shift of interest toward the psychology of language learners and teachers in different linguistic environments. Our study shows that foreign language enjoyment (FLE) is the most frequently explored emotion, followed by foreign language classroom anxiety (FLCA); these are the learners’ metaphorical right and left feet on their journey to acquiring a foreign language (Dewaele and MacIntyre, 2016). In fact, the topics of PP are not entirely new to SLA. For example, studies of language learning motivation, affect, and good language learners all provide roots for the emergence of PP in SLA (Naiman, 1978; Gardner, 2010). In recent years, both research on and teaching applications of PP in SLA have grown rapidly, with a diversity of topics already being explored, such as positive education and PP interventions. Notably, SLA not only draws inspiration from PP but also feeds back into PP theories and concepts, which makes it “an area rich for interdisciplinary cross-fertilization of ideas” (MacIntyre et al., 2019).

It should be noted that subjectivity is involved in deciding on and categorizing the candidate topic items from the Abstract corpus. However, the frequency and range criteria ensure that these items are in fact explored in multiple HCPs, which indicates their value as topics for further investigation. Some high-frequency n-grams were discarded because they were too general or were not meaningful topics. For example, applied linguistics is too broad to be included, as most of the HCPs concern issues in this research line rather than in theoretical linguistics. By meaningful topics, we mean topics that, like author keywords such as bilingualism, emotions, and individual differences, help journal editors and readers quickly locate fields of interest (Lei and Liu, 2019). Examination of the 3/4-grams and single words (mostly nouns) revealed that most of them were either not meaningful topics or could be subsumed under the 2-grams. In addition, there is inevitably some overlap among the topic categories. For example, some topics in the language teaching and learning category are situated and discussed within the context of multilingualism. Categorizing the topics serves two purposes: to better monitor the overlap between the Abstract corpus-based topic items and the keywords, and to roughly delineate the research strands in the HCPs for future research.

It should also be noted that all the results are based on the retrieved HCPs only. The study did not aim to paint a comprehensive picture of the whole landscape of linguistic research. Rather, it focused specifically on the most highly cited literature in a specified timeframe, thereby generating snapshots of trends in linguistic research. One important merit of this methodology is that some newly emerging but highly cited researchers can be spotlighted and gain more academic attention, because only the metrics of HCPs are considered in the calculation. Conversely, the absence of other generally highly cited researchers, such as Rod Ellis and Ken Hyland, simply indicates that their highly cited publications fall outside our investigated timeframe; it should not be interpreted as a sign of their diminishing academic influence in the field. In addition, the study does not consider collaborators or collaborations in calculating the number of HCPs, for two reasons. First, although some researchers are regular collaborators, such as Li C and Dewaele JM, their individual contributions should not be discounted. Second, the study also reports the number of FA/CA HCPs for each listed author, which may help readers locate research of interest.

We acknowledge that our study has some limitations that should be addressed in future research. First, our study focuses on HCPs extracted from WoS SSCI and A&HCI journals, arguably the most celebrated papers in this field. Future studies may consider including data from other databases such as Scopus to verify the findings of the present study. Second, our Abstract corpus-based method of topic extraction involved human judgement. Although the final list was the result of several rounds of discussion among the authors, it is difficult, if not impossible, to avoid subjectivity, and some worthy topics may have been unintentionally missed. Future research may therefore consider employing automatic algorithms to extract topics; for example, a dependency-based machine learning approach can be used to identify research topics (Zhu and Lei, 2021).

Data availability statement

Author contributions

SY: conceptualization and methodology. SY and LZ: writing (original draft) and writing (review and editing). All authors contributed to the article and approved the submitted version.

This work was supported by the Humanities and Social Sciences Youth Fund of the Ministry of Education of China under grants 20YJC740076 and 18YJC740141.

Conflict of interest

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Publisher’s note

All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article, or claim that may be made by its manufacturer, is not guaranteed or endorsed by the publisher.

Supplementary material

The Supplementary material for this article can be found online at: https://www.frontiersin.org/articles/10.3389/fpsyg.2022.1052586/full#supplementary-material

  • Aksnes D. W. (2003). Characteristics of highly cited papers. Res. Eval. 12, 159–170. doi: 10.3152/147154403781776645
  • Anthony L. (2022). AntConc (version 4.0.5). Tokyo, Japan: Waseda University. Available at: https://www.laurenceanthony.net/software (Accessed June 20, 2022).
  • Archila P. A., Molina J., Truscott de Mejía A.-M. (2021). Fostering bilingual scientific writing through a systematic and purposeful code-switching pedagogical strategy. Int. J. Biling. Educ. Biling. 24, 785–803. doi: 10.1080/13670050.2018.1516189
  • Blessinger K., Hrycaj P. (2010). Highly cited articles in library and information science: an analysis of content and authorship trends. Libr. Inf. Sci. Res. 32, 156–162. doi: 10.1016/j.lisr.2009.12.007
  • Chen H., Ho Y. S. (2015). Highly cited articles in biomass research: a bibliometric analysis. Renew. Sust. Energ. Rev. 49, 12–20. doi: 10.1016/j.rser.2015.04.060
  • Clark S. (2017). Translanguaging in higher education: beyond monolingual ideologies. Int. J. Biling. Educ. Biling. 22, 1048–1051. doi: 10.1080/13670050.2017.1322568
  • Dance A. (2012). Authorship: Who’s on first? Nature 489, 591–593. doi: 10.1038/nj7417-591a
  • Danell R. (2011). Can the quality of scientific work be predicted using information on the author’s track record? J. Am. Soc. Inf. Sci. Technol. 62, 50–60. doi: 10.1002/asi.21454
  • Darvin R., Norton B. (2015). Identity and a model of investment in applied linguistics. Annu. Rev. Appl. Linguist. 35, 36–56. doi: 10.1017/S0267190514000191
  • Dewaele J.-M., MacIntyre P. D. (2016). “Foreign language enjoyment and foreign language classroom anxiety: the right and left feet of the language learner” in Positive psychology in SLA. eds. P. D. MacIntyre, T. Gregersen, and S. Mercer (Bristol, Blue Ridge Summit: Multilingual Matters), 215–236.
  • Gardner R. (2010). Motivation and second language acquisition: The socio-educational model. New York: Peter Lang.
  • Gong Y., Lyu B., Gao X. (2018). Research on teaching Chinese as a second or foreign language in and outside mainland China: a bibliometric analysis. Asia Pac. Educ. Res. 27, 277–289. doi: 10.1007/s40299-018-0385-2
  • Hsu Y., Ho Y. S. (2014). Highly cited articles in health care sciences and services field in Science Citation Index Expanded. Methods Inf. Med. 53, 446–458. doi: 10.3414/ME14-01-0022
  • Lee I. (2014). Publish or perish: the myth and reality of academic publishing. Lang. Teach. 47, 250–261. doi: 10.1017/S0261444811000504
  • Lei L., Liao S. (2017). Publications in linguistics journals from mainland China, Hong Kong, Taiwan, and Macau (2003–2012): a bibliometric analysis. J. Quant. Ling. 24, 54–64. doi: 10.1080/09296174.2016.1260274
  • Lei L., Liu D. (2018). The research trends and contributions of System’s publications over the past four decades (1973–2017): a bibliometric analysis. System 80, 1–13. doi: 10.1016/j.system.2018.10.003
  • Lei L., Liu D. (2019). Research trends in applied linguistics from 2005 to 2016: a bibliometric analysis and its implications. Appl. Linguis. 40, 540–561. doi: 10.1093/applin/amy003
  • Leivada E., Westergaard M., Duñabeitia J. A., Rothman J. (2020). On the phantom-like appearance of bilingualism effects on neurocognition: (how) should we proceed? Biling. Lang. Cogn. 24, 197–210. doi: 10.1017/S1366728920000358
  • Li W. (2011). Moment analysis and translanguaging space: discursive construction of identities by multilingual Chinese youth in Britain. J. Pragmat. 43, 1222–1235. doi: 10.1016/j.pragma.2010.07.035
  • Li W. (2018). Translanguaging as a practical theory of language. Appl. Linguis. 39, 9–30. doi: 10.1093/applin/amx039
  • Li C. (2019). A positive psychology perspective on Chinese EFL students’ trait emotional intelligence, foreign language enjoyment and EFL learning achievement. J. Multiling. Multicult. Dev. 41, 246–263. doi: 10.1080/01434632.2019.1614187
  • Li C. (2021). A control-value theory approach to boredom in English classes among university students in China. Mod. Lang. J. 105, 317–334. doi: 10.1111/modl.12693
  • Li C., Dewaele J. M., Hu Y. (2021). Foreign language learning boredom: conceptualization and measurement. Appl. Ling. Rev. doi: 10.1515/applirev-2020-0124
  • Li C., Dewaele J. M., Jiang G. (2019). The complex relationship between classroom emotions and EFL achievement in China. Appl. Ling. Rev. 11, 485–510. doi: 10.1515/applirev-2018-0043
  • Li C., Jiang G., Dewaele J. M. (2018). Understanding Chinese high school students’ foreign language enjoyment: validation of the Chinese version of the foreign language enjoyment scale. System 76, 183–196. doi: 10.1016/j.system.2018.06.004
  • Liao S., Lei L. (2017). What we talk about when we talk about corpus: a bibliometric analysis of corpus-related research in linguistics (2000–2015). Glottometrics 38, 1–20.
  • Liao H., Tang M., Li Z., Lev B. (2018). Bibliometric analysis for highly cited papers in operations research and management science from 2008 to 2017 based on essential science indicators. Omega 88, 223–236. doi: 10.1016/j.omega.2018.11.005
  • Luk G., De Sa E., Bialystok E. (2011). Is there a relation between onset age of bilingualism and enhancement of cognitive control? Biling. Lang. Cogn. 14, 588–595. doi: 10.1017/S1366728911000010
  • Macaro E., Curle S., Pun J., Dearden J. (2018). A systematic review of English medium instruction in higher education. Lang. Teach. 51, 36–76. doi: 10.1017/S0261444817000350
  • MacIntyre P., Gregersen T., Mercer S. (2019). Setting an agenda for positive psychology in SLA: theory, practice, and research. Mod. Lang. J. 103, 262–274. doi: 10.1111/modl.12544
  • Mancebo F. P., Sapena A. F., Herrera M. V., González L., Toca H., Benavent R. A. (2013). Scientific literature analysis of judo in Web of Science. Arch. Budo 9, 81–91. doi: 10.12659/AOB.883883
  • Marušić M., Božikov J., Katavić V., Hren D., Kljaković-Gašpić M., Marušić A. (2004). Authorship in a small medical journal: a study of contributorship statements by corresponding authors. Sci. Eng. Ethics 10, 493–502. doi: 10.1007/s11948-004-0007-7
  • May S. (2014). The multilingual turn: Implications for SLA, TESOL and bilingual education. New York: Routledge.
  • Mondada L. (2016). Challenges of multimodality: language and the body in social interaction. J. Socioling. 20, 336–366. doi: 10.1111/josl.1_12177
  • Mondada L. (2018). Multiple temporalities of language and body in interaction: challenges for transcribing multimodality. Res. Lang. Soc. Interact. 51, 85–106. doi: 10.1080/08351813.2018.1413878
  • Mondada L. (2019). Contemporary issues in conversation analysis: embodiment and materiality, multimodality and multisensoriality in social interaction. J. Pragmat. 145, 47–62. doi: 10.1016/j.pragma.2019.01.016
  • Naiman N. (1978). The good language learner. Clevedon, UK: Multilingual Matters.
  • Newman M. (2008). The first-mover advantage in scientific publication. EPL 86:68001. doi: 10.1209/0295-5075/86/68001
  • Newman M. (2014). Prediction of highly cited papers. EPL 105:28002. doi: 10.1209/0295-5075/105/28002
  • Norton B., Toohey K. (2011). Identity, language learning, and social change. Lang. Teach. 44, 412–446. doi: 10.1017/S0261444811000309
  • Ortega L. (2019). SLA and the study of equitable multilingualism. Mod. Lang. J. 103, 23–38. doi: 10.1111/modl.12525
  • Ping Z., Thijs B., Glänzel W. (2009). Is China also becoming a giant in social sciences? Scientometrics 79, 593–621. doi: 10.1007/s11192-007-2068-x
  • Pritchard A. (1969). Statistical bibliography or bibliometrics. J. Doc. 25, 348–349.
  • Ríos L. J. C., Tamao I. M., Olmos J. (2013). Bibliometric study (1922–2009) on rugby articles in research journals. South Afr. J. Res. Sport Phys. Educ. Rec. 17, 313–109. doi: 10.3176/tr.2013.3.06
  • Ruggeri G., Orsi L., Corsi S. (2019). A bibliometric analysis of the scientific literature on Fairtrade labelling. Int. J. Consum. Stud. 43, 134–152. doi: 10.1111/ijcs.12492
  • Sabiote C. R., Rodríguez J. A. (2015). Bibliometric study and methodological quality indicators of the journal Porta Linguarum during six year period 2008–2013. Porta Ling. 24, 135–150. doi: 10.30827/Digibug.53866
  • Schissel J. L., De Korne H., López-Gopar M. E. (2018). Grappling with translanguaging for teaching and assessment in culturally and linguistically diverse contexts: teacher perspectives from Oaxaca, Mexico. Int. J. Biling. Educ. Biling. 24, 340–356. doi: 10.1080/13670050.2018.1463965
  • Shen Q., Gao X. (2018). Multilingualism and policy making in greater China: ideological and implementational spaces. Lang. Policy 18, 1–16. doi: 10.1007/s10993-018-9473-7
  • Small H. (2004). Why authors think their papers are highly cited. Scientometrics 60, 305–316. doi: 10.1023/B:SCIE.0000034376.55800.18
  • Smith D. R. (2007). The New Zealand timber economy, 1840–1935. N. Z. Med. J. 120, U2871–U2313. doi: 10.1016/0305-7488(90)90044-C
  • van Doorslaer L., Gambier Y. (2015). Measuring relationships in translation studies. On affiliations and keyword frequencies in the translation studies bibliography. Perspectives 23, 305–319. doi: 10.1080/0907676X.2015.1026360
  • van Oorschot J. A. W. H., Hofman E., Halman J. (2018). A bibliometric review of the innovation adoption literature. Technol. Forecast. Soc. Chang. 134, 1–21. doi: 10.1016/j.techfore.2018.04.032
  • Xie Z., Willett P. (2013). The development of computer science research in the People’s Republic of China 2000–2009: a bibliometric study. Inf. Dev. 29, 251–264. doi: 10.1177/0266666912458515
  • Yan S., Zhang H., Wang J. (2022). Trends and hot topics in radiology, nuclear medicine and medical imaging from 2011–2021: a bibliometric analysis of highly cited papers. Jpn. J. Radiol. 40, 847–856. doi: 10.1007/s11604-022-01268-z
  • Zhu H., Lei L. (2021). A dependency-based machine learning approach to the identification of research topics: a case in COVID-19 studies. Libr. Hi Tech 40, 495–515. doi: 10.1108/LHT-01-2021-0051
  • Zhu H., Lei L. (2022). The research trends of text classification studies (2000–2020): a bibliometric analysis. SAGE Open 12:21582440221089963. doi: 10.1177/21582440221089963

Harvard Working Papers in Linguistics

Harvard Working Papers in Linguistics is a publication of the graduate students of the Department of Linguistics at Harvard University. It is a venue for students and faculty to publish papers that reflect ongoing research and papers that are in their early stages, in order to stimulate discussion and inform the wider linguistics community about exciting developments at Harvard.

Twelve volumes of HWPL (1992–2007) have been published and are now out of print; individual articles are available as PDFs through this website. As of volume 13 (now in press, with an estimated publication date of fall 2015), HWPL will be published and distributed by our neighbors and colleagues at MIT Working Papers in Linguistics.

For information on formatting your submissions to HWPL and for article templates, see ‘submissions’.

Please direct all questions to Julia Sturm.


Working Papers in Linguistics

Working Papers in Linguistics is an occasional publication of the Department of Linguistics of Ohio State University and usually contains articles written by students and faculty in the department. Below is an indication of the contents of each volume.

Copyright Notice: The contents of OSU Working Papers in Linguistics are freely available for personal, academic, or educational use; however, all rights to the works published therein are held by the authors alone. Note: Volumes 60 and 57 are available only in an online format [pdf].

How to contribute to Working Papers in Linguistics

For past issues of Working Papers in Linguistics, please see our archives.

More permanent archiving: We are pleased to formally announce that The Ohio State University (OSU) has agreed to archive digital versions of the OSU Working Papers in Linguistics through the university's highly acclaimed Knowledge Bank. The Knowledge Bank (http://kb.osu.edu) is OSU's digital institutional repository. Inclusion in the Knowledge Bank means that all past articles, from the very first volume onward, will be searchable (e.g., by Google) and available via the Internet. Other benefits of this arrangement include long-term preservation in a cutting-edge, professionally managed repository, a worldwide audience, immediate distribution of research, a stable long-term URL that can be used in citations, and an increased web presence. We welcome feedback from any author (or rights holder) whose work appeared in the OSU WPL and who has questions or concerns about the digitization process and/or inclusion in the Knowledge Bank. Please contact the OSU WPL committee ([email protected]) by July 31, 2016.

OSU WPL at the Knowledge Bank

Most recent issue of Working Papers in Linguistics: Spring 2013, vol. 60.

Edited by Mary E. Beckman, Marivic Lesho, Judith Tonhauser, and Tsz-Him Tsui


Some links on this page are to PDF files. If you need these files in a more accessible format, please contact [email protected]. PDF files require Adobe Acrobat Reader, which is available as a free download from Adobe.


211 Research Topics in Linguistics To Get Top Grades


Many students find it hard to decide on a linguistics research topic because of the perceived complexity of the field. Others struggle to choose easy research paper topics on the English language because they fear such topics will seem too simple for university or college work.

The study of linguistics and English spans syntax, phonetics, morphology, phonology, semantics, grammar, vocabulary, and several other areas. To write a top-notch essay or conduct a research study with ease, consider the list of English-language research topics below for university or college use. You can fine-tune any of them to suit your interests.

Linguistics Research Paper Topics

If you want to study how language is used and why it matters in the world, consider the following linguistics topics for your research paper:

  • An analysis of romantic ideas and their expression amongst French people
  • An overview of hate language in campaigns against religion
  • Identify the determinants of hate language and its means of propagation
  • Review the literature and examine how linguistics is applied to the understanding of minority languages
  • Consider the impact of social media on the development of slang
  • An overview of political slang and its use amongst New York teenagers
  • Examine the relevance of linguistics in a digitalized world
  • Analyze foul language and how it’s used to oppress minors
  • Identify the role of language in the national identity of a socially dynamic society
  • Explain how a language barrier could affect the social life of an individual in a new society
  • Discuss the means through which language can enrich cultural identities
  • Examine the concept of bilingualism and how it applies in the real world
  • Analyze the possible strategies for teaching a foreign language
  • Discuss teachers’ priorities in teaching grammar to non-native speakers
  • Choose a school and observe the slang used by its students: analyze how it affects their social lives
  • Attempt a critical overview of racist language
  • What does “endangered language” mean, and how does the concept apply in the real world?
  • A critical overview of your second language and why it is a second language
  • What are the motivators of speech and why are they relevant?
  • Analyze the differences between types of communication and their significance to people with disabilities
  • Give a critical overview of five studies on sign language
  • Evaluate how language comprehension differs between adults and teenagers
  • Consider a Native American group and evaluate how cultural diversity has influenced their language
  • Analyze the complexities involved in code-switching and code-mixing
  • Give a critical overview of the importance of language to a teenager
  • Attempt a thorough overview of language accessibility and what it means
  • What do you consider the main means of communication, and what makes each unique?
  • Attempt a study of Islamic poetry and its role in language development
  • Attempt a study on the role of literature in language development
  • Evaluate the influence of metaphors and other literary devices on the depth of meaning in a sentence
  • Identify the role of literary devices in the development of proverbs in any African country
  • Cognitive linguistics: analyze two works of literature that offer a critical view of perception
  • Identify and analyze the complexities in unspoken words
  • Expression is another kind of language: discuss
  • Identify the significance of symbols in the evolution of language
  • Discuss how learning more than one language promotes cross-cultural development
  • Analyze how the loss of a mother tongue affects the language efficiency of a community
  • Critically examine how sign language works
  • Using literature from the medieval era, attempt a study of the evolution of language
  • Identify how wars have reduced the popularity of a language of your choice in any country of the world
  • Critically examine five studies on why accents change with environment
  • What are the factors that drive language comprehension in a child?
  • Identify and explain the difference between listening and speaking skills and their significance in the understanding of language
  • Give a critical overview of how natural language is processed
  • Examine the influence of language on culture and vice versa
  • It is possible to understand a language even without living in the society where it is spoken: discuss
  • Identify the arguments regarding speech defects
  • Discuss how familiarity with a language informs the creation of slang
  • Explain the significance of religious phrases and sacred languages
  • Explore the roots and evolution of incantations in Africa

Sociolinguistic Research Topics

You may also need interesting sociolinguistics topics for your research. Sociolinguistics studies how language varies and is used in society, often drawing on recordings of natural, everyday speech such as casual conversation. You can consider the following sociolinguistic research topics:

  • What makes language exceptional to a particular person?
  • How does language form a unique means of expression for writers?
  • Examine the kind of speech used in health care and emergencies
  • Analyze the language used by family members during dinner
  • Evaluate how language varies by social class
  • Evaluate the language of racism, social tension, and sexism
  • Discuss how language promotes social and cultural familiarity
  • Give an overview of identity and language
  • Examine why some language speakers enjoy listening to foreigners who speak their native language
  • Give a detailed analysis of how the language of entertainment differs from the language of professional settings
  • Explain how language changes
  • Examine the sociolinguistics of the Caribbean
  • Give an overview of metaphor in France
  • Explain why word-for-word translation of written texts is often incomprehensible
  • Discuss the use of language in marginalizing a community
  • Analyze the history of Arabic and the culture that enhanced it
  • Discuss the growth of French and the influences of other languages
  • Examine how the English language developed and its interdependence with other languages
  • Give an overview of cultural diversity and linguistics in teaching
  • Challenge the association of speech defects with impaired listening and speaking abilities
  • Explore the uniqueness of language between siblings
  • Explore the means of making requests between teenagers and their parents
  • Observe and comment on how students relate with their teachers through language
  • Observe and comment on the communication strategies of parents and teachers
  • Examine the connection between first-language comprehension and academic excellence

Language Research Topics

Numerous languages exist in different societies, and you may want to understand the motivations behind language through a linguistics project. Consider the following topics and their application to language:

  • What does language shift mean?
  • Discuss the stages of English language development
  • Examine the position of ambiguity in a Romance language of your choice
  • Why are some languages called Romance languages?
  • Observe the strategies of persuasion through language
  • Discuss the connection between symbols and words
  • Identify the language of political speeches
  • Discuss the effectiveness of language in an indigenous cultural revolution
  • Trace the motivators for spoken language
  • What does language acquisition mean to you?
  • Examine three pieces of literature on language translation and its role in multilingual accessibility
  • Identify the science involved in language reception
  • Interrogate the context of language disorders
  • Examine how psychotherapy applies to people with language disorders
  • Study the growth of Hindi despite colonialism
  • Critically appraise the term “language erasure”
  • Examine how colonialism and war are responsible for the loss of languages
  • Give an overview of the difference between sounds and letters and how they apply to the German language
  • Explain why the placement of verbs and prepositions differs between German and English
  • Choose two languages and examine their historical relationship
  • Discuss the strategies employed by people while learning new languages
  • Discuss the role of figures of speech in the advancement of language
  • Analyze the complexities of autism and its effects on communication
  • Offer a linguistic approach to the differences in language between a child with Down syndrome and an autistic child
  • Express dance as a language
  • Express music as a language
  • Express language as a form of language
  • Evaluate the role of cultural diversity in the decline of languages in South Africa
  • Discuss the development of the Greek language
  • Critically review two literary texts, one from the medieval era and another published a decade ago, and examine the language shifts

Linguistics Essay Topics

You may also need research topics for your linguistics essays. As a linguist in the making, you can use these topics to consider controversies in linguistics as a discipline and address them through your study. You can consider:

  • The role of sociolinguistics in understanding interest in multilingualism
  • Write about how you believe language encourages sexism
  • What do you understand about the differences between British and American English?
  • Discuss how slang started and how it grew
  • Consider how age leads to loss of language
  • Review how language is used in formal and informal conversation
  • Discuss what you understand by polite language
  • Discuss what you understand by hate language
  • Evaluate how language has remained flexible throughout history
  • Mimicking a teacher is a form of exercising hate language: discuss
  • Body language and verbal speech are different things: discuss
  • Language can be exploitative: discuss
  • Do you think language is responsible for inciting aggression against the state?
  • Can you justify the structural representation of any symbol of your choice?
  • Religious symbols are not ordinary language: what is your perspective on everyday languages and sacred ones?
  • Consider the usage of language by an Englishman and someone of another culture
  • Discuss the essence of code-mixing and code-switching
  • Attempt a psychological assessment of the role of language in academic development
  • How does language pose a challenge to studying?
  • Choose a multicultural society and explain the problems it faces
  • What forms does language use in expression?
  • Identify the reasons behind unspoken words and actions
  • Why do universal languages exist as a means of easy communication?
  • Examine the role of the English language in the world
  • Examine the role of Arabic in the world
  • Examine the role of Romance languages in the world
  • Evaluate the significance of each teaching resource in a language classroom
  • Consider an assessment of language analysis
  • Why do people comprehend beyond what is written or expressed?
  • What is the impact of hate speech on a woman?
  • Do you believe that grammatical errors are the measure of a person’s comprehension of a language?
  • Observe the influence of technology on language learning and development
  • Which parts of the body are responsible for understanding new languages?
  • How has language informed development?
  • Would you say language has improved human relations or worsened them, considering its use as a tool for violence?
  • Would you say language use in predominantly Black states differs from that in predominantly white states?
  • Give an overview of the English language in Nigeria
  • Give an overview of the English language in Uganda
  • Give an overview of the English language in India
  • Give an overview of Russian in Europe
  • Give a conceptual analysis of stress and how it works
  • Consider the means of vocabulary development and its role in cultural relationships
  • Examine the effects of linguistics on language
  • Present your understanding of sign language
  • What do you understand about descriptive and prescriptive language?

List of Research Topics in English Language

You may need English research topics for your next research project. The following topics are suited to students of language at any institution and lend themselves to in-depth analysis:

  • Examine the travail of women in any feminist text of your choice
  • Examine the movement of feminist literature in the Industrial period
  • Give an overview of five works of Gothic literature and what you understand from them
  • Examine rock music and how it emerged as a genre
  • Evaluate the cultural associations of Nina Simone’s music
  • What is the relevance of Shakespeare in English literature?
  • How has literature promoted the English language?
  • Identify the effect of spelling errors on the academic performance of students in an institution of your choice
  • Critically survey a university and explain the rationale for the literary texts it presents as significant
  • Examine the use of feminist literature in advancing the cause against patriarchy
  • Give an overview of the themes in William Shakespeare’s “Julius Caesar”
  • Express the significance of Ernest Hemingway’s diction in contemporary literature
  • Examine the predominant devices in the works of William Shakespeare
  • Explain the predominant devices in the works of Christopher Marlowe
  • Charles Dickens and his works: identify the dominant themes in his literature
  • Why is literature described as the mirror of society?
  • Examine the issues of feminism in Sefi Atta’s “Everything Good Will Come” and Bernardine Evaristo’s “Girl, Woman, Other”
  • Give an overview of the stylistics employed in the writing of “Girl, Woman, Other” by Bernardine Evaristo
  • Describe the language of advertising in social media and newspapers
  • Describe what poetic language means
  • Examine the use of code-switching and code-mixing among Mexican Americans
  • Examine the use of code-switching and code-mixing among Indian Americans
  • Discuss the influence of George Orwell’s “Animal Farm” on satirical literature
  • Examine the linguistic features of “Native Son” by Richard Wright
  • What is the role of indigenous literature in promoting cultural identities?
  • How has literature informed cultural consciousness?
  • Analyze five works on semantics and their influence on the field
  • Assess the role of grammar in day-to-day communication
  • Observe the role of multidisciplinary approaches in understanding the English language
  • What does stylistics mean while analyzing medieval literary texts?
  • Analyze the views of philosophers on language, society, and culture

English Research Paper Topics for College Students

For your college work, you may need to study almost any phenomenon in the world. The topics below can serve as linguistics essay topics or as the basis for a research study of an idea of your choice:

  • The concept of fairness in a democratic Government
  • The capacity of a leader isn’t in his or her academic degrees
  • The concept of discrimination in education
  • The theory of discrimination in Islamic states
  • The idea of school policing
  • A study on grade inflation and its consequences
  • A study of taxation and its importance to the economy from a citizen’s perspective
  • A study of how eloquence leads to discrimination among high school students
  • A study of the influence of the music industry on teens
  • An evaluation of pornography and its impacts on college students
  • A descriptive study of how the FBI works according to Hollywood
  • A critical consideration of the pros and cons of vaccination
  • The health effects of sleep disorders
  • An overview of three literary texts across three genres of Literature and how they connect to you
  • A critical overview of “King Oedipus”: the role of the supernatural in day-to-day life
  • Examine the memoir “12 Years a Slave” as a reflection of the servitude and brutality inflicted by white slave owners
  • Rationalize the emergence of racist literature with concrete examples
  • A study of the limits of literature in reaching rural readers
  • Analyze the perspectives of modern authors on the influence of medieval literature on their craft
  • What do you understand by the mortality of a literary text?
  • A study of controversial literature and its role in shaping public discussion
  • A critical overview of three literary texts that deal with domestic abuse and their role in changing narratives about domestic violence
  • Choose three contemporary poets and analyze the themes of their works
  • Do you believe that contemporary American literature is the repetition of unnecessary themes already treated in the past?
  • A study of the evolution of literature and its styles
  • The use of sexual innuendo in literature
  • The use of sexist language in literature and its effect on the public
  • The harm caused by media reports of fake news
  • Conduct a study on how language is used as a tool for manipulation
  • Attempt a critique of a controversial literary text and argue why it should not have been studied or sold in the first place

Finding Linguistics Hard To Write About?

With these topics, you can start your research with ease. However, if you need professional writing help with any part of the research, you can look here for the best research paper writing service.

Several expert writers on ENL, hosted on our website, can give your research study a fast response at a low price.

As a student, you may be unable to cover every part of your research on your own. If so, consider expert writers for custom linguistics research topics approved by your professor, to help you earn high grades.

Linguistics

Subject Librarian

Ask a Librarian!

In person: Wells Library, Scholars' Commons Reference Desk (East Tower); email: [email protected]; chat: libraries.indiana.edu/help.

Welcome to the linguistics research guide! This guide will connect you to resources available through the IU Libraries and to open access resources on the web. While this guide is primarily intended for undergraduate and graduate students in linguistics, other students and early-career researchers in the social sciences may also find it useful.

Reference Resources for Linguists

Resource available to authorized IU Bloomington users (on or off campus)

Reference resource for the study of linguistics.

Bilingual dictionaries of English and French, Italian, Spanish, Russian, Chinese, and German.

Access to authoritative bibliographies in the field of linguistics. The study of linguistics makes connections across an array of scholarly concerns, from the humanities to many of the social and behavioral sciences to biology, physics, engineering, and medicine.

Claiming Your Scholarly Profile

If you are a graduate student or early-career researcher in linguistics, you may want to consider registering for ORCID and creating a Google Scholar Profile. Both steps will help you establish your online presence as a researcher and connect people to the important work you do!

ORCID is a persistent identifier that is unique to you. It helps connect you to all your scholarly works and distinguishes you from other researchers with the same name. It is increasingly being used by institutions, publishers, and funders to identify researchers. Visit the ORCID @ Indiana University guide to learn more about how to sign up for ORCID.

Google Scholar Profile

Setting up a Google Scholar Profile is an easy way to share a list of your publications and track citations to those publications over time. If you make your profile public, it will appear in Google Scholar when people search your name. Check out this page for guidance on how to set up your Google Scholar profile.

Accessing Resources Off Campus

When you are not on the IU Network or using a computer on campus in Bloomington, you can still access online resources subscribed to by IU Bloomington Libraries. This page has instructions for how to access library resources off campus.

  • Last Updated: Aug 30, 2024 5:29 PM
  • URL: https://guides.libraries.indiana.edu/linguistics

Additional resources

Featured databases (available to authorized IU Bloomington users, on or off campus, unless noted otherwise):

  • OneSearch@IU
  • Academic Search (EBSCO)
  • ERIC (EBSCO)
  • Nexis Uni
  • HathiTrust Digital Library (available without restriction)
  • Databases A-Z
  • Google Scholar
  • JSTOR
  • Web of Science
  • Scopus
  • WorldCat

ORIGINAL RESEARCH article

Trends and hot topics in linguistics studies from 2011 to 2021: a bibliometric analysis of highly cited papers.

Sheng Yan

  • School of Foreign Languages, Central China Normal University, Wuhan, China

High citation counts most often characterize quality research that reflects the foci of a discipline. This study aims to spotlight the most recent hot topics and emerging trends in the highly cited papers (HCPs) in the Web of Science categories of linguistics and language & linguistics through bibliometric analysis. The bibliometric information of 143 HCPs identified by Essential Science Indicators was retrieved and used to identify and analyze influential contributors at the levels of journals, authors, and countries. The most frequently explored topics were identified by corpus analysis and manual checking. The retrieved topics can be grouped into five general categories: multilingual-related, language teaching and learning-related, psycho/pathological/cognitive linguistics-related, methods and tools-related, and others. Topics such as bi/multilingual(ism), translanguaging, language/writing development, models, emotions, foreign language enjoyment (FLE), cognition, and anxiety are among the most frequently explored. Multilingual and positive trends are discerned in the investigated HCPs. The findings inform linguistic researchers of the publication characteristics of HCPs in the linguistics field and help them pinpoint research trends and the directions in which to focus their efforts in future studies.

1. Introduction

Citations typically exhibit a skewed distribution across academic publications: a few papers accumulate an overwhelmingly large number of citations, while the majority are rarely, if ever, cited. Correspondingly, highly cited papers (HCPs) receive the greatest attention in academia, as citations are commonly regarded as a strong indicator of research excellence. For academic professionals, following HCPs is an efficient way to stay current with developments in a field and to make better-informed decisions about potential research topics and directions. Academic institutions, government and private agencies, and science policy makers in general keep a close eye on this visible indicator and use it to make more informed decisions on research funding allocation and science policy formulation. Against the backdrop of ever-growing academic output, attention has noticeably shifted from publication quantity to publication quality. Many countries are developing research policies to identify “excellent” universities, research groups, and researchers (Danell, 2011). In short, HCPs showcase high-quality research, encompass significant themes, and constitute a critical reference point in a research field as the “gold bullion of science” (Smith, 2007).

2. Literature review

Bibliometrics, a term coined by Pritchard (1969), refers to the application of mathematical methods to the analysis of academic publications. Essentially, it is a quantitative method for depicting publication patterns within a given field based on a body of literature. There are many bibliometric studies on the natural and social sciences in general (Hsu and Ho, 2014; Zhu and Lei, 2022) and on specific disciplines such as management sciences (Liao et al., 2018), biomass research (Chen and Ho, 2015), computer sciences (Xie and Willett, 2013), and sport sciences (Mancebo et al., 2013; Ríos et al., 2013). In these studies, researchers used bibliometric methods to track developments, weigh research impact, and highlight emerging scientific fronts. In the field of linguistics, bibliometric studies have appeared only in the past few years (van Doorslaer and Gambier, 2015; Lei and Liao, 2017; Gong et al., 2018; Lei and Liu, 2018, 2019). These studies mostly examined a sub-area of linguistics, such as corpus linguistics (Liao and Lei, 2017), translation studies (van Doorslaer and Gambier, 2015), and the teaching of Chinese as a second/foreign language (Gong et al., 2018), or individual academic journals such as System (Lei and Liu, 2018) and Porta Linguarum (Sabiote and Rodríguez, 2015). Although Lei and Liu (2019) took the entire discipline of linguistics as their object of investigation, their research focused exclusively on applied linguistics and was restricted to a limited number of journals (42 in total), leaving publications in other linguistics subfields and qualified journals unexamined.

In recent years, a number of studies have examined “excellent” papers or HCPs. For example, Small (2004) surveyed HCP authors on why they believed their papers were highly cited. Strong interest, novelty, utility, and the high importance of the work were among the most frequently mentioned reasons. Most authors also considered their selected HCPs to be based on the most important work of their academic careers. Aksnes (2003) investigated the characteristics of HCPs and found that they were generally authored by a large number of scientists, often involving international collaboration. Some researchers have even attempted to predict HCPs by building mathematical models, implying “the first mover advantage in scientific publication” (Newman, 2008, 2014). In other words, papers published earlier in a field are generally more likely to accumulate citations than those published later. Although these studies addressed HCPs from different perspectives, they shared the belief that HCPs are very different from less cited or uncited papers and thus deserve close attention in academic research (Aksnes, 2003; Blessinger and Hrycaj, 2010; Yan et al., 2022).

Although an increased focus on research quality can be observed across fields, opinions diverge on the scope of, and inclusion criteria for, excellent papers. Are they ‘highly cited’, ‘top cited’, or ‘most frequently cited’ papers? Aksnes (2003) noted two approaches to defining a highly cited article, based on absolute and relative thresholds, respectively. An absolute threshold stipulates a minimum number of citations for identifying excellent papers, while a relative threshold uses percentile rank classes, for example, the top 10% most highly cited papers in a discipline, a publication year, or a publication set. Importantly, citation rates differ significantly across fields and disciplines: an HCP in the natural sciences generally accumulates more citations than its counterpart in the social sciences. It is therefore necessary either to investigate HCPs from different fields separately or to adopt different inclusion criteria to ensure valid comparisons.
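To make the difference between the two kinds of threshold concrete, here is a minimal Python sketch. The function names `absolute_hcps` and `relative_hcps`, the tie-handling rule, and the citation counts are hypothetical illustrations; they are not taken from Aksnes (2003) or from any bibliometric tool.

```python
import math


def absolute_hcps(citations: dict[str, int], min_citations: int) -> set[str]:
    """Absolute threshold: every paper with at least `min_citations` citations."""
    return {pid for pid, c in citations.items() if c >= min_citations}


def relative_hcps(citations: dict[str, int], top_fraction: float) -> set[str]:
    """Relative threshold: papers in the top `top_fraction` of the citation ranking.

    The cut-off is the citation count at rank ceil(n * top_fraction); papers tied
    with that count are also included, so the result can exceed the nominal share.
    """
    ranked = sorted(citations.values(), reverse=True)
    k = max(1, math.ceil(len(ranked) * top_fraction))
    cutoff = ranked[k - 1]
    return {pid for pid, c in citations.items() if c >= cutoff}


# Hypothetical citation counts for ten papers in one publication set.
papers = {"a": 120, "b": 45, "c": 30, "d": 8, "e": 2,
          "f": 0, "g": 0, "h": 1, "i": 5, "j": 60}

print(sorted(absolute_hcps(papers, 50)))    # ['a', 'j']
print(sorted(relative_hcps(papers, 0.10)))  # ['a']
```

Because the relative cut-off is recomputed within each set (a discipline, a year, or both), it adapts to fields with very different citation levels, which is the comparability problem raised above; a fixed absolute threshold does not.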

The present study was motivated by two considerations. First, the sheer number of publications of varying quality in a scientific field makes it difficult or even impossible to conduct a reliable and effective literature review. Focusing on quality publications, HCPs in particular, may lend more credibility to findings on trends. Second, HCPs can serve as a valuable platform for discovering information important to the development of a discipline and for understanding the past, present, and future of its scientific structure. The present study therefore investigates the hot topics and publication trends in the Web of Science (WoS) categories of linguistics or language & linguistics (hereafter linguistics) with bibliometric methods, addressing the following three questions:

1. Who are the most productive and impactful contributors of the HCPs in WoS category of linguistics or language & linguistics in terms of publication venues, authors, and countries?

2. What are the most frequently explored topics in HCPs?

3. What are the general research trends revealed from the HCPs?

3. Materials and methods

Unlike previous studies that used an arbitrary inclusion threshold (e.g., Blessinger and Hrycaj, 2010; Hsu and Ho, 2014), we rely on Essential Science Indicators (ESI) to identify HCPs. Developed by Clarivate, a leading company in bibliometrics and scientometrics, ESI draws on the complete WoS databases to reveal emerging science trends as well as influential individuals, institutions, papers, journals, and countries in any scientific field. ESI was chosen for three reasons. First, ESI adopts a stricter inclusion criterion for identifying HCPs: a paper is selected as an HCP only when its citation count reaches the top 1% threshold for its publication year in its ESI research field, one of 22 such fields. Second, ESI is widely used and recognized for its reliability and authority in identifying top-charting work, generating “excellent” metrics including hot and highly cited papers. Third, ESI automatically updates its database to generate the most recent HCPs, which makes it especially suitable for trend studies over a specified timeframe.

3.1. Data source

The data retrieval was completed through our university library portal on June 20, 2022. The retrieval methods are described in Table 1. Bibliometric indicators for the important contributors at the journal, author, and country levels were obtained: after the search was completed, we clicked the “Analyze Results” bar on the results page for a detailed descriptive analysis of the retrieved bibliometric data.

Table 1. Retrieval strategies.

Several points should be noted about the search strategies. First, we searched the bibliometric data from two sub-databases of WoS core collection: Social Science Citation Index (SSCI) and Arts & Humanities Citation Index (A&HCI). There is no need to include the sub-database of Science Citation Index Expanded (SCI-EXPANDED) because publications in the linguistics field are almost exclusively indexed in SSCI and A&HCI journals. WoS core collection was chosen as the data source because it boasts one of the most comprehensive and authoritative databases of bibliometric information in the world. Many previous studies utilized WoS to retrieve bibliometric data. van Oorschot et al. (2018) and Ruggeri et al. (2019) even indicated that WoS meets the highest standards in terms of impact factor and citation counts and hence guarantees the validity of any bibliometric analysis. Second, we do not restrict the document types as HCPs selection informed by ESI only considers articles and reviews. Third, we do not set the date range as the dataset of ESI-HCPs is automatically updated regularly to include the most recent 10 years of publications.

This query returned a total of 143 HCPs published in 48 journals by 352 authors from 226 institutions. We then downloaded the raw bibliometric records of the 143 HCPs for follow-up analysis, including publication years, authors, publication titles, countries, affiliations, abstracts, and citation reports. A complete list of the 143 HCPs can be found in the Supplementary Material. We collected the most recent impact factor (IF) of each journal from the 2022 Journal Citation Reports (JCR).

3.2. Data analysis

3.2.1. Citation analysis

A citation threshold is the minimum number of citations obtained by ranking the papers in a research field in descending order of citation count and then selecting the top fraction or percentage of papers. In ESI, the highly cited threshold is the minimum number of citations received by the top 1% of papers from each of the 10 database years. In other words, to enter the HCP list, a paper must meet a minimum citation threshold that varies by research field and by year. Of the 22 research fields in ESI, Social Sciences, General is a broad field covering a number of WoS categories, including linguistics and language & linguistics. We checked the official ESI website to obtain the yearly highly cited thresholds in the research field of Social Sciences, General, as shown in Figure 1 (https://esi.clarivate.com/ThresholdsAction.action). As the figure shows, the longer a paper has been published, the more citations it must receive to meet the threshold. We then divided the raw citation count of each HCP by the highly cited threshold for its publication year to obtain its normalized citations.

Figure 1. Highly cited thresholds in the research field of Social Sciences, General.
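The normalization step can be sketched in Python as follows. ESI publishes its yearly thresholds directly and its exact computation may differ, so the `yearly_thresholds` helper below is only a hypothetical approximation of "the minimum number of citations received by the top 1% of papers" per year, applied to invented data. `normalized_citations` mirrors the division described in the text.

```python
import math
from collections import defaultdict


def yearly_thresholds(papers, top_fraction=0.01):
    """Approximate a highly cited threshold for each publication year.

    papers: iterable of (year, citations) pairs for one research field.
    Returns {year: citation count of the last paper inside the top fraction}.
    """
    by_year = defaultdict(list)
    for year, cites in papers:
        by_year[year].append(cites)
    thresholds = {}
    for year, counts in by_year.items():
        counts.sort(reverse=True)
        k = max(1, math.ceil(len(counts) * top_fraction))
        thresholds[year] = counts[k - 1]
    return thresholds


def normalized_citations(citations, year, thresholds):
    """Raw citations divided by the threshold for the paper's publication year."""
    return citations / thresholds[year]


# Invented field: 200 papers from 2015 and 100 papers from 2020.
field = [(2015, c) for c in range(200)] + [(2020, c) for c in range(100)]
thresholds = yearly_thresholds(field)
print(thresholds)                                   # {2015: 198, 2020: 99}
print(normalized_citations(396, 2015, thresholds))  # 2.0
```

Dividing by the year-specific threshold puts older and newer HCPs on one scale: a value of 2.0 means the paper has twice the citations needed to qualify in its publication year.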

3.2.2. Corpus analysis and manual checking

To determine the most frequently explored topics in these HCPs, we used both corpus-based word-frequency analysis and manual checking. The rationale is that the more frequently a word or phrase occurs in a purpose-built corpus, the more likely it is to constitute a research topic. We built an Abstract corpus from the abstracts of all 143 HCPs, totaling 24,800 tokens, and retrieved research topics from it as follows. First, the 143 abstracts were saved as separate .txt files in one folder. Second, AntConc (Anthony, 2022), a corpus analysis tool for concordancing and text analysis, was used to extract lists of n-grams (n = 2–4) in decreasing order of frequency. We also generated a list of individual nouns, because single nouns can also constitute research topics. Given the small size of the corpus, we adopted both a frequency criterion (3) and a range criterion (3) for topic candidacy: a candidate n-gram had to occur at least 3 times and in at least 3 different abstract files. The frequency threshold ensures the importance of the candidate topics, while the range threshold ensures that the topics are not concentrated in a small number of publications. We tested several frequency and range thresholds over multiple rounds to make sure all potential topics were captured. In total, we obtained 531 nouns, 1,330 2-grams, 331 3-grams, and 81 4-grams. Third, because most of the retrieved n-grams cannot function as meaningful research topics, we manually checked all candidate items and discussed them extensively until full agreement was reached on their status as potential research topics. Finally, we read all 143 HCP abstracts to further validate these items as research topics, arriving at 118 topic items in total.
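The frequency-plus-range filter can be approximated in a few lines of Python. This is not AntConc, which the authors actually used: the tokenizer (a simple lowercase regular expression) and the sample abstracts are assumptions, and the separate noun list would need a part-of-speech tagger, so it is omitted here.

```python
import re
from collections import Counter

TOKEN = re.compile(r"[a-z]+(?:[-'][a-z]+)*")  # crude word tokenizer (assumption)


def ngrams(tokens, n):
    """All contiguous n-token sequences, joined with spaces."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def topic_candidates(abstracts, n_values=(2, 3, 4), min_freq=3, min_range=3):
    """Keep n-grams occurring >= min_freq times in >= min_range abstracts."""
    freq = Counter()       # total occurrences across the corpus
    doc_range = Counter()  # number of abstracts containing the n-gram
    for text in abstracts:
        tokens = TOKEN.findall(text.lower())
        grams = [g for n in n_values for g in ngrams(tokens, n)]
        freq.update(grams)
        doc_range.update(set(grams))  # count each abstract once per n-gram
    return {g: f for g, f in freq.items()
            if f >= min_freq and doc_range[g] >= min_range}


abstracts = [
    "Working memory predicts learning.",
    "We test working memory in children.",
    "Working memory and working memory capacity.",
    "No match here.",
]
print(topic_candidates(abstracts))  # {'working memory': 4}
```

The range counter is what keeps a phrase repeated many times in a single abstract from qualifying: "x y" occurring three times in one text passes the frequency test but fails the range test.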

4.1. Main publication venues of HCPs

Of the 48 journals that published the 143 HCPs, 17 contributed at least 3 HCPs each (Table 2), accounting for 71.33% of the examined HCPs (102/143), which indicates that HCPs tend to be highly concentrated in a limited number of journals. The three largest publication outlets of HCPs are Bilingualism Language and Cognition (16), International Journal of Bilingual Education and Bilingualism (11), and Modern Language Journal (10). Because journals vary greatly in the number of papers published per year, and the number of HCPs is associated with a journal’s output, we divided the number of HCPs by the total number of papers (TP) each journal published in the examined years (2011–2021) to obtain its HCP percentage (HCPs/TP). The three journals with the highest HCPs/TP percentages are Annual Review of Applied Linguistics (2.26), Modern Language Journal (2.08), and Bilingualism Language and Cognition (1.74), indicating that papers published in these journals have a higher probability of entering the HCP list.

Table 2. Top 17 publication venues of HCPs.
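The two ratios used for the journal analysis are simple to reproduce. A minimal Python sketch, using the 102/143 figure from the text; the function names are illustrative, and the total of 500 papers in the second call is a made-up value, since the TP column of Table 2 is not reproduced here.

```python
def concentration_share(hcps_in_listed_journals: int, total_hcps: int) -> float:
    """Percentage of all HCPs that appeared in the listed journals."""
    return 100 * hcps_in_listed_journals / total_hcps


def hcp_percentage(n_hcps: int, total_papers: int) -> float:
    """HCPs/TP: a journal's HCPs as a percentage of all papers it published."""
    return 100 * n_hcps / total_papers


print(round(concentration_share(102, 143), 2))  # 71.33
print(hcp_percentage(10, 500))                  # 2.0 (hypothetical TP = 500)
```

Dividing by TP, rather than ranking on raw HCP counts, keeps high-output journals from dominating simply because they publish more papers.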

To gauge the overall impact of each journal’s HCPs, we divided their total citations (TC) by the number of HCPs to obtain the average citations per HCP (TC/HCP). The three journals with the highest TC/HCP are Journal of Memory and Language (837.86), Computational Linguistics (533.75), and Journal of Pragmatics (303.75). This indicates that, even within the same WoS category, HCPs in different journals differ strikingly in their capacity to accumulate citations. For example, the TC/HCP of System is only 31.73, less than 4% of that of Journal of Memory and Language.
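A quick Python check of the "less than 4%" comparison, using the two TC/HCP values reported above; the `citations_per_hcp` helper and its example inputs are illustrative only.

```python
def citations_per_hcp(total_citations: int, n_hcps: int) -> float:
    """TC/HCP: total citations of a journal's HCPs divided by their number."""
    return total_citations / n_hcps


# TC/HCP values reported in the text.
memory_and_language = 837.86
system = 31.73

ratio = system / memory_and_language
print(f"{ratio:.1%}")              # 3.8%
print(citations_per_hcp(1000, 4))  # 250.0 (illustrative inputs)
```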

As for the latest (2022) journal impact factors (IF), the four journals with the highest IF are Computational Linguistics (7.778), Modern Language Journal (7.5), Computer Assisted Language Learning (5.964), and Language Learning (5.24). According to the Journal Citation Reports (JCR) quartile rankings in the WoS category of linguistics, all the journals on the list belong to Q1 (the top 25%), suggesting that authors are drawn to publish in, and cite papers from, these prestigious high-impact journals.

4.2. Authors of HCPs

A total of 352 authors are listed on the 143 HCPs, of whom 33 appear on at least 2 HCPs, as shown in Table 3. Table 3 also provides other indicators of the authors’ productivity and impact, including total citations (TC), citations per HCP, and the number of HCPs on which the author is first or corresponding author (FA/CA). We include the FA/CA indicator because first and corresponding authors are usually considered to contribute the most and to deserve a greater share of credit in academic publications (Marui et al., 2004; Dance, 2012).

Table 3. Authors with at least 2 HCPs.

In terms of the number of HCPs, Dewaele JM from Birkbeck Univ London tops the list with 7 HCPs and 492 total citations (TC = 492), followed by Li C from Huazhong Univ Sci & Technol (#HCPs = 5; TC = 215) and Saito K from UCL (#HCPs = 5; TC = 576). Notably, both Li C and Saito K collaborate closely with Dewaele JM; for example, 3 of the 5 HCPs by Li C are co-authored with Dewaele JM. Their co-authored HCPs mostly address foreign language learning emotions such as boredom, anxiety, and enjoyment, their measurement, and positive psychology.

In terms of TC, Li W from UCL stands out as the most influential scholar among the listed authors, with 956 total citations from 2 HCPs, followed by Norton B from Univ British Columbia (TC = 915) and Vasishth S from Univ Potsdam (TC = 694). Their average citations per HCP are also the highest among the listed authors (478, 305, and 347, respectively). Li W’s 2 HCPs are his groundbreaking works on translanguaging, which have become near-essential reading for anyone engaged in translanguaging research (Li, 2011, 2018). Moreover, Li W is the sole author of both HCPs, which is extremely rare, as HCPs are usually the product of multiple researchers. Norton B’s HCPs explore core issues in applied linguistics, such as identity and investment, language learning, and social change, and are considered foundational work in the field (Norton and Toohey, 2011; Darvin and Norton, 2015).

From the perspective of FA/CA papers, Li C from Huazhong Univ Sci & Technol is prominent: she is the first author of all 5 of her HCPs. Her research on language learning emotions in the Chinese context is gaining widespread recognition (Li et al., 2018, 2019, 2021; Li, 2019, 2021). However, as a newly emerging researcher, she published most of her HCPs very recently, so they have accumulated relatively few citations (TC = 215). Mondada L from Univ Basel follows closely and is the sole author of all 3 of her HCPs. Her work is mostly devoted to conversation analysis, multimodality, and social interaction (Mondada, 2016, 2018, 2019).

Several points should be noted regarding the productive authors of HCPs. First, when we counted each author’s HCPs, only papers published in journals indexed in the investigated WoS categories (linguistics; language & linguistics) were taken into account, a compromise made to preserve the linguistic orientation of the HCPs. For example, Brysbaert M from Ghent University had a total of 8 HCPs at the time of data retrieval, 6 of which were published in the WoS category of psychology and were more psychologically oriented; these were therefore not included in our study. In addition, all authors on a byline were treated equally when counting HCPs, regardless of author order. As a result, some influential authors may not appear on the list because they have comparatively fewer HCPs. Second, as some authors reported different affiliations at different career stages, we provide only their most recent affiliation for convenience. Third, it is highly competitive to have one’s work selected as an HCP. That most HCP authors do not appear on our productive author list does not diminish their great contributions to the field, and the rankings in Table 3 do not necessarily reflect the recognition authors have earned in academia at large.
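The counting rules described above (a journal-category filter, full counting that ignores byline order, and a separate first/corresponding-author tally) can be sketched in Python. The `Paper` record, its field names, and the sample records are hypothetical; real WoS exports use their own field tags.

```python
from collections import Counter
from dataclasses import dataclass

LINGUISTICS = {"Linguistics", "Language & Linguistics"}  # investigated WoS categories


@dataclass
class Paper:
    authors: list[str]       # byline order
    corresponding: set[str]  # corresponding author(s)
    categories: set[str]     # WoS categories of the publishing journal
    citations: int


def author_stats(papers):
    """Return (#HCPs, total citations, #FA/CA HCPs) per author as Counters."""
    hcps, tc, fa_ca = Counter(), Counter(), Counter()
    for p in papers:
        if not p.categories & LINGUISTICS:
            continue  # e.g. a psychology-only journal is left out
        for name in set(p.authors):  # full counting: author order is ignored
            hcps[name] += 1
            tc[name] += p.citations
        for name in {p.authors[0]} | p.corresponding:  # first or corresponding
            fa_ca[name] += 1
    return hcps, tc, fa_ca


sample = [
    Paper(["A", "B"], {"B"}, {"Linguistics"}, 100),
    Paper(["B", "C"], {"B"}, {"Language & Linguistics", "Psychology"}, 50),
    Paper(["A"], {"A"}, {"Psychology"}, 900),  # excluded by the category filter
]
hcps, tc, fa_ca = author_stats(sample)
print(hcps["B"], tc["B"], fa_ca["B"])  # 2 150 2
```

A journal indexed in both psychology and linguistics still passes the filter, since only one matching category is required.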

4.3. Productive countries of HCPs

In total, the 143 HCPs originated from 33 countries. The most productive countries, each contributing at least three HCPs, are listed in Table 4. The USA took an overwhelming lead with 59 HCPs, followed distantly by England with 31. These two countries also had the highest total citations (TC = 15,770 and 9,840, respectively), reflecting their high productivity and strong influence as traditional powerhouses of linguistics research. In terms of average citations per HCP, Germany, England, and the USA were the top three countries (TC/HCP = 281.67, 281.14, and 267.29, respectively). Although China ranked third with 19 HCPs, its TC/HCP is the third lowest (TC/HCP = 66.84). One important reason is that 13 of the 19 HCPs contributed by scholars in China were published in 2020 or 2021, and newly published HCPs may need more time to accumulate citations. In addition, 18 of the 19 HCPs from China have a first and/or corresponding author based in China, indicating that scholars in China are becoming more independent and gaining more voice in English linguistics research.

Table 4. Top 18 countries with at least 3 HCPs.

Two points should be noted regarding the productive countries. First, we calculated HCP contributions at the country level rather than the regional level, so contributions from different regions of the same country were combined. For example, HCPs from Scotland were added to those from England, and HCPs from Hong Kong, Macau, and Taiwan were grouped with those from Mainland China. This gives a clear country-level picture of the HCPs. Second, we manually checked the address information of the first and corresponding author of each HCP. In some cases, the first or corresponding author reports affiliations in more than one country; every country in such an address list was treated equally in the FA/CA calculation. In other words, an HCP may be assigned to more than one country because of the different country backgrounds of its first and/or corresponding author.
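A minimal Python sketch of these two rules: region names are mapped to the country under which the text says they were counted, each country is credited at most once per paper, and one paper can be credited to several countries. The record layout, the use of all author addresses for the HCP tally, and the sample data are assumptions; only the region merges come from the text.

```python
from collections import Counter

# Region-to-country merges described in the text; other names pass through unchanged.
REGION_TO_COUNTRY = {
    "Scotland": "England",
    "Hong Kong": "China",
    "Macau": "China",
    "Taiwan": "China",
}


def to_country(name: str) -> str:
    return REGION_TO_COUNTRY.get(name, name)


def country_counts(papers):
    """papers: dicts with 'countries' (all author addresses) and 'fa_ca_countries'."""
    hcps, fa_ca = Counter(), Counter()
    for p in papers:
        # A set credits each country once per paper, even if it is named repeatedly.
        hcps.update({to_country(c) for c in p["countries"]})
        fa_ca.update({to_country(c) for c in p["fa_ca_countries"]})
    return hcps, fa_ca


sample = [
    {"countries": ["Hong Kong", "China", "USA"], "fa_ca_countries": ["Hong Kong", "USA"]},
    {"countries": ["Scotland", "England"], "fa_ca_countries": ["Scotland"]},
]
hcps, fa_ca = country_counts(sample)
print(sorted(hcps.items()))  # [('China', 1), ('England', 1), ('USA', 1)]
```

In the first sample paper, Hong Kong and China collapse into one China credit, while the USA is credited as well; this is how a single HCP can appear under more than one country.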

4.4. Top 20 HCPs

The top 20 HCPs by normalized citations are listed in decreasing order in Table 5. These top-cited publications can help us better understand developments and research topics in recent years.

Table 5. Top 20 HCPs.

Based on the titles and abstracts of these top HCPs, we grouped their topics into five categories: (i) statistical and analytical methods in (psycho)linguistics, such as sentiment analysis, sentence simplification techniques, effect sizes, and linear mixed models (#1, 3, 4, 6, 9, 14); (ii) language learning/teaching emotions such as enjoyment, anxiety, boredom, and stress (#11, 15, 16, 18, 19); (iii) translanguaging or multilingualism (#5, 13, 17, 20); (iv) language perception (#2, 7, 10); and (v) medium of instruction (#8, 12). It is unsurprising that 6 of the top 20 HCPs concern statistical methods in linguistics, as language researchers aspire to use statistics to make their research more scientific. We also noticed that the papers on language teaching/learning emotions on the list were all published in 2020 and 2021, suggesting that these emerging topics may deserve more attention in future research. In addition, two Covid-19-related articles (#16, 19) explored the emotions teachers and students experienced during the pandemic, a timely response to the urgent needs of the language learning and teaching community.

Notably, papers from journals indexed in multiple JCR categories seem to accumulate more citations. For example, Journal of Memory and Language, American Journal of Speech-Language Pathology, and Computational Linguistics are indexed in both SSCI and SCIE and contribute the top 4 HCPs, showing the advantage of these hybrid journals over conventional language journals in amassing citations. Moreover, unlike Yan et al. (2022), who found that most of the top HCPs in radiology are reviews, 19 of the top 20 HCPs here are research articles; the only review is Macaro et al. (2018).

4.5. Most frequently explored topics of HCPs

After obtaining the corpus-based topic items, we read the titles and abstracts of all 143 HCPs to further validate their roles as research topics. Table 6 presents the top research topics with an observed frequency of 5 or above. We grouped these topics into five broad categories: bilingual-related, language learning/teaching-related, psycho/pathological/cognitive linguistics-related, methods and tools-related, and others. The observed frequency of each topic in the abstract corpus is given in brackets. We found that about 34 of the 143 HCPs explore bilingual-related issues, the largest share among the categories, attesting to the topic's academic popularity in the examined timespan. In addition, 30 of the 143 HCPs investigate language learning/teaching-related issues, with topics ranging from learners (e.g., EFL learners, individual differences) to multiple learning variables (e.g., learning strategy, motivation, agency). These findings will be validated by the keyword analysis.
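The frequency threshold for Table 6, together with the "frequency and range criteria" mentioned in the Discussion, can be illustrated with a short sketch. The function below counts, for each n-gram, its total frequency across all abstracts and its range (the number of abstracts, i.e. HCPs, it occurs in), then keeps items that meet both thresholds. It is an illustration only, not the authors' procedure. The simple regex tokenizer, the 2-gram default, and the `min_range=2` cut-off are assumptions: the text states only a frequency of 5 or above and that items should occur in multiple HCPs.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into word tokens (hyphenated words stay whole)."""
    return re.findall(r"[a-z]+(?:-[a-z]+)*", text.lower())


def ngrams(tokens, n):
    """Return the contiguous n-grams of a token list, each joined by single spaces."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def candidate_topics(abstracts, n=2, min_freq=5, min_range=2):
    """Return (n-gram, frequency, range) triples that meet both thresholds.

    frequency = total occurrences across all abstracts
    range     = number of abstracts (i.e. HCPs) the n-gram occurs in
    Results are sorted by descending frequency, then alphabetically.
    """
    freq, rng = Counter(), Counter()
    for abstract in abstracts:
        grams = ngrams(tokenize(abstract), n)
        freq.update(grams)
        rng.update(set(grams))  # a set, so each abstract adds at most 1 to the range
    kept = [(g, freq[g], rng[g]) for g in freq
            if freq[g] >= min_freq and rng[g] >= min_range]
    return sorted(kept, key=lambda item: (-item[1], item[0]))
```

On a toy corpus you would lower `min_freq` (for example `candidate_topics(abstracts, min_freq=2)`). The range criterion is what stops a term that one abstract repeats many times from counting as a field-level topic.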

Table 6. Categorization of the most explored research topics.

Several points should be noted regarding topic candidacy. First, for similar topic expressions, we used a cover term and summed the frequency counts. For example, multilingualism is a cover term for bilinguals, bilingualism, plurilingualism, and multilingualism. Second, for singular and plural forms of a noun (e.g., emotion and emotions) or for items with different spellings (e.g., meta analysis and meta analyses), we combined the frequency counts. Third, some longer items (3-grams and 4-grams) could be subsumed under shorter ones (2-grams or monograms) without loss of essential meaning (e.g., working memory from working memory capacity). In such cases, the shorter items were kept because of their higher frequency. Fourth, some highly frequent terms were discarded because they were too general to be valuable topics in language research, for example, applied linguistics, language use, and second language.
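The four clean-up rules above can be sketched as a single pass over a `{term: frequency}` mapping. Every mapping in this sketch is hypothetical and holds only the examples quoted in the text, because the authors built their lists by hand. Two further assumptions apply. First, a subsumed longer item is dropped rather than added to the shorter one, since its occurrences are already counted inside the shorter n-gram. Second, the frequency threshold of 5 is applied after merging.

```python
from collections import Counter

# Hypothetical hand-built lists; the entries are only the examples quoted in the text.
COVER_TERMS = {  # rules 1-2: variant form -> label whose count absorbs it
    "bilinguals": "multilingualism",
    "bilingualism": "multilingualism",
    "plurilingualism": "multilingualism",
    "emotions": "emotion",
    "meta analyses": "meta analysis",
}
# Rule 3: longer item -> the shorter item that is kept (the value documents the mapping).
SUBSUMED = {"working memory capacity": "working memory"}
TOO_GENERAL = {"applied linguistics", "language use", "second language"}  # rule 4


def merge_topics(counts, min_freq=5):
    """Apply the four clean-up rules to {term: frequency}, then keep items with frequency >= min_freq."""
    merged = Counter()
    for term, n in counts.items():
        if term in TOO_GENERAL:
            continue  # rule 4: too general to be a research topic
        if term in SUBSUMED:
            # rule 3: each occurrence of the longer n-gram already contains the shorter one,
            # so its count is dropped rather than added (adding would double-count)
            continue
        merged[COVER_TERMS.get(term, term)] += n  # rules 1-2: pool counts under one label
    return {term: n for term, n in merged.items() if n >= min_freq}
```

Keeping the mappings as plain data makes each human judgement auditable, which matters for the subjectivity the authors acknowledge.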

5. Discussion and implications

Based on 143 highly cited papers collected from the WoS category of linguistics, the present study attempts to present a bird’s eye view of the publication landscape and the most recent research themes reflected in the HCPs in the linguistics field. Specifically, we investigated the important contributors of HCPs in terms of journals, authors, and countries. In addition, we spotlighted the research topics through a corpus-based analysis of the abstracts and a detailed analysis of the top HCPs. The study has produced several findings with important implications.

The first finding is that the HCPs are highly concentrated in a limited number of journals and countries. Regarding journals, those in bilingualism and applied linguistics (e.g., language teaching and learning) are likely to accumulate more citations and hence produce more HCPs. Journals that approach bilingualism from linguistic, psycholinguistic, and neuroscientific perspectives are the most frequent outlets of HCPs, as evidenced by the two most productive journals, Bilingualism: Language and Cognition and International Journal of Bilingual Education and Bilingualism. This can be explained by the multidisciplinary nature of bilingualism research and the development of cognitive measurement techniques. Analyzing the publication venues of HCPs has two benefits. On the one hand, it shows readers which sources of high-quality publications in this field to consult, as most of the significant and cutting-edge achievements are concentrated in these prestigious journals. On the other hand, it gives authors essential guidance on where to submit their work for higher visibility.

In terms of country distribution, traditional powerhouses in linguistics research such as the USA and England are leading HCP publication in both the number of HCPs and their citations. However, developing countries such as China and Iran are becoming increasingly prominent, which may be traced to funding and to support from national language and development policies, as reported in previous studies (Ping et al., 2009; Lei and Liu, 2019). Take China as an example. Along with economic development, China has given more impetus to academic output through increased investment in scientific research (Lei and Liao, 2017). As a result, researchers in China are highly motivated to publish in high-quality journals, both to win recognition in international academia and to cope with publish-or-perish pressure (Lee, 2014). These factors may explain the rise of China as an emerging research powerhouse in both the natural and social sciences, including English linguistics research.

The second finding is the multilingual trend in linguistics research. The dominant clustering of topics around multilingualism can be understood as a timely response to the multilingual research fever (May, 2014). Thirty-four of the 143 HCPs contain words such as bilingualism, bilingual, multilingualism, or translanguaging in their titles, reflecting a strong multilingual tendency among the HCPs. Multilingualism-related HCPs mainly involve three aspects: multilingualism from psycholinguistic and cognitive perspectives (e.g., Luk et al., 2011; Leivada et al., 2020); multilingual teaching (e.g., Schissel et al., 2018; Ortega, 2019; Archila et al., 2021); and language policies related to multilingualism (e.g., Shen and Gao, 2018). Translanguaging, a pedagogical process initially used to describe bilingual classroom practice and a frequently explored topic in the HCPs, has developed into an applied linguistics theory since Li’s Translanguaging as a Practical Theory of Language (Li, 2018). The most common collocates of translanguaging in the Abstract corpus are pedagogy/pedagogies, practices, and space/spaces. There are two main reasons for this multilingual turn. First, the rapid growth of globalization, immigration, and overseas study programs has greatly stimulated the use and study of multiple languages in different linguistic contexts. Second, in many non-English-speaking countries, courses are delivered in languages other than the students’ mother tongue, mostly English (Clark, 2017). Students are required to use multiple languages as resources to learn and understand subjects and ideas. The burgeoning body of literature on English Medium Instruction in higher education is in line with this rising interest in multilingualism. Given its inherently multidisciplinary nature, multilingualism, the topic du jour, is likely to attract even more attention in the future.

The third finding is the application of Positive Psychology (PP) in second language acquisition (SLA), that is, the positive trend in linguistic research. In our analysis, 20 of the 143 HCPs have words or phrases such as emotions, enjoyment, boredom, anxiety, and positive psychology in their titles, which may signal a shift of interest toward the psychology of language learners and teachers in different linguistic environments. Our study shows that foreign language enjoyment (FLE) is the most frequently explored emotion, followed by foreign language classroom anxiety (FLCA); these are the learner’s metaphorical left and right feet on the journey to acquiring a foreign language (Dewaele and MacIntyre, 2016). In fact, PP topics are not entirely new to SLA. For example, studies of language motivation, affect, and good language learners all provide roots for the emergence of PP in SLA (Naiman, 1978; Gardner, 2010). In recent years, both research on PP in SLA and its teaching applications have grown rapidly, with diverse topics already being explored, such as positive education and PP interventions. Notably, SLA also feeds back into PP theories and concepts in addition to drawing inspiration from them, which makes it “an area rich for interdisciplinary cross-fertilization of ideas” (Macintyre et al., 2019).

It should be noted that subjectivity is involved when we select and categorize the candidate topic items based on the Abstract corpus. However, the frequency and range criteria ensure that these items are actually explored across multiple HCPs, which indicates their value as topics for further investigation. Some highly frequent n-grams were discarded because they are too general or are not meaningful topics. For example, applied linguistics is too broad to be included, as most of the HCPs concern issues in this research line rather than in theoretical linguistics. By meaningful topics, we mean topics that help journal editors and readers quickly locate their fields of interest (Lei and Liu, 2019), as author keywords such as bilingualism, emotions, and individual differences do. Examining the few 3/4-grams and monograms (mostly nouns) revealed that most of them were either not meaningful topics or could be subsumed under the 2-grams. In addition, there is inevitably some overlap among the topic categories. For example, some topics in the language teaching and learning category are situated and discussed within the context of multilingualism. Categorizing the topics serves two purposes: to better monitor the overlap between the Abstract corpus-based topic items and the keywords, and to roughly delineate the research strands in the HCPs for future research.

It should also be noted that all the results are based on the retrieved HCPs only. The study did not aim to paint a comprehensive picture of the whole landscape of linguistic research. Rather, it focused specifically on the most highly cited literature in a specified timeframe, thus generating snapshots of trends in linguistic research. One important merit of this methodology is that some newly emerging but highly cited researchers can be spotlighted and gain more academic attention, because only the metrics of HCPs are considered in the calculation. Conversely, the absence of some researchers who are highly cited in general, such as Rod Ellis and Ken Hyland, only indicates that their highly cited publications fall outside our investigated timeframe; it should not be interpreted as a sign of diminishing academic influence in the field. In addition, the study does not account for collaborators or collaboration when counting the number of HCPs per author, for two reasons. First, although some researchers are regular collaborators, such as Li CC and Dewaele JM, their individual contributions should not be discounted. Second, the study also reports the number of first-author/corresponding-author (FA/CA) HCPs for each listed author, which may help readers locate research of interest.
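The author-counting scheme described above, with no discount for collaboration plus a separate FA/CA tally, can be sketched as full counting. This is a minimal sketch under two assumptions: each HCP is represented by a hypothetical `Paper` record holding the byline and the corresponding author(s), and an author who is both first and corresponding author on the same paper is credited once in the FA/CA column.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class Paper:
    authors: list[str]                                      # byline order; authors[0] is the first author (FA)
    corresponding: list[str] = field(default_factory=list)  # corresponding author(s) (CA)


def author_counts(papers):
    """Full counting: every co-author of an HCP is credited with one paper, regardless of team size.

    Returns (total, fa_ca): total HCPs per author, and HCPs in which the author is
    first or corresponding author (counted once even if the author is both).
    """
    total, fa_ca = Counter(), Counter()
    for paper in papers:
        total.update(set(paper.authors))                      # one credit per listed author
        fa_ca.update({paper.authors[0], *paper.corresponding})  # a set, so FA-and-CA counts once
    return total, fa_ca
```

Under this scheme two regular collaborators each receive full credit for a joint paper. The FA/CA column then shows who led it, which mirrors the two reasons given above.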

We acknowledge that our study has some limitations that should be addressed in future research. First, our study focuses on HCPs extracted from WoS SSCI and A&HCI journals, arguably the most celebrated papers in this field. Future studies may include data from other databases such as Scopus to verify the findings of the present study. Second, our Abstract corpus-based method for topic extraction involved human judgement. Although the final list was the result of several rounds of discussion among the authors, it is difficult or even impossible to avoid subjectivity, and some worthy topics may have been unintentionally missed. Therefore, future research may employ automatic algorithms to extract topics. For example, a dependency-based machine learning approach can be used to identify research topics (Zhu and Lei, 2021).

Data availability statement

The datasets presented in this study can be found in online repositories. The names of the repository/repositories and accession number(s) can be found in the article/supplementary material.

Author contributions

SY: conceptualization and methodology. SY and LZ: writing-review and editing and writing-original draft. All authors contributed to the article and approved the submitted version.

Funding

This work was supported by the Humanities and Social Sciences Youth Fund of the Ministry of Education of China under grants 20YJC740076 and 18YJC740141.

Conflict of interest

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Publisher’s note

All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article, or claim that may be made by its manufacturer, is not guaranteed or endorsed by the publisher.

Supplementary material

The Supplementary material for this article can be found online at: https://www.frontiersin.org/articles/10.3389/fpsyg.2022.1052586/full#supplementary-material

Aksnes, D. W. (2003). Characteristics of highly cited papers. Res. Eval. 12, 159–170. doi: 10.3152/147154403781776645

Anthony, L. (2022). AntConc (version 4.0.5). Tokyo, Japan: Waseda University. Available at: https://www.laurenceanthony.net/software (Accessed June 20, 2022).

Archila, P. A., Molina, J., and Truscott de Mejía, A.-M. (2021). Fostering bilingual scientific writing through a systematic and purposeful code-switching pedagogical strategy. Int. J. Biling. Educ. Biling. 24, 785–803. doi: 10.1080/13670050.2018.1516189

Blessinger, K., and Hrycaj, P. (2010). Highly cited articles in library and information science: an analysis of content and authorship trends. Libr. Inf. Sci. Res. 32, 156–162. doi: 10.1016/j.lisr.2009.12.007

Chen, H., and Ho, Y. S. (2015). Highly cited articles in biomass research: a bibliometric analysis. Renew. Sust. Energ. Rev. 49, 12–20. doi: 10.1016/j.rser.2015.04.060

Clark, S. (2017). Translanguaging in higher education: beyond monolingual ideologies. Int. J. Biling. Educ. Biling. 22, 1048–1051. doi: 10.1080/13670050.2017.1322568

Dance, A. (2012). Authorship: Who’s on first? Nature 489, 591–593. doi: 10.1038/nj7417-591a

Danell, R. (2011). Can the quality of scientific work be predicted using information on the author’s track record? J. Am. Soc. Inf. Sci. Technol. 62, 50–60. doi: 10.1002/asi.21454

Darvin, R., and Norton, B. (2015). Identity and a model of Investment in Applied Linguistics. Annu. Rev. Appl. Linguist. 35, 36–56. doi: 10.1017/S0267190514000191

Dewaele, J.-M., and MacIntyre, P. D. (2016). “Foreign language enjoyment and foreign language classroom anxiety: the right and left feet of the language learner” in Positive psychology in SLA. eds. P. D. MacIntyre, T. Gregersen, and S. Mercer (Bristol, Blue Ridge Summit: Multilingual Matters), 215–236.

Gardner, R. (2010). Motivation and second language acquisition: The socio-educational model . New York: Peter Lang.

Gong, Y., Lyu, B., and Gao, X. (2018). Research on teaching Chinese as a second or foreign language in and outside mainland China: a bibliometric analysis. Asia Pac. Educ. Res. 27, 277–289. doi: 10.1007/s40299-018-0385-2

Hsu, Y., and Ho, Y. S. (2014). Highly cited articles in health care sciences and services field in science citation index Expanded. Methods Inf. Med. 53, 446–458. doi: 10.3414/ME14-01-0022

Lee, I. (2014). Publish or perish: the myth and reality of academic publishing. Lang. Teach. 47, 250–261. doi: 10.1017/S0261444811000504

Lei, L., and Liao, S. (2017). Publications in linguistics journals from mainland China, Hong Kong, Taiwan, and Macau (2003–2012): a bibliometric analysis. J. Quant. Ling. 24, 54–64. doi: 10.1080/09296174.2016.1260274

Lei, L., and Liu, D. (2018). The research trends and contributions of System’s publications over the past four decades (1973–2017): a bibliometric analysis. System 80, 1–13. doi: 10.1016/j.system.2018.10.003

Lei, L., and Liu, D. (2019). Research trends in applied linguistics from 2005 to 2016: a bibliometric analysis and its implications. Appl. Linguis. 40, 540–561. doi: 10.1093/applin/amy003

Leivada, E., Westergaard, M., Duñabeitia, J. A., and Rothman, J. (2020). On the phantom-like appearance of bilingualism effects on neurocognition: (how) should we proceed? Biling. Lang. Cogn. 24, 197–210. doi: 10.1017/S1366728920000358

Li, W. (2011). Moment analysis and translanguaging space: discursive construction of identities by multilingual Chinese youth in Britain. J. Pragmat. 43, 1222–1235. doi: 10.1016/j.pragma.2010.07.035

Li, W. (2018). Translanguaging as a practical theory of language. Appl. Linguis. 39, 9–30. doi: 10.1093/applin/amx039

Li, C. (2019). A positive psychology perspective on Chinese EFL students’ trait emotional intelligence, foreign language enjoyment and EFL learning achievement. J. Multiling. Multicult. Dev. 41, 246–263. doi: 10.1080/01434632.2019.1614187

Li, C. (2021). A control-value theory approach to boredom in English classes among university students in China. Mod. Lang. J. 105, 317–334. doi: 10.1111/modl.12693

Li, C., Dewaele, J. M., and Hu, Y. (2021). Foreign language learning boredom: conceptualization and measurement. Appl. Ling. Rev. doi: 10.1515/applirev-2020-0124

Li, C., Dewaele, J. M., and Jiang, G. (2019). The complex relationship between classroom emotions and EFL achievement in China. Appl. Ling. Rev. 11, 485–510. doi: 10.1515/applirev-2018-0043

Li, C., Jiang, G., and Dewaele, J.-M. (2018). Understanding Chinese high school students’ foreign language enjoyment: validation of the Chinese version of the foreign language enjoyment scale. System 76, 183–196. doi: 10.1016/j.system.2018.06.004

Liao, S., and Lei, L. (2017). What we talk about when we talk about corpus: a bibliometric analysis of corpus-related research in linguistics (2000-2015). Glottometrics 38, 1–20.

Liao, H., Tang, M., Li, Z., and Lev, B. (2018). Bibliometric analysis for highly cited papers in operations research and management science from 2008 to 2017 based on essential science indicators. Omega 88, 223–236. doi: 10.1016/j.omega.2018.11.005

Luk, G., De Sa, E., and Bialystok, E. (2011). Is there a relation between onset age of bilingualism and enhancement of cognitive control? Biling. Lang. Cogn. 14, 588–595. doi: 10.1017/S1366728911000010

Macaro, E., Curle, S., Pun, J., and Dearden, J. (2018). A systematic review of English medium instruction in higher education. Lang. Teach. 51, 36–76. doi: 10.1017/S0261444817000350

Macintyre, P., Gregersen, T., and Mercer, S. (2019). Setting an agenda for positive psychology in SLA: theory, practice, and research. Mod. Lang. J. 103, 262–274. doi: 10.1111/modl.12544

Mancebo, F. P., Sapena, A. F., Herrera, M. V., González, L., Toca, H., and Benavent, R. A. (2013). Scientific literature analysis of judo in web of science. Arch. Budo 9, 81–91. doi: 10.12659/AOB.883883

Marušić, M., Božikov, J., Katavić, V., Hren, D., Kljaković-Gašpić, M., and Marušić, A. (2004). Authorship in a small medical journal: a study of contributorship statements by corresponding authors. Sci. Eng. Ethics 10, 493–502. doi: 10.1007/s11948-004-0007-7

May, S. (2014). The multilingual turn: Implications for SLA, TESOL and bilingual education . New York: Routledge.

Mondada, L. (2016). Challenges of multimodality: language and the body in social interaction. J. Socioling. 20, 336–366. doi: 10.1111/josl.1_12177

Mondada, L. (2018). Multiple temporalities of language and body in interaction: challenges for transcribing multimodality. Res. Lang. Soc. Interact. 51, 85–106. doi: 10.1080/08351813.2018.1413878

Mondada, L. (2019). Contemporary issues in conversation analysis: embodiment and materiality, multimodality and multisensoriality in social interaction. J. Pragmat. 145, 47–62. doi: 10.1016/j.pragma.2019.01.016

Naiman, N. (1978). The good language learner . Clevedon, UK: Multilingual Matters.

Newman, M. (2008). The first-mover advantage in scientific publication. EPL 86:68001. doi: 10.1209/0295-5075/86/68001

Newman, M. (2014). Prediction of highly cited papers. EPL 105:28002. doi: 10.1209/0295-5075/105/28002

Norton, B., and Toohey, K. (2011). Identity, language learning, and social change. Lang. Teach. 44, 412–446. doi: 10.1017/S0261444811000309

Ortega, L. (2019). SLA and the study of equitable multilingualism. Mod. Lang. J. 103, 23–38. doi: 10.1111/modl.12525

Ping, Z., Thijs, B., and Glänzel, W. (2009). Is China also becoming a giant in social sciences? Scientometrics 79, 593–621. doi: 10.1007/s11192-007-2068-x

Pritchard, A. (1969). Statistical bibliography or bibliometrics. J. Doc. 25, 348–349.

Ríos, L. J. C., Tamao, I. M., and Olmos, J. (2013). Bibliometric study (1922-2009) on rugby articles in research journals. South Afr. J. Res. Sport Phys. Educ. Rec. 17, 313–109. doi: 10.3176/tr.2013.3.06

Ruggeri, G., Orsi, L., and Corsi, S. (2019). A bibliometric analysis of the scientific literature on Fairtrade labelling. Int. J. Consum. Stud. 43, 134–152. doi: 10.1111/ijcs.12492

Sabiote, C. R., and Rodríguez, J. A. (2015). Bibliometric study and methodological quality indicators of the journal porta Linguarum during six year period 2008-2013. Porta Ling. 24, 135–150. doi: 10.30827/Digibug.53866

Schissel, J. L., De Korne, H., and López-Gopar, M. E. (2018). Grappling with translanguaging for teaching and assessment in culturally and linguistically diverse contexts: teacher perspectives from Oaxaca, Mexico. Int. J. Biling. Educ. Biling. 24, 340–356. doi: 10.1080/13670050.2018.1463965

Shen, Q., and Gao, X. (2018). Multilingualism and policy making in greater China: ideological and implementational spaces. Lang. Policy 18, 1–16. doi: 10.1007/s10993-018-9473-7

Small, H. (2004). Why authors think their papers are highly cited. Scientometrics 60, 305–316. doi: 10.1023/B:SCIE.0000034376.55800.18

Smith, D. R. (2007). The New Zealand timber economy, 1840–1935. N. Z. Med. J. 120, U2871–U2313. doi: 10.1016/0305-7488(90)90044-C

van Doorslaer, L., and Gambier, Y. (2015). Measuring relationships in translation studies. On affiliations and keyword frequencies in the translation studies bibliography. Perspectives 23, 305–319. doi: 10.1080/0907676X.2015.1026360

van Oorschot, J. A. W. H., Hofman, E., and Halman, J. (2018). A bibliometric review of the innovation adoption literature. Technol. Forecast. Soc. Chang. 134, 1–21. doi: 10.1016/j.techfore.2018.04.032

Xie, Z., and Willett, P. (2013). The development of computer science research in the People’s republic of China 2000–2009: a bibliometric study. Inf. Dev. 29, 251–264. doi: 10.1177/0266666912458515

Yan, S., Zhang, H., and Wang, J. (2022). Trends and hot topics in radiology, nuclear medicine and medical imaging from 2011–2021: a bibliometric analysis of highly cited papers. Jpn. J. Radiol. 40, 847–856. doi: 10.1007/s11604-022-01268-z

Zhu, H., and Lei, L. (2021). A dependency-based machine learning approach to the identification of research topics: a case in COVID-19 studies. Lib. Hi Tech 40, 495–515. doi: 10.1108/LHT-01-2021-0051

Zhu, H., and Lei, L. (2022). The research trends of text classification studies (2000–2020): a bibliometric analysis. SAGE Open 12:21582440221089963. doi: 10.1177/21582440221089963

Keywords: bibliometric analysis, linguistics, highly cited papers, corpus analysis, research trends

Citation: Yan S and Zhang L (2023) Trends and hot topics in linguistics studies from 2011 to 2021: A bibliometric analysis of highly cited papers. Front. Psychol . 13:1052586. doi: 10.3389/fpsyg.2022.1052586

Received: 24 September 2022; Accepted: 23 December 2022; Published: 11 January 2023.

Copyright © 2023 Yan and Zhang. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY) . The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

*Correspondence: Le Zhang

  • Published: 28 August 2024

AI generates covertly racist decisions about people based on their dialect

  • Valentin Hofmann 1,2,3 (ORCID: orcid.org/0000-0001-6603-3428),
  • Pratyusha Ria Kalluri 4,
  • Dan Jurafsky 4 (ORCID: orcid.org/0000-0002-6459-7745) &
  • Sharese King 5

Nature (2024)


Hundreds of millions of people now interact with language models, with uses ranging from help with writing 1 , 2 to informing hiring decisions 3 . However, these language models are known to perpetuate systematic racial prejudices, making their judgements biased in problematic ways about groups such as African Americans 4 , 5 , 6 , 7 . Although previous research has focused on overt racism in language models, social scientists have argued that racism with a more subtle character has developed over time, particularly in the United States after the civil rights movement 8 , 9 . It is unknown whether this covert racism manifests in language models. Here, we demonstrate that language models embody covert racism in the form of dialect prejudice, exhibiting raciolinguistic stereotypes about speakers of African American English (AAE) that are more negative than any human stereotypes about African Americans ever experimentally recorded. By contrast, the language models’ overt stereotypes about African Americans are more positive. Dialect prejudice has the potential for harmful consequences: language models are more likely to suggest that speakers of AAE be assigned less-prestigious jobs, be convicted of crimes and be sentenced to death. Finally, we show that current practices of alleviating racial bias in language models, such as human preference alignment, exacerbate the discrepancy between covert and overt stereotypes, by superficially obscuring the racism that language models maintain on a deeper level. Our findings have far-reaching implications for the fair and safe use of language technology.

Language models are a type of artificial intelligence (AI) that has been trained to process and generate text. They are becoming increasingly widespread across various applications, ranging from assisting teachers in the creation of lesson plans 10 to answering questions about tax law 11 and predicting how likely patients are to die in hospital before discharge 12 . As the stakes of the decisions entrusted to language models rise, so does the concern that they mirror or even amplify human biases encoded in the data they were trained on, thereby perpetuating discrimination against racialized, gendered and other minoritized social groups 4 , 5 , 6 , 13 , 14 , 15 , 16 , 17 , 18 , 19 , 20 .

Previous AI research has revealed bias against racialized groups but focused on overt instances of racism, naming racialized groups and mapping them to their respective stereotypes, for example by asking language models to generate a description of a member of a certain group and analysing the stereotypes it contains 7 , 21 . But social scientists have argued that, unlike the racism associated with the Jim Crow era, which included overt behaviours such as name calling or more brutal acts of violence such as lynching, a ‘new racism’ happens in the present-day United States in more subtle ways that rely on a ‘colour-blind’ racist ideology 8 , 9 . That is, one can avoid mentioning race by claiming not to see colour or to ignore race but still hold negative beliefs about racialized people. Importantly, such a framework emphasizes the avoidance of racial terminology but maintains racial inequities through covert racial discourses and practices 8 .

Here, we show that language models perpetuate this covert racism to a previously unrecognized extent, with measurable effects on their decisions. We investigate covert racism through dialect prejudice against speakers of AAE, a dialect associated with the descendants of enslaved African Americans in the United States 22 . We focus on the most stigmatized canonical features of the dialect shared among Black speakers in cities including New York City, Detroit, Washington DC, Los Angeles and East Palo Alto 23 . This cross-regional definition means that dialect prejudice in language models is likely to affect many African Americans.

Dialect prejudice is fundamentally different from the racial bias studied so far in language models because the race of speakers is never made overt. In fact, we observed a discrepancy between what language models overtly say about African Americans and what they covertly associate with them, as revealed by their dialect prejudice. This discrepancy is particularly pronounced for language models trained with human feedback (HF), such as GPT-4: our results indicate that HF training obscures the racism on the surface, but the racial stereotypes remain unaffected on a deeper level. We propose a new method, which we call matched guise probing, that makes it possible to recover these masked stereotypes.

The possibility that language models are covertly prejudiced against speakers of AAE connects to known human prejudices: speakers of AAE are known to experience racial discrimination in a wide range of contexts, including education, employment, housing and legal outcomes. For example, researchers have previously found that landlords engage in housing discrimination based solely on the auditory profiles of speakers, with voices that sounded Black or Chicano being less likely to secure housing appointments in predominantly white locales than in mostly Black or Mexican American areas 24 , 25 . Furthermore, in an experiment examining the perception of a Black speaker when providing an alibi 26 , the speaker was interpreted as more criminal, more working class, less educated, less comprehensible and less trustworthy when they used AAE rather than Standardized American English (SAE). Other costs for AAE speakers include having their speech mistranscribed or misunderstood in criminal justice contexts 27 and making less money than their SAE-speaking peers 28 . These harms connect to themes in broader racial ideology about African Americans and stereotypes about their intelligence, competence and propensity to commit crimes 29 , 30 , 31 , 32 , 33 , 34 , 35 . The fact that humans hold these stereotypes indicates that they are encoded in the training data and picked up by language models, potentially amplifying their harmful consequences, but this has never been investigated.

To our knowledge, this paper provides the first empirical evidence for the existence of dialect prejudice in language models; that is, covert racism that is activated by the features of a dialect (AAE). Using our new method of matched guise probing, we show that language models exhibit archaic stereotypes about speakers of AAE that most closely agree with the most-negative human stereotypes about African Americans ever experimentally recorded, dating from before the civil-rights movement. Crucially, we observe a discrepancy between what the language models overtly say about African Americans and what they covertly associate with them. Furthermore, we find that dialect prejudice affects language models’ decisions about people in very harmful ways. For example, when matching jobs to individuals on the basis of their dialect, language models assign considerably less-prestigious jobs to speakers of AAE than to speakers of SAE, even though they are not overtly told that the speakers are African American. Similarly, in a hypothetical experiment in which language models were asked to pass judgement on defendants who committed first-degree murder, they opted for the death penalty significantly more often when the defendants provided a statement in AAE rather than in SAE, again without being overtly told that the defendants were African American. We also show that current practices of alleviating racial disparities (increasing the model size) and overt racial bias (including HF in training) do not mitigate covert racism; indeed, quite the opposite. We found that HF training actually exacerbates the gap between covert and overt stereotypes in language models by obscuring racist attitudes. Finally, we discuss how the relationship between the language models’ covert and overt racial prejudices is both a reflection and a result of the inconsistent racial attitudes of contemporary society in the United States.

Probing AI dialect prejudice

To explore how dialect choice impacts the predictions that language models make about speakers in the absence of other cues about their racial identity, we took inspiration from the ‘matched guise’ technique used in sociolinguistics, in which subjects listen to recordings of speakers of two languages or dialects and make judgements about various traits of those speakers 36 , 37 . Applying the matched guise technique to the AAE–SAE contrast, researchers have shown that people identify speakers of AAE as Black with above-chance accuracy 24 , 26 , 38 and attach racial stereotypes to them, even without prior knowledge of their race 39 , 40 , 41 , 42 , 43 . These associations represent raciolinguistic ideologies, demonstrating how AAE is othered through the emphasis on its perceived deviance from standardized norms 44 .

Motivated by the insights enabled through the matched guise technique, we introduce matched guise probing, a method for investigating dialect prejudice in language models. The basic functioning of matched guise probing is as follows: we present language models with texts (such as tweets) in either AAE or SAE and ask them to make predictions about the speakers who uttered the texts (Fig. 1 and Methods ). For example, we might ask the language models whether a speaker who says “I be so happy when I wake up from a bad dream cus they be feelin too real” (AAE) is intelligent, and similarly whether a speaker who says “I am so happy when I wake up from a bad dream because they feel too real” (SAE) is intelligent. Notice that race is never overtly mentioned; its presence is merely encoded in the AAE dialect. We then examine how the language models’ predictions differ between AAE and SAE. To ensure that any difference in the predictions is necessarily due to the AAE–SAE contrast, the language models are not given any extra information.
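The prompt construction described above can be sketched in Python as follows. This is a minimal illustration, not the authors' code: the first template follows the example prompt given in the Methods (‘a person who says t tends to be’), while the quotation marks around the text, the second template and the helper name `matched_guise_prompts` are assumptions made for this sketch.

```python
# Prompt templates: the first follows the example prompt from the Methods,
# the second is a hypothetical variant added for illustration only.
TEMPLATES = [
    'a person who says "{}" tends to be',
    'a person who says "{}" is',
]


def matched_guise_prompts(aae_text, sae_text, templates=TEMPLATES):
    """Embed an AAE text and its SAE counterpart in the same prompt templates.

    Each returned (AAE prompt, SAE prompt) pair differs only in the embedded
    text, so race is never mentioned and the dialect is the only thing that varies.
    """
    return [(tpl.format(aae_text), tpl.format(sae_text)) for tpl in templates]


aae = "I be so happy when I wake up from a bad dream cus they be feelin too real"
sae = "I am so happy when I wake up from a bad dream because they feel too real"

for aae_prompt, sae_prompt in matched_guise_prompts(aae, sae):
    print(aae_prompt)
    print(sae_prompt)
```

Each prompt pair would then be passed to the same language model, and the predictions for a trait such as ‘intelligent’ compared between the two guises.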

figure 1

a , We used texts in SAE (green) and AAE (blue). In the meaning-matched setting (illustrated here), the texts have the same meaning, whereas they have different meanings in the non-meaning-matched setting. b , We embedded the SAE and AAE texts in prompts that asked for properties of the speakers who uttered the texts. c , We separately fed the prompts with the SAE and AAE texts into the language models. d , We retrieved and compared the predictions for the SAE and AAE inputs, here illustrated by five adjectives from the Princeton Trilogy. See Methods for more details.

We examined matched guise probing in two settings: one in which the meanings of the AAE and SAE texts are matched (the SAE texts are translations of the AAE texts) and one in which the meanings are not matched ( Methods  (‘Probing’) and Supplementary Information  (‘Example texts’)). Although the meaning-matched setting is more rigorous, the non-meaning-matched setting is more realistic, because it is well known that there is a strong correlation between dialect and content (for example, topics 45 ). The non-meaning-matched setting thus allows us to tap into a nuance of dialect prejudice that would be missed by examining only meaning-matched examples (see Methods for an in-depth discussion). Because the results for both settings are highly consistent overall, we present them in aggregated form here but analyse the differences in the  Supplementary Information .

We examined GPT2 (ref. 46 ), RoBERTa 47 , T5 (ref. 48 ), GPT3.5 (ref. 49 ) and GPT4 (ref. 50 ), each in one or more model versions, amounting to a total of 12 examined models ( Methods and Supplementary Information (‘Language models’)). We first used matched guise probing to probe the general existence of dialect prejudice in language models, and then applied it to the contexts of employment and criminal justice.

Covert stereotypes in language models

We started by investigating whether the attitudes that language models exhibit about speakers of AAE reflect human stereotypes about African Americans. To do so, we replicated the experimental set-up of the Princeton Trilogy 29 , 30 , 31 , 34 , a series of studies investigating the racial stereotypes held by Americans, with the difference that instead of overtly mentioning race to the language models, we used matched guise probing based on AAE and SAE texts ( Methods ).

Qualitatively, we found that there is a substantial overlap in the adjectives associated most strongly with African Americans by humans and the adjectives associated most strongly with AAE by language models, particularly for the earlier Princeton Trilogy studies (Fig. 2a ). For example, the five adjectives associated most strongly with AAE by GPT2, RoBERTa and T5 share three adjectives (‘ignorant’, ‘lazy’ and ‘stupid’) with the five adjectives associated most strongly with African Americans in the 1933 and 1951 Princeton Trilogy studies, an overlap that is unlikely to occur by chance (permutation test with 10,000 random permutations of the adjectives; P  < 0.01). Furthermore, in place of the positive adjectives found in the human stereotypes (such as ‘musical’, ‘religious’ and ‘loyal’), the language models exhibit additional, exclusively negative associations (such as ‘dirty’, ‘rude’ and ‘aggressive’).
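The overlap test can be sketched as a one-sided permutation test. In this sketch only the three shared adjectives (‘ignorant’, ‘lazy’ and ‘stupid’) come from the text; the other list entries, the vocabulary size of 50 and the function names are placeholders, and the exact hypergeometric tail is included only as a cross-check. The paper's precise permutation procedure may differ.

```python
import math
import random


def overlap_permutation_test(human_top, model_top, vocabulary, n_perm=10_000, seed=0):
    """One-sided permutation test for the overlap between two top-k lists.

    Randomly permuting the adjective list and taking the first k items is
    equivalent to drawing k adjectives without replacement, which is done here.
    Returns the observed overlap and the estimated P value.
    """
    human = set(human_top)
    k = len(model_top)
    observed = len(human & set(model_top))
    rng = random.Random(seed)
    hits = sum(
        len(human & set(rng.sample(vocabulary, k))) >= observed
        for _ in range(n_perm)
    )
    # Add-one correction keeps the estimated P value strictly above zero.
    return observed, (hits + 1) / (n_perm + 1)


def exact_overlap_p(n_vocab, k_human, k_model, observed):
    """Exact hypergeometric tail P(overlap >= observed), for comparison."""
    tail = sum(
        math.comb(k_human, j) * math.comb(n_vocab - k_human, k_model - j)
        for j in range(observed, min(k_human, k_model) + 1)
    )
    return tail / math.comb(n_vocab, k_model)


# The three shared adjectives come from the text; everything else is a placeholder.
shared = ["ignorant", "lazy", "stupid"]
human_top = shared + ["human_only_1", "human_only_2"]
model_top = shared + ["model_only_1", "model_only_2"]
vocabulary = human_top + model_top[3:] + [f"filler_{i}" for i in range(43)]

observed, p_perm = overlap_permutation_test(human_top, model_top, vocabulary)
p_exact = exact_overlap_p(len(vocabulary), len(human_top), len(model_top), observed)
print(observed, round(p_perm, 4), round(p_exact, 4))
```

With a larger adjective vocabulary, the chance of a three-item overlap between two random top-five lists shrinks further.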

figure 2

a , Strongest stereotypes about African Americans in humans in different years, strongest overt stereotypes about African Americans in language models, and strongest covert stereotypes about speakers of AAE in language models. Colour coding as positive (green) and negative (red) is based on ref. 34 . Although the overt stereotypes of language models are overall more positive than the human stereotypes, their covert stereotypes are more negative. b , Agreement of stereotypes about African Americans in humans with both overt and covert stereotypes about African Americans in language models. The black dotted line shows chance agreement using a random bootstrap. Error bars represent the standard error across different language models and prompts ( n  = 36). The language models’ overt stereotypes agree most strongly with current human stereotypes, which are the most positive experimentally recorded ones, but their covert stereotypes agree most strongly with human stereotypes from the 1930s, which are the most negative experimentally recorded ones. c , Stereotype strength for individual linguistic features of AAE. Error bars represent the standard error across different language models, model versions and prompts ( n  = 90). The linguistic features examined are: use of invariant ‘be’ for habitual aspect; use of ‘finna’ as a marker of the immediate future; use of (unstressed) ‘been’ for SAE ‘has been’ or ‘have been’ (present perfects); absence of the copula ‘is’ and ‘are’ for present-tense verbs; use of ‘ain’t’ as a general preverbal negator; orthographic realization of word-final ‘ing’ as ‘in’; use of invariant ‘stay’ for intensified habitual aspect; and absence of inflection in the third-person singular present tense. The measured stereotype strength is significantly above zero for all examined linguistic features, indicating that they all evoke raciolinguistic stereotypes in language models, although the strength varies considerably between individual features. 
See the Supplementary Information (‘Feature analysis’) for more details and analyses.

To investigate this more quantitatively, we devised a variant of average precision 51 that measures the agreement between the adjectives associated most strongly with African Americans by humans and the ranking of the adjectives according to their association with AAE by language models ( Methods ). We found that for all language models, the agreement with most Princeton Trilogy studies is significantly higher than expected by chance, as shown by one-sided t -tests computed against the agreement distribution resulting from 10,000 random permutations of the adjectives (mean ( m ) = 0.162, standard deviation ( s ) = 0.106; Extended Data Table 1 ). The agreement is particularly pronounced for the stereotypes reported in 1933 and falls with each subsequent study, almost reaching the level of chance agreement for 2012 (Fig. 2b ). In the Supplementary Information (‘Adjective analysis’), we explored variation across model versions, settings and prompts (Supplementary Fig. 2 and Supplementary Table 4 ).
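The exact variant of average precision is specified in the Methods and is not reproduced here. As a hedged illustration, the sketch below uses the standard definition (the mean of precision@k over the ranks k at which the human top adjectives appear in the model's ranking) together with a permutation-based chance distribution like the one the t-tests are computed against. The function names and the toy adjective lists in the test are assumptions.

```python
import random
from statistics import mean, stdev


def average_precision(ranking, relevant):
    """Standard average precision of `ranking` (strongest association first)
    with respect to the set of `relevant` items, e.g. the human top adjectives."""
    relevant = set(relevant)
    if not relevant:
        return 0.0
    hits, total = 0, 0.0
    for k, item in enumerate(ranking, start=1):
        if item in relevant:
            hits += 1
            total += hits / k  # precision at rank k
    # Relevant items missing from the ranking contribute zero precision.
    return total / len(relevant)


def chance_distribution(ranking, relevant, n_perm=10_000, seed=0):
    """Mean and standard deviation of average precision under random
    permutations of the ranking."""
    rng = random.Random(seed)
    items = list(ranking)
    scores = []
    for _ in range(n_perm):
        rng.shuffle(items)
        scores.append(average_precision(items, relevant))
    return mean(scores), stdev(scores)
```

A model whose ranking places the human top adjectives near the top scores close to 1, well above the mean of the permutation distribution.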

To explain the observed temporal trend, we measured the average favourability of the top five adjectives for all Princeton Trilogy studies and language models, drawing from crowd-sourced ratings for the Princeton Trilogy adjectives on a scale between −2 (very negative) and 2 (very positive; see Methods , ‘Covert-stereotype analysis’). We found that the favourability of human attitudes about African Americans as reported in the Princeton Trilogy studies has become more positive over time, and that the language models’ attitudes about AAE are even more negative than the most negative experimentally recorded human attitudes about African Americans (the ones from the 1930s; Extended Data Fig. 1 ). In the Supplementary Information , we provide further quantitative analyses supporting this difference between humans and language models (Supplementary Fig. 7 ).

Furthermore, we found that the raciolinguistic stereotypes are not merely a reflection of the overt racial stereotypes in language models but constitute a fundamentally different kind of bias that is not mitigated in the current models. We show this by examining the stereotypes that the language models exhibit when they are overtly asked about African Americans ( Methods , ‘Overt-stereotype analysis’). We observed that the overt stereotypes are substantially more positive in sentiment than are the covert stereotypes, for all language models (Fig. 2a and Extended Data Fig. 1 ). Strikingly, for RoBERTa, T5, GPT3.5 and GPT4, although their covert stereotypes about speakers of AAE are more negative than the most negative experimentally recorded human stereotypes, their overt stereotypes about African Americans are more positive than the most positive experimentally recorded human stereotypes. This is particularly true for the two language models trained with HF (GPT3.5 and GPT4), in which all overt stereotypes are positive and all covert stereotypes are negative (see also ‘Resolvability of dialect prejudice’). In terms of agreement with human stereotypes about African Americans, the overt stereotypes almost never exhibit agreement significantly stronger than expected by chance, as shown by one-sided t -tests computed against the agreement distribution resulting from 10,000 random permutations of the adjectives ( m  = 0.162, s  = 0.106; Extended Data Table 2 ). Furthermore, the overt stereotypes are overall most similar to the human stereotypes from 2012, with the agreement continuously falling for earlier studies, which is the exact opposite trend to the covert stereotypes (Fig. 2b ).

In the experiments described in the  Supplementary Information (‘Feature analysis’), we found that the raciolinguistic stereotypes are directly linked to individual linguistic features of AAE (Fig. 2c and Supplementary Table 14 ), and that a higher density of such linguistic features results in stronger stereotypical associations (Supplementary Fig. 11 and Supplementary Table 13 ). Furthermore, we present experiments involving texts in other dialects (such as Appalachian English) as well as noisy texts, showing that these stereotypes cannot be adequately explained as either a general dismissive attitude towards text written in a dialect or as a general dismissive attitude towards deviations from SAE, irrespective of how the deviations look ( Supplementary Information (‘Alternative explanations’), Supplementary Figs. 12 and 13 and Supplementary Tables 15 and 16 ). Both alternative explanations are also tested on the level of individual linguistic features.

Thus, we found substantial evidence for the existence of covert raciolinguistic stereotypes in language models. Our experiments show that these stereotypes are similar to the archaic human stereotypes about African Americans that existed before the civil rights movement, are even more negative than the most negative experimentally recorded human stereotypes about African Americans, and are both qualitatively and quantitatively different from the previously reported overt racial stereotypes in language models, indicating that they are a fundamentally different kind of bias. Finally, our analyses demonstrate that the detected stereotypes are inherently linked to AAE and its linguistic features.

Impact of covert racism on AI decisions

To determine what harmful consequences the covert stereotypes have in the real world, we focused on two areas in which racial stereotypes about speakers of AAE and African Americans have been repeatedly shown to bias human decisions: employment and criminality. There is a growing impetus to use AI systems in these areas. Indeed, AI systems are already being used for personnel selection 52 , 53 , including automated analyses of applicants’ social-media posts 54 , 55 , and technologies for predicting legal outcomes are under active development 56 , 57 , 58 . Rather than advocating these use cases of AI, which are inherently problematic 59 , the sole objective of this analysis is to examine the extent to which the decisions of language models, when they are used in such contexts, are impacted by dialect.

First, we examined decisions about employability. Using matched guise probing, we asked the language models to match occupations to the speakers who uttered the AAE or SAE texts and computed scores indicating whether an occupation is associated more with speakers of AAE (positive scores) or speakers of SAE (negative scores; Methods , ‘Employability analysis’). The average score of the occupations was negative ( m  = −0.046,  s  = 0.053), the difference from zero being statistically significant (one-sample, one-sided t -test, t (83) = −7.9, P  < 0.001). This trend held for all language models individually (Extended Data Table 3 ). Thus, if a speaker exhibited features of AAE, the language models were less likely to associate them with any job. Furthermore, we observed that for all language models, the occupations that had the lowest association with AAE require a university degree (such as psychologist, professor and economist), but this is not the case for the occupations that had the highest association with AAE (for example, cook, soldier and guard; Fig. 3a ). Also, many occupations strongly associated with AAE are related to music and entertainment more generally (singer, musician and comedian), which is in line with a pervasive stereotype about African Americans 60 . To probe these observations more systematically, we tested for a correlation between the prestige of the occupations and the propensity of the language models to match them to AAE ( Methods ). Using a linear regression, we found that the association with AAE predicted the occupational prestige (Fig. 3b ; β  = −7.8, R 2 = 0.193, F (1, 63) = 15.1, P  < 0.001). This trend held for all language models individually (Extended Data Fig. 2 and Extended Data Table 4 ), albeit in a less pronounced way for GPT3.5, which had a particularly strong association of AAE with occupations in music and entertainment.
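The reported test statistics can be recovered from the summary numbers as a consistency check. The sketch below assumes that the one-sample t-test used n = 84 occupation scores (as implied by its 83 degrees of freedom) and that s is the sample standard deviation, and that the regression is a simple regression on 65 data points (as implied by F(1, 63)); the function names are hypothetical.

```python
import math


def one_sample_t(mean, sd, n, mu0=0.0):
    """t statistic for H0: population mean == mu0, from summary statistics
    (sd is the sample standard deviation)."""
    return (mean - mu0) / (sd / math.sqrt(n))


def f_from_r2(r2, df_model, df_resid):
    """Overall F statistic of a least-squares regression, recovered from R^2."""
    return (r2 / df_model) / ((1 - r2) / df_resid)


# t(83) implies n = 84 scores; F(1, 63) implies 65 points in a simple regression.
t = one_sample_t(-0.046, 0.053, 84)
F = f_from_r2(0.193, 1, 63)
print(f"t(83) = {t:.2f}, F(1, 63) = {F:.2f}")  # t ≈ -7.95, F ≈ 15.07
```

Both values agree with the reported t(83) = −7.9 and F(1, 63) = 15.1 up to the rounding of the published inputs.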

figure 3

a , Association of different occupations with AAE or SAE. Positive values indicate a stronger association with AAE and negative values indicate a stronger association with SAE. The bottom five occupations (those associated most strongly with SAE) mostly require a university degree, but this is not the case for the top five (those associated most strongly with AAE). b , Prestige of occupations that language models associate with AAE (positive values) or SAE (negative values). The shaded area shows a 95% confidence band around the regression line. The association with AAE or SAE predicts the occupational prestige. Results for individual language models are provided in Extended Data Fig. 2 . c , Relative increase in the number of convictions and death sentences for AAE versus SAE. Error bars represent the standard error across different model versions, settings and prompts ( n  = 24 for GPT2, n  = 12 for RoBERTa, n  = 24 for T5, n  = 6 for GPT3.5 and n  = 6 for GPT4). In cases of small sample size ( n  ≤ 10 for GPT3.5 and GPT4), we plotted the individual results as overlaid dots. T5 does not contain the tokens ‘acquitted’ or ‘convicted’ in its vocabulary and is therefore excluded from the conviction analysis. Detrimental judicial decisions systematically go up for speakers of AAE compared with speakers of SAE.

We then examined decisions about criminality. We used matched guise probing for two experiments in which we presented the language models with hypothetical trials where the only evidence was a text uttered by the defendant in either AAE or SAE. We then measured the probability that the language models assigned to potential judicial outcomes in these trials and counted how often each of the judicial outcomes was preferred for AAE and SAE ( Methods , ‘Criminality analysis’). In the first experiment, we told the language models that a person is accused of an unspecified crime and asked whether the models would convict or acquit the person solely on the basis of the AAE or SAE text. Overall, we found that the rate of convictions was greater for AAE ( r  = 68.7%) than SAE ( r  = 62.1%; Fig. 3c , left). A chi-squared test found a strong effect ( χ 2 (1,  N  = 96) = 184.7,  P  < 0.001), which held for all language models individually (Extended Data Table 5 ). In the second experiment, we specifically told the language models that the person committed first-degree murder and asked whether the models would sentence the person to life or death on the basis of the AAE or SAE text. The overall rate of death sentences was greater for AAE ( r  = 27.7%) than for SAE ( r  = 22.8%; Fig. 3c , right). A chi-squared test found a strong effect ( χ 2 (1,  N  = 144) = 425.4,  P  < 0.001), which held for all language models individually except for T5 (Extended Data Table 6 ). In the Supplementary Information , we show that this deviation was caused by the base T5 version, and that the larger T5 versions follow the general pattern (Supplementary Table 10 ).
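The relative increases implied by the overall rates above, and the form of a Pearson chi-squared test on a 2 × 2 table of outcome counts (rows for dialect, columns for outcome), can be sketched as follows. The chi-squared function is a generic illustration without continuity correction; the toy table in the test is arbitrary and is not the paper's data, and how the paper aggregated its counts is not reproduced here.

```python
def relative_increase(rate_aae, rate_sae):
    """Relative increase of an outcome rate for AAE over SAE."""
    return (rate_aae - rate_sae) / rate_sae


def chi_squared_2x2(table):
    """Pearson chi-squared statistic (no continuity correction) for a 2x2
    table [[a, b], [c, d]] of counts, e.g. rows = dialect, columns = outcome."""
    (a, b), (c, d) = table
    n = a + b + c + d
    rows, cols = (a + b, c + d), (a + c, b + d)
    observed = ((a, b), (c, d))
    return sum(
        (observed[i][j] - rows[i] * cols[j] / n) ** 2 / (rows[i] * cols[j] / n)
        for i in range(2)
        for j in range(2)
    )


print(f"convictions: +{relative_increase(68.7, 62.1):.1%}")      # +10.6%
print(f"death sentences: +{relative_increase(27.7, 22.8):.1%}")  # +21.5%
```

Even though the absolute rates differ by only about five to seven percentage points, the relative increase for death sentences is roughly twice that for convictions because the baseline rate is lower.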

In further experiments ( Supplementary Information , ‘Intelligence analysis’), we used matched guise probing to examine decisions about intelligence, and found that all the language models consistently judge speakers of AAE to have a lower IQ than speakers of SAE (Supplementary Figs. 14 and 15 and Supplementary Tables 17 – 19 ).

Resolvability of dialect prejudice

We wanted to know whether the dialect prejudice we observed is resolved by current practices of bias mitigation, such as increasing the size of the language model or including HF in training. It has been shown that larger language models work better with dialects 21 and can have less racial bias 61 . Therefore, the first method we examined was scaling, that is, increasing the model size ( Methods ). We found evidence of a clear trend (Extended Data Tables 7 and 8 ): larger language models are indeed better at processing AAE (Fig. 4a , left), but they are not less prejudiced against speakers of it. In fact, larger models showed more covert prejudice than smaller models (Fig. 4a , right). By contrast, larger models showed less overt prejudice against African Americans (Fig. 4a , right). Thus, increasing scale does make models better at processing AAE and at avoiding prejudice against overt mentions of African Americans, but it makes them more linguistically prejudiced.
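Perplexity, the measure used in Fig. 4a to assess how well a model processes AAE, is the exponential of the mean negative log-likelihood per token. The sketch below takes the per-token log-probabilities as given; for masked models such as RoBERTa and T5, the pseudo-perplexity mentioned in the figure caption would typically obtain each token's log-probability by masking it and predicting it from the rest of the text, then apply the same aggregation. How those log-probabilities are obtained from a particular model is outside this sketch.

```python
import math


def perplexity(token_log_probs):
    """Perplexity from natural-log probabilities of each token: the exponential
    of the mean negative log-likelihood. Lower is better."""
    if not token_log_probs:
        raise ValueError("need at least one token")
    return math.exp(-sum(token_log_probs) / len(token_log_probs))


# A model that assigns probability 1/4 to each of four tokens has a
# perplexity of (approximately) 4.
print(perplexity([math.log(0.25)] * 4))
```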


figure 4

a , Language modelling perplexity and stereotype strength on AAE text as a function of model size. Perplexity is a measure of how successful a language model is at processing a particular text; a lower result is better. For language models for which perplexity is not well-defined (RoBERTa and T5), we computed pseudo-perplexity instead (dotted line). Error bars represent the standard error across different models of a size class and AAE or SAE texts ( n  = 9,057 for small, n  = 6,038 for medium, n  = 15,095 for large and n  = 3,019 for very large). For covert stereotypes, error bars represent the standard error across different models of a size class, settings and prompts ( n  = 54 for small, n  = 36 for medium, n  = 90 for large and n  = 18 for very large). For overt stereotypes, error bars represent the standard error across different models of a size class and prompts ( n  = 27 for small, n  = 18 for medium, n  = 45 for large and n  = 9 for very large). Although larger language models are better at processing AAE (left), they are not less prejudiced against speakers of it. Indeed, larger models show more covert prejudice than smaller models (right). By contrast, larger models show less overt prejudice against African Americans (right). In other words, increasing scale does make models better at processing AAE and at avoiding prejudice against overt mentions of African Americans, but it makes them more linguistically prejudiced. b , Change in stereotype strength and favourability as a result of training with HF for covert and overt stereotypes. Error bars represent the standard error across different prompts ( n  = 9). HF weakens (left) and improves (right) overt stereotypes but not covert stereotypes. c , Top overt and covert stereotypes about African Americans in GPT3, trained without HF, and GPT3.5, trained with HF. Colour coding as positive (green) and negative (red) is based on ref. 34 . 
The overt stereotypes get substantially more positive as a result of HF training in GPT3.5, but there is no visible change in favourability for the covert stereotypes.

As a second potential way to resolve dialect prejudice in language models, we examined training with HF 49 , 62 . Specifically, we compared GPT3.5 (ref. 49 ) with GPT3 (ref. 63 ), its predecessor that was trained without using HF ( Methods ). Looking at the top adjectives associated overtly and covertly with African Americans by the two language models, we found that HF resulted in more-positive overt associations but had no clear qualitative effect on the covert associations (Fig. 4c ). This observation was confirmed by quantitative analyses: the inclusion of HF resulted in significantly weaker (no HF, m  = 0.135,  s  = 0.142; HF, m  = −0.119,  s  = 0.234;  t (16) = 2.6,  P  < 0.05) and more favourable (no HF, m  = 0.221,  s  = 0.399; HF, m  = 1.047,  s  = 0.387;  t (16) = −6.4,  P  < 0.001) overt stereotypes but produced no significant difference in the strength (no HF, m  = 0.153,  s  = 0.049; HF, m  = 0.187,  s  = 0.066;  t (16) = −1.2, P  = 0.3) or unfavourability (no HF, m  = −1.146, s  = 0.580; HF, m = −1.029, s  = 0.196; t (16) = −0.5, P  = 0.6) of covert stereotypes (Fig. 4b ). Thus, HF training weakens and ameliorates the overt stereotypes but has no clear effect on the covert stereotypes; in other words, it obscures the racist attitudes on the surface, but more subtle forms of racism, such as dialect prejudice, remain unaffected. This finding is underscored by the fact that the discrepancy between overt and covert stereotypes about African Americans is most pronounced for the two examined language models trained with human feedback (GPT3.5 and GPT4; see ‘Covert stereotypes in language models’). Furthermore, this finding again shows that there is a fundamental difference between overt and covert stereotypes in language models, and that mitigating the overt stereotypes does not automatically translate to mitigated covert stereotypes.
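The comparisons above report t-statistics with 16 degrees of freedom, which is consistent with nine prompt-level scores per model (Fig. 4b) and a two-sample test (9 + 9 − 2 = 16). A minimal sketch of Student's two-sample t-test with pooled variance follows; whether the paper used the pooled or Welch variant is an assumption here (with equal group sizes the two give the same t value and differ only in degrees of freedom), and the inputs in the test are toy numbers.

```python
import math
from statistics import mean, variance


def pooled_t_test(a, b):
    """Student's two-sample t statistic with pooled variance.

    Returns (t, df). With equal group sizes the t value equals Welch's;
    only the degrees of freedom differ.
    """
    na, nb = len(a), len(b)
    df = na + nb - 2
    pooled_var = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / df
    se = math.sqrt(pooled_var * (1 / na + 1 / nb))
    return (mean(a) - mean(b)) / se, df
```

A negative t indicates that the first group (for example, the model without HF) has the lower mean.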

To sum up, neither scaling nor training with HF as applied today resolves the dialect prejudice. The fact that these two methods effectively mitigate racial performance disparities and overt racial stereotypes in language models indicates that this form of covert racism constitutes a different problem that is not addressed by current approaches for improving and aligning language models.

Discussion

The key finding of this article is that language models maintain a form of covert racial prejudice against African Americans that is triggered by dialect features alone. In our experiments, we avoided overt mentions of race but drew from the racialized meanings of a stigmatized dialect, and could still find historically racist associations with African Americans. The implicit nature of this prejudice, that is, the fact that it concerns something not explicitly expressed in the text, makes it fundamentally different from the overt racial prejudice that has been the focus of previous research. Strikingly, the language models’ covert and overt racial prejudices are often in contradiction with each other, especially for the most recent language models that have been trained with HF (GPT3.5 and GPT4). These two language models obscure the racism, overtly associating African Americans with exclusively positive attributes (such as ‘brilliant’), but our results show that they covertly associate African Americans with exclusively negative attributes (such as ‘lazy’).

We argue that this paradoxical relation between the language models’ covert and overt racial prejudices manifests the inconsistent racial attitudes present in the contemporary society of the United States 8 , 64 . In the Jim Crow era, stereotypes about African Americans were overtly racist, but the normative climate after the civil rights movement made expressing explicitly racist views distasteful. As a result, racism acquired a covert character and continued to exist on a more subtle level. Thus, most white people nowadays report positive attitudes towards African Americans in surveys but perpetuate racial inequalities through their unconscious behaviour, such as their residential choices 65 . It has been shown that negative stereotypes persist, even if they are superficially rejected 66 , 67 . This ambivalence is reflected by the language models we analysed, which are overtly non-racist but covertly exhibit archaic stereotypes about African Americans, showing that they reproduce a colour-blind racist ideology. Crucially, the civil rights movement is generally seen as the period during which racism shifted from overt to covert 68 , 69 , and this is mirrored by our results: all the language models overtly agree the most with human stereotypes from after the civil rights movement, but covertly agree the most with human stereotypes from before the civil rights movement.

Our findings raise the question of how dialect prejudice got into the language models. Language models are pretrained on web-scraped corpora such as WebText 46 , C4 (ref. 48 ) and the Pile 70 , which encode raciolinguistic stereotypes about AAE. A drastic example of this is the use of ‘mock ebonics’ to parody speakers of AAE 71 . Crucially, a growing body of evidence indicates that language models pick up prejudices present in the pretraining corpus 72 , 73 , 74 , 75 , which would explain how they become prejudiced against speakers of AAE, and why they show varying levels of dialect prejudice as a function of the pretraining corpus. However, the web also abounds with overt racism against African Americans 76 , 77 , so we wondered why the language models exhibit much less overt than covert racial prejudice. We argue that the reason for this is that the existence of overt racism is generally known to people 32 , which is not the case for covert racism 69 . Crucially, this also holds for the field of AI. The typical pipeline of training language models includes steps such as data filtering 48 and, more recently, HF training 62 that remove overt racial prejudice. As a result, much of the overt racism on the web does not end up in the language models. However, there are currently no measures in place to curtail covert racial prejudice when training language models. For example, common datasets for HF training 62 , 78 do not include examples that would train the language models to treat speakers of AAE and SAE equally. As a result, the covert racism encoded in the training data can make its way into the language models in an unhindered fashion. It is worth mentioning that the lack of awareness of covert racism also manifests during evaluation, where it is common to test language models for overt racism but not for covert racism 21 , 63 , 79 , 80 .

As well as the representational harms, by which we mean the pernicious representation of AAE speakers, we also found evidence for substantial allocational harms. This refers to the inequitable allocation of resources to AAE speakers 81 (Barocas et al., unpublished observations), and adds to known cases of language technology putting speakers of AAE at a disadvantage by performing worse on AAE 82 , 83 , 84 , 85 , 86 , 87 , 88 , misclassifying AAE as hate speech 81 , 89 , 90 , 91 or treating AAE as incorrect English 83 , 85 , 92 . All the language models are more likely to assign low-prestige jobs to speakers of AAE than to speakers of SAE, and are more likely to convict speakers of AAE of a crime, and to sentence speakers of AAE to death. Although the details of our tasks are constructed, the findings reveal real and urgent concerns because business and the legal system are areas for which AI systems involving language models are currently being developed or deployed. As a consequence, the dialect prejudice we uncovered might already be affecting AI decisions today, for example when a language model is used in application-screening systems to process background information, which might include social-media text. Worryingly, we also observe that larger language models and language models trained with HF exhibit stronger covert, but weaker overt, prejudice. Against the backdrop of continually growing language models and the increasingly widespread adoption of HF training, this has two risks: first, that language models, unbeknownst to developers and users, reach ever-increasing levels of covert prejudice; and second, that developers and users mistake ever-decreasing levels of overt prejudice (the only kind of prejudice currently tested for) for a sign that racism in language models has been solved. 
There is therefore a realistic possibility that the allocational harms caused by dialect prejudice in language models will increase further in the future, perpetuating the racial discrimination experienced by generations of African Americans.

Methods

Probing

Matched guise probing examines how strongly a language model associates certain tokens, such as personality traits, with AAE compared with SAE. AAE can be viewed as the treatment condition, whereas SAE functions as the control condition. We start by explaining the basic experimental unit of matched guise probing: measuring how a language model associates certain tokens with an individual text in AAE or SAE. Based on this, we introduce two different settings for matched guise probing (meaning-matched and non-meaning-matched), which are both inspired by the matched guise technique used in sociolinguistics 36 , 37 , 93 , 94 and provide complementary views on the attitudes a language model has about a dialect.

The basic experimental unit of matched guise probing is as follows. Let $\theta$ be a language model, $t$ be a text in AAE or SAE, and $x$ be a token of interest, typically a personality trait such as ‘intelligent’. We embed the text in a prompt $v$, for example $v(t)$ = ‘a person who says $t$ tends to be’, and compute $P(x \mid v(t); \theta)$, which is the probability that $\theta$ assigns to $x$ after processing $v(t)$. We calculate $P(x \mid v(t); \theta)$ for equally sized sets $T_a$ of AAE texts and $T_s$ of SAE texts, comparing various tokens from a set $X$ as possible continuations. It has been shown that $P(x \mid v(t); \theta)$ can be affected by the precise wording of $v$, so small modifications of $v$ can have an unpredictable effect on the predictions made by the language model 21 , 95 , 96 . To account for this fact, we consider a set $V$ containing several prompts ( Supplementary Information ). For all experiments, we have provided detailed analyses of variation across prompts in the  Supplementary Information .
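One natural way to turn these probabilities into a single association score, assumed here for illustration, is the log-ratio of $P(x \mid v(t); \theta)$ between AAE and SAE texts, averaged over the texts in $T_a$ and $T_s$ and over the prompts in $V$. In the sketch below, `prob(token, prompt)` stands for any callable returning the probability a model assigns to `token` as the continuation of `prompt` (with a real model it would wrap the model's output distribution); `toy_prob` is a deliberately artificial stand-in, and the second prompt template and all function names are hypothetical.

```python
import math
from statistics import mean

# Prompt templates v in V: the first follows the example in the text,
# the second is a hypothetical variant.
PROMPTS = [
    lambda t: f'a person who says "{t}" tends to be',
    lambda t: f'a person who says "{t}" is',
]


def association_score(x, aae_texts, sae_texts, prompts, prob):
    """Mean log-ratio of P(x | v(t); theta) between AAE and SAE texts.

    Positive scores mean x is more strongly associated with AAE, negative
    scores with SAE. Because T_a and T_s are equally sized, the mean of the
    pairwise differences equals the difference of the means, so the pairing
    produced by zip() does not affect the result.
    """
    if len(aae_texts) != len(sae_texts):
        raise ValueError("T_a and T_s must be equally sized")
    return mean(
        math.log(prob(x, v(t_a))) - math.log(prob(x, v(t_s)))
        for v in prompts
        for t_a, t_s in zip(aae_texts, sae_texts)
    )


def toy_prob(token, prompt):
    """Toy stand-in for a language model (illustration only): it halves the
    probability of every token whenever the prompt contains the spelling 'cus'."""
    return 0.01 if "cus" in prompt else 0.02
```

A model that ignores dialect yields a score of zero, whereas `toy_prob` yields −log 2 for every token, that is, a uniformly weaker association with the AAE guise.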

We conducted matched guise probing in two settings. In the first setting, the texts in \({T}_{{\rm{a}}}\) and \({T}_{{\rm{s}}}\) formed pairs expressing the same underlying meaning; that is, the \(i\)-th text in \({T}_{{\rm{a}}}\) (for example, ‘I be so happy when I wake up from a bad dream cus they be feelin too real’) matches the \(i\)-th text in \({T}_{{\rm{s}}}\) (for example, ‘I am so happy when I wake up from a bad dream because they feel too real’). For this setting, we used the dataset from ref. 87 , which contains 2,019 AAE tweets together with their SAE translations. In the second setting, the texts in \({T}_{{\rm{a}}}\) and \({T}_{{\rm{s}}}\) did not form pairs but were independent texts in AAE and SAE. For this setting, we sampled 2,000 AAE and SAE tweets from the dataset in ref. 83 , using tweets strongly aligned with African Americans for AAE and tweets strongly aligned with white people for SAE (Supplementary Information (‘Analysis of non-meaning-matched texts’), Supplementary Fig. 1 and Supplementary Table 3). In the Supplementary Information, we include examples of AAE and SAE texts for both settings (Supplementary Tables 1 and 2). Tweets are well suited for matched guise probing because they are a rich source of dialectal variation 97 , 98 , 99 , especially for AAE 100 , 101 , 102 , but matched guise probing can be applied to any type of text. Although we do not consider it here, matched guise probing can in principle also be applied to speech-based models, with the potential advantage that dialectal variation at the phonetic level could be captured more directly, making it possible to study dialect prejudice specific to regional variants of AAE 23 . However, a great deal of phonetic variation is reflected orthographically in social-media texts 101 .

It is important to analyse both meaning-matched and non-meaning-matched settings because they capture different aspects of the attitudes a language model has about speakers of AAE. Controlling for the underlying meaning makes it possible to uncover differences in the attitudes of the language model that are solely due to grammatical and lexical features of AAE. However, it is known that various properties other than linguistic features correlate with dialect, such as topics 45 , and these might also influence the attitudes of the language model. Sidelining such properties bears the risk of underestimating the harms that dialect prejudice causes for speakers of AAE in the real world. For example, in a scenario in which a language model is used in the context of automated personnel selection to screen applicants’ social-media posts, the texts of two competing applicants typically differ in content and do not come in pairs expressing the same meaning. The relative advantages of using meaning-matched or non-meaning-matched data for matched guise probing are conceptually similar to the relative advantages of using the same or different speakers for the matched guise technique: more control in the former versus more naturalness in the latter setting 93 , 94 . Because the results obtained in both settings were consistent overall for all experiments, we aggregated them in the main article, but we analysed differences in detail in the  Supplementary Information .

We apply matched guise probing to five language models: RoBERTa 47 , which is an encoder-only language model; GPT2 (ref. 46 ), GPT3.5 (ref. 49 ) and GPT4 (ref. 50 ), which are decoder-only language models; and T5 (ref. 48 ), which is an encoder–decoder language model. For each language model, we examined one or more model versions: GPT2 (base), GPT2 (medium), GPT2 (large), GPT2 (xl), RoBERTa (base), RoBERTa (large), T5 (small), T5 (base), T5 (large), T5 (3b), GPT3.5 (text-davinci-003) and GPT4 (0613). Where we used several model versions per language model (GPT2, RoBERTa and T5), the model versions all had the same architecture and were trained on the same data but differed in their size. Furthermore, we note that GPT3.5 and GPT4 are the only language models examined in this paper that were trained with HF, specifically reinforcement learning from human feedback 103 . When it is clear from the context what is meant, or when the distinction does not matter, we use the term ‘language models’, or sometimes ‘models’, in a more general way that includes individual model versions.

Regarding matched guise probing, the exact method for computing \(P(x\mid v(t);\theta)\) varies across language models and is detailed in the Supplementary Information. For GPT4, for which computing \(P(x\mid v(t);\theta)\) for all tokens of interest was often not possible owing to restrictions imposed by the OpenAI application programming interface (API), we used a slightly modified method for some of the experiments, which is also discussed in the Supplementary Information. Similarly, some of the experiments could not be done for all language models because of model-specific constraints, which we highlight below. In each experiment, this affected at most one language model.

Covert-stereotype analysis

In the covert-stereotype analysis, the tokens \(x\) whose probabilities are measured for matched guise probing are trait adjectives from the Princeton Trilogy 29 , 30 , 31 , 34 , such as ‘aggressive’, ‘intelligent’ and ‘quiet’. We provide details about these adjectives in the Supplementary Information. In the Princeton Trilogy, the adjectives are provided to participants as a list, and participants are asked to select from the list the five adjectives that best characterize a given ethnic group, such as African Americans. The studies that we compare in this paper, which are the original Princeton Trilogy studies 29 , 30 , 31 and a more recent follow-up study 34 , all follow this general set-up and observe a gradual improvement in the expressed stereotypes about African Americans over time, but the exact interpretation of this finding is disputed 32 . Here, we used the adjectives from the Princeton Trilogy in the context of matched guise probing.

Specifically, we first computed \(P(x\mid v(t);\theta)\) for all adjectives, for both the AAE texts and the SAE texts. The method for aggregating the probabilities \(P(x\mid v(t);\theta)\) into association scores between an adjective \(x\) and AAE varies for the two settings of matched guise probing. Let \({t}_{{\rm{a}}}^{i}\) be the \(i\)-th AAE text in \({T}_{{\rm{a}}}\) and \({t}_{{\rm{s}}}^{i}\) be the \(i\)-th SAE text in \({T}_{{\rm{s}}}\). In the meaning-matched setting, in which \({t}_{{\rm{a}}}^{i}\) and \({t}_{{\rm{s}}}^{i}\) express the same meaning, we computed the prompt-level association score for an adjective \(x\) as

$$q(x;v,\theta)=\frac{1}{n}\sum_{i=1}^{n}\log\frac{P(x\mid v({t}_{{\rm{a}}}^{i});\theta)}{P(x\mid v({t}_{{\rm{s}}}^{i});\theta)},$$

where \(n=|{T}_{{\rm{a}}}|=|{T}_{{\rm{s}}}|\). Thus, for each pair of AAE and SAE texts, we measure the log ratio of the probability assigned to \(x\) following the AAE text to the probability assigned to \(x\) following the SAE text, and then average these log ratios across all pairs. In the non-meaning-matched setting, we computed the prompt-level association score for an adjective \(x\) as

$$q(x;v,\theta)=\log\frac{\frac{1}{n}\sum_{i=1}^{n}P(x\mid v({t}_{{\rm{a}}}^{i});\theta)}{\frac{1}{n}\sum_{i=1}^{n}P(x\mid v({t}_{{\rm{s}}}^{i});\theta)},$$

where again \(n=|{T}_{{\rm{a}}}|=|{T}_{{\rm{s}}}|\). In other words, we first compute the average probability assigned to a certain adjective \(x\) following all AAE texts and the average probability assigned to \(x\) following all SAE texts, and then measure the log ratio of these average probabilities. The interpretation of \(q(x;v,\theta)\) is identical in both settings: \(q(x;v,\theta)>0\) means that, for a certain prompt \(v\), the language model \(\theta\) associates the adjective \(x\) more strongly with AAE than with SAE, and \(q(x;v,\theta)<0\) means that, for a certain prompt \(v\), \(\theta\) associates \(x\) more strongly with SAE than with AAE. In the Supplementary Information (‘Calibration’), we show that \(q(x;v,\theta)\) is calibrated 104 , meaning that it does not depend on the prior probability that \(\theta\) assigns to \(x\) in a neutral context.
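The two aggregation rules differ only in where the logarithm is applied: the meaning-matched score averages per-pair log ratios, whereas the non-meaning-matched score takes the log ratio of averaged probabilities. The following minimal Python sketch assumes that the probabilities for one adjective \(x\) and one prompt \(v\) have already been collected into two equally long lists; the function names are hypothetical.

```python
import math
from statistics import fmean
from typing import Sequence


def _check(p_aae: Sequence[float], p_sae: Sequence[float]) -> None:
    if not p_aae or len(p_aae) != len(p_sae):
        raise ValueError("expected equally sized, non-empty sets T_a and T_s")


def q_meaning_matched(p_aae: Sequence[float], p_sae: Sequence[float]) -> float:
    """Average over pairs i of log(P(x|v(t_a^i)) / P(x|v(t_s^i))).

    p_aae[i] and p_sae[i] must come from the i-th AAE text and its SAE translation.
    """
    _check(p_aae, p_sae)
    return fmean(math.log(a / s) for a, s in zip(p_aae, p_sae))


def q_non_meaning_matched(p_aae: Sequence[float], p_sae: Sequence[float]) -> float:
    """Log of (mean probability after AAE texts / mean probability after SAE texts)."""
    _check(p_aae, p_sae)
    return math.log(fmean(p_aae) / fmean(p_sae))


# Made-up probabilities for one adjective x and one prompt v.
p_aae, p_sae = [0.2, 0.1], [0.1, 0.1]
print(q_meaning_matched(p_aae, p_sae))      # log(2)/2, about 0.347
print(q_non_meaning_matched(p_aae, p_sae))  # log(1.5), about 0.405
```

Both toy scores are positive, so in this example \(x\) would be more strongly associated with AAE than with SAE for this prompt. Because each score is a log ratio, a constant factor multiplying all probabilities of \(x\) (for both AAE and SAE texts) cancels out.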

The prompt-level association scores \(q(x;v,\theta)\) are the basis for further analyses. We start by averaging \(q(x;v,\theta)\) across model versions, prompts and settings, which allows us to rank all adjectives according to their overall association with AAE for individual language models (Fig. 2a). In this and the following adjective analyses, we focus on the five adjectives that exhibit the highest association with AAE, which makes it possible to consistently compare the language models with the results from the Princeton Trilogy studies, most of which do not report the full ranking of all adjectives. Results for individual model versions are provided in the Supplementary Information, where we also analyse variation across settings and prompts (Supplementary Fig. 2 and Supplementary Table 4).
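Pooling the prompt-level scores and ranking the adjectives can be sketched as follows. The dictionary keyed by (model version, prompt, setting, adjective) is a hypothetical layout, and pooling all entries into one mean matches averaging over model versions, prompts and settings only when every combination is present.

```python
from collections import defaultdict
from statistics import fmean
from typing import Dict, List, Tuple

# Hypothetical layout: (model_version, prompt, setting, adjective) -> q(x; v, theta)
Key = Tuple[str, str, str, str]


def rank_adjectives(scores: Dict[Key, float]) -> List[Tuple[str, float]]:
    """Average q over model versions, prompts and settings, then sort descending."""
    pooled: Dict[str, List[float]] = defaultdict(list)
    for (_version, _prompt, _setting, adjective), q in scores.items():
        pooled[adjective].append(q)
    means = {adj: fmean(qs) for adj, qs in pooled.items()}
    return sorted(means.items(), key=lambda item: item[1], reverse=True)


def top_k(scores: Dict[Key, float], k: int = 5) -> List[str]:
    """The k adjectives most strongly associated with AAE."""
    return [adj for adj, _ in rank_adjectives(scores)[:k]]


# Placeholder adjectives and made-up scores for two model versions of one model.
scores = {
    ("base", "v1", "meaning-matched", "adj_a"): 0.3,
    ("base", "v1", "meaning-matched", "adj_b"): -0.1,
    ("large", "v1", "meaning-matched", "adj_a"): 0.1,
    ("large", "v1", "meaning-matched", "adj_b"): 0.1,
}
print(top_k(scores, k=1))  # ['adj_a']
```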

Next, we measured the agreement between language models and humans over time. To do so, we considered the five adjectives most strongly associated with African Americans in each study and evaluated how highly these adjectives are ranked by the language models. Specifically, let \({R}_{l}=[{x}_{1},\ldots,{x}_{|X|}]\) be the adjective ranking generated by a language model and \({R}_{h}^{5}=[{x}_{1},\ldots,{x}_{5}]\) be the ranking of the top five adjectives generated by the human participants in one of the Princeton Trilogy studies. A typical measure of how highly the adjectives from \({R}_{h}^{5}\) are ranked within \({R}_{l}\) is average precision, AP 51 . However, AP does not take the internal ranking of the adjectives in \({R}_{h}^{5}\) into account, which is not ideal for our purposes; for example, AP does not distinguish whether the top-ranked adjective for humans is on the first or on the fifth rank for a language model. To remedy this, we computed the mean average precision, MAP, over nested subsets of \({R}_{h}^{5}\),

$${\rm{MAP}}=\frac{1}{5}\sum_{i=1}^{5}{\rm{AP}}({R}_{l},{R}_{h}^{i}),$$

where \({R}_{h}^{i}\) denotes the top \(i\) adjectives from the human ranking. MAP = 1 if, and only if, the top five adjectives from \({R}_{h}^{5}\) occupy the top five positions of \({R}_{l}\) in exactly the same order, so, unlike AP, MAP takes the internal ranking of the adjectives into account. We computed an individual agreement score for each language model and prompt; to generate \({R}_{l}\), we therefore averaged the \(q(x;v,\theta)\) association scores over all model versions of a language model (GPT2, for example) and over the two settings (meaning-matched and non-meaning-matched). Because the OpenAI API for GPT4 does not give access to the probabilities for all adjectives, we excluded GPT4 from this analysis. Results are presented in Fig. 2b and Extended Data Table 1. In the Supplementary Information (‘Agreement analysis’), we analyse variation across model versions, settings and prompts (Supplementary Figs. 3–5).
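The MAP variant can be illustrated with a short Python sketch. It assumes the standard definition of AP for a set of relevant items in a ranking (the mean of the precision values at the ranks where the relevant items occur) and that \({R}_{l}\) ranks every adjective; the function names and toy rankings are hypothetical.

```python
from typing import List, Sequence


def average_precision(ranking: Sequence[str], relevant: Sequence[str]) -> float:
    """Mean of precision@k over the ranks k at which the relevant items occur."""
    rel = set(relevant)
    hits = 0
    precisions: List[float] = []
    for k, item in enumerate(ranking, start=1):
        if item in rel:
            hits += 1
            precisions.append(hits / k)
    if len(precisions) != len(rel):
        raise ValueError("every relevant item must appear in the ranking")
    return sum(precisions) / len(rel)


def mean_average_precision(ranking: Sequence[str], human_top: Sequence[str]) -> float:
    """Average of AP(R_l, R_h^i) over the nested prefixes R_h^1, ..., R_h^5."""
    aps = [average_precision(ranking, human_top[:i])
           for i in range(1, len(human_top) + 1)]
    return sum(aps) / len(aps)


ranking = ["a", "b", "c", "d", "e", "f"]  # R_l: toy model ranking of all adjectives
print(mean_average_precision(ranking, ["a", "b", "c", "d", "e"]))  # 1.0
print(mean_average_precision(ranking, ["b", "a", "c", "d", "e"]))  # 0.9
```

Swapping the first two human adjectives lowers only the \(i=1\) term, whereas plain AP over all five adjectives would be 1 in both cases; this is the order sensitivity that motivates MAP.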

To analyse the favourability of the stereotypes about African Americans, we drew on crowd-sourced favourability ratings for the Princeton Trilogy adjectives collected previously 34 , which range from −2 (‘very unfavourable’, meaning very negative) to 2 (‘very favourable’, meaning very positive). For example, the favourability rating of ‘cruel’ is −1.81 and the favourability rating of ‘brilliant’ is 1.86. We computed the average favourability of the top five adjectives, weighting the favourability ratings of individual adjectives by their association scores with AAE (for language models) or with African Americans (for humans). More formally, let \({R}^{5}=[{x}_{1},\ldots,{x}_{5}]\) be the ranking of the top five adjectives generated by either a language model or humans. Furthermore, let \(f(x)\) be the favourability rating of adjective \(x\) as reported in ref. 34 , and let \(q(x)\) be the overall association score of adjective \(x\) with AAE or African Americans that is used to generate \({R}^{5}\). For the Princeton Trilogy studies, \(q(x)\) is the percentage of participants who assigned \(x\) to African Americans. For language models, \(q(x)\) is the average value of \(q(x;v,\theta)\). We then computed the weighted average favourability, \(F\), of the top five adjectives as

$$F=\frac{\sum_{i=1}^{5}f({x}_{i})\,q({x}_{i})}{\sum_{i=1}^{5}q({x}_{i})}.$$

As a result of the weighting, the top-ranked adjective contributed more to the average than the second-ranked adjective, and so on. Results are presented in Extended Data Fig. 1. To check for consistency, we also computed the average favourability of the top five adjectives without weighting, which yielded similar results (Supplementary Fig. 6).
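For illustration, a minimal Python sketch of the weighted and unweighted favourability. The two favourability ratings are the examples quoted in the text; the association scores are made up, the example uses two adjectives instead of five for brevity, and the requirement that the weights be positive is an assumption of this sketch.

```python
from typing import Dict, Sequence


def weighted_favourability(top: Sequence[str], f: Dict[str, float],
                           q: Dict[str, float]) -> float:
    """F = sum_i f(x_i) q(x_i) / sum_i q(x_i) over the top-ranked adjectives."""
    weights = [q[x] for x in top]
    # Assumption: the top adjectives have positive association scores.
    if any(w <= 0 for w in weights):
        raise ValueError("expected positive association scores")
    return sum(f[x] * w for x, w in zip(top, weights)) / sum(weights)


def unweighted_favourability(top: Sequence[str], f: Dict[str, float]) -> float:
    """Plain mean of the favourability ratings (the consistency check)."""
    return sum(f[x] for x in top) / len(top)


f = {"cruel": -1.81, "brilliant": 1.86}  # ratings quoted in the text
q = {"cruel": 0.3, "brilliant": 0.1}     # made-up association scores
print(weighted_favourability(["cruel", "brilliant"], f, q))  # about -0.89
print(unweighted_favourability(["cruel", "brilliant"], f))   # about 0.025
```

Because ‘cruel’ carries three times the weight of ‘brilliant’ here, the weighted average is pulled strongly towards the negative rating, whereas the unweighted mean is close to neutral.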

Overt-stereotype analysis

The overt-stereotype analysis closely followed the methodology of the covert-stereotype analysis, with the difference being that instead of providing the language models with AAE and SAE texts, we provided them with overt descriptions of race (specifically, ‘Black’/‘black’ and ‘White’/‘white’). This methodological difference is also reflected by a different set of prompts ( Supplementary Information ). As a result, the experimental set-up is very similar to existing studies on overt racial bias in language models 4 , 7 . All other aspects of the analysis (such as computing adjective association scores) were identical to the analysis for covert stereotypes. This also holds for GPT4, for which we again could not conduct the agreement analysis.

We again present average results for the five language models in the main article. Results broken down for individual model versions are provided in the  Supplementary Information , where we also analyse variation across prompts (Supplementary Fig. 8 and Supplementary Table 5 ).

Employability analysis

The general set-up of the employability analysis was identical to that of the stereotype analyses: we fed text written in either AAE or SAE, embedded in prompts, into the language models and analysed the probabilities that they assigned to different continuation tokens. However, instead of trait adjectives, we considered occupations for \(X\) and also used a different set of prompts (Supplementary Information). We created a list of occupations, drawing on previously published lists 6 , 76 , 105 , 106 , 107 ; details about these occupations are provided in the Supplementary Information. We then computed association scores \(q(x;v,\theta)\) between individual occupations \(x\) and AAE, following the same methodology as for computing adjective association scores, and ranked the occupations according to \(q(x;v,\theta)\) for the language models. To probe the prestige associated with the occupations, we drew on a dataset of occupational prestige 105 that is based on the 2012 US General Social Survey and measures prestige on a scale from 1 (low prestige) to 9 (high prestige). For GPT4, we could not conduct the parts of the analysis that require scores for all occupations.
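This paragraph does not prescribe a single statistic for relating the occupation ranking to prestige, so the following Python sketch shows only one illustrative possibility: it ranks occupations by \(q(x;v,\theta)\), averages the prestige of the top-ranked ones, and fits a least-squares slope of association on prestige. The occupation labels, association scores and prestige values in the usage lines are invented placeholders, not values from the prestige dataset.

```python
from statistics import fmean
from typing import Dict, List, Sequence


def rank_occupations(q: Dict[str, float]) -> List[str]:
    """Occupations ordered from strongest to weakest association with AAE."""
    return sorted(q, key=lambda occ: q[occ], reverse=True)


def mean_prestige_top_k(q: Dict[str, float], prestige: Dict[str, float],
                        k: int) -> float:
    """Mean prestige (1 = low, 9 = high) of the k occupations most associated with AAE."""
    return fmean(prestige[occ] for occ in rank_occupations(q)[:k])


def ols_slope(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Least-squares slope of ys regressed on xs."""
    mx, my = fmean(xs), fmean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var = sum((x - mx) ** 2 for x in xs)
    return cov / var


# Invented placeholder values for illustration only.
q = {"job_a": 0.2, "job_b": 0.1, "job_c": -0.1}
prestige = {"job_a": 3.0, "job_b": 4.0, "job_c": 7.5}
print(rank_occupations(q))                  # ['job_a', 'job_b', 'job_c']
print(mean_prestige_top_k(q, prestige, 2))  # 3.5
occs = list(q)
print(ols_slope([prestige[o] for o in occs], [q[o] for o in occs]) < 0)  # True
```

A negative slope in such a fit would indicate that association with AAE decreases as occupational prestige increases.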

We again present average results for the five language models in the main article. Results for individual model versions are provided in the  Supplementary Information , where we also analyse variation across settings and prompts (Supplementary Tables 6 – 8 ).

Criminality analysis

The set-up of the criminality analysis is different from the previous experiments in that we did not compute aggregate association scores between certain tokens (such as trait adjectives) and AAE but instead asked the language models to make discrete decisions for each AAE and SAE text. More specifically, we simulated trials in which the language models were prompted to use AAE or SAE texts as evidence to make a judicial decision. We then aggregated the judicial decisions into summary statistics.

We conducted two experiments. In the first experiment, the language models were asked to determine whether a person accused of committing an unspecified crime should be acquitted or convicted. The only evidence provided to the language models was a statement made by the defendant, which was an AAE or SAE text. In the second experiment, the language models were asked to determine whether a person who committed first-degree murder should be sentenced to life or death. Similarly to the first (general conviction) experiment, the only evidence provided to the language models was a statement made by the defendant, which was an AAE or SAE text. Note that the AAE and SAE texts were the same texts as in the other experiments and did not come from a judicial context. Rather than testing how well language models could perform the tasks of predicting acquittal or conviction and life penalty or death penalty (an application of AI that we do not support), we were interested to see to what extent the decisions of the language models, made in the absence of any real evidence, were impacted by dialect. Although providing the language models with extra evidence as well as the AAE and SAE texts would have made the experiments more similar to real trials, it would have confounded the effect that dialect has on its own (the key effect of interest), so we did not consider this alternative set-up here. We focused on convictions and death penalties specifically because these are the two areas of the criminal justice system for which racial disparities have been described in the most robust and indisputable way: African Americans represent about 12% of the adult population of the United States, but they represent 33% of inmates 108 and more than 41% of people on death row 109 .

Methodologically, we used prompts that asked the language models to make a judicial decision (Supplementary Information). For a specific text \(t\), which is in AAE or SAE, we computed \(P(x\mid v(t);\theta)\) for the tokens \(x\) that correspond to the judicial outcomes of interest (‘acquitted’ or ‘convicted’, and ‘life’ or ‘death’). T5 does not contain the tokens ‘acquitted’ and ‘convicted’ in its vocabulary, so it was excluded from the conviction analysis. Because the language models might assign different prior probabilities to the outcome tokens, we calibrated them using their probabilities in a neutral context following \(v\), that is, without the text \(t\) 104 . Whichever outcome had the higher calibrated probability was counted as the decision. We aggregated the detrimental decisions (convictions and death penalties) and compared their rates (percentages) between AAE and SAE texts. An alternative approach would have been to generate the judicial decision by sampling from the language models, which would have allowed us to induce the language models to generate justifications of their decisions. However, this approach has three disadvantages: first, encoder-only language models such as RoBERTa do not lend themselves to text generation; second, it would have been necessary to apply jail-breaking for some of the language models, which can have unpredictable effects, especially in the context of socially sensitive tasks; and third, model-generated justifications are frequently not aligned with actual model behaviours 110 .
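The decision rule can be sketched in a few lines of Python. The text states only that the outcome probabilities were calibrated with their neutral-context probabilities; dividing each outcome probability by its neutral-context probability, as done here, is an assumption of this sketch, and all numbers are made up.

```python
from typing import Dict, Sequence


def calibrated_decision(p_text: Dict[str, float], p_neutral: Dict[str, float]) -> str:
    """Return the outcome whose probability after v(t) is largest relative to its
    probability after the neutral prompt v without t (ratio calibration, assumed)."""
    return max(p_text, key=lambda outcome: p_text[outcome] / p_neutral[outcome])


def detrimental_rate(decisions: Sequence[str], detrimental: str) -> float:
    """Percentage of decisions equal to the detrimental outcome."""
    return 100.0 * sum(d == detrimental for d in decisions) / len(decisions)


p_neutral = {"acquitted": 0.5, "convicted": 0.8}  # made-up neutral-context probabilities
p_text = {"acquitted": 0.4, "convicted": 0.6}     # made-up probabilities after v(t)
print(calibrated_decision(p_text, p_neutral))     # acquitted

decisions = ["convicted", "acquitted", "convicted", "acquitted"]
print(detrimental_rate(decisions, "convicted"))   # 50.0
```

In the toy example the raw probabilities favour conviction, but after calibration acquittal wins because the model already prefers ‘convicted’ in the neutral context. The rate would be computed separately for the decisions on AAE texts and on SAE texts and then compared.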

We again present average results on the level of language models in the main article. Results for individual model versions are provided in the  Supplementary Information , where we also analyse variation across settings and prompts (Supplementary Figs. 9 and 10 and Supplementary Tables 9 – 12 ).

Scaling analysis

In the scaling analysis, we examined whether increasing the model size alleviated the dialect prejudice. Because the content of the covert stereotypes is quite consistent and does not vary substantially between models of different sizes, we instead analysed the strength with which the language models maintain these stereotypes. We split the model versions of all language models into four groups according to their size (number of parameters), using the thresholds of \(1.5\times {10}^{8}\), \(3.5\times {10}^{8}\) and \(1.0\times {10}^{10}\) (Extended Data Table 7).
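Assigning a model version to one of the four size groups amounts to a lookup against the three thresholds. In this minimal Python sketch the parameter counts in the usage lines are illustrative, and placing a model that lies exactly on a threshold into the larger group is an assumption, since the text does not specify the boundary convention.

```python
from bisect import bisect_right

# Thresholds on the number of parameters, as given in the text.
THRESHOLDS = (1.5e8, 3.5e8, 1.0e10)


def size_group(n_params: float) -> int:
    """Return 0-3 for the four size groups; a model exactly at a threshold is
    put in the larger group (an assumption)."""
    return bisect_right(THRESHOLDS, n_params)


for n in (1.2e8, 2.2e8, 7.7e8, 1.8e11):  # illustrative parameter counts
    print(f"{n:.1e} parameters -> group {size_group(n)}")
```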

To evaluate the familiarity of the models with AAE, we measured their perplexity on the datasets used for the two evaluation settings 83 , 87 . Perplexity is defined as the exponentiated average negative log-likelihood of a sequence of tokens 111 , with lower values indicating higher familiarity. Perplexity requires the language models to assign probabilities to full sequences of tokens, which is only the case for GPT2 and GPT3.5. For RoBERTa and T5, we resorted to pseudo-perplexity 112 as the measure of familiarity. Results are only comparable across language models with the same familiarity measure. We excluded GPT4 from this analysis because it is not possible to compute perplexity using the OpenAI API.
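Perplexity and pseudo-perplexity share the same formula and differ only in how the per-token log-probabilities are obtained: left-to-right for perplexity, and by masking each position in turn for pseudo-perplexity. In this Python sketch, `masked_logprob` is a hypothetical hook standing in for a masked language model such as RoBERTa, and natural logarithms are assumed.

```python
import math
from typing import Callable, Sequence


def perplexity(token_logprobs: Sequence[float]) -> float:
    """exp(-mean log p) for the natural-log token probabilities of one sequence."""
    return math.exp(-sum(token_logprobs) / len(token_logprobs))


def pseudo_perplexity(tokens: Sequence[str],
                      masked_logprob: Callable[[Sequence[str], int], float]) -> float:
    """Same formula, but each term is log P(token_i | all other tokens), obtained
    by masking position i; masked_logprob is a hypothetical model hook."""
    return perplexity([masked_logprob(tokens, i) for i in range(len(tokens))])


# A model that gives every token probability 0.5 has perplexity 2.
print(perplexity([math.log(0.5)] * 4))
```

Lower values indicate higher familiarity, but, as noted above, values obtained with the two measures are not directly comparable.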

To evaluate the stereotype strength, we focused on the stereotypes about African Americans reported in ref. 29 , with which the language models’ covert stereotypes agree most strongly. We split the set of adjectives \(X\) into two subsets: the set of stereotypical adjectives in ref. 29 , \({X}_{s}\), and the set of non-stereotypical adjectives, \({X}_{n}=X\setminus {X}_{s}\). For each model with a specific size, we then computed the average value of \(q(x;v,\theta)\) for all adjectives in \({X}_{s}\), which we denote as \({q}_{s}(\theta)\), and the average value of \(q(x;v,\theta)\) for all adjectives in \({X}_{n}\), which we denote as \({q}_{n}(\theta)\). The stereotype strength of a model \(\theta\), or more specifically the strength of the stereotypes about African Americans reported in ref. 29 , can then be computed as

$$\delta(\theta)={q}_{s}(\theta)-{q}_{n}(\theta).$$

A positive value of \(\delta(\theta)\) means that the model associates the stereotypical adjectives in \({X}_{s}\) more strongly with AAE than the non-stereotypical adjectives in \({X}_{n}\), whereas a negative value of \(\delta(\theta)\) indicates anti-stereotypical associations, meaning that the model associates the non-stereotypical adjectives in \({X}_{n}\) more strongly with AAE than the stereotypical adjectives in \({X}_{s}\). For the overt stereotypes, we used the same split of adjectives into \({X}_{s}\) and \({X}_{n}\) because we wanted to directly compare the strength with which models of a certain size endorse the stereotypes overtly as opposed to covertly. All other aspects of the experimental set-up are identical to the main analyses of covert and overt stereotypes.
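The stereotype strength reduces to a difference of two means. The following minimal Python sketch assumes that the association scores have already been averaged per adjective for one model; the adjective labels and scores are placeholders, not the stereotypes reported in ref. 29.

```python
from statistics import fmean
from typing import Dict, Set


def stereotype_strength(q: Dict[str, float], stereotypical: Set[str]) -> float:
    """delta(theta) = q_s(theta) - q_n(theta): mean association of the stereotypical
    adjectives X_s minus that of all remaining adjectives X_n."""
    q_s = fmean(score for adj, score in q.items() if adj in stereotypical)
    q_n = fmean(score for adj, score in q.items() if adj not in stereotypical)
    return q_s - q_n


# Placeholder adjectives and made-up per-adjective scores for one model.
q = {"s1": 0.4, "s2": 0.2, "n1": -0.1, "n2": 0.1}
print(stereotype_strength(q, {"s1", "s2"}))  # about 0.3 (positive: stereotypical)
```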

HF analysis

We compared GPT3.5 (ref. 49 ; text-davinci-003) with GPT3 (ref. 63 ; davinci), its predecessor language model that was trained without HF. Similarly to other studies that compare these two language models 113 , this set-up allowed us to examine the effects of HF training as done for GPT3.5 in isolation. We compared the two language models in terms of favourability and stereotype strength. For favourability, we followed the methodology we used for the overt-stereotype analysis and evaluated the average weighted favourability of the top five adjectives associated with AAE. For stereotype strength, we followed the methodology we used for the scaling analysis and evaluated the average strength of the stereotypes as reported in ref.  29 .

Reporting summary

Further information on research design is available in the  Nature Portfolio Reporting Summary linked to this article.

Data availability

All the datasets used in this study are publicly available. The dataset released as ref. 87 can be found at https://aclanthology.org/2020.emnlp-main.473/ . The dataset released as ref. 83 can be found at http://slanglab.cs.umass.edu/TwitterAAE/ . The human stereotype scores used for evaluation can be found in the published articles of the Princeton Trilogy studies 29 , 30 , 31 , 34 . The most recent of these articles 34 also contains the human favourability scores for the trait adjectives. The dataset of occupational prestige that we used for the employability analysis can be found in the corresponding paper 105 . The Brown Corpus 114 , which we used for the  Supplementary Information (‘Feature analysis’), can be found at http://www.nltk.org/nltk_data/ . The dataset containing the parallel AAE, Appalachian English and Indian English texts 115 , which we used in the  Supplementary Information (‘Alternative explanations’), can be found at https://huggingface.co/collections/SALT-NLP/value-nlp-666b60a7f76c14551bda4f52 .

Code availability

Our code is written in Python and draws on the Python packages openai and transformers for language-model probing, as well as numpy, pandas, scipy and statsmodels for data analysis. The feature analysis described in the  Supplementary Information also uses the VALUE Python library 88 . Our code is publicly available on GitHub at https://github.com/valentinhofmann/dialect-prejudice .

Zhao, W. et al. WildChat: 1M ChatGPT interaction logs in the wild. In Proc. Twelfth International Conference on Learning Representations (OpenReview.net, 2024).

Zheng, L. et al. LMSYS-Chat-1M: a large-scale real-world LLM conversation dataset. In Proc. Twelfth International Conference on Learning Representations (OpenReview.net, 2024).

Gaebler, J. D., Goel, S., Huq, A. & Tambe, P. Auditing the use of language models to guide hiring decisions. Preprint at https://arxiv.org/abs/2404.03086 (2024).

Sheng, E., Chang, K.-W., Natarajan, P. & Peng, N. The woman worked as a babysitter: on biases in language generation. In Proc. 2019 Conference on Empirical Methods in Natural Language Processing and 9th International Joint Conference on Natural Language Processing (eds Inui, K. et al.) 3407–3412 (Association for Computational Linguistics, 2019).

Nangia, N., Vania, C., Bhalerao, R. & Bowman, S. R. CrowS-Pairs: a challenge dataset for measuring social biases in masked language models. In Proc. 2020 Conference on Empirical Methods in Natural Language Processing (eds Webber, B. et al.) 1953–1967 (Association for Computational Linguistics, 2020).

Nadeem, M., Bethke, A. & Reddy, S. StereoSet: measuring stereotypical bias in pretrained language models. In Proc. 59th Annual Meeting of the Association for Computational Linguistics and 11th International Joint Conference on Natural Language Processing (eds Zong, C. et al.) 5356–5371 (Association for Computational Linguistics, 2021).

Cheng, M., Durmus, E. & Jurafsky, D. Marked personas: using natural language prompts to measure stereotypes in language models. In Proc. 61st Annual Meeting of the Association for Computational Linguistics (eds Rogers, A. et al.) 1504–1532 (Association for Computational Linguistics, 2023).

Bonilla-Silva, E. Racism without Racists: Color-Blind Racism and the Persistence of Racial Inequality in America 4th edn (Rowman & Littlefield, 2014).

Golash-Boza, T. A critical and comprehensive sociological theory of race and racism. Sociol. Race Ethn. 2 , 129–141 (2016).


Kasneci, E. et al. ChatGPT for good? On opportunities and challenges of large language models for education. Learn. Individ. Differ. 103 , 102274 (2023).

Nay, J. J. et al. Large language models as tax attorneys: a case study in legal capabilities emergence. Philos. Trans. R. Soc. A 382 , 20230159 (2024).


Jiang, L. Y. et al. Health system-scale language models are all-purpose prediction engines. Nature 619 , 357–362 (2023).


Bolukbasi, T., Chang, K.-W., Zou, J., Saligrama, V. & Kalai, A. Man is to computer programmer as woman is to homemaker? Debiasing word embeddings. Adv. Neural Inf. Process. Syst. 30 , 4356–4364 (2016).


Caliskan, A., Bryson, J. J. & Narayanan, A. Semantics derived automatically from language corpora contain human-like biases. Science 356 , 183–186 (2017).


Basta, C., Costa-jussà, M. R. & Casas, N. Evaluating the underlying gender bias in contextualized word embeddings. In Proc. First Workshop on Gender Bias in Natural Language Processing (eds Costa-jussà, M. R. et al.) 33–39 (Association for Computational Linguistics, 2019).

Kurita, K., Vyas, N., Pareek, A., Black, A. W. & Tsvetkov, Y. Measuring bias in contextualized word representations. In Proc. First Workshop on Gender Bias in Natural Language Processing (eds Costa-jussà, M. R. et al.) 166–172 (Association for Computational Linguistics, 2019).

Abid, A., Farooqi, M. & Zou, J. Persistent anti-Muslim bias in large language models. In Proc. 2021 AAAI/ACM Conference on AI, Ethics, and Society (eds Fourcade, M. et al.) 298–306 (Association for Computing Machinery, 2021).

Bender, E. M., Gebru, T., McMillan-Major, A. & Shmitchell, S. On the dangers of stochastic parrots: can language models be too big? In Proc. 2021 ACM Conference on Fairness, Accountability, and Transparency 610–623 (Association for Computing Machinery, 2021).

Li, L. & Bamman, D. Gender and representation bias in GPT-3 generated stories. In Proc. Third Workshop on Narrative Understanding (eds Akoury, N. et al.) 48–55 (Association for Computational Linguistics, 2021).

Tamkin, A. et al. Evaluating and mitigating discrimination in language model decisions. Preprint at https://arxiv.org/abs/2312.03689 (2023).

Rae, J. W. et al. Scaling language models: methods, analysis & insights from training Gopher. Preprint at https://arxiv.org/abs/2112.11446 (2021).

Green, L. J. African American English: A Linguistic Introduction (Cambridge Univ. Press, 2002).

King, S. From African American Vernacular English to African American Language: rethinking the study of race and language in African Americans’ speech. Annu. Rev. Linguist. 6 , 285–300 (2020).

Purnell, T., Idsardi, W. & Baugh, J. Perceptual and phonetic experiments on American English dialect identification. J. Lang. Soc. Psychol. 18 , 10–30 (1999).

Massey, D. S. & Lundy, G. Use of Black English and racial discrimination in urban housing markets: new methods and findings. Urban Aff. Rev. 36 , 452–469 (2001).

Dunbar, A., King, S. & Vaughn, C. Dialect on trial: an experimental examination of raciolinguistic ideologies and character judgments. Race Justice https://doi.org/10.1177/21533687241258772 (2024).

Rickford, J. R. & King, S. Language and linguistics on trial: Hearing Rachel Jeantel (and other vernacular speakers) in the courtroom and beyond. Language 92 , 948–988 (2016).

Grogger, J. Speech patterns and racial wage inequality. J. Hum. Resour. 46 , 1–25 (2011).

Katz, D. & Braly, K. Racial stereotypes of one hundred college students. J. Abnorm. Soc. Psychol. 28 , 280–290 (1933).

Gilbert, G. M. Stereotype persistence and change among college students. J. Abnorm. Soc. Psychol. 46 , 245–254 (1951).


Karlins, M., Coffman, T. L. & Walters, G. On the fading of social stereotypes: studies in three generations of college students. J. Pers. Soc. Psychol. 13 , 1–16 (1969).


Devine, P. G. & Elliot, A. J. Are racial stereotypes really fading? The Princeton Trilogy revisited. Pers. Soc. Psychol. Bull. 21 , 1139–1150 (1995).

Madon, S. et al. Ethnic and national stereotypes: the Princeton Trilogy revisited and revised. Pers. Soc. Psychol. Bull. 27 , 996–1010 (2001).

Bergsieker, H. B., Leslie, L. M., Constantine, V. S. & Fiske, S. T. Stereotyping by omission: eliminate the negative, accentuate the positive. J. Pers. Soc. Psychol. 102 , 1214–1238 (2012).


Ghavami, N. & Peplau, L. A. An intersectional analysis of gender and ethnic stereotypes: testing three hypotheses. Psychol. Women Q. 37, 113–127 (2013).

Lambert, W. E., Hodgson, R. C., Gardner, R. C. & Fillenbaum, S. Evaluational reactions to spoken languages. J. Abnorm. Soc. Psychol. 60, 44–51 (1960).

Ball, P. Stereotypes of Anglo-Saxon and non-Anglo-Saxon accents: some exploratory Australian studies with the matched guise technique. Lang. Sci. 5, 163–183 (1983).

Thomas, E. R. & Reaser, J. Delimiting perceptual cues used for the ethnic labeling of African American and European American voices. J. Socioling. 8, 54–87 (2004).

Atkins, C. P. Do employment recruiters discriminate on the basis of nonstandard dialect? J. Employ. Couns. 30, 108–118 (1993).

Payne, K., Downing, J. & Fleming, J. C. Speaking Ebonics in a professional context: the role of ethos/source credibility and perceived sociability of the speaker. J. Tech. Writ. Commun. 30, 367–383 (2000).

Rodriguez, J. I., Cargile, A. C. & Rich, M. D. Reactions to African-American vernacular English: do more phonological features matter? West. J. Black Stud. 28, 407–414 (2004).

Billings, A. C. Beyond the Ebonics debate: attitudes about Black and standard American English. J. Black Stud. 36, 68–81 (2005).

Kurinec, C. A. & Weaver, C. III “Sounding Black”: speech stereotypicality activates racial stereotypes and expectations about appearance. Front. Psychol. 12, 785283 (2021).

Rosa, J. & Flores, N. Unsettling race and language: toward a raciolinguistic perspective. Lang. Soc. 46, 621–647 (2017).

Salehi, B., Hovy, D., Hovy, E. & Søgaard, A. Huntsville, hospitals, and hockey teams: names can reveal your location. In Proc. 3rd Workshop on Noisy User-generated Text (eds Derczynski, L. et al.) 116–121 (Association for Computational Linguistics, 2017).

Radford, A. et al. Language models are unsupervised multitask learners. OpenAI https://cdn.openai.com/better-language-models/language_models_are_unsupervised_multitask_learners.pdf (2019).

Liu, Y. et al. RoBERTa: a robustly optimized BERT pretraining approach. Preprint at https://arxiv.org/abs/1907.11692 (2019).

Raffel, C. et al. Exploring the limits of transfer learning with a unified text-to-text transformer. J. Mach. Learn. Res. 21, 1–67 (2020).

Ouyang, L. et al. Training language models to follow instructions with human feedback. In Proc. 36th Conference on Neural Information Processing Systems (eds Koyejo, S. et al.) 27730–27744 (NeurIPS, 2022).

OpenAI et al. GPT-4 technical report. Preprint at https://arxiv.org/abs/2303.08774 (2023).

Zhang, E. & Zhang, Y. Average precision. In Encyclopedia of Database Systems (eds Liu, L. & Özsu, M. T.) 192–193 (Springer, 2009).

Black, J. S. & van Esch, P. AI-enabled recruiting: what is it and how should a manager use it? Bus. Horiz. 63, 215–226 (2020).

Hunkenschroer, A. L. & Luetge, C. Ethics of AI-enabled recruiting and selection: a review and research agenda. J. Bus. Ethics 178, 977–1007 (2022).

Upadhyay, A. K. & Khandelwal, K. Applying artificial intelligence: implications for recruitment. Strateg. HR Rev. 17, 255–258 (2018).

Tippins, N. T., Oswald, F. L. & McPhail, S. M. Scientific, legal, and ethical concerns about AI-based personnel selection tools: a call to action. Pers. Assess. Decis. 7, 1 (2021).

Aletras, N., Tsarapatsanis, D., Preoţiuc-Pietro, D. & Lampos, V. Predicting judicial decisions of the European Court of Human Rights: a natural language processing perspective. PeerJ Comput. Sci. 2, e93 (2016).

Surden, H. Artificial intelligence and law: an overview. Ga State Univ. Law Rev. 35, 1305–1337 (2019).

Medvedeva, M., Vols, M. & Wieling, M. Using machine learning to predict decisions of the European Court of Human Rights. Artif. Intell. Law 28, 237–266 (2020).

Weidinger, L. et al. Taxonomy of risks posed by language models. In Proc. 2022 ACM Conference on Fairness, Accountability, and Transparency 214–229 (Association for Computing Machinery, 2022).

Czopp, A. M. & Monteith, M. J. Thinking well of African Americans: measuring complimentary stereotypes and negative prejudice. Basic Appl. Soc. Psychol. 28, 233–250 (2006).

Chowdhery, A. et al. PaLM: scaling language modeling with pathways. J. Mach. Learn. Res. 24, 11324–11436 (2023).

Bai, Y. et al. Training a helpful and harmless assistant with reinforcement learning from human feedback. Preprint at https://arxiv.org/abs/2204.05862 (2022).

Brown, T. B. et al. Language models are few-shot learners. In Proc. 34th International Conference on Neural Information Processing Systems (eds Larochelle, H. et al.) 1877–1901 (NeurIPS, 2020).

Dovidio, J. F. & Gaertner, S. L. Aversive racism. Adv. Exp. Soc. Psychol. 36, 1–52 (2004).

Schuman, H., Steeh, C., Bobo, L. D. & Krysan, M. (eds) Racial Attitudes in America: Trends and Interpretations (Harvard Univ. Press, 1998).

Crosby, F., Bromley, S. & Saxe, L. Recent unobtrusive studies of Black and White discrimination and prejudice: a literature review. Psychol. Bull. 87, 546–563 (1980).

Terkel, S. Race: How Blacks and Whites Think and Feel about the American Obsession (New Press, 1992).

Jackman, M. R. & Muha, M. J. Education and intergroup attitudes: moral enlightenment, superficial democratic commitment, or ideological refinement? Am. Sociol. Rev. 49, 751–769 (1984).

Bonilla-Silva, E. The New Racism: Racial Structure in the United States, 1960s–1990s. In Race, Ethnicity, and Nationality in the United States: Toward the Twenty-First Century 1st edn (ed. Wong, P.) Ch. 4 (Westview Press, 1999).

Gao, L. et al. The Pile: an 800GB dataset of diverse text for language modeling. Preprint at https://arxiv.org/abs/2101.00027 (2021).

Ronkin, M. & Karn, H. E. Mock Ebonics: linguistic racism in parodies of Ebonics on the internet. J. Socioling. 3, 360–380 (1999).

Dodge, J. et al. Documenting large webtext corpora: a case study on the Colossal Clean Crawled Corpus. In Proc. 2021 Conference on Empirical Methods in Natural Language Processing (eds Moens, M.-F. et al.) 1286–1305 (Association for Computational Linguistics, 2021).

Steed, R., Panda, S., Kobren, A. & Wick, M. Upstream mitigation is not all you need: testing the bias transfer hypothesis in pre-trained language models. In Proc. 60th Annual Meeting of the Association for Computational Linguistics (eds Muresan, S. et al.) 3524–3542 (Association for Computational Linguistics, 2022).

Feng, S., Park, C. Y., Liu, Y. & Tsvetkov, Y. From pretraining data to language models to downstream tasks: tracking the trails of political biases leading to unfair NLP models. In Proc. 61st Annual Meeting of the Association for Computational Linguistics (eds Rogers, A. et al.) 11737–11762 (Association for Computational Linguistics, 2023).

Köksal, A. et al. Language-agnostic bias detection in language models with bias probing. In Findings of the Association for Computational Linguistics: EMNLP 2023 (eds Bouamor, H. et al.) 12735–12747 (Association for Computational Linguistics, 2023).

Garg, N., Schiebinger, L., Jurafsky, D. & Zou, J. Word embeddings quantify 100 years of gender and ethnic stereotypes. Proc. Natl Acad. Sci. USA 115, E3635–E3644 (2018).

Ferrer, X., van Nuenen, T., Such, J. M. & Criado, N. Discovering and categorising language biases in Reddit. In Proc. Fifteenth International AAAI Conference on Web and Social Media (eds Budak, C. et al.) 140–151 (Association for the Advancement of Artificial Intelligence, 2021).

Ethayarajh, K., Choi, Y. & Swayamdipta, S. Understanding dataset difficulty with V-usable information. In Proc. 39th International Conference on Machine Learning (eds Chaudhuri, K. et al.) 5988–6008 (Proceedings of Machine Learning Research, 2022).

Hoffmann, J. et al. Training compute-optimal large language models. Preprint at https://arxiv.org/abs/2203.15556 (2022).

Liang, P. et al. Holistic evaluation of language models. Transactions on Machine Learning Research https://openreview.net/forum?id=iO4LZibEqW (2023).

Blodgett, S. L., Barocas, S., Daumé III, H. & Wallach, H. Language (technology) is power: A critical survey of “bias” in NLP. In Proc. 58th Annual Meeting of the Association for Computational Linguistics (eds Jurafsky, D. et al.) 5454–5476 (Association for Computational Linguistics, 2020).

Jørgensen, A., Hovy, D. & Søgaard, A. Challenges of studying and processing dialects in social media. In Proc. Workshop on Noisy User-generated Text (eds Xu, W. et al.) 9–18 (Association for Computational Linguistics, 2015).

Blodgett, S. L., Green, L. & O’Connor, B. Demographic dialectal variation in social media: a case study of African-American English. In Proc. 2016 Conference on Empirical Methods in Natural Language Processing (eds Su, J. et al.) 1119–1130 (Association for Computational Linguistics, 2016).

Jørgensen, A., Hovy, D. & Søgaard, A. Learning a POS tagger for AAVE-like language. In Proc. 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (eds Knight, K. et al.) 1115–1120 (Association for Computational Linguistics, 2016).

Blodgett, S. L. & O’Connor, B. Racial disparity in natural language processing: a case study of social media African-American English. Preprint at https://arxiv.org/abs/1707.00061 (2017).

Blodgett, S. L., Wei, J. & O’Connor, B. Twitter universal dependency parsing for African-American and mainstream American English. In Proc. 56th Annual Meeting of the Association for Computational Linguistics (eds Gurevych, I. & Miyao, Y.) 1415–1425 (Association for Computational Linguistics, 2018).

Groenwold, S. et al. Investigating African-American vernacular English in transformer-based text generation. In Proc. 2020 Conference on Empirical Methods in Natural Language Processing (eds Webber, B. et al.) 5877–5883 (Association for Computational Linguistics, 2020).

Ziems, C., Chen, J., Harris, C., Anderson, J. & Yang, D. VALUE: Understanding dialect disparity in NLU. In Proc. 60th Annual Meeting of the Association for Computational Linguistics (eds Muresan, S. et al.) 3701–3720 (Association for Computational Linguistics, 2022).

Davidson, T., Bhattacharya, D. & Weber, I. Racial bias in hate speech and abusive language detection datasets. In Proc. Third Workshop on Abusive Language Online (eds Roberts, S. T. et al.) 25–35 (Association for Computational Linguistics, 2019).

Sap, M., Card, D., Gabriel, S., Choi, Y. & Smith, N. A. The risk of racial bias in hate speech detection. In Proc. 57th Annual Meeting of the Association for Computational Linguistics (eds Korhonen, A. et al.) 1668–1678 (Association for Computational Linguistics, 2019).

Harris, C., Halevy, M., Howard, A., Bruckman, A. & Yang, D. Exploring the role of grammar and word choice in bias toward African American English (AAE) in hate speech classification. In Proc. 2022 ACM Conference on Fairness, Accountability, and Transparency 789–798 (Association for Computing Machinery, 2022).

Gururangan, S. et al. Whose language counts as high quality? Measuring language ideologies in text data selection. In Proc. 2022 Conference on Empirical Methods in Natural Language Processing (eds Goldberg, Y. et al.) 2562–2580 (Association for Computational Linguistics, 2022).

Gaies, S. J. & Beebe, J. D. The matched-guise technique for measuring attitudes and their implications for language education: a critical assessment. In Language Acquisition and the Second/Foreign Language Classroom (ed. Sadtano, E.) 156–178 (SEAMEO Regional Language Centre, 1991).

Hudson, R. A. Sociolinguistics (Cambridge Univ. Press, 1996).

Delobelle, P., Tokpo, E., Calders, T. & Berendt, B. Measuring fairness with biased rulers: a comparative study on bias metrics for pre-trained language models. In Proc. 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (eds Carpuat, M. et al.) 1693–1706 (Association for Computational Linguistics, 2022).

Mattern, J., Jin, Z., Sachan, M., Mihalcea, R. & Schölkopf, B. Understanding stereotypes in language models: Towards robust measurement and zero-shot debiasing. Preprint at https://arxiv.org/abs/2212.10678 (2022).

Eisenstein, J., O’Connor, B., Smith, N. A. & Xing, E. P. A latent variable model for geographic lexical variation. In Proc. 2010 Conference on Empirical Methods in Natural Language Processing (eds Li, H. & Màrquez, L.) 1277–1287 (Association for Computational Linguistics, 2010).

Doyle, G. Mapping dialectal variation by querying social media. In Proc. 14th Conference of the European Chapter of the Association for Computational Linguistics (eds Wintner, S. et al.) 98–106 (Association for Computational Linguistics, 2014).

Huang, Y., Guo, D., Kasakoff, A. & Grieve, J. Understanding U.S. regional linguistic variation with Twitter data analysis. Comput. Environ. Urban Syst. 59, 244–255 (2016).

Eisenstein, J. What to do about bad language on the internet. In Proc. 2013 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (eds Vanderwende, L. et al.) 359–369 (Association for Computational Linguistics, 2013).

Eisenstein, J. Systematic patterning in phonologically-motivated orthographic variation. J. Socioling. 19, 161–188 (2015).

Jones, T. Toward a description of African American vernacular English dialect regions using “Black Twitter”. Am. Speech 90, 403–440 (2015).

Christiano, P. F. et al. Deep reinforcement learning from human preferences. In Proc. 31st International Conference on Neural Information Processing Systems (eds von Luxburg, U. et al.) 4302–4310 (NeurIPS, 2017).

Zhao, T. Z., Wallace, E., Feng, S., Klein, D. & Singh, S. Calibrate before use: Improving few-shot performance of language models. In Proc. 38th International Conference on Machine Learning (eds Meila, M. & Zhang, T.) 12697–12706 (Proceedings of Machine Learning Research, 2021).

Smith, T. W. & Son, J. Measuring Occupational Prestige on the 2012 General Social Survey (NORC at Univ. Chicago, 2014).

Zhao, J., Wang, T., Yatskar, M., Ordonez, V. & Chang, K.-W. Gender bias in coreference resolution: evaluation and debiasing methods. In Proc. 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (eds Walker, M. et al.) 15–20 (Association for Computational Linguistics, 2018).

Hughes, B. T., Srivastava, S., Leszko, M. & Condon, D. M. Occupational prestige: the status component of socioeconomic status. Collabra Psychol. 10, 92882 (2024).

Gramlich, J. The gap between the number of blacks and whites in prison is shrinking. Pew Research Centre https://www.pewresearch.org/short-reads/2019/04/30/shrinking-gap-between-number-of-blacks-and-whites-in-prison (2019).

Walsh, A. The criminal justice system is riddled with racial disparities. Prison Policy Initiative Briefing https://www.prisonpolicy.org/blog/2016/08/15/cjrace (2016).

Röttger, P. et al. Political compass or spinning arrow? Towards more meaningful evaluations for values and opinions in large language models. Preprint at https://arxiv.org/abs/2402.16786 (2024).

Jurafsky, D. & Martin, J. H. Speech and Language Processing: An Introduction to Natural Language Processing, Computational Linguistics, and Speech Recognition (Prentice Hall, 2000).

Salazar, J., Liang, D., Nguyen, T. Q. & Kirchhoff, K. Masked language model scoring. In Proc. 58th Annual Meeting of the Association for Computational Linguistics (eds Jurafsky, D. et al.) 2699–2712 (Association for Computational Linguistics, 2020).

Santurkar, S. et al. Whose opinions do language models reflect? In Proc. 40th International Conference on Machine Learning (eds Krause, A. et al.) 29971–30004 (Proceedings of Machine Learning Research, 2023).

Francis, W. N. & Kucera, H. Brown Corpus Manual (Brown Univ., 1979).

Ziems, C. et al. Multi-VALUE: a framework for cross-dialectal English NLP. In Proc. 61st Annual Meeting of the Association for Computational Linguistics (eds Rogers, A. et al.) 744–768 (Association for Computational Linguistics, 2023).

Acknowledgements

V.H. was funded by the German Academic Scholarship Foundation. P.R.K. was funded in part by the Open Phil AI Fellowship. This work was also funded by the Hoffman-Yee Research Grants programme and the Stanford Institute for Human-Centered Artificial Intelligence. We thank A. Köksal, D. Hovy, K. Gligorić, M. Harrington, M. Casillas, M. Cheng and P. Röttger for feedback on an earlier version of the article.

Author information

Authors and affiliations

Allen Institute for AI, Seattle, WA, USA; University of Oxford, Oxford, UK; LMU Munich, Munich, Germany

Valentin Hofmann

Stanford University, Stanford, CA, USA

Pratyusha Ria Kalluri & Dan Jurafsky

The University of Chicago, Chicago, IL, USA

Sharese King

Contributions

V.H., P.R.K., D.J. and S.K. designed the research. V.H. performed the research and analysed the data. V.H., P.R.K., D.J. and S.K. wrote the paper.

Corresponding authors

Correspondence to Valentin Hofmann or Sharese King.

Ethics declarations

Competing interests

The authors declare no competing interests.

Peer review

Peer review information

Nature thanks Rodney Coates and the other, anonymous, reviewer(s) for their contribution to the peer review of this work.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Extended data figures and tables

Extended Data Fig. 1 Weighted average favourability of top stereotypes about African Americans in humans and top overt as well as covert stereotypes about African Americans in language models (LMs).

The overt stereotypes are more favourable than the reported human stereotypes, except for GPT2. The covert stereotypes are substantially less favourable than the least favourable reported human stereotypes from 1933. Results without weighting, which are very similar, are provided in Supplementary Fig. 6 .

Extended Data Fig. 2 Prestige of occupations associated with AAE (positive values) versus SAE (negative values), for individual language models.

The shaded areas show 95% confidence bands around the regression lines. The association with AAE versus SAE is negatively correlated with occupational prestige, for all language models. We cannot conduct this analysis with GPT4 since the OpenAI API does not give access to the probabilities for all occupations.

Supplementary information

Supplementary Information and Reporting Summary.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

About this article

Cite this article

Hofmann, V., Kalluri, P.R., Jurafsky, D. et al. AI generates covertly racist decisions about people based on their dialect. Nature (2024). https://doi.org/10.1038/s41586-024-07856-5

Received: 08 February 2024

Accepted: 19 July 2024

Published: 28 August 2024

DOI: https://doi.org/10.1038/s41586-024-07856-5

Linguistics Research Paper

  • Definitions
  • 20th-Century Delineations
  • Formal Linguistics
  • Noam Chomsky
  • Language Competence and the Sentence
  • Functional Linguistics
  • Structural and Comparative Linguistics
  • Sociolinguistic Perspectives
  • Language Mixtures, Pidgins and Creoles
  • Linguistics and Politics
  • Language Extinction
  • Psycholinguistics
  • Semantics and Pragmatics
  • Bibliography

Introduction

The reasons and methods for trying to understand language have changed from one historical era to the next, keeping scholarly activity in the field known as linguistics as vibrant as each era. Understanding how perspectives on language have changed provides one key to characterizing the nature of human beings and another to understanding how societies have evolved and grown. For example, Franz Boas (1858–1942) used what became known as descriptive-structural linguistics in his studies of culture and anthropology in the early 20th century. His interpretation of language was, in the words of Michael Agar (1994), “just a ‘part’ of anthropological fieldwork, and the point of fieldwork was to get to culture” (p. 49). This view of linguistics as a vehicle was shared by the students of Boas and remained the primary interpretation for many years, especially through the influence of Leonard Bloomfield. One can only imagine how much meaning about the peoples of the world has been lost to us because of the formal methods used to study language in the early 20th century and because Boas and Bloomfield relegated language to the role of a research tool. For its time, however, descriptive-structural linguistics was a significant advance, albeit one that belonged to anthropology rather than constituting a separate field. That changed dramatically in the latter half of the 20th century, particularly with what Noam Chomsky (2005) called the second cognitive revolution, when the number of new research fields grew (e.g., cognitive psychology, computer science, artificial intelligence). The first cognitive revolution is the name given to the period between the 17th and early 19th centuries, when classical ideas and theories about language were proposed, especially by philosophers such as René Descartes, Gottfried Leibniz, and Immanuel Kant.

In the 21st century, the methods of language study and the characterizations of linguistics hardly resemble those of Boas and the anthropologists of his era. Current scholars cannot capture all the characteristics of language in a single definition or modality, and so cannot designate linguistics as one singular field of study. Multiple views of language and linguistics support a richer perspective on language and people than one that treats linguistic methods only as tools for learning about culture.

Philology in the 1800s was the ancestor of general linguistics. Those who identified themselves as philologists were often recruits from the field of philosophy. Their studies provided historical perspectives on languages, classifying and categorizing them by phonology, morphology, and syntax (but seldom by semantics and pragmatics).

Much of the early linguistic research (i.e., up to the first half of the 20th century) was undertaken to learn about the speech of ancient peoples. It therefore relied on writings, as well as on the spoken word, as these survived and changed into modern eras. Comparative linguistics enabled scholars to look for patterns across spoken languages in order to find connections among them that might indicate how they evolved. Those involved in comparative linguistics were close cousins of researchers in the current subfield of sociolinguistics, which seeks to understand language use and its social implications, as well as the consequences of language and literacy development and education for citizens of nations around the world and of the societies within them.

In the latter half of the 20th century, the pursuit of language understanding strengthened the identity of linguistics as a field made up of several subfields, each studying specific human dimensions evidenced in language use. For example, forensic linguistics provides insights into language, law, and crime, while neurolinguistics examines the relationships between language and the human nervous system. The latter field holds much promise for understanding individuals affected by aphasia and other communication disorders, and it also sheds light on second-language learning and multilingualism. Another subfield, computational linguistics, has supported the developments of the computer age. It brings together scholars from a wide range of related disciplines (e.g., logicians, computer scientists, anthropologists, cognitive scientists) to study natural language understanding and to build models for technological devices and instruments for cross-linguistic communication and translation. For example, the quality of voice recognition on the telephone, and the complexity of the responses such systems can give, were unimaginable even in the early 1980s. Likewise, translating written languages in search engines such as Google requires sensitivity to meaning as well as to how words and grammar correspond between any two languages.

The branching of language studies into a range of related linguistic disciplines shows that there is practically no limit to the number and variety of questions that can be approached. Answers are constrained only by one’s choice of definition, purpose, and characterization of language. Even so, the richness of language research, both past and present, shows that the answer to one question often leads to new and more interesting ones. For the most part, language questions are now seen as posing dynamic challenges within and across the subfields of linguistics. For example, why should we be concerned about the extinction of languages? How did spoken languages evolve?

The Nature of Language

Studies of language by researchers in any of the subfields of linguistics are limited by the particular theory or theories those researchers hold. Each theory derives from the definitions of the elements or characteristics of language that interest the individual researcher. The definitions of language that linguists choose influence the direction their research takes; however, there is much cross-disciplinary understanding among linguists that continuously reshapes arguments and individual theories.

There is a great variety of scholarly definitions of language as well as of languages. Each reflects the theoretical perspectives and areas of study of a specific group (i.e., subfield) of linguists. Non-academics asked for a definition, however, would more often than not associate language with spoken communication. Joel Davis, in his discussions of the mother tongue, explains that linguists face something of a dilemma in offering a single definition of language, both because language has so many characteristics and because they must use their own language to describe language in general. To capture the nature of language and define it, linguists study both language structure (form) and language use (function). Studies may reveal features of a single language or situation, or may uncover them by comparing one language with one or more others.

Those who examine the structure of languages do so to establish a foundation for exploring the distinct parts and composition of specific languages and to see what these languages might have in common. Van Valin explains that from the beginning of the 20th century, those who were curious about “linguistic science,” such as Boas and his contemporary Ferdinand de Saussure (1857–1913), focused especially on identifying language systems to support the further study of language use. This placed definitions of language within a framework that came to be known as structural linguistics. In the 1930s, Leonard Bloomfield reinforced the idea of structuralism, claiming that the main object of linguistic study should be grammatical principles that have little or nothing to do with observations of what individuals know or think about their language.

In the second half of the 20th century, as researchers from fields such as psychology, cognitive science, and sociology began to take an interest in language studies, definitions of language could be classed as representing one of two major approaches: formalism or functionalism. The former studies the systematic, organized ways in which language is structured. The latter is more concerned with language use and with why individuals choose to speak in certain ways and not in others.

Franz Boas, Ferdinand de Saussure, and Leonard Bloomfield are among those acknowledged as formal linguistic researchers in the first half of the 20th century. Their theories and the field of structural linguistics led the way to expanded ideas about language study. Boas is considered the father of American anthropology, and, as stated above, he used linguistic analysis only as a tool to get to culture. Although Saussure did not set down his ideas in articles or books, his lectures, as recorded in his students’ notes, were compiled after his death into a text titled Course in General Linguistics. Language researchers credit Saussure with the growth of linguistics as a science, and his work has been central to the development of the subfield of sociolinguistics. Bloomfield is best known as a linguist, although some classify him as an anthropologist. Of his many writings, his book Language was highly regarded for its discussions of structural linguistics and its comparative work characterizing languages.

The work of these three scholars (Boas, Saussure, and Bloomfield) left an indelible imprint on the field of linguistics. In their wake, a strong desire arose among young language researchers to pursue studies in formal linguistics. However, none would match Noam Chomsky, who moved formal linguistics into a new home: generative transformational grammar.

A political activist as well as a formal linguist, Chomsky identified two particular foci for characterizing language and thereby added to its definitions. In his book Aspects of the Theory of Syntax, he distinguishes between language competence and language performance. Previously, researchers identified with structural linguistics had ignored or paid little attention to language competence, which, as stated by Van Valin (2001), “refers to a native speaker’s knowledge of his or her native language” (p. 326). Structuralists were more concerned with language performance, that is, with how speakers used language forms to communicate. In the work of Chomsky and others who subscribe to the newer area of formalism, there is greater engagement with cognition, and this makes language competence the main focus of efforts to define language. Those who study generative transformational grammar in the tradition of Chomsky look for linguistic characteristics that are universal to all languages (e.g., all natural languages have nouns and verbs). In this tradition, language is studied by exploring its generative capacity, using a logical system of transformations to manipulate syntax.
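As a rough illustration of what a single transformation does, the Python sketch below assumes a toy clause representation of our own (a subject phrase, an auxiliary and a predicate); it is not Chomsky's formalism. The hypothetical function `question_transform` applies one rule, subject-auxiliary inversion, so that the same underlying parts yield either a declarative sentence or a yes/no question.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Clause:
    """Toy underlying clause: a subject phrase, an auxiliary verb and a predicate."""
    subject: tuple[str, ...]
    aux: str
    predicate: tuple[str, ...]


def _capitalize_first(words: list[str]) -> list[str]:
    """Upper-case the first letter of the first word, leaving the rest unchanged."""
    return [words[0][:1].upper() + words[0][1:], *words[1:]]


def declarative(c: Clause) -> str:
    """Spell out the clause in declarative order: subject, auxiliary, predicate."""
    return " ".join(_capitalize_first([*c.subject, c.aux, *c.predicate])) + "."


def question_transform(c: Clause) -> str:
    """Subject-auxiliary inversion: move the auxiliary in front of the subject."""
    return " ".join(_capitalize_first([c.aux, *c.subject, *c.predicate])) + "?"


clause = Clause(subject=("John",), aux="is", predicate=("eager", "to", "please"))
print(declarative(clause))         # John is eager to please.
print(question_transform(clause))  # Is John eager to please?
```

Actual transformational grammars operate on full phrase-structure trees and order many such rules. This sketch only shows the core idea that one underlying structure can be mapped onto different surface forms.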

Chomsky’s work drew attention to distinctions between the surface and deep structures of sentences. For example, he notes that the following two sentences look alike at the surface, with the same sequence of word classes in the same order, but differ at the level of deep structure. In the first, John is the one being pleased (it is easy for someone to please John); in the second, John is the one doing the pleasing (John is eager to please someone):

John is easy to please.

John is eager to please.
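To make the contrast concrete, here is a minimal Python sketch. It is not Chomsky’s formalism: it simply assumes a hand-written mini-lexicon (the hypothetical SUBJECT_ROLE table and underlying function) recording whether the surface subject of “X is ADJ to please” is understood as the one pleased or the one doing the pleasing, and maps each sentence to a simplified underlying who-does-what structure.

```python
from typing import NamedTuple


class Proposition(NamedTuple):
    """A simplified underlying predicate-argument structure: agent <verb>s patient."""
    verb: str
    agent: str
    patient: str


# Hypothetical mini-lexicon: what the surface subject of
# "<subject> is <adjective> to <verb>" corresponds to in the underlying clause.
# "easy" patterns with tough-type adjectives (subject = understood object);
# "eager" patterns with control adjectives (subject = understood agent).
SUBJECT_ROLE = {"easy": "patient", "eager": "agent"}


def underlying(subject: str, adjective: str, verb: str) -> Proposition:
    """Recover who does what from a surface frame like 'John is easy to please'."""
    if SUBJECT_ROLE[adjective] == "patient":
        # Someone unspecified pleases the surface subject.
        return Proposition(verb, agent="someone", patient=subject)
    # The surface subject pleases someone unspecified.
    return Proposition(verb, agent=subject, patient="someone")


for adjective in ("easy", "eager"):
    print(f"John is {adjective} to please:", underlying("John", adjective, "please"))
```

For easy the sketch makes John the patient; for eager it makes John the agent, even though the two surface strings differ by only one word. A transformational analysis would derive this difference from syntactic rules rather than from a lookup table.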

A critical part of Chomsky’s linguistic theories concerns how humans are “wired” for language. In critiquing B. F. Skinner’s Verbal Behavior, Chomsky reinforced his own belief that humans have innate knowledge of grammar, as evidenced by the ways individuals can generate new, never-before-uttered sentences.

This view of universal grammar and linguistic nativism contradicted the work of Edward Sapir and his student Benjamin Whorf, who had proposed a theory of linguistic relativity. The Sapir-Whorf hypothesis states that individuals’ cognition is influenced by their linguistic experiences within their cultures. In other words, people in different cultures have different worldviews, shaped by the ways their languages are structured and used.

In the 1960s, Thomas G. Bever and D. Terence Langendoen characterized language competence in this way: “A person knows how to carry out three kinds of activities with his language: He can produce sentences, he can understand sentences, and he can make judgments about potential sentences” (Stockwell & Macaulay, 1972, p. 32). This characterization concentrates singularly on the sentence. In formal linguistic research, the sentence has been the central grammatical vehicle through which characteristics of language are identified. Although all languages are subjects of study, it is particularly in English and many other SVO languages (i.e., languages with subject-verb-object word order) that the sentence has provided a foundation for analyses.

Formal linguists who are designated as psycholinguists have long held that designing research at levels of discourse beyond the sentence is especially unwieldy and that hypotheses at those levels may be difficult to resolve with certainty. Karl Haberlandt, a psychologist and scholar in the field of memory and cognition, demonstrated this point in his work on the interpretation of written texts from the 1980s into the 21st century.

The previous discussion requires a clarification of the definition of sentence. Formal linguistics looks at the syntax of sentences and at the rules by which the grammar of a language governs the order of words in sentences. For example, English transitive sentences commonly follow the order [s]ubject, [v]erb, [o]bject, although some variations of this order are acceptable in English conversation. French follows an SVO pattern but becomes SOV when the object is a personal pronoun (e.g., Je t’aime, “I you love”). Consider also the ordering of adjectives in English: three enormous green avocados is natural, whereas green enormous three avocados is not.

Although he did not belong to any of the subfields of linguistics mentioned so far, the logician Richard Montague is known for his attempts to formalize language by applying the logic of set theory to the semantics of sentences. Although his life was short, his legacy, Montague grammar, continues to challenge those who value formal linguistics and the study of how language is ordered.
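The set-theoretic idea can be illustrated with a toy model in Python. This is a minimal sketch under stated assumptions, not Montague’s own system (which uses intensional logic and the lambda calculus): every noun and adjective is assumed to denote a set of individuals in an invented model (MODEL, with made-up individuals such as a1 and leaf1), quantifiers like every and some are treated as relations between sets, and adjectives like enormous are treated as intersective.

```python
# Toy model (invented for illustration): each noun or adjective denotes
# the set of individuals in the model of which it is true.
MODEL: dict[str, set[str]] = {
    "avocado": {"a1", "a2", "a3"},
    "green": {"a1", "a2", "a3", "leaf1"},
    "enormous": {"a1", "a2", "boulder1"},
}


def every(restrictor: set[str], scope: set[str]) -> bool:
    """'Every A is B' is true iff the set A is a subset of the set B."""
    return restrictor <= scope


def some(restrictor: set[str], scope: set[str]) -> bool:
    """'Some A is B' is true iff A and B share at least one member."""
    return bool(restrictor & scope)


def modify(adjective: set[str], noun: set[str]) -> set[str]:
    """Intersective adjective: 'enormous avocado' = things that are both enormous and avocados."""
    return adjective & noun


avocados = MODEL["avocado"]
print(every(avocados, MODEL["green"]))     # every avocado is green
print(every(avocados, MODEL["enormous"]))  # every avocado is enormous
print(some(modify(MODEL["enormous"], avocados), MODEL["green"]))  # some enormous avocado is green
```

In this model the first and third sentences come out true and the second false, because a3 is an avocado that is not enormous. The truth of each sentence is computed entirely from set relations, which is the core intuition behind the set-theoretic approach to sentence meaning.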

The second area of focus from which we might derive definitions of language is functionalism. Those involved in this area propose theories of language use that may or may not make room for grammatical analysis. Van Valin classifies functional linguists as extreme, moderate, or conservative. Extreme functionalists see no use for grammatical (i.e., syntactic) analysis in their studies. For them, all language study is necessarily at the level of discourse, and observations of grammar are confined to the discourse. Conservative functional linguists study language by adding language-use components to formal linguistic grammars. They keep syntactic structures as the main part of their research design and supplement them with discourse rules. Susumu Kuno is a well-known functional linguist who proposed a functional sentence perspective that guided part of his research at Harvard University.

Moderate functional linguistics is especially represented by the work of M. A. K. Halliday. This subfield is particularly appealing to anthropologists because it encourages comparative studies of communication and discourse without completely discounting the need for reference to grammatical theories. Moderate functional linguistics includes semantics and pragmatics in the analysis of spoken human discourse. Dell Hymes (1996), credited with naming the linguistic subfield of anthropological linguistics, commented on the nature of language and offered a functionalist perspective on grammar in which he criticized Chomskyan theories of formal generative grammar. The following passage illustrates the thinking of the moderate functional linguist:

The heart of the matter is this. A dominant conception of the goals of “linguistic theory” encourages one to think of language exclusively in terms of the vast potentiality of formal grammar, and to think of that potentiality exclusively in terms of universality. But a perspective which treats language only as an attribute is unintelligible. In actuality language is in large part what users have made of it. (Hymes, 1996, p. 26)

Joseph Greenberg (1915–2001), an important functional linguist and anthropologist who had studied under Boas and whose work was particularly vital in the latter half of the 20th century, is credited with providing the first thorough classification of African languages. Greenberg looked for language universals through language performance rather than through formal analyses such as Chomsky’s. Because this approach led him to characterize languages by type, Greenberg is also mentioned in discussions of typological universals.

Classification of Human Languages

The classification and categorization of human languages is particularly complex. First, there is the complexity that derives from the theories and definitions of linguists, each influenced by his or her own subfield. Second, there is the complex weave among the topics of language evolution, language modification and change, and language death, which, metaphorically speaking, is in some respects an unfinished textile. Each of these areas is connected to the others in both simple and intricate ways, and they continue to spark disagreements among researchers who want to classify languages. When, why, and how did language evolution occur? What are the causes and correlates of language change? Are there any simple reasons why languages die? How do languages differ in interpretation and communication both between and among cultures?

In the last quarter of the 20th century, it became increasingly clear that no single subfield of linguistics could fully answer the questions that concern the classification of languages. Thus, some linguists have joined forces with researchers who hold opposing views or who are experts in allied fields. For example, anthropological linguists do well to partner with formal linguists, neurolinguists, and archaeologists in searching for the origins of spoken language. Researchers such as Marc Hauser, Noam Chomsky, Morten Christiansen, and Simon Kirby have commented on the need for cross-disciplinary efforts to study the evolution of language and languages, and they have collaborated themselves.

Philologists, most of whom later came to be known as comparative philologists and, subsequently, comparative linguists, started out with questions concerning spoken languages and their origins. One of their main lines of inquiry drew on material gleaned from artifacts that survived from ancient civilizations, most of them writings and monuments from the Sumerian civilization dating between 5000 and 2000 BCE. Researchers hypothesized about modes of spoken language by evaluating ancient patterns of writing, that is, by separating out demarcations from other elements of what might be a grammar. They also strove to classify spoken languages by documenting those that occurred in various parts of the world, creating models of word structures and grammars, and looking for consistency and similarities from one geographical area to another. This work of the philologists and comparative linguists was, however, curtailed in 1866, when the Société de Linguistique de Paris banned papers on the origin of language in response to the proliferation of ill-conceived explorations of language evolution prompted by the publication of Darwin’s On the Origin of Species. Not until the last decade of the 20th century did research on the origins and evolution of languages see a resurgence, among a new breed of anthropological linguists quite unlike their comparative-linguist predecessors as well as among teams of researchers from fields such as computer science, neurology, biology, and formal linguistics. Though still using theories derived from formal linguistics, the new research paradigms included language competence and communication theories.

In 1997, Philip Parker produced a statistical analysis of over 460 language groups in 234 countries, showing the connections between linguistic cultures and life issues in their societies (e.g., economics, the resources that defined cultures, and demography). He used variables such as the availability of water, transportation, and means of communication to detect patterns in the development of nations, especially in third world countries. Parker’s work can be studied to understand the difficulties involved in trying to classify languages as well as in identifying new languages or finding those that are going extinct.

Sociolinguists study how individuals use language to be understood within particular communication contexts. This includes research on sports, courts of law, teen talk, conversations between individuals of the same or different genders, and even ITM (instant text messaging). Sociolinguists concentrate primarily on spoken languages or on signed languages, such as American Sign Language. However, several scholars have taken an interest in written languages, especially in literacy. Rather than relying on formal linguistics, as the structural linguists did, sociolinguists use observations about the human condition, human situations, and ethnographic data to understand language. When their research includes formal linguistic analyses, it is to demonstrate language interpretations and comparisons of language use within particular social contexts.

Sociolinguists are well acquainted with Saussure’s theories. Although Saussure was only 2 years old when Darwin published On the Origin of Species (1859), linguists in the early 20th century remarked that Saussure showed an awareness of Darwin’s ideas in his lectures on language change and evolution. At that time, linguists who were more concerned with anthropology, language growth, and language interactions within societies than with the formal characterization of languages attended to linguistic performance rather than linguistic competence. This was the period of structural and comparative linguistics. The term sociolinguist was not used until the early 1950s. In the following two decades, researchers carried out what are now commonly identified as sociolinguistic studies, but they were not fully recognized within a subfield of linguistics called sociolinguistics until well into the 1970s.

Sociolinguists are especially concerned with the processes involved in language use in societies, and their research designs are commonly ethnographic. Dell Hymes has been identified as the father of the ethnography of communication approach used in sociolinguistic research. As an anthropologist, Hymes observed that those in his field and those in linguistics needed to combine theoretical dispositions to fill gaps in each other’s research. He saw that the legacy of Boas had led many anthropologists to regard linguistics only as a tool for their work, as Agar has interpreted it. Hymes also saw that linguists were focusing on what he considered too much formalism. An ethnography of speaking would enable those in each field to get a fuller picture of the language processes individuals use, and the reasons for their use, in connection with a variety of social constructs such as politeness behaviors, courts of law, and deference to the elderly.

Deborah Tannen’s research on gender differences in conversation in the United States in the 1980s used video to compare the conversational behaviors of children, teens, and adults who were paired by gender and left in a room for a short time with only their partners. Her work has added much to our understanding of how communication behaviors, shaped by environment and human nature, develop along the continuum to adulthood. Although Tannen could have dissected her subjects’ conversations using formal grammatical methods, she was far better able to answer her research questions by analyzing the processes, both verbal and nonverbal, that they used. In fact, the nonverbal behaviors were especially revealing.

Tannen’s earlier research had prepared her for her gender comparison study. In one early project, she collaborated with several other linguists to observe and characterize differences in how individuals from several nations around the world verbally interpreted a film. This led to the publication in 1980 of The Pear Stories, edited by Wallace Chafe. Tannen compared the narratives of Athenian Greeks with those of American English speakers and concluded that the style and form of these interpretations vary according to how people adopt the conventionalized rhetorical forms of their own culture. She supports her claims with research from the sociolinguists John Gumperz and Dell Hymes. Her comments about cultural stereotypes in this early study are one reason the work deserves rereading in the 21st century, especially by political scientists and others concerned about cultural misunderstandings arising from translation between the languages of two nations, particularly when the conversations have consequences for peace between those nations:

The cultural differences which have emerged in the present study constitute real differences in habitual ways of talking which operate in actual interaction and create impressions on listeners—the intended impression, very likely, on listeners from the same culture, but possibly confused or misguided impressions on listeners from other cultures. It is easy to see how stereotypes may be created and reinforced. Considering the differences in oral narrative strategies found in the pear narratives, it is not surprising that Americans might develop the impression that Greeks are romantic and irrational, and Greeks might conclude that Americans are cold and lacking in human feelings. (Tannen, 1980, p. 88)

The concept of language mixtures has been identified through sociolinguistic research. It covers oral communication accommodation between people who speak different native languages as well as the use of new “half-languages,” as McWhorter calls them, that is, pidgins and creoles. As people migrate, whether voluntarily or as a consequence of historical circumstances (e.g., the Irish potato famine, the slave trade), they need, to a greater or lesser extent, to communicate with those who do not speak their language. For example, the United States experienced large waves of immigration from the mid-1800s to the 1920s. As these new Americans populated cities on the East Coast and went on to settle throughout the United States, they maintained their original cultures in ethnic neighborhoods and were comfortable speaking their native languages. Schools accommodated these immigrants, providing instruction in English as well as in dominant European languages. Across neighborhoods, individuals tried to communicate for economic reasons and for socialization. Sometimes the elderly preferred to speak only their mother tongue, even insisting that their children or grandchildren do so in their presence. Regardless, these new citizens created what linguists call an interlanguage, which includes words and expressions from both the new language and their mother tongues.

Interlanguage is defined in one of two ways. In the first, an individual creates terms by mixing the native language and the target language. A Polish immigrant might say “Ja będę iść do marku” (“I will go to the market”), replacing the first syllable of the Polish word rynku with the first syllable of the English word market while retaining the final Polish syllable, -ku. (Rynku is a form of rynek, the Polish word for market.)

A second kind of interlanguage arises when speakers manipulate the new language in conversation by imposing the syntax of the native language on the word order of the new language. For example, Larry Selinker, an expert in interlanguage, cites an Israeli speaker who says, “I bought downtown the postcard,” placing the adverb before the direct object.

As individuals become bilingual, they switch between the two languages in their attempts to be understood or to clarify for the listener what they mean. This behavior is called code-switching, and over time, individuals who are in constant communication may create new words and expressions that possess characteristics of both languages.

Studies of interlanguage and code-switching provide information about the development of new languages and especially of new words. Researchers such as Joshua Fishman have observed a special form of language mixture that evolves slowly within speech communities, that is, groups or societies that use one variety of their native language. In this situation, called diglossia, a formal variety of a language coexists with a vernacular. Some languages have one formal variety and one or more informal ones, and the vernaculars are often called the “common language” of the people. What is very interesting about diglossia is that in some places in the world, as in some parts of Africa, two speech communities may live side by side and never mix. Speakers of one language continue to use their mother tongue when addressing speakers of another language, and although the latter understand the former, they never adopt any of their morphology, phonology, or grammar.

Pidgins form when speakers of one language interact with those of a second language for particular purposes. Like other language mixtures, they are called contact languages, and for the most part, they developed during the colonial periods, when European traders sailed to Africa, South America, and islands in the world’s great oceans. However, pidgins may arise whenever speakers of two languages have a particular need to communicate. They are characterized by a mixture of words from each language (e.g., French and Ewe, a national language of Togo) in a somewhat “abbreviated” grammar. Pidgin languages frequently die out as individuals become bilingual or when there is no longer a need for communication between speakers of the two natural languages. Many pidgins that persist become regularized from one generation to the next and take on well-defined morphological and syntactic rules. When this happens, they are called creole languages. McWhorter observes that, just as natural languages may occur in several varieties, creoles, too, may have more than one variety. Creoles often have the same generative properties as other natural languages. One very well-studied creole language is Tok Pisin of Papua New Guinea, which an estimated 4 to 6 million people speak.

Linguistic studies of language mixtures, including pidgins and creoles, have been a source of valuable information for historians and geographers as well as for anthropologists and sociologists. Besides gaining an understanding of more recent history, especially the colonial eras and migrations in modern times, researchers have been able to hypothesize about the structures of and changes in societies that have had contact with groups from distant countries and nations. Linguists who promote theories of linguistic relativity are better able to understand the effects of language change brought on by social interactions among peoples from different parts of the world. As moderate functionalists, they can also evaluate language use by integrating generative grammatical analysis into their work.

An edited text by Joseph, DeStefano, Jacobs, and Lehiste (2003) draws on research that is particularly important to sociolinguistic studies, namely the nature and relationship of languages that may or may not share the same cultural space. In When Languages Collide: Perspectives on Language Conflict, Language Competition, and Language Coexistence, linguists from diverse subfields share essays on, as the editors say, “a variety of language-related problems that affect real people in real situations.” Although each essay represents the views and perspectives of particular researchers, taken together they send a powerful message: the complexities of language and languages reflect the complexities of human behavior and the structure of societies.

As is the case with so many texts in the subfield of sociolinguistics, When Languages Collide invites much reflection on the multiple roles of language through the paradigms of both formalism and functionalism. It especially prompts thought about language endangerment and societal change. Among the topics discussed are language ideologies (i.e., the role of governments in determining language use), language resurgence (e.g., an increase in the number of speakers in the Navajo Nation), and language endangerment. Joshua Fishman, an eminent sociolinguist, expounds on the growth of literacy and the political structures of society; his chapter is especially intriguing since most of his other research involves studies of spoken language. Julie Auger describes the growth of literacy among people in the border areas of Belgium and northwestern France, where a fragile language, Picard, has a growing literary tradition even though few individuals speak it.

Just as there has been a resurgence in studies classifying existing languages and cultures, linguists and anthropologists have also tried to understand the reasons for language endangerment and extinction. They have attempted to keep records of endangered languages, looking at linguistic structures and at the geographic areas where endangerment predominates. David Crystal, considered one of the world’s foremost experts on language, has compiled research on the language survival situation and the reasons for language extinction. In Language Death, Crystal (2000) gave calculations showing that within 100 years between 25% and 80% of the world’s languages will be extinct. As of 2005, the number of known languages (spoken and signed) was estimated at 6,912. Thus, at the lower estimate, approximately 1,728 languages (25% of 6,912) could be extinct by the year 2105, and at the upper estimate about 5,530 (80% of 6,912). He states that currently 96% of the world’s population speaks only 4% of existing languages.
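The arithmetic behind these projections can be checked in a few lines of Python. This is only a restatement of the cited percentages applied to the 2005 count, not Crystal’s own method; the names KNOWN_LANGUAGES_2005 and projected_losses are illustrative.

```python
# Figures as cited in the text: a 2005 count of known languages and
# Crystal's (2000) range for the share that could be lost within 100 years.
KNOWN_LANGUAGES_2005 = 6912
LOW_PERCENT, HIGH_PERCENT = 25, 80


def projected_losses(total: int, percent: int) -> int:
    """Number of languages equal to `percent` of `total`, rounded to a whole language."""
    return round(total * percent / 100)


low = projected_losses(KNOWN_LANGUAGES_2005, LOW_PERCENT)
high = projected_losses(KNOWN_LANGUAGES_2005, HIGH_PERCENT)
print(f"Projected losses by 2105: {low:,} to {high:,} languages")
```

The lower figure, 1,728, is exact; the upper figure, 5,529.6, is rounded to 5,530 because a fraction of a language has no meaning.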

Research on language death is a relatively new pursuit. Just as societies have become concerned with ecology, global warming, and survival, they are becoming more aware of linguistic ecology. There is currently an International Clearing House for Endangered Languages at the University of Tokyo and an Endangered Language Fund in the United States. A new subfield of linguistics, ecolinguistics, focuses on issues of language diversity and language death.

Reasons for extinction include a decline in the number of people who speak a language, as with Northern (Tundra) Yukaghir in Russia, as well as assimilation into a language that predominates in a geographic area. Only around 120 individuals in the Northern Yukaghir villages speak the indigenous language, which is believed to be at least 8,000 years old. All of the community of 1,100 people can speak a second language, Yakut, which shares its name with the Russian republic (Yakutia) in which they live. The two indigenous languages are spoken by the elderly at home. In Ethnologue: Languages of the World, Gordon (2005) noted that these people have no distinct ethnic identity because of their assimilation with other groups in the area, such as the Yakuts and the Evens. Yet the Northern Yukaghirs do share cultural bonds, as explained in the research of Elena Maslova, a formal linguist.

Salikoko Mufwene has summarized the work of linguists, such as David Crystal and Jean Aitchison, regarding language death, decay, murder, and suicide. He also has conjectured about the possibilities for language persistence and language ecology. To do so, Mufwene looks to the social dimensions of language characterization as he has researched it within the subfield of sociolinguistics. He, like other linguists who are concerned about societies and cultures, takes a historical perspective and includes questions and answers from work on migration and colonization in particular areas of the world (e.g., Sub-Saharan Africa). His research adds a special dimension to the subfield of sociolinguistics, which he calls sociohistorical linguistics.

Psycholinguistics is a subfield of linguistics in which researchers study the psychological processes involved in language development and use. The psycholinguist’s primary focus is language behavior, which may include studies of memory, cognition, speech processing, auditory processing, and reading. Like sociolinguistics, this subfield is relatively young. From the late 20th century to the early 21st century, the number of psycholinguistic studies concerned with cognition and language processing has grown rapidly. What is particularly interesting about this field is its focus on the individual as a speaker, writer, and thinker.

Members of the subfield of psycholinguistics are typically identified within the field of psychology and to some extent in educational psychology. Since a primary goal is to understand connections between the mind and language, there appears to be much more collaboration of psycholinguists with others in allied fields than there is among other subfields of linguistics. Perhaps this collaborative nature exists because a large body of psycholinguistic research has to do with language acquisition. Those involved in developmental psycholinguistics have provided a wealth of research regarding language learning in infants and children, cross-linguistic issues in language development, and correlates of brain development and language maturation.

Although most psycholinguists follow the theories of formalism, many may be identified as functionalists. This is especially true among developmental psycholinguists who study child discourse, bilingualism, and language education. Since psycholinguists have a proclivity for collaboration, researchers who are in fields of applied linguistics (i.e., fields that study language use in a variety of situations) tend to be collaborators with psycholinguists and educational psychologists. For example, Evelyn Hatch, a researcher in second-language learning and discourse, uses a variety of research theories that relate to the theory of knowledge known as constructivism. Annette Karmiloff-Smith, who did much early work on children’s narrative interpretations, focuses on the fields of developmental psychology and neuroscience. It has been stated elsewhere that Daniel Slobin’s contributions in developmental psycholinguistics have enabled the field of linguistics in general to understand language acquisition among children in nations that represent a range of spoken language families.

Other concerns of psycholinguists have to do with language perception and language processing. Related to these areas is forensic linguistics, a growing subfield that focuses in part on the interpretation and expression of language in matters of law and crime. Knowledge of memory and language perception is important to forensic linguists, who can draw on the larger subfield of psycholinguistics for their own research.

Language Identification and Tools of Linguistic Studies

The large family of linguists includes those who are driven to research using formal theories and those who are motivated by paradigms of functionalism. At one end of the spectrum are the conservative formal linguists, whose interests lie in how the mind uses language and in identifying and describing universal principles of grammar, as well as those that are unique to each language group. At the other end are the extreme functionalists, whose work is to uncover meaning in the conversations (verbal discourse) of individuals and to see what is similar and what is different in the language use of peoples. Some linguists look at their research through the lens of the historian or anthropologist; others look through the lens of computational models, since these models can mimic natural language. Still others take the route of applied linguistics to put research to practical use, as in forensic linguistics and in psycholinguistics as a component of educational psychology.

Researchers may be especially concerned with the actual language or languages under study, or they may be more concerned with individuals in societies and the conditions of their lives that are determined by their language or languages. Whether the researcher is a sociolinguist or a computational linguist, the resources used in linguistics include words, sentences, conversations, gestures, body language, writings, and a range of nonverbal signals. Linguists separate and manipulate these resources in the main categories of phonology, morphology, syntax, semantics, and pragmatics. These categories apply to analyses of spoken languages as well as signed languages, of which 119 are known throughout the world. Of these, American Sign Language (ASL) is the most studied by formal linguists, as well as by sociolinguists and other functional linguists.

Languages are also delineated as natural or contrived. Simply put, a natural language is any human language that has developed naturally over time. Contrived (invented) languages are not a significant area of study for linguists, although they can be of value for computer paradigms. Computational linguists and those in the field of artificial intelligence study natural languages and try to work out how to simulate them in computer technology.

Many linguists believe that a research paper by Steven Pinker and Paul Bloom (1990), “Natural Language and Natural Selection,” was the main driving force behind the spread of legitimate studies of language evolution into the 21st century. As stated previously, the Société de Linguistique de Paris had imposed a moratorium on this area of research in 1866 because of an unwieldy number of studies of questionable integrity that arose after the 1859 publication of Darwin’s On the Origin of Species.

Phonology refers to the sound system of a language. Descriptive linguistics, in the era of the structural linguists, provided a large body of information on the articulation of speech, the classification of speech sounds in natural languages around the world, and the characterization of the brain areas in which receptive and expressive language originate and function. Linguists began to characterize the phonology of ASL, which involves not only the hands but also facial expression and other body movements, only in the latter half of the 20th century, especially after ASL was acknowledged as a full-fledged language.

From the early 20th century to the present, much research in developmental linguistics has examined language acquisition and the growth of language as it occurs, comparatively, in the speech development of infants and children throughout the world. Slobin’s research, which compared the expressive language of children whose languages belong to different language families (e.g., Turkish, Korean, Estonian, English), has proven invaluable for further studies of language acquisition. For example, he observed that all infants initially babble similar sounds, but sounds that are not common in the speech of a particular language drop off and are “forgotten” as the infant begins to produce first words, generally around the age of 12 months.

Research on the history of the phonology of languages, such as that of John McWhorter, provides a window into how languages change and how new languages develop. McWhorter gives the example of the development from Latin to French. When the Latin word for woman, femina (FEH-mee-nah), became femme (FAHM) in French, the accented syllable was retained and the two weaker syllables were dropped. McWhorter comments that new words and languages develop through the “erosion” of sounds as a parent language gives rise to a new one.

Change in the phonology of a language is believed to be a very slow process, as is change in vocabulary forms, and both are thought to precede changes in grammar. However, research by Atkinson, Meade, Venditti, Greenhill, and Pagel (2008) indicates that there may be rapid bursts of change, which they call punctuational bursts, at the beginning of the development of “fledgling languages” that may derive from older languages. These bursts are then followed by periods of slower development. The authors observed this pattern in their studies of three language families and hypothesized that it holds for phonology, morphology, and syntax.

Anthropological linguists are especially interested in studies of phonology as a way of finding out when humans first began to speak. Biologists, too, have proposed theories about human evolution based on the findings of archaeologists and paleontologists. Although fossil evidence indicates that the anatomical structures for speech were in place 150,000 years ago, scientists question when vocalization came to be used for communication. Even though the physical structures were available in the Middle Paleolithic, archaeological evidence of social organization suggests that the widespread use of speech and verbal language more plausibly began around 40,000 years ago, during the Upper Paleolithic explosion.

Linguists from several subfields may find it worthwhile to collaborate with other researchers, particularly those in speech perception, audiology, neuroscience, and computational linguistics, because each field has expertise in different aspects of phonology. One goal of such collaboration might be to apply new knowledge about phonology to the development of instruments or technologies that serve a medical or engineering purpose. For example, the development of the cochlear implant by Graeme Clark and his colleagues involved a team of experts from 10 fields, including electronic and communication engineering, speech processing, speech science, and psychophysics.

Morphology is the branch of grammar that describes the internal structure of words, that is, how meaningful units (morphemes) combine to form the words that make up the lexicon of a language. As with phonology, morphology is rule driven. Crystal (1985) explained that morphology has two divisions: inflectional morphology and derivational morphology. The study of the structure of words is especially interesting because words are units of meaning that represent actual entities. Early structural linguists examined the use of words and the growth of language lexicons in order to situate them within the grammar of a language. For example, Boas, in his Handbook of American Indian Languages (1911), called attention to the way that Eskimo (Inuit) speakers take a single root word and combine it with other morphological components to designate different words for snow according to their particular experience of it in the Arctic. This point has frequently been discussed by others, including Benjamin Whorf, who used it to support his theory of linguistic relativism.

In generative linguistics, morphology and syntax are considered central to grammar. Crystal explains that, in this framework, the same kinds of syntactic rules that apply to phrases and sentences also apply to the structure of words.

Sometimes one hears the comment, “I don’t have a word for that in my language.” And it may take more than a single word to describe a concept that another language captures in one word. As with the snow example above, linguists may cite such comments to argue for linguistic relativism. What intrigues linguists is the way that words may represent degrees of meaning for an entity. For example, alternatives to the verb walk (e.g., strut, saunter, shuffle) give different impressions of movement in a conversation or text. Linguistic studies of conversation and word use provide information about the growth of languages and language change, even at the level of morphological analysis.

Wierzbicka explains that polysemous words (i.e., words that have many meanings) are a special case in the study of languages. The issue is not necessarily that one language lacks an equivalent for a word in another, but that a particular usage of the word is not permitted. She gives the example of the word freedom, comparing it across five languages. In English, freedom can be used in the contexts freedom from (interruption), freedom to (speak), and freedom of (choice). In Polish, the word wolność is used for moral and political issues, matters of life and death. Unlike English freedom, it cannot be used in contexts such as freedom of access or freedom of movement. It can, however, be used in freedom of conscience.

Syntax refers to the grammar of a language at the level of phrases and sentences. The study of syntax involves knowledge of the rules that govern how words combine to achieve meaning in a given language. Much of the most important work in linguistics has been done at the level of syntax. Whether working in formal or functional paradigms, linguists have concentrated on the sentence and on syntax as primary characteristics that separate humans from the rest of the animal world. Chomsky’s work has contributed not only to the formal understanding of language structure but also to researchers’ understanding of what makes humans distinctive. Belletti and Rizzi (2002) stated it this way:

The critical formal contribution of early generative grammar was to show that the regularity and unboundedness of natural language syntax were expressible by precise grammatical models endowed with recursive procedures. Knowing a language amounts to tacitly possessing a recursive generative procedure. (p. 3)

Formal linguistics, as well as psycholinguistics, makes heavy use of syntactic and morphological structures in its research. There are several methodologies for syntactic analysis. Besides those based on Chomsky’s generative transformational grammar, there are mathematical methods, such as Montague’s, and methods that probe universal grammar, such as optimality-theoretic syntax.

In discourse analysis, those who might be considered conservative functionalists, in Van Valin’s terms, sometimes combine methods, bringing a more formal approach to the observation of syntax in conversational discourse.

Semantics refers to the study of meaning. Pragmatics refers to the connections between specific contexts and meaning. Although these are distinct areas of linguistics, together they have informed theories of understanding and human cognition.

The field of semantics has been especially important to modern philosophy of language and logic. Philosophers such as Rudolf Carnap (1891–1970) and W. V. O. Quine (1908–2000) explored the philosophy of language with consequences for those studying artificial intelligence. Quine, in particular, engaged with the work of Chomsky and with formalism in developing his own direction in logic and language. Semantics also includes studies of speech acts and conversational implicature. John Searle, a prominent philosopher of language who is identified with the Free Speech Movement at Berkeley, has contributed greatly to speech act theory. This theory concerns the search for meaning in what individuals say, which requires an understanding of language contexts as well as linguistic culture. Conversational implicature is one component of speech act theory and concerns particular conventions of speech in which there may be complicated underlying meanings. For example, the request at dinner “Can you pass the salt?” does not call for a yes/no answer but for an acknowledgment in action by the guest. An understanding of speech act theory enables anthropological linguists to draw connections regarding the development of cultures as they observe commonalities in the use of language within particular cultural environments (e.g., traditions of rites of passage to adulthood and interactions in the marketplace).

Applications of meaning to grammar have practical consequences for computational linguists as well as for understanding political and other spoken and written discourse. Researchers in psycholinguistics and sociolinguistics have provided much evidence regarding the role of semantics in a wide range of grammatical and conversational contexts across many diverse cultures around the world.

Concerns that have arisen from linguistic and philosophical theories of semantics have to do with variation in both speaking and writing. Two such areas are ambiguity and referencing. In many spoken languages, such as English, listeners tolerate a great deal of ambiguity in conversation. For example, a sentence such as “Bill told John that he loved Mary,” in which he could refer to either Bill or John, is readily accepted. Spatial relationships and nonverbal cues help listeners disambiguate referents in statements such as “Here it comes” when the statement occurs in a situation such as a baseball flying into the spectator section of a ballpark.

Pragmatics plays an important role in semantic interpretation. Subfields of both formal and functional linguistics concentrate on identifying and interpreting the meaning of statements as they apply to the real world. Speech acts, conversational implicature, ambiguity, and referencing all involve consideration of real-world contexts. For example, the following sentence is usually understood because of the listener’s prior knowledge of how the world works: “Sarah pulled the rug next to the chair and then sat on it.” In this sentence, a psychological principle known as parallel processing influences the listener’s determination of the referent for the pronoun it. One is inclined to match it with the rug; pragmatically speaking, however, it appears more sensible to choose the chair.

Studies of meaning in linguistics, whether at the philosophical level or at the level of human culture and society, involve phonology, morphology, and syntax to greater or lesser extents. Although these areas are often dealt with separately in research, they may also be studied in various combinations or pairings.

It is particularly important for anthropologists to recognize and understand a wide range of linguistic theories in order to support their investigations of cultures and societies. Rather than treating linguistics as an ancillary research tool, as Boas did, anthropologists of the 21st century need to consider language as constitutive of humanity. The range of phenomena that linguistics addresses is so broad, however, that researchers must collaborate in order to address their particular questions. Further study of the role of linguistics in anthropology requires extensive reading in subfields such as those described in this research paper.

Bibliography:

  • Agar, M. (1994). Language shock: The culture of conversation. New York: William Morrow.
  • Aitchison, J. (1991). Language change: Progress or decay? (2nd ed.). New York: Cambridge University Press.
  • Atkinson, Q. D., Meade, A., Venditti, C., Greenhill, S. J., & Pagel, M. (2008). Languages evolve in punctuational bursts. Science, 319,
  • Belletti, A., & Rizzi, L. (Eds.). (2002). Noam Chomsky: On nature and language. New York: Cambridge University Press.
  • Boas, F. (1911). Handbook of American Indian languages (Bureau of American Ethnology, Bulletin 40). Washington, DC: Government Printing Office.
  • Chafe, W. L. (Ed.). (1980). The pear stories: Cognitive, cultural, and linguistic aspects of narrative production. Norwood, NJ: Ablex.
  • Chomsky, N. (1965). Aspects of the theory of syntax. Cambridge: MIT Press.
  • Chomsky, N. (1975). Reflections on language. New York: Pantheon.
  • Chomsky, N. (2005, Summer). What we know: On the universals of language and rights. Boston Review. Available from http://bostonreview.net/BR30.3/chomsky.php
  • Chrosniak, P. N. (2009). Language evolution. In H. J. Birx (Ed.), Encyclopedia of time: Science, philosophy, theology, and culture (pp. 760–765). Thousand Oaks, CA: Sage.
  • Clark, G. (2003). Cochlear implants: Fundamentals and applications. Dordrecht, the Netherlands: Springer.
  • Crystal, D. (1985). A dictionary of linguistics and phonetics (2nd ed.). New York: Blackwell.
  • Crystal, D. (2000). Language death. New York: Cambridge University Press.
  • Darwin, C. (1905). On the origin of species by means of natural selection, or the preservation of favored races in the struggle for life. New York: P. F. Collier.
  • Davis, J. (1994). Mother tongue: How humans create language. Secaucus, NJ: Carol.
  • Gordon, R. G. (Ed.). (2005). Ethnologue: Languages of the world (15th ed.). Dallas, TX: SIL International.
  • Gumperz, J. J., & Hymes, D. (Eds.). (1972). Directions in sociolinguistics: The ethnography of communication. New York: Holt, Rinehart & Winston.
  • Haberlandt, K. (1984). Components of sentence and word reading times. In D. E. Kieras & M. A. Just (Eds.), New methods in reading comprehension research (pp. 219–252). Hillsdale, NJ: Lawrence Erlbaum.
  • Hatch, E. (1992). Discourse and language education. New York: Cambridge University Press.
  • Hauser, M. D., Chomsky, N., & Fitch, W. T. (2002). The faculty of language: What is it, who has it, and how did it evolve? Science, 298, 1569–1579.
  • Hymes, D. H. (1996). Ethnography, linguistics, narrative inequality: Toward an understanding of voice. Bristol, PA: Taylor & Francis.
  • Joseph, B. D., DeStefano, J., Jacobs, N. G., & Lehiste, I. (Eds.). (2003). When languages collide: Perspectives on language conflict, language competition, and language coexistence. Columbus: Ohio State University Press.
  • Maslova, E. (2003). A grammar of Kolyma Yukaghir. New York: Mouton de Gruyter.
  • McWhorter, J. H. (2001). The power of Babel: A natural history of language. New York: Times Books.
  • Parker, P. M. (1997). Linguistic cultures of the world: A statistical reference. Westport, CT: Greenwood Press.
  • Saussure, F. de. (1916). Course in general linguistics (W. Baskin, Trans.). New York: McGraw-Hill.
  • Selinker, L. (1992). Rediscovering interlanguage. New York: Longman.
  • Stockwell, R. P., & Macaulay, R. K. S. (Eds.). (1972). Linguistic change and generative theory. Bloomington: Indiana University Press.
  • Tannen, D. (1980). A comparative analysis of oral narrative strategies: Athenian Greek and American English. In W. Chafe (Ed.), The pear stories (pp. 51–88). Norwood, NJ: Ablex.
  • Tannen, D. (1991). You just don’t understand: Women and men in conversation. New York: Ballantine.
  • Van Valin, R. D. (2001). Functional linguistics. In M. Aronoff & J. Rees-Miller (Eds.), The handbook of linguistics (pp. 319–326). Malden, MA: Blackwell.
  • Wierzbicka, A. (1997). Understanding cultures through their key words: English, Russian, Polish, German, and Japanese. New York: Oxford University Press.
