Research Results Section – Writing Guide and Examples

Research Results

Research results refer to the findings and conclusions derived from a systematic investigation or study conducted to answer a specific question or hypothesis. These results are typically presented in a written report or paper and can include various forms of data such as numerical data, qualitative data, statistics, charts, graphs, and visual aids.

Results Section in Research

The results section of the research paper presents the findings of the study. It is the part of the paper where the researcher reports the data collected during the study and analyzes it to draw conclusions.

In the results section, the researcher should describe the data that was collected, the statistical analysis performed, and the findings of the study. It is important to be objective and not interpret the data in this section. Instead, the researcher should report the data as accurately and objectively as possible.

Structure of Research Results Section

The structure of the research results section can vary depending on the type of research conducted, but in general, it should contain the following components:

  • Introduction: The introduction should provide an overview of the study, its aims, and its research questions. It should also briefly explain the methodology used to conduct the study.
  • Data presentation: This section presents the data collected during the study. It may include tables, graphs, or other visual aids to help readers better understand the data. The data presented should be organized in a logical and coherent way, with headings and subheadings used to help guide the reader.
  • Data analysis: In this section, the data presented in the previous section are analyzed and interpreted. The statistical tests used to analyze the data should be clearly explained, and the results of the tests should be presented in a way that is easy to understand.
  • Discussion of results: This section should provide an interpretation of the results of the study, including a discussion of any unexpected findings. The discussion should also address the study’s research questions and explain how the results contribute to the field of study.
  • Limitations: This section should acknowledge any limitations of the study, such as sample size, data collection methods, or other factors that may have influenced the results.
  • Conclusions: The conclusions should summarize the main findings of the study and provide a final interpretation of the results. The conclusions should also address the study’s research questions and explain how the results contribute to the field of study.
  • Recommendations: This section may provide recommendations for future research based on the study’s findings. It may also suggest practical applications for the study’s results in real-world settings.

Outline of Research Results Section

The following is an outline of the key components typically included in the Results section:

I. Introduction

  • A brief overview of the research objectives and hypotheses
  • A statement of the research question

II. Descriptive statistics

  • Summary statistics (e.g., mean, standard deviation) for each variable analyzed
  • Frequencies and percentages for categorical variables

III. Inferential statistics

  • Results of statistical analyses, including tests of hypotheses
  • Tables or figures to display statistical results

IV. Effect sizes and confidence intervals

  • Effect sizes (e.g., Cohen’s d, odds ratio) to quantify the strength of the relationship between variables
  • Confidence intervals to estimate the range of plausible values for the effect size

V. Subgroup analyses

  • Results of analyses that examined differences between subgroups (e.g., by gender, age, treatment group)

VI. Limitations and assumptions

  • Discussion of any limitations of the study and potential sources of bias
  • Assumptions made in the statistical analyses

VII. Conclusions

  • A summary of the key findings and their implications
  • A statement of whether the hypotheses were supported or not
  • Suggestions for future research
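
As an illustration of items II and IV in the outline above, the following minimal sketch computes summary statistics, category percentages, and an approximate 95% confidence interval for a mean. The data values and the function names are hypothetical. The interval uses a normal approximation (z = 1.96) rather than a t distribution, which is a simplifying assumption that is only reasonable for larger samples.

```python
import math
import statistics
from collections import Counter


def describe(values):
    """Return n, mean, sample SD, and an approximate 95% CI for the mean."""
    n = len(values)
    mean = statistics.mean(values)
    sd = statistics.stdev(values)  # sample SD (n - 1 in the denominator)
    margin = 1.96 * sd / math.sqrt(n)  # normal approximation (an assumption)
    return {"n": n, "mean": mean, "sd": sd, "ci95": (mean - margin, mean + margin)}


def percentages(categories):
    """Return the percentage of observations that fall in each category."""
    counts = Counter(categories)
    total = len(categories)
    return {label: 100 * count / total for label, count in counts.items()}


# Hypothetical data, invented purely for illustration.
gpas = [3.1, 3.4, 2.8, 3.9, 3.6, 3.0, 3.3, 3.7]
groups = ["good sleep", "poor sleep", "good sleep", "good sleep"]

summary = describe(gpas)
print(f"M = {summary['mean']:.2f}, SD = {summary['sd']:.2f}, n = {summary['n']}")
print(percentages(groups))
```

Reporting the mean together with its standard deviation and sample size, as the formatted line does, mirrors the "M = ..., SD = ..." convention used in the example results later in this guide.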

Example of Research Results Section

An example of a research results section could be:

I. Introduction

  • This study sought to examine the relationship between sleep quality and academic performance in college students.
  • Hypothesis: College students who report better sleep quality will have higher GPAs than those who report poor sleep quality.
  • Methodology: Participants completed a survey about their sleep habits and academic performance.

II. Participants

  • Participants were college students (N=200) from a mid-sized public university in the United States.
  • The sample was evenly split by gender (50% female, 50% male) and predominantly white (85%).
  • Participants were recruited through flyers and online advertisements.

III. Results

  • Participants who reported better sleep quality had significantly higher GPAs (M=3.5, SD=0.5) than those who reported poor sleep quality (M=2.9, SD=0.6).
  • See Table 1 for a summary of the results.
  • Participants who reported consistent sleep schedules had higher GPAs than those with irregular sleep schedules.

IV. Discussion

  • The results support the hypothesis that better sleep quality is associated with higher academic performance in college students.
  • These findings have implications for college students, as prioritizing sleep could lead to better academic outcomes.
  • Limitations of the study include self-reported data and the lack of control for other variables that could impact academic performance.

V. Conclusion

  • College students who prioritize sleep may see a positive impact on their academic performance.
  • These findings highlight the importance of sleep in academic success.
  • Future research could explore interventions to improve sleep quality in college students.
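
The example above reports group means and standard deviations but no effect size. As a sketch of how one could be derived from those summary values, the snippet below computes Cohen's d with a pooled standard deviation. The group sizes are not given in the example, so equal groups of 100 are assumed here, and the function name `cohens_d` is illustrative.

```python
import math


def cohens_d(mean1, sd1, n1, mean2, sd2, n2):
    """Cohen's d for two independent groups, using the pooled standard deviation."""
    pooled_var = ((n1 - 1) * sd1**2 + (n2 - 1) * sd2**2) / (n1 + n2 - 2)
    return (mean1 - mean2) / math.sqrt(pooled_var)


# Means and SDs come from the example above. n = 100 per group is an assumption,
# because the example only reports the total sample size (N = 200).
d = cohens_d(3.5, 0.5, 100, 2.9, 0.6, 100)
print(f"Cohen's d = {d:.2f}")
```

Under the equal-group assumption this gives a d of roughly 1.09, which is conventionally read as a large effect. A different split of the 200 participants would change the value somewhat.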

Example of Research Results in a Research Paper:

Our study aimed to compare the performance of three different machine learning algorithms (Random Forest, Support Vector Machine, and Neural Network) in predicting customer churn in a telecommunications company. We collected a dataset of 10,000 customer records, with 20 predictor variables and a binary churn outcome variable.

Our analysis revealed that all three algorithms performed well in predicting customer churn, with accuracies ranging from 84% to 88%. The Random Forest algorithm showed the highest accuracy (88%), followed by the Support Vector Machine (86%) and the Neural Network (84%).

Furthermore, we found that the most important predictor variables for customer churn were monthly charges, contract type, and tenure. Random Forest identified monthly charges as the most important variable, while Support Vector Machine and Neural Network identified contract type as the most important.

Overall, our results suggest that machine learning algorithms can be effective in predicting customer churn in a telecommunications company, and that Random Forest is the most accurate algorithm for this task.
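
To make this kind of comparison concrete, here is a minimal sketch of how accuracy could be computed for each model on held-out labels and then ranked. The labels and predictions are toy values invented for illustration, not data from the study described above, and the helper name `accuracy` is hypothetical.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


# Toy labels (1 = churned, 0 = stayed), invented for illustration only.
y_true = [1, 0, 0, 1, 0, 1, 0, 0, 1, 0]
predictions = {
    "Random Forest": [1, 0, 0, 1, 0, 1, 0, 0, 0, 0],
    "Support Vector Machine": [1, 0, 1, 1, 0, 1, 0, 0, 0, 0],
    "Neural Network": [0, 0, 1, 1, 0, 1, 0, 0, 0, 0],
}

# Score every model, then list them from most to least accurate.
scores = {name: accuracy(y_true, y_pred) for name, y_pred in predictions.items()}
for name, score in sorted(scores.items(), key=lambda item: item[1], reverse=True):
    print(f"{name}: {score:.0%}")
```

Accuracy on its own can be misleading when one class is rare, so a full results section would often report further metrics alongside it.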

Example 3:

Title: The Impact of Social Media on Body Image and Self-Esteem

Abstract: This study aimed to investigate the relationship between social media use, body image, and self-esteem among young adults. A total of 200 participants were recruited from a university and completed self-report measures of social media use, body image satisfaction, and self-esteem.

Results: The results showed that social media use was significantly associated with body image dissatisfaction and lower self-esteem. Specifically, participants who reported spending more time on social media platforms had lower levels of body image satisfaction and self-esteem compared to those who reported less social media use. Moreover, the study found that comparing oneself to others on social media was a significant predictor of body image dissatisfaction and lower self-esteem.

Conclusion: These results suggest that social media use can have negative effects on body image satisfaction and self-esteem among young adults. It is important for individuals to be mindful of their social media use and to recognize the potential negative impact it can have on their mental health. Furthermore, interventions aimed at promoting positive body image and self-esteem should take into account the role of social media in shaping these attitudes and behaviors.

Importance of Research Results

Research results are important for several reasons, including:

  • Advancing knowledge: Research results can contribute to the advancement of knowledge in a particular field, whether it be in science, technology, medicine, social sciences, or humanities.
  • Developing theories: Research results can help to develop or modify existing theories and create new ones.
  • Improving practices: Research results can inform and improve practices in various fields, such as education, healthcare, business, and public policy.
  • Identifying problems and solutions: Research results can identify problems and provide solutions to complex issues in society, including issues related to health, environment, social justice, and economics.
  • Validating claims: Research results can validate or refute claims made by individuals or groups in society, such as politicians, corporations, or activists.
  • Providing evidence: Research results can provide evidence to support decision-making, policy-making, and resource allocation in various fields.

How to Write Results in a Research Paper

Here are some general guidelines on how to write results in a research paper:

  • Organize the results section: Start by organizing the results section in a logical and coherent manner. Divide the section into subsections if necessary, based on the research questions or hypotheses.
  • Present the findings: Present the findings in a clear and concise manner. Use tables, graphs, and figures to illustrate the data and make the presentation more engaging.
  • Describe the data: Describe the data in detail, including the sample size, response rate, and any missing data. Provide relevant descriptive statistics such as means, standard deviations, and ranges.
  • Interpret the findings: Interpret the findings in light of the research questions or hypotheses. Discuss the implications of the findings and the extent to which they support or contradict existing theories or previous research.
  • Discuss the limitations: Discuss the limitations of the study, including any potential sources of bias or confounding factors that may have affected the results.
  • Compare the results: Compare the results with those of previous studies or theoretical predictions. Discuss any similarities, differences, or inconsistencies.
  • Avoid redundancy: Avoid repeating information that has already been presented in the introduction or methods sections. Instead, focus on presenting new and relevant information.
  • Be objective: Be objective in presenting the results, avoiding any personal biases or interpretations.

When to Write Research Results

The following steps show when research results should be written up and where they fit in the paper:

  • After conducting research on the chosen topic and obtaining relevant data, organize the findings in a structured format that accurately represents the information gathered.
  • Once the data has been analyzed and interpreted, and conclusions have been drawn, begin the writing process.
  • Before starting to write, ensure that the research results adhere to the guidelines and requirements of the intended audience, such as a scientific journal or academic conference.
  • Begin by writing an abstract that briefly summarizes the research question, methodology, findings, and conclusions.
  • Follow the abstract with an introduction that provides context for the research, explains its significance, and outlines the research question and objectives.
  • The next section should be a literature review that provides an overview of existing research on the topic and highlights the gaps in knowledge that the current research seeks to address.
  • The methodology section should provide a detailed explanation of the research design, including the sample size, data collection methods, and analytical techniques used.
  • Present the research results in a clear and concise manner, using graphs, tables, and figures to illustrate the findings.
  • Discuss the implications of the research results, including how they contribute to the existing body of knowledge on the topic and what further research is needed.
  • Conclude the paper by summarizing the main findings, reiterating the significance of the research, and offering suggestions for future research.

Purpose of Research Results

The purposes of research results are as follows:

  • Informing policy and practice: Research results can provide evidence-based information to inform policy decisions, such as in the fields of healthcare, education, and environmental regulation. They can also inform best practices in fields such as business, engineering, and social work.
  • Addressing societal problems: Research results can be used to help address societal problems, such as reducing poverty, improving public health, and promoting social justice.
  • Generating economic benefits: Research results can lead to the development of new products, services, and technologies that can create economic value and improve quality of life.
  • Supporting academic and professional development: Research results can be used to support academic and professional development by providing opportunities for students, researchers, and practitioners to learn about new findings and methodologies in their field.
  • Enhancing public understanding: Research results can help to educate the public about important issues and promote scientific literacy, leading to more informed decision-making and better public policy.
  • Evaluating interventions: Research results can be used to evaluate the effectiveness of interventions, such as treatments, educational programs, and social policies. This can help to identify areas where improvements are needed and guide future interventions.
  • Contributing to scientific progress: Research results can contribute to the advancement of science by providing new insights and discoveries that can lead to new theories, methods, and techniques.
  • Informing decision-making: Research results can provide decision-makers with the information they need to make informed decisions. This can include decision-making at the individual, organizational, or governmental levels.
  • Fostering collaboration: Research results can facilitate collaboration between researchers and practitioners, leading to new partnerships, interdisciplinary approaches, and innovative solutions to complex problems.

Advantages of Research Results

Some advantages of research results are as follows:

  • Improved decision-making: Research results can help inform decision-making in various fields, including medicine, business, and government. For example, research on the effectiveness of different treatments for a particular disease can help doctors make informed decisions about the best course of treatment for their patients.
  • Innovation: Research results can lead to the development of new technologies, products, and services. For example, research on renewable energy sources can lead to the development of new and more efficient ways to harness renewable energy.
  • Economic benefits: Research results can stimulate economic growth by providing new opportunities for businesses and entrepreneurs. For example, research on new materials or manufacturing techniques can lead to the development of new products and processes that can create new jobs and boost economic activity.
  • Improved quality of life: Research results can contribute to improving the quality of life for individuals and society as a whole. For example, research on the causes of a particular disease can lead to the development of new treatments and cures, improving the health and well-being of millions of people.

About the author

Muhammad Hassan

Researcher, Academic Writer, Web developer

How to Write the Discussion Section of a Research Paper

The discussion section of a research paper analyzes and interprets the findings, provides context, compares them with previous studies, identifies limitations, and suggests future research directions.

Updated on September 15, 2023

Structure your discussion section right, and you’ll be cited more often while doing a greater service to the scientific community. So, what actually goes into the discussion section? And how do you write it?

The discussion section of your research paper is where you let the reader know how your study is positioned in the literature, what to take away from your paper, and how your work helps them. It can also include your conclusions and suggestions for future studies.

First, we’ll define all the parts of your discussion section, and then look into how to write a strong, effective discussion section for your paper or manuscript.

Discussion section: what it is, what it does

The discussion section comes later in your paper, following the introduction, methods, and results. The discussion sets up your study’s conclusions. Its main goals are to present, interpret, and provide a context for your results.

What is it?

The discussion section provides an analysis and interpretation of the findings, compares them with previous studies, identifies limitations, and suggests future directions for research.

This section combines information from the preceding parts of your paper into a coherent story. By this point, the reader already knows why you did your study (introduction), how you did it (methods), and what happened (results). In the discussion, you’ll help the reader connect the ideas from these sections.

Why is it necessary?

The discussion provides context and interpretations for the results. It also answers the questions posed in the introduction. While the results section describes your findings, the discussion explains what they say. This is also where you can describe the impact or implications of your research.

Adds context for your results

Most research studies aim to answer a question, replicate a finding, or address limitations in the literature. These goals are first described in the introduction. However, in the discussion section, the author can refer back to them to explain how the study's objective was achieved. 

Shows what your results actually mean and real-world implications

The discussion can also describe the effect of your findings on research or practice. How are your results significant for readers, other researchers, or policymakers?

What to include in your discussion (in the correct order)

A complete and effective discussion section should at least touch on the points described below.

Summary of key findings

The discussion should begin with a brief factual summary of the results. Give a concise overview of the main results you obtained.

Begin with key findings with supporting evidence

Your results section described a list of findings, but what message do they send when you look at them all together?

Your findings were detailed in the results section, so there’s no need to repeat them here, but do provide at least a few highlights. This will help refresh the reader’s memory and help them focus on the big picture.

Read the first paragraph of the discussion section in this article (PDF) for an example of how to start this part of your paper. Notice how the authors break down their results and follow each description sentence with an explanation of why each finding is relevant. 

State clearly and concisely

Following a clear and direct writing style is especially important in the discussion section. After all, this is where you will make some of the most impactful points in your paper. While the results section often contains technical vocabulary, such as statistical terms, the discussion section lets you describe your findings more clearly. 

Interpretation of results

Once you’ve given your reader an overview of your results, you need to interpret those results. In other words, what do your results mean? Discuss the findings’ implications and significance in relation to your research question or hypothesis.

Analyze and interpret your findings

Look into your findings and explore what’s behind them or what may have caused them. If your introduction cited theories or studies that could explain your findings, use these sources as a basis to discuss your results.

For example, look at the second paragraph in the discussion section of this article on waggling honey bees. Here, the authors explore their results based on information from the literature.

Unexpected or contradictory results

Sometimes, your findings are not what you expect. Here’s where you describe this and try to find a reason for it. Could it be because of the method you used? Does it have something to do with the variables analyzed? Comparing your methods with those of other similar studies can help with this task.

Context and comparison with previous work

Refer to related studies to place your research in the larger context of the literature. Compare and contrast your findings with existing literature, highlighting similarities, differences, and/or contradictions.

How your work compares or contrasts with previous work

Studies with similar findings to yours can be cited to show the strength of your findings. Information from these studies can also be used to help explain your results. Differences between your findings and others in the literature can also be discussed here. 

How to divide this section into subsections

If you have more than one objective in your study or many key findings, you can dedicate a separate section to each of these. Here’s an example of this approach. You can see that the discussion section is divided into topics and even has a separate heading for each of them. 

Limitations

Many journals require you to include the limitations of your study in the discussion. Even if they don’t, there are good reasons to mention these in your paper.

Why limitations don’t have a negative connotation

A study’s limitations are points to be improved upon in future research. While some of these may be flaws in your method, many may be due to factors you couldn’t predict.

Examples include time constraints or small sample sizes. Pointing this out will help future researchers avoid or address these issues. This part of the discussion can also include any attempts you have made to reduce the impact of these limitations, as in this study.

How limitations add to a researcher's credibility

Pointing out the limitations of your study demonstrates transparency. It also shows that you know your methods well and can conduct a critical assessment of them.  

Implications and significance

The final paragraph of the discussion section should contain the take-home messages for your study. It can also cite the “strong points” of your study, to contrast with the limitations section.

Restate your hypothesis

Remind the reader what your hypothesis was before you conducted the study. 

How was it proven or disproven?

Identify your main findings and describe how they relate to your hypothesis.

How your results contribute to the literature

Were you able to answer your research question? Or address a gap in the literature?

Future implications of your research

Describe the impact that your results may have on the topic of study. Your results may show, for instance, that there are still limitations in the literature for future studies to address. There may be a need for studies that extend your findings in a specific way. You also may need additional research to corroborate your findings. 

Sample discussion section

This fictitious example covers all the aspects discussed above. Your actual discussion section will probably be much longer, but you can read this to get an idea of everything your discussion should cover.

Our results showed that the presence of cats in a household is associated with higher levels of perceived happiness by its human occupants. These findings support our hypothesis and demonstrate the association between pet ownership and well-being. 

The present findings align with those of Bao and Schreer (2016) and Hardie et al. (2023), who observed greater life satisfaction in pet owners relative to non-owners. Although the present study did not directly evaluate life satisfaction, this factor may explain the association between happiness and cat ownership observed in our sample.

Our findings must be interpreted in light of some limitations, such as the focus on cat ownership only rather than pets as a whole. This may limit the generalizability of our results.

Nevertheless, this study had several strengths. These include its strict exclusion criteria and use of a standardized assessment instrument to investigate the relationships between pets and owners. These attributes bolster the accuracy of our results and reduce the influence of confounding factors, increasing the strength of our conclusions. Future studies may examine the factors that mediate the association between pet ownership and happiness to better comprehend this phenomenon.

This brief discussion begins with a quick summary of the results and hypothesis. The next paragraph cites previous research and compares its findings to those of this study. Information from previous studies is also used to help interpret the findings. After discussing the results of the study, some limitations are pointed out. The paper also explains why these limitations may influence the interpretation of results. Then, final conclusions are drawn based on the study, and directions for future research are suggested.

How to make your discussion flow naturally

If you find writing in scientific English challenging, the discussion and conclusions are often the hardest parts of the paper to write. That’s because you’re not just listing studies, methods, and outcomes. You’re actually expressing your thoughts and interpretations in words.

  • How formal should it be?
  • What words should you use, or not use?
  • How do you meet strict word limits, or make it longer and more informative?

Always give it your best, but sometimes a helping hand can, well, help. Getting a professional edit can help clarify your work’s importance while improving the English used to explain it. When readers know the value of your work, they’ll cite it. We’ll assign your study to an expert editor knowledgeable in your area of research. Their work will clarify your discussion, helping it to tell your story. Find out more about AJE Editing.

Adam Goulston, Science Marketing Consultant, PsyD, Human and Organizational Behavior, Scize

Adam Goulston, PsyD, MS, MBA, MISD, ELS

Science Marketing Consultant

Grad Coach

How To Write The Discussion Chapter

A Simple Explainer With Examples + Free Template

By: Jenna Crossley (PhD) | Reviewed By: Dr. Eunice Rautenbach | August 2021

If you’re reading this, chances are you’ve reached the discussion chapter of your thesis or dissertation and are looking for a bit of guidance. Well, you’ve come to the right place! In this post, we’ll unpack and demystify the typical discussion chapter in straightforward, easy-to-understand language, with loads of examples.

Overview: The Discussion Chapter

  • What the discussion chapter is
  • What to include in your discussion
  • How to write up your discussion
  • A few tips and tricks to help you along the way
  • Free discussion template

What (exactly) is the discussion chapter?

The discussion chapter is where you interpret and explain your results within your thesis or dissertation. This contrasts with the results chapter, where you merely present and describe the analysis findings (whether qualitative or quantitative). In the discussion chapter, you elaborate on and evaluate your research findings, and discuss the significance and implications of your results.

In this chapter, you’ll situate your research findings in terms of your research questions or hypotheses and tie them back to previous studies and literature (which you would have covered in your literature review chapter). You’ll also have a look at how relevant and/or significant your findings are to your field of research, and you’ll argue for the conclusions that you draw from your analysis. Simply put, the discussion chapter is there for you to interact with and explain your research findings in a thorough and coherent manner.

What should I include in the discussion chapter?

First things first: in some studies, the results and discussion chapters are combined into one chapter. This depends on the type of study you conducted (i.e., the nature of the study and methodology adopted), as well as the standards set by the university. So, check in with your university regarding their norms and expectations before getting started. In this post, we’ll treat the two chapters as separate, as this is most common.

Basically, your discussion chapter should analyse, explore the meaning of, and identify the importance of the data you presented in your results chapter. In the discussion chapter, you’ll give your results some form of meaning by evaluating and interpreting them. This will help answer your research questions, achieve your research aims and support your overall conclusion(s). Therefore, your discussion chapter should focus on findings that are directly connected to your research aims and questions. Don’t waste precious time and word count on findings that are not central to the purpose of your research project.

As this chapter is a reflection of your results chapter, it’s vital that you don’t report any new findings. In other words, you can’t present claims here if you didn’t present the relevant data in the results chapter first. So, make sure that for every discussion point you raise in this chapter, you’ve covered the respective data analysis in the results chapter. If you haven’t, you’ll need to go back and adjust your results chapter accordingly.

If you’re struggling to get started, try writing down a bullet point list of everything you found in your results chapter. From this, you can make a list of everything you need to cover in your discussion chapter. Also, make sure you revisit your research questions or hypotheses and incorporate the relevant discussion to address these. This will also help you to see how you can structure your chapter logically.

How to write the discussion chapter

Now that you’ve got a clear idea of what the discussion chapter is and what it needs to include, let’s look at how you can go about structuring this critically important chapter. Broadly speaking, there are six core components that need to be included, and these can be treated as steps in the chapter writing process.

Step 1: Restate your research problem and research questions

The first step in writing up your discussion chapter is to remind your reader of your research problem, as well as your research aim(s) and research questions. If you have hypotheses, you can also briefly mention these. This “reminder” is very important because, after reading dozens of pages, the reader may have forgotten the original point of your research or been swayed in another direction. It’s also likely that some readers skip straight to your discussion chapter from the introduction chapter, so make sure that your research aims and research questions are clear.

Step 2: Summarise your key findings

Next, you’ll want to summarise your key findings from your results chapter. This may look different for qualitative and quantitative research: qualitative research may report on themes and relationships, whereas quantitative research may touch on correlations and causal relationships. Regardless of the methodology, in this section you need to highlight the overall key findings in relation to your research questions.

Typically, this section only requires one or two paragraphs, depending on how many research questions you have. Aim to be concise here, as you will unpack these findings in more detail later in the chapter. For now, a few lines that directly address your research questions are all that you need.

Some examples of the kind of language you’d use here include:

  • The data suggest that…
  • The data support/oppose the theory that…
  • The analysis identifies…

These are purely examples. What you present here will be completely dependent on your original research questions, so make sure that you are led by them.


Step 3: Interpret your results

Once you’ve restated your research problem and research question(s) and briefly presented your key findings, you can unpack your findings by interpreting your results. Remember: only include what you reported in your results section – don’t introduce new information.

From a structural perspective, it can be a wise approach to follow a similar structure in this chapter as you did in your results chapter. This would help improve readability and make it easier for your reader to follow your arguments. For example, if you structured your results chapter by qualitative themes, it may make sense to do the same here.

Alternatively, you may structure this chapter by research questions, or based on an overarching theoretical framework that your study revolved around. Every study is different, so you’ll need to assess what structure works best for you.

When interpreting your results, you’ll want to assess how your findings compare to those of the existing research (from your literature review chapter). Even if your findings contrast with the existing research, you need to include these in your discussion. In fact, those contrasts are often the most interesting findings. In this case, you’d want to think about why you didn’t find what you were expecting in your data and what the significance of this contrast is.

Here are a few questions to help guide your discussion:

  • How do your results relate to those of previous studies?
  • If you get results that differ from those of previous studies, why may this be the case?
  • What do your results contribute to your field of research?
  • What other explanations could there be for your findings?

When interpreting your findings, be careful not to draw conclusions that aren’t substantiated. Every claim you make needs to be backed up with evidence or findings from the data (and that data needs to be presented in the previous chapter – results). This can look different for different studies; qualitative data may require quotes as evidence, whereas quantitative data would use statistical methods and tests. Whatever the case, every claim you make needs to be strongly backed up.

Step 4: Acknowledge the limitations of your study

The fourth step in writing up your discussion chapter is to acknowledge the limitations of the study. These limitations can cover any part of your study, from the scope or theoretical basis to the analysis method(s) or sample. For example, you may find that you collected data from a very small sample with unique characteristics, which would mean that you are unable to generalise your results to the broader population.

For some students, discussing the limitations of their work can feel a little bit self-defeating. This is a misconception, as a core indicator of high-quality research is its ability to accurately identify its weaknesses. In other words, accurately stating the limitations of your work is a strength, not a weakness. All that said, be careful not to undermine your own research. Tell the reader what limitations exist and what improvements could be made, but also remind them of the value of your study despite its limitations.

Step 5: Make recommendations for implementation and future research

Now that you’ve unpacked your findings and acknowledged their limitations, the next thing you’ll need to do is reflect on your study in terms of two factors:

  • The practical application of your findings
  • Suggestions for future research

The first thing to discuss is how your findings can be used in the real world – in other words, what contribution can they make to the field or industry? Where are these contributions applicable, how and why? For example, if your research is on communication in health settings, in what ways can your findings be applied to the context of a hospital or medical clinic? Make sure that you spell this out for your reader in practical terms, but also be realistic and make sure that any applications are feasible.

The next discussion point is the opportunity for future research. In other words, how can other studies build on what you’ve found and also improve the findings by overcoming some of the limitations in your study (which you discussed a little earlier)? In doing this, you’ll want to investigate whether your results fit in with findings of previous research, and if not, why this may be the case. For example, are there any factors that you didn’t consider in your study? What future research can be done to remedy this? When you write up your suggestions, make sure that you don’t just say that more research is needed on the topic. Also comment on how that research can build on your study.

Step 6: Provide a concluding summary

Finally, you’ve reached your final stretch. In this section, you’ll want to provide a brief recap of the key findings – in other words, the findings that directly address your research questions. Basically, your conclusion should tell the reader what your study has found, and what they need to take away from reading your report.

When writing up your concluding summary, bear in mind that some readers may skip straight to this section from the beginning of the chapter. So, make sure that this section flows well from and has a strong connection to the opening section of the chapter.

Tips and tricks for an A-grade discussion chapter

Now that you know what the discussion chapter is, what to include and exclude, and how to structure it, here are some tips and suggestions to help you craft a quality discussion chapter.

  • When you write up your discussion chapter, make sure that you keep it consistent with your introduction chapter, as some readers will skip from the introduction chapter directly to the discussion chapter. Your discussion should use the same tense as your introduction, and it should also make use of the same key terms.
  • Don’t make assumptions about your readers. As a writer, you have hands-on experience with the data and so it can be easy to present it in an over-simplified manner. Make sure that you spell out your findings and interpretations for the intelligent layman.
  • Have a look at other theses and dissertations from your institution, especially the discussion sections. This will help you to understand the standards and conventions of your university, and you’ll also get a good idea of how others have structured their discussion chapters. You can also check out our chapter template.
  • Avoid using absolute terms such as “These results prove that…”. Rather, make use of terms such as “suggest” or “indicate”, for example “These results suggest that…” or “These results indicate…”. It is highly unlikely that a dissertation or thesis will scientifically prove something (due to a variety of resource constraints), so be humble in your language.
  • Use well-structured and consistently formatted headings to ensure that your reader can easily navigate between sections, and so that your chapter flows logically and coherently.


36 Comments

William M. Kapambwe

Thanks very much. I have appreciated the six steps on writing the Discussion chapter, which are (i) Restating the research problem and questions (ii) Summarising the key findings (iii) Interpreting the results, relating them to previous results in positive and negative ways, explaining why they are different or the same, their contribution to the field of research and other explanations of the findings (iv) Acknowledging limitations (v) Recommendations for implementation and future research and finally (vi) Providing a concluding summary

My two questions are: 1. On steps 1 and 2, can it be overall, or do you restate and summarise each finding based on the research question? 2. On steps 4 and 5, do you make the acknowledgement and recommendations for each research finding or overall? This is not clear from your explanation.

Please respond.

Ahmed

This post is very useful. I’m wondering whether practical implications must be introduced in the Discussion section or in the Conclusion section?

Yiting W.

I second that “it is highly unlikely that a dissertation or thesis will scientifically prove something”; although, could you enlighten us on that comment and elaborate more please?

Derek Jansen

Sure, no problem.

Scientific proof is generally considered a very strong assertion that something is definitively and universally true. In most scientific disciplines, especially within the realms of natural and social sciences, absolute proof is very rare. Instead, researchers aim to provide evidence that supports or rejects hypotheses. This evidence increases or decreases the likelihood that a particular theory is correct, but it rarely proves something in the absolute sense.

Dissertations and theses, as substantial as they are, typically focus on exploring a specific question or problem within a larger field of study. They contribute to a broader conversation and body of knowledge. The aim is often to provide detailed insight, extend understanding, and suggest directions for further research rather than to offer definitive proof. These academic works are part of a cumulative process of knowledge building where each piece of research connects with others to gradually enhance our understanding of complex phenomena.

Furthermore, the rigorous nature of scientific inquiry involves continuous testing, validation, and potential refutation of ideas. What might be considered a “proof” at one point can later be challenged by new evidence or alternative interpretations. Therefore, the language of “proof” is cautiously used in academic circles to maintain scientific integrity and humility.


Writing your Dissertation: Results and Discussion

When writing a dissertation or thesis, the results and discussion sections can be both the most interesting as well as the most challenging sections to write.

You may choose to write these sections separately, or combine them into a single chapter, depending on your university’s guidelines and your own preferences.

There are advantages to both approaches.

Writing the results and discussion as separate sections allows you to focus first on what results you obtained and set out clearly what happened in your experiments and/or investigations without worrying about their implications. This can focus your mind on what the results actually show and help you to sort them in your head.

However, many people find it easier to combine the results with their implications as the two are closely connected.

Check your university’s requirements carefully before combining the results and discussions sections as some specify that they must be kept separate.

Results Section

The Results section should set out your key experimental results, including any statistical analysis and whether or not the results of these are significant.

You should cover any literature supporting your interpretation of significance. The section does not have to include everything you did, particularly for a doctoral dissertation. However, for an undergraduate or master's thesis, you will probably find that you need to include most of your work.

You should write your results section in the past tense: you are describing what you have done in the past.

Every result included MUST have a method set out in the methods section. Check back to make sure that you have included all the relevant methods.

Conversely, every method should also have some results given so, if you choose to exclude certain experiments from the results, make sure that you remove mention of the method as well.
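If you keep a simple list of the methods you describe and the results you report, this two-way check can be sketched in a few lines of Python. This is only an illustration: the function name and the method labels below are hypothetical and not part of any standard tool.

```python
def cross_check(methods, results):
    """Compare the methods described with the methods that results rely on.

    Both arguments are collections of method labels. Returns two sorted
    lists: results with no matching method, and methods with no results.
    """
    methods, results = set(methods), set(results)
    results_without_method = sorted(results - methods)
    methods_without_results = sorted(methods - results)
    return results_without_method, methods_without_results


# Hypothetical labels, invented purely for illustration.
described_in_methods = ["survey", "interviews", "focus groups"]
reported_in_results = ["survey", "interviews", "document analysis"]

orphan_results, unused_methods = cross_check(described_in_methods, reported_in_results)
print("Results lacking a method:", orphan_results)
print("Methods lacking results:", unused_methods)
```

Anything in the first list needs a method added to the methods section; anything in the second list needs either results added or the method removed.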

If you are unsure whether to include certain results, go back to your research questions and decide whether the results are relevant to them. It doesn’t matter whether they are supportive or not, it’s about relevance. If they are relevant, you should include them.

Having decided what to include, next decide what order to use. You could choose chronological, which should follow the methods, or in order from most to least important in the answering of your research questions, or by research question and/or hypothesis.

You also need to consider how best to present your results: tables, figures, graphs, or text. Try to use a variety of different methods of presentation, and consider your reader: 20 pages of dense tables are hard to understand, as are five pages of graphs, but a single table and well-chosen graph that illustrate your overall findings will make things much clearer.

Make sure that each table and figure has a number and a title. Number tables and figures in separate lists, but consecutively by the order in which you mention them in the text. If you have more than about two or three, it’s often helpful to provide lists of tables and figures alongside the table of contents at the start of your dissertation.
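As a rough sketch of this numbering rule, the Python snippet below assigns numbers to tables and figures separately, in the order each is first mentioned. It assumes a hypothetical placeholder syntax such as {fig:age} and {tab:themes} in a draft; your word processor or typesetting system will normally handle this for you.

```python
import re


def number_by_first_mention(text):
    """Map placeholders like {fig:age} to 'Figure 1', counting each kind separately."""
    kinds = {"fig": "Figure", "tab": "Table"}
    counters = {kind: 0 for kind in kinds}
    assigned = {}
    # Walk through the mentions in reading order; only the first mention counts.
    for match in re.finditer(r"\{(fig|tab):([\w-]+)\}", text):
        label = match.group(0)
        if label not in assigned:
            kind = match.group(1)
            counters[kind] += 1
            assigned[label] = f"{kinds[kind]} {counters[kind]}"
    return assigned


# Hypothetical draft text, invented purely for illustration.
draft = (
    "Response rates are summarised in {tab:response}. "
    "{fig:age} shows the age distribution, and {tab:themes} lists the themes. "
    "As {fig:age} indicates, most participants were under 30; see also {fig:income}."
)
numbers = number_by_first_mention(draft)
for label, caption_number in numbers.items():
    print(label, "->", caption_number)
```

Here the two tables become Table 1 and Table 2 and the two figures become Figure 1 and Figure 2, regardless of how the tables and figures are interleaved in the text.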

Summarise your results in the text, drawing on the figures and tables to illustrate your points.

The text and figures should be complementary, not repeat the same information. You should refer to every table or figure in the text. Any that you don’t feel the need to refer to can safely be moved to an appendix, or even removed.

Make sure that you include information about the size and direction of any changes, including percentage change if appropriate. Statistical tests should include details of p values or confidence intervals and limits.
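To make the arithmetic concrete, here is a minimal Python sketch of a signed percentage change and an approximate 95% confidence interval for a mean. The scores are invented for illustration, and the interval uses a normal approximation, which is an assumption on our part; for small samples a t-based interval from a statistics package would usually be more appropriate.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev


def percentage_change(before, after):
    """Return the signed percentage change from `before` to `after`."""
    if before == 0:
        raise ValueError("percentage change is undefined when the baseline is zero")
    return (after - before) / abs(before) * 100


def mean_confidence_interval(sample, level=0.95):
    """Approximate confidence interval for a sample mean (normal approximation)."""
    n = len(sample)
    if n < 2:
        raise ValueError("need at least two observations")
    centre = mean(sample)
    standard_error = stdev(sample) / sqrt(n)
    z = NormalDist().inv_cdf(0.5 + level / 2)  # about 1.96 for a 95% interval
    return centre - z * standard_error, centre + z * standard_error


# Hypothetical test scores, invented purely for illustration.
scores_before = [52, 48, 50, 55, 45]
scores_after = [60, 58, 62, 65, 55]

change = percentage_change(mean(scores_before), mean(scores_after))
low, high = mean_confidence_interval(scores_after)
print(f"Mean score changed by {change:+.1f}%")
print(f"95% CI for the post-test mean: [{low:.1f}, {high:.1f}]")
```

The sign of the percentage change gives the direction of the change and its magnitude gives the size, so the sentence in your text can state both.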

While you don’t need to include all your primary evidence in this section, you should as a matter of good practice make it available in an appendix, to which you should refer at the relevant point.

For example:

Details of all the interview participants can be found in Appendix A, with transcripts of each interview in Appendix B.

You will, almost inevitably, find that you need to include some slight discussion of your results during this section. This discussion should evaluate the quality of the results and their reliability, but not stray too far into discussion of how far your results support your hypothesis and/or answer your research questions, as that is for the discussion section.

See our pages: Analysing Qualitative Data and Simple Statistical Analysis for more information on analysing your results.

Discussion Section

This section has four purposes. It should:

  • Interpret and explain your results
  • Answer your research question
  • Justify your approach
  • Critically evaluate your study

The discussion section therefore needs to review your findings in the context of the literature and the existing knowledge about the subject.

You also need to demonstrate that you understand the limitations of your research and the implications of your findings for policy and practice. This section should be written in the present tense.

The Discussion section needs to follow from your results and relate back to your literature review. Make sure that everything you discuss is covered in the results section.

Some universities require a separate section on recommendations for policy and practice and/or for future research, while others allow you to include this in your discussion, so check the guidelines carefully.

Starting the Task

Most people are likely to write this section best by preparing an outline, setting out the broad thrust of the argument, and how your results support it.

You may find techniques like mind mapping are helpful in making a first outline; check out our page: Creative Thinking for some ideas about how to think through your ideas. You should start by referring back to your research questions, discuss your results, then set them into the context of the literature, and then into broader theory.

This is likely to be one of the longest sections of your dissertation, and it’s a good idea to break it down into chunks with sub-headings to help your reader to navigate through the detail.

Fleshing Out the Detail

Once you have your outline in front of you, you can start to map out how your results fit into the outline.

This will help you to see whether your results are over-focused in one area, which is why writing up your research as you go along can be a helpful process. For each theme or area, you should discuss how the results help to answer your research question, and whether the results are consistent with your expectations and the literature.

The Importance of Understanding Differences

If your results are controversial and/or unexpected, you should set them fully in context and explain why you think that you obtained them.

Your explanations may include issues such as a non-representative sample for convenience purposes, a response rate skewed towards those with a particular experience, or your own involvement as a participant for sociological research.

You do not need to be apologetic about these, because you made a choice about them, which you should have justified in the methodology section. However, you do need to evaluate your own results against others’ findings, especially if they are different. A full understanding of the limitations of your research is part of a good discussion section.

At this stage, you may want to revisit your literature review, unless you submitted it as a separate submission earlier, and revise it to draw out those studies which have proven more relevant.

Conclude by summarising the implications of your findings in brief, and explain why they are important for researchers and in practice, and provide some suggestions for further work.

You may also wish to make some recommendations for practice. As before, this may be a separate section, or included in your discussion.

The results and discussion, including conclusion and recommendations, are probably the most substantial sections of your dissertation. Once completed, you can begin to relax slightly: you are on to the last stages of writing!


Organizing Your Social Sciences Research Paper

7. The Results

The results section is where you report the findings of your study based upon the methodology [or methodologies] you applied to gather information. The results section should state the findings of the research arranged in a logical sequence without bias or interpretation. A section describing results should be particularly detailed if your paper includes data generated from your own research.

Annesley, Thomas M. "Show Your Cards: The Results Section and the Poker Game." Clinical Chemistry 56 (July 2010): 1066-1070.

Importance of a Good Results Section

When formulating the results section, it's important to remember that the results of a study do not prove anything. Findings can only confirm or reject the hypothesis underpinning your study. However, the act of articulating the results helps you to understand the problem from within, to break it into pieces, and to view the research problem from various perspectives.

The page length of this section is set by the amount and types of data to be reported. Be concise. Use non-textual elements appropriately, such as figures and tables, to present findings more effectively. In deciding what data to describe in your results section, you must clearly distinguish information that would normally be included in a research paper from any raw data or other content that could be included as an appendix. In general, raw data that has not been summarized should not be included in the main text of your paper unless requested to do so by your professor.

Avoid providing data that is not critical to answering the research question. The background information you described in the introduction section should provide the reader with any additional context or explanation needed to understand the results. A good strategy is to always re-read the background section of your paper after you have written up your results to ensure that the reader has enough context to understand the results [and, later, how you interpreted the results in the discussion section of your paper that follows].

Bavdekar, Sandeep B. and Sneha Chandak. "Results: Unraveling the Findings." Journal of the Association of Physicians of India 63 (September 2015): 44-46; Brett, Paul. "A Genre Analysis of the Results Section of Sociology Articles." English for Specific Purposes 13 (1994): 47-59; Burton, Neil et al. Doing Your Education Research Project. Los Angeles, CA: SAGE, 2008; Results. The Structure, Format, Content, and Style of a Journal-Style Scientific Paper. Department of Biology. Bates College; Kretchmer, Paul. Twelve Steps to Writing an Effective Results Section. San Francisco Edit; "Reporting Findings." In Making Sense of Social Research. Malcolm Williams, editor. (London: SAGE Publications, 2003), pp. 188-207.

Structure and Writing Style

I.  Organization and Approach

For most research papers in the social and behavioral sciences, there are two possible ways of organizing the results. Both approaches are appropriate in how you report your findings, but use only one approach.

  • Present a synopsis of the results followed by an explanation of key findings. This approach can be used to highlight important findings. For example, you may have noticed an unusual correlation between two variables during the analysis of your findings. It is appropriate to highlight this finding in the results section. However, speculating as to why this correlation exists and offering a hypothesis about what may be happening belongs in the discussion section of your paper.
  • Present a result and then explain it, before presenting the next result then explaining it, and so on, then end with an overall synopsis. This is the preferred approach if you have multiple results of equal significance. It is more common in longer papers because it helps the reader to better understand each finding. In this model, it is helpful to provide a brief conclusion that ties each of the findings together and provides a narrative bridge to the discussion section of your paper.

NOTE: Just as the literature review should be arranged under conceptual categories rather than systematically describing each source, you should also organize your findings under key themes related to addressing the research problem. This can be done under either format noted above [i.e., a thorough explanation of the key results or a sequential, thematic description and explanation of each finding].

II.  Content

In general, the content of your results section should include the following:

  • Introductory context for understanding the results by restating the research problem underpinning your study . This is useful in re-orientating the reader's focus back to the research problem after having read a review of the literature and your explanation of the methods used for gathering and analyzing information.
  • Inclusion of non-textual elements, such as figures, charts, photos, maps, and tables, to further illustrate key findings, if appropriate. Rather than relying entirely on descriptive text, consider how your findings can be presented visually. This is a helpful way of condensing a lot of data into one place that can then be referred to in the text. Consider placing non-textual elements in appendices if there are many of them.
  • A systematic description of your results, highlighting for the reader observations that are most relevant to the topic under investigation. Not all results that emerge from the methodology used to gather information may be related to answering the "So What?" question. Do not confuse observations with interpretations; observations in this context refer to the important findings you discovered through a process of reviewing prior literature and gathering data.
  • The page length of your results section is guided by the amount and types of data to be reported. However, focus on findings that are important and related to addressing the research problem. It is not uncommon to have unanticipated results that are not relevant to answering the research question. This is not to say that you should ignore tangential findings; in fact, they can be noted as areas for further research in the conclusion of your paper. However, spending time in the results section describing tangential findings clutters your overall results section and distracts the reader.
  • A short paragraph that concludes the results section by synthesizing the key findings of the study . Highlight the most important findings you want readers to remember as they transition into the discussion section. This is particularly important if, for example, there are many results to report, the findings are complicated or unanticipated, or they are impactful or actionable in some way [i.e., able to be pursued in a feasible way applied to practice].

NOTE: Always use the past tense when referring to your study's findings. Findings should always be described as having already happened because the method used to gather the information has been completed.

III.  Problems to Avoid

When writing the results section, avoid doing the following :

  • Discussing or interpreting your results . Save this for the discussion section of your paper, although where appropriate, you should compare or contrast specific results to those found in other studies [e.g., "Similar to the work of Smith [1990], one of the findings of this study is the strong correlation between motivation and academic achievement...."].
  • Reporting background information or attempting to explain your findings. This should have been done in your introduction section, but don't panic! Often the results of a study point to the need for additional background information or to explain the topic further, so don't think you did something wrong. Writing up research is rarely a linear process. Always revise your introduction as needed.
  • Ignoring negative results. A negative result generally refers to a finding that does not support the underlying assumptions of your study. Do not ignore them. Document these findings and then state in your discussion section why you believe a negative result emerged from your study. Note that negative results, and how you handle them, can give you an opportunity to write a more engaging discussion section, so don't hesitate to highlight them.
  • Including raw data or intermediate calculations . Ask your professor if you need to include any raw data generated by your study, such as transcripts from interviews or data files. If raw data is to be included, place it in an appendix or set of appendices that are referred to in the text.
  • Using vague or non-specific language. Be as factual and concise as possible in reporting your findings. Do not use phrases such as "appeared to be greater than other variables..." or "demonstrates promising trends that...." Subjective modifiers should be explained in the discussion section of the paper [i.e., why did one variable appear greater? Or, how does the finding demonstrate a promising trend?].
  • Presenting the same data or repeating the same information more than once . If you want to highlight a particular finding, it is appropriate to do so in the results section. However, you should emphasize its significance in relation to addressing the research problem in the discussion section. Do not repeat it in your results section because you can do that in the conclusion of your paper.
  • Confusing figures with tables. Be sure to properly label any non-textual elements in your paper. Don't call a chart an illustration or a figure a table.

Annesley, Thomas M. "Show Your Cards: The Results Section and the Poker Game." Clinical Chemistry 56 (July 2010): 1066-1070; Bavdekar, Sandeep B. and Sneha Chandak. "Results: Unraveling the Findings." Journal of the Association of Physicians of India 63 (September 2015): 44-46; Burton, Neil et al. Doing Your Education Research Project. Los Angeles, CA: SAGE, 2008; Caprette, David R. Writing Research Papers. Experimental Biosciences Resources. Rice University; Hancock, Dawson R. and Bob Algozzine. Doing Case Study Research: A Practical Guide for Beginning Researchers. 2nd ed. New York: Teachers College Press, 2011; Introduction to Nursing Research: Reporting Research Findings. Nursing Research: Open Access Nursing Research and Review Articles. (January 4, 2012); Kretchmer, Paul. Twelve Steps to Writing an Effective Results Section. San Francisco Edit; Ng, K. H. and W. C. Peh. "Writing the Results." Singapore Medical Journal 49 (2008): 967-968; Reporting Research Findings. Wilder Research, in partnership with the Minnesota Department of Human Services. (February 2009); Results. The Structure, Format, Content, and Style of a Journal-Style Scientific Paper. Department of Biology. Bates College; Schafer, Mickey S. Writing the Results. Thesis Writing in the Sciences. Course Syllabus. University of Florida.

Writing Tip

Why Don't I Just Combine the Results Section with the Discussion Section?

It's not unusual to find articles in scholarly social science journals where the author(s) have combined a description of the findings with a discussion about their significance and implications. You could do this. However, if you are inexperienced in writing research papers, consider keeping the results and the discussion as two distinct sections as a way to better organize your thoughts and, by extension, your paper. Think of the results section as the place where you report what your study found; think of the discussion section as the place where you interpret the information and answer the "So What?" question. As you become more skilled at writing research papers, you can consider melding the results of your study with a discussion of its implications.

Driscoll, Dana Lynn and Aleksandra Kasztalska. Writing the Experimental Report: Methods, Results, and Discussion. The Writing Lab and The OWL. Purdue University.

  • Last Updated: May 22, 2024 12:03 PM
  • URL: https://libguides.usc.edu/writingguide

Writing the Dissertation - Guides for Success: The Results and Discussion

Overview of writing the results and discussion

The results and discussion follow on from the methods or methodology chapter of the dissertation. This creates a natural transition from how you designed your study, to what your study reveals, highlighting your own contribution to the research area.

Disciplinary differences

Please note: this guide is not specific to any one discipline. The results and discussion can vary depending on the nature of the research and the expectations of the school or department, so please adapt the following advice to meet the demands of your project and department. Consult your supervisor for further guidance; you can also peruse our Writing Across Subjects guide.

Guide contents

As part of the Writing the Dissertation series, this guide covers the most common conventions of the results and discussion chapters, giving you the necessary knowledge, tips and guidance needed to impress your markers! The sections are organised as follows:

  • The Difference  - Breaks down the distinctions between the results and discussion chapters.
  • Results  - Provides a walk-through of common characteristics of the results chapter.
  • Discussion - Provides a walk-through of how to approach writing your discussion chapter, including structure.
  • What to Avoid  - Covers a few frequent mistakes you'll want to...avoid!
  • FAQs  - Guidance on first- vs. third-person, limitations and more.
  • Checklist  - Includes a summary of key points and a self-evaluation checklist.

Training and tools

  • The Academic Skills team has recorded a Writing the Dissertation workshop series to help you with each section of a standard dissertation, including a video on writing the results and discussion.
  • The dissertation planner tool can help you think through the timeline for planning, research, drafting and editing.
  • iSolutions offers training and a Word template to help you digitally format and structure your dissertation.

Introduction

The results of your study are often followed by a separate chapter of discussion. This is certainly the case with scientific writing. Some dissertations, however, might incorporate both the results and discussion in one chapter. This depends on the nature of your dissertation and the conventions within your school or department. Always follow the guidelines given to you and ask your supervisor for further guidance.

As part of the Writing the Dissertation series, this guide covers the essentials of writing your results and discussion, giving you the necessary knowledge, tips and guidance needed to leave a positive impression on your markers! This guide covers the results and discussion as separate – although interrelated – chapters, as you'll see in the next two tabs. However, you can easily adapt the guidance to suit one single chapter – keep an eye out for some hints on how to do this throughout the guide.

Results or discussion - what's the difference?

To understand what the results and discussion sections are about, we need to clearly define the difference between the two.

The results should provide a clear account of the findings. This is written in a dry and direct manner, simply highlighting the findings as they appear once processed. You are expected to include tables and graphics, where relevant, to contextualise and illustrate the data.

Rather than simply stating the findings of the study, the discussion interprets the findings to offer a more nuanced understanding of the research. The discussion is similar to the second half of the conclusion because it’s where you consider and formulate a response to the question, ‘what do we now know that we didn’t before?’ (see our Writing the Conclusion guide for more). The discussion achieves this by answering the research questions and responding to any hypotheses proposed. With this in mind, the discussion should be the most insightful chapter or section of your dissertation because it provides the most original insight.

Across the next two tabs of this guide, we will look at the results and discussion chapters separately in more detail.

Writing the results

The results chapter should provide a direct and factual account of the data collected without any interpretation or interrogation of the findings. As this might suggest, the results chapter can be slightly monotonous, particularly for quantitative data. Nevertheless, it’s crucial that you present your results in a clear and direct manner as it provides the necessary detail for your subsequent discussion.

Note: If you’re writing your results and discussion as one chapter, then you can either:

1) write them as distinctly separate sections in the same chapter, with the discussion following on from the results, or...

2) integrate the two throughout by presenting a subset of the results and then discussing that subset in further detail.

Next, we'll explore some of the most important factors to consider when writing your results chapter.

How you structure your results chapter depends on the design and purpose of your study. Here are some possible options for structuring your results chapter (adapted from Glatthorn and Joyner, 2005):

  • Chronological – depending on the nature of the study, it might be important to present your results in order of how you collected the data, such as a pretest-posttest design.
  • Research method – if you’ve used a mixed-methods approach, you could isolate each research method and instrument employed in the study.
  • Research question and/or hypotheses – you could structure your results around your research questions and/or hypotheses, providing you have more than one. However, keep in mind that the results on their own don’t necessarily answer the questions or respond to the hypotheses in a definitive manner. You need to interpret the findings in the discussion chapter to gain a more rounded understanding.
  • Variable – you could isolate each variable in your study (where relevant) and specify how and whether the results changed.

Tables and figures

For your results, you are expected to convert your data into tables and figures, particularly when dealing with quantitative data. Making use of tables and figures is a way of contextualising your results within the study. It also helps to visually reinforce your written account of the data. However, make sure you’re only using tables and figures to supplement, rather than replace, your written account of the results (see the 'What to avoid' tab for more on this).

Figures and tables need to be numbered in the order in which they appear in the dissertation, and the words 'Figure' and 'Table' should be capitalised. You also need to make direct reference to them in the text, which you can do (with some variation) in one of the following ways:

Figure 1 shows…

The results of the test (see Figure 1) demonstrate…

The actual figures and tables themselves also need to be accompanied by a caption that briefly outlines what is displayed. For example:

Table 1. Variables of the regression model

Table captions normally appear above the table, whilst captions for figures or other such graphical forms appear below, although it’s worth confirming this with your supervisor as the formatting can change depending on the school or discipline. The style guide used for writing in your subject area (e.g., Harvard, MLA, APA, OSCOLA) often dictates correct formatting of tables, graphs and figures, so have a look at your style guide for additional support.

Using quotations

If your qualitative data comes from interviews and focus groups, your data will largely consist of quotations from participants. When presenting this data, you should identify and group the most common and interesting responses and then quote two or three relevant examples to illustrate each point. Here’s a brief example from a qualitative study on the habits of online food shoppers:

Regardless of whether or not participants regularly engage in online food shopping, all but two respondents commented, in some form, on the convenience of online food shopping:

“It’s about convenience for me. I’m at work all week and the weekend doesn’t allow much time for food shopping, so knowing it can be ordered and then delivered in 24 hours is great for me” (Participant A).

“It fits around my schedule, which is important for me and my family” (Participant D).

“In the past, I’ve always gone food shopping after work, which has always been a hassle. Online food shopping, however, frees up some of my time” (Participant E).

As shown in this example, each quotation is attributed to a particular participant, although their anonymity is protected. The details used to identify participants can depend on the relevance of certain factors to the research. For instance, age or gender could be included.

Writing the discussion

The discussion chapter is where “you critically examine your own results in the light of the previous state of the subject as outlined in the background, and make judgments as to what has been learnt in your work” (Evans et al., 2014: 12). Whilst the results chapter is strictly factual, reporting on the data on a surface level, the discussion is rooted in analysis and interpretation, allowing you and your reader to delve beneath the surface.

Next, we will review some of the most important factors to consider when writing your discussion chapter.

Like the results, there is no single way to structure your discussion chapter. As always, it depends on the nature of your dissertation and whether you’re dealing with qualitative, quantitative or mixed-methods research. It’s good to be consistent with the results chapter, so you could structure your discussion chapter, where possible, in the same way as your results.

When it comes to structure, it’s particularly important that you guide your reader through the various points, subtopics or themes of your discussion. You should do this by structuring sections of your discussion, which might incorporate three or four paragraphs around the same theme or issue, in a three-part way that mirrors the typical three-part essay structure of introduction, main body and conclusion.

Figure 1: The three-part cycle that embodies a typical essay structure and reflects how you structure themes or subtopics in your discussion. The cycle runs from introduction (topic sentence), to main body (analysis), to conclusion (takeaways); in the graphic, it repeats 3, 5, and 4 times for subtopics A, B, and C.

Introduction – This is your topic sentence, where you clearly state the focus of this paragraph/section. It’s often a fairly short, declarative statement in order to grab the reader’s attention, and it should be clearly related to your research purpose, such as responding to a research question.

Main body – This constitutes your analysis, where you explore the theme or focus outlined in the topic sentence in further detail by interrogating why this particular theme or finding emerged and the significance of this data. This is also where you bring in the relevant secondary literature.

Conclusion – This is the evaluative stage of the cycle, where you explicitly return to the topic sentence and tell the reader what this means in terms of answering the relevant research question and establishing new knowledge. It could be a single sentence, or a short paragraph, and it doesn’t strictly need to appear at the end of every section or theme. Instead, some prefer to bring the main themes together towards the end of the discussion in a single paragraph or two. Either way, it’s imperative that you evaluate the significance of your discussion and tell the reader what this means.

A note on the three-part structure

This is often how you’re taught to construct a paragraph, but the themes and ideas you engage with at dissertation level are going to extend beyond the confines of a short paragraph. Therefore, this is a structure to guide how you write about particular themes or patterns in your discussion. Think of this structure like a cycle that you can use in its smallest form to shape a paragraph; in a slightly larger form to shape a subsection of a chapter; and in its largest form to shape the entire chapter. You can 'level up' the same basic structure to accommodate a greater depth of thinking and critical engagement.

Using secondary literature

Your discussion chapter should return to the relevant literature (previously identified in your literature review) in order to contextualise and deepen your reader’s understanding of the findings. This might help to strengthen your findings, or you might find contradictory evidence that serves to counter your results. In the case of the latter, it’s important that you consider why this might be and what the implications are. It’s through your incorporation of secondary literature that you can consider the question, ‘What do we now know that we didn’t before?’

Limitations

You may have included a limitations section in your methodology chapter (see our Writing the Methodology guide ), but it’s also common to have one in your discussion chapter. The difference here is that your limitations are directly associated with your results and the capacity to interpret and analyse those results.

Think of it this way: the limitations in your methodology refer to the issues identified before conducting the research, whilst the limitations in your discussion refer to the issues that emerged after conducting the research. For example, you might only be able to identify a limitation about the external validity or generalisability of your research once you have processed and analysed the data. Try not to overstress the limitations of your work – doing so can undermine the work you’ve done – and try to contextualise them, perhaps by relating them to certain limitations of other studies.

Recommendations

It’s often good to follow your limitations with some recommendations for future research. This creates a neat linearity from what didn’t work, or what could be improved, to how other researchers could address these issues in the future. This helps to reposition your limitations in a positive way by offering an action-oriented response. Try to limit the number of recommendations you discuss – too many can bring your discussion to a rather negative end as you’re ultimately focusing on what should be done, rather than what you have done. You also don’t need to repeat the recommendations in your conclusion if you’ve included them here.

What to avoid

This portion of the guide will cover some common missteps you should try to avoid in writing your results and discussion.

Over-reliance on tables and figures

It’s very common to produce visual representations of data, such as graphs and tables, and to use these representations in your results chapter. However, the use of these figures should not entirely replace your written account of the data. You don’t need to specify every detail in the data set, but you should provide some written account of what the data shows, drawing your reader’s attention to the most important elements of the data. The figures should support your account and help to contextualise your results. Simply stating, ‘look at Table 1’, without any further detail is not sufficient. Writers often try to do this as a way of saving words, but your markers will know!

Ignoring unexpected or contradictory data

Research can be a complex process with ups and downs, surprises and anomalies. Don’t be tempted to ignore any data that doesn’t meet your expectations, or that perhaps you’re struggling to explain. Failing to report data for these, and other such reasons, is a problem because it undermines your credibility as a researcher, which inevitably undermines your research in the process. You have to do your best to offer some explanation for such data. For instance, there might be some methodological reason behind a particular trend in the data.

Including raw data

You don’t need to include any raw data in your results chapter – raw data meaning unprocessed data that hasn’t undergone any calculations or other such refinement. This can overwhelm your reader and obscure the clarity of the research. You can include raw data in an appendix, providing you feel it’s necessary.

Presenting new results in the discussion

You shouldn’t be stating original findings for the first time in the discussion chapter. The findings of your study should first appear in your results before you elaborate on them in the discussion.

Overstressing the significance of your research

It’s important that you clarify what your research demonstrates so you can highlight your own contribution to the research field. However, don’t overstress or inflate the significance of your results. It’s always difficult to provide definitive answers in academic research, especially with qualitative data. You should be confident and authoritative where possible, but don’t claim to reach the absolute truth when perhaps other conclusions could be reached. Where necessary, you should use hedging (see definition) to slightly soften the tone and register of your language.

Definition: Hedging refers to 'the act of expressing your attitude or ideas in tentative or cautious ways' (Singh and Lukkarila, 2017: 101). It’s mostly achieved through a number of verbs or adverbs, such as ‘suggest’ or ‘seemingly.’

Q: What’s the difference between the results and discussion?

A: The results chapter is a factual account of the data collected, whilst the discussion considers the implications of these findings by relating them to relevant literature and answering your research question(s). See the tab 'The Difference' in this guide for more detail.

Q: Should the discussion include recommendations for future research?

A: Your dissertation should include some recommendations for future research, but it can vary where it appears. Recommendations are often featured towards the end of the discussion chapter, but they also regularly appear in the conclusion chapter (see our Writing the Conclusion guide   for more). It simply depends on your dissertation and the conventions of your school or department. It’s worth consulting any specific guidance that you’ve been given, or asking your supervisor directly.

Q: Should the discussion include the limitations of the study?

A: As with the answer above, you should engage with the limitations of your study, but they might appear in the discussion of some dissertations, or the conclusion of others. Consider the narrative flow and whether it makes sense to include the limitations in your discussion chapter, or your conclusion. You should also consult any discipline-specific guidance you’ve been given, or ask your supervisor for more guidance. Be mindful that these are slightly different to the limitations outlined in the methodology or methods chapter (see our Writing the Methodology guide vs. the 'Discussion' tab of this guide).

Q: Should the results and discussion be in the first-person or third?

A: It’s important to be consistent, so you should use whichever you’ve been using throughout your dissertation. Third-person is more commonly accepted, but certain disciplines are happy with the use of first-person. Just remember that the first-person pronoun can be a distracting but powerful device, so use it sparingly. Consult your lecturer for discipline-specific guidance.

Q: Is there a difference between the discussion and the conclusion of a dissertation?

A: Yes, there is a difference. The discussion chapter is a detailed consideration of how your findings answer your research questions. This includes the use of secondary literature to help contextualise your discussion. Rather than considering the findings in detail, the conclusion briefly summarises and synthesises the main findings of your study before bringing the dissertation to a close. Both are similar, particularly in the way they ‘broaden out’ to consider the wider implications of the research. They are, however, their own distinct chapters, unless otherwise stated by your supervisor.

The results and discussion chapters (or chapter) constitute a large part of your dissertation, as it’s here that your original contribution is foregrounded and discussed in detail. Remember, the results chapter simply reports on the data collected, whilst the discussion is where you consider your research questions and/or hypotheses in more detail by interpreting and interrogating the data. You can integrate both into a single chapter and weave the interpretation of your findings throughout the chapter, although it’s common for the results and discussion to appear as separate chapters. Consult your supervisor for further guidance.

Here’s a final checklist for writing your results and discussion. Remember that not all of these points will be relevant for you, so make sure you cover whatever’s appropriate for your dissertation. The asterisk (*) indicates any content that might not be relevant for your dissertation. To download a copy of the checklist to save and edit, please use the Word document, below.

  • Results and discussion self-evaluation checklist

  • Last Updated: May 23, 2024 9:36 AM
  • URL: https://library.soton.ac.uk/writing_the_dissertation

How to Write a Discussion Section | Tips & Examples

Published on 21 August 2022 by Shona McCombes . Revised on 25 October 2022.

The discussion section is where you delve into the meaning, importance, and relevance of your results.

It should focus on explaining and evaluating what you found, showing how it relates to your literature review, and making an argument in support of your overall conclusion. It should not be a second results section.

There are different ways to write this section, but you can focus your writing around these key elements:

  • Summary: A brief recap of your key results
  • Interpretations: What do your results mean?
  • Implications: Why do your results matter?
  • Limitations: What can’t your results tell us?
  • Recommendations: Avenues for further studies or analyses

Table of contents

  • What not to include in your discussion section
  • Step 1: Summarise your key findings
  • Step 2: Give your interpretations
  • Step 3: Discuss the implications
  • Step 4: Acknowledge the limitations
  • Step 5: Share your recommendations
  • Discussion section example

What not to include in your discussion section

There are a few common mistakes to avoid when writing the discussion section of your paper.

  • Don’t introduce new results: You should only discuss the data that you have already reported in your results section .
  • Don’t make inflated claims: Avoid overinterpretation and speculation that isn’t directly supported by your data.
  • Don’t undermine your research: The discussion of limitations should aim to strengthen your credibility, not emphasise weaknesses or failures.

Step 1: Summarise your key findings

Start this section by reiterating your research problem and concisely summarising your major findings. Don’t just repeat all the data you have already reported – aim for a clear statement of the overall result that directly answers your main research question. This should be no more than one paragraph.

Many students struggle with the differences between a discussion section and a results section. The crux of the matter is that your results section should present your results, and your discussion section should subjectively evaluate them. Try not to blend elements of these two sections, in order to keep your paper sharp.

  • The results indicate that …
  • The study demonstrates a correlation between …
  • This analysis supports the theory that …
  • The data suggest that …

Step 2: Give your interpretations

The meaning of your results may seem obvious to you, but it’s important to spell out their significance for your reader, showing exactly how they answer your research question.

The form of your interpretations will depend on the type of research, but some typical approaches to interpreting the data include:

  • Identifying correlations, patterns, and relationships among the data
  • Discussing whether the results met your expectations or supported your hypotheses
  • Contextualising your findings within previous research and theory
  • Explaining unexpected results and evaluating their significance
  • Considering possible alternative explanations and making an argument for your position

You can organise your discussion around key themes, hypotheses, or research questions, following the same structure as your results section. Alternatively, you can also begin by highlighting the most significant or unexpected results.

  • In line with the hypothesis …
  • Contrary to the hypothesised association …
  • The results contradict the claims of Smith (2007) that …
  • The results might suggest that x. However, based on the findings of similar studies, a more plausible explanation is y.

As well as giving your own interpretations, make sure to relate your results back to the scholarly work that you surveyed in the literature review. The discussion should show how your findings fit with existing knowledge, what new insights they contribute, and what consequences they have for theory or practice.

Ask yourself these questions:

  • Do your results support or challenge existing theories? If they support existing theories, what new information do they contribute? If they challenge existing theories, why do you think that is?
  • Are there any practical implications?

Your overall aim is to show the reader exactly what your research has contributed, and why they should care.

  • These results build on existing evidence of …
  • The results do not fit with the theory that …
  • The experiment provides a new insight into the relationship between …
  • These results should be taken into account when considering how to …
  • The data contribute a clearer understanding of …
  • While previous research has focused on x, these results demonstrate that y.

Even the best research has its limitations. Acknowledging these is important to demonstrate your credibility. Limitations aren’t about listing your errors, but about providing an accurate picture of what can and cannot be concluded from your study.

Limitations might be due to your overall research design, specific methodological choices, or unanticipated obstacles that emerged during your research process.

Here are a few common possibilities:

  • If your sample size was small or limited to a specific group of people, explain how generalisability is limited.
  • If you encountered problems when gathering or analysing data, explain how these influenced the results.
  • If there are potential confounding variables that you were unable to control, acknowledge the effect these may have had.

After noting the limitations, you can reiterate why the results are nonetheless valid for the purpose of answering your research question.

  • The generalisability of the results is limited by …
  • The reliability of these data is impacted by …
  • Due to the lack of data on x, the results cannot confirm …
  • The methodological choices were constrained by …
  • It is beyond the scope of this study to …

Based on the discussion of your results, you can make recommendations for practical implementation or further research. Sometimes, the recommendations are saved for the conclusion.

Suggestions for further research can lead directly from the limitations. Don’t just state that more studies should be done – give concrete ideas for how future work can build on areas that your own research was unable to address.

  • Further research is needed to establish …
  • Future studies should take into account …
  • Avenues for future research include …

Discussion section example


McCombes, S. (2022, October 25). How to Write a Discussion Section | Tips & Examples. Scribbr. Retrieved 21 May 2024, from https://www.scribbr.co.uk/thesis-dissertation/discussion/


Research Skills

Results, Discussion, and Conclusion

Results/Findings

The Results (or Findings) section follows the Methods and precedes the Discussion section. This is where the authors provide the data collected during their study. That data can sometimes be difficult to understand because it is often quite technical. Do not let this intimidate you; you will discover the significance of the results next.

The Discussion section follows the Results and precedes the Conclusions and Recommendations section. It is here that the authors indicate the significance of their results. They answer the question, “Why did we get the results we did?” This section provides logical explanations for the results from the study. Those explanations are often reached by comparing and contrasting the results to prior studies’ findings, so citations to the studies discussed in the Literature Review generally reappear here. This section also usually discusses the limitations of the study and speculates on what the results say about the problem(s) identified in the research question(s). This section is very important because it is finally moving towards an argument. Since the researchers interpret their results according to theoretical underpinnings in this section, there is more room for difference of opinion. The way the authors interpret their results may be quite different from the way you would interpret them or the way another researcher would interpret them.

Note: Some articles collapse the Discussion and Conclusion sections together under a single heading (usually “Conclusion”). If you don’t see a separate Discussion section, don’t worry.  Instead, look in the nearby sections for the types of information described in the paragraph above.

When you first skim an article, it may be useful to go straight to the Conclusion and see if you can figure out what the thesis is since it is usually in this final section. The research gap identified in the introduction indicates what the researchers wanted to look at; what did they claim, ultimately, when they completed their research? What did it show them—and what are they showing us—about the topic? Did they get the results they expected? Why or why not? The thesis is not a sweeping proclamation; rather, it is likely a very reasonable and conditional claim.

Nearly every research article ends by inviting other scholars to continue the work by saying that more research needs to be done on the matter. However, do not mistake this directive for the thesis; it’s a convention. Often, the authors provide specific details about future possible studies that could or should be conducted in order to make more sense of their own study’s conclusions.

  • Parts of An Article. Authored by : Kerry Bowers. Provided by : University of Mississippi. Project : WRIT 250 Committee OER Project. License : CC BY-SA: Attribution-ShareAlike


Writing a scientific paper.


Writing a "good" results section

Figures and Captions in Lab Reports

"Results Checklist" from: How to Write a Good Scientific Paper. Chris A. Mack. SPIE. 2018.

Additional tips for results sections.


This is the core of the paper. Don't start the results sections with methods you left out of the Materials and Methods section. You need to give an overall description of the experiments and present the data you found.

  • Factual statements supported by evidence. Short and sweet without excess words
  • Present representative data rather than endlessly repetitive data
  • Discuss variables only if they had an effect (positive or negative)
  • Use meaningful statistics
  • Avoid redundancy. If it is in the tables or captions you may not need to repeat it

A short article by Dr. Brett Couch and Dr. Deena Wassenberg, Biology Program, University of Minnesota

  • Present the results of the paper, in logical order, using tables and graphs as necessary.
  • Explain the results and show how they help to answer the research questions posed in the Introduction. Evidence does not explain itself; the results must be presented and then explained. 
  • Avoid presenting results that are never discussed, presenting results in chronological order rather than logical order, and ignoring results that do not support the conclusions.
  • Number tables and figures separately beginning with 1 (i.e. Table 1, Table 2, Figure 1, etc.).
  • Do not attempt to evaluate the results in this section. Report only what you found; hold all discussion of the significance of the results for the Discussion section.
  • It is not necessary to describe every step of your statistical analyses. Scientists understand all about null hypotheses, rejection rules, and so forth and do not need to be reminded of them. Just say something like, "Honeybees did not use the flowers in proportion to their availability (χ² = 7.9, p < 0.05, d.f. = 4, chi-square test)." Likewise, cite tables and figures without describing in detail how the data were manipulated. Explanations of this sort should appear in a legend or caption written on the same page as the figure or table.
  • You must refer in the text to each figure or table you include in your paper.
  • Tables generally should report summary-level data, such as means ± standard deviations, rather than all your raw data.  A long list of all your individual observations will mean much less than a few concise, easy-to-read tables or figures that bring out the main findings of your study.  
  • Only use a figure (graph) when the data lend themselves to a good visual representation.  Avoid using figures that show too many variables or trends at once, because they can be hard to understand.
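The compact reporting style recommended in the list above (test statistic, degrees of freedom, p-value in one parenthetical) can be sketched in code. This is a minimal illustration, not a replacement for a statistics package: the visit counts and availability proportions are hypothetical, the function names are invented for this example, and the p-value uses a closed-form tail that is valid only for an even number of degrees of freedom.

```python
import math


def chi_square_gof(observed, expected_props):
    """Return (statistic, degrees of freedom) for a goodness-of-fit test."""
    total = sum(observed)
    expected = [p * total for p in expected_props]
    stat = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    return stat, len(observed) - 1


def chi_square_p_even_df(stat, df):
    """Upper-tail p-value of the chi-square distribution (even df only)."""
    if df <= 0 or df % 2:
        raise ValueError("this closed form requires a positive, even df")
    half = stat / 2
    return math.exp(-half) * sum(half ** i / math.factorial(i)
                                 for i in range(df // 2))


# Hypothetical data: visits to five flower types that are equally available.
visits = [40, 20, 25, 40, 25]
availability = [0.2, 0.2, 0.2, 0.2, 0.2]

stat, df = chi_square_gof(visits, availability)
p = chi_square_p_even_df(stat, df)
report = f"chi-square = {stat:.1f}, d.f. = {df}, p = {p:.3f}"
print(report)
```

The formatted string is what goes into the sentence; the intermediate steps stay out of the Results text.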

From:  https://writingcenter.gmu.edu/guides/imrad-results-discussion

  • Last Updated: Aug 4, 2023 9:33 AM
  • URL: https://guides.lib.uci.edu/scientificwriting


Results & Discussion

Characteristics of results & discussion.

  • Results section contains data collected by scientists from experiments that they conducted.
  • Data can be measurements, numbers, descriptions and/or observations.
  • Scientific data is typically described using graphs, tables, figures, diagrams, maps, charts, photographs and/or equations.
  • Discussion section provides an interpretation of the data, especially in context to previously published research.

The Results and Discussion sections can be written as separate sections (as shown in Fig. 2), but are often combined in a poster into one section called Results and Discussion. This is done in order to (1) save precious space on a poster for the many pieces of information that a scientist would like to tell their audience and (2) make it easier for the audience to understand the significance of the research. Combining the Results section and Discussion section in a poster is different from what is typically done for a scientific journal article. In most journal articles, the Results section is separated from the Discussion section. Journal articles are different from posters in that a scientist is not standing next to their journal article explaining it to a reader. Therefore, in a journal article, an author needs to provide more detailed information so that the reader can understand the research independently. Separating the Results section and Discussion section allows an author the space necessary to write a lengthier description of the research. Journal articles typically contain more text and more content (e.g., figures, tables) than posters.

The Results and Discussion section should contain data, typically in the form of a graph, histogram, chart, image, color-coded map or table ( Figs. 1 & 4 ).   Very often data means numbers that scientists collect from making measurements.   These data are typically presented to an audience in the form of graphs and charts to show a reader how these numbers change over time, space or experimental conditions ( Fig. 7 ).   Numbers can increase, decrease or stay the same and a graph, or another type of figure, can be effectively used to convey this information to a reader in a visual format ( Fig. 7 ).      

Figure 7. Example of a Graph

bar graph showing deciduous trees in Highbanks Metro Park

An audience will be attracted to a poster because of its figures and so it is very important for the author to pay particular attention to the creation, design and placement of the figures in a poster ( Figs. 1 & 4 ).   A good figure is one that is informative, easy to comprehend and allows the reader to understand the significance of the data and experiment.   Very often an author will use color to draw attention to a figure.      

The Discussion section should state the importance of the research that is presented in the poster.   It should provide an interpretation of the results, especially in context to previously published research.   It may propose future experiments that need to be conducted as a result of the research presented in the poster.   It should clearly illustrate the significance of the research with regards to new knowledge, understanding and/or discoveries that were made as part of the research.

Scientific Posters: A Learner's Guide Copyright © 2020 by Ella Weaver; Kylienne A. Shaul; Henry Griffy; and Brian H. Lower is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License , except where otherwise noted.


Open Access

Peer-reviewed

Meta-Research Article

Meta-Research Articles feature data-driven examinations of the methods, reporting, verification, and evaluation of scientific research.


Assessing the evolution of research topics in a biological field using plant science as an example

Roles Conceptualization, Data curation, Formal analysis, Funding acquisition, Investigation, Methodology, Project administration, Resources, Software, Supervision, Validation, Visualization, Writing – original draft, Writing – review & editing


Affiliations Department of Plant Biology, Michigan State University, East Lansing, Michigan, United States of America, Department of Computational Mathematics, Science, and Engineering, Michigan State University, East Lansing, Michigan, United States of America, DOE-Great Lake Bioenergy Research Center, Michigan State University, East Lansing, Michigan, United States of America


Roles Conceptualization, Investigation, Project administration, Supervision, Writing – review & editing

Affiliation Department of Plant Biology, Michigan State University, East Lansing, Michigan, United States of America

  • Shin-Han Shiu, 
  • Melissa D. Lehti-Shiu

PLOS

  • Published: May 23, 2024
  • https://doi.org/10.1371/journal.pbio.3002612


Scientific advances due to conceptual or technological innovations can be revealed by examining how research topics have evolved. But such topical evolution is difficult to uncover and quantify because of the large body of literature and the need for expert knowledge in a wide range of areas in a field. Using plant biology as an example, we used machine learning and language models to classify plant science citations into topics representing interconnected, evolving subfields. The changes in prevalence of topical records over the last 50 years reflect shifts in major research trends and recent radiation of new topics, as well as turnover of model species and vastly different plant science research trajectories among countries. Our approaches readily summarize the topical diversity and evolution of a scientific field with hundreds of thousands of relevant papers, and they can be applied broadly to other fields.

Citation: Shiu S-H, Lehti-Shiu MD (2024) Assessing the evolution of research topics in a biological field using plant science as an example. PLoS Biol 22(5): e3002612. https://doi.org/10.1371/journal.pbio.3002612

Academic Editor: Ulrich Dirnagl, Charite Universitatsmedizin Berlin, GERMANY

Received: October 16, 2023; Accepted: April 4, 2024; Published: May 23, 2024

Copyright: © 2024 Shiu, Lehti-Shiu. This is an open access article distributed under the terms of the Creative Commons Attribution License , which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

Data Availability: The plant science corpus data are available through Zenodo ( https://zenodo.org/records/10022686 ). The codes for the entire project are available through GitHub ( https://github.com/ShiuLab/plant_sci_hist ) and Zenodo ( https://doi.org/10.5281/zenodo.10894387 ).

Funding: This work was supported by the National Science Foundation (IOS-2107215 and MCB-2210431 to MDL and SHS; DGE-1828149 and IOS-2218206 to SHS), Department of Energy grant Great Lakes Bioenergy Research Center (DE-SC0018409 to SHS). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

Competing interests: The authors have declared that no competing interests exist.

Abbreviations: BERT, Bidirectional Encoder Representations from Transformers; br, brassinosteroid; ccTLD, country code Top Level Domain; c-Tf-Idf, class-based Tf-Idf; ChatGPT, Chat Generative Pretrained Transformer; ga, gibberellic acid; LOWESS, locally weighted scatterplot smoothing; MeSH, Medical Subject Heading; SHAP, SHapley Additive exPlanations; SJR, SCImago Journal Rank; Tf-Idf, Term frequency-Inverse document frequency; UMAP, Uniform Manifold Approximation and Projection

Introduction

The explosive growth of scientific data in recent years has been accompanied by a rapidly increasing volume of literature. These records represent a major component of our scientific knowledge and embody the history of conceptual and technological advances in various fields over time. Our ability to wade through these records is important for identifying relevant literature for specific topics, a crucial practice of any scientific pursuit [ 1 ]. Classifying the large body of literature into topics can provide a useful means to identify relevant literature. In addition, these topics offer an opportunity to assess how scientific fields have evolved and when major shifts took place. However, such classification is challenging because the relevant articles in any topic or domain can number in the tens or hundreds of thousands, and the literature is in the form of natural language, which takes substantial effort and expertise to process [ 2 , 3 ]. In addition, even if one could digest all literature in a field, it would still be difficult to quantify such knowledge.

In the last several years, there has been a quantum leap in natural language processing approaches due to the feasibility of building complex deep learning models with highly flexible architectures [ 4 , 5 ]. The development of large language models such as Bidirectional Encoder Representations from Transformers (BERT; [ 6 ]) and Chat Generative Pretrained Transformer (ChatGPT; [ 7 ]) has enabled the analysis, generation, and modeling of natural language texts in a wide range of applications. The success of these applications is, in large part, due to the feasibility of considering how the same words are used in different contexts when modeling natural language [ 6 ]. One such application is topic modeling, the practice of establishing statistical models of semantic structures underlying a document collection. Topic modeling has been proposed for identifying scientific hot topics over time [ 1 ], for example, in synthetic biology [ 8 ], and it has also been applied to, for example, automatically identify topical scenes in images [ 9 ] and social network topics [ 10 ], discover gene programs highly correlated with cancer prognosis [ 11 ], capture “chromatin topics” that define cell-type differences [ 12 ], and investigate relationships between genetic variants and disease risk [ 13 ]. Here, we use topic modeling to ask how research topics in a scientific field have evolved and what major changes in the research trends have taken place, using plant science as an example.

Plant science corpora allow classification of major research topics

Plant science, broadly defined, is the study of photosynthetic species, their interactions with biotic/abiotic environments, and their applications. For modeling plant science topical evolution, we first identified a collection of plant science documents (i.e., corpus) using a text classification approach. To this end, we first collected over 30 million PubMed records and narrowed down candidate plant science records by searching for those with plant-related terms and taxon names (see Materials and methods ). Because there remained a substantial number of false positives (i.e., biomedical records mentioning plants in passing), a set of positive plant science examples from the 17 plant science journals with the highest numbers of plant science publications covering a wide range of subfields and a set of negative examples from journals with few candidate plant science records were used to train 4 types of text classification models (see Materials and methods ). The best text classification model performed well (F1 = 0.96, F1 of a naïve model = 0.5, perfect model = 1) where the positive and negative examples were clearly separated from each other based on prediction probability of the hold-out testing dataset (false negative rate = 2.6%, false positive rate = 5.2%, S1A and S1B Fig ). The false prediction rate for documents from the 17 plant science journals annotated with the Medical Subject Heading (MeSH) term “Plants” in NCBI was 11.7% (see Materials and methods ). The prediction probability distribution of positive instances with the MeSH term has an expected left-skew to lower values ( S1C Fig ) compared with the distributions of all positive instances ( S1A Fig ). Thus, this subset with the MeSH term is a skewed representation of articles from these 17 major plant science journals. 
To further benchmark the validity of the plant science records, we also conducted manual annotation of 100 records where the false positive and false negative rates were 14.6% and 10.6%, respectively (see Materials and methods ). Using 12 other plant science journals not included as positive examples as benchmarks, the false negative rate was 9.9% (see Materials and methods ). Considering the range of false prediction rate estimates with different benchmarks, we should emphasize that the model built with the top 17 plant science journals represents a substantial fraction of plant science publications but with biases. Applying the model to the candidate plant science records led to 421,658 positive predictions, hereafter referred to as “plant science records” ( S1D Fig and S1 Data ).
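The performance measures quoted above (F1, false negative rate, false positive rate) all derive from the four cells of a confusion matrix. The sketch below shows the arithmetic. The counts are hypothetical: they assume a balanced hold-out set of 1,000 positive and 1,000 negative examples, chosen only so that the rates match the reported 2.6% and 5.2%. The actual test-set sizes are not given in this passage, and the function name is illustrative.

```python
def classification_rates(tp, fp, tn, fn):
    """Compute common binary-classification measures from confusion counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return {
        "precision": precision,
        "recall": recall,
        "f1": 2 * precision * recall / (precision + recall),
        "false_negative_rate": fn / (fn + tp),  # share of true positives missed
        "false_positive_rate": fp / (fp + tn),  # share of true negatives flagged
    }


# Hypothetical counts for a balanced hold-out set (assumption, see lead-in).
rates = classification_rates(tp=974, fp=52, tn=948, fn=26)
for name, value in rates.items():
    print(f"{name}: {value:.3f}")
```

Under this balanced-set assumption the resulting F1 rounds to 0.96, in line with the reported value. A different class balance would give a different F1 for the same two error rates.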

To better understand how the models classified plant science articles, we identified important terms from a more easily interpretable model (Term frequency-Inverse document frequency (Tf-Idf) model; F1 = 0.934) using Shapley Additive Explanations [ 14 ]; 136 terms contributed to predicting plant science records (e.g., Arabidopsis, xylem, seedling) and 138 terms contributed to non-plant science record predictions (e.g., patients, clinical, mice; Tf-Idf feature sheet, S1 Data ). Plant science records as well as PubMed articles grew exponentially from 1950 to 2020 ( Fig 1A ), highlighting the challenges of digesting the rapidly expanding literature. We used the plant science records to perform topic modeling, which consisted of 4 steps: representing each record as a BERT embedding, reducing dimensionality, clustering, and identifying the top terms by calculating class (i.e., topic)-based Tf-Idf (c-Tf-Idf; [ 15 ]). The c-Tf-Idf represents the frequency of a term in the context of how rare the term is to reduce the influence of common words. SciBERT [ 16 ] was the best model among those tested ( S2 Data ) and was used for building the final topic model, which classified 372,430 (88.3%) records into 90 topics defined by distinct combinations of terms ( S3 Data ). The topics contained 620 to 16,183 records and were named after the top 4 to 5 terms defining the topical areas ( Fig 1B and S3 Data ). For example, the top 5 terms representing the largest topic, topic 61 (16,183 records), are “qtl,” “resistance,” “wheat,” “markers,” and “traits,” which represent crop improvement studies using quantitative genetics.
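To make the last step concrete, the sketch below computes a class-based Tf-Idf on toy data. It assumes one common formulation, in which a term's within-topic frequency is multiplied by log(1 + A / f), where A is the average number of words per topic and f is the term's frequency across all topics. Whether this matches the authors' exact weighting is an assumption. The topic names, token lists, and function names are hypothetical.

```python
import math
from collections import Counter


def class_tf_idf(docs_by_topic):
    """Map {topic: [token lists]} to {topic: {term: c-Tf-Idf weight}}."""
    # Pool all documents of a topic into one bag of words.
    tf = {topic: Counter(tok for doc in docs for tok in doc)
          for topic, docs in docs_by_topic.items()}
    total_freq = Counter()
    for counts in tf.values():
        total_freq.update(counts)
    avg_words = sum(total_freq.values()) / len(tf)

    weights = {}
    for topic, counts in tf.items():
        n_words = sum(counts.values())
        weights[topic] = {
            term: (count / n_words) * math.log(1 + avg_words / total_freq[term])
            for term, count in counts.items()
        }
    return weights


def top_terms(weights, topic, k=2):
    """Return the k highest-weighted terms (ties broken alphabetically)."""
    ranked = sorted(weights[topic].items(), key=lambda item: (-item[1], item[0]))
    return [term for term, _ in ranked[:k]]


# Hypothetical tokenized records grouped by an already-assigned topic.
docs = {
    "crop_genetics": [["qtl", "wheat", "resistance", "plant"],
                      ["qtl", "markers", "wheat", "plant"]],
    "development": [["auxin", "signaling", "root", "plant"],
                    ["flowering", "auxin", "plant", "gene"]],
}
weights = class_tf_idf(docs)
print(top_terms(weights, "crop_genetics"))
print(top_terms(weights, "development", k=1))
```

In this toy corpus "plant" appears in every record, so it is weighted below topic-specific terms such as "qtl", which is the down-weighting of common words that the text describes.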

Fig 1.

(A) Numbers of PubMed (magenta) and plant science (green) records between 1950 and 2020. (a, b, c) Coefficients of the exponential function, y = ae b . Data for the plot are in S1 Data . (B) Numbers of documents for the top 30 plant science topics. Each topic is designated by an index number (left) and the top 4–6 terms with the highest cTf-Idf values (right). Data for the plot are in S3 Data . (C) Two-dimensional representation of the relationships between plant science records generated by Uniform Manifold Approximation and Projection (UMAP, [ 17 ]) using SciBERT embeddings of plant science records. All topics panel: Different topics are assigned different colors. Outlier panel: UMAP representation of all records (gray) with outlier records in red. Blue dotted circles: areas with relatively high densities indicating topics that are below the threshold for inclusion in a topic. In the 8 UMAP representations on the right, records for example topics are in red and the remaining records in gray. Blue dotted circles indicate the relative position of topic 48.

https://doi.org/10.1371/journal.pbio.3002612.g001

Records with assigned topics clustered into distinct areas in a two-dimensional (2D) space ( Fig 1C , for all topics, see S4 Data ). The remaining 49,228 outlier records not assigned to any topic (11.7%, middle panel, Fig 1C ) have 3 potential sources. First, some outliers likely belong to unique topics but have fewer records than the threshold (>500, blue dotted circles, Fig 1C ). Second, some of the many outliers dispersed within the 2D space ( Fig 1C ) were not assigned to any single topic because they had relatively high prediction scores for multiple topics ( S2 Fig ). These likely represent studies across subdisciplines in plant science. Third, some outliers are likely interdisciplinary studies between plant science and other domains, such as chemistry, mathematics, and physics. Such connections can only be revealed if records from other domains are included in the analyses.

Topical clusters reveal closely related topics but with distinct key term usage

Related topics tend to be located close together in the 2D representation (e.g., topics 48 and 49, Fig 1C ). We further assessed intertopical relationships by determining the cosine similarities between topics using cTf-Idfs ( Figs 2A and S3 ). In this topic network, some topics are closely related and form topic clusters. For example, topics 25, 26, and 27 collectively represent a more general topic related to the field of plant development (cluster a , lower left in Fig 2A ). Other topic clusters represent studies of stress, ion transport, and heavy metals ( b ); photosynthesis, water, and UV-B ( c ); population and community biology (d); genomics, genetic mapping, and phylogenetics ( e , upper right); and enzyme biochemistry ( f , upper left in Fig 2A ).
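A topic network of this kind can be sketched as follows: each topic is a vector of term weights, every pair of topics gets a cosine similarity, and only pairs at or above a threshold become edges (the Fig 2 legend gives 0.6). The term weights below are hypothetical stand-ins for cTf-Idf values, and the function names are illustrative rather than taken from the authors' code.

```python
import math
from itertools import combinations


def cosine_similarity(a, b):
    """Cosine similarity of two sparse {term: weight} vectors."""
    dot = sum(w * b.get(term, 0.0) for term, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def topic_edges(topic_vectors, threshold=0.6):
    """Return (topic_i, topic_j, similarity) for pairs at or above threshold."""
    edges = []
    for i, j in combinations(sorted(topic_vectors), 2):
        sim = cosine_similarity(topic_vectors[i], topic_vectors[j])
        if sim >= threshold:
            edges.append((i, j, sim))
    return edges


# Hypothetical term weights for three topics (not the published values).
vectors = {
    25: {"auxin": 0.8, "signaling": 0.5, "arabidopsis": 0.3},
    26: {"light": 0.6, "arabidopsis": 0.6, "signaling": 0.4, "auxin": 0.3},
    61: {"qtl": 0.9, "wheat": 0.4},
}
edges = topic_edges(vectors)
for i, j, sim in edges:
    print(f"topic {i} - topic {j}: {sim:.2f}")
```

Topics that share no terms have a similarity of zero and stay unconnected, which is why clusters of related topics stand out in the thresholded graph.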

Fig 2.

(A) Graph depicting the degrees of similarity (edges) between topics (nodes). Between each topic pair, a cosine similarity value was calculated using the cTf-Idf values of all terms. A threshold similarity of 0.6 was applied to illustrate the most related topics. For the full matrix presented as a heatmap, see S4 Fig . The nodes are labeled with topic index numbers and the top 4–6 terms. The colors and width of the edges are defined based on cosine similarity. Example topic clusters are highlighted in yellow and labeled a through f (blue boxes). (B, C) Relationships between the cTf-Idf values (see S3 Data ) of the top terms for topics 26 and 27 (B) and for topics 25 and 27 (C) . Only terms with cTf-Idf ≥ 0.6 are labeled. Terms with cTf-Idf values beyond the x and y axis limit are indicated by pink arrows and cTf-Idf values. (D) The 2D representation in Fig 1C is partitioned into graphs for different years, and example plots for every 5-year period since 1975 are shown. Example topics discussed in the text are indicated. Blue arrows connect the areas occupied by records of example topics across time periods to indicate changes in document frequencies.

https://doi.org/10.1371/journal.pbio.3002612.g002

Topics differed in how well they were connected to each other, reflecting how general the research interests or needs are (see Materials and methods ). For example, topic 24 (stress mechanisms) is the most well connected with median cosine similarity = 0.36, potentially because researchers in many subfields consider aspects of plant stress even though it is not the focus. The least connected topics include topic 21 (clock biology, 0.12), which is surprising because of the importance of clocks in essentially all aspects of plant biology [ 18 ]. This may be attributed, in part, to the relatively recent attention in this area.
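One way to read "how well connected" a topic is: take its cosine similarities to every other topic and report the median. The sketch below does this on a small symmetric similarity table. The similarity values are hypothetical and only loosely echo the ordering described above; the function name is illustrative, and the authors' exact procedure is described in their Materials and methods.

```python
from statistics import median


def median_connectivity(similarity):
    """Median similarity of each topic to all other topics.

    similarity: {topic: {other_topic: cosine similarity}}, including the diagonal.
    """
    return {
        topic: median(sim for other, sim in row.items() if other != topic)
        for topic, row in similarity.items()
    }


# Hypothetical symmetric similarity table for four topics.
similarity = {
    24: {24: 1.00, 21: 0.15, 25: 0.40, 61: 0.35},
    21: {24: 0.15, 21: 1.00, 25: 0.12, 61: 0.08},
    25: {24: 0.40, 21: 0.12, 25: 1.00, 61: 0.20},
    61: {24: 0.35, 21: 0.08, 25: 0.20, 61: 1.00},
}
connectivity = median_connectivity(similarity)
most_connected = max(connectivity, key=connectivity.get)
least_connected = min(connectivity, key=connectivity.get)
print(most_connected, least_connected)
```

The median is less sensitive than the mean to one or two very similar neighbours, so it describes a topic's typical relatedness to the rest of the field.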

Examining topical relationships and the cTf-Idf values of terms also revealed how related topics differ. For example, topic 26 is closely related to topics 27 and 25 (cluster a on the lower left of Fig 2A ). Topics 26 and 27 both contain records of developmental process studies mainly in Arabidopsis ( Fig 2B ); however, topic 26 is focused on the impact of light, photoreceptors, and hormones such as gibberellic acids (ga) and brassinosteroids (br), whereas topic 27 is focused on flowering and floral development. Topic 25 is also focused on plant development but differs from topic 27 because it contains records of studies mainly focusing on signaling and auxin with less emphasis on Arabidopsis ( Fig 2C ). These examples also highlight the importance of using multiple top terms to represent the topics. The similarities in cTf-Idfs between topics were also useful for measuring the editorial scope (i.e., diverse, or narrow) of journals publishing plant science papers using a relative topic diversity measure (see Materials and methods ). For example, Proceedings of the National Academy of Sciences , USA has the highest diversity, while Theoretical and Applied Genetics has the lowest ( S4 Fig ). One surprise is the relatively low diversity of American Journal of Botany , which focuses on plant ecology, systematics, development, and genetics. The low diversity is likely due to the relatively larger number of cellular and molecular science records in PubMed, consistent with the identification of relatively few topical areas relevant to studies at the organismal, population, community, and ecosystem levels.

Investigation of the relative prevalence of topics over time reveals topical succession

We next asked whether relationships between topics reflect chronological progression of certain subfields. To address this, we assessed how prevalent topics were over time using dynamic topic modeling [ 19 ]. As shown in Fig 2D , there is substantial fluctuation in where the records are in the 2D space over time. For example, topic 44 (light, leaves, co, synthesis, photosynthesis) is among the topics that existed in 1975 but has diminished gradually since. In 1985, topic 39 (Agrobacterium-based transformation) became dense enough to be visualized. Additional examples include topics 79 (soil heavy metals), 42 (differential expression), and 82 (bacterial community metagenomics), which became prominent in approximately 2005, 2010, and 2020, respectively ( Fig 2D ). In addition, animating the document occupancy in the 2D space over time revealed a broad change in patterns over time: Some initially dense areas became sparse over time and a large number of topics in areas previously only loosely occupied at the turn of the century increased over time ( S5 Data ).

While the 2D representations reveal substantial details on the evolution of topics, comparison over time is challenging because the number of plant science records has grown exponentially ( Fig 1A ). To address this, the records were divided into 50 chronological bins each with approximately 8,400 records to make cross-bin comparisons feasible ( S6 Data ). We should emphasize that, because of the way the chronological bins were split, the number of records for each topic in each bin should be treated as a normalized value relative to all other topics during the same period. Examining this relative prevalence of topics across bins revealed a clear pattern of topic succession over time (one topic evolved into another) and the presence of 5 topical categories ( Fig 3 ). The topics were categorized based on their locally weighted scatterplot smoothing (LOWESS) fits and ordered according to timing of peak frequency ( S7 and S8 Data , see Materials and methods ). In Fig 3 , the relative decrease in document frequency does not mean that research output in a topic is dwindling. Because each row in the heatmap is normalized based on the minimum and maximum values within each topic, there still can be substantial research output in terms of numbers of publications even when the relative frequency is near zero. Thus, a reduced relative frequency of a topic reflects only a below-average growth rate compared with other topical areas.
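The binning and per-topic normalization described above can be sketched as follows. This is a minimal illustration rather than the authors' code: the helper names (`equal_count_bins`, `min_max_rows`) and the toy values are assumptions, and the toy counts stand in for the per-bin prevalence values used in the study.

```python
def equal_count_bins(dates, n_bins):
    """Sort records by date and split them into n_bins bins of near-equal size."""
    ordered = sorted(dates)
    base, extra = divmod(len(ordered), n_bins)
    bins, start = [], 0
    for i in range(n_bins):
        size = base + (1 if i < extra else 0)  # spread any remainder over the first bins
        bins.append(ordered[start:start + size])
        start += size
    return bins


def min_max_rows(matrix):
    """Rescale each row (one topic across all bins) to the range [0, 1]."""
    scaled = []
    for row in matrix:
        lo, hi = min(row), max(row)
        span = hi - lo
        scaled.append([(v - lo) / span if span else 0.0 for v in row])
    return scaled


# Toy example (hypothetical values): 12 publication years split into 4 bins.
years = [1975, 1980, 1990, 1995, 2000, 2005, 2010, 2012, 2015, 2016, 2018, 2020]
bins = equal_count_bins(years, 4)

# Rows are topics, columns are bins; each row is normalized on its own.
topic_prevalence = [[3, 2, 1, 0], [0, 5, 10, 20]]
normalized = min_max_rows(topic_prevalence)
```

Because each row is rescaled separately, a value near zero only marks the bin where that topic was at its own minimum; it says nothing about the absolute number of publications.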

(A-E) A heat map of relative topic frequency over time reveals 5 topical categories: (A) stable, (B) early, (C) transitional, (D) sigmoidal, and (E) rising. The x axis denotes different time bins with each bin containing a similar number of documents to account for the exponential growth of plant science records over time. The sizes of all bins except the first are drawn to scale based on the beginning and end dates. The y axis lists different topics denoted by the label and top 4 to 5 terms. In each cell, the prevalence of a topic in a time bin is colored according to the min-max normalized cTf-Idf values for that topic. Light blue dotted lines delineate different decades. The arrows left of a subset of topic labels indicate example relationships between topics in topic clusters. Blue boxes with labels a–f indicate topic clusters, which are the same as those in Fig 2 . Connecting lines indicate successional trends. Yellow circles/lines 1 – 3: 3 major transition patterns. The original data are in S5 Data .

https://doi.org/10.1371/journal.pbio.3002612.g003

The first topical category is a stable category with 7 topics mostly established before the 1980s that have since remained stable in terms of prevalence in the plant science records (top of Fig 3A ). These topics represent long-standing plant science research foci, including studies of plant physiology (topics 4, 58, and 81), genetics (topic 61), and medicinal plants (topic 53). The second category contains 8 topics established before the 1980s that have mostly decreased in prevalence since (the early category, Fig 3B ). Two examples are physiological and morphological studies of hormone action (topic 45, the second in the early category) and the characterization of protein, DNA, and RNA (topic 18, the second to last). Unlike other early topics, topic 78 (paleobotany and plant evolution studies, the last topic in Fig 3B ) experienced a resurgence in the early 2000s due to the development of new approaches and databases and changes in research foci [ 20 ].

The 33 topics in the third, transitional category became prominent in the 1980s, 1990s, or even 2000s but have clearly decreased in prevalence ( Fig 3C ). In some cases, the early and the transitional topics became less prevalent because of topical succession: refocusing of earlier topics led to newer ones that either show no clear sign of decrease (the sigmoidal category, Fig 3D ) or continue to increase in prevalence (the rising category, Fig 3E ). Consistent with the notion of topical succession, topics within each topic cluster ( Fig 2 ) were found across topic categories and/or were prominent at different time periods (indicated by colored lines linking topics, Fig 3 ). One example is topics in topic cluster b (connected with light green lines and arrows, compare Figs 2 and 3 ); the study of cation transport (topic 47, the third in the transitional category), prominent in the 1980s and early 1990s, is connected to 5 other topics, namely, another transitional topic 29 (cation channels and their expression) peaking in the 2000s and early 2010s, sigmoidal topics 24 and 28 (stress response, tolerance mechanisms) and 30 (heavy metal transport), which rose to prominence in the mid-2000s, and the rising topic 42 (stress transcriptomic studies), which increased in prevalence in the mid-2010s.

The rise and fall of topics can be due to a combination of technological or conceptual breakthroughs, maturity of the field, funding constraints, or publicity. The study of transposable elements (topic 62) illustrates the effect of publicity; the rise in this field coincided with Barbara McClintock’s 1983 Nobel Prize but not with the publication of her studies in the 1950s [ 21 ]. The reduced prevalence in the early 2000s likely occurred in part because analysis of transposons became a central component of genome sequencing and annotation studies, rather than the subject of dedicated studies. In addition, this example indicates that our approaches, while capable of capturing topical trends, cannot be used to directly infer major papers leading to the growth of a topic.

Three major topical transition patterns signify shifts in research trends

Beyond the succession of specific topics, 3 major transitions in the dynamic topic graph should be emphasized: (1) the relative decreasing trend of early topics in the late 1970s and early 1980s; (2) the rise of transitional topics in the late 1980s; and (3) the relative decreasing trend of transitional topics in the late 1990s and early 2000s, which coincided with a radiation of sigmoidal and rising topics (yellow circles, Fig 3 ). The large numbers of topics involved in these transitions suggest major shifts in plant science research. In transition 1, early topics decreased in relative prevalence in the late 1970s to early 1980s, which coincided with the rise of transitional topics over the following decades (circle 1, Fig 3 ). For example, there was a shift from the study of purified proteins such as enzymes (early topic 48, S5A Fig ) to molecular genetic dissection of genes, proteins, and RNA (transitional topic 35, S5B Fig ) enabled by the wider adoption of recombinant DNA and molecular cloning technologies in the late 1970s [ 22 ]. Transition 2 (circle 2, Fig 3 ) can be explained by the following breakthroughs in the late 1980s: better approaches to create transgenic plants and insertional mutants [ 23 ], more efficient creation of mutant plant libraries through chemical mutagenesis (e.g., [ 24 ]), and availability of gene reporter systems such as β-glucuronidase [ 25 ]. Because of these breakthroughs, molecular genetics studies shifted away from understanding the basic machinery to understanding the molecular underpinnings of specific processes, such as molecular mechanisms of flower and meristem development and the action of hormones such as auxin (topic 27, S5C Fig ); this type of research was discussed as a future trend in 1988 [ 26 ] and remains prevalent to date. Another example is gene silencing (topic 12), which became a focal area of study along with the widespread use of transgenic plants [ 27 ].

Transition 3 is the most drastic: A large number of transitional, sigmoidal, and rising topics became prevalent nearly simultaneously at the turn of the century (circle 3, Fig 3 ). This period also coincides with a rapid increase in plant science citations ( Fig 1A ). The most notable breakthroughs included the availability of the first plant genome in 2000 [ 28 ], increasing ease and reduced cost of high-throughput sequencing [ 29 ], development of new mass spectrometry–based platforms for analyzing proteins [ 30 ], and advancements in microscopic and optical imaging approaches [ 31 ]. Advances in genomics and omics technology also led to an increase in stress transcriptomics studies (42, S5D Fig ) as well as studies in many other topics such as epigenetics (topic 11), noncoding RNA analysis (13), genomics and phylogenetics (80), breeding (41), genome sequencing and assembly (60), gene family analysis (23), and metagenomics (82 and 55).

In addition to the 3 major transitions across all topics, there were also transitions within topics revealed by examining the top terms for different time bins (heatmaps, S5 Fig ). Taken together, these observations demonstrate that knowledge about topical evolution can be readily revealed through topic modeling. Such knowledge is typically only available to experts in specific areas and is difficult to summarize manually, as no researcher has a command of the entire plant science literature.

Analysis of taxa studied reveals changes in research trends

Changes in research trends can also be illustrated by examining changes in the taxa being studied over time ( S9 Data ). There is a strong bias in the taxa studied, with the record dominated by research models and economically important taxa ( S6 Fig ). Flowering plants (Magnoliopsida) are found in 93% of records ( S6A Fig ), and the mustard family Brassicaceae dominates at the family level ( S6B Fig ) because the genus Arabidopsis contributes 13% of plant science records ( Fig 4A ). When examining the prevalence of taxa being studied over time, clear patterns of turnover emerged similar to topical succession ( Figs 4B , S6C, and S6D ; Materials and methods ). Given that Arabidopsis is mentioned in more publications than other species we analyzed, we further examined the trends for Arabidopsis publications. The increase in the normalized number (i.e., relative to the entire plant science corpus) of Arabidopsis records coincided with advocacy of its use as a model system in the late 1980s [ 32 ]. While it remains a major plant model, there has been a decrease in overall Arabidopsis publications relative to all other plant science publications since 2011 (blue line, normalized total, Fig 4C ). Because the same chronological bins from the topic-over-time analysis ( Fig 3 ) were used, each with the same number of records, the decrease here does not mean that there were fewer Arabidopsis publications; in fact, the number of Arabidopsis papers has remained steady since 2011. Instead, the decrease means that Arabidopsis-related publications represent a smaller proportion of plant science records. Interestingly, this decrease took place much earlier (approximately 2005) and was steeper in the United States (red line, Fig 4C ) than in all countries combined (blue line, Fig 4C ).

(A) Percentage of records mentioning specific genera. (B) Change in the prevalence of genera in plant science records over time. (C) Changes in the normalized numbers of all records (blue) and records from the US (red) mentioning Arabidopsis over time. The lines are LOWESS fits with fraction parameter = 0.2. (D) Topical over (red) and under (blue) representation among 5 genera with the most plant science records. LLR: log 2 likelihood ratios of each topic in each genus. Gray: topic-species combination not significantly enriched at the 5% level based on enrichment p -values adjusted for multiple testing with the Benjamini–Hochberg method [ 33 ]. The data used for plotting are in S9 Data . The statistics for all topics are in S10 Data .

https://doi.org/10.1371/journal.pbio.3002612.g004

Assuming that the normalized number of publications reflects the relative intensity of research activities, one hypothesis for the relative decrease in focus on Arabidopsis is that advances in, for example, plant transformation, genetic manipulation, and genome research have allowed the adoption of more previously nonmodel taxa. Consistent with this, there was a precipitous increase in the number of genera being published in the mid-90s to early 2000s during which approaches for plant transgenics became established [ 34 ], but the number has remained steady since then ( S7A Fig ). The decrease in the proportion of Arabidopsis papers is also negatively correlated with the timing of an increase in the number of draft genomes ( S7B Fig and S9 Data ). It is plausible that genome availability for other species may have contributed to a shift away from Arabidopsis. Strikingly, when we analyzed US National Science Foundation records, we found that the numbers of funded grants mentioning Arabidopsis ( S7C Fig ) have risen and fallen in near perfect synchrony with the normalized number of Arabidopsis publication records (red line, Fig 4C ). This finding likely illustrates the impact of funding on Arabidopsis research.

By considering both taxa information and research topics, we can identify clear differences in the topical areas preferred by researchers using different plant taxa ( Fig 4D and S10 Data ). For example, studies of auxin/light signaling, the circadian clock, and flowering tend to be carried out in Arabidopsis, while quantitative genetic studies of disease resistance tend to be done in wheat and rice, glyphosate research in soybean, and RNA virus research in tobacco. Taken together, joint analyses of topics and species revealed additional details about changes in preferred models over time, and the preferred topical areas for different taxa.

Countries differ in their contributions to plant science and topical preference

We next investigated whether there were geographical differences in topical preference among countries by inferring country information from 330,187 records (see Materials and methods ). The 10 countries with the most records account for 73% of the total, with China and the US each contributing approximately 18% ( Fig 5A ). The exponential growth in plant science records (green line, Fig 1A ) was in large part due to the rapid rise in annual record numbers in China and India ( Fig 5B ). When we examined the publication growth rates using the top 17 plant science journals, the general patterns remained the same ( S7D Fig ). On the other hand, the US, Japan, Germany, France, and Great Britain had slower rates of growth compared with all non-top 10 countries. The rapid increase in records from China and India was accompanied by a rapid increase in metrics measuring journal impact ( Figs 5C and S8 and S9 Data ). For example, using citation score ( Fig 5C , see Materials and methods ), we found that during a 22-year period China (dark green) and India (light green) rapidly approached the global average (y = 0, yellow), whereas some of the other top 10 countries, particularly the US (red) and Japan (yellow green), showed signs of decrease ( Fig 5C ). It remains to be determined whether these geographical trends reflect changes in priority, investment, and/or interest in plant science research.

(A) Numbers of plant science records for countries with the 10 highest numbers. (B) Percentage of all records from each of the top 10 countries from 1980 to 2020. (C) Difference in citation scores from 1999 to 2020 for the top 10 countries. (D) Shown for each country is the relationship between the citation scores averaged from 1999 to 2020 and the slope of linear fit with year as the predictive variable and citation score as the response variable. The countries with >400 records and with <10% missing impact values are included. Data used for plots (A–D) are in S11 Data . (E) Correlation in topic enrichment scores between the top 10 countries. PCC, Pearson’s correlation coefficient, positive in red, negative in blue. Yellow rectangle: countries with more similar topical preferences. (F) Enrichment scores (LLR, log likelihood ratio) of selected topics among the top 10 countries. Red: overrepresentation, blue: underrepresentation. Gray: topic-country combination that is not significantly enriched at the 5% level based on enrichment p -values adjusted for multiple testing with the Benjamini–Hochberg method (for all topics and plotting data, see S12 Data ).

https://doi.org/10.1371/journal.pbio.3002612.g005

Interestingly, the relative growth/decline in citation scores over time (measured as the slope of linear fit of year versus citation score) was significantly and negatively correlated with average citation score ( Fig 5D ); i.e., countries with lower overall metrics tended to experience the strongest increase in citation scores over time. Thus, countries that did not originally have a strong influence on plant sciences now have increased impact. These patterns were also observed when using H-index or journal rank as metrics ( S8 Fig and S11 Data ) and were not due to increased publication volume, as the metrics were normalized against numbers of records from each country (see Materials and methods ). In addition, the fact that different metrics with different caveats and assumptions yielded consistent conclusions indicates the robustness of our observations. We hypothesize that this may be a consequence of the increasing ease of scientific communication among geographically isolated research groups. It could also reflect the prevalence of open-access online journals, which make scientific information more readily accessible, or increasing international collaboration. In any case, the causes for such regression toward the mean are not immediately clear and should be addressed in future studies.
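The relationship described above (a per-country slope of citation score against year, then a correlation between those slopes and the average scores) can be sketched as follows. The country labels and score values are hypothetical toy data, and the helper names are illustrative; this is not the authors' analysis code.

```python
def linear_slope(xs, ys):
    """Least-squares slope of ys regressed on xs."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx


def pearson(xs, ys):
    """Pearson's correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


years = [1999, 2000, 2001, 2002]
# Hypothetical citation scores relative to a global average of 0.
scores = {
    "country_a": [-1.0, -0.8, -0.6, -0.4],  # low average, rising
    "country_b": [0.5, 0.5, 0.5, 0.5],      # flat
    "country_c": [1.0, 0.9, 0.8, 0.7],      # high average, declining
}

slopes = [linear_slope(years, s) for s in scores.values()]
averages = [sum(s) / len(s) for s in scores.values()]
r = pearson(averages, slopes)  # negative when low-scoring countries rise fastest
```

In this toy setup the correlation is negative, which is the pattern the text describes as regression toward the mean.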

We also assessed how the plant research foci of countries differ by comparing topical preference (i.e., the degree of enrichment of plant science records in different topics) between countries. For example, Italy and Spain cluster together (yellow rectangle, Fig 5E ) partly because of similar research focusing on allergens (topic 0) and mycotoxins (topic 54) and less emphasis on gene family (topic 23) and stress tolerance (topic 28) studies ( Fig 5F , for the fold enrichment and corrected p -values of all topics, see S12 Data ). There are substantial differences in topical focus between countries ( S9 Fig ). For example, research on new plant compounds associated with herbal medicine (topic 69) is a focus in China but not in the US, but the opposite is true for population genetics and evolution (topic 86) ( Fig 5F ). In addition to revealing how plant science research has evolved over time, topic modeling provides additional insights into differences in research foci among different countries, which are informative for science policy considerations.
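The enrichment p-values for topic-country and topic-genus combinations are adjusted with the Benjamini–Hochberg method. A minimal sketch of that adjustment is shown below; the function name and the example p-values are illustrative and not taken from the study.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])  # indices by ascending p
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonic adjusted values.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Hypothetical raw p-values for four topic-country combinations.
raw = [0.01, 0.04, 0.03, 0.005]
adjusted = benjamini_hochberg(raw)
significant = [p < 0.05 for p in adjusted]  # 5% level, as in the figure legends
```

Combinations whose adjusted p-value is not below 0.05 correspond to the gray cells in the enrichment heatmaps.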

In this study, topic modeling revealed clear transitions among research topics, which represent shifts in research trends in plant sciences. One limitation of our study is the bias in the PubMed-based corpus. The cellular, molecular, and physiological aspects of plant sciences are well represented, but there are many fewer records related to evolution, ecology, and systematics. Our use of titles/abstracts from the top 17 plant science journals as positive examples allowed us to identify papers we typically see in these journals, but this may have led to us missing “outlier” articles, which may be the most exciting. Another limitation is the need to assign only one topic to a record when a study is interdisciplinary and straddles multiple topics. Furthermore, a limited number of large, inherently heterogeneous topics were summarized to provide a more concise interpretation, which undoubtedly underrepresents the diversity of plant science research. Despite these limitations, dynamic topic modeling revealed changes in plant science research trends that coincide with major shifts in biological science. While we were interested in identifying conceptual advances, our approach can identify the trend but the underlying causes for such trends, particularly key records leading to the growth in certain topics, still need to be identified. It also remains to be determined which changes in research trends lead to paradigm shifts as defined by Kuhn [ 35 ].

The key terms defining the topics frequently describe various technologies (e.g., topic 38/39: transformation, 40: genome editing, 59: genetic markers, 65: mass spectrometry, 69: nuclear magnetic resonance) or are indicative of studies enabled through molecular genetics and omics technologies (e.g., topic 8/60: genome, 11: epigenetic modifications, 18: molecular biological studies of macromolecules, 13: small RNAs, 61: quantitative genetics, 82/84: metagenomics). Thus, this analysis highlights how technological innovation, particularly in the realm of omics, has contributed to a substantial number of research topics in the plant sciences, a finding that likely holds for other scientific disciplines. We also found that the pattern of topic evolution is similar to that of succession, where older topics have mostly decreased in relative prevalence but appear to have been superseded by newer ones. One example is the rise of transcriptome-related topics and the correlated, reduced focus on regulation at levels other than transcription. This raises the question of whether research driven by technology negatively impacts other areas of research where high-throughput studies remain challenging.

One observation on the overall trends in plant science research is the approximately 10-year cycle in major shifts. One hypothesis is that these shifts reflect not only scientific advances but also the fashion-driven aspect of science. Nonetheless, given that there were only 3 major shifts and the sample size is small, it is difficult to speculate as to why they happened. By analyzing the country of origin, we found that China and India have been the 2 major contributors to the growth in the plant science records in the last 20 years. Our findings also show an equalizing trend in global plant science where countries without a strong plant science publication presence have had an increased impact over the last 20 years. In addition, we identified significant differences in research topics between countries reflecting potential differences in investment and priorities. Such information is important for discerning differences in research trends across countries and can be considered when making policy decisions about research directions.

Materials and methods

Collection and preprocessing of a candidate plant science corpus.

For reproducibility purposes, a random state value of 20220609 was used throughout the study. The PubMed baseline files containing citation information ( ftp://ftp.ncbi.nlm.nih.gov/pubmed/baseline/ ) were downloaded on November 11, 2021. To narrow down the records to plant science-related citations, a candidate citation was identified as having, within the titles and/or abstracts, at least one of the following words: “plant,” “plants,” “botany,” “botanical,” “planta,” and “plantarum” (and their corresponding upper case and plural forms), or plant taxon identifiers from NCBI Taxonomy ( https://www.ncbi.nlm.nih.gov/taxonomy ) or USDA PLANTS Database ( https://plants.sc.egov.usda.gov/home ). Note the search terms used here have nothing to do with the values of the keyword field in PubMed records. The taxon identifiers include all taxon names including and at taxonomic levels below “Viridiplantae” till the genus level (species names not used). This led to 51,395 search terms. After looking for the search terms, qualified entries were removed if they were duplicated, lacked titles and/or abstracts, or were corrections, errata, or withdrawn articles. This left 1,385,417 citations, which were considered the candidate plant science corpus (i.e., a collection of texts). For further analysis, the title and abstract for each citation were combined into a single entry. Text was preprocessed by lowercasing, removing stop-words (i.e., common words), removing non-alphanumeric and non-white space characters (except Greek letters, dashes, and commas), and applying lemmatization (i.e., grouping inflected forms of a word as a single word) for comparison. Because lemmatization led to truncated scientific terms, it was not included in the final preprocessing pipeline.
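The candidate filter and the final preprocessing steps (without lemmatization) can be sketched as follows. The taxon and stop-word sets are tiny hypothetical stand-ins for the 51,395 search terms and the stop-word list actually used, and the function names are illustrative.

```python
import re

KEYWORDS = {"plant", "plants", "botany", "botanical", "planta", "plantarum"}
TAXON_NAMES = {"arabidopsis", "oryza"}        # hypothetical stand-in for the taxon list
STOP_WORDS = {"the", "in", "of", "and", "a"}  # hypothetical stand-in stop-word list


def is_candidate(text):
    """True if the title/abstract text contains any search term (case-insensitive)."""
    tokens = set(re.findall(r"[a-z]+", text.lower()))
    return bool(tokens & (KEYWORDS | TAXON_NAMES))


def preprocess(text):
    """Lowercase, drop disallowed characters, then remove stop-words."""
    text = text.lower()
    # Keep letters, digits, whitespace, Greek letters, commas, and dashes.
    text = re.sub(r"[^a-z0-9\s\u0370-\u03ff,\-]", "", text)
    return " ".join(tok for tok in text.split() if tok not in STOP_WORDS)


cleaned = preprocess("The β-glucuronidase (GUS) reporter in Arabidopsis plants, 50% active!")
bacterial_hit = is_candidate("Lactobacillus plantarum fermentation")
unrelated = is_candidate("Mouse liver metabolism")
```

The second example shows how a keyword match alone can admit a record about a bacterium named *plantarum*, which is the kind of false positive a subsequent classification step would need to remove.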

Definition of positive/negative examples

Upon closer examination, a large number of false positives were identified in the candidate plant science records. To further narrow down citations with a plant science focus, text classification was used to distinguish plant science and non-plant science articles (see next section). For the classification task, a negative set (i.e., non-plant science citations) was defined as entries from 7,360 journals that appeared <20 times in the filtered data (total = 43,329, journal candidate count, S1 Data ). For the positive examples (i.e., true plant science citations), 43,329 citations were sampled from 17 established plant science journals each with >2,000 entries in the filtered dataset: “Plant physiology,” “Frontiers in plant science,” “Planta,” “The Plant journal: for cell and molecular biology,” “Journal of experimental botany,” “Plant molecular biology,” “The New phytologist,” “The Plant cell,” “Phytochemistry,” “Plant & cell physiology,” “American journal of botany,” “Annals of botany,” “BMC plant biology,” “Tree physiology,” “Molecular plant-microbe interactions: MPMI,” “Plant biology,” and “Plant biotechnology journal” (journal candidate count, S1 Data ). Plant biotechnology journal was included even though only 1,894 of its records remained after removal of duplicates, articles with missing information, and/or withdrawn articles. The positive and negative sets were randomly split into training and testing subsets (4:1) while maintaining a 1:1 positive-to-negative ratio.
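A 4:1 split that preserves the 1:1 positive-to-negative ratio can be sketched as below. The function name and toy records are hypothetical; only the random state value (20220609) comes from the text, and Python's `random` module is used here as a stand-in for whatever splitting utility the authors used.

```python
import random

RANDOM_STATE = 20220609  # value reported in the text


def balanced_split(positives, negatives, test_fraction=0.2, seed=RANDOM_STATE):
    """Split labeled examples 4:1 while keeping equal class sizes in each subset."""
    rng = random.Random(seed)
    n = min(len(positives), len(negatives))
    pos = rng.sample(positives, n)  # shuffled, equal-sized draws from each class
    neg = rng.sample(negatives, n)
    n_test = round(n * test_fraction)
    test = [(t, 1) for t in pos[:n_test]] + [(t, 0) for t in neg[:n_test]]
    train = [(t, 1) for t in pos[n_test:]] + [(t, 0) for t in neg[n_test:]]
    rng.shuffle(train)
    rng.shuffle(test)
    return train, test


# Toy records standing in for titles/abstracts.
positives = [f"plant_record_{i}" for i in range(10)]
negatives = [f"other_record_{i}" for i in range(10)]
train, test = balanced_split(positives, negatives)
```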

Text classification based on Tf and Tf-Idf

Instead of using the preprocessed text as features for building classification models directly, text embeddings (i.e., representations of texts in vectors) were used as features. These embeddings were generated using 4 approaches (model summary, S1 Data ): Term-frequency (Tf), Tf-Idf [ 36 ], Word2Vec [ 37 ], and BERT [ 6 ]. The Tf- and Tf-Idf-based features were generated with CountVectorizer and TfidfVectorizer, respectively, from Scikit-Learn [ 38 ]. Different maximum features (1e4 to 1e5) and n-gram ranges (uni-, bi-, and tri-grams) were tested. The features were selected based on the p- value of chi-squared tests testing whether a feature had a higher-than-expected value among the positive or negative classes. Four different p- value thresholds were tested for feature selection. The selected features were then used to retrain vectorizers with the preprocessed training texts to generate feature values for classification. The classification model used was XGBoost [ 39 ] with 5 combinations of the following hyperparameters tested during 5-fold stratified cross-validation: min_child_weight = (1, 5, 10), gamma = (0.5, 1, 1.5, 2.5), subsample = (0.6, 0.8, 1.0), colsample_bytree = (0.6, 0.8, 1.0), and max_depth = (3, 4, 5). The rest of the hyperparameters were held constant: learning_rate = 0.2, n_estimators = 600, objective = binary:logistic. RandomizedSearchCV from Scikit-Learn was used for hyperparameter tuning and cross-validation with scoring = F1-score.
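To make the size of the hyperparameter search concrete, the sketch below enumerates the search space listed above and draws 5 combinations at random. It uses only the standard library as a stand-in for RandomizedSearchCV, so it will not reproduce the specific combinations the authors evaluated; the variable names are illustrative.

```python
import itertools
import random

RANDOM_STATE = 20220609  # value reported in the text

# Search space as listed in the text.
search_space = {
    "min_child_weight": (1, 5, 10),
    "gamma": (0.5, 1, 1.5, 2.5),
    "subsample": (0.6, 0.8, 1.0),
    "colsample_bytree": (0.6, 0.8, 1.0),
    "max_depth": (3, 4, 5),
}
# Hyperparameters held constant.
fixed_params = {
    "learning_rate": 0.2,
    "n_estimators": 600,
    "objective": "binary:logistic",
}

names = list(search_space)
all_combinations = [
    dict(zip(names, values)) for values in itertools.product(*search_space.values())
]

# Draw 5 distinct combinations and merge in the fixed settings.
rng = random.Random(RANDOM_STATE)
sampled = [{**fixed_params, **combo} for combo in rng.sample(all_combinations, 5)]
```

The full grid has 3 × 4 × 3 × 3 × 3 = 324 combinations, so sampling 5 of them covers only a small fraction of the space.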

Because the Tf-Idf model had a relatively high model performance and was relatively easy to interpret (terms are frequency-based, instead of embedding-based like those generated by Word2Vec and BERT), the Tf-Idf model was selected as input to SHapley Additive exPlanations (SHAP; [ 14 ]) to assess the importance of terms. Because the Tf-Idf model was based on XGBoost, a tree-based algorithm, the TreeExplainer module in SHAP was used to determine a SHAP value for each entry in the training dataset for each Tf-Idf feature. The SHAP value indicates the degree to which a feature positively or negatively affects the underlying prediction. The importance of a Tf-Idf feature was calculated as the average SHAP value of that feature among all instances. Because a Tf-Idf feature is generated based on a specific term, the importance of the Tf-Idf feature indicates the importance of the associated term.

Text classification based on Word2Vec

The preprocessed texts were first split into train, validation, and test subsets (8:1:1). The texts in each subset were converted to 3 n-gram lists: a unigram list obtained by splitting tokens based on the space character, or bi- and tri-gram lists built with Gensim [ 40 ]. Each n-gram list of the training subset was next used to fit a Skip-gram Word2Vec model with vector_size = 300, window = 8, min_count = (5, 10, or 20), sg = 1, and epochs = 30. The Word2Vec model was used to generate word embeddings for train, validate, and test subsets. In the meantime, a tokenizer was trained with train subset unigrams using Tensorflow [ 41 ] and used to tokenize texts in each subset and turn each token into indices to use as features for training text classification models. To ensure all citations had the same number of features (500), longer texts were truncated, and shorter ones were zero-padded. A deep learning model was used to train a text classifier with an input layer the same size as the feature number, an attention layer incorporating embedding information for each feature, 2 bidirectional Long-Short-Term-Memory layers (15 units each), a dense layer (64 units), and a final, output layer with 2 units. During training, adam, accuracy, and sparse_categorical_crossentropy were used as the optimizer, evaluation metric, and loss function, respectively. The training process lasted 30 epochs with early stopping if validation loss did not improve in 5 epochs. An F1 score was calculated for each n-gram list and min_count parameter combination to select the best model (model summary, S1 Data ).
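The fixed-length step (500 features per citation) can be sketched as follows. Whether truncation and padding were applied at the start or the end of each sequence is not stated in the text; this sketch assumes both happen at the end, and the function name is hypothetical.

```python
FEATURE_LENGTH = 500  # number of features per citation, as stated in the text


def pad_or_truncate(token_indices, length=FEATURE_LENGTH, pad_value=0):
    """Cut long sequences and zero-pad short ones (both at the end, by assumption)."""
    clipped = token_indices[:length]
    return clipped + [pad_value] * (length - len(clipped))


short_text = pad_or_truncate([12, 7, 431])          # 3 tokens followed by 497 zeros
long_text = pad_or_truncate(list(range(1, 601)))    # 600 tokens cut to the first 500
```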

Text classification based on BERT models

Two pretrained models were used for BERT-based classification: DistilBERT (Hugging face repository [ 42 ] model name and version: distilbert-base-uncased [ 43 ]) and SciBERT (allenai/scibert-scivocab-uncased [ 16 ]). In both cases, tokenizers were retrained with the training data. BERT-based models had the following architecture: the token indices (512 values for each token) and associated masked values as input layers, pretrained BERT layer (512 × 768) excluding outputs, a 1D pooling layer (768 units), a dense layer (64 units), and an output layer (2 units). The rest of the training parameters were the same as those for Word2Vec-based models, except training lasted for 20 epochs. Cross-validation F1-scores for all models were compared and used to select the best model for each feature extraction method, hyperparameter combination, and modeling algorithm or architecture (model summary, S1 Data ). The best model was the Word2Vec-based model (min_count = 20, window = 8, ngram = 3), which was applied to the candidate plant science corpus to identify a set of plant science citations for further analysis. The candidate plant science records predicted as being in the positive class (421,658) by the model were collectively referred to as the “plant science corpus.”

Plant science record classification

In PubMed, 1,384,718 citations containing “plant” or any plant taxon names (from the phylum to genus level) were considered candidate plant science citations. To further distinguish plant science citations from those in other fields, text classification models were trained using titles and abstracts of positive examples consisting of citations from 17 plant science journals, each with >2,000 entries in PubMed, and negative examples consisting of records from journals with fewer than 20 entries in the candidate set. Among the 4 models tested, the best model (built with Word2Vec embeddings) had a cross-validation F1 of 0.964 (random guess F1 = 0.5, perfect model F1 = 1, S1 Data ). When testing the model using 17,330 testing set citations independent from the training set, the F1 remained high at 0.961.

We also conducted another analysis attempting to use the MeSH term “Plants” as a benchmark. Records with the MeSH term “Plants” also include pharmaceutical studies of plants and plant metabolites or immunological studies of plants as allergens in journals that are not generally considered plant science journals (e.g., Acta astronautica , International journal for parasitology , Journal of chromatography ) or journals from local scientific societies (e.g., Acta pharmaceutica Hungarica , Huan jing ke xue , Izvestiia Akademii nauk . Seriia biologicheskaia ). Because we explicitly labeled papers from such journals as negative examples, we focused on 4,004 records with the “Plants” MeSH term published in the 17 plant science journals that were used as positive instances and found that 88.3% were predicted as the positive class. Thus, based on the MeSH term, there is an 11.7% false prediction rate.

As a reviewer suggested, we also enlisted 5 plant science colleagues (3 advanced graduate students in plant biology and genetic/genome science graduate programs, 1 postdoctoral breeder/quantitative biologist, and 1 postdoctoral biochemist/geneticist) to annotate 100 randomly selected abstracts. Each record was annotated by 2 colleagues. Among the 85 entries where the annotations were consistent between annotators, 48 were annotated as negative but with 7 predicted as positive (false positive rate = 14.6%) and 37 were annotated as positive but with 4 predicted as negative (false negative rate = 10.8%). To further benchmark the performance of the text classification model, we identified another 12 journals that focus on plant science studies to use as benchmarks: Current opinion in plant biology (number of articles: 1,806), Trends in plant science (1,723), Functional plant biology (1,717), Molecular plant pathology (1,573), Molecular plant (1,141), Journal of integrative plant biology (1,092), Journal of plant research (1,032), Physiology and molecular biology of plants (830), Nature plants (538), The plant pathology journal (443), Annual review of plant biology (417), and The plant genome (321). Among the 12,611 candidate plant science records, 11,386 were predicted as positive. Thus, there is a 9.9% false negative rate.
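The error rates from the manual annotation follow directly from the counts reported above. A minimal sketch of that arithmetic (the helper name `error_rate` is ours, not from the original study):

```python
def error_rate(n_errors: int, n_total: int) -> float:
    """Percentage of records in an annotated class that the model mispredicted."""
    return 100.0 * n_errors / n_total


# Counts reported for the 85 consistently annotated abstracts.
false_positive_rate = error_rate(7, 48)  # annotated negative, predicted positive
false_negative_rate = error_rate(4, 37)  # annotated positive, predicted negative

print(f"FPR = {false_positive_rate:.1f}%, FNR = {false_negative_rate:.1f}%")
```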

Global topic modeling

BERTopic [15] was used for preliminary topic modeling with n-grams = (1,2) and with an embedding initially generated by DistilBERT, SciBERT, or BioBERT (dmis-lab/biobert-base-cased-v1.2; [44]). The embedding models converted preprocessed texts to embeddings. The topics generated based on the 3 embeddings were similar (S2 Data). However, SciBERT-, BioBERT-, and DistilBERT-based embedding models had different numbers of outlier records (268,848, 293,790, and 323,876, respectively) with topic index = −1. In addition to generating the fewest outliers, the SciBERT-based model led to the highest number of topics. Therefore, SciBERT was chosen as the embedding model for the final round of topic modeling.

Modeling consisted of 3 steps. First, document embeddings were generated with SentenceTransformer [45]. Second, a clustering model to aggregate documents into clusters using hdbscan [46] was initialized with min_cluster_size = 500, metric = euclidean, cluster_selection_method = eom, min_samples = 5. Third, the embedding and the initialized hdbscan model were used in BERTopic to model topics with neighbors = 10, nr_topics = 500, ngram_range = (1,2). Using these parameters, 90 topics were identified. The initial topic assignments were conservative, and 241,567 records were considered outliers (i.e., documents not assigned to any of the 90 topics). Based on the prediction scores of all records generated from the fitted topic model, the 95th-percentile score was 0.0155. This score was used as the threshold for assigning outliers to topics: If the maximum prediction score was above the threshold and this maximum score was for topic t, then the outlier was assigned to t. After the reassignment, 49,228 records remained outliers.

To assess if some of the outliers were not assigned because they could be assigned to multiple topics, the prediction scores of the records were used to put records into 100 clusters using k-means. Each cluster was then assessed to determine if the outlier records in a cluster tended to have higher prediction scores across multiple topics (S2 Fig).
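The threshold-based reassignment of outliers described above can be sketched in a few lines of plain Python. This is an illustration rather than the authors' code: the function names are hypothetical, and we assume the percentile is taken over all record-by-topic prediction scores, which the text does not state explicitly.

```python
import math


def percentile(values, q):
    """Linear-interpolation percentile (q in [0, 100]) of a non-empty sequence."""
    ordered = sorted(values)
    pos = (len(ordered) - 1) * q / 100.0
    lo, hi = math.floor(pos), math.ceil(pos)
    return ordered[lo] + (ordered[hi] - ordered[lo]) * (pos - lo)


def reassign_outliers(topics, scores, q=95.0):
    """Assign outliers (topic == -1) to their best-scoring topic when that
    score exceeds the q-th percentile of all scores (an assumed definition).

    topics: one topic index per record; scores: one list of per-topic
    prediction scores per record. Returns (new_topics, threshold).
    """
    threshold = percentile([s for row in scores for s in row], q)
    updated = []
    for topic, row in zip(topics, scores):
        if topic == -1:
            best = max(range(len(row)), key=row.__getitem__)
            if row[best] > threshold:
                topic = best
        updated.append(topic)
    return updated, threshold
```

Records whose best score stays at or below the threshold remain outliers, which mirrors the 49,228 records left unassigned in the text.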

Topics that are most and least well connected to other topics

The most well-connected topics in the network include topic 24 (stress mechanisms, median cosine similarity = 0.36), topic 42 (genes, stress, and transcriptomes, 0.34), and topic 35 (molecular genetics, 0.32; all t test p-values < 1 × 10^−22). The least connected topics include topic 0 (allergen research, median cosine similarity = 0.12), topic 21 (clock biology, 0.12), topic 1 (tissue culture, 0.15), and topic 69 (identification of compounds with spectroscopic methods, 0.15; all t test p-values < 1 × 10^−24). Topics 0, 1, and 69 are specialized topics; it is surprising that topic 21 is not as well connected, as explained in the main text.

Analysis of documents based on the topic model


Topical diversity among top journals with the most plant science records

Using a relative topic diversity measure (ranging from 0 to 10), we found that there was a wide range of topical diversity among the 20 journals with the largest numbers of plant science records (S3 Fig). The 4 journals with the highest relative topical diversities are Proceedings of the National Academy of Sciences, USA (9.6), Scientific Reports (7.1), Plant Physiology (6.7), and PLOS ONE (6.4). The high diversities are consistent with the broad editorial scopes of these journals. The 4 journals with the lowest diversities are American Journal of Botany (1.6), Oecologia (0.7), Plant Disease (0.7), and Theoretical and Applied Genetics (0.3), which reflects their discipline-specific focus and audience of classical botanists, ecologists, plant pathologists, and specific groups of geneticists.

Dynamic topic modeling

The code for dynamic modeling was based on _topic_over_time.py in BERTopic and modified to allow additional outputs for debugging and graphing purposes. The plant science citations were binned into 50 subsets chronologically (for timestamps of bins, see S5 Data). Because the numbers of documents increased exponentially over time, dividing them based on equal-sized time intervals would result in many fewer records at earlier time points, which may introduce bias. Instead, we divided them into time bins of similar size (approximately 8,400 documents). Thus, the earlier time subsets had larger time spans compared with later time subsets. Prior to binning the subsets, the publication dates were converted to UNIX time (timestamp) in seconds; the plant science records start in 1917-11-1 (timestamp = −1646247600.0) and end in 2021-1-1 (timestamp = 1609477201). The starting dates and corresponding timestamps for the 50 subsets including the end date are in S6 Data. The input data included the preprocessed texts, topic assignments of records from global topic modeling, and the binned timestamps of records. Three additional parameters were set for topics_over_time, namely, nr_bin = 50 (number of bins), evolution_tuning = True, and global_tuning = False. The evolution_tuning parameter specified that averaged c-Tf-Idf values for a topic be calculated in neighboring time bins to reduce fluctuation in c-Tf-Idf values. The global_tuning parameter was set to False because of the possibility that some nonexisting terms could have a high c-Tf-Idf for a time bin simply because there was a high global c-Tf-Idf value for that term.
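The two preprocessing steps above, converting publication dates to UNIX time and splitting records into chronological bins of similar size, can be sketched with the standard library alone. The function names are ours, and the sketch uses UTC midnight for every date. The timestamps quoted in the text differ from the UTC values by about 5 hours, which suggests (but does not confirm) that they were computed in a local time zone.

```python
from datetime import datetime, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def to_unix_seconds(year, month, day):
    """UNIX time in seconds for a date at 00:00 UTC (negative before 1970)."""
    dt = datetime(year, month, day, tzinfo=timezone.utc)
    return (dt - EPOCH).total_seconds()


def equal_count_bins(timestamps, n_bins):
    """Split timestamps into n_bins chronological bins of near-equal size.

    Returns a list of (bin_start_timestamp, timestamps_in_bin) tuples, so
    early bins span long periods and recent bins span short ones.
    """
    ordered = sorted(timestamps)
    base, extra = divmod(len(ordered), n_bins)
    bins, start = [], 0
    for i in range(n_bins):
        size = base + (1 if i < extra else 0)  # spread the remainder over the first bins
        chunk = ordered[start:start + size]
        if chunk:
            bins.append((chunk[0], chunk))
        start += size
    return bins
```

With 421,658 records and 50 bins, this scheme yields bins of 8,433 or 8,434 records, in line with the approximately 8,400 documents per bin reported above.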

The binning strategy based on similar document numbers per bin allowed us to increase signal, particularly for publications prior to the 1990s. This strategy, however, may introduce more noise for bins with smaller time durations (i.e., more recent bins) because of publication frequencies (there can be seasonal differences in the number of papers published, biased toward, e.g., the beginning of the year or the beginning of a quarter). To address this, we examined the relative frequencies of each topic over time (S7 Data) and found that the variances in relative frequencies in recent time bins were similar to those in other time bins. We also moderated the impact of variation using LOWESS (10% to 30% of the data points were used for fitting the trend lines) to determine topical trends for Fig 3. Thus, the influence of the noise introduced via our binning strategy is expected to be minimal.

Topic categories and ordering

The topics were classified into 5 categories with contrasting trends: stable, early, transitional, sigmoidal, and rising. To define which category a topic belongs to, the frequency of documents over time bins for each topic was analyzed using 3 regression methods. We first tried 2 forecasting methods: recursive autoregressor (the ForecasterAutoreg class in the skforecast package) and autoregressive integrated moving average (ARIMA implemented in the pmdarima package). In both cases, the forecasting results did not clearly follow the expected trend lines, likely due to the low numbers of data points (relative frequency values), which resulted in the need to extensively impute missing data. Thus, as a third approach, we sought to fit the trendlines with the data points using LOWESS (implemented in the statsmodels package) and applied additional criteria for assigning topics to categories. When fitting with LOWESS, 3 fraction parameters (frac, the fraction of the data used when estimating each y-value) were evaluated (0.1, 0.2, 0.3). While frac = 0.3 had the smallest errors for most topics, in situations where there were outliers, frac = 0.2 or 0.1 was chosen to minimize mean squared errors ( S7 Data ).

The topics were classified into 5 categories based on the slopes of the fitted line over time: (1) stable: topics with near 0 slopes over time; (2) early: topics with negative (<−0.5) slopes throughout (with the exception of topic 78, which declined early on but bounced back by the late 1990s); (3) transitional: early positive (>0.5) slopes followed by negative slopes at later time points; (4) sigmoidal: early positive slopes followed by zero slopes at later time points; and (5) rising: continuously positive slopes. For each topic, the LOWESS fits were also used to determine when the relative document frequency reached its peak, first reaching a threshold of 0.6 (chosen after trial and error for a range of 0.3 to 0.9), and the overall trend. The topics were then ordered based on (1) whether they belonged to the stable category or not; (2) whether the trends were decreasing, stable, or increasing; (3) the time the relative document frequency first reached 0.6; and (4) the time that the overall peak was reached ( S8 Data ).
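The slope-based categories and the "first reaches 0.6" criterion can be illustrated with a simplified sketch. The original analysis inspected slopes of the LOWESS fit over time. Here we assume, purely for illustration, that each topic is summarized by one early-period slope and one late-period slope, and that the 0.6 threshold is a fraction of the peak fitted value (as described for S8 Data). All function names are hypothetical.

```python
def slope_sign(slope, cutoff=0.5):
    """Map a slope to +1 (positive), -1 (negative), or 0 (near zero)."""
    if slope > cutoff:
        return 1
    if slope < -cutoff:
        return -1
    return 0


def classify_trend(early_slope, late_slope, cutoff=0.5):
    """Assign one of the 5 trend categories from two summary slopes."""
    key = (slope_sign(early_slope, cutoff), slope_sign(late_slope, cutoff))
    categories = {
        (0, 0): "stable",          # near-zero slopes over time
        (-1, -1): "early",         # negative slopes throughout
        (1, -1): "transitional",   # positive, then negative
        (1, 0): "sigmoidal",       # positive, then flat
        (1, 1): "rising",          # continuously positive
    }
    return categories.get(key, "unclassified")


def first_reach(times, fitted, fraction=0.6):
    """Time at which the fitted values first reach `fraction` of their peak."""
    peak = max(fitted)
    for t, y in zip(times, fitted):
        if y >= fraction * peak:
            return t
    return None
```

Slope combinations outside the five listed patterns are returned as "unclassified" here. The text does not say how such cases were handled.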

Taxa information

To identify a taxon or taxa in all plant science records, NCBI Taxonomy taxdump datasets were downloaded from the NCBI FTP site ( https://ftp.ncbi.nlm.nih.gov/pub/taxonomy/new_taxdump/ ) on September 20, 2022. The highest-level taxon was Viridiplantae, and all its child taxa were parsed and used as queries in searches against the plant science corpus. In addition, a species-over-time analysis was conducted using the same time bins as used for dynamic topic models. The number of records in different time bins for top taxa are in the genus, family, order, and additional species level sheet in S9 Data . The degree of over-/underrepresentation of a taxon X in a research topic T was assessed using the p -value of a Fisher’s exact test for a 2 × 2 table consisting of the numbers of records in both X and T, in X but not T, in T but not X, and in neither ( S10 Data ).
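For readers unfamiliar with the test, the following is a minimal pure-Python sketch of a two-sided Fisher's exact test on the 2 × 2 table described above. It is not the authors' implementation: in practice a library routine such as `scipy.stats.fisher_exact` would normally be used, and this version is only practical for modest counts.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test p-value for the table [[a, b], [c, d]].

    a: records in both X and T; b: in X but not T;
    c: in T but not X;          d: in neither.
    """
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of x records in the top-left cell.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum the probabilities of all tables at least as extreme as the observed one.
    p = sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs * (1 + 1e-7))
    return min(p, 1.0)
```

A small p-value indicates that taxon X is over- or underrepresented in topic T relative to the rest of the corpus. The direction has to be read from the table itself.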

For analysis of plant taxa with genome information, genome data of taxa in Viridiplantae were obtained from the NCBI Genome data-hub ( https://www.ncbi.nlm.nih.gov/data-hub/genome ) on October 28, 2022. There were 2,384 plant genome assemblies belonging to 1,231 species in 559 genera (genome assembly sheet, S9 Data ). The date of the assembly was used as a proxy for the time when a genome was sequenced. However, some species have updated assemblies and have more recent data than when the genome first became available.

Taxa being studied in the plant science records

Flowering plants (Magnoliopsida) are found in 93% of records, while most other lineages are discussed in <1% of records, with conifers and related species being exceptions (Acrogymnospermae, 3.5%, S6A Fig). At the family level, the mustard (Brassicaceae), grass (Poaceae), pea (Fabaceae), and nightshade (Solanaceae) families are in 51% of records (S6B Fig). The prominence of the mustard family in plant science research is due to the Brassica and Arabidopsis genera (Fig 4A). When examining the prevalence of taxa being studied over time, clear patterns of turnover emerged (Figs 4B, S6C, and S6D). While the study of monocot species (Liliopsida) has remained steady, there was a significant uptick in the prevalence of eudicot (eudicotyledon) records in the late 1990s (S6C Fig), which can be attributed to the increased number of studies in the mustard, myrtle (Myrtaceae), and mint (Lamiaceae) families among others (S6D Fig). At the genus level, records mentioning Gossypium (cotton), Phaseolus (bean), Hordeum (barley), and Zea (corn), similar to the topics in the early category, were prevalent until the 1980s or 1990s but have mostly decreased in number since (Fig 4B). In contrast, Capsicum, Arabidopsis, Oryza, Vitis, and Solanum research has become more prevalent over the last 20 years.

Geographical information for the plant science corpus

The geographical information (country) of authors in the plant science corpus was obtained from the address (AD) fields of first authors in Medline XML records accessible through the NCBI EUtility API (https://www.ncbi.nlm.nih.gov/books/NBK25501/). Because only first author affiliations are available for records published before December 2014, only the first author’s location was considered to ensure consistency between records before and after that date. Among the 421,658 records in the plant science corpus, 421,585 had Medline records and 421,276 had unique PMIDs. Among the records with unique PMIDs, 401,807 contained address fields. For each of these records, the AD field content was split into tokens with a “,” delimiter, and the token likely containing geographical information (referred to as the location token) was selected as either the last token or the second to last token if the last token contained “@”, indicating the presence of an email address. Because of the inconsistency in how geographical information was described in the location tokens (e.g., country, state, city, zip code, name of institution, and different combinations of the above), the following 4 approaches were used to convert location tokens into countries.

The first approach was a brute force search where full names and alpha-3 codes of current countries (ISO 3166-1), current country subregions (ISO 3166-2), and historical countries (i.e., countries that no longer exist, ISO 3166-3) were used to search the address fields. To reduce false positives using alpha-3 codes, a space prior to each code was required for the match. The first approach allowed the identification of 361,242, 16,573, and 279,839 records with current country, historical country, and subregion information, respectively. The second approach used a heuristic based on common address field structures to identify “location strings” toward the end of address fields that likely represent countries, then used the Python pycountry module to confirm the presence of country information. This approach led to 329,025 records with country information. The third approach was to parse first author email addresses (90,799 records), recover top-level domain information, and use country code Top Level Domain (ccTLD) data from the ISO 3166 Wikipedia page to define countries (72,640 records). Only a subset of email addresses contains country information because some are from companies (.com), nonprofit organizations (.org), and others. Because a large number of records with address fields still did not have country information after taking the above 3 approaches, a fourth approach was implemented to query address fields against a locally installed Nominatim server (v.4.2.3, https://github.com/mediagis/nominatim-docker) using OpenStreetMap data from GEOFABRIK (https://www.geofabrik.de/) to find locations. Initial testing indicated that the use of full address strings led to false positives, and the computing resource requirement for running the server was high. Thus, only location strings from the second approach that did not lead to country information were used as queries. Because multiple potential matches were returned for each query, the results were sorted based on their location importance values. The above steps led to an additional 72,401 records with country information.
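Two of the string-handling steps described above, selecting the location token from an address field and recovering the top-level domain from an email address, are simple enough to sketch. The function names are hypothetical, and stripping whitespace and trailing periods is our own assumption about cleanup that the text does not describe.

```python
def location_token(ad_field):
    """Return the comma-delimited token most likely to hold the location.

    The last token is used unless it contains "@" (an email address), in
    which case the second to last token is used instead.
    """
    tokens = [t.strip().rstrip(".") for t in ad_field.split(",") if t.strip()]
    if not tokens:
        return None
    if "@" in tokens[-1]:
        return tokens[-2] if len(tokens) > 1 else None
    return tokens[-1]


def email_tld(email):
    """Return the lower-cased top-level domain of an email address."""
    domain = email.strip().rstrip(".").rsplit("@", 1)[-1]
    return domain.rsplit(".", 1)[-1].lower()


# Illustrative, made-up address field.
print(location_token("Dept. of Botany, Example University, Germany, jane@uni-example.de"))
print(email_tld("jane@uni-example.de"))
```

A country-code lookup (for example, mapping "de" to Germany) would then be applied to the returned domain. Generic domains such as "com" or "org" carry no country information, as noted above.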

Examining the overlap in country information between approaches revealed that brute force current country and pycountry searches were consistent 97.1% of the time. In addition, both approaches had high consistency with the email-based approach (92.4% and 93.9%). However, brute force subregion and Nominatim-based predictions had the lowest consistencies with the above 3 approaches (39.8% to 47.9%) and each other. Thus, a record’s country information was finalized if the information was consistent between any 2 approaches, except between the brute force subregion and Nominatim searches. This led to 330,328 records with country information.
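The consensus rule above (accept a country when any 2 approaches agree, except when the only agreement is between the brute force subregion and Nominatim searches) can be sketched as follows. The approach labels are hypothetical, and the tie-breaking choice when different pairs support different countries is our assumption. The text does not say how such conflicts were resolved.

```python
from collections import Counter
from itertools import combinations

# Agreement between these two approaches alone is not trusted.
EXCLUDED_PAIR = frozenset({"brute_subregion", "nominatim"})


def finalize_country(predictions):
    """Return the consensus country for one record, or None.

    predictions: dict mapping an approach label to a country code, or to
    None when that approach found nothing.
    """
    votes = Counter()
    for (m1, c1), (m2, c2) in combinations(predictions.items(), 2):
        if c1 is None or c1 != c2:
            continue
        if frozenset({m1, m2}) == EXCLUDED_PAIR:
            continue
        votes[c1] += 1
    if not votes:
        return None
    # Assumption: prefer the country supported by the most agreeing pairs.
    return votes.most_common(1)[0][0]
```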

Topical and country impact metrics


To determine annual country impact, impact scores were determined in the same way as that for annual topical impact, except that values for different countries were calculated instead of topics ( S8 Data ).

Topical preferences by country

To determine topical preference for a country C, a 2 × 2 table was established with the number of records in topic T from C, the number of records in T but not from C, the number of non-T records from C, and the number of non-T records not from C. A Fisher’s exact test was performed for each T and C combination, and the resulting p-values were corrected for multiple testing with the Benjamini–Hochberg method (see S12 Data). The preference of T in C was defined as the degree of enrichment calculated as the log likelihood ratio of values in the 2 × 2 table. Topic 5 was excluded because >50% of the countries did not have records for this topic.
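The enrichment score and the multiple-testing correction can be illustrated with a short sketch. The log likelihood ratio follows the definition given for S12 Data, log((a/b)/(c/d)). The natural logarithm is our assumption because the base is not stated. The Benjamini–Hochberg function is a generic textbook implementation, not the authors' code.

```python
import math


def log_likelihood_ratio(a, b, c, d):
    """log((a/b)/(c/d)) for country C and topic T.

    a: from C and in T;     b: from C but not in T;
    c: not from C but in T; d: not from C and not in T.
    Raises ZeroDivisionError or ValueError if a, b, or c is zero.
    """
    return math.log((a / b) / (c / d))


def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted
```

A positive score indicates overrepresentation of topic T among records from country C, and a negative score indicates underrepresentation.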

The top 10 countries could be classified into a China–India cluster, an Italy–Spain cluster, and remaining countries (yellow rectangles, Fig 5E). The clustering of Italy and Spain is partly due to similar research focusing on allergens (topic 0) and mycotoxins (topic 54) and less emphasis on gene family (topic 23) and stress tolerance (topic 28) studies (Figs 5F and S9). There are also substantial differences in topical focus between countries. For example, plant science records from China tend to be enriched in hyperspectral imaging and modeling (topic 9), gene family studies (topic 23), stress biology (topic 28), and research on new plant compounds associated with herbal medicine (topic 69), but place less emphasis on population genetics and evolution (topic 86, Fig 5F). In the US, there is a strong focus on insect pest resistance (topic 75), climate, community, and diversity (topic 83), and population genetics and evolution but less focus on new plant compounds. In summary, in addition to revealing how plant science research has evolved over time, topic modeling provides additional insights into differences in research foci among different countries.

Supporting information

S1 Fig. Plant science record classification model performance.

(A–C) Distributions of prediction probabilities (y_prob) of (A) positive instances (plant science records), (B) negative instances (non-plant science records), and (C) positive instances with the Medical Subject Heading “Plants” (ID = D010944). The data are color coded in blue and orange if they are correctly and incorrectly predicted, respectively. The lower subfigures contain log10-transformed x axes for the same distributions as the top subfigure for better visualization of incorrect predictions. (D) Prediction probability distribution for candidate plant science records. Prediction probabilities plotted here are available in S13 Data .

https://doi.org/10.1371/journal.pbio.3002612.s001

S2 Fig. Relationships between outlier clusters and the 90 topics.

(A) Heatmap demonstrating that some outlier clusters tend to have high prediction scores for multiple topics. Each cell shows the average prediction score of a topic for records in an outlier cluster. (B) Size of outlier clusters.

https://doi.org/10.1371/journal.pbio.3002612.s002

S3 Fig. Cosine similarities between topics.

(A) Heatmap showing cosine similarities between topic pairs. Top-left: hierarchical clustering of the cosine similarity matrix using the Ward algorithm. The branches are colored to indicate groups of related topics. (B) Topic labels and names. The topic ordering was based on hierarchical clustering of topics. Colored rectangles: neighboring topics with >0.5 cosine similarities.

https://doi.org/10.1371/journal.pbio.3002612.s003

S4 Fig. Relative topical diversity for 20 journals.

The 20 journals with the most plant science records are shown. The journal names were taken from the journal list in PubMed ( https://www.nlm.nih.gov/bsd/serfile_addedinfo.html ).

https://doi.org/10.1371/journal.pbio.3002612.s004

S5 Fig. Topical frequency and top terms during different time periods.

(A-D) Different patterns of topical frequency distributions for example topics (A) 48, (B) 35, (C) 27, and (D) 42. For each topic, the top graph shows the frequency of topical records in each time bin, which are the same as those in Fig 3 (green line), and the end date for each bin is indicated. The heatmap below each line plot depicts whether a term is among the top terms in a time bin (yellow) or not (blue). Blue dotted lines delineate different decades (see S5 Data for the original frequencies, S6 Data for the LOWESS fitted frequencies and the top terms for different topics/time bins).

https://doi.org/10.1371/journal.pbio.3002612.s005

S6 Fig. Prevalence of records mentioning different taxonomic groups in Viridiplantae.

(A, B) Percentage of records mentioning specific taxa at the (A) major lineage and (B) family levels. (C, D) The prevalence of taxon mentions over time at the (C) major lineage and (D) family levels. The data used for plotting are available in S9 Data.

https://doi.org/10.1371/journal.pbio.3002612.s006

S7 Fig. Changes over time.

(A) Number of genera being mentioned in plant science records during different time bins (the date indicates the end date of that bin, exclusive). (B) Numbers of genera (blue) and organisms (salmon) with draft genomes available from National Center of Biotechnology Information in different years. (C) Percentage of US National Science Foundation (NSF) grants mentioning the genus Arabidopsis over time with peak percentage and year indicated. The data for (A–C) are in S9 Data . (D) Number of plant science records in the top 17 plant science journals from the USA (red), Great Britain (GBR) (orange), India (IND) (light green), and China (CHN) (dark green) normalized against the total numbers of publications of each country over time in these 17 journals. The data used for plotting can be found in S11 Data .

https://doi.org/10.1371/journal.pbio.3002612.s007

S8 Fig. Change in country impact on plant science over time.

(A, B) Difference in 2 impact metrics from 1999 to 2020 for the 10 countries with the highest number of plant science records. (A) H-index. (B) SCImago Journal Rank (SJR). (C, D) Plots show the relationships between the impact metrics (H-index in (C) , SJR in (D) ) averaged from 1999 to 2020 and the slopes of linear fits with years as the predictive variable and impact metric as the response variable for different countries (A3 country codes shown). The countries with >400 records and with <10% missing impact values are included. The data used for plotting can be found in S11 Data .

https://doi.org/10.1371/journal.pbio.3002612.s008

S9 Fig. Country topical preference.

Enrichment scores (LLR, log likelihood ratio) of topics for each of the top 10 countries. Red: overrepresentation, blue: underrepresentation. The data for plotting can be found in S12 Data .

https://doi.org/10.1371/journal.pbio.3002612.s009

S1 Data. Summary of source journals for plant science records, prediction models, and top Tf-Idf features.

Sheet–Candidate plant sci record j counts: Number of records from each journal in the candidate plant science corpus (before classification). Sheet—Plant sci record j count: Number of records from each journal in the plant science corpus (after classification). Sheet–Model summary: Model type, text used (txt_flag), and model parameters used. Sheet—Model performance: Performance of different model and parameter combinations on the validation data set. Sheet–Tf-Idf features: The average SHAP values of Tf-Idf (Term frequency-Inverse document frequency) features associated with different terms. Sheet–PubMed number per year: The data for PubMed records in Fig 1A . Sheet–Plant sci record num per yr: The data for the plant science records in Fig 1A .

https://doi.org/10.1371/journal.pbio.3002612.s010

S2 Data. Numbers of records in topics identified from preliminary topic models.

Sheet–Topics generated with a model based on BioBERT embeddings. Sheet–Topics generated with a model based on distilBERT embeddings. Sheet–Topics generated with a model based on SciBERT embeddings.

https://doi.org/10.1371/journal.pbio.3002612.s011

S3 Data. Final topic model labels and top terms for topics.

Sheet–Topic label: The topic index and top 10 terms with the highest c-Tf-Idf values. Sheets–0 to 89: The top 50 terms and their c-Tf-Idf values for topics 0 to 89.

https://doi.org/10.1371/journal.pbio.3002612.s012

S4 Data. UMAP representations of different topics.

For a topic T , records in the UMAP graph are colored red and records not in T are colored gray.

https://doi.org/10.1371/journal.pbio.3002612.s013

S5 Data. Temporal relationships between published documents projected onto 2D space.

The 2D embedding generated with UMAP was used to plot document relationships for each year. The plots from 1975 to 2020 were compiled into an animation.

https://doi.org/10.1371/journal.pbio.3002612.s014

S6 Data. Timestamps and dates for dynamic topic modeling.

Sheet–bin_timestamp: Columns are: (1) order index; (2) bin_idx–relative positions of bin labels; (3) bin_timestamp–UNIX time in seconds; and (4) bin_date–month/day/year. Sheet–Topic frequency per timestamp: The number of documents in each time bin for each topic. Sheets–LOWESS fit 0.1/0.2/0.3: Topic frequency per timestamp fitted with the fraction parameter of 0.1, 0.2, or 0.3. Sheet—Topic top terms: The top 5 terms for each topic in each time bin.

https://doi.org/10.1371/journal.pbio.3002612.s015

S7 Data. Locally weighted scatterplot smoothing (LOWESS) of topical document frequencies over time.

There are 90 scatter plots, one for each topic, where the x axis is time, and the y axis is the document frequency (blue dots). The LOWESS fit is shown as orange points connected with a green line. The category a topic belongs to and its order in Fig 3 are labeled on the top left corner. The data used for plotting are in S6 Data .

https://doi.org/10.1371/journal.pbio.3002612.s016

S8 Data. The 4 criteria used for sorting topics.

Peak: the time when the LOWESS fit of the frequencies of a topic reaches maximum. 1st_reach_thr: the time when the LOWESS fit first reaches a threshold of 60% maximal frequency (peak value). Trend: upward (1), no change (0), or downward (−1). Stable: whether a topic belongs to the stable category (1) or not (0).

https://doi.org/10.1371/journal.pbio.3002612.s017

S9 Data. Change in taxon record numbers and genome assemblies available over time.

Sheet–Genus: Number of records mentioning a genus during different time periods (in Unix timestamp) for the top 100 genera. Sheet–Family: Number of records mentioning a family during different time periods (in Unix timestamp) for the top 100 families. Sheet–Order: Number of records mentioning an order during different time periods (in Unix timestamp) for the top 20 orders. Sheet–Species levels: Number of records mentioning 12 selected taxonomic levels higher than the order level during different time periods (in Unix timestamp). Sheet–Genome assembly: Plant genome assemblies available from NCBI as of October 28, 2022. Sheet–Arabidopsis NSF: Absolute and normalized numbers of US National Science Foundation funded proposals mentioning Arabidopsis in proposal titles and/or abstracts.

https://doi.org/10.1371/journal.pbio.3002612.s018

S10 Data. Taxon topical preference.

Sheet– 5 genera LLR: The log likelihood ratio of each topic in each of the top 5 genera with the highest numbers of plant science records. Sheets– 5 genera: For each genus, the columns are: (1) topic; (2) the Fisher’s exact test p -value (Pvalue); (3–6) numbers of records in topic T and in genus X (n_inT_inX), in T but not in X (n_inT_niX), not in T but in X (n_niT_inX), and not in T and X (n_niT_niX) that were used to construct 2 × 2 tables for the tests; and (7) the log likelihood ratio generated with the 2 × 2 tables. Sheet–corrected p -value: The 4 values for generating LLRs were used to conduct Fisher’s exact test. The p -values obtained for each country were corrected for multiple testing.

https://doi.org/10.1371/journal.pbio.3002612.s019

S11 Data. Impact metrics of countries in different years.

Sheet–country_top25_year_count: number of total publications and publications per year from the top 25 countries with the most plant science records. Sheet—country_top25_year_top17j: number of total publications and publications per year from the top 25 countries with the highest numbers of plant science records in the 17 plant science journals used as positive examples. Sheet–prank: Journal percentile rank scores for countries (3-letter country codes following https://www.iban.com/country-codes ) in different years from 1999 to 2020. Sheet–sjr: Scimago Journal rank scores. Sheet–hidx: H-Index scores. Sheet–cite: Citation scores.

https://doi.org/10.1371/journal.pbio.3002612.s020

S12 Data. Topical enrichment for the top 10 countries with the highest numbers of plant science publications.

Sheet—Log likelihood ratio: For each country C and topic T, it is defined as log((a/b)/(c/d)) where a is the number of papers from C in T, b is the number from C but not in T, c is the number not from C but in T, d is the number not from C and not in T. Sheet: corrected p -value: The 4 values, a, b, c, and d, were used to conduct Fisher’s exact test. The p -values obtained for each country were corrected for multiple testing.

https://doi.org/10.1371/journal.pbio.3002612.s021

S13 Data. Text classification prediction probabilities.

This compressed file contains the PubMed ID (PMID) and the prediction probabilities (y_pred) of testing data with both positive and negative examples (pred_prob_testing), plant science candidate records with the MeSH term “Plants” (pred_prob_candidates_with_mesh), and all plant science candidate records (pred_prob_candidates_all). The prediction probability was generated using the Word2Vec text classification models for distinguishing positive (plant science) and negative (non-plant science) records.

https://doi.org/10.1371/journal.pbio.3002612.s022

Acknowledgments

We thank Maarten Grootendorst for discussions on topic modeling. We also thank Stacey Harmer, Eva Farre, Ning Jiang, and Robert Last for discussion on their respective research fields and input on how to improve this study and Rudiger Simon for the suggestion to examine differences between countries. We also thank Mae Milton, Christina King, Edmond Anderson, Jingyao Tang, Brianna Brown, Kenia Segura Abá, Eleanor Siler, Thilanka Ranaweera, Huan Chen, Rajneesh Singhal, Paulo Izquierdo, Jyothi Kumar, Daniel Shiu, Elliott Shiu, and Wiggler Catt for their good ideas, personal and professional support, collegiality, fun at parties, as well as the trouble they have caused, which helped us improve as researchers, teachers, mentors, and parents.


Enterprise’s Strategic Agility and Resource Allocation Choice: A Case of SMEs in China

  • Published: 22 May 2024

  • Xiangsheng Dou   ORCID: orcid.org/0000-0001-7795-9111 1 &
  • Fizza Ishaq 1  

Enterprises must optimize resource allocation to achieve their strategic objectives in the context of international competition. This paper proposes a comprehensive index to measure the strategic agility of small and medium-sized enterprises (SMEs), using online survey data based on multiple indicators and a Likert scale. The paper then constructs an econometric model and conducts an empirical analysis with representative sample data from SMEs in the computer, communication, and other manufacturing industries in China. The paper demonstrates the nature and measurement of strategic agility and the matching of resources with strategic agility, and advances a theoretical framework in which enterprises’ strategic agility determines the joint allocation of non-financial resources to support strategies. The results indicate that enterprises must jointly allocate their resources according to their strategic agility; only in this way can they achieve their strategic goals and outperform their competitors. The paper further develops the concept of strategic agility level zones (effective zone, inert zone, and hyper-power zone), so that enterprises may adopt different strategic decisions depending on the zone. The paper contributes to the literature on interdependencies between strategic agility and resource allocation.

Availability of Data and Material

All data is provided in the results section of this manuscript.

Author information

Authors and affiliations.

School of Economics and Management, Southwest Jiaotong University, Chengdu, 610031, China

Xiangsheng Dou & Fizza Ishaq

Contributions

The authors contributed equally to this work.

Corresponding author

Correspondence to Xiangsheng Dou .

Ethics declarations

Ethics approval.

Not applicable.

Consent to Participate

Consent for publication, competing interests.

The authors declare no competing interests.

Additional information

Publisher's note.

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Dou, X., Ishaq, F. Enterprise’s Strategic Agility and Resource Allocation Choice: A Case of SMEs in China. J Knowl Econ (2024). https://doi.org/10.1007/s13132-024-02046-0

Received : 31 January 2024

Accepted : 25 April 2024

Published : 22 May 2024

DOI : https://doi.org/10.1007/s13132-024-02046-0

  • Strategic management
  • Strategic agility
  • Non-financial resource allocation
  • Strategic agility level zone
  • Open access
  • Published: 18 May 2024

Psychometric properties and criterion related validity of the Norwegian version of hospital survey on patient safety culture 2.0

  • Espen Olsen 1 ,
  • Seth Ayisi Junior Addo 1 ,
  • Susanne Sørensen Hernes 2 , 3 ,
  • Marit Halonen Christiansen 4 ,
  • Arvid Steinar Haugen 5 , 6 &
  • Ann-Chatrin Linqvist Leonardsen 7 , 8  

BMC Health Services Research volume  24 , Article number:  642 ( 2024 ) Cite this article

Several studies have been conducted with the 1.0 version of the Hospital Survey on Patient Safety Culture (HSOPSC) in Norway and globally. The 2.0 version has not been translated and tested in Norwegian hospital settings. This study aims to 1) assess the psychometrics of the Norwegian version (N-HSOPSC 2.0), and 2) assess the criterion validity of the N-HSOPSC 2.0, adding two more outcomes, namely ‘pleasure at work’ and ‘turnover intention’.

The HSOPSC 2.0 was translated using a sequential translation process. A convenience sample was used, inviting hospital staff from two hospitals ( N  = 1002) to participate in a cross-sectional questionnaire study. Data were analyzed using Mplus. The construct validity was tested with confirmatory factor analysis (CFA). Convergent validity was tested using Average Variance Explained (AVE), and internal consistency was tested with composite reliability (CR) and Cronbach’s alpha. Criterion related validity was tested with multiple linear regression.

The overall statistical results using the N-HSOPSC 2.0 indicate that the model fit based on CFA was acceptable. Five of the N-HSOPSC 2.0 dimensions had AVE scores below the 0.5 criterion. The CR criterion was met on all dimensions except Teamwork (0.61). However, Teamwork was one of the most important and significant predictors of the outcomes. Regression models explained most variance related to patient safety rating (adjusted R² = 0.38), followed by ‘turnover intention’ (adjusted R² = 0.22), ‘pleasure at work’ (adjusted R² = 0.14), and lastly, ‘number of reported events’ (adjusted R² = 0.06).

The N-HSOPSC 2.0 had acceptable construct validity and internal consistency when translated to Norwegian and tested among Norwegian staff in two hospitals. Hence, the instrument is appropriate for use in Norwegian hospital settings. The ten dimensions predicted most variance related to ‘overall patient safety’, and less related to ‘number of reported events’. In addition, the safety culture dimensions predicted ‘pleasure at work’ and ‘turnover intention’, which are not part of the original instrument.

Patient harm due to unsafe care is a large and persistent global public health challenge and one of the leading causes of death and disability worldwide [ 1 ]. Improving safety in healthcare is central in governmental policies, though progress in delivering this has been modest [ 2 ]. Patient safety culture surveys have been the most frequently used approach to measure and monitor perception of safety culture [ 3 ]. Safety culture is defined as “the product of individual and group values, attitudes, perceptions, competencies and patterns of behavior that determine the commitment to, and the style and proficiency of, an organization’s health and safety management” [ 4 ]. Moreover, safety culture refers to the perceptions, beliefs, values, attitudes, and competencies within an organization pertaining to safety and prevention of harm [ 5 ]. The importance of measuring patient safety culture was underlined by the results in a 2023 scoping review, where 76 percent of the included studies observed associations between improved safety culture and reduction of adverse events [ 6 ].

To assess patient safety culture in hospitals, the US Agency for Healthcare Research and Quality (AHRQ) launched the Hospital Survey on Patient Safety Culture (HSOPSC) version 1.0 in 2004 [ 7 , 8 ]. Since then, HSOPSC 1.0 has become one of the most used tools to evaluate patient safety culture in hospitals, administered in approximately one hundred countries and translated into 43 languages as of September 2022 [ 9 ]. HSOPSC 1.0 has generally been considered one of the most robust instruments for measuring patient safety culture, and it has adequate psychometric properties [ 10 ]. In Norway, the first studies using N-HSOPSC 1.0 concluded that the psychometric properties of the instrument were satisfactory for use in Norwegian hospital settings [ 11 , 12 , 13 ]. A recent review of the literature revealed 20 research articles using the N-HSOPSC 1.0 [ 14 ].

Studies of safety culture perceptions in hospitals require valid and psychometrically sound instruments [ 12 , 13 , 15 ]. First, an accurate questionnaire structure should demonstrate a match between the theorized content structure and the actual content structure [ 16 , 17 ]. Second, the psychometric properties of instruments developed in one context are required to demonstrate appropriateness in other cultures and settings [ 16 , 17 ]. Further, psychometric concepts need to demonstrate relationships with other related and valid criteria. For example, data on criterion validity can be compared with criteria data collected at the same time (concurrent validity) or with similar data from a later time point (predictive validity) [ 12 , 16 , 17 ]. Finally, researchers need to demonstrate a match between the content theorized to be related to the actual content in empirical data [ 15 ]. If these psychometric areas are not taken seriously, this may lead to many pitfalls for both researchers and practitioners [ 14 ]. Pitfalls might include imprecise diagnostics of the patient safety level and failure to evaluate the effect of improvement initiatives. Moreover, researchers can easily confirm or reject research hypotheses in error when applying invalid and inaccurate measurement tools.

Patient safety cannot be understood as an isolated phenomenon, but is influenced by general job characteristics and the well-being of the individual health care workers. Karsh et al. [ 18 ] found that positive staff perceptions of their work environment and low work pressure were significantly related to greater job satisfaction and work commitment. A direct association has also been reported between turnover and work strain, burnout and stress [ 19 ]. Zarei et al. [ 20 ] showed a significant relationship between patient safety (safety climate) and unit type, job satisfaction, job interest, and stress in hospitals. This study also illustrated a strong relationship between lack of personal accomplishment, job satisfaction, job interest and stress. Also, there was a negative correlation between occupational burnout and safety climate, where a decrease in the latter was associated with an increase in the former. Hence, patient safety researchers should look at healthcare job characteristics in combination with patient safety culture.

Recently, the AHRQ revised the HSOPSC 1.0 to a 2.0 version, to improve the quality and relevance of the instrument. HSOPSC 2.0 is shorter: 25 items were removed or had their response options changed, and ten items were added. HSOPSC 2.0 was validated during the revision process [ 21 ], but the psychometric qualities across cultures, countries and in different settings need further investigation. Consequently, the overall aim of this study was to investigate the psychometric properties of the HSOPSC 2.0 [ 21 ] (see supplement 1) in a Norwegian hospital setting. Specifically, the aims were to 1) assess the psychometrics of the Norwegian version (N-HSOPSC 2.0), and 2) assess the criterion validity of the N-HSOPSC 2.0, adding two more outcomes, namely ‘pleasure at work’ and ‘turnover intention’.

This study had a cross-sectional design, using a web-based survey solution called “Nettskjema” to distribute questionnaires in two Norwegian hospitals. The study adheres to the Strengthening the Reporting of Observational Studies in Epidemiology (STROBE) Statement guidelines for reporting observational studies [ 22 ].

Translation of the HSOPSC 2.0

We conducted a “forward and backward” translation in line with recommendations from Brislin [ 23 ]. First, the questionnaires were translated from English to Norwegian by a bilingual researcher. The Norwegian version was then translated back to English by another bilingual researcher. Thereafter, the semantic, idiomatic and conceptual equivalence between the two versions was compared by the research group, consisting of experienced researchers. The face validity of the N-HSOPSC 2.0 version was considered to be adequate, and the items lent themselves well to the corresponding latent concepts.

The N-HSOPSC 2.0 was pilot-tested with focus on content and face validity. Six randomly selected healthcare personnel were asked to assess whether the questionnaire was adequate, appropriate, and understandable regarding language, instructions, and scores. In addition, an expert group consisting of senior researchers ( n  = 4) and healthcare personnel ( n  = 6), with competence in patient safety culture was asked to assess the same.

The questionnaire

The HSOPSC 2.0 (supplement 1) consists of 32 items using 5-point Likert-like scales of agreement (from 1 = strongly disagree to 5 = strongly agree) or frequency (from 1 = never to 5 = always), as well as an option for “does not apply/do not know”. The 32 items are distributed over ten dimensions. Additionally, two single-item patient safety culture outcome measures and six background information items are included. The single-item patient safety culture outcome measures evaluate the overall ‘patient safety rating’ for the work area, and ‘reporting patient safety events’.

In addition to the N-HSOPSC 2.0, participants were asked to respond to three questions about their ‘pleasure at work’ (measuring whether staff enjoy and are pleased with their work, scored from 1 = never to 4 = always) [ 24 ], two questions about their ‘intention to quit’ (measuring whether staff are considering quitting their job, scored on a 5-point Likert scale from 1 = strongly agree to 5 = strongly disagree) [ 25 ], as well as demographic variables (gender, age, professional background, primary work area, years of work experience).

Participants and procedure

The data collection was conducted in two phases: the first phase (Nov-Dec 2021) at Hospital A and the second phase (Feb-March 2022) at Hospital B. We used a purposive sampling strategy. At Hospital A (two locations), all employees were invited to participate ( N  = 6648). This included clinical staff, administrators, managers, and technical staff. At Hospital B (three locations), all employees from the anesthesiology, intensive care and operation wards were invited to participate ( N  = 655).

The questionnaire was distributed by e-mail, including a link to a digital survey solution delivered by the University of Oslo, and gathered and stored on a safe research platform: TSD (services for sensitive data). This is a service with two-factor authentication, allowing data-sharing between the collaborating institutions without having to transfer data between them. The system allows for storage of indirectly identifying data, such as gender, age, profession and years of experience, as well as hospital. Reminders were sent out twice.

Statistical analyses

Data were analyzed using Mplus. Normality was assessed for each item using skewness and kurtosis, where values between -2 and +2 are deemed acceptable for a normal distribution [ 26 ]. Missing value analysis was conducted using frequencies to check the percentage of missing responses for each item. Correlations were assessed using Spearman’s correlation analysis and are reported together with Cronbach’s alpha.
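
The normality screen described above can be illustrated with a short sketch. It is not the authors' Mplus procedure: the function names and the item scores are hypothetical, and the use of population moments with excess kurtosis (kurtosis minus 3) is an assumption, since the text does not say which estimator was used.

```python
def skewness_and_excess_kurtosis(values):
    """Moment-based skewness and excess kurtosis of one questionnaire item."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n  # second central moment
    m3 = sum((v - mean) ** 3 for v in values) / n  # third central moment
    m4 = sum((v - mean) ** 4 for v in values) / n  # fourth central moment
    skewness = m3 / m2 ** 1.5
    excess_kurtosis = m4 / m2 ** 2 - 3.0
    return skewness, excess_kurtosis


def within_normal_range(values, limit=2.0):
    """True when both statistics lie between -limit and +limit (bounds included, an assumption)."""
    skewness, excess_kurtosis = skewness_and_excess_kurtosis(values)
    return abs(skewness) <= limit and abs(excess_kurtosis) <= limit


# Hypothetical responses to one 5-point Likert item
item_scores = [1, 2, 2, 3, 3, 3, 4, 4, 5]
print(skewness_and_excess_kurtosis(item_scores), within_normal_range(item_scores))
```

In practice the check would be repeated for each item, and any item outside the range would be flagged before running the factor analysis.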

Confirmatory factor analysis (CFA) was conducted to test the ten-dimension structure of the N-HSOPSC 2.0 using Mplus and Mplus Microsoft Excel Macros. The structure was then tested for fitness using Comparative Fit Index (CFI), Tucker-Lewis Index (TLI), Root Mean Square Error of Approximation (RMSEA) and Standardized Root Mean Square Residual (SRMR) [ 27 ]. Table 1 shows the fitness indices and acceptable thresholds.

Reliability of the 10 predicting dimensions was also assessed using composite reliability (CR) values, where 0.7 or above is deemed acceptable for ascertaining internal consistency [ 25 ].

Convergent validity was assessed using the Average Variance Explained (AVE), where a value of at least 0.5 is deemed acceptable [ 28 ], indicating that at least 50 percent of the variance is explained by the items in a dimension. Criterion-related validity was tested using linear regression, adding ‘turnover intention’ and ‘pleasure at work’ to the two single item outcomes of the N-HSOPSC 2.0.
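To make the CR and AVE thresholds concrete, the sketch below computes both from standardized factor loadings. The formulas are the commonly used ones for standardized loadings with uncorrelated errors; the source does not state which formulas Mplus applied, so treat this as an assumption. The loadings are hypothetical.

```python
def composite_reliability(loadings):
    """CR = (sum of loadings)^2 / ((sum of loadings)^2 + sum of error variances)."""
    total = sum(loadings)
    error_variance = sum(1 - l ** 2 for l in loadings)
    return total ** 2 / (total ** 2 + error_variance)


def average_variance(loadings):
    """AVE = mean of the squared standardized loadings."""
    return sum(l ** 2 for l in loadings) / len(loadings)


# Hypothetical three-item dimension with loadings of 0.7 each.
loadings = [0.7, 0.7, 0.7]
cr = composite_reliability(loadings)
ave = average_variance(loadings)
print(round(cr, 2), cr >= 0.7)    # CR criterion from the text
print(round(ave, 2), ave >= 0.5)  # AVE criterion from the text
```

With these illustrative loadings the dimension passes the CR criterion but falls just short of the AVE criterion, which shows why AVE is usually regarded as the stricter of the two.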

Internal consistency and reliability were assessed using Cronbach’s alpha, where values > 0.9 are considered excellent, > 0.8 good, > 0.7 acceptable, > 0.6 questionable, > 0.5 poor and < 0.5 unacceptable [ 29 ].
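A minimal sketch of how Cronbach’s alpha is computed and mapped onto the labels above is given here. The function names and the response data are hypothetical, and because the quoted scale does not say how a value of exactly 0.5 is classified, the sketch assumes it counts as unacceptable.

```python
def sample_variance(xs):
    """Unbiased sample variance."""
    mean = sum(xs) / len(xs)
    return sum((x - mean) ** 2 for x in xs) / (len(xs) - 1)


def cronbach_alpha(items):
    """items: one list of scores per item, all covering the same respondents."""
    k = len(items)
    n = len(items[0])
    totals = [sum(item[i] for item in items) for i in range(n)]
    item_variance_sum = sum(sample_variance(item) for item in items)
    return k / (k - 1) * (1 - item_variance_sum / sample_variance(totals))


def alpha_label(alpha):
    """Map alpha onto the thresholds quoted in the text."""
    if alpha > 0.9:
        return "excellent"
    if alpha > 0.8:
        return "good"
    if alpha > 0.7:
        return "acceptable"
    if alpha > 0.6:
        return "questionable"
    if alpha > 0.5:
        return "poor"
    return "unacceptable"  # assumption: exactly 0.5 is treated as unacceptable


# Hypothetical scores: three items answered by five respondents.
items = [
    [4, 3, 5, 2, 4],
    [4, 2, 5, 3, 4],
    [3, 3, 4, 2, 5],
]
alpha = cronbach_alpha(items)
print(round(alpha, 2), alpha_label(alpha))
```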

Ethical considerations

The study was conducted in line with principles for ethical research in the Declaration of Helsinki, and informed consent was obtained from all the participants [ 30 ]. Completed and submitted questionnaires were assumed as consent to participate. Data privacy protection was reviewed by the respective hospitals’ data privacy authority, and assessed by the Norwegian Center for Research Data (NSD, project number 322965).

In total, 1002 participants responded to the questionnaire, representing a response rate of 12.6 percent. As seen in Table  2 , 83.7% of the respondents worked in Hospital A and the remaining 16.3% in Hospital B. The majority of respondents (75.7%) were female, and 75.9 percent of respondents worked directly with patients.

The skewness and kurtosis were between +2 and -2, indicating that the data were normally distributed. All items had less than two percent of missing values, hence no methods for handling missing values were used.

Correlations

Correlations and Cronbach’s alpha are displayed in Table  3 .

The following dimensions had the highest correlations: ‘teamwork’, ‘staffing and work pace’, ‘organizational learning-continuous improvement’, ‘response to error’, ‘supervisor support for patient safety’, ‘communication about error’ and ‘communication openness’. Only one dimension, ‘teamwork’ (0.58), had a Cronbach’s alpha below 0.7 (acceptable). Hence, most of the dimensions indicated adequate reliability. Higher levels of the 10 safety dimensions correlate positively with patient safety ratings.

Confirmatory Factor Analysis (CFA)

Table 4 shows the results from the CFA. CFA (N = 1002) showed acceptable fitness values [CFI = 0.92, TLI = 0.90, RMSEA = 0.045, SRMR = 0.053] and factor loadings ranged from 0.51 to 0.89 (see Table 1). CR was above the 0.70 criterion on all dimensions except ‘teamwork’ (0.61). AVE was above the 0.50 criterion except on ‘teamwork’ (0.35), ‘staffing and work pace’ (0.44), ‘organizational learning-continuous improvement’ (0.47), ‘response to error’ (0.47), and ‘communication openness’.

Criterion validity

Independent dimensions of HSOPSC 2.0 were employed to predict four different criteria: 1) ‘number of reported events’, 2) ‘patient safety rating’, 3) ‘pleasure at work’, and 4) ‘turnover intentions’. The composite measures explained a significant share of the variance in all the outcome variables, thereby ascertaining criterion-related validity (Table 5). Regression models explained most variance related to ‘patient safety rating’ (adjusted R² = 0.38), followed by ‘turnover intention’ (adjusted R² = 0.22), ‘pleasure at work’ (adjusted R² = 0.14), and lastly, ‘number of reported events’ (adjusted R² = 0.06).
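For readers unfamiliar with the adjustment, the sketch below shows how adjusted R² is derived from R², the sample size and the number of predictors. The unadjusted R² value is hypothetical, and the use of n = 1002 respondents and 10 predictors is an assumption drawn from the study description; the exact n per regression model is not stated in the source.

```python
def adjusted_r2(r2, n, p):
    """Adjusted R^2 for a linear model with n observations and p predictors."""
    return 1 - (1 - r2) * (n - 1) / (n - p - 1)


# Hypothetical unadjusted R^2; n and p are assumed from the study description.
r2 = 0.40
print(round(adjusted_r2(r2, n=1002, p=10), 3))
```

With a sample of this size the penalty for ten predictors is small, so adjusted and unadjusted values differ only in the second or third decimal.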

In this study we have investigated the psychometric properties of the N-HSOPSC 2.0. We found the face and content validity of the questionnaire satisfactory. Moreover, the overall statistical results indicate that the model fit based on CFA was acceptable. Five of the N-HSOPSC 2.0 dimensions had AVE scores below the 0.5 criterion, but we consider this to be the strictest criterion employed in the evaluation of the psychometric properties. The CR criterion was met on all dimensions except ‘teamwork’ (0.61). However, ‘teamwork’ was one of the most important and significant predictors of the outcomes. On the positive side, the CFA results support the dimensional structure of N-HSOPSC 2.0, and the regression results indicate a satisfactory explanation of the outcomes. On the more critical side, AVE scores fell below the 0.5 threshold on five dimensions, indicating that the items also carry a certain level of measurement error.

In our study, regression models explained most variance related to ‘patient safety rating’ (R² = 0.38), followed by ‘turnover intention’ (R² = 0.22), ‘pleasure at work’ (R² = 0.14), and lastly, ‘number of reported events’ (R² = 0.06). This supports the criterion validity of the independent dimensions of N-HSOPSC 2.0, also when adding ‘turnover intention’ and ‘pleasure at work’. These results confirm previous research on the original N-HSOPSC 1.0 [ 12 , 13 ]. The current study also found that ‘number of reported events’ was negatively related to safety culture dimensions, which is also similar to the N-HSOPSC 1.0 findings [ 12 , 13 ].

The current study carried out more psychometric assessments than the first Norwegian studies using HSOPSC 1.0 [ 11 , 12 , 13 ]. Nevertheless, results from the current study still support the overall reliability and validity of N-HSOPSC 2.0 when compared with the first studies using N-HSOPSC 1.0 [ 11 , 12 , 13 ]. Also, in line with theory and expectations, the dimensions predicted ‘pleasure at work’ and ‘overall safety rating’ positively, and ‘turnover intentions’ and ‘number of reported events’ negatively. The directions of the relations thereby support the overall criterion validity. Some of the dimensions do not predict the outcome variables significantly; nonetheless, each criterion related significantly to at least two dimensions of the HSOPSC 2.0. It is also worth noting that ‘teamwork’ was generally one of the most important predictors, even though this dimension had the lowest convergent validity (AVE) in the previous findings [ 11 , 12 , 13 ] and, in the current study, did not meet the strict AVE criterion and had a CR below 0.7. Since the explanatory power of ‘teamwork’ was satisfactory, this illustrates that the AVE and CR criteria may be too strict.

The sample in the current study consisted of 1009 employees at two different hospital trusts in Norway and across different professions. The gender and age distributions are representative of Norwegian health care workers. In total 760 workers had direct patient contact, 167 had not, and 74 had patient contact sometimes. We think this mix is interesting, since a system perspective is key to establishing patient safety [ 31 ]. The other background variables (work experience, age, primary work area, and gender) indicate a satisfactory spread and mix of personnel in the sample, which is an advantage since the sample then to a large extent represents typical healthcare settings in Norway.

In the current study, N-HSOPSC 2.0 had higher levels of Cronbach’s alpha than in the first N-HSOPSC 1.0 studies [ 11 , 13 ], and was more in line with results from a longitudinal Norwegian study using the N-HSOPSC 1.0 in 2009, 2010 and 2017 [ 23 ]. Moreover, the estimates in the current study reveal a higher level of factor loadings on the N-HSOPSC 2.0, ranging from 0.51 to 0.89. This is positive since CFA is a key method when assessing construct validity [ 16 , 17 , 32 ].

AVE and CR were not estimated in the first Norwegian HSOPSC 1.0 studies [ 11 , 13 ]. The results in this study indicate some issues, particularly regarding AVE (convergent validity), since five of the concepts were below the recommended 0.50 threshold [ 32 ]. It is also worth noting that all measures in the N-HSOPSC 2.0, except ‘teamwork’ (CR = 0.61), had CR values above 0.70, which is satisfactory. AVE is considered a stricter and more conservative measure than CR. The validity of a construct may be adequate even though more than 50% of the variance is due to error [ 33 ]. Hence, some AVE values below 0.50 are not considered critical since the overall results are generally satisfactory.

The first estimate of the criterion-related validity of the N-HSOPSC 2.0 using multiple regression indicated that two dimensions were significantly related to ‘number of reported events’, while six dimensions were significantly related to ‘patient safety rating’. The coefficients were negatively related to ‘number of reported events’ and positively related to ‘patient safety rating’, as expected. In the first Norwegian study on the N-HSOPSC 1.0 [ 13 ], five dimensions were significantly related to ‘number of reported events’, and seven dimensions were significantly related to ‘patient safety ratings’. The relations with ‘number of reported events’ were then both positive and negative, which is not optimal when assessing criterion validity. Hence, since all significant estimates are in the expected directions, the criterion validity of N-HSOPSC 2.0 has generally improved compared to the previous version.

In the current study we added ‘pleasure at work’ and ‘turnover intention’ to extend the assessment of criterion-related validity. The first assessment indicated that ‘teamwork’ had a very substantial and positive influence on ‘pleasure at work’. Moreover, ‘staffing and work pace’ also had a positive influence on ‘pleasure at work’, but none of the other concepts were significant predictors. Hence, the teamwork dimension is key in driving ‘pleasure at work’, followed by ‘staffing and work pace’. ‘Turnover intentions’ was significantly and negatively related to ‘teamwork’, ‘staffing and work pace’, ‘response to error’ and ‘hospital management support’. Hence, the results indicate these dimensions are key drivers in avoiding turnover intentions among staff in hospitals. A direct association has been reported between turnover and work strain, burnout and stress [ 19 ]. Zarei et al. [ 20 ] showed a significant relationship between patient safety (safety climate) and unit type, job satisfaction, job interest, and stress in hospitals. This study also illustrated a strong relationship between lack of personal accomplishment, job satisfaction, job interest and stress. Furthermore, a negative correlation between occupational burnout and safety climate was discovered, where a decrease in the latter is associated with an increase in the former [ 20 ]. Hence, patient safety researchers should look at health care job characteristics in combination with patient safety culture.

Assessment of psychometrics must consider other issues beyond statistical assessments, such as theoretical considerations and face validity [ 16 , 17 ]; we believe one of the strengths of the HSOPSC 1.0 is that the instrument was operationalized based on theoretical concepts. This has been a strength, as opposed to other instruments built on EFA and a random selection of items included in the development process. We believe this is also the case in relation to HSOPSC 2.0; the instrument is theoretically based, easy to understand, and most importantly, can function as a tool to improve patient safety in hospitals. Moreover, when assessing the items that belong to the different latent constructs, item-dimension relationships indicate a high face validity.

Forthcoming studies should consider predicting other outcomes, such as mortality, morbidity, length of stay and readmissions, with the use of N-HSOPSC 2.0.

Limitations

This study was conducted in two Norwegian public hospital trusts, which limits generalizability. The response rate within hospitals was low and therefore we could not benchmark subgroups. However, this was not part of the study objectives. The response rate may have been hampered by the pandemic and the generally high workload in the hospitals. However, based on the diversity of the sample, we find the study results robust and adequate to explore the psychometric properties of N-HSOPSC 2.0. For the current study, we did not perform sample size calculations. With over 1000 respondents, we consider the sample size adequate to assess psychometric properties. Moreover, the low level of missing responses indicates that N-HSOPSC 2.0 was relevant for the staff included in the study.

There are many alternative ways of exploring the psychometric capabilities of instruments. For example, we did not investigate alternative factorial structures, such as hierarchical factorial models, or try to reduce the factorial structure, as has been done with the N-HSOPSC 1.0 short version [ 34 ]. Lastly, we did not try to predict patient safety indicators over time using a longitudinal design and other objective patient safety indicators.

The results from this study generally support the validity and reliability of the N-HSOPSC 2.0. Hence, we recommend that the N-HSOPSC 2.0 be applied without any further adjustments. However, future studies should consider developing structural models to strengthen knowledge of the relationships between the factors included in the N-HSOPSC 2.0/HSOPSC 2.0. Both improvement initiatives and future research projects can consider including the ‘pleasure at work’ and ‘turnover intentions’ indicators, since N-HSOPSC 2.0 explains a substantial level of variance relating to these criteria. This result also indicates an overlap between general pleasure at work and patient safety culture, which is important when trying to improve patient safety.

Availability of data and materials

Datasets generated and/or analyzed during the current study are not publicly available due to local ownership of data, but aggregated data are available from the corresponding author on reasonable request.

World Health Organization. Global patient safety action plan 2021–2030: towards eliminating avoidable harm in health care. 2021. https://www.who.int/teams/integrated-health-services/patient-safety/policy/global-patient-safety-action-plan .

Rafter N, Hickey A, Conroy RM, Condell S, O’Connor P, Vaughan D, Walsh G, Williams DJ. The Irish National Adverse Events Study (INAES): the frequency and nature of adverse events in Irish hospitals—a retrospective record review study. BMJ Qual Saf. 2017;26(2):111–9.

O’Connor P, O’Malley R, Kaud Y, Pierre ES, Dunne R, Byrne D, Lydon S. A scoping review of patient safety research carried out in the Republic of Ireland. Irish J Med. 2022;192:1–9.

Halligan M, Zecevic A. Safety culture in healthcare: a review of concepts, dimensions, measures and progress. BMJ Qual Saf. 2011;20(4):338–43.

Weaver SJ, Lubomski LH, Wilson RF, Pfoh ER, Martinez KA, Dy SM. Promoting a culture of safety as a patient safety strategy: a systematic review. Ann Intern Med. 2013;158(5):369–74.

Vikan M, Haugen AS, Bjørnnes AK, Valeberg BT, Deilkås ECT, Danielsen SO. The association between patient safety culture and adverse events – a scoping review. BMC Health Serv Res 2023;300. https://doi.org/10.1186/s12913-023-09332-8 .

Sorra J, Nieva V. Hospital survey on patient safety culture. AHRQ publication no. 04–0041. Rockville: Agency for Healthcare Research and Quality; 2004.

Nieva VF, Sorra J. Safety culture assessment: a tool for improving patient safety in healthcare organizations. Qual Saf Health Car. 2003;12:II17–23.

Agency for Healthcare Research and Quality (AHRQ). International use of SOPS. https://www.ahrq.gov/sops/international/index.html .

Flin R, Burns C, Mearns K, Yule S, Robertson E. Measuring safety climate in health care. Qual Saf Health Care. 2006;15(2):109–15.

Olsen E, Aase K. The challenge of improving safety culture in hospitals: a longitudinal study using hospital survey on patient safety culture. International Probabilistic Safety Assessment and Management Conference and the Annual European Safety and Reliability Conference. 2012;2012:25–9.

Olsen E. Safety climate and safety culture in health care and the petroleum industry: psychometric quality, longitudinal change, and structural models. PhD thesis number 74. University of Stavanger; 2009.

Olsen E. Reliability and validity of the Hospital Survey on Patient Safety Culture at a Norwegian hospital. Quality and safety improvement research: methods and research practice from the International Quality Improvement Research Network (QIRN) 2008:173–186.

Olsen E, Leonardsen ACL. Use of the Hospital Survey of Patient Safety Culture in Norwegian Hospitals: A Systematic Review. Int J Environment Res Public Health. 2021;18(12):6518.

Hughes DJ. Psychometric validity: Establishing the accuracy and appropriateness of psychometric measures. The Wiley handbook of psychometric testing: A multidisciplinary reference on survey, scale and test development; 2018:751–779.

DeVellis RF. Scale development: Theory and application. Thousand Oaks: Sage Publications; 2003.

Netemeyer RG, Bearden WO, Sharma S. Scaling procedures: Issues and application. London: SAGE Publications Ltd; 2003.

Karsh B, Booske BC, Sainfort F. Job and organizational determinants of nursing home employee commitment, job satisfaction and intent to turnover. Ergonomics. 2005;48:1260–81. https://doi.org/10.1080/00140130500197195 .

Hayes L, O’Brien-Pallas L, Duffield C, Shamian J, Buchan J, Hughes F, Spence Laschinger H, North N, Stone P. Nurse turnover: a literature review. Int J Nurs Stud. 2006;43:237–63.

Zarei E, Najafi M, Rajaee R, Shamseddini A. Determinants of job motivation among frontline employees at hospitals in Teheran. Electronic Physician. 2016;8:2249–54.

Agency for Healthcare Research and Quality (AHRQ). Hospital Survey on Patient Safety Culture. https://www.ahrq.gov/sops/surveys/hospital/index.html .

von Elm E, Altman DG, Egger M, Pocock SJ, Gotzsche PC, Vandenbroucke JP. The Strengthening the Reporting of Observational Studies in Epidemiology (STROBE) Statement: guidelines for reporting observational studies. BMJ. 2007;335(7624):806–8.

Brislin R. Back-translation for cross-cultural research. J Cross-Cultural Psychol. 1970;1(3):185–216.

Notelaers G, De Witte H, Van Veldhoven M, Vermunt JK. Construction and validation of the short inventory to monitor psychosocial hazards. Médecine du Travail et Ergonomie. 2007;44(1/4):11.

Bentein K, Vandenberghe C, Vandenberg R, Stinglhamber F. The role of change in the relationship between commitment and turnover: a latent growth modeling approach. J Appl Psychol. 2005;90(3):468.

Tabachnick B, Fidell L. Using multivariate statistics. 6th ed. Boston: Pearson; 2013.

Hu L, Bentler P. Cutoff criteria for fit indexes in covariance structure analysis: Conventional criteria versus new alternatives. Structural Equation Modelling. 1999;6(1):1–55.

Hair J, Sarstedt M, Hopkins L, Kuppelwieser V. Partial least squares structural equation modeling (PLS-SEM): An emerging tool in business research. Eur Business Rev. 2014;26:106–21.

George D, Mallery P. SPSS for Windows step by step: A simple guide and reference. 11.0 update. Boston: Allyn & Bacon; 2003.

World Medical Association. Declaration of Helsinki- Ethical Principles for Medical Research Involving Human Subjects. 2018. http://www.wma.net/en/30publications/10policies/b3 .

Farup PG. Are measurements of patient safety culture and adverse events valid and reliable? Results from a cross sectional study. BMC Health Serv Res. 2015;15(1):1–7.

Hair JF, Black WC, Babin BJ, Anderson RE. Applications of SEM. Multivariate data analysis. Upper Saddle River: Pearson; 2010.

Malhotra NK, Dash S. Marketing research an applied orientation (paperback). London: Pearson Publishing; 2011.

Olsen E, Aase K. A comparative study of safety climate differences in healthcare and the petroleum industry. Qual Saf Health Care. 2010;19(3):i75–9.

Acknowledgements

Master student Linda Eikemo is acknowledged for participating in the data collection in Hospital A, and Nina Føreland in Hospital B.

Not applicable.

Author information

Authors and affiliations.

UiS Business School, Department of Innovation, Management and Marketing, University of Stavanger, Stavanger, Norway

Espen Olsen & Seth Ayisi Junior Addo

Hospital of Southern Norway, Flekkefjord, Norway

Susanne Sørensen Hernes

Department of Clinical Sciences, University of Bergen, Bergen, Norway

Department of Obstetrics and Gynecology, Stavanger University Hospital, Stavanger, Norway

Marit Halonen Christiansen

Faculty of Health Sciences Department of Nursing and Health Promotion Acute and Critical Illness, OsloMet - Oslo Metropolitan University, Oslo, Norway

Arvid Steinar Haugen

Department of Anaesthesia and Intensive Care, Haukeland University Hospital, Bergen, Norway

Faculty of Health, Welfare and Organization, Østfold University College, Fredrikstad, Norway

Ann-Chatrin Linqvist Leonardsen

Department of anesthesia, Østfold Hospital Trust, Grålum, Norway

Contributions

EO, ASH and ACLL initiated the study. All authors (EO, SA, SSH, MHC, ASH, ACLL) participated in the translation process. SSH and ACLL were responsible for data collection. EO and SA performed the statistical analysis, which was reviewed by ASH and ACLL. EO, SA and ACLL wrote the initial draft of the manuscript, and all authors (EO, SA, SSH, MHC, ASH, ACLL) critically reviewed the manuscript. All authors (EO, SA, SSH, MHC, ASH, ACLL) have read and approved the final version of the manuscript.

Corresponding author

Correspondence to Ann-Chatrin Linqvist Leonardsen .

Ethics declarations

Ethics approval and consent to participate.

The study was conducted in line with principles for ethical research in the Declaration of Helsinki, and informed consent was obtained from all the participants [ 30 ]. Eligible healthcare personnel were informed of the study through hospital e-mails and by text messages. Completed and submitted questionnaires were assumed as consent to participate. According to the Norwegian Health Research Act §4, no ethics approval is needed when including healthcare personnel in research.

Consent for publication

Competing interests.

The authors declare no competing interests.

Additional information

Publisher’s note.

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary Information

Supplementary material 1.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/ . The Creative Commons Public Domain Dedication waiver ( http://creativecommons.org/publicdomain/zero/1.0/ ) applies to the data made available in this article, unless otherwise stated in a credit line to the data.

About this article

Cite this article.

Olsen, E., Addo, S.A.J., Hernes, S.S. et al. Psychometric properties and criterion related validity of the Norwegian version of hospital survey on patient safety culture 2.0. BMC Health Serv Res 24 , 642 (2024). https://doi.org/10.1186/s12913-024-11097-7

Received : 03 April 2023

Accepted : 09 May 2024

Published : 18 May 2024

DOI : https://doi.org/10.1186/s12913-024-11097-7

  • Hospital survey on patient safety culture
  • Patient safety culture
  • Psychometric testing

BMC Health Services Research

ISSN: 1472-6963

  • Open access
  • Published: 13 May 2024

Identification and verification of a novel signature that combines cuproptosis-related genes with ferroptosis-related genes in osteoarthritis using bioinformatics analysis and experimental validation

  • Baoqiang He 1,2,
  • Yehui Liao 1,
  • Minghao Tian 1,
  • Chao Tang 1,
  • Qiang Tang 1,
  • Wenyang Zhou 1,
  • Yebo Leng 1,3 &
  • Dejun Zhong 1,2

Arthritis Research & Therapy volume 26, Article number: 100 (2024)

Exploring the pathogenesis of osteoarthritis (OA) is important for its prevention, diagnosis, and treatment. Therefore, we aimed to construct novel signature genes (c-FRGs) combining cuproptosis-related genes (CRGs) with ferroptosis-related genes (FRGs) to explore the pathogenesis of OA and aid in its treatment.

Materials and methods

Differentially expressed c-FRGs (c-FDEGs) were obtained using R software. Enrichment analysis was performed and a protein–protein interaction (PPI) network was constructed based on these c-FDEGs. Then, seven hub genes were screened. Three machine learning methods and verification experiments were used to identify four signature biomarkers from c-FDEGs, after which gene set enrichment analysis, gene set variation analysis, single-sample gene set enrichment analysis, immune function analysis, drug prediction, and ceRNA network analysis were performed based on these signature biomarkers. Subsequently, a disease model of OA was constructed using these biomarkers and validated on the GSE82107 dataset. Finally, we analyzed the distribution of the expression of these c-FDEGs in various cell populations.

A total of 63 FRGs were found to be closely associated with 11 CRGs, and 40 c-FDEGs were identified. Bioenrichment analysis showed that they were mainly associated with inflammation, external cellular stimulation, and autophagy. CDKN1A, FZD7, GABARAPL2, and SLC39A14 were identified as OA signature biomarkers, and their corresponding miRNAs and lncRNAs were predicted. Finally, scRNA-seq data analysis showed that the differentially expressed c-FRGs had significantly different expression distributions across the cell populations.

Four genes, namely CDKN1A, FZD7, GABARAPL2, and SLC39A14, are excellent biomarkers and prospective therapeutic targets for OA.

Introduction

As a degenerative disease that is difficult to reverse, osteoarthritis (OA) is often accompanied by joint pain, stiffness, joint swelling, restricted movement, and joint deformity, all of which seriously affect daily life activities. The structural changes in OA mainly involve the articular cartilage, subchondral bone, ligaments, capsule, synovium, and periarticular muscles [ 1 ]. The prevalence of OA is steadily rising due to the aging population and the obesity epidemic [ 1 ], and it has placed a significant burden on society [ 2 ]. Currently, the main treatments for OA remain nonsteroidal anti-inflammatory drugs (NSAIDs), pain medications, and joint replacement surgery. However, these treatments cannot reduce the incidence of the early stages of the disease [ 3 ], prevent further cartilage degeneration, or promote cartilage regeneration [ 4 ]. Therefore, further understanding of the pathophysiological mechanisms of OA could aid in the development of additional approaches for more effective diagnosis and treatment.

Ferroptosis is a specific type of programmed cell death driven by iron-dependent lipid peroxidation characterized by an abnormal accumulation of lipid reactive oxygen species (ROS) [ 5 , 6 ]. This programmed cell death was first reported and named by Dixon in 2012 [ 7 ]. Many studies have demonstrated that ferroptosis and the development of OA are closely related [ 8 , 9 , 10 , 11 ], and ferroptosis-related genes (FRGs) can help in the diagnosis of OA, as well as in predicting the immune status of patients with OA [ 12 , 13 ].

Copper is an indispensable trace element involved in a wide range of biological reactions. A small study reported elevated plasma and synovial copper concentrations in patients with OA compared with healthy controls [ 14 ], and another study also found that elevated levels of copper were associated with an increased risk of OA [ 15 ]. When the oxidizing capacity of copper ions in the body exceeds the antioxidant capacity of the body, joints can be destroyed [ 16 ]. Cuproptosis is a novel form of programmed cell death during which copper binds directly to the fatty acylated components of the tricarboxylic acid (TCA) cycle, thereby leading to an increase in toxic proteins and ultimately to cell death [ 17 ]. Ferroptosis is an iron-dependent programmed cell death caused by lipid peroxidation and the massive accumulation of reactive oxygen radicals [ 7 ]. Furthermore, copper and iron are closely related; copper is essential for iron absorption, meaning that copper deficiency or overload can impair the balance of iron metabolism [ 18 ]. When the balance of iron metabolism is disturbed, lipid peroxidation and oxidative stress may be induced, which in turn leads to ferroptosis and alters the expression of FRGs [ 19 , 20 , 21 ]. However, it has not yet been reported whether new signature genes (c-FRGs) combining cuproptosis-related genes (CRGs) with FRGs are beneficial for the diagnosis and treatment of OA.

In this study, we explored and analyzed the immune characteristics and biological functions of c-FRGs in patients with OA. In addition, we screened key ferroptosis-related biomarkers associated with cuproptosis in OA, constructed ceRNA networks, and predicted potential drugs for OA treatment. Our results suggest that c-FRGs may play an important role in the pathophysiological process of OA and provide new directions and ideas for OA research.

Data collection

The US National Center for Biotechnology Information (NCBI) Gene Expression Omnibus (GEO) is the world's largest international public repository of high-throughput molecular information. Using “osteoarthritis” as a search term, the GEO database ( https://www.ncbi.nlm.nih.gov/geo/ ) was searched for appropriate datasets, and four datasets that met the study requirements were downloaded. These four datasets were GSE55235, GSE169077, GSE55457, and GSE55584, and the chip type was Affymetrix Human Genome U133A. We eventually obtained 25 normal human synovial samples and 32 OA synovial samples from the four datasets as samples for the follow-up study. To assess the accuracy of the analysis, the GSE82107 dataset was used as a validation set. In addition, the FRGs and CRGs were obtained from the published literature [ 6 ] and the FerrDb website ( http://www.zhounan.org/ferrdb/ ).

Extraction of c-FRGs and obtaining differentially expressed c-FRGs

Inter-batch differences between the four datasets (GSE55235, GSE169077, GSE55457, and GSE55584) were eliminated by merging them with the “affy” package and applying the “sva” package. We performed a Pearson correlation analysis of CRGs with FRGs to obtain particular FRGs (c-FRGs) that were highly correlated with CRGs (|r| > 0.5, adj. p value < 0.05). Differentially expressed genes (DEGs) and differentially expressed c-FRGs (c-FDEGs) were obtained using the “limma” package (p value < 0.05).
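The correlation filter can be illustrated with a short sketch. It is not the authors' R workflow: the gene names and expression values are hypothetical, an FRG is assumed to qualify when it correlates with at least one CRG, and only the |r| > 0.5 criterion is applied. The adjusted p value criterion is left out because it would need a t distribution and a multiple-testing correction that are outside the scope of this example.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


def select_c_frgs(crg_expr, frg_expr, threshold=0.5):
    """Return FRGs whose |r| with at least one CRG exceeds the threshold."""
    selected = []
    for frg, frg_values in frg_expr.items():
        if any(abs(pearson_r(crg_values, frg_values)) > threshold
               for crg_values in crg_expr.values()):
            selected.append(frg)
    return sorted(selected)


# Hypothetical expression values across four samples (placeholder gene names).
crg_expr = {"CRG_A": [1.0, 2.0, 3.0, 4.0]}
frg_expr = {
    "FRG_1": [2.0, 4.0, 6.0, 8.0],    # moves with CRG_A
    "FRG_2": [1.0, -1.0, -1.0, 1.0],  # unrelated to CRG_A
}
print(select_c_frgs(crg_expr, frg_expr))
```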

Function enrichment analysis and protein–protein interaction (PPI) networks

To acquire disease-related biological functions and signaling pathways, Gene Ontology (GO) enrichment analysis and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway enrichment analysis of c-FDEGs were performed. GO enrichment analysis was used to describe the molecular functions (MF), cellular components (CC), and biological processes (BP) in which the target genes are involved (p-value < 0.05). KEGG analysis was used to systematically analyze gene functions and to link genomic information and functional information (p-value < 0.05). The results of the gene set enrichment analysis (GSEA), GO enrichment analysis, and KEGG pathway enrichment analysis of the c-FDEGs were visualized using the “clusterProfiler” package in R. GSEA was based on the gene set (h.all.v7.5.1.symbols.gmt), which was downloaded from MSigDB ( https://www.gsea-msigdb.org/gsea/msigdb/index.jsp ). The STRING database is used for searching interactions between known proteins and for predicting interactions between proteins, and is one of the most data-rich and widely used databases for studying protein interactions. Protein interaction analysis was performed on all c-FDEGs through the STRING website ( https://string-db.org/ ) and visualized using Cytoscape software. The degree values of the c-FDEGs were calculated using the cytoHubba plugin, and the top seven genes were used as hub genes.
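Ranking by degree, as done here with cytoHubba, amounts to counting how many interaction partners each gene has in the PPI network. The sketch below reproduces that idea on a toy edge list; the function name and gene labels are hypothetical, edges are assumed to be unique undirected pairs, and ties are broken alphabetically, which may not match how cytoHubba resolves them.

```python
from collections import Counter


def top_hub_genes(edges, k=7):
    """Rank genes by degree (number of interaction partners) and keep the top k."""
    degree = Counter()
    for gene_a, gene_b in edges:
        if gene_a == gene_b:
            continue  # ignore self-interactions
        degree[gene_a] += 1
        degree[gene_b] += 1
    ranked = sorted(degree.items(), key=lambda item: (-item[1], item[0]))
    return [gene for gene, _ in ranked[:k]]


# Hypothetical interaction pairs (placeholder gene names).
edges = [("A", "B"), ("A", "C"), ("A", "D"), ("B", "C")]
print(top_hub_genes(edges, k=2))
```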

Acquisition and validation of biomarkers

In this research, we used three machine learning algorithms: support vector machine recursive feature elimination (SVM-RFE), least absolute shrinkage and selection operator (LASSO) regression analysis, and random forest analysis (RF). First, we used the “e1071” R package for SVM-RFE analysis. Subsequently, the “glmnet” package was used to perform LASSO regression analysis. In addition, RF was conducted using the “randomForest” package, and genes with importance > 1 were retained. The overlapping genes obtained by all three methods were regarded as prospective biomarkers for OA.
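
The final selection step is a set intersection across the three methods, with the importance cutoff applied to the RF output first. A minimal Python sketch follows; the function name and the gene labels in the usage note are placeholders, not results from the study.

```python
def overlapping_genes(svm_rfe_genes, lasso_genes, rf_importance, min_importance=1.0):
    """Genes selected by all three methods.

    `rf_importance` maps gene -> random forest importance score; only
    genes with importance > min_importance are kept before intersecting.
    """
    rf_genes = {g for g, score in rf_importance.items() if score > min_importance}
    return sorted(set(svm_rfe_genes) & set(lasso_genes) & rf_genes)
```

For example, `overlapping_genes(["G1", "G2", "G3"], ["G2", "G3"], {"G2": 2.1, "G3": 0.4})` keeps only `G2`, because `G3` fails the importance cutoff.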

Construction and validation of disease model (nomogram)

A nomogram based on the characteristic biomarkers was constructed using the “rms” R package. Receiver operating characteristic (ROC) analysis was performed on the biomarkers and the resulting model, and the area under the curve (AUC) values were calculated with the “pROC” package to assess the diagnostic efficacy of the potential biomarkers. In addition, the four biomarkers and the disease nomogram were validated on the GSE82107 validation set.
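
The AUC has a simple interpretation: it is the probability that a randomly chosen case scores higher than a randomly chosen control, with ties counted as one half. The Python sketch below computes it directly from that definition. It illustrates the quantity only and is not the “pROC” implementation; the function name is hypothetical.

```python
def auc(case_scores, control_scores):
    """Area under the ROC curve via pairwise comparison.

    Counts the fraction of (case, control) pairs in which the case has
    the higher score; ties contribute 0.5.
    """
    wins = 0.0
    for case in case_scores:
        for control in control_scores:
            if case > control:
                wins += 1.0
            elif case == control:
                wins += 0.5
    return wins / (len(case_scores) * len(control_scores))
```

Under this convention, a marker that is lower in cases than in controls yields a value below 0.5, so the direction of comparison has to be fixed before AUC values are compared across markers.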

Collection of clinical samples

Synovial tissue collection and all experimental procedures were approved by the Institutional Review Board of the Affiliated Hospital of Southwest Medical University (KY2023293) in accordance with the guidelines of the Chinese Health Sciences Administration, and written informed consent was obtained from the donors. Synovial tissue from the suprapatellar bursa was collected as OA synovial samples and normal control samples, respectively, from patients who met the American College of Rheumatology criteria for the diagnosis of primary symptomatic knee OA (n=6; men: 3, women: 3; age: 55-70 years) and from patients who underwent trauma-related lower extremity amputation but did not have osteoarthritis or rheumatoid arthritis (n=6; men: 4, women: 2; age: 50-67 years). All samples were collected within two hours of arthroplasty or lower limb amputation and were divided into two portions for subsequent immunofluorescence staining and western blot experiments, respectively.

Immunofluorescence staining

Mid-sagittal sections (4-μm thick) of paraffin-embedded clinical synovial specimens were incubated for 1 hour at room temperature, after which the slides were blocked with 10% bovine serum (Solarbio, Beijing, China) for 1 hour at room temperature and then incubated with primary antibodies for 16 hours at 4°C. The slides were then incubated with the fluorescent dye for 1 hour at room temperature and subsequently sealed with DAPI Sealer (Thermo Fisher Scientific, Waltham, MA, USA).

Western blot analysis

Synovial tissue samples were lysed with RIPA buffer to extract the total protein. After conducting a BCA protein assay (Beyotime, Shanghai, China), 5 × sample buffer (Servicebio, Wuhan, China) was added to the protein lysates. Equal amounts of lysates were then separated through SDS-PAGE and transferred to a 0.22-μm PVDF microporous membrane (Merck Millipore, Burlington, MA, USA). Next, the membrane was blocked with 5% skimmed milk and incubated with the primary antibody for 16 hours at 4°C, after which the membrane was incubated with the secondary antibody for 60 minutes at room temperature. Target protein bands were visualized using FDbio-Dura ECL (Merck Millipore, Burlington, MA, USA). The antibodies used for immunofluorescence and western blot in this study were as follows: rabbit anti-FZD7 (Cat. #: DF8657, 1:1,000; AFFBIOTECH, USA), rabbit anti-SLC39A14 (ZIP14) (Cat. #: 26540-1-AP, 1:1,000; Proteintech, Rosemont, IL, USA), rabbit anti-CDKN1A (p21) (Cat. #: 2947T, 1:1,000; Cell Signaling Technology, Danvers, MA, USA), rabbit anti-GABARAPL2 (Cat. #: 14256T, 1:1,000; Cell Signaling Technology), anti-GAPDH (Cat. #: 60004-1-Ig, 1:1,000; Proteintech, USA), and species-matched HRP-conjugated secondary antibody (Cat. #: SA00001-1, 1:1,000; Proteintech, USA).

ssGSEA, GSEA, and GSVA for differentially expressed c-FRGs

The gene set (h.all.v2022.1.Hs.symbols.gmt), a collection of 50 hallmark gene sets for humans, was downloaded from MSigDB ( https://www.gsea-msigdb.org/gsea/msigdb/index.jsp ). Scores for the 50 human hallmark gene sets were calculated for each sample using single-sample GSEA (ssGSEA), and differential scores were obtained for the non-OA and OA groups. The “corrplot” package was used to perform correlation analysis between biomarkers and ssGSEA gene sets. Next, GSEA and gene set variation analysis (GSVA) were performed for the four biomarkers, the seven hub genes, and the remaining 29 differentially expressed c-FRGs.

Prediction of therapeutic drugs

The Drug–Gene Interaction Database (DGIdb, http://www.dgidb.org ) [ 22 ] can help researchers annotate known drug–gene interactions and potentially druggable genes. In this research, we used DGIdb to screen for potential drugs targeting the biomarkers so as to identify new therapeutic targets. The resulting drug predictions were imported into Cytoscape (v3.9.1) software for visualization.

Construction of ceRNA network

The miRanda, TargetScan, and miRDB databases are authoritative databases used for predicting miRNA–target gene regulatory relationships, and spongeScan is a web tool designed for sequence-based detection of miRNA-binding elements in lncRNA sequences. mRNA–miRNA interactions for the biomarkers that were common to miRanda ( http://www.microrna.org/microrna/home.do ), TargetScan ( http://www.targetscan.org ), and miRDB ( https://mirdb.org ) were identified. miRNA–lncRNA interactions were obtained from spongeScan ( http://spongescan.rc.ufl.edu ). These interactions were imported into Cytoscape to construct the ceRNA network.
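
The network assembly can be sketched as two steps: keep, for each biomarker, only the miRNAs predicted by every database, then attach the lncRNAs predicted to bind those miRNAs. The Python sketch below assumes that “common” means the intersection across all three databases, which the text implies but does not state outright. The function name and all identifiers in the usage note are placeholders.

```python
def build_cerna_edges(mirna_predictions, lncrnas_by_mirna):
    """Assemble ceRNA network edges as a set of (source, target) pairs.

    mirna_predictions: dict database_name -> dict mRNA -> set of miRNAs.
    lncrnas_by_mirna:  dict miRNA -> set of lncRNAs predicted to bind it.
    Assumption: an mRNA-miRNA pair is kept only if every database
    predicts it.
    """
    databases = list(mirna_predictions.values())
    shared_genes = set.intersection(*(set(db) for db in databases))
    edges = set()
    for gene in shared_genes:
        shared_mirnas = set.intersection(*(set(db[gene]) for db in databases))
        for mirna in shared_mirnas:
            edges.add((mirna, gene))
            for lncrna in lncrnas_by_mirna.get(mirna, ()):
                edges.add((lncrna, mirna))
    return edges
```

With three databases that agree only on `miR-a` for `GENE1`, and two lncRNAs predicted to bind `miR-a`, the result contains one miRNA–mRNA edge and two lncRNA–miRNA edges.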

Immune infiltration analysis

To better understand the changes that occur in the immune system of patients with OA, the “CIBERSORT” R package was used to describe the basic expression of 22 immune cell subtypes. Next, we analyzed the correlation between potential biomarkers, hub genes, and the 22 immune cell types.

scRNA‑seq analysis

The OA synovial scRNA-seq data (GSE152805) from three patients were obtained from the GEO database and analyzed using the "Seurat" software package. To ensure high quality of the data, we removed low-quality cells (cells with <200 or >10,000 detected genes, >10% of mitochondrial genes, or <300 or >30,000 expressed genes) and low-expressed genes (any gene expressed in fewer than three cells). We used the "NormalizeData" function to normalize the gene expression of the included cells and the "FindVariableFeatures" function to select the top 2000 highly variable genes. Principal component analysis (PCA) was then performed on these genes, and the top 12 principal components (PCs) were retained for further analysis. To perform unsupervised and unbiased clustering of cell subpopulations, the "FindNeighbors," "FindClusters" (resolution = 0.6), and "RunUMAP" functions were applied. Each cell cluster was manually annotated according to the cell-specific marker genes. These marker genes were obtained from previously published literature [ 23 , 24 ] and from the CellMarker website ( http://xteam.xbio.top/CellMarker/ ). Finally, we used CellChat (1.6.1) for the inference and analysis of cell–cell communication.
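
The quality-control thresholds above can be written as two small filters. The Python sketch below is a simplified illustration rather than Seurat code. It covers the detected-gene range, the mitochondrial cutoff, and the minimum-cells rule for genes, and it leaves out the second count range (<300 or >30,000) because the text does not make clear which quantity it refers to. Function names are hypothetical.

```python
from collections import Counter


def keep_cell(n_genes, mito_pct, min_genes=200, max_genes=10_000, max_mito_pct=10.0):
    """True if a cell passes the gene-count and mitochondrial cutoffs."""
    return min_genes <= n_genes <= max_genes and mito_pct <= max_mito_pct


def drop_rare_genes(counts, min_cells=3):
    """Remove genes detected (count > 0) in fewer than `min_cells` cells.

    `counts` maps cell id -> dict of gene -> count.
    """
    cells_per_gene = Counter(
        gene
        for genes in counts.values()
        for gene, value in genes.items()
        if value > 0
    )
    kept = {gene for gene, n in cells_per_gene.items() if n >= min_cells}
    return {
        cell: {gene: value for gene, value in genes.items() if gene in kept}
        for cell, genes in counts.items()
    }
```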

Figure 1 describes the entire flow of the study.

figure 1

A graphical flowchart of the study design

Extracting c-FRGs and obtaining differentially expressed c-FRGs

After merging the GSE55235, GSE169077, GSE55457, and GSE55584 datasets (Table 1 ), the resulting gene expression matrices were normalized and presented as two-dimensional PCA plots before and after processing (Fig. 2 a and b), indicating that the final sample data were plausible. A total of 63 FRGs were found to be closely associated with 11 CRGs (Fig. 2 e, Supplementary Table 1 ). A total of 4167 DEGs were identified (Fig. 2 c). There were a total of 40 c-FDEGs, including 13 upregulated genes and 27 downregulated genes (Fig. 2 d, Supplementary Table 2 ). The correlations between the 40 c-FDEGs are shown in Supplementary Figure 1 . The expression patterns of the 40 c-FDEGs are visualized in the heatmap (Fig. 2 f).

figure 2

Extraction of particular ferroptosis-related genes (c-FRGs) and identification of differentially expressed c-FRGs (c-FDEGs). a, b Two-dimensional PCA cluster plot of GSE55235, GSE169077, GSE55457, and GSE55584 datasets before and after normalization. c Volcano plot of DEGs. Red spots represent upregulated genes and green spots represent downregulated genes. d Overall expression landscape of c-FRGs in osteoarthritis (OA). * P < 0.05; ** P < 0.01; *** P < 0.001. OA represents the OA group and Normal represents the normal control group. e Extraction of c-FDEGs. f Heatmap of c-FDEGs. The redder the color, the higher the expression; conversely, the bluer the color, the lower the expression

Function enrichment analysis

Understanding the signaling pathways, biological processes, and interrelationships involved in c-FDEGs is of great importance in revealing the pathogenesis of OA. GO enrichment analysis showed that c-FDEGs were significantly enriched in the regulation of the inflammatory response (BP), the positive regulation of cellular catabolic process (BP), the autophagosome membrane (CC), the recycling endosome (CC), and NF-κB binding (MF) (Fig. 3 a, Supplementary Table 3 ). KEGG pathway analysis showed that these c-FDEGs were mainly involved in the IL-17 signaling pathway, NOD-like receptor signaling pathway, HIF-1 signaling pathway, and TNF signaling pathway (Fig. 3 b, Supplementary Table 4 ). GSEA suggested that the development of OA may be associated with hypoxia, MYC targets v2, the P53 pathway, the inflammatory response, TNFα signaling via NF-κB, the interferon-α response, and peroxisome (Fig. 3 c and d).

figure 3

Functional analyses. a Gene Ontology (GO) enrichment analysis showed that the 40 c-FDEGs were significantly enriched in the regulation of the inflammatory response, the positive regulation of cellular catabolic process, the autophagosome membrane, the recycling endosome, and NF-κB binding. b Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway analysis showed that these c-FDEGs were mainly involved in the IL-17 signaling pathway, NOD-like receptor signaling pathway, HIF-1 signaling pathway, and TNF signaling pathway. c Gene set enrichment analysis (GSEA) in the normal control group and d GSEA in the OA group based on the 50 human hallmark gene sets suggested that the development of OA may be associated with hypoxia, MYC targets v2, the P53 pathway, the inflammatory response, TNFα signaling via NF-κB, the interferon-α response, and peroxisome

Building PPI networks

The STRING database can be used to retrieve interactions between known and predicted proteins. To explore the interactions among the c-FDEGs, all 40 c-FDEGs were imported into the STRING database. The PPI network of c-FDEGs after deleting isolated c-FDEGs and adding the six related CRGs (without CDKN2A) is shown in Fig. 4 a. The cytoHubba plugin in Cytoscape software was used to calculate degree values, and the seven genes with the highest degree values (IL6, IL1B, RELA, PTGS2, EGFR, CDKN2A, and SOCS1) were taken as the PPI network’s hub genes (Fig. 4 b).

figure 4

Protein–protein interaction (PPI) network and core gene screening. a PPI network constructed from 40 c-FDEGs; red triangles represent c-FDEGs, green triangles represent CRGs that are closely related to them, and the correlation between c-FDEGs and CRGs is indicated by dashed lines. b Interaction network of the top seven core genes calculated using the cytoHubba plugin: the darker the color, the higher the degree value

Machine learning algorithm–based biomarker screening for patients with OA

In this study, the 40 c-FDEGs were further screened for potential biomarkers associated with OA using multiple machine learning methods. SVM-RFE analysis showed that the model containing 24 genes had the best accuracy (Fig. 5 a). LASSO regression analysis showed that the model was able to accurately predict OA when λ was equal to 12. Thus, the LASSO regression model generated 12 candidate genes (Fig. 5 b). For RF, we retained the candidate biomarkers with importance > 1 (Fig. 5 c). Lastly, the results of these three methods were integrated, and CDKN1A, FZD7, GABARAPL2, and SLC39A14 were identified as the final potential biomarkers for OA (Fig. 5 d).

figure 5

Machine learning-based potential biomarker screening. a SVM-RFE model with the optimal error rate when the number of signature genes was 58. b LASSO regression model. c Random forest model and the top 20 genes in terms of importance. d The final biomarkers screened using three machine learning algorithms

Experimental validation of four biomarkers

To validate the results of the bioinformatics analysis, we collected OA samples (n=6) and normal group samples (n=6) and performed western blot analysis and immunofluorescence staining (Fig. 6 ). Both results were consistent with the bioinformatics analysis, i.e., higher expression of FZD7 and GABARAPL2 and lower expression of CDKN1A (p21) and SLC39A14 (ZIP14) in the OA group compared with the normal group.

figure 6

Experimental validation of four biomarkers. a Representative immunofluorescence staining images of the four biomarker proteins (p21, FZD7, GABARAPL2, and ZIP14) in the normal and OA groups, with nuclei stained blue with 4’,6-diamidino-2-phenylindole. Scale bar = 25 µm. b Semi-quantitative analysis of mean fluorescence intensity of the four biomarker proteins in the normal and OA groups ( n = 6). (c, d) Representative western blotting and statistical comparisons of the four biomarker proteins in the normal and OA groups ( n = 6). * p < 0.05, ** p < 0.01, all by independent samples t-test

To better capture the function of the four biomarkers in OA, GSEA, GSVA, and ssGSEA were conducted on each of the above biomarkers (Fig. 7 ). The ssGSEA showed that the OA group was significantly enriched in Notch signaling, interferon alpha (IFN-α) response, the Wnt/β-catenin pathway, bile acid metabolism, and peroxisome, while the non-OA group was mainly enriched in TNFα signaling via NF-κB, hypoxia, MYC targets v2, the P53 pathway, the inflammatory response, PI3K AKT mTOR signaling, and IL6 JAK STAT3 signaling (Fig. 7 i). Correlation analysis showed that CDKN1A and SLC39A14 were significantly positively correlated with the gene sets of hypoxia, TNF-α signaling via NF-κB, the P53 pathway, and mTORC1 signaling. Meanwhile, GABARAPL2 and FZD7 showed significant negative correlations with the gene sets of TNF-α signaling via NF-κB, PI3K AKT mTOR signaling, and mTORC1 signaling (Fig. 7 j). The single-gene GSEA results for the seven hub genes are shown in Supplementary Figure 2 (a–g). The remaining 29 differentially expressed c-FRGs are shown in Supplementary Figure 3 .

figure 7

GSEA, GSVA, and ssGSEA results of four potential biomarkers. a–d Single-gene GSEA-KEGG pathway analysis of four potential biomarkers. We show the top six pathways with the smallest p -value. e–h High- and low-expression groups based on the expression levels of each potential biomarker combined with gene set variation analysis (GSVA). Red means the pathway is significantly upregulated, green means the pathway is significantly downregulated, and gray means the pathway is not statistically significant. i ssGSEA of OA and normal controls based on the h.all.v7.5.1.symbols.gmt gene set. * P < 0.05; ** P < 0.01; *** P < 0.001. Treat represents the OA group, and control represents the normal group. j Correlation of four biomarkers with 50 human hallmark gene sets from the h.all.v7.5.1.symbols.gmt gene set

Using the above four biomarkers, a disease nomogram was constructed. The AUC values of the individual genes CDKN1A, FZD7, GABARAPL2, and SLC39A14 were 0.931, 0.879, 0.989, and 0.850, respectively, all of which were at least 0.85 (Fig. 8 a), further indicating that the above genes had good diagnostic ability (Fig. 8 b). The AUC value of this model was 0.996, which was greater than the AUC value of any individual biomarker, indicating that this model had good diagnostic value (Fig. 8 c and d). To verify whether the above model is diagnostically meaningful, validation was performed on the GSE82107 dataset. The results showed that the AUC values of the four biomarkers were all greater than 0.7, and the AUC value of the model was 1 for the validation set (Fig. 8 f). These results indicate that CDKN1A, FZD7, GABARAPL2, and SLC39A14 are effective disease biomarkers for OA and that the model has high diagnostic efficacy.

figure 8

Validation of four biomarkers. a ROC analysis of the four biomarkers. b ROC analysis of the disease model constructed from the four biomarkers. c, d Nomograms based on the disease model: we obtained the corresponding scores for each genetic variable, drew a vertical line above the “points” axis, summed the scores of all predictor variables, found the final value on the “total score” axis, and then drew a straight line on the “probability” axis to determine the patient’s risk of osteoarthritis. e, f Validation of the disease model and four biomarkers on the GSE82107 validation dataset

Construction of drug prediction network and lncRNA–miRNA–mRNA network

The corresponding drug prediction network was constructed using DGIdb based on the four biomarkers (Supplementary Figure 4 a). The predicted drugs were celecoxib, paclitaxel, carboplatin, acetaminophen, vantictumab, and nortriptyline. Based on the competitive endogenous RNA hypothesis, an lncRNA–miRNA–mRNA competitive endogenous RNA (ceRNA) network was constructed to explore the function of lncRNA as an miRNA sponge in OA. We obtained 150 target miRNAs based on these biomarkers. Then, 48 lncRNAs were obtained based on these miRNA predictions. The four biomarkers with their predicted miRNAs and lncRNAs were imported into Cytoscape to build a ceRNA network containing 48 lncRNA nodes, 150 miRNA nodes, 4 hub gene nodes, and 198 edges (Supplementary Figure 4 b).

The immune microenvironment plays an important role in the progression of OA. Therefore, with the help of CIBERSORT, we summarized the differences in immune infiltration by immune cell subpopulations between OA samples and non-OA tissues (Fig. 9 a). The OA samples contained a higher proportion of memory B cells, M0 macrophages, M2 macrophages, and resting mast cells than the control group, as well as a lower proportion of resting CD4 memory T cells and activated mast cells. Correlation analysis showed that activated mast cells were positively correlated with PTGS2, IL6, and IL1B, and the correlation between activated mast cells and PTGS2 was the highest (0.686) (Fig. 9 b). There were positive correlations between IL1B, PTGS2, and M1 macrophages, resting CD4 memory T cells and PTGS2, and regulatory T cells (Tregs) and RELA. There were significant negative correlations between follicular helper T cells and RELA, as well as between plasma cells and SLC39A14 (Fig. 9 c and d).

figure 9

Results of immune infiltration by CIBERSORTx. a Bar plot showing the composition of 22 types of immune cells. b Box plot presenting the difference of immune infiltration of 22 types of immune cells. Treat represents the OA group, and Control represents the normal group. c Heatmap showing the correlation between seven hub genes and 22 types of immune cells in osteoarthritis. d Correlation between the four biomarkers and 22 types of immune cells in osteoarthritis

Single‑cell analysis

The scRNA-seq data from three OA synovial samples were obtained from the GSE152805 dataset. After initial quality control, we retained 10,194 cells for cell annotation (Supplementary Figure 5 ). The top 2000 highly variable genes were selected for further analysis (Supplementary Figure 5 b). We used the "RunPCA" function to reduce the dimensionality and obtained 14 clusters (Supplementary Figures 6 d and e); the top five DEGs of each cluster are shown in Supplementary Table 5 . Later, we performed cellular annotation using marker genes and annotated seven cell populations: fibroblasts (77.7%), macrophages (8.8%), dendritic cells (DCs) (3.6%), endothelial cells (ECs) (3.5%), smooth muscle cells (SMCs) (3.4%), T cells (1.8%), and mast cells (1.2%) (Fig. 10 a). Next, we performed differential gene expression analysis on these seven cell populations to verify the accuracy of the cell annotation (Fig. 10 b). Figures 10 c and d show the distribution and expression of seven hub genes and four biomarker genes in different cell populations. We found that 11 c-FRGs were significantly different in macrophages, DCs, mast cells, and NK cells. For example, IL1B, PTGS2, and SLC39A14 were significantly highly expressed in some cells, whereas they were significantly less expressed, or even absent, in other cells. We used CellChat to identify differentially overexpressed ligands and receptors for each cell population. In total, 254 significant ligand–receptor pairs were detected, which were further classified into 62 signaling pathways (Table 2 ). We found that the immune cells interacted weakly with each other; however, the non-immune cells had extensive communication interactions with other cells and were involved in various paracrine and autocrine signaling interactions (Fig. 10 e to g).

figure 10

Analysis of single-cell RNA sequencing data from three OA synovial samples. a UMAP plot of scRNA-seq showing unsupervised clusters colored according to putative cell types among a total of 10,194 cells in OA synovial samples. The percentages of total acquired cells were as follows: 77.7% fibroblasts, 8.8% macrophages, 3.6% dendritic cells (DCs), 3.5% endothelial cells (ECs), 3.4% smooth muscle cells (SMCs), 1.8% T cells, and 1.2% mast cells. b Heatmap depicting the expression levels of the top five marker genes among seven detected cell clusters. c, d UMAP plots and violin plots showing the expression of the selected seven hub c-FRGs and four potential biomarkers for each cell type. e Interaction net count plot of OA synovial cells. The thicker the line, the greater the number of interactions. f Interaction weight plot of synovial cells. The thicker the line, the stronger the interaction weights/strength between the two cell types. g Detailed network of cell–cell interactions among seven cell subsets

Copper is an irreplaceable trace metal element that participates in a variety of biological processes. When copper ions accumulate in excess, they eventually lead to cell death, and this new form of programmed cell death is known as cuproptosis [ 17 ]. A recent report has demonstrated that copper levels are significantly higher in the serum and synovial tissue of patients with OA than in controls [ 14 ]. Evidence from several studies suggests that the development of OA is closely related to ferroptosis in articular cartilage and synovium [ 25 , 26 , 27 , 28 , 29 ], and that OA can be treated to some extent by modulation of ferroptosis [ 29 , 30 ]. Additionally, previous studies have reported that copper and iron levels are closely correlated with each other in patients with OA [ 14 , 15 , 31 ].

In this study, we identified transcriptional alterations and expression of c-FRGs based on the GSE55235, GSE169077, GSE55457, and GSE55584 datasets. Forty c-FDEGs were identified among the 63 c-FRGs. GO enrichment analysis showed that these 40 c-FDEGs were mainly associated with the inflammatory response, cellular response to external stimulus, and autophagy. The KEGG enrichment analysis showed that these genes were mainly enriched in the IL-17 signaling pathway, NOD-like receptor signaling pathway, HIF-1 signaling pathway, and TNFα signaling pathway. For both OA and non-OA groups, GSEA and ssGSEA showed that OA was mainly associated with enrichment in Notch signaling, adipogenesis, xenobiotic metabolism, fatty acid metabolism, peroxisome, TNFα signaling via NF-κB, the inflammatory response, PI3K AKT mTOR signaling, and IL6 JAK STAT3 signaling. This indicates that the mechanism of OA development is closely related to fatty acid metabolism, the inflammatory response, immune regulation, and cell adhesion.

We analyzed the PPI results using the cytoHubba plugin in Cytoscape, revealing seven key c-FDEGs, including IL6, IL1B, RELA, PTGS2, EGFR, CDKN2A, and SOCS1. GSEA and GSVA of the seven genes revealed that IL6, IL1B, RELA, PTGS2, SOCS1, and EGFR were closely associated with inflammation, immune regulation, extracellular matrix, and cell adhesion pathways in OA, which is consistent with previous findings [ 32 , 33 ]. Interestingly, we also found that they were closely associated with lipid metabolism and fatty acid metabolism in OA. Considering that increased iron accumulation, free radical production, fatty acid supply, and increased lipid peroxidation are key to the induction of ferroptosis [ 5 , 6 , 7 ], it is possible that they affect the development of OA by regulating lipid metabolism and fatty acid metabolism, which affects ferroptosis; however, this needs to be further investigated.

Notably, CDKN2A is both a cuproptosis-related gene and a ferroptosis-related gene. CDKN2A is often considered an important gene in cellular senescence and aging [ 34 ], and it is used as a molecular marker of cellular senescence [ 35 ]. Our study showed that CDKN2A expression was higher in patients with OA, suggesting that CDKN2A may promote the development of OA by affecting cellular senescence.

This is the first study to use the new signature genes combining CRGs with FRGs to reveal the pathogenesis of OA and aid in its treatment. We executed three machine learning algorithms using the 40 c-FDEGs mentioned above and eventually identified four biomarkers: CDKN1A, FZD7, GABARAPL2, and SLC39A14.

Frizzled7 (FZD7) is known to be a receptor of the Wnt pathway. FZD receptors are usually classified as belonging to the G protein-coupled receptor family and are rich in cysteine, which can directly interact with Wnt proteins and thus activate downstream responses [ 36 , 37 , 38 ]. Numerous studies have shown that excessive upregulation or downregulation of Wnt signaling pathways in OA may lead to cartilage damage and ultimately accelerate the progression of OA. Therefore, it is important to maintain a balance in the biological activity of Wnt-related pathways [ 39 , 40 , 41 ]. In the present study, FZD7 was significantly increased in the OA group compared with the non-OA group. Therefore, we speculate that an excess of FZD7 may lead to the abnormal activation of Wnt-related pathways and ultimately accelerate the development of OA.

ZIP14 (SLC39A14) is a metal transporter [ 42 ] that affects the metabolic balance of zinc, manganese, iron, copper, and other metals [ 43 ]. For example, ZIP14 can transport non-transferrin-bound iron (NTBI) [ 44 ], and it can transport cadmium and manganese through metal/bicarbonate symporter activity [ 45 ]. It has been shown that OA is closely related to the metabolic balance of metals such as iron, copper, and manganese [ 14 , 15 , 31 , 46 , 47 , 48 ]. In this study, we found that ZIP14 was greatly reduced in the OA group compared with the non-OA group. Furthermore, scRNA-seq analysis showed that the distribution of SLC39A14 in OA patients varied significantly among cell populations, with low or even no expression in some cells, which is likely to disrupt the metal metabolic balance in the joints and eventually cause the accumulation of metals such as iron and copper. Therefore, SLC39A14 (ZIP14) may be a very important therapeutic target for OA treatment in the future.

ssGSEA showed that CDKN1A significantly positively correlated with TNF-α signaling via NF-κB, the TGF-β signaling pathway, hypoxia, the P53 pathway, apoptosis, mTORC1 signaling, and other gene sets, suggesting that CDKN1A may affect OA by regulating inflammation, apoptosis, and hypoxia. Although both the CDKN1A and GABARAPL2 genes have been reported previously [ 49 , 50 , 51 , 52 ], their relationship with ferroptosis and cuproptosis in OA is not yet known. This suggests that these genes may be targets not only for immunotherapy, inflammation, and autophagy but also for the treatment of cuproptosis and ferroptosis in OA. Notably, we found that melphalan, paclitaxel, vinblastine, and vantictumab may serve as potential drugs for the treatment of OA. Previous studies have reported that they act therapeutically by regulating CDKN1A or FZD7 [ 53 , 54 , 55 ], thus affecting processes such as the cell cycle, cell proliferation, and apoptosis, which also validates our prediction. We then constructed a disease model of OA based on these four biomarkers that could significantly improve our ability to recognize OA at an early stage. Thus, our findings suggest that CDKN1A, FZD7, GABARAPL2, and SLC39A14 are excellent disease biomarkers and potential therapeutic targets for OA, and the disease model constructed based on them has good diagnostic efficacy.

Recently, an increasing number of studies have shown that immune cell infiltration is essential for OA onset and development and for cartilage repair [ 56 , 57 , 58 ]. Our study showed a close relationship between the seven hub genes and immune cells. Notably, there were significant positive correlations of PTGS2, IL6, and IL1B with M1 macrophages and activated mast cells. Previous studies have demonstrated that the activation of macrophages and mast cells may significantly accelerate the progression of OA [ 58 , 59 , 60 ]. Therefore, we speculate that PTGS2, IL6, and IL1B may influence the onset and progression of OA by regulating these cells. Interestingly, scRNA-seq analysis further revealed that PTGS2 was significantly highly expressed in mast cells, leading us to speculate that PTGS2 may influence the progression of OA by regulating the activation of mast cells. Surprisingly, we found weak interactions between immune cells in the synovial tissue of patients with OA, whereas there were complex communication networks between immune and non-immune cells (fibroblasts, SMCs, and ECs). These hypotheses and questions require more studies to reveal the intricate interrelationships between these c-FRGs, immune cells, and OA.

In addition, we found that C10orf91 could regulate CDKN1A and SLC39A14 by regulating hsa-miR-149-3p, hsa-miR-423-5p, hsa-miR-31-5p, and hsa-miR-30b-3p. hsa-miR-513a-3p and hsa-miR-548c-3p can each regulate both CDKN1A and GABARAPL2; however, no related study has been reported yet, so this needs to be further investigated and validated in the future.

This study was conducted mainly using bioinformatics analysis, and despite the combination of scRNA-seq analysis and the use of powerful machine learning algorithms, such as RF and SVM-RFE, there are still some limitations to our study. First, the small sample size of the analysis may have led to inaccuracies in the determination of hub genes, CIBERSORT analysis, and single-cell analysis. Second, although the disease model nomogram was well validated, the data were obtained retrospectively from public databases, meaning that inherent selection bias may have affected their accuracy. In addition, while our data can show the correlation between OA and immune cells, they cannot reveal causality. Extensive prospective studies, as well as complementary in vivo and in vitro experimental studies, are necessary to validate the accuracy of potential therapeutic targets and biomarkers.

Conclusions

Our study showed that four genes—CDKN1A, FZD7, GABARAPL2, and SLC39A14—are good disease biomarkers and potential therapeutic targets for OA. Our study provides a theoretical basis and research direction for understanding the role of c-FRGs in the pathophysiological process and for potential therapeutic intervention in OA.

Availability of data and materials

The datasets used or analysed during the current study are available from the corresponding author on reasonable request.

Abbreviations

  • OA: Osteoarthritis
  • NSAIDs: Nonsteroidal anti-inflammatory drugs
  • ROS: Reactive oxygen species
  • FRGs: Ferroptosis-related genes
  • TCA: Tricarboxylic acid
  • CRGs: Cuproptosis-related genes
  • c-FRGs: The new signature genes combining cuproptosis-related genes (CRGs) with ferroptosis-related genes (FRGs)
  • NCBI: National Center for Biotechnology Information
  • GEO: Gene Expression Omnibus
  • DEGs: Differentially expressed genes
  • c-FDEGs: Differentially expressed c-FRGs
  • GO: Gene Ontology
  • KEGG: Kyoto Encyclopedia of Genes and Genomes
  • GSEA: Gene set enrichment analysis
  • SVM-RFE: Support vector machine recursive feature elimination
  • RF: Random forest analysis
  • LASSO: Least absolute shrinkage and selection operator
  • ROC: Receiver operating characteristic

References

Hunter DJ, Bierma-Zeinstra S. Osteoarthritis. Lancet. 2019;393(10182):1745–59.


Hunter DJ, Schofield D, Callander E. The individual and socioeconomic impact of osteoarthritis. Nat Rev Rheumatol. 2014;10(7):437–41.


Markenson JA. ACP Journal Club. Review: glucosamine and chondroitin, alone or in combination, do not clinically improve knee or hip pain in osteoarthritis. Ann Intern Med. 2011;154(6):JC3-4.

Martel-Pelletier J, Barr AJ, Cicuttini FM, Conaghan PG, Cooper C, Goldring MB, Goldring SR, Jones G, Teichtahl AJ, Pelletier JP. Osteoarthritis. Nat Rev Dis Primers. 2016;2:16072.

Jiang X, Stockwell BR, Conrad M. Ferroptosis: mechanisms, biology and role in disease. Nat Rev Mol Cell Biol. 2021;22(4):266–82.


Stockwell BR, Friedmann Angeli JP, Bayir H, Bush AI, Conrad M, Dixon SJ, Fulda S, Gascón S, Hatzios SK, Kagan VE, et al. Ferroptosis: A Regulated Cell Death Nexus Linking Metabolism, Redox Biology, and Disease. Cell. 2017;171(2):273–85.


Dixon SJ, Lemberg KM, Lamprecht MR, Skouta R, Zaitsev EM, Gleason CE, Patel DN, Bauer AJ, Cantley AM, Yang WS, et al. Ferroptosis: an iron-dependent form of nonapoptotic cell death. Cell. 2012;149(5):1060–72.

Mobasheri A, Rayman MP, Gualillo O, Sellam J, van der Kraan P, Fearon U. The role of metabolism in the pathogenesis of osteoarthritis. Nat Rev Rheumatol. 2017;13(5):302–11.

Robinson WH, Lepus CM, Wang Q, Raghu H, Mao R, Lindstrom TM, Sokolove J. Low-grade inflammation as a key mediator of the pathogenesis of osteoarthritis. Nat Rev Rheumatol. 2016;12(10):580–92.

Wang S, Li W, Zhang P, Wang Z, Ma X, Liu C, Vasilev K, Zhang L, Zhou X, Liu L, et al. Mechanical overloading induces GPX4-regulated chondrocyte ferroptosis in osteoarthritis via Piezo1 channel facilitated calcium influx. J Adv Res. 2022;41:63–75.

Yao X, Sun K, Yu S, Luo J, Guo J, Lin J, Wang G, Guo Z, Ye Y, Guo F. Chondrocyte ferroptosis contribute to the progression of osteoarthritis. J Orthop Translat. 2021;27:33–43.


Liu H, Deng Z, Yu B, Liu H, Yang Z, Zeng A, Fu M. Identification of SLC3A2 as a Potential Therapeutic Target of Osteoarthritis Involved in Ferroptosis by Integrating Bioinformatics, Clinical Factors and Experiments. Cells. 2022;11(21):3430.

Xia L, Gong N. Identification and verification of ferroptosis-related genes in the synovial tissue of osteoarthritis using bioinformatics analysis. Front Mol Biosci. 2022;9:992044.

Yazar M, Sarban S, Kocyigit A, Isikan UE. Synovial fluid and plasma selenium, copper, zinc, and iron concentrations in patients with rheumatoid arthritis and osteoarthritis. Biol Trace Elem Res. 2005;106(2):123–32.

Zhou J, Liu C, Sun Y, Francis M, Ryu MS, Grider A, Ye K. Genetically predicted circulating levels of copper and zinc are associated with osteoarthritis but not with rheumatoid arthritis. Osteoarthritis Cartilage. 2021;29(7):1029–35.


Tiku ML, Narla H, Jain M, Yalamanchili P. Glucosamine prevents in vitro collagen degradation in chondrocytes by inhibiting advanced lipoxidation reactions and protein oxidation. Arthritis Res Ther. 2007;9(4):R76.


Tsvetkov P, Coy S, Petrova B, Dreishpoon M, Verma A, Abdusamad M, Rossen J, Joesch-Cohen L, Humeidi R, Spangler RD, et al. Copper induces cell death by targeting lipoylated TCA cycle proteins. Science. 2022;375(6586):1254–61.

Myint ZW, Oo TH, Thein KZ, Tun AM, Saeed H. Copper deficiency anemia: review article. Ann Hematol. 2018;97(9):1527–34.

Jiang X, Stockwell BR, Conrad M. Ferroptosis: mechanisms, biology and role in disease. Nat Rev Mol Cell Biol. 2021;22(4):266–82.

Dixon SJ, Olzmann JA. The cell biology of ferroptosis. Nat Rev Mol Cell Biol. 2024. https://doi.org/10.1038/s41580-024-00703-5.

Stockwell BR. Ferroptosis turns 10: Emerging mechanisms, physiological functions, and therapeutic applications. Cell. 2022;185(14):2401–21.

Freshour SL, Kiwala S, Cotto KC, Coffman AC, McMichael JF, Song JF, Griffith JJ, Griffith M, Griffith OL, Wagner AH. Integration of the Drug-Gene Interaction Database (DGIdb 4.0) with open crowdsource efforts. Nucleic Acids Res. 2021;49(D1):D1144–D1151.

Chou CH, Jain V, Gibson J, Attarian DE, Haraden CA, Yohn CB, Laberge RM, Gregory S, Kraus VB. Synovial cell cross-talk with cartilage plays a major role in the pathogenesis of osteoarthritis. Sci Rep. 2020;10(1):10868.

Huang ZY, Luo ZY, Cai YR, Chou CH, Yao ML, Pei FX, Kraus VB, Zhou ZK. Single cell transcriptomics in human osteoarthritis synovium and in silico deconvoluted bulk RNA sequencing. Osteoarthritis Cartilage. 2022;30(3):475–80.

Lv Z, Han J, Li J, Guo H, Fei Y, Sun Z, Dong J, Wang M, Fan C, Li W, et al. Single cell RNA-seq analysis identifies ferroptotic chondrocyte cluster and reveals TRPV1 as an anti-ferroptotic target in osteoarthritis. EBioMedicine. 2022;84:104258.

Miao Y, Chen Y, Xue F, Liu K, Zhu B, Gao J, Yin J, Zhang C, Li G. Contribution of ferroptosis and GPX4’s dual functions to osteoarthritis progression. EBioMedicine. 2022;76:103847.

Sun K, Guo Z, Hou L, Xu J, Du T, Xu T, Guo F. Iron homeostasis in arthropathies: From pathogenesis to therapeutic potential. Ageing Res Rev. 2021;72:101481.

Zhang S, Xu J, Si H, Wu Y, Zhou S, Shen B. The Role Played by Ferroptosis in Osteoarthritis: Evidence Based on Iron Dyshomeostasis and Lipid Peroxidation. Antioxidants (Basel). 2022;11(9):1668.

Zhou X, Zheng Y, Sun W, Zhang Z, Liu J, Yang W, Yuan W, Yi Y, Wang J, Liu J. D-mannose alleviates osteoarthritis progression by inhibiting chondrocyte ferroptosis in a HIF-2α-dependent manner. Cell Prolif. 2021;54(11):e13134.

Guo Z, Lin J, Sun K, Guo J, Yao X, Wang G, Hou L, Xu J, Guo J, Guo F. Deferoxamine Alleviates Osteoarthritis by Inhibiting Chondrocyte Ferroptosis and Activating the Nrf2 Pathway. Front Pharmacol. 2022;13:791376.

Lin R, Deng C, Li X, Liu Y, Zhang M, Qin C, Yao Q, Wang L, Wu C. Copper-incorporated bioactive glass-ceramics inducing anti-inflammatory phenotype and regeneration of cartilage/bone interface. Theranostics. 2019;9(21):6300–13.

Chen YH, Hsieh SC, Chen WY, Li KJ, Wu CH, Wu PC, Tsai CY, Yu CL. Spontaneous resolution of acute gouty arthritis is associated with rapid induction of the anti-inflammatory factors TGFβ1, IL-10 and soluble TNF receptors and the intracellular cytokine negative regulators CIS and SOCS3. Ann Rheum Dis. 2011;70(9):1655–63.

Kobayashi H, Chang SH, Mori D, Itoh S, Hirata M, Hosaka Y, Taniguchi Y, Okada K, Mori Y, Yano F, et al. Biphasic regulation of chondrocytes by Rela through induction of anti-apoptotic and catabolic target genes. Nat Commun. 2016;7:13336.

Melzer D, Pilling LC, Ferrucci L. The genetics of human ageing. Nat Rev Genet. 2020;21(2):88–101.

Lye JJ, Latorre E, Lee BP, Bandinelli S, Holley JE, Gutowski NJ, Ferrucci L, Harries LW. Astrocyte senescence may drive alterations in GFAPα, CDKN2A p14(ARF), and TAU3 transcript expression and contribute to cognitive decline. Geroscience. 2019;41(5):561–73.

Bhanot P, Brink M, Samos CH, Hsieh JC, Wang Y, Macke JP, Andrew D, Nathans J, Nusse R. A new member of the frizzled family from Drosophila functions as a Wingless receptor. Nature. 1996;382(6588):225–30.

Yang-Snyder J, Miller JR, Brown JD, Lai CJ, Moon RT. A frizzled homolog functions in a vertebrate Wnt signaling pathway. Curr Biol. 1996;6(10):1302–6.

Foord SM, Bonner TI, Neubig RR, Rosser EM, Pin JP, Davenport AP, Spedding M, Harmar AJ. International Union of Pharmacology. XLVI. G protein-coupled receptor list. Pharmacol Rev. 2005;57(2):279–88.

Nalesso G, Thomas BL, Sherwood JC, Yu J, Addimanda O, Eldridge SE, Thorup AS, Dale L, Schett G, Zwerina J, et al. WNT16 antagonises excessive canonical WNT activation and protects cartilage in osteoarthritis. Ann Rheum Dis. 2017;76(1):218–26.

Tong W, Zeng Y, Chow DHK, Yeung W, Xu J, Deng Y, Chen S, Zhao H, Zhang X, Ho KK, et al. Wnt16 attenuates osteoarthritis progression through a PCP/JNK-mTORC1-PTHrP cascade. Ann Rheum Dis. 2019;78(4):551–61.

Lories RJ, Corr M, Lane NE. To Wnt or not to Wnt: the bone and joint health dilemma. Nat Rev Rheumatol. 2013;9(6):328–39.

Aydemir TB, Cousins RJ. The Multiple Faces of the Metal Transporter ZIP14 (SLC39A14). J Nutr. 2018;148(2):174–84.

Pinilla-Tenas JJ, Sparkman BK, Shawki A, Illing AC, Mitchell CJ, Zhao N, Liuzzi JP, Cousins RJ, Knutson MD, Mackenzie B. Zip14 is a complex broad-scope metal-ion transporter whose functional properties support roles in the cellular uptake of zinc and nontransferrin-bound iron. Am J Physiol Cell Physiol. 2011;301(4):C862-871.

Liuzzi JP, Aydemir F, Nam H, Knutson MD, Cousins RJ. Zip14 (Slc39a14) mediates non-transferrin-bound iron uptake into cells. Proc Natl Acad Sci U S A. 2006;103(37):13612–7.

Girijashanker K, He L, Soleimani M, Reed JM, Li H, Liu Z, Wang B, Dalton TP, Nebert DW. Slc39a14 gene encodes ZIP14, a metal/bicarbonate symporter: similarities to the ZIP8 transporter. Mol Pharmacol. 2008;73(5):1413–23.

He H, Wang Y, Yang Z, Ding X, Yang T, Lei G, Li H, Xie D. Association between serum zinc and copper concentrations and copper/zinc ratio with the prevalence of knee chondrocalcinosis: a cross-sectional study. BMC Musculoskelet Disord. 2020;21(1):97.

Li G, Cheng T, Yu X. The Impact of Trace Elements on Osteoarthritis. Front Med (Lausanne). 2021;8:771297.

Roczniak W, Brodziak-Dopierała B, Cipora E, Jakóbik-Kolon A, Kluczka J, Babuśka-Roczniak M. Factors that Affect the Content of Cadmium, Nickel, Copper and Zinc in Tissues of the Knee Joint. Biol Trace Elem Res. 2017;178(2):201–9.

Bertram KL, Narendran N, Tailor P, Jablonski C, Leonard C, Irvine E, Hess R, Masson AO, Abubacker S, Rinker K, et al. 17-DMAG regulates p21 expression to induce chondrogenesis in vitro and in vivo. Dis Model Mech. 2018;11(10):dmm033662.

Ding Z, Lu W, Dai C, Huang W, Liu F, Shan W, Cheng C, Xu J, Yin Z, He W. The CRD of Frizzled 7 exhibits chondroprotective effects in osteoarthritis via inhibition of the canonical Wnt3a/β-catenin signaling pathway. Int Immunopharmacol. 2020;82:106367.

Faust HJ, Zhang H, Han J, Wolf MT, Jeon OH, Sadtler K, Peña AN, Chung L, Maestas DR, Tam AJ, et al. IL-17 and immunologically induced senescence regulate response to injury in osteoarthritis. J Clin Invest. 2020;130(10):5493–507.

Xing D, Wang B, Xu Y, Tao K, Lin J. Overexpression of microRNA-1 controls the development of osteoarthritis via targeting FZD7 of Wnt/β-catenin signaling. Osteoarthritis Cartilage. 2016;24:S181–S182.

Flanagan DJ, Barker N, Costanzo NSD, Mason EA, Gurney A, Meniel VS, Koushyar S, Austin CR, Ernst M, Pearson HB, et al. Frizzled-7 Is Required for Wnt Signaling in Gastric Tumors with and Without Apc Mutations. Cancer Res. 2019;79(5):970–81.

Cho HJ, Mei AHC, Tung K, Han J, Perumal D, Keats JJ, Auclair D, Chari A, Jagannath S, Parekh S. MAGE-A3 Promotes Chemotherapy Resistance and Proliferation in Multiple Myeloma through Regulation of BIM and p21Cip1. Blood. 2018;132:4464.

Wang Z, Li Y, Wu D, Yu S, Wang Y, Leung Chan F. Nuclear receptor HNF4α performs a tumor suppressor function in prostate cancer via its induction of p21-driven cellular senescence. Oncogene. 2020;39(7):1572–89.

Sakkas LI, Platsoucas CD. The role of T cells in the pathogenesis of osteoarthritis. Arthritis Rheum. 2007;56(2):409–24.

Li M, Yin H, Yan Z, Li H, Wu J, Wang Y, Wei F, Tian G, Ning C, Li H, et al. The immune microenvironment in cartilage injury and repair. Acta Biomater. 2022;140:23–42.

Wang Q, Lepus CM, Raghu H, Reber LL, Tsai MM, Wong HH, von Kaeppler E, Lingampalli N, Bloom MS, Hu N, et al. IgE-mediated mast cell activation promotes inflammation and cartilage destruction in osteoarthritis. Elife. 2019;8:e39905.

Liu B, Zhang M, Zhao J, Zheng M, Yang H. Imbalance of M1/M2 macrophages is linked to severity level of knee osteoarthritis. Exp Ther Med. 2018;16(6):5009–14.


Lee H, Kashiwakura J, Matsuda A, Watanabe Y, Sakamoto-Sasaki T, Matsumoto K, Hashimoto N, Saito S, Ohmori K, Nagaoka M, et al. Activation of human synovial mast cells from rheumatoid arthritis or osteoarthritis patients in response to aggregated IgG through Fcγ receptor I and Fcγ receptor II. Arthritis Rheum. 2013;65(1):109–19.


Acknowledgments

This study was a re-analysis based on published data from the GEO database. We would like to thank the GEO database for sharing the data.

This study was supported by the Sichuan Medical Association (No. S17075, Q22008, Q21005), the Sichuan Science and Technology Program (No. 24NSFSC2177), the Science and Technology Strategic Cooperation Project between the People's Government of Luzhou City and Southwest Medical University (No. 2020LZXNYDJ22), the Doctoral Research Initiation Fund of the Affiliated Hospital of Southwest Medical University (No. 22155), and the Sichuan Student Innovation and Entrepreneurship Training Program Project (No. S202010632174).

Author information

Baoqiang He and Yehui Liao contributed equally.

Authors and Affiliations

Department of Orthopedics, The Affiliated Hospital of Southwest Medical University, No. 25 Taping Street, Lu Zhou City, China

Baoqiang He, Yehui Liao, Minghao Tian, Chao Tang, Qiang Tang, Fei Ma, Wenyang Zhou, Yebo Leng & Dejun Zhong

Southwest Medical University, Lu Zhou City, China

Baoqiang He & Dejun Zhong

Meishan Tianfu New Area People’s Hospital, Meishan City, China


Contributions

HBQ, LYB, LYH and ZDJ designed the study. Data analysis was performed by HBQ, TC, TQ and MF. HBQ, TMH and ZWY carried out the experiments. HBQ, LYB, and ZDJ wrote the first draft. ZDJ critically revised the manuscript. All authors read and approved the final manuscript.

Corresponding authors

Correspondence to Yebo Leng or Dejun Zhong.

Ethics declarations

Ethics approval and consent to participate

Synovial tissue collection and all experimental procedures were approved by the Institutional Review Board of the Affiliated Hospital of Southwest Medical University (KY2023293) in accordance with the guidelines of the Chinese Health Sciences Administration, and written informed consent was obtained from the donors.

Consent for publication

All authors agree to publish.

Competing interests

The authors declare no competing interests.

Additional information

Publisher's note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary Information

Supplementary material 1.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/ . The Creative Commons Public Domain Dedication waiver ( http://creativecommons.org/publicdomain/zero/1.0/ ) applies to the data made available in this article, unless otherwise stated in a credit line to the data.


About this article

Cite this article

He, B., Liao, Y., Tian, M. et al. Identification and verification of a novel signature that combines cuproptosis-related genes with ferroptosis-related genes in osteoarthritis using bioinformatics analysis and experimental validation. Arthritis Res Ther 26, 100 (2024). https://doi.org/10.1186/s13075-024-03328-3


Received: 22 January 2024

Accepted: 23 April 2024

Published: 13 May 2024

DOI: https://doi.org/10.1186/s13075-024-03328-3


  • Cuproptosis
  • Ferroptosis
  • Machine learning
  • Bioinformatics

Arthritis Research & Therapy

ISSN: 1478-6362

example of research results and discussion


COMMENTS

  1. Guide to Writing the Results and Discussion Sections of a ...

    Tips to Write the Results Section. Direct the reader to the research data and explain the meaning of the data. Avoid using a repetitive sentence structure to explain a new set of data. Write and highlight important findings in your results. Use the same order as the subheadings of the methods section.

  2. How to Write a Discussion Section

    The discussion section is where you delve into the meaning, importance, and relevance of your results. It should focus on explaining and evaluating what you found, showing how it relates to your literature review and paper or dissertation topic, and making an argument in support of your overall conclusion. It should not be a second results section. There are different ways to write this ...

  3. How to Write a Results Section

    A two-sample t test was used to test the hypothesis that higher social distance from environmental problems would reduce the intent to donate to environmental organizations, with donation intention (recorded as a score from 1 to 10) as the outcome variable and social distance (categorized as either a low or high level of social distance) as the predictor variable. Social distance was found to ...

  4. Research Results Section

    The discussion should also address the study's research questions and explain how the results contribute to the field of study. Limitations: This section should acknowledge any limitations of the study, such as sample size, data collection methods, or other factors that may have influenced the results.

  5. How to Write Discussions and Conclusions

    the results of your research, a discussion of related research, and a comparison between your results and initial hypothesis. ... reduce dependence on traditionally single-use plastic items (e.g. shampoo bottles), for example by refilling or buying larger bottles; (3) replace plastic items with reusable and/or alternative products with a lower ...

  6. Reporting Research Results in APA Style

    Making scientific research available to others is a key part of academic integrity and open science. Interpretation or discussion of results; This belongs in your discussion section. Your results section is where you objectively report all relevant findings and leave them open for interpretation by readers.

  7. PDF 7th Edition Discussion Phrases Guide

    Discussion Phrases Guide. Papers usually end with a concluding section, often called the "Discussion." The Discussion is your opportunity to evaluate and interpret the results of your study or paper, draw inferences and conclusions from it, and communicate its contributions to science and/or society. Use the present tense when writing the ...

  8. How to Write a Results Section

    A two-sample t test was used to test the hypothesis that higher social distance from environmental problems would reduce the intent to donate to environmental organisations, with donation intention (recorded as a score from 1 to 10) as the outcome variable and social distance (categorised as either a low or high level of social distance) as the predictor variable. Social distance was found to ...

  9. 8. The Discussion

    The discussion section is often considered the most important part of your research paper because it: Most effectively demonstrates your ability as a researcher to think critically about an issue, to develop creative solutions to problems based upon a logical synthesis of the findings, and to formulate a deeper, more profound understanding of the research problem under investigation;

  10. PDF Results Section for Research Papers

    The results section of a research paper tells the reader what you found, while the discussion section tells the reader what your findings mean. The results section should present the facts in an academic and unbiased manner, avoiding any attempt at analyzing or interpreting the data. Think of the results section as setting the stage for the ...

  11. How to Write the Discussion Section of a Research Paper

    The discussion section provides an analysis and interpretation of the findings, compares them with previous studies, identifies limitations, and suggests future directions for research. This section combines information from the preceding parts of your paper into a coherent story. By this point, the reader already knows why you did your study ...

  12. PDF Discussion Section for Research Papers

    The discussion section is one of the final parts of a research paper, in which an author describes, analyzes, and interprets their findings. They explain the significance of those results and tie everything back to the research question(s). In this handout, you will find a description of what a discussion section does, explanations of how to ...

  13. How To Write A Dissertation Discussion Chapter

    Step 1: Restate your research problem and research questions. The first step in writing up your discussion chapter is to remind your reader of your research problem, as well as your research aim (s) and research questions. If you have hypotheses, you can also briefly mention these.

  14. The Writing Center

    IMRaD Results Discussion. Results and Discussion Sections in Scientific Research Reports (IMRaD) After introducing the study and describing its methodology, an IMRaD* report presents and discusses the main findings of the study. In the results section, writers systematically report their findings, and in discussion, they interpret these findings.

  15. Dissertation Writing: Results and Discussion

    Summarise your results in the text, drawing on the figures and tables to illustrate your points. The text and figures should be complementary, not repeat the same information. You should refer to every table or figure in the text. Any that you don't feel the need to refer to can safely be moved to an appendix, or even removed.

  16. 7. The Results

    For most research papers in the social and behavioral sciences, there are two possible ways of organizing the results. Both approaches are appropriate in how you report your findings, but use only one approach. Present a synopsis of the results followed by an explanation of key findings. This approach can be used to highlight important findings.

  17. The Results and Discussion

    Guide contents. As part of the Writing the Dissertation series, this guide covers the most common conventions of the results and discussion chapters, giving you the necessary knowledge, tips and guidance needed to impress your markers! The sections are organised as follows: The Difference - Breaks down the distinctions between the results and discussion chapters.

  18. How to Write an Effective Discussion in a Research Paper; a Guide to

    Explaining the meaning of the results to the reader is the purpose of the discussion section of a research paper. There are elements of the discussion that should be included and other things that ...

  19. How to Write a Discussion Section

    Table of contents. What not to include in your discussion section. Step 1: Summarise your key findings. Step 2: Give your interpretations. Step 3: Discuss the implications. Step 4: Acknowledge the limitations. Step 5: Share your recommendations. Discussion section example.

  20. Results, Discussion, and Conclusion

    The Results (or Findings) section follows the Methods and precedes the Discussion section. This is where the authors provide the data collected during their study. That data can sometimes be difficult to understand because it is often quite technical. Do not let this intimidate you; you will discover the significance of the results next. Discussion

  21. Research Guides: Writing a Scientific Paper: RESULTS

    Present the results of the paper, in logical order, using tables and graphs as necessary. Explain the results and show how they help to answer the research questions posed in the Introduction. Evidence does not explain itself; the results must be presented and then explained. Avoid: presenting results that are never discussed; presenting ...

  22. Results & Discussion

    The Results and Discussion sections can be written as separate sections (as shown in Fig. 2), but are often combined in a poster into one section called Results and Discussion. This is done in order to (1) save precious space on a poster for the many pieces of information that a scientist would like to tell their audience and (2) by combining the two sections, it becomes easier for the ...

  23. Psychology and Psychotherapy: Theory, Research and Practice

    Objectives. The current study aimed to examine: (1.1) causal beliefs about adolescent depression in a sample of adolescents with a clinical depression and their mothers and fathers; (1.2) within-family overlap of causal beliefs; (2.1) mothers' and fathers' reflected causal beliefs about their child's perspective; (2.2) the accuracy of mothers' and fathers' reflected causal beliefs as related ...

  24. Assessing the evolution of research topics in a biological field using

    Our ability to understand the progress of science through the evolution of research topics is limited by the need for specialist knowledge and the exponential growth of the literature. This study uses artificial intelligence and machine learning approaches to demonstrate how a biological field (plant science) has evolved, how the model systems have changed, and how countries differ in terms of ...

  25. Enterprise's Strategic Agility and Resource Allocation ...

    The research results show that different strategic agile enterprises should adopt corresponding adaptive resource allocation models to adapt to the fiercely competitive market. ... The second section is theory and hypotheses. In the third section, methodology is proposed. The fourth section is results and discussion. The final section is a ...

  26. Psychometric properties and criterion related validity of the Norwegian

    Background Several studies have been conducted with the 1.0 version of the Hospital Survey on Patient Safety Culture (HSOPSC) in Norway and globally. The 2.0 version has not been translated and tested in Norwegian hospital settings. This study aims to 1) assess the psychometrics of the Norwegian version (N-HSOPSC 2.0), and 2) assess the criterion validity of the N-HSOPSC 2.0, adding two more ...

  27. What's the difference between results and discussion?

    The results chapter or section simply and objectively reports what you found, without speculating on why you found these results. The discussion interprets the meaning of the results, puts them in context, and explains why they matter. In qualitative research, results and discussion are sometimes combined. But in quantitative research, it's ...

  28. Diagnostics

    Serous effusion cytology is a pivotal diagnostic and staging tool in clinical pathology, valued for its simplicity and cost-effectiveness. Staining techniques such as Giemsa and Papanicolaou are foundational, yet the search for rapid and efficient alternatives continues. Our study assesses the efficacy of an in-house-developed BlueStain, a toluidine blue variant, within the International ...

  29. Enhancing country-level impact for health policy and systems research

    At a recent webinar hosted by the Alliance for Health Policy and Systems Research, global health experts gathered to discuss the critical role of health policy and systems research in advancing country-specific health priorities. The event, Aiming for impact: embedding health policy and systems research to advance country priorities, also marked the launch of the Alliance's new five-year ...

  30. Identification and verification of a novel signature that combines

    Background Exploring the pathogenesis of osteoarthritis (OA) is important for its prevention, diagnosis, and treatment. Therefore, we aimed to construct novel signature genes (c-FRGs) combining cuproptosis-related genes (CRGs) with ferroptosis-related genes (FRGs) to explore the pathogenesis of OA and aid in its treatment. Materials and methods Differentially expressed c-FRGs (c-FDEGs) were ...