
Guidelines for Reporting Statistics



  • April 25, 2024 09:39

JMIR Publications employs professional external copyeditors (see What are the steps during copyediting? and What are the authors' responsibilities during copyediting?) to bring articles into editorial style, so authors do not have to worry too much about style on initial submission. That said, the production process after acceptance is shorter if the article is already roughly aligned with our guidelines (see also JMIR's editorial guidelines).

While pointing out missing or incorrectly reported statistics during the review process is also the responsibility of the editor/section editor (who should not rely on external peer reviewers for this), copyeditors act as the "second line of defense" and must enforce reporting in line with generally accepted guidelines. For details on common statistical terms, please refer to the AMA's Glossary of Statistical Terms. The SAMPL reporting guidelines, which provide best practices on statistical reporting, are another useful resource for copyeditors.

Omission of leading zero

  • We follow the guidelines set out by the AMA (sections 18.7.1 and 19.5 ).
  • A zero should be placed before the decimal point for numbers less than 1,  except when expressing the 3 values related to probability:  P , α , and β . These values cannot equal 1, except when rounding. Because they appear frequently, eliminating the zero can save substantial space in tables and text. (Although other statistical values also may never equal 1, their use is less frequent; to simplify usage, the zero before the decimal point is included.)

Our predetermined α level was .05.

  • Sometimes α and β are used to indicate other statistics (eg, Cronbach α or β coefficient), which may have values ≥1. In these cases, a leading zero is necessary.
  • N designates the entire population under study.
  • n designates a sample of the population under study.
  • Do not insert spaces before or after the equal sign, and delete spaces on either side of mathematical operators, except in equations.

Examples: N=468, n=234
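The operator-spacing rule above is mechanical enough to sketch in code. Below is a minimal Python illustration (the helper name and regular expression are ours, not part of JMIR's tooling); it should not be run on text containing display equations, where spaces are kept.

```python
import re

def tighten_operators(text: str) -> str:
    """Delete spaces on either side of =, <, >, ≤, ≥ in statements of
    equality (eg, "N = 468" becomes "N=468"); not for use in equations."""
    return re.sub(r"\s*([=<>≤≥])\s*", r"\1", text)
```

For example, `tighten_operators("N = 468")` yields `"N=468"` and `tighten_operators("P < .001")` yields `"P<.001"`.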

Percentages and decimal places

  • If N<100, report the percentage with no decimal places.
  • If N is between 100 and 999, report 1 decimal place.
  • If N≥1000, 2 decimal places are preferred, but 1 decimal place is acceptable.
  • However, if a table contains mixed denominators, be consistent: use, for example, 1 decimal place throughout, even if some denominators are less than 100.
  • Do not add a zero after the decimal if the percentage value is a whole number (ie, 64/100=64%, not 64.0%). There are no exceptions to this guideline.
  • Note: When rounding significant digits, if the digit immediately to the right of the last significant digit is 5, with either no digits or all zeros after the 5, the last significant digit is rounded up if it is odd and not changed if it is even (eg, 47.75 is rounded to 47.8, but 47.65 is rounded to 47.6; AMA section 19.4.2 ).
  • See AMA 18.7.1 , 18.7.2 , and 18.7.3

If N=87, use 45%

If N=356, use 45.1%

If N=1024, use 45.13%
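These decimal-place rules, together with the AMA round-half-even convention from the note above, can be sketched in Python (the function name is illustrative; Python's `decimal` module supplies the required rounding mode):

```python
from decimal import Decimal, ROUND_HALF_EVEN

def format_percentage(numerator: int, denominator: int) -> str:
    """Format numerator/denominator as a percentage with the number of
    decimal places implied by the denominator (N), rounding half to even
    per AMA section 19.4.2."""
    if denominator < 100:
        places = 0
    elif denominator < 1000:
        places = 1
    else:
        places = 2
    pct = Decimal(numerator) / Decimal(denominator) * 100
    value = pct.quantize(Decimal(1).scaleb(-places), rounding=ROUND_HALF_EVEN)
    # A whole-number percentage takes no decimals (64%, not 64.0%)
    if value == value.to_integral_value():
        value = value.to_integral_value()
    return f"{value}%"
```

For instance, `format_percentage(39, 87)` gives `"45%"`, `format_percentage(488, 550)` gives `"88.7%"`, and `format_percentage(64, 100)` gives `"64%"`.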

Percentages within a sentence

  • Preferred JMIR style is to always make clear what the numerator and denominator associated with a percentage are.

"the majority of participants (59%) felt..." OR “the majority of participants (59%, 59/100) felt..."

should be revised as follows:

"the majority of participants (59/100, 59%) felt..."  

When the percentage is emphasized in the sentence:

"…where 59% (59/100) of the participants felt that…"

Example II:

"a vast majority (n=488, 88.7%) of participants"

should be changed as follows:

"a vast majority (488/550, 88.7%)" ( Note:  The "n=" has been dropped and the "N" value, that is, 550, has been added).

  • In expressing a series of proportions or percentages drawn from the same sample, the denominator need be provided only once.

Example: Of the 200 patients, 6 (3%) died, 18 (9%) experienced an adverse event, and 22 (11%) were lost to follow-up.

  • Do not use square brackets within parentheses for statistics. Separate values with a comma if statistics are linked (eg, % and numerator/denominator); separate values with a semicolon if statistics are unlinked (eg, % and odds ratio).

Example s :

“a vast majority (488/550, 88.7%;  P =.002)”

“…were not significant ( P =.50; OR 2.72, 95% CI 0.45-2.6)”

"The most common functions among studies that involved children with special needs were consultation (8 studies [73%]) and diagnosis (7 studies [64%]). "

Mean, standard deviation, standard error, and range

  • Equal signs are not used; separate the value from the statistic with a space. 
  • The SD should always be displayed along with mean values; preferred format: mean (SD)

mean 4.71 (SD 0.47) cm; mean 35% (SD 4.3%)

...were aged 20 to 69 years (mean 38.9, SD 11.1 years)

  • When reporting multiple statistics in a sentence, use a semicolon to separate the unrelated items.

Example:  “The age (in years) of the participants (mean 4.71, SD 0.47; range 4-5)...”

  • For mean (SD), we prefer not to use the ± sign. Instead, an expression like 1.11 ± 2.33 should be formatted as "1.11 (SD 2.33)." Authors are advised to review such changes made by their copyeditor. Copyeditors are advised to add a query notifying the author(s) of this change, in case other statistics (eg, SE) have also been formatted with a ± sign and now need to be specified.
  • When reporting a value that is calculated from a mean and SD value, report it in the following manner: mean + SD = 6
  • When reporting mean and SD in a table, include both values in the same column. The column or row header should include the wording “mean (SD)” after the variable name.

Example: Age (years), mean (SD)

  • As of December 4, 2013, AMA no longer requires the expansion of "SD" or "SE" in the text (section 19.6 ). Also see  Which abbreviations don't need to be expanded?.
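The ± conversion described above can be sketched as a regular-expression pass (an illustration only, with names of our choosing; a copyeditor must still query whether each ± actually denoted an SD rather than, say, an SE):

```python
import re

def plus_minus_to_sd(text: str) -> str:
    """Rewrite "1.11 ± 2.33" as "1.11 (SD 2.33)". Assumes the value after
    the ± sign is a standard deviation, which a query should confirm."""
    pattern = r"(\d+(?:\.\d+)?)\s*±\s*(\d+(?:\.\d+)?)"
    return re.sub(pattern, r"\1 (SD \2)", text)
```

For example, `plus_minus_to_sd("mean age 1.11 ± 2.33 years")` yields `"mean age 1.11 (SD 2.33) years"`.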

Median and interquartile range

  • The interquartile range ( IQR )—distance between the 25th and 75th percentiles—should be displayed along with median values; preferred format: median (IQR). 
  • Include the leading zero before the decimal point for values <1.
  • Do not format as a single number indicating the difference between the 75th and 25th percentiles (AMA section 19.5 ).
  • We prefer indicating the 25th and 75th percentiles as a hyphenated range, not as comma-separated values.
  • IQR does not need to be expanded.

Examples:  

A median value of 54 (IQR 45-62)...

The median (IQR) was 54 (45-62)...

Odds ratio and confidence intervals

  • ORs should  always be presented with CIs.
  • If one value in the CI range is negative, then “to” should be used rather than a hyphen. If this occurs in a table, replace the hyphen in all ranges in that table for consistent presentation.
  • Avoid brackets within parentheses. If brackets within parentheses are necessary, use square brackets. In addition, avoid using parentheses inside another set of parentheses altogether; eg, (OR 2.92 (2.36-3.62)) should be rewritten as (OR 2.92, 95% CI 2.36-3.62).
  • CI does not need to be expanded (AMA section 19.6 ; also see  Which abbreviations don't need to be expanded? ).
  • When defining “OR” within parentheses, use square brackets.
  • The odds ratio was 3.1 (95% CI 2.2-4.8). 
  • OR 1.2% (95% CI 0.8%-1.6%)
  • …(odds ratio [OR] 2.92, 95% CI –0.1 to 0.8). Note that OR needs to be defined in tables (through a footnote or in the caption).
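The hyphen-versus-"to" rule for CI ranges can be sketched as follows (the helper name is illustrative, and the code uses an ASCII hyphen and minus sign where typeset output would use dashes):

```python
def format_ci(lower: float, upper: float, level: int = 95) -> str:
    """Join CI bounds with a hyphen, or with " to " if either bound is
    negative, per the rule above."""
    sep = " to " if lower < 0 or upper < 0 else "-"
    return f"{level}% CI {lower:g}{sep}{upper:g}"
```

For example, `format_ci(2.36, 3.62)` gives `"95% CI 2.36-3.62"`, while `format_ci(-0.1, 0.8)` gives `"95% CI -0.1 to 0.8"`.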

Confidence limit

  • Report the upper and lower boundaries of the confidence limit with a comma separating the 2 values.

Example:  

The mean (95% confidence limits) was 30% (28%, 32%).

P  value

From our instructions for authors:

[Screenshot: P value reporting guidance from the Instructions for Authors]

(Again, this is the primary responsibility of the academic editor, but the copyeditor acts as a second line of defense if this has been overlooked by the editor/section editor.)

  • Note for copyeditors: point the author to the relevant section in the  Instructions for Authors if  P values are missing (ie, stating "no significant differences were found..." without reporting the  P  level), incorrectly reported, or replaced by statements of inequality (or asterisks in tables instead of exact values) such as  P <.05.
  • Ensure P values are present and correctly reported rather than a statement of inequality (eg, P <.05), unless  P <.001 (eg, if P =.00005, change to P <.001; see exception below).
  • Express specific P values as decimals (eg, P =.34).
  • If reported as P =0, change to  P <.001
  • If reported as P =1, change to  P >.99
  • Copyeditors should leave a note for authors when making such changes.
  • Use two-digit precision for most P values.
  • Use three digits for P <.01 and when rounding affects significance.
  • P =.0027 rounded off to  P =.003
  • P =.049 should not be rounded to P =.05 ( P is considered significant at the ≤.05 level).
  • Note: our typesetting scripts can convert the P to italics if the copyeditor forgets this.
  • Do not use leading zero before the decimal point.
  • Remove spaces around mathematical operators in P values (eg, P < .001 → P <.001).
  • In tables, use "P value" as the column heading (not just "P").
  • Use superscripted letters (a-z) for table footnote symbols.
  • Use of asterisks as footnotes to indicate significance levels is discouraged (eg, *P<.05, **P<.01, ***P<.001). Authors are asked to provide exact P values instead. Exceptions in which exact P values need not be enforced include the following:
  • In tables of systematic reviews, which tend to be busy and where the original P values cannot be found in the original publications
  • When odds ratios instead of P values are presented
  • If for any reason the authors are unable to provide the exact P values
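The mechanical parts of these P value rules can be sketched in Python (the helper name is ours; the exceptions above, and the large-sample-size cases below, still require human judgment):

```python
def format_p(p: float) -> str:
    """Format an exact P value per the rules above: no leading zero,
    2 digits normally, 3 digits below .01 or when rounding would cross the
    .05 boundary, and capped reporting at the extremes."""
    if round(p, 2) >= 1:
        return "P>.99"
    if p < 0.001:
        return "P<.001"  # also covers a reported P=0
    digits = 3 if p < 0.01 else 2
    # Keep a third digit if 2-digit rounding lands on .05 but the true
    # value is not .05 (eg, P=.049 must not become P=.05)
    if digits == 2 and round(p, 2) == 0.05 and round(p, 3) != 0.05:
        digits = 3
    return "P=" + f"{p:.{digits}f}".lstrip("0")
```

For example, `format_p(0.0027)` gives `"P=.003"`, `format_p(0.049)` gives `"P=.049"`, and `format_p(0.00005)` gives `"P<.001"`.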

P  values for very large sample sizes (according to AMA guidelines)

  • Express P values to the level required by the study's significance threshold.
  • The AMA (section 19.5) notes that "[expressing] P to more than 3 significant digits does not add useful information to P<.001"; however, in specific cases, such as genome-wide association studies (GWAS), studies using adjustments like the Bonferroni correction, or studies with stringent significance thresholds, more digits may be needed.
  • For example, if the threshold of significance is P <.0004, then by definition the P value must be expressed to at least 4 digits to indicate whether a result is statistically significant. GWAS express P values to very small numbers, using scientific notation. If a manuscript you are editing defines statistical significance as a P value substantially less than .05, possibly even using scientific notation to express P values to very small numbers, it is best to retain the values as the author presents them.
  • For very large sample sizes, it may be necessary to report P  values to a value smaller than  P <.001 in order to show statistical significance, at the editor's discretion.

“Trending” towards significance

  • Avoid phrases like "trending towards significance" (eg, "There was a trend (P=.06) showing that…was significant").
  • Instead, state if there was a trend, followed by acknowledging that the results were not statistically significant. Alternatively, clearly state the results' significance without using variations of "trending."

Guidance on formatting common statistical measures

Each entry below gives the statistic, any formatting guidelines, whether a zero precedes the decimal point, and an example.

  • F test: italicize "F" and subscript the df in text. Zero before decimal: yes. Text: F4,76=12.2; table header: F test (df) → 12.2 (4, 76)
  • t test: italicize "t" and subscript the df in text. Zero before decimal: yes. Text: t15=2.68; table header: t test (df) → 2.68 (15)
  • Effect size: zero before decimal: yes. Example: ...an effect size of 0.277 SD units.
  • α (significance level): zero before decimal: no. Example: Our predetermined α level was .05.
  • Cronbach α: zero before decimal: yes. Example: Cronbach α=0.78
  • Cohen d: italicize "d". Zero before decimal: yes. Examples: Cohen d=0.29; Cohen d=1.45
  • β (probability of a type II error): zero before decimal: no. Example: β=.2
  • Standardized β coefficient: zero before decimal: yes. Example: standardized β coefficient=2.34
  • Spearman ρ: zero before decimal: yes. Example: ρ=0.67
  • r: italicize "r". Zero before decimal: yes. Example: r=0.92
  • Cohen κ: zero before decimal: yes. Example: κ=0.51
  • χ² test: use "χ²" in text and "chi-square" in table headers; subscript the df in text. Zero before decimal: yes. Text: χ²4=0.3; table header: Chi-square (df) → 0.3 (4)

Other statistics

Formatting rules for additional statistics are specified below:

  • F 1 -score: italicize “ F ,” place “1” as a subscript, use a hyphen before “score,” and start “score” with a lowercase “s”
  • I 2 for heterogeneity: use an uppercase italicized “ I ”
  • Type I error and type II error: do not use the numerals 1 and 2; AMA section 19.5
  • R 2 : use the uppercase italicized “ R ”; do not italicize the superscript
  • z score: lowercase italicized " z " without a hyphen

Additional guidelines

  • Do not use possessives for the name of any statistical test (see  Use of possessives with eponyms and AMA section 15.2 ).

Example: Hedges g (instead of Hedges’ g )

Equal and inequality signs

  • Mac shortcuts: option + > = ≥ and option + < = ≤
  • Windows shortcuts: ALT + 8805 = ≥ and ALT + 8804 = ≤
  • Do not use an underlined greater than/less than symbol.

Example: y≤0

  • For very small or large numbers (eg, P values), do not precede the exponent with “e” or “^”. Superscript the exponent.
  • Use an en dash to indicate negative exponents.
  • For more information, refer to the AMA (section 20.3 ).

Greek letters in text

  • Use of Greek letters rather than spelled-out words is preferred per AMA guidelines, unless common usage dictates otherwise.
  • In titles, subtitles, headings, and at the beginning of sentences, the first non-Greek letter after a lowercase Greek letter should be capitalized.

Cronbach α

Beta-thalassemia

tau protein

β-Blockers help control heart rate... 

Currency

  • Specify all currencies in US$. For amounts reported in non-US currency, the current exchange rate should be used to calculate the amount in US dollars, and that amount should be shown in parentheses.
  • If there are more than 10 instances of another currency presented, a blanket statement of the exchange rate from the original currency to US$ should be included: "A currency exchange rate of CAD $1=US $0.72 is applicable." This rate should be the current conversion rate and is available online here:  https://www1.oanda.com/currency/converter/ (AMA-recommended resource). You can directly add this into the manuscript (and ask the author to verify) or ask the author to add it in for you.
  • For all currency measures, refer to the 11th edition of the AMA here:  https://www.amamanualofstyle.com/view/10.1093/jama/9780190246556.001.0001/med-9780190246556-chapter-17-div2-34?rskey=z4zEI1&result=1 .
  • Do not use zeros after whole numbers of currency.

CAD $125.35

Aus $100 ( Note: AMA uses "A$" for Australian dollars, but since we use a space before the $, this would be confusing as "A $100"; therefore, we'll abbreviate to "Aus $")

Each participant was rewarded with Amazon gift cards worth CAD $5 (US $7.18) for their participation.
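The US-dollar conversion format can be sketched as follows (the helper name is ours, and the rate passed in is only an example; the actual rate should come from the converter linked above):

```python
def with_us_dollars(amount: float, currency: str, rate_to_usd: float) -> str:
    """Show a non-US amount with its US dollar equivalent in parentheses,
    eg, "CAD $5 (US $3.60)". Whole original amounts take no trailing zeros."""
    usd = amount * rate_to_usd
    amount_str = f"{amount:g}"  # drops ".0" on whole numbers
    return f"{currency} ${amount_str} (US ${usd:.2f})"
```

For example, `with_us_dollars(5, "CAD", 0.72)` yields `"CAD $5 (US $3.60)"`.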

Equations

In the text, whenever possible, characters in equations should be inserted using the Advanced Symbols feature of MS Word. Equations can be inserted within a paragraph (in line with the text) or on a separate line. Do not use the equation editor/tools in Word; the special formatting will not be picked up by our scripts.

Simple equations that are kept in text should be indented and numbered (number in bold) in parentheses after the equation itself if they meet any of the following criteria: (1) there are numerous equations (3 or more) in the manuscript, (2) the equations are related to each other, or (3) the equations are referred to after initial presentation (eg, Per equation 3, we recalculated…).

yi = Ci - ci (1)

Use spaces between all mathematical operators in complex equations (including “=”). Note in this case, “=” is functioning as an operator; it is not a statement of equality. In the case of “n=2,” no spaces are used as it is a statement of equality.  

Guidelines for formatting complex equations can be found here .

Forest plots

Meta-analyses and systematic reviews often contain forest plots to summarize their results—these display individual study results (tabularly) and, usually, the weighted mean of studies (graphically) included in a meta-analysis focused on a specific outcome. For details, see chapter  4.2.1.11 in the AMA Manual of Style (11th edition). Shown below are 2 ways in which a forest plot can be presented. 

(1) As a table:

Table 1. Forest plot of 2 studies comparing the effectiveness of serious games with that of conventional exercises on verbal learning.


(2) As a figure:

Figure 1. Forest plot of 6 studies comparing the effectiveness of serious games with that of no or sham interventions on verbal learning [29,30,31-34].


Related articles

  • JMIR House Style and Editorial Guidelines
  • Which abbreviations don't need to be expanded?
  • How should tables be formatted?
  • [Kriyadocs] What are the authors' responsibilities during copyediting (Author Revisions stage)?

Statistics Handbook

Reporting Statistics APA Style

APA style can be finicky. Trying to remember the very particular rules for spacing, italics and other formatting rules can be overwhelming if you’re also writing a fairly technical paper. My best advice is to write your paper and then edit it for grammar. Don’t worry about reporting statistics APA style until your paper is almost ready to submit for publication. Then go through your paper and make a second edit for statistical notation based on this list.

1. General tips for Reporting Statistics APA Style

  • Use spaces on both sides of operators such as "=" and "<".
  • Correct: r(55) = .49, p < .001
  • Incorrect: r(55)=.49,p<.001
  • Don’t state formulas for common statistics (e.g. variance , z-score ). Similarly, don’t use references for statistics unless they are uncommon or the focus of your study.
  • Place a zero before the decimal point if the statistic can be greater than one (e.g. 0.26 lb).
  • If number cannot be greater than one, leave out the decimal point (e.g. p = .015).
  • Do not bold or italicize abbreviations (unless it is a variable ), Greek letters , or a subscript that is an identifier (e.g. z i – 1 ).
  • Vectors and matrices are bolded (not italicized): V, Σ
  • Use an uppercase N for number in the total sample ( N = 45) and a lowercase n for a fraction of the sample (n = 20).
  • Place percentages in parentheses. For example: “Almost a quarter of the sample (25.5%) was already infected with the virus.”
  • If you use a table to report results, don’t duplicate the information in the text.
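Several of these APA spacing and leading-zero rules can be sketched for a correlation report (the helper name is ours, not an APA or library API):

```python
def apa_correlation(df: int, r: float, p: float) -> str:
    """Build an APA-style correlation report, eg "r(55) = .49, p < .001":
    spaces around operators, no leading zeros on r or p (both bounded by 1)."""
    r_str = f"{abs(r):.2f}".lstrip("0")
    if r < 0:
        r_str = "-" + r_str
    p_str = "p < .001" if p < 0.001 else "p = " + f"{p:.3f}".lstrip("0")
    return f"r({df}) = {r_str}, {p_str}"
```

For example, `apa_correlation(55, 0.49, 0.0005)` yields `"r(55) = .49, p < .001"`.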

2. Reporting Specific Statistics in APA Style


Confidence intervals : For CIs, use brackets: 95% CI [2.47, 2.99], [-5.1, 1.56], and [-3.43, 2.89]. If you are reporting a list of statistics within parentheses, you do not need to use brackets within the parentheses. For example: ( SD = 1.5, CI = -5, 5)

Use parentheses to enclose degrees of freedom . For example, t(10) = 2.16.

Probability values: report the p-value exactly, unless it is less than .001; below that amount, the convention is to report it as p < .001. Note: "Probability values" is spelled out here deliberately: the lowercase symbol p cannot be capitalized at the beginning of a sentence or in a table header, because uppercase and lowercase have distinct meanings.

Mean , Standard Deviation (and similar single statistics): use parentheses: ( M = 22, SD = 3.4).

3. Hypothesis Tests in APA Style

Nouns (p value, z test, t test) are not hyphenated, but as an adjective they are: t-test results, z-test score.

At the beginning of the results section, restate your hypothesis and then state whether your results supported it. This should be followed by the data and statistics that support or reject the null hypothesis.

One-Way/Two-Way ANOVA : State the between-groups degrees of freedom , then state the within-groups degrees of freedom, followed by the F statistic and significance level . For example: “The main effect was significant, F (1, 149) = 2.12, p = .02.”

Chi-Square test of Independence: Report degrees of freedom and sample size in parentheses, then the chi-square value, followed by the significance level. For example: “Animal response to the stimuli did not differ by species, χ2(1, N = 75) = 0.89, p = .25.”

t tests : Report the t value and significance level as follows: t (54) = 5.43, p < .001. What you put in the wording will differ slightly depending on if you have a one sample t-test, or a t-test for groups. Examples:

  • One sample: “Younger teens woke up earlier ( M = 7:30, SD = 0.45) than teens in general, t (33) = 2.10, p = .31”
  • Dependent/Independent samples: “Younger teens indicated a significant preference for video games ( M = 7.45, SD = 2.51) than books ( M = 4.22, SD = 2.23), t (15) = 4.00, p < .001.”

Report correlations with degrees of freedom (N-2), followed by the significance level. For example: “The two sets of exam results are strongly correlated, r(55) = .49, p < .001.”

Thank you to Mark Suggs for contributions to this article.



Reporting Statistics In APA – A Guide With Rules & Examples


In academic writing, accurately reporting statistics is crucial. Following the APA guidelines for reporting statistics ensures clarity and consistency. This guide provides researchers with a definitive roadmap for presenting quantitative results in APA style , from basic descriptive statistics to complex inferential analyses. Mastering the art of reporting statistics in APA enhances the credibility and impact of research, fostering academic rigor and evidence-based conclusions.

Table of Contents

  • 1 In a Nutshell: Reporting statistics in APA
  • 2 Definition: Reporting statistics in APA
  • 3 Reporting statistics in APA: Statistical results
  • 4 Reporting statistics in APA: Formatting guidelines
  • 5 Reporting statistic tests in APA

In a Nutshell: Reporting statistics in APA

  • Statistical analysis is the process of collecting and testing quantitative data to make extrapolations about certain elements or the world in general.
  • The APA Publication Manual provides guidelines and standard suggestions for reporting statistics in APA.
  • The formula for representing statistics in APA differs depending on the type of statistics.

Definition: Reporting statistics in APA

The APA Publication Manual provides guidelines and standard suggestions for formatting and reporting statistics in APA. Here are the general rules for reporting statistics in APA:

  • Use words for numbers under ten (1-9) and numerals for ten and over
  • Use a space after commas, variables, and mathematical symbols
  • Round decimals to two places, except for p-values
  • Italicize symbols and abbreviations, except Greek letters

Reporting statistics in APA: Statistical results

Below are some basic guidelines for reporting statistics in APA:

  • Before presenting the data, repeat the hypotheses and explain if your statistical results support them.
  • Present the results in a condensed format without interpretation
  • Do not go into the tests you used
  • Every report should relate to the hypothesis

Reporting statistics in APA: Formatting guidelines

This section provides guidelines for presenting test results when reporting statistics in APA.

Stating numbers

The general APA guidelines recommend using words for numbers below ten and numerals for ten and above. Use numerals for exact numbers before measurement units, equations, percentages, points on a scale, money, ratios, uncommon fractions, and decimals.

Measuring units

Regarding units of measurement, the rule is to use numerals to report exact measurements.

The stone weighed 9 kg.

In reporting statistics in APA, include a space between the abbreviation and the digit to represent units. On the other hand, when stating approximate digits, use words to express numbers below ten, then spell out the unit names.

The stone weighed approximately nine kilograms.

It is worth mentioning that all quantities should be reported in metric units. However, if you recorded them in non-metric units, include the metric equivalents in your report alongside the original units.

Percentages

When using percentages while reporting statistics in APA, use numerals followed by the % symbol.

Of the participants, 19% disagreed with the statement.

Decimal places

The content or information you wish to report will influence the number of decimal places you use. However, the general rule for reporting statistics in APA is to round off the numbers while retaining precision. Here are some stats you can round off to one place when reporting statistics in APA:

  • Standard deviation
  • Descriptive statistics based on discrete data

You can round off to two decimal places when reporting:

  • Correlation coefficient
  • Proportions and ratios
  • Inferential statistics , like t- and f-values
  • Exact p-values (more than .001)

Use a leading zero

The zero before a decimal point when a number is less than one is called the leading zero. According to guidelines for reporting statistics in APA, you can only use a leading zero in the following cases:

  • When the statistics you want to describe can be greater than one

In contrast, you do not need the leading zero when:

  • The variables can never be greater than one
  • Pearson correlation coefficient
  • Coefficient of determination
  • Cronbach’s alpha

Mathematical formulas

While reporting statistics in APA, you must provide formulas for new and uncommon equations. If the formulas are short, present them in one line within the main text. For complex equations, you can take more than one line to present them.

Using parentheses and brackets

While reporting statistics in APA, use round brackets for primary operations (first steps), square brackets for secondary ones (second steps), and then curly brackets for tertiary ones (third steps). You should aim to avoid nested brackets where possible.

Reporting statistic tests in APA

When reporting statistics in APA, use descriptive statistics as summaries for your data. Below are guidelines for reporting statistics in APA regarding statistical tests.

Descriptive statistics: means and standard deviations

Means and standard deviations should appear in the main text, in brackets, or both. For statistics relating to the same data, you do not need to repeat the measurement units.

The average productivity rate was 124.7 minutes (SD = 12.1).
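This APA descriptive format can be sketched as follows (contrast with JMIR style earlier in this document, which drops the equal signs; the helper name is ours):

```python
def apa_mean_sd(mean: float, sd: float, unit: str = "") -> str:
    """APA style keeps equal signs inside the parentheses:
    "124.7 minutes (SD = 12.1)"."""
    unit_part = f" {unit}" if unit else ""
    return f"{mean:g}{unit_part} (SD = {sd:g})"
```

For example, `apa_mean_sd(124.7, 12.1, "minutes")` yields `"124.7 minutes (SD = 12.1)"`.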

Chi-square tests

Include the following when reporting Chi-square tests :

  • Degrees of freedom in the brackets (df)
  • Chi-square values (χ²)

The chi-square test revealed a link between weather and productivity, χ²(9) = 21.7, p = .013.

Hypothesis tests

When reporting statistics in APA, z-test reports look as follows:

Participants’ scores were higher than the population, z = 3.35, p = .012.

When reporting statistics in APA, reports for t-tests should have:

  • Degrees of freedom in brackets

Females experienced more severe symptoms than males, t(33) = 3.87, p = .007.

Reporting ANOVA

When reporting statistics in APA, the ANOVA reports should include:

  • Degrees of freedom in the brackets

We found a statistically significant effect of leadership style on productivity, F(3, 78) = 5.68, p = .018.

Correlations

Correlation reports when reporting statistics in APA should include:

We found a strong link between the temperature and productivity levels, r(401) = .43, p < .001.

Regressions

Display regression results in a table. However, if you present them in the text, the report should include:

GAT Severity scores predicted anxiety levels, R 2 = .31, F(1, 510) = 6.71, p = .009.

Confidence intervals

When reporting statistics in APA, you have to report the confidence levels. Use brackets to enclose the lower and upper limits of the confidence interval, separated by a comma.

  • Male participants experienced more positive results than female ones, t(19) = 3.94, p = .007, d = 0.88, 90% CI [0.6, 1.13].
  • On average, the tests resulted in a 43% increase in positive feelings, 97% CI [21.34, 44.5].
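A confidence interval for a mean can be approximated from the standard error; this stdlib sketch uses the normal critical value 1.96 for 95% (a t critical value would be more accurate for small samples, and the data are hypothetical):

```python
from statistics import mean, stdev

def ci_mean(values, z=1.96):
    """Approximate confidence interval for the mean using a normal
    critical value (1.96 corresponds to 95%)."""
    m = mean(values)
    se = stdev(values) / len(values) ** 0.5
    return m - z * se, m + z * se

lower, upper = ci_mean([40, 43, 45, 41, 46])
print(f"95% CI [{lower:.2f}, {upper:.2f}]")
```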

Which specific statistical results need to be reported in APA format?

Statistical results such as proportions, measurements, ranges, and percentages that describe samples can be reported in APA format.

What is the formula for reporting statistics in APA?

The format differs depending on the type of statistic.

What results do you need for reporting statistics in APA?

You need the p value, the test statistic (e.g., the t value), the degrees of freedom, and the direction of the effect.

How many decimal places can you use when reporting statistics in APA?

Depending on the statistical data, you can use one or two decimal places.



Purdue Online Writing Lab Purdue OWL® College of Liberal Arts

Statistics in APA


This page is brought to you by the OWL at Purdue University. When printing this page, you must include the entire legal notice.

Copyright ©1995-2018 by The Writing Lab & The OWL at Purdue and Purdue University. All rights reserved. This material may not be published, reproduced, broadcast, rewritten, or redistributed without permission. Use of this site constitutes acceptance of our terms and conditions of fair use.

Note:  This page reflects APA 6, which is now out of date. It will remain online until 2021, but will not be updated. There is currently no equivalent 7th edition page, but we're working on one. Thank you for your patience. Here is a link to our APA 7 "General Format" page .

When including statistics in written text, be sure to include enough information for the reader to understand the study. Although the amount of explanation and data included depends upon the study, APA style has guidelines for the representation of statistical information:

  • Do not give references for statistics unless the statistic is uncommon, used unconventionally, or is the focus of the article
  • Do not give formulas for common statistics (e.g., mean, t test)
  • Do not repeat descriptive statistics in the text if they’re represented in a table or figure
  • Use terms like respectively and in order when enumerating a series of statistics; this illustrates the relationship between the numbers in the series.

Punctuating statistics

Use parentheses to enclose statistical values:

Use parentheses to enclose degrees of freedom:

Use brackets to enclose limits of confidence intervals:

Use standard typeface (no bolding or italicization) when writing Greek letters, subscripts that function as identifiers, and abbreviations that are not variables.

Use boldface for vectors and matrices:

Use italics for statistical symbols (other than vectors and matrices):

Use an italicized, uppercase N to refer to a total population.

Use an italicized, lowercase n to refer to a sample of the population.


Statistics in Psychological Research

  • Data Collection and Analysis

Psychological Research

August 2023


Unlock the power of data with this 10-hour, comprehensive course in data analysis. This course is perfect for anyone looking to deepen their knowledge and apply statistical methods effectively in psychology or related fields.

The course begins with consideration of how researchers define and categorize variables, including the nature of various scales of measurement and how these classifications impact data analysis and interpretation. This is followed by a thorough introduction to the measures of central tendency, variability, and correlation that researchers use to describe their findings, providing an understanding of such topics as which descriptive statistics are appropriate for given research designs, the meaning of a correlation coefficient, and how graphs are used to visualize data.

The course then moves on to a conceptual treatment of foundational inferential statistics that researchers use to make predictions or inferences about a population based on a sample. The focus is on understanding the logic of these statistics, rather than on making calculations. Specifically, the course explores the logic behind null hypothesis significance testing, long a cornerstone of statistical analysis. Learn how to formulate and test hypotheses and understand the significance of p-values in determining the validity of your results. The course reviews how to select the appropriate inferential test based on your study criteria. Whether it’s t-tests, ANOVA, chi-square tests, or regression analysis, you’ll know which test to apply and when.

In keeping with growing concerns about some of the limitations of null hypothesis significance testing, such as its role in the so-called replication crisis, the course also delves into these concerns and possible ways to address them, including introductory consideration of statistical power and alternatives to hypothesis testing like estimation techniques and confidence intervals, meta-analysis, modeling, and Bayesian inference.

Learning objectives

  • Explain various ways to categorize variables.
  • Describe the logic of inferential statistics.
  • Explain the logic of null hypothesis significance testing.
  • Select the appropriate inferential test based on study criteria.
  • Compare and contrast the use of statistical significance, effect size, and confidence intervals.
  • Explain the importance of statistical power.
  • Describe how alternative procedures address the major objections to null hypothesis significance testing.
  • Explain various ways to describe data.
  • Describe how graphs are used to visualize data.
  • Explain the meaning of a correlation coefficient.

This program does not offer CE credit.

More in this series

Introduces the scientific research process and concepts such as the nature of variables for undergraduates, high school students, and professionals.

August 2023 On Demand Training

Introduces the importance of ethical practice in scientific research for undergraduates, high school students, and professionals.

Springer Nature - PMC COVID-19 Collection

Descriptive Statistics for Summarising Data

Ray w. cooksey.

UNE Business School, University of New England, Armidale, NSW Australia

This chapter discusses and illustrates descriptive statistics . The purpose of the procedures and fundamental concepts reviewed in this chapter is quite straightforward: to facilitate the description and summarisation of data. By ‘describe’ we generally mean either the use of some pictorial or graphical representation of the data (e.g. a histogram, box plot, radar plot, stem-and-leaf display, icon plot or line graph) or the computation of an index or number designed to summarise a specific characteristic of a variable or measurement (e.g., frequency counts, measures of central tendency, variability, standard scores). Along the way, we explore the fundamental concepts of probability and the normal distribution. We seldom interpret individual data points or observations primarily because it is too difficult for the human brain to extract or identify the essential nature, patterns, or trends evident in the data, particularly if the sample is large. Rather we utilise procedures and measures which provide a general depiction of how the data are behaving. These statistical procedures are designed to identify or display specific patterns or trends in the data. What remains after their application is simply for us to interpret and tell the story.

The first broad category of statistics we discuss concerns descriptive statistics . The purpose of the procedures and fundamental concepts in this category is quite straightforward: to facilitate the description and summarisation of data. By ‘describe’ we generally mean either the use of some pictorial or graphical representation of the data or the computation of an index or number designed to summarise a specific characteristic of a variable or measurement.


Reflect on the QCI research scenario and the associated data set discussed in Chap. 10.1007/978-981-15-2537-7_4. Consider the following questions that Maree might wish to address with respect to decision accuracy and speed scores:

  • What was the typical level of accuracy and decision speed for inspectors in the sample? [see Procedure 5.4 – Assessing central tendency.]
  • What was the most common accuracy and speed score amongst the inspectors? [see Procedure 5.4 – Assessing central tendency.]
  • What was the range of accuracy and speed scores; the lowest and the highest scores? [see Procedure 5.5 – Assessing variability.]
  • How frequently were different levels of inspection accuracy and speed observed? What was the shape of the distribution of inspection accuracy and speed scores? [see Procedure 5.1 – Frequency tabulation, distributions & crosstabulation.]
  • What percentage of inspectors would have ‘failed’ to ‘make the cut’ assuming the industry standard for acceptable inspection accuracy and speed combined was set at 95%? [see Procedure 5.7 – Standard ( z ) scores.]
  • How variable were the inspectors in their accuracy and speed scores? Were all the accuracy and speed levels relatively close to each other in magnitude or were the scores widely spread out over the range of possible test outcomes? [see Procedure 5.5 – Assessing variability.]
  • What patterns might be visually detected when looking at various QCI variables singly and together as a set? [see Procedure 5.2 – Graphical methods for displaying data, Procedure 5.3 – Multivariate graphs & displays, and Procedure 5.6 – Exploratory data analysis.]

This chapter includes discussions and illustrations of a number of procedures available for answering questions about data like those posed above. In addition, you will find discussions of two fundamental concepts, namely probability and the normal distribution ; concepts that provide building blocks for Chaps. 10.1007/978-981-15-2537-7_6 and 10.1007/978-981-15-2537-7_7.

Procedure 5.1: Frequency Tabulation, Distributions & Crosstabulation

Frequency tabulation and distributions.

Frequency tabulation serves to provide a convenient counting summary for a set of data that facilitates interpretation of various aspects of those data. Basically, frequency tabulation occurs in two stages:

  • First, the scores in a set of data are rank ordered from the lowest value to the highest value.
  • Second, the number of times each specific score occurs in the sample is counted. This count records the frequency of occurrence for that specific data value.
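The two stages above can be sketched with Python's `collections.Counter` (the ratings shown are hypothetical, not the QCI data):

```python
from collections import Counter

def frequency_table(scores):
    """Rank-order the distinct scores and count occurrences of each,
    mirroring the two stages of frequency tabulation."""
    freq = Counter(scores)
    return [(value, freq[value]) for value in sorted(freq)]

# Hypothetical 7-point job-satisfaction ratings
ratings = [3, 5, 5, 4, 7, 5, 3, 4, 6]
for value, count in frequency_table(ratings):
    print(value, count)
```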

Consider the overall job satisfaction variable, jobsat , from the QCI data scenario. Performing frequency tabulation across the 112 Quality Control Inspectors on this variable using the SPSS Frequencies procedure (Allen et al. 2019 , ch. 3; George and Mallery 2019 , ch. 6) produces the frequency tabulation shown in Table 5.1 . Note that three of the inspectors in the sample did not provide a rating for jobsat thereby producing three missing values (= 2.7% of the sample of 112) and leaving 109 inspectors with valid data for the analysis.

Frequency tabulation of overall job satisfaction scores


The display of frequency tabulation is often referred to as the frequency distribution for the sample of scores. For each value of a variable, the frequency of its occurrence in the sample of data is reported. It is possible to compute various percentages and percentile values from a frequency distribution.

Table 5.1 shows the ‘Percent’ or relative frequency of each score (the percentage of the 112 inspectors obtaining each score, including those inspectors who were missing scores, which SPSS labels as ‘System’ missing). Table 5.1 also shows the ‘Valid Percent’ which is computed only for those inspectors in the sample who gave a valid or non-missing response.

Finally, it is possible to add up the ‘Valid Percent’ values, starting at the low score end of the distribution, to form the cumulative distribution or ‘Cumulative Percent’ . A cumulative distribution is useful for finding percentiles which reflect what percentage of the sample scored at a specific value or below.
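Building the 'Valid Percent' and 'Cumulative Percent' columns from raw scores can be sketched as follows (stdlib only; the rounding to one decimal place mimics SPSS-style output):

```python
from collections import Counter

def cumulative_percent(scores):
    """Per-score frequency, valid percent, and cumulative percent,
    accumulated from the low end of the distribution upward."""
    freq = Counter(scores)
    n = len(scores)
    rows, running = [], 0.0
    for value in sorted(freq):
        pct = 100 * freq[value] / n
        running += pct
        rows.append((value, freq[value], round(pct, 1), round(running, 1)))
    return rows
```

Passing only valid (non-missing) scores reproduces the 'Valid Percent' behaviour described above.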

We can see in Table 5.1 that 4 of the 109 valid inspectors (a ‘Valid Percent’ of 3.7%) indicated the lowest possible level of job satisfaction—a value of 1 (Very Low) – whereas 18 of the 109 valid inspectors (a ‘Valid Percent’ of 16.5%) indicated the highest possible level of job satisfaction—a value of 7 (Very High). The ‘Cumulative Percent’ number of 18.3 in the row for the job satisfaction score of 3 can be interpreted as “roughly 18% of the sample of inspectors reported a job satisfaction score of 3 or less”; that is, nearly a fifth of the sample expressed some degree of negative satisfaction with their job as a quality control inspector in their particular company.

If you have a large data set having many different scores for a particular variable, it may be more useful to tabulate frequencies on the basis of intervals of scores.

For the accuracy scores in the QCI database, you could count scores occurring in intervals such as ‘less than 75% accuracy’, ‘between 75% but less than 85% accuracy’, ‘between 85% but less than 95% accuracy’, and ‘95% accuracy or greater’, rather than counting the individual scores themselves. This would yield what is termed a ‘grouped’ frequency distribution since the data have been grouped into intervals or score classes. Producing such an analysis using SPSS would involve extra steps to create the new category or ‘grouping’ system for scores prior to conducting the frequency tabulation.
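The grouped tabulation described for the accuracy scores can be sketched with `bisect` (the cut points match the bands named in the text; the accuracy values are hypothetical):

```python
import bisect

def grouped_frequency(values, edges):
    """Count values into intervals: below edges[0], then each half-open
    interval [edges[i], edges[i+1]), then edges[-1] and above."""
    counts = [0] * (len(edges) + 1)
    for v in values:
        counts[bisect.bisect_right(edges, v)] += 1
    return counts

# Bands from the text: <75, 75 to <85, 85 to <95, >=95 (% accuracy)
print(grouped_frequency([70, 80, 90, 96, 75], [75, 85, 95]))  # [1, 2, 1, 1]
```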

Crosstabulation

In a frequency crosstabulation , we count frequencies on the basis of two variables simultaneously rather than one; thus we have a bivariate situation.
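Bivariate counting of this kind, including the marginal totals discussed below, can be sketched with `Counter` (the gender and jobsat values here are hypothetical, not the QCI data):

```python
from collections import Counter

def crosstab(row_var, col_var):
    """Bivariate cell counts plus marginal (row and column) totals."""
    cells = Counter(zip(row_var, col_var))
    return cells, Counter(row_var), Counter(col_var)

# Hypothetical gender and jobsat values for a few inspectors
gender = ["male", "female", "male", "female", "male"]
jobsat = [7, 7, 5, 6, 7]
cells, row_totals, col_totals = crosstab(gender, jobsat)
print(cells[("male", 7)], row_totals["male"], col_totals[7])  # 2 3 3
```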

For example, Maree might be interested in the number of male and female inspectors in the sample of 112 who obtained each jobsat score. Here there are two variables to consider: inspector’s gender and inspector’s jobsat score. Table 5.2 shows such a crosstabulation as compiled by the SPSS Crosstabs procedure (George and Mallery 2019 , ch. 8). Note that inspectors who did not report a score for jobsat and/or gender have been omitted as missing values, leaving 106 valid inspectors for the analysis.

Frequency crosstabulation of jobsat scores by gender category for the QCI data


The crosstabulation shown in Table 5.2 gives a composite picture of the distribution of satisfaction levels for male inspectors and for female inspectors. If frequencies or ‘Counts’ are added across the gender categories, we obtain the numbers in the ‘Total’ column (the percentages or relative frequencies are also shown immediately below each count) for each discrete value of jobsat (note this column of statistics differs from that in Table 5.1 because the gender variable was missing for certain inspectors). By adding down each gender column, we obtain, in the bottom row labelled ‘Total’, the number of males and the number of females that comprised the sample of 106 valid inspectors.

The totals, either across the rows or down the columns of the crosstabulation, are termed the marginal distributions of the table. These marginal distributions are equivalent to frequency tabulations for each of the variables jobsat and gender . As with frequency tabulation, various percentage measures can be computed in a crosstabulation, including the percentage of the sample associated with a specific count within either a row (‘% within jobsat ’) or a column (‘% within gender ’). You can see in Table 5.2 that 18 inspectors indicated a job satisfaction level of 7 (Very High); of these 18 inspectors reported in the ‘Total’ column, 8 (44.4%) were male and 10 (55.6%) were female. The marginal distribution for gender in the ‘Total’ row shows that 57 inspectors (53.8% of the 106 valid inspectors) were male and 49 inspectors (46.2%) were female. Of the 57 male inspectors in the sample, 8 (14.0%) indicated a job satisfaction level of 7 (Very High). Furthermore, we could generate some additional interpretive information of value by adding the ‘% within gender’ values for job satisfaction levels of 5, 6 and 7 (i.e. differing degrees of positive job satisfaction). Here we would find that 68.4% (= 24.6% + 29.8% + 14.0%) of male inspectors indicated some degree of positive job satisfaction compared to 61.2% (= 10.2% + 30.6% + 20.4%) of female inspectors.

This helps to build a picture of the possible relationship between an inspector’s gender and their level of job satisfaction (a relationship that, as we will see later, can be quantified and tested using Procedure 10.1007/978-981-15-2537-7_6#Sec14 and Procedure 10.1007/978-981-15-2537-7_7#Sec17).

It should be noted that a crosstabulation table such as that shown in Table 5.2 is often referred to as a contingency table about which more will be said later (see Procedure 10.1007/978-981-15-2537-7_7#Sec17 and Procedure 10.1007/978-981-15-2537-7_7#Sec115).

Frequency tabulation is useful for providing convenient data summaries which can aid in interpreting trends in a sample, particularly where the number of discrete values for a variable is relatively small. A cumulative percent distribution provides additional interpretive information about the relative positioning of specific scores within the overall distribution for the sample.

Crosstabulation permits the simultaneous examination of the distributions of values for two variables obtained from the same sample of observations. This examination can yield some useful information about the possible relationship between the two variables. More complex crosstabulations can be also done where the values of three or more variables are tracked in a single systematic summary. The use of frequency tabulation or cross-tabulation in conjunction with various other statistical measures, such as measures of central tendency (see Procedure 5.4 ) and measures of variability (see Procedure 5.5 ), can provide a relatively complete descriptive summary of any data set.

Disadvantages

Frequency tabulations can get messy if interval or ratio-level measures are tabulated simply because of the large number of possible data values. Grouped frequency distributions really should be used in such cases. However, certain choices, such as the size of the score interval (group size), must be made, often arbitrarily, and such choices can affect the nature of the final frequency distribution.

Additionally, percentage measures have certain problems associated with them, most notably, the potential for their misinterpretation in small samples. One should be sure to know the sample size on which percentage measures are based in order to obtain an interpretive reference point for the actual percentage values.

For example

In a sample of 10 individuals, 20% represents only two individuals whereas in a sample of 300 individuals, 20% represents 60 individuals. If all that is reported is the 20%, then the mental inference drawn by readers is likely to be that a sizeable number of individuals had a score or scores of a particular value—but what is ‘sizeable’ depends upon the total number of observations on which the percentage is based.

Where Is This Procedure Useful?

Frequency tabulation and crosstabulation are very commonly applied procedures used to summarise information from questionnaires, both in terms of tabulating various demographic characteristics (e.g. gender, age, education level, occupation) and in terms of actual responses to questions (e.g. numbers responding ‘yes’ or ‘no’ to a particular question). They can be particularly useful in helping to build up the data screening and demographic stories discussed in Chap. 10.1007/978-981-15-2537-7_4. Categorical data from observational studies can also be analysed with this technique (e.g. the number of times Suzy talks to Frank, to Billy, and to John in a study of children’s social interactions).

Certain types of experimental research designs may also be amenable to analysis by crosstabulation with a view to drawing inferences about distribution differences across the sets of categories for the two variables being tracked.

You could employ crosstabulation in conjunction with the tests described in Procedure 10.1007/978-981-15-2537-7_7#Sec17 to see if two different styles of advertising campaign differentially affect the product purchasing patterns of male and female consumers.

In the QCI database, Maree could employ crosstabulation to help her answer the question “do different types of electronic manufacturing firms ( company ) differ in terms of their tendency to employ male versus female quality control inspectors ( gender )?”

Software Procedures

Application: Procedures
SPSS or . and select the variable(s) you wish to analyse; for the procedure, hitting the ‘ ’ button will allow you to choose various types of statistics and percentages to show in each cell of the table.
NCSS or and select the variable(s) you wish to analyse.
SYSTAT or ➔ and select the variable(s) you wish to analyse and choose the optional statistics you wish to see.
STATGRAPHICS or and select the variable(s) you wish to analyse; hit ‘ ’ and when the ‘Tables and Graphs’ window opens, choose the Tables and Graphs you wish to see.
Commander or and select the variable(s) you wish to analyse and choose the optional statistics you wish to see.

Procedure 5.2: Graphical Methods for Displaying Data

Graphical methods for displaying data include bar and pie charts, histograms and frequency polygons, line graphs and scatterplots. It is important to note that what is presented here is a small but representative sampling of the types of simple graphs one can produce to summarise and display trends in data. Generally speaking, SPSS offers the easiest facility for producing and editing graphs, but with a rather limited range of styles and types. SYSTAT, STATGRAPHICS and NCSS offer a much wider range of graphs (including graphs unique to each package), but with the drawback that it takes somewhat more effort to get the graphs in exactly the form you want.

Bar and Pie Charts

These two types of graphs are useful for summarising the frequency of occurrence of various values (or ranges of values) where the data are categorical (nominal or ordinal level of measurement).

  • A bar chart uses vertical and horizontal axes to summarise the data. The vertical axis is used to represent frequency (number) of occurrence or the relative frequency (percentage) of occurrence; the horizontal axis is used to indicate the data categories of interest.
  • A pie chart gives a simpler visual representation of category frequencies by cutting a circular plot into wedges or slices whose sizes are proportional to the relative frequency (percentage) of occurrence of specific data categories. Some pie charts can have one or more slices emphasised by ‘exploding’ them out from the rest of the pie.

Consider the company variable from the QCI database. This variable depicts the types of manufacturing firms that the quality control inspectors worked for. Figure 5.1 illustrates a bar chart summarising the percentage of female inspectors in the sample coming from each type of firm. Figure 5.2 shows a pie chart representation of the same data, with an ‘exploded slice’ highlighting the percentage of female inspectors in the sample who worked for large business computer manufacturers – the lowest percentage of the five types of companies. Both graphs were produced using SPSS.


Bar chart: Percentage of female inspectors


Pie chart: Percentage of female inspectors

The pie chart was modified with an option to show the actual percentage along with the label for each category. The bar chart shows that computer manufacturing firms have relatively fewer female inspectors compared to the automotive and electrical appliance (large and small) firms. This trend is less clear from the pie chart which suggests that pie charts may be less visually interpretable when the data categories occur with rather similar frequencies. However, the ‘exploded slice’ option can help interpretation in some circumstances.

Certain software programs, such as SPSS, STATGRAPHICS, NCSS and Microsoft Excel, offer the option of generating 3-dimensional bar charts and pie charts and incorporating other ‘bells and whistles’ that can potentially add visual richness to the graphic representation of the data. However, you should generally be careful with these fancier options as they can produce distortions and create ambiguities in interpretation (e.g. see discussions in Jacoby 1997 ; Smithson 2000 ; Wilkinson 2009 ). Such distortions and ambiguities could ultimately end up providing misinformation to researchers as well as to those who read their research.

Histograms and Frequency Polygons

These two types of graphs are useful for summarising the frequency of occurrence of various values (or ranges of values) where the data are essentially continuous (interval or ratio level of measurement) in nature. Both histograms and frequency polygons use vertical and horizontal axes to summarise the data. The vertical axis is used to represent the frequency (number) of occurrence or the relative frequency (percentage) of occurrences; the horizontal axis is used for the data values or ranges of values of interest. The histogram uses bars of varying heights to depict frequency; the frequency polygon uses lines and points.

There is a visual difference between a histogram and a bar chart: the bar chart uses bars that do not physically touch, signifying the discrete and categorical nature of the data, whereas the bars in a histogram physically touch to signal the potentially continuous nature of the data.

Suppose Maree wanted to graphically summarise the distribution of speed scores for the 112 inspectors in the QCI database. Figure 5.3 (produced using NCSS) illustrates a histogram representation of this variable. Figure 5.3 also illustrates another representational device called the ‘density plot’ (the solid tracing line overlaying the histogram) which gives a smoothed impression of the overall shape of the distribution of speed scores. Figure 5.4 (produced using STATGRAPHICS) illustrates the frequency polygon representation for the same data.


Histogram of the speed variable (with density plot overlaid)


Frequency polygon plot of the speed variable

These graphs employ a grouped format where speed scores which fall within specific intervals are counted as being essentially the same score. The shape of the data distribution is reflected in these plots. Each graph tells us that the inspection speed scores are positively skewed with only a few inspectors taking very long times to make their inspection judgments and the majority of inspectors taking rather shorter amounts of time to make their decisions.

Both representations tell a similar story; the choice between them is largely a matter of personal preference. However, if the number of bars to be plotted in a histogram is potentially very large (and this is usually directly controllable in most statistical software packages), then a frequency polygon would be the preferred representation simply because the amount of visual clutter in the graph will be much reduced.

It is somewhat of an art to choose an appropriate definition for the width of the score grouping intervals (or ‘bins’ as they are often termed) to be used in the plot: choose too many and the plot may look too lumpy and the overall distributional trend may not be obvious; choose too few and the plot will be too coarse to give a useful depiction. Programs like SPSS, SYSTAT, STATGRAPHICS and NCSS are designed to choose an ‘appropriate’ number of bins to be used, but the analyst’s eye is often a better judge than any statistical rule that a software package would use.
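As an illustration of the automatic rules such packages apply, numpy exposes two of the most common defaults directly (a sketch with hypothetical skewed data; Sturges' and Freedman-Diaconis' rules are standard, but the specific rules each package uses vary):

```python
import numpy as np

rng = np.random.default_rng(7)
scores = rng.gamma(shape=2.0, scale=2.0, size=112)  # hypothetical skewed scores

# Sturges' rule: the number of bins grows with the log of the sample size.
sturges_bins = int(np.ceil(np.log2(len(scores)) + 1))

# Freedman-Diaconis rule: bin width based on the interquartile range.
fd_edges = np.histogram_bin_edges(scores, bins="fd")

print("Sturges:", sturges_bins, "bins")
print("Freedman-Diaconis:", len(fd_edges) - 1, "bins")
```

The text's advice still stands: treat these as starting points and let the eye be the final judge.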

There are several interesting variations of the histogram which can highlight key data features or facilitate interpretation of certain trends in the data. One such variation is a graph called a dual histogram (available in SYSTAT; a variation called a ‘comparative histogram’ can be created in NCSS) – a graph that facilitates visual comparison of the frequency distributions for a specific variable for participants from two distinct groups.

Suppose Maree wanted to graphically compare the distributions of speed scores for inspectors in the two categories of education level (educlev) in the QCI database. Figure 5.5 shows a dual histogram (produced using SYSTAT) that accomplishes this goal. This graph still employs the grouped format where speed scores falling within particular intervals are counted as being essentially the same score. The shape of the data distribution within each group is also clearly reflected in this plot. However, the story conveyed by the dual histogram is that, while the inspection speed scores are positively skewed for inspectors in both categories of educlev, the comparison suggests that inspectors with a high school level of education (= 1) tend to take slightly longer to make their inspection decisions than do their colleagues who have a tertiary qualification (= 2).

Fig. 5.5 Dual histogram of speed for the two categories of educlev

Line Graphs

The line graph is similar in style to the frequency polygon but is much more general in its potential for summarising data. In a line graph, we seldom deal with percentage or frequency data. Instead we can summarise other types of information about data such as averages or means (see Procedure 5.4 for a discussion of this measure), often for different groups of participants. Thus, one important use of the line graph is to break down scores on a specific variable according to membership in the categories of a second variable.
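For instance, the mean-per-category numbers that such a line graph joins can be obtained with a pandas group-by (a hypothetical miniature data set; the company labels and scores are illustrative only):

```python
import pandas as pd

# Hypothetical miniature stand-in for the QCI data: accuracy by company type.
df = pd.DataFrame({
    "company":  ["PC", "PC", "Auto", "Auto", "LBC", "LBC"],
    "accuracy": [92.0, 88.0, 78.0, 74.0, 90.0, 94.0],
})

# Mean accuracy per company category: the points a line graph would join.
means = df.groupby("company")["accuracy"].mean()
print(means)
```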

In the context of the QCI database, Maree might wish to summarise the average inspection accuracy scores for the inspectors from different types of manufacturing companies. Figure 5.6 was produced using SPSS and shows such a line graph.

Fig. 5.6 Line graph comparison of companies in terms of average inspection accuracy

Note how the trend in performance across the different companies becomes clearer with such a visual representation. It appears that the inspectors from the Large Business Computer and PC manufacturing companies have better average inspection accuracy compared to the inspectors from the remaining three industries.

With many software packages, it is possible to further elaborate a line graph by including error bars or confidence interval bars (see the relevant Procedure in Chapter 8). These give some indication of the precision with which the average level for each category in the population has been estimated (narrow bars signal a more precise estimate; wide bars signal a less precise estimate).

Figure 5.7 shows such an elaborated line graph, using 95% confidence interval bars, which can be used to help make more defensible judgments (compared to Fig. 5.6) about whether the companies are substantively different from each other in average inspection performance. Companies whose confidence interval bars do not overlap each other can be inferred to be substantively different in performance characteristics.
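One common way to compute such 95% confidence interval bars is the normal approximation, mean ± 1.96 standard errors (a sketch with hypothetical scores; real packages may use t-based intervals instead):

```python
import numpy as np

def mean_ci95(x):
    """Return (mean, half_width) of a normal-approximation 95% CI for the mean."""
    x = np.asarray(x, dtype=float)
    se = x.std(ddof=1) / np.sqrt(len(x))  # standard error of the mean
    return x.mean(), 1.96 * se

accuracy = [82, 85, 88, 90, 79, 86, 84, 91]  # hypothetical scores for one company
m, h = mean_ci95(accuracy)
print(f"mean = {m:.2f}, 95% CI = [{m - h:.2f}, {m + h:.2f}]")
```

A narrow interval (small half-width) signals a precise estimate. Note that the non-overlap rule is conservative: intervals can overlap slightly even when a formal test would detect a difference.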

Fig. 5.7 Line graph using confidence interval bars to compare accuracy across companies

The accuracy confidence interval bars for participants from the Large Business Computer manufacturing firms do not overlap those from the Large or Small Electrical Appliance manufacturers or the Automobile manufacturers.

We might conclude that quality control inspection accuracy is substantially better in the Large Business Computer manufacturing companies than in these other industries but is not substantially better than the PC manufacturing companies. We might also conclude that inspection accuracy in PC manufacturing companies is not substantially different from Small Electrical Appliance manufacturers.

Scatterplots

Scatterplots are useful in displaying the relationship between two interval- or ratio-scaled variables or measures of interest obtained on the same individuals, particularly in correlational research (see the relevant Fundamental Concept and Procedure in Chapter 6).

In a scatterplot, one variable is chosen to be represented on the horizontal axis; the second variable is represented on the vertical axis. In this type of plot, all data point pairs in the sample are graphed. The shape and tilt of the cloud of points in a scatterplot provide visual information about the strength and direction of the relationship between the two variables. A very compact elliptical cloud of points signals a strong relationship; a very loose or nearly circular cloud signals a weak or non-existent relationship. A cloud of points generally tilted upward toward the right side of the graph signals a positive relationship (higher scores on one variable associated with higher scores on the other and vice-versa). A cloud of points generally tilted downward toward the right side of the graph signals a negative relationship (higher scores on one variable associated with lower scores on the other and vice-versa).
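The visual cues described here (tilt and compactness of the cloud) correspond to the sign and magnitude of the correlation coefficient, which can be checked numerically (a sketch with simulated data):

```python
import numpy as np

rng = np.random.default_rng(3)
x = rng.normal(size=200)

# A compact cloud tilted upward to the right: strong positive relationship.
y_tight = 2.0 * x + rng.normal(scale=0.3, size=200)
# A loose cloud with only a slight downward tilt (in expectation): weak relationship.
y_loose = -0.3 * x + rng.normal(scale=2.0, size=200)

r_tight = np.corrcoef(x, y_tight)[0, 1]
r_loose = np.corrcoef(x, y_loose)[0, 1]
print(f"tight cloud r = {r_tight:.2f}, loose cloud r = {r_loose:.2f}")
```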

Maree might be interested in displaying the relationship between inspection accuracy and inspection speed in the QCI database. Figure 5.8, produced using SPSS, shows what such a scatterplot might look like. Several characteristics of the data for these two variables can be noted in Fig. 5.8. The shape of the distribution of data points is evident. The plot has a fan-shaped characteristic to it which indicates that accuracy scores are highly variable (exhibit a very wide range of possible scores) at very fast inspection speeds but get much less variable and tend to be somewhat higher as inspection speed increases (where inspectors take longer to make their quality control decisions). Thus, there does appear to be some relationship between inspection accuracy and inspection speed (a weak positive relationship, since the cloud of points tends to be very loose but tilted generally upward toward the right side of the graph – slower speeds tend to be slightly associated with higher accuracy).

Fig. 5.8 Scatterplot relating inspection accuracy to inspection speed

However, it is not the case that the inspection decisions which take longest to make are necessarily the most accurate (see the labelled points for inspectors 7 and 62 in Fig. 5.8). Thus, Fig. 5.8 does not show a simple relationship that can be unambiguously summarised by a statement like “the longer an inspector takes to make a quality control decision, the more accurate that decision is likely to be”. The story is more complicated.

Some software packages, such as SPSS, STATGRAPHICS and SYSTAT, offer the option of using different plotting symbols or markers to represent the members of different groups so that the relationship between the two focal variables (the ones anchoring the X and Y axes) can be clarified with reference to a third categorical measure.

Maree might want to see if the relationship depicted in Fig. 5.8 changes depending upon whether the inspector was tertiary-qualified or not (this information is represented in the educlev variable of the QCI database).

Figure 5.9 shows what such a modified scatterplot might look like; the legend in the upper corner of the figure defines the marker symbols for each category of the educlev variable. Note that for both High School only-educated inspectors and Tertiary-qualified inspectors, the general fan-shaped relationship between accuracy and speed is the same. However, it appears that the distribution of points for the High School only-educated inspectors is shifted somewhat upward and toward the right of the plot suggesting that these inspectors tend to be somewhat more accurate as well as slower in their decision processes.

Fig. 5.9 Scatterplot displaying accuracy vs speed conditional on educlev group

There are many other styles of graphs available, often dependent upon the specific statistical package you are using. Interestingly, NCSS and, particularly, SYSTAT and STATGRAPHICS, appear to offer the most variety in terms of types of graphs available for visually representing data. A reading of the user’s manuals for these programs (see the Useful additional readings) would expose you to the great diversity of plotting techniques available to researchers. Many of these techniques go by rather interesting names such as: Chernoff’s faces, radar plots, sunflower plots, violin plots, star plots, Fourier blobs, and dot plots.

These graphical methods provide summary techniques for visually presenting certain characteristics of a set of data. Visual representations are generally easier to understand than a tabular representation, and when these plots are combined with available numerical statistics, they can give a very complete picture of a sample of data. Newer methods have become available which permit more complex representations to be depicted, opening up possibilities for creatively representing more aspects and features of the data (leading to a style of visual data storytelling called infographics; see, for example, McCandless 2014; Toseland and Toseland 2012). Many of these newer methods can display data patterns from multiple variables in the same graph (several of these newer graphical methods are illustrated and discussed in Procedure 5.3).

Graphs tend to be cumbersome and space consuming if a great many variables need to be summarised. In such cases, using numerical summary statistics (such as means or correlations) in tabular form alone will provide a more economical and efficient summary. Also, it can be very easy to give a misleading picture of data trends using graphical methods by simply choosing the ‘correct’ scaling for maximum effect or choosing a display option (such as a 3-D effect) that ‘looks’ presentable but which actually obscures a clear interpretation (see Smithson 2000 ; Wilkinson 2009 ).

Thus, you must be careful in creating and interpreting visual representations so that aesthetic choices made for the sake of appearance do not become more important than obtaining a faithful and valid representation of the data—a very real danger with many of today’s statistical packages where ‘default’ drawing options have been pre-programmed in. No single plot can completely summarise all possible characteristics of a sample of data. Thus, choosing a specific method of graphical display may, of necessity, force a behavioural researcher to represent certain data characteristics (such as frequency) at the expense of others (such as averages).

Virtually any research design which produces quantitative data and statistics (even to the extent of just counting the number of occurrences of several events) provides opportunities for graphical data display which may help to clarify or illustrate important data characteristics or relationships. Remember, graphical displays are communication tools just like numbers—which tool to choose depends upon the message to be conveyed. Visual representations of data are generally more useful in communicating to lay persons who are unfamiliar with statistics. Care must be taken though as these same lay people are precisely the people most likely to misinterpret a graph if it has been incorrectly drawn or scaled.

Application Procedures

SPSS: choose a chart type from the chart gallery, drag it into the working area, and customise the chart with the desired variables, labels, etc.; many elements of a chart, including error bars, can be controlled.
NCSS: whichever type of chart you choose, you can control many features of the chart from the dialog box that pops open upon selection.
STATGRAPHICS: whichever type of chart you choose, you can control a number of features of the chart from the series of dialog boxes that pop open upon selection.
SYSTAT: offers a range of more novel graphical displays, including the dual histogram; for each choice, a dialog box opens which allows you to control almost every characteristic of the graph you want.
R Commander: for some graphs, there is minimal control offered by R Commander over the appearance of the graph (you need to use full R commands to control more aspects; e.g. see Chang).

Procedure 5.3: Multivariate Graphs & Displays

Graphical methods for displaying multivariate data (i.e. many variables at once) include scatterplot matrices, radar (or spider) plots, multiplots, parallel coordinate displays, and icon plots. Multivariate graphs are useful for visualising broad trends and patterns across many variables (Cleveland 1995; Jacoby 1998). Such graphs typically sacrifice precision in representation in favour of a snapshot pictorial summary that can help you form general impressions of data patterns.

It is important to note that what is presented here is a small but reasonably representative sampling of the types of graphs one can produce to summarise and display trends in multivariate data. Generally speaking, SYSTAT offers the best facilities for producing multivariate graphs, followed by STATGRAPHICS, but with the drawback that it is somewhat tricky to get the graphs in exactly the form you want. SYSTAT also has excellent facilities for creating new forms and combinations of graphs – essentially allowing graphs to be tailor-made for a specific communication purpose. Both SPSS and NCSS offer a more limited range of multivariate graphs, generally restricted to scatterplot matrices and variations of multiplots. Microsoft Excel or STATGRAPHICS are the packages to use if radar or spider plots are desired.

Scatterplot Matrices

A scatterplot matrix is a useful multivariate graph designed to show relationships between pairs of many variables in the same display.

Figure 5.10 illustrates a scatterplot matrix, produced using SYSTAT, for the mentabil, accuracy, speed, jobsat and workcond variables in the QCI database. It is easy to see that all the scatterplot matrix does is stack all pairs of scatterplots into a format where it is easy to pick out the graph for any ‘row’ variable that intersects a ‘column’ variable.
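pandas offers a comparable display through `scatter_matrix` (a sketch using three simulated QCI-style variables; the variable names and distributions are placeholders):

```python
import numpy as np
import pandas as pd
import matplotlib
matplotlib.use("Agg")  # render off-screen
from pandas.plotting import scatter_matrix

rng = np.random.default_rng(5)
df = pd.DataFrame({
    "accuracy": rng.normal(85, 5, size=60),
    "speed":    rng.gamma(2.0, 2.0, size=60),
    "jobsat":   rng.integers(1, 8, size=60).astype(float),
})

# Stack all pairwise scatterplots; univariate histograms sit on the diagonal.
axes = scatter_matrix(df, diagonal="hist")
print(axes.shape)  # one panel per row/column variable pairing
```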

Fig. 5.10 Scatterplot matrix relating mentabil, accuracy, speed, jobsat & workcond

In those plots where a ‘row’ variable intersects itself in a column of the matrix (along the so-called ‘diagonal’), SYSTAT permits a range of univariate displays to be shown. Figure 5.10 shows univariate histograms for each variable (recall Procedure 5.2). One obvious drawback of the scatterplot matrix is that, if many variables are to be displayed (say ten or more), the graph gets very crowded and becomes very hard to visually appreciate.

Looking at the first column of graphs in Fig. 5.10, we can see the scatterplot relationships between mentabil and each of the other variables. We can get a visual impression that mentabil seems to be slightly negatively related to accuracy (the cloud of scatter points tends to angle downward to the right, suggesting, very slightly, that higher mentabil scores are associated with lower levels of accuracy).

Conversely, the visual impression of the relationship between mentabil and speed is that the relationship is slightly positive (higher mentabil scores tend to be associated with higher speed scores = longer inspection times). Similar types of visual impressions can be formed for other parts of Fig. 5.10 . Notice that the histogram plots along the diagonal give a clear impression of the shape of the distribution for each variable.

Radar Plots

The radar plot (also known as a spider graph for obvious reasons) is a simple and effective device for displaying scores on many variables. Microsoft Excel offers a range of options and capabilities for producing radar plots, such as the plot shown in Fig. 5.11 . Radar plots are generally easy to interpret and provide a good visual basis for comparing plots from different individuals or groups, even if a fairly large number of variables (say, up to about 25) are being displayed. Like a clock face, variables are evenly spaced around the centre of the plot in clockwise order starting at the 12 o’clock position. Visual interpretation of a radar plot primarily relies on shape comparisons, i.e. the rise and fall of peaks and valleys along the spokes around the plot. Valleys near the centre display low scores on specific variables, peaks near the outside of the plot display high scores on specific variables. [Note that, technically, radar plots employ polar coordinates.] SYSTAT can draw graphs using polar coordinates but not as easily as Excel can, from the user’s perspective. Radar plots work best if all the variables represented are measured on the same scale (e.g. a 1 to 7 Likert-type scale or 0% to 100% scale). Individuals who are missing any scores on the variables being plotted are typically omitted.
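Because radar plots use polar coordinates, they can be sketched in matplotlib's polar projection (a hypothetical subset of the attitude variables and ratings; the first spoke is placed at 12 o'clock, proceeding clockwise):

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # render off-screen
import matplotlib.pyplot as plt

labels = ["jobsat", "workcond", "mgmtcomm", "acctrain", "trainapp"]  # hypothetical subset
ratings = [6, 5, 7, 3, 4]  # hypothetical 1-7 ratings for one inspector

# Spokes evenly spaced, starting at 12 o'clock and moving clockwise.
n = len(labels)
angles = np.pi / 2 - 2 * np.pi * np.arange(n) / n
theta = np.append(angles, angles[0])  # repeat the first point to close the polygon
r = np.append(ratings, ratings[0])

fig, ax = plt.subplots(subplot_kw={"projection": "polar"})
ax.plot(theta, r)
ax.set_xticks(angles)
ax.set_xticklabels(labels)
ax.set_ylim(0, 7)  # all variables share the same 1-7 scale
fig.savefig("radar.png")
```

Keeping every spoke on the same scale, as noted above, is what makes the resulting shape interpretable.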

Fig. 5.11 Radar plot comparing attitude ratings for inspectors 66 and 104

The radar plot in Fig. 5.11, produced using Excel, compares two specific inspectors, 66 and 104, on the nine attitude rating scales. Inspector 66 gave the highest rating (= 7) on the cultqual variable and inspector 104 gave the lowest rating (= 1). The plot shows that inspector 104 tended to provide very low ratings on all nine attitude variables, whereas inspector 66 tended to give very high ratings on all variables except acctrain and trainapp, where the scores were similar to those for inspector 104. Thus, in general, inspector 66 tended to show much more positive attitudes toward their workplace compared to inspector 104.

While Fig. 5.11 was generated to compare the scores for two individuals in the QCI database, it would be just as easy to produce a radar plot that compared the five types of companies in terms of their average ratings on the nine variables, as shown in Fig. 5.12.

Fig. 5.12 Radar plot comparing average attitude ratings for five types of company

Here we can form the visual impression that the five types of companies differ most in their average ratings of mgmtcomm and least in the average ratings of polsatis . Overall, the average ratings from inspectors from PC manufacturers (black diamonds with solid lines) seem to be generally the most positive as their scores lie on or near the outer ring of scores and those from Automobile manufacturers tend to be least positive on many variables (except the training-related variables).

Extrapolating from Fig. 5.12, you may rightly conclude that including too many groups and/or too many variables in a radar plot comparison can lead to so much clutter that any visual comparison would be severely degraded. You may have to experiment with using colour-coded lines to represent different groups versus line and marker shape variations (as used in Fig. 5.12), because choice of coding method for groups can influence the interpretability of a radar plot.

Multiplots

A multiplot is simply a hybrid style of graph that can display group comparisons across a number of variables. There is a wide variety of possible multiplots one could potentially design (SYSTAT offers great capabilities with respect to multiplots). Figure 5.13 shows a multiplot comprising a side-by-side series of profile-based line graphs – one graph for each type of company in the QCI database.

Fig. 5.13 Multiplot comparing profiles of average attitude ratings for five company types

The multiplot in Fig. 5.13 , produced using SYSTAT, graphs the profile of average attitude ratings for all inspectors within a specific type of company. This multiplot shows the same story as the radar plot in Fig. 5.12 , but in a different graphical format. It is still fairly clear that the average ratings from inspectors from PC manufacturers tend to be higher than for the other types of companies and the profile for inspectors from automobile manufacturers tends to be lower than for the other types of companies.

The profile for inspectors from large electrical appliance manufacturers is the flattest, meaning that their average attitude ratings were less variable than for other types of companies. Comparing the ease with which you can glean the visual impressions from Figs. 5.12 and 5.13 may lead you to prefer one style of graph over another. If you have such preferences, chances are others will also, which may mean you need to carefully consider your options when deciding how best to display data for effect.

Frequently, choice of graph is less a matter of which style is right or wrong, but more a matter of which style will suit specific purposes or convey a specific story, i.e. the choice is often strategic.

Parallel Coordinate Displays

A parallel coordinate display is useful for displaying individual scores on a range of variables, all measured using the same scale. Furthermore, such graphs can be combined side-by-side to facilitate very broad visual comparisons among groups, while retaining individual profile variability in scores. Each line in a parallel coordinate display represents one individual, e.g. an inspector.

The interpretation of a parallel coordinate display, such as the two shown in Fig. 5.14, depends on visual impressions of the peaks and valleys (highs and lows) in the profiles as well as on the density of similar profile lines. The graph is called ‘parallel coordinate’ simply because it assumes that all variables are measured on the same scale and that scores for each variable can therefore be located along vertical axes that are parallel to each other (imagine vertical lines on Fig. 5.14 running from bottom to top for each variable on the X-axis). The main drawback of this method of data display is that only those individuals in the sample who provided legitimate scores on all of the variables being plotted (i.e. who have no missing scores) can be displayed.
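The complete-cases limitation is easy to see in code: rows with any missing rating must be dropped before profile lines can be drawn (a pandas sketch with hypothetical ratings):

```python
import numpy as np
import pandas as pd

# Hypothetical 1-7 attitude ratings for four inspectors, with some missing scores.
df = pd.DataFrame({
    "jobsat":   [5.0, np.nan, 6.0, 2.0],
    "workcond": [4.0, 3.0, 7.0, np.nan],
    "mgmtcomm": [6.0, 5.0, 6.0, 1.0],
})

# Only inspectors with no missing scores can appear as profile lines.
complete = df.dropna()
print(len(complete), "of", len(df), "profiles can be displayed")
```

From here, a call such as `pandas.plotting.parallel_coordinates(complete.assign(group="PC"), "group")` would draw one line per remaining inspector (the `group` column is a hypothetical class label required by that function).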

Fig. 5.14 Parallel coordinate displays comparing profiles of average attitude ratings for five company types

The parallel coordinate display in Fig. 5.14, produced using SYSTAT, graphs the profiles of attitude ratings for the individual inspectors within two specific types of company: the left graph for inspectors from PC manufacturers and the right graph for automobile manufacturers.

There are fewer lines in each display than the number of inspectors from each type of company simply because several inspectors from each type of company were missing a rating on at least one of the nine attitude variables. The graphs show great variability in scores amongst inspectors within a company type, but there are some overall patterns evident.

For example, inspectors from automobile companies clearly and fairly uniformly rated mgmtcomm toward the low end of the scale, whereas the reverse was generally true for that variable for inspectors from PC manufacturers. Conversely, inspectors from automobile companies tend to rate acctrain and trainapp more toward the middle to high end of the scale, whereas the reverse is generally true for those variables for inspectors from PC manufacturers.

Icon Plots

Perhaps the most creative types of multivariate displays are the so-called icon plots. SYSTAT and STATGRAPHICS offer an impressive array of different types of icon plots, including, amongst others, Chernoff’s faces, profile plots, histogram plots, star glyphs and sunray plots (Jacoby 1998 provides a detailed discussion of icon plots).

Icon plots generally use a specific visual construction to represent variable scores obtained by each individual within a sample or group. All icon plots are thus methods for displaying the response patterns for individual members of a sample, as long as those individuals are not missing any scores on the variables to be displayed (note that this is the same limitation as for radar plots and parallel coordinate displays). To illustrate icon plots, without generating too many icons to focus on, Figs. 5.15, 5.16, 5.17 and 5.18 present four different icon plots for QCI inspectors classified, using a new variable called BEST_WORST, as either the worst performers (= 1, where their accuracy scores were less than 70%) or the best performers (= 2, where their accuracy scores were 90% or greater).

Fig. 5.15 Chernoff’s faces icon plot comparing individual attitude ratings for best and worst performing inspectors

Fig. 5.16 Profile plot comparing individual attitude ratings for best and worst performing inspectors

Fig. 5.17 Histogram plot comparing individual attitude ratings for best and worst performing inspectors

Fig. 5.18 Sunray plot comparing individual attitude ratings for best and worst performing inspectors

The Chernoff’s faces plot gets its name from the visual icon used to represent variable scores – a cartoon-type face. This icon tries to capitalise on our natural human ability to recognise and differentiate faces. Each feature of the face is controlled by the scores on a single variable. In SYSTAT, up to 20 facial features are controllable; the first five being curvature of mouth, angle of brow, width of nose, length of nose and length of mouth (SYSTAT Software Inc. 2009, p. 259). The theory behind Chernoff’s faces is that similar patterns of variable scores will produce similar looking faces, thereby making similarities and differences between individuals more apparent.

The profile plot and histogram plot are actually two variants of the same type of icon plot. A profile plot represents individuals’ scores for a set of variables using simplified line graphs, one per individual. The profile is scaled so that the vertical height of the peaks and valleys correspond to actual values for variables where the variables anchor the X-axis in a fashion similar to the parallel coordinate display. So, as you examine a profile from left to right across the X-axis of each graph, you are looking across the set of variables. A histogram plot represents the same information in the same way as for the profile plot but using histogram bars instead.

Figure 5.15, produced using SYSTAT, shows a Chernoff’s faces plot for the best and worst performing inspectors using their ratings of job satisfaction, working conditions and the nine general attitude statements.

Each face is labelled with the inspector number it represents. The gaps indicate where an inspector had missing data on at least one of the variables, meaning a face could not be generated for them. The worst performers are drawn using red lines; the best using blue lines. The first variable is jobsat and this variable controls mouth curvature; the second variable is workcond and this controls angle of brow, and so on. It seems clear that there are differences in the faces between the best and worst performers with, for example, best performers tending to be more satisfied (smiling) and with higher ratings for working conditions (brow angle).

Beyond a broad visual impression, there is little in terms of precise inferences you can draw from a Chernoff’s faces plot. It really provides a visual sketch, nothing more. The fact that there is no obvious link between facial features, variables and score levels means that the Chernoff’s faces icon plot is difficult to interpret at the level of individual variables – a holistic impression of similarity and difference is what this type of plot facilitates.

Figure 5.16, produced using SYSTAT, shows a profile plot for the best and worst performing inspectors using their ratings of job satisfaction, working conditions and the nine attitude variables.

Like the Chernoff’s faces plot (Fig. 5.15), as you read across the rows of the plot from left to right, each graph corresponds to an inspector in the sample who was either in the worst performer (red) or best performer (blue) category. The first attitude variable is jobsat and anchors the left end of each line graph; the last variable is polsatis and anchors the right end of the line graph. The remaining variables are represented in order from left to right across the X-axis of each graph. Figure 5.16 shows that these inspectors are rather different in their attitude profiles, with best performers tending to show taller profiles on the first two variables, for example.

Figure 5.17, produced using SYSTAT, shows a histogram plot for the best and worst performing inspectors based on their ratings of job satisfaction, working conditions and the nine attitude variables. This plot tells the same story as the profile plot, only using histogram bars. Some people would prefer the histogram icon plot to the profile plot because each histogram bar corresponds to one variable, making the visual linking of a specific bar to a specific variable much easier than visually linking a specific position along the profile line to a specific variable.

The sunray plot is actually a simplified adaptation of the radar plot (called a “star glyph”) used to represent scores on a set of variables for each individual within a sample or group. Remember that a radar plot basically arranges the variables around a central point like a clock face; the first variable is represented at the 12 o’clock position and the remaining variables follow around the plot in a clockwise direction.

Unlike a radar plot, the spokes (the actual ‘star’ of the glyph’s name) are visible but no interpretive scale is evident. A variable’s score is visually represented by its distance from the central point. Thus, the star glyphs in a sunray plot are designed, like Chernoff’s faces, to provide a general visual impression, based on icon shape. A wide-diameter, well-rounded plot indicates an individual with high scores on all variables; a small-diameter, well-rounded plot indicates the opposite. Jagged plots represent individuals with highly variable scores across the variables. ‘Stars’ of similar size, shape and orientation represent similar individuals.

Figure 5.18 , produced using STATGRAPHICS, shows a sunray plot for the best and worst performing inspectors. An interpretation glyph is also shown in the lower right corner of Fig. 5.18 , where variables are aligned with the spokes of a star (e.g. jobsat is at the 12 o’clock position). This sunray plot could lead you to form the visual impression that the worst performing inspectors (group 1) have rather less rounded rating profiles than do the best performing inspectors (group 2) and that the jobsat and workcond spokes are generally lower for the worst performing inspectors.

Comparatively speaking, the sunray plot makes identifying similar individuals a bit easier (perhaps even easier than Chernoff’s faces) and, when ordered as STATGRAPHICS showed in Fig. 5.18 , permits easier visual comparisons between groups of individuals, but at the expense of precise knowledge about variable scores. Remember, a holistic impression is the goal pursued using a sunray plot.

Multivariate graphical methods provide summary techniques for visually presenting certain characteristics of a complex array of data on variables. Such visual representations are generally better at helping us to form holistic impressions of multivariate data rather than any sort of tabular representation or numerical index. They also allow us to compress many numerical measures into a finite representation that is generally easy to understand. Multivariate graphical displays can add interest to an otherwise dry statistical reporting of numerical data. They are designed to appeal to our pattern recognition skills, focusing our attention on features of the data such as shape, level, variability and orientation. Some multivariate graphs (e.g. radar plots, sunray plots and multiplots) are useful not only for representing score patterns for individuals but also providing summaries of score patterns across groups of individuals.

Multivariate graphs tend to get very busy-looking and are hard to interpret if a great many variables or a large number of individuals need to be displayed (imagine any of the icon plots, for a sample of 200 questionnaire participants, displayed on an A4 page – each icon would be so small that its features could not be easily distinguished, thereby defeating the purpose of the display). In such cases, using numerical summary statistics (such as averages or correlations) in tabular form alone will provide a more economical and efficient summary. Also, some multivariate displays will work better for conveying certain types of information than others.

Information about variable relationships may be better displayed using a scatterplot matrix. Information about individual similarities and difference on a set of variables may be better conveyed using a histogram or sunray plot. Multiplots may be better suited to displaying information about group differences across a set of variables. Information about the overall similarity of individual entities in a sample might best be displayed using Chernoff’s faces.

Because people differ greatly in their visual capacities and preferences, certain types of multivariate displays will work for some people and not others. Sometimes, people will not see what you see in the plots. Some plots, such as Chernoff’s faces, may not strike a reader as a serious statistical procedure and this could adversely influence how convinced they will be by the story the plot conveys. None of the multivariate displays described here provide sufficiently precise information for solid inferences or interpretations; all are designed to simply facilitate the formation of holistic visual impressions. In fact, you may have noticed that some displays (scatterplot matrices and the icon plots, for example) provide no numerical scaling information that would help make precise interpretations. If precision in summary information is desired, the types of multivariate displays discussed here would not be the best strategic choices.

Virtually any research design which produces quantitative data/statistics for multiple variables provides opportunities for multivariate graphical data display which may help to clarify or illustrate important data characteristics or relationships. Thus, for survey research involving many identically-scaled attitudinal questions, a multivariate display may be just the device needed to communicate something about patterns in the data. Multivariate graphical displays are simply specialised communication tools designed to compress a lot of information into a meaningful and efficient format for interpretation—which tool to choose depends upon the message to be conveyed.

Generally speaking, visual representations of multivariate data could prove more useful in communicating to lay persons who are unfamiliar with statistics or who prefer visual as opposed to numerical information. However, these displays would probably require some interpretive discussion so that the reader clearly understands their intent.

Application Procedures
SPSS and choose from the gallery; drag the chart type into the working area and customise the chart with desired variables, labels, etc. Only a few elements of each chart can be configured and altered.
NCSS Only a few elements of this plot are customisable in NCSS.
SYSTAT (and you can select what type of plot you want to appear in the diagonal boxes) or ( can be selected by choosing a variable. e.g. ) or or (for icon plots, you can choose from a range of icons including Chernoff’s faces, histogram, star, sun or profile amongst others). A large number of elements of each type of plot are easily customisable, although it may take some trial and error to get exactly the look you want.
STATGRAPHICS or or or Several elements of each type of plot are easily customisable, although it may take some trial and error to get exactly the look you want.
R Commander You can select what type of plot you want to appear in the diagonal boxes, and you can control some other features of the plot. Other multivariate data displays are available via various packages (e.g. the or package), but not through R Commander.

Procedure 5.4: Assessing Central Tendency

The three most commonly reported measures of central tendency are the mean, median and mode. Each measure reflects a specific way of defining central tendency in a distribution of scores on a variable and each has its own advantages and disadvantages.

The mean is the most widely used measure of central tendency (also called the arithmetic average). Very simply, a mean is the sum of all the scores for a specific variable in a sample divided by the number of scores used in obtaining the sum. The resulting number reflects the average score for the sample of individuals on which the scores were obtained. If one were asked to predict the score that any single individual in the sample would obtain, the best prediction, in the absence of any other relevant information, would be the sample mean. Many parametric statistical methods (such as several of the procedures described in Chap. 7) deal with sample means in one way or another. For any sample of data, there is one and only one possible value for the mean in a specific distribution. For most purposes, the mean is the preferred measure of central tendency because it utilises all the available information in a sample.
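
To make the computation concrete, here is a minimal Python sketch (using made-up scores, not the QCI data) of the mean as the sum of scores divided by their count:

```python
# Hypothetical sample of scores on a single variable (not QCI data)
scores = [4, 5, 6, 3, 7, 5, 5]

# Mean: sum of all scores divided by the number of scores
mean = sum(scores) / len(scores)
print(mean)  # 5.0
```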

In the context of the QCI database, Maree could quite reasonably ask what inspectors scored on the average in terms of mental ability ( mentabil ), inspection accuracy ( accuracy ), inspection speed ( speed ), overall job satisfaction ( jobsat ), and perceived quality of their working conditions ( workcond ). Table 5.3 shows the mean scores for the sample of 112 quality control inspectors on each of these variables. The statistics shown in Table 5.3 were computed using the SPSS Frequencies ... procedure. Notice that the table indicates how many of the 112 inspectors had a valid score for each variable and how many were missing a score (e.g. 109 inspectors provided a valid rating for jobsat; 3 inspectors did not).

Measures of central tendency for specific QCI variables


Each mean needs to be interpreted in terms of the original units of measurement for each variable. Thus, the inspectors in the sample showed an average mental ability score of 109.84 (higher than the general population mean of 100 for the test), an average inspection accuracy of 82.14%, and an average speed for making quality control decisions of 4.48 s. Furthermore, in terms of their work context, inspectors reported an average overall job satisfaction of 4.96 (on the 7-point scale, a level of satisfaction nearly one full scale point above the Neutral point of 4, indicating a generally positive but not strong level of job satisfaction) and an average perceived quality of work conditions of 4.21 (on the 7-point scale, just about at the level of Stressful but Tolerable).

The mean is sensitive to the presence of extreme values, which can distort its value, giving a biased indication of central tendency. As we will see below, the median is an alternative statistic to use in such circumstances. However, it is also possible to compute what is called a trimmed mean, where the mean is calculated after a certain percentage (say, 5% or 10%) of the lowest and highest scores in a distribution have been ignored (a process called ‘trimming’; see, for example, the discussion in Field 2018 , pp. 262–264). This yields a statistic less influenced by extreme scores. The drawbacks are that the decision as to what percentage to trim can be somewhat subjective and trimming necessarily sacrifices information (i.e. the extreme scores) in order to achieve a less biased measure. Some software packages, such as SPSS, SYSTAT or NCSS, can report a trimmed mean at a specified percentage if that option is selected for descriptive statistics or exploratory data analysis (see Procedure 5.6 ) procedures. Comparing the original mean with a trimmed mean can provide an indication of the degree to which the original mean has been biased by extreme values.
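
The trimming idea can be sketched in a few lines of Python (hypothetical data; packages such as SPSS or SYSTAT implement their own variants of this logic):

```python
# Sketch of a 10% trimmed mean: drop the lowest and highest 10% of the
# rank-ordered scores, then average what remains. Hypothetical data only.
def trimmed_mean(scores, proportion=0.10):
    ordered = sorted(scores)
    k = int(len(ordered) * proportion)  # how many scores to drop at each end
    kept = ordered[k:len(ordered) - k] if k > 0 else ordered
    return sum(kept) / len(kept)

data = [2, 4, 5, 5, 6, 6, 7, 7, 8, 50]  # 50 is an extreme high score
print(sum(data) / len(data))   # 10.0 -- ordinary mean, pulled up by the 50
print(trimmed_mean(data))      # 6.0 -- extremes at both ends removed
```

Comparing the two printed values shows directly how strongly the single extreme score biased the ordinary mean.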

Very simply, the median is the centre or middle score of a set of scores. By ‘centre’ or ‘middle’ is meant that 50% of the data values are smaller than or equal to the median and 50% of the data values are larger when the entire distribution of scores is rank ordered from the lowest to highest value. Thus, we can say that the median is that score in the sample which occurs at the 50th percentile. [Note that a ‘percentile’ is attached to a specific score that a specific percentage of the sample scored at or below. Thus, a score at the 25th percentile means that 25% of the sample achieved this score or a lower score.] Table 5.3 shows the 25th, 50th and 75th percentile scores for each variable – note how the 50th percentile score is exactly equal to the median in each case .
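
In Python’s standard library, for instance, the median and the quartile cut-points can be obtained directly (hypothetical data; note that percentile conventions vary slightly between software packages):

```python
import statistics

# Hypothetical rank-ordered scores; odd count, so the middle score is exact
scores = [1, 3, 3, 6, 7, 8, 9]

print(statistics.median(scores))          # 6 -- the middle (50th percentile) score
print(statistics.quantiles(scores, n=4))  # [3.0, 6.0, 8.0] -- 25th, 50th, 75th percentiles
```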

The median is reported somewhat less frequently than the mean but does have some advantages over the mean in certain circumstances. One such circumstance is when the sample of data has a few extreme values in one direction (either very large or very small relative to all other scores). In this case, the mean would be influenced (biased) to a much greater degree than would the median, since all of the data are used to calculate the mean (including the extreme scores) whereas only the single centre score is needed for the median. For this reason, many nonparametric statistical procedures (such as several of the procedures described in Chap. 7) focus on the median as the comparison statistic rather than on the mean.

A discrepancy between the values for the mean and median of a variable provides some insight into the degree to which the mean is being influenced by the presence of extreme data values. In a distribution where there are no extreme values on either side of the distribution (or where extreme values balance each other out on either side of the distribution, as happens in a normal distribution – see Fundamental Concept II ), the mean and the median will coincide at the same value and the mean will not be biased.

For highly skewed distributions, however, the value of the mean will be pulled toward the long tail of the distribution because that is where the extreme values lie. However, in such skewed distributions, the median will be insensitive (statisticians call this property ‘robustness’) to extreme values in the long tail. For this reason, the direction of the discrepancy between the mean and median can give a very rough indication of the direction of skew in a distribution (‘mean larger than median’ signals possible positive skewness; ‘mean smaller than median’ signals possible negative skewness). Like the mean, there is one and only one possible value for the median in a specific distribution.
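
This diagnostic can be demonstrated with two small, hypothetical samples:

```python
import statistics

symmetric = [2, 3, 4, 5, 6, 7, 8]
pos_skewed = [2, 3, 4, 5, 6, 7, 30]  # one extreme value in the long right tail

# Symmetric distribution: mean and median coincide
print(statistics.fmean(symmetric), statistics.median(symmetric))    # 5.0 5
# Positively skewed: the mean is pulled toward the tail, the median is not
print(statistics.fmean(pos_skewed), statistics.median(pos_skewed))  # ~8.14 5
```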

In Fig. 5.19 , the left graph shows the distribution of speed scores and the right-hand graph shows the distribution of accuracy scores. The speed distribution clearly shows the mean being pulled toward the right tail of the distribution whereas the accuracy distribution shows the mean being just slightly pulled toward the left tail. The effect on the mean is stronger in the speed distribution indicating a greater biasing effect due to some very long inspection decision times.


Effects of skewness in a distribution on the values for the mean and median

If we refer to Table 5.3 , we can see that the median score for each of the five variables has also been computed. Like the mean, the median must be interpreted in the original units of measurement for the variable. We can see that for mentabil , accuracy , and workcond , the value of the median is very close to the value of the mean, suggesting that these distributions are not strongly influenced by extreme data values in either the high or low direction. However, note that the median speed was 3.89 s compared to the mean of 4.48 s, suggesting that the distribution of speed scores is positively skewed (the mean is larger than the median—refer to Fig. 5.19 ). Conversely, the median jobsat score was 5.00 whereas the mean score was 4.96 suggesting very little substantive skewness in the distribution (mean and median are nearly equal).

The mode is the simplest measure of central tendency. It is defined as the most frequently occurring score in a distribution. Put another way, it is the score that more individuals in the sample obtain than any other score. An interesting problem associated with the mode is that there may be more than one in a specific distribution. In the case where multiple modes exist, the issue becomes which value do you report? The answer is that you must report all of them. In a ‘normal’ bell-shaped distribution, there is only one mode and it is indeed at the centre of the distribution, coinciding with both the mean and the median.

Table 5.3 also shows the mode for each of the five variables. For example, inspectors achieved a mentabil score of 111 more often than any other score and reported a jobsat rating of 6 more often than any other rating. SPSS only ever reports one mode even if several are present, so one must be careful and look at a histogram plot for each variable to make a final determination of the mode(s) for that variable.
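
Python’s statistics module illustrates the single-mode versus all-modes distinction directly (hypothetical ratings, not the QCI data):

```python
import statistics

# Hypothetical ratings: both 4 and 6 occur three times (a bimodal set)
ratings = [4, 6, 6, 5, 3, 4, 6, 4, 7]

print(statistics.mode(ratings))       # 4 -- reports only a single mode, as SPSS does
print(statistics.multimode(ratings))  # [4, 6] -- reports every tied mode
```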

All three measures of central tendency yield information about what is going on in the centre of a distribution of scores. The mean and median provide a single number which can summarise the central tendency in the entire distribution. The mode can yield one or multiple indices. With many measurements on individuals in a sample, it is advantageous to have single number indices which can describe the distributions in summary fashion. In a normal or near-normal distribution of sample data, the mean, the median, and the mode will all generally coincide at the one point. In this instance, all three statistics will provide approximately the same indication of central tendency. Note however that it is seldom the case that all three statistics would yield exactly the same number for any particular distribution. The mean is the most useful statistic, unless the data distribution is skewed by extreme scores, in which case the median should be reported.

While measures of central tendency are useful descriptors of distributions, summarising data using a single numerical index necessarily reduces the amount of information available about the sample. Not only do we need to know what is going on in the centre of a distribution, we also need to know what is going on around the centre of the distribution. For this reason, most social and behavioural researchers report not only measures of central tendency, but also measures of variability (see Procedure 5.5 ). The mode is the least informative of the three statistics because of its potential for producing multiple values.

Measures of central tendency are useful in almost any type of experimental design, survey or interview study, and in any observational studies where quantitative data are available and must be summarised. The decision as to whether the mean or median should be reported depends upon the nature of the data, which should ideally be ascertained by visual inspection of the data distribution. Some researchers opt to report both measures routinely. Computation of means is a prelude to many parametric statistical methods (see, for example, the parametric procedures in Chap. 7); comparison of medians is associated with many nonparametric statistical methods (see, for example, the nonparametric procedures in Chap. 7).

Application Procedures
SPSS then press the ‘ ’ button and choose mean, median and mode. To see trimmed means, you must use the Exploratory Data Analysis procedure; see .
NCSS then select the reports and plots that you want to see; make sure you indicate that you want to see the ‘Means Section’ of the Report. If you want to see trimmed means, tick the ‘Trimmed Section’ of the Report.
SYSTAT … then select the mean, median and mode (as well as any other statistics you might wish to see). If you want to see trimmed means, tick the ‘Trimmed mean’ section of the dialog box and set the percentage to trim in the box labelled ‘Two-sided’.
STATGRAPHICS or then choose the variable(s) you want to describe and select Summary Statistics (you don’t get any options for statistics to report – measures of central tendency and variability are automatically produced). STATGRAPHICS will not report modes and you will need to use and request ‘Percentiles’ in order to see the 50%ile score which will be the median; however, it won’t be labelled as the median.
R Commander then select the central tendency statistics you want to see. R Commander will not produce modes and, to see the median, make sure that the ‘Quantiles’ box is ticked – the .5 quantile (= 50%ile) score is the median; however, it won’t be labelled as the median.

Procedure 5.5: Assessing Variability

There are a variety of measures of variability to choose from including the range, interquartile range, variance and standard deviation. Each measure reflects a specific way of defining variability in a distribution of scores on a variable and each has its own advantages and disadvantages. Most measures of variability are associated with a specific measure of central tendency, so researchers are now commonly expected to report both a measure of central tendency and its associated measure of variability whenever they display numerical descriptive statistics on continuous or rank-ordered variables.

This is the simplest measure of variability for a sample of data scores. The range is merely the largest score in the sample minus the smallest score in the sample. The range is the one measure of variability not explicitly associated with any measure of central tendency. It gives a very rough indication as to the extent of spread in the scores. However, since the range uses only two of the total available scores in the sample, the rest of the scores are ignored, which means that a lot of potentially useful information is being sacrificed. There are also problems if either the highest or lowest (or both) scores are atypical or too extreme in their value (as in highly skewed distributions). When this happens, the range gives a very inflated picture of the typical variability in the scores. Thus, the range tends not to be a frequently reported measure of variability.
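
As a minimal sketch (hypothetical accuracy-style scores, not the QCI data):

```python
# Hypothetical scores; the range uses only the two most extreme values
scores = [57, 82, 75, 100, 88, 66]

value_range = max(scores) - min(scores)
print(value_range)  # 43, i.e. 100 - 57
```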

Table 5.4 shows a set of descriptive statistics, produced by the SPSS Frequencies procedure, for the mentabil, accuracy, speed, jobsat and workcond measures in the QCI database. In the table, you will find three rows labelled ‘Range’, ‘Minimum’ and ‘Maximum’.

Measures of central tendency and variability for specific QCI variables


Using the data from these three rows, we can draw the following descriptive picture. Mentabil scores spanned a range of 50 (from a minimum score of 85 to a maximum score of 135). Speed scores had a range of 16.05 s (from 1.05 s, the fastest quality decision, to 17.10 s, the slowest quality decision). Accuracy scores had a range of 43 (from 57%, the least accurate inspector, to 100%, the most accurate inspector). Both work context measures ( jobsat and workcond ) exhibited a range of 6, the largest possible range given the 1 to 7 scale of measurement for these two variables.

Interquartile Range

The Interquartile Range ( IQR ) is a measure of variability that is specifically designed to be used in conjunction with the median. The IQR also takes care of the extreme data problem which typically plagues the range measure. The IQR is defined as the range that is covered by the middle 50% of scores in a distribution once the scores have been ranked in order from lowest value to highest value. It is found by locating the value in the distribution at or below which 25% of the sample scored and subtracting this number from the value in the distribution at or below which 75% of the sample scored. The IQR can also be thought of as the range one would compute after the bottom 25% of scores and the top 25% of scores in the distribution have been ‘chopped off’ (or ‘trimmed’ as statisticians call it).

The IQR gives a much more stable picture of the variability of scores and, like the median, is relatively insensitive to the biasing effects of extreme data values. Some behavioural researchers prefer to divide the IQR in half which gives a measure called the Semi-Interquartile Range ( S-IQR ) . The S-IQR can be interpreted as the distance one must travel away from the median, in either direction, to reach the value which separates the top (or bottom) 25% of scores in the distribution from the remaining 75%.

The IQR or S-IQR is typically not produced by descriptive statistics procedures by default in many computer software packages; however, it can usually be requested as an optional statistic to report or it can easily be computed by hand using percentile scores. Both the median and the IQR figure prominently in Exploratory Data Analysis, particularly in the production of boxplots (see Procedure 5.6 ).
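
Computing the IQR and S-IQR by hand from percentile scores can be sketched as follows (hypothetical data; note that software packages differ slightly in how they interpolate percentiles):

```python
import statistics

# Hypothetical rank-ordered scores
scores = [1, 2, 2, 3, 4, 5, 6, 7, 8, 9, 12]

# 25th, 50th and 75th percentile cut-points (default 'exclusive' method)
q1, q2, q3 = statistics.quantiles(scores, n=4)

iqr = q3 - q1      # range spanned by the middle 50% of scores
s_iqr = iqr / 2    # semi-interquartile range
print(q1, q3, iqr, s_iqr)  # 2.0 8.0 6.0 3.0
```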

Figure 5.20 illustrates the conceptual nature of the IQR and S-IQR compared to that of the range. Assume that 100% of data values are covered by the distribution curve in the figure. It is clear that these three measures would provide very different values for a measure of variability. Your choice would depend on your purpose. If you simply want to signal the overall span of scores between the minimum and maximum, the range is the measure of choice. But if you want to signal the variability around the median, the IQR or S-IQR would be the measure of choice.


How the range, IQR and S-IQR measures of variability conceptually differ

Note: Some behavioural researchers refer to the IQR as the hinge-spread (or H-spread ) because of its use in the production of boxplots:

  • the 25th percentile data value is referred to as the ‘lower hinge’;
  • the 75th percentile data value is referred to as the ‘upper hinge’; and
  • their difference gives the H-spread.

Midspread is another term you may see used as a synonym for interquartile range.

Referring back to Table 5.4 , we can find statistics reported for the median and for the ‘quartiles’ (25th, 50th and 75th percentile scores) for each of the five variables of interest. The ‘quartile’ values are useful for finding the IQR or S-IQR because SPSS does not report these measures directly. The median clearly equals the 50th percentile data value in the table.

If we focus, for example, on the speed variable, we could find its IQR by subtracting the 25th percentile score of 2.19 s from the 75th percentile score of 5.71 s to give a value for the IQR of 3.52 s (the S-IQR would simply be 3.52 divided by 2 or 1.76 s). Thus, we could report that the median decision speed for inspectors was 3.89 s and that the middle 50% of inspectors showed scores spanning a range of 3.52 s. Alternatively, we could report that the median decision speed for inspectors was 3.89 s and that the middle 50% of inspectors showed scores which ranged 1.76 s either side of the median value.

Note: We could compare the ‘Minimum’ or ‘Maximum’ scores to the 25th percentile score and 75th percentile score respectively to get a feeling for whether the minimum or maximum might be considered extreme or uncharacteristic data values.

The variance uses information from every individual in the sample to assess the variability of scores relative to the sample mean. Variance assesses the average squared deviation of each score from the mean of the sample. Deviation refers to the difference between an observed score value and the mean of the sample—they are squared simply because adding them up in their naturally occurring unsquared form (where some differences are positive and others are negative) always gives a total of zero, which is useless for an index purporting to measure something.

If many scores are quite different from the mean, we would expect the variance to be large. If all the scores lie fairly close to the sample mean, we would expect a small variance. If all scores exactly equal the mean (i.e. all the scores in the sample have the same value), then we would expect the variance to be zero.
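
The definition translates directly into code (hypothetical data; the usual sample estimator, which divides by n − 1, is assumed here):

```python
# Hypothetical sample of scores
scores = [2, 4, 4, 4, 5, 5, 7, 9]

mean = sum(scores) / len(scores)

# Unsquared deviations always sum to zero -- hence the squaring
print(sum(x - mean for x in scores))  # 0.0

# Sample variance: average squared deviation from the mean (dividing by n - 1)
squared_devs = [(x - mean) ** 2 for x in scores]
variance = sum(squared_devs) / (len(scores) - 1)
print(round(variance, 3))  # 4.571
```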

Figure 5.21 illustrates some possibilities regarding the variance of a distribution of scores having a mean of 100. The very tall curve illustrates a distribution with small variance, the distribution of medium height illustrates one with medium variance, and the flattest distribution illustrates one with large variance.


The concept of variance

If we had a distribution with no variance, the curve would simply be a vertical line at a score of 100 (meaning that all scores were equal to the mean). You can see that as variance increases, the tails of the distribution extend further outward and the concentration of scores around the mean decreases. You may have noticed that variance and range (as well as the IQR) will be related, since the range focuses on the difference between the ends of the two tails in the distribution and larger variances extend the tails. So, a larger variance will generally be associated with a larger range and IQR compared to a smaller variance.

It is generally difficult to descriptively interpret the variance measure in a meaningful fashion since it involves squared deviations around the sample mean. [Note: If you look back at Table 5.4 , you will see the variance listed for each of the variables (e.g. the variance of accuracy scores is 84.118), but the numbers themselves make little sense and do not relate to the original measurement scale for the variables (which, for the accuracy variable, went from 0% to 100% accuracy).] Instead, we use the variance as a steppingstone for obtaining a measure of variability that we can clearly interpret, namely the standard deviation . However, you should know that variance is an important concept in its own right simply because it provides the statistical foundation for many of the correlational procedures and statistical inference procedures described in Chaps. 6, 7 and 8.

When considering either correlations or tests of statistical hypotheses, we frequently speak of one variable explaining or sharing variance with another (see the relevant procedures in Chaps. 6 and 7). In doing so, we are invoking the concept of variance as set out here: what we are saying is that variability in the behaviour of scores on one particular variable may be associated with or predictive of variability in scores on another variable of interest (e.g. it could explain why those scores have a non-zero variance).

Standard Deviation

The standard deviation (often abbreviated as SD, sd or Std. Dev.) is the most commonly reported measure of variability because it has a meaningful interpretation and is used in conjunction with reports of sample means. Variance and standard deviation are closely related measures in that the standard deviation is found by taking the square root of the variance. The standard deviation, very simply, is a summary number that reflects the ‘average distance of each score from the mean of the sample’. In many parametric statistical methods, both the sample mean and sample standard deviation are employed in some form. Thus, the standard deviation is a very important measure, not only for data description, but also for hypothesis testing and the establishment of relationships as well.

Referring again back to Table 5.4 , we’ll focus on the results for the speed variable for discussion purposes. Table 5.4 shows that the mean inspection speed for the QCI sample was 4.48 s. We can also see that the standard deviation (in the row labelled ‘Std Deviation’) for speed was 2.89 s.

This standard deviation has a straightforward interpretation: we would say that ‘on the average, an inspector’s quality inspection decision speed differed from the mean of the sample by about 2.89 s in either direction’. In a normal distribution of scores (see Fundamental Concept II ), we would expect to see about 68% of all inspectors having decision speeds between 1.59 s (the mean minus one amount of the standard deviation) and 7.37 s (the mean plus one amount of the standard deviation).
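
The square-root relationship and the mean ± 1 SD band can be checked numerically (the 4.48 and 2.89 values echo the speed example from Table 5.4; the raw data below are hypothetical):

```python
import math
import statistics

# Mean +/- 1 SD band for the speed example (values from Table 5.4)
mean, sd = 4.48, 2.89
print(round(mean - sd, 2), round(mean + sd, 2))  # 1.59 7.37

# For raw data, the SD is the square root of the sample variance
data = [2, 4, 4, 4, 5, 5, 7, 9]
sd_from_variance = math.sqrt(statistics.variance(data))
print(round(sd_from_variance, 3))  # 2.138
```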

We noted earlier that the range of the speed scores was 16.05 s. However, the fact that the maximum speed score was 17.1 s compared to the 75th percentile score of just 5.71 s seems to suggest that this maximum speed might be rather atypically large compared to the bulk of speed scores. This means that the range is likely to be giving us a false impression of the overall variability of the inspectors’ decision speeds.

Furthermore, given that the mean speed score was higher than the median speed score, suggesting that speed scores were positively skewed (this was confirmed by the histogram for speed shown in Fig. 5.19 in Procedure 5.4 ), we might consider emphasising the median and its associated IQR or S-IQR rather than the mean and standard deviation. Of course, similar diagnostic and interpretive work could be done for each of the other four variables in Table 5.4 .

Measures of variability (particularly the standard deviation) provide a summary measure that gives an indication of how variable (spread out) a particular sample of scores is. When used in conjunction with a relevant measure of central tendency (particularly the mean), a reasonable yet economical description of a set of data emerges. When there are extreme data values or severe skewness is present in the data, the IQR (or S-IQR) becomes the preferred measure of variability to be reported in conjunction with the sample median (or 50th percentile value). These latter measures are much more resistant (‘robust’) to influence by data anomalies than are the mean and standard deviation.

As mentioned above, the range is a very cursory index of variability; thus, it is not as useful as the variance or standard deviation. The variance has little meaningful interpretation as a descriptive index; hence, the standard deviation is most often reported. However, the standard deviation (or IQR) has little meaning if the sample mean (or median) is not reported along with it.

Knowing that the standard deviation for accuracy is 9.17 tells you little unless you know the mean accuracy (82.14) that it is the standard deviation from.

Like the sample mean, the standard deviation can be strongly biased by the presence of extreme data values or severe skewness in a distribution, in which case the median and IQR (or S-IQR) become the preferred measures. The biasing effect will be most noticeable in samples which are small in size (say, fewer than 30 individuals) and far less noticeable in large samples (say, in excess of 200 or 300 individuals). [Note that, in a manner similar to a trimmed mean, it is possible to compute a trimmed standard deviation to reduce the biasing effect of extreme data values, see Field 2018 , p. 263.]
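The trimming idea can be sketched in a few lines of Python. This is only an illustration of the general principle of symmetrically discarding extreme scores before computing the SD; the exact trimming procedure described by Field (2018) may differ in detail:

```python
import statistics

def trimmed_stdev(scores, trim=0.1):
    """Sample SD after symmetrically dropping the most extreme scores.

    A minimal sketch of the trimming idea; `trim` is the proportion of
    scores dropped from EACH end of the sorted data.
    """
    k = int(len(scores) * trim)  # number of scores to drop at each end
    trimmed = sorted(scores)[k:len(scores) - k] if k else sorted(scores)
    return statistics.stdev(trimmed)

scores = [3, 4, 4, 5, 5, 5, 6, 6, 7, 42]   # 42 is an extreme value
print(statistics.stdev(scores))            # inflated by the outlier
print(trimmed_stdev(scores, trim=0.1))     # much smaller once 3 and 42 are dropped
```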

It is important to realise that the resistance of the median and IQR (or S-IQR) to extreme values is only gained by deliberately sacrificing a good deal of the information available in the sample (nothing is obtained without a cost in statistics). What is sacrificed is information from all other members of the sample other than those members who scored at the median and 25th and 75th percentile points on a variable of interest; information from all members of the sample would automatically be incorporated in mean and standard deviation for that variable.

Any investigation where you might report on or read about measures of central tendency on certain variables should also report measures of variability. This is particularly true for data from experiments, quasi-experiments, observational studies and questionnaires. It is important to consider measures of central tendency and measures of variability to be inextricably linked—one should never report one without the other if an adequate descriptive summary of a variable is to be communicated.

Other descriptive measures, such as those for skewness and kurtosis, may also be of interest if a more complete description of any variable is desired. Most good statistical packages can be instructed to report these additional descriptive measures as well.

Of all the statistics you are likely to encounter in the business, behavioural and social science research literature, means and standard deviations will dominate as measures for describing data. Additionally, these statistics will usually be reported when any parametric tests of statistical hypotheses are presented as the mean and standard deviation provide an appropriate basis for summarising and evaluating group differences.

Application Procedures

SPSS: … then press the ‘ ’ button and choose Std. Deviation, Variance, Range, Minimum and/or Maximum as appropriate. SPSS does not have an option to produce either the IQR or S-IQR; however, if you request ‘Quantiles’ you will see the 25th and 75th %ile scores, which can then be used to quickly compute either variability measure. Remember to select appropriate central tendency measures as well.
NCSS: … then select the reports and plots that you want to see; make sure you indicate that you want to see the Variance Section of the Report. Remember to select appropriate central tendency measures as well (by opting to see the Means Section of the Report).
SYSTAT: … then select SD, Variance, Range, Interquartile range, Minimum and/or Maximum as appropriate. Remember to select appropriate central tendency measures as well.
STATGRAPHICS: … then choose the variable(s) you want to describe and select Summary Statistics (you don’t get any options for statistics to report – measures of central tendency and variability are automatically produced). STATGRAPHICS does not produce either the IQR or S-IQR; however, ‘Percentiles’ can be requested in order to see the 25th and 75th %ile scores, which can then be used to quickly compute either variability measure.
R Commander: … then select either the Standard Deviation or Interquartile Range as appropriate. R Commander will not produce the range statistic or report minimum or maximum scores. Remember to select appropriate central tendency measures as well.
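For readers working outside these packages, the IQR and S-IQR are easily computed from the quartiles themselves. A minimal Python sketch (note that quartile definitions vary slightly between packages, so results may differ in the last decimal place from SPSS or NCSS output):

```python
import statistics

scores = [1.2, 2.1, 2.8, 3.0, 3.4, 3.9, 4.5, 5.1, 5.7, 8.9, 12.4, 17.1]

# statistics.quantiles with n=4 returns the 25th, 50th and 75th percentiles.
q1, median, q3 = statistics.quantiles(scores, n=4)

iqr = q3 - q1      # interquartile range: spread of the middle 50% of scores
s_iqr = iqr / 2    # semi-interquartile range

print(f"Q1 = {q1}, median = {median}, Q3 = {q3}, IQR = {iqr}, S-IQR = {s_iqr}")
```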

Fundamental Concept I: Basic Concepts in Probability

The Concept of Simple Probability

In Procedures 5.1 and 5.2 , you encountered the idea of the frequency of occurrence of specific events such as particular scores within a sample distribution. Furthermore, it is a simple operation to convert the frequency of occurrence of a specific event into a number representing the relative frequency of that event. The relative frequency of an observed event is merely the number of times the event is observed divided by the total number of times one makes an observation. The resulting number ranges between 0 and 1 but we typically re-express this number as a percentage by multiplying it by 100%.

In the QCI database, Maree Lakota observed data from 112 quality control inspectors of which 58 were male and 51 were female (gender indications were missing for three inspectors). The statistics 58 and 51 are thus the frequencies of occurrence for two specific types of research participant, a male inspector or a female inspector.

If she divided each frequency by the total number of observations (i.e. 112), she would obtain .52 for males and .46 for females (leaving .02 of observations with unknown gender). These statistics are relative frequencies which indicate the proportion of times that Maree obtained data from a male or female inspector. Multiplying each relative frequency by 100% would yield 52% and 46% which she could interpret as indicating that 52% of her sample was male and 46% was female (leaving 2% of the sample with unknown gender).
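The frequency-to-relative-frequency conversion is a one-line computation; a sketch using the counts from the text:

```python
# Frequencies of occurrence from the QCI sample described in the text.
counts = {"male": 58, "female": 51, "missing": 3}
total = sum(counts.values())  # 112 inspectors in all

# Relative frequency = frequency / total number of observations.
rel_freq = {group: n / total for group, n in counts.items()}
print({group: round(p, 2) for group, p in rel_freq.items()})
# males ≈ .52, females ≈ .46, missing ≈ .03 (rounded)
```

Note that relative frequencies always sum to 1.0 across all (mutually exclusive and exhaustive) categories, which is why the text's .52 + .46 + .02 accounting works.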

It does not take much of a leap in logic to move from the concept of ‘relative frequency’ to the concept of ‘probability’. In our discussion above, we focused on relative frequency as indicating the proportion or percentage of times a specific category of participant was obtained in a sample. The emphasis here is on data from a sample.

Imagine now that Maree had infinite resources and research time and was able to obtain ever larger samples of quality control inspectors for her study. She could still compute the relative frequencies for obtaining data from males and females in her sample but as her sample size grew larger and larger, she would notice these relative frequencies converging toward some fixed values.

If, by some miracle, Maree could observe all of the quality control inspectors on the planet today, she would have measured the entire population and her computations of relative frequency for males and females would yield two precise numbers, each indicating the proportion of the population of inspectors that was male and the proportion that was female.

If Maree were then to list all of these inspectors and randomly choose one from the list, the chances that she would choose a male inspector would be equal to the proportion of the population of inspectors that was male and this logic extends to choosing a female inspector. The number used to quantify this notion of ‘chances’ is called a probability. Maree would therefore have established the probability of randomly observing a male or a female inspector in the population on any specific occasion.

Probability is expressed on a 0.0 (the observation or event will certainly not be seen) to 1.0 (the observation or event will certainly be seen) scale where values close to 0.0 indicate observations that are less certain to be seen and values close to 1.0 indicate observations that are more certain to be seen (a value of .5 indicates an even chance that an observation or event will or will not be seen – a state of maximum uncertainty). Statisticians often interpret a probability as the likelihood of observing an event or type of individual in the population.

In the QCI database, we noted that the relative frequency of observing males was .52 and for females was .46. If we take these relative frequencies as estimates of the proportions of each gender in the population of inspectors, then .52 and .46 represent the probability of observing a male or female inspector, respectively.

Statisticians would state this as “the probability of observing a male quality control inspector is .52” or, in a more commonly used shorthand, the likelihood of observing a male quality control inspector is p = .52 (p for probability). For some, probabilities make more sense if they are converted to percentages (by multiplying by 100%). Thus, p = .52 can also be understood as a 52% chance of observing a male quality control inspector.

We have seen that relative frequency is a sample statistic that can be used to estimate the population probability. Our estimate will get more precise as we use larger and larger samples (technically, as the size of our samples more closely approximates the size of our population). In most behavioural research, we never have access to entire populations so we must always estimate our probabilities.

In some very special populations, having a known number of fixed possible outcomes, such as results of coin tosses or rolls of a die, we can analytically establish event probabilities without doing an infinite number of observations; all we must do is assume that we have a fair coin or die. Thus, with a fair coin, the probability of observing a H or a T on any single coin toss is ½ or .5 or 50%; the probability of observing a 6 on any single throw of a die is 1/6 or .16667 or 16.667%. With behavioural data, though, we can never measure all possible behavioural outcomes, which thereby forces researchers to depend on samples of observations in order to make estimates of population values.
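The convergence of relative frequency toward the analytic probability can be demonstrated by simulation. A sketch in Python, simulating throws of a fair die and watching the estimate drift toward 1/6 as the number of throws grows:

```python
import random

random.seed(42)  # fixed seed so the illustration is reproducible

def estimated_prob_of_six(n_throws):
    """Relative frequency of sixes in n_throws simulated throws of a fair die."""
    sixes = sum(random.randint(1, 6) == 6 for _ in range(n_throws))
    return sixes / n_throws

# Larger samples give estimates closer to the analytic value 1/6 ≈ .16667.
for n in (100, 10_000, 1_000_000):
    print(n, estimated_prob_of_six(n))
```

This mirrors the thought experiment about Maree's ever-larger samples: with behavioural data we cannot run the experiment to infinity, but simulation makes the convergence principle concrete.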

The concept of probability is central to much of what is done in the statistical analysis of behavioural data. Whenever a behavioural scientist wishes to establish whether a particular relationship exists between variables or whether two groups, treated differently, actually show different behaviours, he/she is playing a probability game. Given a sample of observations, the behavioural scientist must decide whether what he/she has observed is providing sufficient information to conclude something about the population from which the sample was drawn.

This decision always has a non-zero probability of being in error simply because in samples that are much smaller than the population, there is always the chance or probability that we are observing something rare and atypical instead of something which is indicative of a consistent population trend. Thus, the concept of probability forms the cornerstone for statistical inference about which we will have more to say later (see Fundamental Concept 10.1007/978-981-15-2537-7_7#Sec6). Probability also plays an important role in helping us to understand theoretical statistical distributions (e.g. the normal distribution) and what they can tell us about our observations. We will explore this idea further in Fundamental Concept II .

The Concept of Conditional Probability

It is important to understand that the concept of probability as described above focuses upon the likelihood or chances of observing a specific event or type of observation for a specific variable relative to a population or sample of observations. However, many important behavioural research issues may focus on the question of the probability of observing a specific event given that the researcher has knowledge that some other event has occurred or been observed (this latter event is usually measured by a second variable). Here, the focus is on the potential relationship or link between two variables or two events.

With respect to the QCI database, Maree could ask the quite reasonable question: “What is the probability (estimated in the QCI sample by a relative frequency) of observing an inspector being female given that she knows that an inspector works for a Large Business Computer manufacturer?”

To address this question, all she needs to know is:

  • how many inspectors from Large Business Computer manufacturers are in the sample ( 22 ); and
  • how many of those inspectors were female ( 7 ) (inspectors who were missing a score for either company or gender have been ignored here).

If she divides 7 by 22, she would obtain the probability that an inspector is female given that they work for a Large Business Computer manufacturer – that is, p = .32.
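The computation is simply a ratio of counts; a sketch using the numbers from the text:

```python
# Counts from the text: 22 inspectors work for Large Business Computer
# manufacturers; 7 of those 22 are female.
n_large_business = 22
n_female_and_large_business = 7

# Conditional probability: P(female | Large Business Computer company)
p_female_given_company = n_female_and_large_business / n_large_business
print(round(p_female_given_company, 2))  # .32
```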

This type of question points to the important concept of conditional probability (‘conditional’ because we are asking “what is the probability of observing one event conditional upon our knowledge of some other event”).

Continuing with the previous example, Maree would say that the conditional probability of observing a female inspector working for a Large Business Computer manufacturer is .32 or, equivalently, a 32% chance. Compare this conditional probability of p  = .32 to the overall probability of observing a female inspector in the entire sample ( p  = .46 as shown above).

This means that there is evidence for a connection or relationship between gender and the type of company an inspector works for. That is, the chances are lower for observing a female inspector from a Large Business Computer manufacturer than they are for simply observing a female inspector at all.

Maree therefore has evidence suggesting that females may be relatively under-represented in Large Business Computer manufacturing companies compared to the overall population. Knowing something about the company an inspector works for therefore can help us make a better prediction about their likely gender.

Suppose, however, that Maree’s conditional probability had been exactly equal to p  = .46. This would mean that there was exactly the same chance of observing a female inspector working for a Large Business Computer manufacturer as there was of observing a female inspector in the general population. Here, knowing something about the company an inspector works for doesn’t help Maree make any better prediction about their likely gender. This would mean that the two variables are statistically independent of each other.

A classic case of events that are statistically independent is two successive throws of a fair die: rolling a six on the first throw gives us no information for predicting how likely it will be that we would roll a six on the second throw. The conditional probability of observing a six on the second throw given that I have observed a six on the first throw is .16667 (= 1 divided by 6), which is the same as the simple probability of observing a six on any specific throw. This statistical independence also means that if we wanted to know the probability of throwing two sixes on two successive throws of a fair die, we would just multiply the probabilities for each independent event (i.e., throw) together; that is, .16667 × .16667 = .02778 (this is known as the multiplication rule of probability, see, for example, Smithson 2000 , p. 114).
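The multiplication rule can be checked both analytically and by simulation; a short Python sketch:

```python
import random

# Multiplication rule for independent events: P(six AND six) = P(six) * P(six).
p_six = 1 / 6
p_two_sixes = p_six * p_six
print(round(p_two_sixes, 5))  # .02778

# Cross-check by simulation: relative frequency of double sixes
# over many simulated pairs of fair-die throws.
random.seed(1)
trials = 200_000
hits = sum(random.randint(1, 6) == 6 and random.randint(1, 6) == 6
           for _ in range(trials))
print(hits / trials)  # close to .02778
```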

Finally, you should know that conditional probabilities are often asymmetric. This means that for many types of behavioural variables, reversing the conditional arrangement will change the story about the relationship. Bayesian statistics (see Fundamental Concept 10.1007/978-981-15-2537-7_7#Sec73) relies heavily upon this asymmetric relationship between conditional probabilities.

Maree has already learned that the conditional probability that an inspector is female given that they worked for a Large Business Computer manufacturer is p = .32. She could easily turn the conditional relationship around and ask what is the conditional probability that an inspector works for a Large Business Computer manufacturer given that the inspector is female?

From the QCI database, she can find that 51 inspectors in her total sample were female and of those 51, 7 worked for a Large Business Computer manufacturer. If she divided 7 by 51, she would get p = .14 (did you notice that all that changed was the number she divided by?). Thus, there is only a 14% chance of observing an inspector working for a Large Business Computer manufacturer given that the inspector is female – a rather different probability from p = .32, which tells a different story.
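The asymmetry is just the same joint count divided by two different conditioning totals; a sketch with the counts from the text:

```python
# Counts from the text.
n_female = 51          # female inspectors in the sample
n_large_business = 22  # inspectors at Large Business Computer manufacturers
n_both = 7             # inspectors who are female AND at such a company

# Reversing the conditioning event changes the denominator - and the story.
p_female_given_company = n_both / n_large_business  # P(female | company) ≈ .32
p_company_given_female = n_both / n_female          # P(company | female) ≈ .14

print(round(p_female_given_company, 2), round(p_company_given_female, 2))
```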

As you will see in Procedures 10.1007/978-981-15-2537-7_6#Sec14 and 10.1007/978-981-15-2537-7_7#Sec17, conditional relationships between categorical variables are precisely what crosstabulation contingency tables are designed to reveal.

Procedure 5.6: Exploratory Data Analysis

There are a variety of visual display methods for EDA, including stem & leaf displays, boxplots and violin plots. Each method reflects a specific way of displaying features of a distribution of scores or measurements and, of course, each has its own advantages and disadvantages. In addition, EDA displays are surprisingly flexible and can combine features in various ways to enhance the story conveyed by the plot.

Stem & Leaf Displays

The stem & leaf display is a simple data summary technique which not only rank orders the data points in a sample but presents them visually so that the shape of the data distribution is reflected. Stem & leaf displays are formed from data scores by splitting each score into two parts: the first part of each score serving as the ‘stem’, the second part as the ‘leaf’ (e.g. for 2-digit data values, the ‘stem’ is the number in the tens position; the ‘leaf’ is the number in the ones position). Each stem is then listed vertically, in ascending order, followed horizontally by all the leaves in ascending order associated with it. The resulting display thus shows all of the scores in the sample, but reorganised so that a rough idea of the shape of the distribution emerges. As well, extreme scores can be easily identified in a stem & leaf display.
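The splitting logic described above can be sketched in a few lines of Python for 2-digit integer scores. This is deliberately minimal: real packages add refinements such as half-stems, leaf-unit scaling and outlier handling:

```python
from collections import defaultdict

def stem_and_leaf(scores):
    """Return a stem & leaf display for 2-digit integer scores, as text lines.

    The tens digit is the stem and the ones digit the leaf; stems are listed
    in ascending order, each followed by its leaves in ascending order.
    """
    leaves = defaultdict(list)
    for s in sorted(scores):
        leaves[s // 10].append(s % 10)
    return [f"{stem} | {''.join(str(leaf) for leaf in lf)}"
            for stem, lf in sorted(leaves.items())]

for line in stem_and_leaf([57, 62, 68, 71, 73, 75, 79, 82, 84, 85, 91]):
    print(line)
# 5 | 7
# 6 | 28
# 7 | 1359
# 8 | 245
# 9 | 1
```

Read sideways, the lengths of the leaf rows trace out the rough shape of the distribution, and the isolated leaf on the ‘5’ stem flags a possible extreme score.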

Consider the accuracy and speed scores for the 112 quality control inspectors in the QCI sample. Figure 5.22 (produced by the R Commander Stem-and-leaf display … procedure) shows the stem & leaf displays for inspection accuracy (left display) and speed (right display) data.


Stem & leaf displays produced by R Commander

[The first six lines reflect information from R Commander about each display: lines 1 and 2 show the actual R command used to produce the plot (the variable name has been highlighted in bold); line 3 gives a warning indicating that inspectors with missing values (= NA in R ) on the variable have been omitted from the display; line 4 shows how the stems and leaves have been defined; line 5 indicates what a leaf unit represents in value; and line 6 indicates the total number (n) of inspectors included in the display.]

In Fig. 5.22 , for the accuracy display on the left-hand side, the ‘stems’ have been split into ‘half-stems’—one (which is starred) associated with the ‘leaves’ 0 through 4 and the other associated with the ‘leaves’ 5 through 9—a strategy that gives the display better balance and visual appeal.

Notice how the left stem & leaf display conveys a fairly clear (yet sideways) picture of the shape of the distribution of accuracy scores. It has a rather symmetrical bell-shape to it with only a slight suggestion of negative skewness (toward the extreme score at the top). The right stem & leaf display clearly depicts the highly positively skewed nature of the distribution of speed scores. Importantly, we could reconstruct the entire sample of scores for each variable using its display, which means that unlike most other graphical procedures, we didn’t have to sacrifice any information to produce the visual summary.

Some programs, such as SYSTAT, embellish their stem & leaf displays by indicating in which stem or half-stem the ‘median’ (50th percentile), the ‘upper hinge score’ (75th percentile), and ‘lower hinge score’ (25th percentile) occur in the distribution (recall the discussion of interquartile range in Procedure 5.5 ). This is shown in Fig. 5.23 , produced by SYSTAT, where M and H indicate the stem locations for the median and hinge points, respectively. This stem & leaf display labels a single extreme accuracy score as an ‘outside value’ and clearly shows that this actual score was 57.


Stem & leaf display, produced by SYSTAT, of the accuracy QCI variable

Another important EDA technique is the boxplot or, as it is sometimes known, the box-and-whisker plot . This plot provides a symbolic representation that preserves less of the original nature of the data (compared to a stem & leaf display) but typically gives a better picture of the distributional characteristics. The basic boxplot, shown in Fig. 5.24 , utilises information about the median (50th percentile score) and the upper (75th percentile score) and lower (25th percentile score) hinge points in the construction of the ‘box’ portion of the graph (the ‘median’ defines the centre line in the box; the ‘upper’ and ‘lower hinge values’ define the end boundaries of the box—thus the box encompasses the middle 50% of data values).


Boxplots for the accuracy and speed QCI variables

Additionally, the boxplot utilises the IQR (recall Procedure 5.5 ) as a way of defining what are called ‘fences’ which are used to indicate score boundaries beyond which we would consider a score in a distribution to be an ‘outlier’ (or an extreme or unusual value). In SPSS, the inner fence is typically defined as 1.5 times the IQR in each direction and a ‘far’ outlier or extreme case is typically defined as 3 times the IQR in either direction (Field 2018 , p. 193). The ‘whiskers’ in a boxplot extend out to the data values which are closest to the upper and lower inner fences (in most cases, the vast majority of data values will be contained within the fences). Outliers beyond these ‘whiskers’ are then individually listed. ‘Near’ outliers are those lying just beyond the inner fences and ‘far’ outliers lie well beyond the inner fences.
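The fence logic is straightforward to implement. A minimal Python sketch of the 1.5 × IQR convention described above (quartile definitions vary between packages, so which borderline scores get flagged may differ slightly from SPSS output):

```python
import statistics

def tukey_outliers(scores, k=1.5):
    """Return scores lying beyond the fences Q1 - k*IQR and Q3 + k*IQR.

    k = 1.5 corresponds to the inner fences ('near' outliers); k = 3
    corresponds to the outer fences ('far' outliers), per the SPSS
    conventions described in the text.
    """
    q1, _, q3 = statistics.quantiles(scores, n=4)
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return [s for s in scores if s < lower or s > upper]

speeds = [1.2, 2.1, 2.8, 3.4, 3.9, 4.5, 4.8, 5.1, 5.7, 6.2, 17.1]
print(tukey_outliers(speeds))  # the very slow 17.1 s score is flagged
```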

Figure 5.24 shows two simple boxplots (produced using SPSS), one for the accuracy QCI variable and one for the speed QCI variable. The accuracy plot shows a median value of about 83, roughly 50% of the data fall between about 77 and 89 and there is one outlier, inspector 83, in the lower ‘tail’ of the distribution. The accuracy boxplot illustrates data that are relatively symmetrically distributed without substantial skewness. Such data will tend to have their median in the middle of the box, whiskers of roughly equal length extending out from the box and few or no outliers.

The speed plot shows a median value of about 4 s, roughly 50% of the data fall between 2 s and 6 s and there are four outliers, inspectors 7, 62, 65 and 75 (although inspectors 65 and 75 fall at the same place and are rather difficult to read), all falling in the slow speed ‘tail’ of the distribution. Inspectors 65, 75 and 7 are shown as ‘near’ outliers (open circles) whereas inspector 62 is shown as a ‘far’ outlier (asterisk). The speed boxplot illustrates data which are asymmetrically distributed because of skewness in one direction. Such data may have their median offset from the middle of the box and/or whiskers of unequal length extending out from the box and outliers in the direction of the longer whisker. In the speed boxplot, the data are clearly positively skewed (the longer whisker and extreme values are in the slow speed ‘tail’).

Boxplots are very versatile representations in that side-by-side displays for sub-groups of data within a sample can permit easy visual comparisons of groups with respect to central tendency and variability. Boxplots can also be modified to incorporate information about error bands associated with the median producing what is called a ‘notched boxplot’. This helps in the visual detection of meaningful subgroup differences, where boxplot ‘notches’ don’t overlap.

Figure 5.25 (produced using NCSS), compares the distributions of accuracy and speed scores for QCI inspectors from the five types of companies, plotted side-by-side.


Comparisons of the accuracy (regular boxplots) and speed (notched boxplots) QCI variables for different types of companies

Focus first on the left graph in Fig. 5.25 which plots the distribution of accuracy scores broken down by company using regular boxplots. This plot clearly shows the differing degree of skewness in each type of company (indicated by one or more outliers in one ‘tail’, whiskers which are not the same length and/or the median line being offset from the centre of a box), the differing variability of scores within each type of company (indicated by the overall length of each plot—box and whiskers), and the differing central tendency in each type of company (the median lines do not all fall at the same level of accuracy score). From the left graph in Fig. 5.25 , we could conclude that: inspection accuracy scores are most variable in PC and Large Electrical Appliance manufacturing companies and least variable in the Large Business Computer manufacturing companies; Large Business Computer and PC manufacturing companies have the highest median level of inspection accuracy; and inspection accuracy scores tend to be negatively skewed (many inspectors toward higher levels, relatively fewer who are poorer in inspection performance) in the Automotive manufacturing companies. One inspector, working for an Automotive manufacturing company, shows extremely poor inspection accuracy performance.

The right display compares types of companies in terms of their inspection speed scores, using ‘notched’ boxplots. The notches define upper and lower error limits around each median. Aside from the very obvious positive skewness for speed scores (with a number of slow speed outliers) in every type of company (least so for Large Electrical Appliance manufacturing companies), the story conveyed by this comparison is that inspectors from Large Electrical Appliance and Automotive manufacturing companies have substantially faster median decision speeds compared to inspectors from Large Business Computer and PC manufacturing companies (i.e. their ‘notches’ do not overlap, in terms of speed scores, on the display).

Boxplots can also add interpretive value to other graphical display methods through the creation of hybrid displays. Such displays might combine a standard histogram with a boxplot along the X-axis to provide an enhanced picture of the data distribution as illustrated for the mentabil variable in Fig. 5.26 (produced using NCSS). This hybrid plot also employs a data ‘smoothing’ method called a density trace to outline an approximate overall shape for the data distribution. Any one graphical method would tell some of the story, but combined in the hybrid display, the story of a relatively symmetrical set of mentabil scores becomes quite visually compelling.


A hybrid histogram-density-boxplot of the mentabil QCI variable

Violin Plots

Violin plots are a more recent and interesting EDA innovation, implemented in the NCSS software package (Hintze 2012 ). The violin plot gets its name from the rough shape that the plots tend to take on. Violin plots are another type of hybrid plot, this time combining density traces (mirror-imaged right and left so that the plots have a sense of symmetry and visual balance) with boxplot-type information (median, IQR and upper and lower inner ‘fences’, but not outliers). The goal of the violin plot is to provide a quick visual impression of the shape, central tendency and variability of a distribution (the length of the violin conveys a sense of the overall variability whereas the width of the violin conveys a sense of the frequency of scores occurring in a specific region).

Figure 5.27 (produced using NCSS), compares the distributions of speed scores for QCI inspectors across the five types of companies, plotted side-by-side. The violin plot conveys a similar story to the boxplot comparison for speed in the right graph of Fig. 5.25 . However, notice that with the violin plot, unlike with a boxplot, you also get a sense of distributions that have ‘clumps’ of scores in specific areas. Some violin plots, like that for Automotive manufacturing companies in Fig. 5.27 , have a shape suggesting a multi-modal distribution (recall Procedure 5.4 and the discussion of the fact that a distribution may have multiple modes). The violin plot in Fig. 5.27 has also been produced to show where the median (solid line) and mean (dashed line) would fall within each violin. This facilitates two interpretations: (1) a relative comparison of central tendency across the five companies and (2) relative degree of skewness in the distribution for each company (indicated by the separation of the two lines within a violin; skewness is particularly bad for the Large Business Computer manufacturing companies).


Violin plot comparisons of the speed QCI variable for different types of companies

EDA methods (of which we have illustrated only a small subset; we have not reviewed dot density diagrams, for example) provide summary techniques for visually displaying certain characteristics of a set of data. The advantage of the EDA methods over more traditional graphing techniques such as those described in Procedure 5.2 is that as much of the original integrity of the data is maintained as possible while maximising the amount of summary information available about distributional characteristics.

Stem & leaf displays maintain the data in as close to their original form as possible whereas boxplots and violin plots provide more symbolic and flexible representations. EDA methods are best thought of as communication devices designed to facilitate quick visual impressions and they can add interest to any statistical story being conveyed about a sample of data. NCSS, SYSTAT, STATGRAPHICS and R Commander generally offer more options and flexibility in the generation of EDA displays than SPSS.

EDA methods tend to get cumbersome if a great many variables or groups need to be summarised. In such cases, using numerical summary statistics (such as means and standard deviations) will provide a more economical and efficient summary. Boxplots or violin plots are generally more space efficient summary techniques than stem & leaf displays.

Often, EDA techniques are used as data screening devices, which are typically not reported in actual write-ups of research (we will discuss data screening in more detail in Procedure 10.1007/978-981-15-2537-7_8#Sec11). This is a perfectly legitimate use for the methods although there is an argument for researchers to put these techniques to greater use in published literature.

Software packages may use different rules for constructing EDA plots which means that you might get rather different looking plots and different information from different programs (you saw some evidence of this in Figs. 5.22 and 5.23 ). It is important to understand what the programs are using as decision rules for locating fences and outliers so that you are clear on how best to interpret the resulting plot—such information is generally contained in the user’s guides or manuals for NCSS (Hintze 2012 ), SYSTAT (SYSTAT Inc. 2009a , b ), STATGRAPHICS (StatPoint Technologies Inc. 2010 ) and SPSS (Norušis 2012 ).

Virtually any research design which produces numerical measures (even to the extent of just counting the number of occurrences of several events) provides opportunities for employing EDA displays which may help to clarify data characteristics or relationships. One extremely important use of EDA methods is as data screening devices for detecting outliers and other data anomalies, such as non-normality and skewness, before proceeding to parametric statistical analyses. In some cases, EDA methods can help the researcher to decide whether parametric or nonparametric statistical tests would be best to apply to his or her data because critical data characteristics such as distributional shape and spread are directly reflected.
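As a concrete illustration of this screening idea, boxplot-style outlier detection can be sketched in a few lines of Python. This is only a sketch using Tukey's common 1.5 × IQR fence rule and hypothetical scores; as noted above, the packages discussed in the text each implement their own variants of the fence and quartile rules.

```python
from statistics import median

# A minimal sketch of boxplot-style outlier screening using Tukey's
# 1.5 * IQR fences; note that software packages differ in their exact
# quartile and fence conventions.
def tukey_fences(scores):
    xs = sorted(scores)
    n = len(xs)
    half = n // 2
    lower = xs[:half]              # lower half (median excluded if n is odd)
    upper = xs[half + (n % 2):]    # upper half
    q1, q3 = median(lower), median(upper)
    iqr = q3 - q1
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr

speeds = [1.5, 2.3, 3.1, 3.8, 4.5, 5.0, 5.9, 7.1, 14.0]  # hypothetical data
low, high = tukey_fences(speeds)
print([x for x in speeds if x < low or x > high])  # [14.0]
```

The flagged value would then be inspected before any parametric analysis, rather than automatically discarded.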

Application Procedures
SPSS

produces stem-and-leaf displays and boxplots by default; variables may be explored on a whole-of-sample basis or broken down by the categories of a specific variable (called a ‘factor’ in the procedure). Cases can also be labelled with a variable (like in the QCI database), so that outlier points in the boxplot are identifiable.

can also be used to custom build different types of boxplots.

NCSS

produces a stem-and-leaf display by default.

can be used to produce box plots with different features (such as ‘notches’ and connecting lines).

can be configured to produce violin plots (by selecting the plot shape as ‘density with reflection’).

SYSTAT

can be used to produce stem-and-leaf displays for variables; however, you cannot really control any features of these displays.

can be used to produce boxplots of many types, with a number of features being controllable.

STATGRAPHICS

allows you to do a complete exploration of a single variable, including stem-and-leaf display (you need to select this option) and boxplot (produced by default). Some features of the boxplot can be controlled, but not features of the stem-and-leaf diagram.

and select either or which can produce not only descriptive statistics but also boxplots with some controllable features.

R Commander: the dialog box for each procedure offers some features of the display or plot that can be controlled; whole-of-sample boxplots or boxplots by groups are possible.

Procedure 5.7: Standard ( z ) Scores

In certain practical situations in behavioural research, it may be desirable to know where a specific individual’s score lies relative to all other scores in a distribution. A convenient measure is to observe how many standard deviations (see Procedure 5.5 ) above or below the sample mean a specific score lies. This measure is called a standard score or z -score . Very simply, any raw score can be converted to a z -score by subtracting the sample mean from the raw score and dividing that result by the sample’s standard deviation. z -scores can be positive or negative and their sign simply indicates whether the score lies above (+) or below (−) the mean in value. A z -score has a very simple interpretation: it measures the number of standard deviations above or below the sample mean a specific raw score lies.

In the QCI database, we have a sample mean for speed scores of 4.48 s and a standard deviation for speed scores of 2.89 s (recall Table 5.4 in Procedure 5.5 ). If we are interested in the z -score for Inspector 65’s raw speed score of 11.94 s, we would obtain a z -score of +2.58 using the method described above (subtract 4.48 from 11.94 and divide the result by 2.89). The interpretation of this number is that a raw decision speed score of 11.94 s lies about 2.6 standard deviations above the mean decision speed for the sample.
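The arithmetic can be sketched in a few lines of Python (values taken from the QCI example above; the text itself works with SPSS and related packages):

```python
# z = (raw score - sample mean) / sample standard deviation
mean_speed = 4.48    # QCI sample mean decision speed (seconds)
sd_speed = 2.89      # QCI sample standard deviation (seconds)
raw_speed = 11.94    # Inspector 65's raw speed score (seconds)

z = (raw_speed - mean_speed) / sd_speed
print(round(z, 2))   # 2.58
```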

z -scores have some interesting properties. First, if one converts (statisticians would say ‘transforms’) every available raw score in a sample to z -scores, the mean of these z -scores will always be zero and the standard deviation of these z -scores will always be 1.0. These two facts about z -scores (mean = 0; standard deviation = 1) will be true no matter what sample you are dealing with and no matter what the original units of measurement are (e.g. seconds, percentages, number of widgets assembled, amount of preference for a product, attitude rating, amount of money spent). This is because transforming raw scores to z -scores automatically changes the measurement units from whatever they originally were to a new system of measurements expressed in standard deviation units.
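This standardising property is easy to verify: transform any set of raw scores and the result will always have mean 0 and standard deviation 1. A minimal Python sketch, using made-up scores:

```python
from statistics import mean, stdev

raw = [2.0, 4.0, 4.0, 5.0, 7.0, 9.0]   # hypothetical scores, arbitrary units

m, s = mean(raw), stdev(raw)
z_scores = [(x - m) / s for x in raw]

# Whatever the original units were, the transformed scores have
# mean 0 and standard deviation 1 (up to floating-point precision).
print(abs(mean(z_scores)) < 1e-9, abs(stdev(z_scores) - 1.0) < 1e-9)  # True True
```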

Suppose Maree was interested in the performance statistics for the top 25% most accurate quality control inspectors in the sample. Given a sample size of 112, this would mean finding the top 28 inspectors in terms of their accuracy scores. Since Maree is interested in performance statistics, speed scores would also be of interest. Table 5.5 (generated using the SPSS Descriptives … procedure, listed using the Case Summaries … procedure and formatted for presentation using Excel) shows accuracy and speed scores for the top 28 inspectors in descending order of accuracy scores. The z -score transformation for each of these scores is also shown (last two columns) as are the type of company, education level and gender for each inspector.

Table 5.5 Listing of the 28 (top 25%) most accurate QCI inspectors’ accuracy and speed scores as well as standard ( z ) score transformations for each score

| Case | Inspector | Company | Educ level | Gender | Accuracy | Speed | Zaccuracy | Zspeed |
|---|---|---|---|---|---|---|---|---|
| 1 | 8 | PC Manufacturer | High School Only | Male | 100 | 1.52 | 1.95 | −1.03 |
| 2 | 9 | PC Manufacturer | High School Only | Female | 100 | 3.32 | 1.95 | −0.40 |
| 3 | 14 | PC Manufacturer | High School Only | Male | 100 | 3.83 | 1.95 | −0.23 |
| 4 | 17 | PC Manufacturer | High School Only | Female | 99 | 7.07 | 1.84 | 0.90 |
| 5 | 101 | PC Manufacturer | High School Only |  | 98 | 3.11 | 1.73 | −0.47 |
| 6 | 19 | PC Manufacturer | Tertiary Qualified | Female | 94 | 3.84 | 1.29 | −0.22 |
| 7 | 34 | Large Electrical Appliance Manufacturer | Tertiary Qualified | Male | 94 | 1.90 | 1.29 | −0.89 |
| 8 | 65 | Large Business Computer Manufacturer | High School Only | Male | 94 | 11.94 | 1.29 | 2.58 |
| 9 | 67 | Large Business Computer Manufacturer | High School Only | Male | 94 | 2.34 | 1.29 | −0.74 |
| 10 | 80 | Large Business Computer Manufacturer | High School Only | Female | 94 | 4.68 | 1.29 | 0.07 |
| 11 | 5 | PC Manufacturer | Tertiary Qualified | Male | 93 | 4.18 | 1.18 | −0.10 |
| 12 | 18 | PC Manufacturer | Tertiary Qualified | Male | 93 | 7.32 | 1.18 | 0.98 |
| 13 | 46 | Small Electrical Appliance Manufacturer | Tertiary Qualified | Female | 93 | 2.01 | 1.18 | −0.86 |
| 14 | 64 | Large Business Computer Manufacturer | High School Only | Female | 92 | 5.18 | 1.08 | 0.24 |
| 15 | 77 | Large Business Computer Manufacturer | Tertiary Qualified | Female | 92 | 6.11 | 1.08 | 0.56 |
| 16 | 79 | Large Business Computer Manufacturer | High School Only | Male | 92 | 4.38 | 1.08 | −0.03 |
| 17 | 106 | Large Electrical Appliance Manufacturer | Tertiary Qualified | Male | 92 | 1.70 | 1.08 | −0.96 |
| 18 | 58 | Small Electrical Appliance Manufacturer | High School Only | Male | 91 | 4.12 | 0.97 | −0.12 |
| 19 | 63 | Large Business Computer Manufacturer | High School Only | Male | 91 | 4.73 | 0.97 | 0.09 |
| 20 | 72 | Large Business Computer Manufacturer | Tertiary Qualified | Male | 91 | 4.72 | 0.97 | 0.08 |
| 21 | 20 | PC Manufacturer | High School Only | Male | 90 | 4.53 | 0.86 | 0.02 |
| 22 | 69 | Large Business Computer Manufacturer | High School Only | Male | 90 | 4.94 | 0.86 | 0.16 |
| 23 | 71 | Large Business Computer Manufacturer | High School Only | Female | 90 | 10.46 | 0.86 | 2.07 |
| 24 | 85 | Automobile Manufacturer | Tertiary Qualified | Female | 90 | 3.14 | 0.86 | −0.46 |
| 25 | 111 | Large Business Computer Manufacturer | High School Only | Male | 90 | 4.11 | 0.86 | −0.13 |
| 26 | 6 | PC Manufacturer | High School Only | Male | 89 | 5.46 | 0.75 | 0.34 |
| 27 | 61 | Large Business Computer Manufacturer | Tertiary Qualified | Male | 89 | 5.71 | 0.75 | 0.43 |
| 28 | 75 | Large Business Computer Manufacturer | High School Only | Male | 89 | 12.05 | 0.75 | 2.62 |

There are three inspectors (8, 9 and 14) who scored maximum accuracy of 100%. Such accuracy converts to a z -score of +1.95. Thus 100% accuracy is 1.95 standard deviations above the sample’s mean accuracy level. Interestingly, all three inspectors worked for PC manufacturers and all three had only high school-level education. The least accurate inspector in the top 25% had a z -score for accuracy that was .75 standard deviations above the sample mean.

Interestingly, the top three inspectors in terms of accuracy had decision speeds that fell below the sample’s mean speed; inspector 8 was the fastest inspector of the three with a speed just over 1 standard deviation ( z  = −1.03) below the sample mean. The slowest inspector in the top 25% was inspector 75 (case #28 in the list) with a speed z -score of +2.62; i.e., he was over two and a half standard deviations slower in making inspection decisions relative to the sample’s mean speed.

The fact that z -scores always have a common measurement scale having a mean of 0 and a standard deviation of 1.0 leads to an interesting application of standard scores. Suppose we focus on inspector number 65 (case #8 in the list) in Table 5.5 . It might be of interest to compare this inspector’s quality control performance in terms of both his decision accuracy and decision speed. Such a comparison is impossible using raw scores since the inspector’s accuracy score and speed scores are different measures which have differing means and standard deviations expressed in fundamentally different units of measurement (percentages and seconds). However, if we are willing to assume that the score distributions for both variables are approximately the same shape and that both accuracy and speed are measured with about the same level of reliability or consistency (see Procedure 10.1007/978-981-15-2537-7_8#Sec1), we can compare the inspector’s two scores by first converting them to z -scores within their own respective distributions as shown in Table 5.5 .

Inspector 65 looks rather anomalous in that he demonstrated a relatively high level of accuracy (raw score = 94%; z  = +1.29) but took a very long time to make those accurate decisions (raw score = 11.94 s; z  = +2.58). Contrast this with inspector 106 (case #17 in the list) who demonstrated a similar level of accuracy (raw score = 92%; z  = +1.08) but took a much shorter time to make those accurate decisions (raw score = 1.70 s; z  = −.96). In terms of evaluating performance, from a company perspective, we might conclude that inspector 106 is performing at an overall higher level than inspector 65 because he can achieve a very high level of accuracy but much more quickly; accurate and fast is more cost effective and efficient than accurate and slow.

Note: We should be cautious here since we know from our previous explorations in Procedure 5.6 that accuracy scores look fairly symmetrical whereas speed scores are positively skewed. Assuming that the two variables have the same distributional shape, which z -score comparisons require, would therefore be problematic.

You might have noticed that as you scanned down the two columns of z -scores in Table 5.5 , there was a suggestion of a pattern between the signs attached to the respective z -scores for each person. There seems to be a very slight preponderance of pairs of z -scores where the signs are reversed (12 out of 22 pairs). This observation provides some very preliminary evidence to suggest that there may be a relationship between inspection accuracy and decision speed, namely that a more accurate decision tends to be associated with a faster decision speed. Of course, this pattern would be better verified using the entire sample rather than the top 25% of inspectors. However, you may find it interesting to learn that it is precisely this sort of suggestive evidence (about agreement or disagreement between z -score signs for pairs of variable scores throughout a sample) that is captured and summarised by a single statistical indicator called a ‘correlation coefficient’ (see Fundamental Concept 10.1007/978-981-15-2537-7_6#Sec1 and Procedure 10.1007/978-981-15-2537-7_6#Sec4).

z -scores are not the only type of standard score that is commonly used. Three other types of standard scores are: stanines (standard nines), IQ scores and T-scores (not to be confused with the t -test described in Procedure 10.1007/978-981-15-2537-7_7#Sec22). These other types of scores have the advantage of producing only positive integer scores rather than positive and negative decimal scores. This makes interpretation somewhat easier for certain applications. However, you should know that almost all other types of standard scores come from a specific transformation of z -scores. This is because once you have converted raw scores into z -scores, they can then be quite readily transformed into any other system of measurement by simply multiplying a person’s z -score by the new desired standard deviation for the measure and adding to that product the new desired mean for the measure.

T-scores are simply z-scores transformed to have a mean of 50.0 and a standard deviation of 10.0; IQ scores are simply z-scores transformed to have a mean of 100 and a standard deviation of 15 (or 16 in some systems). For more information, see Fundamental Concept II .
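The rescaling rule described above (multiply the z-score by the desired standard deviation, then add the desired mean) can be sketched as follows; the z-score used is Inspector 65's accuracy z from Table 5.5:

```python
def rescale(z, new_mean, new_sd):
    """Convert a z-score into another standard-score system."""
    return z * new_sd + new_mean

z_accuracy = 1.29   # Inspector 65's accuracy z-score (Table 5.5)

t_score = rescale(z_accuracy, new_mean=50.0, new_sd=10.0)    # T-score system
iq_score = rescale(z_accuracy, new_mean=100.0, new_sd=15.0)  # IQ system (SD 15)
print(round(t_score, 2), round(iq_score, 2))  # 62.9 119.35
```

Note how both results are positive and integer-like, which is exactly why these alternative systems are easier to present to lay audiences.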

Standard scores are useful for representing the position of each raw score within a sample distribution relative to the mean of that distribution. The unit of measurement becomes the number of standard deviations a specific score is away from the sample mean. As such, z -scores can permit cautious comparisons across samples or across different variables having vastly differing means and standard deviations within the constraints of the comparison samples having similarly shaped distributions and roughly equivalent levels of measurement reliability. z -scores also form the basis for establishing the degree of correlation between two variables. Transforming raw scores into z -scores does not change the shape of a distribution or rank ordering of individuals within that distribution. For this reason, a z -score is referred to as a linear transformation of a raw score. Interestingly, z -scores provide an important foundational element for more complex analytical procedures such as factor analysis ( Procedure 10.1007/978-981-15-2537-7_6#Sec36), cluster analysis ( Procedure 10.1007/978-981-15-2537-7_6#Sec41) and multiple regression analysis (see, for example, Procedure 10.1007/978-981-15-2537-7_6#Sec27 and 10.1007/978-981-15-2537-7_7#Sec86).

While standard scores are useful indices, they are subject to restrictions if used to compare scores across samples or across different variables. The samples must have similar distribution shapes for the comparisons to be meaningful and the measures must have similar levels of reliability in each sample. The groups used to generate the z -scores should also be similar in composition (with respect to age, gender distribution, and so on). Because z -scores are not an intuitively meaningful way of presenting scores to lay-persons, many other types of standard score schemes have been devised to improve interpretability. However, most of these schemes produce scores that run a greater risk of facilitating lay-person misinterpretations simply because their connection with z -scores is hidden or because the resulting numbers ‘look’ like a more familiar type of score which people do intuitively understand.

It is extremely rare for a T-score to exceed 100 or go below 0 because this would mean that the raw score was in excess of 5 standard deviations away from the sample mean. This unfortunately means that T-scores are often misinterpreted as percentages because they typically range between 0 and 100 and therefore ‘look’ like percentages. However, T-scores are definitely not percentages.

Finally, a common misunderstanding of z -scores is that transforming raw scores into z -scores makes them follow a normal distribution (see Fundamental Concept II ). This is not the case. The distribution of z -scores will have exactly the same shape as that for the raw scores; if the raw scores are positively skewed, then the corresponding z -scores will also be positively skewed.

z -scores are particularly useful in evaluative studies where relative performance indices are of interest. Whenever you compute a correlation coefficient ( Procedure 10.1007/978-981-15-2537-7_6#Sec4), you are implicitly transforming the two variables involved into z -scores (which equates the variables in terms of mean and standard deviation), so that only the patterning in the relationship between the variables is represented. z -scores are also useful as a preliminary step to more advanced parametric statistical methods when variables differing in scale, range and/or measurement units must be equated for means and standard deviations prior to analysis.

Application Procedures
SPSS and tick the box labelled ‘Save standardized values as variables’. z-scores are saved as new variables (labelled as Z followed by the original variable name, as shown in Table 5.5) which can then be listed or analysed further.
NCSS and select a new variable to hold the z-scores, then select the ‘STANDARDIZE’ transformation from the list of available functions. z-scores are saved as new variables which can then be listed or analysed further.
SYSTAT where z-scores are saved as new variables which can then be listed or analysed further.
STATGRAPHICS Open the window, and select an empty column in the database, then choose the ‘STANDARDIZE’ transformation, choose the variable you want to transform and give the new variable a name.
R Commander and select the variables you want to standardize; R Commander automatically saves the transformed variable to the database, appending Z. to the front of each variable’s name.

Fundamental Concept II: The Normal Distribution

Arguably the most fundamental distribution used in the statistical analysis of quantitative data in the behavioural and social sciences is the normal distribution (also known as the Gaussian or bell-shaped distribution ). Many behavioural phenomena, if measured on a large enough sample of people, tend to produce ‘normally distributed’ variable scores. This includes most measures of ability, performance and productivity, personality characteristics and attitudes. The normal distribution is important because it is the one form of distribution that you must assume describes the scores of a variable in the population when parametric tests of statistical inference are undertaken. The standard normal distribution is defined as having a population mean of 0.0 and a population standard deviation of 1.0. The normal distribution is also important as a means of interpreting various types of scoring systems.

Figure 5.28 displays the standard normal distribution (mean = 0; standard deviation = 1.0) and shows that there is a clear link between z -scores and the normal distribution. Statisticians have analytically calculated the probability (also expressed as percentages or percentiles) that observations will fall above or below any specific z -score in the theoretical standard normal distribution. Thus, a z -score of +1.0 in the standard normal distribution will have 84.13% (equals a probability of .8413) of observations in the population falling at or below one standard deviation above the mean and 15.87% falling above that point. A z -score of −2.0 will have 2.28% of observations falling at that point or below and 97.72% of observations falling above that point. It is clear then that, in a standard normal distribution, z -scores have a direct relationship with percentiles .
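These percentile figures can be reproduced with Python's `statistics.NormalDist` (a sketch for verification; this is not part of the text's software toolkit):

```python
from statistics import NormalDist

std_normal = NormalDist(mu=0.0, sigma=1.0)

# Proportion of observations at or below a given z-score:
print(round(std_normal.cdf(1.0), 4))       # 0.8413 (84.13% at or below z = +1.0)
print(round(std_normal.cdf(-2.0), 4))      # 0.0228 (2.28% at or below z = -2.0)
print(round(1 - std_normal.cdf(-2.0), 4))  # 0.9772 (97.72% above z = -2.0)
```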


The normal (bell-shaped or Gaussian) distribution

Figure 5.28 also shows how T-scores relate to the standard normal distribution and to z -scores. The mean T-score falls at 50 and each increment or decrement of 10 T-score units means a movement of another standard deviation away from this mean of 50. Thus, a T-score of 80 corresponds to a z -score of +3.0—a score 3 standard deviations higher than the mean of 50.

Of special interest to behavioural researchers are the values for z -scores in a standard normal distribution that encompass 90% of observations ( z  = ±1.645—isolating 5% of the distribution in each tail), 95% of observations ( z  = ±1.96—isolating 2.5% of the distribution in each tail), and 99% of observations ( z  = ±2.58—isolating 0.5% of the distribution in each tail).

Depending upon the degree of certainty required by the researcher, these bands describe regions outside of which one might define an observation as being atypical or as perhaps not belonging to a distribution being centred at a mean of 0.0. Most often, what is taken as atypical or rare in the standard normal distribution is a score at least two standard deviations away from the mean, in either direction. Why choose two standard deviations? Since in the standard normal distribution, only about 5% of observations will fall outside a band defined by z -scores of ±1.96 (rounded to 2 for simplicity), this equates to data values that are 2 standard deviations away from their mean. This can give us a defensible way to identify outliers or extreme values in a distribution.
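The critical z-values quoted above can be recovered from the inverse of the standard normal cumulative distribution; a Python sketch:

```python
from statistics import NormalDist

snd = NormalDist()  # standard normal: mean 0, standard deviation 1

# z-scores that bound the middle 90%, 95% and 99% of observations
for coverage in (0.90, 0.95, 0.99):
    tail = (1 - coverage) / 2          # probability isolated in each tail
    cutoff = snd.inv_cdf(1 - tail)
    print(f"{coverage:.0%} of observations lie within z = ±{cutoff:.3f}")
# prints ±1.645, ±1.960, ±2.576 (the last commonly rounded to 2.58)
```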

Thinking ahead to what you will encounter in Chap. 10.1007/978-981-15-2537-7_7, this ‘banding’ logic can be extended into the world of statistics (like means and percentages) as opposed to just the world of observations. You will frequently hear researchers speak of some statistic estimating a specific value (a parameter ) in a population, plus or minus some other value.

A survey organisation might report political polling results in terms of a percentage and an error band, e.g. 59% of Australians indicated that they would vote Labor at the next federal election, plus or minus 2%.

Most commonly, this error band (±2%) is defined by possible values for the population parameter that are about two standard deviations (or two standard errors—a concept discussed further in Fundamental Concept 10.1007/978-981-15-2537-7_7#Sec14) away from the reported or estimated statistical value. In effect, the researcher is saying that on 95% of the occasions he/she would theoretically conduct his/her study, the population value estimated by the statistic being reported would fall between the limits imposed by the endpoints of the error band (the official name for this error band is a confidence interval ; see Procedure 10.1007/978-981-15-2537-7_8#Sec18). The well-understood mathematical properties of the standard normal distribution are what make such precise statements about levels of error in statistical estimates possible.
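To make the polling example concrete, here is a hedged sketch of where such a ±2% band can come from: the standard error of a sample proportion, scaled by the z-value covering 95% of the standard normal distribution. The sample size used here is hypothetical.

```python
from math import sqrt
from statistics import NormalDist

# Hypothetical poll: 59% support in a sample of n respondents.
p, n = 0.59, 2400
se = sqrt(p * (1 - p) / n)          # standard error of a sample proportion
z95 = NormalDist().inv_cdf(0.975)   # about 1.96, often rounded to 2
margin = z95 * se                   # half-width of the 95% error band
print(f"{p:.0%} plus or minus {margin:.1%}")  # 59% plus or minus 2.0%
```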

Checking for Normality

It is important to understand that transforming the raw scores for a variable to z -scores (recall Procedure 5.7 ) does not produce z -scores which follow a normal distribution; rather they will have the same distributional shape as the original scores. However, if you are willing to assume that the normal distribution is the correct reference distribution in the population, then you are justified in interpreting z -scores in light of the known characteristics of the normal distribution.

In order to justify this assumption, not only to enhance the interpretability of z -scores but more generally to enhance the integrity of parametric statistical analyses, it is helpful to actually look at the sample frequency distributions for variables (using a histogram (illustrated in Procedure 5.2 ) or a boxplot (illustrated in Procedure 5.6 ), for example), since non-normality can often be visually detected. It is important to note that in the social and behavioural sciences as well as in economics and finance, certain variables tend to be non-normal by their very nature. This includes variables that measure time taken to complete a task, achieve a goal or make decisions and variables that measure, for example, income, occurrence of rare or extreme events or organisational size. Such variables tend to be positively skewed in the population, a pattern that can often be confirmed by graphing the distribution.

If you cannot justify an assumption of ‘normality’, you may be able to force the data to be normally distributed by using what is called a ‘normalising transformation’. Such transformations will usually involve a nonlinear mathematical conversion (such as computing the logarithm, square root or reciprocal) of the raw scores. Such transformations will force the data to take on a more normal appearance so that the assumption of ‘normality’ can be reasonably justified, but at the cost of creating a new variable whose units of measurement and interpretation are more complicated. [For some non-normal variables, such as the occurrence of rare, extreme or catastrophic events (e.g. a 100-year flood or forest fire, coronavirus pandemic, the Global Financial Crisis or other type of financial crisis, man-made or natural disaster), the distributions cannot be ‘normalised’. In such cases, the researcher needs to model the distribution as it stands. For such events, extreme value theory (e.g. see Diebold et al. 2000 ) has proven very useful in recent years. This theory uses a variation of the Pareto or Weibull distribution as a reference, rather than the normal distribution, when making predictions.]
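The effect of a normalising transformation can also be illustrated numerically. The sketch below uses hypothetical positively skewed scores and a simple moment-based skewness estimate; it is not the text's NCSS output.

```python
from math import log10
from statistics import mean, stdev

def skewness(xs):
    """Adjusted moment-based sample skewness estimate."""
    m, s, n = mean(xs), stdev(xs), len(xs)
    return sum(((x - m) / s) ** 3 for x in xs) * n / ((n - 1) * (n - 2))

# Hypothetical positively skewed 'speed'-like scores (seconds).
speed = [1.2, 1.5, 1.9, 2.2, 2.8, 3.1, 3.9, 4.4, 5.8, 7.9, 11.6, 17.4]
log_speed = [log10(x) for x in speed]

# The log10 transform pulls the long right tail in toward the mean,
# shrinking the skewness estimate substantially.
print(round(skewness(speed), 2), round(skewness(log_speed), 2))
```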

Figure 5.29 displays before and after pictures of the effects of a logarithmic transformation on the positively skewed speed variable from the QCI database. Each graph, produced using NCSS, is of the hybrid histogram-density trace-boxplot type first illustrated in Procedure 5.6 . The left graph clearly shows the strong positive skew in the speed scores and the right graph shows the result of taking the log 10 of each raw score.


Combined histogram-density trace-boxplot graphs displaying the before and after effects of a ‘normalising’ log 10 transformation of the speed variable

Notice how the long tail toward slow speed scores is pulled in toward the mean and the very short tail toward fast speed scores is extended away from the mean. The result is a more ‘normal’ appearing distribution. The assumption would then be that we could assume normality of speed scores, but only in a log 10 format (i.e. it is the log of speed scores that we assume is normally distributed in the population). In general, taking the logarithm of raw scores provides a satisfactory remedy for positively skewed distributions (but not for negatively skewed ones). Furthermore, anything we do with the transformed speed scores now has to be interpreted in units of log 10 (seconds) which is a more complex interpretation to make.

Another visual method for detecting non-normality is to graph what is called a normal Q-Q plot (the Q-Q stands for Quantile-Quantile). This plots the percentiles for the observed data against the percentiles for the standard normal distribution (see Cleveland 1995 for more detailed discussion; also see Lane 2007 , http://onlinestatbook.com/2/advanced_graphs/q-q_plots.html). If the pattern for the observed data follows a normal distribution, then all the points on the graph will fall approximately along a diagonal line.
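The numbers underlying a normal Q-Q plot can be computed directly. The sketch below pairs each ordered observation with the z-score expected at its plotting position, using one common convention and hypothetical data; statistical packages vary in the exact plotting-position formula they use.

```python
from statistics import NormalDist

# Pair each ordered observation with the z-score expected for its
# plotting position, using the common (i - 0.5)/n convention.
def qq_points(xs):
    nd = NormalDist()
    xs = sorted(xs)
    n = len(xs)
    return [(nd.inv_cdf((i - 0.5) / n), x) for i, x in enumerate(xs, 1)]

sample = [2.1, 2.4, 2.9, 3.0, 3.3, 3.8, 4.1, 4.6]  # hypothetical scores
for theoretical_z, observed in qq_points(sample):
    print(f"{theoretical_z:6.3f}  {observed}")
# Near-normal data will fall approximately along a straight line
# when these pairs are plotted.
```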

Figure 5.30 shows the normal Q-Q plots for the original speed variable and the transformed log-speed variable, produced using the SPSS Explore... procedure. The diagnostic diagonal line is shown on each graph. In the left-hand plot, for speed , the plot points clearly deviate from the diagonal in a way that signals positive skewness. The right-hand plot, for log_speed, shows the plot points generally falling along the diagonal line thereby conforming much more closely to what is expected in a normal distribution.


Normal Q-Q plots for the original speed variable and the new log_speed variable

In addition to visual ways of detecting non-normality, there are also numerical ways. As highlighted in Chap. 10.1007/978-981-15-2537-7_1, there are two additional characteristics of any distribution, namely skewness (asymmetric distribution tails) and kurtosis (peakedness of the distribution). Both have an associated statistic that provides a measure of that characteristic, similar to the mean and standard deviation statistics. In a normal distribution, the values for the skewness and kurtosis statistics are both zero (skewness = 0 means a symmetric distribution; kurtosis = 0 means a mesokurtic distribution). The further away each statistic is from zero, the more the distribution deviates from a normal shape. Both the skewness statistic and the kurtosis statistic have standard errors (see Fundamental Concept 10.1007/978-981-15-2537-7_7#Sec14) associated with them (which work very much like the standard deviation, only for a statistic rather than for observations); these can be routinely computed by almost any statistical package when you request a descriptive analysis. Without going into the logic right now (this will come in Fundamental Concept 10.1007/978-981-15-2537-7_7#Sec1), a rough rule of thumb you can use to check for normality using the skewness and kurtosis statistics is to do the following:

  • Prepare : Take the standard error for the statistic and multiply it by 2 (or 3 if you want to be more conservative).
  • Interval : Add the result from the Prepare step to the value of the statistic and subtract the result from the value of the statistic. You will end up with two numbers, one low and one high, that define the ends of an interval (what you have just created approximates what is called a ‘confidence interval’, see Procedure 10.1007/978-981-15-2537-7_8#Sec18).
  • Check : If zero falls inside of this interval (i.e. between the low and high endpoints from the Interval step), then there is likely to be no significant issue with that characteristic of the distribution. If zero falls outside of the interval (i.e. lower than the low value endpoint or higher than the high value endpoint), then you likely have an issue with non-normality with respect to that characteristic.
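The three steps can be captured in a small helper function; the sketch below applies it to the skewness values reported for the QCI speed and log_speed variables:

```python
def normality_check(statistic, std_error, multiplier=2):
    """Rule-of-thumb check: does zero fall inside statistic +/- m * SE?"""
    low = statistic - multiplier * std_error
    high = statistic + multiplier * std_error
    return low, high, low <= 0.0 <= high

# Skewness of the original speed variable (statistic 1.487, SE .229):
low, high, ok = normality_check(1.487, 0.229)
print(f"{low:.3f} .. {high:.3f}; zero inside: {ok}")   # 1.029 .. 1.945; zero inside: False

# Skewness of log_speed (statistic -.050, SE .229):
low, high, ok = normality_check(-0.050, 0.229)
print(f"{low:.3f} .. {high:.3f}; zero inside: {ok}")   # -0.508 .. 0.408; zero inside: True
```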

Visually, we saw in the left graph in Fig. 5.29 that the speed variable was highly positively skewed. What if Maree wanted to check some numbers to support this judgment? She could ask SPSS to produce the skewness and kurtosis statistics for both the original speed variable and the new log_speed variable using the Frequencies... or the Explore... procedure. Table 5.6 shows what SPSS would produce if the Frequencies ... procedure were used.

Skewness and kurtosis statistics and their standard errors for both the original speed variable and the new log_speed variable

| Variable | Skewness | SE (skewness) | Kurtosis | SE (kurtosis) |
|---|---|---|---|---|
| speed | 1.487 | .229 | 3.071 | .455 |
| log_speed | −.050 | .229 | −.672 | .455 |

Using the 3-step check rule described above, Maree could roughly evaluate the normality of the two variables as follows:

  • skewness of speed : [Prepare] 2 × .229 = .458 ➔ [Interval] 1.487 − .458 = 1.029 and 1.487 + .458 = 1.945 ➔ [Check] zero does not fall inside the interval bounded by 1.029 and 1.945, so there appears to be a significant problem with skewness. Since the value of the skewness statistic (1.487) is positive, the problem is positive skewness, confirming what the left graph in Fig. 5.29 showed.
  • kurtosis of speed : [Prepare] 2 × .455 = .91 ➔ [Interval] 3.071 − .91 = 2.161 and 3.071 + .91 = 3.981 ➔ [Check] zero does not fall inside the interval bounded by 2.161 and 3.981, so there appears to be a significant problem with kurtosis. Since the value of the kurtosis statistic (3.071) is positive, the problem is leptokurtosis: the peak of the distribution is too tall relative to what is expected in a normal distribution.
  • skewness of log_speed : [Prepare] 2 × .229 = .458 ➔ [Interval] −.050 − .458 = −.508 and −.050 + .458 = .408 ➔ [Check] zero falls within the interval bounded by −.508 and .408, so there appears to be no problem with skewness. The log transform appears to have corrected the problem, confirming what the right graph in Fig. 5.29 showed.
  • kurtosis of log_speed : [Prepare] 2 × .455 = .91 ➔ [Interval] −.672 − .91 = −1.582 and −.672 + .91 = .238 ➔ [Check] zero falls within the interval bounded by −1.582 and .238, so there appears to be no problem with kurtosis. The log transform appears to have corrected this problem as well, rendering the distribution more nearly mesokurtic (i.e. normal) in shape.
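Maree's four checks can be reproduced programmatically. A small Python sketch, hard-coding the statistics and standard errors from Table 5.6:

```python
# Statistics and standard errors from Table 5.6: (statistic, standard error)
checks = {
    "speed skewness":     (1.487, 0.229),
    "speed kurtosis":     (3.071, 0.455),
    "log_speed skewness": (-0.050, 0.229),
    "log_speed kurtosis": (-0.672, 0.455),
}

for name, (stat, se) in checks.items():
    margin = 2 * se                              # Prepare
    low, high = stat - margin, stat + margin     # Interval
    problem = not (low <= 0 <= high)             # Check
    print(f"{name}: interval ({low:.3f}, {high:.3f}) -> "
          f"{'problem' if problem else 'no problem'}")
```

This reproduces the conclusions above: both statistics for speed flag problems, and neither statistic for log_speed does.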

There are also more formal tests of significance (see Fundamental Concept 10.1007/978-981-15-2537-7_7#Sec1) that one can use to numerically evaluate normality, such as the Kolmogorov-Smirnov test and the Shapiro-Wilk test. Both tests can be produced by SPSS on request via the Explore... procedure.
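As an illustration of those formal tests, here is a minimal Python sketch using scipy, with simulated positively skewed data as a stand-in; small p-values suggest departure from normality:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
sample = rng.lognormal(mean=0.0, sigma=0.6, size=200)  # positively skewed stand-in

# Shapiro-Wilk test of normality
w_stat, p_shapiro = stats.shapiro(sample)

# Kolmogorov-Smirnov test against a normal with the sample's mean and SD
# (standardizing first; strictly, estimating the parameters from the same
# sample makes this version of the test approximate)
z = (sample - sample.mean()) / sample.std(ddof=1)
ks_stat, p_ks = stats.kstest(z, "norm")

print(f"Shapiro-Wilk p = {p_shapiro:.4g}; Kolmogorov-Smirnov p = {p_ks:.4g}")
```

For this skewed sample the Shapiro-Wilk test should reject normality; applying `np.log` to the sample first would remove the skew, as in Maree's log transform.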

1 For more information, see Chap. 10.1007/978-981-15-2537-7_1, The language of statistics.

References for Procedure 5.1

  • Allen P, Bennett K, Heritage B. SPSS statistics: A practical guide. 4th ed. South Melbourne, VIC: Cengage Learning Australia Pty; 2019.
  • George D, Mallery P. IBM SPSS statistics 25 step by step: A simple guide and reference. 15th ed. New York: Routledge; 2019.

Useful Additional Readings for Procedure 5.1

  • Agresti A. Statistical methods for the social sciences. 5th ed. Boston: Pearson; 2018.
  • Argyrous G. Statistics for research: With a guide to SPSS. 3rd ed. London: Sage; 2011.
  • De Vaus D. Analyzing social science data: 50 key problems in data analysis. London: Sage; 2002.
  • Glass GV, Hopkins KD. Statistical methods in education and psychology. 3rd ed. Upper Saddle River, NJ: Pearson; 1996.
  • Gravetter FJ, Wallnau LB. Statistics for the behavioural sciences. 10th ed. Belmont, CA: Wadsworth Cengage; 2017.
  • Steinberg WJ. Statistics alive. 2nd ed. Los Angeles: Sage; 2011.

References for Procedure 5.2

  • Chang W. R graphics cookbook: Practical recipes for visualizing data. 2nd ed. Sebastopol, CA: O’Reilly Media; 2019.
  • Jacoby WG. Statistical graphics for univariate and bivariate data. Thousand Oaks, CA: Sage; 1997.
  • McCandless D. Knowledge is beautiful. London: William Collins; 2014.
  • Smithson MJ. Statistics with confidence. London: Sage; 2000.
  • Toseland M, Toseland S. Infographica: The world as you have never seen it before. London: Quercus Books; 2012.
  • Wilkinson L. Cognitive science and graphic design. In: SYSTAT Software Inc, editor. SYSTAT 13: Graphics. Chicago, IL: SYSTAT Software Inc; 2009. pp. 1–21.

Useful Additional Readings for Procedure 5.2

  • Field A. Discovering statistics using SPSS for Windows. 5th ed. Los Angeles: Sage; 2018.
  • George D, Mallery P. IBM SPSS statistics 25 step by step: A simple guide and reference. 15th ed. Boston, MA: Pearson Education; 2019.
  • Hintze JL. NCSS 8 help system: Graphics. Kaysville, UT: Number Cruncher Statistical Systems; 2012.
  • StatPoint Technologies, Inc. STATGRAPHICS Centurion XVI user manual. Warrenton, VA: StatPoint Technologies Inc.; 2010.
  • SYSTAT Software Inc. SYSTAT 13: Graphics. Chicago, IL: SYSTAT Software Inc; 2009.

References for Procedure 5.3

  • Cleveland WR. Visualizing data. Summit, NJ: Hobart Press; 1995.
  • Jacoby WJ. Statistical graphics for visualizing multivariate data. Thousand Oaks, CA: Sage; 1998.

Useful Additional Readings for Procedure 5.3

  • Kirk A. Data visualisation: A handbook for data driven design. Los Angeles: Sage; 2016.
  • Knaflic CN. Storytelling with data: A data visualization guide for business professionals. Hoboken, NJ: Wiley; 2015.
  • Tufte E. The visual display of quantitative information. 2nd ed. Cheshire, CT: Graphics Press; 2001.

Reference for Procedure 5.4

Useful Additional Readings for Procedure 5.4

  • Rosenthal R, Rosnow RL. Essentials of behavioral research: Methods and data analysis. 2nd ed. New York: McGraw-Hill Inc; 1991.

References for Procedure 5.5

Useful Additional Readings for Procedure 5.5

  • Gravetter FJ, Wallnau LB. Statistics for the behavioural sciences. 9th ed. Belmont, CA: Wadsworth Cengage; 2012.

References for Fundamental Concept I

Useful Additional Readings for Fundamental Concept I

  • Howell DC. Statistical methods for psychology. 8th ed. Belmont, CA: Cengage Wadsworth; 2013.

References for Procedure 5.6

  • Norušis MJ. IBM SPSS statistics 19 guide to data analysis. Upper Saddle River, NJ: Prentice Hall; 2012.
  • Field A. Discovering statistics using SPSS for Windows. 5th ed. Los Angeles: Sage; 2018.
  • Hintze JL. NCSS 8 help system: Introduction. Kaysville, UT: Number Cruncher Statistical System; 2012.
  • SYSTAT Software Inc. SYSTAT 13: Statistics - I. Chicago, IL: SYSTAT Software Inc; 2009.

Useful Additional Readings for Procedure 5.6

  • Hartwig F, Dearing BE. Exploratory data analysis. Beverly Hills, CA: Sage; 1979.
  • Leinhardt G, Leinhardt L. Exploratory data analysis. In: Keeves JP, editor. Educational research, methodology, and measurement: An international handbook. 2nd ed. Oxford: Pergamon Press; 1997. pp. 519–528.
  • Rosenthal R, Rosnow RL. Essentials of behavioral research: Methods and data analysis. 2nd ed. New York: McGraw-Hill, Inc.; 1991.
  • Tukey JW. Exploratory data analysis. Reading, MA: Addison-Wesley Publishing; 1977.
  • Velleman PF, Hoaglin DC. ABC’s of EDA. Boston: Duxbury Press; 1981.

Useful Additional Readings for Procedure 5.7

References for Fundamental Concept II

  • Diebold FX, Schuermann T, Stroughair D. Pitfalls and opportunities in the use of extreme value theory in risk management. The Journal of Risk Finance. 2000;1(2):30–35. doi: 10.1108/eb043443.
  • Lane D. Online statistics education: A multimedia course of study. Houston, TX: Rice University; 2007.

Useful Additional Readings for Fundamental Concept II

  • Keller DK. The tao of statistics: A path to understanding (with no math). Thousand Oaks, CA: Sage; 2006.




Evolution of Research Reporting Standards: Adapting to the Influence of Artificial Intelligence, Statistics Software, and Writing Tools

Affiliations

  • 1 Division of Rheumatology, Department of Internal Medicine, School of Medicine, University of Jordan, Amman, Jordan. [email protected].
  • 2 Department of Internal Medicine, School of Medicine, University of Jordan, Amman, Jordan.
  • PMID: 39164055
  • PMCID: PMC11333804
  • DOI: 10.3346/jkms.2024.39.e231

Reporting standards are essential to health research because they improve accuracy and transparency. Over time, the requirements for reporting research have changed significantly to ensure comprehensive and transparent reporting across a range of study domains and to foster methodological rigor. The establishment of the Declaration of Helsinki, the Consolidated Standards of Reporting Trials (CONSORT), the Strengthening the Reporting of Observational Studies in Epidemiology (STROBE) statement, and the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) are just a few of the historic initiatives that have increased research transparency. Through enhanced discoverability, statistical analysis facilitation, article quality enhancement, and language barrier reduction, artificial intelligence (AI), in particular large language models like ChatGPT, has transformed academic writing. However, concerns remain about potential errors and the need for transparency while utilizing AI tools. Modifying reporting rules to include AI-driven writing tools such as ChatGPT is ethically and practically challenging. In academic writing, precautions for truth, privacy, and responsibility are necessary due to concerns about biases, openness, data limits, and potential legal ramifications. The CONSORT-AI and Standard Protocol Items: Recommendations for Interventional Trials (SPIRIT)-AI Steering Group has extended the CONSORT guidelines to AI clinical trials, and new checklists like METRICS and CLEAR help to promote transparency in AI studies. Responsible adoption of technology in research and writing requires interdisciplinary collaboration and ethical assessment. This study explores the impact of AI technologies, specifically ChatGPT, on past reporting standards and the need for revised guidelines for open, reproducible, and robust scientific publications.

Keywords: Artificial Intelligence; ChatGPT; Data Reporting; Machine Learning; Research.

© 2024 The Korean Academy of Medical Sciences.


Conflict of interest statement

The authors have no potential conflicts of interest to disclose.





Academic Referencing: How to Cite a Research Paper


Learning how to conduct accurate, discipline-specific academic research can feel daunting at first. But with a solid understanding of why we use academic citations, coupled with knowledge of the basics, you’ll learn how to cite sources with accuracy and confidence.


When it comes to academic research, citing sources correctly is arguably as important as the research itself. "Your instructors are expecting your work to adhere to these professional standards," said Amanda Girard, research support manager of Shapiro Library at Southern New Hampshire University (SNHU).

With Shapiro Library for the past three years, Girard manages the library’s research support services, which include SNHU’s 24/7 library chat and email support. She holds an undergraduate degree in professional writing and a graduate degree in library and information science. She said that accurate citations show that you have done your research on a topic and are knowledgeable about current ideas from those actively working in the field.

In other words, when you cite sources according to the academic style of your discipline, you’re giving credit where credit is due.

Why Cite Sources?

Citing sources properly ensures you’re following high academic and professional standards for integrity and ethics.


“When you cite a source, you can ethically use others’ research. If you are not adequately citing the information you claim in your work, it would be considered plagiarism,” said Shannon Geary '16, peer tutor at SNHU.

Geary has an undergraduate degree in communication from SNHU and has served on the academic support team for close to two years. Her job includes helping students learn how to conduct research and write academically.

“In academic writing, it is crucial to state where you are receiving your information from,” she said. “Citing your sources ensures that you are following academic integrity standards.”

According to Geary and Girard, several key reasons for citing sources are:

  • Access. Citing sources points readers to original sources. If anyone wants to read more on your topic, they can use your citations as a roadmap to access the original sources.
  • Attribution. Crediting the original authors, researchers and experts  shows that you’re knowledgeable about current ideas from those actively working in the field and adhering to high ethical standards, said Girard.
  • Clarity. “By citing your sources correctly, your reader can follow along with your research,” Girard said.
  • Consistency. Adhering to a citation style provides a framework for presenting ideas within similar academic fields. “Consistent formatting makes accessing, understanding and evaluating an author's findings easier for others in related fields of study,” Geary said.
  • Credibility. Proper citation not only builds a writer's authority but also ensures the reliability of the work, according to Geary.

Ultimately, citing sources is a formalized way for you to share ideas as part of a bigger conversation among others in your field. It’s a way to build off of and reference one another’s ideas, Girard said.

How Do You Cite an Academic Research Paper?


Any time you use an original quote or paraphrase someone else’s ideas, you need to cite that material, according to Geary.

“The only time we do not need to cite is when presenting an original thought or general knowledge,” she said.

While the specific format for citing sources can vary based on the style used, several key elements are always included, according to Girard. Those are:

  • Title of source
  • Type of source, such as a journal, book, website or periodical

By giving credit to the authors, researchers and experts you cite, you’re building credibility. You’re showing that your argument is built on solid research.

“Proper citation not only builds a writer's authority but also ensures the reliability of the work,” Geary said. “Properly formatted citations are a roadmap for instructors and other readers to verify the information we present in our work.”

Common Citation Styles in Academic Research

Certain disciplines adhere to specific citation standards because different disciplines prioritize certain information and research styles . The most common citation styles used in academic research, according to Geary, are:

  • American Psychological Association, known as APA. This style is standard in the social sciences such as psychology, education and communication. “In these fields, research happens rapidly, which makes it exceptionally important to use current research,” Geary said.
  • Modern Language Association, known as MLA. This style is typically used in literature and humanities because of the emphasis on literature analysis. “When citing in MLA, there is an emphasis on the author and page number, allowing the audience to locate the original text that is being analyzed easily,” Geary said.
  • Chicago Manual of Style, known as Chicago. This style is typically used in history, business and sometimes humanities. “(Chicago) offers flexibility because of the use of footnotes, which can be seen as less distracting than an in-text citation,” Geary said.

The benefit of using the same format as other researchers within a discipline is that the framework of presenting ideas allows you to “speak the same language,” according to Girard.


How to Ensure Proper Citations

Keeping track of your research as you go is one of the best ways to ensure you’re citing appropriately and correctly based on the style that your academic discipline uses.

“Through careful citation, authors ensure their audience can distinguish between borrowed material and original thoughts, safeguarding their academic reputation and following academic honesty policies,” Geary said.

Some tips that she and Girard shared to ensure you’re citing sources correctly include:

  • Keep track of sources as you work. Writers should keep track of their sources every time an idea is not theirs, according to Geary. “You don’t want to find the perfect research study and misplace its source information, meaning you’d have to omit it from your paper,” she said.
  • Practice. Even experienced writers need to check their citations before submitting their work. “Citing requires us to pay close attention to detail, so always start your citation process early and go slow to ensure you don’t make mistakes,” said Geary. In time, citing sources properly becomes faster and easier.
  • Use an Online Tool. Geary recommends the Shapiro Library citation guide. You can find sample papers, examples of how to cite in the different academic styles and up-to-date citation requirements, along with information and examples for APA, MLA and Chicago style citations.
  • Work with a Tutor. A tutor can offer support along with tips to help you learn the process of academic research. Students at SNHU can connect with free peer tutoring through the Academic Support tab in their online courses, though many colleges and universities offer peer tutoring.


How to Cite a Reference in Academic Writing

A citation consists of two pieces: an in-text citation that is typically short and a longer list of references or works cited (depending on the style used) at the end of the paper.

“In-text citations immediately acknowledge the use of external source information and its exact location,” Geary said. While each style uses a slightly different format for in-text citations that reference the research, you may expect to need the page number, author’s name and possibly date of publication in parentheses at the end of a sentence or passage, according to Geary.


A longer entry listing the complete details of the resource you referenced should also be included on the references or works cited page at the end of the paper. The full citation is provided with complete details of the source, such as author, title, publication date and more, Geary said.

The two-part aspect of citations is because of readability. “You can imagine how putting the full citation would break up the flow of a paper,” Girard said. “So, a shortened version is used (in the text).”

“For example, if an in-text citation reads (Jones, 2024), the reader immediately knows that the ideas presented are coming from Jones’s work, and they can explore the comprehensive citation on the final page,” she said.

The in-text citation and full citation together provide a transparent trail of the author's process of engaging with research.

“Their combined use also facilitates further research by following a standardized style (APA, MLA, Chicago), guaranteeing that other scholars can easily connect and build upon their work in the future,” Geary said.

Developing and demonstrating your research skills, enhancing your work’s credibility and engaging ethically with the intellectual contributions of others are at the core of the citation process no matter which style you use.


A former higher education administrator, Dr. Marie Morganelli is a career educator and writer. She has taught and tutored composition, literature, and writing at all levels from middle school through graduate school. With two graduate degrees in English language and literature, her focus — whether teaching or writing — is in helping to raise the voices of others through the power of storytelling. Connect with her on LinkedIn .



The Beginner's Guide to Statistical Analysis | 5 Steps & Examples

Statistical analysis means investigating trends, patterns, and relationships using quantitative data . It is an important research tool used by scientists, governments, businesses, and other organizations.

To draw valid conclusions, statistical analysis requires careful planning from the very start of the research process . You need to specify your hypotheses and make decisions about your research design, sample size, and sampling procedure.

After collecting data from your sample, you can organize and summarize the data using descriptive statistics . Then, you can use inferential statistics to formally test hypotheses and make estimates about the population. Finally, you can interpret and generalize your findings.

This article is a practical introduction to statistical analysis for students and researchers. We’ll walk you through the steps using two research examples. The first investigates a potential cause-and-effect relationship, while the second investigates a potential correlation between variables.

Table of contents

  • Step 1: Write your hypotheses and plan your research design
  • Step 2: Collect data from a sample
  • Step 3: Summarize your data with descriptive statistics
  • Step 4: Test hypotheses or make estimates with inferential statistics
  • Step 5: Interpret your results
  • Other interesting articles

To collect valid data for statistical analysis, you first need to specify your hypotheses and plan out your research design.

Writing statistical hypotheses

The goal of research is often to investigate a relationship between variables within a population . You start with a prediction, and use statistical analysis to test that prediction.

A statistical hypothesis is a formal way of writing a prediction about a population. Every research prediction is rephrased into null and alternative hypotheses that can be tested using sample data.

While the null hypothesis always predicts no effect or no relationship between variables, the alternative hypothesis states your research prediction of an effect or relationship.

  • Null hypothesis: A 5-minute meditation exercise will have no effect on math test scores in teenagers.
  • Alternative hypothesis: A 5-minute meditation exercise will improve math test scores in teenagers.
  • Null hypothesis: Parental income and GPA have no relationship with each other in college students.
  • Alternative hypothesis: Parental income and GPA are positively correlated in college students.

Planning your research design

A research design is your overall strategy for data collection and analysis. It determines the statistical tests you can use to test your hypothesis later on.

First, decide whether your research will use a descriptive, correlational, or experimental design. Experiments directly influence variables, whereas descriptive and correlational studies only measure variables.

  • In an experimental design , you can assess a cause-and-effect relationship (e.g., the effect of meditation on test scores) using statistical tests of comparison or regression.
  • In a correlational design , you can explore relationships between variables (e.g., parental income and GPA) without any assumption of causality using correlation coefficients and significance tests.
  • In a descriptive design , you can study the characteristics of a population or phenomenon (e.g., the prevalence of anxiety in U.S. college students) using statistical tests to draw inferences from sample data.

Your research design also concerns whether you’ll compare participants at the group level or individual level, or both.

  • In a between-subjects design , you compare the group-level outcomes of participants who have been exposed to different treatments (e.g., those who performed a meditation exercise vs those who didn’t).
  • In a within-subjects design , you compare repeated measures from participants who have participated in all treatments of a study (e.g., scores from before and after performing a meditation exercise).
  • In a mixed (factorial) design , one variable is altered between subjects and another is altered within subjects (e.g., pretest and posttest scores from participants who either did or didn’t do a meditation exercise).
Example: Experimental research design
First, you’ll take baseline test scores from participants. Then, your participants will undergo a 5-minute meditation exercise. Finally, you’ll record participants’ scores from a second math test.

In this experiment, the independent variable is the 5-minute meditation exercise, and the dependent variable is the math test score from before and after the intervention.

Example: Correlational research design
In a correlational study, you test whether there is a relationship between parental income and GPA in graduating college students. To collect your data, you will ask participants to fill in a survey and self-report their parents’ incomes and their own GPA.

Measuring variables

When planning a research design, you should operationalize your variables and decide exactly how you will measure them.

For statistical analysis, it’s important to consider the level of measurement of your variables, which tells you what kind of data they contain:

  • Categorical data represents groupings. These may be nominal (e.g., gender) or ordinal (e.g. level of language ability).
  • Quantitative data represents amounts. These may be on an interval scale (e.g. test score) or a ratio scale (e.g. age).

Many variables can be measured at different levels of precision. For example, age data can be quantitative (8 years old) or categorical (young). If a variable is coded numerically (e.g., level of agreement from 1–5), it doesn’t automatically mean that it’s quantitative instead of categorical.

Identifying the measurement level is important for choosing appropriate statistics and hypothesis tests. For example, you can calculate a mean score with quantitative data, but not with categorical data.
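This constraint is easy to see in code. A minimal sketch using Python’s statistics module, with made-up values: the mean is defined for the quantitative variable, while the categorical variable only supports the mode.

```python
import statistics

# Quantitative (ratio) data: the mean is meaningful.
ages = [21, 24, 22, 30, 27]
mean_age = statistics.mean(ages)  # 24.8

# Categorical (nominal) data: only the mode (most frequent value) applies.
genders = ["female", "male", "female", "nonbinary", "female"]
modal_gender = statistics.mode(genders)  # "female"
```

Trying `statistics.mean(genders)` would simply raise an error, which mirrors the conceptual point: an "average gender" is not defined.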

In a research study, along with measures of your variables of interest, you’ll often collect data on relevant participant characteristics.

Variable Type of data
Age Quantitative (ratio)
Gender Categorical (nominal)
Race or ethnicity Categorical (nominal)
Baseline test scores Quantitative (interval)
Final test scores Quantitative (interval)
Parental income Quantitative (ratio)
GPA Quantitative (interval)


Population vs sample

In most cases, it’s too difficult or expensive to collect data from every member of the population you’re interested in studying. Instead, you’ll collect data from a sample.

Statistical analysis allows you to apply your findings beyond your own sample as long as you use appropriate sampling procedures . You should aim for a sample that is representative of the population.

Sampling for statistical analysis

There are two main approaches to selecting a sample.

  • Probability sampling: every member of the population has a chance of being selected for the study through random selection.
  • Non-probability sampling: some members of the population are more likely than others to be selected for the study because of criteria such as convenience or voluntary self-selection.
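A simple random sample, the most basic probability sampling method, can be sketched with Python’s standard library. The population of ID numbers below is hypothetical:

```python
import random

# Hypothetical sampling frame: one ID per member of the target population.
population = list(range(1, 1001))  # student IDs 1..1000

random.seed(42)  # fixed seed so the draw is reproducible
sample = random.sample(population, k=30)  # simple random sample, no replacement
```

Every ID has the same chance of selection, which is exactly the equal-chance guarantee that non-probability methods (e.g., convenience sampling) lack.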

In theory, for highly generalizable findings, you should use a probability sampling method. Random selection reduces several types of research bias , like sampling bias , and ensures that data from your sample is actually typical of the population. Parametric tests can be used to make strong statistical inferences when data are collected using probability sampling.

But in practice, it’s rarely possible to gather the ideal sample. While non-probability samples are more at risk for biases like self-selection bias , they are much easier to recruit and collect data from. Non-parametric tests are more appropriate for non-probability samples, but they result in weaker inferences about the population.

If you want to use parametric tests for non-probability samples, you have to make the case that:

  • your sample is representative of the population you’re generalizing your findings to.
  • your sample lacks systematic bias.

Keep in mind that external validity means that you can only generalize your conclusions to others who share the characteristics of your sample. For instance, results from Western, Educated, Industrialized, Rich and Democratic samples (e.g., college students in the US) aren’t automatically applicable to all non-WEIRD populations.

If you apply parametric tests to data from non-probability samples, be sure to elaborate on the limitations of how far your results can be generalized in your discussion section .

Create an appropriate sampling procedure

Based on the resources available for your research, decide on how you’ll recruit participants.

  • Will you have resources to advertise your study widely, including outside of your university setting?
  • Will you have the means to recruit a diverse sample that represents a broad population?
  • Do you have time to contact and follow up with members of hard-to-reach groups?

Example: Sampling (experimental study)
Your participants are self-selected by their schools. Although you’re using a non-probability sample, you aim for a diverse and representative sample.

Example: Sampling (correlational study)
Your main population of interest is male college students in the US. Using social media advertising, you recruit senior-year male college students from a smaller subpopulation: seven universities in the Boston area.

Calculate sufficient sample size

Before recruiting participants, decide on your sample size either by looking at other studies in your field or using statistics. A sample that’s too small may be unrepresentative of the population, while a sample that’s too large will be more costly than necessary.

There are many sample size calculators online. Different formulas are used depending on whether you have subgroups or how rigorous your study should be (e.g., in clinical research). As a rule of thumb, at least 30 units per subgroup is a common minimum.

To use these calculators, you have to understand and input these key components:

  • Significance level (alpha): the risk of rejecting a true null hypothesis that you are willing to take, usually set at 5%.
  • Statistical power : the probability of your study detecting an effect of a certain size if there is one, usually 80% or higher.
  • Expected effect size : a standardized indication of how large the expected result of your study will be, usually based on other similar studies.
  • Population standard deviation: an estimate of the population parameter based on a previous study or a pilot study of your own.
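These components combine in a common normal-approximation formula for comparing two group means: n per group = 2 * ((z_alpha + z_power) / d)^2, rounded up, where d is the standardized effect size. A sketch using only the standard library; the defaults below (5% alpha, 80% power, a medium effect of d = 0.5) are illustrative, not prescriptive:

```python
import math
from statistics import NormalDist

def n_per_group(alpha=0.05, power=0.80, effect_size=0.5):
    """Normal-approximation sample size per group for comparing two means
    with a two-sided test; effect_size is the expected Cohen's d."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # ~1.96 for alpha = 0.05
    z_power = NormalDist().inv_cdf(power)          # ~0.84 for 80% power
    return math.ceil(2 * ((z_alpha + z_power) / effect_size) ** 2)

print(n_per_group())  # -> 63 participants per group
```

Dedicated tools (e.g., G*Power) use more exact calculations, so treat this as a rough planning estimate: larger expected effects need fewer participants, stricter alpha or higher power need more.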

Once you’ve collected all of your data, you can inspect them and calculate descriptive statistics that summarize them.

Inspect your data

There are various ways to inspect your data, including the following:

  • Organizing data from each variable in frequency distribution tables .
  • Displaying data from a key variable in a bar chart to view the distribution of responses.
  • Visualizing the relationship between two variables using a scatter plot .

By visualizing your data in tables and graphs, you can assess whether your data follow a skewed or normal distribution and whether there are any outliers or missing data.
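A frequency distribution table is straightforward to build by hand. A sketch with hypothetical 5-point Likert-scale responses, printing a crude text bar chart to eyeball the shape of the distribution:

```python
from collections import Counter

# Hypothetical responses on a 5-point agreement scale (1 = strongly disagree)
responses = [3, 4, 4, 5, 2, 3, 4, 1, 4, 3, 5, 4]

freq = Counter(responses)  # frequency distribution table
for value in sorted(freq):
    print(f"{value}: {'#' * freq[value]} ({freq[value]})")  # crude bar chart
```

Even this rough view shows whether responses cluster at one end (skew) or contain isolated extreme values (possible outliers).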

A normal distribution means that your data are symmetrically distributed around a center where most values lie, with the values tapering off at the tail ends.

Mean, median, mode, and standard deviation in a normal distribution

In contrast, a skewed distribution is asymmetric and has more values on one end than the other. The shape of the distribution is important to keep in mind because only some descriptive statistics should be used with skewed distributions.

Extreme outliers can also produce misleading statistics, so you may need a systematic approach to dealing with these values.

Calculate measures of central tendency

Measures of central tendency describe where most of the values in a data set lie. Three main measures of central tendency are often reported:

  • Mode : the most popular response or value in the data set.
  • Median : the value in the exact middle of the data set when ordered from low to high.
  • Mean : the sum of all values divided by the number of values.

However, depending on the shape of the distribution and level of measurement, only one or two of these measures may be appropriate. For example, many demographic characteristics can only be described using the mode or proportions, while a variable like reaction time may not have a mode at all.
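All three measures are available in Python’s statistics module. A minimal sketch with hypothetical test scores:

```python
import statistics

# Hypothetical test scores
scores = [68, 72, 75, 75, 80, 81, 75, 69]

mean_score = statistics.mean(scores)      # sum / count = 74.375
median_score = statistics.median(scores)  # middle of the sorted data = 75.0
mode_score = statistics.mode(scores)      # most frequent value = 75
```

Note that here the mean sits below the median and mode, a small illustration of how the three measures can diverge when a distribution is not perfectly symmetric.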

Calculate measures of variability

Measures of variability tell you how spread out the values in a data set are. Four main measures of variability are often reported:

  • Range : the highest value minus the lowest value of the data set.
  • Interquartile range : the range of the middle half of the data set.
  • Standard deviation : the average distance between each value in your data set and the mean.
  • Variance : the square of the standard deviation.

Once again, the shape of the distribution and level of measurement should guide your choice of variability statistics. The interquartile range is the best measure for skewed distributions, while standard deviation and variance provide the best information for normal distributions.
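The four measures can be computed directly; the data set below is hypothetical, and `statistics.quantiles` with `n=4` returns the three quartile cut points:

```python
import statistics

# Hypothetical data set
scores = [2, 4, 4, 4, 5, 5, 7, 9]

data_range = max(scores) - min(scores)          # 9 - 2 = 7
q1, q2, q3 = statistics.quantiles(scores, n=4)  # quartile cut points
iqr = q3 - q1                                   # spread of the middle 50%
sd = statistics.stdev(scores)                   # sample standard deviation
var = statistics.variance(scores)               # sd squared
```

Because the IQR ignores the top and bottom quarters of the data, it is unaffected by the extreme value 9, which is why it suits skewed distributions.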

Using your table, you should check whether the units of the descriptive statistics are comparable for pretest and posttest scores. For example, are the variance levels similar across the groups? Are there any extreme values? If there are, you may need to identify and remove extreme outliers in your data set or transform your data before performing a statistical test.

Pretest scores Posttest scores
Mean 68.44 75.25
Standard deviation 9.43 9.88
Variance 88.96 97.96
Range 36.25 45.12
N 30

From this table, we can see that the mean score increased after the meditation exercise, and the variances of the two scores are comparable. Next, we can perform a statistical test to find out if this improvement in test scores is statistically significant in the population.

Example: Descriptive statistics (correlational study)
After collecting data from 653 students, you tabulate descriptive statistics for annual parental income and GPA.

It’s important to check whether you have a broad range of data points. If you don’t, your data may be skewed towards some groups more than others (e.g., high academic achievers), and only limited inferences can be made about a relationship.

Parental income (USD) GPA
Mean 62,100 3.12
Standard deviation 15,000 0.45
Variance 225,000,000 0.16
Range 8,000–378,000 2.64–4.00
N 653

A number that describes a sample is called a statistic , while a number describing a population is called a parameter . Using inferential statistics , you can make conclusions about population parameters based on sample statistics.

Researchers often use two main methods, frequently in combination, to make inferences in statistics.

  • Estimation: calculating population parameters based on sample statistics.
  • Hypothesis testing: a formal process for testing research predictions about the population using samples.

You can make two types of estimates of population parameters from sample statistics:

  • A point estimate : a value that represents your best guess of the exact parameter.
  • An interval estimate : a range of values that represent your best guess of where the parameter lies.

If your aim is to infer and report population characteristics from sample data, it’s best to use both point and interval estimates in your paper.

You can consider a sample statistic a point estimate for the population parameter when you have a representative sample (e.g., in a wide public opinion poll, the proportion of a sample that supports the current government is taken as the population proportion of government supporters).

There’s always error involved in estimation, so you should also provide a confidence interval as an interval estimate to show the variability around a point estimate.

A confidence interval uses the standard error and the z score from the standard normal distribution to convey where you’d generally expect to find the population parameter most of the time.
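A 95% z-based confidence interval for a mean can be sketched as follows, using hypothetical scores. (For small samples like this one, a critical value from the t distribution would be more appropriate than the z score used here; the z version keeps the sketch standard-library only.)

```python
import math
from statistics import NormalDist, mean, stdev

# Hypothetical posttest scores
scores = [72, 75, 78, 80, 69, 74, 77, 73, 76, 71]

m = mean(scores)
se = stdev(scores) / math.sqrt(len(scores))  # standard error of the mean
z = NormalDist().inv_cdf(0.975)              # ~1.96 for a 95% interval

ci_low, ci_high = m - z * se, m + z * se     # 95% confidence interval
```

The interval is reported alongside the point estimate: the sample mean is the best single guess, and the interval conveys its uncertainty.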

Hypothesis testing

Using data from a sample, you can test hypotheses about relationships between variables in the population. Hypothesis testing starts with the assumption that the null hypothesis is true in the population, and you use statistical tests to assess whether the null hypothesis can be rejected or not.

Statistical tests determine where your sample data would lie on an expected distribution of sample data if the null hypothesis were true. These tests give two main outputs:

  • A test statistic tells you how much your data differs from the null hypothesis of the test.
  • A p value tells you the likelihood of obtaining your results if the null hypothesis is actually true in the population.

Statistical tests come in three main varieties:

  • Comparison tests assess group differences in outcomes.
  • Regression tests assess cause-and-effect relationships between variables.
  • Correlation tests assess relationships between variables without assuming causation.

Your choice of statistical test depends on your research questions, research design, sampling method, and data characteristics.

Parametric tests

Parametric tests make powerful inferences about the population based on sample data. But to use them, some assumptions must be met, and only some types of variables can be used. If your data violate these assumptions, you can perform appropriate data transformations or use alternative non-parametric tests instead.

A regression models the extent to which changes in a predictor variable result in changes in an outcome variable (or variables).

  • A simple linear regression includes one predictor variable and one outcome variable.
  • A multiple linear regression includes two or more predictor variables and one outcome variable.
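A simple linear regression can be fit by hand with the least-squares formulas: the slope is the co-deviation of x and y divided by the squared deviation of x. A sketch with hypothetical data:

```python
from statistics import mean

# Hypothetical predictor (hours studied) and outcome (test score)
x = [1, 2, 3, 4, 5]
y = [52, 55, 61, 64, 68]

mx, my = mean(x), mean(y)
sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))  # co-deviation
sxx = sum((xi - mx) ** 2 for xi in x)

slope = sxy / sxx            # change in y per unit change in x
intercept = my - slope * mx  # predicted y at x = 0
```

The fitted line, y = intercept + slope * x, is what a regression test then evaluates: is the slope significantly different from zero?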

Comparison tests usually compare the means of groups. These may be the means of different groups within a sample (e.g., a treatment and control group), the means of one sample group taken at different times (e.g., pretest and posttest scores), or a sample mean and a population mean.

  • A t test is for exactly 1 or 2 groups when the sample is small (30 or fewer).
  • A z test is for exactly 1 or 2 groups when the sample is large.
  • An ANOVA is for 3 or more groups.

The z and t tests have subtypes based on the number and types of samples and the hypotheses:

  • If you have only one sample that you want to compare to a population mean, use a one-sample test .
  • If you have paired measurements (within-subjects design), use a dependent (paired) samples test .
  • If you have completely separate measurements from two unmatched groups (between-subjects design), use an independent (unpaired) samples test .
  • If you expect a difference between groups in a specific direction, use a one-tailed test .
  • If you don’t have any expectations for the direction of a difference between groups, use a two-tailed test .

The only parametric correlation test is Pearson’s r . The correlation coefficient ( r ) tells you the strength of a linear relationship between two quantitative variables.

However, to test whether the correlation in the sample is strong enough to be important in the population, you also need to perform a significance test of the correlation coefficient, usually a t test, to obtain a p value. This test uses your sample size to calculate how much the correlation coefficient differs from zero in the population.
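Both steps can be done directly: compute r from the paired deviations, then convert it into a t statistic with n - 2 degrees of freedom. A sketch with hypothetical data (turning t into a p value then requires the t distribution, e.g., via scipy.stats, which the standard library does not provide):

```python
import math

# Hypothetical paired observations
x = [1, 2, 3, 4, 5]
y = [2, 1, 4, 3, 5]
n = len(x)

mx, my = sum(x) / n, sum(y) / n
sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
r = sxy / math.sqrt(
    sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y)
)  # Pearson's r

# t statistic for H0: the population correlation is zero (df = n - 2)
t = r * math.sqrt((n - 2) / (1 - r ** 2))
```

With only five hypothetical pairs, even a fairly strong r yields a modest t, which is why sample size matters for detecting correlations.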

You use a dependent-samples, one-tailed t test to assess whether the meditation exercise significantly improved math test scores. The test gives you:

  • a t value (test statistic) of 3.00
  • a p value of 0.0028
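A dependent-samples t statistic is just the mean of the per-participant differences divided by its standard error. A minimal sketch with hypothetical scores (not the study’s data):

```python
import math
from statistics import mean, stdev

# Hypothetical scores for the same three participants before and after
pre = [60, 62, 65]
post = [66, 65, 70]

diffs = [b - a for a, b in zip(pre, post)]  # per-participant change
n = len(diffs)

# Paired t statistic: mean difference over its standard error
t = mean(diffs) / (stdev(diffs) / math.sqrt(n))
```

In practice a library such as scipy.stats would also return the p value; the hand computation shows what the test statistic actually measures.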

Although Pearson’s r is a test statistic, it doesn’t tell you anything about how significant the correlation is in the population. You also need to test whether this sample correlation coefficient is large enough to demonstrate a correlation in the population.

A t test can also determine how significantly a correlation coefficient differs from zero based on sample size. Since you expect a positive correlation between parental income and GPA, you use a one-sample, one-tailed t test. The t test gives you:

  • a t value of 3.08
  • a p value of 0.001

Receive feedback on language, structure, and formatting

Professional editors proofread and edit your paper by focusing on:

  • Academic style
  • Vague sentences
  • Style consistency

See an example

reporting statistics in a research paper

The final step of statistical analysis is interpreting your results.

Statistical significance

In hypothesis testing, statistical significance is the main criterion for forming conclusions. You compare your p value to a set significance level (usually 0.05) to decide whether your results are statistically significant or non-significant.

Statistically significant results are considered unlikely to have arisen solely due to chance. There is only a very low chance of such a result occurring if the null hypothesis is true in the population.

Example: Interpret your results (experimental study)
You compare your p value of 0.0028 to your significance threshold of 0.05. Since the p value is under this threshold, you can reject the null hypothesis. This means that you believe the meditation intervention, rather than random factors, directly caused the increase in test scores.

Example: Interpret your results (correlational study)
You compare your p value of 0.001 to your significance threshold of 0.05. With a p value under this threshold, you can reject the null hypothesis. This indicates a statistically significant correlation between parental income and GPA in male college students.

Note that correlation doesn’t always mean causation, because there are often many underlying factors contributing to a complex variable like GPA. Even if one variable is related to another, this may be because of a third variable influencing both of them, or indirect links between the two variables.

Effect size

A statistically significant result doesn’t necessarily mean that there are important real life applications or clinical outcomes for a finding.

In contrast, the effect size indicates the practical significance of your results. It’s important to report effect sizes along with your inferential statistics for a complete picture of your results. You should also report interval estimates of effect sizes if you’re writing an APA style paper .

Example: Effect size (experimental study)
With a Cohen’s d of 0.72, there’s medium to high practical significance to your finding that the meditation exercise improved test scores.

Example: Effect size (correlational study)
To determine the effect size of the correlation coefficient, you compare your Pearson’s r value to Cohen’s effect size criteria.
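Cohen’s d itself is the difference between group means divided by a pooled standard deviation. A sketch with small hypothetical groups:

```python
import math
from statistics import mean, variance

def cohens_d(group1, group2):
    """Cohen's d using a pooled standard deviation (assumes similar variances)."""
    n1, n2 = len(group1), len(group2)
    pooled_var = ((n1 - 1) * variance(group1)
                  + (n2 - 1) * variance(group2)) / (n1 + n2 - 2)
    return (mean(group2) - mean(group1)) / math.sqrt(pooled_var)

# Hypothetical groups: means differ by 2, pooled sd = 1, so d = 2.0
d = cohens_d([1, 2, 3], [3, 4, 5])
```

By Cohen’s rough criteria, d around 0.2 is small, 0.5 medium, and 0.8 large, which is why the d of 0.72 above counts as medium to high.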

Decision errors

Type I and Type II errors are mistakes made in research conclusions. A Type I error means rejecting the null hypothesis when it’s actually true, while a Type II error means failing to reject the null hypothesis when it’s false.

You can aim to minimize the risk of these errors by selecting an optimal significance level and ensuring high power . However, there’s a trade-off between the two errors, so a fine balance is necessary.

Frequentist versus Bayesian statistics

Traditionally, frequentist statistics emphasizes null hypothesis significance testing and always starts with the assumption of a true null hypothesis.

However, Bayesian statistics has grown in popularity as an alternative approach in the last few decades. In this approach, you use previous research to continually update your hypotheses based on your expectations and observations.

Bayes factor compares the relative strength of evidence for the null versus the alternative hypothesis rather than making a conclusion about rejecting the null hypothesis or not.

If you want to know more about statistics , methodology , or research bias , make sure to check out some of our other articles with explanations and examples.

  • Student’s  t -distribution
  • Normal distribution
  • Null and Alternative Hypotheses
  • Chi square tests
  • Confidence interval

Methodology

  • Cluster sampling
  • Stratified sampling
  • Data cleansing
  • Reproducibility vs Replicability
  • Peer review
  • Likert scale

Research bias

  • Implicit bias
  • Framing effect
  • Cognitive bias
  • Placebo effect
  • Hawthorne effect
  • Hostile attribution bias
  • Affect heuristic




Software Engineering Institute

SEI Digital Library

Latest Publications

Embracing AI: Unlocking Scalability and Transformation Through Generative Text, Imagery, and Synthetic Audio

August 28, 2024 • Webcast, by Tyler Brooks, Shannon Gallagher, Dominic A. Ross

In this webcast, Tyler Brooks, Shannon Gallagher, and Dominic Ross aim to demystify AI and illustrate its transformative power in achieving scalability, adapting to changing landscapes, and driving digital innovation.

Counter AI: What Is It and What Can You Do About It?

August 27, 2024 • White Paper, by Nathan M. VanHoudnos, Carol J. Smith, Matt Churilla, Shing-hon Lau, Lauren McIlvenny, Greg Touhill

This paper describes counter artificial intelligence (AI) and provides recommendations on what can be done about it.

Using Quality Attribute Scenarios for ML Model Test Case Generation

August 27, 2024 • Conference Paper, by Rachel Brower-Sinning, Grace Lewis, Sebastián Echeverría, Ipek Ozkaya

This paper presents an approach based on quality attribute (QA) scenarios to elicit and define system- and model-relevant test cases for ML models.

3 API Security Risks (and How to Protect Against Them)

August 27, 2024 • Podcast, by McKinley Sconiers-Hasan

McKinley Sconiers-Hasan discusses three API risks and how to address them through the lens of zero trust.

Lessons Learned in Coordinated Disclosure for Artificial Intelligence and Machine Learning Systems

August 20, 2024 • White paper, by Allen D. Householder, Vijay S. Sarvepalli, Jeff Havrilla, Matt Churilla, Lena Pons, Shing-hon Lau, Nathan M. VanHoudnos, Andrew Kompanek, Lauren McIlvenny

In this paper, the authors describe lessons learned from coordinating AI and ML vulnerabilities at the SEI's CERT/CC.

On the Design, Development, and Testing of Modern APIs

July 30, 2024 • White paper, by Alejandro Gomez, Alex Vesey

This white paper discusses the design, desired qualities, development, testing, support, and security of modern application programming interfaces (APIs).

Evaluating Large Language Models for Cybersecurity Tasks: Challenges and Best Practices

July 26, 2024 • Podcast, by Jeff Gennari, Samuel J. Perl

Jeff Gennari and Sam Perl discuss applications for LLMs in cybersecurity, potential challenges, and recommendations for evaluating LLMs.

Capability-based Planning for Early-Stage Software Development

July 24, 2024 • Podcast, by Anandi Hira, Bill Nichols

This SEI podcast introduces capability-based planning (CBP) and its use and application in software acquisition.

A Model Problem for Assurance Research: An Autonomous Humanitarian Mission Scenario

July 23, 2024 • Technical note, by Gabriel Moreno, Anton Hristozov, John E. Robert, Mark H. Klein

This report describes a model problem to support research in large-scale assurance.

Safeguarding Against Recent Vulnerabilities Related to Rust

June 28, 2024 • Podcast, by David Svoboda

David Svoboda discusses two vulnerabilities related to Rust, their sources, and how to mitigate them.


Study combines data, molecular simulations to accelerate drug discovery

New research involving the UC College of Medicine may lead to finding effective therapies faster.


Researchers from the University of Cincinnati College of Medicine and Cincinnati Children’s Hospital have found a new method to increase both speed and success rates in drug discovery.

The study, published Aug. 30 in the journal Science Advances, offers renewed promise when it comes to discovering new drugs.

“The hope is we can speed up the timeline of drug discovery from years to months,” said Alex Thorman, PhD, co-first author and a postdoctoral fellow in the Department of Environmental and Public Health Sciences in the College of Medicine. 

Researchers combined two approaches for screening potential new drugs. First, they used a database from the Library of Integrated Network-based Cellular Signatures (LINCS) to screen tens of thousands of small molecules with potential therapeutic effects simultaneously. Then they combined the search with targeted docking simulations used to model the interaction between small molecules and their protein targets to find compounds of interest. That sped up the timing of the work from months to minutes — taking weeks of work required for initial screening down to an afternoon.


Thorman said this faster screening method for compounds that could become drugs accelerates the drug research process. But it’s not only speed that is crucial. 

He added that this newer approach is more efficient at identifying potentially effective compounds.

“And the accuracy will only improve, hopefully offering new hope to many people who have diseases with no known cure, including those with cancer,” Thorman said.

It can also create more targeted treatment options in precision medicine, an innovative approach to tailoring disease prevention and treatment that takes into account differences in people's genes, environments and lifestyles. 

“An accelerated drug discovery process also could be a game changer in the ability to respond to public health crises, such as the COVID-19 pandemic,” said Thorman. “The timeline for developing effective drugs could be expedited.” 

Feature image at top: Collection of prescription drug bottles and pills. Photo/Provided.


Other co-first authors included Jim Reigle, PhD, a postdoctoral fellow at Cincinnati Children’s Hospital, and Somchai Chutipongtanate, PhD, an associate professor in the Department of Environmental and Public Health Sciences in the College of Medicine.

The corresponding authors of the study were Jarek Meller, PhD, a professor of biostatistics, health informatics and data sciences in the College of Medicine, and Andrew Herr, PhD, a professor of immunobiology in the Department of Pediatrics in the College of Medicine. 

Other co-investigators included Mario Medvedovic, PhD, professor and director of the Center for Biostatistics and Bioinformatics Services in the College of Medicine, and David Hildeman, PhD, professor of immunobiology in the College of Medicine. Both Herr and Hildeman have faculty research labs at Cincinnati Children’s Hospital. 

This research was funded in part by grants from the National Institutes of Health, a Department of Veterans Affairs merit award, a UC Cancer Center Pilot Project Award and a Cincinnati Children’s Hospital Innovation Fund award.

Those involved in the research are also co-inventors on three U.S. patents that are related to their work and have been filed by Cincinnati Children’s Hospital. 




Research on parent and pupil attitudes towards the use of AI in education

This report details findings from a collaboration between DSIT and DfE to deliver a programme of deliberative research exploring parent and pupil attitudes to the use of AI in education.

Research on public attitudes towards the use of AI in education

PDF, 489 KB, 42 pages

This file may not be suitable for users of assistive technology.

In response to the increasing prevalence of AI-powered tools in education, the Department for Education (DfE) partnered with the Responsible Technology Adoption Unit (RTA) within the Department for Science, Innovation and Technology (DSIT) to better understand parent and pupil attitudes towards the use of AI in education. This research also explored attitudes towards the use of pupil data for optimising  AI-powered educational tools.  

Thinks Insight & Strategy conducted a rapid programme of deliberative research with 108 parents and pupils across three locations in England in a mix of face-to-face and online sessions.  

What were the key findings? 

While awareness of AI as a “hot topic” was high among both parents and pupils, understanding did not run deep. As a result, views on the use of AI in education were initially sceptical, though there was an openness to learning more. 

Both parents and pupils could see that there were clear opportunities for the use of AI in education to support teachers, but there was some hesitation around pupils engaging with AI tools directly.  

By the end of the sessions, parents and pupils could understand the advantages of using pupil work and data to optimise AI tools. They were more comfortable with this when data was anonymised or pseudonymised and they identified a set of clear rules for acceptable data sharing.  

Opinions on the use of AI tools in education are not yet fixed: parents’ and pupils’ views of and trust in AI tools fluctuated throughout the sessions, as they reacted to new information and diverging opinions. 

Next steps 

This report forms part of a wider programme of collaboration between DSIT and DfE. The two departments are working together on a pilot data store that will make relevant documents ‘machine readable’ so AI tools for teachers can be trained on them. You can read more about the work in the press notice for the programme of collaboration .


IMAGES

  1. FREE 50+ APA Paper Samples in PDF

  2. How to Present Tables and Figures in APA 7th Edition

  3. How to Use Tables & Graphs in a Research Paper

  4. Statistical Report Writing Sample No.4. Introduction

  5. (PDF) A GUIDE TO STATISTICAL REPORT WRITING

  6. math 115 elementary statistics research paper

VIDEO

  1. Statistics & Research

  2. Statistical Reporting & Interests

  3. Tips for Writing a High Quality Statistics Research Paper

  4. Nursing Research and Statistics previous year question paper 2022 #nursingsecrets #bscnursing #pyqs

  5. How to Report #SmartPLS4 Results in a Research Paper

  6. Statistical Power

COMMENTS

  1. Reporting Statistics in APA Style

    The APA Publication Manual is commonly used for reporting research results in the social and natural sciences. This article walks you through APA Style standards for reporting statistics in academic writing.

  2. PDF Reporting Results of Common Statistical Tests in APA Format

The goal of the results section in an empirical paper is to report the results of the data analysis used to test a hypothesis. The results section should be in condensed format and lacking interpretation. Avoid discussing why or how the experiment was performed or alluding to whether your results are good or bad, expected or ...

  3. Guidelines for Reporting Statistics

    Learn how to report statistics correctly and consistently in JMIR Publications, following the best practices and standards.

  4. Reporting Research Results in APA Style

    The results section of a quantitative research paper is where you summarize your data and report the findings of any relevant statistical analyses. The APA manual provides rigorous guidelines for what to report in quantitative research papers in the fields of psychology, education, and other social sciences.

  5. How to Report Statistics

    In many fields, a statistical analysis forms the heart of both the methods and results sections of a manuscript. Learn how to report statistical analyses, and what other context is important for publication success and future reproducibility.

  6. PDF Number and Statistics Guide, APA Style 7th Edition

Report means and standard deviations for data measured on integer scales (e.g., surveys and questionnaires) to one decimal. Report other means and standard deviations, as well as correlations, proportions, and inferential statistics (t, F, chi-square), to two decimals. Report exact p values to two or three decimals (e.g., p = .006, p = .03).

  7. Numbers and Statistics

    Numbers & Statistics. Writers often need to discuss numbers and statistics in their manuscripts, and it can be a challenge to determine how to represent these in the most readable way. APA 7 contains detailed guidelines for how to write numbers and statistics, and the most common are listed below. These guidelines, however, are not exhaustive ...

  8. Reporting Statistics APA Style

    Don't worry about reporting statistics APA style until your paper is almost ready to submit for publication.

  9. How to Write a Results Section

    Reporting quantitative research results If you conducted quantitative research, you'll likely be working with the results of some sort of statistical analysis. Your results section should report the results of any statistical tests you used to compare groups or assess relationships between variables.

  10. Reporting Statistics In APA ~ Rules & Examples

In a Nutshell: Reporting statistics in APA. Statistical analysis is the process of collecting and testing quantitative data to make extrapolations about certain elements or the world in general. The APA Publication Manual provides guidelines and standard suggestions for reporting statistics in APA. The formula for representing statistics in APA differs depending on the type of statistics.

  11. Recommendations for accurate reporting in medical research statistics

An important requirement for validity of medical research is sound methodology and statistics, yet this is still often overlooked by medical researchers. Based on the experience of reviewing statistics in more than 1000 manuscripts submitted to The Lancet Group of journals ...

  12. Writing with Descriptive Statistics

    If you include statistics that many of your readers would not understand, consider adding the statistics in a footnote or appendix that explains it in more detail. This handout explains how to write with statistics including quick tips, writing descriptive statistics, writing inferential statistics, and using visuals with statistics.

  13. Statistics in APA

    APA (American Psychological Association) style is most commonly used to cite sources within the social sciences. This resource, revised according to the 6th edition, second printing of the APA manual, offers examples for the general format of APA research papers, in-text citations, endnotes/footnotes, and the reference page. For more information, please consult the Publication Manual of the ...

  14. Reporting Statistical Results in Medical Journals

    Statistical editors of the Malaysian Journal of Medical Sciences (MJMS) must go through many submitted manuscripts, focusing on the statistical aspect of the manuscripts. However, the editors notice myriad styles of reporting the statistical results, ...

  15. Statistics in psychological research

Report lists eight recommendations for scientists, policymakers, and others to meet the ongoing risk to health, well-being, and civic life. ... providing an understanding of such topics as which descriptive statistics are appropriate for given research designs, the meaning of a correlation coefficient, and how graphs are used to ...

  16. How to Report T-Test Results (With Examples)

    Learn how to report t-test results in different formats and scenarios, with clear examples and explanations. A useful guide for statistical analysis.

  17. How to Report Pearson's r in APA Format (With Examples)

    This tutorial explains how to report Pearson's r (Pearson correlation coefficient) in APA format, including several examples.

  18. Descriptive Statistics for Summarising Data

    Virtually any research design which produces quantitative data and statistics (even to the extent of just counting the number of occurrences of several events) provides opportunities for graphical data display which may help to clarify or illustrate important data characteristics or relationships.

  19. Understanding Results Reporting in Research Studies: Descriptive

Statistics document from University of Arizona, 2 pages. The results section of a paper is a complete report of your findings. Usually for quantitative studies: demographic information (i.e., descriptive analysis), information on variables of interest, infe...

  20. Statistics

    Read the latest Research articles in Statistics from Scientific Reports

  21. Evolution of Research Reporting Standards: Adapting to the ...

    Reporting standards are essential to health research as they improve accuracy and transparency. Over time, significant changes have occurred to the requirements for reporting research to ensure comprehensive and transparent reporting across a range of study domains and foster methodological rigor. T …

  22. How to Cite a Research Paper

    You can find sample papers, examples of how to cite in the different academic styles and up-to-date citation requirements, along with information and examples for APA, MLA and Chicago style citations. Work with a Tutor. A tutor can offer support along with tips to help you learn the process of academic research.

  23. The Beginner's Guide to Statistical Analysis

Statistical analysis means investigating trends, patterns, and relationships using quantitative data. It is an important research tool used by scientists, governments, businesses, and other organizations.

  24. SEI Digital Library

    The SEI Digital Library provides access to more than 6,000 documents from three decades of research into best practices in software engineering. These documents include technical reports, presentations, webcasts, podcasts and other materials searchable by user-supplied keywords and organized by topic, publication type, publication year, and author.

  25. Study uncovers potential ways to accelerate drug discovery

    The study, published Aug. 30 in the journal Science Advances, offers renewed promise when it comes to discovering new drugs. "The hope is we can speed up the timeline of drug discovery from years to months," said Alex Thorman, PhD, co-first author and a postdoctoral fellow in the Department of Environmental and Public Health Sciences in the College of Medicine.

  26. SEC.gov

These data highlights visualize Investment Advisers Statistics, the periodic report produced by the Division of Investment Management's Analytics Office. ... for additional information on the underlying data sources, methodologies, and key definitions. These statistics reflect ...

  27. Health Statistics

Our statistical products cover a wide variety of health topics suitable for community health assessments, research, and public inquiry. ... 7/1/2024 - The 2023 Marriage and Divorce Statistics report has been updated. 7/1/2024 - The 2022 Maternal and Child Health Status Indicators report has been updated.

  28. Research on parent and pupil attitudes towards the use of AI in

    Research and statistics. Reports, analysis and official statistics. Policy papers and consultations. Consultations and strategy. Transparency. Data, Freedom of Information releases and corporate ...

  29. Columbus Blue Jackets' Johnny Gaudreau dead at age 31

    Columbus Blue Jackets star forward Johnny Gaudreau and his brother, Matthew, died Thursday night, the team said. Police said they were killed while biking Thursday night in Oldmans Township, New ...

  30. Air Travel Consumer Report: June 2024 Numbers

    In June 2024, reporting marketing carriers reported checking 82,778 wheelchairs and scooters and mishandling 1,075 for a rate of 1.30% mishandled wheelchairs and scooters, higher than the rate of 1.24% mishandled in May 2024 and lower than the rate of 1.45% mishandled in June 2023.
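The APA rounding rules quoted in the COMMENTS list above (means and SDs for integer-scale data to one decimal, inferential statistics to two decimals, exact p values to two or three decimals with the leading zero omitted) can be sketched as a small Python helper. This is a minimal illustration of those rules only; the function names and the "p < .001" floor are assumptions, not part of any cited guide's code.

```python
def apa_p(p: float) -> str:
    # Exact p value per the APA convention quoted above: two or three
    # decimals, leading zero dropped; very small values reported as a bound.
    if p < 0.001:
        return "p < .001"
    decimals = 3 if p < 0.01 else 2
    return f"p = {p:.{decimals}f}".replace("0.", ".")

def apa_mean_sd(mean: float, sd: float, integer_scale: bool = True) -> str:
    # Integer-scale data (e.g., surveys): one decimal; other data: two.
    d = 1 if integer_scale else 2
    return f"M = {mean:.{d}f}, SD = {sd:.{d}f}"

print(apa_p(0.006))                 # p = .006
print(apa_p(0.03))                  # p = .03
print(apa_mean_sd(3.456, 1.234))    # M = 3.5, SD = 1.2
```

Note that journal styles differ on details such as a capital italic P or the smallest exact value to report, so a helper like this should mirror the target journal's own guidelines.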