Factor Analysis – Steps, Methods and Examples

Definition:

Factor analysis is a statistical technique that is used to identify the underlying structure of a relatively large set of variables and to explain these variables in terms of a smaller number of common underlying factors. It helps to investigate the latent relationships between observed variables.

Factor Analysis Steps

Here are the general steps involved in conducting a factor analysis:

1. Define the Research Objective:

Clearly specify the purpose of the factor analysis. Determine what you aim to achieve or understand through the analysis.

2. Data Collection:

Gather the data on the variables of interest. These variables should be measurable and related to the research objective. Ensure that you have a sufficient sample size for reliable results.

3. Assess Data Suitability:

Examine the suitability of the data for factor analysis. Check for the following aspects (a programmatic suitability check is sketched after this list):

  • Sample size: Ensure that you have an adequate sample size to perform factor analysis reliably.
  • Missing values: Handle missing data appropriately, either by imputation or exclusion.
  • Variable characteristics: Verify that the variables are continuous or at least ordinal in nature. Categorical variables may require different analysis techniques.
  • Linearity: Assess whether the relationships among variables are linear.
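
As an illustration, here is a minimal suitability check in Python. It assumes the third-party factor_analyzer package and a hypothetical survey.csv of numeric items; Bartlett's test of sphericity and the Kaiser-Meyer-Olkin (KMO) statistic are commonly used companions to the checks above:

```python
import pandas as pd
from factor_analyzer.factor_analyzer import calculate_bartlett_sphericity, calculate_kmo

df = pd.read_csv("survey.csv").dropna()  # hypothetical dataset; drop or impute missing values

# Bartlett's test: a small p-value rejects the hypothesis that the
# correlation matrix is an identity matrix (i.e., no shared variance)
chi_square, p_value = calculate_bartlett_sphericity(df)

# KMO: sampling adequacy per item and overall; values above ~0.6
# are commonly treated as acceptable for factor analysis
kmo_per_item, kmo_overall = calculate_kmo(df)

print(f"Bartlett p-value: {p_value:.4f}, overall KMO: {kmo_overall:.2f}")
```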

4. Determine the Factor Analysis Technique:

There are different types of factor analysis techniques available, such as exploratory factor analysis (EFA) and confirmatory factor analysis (CFA). Choose the appropriate technique based on your research objective and the nature of the data.

5. Perform Factor Analysis:

   a. Exploratory Factor Analysis (EFA):

  • Extract factors: Use factor extraction methods (e.g., principal component analysis or common factor analysis) to identify the initial set of factors.
  • Determine the number of factors: Decide on the number of factors to retain based on statistical criteria (e.g., eigenvalues, scree plot) and theoretical considerations.
  • Rotate factors: Apply factor rotation techniques (e.g., varimax, oblique) to simplify the factor structure and make it more interpretable.
  • Interpret factors: Analyze the factor loadings (correlations between variables and factors) to interpret the meaning of each factor.
  • Determine factor reliability: Assess the internal consistency or reliability of the factors using measures like Cronbach’s alpha.
  • Report results: Document the factor loadings, rotated component matrix, communalities, and any other relevant information.

   b. Confirmatory Factor Analysis (CFA):

  • Formulate a theoretical model: Specify the hypothesized relationships among variables and factors based on prior knowledge or theoretical considerations.
  • Define measurement model: Establish how each variable is related to the underlying factors by assigning factor loadings in the model.
  • Test the model: Use statistical techniques like maximum likelihood estimation or structural equation modeling to assess the goodness-of-fit between the observed data and the hypothesized model.
  • Modify the model: If the initial model does not fit the data adequately, revise the model by adding or removing paths, allowing for correlated errors, or other modifications to improve model fit.
  • Report results: Present the final measurement model, parameter estimates, fit indices (e.g., chi-square, RMSEA, CFI), and any modifications made.

6. Interpret and Validate the Factors:

Once you have identified the factors, interpret them based on the factor loadings, theoretical understanding, and research objectives. Validate the factors by examining their relationships with external criteria or by conducting further analyses if necessary.
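
To make steps 4 through 6 concrete for the exploratory case, here is a compact EFA sketch in Python. It assumes the third-party factor_analyzer package and a hypothetical survey.csv; the choice of three factors and a varimax rotation is illustrative, not prescriptive:

```python
import pandas as pd
from factor_analyzer import FactorAnalyzer

df = pd.read_csv("survey.csv")  # hypothetical item-level dataset

# Extract three factors with principal-axis-style extraction and a varimax rotation
fa = FactorAnalyzer(n_factors=3, method="principal", rotation="varimax")
fa.fit(df)

eigenvalues, _ = fa.get_eigenvalues()  # inspect alongside a scree plot to choose the factor count
loadings = pd.DataFrame(fa.loadings_, index=df.columns)

print(eigenvalues.round(2))
print(loadings.round(2))                # correlations between items and factors
print(fa.get_communalities().round(2))  # variance in each item explained by the factors
```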

Types of Factor Analysis

Types of Factor Analysis are as follows:

Exploratory Factor Analysis (EFA)

EFA is used to explore the underlying structure of a set of observed variables without any preconceived assumptions about the number or nature of the factors. It aims to discover the number of factors and how the observed variables are related to those factors. EFA does not impose any restrictions on the factor structure and allows for cross-loadings of variables on multiple factors.

Confirmatory Factor Analysis (CFA)

CFA is used to test a pre-specified factor structure based on theoretical or conceptual assumptions. It aims to confirm whether the observed variables measure the latent factors as intended. CFA tests the fit of a hypothesized model and assesses how well the observed variables are associated with the expected factors. It is often used for validating measurement instruments or evaluating theoretical models.
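
As an illustration of the confirmatory case, here is a minimal CFA sketch in Python using the third-party semopy package. The factor names, item columns, and survey.csv file are hypothetical; the lavaan-style description specifies which items are expected to load on which factors:

```python
import pandas as pd
import semopy

# Hypothetical measurement model: two latent factors with three indicators each
model_desc = """
F1 =~ item1 + item2 + item3
F2 =~ item4 + item5 + item6
"""

df = pd.read_csv("survey.csv")     # hypothetical dataset containing item1..item6
model = semopy.Model(model_desc)
model.fit(df)                      # maximum likelihood estimation by default

print(model.inspect())             # parameter estimates, including factor loadings
print(semopy.calc_stats(model))    # fit indices such as chi-square, RMSEA, and CFI
```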

Principal Component Analysis (PCA)

PCA is a dimensionality reduction technique that can be considered a form of factor analysis, although it has some differences. PCA aims to explain the maximum amount of variance in the observed variables using a smaller number of uncorrelated components. Unlike traditional factor analysis, PCA does not assume that the observed variables are caused by underlying factors but focuses solely on accounting for variance.

Common Factor Analysis

It assumes that the observed variables are influenced by common factors and unique factors (specific to each variable). It attempts to estimate the common factor structure by extracting the shared variance among the variables while also considering the unique variance of each variable.

Hierarchical Factor Analysis

Hierarchical factor analysis involves multiple levels of factors. It explores both higher-order and lower-order factors, aiming to capture the complex relationships among variables. Higher-order factors are based on the relationships among lower-order factors, which are in turn based on the relationships among observed variables.

Factor Analysis Formulas

Factor Analysis is a statistical method used to describe variability among observed, correlated variables in terms of a potentially lower number of unobserved variables called factors.

Here are some of the essential formulas and calculations used in factor analysis:

Correlation Matrix :

The first step in factor analysis is to create a correlation matrix, which calculates the correlation coefficients between pairs of variables.

Correlation coefficient (Pearson’s r) between variables X and Y is calculated as:

r(X,Y) = Σ[(xi – x̄)(yi – ȳ)] / [(n – 1) σx σy]

where: xi, yi are the data points, x̄, ȳ are the means of X and Y respectively, σx, σy are the standard deviations of X and Y respectively, n is the number of data points.
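
As a quick check of this formula, a few lines of NumPy (the data values here are made up):

```python
import numpy as np

x = np.array([2.0, 4.0, 6.0, 8.0])
y = np.array([1.0, 3.0, 5.0, 9.0])
n = len(x)

# Manual Pearson's r using the formula above (ddof=1 gives the sample standard deviation)
r_manual = ((x - x.mean()) * (y - y.mean())).sum() / ((n - 1) * x.std(ddof=1) * y.std(ddof=1))

# NumPy's built-in correlation matrix agrees
r_numpy = np.corrcoef(x, y)[0, 1]
print(round(r_manual, 4), round(r_numpy, 4))
```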

Extraction of Factors :

The extraction of factors from the correlation matrix is typically done by methods such as Principal Component Analysis (PCA) or other similar methods.

The formula used in PCA to calculate the principal components (factors) involves finding the eigenvalues and eigenvectors of the correlation matrix.

Let’s denote the correlation matrix as R. If λ is an eigenvalue of R, and v is the corresponding eigenvector, they satisfy the equation: Rv = λv
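
In NumPy, the eigenvalues and eigenvectors of a correlation matrix can be computed directly; X here is an assumed data array of shape (n_samples, n_variables):

```python
import numpy as np

R = np.corrcoef(X, rowvar=False)      # correlation matrix of the observed variables
eigvals, eigvecs = np.linalg.eigh(R)  # eigh is appropriate for symmetric matrices

order = np.argsort(eigvals)[::-1]     # sort from largest to smallest eigenvalue
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

# Sanity check: R v = lambda v holds for the leading eigenpair
assert np.allclose(R @ eigvecs[:, 0], eigvals[0] * eigvecs[:, 0])
```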

Factor Loadings :

Factor loadings are the correlations between the original variables and the factors. They can be calculated as the eigenvectors scaled by the square roots of their corresponding eigenvalues.

Communality and Specific Variance :

Communality of a variable is the proportion of variance in that variable explained by the factors. It can be calculated as the sum of squared factor loadings for that variable across all factors.

The specific variance of a variable is the proportion of variance in that variable not explained by the factors, and it’s calculated as 1 – Communality.
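
Extending the eigendecomposition sketch above, loadings, communalities, and specific variances follow directly (X is the assumed data array, and the number of retained factors k is the analyst's choice):

```python
import numpy as np

R = np.corrcoef(X, rowvar=False)             # X: assumed (n_samples, p) data array
eigvals, eigvecs = np.linalg.eigh(R)
order = np.argsort(eigvals)[::-1]
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

k = 2                                        # number of retained factors (assumed)
loadings = eigvecs[:, :k] * np.sqrt(eigvals[:k])   # eigenvectors scaled by sqrt(eigenvalue)

communality = (loadings ** 2).sum(axis=1)    # variance in each variable explained by the factors
specific_variance = 1.0 - communality        # unique variance: 1 minus communality
```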

Factor Rotation : Factor rotation, such as Varimax or Promax, is used to make the output more interpretable. It doesn’t change the underlying relationships but affects the loadings of the variables on the factors.

For example, in the Varimax rotation, the objective is to maximize the variance of the squared loadings of a factor (column) across the variables (rows) in the factor matrix, which pushes loadings toward high and low extremes and makes each factor easier to interpret.
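
The varimax criterion can be optimized with the classic SVD-based iteration. Here is a minimal sketch; gamma=1 gives varimax within the more general orthomax family:

```python
import numpy as np

def varimax(loadings, gamma=1.0, max_iter=100, tol=1e-6):
    """Rotate a loading matrix to (approximately) maximize the varimax criterion."""
    p, k = loadings.shape
    rotation = np.eye(k)
    criterion = 0.0
    for _ in range(max_iter):
        rotated = loadings @ rotation
        # Gradient-style update for the orthomax family of criteria
        target = rotated ** 3 - (gamma / p) * rotated @ np.diag((rotated ** 2).sum(axis=0))
        u, s, vt = np.linalg.svd(loadings.T @ target)
        rotation = u @ vt                 # nearest orthogonal rotation
        new_criterion = s.sum()
        if new_criterion < criterion * (1 + tol):
            break                         # converged: criterion no longer improving
        criterion = new_criterion
    return loadings @ rotation
```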

Examples of Factor Analysis

Here are some real-time examples of factor analysis:

  • Psychological Research: In a study examining personality traits, researchers may use factor analysis to identify the underlying dimensions of personality by analyzing responses to various questionnaires or surveys. Factors such as extroversion, neuroticism, and conscientiousness can be derived from the analysis.
  • Market Research: In marketing, factor analysis can be used to understand consumers’ preferences and behaviors. For instance, by analyzing survey data related to product features, pricing, and brand perception, researchers can identify factors such as price sensitivity, brand loyalty, and product quality that influence consumer decision-making.
  • Finance and Economics: Factor analysis is widely used in portfolio management and asset pricing models. By analyzing historical market data, factors such as market returns, interest rates, inflation rates, and other economic indicators can be identified. These factors help in understanding and predicting investment returns and risk.
  • Social Sciences: Factor analysis is employed in social sciences to explore underlying constructs in complex datasets. For example, in education research, factor analysis can be used to identify dimensions such as academic achievement, socio-economic status, and parental involvement that contribute to student success.
  • Health Sciences: In medical research, factor analysis can be utilized to identify underlying factors related to health conditions, symptom clusters, or treatment outcomes. For instance, in a study on mental health, factor analysis can be used to identify underlying factors contributing to depression, anxiety, and stress.
  • Customer Satisfaction Surveys: Factor analysis can help businesses understand the key drivers of customer satisfaction. By analyzing survey responses related to various aspects of product or service experience, factors such as product quality, customer service, and pricing can be identified, enabling businesses to focus on areas that impact customer satisfaction the most.

Factor analysis in Research Example

Here’s an example of how factor analysis might be used in research:

Let’s say a psychologist is interested in the factors that contribute to overall wellbeing. They conduct a survey with 1000 participants, asking them to respond to 50 different questions relating to various aspects of their lives, including social relationships, physical health, mental health, job satisfaction, financial security, personal growth, and leisure activities.

Given the broad scope of these questions, the psychologist decides to use factor analysis to identify underlying factors that could explain the correlations among responses.

After conducting the factor analysis, the psychologist finds that the responses can be grouped into five factors:

  • Physical Wellbeing : Includes variables related to physical health, exercise, and diet.
  • Mental Wellbeing : Includes variables related to mental health, stress levels, and emotional balance.
  • Social Wellbeing : Includes variables related to social relationships, community involvement, and support from friends and family.
  • Professional Wellbeing : Includes variables related to job satisfaction, work-life balance, and career development.
  • Financial Wellbeing : Includes variables related to financial security, savings, and income.

By reducing the 50 individual questions to five underlying factors, the psychologist can more effectively analyze the data and draw conclusions about the major aspects of life that contribute to overall wellbeing.

In this way, factor analysis helps researchers understand complex relationships among many variables by grouping them into a smaller number of factors, simplifying the data analysis process, and facilitating the identification of patterns or structures within the data.

When to Use Factor Analysis

Here are some circumstances in which you might want to use factor analysis:

  • Data Reduction : If you have a large set of variables, you can use factor analysis to reduce them to a smaller set of factors. This helps in simplifying the data and making it easier to analyze.
  • Identification of Underlying Structures : Factor analysis can be used to identify underlying structures in a dataset that are not immediately apparent. This can help you understand complex relationships between variables.
  • Validation of Constructs : Factor analysis can be used to confirm whether a scale or measure truly reflects the construct it’s meant to measure. If all the items in a scale load highly on a single factor, that supports the construct validity of the scale.
  • Generating Hypotheses : By revealing the underlying structure of your variables, factor analysis can help to generate hypotheses for future research.
  • Survey Analysis : If you have a survey with many questions, factor analysis can help determine if there are underlying factors that explain response patterns.

Applications of Factor Analysis

Factor Analysis has a wide range of applications across various fields. Here are some of them:

  • Psychology : It’s often used in psychology to identify the underlying factors that explain different patterns of correlations among mental abilities. For instance, factor analysis has been used to identify personality traits (like the Big Five personality traits), intelligence structures (like Spearman’s g), or to validate the constructs of different psychological tests.
  • Market Research : In this field, factor analysis is used to identify the factors that influence purchasing behavior. By understanding these factors, businesses can tailor their products and marketing strategies to meet the needs of different customer groups.
  • Healthcare : In healthcare, factor analysis is used in a similar way to psychology, identifying underlying factors that might influence health outcomes. For instance, it could be used to identify lifestyle or behavioral factors that influence the risk of developing certain diseases.
  • Sociology : Sociologists use factor analysis to understand the structure of attitudes, beliefs, and behaviors in populations. For example, factor analysis might be used to understand the factors that contribute to social inequality.
  • Finance and Economics : In finance, factor analysis is used to identify the factors that drive financial markets or economic behavior. For instance, factor analysis can help understand the factors that influence stock prices or economic growth.
  • Education : In education, factor analysis is used to identify the factors that influence academic performance or attitudes towards learning. This could help in developing more effective teaching strategies.
  • Survey Analysis : Factor analysis is often used in survey research to reduce the number of items or to identify the underlying structure of the data.
  • Environment : In environmental studies, factor analysis can be used to identify the major sources of environmental pollution by analyzing the data on pollutants.

Advantages of Factor Analysis

Advantages of Factor Analysis are as follows:

  • Data Reduction : Factor analysis can simplify a large dataset by reducing the number of variables. This helps make the data easier to manage and analyze.
  • Structure Identification : It can identify underlying structures or patterns in a dataset that are not immediately apparent. This can provide insights into complex relationships between variables.
  • Construct Validation : Factor analysis can be used to validate whether a scale or measure accurately reflects the construct it’s intended to measure. This is important for ensuring the reliability and validity of measurement tools.
  • Hypothesis Generation : By revealing the underlying structure of your variables, factor analysis can help generate hypotheses for future research.
  • Versatility : Factor analysis can be used in various fields, including psychology, market research, healthcare, sociology, finance, education, and environmental studies.

Disadvantages of Factor Analysis

Disadvantages of Factor Analysis are as follows:

  • Subjectivity : The interpretation of the factors can sometimes be subjective, depending on how the data is perceived. Different researchers might interpret the factors differently, which can lead to different conclusions.
  • Assumptions : Factor analysis assumes that there’s some underlying structure in the dataset and that all variables are related. If these assumptions do not hold, factor analysis might not be the best tool for your analysis.
  • Large Sample Size Required : Factor analysis generally requires a large sample size to produce reliable results. This can be a limitation in studies where data collection is challenging or expensive.
  • Correlation, not Causation : Factor analysis identifies correlational relationships, not causal ones. It cannot prove that changes in one variable cause changes in another.
  • Complexity : The statistical concepts behind factor analysis can be difficult to understand and require expertise to implement correctly. Misuse or misunderstanding of the method can lead to incorrect conclusions.


Factor Analysis Guide with an Example

By Jim Frost

What is Factor Analysis?

Factor analysis uses the correlation structure amongst observed variables to model a smaller number of unobserved, latent variables known as factors. Researchers use this statistical method when subject-area knowledge suggests that latent factors cause observable variables to covary. Use factor analysis to identify the hidden variables.

Analysts often refer to the observed variables as indicators because they literally indicate information about the factor. Factor analysis treats these indicators as linear combinations of the factors in the analysis plus an error. The procedure assesses how much of the variance each factor explains within the indicators. The idea is that the latent factors create commonalities in some of the observed variables.

For example, socioeconomic status (SES) is a factor you can’t measure directly. However, you can assess occupation, income, and education levels. These variables all relate to socioeconomic status. People with a particular socioeconomic status tend to have similar values for the observable variables. If the factor (SES) has a strong relationship with these indicators, then it accounts for a large portion of the variance in the indicators.

The illustration below shows how the four hidden factors in blue drive the measurable values in the yellow indicator tags.

Factor analysis illustration.

Researchers frequently use factor analysis in psychology, sociology, marketing, and machine learning.

Let’s dig deeper into the goals of factor analysis, critical methodology choices, and an example. This guide provides practical advice for performing factor analysis.

Analysis Goals

Factor analysis simplifies a complex dataset by taking a larger number of observed variables and reducing them to a smaller set of unobserved factors. Anytime you simplify something, you’re trading off exactness with ease of understanding. Ideally, you obtain a result where the simplification helps you better understand the underlying reality of the subject area. However, this process involves several methodological and interpretative judgment calls. Indeed, while the analysis identifies factors, it’s up to the researchers to name them! Consequently, analysts debate factor analysis results more often than other statistical analyses.

While all factor analysis aims to find latent factors, researchers use it for two primary goals. They either want to explore and discover the structure within a dataset or confirm the validity of existing hypotheses and measurement instruments.

Exploratory Factor Analysis (EFA)

Researchers use exploratory factor analysis (EFA) when they do not already have a good understanding of the factors present in a dataset. In this scenario, they use factor analysis to find the factors within a dataset containing many variables. Use this approach before forming hypotheses about the patterns in your dataset. In exploratory factor analysis, researchers are likely to use statistical output and graphs to help determine the number of factors to extract.

Exploratory factor analysis is most effective when multiple variables are related to each factor. During EFA, the researchers must decide how to conduct the analysis (e.g., number of factors, extraction method, and rotation) because there are no hypotheses or assessment instruments to guide them. Use the methodology that makes sense for your research.

For example, researchers can use EFA to create a scale, a set of questions measuring one factor. Exploratory factor analysis can find the survey items that load on certain constructs.

Confirmatory Factor Analysis (CFA)

Confirmatory factor analysis (CFA) is a more rigid process than EFA. Using this method, the researchers seek to confirm existing hypotheses developed by themselves or others. This process aims to confirm previous ideas, research, and measurement and assessment instruments. Consequently, the nature of what they want to verify will impose constraints on the analysis.

Before the factor analysis, the researchers must state their methodology including extraction method, number of factors, and type of rotation. They base these decisions on the nature of what they’re confirming. Afterwards, the researchers will determine whether the model’s goodness-of-fit and pattern of factor loadings match those predicted by the theory or assessment instruments.

In this vein, confirmatory factor analysis can help assess construct validity. The underlying constructs are the latent factors, while the items in the assessment instrument are the indicators. Similarly, it can also evaluate the validity of measurement systems. Does the tool measure the construct it claims to measure?

For example, researchers might want to confirm factors underlying the items in a personality inventory. Matching the inventory and its theories will impose methodological choices on the researchers, such as the number of factors.

We’ll get to an example factor analysis in short order, but first, let’s cover some key concepts and methodology choices you’ll need to know for the example.

Learn more about Validity and Construct Validity.

In this context, factors are broader concepts or constructs that researchers can’t measure directly. These deeper factors drive other observable variables. Consequently, researchers infer the properties of unobserved factors by measuring variables that correlate with the factor. In this manner, factor analysis lets researchers identify factors they can’t evaluate directly.

Psychologists frequently use factor analysis because many of their factors are inherently unobservable because they exist inside the human brain.

For example, depression is a condition inside the mind that researchers can’t directly observe. However, they can ask questions and make observations about different behaviors and attitudes. Depression is an invisible driver that affects many outcomes we can measure. Consequently, people with depression will tend to have more similar responses to those outcomes than those who are not depressed.

For similar reasons, factor analysis in psychology often identifies and evaluates other mental characteristics, such as intelligence, perseverance, and self-esteem. The researchers can see how a set of measurements load on these factors and others.

Method of Factor Extraction

The first methodology choice for factor analysis is the mathematical approach for extracting the factors from your dataset. The most common choices are maximum likelihood (ML), principal axis factoring (PAF), and principal components analysis (PCA).

You should use either ML or PAF most of the time.

Use ML when your data follow a normal distribution. In addition to extracting factor loadings, it can also perform hypothesis tests, construct confidence intervals, and calculate goodness-of-fit statistics.

Use PAF when your data violate multivariate normality. PAF doesn't assume that your data follow any particular distribution, so you could also use it when they are normally distributed. However, this method can't provide all the statistical measures that ML can.

PCA is the default method for factor analysis in some statistical software packages, but it isn’t a factor extraction method. It is a data reduction technique to find components. There are technical differences, but in a nutshell, factor analysis aims to reveal latent factors while PCA is only for data reduction. While calculating the components, PCA doesn’t assess the underlying commonalities that unobserved factors cause.

PCA gained popularity because it was a faster algorithm during a time of slower, more expensive computers. If you're using PCA for factor analysis, do some research to be sure it's the correct method for your study. Learn more about PCA in Principal Component Analysis Guide and Example.

There are other methods of factor extraction, but the factor analysis literature has not strongly shown that any of them are better than maximum likelihood or principal axis factoring.
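
As a sketch of how this choice looks in code, assuming the third-party factor_analyzer package (its method argument selects the extraction approach; df is a hypothetical DataFrame of indicator variables):

```python
from factor_analyzer import FactorAnalyzer

# ML extraction when the data are plausibly multivariate normal
fa_ml = FactorAnalyzer(n_factors=5, method="ml", rotation="varimax").fit(df)

# Principal-axis-style extraction when normality is doubtful
fa_paf = FactorAnalyzer(n_factors=5, method="principal", rotation="varimax").fit(df)
```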

Number of Factors to Extract

You need to specify the number of factors to extract from your data except when using principal components analysis. The method for determining that number depends on whether you're performing exploratory or confirmatory factor analysis.

Exploratory Factor Analysis

In EFA, researchers must specify the number of factors to retain. The maximum number of factors you can extract equals the number of variables in your dataset. However, you typically want to reduce the number of factors as much as possible while maximizing the total amount of variance the factors explain.

That’s the notion of a parsimonious model in statistics. When adding factors, there are diminishing returns. At some point, you’ll find that an additional factor doesn’t substantially increase the explained variance. That’s when adding factors needlessly complicates the model. Go with the simplest model that explains most of the variance.

Fortunately, a simple statistical tool known as a scree plot helps you manage this tradeoff.

Use your statistical software to produce a scree plot. Then look for the bend in the data where the curve flattens. The number of points before the bend is often the correct number of factors to extract.

The scree plot below relates to the factor analysis example later in this post. The graph displays the Eigenvalues by the number of factors. Eigenvalues relate to the amount of explained variance.

Scree plot that helps us decide the number of factors to extract.

The scree plot shows the bend in the curve occurring at factor 6. Consequently, we need to extract five factors. Those five explain most of the variance. Additional factors do not explain much more.

Some analysts and software use Eigenvalues > 1 to retain a factor. However, simulation studies have found that this tends to extract too many factors and that the scree plot method is better. (Costello & Osborne, 2005).
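
Producing a scree plot takes only a few lines in Python; X is an assumed data array, and the dashed line marks the eigenvalue = 1 rule for comparison:

```python
import numpy as np
import matplotlib.pyplot as plt

R = np.corrcoef(X, rowvar=False)            # correlation matrix of the observed variables
eigvals = np.linalg.eigvalsh(R)[::-1]       # eigenvalues, largest first

plt.plot(np.arange(1, len(eigvals) + 1), eigvals, "o-")
plt.axhline(1.0, linestyle="--")            # the eigenvalues-greater-than-1 reference
plt.xlabel("Factor number")
plt.ylabel("Eigenvalue")
plt.title("Scree plot")
plt.show()
```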

Of course, as you explore your data and evaluate the results, you can use theory and subject-area knowledge to adjust the number of factors. The factors and their interpretations must fit the context of your study.

Confirmatory Factor Analysis

In CFA, researchers specify the number of factors to retain using existing theory or measurement instruments before performing the analysis. For example, if a measurement instrument purports to assess three constructs, then the factor analysis should extract three factors and see if the results match theory.

Factor Loadings

In factor analysis, the loadings describe the relationships between the factors and the observed variables. By evaluating the factor loadings, you can understand the strength of the relationship between each variable and the factor. Additionally, you can identify the observed variables corresponding to a specific factor.

Interpret loadings like correlation coefficients. Values range from -1 to +1. The sign indicates the direction of the relationship (positive or negative), while the absolute value indicates the strength. Stronger relationships have factor loadings closer to -1 and +1. Weaker relationships are close to zero.

Stronger relationships in the factor analysis context indicate that the factors explain much of the variance in the observed variables.

Related post: Correlation Coefficients

Factor Rotations

In factor analysis, the initial set of loadings is only one of an infinite number of possible solutions that describe the data equally. Unfortunately, the initial answer is frequently difficult to interpret because each factor can contain middling loadings for many indicators. That makes it hard to label them. You want to say that particular variables correlate strongly with a factor while most others do not correlate at all. A sharp contrast between high and low loadings makes that easier.

Rotating the factors addresses this problem by maximizing and minimizing the entire set of factor loadings. The goal is to produce a limited number of high loadings and many low loadings for each factor.

This combination lets you identify the relatively few indicators that strongly correlate with a factor and the larger number of variables that do not correlate with it. You can more easily determine what relates to a factor and what does not. This condition is what statisticians mean by simplifying factor analysis results and making them easier to interpret.

Graphical illustration

Let me show you how factor rotations work graphically using scatterplots.

Factor analysis starts by calculating the pattern of factor loadings. However, it picks an arbitrary set of axes by which to report them. Rotating the axes while leaving the data points unaltered keeps the original model and data pattern in place while producing more interpretable results.

To make this graphable in two dimensions, we’ll use two factors represented by the X and Y axes. On the scatterplot below, the six data points represent the observed variables, and the X and Y coordinates indicate their loadings for the two factors. Ideally, the dots fall right on an axis because that shows a high loading for that factor and a zero loading for the other.

Scatterplot of the initial factor loadings.

For the initial factor analysis solution on the scatterplot, the points contain a mixture of both X and Y coordinates and aren’t close to a factor’s axis. That makes the results difficult to interpret because the variables have middling loads on all the factors. Visually, they’re not clumped near axes, making it difficult to assign the variables to one.

Rotating the axes around the scatterplot increases or decreases the X and Y values while retaining the original pattern of data points. At the blue rotation on the graph below, you maximize one factor loading while minimizing the other for all data points. The result is that each variable loads highly on one factor but low on the other.

Scatterplot of rotated loadings in a factor analysis.

On the graph, all data points cluster close to one of the two factors on the blue rotated axes, making it easy to associate the observed variables with one factor.

Types of Rotations

Throughout these rotations, you work with the same data points and factor analysis model. The model fits the data for the rotated loadings equally as well as the initial loadings, but they’re easier to interpret. You’re using a different coordinate system to gain a different perspective of the same pattern of points.

There are two fundamental types of rotation in factor analysis, oblique and orthogonal.

Oblique rotations allow correlation amongst the factors, while orthogonal rotations assume they are entirely uncorrelated.

Graphically, orthogonal rotations enforce a 90° separation between axes, as shown in the example above, where the rotated axes form right angles.

Oblique rotations are not required to have axes forming right angles, as shown below for a different dataset.

Oblique rotation for a factor analysis.

Notice how the freedom for each axis to take any orientation allows them to fit the data more closely than when enforcing the 90° constraint. Consequently, oblique rotations can produce simpler structures than orthogonal rotations in some cases. However, these results can contain correlated factors.

Common oblique rotations include Promax, Oblimin, and Direct Quartimin. Common orthogonal rotations include Varimax, Equimax, and Quartimax.

In practice, oblique rotations produce similar results as orthogonal rotations when the factors are uncorrelated in the real world. However, if you impose an orthogonal rotation on genuinely correlated factors, it can adversely affect the results. Despite the benefits of oblique rotations, analysts tend to use orthogonal rotations more frequently, which might be a mistake in some cases.

When choosing a rotation method in factor analysis, be sure it matches your underlying assumptions and subject-area knowledge about whether the factors are correlated.

Factor Analysis Example

Imagine that we are human resources researchers who want to understand the underlying factors for job candidates. We measured 12 variables and performed a factor analysis to identify the latent factors. Download the CSV dataset: FactorAnalysis

The first step is to determine the number of factors to extract. Earlier in this post, I displayed the scree plot, which indicated we should extract five factors. If necessary, we can perform the analysis with a different number of factors later.

For the factor analysis, we’ll assume normality and use Maximum Likelihood to extract the factors. I’d prefer to use an oblique rotation, but my software only has orthogonal rotations. So, we’ll use Varimax. Let’s perform the analysis!
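
A rough equivalent of this analysis in Python uses scikit-learn's FactorAnalysis, which estimates loadings by maximum likelihood and supports a varimax rotation; the filename mirrors the linked dataset and is an assumption:

```python
import pandas as pd
from sklearn.decomposition import FactorAnalysis
from sklearn.preprocessing import StandardScaler

df = pd.read_csv("FactorAnalysis.csv")            # assumed name for the linked CSV dataset
X = StandardScaler().fit_transform(df)            # standardize so results reflect correlations

fa = FactorAnalysis(n_components=5, rotation="varimax", random_state=0).fit(X)

loadings = pd.DataFrame(fa.components_.T, index=df.columns,
                        columns=[f"Factor{i + 1}" for i in range(5)])
communality = (loadings ** 2).sum(axis=1)         # variance explained per variable

print(loadings.round(3))
print(communality.round(3))
```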

Interpreting the Results

Statistical output for the factor analysis example.

In the bottom right of the output, we see that the five factors account for 81.8% of the variance. The %Var row along the bottom shows how much of the variance each factor explains. The five factors are roughly equal, explaining between 13.5% and 19% of the variance. Learn about Variance.

The Communality column displays the proportion of the variance the five factors explain for each variable. Values closer to 1 are better. The five factors explain the most variance for Resume (0.989) and the least for Appearance (0.643).

In the factor analysis output, the circled loadings show which variables have high loadings for each factor. As shown in the table below, we can assign labels encompassing the properties of the highly loading variables for each factor.

  • Factor 1 – Relevant Background: Academic record, Potential, Experience
  • Factor 2 – Personal Characteristics: Confidence, Likeability, Appearance
  • Factor 3 – General Work Skills: Organization, Communication
  • Factor 4 – Writing Skills: Letter, Resume
  • Factor 5 – Overall Fit: Company Fit, Job Fit

In summary, these five factors explain a large proportion of the variance, and we can devise reasonable labels for each. These five latent factors drive the values of the 12 variables we measured.

References

Abdi, Hervé (2003), "Factor Rotations in Factor Analyses," in Lewis-Beck, M., Bryman, A., & Futing, T. (Eds.), Encyclopedia of Social Sciences Research Methods. Thousand Oaks, CA: Sage.

Browne, Michael W. (2001), "An Overview of Analytic Rotation in Exploratory Factor Analysis," Multivariate Behavioral Research, 36(1), 111-150.

Costello, Anna B., & Osborne, Jason (2005), "Best Practices in Exploratory Factor Analysis: Four Recommendations for Getting the Most from Your Analysis," Practical Assessment, Research, and Evaluation, Vol. 10, Article 7.


Reader Interactions


May 26, 2024 at 8:51 am

Good day Jim, I am running into trouble with the item analysis on the 5-point Likert scale that I am trying to create. My CFI is around 0.9 and TLI is around 0.8, which is good, but my RMSEA and SRMR have awful results: the RMSEA is around 0.1 and the SRMR is 0.2. This is a roadblock for me. I want to ask how I can improve my RMSEA and SRMR so that they reach the cutoff.

I hope this message reaches you, and thank you for taking the time to read and respond to my troubled question.


May 15, 2024 at 11:27 am

Good day, Sir Jim. I am currently trying to create a 5-Likert scale that tries to measure National Identity Conformity in three ways: (1) Origin – (e.g., Americans are born in/from America), (2) Culture (e.g., Americans are patriotic) and (3) Belief (e.g., Americans embrace being Americans).

In the process of establishing the scale’s validity, I was told to use Exploratory Factor Analysis, and I would like to ask what methods of extraction and rotation can be best used to ensure that the inter-item validity of my scale is good. I would also like to understand how I can avoid crossloading or limit crossloading factors.


May 15, 2024 at 3:13 pm

I discuss those issues in this post. I'd recommend PAF as the method of extraction because your data, being Likert scale, won't be normally distributed. Read the Method of Factor Extraction section for more information.

As for cross-loading, the method of rotation can help with that. The choice depends largely on subject-area knowledge and what works best for your data, so I can’t provide a suggested method. Read the Factor Rotations section for more information about that. For instance, if you get cross-loadings with orthogonal rotations, using an oblique rotation might help.

If factor rotation doesn't sufficiently reduce cross-loading, you might need to rework your questions so they're more distinct, remove problematic items, or increase your sample size (a larger sample can provide more stable factor solutions and clearer patterns of loadings). In this scenario, where changing rotations doesn't help, you'll need to determine whether the underlying issue is with your questions or with too small a sample size.

I hope that helps!


March 6, 2024 at 10:20 pm

What does negative loadings mean? How to proceed further with these loadings?

March 6, 2024 at 10:44 pm

Loadings are like correlation coefficients and range from -1 to +1. More extreme positive and negative values indicate stronger relationships. Negative loadings indicate a negative relationship between the latent factors and observed variables. Highly negative values are as good as highly positive values. I discuss this in detail in the the Factor Loadings section of this post.


March 6, 2024 at 10:10 am

Good day Jim,

The methodology seems loaded with opportunities for errors. So often we are being asked to translate a nebulous English word into some sort of mathematical descriptor. As an example, in the section labelled 'Interpreting the Results', what are we to make of the words 'likeability' or 'self-confidence'? How can we possibly evaluate those things…and to three significant decimal places?

You Jim, understand and use statistical methods correctly. Yet, too often people who apply statistics fail to examine the language of their initial questions and end up doing poor analysis. Worse, many don’t understand the software they use.

On a more cheery note, keep up the great work. The world needs a thousand more of you.

March 6, 2024 at 5:08 pm

Thanks for the thoughtful comment. I agree with your concerns.

Ideally, all of those attributes are measured using validated measurement scales. The field of psychology is pretty good about that for terms that seem kind of squishy. For instance, they usually have thorough validation processes for personality traits, etc. However, your point is well taken, you need to be able to trust your data.

All statistical analyses depend on thorough subject-area knowledge, and that’s very true for factor analysis. You must have a solid theoretical understanding of these latent factors from extensive research before considering FA. Then FA can see if there’s evidence that they actually exist. But, I do agree with you that between the rotations and having to derive names to associate with the loadings, it can be a fairly subjective process.

Thanks so much for your kind words! I appreciate them because I do strive for accuracy.


March 2, 2024 at 8:44 pm

Sir, I want to know: after successfully identifying my 3 factors with the method given above, I now want to run a regression on the data. How do I get a single value for each factor rather than this set of values?


February 28, 2024 at 7:48 am

Hello, Thanks for your effort on this post, it really helped me a lot. I want your recommendation for my case if you don’t mind.

I’m working on my research and I’ve 5 independent variables and 1 dependent variable, I want to use a factor analysis method in order to know which variable contributes the most in the dependent variable.

Also, what kind of data checks and preparations shall I make before starting the analysis.

Thanks in advance for your consideration.

February 28, 2024 at 1:46 pm

Based on the information you provided, I don’t believe factor analysis is the correct analysis for you.

Factor analysis is primarily used for understanding the structure of a set of variables and for reducing data dimensions by identifying underlying latent factors. It’s particularly useful when you have a large number of observed variables and believe that they are influenced by a smaller number of unobserved factors.

Instead, it sounds like you have the IVs and DV and want to understand the relationships between them. For that, I recommend multiple regression. Learn more in my post about When to Use Regression. After you settle on a model, there are several ways to Identify the Most Important Variables in the Model.

In terms of checking assumptions, familiarize yourself with the Ordinary Least Squares Regression Assumptions. Least squares regression is the most common and is a good place to start.

Best of luck with your analysis!


December 1, 2023 at 1:01 pm

What would the eigenvalue be in EFA?


November 1, 2023 at 4:42 am

Hi Jim, this is an excellent yet succinct article on the topic. A very basic question, though: the dataset contains ordinal data. Is this ok? I’m a student in a Multivariate Statistics course, and as far as I’m aware, both PCA and common factor analysis dictate metric data. Or is it assumed that since the ordinal data has been coded into a range of 0-10, then the data is considered numeric and can be applied with PCA or CFA?

Sorry for the dumb question, and thank you.

November 1, 2023 at 8:00 pm

That’s a great question.

For the example in this post, we’re dealing with data on a 10 point scale where the differences between all points are equal. Consequently, we can treat discrete data as continuous data.

Now, to your question about ordinal data. You can use ordinal data with factor analysis however you might need to use specific methods.

For ordinal data, it’s often recommended to use polychoric correlations instead of Pearson correlations. Polychoric correlations estimate the correlation between two latent continuous variables that underlie the observed ordinal variables. This provides a more accurate correlation matrix for factor analysis of ordinal data.

I've also heard about categorical PCA and nonlinear factor analysis, which use a monotonic transformation of ordinal data.

I hope that helps clarify it for you!


September 2, 2023 at 4:14 pm

Once we've identified how much variability each factor contributes, what steps could we take from here to make predictions about variables?

September 2, 2023 at 6:53 pm

Hi Brittany,

Thanks for the great question! And thanks for your kind words in your other comment! 🙂

What you can do is calculate all the factor scores for each observation. Some software will do this for you as an option. Or, you can input values into the regression equations for the factor scores that are included in the output.

Then use these scores as the independent variables in regression analysis. From there, you can use the regression model to make predictions.

Ideally, you’d evaluate the regression model before making predictions and use cross validation to be sure that the model works for observations outside the dataset you used to fit the model.
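
A minimal sketch of that workflow with scikit-learn; X holds the observed indicators and y the outcome to predict, both assumed to exist:

```python
from sklearn.decomposition import FactorAnalysis
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import StandardScaler

Xs = StandardScaler().fit_transform(X)                   # X: observed indicator variables
fa = FactorAnalysis(n_components=3, rotation="varimax").fit(Xs)

scores = fa.transform(Xs)                                # one score per factor per observation
reg = LinearRegression().fit(scores, y)                  # factor scores as predictors of y
print(reg.score(scores, y))                              # R-squared of the factor-score model
```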

September 2, 2023 at 4:13 pm

Wow! This was really helpful and structured very well for interpretation. Thank you!


October 6, 2022 at 10:55 am

I can imagine that Prof will have further explanations on this down the line at some point in future. I’m waiting… Thanks Prof Jim for your usual intuitive manner of explaining concepts. Funsho


September 26, 2022 at 8:08 am

Thanks for a very comprehensive guide. I learnt a lot. In PCA, we usually extract the components and use it for predictive modeling. Is this the case with Factor Analysis as well? Can we use factors as predictors?

September 26, 2022 at 8:27 pm

I have not used factors as predictors, but I think it would be possible. However, PCA's goal is to maximize data reduction. This process is particularly valuable when you have many variables, a low sample size, and/or collinearity between the predictors. Factor analysis also reduces the data, but that's not its primary goal. Consequently, my sense is that PCA is better for predictive modeling while factor analysis is better when you're trying to understand the underlying factors (which you aren't with PCA). But, again, I haven't tried using factors in that way, nor have I compared the results to PCA. So, take that with a grain of salt!


Factor Analysis: a means for theory and instrument development in support of construct validity

Mohsen Tavakol

1 School of Medicine, Medical Education Centre, the University of Nottingham, UK

Angela Wetzel

2 School of Education, Virginia Commonwealth University, USA

Introduction

Factor analysis (FA) allows us to simplify a set of complex variables or items using statistical procedures to explore the underlying dimensions that explain the relationships between the multiple variables/items. For example, to explore inter-item relationships for a 20-item instrument, a basic analysis would produce 400 correlations; it is not an easy task to keep these matrices in our heads. FA simplifies a matrix of correlations so a researcher can more easily understand the relationship between items in a scale and the underlying factors that the items may have in common. FA is a commonly applied and widely promoted procedure for developing and refining clinical assessment instruments to produce evidence for the construct validity of the measure.

In the literature, the strong association between construct validity and FA is well documented, as the method provides evidence based on test content and evidence based on internal structure, key components of construct validity. 1 From FA, evidence based on internal structure and evidence based on test content can be examined to tell us what the instrument really measures - the intended abstract concept (i.e., a factor/dimension/construct) or something else. Establishing construct validity for the interpretations from a measure is critical to high quality assessment and subsequent research using outcomes data from the measure. Therefore, FA should be a researcher’s best friend during the development and validation of a new measure or when adapting a measure to a new population. FA is also a useful companion when critiquing existing measures for application in research or assessment practice. However, despite the popularity of FA, when applied in medical education instrument development, factor analytic procedures do not always match best practice. 2 This editorial article is designed to help medical educators use FA appropriately.

The Applications of FA

The applications of FA depend on the purpose of the research. Generally speaking, there are two main types of FA: Exploratory Factor Analysis (EFA) and Confirmatory Factor Analysis (CFA).

Exploratory Factor Analysis

Exploratory Factor Analysis (EFA) is widely used in medical education research in the early phases of instrument development, specifically for measures of latent variables that cannot be assessed directly. Typically, in EFA, the researcher, through a review of the literature and engagement with content experts, selects as many instrument items as necessary to fully represent the latent construct (e.g., professionalism). Then, using EFA, the researcher explores the results of factor loadings, along with other criteria (e.g., previous theory, Minimum average partial, 3 Parallel analysis, 4 conceptual meaningfulness, etc.) to refine the measure. Suppose an instrument consisting of 30 questions yields two factors - Factor 1 and Factor 2. A good definition of a factor as a theoretical construct is to look at its factor loadings. 5 The factor loading is the correlation between the item and the factor; a factor loading of more than 0.30 usually indicates a moderate correlation between the item and the factor. Most statistical software, such as SAS, SPSS and R, provide factor loadings. Upon review of the items loading on each factor, the researcher identifies two distinct constructs, with items loading on Factor 1 all related to professionalism, and items loading on Factor 2 related, instead, to leadership. Here, EFA helps the researcher build evidence based on internal structure by retaining only those items with appropriately high loadings on Factor 1 for professionalism, the construct of interest.

It is important to note that, often, Principal Component Analysis (PCA) is applied and described, in error, as exploratory factor analysis. 2, 6 PCA is appropriate if the study primarily aims to reduce the number of original items in the intended instrument to a smaller set. 7 However, if the instrument is being designed to measure a latent construct, EFA, using Maximum Likelihood (ML) or Principal Axis Factoring (PAF), is the appropriate method. 7 These exploratory procedures statistically analyze the interrelationships between the instrument items and domains to uncover the unknown underlying factorial structure (dimensions) of the construct of interest. PCA, by design, seeks to explain total variance (common, specific, and error variance) in the correlation matrix.

The sum of the squared loadings on a factor matrix for a particular item indicates the proportion of variance for that given item that is explained by the factors. This is called the communality. The higher the communality value, the more the extracted factors explain the variance of the item. Further, the mean of the sums of squared factor loadings specifies the proportion of variance explained by each factor. For example, assume four items of an instrument load on Factor 1 with factor loadings of 0.86, 0.75, 0.66 and 0.58, respectively. If you square each item's factor loading, you get the percentage of that item's variance explained by Factor 1: for item1, item2, item3 and item4, this is 74%, 56%, 44% and 34%, respectively. If you sum the squared factor loadings of Factor 1, you get the eigenvalue, which is 2.1, and dividing the eigenvalue by four (2.1/4 = 0.52) gives the proportion of variance accounted for by Factor 1, which is 52%.

Since PCA does not separate specific variance and error variance, it often inflates factor loadings and limits the potential for the factor structure to be generalized and applied with other samples in subsequent study. On the other hand, the Maximum Likelihood and Principal Axis Factoring extraction methods separate common and unique variance (specific and error variance), which overcomes the issue attached to PCA. Thus, the proportion of variance explained by an extracted factor more precisely reflects the extent to which the latent construct is measured by the instrument items. This focus on shared variance among items explained by the underlying factor, particularly during instrument development, helps the researcher understand the extent to which a measure captures the intended construct. It is useful to mention that in PAF, the initial communalities are not set at 1, but are chosen based on the squared multiple correlation coefficient. Indeed, if you run a multiple regression to predict, say, item1 (dependent variable) from the other items (independent variables) and then look at the R-squared (R2), you will see that R2 is equal to the communality of item1 derived from PAF.
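
The arithmetic in this example is easy to verify with a few lines of Python:

```python
import numpy as np

loadings = np.array([0.86, 0.75, 0.66, 0.58])  # Factor 1 loadings for the four items
explained = loadings ** 2                       # per-item variance explained by Factor 1
print(explained.round(2))                       # [0.74 0.56 0.44 0.34]

eigenvalue = explained.sum()                    # sum of squared loadings, about 2.07 (~2.1)
print(round(eigenvalue / 4, 2))                 # proportion of total variance: about 0.52
```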

Confirmatory Factor Analysis

When prior EFA studies are available for your intended instrument, Confirmatory Factor Analysis extends on those findings, allowing you to confirm or disconfirm the underlying factor structures, or dimensions, extracted in prior research. CFA is a theory or model-driven approach that tests how well the data “fit” to the proposed model or theory. CFA thus departs from EFA in that researchers must first identify a factor model before analysing the data. More fundamentally, CFA is a means for statistically testing the internal structure of instruments and relies on the maximum likelihood estimation (MLE) and a different set of standards for assessing the suitability of the construct of interest. 7 , 8

Factor analysts usually use a path diagram to show the theoretical and hypothesized relationships between items and factors, creating a hypothetical model to test using the ML method. In the path diagram, circles or ovals represent factors and rectangles represent instrument items. Lines (→ or ↔) represent relationships between items; no line means no relationship. A single-headed arrow shows a causal relationship (the variable the arrowhead points to is the dependent variable), and a double-headed arrow shows a covariance between variables or factors.

If CFA indicates that the primary factors, or first-order factors, produced by the prior PAF are correlated, then second-order factors need to be modelled and estimated to gain a greater understanding of the data. It should be noted that if the prior EFA applied an orthogonal rotation to the factor solution, the factors produced would be uncorrelated, and hence the analysis of second-order factors would not be possible. Generally, in social science research, most constructs assume inter-related factors, and therefore an oblique rotation should be applied. The justification for analyzing second-order factors is that when correlations between the primary factors exist, CFA can statistically model a broad picture of factors not captured by the primary factors (i.e., the first-order factors). 9 The analysis of first-order factors is like surveying mountains with zoom-lens binoculars, while the analysis of second-order factors uses a wide-angle lens. 10 Goodness-of-fit tests need to be conducted when evaluating the hypothetical model tested by CFA. The question is: does the new data fit the hypothetical model? However, the statistical models behind goodness-of-fit tests are complex and extend beyond the scope of this editorial paper; thus, we strongly encourage readers to consult with factor analysts for resources and advice.

Conclusions

Factor analysis methods can be incredibly useful tools for researchers attempting to establish high-quality measures of constructs that cannot be directly observed. Specifically, the factor solution derived from an Exploratory Factor Analysis provides a snapshot of the statistical relationships among the key behaviors, attitudes, and dispositions of the construct of interest. This snapshot provides critical evidence for the validity of the measure based on the fit of the test content to the theoretical framework that underlies the construct. Further, the relationships between factors, which can be explored with EFA and confirmed with CFA, help researchers interpret the theoretical connections between underlying dimensions of a construct, and even relationships across constructs in a broader theoretical model. However, studies that do not apply recommended extraction, rotation, and interpretation procedures in FA risk drawing faulty conclusions about the validity of a measure. As measures are picked up by other researchers and applied in experimental designs, or by practitioners as assessments in practice, application of measures with subpar evidence for validity produces a ripple effect across the field. It is incumbent on researchers to ensure best practices are applied, or to engage with methodologists for support where there are gaps in knowledge of methods. Further, it remains important to critically evaluate measures selected for research and practice, focusing on those that demonstrate alignment with best practices for FA and instrument development. 7, 11

Conflicts of Interest

The authors declare that they have no conflicts of interest.


Lesson 12: Factor Analysis Overview

Factor Analysis is a method for modeling observed variables, and their covariance structure, in terms of a smaller number of underlying unobservable (latent) “factors.” The factors typically are viewed as broad concepts or ideas that may describe an observed phenomenon. For example, a basic desire of obtaining a certain social level might explain most consumption behavior. These unobserved factors are more interesting to the social scientist than the observed quantitative measurements.

Factor analysis is generally an exploratory/descriptive method that requires many subjective judgments. It is a widely used tool and often controversial because the models, methods, and subjectivity are so flexible that debates about interpretations can occur.

The method is similar to principal components although, as the textbook points out, factor analysis is more elaborate. In one sense, factor analysis is an inversion of principal components. In factor analysis, we model the observed variables as linear functions of the "factors." In principal components, we create new variables that are linear combinations of the observed variables. In both PCA and FA, the dimension of the data is reduced. Recall that in PCA, the interpretation of the principal components is often not very clean. A particular variable may, on occasion, contribute significantly to more than one of the components. Ideally, we would like each variable to contribute significantly to only one component. A technique called factor rotation is employed toward that goal. Examples of fields where factor analysis is involved include physiology, health, intelligence, sociology, and sometimes ecology, among others.

Upon completing this lesson, you should be able to:

  • Understand the terminology of factor analysis, including the interpretation of factor loadings, specific variances, and communalities;
  • Understand how to apply both principal component and maximum likelihood methods for estimating the parameters of a factor model;
  • Understand factor rotation, and interpret rotated factor loadings.


Factor analysis and how it simplifies research findings.

There are many forms of data analysis used to report on and study survey data. Factor analysis is best used to simplify complex data sets with many variables.

What is factor analysis?

Factor analysis is the practice of condensing many variables into just a few, so that your research data is easier to work with.

For example, a retail business trying to understand customer buying behaviours might consider variables such as ‘did the product meet your expectations?’, ‘how would you rate the value for money?’ and ‘did you find the product easily?’. Factor analysis can help condense these variables into a single factor, such as ‘customer purchase satisfaction’.

[Figure: customer purchase satisfaction factor tree]

The theory is that there are deeper factors driving the underlying concepts in your data, and that you can uncover and work with them instead of dealing with the lower-level variables that cascade from them. Know that these deeper concepts aren’t necessarily immediately obvious – they might represent traits or tendencies that are hard to measure, such as extraversion or IQ.

Factor analysis is also sometimes called “dimension reduction”: you can reduce the “dimensions” of your data into one or more “super-variables,” also known as unobserved variables or latent variables. This process involves creating a factor model and often yields a factor matrix that organizes the relationship between observed variables and the factors they’re associated with.

As with any kind of process that simplifies complexity, there is a trade-off between the accuracy of the data and how easy it is to work with. With factor analysis, the best solution is the one that yields a simplification that represents the true nature of your data, with minimum loss of precision. This often means finding a balance between achieving the variance explained by the model and using fewer factors to keep the model simple.

Factor analysis isn’t a single technique, but a family of statistical methods that can be used to identify the latent factors driving observable variables. Factor analysis is commonly used in market research, as well as other disciplines like technology, medicine, sociology, field biology, education, psychology and many more.

What is a factor?

In the context of factor analysis, a factor is a hidden or underlying variable that we infer from a set of directly measurable variables.

Take ‘customer purchase satisfaction’ as an example again. This isn’t a variable you can directly ask a customer to rate, but it can be determined from the responses to correlated questions like ‘did the product meet your expectations?’, ‘how would you rate the value for money?’ and ‘did you find the product easily?’.

While not directly observable, factors are essential for providing a clearer, more streamlined understanding of data. They enable us to capture the essence of our data’s complexity, making it simpler and more manageable to work with, and without losing lots of information.


Key concepts in factor analysis

These concepts are the foundational pillars that guide the application and interpretation of factor analysis.

Variance

Central to factor analysis, variance measures how much numerical values differ from the average. In factor analysis, you’re essentially trying to understand how underlying factors influence this variance among your variables. Some factors will explain more variance than others, meaning they more accurately represent the variables they consist of.

Eigenvalue

The eigenvalue expresses the amount of variance a factor explains. If a factor has an eigenvalue of 1 or above, it explains more variance than a single observed variable, which can be useful in reducing the number of variables in your analysis. Factors with eigenvalues less than 1 account for less variability than a single variable and are generally not included in the analysis.

Factor score

A factor score is a numeric representation that tells us how strongly each variable from the original data is related to a specific factor. Also called the component score, it can help determine which variables are most influenced by each factor and are most important for each underlying concept.

Factor loading

Factor loading is the correlation coefficient for the variable and factor. Like the factor score, factor loadings give an indication of how much of the variance in an observed variable can be explained by the factor. High factor loadings (close to 1 or -1) mean the factor strongly influences the variable.

When to use factor analysis

Factor analysis is a powerful tool when you want to simplify complex data, find hidden patterns, and set the stage for deeper, more focused analysis.

It’s typically used when you’re dealing with a large number of interconnected variables, and you want to understand the underlying structure or patterns within this data. It’s particularly useful when you suspect that these observed variables could be influenced by some hidden factors.

For example, consider a business that has collected extensive customer feedback through surveys. The survey covers a wide range of questions about product quality, pricing, customer service and more. This huge volume of data can be overwhelming, and this is where factor analysis comes in. It can help condense these numerous variables into a few meaningful factors, such as ‘product satisfaction’, ‘customer service experience’ and ‘value for money’.

Factor analysis doesn’t operate in isolation – it’s often used as a stepping stone for further analysis. For example, once you’ve identified key factors through factor analysis, you might then proceed to a cluster analysis – a method that groups your customers based on their responses to these factors. The result is a clearer understanding of different customer segments, which can then guide targeted marketing and product development strategies.

By combining factor analysis with other methodologies, you can not only make sense of your data but also gain valuable insights to drive your business decisions.

Factor analysis assumptions

Factor analysis relies on several assumptions for accurate results. Violating these assumptions may lead to factors that are hard to interpret or misleading.

Linear relationships between variables

This ensures that changes in the values of your variables are consistent.

Sufficient variables for each factor

Because if only a few variables represent a factor, it might not be identified accurately.

Adequate sample size

The larger the ratio of cases (respondents, for instance) to variables, the more reliable the analysis.

No perfect multicollinearity and singularity

No variable is a perfect linear combination of other variables, and no variable is a duplicate of another.

Relevance of the variables

There should be some correlation between variables to make a factor analysis feasible.


Types of factor analysis

There are two main factor analysis methods: exploratory and confirmatory. Here’s how they are used to add value to your research process.

Confirmatory factor analysis

In this type of analysis, the researcher starts out with a hypothesis about their data that they are looking to prove or disprove. Factor analysis will confirm – or not – where the latent variables are and how much variance they account for.

Principal component analysis (PCA) is often discussed alongside confirmatory work, although strictly speaking it is a data-reduction technique rather than a form of confirmatory factor analysis. Using this method, the researcher runs the analysis to obtain multiple possible solutions that split their data among a number of factors. Items that load onto a single particular factor are more strongly related to one another and can be grouped together by the researcher using their conceptual knowledge or pre-existing research.

Using PCA will generate a range of solutions with different numbers of factors, from simplified one-factor solutions to higher levels of complexity. However, the fewer factors employed, the less variance will be accounted for in the solution.

Exploratory factor analysis

As the name suggests, exploratory factor analysis is undertaken without a hypothesis in mind. It’s an investigatory process that helps researchers understand whether associations exist between the initial variables, and if so, where they lie and how they are grouped.

How to perform factor analysis: A step-by-step guide

Performing a factor analysis involves a series of steps, often facilitated by statistical software packages like SPSS, Stata and the R programming language. Here’s a simplified overview of the process.


Prepare your data

Start with a dataset where each row represents a case (for example, a survey respondent), and each column is a variable you’re interested in. Ensure your data meets the assumptions necessary for factor analysis.

Create an initial hypothesis

If you have a theory about the underlying factors and their relationships with your variables, make a note of this. This hypothesis can guide your analysis, but keep in mind that the beauty of factor analysis is its ability to uncover unexpected relationships.

Choose the type of factor analysis

The most common type is exploratory factor analysis, which is used when you’re not sure what to expect. If you have a specific hypothesis about the factors, you might use confirmatory factor analysis.

Form your correlation matrix

After you’ve chosen the type of factor analysis, you’ll need to create the correlation matrix of your variables. This matrix, which shows the correlation coefficients between each pair of variables, forms the basis for the extraction of factors. This is a key step in building your factor analysis model.
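As a quick illustration, here is a minimal Python sketch of that step; the item names and ratings are hypothetical:

import pandas as pd

# Hypothetical 1-5 ratings from six respondents on three survey items
df = pd.DataFrame({
    "met_expectations": [4, 5, 3, 2, 5, 4],
    "value_for_money":  [4, 4, 3, 2, 5, 5],
    "easy_to_find":     [3, 5, 4, 1, 4, 4],
})
print(df.corr())  # Pearson correlation matrix: the input to factor extraction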

Decide on the extraction method

Principal component analysis is the most commonly used extraction method. If your goal is to model latent constructs rather than simply reduce data, you might opt for principal axis factoring, a factor analysis method that extracts factors from the shared variance among variables.

Determine the number of factors

Various criteria can be used here, such as Kaiser’s criterion (eigenvalues greater than 1), the scree plot method or parallel analysis. The choice depends on your data and your goals.
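Two of these criteria can be computed with a short sketch like the one below, assuming a data matrix X of survey responses; the simulated data and the 95th-percentile cutoff are illustrative choices:

import numpy as np

def suggested_factors(X, n_sims=100, seed=0):
    """Kaiser's criterion and a basic parallel analysis on the correlation matrix."""
    rng = np.random.default_rng(seed)
    n, p = X.shape
    real = np.sort(np.linalg.eigvalsh(np.corrcoef(X, rowvar=False)))[::-1]
    sims = np.empty((n_sims, p))
    for i in range(n_sims):
        Z = rng.normal(size=(n, p))  # random data with no underlying factors
        sims[i] = np.sort(np.linalg.eigvalsh(np.corrcoef(Z, rowvar=False)))[::-1]
    cutoff = np.percentile(sims, 95, axis=0)
    return int((real > 1).sum()), int((real > cutoff).sum())  # (Kaiser, parallel)

kaiser, parallel = suggested_factors(np.random.default_rng(1).normal(size=(200, 10)))
print(kaiser, parallel)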

Interpret and validate your results

Each factor will be associated with a set of your original variables, so label each factor based on how you interpret these associations. These labels should represent the underlying concept that ties the associated variables together.

Validation can be done through a variety of methods, like splitting your data in half and checking if both halves produce the same factors.
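A minimal sketch of that split-half check, using first-component loadings and Tucker's congruence coefficient; the simulated data and the single-factor extraction are assumptions for illustration:

import numpy as np

def first_loadings(X):
    """Loadings of the first principal component of the correlation matrix."""
    vals, vecs = np.linalg.eigh(np.corrcoef(X, rowvar=False))
    return vecs[:, -1] * np.sqrt(vals[-1])   # eigh sorts eigenvalues ascending

def congruence(a, b):
    """Tucker's congruence coefficient; values near +/-1 mean the same pattern."""
    return float(a @ b / np.sqrt((a @ a) * (b @ b)))

rng = np.random.default_rng(0)
X = rng.normal(size=(400, 6)) + rng.normal(size=(400, 1))  # six items, one common influence
half1, half2 = X[:200], X[200:]
print(congruence(first_loadings(half1), first_loadings(half2)))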

How factor analysis can help you

As well as giving you fewer variables to navigate, factor analysis can help you understand grouping and clustering in your input variables, since they’ll be grouped according to the latent variables.

Say you ask several questions all designed to explore different, but closely related, aspects of customer satisfaction:

  • How satisfied are you with our product?
  • Would you recommend our product to a friend or family member?
  • How likely are you to purchase our product in the future?

But you only want one variable to represent a customer satisfaction score. One option would be to average the three question responses. Another option would be to create a factor dependent variable. This can be done by running a principal component analysis (PCA) and keeping the first principal component (also known as a factor). The advantage of a PCA over an average is that it automatically weights each of the variables in the calculation.
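Here is a minimal sketch of that option, assuming responses to the three questions are held in a NumPy array; the simulated data stand in for real survey responses:

import numpy as np

rng = np.random.default_rng(0)
satisfaction = rng.normal(size=(100, 1))
X = satisfaction + rng.normal(scale=0.5, size=(100, 3))  # three correlated items

Z = (X - X.mean(axis=0)) / X.std(axis=0)        # standardize each item
vals, vecs = np.linalg.eigh(np.corrcoef(X, rowvar=False))
weights = vecs[:, -1]                            # first principal component weights
score = Z @ weights                              # one satisfaction score per respondent
print(score[:5])

Unlike a simple average, the component weights reflect how strongly each question relates to the others.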

Say you have a list of questions and you don’t know exactly which responses will move together and which will move differently; for example, purchase barriers of potential customers. The following are possible barriers to purchase:

  • Price is prohibitive
  • Overall implementation costs
  • We can’t reach a consensus in our organization
  • Product is not consistent with our business strategy
  • I need to develop an ROI, but cannot or have not
  • We are locked into a contract with another product
  • The product benefits don’t outweigh the cost
  • We have no reason to switch
  • Our IT department cannot support your product
  • We do not have sufficient technical resources
  • Your product does not have a feature we require
  • Other (please specify)

Factor analysis can uncover the trends of how these questions will move together. The following are loadings for 3 factors for each of the variables.

[Table: loadings of each purchase-barrier variable on three factors]

Notice how each of the principal components has high weights for a subset of the variables. "Weight" is used interchangeably with "loading", and a high weight indicates the variables that are most influential for each principal component. A weight of 0.30 or above (in absolute value) is generally considered heavy.

The first component displays heavy weights for variables related to cost, the second weights variables related to IT, and the third weights variables related to organizational factors. We can give our new super variables clever names.

[Table: the same loadings with the factors labelled Cost Barrier, IT Barrier and Org Barrier]

If we were to cluster the customers based on these three components, we see some trends. Customers tend to be high in cost barriers or organizational barriers, but not both.

The red dots represent respondents who indicated higher organizational barriers; the green dots represent respondents who indicated higher cost barriers.

[Figure: respondents plotted by cost and organizational barrier scores]

Considerations when using factor analysis

Factor analysis is a tool, and like any tool its effectiveness depends on how you use it. When employing factor analysis, it’s essential to keep a few key considerations in mind.

Oversimplification

While factor analysis is great for simplifying complex data sets, there’s a risk of oversimplification when grouping variables into factors. To avoid this you should ensure the reduced factors still accurately represent the complexities of your variables.

Subjectivity

Interpreting the factors can sometimes be subjective, and requires a good understanding of the variables and the context. Be mindful that multiple analysts may come up with different names for the same factor.

Supplementary techniques

Factor analysis is often just the first step. Consider how it fits into your broader research strategy and which other techniques you’ll use alongside it.

Examples of factor analysis studies

Factor analysis, including PCA, is often used in tandem with segmentation studies. It might be an intermediary step to reduce variables before using KMeans to make the segments.

Factor analysis provides simplicity after reducing variables. For long studies with large blocks of Matrix Likert scale questions, the number of variables can become unwieldy. Simplifying the data using factor analysis helps analysts focus and clarify the results, while also reducing the number of dimensions they’re clustering on.
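That pipeline can be sketched with scikit-learn as below; the number of components and clusters are illustrative choices, not recommendations:

import numpy as np
from sklearn.decomposition import PCA
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
X = rng.normal(size=(300, 12))                   # e.g., twelve Likert-scale items

scores = PCA(n_components=3).fit_transform(X)    # reduce items to three component scores
segments = KMeans(n_clusters=4, n_init=10).fit_predict(scores)
print(np.bincount(segments))                     # respondents per segment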

Sample questions for factor analysis

Choosing exactly which questions to perform factor analysis on is both an art and a science. Choosing which variables to reduce takes some experimentation, patience and creativity. Factor analysis works well on Likert scale questions and Sum to 100 question types.

Factor analysis works well on matrix blocks of the following question genres:

Psychographics (Agree/Disagree):

  • I value family
  • I believe brand represents value

Behavioral (Agree/Disagree):

  • I purchase the cheapest option
  • I am a bargain shopper

Attitudinal (Agree/Disagree):

  • The economy is not improving
  • I am pleased with the product

Activity-Based (Agree/Disagree):

  • I love sports
  • I sometimes shop online during work hours

Behavioral and psychographic questions are especially suited for factor analysis.

Sample output reports

Factor analysis produces a score on each factor for every respondent, built from the factor loadings. These scores can be used like other responses in the survey.

Respondent           Cost Barrier   IT Barrier   Org Barrier
R_3NWlKlhmlRM0Lgb        0.7            1.3          -0.9
R_Wp7FZE1ziZ9czSN        0.2           -0.4          -0.3
R_SJlfo8Lpb6XTHGh       -0.1            0.1           0.4
R_1Kegjs7Q3AL49wO       -0.1           -0.3          -0.2
R_1IY1urS9bmfIpbW        1.6            0.3          -0.3


Factor Analysis: Definition, Types, and Examples

by Tim Gell

Posted at: 5/5/2023 7:30 PM

Factor analysis is a statistical method used to uncover underlying dimensions, or factors, in a dataset.

By examining patterns of correlation between variables, factor analysis helps to identify groups of variables that are highly interrelated and can be used to explain a common underlying theme.

In this blog, we will explore the different types of factor analysis, their benefits, and examples of when to use them.

What is Factor Analysis?

Factor analysis is a commonly used data reduction statistical technique within the context of market research. The goal of factor analysis is to discover relationships between variables within a dataset by looking at correlations.

This advanced technique groups questions that are answered similarly among respondents in a survey.

The output will be a set of latent factors that represent questions that “move” together.

In other words, a resulting factor may consist of several survey questions whose data tend to increase or decrease in unison.

If you don’t need the underlying factors in your dataset and just want to understand the relationship between variables, regression analysis may be a better fit.

What are the Different Types of Factor Analysis? 

When discussing this topic, it is always good to distinguish between the different types of factor analysis. 

There are different approaches that achieve similar results in the end, but it's important to understand that there is different math going on behind the scenes for each method. 

Types of factor analysis include:

  • Principal component analysis
  • Exploratory factor analysis
  • Confirmatory factor analysis 

1. Principal component analysis

Factor analysis assumes the existence of latent factors within the dataset, and then works backward from there to identify the factors.

In contrast, principal component analysis (also known as PCA) starts from the observed variables themselves and combines them into composites.

With PCA, you're creating a weighted average of the variables, called a "component," which is similar to a factor.

2. Exploratory factor analysis

In exploratory factor analysis, you're forming a hypothesis about potential relationships between your variables.

You might be using this approach if you're not sure what to expect in the way of factors.

You may need assistance with identifying the underlying themes among your survey questions, and in this case, I recommend working with a market research company, like Drive Research.

Exploratory factor analysis ultimately helps understand how many factors are present in the data and what the skeleton of the factors might look like.

The process involves a manual review of the factor loading values for each input variable; these outputs are used to assess the suitability of the factors.

Do these factors make sense? If they don’t, make adjustments to the inputs and try again.

If they do, you often move forward to the next step of confirmatory factor analysis. 

3. Confirmatory factor analysis

Exploratory factor analysis and confirmatory factor analysis go hand in hand.

Now that you have a hypothesis from exploratory factor analysis, confirmatory factor analysis is going to test that hypothesis of potential relationships in your variables.

This process is essentially fine-tuning your factors so that you land at a spot where the factors make sense with respect to your objectives.

The sought outcome of confirmatory factor analysis is to achieve statistically sound and digestible factors for yourself or a client.

A best practice for confirmatory factor analysis is testing the model's goodness of fit.

This involves splitting your data into two equal segments: a training set and a test set.

The next step is to fit the factor model on the training set, then apply the resulting factor structure to the test set.

If you achieve similar factors in both sets, the model can be considered statistically valid.

How Factor Analysis Can Benefit You

1. Spot trends within your data

If you are part of a business and leveraging factor analysis with your data, some of the advantages include the ability to spot trends or themes within your data.

Certain attributes may be connected in a way you wouldn’t have known otherwise.

You may learn that different customer behaviors and attitudes are closely related. This knowledge can be used to inform marketing decisions when it comes to your product or service.

2. Pinpoint the number of factors in a data set

Factor analysis, or exploratory factor analysis more specifically, can also be used to pinpoint the right number of factors within a data set.

Knowing how many overarching factors you need to worry about allows you to spend your time focusing on the aspects of your data that have the greatest impact.

This will save you time,  instill confidence in the results, and equip you with more actionable information.

3. Streamlines segmenting data

Lastly, factor analysis can be a great first step and lead-in to a cluster analysis if you are planning a customer segmentation study.

As a prerequisite, factor analysis streamlines the inputs for your segmentation. It helps to eliminate redundancies or irrelevant data, giving you a result that is clearer and easier to understand.

Here are 6 easy steps to conducting customer segmentation. Factor analysis could fit nicely between Step 3 and Step 4 if you are working with a high number of inputs.

Examples of Performing a Factor Analysis

With so many types of market research, factor analysis has a wide range of applications.

Employee surveys and customer surveys, however, are two of the best examples of when factor analysis is most helpful.

1. Employee surveys

For example, when using a third party for employee surveys, ask if the employee survey company can use factor analysis.

In these surveys, businesses aim to learn what matters most to their employees.

Because there is a myriad of variables that impact the employee experience, factor analysis has the potential to narrow down all these variables into a few manageable latent factors.

You might learn that flexibility, growth opportunities, and compensation are three key factors propelling your employees’ experiences.

Understanding these categories will make the management or hiring process that much easier.

2. Customer surveys

Factor analysis can also be a great application when conducting customer satisfaction surveys.

Let's say you have a lot of distinct variables going on in relation to customer preferences.

Customers are weighing these various product attributes each time before they make a purchase.

Factor analysis can group these attributes into useful factors, enabling you to see the forest for the trees.

You may have a hunch about what the categories would be, but factor analysis gives an approach backed by statistics to say this is how your product attributes should be grouped.

Factor Analysis Best Practices

1. Use the right inputs

For any market research study in which you plan to use factor analysis, you also need to make sure you have the proper inputs.

What it comes down to is asking survey questions that capture ordinal quantitative data.

Open-ended answers are not going to be useful for factor analysis.

Valid input data could involve rating scales, Likert scales, or even Yes/No questions that can be boiled down to binary ones and zeros.

Any combination of these questions could be effectively used for factor analysis. 

2. Include enough data points

It is also imperative to include enough data inputs.

Running factor analysis on 50 attributes will tell you a whole lot more than an analysis on 5 attributes.

After all, the idea is to throw an unorganized mass of attributes into the analysis to see what latent factors really exist among them.

3. Large sample sizes are best

A large sample size will also convey more confidence when you share the results of the factor analysis.

At least 100 responses per audience segment in the analysis is a good starting point, if possible.

Final Thoughts

Factor analysis is a powerful tool for identifying underlying dimensions in a dataset, allowing us to better understand complex phenomena.

By reducing the number of variables needed to explain a given phenomenon, factor analysis can help us to simplify our understanding of the world and make more informed decisions.

Whether used in marketing research, psychology, or other fields, this approach has proven to be highly effective.

Now that you know the different types of factor analysis and its benefits, you can begin to apply them to your own research or data analysis projects, leading to deeper insights and better outcomes.



24 - Introduction to Exploratory Factor Analysis: An Applied Approach

from Part IV - Statistical Approaches

Published online by Cambridge University Press:  25 May 2023

This chapter provides an overview of exploratory factor analysis (EFA) from an applied perspective. We start with a discussion of general issues and applications, including definitions of EFA and the underlying common factors model. We briefly cover history and general applications. The most substantive part of the chapter focuses on six steps of EFA. More specifically, we consider variable (or indicator) selection (Step 1), computing the variance–covariance matrix (Step 2), factor-extraction methods (Step 3), factor-retention procedures (Step 4), factor-rotation methods (Step 5), and interpretation (Step 6). We include a data analysis example throughout (with example code for R), with full details in an online supplement. We hope the chapter will provide helpful guidance to applied researchers in the social and behavioral sciences.

Sellbom, M., & Goretzko, D. (2023). Introduction to exploratory factor analysis: An applied approach. In A. L. Nichols & J. Edlund (Eds.), The Cambridge Handbook of Research Methods and Statistics for the Social and Behavioral Sciences. Cambridge University Press. https://doi.org/10.1017/9781009010054.025


Institute for Digital Research and Education

A Practical Introduction to Factor Analysis: Exploratory Factor Analysis

This seminar is the first part of a two-part seminar that introduces central concepts in factor analysis. Part 1 focuses on exploratory factor analysis (EFA). Although the implementation is in SPSS, the ideas carry over to any software program. Part 2 introduces confirmatory factor analysis (CFA). Please refer to A Practical Introduction to Factor Analysis: Confirmatory Factor Analysis.

I. Exploratory Factor Analysis

  • Motivating example: The SAQ
  • Pearson correlation formula

Partitioning the variance in factor analysis

  • principal components analysis
  • principal axis factoring
  • maximum likelihood

Simple Structure

  • Orthogonal rotation (Varimax)
  • Oblique (Direct Oblimin)
  • Generating factor scores


Introduction

Suppose you are conducting a survey and you want to know whether the items in the survey have similar patterns of responses: do these items “hang together” to create a construct? The basic assumption of factor analysis is that for a collection of observed variables there is a set of underlying variables called factors (fewer in number than the observed variables) that can explain the interrelationships among those variables. Let’s say you conduct a survey and collect responses about people’s anxiety about using SPSS. Do all these items actually measure what we call “SPSS Anxiety”?


Motivating Example: The SAQ (SPSS Anxiety Questionnaire)

Let’s proceed with our hypothetical example of the survey which Andy Field terms the SPSS Anxiety Questionnaire. For simplicity, we will use the so-called “ SAQ-8 ” which consists of the first eight items in the SAQ . Click on the preceding hyperlinks to download the SPSS version of both files. The SAQ-8 consists of the following questions:

  • Statistics makes me cry
  • My friends will think I’m stupid for not being able to cope with SPSS
  • Standard deviations excite me
  • I dream that Pearson is attacking me with correlation coefficients
  • I don’t understand statistics
  • I have little experience of computers
  • All computers hate me
  • I have never been good at mathematics

Pearson Correlation of the SAQ-8

Let’s get the table of correlations in SPSS Analyze – Correlate – Bivariate:

Correlations
        1      2      3      4      5      6      7      8
1      1
2    -.099    1
3    -.337   .318    1
4     .436  -.112  -.380    1
5     .402  -.119  -.310   .401    1
6     .217  -.074  -.227   .278   .257    1
7     .305  -.159  -.382   .409   .339   .514    1
8     .331  -.050  -.259   .349   .269   .223   .297    1
**. Correlation is significant at the 0.01 level (2-tailed).
*. Correlation is significant at the 0.05 level (2-tailed).

From this table we can see that most items have some correlation with each other, ranging from \(r=-0.382\) for Items 3 and 7 to \(r=.514\) for Items 6 and 7. Due to relatively high correlations among items, this would be a good candidate for factor analysis. Recall that the goal of factor analysis is to model the interrelationships between items with fewer (latent) variables. These interrelationships can be broken up into multiple components.

Since the goal of factor analysis is to model the interrelationships among items, we focus primarily on the variance and covariance rather than the mean. Factor analysis assumes that variance can be partitioned into two types of variance, common and unique:

  • Communality (also called \(h^2\)) is the proportion of an item’s variance that is common variance, ranging between \(0\) and \(1\). Values closer to 1 suggest that extracted factors explain more of the variance of an individual item.
  • Specific variance is variance that is specific to a particular item (e.g., Item 7 “All computers hate me” may have variance that is attributable to anxiety about computers in addition to anxiety about SPSS).
  • Error variance comes from errors of measurement and basically anything unexplained by common or specific variance (e.g., the person got a call from her babysitter that her two-year-old son ate her favorite lipstick).

The figure below shows how these concepts are related:

[Figure: partitioning of an item's total variance into common, specific and error variance]

Performing Factor Analysis

As a data analyst, the goal of a factor analysis is to reduce the number of variables needed to explain the data and to interpret the results. This can be accomplished in two steps:

  • factor extraction
  • factor rotation

Factor extraction involves making a choice about the type of model as well the number of factors to extract. Factor rotation comes after the factors are extracted, with the goal of achieving  simple structure  in order to improve interpretability.

Extracting Factors

There are two approaches to factor extraction which stems from different approaches to variance partitioning: a) principal components analysis and b) common factor analysis.

Principal Components Analysis

Unlike factor analysis, principal components analysis or PCA makes the assumption that there is no unique variance: the total variance is equal to the common variance. Recall that variance can be partitioned into common and unique variance. If there is no unique variance, then common variance takes up the total variance (see the figure above). Additionally, if the total variance is 1, then the common variance is equal to the communality.

Running a PCA with 8 components in SPSS

The goal of a PCA is to replicate the correlation matrix using a set of components that are fewer in number than, and linear combinations of, the original set of items. Although the following analysis defeats the purpose of doing a PCA, we will begin by extracting as many components as possible as a teaching exercise, so that we can decide on the optimal number of components to extract later.

First go to Analyze – Dimension Reduction – Factor. Move all the observed variables into the Variables: box to be analyzed.


Under Extraction – Method, pick Principal components and make sure to Analyze the Correlation matrix. We also request the Unrotated factor solution and the Scree plot. Under Extract, choose Fixed number of factors, and under Factor to extract enter 8. We also bumped up the Maximum Iterations of Convergence to 100.


Clicking Paste generates the equivalent SPSS syntax; the key specifications are /EXTRACTION PC and /CRITERIA FACTORS(8) ITERATE(100).

Eigenvalues and Eigenvectors

Before we get into the SPSS output, let’s understand a few things about eigenvalues and eigenvectors.

Eigenvalues represent the total amount of variance that can be explained by a given principal component.  They can be positive or negative in theory, but in practice they explain variance which is always positive.

  • If eigenvalues are greater than zero, then it’s a good sign.
  • Since variance cannot be negative, negative eigenvalues imply the model is ill-conditioned.
  • Eigenvalues close to zero imply there is item multicollinearity, since all the variance can be taken up by the first component.

Eigenvalues are also the sum of squared component loadings across all items for each component, which represent the amount of variance in each item that can be explained by the principal component.

Eigenvectors represent a weight for each eigenvalue. The eigenvector times the square root of the eigenvalue gives the component loadings, which can be interpreted as the correlation of each item with the principal component. For this particular PCA of the SAQ-8, the eigenvector weight associated with Item 1 on the first component is \(0.377\), and the eigenvalue of the first component is \(3.057\). We can calculate Item 1’s loading on the first component as

$$(0.377)\sqrt{3.057}= 0.659.$$

In this case, we can say that the correlation of the first item with the first component is \(0.659\).
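We can verify this identity with a short simulation; this is a sketch in Python rather than SPSS, and any correlation matrix will do:

import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 8)) + 0.6 * rng.normal(size=(500, 1))  # eight correlated items
R = np.corrcoef(X, rowvar=False)

vals, vecs = np.linalg.eigh(R)        # eigh returns ascending eigenvalues
vals, vecs = vals[::-1], vecs[:, ::-1]
loadings = vecs * np.sqrt(vals)       # loading = eigenvector * sqrt(eigenvalue)
print(loadings[0, 0])                 # Item 1's loading on the first component (sign is arbitrary)

print(0.377 * np.sqrt(3.057))         # the SAQ-8 example above: ~0.659

Let’s now move on to the component matrix.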

Component Matrix

The components can be interpreted as the correlation of each item with the component. Each item has a loading corresponding to each of the 8 components. For example, Item 1 is correlated \(0.659\) with the first component, \(0.136\) with the second component and \(-0.398\) with the third, and so on.

The square of each loading represents the proportion of variance (think of it as an \(R^2\) statistic) explained by a particular component. For Item 1, \((0.659)^2=0.434\) or \(43.4\%\) of its variance is explained by the first component. Subsequently, \((0.136)^2 = 0.018\) or \(1.8\%\) of the variance in Item 1 is explained by the second component. The total variance explained by both components is thus \(43.4\%+1.8\%=45.2\%\). If you keep adding the squared loadings cumulatively down the components, you find that they sum to 1, or 100%. This sum is also known as the communality, and in a PCA that retains all components, the communality of each item equals its total variance.

Component Matrix
        Component
Item     1        2        3        4        5        6        7        8
1      0.659    0.136   -0.398    0.160   -0.064    0.568   -0.177    0.068
2     -0.300    0.866   -0.025    0.092   -0.290   -0.170   -0.193   -0.001
3     -0.653    0.409    0.081    0.064    0.410    0.254    0.378    0.142
4      0.720    0.119   -0.192    0.064   -0.288   -0.089    0.563   -0.137
5      0.650    0.096   -0.215    0.460    0.443   -0.326   -0.092   -0.010
6      0.572    0.185    0.675    0.031    0.107    0.176   -0.058   -0.369
7      0.718    0.044    0.453   -0.006   -0.090   -0.051    0.025    0.516
8      0.568    0.267   -0.221   -0.694    0.258   -0.084   -0.043   -0.012
Extraction Method: Principal Component Analysis.
a. 8 components extracted.

Summing the squared component loadings across the components (columns) gives you the communality estimates for each item, and summing each squared loading down the items (rows) gives you the eigenvalue for each component. For example, to obtain the first eigenvalue we calculate:

$$(0.659)^2 + (-0.300)^2 + (-0.653)^2 + (0.720)^2 + (0.650)^2 + (0.572)^2 + (0.718)^2 + (0.568)^2 = 3.057$$

You will get eight eigenvalues for eight components, which leads us to the next table.

Total Variance Explained in the 8-component PCA

Recall that the eigenvalue represents the total amount of variance that can be explained by a given principal component. Starting from the first component, each subsequent component is obtained from partialling out the previous component. Therefore the first component explains the most variance, and the last component explains the least. Looking at the Total Variance Explained table, you will see the total variance explained by each component. For example, Component 1 explains \(3.057\) units of variance; since the total variance is 8, this is \(3.057/8 = 0.3821\), or \(38.21\%\), of the total variance. Because we extracted the same number of components as the number of items, the Initial Eigenvalues column is the same as the Extraction Sums of Squared Loadings column.

Total Variance Explained
            Initial Eigenvalues                    Extraction Sums of Squared Loadings
Component   Total   % of Variance  Cumulative %    Total   % of Variance  Cumulative %
1           3.057   38.206          38.206         3.057   38.206          38.206
2           1.067   13.336          51.543         1.067   13.336          51.543
3           0.958   11.980          63.523         0.958   11.980          63.523
4           0.736    9.205          72.728         0.736    9.205          72.728
5           0.622    7.770          80.498         0.622    7.770          80.498
6           0.571    7.135          87.632         0.571    7.135          87.632
7           0.543    6.788          94.420         0.543    6.788          94.420
8           0.446    5.580         100.000         0.446    5.580         100.000
Extraction Method: Principal Component Analysis.

Choosing the number of components to extract

Since the goal of running a PCA is to reduce our set of variables down, it would be useful to have a criterion for selecting the optimal number of components, which is of course smaller than the total number of items. One criterion is to choose components that have eigenvalues greater than 1. Under the Total Variance Explained table, we see the first two components have an eigenvalue greater than 1. This can be confirmed by the scree plot, which plots the eigenvalue (total variance explained) by the component number. Recall that we checked the Scree Plot option under Extraction – Display, so the scree plot should be produced automatically.

[Figure: scree plot of eigenvalues by component number]

The first component will always have the highest total variance and the last component will always have the least, but where do we see the largest drop? If you look at Component 2, you will see an “elbow” joint. This is the marking point where it’s perhaps not too beneficial to continue further component extraction. There are some conflicting definitions of the interpretation of the scree plot, but some say to take the number of components to the left of the “elbow”. Following this criterion, we would pick only one component. A more subjective interpretation of the scree plot suggests that any number of components between 1 and 4 would be plausible, and further corroborative evidence would be helpful.

Some criteria say that the total variance explained by all components should be between 70% and 80%, which in this case would mean about four to five components. The authors of the book say that this may be untenable for social science research, where extracted factors usually explain only 50% to 60% of the variance. Picking the number of components is a bit of an art and requires input from the whole research team. Let’s suppose we talked to the principal investigator and she believes that the two-component solution makes sense for the study, so we will proceed with the analysis.

Running a PCA with 2 components in SPSS

Running the two component PCA is just as easy as running the 8 component solution. The only difference is under Fixed number of factors – Factors to extract you enter 2.


We will focus on the differences in the output between the eight- and two-component solutions. Under Total Variance Explained, we see that the Initial Eigenvalues no longer equal the Extraction Sums of Squared Loadings. The main difference is that there are only two rows under Extraction Sums of Squared Loadings, and the cumulative percent variance goes up to \(51.54\%\).

Total Variance Explained
            Initial Eigenvalues                    Extraction Sums of Squared Loadings
Component   Total   % of Variance  Cumulative %    Total   % of Variance  Cumulative %
1           3.057   38.206          38.206         3.057   38.206          38.206
2           1.067   13.336          51.543         1.067   13.336          51.543
3           0.958   11.980          63.523
4           0.736    9.205          72.728
5           0.622    7.770          80.498
6           0.571    7.135          87.632
7           0.543    6.788          94.420
8           0.446    5.580         100.000
Extraction Method: Principal Component Analysis.

Similarly, you will see that the Component Matrix has the same loadings as the eight-component solution but instead of eight columns it’s now two columns.

Component Matrix
        Component
Item     1        2
1      0.659    0.136
2     -0.300    0.866
3     -0.653    0.409
4      0.720    0.119
5      0.650    0.096
6      0.572    0.185
7      0.718    0.044
8      0.568    0.267
Extraction Method: Principal Component Analysis.
a. 2 components extracted.

Again, we interpret Item 1 as having a correlation of 0.659 with Component 1. From glancing at the solution, we see that Item 4 has the highest correlation with Component 1 and Item 2 the lowest. Similarly, we see that Item 2 has the highest correlation with Component 2 and Item 7 the lowest.

Quick check:

True or False

  • The elements of the Component Matrix are correlations of the item with each component.
  • The sum of the squared eigenvalues is the proportion of variance under Total Variance Explained.
  • The Component Matrix can be thought of as correlations and the Total Variance Explained table can be thought of as \(R^2\).

Answers: 1. T, 2. F (it is the sum of squared loadings), 3. T

Communalities of the 2-component PCA

The communality is the sum of the squared component loadings up to the number of components you extract. In the SPSS output you will see a table of communalities.

Communalities
Item   Initial   Extraction
1      1.000     0.453
2      1.000     0.840
3      1.000     0.594
4      1.000     0.532
5      1.000     0.431
6      1.000     0.361
7      1.000     0.517
8      1.000     0.394
Extraction Method: Principal Component Analysis.

Since PCA is an iterative estimation process, it starts with 1 as an initial estimate of the communality (since this is the total variance across all 8 components), and then proceeds with the analysis until a final communality is extracted. Notice that the Extraction column is smaller than the Initial column because we only extracted two components. As an exercise, let’s manually calculate the first communality from the Component Matrix. The first ordered pair is \((0.659, 0.136)\), which represents the correlation of the first item with Component 1 and Component 2. Recall that squaring the loadings and summing down the components (columns) gives us the communality:

$$h^2_1 = (0.659)^2 + (0.136)^2 = 0.453$$

Going back to the Communalities table, if you sum down all 8 items (rows) of the Extraction column, you get \(4.123\). If you go back to the Total Variance Explained table and sum the first two eigenvalues, you also get \(3.057+1.067=4.124\) (the small difference is rounding). Is that surprising? Basically, it’s saying that summing the communalities across all items is the same as summing the eigenvalues across all components.

1. In a PCA, when would the communality for the Initial column be equal to the Extraction column?

Answer: When you run an 8-component PCA.

True or False:

  • The eigenvalue represents the communality for each item.
  • For a single component, the sum of squared component loadings across all items represents the eigenvalue for that component.
  • The sum of eigenvalues for all the components is the total variance.
  • The sum of the communalities down the components is equal to the sum of eigenvalues down the items.

Answers: 1. F (the eigenvalue is the total communality across all items for a single component), 2. T, 3. T, 4. F (you can only sum communalities across items and eigenvalues across components, but those two sums are equal).

Common Factor Analysis

The partitioning of variance differentiates a principal components analysis from what we call common factor analysis. Both methods try to reduce the dimensionality of the dataset down to fewer unobserved variables, but whereas PCA assumes that common variance takes up all of the total variance, common factor analysis assumes that total variance can be partitioned into common and unique variance. It is usually more reasonable to assume that you have not measured your set of items perfectly. The unobserved or latent variable that makes up common variance is called a factor, hence the name factor analysis.

The other main difference between PCA and factor analysis lies in the goal of your analysis. If your goal is to simply reduce your variable list down into a linear combination of smaller components, then PCA is the way to go. However, if you believe there is some latent construct that defines the interrelationship among items, then factor analysis may be more appropriate.

In this case, we assume that there is a construct called SPSS Anxiety that explains why you see a correlation among all the items on the SAQ-8. We acknowledge, however, that SPSS Anxiety cannot explain all the shared variance among items in the SAQ, so we model the unique variance as well. Based on the results of the PCA, we will start with a two-factor extraction.

Running a Common Factor Analysis with 2 factors in SPSS

To run a factor analysis, use the same steps as running a PCA (Analyze – Dimension Reduction – Factor) except under Method choose Principal axis factoring. Note that we continue to set Maximum Iterations for Convergence at 100 and we will see why later.


Pasting the syntax into the SPSS Syntax Editor, the main difference is that under /EXTRACTION we list PAF for Principal Axis Factoring instead of PC for Principal Components. We will get three tables of output: Communalities, Total Variance Explained and Factor Matrix. Let’s go over each of these and compare them to the PCA output.

Communalities of the 2-factor PAF

Communalities
Item   Initial   Extraction
1      0.293     0.437
2      0.106     0.052
3      0.298     0.319
4      0.344     0.460
5      0.263     0.344
6      0.277     0.309
7      0.393     0.851
8      0.192     0.236
Extraction Method: Principal Axis Factoring.

The most striking difference between this Communalities table and the one from the PCA is that the initial communalities are no longer 1. Recall that for a PCA, we assume the total variance is completely taken up by the common variance or communality, and therefore we pick 1 as our best initial guess. Instead of guessing 1 as the initial communality, principal axis factoring chooses the squared multiple correlation coefficient \(R^2\). To see this in action for Item 1, run a linear regression where Item 1 is the dependent variable and Items 2-8 are independent variables. Go to Analyze – Regression – Linear and enter q01 under Dependent and q02 to q08 under Independent(s).

[Figure: SPSS Linear Regression dialog with q01 as Dependent and q02 through q08 as Independents]

Pasting the syntax into the Syntax Editor gives us:
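A sketch of that regression syntax, using the same assumed item names:

REGRESSION
  /MISSING LISTWISE
  /STATISTICS COEFF OUTS R ANOVA
  /DEPENDENT q01
  /METHOD=ENTER q02 q03 q04 q05 q06 q07 q08.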

The output we obtain from this analysis is

Model Summary
Model R R Square Adjusted R Square Std. Error of the Estimate
1 .541 0.293 0.291 0.697

Note that 0.293 matches the initial communality estimate for Item 1. We could run eight more linear regressions to get all eight communality estimates, but SPSS already does that for us. Like PCA, factor analysis also uses an iterative estimation process to obtain the final estimates under the Extraction column. Finally, summing all the rows of the Extraction column gives 3.01. This represents the total common variance shared among all items for a two-factor solution.

Total Variance Explained (2-factor PAF)

The next table we will look at is Total Variance Explained. Comparing this to the table from the PCA, we notice that the Initial Eigenvalues are exactly the same and include 8 rows, one for each "factor". In fact, SPSS simply borrows this information from the PCA analysis, and the "factors" in the Initial Eigenvalues column are actually components. The main difference now is in the Extraction Sums of Squared Loadings. We notice that each corresponding row in the Extraction column is lower than in the Initial column. This is expected, because we assume that total variance can be partitioned into common and unique variance, which means the common variance explained will be lower. Factor 1 explains 31.38% of the variance and Factor 2 explains 6.24%. Just as in PCA, the more factors you extract, the less variance is explained by each successive factor.

Total Variance Explained
Factor Initial Eigenvalues Extraction Sums of Squared Loadings
Total % of Variance Cumulative % Total % of Variance Cumulative %
1 3.057 38.206 38.206 2.511 31.382 31.382
2 1.067 13.336 51.543 0.499 6.238 37.621
3 0.958 11.980 63.523
4 0.736 9.205 72.728
5 0.622 7.770 80.498
6 0.571 7.135 87.632
7 0.543 6.788 94.420
8 0.446 5.580 100.000
Extraction Method: Principal Axis Factoring.

A subtle note that may be easily overlooked is that when SPSS plots the scree plot or applies the eigenvalues-greater-than-1 criterion (Analyze – Dimension Reduction – Factor – Extraction), it bases these on the Initial and not the Extraction solution. This is important because the criterion assumes no unique variance, as in PCA, so it reflects the total variance explained, not accounting for specific or measurement error. Note that in the Extraction Sums of Squared Loadings column the second factor has an eigenvalue less than 1, but it is still retained because the Initial value is 1.067. If you want to apply this criterion to the common variance explained, you would need to check the Extraction column yourself.

[Figure: SPSS Factor Analysis Extraction dialog showing the eigenvalues-greater-than-1 criterion]

  • In theory, when would the percent of variance in the Initial column ever equal the Extraction column?
  • True or False, in SPSS when you use the Principal Axis Factor method the scree plot uses the final factor analysis solution to plot the eigenvalues.

Answers: 1. When there is no unique variance (PCA assumes this whereas common factor analysis does not, so this is in theory and not in practice), 2. F, it uses the initial PCA solution and the eigenvalues assume no unique variance.

Factor Matrix (2-factor PAF)

Factor Matrix
Item Factor
1 2
1 0.588 -0.303
2 -0.227 0.020
3 -0.557 0.094
4 0.652 -0.189
5 0.560 -0.174
6 0.498 0.247
7 0.771 0.506
8 0.470 -0.124
Extraction Method: Principal Axis Factoring.
a. 2 factors extracted. 79 iterations required.

First note the annotation that 79 iterations were required: if we had simply used the default of 25 iterations in SPSS, we would not have obtained an optimal solution. This is why in practice it's always good to increase the maximum number of iterations. Now let's get into the table itself. The elements of the Factor Matrix table are called loadings and represent the correlation of each item with the corresponding factor. Just as in PCA, squaring each loading and summing down the items (rows) gives the total variance explained by each factor, although these sums are no longer called eigenvalues as in PCA. Let's calculate this for Factor 1:

$$(0.588)^2 +  (-0.227)^2 + (-0.557)^2 + (0.652)^2 + (0.560)^2 + (0.498)^2 + (0.771)^2 + (0.470)^2 = 2.51$$

This number matches the first row under the Extraction column of the Total Variance Explained table. We can repeat this for Factor 2 and get matching results for the second row. Additionally, we can get the communality estimates by summing the squared loadings across the factors (columns) for each item. For example, for Item 1:

$$(0.588)^2 +  (-0.303)^2 = 0.437$$

Note that these results match the value of the Communalities table for Item 1 under the Extraction column. This means that the sum of squared loadings across factors represents the communality estimates for each item.

The relationship between the three tables

To see the relationships among the three tables, let's start from the Factor Matrix (or Component Matrix in PCA); we will use the term factor to represent components in PCA as well. These elements represent the correlation of each item with each factor. Now, square each element to obtain the squared loadings, or the proportion of variance in each item explained by each factor. Summing the squared loadings across factors gives the proportion of variance in each item explained by all factors in the model. This is known as common variance or communality, hence the result is the Communalities table. Going back to the Factor Matrix, if you square the loadings and sum down the items you get the Sums of Squared Loadings (in PAF) or eigenvalues (in PCA) for each factor. These now become elements of the Total Variance Explained table. Summing down the rows (i.e., summing down the factors) under the Extraction column we get \(2.511 + 0.499 = 3.01\), the total (common) variance explained. In words, this is the total common variance explained by the two-factor solution for all eight items. Equivalently, since the Communalities table represents the total common variance explained by both factors for each item, summing down the items in the Communalities table also gives you the total (common) variance explained, in this case

$$ 0.437 + 0.052 + 0.319 + 0.460 + 0.344 + 0.309 + 0.851 + 0.236 = 3.01$$

which is the same result we obtained from the Total Variance Explained table. Here is a figure that may help clarify what we've talked about:

[Figure: summary of the relationships among the Factor Matrix, Communalities, and Total Variance Explained tables]

In summary:

  • Squaring the elements in the Factor Matrix gives you the squared loadings
  • Summing the squared loadings of the Factor Matrix across the factors gives you the communality estimates for each item in the Extraction column of the Communalities table.
  • Summing the squared loadings of the Factor Matrix down the items gives you the Sums of Squared Loadings (PAF) or eigenvalue (PCA) for each factor across all items.
  • Summing the eigenvalues or Sums of Squared Loadings in the Total Variance Explained table gives you the total common variance explained.
  • Summing down all items of the Communalities table is the same as summing the eigenvalues or Sums of Squared Loadings down all factors under the Extraction column of the Total Variance Explained table.

True or False (the following assumes a two-factor Principal Axis Factor solution with 8 items)

  • The elements of the Factor Matrix represent correlations of each item with a factor.
  • Each squared element of Item 1 in the Factor Matrix represents the communality.
  • Summing the squared elements of the Factor Matrix down all 8 items within Factor 1 equals the first Sums of Squared Loading under the Extraction column of Total Variance Explained table.
  • Summing down all 8 items in the Extraction column of the Communalities table gives us the total common variance explained by both factors.
  • The total common variance explained is obtained by summing all Sums of Squared Loadings of the Initial column of the Total Variance Explained table
  • The total Sums of Squared Loadings in the Extraction column under the Total Variance Explained table represents the total variance which consists of total common variance plus unique variance.
  • In common factor analysis, the sum of squared loadings is the eigenvalue.

Answers: 1. T, 2. F, the sum of the squared elements across both factors, 3. T, 4. T, 5. F, sum all eigenvalues from the Extraction column of the Total Variance Explained table, 6. F, the total Sums of Squared Loadings represents only the total common variance excluding unique variance, 7. F, eigenvalues are only applicable for PCA.

Maximum Likelihood Estimation (2-factor ML)

Since this is a non-technical introduction to factor analysis, we won't go into detail about the differences between Principal Axis Factoring (PAF) and Maximum Likelihood (ML). The main concept to know is that ML is also a common factor analysis method that uses \(R^2\) to obtain initial estimates of the communalities, but it uses a different iterative process to obtain the extraction solution. To run a factor analysis using maximum likelihood estimation, under Analyze – Dimension Reduction – Factor – Extraction – Method choose Maximum Likelihood.
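In syntax, only the /EXTRACTION subcommand changes from the PAF run (a sketch, with the same assumed item names):

FACTOR
  /VARIABLES q01 q02 q03 q04 q05 q06 q07 q08
  /PRINT INITIAL EXTRACTION
  /CRITERIA FACTORS(2) ITERATE(100)
  /EXTRACTION ML
  /ROTATION NOROTATE
  /METHOD=CORRELATION.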

[Figure: SPSS Extraction dialog with Method set to Maximum Likelihood]

Although the initial communalities are the same between PAF and ML, the final extraction loadings will be different, which means you will have different Communalities, Total Variance Explained, and Factor Matrix tables (although the Initial columns will overlap). The other main difference is that you will obtain a Goodness-of-fit Test table, which gives an absolute test of model fit: non-significant values suggest a good-fitting model. Here the p-value is less than 0.05, so we reject the two-factor model.

Goodness-of-fit Test
Chi-Square df Sig.
198.617 13 0.000

In practice, you would obtain chi-square values for multiple factor analysis runs, which we tabulate below for 1 through 8 factors. The table shows the number of factors extracted (or attempted to extract) as well as the chi-square, degrees of freedom, p-value, and iterations needed to converge. Note that as you increase the number of factors, the chi-square value and degrees of freedom decrease, but the iterations needed and the p-value increase. Practically, you want to make sure the number of iterations you specify exceeds the iterations needed. Additionally, NS means no solution and N/A means not applicable. In SPSS, no solution is obtained when you run 5 to 7 factors because the degrees of freedom would be negative (which cannot happen). The eight-factor solution is not even applicable in SPSS, because it spews out the warning "You cannot request as many factors as variables with any extraction method except PC. The number of factors will be reduced by one," meaning that if you try to extract an eight-factor solution for the SAQ-8, it will default back to the 7-factor solution. Now that we understand the table, let's see if we can find the threshold at which the absolute fit indicates a good-fitting model. It looks like the p-value becomes non-significant at a 3-factor solution. Note that this differs from the eigenvalues-greater-than-1 criterion, which chose 2 factors, and from Percent of Variance Explained, by which you would choose 4-5 factors. We talk to the Principal Investigator, and at this point we still prefer the two-factor solution. Note that there is no "right" answer in picking the best factor model, only what makes sense for your theory. We will talk about interpreting the factor loadings when we talk about factor rotation, to further guide us in choosing the correct number of factors.

Number of Factors Chi-square df p-value Iterations needed
1 553.08 20 <0.05 4
2 198.62 13 < 0.05 39
3 13.81 7 0.055 57
4 1.386 2 0.5 168
5 NS -2 NS NS
6 NS -5 NS NS
7 NS -7 NS NS
8 N/A N/A N/A N/A

  • The Initial column of the Communalities table for the Principal Axis Factoring and the Maximum Likelihood method are the same given the same analysis.
  • Since they are both factor analysis methods, Principal Axis Factoring and the Maximum Likelihood method will result in the same Factor Matrix.
  • In SPSS, both Principal Axis Factoring and Maximum Likelihood methods give chi-square goodness of fit tests.
  • You can extract as many factors as there are items when using ML or PAF.
  • When looking at the Goodness-of-fit Test table, a p -value less than 0.05 means the model is a good fitting model.
  • In the Goodness-of-fit Test table, the lower the degrees of freedom the more factors you are fitting.

Answers: 1. T, 2. F, the two use the same starting communalities but a different estimation process to obtain extraction loadings, 3. F, only Maximum Likelihood gives you chi-square values, 4. F, you can extract as many components as items in PCA, but SPSS will only extract up to the total number of items minus 1, 5. F, greater than 0.05, 6. T, we are taking away degrees of freedom but extracting more factors.

Comparing Common Factor Analysis versus Principal Components

As we mentioned before, the main difference between common factor analysis and principal components is that factor analysis assumes total variance can be partitioned into common and unique variance, whereas principal components assumes common variance takes up all of total variance (i.e., no unique variance). For both methods, when you standardize each item's total variance to 1, the common variance becomes the communality. The communality is unique to each item, so if you have 8 items you will obtain 8 communalities, and each represents the common variance explained by the factors or components for that item. However, in the case of principal components, the communality is the total variance of each item, and summing all 8 communalities gives you the total variance across all items. In contrast, common factor analysis assumes that the communality is only a portion of the total variance, so that summing up the communalities represents the total common variance and not the total variance. In summary, for PCA, total common variance is equal to total variance explained, which in turn is equal to the total variance; but in common factor analysis, total common variance is equal to total variance explained, which does not equal total variance.

[Figure: partitioning of variance in PCA versus common factor analysis]

The following applies to the SAQ-8 when theoretically extracting 8 components or factors for 8 items:

  • For each item, when the total variance is 1, the common variance becomes the communality.
  • In principal components, each communality represents the total variance across all 8 items.
  • In common factor analysis, the communality represents the common variance for each item.
  • The communality is unique to each factor or component.
  • For both PCA and common factor analysis, the sum of the communalities represent the total variance explained.
  • For PCA, the total variance explained equals the total variance, but for common factor analysis it does not.

Answers: 1. T, 2. F, the total variance for each item, 3. T, 4. F, communality is unique to each item (shared across components or factors), 5. T, 6. T.

Rotation Methods

After deciding on the number of factors to extract and which analysis model to use, the next step is to interpret the factor loadings. Factor rotations help us interpret factor loadings. There are two general types of rotations, orthogonal and oblique.

  • orthogonal rotation assumes the factors are independent or uncorrelated with each other
  • oblique rotation assumes the factors are correlated (not independent)

The goal of factor rotation is to improve the interpretability of the factor solution by reaching simple structure. 

Simple structure

Without rotation, the first factor is the most general factor onto which most items load and explains the largest amount of variance. This may not be desired in all cases. Suppose you wanted to know how well a set of items load on each  factor; simple structure helps us to achieve this.

The definition of simple structure is that in a factor loading matrix:

  • Each row should contain at least one zero.
  • For m factors, each column should have at least m zeroes (e.g., three factors, at least 3 zeroes per factor).

For every pair of factors (columns),

  • there should be several items whose entries approach zero in one column but have large loadings in the other.
  • a large proportion of items should have entries approaching zero in both columns.
  • only a small number of items should have non-zero entries in both columns.

The following table is an example of simple structure with three factors:

Item Factor 1 Factor 2 Factor 3
1 0.8 0 0
2 0.8 0 0
3 0.8 0 0
4 0 0.8 0
5 0 0.8 0
6 0 0.8 0
7 0 0 0.8
8 0 0 0.8

Let’s go down the checklist of criteria to see why it satisfies simple structure:

  • each row contains at least one zero (exactly two in each row)
  • each column contains at least three zeros (since there are three factors)
  • for every pair of factors, most items have zero on one factor and non-zeros on the other factor (e.g., looking at Factors 1 and 2, Items 1 through 6 satisfy this requirement)
  • for every pair of factors, every item has a zero entry on at least one of the two factors
  • for every pair of factors, none of the items have two non-zero entries

An easier criterion from Pedhazur and Schmelkin (1991) states that

  • each item has high loadings on one factor only
  • each factor has high loadings for only some of the items.

For the following factor matrix, explain why it does not conform to simple structure using both the conventional and Pedhazur test.

Item Factor 1 Factor 2 Factor 3
1 0.8 0 0.8
2 0.8 0 0.8
3 0.8 0 0
4 0.8 0 0
5 0 0.8 0.8
6 0 0.8 0.8
7 0 0.8 0.8
8 0 0.8 0

Solution: Using the conventional test, although Criteria 1 and 2 are satisfied (each row has at least one zero, each column has at least three zeroes), Criterion 3 fails because for Factors 2 and 3, only 3/8 rows have a zero on one factor and a non-zero on the other. Additionally, for Factors 2 and 3, only Items 5 through 7 have non-zero loadings on both, i.e., 3/8 rows have two non-zero coefficients (failing Criteria 4 and 5 simultaneously). Using the Pedhazur criteria, Items 1, 2, 5, 6, and 7 have high loadings on two factors (failing the first criterion) and Factor 3 has high loadings on a majority, 5/8, of the items (failing the second criterion).

Orthogonal Rotation (2 factor PAF)

We know that the goal of factor rotation is to rotate the factor matrix so that it can approach simple structure in order to improve interpretability. Orthogonal rotation assumes that the factors are not correlated. The benefit of doing an orthogonal rotation is that loadings are simple correlations of items with factors, and standardized solutions can estimate unique contribution of each factor. The most common type of orthogonal rotation is Varimax rotation. We will walk through how to do this in SPSS.

Running a two-factor solution (PAF) with Varimax rotation in SPSS

The steps to running a two-factor Principal Axis Factoring are the same as before (Analyze – Dimension Reduction – Factor – Extraction), except that under Rotation – Method we check Varimax. Make sure under Display to check Rotated Solution and Loading plot(s), and under Maximum Iterations for Convergence enter 100.

[Figure: SPSS Rotation dialog with Varimax checked]

Pasting the syntax into the SPSS editor you obtain:
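A sketch of that syntax (same assumed item names); the /ROTATION and /PLOT subcommands are what distinguish it from the unrotated run:

FACTOR
  /VARIABLES q01 q02 q03 q04 q05 q06 q07 q08
  /PRINT INITIAL EXTRACTION ROTATION
  /PLOT ROTATION
  /CRITERIA FACTORS(2) ITERATE(100)
  /EXTRACTION PAF
  /ROTATION VARIMAX
  /METHOD=CORRELATION.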

Let’s first talk about what tables are the same or different from running a PAF with no rotation. First, we know that the unrotated factor matrix (Factor Matrix table) should be the same. Additionally, since the  common variance explained by both factors should be the same, the Communalities table should be the same. The main difference is that we ran a rotation, so we should get the rotated solution (Rotated Factor Matrix) as well as the transformation used to obtain the rotation (Factor Transformation Matrix). Finally, although the total variance explained by all factors stays the same, the total variance explained by  each  factor will be different.

Rotated Factor Matrix (2-factor PAF Varimax)

Rotated Factor Matrix
Factor
1 2
1 0.646 0.139
2 -0.188 -0.129
3 -0.490 -0.281
4 0.624 0.268
5 0.544 0.221
6 0.229 0.507
7 0.275 0.881
8 0.442 0.202
Extraction Method: Principal Axis Factoring. Rotation Method: Varimax with Kaiser Normalization.
a. Rotation converged in 3 iterations.

The Rotated Factor Matrix table tells us what the factor loadings look like after rotation (in this case Varimax). Kaiser normalization is a method to obtain stability of solutions across samples: each row of the factor matrix is rescaled to unit length before rotation, and after rotation the loadings are rescaled back to their proper size. This means that equal weight is given to all items when performing the rotation. The only drawback is that if the communality is low for a particular item, Kaiser normalization will weight that item equally with high-communality items. As such, Kaiser normalization is preferred when communalities are high across all items. You can turn off Kaiser normalization by specifying NOKAISER on the /CRITERIA subcommand.
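For example (a sketch; the rest of the FACTOR syntax stays the same as the Varimax run above):

  /CRITERIA FACTORS(2) ITERATE(100) NOKAISER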

Here is what the Varimax rotated loadings look like without Kaiser normalization. Compared to the rotated factor matrix with Kaiser normalization, the patterns look similar if you flip Factors 1 and 2; this may be an artifact of the rescaling. Another possible reason for the flip may be the low communalities for Item 2 (0.052) and Item 8 (0.236), since Kaiser normalization weights these items equally with the other, high-communality items.

Rotated Factor Matrix
Factor
1 2
1 0.207 0.628
2 -0.148 -0.173
3 -0.331 -0.458
4 0.332 0.592
5 0.277 0.517
6 0.528 0.174
7 0.905 0.180
8 0.248 0.418
Extraction Method: Principal Axis Factoring. Rotation Method: Varimax without Kaiser Normalization.
a. Rotation converged in 3 iterations.

Interpreting the factor loadings (2-factor PAF Varimax)

In the table above, looking at absolute loadings higher than 0.4, Items 6 and 7 load highly onto Factor 1 and Items 1, 3, 4, 5, and 8 load highly onto Factor 2. Item 2 does not seem to load highly on any factor. Looking more closely at Item 6, "My friends are better at statistics than me," and Item 7, "Computers are useful only for playing games," we don't see a clear construct that defines the two. Item 2, "I don't understand statistics," may be too general an item that isn't captured by SPSS Anxiety. It's debatable at this point whether to retain a two-factor or one-factor solution; at the very minimum, we should see whether Item 2 is a candidate for deletion.

Factor Transformation Matrix and Factor Loading Plot (2-factor PAF Varimax)

The Factor Transformation Matrix tells us how the Factor Matrix was rotated. In SPSS, you will see a matrix with two rows and two columns because we have two factors.

Factor Transformation Matrix
Factor 1 2
1 0.773 0.635
2 -0.635 0.773
Extraction Method: Principal Axis Factoring. Rotation Method: Varimax with Kaiser Normalization.

How do we interpret this matrix? We can see it as the way to move from the Factor Matrix to the Rotated Factor Matrix. From the Factor Matrix we know that the loading of Item 1 on Factor 1 is \(0.588\) and the loading of Item 1 on Factor 2 is \(-0.303\), which gives us the pair \((0.588,-0.303)\); but in the Rotated Factor Matrix the new pair is \((0.646,0.139)\). How do we obtain this new transformed pair of values? By matrix multiplication: view each column of the Factor Transformation Matrix as another ordered pair and take the dot product with the Factor Matrix pair. To get the first element, we multiply the ordered pair in the Factor Matrix \((0.588,-0.303)\) with the ordered pair \((0.773,-0.635)\) in the first column of the Factor Transformation Matrix:

$$(0.588)(0.773)+(-0.303)(-0.635)=0.455+0.192=0.647.$$

To get the second element, we multiply the ordered pair in the Factor Matrix \((0.588,-0.303)\) with the ordered pair \((0.635,0.773)\) from the second column of the Factor Transformation Matrix:

$$(0.588)(0.635)+(-0.303)(0.773)=0.373-0.234=0.139.$$

Voila! We have obtained the new transformed pair, up to rounding error. The figure below summarizes the steps we used to perform the transformation.

[Figure: steps of the factor transformation for Item 1]
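In matrix notation, the two dot products above amount to post-multiplying the Item 1 row of the Factor Matrix by the Factor Transformation Matrix:

$$\begin{bmatrix} 0.588 & -0.303 \end{bmatrix} \begin{bmatrix} 0.773 & 0.635 \\ -0.635 & 0.773 \end{bmatrix} = \begin{bmatrix} 0.647 & 0.139 \end{bmatrix}$$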

The Factor Transformation Matrix can also tell us the angle of rotation if we take the inverse cosine of the diagonal elements. In this case, the angle of rotation is \(\cos^{-1}(0.773) = 39.4^{\circ}\). In the factor loading plot, you can see what that angle of rotation looks like, starting from \(0^{\circ}\) and rotating counterclockwise by \(39.4^{\circ}\). Notice that the newly rotated x- and y-axes are still at \(90^{\circ}\) from one another, hence the name orthogonal (in a non-orthogonal or oblique rotation, the new axes are no longer \(90^{\circ}\) apart). The points do not move in relation to the axes but rotate with them.

[Figure: factor loading plot showing the 39.4° orthogonal rotation]

Total Variance Explained (2-factor PAF Varimax)

The Total Variance Explained table contains the same columns as the PAF solution with no rotation, but adds another set of columns called "Rotation Sums of Squared Loadings". This makes sense: if our rotated Factor Matrix is different, the squared loadings will be different, and hence the Sums of Squared Loadings will be different for each factor. However, if you sum the Sums of Squared Loadings across all factors for the Rotation solution,

$$ 1.701 + 1.309 = 3.01$$

and for the unrotated solution,

$$ 2.511 + 0.499 = 3.01,$$

you will see that the two sums are the same. This is because rotation does not change the total common variance. Looking at the Rotation Sums of Squared Loadings for Factor 1, it still has the largest total variance, but now that shared variance is split more evenly.

Total Variance Explained
Factor Rotation Sums of Squared Loadings
Total % of Variance Cumulative %
1 1.701 21.258 21.258
2 1.309 16.363 37.621
Extraction Method: Principal Axis Factoring.

Other Orthogonal Rotations

Varimax rotation is the most popular, but it is only one among several orthogonal rotations. The benefit of Varimax rotation is that it maximizes the variance of the loadings within each factor, amplifying the difference between high and low loadings on a particular factor: higher loadings are made higher and lower loadings are made lower. This makes Varimax rotation good for achieving simple structure but not as good for detecting an overall factor, because it splits up the variance of major factors among lesser ones. Quartimax may be a better choice for detecting an overall factor: it maximizes the squared loadings so that each item loads most strongly onto a single factor.

Here is the output of the Total Variance Explained table juxtaposed side-by-side for Varimax versus Quartimax rotation.

Total Variance Explained
Factor Quartimax Varimax
Total Total
1 2.381 1.701
2 0.629 1.309
Extraction Method: Principal Axis Factoring.

You will see that whereas Varimax distributes the variances evenly across both factors, Quartimax tries to consolidate more variance into the first factor.

Equamax is a hybrid of Varimax and Quartimax, but because of this may behave erratically and according to Pett et al. (2003), is not generally recommended.

Oblique Rotation

In oblique rotation, the factors are no longer orthogonal to each other (x and y axes are not \(90^{\circ}\) angles to each other). Like orthogonal rotation, the goal is rotation of the reference axes about the origin to achieve a simpler and more meaningful factor solution compared to the unrotated solution. In oblique rotation, you will see three unique tables in the SPSS output:

  • factor pattern matrix contains partial standardized regression coefficients of each item with a particular factor
  • factor structure matrix contains simple zero order correlations of each item with a particular factor
  • factor correlation matrix is a matrix of intercorrelations among factors

Suppose the Principal Investigator hypothesizes that the two factors are correlated, and wishes to test this assumption. Let’s proceed with one of the most common types of oblique rotations in SPSS, Direct Oblimin.

Running a two-factor solution (PAF) with Direct Quartimin rotation in SPSS

The steps to running a Direct Oblimin are the same as before (Analyze – Dimension Reduction – Factor – Extraction), except that under Rotation – Method we check Direct Oblimin. The other parameter we have to put in is delta, which defaults to zero. Technically, when delta = 0 this is known as Direct Quartimin. Larger positive values of delta increase the correlation among factors. In general, however, you don't want the correlations to be too high, or else there is no reason to split your factors up. In fact, SPSS caps delta at 0.8 (the cap for negative values is -9999). Negative delta values push the solution toward orthogonality. For the purposes of this analysis, we will leave delta = 0 and do a Direct Quartimin analysis.
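A sketch of the corresponding syntax (same assumed item names); DELTA(0) on /CRITERIA is what makes this Direct Quartimin:

FACTOR
  /VARIABLES q01 q02 q03 q04 q05 q06 q07 q08
  /PRINT INITIAL EXTRACTION ROTATION
  /CRITERIA FACTORS(2) ITERATE(100) DELTA(0)
  /EXTRACTION PAF
  /ROTATION OBLIMIN
  /METHOD=CORRELATION.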

[Figure: SPSS Rotation dialog with Direct Oblimin checked and delta = 0]

All the questions below pertain to Direct Oblimin in SPSS.

  • When selecting Direct Oblimin, delta = 0 is actually Direct Quartimin.
  • Smaller delta values will increase the correlations among factors.
  • You typically want your delta values to be as high as possible.

Answers: 1. T, 2. F, larger delta values increase the correlations among factors, 3. F, larger delta leads to higher factor correlations; in general you don't want factors to be too highly correlated

Factor Pattern Matrix (2-factor PAF Direct Quartimin)

The factor pattern matrix represents partial standardized regression coefficients of each item with a particular factor. For example, \(0.740\) is the effect of Factor 1 on Item 1 controlling for Factor 2, and \(-0.137\) is the effect of Factor 2 on Item 1 controlling for Factor 1. Just as in orthogonal rotation, the square of the loading represents the contribution of the factor to the variance of the item, but here excluding the overlap between correlated factors. Factor 1 uniquely contributes \((0.740)^2=0.405=40.5\%\) of the variance in Item 1 (controlling for Factor 2), and Factor 2 uniquely contributes \((-0.137)^2=0.019=1.9\%\) of the variance in Item 1 (controlling for Factor 1).

Pattern Matrix
Factor
1 2
1 0.740 -0.137
2 -0.180 -0.067
3 -0.490 -0.108
4 0.660 0.029
5 0.580 0.011
6 0.077 0.504
7 -0.017 0.933
8 0.462 0.036
Extraction Method: Principal Axis Factoring. Rotation Method: Oblimin with Kaiser Normalization.
a. Rotation converged in 5 iterations.

Factor Structure Matrix (2-factor PAF Direct Quartimin)

The factor structure matrix represents the simple zero-order correlations of the items with each factor (it's as if you regressed each item on a single factor). For example, \(0.653\) is the simple correlation of Item 1 with Factor 1 and \(0.333\) is the simple correlation of Item 1 with Factor 2. The more correlated the factors, the greater the difference between the pattern and structure matrices and the more difficult it is to interpret the factor loadings. From this we can see that Items 1, 3, 4, 5, and 8 load highly onto Factor 1 and Items 6 and 7 load highly onto Factor 2. Item 2 doesn't seem to load well on either factor.

Additionally, we can look at the variance explained by each factor not controlling for the other factor. For example, Factor 1 contributes \((0.653)^2=0.426=42.6\%\) of the variance in Item 1, and Factor 2 contributes \((0.333)^2=0.11=11\%\) of the variance in Item 1. Notice that the contribution of Factor 2 is higher here (\(11\%\) vs. \(1.9\%\)) because in the Pattern Matrix we controlled for the effect of Factor 1, whereas in the Structure Matrix we did not.

Structure Matrix
Factor
1 2
1 0.653 0.333
2 -0.222 -0.181
3 -0.559 -0.420
4 0.678 0.449
5 0.587 0.380
6 0.398 0.553
7 0.577 0.923
8 0.485 0.330
Extraction Method: Principal Axis Factoring. Rotation Method: Oblimin with Kaiser Normalization.

Factor Correlation Matrix (2-factor PAF Direct Quartimin)

Recall that the more correlated the factors, the more difference between pattern and structure matrix and the more difficult to interpret the factor loadings. In our case, Factor 1 and Factor 2 are pretty highly correlated, which is why there is such a big difference between the factor pattern and factor structure matrices.

Factor Correlation Matrix
Factor 1 2
1 1.000 0.636
2 0.636 1.000
Extraction Method: Principal Axis Factoring. Rotation Method: Oblimin with Kaiser Normalization.

Factor plot

The difference between an orthogonal versus oblique rotation is that the factors in an oblique rotation are correlated. This means not only must we account for the angle of axis rotation \(\theta\), we have to account for the angle of correlation \(\phi\). The angle of axis rotation is defined as the angle between the rotated and unrotated axes (blue and black axes). From the Factor Correlation Matrix, we know that the correlation is \(0.636\), so the angle of correlation is \(cos^{-1}(0.636) = 50.5^{\circ}\), which is the angle between the two rotated axes (blue x and blue y-axis). The sum of rotations \(\theta\) and \(\phi\) is the total angle rotation. We are not given the angle of axis rotation, so we only know that the total angle rotation is \(\theta + \phi = \theta + 50.5^{\circ}\).

[Figure: factor plot showing the rotated axes and the 50.5° angle of correlation]

Relationship between the Pattern and Structure Matrix

The structure matrix is in fact a derivative of the pattern matrix: if you multiply the pattern matrix by the factor correlation matrix, you get back the factor structure matrix. Let's take the example of the ordered pair \((0.740,-0.137)\) from the Pattern Matrix, which contains the partial standardized regression coefficients of Item 1 on Factors 1 and 2 respectively. Performing matrix multiplication with the first column of the Factor Correlation Matrix, we get

$$ (0.740)(1) + (-0.137)(0.636) = 0.740 - 0.087 = 0.653.$$

Similarly, we multiply the ordered pair with the second column of the Factor Correlation Matrix to get:

$$ (0.740)(0.636) + (-0.137)(1) = 0.471 - 0.137 = 0.334 \approx 0.333 $$

Looking at the first row of the Structure Matrix we get \((0.653,0.333)\) which matches our calculation! This neat fact can be depicted with the following figure:

[Figure: deriving the Structure Matrix from the Pattern Matrix and the Factor Correlation Matrix]
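In matrix notation, the Item 1 row of the Structure Matrix is the corresponding row of the Pattern Matrix post-multiplied by the Factor Correlation Matrix:

$$\begin{bmatrix} 0.740 & -0.137 \end{bmatrix} \begin{bmatrix} 1 & 0.636 \\ 0.636 & 1 \end{bmatrix} = \begin{bmatrix} 0.653 & 0.333 \end{bmatrix}$$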

As a quick aside, suppose the factors were orthogonal, which means the factor correlation matrix has 1's on the diagonal and zeros on the off-diagonal. A quick calculation with the ordered pair \((0.740,-0.137)\) gives

$$ (0.740)(1) + (-0.137)(0) = 0.740$$

and similarly,

$$ (0.740)(0) + (-0.137)(1) = -0.137$$

and you get back the same ordered pair. This is called multiplying by the identity matrix (think of it as multiplying \(2 \times 1 = 2\)).

  • Without changing your data or model, how would you make the factor pattern matrices and factor structure matrices more aligned with each other?
  • True or False, When you decrease delta, the pattern and structure matrix will become closer to each other.

Answers: 1. Decrease the delta values so that the correlation between factors approaches zero. 2. T, the correlations will become more orthogonal and hence the pattern and structure matrix will be closer.

Total Variance Explained (2-factor PAF Direct Quartimin)

The column Extraction Sums of Squared Loadings is the same as in the unrotated solution, but we have an additional column known as Rotation Sums of Squared Loadings. SPSS itself notes that "when factors are correlated, sums of squared loadings cannot be added to obtain a total variance". You will note that, compared to the Extraction Sums of Squared Loadings, the Rotation Sums of Squared Loadings is only slightly lower for Factor 1 but much higher for Factor 2. This is because, unlike in orthogonal rotation, these are no longer the unique contributions of Factor 1 and Factor 2. How does SPSS obtain the Rotation Sums of Squared Loadings? It squares the Structure Matrix and sums down the items.

Total Variance Explained
Factor Extraction Sums of Squared Loadings Rotation Sums of Squared Loadings
Total % of Variance Cumulative % Total
1 2.511 31.382 31.382 2.318
2 0.499 6.238 37.621 1.931
Extraction Method: Principal Axis Factoring.
a. When factors are correlated, sums of squared loadings cannot be added to obtain a total variance.

As a demonstration, let's square and sum the Structure Matrix loadings for Factor 1:

$$ (0.653)^2 + (-0.222)^2 + (-0.559)^2 + (0.678)^2 + (0.587)^2 + (0.398)^2 + (0.577)^2 + (0.485)^2 = 2.318.$$

Note that \(2.318\) matches the Rotation Sums of Squared Loadings for the first factor. This means that the Rotation Sums of Squared Loadings represent the non-unique contribution of each factor to total common variance, and summing these squared loadings across all factors can lead to estimates that are greater than the total common variance.

Interpreting the factor loadings (2-factor PAF Direct Quartimin)

Finally, let's conclude by interpreting the factor loadings more carefully. Let's compare the Pattern Matrix and Structure Matrix tables side-by-side, using a cutoff of 0.4 in absolute value. The absolute loadings in the Pattern Matrix are in general higher for Factor 1 and lower for Factor 2 compared to the Structure Matrix, which makes sense because the Pattern Matrix partials out the effect of the other factor. Looking at the Pattern Matrix, Items 1, 3, 4, 5, and 8 load highly on Factor 1, and Items 6 and 7 load highly on Factor 2. Looking at the Structure Matrix, Items 1, 3, 4, 5, 7, and 8 load highly onto Factor 1 and Items 3, 4, 6, and 7 load highly onto Factor 2. Item 2 doesn't seem to load on either factor. The results of the two matrices are somewhat inconsistent, which can be explained by the fact that in the Structure Matrix Items 3, 4, and 7 load fairly evenly onto both factors, but not in the Pattern Matrix. For this particular analysis, it seems to make more sense to interpret the Pattern Matrix, because it's clear that Factor 1 contributes uniquely to most items in the SAQ-8 and Factor 2 contributes common variance only to two items (Items 6 and 7). There is an argument here that Item 2 can be eliminated from our survey and the factors consolidated into one SPSS Anxiety factor. We talk to the Principal Investigator, and we think it's feasible to accept SPSS Anxiety as the single factor explaining the common variance in all the items, but we choose to remove Item 2, so that the SAQ-8 is now the SAQ-7.

Pattern Matrix Structure Matrix
Factor Factor
1 2 1 2
1 0.740 -0.137 0.653 0.333
2 -0.180 -0.067 -0.222 -0.181
3 -0.490 -0.108 -0.559 -0.420
4 0.660 0.029 0.678 0.449
5 0.580 0.011 0.587 0.380
6 0.077 0.504 0.398 0.553
7 -0.017 0.933 0.577 0.923
8 0.462 0.036 0.485 0.330
  • In oblique rotation, an element of the factor pattern matrix is the unique contribution of the factor to the item, whereas an element of the factor structure matrix is the non-unique contribution of the factor to the item.
  • In the Total Variance Explained table, the Rotation Sum of Squared Loadings represent the unique contribution of each factor to total common variance.
  • The Pattern Matrix can be obtained by multiplying the Structure Matrix with the Factor Correlation Matrix
  • If the factors are orthogonal, then the Pattern Matrix equals the Structure Matrix
  • In oblique rotations, the sum of squared loadings for each item across all factors is equal to the communality (in the SPSS Communalities table) for that item.

Answers: 1. T, 2. F, represent the non -unique contribution (which means the total sum of squares can be greater than the total communality), 3. F, the Structure Matrix is obtained by multiplying the Pattern Matrix with the Factor Correlation Matrix, 4. T, it’s like multiplying a number by 1, you get the same number back, 5. F, this is true only for orthogonal rotations, the SPSS Communalities table in rotated factor solutions is based off of the unrotated solution, not the rotated solution.

As a special note, did we really achieve simple structure? Although rotation helps us approach simple structure, if the interrelationships among the items do not lend themselves to simple structure, we can only modify our model. In this case we chose to remove Item 2 from our model.

Promax Rotation

Promax rotation begins with a Varimax (orthogonal) rotation and then raises the loadings to a power, kappa, which drives the small loadings toward zero. Promax also runs faster than Direct Oblimin; in our example, Promax took 3 iterations while Direct Quartimin (Direct Oblimin with delta = 0) took 5 iterations.
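In syntax, only the rotation subcommand changes from the earlier runs (a sketch; SPSS's default kappa is 4):

  /ROTATION PROMAX(4)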

  • Varimax, Quartimax and Equamax are three types of orthogonal rotation and Direct Oblimin, Direct Quartimin and Promax are three types of oblique rotations.

Answers: 1. T.

Generating Factor Scores

Suppose the Principal Investigator is happy with the final factor analysis, which was the two-factor Direct Quartimin solution. She has a hypothesis that SPSS Anxiety and Attribution Bias predict student scores in an introductory statistics course, and would like to use the factor scores as predictors in this new regression analysis. Since a factor is by nature unobserved, we need to first predict or generate plausible factor scores. In SPSS, there are three methods of factor score generation: Regression, Bartlett, and Anderson-Rubin.

Generating factor scores using the Regression Method in SPSS

In order to generate factor scores, run the same factor analysis model but click on Factor Scores (Analyze – Dimension Reduction – Factor – Factor Scores). Then check Save as variables, pick the Method and optionally check Display factor score coefficient matrix.

[Figure: SPSS Factor Scores dialog with Save as variables checked]

The code pasted in the SPSS Syntax Editor looks like this:
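A sketch of that syntax: it is the two-factor Direct Quartimin run from before, with /SAVE added to save the scores and FSCORE on /PRINT to display the coefficient matrix (same assumed item names):

FACTOR
  /VARIABLES q01 q02 q03 q04 q05 q06 q07 q08
  /PRINT INITIAL EXTRACTION ROTATION FSCORE
  /CRITERIA FACTORS(2) ITERATE(100) DELTA(0)
  /EXTRACTION PAF
  /ROTATION OBLIMIN
  /SAVE REG(ALL)
  /METHOD=CORRELATION.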

Here we picked the Regression approach after fitting our two-factor Direct Quartimin solution. After generating the factor scores, SPSS will add two extra variables to the end of your variable list, which you can view via Data View. The figure below shows what this looks like for the first 5 participants, which SPSS calls FAC1_1 and FAC2_1 for the first and second factors. These are now ready to be entered in another analysis as predictors.

[Figure: Data View showing FAC1_1 and FAC2_1 for the first five participants]

For those who want to understand how the scores are generated, we can refer to the Factor Score Coefficient Matrix. These are essentially the regression weights that SPSS uses to generate the scores. We know that the ordered pair of scores for the first participant is \((-0.880, -0.113)\). We also know that the 8 raw scores for the first participant are \(2, 1, 4, 2, 2, 2, 3, 1\). However, what SPSS actually uses are the standardized scores, which can easily be obtained in SPSS via Analyze – Descriptive Statistics – Descriptives – Save standardized values as variables. The standardized scores obtained are \(-0.452, -0.733, 1.32, -0.829, -0.749, -0.2025, 0.069, -1.42\). Using the Factor Score Coefficient Matrix, we multiply the standardized scores by the coefficients in each column. For the first factor:

$$ \begin{eqnarray} &(0.284)(-0.452) + (-0.048)(-0.733) + (-0.171)(1.32) + (0.274)(-0.829) \\ &+ (0.197)(-0.749) + (0.048)(-0.2025) + (0.174)(0.069) + (0.133)(-1.42) \\ &= -0.880, \end{eqnarray} $$

which matches FAC1_1  for the first participant. You can continue this same procedure for the second factor to obtain FAC2_1.

Factor Score Coefficient Matrix
Item Factor
1 2
1 0.284 0.005
2 -0.048 -0.019
3 -0.171 -0.045
4 0.274 0.045
5 0.197 0.036
6 0.048 0.095
7 0.174 0.814
8 0.133 0.028
Extraction Method: Principal Axis Factoring. Rotation Method: Oblimin with Kaiser Normalization. Factor Scores Method: Regression.

The second table is the Factor Score Covariance Matrix,

Factor Score Covariance Matrix
Factor 1 2
1 1.897 1.895
2 1.895 1.990
Extraction Method: Principal Axis Factoring. Rotation Method: Oblimin with Kaiser Normalization. Factor Scores Method: Regression.

This table can be interpreted as the covariance matrix of the factor scores; however, it would only equal the raw covariance matrix of the saved scores if the factors were orthogonal. For example, if we obtain the raw covariance matrix of the factor scores, we get

Correlations
FAC1_1 FAC2_1
FAC1_1 Covariance 0.777 0.604
FAC2_1 Covariance 0.604 0.870

You will notice that these values are much lower. Let’s compare the same two tables but for Varimax rotation:

Factor Score Covariance Matrix
Factor 1 2
1 0.670 0.131
2 0.131 0.805
Extraction Method: Principal Axis Factoring. Rotation Method: Varimax with Kaiser Normalization. Factor Scores Method: Regression.

If you compare these elements to the Covariance table below, you will notice they are the same.

Correlations
FAC1_1 FAC2_1
FAC1_1 Covariance 0.670 0.131
FAC2_1 Covariance 0.131 0.805

Note with the Bartlett and Anderson-Rubin methods you will not obtain the Factor Score Covariance matrix.

Regression, Bartlett and Anderson-Rubin compared

Among the three methods, each has its pluses and minuses. The regression method maximizes the correlation (and hence validity) between the factor scores and the underlying factor, but the scores can be somewhat biased; this means that even with an orthogonal solution, you can still have correlated factor scores. For Bartlett's method, the factor scores correlate highly with their own factor and not with others, and they are an unbiased estimate of the true factor score. Unbiased means that with repeated sampling of the factor scores, the average of the scores equals the average of the true factor scores. The Anderson-Rubin method scales the factor scores so that they are uncorrelated with other factors and uncorrelated with other factor scores. Since Anderson-Rubin scores impose a correlation of zero between factor scores, it is not the best option for oblique rotations. Additionally, Anderson-Rubin scores are biased.

In summary, if you do an orthogonal rotation, you can pick any of the three methods. For orthogonal rotations, use Bartlett if you want unbiased scores, use the regression method if you want to maximize validity, and use Anderson-Rubin if you want the factor scores themselves to be uncorrelated with other factor scores. If you do oblique rotations, it's preferable to stick with the regression method; do not use Anderson-Rubin for oblique rotations.

  • If you want the highest correlation of the factor score with the corresponding factor (i.e., highest validity), choose the regression method.
  • Bartlett scores are unbiased whereas Regression and Anderson-Rubin scores are biased.
  • Anderson-Rubin is appropriate for orthogonal but not for oblique rotation because factor scores will be uncorrelated with other factor scores.

Answers: 1. T, 2. T, 3. T


Factor Analysis: Types & Applications

  • Soumyaa Rawat
  • Sep 14, 2021


What is Factor Analysis ?

Data is everywhere. From data research to artificial intelligence, data has become an essential commodity, perceived as a link between our past and future. Does an organization want to learn from its past records? Data is the key. Does a programmer want to build a Machine Learning algorithm? Data is what s/he needs to begin with.

While the world has moved on to new technology, it often overlooks the fact that data is the building block of all these technological advancements.

When it comes to data, a number of tools and techniques are put to work to arrange, organize, and accumulate data the way one wants to. Factor Analysis is one of them. A data reduction technique, Factor Analysis is a statistical method used to reduce a large number of observed variables to a smaller number of factors, giving much better insight into a given dataset.

But first, we should understand what a factor is. A factor is a set of observed variables that have similar responses to an action. Since the variables in a given dataset can be too many to deal with, Factor Analysis condenses them into fewer variables that are actionable and substantial to work upon.

A technique of dimensionality reduction in data mining, Factor Analysis works on narrowing the availability of variables in a given data set, allowing deeper insights and better visibility of patterns for data research. 

Most commonly used to identify the relationship between various variables in statistics , Factor Analysis can be thought of as a compressor that compresses the size of variables and produces a much enhanced, insightful, and accurate variable set. 

“FA is considered an extension of principal component analysis since the ultimate objective for both techniques is a data reduction.” Factor Analysis in Data Reduction  

Types of Factor Analysis

Developed in 1904 by Spearman, Factor Analysis is broadly divided into various types based upon the approach to detect underlying variables and establish a relationship between them. 

While there are a variety of techniques to conduct factor analysis like Principal Component Analysis or Independent Component Analysis , Factor Analysis can be divided into 2 types which we will discuss below. Let us get started. 

Confirmatory Factor Analysis

As the name of this concept suggests, Confirmatory Factor Analysis (CFA) lets one determine whether a relationship exists between a set of observed variables and their underlying latent constructs.

It helps one confirm whether there is a connection between two sets of variables in a given dataset. Usually, the purpose of CFA is to test whether the data fit the requirements of a particular hypothesis.

The process begins with a researcher formulating a hypothesis made to fit along the lines of a certain theory. If the constraints imposed on a model do not fit well with the data, then the model is rejected, and it is confirmed that no relationship exists between a factor and its underlying construct. In this way, hypothesis testing also finds a place in the world of Factor Analysis.

Exploratory Factor Analysis

In the case of Exploratory Factor Analysis (EFA), the purpose is to determine or explore the underlying latent structure of a large set of variables. EFA, unlike CFA, tends to uncover the relationships, if any, among the measured variables of an entity (for example, height, weight, etc. in a human figure).

While CFA works on finding a relationship between a set of observed variables and their underlying structure, this works to uncover a relationship between various variables within a given dataset. 

Conducting Exploratory Factor Analysis involves figuring out the total number of factors involved in a dataset.

“EFA is generally considered to be more of a theory-generating procedure than a theory-testing procedure. In contrast, confirmatory factor analysis (CFA) is generally based on a strong theoretical and/or empirical foundation that allows the researcher to specify an exact factor model in advance.” EFA in Hypothesis Testing  

Applications of Factor Analysis

With immense use in various fields in real life, this segment presents a list of applications of Factor Analysis and the way FA is used in day-to-day operations. 

Factor Analysis is applied in marketing, data mining, machine learning, nutritional science, and business, as discussed below.

Marketing

Marketing is defined as the act of promoting a good, a service, or a brand. When it comes to Factor Analysis in marketing, one can benefit immensely from this statistical method.

In order to boost marketing campaigns and accelerate success, in the long run, companies employ Factor Analysis techniques that help to find a correlation between various variables or factors of a marketing campaign. 

Moreover, FA also helps to establish connections with customer satisfaction and consequent feedback after a marketing campaign in order to check its efficacy and impact on the audiences. 

That said, the realm of marketing can largely benefit from Factor Analysis and trigger sales with respect to much-enhanced feedback and customer satisfaction reports. 


Data Mining

In data mining, Factor Analysis can play a role as important as that of artificial intelligence. Owing to its ability to transform a complex and vast dataset into a group of filtered out variables that are related to each other in some way or the other, FA eases out the process of data mining. 

For data scientists, the tedious task of finding relationships and establishing correlation among various variables has always been full of obstacles and errors. 

However, with the help of this statistical method, data mining has become much more advanced. 


Machine Learning

Machine Learning and data mining tools go hand in hand. Perhaps this is the reason why Factor Analysis finds a place among Machine Learning tools and techniques.

As Factor Analysis in machine learning helps in reducing the number of variables in a given dataset to procure a more accurate and enhanced set of observed factors, various machine learning algorithms are put to use to work accordingly. 

They are trained well with humongous data to rightly work in order to give way to other applications. An unsupervised machine learning algorithm, FA is largely used for dimensionality reduction in machine learning. 

Thereby, machine learning can very well collaborate with Factor Analysis to give rise to data mining techniques and make the task of data research massively efficient. 


Nutritional Science

Nutritional Science is a prominent field of work in the contemporary scenario. By focusing on the dietary practices of a given population, Factor Analysis helps to establish a relationship between the consumption of nutrients in an adult’s diet and the nutritional health of that person. 

Furthermore, an individual’s nutrient intake and consequent health status have helped nutritionists to compute the appropriate quantity of nutrients one should intake in a given period of time. 

Business

The application of Factor Analysis in business is perhaps surprising, yet rewarding.

Remember the times when business firms had to employ professionals to dig out patterns from past records in order to lay a road ahead for strategic business plans?

Well, gone are the days when so much work had to be done. Thanks to Factor Analysis, the world of business can use it for eliminating the guesswork and formulating more accurate and straightforward decisions in various aspects like budgeting, marketing, production, and transport. 

Pros and Cons of Factor Analysis  

Having learned about Factor Analysis in detail, let us now move on to looking closely into the pros and cons of this statistical method. 

Pros of Factor Analysis

Measurable Attributes

The first and foremost pro of FA is that it is open to all measurable attributes. Be it subjective or objective, any kind of attribute can be worked upon when it comes to this statistical technique. 

Unlike some statistical models that only work on objective attributes, Factor Analysis goes well with both subjective and objective attributes. 

Cost-Effective

While data research and data mining algorithms can cost a lot, this statistical model is surprisingly cost-effective and does not take many resources to work with. 

That said, it can be adopted by beginners and experienced professionals alike, given its cost-effective and easy approach towards data mining and data reduction. 

Flexible Approach

While many machine learning algorithms are rigid and constricted to a single approach, Factor Analysis does not work that way. 

Rather, this statistical model has a flexible approach towards multivariate datasets that let one obtain relationships or correlations between various variables and their underlying components. 


Cons of Factor Analysis

Incomprehensive Results

While there are many pros of Factor Analysis, there are various cons of this method as well. Primarily, Factor Analysis can produce incomplete or misleading results when the dataset itself is not comprehensive. 

While various data points can have similar traits, other variables or factors can go unnoticed by being isolated in a vast dataset. As a result, the findings of this method can be incomplete. 

Non-Identification of Complicated Factors

Another drawback of Factor Analysis is that it does not identify complicated factors that underlie a dataset. 

While some results could clearly indicate a correlation between two variables, some complicated correlations can go unnoticed in such a method. 

Perhaps the non-identification of complicated factors and their relationships could be an issue for data research. 

Reliant on Theory

Even though machine learning algorithms can imitate much of Factor Analysis, the method still relies on theory, and therefore on data researchers.

While a computer can handle many components of a dataset, some details still require human judgment.

Thus, one major drawback of Factor Analysis is that it remains partly reliant on theory and cannot fully function without manual assistance.


Summing Up  

To sum up, Factor Analysis is a widely used statistical model that reduces the dimensionality of a dataset by condensing the observed variables into a smaller set of factors.


By arranging observed variables into groups of super-variables, Factor Analysis has immensely changed how data mining is done. With numerous fields relying on this technique for better performance, FA is very much the need of the hour.


Comprehensive Guide to Factor Analysis

Introduction to Factor Analysis

Factor analysis is a sophisticated statistical method aimed at reducing a large number of variables into a smaller set of factors. This technique is valuable for extracting the maximum common variance from all variables, transforming them into a single score for further analysis. As a part of the general linear model (GLM), factor analysis is predicated on certain key assumptions such as linearity, absence of multicollinearity, inclusion of relevant variables, and a true correlation between variables and factors.

Principal Methods of Factor Extraction

Principal Component Analysis (PCA):

PCA is the most widely used technique. It begins by extracting the maximum variance, assigning it to the first factor. Subsequent factors are determined by removing variance accounted for by earlier factors and extracting the maximum variance from what remains. This sequential process continues until all factors are identified.
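As a rough, hedged illustration (not part of the original guide), this sequential variance extraction can be reproduced in a few lines of Python; the array `X` below is placeholder data standing in for your own variables:

```python
# Minimal sketch of PCA-based extraction, assuming data in a NumPy array X.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
X = rng.normal(size=(300, 6))          # placeholder data; substitute your own variables
Z = StandardScaler().fit_transform(X)  # standardize so PCA operates on the correlation structure

pca = PCA().fit(Z)
# Each successive component captures the maximum variance remaining
# after the variance accounted for by earlier components is removed.
print(pca.explained_variance_)                 # eigenvalues, largest first
print(pca.explained_variance_ratio_.cumsum())  # cumulative proportion of variance
```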

Common Factor Analysis:

Preferred for structural equation modeling (SEM), this method focuses on extracting common variance among variables, excluding unique variances. It’s particularly useful for understanding underlying relationships that may not be immediately apparent from the observed variables.

Image Factoring:

Based on a correlation matrix, image factoring uses ordinary least squares regression to predict factors, making it distinct in its approach to factor extraction.

Maximum Likelihood Method:

This technique utilizes the maximum likelihood estimation approach to factor analysis, working from the correlation matrix to derive factors.

Other Methods:

Including Alpha factoring and weighted least squares, these methods provide alternatives that may be suitable depending on the specific characteristics of the data set.

Factor Loadings and Their Interpretation

Factor loadings play a crucial role in factor analysis, representing the correlation between the variable and the factor. A factor loading of 0.7 or higher typically indicates that the factor sufficiently captures the variance of that variable. These loadings help in determining the importance and contribution of each variable to a factor.
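To make the 0.7 rule of thumb concrete, here is a minimal sketch using scikit-learn’s `FactorAnalysis` (one of several factor analysis implementations; it reuses the standardized matrix `Z` from the snippet above, and the two-factor choice is purely illustrative):

```python
# Sketch: inspect factor loadings and flag variables a factor captures well.
from sklearn.decomposition import FactorAnalysis

fa = FactorAnalysis(n_components=2).fit(Z)   # Z: standardized data from above
loadings = fa.components_.T                  # rows: variables, columns: factors

for j, row in enumerate(loadings):
    for k, value in enumerate(row):
        if abs(value) >= 0.7:                # rule-of-thumb threshold from the text
            print(f"variable {j} loads strongly on factor {k}: {value:.2f}")
```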


Eigenvalues and Factor Scores

  • Eigenvalues: Also known as characteristic roots, eigenvalues represent the variance explained by a factor out of the total variance. They are critical for understanding the contribution of each factor to explaining the pattern in the data.
  • Factor Scores: These scores, which can be standardized, represent each observation’s estimated score on the factors and are used for further analysis (see the short sketch below). They essentially provide a way to reduce the dimensionality of the data set while retaining as much information as possible.
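Continuing the illustrative snippets above, factor scores can be read straight off the fitted model:

```python
# Sketch: factor scores as a lower-dimensional stand-in for the raw variables.
scores = fa.transform(Z)   # shape: (n_samples, n_factors)
print(scores[:5])          # each row holds one observation's factor scores
```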

Determining the Number of Factors

The number of factors to retain can be determined by several criteria:

  • Kaiser Criterion: An eigenvalue greater than one suggests that the factor should be retained (a quick eigenvalue check is sketched below).
  • Variance Extraction Rule: Factors should explain a significant portion of the variance, typically set at a threshold of 0.7 or higher.
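A minimal sketch of the Kaiser criterion, again reusing the standardized data `Z` from the earlier snippets:

```python
# Sketch: Kaiser criterion on the eigenvalues of the correlation matrix.
import numpy as np

R = np.corrcoef(Z, rowvar=False)        # correlation matrix of the variables
eigvals = np.linalg.eigvalsh(R)[::-1]   # eigenvalues, largest first
n_keep = int((eigvals > 1.0).sum())     # Kaiser: retain factors with eigenvalue > 1
print(eigvals.round(2), "-> retain", n_keep, "factor(s)")
```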

Rotation Techniques to Enhance Interpretability

Rotations in factor analysis, whether orthogonal like Varimax and Quartimax or oblique like Direct Oblimin and Promax, help in achieving a simpler, more interpretable factor structure. These methods adjust the axes on which factors are plotted to maximize the distinction between factors and improve the clarity of the results.
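To see what rotation buys you, the sketch below compares unrotated and varimax-rotated loadings; it assumes a scikit-learn version recent enough to accept the `rotation` argument, and again reuses `Z` from the earlier snippets:

```python
# Sketch: varimax (orthogonal) rotation for a simpler loading pattern.
from sklearn.decomposition import FactorAnalysis

unrotated = FactorAnalysis(n_components=2).fit(Z).components_.T
rotated = FactorAnalysis(n_components=2, rotation="varimax").fit(Z).components_.T
# After rotation, each variable should load strongly on fewer factors,
# which is what makes the structure easier to name and interpret.
print(unrotated.round(2))
print(rotated.round(2))
```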

Assumptions and Data Requirements

  • Data Characteristics: Factor analysis assumes no outliers, a sufficient sample size (cases should exceed the number of factors), and interval-level data measurement.
  • Statistical Assumptions: There should be no perfect multicollinearity among variables, and while the model assumes linearity, nonlinear variables can be transformed to meet this requirement (two quick diagnostic checks are sketched below).
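As a rough illustration of how two of these requirements might be screened in practice, here is a short sketch (our addition, assuming the standardized data `Z` from the earlier snippets); the determinant check and Bartlett’s test of sphericity are standard diagnostics rather than part of the guide above:

```python
# Sketch: two quick suitability checks before running factor analysis.
import numpy as np
from scipy.stats import chi2

n, p = Z.shape
R = np.corrcoef(Z, rowvar=False)
det_R = np.linalg.det(R)
print("det(R) =", det_R)    # values very close to 0 warn of severe multicollinearity

# Bartlett's test of sphericity: H0 is that R is an identity matrix,
# i.e. the variables are too uncorrelated for factor analysis to be useful.
stat = -(n - 1 - (2 * p + 5) / 6) * np.log(det_R)
df = p * (p - 1) / 2
print("Bartlett p-value:", chi2.sf(stat, df))
```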

Factor analysis is a powerful tool for data reduction and interpretation, enabling researchers to uncover underlying dimensions or factors that explain patterns in complex data sets. By adhering to its assumptions and appropriately choosing factor extraction and rotation methods, researchers can effectively use factor analysis to simplify data, construct scales, and enhance the validity of their studies.





Factor Analysis 101: The Basics


What is Factor Analysis?

Factor analysis is a powerful data reduction technique that enables researchers to investigate concepts that cannot easily be measured directly. By boiling down a large number of variables into a handful of comprehensible underlying factors, factor analysis results in easy-to-understand, actionable data. 

By applying this method to your research, you can spot trends faster and see themes throughout your datasets, enabling you to learn what the data points have in common. 

Unlike statistical methods such as regression analysis, factor analysis does not require you to specify dependent and independent variables in advance.

Factor analysis is most commonly used to identify the relationship between all of the variables included in a given dataset.

The Objectives of Factor Analysis

Think of factor analysis as shrink wrap. When applied to a large amount of data, it compresses the set into a smaller set that is far more manageable and easier to understand.

The overall objective of factor analysis can be broken down into four smaller objectives: 

  • To definitively understand how many factors are needed to explain common themes amongst a given set of variables.
  • To determine the extent to which each variable in the dataset is associated with a common theme or factor.
  • To provide an interpretation of the common factors in the dataset.
  • To determine the degree to which each observed data point represents each theme or factor.

When to Use Factor Analysis

Determining when to use particular statistical methods to get the most insight out of your data can be tricky.

When considering factor analysis, have your goal top-of-mind.

There are three main forms of factor analysis. If your goal aligns with any of these forms, then you should choose factor analysis as your statistical method of choice:

Exploratory Factor Analysis should be used when you need to develop a hypothesis about a relationship between variables.

Confirmatory Factor Analysis should be used to test a hypothesis about the relationship between variables.

Construct Validity should be used to test the degree to which your survey actually measures what it is intended to measure.

How To Ensure Your Survey is Optimized for Factor Analysis

If you know that you’ll want to perform a factor analysis on response data from a survey, there are a few things you can do ahead of time to ensure that your analysis will be straightforward, informative, and actionable.

Identify and Target Enough Respondents

Large datasets are the lifeblood of factor analysis. You’ll need large groups of survey respondents, often found through panel services, for factor analysis to yield significant results.

While variables such as population size and your topic of interest will influence how many respondents you need, it’s best to maintain a “the more respondents, the better” mindset.

The More Questions, The Better

While designing your survey, load in as many specific questions as possible. Factor analysis will fall flat if your survey only has a few broad questions.

The ultimate goal of factor analysis is to take a broad concept and simplify it by considering more granular, contextual information, so this approach will provide you with the results you’re looking for.

Aim for Quantitative Data

If you’re looking to perform a factor analysis, you’ll want to avoid open-ended survey questions.

By providing answer options in the form of scales (whether they be Likert scales, numerical scales, or even ‘yes/no’ scales) you’ll save yourself a world of trouble when you begin conducting your factor analysis. Just make sure that you’re using the same scaled answer options as often as possible.



Factor Analysis

Factor analysis is a statistical technique that we use to reduce a large number of variables into a smaller number of factors. In this topic, we will discuss the method and its various aspects.

What is Factor Analysis?

It refers to a method that reduces a large set of variables into a smaller set of factors. The technique extracts the maximum common variance from all the variables and places it into a common score.

Moreover, it is part of the General Linear Model (GLM), and it rests on several assumptions: no multicollinearity, linear relationships between variables, true correlation between factors and variables, and the inclusion of only relevant variables in the analysis.


Types of Factor Analysis

There are different methods that we use in factor analysis from the data set:

1. Principal component analysis

It is the most common method, used widely by researchers. It extracts the maximum variance and places it into the first factor. It then removes the variance explained by the first factor and extracts a second factor from what remains, continuing in this way until the last factor.

2. Common Factor Analysis

It’s the second most favoured technique among researchers. It extracts the common variance among the variables and places it into factors. This technique excludes the unique variance of each variable and is used in SEM.

3. Image Factoring

Image factoring is based on the correlation matrix and uses the OLS regression technique to predict the factors.

4. Maximum likelihood method

It also works on the correlation matrix, but uses maximum likelihood estimation to extract the factors.

5. Other methods of factor analysis

Other options include alpha factoring, as well as weighted least squares, a further regression-based method that we use for factoring.

Factor loading: Basically, this is the correlation coefficient between a factor and a variable. It shows how much of the variable’s variance is explained by that particular factor.

Eigenvalues: Characteristic roots is their other name. An eigenvalue shows the variance explained by a particular factor out of the total variance. The communality column, by contrast, shows how much of each variable’s variance is explained by the extracted factors together.

Factor score: Its other name is the component score. It is the estimated score of each observation on every factor, which we can use as an index of all variables for further analysis. Moreover, we can standardize these scores.

Rotation method: Rotation makes the output easier to understand and interpret. Rotation affects the eigenvalues associated with individual factors by redistributing the explained variance among them, but it does not change the total variance explained. There are five common options: (1) no rotation, (2) varimax rotation, (3) quartimax rotation, (4) direct oblimin rotation, and (5) promax rotation.

Assumptions of Factor Analysis

Factor analysis has several assumptions. These include:

  • There are no outliers in the data.
  • The sample size should be larger than the number of factors.
  • It is an interdependency method, so there should be no perfect multicollinearity between the variables.
  • Because factor analysis is a linear method, it does not require homoscedasticity between the variables.
  • It is based on the linearity assumption, so non-linear variables can be used only after being transformed into linear ones.
  • Moreover, it assumes interval-level data.

Key Concepts of Factor Analysis

It includes the following key concept:

Exploratory factor analysis: It assumes that any variable or indicator can be associated with any factor. It is the most common approach among researchers and is not based on any prior theory.

Confirmatory factor analysis: It is used to determine the factor structure and factor loadings of measured variables, and to confirm what is expected on the basis of pre-established theory. It uses two approaches:

  • The Traditional Method
  • The SEM Approach

Solved Question for You

Question: How many types of factor analysis are there?

A. 5 B. 6 C. 4 D. 3

Answer: The correct answer is option A.



Bayesian Statistical Inference for Factor Analysis Models with Clustered Data

Chen, B.; He, N.; Li, X. Bayesian Statistical Inference for Factor Analysis Models with Clustered Data. Mathematics 2024, 12, 1949. https://doi.org/10.3390/math12131949


A Comprehensive Guide to Different Types of Research


Updated: June 19, 2024

Published: June 15, 2024


When embarking on a research project, selecting the right methodology can be the difference between success and failure. With various methods available, each suited to different types of research, it’s essential that you make an informed choice. This blog post will provide tips on how to choose a research methodology that best fits your research goals.

We’ll start with definitions: Research is the systematic process of exploring, investigating, and discovering new information or validating existing knowledge. It involves defining questions, collecting data, analyzing results, and drawing conclusions.

Meanwhile, a research methodology is a structured plan that outlines how your research is to be conducted. A complete methodology should detail the strategies, processes, and techniques you plan to use for your data collection and analysis.


Research Methods

The first step of a research methodology is to identify a focused research topic, which is the question you seek to answer. By setting clear boundaries on the scope of your research, you can concentrate on specific aspects of a problem without being overwhelmed by information. This will produce more accurate findings. 

Along with clarifying your research topic, your methodology should also address your research methods. Let’s look at the four main types of research: descriptive, correlational, experimental, and diagnostic.

Descriptive Research

Descriptive research is an approach designed to describe the characteristics of a population systematically and accurately. This method focuses on answering “what” questions by providing detailed observations about the subject. Descriptive research employs surveys, observational studies, and case studies to gather qualitative or quantitative data.

A real-world example of descriptive research is a survey investigating consumer behavior toward a competitor’s product. By analyzing the survey results, the company can gather detailed insights into how consumers perceive a competitor’s product, which can inform their marketing strategies and product development.

Correlational Research

Correlational research examines the statistical relationship between two or more variables to determine whether a relationship exists. Correlational research is particularly useful when ethical or practical constraints prevent experimental manipulation. It is often employed in fields such as psychology, education, and health sciences to provide insights into complex real-world interactions, helping to develop theories and inform further experimental research.

An example of correlational research is the study of the relationship between smoking and lung cancer. Researchers observe and collect data on individuals’ smoking habits and the incidence of lung cancer to determine if there is a correlation between the two variables. This type of research helps identify patterns and relationships, indicating whether increased smoking is associated with higher rates of lung cancer.

Experimental Research

Experimental research is a scientific approach where researchers manipulate one or more independent variables to observe their effect on a dependent variable. This method is designed to establish cause-and-effect relationships. Fields like psychology, medicine, and the social sciences frequently employ experimental research to test hypotheses and theories under controlled conditions.

A real-world example of experimental research is Pavlov’s Dog experiment. In this experiment, Ivan Pavlov demonstrated classical conditioning by ringing a bell each time he fed his dogs. After repeating this process multiple times, the dogs began to salivate just by hearing the bell, even when no food was presented. This experiment helped to illustrate how certain stimuli can elicit specific responses through associative learning.

Diagnostic Research

Diagnostic research tries to accurately diagnose a problem by identifying its underlying causes. This type of research is crucial for understanding complex situations where a precise diagnosis is necessary for formulating effective solutions. It involves methods such as case studies and data analysis and often integrates both qualitative and quantitative data to provide a comprehensive view of the issue at hand. 

An example of diagnostic research is studying the causes of a specific illness outbreak. During an outbreak of a respiratory virus, researchers might conduct diagnostic research to determine the factors contributing to the spread of the virus. This could involve analyzing patient data, testing environmental samples, and evaluating potential sources of infection. The goal is to identify the root causes and contributing factors to develop effective containment and prevention strategies.

Using an established research method is imperative, whether you are researching for marketing, technology, healthcare, engineering, or social science. A methodology lends legitimacy to your research by ensuring your data is both consistent and credible. A well-defined methodology also enhances the reliability and validity of the research findings, which is crucial for drawing accurate and meaningful conclusions.

Additionally, methodologies help researchers stay focused and on track, limiting the scope of the study to relevant questions and objectives. This not only improves the quality of the research but also ensures that the study can be replicated and verified by other researchers, further solidifying its scientific value.


How to Choose a Research Methodology

Choosing the best research methodology for your project involves several key steps to ensure that your approach aligns with your research goals and questions. Here’s a simplified guide to help you make the best choice.

Understand Your Goals

Clearly define the objectives of your research. What do you aim to discover, prove, or understand? Understanding your goals helps in selecting a methodology that aligns with your research purpose.

Consider the Nature of Your Data

Determine whether your research will involve numerical data, textual data, or both. Quantitative methods are best for numerical data, while qualitative methods are suitable for textual or thematic data.

Understand the Purpose of Each Methodology

Becoming familiar with the four types of research – descriptive, correlational, experimental, and diagnostic – will enable you to select the most appropriate method for your research. Many times, you will want to use a combination of methods to gather meaningful data. 

Evaluate Resources and Constraints

Consider the resources available to you, including time, budget, and access to data. Some methodologies may require more resources or longer timeframes to implement effectively.

Review Similar Studies

Look at previous research in your field to see which methodologies were successful. This can provide insights and help you choose a proven approach.

By following these steps, you can select a research methodology that best fits your project’s requirements and ensures robust, credible results.

Completing Your Research Project

Upon completing your research, the next critical step is to analyze and interpret the data you’ve collected. This involves summarizing the key findings, identifying patterns, and determining how these results address your initial research questions. By thoroughly examining the data, you can draw meaningful conclusions that contribute to the body of knowledge in your field. 

It’s essential that you present these findings clearly and concisely, using charts, graphs, and tables to enhance comprehension. Furthermore, discuss the implications of your results, any limitations encountered during the study, and how your findings align with or challenge existing theories.

Your research project should conclude with a strong statement that encapsulates the essence of your research and its broader impact. This final section should leave readers with a clear understanding of the value of your work and inspire continued exploration and discussion in the field.

Now that you know how to perform quality research , it’s time to get started! Applying the right research methodologies can make a significant difference in the accuracy and reliability of your findings. Remember, the key to successful research is not just in collecting data, but in analyzing it thoughtfully and systematically to draw meaningful conclusions. So, dive in, explore, and contribute to the ever-growing body of knowledge with confidence. Happy researching!



The State Of Workplace Communication In 2024

Leeron Hoory

Updated: Mar 8, 2023, 12:37pm


With work from home increasing to 58% of the workforce (92 million workers), digital communication has become a focal point of workplace communication and productivity. Following an analysis, Forbes Advisor found that Colorado and Maryland had among the highest shares of remote workers. The survey also found that 28% of all respondents report using a voice-over-internet-protocol (VoIP) phone system. While half of the respondents we surveyed worked in a hybrid environment, 27% worked remotely and 20% on-site.

  • Workers are spending an average of 20 hours a week using digital communication tools.
  • Forty-five percent of workers feel more connected to their team as a result of using digital communication.
  • Digital communication makes 58% of workers feel like they need to be available more often.
  • Sixty percent of workers feel increased burnout as a result of communicating digitally.
  • Nearly half of workers report their productivity being affected by ineffective communication.
  • Forty-two percent of workers experience stress trying to form responses that convey the right tone of voice.

The days of the phone call may not be behind us, despite how many other communication platforms there are today. Workers are finding that the most effective communication platforms vary in the type of communication they provide, whether instant messaging, video calls or VoIP systems. Google Meet and Zoom ranked highest for video calls, being used by 40% and 46% of respondents, respectively.

Remote and hybrid workers are using VoIP systems to communicate more often than in-office workers. VoIP systems were used by over a quarter of total respondents, with 37% of remote workers using them, 23% of on-site workers and 24% of hybrid workers.

The most effective communication tools for in-office, hybrid and remote workers

The most effective communication tool varied between on-site, remote and hybrid workers. For on-site workers, the mobile phone was the most effective method of communication for 38% of respondents, followed by landline (22%) and Zoom (21%). For people working remotely, Zoom was the most effective method for 22% of respondents, as well as Google Chat (also 22%). Hybrid workers followed a similar trend: 31% ranked Zoom as the most effective and 23% ranked Google Meet as the most effective.

Most people turn to tools beyond the standard phone to communicate at work, with 14% of respondents using VoIP when they didn’t prior to the pandemic. Over 20% of them are remote workers. It may seem obvious that more people began using Zoom (24% of respondents), but mobile phones also saw a spike in use by 20% after March 1, 2020.

Over 40% of workers feel more connected to their team since Covid-19

While Covid-19 changed the way offices and teams communicate, it didn’t necessarily lead to workers feeling less connected across the board. A total of 45% of workers who took the survey actually felt more connected to their team after Covid-19 (43% of on-site, 52% remote and 46% hybrid workers).

Some workers did feel less connected (25%). Remote workers were the most likely to report feeling less connected (34%) while the numbers were lower for on-site workers (27%) and hybrid workers (20%). There were also those who experienced no change. Of these respondents, on-site workers were the most likely to report no change (28%).

Many workers spend all day in front of a screen. The highest percentage of respondents (16%) said they spend 21 to 25 hours per week on digital communication platforms. That’s around five hours per day on average.

Fifteen percent spent 16 to 20 hours, 14% spent 11 to 15 hours and 12% spent six to 10 hours. There was a sharp decrease when the numbers reached 31 to 35 hours: only 5% said they spent this much time on digital communication tools. For 2% of respondents, digital communication tools consumed more than a full 40-hour workweek.

With so many digital communication tools available, more workers are feeling pressure to stay connected to their coworkers outside of normal working hours. Nearly 25% of workers said that they always feel pressured to stay connected to their peers, while 35% said they often feel pressure. On the other end—those who felt free from pressure—the numbers were much smaller. Seven percent said they rarely felt pressure while 10% said they never do.

Digital communication increased burnout for 60% of workers

Whether working from home, on-site or both, digital communication has a high chance of increasing feelings of burnout. Our survey showed that 60% of respondents said that digital communication increased feelings of burnout. Nearly 70% of remote workers said they experienced burnout from digital communication. Hybrid and on-site workers were less likely to experience burnout as a result of digital communication: 56% and 49% respectively.

Only 11% of workers report that ineffective communication does not have any effect on them. For the rest of the respondents, poor communication greatly affected workers in many areas. Most notably, it impacted productivity for 49% of respondents. Nearly 50% of respondents reported that ineffective communication impacted job satisfaction while 42% said it affected stress levels.

Poor communication is affecting trust for 45% of workers

For over 40% of workers, poor communication reduces trust both in leadership and in their team. Remote workers were more affected, with 54% reporting poor communication impacts trust in leadership and 52% reporting it impacts trust in the team. For on-site workers, poor communication did not impact trust to the same extent, though it still had a big impact: 43% reported trust in leadership was impacted and 38% said trust in their team was affected.

Job satisfaction relies on effective communication for the majority of workers

Respondents reported that effective communication impacted several areas of work. Forty-two percent said it impacted cross-functional collaboration. Job satisfaction is another big area that is affected by communication: 48% said they were impacted. Nearly half of the respondents said their productivity was impacted.

For 46% of respondents, seeing messages ignored for long periods of time led to stress in the workplace. The notification that their manager is typing a message caused stress for 45% of respondents. Many other aspects of digital communication led to stress as well: crafting digital responses with the right tone of voice (42%), deciphering the tone behind digital messages (38%), last-minute video calls from leadership (36%) and turning off your camera when on video calls (35%).

When it comes to preferred methods of communication, many workers prefer old-fashioned tools. Email is the most popular tool, with 18% of total respondents marking it as their preference (25% of remote workers and 10% of on-site workers). Video calls were the next popular choice (17%) followed by direct messages (16%). For on-site workers, in-person conversations were by far the most preferred method of communication, with 34% of respondents saying it’s their preference.

  • Preferences were the same across gender, though varied considerably when it came to video calls: 22% of male respondents preferred video while 12% of females preferred video.
  • Age played a role in preference of communication methods: 40% of respondents between 59 and 77 preferred in-person conversation while that was only true for 17% of people ages 18 to 26 and 16% of people ages 27 to 42.

For many workers, digital communication is an essential part of their day, but they differ in the methods of communication they use. More than half (56%) of respondents use video for their communication and 55% use audio. Personalized greetings are less common (44%). Emojis and GIFs are still relatively common forms of communication: 42% and 34% respectively.

  • Female respondents preferred personalized greetings more than male respondents: 47% compared with 40%.
  • Male respondents preferred audio more than female respondents: 63% compared with 50%. Video followed a similar pattern: 61% (male) versus 53% (female).
  • Respondents of ages 43 to 58 had the highest preference for GIFs: 42% compared with 31% of respondents between the ages of 18 and 26.
  • Respondents between 18 and 26 years old were the most likely to prefer video (69%). Preference for video declined with age: 60% of respondents between the ages of 27 and 42, 50% of people between 43 and 58 years old and only 23% of people surveyed between the ages of 59 and 77.

Forbes Advisor found the total number of people working from home in each state in 2023. The survey found that the percentage of remote workers varied by state. Between 20% and 24.2% of people work from home in the 11 states with the largest work-from-home workforce.

  • Washington has the highest percentage of people who work from home at 24.2% of the workforce working at home, followed by Maryland (24%) and Colorado (23.7%).
  • Massachusetts was the next state with the highest percentage of people working from home (23.7%), followed by Oregon (22.7%), Virginia (22.3%) and New Jersey (22.1%).
  • Mississippi has the smallest workforce of people who work from home. Of the 1.2 million workers, only 6.3% (76,556) of people work from home.

While much has changed in the world of digital communication since Covid-19, there have also been constants. Email and phone are still two of the most preferred methods of communication, despite the numerous options and tools available. VoIP systems are increasing in popularity as well, with 28% of all respondents using them. Workers are spending an average of 20 hours per week on digital communication platforms—that’s half the 40-hour workweek.

Looking ahead, it will be important for teams and small businesses to establish productive systems of digital communication, especially given that over half of the people we surveyed reported that digital communication leads to increased burnout.

If a company or team establishes a healthy culture around digital communication, it can potentially lead to better job satisfaction, increased productivity and higher trust in a company’s leadership and the team.

Methodology

Forbes Advisor commissioned a survey of 1,000 employed Americans who work in an office setting by market research company OnePoll, in accordance with the Market Research Society’s code of conduct. The margin of error is +/- 3.1 points with 95% confidence. The OnePoll research team is a member of the MRS and has corporate membership with the American Association for Public Opinion Research (AAPOR).

To find the number of workers in each state who work from home, Forbes Advisor sourced data from the Census Bureau’s American Community Survey.


  • Open access
  • Published: 12 June 2024

Identifying therapeutic target genes for migraine by systematic druggable genome-wide Mendelian randomization

Chengcheng Zhang, Yiwei He & …
The Journal of Headache and Pain, volume 25, Article number: 100 (2024)


Currently, the treatment and prevention of migraine remain highly challenging. Mendelian randomization (MR) has been widely used to explore novel therapeutic targets. Therefore, we performed a systematic druggable genome-wide MR to explore the potential therapeutic targets for migraine.

We obtained data on druggable genes and screened for genes within brain expression quantitative trait locis (eQTLs) and blood eQTLs, which were then subjected to two-sample MR analysis and colocalization analysis with migraine genome-wide association studies data to identify genes highly associated with migraine. In addition, phenome-wide research, enrichment analysis, protein network construction, drug prediction, and molecular docking were performed to provide valuable guidance for the development of more effective and targeted therapeutic drugs.

We identified 21 druggable genes significantly associated with migraine (BRPF3, CBFB, CDK4, CHD4, DDIT4, EP300, EPHA5, FGFRL1, FXN, HMGCR, HVCN1, KCNK5, MRGPRE, NLGN2, NR1D1, PLXNB1, TGFB1, TGFB3, THRA, TLN1 and TP53), two of which were significant in both blood and brain (HMGCR and TGFB3). The results of phenome-wide research showed that HMGCR was highly correlated with low-density lipoprotein, and TGFB3 was primarily associated with insulin-like growth factor 1 levels.

Conclusions

This study utilized MR and colocalization analysis to identify 21 potential drug targets for migraine, two of which were significant in both blood and brain. These findings provide promising leads for more effective migraine treatments, potentially reducing drug development costs.


Migraine is a prevalent chronic disease characterized by recurring headaches that are typically unilateral and throbbing, ranging from moderate to severe intensity, and often accompanied by nausea, vomiting, sensitivity to light, among other symptoms [ 1 ]. Migraine is recognized as the second most disabling condition globally, creating substantial challenges for those affected and also placing a considerable strain on society overall [ 2 ]. Genetic factors play a substantial role in migraine, with its heritability estimated to be as high as 57% [ 3 ].

Currently, the treatment and prevention of migraine remain highly challenging. Although new drugs (e.g. targeting the calcitonin gene-related peptide, namely CGRP) have been developed, offering significant benefits to migraine sufferers, there are still many issues, such as side effects and less than ideal response rates [ 4 ]. Therefore, it is necessary to continue exploring potential therapeutic targets for migraine treatment. Integrating genetics into drug development may provide a novel approach. While genome-wide association studies (GWAS) are very effective in identifying single nucleotide polymorphisms (SNPs) associated with the risk of migraine [ 5 ], the GWAS method does not clearly and directly identify the causative genes or drive drug development without substantial downstream analyses [ 6 , 7 ].

Mendelian randomization (MR) is a method that utilizes genetic variation as instrumental variables (IVs) to uncover a causal connection between an exposure and an outcome [ 8 ]. MR analysis has been widely applied to discover new therapeutic targets by integrating summarized data from disease GWAS and expression quantitative trait loci (eQTL) studies [ 9 ]. The eQTLs found in the genomic regions of druggable genes are always considered as proxies, since the expression levels of gene can be seen as a form of lifelong exposure. Therefore, we performed a systematic druggable genome-wide MR to explore the potential therapeutic targets for migraine. First, we obtained data on druggable genes and screened for genes within brain eQTLs and blood eQTLs, which were then subjected to two-sample MR analysis with migraine GWAS data to identify genes highly associated with migraine. Subsequently, we conducted colocalization analysis to ensure the robustness of our results. For significant genes both in blood and brain, the phenome-wide research was conducted to explore the relationship between shared potential therapeutic targets and other characteristics. In addition, enrichment analysis, protein network construction, drug prediction, and molecular docking were performed for all significant genes to provide valuable guidance for the development of more effective and targeted therapeutic drugs.

The overview of this study is presented in Fig.  1 .

Figure 1. Overview of this study design. DGIdb: Drug-Gene Interaction Database; eQTL: expression quantitative trait loci; GWAS: genome-wide association studies; PheWAS: phenome-wide association study; PPI: protein–protein interaction; DSigDB: Drug Signatures Database.

Druggable genes

Druggable genes were sourced from the Drug-Gene Interaction Database (DGIdb, https://www.dgidb.org/ ) [ 10 ] and a comprehensive review [ 11 ]. The DGIdb offers insights into drug-gene interactions and the potential for druggability. We accessed the 'Categories Data' from DGIdb, which was updated in February 2022. Additionally, we utilized a list of druggable genes provided in a review authored by Finan et al. [ 11 ]. By consolidating druggable genes from two sources, a broader range of druggable genes can be obtained, which have already been applied in previous study [ 12 ].

eQTL datasets

The blood eQTL dataset was sourced from eQTLGen ( https://eqtlgen.org/ ) [ 13 ], which provided cis-eQTLs for 16,987 genes derived from 31,684 blood samples collected from healthy individuals of European ancestry (Table 1). We acquired cis-eQTL results that were fully significant (with a false discovery rate (FDR) less than 0.05) along with information on allele frequencies. We obtained the brain eQTL data from the PsychENCODE consortia ( http://resource.psychencode.org ) [ 14 ], encompassing 1,387 samples from the prefrontal cortex, primarily of European descent (Table 1). We downloaded all significant eQTLs (with FDR less than 0.05) for genes that exhibited an expression level greater than 0.1 fragments per kilobase per million mapped fragments in at least 10 samples, along with complete SNP information.

Migraine GWAS dataset

In this study, the summary statistics data for migraine were obtained from a meta-analysis of GWAS conducted by the International Headache Genetics Consortium (IHGC) in 2022 [ 5 ]. To address privacy concerns related to participants in the 23andMe cohort, the GWAS summary statistics data used in this study did not include samples from the 23andMe cohort. The summary data comprised 589,356 individuals of European ancestry, with 48,975 cases and 540,381 controls (Table 1).

Mendelian randomization analysis

MR analyses were conducted using the 'TwoSampleMR' package (version 0.5.7) [ 15 ] in R. We chose the eQTLs of the drug genome as the exposure data. For constructing IVs, SNPs with a FDR below 0.05 and located within ± 100 kb of the transcriptional start site (TSS) of each gene were selected. These SNPs were subsequently clumped at an r 2 less than 0.001 using European samples from the 1000 Genomes Project [ 16 ]. The R package 'phenoscanner' [ 17 ] (version 1.0) was employed to identify phenotypes related to the IVs. Additionally, we excluded SNPs that were directly associated with migraine and the trait directly linked to migraine, namely headache. We harmonised and conducted MR analyses on the filtered SNPs. When only one SNP was available for analysis, we use the Wald ratio method to perform MR estimation. When multiple SNPs were available, MR analysis was performed using the inverse-variance weighted (IVW) method with random effects [ 18 ]. We used Cochran's Q test to assess heterogeneity among the individual causal effects of the SNPs [ 19 ]. Additionally, MR Egger's intercept was utilized to evaluate SNP pleiotropy [ 20 ]. P -values were adjusted by FDR, and 0.05 was considered as the significant threshold. Additionally, we selected target genes associated with commonly used medications for migraine and compared their MR results with those of significantly druggable genes.
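For readers unfamiliar with these two estimators, the following schematic is our own illustration, not the authors’ code (the study itself used the R package 'TwoSampleMR'); it shows the Wald ratio and fixed-effect IVW formulas in Python with made-up per-SNP summary statistics:

```python
# Schematic of the Wald ratio and inverse-variance weighted (IVW) MR estimators.
import numpy as np

def wald_ratio(beta_exp, beta_out, se_out):
    """Single-SNP causal estimate: beta_out / beta_exp, with a delta-method SE."""
    est = beta_out / beta_exp
    se = se_out / np.abs(beta_exp)
    return est, se

def ivw(beta_exp, beta_out, se_out):
    """Inverse-variance weighted estimate across multiple SNPs (fixed-effect form)."""
    est, se = wald_ratio(np.asarray(beta_exp), np.asarray(beta_out), np.asarray(se_out))
    w = 1.0 / se**2
    pooled = np.sum(w * est) / np.sum(w)
    pooled_se = np.sqrt(1.0 / np.sum(w))   # a random-effects version would inflate this
    return pooled, pooled_se

# Made-up per-SNP summary statistics, purely for illustration
print(ivw([0.10, 0.12, 0.08], [0.05, 0.07, 0.03], [0.010, 0.012, 0.011]))
```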

Colocalization analysis

Sometimes, a single SNP is located in the regions of two or more genes. In such cases, its impact on a disease (here, migraine) is influenced by a mix of different genes. Colocalization analysis was used to confirm the potential shared causal genetic variations in physical location between migraine and eQTLs. We separately filtered SNPs located within ± 100 kb from each migraine risk gene's TSS from migraine GWAS data, blood eQTL data, and brain eQTL data. The probability that a given SNP is associated with migraine is denoted as P1, the probability that a given SNP is a significant eQTL is denoted as P2, and the probability that a given SNP is both associated with migraine and is an eQTL result is denoted as P12. All probabilities were set to default values (P1 = 1 × 10 −4 , P2 = 1 × 10 −4 , and P12 = 1 × 10 −5 ) [ 21 ]. We used posterior probabilities (PP) to quantify the support for all hypotheses, which are identified as PPH0 through PPH4: PPH0, not associated with any trait; PPH1, related to gene expression but not associated with migraine risk; PPH2, associated with migraine risk but not related to gene expression; PPH3, associated with both migraine risk and gene expression, with clear causal variation; and PPH4, associated with both migraine risk and gene expression, with a common causal variant. Given the limited capacity of colocalization analysis, we restricted our subsequent analyses to genes where PPH4 was greater than or equal to 0.75. Colocalization analysis was conducted using the R package 'coloc' (version 5.2.3).

Phenome-wide association analysis

We used the IEU OpenGWAS Project ( https://gwas.mrcieu.ac.uk/phewas/ ) [ 15 ] to obtain the phenome-wide association study (PheWAS) data of SNPs corresponding to druggable genes that were significant in both blood and brain following colocalization analysis.

Enrichment analysis

To explore the functional characteristics and biological relevance of the candidate druggable genes identified above, the R package 'clusterProfiler' (version 4.10.1) [ 22 ] was used for Gene Ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) enrichment analyses. GO comprises three categories: Biological Process (BP), Molecular Function (MF), and Cellular Component (CC). KEGG pathways provide information about metabolic pathways.

Protein–protein interaction network construction

The protein–protein interaction (PPI) networks can visually display the interactions among the proteins encoded by the significant druggable genes. We constructed PPI networks using the STRING database ( https://string-db.org/ ) with a confidence score threshold of 0.4 as the minimum required interaction score, while all other parameters were kept at their default settings [ 23 ].

Candidate drug prediction

Drug Signatures Database (DSigDB, http://dsigdb.tanlab.org/DSigDBv1.0/ ) [ 24 ] is a sizable database with 22,527 gene sets and 17,389 unique compounds spanning 19,531 genes. We uploaded previously identified significant druggable genes to DSigDB to predict candidate drugs and evaluate the pharmacological activity of target genes.
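The authors used the DSigDB web interface; one hedged programmatic alternative is the 'enrichR' package, which exposes a "DSigDB" gene-set library. The gene vector below is an illustrative subset, not the study's full gene list.

```r
library(enrichR)

genes <- c("HMGCR", "TGFB1", "TGFB3", "TP53", "CDK4")  # illustrative subset
res <- enrichr(genes, databases = c("DSigDB"))

# Candidate compounds ranked by adjusted p-value
head(res[["DSigDB"]][, c("Term", "Adjusted.P.value", "Genes")])
```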

Molecular docking

We conducted molecular docking to assess the binding energies and interaction patterns between candidate drugs and their targets. Ligands that exhibit high binding affinity and favourable interaction patterns allow drug targets to be prioritized for further experimental validation and help refine the design of prospective candidate drugs. Drug structural data were sourced from the PubChem Compound Database ( https://pubchem.ncbi.nlm.nih.gov/ ) [25], downloaded in SDF format, and converted to PDB format using OpenBabel 2.4.1. Protein structural data were downloaded from the Protein Data Bank (PDB, http://www.rcsb.org/ ). The top five predicted drugs and the proteins encoded by the respective target genes were subjected to molecular docking using the protein–ligand docking software AutoDock 4.2.6 ( http://autodock.scripps.edu/ ) [26], and the results were visualized using PyMol 3.0.2 ( https://www.pymol.org/ ). Final structures were obtained for six proteins and four drugs.

Druggable genome

We obtained 3,953 druggable genes from the DGIdb (Table S1). Additionally, we acquired 4,463 druggable genes from previous reviews (Table S2) [ 11 ]. After integrating the data, we obtained 5,883 unique druggable genes named by the Human Genome Organisation Gene Nomenclature Committee for subsequent analysis (Table S3).

Candidate druggable genes

After intersecting the blood and brain eQTLs with the druggable genes, the blood eQTLs contained 3,460 gene symbols and the brain eQTLs 2,624. MR analysis identified 24 significant genes associated with migraine from blood and 10 from brain tissue (Figs. 2 and 3). Among them, two genes, HMGCR and TGFB3, reached significance in both blood (HMGCR OR 1.38 and TGFB3 OR 0.88) and brain tissue (HMGCR OR 2.02 and TGFB3 OR 0.73). Detailed results for the significant IVs and the full MR results are available in Tables S4–S6.

figure 2

Forest plot of 24 significant genes associated with migraine from blood

figure 3

Forest plot of 10 significant genes associated with migraine from brain

We selected target genes associated with commonly used migraine medications as comparators for our results [27]. These include CGRP-related genes (CALCB, CALCRL, RAMP1 and RAMP3); genes encoding the 5-hydroxytryptamine (5-HT) receptors targeted by ergot alkaloids, triptans, and ditans (HTR1B, HTR1D, HTR1F); the γ-aminobutyric acid (GABA) receptor-related gene targeted by topiramate (GABRA1); calcium ion channel-related genes targeted by flunarizine (CACNA1H, CACNA1I, CALM1); and β-adrenoceptor genes targeted by propranolol (ADRB1, ADRB2). Among these genes (Fig. 4), CALM1 showed a significant association with migraine in blood eQTL data, but lost significance after FDR correction (OR 0.92, P = 0.039, FDR-P = 0.455). In brain eQTL data, CALCB and RAMP3 were correlated with migraine, and CALCB remained significant after FDR correction (CALCB: OR 0.68, P = 0.0001, FDR-P = 0.029; RAMP3: OR 1.16, P = 0.031, FDR-P = 0.425).

figure 4

Forest plot of 13 genes associated with commonly used medications for migraine from blood and brain

The results indicated that, of the previously identified 24 significant genes from blood, 17 had a PPH4 greater than 0.75. Among the 10 significant genes from brain, 6 had a PPH4 greater than 0.75. HMGCR and TGFB3 showed significant colocalization results in both blood and brain tissues (Table  2 , Table  3 and Table S7).

Because of the blood–brain barrier, drugs reach brain tissue less readily than they reach blood components and other organs [28]. Therefore, we used the IEU OpenGWAS Project to obtain the PheWAS results for the SNPs corresponding to HMGCR and TGFB3 from blood rather than from brain tissue. The results showed that HMGCR was highly correlated with low-density lipoprotein (LDL) levels, and TGFB3 was primarily associated with insulin-like growth factor 1 (IGF1) levels. The complete results are available in Tables S8 and S9.

Through GO analysis of the 21 potential targets, we found that these targets are primarily involved in BP terms such as regulation of protein secretion (GO:0050708), response to hypoxia (GO:0001666), negative regulation of carbohydrate metabolic processes (GO:0045912), and the intrinsic apoptotic signaling pathway in response to DNA damage by the p53 class mediator (GO:0042771). The main MF terms include transcription coregulator binding (GO:0001221) and chromatin DNA binding (GO:0031490; Fig. 5). To explore potential therapeutic pathways of the migraine-associated druggable genes, we performed KEGG analysis, which indicated that the target genes were primarily enriched in pathways such as human T-cell leukemia virus 1 infection (hsa05166) and the cell cycle (hsa04110; Fig. 6).

figure 5

GO enrichment results for three terms

figure 6

KEGG enrichment results

We loaded 21 drug target genes into the STRING database to create a PPI network. The results, shown in Fig.  7 , displayed protein interaction pathways consisting of 21 nodes and 22 edges.

figure 7

PPI network built with STRING

We used DSigDB to predict potentially effective intervention drugs and listed the top 10 candidates based on adjusted P-values (Table 4). The results indicated that butyric acid (butyric acid CTD 00007353) and clofibrate (clofibrate CTD 00005684) were the two most significant drugs, linked to TGFB1, TGFB3, EP300 and TP53, and to TGFB1, CDK4, HMGCR and TP53, respectively. Additionally, arsenenous acid (Arsenenous acid CTD 00000922) and dexamethasone (dexamethasone CTD 00005779) were associated with most of the significant druggable genes.

We used AutoDock 4.2.6 to analyze the binding sites and interactions between the top five candidate drugs and the proteins encoded by the corresponding genes, generating a binding energy for each interaction. We obtained 14 effective docking results between proteins and drugs (Table 5). Docking amino acid residues and hydrogen bond lengths are shown in Fig. 8. Among these, the binding between CDK4 and andrographolide exhibited the lowest binding energy (−7.11 kcal/mol), indicating the most stable binding.

figure 8

Molecular docking results of available proteins and drugs. a TGFB1 docking butyric acid, b TGFB1 docking clofibrate, c TGFB1 docking Sorafenib, d TGFB1 docking Andrographolide, e TGFB3 docking butyric acid, f EP300 docking butyric acid, g TP53 docking butyric acid, h CDK4 docking clofibrate, i CDK4 docking Sorafenib, j CDK4 docking Andrographolide, k HMGCR docking clofibrate, l TP53 docking clofibrate, m TP53 docking Sorafenib, n TP53 docking Andrographolide

This study integrated existing druggable gene targets with migraine GWAS data through MR and colocalization analysis, identifying 21 druggable genes significantly associated with migraine (BRPF3, CBFB, CDK4, CHD4, DDIT4, EP300, EPHA5, FGFRL1, FXN, HMGCR, HVCN1, KCNK5, MRGPRE, NLGN2, NR1D1, PLXNB1, TGFB1, TGFB3, THRA, TLN1 and TP53). To further illustrate the potential pleiotropy and drug side effects of the significant druggable genes, we conducted a phenome-wide analysis of two SNPs associated with the two druggable genes of greatest interest (HMGCR and TGFB3). Additionally, we performed enrichment analysis and constructed a PPI network for these 21 significant genes to understand their biological significance and interaction mechanisms as drug targets. Finally, drug prediction and molecular docking were conducted to further validate the pharmaceutical value of these significant druggable genes.

The association between HMGCR and migraine has been supported by multiple prior studies. One study indicated that migraine shares significant signals with certain lipoprotein subgroups at the HMGCR locus [29]. Hong et al. found that HMGCR genotypes associated with higher LDL cholesterol levels are linked to an increased risk of migraine [30]. Statins inhibit the activity of HMG-CoA reductase, which is encoded by the HMGCR gene, to exert their lipid-lowering effects, and have been widely used in the prevention and treatment of coronary heart disease and ischemic stroke. Previous clinical research has shown that simvastatin combined with vitamin D can effectively prevent episodic migraine in adults [31]. Additionally, HMGCR may also be involved in immune modulation: studies suggest that migraine patients experience neuroinflammation due to activation of the trigeminal-vascular system, leading to peripheral and central sensitization of pain and triggering migraine attacks [32, 33]. HMGCR inhibitors can suppress the production of inflammatory mediators and cytokines, thus reducing inflammatory responses [34]. We speculate that the role of HMGCR in regulating inflammation and immunity may have influenced the drug prediction results generated by DSigDB, which is based on Gene Set Enrichment Analysis (GSEA) [24, 35, 36], diluting the contribution of HMGCR's role in lipid metabolism; this may explain why statins did not appear in the predicted list of candidate drugs.

TGFB1 and TGFB3 encode different secreted ligands of the transforming growth factor-beta (TGF-β) superfamily of proteins, namely TGF-β1 and TGF-β3. TGF-β is a pleiotropic cytokine closely associated with immunity and inflammation [37]. Research has indicated that TGF-β3 can inhibit B cell proliferation and antibody production by suppressing the phosphorylation of NF-κB, thus exerting anti-inflammatory effects [38]. Activation of the classical NF-κB pathway is a key mechanism that upregulates pro-inflammatory cytokines, promoting central sensitization and leading to the onset of chronic migraine [39]. A previous clinical study indicated that serum levels of TGF-β1 are significantly elevated in migraine patients [40]. Ishizaki et al. found that TGF-β1 levels in the platelet-poor plasma of migraine patients are significantly increased during headache-free intervals [41]. Bø et al. discovered that during acute migraine attacks, TGF-β1 levels in cerebrospinal fluid are significantly higher than in controls [42]. Although some studies consider TGF-β1 to be an anti-inflammatory cytokine [43], based on previous research and the results of this study, we believe that TGFB1 and its encoded protein, TGF-β1, are associated with an increased risk of migraine; the pleiotropic effects of TGF-β1 on inflammation may depend on concentration and environment [44]. In addition, we found an association between TGFB3 and IGF1 in our phenome-wide analysis. A previous MR study showed that increased levels of IGF1 are causally associated with decreased migraine risk [45]. Recent experimental results suggest that the miR-653-3p/IGF1 axis regulating the AKT/TRPV1 signaling pathway may be a potential pathogenic mechanism for migraine [46]. The beneficial effects of TGF-β3 and IGF1 on migraine may be associated with the regulation of gene expression in different microenvironments to promote the transition of microglial cells from M1 (pathogenic) to M2 (protective) phenotypes [47].

Among the 13 genes targeted by commonly used migraine treatment drugs, the MR results for three genes were significant in blood or brain eQTL data. Although only one gene remained significant after FDR correction, this nonetheless suggests that the significant genes newly identified in this study are reliable and have some potential as drug targets. The lack of significance for certain drug target genes may relate to the limited sample size of the migraine GWAS data included in our study; validating our results against larger GWAS datasets as they become available would be worthwhile.

In this study, DSigDB predicted 10 potential drugs for migraine, but current clinical research focuses mainly on melatonin and dexamethasone. ClinicalTrials.gov ( https://clinicaltrials.gov/ ) has registered multiple studies on the efficacy of melatonin and dexamethasone for migraine, and the findings are inconsistent and controversial. A published clinical study on acute treatment of pediatric migraine showed that both low and high doses of melatonin contributed to pain relief [48]. The consensus published by the Brazilian Headache Society in 2022 lists melatonin as a recommended medication for preventing episodic migraine (Class II; Level C) [49]. However, one study indicated that bedtime administration of sustained-release melatonin did not reduce migraine attack frequency compared with placebo [50]. Dexamethasone has shown good efficacy for severe acute migraine attacks [51]. The 2016 guidelines for the emergency treatment of acute migraine in adults, issued by the American Headache Society, state that dexamethasone should be administered to prevent the recurrence of migraine (Should offer—Level B) [52]. Yet another study suggested that dexamethasone does not reduce migraine recurrence [53].

An animal study has shown that clofibrate can ameliorate the oxidative stress and neuroinflammation caused by exaggerated production of lipid peroxidation products [54]. Clofibrate activates peroxisome-proliferator-activated receptor (PPAR) α, inhibiting activation of the NF-κB signaling pathway and production of interleukin (IL)-6, thereby exerting an anti-inflammatory effect [55, 56]. Additionally, a recent animal study reported upregulated astrocytic activation and glial fibrillary acidic protein (GFAP) expression in the trigeminal nucleus caudalis (TNC) in a mouse model of migraine induced by recurrent dural infusion of inflammatory soup (IS), accompanied by the release of various cytokines, increased neuronal excitability, and promotion of central sensitization [57]. Clofibrate can reduce astrocyte activation and GFAP expression, thereby inhibiting neuroinflammation [54]. Andrographolide, a major bioactive constituent of Andrographis paniculata, has broad effects on various inflammatory and neurological disorders [58, 59, 60]. Although we found no migraine clinical trials of clofibrate or andrographolide on PubMed or ClinicalTrials.gov, we believe the prospects for using these agents in the treatment of migraine are promising, and we hope to see more research on their association with migraine in the future.

Our study has several advantages. First, we provided compelling genetic evidence about migraine drug targets using MR with the largest publicly available GWAS data to date. Additionally, colocalization analysis helps reduce false negatives and false positives, ensuring the robustness of the results. Enrichment analysis and the PPI network illustrate the functional characteristics and regulatory relationships of these target genes, providing potential avenues for migraine drug development. The drug predictions demonstrate the medicinal potential of these genes, and the high binding activity observed in molecular docking indicates their strong potential as drug targets. Our study thus offers a comprehensive evaluation, from identifying migraine-related druggable genes to characterizing their drug-binding properties, and proposes migraine drug targets supported by compelling evidence.

This study also has several notable limitations. First, the number of eQTL IVs in the MR analyses was limited, with most genes instrumented by no more than three SNPs, which restricts the credibility of the MR results. Additionally, while MR offers valuable insights into causality, it estimates the effect of lifelong, low-dose exposure and assumes a linear exposure–outcome relationship, which may not replicate real-world clinical trials that typically assess high doses of drugs over a short timeframe; MR results may therefore not accurately reflect the effect sizes observed in clinical settings, nor fully predict the impacts of drugs. Second, the generalizability of this study is limited by its primary inclusion of individuals of European descent; extrapolating the findings to populations of other genetic ancestries requires further research and validation. Third, the study focuses mainly on cis-eQTLs and their relationship with migraine, potentially overlooking other regulatory and environmental factors that contribute to the complexity of the disease. Fourth, while enrichment analysis is valuable, it relies on predefined gene sets or pathways, which may not encompass all possible biological mechanisms or interactions; a lack of significant enrichment does not necessarily mean there is no biological relevance, and results should be interpreted cautiously. Fifth, the accuracy of molecular docking depends largely on the quality of the protein and ligand structures; although this method identified potential drug targets, it does not guarantee their efficacy in clinical settings, and subsequent experimental validation and clinical trials are necessary to confirm the therapeutic potential of the identified targets. Moreover, we only investigated the side effects of two significant druggable genes; the effects of drugs on targets are very broad, and many off-target effects cannot be explored through MR, requiring further basic and clinical studies for a more comprehensive understanding. Finally, the clinical relevance of our results needs further validation, and the lack of clinical data related to our study is a significant limitation.

This study utilized MR and colocalization analysis to identify 21 potential drug targets for migraine, two of which were significant in both blood and brain. These findings provide promising leads for more effective migraine treatments and could reduce drug development costs. The study makes a valuable contribution to the field by highlighting druggable genes significantly associated with migraine; further clinical trials of drugs targeting these genes are needed.

Availability of data and materials

The migraine GWAS dataset provided by Hautakangas et al. can be obtained by contacting the International Headache Genetics Consortium [5]. Other data can be obtained from the original literature and websites.

Abbreviations

  • MR: Mendelian randomization
  • eQTL: Expression quantitative trait loci
  • GWAS: Genome-wide association studies
  • CGRP: Calcitonin gene-related peptide
  • SNP: Single nucleotide polymorphisms
  • IV: Instrumental variables
  • DGIdb: Drug-Gene Interaction Database
  • FDR: False discovery rate
  • IHGC: International Headache Genetics Consortium
  • TSS: Transcriptional start site
  • IVW: Inverse-variance weighted
  • 5-HT: 5-Hydroxytryptamine
  • GABA: γ-Aminobutyric acid
  • PP: Posterior probabilities
  • PheWAS: Phenome-wide association study
  • GO: Gene Ontology
  • KEGG: Kyoto Encyclopedia of Genes and Genomes
  • BP: Biological process
  • MF: Molecular function
  • CC: Cellular component
  • PPI: Protein–protein interaction
  • DSigDB: Drug Signatures Database
  • PDB: Protein Data Bank
  • LDL: Low-density lipoprotein
  • GSEA: Gene Set Enrichment Analysis
  • IGF1: Insulin-like growth factor 1
  • TGF-β: Transforming growth factor-beta
  • PPAR: Peroxisome-proliferator-activated receptors
  • IL: Interleukin
  • GFAP: Glial fibrillary acidic protein
  • TNC: Trigeminal nucleus caudalis
  • IS: Inflammatory soup

References

Headache Classification Committee of the International Headache Society (IHS) (2018) The International Classification of Headache Disorders, 3rd edition. Cephalalgia 38(1):1–211. https://doi.org/10.1177/0333102417738202

GBD Neurology Collaborators (2019) Global, regional, and national burden of neurological disorders, 1990–2016: a systematic analysis for the Global Burden of Disease Study 2016. Lancet Neurol. 18(5):459–480. https://doi.org/10.1016/s1474-4422(18)30499-x


Choquet H, Yin J, Jacobson AS, Horton BH, Hoffmann TJ, Jorgenson E et al (2021) New and sex-specific migraine susceptibility loci identified from a multiethnic genome-wide meta-analysis. Commun Biol 4(1):864. https://doi.org/10.1038/s42003-021-02356-y


Tanaka M, Szabó Á, Körtési T, Szok D, Tajti J, Vécsei L (2023) From CGRP to PACAP, VIP, and beyond: unraveling the next chapters in migraine treatment. Cells 12(22).  http://doi.org/10.3390/cells12222649 .

Hautakangas H, Winsvold BS, Ruotsalainen SE, Bjornsdottir G, Harder AVE, Kogelman LJA et al (2022) Genome-wide analysis of 102,084 migraine cases identifies 123 risk loci and subtype-specific risk alleles. Nat Genet 54(2):152–160. https://doi.org/10.1038/s41588-021-00990-0


Qi T, Song L, Guo Y, Chen C, Yang J (2024) From genetic associations to genes: methods, applications, and challenges. Trends Genet. https://doi.org/10.1016/j.tig.2024.04.008


Namba S, Konuma T, Wu KH, Zhou W, Okada Y (2022) A practical guideline of genomics-driven drug discovery in the era of global biobank meta-analysis. Cell Genom 2(10):100190. https://doi.org/10.1016/j.xgen.2022.100190

Burgess S, Timpson NJ, Ebrahim S, Davey Smith G (2015) Mendelian randomization: where are we now and where are we going? Int J Epidemiol 44(2):379–388. https://doi.org/10.1093/ije/dyv108

Storm CS, Kia DA, Almramhi MM, Bandres-Ciga S, Finan C, Hingorani AD et al (2021) Finding genetically-supported drug targets for Parkinson’s disease using Mendelian randomization of the druggable genome. Nat Commun 12(1):7342. https://doi.org/10.1038/s41467-021-26280-1

Freshour SL, Kiwala S, Cotto KC, Coffman AC, McMichael JF, Song JJ et al (2021) Integration of the Drug-Gene Interaction Database (DGIdb 4.0) with open crowdsource efforts. Nucleic Acids Res 49(D1):D1144–d1151. https://doi.org/10.1093/nar/gkaa1084


Finan C, Gaulton A, Kruger FA, Lumbers RT, Shah T, Engmann J et al (2017) The druggable genome and support for target identification and validation in drug development. Sci Transl Med 9(383). https://doi.org/10.1126/scitranslmed.aag1166

Su WM, Gu XJ, Dou M, Duan QQ, Jiang Z, Yin KF et al (2023) Systematic druggable genome-wide Mendelian randomisation identifies therapeutic targets for Alzheimer’s disease. J Neurol Neurosurg Psychiatry 94(11):954–961. https://doi.org/10.1136/jnnp-2023-331142

Võsa U, Claringbould A, Westra HJ, Bonder MJ, Deelen P, Zeng B et al (2021) Large-scale cis- and trans-eQTL analyses identify thousands of genetic loci and polygenic scores that regulate blood gene expression. Nat Genet 53(9):1300–1310. https://doi.org/10.1038/s41588-021-00913-z

Wang D, Liu S, Warrell J, Won H, Shi X, Navarro FCP et al (2018) Comprehensive functional genomic resource and integrative model for the human brain. Science 362(6420). https://doi.org/10.1126/science.aat8464

Hemani G, Zheng J, Elsworth B, Wade KH, Haberland V, Baird D et al (2018) The MR-Base platform supports systematic causal inference across the human phenome. Elife 7. https://doi.org/10.7554/eLife.34408

1000 Genomes Project Consortium, Auton A, Brooks LD, Durbin RM, Garrison EP, Kang HM et al (2015) A global reference for human genetic variation. Nature 526(7571):68–74. https://doi.org/10.1038/nature15393

Staley JR, Blackshaw J, Kamat MA, Ellis S, Surendran P, Sun BB et al (2016) PhenoScanner: a database of human genotype-phenotype associations. Bioinformatics 32(20):3207–3209. https://doi.org/10.1093/bioinformatics/btw373

Burgess S, Davey Smith G, Davies NM, Dudbridge F, Gill D, Glymour MM et al (2019) Guidelines for performing Mendelian randomization investigations: update for summer 2023. Wellcome Open Res 4:186. https://doi.org/10.12688/wellcomeopenres.15555.3

Greco MF, Minelli C, Sheehan NA, Thompson JR (2015) Detecting pleiotropy in Mendelian randomisation studies with summary data and a continuous outcome. Stat Med 34(21):2926–2940. https://doi.org/10.1002/sim.6522

Bowden J, Davey Smith G, Burgess S (2015) Mendelian randomization with invalid instruments: effect estimation and bias detection through Egger regression. Int J Epidemiol 44(2):512–525. https://doi.org/10.1093/ije/dyv080

Giambartolomei C, Vukcevic D, Schadt EE, Franke L, Hingorani AD, Wallace C et al (2014) Bayesian test for colocalisation between pairs of genetic association studies using summary statistics. PLoS Genet 10(5):e1004383. https://doi.org/10.1371/journal.pgen.1004383

Yu G, Wang LG, Han Y, He QY (2012) clusterProfiler: an R package for comparing biological themes among gene clusters. OMICS 16(5):284–287. https://doi.org/10.1089/omi.2011.0118

Szklarczyk D, Kirsch R, Koutrouli M, Nastou K, Mehryary F, Hachilif R et al (2023) The STRING database in 2023: protein-protein association networks and functional enrichment analyses for any sequenced genome of interest. Nucleic Acids Res 51(D1):D638–d646. https://doi.org/10.1093/nar/gkac1000

Yoo M, Shin J, Kim J, Ryall KA, Lee K, Lee S et al (2015) DSigDB: drug signatures database for gene set analysis. Bioinformatics 31(18):3069–3071. https://doi.org/10.1093/bioinformatics/btv313

Kim S, Chen J, Cheng T, Gindulyte A, He J, He S et al (2023) PubChem 2023 update. Nucleic Acids Res 51(D1):D1373–d1380. https://doi.org/10.1093/nar/gkac956

Morris GM, Huey R, Lindstrom W, Sanner MF, Belew RK, Goodsell DS et al (2009) AutoDock4 and AutoDockTools4: Automated docking with selective receptor flexibility. J Comput Chem 30(16):2785–2791. https://doi.org/10.1002/jcc.21256

Zobdeh F, Ben Kraiem A, Attwood MM, Chubarev VN, Tarasov VV, Schiöth HB et al (2021) Pharmacological treatment of migraine: drug classes, mechanisms of action, clinical trials and new treatments. Br J Pharmacol 178(23):4588–4607. https://doi.org/10.1111/bph.15657

Pandit R, Chen L, Götz J (2020) The blood-brain barrier: physiology and strategies for drug delivery. Adv Drug Deliv Rev 165–166:1–14. https://doi.org/10.1016/j.addr.2019.11.009

Guo Y, Daghlas I, Gormley P, Giulianini F, Ridker PM, Mora S et al (2021) Phenotypic and Genotypic Associations Between Migraine and Lipoprotein Subfractions. Neurology 97(22):e2223–e2235. https://doi.org/10.1212/wnl.0000000000012919

Hong P, Han L, Wan Y (2024) Mendelian randomization study of lipid metabolism characteristics and migraine risk. Eur J Pain. https://doi.org/10.1002/ejp.2235

Buettner C, Nir RR, Bertisch SM, Bernstein C, Schain A, Mittleman MA et al (2015) Simvastatin and vitamin D for migraine prevention: A randomized, controlled trial. Ann Neurol 78(6):970–981. https://doi.org/10.1002/ana.24534

Ferrari MD, Klever RR, Terwindt GM, Ayata C, van den Maagdenberg AM (2015) Migraine pathophysiology: lessons from mouse models and human genetics. Lancet Neurol 14(1):65–80. https://doi.org/10.1016/s1474-4422(14)70220-0

Kursun O, Yemisci M, van den Maagdenberg A, Karatas H (2021) Migraine and neuroinflammation: the inflammasome perspective. J Headache Pain 22(1):55. https://doi.org/10.1186/s10194-021-01271-1

Greenwood J, Mason JC (2007) Statins and the vascular endothelial inflammatory response. Trends Immunol 28(2):88–98. https://doi.org/10.1016/j.it.2006.12.003

Subramanian A, Tamayo P, Mootha VK, Mukherjee S, Ebert BL, Gillette MA et al (2005) Gene set enrichment analysis: a knowledge-based approach for interpreting genome-wide expression profiles. Proc Natl Acad Sci U S A 102(43):15545–15550. https://doi.org/10.1073/pnas.0506580102

Mootha VK, Lindgren CM, Eriksson KF, Subramanian A, Sihag S, Lehar J et al (2003) PGC-1alpha-responsive genes involved in oxidative phosphorylation are coordinately downregulated in human diabetes. Nat Genet 34(3):267–273. https://doi.org/10.1038/ng1180

Sanjabi S, Zenewicz LA, Kamanaka M, Flavell RA (2009) Anti-inflammatory and pro-inflammatory roles of TGF-beta, IL-10, and IL-22 in immunity and autoimmunity. Curr Opin Pharmacol 9(4):447–453. https://doi.org/10.1016/j.coph.2009.04.008

Okamura T, Sumitomo S, Morita K, Iwasaki Y, Inoue M, Nakachi S et al (2015) TGF-β3-expressing CD4+CD25(-)LAG3+ regulatory T cells control humoral immune responses. Nat Commun 6:6329. https://doi.org/10.1038/ncomms7329

Sun S, Fan Z, Liu X, Wang L, Ge Z (2024) Microglia TREM1-mediated neuroinflammation contributes to central sensitization via the NF-κB pathway in a chronic migraine model. J Headache Pain 25(1):3. https://doi.org/10.1186/s10194-023-01707-w

Güzel I, Taşdemir N, Celik Y (2013) Evaluation of serum transforming growth factor β1 and C-reactive protein levels in migraine patients. Neurol Neurochir Pol 47(4):357–362. https://doi.org/10.5114/ninp.2013.36760

Ishizaki K, Takeshima T, Fukuhara Y, Araki H, Nakaso K, Kusumi M et al (2005) Increased plasma transforming growth factor-beta1 in migraine. Headache 45(9):1224–1228. https://doi.org/10.1111/j.1526-4610.2005.00246.x

Bø SH, Davidsen EM, Gulbrandsen P, Dietrichs E, Bovim G, Stovner LJ et al (2009) Cerebrospinal fluid cytokine levels in migraine, tension-type headache and cervicogenic headache. Cephalalgia 29(3):365–372. https://doi.org/10.1111/j.1468-2982.2008.01727.x

Yang L, Zhou Y, Zhang L, Wang Y, Zhang Y, Xiao Z (2023) Aryl hydrocarbon receptors improve migraine-like pain behaviors in rats through the regulation of regulatory T cell/T-helper 17 cell-related homeostasis. Headache 63(8):1045–1060. https://doi.org/10.1111/head.14599

Komai T, Okamura T, Inoue M, Yamamoto K, Fujio K (2018) Reevaluation of pluripotent cytokine TGF-β3 in immunity. Int J Mol Sci 19(8):2261. https://doi.org/10.3390/ijms19082261

Abuduxukuer R, Niu PP, Guo ZN, Xu YM, Yang Y (2022) Circulating insulin-like growth factor 1 levels and migraine risk: a mendelian randomization study. Neurol Ther 11(4):1677–1689. https://doi.org/10.1007/s40120-022-00398-w

Ye S, Wei L, Jiang Y, Yuan Y, Zeng Y, Zhu L et al (2024) Mechanism of NO(2)-induced migraine in rats: The exploration of the role of miR-653-3p/IGF1 axis. J Hazard Mater 465:133362. https://doi.org/10.1016/j.jhazmat.2023.133362

Ji J, Xue TF, Guo XD, Yang J, Guo RB, Wang J et al (2018) Antagonizing peroxisome proliferator-activated receptor γ facilitates M1-to-M2 shift of microglia by enhancing autophagy via the LKB1-AMPK signaling pathway. Aging Cell 17(4):e12774. https://doi.org/10.1111/acel.12774

Gelfand AA, Ross AC, Irwin SL, Greene KA, Qubty WF, Allen IE (2020) Melatonin for Acute Treatment of Migraine in Children and Adolescents: A Pilot Randomized Trial. Headache 60(8):1712–1721. https://doi.org/10.1111/head.13934

Santos PSF, Melhado EM, Kaup AO, Costa A, Roesler CAP, Piovesan ÉJ et al (2022) Consensus of the Brazilian Headache Society (SBCe) for prophylactic treatment of episodic migraine: part II. Arq Neuropsiquiatr 80(9):953–969. https://doi.org/10.1055/s-0042-1755320

Alstadhaug KB, Odeh F, Salvesen R, Bekkelund SI (2010) Prophylaxis of migraine with melatonin: a randomized controlled trial. Neurology 75(17):1527–1532. https://doi.org/10.1212/WNL.0b013e3181f9618c

Gelfand AA, Goadsby PJ (2012) A neurologist’s guide to acute migraine therapy in the emergency room. Neurohospitalist 2(2):51–59. https://doi.org/10.1177/1941874412439583

Orr SL, Friedman BW, Christie S, Minen MT, Bamford C, Kelley NE et al (2016) Management of Adults With Acute Migraine in the Emergency Department: The American Headache Society Evidence Assessment of Parenteral Pharmacotherapies. Headache 56(6):911–940. https://doi.org/10.1111/head.12835

Rowe BH, Colman I, Edmonds ML, Blitz S, Walker A, Wiens S (2008) Randomized controlled trial of intravenous dexamethasone to prevent relapse in acute migraine headache. Headache 48(3):333–340. https://doi.org/10.1111/j.1526-4610.2007.00959.x

Oyagbemi AA, Adebiyi OE, Adigun KO, Ogunpolu BS, Falayi OO, Hassan FO et al (2020) Clofibrate, a PPAR-α agonist, abrogates sodium fluoride-induced neuroinflammation, oxidative stress, and motor incoordination via modulation of GFAP/Iba-1/anti-calbindin signaling pathways. Environ Toxicol 35(2):242–253. https://doi.org/10.1002/tox.22861

Sánchez-Aguilar M, Ibarra-Lara L, Cano-Martínez A, Soria-Castro E, Castrejón-Téllez V, Pavón N, et al. (2023) PPAR Alpha Activation by Clofibrate Alleviates Ischemia/Reperfusion Injury in Metabolic Syndrome Rats by Decreasing Cardiac Inflammation and Remodeling and by Regulating the Atrial Natriuretic Peptide Compensatory Response. Int J Mol Sci 24(6).  http://doi.org/10.3390/ijms24065321 .

Brown JD, Plutzky J (2007) Peroxisome proliferator-activated receptors as transcriptional nodal points and therapeutic targets. Circulation 115(4):518–533. https://doi.org/10.1161/circulationaha.104.475673

Zhang L, Lu C, Kang L, Li Y, Tang W, Zhao D et al (2022) Temporal characteristics of astrocytic activation in the TNC in a mice model of pain induced by recurrent dural infusion of inflammatory soup. J Headache Pain 23(1):8. https://doi.org/10.1186/s10194-021-01382-9

Patel R, Kaur K, Singh S (2021) Protective effect of andrographolide against STZ induced Alzheimer’s disease in experimental rats: possible neuromodulation and Aβ((1–42)) analysis. Inflammopharmacology 29(4):1157–1168. https://doi.org/10.1007/s10787-021-00843-6

Ahmed S, Kwatra M, Ranjan Panda S, Murty USN, Naidu VGM (2021) Andrographolide suppresses NLRP3 inflammasome activation in microglia through induction of parkin-mediated mitophagy in in-vitro and in-vivo models of Parkinson disease. Brain Behav Immun 91:142–158. https://doi.org/10.1016/j.bbi.2020.09.017

Ciampi E, Uribe-San-Martin R, Cárcamo C, Cruz JP, Reyes A, Reyes D et al (2020) Efficacy of andrographolide in not active progressive multiple sclerosis: a prospective exploratory double-blind, parallel-group, randomized, placebo-controlled trial. BMC Neurol 20(1):173. https://doi.org/10.1186/s12883-020-01745-w


Acknowledgements

The authors sincerely thank related investigators for sharing the statistics included in this study.

Funding

This study was funded by the China National Natural Science Foundation (82374575, 82074179), Beijing Natural Science Foundation (7232270), the Outstanding Young Talents Program of Capital Medical University (B2207), and Capital's Funds for Health Improvement and Research (CFH2024-2-2235).

Author information

Authors and Affiliations

Department of Acupuncture and Moxibustion, Beijing Hospital of Traditional Chinese Medicine, Capital Medical University, Beijing Key Laboratory of Acupuncture Neuromodulation, No. 23, Meishuguan Houjie, Beijing, 100010, China

Chengcheng Zhang & Lu Liu

State Key Laboratory of Networking and Switching Technology, Beijing University of Posts and Telecommunications, Beijing, 100876, China


Contributions

LL contributed to the study conception and design. CCZ and YWH performed the statistical analysis. CCZ drafted the manuscript. All authors commented on previous versions of the manuscript. All authors contributed to the article and approved the submitted version.

Corresponding author

Correspondence to Lu Liu .

Ethics declarations

Ethics approval and consent to participate

Not applicable.

Consent for publication

All data analyzed during this study have been previously published.

Competing interests

The authors declare no competing interests.

Additional information

Publisher's note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary Information

Supplementary material 1.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/ . The Creative Commons Public Domain Dedication waiver ( http://creativecommons.org/publicdomain/zero/1.0/ ) applies to the data made available in this article, unless otherwise stated in a credit line to the data.


About this article

Cite this article

Zhang, C., He, Y. & Liu, L. Identifying therapeutic target genes for migraine by systematic druggable genome-wide Mendelian randomization. J Headache Pain 25 , 100 (2024). https://doi.org/10.1186/s10194-024-01805-3


Received : 05 May 2024

Accepted : 05 June 2024

Published : 12 June 2024

DOI : https://doi.org/10.1186/s10194-024-01805-3


Keywords: Druggable target genes



  • Open access
  • Published: 14 June 2024

Associations between deep venous thrombosis and thyroid diseases: a two-sample bidirectional Mendelian randomization study

  • Lifeng Zhang,
  • Kaibei Li,
  • Qifan Yang,
  • Yao Lin,
  • Caijuan Geng,
  • Wei Huang &
  • Wei Zeng

European Journal of Medical Research volume 29, Article number: 327 (2024)


Some previous observational studies have linked deep venous thrombosis (DVT) to thyroid diseases; however, the findings were contradictory. This study aimed to investigate whether some common thyroid diseases can cause DVT using a two-sample Mendelian randomization (MR) approach.

This two-sample MR study used single nucleotide polymorphisms (SNPs) identified by the FinnGen genome-wide association studies (GWAS) as highly associated with some common thyroid diseases, including autoimmune hyperthyroidism (962 cases and 172,976 controls), subacute thyroiditis (418 cases and 187,684 controls), hypothyroidism (26,342 cases and 59,827 controls), and malignant neoplasm of the thyroid gland (989 cases and 217,803 controls). These SNPs were used as instruments. The outcome dataset, a GWAS on DVT (6,767 cases and 330,392 controls), was selected from UK Biobank data obtained from the Integrative Epidemiology Unit (IEU) open GWAS project. The inverse variance weighted (IVW), MR-Egger and weighted median methods were used to estimate the causal association between DVT and thyroid diseases. Cochran's Q test was used to quantify the heterogeneity of the instrumental variables (IVs), and the MR Pleiotropy RESidual Sum and Outlier test (MR-PRESSO) was used to detect horizontal pleiotropy. Where a significant causal relationship was found, bidirectional MR analysis was performed to test for reverse causation between exposures and outcomes.

This MR study showed that autoimmune hyperthyroidism slightly increased the risk of DVT according to the IVW [odds ratio (OR) = 1.0009; p = 0.024] and weighted median [OR = 1.001; p = 0.028] methods. According to Cochran's Q test, there was no evidence of heterogeneity among the IVs, and MR-PRESSO did not detect horizontal pleiotropy (p = 0.972). However, no association was observed between the other thyroid diseases and DVT using the IVW, weighted median, and MR-Egger regression methods.

Conclusions

This study revealed that autoimmune hyperthyroidism may cause DVT; however, more evidence and larger sample sizes are required to draw more precise conclusions.

Introduction

Deep venous thrombosis (DVT) is a common type of disease that occurs in 1–2 individuals per 1000 each year [ 1 ]. In the post-COVID-19 era, DVT showed a higher incidence rate [ 2 ]. Among hospitalized patients, the incidence rate of this disease was as high as 2.7% [ 3 ], increasing the risk of adverse events during hospitalization. According to the Registro Informatizado Enfermedad Tromboembolica (RIETE) registry, which included data from ~ 100,000 patients from 26 countries, the 30-day mortality rate was 2.6% for distal DVT and 3.3% for proximal DVT [ 4 ]. Other studies have shown that the one-year mortality rate of DVT is 19.6% [ 5 ]. DVT and pulmonary embolism (PE), collectively referred to as venous thromboembolism (VTE), constitute a major global burden of disease [ 6 ].

Thyroid diseases are common in the real world. Previous studies have focused on the relationship between DVT and thyroid diseases, including thyroid dysfunction and thyroid cancer. Some case reports [ 7 , 8 , 9 ] have demonstrated that hyperthyroidism is often associated with DVT and indicates a worse prognosis [ 10 ]. The relationship between thyroid tumors and venous thrombosis has troubled researchers for many years. In 1989, the first case of papillary thyroid carcinoma presenting with axillary vein thrombosis as the initial symptom was reported [ 11 ]. In 1995, researchers began to notice the relationship between thyroid tumors and hypercoagulability [ 12 ], laying the foundation for subsequent extensive research. However, the aforementioned observational studies had limitations, such as small sample sizes, selection bias, reverse causality, and confounding factors, which may have led to unreliable conclusions [ 13 ].

Previous studies have explored the relationship between thyroid disease and DVT and revealed that high levels of thyroid hormones may increase the risk of DVT. Hyperthyroidism promotes a procoagulant and hypofibrinolytic state by affecting von Willebrand factor, factors VIII, IX, and X, fibrinogen, and plasminogen activator inhibitor-1 [14, 15]. At the molecular level, researchers believe that thyroid hormones affect coagulation through an important nuclear thyroid hormone receptor (TR), TRβ [16], and participate in pathological coagulation through endothelial dysfunction; thyroid hormones may also have non-genomic effects on the behavior of endothelial cells [17, 18]. In a study of tumor thrombosis, Lou [19] used a microarray to show that 303 circular RNAs were differentially expressed in DVT. Kyoto Encyclopedia of Genes and Genomes (KEGG) analysis revealed that the most significantly enriched pathways included the thyroid hormone signaling pathway and endocytosis, as well as proteoglycans in cancer. This indicated that tumor cells and thyroid hormones might interact to promote thrombosis. Based on these studies, we speculated that thyroid diseases, including thyroid dysfunction and thyroid tumors, may cause DVT.

Mendelian randomization (MR) research is a causal inference technique that can be used to assess the causal relationship and reverse causation between specific exposure and outcome factors. If certain assumptions [ 20 ] are fulfilled, genetic variants can be employed as instrumental variables (IVs) to establish causal relationships. Bidirectional MR analysis can clarify the presence of reverse causal relationships [ 21 ], making the conclusions more comprehensive. Accordingly, we aimed to apply a two-sample MR strategy to investigate whether DVT is related to four thyroid diseases, including autoimmune hyperthyroidism, subacute thyroiditis, hypothyroidism, and thyroid cancer.

Study design

MR relies on single nucleotide polymorphisms (SNPs) as IVs. The IVs should fulfill the following three criteria [ 22 ]: (1) IVs should be strongly associated with exposure. (2) Genetic variants must be independent of unmeasured confounding factors that may affect the exposure–outcome association. (3) IVs are presumed to affect the outcome only through their associations with exposure (Fig.  1 ). IVs that met the above requirements were used to estimate the relationship between exposure and outcome. Our study protocol conformed to the STROBE-MR Statement [ 23 ], and all methods were performed in accordance with the relevant guidelines and regulations.

figure 1

The relationship between instrumental variables, exposure, outcome, and confounding factors

Data sources and instruments

Datasets (Table 1) in this study were obtained from a publicly available database, the IEU open genome-wide association studies (GWAS) project ( https://gwas.mrcieu.ac.uk ) [24]. There was no sample overlap between the data sources for outcomes and exposures. Because de-identified summary-level data were used, private information such as age and sex was concealed. Ethical approval had been obtained in all of the original studies, and this study complied with the database's terms of use.

MR analysis was performed using the R package "TwoSampleMR". SNPs associated with each thyroid disease at the genome-wide significance threshold of p < 5.0 × 10⁻⁸ were selected as potential IVs. To ensure independence between the genetic variants used as IVs, the linkage disequilibrium (LD) threshold for clumping was set to r² < 0.001 with a window size of 10,000 kb. The SNP with the lowest p-value at each locus was retained for analysis.
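A minimal sketch of this selection step via extract_instruments() from 'TwoSampleMR', which applies the p-value and clumping thresholds in one call; the OpenGWAS dataset ID is a placeholder for the FinnGen study IDs actually used.

```r
library(TwoSampleMR)

# Genome-wide significant SNPs (p < 5e-8), clumped at r2 < 0.001
# within a 10,000 kb window
iv <- extract_instruments(outcomes = "finn-b-PLACEHOLDER",  # placeholder ID
                          p1 = 5e-8, clump = TRUE,
                          r2 = 0.001, kb = 10000)
```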

Statistical analysis

Multiple MR methods were used to infer causal relationships between thyroid diseases and DVT, including the inverse variance weighted (IVW), weighted median, and MR-Egger tests, after harmonizing the SNPs across the GWASs of exposures and outcomes. The main analysis was conducted using the IVW method. Heterogeneity and pleiotropy tests were also performed for each MR analysis, and the MR-PRESSO global test [25] was utilized to detect horizontal pleiotropy. The effect trends of individual SNPs were inspected with scatter plots, and forest plots were used to examine overall effects. When a significant causal relationship was confirmed by two-sample MR analysis, bidirectional MR analysis was performed to assess reverse causation by swapping the exposure and outcome factors; parameters were set as before. All statistical analyses were performed using the TwoSampleMR package (version 0.5.7) in R (version 4.2.1).
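A hedged sketch of this analysis step, combining 'TwoSampleMR' with the 'MRPRESSO' package; the harmonised objects ('iv' from the instrument-selection step above, 'dvt_dat' for the DVT outcome GWAS) are assumptions.

```r
library(TwoSampleMR)
library(MRPRESSO)

dat <- harmonise_data(exposure_dat = iv, outcome_dat = dvt_dat)  # dvt_dat: DVT GWAS

# IVW (main analysis), MR-Egger and weighted median
res <- mr(dat, method_list = c("mr_ivw", "mr_egger_regression",
                               "mr_weighted_median"))

het  <- mr_heterogeneity(dat)      # Cochran's Q test
plei <- mr_pleiotropy_test(dat)    # MR-Egger intercept

# MR-PRESSO global test for horizontal pleiotropy
presso <- mr_presso(BetaOutcome = "beta.outcome", BetaExposure = "beta.exposure",
                    SdOutcome = "se.outcome", SdExposure = "se.exposure",
                    OUTLIERtest = TRUE, DISTORTIONtest = TRUE,
                    data = dat, NbDistribution = 1000, SignifThreshold = 0.05)

# Scatter and forest plots of SNP-level effects
p1 <- mr_scatter_plot(res, dat)
p2 <- mr_forest_plot(mr_singlesnp(dat))
```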

After harmonizing the SNPs across the GWASs for exposures and outcomes, the IVW analysis revealed significant causal effects of autoimmune hyperthyroidism on DVT risk (OR = 1.0009, p = 0.024, Table 2). Similar results were observed using the weighted median approach (OR = 1.001, p = 0.028). Cochran's Q test, the MR-Egger intercept, and the MR-PRESSO test suggested that the results were not influenced by pleiotropy or heterogeneity (Table 2). However, the leave-one-out analysis revealed a significant difference after removing some SNPs (rs179247, rs6679677, rs72891915, and rs942495; p < 0.05, Figure S2a), indicating that the MR results depended on these SNPs (Figure S2, Table S1). No significant effects were observed for the other thyroid diseases (Table 2). The estimated scatter plot of the association between thyroid diseases and DVT is presented in Fig. 2, indicating a positive causal relationship between autoimmune hyperthyroidism and DVT (Fig. 2a). The forest plots of single SNPs affecting the risk of DVT are displayed in Figure S1.

figure 2

The estimated scatter plot of the association between thyroid diseases and DVT. MR estimates were derived using the IVW, MR-Egger, weighted median and mode methods. By fitting different models, the scatter plots show the relationship between each SNP's effect on the exposure and its effect on the outcome

Bidirectional MR analysis was performed to further examine the relationship between autoimmune hyperthyroidism and DVT. No reverse causal relationship was observed (Table S2), supporting a causal direction from autoimmune hyperthyroidism to DVT.

This study used MR to assess whether thyroid diseases affect the incidence of DVT. The results showed that autoimmune hyperthyroidism can increase the risk of DVT occurrence, but a reverse causal relationship was not observed between them using bidirectional MR analysis. However, other thyroid diseases, such as subacute thyroiditis, hypothyroidism, and thyroid cancer, did not show a similar effect.

Recently, several studies have suggested that thyroid-related diseases may be associated with the occurrence of DVT in the lower extremities, which provided etiological clues for our research. In 2006, a review noted an association between thyroid dysfunction and coagulation disorders [26], indicating a hypercoagulable state in patients with hyperthyroidism. In 2011, another review further suggested a clear association between hypothyroidism and bleeding tendency, while hyperthyroidism appeared to increase the risk of thrombotic events, particularly cerebral venous thrombosis [27]. A retrospective cohort study [28] supported this conclusion, although it only observed a higher proportion of concurrent thyroid dysfunction in patients with cerebral venous thrombosis. The relationship between thyroid function and venous thromboembolism remains controversial. Krieg et al. [29] found that hypothyroidism has a higher incidence in patients with chronic thromboembolic pulmonary hypertension and may be associated with more severe disease, which seems at odds with the earlier view that hyperthyroidism is associated with venous thrombosis. Alsaidan and Alruwiali [30] also revealed that the risk of developing venous thrombosis increased almost onefold in cases with mild-to-moderate elevation of thyroid-stimulating hormone and free thyroxine (FT4), and twofold in cases with severe elevation of both. Raised thyroid hormone levels may increase the synthesis or secretion of coagulation factors or decrease fibrinolysis, which may lead to coagulation abnormalities.

Other thyroid diseases are also reported to be associated with DVT. In a large prospective cohort study [ 31 ], the incidence of venous thromboembolism was observed to increase in patients with thyroid cancer over the age of 60. However, other retrospective studies did not find any difference compared with the general population [ 32 ]. In the post-COVID-19 era, subacute thyroiditis has received considerable attention from researchers. New evidence suggests that COVID-19 may be associated with subacute thyroiditis [ 33 , 34 ]. Mondal et al. [ 35 ] found that out of 670 COVID-19 patients, 11 presented with post-COVID-19 subacute thyroiditis. Among them, painless subacute thyroiditis appeared earlier and exhibited symptoms of hyperthyroidism. Another case report also indicated the same result, that is, subacute thyroiditis occurred after COVID-19 infection, accompanied by thyroid function changes [ 36 ]. This led us to hypothesize that subacute thyroiditis may cause DVT through alterations in thyroid function.

This study confirmed a significant causal relationship between autoimmune hyperthyroidism and DVT (p = 0.02). The data were tested for heterogeneity and genetic pleiotropy using the MR-Egger, Cochran's Q, and MR-PRESSO tests, and there was no evidence that the results were influenced by pleiotropy or heterogeneity. In the leave-one-out analysis, four of the five selected SNPs showed significant effects of autoimmune hyperthyroidism on DVT, suggesting that these SNPs drive the observed effect on the DVT outcome. Previous studies have focused on the relationship between hyperthyroidism and its secondary arrhythmias and arterial thromboembolism [37, 38]. This study emphasizes the risk of DVT in patients with hyperthyroidism, which has clinical implications: prophylactic anticoagulant therapy may help prevent DVT in these patients. Unfortunately, our results did not reveal any evidence of a relationship between the other thyroid diseases and DVT occurrence. This may be due to the limited database, as this study only included GWAS data from a subset of European populations; large-scale multiracial studies are needed in the future.

There are some limitations to this study. First, it was limited to participants of European descent; further investigation is required to confirm these findings in other ethnicities. Second, this study did not address the relationship between complications of hyperthyroidism and DVT. Additionally, the IVs were selected from the database using statistical criteria rather than from a real population, which may weaken the effects of the screened IVs and reduce the clinical significance of the MR analysis. Moreover, the definitions of some diseases were unclear in the original database, and some diagnoses were self-reported, which may reduce diagnostic accuracy. Further research based on prospective cohorts and randomized controlled trials (RCTs) is needed to clarify the causal relationship between DVT and thyroid diseases.

This study analyzed large-scale genetic data and provided evidence of a causal relationship between autoimmune hyperthyroidism and the risk of DVT, in contrast to the other thyroid diseases investigated. Prospective RCTs or MR studies with larger sample sizes are still needed to draw more precise conclusions.

Availability of data and materials

The IEU open GWAS project: https://gwas.mrcieu.ac.uk/

References

Ortel TL, Neumann I, Ageno W, et al. American Society of Hematology 2020 guidelines for management of venous thromboembolism: treatment of deep vein thrombosis and pulmonary embolism. Blood Adv. 2020;4(19):4693–738.


Mehrabi F, Farshbafnadi M, Rezaei N. Post-discharge thromboembolic events in COVID-19 patients: a review on the necessity for prophylaxis. Clin Appl Thromb Hemost. 2023;29:10760296221148476.


Loffredo L, Vidili G, Sciacqua A, et al. Asymptomatic and symptomatic deep venous thrombosis in hospitalized acutely ill medical patients: risk factors and therapeutic implications. Thromb J. 2022;20(1):72.

RIETE Registry. Death within 30 days. 2022. https://rieteregistry.com/graphics-interactives/dead-30-days/ . Accessed 23 Aug 2023.

Minges KE, Bikdeli B, Wang Y, Attaran RR, Krumholz HM. National and regional trends in deep vein thrombosis hospitalization rates, discharge disposition, and outcomes for medicare beneficiaries. Am J Med. 2018;131(10):1200–8.

Di Nisio M, van Es N, Büller HR. Deep vein thrombosis and pulmonary embolism. Lancet. 2016;388(10063):3060–73.


Aquila I, Boca S, Caputo F, et al. An unusual case of sudden death: is there a relationship between thyroid disorders and fatal pulmonary thromboembolism? A case report and review of literature. Am J Forensic Med Pathol. 2017;38(3):229–32.

Katić J, Katić A, Katić K, Duplančić D, Lozo M. Concurrent deep vein thrombosis and pulmonary embolism associated with hyperthyroidism: a case report. Acta Clin Croat. 2021;60(2):314–6.


Hieber M, von Kageneck C, Weiller C, Lambeck J. Thyroid diseases are an underestimated risk factor for cerebral venous sinus thrombosis. Front Neurol. 2020;11:561656.

Pohl KR, Hobohm L, Krieg VJ, et al. Impact of thyroid dysfunction on short-term outcomes and long-term mortality in patients with pulmonary embolism. Thromb Res. 2022;211:70–8.


Sirota DK. Axillary vein thrombosis as the initial symptom in metastatic papillary carcinoma of the thyroid. Mt Sinai J Med. 1989;56(2):111–3.


Raveh E, Cohen M, Shpitzer T, Feinmesser R. Carcinoma of the thyroid: a cause of hypercoagulability? Ear Nose Throat J. 1995;74(2):110–2.

Davey Smith G, Hemani G. Mendelian randomization: genetic anchors for causal inference in epidemiological studies. Hum Mol Genet. 2014;23(R1):R89–98.

Stuijver DJ, van Zaane B, Romualdi E, Brandjes DP, Gerdes VE, Squizzato A. The effect of hyperthyroidism on procoagulant, anticoagulant and fibrinolytic factors: a systematic review and meta-analysis. Thromb Haemost. 2012;108(6):1077–88.


Son HM. Massive cerebral venous sinus thrombosis secondary to Graves’ disease. Yeungnam Univ J Med. 2019;36(3):273–80.

Elbers LP, Moran C, Gerdes VE, et al. The hypercoagulable state in hyperthyroidism is mediated via the thyroid hormone β receptor pathway. Eur J Endocrinol. 2016;174(6):755–62.

Davis PJ, Sudha T, Lin HY, et al. Thyroid hormone, hormone analogs, and angiogenesis. Compr Physiol. 2015;6(1):353–62.

Mousa SA, Lin HY, Tang HY, et al. Modulation of angiogenesis by thyroid hormone and hormone analogues: implications for cancer management. Angiogenesis. 2014;17(3):463–9.

Lou Z, Li X, Li C, et al. Microarray profile of circular RNAs identifies hsa_circ_000455 as a new circular RNA biomarker for deep vein thrombosis. Vascular. 2022;30(3):577–89.

Hemani G, Bowden J, Davey SG. Evaluating the potential role of pleiotropy in Mendelian randomization studies. Hum Mol Genet. 2018;27(R2):R195–208.

Zhang Z, Li L, Hu Z, et al. Causal effects between atrial fibrillation and heart failure: evidence from a bidirectional Mendelian randomization study. BMC Med Genomics. 2023;16(1):187.

Emdin CA, Khera AV, Kathiresan S. Mendelian randomization. JAMA. 2017;318(19):1925–6.

Skrivankova VW, Richmond RC, Woolf BAR, et al. Strengthening the reporting of observational studies in epidemiology using Mendelian randomization: the STROBE-MR statement. JAMA. 2021;326(16):1614–21.

Hemani G, Zheng J, Elsworth B, et al. The MR-Base platform supports systematic causal inference across the human phenome. Elife. 2018;7: e34408.

Verbanck M, Chen CY, Neale B, Do R. Detection of widespread horizontal pleiotropy in causal relationships inferred from Mendelian randomization between complex traits and diseases. Nat Genet. 2018;50(5):693–8.

Franchini M. Hemostatic changes in thyroid diseases: haemostasis and thrombosis. Hematology. 2006;11(3):203–8.

Franchini M, Lippi G, Targher G. Hyperthyroidism and venous thrombosis: a casual or causal association? A systematic literature review. Clin Appl Thromb Hemost. 2011;17(4):387–92.

Fandler-Höfler S, Pilz S, Ertler M, et al. Thyroid dysfunction in cerebral venous thrombosis: a retrospective cohort study. J Neurol. 2022;269(4):2016–21.

Krieg VJ, Hobohm L, Liebetrau C, et al. Risk factors for chronic thromboembolic pulmonary hypertension—importance of thyroid disease and function. Thromb Res. 2020;185:20–6.

Alsaidan AA, Alruwiali F. Association between hyperthyroidism and thromboembolism: a retrospective observational study. Ann Afr Med. 2023;22(2):183–8.

Walker AJ, Card TR, West J, Crooks C, Grainge MJ. Incidence of venous thromboembolism in patients with cancer—a cohort study using linked United Kingdom databases. Eur J Cancer. 2013;49(6):1404–13.

Ordookhani A, Motazedi A, Burman KD. Thrombosis in thyroid cancer. Int J Endocrinol Metab. 2017;16(1): e57897.

Ziaka M, Exadaktylos A. Insights into SARS-CoV-2-associated subacute thyroiditis: from infection to vaccine. Virol J. 2023;20(1):132.

Henke K, Odermatt J, Ziaka M, Rudovich N. Subacute thyroiditis complicating COVID-19 infection. Clin Med Insights Case Rep. 2023;16:11795476231181560.

Mondal S, DasGupta R, Lodh M, Ganguly A. Subacute thyroiditis following recovery from COVID-19 infection: novel clinical findings from an Eastern Indian cohort. Postgrad Med J. 2023;99(1172):558–65.

Nham E, Song E, Hyun H, et al. Concurrent subacute thyroiditis and graves’ disease after COVID-19: a case report. J Korean Med Sci. 2023;38(18): e134.

Mouna E, Molka BB, Sawssan BT, et al. Cardiothyreosis: epidemiological, clinical and therapeutic approach. Clin Med Insights Cardiol. 2023;17:11795468231152042.

Maung AC, Cheong MA, Chua YY, Gardner DS. When a storm showers the blood clots: a case of thyroid storm with systemic thromboembolism. Endocrinol Diabetes Metab Case Rep. 2021;2021:20–0118.

Download references

Acknowledgements

Not applicable.

Author information

Lifeng Zhang and Kaibei Li have contributed equally to this work and share the first authorship.

Authors and Affiliations

Department of Vascular Surgery, Hospital of Chengdu University of Traditional Chinese Medicine, No. 39, Shierqiao Road, Jinniu District, Chengdu, 610072, Sichuan, People’s Republic of China

Lifeng Zhang, Qifan Yang, Yao Lin, Caijuan Geng, Wei Huang & Wei Zeng

Disinfection Supply Center, Hospital of Chengdu University of Traditional Chinese Medicine, No. 39, Shierqiao Road, Jinniu District, Chengdu, 610072, Sichuan, People’s Republic of China


Contributions

Conception and design: LFZ and WZ. Analysis and interpretation: LFZ, KBL and WZ. Data collection: LFZ, QFY, YL, CJG and WH. Writing the article: LFZ, KBL. Critical revision of the article: LFZ, QFY and WZ. Final approval of the article: LFZ, KBL, YL, CJG, WH, QFY and WZ. Statistical analysis: YL, QFY.

Corresponding author

Correspondence to Wei Zeng.

Ethics declarations

Ethics approval and consent to participate

Ethical approval was obtained in all original studies. This study complies with the terms of use of the database.

Competing interests

Additional information

Publisher's note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary Information

Additional file 1.


About this article

Cite this article

Zhang, L., Li, K., Yang, Q. et al. Associations between deep venous thrombosis and thyroid diseases: a two-sample bidirectional Mendelian randomization study. Eur J Med Res 29, 327 (2024). https://doi.org/10.1186/s40001-024-01933-1

Received: 12 September 2023

Accepted: 09 June 2024

Published: 14 June 2024

Keywords

  • Deep venous thrombosis
  • Thyroid diseases
  • Mendelian randomization analysis

European Journal of Medical Research

ISSN: 2047-783X

Published: 19 June 2024

Detection of microplastics in the human penis

Jason Codrington, Alexandra Aponte Varnum, Lars Hildebrandt, Daniel Pröfrock, Joginder Bidhan, Kajal Khodamoradi, Anke-Lisa Höhme, Martin Held, Aymara Evans, David Velasquez, Christina C. Yarborough, Bahareh Ghane-Motlagh, Ashutosh Agarwal, Justin Achua, Edoardo Pozzi, Francesco Mesquita, Francis Petrella, David Miller & Ranjith Ramasamy

International Journal of Impotence Research (2024)


  • Medical research
  • Sexual dysfunction

The proliferation of microplastics (MPs) represents a burgeoning environmental and health crisis. Measuring less than 5 mm in diameter, MPs have infiltrated atmospheric, freshwater, and terrestrial ecosystems, penetrating commonplace consumables like seafood, sea salt, and bottled beverages. Their size and surface area render them susceptible to chemical interactions with physiological fluids and tissues, raising bioaccumulation and toxicity concerns. Human exposure to MPs occurs through ingestion, inhalation, and dermal contact. To date, there is no direct evidence identifying MPs in penile tissue. The objective of this study was to assess for potential aggregation of MPs in penile tissue. Tissue samples were extracted from six individuals who underwent surgery for a multi-component inflatable penile prosthesis (IPP). Samples were obtained from the corpora using Adson forceps before corporotomy dilation and device implantation and placed into cleaned glassware. A control sample was collected and stored in a McKesson specimen plastic container. The tissue fractions were analyzed using the Agilent 8700 Laser Direct Infrared (LDIR) Chemical Imaging System (Agilent Technologies). Moreover, the morphology of the particles was investigated with a Zeiss Merlin Scanning Electron Microscope (SEM), extending the detection range of LDIR to below 20 µm. MPs were identified via LDIR in 80% of the samples, ranging in size from 20 to 500 µm; smaller particles down to 2 µm were detected via SEM. Seven types of MPs were found in the penile tissue, with polyethylene terephthalate (47.8%) and polypropylene (34.7%) being the most prevalent. The detection of MPs in penile tissue raises inquiries about the ramifications of environmental pollutants on sexual health. Our research adds a key dimension to the discussion on man-made pollutants, focusing on MPs in the male reproductive system.
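As a small aside on how composition percentages such as those above are derived, the sketch below tallies per-particle polymer identifications; the counts are invented for illustration and are not the study's data.

```python
from collections import Counter

# Invented LDIR particle identifications (one entry per detected particle)
particles = (
    ["polyethylene terephthalate"] * 11
    + ["polypropylene"] * 8
    + ["polyethylene"] * 2
    + ["polystyrene"] * 2
)

counts = Counter(particles)
total = sum(counts.values())
for polymer, n in counts.most_common():
    print(f"{polymer}: {n}/{total} particles = {100 * n / total:.1f}%")
```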


Data availability

All relevant data to the current study that was generated and analyzed is available upon reasonable request from the corresponding author.


Author information

Authors and Affiliations

Desai Sethi Urology Institute, Miller School of Medicine, University of Miami, Miami, FL, USA

Jason Codrington, Alexandra Aponte Varnum, Joginder Bidhan, Kajal Khodamoradi, Aymara Evans, David Velasquez, Christina C. Yarborough, Ashutosh Agarwal, Edoardo Pozzi, Francesco Mesquita, Francis Petrella, David Miller & Ranjith Ramasamy

Institute of Coastal Environmental Chemistry, Department for Inorganic Environmental Chemistry, Helmholtz-Zentrum Hereon, Max-Planck-Str 1, 21502, Geesthacht, Germany

Lars Hildebrandt & Daniel Pröfrock

Institute of Membrane Research, Helmholtz-Zentrum Hereon, Max-Planck-Str 1, 21502, Geesthacht, Germany

Anke-Lisa Höhme & Martin Held

Dr. J.T. MacDonald Foundation BioNIUM, Miller School of Medicine, University of Miami, Miami, FL, USA

Bahareh Ghane-Motlagh

Department of Biomedical Engineering, University of Miami, Miami, FL, USA

Ashutosh Agarwal

University of Colorado, Anschutz Medical Campus, Aurora, CO, USA

Justin Achua

Vita-Salute San Raffaele University, Milan, Italy

Edoardo Pozzi

IRCCS Ospedale San Raffaele, Urology, Milan, Italy


Contributions

Jason Codrington—conceptualization, methodology, investigation, project administration, data curation, visualization, writing—original draft, editing. Alexandra Aponte Varnum—investigation, writing—original draft, editing, data curation, visualization. Lars Hildebrandt—investigation, writing—original draft, validation, resources. Daniel Pröfrock—investigation, editing, validation, resources. Joginder Bidhan—resources, writing—original draft. Kajal Khodamoradi—project administration, resources. Anke-Lisa Höhme—investigation, visualization. Martin Held—writing—original draft, editing. Aymara Evans—writing—original draft. David Velasquez—writing—original draft. Christina C. Yarborough—writing—original draft. Bahareh Ghane-Motlagh—investigation. Ashutosh Agarwal—investigation. Justin Achua—writing—original draft. Edoardo Pozzi—editing. Francesco Mesquita—editing. Francis Petrella—writing—review. David Miller—writing—review. Ranjith Ramasamy—conceptualization, methodology, project administration, resources, supervision, editing, funding acquisition

Corresponding author

Correspondence to Ranjith Ramasamy.

Ethics declarations

Competing interests

Dr. Edoardo Pozzi is currently an Associate Editor for the International Journal of Impotence Research.

Ethics approval

The study was approved by the Institutional Review Board of the University of Miami (Study # 20150740) and conducted following the Declaration of Helsinki. All patients provided written informed consent to participate in the study.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.


About this article

Cite this article

Codrington, J., Varnum, A.A., Hildebrandt, L. et al. Detection of microplastics in the human penis. Int J Impot Res (2024). https://doi.org/10.1038/s41443-024-00930-6

Received: 21 March 2024

Revised: 29 May 2024

Accepted: 04 June 2024

Published: 19 June 2024



Comprehensive analysis of single cell and bulk RNA sequencing reveals the heterogeneity of melanoma tumor microenvironment and predicts the response of immunotherapy

  • Original Research Paper
  • Published: 19 June 2024


Yuan Zhang, Cong Zhang, Jing He, Guichuan Lai, Wenlong Li, Haijiao Zeng, Xiaoni Zhong & Biao Xie


Background: Tumor microenvironment (TME) heterogeneity is an important factor affecting the treatment response to immune checkpoint inhibitors (ICI). However, the TME heterogeneity of melanoma has yet to be widely characterized.

Methods: We downloaded the single-cell sequencing data sets of two melanoma patients from the GEO database and used the “Scissor” algorithm and the “BayesPrism” algorithm to comprehensively analyze the characteristics of microenvironment cells based on single-cell and bulk RNA-seq data. A prediction model of immunotherapy response was constructed by machine learning and verified in three GEO cohorts.

Results: We identified seven cell types. In the Scissor+ subtype cell population, the top three cell types were T cells, B cells, and melanoma cells; in the Scissor− subtype, macrophages were more abundant. By quantifying the characteristics of the TME, we observed significant differences in B cells between responders and non-responders: the higher the proportion of B cells, the better the prognosis. At the same time, macrophages were significantly increased in the non-responder group. Finally, a nine-gene signature for predicting ICI response was constructed, and its predictive performance was superior in three external validation cohorts.

Conclusion: Our study revealed the heterogeneity of the melanoma TME and identified a new predictive biomarker, providing theoretical support and new insights for precise immunotherapy of melanoma patients.
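To make the modeling step concrete, here is a minimal, simulated-data sketch of training a gene-signature classifier and evaluating it by AUC; a support vector machine (SVM) and the area under the curve (AUC) both appear in the article's abbreviations below. The gene matrix, labels, and model choice here are illustrative assumptions, not the authors' actual pipeline.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)

# Simulated expression of a 9-gene signature for 200 patients, plus
# binary immunotherapy-response labels (1 = responder)
X = rng.normal(size=(200, 9))
y = (X[:, 0] + 0.5 * X[:, 1] + rng.normal(scale=1.0, size=200) > 0).astype(int)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0
)

# Linear SVM with probability estimates for ROC analysis
clf = SVC(kernel="linear", probability=True, random_state=0)
clf.fit(X_train, y_train)
auc = roc_auc_score(y_test, clf.predict_proba(X_test)[:, 1])
print(f"Held-out AUC: {auc:.2f}")
```

In the paper itself the signature was validated in three external GEO cohorts rather than a single held-out split.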


Data availability

No datasets were generated or analysed during the current study.

Abbreviations

ICI: Immune checkpoint inhibitors
PD-1: Programmed cell death-1
PD-L1: Programmed cell death-ligand 1
CTLA-4: Cytotoxic T lymphocyte-associated protein 4
TME: Tumor microenvironment
OS: Overall survival
scRNA-seq: Single-cell RNA sequencing
GEO: Gene Expression Omnibus
CR: Complete remission
PR: Partial remission
SD: Stable disease
PFS: Progression-free survival
PD: Progressive disease
UMAP: Uniform Manifold Approximation and Projection
DEGs: Differentially expressed genes
GO: Gene ontology
KEGG: Kyoto Encyclopedia of Genes and Genomes
MM: Module Membership
GS: Gene Significance
SVM: Support vector machine
AUC: Area under the curve
IRGs: Immunotherapy response-related genes
TLS: Tertiary lymphoid structure
pre-TCR: Pre-T-cell receptor



Acknowledgements

Here, we thank the GEO, BioProject, and EGA databases for providing relevant data to support our studies.

Funding

This research was funded by the National Youth Science Foundation Project (Grant No. 82204159), the Science and Technology Research Program of Chongqing Municipal Education Commission (Grant No. KJQN202300423), the Chongqing Maternal and Child Disease Prevention and Control and Public Health Research Center Open Project (Grant No. CQFYJB01001), and the Chongqing Postgraduate Scientific Research and Innovation Project in 2023 (No. CYS23355).

Author information

Authors and Affiliations

Department of Epidemiology and Health Statistics, School of Public Health, Chongqing Medical University, Yixue Road, Chongqing, 400016, China

Yuan Zhang, Cong Zhang, Jing He, Guichuan Lai, Wenlong Li, Haijiao Zeng, Xiaoni Zhong & Biao Xie

Research Center for Medicine and Social Development, Chongqing Medical University, Chongqing, China


Contributions

Conceptualization, Y.Z. and C.Z.; methodology, Y.Z.; software, Y.Z. and J.H.; validation, G.L.; formal analysis, W.L.; investigation, H.Z.; resources, X.Z.; data curation, J.H.; writing-original draft preparation, Y.Z.; writing-review and editing, Y.Z.; visualization, C.Z.; supervision, X.Z.; project administration, B.X.; funding acquisition, B.X.; All authors have read and agreed to the published version of the manuscript.

Corresponding authors

Correspondence to Xiaoni Zhong or Biao Xie.

Ethics declarations

Ethics approval and consent to participate

Not applicable.

Consent for publication

Competing interests

The authors declare no competing interests.

Additional information

Communicated by John Di Battista.

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Electronic supplementary material

Below is the link to the electronic supplementary material.

Supplementary Material 1

Supplementary Material 2

About this article

Zhang, Y., Zhang, C., He, J. et al. Comprehensive analysis of single cell and bulk RNA sequencing reveals the heterogeneity of melanoma tumor microenvironment and predicts the response of immunotherapy. Inflamm. Res. (2024). https://doi.org/10.1007/s00011-024-01905-5

Received: 26 March 2024

Revised: 07 June 2024

Accepted: 09 June 2024

Published: 19 June 2024

Keywords

  • Immune checkpoint inhibitor
  • Deconvolution
  • Predicting biomarkers

COMMENTS

  1. Factor Analysis

    Factor Analysis Steps. Here are the general steps involved in conducting a factor analysis: 1. Define the Research Objective: Clearly specify the purpose of the factor analysis. Determine what you aim to achieve or understand through the analysis. 2. Data Collection: Gather the data on the variables of interest.

  2. Factor Analysis Guide with an Example

The first methodology choice for factor analysis is the mathematical approach for extracting the factors from your dataset. The most common choices are maximum likelihood (ML), principal axis factoring (PAF), and principal components analysis (PCA). You should use either ML or PAF most of the time; a worked extraction-and-rotation sketch follows at the end of this list.

  3. Factor Analysis: a means for theory and instrument development in

    Factor analysis methods can be incredibly useful tools for researchers attempting to establish high quality measures of those constructs not directly observed and captured by observation. Specifically, the factor solution derived from an Exploratory Factor Analysis provides a snapshot of the statistical relationships of the key behaviors ...

  4. A Practical Introduction to Factor Analysis

Overview. Factor analysis is a method for modeling observed variables and their covariance structure in terms of unobserved variables (i.e., factors). There are two types of factor analyses, exploratory and confirmatory. Exploratory factor analysis (EFA) is a method to explore the underlying structure of a set of observed variables, and is a ...

  5. Lesson 12: Factor Analysis

Overview. Factor Analysis is a method for modeling observed variables, and their covariance structure, in terms of a smaller number of underlying unobservable (latent) “factors.” The factors typically are viewed as broad concepts or ideas that may describe an observed phenomenon. For example, a basic desire of obtaining a certain social ...

  6. Factor Analysis and How It Simplifies Research Findings

Factor analysis isn't a single technique, but a family of statistical methods that can be used to identify the latent factors driving observable variables. Factor analysis is commonly used in market research, as well as other disciplines like technology, medicine, sociology, field biology, education, psychology and many more.

  7. Exploratory Factor Analysis: A Guide to Best Practice

    Exploratory factor analysis (EFA) is one of a family of multivariate statistical methods that attempts to identify the smallest number of hypothetical constructs (also known as factors, dimensions, latent variables, synthetic variables, or internal attributes) that can parsimoniously explain the covariation observed among a set of measured variables (also called observed variables, manifest ...

  8. Factor Analysis: Definition, Types, and Examples

    Factor analysis is a commonly used data reduction statistical technique within the context of market research. The goal of factor analysis is to discover relationships between variables within a dataset by looking at correlations. ... but it's important to understand that there is different math going on behind the scenes for each method. Types ...

  9. PDF What is factor analysis and how does it simplify research findings?

Types of factor analysis. There are two basic forms of factor analysis, exploratory and confirmatory. Here's how they are used to add value to your research process. Confirmatory factor analysis: In this type of analysis, the researcher starts out with a hypothesis about their data that they are looking to prove or disprove. Factor analysis will ...

  10. Introduction to Exploratory Factor Analysis: An Applied Approach

    The most substantive part of the chapter focuses on six steps of EFA. More specifically, we consider variable (or indicator) selection (Step 1), computing the variance-covariance matrix (Step 2), factor-extraction methods (Step 3), factor-retention procedures (Step 4), factor-rotation methods (Step 5), and interpretation (Step 6).

  11. PDF A Beginner's Guide to Factor Analysis: Focusing on Exploratory Factor

The formula for deriving the communalities is $h_j^2 = \sum_i a_{ji}^2$, where $a_{ji}$ equals the loadings of variable $j$ on the extracted factors. Using the factor loadings in Table 1, we then calculate the communalities using the aforementioned formula, e.g. $h^2 = 0.78$. The values in the table represent the factor loadings and how much each variable contributes to the factors (Figure 2).

  12. A Practical Introduction to Factor Analysis: Exploratory Factor Analysis

    Purpose. This seminar is the first part of a two-part seminar that introduces central concepts in factor analysis. Part 1 focuses on exploratory factor analysis (EFA). Although the implementation is in SPSS, the ideas carry over to any software program. Part 2 introduces confirmatory factor analysis (CFA).

  13. Sage Research Methods

    Describes various commonly used methods of initial factoring and factor rotation. In addition to a full discussion of exploratory factor analysis, confirmatory factor analysis and various methods of constructing factor scales are also presented

  14. Factor Analysis: Types and Applications

Pros and Cons of Factor Analysis. Having learned about factor analysis in detail, let us now look closely into the pros and cons of this statistical method. Pros of Factor Analysis: Measurable Attributes. The first and foremost pro of FA is that it is open to all measurable attributes.

  15. Sage Research Methods Foundations

Confirmatory factor analysis can be seen as a special case of structural equation modeling. This entry presents a short overview of the concept and history of confirmatory factor analysis and explains the basic mathematical fundamentals. Then, it shows options for assessing model fit and discusses multiple group comparisons.

  16. An Introduction to Factor Analysis: Reducing Variables

    Factor analysis is a sophisticated statistical method aimed at reducing a large number of variables into a smaller set of factors. This technique is valuable for extracting the maximum common variance from all variables, transforming them into a single score for further analysis. As a part of the general linear model (GLM), factor analysis is ...

  17. Factor Analysis 101: The Basics

    Factor analysis is a powerful data reduction technique that enables researchers to investigate concepts that cannot easily be measured directly. By boiling down a large number of variables into a handful of comprehensible underlying factors, factor analysis results in easy-to-understand, actionable data.

  18. PDF Factor Analysis

Factor Analysis. Factor analysis is used to uncover the latent structure of a set of variables. It reduces attribute space from a large number of variables to a smaller number of factors and as such is a non-dependent procedure. Factor analysis could be used for any of the following purposes: 1. ...

  19. Sage Research Methods: Business

    This guide further explains various parts and parcels of factor analysis: (1) the process of factor loading on a specific survey case, (2) the identification process for an appropriate number of factors and optimal combination of factors, depending on the specific research design and goals, and (3) an explanation of dimensions, their reduction ...

  20. Factor Analysis

There are different methods that we use in factor analysis on a data set: 1. Principal component analysis. It is the most common method that researchers use. It extracts the maximum variance and puts it into the first factor. Subsequently, it removes the variance explained by the first factor and extracts the second factor.

  21. PDF Factor Rotations in Factor Analyses.

practice of rotation in factor analysis, it is strongly recommended to try several sizes for the subspace of the retained factors in order to assess the robustness of the interpretation of the rotation. In: Lewis-Beck M., Bryman A., Futing T. (Eds.) (2003). Encyclopedia of Social Sciences Research Methods. Thousand Oaks (CA): Sage.

  22. (PDF) Overview of Factor Analysis

Chapter 1. Theoretical Introduction. Factor analysis is a collection of methods used to examine how underlying constructs influence the responses on a number of measured variables ...

  23. Bayesian Statistical Inference for Factor Analysis Models with ...

    Clustered data are a complex and frequently used type of data. Traditional factor analysis methods are effective for non-clustered data, but they do not adequately capture correlations between multiple observed individuals or variables in clustered data. This paper proposes a Bayesian approach utilizing MCMC and Gibbs sampling algorithms to accurately estimate parameters of interest within the ...

  24. A Beginner's Guide to Types of Research

    Research Methods. The first step of a research methodology is to identify a focused research topic, which is the question you seek to answer. By setting clear boundaries on the scope of your research, you can concentrate on specific aspects of a problem without being overwhelmed by information. This will produce more accurate findings.

  25. The State Of Workplace Communication In 2024

    When it comes to preferred methods of communication, many workers prefer old-fashioned tools. Email is the most popular tool, with 18% of total respondents marking it as their preference (25% of ...

  26. Identifying therapeutic target genes for migraine by systematic

    The results of phenome-wide research showed that HMGCR was highly correlated with low-density lipoprotein, and TGFB3 was primarily associated with insulin-like growth factor 1 levels. This study utilized MR and colocalization analysis to identify 21 potential drug targets for migraine, two of which were significant in both blood and brain.

  27. Associations between deep venous thrombosis and thyroid diseases: a two

    Statistical analysis. Multiple MR methods were used to infer causal relationships between thyroid diseases and DVT, including the inverse variance weighted (IVW), weighted median, and MR-Egger tests, after harmonizing the SNPs across the GWASs of exposures and outcomes. The main analysis was conducted using the IVW method.

  28. Detection of microplastics in the human penis

    Seven types of MPs were found in the penile tissue, with polyethylene terephthalate (47.8%) and polypropylene (34.7%) being the most prevalent. ... Our research adds a key dimension to the ...

  29. Sage Research Methods Foundations

Discover method in the Methods Map. Factor Analysis. Methods: Factor analysis, Exploratory factor analysis, Covariance matrix; Length: 5k+ words. DOI: https://doi.org/10.4135 ...

  30. Comprehensive analysis of single cell and bulk RNA ...

Background: Tumor microenvironment (TME) heterogeneity is an important factor affecting the treatment response of immune checkpoint inhibitors (ICI). However, the TME heterogeneity of melanoma has yet to be widely characterized. Methods: We downloaded the single-cell sequencing data sets of two melanoma patients from the GEO database, and used the “Scissor” algorithm and the “BayesPrism ...
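Several of the factor-analysis resources above describe the same core EFA workflow: choose an extraction method (ML, PAF, or PCA), retain factors by eigenvalue or scree criteria, and rotate for interpretability (this is the sketch referenced in entry 2). The example below is a minimal illustration on simulated data using the third-party Python package factor_analyzer; the data, thresholds, and settings are illustrative assumptions, not a prescription.

```python
import numpy as np
from factor_analyzer import FactorAnalyzer

rng = np.random.default_rng(1)

# Simulate 300 observations of 6 variables driven by 2 latent factors
latent = rng.normal(size=(300, 2))
true_loadings = np.array([
    [0.8, 0.0], [0.7, 0.1], [0.9, 0.0],  # variables loading on factor 1
    [0.0, 0.8], [0.1, 0.7], [0.0, 0.9],  # variables loading on factor 2
])
X = latent @ true_loadings.T + rng.normal(scale=0.5, size=(300, 6))

# Step 1: eigenvalues of the correlation matrix guide factor retention
eigenvalues = np.linalg.eigvalsh(np.corrcoef(X, rowvar=False))[::-1]
n_factors = int(np.sum(eigenvalues > 1.0))  # Kaiser criterion: eigenvalue > 1
print("Eigenvalues:", np.round(eigenvalues, 2), "-> retain", n_factors)

# Step 2: maximum-likelihood extraction with a varimax rotation
fa = FactorAnalyzer(n_factors=n_factors, method="ml", rotation="varimax")
fa.fit(X)
print("Rotated loadings:\n", np.round(fa.loadings_, 2))
print("Communalities:", np.round(fa.get_communalities(), 2))
```

On this simulated data the Kaiser rule retains two factors and the varimax-rotated loadings recover the planted structure; with real data one would also check sampling adequacy and inspect a scree plot, as the guides above recommend.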