
Correlational Research

Learning Objectives

  • Explain correlational research, including what a correlation coefficient tells us about the relationship between variables

One of the primary methods used to study abnormal behavior is the correlational method. Correlation means that there is a relationship between two or more variables (such as between the variables of negative thinking and depressive symptoms), but this relationship does not necessarily imply cause and effect. When two variables are correlated, it simply means that as one variable changes, so does the other. We can measure correlation by calculating a statistic known as a correlation coefficient. A correlation coefficient is a number from negative one to positive one that indicates the strength and direction of the relationship between variables. The association between two variables can be summarized statistically using the correlation coefficient (abbreviated as r).

The number portion of the correlation coefficient indicates the strength of the relationship. The closer the number is to one (be it negative or positive), the more strongly related the variables are, and the more predictable changes in one variable will be as the other variable changes. The closer the number is to zero, the weaker the relationship, and the less predictable the relationship between the variables becomes. For instance, a correlation coefficient of 0.9 indicates a far stronger relationship than a correlation coefficient of 0.3. If the variables are not related to one another at all, the correlation coefficient is zero. The example above about negative thinking and depressive symptoms is an example of two variables that we might expect to have a relationship to each other. When higher values in one variable (negative thinking) are associated with higher values in the other variable (depressive symptoms), there is a positive correlation between the variables.
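To make this concrete, here is a minimal sketch in R (the numbers are invented for illustration, not data from any real study) showing how a correlation coefficient is computed from paired scores:

# Hypothetical scores: higher negative thinking paired with more depressive symptoms
negative_thinking <- c(2, 4, 5, 7, 9)
depressive_symptoms <- c(10, 14, 15, 19, 22)
cor(negative_thinking, depressive_symptoms)  # close to +1: a strong positive correlation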

The sign—positive or negative—of the correlation coefficient indicates the direction of the relationship.  Positive correlations carry positive signs; negative correlations carry negative signs.  A positive correlation means that the variables move in the same direction. Put another way, it means that as one variable increases so does the other, and conversely, when one variable decreases so does the other. A negative correlation means that the variables move in opposite directions. If two variables are negatively correlated, a decrease in one variable is associated with an increase in the other and vice versa.

Another example of a positive correlation is the relationship between depression and disturbance in normal sleep patterns. One might expect, then, that scores on a measure of depression would be positively correlated with scores on a measure of sleep disturbances.

One might expect a negative correlation to exist between depression and self-esteem. The more depressed people are, the lower their scores are on the Rosenberg Self-Esteem Scale (RSES), a self-esteem measure widely used in social-science research. Keep in mind that a negative correlation is not the same as no correlation. For example, we would probably find no correlation between depression and someone's height.

In correlational research, scientists passively observe and measure phenomena. Here, we do not intervene and change behavior, as we do in experiments. In correlational research, we identify patterns of relationships, but we usually cannot infer what causes what. Importantly, a correlation coefficient describes the relationship between just two variables at a time.

As mentioned earlier, correlations have predictive value. So, what if you wanted to test whether spending on others is related to happiness, but you don’t have $20 to give to each participant? You could use a correlational design—which is exactly what Professor Dunn did, too. She asked people how much of their income they spent on others or donated to charity, and later she asked them how happy they were. Do you think these two variables were related? Yes, they were! The more money people reported spending on others, the happier they were.

More Details about the Correlation

To find out how well two variables correspond, we can plot the relationship between the two scores on what is known as a scatterplot (Figure 1). In the scatterplot, each dot represents a data point. (In this case it’s individuals, but it could be some other unit.) Importantly, each dot provides us with two pieces of information—in this case, information about how good the person rated the past month ( x -axis) and how happy the person felt in the past month ( y -axis). Which variable is plotted on which axis does not matter.

Figure 1. Scatterplot of the association between happiness and ratings of the past month, a positive correlation (r = .81).

For the example above, the direction of the association is positive. This means that people who perceived the past month as being good reported feeling happier, whereas people who perceived the month as being bad reported feeling less happy.

When a correlation is positive, the dots in a scatterplot form a pattern that extends from the bottom left to the upper right (just as they do in Figure 1). The r value for a positive correlation is indicated by a positive number (although the positive sign is usually omitted). Here, the r value is .81.
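If you want to see this pattern for yourself, here is a small sketch in R with invented ratings (not the actual data behind Figure 1):

month_rating <- c(1, 3, 4, 5, 6, 7, 8, 9)  # how good the past month was
happiness <- c(2, 3, 5, 4, 6, 8, 7, 9)     # how happy the person felt
plot(month_rating, happiness, xlab = "Rating of past month", ylab = "Happiness")
cor(month_rating, happiness)  # a positive r; the dots run from lower left to upper right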

Figure 2 shows a  negative correlation,   the association between the average height of males in a country ( y -axis) and the pathogen prevalence, or commonness of disease, of that country ( x -axis). In this scatterplot, each dot represents a country. Notice how the dots extend from the top left to the bottom right. What does this mean in real-world terms? It means that people are shorter in parts of the world where there is more disease. The  r  value for a negative correlation is indicated by a negative number—that is, it has a minus (−) sign in front of it. Here, it is −0.83.

Figure 2. Scatterplot showing the association between average male height and pathogen prevalence, a negative correlation (r = −.83).

The strength of a correlation has to do with how well the two variables align. Recall that in Professor Dunn's correlational study, spending on others positively correlated with happiness: the more money people reported spending on others, the happier they reported being. At this point, you may be thinking to yourself, "I know a very generous person who gave away lots of money to other people but is miserable!" Or maybe you know of a very stingy person who is happy as can be. Yes, there might be exceptions. If an association has many exceptions, it is considered a weak correlation. If an association has few or no exceptions, it is considered a strong correlation. A strong correlation is one in which the two variables always, or almost always, go together. In the example of happiness and how good the month has been, the association is strong. The stronger a correlation is, the tighter the dots in the scatterplot will be arranged along a sloped line. [1]

Problems with Correlation

If generosity and happiness are positively correlated, should we conclude that being generous causes happiness? Similarly, if height and pathogen prevalence are negatively correlated, should we conclude that disease causes shortness? From a correlation alone, we can’t be certain. For example, in the first case it may be that happiness causes generosity, or that generosity causes happiness. Or, a third variable might cause both happiness  and  generosity, creating the illusion of a direct link between the two. For example, wealth could be the third variable that causes both greater happiness and greater generosity. This is why correlation does not mean causation—an often repeated phrase among psychologists. [2]

Correlation Does Not Indicate Causation

Correlational research is useful because it allows us to discover the strength and direction of relationships that exist between two variables. However, correlation is limited because establishing the existence of a relationship tells us little about cause and effect. While variables are sometimes correlated because one does cause the other, it could also be that some other factor, a confounding variable, is actually causing the systematic movement in our variables of interest. In the depression and negative thinking example mentioned earlier, stress is a confounding variable that could account for the relationship between the two variables.

Even when we cannot point to clear confounding variables, we should not assume that a correlation between two variables implies that one variable causes changes in another. This can be frustrating when a cause-and-effect relationship seems clear and intuitive. Think back to our example about the relationship between depression and disturbance in normal sleep patterns. It seems reasonable to assume that sleep disturbance might cause a higher score on a measure of depression, just as a high degree of depression might cause more disturbed sleep patterns, but if we were limited to correlational research, we would be overstepping our bounds by making this assumption. Both depression and sleep disturbance could be due to an underlying physiological disorder or any other third variable that you have not measured.

Unfortunately, people mistakenly draw causal conclusions from correlations all the time. While correlational research is invaluable in identifying relationships among variables, a major limitation is the inability to establish causality. The correlational method does not involve manipulation of the variables of interest. In the previous example, the experimenter does not manipulate people's depressive symptoms or sleep patterns. Psychologists want to make statements about cause and effect, but the only way to do that is to conduct an experiment. The next section describes how investigators use experimental methods, in which the experimenter manipulates one or more variables of interest and observes their effects on other variables or outcomes under controlled conditions.

In this video, we discuss one of the best methods psychologists have for predicting behaviors: correlation. But does that mean that a behavior is absolutely going to happen? Let’s find out!

You can view the transcript for "#5 Correlation vs. Causation – Psy 101" here.

Think It Over

Consider why correlational research is often used in the study of abnormal behavior. If correlational designs do not demonstrate causation, why do researchers make causal claims regarding their results? Are there instances when correlational results could demonstrate causation?

cause-and-effect relationship:  changes in one variable cause the changes in the other variable; can be determined only through an experimental research design

confirmation bias:  tendency to ignore evidence that disproves ideas or beliefs

confounding variable:  unanticipated outside factor that affects both variables of interest; often gives the false impression that changes in one variable cause changes in the other variable, when, in actuality, the outside factor causes changes in both variables

correlation: the relationship between two or more variables; when two variables are correlated, one variable changes as the other does

correlation coefficient:  number from -1 to +1, indicating the strength and direction of the relationship between variables, and usually represented by r

negative correlation:  two variables change in different directions, with one becoming larger as the other becomes smaller; a negative correlation is not the same thing as no correlation

positive correlation:  two variables change in the same direction, both becoming either larger or smaller

CC Licensed Content, Shared Previously

  • Correlational Research . Authored by : Sonja Ann Miller for Lumen Learning.  Provided by : Lumen Learning.  License :  CC BY: Attribution
  • Analyzing Findings.  Authored by : OpenStax College.  Located at :  http://cnx.org/contents/[email protected]:mfArybye@7/Analyzing-Findings .  License :  CC BY: Attribution .  License Terms : Download for free at http://cnx.org/contents/[email protected]
  • Research Designs.  Authored by : Christie Napa Scollon .  Provided by : Singapore Management University.  Located at :  https://nobaproject.com/modules/research-designs .  Project : The Noba Project.  License :  CC BY-NC-SA: Attribution-NonCommercial-ShareAlike

All Rights Reserved Content

  • Correlation vs. Causality: Freakonomics Movie.  Located at :  https://www.youtube.com/watch?v=lbODqslc4Tg .  License :  Other .  License Terms : Standard YouTube License
  • Scollon, C. N. (2020). Research designs. In R. Biswas-Diener & E. Diener (Eds.), Noba textbook series: Psychology. Champaign, IL: DEF publishers. Retrieved from http://noba.to/acxb2thy

Correlational Research Copyright © by Meredith Palm is licensed under a Creative Commons Attribution 4.0 International License , except where otherwise noted.



Correlational Research | When & How to Use

Published on July 7, 2021 by Pritha Bhandari. Revised on June 22, 2023.

A correlational research design investigates relationships between variables without the researcher controlling or manipulating any of them.

A correlation reflects the strength and/or direction of the relationship between two (or more) variables. The direction of a correlation can be either positive or negative.

Positive correlation: both variables change in the same direction (as height increases, weight also increases).
Negative correlation: the variables change in opposite directions (as coffee consumption increases, tiredness decreases).
Zero correlation: there is no relationship between the variables (coffee consumption is not correlated with height).


Correlational and experimental research both use quantitative methods to investigate relationships between variables. But there are important differences in data collection methods and the types of conclusions you can draw.

Purpose: correlational research is used to test the strength of association between variables; experimental research is used to test cause-and-effect relationships between variables.
Variables: in correlational research, variables are only observed, with no manipulation or intervention by researchers; in experimental research, an independent variable is manipulated and a dependent variable is observed.
Control: correlational research uses limited control, so other variables may play a role in the relationship; in experimental research, extraneous variables are controlled so that they can't impact your variables of interest.
Validity: correlational research has high external validity (you can confidently generalize your conclusions to other populations or settings); experimental research has high internal validity (you can confidently draw conclusions about causation).


Correlational research is ideal for gathering data quickly from natural settings. That helps you generalize your findings to real-life situations in an externally valid way.

There are a few situations where correlational research is an appropriate choice.

To investigate non-causal relationships

You want to find out if there is an association between two variables, but you don’t expect to find a causal relationship between them.

Correlational research can provide insights into complex real-world relationships, helping researchers develop theories and make predictions.

To explore causal relationships between variables

You think there is a causal relationship between two variables, but it is impractical, unethical, or too costly to conduct experimental research that manipulates one of the variables.

Correlational research can provide initial indications or additional support for theories about causal relationships.

To test new measurement tools

You have developed a new instrument for measuring your variable, and you need to test its reliability or validity .

Correlational research can be used to assess whether a tool consistently or accurately captures the concept it aims to measure.

There are many different methods you can use in correlational research. In the social and behavioral sciences, the most common data collection methods for this type of research include surveys, observations , and secondary data.

It’s important to carefully choose and plan your methods to ensure the reliability and validity of your results. You should carefully select a representative sample so that your data reflects the population you’re interested in without research bias .

Surveys

In survey research, you can use questionnaires to measure your variables of interest. You can conduct surveys online, by mail, by phone, or in person.

Surveys are a quick, flexible way to collect standardized data from many participants, but it’s important to ensure that your questions are worded in an unbiased way and capture relevant insights.

Naturalistic observation

Naturalistic observation is a type of field research where you gather data about a behavior or phenomenon in its natural environment.

This method often involves recording, counting, describing, and categorizing actions and events. Naturalistic observation can include both qualitative and quantitative elements, but to assess correlation, you collect data that can be analyzed quantitatively (e.g., frequencies, durations, scales, and amounts).

Naturalistic observation lets you easily generalize your results to real world contexts, and you can study experiences that aren’t replicable in lab settings. But data analysis can be time-consuming and unpredictable, and researcher bias may skew the interpretations.

Secondary data

Instead of collecting original data, you can also use data that has already been collected for a different purpose, such as official records, polls, or previous studies.

Using secondary data is inexpensive and fast, because data collection is complete. However, the data may be unreliable, incomplete or not entirely relevant, and you have no control over the reliability or validity of the data collection procedures.

After collecting data, you can statistically analyze the relationship between variables using correlation or regression analyses, or both. You can also visualize the relationships between variables with a scatterplot.

Different types of correlation coefficients and regression analyses are appropriate for your data based on their levels of measurement and distributions .

Correlation analysis

Using a correlation analysis, you can summarize the relationship between variables into a correlation coefficient : a single number that describes the strength and direction of the relationship between variables. With this number, you’ll quantify the degree of the relationship between variables.

The Pearson product-moment correlation coefficient , also known as Pearson’s r , is commonly used for assessing a linear relationship between two quantitative variables.

Correlation coefficients are usually found for two variables at a time, but you can use a multiple correlation coefficient for three or more variables.
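As a minimal sketch in base R (the numbers are made up for illustration), Pearson's r can be computed with cor() and tested with cor.test():

x <- c(1.2, 2.3, 2.9, 4.1, 5.0, 6.2)
y <- c(2.0, 2.8, 3.4, 4.5, 5.1, 6.8)
cor(x, y)                           # Pearson's r by default
cor.test(x, y, method = "pearson")  # r plus a confidence interval and p-value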

Regression analysis

With a regression analysis , you can predict how much a change in one variable will be associated with a change in the other variable. The result is a regression equation that describes the line on a graph of your variables.

You can use this equation to predict the value of one variable based on the given value(s) of the other variable(s). It’s best to perform a regression analysis after testing for a correlation between your variables.
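Here is a sketch of that workflow in R; the spending and happiness numbers are hypothetical, chosen only to illustrate the calls:

spending <- c(50, 120, 200, 310, 400, 480)    # hypothetical dollars spent on others
happiness <- c(4.1, 4.8, 5.2, 6.0, 6.4, 7.1)  # hypothetical happiness ratings
cor(spending, happiness)                      # first, test for a correlation
model <- lm(happiness ~ spending)             # then fit the regression equation
coef(model)                                   # intercept and slope of the line
predict(model, data.frame(spending = 250))    # predicted happiness at $250 of spending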



It’s important to remember that correlation does not imply causation . Just because you find a correlation between two things doesn’t mean you can conclude one of them causes the other for a few reasons.

Directionality problem

If two variables are correlated, it could be because one of them is a cause and the other is an effect. But the correlational research design doesn’t allow you to infer which is which. To err on the side of caution, researchers don’t conclude causality from correlational studies.

Third variable problem

A confounding variable is a third variable that influences other variables to make them seem causally related even though they are not. Instead, there are separate causal links between the confounder and each variable.

In correlational research, there’s limited or no researcher control over extraneous variables . Even if you statistically control for some potential confounders, there may still be other hidden variables that disguise the relationship between your study variables.

Although a correlational study can’t demonstrate causation on its own, it can help you develop a causal hypothesis that’s tested in controlled experiments.

Frequently asked questions about correlational research

A correlation reflects the strength and/or direction of the association between two or more variables.

  • A positive correlation means that both variables change in the same direction.
  • A negative correlation means that the variables change in opposite directions.
  • A zero correlation means there’s no relationship between the variables.

A correlational research design investigates relationships between two variables (or more) without the researcher controlling or manipulating any of them. It’s a non-experimental type of quantitative research .

Controlled experiments establish causality, whereas correlational studies only show associations between variables.

  • In an experimental design , you manipulate an independent variable and measure its effect on a dependent variable. Other variables are controlled so they can’t impact the results.
  • In a correlational design , you measure variables without manipulating any of them. You can test whether your variables change together, but you can’t be sure that one variable caused a change in another.

In general, correlational research is high in external validity while experimental research is high in internal validity .

A correlation is usually tested for two variables at a time, but you can test correlations between three or more variables.

A correlation coefficient is a single number that describes the strength and direction of the relationship between your variables.

Different types of correlation coefficients might be appropriate for your data based on their levels of measurement and distributions . The Pearson product-moment correlation coefficient (Pearson’s r ) is commonly used to assess a linear relationship between two quantitative variables.



Non-Experimental Research

29 Correlational Research

Learning Objectives

  • Define correlational research and give several examples.
  • Explain why a researcher might choose to conduct correlational research rather than experimental research or another type of non-experimental research.
  • Interpret the strength and direction of different correlation coefficients.
  • Explain why correlation does not imply causation.

What Is Correlational Research?

Correlational research is a type of non-experimental research in which the researcher measures two variables (binary or continuous) and assesses the statistical relationship (i.e., the correlation) between them with little or no effort to control extraneous variables. There are many reasons that researchers interested in statistical relationships between variables would choose to conduct a correlational study rather than an experiment. The first is that they do not believe that the statistical relationship is a causal one or are not interested in causal relationships. Recall that two goals of science are to describe and to predict, and the correlational research strategy allows researchers to achieve both of these goals. Specifically, this strategy can be used to describe the strength and direction of the relationship between two variables, and if there is a relationship between the variables, then researchers can use scores on one variable to predict scores on the other (using a statistical technique called regression, which is discussed further in the section on Complex Correlation in this chapter).

Another reason that researchers would choose to use a correlational study rather than an experiment is that the statistical relationship of interest is thought to be causal, but the researcher cannot manipulate the independent variable because it is impossible, impractical, or unethical. For example, while a researcher might be interested in the relationship between the frequency people use cannabis and their memory abilities, they cannot ethically manipulate the frequency that people use cannabis. As such, they must rely on the correlational research strategy: they must simply measure the frequency that people use cannabis, measure their memory abilities using a standardized test of memory, and then determine whether the frequency people use cannabis is statistically related to memory test performance.

Correlation is also used to establish the reliability and validity of measurements. For example, a researcher might evaluate the validity of a brief extraversion test by administering it to a large group of participants along with a longer extraversion test that has already been shown to be valid. This researcher might then check to see whether participants' scores on the brief test are strongly correlated with their scores on the longer one. Neither test score is thought to cause the other, so there is no independent variable to manipulate. In fact, the terms independent variable and dependent variable do not apply to this kind of research.

Another strength of correlational research is that it is often higher in external validity than experimental research. Recall there is typically a trade-off between internal validity and external validity. As greater controls are added to experiments, internal validity is increased but often at the expense of external validity as artificial conditions are introduced that do not exist in reality. In contrast, correlational studies typically have low internal validity because nothing is manipulated or controlled but they often have high external validity. Since nothing is manipulated or controlled by the experimenter the results are more likely to reflect relationships that exist in the real world.

Finally, extending upon this trade-off between internal and external validity, correlational research can help to provide converging evidence for a theory. If a theory is supported by a true experiment that is high in internal validity as well as by a correlational study that is high in external validity then the researchers can have more confidence in the validity of their theory. As a concrete example, correlational studies establishing that there is a relationship between watching violent television and aggressive behavior have been complemented by experimental studies confirming that the relationship is a causal one (Bushman & Huesmann, 2001) [1] .

Does Correlational Research Always Involve Quantitative Variables?

A common misconception among beginning researchers is that correlational research must involve two quantitative variables, such as scores on two extraversion tests or the number of daily hassles and number of symptoms people have experienced. However, the defining feature of correlational research is that the two variables are measured—neither one is manipulated—and this is true regardless of whether the variables are quantitative or categorical. Imagine, for example, that a researcher administers the Rosenberg Self-Esteem Scale to 50 American college students and 50 Japanese college students. Although this “feels” like a between-subjects experiment, it is a correlational study because the researcher did not manipulate the students’ nationalities. The same is true of the study by Cacioppo and Petty comparing college faculty and factory workers in terms of their need for cognition. It is a correlational study because the researchers did not manipulate the participants’ occupations.

Figure 6.2 shows data from a hypothetical study on the relationship between whether people make a daily list of things to do (a “to-do list”) and stress. Notice that it is unclear whether this is an experiment or a correlational study because it is unclear whether the independent variable was manipulated. If the researcher randomly assigned some participants to make daily to-do lists and others not to, then it is an experiment. If the researcher simply asked participants whether they made daily to-do lists, then it is a correlational study. The distinction is important because if the study was an experiment, then it could be concluded that making the daily to-do lists reduced participants’ stress. But if it was a correlational study, it could only be concluded that these variables are statistically related. Perhaps being stressed has a negative effect on people’s ability to plan ahead (the directionality problem). Or perhaps people who are more conscientious are more likely to make to-do lists and less likely to be stressed (the third-variable problem). The crucial point is that what defines a study as experimental or correlational is not the variables being studied, nor whether the variables are quantitative or categorical, nor the type of graph or statistics used to analyze the data. What defines a study is how the study is conducted.

Figure 6.2 Hypothetical Data on the Relationship Between Making Daily To-Do Lists and Stress

Data Collection in Correlational Research

Again, the defining feature of correlational research is that neither variable is manipulated. It does not matter how or where the variables are measured. A researcher could have participants come to a laboratory to complete a computerized backward digit span task and a computerized risky decision-making task and then assess the relationship between participants’ scores on the two tasks. Or a researcher could go to a shopping mall to ask people about their attitudes toward the environment and their shopping habits and then assess the relationship between these two variables. Both of these studies would be correlational because no independent variable is manipulated. 

Correlations Between Quantitative Variables

Correlations between quantitative variables are often presented using scatterplots . Figure 6.3 shows some hypothetical data on the relationship between the amount of stress people are under and the number of physical symptoms they have. Each point in the scatterplot represents one person’s score on both variables. For example, the circled point in Figure 6.3 represents a person whose stress score was 10 and who had three physical symptoms. Taking all the points into account, one can see that people under more stress tend to have more physical symptoms. This is a good example of a positive relationship , in which higher scores on one variable tend to be associated with higher scores on the other. In other words, they move in the same direction, either both up or both down. A negative relationship is one in which higher scores on one variable tend to be associated with lower scores on the other. In other words, they move in opposite directions. There is a negative relationship between stress and immune system functioning, for example, because higher stress is associated with lower immune system functioning.

Figure 6.3 Scatterplot Showing a Hypothetical Positive Relationship Between Stress and Number of Physical Symptoms

The strength of a correlation between quantitative variables is typically measured using a statistic called  Pearson’s Correlation Coefficient (or Pearson's  r ) . As Figure 6.4 shows, Pearson’s r ranges from −1.00 (the strongest possible negative relationship) to +1.00 (the strongest possible positive relationship). A value of 0 means there is no relationship between the two variables. When Pearson’s  r  is 0, the points on a scatterplot form a shapeless “cloud.” As its value moves toward −1.00 or +1.00, the points come closer and closer to falling on a single straight line. Correlation coefficients near ±.10 are considered small, values near ± .30 are considered medium, and values near ±.50 are considered large. Notice that the sign of Pearson’s  r  is unrelated to its strength. Pearson’s  r  values of +.30 and −.30, for example, are equally strong; it is just that one represents a moderate positive relationship and the other a moderate negative relationship. With the exception of reliability coefficients, most correlations that we find in Psychology are small or moderate in size. The website http://rpsychologist.com/d3/correlation/ , created by Kristoffer Magnusson, provides an excellent interactive visualization of correlations that permits you to adjust the strength and direction of a correlation while witnessing the corresponding changes to the scatterplot.

Figure 6.4 Range of Pearson’s r, From −1.00 (Strongest Possible Negative Relationship), Through 0 (No Relationship), to +1.00 (Strongest Possible Positive Relationship)
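One way to build intuition for these values is to simulate data with a chosen correlation and inspect the resulting cloud. A sketch using the MASS package (which ships with R); the target correlation of .5 is an arbitrary choice:

library(MASS)
set.seed(4)
xy <- mvrnorm(200, mu = c(0, 0), Sigma = matrix(c(1, 0.5, 0.5, 1), nrow = 2))
cor(xy[, 1], xy[, 2])  # close to the 0.5 we asked for
plot(xy)               # a moderately tight, upward-sloping cloud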

There are two common situations in which the value of Pearson’s  r  can be misleading. Pearson’s  r  is a good measure only for linear relationships, in which the points are best approximated by a straight line. It is not a good measure for nonlinear relationships, in which the points are better approximated by a curved line. Figure 6.5, for example, shows a hypothetical relationship between the amount of sleep people get per night and their level of depression. In this example, the line that best approximates the points is a curve—a kind of upside-down “U”—because people who get about eight hours of sleep tend to be the least depressed. Those who get too little sleep and those who get too much sleep tend to be more depressed. Even though Figure 6.5 shows a fairly strong relationship between depression and sleep, Pearson’s  r  would be close to zero because the points in the scatterplot are not well fit by a single straight line. This means that it is important to make a scatterplot and confirm that a relationship is approximately linear before using Pearson’s  r . Nonlinear relationships are fairly common in psychology, but measuring their strength is beyond the scope of this book.

Figure 6.5 Hypothetical Nonlinear Relationship Between Sleep and Depression
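This pitfall is easy to reproduce with simulated data (a sketch; the noise level is an arbitrary assumption). Depression is lowest near eight hours of sleep, yet Pearson's r comes out near zero:

set.seed(1)
sleep <- runif(100, 4, 12)                        # hours of sleep per night
depression <- (sleep - 8)^2 + rnorm(100, sd = 1)  # lowest near 8 hours, higher at the extremes
cor(sleep, depression)                            # near 0 despite a clear relationship
plot(sleep, depression)                           # the scatterplot reveals the curve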

The other common situation in which the value of Pearson's r can be misleading is when one or both of the variables have a limited range in the sample relative to the population. This problem is referred to as restriction of range. Assume, for example, that there is a strong negative correlation between people's age and their enjoyment of hip hop music as shown by the scatterplot in Figure 6.6. Pearson's r here is −.77. However, if we were to collect data only from 18- to 24-year-olds—represented by the shaded area of Figure 6.6—then the relationship would seem to be quite weak. In fact, Pearson's r for this restricted range of ages is 0. It is a good idea, therefore, to design studies to avoid restriction of range. For example, if age is one of your primary variables, then you can plan to collect data from people of a wide range of ages. Because restriction of range is not always anticipated or easily avoidable, however, it is good practice to examine your data for possible restriction of range and to interpret Pearson's r in light of it. (There are also statistical methods to correct Pearson's r for restriction of range, but they are beyond the scope of this book).

Figure 6.6 Hypothetical Data Showing How a Strong Overall Correlation Can Appear to Be Weak When One Variable Has a Restricted Range
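Restriction of range is also easy to demonstrate with simulated data (the slope and noise level below are arbitrary assumptions, not the Figure 6.6 data):

set.seed(2)
age <- runif(300, 15, 70)
enjoyment <- 10 - 0.1 * age + rnorm(300, sd = 1)  # enjoyment declines with age
cor(age, enjoyment)                               # strongly negative in the full sample
young <- age >= 18 & age <= 24
cor(age[young], enjoyment[young])                 # much weaker in the restricted 18-24 range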

Correlation Does Not Imply Causation

You have probably heard repeatedly that “Correlation does not imply causation.” An amusing example of this comes from a 2012 study that showed a positive correlation (Pearson’s r = 0.79) between the per capita chocolate consumption of a nation and the number of Nobel prizes awarded to citizens of that nation [2] . It seems clear, however, that this does not mean that eating chocolate causes people to win Nobel prizes, and it would not make sense to try to increase the number of Nobel prizes won by recommending that parents feed their children more chocolate.

There are two reasons that correlation does not imply causation. The first is called the  directionality problem . Two variables,  X  and  Y , can be statistically related because X  causes  Y  or because  Y  causes  X . Consider, for example, a study showing that whether or not people exercise is statistically related to how happy they are—such that people who exercise are happier on average than people who do not. This statistical relationship is consistent with the idea that exercising causes happiness, but it is also consistent with the idea that happiness causes exercise. Perhaps being happy gives people more energy or leads them to seek opportunities to socialize with others by going to the gym. The second reason that correlation does not imply causation is called the  third-variable problem . Two variables,  X  and  Y , can be statistically related not because  X  causes  Y , or because  Y  causes  X , but because some third variable,  Z , causes both  X  and  Y . For example, the fact that nations that have won more Nobel prizes tend to have higher chocolate consumption probably reflects geography in that European countries tend to have higher rates of per capita chocolate consumption and invest more in education and technology (once again, per capita) than many other countries in the world. Similarly, the statistical relationship between exercise and happiness could mean that some third variable, such as physical health, causes both of the others. Being physically healthy could cause people to exercise and cause them to be happier. Correlations that are a result of a third-variable are often referred to as  spurious correlations .

Some excellent and amusing examples of spurious correlations can be found at http://www.tylervigen.com  (Figure 6.7  provides one such example).

Figure 6.7 An Example of a Spurious Correlation from tylervigen.com (Nicholas Cage Films and Pool Drownings)

“Lots of Candy Could Lead to Violence”

Although researchers in psychology know that correlation does not imply causation, many journalists do not. One website about correlation and causation, http://jonathan.mueller.faculty.noctrl.edu/100/correlation_or_causation.htm , links to dozens of media reports about real biomedical and psychological research. Many of the headlines suggest that a causal relationship has been demonstrated when a careful reading of the articles shows that it has not because of the directionality and third-variable problems.

One such article is about a study showing that children who ate candy every day were more likely than other children to be arrested for a violent offense later in life. But could candy really “lead to” violence, as the headline suggests? What alternative explanations can you think of for this statistical relationship? How could the headline be rewritten so that it is not misleading?

As you have learned by reading this book, there are various ways that researchers address the directionality and third-variable problems. The most effective is to conduct an experiment. For example, instead of simply measuring how much people exercise, a researcher could bring people into a laboratory and randomly assign half of them to run on a treadmill for 15 minutes and the rest to sit on a couch for 15 minutes. Although this seems like a minor change to the research design, it is extremely important. Now if the exercisers end up in more positive moods than those who did not exercise, it cannot be because their moods affected how much they exercised (because it was the researcher who used random assignment to determine how much they exercised). Likewise, it cannot be because some third variable (e.g., physical health) affected both how much they exercised and what mood they were in. Thus experiments eliminate the directionality and third-variable problems and allow researchers to draw firm conclusions about causal relationships.

Media Attributions

  • Nicholas Cage and Pool Drownings © Tyler Vigen is licensed under a CC BY (Attribution) license
  • Bushman, B. J., & Huesmann, L. R. (2001). Effects of televised violence on aggression. In D. Singer & J. Singer (Eds.), Handbook of children and the media (pp. 223–254). Thousand Oaks, CA: Sage.
  • Messerli, F. H. (2012). Chocolate consumption, cognitive function, and Nobel laureates. New England Journal of Medicine, 367, 1562–1564.

A graph that presents correlations between two quantitative variables, one on the x-axis and one on the y-axis. Scores are plotted at the intersection of the values on each axis.

A relationship in which higher scores on one variable tend to be associated with higher scores on the other.

A relationship in which higher scores on one variable tend to be associated with lower scores on the other.

A statistic that measures the strength of a correlation between quantitative variables.

When one or both variables have a limited range in the sample relative to the population, making the value of the correlation coefficient misleading.

The problem where two variables, X  and  Y , are statistically related either because X  causes  Y, or because  Y  causes  X , and thus the causal direction of the effect cannot be known.

Two variables, X and Y, can be statistically related not because X causes Y, or because Y causes X, but because some third variable, Z, causes both X and Y.

Correlations that are a result not of the two variables being measured, but rather because of a third, unmeasured, variable that affects both of the measured variables.

Research Methods in Psychology Copyright © 2019 by Rajiv S. Jhangiani, I-Chant A. Chiang, Carrie Cuttler, & Dana C. Leighton is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License , except where otherwise noted.


Introduction to Research Methods

13 Correlations

Have you ever heard someone say “that’s just correlation, not causation!” Well, even if you haven’t we’ll talk about both parts of that statement in this chapter. Yes, correlation isn’t causation, but it’s also an important part of figuring out if something causes something else. Let’s start with the definition of correlation, and we’ll talk about causation at the end.

13.1 Concepts

Correlation isn't a terribly complicated phenomenon to understand, but it can be hard to explain in a single clear sentence. I'll try though. Correlation describes a relationship between two things (generally two numeric variables) based on patterns of change, where a change in one is associated with a change in the other.

Let's start with an example that will probably be immediately intuitive. Height is correlated with weight. What does that mean? People that are taller typically weigh more, and people that are shorter typically weigh less. The two things are highly correlated, meaning that one can be used as a predictor of the other. Let's look at the scatter plot below with some data for heights and weights.

[Scatterplot: heights and weights]

In the data, people that are taller typically weigh more, and those that are shorter typically weigh less. There are exceptions. I highlight two individual points below in blue - they have essentially equivalent heights but have very different weights.

[Scatterplot: the same heights and weights, with two points of equal height but very different weight highlighted in blue]

Does that mean that we can’t use height to predict weight? No. But we also have to acknowledge any single guess will be wrong. But if all you knew about a person was their height, you’d have more information to guess their weight than if you didn’t know how tall they were. It’s useful information, because the two things co-occur. Correlations just refer to the averages across the data. The fact that there are exceptions is important, but it doesn’t mean that in general the two things don’t go together.

Correlations are a simple measure that helps us to make better guesses, as indicated above.

What kind of guesses?

Imagine you're walking down the street with a book under your arm, The Poppy War, and you run into a friend. Your friend asks what you're reading, says they're looking for a new book, and wants to know whether you think they would like it. You don't know whether they'll like the book, since they haven't read it yet. But you could try to learn something else to make a better prediction for whether they would like this particular book. So you ask them whether they've read and liked Game of Thrones, because you think that will be a good predictor for whether they like The Poppy War. The Poppy War has a lot of similarities with the Game of Thrones books/series, so you think if they liked X (Game of Thrones) they're more likely to like Y (The Poppy War). Correlations are just about making better predictions.

Let’s go through three examples quickly, using elementary schools in California.

Do you think schools that scored better (higher) on the state's standardized reading test had higher or lower median parental incomes (i.e., were the parents richer)? Unsurprisingly, schools where parents were doing better financially did better on the test.

[Scatterplot: reading test scores and median parental income, a positive correlation]

Okay, do you think schools that scored higher on the math test had more or fewer students that were receiving a free/reduced cost lunch (a common measure of poverty at a school)? You would probably predict that schools with more students receiving free/reduced lunches scored worse. So values that are higher for one variable (free lunches) would be lower for the other (math scores), and vice versa.

[Scatterplot: math test scores and free/reduced lunch rates, a negative correlation]

Last one. Do you think schools with more students had higher or lower median incomes? That’s something of a trick question, because they’re not correlated. Some richer schools had more students, and some had fewer. Knowing the median income of a school doesn’t help you predict the total number of students. You’ll see below that it’s lumpy data, but knowing the median income of a school really doesn’t help you guess whether the student body will be larger or smaller.

[Scatterplot: school size and median income, no correlation]

So that's sort of a quick introduction to different types of correlation and the intuition of using it. Things can be positively correlated (higher values=higher values, lower values=lower values), they can be negatively correlated (higher values=lower values, lower values=higher values), and there can be no correlation (higher values=???, lower values=???).

We can see whether and how things are correlated using scatter plots, like we’ve done above, but we can also understand it more formally by calculating the correlation coefficient.

13.1.1 Correlation Coefficient

A correlation coefficient is a single number that can range from 1 to -1, and can fall anywhere in between. What the correlation coefficient does is gives you a quick way of understanding the relationship between two things. A correlation of 1 means that two things are positively and perfectly correlated. Essentially, on a scatter plot they would form a straight line. If you know one variable you know the other one exactly, there’s no random noise. Such correlations are rare in the real world, but it’s a useful starting off point. We can also have things that are perfectly correlated and negative, which would have a correlation of -1.

[Scatterplots: a perfect positive correlation (r = 1) and a perfect negative correlation (r = -1)]

Again, we rarely observe things that are perfectly correlated in the negative direction. But those are the two extremes, and we see every combination between those two limits. The chart below will (hopefully) help you to visualize different correlations and their relative strength.

[Chart: scatterplots for correlations of varying strength and direction]

The larger the correlation coefficient is in absolute terms (closer to 1 or -1), the more helpful it is in making predictions when you only know one of the two measures.

How do we calculate the correlation coefficient? It’s worth understanding the underlying math that goes into calculating a correlation coefficient so that we don’t leave it as this magical numerical figure that is randomly generated.

13.1.1.1 Formula for Correlation Coefficient

r = Σ(z_x × z_y) / (n − 1)

So that is the formula for calculating the correlation coefficient, which probably doesn’t make any sense right now. That’s fine, most mathematical formulas are Greek to me too. So we’ll break it down so you can understand exactly what all those different pieces are doing. The most important parts are the two z-scores on top.

What's a z-score? I'm glad you asked. A z-score refers to a measure after we've standardized it. Let's go back to our example for height and weight to explain. Weight was measured in pounds, and height was measured in inches. So how do we compare those two different units? How many inches is worth a pound? Those are two totally different things.

Remember earlier we described a positive correlation as being a situation where one thing goes up (increases) and the other goes up (increases) too. But goes up from what? What does it mean for income and test scores to both increase in tandem?

But we get rid of those units. What we care about in calculating a correlation coefficient isn’t how many pounds or inches someone is, but how many standard deviations each value is from the mean. Remember, we can use the mean to measure the middle of our data, and the standard deviation to measure how spread out the data is from the mean. Let’s take 12 heights and weights to demonstrate.

Index Height Weight
1 65.78 113
2 71.52 136.5
3 69.4 153
4 68.22 142.3
5 67.79 144.3
6 68.7 123.3
7 69.8 141.5
8 70.01 136.5
9 67.9 112.4
10 66.78 120.7
11 66.49 127.5
12 67.62 114.1

So first we calculate the mean and standard deviations for both height and weight for those 12 numbers.

Mean height: 68.33
Standard deviation of height: 1.642941
Mean weight: 130.4
Standard deviation of weight: 13.78589

The mean for weight is basically twice as large as the mean for height. The tallest person in our data is fewer inches tall than the lightest person's weight in pounds. That's fine. What we want to know is how many standard deviations each figure in the data is from the mean, because that neutralizes the differences in units that we have. We'll add two new columns with that information.

Now we don’t have to worry about the fact that weights are typically twice what heights are. What we care about is the size of the z-scores we just calculated, which tell us how many standard deviations above or below the mean each individual observation is. If the value for height is above the mean, is the value for weight also above the mean? Do they change a similar number of standard deviations, or do they move in opposite directions?

[Table: the heights and weights from above, with z-score columns added]
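In R, a z-score column is one line of arithmetic. A sketch using the heights from the table above:

height <- c(65.78, 71.52, 69.40, 68.22, 67.79, 68.70,
            69.80, 70.01, 67.90, 66.78, 66.49, 67.62)
height_z <- (height - mean(height)) / sd(height)  # standard deviations from the mean
round(height_z, 3)                                # compare with the HeightZ column below
# scale(height) is base R's shorthand for the same standardization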

Let’s go back to our formula.

Once we've calculated the z-scores for each x variable (Height) and the y variable (Weight), we multiply those figures for each individual observation. Once we multiply each set, we just add those values together. You don't need to learn a lot of Greek to be good at data analysis, but you'll see the Σ (sigma) character that is used for sum. Sum, again, just means add all the values up. Once we have that sum, we just divide by the size of the sample minus one (here, 12 − 1 = 11) and we've got our correlation coefficient.

Index Height Weight HeightZ WeightZ HeightZxWeightZ
1 65.78 113 -1.553 -1.264 1.963
2 71.52 136.5 1.936 0.4402 0.8522
3 69.4 153 0.6479 1.64 1.062
4 68.22 142.3 -0.07168 0.8644 -0.06195
5 67.79 144.3 -0.3327 1.007 -0.3349
6 68.7 123.3 0.2212 -0.5163 -0.1142
7 69.8 141.5 0.8933 0.8034 0.7177
8 70.01 136.5 1.023 0.4383 0.4483
9 67.9 112.4 -0.2628 -1.309 0.344
10 66.78 120.7 -0.9446 -0.7074 0.6682
11 66.49 127.5 -1.124 -0.2153 0.242
12 67.62 114.1 -0.4328 -1.181 0.511

The sum of the column HeightZxWeightZ is 6.297529, which divided by 11 (12 − 1) equals .573. That .573 is our correlation coefficient.

That was more math than I like to do, but it's worth pulling back the veil. Just because R can do magic for us doesn't mean that it should be mystical how the math works.

As height increases, weight increases too in general. Or more specifically, as the distance from the mean for height increases, the distance from the mean for weight increases too. And here's the spell: calculating correlations in R just takes three letters.
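Here is the whole calculation as an R sketch, first reproducing the by-hand arithmetic and then the three-letter version:

height <- c(65.78, 71.52, 69.40, 68.22, 67.79, 68.70,
            69.80, 70.01, 67.90, 66.78, 66.49, 67.62)
weight <- c(113.0, 136.5, 153.0, 142.3, 144.3, 123.3,
            141.5, 136.5, 112.4, 120.7, 127.5, 114.1)
zH <- (height - mean(height)) / sd(height)
zW <- (weight - mean(weight)) / sd(weight)
sum(zH * zW) / (length(height) - 1)  # the by-hand formula: about .573
cor(height, weight)                  # the spell itself, same answer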

13.1.2 Strength of Correlation

The correlation coefficient offers us a single number that describes the strength of association between two variables. And we know that it runs from 1, positive and perfect correlation, to -1, negative and perfect correlation. But how do we know if correlation is strong or not, if it isn’t perfect?

We have some general guidelines for that. The chart below breaks down some generally regarded cut points for the strength of correlation.

Coefficient ‘r’ Direction Strength
1 Positive Perfect
0.81 - 0.99 Positive Very Strong
0.61 - 0.80 Positive Strong
0.41 - 0.60 Positive Moderate
0.21 - 0.40 Positive Weak
0.00 - 0.20 Positive Very weak
0.00 - -0.20 Negative Very weak
-0.21 - -0.40 Negative Weak
-0.41 - -0.60 Negative Moderate
-0.61 - -0.80 Negative Strong
-0.81 - -0.99 Negative Very Strong
-1 Negative Perfect

Like any cut points, these aren’t perfect. How much stronger of a correlation is there between two measures with an r of .79 vs .81? Not very much stronger, one doesn’t magically become embedded with the extra power of being “Very Strong” just by getting over that limit. It’s just a framework for evaluating the strength of different correlations.
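To make the framework concrete, here is a small illustrative helper in R (the function name is made up, and the cut points simply encode the table above; treat it as a sketch, not a standard function):

strength <- function(r) {
  a <- abs(r)
  label <- if (a == 1) "Perfect" else if (a > 0.80) "Very Strong" else
    if (a > 0.60) "Strong" else if (a > 0.40) "Moderate" else
    if (a > 0.20) "Weak" else "Very weak"
  paste(if (r >= 0) "Positive" else "Negative", label)
}
strength(0.573)  # "Positive Moderate"
strength(-0.83)  # "Negative Very Strong"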

And those cut points can be useful when evaluating the correlations between different sets of pairs. We can only calculate the correlation of two variables at a time, but we might be interested in what variable in our data has the strongest correlation with another variable of interest.

The data for this next example covers counties in the United States, and has a set of measures relating to the demographics and economies of each county. Imagine that you're a government official for a county, and you want to see your county's economy get stronger. To answer that question you get data on all of the counties in the US and try to figure out what counties that have higher median incomes are doing, because maybe there's a lesson there you could apply to your county. You could test the correlations for each other variable with median income one by one, or you could look at them all together in what is called a correlation matrix.

That’s a lot of numbers! And it’s probably pretty confusing to look at initially. We’ll practice making similar tables that are a little better visually later, but for now let’s try to understand what that seemingly random collection of numbers is telling us.

Our data has 5 variables in it: pop (population); collegepct (percent college graduates); medinc (median income); medhomevalue (median price of homes); and povpct (percent of people in poverty).

The names of each of those variables are displayed across the top of the table, and in the first column.

Each cell with a number is the correlation coefficient for the pair of variables named by that cell’s row and column. Let me demonstrate by annotating what is displayed above, looking at just three of the numbers.

[Figure: the county correlation matrix, with three of its cells annotated]

The first row is for population, and so is the first column. So what is the correlation between population and population? They’re exactly the same thing, the same variable, so of course the correlation is perfect. That diagonal of 1s doesn’t actually mean anything; it’s included to create a break between the top and bottom halves of the chart.

The other two annotated cells show the correlation between median income and poverty percent. It’s the same number in both, because whether we correlate median income with poverty or poverty with median income, the answer is the same.

And so what a correlation matrix shows us is the correlations between all of the variables in our data set, and it shows each of them twice. You can work across any row to see all the correlations for one particular variable, or down any column.

A correlation matrix lets you compare the correlations for different variables at the same time. To check whether you understand what a correlation matrix displays, try to answer these two questions: (1) what has the strongest correlation with collegepct, and (2) which correlation in the data is the weakest?

For the first you have to work across the entire row or column for collegepct to find the largest number. For the second you should identify the number that is closest to 0.

Got your answer yet? medhomevalue has the strongest correlation with collegepct, and the correlation between povpct and pop is the weakest.

Okay, so we’ve got our correlation matrix and we know how to read it now. Let’s return to our question from above and figure out what is the strongest correlation with median income among counties. Let’s take a look at that chart again.

According to the chart above, the strongest correlation with median income is with the percent of residents in poverty. Great, so to increase median incomes the best thing to do is reduce poverty? Maybe, but if we increased people’s median incomes we’d also probably see a reduction in poverty. Both variables are measuring really similar things in slightly different ways. Essentially, both are trying to capture how wealthy a community is; one is oriented towards how rich the community is and the other towards how poor it is. They’re not perfectly correlated, though: some communities with higher poverty rates have slightly higher or lower median incomes than you’d expect. But the two are strongly associated.

It’s useful to know there is a strong association between those two things, but it isn’t immediately clear how we use that knowledge to improve policy outcomes. This gets at the limitation of looking at correlations in and of themselves. They tell us something about the world (where there’s more poverty there are typically lower median incomes), but they don’t tell us why that is true. It’s worth talking more, then, about what correlation does and doesn’t tell us.

13.1.3 Correlation and Causation

[Comic: from the wonderful and nerdy xkcd comics]

Correlation and causation are often intertwined. Way back in the chapter on Introduction to Research we talked about the goal of science being to explain things, particularly the causes of things. The reason a rock falls when dropped is that gravity pulls it down.

Correlation is useful for making predictions, which begins to help us build causal arguments. But correlation and causation shouldn’t be confused. Height and weight are correlated, but does being taller cause you to weigh more? Maybe, partially, because it gives your body more space to carry weight. But weighing more can also help you grow, which is why the types of food available to societies predict differences in average height across countries. A body needs fuel to grow, and then that growth supports the addition of more pounds. All of which is to say that causation is complicated, even though the fact that height and weight are correlated doesn’t change.

As one of my pre-law students once put it: correlation is necessary for causation to be present, but it’s not sufficient on its own.

There are three necessary criteria to assert causality (that A causes B):

  • Co-variation
  • Temporal Precedence
  • Elimination of Extraneous Variables

Co-variation is what we’ve been measuring: as one variable moves, the other variable moves with it. As we’ve discussed, parental income correlates with test scores in California high schools.

Temporal precedence refers to the timing of the two variables. In order for A to cause B, A must precede B. I cause the TV to turn on by pushing a button; you wouldn’t say the TV turning on caused me to push the button. Parental income is earned before the math tests are taken, so we do have temporal precedence in that example. The two variables, parental income and test scores, were measured at the same time, but it’s unlikely that math scores helped parents earn more (unless the state has introduced some sort of test reward system for parents).

So what about extraneous variables? We don’t just need to show that income and math scores are correlated and that the income preceded the tests. We need to show that nothing else could explain the relationship. Is parental income really the cause of the scores, or is it the higher education of parents (which helps them earn more)? Is it because those parents could help their children with math homework at night, or because they could afford math camps in the summer? There are lots of things that correlate with parental income and would also correlate with school math scores. Until we can eliminate all of those possibilities, we can’t say for sure that parental income causes higher math scores.

These issues have arisen in the real world. A while back, someone realized that children with more books in their home did better on reading tests. So nonprofits and schools started giving away books, to try to ensure every student would have books at home and thus do better on tests.

What happened? Not much. The books didn’t make a difference. Having parents that would read books to their children every night did, along with many other factors (having a consistent home to store the books in, parents that could afford books, etc.). That’s why it’s important to eliminate every other possible explanation.

Let’s look at one more example. Homeschooled students do better than those in public school. Great! Let’s homeschool everyone, right?

Well, homeschooled students do better on average, but that’s probably related to the fact that they have families with enough income for one parent to stay home rather than work regularly, and a parent that feels comfortable teaching (highly educated themselves). Just based off that, I’m guessing that shutting down public schools and sending everyone home won’t send scores skyrocketing. There is still a correlation between homeschooling and scores, but it may not be causal.

This goes a long way to explaining why experiments are the gold standard in science.


One benefit of an experiment (if designed correctly) is that other confounding factors are randomized between the treatment and control groups. So imagine we did an experiment to see if homeschooling improved kids’ scores. We’d take a random subsection of students (a combination of minority and white children, rich and poor, with different types of parents and different previous test scores) and either have them continue in school or switch to homeschooling. At the end of a year or some other time period we’d compare their scores, and we wouldn’t have to worry that there were systematic differences between the group doing one thing and the group doing the other.
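As a small aside (a sketch, not from the book), random assignment in R is just a coin flip per student; with enough students, confounders balance out across the groups on average.

# hypothetical: 200 students flipped into one of two conditions
set.seed(1)
condition <- sample(c("public school", "homeschool"), size = 200, replace = TRUE)
table(condition)  # roughly half in each group, with no systematic differences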

13.1.4 Spurious Correlations

So correlation is necessary for showing causation, but not sufficient. If I want to sell my new wonder drug that makes people lose weight, I need to show that people that take my wonder drug lose weight - that there is a correlation between weight loss and consumption of the drug. If I don’t show that, it’s going to be a hard sell.

But sometimes two things correlate and it doesn’t mean anything.

For instance, would you assume there was a relationship between eating cheese and the number of people that die from getting tangled in bed sheets? No? Well, good luck explaining to me why they correlate then!

[Chart: per capita cheese consumption vs. deaths from becoming tangled in bed sheets]

Or what about the divorce rate in the State of Maine and per capita consumption of margarine?

[Chart: divorce rate in Maine vs. per capita margarine consumption]

Those are all from the wonderful and wacky website of Tyler Vigen, where he’s investigated some of the weirder correlations that exist. Sometimes two things co-occur just by random chance. We call those spurious correlations: two things correlate, but there isn’t a causal relationship between them. Those are funny examples, but misunderstanding causation and correlation can have significant consequences. It is critically important that a researcher think through any and every other explanation behind a correlation before declaring that one of the variables is causing the other. It does not matter how strong the correlation is; it can be spurious if the two factors are not actually causally related.

The growth of the anti-vax movement is actually driven in part by a spurious correlation.

Andrew Wakefield was part of a 1998 paper published in a leading journal that argued that the increasing rates of autism being diagnosed were linked to the MMR vaccine. I’m somewhat oversimplifying the argument, but it was based on a correlation between vaccination and autism rates. The image below shows how rates of autism (on the y-axis) increased rapidly after the introduction of the MMR vaccine.

[Chart: diagnosed autism caseload rising after the introduction of the MMR vaccine]

Is there a relationship between vaccines and autism, or mercury in vaccines and autism? No. But then why did rates of autism suddenly increase after the introduction of new vaccines? Doctors got better at diagnosing autism, which has always existed, but for centuries went undiagnosed and ignored. Wakefield failed to even consider alternative explanations for the link, and the anti-vax movement has continued to grow as a result of that mistake.

The original paper was retracted from the journal, the author Andrew Wakefield lost his medical license, and significant scientific evidence has been produced disproving Wakefield’s conclusion. But a simple spurious correlation, which coincided with parents’ growing concern over a new and increasingly common diagnosis in children, has caused irreparable damage to public health.

Which is again to emphasize: be careful with correlations. They can indicate important relationships and are a good way to start exploring what is going on in your data. But they’re limited in what they can show without further evidence and should be viewed as just a beginning point.

13.2 Practice

Calculating the correlation coefficient between two variables in R is relatively straightforward. R will do all the steps we outlined above with the command cor().

Let’s start with crime rate data for US States. That data is already loaded into R, so we can call it in with the command data()

To use the cor() command we need to tell it what two variables we want the correlation between. So if we want to see the correlation for Murder and Assault rates…
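The output isn’t reproduced in this excerpt, but the data set appears to be R’s built-in USArrests (its Murder, Assault, and Rape columns match the values quoted below), so the call would look like:

data(USArrests)
cor(USArrests$Murder, USArrests$Assault)  # about .80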

The correlation is .8, so very high and positive.

And for Murder and Rape…
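Assuming the same data:

cor(USArrests$Murder, USArrests$Rape)  # about .56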

A little lower, but still positive.

Just to summarize, to get the correlation between two variables we use the command cor() with the two variables we’re interested in inserted in the parentheses and separated by a comma.

cor() only calculates the correlation between two variables at a time, but we can calculate correlations for many pairs of variables simultaneously. For instance, if we insert the name of our data set into cor(), it will automatically calculate the correlation for each pair of variables in it, like below…
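A sketch with the same assumed data set:

cor(USArrests)  # one correlation coefficient for every pair of columns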

One issue to keep in mind, though, is that you can only calculate a correlation for numeric variables. What’s the correlation between the color of your shoes and height? You can’t calculate that, because there’s no mean for the color of your shoes.

Similarly, if you want to produce a correlation matrix (like we just did) but there are non-numeric variables in the data, R will give you an error message. For instance, let’s read some data in about city economies and take a look at the top few lines.

We can calculate the correlation for population (POP) and median income (MEDINC), like so…
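The file-reading step isn’t shown in this excerpt, so the data frame name here, city, is an assumption:

cor(city$POP, city$MEDINC)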

But we can’t produce a correlation matrix for the entire data set, because PLACE and STATE are both non-numeric. What we can do to get around that is create a new data set without those columns and produce a correlation matrix for all the numeric columns. Only the numeric columns will be kept in the new data set, called “city2”.
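A minimal sketch, under the same assumption that the data frame is called city; dropping the two non-numeric columns by name keeps everything else:

# keep every column except the non-numeric identifiers
city2 <- city[, !(names(city) %in% c("PLACE", "STATE"))]
cor(city2)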

So that’s how to make a correlation matrix, but there are other more attractive ways to display the strength and direction of correlations across a data set in R. I discuss those below for anyone interested in learning more.

13.2.1 Advanced Practice

This advanced practice combines visualization (graphing) with practice on correlations. One way to display or talk about the correlations present in your data is with a correlation matrix, as we just built above. But there are other ways to use them in a paper or project.

These are examples of the types of things you can find just by googling how to do things in R. I googled “how to make cool correlation graphs in R”, found this website by James Marquez, and now reproduce some below.

We’ll use some new data for these next examples. The data below is from Switzerland: predictors of fertility in 1888. It’s a weird data set, I know, but it’s all numeric and it has some interesting variables. We don’t need to worry too much about the data; just focus on the pretty graphs below.

The first graph is a variation on the traditional correlation matrix we’ve shown above. As we’ve discussed, each correlation coefficient is displayed twice in a matrix, meaning we could do something different with half that space. We need to install and load a package to create the following graph, which was discussed in the section on Polling.
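The package isn’t named in this excerpt; corrplot is a common package that produces exactly this style of figure, so a plausible sketch (using R’s built-in swiss data, which matches the 1888 Swiss fertility description above) is:

# install.packages("corrplot") if it isn't installed yet
library(corrplot)
corrplot.mixed(cor(swiss), lower = "number", upper = "circle")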

[Figure: correlation matrix with coefficients in the lower half and colored, sized circles in the upper half]

There’s a lot going on there, and it might be too artsy on some occasions. The correlation coefficients are displayed in the bottom half of the table, just like in the basic matrix, but the top half instead shows circles whose size represents the strength of the correlation. Blue indicates a positive correlation, and red is used for negative correlations. In addition, the numbers in the bottom half are shaded based on the size of the correlation coefficient. That makes some of them harder to read, but that is intentional: it tells you those hard-to-read numbers are smaller and thus less important. This graph really emphasizes which variables have stronger correlations. Too much? Maybe, but it’s more interesting than the black-and-white collection of numbers crammed together earlier.

One more, from a different package. This one is from the package corrr.
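A sketch of the call, again on the swiss data; min_cor is the corrr option described below, and .35 is the threshold the text uses:

# install.packages("corrr") if it isn't installed yet
library(corrr)
network_plot(correlate(swiss), min_cor = 0.35)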

[Figure: corrr network plot; variables correlated above .35 are connected by red (negative) or blue (positive) lines]

Not all of the variables in the data are displayed on that graph, only the ones with a correlation coefficient above the minimum set with the option min_cor. There I specified .35, so only variables with a correlation above that figure are graphed. Similar to the first graph we made, negative correlations are displayed in red and positive ones in blue. A line between the names of two variables indicates that they are correlated. For instance, we can see that Infant.Mortality is correlated positively with Fertility because of the blue line between them, but Infant.Mortality is not correlated above .35 with Catholic because there is no line. It’s a minimalist way to display correlations that, again, only emphasizes the variables associated above a certain point.

Those are just a few examples of some of the cool things you can do in R.


Correlational Research

Rajiv S. Jhangiani; I-Chant A. Chiang; Carrie Cuttler; and Dana C. Leighton

Learning Objectives

  • Define correlational research and give several examples.
  • Explain why a researcher might choose to conduct correlational research rather than experimental research or another type of non-experimental research.
  • Interpret the strength and direction of different correlation coefficients.
  • Explain why correlation does not imply causation.

What Is Correlational Research?

Correlational research is a type of non-experimental research in which the researcher measures two variables (binary or continuous) and assesses the statistical relationship (i.e., the correlation) between them with little or no effort to control extraneous variables. There are many reasons that researchers interested in statistical relationships between variables would choose to conduct a correlational study rather than an experiment. The first is that they do not believe that the statistical relationship is a causal one or are not interested in causal relationships. Recall that two goals of science are to describe and to predict, and the correlational research strategy allows researchers to achieve both of these goals. Specifically, this strategy can be used to describe the strength and direction of the relationship between two variables and, if there is a relationship between the variables, the researchers can use scores on one variable to predict scores on the other (using a statistical technique called regression, which is discussed further in the section on Complex Correlation in this chapter).

Another reason that researchers would choose to use a correlational study rather than an experiment is that the statistical relationship of interest is thought to be causal, but the researcher cannot manipulate the independent variable because it is impossible, impractical, or unethical. For example, while a researcher might be interested in the relationship between the frequency people use cannabis and their memory abilities, they cannot ethically manipulate the frequency that people use cannabis. As such, they must rely on the correlational research strategy; they must simply measure the frequency that people use cannabis and measure their memory abilities using a standardized test of memory and then determine whether the frequency people use cannabis is statistically related to memory test performance.

Correlation is also used to establish the reliability and validity of measurements. For example, a researcher might evaluate the validity of a brief extraversion test by administering it to a large group of participants along with a longer extraversion test that has already been shown to be valid. This researcher might then check to see whether participants’ scores on the brief test are strongly correlated with their scores on the longer one. Neither test score is thought to cause the other, so there is no independent variable to manipulate. In fact, the terms independent variable and dependent variable do not apply to this kind of research.

Another strength of correlational research is that it is often higher in external validity than experimental research. Recall there is typically a trade-off between internal validity and external validity. As greater controls are added to experiments, internal validity is increased but often at the expense of external validity as artificial conditions are introduced that do not exist in reality. In contrast, correlational studies typically have low internal validity because nothing is manipulated or controlled but they often have high external validity. Since nothing is manipulated or controlled by the experimenter the results are more likely to reflect relationships that exist in the real world.

Finally, extending upon this trade-off between internal and external validity, correlational research can help to provide converging evidence for a theory. If a theory is supported by a true experiment that is high in internal validity as well as by a correlational study that is high in external validity then the researchers can have more confidence in the validity of their theory. As a concrete example, correlational studies establishing that there is a relationship between watching violent television and aggressive behavior have been complemented by experimental studies confirming that the relationship is a causal one (Bushman & Huesmann, 2001) [1] .

Does Correlational Research Always Involve Quantitative Variables?

A common misconception among beginning researchers is that correlational research must involve two quantitative variables, such as scores on two extraversion tests or the number of daily hassles and number of symptoms people have experienced. However, the defining feature of correlational research is that the two variables are measured—neither one is manipulated—and this is true regardless of whether the variables are quantitative or categorical. Imagine, for example, that a researcher administers the Rosenberg Self-Esteem Scale to 50 American college students and 50 Japanese college students. Although this “feels” like a between-subjects experiment, it is a correlational study because the researcher did not manipulate the students’ nationalities. The same is true of the study by Cacioppo and Petty comparing college faculty and factory workers in terms of their need for cognition. It is a correlational study because the researchers did not manipulate the participants’ occupations.

Figure 6.2 shows data from a hypothetical study on the relationship between whether people make a daily list of things to do (a “to-do list”) and stress. Notice that it is unclear whether this is an experiment or a correlational study because it is unclear whether the independent variable was manipulated. If the researcher randomly assigned some participants to make daily to-do lists and others not to, then it is an experiment. If the researcher simply asked participants whether they made daily to-do lists, then it is a correlational study. The distinction is important because if the study was an experiment, then it could be concluded that making the daily to-do lists reduced participants’ stress. But if it was a correlational study, it could only be concluded that these variables are statistically related. Perhaps being stressed has a negative effect on people’s ability to plan ahead (the directionality problem). Or perhaps people who are more conscientious are more likely to make to-do lists and less likely to be stressed (the third-variable problem). The crucial point is that what defines a study as experimental or correlational is not the variables being studied, nor whether the variables are quantitative or categorical, nor the type of graph or statistics used to analyze the data. What defines a study is how the study is conducted.

[Figure 6.2: Hypothetical results of a study on the relationship between making daily to-do lists and stress]

Data Collection in Correlational Research

Again, the defining feature of correlational research is that neither variable is manipulated. It does not matter how or where the variables are measured. A researcher could have participants come to a laboratory to complete a computerized backward digit span task and a computerized risky decision-making task and then assess the relationship between participants’ scores on the two tasks. Or a researcher could go to a shopping mall to ask people about their attitudes toward the environment and their shopping habits and then assess the relationship between these two variables. Both of these studies would be correlational because no independent variable is manipulated. 

Correlations Between Quantitative Variables

Correlations between quantitative variables are often presented using scatterplots . Figure 6.3 shows some hypothetical data on the relationship between the amount of stress people are under and the number of physical symptoms they have. Each point in the scatterplot represents one person’s score on both variables. For example, the circled point in Figure 6.3 represents a person whose stress score was 10 and who had three physical symptoms. Taking all the points into account, one can see that people under more stress tend to have more physical symptoms. This is a good example of a positive relationship , in which higher scores on one variable tend to be associated with higher scores on the other. In other words, they move in the same direction, either both up or both down. A negative relationship is one in which higher scores on one variable tend to be associated with lower scores on the other. In other words, they move in opposite directions. There is a negative relationship between stress and immune system functioning, for example, because higher stress is associated with lower immune system functioning.

Figure 6.3 Scatterplot Showing a Hypothetical Positive Relationship Between Stress and Number of Physical Symptoms

The strength of a correlation between quantitative variables is typically measured using a statistic called  Pearson’s Correlation Coefficient (or Pearson's  r ) . As Figure 6.4 shows, Pearson’s r ranges from −1.00 (the strongest possible negative relationship) to +1.00 (the strongest possible positive relationship). A value of 0 means there is no relationship between the two variables. When Pearson’s  r  is 0, the points on a scatterplot form a shapeless “cloud.” As its value moves toward −1.00 or +1.00, the points come closer and closer to falling on a single straight line. Correlation coefficients near ±.10 are considered small, values near ± .30 are considered medium, and values near ±.50 are considered large. Notice that the sign of Pearson’s  r  is unrelated to its strength. Pearson’s  r  values of +.30 and −.30, for example, are equally strong; it is just that one represents a moderate positive relationship and the other a moderate negative relationship. With the exception of reliability coefficients, most correlations that we find in Psychology are small or moderate in size. The website http://rpsychologist.com/d3/correlation/ , created by Kristoffer Magnusson, provides an excellent interactive visualization of correlations that permits you to adjust the strength and direction of a correlation while witnessing the corresponding changes to the scatterplot.

Figure 6.4 Range of Pearson’s r, From −1.00 (Strongest Possible Negative Relationship), Through 0 (No Relationship), to +1.00 (Strongest Possible Positive Relationship)

There are two common situations in which the value of Pearson’s  r  can be misleading. Pearson’s  r  is a good measure only for linear relationships, in which the points are best approximated by a straight line. It is not a good measure for nonlinear relationships, in which the points are better approximated by a curved line. Figure 6.5, for example, shows a hypothetical relationship between the amount of sleep people get per night and their level of depression. In this example, the line that best approximates the points is a curve—a kind of upside-down “U”—because people who get about eight hours of sleep tend to be the least depressed. Those who get too little sleep and those who get too much sleep tend to be more depressed. Even though Figure 6.5 shows a fairly strong relationship between depression and sleep, Pearson’s  r  would be close to zero because the points in the scatterplot are not well fit by a single straight line. This means that it is important to make a scatterplot and confirm that a relationship is approximately linear before using Pearson’s  r . Nonlinear relationships are fairly common in psychology, but measuring their strength is beyond the scope of this book.

Figure 6.5 Hypothetical Nonlinear Relationship Between Sleep and Depression
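As a quick illustration (not from the chapter), simulated data with exactly this kind of curved relationship produces a Pearson’s r near zero in R:

# hypothetical data: depression is lowest near eight hours of sleep
set.seed(2)
sleep <- runif(200, min = 4, max = 12)
depression <- (sleep - 8)^2 + rnorm(200)
cor(sleep, depression)  # near zero despite the strong curved relationship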

The other common situation in which the value of Pearson’s r can be misleading is when one or both of the variables have a limited range in the sample relative to the population. This problem is referred to as restriction of range. Assume, for example, that there is a strong negative correlation between people’s age and their enjoyment of hip hop music as shown by the scatterplot in Figure 6.6. Pearson’s r here is −.77. However, if we were to collect data only from 18- to 24-year-olds (represented by the shaded area of Figure 6.6), then the relationship would seem to be quite weak. In fact, Pearson’s r for this restricted range of ages is 0. It is a good idea, therefore, to design studies to avoid restriction of range. For example, if age is one of your primary variables, then you can plan to collect data from people of a wide range of ages. Because restriction of range is not always anticipated or easily avoidable, however, it is good practice to examine your data for possible restriction of range and to interpret Pearson’s r in light of it. (There are also statistical methods to correct Pearson’s r for restriction of range, but they are beyond the scope of this book.)

Figure 6.6 Hypothetical Data Showing How a Strong Overall Correlation Can Appear to Be Weak When One Variable Has a Restricted Range
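Another simulated illustration (not from the chapter): a strong overall correlation shrinks dramatically once the sample is restricted to a narrow band of ages.

# hypothetical data: enjoyment of hip hop declines with age
set.seed(3)
age <- runif(500, min = 15, max = 70)
liking <- 8 - 0.1 * age + rnorm(500)
cor(age, liking)                      # strongly negative in the full sample

in_range <- age >= 18 & age <= 24     # keep only 18- to 24-year-olds
cor(age[in_range], liking[in_range])  # much closer to zero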

Correlation Does Not Imply Causation

You have probably heard repeatedly that “Correlation does not imply causation.” An amusing example of this comes from a 2012 study that showed a positive correlation (Pearson’s r = 0.79) between the per capita chocolate consumption of a nation and the number of Nobel prizes awarded to citizens of that nation [2] . It seems clear, however, that this does not mean that eating chocolate causes people to win Nobel prizes, and it would not make sense to try to increase the number of Nobel prizes won by recommending that parents feed their children more chocolate.

There are two reasons that correlation does not imply causation. The first is called the  directionality problem . Two variables,  X  and  Y , can be statistically related because X  causes  Y  or because  Y  causes  X . Consider, for example, a study showing that whether or not people exercise is statistically related to how happy they are—such that people who exercise are happier on average than people who do not. This statistical relationship is consistent with the idea that exercising causes happiness, but it is also consistent with the idea that happiness causes exercise. Perhaps being happy gives people more energy or leads them to seek opportunities to socialize with others by going to the gym. The second reason that correlation does not imply causation is called the  third-variable problem . Two variables,  X  and  Y , can be statistically related not because  X  causes  Y , or because  Y  causes  X , but because some third variable,  Z , causes both  X  and  Y . For example, the fact that nations that have won more Nobel prizes tend to have higher chocolate consumption probably reflects geography in that European countries tend to have higher rates of per capita chocolate consumption and invest more in education and technology (once again, per capita) than many other countries in the world. Similarly, the statistical relationship between exercise and happiness could mean that some third variable, such as physical health, causes both of the others. Being physically healthy could cause people to exercise and cause them to be happier. Correlations that are a result of a third-variable are often referred to as  spurious correlations .

Some excellent and amusing examples of spurious correlations can be found at http://www.tylervigen.com  (Figure 6.7  provides one such example).

[Figure 6.7: Nicholas Cage films and pool drownings, a spurious correlation from tylervigen.com]

“Lots of Candy Could Lead to Violence”

Although researchers in psychology know that correlation does not imply causation, many journalists do not. One website about correlation and causation, http://jonathan.mueller.faculty.noctrl.edu/100/correlation_or_causation.htm , links to dozens of media reports about real biomedical and psychological research. Many of the headlines suggest that a causal relationship has been demonstrated when a careful reading of the articles shows that it has not because of the directionality and third-variable problems.

One such article is about a study showing that children who ate candy every day were more likely than other children to be arrested for a violent offense later in life. But could candy really “lead to” violence, as the headline suggests? What alternative explanations can you think of for this statistical relationship? How could the headline be rewritten so that it is not misleading?

As you have learned by reading this book, there are various ways that researchers address the directionality and third-variable problems. The most effective is to conduct an experiment. For example, instead of simply measuring how much people exercise, a researcher could bring people into a laboratory and randomly assign half of them to run on a treadmill for 15 minutes and the rest to sit on a couch for 15 minutes. Although this seems like a minor change to the research design, it is extremely important. Now if the exercisers end up in more positive moods than those who did not exercise, it cannot be because their moods affected how much they exercised (because it was the researcher who used random assignment to determine how much they exercised). Likewise, it cannot be because some third variable (e.g., physical health) affected both how much they exercised and what mood they were in. Thus experiments eliminate the directionality and third-variable problems and allow researchers to draw firm conclusions about causal relationships.

Media Attributions

  • Nicholas Cage and Pool Drownings © Tyler Vigen is licensed under a CC BY (Attribution) license

Notes

  1. Bushman, B. J., & Huesmann, L. R. (2001). Effects of televised violence on aggression. In D. Singer & J. Singer (Eds.), Handbook of children and the media (pp. 223–254). Thousand Oaks, CA: Sage.
  2. Messerli, F. H. (2012). Chocolate consumption, cognitive function, and Nobel laureates. New England Journal of Medicine, 367, 1562–1564.

Glossary

Scatterplot: A graph that presents correlations between two quantitative variables, one on the x-axis and one on the y-axis. Scores are plotted at the intersection of the values on each axis.

Positive relationship: A relationship in which higher scores on one variable tend to be associated with higher scores on the other.

Negative relationship: A relationship in which higher scores on one variable tend to be associated with lower scores on the other.

Pearson’s r: A statistic that measures the strength of a correlation between quantitative variables.

Restriction of range: When one or both variables have a limited range in the sample relative to the population, making the value of the correlation coefficient misleading.

Directionality problem: The problem where two variables, X and Y, are statistically related either because X causes Y or because Y causes X, and thus the causal direction of the effect cannot be known.

Third-variable problem: Two variables, X and Y, can be statistically related not because X causes Y, or because Y causes X, but because some third variable, Z, causes both X and Y.

Spurious correlations: Correlations that are a result not of the two variables being measured, but rather of a third, unmeasured variable that affects both of the measured variables.

Correlational Research Copyright © by Rajiv S. Jhangiani; I-Chant A. Chiang; Carrie Cuttler; and Dana C. Leighton is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License , except where otherwise noted.



8.3 Complex Correlational Designs

Learning objectives.

  • Explain some reasons that researchers use complex correlational designs.
  • Create and interpret a correlation matrix.
  • Describe how researchers can use correlational research to explore causal relationships among variables—including the limits of this approach.

As we have already seen, researchers conduct correlational studies rather than experiments when they are interested in noncausal relationships or when they are interested in causal relationships where the independent variable cannot be manipulated for practical or ethical reasons. In this section, we look at some approaches to complex correlational research that involve measuring several variables and assessing the relationships among them.

Correlational Studies With Factorial Designs

We have already seen that factorial experiments can include manipulated independent variables or a combination of manipulated and nonmanipulated independent variables. But factorial designs can also include only nonmanipulated independent variables, in which case they are no longer experiments but correlational studies. Consider a hypothetical study in which a researcher measures both the moods and the self-esteem of several participants (categorizing them as having either a positive or negative mood and as being either high or low in self-esteem) along with their willingness to have unprotected sexual intercourse. This can be conceptualized as a 2 × 2 factorial design with mood (positive vs. negative) and self-esteem (high vs. low) as between-subjects factors. (Willingness to have unprotected sex is the dependent variable.) This design can be represented in a factorial design table and the results in a bar graph of the sort we have already seen. The researcher would consider the main effect of mood, the main effect of self-esteem, and the interaction between these two independent variables.

Again, because neither independent variable in this example was manipulated, it is a correlational study rather than an experiment. (The similar study by MacDonald and Martineau [2002] was an experiment because they manipulated their participants’ moods.) This is important because, as always, one must be cautious about inferring causality from correlational studies because of the directionality and third-variable problems. For example, a main effect of participants’ moods on their willingness to have unprotected sex might be caused by any other variable that happens to be correlated with their moods.

Assessing Relationships Among Multiple Variables

Most complex correlational research, however, does not fit neatly into a factorial design. Instead, it involves measuring several variables, often both categorical and quantitative, and then assessing the statistical relationships among them. For example, researchers Nathan Radcliffe and William Klein studied a sample of middle-aged adults to see how their level of optimism (measured by using a short questionnaire called the Life Orientation Test) relates to several other variables related to having a heart attack (Radcliffe & Klein, 2002). These included their health, their knowledge of heart attack risk factors, and their beliefs about their own risk of having a heart attack. They found that more optimistic participants were healthier (e.g., they exercised more and had lower blood pressure), knew more about heart attack risk factors, and correctly believed their own risk to be lower than that of their peers.

This approach is often used to assess the validity of new psychological measures. For example, when John Cacioppo and Richard Petty created their Need for Cognition Scale—a measure of the extent to which people like to think and value thinking—they used it to measure the need for cognition for a large sample of college students, along with three other variables: intelligence, socially desirable responding (the tendency to give what one thinks is the “appropriate” response), and dogmatism (Cacioppo & Petty, 1982). The results of this study are summarized in Table 8.1 “Correlation Matrix Showing Correlations Among the Need for Cognition and Three Other Variables Based on Research by Cacioppo and Petty”, which is a correlation matrix showing the correlation (Pearson’s r) between every possible pair of variables in the study. For example, the correlation between the need for cognition and intelligence was +.39, the correlation between intelligence and socially desirable responding was −.02, and so on. (Only half the matrix is filled in because the other half would contain exactly the same information. Also, because the correlation between a variable and itself is always +1.00, these values are replaced with dashes throughout the matrix.) In this case, the overall pattern of correlations was consistent with the researchers’ ideas about how scores on the need for cognition should be related to these other constructs.

Table 8.1 Correlation Matrix Showing Correlations Among the Need for Cognition and Three Other Variables Based on Research by Cacioppo and Petty

                     Need for cognition   Intelligence   Social desirability   Dogmatism
Need for cognition   —
Intelligence         +.39                 —
Social desirability  +.08                 −.02           —
Dogmatism            −.27                 −.23           +.03                  —

When researchers study relationships among a large number of conceptually similar variables, they often use a complex statistical technique called factor analysis. In essence, factor analysis organizes the variables into a smaller number of clusters, such that they are strongly correlated within each cluster but weakly correlated between clusters. Each cluster is then interpreted as multiple measures of the same underlying construct. These underlying constructs are also called “factors.” For example, when people perform a wide variety of mental tasks, factor analysis typically organizes them into two main factors—one that researchers interpret as mathematical intelligence (arithmetic, quantitative estimation, spatial reasoning, and so on) and another that they interpret as verbal intelligence (grammar, reading comprehension, vocabulary, and so on). The Big Five personality factors have been identified through factor analyses of people’s scores on a large number of more specific traits. For example, measures of warmth, gregariousness, activity level, and positive emotions tend to be highly correlated with each other and are interpreted as representing the construct of extroversion. As a final example, researchers Peter Rentfrow and Samuel Gosling asked more than 1,700 college students to rate how much they liked 14 different popular genres of music (Rentfrow & Gosling, 2003). They then submitted these 14 variables to a factor analysis, which identified four distinct factors. The researchers called them Reflective and Complex (blues, jazz, classical, and folk), Intense and Rebellious (rock, alternative, and heavy metal), Upbeat and Conventional (country, soundtrack, religious, pop), and Energetic and Rhythmic (rap/hip-hop, soul/funk, and electronica).

Two additional points about factor analysis are worth making here. One is that factors are not categories. Factor analysis does not tell us that people are either extroverted or conscientious or that they like either “reflective and complex” music or “intense and rebellious” music. Instead, factors are constructs that operate independently of each other. So people who are high in extroversion might be high or low in conscientiousness, and people who like reflective and complex music might or might not also like intense and rebellious music. The second point is that factor analysis reveals only the underlying structure of the variables. It is up to researchers to interpret and label the factors and to explain the origin of that particular factor structure. For example, one reason that extroversion and the other Big Five operate as separate factors is that they appear to be controlled by different genes (Plomin, DeFries, McClearn, & McGuffin, 2008).
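For readers who want to see the mechanics, R’s built-in factanal() function fits a factor analysis. This sketch is not from the chapter, and the built-in mtcars data is used only to show the syntax, not to reproduce any study described above.

# exploratory factor analysis extracting two factors
fa <- factanal(mtcars, factors = 2)
print(fa$loadings, cutoff = 0.4)  # show only loadings above 0.4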

Exploring Causal Relationships

Another important use of complex correlational research is to explore possible causal relationships among variables. This might seem surprising given that “correlation does not imply causation.” It is true that correlational research cannot unambiguously establish that one variable causes another. Complex correlational research, however, can often be used to rule out other plausible interpretations.

The primary way of doing this is through the statistical control of potential third variables. Instead of controlling these variables by random assignment or by holding them constant as in an experiment, the researcher measures them and includes them in the statistical analysis. Consider some research by Paul Piff and his colleagues, who hypothesized that being lower in socioeconomic status (SES) causes people to be more generous (Piff, Kraus, Côté, Hayden Cheng, & Keltner, 2010). They measured their participants’ SES and had them play the “dictator game.” They told participants that each would be paired with another participant in a different room. (In reality, there was no other participant.) Then they gave each participant 10 points (which could later be converted to money) to split with the “partner” in whatever way he or she decided. Because the participants were the “dictators,” they could even keep all 10 points for themselves if they wanted to.

As these researchers expected, participants who were lower in SES tended to give away more of their points than participants who were higher in SES. This is consistent with the idea that being lower in SES causes people to be more generous. But there are also plausible third variables that could explain this relationship. It could be, for example, that people who are lower in SES tend to be more religious and that it is their greater religiosity that causes them to be more generous. Or it could be that people who are lower in SES tend to come from ethnic groups that emphasize generosity more than other ethnic groups. The researchers dealt with these potential third variables, however, by measuring them and including them in their statistical analyses. They found that neither religiosity nor ethnicity was correlated with generosity and were therefore able to rule them out as third variables. This does not prove that SES causes greater generosity because there could still be other third variables that the researchers did not measure. But by ruling out some of the most plausible third variables, the researchers made a stronger case for SES as the cause of the greater generosity.

Many studies of this type use a statistical technique called multiple regression. This involves measuring several independent variables (X₁, X₂, X₃, …, Xᵢ), all of which are possible causes of a single dependent variable (Y). The result of a multiple regression analysis is an equation that expresses the dependent variable as an additive combination of the independent variables. This regression equation has the following general form:

b₁X₁ + b₂X₂ + b₃X₃ + … + bᵢXᵢ = Y

The quantities b₁, b₂, and so on are regression weights that indicate how large a contribution an independent variable makes, on average, to the dependent variable. Specifically, they indicate how much the dependent variable changes for each one-unit change in the independent variable.

The advantage of multiple regression is that it can show whether an independent variable makes a contribution to a dependent variable over and above the contributions made by other independent variables. As a hypothetical example, imagine that a researcher wants to know how the independent variables of income and health relate to the dependent variable of happiness. This is tricky because income and health are themselves related to each other. Thus if people with greater incomes tend to be happier, then perhaps this is only because they tend to be healthier. Likewise, if people who are healthier tend to be happier, perhaps this is only because they tend to make more money. But a multiple regression analysis including both income and health as independent variables would show whether each one makes a contribution to happiness when the other is taken into account. (Research like this, by the way, has shown both income and health make extremely small contributions to happiness except in the case of severe poverty or illness; Diener, 2000.)
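A simulated sketch of that hypothetical analysis in R (all variable names and effect sizes invented): each regression weight estimates one predictor’s contribution with the other taken into account.

# income and health are related to each other,
# and happiness depends partly on both
set.seed(4)
n <- 500
health <- rnorm(n)
income <- 0.5 * health + rnorm(n)
happiness <- 0.3 * income + 0.3 * health + rnorm(n)

summary(lm(happiness ~ income + health))  # b for income holds health constant,
                                          # and vice versa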

The examples discussed in this section only scratch the surface of how researchers use complex correlational research to explore possible causal relationships among variables. It is important to keep in mind, however, that purely correlational approaches cannot unambiguously establish that one variable causes another. The best they can do is show patterns of relationships that are consistent with some causal interpretations and inconsistent with others.

Key Takeaways

  • Researchers often use complex correlational research to explore relationships among several variables in the same study.
  • Complex correlational research can be used to explore possible causal relationships among variables using techniques such as multiple regression. Such designs can show patterns of relationships that are consistent with some causal interpretations and inconsistent with others, but they cannot unambiguously establish that one variable causes another.

Exercises

  • Practice: Make a correlation matrix for a hypothetical study including the variables of depression, anxiety, self-esteem, and happiness. Include the Pearson’s r values that you would expect.
  • Discussion: Imagine a correlational study that looks at intelligence, the need for cognition, and high school students’ performance in a critical-thinking course. A multiple regression analysis shows that intelligence is not related to performance in the class but that the need for cognition is. Explain what this study has shown in terms of what causes good performance in the critical-thinking course.

Cacioppo, J. T., & Petty, R. E. (1982). The need for cognition. Journal of Personality and Social Psychology, 42, 116–131.

Diener, E. (2000). Subjective well-being: The science of happiness and a proposal for a national index. American Psychologist, 55, 34–43.

MacDonald, T. K., & Martineau, A. M. (2002). Self-esteem, mood, and intentions to use condoms: When does low self-esteem lead to risky health behaviors? Journal of Experimental Social Psychology, 38, 299–306.

Piff, P. K., Kraus, M. W., Côté, S., Hayden Cheng, B., & Keltner, D. (2010). Having less, giving more: The influence of social class on prosocial behavior. Journal of Personality and Social Psychology, 99, 771–784.

Plomin, R., DeFries, J. C., McClearn, G. E., & McGuffin, P. (2008). Behavioral genetics (5th ed.). New York, NY: Worth.

Radcliffe, N. M., & Klein, W. M. P. (2002). Dispositional, unrealistic, and comparative optimism: Differential relations with knowledge and processing of risk information and beliefs about personal risk. Personality and Social Psychology Bulletin, 28, 836–846.

Rentfrow, P. J., & Gosling, S. D. (2003). The do re mi’s of everyday life: The structure and personality correlates of music preferences. Journal of Personality and Social Psychology, 84, 1236–1256.

Research Methods in Psychology Copyright © 2016 by University of Minnesota is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License , except where otherwise noted.


Research Methods and Statistics in Psychology

Student resources, Chapter 3: Research Methods

1. Which of the following statements is not true? [TY3.1]

  • Psychological measurement can involve the measurement of phenomena believed to be related to a given psychological state or process.
  • Psychological measurement can involve the measurement of behaviour believed to result from a given psychological state or process.
  • Psychological measurement can involve self-reports of behaviour believed to be related to a given psychological state or process.
  • Psychological measurement can involve the self-reports of a sample drawn from a particular sub-population.
  • Psychological measurement can involve direct examination of psychological states and processes.

2. A researcher conducts an experiment that tests the hypothesis that ‘anxiety has an adverse effect on students’ exam performance’. Which of the following statements is true? [TY3.2]

  • Anxiety is the dependent variable, exam performance is the independent variable.
  • Anxiety is the dependent variable, students are the independent variable.
  • Anxiety is the independent variable, students are the dependent variable.
  • Anxiety is the independent variable, exam performance is the dependent variable.
  • Students are the dependent variable, exam performance is the independent variable.

3. An experimenter conducts a study in which she wants to look at the effects of altitude on psychological well-being. To do this she randomly allocates people to two groups and takes one group up in a plane to a height of 1000 metres and leaves the other group in the airport terminal as a control group. When the plane is in the air she seeks to establish the psychological well-being of both groups. Which of the following is a potential confound, threatening the internal validity of the study? [TY3.3]

  • The reliability of the questionnaire that she uses to establish psychological health.
  • The size of the space in which the participants are confined.
  • The susceptibility of the experimental group to altitude sickness.
  • The susceptibility of the control group to altitude sickness.
  • The age of people in experimental and control groups.

4. What distinguishes the experimental method from the quasi-experimental method? [TY3.4]

  • The scientific status of the research.
  • The existence of an independent variable.
  • The existence of different levels of an independent variable.
  • The sensitivity of the dependent variable.
  • The random assignment of participants to conditions.

5. Which of the following is not an advantage of the survey/correlational method? [TY3.5]

  • It allows researchers to examine a number of different variables at the same time.
  • It allows researchers to examine the relationship between variables in natural settings.
  • It allows researchers to make predictions based on observed relationships between variables.
  • It allows researchers to explain observed relationships between variables.
  • It is often more convenient than experimental methods.

6. Which of the following statements is true? [TY3.6]

  • Case studies have played no role in the development of psychological theory.
  • Case studies have all of the weaknesses and none of the strengths of larger studies.
  • Case studies have none of the weaknesses and all of the strengths of larger studies.
  • Case studies should only be conducted if every other option has been ruled out.
  • None of the above.

7. An experimenter, Tom, conducts an experiment to see whether accuracy of responding and reaction time are affected by consumption of alcohol. To do this, Tom conducts a study in which students at university A react to pairs of symbols by saying ‘same’ or ‘different’ after consuming two glasses of water and students at university B react to pairs of symbols by saying ‘same’ or ‘different’ after consuming two glasses of wine. Tom predicts that reaction times will be slower and that there will be more errors in the responses of students who have consumed alcohol. Which of the following statements is not true? [TY3.7]

  • The university attended by participants is a confound.
  • The experiment has two dependent variables.
  • Reaction time is the independent variable.
  • Tom’s ability to draw firm conclusions about the impact of alcohol on reaction time would be improved by assigning participants randomly to experimental conditions.
  • This study is actually a quasi-experiment.

8. What is an extraneous variable? [TY3.8]

  • A variable that can never be manipulated.
  • A variable that can never be controlled.
  • A variable that can never be measured.
  • A variable that clouds the interpretation of results.

9. Which of the following statements is true? [TY3.9]

  • The appropriateness of any research method is always determined by the research question and the research environment.
  • Good experiments all involve a large number of participants.
  • Experiments should be conducted in laboratories in order to improve experimental control.
  • Surveys have no place in good psychological research.
  • Case studies are usually carried out when researchers are too lazy to find enough participants.

10. A piece of research that is conducted in a natural (non-artificial) setting is called: [TY3.10]

  • A case study.
  • A field study.
  • A quasi-experiment.
  • An observational study.

11. “Measures designed to gain insight into particular psychological states or processes that involve recording performance on particular activities or tasks.” What type of measures does this glossary entry describe?

  • State measures.
  • Behavioural measures.
  • Physiological measures.
  • Activity measures.
  • Performance measures.

12. “An approach to psychology that asserts that human behaviour can be understood in terms of directly observable relationships (in particular, between a stimulus and a response) without having to refer to underlying mental states.” Which approach to psychology is this a glossary definition of?

  • Behaviourism.
  • Freudianism.
  • Cognitivism.
  • Radical observationism.

13. “The complete set of events, people or things that a researcher is interested in and from which any sample is taken.” What does this glossary entry define?

  • Total sample.
  • Complete sample.
  • Reference sample.
  • Reference group.
  • Population.

14. “Either the process of reaching conclusions about the effect of one variable on another, or the outcome of such a process.” What does this glossary entry define?

  • Causal inference.
  • Inductive reasoning.
  • Inferential accounting.

15. “The extent to which the effect of an independent variable on a dependent variable has been correctly interpreted.” Which construct is this a glossary definition of?

  • Internal inference.
  • External inference.
  • External validity.
  • Holistic deduction.
  • Internal validity.

6.2 Correlational Research

Learning objectives.

  • Define correlational research and give several examples.
  • Explain why a researcher might choose to conduct correlational research rather than experimental research or another type of non-experimental research.
  • Interpret the strength and direction of different correlation coefficients.
  • Explain why correlation does not imply causation.

What Is Correlational Research?

Correlational research is a type of non-experimental research in which the researcher measures two variables and assesses the statistical relationship (i.e., the correlation) between them with little or no effort to control extraneous variables. There are many reasons that researchers interested in statistical relationships between variables would choose to conduct a correlational study rather than an experiment. The first is that they do not believe the statistical relationship is a causal one, or they are not interested in causal relationships. Recall that two goals of science are to describe and to predict, and the correlational research strategy allows researchers to achieve both. Specifically, this strategy can be used to describe the strength and direction of the relationship between two variables and, if a relationship exists, to use scores on one variable to predict scores on the other (using a statistical technique called regression).
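To make the describe-and-predict distinction concrete, here is a minimal sketch in Python; the variable names, effect size, and sample size are made up for illustration, not drawn from any real study. Pearson’s r describes the relationship, and a simple linear regression turns it into a prediction:

```python
# A minimal sketch with made-up data: describe a relationship with Pearson's r,
# then use simple linear regression to predict one variable from the other.
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
x = rng.normal(50, 10, size=100)            # hypothetical scores on one measure
y = 0.5 * x + rng.normal(0, 8, size=100)    # related scores on a second measure

r, p = stats.pearsonr(x, y)                 # describe: strength and direction
fit = stats.linregress(x, y)                # predict: y = intercept + slope * x

new_x = 65
predicted_y = fit.intercept + fit.slope * new_x
print(f"r = {r:.2f}; predicted y for x = {new_x}: {predicted_y:.1f}")
```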

Another reason that researchers would choose to use a correlational study rather than an experiment is that the statistical relationship of interest is thought to be causal, but the researcher cannot manipulate the independent variable because it is impossible, impractical, or unethical. For example, while I might be interested in the relationship between the frequency with which people use cannabis and their memory abilities, I cannot ethically manipulate how often people use cannabis. As such, I must rely on the correlational research strategy: I simply measure the frequency with which people use cannabis, measure their memory abilities using a standardized test of memory, and then determine whether the frequency of cannabis use is statistically related to memory test performance.

Correlation is also used to establish the reliability and validity of measurements. For example, a researcher might evaluate the validity of a brief extraversion test by administering it to a large group of participants along with a longer extraversion test that has already been shown to be valid. This researcher might then check to see whether participants’ scores on the brief test are strongly correlated with their scores on the longer one. Neither test score is thought to cause the other, so there is no independent variable to manipulate. In fact, the terms independent variable and dependent variable do not apply to this kind of research.

Another strength of correlational research is that it is often higher in external validity than experimental research. Recall that there is typically a trade-off between internal validity and external validity. As greater controls are added to experiments, internal validity is increased, but often at the expense of external validity. In contrast, correlational studies typically have low internal validity, because nothing is manipulated or controlled, but they often have high external validity. Since nothing is manipulated or controlled by the experimenter, the results are more likely to reflect relationships that exist in the real world.

Finally, extending this trade-off between internal and external validity, correlational research can help to provide converging evidence for a theory. If a theory is supported by a true experiment that is high in internal validity as well as by a correlational study that is high in external validity, then researchers can have more confidence in the validity of their theory. As a concrete example, correlational studies establishing that there is a relationship between watching violent television and aggressive behavior have been complemented by experimental studies confirming that the relationship is a causal one (Bushman & Huesmann, 2001) [1]. These converging results provide strong evidence that there is a real relationship (indeed a causal relationship) between watching violent television and aggressive behavior.

Data Collection in Correlational Research

Again, the defining feature of correlational research is that neither variable is manipulated. It does not matter how or where the variables are measured. A researcher could have participants come to a laboratory to complete a computerized backward digit span task and a computerized risky decision-making task and then assess the relationship between participants’ scores on the two tasks. Or a researcher could go to a shopping mall to ask people about their attitudes toward the environment and their shopping habits and then assess the relationship between these two variables. Both of these studies would be correlational because no independent variable is manipulated. 

Correlations Between Quantitative Variables

Correlations between quantitative variables are often presented using scatterplots. Figure 6.3 shows some hypothetical data on the relationship between the amount of stress people are under and the number of physical symptoms they have. Each point in the scatterplot represents one person’s score on both variables. For example, the circled point in Figure 6.3 represents a person whose stress score was 10 and who had three physical symptoms. Taking all the points into account, one can see that people under more stress tend to have more physical symptoms. This is a good example of a positive relationship, in which higher scores on one variable tend to be associated with higher scores on the other. A negative relationship is one in which higher scores on one variable tend to be associated with lower scores on the other. There is a negative relationship between stress and immune system functioning, for example, because higher stress is associated with lower immune system functioning.

Figure 6.3 Scatterplot Showing a Hypothetical Positive Relationship Between Stress and Number of Physical Symptoms. The circled point represents a person whose stress score was 10 and who had three physical symptoms. Pearson’s r for these data is +.51.
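As a minimal sketch (the data below are synthetic, generated to loosely resemble the stress–symptom example rather than read off the figure), such a scatterplot and its Pearson’s r can be produced in a few lines of Python:

```python
# A minimal sketch with synthetic data, loosely modelled on the stress/symptom
# example: draw the scatterplot and compute Pearson's r for the same points.
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(42)
stress = rng.uniform(0, 10, size=50)                   # hypothetical stress scores
symptoms = np.round(0.3 * stress + rng.normal(1, 1, size=50))
symptoms = np.clip(symptoms, 0, None)                  # whole-number symptom counts

r = np.corrcoef(stress, symptoms)[0, 1]

plt.scatter(stress, symptoms)
plt.xlabel("Stress")
plt.ylabel("Number of physical symptoms")
plt.title(f"Hypothetical positive relationship (Pearson's r = {r:.2f})")
plt.show()
```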

The strength of a correlation between quantitative variables is typically measured using a statistic called Pearson’s correlation coefficient (or Pearson’s r). As Figure 6.4 shows, Pearson’s r ranges from −1.00 (the strongest possible negative relationship) to +1.00 (the strongest possible positive relationship). A value of 0 means there is no relationship between the two variables. When Pearson’s r is 0, the points on a scatterplot form a shapeless “cloud.” As its value moves toward −1.00 or +1.00, the points come closer and closer to falling on a single straight line. Correlation coefficients near ±.10 are considered small, values near ±.30 are considered medium, and values near ±.50 are considered large. Notice that the sign of Pearson’s r is unrelated to its strength. Pearson’s r values of +.30 and −.30, for example, are equally strong; it is just that one represents a moderate positive relationship and the other a moderate negative relationship. With the exception of reliability coefficients, most correlations that we find in psychology are small or moderate in size. The website http://rpsychologist.com/d3/correlation/, created by Kristoffer Magnusson, provides an excellent interactive visualization of correlations that permits you to adjust the strength and direction of a correlation while witnessing the corresponding changes to the scatterplot.

Figure 6.4 Range of Pearson’s r, From −1.00 (Strongest Possible Negative Relationship), Through 0 (No Relationship), to +1.00 (Strongest Possible Positive Relationship)
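A quick way to see that the sign of r says nothing about its strength is to reverse the scoring of one variable: the magnitude stays the same while the sign flips. A minimal sketch with synthetic data:

```python
# A minimal sketch with synthetic data: reversing the scoring of one variable
# flips the sign of Pearson's r but leaves its strength unchanged.
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=200)
y = 0.3 * x + rng.normal(size=200)       # modest positive relationship

r_pos = np.corrcoef(x, y)[0, 1]          # positive r
r_neg = np.corrcoef(x, -y)[0, 1]         # same magnitude, negative sign

print(f"r(x, y) = {r_pos:+.2f}, r(x, -y) = {r_neg:+.2f}")
```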

There are two common situations in which the value of Pearson’s  r  can be misleading. Pearson’s  r  is a good measure only for linear relationships, in which the points are best approximated by a straight line. It is not a good measure for nonlinear relationships, in which the points are better approximated by a curved line. Figure 6.5, for example, shows a hypothetical relationship between the amount of sleep people get per night and their level of depression. In this example, the line that best approximates the points is a curve—a kind of upside-down “U”—because people who get about eight hours of sleep tend to be the least depressed. Those who get too little sleep and those who get too much sleep tend to be more depressed. Even though Figure 6.5 shows a fairly strong relationship between depression and sleep, Pearson’s  r  would be close to zero because the points in the scatterplot are not well fit by a single straight line. This means that it is important to make a scatterplot and confirm that a relationship is approximately linear before using Pearson’s  r . Nonlinear relationships are fairly common in psychology, but measuring their strength is beyond the scope of this book.

Figure 6.5 Hypothetical Nonlinear Relationship Between Sleep and Depression
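The point is easy to demonstrate by simulation. In this minimal sketch (synthetic data, with the shape of the curve assumed for illustration), a clear curvilinear relationship produces an r close to zero:

```python
# A minimal sketch with synthetic data: a curvilinear relationship (depression
# lowest near eight hours of sleep) yields a Pearson's r near zero, because the
# points are not well fit by any single straight line.
import numpy as np

rng = np.random.default_rng(7)
sleep = rng.uniform(4, 12, size=200)                        # hours per night
depression = (sleep - 8) ** 2 + rng.normal(0, 1, size=200)  # lowest near 8 hours

r = np.corrcoef(sleep, depression)[0, 1]
print(f"Pearson's r = {r:.2f}")   # close to zero despite the clear pattern
```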

The other common situation in which the value of Pearson’s r can be misleading is when one or both of the variables have a limited range in the sample relative to the population. This problem is referred to as restriction of range. Assume, for example, that there is a strong negative correlation between people’s age and their enjoyment of hip hop music, as shown by the scatterplot in Figure 6.6. Pearson’s r here is −.77. However, if we were to collect data only from 18- to 24-year-olds—represented by the shaded area of Figure 6.6—then the relationship would seem to be quite weak. In fact, Pearson’s r for this restricted range of ages is 0. It is a good idea, therefore, to design studies to avoid restriction of range. For example, if age is one of your primary variables, then you can plan to collect data from people of a wide range of ages. Because restriction of range is not always anticipated or easily avoidable, however, it is good practice to examine your data for possible restriction of range and to interpret Pearson’s r in light of it. (There are also statistical methods to correct Pearson’s r for restriction of range, but they are beyond the scope of this book.)

Figure 6.6 Hypothetical Data Showing How a Strong Overall Correlation Can Appear to Be Weak When One Variable Has a Restricted Range. The overall correlation here is −.77, but the correlation for the 18- to 24-year-olds (in the blue box) is 0.
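Restriction of range is also easy to see in a simulation. In this minimal sketch (synthetic data; the slope and noise level are chosen only so that the full-sample correlation comes out strongly negative), the correlation among 18- to 24-year-olds is far weaker than in the full sample:

```python
# A minimal sketch with synthetic data: a strong negative age/enjoyment
# correlation in the full sample nearly vanishes when only 18- to
# 24-year-olds are sampled.
import numpy as np

rng = np.random.default_rng(3)
age = rng.uniform(18, 70, size=500)
enjoyment = 10 - 0.1 * age + rng.normal(0, 1.3, size=500)   # declines with age

r_full = np.corrcoef(age, enjoyment)[0, 1]

restricted = (age >= 18) & (age <= 24)                      # restricted range
r_restricted = np.corrcoef(age[restricted], enjoyment[restricted])[0, 1]

print(f"full sample: r = {r_full:.2f}; ages 18-24 only: r = {r_restricted:.2f}")
```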

Correlation Does Not Imply Causation

You have probably heard repeatedly that “Correlation does not imply causation.” An amusing example of this comes from a 2012 study that showed a positive correlation (Pearson’s r = 0.79) between the per capita chocolate consumption of a nation and the number of Nobel prizes awarded to citizens of that nation [2] . It seems clear, however, that this does not mean that eating chocolate causes people to win Nobel prizes, and it would not make sense to try to increase the number of Nobel prizes won by recommending that parents feed their children more chocolate.

There are two reasons that correlation does not imply causation. The first is called the directionality problem. Two variables, X and Y, can be statistically related because X causes Y or because Y causes X. Consider, for example, a study showing that whether or not people exercise is statistically related to how happy they are—such that people who exercise are happier on average than people who do not. This statistical relationship is consistent with the idea that exercising causes happiness, but it is also consistent with the idea that happiness causes exercise. Perhaps being happy gives people more energy or leads them to seek opportunities to socialize with others by going to the gym. The second reason that correlation does not imply causation is called the third-variable problem. Two variables, X and Y, can be statistically related not because X causes Y, or because Y causes X, but because some third variable, Z, causes both X and Y. For example, the fact that nations that have won more Nobel prizes tend to have higher chocolate consumption probably reflects geography, in that European countries tend to have higher rates of per capita chocolate consumption and invest more in education and technology (once again, per capita) than many other countries in the world. Similarly, the statistical relationship between exercise and happiness could mean that some third variable, such as physical health, causes both of the others. Being physically healthy could cause people to exercise and cause them to be happier. Correlations that are the result of a third variable are often referred to as spurious correlations.
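The third-variable problem can be demonstrated directly by simulation. In this minimal sketch (synthetic data; the causal structure is built in by construction), exercise and happiness end up correlated only because physical health drives both:

```python
# A minimal sketch with synthetic data: a third variable Z (here, physical
# health) causes both X (exercise) and Y (happiness), producing a sizeable
# X-Y correlation even though neither causes the other.
import numpy as np

rng = np.random.default_rng(11)
health = rng.normal(size=1000)               # Z
exercise = health + rng.normal(size=1000)    # X, caused in part by Z
happiness = health + rng.normal(size=1000)   # Y, caused in part by Z

r_xy = np.corrcoef(exercise, happiness)[0, 1]
print(f"exercise-happiness r = {r_xy:.2f}")  # about .50, entirely via Z
```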

Some excellent and funny examples of spurious correlations can be found at http://www.tylervigen.com (Figure 6.7 provides one such example).

Figure 6.7 Example of a Spurious Correlation. Source: http://tylervigen.com/spurious-correlations (CC-BY 4.0)

“Lots of Candy Could Lead to Violence”

Although researchers in psychology know that correlation does not imply causation, many journalists do not. One website about correlation and causation, http://jonathan.mueller.faculty.noctrl.edu/100/correlation_or_causation.htm , links to dozens of media reports about real biomedical and psychological research. Many of the headlines suggest that a causal relationship has been demonstrated when a careful reading of the articles shows that it has not because of the directionality and third-variable problems.

One such article is about a study showing that children who ate candy every day were more likely than other children to be arrested for a violent offense later in life. But could candy really “lead to” violence, as the headline suggests? What alternative explanations can you think of for this statistical relationship? How could the headline be rewritten so that it is not misleading?

As you have learned by reading this book, there are various ways that researchers address the directionality and third-variable problems. The most effective is to conduct an experiment. For example, instead of simply measuring how much people exercise, a researcher could bring people into a laboratory and randomly assign half of them to run on a treadmill for 15 minutes and the rest to sit on a couch for 15 minutes. Although this seems like a minor change to the research design, it is extremely important. Now if the exercisers end up in more positive moods than those who did not exercise, it cannot be because their moods affected how much they exercised (because it was the researcher who determined how much they exercised). Likewise, it cannot be because some third variable (e.g., physical health) affected both how much they exercised and what mood they were in (because, again, it was the researcher who determined how much they exercised). Thus experiments eliminate the directionality and third-variable problems and allow researchers to draw firm conclusions about causal relationships.
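This logic, too, can be illustrated with a small simulation. In this minimal sketch (synthetic data), a third variable is correlated with exercise when participants choose their own exercise levels, but not when the researcher assigns exercise at random:

```python
# A minimal sketch with synthetic data: random assignment breaks the link
# between a third variable (health) and the manipulated variable (exercise),
# which is what lets experiments support causal conclusions.
import numpy as np

rng = np.random.default_rng(5)
health = rng.normal(size=1000)                       # pre-existing third variable

self_selected = health + rng.normal(size=1000)       # exercise chosen by participants
randomly_assigned = rng.permutation([0, 1] * 500)    # 0 = couch, 1 = treadmill

print(np.corrcoef(health, self_selected)[0, 1])      # substantial correlation
print(np.corrcoef(health, randomly_assigned)[0, 1])  # near zero, by design
```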

Key Takeaways

  • Correlational research involves measuring two variables and assessing the relationship between them, with no manipulation of an independent variable.
  • Correlation does not imply causation. A statistical relationship between two variables,  X  and  Y , does not necessarily mean that  X  causes  Y . It is also possible that  Y  causes  X , or that a third variable,  Z , causes both  X  and  Y .
  • While correlational research cannot be used to establish causal relationships between variables, correlational research does allow researchers to achieve many other important objectives (establishing reliability and validity, providing converging evidence, describing relationships, and making predictions).
  • Correlation coefficients can range from -1 to +1. The sign indicates the direction of the relationship between the variables and the numerical value indicates the strength of the relationship.
Exercises

1. Practice: For each of the following, decide whether it is most likely that the study described is experimental or correlational and explain why.

  • A cognitive psychologist compares the ability of people to recall words that they were instructed to “read” with their ability to recall words that they were instructed to “imagine.”
  • A manager studies the correlation between new employees’ college grade point averages and their first-year performance reports.
  • An automotive engineer installs different stick shifts in a new car prototype, each time asking several people to rate how comfortable the stick shift feels.
  • A food scientist studies the relationship between the temperature inside people’s refrigerators and the amount of bacteria on their food.
  • A social psychologist tells some research participants that they need to hurry over to the next building to complete a study. She tells others that they can take their time. Then she observes whether they stop to help a research assistant who is pretending to be hurt.

2. Practice: For each of the following statistical relationships, decide whether the directionality problem is present and think of at least one plausible third variable.

  • People who eat more lobster tend to live longer.
  • People who exercise more tend to weigh less.
  • College students who drink more alcohol tend to have poorer grades.
Notes

1. Bushman, B. J., & Huesmann, L. R. (2001). Effects of televised violence on aggression. In D. Singer & J. Singer (Eds.), Handbook of children and the media (pp. 223–254). Thousand Oaks, CA: Sage.
2. Messerli, F. H. (2012). Chocolate consumption, cognitive function, and Nobel laureates. New England Journal of Medicine, 367, 1562–1564.
