UNDERSTANDING OF CORRELATION METHODS
A mistake researchers make more often than they should is to assume that, because two variables are highly correlated, one is responsible for variation in the other. Always remember that correlation does not imply causation. Suppose we conduct a study that reveals a strong positive correlation between the consumption of alcohol and aggressiveness. On this basis it cannot be concluded that alcohol causes aggressiveness. You could equally argue that aggressiveness causes people to drink more, or the relationship may be the product of a third factor, such as upbringing. Perhaps having hostile parents leads people to be aggressive and also to drink more. It is therefore possible that upbringing encourages alcohol consumption and aggressiveness, without each having a direct effect on the other.
There are many real-life examples of spurious correlations that have arisen from the influence of a third factor. For example, when researchers found a high correlation between the presence of ‘spongy tar’ in children’s playgrounds and the incidence of polio, they misguidedly inferred that ‘spongy tar’ caused polio. As a result, some schools went to great expense to get rid of it. In fact, spongy tar and polio were both linked to a third factor: excessively high temperature. So it was this that needed to be controlled, not the type of tar in the playground. This inability to draw strict causal inferences (and the associated temptation to do so) is by far the most serious problem associated with both correlational and survey methodology.
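To make the third-variable problem concrete, here is a minimal simulation sketch (the variable names, coefficients and sample size are illustrative assumptions, not data from any study). Two variables that never influence one another end up strongly correlated simply because both depend on a common cause:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical third factor (think 'upbringing') that drives both variables.
# Neither x nor y has any direct effect on the other.
z = rng.normal(size=1000)                       # third factor
x = 0.7 * z + rng.normal(scale=0.5, size=1000)  # e.g. alcohol consumption
y = 0.7 * z + rng.normal(scale=0.5, size=1000)  # e.g. aggressiveness

# x and y correlate strongly purely because both depend on z.
print(np.corrcoef(x, y)[0, 1])  # roughly 0.65 with these settings
```

Controlling for z (for example, by partialling it out) would make the apparent x–y relationship vanish, just as controlling temperature, not tar, was what mattered in the polio example.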
Measurement of Correlation
Correlations are usually measured in terms of correlation coefficients. [correlation coefficient: a measure of the degree of correspondence or association between two variables that are being studied] The most common of these is the Pearson product-moment correlation, or Pearson’s r. [Pearson’s r: the commonly used name for Pearson’s product-moment correlation coefficient] The value of r indicates how strong a correlation is and can vary from −1.00 to 1.00.
As with t-tests, computation of Pearson’s r involves going through a series of standard steps. These allow us to establish whether high scores on one variable are associated with high scores on the other, and whether low scores on one variable are associated with low scores on the other.
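As a sketch of those steps (the formula is the standard product-moment definition; the implementation below is illustrative, not the text’s own worked example), r is the sum of cross-products of the deviations about each mean, divided by the square root of the product of the two sums of squared deviations:

```python
import math

def pearson_r(xs, ys):
    """Pearson product-moment correlation: the cross-product of deviations
    about each mean, divided by the product of the deviation magnitudes."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    dx = [x - mean_x for x in xs]
    dy = [y - mean_y for y in ys]
    cross = sum(a * b for a, b in zip(dx, dy))
    return cross / math.sqrt(sum(a * a for a in dx) * sum(b * b for b in dy))

print(pearson_r([1, 2, 3, 4], [2, 4, 5, 9]))  # about 0.97: high with high, low with low
```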
An r-value of 1.00 indicates a perfect positive correlation, and an r-value of −1.00 indicates a perfect negative correlation. In both these cases, the value of one variable can be predicted precisely for any value of the other variable. An r-value of 0.00 indicates there is no relationship between the variables at all.
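For instance, reusing the pearson_r sketch above with illustrative numbers:

```python
print(pearson_r([1, 2, 3, 4], [10, 20, 30, 40]))  #  1.0: perfect positive correlation
print(pearson_r([1, 2, 3, 4], [40, 30, 20, 10]))  # -1.0: perfect negative correlation
print(pearson_r([1, 2, 3, 4], [7, 3, 9, 5]))      #  0.0: no linear relationship
```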
[Karl Pearson (1857–1936) graduated from Cambridge University in 1879 but spent most of his career at University College, London. His book The Grammar of Science (1892) was remarkable in that it anticipated some of the ideas of relativity theory. Pearson then became interested in developing mathematical methods for studying the processes of heredity and evolution. He was a major contributor to statistics, building on past techniques and developing new concepts and theories. He defined the term ‘standard deviation’ in 1893. Pearson’s other important contributions include the method of moments, the Pearson system of curves, correlation and the chi-squared test. He was the first Galton Professor of Eugenics at University College, London, holding the chair from 1911 to 1933.]