Correlation
Covariance
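The covariance of two statistical series is a measure of how the two series vary together. It is positive when the series tend to deviate from their means in the same direction and negative when they tend to deviate in opposite directions. A covariance of zero does not by itself imply that the series are independent.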
Calculation of covariance from population data
X and Y are two population datasets,
`X = {x_1, x_2, ..., x_N}`
`Y = {y_1, y_2, ..., y_N}`
We denote by `bar x` the arithmetic mean of the X series, `bar x = 1/N.sum_{i=1}^{i=N}x_i`
The arithmetic mean of the Y series is `bar y`, `bar y = 1/N.sum_{i=1}^{i=N}y_i`
The covariance of the X and Y series is calculated as follows:
`\sigma _{xy} = \frac{1}{N}sum_{i=1}^{i=N} (x_i - bar x) (y_i - bar y)`
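The population formula above can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the function name `population_covariance` and the sample values are assumptions chosen for the example.

```python
def population_covariance(xs, ys):
    """Population covariance: sum of products of deviations, divided by N."""
    if len(xs) != len(ys) or not xs:
        raise ValueError("both series must be non-empty and of equal length")
    n = len(xs)
    mean_x = sum(xs) / n  # bar x
    mean_y = sum(ys) / n  # bar y
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / n

# Illustrative data (hypothetical values): Y is exactly 2 * X.
X = [1, 2, 3, 4, 5]
Y = [2, 4, 6, 8, 10]
print(population_covariance(X, Y))  # 4.0
```

Here the means are 3 and 6, the products of deviations sum to 20, and dividing by N = 5 gives 4.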
Calculation of covariance from sample data
In this case, values are available only for a sample and not for the entire population. The covariance of the entire population is then estimated from the sample as follows.
X and Y are two sample series,
`X={x_1,x_2,...,x_n}`
`Y={y_1,y_2,...,y_n}`
The averages of the two samples are `bar x` and `bar y`,
`bar x = 1/n.sum_{i=1}^{i=n}x_i`
`bar y = 1/n.sum_{i=1}^{i=n}y_i`
The unbiased covariance estimator for the entire population is:
`\sigma _{xy} = \frac{1}{n-1}sum_{i=1}^{i=n} (x_i - bar x) (y_i - bar y)`
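The only change from the population formula is the divisor, n - 1 instead of n. A minimal sketch, assuming a hypothetical helper named `sample_covariance` and made-up sample values:

```python
def sample_covariance(xs, ys):
    """Unbiased sample covariance: sum of products of deviations, divided by n - 1."""
    if len(xs) != len(ys):
        raise ValueError("both series must have the same length")
    n = len(xs)
    if n < 2:
        raise ValueError("at least two observations are required")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / (n - 1)

# Illustrative data (hypothetical values).
X = [1, 2, 3, 4, 5]
Y = [2, 4, 6, 8, 10]
print(sample_covariance(X, Y))  # 5.0
```

With the same five values, the products of deviations still sum to 20, but dividing by n - 1 = 4 gives 5 instead of 4. The gap between the two formulas shrinks as n grows.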
Pearson Correlation Coefficient
What is commonly called 'correlation' in statistics is the Pearson linear correlation coefficient. For two series, it is equal to their covariance divided by the product of their standard deviations, and it always lies between -1 and 1.
X and Y are two datasets,
`X = {x_1, x_2, ..., x_N}`
`Y = {y_1, y_2, ..., y_N}`
We denote by `bar x` the arithmetic mean of the X series, `bar x = 1/N.sum_{i=1}^{i=N}x_i`
The arithmetic mean of the Y series is `bar y`, `bar y = 1/N.sum_{i=1}^{i=N}y_i`
The correlation coefficient of the X and Y series is calculated as follows:
`r = \frac{sum_{i=1}^{i=N} (x_i - bar x) (y_i - bar y)}{sqrt(sum_{i=1}^{i=N} (x_i - bar x)^2) . sqrt(sum_{i=1}^{i=N} (y_i - bar y)^2)}`
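The formula for r translates directly into code, since the 1/N factors of the covariance and of the standard deviations cancel. The following is a sketch under assumptions: the name `pearson_r` and the data are illustrative, and raising an error for a constant series is one possible design choice, since r is undefined when a standard deviation is zero.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation: sum of cross-deviations over the product of root sums of squares."""
    if len(xs) != len(ys) or not xs:
        raise ValueError("both series must be non-empty and of equal length")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cross = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    if ss_x == 0 or ss_y == 0:
        raise ValueError("r is undefined when a series is constant")
    return cross / (math.sqrt(ss_x) * math.sqrt(ss_y))

# Illustrative data (hypothetical values).
X = [1, 2, 3, 4, 5]
Y = [2, 4, 5, 4, 5]
print(round(pearson_r(X, Y), 4))  # 0.7746
```

For this data the numerator is 6 and the two sums of squares are 10 and 6, so r = 6 / sqrt(60), approximately 0.7746.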
Coefficient of determination R²
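The coefficient of determination indicates how well a linear regression predicts the data: it is the proportion of the variance of the observed values that is explained by the regression. A value close to 1 indicates a good fit.
How to calculate the coefficient of determination?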
X is a dataset `X = {x_1, x_2, ..., x_N}`
We denote by `bar x` the arithmetic mean of the X series, `bar x = 1/N.sum_{i=1}^{i=N}x_i`
The coefficient of determination of the X series can be calculated as follows:
`R^2 = 1 - \frac{sum_{i=1}^{i=N} (x_i - hat x_i)^2}{sum_{i=1}^{i=N} (x_i - bar x)^2}`
where `{hat x_1, hat x_2,..., hat x_N}` are the values predicted for the X series by the linear regression.
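As a rough sketch of the R² formula, the function below takes the observed values and the predicted values as two lists. The name `r_squared` and the numbers are assumptions for illustration: the predictions are taken to come from the line 2.2 + 0.6 t evaluated at t = 1, ..., 5, which is not part of the original text.

```python
def r_squared(observed, predicted):
    """R^2 = 1 - (residual sum of squares) / (total sum of squares)."""
    if len(observed) != len(predicted) or not observed:
        raise ValueError("both series must be non-empty and of equal length")
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((x - x_hat) ** 2 for x, x_hat in zip(observed, predicted))
    ss_tot = sum((x - mean_obs) ** 2 for x in observed)
    if ss_tot == 0:
        raise ValueError("R^2 is undefined when the observed series is constant")
    return 1 - ss_res / ss_tot

# Illustrative data (hypothetical values).
observed = [2, 4, 5, 4, 5]
# Assumed predictions from the line 2.2 + 0.6 * t for t = 1..5.
predicted = [2.8, 3.4, 4.0, 4.6, 5.2]
print(round(r_squared(observed, predicted), 4))  # 0.6
```

The squared residuals sum to 2.4 and the total sum of squares is 6, so R² = 1 - 2.4 / 6 = 0.6. For a simple least-squares line, R² equals the square of the Pearson coefficient between the predictor and the observed values.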
See also
Standard deviation
Arithmetic mean
Linear Regression