Source code for simplestatistics.statistics.covariance

from .mean import mean

[docs]def covariance(x, y): """ (Sample) Covariance_ is a measure of how two random variables vary together. When the greater values of one variable correspond to the greater values of the other variable, this is a positive covariance. Whereas when the greater values of one variable correspond to the lesser values of the other variable, this is negative covariance. .. _Covariance: https://en.wikipedia.org/wiki/Covariance This `Cross Validated answer`_ provides a good explanation of the difference between covariance and correlation. Covariance is understood in the context of the units and scales involved. You cannot compare covariances across those contexts. A correlation is a "normalized" covariance that will always be a value between -1 and 1 and takes into account the scale of the variables. .. _`Cross Validated answer`: http://stats.stackexchange.com/a/18089 Equation: .. math:: q_{jk} = \\frac{1}{N - 1} \\sum\\limits_{i=1}^N (X_{ij} - \\bar{X_j}) (X_{ik} - \\bar{X_k}) Args: x: A list of numerical objects. y: A list of numerical objects that has the same length as x. Returns: A numerical object. Examples: >>> covariance([1,2,3,4,5,6], [6,5,4,3,2,1]) -3.5 >>> covariance([1,2,3], [4, 4.5, 5]) 0.5 >>> covariance(2, 3) Traceback (most recent call last): ... ValueError: To calculate covariance you need lists or tuples of equal length. Length must be > 1. >>> covariance([2, 4], [6, 6.5, 7]) Traceback (most recent call last): ... ValueError: To calculate covariance you need lists or tuples of equal length. Length must be > 1. >>> covariance([1], [-1]) Traceback (most recent call last): ... ValueError: covariance requires lists of equal length where length is > 1. """ if type(x) not in [list, tuple] or type(y) not in [list, tuple]: raise ValueError("To calculate covariance you need lists or tuples of equal length. Length must be > 1.") if len(x) != len(y): raise ValueError("To calculate covariance you need lists or tuples of equal length. Length must be > 1.") if len(x) <= 1 or len(y) <= 1: raise ValueError("covariance requires lists of equal length where length is > 1.") xmean = mean(x[:]) ymean = mean(y[:]) covsum = 0 n = len(x) for ii in list(range(n)): covsum += (x[ii] - xmean) * (y[ii] - ymean) bessels_correction = n - 1 return(covsum / bessels_correction)