Sure, here are the formulas for the population and sample variance and standard deviation, as computed by the functions in Python's `statistics` module:
Population Variance (`pvariance`)
The population variance is calculated using the formula:
$$\sigma^2 = \frac{\sum_{i=1}^{n} (x_i - \mu)^2}{n}$$
Where:
- $\sigma^2$ is the population variance.
- $n$ is the number of data points in the population.
- $x_i$ represents each data point.
- $\mu$ is the population mean.
In Python, this is computed with `statistics.pvariance(data)`.
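The following is a minimal sketch that evaluates the formula directly and compares the result with the library function. The dataset and the helper `pvariance_by_formula` are hypothetical illustrations and are not part of the `statistics` module.

```python
import math
import statistics

def pvariance_by_formula(data):
    """Hypothetical helper: sum of squared deviations from the mean, divided by n."""
    n = len(data)
    mu = sum(data) / n
    return sum((x - mu) ** 2 for x in data) / n

# Illustrative data: mean 5, squared deviations sum to 32, n = 8
data = [2, 4, 4, 4, 5, 5, 7, 9]
print(pvariance_by_formula(data))  # 4.0
# The library result should match, up to floating-point rounding
print(math.isclose(pvariance_by_formula(data), statistics.pvariance(data)))  # True
```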
Population Standard Deviation (`pstdev`)
The population standard deviation is the square root of the population variance:
$$\sigma = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \mu)^2}{n}}$$
Where:
- $\sigma$ is the population standard deviation.
- $n$ is the number of data points in the population.
- $x_i$ represents each data point.
- $\mu$ is the population mean.
In Python, this is computed with `statistics.pstdev(data)`.
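The relationship $\sigma = \sqrt{\sigma^2}$ can be checked directly. This is a sketch using an illustrative dataset whose population standard deviation is exactly 2.

```python
import math
import statistics

# Illustrative data with population variance 4, so sigma should be 2
data = [2, 4, 4, 4, 5, 5, 7, 9]

sigma = statistics.pstdev(data)
print(sigma)  # 2.0
# pstdev agrees with the square root of pvariance, up to floating-point rounding
print(math.isclose(sigma, math.sqrt(statistics.pvariance(data))))  # True
```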
Sample Variance (`variance`)
The sample variance is calculated using the formula:
$$s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}$$
Where:
- $s^2$ is the sample variance.
- $n$ is the number of data points in the sample.
- $x_i$ represents each data point.
- $\bar{x}$ is the sample mean.
In Python, this is computed with `statistics.variance(data)`.
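This sketch evaluates the $n - 1$ formula directly and compares it with `statistics.variance`. The helper `variance_by_formula` and the dataset are hypothetical illustrations. The last part shows that the library raises `statistics.StatisticsError` for a single data point, since $n - 1$ would be zero.

```python
import math
import statistics

def variance_by_formula(data):
    """Hypothetical helper: squared deviations from the sample mean, divided by n - 1."""
    n = len(data)
    if n < 2:
        raise ValueError("sample variance needs at least two data points")
    xbar = sum(data) / n
    return sum((x - xbar) ** 2 for x in data) / (n - 1)

# Illustrative data: squared deviations from the mean sum to 32, n - 1 = 7
data = [2, 4, 4, 4, 5, 5, 7, 9]
print(variance_by_formula(data))  # should equal 32 / 7
print(math.isclose(variance_by_formula(data), statistics.variance(data)))  # True

# statistics.variance rejects a single data point
try:
    statistics.variance([42])
except statistics.StatisticsError as exc:
    print("StatisticsError:", exc)
```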
Sample Standard Deviation (`stdev`)
The sample standard deviation is the square root of the sample variance:
$$s = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}}$$
Where:
- $s$ is the sample standard deviation.
- $n$ is the number of data points in the sample.
- $x_i$ represents each data point.
- $\bar{x}$ is the sample mean.
In Python, this is computed with `statistics.stdev(data)`.
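Here is a short sketch, on an illustrative dataset, of two points. First, `stdev` is the square root of `variance`. Second, the $n - 1$ denominator makes it larger than `pstdev` on the same data.

```python
import math
import statistics

# Illustrative data: sample variance 32/7, population standard deviation 2
data = [2, 4, 4, 4, 5, 5, 7, 9]

s = statistics.stdev(data)
# stdev agrees with the square root of variance, up to floating-point rounding
print(math.isclose(s, math.sqrt(statistics.variance(data))))  # True
# Dividing by n - 1 instead of n gives a larger value than pstdev on the same data
print(s > statistics.pstdev(data))  # True
```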
Summary of Functions and Formulas
- Population Variance (`pvariance`): $\sigma^2 = \frac{\sum_{i=1}^{n} (x_i - \mu)^2}{n}$
- Population Standard Deviation (`pstdev`): $\sigma = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \mu)^2}{n}}$
- Sample Variance (`variance`): $s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}$
- Sample Standard Deviation (`stdev`): $s = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}}$
These formulas measure how widely the data are spread around their mean. Use `pvariance` and `pstdev` when the data cover the entire population, and `variance` and `stdev` when the data are a sample.
The primary difference between the first two functions (`pvariance` and `pstdev`) and the last two (`variance` and `stdev`) in the given `st_functions` class is whether they calculate statistics for a population or for a sample.
Population vs. Sample
- Population Variance and Standard Deviation:
  - Population Variance (`pvariance`): This is the average squared deviation of the data points from the population mean. $\sigma^2 = \frac{\sum_{i=1}^{n} (x_i - \mu)^2}{n}$
  - Population Standard Deviation (`pstdev`): This is the square root of the population variance. $\sigma = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \mu)^2}{n}}$
- Sample Variance and Standard Deviation:
  - Sample Variance (`variance`): This measures how much the data points in a sample deviate from the sample mean. It uses $n - 1$ as the denominator (Bessel's correction) because dividing by $n$ would, on average, underestimate the population variance. $s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}$
  - Sample Standard Deviation (`stdev`): This is the square root of the sample variance. $s = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}}$
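The two variances share the same sum of squared deviations and differ only in the denominator, so one can be converted into the other. Here $\sigma_n^2$ denotes the $n$-denominator value computed on the same data, with $\bar{x}$ as the mean (notation introduced for illustration):
$$s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1} = \frac{n}{n - 1} \cdot \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n} = \frac{n}{n - 1}\,\sigma_n^2$$
For the data $1, 2, 3, 4, 5$, the squared deviations sum to $10$. This gives $\sigma_n^2 = 10/5 = 2$ and $s^2 = \tfrac{5}{4} \cdot 2 = 2.5$.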
The Code Implementation
The provided `st_functions` class implements these formulas as static methods built from lambda functions. Unlike the `statistics` module functions, each method takes the precomputed mean as its second argument:
```python
class st_functions:
    # Population Variance (statistics.pvariance)
    pvariance = staticmethod(lambda data, mean: sum((x - mean) ** 2 for x in data) / len(data))
    # Population Standard Deviation (statistics.pstdev)
    pstdev = staticmethod(lambda data, mean: (sum((x - mean) ** 2 for x in data) / len(data)) ** 0.5)
    # Sample Variance (statistics.variance)
    variance = staticmethod(lambda data, mean: sum((x - mean) ** 2 for x in data) / (len(data) - 1))
    # Sample Standard Deviation (statistics.stdev)
    stdev = staticmethod(lambda data, mean: (sum((x - mean) ** 2 for x in data) / (len(data) - 1)) ** 0.5)
```
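You can compare each method with its `statistics` counterpart to confirm the class agrees with the standard library. The `statistics` functions also accept an optional precomputed mean: `pvariance` and `pstdev` take `mu`, and `variance` and `stdev` take `xbar`. This mirrors the `mean` argument of `st_functions`. The following is a minimal sketch; it repeats the class so the block runs on its own, and the float dataset is an arbitrary illustration.

```python
import math
import statistics

class st_functions:
    # Same definitions as above; each takes the data and its precomputed mean
    pvariance = staticmethod(lambda data, mean: sum((x - mean) ** 2 for x in data) / len(data))
    pstdev = staticmethod(lambda data, mean: (sum((x - mean) ** 2 for x in data) / len(data)) ** 0.5)
    variance = staticmethod(lambda data, mean: sum((x - mean) ** 2 for x in data) / (len(data) - 1))
    stdev = staticmethod(lambda data, mean: (sum((x - mean) ** 2 for x in data) / (len(data) - 1)) ** 0.5)

data = [2.5, 3.25, 5.5, 11.25, 11.75]  # illustrative values
m = statistics.mean(data)

# Pair each st_functions result with the library result, passing the same mean
checks = {
    "pvariance": (st_functions.pvariance(data, m), statistics.pvariance(data, mu=m)),
    "pstdev": (st_functions.pstdev(data, m), statistics.pstdev(data, mu=m)),
    "variance": (st_functions.variance(data, m), statistics.variance(data, xbar=m)),
    "stdev": (st_functions.stdev(data, m), statistics.stdev(data, xbar=m)),
}
for name, (ours, lib) in checks.items():
    print(name, math.isclose(ours, lib))  # each line should end in True
```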
Differences
- Denominator:
  - Population (`pvariance` and `pstdev`): The denominator is $n$, the total number of data points.
  - Sample (`variance` and `stdev`): The denominator is $n - 1$, which removes the bias in estimating the population variance from a sample (Bessel's correction).
- Context:
  - Population: These functions (`pvariance` and `pstdev`) assume that the data represent the entire population.
  - Sample: These functions (`variance` and `stdev`) assume that the data are a sample drawn from a larger population. Using $n - 1$ makes $s^2$ an unbiased estimate of the population variance $\sigma^2$. Its square root $s$ remains slightly biased as an estimate of $\sigma$ and tends to underestimate it.
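The unbiasedness of $s^2$ can be checked exactly on a toy case. This sketch assumes independent draws, that is, sampling with replacement, which is the setting in which $E[s^2] = \sigma^2$ holds. It enumerates every ordered sample of size 2 from the population 1 to 5 and averages both variance formulas over all samples with exact `Fraction` arithmetic. The population and sample size are arbitrary illustrative choices.

```python
import itertools
import statistics
from fractions import Fraction

# Illustrative population; Fraction keeps every average exact
population = [Fraction(x) for x in [1, 2, 3, 4, 5]]
sigma2 = statistics.pvariance(population)  # true population variance

# Every ordered sample of size 2 drawn with replacement (5 * 5 = 25 samples)
n = 2
samples = list(itertools.product(population, repeat=n))

# Averaging over all equally likely samples gives each estimator's exact expected value
mean_sample_var = sum(statistics.variance(s) for s in samples) / len(samples)
mean_biased_var = sum(statistics.pvariance(s) for s in samples) / len(samples)

print(sigma2, mean_sample_var, mean_biased_var)  # 2 2 1
```

The $n - 1$ version averages exactly to $\sigma^2 = 2$. The $n$-denominator version averages to $\frac{n-1}{n}\sigma^2 = 1$.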
Practical Example
Consider a small dataset and compute both population and sample statistics with the `st_functions` class:
```python
import statistics

# st_functions is the class defined above
data = [1, 2, 3, 4, 5]
mean = statistics.mean(data)

# Population Variance and Standard Deviation
pop_var = st_functions.pvariance(data, mean)
pop_std = st_functions.pstdev(data, mean)
print(f"Population Variance: {pop_var}")
print(f"Population Standard Deviation: {pop_std}")

# Sample Variance and Standard Deviation
sample_var = st_functions.variance(data, mean)
sample_std = st_functions.stdev(data, mean)
print(f"Sample Variance: {sample_var}")
print(f"Sample Standard Deviation: {sample_std}")
```
Output
```text
Population Variance: 2.0
Population Standard Deviation: 1.4142135623730951
Sample Variance: 2.5
Sample Standard Deviation: 1.5811388300841898
```
Conclusion
- Population functions (`pvariance` and `pstdev`): Use these when your data represent the entire population.
- Sample functions (`variance` and `stdev`): Use these when your data are a sample from a larger population. Their $n - 1$ denominator corrects the tendency of the $n$-denominator variance to underestimate the population variance.