Statistics

Standard Deviation

STANDARD DEVIATION

Standard deviation is a measure of dispersion: it describes how widely the values in a data set are spread around the mean.

A low standard deviation means the values lie close to the mean. A high standard deviation means the values are spread far from the mean.
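To make the contrast concrete, here is a minimal sketch comparing two made-up data sets that share the same mean but differ in spread. The values are illustrative assumptions, and `statistics.pstdev` from the Python standard library is used to get the population standard deviation.

```python
import statistics

# Two illustrative data sets (assumed values) with the same mean of 10.
tight = [9, 10, 11]   # values close to the mean
wide = [0, 10, 20]    # values far from the mean

tight_sd = statistics.pstdev(tight)  # population standard deviation
wide_sd = statistics.pstdev(wide)

print("mean:", statistics.mean(tight), statistics.mean(wide))
print("standard deviation:", round(tight_sd, 3), round(wide_sd, 3))
```

The second data set should report a standard deviation roughly ten times larger, even though both means are equal.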

STANDARD DEVIATION FORMULA

Sample standard deviation formula

s=\sqrt{\frac{\sum \left ( x-\bar{x} \right )^{2}}{n-1}}
  • s – sample standard deviation
  • x – individual data point
  • \bar{x} – sample mean
  • n – number of observations in the sample

Population standard deviation formula

\sigma=\sqrt{\frac{\sum \left ( x-\mu  \right )^{2}}{N}}
  • \sigma – population standard deviation
  • x – individual data point
  • \mu – population mean
  • N – number of observations in the population

STANDARD DEVIATION CALCULATION STEPS

  • Find the mean of the data set.
  • Subtract the mean from each data point and square the result.
  • Sum all the squared differences and divide by n-1 for a sample, or by N for a population. The result is the variance.
  • Take the square root of that value to get the standard deviation.
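The steps above can be sketched in Python as follows. This is a minimal illustration, not a reference implementation: the function name `standard_deviation`, its `sample` flag, and the example values are assumptions made for this sketch.

```python
import math

def standard_deviation(data, sample=True):
    """Return the standard deviation of data.

    sample=True divides by n - 1 (sample formula);
    sample=False divides by N (population formula).
    """
    n = len(data)
    if n == 0 or (sample and n < 2):
        raise ValueError("not enough data points")
    mean = sum(data) / n                              # step 1: mean
    squared_diffs = [(x - mean) ** 2 for x in data]   # step 2: squared differences
    divisor = n - 1 if sample else n
    variance = sum(squared_diffs) / divisor           # step 3: sum and divide
    return math.sqrt(variance)                        # step 4: square root

values = [8, 3, 9, 6, 4]  # illustrative data, mean = 6
print(standard_deviation(values))                # sample: sqrt(26 / 4)
print(standard_deviation(values, sample=False))  # population: sqrt(26 / 5)
```

For these values the squared differences sum to 26, so the sample result is the square root of 6.5 and the population result is the square root of 5.2.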

Mean Median Mode

MEAN, MEDIAN, MODE

The mean, median, and mode are three measures of central tendency that describe the center of a data set.

MEAN

The mean is the average of the data set. To find the arithmetic mean, add up all the observations and divide the sum by the number of observations.

Mean = Sum of all observations / Number of observations

Example: Find the mean of the following data set: 8, 3, 9, 6, 4

Sum of all observations = 8 + 3 + 9 + 6 + 4 = 30

Number of observations = 5

Mean = 30/5 = 6

MEAN OF UNGROUPED DATA

A raw set of individual values without any categories or intervals is referred to as ungrouped data.

Each data point in an ungrouped dataset represents a single observation or measurement.

To find the mean of ungrouped data, add up all the observations and divide the sum by the number of observations.

\mu=\frac{\sum_{i=1}^{n}x_{i}}{n}

Where

\mu is the mean of ungrouped data

x_{i} represents each individual value in the data set

n is the total number of observations
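As a quick check of the formula, the following sketch computes the mean of the data set 8, 3, 9, 6, 4. The function name `mean` is an assumption chosen for this illustration.

```python
def mean(values):
    """Arithmetic mean: sum of the observations divided by their count."""
    if not values:
        raise ValueError("mean of an empty data set is undefined")
    return sum(values) / len(values)

print(mean([8, 3, 9, 6, 4]))  # 30 / 5 = 6.0
```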

MEAN OF GROUPED DATA

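Grouped data organizes individual values into intervals or groups. Each interval covers a range of values and records the frequency, that is, the number of observations that fall within it.

Grouping makes large datasets easier to interpret and analyze, but some detail is lost compared with the individual values. Because the exact values inside each interval are not retained, the midpoint of each interval stands in for them, so the mean of grouped data is an approximation.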

\overline{x}=\frac{\sum_{i=1}^{k}m_{i}f_{i}}{N}

where,

\overline{x} is the mean of the grouped data

m_{i} is the midpoint of the i-th interval

f_{i} is the frequency of the i-th interval

k is the number of intervals

N is the total number of observations, that is, the sum of all the frequencies
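The grouped-data formula can be sketched as below. The intervals and frequencies are made-up values for illustration, and the function name `grouped_mean` is an assumption of this sketch.

```python
def grouped_mean(intervals, frequencies):
    """Approximate mean of grouped data.

    intervals: list of (lower, upper) bounds, one pair per interval
    frequencies: number of observations in each interval
    """
    if len(intervals) != len(frequencies):
        raise ValueError("each interval needs exactly one frequency")
    total = sum(frequencies)  # N, the total number of observations
    if total == 0:
        raise ValueError("no observations")
    midpoints = [(lower + upper) / 2 for lower, upper in intervals]
    weighted_sum = sum(m * f for m, f in zip(midpoints, frequencies))
    return weighted_sum / total

# Illustrative grouped data (assumed values).
intervals = [(0, 10), (10, 20), (20, 30)]
frequencies = [2, 5, 3]
print(grouped_mean(intervals, frequencies))  # (5*2 + 15*5 + 25*3) / 10 = 16.0
```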

MEDIAN

The median is the middle value of a numerically ordered data set.

The order can be ascending or descending, since the middle value is the same either way, but ascending order is the more common choice.

To calculate the median:

  1. Arrange the dataset in ascending or descending order.
  2. If the number of observations (or data points) is odd, the median is the middle value.
  3. If the number of observations is even, the median is the average of the two middle values.

For example, consider the dataset {4, 7, 2, 5, 10}.

1. Arrange the dataset in ascending order: {2, 4, 5, 7, 10}.

2. Since the number of observations is odd (5), the median is the middle value, which is 5.

Another example with an even number of observations: {3, 5, 9, 10, 1, 7}:

  1. Arrange the dataset: {1, 3, 5, 7, 9, 10}.
  2. Since the number of observations is even (6), the median is the average of the two middle values, which are 5 and 7. So, the median is (5 + 7) / 2 = 6.

Example 01: 4,5,12,17,20

The median of the above data is 12 because it is the middle value.

Example 02:  3,9,5,12,10

The data above is not in order, so we first arrange it in ascending order:

3,5,9,10,12

So median = 9

Median Formula

n = number of observations (with the data arranged in order)

If n is odd

Median = \left(\frac{n+1}{2}\right)^{th}\text{ observation}

If n is even

Median = \frac{\left(\frac{n}{2}\right)^{th}\text{ observation}+\left(\frac{n}{2}+1\right)^{th}\text{ observation}}{2}
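The two cases of the median formula can be sketched in Python as follows. The function name `median` is an assumption for this illustration. Note that the formula counts positions from 1, while Python lists are indexed from 0.

```python
def median(values):
    """Middle value of the ordered data; average of the two middle values if n is even."""
    if not values:
        raise ValueError("median of an empty data set is undefined")
    ordered = sorted(values)  # arrange in ascending order
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        # Odd n: the ((n + 1) / 2)-th observation sits at zero-based index n // 2.
        return ordered[mid]
    # Even n: average the (n / 2)-th and (n / 2 + 1)-th observations.
    return (ordered[mid - 1] + ordered[mid]) / 2

print(median([3, 9, 5, 12, 10]))    # ordered: 3, 5, 9, 10, 12 -> 9
print(median([3, 5, 9, 10, 1, 7]))  # ordered: 1, 3, 5, 7, 9, 10 -> 6.0
```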

MODE

The mode is the most frequently occurring value in the dataset. A dataset with one mode is called unimodal, and a dataset with more than one mode is called multimodal. The mode can be useful for describing the typical or most common value in a dataset, especially for categorical or discrete data.

Let's consider the following data set.

2,5,6,6,7,12

In this data set, 6 appears most frequently, occurring two times, which makes it the mode of the data set.
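A minimal sketch of finding the mode by counting occurrences is shown below. The function name `modes` is an assumption for this illustration. It returns a list so that multimodal data sets are handled as well.

```python
from collections import Counter

def modes(values):
    """Return every value that occurs with the highest frequency, in ascending order."""
    if not values:
        return []
    counts = Counter(values)         # value -> number of occurrences
    highest = max(counts.values())   # the largest frequency
    return sorted(v for v, c in counts.items() if c == highest)

print(modes([2, 5, 6, 6, 7, 12]))  # [6]
print(modes([1, 1, 2, 2, 3]))      # [1, 2], a multimodal data set
```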

The mean is strongly influenced by outliers. The median is less sensitive to outliers, but it may not always reflect the overall distribution accurately. The mode is robust to outliers, because a few extreme values do not change which value occurs most often.
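The effect of an outlier on the three measures can be illustrated with Python's standard `statistics` module. This is a small sketch: the data set 2, 5, 6, 6, 7, 12 is reused, and the outlier value 1000 is an assumption chosen to exaggerate the effect.

```python
import statistics

original = [2, 5, 6, 6, 7, 12]
with_outlier = [2, 5, 6, 6, 7, 1000]  # 12 replaced by an extreme value (assumed)

for label, data in (("original", original), ("with outlier", with_outlier)):
    print(
        label,
        "mean:", round(statistics.mean(data), 2),
        "median:", statistics.median(data),
        "mode:", statistics.mode(data),
    )
```

The mean moves from about 6.33 to 171, while the median and the mode both stay at 6.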

While the mode identifies the most common value or values in a dataset, it says nothing about how the remaining values are distributed or how much variability the dataset contains.