Central limit theorem

The central limit theorem is one of the most important concepts in statistics. The reason for this is the unmatched practical application of the theorem.

Okay, let's get started then. Imagine that you are given a data set. Its distribution does not matter: it could be normal, uniform, binomial, or completely random. The first thing you want to do is start taking out subsets from the data set or, as statisticians call it, you start sampling it. This would allow you to get a better idea of how the entire data set is made, right? Okay.

Once you have taken a sufficient number of samples and calculated the mean of each sample, you will be able to apply the central limit theorem.

No matter the distribution of the entire data set, binomial, uniform, or another one, the means of the samples you took from the entire data set will approximate a normal distribution. The more samples you extract and the bigger they are, the closer to a normal distribution the sample means will be. Moreover, their distribution will have the same mean as the original data set and an n times smaller variance, where n is the size of the samples you took from the data set.
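The statement above can be checked with a small simulation. The sketch below is only an illustration under assumed settings that are not from the lecture: a uniform population of 100,000 values, 2,000 samples of size n = 25, and a fixed seed. It compares the mean and variance of the sample means with the population mean and the population variance divided by n.

```python
import random
import statistics


def sample_means(population, num_samples, sample_size, rng):
    """Return the mean of each of `num_samples` random samples."""
    return [
        statistics.fmean(rng.sample(population, sample_size))
        for _ in range(num_samples)
    ]


rng = random.Random(0)  # fixed seed (an assumption) so runs are repeatable

# Assumed population: uniform on [0, 1), which is clearly not normal.
population = [rng.random() for _ in range(100_000)]
n = 25  # size of each sample

means = sample_means(population, num_samples=2_000, sample_size=n, rng=rng)

pop_mean = statistics.fmean(population)
pop_var = statistics.pvariance(population)

print("population mean:         ", round(pop_mean, 4))
print("mean of sample means:    ", round(statistics.fmean(means), 4))
print("population variance / n: ", round(pop_var / n, 5))
print("variance of sample means:", round(statistics.variance(means), 5))
```

If the theorem holds, the first two printed values should be close to each other, and so should the last two. Swapping the uniform population for any other list of numbers is a simple way to try a different starting distribution.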

Let's confirm the theorem with an example. We have prepared 960 random numbers from 1 to 1,000. This is their frequency distribution, so you are sure that they are randomly picked. The mean of this data set is 489 and its variance is 82,805.

Let's extract 30 random samples out of the data set, each consisting of 25 numbers. Remember when we said that the samples should be sufficiently large? A common rule of thumb is that the sample should be bigger than 25 observations. The bigger the sample size, the better the results you'll get. So we have our samples. Now we are going to calculate their means and plot them once again. Okay, excellent.
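The experiment described above can be imitated in a few lines. This is a sketch, not the lecture's actual data: the 960 numbers are freshly generated here with an arbitrary seed, so the mean and variance will differ from the figures quoted in the lecture. The names `data`, `samples`, `sample_means`, and `bins` are illustrative.

```python
import random
import statistics

rng = random.Random(42)  # arbitrary seed, chosen only for repeatability

# 960 random integers from 1 to 1,000 (a stand-in for the lecture's data set).
data = [rng.randint(1, 1000) for _ in range(960)]

# 30 random samples of 25 numbers each (no repeats within a single sample).
samples = [rng.sample(data, 25) for _ in range(30)]
sample_means = [statistics.fmean(s) for s in samples]

print("data mean:    ", round(statistics.fmean(data), 1))
print("data variance:", round(statistics.pvariance(data), 1))

# Crude text histogram of the 30 sample means, in bins of width 50.
bins = {}
for m in sample_means:
    low = int(m // 50) * 50
    bins[low] = bins.get(low, 0) + 1

for low in sorted(bins):
    print(f"[{low:4d}, {low + 50:4d}) | {'*' * bins[low]}")
```

With only 30 means the histogram is rough, but the bars should pile up around the middle of the range rather than spreading evenly from 1 to 1,000.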

It looks approximately normally distributed, doesn't it? Let's check if the other part of the theorem was right. The mean of our newly acquired data set is 492, while its variance is 3,171. Did we expect these numbers? We anticipated a mean of 489 and a variance of 82,805 divided by 25, so around 3,312. Well, when dealing with such big numbers, we almost get the mean right, and the variance was not that far off either. In the next few lectures you will learn how to statistically confirm whether such small differences are close enough to the actual result we expect to obtain. Spoiler alert: they are, and we'll show you why.

So we have learned the main idea behind the central limit theorem. The key takeaway from this lesson is that as the number of samples taken tends towards infinity, the distribution of the means starts approximating a normal distribution.
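The comparison made in the example can be written out as a quick calculation. The figures below are the ones quoted in the lecture (data set mean 489, data set variance 82,805, sample size 25, observed mean 492, observed variance 3,171); the variable names are illustrative.

```python
pop_mean = 489          # mean of the original data set
pop_variance = 82_805   # variance of the original data set
n = 25                  # size of each sample

observed_mean = 492         # mean of the 30 sample means
observed_variance = 3_171   # variance of the 30 sample means

# The theorem predicts a variance n times smaller than the original one.
expected_variance = pop_variance / n
print("expected variance of the sample means:", expected_variance)

# Relative gaps between what the theorem predicts and what was observed.
mean_gap = abs(observed_mean - pop_mean) / pop_mean
variance_gap = abs(observed_variance - expected_variance) / expected_variance
print(f"mean differs by {mean_gap:.1%}, variance by {variance_gap:.1%}")
```

The expected variance comes out at 3,312.2, and the relative gaps are roughly 0.6% for the mean and 4.3% for the variance. Whether gaps of that size are statistically acceptable is the question the lecture defers to later lessons.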

Imagine the power of this if your data set was made up of millions of values and you could afford to sample just a tiny bit of them. We can work with normally distributed sample means almost all the time, and that's extremely helpful, as you will see later on. Okay, thanks for watching.
