WOLFRAM|DEMONSTRATIONS PROJECT

Calculating Sample Size

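[Interactive panel. Controls: % confidence level (50, 68, 90, 95, 99), % confidence interval (e), % accuracy, data size (population). Output: calculated sample size. Values shown in the snapshot: e = 0.26, accuracy = 0.795, data size = 486000000, calculated sample size = 10.]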
A common rule of thumb says that surveying 10% of a population is enough to estimate the results for the whole population. But for a huge dataset, such as 1 billion records, 10% of the population is still a very large sample. Instead, you can compute the optimal (minimum) amount of data to survey for the precision and error tolerance you need.
This standard equation defines the appropriate sample size (SS) of people to use for a survey:
$$SS = \frac{\dfrac{Z^2\,P(1-P)}{e^2}}{1 + \dfrac{Z^2\,P(1-P)}{e^2 N}}.$$
It is very common to use this equation for population sizes of big data projects in order to define the appropriate sample of data that should be analyzed.
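A short worked step may help show why the required sample stops growing for very large populations. The shorthand $n_0$ is introduced here only for illustration and does not appear in the original text. Writing the sample-size equation in terms of $n_0$:
$$n_0 = \frac{Z^2\,P(1-P)}{e^2}, \qquad SS = \frac{n_0}{1 + \dfrac{n_0}{N}} = \frac{n_0\,N}{N + n_0}.$$
Since $N + n_0$ is larger than both $N$ and $n_0$, it follows that $SS < n_0$ and $SS < N$. As the population grows, the correction term vanishes:
$$\lim_{N \to \infty} SS = n_0.$$
So for a fixed confidence level, error tolerance and accuracy, the sample size is bounded by $n_0$ no matter how large the dataset is.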
The parameters to define the sample size are:
Confidence level Z: the precision required for the survey (Z is the z-score that corresponds to the chosen confidence level)
Confidence interval e: the error tolerance for the survey, -0.04 ≤ e ≤ 0.4
Accuracy P: the data quality or trustworthiness of the information in the data
Data size N: the total population (or number of records in the database)
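The parameters above can be combined in a few lines of Python. This is a minimal sketch, not the Demonstration's own code. It rests on several assumptions: the function names `z_score` and `sample_size` are hypothetical, Z is taken to be the two-sided standard normal z-score for the confidence level, and the confidence level, e and P are passed as fractions rather than percentages.

```python
import math
from statistics import NormalDist


def z_score(confidence_level: float) -> float:
    """Two-sided standard normal z-score for a confidence level in (0, 1).

    Assumption: the confidence level is converted to Z this way;
    the source text does not spell out the conversion.
    """
    if not 0.0 < confidence_level < 1.0:
        raise ValueError("confidence_level must be a fraction between 0 and 1")
    return NormalDist().inv_cdf(0.5 + confidence_level / 2.0)


def sample_size(confidence_level: float, e: float, p: float, n: int) -> float:
    """Sample size SS for error tolerance e, accuracy p and population n."""
    if e == 0:
        raise ValueError("e must be nonzero")
    if n <= 0:
        raise ValueError("n must be positive")
    z = z_score(confidence_level)
    # Sample size for an unlimited population: Z^2 * P(1-P) / e^2
    n0 = z ** 2 * p * (1.0 - p) / e ** 2
    # Finite-population correction from the equation: divide by 1 + n0 / N
    return n0 / (1.0 + n0 / n)


if __name__ == "__main__":
    ss = sample_size(confidence_level=0.95, e=0.26, p=0.795, n=486_000_000)
    print(math.ceil(ss))  # round up to a whole number of records
```

With the values shown in the panel (e = 0.26, accuracy 0.795, population 486000000) and an assumed 95% confidence level, the result rounds up to 10, which agrees with the calculated sample size displayed there. Rounding up rather than to the nearest integer is a design choice made here so the sample never falls below the computed minimum.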