Sampling Techniques and the Misuses of Statistics
- Statistics is the art and science of gathering, analyzing, and making predictions from numerical information obtained in an experiment.
- In probability, we look at all of the possible outcomes to decide what the chances of an event are.
- In statistics, we look at the results of a few experiments and infer what the full set of outcomes probably looks like.
- Consider a Magic 8-Ball:
- The probability approach is to crack it open and count the number of good, neutral and negative responses and then calculate the probability of each.
- A statistics approach is to check it a number of times and try to predict the number of each type of response.
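The two approaches to the Magic 8-Ball can be sketched side by side. This is a minimal illustration only: the response counts in `RESPONSES` are assumed for the example, and the function names are hypothetical.

```python
import random
from collections import Counter

# Assumed contents of the ball (illustrative counts, not taken from the notes).
RESPONSES = {"good": 10, "neutral": 5, "negative": 5}

def exact_probabilities(responses):
    """Probability approach: open the ball and count every response."""
    total = sum(responses.values())
    return {kind: count / total for kind, count in responses.items()}

def estimated_probabilities(responses, shakes, seed=0):
    """Statistics approach: shake the ball many times and tally what comes up."""
    rng = random.Random(seed)
    kinds = list(responses)
    weights = [responses[kind] for kind in kinds]
    observed = Counter(rng.choices(kinds, weights=weights, k=shakes))
    return {kind: observed[kind] / shakes for kind in kinds}

exact = exact_probabilities(RESPONSES)
estimate = estimated_probabilities(RESPONSES, shakes=1000)
print("exact:   ", exact)
print("estimate:", estimate)
```

The estimate only approximates the exact values, and it tends to get closer as the number of shakes grows.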
- The population is the collection of items studied.
- A sample is a subset of the population, frequently used to predict properties of the population.
- Statisticians sample the population because it might be impractical, impossible, or very inconvenient to look at the entire population.
- To get a good prediction about the entire population, it is important to have a representative (unbiased) sample.
- To accomplish this, statisticians have identified a number of sampling techniques:
- Random Sampling: Samples are selected in such a way that each item in the population has an equal chance of being drawn.
- Assign a unique number to each member of the population and use a random number generator to select the sample.
- Hard to achieve in some cases.
- This is the gold standard.
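The numbering procedure described for random sampling might look like the following sketch. The population size, sample size, and the helper name `simple_random_sample` are assumptions made for illustration.

```python
import random

def simple_random_sample(population_size, sample_size, seed=None):
    """Number the members 1..population_size and draw without replacement."""
    rng = random.Random(seed)
    return rng.sample(range(1, population_size + 1), sample_size)

# Hypothetical population of 500 members; draw 10 of them.
sample = simple_random_sample(population_size=500, sample_size=10, seed=42)
print(sorted(sample))
```

Passing a `seed` only makes the draw repeatable for checking; in practice it would usually be left unset.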
- Systematic Sampling: Starting at some position, select every nth item.
- If several sources (for example, multiple machines) are producing the population, make sure that the sampling interval reaches every source.
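A small sketch of systematic sampling, including one way the interval can go wrong. It assumes a hypothetical production line in which 5 machines take turns producing items; that setup and the helper names are illustrative, not from the notes.

```python
import random

def systematic_sample(items, n, seed=None):
    """Pick a random start among the first n items, then take every nth item."""
    rng = random.Random(seed)
    start = rng.randrange(n)
    return items[start::n]

def machine_of(widget):
    """Assumed rotation: widget k was made by machine k mod 5."""
    return widget % 5

widgets = list(range(100))
every_10th = systematic_sample(widgets, n=10, seed=1)
every_7th = systematic_sample(widgets, n=7, seed=1)

# An interval of 10 lines up with the 5-machine rotation, so one machine supplies
# the whole sample; an interval of 7 cycles through all five machines.
print(sorted({machine_of(w) for w in every_10th}))
print(sorted({machine_of(w) for w in every_7th}))
```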
- Cluster Sampling: Randomly select a group or groups of the population. Then either observe every member of the selected groups, or use another sampling technique within them.
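Cluster sampling could be sketched as follows, here observing every member of each chosen group. The homeroom data and the function name `cluster_sample` are made up for the example.

```python
import random

def cluster_sample(clusters, num_clusters, seed=None):
    """Randomly choose whole groups and return each one with all of its members."""
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), num_clusters)
    return {name: list(clusters[name]) for name in chosen}

# Hypothetical population: students grouped by homeroom.
homerooms = {
    "Room 101": ["Ana", "Ben", "Cal"],
    "Room 102": ["Dee", "Eli", "Fay"],
    "Room 103": ["Gus", "Hal", "Ivy"],
    "Room 104": ["Jon", "Kim", "Lou"],
}
picked = cluster_sample(homerooms, num_clusters=2, seed=3)
print(picked)
```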
- Stratified Sampling: Divide the population into groups based on some characteristic. Sample each group in proportion to its share of the population.
- Convenience Sampling: Pick a sample based on what is easy to reach.
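Proportional stratified sampling can be sketched like this. The class sizes and the name `stratified_sample` are assumptions; rounding each group's share is one simple choice and may not add up exactly to the requested sample size for other inputs.

```python
import random

def stratified_sample(strata, sample_size, seed=None):
    """Draw from each group in proportion to its share of the population."""
    rng = random.Random(seed)
    total = sum(len(members) for members in strata.values())
    sample = {}
    for name, members in strata.items():
        share = round(sample_size * len(members) / total)  # proportional allocation
        sample[name] = rng.sample(members, share)
    return sample

# Hypothetical school: student ID numbers grouped by class year.
strata = {
    "freshmen": list(range(0, 400)),
    "sophomores": list(range(400, 700)),
    "juniors": list(range(700, 900)),
    "seniors": list(range(900, 1000)),
}
sample = stratified_sample(strata, sample_size=50, seed=7)
print({name: len(chosen) for name, chosen in sample.items()})
```

Because freshmen make up 40% of this population, they make up 40% of the sample as well.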
- Biased Sampling: Pick a group that agrees with what you are trying to demonstrate.
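- When reading statistics:
- It is important to know the sampling technique used. Are the data valid?
- Make sure words have a precise meaning.
- What is the "biggest" retailer in the nation? Biggest by sales, by number of stores, or by some other measure?
- This is especially true of the word "average," which can refer to the mean, the median, or the mode.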
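The word "average" can refer to the mean, the median, or the mode, and these can differ widely for the same data. A short sketch with made-up salary figures:

```python
from statistics import mean, median, mode

# Hypothetical salaries at a small company; one executive earns far more.
salaries = [30_000, 30_000, 35_000, 40_000, 45_000, 300_000]

avg_mean = mean(salaries)      # pulled upward by the single large salary
avg_median = median(salaries)  # middle of the sorted list
avg_mode = mode(salaries)      # most common value

print(avg_mean, avg_median, avg_mode)
```

All three numbers could honestly be reported as the "average" salary, so it matters which one a claim is using.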
- Be careful with graphs
- The y-axis should start at 0; a truncated axis exaggerates differences.
- In a circle (pie) chart, the percentages should add up to 100.
- EVERYTHING should be labeled.
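- Be careful with two- and three-dimensional pictures: scaling every dimension of a figure makes its area or volume grow much faster than the quantity it represents.
- Don't draw correlations where they don't exist.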
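The distortion from scaling a picture in two or three dimensions is simple arithmetic. In this sketch, `visual_exaggeration` is a hypothetical helper and the factor of 2 is just an example.

```python
def visual_exaggeration(ratio):
    """Growth in height, area, and volume when every dimension is scaled by ratio."""
    return {"height": ratio, "area": ratio ** 2, "volume": ratio ** 3}

# A quantity that doubles, drawn as a picture twice as tall and twice as wide
# (and twice as deep, if it is drawn as a solid).
doubled = visual_exaggeration(2)
print(doubled)
```

A value that only doubled ends up drawn with four times the area, or eight times the volume, which overstates the change.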