Sampling Techniques and Misuses of Statistics
- Statistics is a field of mathematics concerned with data:
- Collection
- Organization
- Analysis
- Interpretation
- Presentation
- This field has grown rapidly in recent years
- Data-driven decision making
- Data analytics
- Data science
- Statistics is the art and science of gathering, analyzing, and making inferences from numerical information obtained in an experiment.
- An experiment is a controlled operation that yields a set of results.
- Descriptive statistics is concerned with collecting, organizing, and analyzing data.
- Inferential statistics is concerned with making generalizations or predictions from the data collected.
- A statistician is interested in describing a population based on a sample.
- A population is the universal set for a given study.
- A sample is a subset of the population, hopefully representative of the population.
- Sampling
- There are a number of reasons we cannot examine an entire population:
- It might not be possible:
- I would like to know what percentage of the stars in the universe are larger than our sun.
- It might be possible but very expensive
- I would like to know what portion of the tick population in Erie County is infected with the bacterium that causes Lyme disease.
- Can I collect and test all of them?
- It might destroy the thing I want to know about
- I would like to know what portion of the bags of potato chips at a factory contain a burnt chip.
- If I open every bag, I won't have anything left to sell.
- The goal of sampling is to obtain an unbiased sample.
- This is a sample where every item in the population has an equal chance of being selected.
- Sampling Techniques
- I would like to build a new Student Activity Center
- I want students to vote to pay a fee to fund this center.
- It will be filled with game opportunities
- Laser Tag
- Escape Rooms
- Computer Game Center
- To support this, I want to gauge student interest by conducting a poll.
- Biased Sampling
- Find the items to prove your point and select these.
- Absolutely worthless
- For my activity center, I will find 100 students who support this and ask their opinions on the center.
- Random Sampling
- Select items randomly from a population.
- Assign each item an id number
- Select from those id numbers at random.
- For my activity center, I will select 100 ids at random from the list of all active student id numbers.
- This is the gold standard of sampling.
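The random-sampling poll above can be sketched in a few lines of Python. This is only an illustration: the roster of 4,000 ids and the helper name `simple_random_sample` are made up for the example, and the seed is there only so the run can be repeated.

```python
import random

def simple_random_sample(ids, n, seed=None):
    """Return n distinct ids chosen uniformly at random, without replacement."""
    rng = random.Random(seed)  # seeded only to make the example repeatable
    return rng.sample(list(ids), n)

# Hypothetical roster: 4,000 made-up student id numbers.
student_ids = [f"S{i:05d}" for i in range(1, 4001)]

poll = simple_random_sample(student_ids, 100, seed=1)
print(len(poll), len(set(poll)))  # 100 100 (100 picks, no repeats)
```

Because the draw is made without replacement, no student is surveyed twice, and every id on the list has the same chance of being picked.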
- Systematic Sampling
- Start at some random point and select every nth item
- Probably best in some type of assembly line situation
- Select every 50th bag of potato chips and test for burnt chips.
- Can be problematic if I have 5 machines that bag chips. Why?
- For my activity center, I will:
- List the student population by banner id in order
- Divide the size of the student population by 100 to get k
- Select a random starting value s between 1 and k
- Select the ids at positions s, s + k, s + 2k, ... (there will be 100)
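A minimal sketch of the systematic procedure for the activity center poll, assuming the random start is used as the first pick and every k-th id is taken after it. The roster and the function name `systematic_sample` are hypothetical.

```python
import random

def systematic_sample(ordered_ids, n, seed=None):
    """Pick every k-th id after a random start, where k = len(ordered_ids) // n."""
    k = len(ordered_ids) // n
    if k < 1:
        raise ValueError("population is smaller than the requested sample")
    rng = random.Random(seed)          # seeded only to make the example repeatable
    start = rng.randint(1, k)          # 1-based position of the first pick
    positions = [start + i * k for i in range(n)]
    return [ordered_ids[p - 1] for p in positions]

# Hypothetical roster: 4,000 made-up ids, listed in order.
roster = sorted(f"S{i:05d}" for i in range(1, 4001))

sample = systematic_sample(roster, 100, seed=2)
print(len(sample))  # 100
```

With 4,000 ids and a sample of 100, k is 40, so the picks are exactly 40 positions apart. Only the starting point is random; the rest of the sample is determined by it.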
- Cluster Sampling or Area Sampling
- Divide an area into groups or sections.
- Usually based on physical proximity.
- Select a number of these groups at random.
- Sample the selected groups, either fully or using another technique.
- Example 1: Select a number of cases of potato chips produced on a given day, then randomly sample 5 bags from each to see how many contain burnt chips.
- Example 2: Select 100 city blocks at random in Erie then survey every household in those selected blocks.
- For my activity center, I will:
- Select 10 classrooms from all classrooms in use at 2:00 on Monday.
- Select 10 people at random from each classroom to survey
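The classroom example can be sketched as a two-stage draw: first pick the clusters (classrooms), then pick people within each one. The classroom data and the function name `cluster_sample` are invented for illustration, and the sketch assumes the 10 people are drawn from each selected classroom.

```python
import random

def cluster_sample(clusters, n_clusters, n_per_cluster, seed=None):
    """Two-stage sample: pick clusters at random, then pick members within each."""
    rng = random.Random(seed)  # seeded only to make the example repeatable
    chosen = rng.sample(sorted(clusters), n_clusters)
    sample = {}
    for name in chosen:
        members = clusters[name]
        take = min(n_per_cluster, len(members))  # a small room is surveyed in full
        sample[name] = rng.sample(members, take)
    return sample

# Hypothetical data: 40 classrooms in use, each with 25 made-up student ids.
classrooms = {
    f"Room {r:02d}": [f"S{r:02d}{s:02d}" for s in range(25)]
    for r in range(1, 41)
}

picked = cluster_sample(classrooms, n_clusters=10, n_per_cluster=10, seed=3)
print(len(picked), sum(len(v) for v in picked.values()))  # 10 100
```

Surveying everyone in the chosen rooms, as in the city block example, would correspond to setting `n_per_cluster` to at least the size of the largest room.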
- Stratified Sampling
- When you know the population is broken into different categories
- Select a representative sample from each stratum.
- Erie County PA has the following political mix
- 55% Democrat
- 34% Republican
- 11% Other
- So conduct a poll in which each group makes up the matching percentage of the people asked.
- For my activity center, I will:
- Divide the list of students at Edinboro by class rank.
- Find the percentage of students in each rank.
- Select students from each group so that each group makes up that percentage of the survey.
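One way to turn the percentages into head counts is proportional allocation, sketched below. The poll size of 200, the class rank counts, and both function names are assumptions made for the example; only the 55/34/11 split comes from the notes. When the shares do not come out to whole numbers, this sketch rounds down and gives the leftover seats to the largest fractional parts, which is one common choice among several.

```python
import random

def proportional_allocation(stratum_sizes, n):
    """Split a sample of size n across strata in proportion to stratum size."""
    total = sum(stratum_sizes.values())
    exact = {name: n * size / total for name, size in stratum_sizes.items()}
    alloc = {name: int(share) for name, share in exact.items()}  # round down
    leftover = n - sum(alloc.values())
    # Give the remaining seats to the strata with the largest fractional parts.
    by_fraction = sorted(exact, key=lambda name: exact[name] - alloc[name],
                         reverse=True)
    for name in by_fraction[:leftover]:
        alloc[name] += 1
    return alloc

def stratified_sample(strata, n, seed=None):
    """Draw a random sample from each stratum, sized by proportional allocation."""
    rng = random.Random(seed)  # seeded only to make the example repeatable
    sizes = {name: len(members) for name, members in strata.items()}
    alloc = proportional_allocation(sizes, n)
    return {name: rng.sample(strata[name], alloc[name]) for name in strata}

# Erie County mix from the notes, applied to an assumed poll of 200 people.
print(proportional_allocation({"Democrat": 55, "Republican": 34, "Other": 11}, 200))
# {'Democrat': 110, 'Republican': 68, 'Other': 22}

# Hypothetical class rank counts (made up for the example).
counts = {"Freshman": 1200, "Sophomore": 1000, "Junior": 1000, "Senior": 800}
students = {rank: [f"{rank[:2]}{i:04d}" for i in range(size)]
            for rank, size in counts.items()}
survey = stratified_sample(students, 100, seed=4)
print({rank: len(ids) for rank, ids in survey.items()})
# {'Freshman': 30, 'Sophomore': 25, 'Junior': 25, 'Senior': 20}
```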
- Convenience Sampling
- Pick whoever is handy.
- Ask the first 100 people you encounter
- In the Student Center Lower Level
- In the Library
- In McComb Field House
- In the Science and Health Professions Living Learning Floor
- In the parking lot.
- Misuse of statistics
- Make sure you know the sampling technique involved.
- Look out for biased experiments
- Loaded questions: are you still cheating on your exams?
- Look out for poorly constructed experiments
- Ambiguous terms: which is the biggest store?
- Misuse of Graphs
- Look how great the stock market did Friday
- Or how poorly it did last week
- Or the last five years
- In general with graphs
- Show the full extent of the y axis.
- Make sure circle graphs sum to exactly 100%.
- Doubling the dimensions of a 2D figure multiplies its area by 4.
- Doubling the dimensions of a 3D figure multiplies its volume by 8.
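The arithmetic behind two of these graph rules can be checked with a short sketch. Both function names and the sample values (98 and 100, with an axis starting at 97) are made up for illustration.

```python
def apparent_growth(scale, dimensions):
    """Factor by which a picture's size grows when every linear dimension
    is multiplied by `scale` (dimensions: 1 = length, 2 = area, 3 = volume)."""
    return scale ** dimensions

def bar_height_ratio(smaller, larger, axis_start=0):
    """How many times taller the larger bar is drawn when the y axis
    starts at `axis_start` instead of 0."""
    return (larger - axis_start) / (smaller - axis_start)

print(apparent_growth(2, 1))  # 2: a bar that only gets taller
print(apparent_growth(2, 2))  # 4: a 2D icon doubled in width and height
print(apparent_growth(2, 3))  # 8: a 3D object doubled in every direction

# Hypothetical values 98 and 100: about a 2% difference.
print(bar_height_ratio(98, 100, axis_start=97))  # 3.0: looks three times as tall
```

With the full y axis (`axis_start=0`) the same two bars differ in height by only about 2%, which matches the actual difference in the values.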