Statistics and Sampling Techniques
- Statistics is the art and science of gathering, organizing, analyzing, and making inferences from numerical information obtained from an experiment.
Cool side note: There is evidence that statistical methods date back at least to the 17th century.
- Numerical information obtained from an experiment is referred to as data.
- Descriptive statistics is concerned with collection, organization and analysis of data.
- Inferential statistics is concerned with making generalizations or predictions from the data collected.
- Probability vs Statistics
- A probability problem is solved by examining the experiment, knowing all outcomes, and computing the chances of an event based on this information.
- A statistician will conduct an experiment a number of times, observing the results and predicting the outcomes.
- If there is a box of marbles
- The probability expert will open the box, count the marbles and predict what will happen.
- The statistician will sample marbles from the box and predict what the contents of the box are.
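The marble contrast can be sketched in Python. The box contents (30 red and 70 blue marbles), the sample size of 20, and the seed are made-up assumptions for illustration only; they are not from the notes.

```python
import random
from fractions import Fraction

# Hypothetical box: 30 red and 70 blue marbles (assumed for illustration).
box = ["red"] * 30 + ["blue"] * 70

def probability_of_red(box):
    """Probability view: the contents are known, so compute the chance directly."""
    return Fraction(box.count("red"), len(box))

def estimate_fraction_red(box, sample_size, rng):
    """Statistics view: draw a sample and use it to estimate the unknown contents."""
    sample = rng.sample(box, sample_size)  # draw without replacement
    return sample.count("red") / sample_size

rng = random.Random(0)  # fixed seed so the run is repeatable
print("Known probability of red:", probability_of_red(box))
print("Estimate from 20 marbles:", estimate_fraction_red(box, 20, rng))
```

The first function needs to see the whole box; the second only ever looks at the 20 marbles it drew, so its answer is an estimate that changes from sample to sample.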
- When performing statistics, the universal set is the population
- A subset of the population is called the sample
- Making decisions about a population based upon a sample can be inaccurate.
- But it is frequently too costly, or simply not possible, to examine the entire population.
- Think about quality control, crash testing, and so on.
- Would it be a good thing to test every car produced in a crash test?
- Would it be possible to ask every person their opinion on who should be president?
- Would it be feasible to check every can of soup for spoilage?
- So we need to employ sampling techniques.
- Note that different symbols are used to represent the same statistical measurement for populations and samples.
- Mean: x̄ for a sample, μ for a population.
- Standard deviation: s for a sample, σ for a population.
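A small sketch of why the two sets of symbols matter, using Python's standard `statistics` module. The data values are invented for illustration. One detail not stated in the notes: the sample standard deviation s divides by n - 1, while the population standard deviation σ divides by N, so the two differ even on the same numbers.

```python
import statistics

# Hypothetical data, assumed for illustration only.
data = [2, 4, 4, 4, 5, 5, 7, 9]

# Treating the data as an entire population: mu and sigma.
mu = statistics.mean(data)
sigma = statistics.pstdev(data)  # population standard deviation, divides by N

# Treating the same data as a sample from a larger population: x-bar and s.
x_bar = statistics.mean(data)
s = statistics.stdev(data)       # sample standard deviation, divides by n - 1

print("mu =", mu, " sigma =", sigma)
print("x_bar =", x_bar, " s =", s)
```

The mean is computed the same way in both cases; only the symbol changes. The standard deviation is slightly larger when the data are treated as a sample.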
- When sampling, a goal is to obtain an unbiased sample, that is, a sample that is representative of the entire population.
- There are a number of techniques employed to construct samples.
- Question: Should we build a hockey arena?
- Biased Sampling or Purposive Sampling
- At worst, pick items in the population which will give the answer you want
- This is not a valid sampling technique if useful information about the population is required.
- This can also happen unintentionally, for example by consulting only the experts in a given field.
- Hockey Arena: Ask only students involved in athletics, or only students involved in a single degree program.
- Hockey Arena: Or ask the people who run the athletic programs; they know the most about constructing and maintaining athletic venues and about the population of people who would use such a venue.
- The results will be biased.
- Convenience Sampling
- Sample the portion of the population that is easy to reach.
- Such a sample is most likely to be biased and unreliable.
- The book states it is sometimes better than nothing.
- Hockey Arena: Ask the first 10 people to exit McComb.
- Hockey Arena: Ask the first 10 people to show up at Baron-Forness in the morning.
- Random Sampling
- The sample is drawn in such a way that each member of the population is equally likely to be selected.
- Assign a number to each item.
- Select a set of these numbers using a random number generator.
- You need to be sure that the sample is truly random.
- How could we do this for the Hockey Arena?
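One possible answer, sketched in Python: number every student and let a random number generator pick. The roster of 5000 student ID numbers and the sample size of 50 are assumptions made up for this sketch; `simple_random_sample` is a hypothetical helper name.

```python
import random

def simple_random_sample(population, k, seed=None):
    """Draw k distinct members so that every member is equally likely to be chosen."""
    rng = random.Random(seed)          # pass a seed only when a repeatable draw is wanted
    return rng.sample(population, k)   # sampling without replacement

# Hypothetical roster: student ID numbers 1 through 5000 (assumed).
student_ids = list(range(1, 5001))
chosen = simple_random_sample(student_ids, 50, seed=42)
print(sorted(chosen)[:10])
```

The hard part in practice is not the code but the list: the sample is only random with respect to whoever appears on the roster.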
- Systematic Sampling
- Sample every nth item on a list or production line.
- Starting point should be random.
- Select the sampling interval n (for example, every 100 items).
- Select a random starting point s between 0 and n - 1 (between 0 and 99 when n = 100).
- Select items s, n+s, 2n+s, ... through the entire population.
- Make sure that the sampling interval does not line up with some periodic element of the manufacturing process.
- How would we do this with the Hockey Arena?
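The systematic sampling steps can be sketched in Python as follows. The list of 1000 numbered items stands in for a list or production line and is an assumption for illustration; `systematic_sample` is a hypothetical helper name.

```python
import random

def systematic_sample(items, n, rng=None):
    """Take every nth item, starting from a random position s with 0 <= s < n."""
    if n < 1:
        raise ValueError("sampling interval n must be at least 1")
    rng = rng or random.Random()
    s = rng.randrange(n)   # random starting point
    return items[s::n]     # items s, n+s, 2n+s, ...

# Hypothetical production run of 1000 numbered items (assumed).
production_line = list(range(1000))
checked = systematic_sample(production_line, 100, random.Random(3))
print(checked)
```

For the hockey arena, the same function could be applied to an alphabetical student list, taking every nth name after a random start.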
- Cluster Sampling
- Also called area sampling.
- Break an area into smaller areas or clusters.
- Sample all items located within a random sample of clusters.
- Sometimes a random sample of items within each selected cluster is taken instead.
- Breaking a city into blocks and randomly selecting 20 blocks, then sampling all residents of those blocks.
- Selecting 10 boxes of bolts and counting the defective bolts in those boxes.
- How could we do this for the Hockey Arena?
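A minimal Python sketch of cluster sampling, modeled on the city-block example. The 100 blocks with 5 residents each are invented data, and `cluster_sample` with its `per_cluster` option is a hypothetical helper, not something from the notes.

```python
import random

def cluster_sample(clusters, num_clusters, per_cluster=None, rng=None):
    """Pick clusters at random, then take everyone (or a random few) in each one.

    clusters: dict mapping a cluster name to the list of its members.
    per_cluster: if given, sample at most this many members from each chosen cluster.
    """
    rng = rng or random.Random()
    chosen = rng.sample(sorted(clusters), num_clusters)  # random sample of clusters
    result = {}
    for name in chosen:
        members = clusters[name]
        if per_cluster is None:
            result[name] = list(members)                 # sample every item in the cluster
        else:
            result[name] = rng.sample(members, min(per_cluster, len(members)))
    return result

# Hypothetical city: 100 blocks with 5 residents each (assumed).
blocks = {f"block-{i}": [f"resident-{i}-{j}" for j in range(5)] for i in range(100)}
picked = cluster_sample(blocks, 20, rng=random.Random(7))
print(len(picked), "blocks,", sum(len(v) for v in picked.values()), "residents")
```

For the hockey arena, the clusters might be residence halls or class sections, with a few chosen at random and everyone in them surveyed.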
- Stratified Sampling
- This method is used when there are divisions within the population
- And the opinions of all subgroups are desired.
- Break the population into subgroups, or strata.
- Every element of the population must fit in exactly one stratum.
- Take a sample from EVERY stratum.
- The number of items sampled from a given stratum is proportional to that stratum's share of the total population.
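Proportional allocation can be sketched in Python as follows. The four class-year strata and their sizes are made-up numbers for illustration, and `stratified_sample` is a hypothetical helper. Rounding each stratum's share, with a minimum of one, is a design choice of this sketch, so the total can differ slightly from the requested size.

```python
import random

def stratified_sample(strata, total_size, rng=None):
    """Sample every stratum, sized in proportion to its share of the population.

    strata: dict mapping a stratum name to the list of its members;
            each member is assumed to belong to exactly one stratum.
    """
    rng = rng or random.Random()
    population_size = sum(len(members) for members in strata.values())
    sample = {}
    for name, members in strata.items():
        # Proportional share, but at least 1 so EVERY stratum is represented.
        k = max(1, round(total_size * len(members) / population_size))
        k = min(k, len(members))
        sample[name] = rng.sample(members, k)
    return sample

# Hypothetical student body of 1000, split by class year (assumed).
strata = {
    "freshman": list(range(0, 400)),
    "sophomore": list(range(400, 700)),
    "junior": list(range(700, 900)),
    "senior": list(range(900, 1000)),
}
survey = stratified_sample(strata, 50, random.Random(1))
for name, members in survey.items():
    print(name, len(members))
```

With these numbers, a stratum holding 40% of the population supplies 40% of the 50 people surveyed.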
- Examples (Classify each type of sampling):
- All registered vehicles in the state of Georgia are classified according to type: subcompact, compact, mid-size, full-size, SUV and truck. A random sample of vehicles from each category is selected.
- Every 10th iPod coming off an assembly line is checked for defects.
- A state is divided into counties. A random sample of 12 counties is selected. A random sample from each of the 12 counties is selected.
- All customers at the Eggstra Good Breakfast Restaurant are asked if there should be a law limiting cholesterol intake.
- The first 10 people to walk by this room are asked a question.