Introduction to Statistics
- Statistics is the art and science of gathering, analyzing, and making inferences (predictions) from numerical information.
- The numerical information is called data.
- Descriptive statistics is concerned with collecting, organizing and analyzing data.
- Inferential statistics is concerned with making generalizations, or predictions, from the data.
- "If a probability expert and a statistician find identical boxes, the probability expert might open the box, observe the contents, replace the cover, and proceed to compute the probability of randomly selecting a specific object from the box. The statistician might select a few items from the box without looking at the contents and make predictions as to the total contents of the box."
- The universal set is called the population in statistics.
- Statisticians form a sample or a subset of the population for study.
- Samples are studied because:
- It is sometimes impossible to study the entire population. Think of counting every mosquito.
- Studying the entire population might destroy it. Think of opening cans of food to test the quality of the canning process.
- It would be too expensive to study the entire population. Think of talking to every single student at Edinboro.
- We use different symbols for some statistical information to indicate that we are dealing with a population or a sample.
- Mean: x̄ for sample, μ for population
- Standard Deviation: s for sample, σ for population
- Generally you can assume we are looking at a sample.
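As a rough illustration of why the sample and population symbols are kept apart, the sketch below computes both standard deviations for the same numbers. The scores are made up for the example, and it relies on the usual convention that s divides by n - 1 while σ divides by n, which these notes do not spell out.

```python
import statistics

# Hypothetical data: five quiz scores (not from the notes).
scores = [2, 4, 4, 6, 9]

x_bar = statistics.mean(scores)    # x̄ for a sample; the same arithmetic gives μ for a population
s = statistics.stdev(scores)       # s: sample standard deviation, divides by n - 1
sigma = statistics.pstdev(scores)  # σ: population standard deviation, divides by n

print(x_bar, s, sigma)
```

For the same data s comes out slightly larger than σ, so it matters which one a problem is asking for.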
- The goal of honest statistics is to construct an unbiased sample, that is, a sample that is representative of the population.
- The book lists a number of sampling techniques:
- Biased Sampling: Select a sample that agrees with what you want to show.
- Biased by definition
- Dishonest
- Worthless scientifically
- Convenience Sampling: Select whatever data is handy
- Probably biased
- Probably not very useful
- Random Sampling: Select randomly from the population
- The gold standard.
- Each item has an equal chance of being selected.
- Draw numbers from a hat, ...
- Number each element of the population.
- Select n random numbers.
- The elements corresponding to these numbers form the sample.
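The numbering procedure for random sampling can be sketched in a few lines. This is only one possible way to do it; the function name `simple_random_sample`, the `seed` parameter, and the list of 100 students are assumptions made for the example.

```python
import random

def simple_random_sample(population, n, seed=None):
    """Number the elements, pick n distinct random numbers, return the matching elements."""
    rng = random.Random(seed)  # seed is optional; it only makes the draw repeatable
    numbers = rng.sample(range(len(population)), n)  # n distinct positions, all equally likely
    return [population[i] for i in numbers]

# Hypothetical population of 100 students.
students = [f"student_{i}" for i in range(1, 101)]
chosen = simple_random_sample(students, 10, seed=1)
print(chosen)
```

Because the numbers are drawn without replacement, no student can be selected twice.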
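- Systematic Sampling
- Select every nth item that passes by the sampling point.
- Make sure that sampling runs across the entire population, not just part of it.
- This is sometimes a problem when the process itself is systematic: a repeating pattern in the process can line up with the sampling interval.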
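A small sketch of systematic sampling, and of how a repeating process can defeat it. The canning line with 4 filling heads is a made-up scenario, and the random starting position is an assumption added for the example rather than something the notes require.

```python
import random

def systematic_sample(items, n, seed=None):
    """Take every nth item, starting from a random position among the first n."""
    rng = random.Random(seed)
    start = rng.randrange(n)  # assumption: random start, so no item is excluded in advance
    return items[start::n]

# Hypothetical canning line: 40 cans filled in rotation by 4 filling heads.
cans = [{"id": i, "head": i % 4} for i in range(40)]

every_4th = systematic_sample(cans, 4, seed=0)
every_5th = systematic_sample(cans, 5, seed=0)

heads_4 = {can["head"] for can in every_4th}
heads_5 = {can["head"] for can in every_5th}
print(len(heads_4), len(heads_5))  # prints "1 4"
```

Sampling every 4th can only ever inspects one filling head, so a fault in any other head would go unnoticed. Sampling every 5th can cycles through all four.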
- Cluster Sampling
- Divide physically separated data into geographic regions.
- Randomly select a number of these regions to sample.
- Use all data within the region, or conduct another sampling technique on the region.
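Cluster sampling might look like the following, using the "use all data within the region" option. The region names, their contents, and the function name `cluster_sample` are hypothetical.

```python
import random

def cluster_sample(clusters, k, seed=None):
    """Randomly pick k regions and keep every item inside the chosen regions.

    clusters maps a region name to the list of items found in that region.
    """
    rng = random.Random(seed)
    chosen_regions = rng.sample(sorted(clusters), k)  # sorted() gives a repeatable order to draw from
    return {region: clusters[region] for region in chosen_regions}

# Hypothetical regions and the items located in each.
regions = {
    "north": ["n1", "n2", "n3"],
    "south": ["s1", "s2"],
    "east": ["e1", "e2", "e3", "e4"],
    "west": ["w1"],
}
picked = cluster_sample(regions, 2, seed=3)
print(picked)
```

To apply another technique inside each region instead, the lists in `picked` could be passed on to a random or systematic sampler.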
- Stratified Sampling: Divide the population into classes based on some identifying characteristic.
- Sample each class.
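A minimal sketch of stratified sampling, assuming each class is sampled with a simple random draw of the same size. The notes do not say how each class should be sampled, so that choice, the function name `stratified_sample`, and the student data are all assumptions.

```python
import random
from collections import defaultdict

def stratified_sample(population, key, per_class, seed=None):
    """Group items into classes with key(item), then randomly sample each class."""
    rng = random.Random(seed)
    classes = defaultdict(list)
    for item in population:
        classes[key(item)].append(item)

    sample = []
    for label in sorted(classes):  # fixed order so a seeded run is repeatable
        members = classes[label]
        # Take per_class items, or the whole class if it is smaller than that.
        sample.extend(rng.sample(members, min(per_class, len(members))))
    return sample

# Hypothetical population: 20 students, 5 in each class year.
years = ["freshman", "sophomore", "junior", "senior"]
students = [{"name": f"s{i}", "year": year} for i, year in enumerate(years * 5)]

sample = stratified_sample(students, key=lambda s: s["year"], per_class=2, seed=7)
print(sample)
```

Every class year is guaranteed to appear in the result, which a plain random sample of 8 students would not promise.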
- Do some problems on page 778.