A First Pass at the Superheros
- You should be working on the Cleaned Data worksheet of your superheros workbook.
- You should also have the methods document open to the section that discusses cleaning the data.
- Do you see any obvious problems with the dataset?
- Column A doesn't have a title.
- What does this column appear to be?
- What should we do about it?
- Why should we do this?
- After we make the decision, what should we do?
- In A1 capitalize name.
- The Weight field seems to have a problem.
- There are a couple of -99
- Does anyone weight -99?
- It is a semi-normal thing to make unknown numbers -99, or -999, or -9999
- This seems problematic here.
- We will need to fix this.
- It looks like the same is true for Height.
- And there appear to be a number of - values in the other fields.
- Before we start changing things, let's do some basic exploration.
- Naming ranges
- We have been selecting ranges for all of our computations so far.
- This is somewhat painful.
- We can Name ranges to make computations easier and more accurate.
- This is discussed starting on page 379 of the book.
- Naming a range allows you to refer to that range by a name other than a number.
- We will name j2:j735 to be weight.
- Naming all the columns in a table
- Select the table.
- Move to cell A1
- Select all cells through J735
- Go to the Formulas tab in the Defined Names command group.
- Select Create from Selection
-
- This will bring up the Create Names from Selection dialog.
-
- The names should come from the Top Row only, so uncheck Left Column if it is checked.
- Click on OK.
- You can see the names you just defined by clicking on the Name Manager
-
- This brings up the Name Manager dialog
-
- Notice that spaces have been replaced with the underscore.
- You can delete names here if you need to.
- Caution
- Don't create two different ranges with the same name.
- This can lead to problems.
- Analysis of Weight.
- Create a new worksheet called Weight Analysis
- Add a comment in cell a1 that states "This worksheet contains basic analysis of the weight field of the Cleaned Data worksheet.
- Let's start by counting the data
- We will use the count function.
- This only counts numeric values.
- In cell A3 enter Count of Data
- In cell B3 enter =count(wei
- Notice that it shows a list of potential candidates, including Weight
-
- B3 should eventually contain =count(Weight)
-
- To double check the count function, try counting the Name column
- It is a normal thing to compute the five number summary of a data field.
- The maximum value of the data
- The minimum value of the data
- The middle (or median, or second quartile) of the data
- The first quartile.
- The third quartile.
- This will involve three functions.
- =max computes the maximum numeric value in a range.
- =min computes the minimum numeric value in a range.
- =quartile or =quartile.* (inc, exc)
- There are multiple methods of computing the quartile.
- quartile is deprecated
- quartile.inc is probably the one to use for our class.
- If you want to know exactly what is happening, look here
- For small data, none of these are what you would expect when the data is even.
- For large data it doesn't matter.
- =quartile.inc(range, quartile)
- The quartile value is between 0 (min) and 4(max)
- Calculate these values