Finishing Numeric Fields
- We will use hero2.xlsx, the worksheet from last time.
- mode
- The mode of a data set is the number that occurs most frequently.
- There are debates if there can be more than one mode.
- But this further tests the "normality" of a dataset.
- =mode.sngl() returns the first "mode" encountered.
- =mode.mult() is strange in that you need to use an array to compute this.
- I will skip it for now.
- Let's Build a better histogram.
- I would like to exclude outliers from the histogram.
- An outlier is 1.5 times the inter quartile range above or below the Q1 and Q3
- The inner quartile range is just Q3-Q1
- Then the upper and lower bounds are just Q1-1.5*IQR
- I got 146 and 218.
- I want to divide this into 10 bins plus two extra
- For 10 bins between 146 and 218 I divide the difference by 10
-
-
- I want to build a the ranges for these bins.
- Label Two columns "Low" and "High" (D23 and E23)
- In D24 put 0
- In D25 put the computed lower bound.
- In D26 put =d25+the bin size
- Copy this down to D35
- In E24 copy D25
- Copy this down to E34
- In E35 put the maximum value.
-
-
- In F23 put "Range Title"
- We want to build a string that represents the range value.
- To do this we need to use string concatenation.
- The command is &
- To start with, in F25 put =D25 & " , "
- Not bad, modify this to be =D25 & " , " & E25
- Not bad, modify this to be = "(" & D25 & " , " & E25 & "]"
- Copy this down.
- Modify F25 to be [146, 153.2]
- In F24 Put Low Outliers
- In F35 put High Outliers
-
-
- What do [ and ( mean when we are talking about ranges?
- G23 put "Count"
- We will want to count things right.
- We will use a count if
- In the first bin I want to count everything less than or equal to 146
- But countf is strange, it takes a range and a formula
- But the formula is a string.
- In G24 put =countif(height, "<" & D25)
- Let's do the high outliers
- This is everything larger than D35.
- Build this.
- Ok, now for the second we want to count things that are greater than or equal to 146 but less than or equal to 153.2
- A single countif would be nasty.
- Let's build it up slowly
- In G25 put =countif(height, "<=" & D26)
- What did this count?
- Now let's take off the things that are less than 146
- In G25 put =countif(height, "<=" & D26) - countif(height, "<" & D25)
- For everything else, we need to modify this slightly
- We don't want to include the starting value, it will be counted in the bin below.
- Copy the equation from G25 to G26
- Change the second < to be a <=
- Copy this down, but don't clobber the last computation.
-
-
- How can we check this?
- Add up all of the values.
- What should these be?
- We can check against the counts (Count-count-na)
- Lets also check by changing a value to 146 and 153.2 in the data.
- Turn this into a chart.
- Highlight the range
- Select column chart.
- Fix the titles
- Fix the column widths.
- Look at 3d chart.