Creating Category Data from Numeric Data
- For these notes we will continue to use the Cleaned Hero Dataset
- My problem is that there are too many different weights and heights to deal with.
- In D&D the build a table to classify things
- Look here (
https://www.dandwiki.com/wiki/SRD:Table_of_Creature_Size_and_Scale)
- I would like to use this table to classify my superheros weight and height as Fine, Diminutive, ... Colossal
- To do this I will need to
- Download the table and reformat it.
- Convert weights to pounds and heights to inches
- Associate the Size category with each superhero.
- We could just copy and paste the table but let's use a cool feature of excel.
- On the Data tab in the Get & Transform Data workgroup select From Web
-
- Fill out the form
-
- Select the table we are interested in and select Load
-
- This may take a moment.
- Notice this created a new worksheet.
- You are in the Queries & Connections tab.
-
- You could do many things here, but that is for DSCI 201.
- Just closet the query tab.
- I would rename the worksheet Weight & Height
- Cleaning the Weight and Height table
- Covert it back to a range.
- Clear the formatting.
- Delete row 13
- Delete Rows 2 and 3
- Delete all columns except
- Size Category
- Height or Length
- Weight
- Fix the names if you need to.
-
- Fixing the Weight.
- The weight field is currently text.
- I really want the first number out of this field.
- And I will probably need the units as well.
- We will use the Text To Columns from the Data Tools command group in the Data tab to do this.
-
- Be careful when you use this tool, it will clobber
- The data you are working on.
- Data to the right of your data.
- Splitting a text field.
- Select C2:C10 and click on Text to Columns
- This starts the Convert Text to Columns Wizard
-
- Delimited means we know what we want to split on.
- In this case, tabs or spaces is what we want.
- Fixed means fields are aligned and we know how wide each is.
- We want delimited so select next.
- In step 2 we need to tell it to split on space.
-
- Select space.
- I usually want to treat consecutive delimiters as one.
- Make sure that you check out the preview.
- In step 3, we can
-
- Format columns (sort of)
- Eliminate columns.
- Format the first as general,
- The next two as text
- Drop the last one.
- Select Finish
-
- Finishing the Weight Field.
- The first two, excel decided to convert to dates anyway
- We will want a 0 in the first, just put it there.
- The second should be 1/8 so enter =1/8 there
- We will need to reformat these fields.
- C2 as a number
- C3 as a fraction, but listen to the warning.
-
- We will need some constants for conversions, so
- In A15 put the following
-
- You can name individual cells
- Click in cell B16
- In the name box enter LbToTon
-
- Repeat this for the other two fields.
- In column F, compute the weight in Pounds
- Use an = to just copy over C1:C7
- Multiply by LbToTon in the last 3
-
-
- You can now delete Fields D and E
- We need to repeat the same process for the Height or Length column.
- But there is already data to the left, so we need to make space.
- I believe that we will need three columns.
- Insert these three columns to the RIGHT of the Height filed.
- Do the split.
- Replace the first one with 0
- In the new column E, convert everything to Feet.
- I forgot an Inch To Foot constant, so Add it.
- Delete columns C and D.
- Format the table headers nicely.
-
-
- Save this workseet, we will need it in the next set of notes.