Cleaning and Documenting
- The modified worksheet is here
- Start by adding an About worksheet at the beginning
- Add your name
- A project description
- Add a description of the data.
-
- Set up the data dictionary
- Enter Data Dictionary, Field, Type and Discussion
- Format these.
-
- Copy and paste, transposed a set of headers.
-
- Complete the type and discussion fields.
-
- If this were a long term project, we should archive a copy of the current worksheet.
- Preserving the raw data is very useful.
- Let's start cleaning the data some.
- This is gong to be painful
- Let's sort the data first.
- I think we can delete two fields
- Select is a context sensitive field based on if the user has permission to enroll in the class.
- I think we can safely delete this from all sheets.
- Attribute is the same for all sheets by definition.
- I think we can safely delete this from all sheets.
- Delete these two, but document the decision.
- This caused a blank row at the top of each data set, so I deleted that as well.
- I resized all of the rows so that they fit on the worksheet.
-
- I just realized that I have inserted blank rows into my data. This is not good.
- I want to restart, but I don't want to lose the ABOUT tab, so
- I want to transfer the about tab back to the archived copy.
- Excel will not let you have two versions of the same workbook open at the same time
- Save the current worksheet as junk.
- Open ComputerLiteracyA.xlsx
- Use the Move or Copy option on the tab menu to transfer the about tab to ComputerLiteracyA.xlsx.
-
- Ok, so let's try to sort the data now.
- Go to the S20 tab
- Click anywhere in the data
- On the Data Tab, click Smallest to Largest in the Sort & Filter command group.
-
- Perhaps not what we want, but it is a start.
- Delete the line with Accounting
- This seems reasonable, it looks like the Program and the Subj match
- IE we don't lose any information by deleting this.
- This gives us a set of headers at the top of the table.
- This will allow us to sort using the Sort button
- Sort on the Actual field.
- Notice, it is confused and thinks this is a text field.
- That is ok, look at the data at the bottom, does it have any data we need to maintain?
- It will also allow us to filter the data.
- Filter on act not blank or Act.
- Looks like the data we need.
- Select this
- Press CTRL-Home to go to cell A1
- Press CTRL-SHIFT-DOWN_ARROW to select all rows
- Press CTRL-SHIFT-RIGHT_ARROW to select all data.
- Or
- press CTRL-END to move to one past the last cell.
- Move up one cell.
- Press CTRL-SHIFT-Home to select the table.
- In either case select copy.
- Add a new worksheet, called cleaned data.
- Before we go on, we better add a new column
- Right now, the term is identified by the worksheet the data is on.
- We probably want a field that identifies this when we merge all the data.
- We could compute this form the dates, but that might get messy
- And it is easy to do do now.
- So add a field Term
- Put Spring 20 in the first cell.
- Copy that down, darn fix it.
- We better add Term to our data dictionary right now.
- Go back
- Sort and filter all other semesters
- Add the Term field
- Copy and paste the data, without the header into the cleaned data sheet.
- Now delete the Select and Attribute fields.
- Save this as ComputerLiteracyB.xlsx
- Look through the data, do we need to do anything else?
- Update the data dictionary (TBA, HON, ...)