Some Finishing Touches
- We will continue working with this document.
- Today you will also need heroes_information.csv.
- I would like to add
- A preliminary data dictionary
- A sample of the data.
- The original data was in a zip file.
- This allows you to package many files into a single file.
- Windows is good at handling zip files.
- Double click on it and drag out the files you want
- Or right click on it and select extract all.
- Do this demo quickly.
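For anyone who prefers a script to the mouse, the same list-and-extract steps can be sketched with Python's standard zipfile module. This is only an illustration: the archive is built in memory so the example runs on its own, and the file contents, the notes.txt entry, and the helper names are invented for the demo, not taken from the real download.

```python
import io
import tempfile
import zipfile
from pathlib import Path


def build_demo_zip() -> bytes:
    """Build a tiny zip archive in memory (a stand-in for the downloaded file)."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        # Invented contents, for illustration only.
        archive.writestr("heroes_information.csv", "name,team\nExample Hero,Demo Squad\n")
        archive.writestr("notes.txt", "placeholder second file\n")
    return buffer.getvalue()


def list_and_extract(zip_bytes: bytes, wanted: str, target_dir: str):
    """List the files in the archive, then pull out just the one we want."""
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as archive:
        names = archive.namelist()                         # like double clicking the zip
        extracted = archive.extract(wanted, path=target_dir)  # like dragging one file out
    return names, Path(extracted)


with tempfile.TemporaryDirectory() as tmp:
    names, path = list_and_extract(build_demo_zip(), "heroes_information.csv", tmp)
    print(names)
    print(path.read_text())
```

Calling archive.extractall(target_dir) instead of archive.extract(...) would correspond to the "extract all" option.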
- Download and open heroes_information.csv.
- Note that the file type is .csv
- This stands for comma separated values.
- There is a comma between every field.
- But in addition, there are rules if a field contains a comma.
- Open this with Notepad to see the raw text.
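The rule for a field that contains a comma is easiest to see in a small example. The sketch below uses Python's standard csv module with its default settings; the rows are invented for the demo and are not from heroes_information.csv. A field containing a comma is wrapped in double quotes, and a double quote inside a field is written as two double quotes.

```python
import csv
import io

# Invented rows: one field contains commas, one contains double quotes.
rows = [
    ["name", "description"],
    ["Example Hero", "Strong, fast, and invented for this demo"],
    ['The "Quoted" One', "no comma here"],
]


def to_csv_text(table):
    """Write the rows as CSV text, the way they would appear in Notepad."""
    buffer = io.StringIO()
    csv.writer(buffer, lineterminator="\n").writerows(table)
    return buffer.getvalue()


def from_csv_text(text):
    """Read CSV text back into a list of rows."""
    return list(csv.reader(io.StringIO(text)))


text = to_csv_text(rows)
print(text)
# name,description
# Example Hero,"Strong, fast, and invented for this demo"
# "The ""Quoted"" One",no comma here

# Reading the text back recovers the original fields, commas and all.
print(from_csv_text(text) == rows)  # True
```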
- Excel is happy to work with CSV files.
- BUT
- Unless you change the file format when you save, the file will not preserve any of the formatting, formulas, ...
- Just the data.
- And only the tab you save.
- Walk through saving as an Excel workbook (.xlsx)
- Adding an About tab
- At the bottom of the Excel workbook, click on the plus sign inside the circle.
- This will add a new worksheet.
- The entire document is called a workbook
- "Pages" within a workbook are called worksheets
- We use these to organize the workbook and make it more readable.
- Right click on the tab and select Rename
- Rename the heroes_information tab to be Raw Data
- Grab the tab and move it first in the list of tabs.
- Populate the About Sheet
- Put your name in cell A1
- Put "Super Heroes Project" in cell A2
- Put a description of the project in cell A4
- Put source information (Kaggle, URL, ...) in cells A6 and A7
- Build a data dictionary
- In cell B9 put Field
- In cell C9 put Type
- In cell D9 put Description
- Make these bold and draw a bottom border on the cells.
- Copy the headings from the Raw Data tab and paste them in cells B10:B...
- Copy, then paste with the Transpose option so the row of headings becomes a column.
- Add the types (Text, Numerical, Categorical, ...)
- Look at the Kaggle page and add a description for each field.
- At least for the fields where there is a description.
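The transpose-and-type steps above can also be sketched in a few lines of Python, as a cross-check on the hand-built dictionary. Everything here is an assumption made for the demo: the three columns and their rows are invented, and guess_type is a rough heuristic (all values parse as numbers means Numerical, repeated values means Categorical, otherwise Text), not a rule from the course. Descriptions are left blank because they still have to come from the Kaggle page.

```python
import csv
import io

# Invented sample, for illustration only.
SAMPLE = """name,Gender,Height
Example Hero,Male,203.0
Sample Heroine,Female,180.0
Placeholder Person,Male,175.5
"""


def guess_type(values):
    """Rough heuristic for a column's type; always review the result by hand."""
    try:
        for value in values:
            float(value)
        return "Numerical"
    except ValueError:
        pass
    # Repeated values suggest a category; all-unique values suggest free text.
    return "Categorical" if len(set(values)) < len(values) else "Text"


def build_dictionary(csv_text):
    """Turn the header row into (Field, Type, Description) rows."""
    rows = list(csv.reader(io.StringIO(csv_text)))
    header, data = rows[0], rows[1:]
    columns = list(zip(*data))  # transpose: rows become columns
    return [(field, guess_type(column), "") for field, column in zip(header, columns)]


for field, kind, description in build_dictionary(SAMPLE):
    print(f"{field:10} {kind:12} {description}")
```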
- When working on data, always make an About tab.
- It is probably best to save the workbook now.
- Let's add this data dictionary to the document.
- From the workbook, select the fields and descriptions
- Highlight one, then Ctrl-click and highlight the second.
- Copy
- Move to the document, and paste with Keep Text Only.
- Format this with Bullets (Home tab, Paragraph command group)
- Not great. When we formatted the entire document as double spaced, it changed the lists as well.
- Find the style for lists and change that to single spaced.
- While we are here note the different types of lists we can build.
- I find this difficult, but
- As you work on a long term project, you should keep all of the pieces up to date.
- Build the ABOUT page when you grab the data, not later.
- Build the data dictionary when you grab the data, not later.
- Document your work as you go
- DO IT NOW.
- Project Proposal -> Methods Document
- In my mind these are the same thing.
- The Methods document/final report should be a continuation of the Project Proposal.
- Just keep adding pieces.
- Sample Data.
- Add a new heading 2, Sample Data
- Copy a few rows from the Raw Data tab, including the headings.
- Paste this as a table in the document under Sample Data.
- This just doesn't fit.
- We could mess with it and make it fit.
- But other data sets will be larger.
- Put a Continuous Section Break before and after the sample data.
- Move to the Sample Data section and change the page orientation to be Landscape.
- This is on the Layout tab, in the Page Setup command group, under Orientation.
- Spend a little time formatting the table.
- Look at the table tabs, and apply a reasonable style.
- If you need more space, split your data into two or more tables.
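Taking a small sample and splitting a wide table into narrower ones can be sketched as follows. This is a minimal illustration under stated assumptions: the five columns and their rows are invented, and sample_rows and split_columns are hypothetical helper names, not part of any tool used in class.

```python
# Invented table: a header row followed by data rows.
TABLE = [
    ["name", "Gender", "Height", "Weight", "Publisher"],
    ["Example Hero", "Male", "203", "441", "Demo Comics"],
    ["Sample Heroine", "Female", "180", "70", "Demo Comics"],
    ["Placeholder Person", "Male", "175", "80", "Other Press"],
]


def sample_rows(rows, count):
    """Return the header row plus the first `count` data rows."""
    return rows[: count + 1]


def split_columns(rows, max_cols):
    """Split a wide table into several tables of at most `max_cols` columns each."""
    width = len(rows[0])
    return [
        [row[start:start + max_cols] for row in rows]
        for start in range(0, width, max_cols)
    ]


sample = sample_rows(TABLE, 2)
for table in split_columns(sample, 3):
    for row in table:
        print(" | ".join(row))
    print()
```

With five columns and a limit of three, this produces two tables: one with the first three columns and one with the remaining two, each keeping its own headings.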