Starting a Project
- For the next few sets of notes we will be working with the Super Heroes Dataset from Kaggle.
- A local copy is available.
- Download this.
- Save it as an Excel file somewhere you can get access to it.
- Worksheets
- Take a look at the specification for the worksheets.
- Build About, Raw Data, and Cleaned Data worksheets.
- We will not construct the About sheet in these notes, but it should be there for the final project.
- Preparing to start.
- Take a look at the Methods Document rubric right now.
- My document needs some reorganization.
- I will use the rubric, but the following will be a good guide to creating your final document.
- Right now, open a Word document.
- Add the following sections
- Table of Contents
- Introduction
- Describe the basics of your project: what question will you try to answer, and what data will you look at?
- This section should
- Introduce anything the reader needs to understand to read the rest of the document.
- Draw the reader in, invite them to continue reading.
- Don't discuss meta-information about the project or paper (i.e., this stuff).
- Data
- Overview
- Introduce your dataset.
- Where did you get this data?
- How much data is there (rows and fields)?
- If you will filter your data (remove some of the original dataset):
- Briefly describe why you will filter the data.
- Provide an overview (size in rows and fields) of the remaining data.
- Describe the fields of interest.
- Define any specialized terms associated with this dataset.
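The row and field counts asked for in the Overview can be cross-checked outside Excel. The following is a minimal Python sketch, assuming the worksheet has been exported as CSV text with a header row. The function name `dataset_size` and the sample field names are hypothetical and are not taken from the actual dataset.

```python
import csv
import io


def dataset_size(csv_text):
    """Return (rows, fields) for CSV text whose first line is a header row."""
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader)           # the field names
    rows = sum(1 for _ in reader)   # data rows, header excluded
    return rows, len(header)


# Hypothetical sample; the real dataset will have different fields.
sample = "name,publisher,height\nHero A,Pub X,180\nHero B,Pub Y,175\n"
print(dataset_size(sample))
```

Running the same count on the raw and the filtered data gives the before and after sizes the Overview asks for.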
- Data Dictionary
- Provide a description of every field in the dataset.
- This should be a well-formatted table or a bulleted list.
- Example Data
- Give a few rows from the dataset.
- These should contain examples of valid data in all fields.
- This should be one or more well-formatted tables.
- Data Cleaning
- If you decide to filter the data (for reasons other than problems with the data):
- Discuss the reasons for filtering data.
- Discuss the rationale for selecting the data you did.
- Describe how you filtered the data (technical Excel discussion).
- For each problem you encounter with the data
- Describe the problem.
- Provide an example of bad/missing data
- Describe how the problem will be mitigated.
- Give a rationale for fixing this problem the way you did.
- Describe the actions taken in Excel to mitigate the problem.
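As a cross-check on the problems found in Excel, missing values can also be located programmatically. This is a minimal Python sketch under stated assumptions: the rows are already loaded as dictionaries, and an empty string or a lone "-" marks a missing value. Both markers, the function name `find_missing`, and the sample fields are hypothetical; check how your own dataset actually records missing data.

```python
def find_missing(rows, markers=("", "-")):
    """Map each field name to the indices of rows where its value is missing."""
    problems = {}
    for index, row in enumerate(rows):
        for field, value in row.items():
            # Treat None and any assumed marker string as missing.
            if value is None or str(value).strip() in markers:
                problems.setdefault(field, []).append(index)
    return problems


# Hypothetical rows; the field names are illustrative only.
rows = [
    {"name": "Hero A", "height": "180"},
    {"name": "Hero B", "height": "-"},
    {"name": "", "height": "175"},
]
print(find_missing(rows))
```

The row indices it reports point to concrete examples of bad or missing data that can be quoted in this section.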
- Analysis of Fields Used
- This will depend on the field.
- Numeric data should have at least a five number summary.
- Include some form of chart or graph to depict the data.
- If you performed some computation on this data:
- Describe that computation in sufficient detail that a knowledgeable Excel user could reproduce the computation.
- This will include a written discussion.
- It should include screenshots for any out-of-the-ordinary computations.
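The five number summary mentioned above consists of the minimum, first quartile, median, third quartile, and maximum. The following is a minimal Python sketch for checking values computed in Excel. It assumes quartiles are computed with linear interpolation that includes both endpoints, which is intended to correspond to Excel's `QUARTILE.INC`; verify this against your own worksheet, since other quartile conventions give slightly different results. The function name `five_number_summary` is hypothetical.

```python
import statistics


def five_number_summary(values):
    """Return (min, Q1, median, Q3, max) for at least two numeric values."""
    data = sorted(values)
    # method="inclusive" treats the data as covering the full range,
    # interpolating linearly between neighbouring values.
    q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    return data[0], q1, median, q3, data[-1]


print(five_number_summary([9, 1, 8, 2, 7, 3, 6, 4, 5]))
# prints (1, 3.0, 5.0, 7.0, 9)
```

Non-numeric or missing entries should be removed or repaired first, as described in the Data Cleaning section.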
- Methods
- What did you do beyond basic analysis of the data?
- Include a description of what you did.
- Discuss why you did it.
- Provide a description of how you did it in Excel.
- Include a description and screenshots as required.
- What did it yield?
- Problems Encountered
- What went wrong?
- Why did it go wrong?
- This is not a whining section.
- You want to provide your reader with a guide to problems.
- You want to document that you tried something that didn't work so they know not to try it,
- or at least not to try it the way you did.
- Don't include
- A discussion of how you didn't know how to use Excel.
- You will never completely know how to use a tool.
- Tools are too large.
- New pieces are constantly added.
- People discover new ways to do things.
- A discussion of running out of time.
- A discussion of what you did not know how to do.
- Discoveries and Conclusions
- Restate what you have learned/discovered.
- State the question and discuss the answer based on what you have discovered/learned.
- Future Work
- Talk about things you would like to do but did not have time to accomplish.
- But don't say "I ran out of time, but I would like to ..."
- New data you would like to collect to strengthen your study.
- Other areas you would like to investigate.
- Bibliography
- This is a general set of guidelines.
- Start with your proposal document.
- Go through the rubric and see what it asks for.
- If you don't have information for a section
- State that you didn't encounter the issue/problem the section discusses
- State why this is the case.
- For example
- If your dataset was in great shape when you got it your Data Cleaning section could consist of the following:
- "After basic analysis it was found that the data contained no errors. It was not necessary to filter or clean the raw dataset to obtain a usable version of the data."
- Create lots of charts (graphs) and include them in your presentation.
- Keep your worksheets labeled and clean as you go.
- You won't have to go back and redo the work when you produce your final report.
- As we learn a new technique, apply it to your dataset.