A Data Science Project Paper Overview
- Cover Page
- Table of Contents
- Introduction
- Describe the basics of your project: what question you will try to answer and what data you will look at.
- This section should:
- Introduce anything the reader needs to understand to read the rest of the document.
- Draw the reader in and invite them to continue reading.
- This section should not discuss meta-information about the project or paper (i.e., this stuff).
- Data
- Overview
- Introduce your dataset.
- Where did you get this data?
- How much data is there (rows and fields)?
- If you will filter your data (remove some of the original dataset):
- Briefly describe why you will filter the data.
- Provide an overview (size in rows and fields) of the remaining data.
- Describe the fields of interest.
- Define any specialized terms associated with this dataset.
- Data Dictionary
- Provide a description of every field in the dataset.
- This should be a well-formatted table or a bulleted list.
- Example Data
- Give a few rows from the dataset.
- These should contain examples of valid data in all fields.
- This should be one or more well-formatted tables.
- Data Cleaning
- If you decide to filter the data (not due to problems):
- Discuss the reasons for filtering the data.
- Discuss the rationale for selecting the data you did.
- Describe how you filtered the data (technical Excel discussion).
- For each problem you encounter with the data:
- Describe the problem.
- Provide an example of bad or missing data.
- Describe how the problem will be mitigated.
- Give a rationale for fixing this problem the way you did.
- Describe the actions taken in Excel to mitigate the problem.
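Before describing a missing-data problem, it helps to know how widespread it is. The following is a minimal sketch in Python rather than Excel, assuming the data can be exported as CSV. The sample rows, the field names, and the helper `count_missing` are all hypothetical and exist only for illustration.

```python
import csv
import io

# Hypothetical sample standing in for a real exported dataset.
SAMPLE = """id,age,city
1,34,Erie
2,,Pittsburgh
3,29,
4,41,Erie
"""

def count_missing(rows, fields):
    """Count blank cells in each field across all rows."""
    counts = {field: 0 for field in fields}
    for row in rows:
        for field in fields:
            # Treat None and whitespace-only strings as missing.
            if (row.get(field) or "").strip() == "":
                counts[field] += 1
    return counts

reader = csv.DictReader(io.StringIO(SAMPLE))
rows = list(reader)
missing = count_missing(rows, reader.fieldnames)
print(missing)
```

A count like this gives a concrete number to report alongside the example of bad data, and it can support the rationale for the mitigation you chose.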
- Analysis of Fields Used
- This will depend on the field.
- Numeric data should have at least a five-number summary.
- Include some form of chart or graph to depict the data.
- If you performed some computation on this data:
- Describe that computation in sufficient detail that a knowledgeable Excel user could reproduce it.
- This will include a written discussion.
- It should include screenshots for any out-of-the-ordinary computations.
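For reference, a five-number summary consists of the minimum, first quartile, median, third quartile, and maximum of a field. The following is a minimal sketch in Python rather than Excel. The function name `five_number_summary` and the sample values are hypothetical. Quartile conventions differ between tools, so the quartiles may not match a spreadsheet's exactly.

```python
import statistics

def five_number_summary(values):
    """Return (minimum, Q1, median, Q3, maximum) for a list of numbers."""
    data = sorted(values)
    # n=4 asks for the three cut points that split the data into quarters.
    # method="inclusive" interpolates between the smallest and largest values.
    q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    return (data[0], q1, median, q3, data[-1])

# Hypothetical field values, invented for illustration only.
ages = [23, 35, 41, 29, 52, 38, 47]
summary = five_number_summary(ages)
print(summary)  # (23, 32.0, 38.0, 44.0, 52)
```

Whichever tool you use, state the quartile method in the paper so a reader can reproduce the numbers.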
- Methods
- What did you do beyond basic analysis of the data?
- Include a description of what you did.
- Discuss why you did it.
- Provide a description of how you did it in Excel.
- Include a description and screenshots as required.
- What did it yield?
- Problems Encountered
- What went wrong?
- Why did it go wrong?
- This is not a whining section.
- You want to provide your reader with a guide to problems.
- You want to document what you tried that didn't work so readers know not to try it.
- Or at least not to try it the way you did.
- Don't include:
- A discussion of how you didn't know how to use Excel.
- You will never completely know how to use a tool.
- Tools are too large.
- There are constantly new pieces added.
- People discover new ways to do things.
- A discussion of running out of time.
- A discussion of what you did not know how to do.
- Discoveries and Conclusions
- Restate what you have learned/discovered.
- State the question and discuss the answer based on what you have discovered/learned.
- Future Work
- Talk about things you would like to do but did not have time to accomplish.
- But don't say "I ran out of time, but I would like to ..."
- New data you would like to collect to strengthen your study.
- Other areas you would like to investigate.
- Bibliography