Getting Started
- I think the best way to get started is to do a mini project.
- This class is about Excel, not strictly data analytics, but
- It carries the data science label
- It seems like a good starting place.
- Stages of a data science project
- Pose a question: what are you trying to discover?
- Find/collect/discover data that you might use to answer this question.
- Clean the data
- Data exploration
- Build a model
- Interpret the results.
- Since statistics is not a prerequisite for this class, we will probably not do the fifth stage (building a model).
- Interestingly, in 5 Steps of a Data Science Project Life-cycle,
- Lau states that 60% to 70% of the time is spent in the gathering/cleaning stage.
- We will certainly do that portion.
- This article is fairly lightweight; give it a read.
- Our question: How has enrollment in computer competency classes changed over the last few years at Edinboro?
- Our data is here
- This data was collected by Dr. Hoggard.
- He used the Advanced Search tool in SCOTS Look Up Classes selection
- He selected all departments.
- He filtered on the Session: Attribute Type: as .COMPUTER COMPETENCY COURSE
- He copied each result into a tab.
- I added S20.
- Let's do the same to add F17, which gives us three full years of data.
- Navigate to the page in SCOTS
- Add a new tab
- Rename the tab F17
- Copy and paste the data
- Use the Match Destination Formatting paste option
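- Once every term has its own tab, the enrollment question comes down to totaling enrollment per term. Here is a minimal Python sketch of that tally, offered only as an illustration and not as part of the Excel workflow. The column names `term` and `enrolled`, the course names, and all of the numbers are hypothetical; they are made up and are not the SCOTS data.

```python
def term_labels(first_fall_year, n_years):
    """Return tab names such as F17, S18, ... covering n_years academic years."""
    labels = []
    for i in range(n_years):
        labels.append(f"F{first_fall_year + i:02d}")      # fall term
        labels.append(f"S{first_fall_year + i + 1:02d}")  # following spring term
    return labels


def total_by_term(rows):
    """Sum the (hypothetical) 'enrolled' column for each term."""
    totals = {}
    for row in rows:
        totals[row["term"]] = totals.get(row["term"], 0) + int(row["enrolled"])
    return totals


# Made-up rows for illustration only; NOT the real enrollment data.
sample_rows = [
    {"term": "F17", "course": "COURSE A", "enrolled": "25"},
    {"term": "F17", "course": "COURSE B", "enrolled": "30"},
    {"term": "S18", "course": "COURSE A", "enrolled": "20"},
]

print(term_labels(17, 3))          # the six tabs from F17 through S20
print(total_by_term(sample_rows))  # one total per term
```

- Three academic years starting with F17 means six tabs, ending with S20, which is why F17 is the one missing tab.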
- The need for documentation
- There is a huge push for reproducible research in data science.
- An early article: Ten Simple Rules for Reproducible Computational Research
- For Every Result, Keep Track of How It Was Produced
- Avoid Manual Data Manipulation Steps
- Archive the Exact Versions of All External Programs Used
- Version Control All Custom Scripts
- Record All Intermediate Results, When Possible in Standardized Formats
- For Analyses That Include Randomness, Note Underlying Random Seeds
- Always Store Raw Data behind Plots
- Generate Hierarchical Analysis Output, Allowing Layers of Increasing Detail to Be Inspected
- Connect Textual Statements to Underlying Results
- Provide Public Access to Scripts, Runs, and Results
- We really can't do some of this in Excel (e.g., Rule 2 will constantly be violated).
- In fact, you can find many articles discussing the problems of using Excel for data analysis.
- But we really need to do our best to record what we have done.
- So I propose an outline for a project report paper.
- This should be a living document
- Start working on this now.
- Add to it as you work on the project.
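- One lightweight way to keep such a record is a running log of each manual step, in the spirit of Rule 1. The Python sketch below is only an illustration of the idea; the function names `log_step` and `format_log` and the entry fields are hypothetical, and a plain list typed into the report serves the same purpose.

```python
import datetime


def log_step(log, description, source=""):
    """Append one manual step to the log, stamped with today's date."""
    entry = {
        "when": datetime.date.today().isoformat(),
        "what": description,
        "source": source,
    }
    log.append(entry)
    return entry


def format_log(log):
    """Render the log as numbered lines that can be pasted into the report."""
    lines = []
    for number, entry in enumerate(log, start=1):
        line = f"{number}. [{entry['when']}] {entry['what']}"
        if entry["source"]:
            line += f" (source: {entry['source']})"
        lines.append(line)
    return "\n".join(lines)


# Example entries based on the steps described in these notes.
steps = []
log_step(steps, "Added tab F17 and pasted the search results", "SCOTS Look Up Classes")
log_step(steps, "Pasted using Match Destination Formatting")
print(format_log(steps))
```

- Each entry records what was done, when, and where the data came from, which is the minimum needed to repeat the step later.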
- By the way, while we are at it, here is a set of guidelines for a workbook