Analyze the Groundhog Day Forecasts and Temperatures from kaggle.com. I have provided a local copy if you don't want to sign up for an account. See kaggle for a description of the fields.
Remember, the goal here is to show you can perform a basic analysis. You do not have to learn new tools, nor perform advanced computations at this point. You may if you wish, but that is not a requirment. Please ask for assistance if you need help.
- Open the file in excel, then make sure you save as a workbook, not a CSV file.
- If you save as a csv file, you will lose all of your formatting.
- [2 points] You must name, and use the names of all data ranges you use.
- [2 points] Build and maintain an About worksheet throughout this process.
- You are attempting to answer the question: How accurate is Punxsutawney Phil?
- Or, Can I come up with a definition of "Winter" and "Spring" based on the data that matches Phil's?
- Clean the data by removing
- All years when No record of the prediction was maintained.
- All years when no temperature data was recorded.
- You can do what you want with the Partial Shadow records, but document this on the about page.
- Rename the worksheet to something reasonable.
- Select the fields you will be using for your analysis. I used Year, Punxsautauny Phil, Feb PA Average, March PA Average
- [4 points] Add a new sheet to perform a basic analysis of the fields you use in your analysis.
- For numeric fields (other than year),
- Compute the five number summary, count, average and standard deviation.
- Draw a box and whisker plot of the data
- Draw any other graphs that help you understand this data.
-
- For text fields
- Count the number in each category
- countif is useful here.
- Draw a picture representing the result.
-
- [4 points] Perform some additional computations to answer the question.
- I added a column to compute if Phil predicted an early spring.
- This is just a boolean computation
- =b2="No Shadow"
-
- I added a column to compute if it was an early spring
- My criteria was: Is the pa average March temperature more than one standard deviation above the average of the March averages?
-
- Average of all March temp = 35.99
- SD of March temp = 4.23
- I then calculated if the Prediction matched the Classification
-
- IE Did C2=L2
- [2 points] Finally, I computed
- The number of actual early springs
- The number of predicted early springs
- The accuracy of the prediction
- I made use of the countif function here.
-
- You are free to use any computation you wish to use to determine if the year had an early Spring.
- I used z-score > 1.
- Dr. Hoggard suggested a difference between the Feb Z score and the March Z score of more than 1.
- Dr. Schaeffer suggests it is not Spring until the average temperature is above 40.
- Dr. Puharic suggests it is not Spring until March 20/21