Homework 1: title

The goals of this homework are:

  1. Read Twenty Principles for Good Spreadsheet Practice by ICAEW.
    1. [4 points] Select any two principles between 8 and 16 inclusive.
      1. Summarize the principle in your own words.
      2. Explain how following the principle would lead to spreadsheets that are more usable or contain few errors.
  2. [2 points] Read Tidy Data by Hadley Wickham. The tidy tools portion of this document clearly does not apply to this class, does this mean we should not put our data into a "tidy" form? Why or why not?
  3. Analyze the Groundhog Day Forecasts and Temperatures from kaggle.com. I have provided a local copy if you don't want to sign up for an account. See kaggle for a description of the fields.

    Remember, the goal here is to show you can perform a basic analysis. You do not have to learn new tools, nor perform advanced computations at this point. You may if you wish, but that is not a requirment. Please ask for assistance if you need help.

    1. Open the file in excel, then make sure you save as a workbook, not a CSV file.
      • If you save as a csv file, you will lose all of your formatting.
    2. [2 points] You must name, and use the names of all data ranges you use.
    3. [2 points] Build and maintain an About worksheet throughout this process.
      • You are attempting to answer the question: How accurate is Punxsutawney Phil?
      • Or, Can I come up with a definition of "Winter" and "Spring" based on the data that matches Phil's?
    4. Clean the data by removing
      • All years when No record of the prediction was maintained.
      • All years when no temperature data was recorded.
      • You can do what you want with the Partial Shadow records, but document this on the about page.
      • Rename the worksheet to something reasonable.
    5. Select the fields you will be using for your analysis. I used Year, Punxsautauny Phil, Feb PA Average, March PA Average
    6. [4 points] Add a new sheet to perform a basic analysis of the fields you use in your analysis.
      • For numeric fields (other than year),
        • Compute the five number summary, count, average and standard deviation.
        • Draw a box and whisker plot of the data
        • Draw any other graphs that help you understand this data.
      • For text fields
        • Count the number in each category
        • countif is useful here.
        • Draw a picture representing the result.
    7. [4 points] Perform some additional computations to answer the question.
      • I added a column to compute if Phil predicted an early spring.
        • This is just a boolean computation
        • =b2="No Shadow"
      • I added a column to compute if it was an early spring
        • My criteria was: Is the pa average March temperature more than one standard deviation above the average of the March averages?
          • Average of all March temp = 35.99
          • SD of March temp = 4.23
      • I then calculated if the Prediction matched the Classification
        • IE Did C2=L2
      • [2 points] Finally, I computed
        • The number of actual early springs
        • The number of predicted early springs
        • The accuracy of the prediction
        • I made use of the countif function here.
    8. You are free to use any computation you wish to use to determine if the year had an early Spring.
      • I used z-score > 1.
      • Dr. Hoggard suggested a difference between the Feb Z score and the March Z score of more than 1.
      • Dr. Schaeffer suggests it is not Spring until the average temperature is above 40.
      • Dr. Puharic suggests it is not Spring until March 20/21

Submit

When you have completed this assignment, attach a word document and an excel worksheet an email message to dbennett@edinboro.edu.