• I wanted to understand who was in this class.
    • Questions: Is there anything special I should be doing for this class?
  • Step 1, Find some data:
    • My only real data was from SCOTS.
      • We have access to a detailed class list.
      • Unfortunately this is
        • A web page, (in html)
        • And not directly accessible (you will see why this is important later).
  • Step 2: Acquire and transform the data.
    • I did a copy and paste from the web page into a text file.
    • This is not terribly useful, so I wrote a python script to convert this to a Comma Separated Value or (CSV) file.
    • reader.py
    • This produced
  • Step 3: Import and clean the data.
    • Just clicking on a CSV file will start Excel.
    • The data consists of
      1. Level (undergraduate/graduate)
      2. Class (Freshman, Sophomore, Junior, Senior)
      3. Year
      4. First Program
      5. First College
      6. First Department
      7. First Program
      8. First Track
      9. Second Program Information
      10. Second College
      11. Second Department
      12. Second Program
      13. Second Track
    • These are not labeled, so I labeled them
    • I really don't care about the Level, Year, and any secondary degree information.
    • So I removed these fields.
      • By the way, two definitions
        • A record is a collection of information about an individual element of the population.
          • In this case the data on each of you constitutes a record.
          • In some cases, it might be an individual measurement.
          • But it is all the data associated with the individual/measurement, ...
        • A field is a piece of data in a record.
          • In this case the Class, Program, College, Department, Major and Track are fields.
        • Records are composed of fields.
        • Hopefully datasets are composed of records.
    • At this point we should note
      • For multiple degree students, we have the first degree listed, not all degrees.
      • Not all students have tracks.
  • Step 4 basic data exploration
    • Easy data exploration is to turn the worksheet into a table.
      • This allows you to filter the data
      • And see that we mostly have freshmen,
      • Note: the upperclassmen are all science majors.
    • Removing the table, subtotals provide an easy way to see the class counts
    • Building a pivot table allows additional exploration
      • What colleges are the Freshmen/Sophomores from?
      • We can even produce a view of the counts by college/Program
  • Step 5: Interpretations/Recommendations
    • There is a small number of Juniors/Seniors enrolled in this class
      • All are science majors.
      • Most have credit for computer skills. (Look each one up on SCOTS)
      • Recommendation: Have an alternative for these students.
    • The majority of the remaining students are first semester freshmen
      • Recommendation: Make sure to pay attention to freshman issues.
    • The majority of the freshmen are likely to have or need quantitative
      • Recommendation: Make sure that basics are covered but some chance for more quantitative analysis.
      • Recommendation: Look for more business oriented data sets.
  • The data files
    • class.csv the csv file.
    • Analysis.xlsx the analysis file.