Cleaning the House Sales Data

You will be dealing with another dataset from Kaggle.com. This one represents the prices of houses sold in King County, Washington, between May 2014 and May 2015. We will use this data for the next few exercises to see if we can find anything interesting about this housing market.

A word of caution: this is a fairly large dataset, with 21,000+ records. Be careful.

  1. Download this file. Save it somewhere you can find it.
  2. Information step: the data in this file consists of:
  3. Format the data in the ID column as a number with no decimal places.
  4. Format the price column as an amount.
  5. Hide the lat and long columns.
  6. Let's turn the date column into something useful.
  7. Go through and change the labels to something more readable. For example, sqft_lot should be changed to SQFT Lot.
  8. It appears that the sizes of some houses and lots have changed.
  9. It would be nice to have all of the size measurements together.
  10. Let's do a check to see if the living area matches the upper + basement area.
  11. It would probably be good to see the age of the house when sold.
  12. Save your work.
  13. Submit your saved document to the Houses Part 1 folder in the Assignment section of D2L for this class.
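
If you want to double-check a few of your results outside the spreadsheet, the date conversion, the living area check, and the age calculation from the steps above can be sketched in Python. This is only a rough illustration, not part of the assignment. The column names sqft_living, sqft_above, sqft_basement, and yr_built, the compact date layout, and the sample row are all assumptions; compare them against the headers and values in your own copy of the file before relying on them.

```python
from datetime import datetime

# Hypothetical sample row. The column names and the date layout are
# assumptions, not taken from the assignment; adjust them to your file.
sample_row = {
    "date": "20141013T000000",
    "sqft_living": "1180",
    "sqft_above": "1180",
    "sqft_basement": "0",
    "yr_built": "1955",
}


def parse_sale_date(raw):
    """Turn a compact timestamp such as 20141013T000000 into a date."""
    return datetime.strptime(raw, "%Y%m%dT%H%M%S").date()


def living_area_matches(row):
    """Return True when living area equals above-ground plus basement area."""
    above = int(row["sqft_above"])
    basement = int(row["sqft_basement"])
    return int(row["sqft_living"]) == above + basement


def age_when_sold(row):
    """Age of the house in years at sale: sale year minus year built."""
    return parse_sale_date(row["date"]).year - int(row["yr_built"])


sale_date = parse_sale_date(sample_row["date"])
print(sale_date.isoformat(), living_area_matches(sample_row), age_when_sold(sample_row))
```

In the spreadsheet itself, the same checks are ordinary formulas in new columns; the sketch is just a second way to confirm that a handful of rows come out the way you expect.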