
TC's Data Science Vault

Keep an eye out for code sharing, data projects and interesting analytics...

Predicting Property Sales Prices using Ames Housing Data

The Ames Housing Dataset, compiled by Dean De Cock, is widely used in data science education. It contains 80 explanatory variables describing aspects of residential homes in Ames, Iowa. The goal of this analysis is to predict property sale price using regression.

This project was undertaken from the perspective of an investment company that wants to use statistical analysis to optimize investment and maximize return. The analysis had two components:

Analysis Results

The fixed variables were found to be the most predictive property attributes for sale price.

The fixed variable model is a multiple linear regression. It identifies the following variables as the most predictive of sale price, ranked by the absolute value of their (lasso) coefficients:

  1. Total Square Foot: 33458
  2. Year Built: 12568
  3. GF Living Area: 9852
  4. Garage Area: 8389
  5. Fireplaces: 7205
  6. Garage Year Blt: 5034
  7. Bathrooms (Number) Total: 4157
  8. Lot Area: 3610
  9. Garage Car Spaces: 3507
  10. Lot Frontage: 3116
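A ranking like the one above can be produced by fitting a lasso model on standardised features and sorting the coefficients by absolute value. The following is only a minimal sketch under stated assumptions: the data is synthetic (not the Ames data), the feature names and the `alpha` value are hypothetical, and it is not taken from the technical notebook.

```python
import numpy as np
from sklearn.linear_model import Lasso
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in data: NOT the Ames dataset. Feature names are hypothetical.
rng = np.random.default_rng(0)
feature_names = ["total_sqft", "year_built", "garage_area", "lot_area"]
X = rng.normal(size=(500, len(feature_names)))

# Assumed "true" effects, chosen only so the resulting ranking is easy to check.
true_coefs = np.array([30000.0, 12000.0, -8000.0, 0.0])
y = 180000.0 + X @ true_coefs + rng.normal(scale=5000.0, size=500)

# Standardise so each coefficient reads as "price change per standard deviation",
# which makes magnitudes comparable across features measured in different units.
X_scaled = StandardScaler().fit_transform(X)

model = Lasso(alpha=100.0)  # alpha is an arbitrary illustrative penalty strength
model.fit(X_scaled, y)

# Rank features by absolute coefficient, largest first.
ranking = sorted(
    zip(feature_names, model.coef_),
    key=lambda pair: abs(pair[1]),
    reverse=True,
)
for rank, (name, coef) in enumerate(ranking, start=1):
    print(f"{rank}. {name}: {coef:.0f}")
```

Sorting on the absolute value keeps strongly negative predictors near the top of the list; without standardisation, the coefficient sizes would mostly reflect the units of each feature rather than its predictive weight.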

The coefficient of determination (R^2) of this model on the train and test data (pre-2010 and 2010 sales, respectively) is:

Ridge and Lasso regression were then performed on the training data to see if they improved the R^2 of our linear regression:

Both regularisation techniques yield only a minor improvement, which indicates that penalising the coefficients (i.e. shrinking their magnitudes) does little to improve model performance, and that there is little overfitting.
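The comparison described above might look roughly like the sketch below: split the sales by year, fit plain, ridge and lasso regressions on the earlier sales, and compare R^2 on the held-out year. Everything here is an assumption made for illustration: the data is synthetic, the `year_sold` array and the `alpha` values are hypothetical, and the real notebook may tune the penalties differently.

```python
import numpy as np
from sklearn.linear_model import Lasso, LinearRegression, Ridge
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in data: NOT the Ames dataset.
rng = np.random.default_rng(42)
n = 1000
X = rng.normal(size=(n, 5))
effects = np.array([30000.0, 12000.0, 9000.0, 8000.0, 7000.0])  # assumed effects
y = 180000.0 + X @ effects + rng.normal(scale=20000.0, size=n)
year_sold = rng.integers(2006, 2011, size=n)  # hypothetical sale years 2006..2010

# Time-based split: train on sales before 2010, test on 2010 sales.
train = year_sold < 2010
test = ~train

models = {
    "linear": LinearRegression(),
    "ridge": Ridge(alpha=10.0),    # alpha values are arbitrary, not tuned
    "lasso": Lasso(alpha=100.0),
}

scores = {}
for name, estimator in models.items():
    # Scale inside the pipeline so the scaler is fitted on training data only.
    pipeline = make_pipeline(StandardScaler(), estimator)
    pipeline.fit(X[train], y[train])
    scores[name] = (
        pipeline.score(X[train], y[train]),  # R^2 on training sales
        pipeline.score(X[test], y[test]),    # R^2 on held-out 2010 sales
    )
    print(f"{name}: train R^2 = {scores[name][0]:.3f}, test R^2 = {scores[name][1]:.3f}")
```

When the three test scores are nearly identical and the train and test scores are close, regularisation has little to correct, which is the pattern the text describes as little overfitting.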

The changeable feature model has an R^2 value of 0.273. These features alone cannot predict sale price with confidence, although the property features that produced the most significant coefficients offered some interesting insights…
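To put that figure in context, and assuming R^2 here follows the standard definition used by most regression libraries, with observed prices y_i, predicted prices \hat{y}_i and mean price \bar{y}:

```latex
R^2 = 1 - \frac{\sum_{i} (y_i - \hat{y}_i)^2}{\sum_{i} (y_i - \bar{y})^2}
\qquad\Longrightarrow\qquad
\frac{\sum_{i} (y_i - \hat{y}_i)^2}{\sum_{i} (y_i - \bar{y})^2} = 1 - 0.273 = 0.727
```

Under that definition, the changeable features account for roughly 27% of the variation in sale price on the data being scored, leaving about 73% unexplained.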

To review the results of this analysis in greater detail, follow this link to the technical notebook:

Technical Notebook

Back to Homepage