An overview of the workflow to generate a tested and tuned machine learning algorithm that takes recent information about houses sold in Tucson, AZ and accurately predicts their sale price. This is the first step in building an interactive app that people can use to estimate the likely sale price of a house.
This is the workshop I recently ran for the iSpace Workshop Series at the University of Arizona Science and Engineering Laboratory. We used the caret package to work through an example classification problem, covering loading data, preprocessing data, model comparison, and prediction on test data.
An updated Random Forest bean model with new data, now spanning 1987-2017. I show off the new model (which still has excellent predictive power, though somewhat less than the previous model's) and demonstrate some new analytics to maximize the purchasing insight we can draw from the model outputs.
Here, I demonstrate a simple investing/selling simulation based on historic bean data and the Random Forest model we built two posts ago. The basic premise is to see how much money we could have made from the 1980s through 2011 if we had had access to the predictive model, buying and selling beans based on its predictions.
A short post detailing a nice visualization suggested by a reader!
Taking a break from prediction and machine learning, we go old-school and use some visualization and summary techniques from the Tidyverse to explore the bean data set we've built over the last few posts.
Finally, we get to some machine learning models. I go over using the functions in the caret package to build, test, and tune a variety of models, ending up with a nice Random Forest model that does a very solid job of predicting bean market prices six months in the future. We finish up with a little bit of data viz to assess our predictive power.