An updated random forest bean model with new data, now spanning 1987–2017. I show off the new model (which still has excellent predictive power, though somewhat less than the previous model) and demonstrate some new analytics to get the most purchasing insight out of the model's outputs.
Here, I demonstrate a simple buying/selling simulation based on historic bean data and the Random Forest model we built two posts ago. The basic premise is to see how much money we could have made if we'd had access to the predictive model from the late 1980s through 2011, and had bought and sold beans based on its predictions.
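To give a flavor of the mechanics, here's a minimal sketch of a simulation along those lines. This is not the post's actual code: the data frame `beans` and its columns `price` and `pred_6mo` are stand-ins for illustration.

```r
# A minimal buy/sell simulation sketch. Assumes a data frame `beans`,
# ordered by date, with hypothetical columns `price` (current market
# price) and `pred_6mo` (the model's six-month-ahead price prediction).
simulate_trading <- function(beans, starting_cash = 1000) {
  cash <- starting_cash
  held <- 0                                  # quantity of beans held
  for (i in seq_len(nrow(beans))) {
    if (beans$pred_6mo[i] > beans$price[i] && cash > 0) {
      held <- held + cash / beans$price[i]   # predicted rise: buy
      cash <- 0
    } else if (beans$pred_6mo[i] < beans$price[i] && held > 0) {
      cash <- cash + held * beans$price[i]   # predicted fall: sell
      held <- 0
    }
  }
  cash + held * beans$price[nrow(beans)]     # final portfolio value
}
```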
Taking a break from prediction and machine learning, we go old-school and use some visualization and summary techniques from the Tidyverse to explore the bean data set we've built over the last few posts.
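As a taste of that kind of exploration, something like the following is the general idea; the data frame `bean_prices` and its columns `class`, `date`, and `price` are made-up names, not the post's actual variables.

```r
# A sketch of Tidyverse-style summarizing and plotting, assuming a
# long-format data frame `bean_prices` with hypothetical columns
# `class` (bean type), `date`, and `price`.
library(dplyr)
library(ggplot2)

# Summary statistics by bean class
bean_prices %>%
  group_by(class) %>%
  summarize(mean_price = mean(price, na.rm = TRUE),
            sd_price   = sd(price, na.rm = TRUE),
            n_obs      = n())

# Price over time, one line per bean class
ggplot(bean_prices, aes(x = date, y = price, color = class)) +
  geom_line() +
  labs(x = NULL, y = "Price", color = "Bean class")
```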
Finally, we get to some machine learning models. I go over using the functions in the caret package to build, test, and tune a variety of models, and end up with a nice Random Forest model that does a very solid job of predicting bean market prices six months into the future. We finish up with a little bit of data viz to assess our predictive power.
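For readers unfamiliar with caret, a tuning run like the one described might look roughly like this. The formula, data frame, and tuning settings below are placeholder assumptions, not the post's actual setup.

```r
# A minimal sketch of tuning a random forest with caret.
library(caret)

set.seed(42)
ctrl <- trainControl(method = "repeatedcv", number = 10, repeats = 3)

rf_fit <- train(price_6mo_ahead ~ .,   # hypothetical response column
                data = train_df,       # hypothetical training set
                method = "rf",         # random forest
                trControl = ctrl,
                tuneLength = 5)        # try 5 values of mtry

rf_fit$results   # RMSE and R-squared across the tuning grid
```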
Here, we take the next step in the bean-market-prediction machine learning project. I go through the steps I use to preprocess the data and split it into training and test sets using the R package caret.
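The split-then-preprocess pattern with caret looks roughly like this; the data frame `beans` and the response column `price_6mo_ahead` are assumptions for illustration.

```r
# A sketch of splitting and preprocessing with caret.
library(caret)

set.seed(42)
in_train <- createDataPartition(beans$price_6mo_ahead,
                                p = 0.8, list = FALSE)
train_df <- beans[in_train, ]
test_df  <- beans[-in_train, ]

# Learn centering/scaling parameters from the training predictors only,
# then apply the identical transformation to both sets
predictors <- setdiff(names(beans), "price_6mo_ahead")
pp <- preProcess(train_df[, predictors], method = c("center", "scale"))
train_df[, predictors] <- predict(pp, train_df[, predictors])
test_df[, predictors]  <- predict(pp, test_df[, predictors])
```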
An overview of my experience jumping into Machine Learning: the mental shift from explanation to prediction, and a basic sketch of my understanding of what Machine Learning is (and isn't).
Messy Excel Files
So, as I discussed last time, the first big hurdle in starting to explore the domestic dry bean market data was overcoming the terror of working with a bunch of really messy, really gnarly Excel files.
The main one looks like this: lots of problems, right? The data are spread across multiple sheets in a single workbook, they're not uniform, etc. It's an R user's nightmare, but the reality is that data often look like this.
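For the curious, here's one plausible first move in R: reading every sheet of the workbook into a single data frame with readxl and purrr. The file name and the `skip` value are assumptions; the real sheets are messier and will need per-sheet cleanup.

```r
# A sketch of pulling all sheets of a multi-sheet workbook into one
# data frame, tagging each row with its source sheet.
library(readxl)
library(dplyr)
library(purrr)

path <- "bean_prices.xlsx"   # hypothetical file name

all_sheets <- excel_sheets(path) %>%
  set_names() %>%                # name each element by its sheet name
  map_df(~ read_excel(path, sheet = .x, skip = 2),  # skip junk rows
         .id = "sheet")
```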