Random Forest

1 Intro
2 Data wrangling
2.1 Long format with aggregated values
2.2 Extend into the future
2.3 External regressor
2.3.1 Lags and rolling lags
2.3.2 Covid
2.3.3 Time series features
3 Splitting
4 Pre-processing recipes
Pre-processing order
5 Modelling
Workflow
6 Evaluate
6.1 Evaluate against the training set
What’s inside the calibrated table
6.1 Evaluate with cross validation
8 Conclusion

1 Intro

The aim of this series of blog posts is to predict monthly admissions to Singapore public acute adult hospitals.

Recap

This is a continuation of the explanation of machine learning model predictions, specifically those of random forest models. We can rely on the random forest package itself to explain predictions based on impurity importance or permutation importance. Today, we will explore external packages which aid in explaining random forest predictions.

External packages

There are a few external packages which calculate variable importance for random forest models, apart from the conventional measures found within the random forest package.
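As a rough illustration of the permutation importance mentioned in the recap, the sketch below shuffles one feature at a time and records how much accuracy drops. It is a minimal, package-independent sketch and not the implementation used by any random forest package: `toy_predict` is a hypothetical stand-in for a fitted model, and the data, function names and parameters are all assumptions made for the example.

```python
import random


def accuracy(predict, X, y):
    """Fraction of rows where predict(row) equals the label."""
    return sum(predict(row) == label for row, label in zip(X, y)) / len(y)


def permutation_importance(predict, X, y, n_repeats=20, seed=0):
    """Mean drop in accuracy when each feature column is shuffled in turn."""
    rng = random.Random(seed)
    baseline = accuracy(predict, X, y)
    importances = []
    for j in range(len(X[0])):
        drops = []
        for _ in range(n_repeats):
            # Shuffle column j only, leaving every other column untouched.
            column = [row[j] for row in X]
            rng.shuffle(column)
            X_perm = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, column)]
            drops.append(baseline - accuracy(predict, X_perm, y))
        importances.append(sum(drops) / n_repeats)
    return importances


# Hypothetical stand-in for a fitted model: it looks at feature 0 only.
def toy_predict(row):
    return int(row[0] > 0.5)


# Synthetic data (an assumption for this sketch): the label depends on feature 0.
data_rng = random.Random(42)
X = [[data_rng.random(), data_rng.random()] for _ in range(200)]
y = [int(row[0] > 0.5) for row in X]

importances = permutation_importance(toy_predict, X, y)
print(importances)
```

Shuffling the feature the model relies on degrades accuracy, while shuffling a feature the model ignores leaves it unchanged, so the second importance is zero here. With a real random forest, `toy_predict` would be replaced by the fitted model's prediction function.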

Intro

Recap

There are 2 approaches to explaining models:

1. Use simple interpretable models. This approach was covered in the previous posts, where we looked at logistic regression and decision trees as examples of white box models.
2. Conduct post-hoc interpretation on models. There are two types of post-hoc analysis which can be done: model specific and model agnostic.

Direction of post

In the next few posts, we will look at model specific post-hoc analysis, which involves ranking the variables according to their importance to the model.

Random Forest

Hierarchical forecasting of hospital admissions - ML approach (screen variables)

Explaining Predictions: Random Forest Post-hoc Analysis (randomForestExplainer package)

Explaining Predictions: Random Forest Post-hoc Analysis (permutation & impurity variable importance)