Tidymodels

Intro Recap There are 2 approaches to explaining models Use simple interpretable models. This approach was covered in the previous posts where we looked at logistic regression and decision trees as examples of white box models. Conduct post-hoc interpretation on models. There are two are two types of post-hoc analysis which can be done, model specific and model agonistic. Direction of post In the next few posts, we will look at model specific post-hoc analysis which involves ranking the variables according to importance to the model.

Introduction This is a follow up post of using simple models to explain machine learning predictions. In the last post, we introduced logistic regression and in today’s entry we will learn about decision tree. We will continue to use the Cleveland heart dataset and use tidymodels principles where possible. The details of the Cleveland heart dataset was also described in the last post. #library library(tidyverse) library(tidymodels) #import heart<-read_csv("https://archive.ics.uci.edu/ml/machine-learning-databases/heart-disease/processed.cleveland.data", col_names = F) # Renaming var colnames(heart)<- c("age", "sex", "rest_cp", "rest_bp", "chol", "fast_bloodsugar","rest_ecg","ex_maxHR","ex_cp", "ex_STdepression_dur", "ex_STpeak","coloured_vessels", "thalassemia","heart_disease") #elaborating cat var ##simple ifelse conversion heart<-heart %>% mutate(sex= ifelse(sex=="1", "male", "female"),fast_bloodsugar= ifelse(fast_bloodsugar=="1", ">120", "<120"), ex_cp=ifelse(ex_cp=="1", "yes", "no"), heart_disease=ifelse(heart_disease=="0", "no", "yes")) ## complex ifelse conversion using `case_when` heart<-heart %>% mutate( rest_cp=case_when(rest_cp== "1" ~ "typical",rest_cp=="2" ~ "atypical", rest_cp== "3" ~ "non-CP pain",rest_cp== "4" ~ "asymptomatic"), rest_ecg=case_when(rest_ecg=="0" ~ "normal",rest_ecg=="1" ~ "ST-T abnorm",rest_ecg=="2" ~ "LV hyperthrophy"), ex_STpeak=case_when(ex_STpeak=="1" ~ "up/norm", ex_STpeak== "2" ~ "flat",ex_STpeak== "3" ~ "down"), thalassemia=case_when(thalassemia=="3.

Introduction The rise of machine learning In this current 4th industrial revolution, data science has penetrated all industries and healthcare is no exception. There has been an exponential use of machine learning in clinical research in the past decade and it is expected to continue to grow at an even faster rate in the following decade. Many machine learning techniques are considered as black box algorithms as the intrinsic workings of the models are too complex in justifying the reasons for the predictions.

Tidymodels

Explaining Predictions: Random Forest Post-hoc Analysis (permutation & impurity variable importance)

Explaining Predictions: Interpretable models (decision tree)

Explaining Predictions: Interpretable models (logistic regression)