notast

Introduction This is a follow up post of using simple models to explain machine learning predictions. In the last post, we introduced logistic regression and in today’s entry we will learn about decision tree. We will continue to use the Cleveland heart dataset and use tidymodels principles where possible. The details of the Cleveland heart dataset was also described in the last post. #library library(tidyverse) library(tidymodels) #import heart<-read_csv("https://archive.ics.uci.edu/ml/machine-learning-databases/heart-disease/processed.cleveland.data", col_names = F) # Renaming var colnames(heart)<- c("age", "sex", "rest_cp", "rest_bp", "chol", "fast_bloodsugar","rest_ecg","ex_maxHR","ex_cp", "ex_STdepression_dur", "ex_STpeak","coloured_vessels", "thalassemia","heart_disease") #elaborating cat var ##simple ifelse conversion heart<-heart %>% mutate(sex= ifelse(sex=="1", "male", "female"),fast_bloodsugar= ifelse(fast_bloodsugar=="1", ">120", "<120"), ex_cp=ifelse(ex_cp=="1", "yes", "no"), heart_disease=ifelse(heart_disease=="0", "no", "yes")) ## complex ifelse conversion using `case_when` heart<-heart %>% mutate( rest_cp=case_when(rest_cp== "1" ~ "typical",rest_cp=="2" ~ "atypical", rest_cp== "3" ~ "non-CP pain",rest_cp== "4" ~ "asymptomatic"), rest_ecg=case_when(rest_ecg=="0" ~ "normal",rest_ecg=="1" ~ "ST-T abnorm",rest_ecg=="2" ~ "LV hyperthrophy"), ex_STpeak=case_when(ex_STpeak=="1" ~ "up/norm", ex_STpeak== "2" ~ "flat",ex_STpeak== "3" ~ "down"), thalassemia=case_when(thalassemia=="3.

Introduction The rise of machine learning In this current 4th industrial revolution, data science has penetrated all industries and healthcare is no exception. There has been an exponential use of machine learning in clinical research in the past decade and it is expected to continue to grow at an even faster rate in the following decade. Many machine learning techniques are considered as black box algorithms as the intrinsic workings of the models are too complex in justifying the reasons for the predictions.

Intro There are many illnesses and diseases known to man. How do the various stakeholders in the medical science industry classify the same illness? The illness will need to be coded in a standardized manner to aid in fair reimbursements and concise reporting of diseases. The International Classification of Diseases (ICD) provides this uniform coding system. The ICD “is the standard diagnostic tool for epidemiology, health management and clinical purposes.

Intro A week ago, Havard Business Review published an article on process mining and provided reasons for companies to adopt it. If you need a refresher on the concepts of process mining, you can refer to my first post. Conducting process mining is easy with R’s bupaR package. bupaR allows you to create a variety of visualizations as you analyse event logs. It includes visualizations of workflow on the ground which you can then compare them against theoretical models to discover deviations.

Recap In the last post, the discipline of event log and process mining were defined. The bupaR package was introduced as a technique to do process mining in R. Objectives for This Post Visualize workflow Understand the concept of activity reoccurrences We will use a pre-loaded dataset sepsis from the bupaR package. This event log is based on real life management of sepsis from the point of admission to discharge.

Explaining Predictions: Interpretable models (decision tree)

Explaining Predictions: Interpretable models (logistic regression)

What's that disease called? Overview of icd package

Process Mining (Part 3/3): More analysis and visualizations

Process Mining (Part 2/3): More on bupaR package