Chinese Hepatology ›› 2023, Vol. 28 ›› Issue (4): 469-473.

• Non-alcoholic Fatty Liver Disease •

Development of Prediction Models Based on Machine Learning for Non-alcoholic Fatty Liver Disease

LIU Lu, ZHU Jin-zhou, LIU Xiao-lin, WANG Chao, YI Min-yuen, GAO Jing-wen, XU Chun-fang   

  1. The First Affiliated Hospital of Soochow University, Jiangsu 215006, China
  • Received: 2022-06-19 Online: 2023-04-30 Published: 2023-08-29
  • Contact: XU Chun-fang, Email: xcf601@163.com

Abstract: Objective To develop prediction models for the incidence of non-alcoholic fatty liver disease (NAFLD) using H2O automated machine learning (AutoML) tools. Methods A total of 4,105 subjects were recruited. The data were loaded into H2O AutoML to fit a range of machine learning models for predicting NAFLD. The models were evaluated with receiver operating characteristic (ROC) curves and confusion matrices, and interpreted with SHAP, LIME, and partial dependence plots. Results Twenty-eight machine learning models were fitted. The best was a gradient boosting machine (GBM) model (Gini 0.80, R² 0.42, LogLoss 0.45). The important variables were triglyceride (95%CI: -1.053~-0.887), aspartate aminotransferase (AST) (95%CI: -20.433~-16.927), high density lipoprotein (HDL) (95%CI: 0.232~0.268), ferritin (95%CI: -80.533~-68.607), and blood glucose (95%CI: -0.576~-0.424). In the validation dataset, the area under the ROC curve was 0.766, with a sensitivity of 0.715 and a specificity of 0.818. These results suggested that the GBM models performed better than the XGBoost, logistic regression, random forest, and deep learning models. Conclusion The prediction model based on the H2O AutoML algorithm shows promise for, and offers insights into, screening for NAFLD.

Key words: NAFLD, Automatic machine learning (AutoML), Prediction model, Receiver operating characteristic curve (ROC), Confusion matrix, Shapley additive explanations (SHAP), Partial dependence plots (PDP)
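
The evaluation metrics named in the abstract (sensitivity, specificity, area under the ROC curve, Gini) can be illustrated with a minimal Python sketch. It is not the authors' pipeline and does not use H2O. The function names, the toy counts, and the toy scores are all hypothetical, and the relation Gini = 2 × AUC − 1 is the usual convention for binary classifiers, assumed here rather than taken from the paper. The abstract does not say which data split the reported Gini refers to.

```python
# Illustrative sketch only: toy inputs, not data from the study.

def sensitivity_specificity(tp, fn, tn, fp):
    """Return (sensitivity, specificity) from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)  # true positive rate
    specificity = tn / (tn + fp)  # true negative rate
    return sensitivity, specificity


def auc_from_scores(labels, scores):
    """Area under the ROC curve via pairwise comparison (Mann-Whitney form).

    Counts the fraction of (positive, negative) pairs in which the positive
    case receives the higher score; ties count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def gini_from_auc(auc):
    """Gini coefficient under the assumed convention Gini = 2 * AUC - 1."""
    return 2.0 * auc - 1.0


if __name__ == "__main__":
    # Hypothetical confusion-matrix counts for a toy classifier.
    sens, spec = sensitivity_specificity(tp=8, fn=2, tn=9, fp=1)
    print("sensitivity:", sens, "specificity:", spec)

    # Hypothetical labels (1 = disease, 0 = no disease) and predicted scores.
    toy_labels = [1, 1, 1, 0, 0, 0]
    toy_scores = [0.9, 0.8, 0.4, 0.5, 0.3, 0.2]
    auc = auc_from_scores(toy_labels, toy_scores)
    print("AUC:", auc, "Gini:", gini_from_auc(auc))
```

The pairwise AUC is quadratic in sample size, which is acceptable for a small illustration. A rank-based implementation would be the usual choice for a cohort of several thousand subjects.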