Prediction of Patients’ Mortality during Hospitalizations

Mojtaba Zare, Janusz Wojtusiak, Mehrbakhsh Nilashi


In this study, we predict patients’ in-hospital mortality, that is, whether a patient dies during the hospitalization in which he or she is admitted. We use both lab results and vital signs collected in the first 24 hours after a patient’s admission to predict his or her mortality within that hospitalization. We use the MIMIC-III dataset for data analysis and for building the predictive models. We include only patients aged 18 or older, resulting in a sample of 38,578 patients. Independent variables include patients’ demographic information, lab results and vital signs. The dependent variable is whether the patient dies within that hospitalization. We use Weka 3.8 for data analysis and model building. After randomly splitting the data into 80% and 20% subsets, we use the 80% subset for feature selection and for training the prediction models. We construct four prediction models using Bayes Network, Logistic Regression, Naïve Bayes and Random Forest, and then test them on the remaining 20% of the data. We use the Receiver Operating Characteristic (ROC) area, precision and recall to compare the performance of these models with each other. The results show that, on the test set, the highest ROC area belongs to the Bayes Network at 0.824, followed closely by Logistic Regression at 0.817.


Keywords: Mortality, MIMIC-III, Classification algorithms, Prediction models
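The evaluation protocol summarized in the abstract (an adult-only cohort, a random 80/20 split, and comparison by ROC area, precision and recall on the held-out 20%) can be illustrated without Weka, which reports these metrics itself. The following is a minimal stdlib-only Python sketch under stated assumptions, not the authors' Weka workflow: the record field `age`, the random seed and the 0.5 decision threshold are hypothetical choices, feature selection and model training are omitted, and `scores` stands for the predicted probability of death from whichever classifier is being evaluated. The ROC area is computed with the Mann-Whitney formulation, which equals the area under the empirical ROC curve.

```python
import random


def adult_patients(records, min_age=18):
    """Keep patients aged min_age or older.

    Assumption: each record is a dict with a hypothetical 'age' key.
    """
    return [r for r in records if r["age"] >= min_age]


def split_80_20(items, seed=0):
    """Shuffle a copy of the data and split it into 80% train / 20% test.

    The seed is an arbitrary, hypothetical choice for reproducibility.
    Feature selection and training should use only the first (80%) part.
    """
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    cut = int(0.8 * len(shuffled))
    return shuffled[:cut], shuffled[cut:]


def roc_auc(labels, scores):
    """ROC area as the Mann-Whitney statistic: the probability that a
    randomly chosen positive (died, label 1) gets a higher score than a
    randomly chosen negative (survived, label 0); ties count as one half."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def precision_recall(labels, scores, threshold=0.5):
    """Precision and recall for the positive class (in-hospital death).

    The 0.5 threshold is an assumption: predict death when score >= threshold.
    """
    preds = [1 if s >= threshold else 0 for s in scores]
    tp = sum(1 for y, p in zip(labels, preds) if y == 1 and p == 1)
    fp = sum(1 for y, p in zip(labels, preds) if y == 0 and p == 1)
    fn = sum(1 for y, p in zip(labels, preds) if y == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


if __name__ == "__main__":
    # Toy, made-up test-set labels and model scores for illustration only.
    labels = [1, 0, 1, 0, 0]
    scores = [0.9, 0.3, 0.6, 0.7, 0.1]
    print(round(roc_auc(labels, scores), 3))
    print(precision_recall(labels, scores))
```

With the toy labels and scores above, `roc_auc` gives 5/6 (about 0.833) and `precision_recall` gives a precision of 2/3 and a recall of 1.0. The pairwise loop is quadratic in the number of test patients, which is acceptable for a sketch; a rank-based computation scales better on a large test set.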






This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.