Disease Diagnosis Using Machine Learning Techniques: A Review and Classification

Mehrbakhsh Nilashi, Neda Ahmadi, Sarminah Samad, Leila Shahmoradi, Hossein Ahmadi, Othman Ibrahim, Shahla Asadi, Rusli Abdullah, Rabab Ali Abumalloh, Elaheh Yadegaridehkordi


In this research, we reviewed and classified academic conference and journal papers, published between 2007 and 2019, that used data mining techniques for disease classification and diagnosis based on public medical datasets. The results of this review show that the application of data mining techniques in disease classification has risen sharply in recent years. The findings also reveal that little attention has been paid to developing methods based on incremental versions of data mining techniques. We hope that this research will provide useful information about various data mining techniques and their application in disease diagnosis, and will help researchers developing medical decision support systems by offering insight into state-of-the-art development methods.


Data Mining, Public Medical Datasets, Diseases Diagnosis, Literature Survey, UCI, PRISMA






This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.