A Review of Semantic Similarity Measures in Biomedical Domain Using SNOMED-CT

Mojtaba Zare, Christina Pahl, Mehrbakhsh Nilashi, Naomie Salim, Othman Ibrahim


Determining the semantic similarity between word pairs is an important task in text understanding that supports the processing, classification and structuring of textual resources. In the biomedical domain, semantic similarity measures have been the focus of much research that exploits knowledge sources such as domain ontologies. SNOMED-CT, a major biomedical ontology, provides a broad, global hierarchical terminology for clinical data storage and encoding and for the retrieval of health and disease information. In this study, we classify the measures that have been proposed in the biomedical domain and that use SNOMED-CT as an input ontology. We also examine the studies that evaluated these measures using biomedical benchmarks. To this end, three major databases (Science Direct, Springer and IEEE) were searched for studies that proposed similarity measures and used SNOMED-CT as a knowledge source. The purpose of this study is to give the reader an understanding of how semantic similarity measures are applied in the biomedical domain using SNOMED-CT, and a clear insight into the performance of these methods. The study also supports researchers and practitioners in effectively adapting semantic similarity measures to SNOMED-CT and provides an overview of the state of the art.


Keywords: Biomedical ontologies, SNOMED-CT, Semantic similarity measure






This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.