Sentence Similarity Techniques for Automatic Text Summarization

Yazan Alaya AL-Khassawneh, Naomie Salim, Adekunle Isiaka Obasae

Abstract


Automatic document summarization technology is advancing rapidly and may offer an answer to the problem of information overload. Document summarization is now regarded as an essential part of information retrieval. When document collections are large, giving the user a short version of each document greatly eases the task of finding the documents required. Text summarization is the process of producing a condensed version of a document that gives users useful information, while multi-document summarization produces a summary that conveys, explicitly or implicitly, most of the information in a group of documents about a main topic. Similarity between sentences plays a major role in text summarization, and summarization methods have accordingly been developed with sentence similarity in mind. This paper investigates different automatic summarization techniques based on sentence similarity and compares their performance in terms of recall, precision and F-measure.
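As a rough illustration of the two ideas named in the abstract, the sketch below computes a bag-of-words cosine similarity between two sentences and scores a set of extracted sentences against a reference selection using precision, recall and F-measure. It is a minimal sketch under stated assumptions, not the method of any technique surveyed in the paper: the function names (tokenize, cosine_similarity, precision_recall_f1), the simple tokenizer, and the use of raw term counts are all illustrative choices.

```python
import math
import re
from collections import Counter


def tokenize(sentence):
    """Lowercase a sentence and split it into alphanumeric tokens (assumed tokenizer)."""
    return re.findall(r"[a-z0-9]+", sentence.lower())


def cosine_similarity(sent_a, sent_b):
    """Cosine similarity between two sentences using raw term counts."""
    vec_a, vec_b = Counter(tokenize(sent_a)), Counter(tokenize(sent_b))
    # Dot product over the terms the two sentences share.
    dot = sum(vec_a[t] * vec_b[t] for t in vec_a.keys() & vec_b.keys())
    norm_a = math.sqrt(sum(c * c for c in vec_a.values()))
    norm_b = math.sqrt(sum(c * c for c in vec_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty sentence is treated as similar to nothing
    return dot / (norm_a * norm_b)


def precision_recall_f1(selected, reference):
    """Score extracted sentence indices against a reference selection."""
    selected, reference = set(selected), set(reference)
    overlap = len(selected & reference)
    precision = overlap / len(selected) if selected else 0.0
    recall = overlap / len(reference) if reference else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1
```

For example, precision_recall_f1([0, 1, 2], [1, 2, 3, 4]) compares three extracted sentences with a four-sentence reference that shares two of them. Weighted schemes such as tf-idf, or semantic measures, could replace the raw counts without changing the overall shape of the computation.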


Keywords


Text summarization, Extractive summarization, Abstractive summarization, Sentence similarity





Creative Commons License
This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.