Arabic Part-of-Speech Tagging

Rabab Ali Abumalloh, Hassan Maudi Al-Sarhan, Othman Ibrahim, Waheeb Abu-Ulbeh

Abstract


The study described in this paper belongs to the field of computational linguistics, a branch of artificial intelligence that deals with the logical modeling of natural language from a computational perspective. It unites two areas that appear quite different: computer science and natural language. Computational linguistics may be regarded as a synonym for automatic natural language processing, since its main task is the construction of computer programs that process words and texts in natural language. Many areas fall within the discipline, and one of them is part-of-speech tagging (POS-tagging). POS-tagging is the process of automatically assigning the proper grammatical tag to each word of a written text according to how it is used in the text. The task is thus to attach appropriate grammatical or morpho-syntactic category labels to each word, token, symbol, abbreviation, and even punctuation mark in a corpus. POS-tagging is usually the first step in linguistic analysis. It is also a very important intermediate step in building many natural language processing applications, including spell checking and correction systems, speech recognition systems, information retrieval systems, and text-to-speech synthesis systems.
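
As a rough illustration of the tagging task described above, the following minimal sketch assigns one tag to every token, including punctuation. It is not the method of this paper: it is a simple most-frequent-tag baseline, and the toy English corpus, the tag names (DET, NOUN, VERB, PUNCT), and the function names `train_baseline` and `tag` are all assumptions made for the example.

```python
from collections import Counter, defaultdict


def train_baseline(tagged_sentences):
    """Build a lexicon mapping each lowercased token to its most frequent tag."""
    counts = defaultdict(Counter)
    for sentence in tagged_sentences:
        for token, tag_label in sentence:
            counts[token.lower()][tag_label] += 1
    return {token: c.most_common(1)[0][0] for token, c in counts.items()}


def tag(tokens, lexicon, default_tag="NOUN"):
    """Assign a tag to every token; unknown words fall back to default_tag."""
    tagged = []
    for token in tokens:
        if not any(ch.isalnum() for ch in token):
            # Tokens with no letters or digits are treated as punctuation.
            tagged.append((token, "PUNCT"))
        else:
            tagged.append((token, lexicon.get(token.lower(), default_tag)))
    return tagged


# Hypothetical hand-tagged toy corpus, for illustration only.
TOY_CORPUS = [
    [("the", "DET"), ("student", "NOUN"), ("reads", "VERB"),
     ("a", "DET"), ("book", "NOUN"), (".", "PUNCT")],
    [("the", "DET"), ("teacher", "NOUN"), ("writes", "VERB"), (".", "PUNCT")],
]

lexicon = train_baseline(TOY_CORPUS)
result = tag(["The", "teacher", "reads", "a", "letter", "!"], lexicon)
for token, tag_label in result:
    print(f"{token}\t{tag_label}")
```

This baseline looks at each word in isolation, so it cannot resolve words whose tag depends on their surroundings. Taggers that use context, such as statistical or neural models, exist to address that limitation.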


Keywords


Tagging, Natural language processing, Arabic language




Creative Commons License
This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.