Native Language Processing


This paper develops a basic Natural Language Processing (NLP) system for the Hindi language. It also proposes a format that makes the system compatible with languages that share the grammar of Hindi, such as Punjabi, Gujarati, and the various dialects related to Hindi in one respect or another. This flexibility comes from the word structure proposed for the dictionary of the language. Because the system is closely tied to Indian languages, the project has been named NATIVE LANGUAGE PROCESSING. The project stores base information (basic words) in an XML file, which is then used in the later stages of the language processor. In addition to the conventional parts of an NLP system, viz. Lexical Analyzer, Syntactic & Semantic Parser, and POS Tagger, the system has a proposed optional answering part for generating a proper response to the given input. For this purpose, the system learns sentence structures and uses a sentence generator to produce appropriate answers based on them. Additionally, the semantic parser is split into proper and improper sections, which analyze syntactically proper and improper statements respectively, so that the system can understand statements that are correct but syntactically improper. The system is also meant to incorporate a word recognition algorithm that takes an unknown word as input and outputs the probability that it is a possible word in the language. The system is thus meant to learn at runtime, identify valid and invalid words, and accept or reject them based on the context. The system has been modeled so that it can process a given sentence and generate outputs at each level of the language processor. Moreover, the structure and the proposed extension of the system make it compatible with most modern Indian languages that follow Hindi grammar.
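
The abstract does not specify how the word recognition algorithm computes its probability. As a rough illustration of the general idea only, the sketch below scores an unknown word with a character-bigram model using add-one smoothing. This is a stand-in technique, not the authors' algorithm; the romanized word list and the names `LEXICON`, `train_bigrams`, and `word_log_prob` are hypothetical.

```python
import math
from collections import Counter

# Hypothetical training lexicon: a few romanized Hindi words, for illustration only.
LEXICON = ["ghar", "pani", "kitab", "ladka", "ladki",
           "khana", "kamra", "namak", "garam", "nadi"]


def train_bigrams(words):
    """Count character bigrams, with ^ and $ marking word start and end."""
    pair_counts = Counter()
    context_counts = Counter()
    alphabet = {"$"}
    for word in words:
        padded = "^" + word + "$"
        alphabet.update(word)
        for prev, cur in zip(padded, padded[1:]):
            pair_counts[(prev, cur)] += 1
            context_counts[prev] += 1
    return pair_counts, context_counts, alphabet


def word_log_prob(word, model):
    """Average log-probability per character transition (add-one smoothing)."""
    pair_counts, context_counts, alphabet = model
    padded = "^" + word + "$"
    # Reserve one extra slot so unseen characters still get non-zero mass.
    vocab = len(alphabet) + 1
    total = 0.0
    for prev, cur in zip(padded, padded[1:]):
        numerator = pair_counts[(prev, cur)] + 1
        denominator = context_counts[prev] + vocab
        total += math.log(numerator / denominator)
    return total / (len(padded) - 1)


model = train_bigrams(LEXICON)
plausible = word_log_prob("garam", model)    # built from familiar letter pairs
implausible = word_log_prob("xqzv", model)   # letter pairs never seen in training
```

A word made of letter sequences common in the lexicon receives a higher (less negative) score than one made of unseen sequences, so a threshold on this score could serve as a simple accept/reject rule. A real system would train on a full dictionary in the native script and combine the score with sentence context.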


Keywords: Native Language Processing, Hindi
Stream: Knowledge and Technology
Presentation Type: Virtual Presentation in English
Paper: Native Language Processing


Aman Kumar

Assistant Systems Engineer, Information Technology, Tata Consultancy Services
Chennai, TamilNadu, India

I hold a B.Tech. Honors degree from SASTRA University and did a project on Natural Language Processing (NLP) during my final year of undergraduate study. After looking at the current progress in NLP, I decided to design a generic architecture for an NLP system that could cater to the needs of various independent languages. I then sought the help of Professor A. Rajaraman of the Indian Institute of Technology (IIT) Madras for project guidance. He proposed implementing the same NLP system for Indian languages, as most Indian languages have the same basic architecture. Under his expert guidance, the project was undertaken for Hindi, an official language of India. Because we adopted Hindi as the language, we named the system Native Language Processing. The project was developed so that every phase of the NLP system was standalone and processed its input independently. Using XML to maintain language-specific information paved the way for supporting other languages independently of the system architecture. Developing the system in an open source Linux environment using "gcc" left room for further flexibility and for the use of open source libraries. We also developed a Hidden Markov Model (HMM) based word rejection algorithm for the project, which tries to reject words not present in the language. The project also resulted in a paper on native language processing. My other project is "Shape Recognition" under Visual Computing. I analyzed how computers or machines could interpret images and shapes. Recognition of shapes, especially when they may or may not be in appropriate form, is a difficult task for digital machines. This project was done under the guidance of Professor Uma Makeshwari of SASTRA University and Prof. A. Rajaraman of IIT Madras. I used Matlab for this project, along with various image transforms, to analyze and interpret two-dimensional shapes.

Anand Bora

Software Engineer, IT, Aricent
Bangalore, Karnataka, India

I graduated with Honors in computer science from SASTRA University. I took up Natural Language Processing (NLP) as one of my Honors papers and was deeply interested in the subject. I started with a project in NLP under the guidance of Dr. A. Rajaraman, a professor at the Indian Institute of Technology (IIT) Madras. The project dealt with building a language processor for the Hindi language and was titled Native Language Processing. I was simultaneously involved in another project, Shape Recognition, in the field of Computer Vision, which involved geometric shape recognition. I analyzed how computers and machines could interpret images and shapes. Recognition of shapes, especially when they may or may not be in appropriate form, is a difficult task for digital machines. This project was done under the guidance of Professor Uma Makeshwari of SASTRA University and Prof. A. Rajaraman of IIT Madras.

Ref: T08P0429