Multilingual Information Processing at NMSU CRL: Problems, Technologies, Applications, Tools and Resources
Sergei Nirenburg - Computing Research Laboratory, New Mexico State University
Sep 18, 1998
Abstract
The objective of this talk is to introduce the audience to some current issues and applications in natural language processing and to stimulate thinking and discussions about possible uses of NLP to help solve problems in computer information security.
Large-scale R&D projects are ongoing at NMSU CRL in the areas of
- information extraction and filtering and
- computer-assisted language instruction.
Multilinguality is a major focus of all CRL work. At present, we work on ten languages: Arabic, Chinese, English, Japanese, Korean, Persian, Russian, Serbo-Croatian, Spanish and Turkish.
For the support of the above applications, CRL is developing tools, resources and engines. Tools include:
- control architectures,
- document management environments, including Unicode support,
- text corpus acquisition and processing tools,
- interactive knowledge elicitation systems,
- end-user GUIs, and
- developer GUIs.
Resources include:
- tokenization, morphology and syntax grammars,
- computational lexicons (word, phrase, and proper-name),
- a language-neutral world model, or ontology, and
- text corpora.
Engines include:
- text tokenizers and segmentors,
- morphological analyzers,
- syntactic analyzers,
- semantic and pragmatic/discourse analyzers,
- transfer modules,
- information extraction modules,
- text summarization modules, and
- text generators.
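As a rough illustration of how processing engines of this kind can be chained under a simple control architecture, here is a minimal Python sketch. It is not CRL's implementation: the `Document` class, the stage functions, and the toy lexicon are all hypothetical, and each stage is a placeholder for a real analyzer.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Document:
    """Hypothetical container: raw text plus annotations added by each engine."""
    text: str
    annotations: Dict[str, List[str]] = field(default_factory=dict)


# An engine is any function that takes a Document and returns it with more annotations.
Engine = Callable[[Document], Document]

# Toy lexicon standing in for a real morphological analyzer (assumption).
TOY_LEMMAS = {"analyzers": "analyzer", "texts": "text"}


def tokenize(doc: Document) -> Document:
    # Whitespace splitting is a placeholder; languages such as Chinese or
    # Japanese need a real segmentor because words are not space-delimited.
    doc.annotations["tokens"] = doc.text.split()
    return doc


def analyze_morphology(doc: Document) -> Document:
    # Look each token up in the toy lexicon; unknown tokens pass through unchanged.
    tokens = doc.annotations["tokens"]
    doc.annotations["lemmas"] = [TOY_LEMMAS.get(t, t) for t in tokens]
    return doc


def run_pipeline(text: str, engines: List[Engine]) -> Document:
    # The "control architecture" here is just sequential application of engines.
    doc = Document(text)
    for engine in engines:
        doc = engine(doc)
    return doc


result = run_pipeline("analyzers process texts", [tokenize, analyze_morphology])
print(result.annotations["lemmas"])
```

Later stages (syntactic analysis, transfer, generation) would slot into the same list of engines, each reading the annotations left by earlier ones.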
In this talk, I will briefly describe the goals and status of the following projects:
A system supporting the process of building MT systems from any source language into English. It includes a "universal" MT engine geared toward English as the target language; a knowledge elicitation system (called Boas) that guides the developer through acquiring the static knowledge sources for the source language in question; and a configuration and control system to help configure the run-time MT system. The system, Expedition, will be tested on three languages, the first of which is Turkish (the other two will be announced at preset intervals).
A system for extracting, filtering and summarizing knowledge in four languages (English, Japanese, Russian and Spanish).
An environment for rapid manual development of low- to medium-quality MT systems from Arabic, Japanese, Korean, Russian, Serbo-Croatian and Spanish into English.
A machine translation system between Persian and English.
An interlingual, knowledge-based MT system from Chinese and Spanish into English. Among the salient features of the system are static knowledge sources, such as an ontology and semantic lexicons, and dynamic knowledge sources, such as a semantic analyzer with capabilities for processing non-literal language and a text planner.
An environment for computer-assisted language learning.
Crosslinguistic information retrieval based on Unicode support for writing systems around the world.
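One small piece of Unicode-based retrieval can be sketched in Python: normalizing text so that the same string matches regardless of compatibility variants or letter case. This is only an illustration of the Unicode normalization step, using the standard `unicodedata` module. It does not translate queries across languages and is not a description of the CRL system; `normalize_key`, `search`, and the sample documents are hypothetical.

```python
import unicodedata
from typing import List


def normalize_key(text: str) -> str:
    # NFKC folds compatibility variants (for example full-width Latin letters)
    # into their standard forms; casefold() removes case distinctions in
    # scripts that have them.
    return unicodedata.normalize("NFKC", text).casefold()


def search(query: str, documents: List[str]) -> List[str]:
    # Return every document whose normalized text contains the normalized query.
    key = normalize_key(query)
    return [d for d in documents if key in normalize_key(d)]


docs = [
    "ＮＭＳＵ Computing Research Laboratory",  # full-width Latin letters
    "Машинный ПЕРЕВОД",                        # Russian, mixed case
    "機械翻訳",                                 # Japanese
]

print(search("nmsu", docs))
print(search("перевод", docs))
```

Without the normalization step, the query "nmsu" would not match the full-width form in the first document, even though a reader would consider them the same string.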
About the Speaker
Dr. Sergei Nirenburg is Director of the Computing Research Laboratory and Professor of Computer Science at New Mexico State University. He received his Ph.D. in Linguistics from the Hebrew University of Jerusalem, Israel, and his M.Sc. in Computational Linguistics from Kharkov State University, USSR. Before coming to NMSU, he taught at the Hebrew University, Colgate University and Carnegie Mellon University. He holds Adjunct Graduate Faculty status in Linguistics at Purdue.
Dr. Nirenburg has written or edited six books and published over a hundred and twenty articles in various areas of computational linguistics and artificial intelligence. His research interests include all aspects of research and development of computer systems for multilingual natural language processing, centrally including machine translation (knowledge-based, example-based, rapid-deployment, glossary-based, multi-engine, etc.), computational semantics, computational lexicography, natural language analysis and generation, knowledge acquisition, intelligent interfaces, planning and cognitive modelling.
Dr. Nirenburg is a member of the International Committee on Computational Linguistics, which runs COLING, the central conference series in the field. From 1987 to 1996 he served as Editor-in-Chief of the journal "Machine Translation," and from 1991 to 1994 as First Vice President of the Association for Machine Translation in the Americas. He founded and chairs the Steering Committee of a series of widely attended scientific conferences on theoretical and methodological issues in machine translation, the eighth of which will take place in July 1999 in Chester, UK.