> > > A new computer language is meant to make communication on the Internet
> > > possible across language barriers. More than 120 computer experts and ...
>
> I am somewhat surprised that this is in principle a similar
> approach to what people once wanted to achieve with Esperanto.
> Why are these old efforts thrown completely overboard and
> something new built yet again? Does the language concept of Esperanto
> have such serious flaws that using it no longer seems up to date?

As a universal language (a neutral source language for automatic translation into all other languages), Eo is certainly more suitable than any national language. There was an EU-funded project called DLT that used Eo as an intermediate language. Nevertheless, Eo was not created for this purpose. It is above all a lingua franca that is easy (for Europeans) to learn and quick to master actively. The claimed advantages of systematicity / logic / universality, on the other hand, Eo has only to a limited degree. If one gets serious about universality, something like Lojban (http://xiron.pc.helsinki.fi/lojban) comes out. So, not very light fare.

UNL is now supposed to be even heavier fare: a computer syntax that is not human-readable at all. In the meantime I was able to find the UNL main page (despite faulty links):

http://www.ias.unu.edu/research_prog/science_technology/universalnetwork_language.html

and have attached the HTML original.

I find the secretiveness of these people very disturbing. There is no specification of the language and certainly no public software. Only names of closed institutions and conference venues, and prospects of compatibility with "standard network servers".
Given the importance of the project and the immense amount of work, which depends on distributed working methods (they themselves write that only the Internet makes it possible even to dream of the success of such a large project), this secretiveness is a bad joke.

-- 
Hartmut Pilch http://www.a2e.de/phm/

Title: Universal Networking Language
UNL stands for "Universal Networking Language", an electronic language that enables communication between different native languages. It is a system of "enconverter" and "deconverter" software that will reside on the Internet, and will be compatible with standard network servers. Any person with access to the Internet will be able to "enconvert" text from a range of native languages into UNL. Just as easily, any UNL text can be "deconverted" from UNL into native languages.
The UNL would bring potential benefits to individuals and to governmental and non-governmental organizations across all linguistic barriers. It is envisaged that it would be helpful to many millions of people in many different situations. With respect to the UNU's mandate, the UNL may become a powerful instrument to promote networking around the world (thus alleviating the isolation of scholars in developing countries) as well as supporting the development of the "virtual university" which will enhance access to knowledge. For the UN in general, for UNESCO and other multilateral organizations, it has enormous potential as a tool to foster dialogue among nations and for the promotion of peace, culture, cooperation and development. The UNL complements the UNU/IAS Virtual University initiative.
The Internet holds the promise of access to information for all people, but the evolution of English as the de facto standard language of the Internet limits access to the percentage of the world's population that reads and writes English. The complete fulfillment of the bright promise brought about by contemporary networking facilities is hindered by language barriers, which continue to prevent worldwide communication. By allowing people to participate in global information exchange in their native languages, the UNL will fulfill the promise of the Internet.
The design of the UNL system and its core software applications is being created by the UNL Centre at the Institute of Advanced Studies of UNU. This includes: (1) the Universal Networking Language system; (2) enconversion and deconversion software; and (3) technical specifications and guidelines for developing native-language enconversion and deconversion modules. Conversion software modules for each native language are being developed in partnership with research institutes, universities, and R&D groups under contract with UNU/IAS.
Launched in 1996, the UNL project has already demonstrated both R&D vigor and a capacity to motivate and obtain commitments from the best-known research centres in computer linguistics. It has also produced tangible results that are encouraging the further expansion of commitments.
The period for the full development of the UNL project is ten years. The first three years (1996-1998) are devoted to creating the UNL core system and the conversion modules for a dozen natural languages, including the six official languages of the United Nations. The UNL Centre is already operational, and has begun implementing an experimental software system, in collaboration with research institutes and R&D companies in Brazil, China, Egypt, France, Germany, India, Indonesia, Italy, Japan, Jordan, Mongolia, Portugal, Russia, and Spain. The remaining seven years (1999-2005) will be applied to the development of modules for the native languages of the other member states and to improving system performance and quality.
Keywords: Enconverter, deconverter, knowledge-base, human language technology, computer-aided translation
Duration: 24 months (current phase of a ten year programme which commenced in 1996).
Start Date: July 1997
End Date: June 1999
Person responsible for the project
UNU Officer in Charge: Professor T. Della Senta,
Project Directors: Mr. K. Nishi, Visiting Professor,
Dr. H. Uchida, Advisor
Mr. T. Makino, Visiting Professor,
Background and justification
Definition and scope of the project
There are as many as 3,000 different languages spoken on the Earth today, depending on the classification adopted. More than 38 languages are the native tongues of at least 10 million people. A dozen of them are spoken by more than 100 million people, among these Arabic, Chinese, English, French, German, Hindi, Indonesian, Japanese, Portuguese, Russian, Spanish and Swahili. Six are official languages of the UN (Arabic, Chinese, English, French, Russian and Spanish). English, as Latin in the past, has become de facto the lingua franca of today. Language diversity, in turn, is a basic trait of the cultural pluralism of humankind. Cultural identity, hence diversity, rests on forms of expression. Preserving and promoting the survival of languages spoken around the world is essential in order to sustain cultural diversity - an ideal fostered by the UN, UNESCO and its member states.
Other, more pragmatic (mainly economic) reasons also exist for advocating language diversity together with easy access to information and knowledge. With the recent breakthroughs in means of communication and information technologies, humankind is living in a global village. Globalization is changing the economic, political and cultural structures of societies. Demand for cross-cultural communication is ever rising, and is becoming an overwhelming influence on human activity in contemporary civilization. The Internet is extending a global network of communication around the world. In recent years, it has become a dramatic force, bringing about radical changes in the way information flows. With a new infrastructure for international communication under construction, there is now a new environment which fosters global communication. With the Internet, the global village now has public "plazas" and "free speech corners". The biggest problem remaining is how to overcome language barriers while respecting language diversity.
Language differences are a barrier to the smooth flow of information. The current options available are: (a) joining a dominant lingua franca, or (b) letting the gap increase between those with knowledge and those without. There is a general recognition that both seem equally perverse. Crushed by the powerful media of communication (supported by the dominant languages), cultural pluralism and peoples' identity are in jeopardy.
Fundamentally, this is a "catch-twenty-two" dilemma: how to preserve language diversity while promoting information flow? Or how to increase communication among people using their own native language? The UNL proposal was inspired by the challenge of solving this dilemma by proposing a third path.
Computers have been used for years in an attempt to overcome the language barrier, with limited success. Research and development in computer-based translation has a long history, and recently some machine translation products have become available. However, a series of problems remain unsolved. Translation has been challenging enough for humans throughout the centuries. Needless to say, it is a far more daunting task for computers. Computer translation is as yet unable to supply high-quality output, hence its limited use. Furthermore, it does not address the fundamental issue of how to promote the flow of information across language barriers while preserving native language diversity.
The UNL Project represents a quantum leap over computer-based translation. The new approach it promotes has been recognized by experts in the field and determined their decision to merge their intellectual capacity and experience with that of UNU/IAS. The UNL differs from traditional language conversion methods in the sense that it does not involve language analysis, so it is not a translation system. The UNL enables communication between a number of languages using stable language generation techniques, avoiding language analysis, the stumbling block in the existing technology.
As will be explained later on, the UNL is a global-scale intermediary language, intended to be transparent to all languages. Information encoded in UNL is converted to an equivalent counterpart text written in the target language, through a language generator, the "deconverter", to be created for each language. An editor called the "enconverter" translates information written in a given language to UNL-format information. As UNL excludes obscurity and is purely logical, it is difficult for humans to operate it without the help of digital media. The language exists only in electronic form and on the information network.
The UNL was launched by UNU/IAS in April 1996. The idea was conceived by Mr. K. Nishi, Visiting Professor of IAS, an expert in media systems, who invited Dr. H. Uchida to undertake a feasibility study of the new concept using coding and decoding systems plugged onto the Internet. Based on his previous experience in machine translation, Dr. Uchida endorsed the feasibility of UNL as an entirely original approach for an intermediary electronic language.
Controversy and skepticism regarding the feasibility of the UNL, as expected, still prevail. The questions raised include: Is it feasible, after so many unsuccessful attempts at building a common language (Esperanto) and the meager results of machine translation? Why bother if there are key languages such as English which are already used de facto as a convenient intermediary language, widely used in business transactions, in international relations and in interpersonal communication? What is the comparative advantage of UNU/IAS in being involved in such a venture?
Although this is not the proper place for an extended debate on these questions, nor for an exhaustive justification in support of this initiative, the questions must be addressed without hesitation. It seems important that a solid rationale for undertaking such a challenging - and in some respects risky - endeavour is provided. In a nutshell, the rationale is supported by four main considerations: preventing exclusion; the growing strength of the Internet; the technical and scientific "critical mass"; and the IAS comparative advantage to lead the initiative.
Preventing scientific, economic and cultural exclusion
Throughout human history, there has been a gradual ebb and flow of languages across the globe. This process has seen certain languages gain either regional or sectoral ascendancy (e.g., Greek, Latin, Arabic, Chinese, French, etc.). In this century, a range of forces have made English de facto the dominant language, globally, and across all sectors of human activity. While acknowledging that, at present, English is fulfilling an important and useful role, it is necessary also to consider whether wider opportunities for information exchange and knowledge access (particularly high-tech) are being missed.
Moreover, while recognizing the fundamental influence of certain languages in shaping human thought and action, it is also important to reflect upon the vast wealth of human knowledge that exists embodied within the myriad of languages on this globe. This is one of the greatest resources available to humanity. Yet, at present, by default, a process of exclusion seems to be the predominant tendency, affecting billions of citizens around the world today. With increasing international economic interaction, the amount of information that needs to cross language barriers is rising dramatically. Among the factors that make communicating information harder are the high cost of employing translators and the relatively low productivity of the translation process. Data such as management and operations manuals for large plants are huge in volume, and require fairly frequent updating in those industrial sectors subject to rapid technological advancements. Transfer of technology to, and business operations with, a country where people speak one of the world's major languages does not present major difficulties. However, when the country's mother tongue is a relatively minor language, operations are likely to be in English. It is therefore fair to say that in many countries the language barrier creates a bottleneck for employment opportunities and economic activities.
There is a reverse form of exclusion, too. Considerable amounts of scientific, technological and cultural information produced in languages other than English become unavailable. Access to such information is confined to the language of its origin. Even with a tremendous effort by a large number of translators, few intellectual breakthroughs from Russian, Japanese, German, Swedish, let alone Egyptian, Korean and many other civilizations are available internationally. A Chinese scientist, for instance, would have a hard time having his scientific findings published internationally. Many scientific and technological contributions are not available to the rest of the world because of the tremendous efforts - and costs - needed to overcome their native language barrier; even when someone more fortunate succeeds, most of the time the genuine character of his/her contribution is filtered by the language distance.
Most Japanese Internet sites, for example, offer their home pages only in Japanese, although corporations and universities provide English versions as well. With language as the stumbling block, those Japanese-only sites effectively exclude access from abroad. Even with the advances in information communications hardware and the growth of the Internet, the language barrier still exists to hinder human communication.
Intergovernmental organizations and institutions that must ensure a fair circulation of information devote considerable funds and labour to document translation. At UN headquarters, for instance, conference materials are translated into the six official languages designated by the organization, while in the EU, conference materials are translated into twelve or more languages for the member states. These measures, nonetheless, are insufficient to ensure the necessary circulation of information.
With the Internet, a New Momentum
The Internet exacerbates the risk of exclusion, even as it creates a new, unprecedented opportunity for all. The risk arises not only because the Internet is structured and run in English (this makes its strength and utility) but also because it enhances accessibility - almost exclusively - to everything that is produced in that language.
The network was first used in 1969, and has since evolved into what is now known as the Internet: an amalgamation of over 100,000 computer networks, run by governments, universities, non-profit organizations and private corporations. Complying with the same technical standards enables individual networks to link to the backbone network, by accessing one of the main high-speed, long-distance Internet lines. When this takes place, the linked network becomes a part of the Internet. At the same time, as private companies discovered new applications for the network, the commercial use of the Internet began to increase. Recently, an increasing number of users have been linking directly to the Internet, without going through one of the commercial on-line services.
The numbers are impressive. The number of US households with access to the Internet has doubled over the past year to 14.7 million. And the number of Internet access providers has exceeded the number of commercial on-line services. As of September 1996, the number of Web users on the Internet stood at 20 million a week, and the number of adult Americans who access the Web every day was approximately 9 million. The results of a survey conducted a year earlier indicated that the number of Web users was 5.3 million on a weekly basis, and that daily users stood at 2.3 million. The survey also revealed that some 6.2 million households had accessed the Internet over the last year. Some 38.7 million people aged 18 or above have entered cyberspace on at least one occasion.
The situation is evolving in the same direction in other nations. The number of so-called providers, who provide users with Internet access, is rapidly increasing in Japan and Europe, at a rate that makes it difficult for survey companies to keep their information up to date. Accessing the Internet has become so easy for ordinary households that people use it on a daily basis.
The Internet, thus, creates an immense opportunity for communication. It also creates the conditions to overcome the barriers among languages existing today. It provides the UNL with the platform it needs for mediating the interaction among these languages. It allows thousands of people to work on the same subject in a collective effort to overcome obstacles that could not be eliminated by single initiatives. Just as it would be inconceivable to reach the moon or explore the human genome without the help of powerful computers, the UNL is inconceivable without the Internet, with its worldwide network of computers and people. The Internet makes possible the "economy of scale" effect that the UNL represents over machine translation. The weakness of machine translation, so far, has been the absence of a common platform for a broad programme of international collaboration among multiple languages, which could reduce the costs and - more importantly - multiply the effect of the developments of each individual language over all others. Lacking such a common platform, the millions of dollars invested in machine translation by several developed countries resulted in rather limited benefits. The task is obviously too gigantic to be accomplished on a bilateral basis. A global engagement of as many languages as possible seems to be a pre-condition for the success of the UNL.
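The scale argument in the paragraph above can be made concrete with a rough count. The sketch below assumes that a purely bilateral approach needs one translation module per ordered language pair, while a shared intermediate language needs one enconverter and one deconverter per language. Both function names are illustrative, and the counting model is an assumption, not a figure taken from the UNL project.

```python
def bilateral_modules(n_languages: int) -> int:
    """Modules needed if every ordered language pair is translated directly."""
    return n_languages * (n_languages - 1)


def intermediate_modules(n_languages: int) -> int:
    """Modules needed with one shared intermediate language:
    one enconverter plus one deconverter per language."""
    return 2 * n_languages


# The first project phase targets a dozen languages.
for n in (6, 12):
    print(n, bilateral_modules(n), intermediate_modules(n))
# 12 languages: 132 direct modules versus 24 with an intermediate language
```

Under this model the bilateral count grows quadratically with the number of languages while the intermediate-language count grows linearly, which is one way to read the "economy of scale" claim.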
Critical Mass Resulting from Cumulative Experience
Translation is labour intensive, and it is almost impossible to keep up with the translation demands for mass information "manually", either in terms of cost or of quantity of output. This fact led to R&D into computer-aided translation and interpretation systems. Test production of a number of systems, generally referred to as machine translation, automated translation, computer-aided translation and automated interpretation systems, has been underway for more than thirty years.
Machine translation refers to a system where text is entered, based on which, sentence and semantic structures are induced with the use of a dictionary, rules of grammar and semantic information. The translated text is in turn generated in the target language by applying, to the induced structures, a dictionary and the grammatical rules of the language. The assessment of machine translation systems varies.
In the latter half of the 1980s, several commercial systems were introduced, which have some practical use thanks to the improvements in computer performance. These systems generally provide one-to-one language translation. Recently, systems incorporating a machine translation function have been made available on Internet browsers. Although these systems do not deliver an advanced translation function, they are useful for users who want to skim a home page to get the gist of its content. The user is thus able to obtain information, which, if it were not for the system, would not be open to him or her. In fact, these machine translation systems appear to be enjoying extensive use. The ability to formulate easy-to-read sentences is necessary, when users are quickly going through the information translated into their native tongue.
Automatic interpretation systems require speech recognition and synthesis technologies, as well as the translation function of the machine translation systems. Despite intensive efforts in research and development, there has been no prospect of such systems ever coming out of the laboratory into commercial use. As the target of linguistic analysis is the spoken language, the system must process sentences that are grammatically wrong and sentences that contain many abbreviations and anaphora. The speech recognition technology that is at the heart of the system can currently recognize the speech of unspecified persons up to approximately 5,000 words uttered without a break. Nonetheless, it appears that many more technical breakthroughs are needed before the system becomes fully adaptable to daily spoken language. Many issues remain to be addressed in terms of linguistic analysis and voice recognition.
As machine translation faces many quality issues, it is likely to be some time before automatic interpretation systems are commercialized. Yet the spread of information networks has prompted rising demand for the exchange of information between different languages. The state of the art in human language technology was analyzed by Prof. A. Zampolli, of the University of Pisa. His survey, published in 1997, offers an overview of the main areas of work, the capabilities and limitations of current technology, and the technical challenges that must be overcome to realize the vision of graceful human-computer interaction using natural communication skills.
The UNL Project builds upon the rich experience gained so far by several centres of excellence working on computational linguistics. A vast wealth of knowledge has been accumulated in natural language translation, though dispersed or conducted in parallel efforts. It appears that the moment has come for a quantum jump (a sort of mutation in computational linguistics), provided that a catalytic factor is brought into play. What is needed is a "catalyser" that is powerful enough to bring these dispersed efforts together, and generate a "critical mass" capable of breaking the language barrier.
Comparative Advantage to UNU Programmes
The UNU/IAS is in a special position within the UNU, and thus within the UN and UNESCO. The UNU's mandate, in very broad terms, is to mobilize scientific and intellectual support to foster the principles of the charters of these intergovernmental organizations. This provides the UNU with an operational basis that very few intergovernmental organizations or scientific bodies, much less private organizations or NGOs, could enjoy. On top of this, being an autonomous academic institution, the UNU provides the conditions that promote interaction between researchers and scholars as equal partners. Its position becomes even more strategic in a scenario in which the capacity for bridging North/South, or for mobilizing different schools of thought, becomes critical. As an international community of scholars, the UNU is meant to facilitate access to knowledge, and to help alleviate the isolation of scholars, particularly in developing countries.
The UNU/IAS shares such a unique position and strategic function. In fact, this can be one of its major strengths in undertaking global and complex - at times risky - endeavours such as the UNL that require broad-based collaboration, while ensuring fair opportunities for all partners. This requires from the Institute the ability to formulate attractive programmatic initiatives and innovative projects, and to incorporate talented scholars as in-house staff. In addition, the Institute must raise the necessary funds to support them.
With the UNL Project, the UNU/IAS was fortunate to bring together scientists in computer science and linguistics, as well as Internet experts and entrepreneurs in electronic communications. They were able to formulate a long-term plan that captured the motivation of the best centres of excellence in the field of computational linguistics around the world. Many among them have been working for years in this field, and have accumulated a wealth of experience and data on machine translation and computational linguistics. They are now brought into a global network of research and development (R&D) that works around the clock in developing the various components of the UNL system from their native languages. The UNU/IAS capitalizes on - and multiplies the impact of - previous work on machine translation done by the various centres in their respective languages. The double effect of "economies of scale" and "critical mass" mentioned above provides the Institute with a strategic position in promoting a powerful network of research institutions, software developers and experts in linguistics.
The most striking comparative advantage of IAS is its global reach and neutral posture in respect to diversity of cultures, which allows for catalyzing energies, dispersed talent and efforts, thus generating the critical mass that is required for success of the UNL Project. Language has a very sensitive political dimension and cultural differences must be respected. The fact is that the Institute must attract cooperation among talented people from various scientific and cultural backgrounds that would not otherwise work together. The nature of the UNL project requires - in addition to technical competence - long-term resolve and sustained institutional commitment.
UNL: a System for Multilingual Communication
In this section, a succinct explanation of the UNL system is provided.
The Core UNL System
The UNL serves as a common language shared by people over the Internet in a multilingual network. The multilingual network enables people to communicate in their mother tongue with people who speak different languages.
The UNL system may be described as an "electronic language", and consists of three interlinked software components: the "editor", the "enconverter" and the "deconverter". These three computer programmes reside on the Internet, and are accessible to any user in his or her native language. Like any other computer programme, the UNL system is "invisible" to the user, but can be operated from any PC like most popular software applications. The UNL is a set of logical operations containing digital information adapted for computer processing, and therefore it is not easy for humans to handle directly. Users only need to input information in their own language, using the UNL editor. Preciseness of conversion can be verified by re-converting the UNL representation into the language from which the UNL originated.
The core of the UNL system is the "editor", a software application equipped with an "enconverter" and a "deconverter", coupled with word-processing functions. The conversion process from native languages into UNL is called "enconversion", and that from UNL into native languages is called "deconversion". Information from each native language, once "enconverted" into the form of UNL, is ready to be exchanged via the network and disseminated to all other languages connected to the UNL. Information represented in UNL is "deconverted" into native languages and becomes available on network terminals.
How the UNL System Operates
The UNL "editor" software facilitates writing in UNL. As a writer inputs a document in his/her native language, the editor simultaneously "enconverts" it into a UNL representation. This can be done in an interactive process between the writer and the computer. The UNL editor shows back to the writer the document as it is "deconverted" from the UNL, which represents how the system understands the original document being produced by the writer, thus allowing him/her to check the correctness of the "enconversion". Preciseness of conversion can be verified by re-converting the UNL representation into the language from which the UNL was obtained. In such an interactive process, the writer can produce a UNL document as accurately as he/she wishes.
Information written in UNL will appear on homepages, or will be stored in archives. UNL documents will be exchanged over the network, and stored on WWW servers. They can also be forwarded by file transfer programs. UNL documents received on a network terminal will be converted into each native language and read by anyone on a browser that is equipped with a "deconverter".
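The round-trip check described above can be sketched in a few lines. Since the text gives no UNL format, everything below is a toy stand-in: the lexicon, the tuple-of-labels "representation", and the functions `toy_enconvert`, `toy_deconvert` and `round_trip` are hypothetical names invented for illustration. In the system as described, the writer judges the echoed text; here a plain string comparison stands in for that judgement.

```python
from typing import Callable

# Toy stand-ins only: no UNL specification is given in the text, so a
# "representation" here is just a tuple of made-up concept labels.
TOY_LEXICON = {"hello": "GREETING", "world": "WORLD"}
TOY_WORDS = {label: word for word, label in TOY_LEXICON.items()}

Representation = tuple[str, ...]


def toy_enconvert(text: str) -> Representation:
    """Map each known word to a concept label; flag unknown words."""
    return tuple(TOY_LEXICON.get(w, f"UNKNOWN({w})") for w in text.lower().split())


def toy_deconvert(rep: Representation) -> str:
    """Generate text back from labels; unrecognised labels become '???'."""
    return " ".join(TOY_WORDS.get(label, "???") for label in rep)


def round_trip(
    text: str,
    enconvert: Callable[[str], Representation],
    deconvert: Callable[[Representation], str],
) -> tuple[Representation, str, bool]:
    """Enconvert, deconvert back into the source language, and compare."""
    rep = enconvert(text)
    echoed = deconvert(rep)
    matches = echoed == " ".join(text.lower().split())
    return rep, echoed, matches


print(round_trip("Hello world", toy_enconvert, toy_deconvert))
print(round_trip("hello moon", toy_enconvert, toy_deconvert))
```

When the echoed text does not match, as in the second call, the writer would rephrase the input and try again, which is the interactive loop the paragraph describes.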
The UNL Deconverter: Generating Information in Native Languages
The "deconverter" generates documents and information in native languages, from the UNL system. Since the "deconverter" is built upon a well-established language generation technology, it is expected that information represented in UNL is deconverted into each native language with almost 100% accuracy.
It is very significant that the "deconverter" is capable of expressing UNL information with almost 100% accuracy. It follows that, once composed in UNL, information can be understood in any language for which a "deconverter" module is created.
Language generation technology has been developed as a part of machine translation, and a fairly mature technology has been achieved in several languages. In these cases, therefore, a "deconverter" can be developed by interfacing existing language generation technology with the UNL system. Needless to say, the UNL system is not meant to eliminate existing machine translation systems.
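The idea in this section of one generation program shared by all languages, with only the dictionary and the generation rules differing per language, can be sketched as follows. This is a toy under stated assumptions: the role-based representation, the `LanguageModule` class and the `generate` function are hypothetical, a "rule" is reduced to a word order, and the Latin entries simply hard-code their case endings. None of it reflects an actual UNL format.

```python
from dataclasses import dataclass


@dataclass
class LanguageModule:
    """Per-language data in one standardized shape (hypothetical format)."""
    dictionary: dict[str, str]  # concept label -> word form
    order: tuple[str, ...]      # generation rule: order of semantic roles


def generate(rep: dict[str, str], module: LanguageModule) -> str:
    """Shared generation program: identical code for every language."""
    words = [module.dictionary[rep[role]] for role in module.order if role in rep]
    return " ".join(words)


# Two toy modules; only the data differs, not the program.
ENGLISH = LanguageModule(
    dictionary={"DOG": "dog", "SEE": "sees", "CAT": "cat"},
    order=("agent", "action", "object"),
)
LATIN = LanguageModule(
    dictionary={"DOG": "canis", "SEE": "videt", "CAT": "felem"},
    order=("agent", "object", "action"),
)

rep = {"agent": "DOG", "action": "SEE", "object": "CAT"}
print(generate(rep, ENGLISH))
print(generate(rep, LATIN))
```

Adding a language in this sketch means supplying a new `LanguageModule`, which mirrors the claim that technology developed for one language carries over when the formats are standardized.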
"Deconverter" is a programme that automatically converts UNL into native languages, and is required to achieve a high quality and correct result. For that purpose, it is also important that the basic architecture of the "deconverter" is shared by all languages. The technology developed for one language can be applied to other languages, provided that the architecture is shared. Although letters (written characters, scripts) vary from language to language, generating program and dictionary management system can be shared if the format of the "generation rules" and of the "dictionary" are standardized.
Enconverter : Enconverting Information into UNL
Information input in a native language is "enconverted" into UNL. The correctness of the outcome of conversion is verified by showing the "deconverted" result in the original language. In this verification, the accuracy of "deconversion" can be up to 100%.
When it is found that the result is not correct enough, the user can either rewrite the original document or directly produce UNL text interactively according to the guidance that is provided by the editor.
For this reason, the most advanced technology of natural language analysis must be introduced into the development of the "enconverter". Analysis technology so far has aimed at fully automatic analysis. Natural language documents, however, sometimes include sentences that are incomprehensible, ambiguous, or excessively long even for human readers, and these cannot always be analyzed by computer. Human assistance can overcome such ambiguity. The conversion into UNL is therefore carried out interactively by the writer, in a process in which ambiguities are eliminated by changing words or rewriting the sentence.
However, no matter how correctly the UNL system converts UNL text into various languages, it will be used only if it is easy enough to compose UNL documents. The UNL editor is a tool that helps to produce documents in UNL.
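The verification loop described in this section can be sketched roughly as follows. This is only an illustration under stated assumptions: the function names, the toy enconverter and deconverter, and the string comparison used as the acceptance test are all hypothetical, since the page gives no specification of the real components. In the process described above, the writer would judge the "deconverted" text rather than a string comparison.

```python
from typing import Callable, Iterable, Optional


def enconvert_with_verification(
    sentence: str,
    enconvert: Callable[[str], str],
    deconvert: Callable[[str], str],
    accept: Callable[[str, str], bool],
    rewrites: Iterable[str] = (),
) -> Optional[str]:
    """Try the sentence, then each rewrite, until one round trip is accepted.

    Returns the accepted UNL text, or None if every attempt is rejected.
    """
    for text in [sentence, *rewrites]:
        unl = enconvert(text)
        echoed = deconvert(unl)  # shown back to the writer in the original language
        if accept(text, echoed):
            return unl
    return None


# Toy stand-ins (assumptions, not the real UNL tools).
def toy_enconvert(text: str) -> str:
    # Pretend the word "bank" is ambiguous and cannot be analyzed automatically.
    if "bank" in text.split():
        return "<ambiguous>"
    return "unl:" + text


def toy_deconvert(unl: str) -> str:
    return unl[len("unl:"):] if unl.startswith("unl:") else "?"


def same_text(original: str, echoed: str) -> bool:
    # Stand-in for the writer's judgement of the deconverted result.
    return original == echoed


result = enconvert_with_verification(
    "meet me at the bank",
    toy_enconvert,
    toy_deconvert,
    accept=same_text,
    rewrites=["meet me at the riverside"],
)
print(result)
```

The first sentence fails the round trip, so the sketch falls through to the rewritten sentence, which mirrors the "rewrite the original document" option mentioned above.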
How the UNL Information is Represented in Computer
Documents are processed by the computer according to various sets of information. For instance, a document in a word processor contains "instructions" on typeface, text layout, graphics, and so on. In Internet communication, the dominant text format, HTML, is capable of containing links to other documents, and it enables readers to refer to various related documents by following links. Thus an electronic document can contain various kinds of supplementary information, which increases the quality of the document at the receiving end. UNL documents are to be treated in the same way on the Internet.
One of the merits of HTML is that the whole document is made up of plain text. In general, information is divided into text and embedded information. In HTML, however, even embedded information is described in plain text. This characteristic gives HTML universal adaptability to any editing system compatible with hypertext. Furthermore, the HTML description format for embedding is open to the public. HTML conventions are still expanding and improving.
As the UNL is designed to be a common language in a network, information expressed in UNL should be handled by any network system in the world. To achieve this aim, the format description for UNL expressions is proposed as an extension of the HTML conventions. UNL information can be embedded in an HTML document, with tags attached at its beginning and end to mark it as UNL information. The extensions should conform to existing HTML so that UNL expressions can be handled like any other document, without damaging the HTML hypertext structure. In conformity with the HTML conventions, the description of the UNL will be written entirely in plain text, and the format will be open to the public.
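To make the idea of tag-delimited UNL inside HTML concrete, here is a minimal sketch of how a program might separate the two. The tag names `<unl>` and `</unl>` are purely an assumption for illustration; the page does not name the actual tags, and the UNL content shown is a placeholder rather than real UNL syntax.

```python
import re
from typing import List, Tuple

# Hypothetical delimiters: the source only says that tags mark the beginning and end.
UNL_OPEN, UNL_CLOSE = "<unl>", "</unl>"
_UNL_SPAN = re.compile(
    re.escape(UNL_OPEN) + r"(.*?)" + re.escape(UNL_CLOSE), re.DOTALL
)


def split_unl(html: str) -> Tuple[str, List[str]]:
    """Return the HTML with UNL spans removed, plus the list of UNL spans."""
    spans = [span.strip() for span in _UNL_SPAN.findall(html)]
    remaining_html = _UNL_SPAN.sub("", html)
    return remaining_html, spans


page = "<p>Hello</p><unl> UNL-EXPRESSION-PLACEHOLDER </unl><a href='x.html'>next</a>"
remaining_html, spans = split_unl(page)
print(remaining_html)
print(spans)
```

Because everything is plain text, the surrounding markup and its links are left untouched when the UNL spans are taken out, which is the property the text asks of the extension.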
Automatic Management of UNL Vocabulary
The vocabulary of UNL consists of "Universal Words", or "UWs". A UW represents the meaning of a word in various languages, and it is used as a symbol to represent information in a computer system. UWs are developed according to certain conventions agreed among all partners.
An enormous number of UWs will be defined throughout the development of the UNL system. Meanings of words, however, vary significantly from language to language, and even within the same language. Each UW, in its various meanings, has to be registered in order to become universally convertible to other languages. Users can easily do so through the UNL system. How valuable a given UW is for the whole system is very difficult to anticipate. The system will automatically reject non-convertible UWs.
Managing the whole UW stock seems almost impossible for a human mind, so the UNL system aims at the automatic management of UWs. Users are allowed to register as many new UWs as they need, provided that they define not only a UW label but also the corresponding word in their own language, together with its classification in the conceptual hierarchy. With this information, the UNL system will automatically manage UWs and also provide the "deconversion" into each language. On the other hand, UWs that are never accessed by users will be removed from the register.
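The registration rules in this section can be pictured with a small data-structure sketch. Everything here is hypothetical: the class names, the fields, the use of an access counter to find unused UWs, and the rejection of incomplete registrations are assumptions chosen to mirror the three required pieces of information (label, native word, place in the conceptual hierarchy), not the actual UNL design.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Sequence, Tuple


@dataclass
class UniversalWord:
    label: str
    hierarchy_path: Tuple[str, ...]  # position in the conceptual hierarchy
    native_words: Dict[str, str] = field(default_factory=dict)  # language -> word
    access_count: int = 0


class UWRegister:
    def __init__(self) -> None:
        self._entries: Dict[str, UniversalWord] = {}

    def register(
        self, label: str, language: str, native_word: str,
        hierarchy_path: Sequence[str],
    ) -> UniversalWord:
        # Assumed rule: a registration missing any required part is rejected.
        if not (label and language and native_word and hierarchy_path):
            raise ValueError("label, native word and hierarchy position are required")
        entry = self._entries.get(label)
        if entry is None:
            entry = UniversalWord(label, tuple(hierarchy_path))
            self._entries[label] = entry
        entry.native_words[language] = native_word
        return entry

    def deconvert(self, label: str, language: str) -> str:
        entry = self._entries[label]
        word = entry.native_words[language]
        entry.access_count += 1  # record that this UW is actually used
        return word

    def prune_unused(self) -> List[str]:
        """Remove UWs that were never accessed and return their labels."""
        unused = [l for l, e in self._entries.items() if e.access_count == 0]
        for label in unused:
            del self._entries[label]
        return unused


register = UWRegister()
register.register("book", "en", "book", ["thing", "document"])
register.register("book", "de", "Buch", ["thing", "document"])
register.register("rarely-used", "en", "rarity", ["thing"])
german_word = register.deconvert("book", "de")
removed = register.prune_unused()
print(german_word, removed)
```

A real system would need a far richer notion of "non-convertible" than a missing field; the sketch only shows where such a check and the removal of unused entries would sit.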
Progress Made so far
At the UNU/IAS UNL Centre, the following tasks had been accomplished as of July 1997:
Establishment of a global network of R&D
While these developments were in progress, a partnership environment was created. A global network of research institutions has been set up and is now in full operation. The network includes 15 institutions and covers 15 languages. Research and development (R&D) on the modules for these languages is making significant progress, which is monitored by, and reported to, the UNL Centre. An on-line system maintained by the UNL Centre is available to support the R&D of the partners on the network.
Regional workshops were held in Curitiba (Brazil), Amman (Jordan), Pisa (Italy) and Beijing (China) during the first half of 1997 with the purpose of strengthening the cohesion of the overall R&D effort throughout the network. From these workshops it is fair to conclude that all partners in the network are highly motivated and making steady progress.
The ultimate outcome of the Project is a UNL system that enables written communication in everyone's mother tongue via the Internet. The first results of the Project are already available, though in experimental and restricted form. The existing UNL embryo design and core technology will be further developed at UNU/IAS. This will include:
Modules for each native language are being developed in partnership with research institutes, universities, and R&D groups of each country, with the technical and financial support of the United Nations University. They will be completed during the next biennium.
The potential beneficiaries are millions of people, for many decades ahead. The UNL could have a significant impact across all sectors of society: on business operations (marketing, stock markets, technical handbooks, software manuals, etc.); media and communications (newspapers and magazines, advertising agencies, news agencies, information networks, telecommunications, etc.); leisure and entertainment (tourist information, travel, movie, music and art markets); printing and public display (publishing houses, libraries, and museums); and the documentation of international organizations (UN, UNESCO, OECD, etc.). The UNL would be particularly helpful to the academic world (universities, schools, scientific organizations, educational institutions, journals, etc.).
In the immediate term, the UNL can become a powerful instrument for UNU in accomplishing its mission of maintaining the "international community of scholars", to "alleviate the isolation of scholars", particularly in developing countries, to increase interaction among associated institutions, and to address global issues. The UNL is an instrument to enhance scientific and cultural diversity, since it facilitates access to, and interaction with, many different schools of thought, value systems and systems of social organization.
In the context of IAS, UNL will be developed in tandem with the Virtual University Project. It will provide the VU with a "language" that would allow information flow, learning, access to databases, libraries, and various other sources of organized knowledge. As part of the research on Information Technology applied to Linguistics, the UNL will also facilitate knowledge creation, knowledge representation and dissemination.
In terms of tangible products, in its first year the Project has already offered some of the anticipated R&D results, as indicated above.
The development period for the UNL project is ten years, starting from 1996. The first three years will concentrate on the creation and completion of conversion modules for major spoken languages. The initial stage includes 13 languages: the six official languages of the United Nations (Arabic, Chinese, English, French, Russian and Spanish), plus seven other languages (German, Hindi, Italian, Indonesian, Japanese, Mongolian and Portuguese). Swahili, Dutch and Finnish are now being added in response to interest expressed by these countries.
In a second phase, modules for the native languages of the member states of the UN may be developed, as long as a converter and deconverter module is generated for each of these languages. Experts in computational linguistics in these languages may take the initiative of joining the global interlingual network offered by UNL through the Internet in order to develop such modules.
The UNL Project is essentially R&D work on software applied to linguistics. It is a typical case of interdisciplinary research in the area of computational linguistics. As such, it calls for research methods used in linguistics (natural languages, communication, semiotics, etc.), as well as in computer science. From the latter, the UNL makes use of methodologies for the development of tools related to digital processing and language representation, machine translation, information networks and the Internet. All these methods of work require intense use of computers and the support of network systems.
The actual work is carried out at the UNU/IAS by the UNL Centre, and throughout a global network in partnership with other research institutions. The overall project is coordinated by a director, Dr. Uchida, who supervises a full-time team of experts in computer science and linguistics.
Similarly, partner institutions conduct their respective tasks in their home institutions, but in close consultation with the UNL Centre. They communicate frequently through e-mail and on-line file sharing, which allows exclusive and restricted access to the UNL core system. This is necessary to ensure the consistency and synchronized progress of the whole network, and it is consistent with the task of the UNL Centre.
The regional workshops and the annual international symposium contribute to this consistency and synchronized performance.
As UNL progresses, dissemination and public relations are becoming an important component for the success of the Project. A campaign prepared and supported by professional PR services will explain the benefits of the UNL and the ways of taking advantage of it.
In parallel with the development of applications of the UNL, particularly for learning purposes, it is intended to explore practical applications within the Virtual University Project and in other well-defined subjects, such as the description of the World Cultural Heritage sites, which is of interest to UNESCO.
Collaborating individuals and institutions
The development of the UNL is a joint effort of a core R&D group at the UNU/IAS and a number of centres of excellence in various languages, as summarised below:
The UNL Centre at UNU/IAS
The design of the UNL system and its core software is being undertaken at the Institute of Advanced Studies, where the UNL Centre is located. The UNL Centre has set up and coordinates a global network of research institutions, and supervises their work so as to ensure quality control and synchronized progress. For that purpose, the UNL Centre provides the partners with the various software components that constitute the core UNL system. It also provides them with the specifications for the development of the modules for each of the 13 languages. In addition, it provides them with technical support on-line and through e-mail.
A Global Network of Research Institutions
The UNL research network consists of 17 research institutes, universities, and R&D groups in the participating countries (see list below). These have been selected on the basis of their long experience and high reputation in computational linguistics. They are committed to developing a module for their respective native language. In some cases, more than one group is working on the same language (for instance for Arabic, German and Russian). The work is done in partnership with the UNU/IAS and under the direction of the UNL Centre. In each institution there is a leader with a team of researchers in computer science, linguistics, or computational linguistics. Worldwide, over 100 researchers are currently involved full-time in the development of UNL, forming a global research network.
A one-year contract has been signed with each institution, in which the task of each group is specified. These are the common tasks that each group must accomplish within the first year: to produce a dictionary of 100,000 entries, and an analyzer and a generator between the UNL and the respective native language. The work is to be done according to the specification of the generator module provided by the UNL Centre.
These are the partners that constitute the Global Network of Research Institutions:
Two institutions participate in the UNL Project. One is from Egypt: SAKHR Software, a leading private software company in the field of Arabic language processing. The other is from Jordan: the Computer Technology Department of the Royal Scientific Society.
A governmental institution of the People's Republic of China: the Department of Computer & Information Technology Advancement of the Ministry of Electronics Industry.
The English module is developed under the responsibility of the UNL Centre.
From France, an institution with considerable experience in French machine translation: the GETA group of Fédération IMAG, Laboratories CLIPS, Université Joseph Fourier.
Institut der Gesellschaft zur Förderung der Angewandten Information Forschung e.V. an der Universität des Saarlandes (IAI).
The Department of Computer Science and Engineering of the Indian Institute of Technology, Bombay participates in this project from India, and anticipates extending the system to Indian languages other than Hindi.
Indonesia, like India, has many languages spoken within the country. Electronics and Information Technology, Agency for the Assessment and Application of Technology (BPP Teknologi) participates in the project and works to develop.
The Istituto di Linguistica Computazionale del CNR joins this project.
The Japanese module is developed by the UNL Centre, at the UNU/IAS in Tokyo.
A governmental institution, the Brazilian Research Network (RNP), is playing the leading role in coordinating a dozen universities in Brazil, together with CITS, a leading software center in Brazil.
From Russia, two institutions participate. One is a governmental institution in Moscow, the Computational Linguistics Laboratory of the Institute for Information Transmission Problems, Russian Academy of Sciences; the other is STAR SPb Ltd., a private documentation company in St. Petersburg.
Departamento de Inteligencia Artificial, Facultad de Informatica de Madrid, Universidad Politecnica de Madrid.
This module is being developed under the supervision of Prof. Makino of Toho University.
The Department of Information Science, Faculty of Science, Toho University, works on research into Universal Grammar. This department will undertake a feasibility study of a UNL parser and generator for distinctive languages, in collaboration with Mongol Pedagogical University for the Mongolian language, and with the Institute of Linguistic Research of St. Petersburg University for Tajik, Latvian, and other languages.
In 1996, agreements between UNU/IAS and these institutions were signed, which include financial support from the UNU/IAS and specific tasks to be accomplished by each contracting institution.
The development period for the UNL Project covers ten years. The first three years will concentrate on the design and development of the system's core part, and on the creation and completion of conversion modules for major spoken languages. The remaining seven years will be dedicated to continually improving the UNL core system, and to applying it to the development of modules for the native languages of the member states of the UN. All other languages may take advantage of the UNL system to join the global inter-lingual network offered by UNL, as long as a converter and deconverter module is generated.
The first stage (1996 - 1998) is devoted to the design and development of the system's core part, and its application to the most widely spoken languages. This comprises: