MT News International

Newsletter of the International Association for Machine Translation

 

ISSN 0965-5476, Issue no. 4, January 1993


IN THIS ISSUE:

Conference reports

Association news (IAMT, AMTA, AAMT, EAMT)

People on the move...

Systems and Projects

Users' Views

Evaluation of systems

Publications

Conference Announcements

Forthcoming Events

Application and registration forms


Editor-in-Chief: John Hutchins, The Library, University of East Anglia, Norwich NR4 7TJ, United Kingdom. Fax: +44 (603) 58553; Email: L101@uea.ac.uk

Regional editors:

AMTA: Joseph E. Pentheroudakis, Microsoft Corporation, Redmond, WA 98052, USA. Fax: +1 (206) 936-7329; Email: josephp@microsoft.com

EAMT: Tom C. Gerhardt, CRPCU/CRETA, 13 rue de Bragance, L-1255 Luxembourg. Fax: +352 (44) 73 52; Email: tom@crpculu.lu

AAMT: Professor Hirosato Nomura, Kyushu Institute of Technology, Iizuka, 820 Japan. Fax: +81 (948) 29-7601; Email: nomura@dumbo.ai.kyutech.ac.jp


CONFERENCE REPORTS

MT Seminar for Translators in San Diego

Joann Ryan

It was standing-room only at the jointly-sponsored ATA/AMTA seminar entitled "Machine Translation for Translators" that was held in San Diego on 4 November 1992. The program began with a lively introduction to the basics of MT by Veronica Lawson, an independent consultant from England with many years of experience in MT projects.

Next on the agenda was an overview of seven successful MT applications by Joann Ryan, an MT consultant with 20 years of experience in developing SYSTRAN. The ways in which each project handles dictionary-building, inputting, pre-editing, post-editing, and feedback clearly illustrate that the success of an MT application depends largely on the creativity and dedication of the translators and other personnel who customize the MT system to meet the needs of their environment. Although the applications covered differed in many respects, including the MT system used (Systran, Logos, METAL or ENGSPAN™/SPANAM™), interviews with managers and translators from these projects revealed the following similarities:

1) Every project needs at least one person who is responsible for terminology; if more than one person provides terminology, control over terminology should be centred in one person.

2) Although much of the input to MT is already in electronic form, it is in a wide variety of formats; if the MT system developer does not provide an interface to a particular word processing or publishing package, the user must write the necessary interface in order to preserve formatting and graphics throughout the translation process.

3) Most users do simple pre-editing in the form of spell-checking; the more carefully the document is checked at this stage, the less post-editing is needed.

4) All users currently do a full post-edit of all documents that are to be published and/or disseminated. Although post-editing speed varies greatly by document, all users report that on the average post-editing can be done at a speed of one-and-one-half to two times the speed of human translation (HT); the exception is the U.S. Air Force, which is able to do a "rapid post-edit" at seven times the speed of HT with the aid of a computer program which flags those areas requiring attention.

5) The key to the success of an MT project is the amount of feedback that the user provides to the MT system, usually by adding terminology to the dictionary and occasionally by recommending linguistic improvements; many users report that they do not take the time to give enough feedback, but when they do they really see improvements.

The morning session ended with some interesting insights from Ulrike MacIsaac, acquired during her work as production manager for SYSTRAN's new in-house translation service. In a format which encouraged input from the audience, she stressed the importance of special training for post-editors, the need to create customized glossaries for every MT project, and the many areas where strict quality control is required in preparing a final translation for a customer.

The afternoon session began with a detailed report from Margarita Baena of the Centro Internacional de Agricultura Tropical (CIAT) in Colombia on her personal experience with post-editing the output of ENGSPAN™, the Pan American Health Organization's English-Spanish MT system, which CIAT has been using for 5 years. She explained that at CIAT one person using MT can produce 35 pages in a day and a half -- as much as 6 translators can produce in one day of unaided HT. She emphasized the need for the translator to "become friends with the MT system," since "it takes two to make a marriage work." One of her charts listed all of the elements necessary to make MT work for an organization, ranging from qualified personnel and a powerful MT system to support at the level of top management.

L. Chris Miller, a PC consultant from the Washington area, gave a preview of the results of her ATA-sponsored research on PC-based MT software packages, which she has been collecting for her personal use for the past 4 years. Her research revealed that there are 165 of these packages around the world, although only 15 or 16 are commercially available in the United States. Since all of the large mainframe MT systems will soon be available on a PC, she included them in her chart of language pairs offered.

The day ended with an open discussion between members of the audience and a panel of MT users chaired by Veronica Lawson. In addition to Margarita Baena, panel members included Doris Albisser, a user of Siemens' METAL system and head of the computer-integrated terminology department for the Union Bank of Switzerland, which is building a database of banking terminology; Ghassan Haddad, senior manager of the product translation department at Intergraph, which uses its own DP/Translator system to translate all of its documentation (when given a choice, all of its 25 translators decided to use MT); Daphne Walmer, manager of the user documentation department at the Rosemont Corporation, which uses SYSTRAN's in-house service to translate its documentation; and Robert Hensley, manager of the language translation department at AT&T, which has been using LOGOS and DP/Translator and will now add TOVNA's English-Russian system and Sharp's DUET E/J for English-Japanese. Robert Hensley explained that AT&T was compelled to use MT because of its translation volume; the company has translated over 300,000 pages, mostly from and into Spanish, and expects to do 50-75,000 pages into Russian next year. So far they have been very successful in outputting everything in the Postscript format; however, electronic distribution is rapidly overtaking hard-copy distribution.

Questions from the audience covered a wide variety of topics: the type of training needed by a free-lance translator who wants to become a post-editor, the type of background needed by an MT system developer, how to get writers to write text suitable for translation by MT, the options available to a prospective MT user with limited resources, and whether it could be profitable for a translation agency to offer post-editing services to an MT developer. Answers from panellists and audience members indicated that there is a growing need to train free-lance translators in post-editing and that a working relationship between a translation agency and an MT developer can indeed be mutually beneficial as together they can provide the type of service that today's customers need.

When one participant asked the panellists to predict what they would be talking about at next year's seminar, some of the answers were: the delivery of multi-lingual documentation on CD-ROM, the creation of English-to-English "translation systems" to simplify the language of input documents, the availability of all of the big mainframe systems on PCs and workstations, and increasingly personalized use of MT. The day ended with an invitation to the audience to come back next year to see for themselves what the topics would be.


MT Showcase at the Evaluation Workshop

John Benoit / Joseph Pentheroudakis

The AMTA-sponsored MT Showcase exhibit at the MT Evaluation Workshop was well subscribed, actively presented, and well attended. Since it was conveniently located next to the workshop hall, it was easy to drop in for a few minutes. Special demonstrations were scheduled throughout the conference; however, if these were missed, it was usually easy to get a hands-on demonstration at the booths. A brief description of the exhibits follows.

Three groups exhibited translation workstations and tools. The Translation Bureau of the Department of the Secretary of State of Canada showed version 2.0 of the Translator's Work Station (TWS), a system designed "to automate or accelerate auxiliary stages of the translation process." The TWS integrates word processing, bilingual spell checkers, terminology file management, text comparison and other utilities in a graphical environment; the system is described in a separate story elsewhere in this issue. MCB Systems, a company which distributes translation software throughout North America, demonstrated Trados Translator's Workbench II (TWB II), an integrated package of tools designed to assist the human translator. It includes large, customizable terminology databases for several European languages, and is hosted on PC systems [for more information on MCB Systems and the products they distribute, see elsewhere in this issue]. Finally, Seagull Software (Poway, California) presented the Text Translation Tool, a highly interactive tool designed to assist human translators in producing foreign language versions of interactive programs. As the company points out, the system does not translate, but, rather, locates and presents textual material in existing programs. The Text Translation Tool runs on PCs.

Three groups presented systems that translate between English and Japanese. EJ Bilingual, Inc. (Torrance, California) presented EZ Japanese Writer, a PC-based English-to-Japanese MT system. Language Engineering Corporation (Belmont, Mass.) demonstrated LogoVista, an interactive English-to-Japanese system which uses the syntactic transfer approach. LogoVista runs on the Macintosh, Sun, H-P, and Sony NEWS hardware, and is equipped with a Japanese word processor that uses English input. Finally, Sharp Corporation (Japan) demonstrated Duet Qt, a Japanese-to-English system which was shown on a portable, UNIX-based system. The system may be directly connected to an OCR developed by Sharp.

A new player has been added to the MT field, this time with an English-to-Arabic system. AppTek, Inc. (McLean, Virginia) demonstrated its UNIX-based English-to-Arabic MT system on a PC, which generated considerable interest. They expect to have an Arabic-to-English system available next year. [For more on AppTek see page 13 of this issue.]

The Pan American Health Organization (Washington, D.C.) demonstrated SPANAM™ (Spanish-to-English) and ENGSPAN™ (English-to-Spanish), which have been ported to "C" and now run under MS-DOS. Systran Translation Systems, Inc. (La Jolla, California) also demonstrated their system running on a PS2 using a special adapter board that allows one to run S/370 applications. The system is being converted to the C programming language to allow operation within a UNIX environment. Intergraph Corporation (Huntsville, Alabama) showed DP/Translator, which translates between English and several European languages and is integrated with the Intergraph publishing system. It runs on Intergraph's RISC workstations [see Intergraph's press release on page 13 of this issue]. Linguistic Products (The Woodlands, Texas) showed PC-Translator, their flagship product, which runs on PC DOS and translates between English and several European languages. It has recently been upgraded to use the formats of several of the PC document preparation systems [a press announcement from Linguistic Products is featured on page 13 of this issue]. Logos Corporation (Dedham, Mass.) demonstrated their Logos Multilingual Document Translation Software, which translates between several European languages (English is one of the languages of most of the pairs). It has been ported to Unix workstations including SUN systems and PCs [see MTNI#3:12].


Salford Colloquium on Translation

John Hutchins

Some thirty academics and researchers from the fields of MT and translation theory were invited by Jacques Durand of the University of Salford (UK) to participate in a colloquium on the theme "Machine vs Human Translation - can they learn from each other?" The colloquium took place at Salford University on 5th and 6th November 1992.

The first day began with Frank van Eynde (Leuven) arguing for transfer representations combining bilingual lexical information and interlingual treatment of grammatical elements (‘Machine translation and linguistic motivation’). He was followed by Doug Arnold (Essex) on ‘Discourse and the machine translator’ and recent computational models, by Harold Somers (UMIST) giving a survey on ‘Example-based and corpus-based approaches to machine translation’, and by Toni Badia (Barcelona) on ‘Some thoughts on Quine's thesis on the indeterminacy of translation’. Helge Dyvik (Bergen) gave a detailed exposition of his PONS system for translating between closely related languages such as Norwegian and Swedish, where the level of grammatical or semantic analysis is determined by the degree of structural similarities in particular sentences, and Erich Steiner (Saarbrücken) spoke about ‘Multi-lingual information and its status within (machine) translation’. The day was concluded by an excellent dinner, where Leo Hickey (Salford) entertained participants with examples of legal translation and its pitfalls for the unwary.

The second day continued with presentations bridging the two fields: John Hutchins (East Anglia) attempted to define the concept of ‘literal translation’ in a way relevant for both MT and human translation, and Paul Bennett (UMIST) spoke on ‘The translation unit in human and machine translation’ arguing that MT should be based on the smallest manageable units and that adopting larger (discourse) units may be less efficient and may not improve quality. The talk by Monique L’Huillier, who unfortunately could not be present, entitled ‘Students learn; can MT graduate too?’ considered whether MT could learn to improve from exposure to texts and examples in the way that students improve their translating abilities.

The next two talks illustrated contrasting models or theories of human translation. Ian Mason and Basil Hatim (Edinburgh), in a joint presentation (‘Signalling text structure’), highlighted the need for both human translators and MT to identify indicators of text structure and organisation, not just for the sake of sentence cohesion but also for semantic and pragmatic interpretation. Roger Bell (Westminster) outlined his stratificational interlingual model of translation, not as a psychological model but as a conceptual model of basic processes common to both human and machine translation. The colloquium ended with a description by Chris Brew (Edinburgh) of a project for building a system for the ‘Automatic evaluation of computer generated text’ including the assessment of the quality changes of an MT system from one version to the next.

The Salford colloquium represented a further stage in the growing interaction between practitioners of MT and translation theory, an area much neglected in the past and which should bear fruit in the future development of both fields. It is hoped that papers given at the colloquium may be published in the near future.


Translating and the Computer 14

John Hutchins

The fourteenth conference of this well-known annual series took place 10th-11th November 1992, organized by Aslib in conjunction with the Institute of Translation and Interpreting, the Aslib Technical Translation Group and this year, for the first time, the European Association for Machine Translation. Participation by UK delegates was somewhat lower than in previous years, which was probably attributable to the current economic situation, but participants from overseas were at the usual level. This year there were two innovations: copies of the proceedings were issued upon registration, and the conference included three parallel sessions.

As in previous years, the topics ranged from practical issues for human translators to presentations of research developments. There were three major strands: quality assurance, standards, and training; the translator's workbench (TWB) and other translation support facilities; and recent developments in machine translation. The first strand is of minimal interest to readers of MTNI and will not be reported here. The second strand was represented by presentations of the TWB project of the Commission of the European Communities by major participants: Khai Le-Hong (Mercedes-Benz), Gerhard Freibott (Krupp), and Khurshid Ahmad (University of Surrey). Other topics were the evaluation of software (Jenny Rowley), the setting up of bulletin boards for translators (Christoph Bouthillier and Bob Clark), and software for Oriental scripts (Han-Zhong Zhang).

The third strand was represented primarily in sessions sponsored by EAMT. On the 10th November, Alain Paillet described the implementation of the METAL system at Boehringer Ingelheim for the translation of pharmaceutical literature from German into English and into Spanish. Various problems were encountered and frankly described. Overall the installation has been a success, but not yet in monetary terms, although translation productivity has increased. In the same session, Jean-Jacques Pérot (SITE) described in broad terms the Eurolang project for the development of a multilingual system based on the Ariane `engine' from GETA, which SITE had evaluated as part of the French National MT Project. Expertise in MT is being gathered from former Eurotra and GETA researchers and the possible collaboration of European companies involved in translation technologies. The aim is the production of a "user-friendly translator's workstation" with facilities for pre-editing, post-editing, dictionary access, alternative MT outputs, translation management, and providing the "European business community the command of multilingual technical and commercial communication which... should surpass the Japanese competition in this strategic field."

A general talk on the first day which covered the second and third strands was given by Flemming Svanholm (IBM Copenhagen). He described the activities of IBM in the development of machine aids for translators (the IBM TranslationManager/2) and the MT system LMT (Logic-based Machine Translation), and emphasised particularly the integration of computer-based aids and MT systems in a controlled translation production system.

On Wednesday 11th November, presentations were given by suppliers and vendors, and included live demonstrations of METAL (by Sietec), of Globalink (by the UK distributors Archers) and of the experimental ATAMIRI system from Bolivia. The second EAMT-sponsored session included a description by Manny Rayner (SRI) of the experimental Swedish-English dialogue translation system which is now being extended with modules for speech recognition and synthesis. It was followed by Gert van der Steen's detailed account of the development of controlled language systems at Volmac (Utrecht) which integrate Dutch text editors and English text generators to produce high quality translations. Systems have been developed for large industrial clients in the fields of textile manufacturing, software manuals, insurance, and aircraft maintenance. Further developments include translation into Spanish and a controlled English (`Simplified English') for translation into French.

In the final session, participants learnt about the linguistic integration of the communication systems of the British, Belgian and French railways for the imminent opening of the Channel Tunnel, involving problems not only of language and culture but also of differences between the railways' operational practices.

The proceedings have been published as Translating and the Computer 14 (for details see Recent Publications in this issue) and are available from: Aslib, the Association for Information Management, Information House, 20-24 Old Street, London EC1V 9AP (tel: +44 (0)71 253 4488).


ASSOCIATION NEWS

International Association for Machine Translation

Meeting of the Council, Montréal, PQ, 25 June 1992

Present: Nagao, Presiding; Bennett, Hutchins, King, Magnusdottir, Merchant, Nirenburg, Pentheroudakis, Somers, Tanaka, Uchida, and Whitelock.

The meeting was called to order at approximately 5:45 PM, June 25, 1992, by Nagao, President. Nagao welcomed all those attending the meeting.

1. Adoption of the agenda

The agenda was approved as published.

2. Approval of minutes of previous meetings

Minutes of the Council meeting, July 3, 1991, were read by Nagao and approved by the Council. Minutes of the general membership meeting, July 4, 1991, were read by Nagao. The section on the EAMT was amended. The amended minutes were approved by the Council.

3. Report of the President

Nagao presented his report on the activities of the IAMT. Included in his report was the announcement that the IAMT had contributed US$ 2000 to the TMI-92.

4. Report of the Secretary

Bennett, acting for Vasconcellos, stated that there was nothing to report.

5. Report of the Treasurer

Merchant reported that there were US $10749.85 in contributions as income and US $2000.00 for the TMI-92 as expenses, as of June 12, 1992. King stated that the EAMT owes the IAMT approximately US $2000. There was some discussion about the percentages that each of the regional organizations contribute to the IAMT. The JAMT seems to be paying a larger amount than might be justified. There was also discussion about whether the IAMT should help fund such events as the TMIs and the MT Summits. These issues will remain open for now.

6. Report of the Editor

Hutchins reported the costs of publishing the first two issues of the newsletter. It was felt that the £893 paid for the first newsletter was excessive and that the cost of production could be reduced. The possibility of printing the newsletter in the US and distributing it from there will be considered.

Hutchins stated that it was decided to publish only three issues of the newsletter each year in 1992 and in 1993 due to the cost. He also suggested that we may want to change the logo and the title, both of which he created.

There was a discussion of how the newsletter is put together. Hutchins requested more contributions, especially on new products, language pairs, and installations. Regional editors should actively solicit contributions. Also we may consider advertisements in the newsletter. Bennett will ask Bill Fry, who manages the ATA newsletter, for advice and a possible presentation to the Council.

7. Decision to proceed with incorporation of the IAMT

The Council approved the incorporation of the IAMT as a non-profit organization in the District of Columbia. At some point in the future a draft version of the bylaws will be written and distributed.

8. Report of the AMTA

Bennett reported on the activities of the AMTA. Included in his report were: membership (170+ members), the MT Evaluation Workshop and Showcase in San Diego, and the possible "MT Yellow Pages".

9. Report of the AAMT

Nagao reported on the activities of the AAMT. He reported the establishment of the regional organization as the Asian-Pacific Association for Machine Translation on June 17, 1992. The JAMT will be incorporated in the AAMT. Also included in his report were: the two MT World events in Tokyo, the workshop on evaluation, and the plans for an AAMT newsletter in English and an MT journal in Japanese. A written report is attached.

10. Report of the EAMT

King reported on the activities of the EAMT. Included in her report were: current membership (40-50 regular, 5 institutional, and 5 industrial members), workshops, and the incorporation of the EAMT under Swiss Law. Hutchins was commended for his work as the Editor of the newsletter. IAMT support (without funding) for a course in MT for users and potential users and for the evaluation forums was noted.

11. Report of the JAMT

The report on the JAMT was included with the report on the AAMT. A written report is attached.

12. Report on the formation of the TMI steering committee

Nirenburg reported that the TMI Conferences had moved from a "family" to a committee structure as the conferences had grown. This parallels the growth of the COLING Conferences. The current organizing committee consists of: Bennett, Nagao, Nirenburg, and Isabelle. Nirenburg also stated that there is a strong sense that funding from the IAMT or any of the regional organizations should be decided on a case-by-case basis and should not be automatic.

13. Report on the evaluation initiative

It was reported that the European Community has established five expert groups on natural language processing. These groups are called EAGLES: Expert Advisory Groups on Language Engineering Standards. There are groups for: corpora, lexicons, formalism, speech, and evaluation; King is chair of the evaluation group.

14. Report on the MT Summit

Nagao reported on the MT Summit IV. It is entitled "International Cooperation for Global Communication" and will be held in Kobe, Japan, July 19-22, 1993. July 19 will be for the Executive Briefing; July 20-22 for the Summit itself. All papers will be invited.

15. TMI-93

Tanaka reported on the TMI-93. It will be held in Kyoto, Japan, July 14-16, 1993. The principal theme is to be "MT in the Next Generation".

16. New business

There was no new business.

17. Date and place of next meeting of the Council

The next meeting of the Council will be in San Diego, November, 1992. Date and time will be established by the AMTA. The agenda will include: status of the bylaws, advertising in the newsletter, and a report from the EAMT.

18. Adjournment

The meeting adjourned at approximately 8:45 PM, June 25, 1992.

Respectfully submitted,

Winfield S. Bennett, Acting Secretary


IAMT Council Meets in San Diego

The third meeting of the IAMT Council was held in San Diego on 3 November 1992 at the time of the MT Evaluation Workshop with the following persons in attendance: President Makoto Nagao, Vice President Margaret King, Secretary Muriel Vasconcellos, Hirosato Nomura and Hiroshi Uchida representing AAMT, W. Scott Bennett and Joseph Pentheroudakis representing AMTA, Doris Albisser and Veronica Lawson representing EAMT, General Counsel Robert Carswell, and Bill Fry as invited guest. Joseph Pentheroudakis also represented newsletter Editor-in-Chief John Hutchins.

Report of the President

President Nagao reviewed the year's events, including TMI-92, sponsored in part by IAMT, and the MT Evaluation Workshop, the first activity to be organized by IAMT. He pointed out that the membership of the combined regional associations had grown to over 500 since the founding of IAMT less than 18 months ago. The regional member association for Asia is now the newly formed Asian Pacific Association for Machine Translation (AAMT).

Report of the Secretary

Muriel Vasconcellos, Secretary, reported that IAMT was registered as a nonprofit corporation in the District of Columbia on 15 October 1992. Bylaws for the Association are being drafted.

Report of the Treasurer

Roberta Merchant, Treasurer, reported a balance on hand of $9,596.53 as of 30 October 1992. Income since the Association's inception, including contributions of 10% on membership dues from all three regions, had amounted to $13,444.35, against which there had been total expenses of $3,937.82, representing $2,000.00 for partial sponsorship of TMI92 plus costs in connection with Council meetings, communications, and MT News International. Gross receipts from registrations for the Evaluation Workshop were exceeding expectations, and Merchant predicted that the event would break even. It had served to attract a number of new members.

Report of the Editor

AMTA Regional Editor Joseph Pentheroudakis, speaking for Editor-in-Chief Hutchins, reported that the European copies of issue no. 3 of MT News International had been printed in the U.S., with appreciable cost savings, and that it was planned to continue with this arrangement. There was discussion of whether or not MTNI should carry advertising, and it was decided that Pentheroudakis and Bill Fry would work together on a proposal. It was also decided to include book reviews in MTNI as long as the titles had direct relevance for MT.

Work is under way on the MT Directory, being edited by John Hutchins, and the final product should be ready in time for MT Summit IV in July 1993. Volunteers are needed.

After due debate, it was decided to offer subscriptions to MTNI, separate from memberships, at $50.00 a year.

Report of AAMT

The AAMT report was presented by Hirosato Nomura. The new Asian-Pacific Association for Machine Translation (AAMT), which was established on 17 June 1992, has members in China, Korea, Thailand, Malaysia, Indonesia, Singapore, India, and Taiwan. AAMT has some 200 members, of which 60 are corporate. Starting 1 November the JAMT Journal will be published in both English and Japanese under a new name.

Report of AMTA

Muriel Vasconcellos, President of AMTA, reported that AMTA registered an amended purpose on 22 September 1992 to reflect the fact that the main interest of members is to study, evaluate, and understand the science of machine translation and educate the public on important scientific techniques and principles in this field. She also announced that AMTA had been granted nonprofit status under Section 501(c)(3) of the Internal Revenue Code. (The same status is being requested for IAMT, which shares the same purposes.)

As of 30 October 1992, AMTA had 239 members, of which 12 were corporate and 2 were institutional. The balance on hand in the treasury as of the same date was $7,993.46, minus an obligation of $244.12.

A directory of members and MT activities in the Americas, the MT Yellow Pages, is in preparation.

The main activity of AMTA during the year was the MT Evaluation Workshop/Showcase, which attracted over a hundred registrants, including many new members for the Association, and will probably pay for itself. AMTA also was a sponsor of TMI-92, having contributed $2,000.00.

Report of EAMT

Margaret King, President of EAMT, reported that EAMT had 80 members, of which 7 were corporate and 14 were institutional. The balance in the bank on 30 October stood at 6,899.20 Sfr. Activities during the year had included workshops in Nantes (22 July) and Saarbrücken (29-30 October) plus a series of other initiatives. For 1993 EAMT is organizing the workshop "The Lexicon and Machine Translation" (Heidelberg, 26-28 April) and will also sponsor the Bulgarian Summer School in Computational Linguistics and a second Evaluators' Forum.

The first General Assembly will be held in Utrecht in April. Norwich and Brussels have been mentioned as sites for MT Summit V, and the decision will be made in Utrecht in April. It is planned to hold TMI-VI "back-to-back" at a nearby site.

Report on TMI-V

Hirosato Nomura reported that arrangements were under way for TMI-V, to be held in Kyoto on 14-16 July. A call for papers was being widely circulated.

Report on MT Summit IV

Preparations were well advanced for MT Summit IV, to be held in Kobe City on 19-22 July 1993. Participation in the program is by invitation. The registration fee for Summit IV will be ¥50,000, which covers the proceedings, banquet, and reception, but not the hotel.

Date and Place of Next Meeting

It was decided to hold the next meeting of the Council in Japan at the time of MT Summit IV.

Respectfully submitted,

Muriel Vasconcellos, Secretary

The full minutes with attachments are on file with the IAMT Secretariat.


Adoption of IAMT articles

On 15 October 1992 the International Association for Machine Translation became a nonprofit corporation in the District of Columbia. Its purposes (i.e. Articles of Incorporation), approved by the interim Council, are as follows:

ARTICLES OF INCORPORATION

OF THE

INTERNATIONAL ASSOCIATION FOR MACHINE TRANSLATION, INC.

(IAMT)

We, the undersigned, all residents of the United States over the age of 21 years, desiring to form a corporation pursuant to the District of Columbia Nonprofit Corporation Act, Title 29, Chapter 5 of the District of Columbia Code, do hereby certify as follows:

NAME: ARTICLE 1

The name of the corporation is INTERNATIONAL ASSOCIATION FOR MACHINE TRANSLATION, INC. (IAMT).

DURATION: ARTICLE 2

The corporation shall have perpetual existence.

PURPOSES: ARTICLE 3

The purposes of the International Association for Machine Translation, Inc., shall be exclusively nonprofit, scientific, and educational. It shall bring together users, developers, researchers, sponsors, and other individuals or institutional or corporate entities interested in machine translation for the purpose of studying, evaluating, and understanding the science of machine translation and educating the public on important scientific techniques and principles of machine translation. In furtherance of these purposes, the corporation may carry on the following activities:

a. Sharing knowledge about the science and technology of machine translation through the collection, compilation, exchange, and dissemination of information;

b. Sponsoring and supporting workshops, symposia, and conferences on machine translation and related technologies and applications;

c. Developing appropriate educational materials and programs;

d. Facilitating access by researchers to machine-readable corpora and cooperating in the exchange of formats and text-encoding conventions; and

e. Discussing and establishing reference criteria for the evaluation of machine translation technology.

MEMBERS: ARTICLE 4

Membership in the IAMT shall be open to individuals from any geographic area of the world who are interested in the purposes of IAMT. All members shall be entitled to participate in educational functions of the IAMT and to receive educational materials distributed by the IAMT. The corporation shall not issue any shares of stock.

COUNCIL: ARTICLE 5

The Board of Directors shall herein be called the "Council." The affairs of the Corporation shall be managed by the Council. The Council shall have all the powers, rights, and duties of a Board of Directors, and it shall have no fewer than ten (10) members. The Bylaws shall provide for the manner in which the members of the Council are selected and shall specify their powers and duties.

STATUS AND DISSOLUTION: ARTICLE 6

No part of the net earnings of the corporation shall inure to the benefit of, or be distributable to, any private member, corporate or individual, Council member, officer, or other private person, except that the corporation shall be authorized and empowered to make reasonable compensation for services rendered. No substantial part of the activities of the corporation shall be the carrying on of propaganda or otherwise attempting to influence legislation, and the corporation shall not participate or intervene (nor publish or distribute statements) in any political campaign on behalf of any candidate for public office. Notwithstanding any other provision of these Articles, the corporation shall not carry on any other activities not permitted to be carried on (a) by a corporation exempt from Federal income tax under Section 501(c)(3) of the Internal Revenue Code of 1954 (or the corresponding provision of any future United States Internal Revenue Law) or (b) by a corporation contributions to which are deductible under Section 170(c)(2) of the Internal Revenue Code of 1954 (or the corresponding provision of any future United States Internal Revenue Law).

Upon the dissolution of the corporation, the Council then in office shall, after paying or making provision for the payment of all the liabilities of the corporation, dispose of all the assets of the corporation to such tax-exempt educational, charitable, or humanitarian organizations within the meaning of Section 501(c)(3) of the Internal Revenue Code of 1954 (or the corresponding provision of any future United States Internal Revenue Law) as shall be chosen by said Board of Directors. No private member, individual or corporate, shall have any right, title, or interest to any remaining assets of the corporation. No distribution of assets shall go to any organization any part of whose earnings inure to the benefit of any private individual shareholder, nor shall any assets be distributed to any organization a substantial part of whose activities is carrying on propaganda or otherwise attempting to influence legislation or which participates or intervenes in any political campaign on behalf of any candidate for public office.

REGISTERED OFFICE, REGISTERED AGENT: ARTICLE 7

The address of the corporation's initial registered office is Suite 310, 655 Fifteenth Street, N.W., Washington, D.C. 20005, and the name of the corporation's initial registered agent at such address is Robert M. Carswell, Jr.

INITIAL COUNCIL: ARTICLE 8

The initial Council shall consist of ten (10) persons, whose names and addresses are:

Winfield S. Bennett, Linguistics Research Center, University of Texas, Austin, Texas 78713-7247.

Roberta Merchant, 5420 Storm Drift, Columbia, Maryland 21045.

Muriel Vasconcellos, 1739½ Corcoran Street, N.W., Washington, D.C. 20009.

Doris Albisser, GRCT, New Technology Division, Bahnhofstrasse 45, CH-8021, Zurich, Switzerland.

W. John Hutchins (University of East Anglia), 89 Christchurch Road, Norwich NR2 3NG, England.

Margaret King, ISSCO, University of Geneva, 54 route des Acacias, CH-1227 Geneva, Switzerland.

Ulrike Schwall, Scientific Center, Wilkensstrasse 1a, W-6900, Heidelberg, Germany.

Taizou Kotani, Inter Group, Shiroguchi Bldg., 2-15 Sumita-cho, Kita-ku, Osaka, Japan.

Makoto Nagao, Department of Electrical Engineering, Kyoto University, Kyoto 606-01, Japan.

Yushi Uchida, Information Processing Section, Fujitsu Research Laboratory, 1015 Kamiodanaka, Nakahara, Kawasaki City, Japan.

The initial Council is empowered to establish the amount of or rate of calculation of dues and to solicit and receive or collect funds on behalf of the corporation. Such actions by the initial Council shall be subject to ratification by the first general meeting of members. This power shall thereafter be provided for and governed by provisions of the Bylaws. The initial Council is empowered to call a general meeting of the members, which shall take place no later than July 1993.

INCORPORATORS: ARTICLE 9

The names and addresses of the incorporators are:

Robert M. Carswell, Jr., Suite 310, 655 Fifteenth Street, N.W., Washington, D.C. 20005.

Muriel Vasconcellos, 1739½ Corcoran Street, N.W., Washington, D.C. 20009.

Deanna L. Hammond, 3560 South George Mason Drive, Alexandria, Virginia 22302.


Amendment of AMTA article

On 23 September 1992 the Association for Machine Translation in the Americas was granted an amendment to its purposes, which now read as follows:

ARTICLES OF AMENDMENT TO

ARTICLES OF INCORPORATION OF

ASSOCIATION FOR MACHINE TRANSLATION IN THE AMERICAS, INC.

(AMTA)

TO: Department of Consumer and Regulatory Affairs, Washington, D.C. 20001

Pursuant to the provisions of the District of Columbia Non-profit Corporation Act, the undersigned adopts the following Articles of Amendment to its Articles of Incorporation:

FIRST: The name of the corporation is: Association for Machine Translation in the Americas, Inc. (AMTA).

SECOND: The following amendment of the Articles of Incorporation was adopted by the Corporation in the manner prescribed by the District of Columbia Non-Profit Corporation Act:

PURPOSES. ARTICLE 3. The purposes of the Association for Machine Translation in the Americas, Inc. shall be exclusively nonprofit, scientific, and educational. It shall bring together users, developers, researchers, sponsors, and other individuals or institutional or corporate entities interested in machine translation for the purpose of studying, evaluating, and understanding the science of machine translation and educating the public on important scientific techniques and principles of machine translation. In furtherance of these purposes, the corporation may carry on the following activities:

(a) Sharing knowledge about the science and technology of machine translation through the collection, compilation, exchange, and dissemination of information;

(b) Sponsoring and supporting workshops, symposia, and conferences on machine translation and related technologies and applications;

(c) Developing appropriate training materials and programs;

(d) Facilitating access by researchers to machine-readable corpora and cooperating in the exchange of formats and text-encoding conventions; and

(e) Discussing and establishing reference criteria for the evaluation of machine translation technology.

THIRD: The amendment was adopted by a consent in writing signed by all members entitled to vote with respect thereto.


EAMT Open Meeting, Saarbrücken

[from minutes provided by Ian Johnson, acting EAMT secretary]

The EAMT held a general open meeting on Thursday 29 October 1992 in a seminar room of the Universität des Saarlandes in Saarbrücken, Germany. The meeting was attended by about 30 members.

Margaret King began by summarizing the current state of the association. There are now 69 individual members, 7 institutional members and 14 non-profit making organization members. The EAMT balance as at 30 September was: income SFr. 6899.20 (with SFr.950 fees in process), expenditure SFr. 790 to go to IAMT (with bill for Newsletter to come.) Maghi King then reported on activities of IAMT, in particular the Council meeting in Montreal. Current total IAMT membership at that date was 500 (AMTA 200, JAMT 200, EAMT 100). EAMT contributes 10% of its membership fees to IAMT.

Reports of past activities were given; the Machine Translation and Translation Theory workshop was held in Nantes on 22 July 1992, organized by Christa Hauenschild and Ulrike Schwall and attended by about 40 participants. A workshop on Machine Translation and its Users was held in Saarbrücken on 29 October 1992, organized by Tom Gerhardt.

Future plans include:

· two sessions sponsored by EAMT at the Translating and the Computer 14 conference [see report elsewhere in this issue]

· a workshop on the Lexicon and Machine Translation planned for 26-28 April 1993 in Heidelberg [see separate announcement in this issue]

· the first General Assembly of EAMT to be convened at Utrecht during the meeting of the European Chapter of the Association for Computational Linguistics, 22-24 April 1993

· the sponsorship by EAMT of the Bulgarian Summer School in Computational Linguistics in 1993

· a second Evaluators' Forum is planned for next year, to be held in Lausanne and organized by Kirsten Falkedal (ISSCO, Geneva) - details will be decided after the San Diego Evaluation conference [see report elsewhere in this issue]. It was noted that the proceedings of the first Evaluators' Forum will soon be available at approximately SFr.40.

Other proposals:

· Christa Hauenschild and Ulrike Schwall have proposed an EAMT conference or workshop in 1994 on Machine Translation and Corpora or Machine Translation and Machine-Readable Dictionaries.

· MT Summit V is to take place somewhere in Europe in 1995. It is proposed that the General Meeting in Utrecht should decide, with the deadline for proposals to be the end of February 1993.

· Harry Somers suggested that the TMI and MT Summit conferences should be kept separate, as they address different groups. Maghi King was asked to represent this view at the San Diego meeting of IAMT.

· John Hutchins proposed that the newsletter should be available for sale to non-members; and the views of the other associations are to be sought with a final decision by IAMT Council.

· proposals to introduce associate membership, reduced rates for East European members, and payment in local currencies, were all referred to the General Assembly for consideration.

It was noted that many formal items of procedure have yet to be determined by the General Assembly, e.g. the frequency of meetings, the election of officers and committee members, the final adoption of the Statutes.


German MT Users Circle meets in Saarbrücken

Tom C.Gerhardt/John Hutchins

The Anwender Arbeitskreis MÜ held its seventh meeting on 28 October 1992 during the Saarbrücken Technology Fair. Various topics were discussed by working groups. One considered the drawing up of guidance for those introducing MT systems, the problems of installing systems over a period of time into translation or documentation departments, and the transformation, with the help of MT, of translation departments into ‘innovative cost centres’ that raise a company's productivity directly or by improving information provision. A second group on quality comparison discussed the results of a ‘naïve’ experiment in evaluation: a comparison of the results of translating a short text by the Systran, METAL and Logos systems. The comments were not considered to be sufficiently useful as guidance for users who might want to undertake evaluations. It was proposed to have a second run with fuller lexical coverage to improve the comparability of results. A third group on ‘secondary tools in the translation environment’ proposed to focus on recent trends in machine-aided translation and terminology interfaces, e.g. the ‘memory-based’ systems (IBM TranslationManager/2, Trados Translator's Workbench II, Star Transition), the linkage of term banks and MT systems, and the exchange of terminology between different systems.

It was announced that IT&S will have an information stand at CeBIT 1993 in the special exhibition "Chancen 2000 - Technologie verbindet". A research MT system is going to be demonstrated and information about the MT Users' Circle given out. The next meeting of the group will take place on 6-7 June 1993 in Leverkusen.


PEOPLE ON THE MOVE...

Muriel Vasconcellos, president of AMTA and secretary of IAMT, has taken early retirement from the Pan American Health Organization, where for 15 years she oversaw the development and implementation of PAHO's two proprietary MT systems, SPANAM(tm) and ENGSPAN(tm). Muriel now runs her own consulting business, specializing in translation and machine translation, and she will be devoting more time than ever to the Associations. She can be reached by fax at (202) 667-8808 or on CompuServe at 71024,123.

Ian Piggott, long-time head of the Systran operation in the Commission of the European Communities, has relinquished his responsibilities to Jean-Marie Leick. In an interview in the latest issue of Language International (vol.4 no.6 December 1992, p.13-14), Ian Piggott reflects on his experiences promoting Systran and MT within the Commission. After preparing a detailed account of Systran over the years, Ian will move to his new job involving research into improving document preparation, and indirectly enhancing the quality of MT output.

Patricia ("Patty") Schmid has rejoined Logos Corporation. Until not too long ago, Patty worked for Globalink, and had worked for Logos prior to that. She can be reached at Logos Corporation, 111 Howard Blvd., Suite 214, Mt. Arlington, NJ 07856, USA.


SYSTEMS and PROJECTS

NEC MT system

[from CompuServe, Newsbytes]

NEC claims to have developed an automatic language translation machine in cooperation with Korea's Science and Technology Institute (KAIST). According to NEC, this machine translates Japanese into English and Korean automatically.

The speed and quality of translation are relatively satisfactory, according to NEC. Briefly described, this is the way it works. The machine displays three languages in each window on the screen. It analyzes the relationships between the words in a sentence. Then it translates the sentence into an intermediary language, which was developed by NEC. Finally, the machine translates the sentence written in this intermediary language into the target languages: English and Korean.
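The intermediary-language approach described above can be illustrated with a toy sketch in Python. This is not NEC's design: the `Frame` structure, the three small lexicons and all function names are invented for illustration, and the input is assumed to be already segmented into an agent, an object and a verb with particles removed. The point is only that a single analysis step feeds several independent generation steps.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Frame:
    """Hypothetical intermediary representation: language-neutral concepts."""
    predicate: str
    agent: str
    obj: str


# Toy analysis lexicon (assumption): Japanese word -> concept label.
JA_CONCEPTS = {"猫": "CAT", "魚": "FISH", "食べる": "EAT"}

# Toy generation lexicons (assumption): concept label -> target word.
EN_WORDS = {"CAT": "the cat", "FISH": "the fish", "EAT": "eats"}
KO_WORDS = {"CAT": "고양이가", "FISH": "생선을", "EAT": "먹는다"}


def analyze_ja(tokens):
    """Map a pre-segmented [agent, object, verb] sentence to a Frame."""
    agent, obj, verb = (JA_CONCEPTS[t] for t in tokens)
    return Frame(predicate=verb, agent=agent, obj=obj)


def generate_en(frame):
    """English generation: subject, verb, object order."""
    return " ".join([EN_WORDS[frame.agent], EN_WORDS[frame.predicate], EN_WORDS[frame.obj]])


def generate_ko(frame):
    """Korean generation: subject, object, verb order."""
    return " ".join([KO_WORDS[frame.agent], KO_WORDS[frame.obj], KO_WORDS[frame.predicate]])


def translate(tokens):
    """Analyze once, then generate every target language from the same Frame."""
    frame = analyze_ja(tokens)
    return {"en": generate_en(frame), "ko": generate_ko(frame)}


result = translate(["猫", "魚", "食べる"])
print(result["en"])
print(result["ko"])
```

Adding a further target language in such a scheme means writing one more generator, without touching the analysis step.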

This automatic language translation system is said to be best at translating scientific and technical materials where the accuracy of the translation is about 60 to 80 percent. This is relatively good compared with existing translation systems. The speed of this translation machine is also said to be faster than current counterparts. The system is still not good, however, at the handling of conversational sentences, which often have obscure grammatical structure.

This automatic language system, still a prototype, is said to be able to translate Japanese into English and Korean simultaneously. English sentences can also be translated into Japanese and Korean. NEC is working on the development of a system to translate Korean into English and Japanese. Also, it is trying to improve the system so that it can handle conversational sentences.

The prototype language translation system will be extensively tested by researchers at Korea's Science and Technology Institute through a telecommunication line.


Multiple Language TV Captioning To Be Developed

[from CompuServe, Newsbytes]

The Japanese Ministry of International Trade and Industry (MITI) says it will start developing a multiple language translation system for captioning of television programs. The software system will be able to automatically translate TV programs into the local languages of Asian countries.

MITI is already developing a multi-language translation system in cooperation with laboratories in Southeast Asia including China, Thailand, Malaysia and Indonesia. This project is called MMT (multi-language machine translation), and the Ministry will spend a total of 20 billion yen ($160 million) for this project, which will last 3 years.

However, the Ministry thinks the system will be of great benefit to Asian communities. The 100 billion yen ($800 million) project should result in a system in which the original language of a television broadcast will first be translated into an intermediary language, and then translated into the target languages of each country.

Translating broadcasts is extremely bothersome and costly for TV producers and broadcasters. Also, it takes time, as translations are done manually.


CSK Corporation

Akira Takagi

[extracts from AAMT Journal no.1]

Our department [the Advanced Technology Division of the Linguistic Information Research Dept. of CSK Corporation] researches elemental technology in natural language processing, such as machine translation. Using a Japanese-English computerized translation system for stock information, developed jointly with Nihon Keizai Simbun Inc., we translate market reports on the TSE reported in morning and evening newspapers of Nihon Keizai Simbun, and send them to forty countries every day. Moreover, having been developed as a general system, the Japanese-English system named ARGO is now on sale.

Our department is researching the development of techniques for natural language analysis for use in the next MT system to follow ARGO. At present, the principal topic is advanced syntactic and semantic analysis, because only the correct understanding of given sentences can produce correct results, and the technology of syntactic and semantic analysis is at a remarkably low level.

What is understanding of meaning? At present, research in artificial intelligence offers no answer. Suppose that we have some world models in our heads. When we understand meanings, we set the meanings of given words, phrases, clauses and sentences on the model, reproduce phenomena and recognize the relationships between the phenomena. However, it is still unknown which form and structure this world model has and how it should be written. At present, attempts are being made to compose a simple model symbolically, within limits, in some ways... We are now researching the precise expression of semantic structure, concentrating on the ability to express correctly the equality of meaning of equivalent expressions, on the ability to build a visual-linguistic interface, and on the possibility of acquiring meanings. Moreover, we are studying an imaginal world model in terms of what backs up symbols with meanings and on which model any semantic structure can be mapped with one algorithm according to its meaning...

The great variety of expressions in language makes it difficult to process natural language precisely. There could be many rules of language that we don't know yet. We must discover these rules in order to improve linguistic analysis without developing techniques of semantic analysis. From this point of view, we are investigating rules for selecting the correct word among alternatives and for deciding modification relationships, storing a large amount of modification data.

On the other hand, contextual analysis is very important as a technique of linguistic analysis for the next generation. However, it is impossible to recognize the meanings of words, phrases, clauses and sentences with the present techniques of natural language processing. Therefore, we are researching how to represent the meanings of sentences approximately by abstracting the themes of sentences. That is, our aim is to develop technology to presume abbreviated words in sentences, select correct words and decide modification relationships by recognizing the theme of each sentence...


Project Babel-R

[translated extracts from an article by Laurence Jacqmin in La Tribune des Industries de la Langue, no.7-8, October 1992]

Babel-Research is an MT research project at the Free University of Brussels, which began in 1988. It is a joint collaboration between industry and university, and has recently resulted in the realisation of a prototype for French-English translation of commercial correspondence.

The project is part of a tradition going back to the 1960s. Lydia Hirschberg pioneered MT research at the Université Libre de Bruxelles (1960-64) before the setback from the celebrated ALPAC report. Some twenty years later, the University took up the challenge of MT again in the form of an applied research project financed jointly by industrial (UNISYS) and public (Brussels region) funds. After a feasibility period (1984-87), the Babel-R project entered what may be called the pre-industrial era of MT at the Free University, which extended from 1988 to 1990.

Apart from the prototype this phase highlighted a set of fundamental problems to be resolved. The University has thus now embarked on a new period of basic research (1991-94) to establish its competence in the area and to prepare further industrial collaboration.

Commercial trade is becoming increasingly international and so is the documentation which goes with it. This fact determined the choice of commercial correspondence as our field for applying MT. Moreover, preliminary studies gave hope of identifying for commercial exchanges a true ‘sublanguage’, i.e. a subset of language characterized by a series of normalizations, or of simplifications in comparison with the language as a whole, where these restrictions touch at the same time the lexicon, syntax and semantics. Through the reduction of ‘linguistic ambiguity’ a sublanguage makes an MT project viable for a specific area of activity. Unfortunately, a study in depth revealed that the syntactic and semantic complexity had been previously underestimated, and refuted the hypothesis that commercial language represented a sublanguage. This realisation led us to impose on the compilers of texts for translation a certain number of editing rules in order to restrict the linguistic complexity of texts.

The system is designed for an office environment. Its user would be roughly as follows: a secretary with naive knowledge of the language of compilation (neither a linguist nor a translator) and not knowing the target language. Because of this ignorance of the target language, the user imposes important constraints on the translation system regarding quality of translation and the manner in which human interaction during processing takes place. Among existing MT systems there are two large families. On the one hand there are systems operating on a very restricted and well-defined sublanguage (such as meteorology); these systems function entirely automatically and provide satisfactory translations. But most current systems are more correctly called machine-aided translation; they operate on larger fields (technical reports, commercial correspondence) and are limited to rough translations intended to be corrected by human translators. Our system will apply to a field which is too large to exclude human intervention completely, but post-editing will certainly not be appropriate for our monolingual users.

The BABEL-Research project is original in seeking the solution to these problems through both the control and standardization of input and through human interaction with the source language.

The product is intended to expand progressively in number and directions of languages. In this context, a transfer architecture is adequate...

The system is being developed on Unisys U6030 and U6050 machines, which offer us a considerable range of programming languages and tools: UNIX, C, Prolog and Oracle. Our prototype is a uni-directional French-English translator, i.e. it performs French analysis, French-English transfer and English generation.

As we have indicated above, the lack of a real sublanguage has made us impose certain restrictions on the linguistic complexity to be mastered by the system. Study of a corpus of commercial correspondence identified a set of typical constructions...

[Examples of the corpus and the sublanguage are given, and the article then outlines the translation methodology: analysis, transfer and generation.]

The lexical database implemented in ORACLE contains at present 250 lexical units (corresponding to 1040 words) representative of the field covered. The system consists of three interconnected dictionaries: two of them monolingual describing words in the languages treated, and the third (the transfer dictionary) linking the other two by specifying equivalences, both structural and word to word. We prefer to call upon human expertise to resolve residual ambiguities during analysis and transfer. This aspect of the research has not yet been developed. In effect, it involves the collection of these ambiguities, identifying their nature, their frequency, etc.
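The three-dictionary arrangement described above can be pictured with a small Python sketch. It is an illustration only, not the Babel-R implementation (which the article says is held in ORACLE): the entries, the feature names and the `choose` callback are all assumptions, and only word-to-word equivalences are shown, not structural ones. The callback stands in for the human expertise that the article says will resolve residual ambiguities.

```python
# Monolingual dictionaries (toy entries, assumed): lexical unit -> description.
FR_LEXICON = {
    "commande": {"pos": "noun", "gender": "f"},
    "livrer": {"pos": "verb"},
}
EN_LEXICON = {
    "order": {"pos": "noun"},
    "command": {"pos": "noun"},
    "deliver": {"pos": "verb"},
}

# Transfer dictionary (assumed): French unit -> candidate English units.
TRANSFER = {
    "commande": ["order", "command"],
    "livrer": ["deliver"],
}


def transfer(fr_unit, choose=None):
    """Return the English equivalent of a French lexical unit.

    `choose` is a hypothetical callback taking (fr_unit, candidates) and
    returning one candidate; it is consulted only when several remain.
    """
    if fr_unit not in FR_LEXICON:
        raise KeyError(f"unknown French unit: {fr_unit}")
    # Keep only equivalents that the English dictionary actually describes.
    candidates = [en for en in TRANSFER.get(fr_unit, []) if en in EN_LEXICON]
    if not candidates:
        raise KeyError(f"no transfer entry for: {fr_unit}")
    if len(candidates) == 1:
        return candidates[0]
    if choose is None:
        raise ValueError(f"ambiguous transfer for {fr_unit}: {candidates}")
    return choose(fr_unit, candidates)


print(transfer("livrer"))
print(transfer("commande", choose=lambda unit, options: options[0]))
```

In a system aimed at monolingual users, the question put by such a callback would have to be phrased in the source language, for example by offering French paraphrases of each reading.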

To bring the prototype to a commercial stage a certain number of enhancements have yet to be realised. Apart from extension of its range, in terms of languages and grammatical constructions, we must next envisage qualitative tasks regarding the modelling of the linguistic tools and the methodology for lexical development. It is these last two points which are the concern of researchers at the present time.


Carnegie Group signs contract with Caterpillar

According to a press release issued by Carnegie Group, Inc., in Pittsburgh, Pennsylvania, in May, 1992, Carnegie Group and Caterpillar Inc. have initiated a five-year project to develop a fully automated machine translation system for Caterpillar's technical documentation. The Automated Machine Translation (AMT) system, under development in cooperation with Carnegie Mellon University's Center for Machine Translation, is intended to completely automate the translation of manuals distributed internationally by Caterpillar for its equipment. The system is also intended to eliminate the need to post-edit machine-generated documents in any of 11 major languages.

Caterpillar initiated the AMT project to facilitate its global operations and to comply with the European Economic Community's (EEC) recent directive stipulating that all operations manuals for equipment and hardware shipped into EEC countries after 1992 be written in the user's native language. Caterpillar's complete set of manuals consists of more than two million pages of text which are ultimately translated into 35 target languages.

Caterpillar will implement the AMT system over a period of five years and will use it to translate all operations and technical manuals that support the company's extensive line of equipment, the announcement said.

For more information, contact Jennifer Ace at Carnegie Group, Inc., at (412) 642-6900 or Sharlene Gallup at Caterpillar Inc., at (309) 675-7342.


New Products from MCB Systems

MCB Systems, a San Diego-based company which distributes software for translators throughout North America, has announced the Trados Translator's Workbench II (TW II), developed by Trados GmbH of Stuttgart, which was officially introduced earlier this year at the CeBIT computer fair in Hannover. The product, which runs under MS-DOS, integrates the MultiTerm terminology database with a specialized translation editor and "translation memory", a feature allowing the storage of the source and target language versions of every sentence in the text; when a stored sentence, or a sentence similar to it, occurs again, the stored translation is suggested and can be incorporated in the text. According to MCB Systems, TW II is ideally suited for large projects with much repetition, such as manuals written for one product and then adapted for a similar product, frequently updated software and hardware manuals, legal documents with large amounts of repeated phrases or text, and so on. The company also points out that TW II is fully network-compatible, so that multiple translators can take advantage of shared translation memory and terminology databases.
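The "translation memory" idea can be sketched in a few lines of Python. This is not how TW II is implemented: the class, its method names and the 0.75 similarity threshold are assumptions made for illustration, and similarity is measured with the standard library's `difflib.SequenceMatcher`, which is a character-level comparison and only one of many possible measures.

```python
from difflib import SequenceMatcher


class TranslationMemory:
    """Toy store of source/target sentence pairs with fuzzy lookup."""

    def __init__(self, threshold=0.75):
        # Minimum similarity (0.0 to 1.0) for a stored pair to be suggested.
        self.threshold = threshold
        self._pairs = {}

    def store(self, source, target):
        """Remember the translation of one source sentence."""
        self._pairs[source] = target

    def suggest(self, sentence):
        """Return (stored_source, stored_target, score) for the closest
        stored sentence, or None if nothing reaches the threshold."""
        best = None
        for source, target in self._pairs.items():
            score = SequenceMatcher(None, sentence, source).ratio()
            if best is None or score > best[2]:
                best = (source, target, score)
        if best is not None and best[2] >= self.threshold:
            return best
        return None


tm = TranslationMemory()
tm.store("Remove the filter cover.", "Entfernen Sie die Filterabdeckung.")

# A near-repeat of a stored sentence brings back the stored translation
# as a suggestion for the translator to accept or adapt.
print(tm.suggest("Remove the filter cap."))
# An unrelated sentence yields no suggestion.
print(tm.suggest("Check the oil level."))
```

The suggestion is only a starting point: for a similar but not identical sentence, the translator still has to adapt the stored translation.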

MCB Systems has also introduced MultiTerm for Windows, the Microsoft Windows version of MultiTerm 2. MultiTerm for Windows is a fully network-compatible program which offers flexible and yet structured multilingual terminology management, according to the company. The system runs under Microsoft Windows 3.0 or higher and requires at least an 80386 SX processor and 4 MB of RAM; a true 80386 with 25 MHz and 8 MB of RAM is recommended. Network operation requires a network operating system that supports DOS function calls at version 3.3 or higher; these include Novell Netware and IBM Token Ring LAN Manager, as well as UNIX networks with a DOS shell.

Finally, MCB Systems now offers several European Commission terminology databases; this has been made possible by an agreement arrived at between Trados and the EC allowing the company to market portions of the EC's Eurodicautom database in MultiTerm format. Each of the databases is available in two different versions, a basic version containing English, German, French, Italian, and Spanish, and the EC version containing all official EC languages except Greek. The company points out that custom language combinations can also be created. At present, three databases are available: an EC Terminology database, an Information Technology database, and an International Index of Abbreviations. Other databases in a variety of languages and subject areas are currently being converted to MultiTerm format. Contact MCB Systems for an up-to-date list.

MCB Systems can be reached at (800) 397-4406 or (619) 457-7711, or by fax at (619) 457-9613, or on CompuServe at 71570,3116.


English-Arabic MT from AppTek, Inc.

AppTek, Inc., of McLean, Virginia, has announced a beta version of its English-Arabic translation system. The system is being placed in beta test sites, and additional beta test partners are being sought. According to the company's press release, AppTek designs, develops, markets and maintains state-of-the-art computer-based technology to assist professionals in language translation to and from Arabic. The system includes an English lexicon, an Arabic generation lexicon with fully specified morphology, and a bilingual English-Arabic lexicon, for a combined total of over 100,000 entries, according to the release. The system is an implementation of Lexical-Functional Grammar, and reflects the collaboration of computational linguists in the US, the University of Jordan in Amman and the Hassan II University in Casablanca, Morocco. The system currently runs under UNIX on an Intel 386 or 486-based workstation, and can be ported to other platforms. The company states that it intends to create an interface with popular modern word processing and OCR utilities to provide the initial digitization of the source material to be translated. The system is compatible with the Arabic ASMO 449, ASMO 708 and Al-Arabi character representations.

AppTek has collaborated with TRW's Command Support Division in the US, and demonstrated an English-Arabic version of TRW's fully integrated bilingual command and control system at the June, 1992 AFCIA exhibition and the October, 1992 AUSA exhibit in Washington, D.C. According to the announcement, this integrated system provided a bilingual capability to DBMS, spreadsheet and word processing.

For further information about the company and its products, please contact Alex Zalani at (703) 821-5000, or by fax at (703) 734-5703. The company's address is 1420 Beverly Rd., Suite 120, McLean, VA 22101, USA.


New from Intergraph

Intergraph Corporation has announced the availability of a Spanish-to-English translation system and two new filters for popular desktop publishing (DTP) programs for its DP/Translator system. In the same release, Intergraph introduced a new interface that automates much of the translation workflow.

DP/Translator's Spanish-to-English translation module is available after being successfully beta tested at the AT&T Document Development Organization. The system translates from English to French, German, Italian, Portuguese and Spanish, and from French, German and Spanish to English.

Intergraph also announced expanded connectivity to FrameMaker and Microsoft Word; the filters are used to preserve the document formatting of the DTP programs in the translated versions of the documents. The company reminds us that DP/Translator provides similar connectivity with WordPerfect, QuarkXPress and Interleaf, as well as the company's own DP/Publisher. The announcement points out that DP/Translator includes tools that will enable users to create filters for DTP applications not currently supported.

DP/Translator has also been equipped with a new interface which is intended to automate much of the pre-translation and post-translation file processing. The interface allows the user to specify the file type and file name on the translation menu; DP/Translator automatically manages the sequence of processes to produce the finished translation.

For further information please contact Gary Thornton, Electronic Publishing Division, Intergraph Corporation, Mail Stop LR23A2, Huntsville, Alabama 35894-0001, USA. He can be reached by telephone at (205) 730-8327, or by fax at (205) 730-9478.


Linguistic Products Announces Version 3.4 of PC-TRANSLATOR

Linguistic Products of The Woodlands, Texas, has announced the release of version 3.4 of PC-TRANSLATOR, its flagship PC-based MT system. The new version allows a more flexible search through its phrase dictionaries by accepting multiple wildcards. The system is available in both directions between English and Spanish, French, Italian, Swedish, Danish and Portuguese, as well as from German to English. The price per language pair and direction is US$985.00.

The company describes PC-TRANSLATOR as primarily a tool for international businesses and organizations, who must communicate in several languages. According to the announcement, the system identifies and analyzes words and phrases in the source text, retrieves the corresponding dictionary data, processes the retrieved data with grammar and syntax, deposits the translated data in an output file, and prepares an alphabetical list of unknown words. As the company puts it, "PC-TRANSLATOR increases translator productivity, leaving more time for complex problem solving and eliminating the need to look up the same words again and again."

PC-TRANSLATOR is also able to retain the source text formatting for documents created in WordPerfect, MS Word, WordStar, and WordStar 2000.

For more information about PC-TRANSLATOR, please contact Ralph Dessau or George Mallard at Linguistic Products, P.O. Box 8263, The Woodlands, Texas 77387. The company can also be reached by telephone at (713) 363-9154 or (407) 395-0568, or by fax at (713) 298-1911.


Translator's Work Station Version 2.0

The Translation Bureau of the Department of the Secretary of State of Canada has announced the installation of Version 2.0 of the Translator's Work Station (TWS). The TWS is described as a tool designed to automate or accelerate auxiliary stages of the translation process, and replaces the prototype system that was developed by the Canadian Workplace Automation Research Centre (CWARC) in 1988.

According to the announcement, the TWS is based on the concept of integrating existing hardware and software. It is built around a 386 SX microcomputer with a 2MB extended memory card, a colour monitor, a mouse and a high-definition laser printer, and includes the following programs: WordPerfect, bilingual spelling checkers, terminology file management with access to Harrap's and Collins dictionaries, verb conjugation, text comparison, word processing file conversion, TERMIUM on CD-ROM, and a window program to manage all applications simultaneously.

The Translation Bureau continues to use translators' feedback in evaluating the TWS. The Bureau also wishes to determine under what conditions the system could be introduced more widely or simply by modules; some components of the TWS are being evaluated in the context of the possible introduction of a more modest computer package for the use of all translators in the Bureau. As well, CWARC is putting the finishing touches on the TWS 3.0, to be released in March 1993.

For more information on the Translation Bureau's TWS, please call (819) 953-4533, or write to the Secretary of State Department, Translation Bureau, Translation Operations Branch, Ottawa, Ontario, Canada K1A 0M5.


USERS' VIEWS

IBS Corporation - user of a Japanese-English system

Keizou Sakurai

[edited extracts from AAMT Journal no.1]

Five years have passed since we began machine translation, and in fact, it was merely one and a half years ago when our MT system was activated fully. In the first three and a half years, the system was in our office but used only part time. However, ... our MT system now handles all the processes of translation... At present, we employ ten devices, with five translation machines and five X-terminals accordingly. Also there are five LANs (local area networks) for personal computers (PCs) to support MT and do post-translation work, with thirty personal computers and four editing devices on the LANs. One of the LANs for PCs is assigned to an English OCR device.

We have nine workers for MT at present. In order to activate an end processor for MT, we categorized translation work into three levels. That is, MT without editing process, MT with modest editing (enough for readability) and complete translation....

At present we are undertaking a big project to translate a training handbook from Japanese into English for Tokyo Gas, for their gas project in Malaysia. In this case, Japanese text is input from the word processor and it is translated into English with MT. The translation is done on MT end-processors. One end-processor is supplied to each translator to use as a translation workbench... A native English rewriter proofreads the output, then the final touch is added on a Macintosh.

What is satisfying with our procedure is that team spirit grows in the group. If you have a large-scale project where all your staff are out-house working in their homes, then it is almost impossible to share professional knowledge among the staff easily...

We used to make a glossary in the past, and after distributing it to translators, the maintenance of the word list (supplying additional terminology during the project) was extremely complicated. But with the MT system any new word entered by anybody on the project will be shared by all other staff members at once, and everybody will be using the same updated glossary...

It may not be necessarily recognized as an advantage of MT, but when staff members are working as a team in the workbench method, mutual communication is smoother, problems are shared and solutions found as well as vocabulary controlled...

In fact, it is just one year ago that the effect of using our MT system as a workbench became visible. We used to employ it in the standard way: processing the original text with the MT system, transferring the output to a PC and editing the file... Translation involves the work of pre-editing, post-editing and rewriting along with translation, so that trying to do all this work on a single MT system demands many hands to complete the translation of an original text... But it is not possible to draw the line between each task of the translation process; when one job is shared among many it is difficult to measure the productivity of each individual... and to simplify assessment we decided to introduce the workbench method. A worker is now entrusted with the whole translation process. We have benefitted from the decision by increased efficiency.

There are other benefits of the shift to the workbench method. Translation is a constant battle with dictionaries... an MT system has its own dictionary and we should utilize more of the system's reference function, in other words, use the MT system's dictionary as a source...

How much does an MT system cost? The present price for an end-processor is about 22,400 dollars (2,800,000 yen), or a monthly lease of 432 to 440 dollars (54,000 to 55,000 yen). Even if one person occupies a terminal solely for MT use, I estimate even a beginner with just one month's experience may save you about 50,000 yen, with the workbench method.

Most of our present staff operating an MT system have not had previous experience of translation. In other words, our requirement is for employees who read and understand English, enjoy translation work and like using computers. Advertisements attract 30 to 70 applicants, among whom we can pick 6 or 7 suitable staff. Even when we specify that the job involves MT we get a good response. Previously, when we sought experienced translators it was hard to attract people to work with MT systems. Now we have people who apply as office workers, not as free-lance translators...

For Japanese to English translation the basic idea must be to input Japanese text better fitted for MT systems and to get better output when using an MT system. In other words, you rewrite the original Japanese text to be translated into English. If the text in Japanese is written clearly and concisely, it may be translated into good English which a native speaker can retouch to improve it. When it does not translate well, the original Japanese is too complicated. I made a handbook of about 10 pages... entitled "Rewriting Japanese text to get good English" and distributed it in our office. When a Japanese text is translated into English manually, all the know-how remains in the head, while in the case of MT the theory presented in the handbook makes the achievement of good English output fairly easy. In a highly objective manner, it is possible to give exact instructions on "what to do when you want to get a good translation", regardless of whether the staff know English well or not; thus it makes the process of teaching translation less painful. As translation demands enormous knowledge from translators it is essential that each and every one in the team knows how to obtain such knowledge. The primary key to translation efficiency is the sharing of the enormous amount of knowledge required...

As I have said, with Japanese to English translation the Japanese can be rewritten; there is a similar tendency with English to Japanese translation. But our company has not reached the stage of looking at this aspect. What we do now, when we receive an English text, is to read it with an OCR device and then a native speaker checks the input. Theoretically it should be better to rewrite the English for MT at this stage, and then the level and precision of translation would be higher. Of course, among the texts we receive some are written in perfect English while others are less than perfect. It is quite understandable that these latter ones are present for just as it is difficult for us Japanese to write good Japanese text, there are those who are not fluent in writing good English.

When we first introduced the MT system to our office, there were many anxieties, but after seeing that the changes increased the work flow and the translation process as a whole, its many advantages and benefits were finally discovered. If I may add one more point, the ratio of staff who settle down in our translation section seems to be higher than with the previous translation environment, and even compared with the ratio in other sections of our company. One reason for this could be said to be that even after acquiring the technical skill for MT, the opportunities for using it are limited. But I would imagine that our staff too believe what I believe, that: MT work is quite interesting and has a futuristic image in itself. In this sense, for the promotion of MT: once the operators sitting in front of MT terminals really feel for themselves that "there is a future for MT", then MT will have matured enough to meet people's expectations.


EVALUATION of MT SYSTEMS

IAMT/AMTA Holds a Workshop on MT Evaluation

"The best workshop I have ever attended," was an opinion voiced by several participants. They were talking about the MT Evaluation Workshop, the first-ever event to be organized by the IAMT and the AMTA. It was held at the spectacular Princess Resort in San Diego on 2-3 November 1992, and well over a hundred people attended, a testimony to the increasing interest in the methods and purposes of MT evaluation. The participants included representatives from MT research laboratories, MT users, and commercial system developers.

AMTA President Muriel Vasconcellos welcomed everyone at the workshop's opening session. In her remarks, she pointed out the significance of a workshop on evaluation from the perspective of the users, who would like to know the capabilities of the numerous systems on the market, as well as the researchers, who need to sort out and assess the ideas behind their systems. She was followed by Makoto Nagao, President of IAMT, who reminded the audience that the workshop was consistent with the IAMT's stated goals. He quoted from the IAMT's articles of incorporation, which state that the purpose of the association is "to bring together users, developers, researchers, sponsors, and other individuals [...] for the purpose of studying, evaluating, and understanding the science of machine translation..." EAMT President Margaret King addressed the participants and reviewed the activities of the European association during 1992, the association's first year of operation. The audience then heard from Su-Shing Shen, who represented Y.T. Chien of the National Science Foundation, the agency that sponsored the workshop. Dr. Shen discussed the NSF's interest in multilingual processing and support for basic research in MT, AI, and NLP. The NSF's role complements that of DARPA, another U.S. agency with an active interest in MT and NLP and the evaluation of research and development in those fields. Finally, George Doddington from the Defense Advanced Research Projects Agency (DARPA) also welcomed the participants and talked about his agency's sponsorship of research in MT, which includes objective evaluations at regular intervals.

The first presentation was by Yorick Wilks (New Mexico State University) who reviewed the traditions in the evaluation of MT. He suggested that the best taxonomy of approaches to MT evaluation is purpose-oriented, and commented on "glass-box" approaches as well as the use of criteria such as accuracy and intelligibility, which have been core issues in evaluating MT systems.

Margaret King (University of Geneva) then discussed the efforts at international co-ordination in MT evaluation, and described the activities of the International Working Group on MT Evaluation, which organized an Evaluators' Forum in Switzerland in the spring of 1991. The group hopes to co-ordinate its activities with the IAMT. She also expressed the hope that a formal group would be formed in the IAMT focusing on MT Evaluation; she pointed out the need for a shared understanding of what counts as evaluation and what counts as a reasonable way to perform it. The discussion that followed centred on a comparison of the methods used to evaluate human translation and whether these methods are appropriate when evaluating MT systems (a question that was raised several times during the workshop). The view of a user was also heard, who expressed the hope that the user's input would some day find its place in evaluation methodologies.

Loll Rolling (Commission of the European Community) offered the view of a veteran user of an MT system (SYSTRAN), and discussed the methodology that was used when, in the early seventies, the Commission decided to use MT and had to choose between two systems known to it then, TITUS and SYSTRAN. The interesting evolution of evaluation within the Commission was outlined; first, the focus was on comparing it with human translation, with no particular concern for cost, speed or user satisfaction. Then, in the 1980s, evaluation focused on global measures of quality, speed and cost. In the coming years, the Commission will monitor the progress of MT systems, and the relevant benchmarks to form part of that evaluation were described. The speaker also pointed out that conformity with linguistic theory is emphatically not to be used as a criterion in evaluation. It was also pointed out that the results of the MT Evaluation Seminar held in Luxembourg on 28 February 1978 are available from the Commission.

The topic of the next session was "Evaluation Surveys." Hirosato Nomura (Kyushu Institute of Technology) presented the methodology and criteria for MT evaluation developed by the Machine Translation System Research Committee at the Japan Electronic Industry Development Association (JEIDA). These criteria fall in three groups: (a) user evaluation of economic factors in selecting an MT system, (b) technical evaluation of a system by users, and (c) technical evaluation by developers. Additionally, the speaker mentioned that the committee is developing criteria for the qualitative evaluation of MT. A detailed report on the evaluation methodology, including complete survey questionnaires, was distributed to the participants. [See a review of the JEIDA report on page 19 of this issue.]

The next speaker, John Benoit (MITRE), was part of a group which evaluated over 20 MT systems in 1991 on behalf of a group of users; the other members of the group were Pam Jordan (MITRE) and Bonnie Dorr (University of Maryland). The purpose of the evaluation was to recommend MT software for purchase and R&D support to those users. The evaluation looked at the systems in terms of user-environment criteria and translation quality needs. One of the results of the survey was that acceptable systems which can be used to scan text for relevance and systems to be used for assimilation do not exist at the present time, but that systems which are integrated in a publishing environment do exist. In direct contrast to the observation made earlier by Loll Rolling of the EC, John Benoit asserted that the linguistic framework used in the system is significant, since it can be used to predict the system's maintainability and extensibility. The corpus used for the evaluation as well as the responses obtained may be made available at a later date.

The first afternoon session was devoted to the methodology used by DARPA to evaluate the work of the systems it has been sponsoring: George Doddington opened the session by observing that the evaluation is performed to help direct the R&D effort at those systems, and that the focus is on core MT technology. The agency is interested in coming up with cost-effective diagnostic, sensitive measures to judge the adequacy of algorithms.

John White of PRC, the Washington-area group which conducted the evaluation and interpreted the results, presented the evaluation methodology. Because the systems were very heterogeneous, a black-box approach was used. Short texts were prepared and submitted for processing by the various systems; they were then judged for quality (including accuracy and style) using criteria developed to evaluate human translators, and comprehension, using multiple-choice questions. The time required to perform these tests was normalized, and time vs. quality and time vs. comprehension plots were prepared. Another evaluation exercise is to be performed in March or April of 1993.

New directions in evaluation were discussed in the next session, which included three papers. "A Constraint-Based Approach" by Robert Berwick (MIT) discussed doing MT using a system based on the Principles and Parameters framework, and pointed out that information can be lost in automatic scoring techniques. Henry Thompson (Univ. of Edinburgh) distinguished three kinds of evaluation: user-based evaluation, performed to determine the adequacy of a system given the needs of a specific user; developer-based assessment, done to monitor the progress of a particular system, and finally diagnostic evaluation, which evaluates the performance of a system based on material such as test suites. His presentation discussed the second type. Some of the problems facing the automatic evaluation of MT output were shown to be related to the use of a single standard used for comparison, and it was suggested that the problem could be remedied by comparing the output against a set of standards. The final paper in this session, "Evaluation of the Machine-Aided Voice Translation (MAVT) System," was presented by Christine Montgomery of Language Systems, Inc. in Woodland Hills, California. The MAVT project is the first phase of a voice-to-voice translation system prototype, resulting in the development of a speaker-independent continuous speech translation system for English → Spanish → English. Details were presented on the test and evaluation of the system's components, which included "black-box" and "glass-box" tests.

Evaluation methods from representatives from the rest of the NLP community occupied the remainder of the afternoon. Beth Sundheim (US Naval Command, Control and Ocean Surveillance Center, RDT&E Division) presented the methodology used at the message understanding conferences (MUC). Unlike the DARPA MT evaluation methodology, MUC uses fully automatic techniques, does not compare results from different languages (although it does evaluate results in English, Spanish and Japanese), and is characterized by minimal human involvement in scoring. The metrics used focus on the completeness and accuracy of the message-understanding system's output. The automation of testing is possible because each system's output is compared to a hand-produced result.
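The idea of scoring a system's output against a hand-produced result can be illustrated with a short sketch. The report does not give the MUC formulas, so the code below is only an assumption: it treats "completeness" as the share of the hand-produced fills that the system reproduced, and "accuracy" as the share of the system's fills that are correct. The function name, slot names and values are hypothetical.

```python
from typing import Dict, Tuple


def score_template(system: Dict[str, str], key: Dict[str, str]) -> Tuple[float, float]:
    """Return (completeness, accuracy) for one filled template.

    Assumed definitions (not taken from the report):
      completeness = correct fills / fills in the hand-produced key
      accuracy     = correct fills / fills produced by the system
    """
    # A fill counts as correct only if the slot exists in the key
    # and the value matches exactly.
    correct = sum(1 for slot, value in system.items() if key.get(slot) == value)
    completeness = correct / len(key) if key else 0.0
    accuracy = correct / len(system) if system else 0.0
    return completeness, accuracy


# Hypothetical hand-produced result and system output for one message.
key = {"incident": "bombing", "location": "Lima", "date": "1992-11-02"}
system = {"incident": "bombing", "location": "Bogota"}

completeness, accuracy = score_template(system, key)
print(f"completeness={completeness:.2f} accuracy={accuracy:.2f}")
```

Because the comparison is an exact match on structured fields, no human judgement is needed at scoring time, which is the property the paragraph above attributes to the MUC approach.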

Lynn Carlson (U.S. Department of Defense) then discussed the evaluation of Japanese and English output of the systems participating in the DARPA Tipster project. This evaluation involves a semi-automated interactive scoring program. Again, the highly structured nature of the expected output and the criteria used to evaluate the results make automation possible.

Rita McCardell Doerr (U.S. Department of Defense) reviewed the evaluation of the systems participating in the Murasaki project, which involves data extraction from Japanese and Spanish texts (once again, this does not involve any MT). The purpose of this project is to develop techniques to help analysts track information and produce reports.

The last paper of the day was by Craig A. Will (IDA) and Boyan Onyshkevych (US Department of Defense) and presented preliminary data on the performance of humans when creating a database of information extracted from newspaper articles in the domains of joint business ventures and microelectronics fabrication. This project is part of an effort to evaluate the automatic extraction of data in the DARPA Tipster project.

The second day started with a panel discussion moderated by Muriel Vasconcellos, the topic of which was listed as "Apples, Oranges, or Kiwis? Criteria for the Comparison of Systems." In addition to the moderator, the panel included Ralph Dessau, Ed Hovy, Veronica Lawson, Christine Miller, Makoto Nagao and Bernard Scott. The presentations and the discussion looked at ways to compare heterogeneous systems, and how sound and informative such a comparison can be. The suggestion was debated whether systems should be judged according to aspects such as cost, degree of automation, quality/fidelity, whether different criteria should be used for users and developers (black box for users, glass box for developers), or whether a system could be judged on its capacity to grow. It was also suggested that the corpus prepared by researchers at Hewlett Packard's NLP laboratory some years ago, which contains about 2,000 English sentences sorted by syntactic structure, could be used to evaluate the syntactic coverage of systems with English as their source.

The next session was the first of two on internal evaluations. The first presentation was by Koji Tomura and Isao Tominaga, and described the evaluation process for the J/E MAJESTIC system, developed at the Japanese Information Center of Science and Technology (JICST); it was followed by a presentation by Bernard Scott describing the evaluation techniques used at Logos. Finally, David Farwell presented an overview of the techniques used to evaluate the prototype system under development at New Mexico State University's Computer Research Laboratory. The system is interlingua-based, and translates between Chinese, English, German, Japanese, and Spanish.

The second session on internal evaluation included a presentation of SYSTRAN's evaluation methodology by Elke Lange, Jeanne Homer and Laurie Gerber, and a discussion of evaluation at METAL by W. Scott Bennett.

Approaches to evaluating J/E MT systems were the topic of the morning's last session. Satoru Ikehara (NTT) suggested that evaluation tests should be designed with special attention to specific differences between the two languages. Thus, tests intended to evaluate Chinese-English translation would differ from tests evaluating Japanese-English output. A suite of sentences based on this principle has been developed to test the capability of J/E MT systems. Masaru Tomita (CMU and Keio University, Japan) ended the morning session by presenting the results of an experiment evaluating six commercial E/J systems by using material from TOEFL (Test of English as a Foreign Language) exams.

The afternoon session opened with a paper by Gudrun Magnusdottir (University of Gothenburg, Sweden) discussing the evaluation of a system's lexical coverage, both in terms of the accuracy of the information in the dictionary and in terms of the amount of work that is required to update that information. This led into the next session, which was a panel discussion on assessing the labour-intensive elements of MT. The session was moderated by Marjorie Leon (PAHO), and the panellists were Doris Albisser (Union Bank of Switzerland), Julia Aymerich (PAHO), Margarita Baena (Unidad de Publicacion, CIAT, Colombia), Larry Childs (Novell), and Gudrun Magnusdottir. Various aspects of the operational environment of an MT system were discussed, such as dictionary update, capturing and preparation of the input text, interaction during the MT process, and postediting.

The next session reported on the results of an experiment designed by Marjorie Leon, which involved evaluating some of the results of the DARPA evaluation exercise in preparation for the conference.

The workshop closed with a panel discussion on future directions in MT. The panel was moderated by Sergei Nirenburg, and included W. Scott Bennett, Bonnie Dorr, Ed Hovy, Satoru Ikehara, Margaret King, Alan Melby, Makoto Nagao, Masaru Tomita, Virginia Teller, and Yorick Wilks.

The workshop officially ended with closing remarks by Makoto Nagao and Muriel Vasconcellos. The participants then assembled on the deck outside the meeting rooms to enjoy a popcorn bash and, armed with exciting ideas about evaluation methodology, assessed the performance of the media coverage of the U.S. general election, which was taking place that day. No word, however, on whether the results of this evaluation will be made public...


MT evaluation symposium at MT World '92

The first issue of AAMT Journal contains an extensive report of the panel discussion of MT evaluation which took place under the chairmanship of Hozumi Tanaka at the MT World '92 Symposium in March. The session began with an account by Takenori Makino of the discussions by the two JAMT committees on evaluation: one looking at the MT system itself (hardware and software) and the other at the translation environment as a whole. In its investigations JAMT discovered that translation companies had halved their costs and increased the speed of translating large-volume documentation. It is now clear that post-editing should not be done by translators but by people specifically trained as post-editors, "as MT specialists able to translate large amounts of a documents in a short period of time", and that this training is best done in-house. System enhancement (principally dictionary upgrading) was undertaken by only half the companies investigated; those that upgraded dictionaries found them "difficult to compile and manage". Users were demanding better interfaces, editing facilities, SGML support, and enhancement of translator functions. JAMT has found that from the economic perspective, English-Japanese systems are already effective and practicable; there is more limited use of Japanese-English systems, but these too are reaching practical application. The further promotion of MT depends on document standardization and the restriction or control of input texts, the development of more friendly interfaces and environments, the customization of systems, easier dictionary compilation, and training of MT editors.

The second speaker was Shoichi Yokoyama, who considered the evaluation of MT from the natural language processing viewpoint and covered the classification of text types and the formation of an "illustrative corpus" for evaluation, the detailed classification of the lexicon, the examination of errors in the input texts, and contrastive linguistic studies. Satoru Ikehara discussed evaluation from the perspective of the developer of systems: correctness, clarity, idiomaticity; characteristics of the source texts according to the language skill of author, educational level, document type and subject content; knowledge necessary for translation; and differences of syntax and lexicon between Japanese and English at all levels. Tatsuo Ashizaki reported on the evaluation at JICST of its MT development (based on the Kyoto Mu system) currently translating 70,000 Japanese titles and 20,000 abstracts into English: comparisons before and after dictionary enhancements, analysis of post-editing and of errors, software improvements, evaluation of dictionary entries, problems of input texts, etc. Finally, Shoji Nii and Jun-ichi Nakamura considered evaluation from the viewpoint of users at the Toppan Printing Company and at Keio University, where the Oki PENSEE system is used to teach students about translation.


The Automatic Evaluation of Computer Generated Text

Henry Thompson

Chris Brew

This two-year SERC sponsored project, at the Human Communication Research Centre, Edinburgh University, aims to develop a new approach to the automatic evaluation of computer-generated texts, based on the use of standard sets. We are concentrating our efforts on the evaluation of French-English translations, so the texts which we have in mind are machine translations. Although we do not know the criteria by which translators produce and evaluate translations, we anticipate that it will be possible to model their preferences by collecting translations and evaluations from classes of part-qualified professional translators.

Standardised methods of automatic evaluation have been central to the recent explosion of interest in speech recognition; the main object of the project is to provide a similar service for Machine Translation.

It is clear that there is no one correct translation of a piece of text. It would therefore be misguided to attempt an automatic evaluation by any form of comparison with a single standard. The situation changes once we have a set of standards: now we can obtain useful information from even the crudest methods of measuring the distance between texts (such as word-for-word comparison). On plausible assumptions about the way translators work it is possible to convert distance measurements into measures of translation quality in a number of different ways. The main object of the research is to determine whether any of these measures of quality can be made to reflect human intuitions in a sufficiently consistent way to be useful.
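
As a rough illustration of the idea in the preceding paragraph (not the project's actual method, which is not specified here), the sketch below scores a candidate translation against a set of standard translations using a word-level edit distance. The function names, the normalisation by length, and the choice of taking the nearest standard are all assumptions made for illustration.

```python
def word_edit_distance(a, b):
    """Levenshtein distance between two token lists (each edit costs 1)."""
    prev = list(range(len(b) + 1))
    for i, wa in enumerate(a, start=1):
        curr = [i]
        for j, wb in enumerate(b, start=1):
            cost = 0 if wa == wb else 1
            curr.append(min(prev[j] + 1,          # delete wa
                            curr[j - 1] + 1,      # insert wb
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def score_against_standards(candidate, standards):
    """Return a score in [0, 1]; 1.0 means identical to some standard.

    Assumption: quality is approximated by closeness to the nearest standard.
    """
    cand = candidate.lower().split()
    best = 0.0
    for standard in standards:
        ref = standard.lower().split()
        longest = max(len(cand), len(ref)) or 1
        best = max(best, 1.0 - word_edit_distance(cand, ref) / longest)
    return best
```

With a single standard, any legitimate variation in wording is penalised; with several, a candidate need only be close to one of them, which is the point made above about a set of standards.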

Our techniques complement the role of the expert evaluator. A well-informed expert is in a position to diagnose and describe the failings of an MT system, and might well be able to make useful suggestions about how the system might be improved. However, suitable experts are costly, hard to find, and almost inevitably prejudiced in favour of one theoretical approach or another. Our system, while unable to provide much more than a bare numerical rating, would be cheap, convenient, utterly unbiased, and could, unlike the human expert, be re-used as often as necessary without fear of boredom, irritation or staleness.

For further information contact: Henry Thompson and Chris Brew, Human Communication Research Centre, Edinburgh University, 2 Buccleuch Place, Edinburgh EH8 9LW, UK.


JEIDA report on Evaluation Methodology

John Hutchins

For several years a sub-committee of the Japan Electronic Industry Development Association (JEIDA) has been working on criteria for evaluating MT systems. In November 1992 the Machine Translation Market and Technology Study Committee, chaired by Hirosato Nomura, published its substantial report: JEIDA Methodology and Criteria on Machine Translation Evaluation. This publication represents a milestone in the increasingly active area of MT system evaluation. For the first time we have a solidly researched set of proposed criteria and methods as the basis for evaluations by users and developers. The committee has developed three methodologies for:

a) user evaluation of economic factors

b) technical evaluation by users

c) technical evaluation by developers

In each case, criteria have been developed which can be objectively derived, which can be assigned numerical values and which can be represented visually in the form of radar charts. For the user evaluations the aim has been to find the least number of items making assessment as appropriate as possible. For developer evaluations, on the other hand, the aim is to list the greatest number of factors to enable the most detailed assessment possible. The group was concerned primarily with English-to-Japanese translation.

The first set of proposed criteria is designed to enable potential users to decide whether and what type of MT or MAT system could be introduced with maximum economic efficiency in their current translation environment. It consists of two questionnaires: one to establish the current translation situation, and the other to establish users' translation needs. Fourteen parameters characterizing MT systems were prepared to evaluate the answers objectively. Answers may relate to more than one parameter, and the total value for a given parameter may be derived from more than one answer. For example, the answer to a question on the current volume of translation will contribute both to the parameter `Translation need' and to the parameter `Time'. Other questions concern types of document, desired quality of output, fields of application, etc. Scores are derived for each parameter which give a chart representing the overall MT requirements. The charts can then be compared with radar charts for seven `system types'. The seven types as defined in the report are:

"(1). Preliminary translation, high-speed translation, mass processing, mainly batch processing (Workstation) The system is installed in-house. Adjustments are possible.

(2). High-quality translation, for particular fields, various adjustments, restricted language (Workstation, exclusive use machine, stand-alone) Fields are limited and adjustments are available.

(3). Translation assistance, dictionary look-up, mainly for interactive use (Personal computer, Workstation). Dictionary look-up and advanced translation assistance functions.

(4). Preliminary translation, low-speed translation, small quantity, low cost (Personal computer). A low-priced system operating on personal computers.

(5). Terminology bank (Personal computer). No translation is conducted, mainly used as dictionary look-up.

(6). English word processing. Used as an English word processor

(7). Preliminary translation, high-speed translation, mass processing, mainly batch processing (LAN, server, network). Accessed through LAN. Adjustment is difficult. Used for providing translation service."
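
The scoring procedure described before the list of system types can be sketched roughly as follows. This is only an illustration under stated assumptions: the report's actual questions, weights, fourteen parameters and reference charts are not reproduced here, so every answer key, weight and profile value below is hypothetical, and only the parameter names `Translation need' and `Time' come from the text. The report compares charts visually; the Euclidean distance used here is a stand-in for that comparison.

```python
from math import sqrt

# Hypothetical mapping: each answer contributes points to one or more parameters.
CONTRIBUTIONS = {
    ("volume", "high"): {"Translation need": 3, "Time": 2},
    ("volume", "low"): {"Translation need": 1},
    ("deadline", "tight"): {"Time": 3},
    ("deadline", "relaxed"): {"Time": 1},
}

# Hypothetical reference profiles for two of the seven system types.
SYSTEM_TYPES = {
    "type 1 (batch, in-house)": {"Translation need": 3, "Time": 5},
    "type 5 (terminology bank)": {"Translation need": 1, "Time": 1},
}


def parameter_scores(answers):
    """Sum the contributions of all answers into one score per parameter."""
    scores = {}
    for question, answer in answers.items():
        for param, points in CONTRIBUTIONS.get((question, answer), {}).items():
            scores[param] = scores.get(param, 0) + points
    return scores


def closest_system_type(scores):
    """Pick the system type whose profile is nearest (Euclidean distance)."""
    def distance(profile):
        keys = set(profile) | set(scores)
        return sqrt(sum((profile.get(k, 0) - scores.get(k, 0)) ** 2 for k in keys))
    return min(SYSTEM_TYPES, key=lambda name: distance(SYSTEM_TYPES[name]))
```

Note how the answer on volume feeds two parameters at once, as in the example given in the text.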

The second set of criteria for users is intended to enable potential users to evaluate the technical capabilities of a system. The method consists of five stages. Firstly users complete a questionnaire (A) to determine their requirements: language pairs, quality of translation, document types, system configuration, style of use, personnel, and budgetary and installation considerations. The answers are then transferred onto another form (D) following specific instructions, from which can be identified the system configuration needed by the user and the preferred hardware configuration. At the same time, the MT system suppliers complete a questionnaire (B) which produces a clear definition of the overall functions of the system. In the fourth and fifth stages, a `typical document' is translated by the system under evaluation and assessed on a form (E), and these results together with a comparison of the user's requirements (D) and the supplier's system facilities (B) are analyzed on a final evaluation form (C). The result is a radar chart representing both the system performance and the degree of the user's satisfaction with the system. The evaluation is made with 10-point scoring on ten axes: (1) what is to be done (as clarified in form A); (2) speed of translation; (3) quality of translation without adjustment (e.g. before dictionary updating); (4) quality of translation after adjustment; (5) ability to edit (i.e. functions to support post-editing); (6) system configuration; (7) style of operation (e.g. input and output requirements); (8) personnel; (9) introduction (i.e. satisfaction with installation costs and other requirements specified in form A); (10) system (i.e. satisfaction with hardware configuration).
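
The ten-axis result described above can be represented very simply in code. The sketch below is an assumption-laden illustration rather than the report's own procedure: the axis labels are shortened from the text, the scale is assumed to run from 0 to 10, and the plain-text bars merely stand in for the radar chart.

```python
AXES = [
    "what is to be done", "speed of translation",
    "quality without adjustment", "quality after adjustment",
    "ability to edit", "system configuration", "style of operation",
    "personnel", "introduction", "system",
]


def make_profile(scores):
    """Validate ten scores (assumed 0-10, one per axis) and label them."""
    if len(scores) != len(AXES):
        raise ValueError("expected one score per axis")
    if any(not 0 <= s <= 10 for s in scores):
        raise ValueError("scores must lie between 0 and 10")
    return dict(zip(AXES, scores))


def render(profile):
    """Return a plain-text stand-in for the radar chart, one bar per axis."""
    width = max(len(name) for name in profile)
    return "\n".join(
        f"{name:<{width}} {'#' * round(score)} {score}"
        for name, score in profile.items()
    )
```

For example, `print(render(make_profile([8, 6, 4, 7, 5, 9, 6, 7, 5, 8])))` lists the ten axes with a bar for each score.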

This outline can only hint at the detailed and thorough work which has gone into these methods. Even so, the authors stress the preliminary nature of their methodology; they point out that no real-life evaluations have been done, and that further problems need to be tackled. For example, while the criteria may well suffice for distinguishing users' needs for different system types they are not yet sensitive enough to differentiate between systems of the same type.

The third evaluation methodology described in this report is for developers to make an in-house assessment of the technical achievements of their systems. Here again, numerical values are assigned to six axes to produce a radar chart for visual evaluation. The six axes are: general purpose or field specific; coverage of basic functions; degree of accuracy; originality of techniques; system openness (to users); ease of use. Questions cover: dictionary, analysis, intermediate representation, generation, treatment of special forms (e.g. titles, quotations, figures, tables), customization and learning functions, environment and operation.

The report concludes with an outline of the committee's ideas on the evaluation of MT quality. The approach proposed is to have as its primary aim an assessment by the developer of what linguistic phenomena the system cannot deal with, expressed as objectively as possible, and secondarily an assessment by users of whether the system can deal with linguistic phenomena which must be translated. The method will be corpus-based with a test-suite (mathematics and health textbooks currently), and the emphasis will be on simple criteria relating to surface-structure phenomena (tree structures).

The report contains in a 129-page appendix the full sets of questions for the three evaluation methods developed. This impressive publication represents probably the most important single contribution yet to the literature on MT evaluation; it must surely be essential reading for anyone concerned with the development of evaluation methodologies for MT systems. The report is available from the Japan Electronic Industry Development Association, Kikai Shinko Kaikan, 3-5-8 Shiba-koen, Minato-ku, Tokyo 105 Japan.


Language Research & Engineering (CEC)

In its Second Call the Commission asked for proposals for: (1) research aimed at the improvement of the scientific basis in (a) Computational semantics, (b) Evaluation and quality assessment methods; (2) creation of common methods, tools and resources for (a) Multi-lingual corpora of machine-tractable texts in EC languages (guidelines, methods and tools) and (b) Study toward creation of catalogues & repositories of resources; (3) Pilot and demonstration projects: (a) Machine translation and automated aids for translation, (b) Office automation tools, especially multilingual ones: electronic publishing and authoring, methods & tools for hypertext databases, structured text databases and intelligent text aids, reuse of terminology, lexical resources, and (c) Computer-aided training, teaching and learning. Other R&D themes and topics relevant to LRE are not excluded.

The Second Call will have a larger budget than the First, and will be looking for larger projects: it is expected that the average Community contribution to a project would be 0.5-1.5 MECU. Consortia should be kept small (not above 4 partners) and projects short (not over 30 months); user and industrial organizations are highly desirable as participants (especially since the First Call is thought to have been too heavily academic in tone). The assumption is that there will be one project in each of the above categories.

Plans are also afoot for the Third (and last) Call of LRE, which should be announced in early June 1993, with a submissions deadline around mid-September 1993, and selection completed by early December 1993. The topics currently in view for it are: performance, robustness & integration; grammars and development tool kits (support for ALEP); further applications projects (as reflecting the priorities from earlier calls). The total budget for the 3rd Call would, however, be rather small: only 1.75 MECU.


PUBLICATIONS

JTEC Panel Reports on MT in Japan

Joseph Pentheroudakis

JTEC Panel Report on Machine Translation in Japan. Baltimore, Maryland: Loyola College in Maryland, 1992. 142p. Distributed by the National Technical Information Service (NTIS), 5285 Port Royal Road, Springfield, VA 22161, USA, telephone +1 (703) 487-4650. Publication number PB92-100239.

This report was prepared by the Japanese Technology Evaluation Center (JTEC), an agency sponsored by the U.S. National Science Foundation (NSF) and overseen by Loyola College in Baltimore, Maryland. The Center studies ongoing research and development activities in Japan in several technology areas of interest to the U.S., and is part of a larger effort to gain a clearer understanding of the relative position of the United States and its competitors in those areas. This information is then widely distributed and is available to U.S. government agencies and private organizations that set policies affecting the nation's competitive position.

The panel assembled to perform the study includes experts selected with the help of the NSF and any other agencies co-sponsoring the study. In addition to their in-depth knowledge of the relevant technology, the panellists are selected for their ability to produce a comprehensive, informed and unbiased report. The report under review was co-sponsored by the NSF, the Defense Advanced Research Projects Agency (DARPA) and the United States Department of Commerce, the latter two represented by Charles Wayne and Joseph Clark, respectively.

The panel that was assigned the task of preparing the report consisted of the following individuals, all of whom have long and active histories of involvement in MT and natural language processing: Jaime Carbonell (Carnegie-Mellon University), who chaired the panel; Elaine Rich (Microelectronics and Computer Technology Corporation), Co-Chair; David Johnson (IBM); Masaru Tomita (CMU); Muriel Vasconcellos, who at the time was chief of the translation program at the Pan American Health Organization; and Yorick Wilks (Computing Research Laboratory at New Mexico State University).

The panellists travelled to Japan and in the course of one week (25-30 November 1990) visited 25 sites. These included research and development sites, MT user sites, and MT system vendors. In addition, three sites were visited by the panellists separately. The panellists acknowledge several individuals whose assistance was instrumental in carrying out their mission; special mention is made of Professor Makoto Nagao of Kyoto University, who offered help and advice during the trip and specific suggestions throughout the process of producing the report.

The stated goals of the report, best articulated in the executive summary, are "to provide an overview of the state of the art of machine translation (MT) in Japan and to provide a comparison between Japanese and Western [sic] technology in this area." Given these goals, it is this reviewer's opinion that the report is a qualified success. It is a success in that the information made available to the panel at the sites visited is indeed presented in a detailed, unbiased and informative manner (see especially Chapters 3, 5, 6, 7 and 9). It is much less successful (and we believe this to be a real weakness in the report) in attempting a comparison of the state of the art in MT in the U.S. and Japan (see especially Chapter 1). A summary of each section in the report is presented below, followed by a critique of the report as a whole.

Chapter 1 ("Introduction: Machine Translation in Japan and the U.S.") was authored by Jaime Carbonell, the Chair of the JTEC panel. This chapter introduces the working concepts that will be used in the remainder of the report, and offers a concise view of the history of MT as well as of the traditional paradigms used in MT development in the U.S., which are useful in providing a general taxonomy of MT systems. The distinction between the use of MT for the assimilation versus the dissemination of information is also introduced, and the importance of MT in Japan given the huge demand for translation in that country is discussed. In general, the information in this section does place the development of MT systems in the proper historical perspective, and should be very useful to the reader of the report.

As mentioned above, however, this chapter also attempts a comparative analysis of Japanese and U.S. machine translation (Section 1.5). The result of this analysis is echoed in the executive summary, which earlier concluded that "[...] a comparison between the U.S. and Japan in terms of MT and related technologies shows that Japan is ahead of the U.S. in several important ways, including the commercial use of MT, the acceptance of MT among users, the development of knowledge sources such as dictionaries, and the use of optical character recognition (OCR) as an input modality, as well as in funding levels for R&D in MT. The U.S. has led in funding for basic research in natural language processing [...], and continues to lead in technological diversity [...], linguistic diversity [...], and level of effort devoted to R&D in speech processing." (p. 6)

It is quite unfortunate that this comparison is performed in an unquantified, undocumented and rather impressionistic fashion, representing "rough composite estimates based on the knowledge the panel has about MT efforts, both here and in Japan" (p. 12). We do not question the accuracy of the panel's estimates; on the other hand, neither is any evidence adduced to support them. The reader, as well as the entire MT community in Japan and the U.S., is entitled to the information to which the panel had access, or at least to a useful identification of its source. It becomes evident after reading other chapters in the report (especially Chapter 9) that Japan is indeed quite a bit more active in the area of technological diversity than this chapter suggests, thus contradicting some of these conclusions. The comparison between Japanese and U.S. MT should have included independent supporting evidence for the conclusions stated, or it should have been left out altogether; as presented, the comparison does a disservice to the community.

Chapter 2 ("Technical Infrastructure," by David Johnson) outlines the basic linguistic technology used in typical state-of-the-art Japanese MT systems on the market today, with specific attention paid to the transfer-based approach, the preferred paradigm in most Japanese MT systems. The author does not attempt any comparisons or evaluations of specific systems. Several systems are discussed here (NEC's PIVOT, Ricoh's MT system, Hitachi's HICATS/J-E, IBM's JETS, MU/JICST, and systems by Toshiba and CSK). The translation stages of the linguistic processor (analysis, transfer and generation) are described, and are accompanied by a generally informative discussion of issues such as lexical representation and ambiguity resolution. The fact that the grammars in many Japanese MT systems are not bi-directional (that is, can only be used for analysis or for generation, but not for both) is mentioned, but without any discussion; this issue is, however, taken up in Chapter 9 (p.105).

Chapter 3 is by Muriel Vasconcellos, and is entitled "Languages and Application Domains." The author points out that MT efforts in Japan have focused on Japanese and English, reflecting Japan's political, economic and social imperatives. However, several groups are now venturing into other languages. The figures in this chapter show the language pairs worked on in the various sites visited by the panel, as well as all source and target language combinations. There are 17 systems for J/E and 15 for E/J; furthermore, a major effort is under way at the Center for the International Cooperation in Computerization (CICC), a MITI-organized consortium of seven industry giants, to develop an interlingua-based system in Japanese, Chinese, Thai and Indonesian. New languages are being added to existing interlingua-based systems, including experiments with languages like Swahili and Inuit. Fujitsu (ATLAS-II), NEC (PIVOT), Matsushita (PAROLE), Oki Electric (PENSEE) and Catena (STAR) are among the developers and systems which have developed, or which have under development, pairs involving languages other than Japanese and English. In addition, several systems offering only J/E or E/J plan to add the other direction. Costs associated with developing additional languages are also presented, based on information provided by developers such as SYSTRAN, Bravice, Matsushita and Fujitsu.

The distinction between domain-specific and general-purpose systems is also discussed. It is interesting to read that in Japan several systems were initially domain-specific, focusing on the translation of computer manuals (said to represent 80% of all MT use in that country), evolving into domain-adaptable and ultimately general-purpose systems (p. 45). Several systems which followed this course are listed.

Chapter 4 ("Knowledge Sources for Machine Translation," by Yorick Wilks) presents an overview of the types of data used by an MT system, typically morphology tables, grammar rules, lexicons, and/or world knowledge representations. The development of knowledge sources in Toshiba's system is presented, showing a course from development of morphology and a dictionary in the late 1970s, used in the implementation of a Japanese word processor, to the development, in the mid to late 1980s, of a large-scale grammar and dictionary and a large-scale semantic database, all used in the company's ASTRANSAC system. Tables are included showing the size and type of the various knowledge sources used by the systems visited. However, the reader is left with several unanswered questions. For example, the author mentions that the panel was given widely differing figures for the rate at which system builders (not users) could add new terms to the dictionary (from 5 entries/hour to six person years to customize a dictionary for a new application); what factors contribute to these differences? What are the rates at which users can be expected to enter information in the dictionary?

More generally, the author fails to discuss varying approaches to building and evaluating dictionaries; these issues are of extreme interest to readers of this report, given the ambitious scale of dictionary projects in Japan, and given also the critical importance of the dictionary in the success of a system. What plans, if any, exist for evaluating the quality of the data in the huge dictionary under development at the EDR dictionary project, or, for that matter, at any other dictionary project?

Chapter 5 ("Life Cycle of Ma