USERS of SYSTEMS

The Current State of MT Usage

Or: How Do I Use Thee? Let Me Count the Ways


Muriel Vasconcellos


[This paper was presented in 1993 at the Fourth Machine Translation Summit in Kobe]


PCMT: A New Passion that Changes Everything

Two years ago, when we met in Washington at MT Summit III, it was obvious that MT was increasingly headed for the personal computer. Today the revolution is upon us. The advent of affordable software that can run on anyone's desktop ("PCMT"2) has totally challenged the received wisdom about MT usage. We must take a new look at the user profile, the purposes of MT, the products and the markets to which they are being directed, and the long-range future of the industry as a whole.

This report addresses the gap in our understanding of current MT usage by attempting an overview of all uses of MT based on the most concrete facts that could be found. It has considered only tried-and-true experiences and cumulative data reported directly by users. Information is particularly nebulous in the area of PCMT. Since there is no major up-front investment that needs to be justified, the user is less motivated to keep statistics. Nevertheless, some impressive facts are already a matter of record.

In the first place, there is now evidence that we are talking in rather large numbers of MT users. The June 1993 issue of WordPerfect Magazine reported the results of a mail-in poll in which readers voted for their favourite PCMT software. A total of 7,865 respondents took the trouble to send in their vote.3 Presumably these people have road-tested at least one of the products and may in fact be using MT for practical purposes. The top three choices were Linguistic Products' PC-Translator, MicroTac Software's Translation Assistant, and Globalink's GTS (version unspecified). PC-Translator has doubled in sales each year since it first appeared on the market in 1985. The company periodically introduces improvements in its 12 language combinations and usually has new combinations in the pipeline; the developers have been heartened by the high percentage of registered users who request upgrades and new languages.4 Globalink, which offers seven language combinations, went public in June 19935, and their prospectus states that approximately 13,000 units have been sold or placed with dealers since 1990. MicroTac, for its part, leads the market by a wide margin: in May 1993, all-time total sales of its four bidirectional packages reached a staggering 150,000 units6. The Translation Assistants are priced at under US$100 and, in some discount houses, as little as US$60.

In all, there are 10 companies selling PCMT in the United States. Together they translate in a total of 17 different directions, and a number of other systems and language combinations are under development7.

These products are being used in myriad ways. In the long run, translation varies as greatly as the texts that undergo it, the people who perform the process, and the consumers who require it. Each use is somewhat unique.

Even more impressive than the numbers is the fact that many users of the PCMT systems are happy campers. Their ranks include both translators and nontranslators, and it is among the latter that PCMT is cutting its widest swath. From unsolicited testimonials received by the vendors8, we learn that many people are enlisting these packages to prepare letters and memos in languages that are foreign to them. One user of this kind writes: "The PC-Translator is doing wonderfully, we are all satisfied." There seems to be a slight preference for enlisting them to produce translations of texts prepared by the user rather than to comprehend foreign texts, which are typically input by hand or by a pesky process of optical scanning.

Sometimes the users do not know the target language at all. Installed on a laptop, PCMT has served as a practical companion in social situations where language is a barrier, and it has helped travellers to get around in foreign countries. An American in Paris reports that he used French Assistant to explain to the caretaker of his building that the hot water was off. Another MicroTac user, an American priest filling in at the last minute on a cruise ship, relied on this same software to prepare his sermon in French. Most touching, perhaps, is the user of Italian Assistant who wrote: "Through your product I have been able to correspond with my relatives in Italy since my trip in 1990, when I was introduced to them for the first time. My dad passed away two years ago and my mom is too old to write."

Finding the Real MT Users

Finding out who really uses machine translation is no simple task. A few years ago it was possible, with help from the vendors, to identify at least those customers who were using MT on a significant scale. Today, however, with PCMT selling in large volume and with vendors busy attending to a broader customer base, the picture is far less clear. For the purpose of this report, a strategy was devised for locating a representative sample of MT users, who were to be presented with the following list of questions9.

Survey Questions

System used? Since when?
Language combinations (from => into)?
Hardware platform? Since when?
Form of input (e.g., disk, downloaded files, OCR, manual keying)?
Purpose of translation?
Type of documents translated -- discourse genre (e.g., "technical manuals"), subject matter?
Output per year (number of words)? Percentage of total translation volume?
Dictionary size (number of entries) for each language combination?
Description of personnel who use it (e.g., contract translators, etc.)? How many?
Type and amount of pre-editing done?
Type and amount of postediting done?
System for incorporating feedback from end-consumers?
Advantages, disadvantages of MT?
News flash: Latest developments? Novel uses of MT? Plans for the future?

As the first step, a list was drawn up of known users for whom fax addresses were available10. There were 33 of these (two of whom could not be reached). Next, a list was prepared of individuals who had checked the "User" box on their application form when they joined the Association for Machine Translation in the Americas. This exercise garnered 15 more names. It was known that some of these people were prospective users still investigating the feasibility of MT, so a letter was prepared addressing each one as a "user or potential user of MT" and asking them to report on their plans for using it if they did not already have it installed. The third step was to contact the vendors directly to ask them for the names and fax numbers of "some of [their] principal clients," sharing with them the list of questions that would be asked. Because of multiple sites and contacts, a total of 32 inquiries were sent out to vendors of 23 systems or families of systems. Six additional known vendors could not be reached. Of the 32 who were contacted, 14 replied and provided information about their users. These replies yielded 22 additional users, all of whom were approached. In the end, fax letters went out to 70 users or potential users.
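The way the sample was assembled can be tallied in a few lines. The Python sketch below is illustrative only: the variable names are hypothetical labels, and every count comes from the paragraph above.

```python
# Illustrative tally of the survey sample. All counts are taken from the text;
# the variable names are hypothetical labels, not part of the survey itself.
known_users = 33          # known users with fax addresses (two could not be reached)
amta_user_members = 15    # AMTA members who checked the "User" box
vendor_referrals = 22     # additional users named by the vendors who replied

letters_sent = known_users + amta_user_members + vendor_referrals

vendor_inquiries = 32     # inquiries sent to vendors of 23 systems
vendor_replies = 14       # vendors who replied with information about users
vendor_reply_rate = vendor_replies / vendor_inquiries

print(f"Letters sent to users or potential users: {letters_sent}")
print(f"Vendor reply rate: {vendor_reply_rate:.0%}")
```

The three lists add up to the 70 letters reported, and fewer than half of the vendor inquiries produced a reply.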


Thus a fairly wide net was cast. Even so, the coverage was far from complete. The information obtained without the assistance of the vendors was not collected in any systematic way. In the vendor cycle, not all of them could be contacted, many who were contacted did not respond, and those who did reply did not necessarily give a full list of their customers. Response from the PCMT vendors, who account for far and away the largest volume of purchased (if not operating) units, was particularly low: only three replied, and only one of these directed us to specific users. Given such large gaps in the coverage, the answers received can only be considered representative of the vendors and users who were reached and had the time and inclination to share their experience. They do not speak for MT as a whole.

Another piece of missing information, which would be difficult for any survey to ferret out, is the user sites that have fallen by the wayside -- and why. This information is important for a full understanding of MT usage. However, it is hard to come by. One usually learns it by chance. Recently, for example, in a translation service that had shown positive results with MT, there was a breakdown in the hardware on which the system depended, and management was unwilling to buy the same equipment again. Elsewhere, an MT operation was eliminated because of a company-wide "reorganization" -- perhaps an indirect victim of the foundering economy. At yet another site the operation was dependent on an individual, and when that person left there was no structure to keep it going. There may also be MT failures in the true sense that the text was not a good match for the system or not enough time and money were being saved to justify the investment. For a variety of reasons, most of this information, which would be very illuminating, is kept dark.

Despite its limitations, however, the material collected for the present report is significant in many ways. Its very abundance gives it a certain authority. A total of 40 responses were received: 33 from actual MT users, one from a user with a commitment to start in July 1993, and six from companies that were in the process of investigating MT -- two were running pilot tests, one had put out an invitation to bid, and three were undertaking feasibility studies. CompuServe was included in this last group, with plans to offer on-line service from English to French starting in the fall of 1993 and other combinations later. In addition, answers to the same questions, gathered within the last nine months, were available from five other users and were included in the study. The analysis that follows covers the 33 responses from actual users and the five additional ones for which information was available, for a total of 38 user sites -- or 54% of those that had been contacted. In all, they represent 15 different systems: Atlas, DP/Translator, Duet Qt, Général TAO, Hicats, Shalt, JICST, Logos, MicroCat, Metal, PC-Translator, Pivot, NHK, Spanam/Engspan, and Systran (including Systran Express, the on-line service that anyone with a PC, a modem, and a checkbook can tap into). There were 16 users in the Americas, 11 from Europe, and 11 from Japan11. This may be the largest body of data ever collected at a single time on the use of MT. While it does not permit hard statistics, some very interesting conclusions can be drawn about how MT stands up to the test of translating texts in the real world.
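The response figures in the preceding paragraph can be cross-checked with a short sketch. The Python below is illustrative only: the dictionary keys and variable names are hypothetical labels, and the numbers are those quoted in the text, including the 70 letters sent out.

```python
# Illustrative cross-check of the response figures quoted in the text.
letters_sent = 70

responses = {
    "actual users": 33,
    "committed to start in July 1993": 1,
    "investigating MT": 6,  # 2 pilot tests + 1 invitation to bid + 3 feasibility studies
}
total_responses = sum(responses.values())

earlier_answers = 5  # answers gathered within the previous nine months
sites_analysed = responses["actual users"] + earlier_answers

# The text expresses the analysed sites as a share of the users contacted.
coverage = sites_analysed / letters_sent

sites_by_region = {"Americas": 16, "Europe": 11, "Japan": 11}
regions_match = sum(sites_by_region.values()) == sites_analysed

print(f"Responses received: {total_responses}")
print(f"User sites analysed: {sites_analysed} ({coverage:.0%} of letters sent)")
print(f"Regional split accounts for every site: {regions_match}")
```

On these figures the regional breakdown accounts for all 38 sites, and the 54% coverage follows from dividing the 38 sites by the 70 letters.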

Measuring MT Usage

We can learn a lot about how much MT is being used from the volume of translation being produced and the percentage that this represents of the total workload. The survey yielded some illuminating information in this regard.

Thirty of the 38 users gave information on the volume of translation they produce using MT, the percentage that this represents of their total workload, or both (see table). Many of them had statistics at their fingertips, and it is easy to see that high-volume users, new or pilot users who are keeping a close watch on the effect of MT implementation, and users closely involved with development of the system itself would have reason to keep careful records.

In the category of large-volume users, the figures show that there are some truly industrial-strength MT operations. The European Commission is near the top of the list with 30 million words a year of general translation, for which they use Systran in a total of 13 different language combinations and serve from 400 to 500 end-consumers. These numbers take on special importance because the translations are in a wide range of subject areas and discourse genres. They amount to 15% of the total translation workload of the CEC. Interestingly, only 30% of Systran's output is postedited by professional translators; the rest is delivered "raw" and is used for information purposes only.
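The European Commission percentages imply some further round numbers, worked out in the sketch below. It is an illustration only, not a set of figures reported by the Commission: the derived values are approximations that follow from the rounded numbers in the text, and the variable names are hypothetical.

```python
# Illustrative arithmetic on the European Commission figures quoted in the text.
# Integer percentages are used so that the results stay exact.
mt_words_per_year = 30_000_000   # Systran output per year
mt_share_percent = 15            # MT as a percentage of the total workload
postedited_percent = 30          # percentage of MT output that is postedited

# Total workload implied by "30 million words is 15% of the total".
implied_total_workload = mt_words_per_year * 100 // mt_share_percent

# Split of MT output into postedited and raw (information-only) text.
postedited_words = mt_words_per_year * postedited_percent // 100
raw_words = mt_words_per_year - postedited_words

print(f"Implied total workload: {implied_total_workload:,} words a year")
print(f"Postedited MT output:   {postedited_words:,} words a year")
print(f"Raw MT output:          {raw_words:,} words a year")
```

Taken at face value, the figures suggest a total workload of about 200 million words a year, of which roughly 21 million are delivered as raw MT.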

Two other very large users are Bull in France, which expects to be using Systran at an annual rate of 45 million words by the end of 1993, and Lexi-tech, which uses Logos for about 25 million words a year. Both these companies are using MT for technical documentation. Météo generates about 17 million words of weather bulletins each year for Environment Canada. The U.S. Air Force/FASTC, in its venerable information-gathering operation, annually translates between 10 and 12.5 million words with Systran. Intergraph relies on their own DP/Translator for about 10 million words. Xerox produces about 9 million words with Systran. Nikkei Printing uses NEC's Pivot and Sharp's Duet Qt for about 4.5 million words. And so on.

Added together, the volume of MT produced by these users -- slightly over half the known users approached in the survey -- comes to about 180 million words a year. MT use in the world undoubtedly exceeds 300 million. These figures translate, respectively, to some 720,000 pages of known use and about 1.2 million pages of estimated use. It is impossible to guess what percentage this represents of total translation in the world, however, since experts recognize that there is really no w