THE INTERNET AND LANGUAGES [around the year 2000]
MARIE LEBERT
NEF, University of Toronto, 2009
Copyright © 2009 Marie Lebert. All rights reserved.

TABLE
Introduction
"Language nations" online
Towards a "linguistic democracy"
Encoding: from ASCII to Unicode
First multilingual projects
Online language dictionaries
Learning languages online
Minority languages on the web
Multilingual encyclopedias
Localization and internationalization
Machine translation
Chronology
Websites

INTRODUCTION

It is true that the internet transcends the limitations of time, distance and borders, but what about languages? Non-English-speaking internet users reached 50% in July 2000.

# "Language Nations"

"Because the internet has no national boundaries, the organization of users is bounded by other criteria driven by the medium itself. In terms of multilingualism, you have virtual communities, for example, of what I call 'Language Nations'... All those people on the internet wherever they may be, for whom a given language is their native language. Thus, the Spanish Language nation includes not only Spanish and Latin American users, but millions of Hispanic users in the U.S., as well as odd places like Spanish-speaking Morocco." (Randy Hobler, consultant in internet marketing for translation products and services, September 1998)

# "Linguistic Democracy"

"Whereas 'mother-tongue education' was deemed a human right for every child in the world by a UNESCO report in the early 1950s, 'mother-tongue surfing' may very well be the Information Age equivalent. If the internet is to truly become the Global Network that it is promoted as being, then all users, regardless of language background, should have access to it. To keep the internet as the preserve of those who, by historical accident, practical necessity, or political privilege, happen to know English, is unfair to those who don't." (Brian King, director of the WorldWide Language Institute, September 1998)

# A medium for the world

"It is very important to be able to communicate in various languages. I would even say this is mandatory, because the information given on the internet is meant for the whole world, so why wouldn't we get this information in our language or in the language we wish? Worldwide information, but no broad choice for languages, this would be quite a contradiction, wouldn't it?" (Maria Victoria Marinetti, Spanish teacher and translator, August 1999)

# Good software

"When software gets good enough for people to chat or talk on the web in real time in different languages, then we will see a whole new world appear before us. Scientists, political activists, businesses and many more groups will be able to communicate immediately without having to go through mediators or translators." (Tim McKenna, writer and philosopher, October 2000)

***

Unless specified otherwise, quotations are excerpts from NEF interviews. Many thanks to all those who are quoted in this book, and who kindly answered questions about multilingualism over the years. Most interviews are available online.

This book is also available in French, with a different text. Both versions are available online. The author, whose mother tongue is French, is responsible for any remaining mistakes in English.

Marie Lebert is a researcher and editor specializing in technology for books, other media, and languages. Her books are published by NEF (Net des études françaises / Net of French Studies), University of Toronto, Canada, and are freely available online.

"LANGUAGE NATIONS" ONLINE

= [Quote]

Randy Hobler, a consultant in internet marketing for Globalink, a company specializing in language translation software and services, wrote in September 1998: "Because the internet has no national boundaries, the organization of users is bounded by other criteria driven by the medium itself.
In terms of multilingualism, you have virtual communities, for example, of what I call 'Language Nations'... All those people on the internet wherever they may be, for whom a given language is their native language. Thus, the Spanish Language nation includes not only Spanish and Latin American users, but millions of Hispanic users in the U.S., as well as odd places like Spanish-speaking Morocco."

= [Text]

At first, the internet was nearly 100% English. A network was set up by the Pentagon in 1969. It spread to U.S. governmental agencies and universities from 1974 onwards, after Vinton Cerf and Bob Kahn invented TCP/IP (transmission control protocol / internet protocol). Tim Berners-Lee created the World Wide Web in 1989-90 at the European Laboratory for Particle Physics (CERN) in Geneva, Switzerland. Mosaic, the first widely used graphical browser and the predecessor of Netscape, was distributed from November 1993 onwards. After these two events the internet really took off, first in the U.S. and Canada, then worldwide.

Why did the internet spread in North America first? The U.S. and Canada were leading the way in computer science and communication technology. A connection to the internet, mainly through a phone line at the time, was also much cheaper there than in most countries. In Europe, avid internet users had to navigate the web at night, when per-minute phone rates were lower, to cut their expenses. In 1998, some French, Italian and German users were so fed up with the high rates that they launched a movement to boycott the internet one day per week. Their aim was to pressure internet providers and phone companies into setting up a special monthly rate for them. This paid off, and providers began to offer monthly "internet rates".

In the 1990s, the percentage of English decreased from nearly 100% to 80%. People from all over the world began to have access to the internet, and to post more and more webpages in their own languages.
The first major study about language distribution on the web was run by Babel, a joint initiative of Alis Technologies, a company specializing in language translation services, and the Internet Society. The results were published in June 1997 on a webpage named "Web Languages Hit Parade". The main languages were English with 82.3%, German with 4.0%, Japanese with 1.6%, French with 1.5%, Spanish with 1.1%, Swedish with 1.1%, and Italian with 1.0%.

In "Web Embraces Language Translation", an article published in ZDNN (ZDNetwork News) on 21 July 1998, Martha L. Stone explained: "This year, the number of new non-English websites is expected to outpace the growth of new sites in English, as the cyber world truly becomes a 'World Wide Web'."

According to Global Reach, a branch of Euro-Marketing Associates, an international marketing consultancy, there were 56 million non-English-speaking users in July 1998. Of these, 22.4% were Spanish-speaking, 12.3% Japanese-speaking, 14% German-speaking and 10% French-speaking. But 80% of all webpages were still in English, whereas only 6% of the world population spoke English as a native language, and 16% spoke Spanish as a native language. Of Europe's half a billion people, 15% spoke English as a first language, 28% didn't speak English at all, and 32% were using the web in English.

Jean-Pierre Cloutier was the editor of "Chroniques de Cybérie", a weekly French-language online report of internet news. He wrote in August 1999: "We passed a milestone this summer. Now more than half the users of the internet live outside the United States. Next year, more than half of all users will be non-English-speaking, compared with only 5% five years ago. Isn't that great? (...) The web is going to grow in non-English-speaking regions. So we have to take into account the technical aspects of the medium if we want to reach these 'new' users.
I think it is a pity there are so few translations of important documents and essays published on the web - from English into other languages and vice versa. (...) In the same way, the recent spreading of the internet in new regions raises questions which would be good to read about. When will Spanish-speaking communication theorists and those speaking other languages be translated?"

Will the web hold as many languages as the ones spoken on our planet? This will be quite a challenge, with the 6,700 languages listed in "The Ethnologue: Languages of the World", an authoritative catalog published by SIL International (SIL: Summer Institute of Linguistics) and freely available on the web since the mid-1990s.

The year 2000 was a turning point for a multilingual internet in terms of its users. Non-English-speaking users reached 50% in summer 2000. According to Global Reach, they were 52.5% in summer 2001, 57% in December 2001, 59.8% in April 2002, 64.4% in September 2003 (including 34.9% non-English-speaking Europeans and 29.4% Asians), and 64.2% in March 2004 (including 37.9% non-English-speaking Europeans and 33% Asians).

Some non-English-speaking intellectuals complained about a so-called English-language hegemony, without doing much to promote their own language. Despite this hegemony, the internet was also a good medium for minority languages, as stated by Caoimhín Ó Donnaíle. Caoimhín has taught computing at the Institute Sabhal Mór Ostaig, on the Isle of Skye (Scotland). He has also created and maintained the college website, the main site worldwide with information on Scottish Gaelic, which includes a bilingual (English, Gaelic) list of European minority languages. He wrote in May 2001: "Students do everything by computer, use Gaelic spell-checking, a Gaelic online terminology database. There are more hits on our website. There is more use of sound. Gaelic radio (both Scottish and Irish) is now available continuously worldwide via the internet.
A major project has been the translation of the Opera web browser into Gaelic - the first software of this size available in Gaelic."

TOWARDS A "LINGUISTIC DEMOCRACY"

= [Quote]

Brian King, director of the WorldWide Language Institute (WWLI), brought up the concept of "linguistic democracy" in September 1998: "Whereas 'mother-tongue education' was deemed a human right for every child in the world by a UNESCO report in the early 1950s, 'mother-tongue surfing' may very well be the Information Age equivalent. If the internet is to truly become the Global Network that it is promoted as being, then all users, regardless of language background, should have access to it. To keep the internet as the preserve of those who, by historical accident, practical necessity, or political privilege, happen to know English, is unfair to those who don't."

= [Text]

In December 1995, Yoshi Mikami, a computer scientist at Asia Info Network in Fujisawa (Japan), launched the website "The Languages of the World by Computers and the Internet", also known as the Logos Home Page or Kotoba Home Page. (The website was updated until September 2001.) Yoshi was also the co-author (with Kenji Sekine and Nobutoshi Kohara) of "The Multilingual Web Guide". This print book was published in a Japanese edition by O'Reilly Japan in August 1997, and translated in 1998 into English, French and German.

Yoshi Mikami explained in December 1998: "My native tongue is Japanese. Because I had my graduate education in the U.S. and worked in the computer business, I became bilingual in Japanese and American English. I was always interested in languages and different cultures, so I learned some Russian, French and Chinese along the way. In late 1995, I created on the web 'The Languages of the World by Computers and the Internet' and tried to summarize there the brief history, linguistic and phonetic features, writing system and computer processing aspects for each of the six major languages of the world, in English and Japanese.
As I gained more experience, I invited my two associates to help me write a book on viewing, understanding and creating multilingual webpages, which was published in August 1997 as 'The Multilingual Web Guide', in a Japanese edition, the world's first book on such a subject."

Yoshi added in the same email interview: "Thousands of years ago, in Egypt, China and elsewhere, people were more concerned about communicating their laws and thoughts not in just one language, but in several. In our modern world, most nation states have each adopted one language for their own use. I predict greater use of different languages and multilingual pages on the internet, not a simple gravitation to American English, and also more creative use of multilingual computer translation. 99% of the websites created in Japan are written in Japanese."

Robert Ware launched his website OneLook Dictionaries in April 1996 as a "fast finder" in hundreds of online dictionaries. On September 2, 1998, the fast finder could "browse" 2,058,544 words in 425 dictionaries covering various topics: business, computer/internet, medical, miscellaneous, religion, science, sports, technology, general, and slang. OneLook Dictionaries was provided as a free service by the company Study Technologies, in Englewood, Colorado.

Robert Ware explained in September 1998: "On the personal side, I was almost entirely in contact with people who spoke one language and did not have much incentive to expand language abilities. Being in contact with the entire world has a way of changing that. And changing it for the better! (...) I have been slow to start including non-English dictionaries (partly because I am monolingual). But you will now find a few included."

In the same email interview, Robert wrote about a personal experience showing the internet could promote both a common language and multilingualism: "In 1994, I was working for a college and trying to install a software package on a particular type of computer.
I located a person who was working on the same problem and we began exchanging email. Suddenly, it hit me... The software was written only 30 miles away but I was getting help from a person halfway around the world. Distance and geography no longer mattered! OK, this is great! But what is it leading to? I am only able to communicate in English but, fortunately, the other person could use English as well as German which was his mother tongue. The internet has removed one barrier (distance) but with that comes the barrier of language. It seems that the internet is moving people in two quite different directions at the same time. The internet (initially based on English) is connecting people all around the world. This is further promoting a common language for people to use for communication. But it is also creating contact between people of different languages and creates a greater interest in multilingualism. A common language is great but in no way replaces this need. So the internet promotes both a common language *and* multilingualism. The good news is that it helps provide solutions. The increased interest and need is creating incentives for people around the world to create improved language courses and other assistance, and the internet is providing fast and inexpensive opportunities to make them available."

The internet could also be a tool to develop a "cultural identity". During the Symposium on Multimedia Convergence organized by the International Labour Office (ILO) in January 1997, Shinji Matsumoto, general secretary of the Musicians' Union of Japan (MUJ), explained: "Japan is quite receptive to foreign culture and foreign technology. (...) Foreign culture is pouring into Japan and, in fact, the domestic market is being dominated by foreign products. Despite this, when it comes to preserving and further developing Japanese culture, there has been insufficient support from the government. (...) With the development of information networks, the earth is getting smaller and it is wonderful to be able to make cultural exchanges across vast distances and to deepen mutual understanding among people. We have to remember to respect national cultures and social systems."

December 1997 was a turning point for a plurilingual web. AltaVista, a leading search engine, was the first website to launch free translation software. Called Babel Fish (or AltaVista Translation), it could translate up to three pages from English into French, German, Italian, Portuguese or Spanish, and vice versa. Non-English-speaking users were thrilled. The software was developed by Systran, a pioneer company specializing in machine translation. Later on, other translation software from Alis Technologies, Globalink, Lernout & Hauspie, Softissimo, Wordfast and Trados became available on the web, in free and/or paid versions.

Brian King, director of the WorldWide Language Institute (WWLI), brought up the concept of "linguistic democracy" in September 1998: "Whereas 'mother-tongue education' was deemed a human right for every child in the world by a UNESCO report in the early 1950s, 'mother-tongue surfing' may very well be the Information Age equivalent. If the internet is to truly become the Global Network that it is promoted as being, then all users, regardless of language background, should have access to it. To keep the internet as the preserve of those who, by historical accident, practical necessity, or political privilege, happen to know English, is unfair to those who don't."

Geoffrey Kingscott was the managing director of Praetorius, a language consultancy in applied languages. He wrote in September 1998: "Because the salient characteristics of the web are the multiplicity of site generators and the cheapness of message generation, as the web matures it will in fact promote multilingualism.
The fact that the web originated in the USA means that it is still predominantly in English but this is only a temporary phenomenon. If I may explain this further, when we relied on the print and audiovisual (film, television, radio, video, cassettes) media, we had to depend on the information or entertainment we wanted to receive being brought to us by agents (publishers, television and radio stations, cassette and video producers) who have to subsist in a commercial world or -- as in the case of public service broadcasting -- under severe budgetary restraints. That means that the size of the customer-base is all-important, and determines the degree to which languages other than the ubiquitous English can be accommodated. These constraints disappear with the web. To give only a minor example from our own experience, we publish the print version of Language Today [a magazine for linguists, published by Praetorius] only in English, the common denominator of our readers. When we use an article which was originally in a language other than English, or report an interview which was conducted in a language other than English, we translate into English and publish only the English version. This is because the number of pages we can print is constrained, governed by our customer-base (advertisers and subscribers). But for our web edition we also give the original version."

Bill Dunlap, founder of Euro-Marketing Associates and its virtual branch Global Reach, was championing the advantages of e-commerce in Europe among his compatriots in the U.S. Bill wrote in December 1998: "There are so few people in the U.S. interested in communicating in many languages -- most Americans are still under the delusion that the rest of the world speaks English. However, here in Europe (I'm writing from France), the countries are small enough so that an international perspective has been necessary for centuries."

As the internet quickly spread worldwide, more and more people in the U.S. realized something about readers. English might remain the main international language for exchanges of all kinds, but people preferred to read information in their own language. To reach as large an audience as possible, companies and organizations needed to offer bilingual, trilingual, even multilingual websites, while adapting their content to a given audience. Hence the need for both localization and internationalization, which became a major trend in the following years. This trend spread beyond the U.S. to many countries, where companies set up bilingual websites, in their own language and in English, to reach a wider audience and get more clients.

Brian King, director of the WorldWide Language Institute (WWLI), explained in September 1998: "As well as the appropriate technology being available so that the non-English speaker can go, there is the impact of 'electronic commerce' as a major force that may make multilingualism the most natural path for cyberspace. A pull from non-English-speaking computer users and a push from technology companies competing for global markets has made localization a fast growing area in software and hardware development."

In 1998, the European Network in Language and Speech (ELSNET) was a network of more than 100 European academic and industrial institutions. ELSNET members intended to build multilingual speech and natural language systems with coverage of both spoken and written language. Steven Krauwer, coordinator of ELSNET, explained in September 1998: "As a European citizen I think that multilingualism on the web is absolutely essential, as in the long run I don't think that it is a healthy situation when only those who have a reasonable command of English can fully exploit the benefits of the web. As a researcher (specialized in machine translation) I see multilingualism as a major challenge: how can we ensure that all information on the web is accessible to everybody, irrespective of language differences?"

Steven added in August 1999: "I've become more and more convinced we should be careful not to address the multilinguality problem in isolation. I've just returned from a wonderful summer vacation in France, and even if my knowledge of French is modest (to put it mildly), it's surprising to see that I still manage to communicate successfully by combining my poor French with gestures, facial expressions, visual clues and diagrams. I think the web (as opposed to old-fashioned text-only email) offers excellent opportunities to exploit the fact that transmission of information via different channels (or modalities) can still work, even if the process is only partially successful for each of the channels in isolation."

What practical solutions would he suggest for a truly multilingual web? "At the author end: better education of web authors to use combinations of modalities to make communication more effective across language barriers (and not just for cosmetic reasons). At the server end: more translation facilities à la AltaVista (quality not impressive, but always better than nothing). At the browser end: more integrated translation facilities (especially for the smaller languages), and more quick integrated dictionary lookup facilities."

Linguistic pluralism and diversity are everybody's business, as explained in a petition launched by the European Committee for the Respect of Cultures and Languages in Europe (ECRCLE) "for a humanist and multilingual Europe, rich in its cultural diversity": "Linguistic pluralism and diversity are not obstacles to the free circulation of men, ideas, goods and services. Some objective allies of the dominant language and culture, consciously or not, would like to suggest otherwise. Indeed, standardization and hegemony are the obstacles to the free blossoming of individuals, societies and the information economy, the main source of tomorrow's jobs.
On the contrary, respect for languages is the last hope for Europe to get closer to its citizens, an objective always claimed and almost never put into practice. The Union must therefore give up privileging the language of one group." The full text of the petition was available in the eleven official languages of the European Union. Among other things, the petition asked the revisers of the Treaty on European Union to include respect for national cultures and languages in the text of the treaty. It also asked national governments to "teach the youth at least two, and preferably three foreign European languages; encourage the national audiovisual and musical industries; and favour the diffusion of European works."

Henk Slettenhaar is a professor in communication technology at Webster University in Geneva, Switzerland. Henk is a trilingual European. He is Dutch, he teaches computer science in English, and he is fluent in French as a resident in neighboring France. He has regularly insisted on the need for bilingual websites, in the original language and in English. He wrote in December 1998: "I see multilingualism as a very important issue. Local communities which are on the web should use the local language first and foremost for their information. If they want to be able to present their information to the world community as well, their information should be in English as well. I see a real need for bilingual websites. (...) As far as languages are concerned, I am delighted that there are so many offerings in the original languages now. I much prefer to read the original with difficulty than to get a bad translation."

Henk added in August 1999: "There are two main categories of websites in my opinion. The first one is the global outreach for business and information. Here the language is definitely English first, with local versions where appropriate. The second one is local information of all kinds in the most remote places.
If the information is meant for people of an ethnic and/or language group, it should be in that language first, with perhaps a summary in English. We have seen lately how important these local websites are -- in Kosovo and Turkey, to mention just the most recent ones. People were able to get information about their relatives through these sites."

Marcel Grangier was the head of the French Section of the Swiss Federal Government's Central Linguistic Services, which means he was in charge of organizing translations into French for the Swiss government. He wrote in January 1999: "We can see multilingualism on the internet as a happy and irreversible inevitability. So we have to laugh at the doomsayers who only complain about the supremacy of English. Such supremacy is not wrong in itself, because it is mainly based on statistics (more PCs per inhabitant, more people speaking English, etc.). The answer is not to 'fight' English, much less whine about it, but to build more sites in other languages. As a translation service, we also recommend that websites be multilingual. The increasing number of languages on the internet is inevitable and can only boost multicultural exchanges. For this to happen in the best possible circumstances, we still need to develop tools to improve compatibility. Fully coping with accents and other characters is only one example of what can be done."

Alain Bron, a consultant in information systems and a writer, wrote in January 1999: "Different languages will still be used for a long time to come and this is healthy for the right to be different. The risk is of course an invasion of one language to the detriment of others, and with it the risk of cultural standardization. I think online services will gradually emerge to get around this problem. First, translators will be able to translate and comment on texts by request, but mainly sites with a large audience will provide different language versions, just as the audiovisual industry does now."

Guy Antoine, founder of Windows on Haiti, a reference website about Haitian culture, wrote in November 1999: "It is true that for all intents and purposes English will continue to dominate the web. This is not so bad in my view, in spite of regional sentiments to the contrary, because we do need a common language to foster communications between people the world over. That being said, I do not adopt the doomsday view that other languages will just roll over in submission. Quite the contrary. The internet can serve, first of all, as a repository of useful information on minority languages that might otherwise vanish without leaving a trace. Beyond that, I believe that it provides an incentive for people to learn languages associated with the cultures about which they are attempting to gather information. One soon realizes that the language of a people is an essential and inextricable part of its culture. (...) From this standpoint, I have much less faith in mechanized tools of language translation, which render words and phrases but do a poor job of conveying the soul of a people. Who are the Haitian people, for instance, without 'Kreyòl' (Creole for the non-initiated), the language that has evolved and bound various African tribes transplanted in Haiti during the slavery period? It is the most palpable exponent of commonality that defines us as a people. However, it is primarily a spoken language, not a widely written one. I see the web changing this situation more so than any traditional means of language dissemination. In Windows on Haiti, the primary language of the site is English, but one will equally find a center of lively discussion conducted in 'Kreyòl'. In addition, one will find documents related to Haiti in French, in the old colonial creole, and I am open to publishing others in Spanish and other languages. I do not offer any sort of translation, but multilingualism is alive and well at the site, and I predict that this will increasingly become the norm throughout the web."

ENCODING: FROM ASCII TO UNICODE

= [Quote]

Brian King, director of the WorldWide Language Institute (WWLI), explained in September 1998: "The first step was for ASCII to become Extended ASCII. This meant that computers could begin to start recognizing the accents and symbols used in variants of the English alphabet -- mostly used by European languages. But only one language could be displayed on a page at a time. (...) The most recent development is Unicode. Although still evolving and only just being incorporated into the latest software, this new coding system translates each character into 16 bits. Whereas 8-bit extended ASCII could only handle a maximum of 256 characters, Unicode can handle over 65,000 unique characters and therefore potentially accommodate all of the world's writing systems on the computer. So now the tools are more or less in place. They are still not perfect, but at last we can at least surf the web in Chinese, Japanese, Korean, and numerous other languages that don't use the Western alphabet. As the internet spreads to parts of the world where English is rarely used - such as China, for example, it is natural that Chinese, and not English, will be the preferred choice for interacting with it. For the majority of the users in China, their mother tongue will be the only choice."
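The capacity figures in this quotation follow from simple arithmetic. A code unit of n bits can take 2^n distinct values. That gives 7-bit ASCII 128 codes, an 8-bit extended ASCII code page 256, and a 16-bit code 65,536.

The short Python sketch below is an illustration added here and is not part of the original interviews. It checks those numbers and suggests why one 8-bit code page served only one group of languages at a time. ISO-8859-1 (Latin-1) covers French accents and ISO-8859-7 covers Greek, but neither can encode a page that mixes the two, whereas a Unicode encoding such as UTF-16 can. The helper names (`capacity`, `fits_in`) and the sample strings are hypothetical, chosen only for this example.

Unicode was later extended beyond 16 bits: code points now run up to U+10FFFF, and UTF-16 reaches the higher ones through surrogate pairs. The "over 65,000" figure therefore reflects the original 16-bit design.

```python
"""Sketch: code-unit capacity, and single-byte code pages vs. a Unicode encoding."""


def capacity(bits: int) -> int:
    """Number of distinct values a code unit of the given width can hold."""
    return 2 ** bits


def fits_in(text: str, codec: str) -> bool:
    """Return True if every character of `text` can be encoded with `codec`."""
    try:
        text.encode(codec)
    except UnicodeEncodeError:
        return False
    return True


# Hypothetical sample strings, chosen for illustration only.
SAMPLES = {
    "French": "déjà vu",
    "Greek": "γλώσσα",
    "Japanese": "日本語",
    "French + Greek": "déjà γλώσσα",
}

# Two single-byte "extended ASCII" code pages and one Unicode encoding form.
CODECS = ("latin-1", "iso8859_7", "utf-16")

if __name__ == "__main__":
    # 7 = ASCII, 8 = extended ASCII, 16 = the original Unicode design.
    for bits in (7, 8, 16):
        print(f"{bits}-bit code unit: {capacity(bits):,} values")
    for name, text in SAMPLES.items():
        results = ", ".join(f"{c}={fits_in(text, c)}" for c in CODECS)
        print(f"{name}: {results}")
```

Run as a script, it reports 128, 256 and 65,536 values. French fits Latin-1 but not ISO-8859-7, and Greek fits ISO-8859-7 but not Latin-1. The Japanese sample and the mixed French and Greek sample fit neither code page and can only be encoded in UTF-16. This is the "one language per page" limit King describes.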