POST-EDITING – PROFESSIONAL TRANSLATION SERVICE REDEFINED
Darja Fišer
University of Ljubljana
MT@Work, 5 December 2014, Brussels, Belgium

Presentation Outline
1. Basic concepts about post-editing
2. Quality in translation projects
3. Types of post-editing
4. Post-editing guidelines

MOTIVATION
Why should I care about post-editing?

Why PEMT?
• Increasing demand for PEMT in the market:
  • increasing volume of short-lived documents
  • different levels of text quality are acceptable
• The industry perspective on MT:
  • to increase productivity and lower prices
  • to publish more content
  • to publish in more languages
  • to publish in less time
• The TAUS (2010) survey:
  • 52% of companies in the US, EU & Asia provide PE services regularly
  • 74% of the resources they use are freelance translators

THE BIG PICTURE
How does MT affect the translation process and the translator?

Integrating MT in the translation process
Phase 1: Translation Memories
  Source text (0% translated) → Translation Memory (TM) → Hybrid text (x% translated)
Phase 2: Machine Translation
  Untranslated segments → Machine Translation (MT) → Hybrid text (100% translated, but with MT errors)
Phase 3: Post-editing
  Post-editor → Target text (100% translated)

The role of translators in PEMT
• The role of PEMT experts:
  • edit the output
  • select an adequate corpus
  • clean up the data so that the output is more suitable for the customer
  • provide constant feedback to improve the system’s performance
  • the role changes as MT improves
• The nature of PEMT projects:
  • large volumes of highly repetitive, short-lived content for internal use
  • pre-editing (at the SL level, before MT, to avoid ambiguous input to the MT system)
  • post-editing (at the TL level, after MT, to correct errors in the MT output)

BASIC CONCEPTS IN PEMT
Is post-editing a translation task or a revision task?

Post-editing vs. Translation
• Post-editing:
  • reviewing an MT text against the original text and correcting any errors in order to comply with a set of quality criteria in as few edits as possible
  • the set of quality criteria ≠ a personal idea of translation quality
  • as few edits as possible, in order to increase productivity
• PE vs. Translation:
  • Translation: 1 source
  • PE: 2 sources (the original & the raw MT output):
    1. reject the MT output & translate from scratch (PE closer to Trans than Rev)
    2. correct many or few errors (PE closer to Rev than Trans)
    3. accept the proposed translation as is (PE closer to Rev than Trans)
• PE should be done by a translator, not a monolingual reviewer!

Post-editing vs. Revision
• PE:
  • deals with recurring, predictable errors
  • MT texts put a strain on the post-editing expert, so PE is more cognitively demanding
• Revision:
  • checks for random mistranslations or omissions
  • human errors are more difficult to spot, but the texts are easier to read
• PE & revision both require specific skills and should be tackled by translators trained & experienced in the task!
  • > 100,000 words / 1 month of full-time post-editing

THE JOB PROFILE
What skills and qualities do I need to be a good post-editing expert?
Skills for post-editing (O’Brien 2002)
• Degree in translation or related subjects
• Expert in the subject area and target language
• Proficient in the source language and contrastive issues
• Experience in technical translation/localization
• Advanced word-processing skills, full keyboard proficiency (search & replace)
• Positive, tolerant and open-minded towards MT
• Confidence in abilities and technical expertise
• Recognition of typical and repetitive MT errors
• Ability to use macros and coded dictionaries
• Advanced terminology management skills
• Background MT knowledge, types of PE and levels of expected quality
• Pre-editing skills (controlled language & controlled authoring tools)
• Programming skills for automatically correcting errors

MT QUALITY
What can I expect from MT and what can clients expect from PEMT?

Common MT errors
• What MT errors to expect:
  • depends on the MT system, the content and the language pair used!
  • error analysis is time-consuming but:
    • crucial to improve the MT system
    • crucial to raise awareness about the post-editing task
• Several error classifications exist, e.g. (Schäffer 2003):
  1. Lexical errors (general vocabulary, terminology, polysemy, idioms)
  2. Syntactic errors (sentence analysis, word order)
  3. Grammatical mistakes (tense, number, gender, case, punctuation)
  4. Errors due to defective input (mistakes in the source language)

Quality in technical translation and localization
• Functionalist approach to quality:
  • the focus is on the customer’s needs and what they pay for
  • quality is variable and is defined by clients, not by society in general
  • Fit-for-purpose! (not what trained translators would consider the best)
• Quality of MT
  • standard MT evaluation measures (BLEU, Meteor, NIST, TER):
    • express with a single number how close the MT output is to human quality
    • not very reliable in most translation projects
  • manual quality assessment needed!
    • crucial for productivity savings & pricing
    • random strings checked for grammar, terminology and format (grades 1-5)
    • very specific quality expectations from the client needed! (rapid/full PE)
• Quality of PE
  • MT is used to save costs, so revision of PE texts is usually not done
  • crucial to strike a balance between speed and the quality of PE

TYPES OF POST-EDITING
How much post-editing should I do?

Different levels of post-editing
1. No post-editing
  • directly published on the internet, with a disclaimer
2. Rapid post-editing
  • suitable for short-lived documents needed for gisting & internal use
  • minimal editing in the shortest time possible, with the minimum number of changes, to remove blatant & significant errors; no stylistic changes
3. Full post-editing
  • leading to human quality, required for texts intended for publication
  • maximal editing, all errors and stylistic changes taken into account (but still in less time than translating from scratch)
• Criteria:
  • the MT system and language pair used
  • the domain and structure of the text
  • the use of the final text, the desired quality and the type of readers
  • the volume of translation and the time available

POST-EDITING GUIDELINES
What exactly should I correct and how much?

General guidelines for PE
• Language- and project-specific guidelines needed for each project!
• As short and precise as possible:
  • a description of the MT system and the source text used
  • a description of the quality of the MT output and the expected quality of the finished translation
  • scenarios when to discard a useless segment
  • typical types of errors that need to be corrected
  • changes to be avoided
  • terminology issues

Guidelines for rapid PE
1. Read the source segment first
2. Read the MT suggestion
3. Make the necessary changes:
  • Make sure the content of the sentence is accurate
  • If the terminology is incorrect, don’t spend too much time researching
  • Don’t post-edit word order if the sentence can be understood as is
  • Don’t change style
  • Don’t replace words with a synonym
  • Don’t correct grammar mistakes unless the target sentence doesn’t reflect the meaning of the source sentence

Guidelines for full PE
• Always very project-specific
• Use the MT suggestion if:
  • a large piece of the sentence is correct
  • the raw MT quality is very good, with only minor corrections needed
  • the raw MT quality is not so good, but correcting it would still be faster than translating from scratch
  • the MT has the correct meaning and is completely understandable
• Don’t use the MT suggestion if:
  • the raw MT doesn’t make any sense and it would take longer to correct it than to translate from scratch
  • you need more than a few seconds to understand it
  • there are errors that would require rearranging most of the text

Examples from the guidelines at Microsoft
• The 5-10 second evaluation:
  • the maximum time you should spend evaluating the validity of the MT suggestion
  • if it is already hard to understand at the beginning, don’t even read the whole sentence; just translate from scratch instead.
• The High-5 & Low-5 rule:
  • When you detect a long sentence, do the following:
    • Read the first 5 words. If they are good, read on until the output turns bad, then stop, copy the correct part and continue translating without reading further.
    • If the first 5 or 6 words aren’t good, skip to the last 5 or 6 words. If the last part of the sentence is correct, use it; otherwise, start the whole thing from scratch.
    • If both the first 5 and the last 5 words are incorrect, do not read through the middle to try to identify correct MT segments. Just discard the MT suggestion and translate from scratch.
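The High-5 & Low-5 rule is essentially a small decision procedure, so it can be sketched in a few lines of Python. This is an illustration only, under stated assumptions: segments are split on whitespace, the 5-word window is taken literally, and the post-editor’s judgment ("is this run of words good?") is modeled by a hypothetical callback `is_good`. The names `high5_low5` and `toy_judge` are invented for the example and are not part of any Microsoft tool.

```python
from typing import Callable, List, Tuple

# Hypothetical stand-in for the post-editor's judgment on a run of words.
Judge = Callable[[List[str]], bool]


def high5_low5(mt_segment: str, is_good: Judge, window: int = 5) -> Tuple[str, str]:
    """Apply the High-5 & Low-5 rule to one long MT suggestion.

    Returns (action, kept_text), where action is:
      "prefix"  : keep the correct beginning, translate the rest yourself
      "suffix"  : keep the correct ending, translate the beginning yourself
      "discard" : ignore the MT suggestion and translate from scratch
    """
    words = mt_segment.split()

    # High-5: read the first `window` words.
    if is_good(words[:window]):
        end = window
        # Read on only while the output stays good, then stop.
        while end < len(words) and is_good(words[: end + 1]):
            end += 1
        return "prefix", " ".join(words[:end])

    # Low-5: the beginning is bad, so skip to the last `window` words.
    if is_good(words[-window:]):
        return "suffix", " ".join(words[-window:])

    # Both ends are bad: do not read through the middle.
    return "discard", ""


def toy_judge(chunk: List[str]) -> bool:
    """Toy judgment for the example: any chunk containing "??" counts as bad."""
    return "??" not in chunk


if __name__ == "__main__":
    segment = "The printer driver must be installed before ?? the ?? device is connected"
    print(high5_low5(segment, toy_judge))
```

With `toy_judge`, the example segment yields `("prefix", "The printer driver must be installed before")`: the correct beginning is kept and the rest is left for the post-editor. Note that the function never inspects the middle of a segment when both ends fail, mirroring the rule’s advice not to hunt for usable fragments there.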
POST-EDITING EFFORT AND PRODUCTIVITY
How hard will post-editing be and how much will I gain?

Post-editing effort
• Key element in deciding whether the use of MT is worthwhile (Krings 2001):
  • Temporal PE effort
    • Does PEMT save time vs. human translation?
    • Does PEMT save time vs. TM fuzzy matches?
    • Depends on the quality of the raw MT output and the type of errors!
  • Cognitive PE effort
    • How complex and cognitively demanding are the corrections?
    • Obvious mistakes (gender) vs. ambiguous, complex syntactic structures
  • Technical PE effort
    • Does PE require deleting, inserting, reordering, or all three?
• Measuring PE effort:
  • temporal: the easiest to measure
  • cognitive & technical PE: eye-trackers, Translog, Think Aloud Protocols (useful in research, less so in the commercial world)

Post-editing productivity
• One of the big unknown factors in PEMT projects
  • new field, so no standard metrics exist
  • productivity in PE estimated at 4,000-10,000 words/day
• Many variables to consider:
  • the quality of the raw MT output?
  • the productivity of translators in general?
  • the experience of post-editors?
  • the amount of effort needed to post-edit fuzzy matches?
• Inconclusive results:
  • early studies: showed productivity gains of up to 3 times compared to HT (Vasconcellos and Leon 1985)
  • recent studies: productivity gains not always achieved (O’Brien 2006, Guerberof 2008)
  • commercial users: many claim high productivity gains but don’t make their methodology available
• Test before you commit!

LET’S PEMT!
PEMT is here to stay
Learn & Teach PEMT
Ride the wave!
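The technical dimension of PE effort (deleting, inserting, reordering) can also be approximated automatically by aligning the raw MT output with its post-edited version, which is the idea behind TER, one of the standard MT evaluation measures listed under MT quality. The Python sketch below is a simplified version under an explicit assumption: it uses plain word-level Levenshtein distance, so, unlike real TER, it does not model block shifts, and reordering shows up as several basic edits rather than one shift. The names `word_edits`, `edit_rate` and `EditCounts` are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class EditCounts:
    insertions: int = 0
    deletions: int = 0
    substitutions: int = 0

    @property
    def total(self) -> int:
        return self.insertions + self.deletions + self.substitutions


def word_edits(mt_output: str, post_edited: str) -> EditCounts:
    """Count word-level edits that turn the raw MT output into the post-edited text.

    Simplified TER-style alignment: plain Levenshtein distance over words.
    Block shifts (moving a phrase in one step) are NOT modeled.
    """
    hyp: List[str] = mt_output.split()
    ref: List[str] = post_edited.split()
    n, m = len(hyp), len(ref)

    # dist[i][j] = minimum edits to turn hyp[:i] into ref[:j]
    dist = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        dist[i][0] = i
    for j in range(m + 1):
        dist[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = 0 if hyp[i - 1] == ref[j - 1] else 1
            dist[i][j] = min(
                dist[i - 1][j] + 1,         # delete an MT word
                dist[i][j - 1] + 1,         # insert a post-edited word
                dist[i - 1][j - 1] + cost,  # keep or substitute
            )

    # Walk back through the table to attribute each edit to a type.
    counts = EditCounts()
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0:
            cost = 0 if hyp[i - 1] == ref[j - 1] else 1
            if dist[i][j] == dist[i - 1][j - 1] + cost:
                counts.substitutions += cost
                i, j = i - 1, j - 1
                continue
        if i > 0 and dist[i][j] == dist[i - 1][j] + 1:
            counts.deletions += 1
            i -= 1
        else:
            counts.insertions += 1
            j -= 1
    return counts


def edit_rate(mt_output: str, post_edited: str) -> float:
    """Edits per word of the post-edited text (0.0 = MT accepted as is)."""
    ref_len = len(post_edited.split())
    return word_edits(mt_output, post_edited).total / max(ref_len, 1)


if __name__ == "__main__":
    print(word_edits("the cat sat on mat", "the cat sat on the mat"))
```

For example, `word_edits("the cat sat on mat", "the cat sat on the mat")` reports a single insertion, and `edit_rate` divides that total by the 6 words of the post-edited text. Such counts say nothing about time or cognitive load, which is why the temporal and cognitive dimensions still have to be measured separately.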