Google Translate Project

I need to have some text translated from German to English in several files. Specifically, I need only the text that lies between braces, “{“ and “}”, translated. Text outside the braces should be left alone.

I tried running the files through Google Translate and it mangled all the other text outside of the braces. So, the solution probably consists of building a script or program that calls Google Translate in the Cloud for the German text strings one by one. It looks like anyone can get a free Google Cloud account as long as they stay under the service limits.

Email me to request the zipped file at “ewainwright69 AtSign gmail.com”

Thanks, -Eric

Here's an example of some of the text:

  1. e4 c5 2. Nf3 Nc6 3. d4 cxd4 4. Nxd4 g6 5. c4 Nf6 6. Nc3 Nxd4 7. Qxd4 d6 { Ende Buch} 8. Bg5 Bg7 9. Qd2 {Ende Buch} Be6 10. f3 Qa5 11. Be2 O-O 12. Rc1 Rfc8 {Weiß verfügt evtl. über einen leichten Stellungsvorteil.} 13. b3 b6
  14. Nd5 Qxd2+ 15. Kxd2 Bxd5 {dieser Abtausch ist aus strategischen Gründen fragwürdig, überlässt er doch Weiß wieder ohne Not das Läuferpaar. Es ist jedoch bekannt, dass schon die Mephisto-Programme gerne mit Springern spielen.}
  16. cxd5 e6 {Hiarcs hat nun fraglos Stellungsvorteile.} 17. dxe6 fxe6 18. Bb5 Rc5 19. Rxc5 bxc5 20. Kc1 {warum nicht 20. Kd2-c2?} Rb8 21. Be2 Nd7 22. Be3 d5
Is it a one-time task?
drakmail 2 months ago
Yes, it's a one-time task.
CuriousMynd 2 months ago
awarded to CyteBode


2 Solutions


Bonus: a JS function that you can run inside the Chrome console

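function fetchApi(text) {
  // Unofficial endpoint: it may be rate limited or change without notice.
  const url =
    "https://translate.googleapis.com/translate_a/single?client=gtx&sl=de&tl=en&dt=t&q=" +
    encodeURIComponent(text); // escape spaces, umlauts, "&", "?" and so on
  return fetch(url)
    .then(response => response.json())
    .then(data => {
      // data[0] holds one entry per translated segment (roughly one per
      // sentence), so join them all instead of keeping only the first.
      const translated = data[0].map(segment => segment[0]).join("");
      return translated || text;
    })
    // Fall back to the original text if the request or the parsing fails.
    .catch(() => text);
}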

async function translate(input) {
  // Splitting on a capture group keeps the {comments} at the odd indexes,
  // and it also copes with input that has no braces at all.
  const parts = input.split(/(\{[^}]*\})/);
  for (let index = 1; index < parts.length; index += 2) {
    // Translate only the text inside the braces, then put the braces back.
    const translated = await fetchApi(parts[index].slice(1, -1));
    parts[index] = "{" + translated + "}";
  }
  return parts.join("");
}

and then:

translate(
  "e4 c5 2. Nf3 Nc6 3. d4 cxd4 4. Nxd4 g6 5. c4 Nf6 6. Nc3 Nxd4 7. Qxd4 d6 { Ende Buch} 8. Bg5 Bg7 9. Qd2 {Ende Buch} Be6 10. f3 Qa5 11. Be2 O-O 12. Rc1 Rfc8 {Weiß verfügt evtl. über einen leichten Stellungsvorteil.} 13. b3 b6"
)
  .then(output => {
    console.log(output);
  })
  .catch(error => console.log(error));
This looks like a good solution too. The files, though, are fairly large. Would I be able to point the translate function to a file?
CuriousMynd 2 months ago
Of course! It's pure JS, so you can easily wrap it in a script tag in an HTML file. Example: https://i.imgur.com/5BLeKxG.png
minhtc 2 months ago
I tried this solution. It writes the file to the console, but it doesn't translate the text. Perhaps, you can send me an email (above) to discuss the solution further.
CuriousMynd 2 months ago

Requirements

  • Python 2.7 or 3 (Tested with 2.7.15 and 3.6.7)
  • requests

translate_pgn.py

# -*- coding: utf-8 -*-

import codecs
import json
import re
import textwrap

import requests


try:
    kb_input = raw_input # Python 2
except NameError:
    kb_input = input     # Python 3


HEADERS = {
    "User-Agent": ("Mozilla/5.0 (Windows NT 6.1; WOW64; rv:54.0) "
                 + "Gecko/20100101 Firefox/54.0")
}

LINE_WRAP = 79


def boxed_print(text):
    assert len(text.splitlines()) == 1
    if len(text) + 4 > LINE_WRAP:
        text = text[0:LINE_WRAP - 7] + "..."
    print("-" * (len(text) + 4))
    print("| %s |" % text)
    print("-" * (len(text) + 4))


class TranslateException(Exception):
    pass


class YandexTranslator(object):
    """ Translates some text using the Yandex translation API. """
    # Documentation: https://tech.yandex.com/translate/

    # Daily character limit: 1'000'000/day. Count resets at 00:00 UTC.
    API_URL = "https://translate.yandex.net/api/v1.5/tr.json/translate"
    CHAR_LIMIT = 10000 # Limit per request
    VALID_CODES = {
        "af": "Afrikaans", "sq": "Albanian", "am": "Amharic", "ar": "Arabic",
        "hy": "Armenian", "az": "Azerbaijan", "ba": "Bashkir", "eu": "Basque",
        "be": "Belarusian", "bn": "Bengali", "bs": "Bosnian",
        "bg": "Bulgarian", "my": "Burmese", "ca": "Catalan", "ceb": "Cebuano",
        "zh": "Chinese", "hr": "Croatian", "cs": "Czech", "da": "Danish",
        "nl": "Dutch", "en": "English", "eo": "Esperanto", "et": "Estonian",
        "fi": "Finnish", "fr": "French", "gl": "Galician", "ka": "Georgian",
        "de": "German", "el": "Greek", "gu": "Gujarati",
        "ht": "Haitian (Creole)", "he": "Hebrew", "mrj": "Hill Mari",
        "hi": "Hindi", "hu": "Hungarian", "is": "Icelandic", "id": "Indonesian",
        "ga": "Irish", "it": "Italian", "ja": "Japanese", "jv": "Javanese",
        "kn": "Kannada", "kk": "Kazakh", "km": "Khmer", "ko": "Korean",
        "ky": "Kyrgyz", "lo": "Laotian", "la": "Latin", "lv": "Latvian",
        "lt": "Lithuanian", "lb": "Luxembourgish", "mk": "Macedonian",
        "mg": "Malagasy", "ms": "Malay", "ml": "Malayalam", "mt": "Maltese",
        "mi": "Maori", "mr": "Marathi", "mhr": "Mari", "mn": "Mongolian",
        "ne": "Nepali", "no": "Norwegian", "pap": "Papiamento",
        "fa": "Persian", "pl": "Polish", "pt": "Portuguese", "pa": "Punjabi",
        "ro": "Romanian", "ru": "Russian", "gd": "Scottish", "sr": "Serbian",
        "si": "Sinhala", "sk": "Slovakian", "sl": "Slovenian", "es": "Spanish",
        "su": "Sundanese", "sw": "Swahili", "sv": "Swedish", "tl": "Tagalog",
        "tg": "Tajik", "ta": "Tamil", "tt": "Tatar", "te": "Telugu",
        "th": "Thai", "tr": "Turkish", "udm": "Udmurt", "uk": "Ukrainian",
        "ur": "Urdu", "uz": "Uzbek", "vi": "Vietnamese", "cy": "Welsh",
        "xh": "Xhosa", "yi": "Yiddish"
    }

    @staticmethod
    def human_readable_language(code):
        return YandexTranslator.VALID_CODES.get(code.lower())

    def __init__(self, api_key, target, source = "auto", verbose = False):
        print("Powered by Yandex.Translate: http://translate.yandex.com/")
        self._api_key = api_key
        self._verbose = verbose

        target = target.lower()
        source = source.lower()
        if target not in YandexTranslator.VALID_CODES:
            raise TranslateException("Invalid target language: %s" % target)
        if source not in YandexTranslator.VALID_CODES and source != "auto":
            raise TranslateException("Invalid source language: %s" % source)

        self._lang = target if source == "auto" else source + "-" + target

    def translate(self, text):
        """ Translate some text, making a request to the API. """
        data = { "key": self._api_key, "text": text, "lang": self._lang }

        try:
            response = requests.post(YandexTranslator.API_URL,
                                     data = data, headers = HEADERS)
            json_response = response.json()
        except Exception as e:
            raise TranslateException(e)

        if "text" in json_response:
            translated = json_response["text"][0]
            if self._verbose:
                boxed_print("Original")
                print(text)
                boxed_print("Translated")
                print(translated)
                print("")
            return translated
        else:
            raise TranslateException(json_response["message"])

    def translate_lines(self, lines):
        """ Translate a list of lines of texts. The list may be split into
            multiple requests in order to respect the character limit. """
        joined = "\n".join(lines)
        if len(joined) < YandexTranslator.CHAR_LIMIT:
            return self.translate(joined).split("\n")
        elif len(lines) == 1:
            raise TranslateException(
                "A line is too long: %d characters." % len(lines[0]))
        else:
            # Divide and conquer
            return (self.translate_lines(lines[:len(lines)//2])
                  + self.translate_lines(lines[len(lines)//2:]))


def translate_pgn(ifile, ofile, translator):
    pgn_regex    = re.compile(r"((?:[ \t]*\n){2,})")
    braces_regex = re.compile(r"{([^}]*)}")

    UTF8_BOM = b"\xEF\xBB\xBF"

    with open(ifile, "rb") as f:
        pgn_text = f.read()
        has_crlf = b"\r\n" in pgn_text
        has_bom  = pgn_text[0:3] == UTF8_BOM

        # Get rid of the BOM, and of the CR's in the line endings
        pgn_text = pgn_text.decode("utf-8-sig").replace("\r", "")
        tags_and_moves = pgn_regex.split(pgn_text)

        output = []
        try:
            for j, stuff in enumerate(tags_and_moves):
                # [Tags, empty line(s), moves, empty line(s)]
                if j % 4 == 2: # Moves
                    # All in one line
                    stuff = stuff.replace("\n", " ")

                    to_translate = []
                    moves_and_comments = braces_regex.split(stuff)
                    for i, text in enumerate(moves_and_comments):
                        if i % 2 == 1:
                            to_translate.append(text)

                    if to_translate:
                        translated = translator.translate_lines(to_translate)
                        # Replace the comments with their translations
                        for i, text in enumerate(translated):
                            moves_and_comments[i*2+1] = "{%s}" % text

                    line = "".join(moves_and_comments)
                    # Line wrapping
                    output.append("\n".join(textwrap.wrap(line, LINE_WRAP)))
                else: # Everything else
                    output.append(stuff)
        finally:
            # Put a BOM if there was one in the input
            with open(ofile, "wb+") as f:
                if has_bom:
                    f.write(UTF8_BOM)

            with codecs.open(ofile, "a+", "utf-8") as f:
                for stuff in output:
                    f.write(stuff.replace("\n", "\r\n") if has_crlf else stuff)


def main():
    import argparse
    import glob
    import os.path
    import sys

    Translator = YandexTranslator
    # To make an account and get a key: https://tech.yandex.com/translate/
    API_KEY = "trnsl.1.1.datetime.hex64.hex160"

    parser = argparse.ArgumentParser(description =
        "Translate the {comments} in PGN files from one language to another.",
        formatter_class = argparse.ArgumentDefaultsHelpFormatter)

    parser.add_argument("--input", "-i", help = "Input file(s)",
                        required = True, nargs = "+")
    parser.add_argument("--output", "-o", help = "Output directory")
    parser.add_argument("--target", "-t", help = "Target language (ISO 639-1)",
                        default = "en")
    parser.add_argument("--source", "-s", help = "Source language (ISO 639-1)",
                        default = "auto")
    parser.add_argument("--verbose", "-v",
                        help = "Print the strings as they get translated",
                        action = "store_true")

    args = parser.parse_args()

    source = args.source.strip().lower()
    target = args.target.strip().lower()

    try:
        translator = Translator(API_KEY, target, source, args.verbose)
    except TranslateException as e:
        print(e)
        return -1

    if source != "auto":
        print("Translating PGN files from %s to %s..."
              % (Translator.human_readable_language(source),
                 Translator.human_readable_language(target)))
    else:
        print("Translating PGN files to %s..."
              % Translator.human_readable_language(target))

    output_dir = os.path.abspath(args.output) if args.output else None
    if output_dir:
        if not os.path.exists(output_dir):
            os.makedirs(output_dir)
        elif not os.path.isdir(output_dir):
            sys.stderr.write("Not a directory: %s\n" % output_dir)
            return -1

    files = []
    for input_ in args.input:
        if len(glob.glob(input_)) == 0:
            sys.stderr.write("Invalid input path: %s\n" % input_)
            return -1
        # Expand the * wildcard if needed
        for file in glob.glob(input_):
            files.append(file)

    successful = 0
    for file in files:
        input_path = os.path.abspath(file)
        root, name = os.path.split(input_path)

        if output_dir:
            output_path = os.path.join(output_dir, name)
        else:
            name_root, name_ext = os.path.splitext(name)
            output_name = name_root + "_translated" + name_ext
            output_path = os.path.join(root, output_name)

        if args.verbose:
            print("")
            boxed_print("Translating \"%s\"..." % name)
        else:
            print("Translating \"%s\"..." % name)
        try:
            translate_pgn(input_path, output_path, translator)
            successful += 1
        except TranslateException as e:
            print("!!! %s: %s" % (type(e).__name__, e))
            print("    Partial results were saved to file anyway.")
            kb_input("Press enter to continue.")

    # Exit status: 0 if every file was translated, 1 otherwise
    return 0 if successful == len(files) else 1


if __name__ == '__main__':
    import sys
    sys.exit(main())

Usage

python translate_pgn.py -i /path/to/pgns/*.pgn -o /path/to/output/dir/

If the output directory isn't provided, each output file is written next to its input as filename_translated.pgn. Run python translate_pgn.py --help for more info.

Edit 1: Overhauled the script to work directly with .pgn files. Changed the translation API to Yandex as the public Google API was problematic due to rate limiting and issues processing multiple lines and special characters.
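
To illustrate how only the {comments} get touched, here is a minimal sketch of the brace-splitting idea on a single line of move text. It is not the script itself: translate_comments is a hypothetical helper, and the translate argument is any str -> str callable standing in for a real API call.

```python
import re

# The capture group makes re.split() keep the comment text, so the pieces
# alternate: moves at even indexes, comment bodies at odd indexes.
BRACES = re.compile(r"\{([^}]*)\}")


def translate_comments(movetext, translate):
    """Return movetext with every {comment} passed through translate().

    `translate` is a hypothetical callable (str -> str) standing in for a
    real translation request.
    """
    parts = BRACES.split(movetext)
    for i in range(1, len(parts), 2):
        parts[i] = "{%s}" % translate(parts[i])
    return "".join(parts)


if __name__ == "__main__":
    sample = "7. Qxd4 d6 {Ende Buch} 8. Bg5 Bg7"
    # Upper-casing stands in for a translator so the effect is visible offline.
    print(translate_comments(sample, lambda s: s.upper()))
```

Because the moves are never sent anywhere, they cannot be mangled; only the odd-indexed pieces are replaced before the line is joined back together.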

Edit 2: Cleaned up the code. Removed the 1s sleep as it's not necessary. Switched from params to data in translate() to make a proper POST request. Added BOM and CRLF detection to have the output file be similar to the input. Made translate_pgn work with games that don't have any comments in the moves. Added support for any language pair with the --target and --source switches. Added a --verbose switch to have the translated strings be printed out. Made the input argument(s) go through glob.glob() to make things work if the * wildcard doesn't get expanded.
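
The BOM and CRLF detection can be looked at in isolation. The following is a rough Python 3 sketch of the round trip, assuming UTF-8 input with consistent line endings; read_text and write_text are hypothetical helper names, not functions from the script.

```python
import codecs


def read_text(raw):
    """Decode UTF-8 bytes and return (text, has_bom, has_crlf).

    The returned text has no BOM and uses LF-only line endings.
    """
    has_bom = raw.startswith(codecs.BOM_UTF8)
    has_crlf = b"\r\n" in raw
    # "utf-8-sig" drops a leading BOM if there is one.
    text = raw.decode("utf-8-sig").replace("\r\n", "\n")
    return text, has_bom, has_crlf


def write_text(text, has_bom, has_crlf):
    """Encode text back to bytes, restoring the BOM and the CRLF endings."""
    if has_crlf:
        text = text.replace("\n", "\r\n")
    data = text.encode("utf-8")
    return codecs.BOM_UTF8 + data if has_bom else data


if __name__ == "__main__":
    original = codecs.BOM_UTF8 + '[Event "?"]\r\n\r\n1. e4 {Weiß}\r\n'.encode("utf-8")
    text, has_bom, has_crlf = read_text(original)
    # An untouched file should come back byte for byte.
    assert write_text(text, has_bom, has_crlf) == original
```

A file that mixes LF and CRLF endings would not round-trip exactly with this approach, since only one flag is kept for the whole file.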

Edit 3: Refactored the translation code into a class to make it easier to swap it out and use a different API. Added some error handling and language validation. Tweaked the regexes in translate_pgn, and fixed a bug that would cause short move sections (<= 79 characters) to be skipped due to a missing else clause.
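
As an example of what swapping the class could look like, here is a sketch of an offline stand-in that exposes the same methods the script calls on its translator. UppercaseTranslator is a hypothetical class made up for illustration; it upper-cases the text instead of calling any API, which makes the PGN handling testable without a key.

```python
class UppercaseTranslator(object):
    """Offline stand-in with the same interface as YandexTranslator.

    Hypothetical class for illustration only: no request is made and no
    language validation is performed.
    """

    def __init__(self, api_key, target, source="auto", verbose=False):
        # api_key, target and source are accepted but unused here.
        self._verbose = verbose

    @staticmethod
    def human_readable_language(code):
        return code.upper()

    def translate(self, text):
        """ "Translate" some text by upper-casing it. """
        return text.upper()

    def translate_lines(self, lines):
        """ Translate a list of lines, one result per input line. """
        return [self.translate(line) for line in lines]


if __name__ == "__main__":
    translator = UppercaseTranslator("no-key-needed", "en")
    print(translator.translate_lines(["Ende Buch", "warum nicht 20. Kd2-c2?"]))
```

Setting Translator = UppercaseTranslator in main() should be enough to try it, since main() only uses the constructor, human_readable_language() and, through translate_pgn, translate_lines().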

This looks like a great solution, thanks!
CuriousMynd 2 months ago
CyteBode, terrific job!! I’m very pleased with this second effort.
CuriousMynd 2 months ago
Awesome, thanks for this update! Very useful.
CuriousMynd 2 months ago