Google Translate Project

I need to have some text translated from German to English in several files. Specifically, I need only the text that lies between braces, “{” and “}”, translated. Text outside the braces should be left alone.

I tried running the files through Google Translate, and it mangled all the text outside the braces. So the solution probably consists of building a script or program that calls Google Translate in the cloud for the German text strings one by one. It looks like anyone can get a free Google Cloud account as long as they stay under the service limits.
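
To illustrate the idea of translating only the brace-delimited strings, here is a minimal Python 3 sketch. It is not taken from either solution below: translate_braces and fake_translate are hypothetical names, and the lookup table stands in for a real translation service so the example runs without a network call.

```python
import re

# Matches one {...} comment; [^{}]* keeps a match from spanning two comments.
BRACES = re.compile(r"\{([^{}]*)\}")


def translate_braces(text, translate_fn):
    """Return text with each {...} body replaced by translate_fn(body)."""
    def replace(match):
        return "{" + translate_fn(match.group(1)) + "}"
    return BRACES.sub(replace, text)


# Stand-in translator (assumption): a lookup table instead of a real API call.
FAKE_DICTIONARY = {"Ende Buch": "End of book"}


def fake_translate(german):
    return FAKE_DICTIONARY.get(german.strip(), german)


sample = "7. Qxd4 d6 { Ende Buch} 8. Bg5 Bg7 9. Qd2 {Ende Buch} Be6"
print(translate_braces(sample, fake_translate))
# prints: 7. Qxd4 d6 {End of book} 8. Bg5 Bg7 9. Qd2 {End of book} Be6
```

Everything outside the braces passes through untouched because re.sub only rewrites the matched spans; swapping fake_translate for a function that calls a real service would be the remaining step.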

Email me to request the zipped file at “ewainwright69 AtSign gmail.com”

Thanks, -Eric

Here's an example of some of the text:

1. e4 c5 2. Nf3 Nc6 3. d4 cxd4 4. Nxd4 g6 5. c4 Nf6 6. Nc3 Nxd4 7. Qxd4 d6 { Ende Buch} 8. Bg5 Bg7 9. Qd2 {Ende Buch} Be6 10. f3 Qa5 11. Be2 O-O 12. Rc1 Rfc8 {Weiß verfügt evtl. über einen leichten Stellungsvorteil.} 13. b3 b6
14. Nd5 Qxd2+ 15. Kxd2 Bxd5 {dieser Abtausch ist aus strategischen Gründen fragwürdig, überlässt er doch Weiß wieder ohne Not das Läuferpaar. Es ist jedoch bekannt, dass schon die Mephisto-Programme gerne mit Springern spielen.}
16. cxd5 e6 {Hiarcs hat nun fraglos Stellungsvorteile.} 17. dxe6 fxe6 18. Bb5 Rc5 19. Rxc5 bxc5 20. Kc1 {warum nicht 20. Kd2-c2?} Rb8 21. Be2 Nd7 22. Be3 d5
Is it a one-time task?
drakmail 5 days ago
Yes, it's a one-time task.
CuriousMynd 5 days ago
awarded to CyteBode


2 Solutions


Bonus: a JS function that you can run inside the Chrome console.

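function fetchApi(text) {
  // Encode the text so characters such as "&", "+", "#" and "ß" survive the query string.
  const url = "https://translate.googleapis.com/translate_a/single?client=gtx&sl=de&tl=en&dt=t&q="
    + encodeURIComponent(text);
  return fetch(url)
    .then(response => response.json())
    .then(data => {
      // data[0] appears to hold one entry per translated segment (this is an
      // unofficial, undocumented endpoint), so join them all instead of
      // keeping only the first sentence.
      const translated = data[0].map(segment => segment[0]).join("");
      return translated || text;
    })
    // On any failure, fall back to the untranslated text.
    .catch(() => text);
}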

async function translate(input) {
  // Splitting on a capturing group keeps the comment bodies:
  // even indexes are moves, odd indexes are the text between braces.
  const parts = input.split(/\{([^}]*)\}/);
  for (let index = 1; index < parts.length; index += 2) {
    // Translate one comment at a time and put the braces back.
    parts[index] = "{" + (await fetchApi(parts[index])) + "}";
  }
  return parts.join("");
}

and then:

translate(
  "e4 c5 2. Nf3 Nc6 3. d4 cxd4 4. Nxd4 g6 5. c4 Nf6 6. Nc3 Nxd4 7. Qxd4 d6 { Ende Buch} 8. Bg5 Bg7 9. Qd2 {Ende Buch} Be6 10. f3 Qa5 11. Be2 O-O 12. Rc1 Rfc8 {Weiß verfügt evtl. über einen leichten Stellungsvorteil.} 13. b3 b6"
)
  .then(output => console.log(output))
  .catch(error => console.log(error));
This looks like a good solution too. The files, though, are fairly large. Would I be able to point the translate function to a file?
CuriousMynd 5 days ago
Of course! It's pure JS, so you can easily wrap it in a script tag in an HTML file. Example: https://i.imgur.com/5BLeKxG.png
minhtc 5 days ago
I tried this solution. It writes the file to the console, but it doesn't translate the text. Perhaps you can send me an email (above) to discuss the solution further.
CuriousMynd 5 days ago

Requirements

  • Python 2.7 or 3 (Tested with 2.7.15 and 3.6.7)
  • requests

translate_pgn.py

# -*- coding: utf-8 -*-

import codecs
import re
import textwrap

import requests


HEADERS = {
    "User-Agent": ("Mozilla/5.0 (Windows NT 6.1; WOW64; rv:54.0) "
                   + "Gecko/20100101 Firefox/54.0")
}

# To make an account and get a key: https://tech.yandex.com/translate/
YANDEX_API_KEY = "trnsl.1.1.datetime.hex64.hex160"
# Daily character limit: 1'000'000/day. The count resets at 00:00 UTC.


class TranslateException(Exception):
    pass


def translate(text, target, source = "auto"):
    """ Translate some text using the Yandex translation API. """
    URL = "https://translate.yandex.net/api/v1.5/tr.json/translate"

    data = {
        "key": YANDEX_API_KEY,
        "text": text,
        "lang": (target if source == "auto" else source + "-" + target)
    }

    r = requests.post(URL, data = data, headers = HEADERS)
    json_response = r.json()

    if "text" in json_response:
        return json_response["text"][0]
    else:
        raise TranslateException(json_response["message"])


def translate_pgn(ifile, ofile, target, source = "auto", verbose = False):
    def translate_lines(lines):
        joined = "\n".join(lines)
        if len(joined) < 10000:
            translated = translate(joined, target, source)

            if verbose:
                print("------------")
                print("| Original |")
                print("------------")
                print(joined)

                print("--------------")
                print("| Translated |")
                print("--------------")
                print(translated)
                print("")

            return translated
        elif len(lines) == 1:
            raise TranslateException("A line is too long: %d chars."
                                     % len(lines[0]))
        else:
            # Divide and conquer
            return "\n".join([translate_lines(lines[:len(lines)//2]),
                              translate_lines(lines[len(lines)//2:])])

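    # Blank line(s) separate the tag section from the moves and one game from the next
    pgn_regex = re.compile(r"(\n\n+)")
    # A comment ends at the first closing brace, so two comments never merge
    braces_regex = re.compile(r"\{([^}]*)\}")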

    with open(ifile, "rb") as f:
        pgn_text = f.read()
        has_crlf = b"\r\n" in pgn_text
        has_bom = pgn_text[0:3] == b"\xEF\xBB\xBF"

        # Get rid of the BOM and of the CR's in the line endings
        pgn_text = pgn_text.decode("utf-8-sig").replace("\r", "")
        tags_and_moves = pgn_regex.split(pgn_text)

        output = []
        for j, stuff in enumerate(tags_and_moves):
            # [Tags, empty line(s), moves, empty line(s)]
            if j % 4 == 2: # Moves
                # All on one line
                stuff = stuff.replace("\n", " ")

                to_translate = []
                moves_and_comments = braces_regex.split(stuff)
                for i, text in enumerate(moves_and_comments):
                    if i % 2 == 1:
                        to_translate.append(text)

                if to_translate:
                    translated = translate_lines(to_translate)
                    # Replace the comments with their translations
                    for i, text in enumerate(translated.split("\n")):
                        moves_and_comments[i*2+1] = "{%s}" % text

                line = "".join(moves_and_comments)
                if len(line) > 79:
                    # 79 character limit
                    for wrapped_line in textwrap.wrap(line, 79):
                        output.append(wrapped_line)
                        output.append("\n")
                    output.pop()
                else:
                    # Short move lines must be written out too
                    output.append(line)
            else:
                output.append(stuff)

        # Put a BOM if there was one in the input
        with open(ofile, "wb+") as f:
            if has_bom:
                f.write(b"\xEF\xBB\xBF")

        with codecs.open(ofile, "a+", "utf-8") as f:
            for stuff in output:
                if has_crlf:
                    f.write(stuff.replace("\n", "\r\n"))
                else:
                    f.write(stuff)


def main():
    import argparse
    import glob
    import os.path

    print("Powered by Yandex.Translate: http://translate.yandex.com/")

    parser = argparse.ArgumentParser(description =
        "Translate the {comments} in PGN files from one language to another.",
        formatter_class = argparse.ArgumentDefaultsHelpFormatter)

    parser.add_argument("--input", "-i", help = "Input file(s)",
                        required = True, nargs = "+")
    parser.add_argument("--output", "-o", help = "Output directory")
    parser.add_argument("--target", "-t", help = "Target language (2-letter)",
                        default = "en")
    parser.add_argument("--source", "-s", help = "Source language (2-letter)",
                        default = "auto")
    parser.add_argument("--verbose", "-v",
                        help = "Print the strings as they get translated",
                        action = "store_true")

    args = parser.parse_args()

    source = args.source.lower()
    target = args.target.lower()

    if source != "auto":
        print("Translating PGN files from %s to %s..." % (source, target))
    else:
        print("Translating PGN files to %s..." % target)

    output_dir = os.path.abspath(args.output) if args.output else None
    if output_dir:
        if not os.path.exists(output_dir):
            os.makedirs(output_dir)
        elif not os.path.isdir(output_dir):
            raise Exception("Not a directory: %s" % output_dir)

    files = []
    for globbed in (glob.glob(input_) for input_ in args.input):
        for file in globbed:
            files.append(file)

    for file in files:
        input_path = os.path.abspath(file)
        root, name = os.path.split(input_path)

        if output_dir:
            output_path = os.path.join(output_dir, name)
        else:
            name_root, name_ext = os.path.splitext(name)
            output_name = name_root + "_translated" + name_ext
            output_path = os.path.join(root, output_name)

        print("Translating %s..." % name)
        translate_pgn(input_path, output_path, target, source, args.verbose)


if __name__ == '__main__':
    main()

Usage

python translate_pgn.py -i /path/to/pgns/*.pgn -o /path/to/output/dir/

If the output directory isn't provided, each output file will be named filename_translated.pgn and written to the same directory as its input. Run python translate_pgn.py --help for more info.

Edit 1: Overhauled the script to work directly with .pgn files. Changed the translation API to Yandex as the public Google API was problematic due to rate limiting and issues processing multiple lines and special characters.
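
On the multi-line point: the script sends the comments of a game as one newline-joined request and splits the reply on newlines, which only works if the service returns exactly one line per input line. Below is a minimal Python 3 sketch of that batching with an explicit line-count check added. translate_batch and translate_fn are hypothetical names, and the 10000-character default simply mirrors the check in translate_lines; it is an illustration, not a replacement for the script.

```python
def translate_batch(comments, translate_fn, max_chars=10000):
    """Translate a list of comments using as few requests as possible."""
    joined = "\n".join(comments)
    if len(joined) < max_chars:
        translated = translate_fn(joined).split("\n")
        # Guard (assumption): fail loudly if the service merged or split lines,
        # since comments would otherwise be written back to the wrong moves.
        if len(translated) != len(comments):
            raise ValueError("Expected %d lines, got %d"
                             % (len(comments), len(translated)))
        return translated
    if len(comments) == 1:
        raise ValueError("A comment is too long: %d chars" % len(joined))
    # Too long for one request: halve the batch and translate each half
    middle = len(comments) // 2
    return (translate_batch(comments[:middle], translate_fn, max_chars)
            + translate_batch(comments[middle:], translate_fn, max_chars))


# Stand-in translator (assumption): upper-casing instead of a real API call.
print(translate_batch(["Ende Buch", "warum nicht?"], str.upper))
```

Returning a list rather than a joined string makes the count check possible before anything is written back into the moves.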

Edit 2: Cleaned up the code. Removed the 1s sleep as it's not necessary. Switched from params to data in translate() to make a proper POST request. Added BOM and CRLF detection to have the output file be similar to the input. Made translate_pgn work with games that don't have any comments in the moves. Added support for any language pair with the --target and --source switches. Added a --verbose switch to have the translated strings be printed out. Made the input argument(s) go through glob.glob() to make things work if the * wildcard doesn't get expanded.
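
The BOM and CRLF handling mentioned in Edit 2 can be shown in isolation. This is a minimal Python 3 sketch that works on bytes in memory rather than on files; decode_preserving and encode_preserving are hypothetical helper names, not functions from the script.

```python
import codecs


def decode_preserving(raw):
    """Decode UTF-8 bytes and report whether a BOM and CRLF endings were used."""
    has_bom = raw.startswith(codecs.BOM_UTF8)
    has_crlf = b"\r\n" in raw
    # "utf-8-sig" drops the BOM if present; line endings are normalized to "\n"
    text = raw.decode("utf-8-sig").replace("\r\n", "\n")
    return text, has_bom, has_crlf


def encode_preserving(text, has_bom, has_crlf):
    """Re-encode text with the same BOM and line-ending style as the input."""
    if has_crlf:
        text = text.replace("\n", "\r\n")
    data = text.encode("utf-8")
    return codecs.BOM_UTF8 + data if has_bom else data


raw = codecs.BOM_UTF8 + u"1. e4 {Wei\u00df}\r\n".encode("utf-8")
text, has_bom, has_crlf = decode_preserving(raw)
print(encode_preserving(text, has_bom, has_crlf) == raw)  # prints: True
```

Working on normalized text in between means the translation step never has to care about either detail.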

This looks like a great solution, thanks!
CuriousMynd 5 days ago
CyteBode, terrific job!! I’m very pleased with this second effort.
CuriousMynd 4 days ago