Solution Timeline

Solutions to "Add additional output functionality to existing Node script" appear below in the order they were created. Where a solution was revised, comments shown under a revision were made while that revision was current.

The solution will be up at https://github.com/slang800/bountify-split-json in a bit. I ended up rewriting it to provide a --help option:

$ node lib/split.js --help
usage: split.js [-h] [-v] [--omit-key] KEY [DIRECTORY]

Read JSON from STDIN, split by key and output to directory

Positional arguments:
  KEY            Key that determines where to output JSON
  DIRECTORY      Location to output JSON files, defaults to ./

Optional arguments:
  -h, --help     Show this help message and exit.
  -v, --version  Show program's version number and exit.
  --omit-key     Remove the KEY field from records before writing.

I also changed the internals to use streams, so JSON files that are larger than memory (several GB) won't crash it.
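To give a rough idea of what streaming buys you (this is only a sketch, not the code in the repository above), Node's built-in readline module can walk newline-delimited JSON one record at a time, so memory use depends on the largest record rather than on the file size. The helper name countByKey and the "type" key are made up for the example.

```javascript
const readline = require('readline');
const { PassThrough } = require('stream');

// Hypothetical helper: reads newline-delimited JSON from any readable
// stream and counts how many records share each value of `key`.
// Only one line is parsed and held at a time.
async function countByKey(input, key) {
  const counts = {};
  const rl = readline.createInterface({ input, crlfDelay: Infinity });
  for await (const line of rl) {
    if (line.trim() === '') continue; // skip blank lines
    const value = JSON.parse(line)[key];
    counts[value] = (counts[value] || 0) + 1;
  }
  return counts;
}

// Usage with an in-memory stream; process.stdin could be passed instead.
const sample = new PassThrough();
sample.end('{"type":"a"}\n{"type":"b"}\n{"type":"a"}\n');
countByKey(sample, 'type').then((counts) => console.log(counts));
```

The same loop could write each record to a file instead of counting it, which is the general shape of a streaming split.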

For your merge script, I think it would be faster to use jq, since you're just concatenating a bunch of JSON together. For example, with a moderate number of files you could do this to merge and compact them all:

cat dir/*.json | jq -c . > output.json

output.json would look like:

{"key2":"path/to/folder/1/","key3":"value3","key4":"value4","key5":"value5"}
{"key2":"path/to/folder/1/","key3":"value3","key4":"value4","key5":"value5"}
{"key2":"path/to/folder/2/","key3":"value3","key4":"value4","key5":"value5"}

If you want to wrap them in an array, you can do:

cat dir/*.json | jq -c --slurp . > output.json

If you have thousands or millions of files, the glob can exceed the shell's argument-length limit. Piping the file names from find avoids that:

find dir/ -type f -name "*.json" | while IFS= read -r f; do cat "$f"; done | jq -c . > output.json

If you want it in the Elasticsearch bulk API format, then jq can reformat it easily:

$ cat dir/*.json | jq -c '[{"index": {"_id": .key3}}, .] | .[]'
{"index":{"_id":"value3"}}
{"key2":"path/to/folder/1/","key3":"value3","key4":"value4","key5":"value5"}
{"index":{"_id":"value3"}}
{"key2":"path/to/folder/1/","key3":"value3","key4":"value4","key5":"value5"}
{"index":{"_id":"value3"}}
{"key2":"path/to/folder/2/","key3":"value3","key4":"value4","key5":"value5"}
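If you would rather stay in Node for this step, the same reshaping can be sketched in a few lines. This is only an illustration of the format shown above; the helper name toBulkLines is made up. Each record becomes an action line followed by the record itself, and every line, including the last, ends with a newline.

```javascript
// Hypothetical helper: turns an array of records into bulk-format text,
// one {"index": ...} action line followed by the record on the next line.
function toBulkLines(records, idKey) {
  return records
    .map((record) =>
      JSON.stringify({ index: { _id: record[idKey] } }) + '\n' +
      JSON.stringify(record) + '\n')
    .join('');
}

// Usage with one of the sample records from above.
process.stdout.write(
  toBulkLines([{ key2: 'path/to/folder/1/', key3: 'value3' }], 'key3')
);
```

Building the action line with JSON.stringify rather than string concatenation means quotes or backslashes inside the id value are escaped for you.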

Usage

split

  Split a json-line file or merge json files.                    

  A json-line file is a file containing valid json on each line. 

Usage

  node split.js -s FILE [options]        
  node split.js -m DIR [options]         

  The first form to split FILE.          
  The second form to merge files in DIR. 

Split options

  -s   --split file     a json-line file to split                                        
  -n   --name-key key   key for the file name, groups objects with the same name
  -p   --path-key key   key for the output path                                          
  -t   --omit-name      omit the name key                                                
  -u   --omit-path      omit the path key                                                
  -j   --out-jl         output the groups as json-line instead of array                  
  -k   --out-key key    output the groups as array value of this key                     

Merge options

  -m   --merge dir                dir with json files to merge 
  -o   --merge-output file        merge output file            
  -x   --index ESJSON index key   specify index key for ESJSON 

General options

  -d   --output-dir dir   root output dir, defaults to current dir 
  -h   --help             show this help                           

Split example

node split -s output.jl -n type -p path -tu

Split output.jl, taking the file name from the type key and the output path from the path key, and omit both keys from the written objects (-tu or -t -u).
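To make the grouping concrete, here is a small sketch of how records map to output files under those options. It is not part of split.js; the helper name groupByOutputFile and the sample record are assumptions for illustration, and it only computes the grouping in memory without writing anything.

```javascript
const path = require('path');

// Hypothetical helper: groups records by the file they would end up in,
// "<value of pathKey>/<value of nameKey>.json". With omit=true the two
// keys are dropped from the stored copies, like -t and -u.
function groupByOutputFile(records, nameKey, pathKey, omit = false) {
  const groups = {};
  for (const record of records) {
    const file = path.posix.join(String(record[pathKey]), record[nameKey] + '.json');
    const copy = { ...record };
    if (omit) {
      delete copy[nameKey];
      delete copy[pathKey];
    }
    (groups[file] = groups[file] || []).push(copy);
  }
  return groups;
}

// Usage: one made-up record with type, path and model keys.
console.log(groupByOutputFile(
  [{ type: 'dslr', path: 'camera', model: 'X100' }], 'type', 'path', true
));
```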

Merge example

node split -m camera -o esjson.json -x model

Merge the JSON files in the camera folder into esjson.json, using the model key as the index.

Install Dependencies

The script needs these packages in order to work.

command-line-args
command-line-usage

Run this in the script directory.

npm i -S command-line-args command-line-usage

split.js

var fs = require("fs");
var path = require("path");
const clu = require('command-line-usage');
const cla = require('command-line-args');

const splitOptions = [
  { name: 'split', alias: 's', type: String, arg: 'file', desc: 'a json-line file to split' },
  { name: 'name-key', alias: 'n', type: String, arg: 'key', desc: 'key for the file name, groups objects with the same name' },
  { name: 'path-key', alias: 'p', type: String, arg: 'key', desc: 'key for the output path' },
  { name: 'omit-name', alias: 't', type: Boolean, desc: 'omit the name key' },
  { name: 'omit-path', alias: 'u', type: Boolean, desc: 'omit the path key' },
  { name: 'out-key', alias: 'k', type: String, arg: 'key', desc: 'output the groups as array value of this key' },
];

const mergeOptions = [
  { name: 'merge', alias: 'm', type: String, arg: 'dir', desc: 'dir with json files to merge' },
  { name: 'merge-output', alias: 'o', type: String, arg: 'file', desc: 'merge output file' },
  { name: 'index', alias: 'x', type: String, arg: 'ESJSON index key', desc: 'specify index key for ESJSON' },
];

const generalOptions = [
  { name: 'output-dir', alias: 'd', type: String, defaultValue: '.', arg: 'dir', desc: 'root output dir, defaults to current dir' },
  { name: 'help', alias: 'h', type: Boolean, desc: 'show this help' },
];

const optionDefs = splitOptions.concat(mergeOptions).concat(generalOptions);

const help = [
  {
    header: 'split',
    content: [
      'Split a json-line file or merge json files.',
      '',
      'A json-line file is a file containing valid json on each line.',
    ],
  },
  {
    header: 'Usage',
    content: [
      'node split.js -s FILE [options]',
      'node split.js -m DIR [options]',
      '',
      'The first form to split FILE.',
      'The second form to merge files in DIR.',
    ],
  },
  getSectionOption('Split options', splitOptions),
  getSectionOption('Merge options', mergeOptions),
  getSectionOption('General options', generalOptions),
];

function getSectionOption(title, optionDef) {
  return {
    header: title,
    content: optionDef.map(o => ({ 
      a: '-' + o.alias, 
      b: '--' + o.name + ' ' + (o.arg || ''), 
      c: o.desc })),
  };
}

// parse options
const opts = cla(optionDefs);
// console.log(opts);

// show help first, so that -h on its own is not reported as an error
if (opts.help) {
  console.log(clu(help));
  process.exit(0);
}

// handle errors
if (!opts.split && !opts.merge) {
  exitErr('Please specify an action: -s (split) or -m (merge).');
}

// print the help text with an error section appended, then exit with failure
function exitErr(str) {
  const errorSection = {
    'header': 'Error',
    'content': str,
  };
  help.push(errorSection);
  console.log(clu(help));
  process.exit(1);
}

if (opts.split) {
  const nameKey = opts['name-key'];
  const pathKey = opts['path-key'];

  // both keys are needed to build the output location
  if (!nameKey) {
    exitErr("Please specify the name-key.");
  }
  if (!pathKey) {
    exitErr("Please specify the path-key.");
  }

  // Asynchronous read
  fs.readFile(opts.split, function (err, data) {
    if (err) {
      return console.error(err);
    }
    var lines = data.toString().split("\n");
    // Matches the JSON-style format [{object}],\n[{object}],\n[{object}]\EOF
    // (the comma at the end of the line is optional) and captures the object
    // inside the brackets. It is anchored to the whole line so that an NDJSON
    // object that merely contains an array is not mistaken for this format.
    const wrapped = /^\s*\[(.*)\],?\s*$/;
    // determine the input type from the first line
    const type = wrapped.test(lines[0]) ? "json" : "ndjson";

    for (var i = 0; i < lines.length; i++) {
      if (lines[i].trim() == "") {
        continue;
      }
      var json;
      if (type == "ndjson") {
        json = JSON.parse(lines[i]);
      }
      else {
        json = JSON.parse(lines[i].match(wrapped)[1]);
      }

      const filename = json[nameKey];
      const filepath = json[pathKey];

      if (opts['omit-name']) {
        delete json[nameKey];
      }
      if (opts['omit-path']) {
        delete json[pathKey];
      }

      const outKey = opts['out-key'];
      if (outKey) {
        json = {
          [outKey]: json,
        };
      }

      // synchronous append keeps the records in input order
      fs.appendFileSync(
        getOutputPath(filepath) + "/" + filename + ".json",
        JSON.stringify(json) + "\n"
      );
    }
  });
}
else if (opts.merge) {
  const mergeDir = opts.merge;
  // fail before prompting if there is nowhere to write the result
  const filename = opts['merge-output'];
  if (!filename) {
    exitErr('Please specify merge-output file.');
  }
  // get the desired output format from the user
  getFormat(function (format) {
    format = Number(format);
    if (format == 3 && !opts.index) {
      console.log("ESJSON output needs an index key, e.g. -x pid; run the script again.");
      process.exit(1);
    }
    var index = opts.index;
    var mergedString = "";
    var items = fs.readdirSync(mergeDir);
    for (var i = 0; i < items.length; i++) {
      if (items[i].endsWith(".json")) {
        var data = fs.readFileSync(path.join(mergeDir, items[i]), "utf8");
        // split each file once rather than on every loop iteration
        var fileLines = data.split("\n");
        for (var a = 0; a < fileLines.length; a++) {
          var item = fileLines[a];
          if (item.trim() != "") {
            switch (format) {
              case 1: // minified JSON
                mergedString += "[" + item + "],\n";
                break;
              case 2: // NDJSON
                mergedString += item + "\n";
                break;
              case 3: // ESJSON
                // JSON.stringify escapes any quotes inside the id value
                mergedString += JSON.stringify({ index: { _id: String(JSON.parse(item)[index]) } }) +
                  "\n" + item + "\n";
                break;
              default:
                break;
            }
          }
        }
      }
    }

    const filepath = path.join(getOutputPath(), filename);

    var writeStream = fs.createWriteStream(filepath);
    writeStream.write(mergedString);
    writeStream.end();
    writeStream.on("finish", function () {
      process.exit();
    });
  });
}
else {
  console.log("Please provide a correct action");
}

// prompt on stdin, asking again (recursively) until the input is valid,
// then pass the chosen format number to the callback
function getFormat(callback) {
  process.stdout.write(
    "Select output format: 1:minified JSON, 2: NDJSON, 3:ESJSON: "
  );
  process.stdin.setEncoding('utf8');
  process.stdin.once('data', function (val) {
    // check validity of input: only 1, 2 or 3 are accepted
    const choice = Number(val.trim());
    if (Number.isInteger(choice) && choice >= 1 && choice <= 3) {
      callback(choice);
    }
    else {
      // if input is invalid, ask again
      getFormat(callback);
    }
  }).resume();
}

function mkDir(dir) {
  return dir.split('/').reduce((path, folder) => {
    path = path + '/' + fixName(folder);
    if (!fs.existsSync(path)) {
      fs.mkdirSync(path);
    }
    return path;
  }, '');
}

function fixName(name) {
  return name.replace(/\s+/g, '_');  
}

function getOutputPath(dir='') {
  return mkDir(path.resolve(path.join(
    opts['output-dir'], 
    dir)));
}

Usage

split

  Split a json-line file or merge json files.                    

  A json-line file is a file containing valid json on each line. 

Usage

  node split.js -s FILE [options]        
  node split.js -m DIR [options]         

  The first form to split FILE.          
  The second form to merge files in DIR. 

Split options

  -s   --split file     a json-line file to split                                        
  -n   --name-key key   key for the file name, groups objects with the same name
  -p   --path-key key   key for the output path                                          
  -t   --omit-name      omit the name key                                                
  -u   --omit-path      omit the path key                                                
  -j   --out-jl         output the groups as json-line instead of array                  
  -k   --out-key key    output the groups as array value of this key                     

Merge options

  -m   --merge dir                dir with json files to merge 
  -o   --merge-output file        merge output file            
  -x   --index ESJSON index key   specify index key for ESJSON 

General options

  -d   --output-dir dir   root output dir, defaults to current dir 
  -h   --help             show this help                           

Split example

node split -s output.jl -n type -p path -tu

Split output.jl, taking the file name from the type key and the output path from the path key, and omit both keys from the written objects (-tu or -t -u).

Merge example

node split -m camera -o esjson.json -x model

Merge the JSON files in the camera folder into esjson.json, using the model key as the index.

Install Dependencies

The script needs these packages in order to work.

command-line-args
command-line-usage

Run this in the script directory.

npm i -S command-line-args command-line-usage

split.js

var fs = require("fs");
var path = require("path");
const clu = require('command-line-usage');
const cla = require('command-line-args');

const splitOptions = [
  { name: 'split', alias: 's', type: String, arg: 'file', desc: 'a json-line file to split' },
  { name: 'name-key', alias: 'n', type: String, arg: 'key', desc: 'key for the file name, groups objects with the same name' },
  { name: 'path-key', alias: 'p', type: String, arg: 'key', desc: 'key for the output path' },
  { name: 'omit-name', alias: 't', type: Boolean, desc: 'omit the name key' },
  { name: 'omit-path', alias: 'u', type: Boolean, desc: 'omit the path key' },
  { name: 'out-key', alias: 'k', type: String, arg: 'key', desc: 'output the groups as array value of this key' },
];

const mergeOptions = [
  { name: 'merge', alias: 'm', type: String, arg: 'dir', desc: 'dir with json files to merge' },
  { name: 'merge-output', alias: 'o', type: String, arg: 'file', desc: 'merge output file' },
  { name: 'index', alias: 'x', type: String, arg: 'ESJSON index key', desc: 'specify index key for ESJSON' },
];

const generalOptions = [
  { name: 'output-dir', alias: 'd', type: String, defaultValue: '.', arg: 'dir', desc: 'root output dir, defaults to current dir' },
  { name: 'help', alias: 'h', type: Boolean, desc: 'show this help' },
];

const optionDefs = splitOptions.concat(mergeOptions).concat(generalOptions);

const help = [
  {
    header: 'split',
    content: [
      'Split a json-line file or merge json files.',
      '',
      'A json-line file is a file containing valid json on each line.',
    ],
  },
  {
    header: 'Usage',
    content: [
      'node split.js -s FILE [options]',
      'node split.js -m DIR [options]',
      '',
      'The first form to split FILE.',
      'The second form to merge files in DIR.',
    ],
  },
  getSectionOption('Split options', splitOptions),
  getSectionOption('Merge options', mergeOptions),
  getSectionOption('General options', generalOptions),
];

function getSectionOption(title, optionDef) {
  return {
    header: title,
    content: optionDef.map(o => ({ 
      a: '-' + o.alias, 
      b: '--' + o.name + ' ' + (o.arg || ''), 
      c: o.desc })),
  };
}

// parse options
const opts = cla(optionDefs);
// console.log(opts);

// show help first, so that -h on its own is not reported as an error
if (opts.help) {
  console.log(clu(help));
  process.exit(0);
}

// handle errors
if (!opts.split && !opts.merge) {
  exitErr('Please specify an action: -s (split) or -m (merge).');
}

// print the help text with an error section appended, then exit with failure
function exitErr(str) {
  const errorSection = {
    'header': 'Error',
    'content': str,
  };
  help.push(errorSection);
  console.log(clu(help));
  process.exit(1);
}

if (opts.split) {
  const filepath = opts.split
  // Asynchronous read
  fs.readFile(filepath, function (err, data) {
    if (err) {
      return console.error(err);
    }
    var lines = data.toString().split("\n");
    // determine the input type
    var type = "ndjson";
    // Note: The comma at the end of the line is optional. I assume the format
    // is [{object}],\n[{object}],\n[{object}]\EOF
    if (lines[0].match(/\[[^\]]*\],?/)) {
      // it's the JSON-style format [<json>],
      type = "json";
    }
    var out = "";
    for (var i = 0; i < lines.length; i++) {
      if (lines[i].trim() == "") {
        continue;
      }
      var json;
      if (type == "ndjson") {
        json = JSON.parse(lines[i]);
      }
      else if (type == "json") {
        json = JSON.parse(lines[i].match(/\[([^\]]*)\],?/)[1]);
      }

      const nameKey = opts['name-key'];
      const pathKey = opts['path-key'];

      if (!nameKey) {
        exitErr("Please specify the name-key.");
      }
      if (!pathKey) {
        exitErr("Please specify the path-key.");
      }

      const filename = json[nameKey];
      const filepath = json[pathKey];

      if (opts['omit-name']) {
        delete json[nameKey];       
      }
      if (opts['omit-path']) {
        delete json[pathKey];
      }

      const outfile = getOutputPath(filepath) + "/" + filename + ".json";

      const outKey = opts['out-key'];
      if (outKey) {
        // add it to the array on out-key
        let obj;
        if (fs.existsSync(outfile)) {
          obj = JSON.parse(fs.readFileSync(outfile));
        }
        else {
          obj = { [outKey]: [] };
        }
        obj[outKey].push(json);
        fs.writeFileSync(outfile, JSON.stringify(obj));
      }
      else {
        fs.appendFile(
          outfile,
          JSON.stringify(json) + "\n",
          function () { } // suppresses warning
        );
      }
    }
  });
}
else if (opts.merge) {
  const mergeDir = opts.merge;
  var data;
  // get the desired output format from the user
  getFormat(function (format) {
    if (Number(format) == 3 && !opts.index) {
      console.log("You forgot to declare an index (e.g.- pid) at EOL, run script again.");
      process.exit();
    }
    var index = opts.index;
    var mergedString = "";
    var items = fs.readdirSync(mergeDir);
    for (var i = 0; i < items.length; i++) {
      if (items[i].endsWith(".json")) {
        data = fs.readFileSync(mergeDir + '/' + items[i], "utf8");
        for (var a in data.toString().split("\n")) {
          var item = data.toString().split("\n")[a];
          if (item != "") {
            switch (Number(format)) {
              case 1: // minified JSON
                mergedString = mergedString + "[" + item + "],\n";
                break;
              case 2: // NDJSON
                mergedString += item + "\n";
                break;
              case 3: // ESJSON
                mergedString += '{"index":{"_id":"' +
                  JSON.parse(item)[index] +
                  '"}}\n' +
                  item +
                  "\n";
                break;
              default:
                break;
            }
          }
        }
      }
    }
    const filename = opts['merge-output'];
    if (!filename) {
      exitErr('Please specify merge-output file.');
    }

    const filepath = path.join(getOutputPath(), filename); 

    var writeStream = fs.createWriteStream(filepath);
    writeStream.write(mergedString);
    writeStream.end();
    writeStream.on("finish", function () {
      process.exit();
    });
  });
}
else {
  console.log("Please provide a correct action");
}

// function to use recursion to simulate synchronous access to stdin/out
function getFormat(callback) {
  process.stdout.write(
    "Select output format: 1:minified JSON, 2: NDJSON, 3:ESJSON: "
  );
  process.stdin.setEncoding('utf8');
  process.stdin.once('data', function (val) {
    // check validity of input
    if ([1, 2, 3].includes(Number(val))) {
      callback(val);
    }
    else {
      // if input is invalid, ask again
      getFormat(callback);
    }
  }).resume();
}

function mkDir(dir) {
  return dir.split('/').reduce((path, folder) => {
    path = path + '/' + fixName(folder);
    if (!fs.existsSync(path)) {
      fs.mkdirSync(path);
    }
    return path;
  }, '');
}

function fixName(name) {
  return name.replace(/\s+/g, '_');  
}

function getOutputPath(dir='') {
  return mkDir(path.resolve(path.join(
    opts['output-dir'], 
    dir)));
}

Usage

split

  Split a json-line file or merge json files.                    

  A json-line file is a file containing valid json on each line. 

Usage

  node split.js -s FILE [options]        
  node split.js -m DIR [options]         

  The first form to split FILE.          
  The second form to merge files in DIR. 

Split options

  -s   --split file     a json-line file to split                                        
  -n   --name-key key   key for the file name; objects with the same name are grouped into one file
  -p   --path-key key   key for the output path                                          
  -t   --omit-name      omit the name key                                                
  -u   --omit-path      omit the path key                                                
  -j   --out-jl         output the groups as json-line instead of array                  
  -k   --out-key key    output the groups as array value of this key                     

Merge options

  -m   --merge dir                dir with json files to merge 
  -o   --merge-output file        merge output file            
  -x   --index ESJSON index key   specify index key for ESJSON 

General options

  -d   --output-dir dir   root output dir, defaults to current dir 
  -h   --help             show this help                           

Split example

node split -s output.jl -n type -p path -tu

Split output.jl, taking each file name from the type key and each output path from the path key, and omit both keys from the written objects (-tu is shorthand for -t -u).

Merge example

node split -m camera -o esjson.json -x model

Merge the JSON files in the camera folder and write the result to esjson.json, using the model key as the index when the ESJSON format is selected at the prompt.

Installation

  • Create a folder for the script and enter it
  • Create the package.json file
  • Create the spmer.js file

Then run:

npm install
npm link

Now spmer is installed globally.

package.json

{
  "name": "spmer",
  "version": "1.0.0",
  "description": "",
  "main": "spmer.js",
  "dependencies": {
    "command-line-args": "^4.0.1",
    "command-line-usage": "^4.0.0"
  },
  "devDependencies": {},
  "scripts": {
    "test": "echo \"Error: no test specified\" && exit 1"
  },
  "author": "Farolan Faisal",
  "license": "ISC",
  "preferGlobal": true,
  "bin": {
    "spmer": "spmer.js"
  }
}

spmer.js

#! /usr/bin/env node
var fs = require("fs");
var path = require("path");
const clu = require('command-line-usage');
const cla = require('command-line-args');

const splitOptions = [
  { name: 'split', alias: 's', type: String, arg: 'file', desc: 'a json-line file to split' },
  { name: 'name-key', alias: 'n', type: String, arg: 'key', desc: 'key for the file name; objects with the same name are grouped into one file' },
  { name: 'path-key', alias: 'p', type: String, arg: 'key', desc: 'key for the output path' },
  { name: 'omit-name', alias: 't', type: Boolean, desc: 'omit the name key' },
  { name: 'omit-path', alias: 'u', type: Boolean, desc: 'omit the path key' },
  { name: 'out-key', alias: 'k', type: String, arg: 'key', desc: 'output the groups as array value of this key' },
];

const mergeOptions = [
  { name: 'merge', alias: 'm', type: String, arg: 'dir', desc: 'dir with json files to merge' },
  { name: 'merge-output', alias: 'o', type: String, arg: 'file', desc: 'merge output file' },
  { name: 'index', alias: 'x', type: String, arg: 'ESJSON index key', desc: 'specify index key for ESJSON' },
];

const generalOptions = [
  { name: 'output-dir', alias: 'd', type: String, defaultValue: '.', arg: 'dir', desc: 'root output dir, defaults to current dir' },
  { name: 'help', alias: 'h', type: Boolean, desc: 'show this help' },
];

const optionDefs = splitOptions.concat(mergeOptions).concat(generalOptions);

const help = [
  {
    header: 'spmer',
    content: [
      'Split a json-line file or merge json files.',
      '',
      'A json-line file is a file containing valid json on each line.',
    ],
  },
  {
    header: 'Usage',
    content: [
      'node spmer.js -s FILE [options]',
      'node spmer.js -m DIR [options]',
      '',
      'The first form to split FILE.',
      'The second form to merge files in DIR.',
    ],
  },
  getSectionOption('Split options', splitOptions),
  getSectionOption('Merge options', mergeOptions),
  getSectionOption('General options', generalOptions),
];

function getSectionOption(title, optionDef) {
  return {
    header: title,
    content: optionDef.map(o => ({ 
      a: '-' + o.alias, 
      b: '--' + o.name + ' ' + (o.arg || ''), 
      c: o.desc })),
  };
}

// parse options
const opts = cla(optionDefs);
// console.log(opts);

// handle errors
if (!opts.help && !opts.split && !opts.merge) {
  exitErr('Please specify an action: -s (split) or -m (merge).');
} 

function exitErr(str) {
  const errorSection = {
    'header': 'Error',
    'content': str,
  };
  help.push(errorSection);
  console.log(clu(help));
  process.exit(-1);
}

// show help
if (opts.help) {
  console.log(clu(help));
  process.exit(0);
}

if (opts.split) {
  const filepath = opts.split
  // Asynchronous read
  fs.readFile(filepath, function (err, data) {
    if (err) {
      return console.error(err);
    }
    var lines = data.toString().split("\n");
    // determine the input type
    var type = "ndjson";
    // Note: The comma at the end of the line is optional. I assume the format
    // is [{object}],\n[{object}],\n[{object}]\EOF
    if (lines[0].match(/\[[^\]]*\],?/)) {
      // it's the JSON-style format [<json>],
      type = "json";
    }
    var out = "";
    for (var i = 0; i < lines.length; i++) {
      if (lines[i].trim() == "") {
        continue;
      }
      var json;
      if (type == "ndjson") {
        json = JSON.parse(lines[i]);
      }
      else if (type == "json") {
        json = JSON.parse(lines[i].match(/\[([^\]]*)\],?/)[1]);
      }

      const nameKey = opts['name-key'];
      const pathKey = opts['path-key'];

      if (!nameKey) {
        exitErr("Please specify the name-key.");
      }
      if (!pathKey) {
        exitErr("Please specify the path-key.");
      }

      const filename = json[nameKey];
      const filepath = json[pathKey];

      if (opts['omit-name']) {
        delete json[nameKey];       
      }
      if (opts['omit-path']) {
        delete json[pathKey];
      }

      const outfile = getOutputPath(filepath) + "/" + filename + ".json";

      const outKey = opts['out-key'];
      if (outKey) {
        // add it to the array on out-key
        let obj;
        if (fs.existsSync(outfile)) {
          obj = JSON.parse(fs.readFileSync(outfile));
        }
        else {
          obj = { [outKey]: [] };
        }
        obj[outKey].push(json);
        fs.writeFileSync(outfile, JSON.stringify(obj));
      }
      else {
        fs.appendFile(
          outfile,
          JSON.stringify(json) + "\n",
          function () { } // suppresses warning
        );
      }
    }
  });
}
else if (opts.merge) {
  const mergeDir = opts.merge;
  var data;
  // get the desired output format from the user
  getFormat(function (format) {
    if (Number(format) == 3 && !opts.index) {
      console.log("You forgot to declare an index (e.g.- pid) at EOL, run script again.");
      process.exit();
    }
    var index = opts.index;
    var mergedString = "";
    var items = fs.readdirSync(mergeDir);
    for (var i = 0; i < items.length; i++) {
      if (items[i].endsWith(".json")) {
        data = fs.readFileSync(mergeDir + '/' + items[i], "utf8");
        for (var a in data.toString().split("\n")) {
          var item = data.toString().split("\n")[a];
          if (item != "") {
            switch (Number(format)) {
              case 1: // minified JSON
                mergedString = mergedString + "[" + item + "],\n";
                break;
              case 2: // NDJSON
                mergedString += item + "\n";
                break;
              case 3: // ESJSON
                mergedString += '{"index":{"_id":"' +
                  JSON.parse(item)[index] +
                  '"}}\n' +
                  item +
                  "\n";
                break;
              default:
                break;
            }
          }
        }
      }
    }
    const filename = opts['merge-output'];
    if (!filename) {
      exitErr('Please specify merge-output file.');
    }

    const filepath = path.join(getOutputPath(), filename); 

    var writeStream = fs.createWriteStream(filepath);
    writeStream.write(mergedString);
    writeStream.end();
    writeStream.on("finish", function () {
      process.exit();
    });
  });
}
else {
  console.log("Please provide a correct action");
}

// function to use recursion to simulate synchronous access to stdin/out
function getFormat(callback) {
  process.stdout.write(
    "Select output format: 1:minified JSON, 2: NDJSON, 3:ESJSON: "
  );
  process.stdin.setEncoding('utf8');
  process.stdin.once('data', function (val) {
    // check validity of input
    if ([1, 2, 3].includes(Number(val))) {
      callback(val);
    }
    else {
      // if input is invalid, ask again
      getFormat(callback);
    }
  }).resume();
}

function mkDir(dir) {
  return dir.split('/').reduce((path, folder) => {
    path = path + '/' + fixName(folder);
    if (!fs.existsSync(path)) {
      fs.mkdirSync(path);
    }
    return path;
  }, '');
}

function fixName(name) {
  return name.replace(/\s+/g, '_');  
}

function getOutputPath(dir='') {
  return mkDir(path.resolve(path.join(
    opts['output-dir'], 
    dir)));
}

Usage

spmer

  Split a json-line file or merge json files.                    

  A json-line file is a file containing valid json on each line. 

Usage

  node spmer.js -s FILE [options]
  node spmer.js -m DIR [options]

  The first form to split FILE.          
  The second form to merge files in DIR. 

Split options

  -s   --split file     a json-line file to split                                        
  -n   --name-key key   key for the file name; objects with the same name are grouped into one file
  -p   --path-key key   key for the output path                                          
  -t   --omit-name      omit the name key                                                
  -u   --omit-path      omit the path key                                                
  -k   --out-key key    output the groups as array value of this key

Merge options

  -m   --merge dir                dir with json files to merge 
  -o   --merge-output file        merge output file            
  -x   --index ESJSON index key   specify index key for ESJSON 

General options

  -d   --output-dir dir   root output dir, defaults to current dir 
  -h   --help             show this help                           

Split example

node spmer.js -s output.jl -n type -p path -tu

Split output.jl, taking each file name from the type key and each output path from the path key, and omit both keys from the written objects (-tu is shorthand for -t -u).

Merge example

node spmer.js -m camera -o esjson.json -x model

Merge the JSON files in the camera folder and write the result to esjson.json, using the model key as the index when the ESJSON format is selected at the prompt.

Installation

  • Create a folder for the script and enter it
  • Create the package.json file
  • Create the spmer.js file

Then run:

npm install
npm link

Now spmer is installed globally.

package.json

{
  "name": "spmer",
  "version": "1.0.0",
  "description": "",
  "main": "spmer.js",
  "dependencies": {
    "command-line-args": "^4.0.1",
    "command-line-usage": "^4.0.0"
  },
  "devDependencies": {},
  "scripts": {
    "test": "echo \"Error: no test specified\" && exit 1"
  },
  "author": "Farolan Faisal",
  "license": "ISC",
  "preferGlobal": true,
  "bin": {
    "spmer": "spmer.js"
  }
}

spmer.js

#! /usr/bin/env node
var fs = require("fs");
var path = require("path");
const clu = require('command-line-usage');
const cla = require('command-line-args');

const splitOptions = [
  { name: 'split', alias: 's', type: String, arg: 'file', desc: 'a json-line file to split' },
  { name: 'name-key', alias: 'n', type: String, arg: 'key', desc: 'key for the file name; objects with the same name are grouped into one file' },
  { name: 'path-key', alias: 'p', type: String, arg: 'key', desc: 'key for the output path' },
  { name: 'omit-name', alias: 't', type: Boolean, desc: 'omit the name key' },
  { name: 'omit-path', alias: 'u', type: Boolean, desc: 'omit the path key' },
  { name: 'out-key', alias: 'k', type: String, arg: 'key', desc: 'output the groups as array value of this key' },
  { name: 'append', alias: 'a', type: Boolean, desc: 'append to existing files' },
];

const mergeOptions = [
  { name: 'merge', alias: 'm', type: String, arg: 'dir', desc: 'dir with json files to merge' },
  { name: 'merge-output', alias: 'o', type: String, arg: 'file', desc: 'merge output file' },
  { name: 'index', alias: 'x', type: String, arg: 'ESJSON index key', desc: 'specify index key for ESJSON' },
];

const generalOptions = [
  { name: 'output-dir', alias: 'd', type: String, defaultValue: '.', arg: 'dir', desc: 'root output dir, defaults to current dir' },
  { name: 'help', alias: 'h', type: Boolean, desc: 'show this help' },
];

const optionDefs = splitOptions.concat(mergeOptions).concat(generalOptions);

const help = [
  {
    header: 'spmer',
    content: [
      'Split a json-line file or merge json files.',
      '',
      'A json-line file is a file containing valid json on each line.',
    ],
  },
  {
    header: 'Usage',
    content: [
      'node spmer.js -s FILE [options]',
      'node spmer.js -m DIR [options]',
      '',
      'The first form to split FILE.',
      'The second form to merge files in DIR.',
    ],
  },
  getSectionOption('Split options', splitOptions),
  getSectionOption('Merge options', mergeOptions),
  getSectionOption('General options', generalOptions),
];

function getSectionOption(title, optionDef) {
  return {
    header: title,
    content: optionDef.map(o => ({ 
      a: '-' + o.alias, 
      b: '--' + o.name + ' ' + (o.arg || ''), 
      c: o.desc })),
  };
}

// parse options
const opts = cla(optionDefs);
// console.log(opts);

// handle errors
if (!opts.help && !opts.split && !opts.merge) {
  exitErr('Please specify an action: -s (split) or -m (merge).');
} 

function exitErr(str) {
  const errorSection = {
    'header': 'Error',
    'content': str,
  };
  help.push(errorSection);
  console.log(clu(help));
  process.exit(-1);
}

// show help
if (opts.help) {
  console.log(clu(help));
  process.exit(0);
}


if (opts.split) {
  // track the processed file
  const filenames = [];

  const filepath = opts.split
  // Asynchronous read
  fs.readFile(filepath, function (err, data) {
    if (err) {
      return console.error(err);
    }
    var lines = data.toString().split("\n");
    // determine the input type
    var type = "ndjson";
    // Note: The comma at the end of the line is optional. I assume the format
    // is [{object}],\n[{object}],\n[{object}]\EOF
    if (lines[0].match(/\[[^\]]*\],?/)) {
      // it's the JSON-style format [<json>],
      type = "json";
    }
    var out = "";
    for (var i = 0; i < lines.length; i++) {
      if (lines[i].trim() == "") {
        continue;
      }
      var json;
      if (type == "ndjson") {
        json = JSON.parse(lines[i]);
      }
      else if (type == "json") {
        json = JSON.parse(lines[i].match(/\[([^\]]*)\],?/)[1]);
      }

      const nameKey = opts['name-key'];
      const pathKey = opts['path-key'];

      if (!nameKey) {
        exitErr("Please specify the name-key.");
      }
      if (!pathKey) {
        exitErr("Please specify the path-key.");
      }

      const filename = json[nameKey];
      const filepath = json[pathKey];

      if (opts['omit-name']) {
        delete json[nameKey];       
      }
      if (opts['omit-path']) {
        delete json[pathKey];
      }

      const outfile = getOutputPath(filepath) + "/" + filename + ".json";

      let truncate = false;
      if (!opts.append && !filenames.includes(outfile)) {
        truncate = true;
        filenames.push(outfile);
      }

      const outKey = opts['out-key'];
      if (outKey) {
        // add it to the array on out-key
        let obj;
        if (!truncate && fs.existsSync(outfile)) {
          try {
            obj = JSON.parse(fs.readFileSync(outfile));
          }
          catch(x) {
            if (x instanceof SyntaxError) {
              console.log("\nError:\n  A file exists with the same name but not in a valid JSON format.\n\  Perhaps it's the result of previous operation?\n\  Please delete the file or specify another output-dir.\n");                  
            }
            else {
              console.log(x);
            }
            process.exit(-1);
          }
        }
        else {
          obj = { [outKey]: [] };
        }
        obj[outKey].push(json);
        fs.writeFileSync(outfile, JSON.stringify(obj));
      }
      else {
        const data = JSON.stringify(json) + "\n";

        // truncate if this is the first time writing to this file
        // and not appending 
        if (truncate) {
          fs.writeFileSync(outfile, data, 'utf8');
        }
        else {
          fs.appendFile(
            outfile,
            data,
            function () { } // suppresses warning
          );
        }
      }
    }
  });
}
else if (opts.merge) {
  const mergeDir = opts.merge;
  var data;
  // get the desired output format from the user
  getFormat(function (format) {
    if (Number(format) == 3 && !opts.index) {
      console.log("You forgot to declare an index (e.g.- pid) at EOL, run script again.");
      process.exit();
    }
    var index = opts.index;
    var mergedString = "";
    var items = fs.readdirSync(mergeDir);
    for (var i = 0; i < items.length; i++) {
      if (items[i].endsWith(".json")) {
        data = fs.readFileSync(mergeDir + '/' + items[i], "utf8");
        for (var a in data.toString().split("\n")) {
          var item = data.toString().split("\n")[a];
          if (item != "") {
            switch (Number(format)) {
              case 1: // minified JSON
                mergedString = mergedString + "[" + item + "],\n";
                break;
              case 2: // NDJSON
                mergedString += item + "\n";
                break;
              case 3: // ESJSON
                mergedString += '{"index":{"_id":"' +
                  JSON.parse(item)[index] +
                  '"}}\n' +
                  item +
                  "\n";
                break;
              default:
                break;
            }
          }
        }
      }
    }
    const filename = opts['merge-output'];
    if (!filename) {
      exitErr('Please specify merge-output file.');
    }

    const filepath = path.join(getOutputPath(), filename); 

    var writeStream = fs.createWriteStream(filepath);
    writeStream.write(mergedString);
    writeStream.end();
    writeStream.on("finish", function () {
      process.exit();
    });
  });
}
else {
  console.log("Please provide a correct action");
}

// function to use recursion to simulate synchronous access to stdin/out
function getFormat(callback) {
  process.stdout.write(
    "Select output format: 1:minified JSON, 2: NDJSON, 3:ESJSON: "
  );
  process.stdin.setEncoding('utf8');
  process.stdin.once('data', function (val) {
    // check validity of input
    if ([1, 2, 3].includes(Number(val))) {
      callback(val);
    }
    else {
      // if input is invalid, ask again
      getFormat(callback);
    }
  }).resume();
}

function mkDir(dir) {
  return dir.split('/').reduce((path, folder) => {
    path = path + '/' + fixName(folder);
    if (!fs.existsSync(path)) {
      fs.mkdirSync(path);
    }
    return path;
  }, '');
}

function fixName(name) {
  return name.replace(/\s+/g, '_');  
}

function getOutputPath(dir='') {
  return mkDir(path.resolve(path.join(
    opts['output-dir'], 
    dir)));
}

Usage

spmer

  Split a json-line file or merge json files.                    

  A json-line file is a file containing valid json on each line. 

Usage

  node spmer.js -s FILE [options]        
  node spmer.js -m DIR [options]         

  The first form to split FILE.          
  The second form to merge files in DIR. 

Split options

  -s   --split file     a json-line file to split                                        
  -n   --name-key key   key for the file name; objects with the same name are grouped into one file
  -p   --path-key key   key for the output path                                          
  -t   --omit-name      omit the name key                                                
  -u   --omit-path      omit the path key                                                
  -k   --out-key key    output the groups as array value of this key                     
  -a   --append         append to existing files                                         

Merge options

  -m   --merge dir                dir with json files to merge 
  -o   --merge-output file        merge output file            
  -x   --index ESJSON index key   specify index key for ESJSON 

General options

  -d   --output-dir dir   root output dir, defaults to current dir 
  -h   --help             show this help                           
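
The help text says each line holds valid JSON; the comments in the script also mention a wrapped variant, [<json>], with an optional trailing comma. As a minimal sketch of reading both forms, the hypothetical parseLine helper below uses an anchored pattern. The helper name and the pattern are assumptions for illustration, not the script's own code, and the sketch assumes every line carries exactly one object.

```javascript
// Hypothetical helper (not part of spmer.js): parse one input line.
// Returns the parsed object, or null for a blank line.
function parseLine(line) {
  const text = line.trim();
  if (text === '') {
    return null;
  }
  // Wrapped form: "[<json>]" or "[<json>],". The pattern is anchored, so a
  // plain object that merely contains an array is not mistaken for it.
  const wrapped = text.match(/^\[(.*)\],?$/);
  return JSON.parse(wrapped ? wrapped[1] : text);
}

console.log(parseLine('{"type":"canon","tags":["dslr"]}'));
console.log(parseLine('[{"type":"canon","tags":["dslr"]}],'));
```

Both calls should yield the same object, because the wrapper and the trailing comma are stripped before parsing.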

Split example

node spmer.js -s output.jl -n type -p path -tu

Split output.jl, taking each file name from the type key and each output path from the path key, and omit both keys from the written objects (-tu is shorthand for -t -u).
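
To make the effect of -n, -p, -t and -u concrete, here is a rough in-memory sketch of the grouping step. groupRecords is a hypothetical helper, not a function from the script, and the sample records are invented; it only mirrors the idea of deriving a target file from two keys and dropping them afterwards, without writing anything to disk.

```javascript
const path = require('path');

// Hypothetical helper: group NDJSON lines by their target file.
// omit.name / omit.path play the role of -t / -u.
function groupRecords(lines, nameKey, pathKey, omit = {}) {
  const groups = new Map();
  for (const line of lines) {
    if (line.trim() === '') continue; // skip blank lines
    const record = JSON.parse(line);
    // Whitespace in the directory becomes "_", as fixName() does in the script.
    const dir = String(record[pathKey]).replace(/\s+/g, '_');
    const target = path.posix.join(dir, record[nameKey] + '.json');
    if (omit.name) delete record[nameKey];
    if (omit.path) delete record[pathKey];
    if (!groups.has(target)) groups.set(target, []);
    groups.get(target).push(record);
  }
  return groups;
}

// Invented sample data for illustration only.
const sample = [
  '{"type":"canon","path":"camera","model":"a"}',
  '{"type":"canon","path":"camera","model":"b"}',
  '{"type":"sony","path":"my cams","model":"c"}',
];
for (const [file, records] of groupRecords(sample, 'type', 'path', { name: true, path: true })) {
  console.log(file, JSON.stringify(records));
}
```

Records that share the same name and path end up in the same group, which corresponds to one output file.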

Merge example

node spmer.js -m camera -o esjson.json -x model

Merge the JSON files in the camera folder and write the result to esjson.json, using the model key as the index when the ESJSON format is selected at the prompt.
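
Format 3 writes an {"index":{"_id":...}} line before each record, which looks like the Elasticsearch bulk layout (that reading is an assumption; the script only calls it ESJSON). The script builds that line by string concatenation. The sketch below shows one possible variant that builds it with JSON.stringify so that quotes in the index value are escaped; toEsLines is a hypothetical name, not a function from the script.

```javascript
// Hypothetical helper: turn NDJSON items into action/record line pairs.
// items: array of JSON strings, one object each; indexKey: the -x key.
function toEsLines(items, indexKey) {
  let out = '';
  for (const item of items) {
    const doc = JSON.parse(item);
    // JSON.stringify escapes the id, unlike plain string concatenation.
    const action = { index: { _id: String(doc[indexKey]) } };
    out += JSON.stringify(action) + '\n' + item + '\n';
  }
  return out;
}

// Invented sample record for illustration only.
process.stdout.write(toEsLines(['{"model":"x100","iso":200}'], 'model'));
```

Each record is passed through unchanged; only the preceding action line is generated.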

Installation

  • Create a folder for the script and enter it
  • Create the package.json file
  • Create the spmer.js file

Then run:

npm install
npm link

Now spmer is installed globally.

package.json

{
  "name": "spmer",
  "version": "1.0.0",
  "description": "",
  "main": "spmer.js",
  "dependencies": {
    "command-line-args": "^4.0.1",
    "command-line-usage": "^4.0.0"
  },
  "devDependencies": {},
  "scripts": {
    "test": "echo \"Error: no test specified\" && exit 1"
  },
  "author": "Farolan Faisal",
  "license": "ISC",
  "preferGlobal": true,
  "bin": {
    "spmer": "spmer.js"
  }
}

spmer.js

#! /usr/bin/env node
var fs = require("fs");
var path = require("path");
const clu = require('command-line-usage');
const cla = require('command-line-args');

const splitOptions = [
  { name: 'split', alias: 's', type: String, arg: 'file', desc: 'a json-line file to split' },
  { name: 'name-key', alias: 'n', type: String, arg: 'key', desc: 'key for the file name; objects with the same name are grouped into one file' },
  { name: 'path-key', alias: 'p', type: String, arg: 'key', desc: 'key for the output path' },
  { name: 'omit-name', alias: 't', type: Boolean, desc: 'omit the name key' },
  { name: 'omit-path', alias: 'u', type: Boolean, desc: 'omit the path key' },
  { name: 'out-key', alias: 'k', type: String, arg: 'key', desc: 'output the groups as array value of this key' },
  { name: 'append', alias: 'a', type: Boolean, desc: 'append to existing files' },
];

const mergeOptions = [
  { name: 'merge', alias: 'm', type: String, arg: 'dir', desc: 'dir with json files to merge' },
  { name: 'merge-output', alias: 'o', type: String, arg: 'file', desc: 'merge output file' },
  { name: 'index', alias: 'x', type: String, arg: 'ESJSON index key', desc: 'specify index key for ESJSON' },
];

const generalOptions = [
  { name: 'output-dir', alias: 'd', type: String, defaultValue: '.', arg: 'dir', desc: 'root output dir, defaults to current dir' },
  { name: 'help', alias: 'h', type: Boolean, desc: 'show this help' },
];

const optionDefs = splitOptions.concat(mergeOptions).concat(generalOptions);

const help = [
  {
    header: 'spmer',
    content: [
      'Split a json-line file or merge json files.',
      '',
      'A json-line file is a file containing valid json on each line.',
    ],
  },
  {
    header: 'Usage',
    content: [
      'node spmer.js -s FILE [options]',
      'node spmer.js -m DIR [options]',
      '',
      'The first form to split FILE.',
      'The second form to merge files in DIR.',
    ],
  },
  getSectionOption('Split options', splitOptions),
  getSectionOption('Merge options', mergeOptions),
  getSectionOption('General options', generalOptions),
];

function getSectionOption(title, optionDef) {
  return {
    header: title,
    content: optionDef.map(o => ({ 
      a: '-' + o.alias, 
      b: '--' + o.name + ' ' + (o.arg || ''), 
      c: o.desc })),
  };
}

// parse options
const opts = cla(optionDefs);
// console.log(opts);

// handle errors
if (!opts.help && !opts.split && !opts.merge) {
  exitErr('Please specify an action: -s (split) or -m (merge).');
} 

function exitErr(str) {
  const errorSection = {
    'header': 'Error',
    'content': str,
  };
  help.push(errorSection);
  console.log(clu(help));
  process.exit(-1);
}

// show help
if (opts.help) {
  console.log(clu(help));
  process.exit(0);
}


if (opts.split) {
  // track the processed file
  const filenames = [];

  const filepath = opts.split
  // Asynchronous read
  fs.readFile(filepath, function (err, data) {
    if (err) {
      return console.error(err);
    }
    var lines = data.toString().split("\n");
    // determine the input type
    var type = "ndjson";
    // Note: The comma at the end of the line is optional. I assume the format
    // is [{object}],\n[{object}],\n[{object}]\EOF
    if (lines[0].match(/\[[^\]]*\],?/)) {
      // it's the JSON-style format [<json>],
      type = "json";
    }
    var out = "";
    for (var i = 0; i < lines.length; i++) {
      if (lines[i].trim() == "") {
        continue;
      }
      var json;
      if (type == "ndjson") {
        json = JSON.parse(lines[i]);
      }
      else if (type == "json") {
        json = JSON.parse(lines[i].match(/\[([^\]]*)\],?/)[1]);
      }

      const nameKey = opts['name-key'];
      const pathKey = opts['path-key'];

      if (!nameKey) {
        exitErr("Please specify the name-key.");
      }
      if (!pathKey) {
        exitErr("Please specify the path-key.");
      }

      const filename = json[nameKey];
      const filepath = json[pathKey];

      if (opts['omit-name']) {
        delete json[nameKey];       
      }
      if (opts['omit-path']) {
        delete json[pathKey];
      }

      const outfile = getOutputPath(filepath) + "/" + filename + ".json";

      let truncate = false;
      if (!opts.append && !filenames.includes(outfile)) {
        truncate = true;
        filenames.push(outfile);
      }

      const outKey = opts['out-key'];
      if (outKey) {
        // add it to the array on out-key
        let obj;
        if (!truncate && fs.existsSync(outfile)) {
          try {
            obj = JSON.parse(fs.readFileSync(outfile));
          }
          catch(x) {
            if (x instanceof SyntaxError) {
              console.log("\nError:\n  A file exists with the same name but not in a valid JSON format.\n\  Perhaps it's the result of previous operation?\n\  Please delete the file or specify another output-dir.\n");                  
            }
            else {
              console.log(x);
            }
            process.exit(-1);
          }
        }
        else {
          obj = { [outKey]: [] };
        }
        obj[outKey].push(json);
        fs.writeFileSync(outfile, JSON.stringify(obj));
      }
      else {
        const data = JSON.stringify(json) + "\n";

        // truncate if this is the first time writing to this file
        // and not appending 
        if (truncate) {
          fs.writeFileSync(outfile, data, 'utf8');
        }
        else {
          // append synchronously so writes stay in input order
          fs.appendFileSync(outfile, data, 'utf8');
        }
      }
    }
  });
}
else if (opts.merge) {
  const mergeDir = opts.merge;
  var data;
  // get the desired output format from the user
  getFormat(function (format) {
    if (Number(format) == 3 && !opts.index) {
      console.log("ESJSON output needs an index key: pass -x KEY (e.g. -x pid) and run the script again.");
      process.exit(-1);
    }
    var index = opts.index;
    var mergedString = "";
    var items = fs.readdirSync(mergeDir);
    for (var i = 0; i < items.length; i++) {
      if (items[i].endsWith(".json")) {
        data = fs.readFileSync(path.join(mergeDir, items[i]), "utf8");
        // split once per file instead of once per line
        var fileLines = data.split("\n");
        for (var a = 0; a < fileLines.length; a++) {
          var item = fileLines[a];
          if (item.trim() != "") {
            switch (Number(format)) {
              case 1: // minified JSON
                mergedString += "[" + item + "],\n";
                break;
              case 2: // NDJSON
                mergedString += item + "\n";
                break;
              case 3: // ESJSON
                // JSON.stringify escapes quotes and backslashes in the id
                mergedString += JSON.stringify({ index: { _id: String(JSON.parse(item)[index]) } }) +
                  "\n" + item + "\n";
                break;
              default:
                break;
            }
          }
        }
      }
    }
    const filename = opts['merge-output'];
    if (!filename) {
      exitErr('Please specify merge-output file.');
    }

    const filepath = path.join(getOutputPath(), filename); 

    var writeStream = fs.createWriteStream(filepath);
    writeStream.write(mergedString);
    writeStream.end();
    writeStream.on("finish", function () {
      process.exit();
    });
  });
}
else {
  console.log("Please provide a correct action");
}

// use recursion to simulate synchronous access to stdin/stdout
function getFormat(callback) {
  process.stdout.write(
    "Select output format: 1: minified JSON, 2: NDJSON, 3: ESJSON: "
  );
  process.stdin.setEncoding('utf8');
  process.stdin.once('data', function (val) {
    // check validity of input: only 1, 2 or 3 is accepted
    var choice = Number(String(val).trim());
    if (choice === 1 || choice === 2 || choice === 3) {
      callback(choice);
    }
    else {
      // if input is invalid, ask again
      getFormat(callback);
    }
  }).resume();
}

function mkDir(dir) {
  return dir.split('/').reduce((path, folder) => {
    path = path + '/' + fixName(folder);
    if (!fs.existsSync(path)) {
      fs.mkdirSync(path);
    }
    return path;
  }, '');
}

function fixName(name) {
  return name.replace(/\s+/g, '_');  
}

function getOutputPath(dir='') {
  return mkDir(path.resolve(path.join(
    opts['output-dir'], 
    dir)));
}

Solution is up here: https://github.com/slang800/bountify-split-json. I ended up rewriting it to provide a --help command like:

$ node lib/split.js --help
usage: split.js [-h] [-v] [--omit-key] KEY [DIRECTORY]

Read JSON from STDIN, split by key and output to directory

Positional arguments:
  KEY            Key that determines where to output JSON
  DIRECTORY      Location to output JSON files, defaults to ./

Optional arguments:
  -h, --help     Show this help message and exit.
  -v, --version  Show program's version number and exit.
  --omit-key     Remove the KEY field from records before writing.

I also changed the internals to use streams, so it won't crash on JSON files that are larger than memory (several GB).
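
To give a rough idea of what "using streams" means here, below is a minimal sketch. It is not the code from the repository: it assumes the input is newline-delimited JSON, and the names splitStream and outputFileFor are made up for illustration. Only one line is held in memory at a time, so input size is not limited by RAM.

```javascript
const fs = require('fs');
const path = require('path');
const readline = require('readline');

// Hypothetical helper: pick the output file for one record.
function outputFileFor(record, key, directory) {
  return path.join(directory, String(record[key]) + '.json');
}

// Hypothetical sketch: read NDJSON from a readable stream line by line
// and append each record to the file named after record[key].
async function splitStream(input, key, directory = './') {
  const lines = readline.createInterface({ input, crlfDelay: Infinity });
  for await (const line of lines) {
    if (line.trim() === '') continue; // skip blank lines
    const record = JSON.parse(line);
    fs.appendFileSync(outputFileFor(record, key, directory), line + '\n');
  }
}

// Example call (not executed here):
// splitStream(process.stdin, 'filename', './out');
```

The synchronous append keeps the sketch short; a real implementation would more likely keep write streams open per output file.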

For your merge script, I think it would be faster to use jq, since you're just concatenating a bunch of JSON together. For example, with a moderate number of files you could do this to merge and format them all:

cat dir/*.json | jq -c . > output.json

output.json would look like:

{"key2":"path/to/folder/1/","key3":"value3","key4":"value4","key5":"value5"}
{"key2":"path/to/folder/1/","key3":"value3","key4":"value4","key5":"value5"}
{"key2":"path/to/folder/2/","key3":"value3","key4":"value4","key5":"value5"}

If you want to wrap them in an array, you can do:

cat dir/*.json | jq -c --slurp . > output.json

If you have thousands/millions of files, you can do this to avoid listing all the files at once with a glob:

find dir/ -type f -name "*.json" | while IFS= read -r i; do cat "$i"; done | jq -c . > output.json

If you want it in the Elasticsearch bulk API format, jq can reformat it easily:

$ cat dir/*.json | jq -c '[{"index": {"_id": .key3}}, .] | .[]'
{"index":{"_id":"value3"}}
{"key2":"path/to/folder/1/","key3":"value3","key4":"value4","key5":"value5"}
{"index":{"_id":"value3"}}
{"key2":"path/to/folder/1/","key3":"value3","key4":"value4","key5":"value5"}
{"index":{"_id":"value3"}}
{"key2":"path/to/folder/2/","key3":"value3","key4":"value4","key5":"value5"}
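
If you would rather stay in Node for this step, the same reshaping can be sketched in a few lines. This is only an illustration of the jq filter above: toBulkLines is a made-up name, and key3 is just the example key from the sample output.

```javascript
// Hypothetical helper: build the two bulk-API lines for one record,
// an action line carrying the _id followed by the record itself.
function toBulkLines(record, idKey) {
  const action = { index: { _id: record[idKey] } };
  // JSON.stringify takes care of escaping quotes inside the id
  return JSON.stringify(action) + '\n' + JSON.stringify(record) + '\n';
}

const sample = { key2: 'path/to/folder/1/', key3: 'value3' };
process.stdout.write(toBulkLines(sample, 'key3'));
```

Building the action line with JSON.stringify rather than string concatenation avoids producing invalid JSON when an id contains a quote or backslash.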

To merge the filename and folder fields in the example file you uploaded, you can use the following:

$ cat sample_records.ndjson | jq -c '.filename = .folder + .filename | del(.folder)' | node lib/split.js filename

Usage

spmer

  Split a json-line file or merge json files.                    

  A json-line file is a file containing valid json on each line. 

Usage

  node spmer.js -s FILE [options]        
  node spmer.js -m DIR [options]         

  The first form to split FILE.          
  The second form to merge files in DIR. 

Split options

  -s   --split file     a json-line file to split                                        
  -n   --name-key key   key for the file name; objects with the same value go into the same file
  -p   --path-key key   key for the output path                                          
  -t   --omit-name      omit the name key                                                
  -u   --omit-path      omit the path key                                                
  -k   --out-key key    output the groups as array value of this key                     
  -a   --append         append to existing files                                         

Merge options

  -m   --merge dir                dir with json files to merge 
  -o   --merge-output file        merge output file            
  -x   --index ESJSON index key   specify index key for ESJSON 

General options

  -d   --output-dir dir   root output dir, defaults to current dir 
  -h   --help             show this help                           

Split example

node spmer.js -s output.jl -n type -p path -tu

Split output.jl, taking the file name from the type key and the directory from the path key, and omit both keys from the written records (-tu or -t -u).
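
As a rough illustration of how one record maps to an output file, here is a simplified sketch. targetFile is a hypothetical helper that is not part of spmer.js, and it only applies the whitespace replacement to the record's own path, whereas the script applies it to the whole resolved directory.

```javascript
const path = require('path');

// Same whitespace rule as the script's fixName(): runs of whitespace become "_".
function fixName(name) {
  return name.replace(/\s+/g, '_');
}

// Hypothetical helper: compute where one record would be written,
// given the -n and -p keys and the -d root directory.
function targetFile(record, nameKey, pathKey, outputDir = '.') {
  const folders = String(record[pathKey] || '').split('/').map(fixName);
  return path.join(outputDir, ...folders, record[nameKey] + '.json');
}

// With -n type -p path -d out, this record lands in out/canon/eos_5d/dslr.json
// (path separators differ on Windows).
console.log(targetFile({ type: 'dslr', path: 'canon/eos 5d' }, 'type', 'path', 'out'));
```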

Merge example

node spmer.js -m camera -o esjson.json -x model

Merge json files in the camera folder and output to esjson.json with index on model key.

Installation

  • Create a folder for the script and enter it
  • Create the package.json file
  • Create the spmer.js file

Then run:

npm install
npm link

Now spmer is installed globally.

package.json

{
  "name": "spmer",
  "version": "1.0.0",
  "description": "",
  "main": "spmer.js",
  "dependencies": {
    "command-line-args": "^4.0.1",
    "command-line-usage": "^4.0.0"
  },
  "devDependencies": {},
  "scripts": {
    "test": "echo \"Error: no test specified\" && exit 1"
  },
  "author": "Farolan Faisal",
  "license": "ISC",
  "preferGlobal": true,
  "bin": {
    "spmer": "spmer.js"
  }
}

spmer.js

#! /usr/bin/env node
var fs = require("fs");
var path = require("path");
const clu = require('command-line-usage');
const cla = require('command-line-args');

const splitOptions = [
  { name: 'split', alias: 's', type: String, arg: 'file', desc: 'a json-line file to split' },
  { name: 'name-key', alias: 'n', type: String, arg: 'key', desc: 'key for the file name; objects with the same value go into the same file' },
  { name: 'path-key', alias: 'p', type: String, arg: 'key', desc: 'key for the output path' },
  { name: 'omit-name', alias: 't', type: Boolean, desc: 'omit the name key' },
  { name: 'omit-path', alias: 'u', type: Boolean, desc: 'omit the path key' },
  { name: 'out-key', alias: 'k', type: String, arg: 'key', desc: 'output the groups as array value of this key' },
  { name: 'append', alias: 'a', type: Boolean, desc: 'append to existing files' },
];

const mergeOptions = [
  { name: 'merge', alias: 'm', type: String, arg: 'dir', desc: 'dir with json files to merge' },
  { name: 'merge-output', alias: 'o', type: String, arg: 'file', desc: 'merge output file' },
  { name: 'index', alias: 'x', type: String, arg: 'ESJSON index key', desc: 'specify index key for ESJSON' },
];

const generalOptions = [
  { name: 'output-dir', alias: 'd', type: String, defaultValue: '.', arg: 'dir', desc: 'root output dir, defaults to current dir' },
  { name: 'help', alias: 'h', type: Boolean, desc: 'show this help' },
];

const optionDefs = splitOptions.concat(mergeOptions).concat(generalOptions);

const help = [
  {
    header: 'spmer',
    content: [
      'Split a json-line file or merge json files.',
      '',
      'A json-line file is a file containing valid json on each line.',
    ],
  },
  {
    header: 'Usage',
    content: [
      'node spmer.js -s FILE [options]',
      'node spmer.js -m DIR [options]',
      '',
      'The first form to split FILE.',
      'The second form to merge files in DIR.',
    ],
  },
  getSectionOption('Split options', splitOptions),
  getSectionOption('Merge options', mergeOptions),
  getSectionOption('General options', generalOptions),
];

function getSectionOption(title, optionDef) {
  return {
    header: title,
    content: optionDef.map(o => ({ 
      a: '-' + o.alias, 
      b: '--' + o.name + ' ' + (o.arg || ''), 
      c: o.desc })),
  };
}

// parse options
const opts = cla(optionDefs);
// console.log(opts);

// handle errors
if (!opts.help && !opts.split && !opts.merge) {
  exitErr('Please specify an action: -s (split) or -m (merge).');
} 

function exitErr(str) {
  const errorSection = {
    'header': 'Error',
    'content': str,
  };
  help.push(errorSection);
  console.log(clu(help));
  process.exit(-1);
}

// show help
if (opts.help) {
  console.log(clu(help));
  process.exit(0);
}

if (opts.split) {
  // track the processed file
  const filenames = [];

  const filepath = opts.split
  fs.readFile(filepath, function (err, data) {
    if (err) {
      return console.error(err);
    }
    var lines = data.toString().split("\n");
    // determine the input type
    var type = "ndjson";
    // Note: The comma at the end of the line is optional. I assume the format
    // is [{object}],\n[{object}],\n[{object}]\EOF
    if (lines[0].match(/\[[^\]]*\],?/)) {
      // it's the JSON-style format [<json>],
      type = "json";
    }
    var out = "";
    for (var i = 0; i < lines.length; i++) {
      if (lines[i].trim() == "") {
        continue;
      }
      var json;
      if (type == "ndjson") {
        json = JSON.parse(lines[i]);
      }
      else if (type == "json") {
        json = JSON.parse(lines[i].match(/\[([^\]]*)\],?/)[1]);
      }

      const nameKey = opts['name-key'];
      const pathKey = opts['path-key'];

      if (!nameKey) {
        exitErr("Please specify the name-key.");
      }
      if (!pathKey) {
        exitErr("Please specify the path-key.");
      }

      const filename = json[nameKey];
      const filepath = json[pathKey];

      if (opts['omit-name']) {
        delete json[nameKey];       
      }
      if (opts['omit-path']) {
        delete json[pathKey];
      }

      const outfile = getOutputPath(filepath) + "/" + filename + ".json";

      // truncate if this is the first time writing to this file
      // and not appending 
      let truncate = false;
      if (!opts.append && !filenames.includes(outfile)) {
        truncate = true;
        filenames.push(outfile);
      }

      const outKey = opts['out-key'];
      if (outKey) {
        // add it to the array on out-key
        let obj;
        if (!truncate && fs.existsSync(outfile)) {
          try {
            obj = JSON.parse(fs.readFileSync(outfile));
          }
          catch(x) {
            if (x instanceof SyntaxError) {
              console.log("\nError:\n  A file exists with the same name but not in a valid JSON format.\n\  Perhaps it's the result of previous operation?\n\  Please delete the file or specify another output-dir.\n");                  
            }
            else {
              console.log(x);
            }
            process.exit(-1);
          }
        }
        else {
          obj = { [outKey]: [] };
        }
        obj[outKey].push(json);
        fs.writeFileSync(outfile, JSON.stringify(obj));
      }
      else {
        const data = JSON.stringify(json) + "\n";

        if (truncate) {
          fs.writeFileSync(outfile, data);
        }
        else {
          fs.appendFileSync(outfile, data);
        }
      }
    }
  });
}
else if (opts.merge) {
  const mergeDir = opts.merge;
  var data;
  // get the desired output format from the user
  getFormat(function (format) {
    if (Number(format) == 3 && !opts.index) {
      console.log("ESJSON output needs an index key: pass -x KEY (e.g. -x pid) and run the script again.");
      process.exit(-1);
    }
    var index = opts.index;
    var mergedString = "";
    var items = fs.readdirSync(mergeDir);
    for (var i = 0; i < items.length; i++) {
      if (items[i].endsWith(".json")) {
        data = fs.readFileSync(path.join(mergeDir, items[i]), "utf8");
        // split once per file instead of once per line
        var fileLines = data.split("\n");
        for (var a = 0; a < fileLines.length; a++) {
          var item = fileLines[a];
          if (item.trim() != "") {
            switch (Number(format)) {
              case 1: // minified JSON
                mergedString += "[" + item + "],\n";
                break;
              case 2: // NDJSON
                mergedString += item + "\n";
                break;
              case 3: // ESJSON
                // JSON.stringify escapes quotes and backslashes in the id
                mergedString += JSON.stringify({ index: { _id: String(JSON.parse(item)[index]) } }) +
                  "\n" + item + "\n";
                break;
              default:
                break;
            }
          }
        }
      }
    }
    const filename = opts['merge-output'];
    if (!filename) {
      exitErr('Please specify merge-output file.');
    }

    const filepath = path.join(getOutputPath(), filename); 

    var writeStream = fs.createWriteStream(filepath);
    writeStream.write(mergedString);
    writeStream.end();
    writeStream.on("finish", function () {
      process.exit();
    });
  });
}
else {
  console.log("Please provide a correct action");
}

// use recursion to simulate synchronous access to stdin/stdout
function getFormat(callback) {
  process.stdout.write(
    "Select output format: 1: minified JSON, 2: NDJSON, 3: ESJSON: "
  );
  process.stdin.setEncoding('utf8');
  process.stdin.once('data', function (val) {
    // check validity of input: only 1, 2 or 3 is accepted
    var choice = Number(String(val).trim());
    if (choice === 1 || choice === 2 || choice === 3) {
      callback(choice);
    }
    else {
      // if input is invalid, ask again
      getFormat(callback);
    }
  }).resume();
}

function mkDir(dir) {
  return dir.split('/').reduce((path, folder) => {
    path = path + '/' + fixName(folder);
    if (!fs.existsSync(path)) {
      fs.mkdirSync(path);
    }
    return path;
  }, '');
}

function fixName(name) {
  return name.replace(/\s+/g, '_');  
}

function getOutputPath(dir='') {
  return mkDir(path.resolve(path.join(
    opts['output-dir'], 
    dir)));
}

Usage

spmer

  Split a json-line file or merge json files.                    

  A json-line file is a file containing valid json on each line. 

Usage

  node spmer.js -s FILE [options]        
  node spmer.js -m DIR [options]         

  The first form to split FILE.          
  The second form to merge files in DIR. 

Split options

  -s   --split file     a json-line file to split                                        
  -n   --name-key key   key for the file name; objects with the same value go into the same file
  -p   --path-key key   key for the output path                                          
  -t   --omit-name      omit the name key                                                
  -u   --omit-path      omit the path key                                                
  -k   --out-key key    output the groups as array value of this key                     
  -a   --append         append to existing files                                         

Merge options

  -m   --merge dir                dir with json files to merge 
  -o   --merge-output file        merge output file            
  -x   --index ESJSON index key   specify index key for ESJSON 

General options

  -d   --output-dir dir   root output dir, defaults to current dir 
  -h   --help             show this help                           

Split example

node spmer.js -s output.jl -n type -p path -tu

Split output.jl, taking the file name from the type key and the directory from the path key, and omit both keys from the written records (-tu or -t -u).

Merge example

node spmer.js -m camera -o esjson.json -x model

Merge json files in the camera folder and output to esjson.json with index on model key.

Installation

  • Create a folder for the script and enter it
  • Create the package.json file
  • Create the spmer.js file

Then run:

npm install
npm link

Now spmer is installed globally.

package.json

{
  "name": "spmer",
  "version": "1.0.0",
  "description": "",
  "main": "spmer.js",
  "dependencies": {
    "command-line-args": "^4.0.1",
    "command-line-usage": "^4.0.0"
  },
  "devDependencies": {},
  "scripts": {
    "test": "echo \"Error: no test specified\" && exit 1"
  },
  "author": "Farolan Faisal",
  "license": "ISC",
  "preferGlobal": true,
  "bin": {
    "spmer": "spmer.js"
  }
}

spmer.js

#!/usr/bin/env node
var fs = require("fs");
var path = require("path");
const clu = require('command-line-usage');
const cla = require('command-line-args');

const splitOptions = [
  { name: 'split', alias: 's', type: String, arg: 'file', desc: 'a json-line file to split' },
  { name: 'name-key', alias: 'n', type: String, arg: 'key', desc: 'key for the file name; objects with the same value go into the same file' },
  { name: 'path-key', alias: 'p', type: String, arg: 'key', desc: 'key for the output path' },
  { name: 'omit-name', alias: 't', type: Boolean, desc: 'omit the name key' },
  { name: 'omit-path', alias: 'u', type: Boolean, desc: 'omit the path key' },
  { name: 'out-key', alias: 'k', type: String, arg: 'key', desc: 'output the groups as array value of this key' },
  { name: 'append', alias: 'a', type: Boolean, desc: 'append to existing files' },
];

const mergeOptions = [
  { name: 'merge', alias: 'm', type: String, arg: 'dir', desc: 'dir with json files to merge' },
  { name: 'merge-output', alias: 'o', type: String, arg: 'file', desc: 'merge output file' },
  { name: 'index', alias: 'x', type: String, arg: 'ESJSON index key', desc: 'specify index key for ESJSON' },
];

const generalOptions = [
  { name: 'output-dir', alias: 'd', type: String, defaultValue: '.', arg: 'dir', desc: 'root output dir, defaults to current dir' },
  { name: 'help', alias: 'h', type: Boolean, desc: 'show this help' },
];

const optionDefs = splitOptions.concat(mergeOptions).concat(generalOptions);

const help = [
  {
    header: 'spmer',
    content: [
      'Split a json-line file or merge json files.',
      '',
      'A json-line file is a file containing valid json on each line.',
    ],
  },
  {
    header: 'Usage',
    content: [
      'node spmer.js -s FILE [options]',
      'node spmer.js -m DIR [options]',
      '',
      'The first form to split FILE.',
      'The second form to merge files in DIR.',
    ],
  },
  getSectionOption('Split options', splitOptions),
  getSectionOption('Merge options', mergeOptions),
  getSectionOption('General options', generalOptions),
];

function getSectionOption(title, optionDef) {
  return {
    header: title,
    content: optionDef.map(o => ({ 
      a: '-' + o.alias, 
      b: '--' + o.name + ' ' + (o.arg || ''), 
      c: o.desc })),
  };
}

// parse options
const opts = cla(optionDefs);
// console.log(opts);

// handle errors
if (!opts.help && !opts.split && !opts.merge) {
  exitErr('Please specify an action: -s (split) or -m (merge).');
} 

function exitErr(str) {
  const errorSection = {
    'header': 'Error',
    'content': str,
  };
  help.push(errorSection);
  console.log(clu(help));
  process.exit(-1);
}

// show help
if (opts.help) {
  console.log(clu(help));
  process.exit(0);
}

if (opts.split) {
  // track the processed file
  const filenames = [];

  const filepath = opts.split
  fs.readFile(filepath, function (err, data) {
    if (err) {
      return console.error(err);
    }
    var lines = data.toString().split("\n");
    // determine the input type
    var type = "ndjson";
    // Note: The comma at the end of the line is optional. I assume the format
    // is [{object}],\n[{object}],\n[{object}]\EOF
    if (lines[0].match(/\[[^\]]*\],?/)) {
      // it's the JSON-style format [<json>],
      type = "json";
    }
    var out = "";
    for (var i = 0; i < lines.length; i++) {
      if (lines[i].trim() == "") {
        continue;
      }
      var json;
      if (type == "ndjson") {
        json = JSON.parse(lines[i]);
      }
      else if (type == "json") {
        json = JSON.parse(lines[i].match(/\[([^\]]*)\],?/)[1]);
      }

      const nameKey = opts['name-key'];
      const pathKey = opts['path-key'];

      if (!nameKey) {
        exitErr("Please specify the name-key.");
      }

      const filename = json[nameKey];
      const filepath = json[pathKey] || '';

      if (opts['omit-name']) {
        delete json[nameKey];       
      }
      if (opts['omit-path']) {
        delete json[pathKey];
      }

      const outfile = getOutputPath(filepath) + "/" + filename + ".json";

      // truncate if this is the first time writing to this file
      // and not appending 
      let truncate = false;
      if (!opts.append && !filenames.includes(outfile)) {
        truncate = true;
        filenames.push(outfile);
      }

      const outKey = opts['out-key'];
      if (outKey) {
        // add it to the array on out-key
        let obj;
        if (!truncate && fs.existsSync(outfile)) {
          try {
            obj = JSON.parse(fs.readFileSync(outfile));
          }
          catch(x) {
            if (x instanceof SyntaxError) {
              console.log("\nError:\n  A file exists with the same name but not in a valid JSON format.\n\  Perhaps it's the result of previous operation?\n\  Please delete the file or specify another output-dir.\n");                  
            }
            else {
              console.log(x);
            }
            process.exit(-1);
          }
        }
        else {
          obj = { [outKey]: [] };
        }
        obj[outKey].push(json);
        fs.writeFileSync(outfile, JSON.stringify(obj));
      }
      else {
        const data = JSON.stringify(json) + "\n";

        if (truncate) {
          fs.writeFileSync(outfile, data);
        }
        else {
          fs.appendFileSync(outfile, data);
        }
      }
    }
  });
}
else if (opts.merge) {
  const mergeDir = opts.merge;
  var data;
  // get the desired output format from the user
  getFormat(function (format) {
    if (Number(format) == 3 && !opts.index) {
      console.log("ESJSON output needs an index key: pass -x KEY (e.g. -x pid) and run the script again.");
      process.exit(-1);
    }
    var index = opts.index;
    var mergedString = "";
    var items = fs.readdirSync(mergeDir);
    for (var i = 0; i < items.length; i++) {
      if (items[i].endsWith(".json")) {
        data = fs.readFileSync(path.join(mergeDir, items[i]), "utf8");
        // split once per file instead of once per line
        var fileLines = data.split("\n");
        for (var a = 0; a < fileLines.length; a++) {
          var item = fileLines[a];
          if (item.trim() != "") {
            switch (Number(format)) {
              case 1: // minified JSON
                mergedString += "[" + item + "],\n";
                break;
              case 2: // NDJSON
                mergedString += item + "\n";
                break;
              case 3: // ESJSON
                // JSON.stringify escapes quotes and backslashes in the id
                mergedString += JSON.stringify({ index: { _id: String(JSON.parse(item)[index]) } }) +
                  "\n" + item + "\n";
                break;
              default:
                break;
            }
          }
        }
      }
    }
    const filename = opts['merge-output'];
    if (!filename) {
      exitErr('Please specify merge-output file.');
    }

    const filepath = path.join(getOutputPath(), filename); 

    var writeStream = fs.createWriteStream(filepath);
    writeStream.write(mergedString);
    writeStream.end();
    writeStream.on("finish", function () {
      process.exit();
    });
  });
}
else {
  console.log("Please provide a correct action");
}

// use recursion to simulate synchronous access to stdin/stdout
function getFormat(callback) {
  process.stdout.write(
    "Select output format: 1: minified JSON, 2: NDJSON, 3: ESJSON: "
  );
  process.stdin.setEncoding('utf8');
  process.stdin.once('data', function (val) {
    // check validity of input: only 1, 2 or 3 is accepted
    var choice = Number(String(val).trim());
    if (choice === 1 || choice === 2 || choice === 3) {
      callback(choice);
    }
    else {
      // if input is invalid, ask again
      getFormat(callback);
    }
  }).resume();
}

function mkDir(dir) {
  return dir.split('/').reduce((path, folder) => {
    path = path + '/' + fixName(folder);
    if (!fs.existsSync(path)) {
      fs.mkdirSync(path);
    }
    return path;
  }, '');
}

function fixName(name) {
  return name.replace(/\s+/g, '_');  
}

function getOutputPath(dir='') {
  return mkDir(path.resolve(path.join(
    opts['output-dir'], 
    dir)));
}
Winning solution