slang's solution to "Add additional output functionality to existing Node script"

Solution is up here: https://github.com/slang800/bountify-split-json. I ended up rewriting it to provide a `--help` command like:

```
$ node lib/split.js --help
usage: split.js [-h] [-v] [--omit-key] KEY [DIRECTORY]

Read JSON from STDIN, split by key and output to directory

Positional arguments:
  KEY            Key that determines where to output JSON
  DIRECTORY      Location to output JSON files, defaults to ./

Optional arguments:
  -h, --help     Show this help message and exit.
  -v, --version  Show program's version number and exit.
  --omit-key     Remove the KEY field from records before writing.
```

I also changed the internals to use streams, so it won't crash when your JSON files are larger than memory (several GB).

For your merge script, I think it would be faster to use [jq](https://stedolan.github.io/jq/), since you're just `cat`ing a bunch of JSON together. For example, with a moderate number of files you could do this to merge and format them all:

```
cat dir/*.json | jq -c . > output.json
```

`output.json` would look like:

```
{"key2":"path/to/folder/1/","key3":"value3","key4":"value4","key5":"value5"}
{"key2":"path/to/folder/1/","key3":"value3","key4":"value4","key5":"value5"}
{"key2":"path/to/folder/2/","key3":"value3","key4":"value4","key5":"value5"}
```

If you want to wrap them in an array, you can do:

```
cat dir/*.json | jq --slurp -c . > output.json
```

If you have thousands/millions of files, you can do this to avoid listing all the files at once with a glob:

```
find dir/ -type f -name "*.json" | while read i; do cat "$i"; done | jq -c . > output.json
```

If you want it in the ElasticSearch bulk API format, then jq can reformat it easily:

```
$ cat dir/*.json | jq -c '[{"index": {"_id": .key3}}, .] | .[]'
{"index":{"_id":"value3"}}
{"key2":"path/to/folder/1/","key3":"value3","key4":"value4","key5":"value5"}
{"index":{"_id":"value3"}}
{"key2":"path/to/folder/1/","key3":"value3","key4":"value4","key5":"value5"}
{"index":{"_id":"value3"}}
{"key2":"path/to/folder/2/","key3":"value3","key4":"value4","key5":"value5"}
```

To merge the `filename` and `folder` fields in the example file you uploaded, you can use the following:

```
$ cat sample_records.ndjson | jq -c '.filename = .folder + .filename | del(.folder)' | node lib/split.js filename
```

User: slang

Question: Add additional output functionality to existing Node script
