Filter JSON file by value for single field
New here? Learn about Bountify and follow @bountify to get notified of new bounties! x

Hi all!

I'm working with this pretty gigantic JSON file of business registration IDs and addresses from the State of Georgia, and I need to find a way to filter the dataset down only to include the records in which the string field entitled City has a value of Savannah. I'm sure there's an easy way to do this programmatically that I'm missing. Can someone help?

Will php code do?
Codeword 1 month ago
hello paragon, I have implemented the solution in both php and python, Select any of the two solutions you want and comfortable with. Thank you
Codeword 1 month ago
@codeword This isn't working for me. When I run it, I just get the following: {"created_at":"2015-08-20T16:06:47.649776+00:00","dataset":{"created_at":"2015-08-20T16:06:47.649776+00:00","description":"Business ID's and corporate addresses.","description_short":"Business ID's and corporate addresses.","display_name":"Georgia Corporate Registrations - Addresses","id":"d37709e1-6fbe-41cb-a957-a4b5fde9e5cc","modified_at":"2017-06-30T07:00:54.518257+00:00","published":true},"fields":[{"data_type":"string","description":"Business Entity ID","display_name":"Business Entity ID","name":"bizentityid","visible_by_default":true},{"data_type":"string","description":"Control Number","display_name":"Control Number","name":"control_number","visible_by_default":true},{"data_type":"string","description":"
paragon21 1 month ago
Okay which version of code you are running php or python ?
Codeword 1 month ago
Just noticed the Python one. Running it now.
paragon21 1 month ago
Let me know what the output comes, Both version of code(python and php) I have tested before posting solution.Thank you
Codeword 1 month ago
@codeword -- here's the error message I'm getting when running your python script in Python 3.6
paragon21 1 month ago
what the code is doing , it modify the input data. The output data contains all the info of input json file but the rows in final output data is only the filtered rows.
Codeword 1 month ago
This is code is for python 2.7. see its running https://drive.google.com/open?id=0Bycm505pe6VGbmExekNSTUlVUTg I have also tested the php code
Codeword 1 month ago
for python 3.6 do this print( json.dumps(data) ) enclose print in parenthese.Try this
Codeword 1 month ago
you can use this site to get an organised view of output json . http://jsbeautifier.org/
Codeword 1 month ago
See here https://drive.google.com/open?id=0Bycm505pe6VGZ095OW5yRUNSbTA say if we want the row, with city = lawrenceville then it outputs the result. Note :Let me know if you want the extra info like created at, fields etc to be removed.
Codeword 1 month ago
hey, paragon I have updated both the code(PHP and python). Note: 1. updated my python code to work with 3.6 but remember python one is case sensitive 2. updated my php code to show a no result mesage php one is case- insensitive Please check the code and let me know if you want any modification.Thank you
Codeword 1 month ago
Please check my latest updated python code. You can also run and test my php code here https://goo.gl/vaUcoE . put in the city name just run execute.
Codeword 1 month ago
Hi @codeword, none of the other provided solutions by other users is doing the trick and I haven’t had time to test your latest updates, so an awarding to you with assumption you’ve got it working fine and I can reproduce. Thanks again!!
paragon21 23 days ago
Hello paragon, if any error or problem occurs let me know.Thank you
Codeword 23 days ago
awarded to Codeword

Crowdsource coding tasks.

6 Solutions


disclaimer: the solution is using javascript/nodeJs

assuming the dataset is an array of the json object you provided named ARR

ARR.filter(v => v.table_rows.rows.some(row => row[4] === "Savannah"))

otherwise, if you only need to filter the row in a single json object named A

A.table_rows.rows.filter(row => row[4] === "Savannah")


#!/usr/bin/env python3


import sys
import os
import json


# -----------------------------------------------------------------------------
def main(argv = None):
    """
    Main function.

    """

    if argv is None:
        argv = sys.argv

    relpath_self  = __file__ if __file__ else argv[0]
    filepath_self = os.path.abspath(relpath_self)
    dirpath_self  = os.path.dirname(filepath_self)
    filepath_data = os.path.join(dirpath_self, 'ga_corporations_address.json')

    field_display_name = 'City'
    query_string = 'Atlanta'

    with open(filepath_data, 'rt', encoding = 'utf8') as file_data:
        json_data = file_data.read()

    data = json.loads(json_data)

    filtered_data = _filter(data, field_display_name, query_string)

    filepath_out = os.path.join(dirpath_self,
                                'ga_corporations_address_filtered.json')

    with open(filepath_out, 'wt', encoding = 'utf8') as file_out:
        file_out.write(json.dumps(filtered_data, indent=4, sort_keys=True))


# -----------------------------------------------------------------------------
def _filter(data, field_display_name, query_string):
    """
    Fiter the specified data.

    """
    # Find the field name for display_name == 'City'
    #
    name = None
    for field in data['fields']:
        if field['display_name'] == 'City':
            name = field['name']
    assert name is not None

    # Find the index for our field name
    #
    index = None
    for (ifield, field) in enumerate(data['table_rows']['fields']):
        if field == name:
            index = ifield
    assert index is not None

    # Build the new rows.
    #
    query_string = query_string.lower()
    filtered_rows = list()
    for row in data['table_rows']['rows']:
        if row[index].lower() == query_string:
            filtered_rows.append(row)

    # Replace the old rows with the new rows.
    #
    data['table_rows']['rows'] = filtered_rows
    return data


# -----------------------------------------------------------------------------
if __name__ == "__main__":
    sys.exit(main())

PHP CODE

Hello paragon, this is my solution.Its 100% working I have tested it using php

<?php

$json_string = file_get_contents("ga_corporations_address.json"); // reading content of json file
$json_array = json_decode($json_string, true); // converting json object to an associative array
$json_rows = $json_array['table_rows']['rows']; // getting all the rows of data

$filtered_array = search_array($json_rows,"Lawrenceville");  // searching of the city name( case insensitive)
$json_array['table_rows']['count'] = count($filtered_array); // upadating rows count after filter
$json_array['table_rows']['rows'] = $filtered_array;  // upadting the rows array with the filtered data

$json_out = json_encode($json_array); // converting array to json object

if(count($filtered_array)){
    print_r($json_out);
}else{
    echo "No results found";
}



function search_array($array,$city){
    $filtered_result = array();
    $city_lowercase =strtolower($city);
    foreach ($array as $value) {
        if(in_arrayi($city, $value) || in_arrayi($city_lowercase, $value)){
            array_push($filtered_result, $value);
        }
    }

    return $filtered_result;
}

// case insensitive array search
function in_arrayi($needle, $haystack){
    return in_array(strtolower($needle), array_map('strtolower', $haystack));
}

?>
Winning solution

PYTHON CODE
Hello paragon, this is my solution.Its 100% working I have tested it using python 2.7

import json
with open('ga_corporations_address.json') as data_file:
    data = json.load(data_file)

def search_array(array,city):
    filtered_result = [];
    for inner_arr in array:
        inner_arr[4] = inner_arr[4].lower()
    for value in array:
        if city.lower() in value:
            value[4] = value[4].upper()
            filtered_result.append(value)
    return filtered_result

json_rows = data['table_rows']['rows']
filtered_array = search_array(json_rows,"LAWRENCEVILL")
data['table_rows']['count'] = len(filtered_array)
data['table_rows']['rows'] = filtered_array

if len(filtered_array) !=0:
    print(json.dumps(data))
else:
    print("No results found")
I haven’t ran this in 2.7 yet, only 3.4, so that’s likely the issue.
paragon21 23 days ago

Here's a drag and drop interface. Type in the city name, e.g. 'atlanta' and drag and drop the json file in the drop area. You will get an updated json file with only the selected city as entries.

https://codepen.io/kostasx/pen/veyNrR

This just didn’t work for me. It would spit out simply the index field values of JSON.
paragon21 23 days ago

You can do this in 1 line with jq, a program specifically written for processing streams of JSON:

cat ga_corporations_address.json | jq '.table_rows.rows[] | select((.[4] | ascii_downcase) == "savannah")'

The .table_rows.rows[] selects all the rows of actual data in the file, .[4] is the city field, according to the description that the fields object gives, and ascii_downcase is to lowercase the city name, so we can do a case-insensitive match.

Thanks for letting me know about jq. I’m not familiar with it and honestly don’t have the time investment to familiarize myself with it for this one task, but am leaving a tip for the future possible shortcut.
paragon21 23 days ago
View Timeline