Filter JSON file by value for single field
New here? Learn about Bountify and follow @bountify to get notified of new bounties! x

Hi all!

I'm working with this pretty gigantic JSON file of business registration IDs and addresses from the State of Georgia, and I need to find a way to filter the dataset down only to include the records in which the string field entitled City has a value of Savannah. I'm sure there's an easy way to do this programmatically that I'm missing. Can someone help?

Will php code do?
Codeword over 2 years ago
hello paragon, I have implemented the solution in both php and python, Select any of the two solutions you want and comfortable with. Thank you
Codeword over 2 years ago
@codeword This isn't working for me. When I run it, I just get the following: {"created_at":"2015-08-20T16:06:47.649776+00:00","dataset":{"created_at":"2015-08-20T16:06:47.649776+00:00","description":"Business ID's and corporate addresses.","description_short":"Business ID's and corporate addresses.","display_name":"Georgia Corporate Registrations - Addresses","id":"d37709e1-6fbe-41cb-a957-a4b5fde9e5cc","modified_at":"2017-06-30T07:00:54.518257+00:00","published":true},"fields":[{"data_type":"string","description":"Business Entity ID","display_name":"Business Entity ID","name":"bizentityid","visible_by_default":true},{"data_type":"string","description":"Control Number","display_name":"Control Number","name":"control_number","visible_by_default":true},{"data_type":"string","description":"
paragon21 over 2 years ago
Okay which version of code you are running php or python ?
Codeword over 2 years ago
Just noticed the Python one. Running it now.
paragon21 over 2 years ago
Let me know what the output comes, Both version of code(python and php) I have tested before posting solution.Thank you
Codeword over 2 years ago
@codeword -- here's the error message I'm getting when running your python script in Python 3.6
paragon21 over 2 years ago
what the code is doing , it modify the input data. The output data contains all the info of input json file but the rows in final output data is only the filtered rows.
Codeword over 2 years ago
This is code is for python 2.7. see its running I have also tested the php code
Codeword over 2 years ago
for python 3.6 do this print( json.dumps(data) ) enclose print in parenthese.Try this
Codeword over 2 years ago
you can use this site to get an organised view of output json .
Codeword over 2 years ago
See here say if we want the row, with city = lawrenceville then it outputs the result. Note :Let me know if you want the extra info like created at, fields etc to be removed.
Codeword over 2 years ago
hey, paragon I have updated both the code(PHP and python). Note: 1. updated my python code to work with 3.6 but remember python one is case sensitive 2. updated my php code to show a no result mesage php one is case- insensitive Please check the code and let me know if you want any modification.Thank you
Codeword over 2 years ago
Please check my latest updated python code. You can also run and test my php code here . put in the city name just run execute.
Codeword over 2 years ago
Hi @codeword, none of the other provided solutions by other users is doing the trick and I haven’t had time to test your latest updates, so an awarding to you with assumption you’ve got it working fine and I can reproduce. Thanks again!!
paragon21 over 2 years ago
Hello paragon, if any error or problem occurs let me know.Thank you
Codeword over 2 years ago
awarded to Codeword

Crowdsource coding tasks.

5 Solutions

disclaimer: the solution is using javascript/nodeJs

assuming the dataset is an array of the json object you provided named ARR

ARR.filter(v => v.table_rows.rows.some(row => row[4] === "Savannah"))

otherwise, if you only need to filter the row in a single json object named A

A.table_rows.rows.filter(row => row[4] === "Savannah")

#!/usr/bin/env python3

import sys
import os
import json

# -----------------------------------------------------------------------------
def main(argv = None):
    Main function.


    if argv is None:
        argv = sys.argv

    relpath_self  = __file__ if __file__ else argv[0]
    filepath_self = os.path.abspath(relpath_self)
    dirpath_self  = os.path.dirname(filepath_self)
    filepath_data = os.path.join(dirpath_self, 'ga_corporations_address.json')

    field_display_name = 'City'
    query_string = 'Atlanta'

    with open(filepath_data, 'rt', encoding = 'utf8') as file_data:
        json_data =

    data = json.loads(json_data)

    filtered_data = _filter(data, field_display_name, query_string)

    filepath_out = os.path.join(dirpath_self,

    with open(filepath_out, 'wt', encoding = 'utf8') as file_out:
        file_out.write(json.dumps(filtered_data, indent=4, sort_keys=True))

# -----------------------------------------------------------------------------
def _filter(data, field_display_name, query_string):
    Fiter the specified data.

    # Find the field name for display_name == 'City'
    name = None
    for field in data['fields']:
        if field['display_name'] == 'City':
            name = field['name']
    assert name is not None

    # Find the index for our field name
    index = None
    for (ifield, field) in enumerate(data['table_rows']['fields']):
        if field == name:
            index = ifield
    assert index is not None

    # Build the new rows.
    query_string = query_string.lower()
    filtered_rows = list()
    for row in data['table_rows']['rows']:
        if row[index].lower() == query_string:

    # Replace the old rows with the new rows.
    data['table_rows']['rows'] = filtered_rows
    return data

# -----------------------------------------------------------------------------
if __name__ == "__main__":
Winning solution

Hello paragon, this is my solution.Its 100% working I have tested it using python 2.7

import json
with open('ga_corporations_address.json') as data_file:
    data = json.load(data_file)

def search_array(array,city):
    filtered_result = [];
    for inner_arr in array:
        inner_arr[4] = inner_arr[4].lower()
    for value in array:
        if city.lower() in value:
            value[4] = value[4].upper()
    return filtered_result

json_rows = data['table_rows']['rows']
filtered_array = search_array(json_rows,"LAWRENCEVILL")
data['table_rows']['count'] = len(filtered_array)
data['table_rows']['rows'] = filtered_array

if len(filtered_array) !=0:
    print("No results found")
I haven’t ran this in 2.7 yet, only 3.4, so that’s likely the issue.
paragon21 over 2 years ago

Here's a drag and drop interface. Type in the city name, e.g. 'atlanta' and drag and drop the json file in the drop area. You will get an updated json file with only the selected city as entries.

This just didn’t work for me. It would spit out simply the index field values of JSON.
paragon21 over 2 years ago

You can do this in 1 line with jq, a program specifically written for processing streams of JSON:

cat ga_corporations_address.json | jq '.table_rows.rows[] | select((.[4] | ascii_downcase) == "savannah")'

The .table_rows.rows[] selects all the rows of actual data in the file, .[4] is the city field, according to the description that the fields object gives, and ascii_downcase is to lowercase the city name, so we can do a case-insensitive match.

Thanks for letting me know about jq. I’m not familiar with it and honestly don’t have the time investment to familiarize myself with it for this one task, but am leaving a tip for the future possible shortcut.
paragon21 over 2 years ago
View Timeline