Need large .csv column split and loaded into a serverless DB
New here? Learn about Bountify and follow @bountify to get notified of new bounties! x

I have a 3M record .csv file (short records) that I want to load into a serverless database and access remotely via GraphQL from a React-Native app. I want fo split one of the columns City and State (separated by comma) into two columns.

I thought of using Redshift and loading it with Lodr.io Lodr is too expensive with this number of records.

I'm looking for suggestions on the column splitting, loading and the database choice.

awarded to alixaxel

Crowdsource coding tasks.

2 Solutions

Winning solution

Not GraphQL, but have you looked at AWS Athena?

You just upload your CSV to S3 and you can perform queries over these files with SQL.

With Amazon Athena, you pay only for the queries that you run. You are charged $5 per terabyte scanned by your queries. You can save from 30% to 90% on your per-query costs and get better performance by compressing, partitioning, and converting your data into columnar formats. Athena queries data directly in Amazon S3. There are no additional storage charges beyond S3.

Check the getting started tutorial: https://docs.aws.amazon.com/athena/latest/ug/getting-started.html


Hi,

just in case you want to pre-process the CSV and split the column, you can do this:

<?php

// Usage:
// php split.php < input.csv > output.csv

// CSV format example:
// street,"city, state",zip
// "3526 HIGH ST","SACRAMENTO, CA",95838
// "51 OMAHA CT","SACRAMENTO, CA",95823
// "2796 BRANCH ST","SACRAMENTO, CA",95815
// "2805 JANETTE WAY","SACRAMENTO, CA",95815
// "6001 MCMAHON DR","SACRAMENTO, CA",95824
// "5828 PEPPERMILL CT","SACRAMENTO, CA",95841
// "6048 OGDEN NASH WAY","SACRAMENTO, CA",95842
// "2561 19TH AVE","SACRAMENTO, CA",95820
// "11150 TRINITY RIVER DR Unit 114","RANCHO CORDOVA, CA",95670

ini_set('auto_detect_line_endings', TRUE); // added to manage line endings

$in  = fopen('php://stdin', 'r');
$out = fopen('php://stdout', 'w');

while ( ! feof($in)) {

    $csv_line = fgetcsv($in);

    // skip empty lines
    if(empty($csv_line))
        continue;

    $split    = explode(',', $csv_line[1]); // "City, State"
    $output   = [
        $csv_line[0],    // Street
        trim($split[0]), // City
        trim($split[1]), // State
        $csv_line[2]     // Zip
    ];

    fputcsv($out, $output);
}

fclose($out);
fclose($in);
Great input but I already awarded. Thanks.
billsouthworth 5 months ago
No problem. Thanks to you.
cueball 5 months ago
View Timeline