Excel-style CSV parser for unix

Standard Unix programs (cut, etc.) use newline delimiters.
I would like a script (it should run with default Mac tools, bash preferred) that converts Excel-style CSV (which may contain quotes) into delimiter-separated entries, and a second program to do the reverse.

Example:

csvtodsv -d "\t" <<EOF
"a ""b", c, d
EOF

should output

a "b\tc\td\n

Raw delimiters should be escaped. (Yes, I know cut won't understand the results)
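
One way to read the requirement above is sketched here in Python 3 with the standard csv module. Several things are assumptions, not part of the original post: the function name csv_to_dsv is hypothetical, raw delimiters (and backslashes) inside a field are escaped with a backslash, and fields containing newlines are not handled.

```python
import csv
import io


def csv_to_dsv(text, delimiter="\t"):
    """Convert Excel-style CSV text to delimiter-separated lines.

    Assumption: a raw delimiter inside a field is escaped with a
    backslash, and a literal backslash is doubled. The original post
    does not specify an escaping scheme.
    """
    # newline="" keeps line endings untouched so the csv module sees them.
    # skipinitialspace=True drops the blank after each comma, as in the example.
    reader = csv.reader(io.StringIO(text, newline=""), skipinitialspace=True)
    lines = []
    for row in reader:
        escaped = [
            field.replace("\\", "\\\\").replace(delimiter, "\\" + delimiter)
            for field in row
        ]
        lines.append(delimiter.join(escaped))
    return "".join(line + "\n" for line in lines)


if __name__ == "__main__":
    # repr() makes the tab and newline characters visible.
    print(repr(csv_to_dsv('"a ""b", c, d\n')))
```

Backslash escaping is only one possible choice; any scheme works as long as the reverse program undoes it exactly.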

dsvtocsv should be the exact reverse of csvtodsv (i.e. cat | csvtodsv | dsvtocsv is just cat, up to whitespace).

dsvtocsv -d "\n" << EOF
a "b
c
d
EOF

outputs

"a ""b",c,d
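
The quoting in that output is what Python's csv.writer produces by default, so the single-record case can be sketched as follows. This is only an illustration in Python 3: the function name fields_to_csv_line is hypothetical, and the fields are assumed to arrive one per line with no embedded newlines.

```python
import csv
import io


def fields_to_csv_line(fields):
    """Write one record as an Excel-style CSV line.

    The default QUOTE_MINIMAL mode quotes only fields that contain a
    comma, a quote, or a line-terminator character, and doubles
    embedded quotes.
    """
    out = io.StringIO()
    # lineterminator="\n" replaces the csv module's default "\r\n".
    csv.writer(out, lineterminator="\n").writerow(fields)
    return out.getvalue()


if __name__ == "__main__":
    # Newline-delimited input: one field per line (assumed format).
    dsv = 'a "b\nc\nd\n'
    print(fields_to_csv_line(dsv.splitlines()), end="")
```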

Edit: Multiline example as requested

csvtodsv -d "\t" <<EOF
"a ""b", c, d
"a ""b", c, d
EOF

outputs

a "b\tc\td\na "b\tc\td\n

and

csvtodsv -d "\n" <<EOF
"a ""b", c, d
"a ""b", c, d
EOF

outputs

a "b\nc\nd\n%\na "b\nc\nd\n

or written out:

a "b
c
d
%
a "b
c
d
Should csvtodsv deal with commas inside fields (like "this, that")? Or can you guarantee that there will be no commas inside fields? In both examples there is just one record. Do you want to read in more than one record? In the first example the obvious record separator would be "\n". What should the record separator be in the second example? Maybe "\n\n"? Apart from that, I would guess that awk or perl would be the tools to tackle this. Are these default Mac tools?
mamawe over 5 years ago
awk and perl are both default tools; you can ask anyone with a Mac. I guess the standard way to separate multi-line records is % on a line by itself? Empty strings are fairly common in CSV, so you can't use \n\n between CSV records.
vanceza over 5 years ago
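
The reasoning in this comment can be illustrated with a minimal Python 3 sketch. The function name csv_to_newline_dsv is hypothetical, and the sketch assumes no field is literally "%" and no field contains a newline; the thread does not say how those cases should be escaped.

```python
import csv
import io

RECORD_SEPARATOR = "%"  # the convention suggested in the comment above


def csv_to_newline_dsv(text):
    """Emit one field per line, with a lone "%" line between records."""
    reader = csv.reader(io.StringIO(text, newline=""), skipinitialspace=True)
    # Each record becomes its fields, each followed by a newline.
    records = ["".join(field + "\n" for field in row) for row in reader]
    return (RECORD_SEPARATOR + "\n").join(records)


if __name__ == "__main__":
    print(csv_to_newline_dsv('"a ""b", c, d\n"a ""b", c, d\n'), end="")
    # An empty field already yields a blank line inside a single record,
    # so a blank line could not also serve as the record separator.
    print(repr(csv_to_newline_dsv("a,,b\n")))
```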
It could include commas -- it can include anything legal in excel-style csv (see http://tools.ietf.org/html/rfc4180)
vanceza over 5 years ago
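
As a small illustration of what "anything legal" covers, the Python 3 sketch below parses quoted fields containing a comma, a doubled quote, and a line break with the standard csv module. The helper name parse_csv is hypothetical.

```python
import csv
import io


def parse_csv(text):
    """Return the records of an Excel-style CSV string as lists of fields."""
    # newline="" is needed so line breaks inside quoted fields survive intact.
    return list(csv.reader(io.StringIO(text, newline="")))


if __name__ == "__main__":
    sample = '"this, that",x\n"say ""hi""",y\n"line1\nline2",z\n'
    for record in parse_csv(sample):
        print(record)
```

A line-oriented tool that splits on every comma or newline would break all three of these records, which is why a real CSV parser is needed for the conversion.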
awarded to iurisilvio
Tags
bash
unix


2 Solutions


My solution is here: https://github.com/iurisilvio/csvtodsv

I did it in Python, which I believe is standard on Mac (Python 2.6 or 2.7).

I used the Python csv module almost as-is, but had issues with the \n separator. It worked in my manual tests, so I expect it will work fine for you.

Could you modify this to handle more than one record, as in the comments above?
vanceza over 5 years ago
Can you add a multiline example to your question? I'm not sure how to handle it. How should a,b\nc,d\n be converted with -d "\n"?
iurisilvio over 5 years ago
Thanks, now I know how to do it. I will work on that soon.
iurisilvio over 5 years ago
Winning solution

I updated my GitHub repository: https://github.com/iurisilvio/csvtodsv

Some working tests:

bash-4.1$ echo "a,b,c
d,e,e" | ./csvtodsv -d "\t" | ./dsvtocsv -d "\t"
a,b,c
d,e,e

bash-4.1$ cat csv.txt
"a ""b",c,d
f,"g ""h",i

bash-4.1$ cat csv.txt | ./csvtodsv -d "\n"
"a ""b"
c
d
%
f
"g ""h"
i

bash-4.1$ cat csv.txt | ./csvtodsv -d "\n" | ./dsvtocsv -d "\n"
['"a ""b",c,d', 'f,"g ""h",i']
"a ""b",c,d
f,"g ""h",i
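
For comparison, the reverse direction for newline-delimited input could be sketched in Python 3 as below. This is not the code from the linked repository: the function name newline_dsv_to_csv is hypothetical, and the input is assumed to hold unquoted fields, one per line, with a lone "%" line between records, as in the question's examples.

```python
import csv
import io


def newline_dsv_to_csv(text, separator="%"):
    """Rebuild Excel-style CSV from one-field-per-line input."""
    # Drop the final newline so split() does not yield a spurious empty field.
    body = text[:-1] if text.endswith("\n") else text
    rows, current = [], []
    for line in body.split("\n"):
        if line == separator:
            # A lone separator line closes the current record.
            rows.append(current)
            current = []
        else:
            current.append(line)
    rows.append(current)
    out = io.StringIO()
    csv.writer(out, lineterminator="\n").writerows(rows)
    return out.getvalue()


if __name__ == "__main__":
    dsv = 'a "b\nc\nd\n%\nf\ng "h\ni\n'
    print(newline_dsv_to_csv(dsv), end="")
```

Because empty lines are treated as empty fields, a record such as a,,b survives the round trip; a field that is literally "%" would not, and would need an escaping rule the thread does not define.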