Script to extract and save elements from XML files

I need a script that extracts the value of each attribute in each row of a set of XML files and saves them as simple markdown files in a target directory.
The gist of the script is as follows:

  1. Loop through each ‘product’ row in the XML file
  2. Combine the name of each attribute and its value and output them like this:

    'attribute_name': 'attribute_value'

  3. Insert 'layout: single-product' to the top of each markdown file

  4. Insert the value of 'merchantName' attribute to the top of each markdown file

  5. If the attribute ‘manufacturer_name’ is missing, use the value of ‘merchantName’ instead,
    e.g. manufacturer_name: value_of_merchant_name

  6. If it is not missing, just use the value of 'manufacturer_name'

  7. The final output from each row in XML should be as follows:

    /---

    layout: single-product

    merchantName: ‘value_of_merchant_name’

    attribute_name_1: value_of_attribute_1

    attribute_name_2: value_of_attribute_2

    etc

    /---

  8. Also, if a markdown file with the same title already exists in the target directory, don't add a duplicate.

Note: remove the leading / (forward slash) from the /--- lines above. Bountify won't let me insert three dashes for some reason.
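
Steps 1 to 7 above could be sketched in Python roughly as follows. This is only an illustration, not the code from either solution. It assumes each product is a `<product>` element whose data sits in XML attributes (if the feed uses child elements instead, those would have to be read instead), and the function name `build_front_matter` is made up. The leading `/` is dropped from the dashes as the note asks, and values are written unquoted, as in the attribute lines of step 7.

```python
import xml.etree.ElementTree as ET


def build_front_matter(product):
    """Return the front matter text for one <product> element."""
    # Copy the attributes so the element itself is left untouched
    attrs = dict(product.attrib)
    # merchantName is written near the top, so take it out of the general list
    merchant = attrs.pop("merchantName", "")
    # Steps 5 and 6: fall back to merchantName when manufacturer_name
    # is missing or empty, otherwise keep the existing value
    if not attrs.get("manufacturer_name"):
        attrs["manufacturer_name"] = merchant
    lines = ["---", "layout: single-product", "merchantName: " + merchant]
    lines += ["%s: %s" % (key, value) for key, value in attrs.items()]
    lines.append("---")
    return "\n".join(lines) + "\n"
```

Calling `build_front_matter(ET.fromstring('<product merchantName="Acme" name="The Roll"/>'))` returns a block that starts with the dashes, the layout line and `merchantName: Acme`, followed by the remaining attributes.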

Each markdown file should be saved into a target directory with a name in the following format:

current_year-current_month-current_date-name.markdown
e.g. 2015-02-21-The Roll.markdown
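
A possible way to build that file name and skip duplicates, again only as a hedged Python sketch. The helper names (`safe_name`, `title_exists`, `save_product`) are hypothetical, and it assumes "same title" means the part of the file name after the date, so a product saved on an earlier day still counts as a duplicate.

```python
import datetime
import os
import re


def safe_name(name):
    """Replace path separators and control characters so the name is a valid file name."""
    return re.sub(r"[/\\\x00-\x1f]", "_", name).strip()


def title_exists(output_dir, name):
    """Return True if any dated markdown file in output_dir already uses this name."""
    target = name + ".markdown"
    # File names look like YYYY-MM-DD-<name>.markdown; the date prefix is 11 characters
    return any(entry[11:] == target for entry in os.listdir(output_dir))


def save_product(output_dir, name, text, today=None):
    """Write text to <date>-<name>.markdown; return the path, or None for a duplicate."""
    today = today or datetime.date.today()
    name = safe_name(name)
    if title_exists(output_dir, name):
        return None
    path = os.path.join(output_dir, "%s-%s.markdown" % (today.isoformat(), name))
    with open(path, "w", encoding="utf-8") as handle:
        handle.write(text)
    return path
```

If only same-day duplicates should be skipped, the `title_exists` check could be replaced by a plain `os.path.exists` test on the full path.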

The script also needs to loop through each XML file in a target directory.

I’m willing to tip extra for a script that works well. It needs to be runnable from a cron job and to handle large XML files.
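
For the large-file requirement, one option is to stream each file instead of loading it whole. The sketch below uses `xml.etree.ElementTree.iterparse` from the Python standard library and walks every `*.xml` file in a directory. The element name `product` comes from the description above, while the function names and the assumption that the data is stored in attributes are mine.

```python
import glob
import os
import xml.etree.ElementTree as ET


def iter_products(xml_path, tag="product"):
    """Yield the attribute dict of each <product> element, one at a time."""
    # The "end" event fires once an element has been read completely
    for event, elem in ET.iterparse(xml_path, events=("end",)):
        if elem.tag == tag:
            yield dict(elem.attrib)
            # Release the element's attributes, text and children after use
            elem.clear()


def iter_directory(input_dir):
    """Yield (xml_path, attributes) for every product in every XML file of input_dir."""
    for xml_path in sorted(glob.glob(os.path.join(input_dir, "*.xml"))):
        for attrs in iter_products(xml_path):
            yield xml_path, attrs
```

Because both functions are generators, only one product is held as a dictionary at any time, which should keep memory use modest for big feeds.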

Here is an example XML file:
https://drive.google.com/file/d/0B6h9HPRdfghjSUdMOFM0RnBlWHM/view?usp=sharing

BTW: any techniques or tools can be used as long as the issue is resolved.

Can an external command-line tool be used, or are only standard Ubuntu tools allowed?
andijcr 6 years ago
Hey, sorry I forgot to mention. Any tools or techniques can be used as long as it resolves the issue. Cheers
user0809 6 years ago
@user856: How large do you expect the XML files to get? Will they fit in memory?
alixaxel 6 years ago
awarded to iurisilvio
Tags
xml
ubuntu


2 Solutions

Winning solution

I created a gist with my solution: https://gist.github.com/iurisilvio/217ecc660dbf10e4d075

It runs under Python 2.7 as a command-line tool, with no external dependencies:

python parser.py --input /your/input/directory --output /your/output/directory

If I forgot any requirement, just ask me. Cheers!
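
For readers who cannot open the gist, a command line with `--input` and `--output` options like the one above can be handled with `argparse` from the standard library. This is a minimal sketch and not the code from the gist. The function name `parse_args` and the help texts are assumptions.

```python
import argparse


def parse_args(argv=None):
    """Parse the --input and --output directory options."""
    parser = argparse.ArgumentParser(
        description="Convert product XML files to markdown files.")
    parser.add_argument("--input", required=True,
                        help="directory containing the XML files")
    parser.add_argument("--output", required=True,
                        help="directory to write the markdown files to")
    # argv=None makes argparse read the options from sys.argv
    return parser.parse_args(argv)
```

`parse_args()` with no argument reads the real command line. Passing a list such as `["--input", "/in", "--output", "/out"]` is handy for trying it out.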

Hey I posted in the gist my questions :)
user0809 6 years ago
I fixed my gist to generate a safe filename and handle the encoding issue you had. I ran it against your whole demo.xml and it worked fine.
iurisilvio 6 years ago
Hi, I changed my gist again, to get all node data.
iurisilvio 6 years ago
Can you please have a look at my reply?
user0809 6 years ago
Done! ---
iurisilvio 6 years ago
Hey can you please take a look at my latest reply? Thanks
user0809 6 years ago

Here's a pure Linux shell solution: https://app.box.com/s/u7628p5h5yob4jxu5vd2hpjjxwny8jz2

  • This is a shell script. Just do a chmod a+rwx product_parse.sh and run it like ./product_parse.sh demo.xml. It'll write MD files to the current dir.

UPDATE:

The script has been updated to parse all XML files in a given dir and to write its output to a specified dir.

Usage: product_parse.sh <xml-file-or-dir> <MD-output-dir>

UPDATE2:

Here's a different approach for your newer demands: https://app.box.com/s/ttqp776ov9urepl4ct5flpq9i1l5cf2m

It now uses the xml2 utility, which you'll need to install with:

sudo apt-get install xml2

The usage is the same:

./product_parse_xml2.sh demo.xml /your/output/dir

How do I use this solution?
user0809 6 years ago
This is a shell script. Save it as e.g. product_parse.sh. Then in your terminal run chmod a+rwx product_parse.sh to allow its execution and then run it like ./product_parse.sh demo.xml. It'll write MD files to the current dir. Do you have any difficulties with that?
dekkard 6 years ago
I need to be able to specify a target XML folder and a target output folder, though.
user0809 6 years ago
In the folder with the shell script, I ran this command:
productparse.sh /var/static-operation/xml-files/temp /var/static-operation/output-static-files/temp
and got the following:
productparse.sh: command not found
user0809 6 years ago
Note the "./" before the script name:
./productparse.sh /var/static-operation/xml-files/temp /var/static-operation/output-static-files/temp
That's how you run an executable from the current dir in Linux. Another way is to specify the full path to productparse.sh.
dekkard 6 years ago
Do you have any questions or suggestions?
dekkard 6 years ago