Python 3 script to convert XML file rows into csv and HTML files

I need a python 3 script which will:

Import a file called export.xml into a pandas DataFrame.

From the DataFrame, write these output files:
a) a CSV flat file
b) a printable file (maybe in HTML format)
c) a PDF.

The CSV output should have these fields:
Docid, ordner, hauptordner, bemerkung, status, revision, dokumentenart, Datum, bu-monat, bu-jahr, firma

Outputs b) and c) need only a few selected fields:
Docid, hauptordner, bemerkung, dokumentenart, bu-monat, bu-jahr, firma
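
The two field lists could be expressed with pandas roughly as follows. This is only a sketch: the column names are copied from the lists above, while the sample row and the choice of `reindex` (which keeps the listed columns in the listed order and fills a missing one with empty values) are assumptions, since the real structure of export.xml is not shown in the text.

```python
import pandas as pd

# Field lists copied from the request above.
CSV_COLUMNS = ["Docid", "ordner", "hauptordner", "bemerkung", "status", "revision",
               "dokumentenart", "Datum", "bu-monat", "bu-jahr", "firma"]
PRINT_COLUMNS = ["Docid", "hauptordner", "bemerkung", "dokumentenart",
                 "bu-monat", "bu-jahr", "firma"]

# Hypothetical sample row; real data would come from export.xml.
df = pd.DataFrame([{
    "Docid": "4711", "ordner": "Rechnungen", "hauptordner": "Buchhaltung",
    "bemerkung": "Testbeleg", "status": "ok", "revision": "1.2",
    "dokumentenart": "Rechnung", "Datum": "2019-08-15",
    "bu-monat": "08", "bu-jahr": "2019", "firma": "IBMAppleMS",
    "extra": "not exported",
}])

# reindex keeps only the listed columns, in the listed order.
csv_text = df.reindex(columns=CSV_COLUMNS).to_csv(index=False)  # output a)
print_view = df.reindex(columns=PRINT_COLUMNS)                  # basis for b) and c)
```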

Inside the XML there can be multiple revisions for each docid (1.0, 1.1, 1.2).
We only need the last/highest revision. In this sample it is 1.2 in the XML.
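
A minimal sketch of the "highest revision per docid" rule, assuming the DataFrame has columns named `docid` and `revision` and that revisions are dotted numbers stored as text. The helper names `revision_key` and `keep_latest_revision` are made up for illustration. Revisions are parsed into integer tuples so that 1.10 sorts after 1.9, which a plain text comparison would get wrong.

```python
import pandas as pd

def revision_key(revision) -> tuple:
    """Turn '1.10' into (1, 10) so revisions compare numerically, not as text."""
    return tuple(int(part) for part in str(revision).split("."))

def keep_latest_revision(df, id_col="docid", rev_col="revision"):
    df = df.reset_index(drop=True)
    # Order row positions by (docid, parsed revision) using plain Python sorting.
    order = sorted(
        df.index,
        key=lambda i: (df.at[i, id_col], revision_key(df.at[i, rev_col])),
    )
    # After sorting, the last row of each docid group is its highest revision.
    return df.loc[order].drop_duplicates(id_col, keep="last").reset_index(drop=True)

# Hypothetical sample data, not taken from the real export.xml.
rows = pd.DataFrame({
    "docid": ["A1", "A1", "A1", "B2", "B2"],
    "revision": ["1.0", "1.2", "1.1", "1.9", "1.10"],
})
latest = keep_latest_revision(rows)
```

Here `latest` holds one row per docid: revision 1.2 for A1 and 1.10 for B2.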

I want to call that script from the command line (or by double-clicking it), and it should import export.xml from the same folder and write the CSV, HTML and PDF back to that folder.

The files should be named
"export-IBMAppleMS-1908.csv"
"export-IBMAppleMS-1908.html"
"export-IBMAppleMS-1908.pdf"

If another file format than HTML is better for printing then please tell me.

Screenshot of the XML file and the needed format for PDF and HTML
https://www.screencast.com/t/qurkLfW7l8

Export.XML File
https://www.dropbox.com/s/l03k1a3lefjhlyn/export.xml?dl=0

Can you use a C# program? It has one small button: you browse for the input XML, then select the output folder, and it does the conversion.
mashtullah 26 days ago
Sorry, no. I need the Python code for integration.
need_solution 26 days ago
awarded to meo


2 Solutions

Winning solution

Checkout my solution here: https://gist.github.com/minhtc/13a3ba2ddfcd9d5fa8553358e09477e2

Usage command:
python3 convert-xml.py export.xml

Thanks for the good and fast work. Really nice to see how clean and short you can code that in Python. What I'm missing is output format c), the PDF file. Also, the HTML is a little bit hard to read. Can it be formatted like this sample? https://www.screencast.com/t/ZHyAvnxQ It has a striped background, smaller fonts for Hauptordner, Dokumentenart, bu-monat and bu-jahr, wrapping for the Bemerkung column, and a RecNo column added at the front. Chris
need_solution 26 days ago
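
The requested formatting (striped rows, smaller fonts for some columns, wrapping for Bemerkung, a RecNo column in front) could look roughly like this. It is a sketch in plain Python, not the code from the winning gist: the CSS is inlined rather than kept in a separate styles.css, and the names `render_table`, `cell_class` and `SMALL_COLUMNS` as well as the sample record are made up for illustration.

```python
from html import escape

PRINT_COLUMNS = ["Docid", "hauptordner", "bemerkung", "dokumentenart",
                 "bu-monat", "bu-jahr", "firma"]
SMALL_COLUMNS = {"hauptordner", "dokumentenart", "bu-monat", "bu-jahr"}

CSS = """
table { border-collapse: collapse; width: 100%; font-family: sans-serif; }
th, td { padding: 4px 8px; text-align: left; vertical-align: top; }
tbody tr:nth-child(even) { background: #eeeeee; }  /* striped rows */
td.small { font-size: 0.8em; }                     /* smaller font */
td.wrap { white-space: normal; word-wrap: break-word; max-width: 30em; }
"""

def cell_class(column):
    """Pick the CSS class for a column: small font, wrapping, or none."""
    if column in SMALL_COLUMNS:
        return "small"
    if column == "bemerkung":
        return "wrap"
    return ""

def render_table(records):
    """Render a list of dicts as a full HTML page with a leading RecNo column."""
    head = "".join(f"<th>{escape(name)}</th>" for name in ["RecNo"] + PRINT_COLUMNS)
    body = []
    for rec_no, record in enumerate(records, start=1):
        cells = [f"<td>{rec_no}</td>"]
        for column in PRINT_COLUMNS:
            value = escape(str(record.get(column, "")))  # escape &, <, > in the data
            cells.append(f'<td class="{cell_class(column)}">{value}</td>')
        body.append("<tr>" + "".join(cells) + "</tr>")
    return (f"<html><head><meta charset='utf-8'><style>{CSS}</style></head><body>"
            f"<table><thead><tr>{head}</tr></thead><tbody>{''.join(body)}</tbody></table>"
            "</body></html>")

# Hypothetical record for demonstration.
sample = [{"Docid": "4711", "hauptordner": "Buchhaltung", "bemerkung": "A & B",
           "dokumentenart": "Rechnung", "bu-monat": "08", "bu-jahr": "2019",
           "firma": "IBMAppleMS"}]
page = render_table(sample)
```

With a DataFrame, the records could be obtained with `df.to_dict("records")` before calling `render_table`.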
@need_solution: I've updated the Python script. Check the comment for the stylesheet source code, and put the file styles.css in the same folder as the HTML file. To generate the PDF file we would need to install another library, and the styling doesn't look good, so for printing I would prefer to print from HTML, CSV or XLSX instead.
meo 26 days ago
preview output: https://i.imgur.com/454extX.png
meo 26 days ago
That worked great and gave me really nice formatting. One little thing is the hardcoded filename 'export-IBMAppleMS-1908.csv'. It should be created from the fields 'firma', 'bu-jahr' and 'bu-monat' plus the file format. The line in question: with open('./export-IBMAppleMS-1908.html', 'w') as f:
need_solution 26 days ago
Updated the script to build the filename from the fields 'firma', 'bu-jahr' and 'bu-monat'.
meo 26 days ago
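
Building the filename from the three fields might be sketched like this. It assumes that "1908" in the example names means the last two digits of bu-jahr followed by a two-digit bu-monat (2019 and 08); the thread does not spell this out. The function name `build_basename` is hypothetical.

```python
import re

def build_basename(firma, bu_jahr, bu_monat, prefix="export"):
    """Return e.g. 'export-IBMAppleMS-1908' (without a file extension)."""
    # Keep only letters, digits, underscores and hyphens so the name is filesystem-safe.
    company = re.sub(r"[^\w-]", "", str(firma))
    year = str(bu_jahr)[-2:]          # assumption: two-digit year, 2019 -> "19"
    month = str(bu_monat).zfill(2)    # assumption: zero-padded month, 8 -> "08"
    return f"{prefix}-{company}-{year}{month}"

basename = build_basename("IBMAppleMS", "2019", "08")
filenames = [f"{basename}.{ext}" for ext in ("csv", "html", "pdf")]
```

The values would typically be read from the first row of the data, which assumes every row of one export shares the same firma, bu-jahr and bu-monat.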
Hi meo, I just did a final test with my real data and noticed there is one more thing we need to check for: rows in the XML that have trashed='true'. These rows need to be copied to a separate file. A CSV file is enough. Maybe name it trash-IBMAppleMS-1908.csv instead of export-IBMAppleMS-1908.csv. Can you please add that to your solution? Thanks, Christoph
need_solution 26 days ago
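
Separating the trashed rows could be sketched as below, assuming the XML attribute ends up in a DataFrame column named `trashed` holding the text 'true' or 'false'. The helper name `split_trashed` and the sample data are illustrative only.

```python
import pandas as pd

def split_trashed(df, flag_col="trashed"):
    """Return (kept_rows, trashed_rows); anything other than 'true' counts as kept."""
    is_trashed = df[flag_col].astype(str).str.lower() == "true"
    return df[~is_trashed], df[is_trashed]

# Hypothetical sample data; a missing flag is treated as not trashed.
rows = pd.DataFrame({
    "docid": ["A1", "B2", "C3"],
    "trashed": ["false", "true", None],
})
kept, trash = split_trashed(rows)

# Passing a path such as "trash-IBMAppleMS-1908.csv" would write a file instead.
trash_csv = trash.to_csv(index=False)
```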
@need_solution: I have updated script. Please check!
meo 26 days ago
***** Thanks, works great.
need_solution 26 days ago

Hey need_solution, here's my solution: https://pastebin.com/car90Szh

Copy and paste the code in a new file called py-export-rows.py and run it with python py-export-rows.py (assuming that python links to a python 3.x binary). It will read export.xml and produce .csv, .html and .pdf versions of the required columns.

Dependencies: pandas, weasyprint.

Install pandas with pip install pandas.

Install weasyprint by following your platform specific instructions here: https://weasyprint.readthedocs.io/en/stable/install.html

Edit: Updated styling to match the one requested. Example PDF: https://www.dropbox.com/s/xczsmpp5y45rksv/export-IBMAppleMS-1908.pdf?dl=0

Hi Wuddrum, your script gives me errors. I've installed WeasyPrint in PyCharm. https://www.screencast.com/t/TYjLqSla
need_solution 26 days ago
Hey, you need to install GTK for Windows. Head to https://github.com/tschoonj/GTK-for-Windows-Runtime-Environment-Installer/releases, then download and install the latest release, gtk3-runtime-3.24.11-2019-10-04-ts-win64.exe. You might need to restart PyCharm after the installation completes so that the terminal's PATH variable is updated.
Wuddrum 26 days ago
I still get a WeasyPrint import error. I've tried everything, even restarted my PC. https://www.screencast.com/t/rrcQltQ0 The WeasyPrint package is installed https://www.screencast.com/t/EgbU7ZovIdU and the path looks fine to me https://www.screencast.com/t/raJPmOORzg8n
need_solution 26 days ago
Strange, worked just fine on my machine after installing GTK. Is your python 32bit by any chance? You can check that with python --version --version
Wuddrum 26 days ago
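
The same check can also be done from inside Python, which helps when it is unclear which interpreter PyCharm is actually running. A small sketch using only the standard library; the function name `python_bitness` is made up for illustration.

```python
import struct
import sys

def python_bitness() -> int:
    """Return 32 or 64 depending on how the running interpreter was built."""
    # The size of a C pointer in bytes, times 8, gives the interpreter's word size.
    return struct.calcsize("P") * 8

bits = python_bitness()
print(f"Python {sys.version.split()[0]} is a {bits}-bit build")
```

The GTK runtime linked above is a win64 build, so it would only match an interpreter that reports 64 here.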
Yes, it gives me Python 3.7.4 (tags/v3.7.4:e09359112e, Jul 8 2019, 19:29:22) [MSC v.1916 32 bit (Intel)]. Can I update that to 64-bit easily?
need_solution 26 days ago
Should be a pretty straightforward thing to do, depending on what you're using for creating Python environments. You will need to recreate the current environment and re-run pip install, however.
Wuddrum 26 days ago
I'm not very familiar with PyCharm, but if this is how you create new environments https://www.jetbrains.com/help/pycharm/creating-virtual-environment.html then you can download 64-bit Python and install it in a separate folder from your current 32-bit Python. Afterwards, when recreating the environment for this solution, just select the 64-bit Python executable instead of the 32-bit one.
Wuddrum 26 days ago
I've updated the solution to include the styling you prefer. Here's an example PDF file it produces: https://www.dropbox.com/s/xczsmpp5y45rksv/export-IBMAppleMS-1908.pdf?dl=0
In case you don't want to install 64-bit Python, you can try to install the 32-bit version of GTK+ by following these instructions: https://weasyprint.readthedocs.io/en/stable/install.html#install-gtk-with-the-aid-of-msys2 You'd need to update your PATH variable accordingly to swap out the 64-bit GTK for the 32-bit one.
Wuddrum 26 days ago
I've updated to 64-bit already. Your script works now. I will look at the new script tomorrow morning.
need_solution 26 days ago