Python 3 script to convert XML file rows into csv and HTML files

I need a python 3 script which will:

Import a file called export.xml into a pandas DataFrame.

From the DataFrame, write these output files:
a) a csv-flatfile
b) a printable file (maybe in HTML format)
c) a PDF.
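As a rough sketch of the import step: the real layout of export.xml is only visible in the screenshot, so the sample below assumes a hypothetical structure in which every record is a `row` element carrying its fields as attributes. The tag name `row`, the sample values and the helper `xml_to_dataframe` are illustrative assumptions, not taken from the original file.

```python
import xml.etree.ElementTree as ET

import pandas as pd

# Hypothetical sample data; the real export.xml may use other tags or child elements.
SAMPLE_XML = """<export>
  <row docid="4711" revision="1.0" firma="IBM" bemerkung="Rechnung"/>
  <row docid="4711" revision="1.2" firma="IBM" bemerkung="Rechnung korrigiert"/>
  <row docid="4712" revision="1.0" firma="Apple" bemerkung="Vertrag"/>
</export>"""


def xml_to_dataframe(xml_text: str, row_tag: str = "row") -> pd.DataFrame:
    """Turn every <row_tag> element into one DataFrame row, one column per attribute."""
    root = ET.fromstring(xml_text)
    records = [dict(elem.attrib) for elem in root.iter(row_tag)]
    return pd.DataFrame.from_records(records)


df = xml_to_dataframe(SAMPLE_XML)
print(df)
```

For a file on disk, `ET.parse("export.xml").getroot()` could replace `ET.fromstring(...)`. If the fields are stored as child elements rather than attributes, the record-building line would need to change accordingly.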

The output of the CSV file should have these fields:
Docid, ordner, hauptordner, bemerkung, status, revision, dokumentenart, Datum, bu-monat, bu-jahr, firma

Outputs b) and c) need only a few selected fields:
Docid, hauptordner, bemerkung, dokumentenart, bu-monat, bu-jahr, firma
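One possible way to produce the two field selections, assuming the DataFrame already has columns named exactly like the fields listed above. The sample row is invented and `build_outputs` is a hypothetical helper name.

```python
import pandas as pd

# Field lists copied from the request.
CSV_FIELDS = ["Docid", "ordner", "hauptordner", "bemerkung", "status", "revision",
              "dokumentenart", "Datum", "bu-monat", "bu-jahr", "firma"]
PRINT_FIELDS = ["Docid", "hauptordner", "bemerkung", "dokumentenart",
                "bu-monat", "bu-jahr", "firma"]


def build_outputs(df: pd.DataFrame):
    """Return (csv_text, html_text), each restricted to its own field list."""
    csv_text = df[CSV_FIELDS].to_csv(index=False)
    html_text = df[PRINT_FIELDS].to_html(index=False)
    return csv_text, html_text


# Invented sample row, for illustration only.
sample = pd.DataFrame([{
    "Docid": "4711", "ordner": "A1", "hauptordner": "Eingang",
    "bemerkung": "Rechnung", "status": "offen", "revision": "1.2",
    "dokumentenart": "PDF", "Datum": "2019-08-01",
    "bu-monat": "08", "bu-jahr": "19", "firma": "IBM",
}])

csv_text, html_text = build_outputs(sample)
print(csv_text)
```

Passing a file path to `to_csv` or `to_html` instead of capturing the returned string writes the file directly.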

Inside the XML there can be multiple revisions for each docid (1.0, 1.1, 1.2).
We only need the last/highest revision. In this sample it is 1.2 in the XML.
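A minimal sketch of the "keep only the highest revision" step. It assumes revisions are dot-separated integers stored as text and that the DataFrame index is unique. The helper names `revision_key` and `keep_latest_revision` and the sample rows are made up for illustration. Comparing the parts as integers matters because plain string comparison would rank "1.9" above "1.10".

```python
import pandas as pd


def revision_key(revision) -> tuple:
    """Turn '1.10' into (1, 10) so revisions compare numerically, not as text."""
    return tuple(int(part) for part in str(revision).split("."))


def keep_latest_revision(df: pd.DataFrame) -> pd.DataFrame:
    """Keep one row per Docid: the one with the highest revision."""
    # Order the row labels from lowest to highest revision (requires a unique index).
    order = sorted(df.index, key=lambda idx: revision_key(df.at[idx, "revision"]))
    # After that ordering, the last row seen for each Docid is its newest revision.
    latest = df.loc[order].drop_duplicates(subset="Docid", keep="last")
    return latest.sort_index().reset_index(drop=True)


# Invented sample data.
sample = pd.DataFrame({
    "Docid": ["4711", "4711", "4711", "4712", "4712"],
    "revision": ["1.0", "1.2", "1.1", "1.9", "1.10"],
})

latest = keep_latest_revision(sample)
print(latest)
```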

I want to call (double-click) the script from the command line and it should import export.xml from the same folder and write back the CSV, HTML and PDF.

The File should be named

If another file format than HTML is better for printing then please tell me.

Screenshot of the XML file and the needed format for PDF, HTML

Export.XML File

Can you use a C# program? Has one small button you browse for the input XML then you select the output folder and it does the conversion.
mashtullah 3 years ago
Sorry, no. Need the Python code for Integration.
need_solution 3 years ago
awarded to meo


2 Solutions

Winning solution

Checkout my solution here:

Usage command:
python3 export.xml

Thanks for the good and fast work. Really nice to see how clean and short you can code that in Python. What I'm missing is output format c), the PDF file. Also, the HTML is a little bit hard to read. Can it be formatted like this sample: with a striped background, smaller fonts for Hauptordner, Dokumentenart, bu-monat and bu-jahr, wrapping for the Bemerkung column, and a RecNo column added at the front? Chris
need_solution 3 years ago
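The styling asked for above could be sketched roughly like this. This is not the winning script, only an illustration under assumptions: the CSS values, the `striped` class name and the helper `render_html` are invented, and the styles are embedded in the page rather than kept in a separate styles.css.

```python
import pandas as pd

SMALL_COLUMNS = ["hauptordner", "dokumentenart", "bu-monat", "bu-jahr"]
WRAP_COLUMN = "bemerkung"


def render_html(df: pd.DataFrame) -> str:
    """Render a striped table with a leading RecNo column and per-column styles."""
    table = df.copy()
    table.insert(0, "RecNo", list(range(1, len(table) + 1)))
    columns = list(table.columns)

    rules = [
        "table { border-collapse: collapse; font-family: sans-serif; }",
        "tbody tr:nth-child(even) { background: #eeeeee; }",  # striped rows
    ]
    # CSS counts columns from 1, so address each column by its position.
    for name in SMALL_COLUMNS:
        if name in columns:
            pos = columns.index(name) + 1
            rules.append(f"td:nth-child({pos}) {{ font-size: 0.8em; }}")
    if WRAP_COLUMN in columns:
        pos = columns.index(WRAP_COLUMN) + 1
        rules.append(
            f"td:nth-child({pos}) {{ max-width: 20em; overflow-wrap: break-word; }}"
        )

    style = "\n".join(rules)
    body = table.to_html(index=False, classes="striped")
    return (f"<html><head><style>\n{style}\n</style></head>"
            f"<body>\n{body}\n</body></html>")


# Invented sample rows.
sample = pd.DataFrame({
    "Docid": ["4711", "4712"],
    "hauptordner": ["Eingang", "Ausgang"],
    "bemerkung": ["Rechnung mit sehr langer Bemerkung", "Vertrag"],
    "dokumentenart": ["PDF", "PDF"],
    "bu-monat": ["08", "08"],
    "bu-jahr": ["19", "19"],
    "firma": ["IBM", "Apple"],
})

html = render_html(sample)
```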
@need_solution: I've updated the Python script. Check the comment for the stylesheet source code and put the file styles.css in the same folder as the HTML file. Generating a PDF file requires installing another library and the styling doesn't look good, so for printing I would prefer to print from HTML, CSV or XLSX instead.
meo 3 years ago
preview output:
meo 3 years ago
That worked great and gave me really nice formatting. One little thing is the hardcoded filename 'export-IBMAppleMS-1908.csv'. It should be created from the fields 'firma', 'bu-jahr' and 'bu-monat' plus the file format, e.g. in with open('./export-IBMAppleMS-1908.html', 'w') as f:
need_solution 3 years ago
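How exactly the name is assembled is not spelled out in the thread. The sketch below guesses from the example 'export-IBMAppleMS-1908.csv': the distinct 'firma' values are concatenated in order of appearance, followed by a two-digit year and a two-digit month taken from the first row. Those rules and the helper `build_filename` are assumptions.

```python
def build_filename(rows, prefix="export", extension="csv"):
    """Build e.g. 'export-IBMAppleMS-1908.csv' from a list of row dicts.

    Assumption: year and month are the same for all rows, so the first row is used.
    """
    firms = []
    for row in rows:
        if row["firma"] not in firms:  # keep first occurrence, preserve order
            firms.append(row["firma"])
    first = rows[0]
    year = str(first["bu-jahr"])[-2:]        # '2019' -> '19', '19' stays '19'
    month = str(first["bu-monat"]).zfill(2)  # '8' -> '08'
    return f"{prefix}-{''.join(firms)}-{year}{month}.{extension}"


# Invented sample rows.
rows = [
    {"firma": "IBM", "bu-jahr": "2019", "bu-monat": "8"},
    {"firma": "Apple", "bu-jahr": "2019", "bu-monat": "8"},
    {"firma": "MS", "bu-jahr": "2019", "bu-monat": "8"},
    {"firma": "IBM", "bu-jahr": "2019", "bu-monat": "8"},
]

csv_name = build_filename(rows)
html_name = build_filename(rows, extension="html")
print(csv_name, html_name)
```

With a DataFrame, `df.to_dict("records")` yields rows in the shape this helper expects.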
updated script to use filename from fields 'firma' 'bu-jahr' 'bu-monat'
meo 3 years ago
Hi meo, I just did a final test with my real data and noticed there is one more thing we need to check for: the rows in the XML that have trashed='true'. These rows need to be copied to a separate file. A CSV file is enough. Maybe name it trash-IBMAppleMS-1908.csv instead of export-IBMAppleMS-1908.csv. Can you please add that to your solution? Thanks, Christoph
need_solution 3 years ago
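A sketch of the trashed-row handling, assuming the XML attribute ends up as a text column named `trashed` in the DataFrame. Whether trashed rows should also remain in the main export is not stated in the thread; this version separates them. `split_trashed` is a hypothetical helper.

```python
import pandas as pd


def split_trashed(df: pd.DataFrame):
    """Return (kept, trashed); a row counts as trashed when trashed == 'true'."""
    # Missing values become the text 'None' or 'nan', so they count as not trashed.
    is_trashed = df["trashed"].astype(str).str.lower() == "true"
    kept = df[~is_trashed].reset_index(drop=True)
    trashed = df[is_trashed].reset_index(drop=True)
    return kept, trashed


# Invented sample data.
sample = pd.DataFrame({
    "Docid": ["4711", "4712", "4713"],
    "trashed": ["true", "false", None],
})

kept, trashed = split_trashed(sample)
trash_csv = trashed.to_csv(index=False)  # pass a path to write a trash-....csv file
print(trash_csv)
```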
@need_solution: I have updated script. Please check!
meo 3 years ago
***** Thanks, works great.
need_solution 3 years ago

Hey need_solution, here's my solution:

Copy and paste the code in a new file called and run it with python (assuming that python links to a python 3.x binary). It will read export.xml and produce .csv, .html and .pdf versions of the required columns.

Dependencies: pandas, weasyprint.

Install pandas with pip install pandas.

Install weasyprint by following your platform specific instructions here:

Edit: Updated styling to match the one requested. Example PDF:

Hi Wuddrum your script gives me Errors. I've installed Weasyprint in PyCharm.
need_solution 3 years ago
Hey, you need to install GTK for windows. Head to then download and install the latest gtk3-runtime-3.24.11-2019-10-04-ts-win64.exe. You might need to restart PyCharm after the installation completes so that terminal's PATH variable is updated.
Wuddrum 3 years ago
Weasyprint import error: I've tried everything. Even restarted my PC. The Weasyprint package is installed and the path looks fine to me.
need_solution 3 years ago
Strange, worked just fine on my machine after installing GTK. Is your python 32bit by any chance? You can check that with python --version --version
Wuddrum 3 years ago
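Besides reading the version banner, the interpreter's bitness can be checked from Python itself. A small sketch using only the standard library; the helper name `interpreter_bits` is made up:

```python
import struct
import sys


def interpreter_bits() -> int:
    """Pointer size in bits: 32 on a 32-bit build, 64 on a 64-bit build."""
    return struct.calcsize("P") * 8


bits = interpreter_bits()
print(f"Python {sys.version_info.major}.{sys.version_info.minor} is a {bits}-bit build")
```

This reports the bitness of the Python build that runs the script, which can differ from that of the operating system.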
Yes, it gives me Python 3.7.4 (tags/v3.7.4:e09359112e, Jul 8 2019, 19:29:22) [MSC v.1916 32 bit (Intel)]. Can I easily update that to 64-bit?
need_solution 3 years ago
Should be a pretty straightforward thing to do, depending on what you're using for creating Python environments. You will need to recreate the current environment and re-run pip install, however.
Wuddrum 3 years ago
I'm not very familiar with PyCharm, but if this is how you create new environments then you can download 64bit python and install it in a separate folder than your current 32bit python. Afterwards when recreating environment for this solution, just select the 64bit python executable instead of the 32bit one.
Wuddrum 3 years ago
I've updated the solution to include the styling you prefer. Here's an example PDF file it produces:
In case you don't want to install 64-bit Python, you can try to install GTK+ for the 32-bit version by following this instruction. You'd need to update your PATH variable accordingly to swap out the 64-bit GTK for the 32-bit one.
Wuddrum 3 years ago
I've updated to 64-bit already. Your Script works now. Will look at the new script tomorrow morning
need_solution 3 years ago