Python docx :: Create Dictionary of Heading List Numbers

Using Python's docx module, I'm trying to build up a map of the major heading style numbers and their titles. Getting the title text is easy; getting the number, though, is proving quite difficult.
E.g.

  1. Introduction

This is some paragraph text

1.1 Purpose

More paragraph text

1.2 Scope

More paragraph text

1.3 ABC

More paragraph text

From the above, I'd like to be able to generate a list like:

['1. Introduction','1.1 Purpose','1.2 Scope','1.3 ABC']

Instead, all I've been able to get is:

['Introduction','Purpose','Scope','ABC'].

The ultimate desire is to be able to pass a document to the function, and a particular section number, and have it return all the contents of the 1st table within that section.

I've uploaded the module I've been building so far:
https://pastebin.com/rnYizqpy

If this is not possible using docx, I'll give a good tip for any code in an alternative language that provides the same function.

awarded to PlatinumBobo


1 Solution

Winning solution

Change line 191 of your module from sectionHeading = curHeadNm.lstrip().split(" ")[1] to sectionHeading = curListNm.

Then, if you call parseTable("test.docx", "1.3.1"), where test.docx is a Word file with a table in section 1.3.1, the rows of that table are printed, which seems to be your "ultimate desire".

If you still need curHeadNm to contain both the number and the heading, you can replace lines 155 to 161 with this:

# Only advance the section counter for headings that actually contain text.
if block.text.strip():
    add_to_sectionnumber(curHeadIntLv)
# Join the counters into a dotted number and prefix it to the heading text.
curListNm = '.'.join(map(str, gblDocListNumber))
curHeadNm = "%s %s" % (curListNm, block.text.strip())
print('Current Heading: {0}'.format(curHeadNm))
print('Current Style Number: {0}'.format(curHeadIntLv))

And change line 191 to sectionHeading = curHeadNm.lstrip().split(" ")[0] (index 0 instead of 1), so that it picks up the number rather than the first word of the title.
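
For anyone who lands here without the pastebin module, here is a minimal standalone sketch of the same counter idea. It is not the module's code. It assumes headings use the built-in "Heading N" style names and that numbering starts at 1 at every level, and it recomputes the numbers by counting headings rather than reading Word's own numbering definitions. The names walk_sections, heading_list and first_table_rows are made up for this illustration. Blocks are duck-typed: anything with a rows attribute is treated as a table, everything else as a paragraph.

```python
import re

# Assumption: headings use the built-in style names "Heading 1", "Heading 2", ...
HEADING_STYLE = re.compile(r"^Heading (\d+)$")


def walk_sections(blocks):
    """Yield (section_number, block, is_heading) for each block in document order."""
    counters = []  # counters[i] is the running count at heading level i + 1
    for block in blocks:
        style_name = getattr(getattr(block, "style", None), "name", None) or ""
        match = HEADING_STYLE.match(style_name)
        is_heading = (
            bool(match)
            and not hasattr(block, "rows")   # tables are never headings
            and bool(block.text.strip())     # ignore empty heading paragraphs
        )
        if is_heading:
            level = int(match.group(1))
            del counters[level:]                            # reset deeper levels
            counters.extend([0] * (level - len(counters)))  # pad skipped levels with 0
            counters[level - 1] += 1
        yield ".".join(map(str, counters)), block, is_heading


def heading_list(blocks):
    """Return strings such as '1.2 Scope' for every heading."""
    return [
        "%s %s" % (number, block.text.strip())
        for number, block, is_heading in walk_sections(blocks)
        if is_heading
    ]


def first_table_rows(blocks, section_number):
    """Return the cell text of the first table directly under a section, or None."""
    for number, block, _ in walk_sections(blocks):
        if number == section_number and hasattr(block, "rows"):
            return [[cell.text for cell in row.cells] for row in block.rows]
    return None
```

With python-docx 1.0 or later, the blocks can, as far as I know, be obtained in document order with list(Document("test.docx").iter_inner_content()); older versions need to walk the body XML instead. Note that the top-level number comes out as "1 Introduction" rather than "1. Introduction", and that a table inside subsection 1.3.1 is only found by asking for "1.3.1", not "1.3".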