Python XML - build flat record from dynamic nested "node" elements

I need to parse an XML file and build flat, record-based output from the data. The problem is that the XML is in a "generic" form: it has several levels of nested "node" elements that represent some sort of data structure. I need to build the records dynamically, one record per deepest-level branch of "node" elements. Some example XML and the expected output are at the bottom.

I am most familiar with Python's ElementTree, so I'd prefer to use that, but I just can't wrap my head around a way to build the output record when the node depth is dynamic. Also, we can't assume that the nested nodes will be x levels deep, so hardcoding each level with a loop isn't possible. Is there a way to parse the XML and build the output on the fly?

Some Additional Notes:

  • The node names are all "node" except the parent and detail info (rate, price, etc)
  • The node depth is not static. So - assume further levels than displayed in the sample
  • Each "level" can have multiple sub-levels. So - you need to loop on each child "node" to properly build each record.

Any ideas / input would be greatly appreciated.

<root>
   <node>101
      <node>A
         <node>PlanA     
            <node>default
                <rate>100.00</rate>
            </node>
            <node>alternative
                <rate>90.00</rate>
            </node>
         </node>
      </node>
   </node>
   <node>102
      <node>B
         <node>PlanZZ     
            <node>Group 1
               <node>default
                   <rate>100.00</rate>
               </node>
               <node>alternative
                   <rate>90.00</rate>
               </node>
            </node>
            <node>Group 2
               <node>Suba
                  <node>default
                      <rate>1.00</rate>
                  </node>
                  <node>alternative
                      <rate>88.00</rate>
                  </node>
               </node>
               <node>Subb
                  <node>default
                      <rate>200.00</rate>
                  </node>
                  <node>alternative
                      <rate>4.00</rate>
                  </node>
               </node>
            </node>
         </node>
      </node>  
   </node>
</root>

The Output would look like this:

SRV  SUB  PLAN   Group    SubGrp  DefRate   AltRate
101  A    PlanA                   100       90
102  B    PlanZZ Group 1          100       90
102  B    PlanZZ Group 2  Suba    1         88
102  B    PlanZZ Group 2  Subb    200       4


Asked by: Chelsea443 | Posted: 06-12-2021






Answer 1

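This is what ElementTree's find and findall methods are for: they accept a simple XPath-style path such as "node/node", so each level can be reached directly. The code below handles the depths shown in the sample (plan, group, subgroup); deeper nesting would need the same pattern extended by another level.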

class Plan( object ):
    def __init__( self ):
        self.srv= None
        self.sub= None
        self.plan= None
        self.group= None
        self.subgroup= None
        self.defrate= None
        self.altrate= None
    def initFrom( self, other ):
        self.srv= other.srv
        self.sub= other.sub
        self.plan= other.plan
        self.group= other.group
        self.subgroup= other.subgroup
    def __str__( self ):
        return "%s %s %s %s %s %s %s" % (
            self.srv, self.sub, self.plan, self.group, self.subgroup,
            self.defrate, self.altrate )

def setRates( obj, aSearch ):
    for rate in aSearch:
        if rate.text.strip() == "default":
            obj.defrate= rate.find("rate").text.strip()
        elif rate.text.strip() == "alternative":
            obj.altrate= rate.find("rate").text.strip()
        else:
            raise Exception( "Unexpected Structure" )

def planIter( doc ):
    for topNode in doc.findall( "node" ):
        obj= Plan()
        obj.srv= topNode.text.strip()
        subNode= topNode.find("node")
        obj.sub= subNode.text.strip()
        planNode= topNode.find("node/node")
        obj.plan= planNode.text.strip()
        l3= topNode.find("node/node/node")
        if l3.text.strip() in ( "default", "alternative" ):
            setRates( obj, topNode.findall("node/node/node") )
            yield obj
        else:
            for group in topNode.findall("node/node/node"):
                grpObj= Plan()
                grpObj.initFrom( obj )
                grpObj.group= group.text.strip()
                l4= group.find( "node" )
                if l4.text.strip() in ( "default", "alternative" ):
                    setRates( grpObj, group.findall( "node" ) )
                    yield grpObj
                else:
                    for subgroup in group.findall("node"):
                        subgrpObj= Plan()
                        subgrpObj.initFrom( grpObj )
                        subgrpObj.subgroup= subgroup.text.strip()
                        setRates( subgrpObj, subgroup.findall("node") )
                        yield subgrpObj

import xml.etree.ElementTree as xml

# xml_text is assumed to hold the XML document from the question as a string
doc = xml.XML( xml_text )

for plan in planIter( doc ):
    print( plan )

Edit

Whoever gave you this XML document needs to find another job. This is A Bad Thing (TM) and indicates a fairly casual disregard for what XML means.
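
The answer does not spell out what is wrong with the document. One plausible reading is that each label ("101", "A", "PlanA") is stored as loose text in front of the child elements (mixed content), so it arrives with layout whitespace attached and has to be stripped. The sketch below shows that on a two-level fragment and contrasts it with a hypothetical layout that keeps the label in a name attribute; the attribute layout is an assumption for illustration, not something the question's data provides.

```python
import xml.etree.ElementTree as ET

# Mixed content, as in the question: the label is text before the child element.
elem = ET.fromstring("<node>101\n   <node>A</node>\n</node>")
child = elem.find("node")

raw_label = elem.text            # label plus the newline and indentation after it
clean_label = raw_label.strip()  # what the record actually needs
child_tail = child.tail          # whitespace after </node> ends up on the child

print(repr(raw_label), repr(clean_label), repr(child_tail))

# Hypothetical alternative layout (not in the question): label as an attribute.
alt = ET.fromstring('<node name="101"><node name="A"/></node>')
alt_label = alt.get("name")      # no stripping needed
print(alt_label)
```

This is why every lookup in the code above ends in .text.strip().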

Answered by: Arnold384 | Posted: 07-01-2022



Answer 2

I'm not too familiar with the ElementTree module, but you should be able to iterate over an element's children and recursively parse the data until there are no more children. (Older versions had a getchildren() method for this; it was deprecated and then removed in Python 3.9, and iterating over the element directly does the same thing.) This is more pseudocode than anything:

from xml.etree.ElementTree import parse

def parseXml(root, data):
    # INSERT CODE to populate your data object here with the values
    # you want from this node
    for node in root:  # iterating an element yields its direct children
        parseXml(node, data)

data = {}  # I'm guessing you want a dict of some sort here to store the data you parse
# file is the path (or open file object) of your XML document
parseXml(parse(file).getroot(), data)
# data will be filled and ready to use
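
The recursive idea above could be fleshed out roughly as follows. This is a minimal sketch, not a tested solution for the real file: it assumes that a record ends at any "node" whose child nodes carry a <rate> element, and that the labels on those children ("default", "alternative") name the rate columns. The names label, flatten, to_row and SAMPLE are made up for this example, and SAMPLE is a trimmed copy of the question's XML.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the XML from the question (assumption: same structure).
SAMPLE = """<root>
   <node>101
      <node>A
         <node>PlanA
            <node>default<rate>100.00</rate></node>
            <node>alternative<rate>90.00</rate></node>
         </node>
      </node>
   </node>
   <node>102
      <node>B
         <node>PlanZZ
            <node>Group 1
               <node>default<rate>100.00</rate></node>
               <node>alternative<rate>90.00</rate></node>
            </node>
            <node>Group 2
               <node>Suba
                  <node>default<rate>1.00</rate></node>
                  <node>alternative<rate>88.00</rate></node>
               </node>
            </node>
         </node>
      </node>
   </node>
</root>"""

def label(elem):
    """Return the text in front of an element's children, without layout whitespace."""
    return (elem.text or "").strip()

def flatten(elem, path=()):
    """Yield (path, rates) for every node whose child nodes hold <rate> values."""
    children = elem.findall("node")
    rate_nodes = [c for c in children if c.find("rate") is not None]
    if rate_nodes:
        # This level is the end of a branch: collect its rates into one record.
        yield path, {label(c): c.find("rate").text.strip() for c in rate_nodes}
    for child in children:
        if child.find("rate") is None:
            # Not a rate holder, so it is another level: recurse with a longer path.
            yield from flatten(child, path + (label(child),))

def to_row(path, rates, width=5):
    """Pad the path to a fixed number of label columns and append the two rates."""
    padding = ("",) * (width - len(path))
    return path + padding + (rates.get("default", ""), rates.get("alternative", ""))

records = list(flatten(ET.fromstring(SAMPLE)))
rows = [to_row(path, rates) for path, rates in records]
for row in rows:
    print(row)
```

Because the path grows by one label per recursion step, nothing in flatten depends on how deep the nesting goes; only to_row assumes a column count, and that width is a parameter.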

Answered by: Gianna823 | Posted: 07-01-2022


