Python XML Processing

Python – XML Processing

XML (eXtensible Markup Language) is a widely used format for storing and exchanging data. It is a text-based format that is both human-readable and machine-readable. Python provides several libraries and modules for processing XML data, making it easy to parse, manipulate, and generate XML documents. In this article, we will explore the different ways to handle XML data in Python.

1. DOM (Document Object Model)

The Document Object Model (DOM) is a standard API for accessing and manipulating XML documents. Python provides the xml.dom module, which allows us to parse an XML document and create a tree-like structure of nodes representing the elements, attributes, and text content of the document.

Here’s an example of how to use the DOM API to parse an XML document:

import xml.dom.minidom

# Load the XML document
xml_doc = xml.dom.minidom.parse("example.xml")

# Get the root element
root = xml_doc.documentElement

# Access elements and attributes
title = root.getElementsByTagName("title")[0]
print("Title:", title.firstChild.data)

author = root.getElementsByTagName("author")[0]
print("Author:", author.firstChild.data)

# Modify the XML document
title.firstChild.data = "New Title"

# Save the modified document
with open("modified.xml", "w") as file:
    file.write(xml_doc.toxml())

This example demonstrates how to load an XML document, access elements and attributes, modify the content, and save the modified document back to a file.

2. SAX (Simple API for XML)

The Simple API for XML (SAX) is an event-based API for parsing XML documents. Instead of creating a tree-like structure of nodes, SAX parses the XML document sequentially and triggers events for each element, attribute, and text content encountered.

Python provides the xml.sax module, which allows us to define custom event handlers to process the XML events. Here’s an example:

import xml.sax

# Define a custom event handler
class MyHandler(xml.sax.ContentHandler):
    def startElement(self, name, attrs):
        if name == "title":
            self.title = ""
    
    def characters(self, content):
        self.title += content
    
    def endElement(self, name):
        if name == "title":
            print("Title:", self.title)

# Create a SAX parser
parser = xml.sax.make_parser()

# Set the custom event handler
handler = MyHandler()
parser.setContentHandler(handler)

# Parse the XML document
parser.parse("example.xml")

In this example, we define a custom event handler that captures the content of the “title” element and prints it. The SAX parser sequentially triggers the startElement, characters, and endElement methods of the event handler as it parses the XML document.

3. ElementTree

The ElementTree module provides a high-level, easy-to-use interface for parsing and manipulating XML data. It is part of the Python standard library, so no additional installation is required.

Here’s an example of how to use ElementTree to parse an XML document:

import xml.etree.ElementTree as ET

# Parse the XML document
tree = ET.parse("example.xml")

# Get the root element
root = tree.getroot()

# Access elements and attributes
title = root.find("title").text
print("Title:", title)

author = root.find("author").text
print("Author:", author)

# Modify the XML document
root.find("title").text = "New Title"

# Save the modified document
tree.write("modified.xml")

This example demonstrates how to parse an XML document using ElementTree, access elements and attributes using XPath-like syntax, modify the content, and save the modified document back to a file.

Conclusion

Python provides several options for processing XML data, including the DOM API, SAX API, and the ElementTree module. Each approach has its own advantages and use cases. Whether you need to parse, manipulate, or generate XML documents, Python has the tools to make it easy and efficient.

By using the appropriate XML processing library or module in Python, you can easily work with XML data and integrate it into your applications or workflows.

Scroll to Top