How Do I Fix the Text/Byte Split Issue in Python 3’s XML Generation?

The script builds a tiny XML file with lxml: it creates a <results> root, adds two nodes for “Country” and “City,” then tries to save the tree with doc.write(outFile). Unfortunately, it opens the file using open('output.xml', 'w'), which returns a text stream that expects str data, while lxml's ElementTree.write() emits encoded bytes (ASCII unless you pass an encoding).

When those bytes hit a text-only handle, Python 3 raises TypeError: must be str, not bytes. Opening the file in binary mode ('wb') or letting lxml handle the path fixes the clash.

How I tripped over Python 3’s text/bytes split and tightened up my XML generator

The Error

I was dusting off a small XML-report script, nothing mission-critical, just a helper that spits out a list of countries and cities for a test harness. On Python 2.7 it purred. I upgraded the repo to Python 3.2, slammed Run, and the interpreter hit me with:

TypeError: must be str, not bytes

The crash happened on the very last write. Classic Py3 moment. Here’s the forensic rundown and the tidy upgrade I ended up shipping.

The Original Code

# -*- coding: utf-8 -*-
import time
from datetime import date
from lxml import etree
from collections import OrderedDict

page = etree.Element('results')
doc = etree.ElementTree(page)

# Elements
etree.SubElement(page, 'Country', Tim='Now',
                 name='Germany', AnotherParameter='Bye',
                 Code='DE', Storage='Basic')
etree.SubElement(page, 'City',
                 name='Germany', Code='PZ',
                 Storage='Basic', AnotherParameter='Hello')

# Save
outFile = open('output.xml', 'w') # <- trouble lives here
doc.write(outFile)

Explaining the Error

Python 3 draws a hard line between text (str) and binary (bytes):

What I did | What that means in Py3
---------- | ----------------------
open('output.xml', 'w') | I opened the file in text mode. The stream expects str.
doc.write(outFile) | lxml.etree.ElementTree.write() serializes to encoded bytes (ASCII when no encoding is given), not str.

Bytes flowing into a text stream trigger the exact TypeError I saw. The function isn’t misbehaving; my file handle is.
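The split is easy to reproduce without lxml at all. The sketch below is a minimal illustration using only the standard library: `payload` stands in for a serializer's byte output, and `try_write` is a hypothetical helper I made up for the demo, not something from the original script.

```python
import os
import tempfile

# Bytes, just like the output of an XML serializer with a real encoding.
payload = "<results/>".encode("utf-8")

def try_write(mode):
    """Write payload to a temp file opened with `mode`; report what happened."""
    fd, path = tempfile.mkstemp(suffix=".xml")
    os.close(fd)
    try:
        with open(path, mode) as fh:
            fh.write(payload)
        return "ok"
    except TypeError as exc:
        # Text-mode handles reject bytes outright.
        return f"TypeError: {exc}"
    finally:
        os.remove(path)

text_result = try_write("w")     # text mode: expects str
binary_result = try_write("wb")  # binary mode: expects bytes

print(text_result)
print(binary_result)
```

The text-mode attempt comes back with a TypeError, while the binary-mode attempt succeeds. The exact wording of the message varies between Python versions.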

The Fix

Option A. Let lxml open the file
  Change: doc.write('output.xml', encoding='utf-8', xml_declaration=True)
  Why it works: I pass a file path (a str). lxml opens the file itself, in binary mode, so its bytes go where bytes belong.

Option B. Open the file in binary mode
  Change:
      with open('output.xml', 'wb') as f:
          doc.write(f, encoding='utf-8', xml_declaration=True)
  Why it works: 'wb' returns a binary stream, which welcomes the bytes without complaint.

Either way, goodbye TypeError.
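To convince myself that the binary-handle route really produces a usable file, a quick round trip helps: write, read the bytes back, parse them. The sketch below uses the standard library's xml.etree.ElementTree as a stand-in so it runs without lxml installed. That swap is an assumption of this example, not what the script above uses, although its write() accepts the same encoding and xml_declaration arguments.

```python
import os
import tempfile
import xml.etree.ElementTree as ET  # stand-in for lxml.etree (assumption)

root = ET.Element("results")
ET.SubElement(root, "Country", name="Germany", Code="DE")
tree = ET.ElementTree(root)

fd, path = tempfile.mkstemp(suffix=".xml")
os.close(fd)
try:
    # Binary handle: the serializer's bytes are accepted as-is.
    with open(path, "wb") as fh:
        tree.write(fh, encoding="utf-8", xml_declaration=True)

    # Read the file back as bytes and parse it to confirm it is well-formed.
    with open(path, "rb") as fh:
        raw = fh.read()
    parsed = ET.fromstring(raw)
finally:
    os.remove(path)

print(raw.decode("utf-8"))
print(parsed.tag, parsed[0].get("name"))
```

If the parse succeeds and the root tag and attributes match what went in, the bytes landed on disk intact.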

The Fixed Code

While I was in the code, I polished it into a reusable helper:

  • Pretty prints the XML (so I can eyeball diffs).
  • Builds nodes from an OrderedDict, not hard-coded literals.
  • Accepts an output filename on the command line.
  • Times the run—handy when I batch-generate big reports.

#!/usr/bin/env python3
"""
xml_builder.py – bite-size XML generator
Run:
    python xml_builder.py [outfile.xml]
"""
import sys
import time
from datetime import datetime, timezone
from collections import OrderedDict
from lxml import etree

# ---------- helpers ----------
def build_tree(records):
    # Timezone-aware UTC timestamp (datetime.utcnow() is deprecated since Python 3.12)
    root = etree.Element('results', generated=datetime.now(timezone.utc).isoformat())
    for tag, attrs in records:
        etree.SubElement(root, tag, **attrs)
    return etree.ElementTree(root)

def save_tree(tree, filename='output.xml'):
    with open(filename, 'wb') as fh:  # binary = no TypeError
        tree.write(fh,
                   encoding='utf-8',
                   xml_declaration=True,
                   pretty_print=True)

# ---------- main ----------
if __name__ == '__main__':
    t0 = time.perf_counter()

    data = [
        ('Country', OrderedDict([
            ('Tim', 'Now'),
            ('name', 'Germany'),
            ('AnotherParameter', 'Bye'),
            ('Code', 'DE'),
            ('Storage', 'Basic')
        ])),
        ('City', OrderedDict([
            ('name', 'Germany'),
            ('Code', 'PZ'),
            ('Storage', 'Basic'),
            ('AnotherParameter', 'Hello')
        ]))
    ]

    tree = build_tree(data)
    target = sys.argv[1] if len(sys.argv) > 1 else 'output.xml'
    save_tree(tree, target)

    print(f'XML written to {target} in {time.perf_counter() - t0:.4f}s')

Extend It

Want to stretch the script? Try these:

  1. Add argparse so you can inject tag/attribute pairs from the shell.
  2. Validate the output against an XSD before writing, to catch schema drift early.
  3. Benchmark pretty_print=True vs False with a few hundred thousand nodes.
  4. Swap libraries: rewrite with xml.etree.ElementTree and compare speed and API clarity.
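For the first idea, here is one possible shape for the command-line side, sketched with the standard library only. The `--node TAG:key=value,...` syntax and the `parse_node` and `make_parser` helpers are my own invention for illustration; nothing in the script above defines them. The point is that each parsed node comes out as a (tag, attrs) pair, the same shape build_tree() already consumes.

```python
import argparse

def parse_node(spec):
    """Turn 'Country:name=Germany,Code=DE' into ('Country', {...})."""
    tag, _, attr_text = spec.partition(":")
    if not tag:
        raise argparse.ArgumentTypeError(f"missing tag name in {spec!r}")
    attrs = {}
    # Skip empty chunks so a trailing comma is harmless.
    for pair in filter(None, attr_text.split(",")):
        key, sep, value = pair.partition("=")
        if not sep or not key:
            raise argparse.ArgumentTypeError(f"expected key=value, got {pair!r}")
        attrs[key] = value
    return tag, attrs

def make_parser():
    parser = argparse.ArgumentParser(description="Build a small XML report.")
    parser.add_argument("outfile", nargs="?", default="output.xml")
    parser.add_argument("--node", action="append", type=parse_node, default=[],
                        metavar="TAG:key=value,...")
    return parser

# Parse an explicit argument list so the sketch runs without a real shell.
args = make_parser().parse_args([
    "report.xml",
    "--node", "Country:name=Germany,Code=DE",
    "--node", "City:name=Germany,Code=PZ",
])

print(args.outfile)
print(args.node)
```

In a real script you would call parse_args() with no arguments and pass args.node straight to build_tree().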

Final Thought

The bug took five minutes to squash, but it reminded me why Python 3’s harsh stance on text versus bytes is a blessing. After the fix I walked away with cleaner I/O, a faster script, and a little utility I can drop into any project. Worth the detour, and now my CI pipeline stays green.
