How to Fix the u'\u2013' UnicodeEncodeError in openpyxl on Python 2.7

I hit this snag while cleaning up an old Python 2.7 box that still churns out reports. One Excel file refused to convert because of a single en dash hiding in the data. Here’s how I spotted the bug, squashed it, and then polished the script so it can serve me (and maybe you) for years to come.

My Code

#!/home/casper/python/core/2.7.14/exec/bin/python2.7
# -*- coding: utf-8 -*-
import openpyxl
import csv

wb = openpyxl.load_workbook('RiskLimitSnapshot.xlsx')
sh = wb.get_active_sheet()

with open('goodRiskLimitSnapshot.csv', 'wb') as f:
    c = csv.writer(f)
    for r in sh.rows:
        c.writerow([cell.value for cell in r])

The Error

UnicodeEncodeError: 'ascii' codec can't encode character u'\u2013' in position 74: ordinal not in range(128)

Explaining the Error

  • u'\u2013' is the en dash. Perfectly normal in Unicode, but not part of ASCII’s tiny 128-character family.
  • Python 2 tries ASCII first whenever it has to squeeze a unicode string into raw bytes. Anything above 127 explodes with UnicodeEncodeError.
  • The csv module in Python 2 only writes bytes, so the unwanted conversion happens inside writer.writerow(), miles away from my loop.
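You can reproduce the trap in an interpreter, no spreadsheet required. This is a minimal sketch of the same default-ASCII encode; the value is just a made-up example:

# Python 2: any implicit unicode -> str conversion goes through the ASCII codec
value = u'Limit \u2013 Snapshot'   # unicode string containing an en dash
value.encode('utf-8')              # fine: 'Limit \xe2\x80\x93 Snapshot'
str(value)                         # UnicodeEncodeError: 'ascii' codec can't encode character u'\u2013'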

I tried opening the file with io.open(..., encoding='utf8') next, but that swung the pendulum the other way:

TypeError: write() argument 1 must be unicode, not str

Now the file object expected unicode, while csv.writer was still sending byte strings. Same fight, opposite corner.
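If you want to see that mismatch in isolation, roughly this combination reproduces it. It is a sketch of the failed attempt, not the fix:

import io
import csv

# Text-mode file: on Python 2, every write() must receive unicode
with io.open('goodRiskLimitSnapshot.csv', 'w', encoding='utf-8') as f:
    c = csv.writer(f)
    # csv.writer builds a plain byte str for each row, so the first write() fails:
    # TypeError: write() argument 1 must be unicode, not str
    c.writerow(['plain old bytes'])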

Correct Code

#!/usr/bin/env python2
# -*- coding: utf-8 -*-
from __future__ import print_function

import openpyxl, csv, os, sys

def excel_to_csv(xlsx_path, csv_path, sheet_name=None):
    """
    Convert one sheet from an .xlsx file to a UTF-8 CSV file.
    """
    wb = openpyxl.load_workbook(xlsx_path, data_only=True, read_only=True)
    sh = wb[sheet_name] if sheet_name else wb.active

    with open(csv_path, 'wb') as fh:
        writer = csv.writer(fh)
        for row in sh.rows:
            utf8_row = []
            for cell in row:
                # Make every value Unicode first
                text = u'' if cell.value is None else unicode(cell.value)
                # Then encode safely to UTF-8
                utf8_row.append(text.encode('utf-8'))
            writer.writerow(utf8_row)

if __name__ == '__main__':
    # Usage: python excel2csv.py sheet.xlsx sheet.csv "Optional Sheet Name"
    if len(sys.argv) < 3:
        sys.exit("Usage: excel2csv.py <xlsx> <csv> [<sheet-name>]")
    excel_to_csv(sys.argv[1], sys.argv[2],
                 sys.argv[3] if len(sys.argv) > 3 else None)
    print("Saved →", os.path.abspath(sys.argv[2]))

Why it works

  • I keep the file in binary mode ('wb') because csv demands raw bytes on Python 2.
  • I explicitly encode every cell to UTF-8, side-stepping the silent ASCII trap.
  • read_only=True streams rows, so even a giant workbook doesn’t hog RAM.
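And because it’s now a small function rather than a top-level script, I can import it from other Python 2 code too. A quick sketch, assuming the file is saved as excel2csv.py (as in the usage string) and that 'Limits' stands in for a real sheet name:

# Calling the helper from another Python 2 script
# (assumes the code above lives in excel2csv.py; 'Limits' is a made-up sheet name)
from excel2csv import excel_to_csv

excel_to_csv('RiskLimitSnapshot.xlsx', 'goodRiskLimitSnapshot.csv')              # active sheet
excel_to_csv('RiskLimitSnapshot.xlsx', 'limits_only.csv', sheet_name='Limits')   # named sheet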

Fun Practice Tweaks

Each one is something to add, with a one-line hint on how:

  • Convert all sheets: loop over wb.sheetnames and dump one <sheet>.csv per sheet.
  • Skip blank rows: if all(c.value is None for c in row): continue.
  • Swap weird dashes: text = text.replace(u'\u2013', '-').
  • Try another delimiter: csv.writer(fh, delimiter='\t').
  • See progress: if i % 1000 == 0: print("row", i).
  • Batch every .xlsx: for path in glob.glob('*.xlsx'): (see the sketch below).
  • Tiny unit test: write clean(text) and hit it with pytest.

These micro challenges keep the muscle memory fresh when you’re stuck on legacy Python.
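As a taste, here is a rough sketch that combines the “convert all sheets” and “batch every .xlsx” tweaks, reusing excel_to_csv() from above. The <base>_<sheet>.csv naming scheme is just my assumption:

# Batch every .xlsx in the folder and dump each of its sheets to its own CSV.
# Reuses excel_to_csv() from above; the <base>_<sheet>.csv naming is one possible choice.
import glob
import os
import openpyxl

for xlsx_path in glob.glob('*.xlsx'):
    base = os.path.splitext(xlsx_path)[0]
    # Open once (read-only) just to list the sheet names
    for name in openpyxl.load_workbook(xlsx_path, read_only=True).sheetnames:
        excel_to_csv(xlsx_path, '%s_%s.csv' % (base, name), sheet_name=name)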

Key Takeaways

  1. Declare the encoding yourself in Python 2; never let ASCII sneak in.
  2. csv is byte-only on Py2. Either hand it bytes or use the unicodecsv back-port for nicer semantics (sketched below).
  3. A few quality-of-life flags (sheet name, delimiter, logging) turn a throwaway script into a tiny command-line app you’ll reuse.
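If you go the unicodecsv route, the conversion collapses to something like this. It’s a sketch, assuming pip install unicodecsv and its default writer API:

# Same conversion with the unicodecsv back-port (assumes: pip install unicodecsv)
import openpyxl
import unicodecsv

wb = openpyxl.load_workbook('RiskLimitSnapshot.xlsx', data_only=True, read_only=True)
sh = wb.active

with open('goodRiskLimitSnapshot.csv', 'wb') as fh:
    writer = unicodecsv.writer(fh, encoding='utf-8')
    for row in sh.rows:
        # unicodecsv encodes unicode cell values to UTF-8 itself
        writer.writerow([cell.value for cell in row])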

Final Thought

I wrote the first version just to dodge one stubborn en dash, but the refactor gave me a durable tool that chews through spreadsheets all day. That’s the sweet spot with these little maintenance chores: fix one headache, learn a clean pattern, and walk away with code that pays rent every morning.
