If you’re building a Django-based sales system and need to incorporate electronic invoicing compliant with government regulations, XML signing is a critical step. Many developers face challenges here, especially when transitioning from testing to production. I’ll walk through how to sign XML documents in Python, troubleshoot common errors (like the dreaded “Incorrect reference digest value”), and ensure compliance with regulatory standards.
XML Signature Validation Failures
In my case, the regulatory authority required XML invoices to be signed with the company’s digital certificate, zipped, and sent via a web service. Despite using Python’s lxml and cryptography libraries, the web service returned an error:
Error: The entered electronic document has been altered - Detail: Incorrect reference digest value
This error means that the digest stored in DigestValue (a hash of the canonicalized XML content) doesn’t match the digest the web service recomputes from the document it received. The usual culprits are a canonicalization method that differs from the one declared in the signature, or changes to the XML after the digest was calculated.
Understanding XML Signatures
XML signatures follow the XML Signature Syntax and Processing standard. The process involves:
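- Canonicalization: Serializing the XML in a standardized byte form, so that equivalent documents (differing only in attribute order, namespace declaration placement, or empty-element syntax, for example) hash identically. Whitespace between elements counts as content and is preserved.
- Digest Calculation: Hashing the canonicalized content and storing the result, Base64-encoded, in DigestValue.
- Signature Generation: Canonicalizing the SignedInfo element (which contains the digest) and signing it with the private key.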
Mistakes in any step break the signature validation.
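As a minimal sketch of the digest step, assuming SHA-256 and that the canonicalized bytes are already available (the sample bytes and the helper name digest_value are illustrative, not part of the original code), DigestValue is the Base64 encoding of the raw hash bytes rather than of the hex string:

```python
import base64
import hashlib


def digest_value(canonical_bytes: bytes) -> str:
    """Return the Base64 text that goes inside DigestValue (SHA-256 assumed)."""
    raw = hashlib.sha256(canonical_bytes).digest()  # 32 raw bytes, not hex
    return base64.b64encode(raw).decode("ascii")


# Illustrative input; in practice this is the output of the canonicalizer.
sample = b"<Invoice><ID>F001-1</ID></Invoice>"
value = digest_value(sample)

# A common mistake: Base64-encoding the hex digest instead of the raw bytes.
wrong = base64.b64encode(hashlib.sha256(sample).hexdigest().encode()).decode("ascii")
```

If a validator rejects the digest even though canonicalization looks right, this encoding detail is worth checking first.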
What Went Wrong?
Let’s dissect the original code and identify issues:
Incorrect Canonicalization
The code uses etree.tostring(copy_tree, exclusive=1) for canonicalization. While exclusive=1 applies Exclusive XML Canonicalization, the regulatory authority might require Inclusive Canonicalization (common in Latin American e-invoicing systems). This discrepancy alters the digest value.
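To see how the two modes differ, here is a small sketch using lxml on a made-up document (the element names and the urn:example namespaces are placeholders, not a real invoice schema). The unused namespace declaration survives inclusive canonicalization but not the exclusive one, so the hashed bytes differ:

```python
from lxml import etree

# Hypothetical document: one namespace is used, the other is only declared.
doc = etree.fromstring(
    b'<Invoice xmlns:cac="urn:example:cac" xmlns:cbc="urn:example:cbc">'
    b"<cbc:ID>F001-1</cbc:ID></Invoice>"
)

# Inclusive C14N 1.0 keeps every namespace declaration that is in scope.
inclusive = etree.tostring(doc, method="c14n", exclusive=False)

# Exclusive C14N only emits a declaration where its prefix is visibly used.
exclusive = etree.tostring(doc, method="c14n", exclusive=True)

print(inclusive.decode())
print(exclusive.decode())
```

Whichever mode the authority specifies must be used both for the bytes you hash and for the Algorithm URI you declare in the signature.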
Manual Whitespace Stripping
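Using re.sub(b'>\s*<', b'><', xml_serialized) is a red flag. Canonicalization preserves whitespace between elements, so the verifier hashes it too; stripping it by hand means the bytes you hash are no longer the bytes the verifier reconstructs from the document you send.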
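The effect is easy to reproduce. The sketch below uses the standard library's xml.etree.ElementTree.canonicalize, which implements C14N 2.0 rather than the C14N 1.0 used elsewhere in this post; it is swapped in only to keep the example dependency-free, and by default it preserves text nodes just as C14N 1.0 does. The sample document is made up:

```python
import hashlib
import re
from xml.etree.ElementTree import canonicalize

# A pretty-printed document, as many generators produce it.
pretty = "<Invoice>\n  <ID>F001-1</ID>\n</Invoice>"

# Canonicalization keeps the newlines and indentation between elements.
canonical = canonicalize(pretty).encode("utf-8")

# The manual regex removes them, producing different bytes.
stripped = re.sub(rb">\s*<", b"><", canonical)

kept_whitespace = b"\n  <ID>" in canonical
digests_differ = (
    hashlib.sha256(canonical).digest() != hashlib.sha256(stripped).digest()
)
```

If the authority's samples contain no inter-element whitespace, the safer route is to generate the document without pretty-printing instead of stripping it after the fact.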
Reference URI Mismatch
The Reference element’s URI attribute was empty (URI=""), implying the entire document is signed. However, some systems expect a fragment identifier (e.g., URI="#Invoice"), especially if the XML contains multiple signable sections.
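The difference between the two URI forms can be sketched as a lookup. The helper resolve_reference, the Id attribute name, and the sample document are all hypothetical; which attribute counts as an ID depends on the schema your authority publishes:

```python
import xml.etree.ElementTree as ET


def resolve_reference(root, uri, id_attr="Id"):
    """Return the element a same-document Reference URI points to."""
    if uri == "":
        return root  # empty URI: the whole document is digested
    if not uri.startswith("#"):
        raise ValueError("only same-document references are handled here")
    target = uri[1:]
    for element in root.iter():
        if element.get(id_attr) == target:
            return element
    raise LookupError(f"no element with {id_attr}={target!r}")


# Made-up document with two candidate sections.
sample = ET.fromstring('<Doc><Invoice Id="Invoice"/><Other Id="x"/></Doc>')
whole = resolve_reference(sample, "")
fragment = resolve_reference(sample, "#Invoice")
```

Signing the whole document when the verifier digests only a fragment (or the reverse) produces exactly the kind of digest mismatch described above.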
Incorrect Order of Operations
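The DigestValue was calculated before the SignedInfo structure was finalized. The order matters: DigestValue must be computed over the referenced content and placed in SignedInfo first, and only then is SignedInfo canonicalized and signed. Any change to SignedInfo after signing invalidates SignatureValue; any change to the referenced content after hashing invalidates DigestValue.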
Revised Python Code
Here’s an improved approach using lxml and cryptography, with careful attention to canonicalization and transformations:
from lxml import etree
from cryptography import x509
from cryptography.hazmat.backends import default_backend
from cryptography.hazmat.primitives import hashes
from cryptography.hazmat.primitives.asymmetric import padding
from cryptography.hazmat.primitives.serialization import Encoding, load_pem_private_key
import base64

DS = "http://www.w3.org/2000/09/xmldsig#"
C14N = "http://www.w3.org/TR/2001/REC-xml-c14n-20010315"  # Inclusive Canonicalization


def sign_xml(xml_path, private_key_path, cert_path, output_path):
    # Load XML
    tree = etree.parse(xml_path)
    root = tree.getroot()

    # Register namespaces
    ns = {
        "ds": DS,
        "ext": "urn:oasis:names:specification:ubl:schema:xsd:CommonExtensionComponents-2",
        "cac": "...",  # Add all required namespaces
    }

    # Locate the element that will hold the Signature
    extension_content = root.find(".//ext:ExtensionContent", namespaces=ns)
    if extension_content is None:
        raise ValueError("ExtensionContent not found!")

    # --- Calculate DigestValue ---
    # The enveloped-signature transform excludes the Signature element, so
    # canonicalize and hash the document *before* the Signature is inserted.
    canonical_xml = etree.tostring(root, method="c14n", exclusive=False, with_comments=False)
    digest = hashes.Hash(hashes.SHA256(), backend=default_backend())
    digest.update(canonical_xml)
    digest_value = base64.b64encode(digest.finalize()).decode()

    # Create Signature structure (nsmap gives it the conventional "ds" prefix)
    signature = etree.SubElement(
        extension_content, f"{{{DS}}}Signature", Id="SignatureSP", nsmap={"ds": DS}
    )
    signed_info = etree.SubElement(signature, f"{{{DS}}}SignedInfo")

    # Canonicalization method (confirm with your authority!)
    etree.SubElement(signed_info, f"{{{DS}}}CanonicalizationMethod", Algorithm=C14N)

    # Signature method
    etree.SubElement(
        signed_info,
        f"{{{DS}}}SignatureMethod",
        Algorithm="http://www.w3.org/2001/04/xmldsig-more#rsa-sha256",
    )

    # Reference element with correct URI
    reference = etree.SubElement(
        signed_info,
        f"{{{DS}}}Reference",
        URI="",  # Or "#Invoice" if targeting a specific element
    )

    # Transforms
    transforms = etree.SubElement(reference, f"{{{DS}}}Transforms")
    etree.SubElement(
        transforms,
        f"{{{DS}}}Transform",
        Algorithm="http://www.w3.org/2000/09/xmldsig#enveloped-signature",
    )
    # Add canonicalization transform if needed
    etree.SubElement(transforms, f"{{{DS}}}Transform", Algorithm=C14N)

    # Digest method, followed by the value computed above
    etree.SubElement(
        reference,
        f"{{{DS}}}DigestMethod",
        Algorithm="http://www.w3.org/2001/04/xmlenc#sha256",
    )
    etree.SubElement(reference, f"{{{DS}}}DigestValue").text = digest_value

    # --- Calculate SignatureValue ---
    # Canonicalize the finished SignedInfo while it is still attached to the document.
    # Note: per the C14N 1.0 spec, inclusive canonicalization of a subtree also emits
    # namespace declarations inherited from its ancestors; check this output against
    # a reference verifier before going to production.
    signed_info_canonical = etree.tostring(signed_info, method="c14n", exclusive=False)

    # Load private key
    with open(private_key_path, "rb") as f:
        private_key = load_pem_private_key(f.read(), password=None, backend=default_backend())

    # Sign
    signature_bytes = private_key.sign(signed_info_canonical, padding.PKCS1v15(), hashes.SHA256())
    signature_value = base64.b64encode(signature_bytes).decode()
    etree.SubElement(signature, f"{{{DS}}}SignatureValue").text = signature_value

    # Add X509 certificate. Assumes a PEM file; X509Certificate must hold the
    # Base64 of the DER bytes, so for a DER file encode the raw bytes directly.
    key_info = etree.SubElement(signature, f"{{{DS}}}KeyInfo")
    x509_data = etree.SubElement(key_info, f"{{{DS}}}X509Data")
    with open(cert_path, "rb") as cert_file:
        cert = x509.load_pem_x509_certificate(cert_file.read(), default_backend())
    cert_der = cert.public_bytes(Encoding.DER)
    etree.SubElement(x509_data, f"{{{DS}}}X509Certificate").text = base64.b64encode(cert_der).decode()

    # Save signed XML
    tree.write(output_path, encoding="UTF-8", xml_declaration=True)
Key Fixes Explained
- Canonicalization Method: Switched to method="c14n" (Inclusive Canonicalization) instead of manual regex stripping. Confirm with your authority which method they require.
- Transforms Order: Added both enveloped-signature (to exclude the Signature itself during hashing) and a canonicalization transform if needed.
- Digest Calculation: The digest is computed over the document without the Signature element, which is what the enveloped-signature transform tells the verifier to do.
- Proper Namespace Handling: Explicitly defined all namespaces to prevent mismatches.
Testing and Validation
- Validate XML Structure: Use tools like XMLSchema to ensure compliance with the regulatory schema.
- Online Validators: Test your signed XML with tools like XMLSec or government-provided validators.
- Compare with Official Samples: Obtain a correctly signed XML sample from your regulatory authority and compare structures using diff tools.
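Before sending anything to the web service, a local check can recompute the reference digest the way a verifier would. The sketch below is not a full signature validator. It assumes a single Reference with URI="", the enveloped-signature transform, Inclusive C14N 1.0, and SHA-256; the function name digest_matches and the sample document are illustrative:

```python
import base64
import hashlib

from lxml import etree

DS = "http://www.w3.org/2000/09/xmldsig#"


def digest_matches(signed_xml: bytes) -> bool:
    """Recompute the reference digest of an enveloped signature and compare it."""
    root = etree.fromstring(signed_xml)
    signature = root.find(f".//{{{DS}}}Signature")
    if signature is None:
        raise ValueError("no Signature element found")
    stored = signature.findtext(f".//{{{DS}}}DigestValue")
    if stored is None:
        raise ValueError("no DigestValue element found")

    # Enveloped-signature transform: drop the Signature subtree. lxml removes
    # an element together with its tail text, so re-attach the tail afterwards.
    parent = signature.getparent()
    previous = signature.getprevious()
    tail = signature.tail
    parent.remove(signature)
    if tail:
        if previous is not None:
            previous.tail = (previous.tail or "") + tail
        else:
            parent.text = (parent.text or "") + tail

    canonical = etree.tostring(root, method="c14n", exclusive=False, with_comments=False)
    computed = base64.b64encode(hashlib.sha256(canonical).digest()).decode("ascii")
    return computed == stored.strip()


# Tiny self-check with a hand-built document (element names are illustrative).
doc = etree.fromstring(b"<Invoice><Ext></Ext><ID>F001-1</ID></Invoice>")
value = base64.b64encode(hashlib.sha256(etree.tostring(doc, method="c14n")).digest())
sig = etree.SubElement(doc[0], f"{{{DS}}}Signature")
etree.SubElement(sig, f"{{{DS}}}DigestValue").text = value.decode("ascii")
signed = etree.tostring(doc)

ok = digest_matches(signed)
tampered_ok = digest_matches(signed.replace(b"F001-1", b"F001-2"))
```

Running the same function against an official sample from the authority is a quick way to confirm that the assumed canonicalization and transforms are the ones they actually use.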
Final Thoughts
XML signing in Python is feasible but requires meticulous attention to canonicalization, transforms, and digest calculations. While lxml and cryptography work, consider using specialized libraries like xmlsec for a more streamlined workflow. Always double-check the regulatory requirements for canonicalization methods, URI references, and certificate formatting; these details are often the difference between success and cryptic errors.
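On certificate formatting specifically: X509Certificate holds the Base64 of the DER-encoded certificate, which is the PEM body without its BEGIN/END lines. A minimal sketch using only the standard library, assuming the certificate is stored as PEM (the helper name x509_certificate_text is illustrative, and the sample bytes are a placeholder rather than a real certificate):

```python
import base64
import ssl


def x509_certificate_text(pem_text: str) -> str:
    """Return the Base64 of the DER certificate for use inside X509Certificate."""
    der = ssl.PEM_cert_to_DER_cert(pem_text.strip())
    return base64.b64encode(der).decode("ascii")


# Placeholder bytes wrapped as PEM; the ssl helpers only handle the
# armor and Base64, they do not parse the certificate itself.
fake_der = b"not a real certificate"
pem = ssl.DER_cert_to_PEM_cert(fake_der)
text = x509_certificate_text(pem)
```

Base64-encoding the PEM file as-is would double-encode the certificate and include the header lines, which validators typically reject.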