Validating OpenXml generated documents, or "The file cannot be opened because there are problems with the contents"

Stack Overflow is useful until you have a very specific question that you have already googled before asking. My last 11 questions were left unanswered for one reason or another.

Today was no exception. My question about the debugging process during OpenXml development went unnoticed. So I had to figure it out for myself! Sigh!

UPDATE: I have discovered the Open XML SDK Tool, which adds a lot of features to this game.

Anyway, I digress. My current task involves generating an MS Word document from C#. For that I’m using the OpenXml library for .Net. This is my first time touching OpenXml, and maybe I’m talking about basic stuff, but this was not easily googleable.

Many times after writing a Word file, I try to open it and see this message:

The file .docx cannot be opened because there are problems with the contents. Details: Unspecified error

This is really frustrating. The error message sometimes gives you a column inside the XML file to look at, but does not say what exactly is wrong with the document. You stare hard at the generated XML, but can’t see anything, because your eyes bleed from the angle brackets. Eventually you figure out that you were trying to do something silly, like adding Text to a Paragraph without a Run, or adding a Paragraph directly to a TableRow without a TableCell (if you’ve done some OpenXml, you’ll know what I’m talking about!).
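For anyone who has not hit this yet, here is a minimal sketch of the nesting I mean. The class and method names (NestingSketch, BuildParagraph, BuildRow) are made up for illustration; the element classes come from DocumentFormat.OpenXml.Wordprocessing.

```csharp
using DocumentFormat.OpenXml.Wordprocessing;

// Hypothetical helper, only here to illustrate the expected nesting.
public static class NestingSketch
{
    // Text lives inside a Run, and the Run lives inside a Paragraph.
    public static Paragraph BuildParagraph(string text)
    {
        return new Paragraph(new Run(new Text(text)));
    }

    // A Paragraph in a table goes inside a TableCell,
    // and the TableCell goes inside a TableRow.
    public static TableRow BuildRow(string cellText)
    {
        return new TableRow(new TableCell(BuildParagraph(cellText)));
    }
}
```

Skipping the Run or the TableCell still compiles, which is exactly why the mistake only shows up when Word refuses to open the file.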

I thought there should be a better way to validate documents, and there was! Right on the page where I took my documentation from, there was a link to a page that talked about document validation! How did I not see that!?

Here it is for your consideration: http://msdn.microsoft.com/en-us/library/office/bb497334(v=office.15).aspx

Turns out that the OpenXml library has an OpenXmlValidator class that does what it says on the tin: validates your OpenXml documents.

Being serious about my tests, I immediately rewrote the sample into a test helper that takes a Stream containing an OpenXml generated document and outputs the validation errors:

using System;
using System.IO;
using System.Linq;
using DocumentFormat.OpenXml.Packaging;
using DocumentFormat.OpenXml.Validation;


public static class WordDocumentValidator
{
    public static void ValidateWordDocument(Stream wordDocumentStream)
    {
        using (var wordprocessingDocument = WordprocessingDocument.Open(wordDocumentStream, false))
        {
            var validator = new OpenXmlValidator();
            var validationErrors = validator.Validate(wordprocessingDocument).ToList();
            var errorMessage = String.Format("There are {0} validation errors with document", validationErrors.Count);

            if (validationErrors.Any())
            {
                Console.WriteLine(errorMessage);
                Console.WriteLine();
            }

            foreach (var error in validationErrors)
            {
                Console.WriteLine("Description: " + error.Description);
                Console.WriteLine("ErrorType: " + error.ErrorType);
                Console.WriteLine("Node: " + error.Node);
                Console.WriteLine("Path: " + error.Path.XPath);
                Console.WriteLine("Part: " + error.Part.Uri);
                if (error.RelatedNode != null)
                {
                    Console.WriteLine("Related Node: " + error.RelatedNode);
                    Console.WriteLine("Related Node Inner Text: " + error.RelatedNode.InnerText);
                }
                Console.WriteLine();
                Console.WriteLine("==============================");
                Console.WriteLine();
            }

            if (validationErrors.Any())
            {
                throw new Exception(errorMessage);
            }
        }

    }
}

And this validator can be used like this:

using System.IO;
using DocumentFormat.OpenXml;
using DocumentFormat.OpenXml.Packaging;
using DocumentFormat.OpenXml.Wordprocessing;
using Xunit;


public class ValidatorTestSample
{
    // here you generate your Word document
    public static Stream GenerateValidDocument()
    {
        var memoryStream = new MemoryStream();
        using (var wordDocument = WordprocessingDocument.Create(memoryStream, WordprocessingDocumentType.Document))
        {
            // Add a main document part. 
            var mainPart = wordDocument.AddMainDocumentPart();

            // Create the document structure and add some text.
            mainPart.Document = new Document();
            var body = mainPart.Document.AppendChild(new Body());
            var paragraph = body.AppendChild(new Paragraph());
            var run = paragraph.AppendChild(new Run());
            run.AppendChild(new RunProperties(
                    new FontSize() { Val = "40" },
                    new RunFonts() { Ascii = "Helvetica" }
                ));
            run.AppendChild(new Text("Create text in body - CreateWordprocessingDocument"));
        }

        return memoryStream;
    }

    [Fact]
    public void GenerateValidDocument_Always_CreatesValidatedDocument()
    {
        var result = GenerateValidDocument();

        WordDocumentValidator.ValidateWordDocument(result);
    }
}

This is a very simplistic approach to the test, but it shows the general idea. This exact code gives you a failed test with output like this:

There are 1 validation errors with document

Description: The element has unexpected child element 'http://schemas.openxmlformats.org/wordprocessingml/2006/main:rFonts'.
ErrorType: Schema
Node: DocumentFormat.OpenXml.Wordprocessing.RunProperties
Path: /w:document[1]/w:body[1]/w:p[1]/w:r[1]/w:rPr[1]
Part: /word/document.xml
Related Node: DocumentFormat.OpenXml.Wordprocessing.RunFonts
Related Node Inner Text: 

It took me a while to figure out why the test was failing. The error points at the RunProperties node and says that the RunFonts element is unexpected inside it, yet the document renders perfectly fine if you open it in MS Word. The issue is that the schema the validator checks against defines the order in which the properties must appear. So if you swap the order in which you add the properties to the run:

run.AppendChild(new RunProperties(
        new RunFonts() { Ascii = "Helvetica" },
        new FontSize() { Val = "40" }
    ));

With this change the test passes. Strange, but at least I know exactly where to look for the problem!
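A possible alternative to remembering the order by hand is to use the typed properties on RunProperties instead of passing children to the constructor. This is only a sketch: the class name RunPropertiesSketch is made up, and it relies on my understanding that the SDK's typed setters insert each child at its schema position, so it is worth confirming with the validator on your SDK version.

```csharp
using DocumentFormat.OpenXml.Wordprocessing;

// Hypothetical helper, not part of the SDK.
public static class RunPropertiesSketch
{
    public static RunProperties Build()
    {
        var properties = new RunProperties();

        // Assigned in the "wrong" order on purpose. The assumption is that
        // the typed setters place each element where the schema expects it,
        // so RunFonts should still end up before FontSize.
        properties.FontSize = new FontSize { Val = "40" };
        properties.RunFonts = new RunFonts { Ascii = "Helvetica" };

        return properties;
    }
}
```

In the sample above this would replace the constructor call with run.AppendChild(RunPropertiesSketch.Build()).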

But to be honest, my validator test is not how I found out about the validator’s preference for ordering. That was thanks to an open-source project, Open XML Package Explorer. The software lets you open OpenXml documents in their XML form without unleashing a Zip archiver and Notepad++.

This is how exactly the same document looks in Package Explorer:

[Screenshot: validation in Package Explorer]

Here you can see the generated XML in a readable format. Plus, you can run a validator on your document and get a slightly better validation error message. This is very good for manually tweaking your document, making sure it works and then converting it to code. And this is where I figured out that the order of properties does matter.

So, to conclude: for automatic validation of your documents, use the OpenXmlValidator class provided. To figure out manually what is wrong, use Package Explorer.
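If Package Explorer is not at hand, the readable XML can also be pulled out from code, since a .docx file is just a Zip archive. The sketch below is my own addition, not something Package Explorer provides: DocxXmlDumper and ReadPart are made-up names, and it assumes .Net 4.5 or later for System.IO.Compression.

```csharp
using System.IO;
using System.IO.Compression;
using System.Xml.Linq;

// Hypothetical helper for eyeballing a single part of the package.
public static class DocxXmlDumper
{
    // Returns the indented XML of one part, by default the main document.
    public static string ReadPart(Stream docxStream, string partPath = "word/document.xml")
    {
        // leaveOpen: true so the caller can keep using the stream afterwards.
        using (var archive = new ZipArchive(docxStream, ZipArchiveMode.Read, leaveOpen: true))
        {
            var entry = archive.GetEntry(partPath);
            if (entry == null)
            {
                throw new FileNotFoundException("Part not found in package: " + partPath);
            }

            using (var partStream = entry.Open())
            {
                // XDocument.ToString() indents the XML by default.
                return XDocument.Load(partStream).ToString();
            }
        }
    }
}
```

For the sample above, Console.WriteLine(DocxXmlDumper.ReadPart(ValidatorTestSample.GenerateValidDocument())) would print the generated document.xml.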