Recently, I needed to integrate one of my applications with MS Office Word 2007 by generating dynamic *.docx reports. I didn't want to just find the steps to do it, though. I wanted to build a reusable library that I can use independently in any future project.
Introducing Docx File Format
Office 2007 introduced new file formats such as docx (for MS Word) and xlsx (for MS Excel). These formats are based on the ECMA Office Open XML standard (ECMA-376), which lets you define your data and styles in XML according to format-specific vocabularies (WordprocessingML for Word and SpreadsheetML for Excel). So, in the end, it's just XML, and you can easily control the style, data and configuration of your Office documents by modifying a few XML documents.
Here, I will focus on the new MS Word file format, docx. The format is actually a ZIP archive, which means you can open it with any ZIP extractor. The *.docx file itself is called a package. When you unzip a *.docx file, you get a collection of folders and XML files, and each file is called a part. These parts contain all the styles, layout, fonts and configuration of your Word document. The relationships between the parts are also defined in XML (in the .rels files inside the _rels folders). This is an important point: because the parts are plain XML, developers can change the styles or even the entire content of a document from any programming language. The format is not tied to Microsoft technologies; OpenXML is an ECMA standard, so it is accessible to any programming language.
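To make the package/part/relationship structure concrete, here is a small Python sketch (Python rather than .NET, to underline that any language can read the format). It builds a minimal, hypothetical three-part package in memory and maps each part to the content type declared for it in [Content_Types].xml. A file saved by Word contains many more parts (styles, settings, fonts and so on), and the lookup is simplified: the OPC specification also compares part names case-insensitively.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

# Namespaces and content types defined by the OPC / Office Open XML standard.
CT_NS = "http://schemas.openxmlformats.org/package/2006/content-types"
REL_NS = "http://schemas.openxmlformats.org/package/2006/relationships"
W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
OFFICE_DOC_REL = "http://schemas.openxmlformats.org/officeDocument/2006/relationships/officeDocument"
RELS_CT = "application/vnd.openxmlformats-package.relationships+xml"
MAIN_CT = "application/vnd.openxmlformats-officedocument.wordprocessingml.document.main+xml"


def build_minimal_docx() -> bytes:
    """Build a tiny, hypothetical .docx package in memory (demo only)."""
    parts = {
        "[Content_Types].xml": (
            f'<Types xmlns="{CT_NS}">'
            f'<Default Extension="rels" ContentType="{RELS_CT}"/>'
            '<Default Extension="xml" ContentType="application/xml"/>'
            f'<Override PartName="/word/document.xml" ContentType="{MAIN_CT}"/>'
            "</Types>"
        ),
        "_rels/.rels": (
            f'<Relationships xmlns="{REL_NS}">'
            f'<Relationship Id="rId1" Type="{OFFICE_DOC_REL}" Target="word/document.xml"/>'
            "</Relationships>"
        ),
        "word/document.xml": (
            f'<w:document xmlns:w="{W_NS}"><w:body>'
            "<w:p><w:r><w:t>Hello</w:t></w:r></w:p></w:body></w:document>"
        ),
    }
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as zf:
        for name, xml in parts.items():
            zf.writestr(name, xml)
    return buffer.getvalue()


def part_content_types(docx_file) -> dict:
    """Map each part in the package to the content type declared for it."""
    with zipfile.ZipFile(docx_file) as zf:
        types = ET.fromstring(zf.read("[Content_Types].xml"))
        # <Default> maps a file extension to a content type ...
        defaults = {d.get("Extension").lower(): d.get("ContentType")
                    for d in types.findall(f"{{{CT_NS}}}Default")}
        # ... and <Override> maps one specific part name to a content type.
        overrides = {o.get("PartName"): o.get("ContentType")
                     for o in types.findall(f"{{{CT_NS}}}Override")}
        result = {}
        for name in zf.namelist():
            if name == "[Content_Types].xml" or name.endswith("/"):
                continue  # skip the content-types item itself and folder entries
            extension = name.rsplit(".", 1)[-1].lower()
            result[name] = overrides.get("/" + name, defaults.get(extension))
        return result


if __name__ == "__main__":
    for part, content_type in part_content_types(io.BytesIO(build_minimal_docx())).items():
        print(part, "->", content_type)
```

Pointing part_content_types at a real *.docx file path works the same way, since zipfile reads the entries without extracting anything to disk.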
The XML part of interest in docx files is "document.xml". This part contains the main body of the Word document, that is, the text you type and its inline formatting (headers, footers and styles live in separate parts). To see its format, create a Word document, type some text into it, unzip it, open document.xml in the "word" folder, and see how your document is expressed as XML.
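A short Python sketch of what you will find in document.xml: paragraphs (w:p) contain runs (w:r), and each run holds its text (w:t) plus optional formatting (w:rPr). The sample XML is hand-written and simplified; a real document.xml carries many more elements and namespace declarations.

```python
import xml.etree.ElementTree as ET

# WordprocessingML main namespace (the "w:" prefix in document.xml).
W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
NS = {"w": W_NS}

# Simplified document.xml: two paragraphs; the first is split into two runs
# because its second word is bold (w:b inside the run properties w:rPr).
SAMPLE_DOCUMENT_XML = (
    f'<w:document xmlns:w="{W_NS}"><w:body>'
    '<w:p><w:r><w:t xml:space="preserve">Hello </w:t></w:r>'
    "<w:r><w:rPr><w:b/></w:rPr><w:t>World</w:t></w:r></w:p>"
    "<w:p><w:r><w:t>Second paragraph</w:t></w:r></w:p>"
    "</w:body></w:document>"
)


def paragraph_texts(document_xml: str) -> list:
    """Return the plain text of each paragraph by joining its w:t nodes."""
    root = ET.fromstring(document_xml)
    return [
        "".join(t.text or "" for t in p.findall(".//w:t", NS))
        for p in root.findall(".//w:p", NS)
    ]


if __name__ == "__main__":
    print(paragraph_texts(SAMPLE_DOCUMENT_XML))  # ['Hello World', 'Second paragraph']
```

Note how a single visible sentence can be spread over several runs; that detail matters as soon as you start searching for text inside document.xml.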
In this section, I will explain the design of the library I built to generate my dynamic documents. I created a class called GenericWordDocument, which is the main class in my library. Besides this class, I created a base class called TemplateFile. This class represents the docx template file from which my document inherits its styles, fonts and main characteristics. This lets me design all the static visuals by hand (just by opening Office 2007 and setting the colors, document header, footer and all the other lovely static stuff) and then reuse them in my generated document by taking a copy of the template and modifying it dynamically within my program.
I also have some additional classes for generating dynamic data. The first one is the Iterator class, and the other one is TemplateRepeatedItem, which inherits from TemplateItem. Together they let you generate repeated data in the Word document: you set the XML style of the iterator, and when a new item is created in the iterator, it inherits that style and fills itself in with the data you supply.
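The iterator idea can be sketched in Python as follows. The class names RepeatedItem and ItemIterator are hypothetical and only loosely mirror TemplateRepeatedItem and the Iterator class; this is not the author's C# code. The assumption is that the repeated item is a single XML element (such as a formatted paragraph) whose text contains #KEY# markers. Cloning that element with copy.deepcopy is what makes every new item inherit the template's style.

```python
import copy
import xml.etree.ElementTree as ET

W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
W = f"{{{W_NS}}}"  # Clark-notation prefix: W + "t" is the w:t tag


class RepeatedItem:
    """Hypothetical repeated item: wraps one template element (e.g. a styled w:p)."""

    def __init__(self, template: ET.Element):
        self.template = template

    def render(self, values: dict) -> ET.Element:
        # deepcopy keeps w:pPr / w:rPr, so the new item inherits the template's style.
        clone = copy.deepcopy(self.template)
        for t in clone.iter(W + "t"):
            if t.text:
                for key, value in values.items():
                    t.text = t.text.replace(f"#{key}#", str(value))
        return clone


class ItemIterator:
    """Hypothetical iterator: collects rendered items and swaps them in for the template."""

    def __init__(self, item: RepeatedItem):
        self.item = item
        self.rendered = []

    def add(self, values: dict) -> None:
        self.rendered.append(self.item.render(values))

    def apply(self, parent: ET.Element) -> None:
        # Replace the template element inside its parent with the rendered items, in order.
        index = list(parent).index(self.item.template)
        parent.remove(self.item.template)
        for offset, element in enumerate(self.rendered):
            parent.insert(index + offset, element)


if __name__ == "__main__":
    body = ET.fromstring(
        f'<w:body xmlns:w="{W_NS}">'
        "<w:p><w:r><w:t>Products</w:t></w:r></w:p>"
        "<w:p><w:r><w:rPr><w:i/></w:rPr><w:t>#NAME#: #PRICE#</w:t></w:r></w:p>"
        "</w:body>"
    )
    iterator = ItemIterator(RepeatedItem(body.findall(W + "p")[1]))
    iterator.add({"NAME": "Pen", "PRICE": 2})
    iterator.add({"NAME": "Ink", "PRICE": 5})
    iterator.apply(body)
    for p in body.findall(W + "p"):
        print("".join(t.text for t in p.iter(W + "t")))
```

Both generated paragraphs keep the italic run formatting (w:i) of the template paragraph, because only the text nodes are changed.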
So, suppose I want to generate a simple Word report for some products in my store. I make a docx template file and add some keywords to it, such as "#STORE_NAME#", "#COMPANY_NAME#" and so on. These keywords will be replaced with my data within my program. The following snippet shows how I use GenericWordDocument in this simple case:
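The original GenericWordDocument snippet is not reproduced here. As a rough, hypothetical illustration of the keyword-replacement idea only (not the library's API), this Python sketch replaces keywords inside the w:t text nodes of document.xml:

```python
import xml.etree.ElementTree as ET

W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
# Serialize the WordprocessingML namespace with the conventional "w" prefix.
ET.register_namespace("w", W_NS)


def replace_keywords(document_xml: str, replacements: dict) -> str:
    """Replace keywords such as #STORE_NAME# inside every w:t text node.

    Working on text nodes (instead of the raw XML string) means values such as
    "A & B" are escaped correctly when the tree is serialized again.
    """
    root = ET.fromstring(document_xml)
    for t in root.iter(f"{{{W_NS}}}t"):
        if t.text:
            for keyword, value in replacements.items():
                t.text = t.text.replace(keyword, value)
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    template = (
        f'<w:document xmlns:w="{W_NS}"><w:body>'
        "<w:p><w:r><w:t>#COMPANY_NAME# - #STORE_NAME#</w:t></w:r></w:p>"
        "</w:body></w:document>"
    )
    print(replace_keywords(template, {"#COMPANY_NAME#": "Example Co", "#STORE_NAME#": "A & B"}))
```

Two caveats. Word may split a keyword across several runs (for example after spell-checking or a partial formatting change), in which case a per-node replacement misses it, so it helps to type each keyword in one go without changing its formatting. Also, ElementTree does not preserve the original namespace prefixes or unused namespace declarations; if the root element lists prefixes in a markup-compatibility Ignorable attribute, dropping those declarations can make Word reject the file, and a prefix-preserving parser such as lxml is safer.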
Now suppose I want to generate repeated rows for the products in my company store. I write something like this:
You can also nest iterations, since each iterator can aggregate another iterator inside it. So you can generate a product list and, for each product, list its accessories, for example.
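Repeated product rows with nested accessory rows can be sketched in Python like this. It is a hypothetical illustration, not the author's implementation, and it assumes a Word table (w:tbl) containing one template row with #PRODUCT# and, after it, one template row with #ACCESSORY#. Each generated product row is followed by its own accessory rows, and deepcopy keeps each row's cell formatting.

```python
import copy
import xml.etree.ElementTree as ET

W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
W = f"{{{W_NS}}}"


def text_of(element: ET.Element) -> str:
    """Concatenate all w:t text inside an element (e.g. a table row)."""
    return "".join(t.text or "" for t in element.iter(W + "t"))


def filled(template: ET.Element, values: dict) -> ET.Element:
    """Deep-copy a template row, keeping its formatting, and fill #KEY# markers."""
    clone = copy.deepcopy(template)
    for t in clone.iter(W + "t"):
        if t.text:
            for key, value in values.items():
                t.text = t.text.replace(f"#{key}#", str(value))
    return clone


def expand_nested(table, product_marker, accessory_marker, products):
    """Replace the two template rows of `table` with product rows, each followed
    by its accessory rows. `products` is a list of (fields, accessories) pairs.
    Assumes the accessory template row comes after the product template row."""
    rows = table.findall(W + "tr")
    product_row = next(r for r in rows if product_marker in text_of(r))
    accessory_row = next(r for r in rows if accessory_marker in text_of(r))
    index = list(table).index(product_row)
    table.remove(product_row)
    table.remove(accessory_row)
    generated = []
    for fields, accessories in products:
        generated.append(filled(product_row, fields))       # outer iteration
        for accessory in accessories:                        # nested iteration
            generated.append(filled(accessory_row, accessory))
    for offset, row in enumerate(generated):
        table.insert(index + offset, row)


if __name__ == "__main__":
    table = ET.fromstring(
        f'<w:tbl xmlns:w="{W_NS}"><w:tblGrid/>'
        "<w:tr><w:tc><w:p><w:r><w:t>#PRODUCT#</w:t></w:r></w:p></w:tc></w:tr>"
        "<w:tr><w:tc><w:p><w:r><w:t>- #ACCESSORY#</w:t></w:r></w:p></w:tc></w:tr>"
        "</w:tbl>"
    )
    expand_nested(table, "#PRODUCT#", "#ACCESSORY#", [
        ({"PRODUCT": "Camera"}, [{"ACCESSORY": "Lens"}, {"ACCESSORY": "Strap"}]),
        ({"PRODUCT": "Phone"}, []),
    ])
    print([text_of(r) for r in table.findall(W + "tr")])
```

A deeper hierarchy would apply the same pattern recursively, with one template row per nesting level.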
Reading/Editing XML Parts
.NET Framework 3.0 lets you read *.docx files and their XML parts through the System.IO.Packaging namespace in the WindowsBase assembly. Just add a reference to WindowsBase to your project, and you can access the inner hierarchy and parts of the docx package without extracting it.
The following is a method that reads the document.xml part from the docx file:
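The original method is not reproduced here. In .NET, WindowsBase exposes this through System.IO.Packaging (Package.Open, Package.GetRelationshipsByType and Package.GetPart). As a language-neutral sketch of the same idea, the Python below opens the package with zipfile and, rather than hard-coding word/document.xml, finds the main part through the officeDocument relationship in _rels/.rels. The function names are hypothetical.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

PKG_REL_NS = "http://schemas.openxmlformats.org/package/2006/relationships"
OFFICE_DOCUMENT_REL = (
    "http://schemas.openxmlformats.org/officeDocument/2006/relationships/officeDocument"
)


def main_document_part(zf: zipfile.ZipFile) -> str:
    """Locate the main document part through the package relationships."""
    rels = ET.fromstring(zf.read("_rels/.rels"))
    for rel in rels.findall(f"{{{PKG_REL_NS}}}Relationship"):
        if rel.get("Type") == OFFICE_DOCUMENT_REL:
            # Package-level targets are relative to the package root;
            # strip a leading "/" so the name matches the ZIP entry name.
            return rel.get("Target").lstrip("/")
    raise ValueError("package has no officeDocument relationship")


def read_document_xml(docx) -> ET.Element:
    """Parse the main document part of a .docx given as a path, file object or bytes."""
    if isinstance(docx, (bytes, bytearray)):
        docx = io.BytesIO(docx)
    with zipfile.ZipFile(docx) as zf:
        return ET.fromstring(zf.read(main_document_part(zf)))
```

For example, read_document_xml("report.docx") (a hypothetical file) returns the w:document root element. In practice the target is almost always word/document.xml, but following the relationship avoids depending on that path.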
And the following snippet is the part that generates the Word document and writes the modified XML to document.xml:
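The original snippet is not reproduced here either. Below is a minimal Python sketch of the write-back step, assuming the modified XML has already been serialized to bytes (for example with ET.tostring(root, encoding="UTF-8", xml_declaration=True)). The hypothetical save_with_document function copies every part of the template package into a new file and swaps in the new document.xml, which also fits the take-a-copy-of-the-template approach described earlier.

```python
import io
import zipfile


def save_with_document(template, output, document_xml: bytes,
                       part_name: str = "word/document.xml") -> None:
    """Write a copy of the `template` package to `output`, replacing one part.

    zipfile cannot overwrite an entry in place, so the whole package is
    rebuilt; every other part (styles, headers, footers, relationships,
    [Content_Types].xml) is copied unchanged and in its original order.
    """
    with zipfile.ZipFile(template) as src, zipfile.ZipFile(output, "w") as dst:
        for info in src.infolist():
            if info.filename == part_name:
                data = document_xml
            else:
                data = src.read(info.filename)
            dst.writestr(info.filename, data, compress_type=zipfile.ZIP_DEFLATED)


if __name__ == "__main__":
    # In-memory stand-in for a template .docx (hypothetical, heavily simplified).
    template = io.BytesIO()
    with zipfile.ZipFile(template, "w", zipfile.ZIP_DEFLATED) as zf:
        zf.writestr("word/document.xml", "<old/>")
        zf.writestr("word/styles.xml", "<styles/>")
    report = io.BytesIO()
    save_with_document(template, report, b"<new/>")
    with zipfile.ZipFile(report) as zf:
        print(zf.namelist(), zf.read("word/document.xml"))
```

Writing to a new output (rather than over the template) keeps the hand-designed template untouched, so it can be reused for the next report.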
More about OpenXML?
I recommend these links for learning more about Office Open XML (OOXML) and the new MS Office 2007 file formats:
In this post, I tried to make use of the OpenXML format of docx files to create dynamic Word documents. I think the new OpenXML formats of Office 2007 are a worthy addition to the interoperability of Microsoft products, and they will increase developers' ability to build more usable and productive projects.