Tuesday, February 27, 2007

Generating Dynamic OpenXML Docx Files

Recently, I needed to integrate one of my applications with MS Office Word 2007 by generating dynamic *.docx reports. I didn't want to just work out the steps once; I wanted to build a reusable library that I can use independently in any future project.





Introducing Docx File Format

Office 2007 introduces new file formats such as docx (for MS Word) and xlsx (for MS Excel). These formats are based on the ECMA OpenXML standard, which lets you define your data and styles according to XML format specifications (WordprocessingML for Word and SpreadsheetML for Excel). So, in the end, it's just XML, and you can easily control the style, data and configuration of your Office documents by modifying some XML documents.

Here, I will focus on the new MS Word file format, docx. The format is actually a zip archive, which means you can open it with any zip extractor. The *.docx file itself is called a package. When you unzip the content of a *.docx file, you get a collection of folders and XML files. Each file is called a part. These parts contain all the styles, layout, fonts and configuration your Word document needs, and the relations between these parts are also defined in XML. This is an important point: because the parts are plain XML files, developers can change the file style or even the entire data using any programming language. It's not tied to Microsoft technologies. OpenXML is now an ECMA standard, and the format is accessible from any programming language.

The XML part of interest in docx files is "document.xml". This part contains all the data written in the Word document. To see its format, create a Word document, write any text inside it, unzip it, open document.xml, which lives in the "word" folder, and see how your Word document is expressed as an XML file.
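Since the package is an ordinary zip archive, the same inspection can be done from code in any language. The following is a minimal Python sketch, not part of my library; the in-memory sample package and the helper names (`build_sample_package`, `list_parts`, `read_document_xml`) are assumptions made so the example runs on its own, and a file saved by Word contains many more parts.

```python
import io
import zipfile

# Stand-in for the main part of a real .docx (assumption: a file saved by
# Word holds far more markup than this single paragraph).
SAMPLE_DOCUMENT_XML = (
    '<?xml version="1.0" encoding="UTF-8" standalone="yes"?>'
    '<w:document xmlns:w="http://schemas.openxmlformats.org/wordprocessingml/2006/main">'
    "<w:body><w:p><w:r><w:t>Hello Word</w:t></w:r></w:p></w:body>"
    "</w:document>"
)


def build_sample_package() -> bytes:
    """Create a tiny zip in memory that mimics the layout of a .docx package."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as package:
        package.writestr("word/document.xml", SAMPLE_DOCUMENT_XML)
        # Placeholder content only; a real styles part is much larger.
        package.writestr("word/styles.xml", "<styles-placeholder/>")
    return buffer.getvalue()


def list_parts(docx_bytes: bytes) -> list:
    """Return the names of all parts (files) stored in the package."""
    with zipfile.ZipFile(io.BytesIO(docx_bytes)) as package:
        return package.namelist()


def read_document_xml(docx_bytes: bytes) -> str:
    """Return the text of word/document.xml without extracting the archive."""
    with zipfile.ZipFile(io.BytesIO(docx_bytes)) as package:
        return package.read("word/document.xml").decode("utf-8")


if __name__ == "__main__":
    sample = build_sample_package()
    print(list_parts(sample))
    print(read_document_xml(sample))
```

For a real document you would pass the bytes of the file itself (for example the contents of a hypothetical report.docx opened in binary mode) instead of the sample package.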


GenericWordDocument Class

In this section, I will explain the design of the library I made to generate my dynamic documents. The main class in the library is called GenericWordDocument. Besides this class, I created a base class called TemplateFile. It represents the docx template file from which my document will inherit its styles, fonts and main characteristics. This helps because I can do all the static visual design by hand (just by opening Office 2007 and setting the colors, document header, footers and all the lovely static stuff) and then reuse that design in my generated document by taking a copy of the template and modifying it dynamically within my program.

I also have some additional classes for generating dynamic data. The first is the Iterator class and the other is TemplateRepeatedItem, which inherits from TemplateItem. Together they let you generate repeated data in the Word document. You simply set the XML style of the iterator; when a new item is created in the iterator, it inherits that style and fills itself in with the supplied data.



So, suppose I want to generate a simple Word report for some products in my store. I will make a docx template file and add some keywords inside it, such as "#STORE_NAME#", "#COMPANY_NAME#" and so on. These keywords will be replaced with my data within my program. The following snippet shows how I will use GenericWordDocument in this simple case:
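The keyword replacement itself is independent of the library and can be sketched in a few lines. This Python sketch is only an illustration of the idea, not the GenericWordDocument API; the function name `fill_keywords` and the sample values are assumptions. It also assumes each keyword appears contiguously in the XML, which is worth checking because Word sometimes splits typed text across several runs.

```python
import re
from xml.sax.saxutils import escape

# Keywords look like #STORE_NAME#: upper-case letters and underscores
# wrapped in '#' (assumption based on the examples in the post).
KEYWORD = re.compile(r"#([A-Z_]+)#")


def fill_keywords(document_xml: str, values: dict) -> str:
    """Replace every known #KEYWORD# in the XML with its XML-escaped value."""

    def substitute(match):
        name = match.group(1)
        if name not in values:
            # Leave unknown keywords untouched so they stay visible.
            return match.group(0)
        # Escape &, < and > so the result is still well-formed XML.
        return escape(values[name])

    return KEYWORD.sub(substitute, document_xml)


if __name__ == "__main__":
    template = "<w:t>#STORE_NAME# of #COMPANY_NAME#</w:t>"
    print(fill_keywords(template, {"STORE_NAME": "Main Store",
                                   "COMPANY_NAME": "Smith & Sons"}))
```

Escaping the values matters: a company name containing "&" would otherwise produce a document.xml that Word refuses to open.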



Suppose I want to generate iterated rows for some products in my company store. I write something like this:



You can also make nested iterations, as each iterator can aggregate another iterator inside it. So you can generate a product list and, for each product, list its accessories, for example.
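To make the idea of repeated and nested items concrete, here is a small Python sketch under stated assumptions: it does not reproduce my Iterator or TemplateRepeatedItem classes, and the row templates, the `fill` and `render_products` helpers and the product data are all hypothetical. Each item reuses one XML fragment as its style and fills it with its own data; accessories are rendered by an inner loop under their product.

```python
import re
from xml.sax.saxutils import escape

KEYWORD = re.compile(r"#([A-Z_]+)#")

# Hypothetical XML "styles" for one product row and one accessory row.
PRODUCT_ROW = "<w:p><w:r><w:t>#NAME#: #PRICE#</w:t></w:r></w:p>"
ACCESSORY_ROW = "<w:p><w:r><w:t>- #NAME#</w:t></w:r></w:p>"


def fill(template: str, values: dict) -> str:
    """Fill one copy of a row template with XML-escaped values."""
    return KEYWORD.sub(
        lambda m: escape(values[m.group(1)]) if m.group(1) in values else m.group(0),
        template,
    )


def render_products(products: list) -> str:
    """Outer iteration over products, inner iteration over their accessories."""
    chunks = []
    for product in products:
        chunks.append(fill(PRODUCT_ROW, {"NAME": product["name"],
                                         "PRICE": product["price"]}))
        for accessory in product["accessories"]:
            chunks.append(fill(ACCESSORY_ROW, {"NAME": accessory}))
    return "".join(chunks)


if __name__ == "__main__":
    store = [
        {"name": "Laptop", "price": "900", "accessories": ["Bag", "Mouse"]},
        {"name": "Phone", "price": "300", "accessories": []},
    ]
    print(render_products(store))
```

The rendered fragment would then replace a single keyword in the template's document.xml, so the static layout around the list stays exactly as it was designed in Word.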

Reading/Editing XML Parts

.NET Framework 3.0 gives you the ability to read *.docx files and their XML parts using the WindowsBase assembly (the System.IO.Packaging namespace). You just add a reference to the WindowsBase assembly in your project, and you can access the inner hierarchy and parts of the docx format without extracting it.

The following is a method which reads the document.xml part from the docx file:




And the following snippet is the part which generates the Word document and writes the modified XML to document.xml:
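Because the format is open, the same write step can be sketched outside .NET as well. The Python sketch below is a swapped-in illustration using the standard zipfile module rather than the WindowsBase packaging API; the helper names and the in-memory template are assumptions. It copies every part of the template into a new package and substitutes only word/document.xml, leaving the template itself untouched.

```python
import io
import zipfile

DOCUMENT_PART = "word/document.xml"


def build_template() -> bytes:
    """Tiny in-memory stand-in for a template .docx (not a complete package)."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as package:
        package.writestr(DOCUMENT_PART, "<w:t>#STORE_NAME#</w:t>")
        package.writestr("word/styles.xml", "<styles-placeholder/>")
    return buffer.getvalue()


def write_document_xml(template_bytes: bytes, new_document_xml: str) -> bytes:
    """Return a new package: a copy of the template with document.xml replaced."""
    output = io.BytesIO()
    with zipfile.ZipFile(io.BytesIO(template_bytes)) as source, \
            zipfile.ZipFile(output, "w", zipfile.ZIP_DEFLATED) as target:
        for name in source.namelist():
            if name == DOCUMENT_PART:
                data = new_document_xml.encode("utf-8")
            else:
                # Every other part (styles, fonts, ...) is copied unchanged.
                data = source.read(name)
            target.writestr(name, data)
    return output.getvalue()


def read_part(docx_bytes: bytes, name: str) -> str:
    """Read one part back as text, used here to check the result."""
    with zipfile.ZipFile(io.BytesIO(docx_bytes)) as package:
        return package.read(name).decode("utf-8")


if __name__ == "__main__":
    report = write_document_xml(build_template(), "<w:t>Main Store</w:t>")
    print(read_part(report, DOCUMENT_PART))
```

Writing to a fresh archive is a deliberate choice here: the zipfile module cannot overwrite an entry in place, and it keeps the template reusable for the next report.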


More about OpenXML?

I recommend these links for learning more about Office OpenXML (OOXML) and the new MS Office 2007 file formats:


Conclusion

In this post, I made use of the OpenXML format of the docx file to create dynamic Word documents. I think the new OpenXML formats of Office 2007 are a worthy addition to the interoperability of Microsoft products, and they will increase developers' ability to create more usable and productive projects.



Saturday, February 24, 2007

Average .NET Developers Salaries

Are .NET developers really paid higher salaries than developers working in the more traditional languages? I have just found this survey of average developer salaries in the UK. It covers several development languages and platforms, such as C++, Java, .NET and Delphi, and it was conducted in August-October 2006. You may check it here:

Average Salaries Survey


Friday, February 23, 2007

Filtering Procedures.. Do you make it right?

All of us do filtering in our projects. It's one of the most frequently repeated pieces of functionality. But do we do it well? How do you write your filtering procedures? OK, have a look at the following one:

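CREATE PROCEDURE std_GetFiles
@fileCategory varchar(50),
@tag varchar(50)
AS
-- Anti-pattern: the query text is built by concatenating raw parameter values
DECLARE @sql AS nvarchar(400)

-- "WHERE 1 = 1" keeps the statement valid when no filter is supplied
SET @sql = 'SELECT * FROM tscoFileIndex WHERE 1 = 1'

-- NULL must be tested with IS NOT NULL; "<> null" never evaluates to true
IF @fileCategory IS NOT NULL
SET @sql = @sql + ' AND FileCategory = ''' + @fileCategory + ''''

IF @tag IS NOT NULL
SET @sql = @sql + ' AND tag = ''' + @tag + ''''

-- Run the concatenated text; this is the point where injected SQL would execute
EXEC (@sql)

GO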


This is the first approach that comes to mind when you try to do filtering. But take care: all the string concatenation in this procedure puts your data at risk, because it allows callers of the procedure to inject SQL statements into the concatenated query by passing unexpected values through the @fileCategory and @tag parameters. The many IF conditions here can also affect your procedure's performance.
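The risk is easy to reproduce. The following Python sketch uses an in-memory SQLite database rather than SQL Server, purely so it runs on its own; the FileName column, the sample rows and the function names are assumptions made for illustration. It builds the query by concatenation, as the procedure above does, and shows how a crafted tag value changes the meaning of the WHERE clause.

```python
import sqlite3


def build_demo_db() -> sqlite3.Connection:
    """Create an in-memory table shaped like tscoFileIndex (columns assumed)."""
    connection = sqlite3.connect(":memory:")
    connection.execute(
        "CREATE TABLE tscoFileIndex (FileName TEXT, FileCategory TEXT, tag TEXT)"
    )
    connection.executemany(
        "INSERT INTO tscoFileIndex VALUES (?, ?, ?)",
        [
            ("budget.xls", "finance", "2007"),
            ("logo.png", "images", "brand"),
            ("report.docx", "finance", "brand"),
        ],
    )
    return connection


def get_files_unsafe(connection: sqlite3.Connection, tag: str) -> list:
    """Anti-pattern: the caller's value becomes part of the SQL text itself."""
    sql = ("SELECT FileName FROM tscoFileIndex WHERE tag = '"
           + tag + "' ORDER BY FileName")
    return [row[0] for row in connection.execute(sql)]


if __name__ == "__main__":
    db = build_demo_db()
    # Normal use: only the rows tagged 'brand' come back.
    print(get_files_unsafe(db, "brand"))
    # Injected input: the quote closes the literal and OR '1'='1' matches
    # every row, so the filter is bypassed entirely.
    print(get_files_unsafe(db, "x' OR '1'='1"))
```

Here the damage is limited to reading rows the caller should not see, but the same hole lets an attacker append far more destructive statements on servers that accept batches.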

A better alternative would be something like this:

CREATE PROCEDURE std_GetFiles
@fileCategory varchar(50),
@tag varchar(50)
AS

SELECT *
FROM tscoFileIndex
WHERE (@fileCategory is null OR FileCategory = @fileCategory ) AND (@tag is null OR tag = @tag)

GO


So, you match all rows if the parameter is null, and use the
condition when the parameter is not null.
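The same optional-filter pattern can be tried outside SQL Server. This Python sketch runs it against an in-memory SQLite database with bound parameters; the FileName column, the sample rows and the `get_files` helper are assumptions for illustration. Passing None plays the role of a null parameter, and an injection attempt is treated as an ordinary value that simply matches nothing.

```python
import sqlite3

# The parameters are bound, never concatenated into the SQL text.
QUERY = """
SELECT FileName
FROM tscoFileIndex
WHERE (:fileCategory IS NULL OR FileCategory = :fileCategory)
  AND (:tag IS NULL OR tag = :tag)
ORDER BY FileName
"""


def build_demo_db() -> sqlite3.Connection:
    """Create an in-memory table shaped like tscoFileIndex (columns assumed)."""
    connection = sqlite3.connect(":memory:")
    connection.execute(
        "CREATE TABLE tscoFileIndex (FileName TEXT, FileCategory TEXT, tag TEXT)"
    )
    connection.executemany(
        "INSERT INTO tscoFileIndex VALUES (?, ?, ?)",
        [
            ("budget.xls", "finance", "2007"),
            ("logo.png", "images", "brand"),
            ("report.docx", "finance", "brand"),
        ],
    )
    return connection


def get_files(connection, file_category=None, tag=None) -> list:
    """A None argument disables that filter; any other value must match."""
    rows = connection.execute(QUERY, {"fileCategory": file_category, "tag": tag})
    return [row[0] for row in rows]


if __name__ == "__main__":
    db = build_demo_db()
    print(get_files(db))                               # no filters
    print(get_files(db, file_category="finance"))      # one filter
    print(get_files(db, "finance", "brand"))           # both filters
    print(get_files(db, tag="x' OR '1'='1"))           # injection attempt
```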



Sunday, February 11, 2007

MDC07 Review

As you know, MDC07 started on February 4th and ran for 4 days. The conference in general was very good; however, some faults and management drawbacks held it back from complete success.

I didn't attend Day 0, which had some problems in the registration process (according to attendees' comments that day). So I will start with Day 1. The keynote that day was very poor for a big conference like MDC. They interviewed some young children with basic programming skills, introduced Windows Vista and Office features, and interviewed some business people from ITWorx and the Ministry of Investment. The keynote wasn't informative at all, as most of the attendees are developers and are not interested in such stuff.

Afterwards, I attended Patrick Hynds's session, which covered some tips for ASP.NET. Although Patrick is one of the most popular speakers, at this MDC he didn't choose his session subjects well. I felt that he didn't have anything new to talk about.

I also attended the "Agile Methodology" session, one of the most exciting and informative sessions at this MDC. The speakers were two members of the "Microsoft patterns & practices" team at Microsoft HQ. The session discussed Agile methodologies, XP methods and pair programming.

Then I attended Delving into VSTS for Software Testers. It was nice to see Microsoft's investment in the testing module of VSTS: not only API testing, but also some interesting techniques for testing the load and performance of your applications.

On Day 2, some sessions were cancelled or removed, which was very confusing! I attended a session about VSTS tools for system architects. The speaker was a Microsoft guy called "Abhishek Mathur". It was a very interesting session. Afterwards, I attended another session, about ASP.NET AJAX-style server controls, by "Patrick Hynds". It was interesting to learn how to build your own AJAX-style server controls if the existing ones don't fit your requirements. Then I attended a session about Workflow Foundation. It wasn't very informative, only an introduction. We expected more than that.

Overall, the conference was above average, though not perfect. The organization was better than in previous years, and the venue, the Cairo conference center, is better and well suited to such a large event. Some very nice ideas were introduced at this conference, such as encouraging students to join the ImagineCup competition and supporting the online Egyptian communities. I think the sessions need to target developers' needs more and drop all the propaganda stuff. We hope for better organization in the coming years.