This is a mirror of official site: http://jasper-net.blogspot.com/

Read and write Open XML files (MS Office 2007)

Monday, August 1, 2011
Introduction

With Office 2007, Microsoft changed the default application formats from the old, proprietary, closed formats (DOC, XLS, PPT) to new, open, standardized XML formats (DOCX, XLSX, PPTX). The new formats share some similarities with the older Office XML formats (WordML, SpreadsheetML) and with the competing OpenOffice.org OpenDocument format, but there are many differences. Because the new formats are the default in Office 2007 and Microsoft Office is the predominant office suite, they are destined to be popular, and you will probably have to deal with them sooner or later.

This article explains the basics of the Open XML file format, and specifically the XLSX format, the new format for Excel 2007. It presents a demo application that reads tabular data from, and writes it to, XLSX files. The application is written in C# using Visual Studio 2010. The XLSX files it creates can be opened in Excel 2007 or later.


Microsoft Open XML format

Every Open XML file is essentially a ZIP archive containing many other files. Office-specific data is stored in multiple XML files inside that archive. This is in direct contrast with the old WordML and SpreadsheetML formats, which were single, uncompressed XML files. Although more complex, the new approach offers a few benefits:

    You don't need to process the entire file in order to extract specific data.
    Images and multimedia are now encoded in native format, not as text streams.
    Files are smaller as a result of compression and native multimedia storage.
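
The first benefit can be illustrated with a short sketch. It is written in Python for brevity (the article's demo application is C#) and uses only the standard zipfile module. The archive it builds is a placeholder with made-up contents, not a valid XLSX file; the part names merely follow the layout Excel typically uses.

```python
import io
import zipfile


def build_demo_archive():
    """Build a tiny in-memory ZIP that stands in for an Open XML file.

    Assumption: the contents are placeholders, not a valid workbook.
    """
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as archive:
        archive.writestr("xl/workbook.xml", "<workbook/>")
        archive.writestr("xl/worksheets/sheet1.xml", "<worksheet/>")
        # Binary content is stored as-is, not re-encoded as text.
        archive.writestr("xl/media/image1.png", b"\x89PNG placeholder bytes")
    return buffer.getvalue()


def list_parts(package_bytes):
    """Return the names of all files stored in the archive."""
    with zipfile.ZipFile(io.BytesIO(package_bytes)) as archive:
        return archive.namelist()


def read_part(package_bytes, name):
    """Decompress and return a single file; the others are left untouched."""
    with zipfile.ZipFile(io.BytesIO(package_bytes)) as archive:
        return archive.read(name)
```

Calling read_part(build_demo_archive(), "xl/worksheets/sheet1.xml") decompresses only that one entry, which is the point of the first item in the list above.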

In Microsoft's terminology, an Open XML ZIP file is called a package. Files inside that package are called parts. It is important to know that every part has a defined content type, and no type is presumed from the file extension alone. The content type can describe anything: application XML, user XML, images, sounds, video, or any other binary object. Every part must be connected to some other part using a relationship. Inside the package are special XML files with a ".rels" extension which define the relationships between the parts. There is also a start part (sometimes called "root", which is a bit misleading because the graph containing all parts doesn't have to be a tree structure), so the entire structure looks like the one shown in picture 1.
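
As a rough illustration of content types and relationships, the following Python sketch (again not the article's C# code) builds a minimal placeholder package in memory and then looks up its start part and content types. The helper names find_start_part and content_type_of are hypothetical. The sketch assumes the usual packaging layout: content types are declared in a part named "[Content_Types].xml", and the package-level relationships live in "_rels/.rels".

```python
import io
import zipfile
import xml.etree.ElementTree as ET

CT_NS = "http://schemas.openxmlformats.org/package/2006/content-types"
REL_NS = "http://schemas.openxmlformats.org/package/2006/relationships"
OFFICE_DOC = ("http://schemas.openxmlformats.org/officeDocument/2006/"
              "relationships/officeDocument")
RELS_TYPE = "application/vnd.openxmlformats-package.relationships+xml"
WORKBOOK_TYPE = ("application/vnd.openxmlformats-officedocument."
                 "spreadsheetml.sheet.main+xml")

# Content types are declared inside the package itself.
CONTENT_TYPES = (
    f'<Types xmlns="{CT_NS}">'
    f'<Default Extension="rels" ContentType="{RELS_TYPE}"/>'
    f'<Override PartName="/xl/workbook.xml" ContentType="{WORKBOOK_TYPE}"/>'
    '</Types>'
)
# The package-level relationship that points at the start part.
ROOT_RELS = (
    f'<Relationships xmlns="{REL_NS}">'
    f'<Relationship Id="rId1" Type="{OFFICE_DOC}" Target="xl/workbook.xml"/>'
    '</Relationships>'
)


def build_demo_package():
    """Placeholder package: enough structure to inspect, not a valid XLSX."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        archive.writestr("[Content_Types].xml", CONTENT_TYPES)
        archive.writestr("_rels/.rels", ROOT_RELS)
        archive.writestr("xl/workbook.xml", "<workbook/>")
    return buffer.getvalue()


def find_start_part(archive):
    """Follow the package-level officeDocument relationship."""
    root = ET.fromstring(archive.read("_rels/.rels"))
    for rel in root.findall(f"{{{REL_NS}}}Relationship"):
        if rel.get("Type") == OFFICE_DOC:
            return rel.get("Target").lstrip("/")
    return None


def content_type_of(archive, part_name):
    """Look up a part's declared type: explicit Override first, then Default."""
    root = ET.fromstring(archive.read("[Content_Types].xml"))
    for override in root.findall(f"{{{CT_NS}}}Override"):
        if override.get("PartName") == "/" + part_name:
            return override.get("ContentType")
    extension = part_name.rsplit(".", 1)[-1].lower()
    for default in root.findall(f"{{{CT_NS}}}Default"):
        if default.get("Extension", "").lower() == extension:
            return default.get("ContentType")
    return None
```

With the demo package opened through zipfile.ZipFile, find_start_part returns "xl/workbook.xml", and content_type_of reports the type declared for it in "[Content_Types].xml". Even the extension-based Default entries are declarations made inside the package, not assumptions made by the reader.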

Read more: Codeproject