
Why Microsoft's .docx format is a good thing

This article is more than 16 years old

Here's a simple way to read one of the new Microsoft file formats, even if you have no Microsoft software installed. Let's suppose you have a file that ends with .docx, from the latest version of Microsoft Word. It's actually a zip file, so add .zip to the end and unzip it. You'll find a cluster of files and folders inside, and the one you want, in the folder named word, should be called document.xml. Double-click that and it will load in a browser window, where you can read the text.
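For readers who would rather script it than rename and unzip by hand, here is a minimal Python sketch of the same trick. It assumes the main text lives in a part named word/document.xml, which is where Word normally puts it, and the function name docx_text is invented for this illustration. It is a rough reader, not a faithful converter: it collects the text runs paragraph by paragraph and ignores tables, footnotes, text boxes and everything else.

```python
import zipfile
import xml.etree.ElementTree as ET

# Namespace used by WordprocessingML elements such as <w:p> and <w:t>.
W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"


def docx_text(path):
    """Return the plain text of a .docx file, one paragraph per line.

    Assumption: the body text is stored in word/document.xml.
    """
    # A .docx file is an ordinary zip archive, so no renaming is needed.
    with zipfile.ZipFile(path) as archive:
        xml_bytes = archive.read("word/document.xml")

    root = ET.fromstring(xml_bytes)
    paragraphs = []
    # Each <w:p> is a paragraph; its visible text sits in <w:t> elements.
    for para in root.iter(f"{{{W_NS}}}p"):
        pieces = [node.text or "" for node in para.iter(f"{{{W_NS}}}t")]
        paragraphs.append("".join(pieces))
    return "\n".join(paragraphs)
```

Calling print(docx_text("letter.docx")), where letter.docx is a stand-in for a file of your own, prints the text without the mark-up tags.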

I'm not suggesting this as a standard office procedure: the text still includes the mark-up tags. However, it does let you read the text in a hurry, and it provides an insight into why Microsoft's new formats are not such a bad idea. My colleague Charles Arthur recently complained about .docx under the headline "This time, it's Microsoft which must adapt or die" (June 7). I agree. And .docx shows Microsoft adapting.

Yes, the old Office file formats have enjoyed a long, successful run, and they aren't going away. But times have changed. First, Microsoft has come under belated pressure from the European Union and some US states to open up its formats. (I say "belated" because I've been campaigning against proprietary file formats for a decade.)

Second, the current file formats have become overly complex, and a target for virus writers. There are good technical reasons for replacing them with something more robust. Third, programmable file formats offer huge business advantages, including data mining and the re-use of content. With .docx, for example, a company can write routines to change the letterhead on thousands of letters and perform other updates without changing anything else.

That is hard or impossible with big binary blobs like a .doc file. It's easy when the file is already split into constituent parts, with the styles, font tables, settings and contents all in separate XML files. Fourth, the old formats are coming under pressure from the Open Document Format (ODF), which is already an ISO standard. The free OpenOffice.org suite uses ODF files.
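To make the letterhead example concrete, here is a hedged Python sketch of such a routine. It leans on several assumptions that are not part of any guarantee: that the letterhead text sits in header parts named like word/header1.xml (Word's usual naming, though a file is free to use other names), that those parts are UTF-8 encoded, and that the old text appears as one unbroken string rather than being split across formatting runs. The function name replace_in_headers is invented for this illustration.

```python
import re
import zipfile
from xml.sax.saxutils import escape

# Assumption: header parts follow Word's usual naming (word/header1.xml, ...).
HEADER_PART = re.compile(r"^word/header\d*\.xml$")


def replace_in_headers(src_path, dst_path, old, new):
    """Copy a .docx, replacing `old` with `new` in header parts only.

    Returns the number of header parts that were changed. Every other
    part (body text, styles, settings) is copied across untouched.
    """
    # Escape &, < and > so the strings match, and stay valid, inside XML.
    old_xml, new_xml = escape(old), escape(new)
    changed = 0
    with zipfile.ZipFile(src_path) as src, \
            zipfile.ZipFile(dst_path, "w", zipfile.ZIP_DEFLATED) as dst:
        for item in src.infolist():
            data = src.read(item.filename)
            if HEADER_PART.match(item.filename):
                text = data.decode("utf-8")  # assumption: UTF-8 parts
                if old_xml in text:
                    data = text.replace(old_xml, new_xml).encode("utf-8")
                    changed += 1
            # Reuse the original entry metadata for the copied part.
            dst.writestr(item, data)
    return changed
```

Run in a loop over a folder of letters, this rewrites the header and leaves the body of each letter byte-for-byte as it was, which is the point of having the parts stored separately.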

Put those four things together and it's easy to see why Microsoft responded by creating the new Office Open XML formats. It has already put these through the ECMA open standards process with a committee that included Apple, the British Library and many others. It now wants to get them accepted by the ISO.

The big question, of course, is why Microsoft didn't simply adopt ODF. So I asked. The answer was that unless businesses could interchange documents between the old and new formats, the new standards would be difficult to adopt. ODF was not defined with Microsoft compatibility in mind, and couldn't offer the level of compatibility it required.

According to Microsoft program manager Brian Jones, the company has "no issues with ODF", which does a different job. Microsoft's goal was different: "An open XML format that could fully represent the existing base of Word binary documents." Either way, you don't have to use the new formats: the old ones are still supported. You don't have to buy a new copy of Office: the backwards compatibility pack very easily adds support to some earlier versions of Office for Windows. And you don't have to use either the old or new formats: you can still use plain text, RTF, HTML, PDF, ODF or whatever else does what you need.

There is no reason to be browbeaten into thinking that there should only be one document format. And I welcome the increases in power, flexibility, openness and choice brought by .docx, even though I have no intention of using it.

· If you'd like to comment on any aspect of Technology Guardian, send your emails to tech@theguardian.com
