Mass-migrating Microsoft Word documents to OpenOffice.org on Linux

If you're migrating documents from Microsoft Word to OpenOffice.org on Linux, don't make the leap without considering the options, pitfalls and tools available.

Is your organization in the process of migrating from Microsoft Word to OpenOffice.org on Linux? If so, your biggest obstacle may not be getting used to the new suite, but rather moving from Microsoft's proprietary .doc format to the OpenDocument Format (ODF) that OpenOffice.org uses natively.

Granted, the .doc format has been reverse-engineered in sufficient detail that many users can work with it as is. But if you're making the leap from Microsoft Office, it makes sense to jettison as many of its legacies as you can.

This article covers the issues involved in converting from Word to ODF: the conversion options available to you, the potential pitfalls and the third-party tools that make conversion easier.

En masse vs. on demand
There are two basic approaches to converting Word documents to ODF: (1) convert at once and (2) convert as you go. The convert-at-once method involves taking all extant Word documents and converting them en masse into their ODF counterparts. It's useful if you don't have a large number of documents to convert and if they're unlikely to contain items that may not convert correctly, such as macros or complex formulas. The convert-as-you-go method leaves files in Word format and transforms each one to ODF only when someone actually needs it.

Each method has distinct advantages. Convert at once separates the task and the burden of conversion from those working with the documents, so users don't have to contend with the conversion process itself. But that approach can place an unrealistic burden on users if they aren't trained to recognize that a document they've been handed was converted incorrectly or is missing data. Since each document has to be handled independently in the convert-as-you-go method, it's a slower approach. But it better ensures that each document has survived the conversion process intact.
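A bulk conversion can at least be screened for gross failures before users ever see the results. The sketch below is one possible way to do that in Python, and it relies only on the fact that an ODF text document is a ZIP package whose "mimetype" entry reads application/vnd.oasis.opendocument.text. The function names are my own, and the check is purely structural: it catches truncated or empty output, not a mangled table or a missing formula, so it is no substitute for a person looking at the document.

```python
import zipfile
from pathlib import Path

# MIME type that an ODF text document declares in its "mimetype" entry.
ODT_MIMETYPE = "application/vnd.oasis.opendocument.text"


def looks_like_odt(path):
    """Return True if path is a ZIP package that declares itself an ODF
    text document and carries a content.xml entry (structural check only)."""
    path = Path(path)
    if not path.is_file() or not zipfile.is_zipfile(path):
        return False
    with zipfile.ZipFile(path) as package:
        names = package.namelist()
        if "mimetype" not in names or "content.xml" not in names:
            return False
        declared = package.read("mimetype").decode("ascii", "replace").strip()
    return declared == ODT_MIMETYPE


def find_suspect_conversions(converted_dir):
    """List the .odt files under converted_dir that fail the check above."""
    return sorted(
        p for p in Path(converted_dir).rglob("*.odt") if not looks_like_odt(p)
    )
```

Anything that find_suspect_conversions returns is worth reconverting by hand from the original Word file.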

In either case, it's critical to keep copies of the original, unchanged documents so that they can be referenced if need be.
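One way to keep those originals safe is to copy them into a separate archive and record a checksum for each, so that you can later prove the reference copies haven't been touched. The following Python sketch does that under a couple of assumptions of mine: the archive directory lives outside the source tree, and only files ending in .doc (in any letter case) count as originals. The function names are illustrative.

```python
import hashlib
import shutil
from pathlib import Path


def sha256_of(path):
    """Hash a file in chunks so large documents don't need to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def archive_originals(source_dir, archive_dir):
    """Copy every .doc under source_dir into archive_dir, mirroring the
    folder layout, and return {relative path: SHA-256 of the copy}.
    Assumes archive_dir is not inside source_dir."""
    source_dir, archive_dir = Path(source_dir), Path(archive_dir)
    manifest = {}
    for original in sorted(source_dir.rglob("*")):
        if not original.is_file() or original.suffix.lower() != ".doc":
            continue
        relative = original.relative_to(source_dir)
        copy = archive_dir / relative
        copy.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(original, copy)  # copy2 also keeps timestamps
        manifest[relative.as_posix()] = sha256_of(copy)
    return manifest


def changed_since_archive(archive_dir, manifest):
    """Return the archived files that are missing or no longer match."""
    archive_dir = Path(archive_dir)
    changed = []
    for name, expected in sorted(manifest.items()):
        copy = archive_dir / name
        if not copy.is_file() or sha256_of(copy) != expected:
            changed.append(name)
    return changed
```

Store the manifest alongside the archive (as JSON, for instance) and rerun changed_since_archive whenever someone needs to fall back on an original.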

Macro and formula incompatibility
As I mentioned previously, the .doc format has been sufficiently reverse-engineered that many non-Microsoft programs (including OpenOffice.org, or OO.o) can open .doc files and work with them natively. That said, the format also includes a few native Word features that may not translate correctly, so documents that use the following features require extra caution.

The first feature is macros written in Word's macro language. Many users, myself included, have a large library of Word macros that would be difficult or time-consuming to rebuild in OpenOffice. In time these macros will have to be rewritten entirely in OpenOffice's own macro language because, at least right now, there isn't a way to convert Word macros directly into OO.o macros. One way to provisionally bridge the gap between the two is the VBx OOo toolbox, which allows OpenOffice to be automated via various incarnations of Visual Basic, but this is far different from converting macros to run as is. (The site offers several good examples of how to build new macros in OO.o, or they can be used as is.)
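Before converting in bulk, it helps to know which documents carry macros at all, so they can be set aside for manual attention. The Python sketch below is a rough heuristic rather than a real parser: it assumes that a binary .doc file is an OLE compound file (identified by its eight-byte signature) and that a VBA project inside it shows up as a stream named _VBA_PROJECT, stored in the file's directory as UTF-16 text. It can produce false positives and may miss unusual files, and the function names are my own.

```python
from pathlib import Path

# Signature at the start of every OLE compound file, the container
# used by the binary .doc format.
OLE_MAGIC = bytes.fromhex("D0CF11E0A1B11AE1")

# Assumption: a VBA project is stored with a stream called "_VBA_PROJECT",
# and stream names are written into the container as UTF-16LE.
VBA_STREAM_NAME = "_VBA_PROJECT".encode("utf-16-le")


def may_contain_vba(path):
    """Heuristic: True if the file looks like an OLE container and the
    VBA stream name appears somewhere in its bytes."""
    data = Path(path).read_bytes()
    return data.startswith(OLE_MAGIC) and VBA_STREAM_NAME in data


def flag_macro_documents(source_dir):
    """Return the .doc files under source_dir that may carry macros."""
    return sorted(
        p
        for p in Path(source_dir).rglob("*")
        if p.is_file() and p.suffix.lower() == ".doc" and may_contain_vba(p)
    )
```

Documents flagged this way are better candidates for the convert-as-you-go approach, since someone will need to decide what to do with each macro.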

The other potential stumbling block is Word's equation editor. OO.o will make a best effort to convert embedded equations to its native format. If the formula in a particular document doesn't translate correctly, you can insist that OO.o preserve the original formula formatting. To do so, in OO.o Writer, select Tools, then Options, then Load/Save, choose Microsoft Office, and uncheck the Load and Save boxes next to MathType to Math and Math to MathType.

Making the conversion
There are several ways to perform the migration process. The most basic is simply to open Word documents in OpenOffice.org, then resave them as ODF in a new location. If you have a small number of documents or if users are comfortable performing the conversion work on their own, this may be the best place to start.

It's also possible to automate the conversion process using OpenOffice.org itself and OO.o's own macro language. The site offers a simple script that converts a single file at a time without user interaction, and the process can be further automated with a batch or shell script. The script works cross-platform in both the Windows and Linux versions of the product.
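The wrapper around a one-file-at-a-time converter can be quite small. The Python sketch below walks a folder of Word documents, mirrors the folder layout in an output location (leaving the originals untouched) and calls a converter once per file. It does not use the macro-based script described above. Instead it assumes a LibreOffice-style soffice binary on the PATH that accepts the --headless, --convert-to and --outdir options, which older OpenOffice.org releases may not support, so swap in your own command if yours differs. The function names are illustrative.

```python
import subprocess
from pathlib import Path


def plan_conversions(source_dir, output_dir):
    """Pair each .doc under source_dir with the directory its .odt should
    land in, mirroring the source folder layout under output_dir."""
    source_dir, output_dir = Path(source_dir), Path(output_dir)
    docs = sorted(
        p
        for p in source_dir.rglob("*")
        if p.is_file() and p.suffix.lower() == ".doc"
    )
    return [(doc, output_dir / doc.relative_to(source_dir).parent) for doc in docs]


def build_command(doc, target_dir, soffice="soffice"):
    """Command line for one conversion. Assumes a soffice binary that
    supports these LibreOffice-style options."""
    return [
        soffice, "--headless",
        "--convert-to", "odt",
        "--outdir", str(target_dir),
        str(doc),
    ]


def convert_all(source_dir, output_dir, soffice="soffice"):
    """Convert every planned document; return the ones that failed."""
    failures = []
    for doc, target_dir in plan_conversions(source_dir, output_dir):
        target_dir.mkdir(parents=True, exist_ok=True)
        result = subprocess.run(
            build_command(doc, target_dir, soffice),
            capture_output=True, text=True,
        )
        expected = target_dir / (doc.stem + ".odt")
        # Treat a missing output file as a failure even if the exit code is 0.
        if result.returncode != 0 or not expected.exists():
            failures.append(doc)
    return failures
```

Calling convert_all with the Word folder and a new, empty ODF folder returns the list of documents that need a second look. Running conversions one after another is slower than it could be, but it sidesteps any question of whether several headless instances can safely share one user profile.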

You can also use third-party utilities. The document transformation company 3bView offers 3BOpenDoc, which transparently converts documents to and from ODF. The product works with an existing mail server or CMS: When emailed or opened through a document-management product, Word documents convert to and from ODF automatically. The company also provides the same functionality in a more standard bulk converter application so you can use that to manually convert many documents at once. You can also get an online trial of the product: Submit a Word document of less than 500 KB, and test how the software performs the transformation.

About the author: Serdar Yegulalp wrote for Windows Magazine from 1994 through 2001, covering a wide range of technology topics. He now plies his expertise in Windows NT, Windows 2000 and Windows XP as publisher of The Windows 2000 Power Users Newsletter and writes technology columns for TechTarget.

This was first published in September 2007
