This post discusses my experience converting a large MS Word document into a LaTeX document using Word-to-LaTeX. Along the way I encountered several challenges, and I thought I'd document them in case they are of interest to others.
Having a good conversion method is important when transitioning existing Word documents to LaTeX and when collaborating with others who are not familiar with LaTeX. Wilfred Hennings provides a great page that summarises options for converting documents from PC word processors to LaTeX. The page lists the options and provides recommendations. Wilfred's Quick Comparison List is particularly useful.
I started my conversion journey with Michal Kerbt's Word-to-LaTeX (Word-to-XML) Convertor. It's free software, but Michal accepts donations. I used it in its stand-alone form, and it provides many options for converting documents. It generally worked well, but I ran into several challenges during the conversion. The rest of this post describes each one and what I did in response.
Convert *.docx to *.doc format: Problem: The convertor did not appear to work with Word 2007 files (*.docx). Solution: Use Save As to save the document in Word 97-2003 (*.doc) format.
Set up a PostScript printer: Problem: To convert graphics, Word-to-LaTeX needs a PostScript printer to be set up. Solution: Install a virtual PostScript printer and tell Word-to-LaTeX to use it. (Warning: I've had some problems with the generated EPS files; I'm not sure whether they are related to this printer setup.)
- Install a PostScript printer: Adobe sets out one way to set up a virtual printer, and I followed those instructions. Here is a direct link to the downloads. In short, it involves installing a PostScript printer driver with a PPD (PostScript Printer Description) file.
- Specify in Word-to-LaTeX: Then select this printer in the Word-to-LaTeX configuration under Figures/Eq/Documents - Figures - PostScript printer.
Big complex documents: Problem: Long and complex documents can take a while to convert (e.g., 15 minutes for a 60,000-word document with many styles and tables on a 2007 laptop). Solution: Hey. Who cares! Just let it run. It's quicker than doing it manually.
Security Alert Over Macros: Problem: The program installs a macro in the Word Startup folder, and my version of Word (Word 2007) disabled it by default. Solution: It is possible to enable all macros, but this is not particularly safe. Instead, I deleted the macro file from "C:\Program Files\Microsoft Office\Office12\STARTUP" and just ran the program through its stand-alone desktop interface.
Tidying Up: Problem: The generated *.tex document was not exactly what I wanted. Solution: Several options presented themselves.
- Change input: I could alter the Word document, for example by removing styles and unwanted fonts.
- Change process: Word-to-LaTeX offers many configuration options that I could experiment with.
- Post-process: I could apply various find-and-replace operations to the *.tex file created by Word-to-LaTeX.
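For the post-processing option, a short script of find-and-replace rules run over the generated file is often enough. The Python sketch below is one way to do it and is not part of Word-to-LaTeX itself. The rules in `REPLACEMENTS` are hypothetical examples of generic clean-up (trailing whitespace, empty `\textbf{}`/`\textit{}` groups, runs of blank lines); I'm not claiming the convertor produces exactly these, so swap in rules for whatever clutter appears in your own output. It also assumes the *.tex file is UTF-8 encoded.

```python
import re
import sys
from pathlib import Path

# Hypothetical clean-up rules, applied in order: (regex pattern, replacement).
# Replace these with rules for whatever clutter appears in your own *.tex output.
REPLACEMENTS = [
    (r"[ \t]+$", ""),               # strip trailing spaces/tabs at the end of each line
    (r"\\text(bf|it)\{\s*\}", ""),  # drop empty \textbf{} and \textit{} groups
    (r"\n{3,}", "\n\n"),            # collapse runs of blank lines into a single blank line
]


def tidy_tex(text):
    """Apply every rule in REPLACEMENTS to text and return the result."""
    for pattern, replacement in REPLACEMENTS:
        # MULTILINE makes "$" match at the end of every line, not just the string.
        text = re.sub(pattern, replacement, text, flags=re.MULTILINE)
    return text


def main(argv):
    if len(argv) != 3:
        print("usage: python tidy_tex.py input.tex output.tex")
        return
    src, dst = Path(argv[1]), Path(argv[2])
    # Assumes UTF-8; change the encoding if your converted file uses another one.
    cleaned = tidy_tex(src.read_text(encoding="utf-8"))
    dst.write_text(cleaned, encoding="utf-8")


if __name__ == "__main__":
    main(sys.argv)
```

You would run it as `python tidy_tex.py input.tex output.tex` (the script name is arbitrary). Writing to a separate output file keeps the original conversion intact, so you can tweak the rules and rerun them without converting the Word document again.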
Start-up error: Problem: When I ran Word-to-LaTeX, I got the following error:
Conversion started.
Fatal error: Call was rejected by callee.
at Word.DocumentClass.Activate()
at WordToLatex.WLConvertor.Convert()
at WordToLatex.Bin.WLApplication.Main(String[] args)
Solution: I closed Word-to-LaTeX and Word, pressed Ctrl+Alt+Delete and ended any WINWORD processes that were still running, and then restarted Word-to-LaTeX. As an additional point, it was sometimes necessary to close Word-to-LaTeX
Conversion error: Problem: I got the following error:
Converting document fields.
Unknown error: Object reference not set to an instance of an object.
at WordToLatex.WLProcessFields.FieldHyperlink(Field field)
at WordToLatex.WLProcessFields.ProcessField(Field field)
at WordToLatex.WLProcessFields.ProcessAllFields()
at WordToLatex.WLConvertor.ConvertInner()
at WordToLatex.WLConvertor.Convert()
Solution:
- Divide and conquer: One strategy was to divide the long document into smaller parts to identify which parts could be processed. If you do this, put each part in its own folder; otherwise image files generated from one subdocument may be overwritten by those from a later subdocument.
- Paste into a fresh document: Another trick that worked for me was to copy the contents of the document and paste them into a new document. I'm not sure why this worked; perhaps it removed a number of custom styles I had.
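If you split a document into parts, the separate-folder advice is easy to automate. The Python sketch below assumes you have already saved each part as its own *.doc file (the file and folder names used here are hypothetical) and that the convertor writes its image files next to the document it is converting. It copies each part into a subfolder named after the file, so each conversion gets its own directory.

```python
import shutil
import sys
from pathlib import Path


def isolate_parts(part_files, work_dir):
    """Copy each part into its own subfolder of work_dir.

    Returns the copied file paths, one per subfolder. Converting each copy
    in place keeps the image files of different parts apart (assuming the
    converter writes images next to its input file).
    """
    work_dir = Path(work_dir)
    copies = []
    for part in map(Path, part_files):
        folder = work_dir / part.stem          # e.g. work/part1/
        folder.mkdir(parents=True, exist_ok=True)
        target = folder / part.name            # e.g. work/part1/part1.doc
        shutil.copy2(part, target)             # copy2 also preserves timestamps
        copies.append(target)
    return copies


if __name__ == "__main__":
    # Usage: python isolate_parts.py work_dir part1.doc [part2.doc ...]
    if len(sys.argv) < 3:
        print("usage: python isolate_parts.py work_dir part1.doc [part2.doc ...]")
    else:
        for path in isolate_parts(sys.argv[2:], sys.argv[1]):
            print(path)
```

For example, `python isolate_parts.py work part1.doc part2.doc` would create `work/part1/part1.doc` and `work/part2/part2.doc`. Parts with the same file name would still end up sharing a folder, so give each part a distinct name.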