DocToHtml—the List of Unsupported Formatting Features
DocToHtml does not aim to support all MS Word
formatting capabilities. Instead, it
targets the most commonly used ones and tries to keep
the output (X)HTML documents as compact and clean as
possible so that they are ready for the Web. Below is a
list of some unsupported or partially supported MS Word
formatting features:
- Overlapped bookmarks or hyperlinks may be converted into
overlapping <A> tags, or even be
- MS Word autoshapes, diagrams, and organization charts are
- Table cell borders are converted only if all cells in a given
table have the same border style. If the borders of even one cell
in the table have a different formatting, the respective table in
the output (X)HTML document will not have any borders at all.
- All rows in a table must have the same beginning and ending
horizontal coordinates. If the rows begin at different left-margin
coordinates, cell width may be set incorrectly.
- Currently, the program can split documents by headings only,
not by pages.
- Animated underline styles and animated characters are not
supported.
- Currently, the program supports paragraph styles only.
Character, list, and table-cell styles are not supported.
- MS Word forms and dynamic controls within them
(checkboxes, list boxes, text input fields, etc.) are not
supported.
- Applied MS Word formatting styles cannot be easily
redefined to make the output (X)HTML document look different from
the original MS Word doc.
- HTML and CSS have no equivalents for features such as complex
text wrapping modes for text boxes, so such formatting is not
supported.
- Sometimes complex formatting will not be completely
reproduced in the resulting HTML document. This is normal.
- The position of “anchored” images cannot be precisely
MS Word can handle at least two types of pictures: inline
pictures and floating ones. An inline picture is embedded into the
text stream, and treated as one big character when applying page
layout algorithms. A floating picture is attached to its “anchor”
and can be moved by dragging the anchor sign.
By default, before starting the actual conversion into (X)HTML,
DocToHtml internally converts all floating pictures into inline
ones. You can alter this behavior via the “Process
floating images” checkbox on the Images tab of the Conversion Options dialog.
If you turn this option off, DocToHtml will not convert
any floating (“anchored”) images at all.
You can also convert anchored images into inline ones by hand,
as follows. Right-click an image to open the context menu, select
the “Format Picture” menu item, go to the Layout tab,
select the “In line with text” option, and then click
If you do that, the connected anchor should disappear, and the
picture’s position on the page may slightly change. The same change
of picture positions occurs when the DocToHtml conversion process
handles the pictures before their actual conversion into the
resulting (X)HTML document. That’s why pictures in the output
(X)HTML document may not be positioned exactly
as in the original MS Word document. So, the first rule for
preserving picture positions is to use only
inline pictures. The second rule, if you need a
particularly precise transformation, is to use borderless tables
and inline pictures inside table cells, one picture per cell.
Suppose you want to position four pictures inside a square: two
pictures in the first row, and two pictures in the second row, with
the pictures in the second row being exactly under the ones in the
first row. To do that, insert a table with two rows and two columns
into the document, and then insert your pictures into the table
cells. Apply the “Center” horizontal alignment and the
“Center” vertical alignment to all cells (for example, in
MS Word 2003 you can do that via the Table and
Cell tabs of the Table Properties dialog).
Open the “Borders and Shading” dialog and set borders to
“None” for the whole table (unless you need a cell grid in
the resulting HTML document).
Do not forget to set the proper options on the Table Attr tab of the Conversion Options dialog. You
can simply check the (ALL) and (All CSS) checkboxes there.
After that, the output (X)HTML document will have the pictures
at the same positions as in the original MS Word doc.
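To give a rough idea of the kind of markup this technique aims for, the following Python sketch builds a borderless table with one centered picture per cell. It is only an illustration: the function name `picture_grid`, the file names, and the exact inline styles are assumptions, and the markup that DocToHtml actually produces may differ.

```python
from html import escape


def picture_grid(rows):
    """Build a borderless HTML table with one centered <img> per cell.

    `rows` is a list of rows, each row a list of image paths
    (hypothetical file names in the example below).
    """
    lines = ['<table style="border-collapse: collapse; border: none;">']
    for row in rows:
        lines.append("  <tr>")
        for src in row:
            # One picture per cell, centered horizontally and vertically.
            lines.append(
                '    <td style="border: none; text-align: center; '
                'vertical-align: middle;">'
                f'<img src="{escape(src, quote=True)}" alt=""></td>'
            )
        lines.append("  </tr>")
    lines.append("</table>")
    return "\n".join(lines)


# Two rows of two pictures, as in the example above.
html = picture_grid([["a.png", "b.png"], ["c.png", "d.png"]])
print(html)
```

Because each picture sits in its own cell, the pictures in the second row line up under those in the first row regardless of the picture sizes.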
(All screenshots were taken in MS Word 2003. Other
MS Word versions should have similar dialogs.)
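When you prepare tables for conversion, keep in mind the table border rule from the list above: borders survive only if every cell in the table uses the same border style. The Python sketch below restates that rule in code. It is not DocToHtml's actual implementation; the function name `table_border_style` and the representation of border styles as plain strings are assumptions made for illustration.

```python
from typing import Optional, Sequence


def table_border_style(cell_borders: Sequence[Sequence[str]]) -> Optional[str]:
    """Return the shared border style if every cell uses it, else None.

    None means "emit the table without any borders".
    """
    # Collect the distinct border styles used anywhere in the table.
    styles = {style for row in cell_borders for style in row}
    # Exactly one distinct style means all cells agree.
    return styles.pop() if len(styles) == 1 else None


uniform = [["1px solid", "1px solid"], ["1px solid", "1px solid"]]
mixed = [["1px solid", "1px solid"], ["1px solid", "2px dashed"]]

print(table_border_style(uniform))
print(table_border_style(mixed))
```

In the second table a single cell differs, so under the rule as described the whole table would be written without borders.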
We regularly add the features and functions most requested
by our users, and DocToHtml’s further development will be
based on that feedback. If you need any new features,
please e-mail your suggestions to firstname.lastname@example.org, and we’ll
consider implementing them. If you find an unsupported feature
that is missing from the above list, please let us know. We
would appreciate your feedback!