DocToHtml—the DOC to HTML Converter FAQ
MS Word has the “Save as Web Page”
feature, so why should I use DocToHtml?
There are several reasons, including these:
- DocToHtml produces standards-compliant, clean HTML code with
- MS Word cannot convert multiple documents at once.
- MS Word’s “Save as Web Page” feature provides very few
What is the difference between HTML documents
produced by DocToHtml and MS Word?
DocToHtml produces much smaller and cleaner HTML code, ready
to use on your website. HTML code produced by MS Word contains
a lot of MS Office-specific tags and unnecessary markup.
Microsoft’s intention was to save all document-specific information
when generating HTML code, to let the user open the HTML document
and continue working with it without losing anything.
Unfortunately, this approach has led to very bloated HTML code
full of Microsoft-specific markup, which is practically impossible to
edit and poorly suited to a well-maintained website. As a rule, most of that
code can be safely stripped off without any impact on the
document’s appearance in the web browser. That’s what DocToHtml is
What is the difference between HTML documents
produced by DocToHtml and other “clean-up” conversion tools?
Some tools try to clean up the HTML code generated by
MS Word. Though this approach works, it has major drawbacks.
That’s because Word HTML code is too messy for doing some
For example, Microsoft Word tends to save lists without
using the <ol> or <li> tags. Instead, it just adds the
list item number as the first character(s) of the paragraph. After
that, it can be hard to tell that a paragraph starting
with a digit is logically not a paragraph at all, but a list
item.
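The ambiguity described above can be illustrated with a small Python sketch. The regular expression and the function name `looks_like_list_item` are assumptions made for illustration only; they are not part of DocToHtml or of any clean-up tool. The sketch shows that a rule based purely on leading digits flags an ordinary sentence as a list item.

```python
import re

# Hypothetical heuristic: a paragraph "looks like" a list item when it
# starts with digits followed by "." or ")" and then whitespace.
NUMBERED_PREFIX = re.compile(r"^\s*\d+[.)]\s+")


def looks_like_list_item(paragraph):
    """Return True if the paragraph text starts with a number-like prefix."""
    return NUMBERED_PREFIX.match(paragraph) is not None


if __name__ == "__main__":
    samples = [
        "1. Open the document.",                          # a real list item
        "2000. That was the year the product launched.",  # ordinary prose
        "1984 was a good year.",                          # ordinary prose
    ]
    for text in samples:
        print(looks_like_list_item(text), "-", text)
```

The second sample is ordinary prose, yet the heuristic accepts it, because the text alone does not say whether the number is list numbering or part of the sentence.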
Another example is MS Word’s treatment of tables with
combined cells. MS Word can produce HTML tables with a
different number of cells in each row. This is a nonstandard
approach, and there are no guarantees that a particular browser
will correctly interpret such a table. MS Word also tends to
put the “width” attribute into every table cell, which is
unnecessary in most cases. To deal with these issues, it is
insufficient just to strip off some tag attributes based on a set
of predefined rules, which is what most clean-up utilities do.
Instead, you need to restore the layout structure of the table and,
based on that structure, make individual decisions of whether the
“width” attribute is needed. This is hard to do when you have only
the HTML code produced by the Microsoft built-in converter.
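To make the table problem concrete, here is a minimal Python sketch that measures the effective width of every row in an HTML fragment. The class `RowWidthCounter` and the helpers `row_widths` and `is_ragged` are hypothetical names introduced for this example and do not describe how DocToHtml works internally. The sketch ignores rowspan and nested tables, so it detects the symptom only; it does not restore the table layout.

```python
from html.parser import HTMLParser


class RowWidthCounter(HTMLParser):
    """Collects the effective column count of every <tr> it sees.

    Simplification: rowspan and nested tables are not handled.
    """

    def __init__(self):
        super().__init__()
        self.row_widths = []  # one integer per <tr>
        self._in_row = False

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._in_row = True
            self.row_widths.append(0)
        elif tag in ("td", "th") and self._in_row:
            # A cell counts as "colspan" columns, or 1 if colspan is
            # missing or not a plain number.
            colspan = dict(attrs).get("colspan") or "1"
            self.row_widths[-1] += int(colspan) if colspan.isdigit() else 1

    def handle_endtag(self, tag):
        if tag == "tr":
            self._in_row = False


def row_widths(html):
    """Return the list of effective column counts, one per table row."""
    parser = RowWidthCounter()
    parser.feed(html)
    parser.close()
    return parser.row_widths


def is_ragged(html):
    """Return True if the rows do not all span the same number of columns."""
    return len(set(row_widths(html))) > 1
```

For instance, `is_ragged("<table><tr><td>a</td><td>b</td></tr><tr><td>c</td></tr></table>")` reports a ragged table, while the same table with `colspan="2"` on the single cell does not.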
Our program uses a totally different approach. Instead of trying
to clean up the HTML code generated by MS Word, DocToHtml
produces HTML code on its own. DocToHtml reads the content and
properties of the input document via MS Word automation and,
based on this data, generates HTML code in the most
effective way. This approach ensures full control over the output
content and allows us to do things that would otherwise be
impossible. Combined with our strong intention to make
the resulting HTML documents as small and clean as possible,
it results in highly optimized output code in most cases.
However, this approach has one drawback: the conversion can be
slow at times. For a list of recommendations on how to speed up
the conversion process, please read the How to Speed Up the
Conversion Process help topic.
Do I need to have MS Word in order to
perform a conversion?
Yes. DocToHtml currently requires MS Word 2000 or later to
perform the conversion.
Does DocToHtml have any technological limitations?
Yes, DocToHtml does have some limitations. For example, it does
not support some MS Word formatting features. To improve the
program, we regularly add the features and functions most requested by
our users, and further development of DocToHtml will be based on
user feedback. For a list of unsupported features, please
read this topic. If you need new features, please e-mail your
suggestions to email@example.com, and we’ll
consider implementing them. We would appreciate your feedback!
Can DocToHtml produce output documents in the CHM format?
No. DocToHtml is intended only for creating (X)HTML
documents, ready for the Web. To create a CHM document, you will
have to use other tools. One of the most advanced Help and Manual
authoring tools is Dr. Explain from Indigo Byte
Systems. It features WYSIWYG editing; simultaneous generation
of output documents in the CHM, HTML, PDF, and RTF formats;
automatic annotation of screenshots; support of style templates;
easy integration with your program’s source code in any language;
and so on.
I need to transfer all DocToHtml settings to
another computer. What should I copy?
DocToHtml stores all its settings in several files in the
“DocToHtml” subfolder of the %APPDATA% folder. %APPDATA%, shorthand
for Application Data, is the environment variable which designates
a folder where applications are supposed to create subfolders to
store all their settings. DocToHtml uses its own set of settings
for every user of the computer. To browse to the Application Data
folder, just type %APPDATA% (with percent signs) in the Windows
Explorer address bar, instead of a regular Windows filepath, and it
will be automatically expanded to the actual location. On Windows
Vista and later, this location is normally the “Roaming” subfolder
of the user’s AppData folder (for example,
C:\Users\<user name>\AppData\Roaming). To transfer all DocToHtml settings, including
user-defined HTML templates, just copy the contents of the
“%APPDATA%\DocToHtml” folder to the target computer. You will also
have to re-enter your registration key, because it is stored in the
system registry, not in the above-mentioned folder.
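The copy step described above can be scripted. The following Python sketch is only an illustration under stated assumptions: the function names `settings_dir` and `copy_settings` are invented for this example, and the only detail taken from the answer above is that the settings live in the “DocToHtml” subfolder of %APPDATA%. The registration key is not covered, because it is stored in the system registry.

```python
import os
import shutil
from pathlib import Path


def settings_dir(appdata=None):
    """Return the DocToHtml settings folder under %APPDATA%.

    The "appdata" argument lets a caller override the environment
    variable, which is convenient for testing.
    """
    base = appdata or os.environ.get("APPDATA")
    if not base:
        raise RuntimeError("APPDATA is not set; this sketch targets Windows")
    return Path(base) / "DocToHtml"


def copy_settings(destination, appdata=None):
    """Copy the whole settings folder into "destination".

    Returns the path of the copied "DocToHtml" folder. Existing files
    with the same names in the destination are overwritten.
    """
    source = settings_dir(appdata)
    if not source.is_dir():
        raise FileNotFoundError(f"Settings folder not found: {source}")
    target = Path(destination) / "DocToHtml"
    shutil.copytree(source, target, dirs_exist_ok=True)  # Python 3.8+
    return target
```

A typical use would be to call `copy_settings` with a removable drive or network share as the destination on the old computer, and then copy the resulting “DocToHtml” folder into %APPDATA% on the new one.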
Can I use DocToHtml on my two computers? (I
have the Personal License.)
Yes, your Personal License entitles you to use DocToHtml on both
of your computers. When you purchased the Personal License for
DocToHtml, you were given the right to use the program on
more than one computer. The license prohibits other persons from
using your copy of DocToHtml on their computers.
How can I speed up the conversion process?
In its current implementation, the program has a major drawback:
conversion is rather slow for complex input documents.
The reason is that DocToHtml makes OLE automation calls to MS Word to
perform the actual actions and retrieve the original formatting of
the text to be converted. To learn how to improve the situation,
please read the How to Speed Up the
Conversion Process help topic.