DocToHtml - Sample HTML Code
We have invested a lot of time and effort in improving the algorithms
that optimize the output HTML code. As a result, in most cases
DocToHtml produces the smallest possible HTML code, several times
smaller than the code produced by MS Word itself. More importantly,
this means it can be edited with ease in your favorite HTML editor!
Below is a fragment of the sample document "BUSINESS REQUIREMENTS SPECIFICATION"
converted with DocToHtml.
This fragment on screen in MS Word (downscaled)
Code by MS Word 2003
Code by DocToHtml 2.50
Code by DocToHtml 2.50, without indentation
The code produced by MS Word is 4 318 bytes, whereas the code from
DocToHtml (in the mode without indentation) is only 614 bytes. You can
see for yourself how hard and painful it is to edit the MS Word code
manually.
Note also that DocToHtml takes advantage of the
fact that all cells in this table have the same font size and
vertical-align properties: instead of specifying them for every
cell, it specifies them only once, at the level of the whole table. The
same goes for the background color and font-weight properties of the
first table row. Also note that column widths are specified in
percentages, not in points. This lets you use the generated table in a
fluid ("rubber") layout with ease; you do not need to worry about the
actual table width in pixels.
All these optimizations are possible because
DocToHtml produces the resulting HTML entirely by itself, without using
MS Word's internal conversion function (although it does use
MS Word to gain access to the original document). This
approach allows us to generate highly optimized HTML code.
One might object that there is a free tool from
Microsoft, called Office HTML Filter, which claims to delete all
Office-specific tags and leave only plain HTML. In reality, Office
HTML Filter isn't very useful. It does indeed do some clean-up, but much
of the messy code remains even with the strictest cleaning options.
In fact, it is very hard to clean the HTML code produced by the built-in
Office converter to the point where it is free of redundancy and
integrates painlessly into your website.
The same fragment cleaned with Office HTML Filter
As you can see, the code is still very inflated
and practically uneditable. Office HTML Filter strips out only the
totally useless SPAN tags with the LANG attribute and some
Office-specific markup, but a lot of garbage remains. Compare this to
the clean code from DocToHtml.
And, of course, with DocToHtml you can fully
customize the output HTML code, down to the level of every formatting
attribute, which is not possible with Office HTML Filter. What is more,
with DocToHtml you can specify low-level characteristics of the HTML
code, such as the letter case of tag names, whether or not to use
optional end tags, whether or not to use optional quotes for attribute
values, and so on. A look at the DocToHtml Conversion Options Dialog
will give you an idea of how finely you can control the output HTML code.
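To make these low-level characteristics concrete, here is a minimal
Python sketch of a tag writer with three such switches. It is not
DocToHtml code: the function render_element and its parameter names are
hypothetical, the list of optional end tags is partial, and attribute
values are not escaped.

```python
import re

# Elements whose end tag is optional in HTML 4.01 (partial list).
OPTIONAL_END_TAGS = {"p", "li", "dt", "dd", "tr", "td", "th", "option"}

# HTML 4.01 allows a value to go unquoted if it consists only of
# letters, digits, hyphens, periods, underscores and colons.
UNQUOTED_SAFE = re.compile(r"[A-Za-z0-9._:-]+")


def render_element(tag, attrs, text, *, uppercase_tags=False,
                   omit_optional_end_tags=False, quote_all_values=True):
    """Serialize one element according to the chosen output options."""
    name = tag.upper() if uppercase_tags else tag.lower()
    parts = [name]
    for key, value in attrs.items():
        if quote_all_values or not UNQUOTED_SAFE.fullmatch(value):
            parts.append(f'{key}="{value}"')   # quotes required or requested
        else:
            parts.append(f"{key}={value}")     # quotes are optional here
    html = "<" + " ".join(parts) + ">" + text
    # Emit the end tag unless it is optional and the option says to drop it.
    if not (omit_optional_end_tags and tag.lower() in OPTIONAL_END_TAGS):
        html += f"</{name}>"
    return html


verbose = render_element("td", {"valign": "top"}, "Name")
compact = render_element("td", {"valign": "top"}, "Name",
                         uppercase_tags=True,
                         omit_optional_end_tags=True,
                         quote_all_values=False)
```

With the defaults the cell is written as <td valign="top">Name</td>;
with all three switches changed it becomes <TD valign=top>Name, which
is the kind of byte saving these options are meant to allow.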
When you don't need all formatting
As mentioned above, with DocToHtml you can
selectively omit certain formatting attributes. Office HTML Filter
also has the checkboxes "Remove all STYLE elements" and "Remove standard
CSS". Let's compare the output of DocToHtml with all checkboxes
regarding formatting attributes turned OFF against the output of Office
HTML Filter with those two checkboxes ON.
Office HTML Filter without STYLE and CSS
DocToHtml without output formatting for fonts and
paragraphs, without indentation
In this mode, the document will not look exactly the same in a
browser as the original document does in MS Word.
Note that you can't completely omit formatting with Office HTML Filter:
for example, the valign attribute and <b> tags will always be
present in the output document. With DocToHtml, by contrast, you
can omit formatting selectively: for fonts, paragraphs, tables, the BODY
tag, for none of them, or for only certain attributes. In the last
example, all formatting for fonts and paragraphs was stripped, while
the options for tables were set ON.
Note that Office HTML Filter leaves a
redundant "width" attribute on each and every table cell. But in HTML,
all cells belonging to a given table column share the same width. So
there is absolutely no need to specify the width for cells that are not
in the first row. MS Word thinks differently, however, and Office HTML
Filter can do nothing about it. Compare that to DocToHtml.
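The width clean-up described above is simple to express in code. The
following Python sketch is an illustration only, not the DocToHtml
implementation: drop_redundant_widths is a hypothetical name, the table
is assumed to be a list of rows of attribute dictionaries, and cells
with colspan are not handled.

```python
def drop_redundant_widths(rows):
    """Keep the width attribute only on cells of the first row.

    `rows` is a list of rows; each row is a list of dicts holding the
    attributes of one cell (an assumed representation for this sketch).
    """
    cleaned = []
    for index, row in enumerate(rows):
        if index == 0:
            # The first row defines the column widths, so keep it intact.
            cleaned.append([dict(cell) for cell in row])
        else:
            # Later rows inherit the column widths; remove the attribute.
            cleaned.append(
                [{k: v for k, v in cell.items() if k != "width"} for cell in row]
            )
    return cleaned


table = [
    [{"width": "30%"}, {"width": "70%"}],
    [{"width": "30%", "valign": "top"}, {"width": "70%"}],
]
slim_table = drop_redundant_widths(table)
```

After the call, only the two cells of the first row still carry a width.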
Size of selected fragment
|Conversion method|MS Word 2003|DocToHtml|Office HTML Filter|Office HTML Filter without formatting|DocToHtml without formatting|
|Size, bytes|4 318|614|2 835|919|549|
|Ratio compared to
These figures can be extrapolated to the whole
document. So, if you want to preserve formatting, DocToHtml will
produce code several times more compact than MS Word does, even with
Office HTML Filter applied. This significantly reduces bandwidth usage
and improves download time. And, often much more importantly, you can
edit the resulting HTML code without any trouble.
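For readers who want to check the arithmetic, the size ratios can be
recomputed from the byte counts in the table above. The short Python
sketch below does only that; the dictionary keys and the choice of which
results to compare are our own, made for this illustration.

```python
# Byte counts taken from the table above.
SIZES = {
    "MS Word 2003": 4318,
    "DocToHtml": 614,
    "Office HTML Filter": 2835,
    "Office HTML Filter without formatting": 919,
    "DocToHtml without formatting": 549,
}


def ratio(larger, smaller):
    """Return how many times bigger one result is than another."""
    return SIZES[larger] / SIZES[smaller]


# With formatting preserved: MS Word and Office HTML Filter vs. DocToHtml.
word_vs_doctohtml = round(ratio("MS Word 2003", "DocToHtml"), 1)          # 7.0
filter_vs_doctohtml = round(ratio("Office HTML Filter", "DocToHtml"), 1)  # 4.6

# With formatting stripped: Office HTML Filter vs. DocToHtml.
stripped_ratio = round(
    ratio("Office HTML Filter without formatting",
          "DocToHtml without formatting"), 1)                             # 1.7
```

So for this fragment the MS Word code is about 7 times larger than the
DocToHtml code, and about 4.6 times larger even after Office HTML Filter
has cleaned it.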
See for yourself everything we are talking
about here: download the free 30-day trial
All mentioned trademarks
are property of their respective owners