Linux DevCenter    
 Published on Linux DevCenter (http://www.linuxdevcenter.com/)


Living Linux

Managing Documents With SGMLtools

05/23/2000

With the SGMLtools package, you can write documents and generate output in many different kinds of formats -- including HTML, plain text, PDF, and PostScript -- all from the same plain text input file.

SGML ("Standard Generalized Markup Language") is not an actual format, but a specification for writing markup languages; the markup language "formats" themselves are called DTDs ("Document Type Definitions"). A document written in an SGML DTD is a plain text file containing markup tags, such as "<em>this</em>" for emphasized text.

The various SGML packages on Linux are currently in a state of transition; the original SGML-Tools package (now "SGMLtools v1") is considered obsolete and is no longer being developed; however, the newer SGMLtools v2 (aka "SGMLtools Next Generation" and "SGMLtools '98") is still alpha software, as is SGMLtools-lite, a new subset of SGMLtools.

In the meantime, if you want to dive in and get started making documents with the early SGMLtools and the linuxdoc DTD (the DTD long-used by the Linux Documentation Project), it's not hard to do. While the newer DocBook DTD has become very popular, it may be best suited for technical books and other very large projects. For smaller documents written by individual authors, such as a multi-part essay, FAQ, or white paper, the linuxdoc DTD still works fine.

And since the Linux HOWTOs are still written in linuxdoc, Debian has decided to maintain the SGMLtools 1.0 package independently; you can download both the Debian package and the original source code from http://www.debian.org/Packages/stable/text/sgml-tools.html.

Many of the same SGML tools are also available for the BSD systems, in the "textproc" section of the ports or pkgsrc collections. Both the linuxdoc and DocBook DTDs are available.

Elements of a document

A document written in an SGML DTD looks a lot like HTML. That is no coincidence: HTML is itself defined as an SGML DTD. A very simple "Hello, world" in the linuxdoc DTD might look like this:

<!doctype linuxdoc system>
<article>
<title>An Example Document
<author>Ann Author
<date>4 May 2000
<abstract>
This is an example LinuxDoc document.
</abstract>

<sect>Introduction

<p>Hello, world.

</article>
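If you start new documents often, you can generate this skeleton from a script. Below is a minimal Python sketch. The function name linuxdoc_skeleton and its parameters are hypothetical, not part of SGMLtools. The function inserts text verbatim, so the caller must escape any characters that SGML treats specially, such as < and &.

```python
from datetime import date
from typing import Optional


def linuxdoc_skeleton(title: str, author: str, abstract: str,
                      body: str = "Hello, world.",
                      section: str = "Introduction",
                      when: Optional[date] = None) -> str:
    """Return a minimal linuxdoc article laid out like the example above.

    Text is inserted verbatim; escaping SGML-special characters
    such as < and & is left to the caller (hypothetical helper).
    """
    when = when or date.today()
    # Day, full month name, year, e.g. "4 May 2000"; %B uses the
    # C locale's English month names unless the locale was changed.
    date_text = f"{when.day} {when.strftime('%B')} {when.year}"
    lines = [
        "<!doctype linuxdoc system>",
        "<article>",
        f"<title>{title}",
        f"<author>{author}",
        f"<date>{date_text}",
        "<abstract>",
        abstract,
        "</abstract>",
        "",
        f"<sect>{section}",
        "",
        f"<p>{body}",
        "",
        "</article>",
    ]
    return "\n".join(lines) + "\n"
```

Writing the returned string to a file such as myfile.sgml gives you a starting point to fill in.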

The SGMLtools package comes with a simple example file, example.sgml.gz, which is installed in the /usr/doc/sgml-tools directory.

Checking document syntax

Use sgmlcheck to make sure the syntax of your SGML document is correct; it outputs any errors in the document you give as an argument. For example, to check the SGML file myfile.sgml, you'd type:

$ sgmlcheck myfile.sgml RET
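To check several documents at once, you can call sgmlcheck from a script. Below is a minimal Python sketch; it is not part of SGMLtools, and check_sgml_files is a hypothetical name. Following the description above, it assumes that sgmlcheck prints nothing for a clean file. Any output, or a nonzero exit status, is therefore reported as a problem.

```python
import subprocess
from pathlib import Path
from typing import Dict, Iterable, Union


def check_sgml_files(paths: Iterable[Union[str, Path]],
                     checker: str = "sgmlcheck") -> Dict[str, str]:
    """Run checker on each file and collect anything that looks like an error.

    Assumption: a clean file produces no output, so any output (or a
    nonzero exit status) is treated as a problem report.
    """
    problems: Dict[str, str] = {}
    for path in paths:
        result = subprocess.run([checker, str(path)],
                                capture_output=True, text=True)
        report = (result.stdout + result.stderr).strip()
        if not report and result.returncode != 0:
            report = f"exit status {result.returncode}"
        if report:
            problems[str(path)] = report
    return problems
```

For example, check_sgml_files(sorted(Path(".").glob("*.sgml"))) checks every .sgml file in the current directory. It returns an empty dictionary if none of them produced any messages.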

Generating output

Now for the fun part -- generating output from your .sgml input file.

The following table lists the available SGML converters and the kind of output they generate:

Converter      Output
sgml2html      HTML files
sgml2info      GNU Info file
sgml2lyx       LyX input file
sgml2latex     LaTeX input file
sgml2rtf       Microsoft Rich Text Format (RTF) file
sgml2txt       Plain text file

Each of these tools takes the .sgml input file as an argument and writes the output to a file with the same base file name but with an extension that reflects its format.

For example, to make a plain text file from myfile.sgml, you'd type:

$ sgml2txt myfile.sgml RET

This command writes a plain text file called myfile.txt.
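Every converter takes the same single argument, so producing several formats from one source is just a loop. Below is a minimal Python sketch. conversion_commands and convert_all are hypothetical helpers. The sketch assumes that each converter exits with a nonzero status when it fails, so check=True stops at the first failure.

```python
import subprocess
from typing import List, Sequence

# The converters listed in the table above.
KNOWN_CONVERTERS = (
    "sgml2html", "sgml2info", "sgml2lyx",
    "sgml2latex", "sgml2rtf", "sgml2txt",
)


def conversion_commands(source: str,
                        converters: Sequence[str] = ("sgml2txt", "sgml2html"),
                        ) -> List[List[str]]:
    """Build one command line per converter; each takes only the .sgml file."""
    unknown = [c for c in converters if c not in KNOWN_CONVERTERS]
    if unknown:
        raise ValueError("unknown converter(s): " + ", ".join(unknown))
    return [[converter, source] for converter in converters]


def convert_all(source: str,
                converters: Sequence[str] = ("sgml2txt", "sgml2html"),
                dry_run: bool = False) -> List[List[str]]:
    """Run each converter on source; with dry_run, only print the commands."""
    commands = conversion_commands(source, converters)
    for command in commands:
        if dry_run:
            print(" ".join(command))
        else:
            # Assumes a failing converter exits nonzero; check=True then
            # raises CalledProcessError and stops the loop.
            subprocess.run(command, check=True)
    return commands
```

For instance, convert_all("myfile.sgml", dry_run=True) prints the two commands without running them. Without dry_run, it runs sgml2txt and then sgml2html on the same source file.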

To make a PostScript or PDF file from an .sgml file, first generate a LaTeX input file, run it through LaTeX to make a DVI output file, and then process that to make the final output (processing LaTeX files was the subject of a previous column):

$ sgml2latex myfile.sgml RET
$ latex myfile.latex RET
$ dvips -t letter -o myfile.ps myfile.dvi RET

In this example, sgml2latex writes a LaTeX input file from the .sgml source. The latex tool processes that file to make DVI output, and dvips converts the DVI file into the final output: a PostScript file called myfile.ps, with a US letter paper size set by the -t letter option.

To make a PDF file, add one more step: use ps2pdf (part of the gs, or Ghostscript, package) to convert the PostScript file to PDF:

$ ps2pdf myfile.ps myfile.pdf RET
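You can script the whole route from .sgml to PostScript and PDF the same way. This minimal sketch replays the commands above in order. pdf_pipeline and run_pipeline are hypothetical names. The sketch assumes two things: you run it from the directory that holds the .sgml file, and sgml2latex writes myfile.latex, as in the example.

```python
import subprocess
from pathlib import Path
from typing import List


def pdf_pipeline(source: str, paper: str = "letter",
                 make_pdf: bool = True) -> List[List[str]]:
    """Return the commands that turn source into PostScript and, optionally, PDF."""
    base = Path(source).stem  # "myfile.sgml" -> "myfile"
    commands = [
        ["sgml2latex", source],                                     # .sgml  -> .latex
        ["latex", f"{base}.latex"],                                 # .latex -> .dvi
        ["dvips", "-t", paper, "-o", f"{base}.ps", f"{base}.dvi"],  # .dvi   -> .ps
    ]
    if make_pdf:
        commands.append(["ps2pdf", f"{base}.ps", f"{base}.pdf"])    # .ps    -> .pdf
    return commands


def run_pipeline(commands: List[List[str]], dry_run: bool = False) -> None:
    """Run the commands in order, stopping at the first one that fails."""
    for command in commands:
        if dry_run:
            print(" ".join(command))
            continue
        # Read stdin from /dev/null: if latex stops at an error prompt, it
        # gets end-of-file and exits instead of waiting for input.
        subprocess.run(command, check=True, stdin=subprocess.DEVNULL)
```

Calling run_pipeline(pdf_pipeline("myfile.sgml"), dry_run=True) prints the four commands shown above. Without dry_run, it runs them and leaves myfile.ps and myfile.pdf in the current directory. Passing paper="a4" changes the paper size given to dvips.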

Next week: LyX, a "document processor" application.

Michael Stutz was one of the first reporters to cover Linux and the free software movement in the mainstream press.



Copyright © 2009 O'Reilly Media, Inc.