Parsing and Writing QuickTime Files in Java
by Chris Adamson, 02/19/2003
Apple's QuickTime turns 12 this year. Its highly extensible file format has contributed to this longevity, allowing QuickTime to migrate from a world of CD-ROMs, AppleTalk, and static content to today's massively networked, streaming, interactive world. The format is so flexible that it was chosen as the basis of the MPEG-4 file format. More than one might expect, the philosophy and concepts of the file format are integral to working with QuickTime structures at runtime.
However, the QuickTime APIs do much to isolate developers from the nuts and bolts of the file format when doing the most common tasks. So we'll first examine the format with a simple pure-Java QuickTime file format parser, and then use some QuickTime for Java code to generate different kinds of QuickTime files that illustrate the format's flexibility.
The details of the format are readily available in the 351-page Inside QuickTime: QuickTime File Format (PDF). Mac OS X developers also get a copy, placed at /Developer/Documentation/QuickTime/qtdevdocs/PDF/QTFileFormat.pdf by the Developer Tools installer.
Mighty Atom
The heart and soul of QuickTime is the concept of the "atom." The name should remind you of high-school chemistry, where an atom was the smallest unit of an element that retained the properties of the element. In QuickTime, an atom is the lowest level to which we can go and still be able to tell the difference between, say, an edit-list and a sprite. All atoms have a size and a type. Any other information they may contain depends on their type. This concept helps forwards-compatibility in the format--it's easy to skip over an unknown type because the size is right there.
There's a difference between "classic" atoms and newer "QT" atoms, but the latter is backwards-compatible with the former and both are commonly encountered in a single file. Let's focus on the commonalities. All atoms have a header of either 8 or 16 bytes, consisting of either two or three parts:
- atom size: a 4-byte, unsigned integer. If 0, the atom continues to the end of the file.
- atom type: a 4-byte value, usually interpreted as an ASCII string like moov, though any value is valid.
- Optionally, an extended size: if the atom size was 1, then this field is present and interpreted as an 8-byte unsigned integer. This allows an atom to contain more than 4 GB of data.
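To make the header layout concrete, here's a minimal sketch that reads one header from an in-memory buffer with java.nio.ByteBuffer, which is big-endian by default, just like QuickTime. The class and method names are hypothetical, not taken from the sample code. Because it works on a buffer rather than a file, it treats a size of 0 as "to the end of the buffer."

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Hypothetical helper: reads one atom header from an in-memory, big-endian buffer.
public class AtomHeaderSketch {

    /** The parsed header fields of one atom. */
    public static final class Header {
        public final long size;         // total atom size in bytes, header included
        public final String type;       // four-character type, e.g. "moov"
        public final int headerLength;  // 8, or 16 when an extended size follows the type

        Header(long size, String type, int headerLength) {
            this.size = size;
            this.type = type;
            this.headerLength = headerLength;
        }
    }

    /** Reads the header at the buffer's current position and advances past it. */
    public static Header readHeader(ByteBuffer buf) {
        int start = buf.position();
        // 32-bit size, masked so values of 2^31 or more stay positive in a long
        long size = buf.getInt() & 0xFFFFFFFFL;
        byte[] typeBytes = new byte[4];
        buf.get(typeBytes);
        String type = new String(typeBytes, StandardCharsets.ISO_8859_1);
        int headerLength = 8;
        if (size == 1) {
            // extended size: a 64-bit value right after the type
            // (read as a signed long, so sizes of 2^63 or more come out negative)
            size = buf.getLong();
            headerLength = 16;
        } else if (size == 0) {
            // the atom runs to the end of the data (end of file, on disk)
            size = buf.limit() - start;
        }
        return new Header(size, type, headerLength);
    }
}
```

Note the & 0xFFFFFFFFL mask: without it, a 32-bit size of 2 GB or more would come out as a negative int.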
The sample code contains a simple example in the EmptyMovie.mov file, which is just an untitled movie created in QuickTime Player and saved without modification. Open it in hexdump, od, or your favorite hex editor (I'm fond of HexEdit for the Mac). If you dump the output as characters (e.g., hexdump -cv EmptyMovie.mov), the atom types practically jump out at you:
\0 \0 \0 214 m o o v \0 \0 \0 l m v h d
\0 \0 \0 \0 272 @ Q 352 272 @ Q 372 \0 \0 002 X
\0 \0 \0 \0 \0 001 \0 \0 \0 377 \0 \0 \0 \0 \0 \0
\0 \0 \0 \0 \0 001 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0
\0 \0 \0 \0 \0 001 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0
\0 \0 \0 \0 @ \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0
\0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0
\0 \0 \0 001 \0 \0 \0 030 u d t a \0 \0 \0 \f
W L O C \0 4 \0 030 \0 \0 \0 \0
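If you don't have hexdump handy, the character view is easy to approximate. This sketch is a hypothetical helper and only a rough imitation of hexdump -c: it omits the offset column and the fixed-width padding. It prints printable ASCII as-is, a few C-style escapes such as \0 and \f, and everything else as three-digit octal, which is why the 0x8c size byte shows up above as 214.

```java
// Hypothetical helper: a rough approximation of `hexdump -c` without offsets or padding.
public class CharDumpSketch {

    /** Renders each byte as a character token, bytesPerLine tokens per line. */
    public static String dump(byte[] data, int bytesPerLine) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < data.length; i++) {
            int b = data[i] & 0xFF;  // treat the byte as unsigned (0..255)
            out.append(render(b));
            boolean endOfLine = (i + 1) % bytesPerLine == 0 || i == data.length - 1;
            out.append(endOfLine ? "\n" : " ");
        }
        return out.toString();
    }

    /** Printable ASCII as-is, a few escapes, otherwise three-digit octal. */
    private static String render(int b) {
        switch (b) {
            case 0:    return "\\0";
            case '\t': return "\\t";
            case '\n': return "\\n";
            case '\f': return "\\f";
            case '\r': return "\\r";
            default:
                if (b >= 0x20 && b < 0x7F) {
                    return String.valueOf((char) b);
                }
                return String.format("%03o", b);  // e.g. 0x8c -> "214", 0x18 -> "030"
        }
    }
}
```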
If we look at the byte values instead, and carefully count the sizes of the atoms, we can see the structure of the movie. Figure 1 shows a graphic representation. In case you're not comfortable reading hex, here's the breakdown: the file starts with the size and type of the first atom, a moov of 0x8c (140) bytes, which matches the file size. It contains a 0x6c-byte (108-byte) mvhd, which has a few non-null bytes. The moov's other child is a udta of size 0x18 (24), which itself contains a WLOC of size 0x0c (12).
Figure 1--graphic map of atoms in EmptyMovie.mov
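Counting sizes also works in reverse. An atom is just its payload prefixed with a 4-byte size (which includes the 8-byte header) and a 4-byte type, and a container's payload is its children laid end to end. The sketch below rebuilds the layout of EmptyMovie.mov this way. The helper names are hypothetical, and the mvhd payload is 100 placeholder zero bytes of the right length, not a real movie header. The udta also gets the four zero bytes that follow the WLOC in the dump.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Hypothetical helpers: an atom is an 8-byte header (size + type) followed by its payload.
public class AtomWriterSketch {

    /** Builds one atom; the size field counts the 8-byte header as well as the payload. */
    public static byte[] atom(String type, byte[] payload) {
        byte[] typeBytes = type.getBytes(StandardCharsets.ISO_8859_1);
        if (typeBytes.length != 4) {
            throw new IllegalArgumentException("atom type must be 4 bytes: " + type);
        }
        return ByteBuffer.allocate(8 + payload.length)  // big-endian by default
                .putInt(8 + payload.length)
                .put(typeBytes)
                .put(payload)
                .array();
    }

    /** A container's payload is simply its children's bytes, back to back. */
    public static byte[] concat(byte[]... parts) {
        int total = 0;
        for (byte[] part : parts) {
            total += part.length;
        }
        byte[] result = new byte[total];
        int pos = 0;
        for (byte[] part : parts) {
            System.arraycopy(part, 0, result, pos, part.length);
            pos += part.length;
        }
        return result;
    }

    /** Rebuilds the atom layout (not the real mvhd contents) of EmptyMovie.mov. */
    public static byte[] emptyMovieLayout() {
        byte[] mvhd = atom("mvhd", new byte[100]);                  // placeholder: 8 + 100 = 0x6c
        byte[] wloc = atom("WLOC", new byte[] {0, 0x34, 0, 0x18});  // 8 + 4 = 0x0c
        byte[] udta = atom("udta", concat(wloc, new byte[4]));      // 8 + 12 + 4 zero bytes = 0x18
        return atom("moov", concat(mvhd, udta));                    // 8 + 108 + 24 = 0x8c
    }
}
```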
Little things to notice:
- The moov and udta atoms contain other atoms, and don't seem to do anything besides contain atoms. This is a key trait of QuickTime atoms: they contain either data or other atoms, never both. That's different from other tree-structured data formats like XML, where an element can have both attributes and child elements.
- What's the 0x00000000 that's in the udta but follows the WLOC? Depending on your mood, it's a bug or a feature. Apple says that they write an extra 32 bits of zero after the last child of a udta atom to maintain compatibility with a bug from way back in QuickTime 1.0.
- If your first eight bytes read as 0000 8c00 6f6d 766f, then you're probably on a little-endian machine, such as a PC running Windows, and your dump tool is showing 16-bit words. QuickTime data structures are defined as "big-endian," meaning that the most-significant byte of a multi-byte value comes first. Intel x86 processors use little-endian ordering, so each pair of bytes appears swapped when you look at 16-bit values.
- Finally, there's no special sequence to identify the contents as QuickTime data, like the CAFEBABE "magic number" that begins Java class files or the ID3 sequence that typically begins an ID3-tagged MP3 file.
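Java's own I/O classes line up well with QuickTime here: DataInputStream always reads multi-byte values big-endian, whatever the host CPU. The sketch below (hypothetical names) reads the moov size that way, then uses a little-endian ByteBuffer to mimic the 16-bit view a dump tool shows on a PC.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Hypothetical helpers contrasting big-endian (file) order with a little-endian view.
public class EndianSketch {

    /** The first eight bytes of EmptyMovie.mov, in file order. */
    public static final byte[] FIRST_BYTES = {0, 0, 0, (byte) 0x8c, 'm', 'o', 'o', 'v'};

    /** DataInputStream.readInt() is always big-endian, which matches QuickTime. */
    public static int atomSize(byte[] data) {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(data))) {
            return in.readInt();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Formats the bytes as 16-bit little-endian words, as a PC dump tool would. */
    public static String littleEndianWords(byte[] data) {
        ByteBuffer buf = ByteBuffer.wrap(data).order(ByteOrder.LITTLE_ENDIAN);
        StringBuilder sb = new StringBuilder();
        while (buf.remaining() >= 2) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%04x", buf.getShort() & 0xFFFF));  // low byte first
        }
        return sb.toString();
    }
}
```

Here atomSize(FIRST_BYTES) gives 140, while littleEndianWords(FIRST_BYTES) gives the swapped-looking "0000 8c00 6f6d 766f": the file is the same; only the interpretation differs.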
What does all this say, anyway? The file-format docs define the contents of each of the "leaf" atoms, so we look there to interpret the mvhd and WLOC atoms. Since this is a minimal movie, there's not much to see. The mvhd is a "movie header," a structure that defines metadata values like creation time, preferred volume, time scale, et cetera; these defaults are saved into the file. The next atom is user data, udta, a container for an arbitrarily long list of metadata atoms. This is a good place to put your own data into the movie, in whatever format suits you, so long as you choose an unused atom type and avoid all-lowercase types, which are reserved for Apple. Here, there is only one piece of user data, the window location, WLOC. It contains two 16-bit unsigned ints for x and y, in this case (0x34,0x18), or in decimal, (52,24).
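Decoding a leaf atom is a matter of following its documented layout. Here is a sketch of a WLOC decoder, assuming it receives just the 4-byte payload (the bytes after the 8-byte header) holding the two 16-bit values described above. The class name is hypothetical. The masks matter because Java's short is signed.

```java
import java.nio.ByteBuffer;

// Hypothetical decoder for a WLOC payload: two big-endian, unsigned 16-bit values.
public class WlocSketch {

    /** Returns {x, y} from the 4 payload bytes that follow the atom header. */
    public static int[] decode(byte[] payload) {
        ByteBuffer buf = ByteBuffer.wrap(payload);  // big-endian by default
        int x = buf.getShort() & 0xFFFF;            // mask: Java shorts are signed
        int y = buf.getShort() & 0xFFFF;
        return new int[] {x, y};
    }
}
```

For EmptyMovie.mov's payload {0x00, 0x34, 0x00, 0x18}, decode returns {52, 24}.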
Doing It the Hard Way
While QuickTime for Java generally isolates you from the grubby details of the format, I've included a simple all-Java QuickTime file parser so we can quickly see the structure of a movie file on any J2SE platform. Download the accompanying source tarball and open it up. The parser source and a precompiled .jar are in the atom-parse directory. An Ant build.xml file is included to help you build the code, if you're interested (run ant help to see the available targets), or you can just run it from the .jar with java -classpath atomparse.jar com.mac.invalidname.qtatomparse.AtomParser.
The code starts with a basic ParsedAtom class, which represents
any atom found in the file. This is subclassed as
ParsedContainerAtom, containing an array of its children, and
ParsedLeafAtom, which is meant to be a parent for type-specific
subclasses that interpret particular atom types. A factory provides the parser
with the class for a given type--new classes can be added by editing its
properties file. Finally, AtomParser puts it all together,
recursively calling a parseAtoms method when it discovers a
container atom, and returning an array of children.
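For readers who want the shape of that design before opening the tarball, here's a minimal sketch of such a hierarchy and factory. It is not the code in atom-parse: the class names (Atom, ContainerAtom, LeafAtom, WlocAtom), the constructors, and the registry are assumptions, and a Properties object populated in code stands in for the properties file. The key idea is that the factory maps an atom type to a class name and instantiates it reflectively, falling back to a generic leaf for unregistered types.

```java
import java.util.Properties;

// Hypothetical sketch of an atom class hierarchy with a properties-driven factory.
public class AtomModelSketch {

    /** Every atom has a size and a type. */
    public static class Atom {
        public final long size;
        public final String type;

        public Atom(long size, String type) {
            this.size = size;
            this.type = type;
        }
    }

    /** Containers hold only child atoms. */
    public static class ContainerAtom extends Atom {
        public final Atom[] children;

        public ContainerAtom(long size, String type, Atom[] children) {
            super(size, type);
            this.children = children;
        }
    }

    /** Generic leaf; type-specific subclasses interpret the data. */
    public static class LeafAtom extends Atom {
        public LeafAtom(long size, String type) {
            super(size, type);
        }
    }

    /** Example type-specific leaf (would decode the window location). */
    public static class WlocAtom extends LeafAtom {
        public WlocAtom(long size, String type) {
            super(size, type);
        }
    }

    /** Atom type -> fully qualified class name, as a properties file might list them. */
    private final Properties typeToClass = new Properties();

    public AtomModelSketch() {
        typeToClass.setProperty("WLOC", WlocAtom.class.getName());
    }

    /** Instantiates the registered class for this type, or a generic LeafAtom. */
    public Atom createAtomFor(long size, String type) {
        String className = typeToClass.getProperty(type);
        if (className == null) {
            return new LeafAtom(size, type);
        }
        try {
            Class<?> cls = Class.forName(className);
            return (Atom) cls.getConstructor(long.class, String.class).newInstance(size, type);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("can't create atom class " + className, e);
        }
    }
}
```

Adding support for a new type then means writing a LeafAtom subclass and adding one line to the registry, without touching the parsing loop.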
Here's the critical section for reading an atom's size, type, extended size,
and data, given raf (a RandomAccessFile),
off (current offset that we're reading; i.e., start of an atom), and
stopAt (where the parent atom or file ends).
while (off < stopAt) {
    raf.seek (off);

    // 1. first 32 bits are atom size
    // use BigInteger to convert bytes to long; the signum
    // of 1 treats the bytes as an unsigned magnitude
    // (instead of a signed int)
    int bytesRead = raf.read (atomSizeBuf, 0,
                              atomSizeBuf.length);
    if (bytesRead < atomSizeBuf.length)
        throw new IOException ("couldn't read atom length");
    BigInteger atomSizeBI = new BigInteger (1, atomSizeBuf);
    long atomSize = atomSizeBI.longValue();

    // this is kind of a hack to handle the udta problem
    // (see below) when the parent didn't have children,
    // meaning we've read 4 bytes of 0 and the parent atom
    // is already over
    if (raf.getFilePointer() == stopAt)
        break;

    // 2. next, the atom type
    bytesRead = raf.read (atomTypeBuf, 0,
                          atomTypeBuf.length);
    if (bytesRead != atomTypeBuf.length)
        throw new IOException ("Couldn't read atom type");
    // ISO-8859-1 maps each byte to one char, regardless
    // of the platform's default encoding
    String atomType = new String (atomTypeBuf, "ISO-8859-1");

    // 3. if atomSize was 1, then this is 64-bit ext size
    if (atomSize == 1) {
        bytesRead = raf.read (extendedAtomSizeBuf, 0,
                              extendedAtomSizeBuf.length);
        if (bytesRead != extendedAtomSizeBuf.length)
            throw new IOException (
                "Couldn't read extended atom size");
        BigInteger extendedSizeBI =
            new BigInteger (1, extendedAtomSizeBuf);
        atomSize = extendedSizeBI.longValue();
    }

    // if this atom size is negative, or extends past end
    // of file, it's extremely suspicious (i.e., we're not
    // really in a quicktime file)
    if ((atomSize < 0) ||
        ((off + atomSize) > raf.length()))
        throw new IOException (
            "atom has invalid size: " + atomSize);

    // 4. if a container atom, then parse the children
    ParsedAtom parsedAtom = null;
    if (ATOM_CONTAINER_TYPES.contains (atomType)) {
        // children run from current point to end of the atom
        ParsedAtom [] children =
            parseAtoms (raf, raf.getFilePointer(), off + atomSize);
        parsedAtom =
            new ParsedContainerAtom (atomSize, atomType, children);
    } else {
        parsedAtom =
            AtomFactory.getInstance().createAtomFor (
                atomSize, atomType, raf);
    }
    // add atom to the list
    parsedAtomList.add (parsedAtom);

    // now set offset to next atom (or end-of-file
    // in special case (atomSize = 0 means atom goes
    // to EOF)
    if (atomSize == 0)
        off = raf.length();
    else
        off += atomSize;

    // if a 'udta' container atom, then jump ahead 4
    // to work around Apple's QT 1.0 workaround
    // (http://developer.apple.com/technotes/qt/qt_03.html )
    if (atomType.equals("udta"))
        off += 4;
} // while not at stopAt
A few caveats to this code. First, please excuse my abuse of the
BigInteger class to get longs from four-byte arrays,
but the alternative is a blinding amount of bit-shifting. Moreover, the reason
I use longs for atom sizes is that it usually avoids signing
problems (32-bit Java ints are signed, while the usual QuickTime
atom size is a 32-bit unsigned value). However, it will be wrong if
you happen to encounter an atom larger than 9,223,372,036,854,775,807 bytes
(i.e., a 64-bit integer with the top bit set). Just thought I'd mention that, in
case you just got back from the store with a 10 exabyte drive. Also,
my scheme for knowing what atoms are containers is to list known containers in
AtomParser. If I've missed one, the parser handles it fairly
gracefully, because we have the size of the atom and simply advance the offset
to the next atom (unfortunately, without parsing the children).
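For comparison, here's what the bit-shifting alternative looks like; it's less blinding than it sounds. This is a hypothetical helper, not part of the sample code. The & 0xFF mask undoes Java's sign extension of each byte. Keep in mind that new BigInteger(byte[]) treats its argument as a two's-complement (signed) number, so a 4-byte size with the top bit set comes out negative; the BigInteger(1, byte[]) constructor, which takes an explicit positive sign, avoids that. Another common idiom is DataInputStream.readInt() & 0xFFFFFFFFL.

```java
// Hypothetical helper: assembles big-endian bytes into a long without BigInteger.
public class UnsignedSketch {

    /**
     * Reads len (1..8) big-endian bytes starting at off as an unsigned value.
     * For len <= 7 the result is never negative; with len == 8 a set top bit
     * wraps to a negative long, the same limit the article mentions.
     */
    public static long bigEndianToLong(byte[] buf, int off, int len) {
        if (len < 1 || len > 8) {
            throw new IllegalArgumentException("len must be 1..8, got " + len);
        }
        long value = 0;
        for (int i = 0; i < len; i++) {
            // shift the previous bytes up, then OR in this byte as 0..255
            value = (value << 8) | (buf[off + i] & 0xFFL);
        }
        return value;
    }
}
```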
Here's the output when we run the parser on EmptyMovie.mov:
moov (140 bytes) - 2 children
   mvhd (108 bytes)
   udta (24 bytes) - 1 child
      WLOC (12 bytes) (x,y) == (52,24)
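The output format is a straightforward recursive walk. Here's a sketch of a printer that produces the same style of listing, assuming a hypothetical Node type in which a null child list marks a leaf and an optional detail string holds type-specific summaries like the WLOC coordinates. Each level is indented by three spaces.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical tree printer producing "type (N bytes) - K children" listings.
public class TreePrinterSketch {

    /** A parsed atom: children == null marks a leaf. */
    public static final class Node {
        final String type;
        final long size;
        final String detail;        // optional type-specific summary, may be null
        final List<Node> children;  // null for leaf atoms

        Node(String type, long size, String detail, List<Node> children) {
            this.type = type;
            this.size = size;
            this.detail = detail;
            this.children = children;
        }
    }

    public static Node leaf(String type, long size, String detail) {
        return new Node(type, size, detail, null);
    }

    public static Node container(String type, long size, Node... children) {
        return new Node(type, size, null, Arrays.asList(children));
    }

    /** Renders the whole tree, one atom per line. */
    public static String print(Node root) {
        StringBuilder out = new StringBuilder();
        print(root, 0, out);
        return out.toString();
    }

    private static void print(Node node, int depth, StringBuilder out) {
        for (int i = 0; i < depth; i++) {
            out.append("   ");
        }
        out.append(node.type).append(" (").append(node.size).append(" bytes)");
        if (node.children != null) {
            int n = node.children.size();
            out.append(" - ").append(n).append(n == 1 ? " child" : " children");
        }
        if (node.detail != null) {
            out.append(' ').append(node.detail);
        }
        out.append('\n');
        if (node.children != null) {
            for (Node child : node.children) {
                print(child, depth + 1, out);
            }
        }
    }
}
```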
So far, so boring. Let's try a more interesting bit of content. The movie tim-drm-ref.mov is a 45-second sound bite of Tim O'Reilly discussing digital rights management at the recent O'Reilly Mac OS X conference. The file is a reference to a 51 MB movie of the entire keynote panel, yet this file is a dainty 6 KB, since it consists entirely of metadata, including the references to the original movie on the O'Reilly web site.
This file's structure is a lot more involved:
moov (5957 bytes) - 4 children
   mvhd (108 bytes)
   trak (3951 bytes) - 4 children
      tkhd (92 bytes)
      edts (36 bytes) - 1 child
         elst (28 bytes) [1 edit]
      mdia (3803 bytes) - 3 children
         mdhd (32 bytes)
         hdlr (58 bytes) [mhlr/vide - Apple Video Media Handler]
         minf (3705 bytes) - 4 children
            vmhd (20 bytes)
            hdlr (55 bytes) [dhlr/url - Apple URL Data Handler]
            dinf (76 bytes) - 1 child
               dref (68 bytes)
            stbl (3546 bytes) - 6 children
               stsd (102 bytes)
               stts (24 bytes)
               stss (216 bytes)
               stsc (172 bytes)
               stsz (2248 bytes)
               stco (776 bytes)
      udta (12 bytes) - 0 children
   trak (1857 bytes) - 4 children
      tkhd (92 bytes)
      edts (36 bytes) - 1 child
         elst (28 bytes) [1 edit]
      mdia (1709 bytes) - 3 children
         mdhd (32 bytes)
         hdlr (58 bytes) [mhlr/soun - Apple Sound Media Handler]
         minf (1611 bytes) - 4 children
            smhd (16 bytes)
            hdlr (55 bytes) [dhlr/url - Apple URL Data Handler]
            dinf (76 bytes) - 1 child
               dref (68 bytes)
            stbl (1456 bytes) - 5 children
               stsd (132 bytes)
               stts (24 bytes)
               stsc (880 bytes)
               stsz (20 bytes)
               stco (392 bytes)
      udta (12 bytes) - 0 children
   udta (33 bytes) - 2 children
      WLOC (12 bytes) (x,y) == (83,93)
      SelO (9 bytes)
This file is far more typical of what we expect to see in a movie, or more
accurately, in a moov (go ahead, say it out loud:
moo-vee). In addition to the metadata-bearing mvhd movie
header and the udta user data, there are two trak
atoms, both with a deep, yet similar, structure. This movie consists of two
"tracks," one for video and one for audio. Tracks store metadata in
the tkhd track header (analogous to the mvhd we saw
earlier), an "edits" structure that indicates what parts of the
underlying media are used by the track, and a detailed "media"
structure.
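As an example of interpreting one of these structures, here's a sketch of an elst (edit list) parser. It assumes the layout given in the QuickTime file-format docs: 4 bytes of version and flags, a 4-byte entry count, then 12 bytes per edit (track duration, media time, and a 16.16 fixed-point media rate). That layout fits the listing above, since 8 + 4 + 4 + 12 = 28 bytes for each one-edit elst atom. The class names are hypothetical.

```java
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.List;

// Hypothetical parser for the payload of an elst (edit list) atom.
public class EditListSketch {

    /** One edit-list entry. */
    public static final class Edit {
        public final long trackDuration;  // length of this edit, in movie time-scale units
        public final int mediaTime;       // start time within the media; -1 marks an empty edit
        public final double mediaRate;    // 16.16 fixed point, normally 1.0

        Edit(long trackDuration, int mediaTime, double mediaRate) {
            this.trackDuration = trackDuration;
            this.mediaTime = mediaTime;
            this.mediaRate = mediaRate;
        }
    }

    /** Parses an elst payload, i.e. the bytes after the 8-byte atom header. */
    public static List<Edit> parse(byte[] payload) {
        ByteBuffer buf = ByteBuffer.wrap(payload);  // big-endian by default
        buf.getInt();                               // 1-byte version + 3 bytes of flags, ignored
        long count = buf.getInt() & 0xFFFFFFFFL;
        List<Edit> edits = new ArrayList<>();
        for (long i = 0; i < count; i++) {
            long duration = buf.getInt() & 0xFFFFFFFFL;
            int mediaTime = buf.getInt();           // signed: -1 is meaningful
            double rate = buf.getInt() / 65536.0;   // convert 16.16 fixed point
            edits.add(new Edit(duration, mediaTime, rate));
        }
        return edits;
    }
}
```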
The media structure has, again, a metadata header (mdhd); a hdlr handler atom that indicates which component should handle the media data; a "data information" (dinf) structure made up of dref data references that say where the media data is (in this file, elsewhere on disk, on the net, etc.); and finally, the stbl sample table, a tricky structure for locating and interpreting media samples.
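The bracketed summaries next to the hdlr atoms in the listing show each handler's component type, subtype, and name. Here's a sketch of decoding them, assuming the QuickTime hdlr layout: 4 bytes of version and flags; component type, subtype, manufacturer, flags, and flags mask at 4 bytes each; then the component name as a counted (Pascal-style) string. The sizes in the listing agree with that layout: 8 + 24 + 1 + 25 = 58 bytes for "Apple Video Media Handler." The class name is hypothetical, and the trim() on the subtype is there because types like "url " are padded with a space.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Hypothetical decoder for a QuickTime-style hdlr payload (bytes after the 8-byte header).
public class HandlerSketch {

    /** Returns a summary like "[mhlr/vide - Apple Video Media Handler]". */
    public static String describe(byte[] payload) {
        ByteBuffer buf = ByteBuffer.wrap(payload);
        buf.getInt();                          // version + flags, ignored
        String type = fourCC(buf);             // e.g. "mhlr" or "dhlr"
        String subtype = fourCC(buf);          // e.g. "vide", "soun", "url "
        buf.position(buf.position() + 12);     // skip manufacturer, flags, flags mask
        String name = "";
        if (buf.hasRemaining()) {
            int len = buf.get() & 0xFF;        // Pascal-style length byte
            byte[] nameBytes = new byte[Math.min(len, buf.remaining())];
            buf.get(nameBytes);
            name = new String(nameBytes, StandardCharsets.ISO_8859_1);
        }
        return "[" + type + "/" + subtype.trim() + " - " + name + "]";
    }

    private static String fourCC(ByteBuffer buf) {
        byte[] b = new byte[4];
        buf.get(b);
        return new String(b, StandardCharsets.ISO_8859_1);
    }
}
```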
It's too much to try to understand what all of these atoms represent right away
if you're new to QuickTime, but it might be helpful to look at Apple's Introduction
to QuickTime tutorial, specifically the section on tracks
and media, and see how the contents map fairly directly onto the structure
presented in the preceding two paragraphs. Another point of interest is
Ridgeworks' QTatomizer,
a shareware product that represents the atom structure of a QuickTime movie as
a Swing JTree.