Making Media from Scratch, Part 2

by Chris Adamson

This is the second of a two-part series on creating QuickTime movies "from scratch" in Java. By that, I mean we're creating our own media data, piece by piece, to assemble the movie. Doing things at this low level is tricky, but I hope you'll agree after this installment that it's remarkably powerful.

Part 1 began with the structure of a QuickTime movie as a collection of tracks, each of which has exactly one Media object that in turn references media data that can be in the movie file, in another file, or out on the network. The Media has tables that indicate how to find specific "samples," individual pieces of audio, video, text, or other content to be rendered at a specific time in the movie. Part 1 used easy-to-create text tracks to show how to build up a Media structure, first by creating a simple all-text movie and then by adding textual "time-code" samples as a new text track in an existing movie.

In this part, we'll move on to creating video tracks from scratch, building up a video media object by adding graphic samples.

The goal of this article's sample code is to take a graphic file and make a movie out of it by "moving" around the image — you may have seen this concept in iMovie, where Apple calls it the "Ken Burns Effect," after the director who used it extensively in PBS' The Civil War and other documentaries. There is also a shareware application called Photo to Movie that does much the same thing.

Source Code

Download the source code for the examples.

We can make this work because of the concept of persistence of vision, which says that the human eye perceives a series of images, alternated sufficiently quickly, as motion. To do an image-to-movie effect, we show slightly different parts of the picture in each distinct image or "frame," creating the illusion of moving from one part of the picture to another.
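Concretely, the number of distinct frames we'll need is just the frame rate multiplied by the duration of the move. A trivial sketch (the rate and duration values are illustrative, not taken from the sample code):

```java
public class FrameCount {
    // Number of distinct frames needed for a pan of the given
    // duration at the given frame rate.
    static int framesNeeded (int framesPerSecond, int durationSeconds) {
        return framesPerSecond * durationSeconds;
    }

    public static void main (String[] args) {
        // A 5-second pan at 10 frames per second.
        System.out.println (framesNeeded (10, 5)); // prints 50
    }
}
```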

Creating a VideoMedia

In creating text tracks, the approach was to:

  1. Create a movie on disk.
  2. Create a track.
  3. Add a Media object to it.
  4. Get a MediaHandler and use that to add samples to the Media.

The same approach generally works for video, except that the VisualMediaHandler doesn't do anything for us. Instead, we need to create a compression sequence, or CSequence, to prepare samples, encoded and compressed with a codec supported by QuickTime. We'll then add these samples directly to the Media.

The CSequence class has a method called compressFrame, which is what we need to generate samples. Its signature is:

public CompressedFrameInfo compressFrame(QDGraphics src,
                                         QDRect srcRect,
                                         int flags,
                                         RawEncodedImage data)
                                  throws StdQTException

That doesn't look too bad. We just need a QDGraphics as the source of our image, a rectangle describing what part of the image to use, some behavior flags, and a RawEncodedImage buffer into which to put the compressed frame.


All Around the GWorld with QDGraphics

"So what's a QDGraphics?", you might be wondering. The name is presumably meant to evoke thoughts of the AWT's Graphics. Indeed, the two are remarkably similar: each represents a drawing surface, either on-screen or off-, containing methods for drawing lines, circles, arcs, polygons, and text strings.

One clever thing QDGraphics does under the covers is to offer an isolation layer that hides, unless you specifically ask, whether the drawing surface is on-screen or off-screen, and what native structures (CGrafPort and GWorld) are involved. One odd side effect of this arrangement is that while there are many getGWorld() methods throughout the QTJ API, there's no GWorld class for them to return, so you get QDGraphics instead.

In fact, the GraphicsImporter offers a getGWorld(), and if you guessed that this class offers a way to get an image into QuickTime, you're right. So now we have some idea of how we're going to connect the dots to make a movie from an image:

One strategy for getting the frames is to:

  1. Get starting and ending rectangles, where a rectangle is a QDRect representing an upper-left corner point and width by height dimensions.


  2. Calculate a series of intermediate rectangles that take us from the startRect to the endRect.


  3. For each of these intermediate fromRects, call compressFrame to make a frame from that portion of the original image. Add each frame as a sample.


If you have QuickTime 5 or better, you can see the result here.
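The intermediate rectangles in step 2 fall out of simple linear interpolation between the start and end rectangles. Here is a minimal sketch of that calculation in plain Java; representing a QDRect as an int array of {x, y, width, height} is an assumption for illustration, not how the sample code stores them:

```java
public class RectInterpolator {
    // Linearly interpolate between startRect and endRect.
    // Rects are {x, y, width, height}; t runs from 0.0 to 1.0.
    static int[] lerpRect (int[] startRect, int[] endRect, double t) {
        int[] result = new int[4];
        for (int i = 0; i < 4; i++) {
            result[i] = (int) Math.round (
                startRect[i] + t * (endRect[i] - startRect[i]));
        }
        return result;
    }

    public static void main (String[] args) {
        int[] start = {0, 0, 320, 240};
        int[] end   = {100, 50, 160, 120};
        int frames = 4;
        for (int f = 0; f <= frames; f++) {
            int[] r = lerpRect (start, end, (double) f / frames);
            System.out.println (r[0] + "," + r[1] + " " + r[2] + "x" + r[3]);
        }
    }
}
```

Each frame's fromRect is one step of this interpolation; more frames per second of movie time means smaller steps and smoother motion.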

This strategy works, but it is limited by the size of the original image. This is pretty much a fatal flaw. If the image is only slightly larger than the movie size (i.e., the size of the rectangles), there isn't much room to move around. If it's smaller than our movie, then it won't work at all. On the other hand, if the image is much larger than our desired movie dimensions, then we might not be able to get the parts of the picture we want — it's not very useful if we can't get someone's entire face in the movie, and instead settle for a shot that moves from their nose to their chin.

Scaling the image would be a nice improvement, but we can actually do better than that. If we could scale each fromRect, then we could "zoom" in or out of the picture by using progressively larger or smaller source regions. But how do we do this?

The Matrix Reloaded

Part 1 demonstrated how QuickTime's Matrix class could be used to define a spatial transformation. Mainly, we used it to move text located at (0,0) to a point at the bottom of a movie, but look at the javadocs and you'll see some intriguing methods, like rotate() and scale().

The key to our improved strategy is a method called rect() that combines a coordinate mapping with a scaling operation. This allows us to use any source rectangle and scale it to the size of the frames we're compressing for the movie.
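Under the hood, the mapping that rect() encodes is just a per-axis scale plus a translation. A sketch of that arithmetic in plain Java (again representing rects as {x, y, width, height} arrays for illustration; this mirrors what Matrix.rect() computes, but is not QuickTime's code):

```java
public class RectMapper {
    // Compute the scale and translation that map srcRect onto dstRect.
    // Rects are {x, y, width, height}; result is {sx, sy, tx, ty}.
    static double[] rectMapping (int[] src, int[] dst) {
        double sx = (double) dst[2] / src[2];   // horizontal scale
        double sy = (double) dst[3] / src[3];   // vertical scale
        double tx = dst[0] - src[0] * sx;       // horizontal translation
        double ty = dst[1] - src[1] * sy;       // vertical translation
        return new double[] {sx, sy, tx, ty};
    }

    public static void main (String[] args) {
        // Map a 160x120 region at (40, 30) onto a 320x240 frame at (0, 0).
        double[] m = rectMapping (new int[]{40, 30, 160, 120},
                                  new int[]{0, 0, 320, 240});
        System.out.println ("scale " + m[0] + "," + m[1]
                            + " translate " + m[2] + "," + m[3]);
    }
}
```

Because the scale factors are derived from the source rectangle's own dimensions, any size of fromRect lands exactly on the frame bounds, which is what makes zooming possible.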

To make this work, the sample code creates an offscreen QDGraphics and tells the GraphicsImporter to use it for its draw()s. The new QDGraphics's dimensions are the same as those of the frames we intend to compress. That means its bounds are a QDRect with upper-left corner 0,0 and constant dimensions VIDEO_TRACK_WIDTH by VIDEO_TRACK_HEIGHT (which I've set to 320 by 240, but which you're welcome to change in the code). For each intermediate fromRect, we create a Matrix to map from the fromRect to our QDGraphics's bounds.

The revised process looks like this:

  1. Get starting and ending rectangles, where a rectangle is a QDRect representing an upper-left corner point and width by height dimensions.


  2. Calculate a series of intermediate rectangles that take us from the startRect to the endRect.


  3. For each of these intermediate fromRects, use a Matrix to scale the rectangle into the bounds of an offscreen QDGraphics, draw it into the QDGraphics, and then call compressFrame to make a frame from the offscreen QDGraphics. Add each frame as a sample.


Making It Real

Given that strategy, let's step through the code that makes it all work. We'll skip over creating the movie itself, which we covered last time. Similarly, creating and adding the VideoTrack and VideoMedia are a very straightforward analogue to last article's TextTrack and TextMedia setup.

If this is your first time compiling and running QuickTime for Java code, see my earlier article, "A Gentle Re-Introduction to QuickTime for Java," for information on how to work out CLASSPATH and Java versioning issues.

To get things started in this example, we need to know the source image file, as well as the startRect and endRect rectangles that define the movie we are to make. The sample code expects a file to be in the current directory, with entries that look something like this:




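A sketch of what such a file might contain; the key names and values here are purely illustrative, and the actual keys are whatever the sample code reads:

```properties
# hypothetical example of the expected file format
image=liberty.jpg
start=10,10,170,130
end=200,150,360,270
```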
If this file is absent, the user will be queried for an image file at runtime, and the rectangles will be chosen randomly.

Given a QTFile for the image file, creating the GraphicsImporter is quite straightforward:

GraphicsImporter importer = new GraphicsImporter (imgFile);

Next, we create the offscreen QDGraphics and tell the GraphicsImporter to use it for its drawing:

QDGraphics gw =
    new QDGraphics (new QDRect (0, 0,
                                VIDEO_TRACK_WIDTH,
                                VIDEO_TRACK_HEIGHT));
importer.setGWorld (gw, null);

Notice that I couldn't help calling the variable gw, as in "GWorld". The use of that term in the API and Apple's docs is really pervasive!

One thing we have to prepare early is a block of memory big enough to hold the largest possible frame that the chosen video compressor could create. To do this, we call a getMaxCompressionSize() method, allocate a block of memory of that size (as referenced by a QTHandle), and "lock" the handle so it can't move while we're working with it. Finally, we can create a RawEncodedImage object with this buffer:

int rawImageSize =
    QTImage.getMaxCompressionSize (gw,
                                   gRect,
                                   gw.getPixMap().getPixelSize(),
                                   StdQTConstants.codecNormalQuality,
                                   CODEC_TYPE,
                                   CodecComponent.anyCodec);
QTHandle imageHandle = new QTHandle (rawImageSize, true);
imageHandle.lock();
RawEncodedImage compressedImage =
    RawEncodedImage.fromQTHandle (imageHandle);


The CODEC_TYPE is a constant defined early in the sample code. It is an int that indicates which QuickTime-supported compression scheme we've chosen to use, "codec" being the term for a scheme by which video is encoded and decoded. Many of these are provided as constants in the StdQTConstants class, with popular choices including kCinepakCodecType, kJPEGCodecType, and kAnimationCodecType.

There are more supported codecs than QTJ lets on, but you have to look in the native API's ImageCompression.h to find them. Two great options are Sorenson Video 3 ('SVQ3') and MPEG-4 video ('mp4v'); since QTJ defines no constants for these, you pass their four-character codes as ints.
The next thing we do is to create a CSequence. This object provides us the ability to compress frames. We have to call this with each frame to compress, in order, and there's an interesting reason for this. If we were using a compression scheme meant for single images, such as JPEG, we could do the images in any order, since each frame would have all of the information it needed to be decompressed and rendered. This is generally not true of video compression schemes, which often use "temporal compression": techniques to compress data by eliminating redundant information between frames, such as an unchanging background. Because of this approach, decoding a given frame might depend on information from one or more previous frames, which is why we have to do our compression through an object that understands that we're working with a series of images.
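The payoff of temporal compression is easy to see with a toy frame-differencing sketch. This is not QuickTime's actual codec logic, just an illustration: instead of storing every pixel of every frame, a temporal encoder can store only the pixels that changed since the previous frame.

```java
public class FrameDiff {
    // Count how many pixels a naive temporal encoder would actually
    // need to store for frame2, given frame1 as its predecessor.
    static int changedPixels (int[] frame1, int[] frame2) {
        int changed = 0;
        for (int i = 0; i < frame1.length; i++) {
            if (frame1[i] != frame2[i]) changed++;
        }
        return changed;
    }

    public static void main (String[] args) {
        int[] frame1 = {7, 7, 7, 7, 7, 7, 7, 7};   // static background
        int[] frame2 = {7, 7, 9, 9, 7, 7, 7, 7};   // small moving object
        System.out.println (changedPixels (frame1, frame2)
                            + " of " + frame2.length + " pixels changed");
        // prints "2 of 8 pixels changed"
    }
}
```

The catch, as described above, is that frame2 can only be decoded if frame1 is available first, which is exactly why frames must be compressed in order through a stateful object like CSequence.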

The CSequence constructor looks like this:

CSequence seq = new CSequence (gw,
                               gRect,
                               gw.getPixMap().getPixelSize(),
                               CODEC_TYPE,
                               CodecComponent.bestFidelityCodec,
                               StdQTConstants.codecNormalQuality,
                               StdQTConstants.codecNormalQuality,
                               KEY_FRAME_RATE,
                               null,
                               0);

These arguments are, in order:

  1. The QDGraphics we'll be compressing from.
  2. The rectangle within it to compress (here, the whole gRect).
  3. The pixel depth.
  4. The codec to use, as specified by CODEC_TYPE.
  5. A CodecComponent; bestFidelityCodec lets QuickTime pick the best available implementation.
  6. The spatial quality.
  7. The temporal quality.
  8. The maximum number of frames allowed between key frames.
  9. A custom color table, or null for none.
  10. Behavior flags.

Once we've created the CSequence, we get an ImageDescription object, which we'll need later when adding samples to the Media.

Now we can start the loop to draw, compress, and add frames. We calculate a rectangle, fromRect, inside of the original image. This will be the source of this frame. Next, we create a Matrix that maps and scales from its original location and size to the offscreen buffer's location and size; in other words, a rectangle at (0,0) with dimensions VIDEO_TRACK_WIDTH by VIDEO_TRACK_HEIGHT. Calling GraphicsImporter.draw() performs the scaled drawing of the region into the offscreen QDGraphics.

Matrix drawMatrix = new Matrix();
drawMatrix.rect (fromRect, gRect);
importer.setMatrix (drawMatrix);
importer.draw();

Next, we compress the image that was drawn into the offscreen QDGraphics:

CompressedFrameInfo cfInfo =
    seq.compressFrame (gw,
                       gRect,
                       StdQTConstants.codecFlagUpdatePrevious,
                       compressedImage);

The arguments to this call are:

  1. The QDGraphics containing the image to compress.
  2. The rectangle within it to compress.
  3. Behavior flags; codecFlagUpdatePrevious tells the codec to use this frame as the basis for temporally compressing the next one.
  4. The RawEncodedImage buffer to receive the compressed data.

The compressFrame call returns a CompressedFrameInfo object, which has an important method called getSimilarity(). This value represents how similar the compressed image is to the one compressed just before it. A value of 255 means the images are identical. 0 means the compressed frame is a "key frame," meaning it has all the image data it needs: it does not depend on other frames, and other frames may depend on it. Other values simply represent degrees of image difference, where low values mean low similarity.

With the frame now compressed into the RawEncodedImage, we can add a sample to the VideoMedia, with the addSample method inherited from the Media superclass:

videoMedia.addSample (imageHandle,
                      0, // dataOffset
                      cfInfo.getDataSize(),
                      20, // duration of each frame, in the media's time scale
                      imgDesc, // the ImageDescription from the CSequence
                      1, // number of samples
                      (syncSample ?
                       0 :
                       StdQTConstants.mediaSampleNotSync)); // flags

The arguments to this method are:

  1. The QTHandle containing the compressed image data.
  2. An offset into that data (0, since our frame starts at the beginning of the handle).
  3. The size of the data, obtained from the CompressedFrameInfo.
  4. The duration of the sample, in the media's time scale.
  5. The ImageDescription we got from the CSequence.
  6. The number of samples being added (here, just one).
  7. Flags; a frame that is not a key frame must be flagged mediaSampleNotSync so QuickTime knows it depends on other frames.

Once the loop finishes, we do the same clean-up tasks as with the text-track samples in Part 1 -- declare that we're done editing and insert the media into the video track:

videoMedia.endEdits();
videoTrack.insertMedia (0, // trackStart
                        0, // mediaTime
                        videoMedia.getDuration(), // mediaDuration
                        1); // mediaRate

Finally, we save the movie to disk, exactly as before.

The Result!

Here, for those with QuickTime 5 or 6, is a movie produced by the sample code. If you recompile and re-run the code with different codecs and different sizes, you'll see some fairly dramatic differences in file size and image quality. I've used 160x120 to keep the file size small, in order to avoid abusing O'Reilly's bandwidth, and the compression artifacts here are more visible than in the 320x240 version.

Also remember that while we just copied a scaled section of an image into the offscreen buffer, you can do any kind of imaging with this buffer before compressing it into a frame. For example, you could do the drawing commands in the QDGraphics class, or use the QTImageDrawer to use Java 2D Graphics methods to draw into the QuickTime world. With some bit-munging, you might even find a way to render 3D graphics from JOGL into QuickTime ... anyone up for rendering Finding Nemo directly into a QuickTime movie?

This completes our tour of QuickTime media structures, in which we've gone from the high-level view of what makes up a movie to the low-level mucking around with individual samples. This is a little "closer to the metal" than QTJ usually requires, but if you believe in keeping simple tasks easy and complex tasks possible, this has been an example of the latter.

Chris Adamson is an author, editor, and developer specializing in iPhone and Mac.


Copyright © 2017 O'Reilly Media, Inc.