ONJava.com -- The Independent Source for Enterprise Java

Top Ten New Things You Can Do with NIO

5: Memory-Mapped Files



The theme of wrapping ByteBuffer objects around arbitrary memory spaces continues with MappedByteBuffer, a specialized form of ByteBuffer. On most operating systems, it's possible to memory map a file using the mmap() system call (or something similar) on an open file descriptor. Calling mmap() returns a pointer to a memory segment, which actually represents the content of the file. Fetches from memory locations within that memory area will return data from the file at the corresponding offset. Modifications made to the memory space are written to the file on disk.

Figure 7: User memory mapped to the filesystem.

There are two big advantages to memory-mapped files. First, the "memory" does not usually consume normal virtual memory space. Or, more correctly, the virtual memory space of a file mapping is backed by the file data on disk. That means it's not necessary to allocate regular paging space for mapped files; their paging area is the file itself. If you were to open the file conventionally and read it into memory, that would consume a corresponding amount of paging space, because you're copying the data into regular memory. Second, multiple mappings of the same file share the same underlying memory pages. Theoretically, 100 mappings can be established by 100 different processes to the same 500MB file; each will appear to have the entire 500MB of data in memory, but the file's data is held in memory only once, no matter how many processes map it. Pieces of the file will be brought into memory as references are made, which will compete for RAM, but no paging space will be consumed.

In Figure 7, additional processes running in user space would map to that same physical memory space, through the same filesystem cache and thence to the same file data on disk. Each of those processes would see changes made by any other. This can be exploited as a form of persistent, shared memory. Operating systems vary in the way their virtual memory subsystems behave, so your mileage may also vary.

MappedByteBuffer instances are created by invoking the map() method on an open FileChannel object. The MappedByteBuffer class has a couple of additional methods for managing caching and flushing of updates to the underlying file.
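As a rough illustration of map() and one of those extra methods, here is a minimal sketch. The class name MappedFileSketch and the helper overwritePrefix() are invented for this example, and it assumes the prefix is plain ASCII and no longer than the file. It changes the start of a file by writing into the mapped buffer, calls force() to ask that the change be flushed, and then reads the file back with ordinary file I/O.

```java
import java.io.IOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical example class; not part of NIO.
final class MappedFileSketch {

    // Overwrites the first bytes of the file through a read/write mapping,
    // then returns the file's content as seen by ordinary file I/O.
    // Assumes prefix is ASCII and not longer than the file.
    static String overwritePrefix(Path file, String prefix) throws IOException {
        byte[] bytes = prefix.getBytes(StandardCharsets.US_ASCII);
        try (FileChannel channel = FileChannel.open(file,
                StandardOpenOption.READ, StandardOpenOption.WRITE)) {
            // Map the whole file into memory.
            MappedByteBuffer mapped =
                channel.map(FileChannel.MapMode.READ_WRITE, 0, channel.size());
            mapped.put(bytes);  // an ordinary buffer put; no write() call on the channel
            mapped.force();     // request that the change be flushed to the file
        }
        return new String(Files.readAllBytes(file), StandardCharsets.US_ASCII);
    }

    public static void main(String[] args) throws IOException {
        Path file = Files.createTempFile("mapped", ".txt");
        Files.write(file, "hello, world".getBytes(StandardCharsets.US_ASCII));
        System.out.println(overwritePrefix(file, "HELLO"));
    }
}
```

The other extra methods, load() and isLoaded(), work the same way: they are hints about whether the mapped pages are resident in memory, and how much they help depends on the operating system.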

Prior to NIO, it wasn't possible to memory map files without resorting to platform-specific, non-portable native code. It's now possible for any pure Java program to take advantage of memory mapping, easily and portably.

4: Scattering Reads and Gathering Writes

Here's a familiar bit of code:

byte [] byteArray = new byte [100];
...
int bytesRead = fileInputStream.read (byteArray);

This reads some data from a stream into an array of bytes. Here's the equivalent read operation using ByteBuffer and FileChannel objects (to move the examples into the NIO realm):

ByteBuffer byteBuffer = ByteBuffer.allocate (100);
...
int bytesRead = fileChannel.read (byteBuffer);

And here's a common usage pattern:

ByteBuffer header = ByteBuffer.allocate (32);
ByteBuffer colorMap = ByteBuffer.allocate (256 * 3);
ByteBuffer imageBody = ByteBuffer.allocate (640 * 480);

fileChannel.read (header);
fileChannel.read (colorMap);
fileChannel.read (imageBody);

This performs three separate read() calls to load a hypothetical image file. This works fine, but wouldn't it be great if we could issue a single read request to the channel and tell it to place the first 32 bytes into the header buffer, the next 768 bytes into the colorMap buffer, and the remainder into imageBody?

No problem, can do, easy. Most NIO channel types support scatter/gather, also known as vectored I/O. A scattering read to the above buffers can be accomplished with this code:

ByteBuffer [] scatterBuffers = { header, colorMap, imageBody };

fileChannel.read (scatterBuffers);

Rather than pass a single buffer object to the channel, an array of buffers is passed in. The channel fills each buffer in turn until all are full or there is no more data to read. Gathering writes are done in a similar way; data are drained from each buffer in the list in turn and sent along the channel, exactly as if they had been written sequentially.
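To make the scattering read concrete, here is a small self-contained sketch. The class ScatterReadSketch and its readFully() helper are made up for this illustration, and the tiny buffer sizes in main() stand in for the header, color map, and image body of the hypothetical image file. Because a single read() call is allowed to return before every buffer is full, the helper loops until the buffers are full or the file ends.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical example class; not part of NIO.
final class ScatterReadSketch {

    private static boolean anyRemaining(ByteBuffer[] buffers) {
        for (ByteBuffer b : buffers) {
            if (b.hasRemaining()) return true;
        }
        return false;
    }

    // Fills the buffers in order from the start of the file and returns
    // the total number of bytes read.
    static long readFully(Path file, ByteBuffer... buffers) throws IOException {
        long total = 0;
        try (FileChannel channel = FileChannel.open(file, StandardOpenOption.READ)) {
            while (anyRemaining(buffers)) {
                long n = channel.read(buffers);  // scattering read
                if (n < 0) break;                // end of file
                total += n;
            }
        }
        return total;
    }

    public static void main(String[] args) throws IOException {
        Path file = Files.createTempFile("image", ".bin");
        Files.write(file, new byte[] { 0, 1, 2, 3, 4, 5, 6, 7, 8, 9 });

        ByteBuffer header = ByteBuffer.allocate(4);
        ByteBuffer colorMap = ByteBuffer.allocate(3);
        ByteBuffer imageBody = ByteBuffer.allocate(8);

        long total = readFully(file, header, colorMap, imageBody);
        System.out.println(total + " bytes read, " + imageBody.position() + " in the body");
    }
}
```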

Scatter/gather can provide a real performance boost when reading or writing data that's partitioned into fixed-size, logically distinct segments. Passing a list of buffers means the entire transfer can be optimized (using multiple CPUs for example) and fewer overall system calls are needed.

Gathering writes can compose results from several buffers. For example, an HTTP response could use a read-only buffer containing static headers that are the same for every response, a dynamically-populated buffer for those headers unique to the response, and a MappedByteBuffer object associated with a file, which is to be the body of the response. A given buffer may even appear in more than one gather list, or multiple views of the same buffer can be used.
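A sketch of that HTTP response idea might look like the following. Everything named here (GatherWriteSketch, writeResponse(), the header text) is invented for illustration, and the body is passed in as an ordinary ByteBuffer, which could just as well be a MappedByteBuffer. The static headers live in one read-only buffer, and each response writes a duplicate() view of it so that the shared buffer's position is never disturbed. The loop assumes a blocking channel.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.channels.GatheringByteChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical example class; not part of NIO.
final class GatherWriteSketch {

    // Headers that are identical for every response, held in a read-only buffer.
    private static final ByteBuffer STATIC_HEADERS = ByteBuffer.wrap(
            "HTTP/1.1 200 OK\r\nServer: sketch\r\n".getBytes(StandardCharsets.US_ASCII))
            .asReadOnlyBuffer();

    // Sends static headers, per-response headers, and the body with gathering writes.
    static void writeResponse(GatheringByteChannel out, ByteBuffer body) throws IOException {
        ByteBuffer dynamicHeaders = ByteBuffer.wrap(
                ("Content-Length: " + body.remaining() + "\r\n\r\n")
                        .getBytes(StandardCharsets.US_ASCII));

        // duplicate() gives this response its own position over the shared header bytes.
        ByteBuffer[] parts = { STATIC_HEADERS.duplicate(), dynamicHeaders, body };

        long remaining = 0;
        for (ByteBuffer part : parts) remaining += part.remaining();

        // A gathering write may stop short, so repeat until everything is drained.
        while (remaining > 0) {
            remaining -= out.write(parts);
        }
    }

    public static void main(String[] args) throws IOException {
        Path file = Files.createTempFile("response", ".txt");
        try (FileChannel out = FileChannel.open(file, StandardOpenOption.WRITE)) {
            writeResponse(out, ByteBuffer.wrap("hi".getBytes(StandardCharsets.US_ASCII)));
        }
        System.out.print(new String(Files.readAllBytes(file), StandardCharsets.US_ASCII));
    }
}
```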

3: Direct Channel Transfers

Did you ever notice that whenever you need to copy data to or from a file, you seem to write the same old copy loop over and over again? It's always the same story: you read a chunk of data into a buffer, then immediately write it back out again somewhere else. You're not doing anything with that data, so why is it necessary to pull it in just to shove it back out again? Why is it necessary to continually reinvent this wheel?

Here's a thought. Wouldn't it be great if you could just tell some class "Move the data from this file to that one" or "Write all the data that comes out of that socket to this file over there"? Well, thanks to the modern miracle of direct channel transfers, now you can.

public abstract class FileChannel
extends AbstractInterruptibleChannel
implements ByteChannel, GatheringByteChannel, ScatteringByteChannel
{
   // This is a partial API listing

   public abstract long transferTo (long position, long count,
      WritableByteChannel target) throws IOException;

   public abstract long transferFrom (ReadableByteChannel src,
      long position, long count) throws IOException;
}

A channel transfer lets you cross-connect two channels so that data is transferred directly from one to the other without any further intervention on your part. Because the transferTo() and transferFrom() methods belong to the FileChannel class, a FileChannel object must be the source or destination of a channel transfer (you can't transfer from one socket to another, for example). But the other end may be any ReadableByteChannel or WritableByteChannel, as appropriate.

On operating systems with appropriate support, channel transfers can be done entirely in kernel space. This not only relieves you of the chore of doing the copy, it bypasses the JVM entirely! One low-level system call and boom! Done. Even on those OS platforms without kernel support for transfers, making use of these methods still saves you the trouble of writing yet another copy loop. And the odds are good that the implementation will use native code or other optimizations to move the data as quickly as possible, faster than you could ever do it yourself in regular Java code. And the best part: code you never write never has bugs.
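Here is what a file-to-file copy might look like with transferTo(). This is only a sketch: the class ChannelTransferSketch and its copy() method are made up for the example. One detail worth knowing is that transferTo() returns the number of bytes it actually moved, which may be fewer than requested, so the call sits in a short loop. That loop is far smaller than the read-then-write loop it replaces, and no buffer is ever allocated.

```java
import java.io.IOException;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical example class; not part of NIO.
final class ChannelTransferSketch {

    // Copies source to target with transferTo() and returns the bytes copied.
    static long copy(Path source, Path target) throws IOException {
        try (FileChannel in = FileChannel.open(source, StandardOpenOption.READ);
             FileChannel out = FileChannel.open(target,
                     StandardOpenOption.CREATE,
                     StandardOpenOption.WRITE,
                     StandardOpenOption.TRUNCATE_EXISTING)) {
            long size = in.size();
            long position = 0;
            while (position < size) {
                // May transfer fewer bytes than requested, so track progress.
                long n = in.transferTo(position, size - position, out);
                if (n == 0) break;  // nothing more could be transferred
                position += n;
            }
            return position;
        }
    }

    public static void main(String[] args) throws IOException {
        Path source = Files.createTempFile("source", ".bin");
        Path target = Files.createTempFile("target", ".bin");
        Files.write(source, new byte[4096]);
        System.out.println(copy(source, target) + " bytes copied");
    }
}
```

Going the other way, transferFrom() on the destination FileChannel pulls data from any ReadableByteChannel, such as a socket channel.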
