ONJava.com -- The Independent Source for Enterprise Java

Introduction to Amazon S3 with Java and REST

by Eric Heuveneers
11/08/2007

Introduction

Amazon Simple Storage Service (S3) is a service from Amazon that allows you to store files in reliable remote storage for a very competitive price; it is becoming very popular. S3 is used by companies to store photos and videos of their customers, back up their own data, and more. S3 provides both SOAP and REST APIs; this article focuses on using the S3 REST API with the Java programming language.

S3 Basics

S3 handles objects and buckets. An object corresponds to a stored file. Each object has an identifier, an owner, and permissions. Objects are stored in a bucket. A bucket has a unique name that must comply with Internet domain naming rules. Once you have an AWS (Amazon Web Services) account, you can create up to 100 buckets associated with that account. An object is addressed by a URL, such as http://s3.amazonaws.com/bucketname/objectid. The object identifier is a filename or a filename with a relative path (e.g., myalbum/august/photo21.jpg). With this naming scheme, S3 storage can appear as a regular file system with folders and subfolders. Notice that the bucket name can also be the hostname in the URL, so your object could also be addressed by http://bucketname.s3.amazonaws.com/objectid.
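The two addressing styles can be sketched as two small string builders. The class and method names are hypothetical, and the sketch assumes the object identifier contains only URL-safe characters (identifiers with spaces or other special characters would need percent-encoding). The virtual-hosted form only works when the bucket name is a valid DNS hostname, which is where the domain naming rules matter.

```java
public class S3Urls
{
  // Hypothetical helper: path-style URL, bucket name in the path.
  public static String pathStyle(String bucket, String objectId)
  {
    return "http://s3.amazonaws.com/" + bucket + "/" + objectId;
  }

  // Hypothetical helper: virtual-hosted-style URL, bucket name in the hostname.
  public static String virtualHostStyle(String bucket, String objectId)
  {
    return "http://" + bucket + ".s3.amazonaws.com/" + objectId;
  }

  public static void main(String[] args)
  {
    String key = "myalbum/august/photo21.jpg";
    // prints http://s3.amazonaws.com/bucketname/myalbum/august/photo21.jpg
    System.out.println(pathStyle("bucketname", key));
    // prints http://bucketname.s3.amazonaws.com/myalbum/august/photo21.jpg
    System.out.println(virtualHostStyle("bucketname", key));
  }
}
```

Both URLs name the same object; only the location of the bucket name changes.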

S3 REST Security

Access to S3 REST resources is secured. This is important not just for your own purposes, but also because customers are billed depending on how their S3 buckets and objects are used. An AWSSecretKey is assigned to each AWS customer, and this key is identified by an AWSAccessKeyID. The key must be kept secret and is used to digitally sign REST requests. S3 security features are:

  • Authentication: Requests include AWSAccessKeyID
  • Authorization: An Access Control List (ACL) can be applied to each resource
  • Integrity: Requests are digitally signed with AWSSecretKey
  • Confidentiality: S3 is available through both HTTP and HTTPS
  • Non-repudiation: Requests are timestamped (combined with integrity, this provides proof of the transaction)

The signing algorithm is HMAC-SHA1 (keyed-hashing for message authentication, RFC 2104, using the SHA-1 hash function). Signing a String in Java is done as follows:

import java.nio.charset.StandardCharsets;
import java.util.Base64;
import javax.crypto.Mac;
import javax.crypto.spec.SecretKeySpec;
...
// Mac is not thread-safe; use one instance per thread.
private Mac mac = null;
...
// This method converts AWSSecretKey into an initialized HMAC-SHA1 instance.
public void setKey(String AWSSecretKey) throws Exception
{
  mac = Mac.getInstance("HmacSHA1");
  byte[] keyBytes = AWSSecretKey.getBytes(StandardCharsets.UTF_8);
  SecretKeySpec signingKey = new SecretKeySpec(keyBytes, "HmacSHA1");
  mac.init(signingKey);
}

// This method creates the S3 signature for a given String.
public String sign(String data) throws Exception
{
  // The raw HMAC bytes must be Base64 encoded (java.util.Base64 needs Java 8+).
  byte[] signBytes = mac.doFinal(data.getBytes(StandardCharsets.UTF_8));
  return Base64.getEncoder().encodeToString(signBytes);
}
...
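Before sending any request, a signing routine like the one above can be checked against a published HMAC-SHA1 test vector. The self-contained sketch below (class and method names are hypothetical) uses test case 2 from RFC 2202, where the key "Jefe" and the message "what do ya want for nothing?" must produce the digest effcdf6ae5eb2fa2d27416d5f184df9c259a7c79. It assumes Java 8 or later for java.util.Base64.

```java
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.util.Base64;
import javax.crypto.Mac;
import javax.crypto.spec.SecretKeySpec;

public class HmacCheck
{
  // Raw HMAC-SHA1 of data under key: the bytes that are Base64 encoded for S3.
  public static byte[] hmacSha1(String key, String data)
  {
    try
    {
      Mac mac = Mac.getInstance("HmacSHA1");
      mac.init(new SecretKeySpec(key.getBytes(StandardCharsets.UTF_8), "HmacSHA1"));
      return mac.doFinal(data.getBytes(StandardCharsets.UTF_8));
    }
    catch (GeneralSecurityException e)
    {
      // Every Java SE platform is required to provide HmacSHA1.
      throw new IllegalStateException(e);
    }
  }

  // Lowercase hex rendering, for comparison with published test vectors.
  public static String toHex(byte[] bytes)
  {
    StringBuilder sb = new StringBuilder();
    for (byte b : bytes) sb.append(String.format("%02x", b & 0xff));
    return sb.toString();
  }

  public static void main(String[] args)
  {
    byte[] digest = hmacSha1("Jefe", "what do ya want for nothing?");
    // prints effcdf6ae5eb2fa2d27416d5f184df9c259a7c79 (RFC 2202, test case 2)
    System.out.println(toHex(digest));
    // The same bytes in the Base64 form used in the Authorization header.
    System.out.println(Base64.getEncoder().encodeToString(digest));
  }
}
```

If the hex output matches the RFC value, any remaining signature mismatch from S3 usually comes from the string being signed rather than from the HMAC itself.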

The AWSAccessKeyID and the signature have to be passed in the Authorization HTTP header like this:

Authorization: AWS <AWSAccessKeyID>:<Signature>

The signed string must include the following information, separated by newlines, in this order:

  • HTTP method name (PUT, GET, DELETE, etc.)
  • Content-MD5, if any
  • Content-Type, if any (e.g., text/plain)
  • GMT timestamp of the request (the value of the Date header), formatted as EEE, dd MMM yyyy HH:mm:ss followed by GMT
  • Metadata headers, if any (e.g., "x-amz-acl" for ACL)
  • URI path such as /mybucket/myobjectid
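Putting those components together, the string to sign can be assembled one component per line, with the URI path last. The sketch below follows the S3 REST signing scheme of this era (later documented as Signature Version 2); the class and method names are hypothetical, and it simplifies x-amz-* header handling by assuming each such header appears only once (the full rules also fold repeated headers and whitespace).

```java
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;

public class StringToSign
{
  // Hypothetical helper: builds the string that is HMAC-SHA1 signed.
  // Headers not starting with x-amz- are ignored; each x-amz-* header is
  // assumed to appear once.
  public static String build(String method, String contentMD5, String contentType,
                             String date, Map<String, String> amzHeaders, String resource)
  {
    StringBuilder buf = new StringBuilder();
    buf.append(method).append("\n");
    // An absent Content-MD5 or Content-Type still contributes an empty line.
    buf.append(contentMD5 == null ? "" : contentMD5).append("\n");
    buf.append(contentType == null ? "" : contentType).append("\n");
    buf.append(date).append("\n");
    // x-amz-* headers: lowercase names, sorted, one "name:value" line each.
    Map<String, String> sorted = new TreeMap<>();
    for (Map.Entry<String, String> e : amzHeaders.entrySet())
    {
      String name = e.getKey().toLowerCase(Locale.US);
      if (name.startsWith("x-amz-")) sorted.put(name, e.getValue().trim());
    }
    for (Map.Entry<String, String> e : sorted.entrySet())
      buf.append(e.getKey()).append(':').append(e.getValue()).append("\n");
    // The URI path comes last, with no trailing newline.
    buf.append(resource);
    return buf.toString();
  }

  public static void main(String[] args)
  {
    Map<String, String> amz = new TreeMap<>();
    amz.put("X-Amz-Acl", "public-read");
    System.out.print(build("PUT", "", "", "Sun, 05 Aug 2007 15:33:59 GMT", amz, "/onjava"));
  }
}
```

With no x-amz-* headers, the result has exactly the shape of the string built in the bucket-creation code of the next section.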

Here is a sample of a successful S3 REST request and response that create the "onjava" bucket:

Request:
PUT /onjava HTTP/1.1
Content-Length: 0
User-Agent: jClientUpload
Host: s3.amazonaws.com
Date: Sun, 05 Aug 2007 15:33:59 GMT
Authorization: AWS 15B4D3461F177624206A:YFhSWKDg3qDnGbV7JCnkfdz/IHY=

Response:
HTTP/1.1 200 OK
x-amz-id-2: tILPE8NBqoQ2Xn9BaddGf/YlLCSiwrKP+OQOpbi5zazMQ3pC56KQgGk
x-amz-request-id: 676918167DFF7F8C
Date: Sun, 05 Aug 2007 15:30:28 GMT
Location: /onjava
Content-Length: 0
Server: AmazonS3

Notice the timestamps: the request Date is later than the response Date. This is because the request Date comes from the client's clock, while the response Date comes from the Amazon S3 server's clock, and here the two clocks differ by about three and a half minutes. If the difference between the request timestamp and the S3 server's clock is too large (more than 15 minutes, according to the S3 documentation), a RequestTimeTooSkewed error is returned. This is another important feature of S3 security: it isn't possible to roll your clock too far forward or back and make things appear to happen when they didn't.
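The skew in the sample exchange can be computed by parsing both Date headers. This is a minimal sketch: the class and method names are hypothetical, the 15-minute limit is taken from the S3 documentation, and it treats the response Date as an approximation of the S3 server's time when the request arrived.

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.Locale;
import java.util.TimeZone;

public class ClockSkew
{
  // Assumed allowed skew: 15 minutes, per the S3 documentation.
  static final long MAX_SKEW_MILLIS = 15L * 60L * 1000L;

  // Parses an HTTP Date header such as "Sun, 05 Aug 2007 15:33:59 GMT".
  static Date parseHttpDate(String value) throws ParseException
  {
    SimpleDateFormat df = new SimpleDateFormat("EEE, dd MMM yyyy HH:mm:ss 'GMT'", Locale.US);
    df.setTimeZone(TimeZone.getTimeZone("GMT"));
    return df.parse(value);
  }

  // Hypothetical helper: client (request) time minus server (response) time, in seconds.
  public static long skewSeconds(String requestDate, String responseDate)
  {
    try
    {
      long diff = parseHttpDate(requestDate).getTime() - parseHttpDate(responseDate).getTime();
      return diff / 1000L;
    }
    catch (ParseException e)
    {
      throw new IllegalArgumentException("Not an HTTP date", e);
    }
  }

  // True when the skew would be expected to trigger RequestTimeTooSkewed.
  public static boolean tooSkewed(long skewSeconds)
  {
    return Math.abs(skewSeconds) * 1000L > MAX_SKEW_MILLIS;
  }

  public static void main(String[] args)
  {
    long skew = skewSeconds("Sun, 05 Aug 2007 15:33:59 GMT", "Sun, 05 Aug 2007 15:30:28 GMT");
    System.out.println(skew + " s, too skewed: " + tooSkewed(skew)); // prints 211 s, too skewed: false
  }
}
```

A client could use the same comparison on any S3 response to warn the user that the local clock needs adjusting.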

Note: Thanks to ACLs, an AWS user can grant anonymous read access to objects. Signing is then not required, and objects can be addressed (especially for download) with a browser. This means S3 can also be used as a hosting service to serve HTML pages, images, videos, and applets; S3 even allows granting time-limited access to objects.
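Time-limited access works by putting the signature in the URL instead of a header. The sketch below is based on S3 query string authentication as documented for Signature Version 2: the string to sign uses an Expires value (seconds since the epoch) in place of the Date, and the URL carries AWSAccessKeyId, Expires, and a URL-encoded Signature. The class and method names are hypothetical, and the resource is assumed to contain only URL-safe characters.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.util.Base64;
import javax.crypto.Mac;
import javax.crypto.spec.SecretKeySpec;

public class PresignedUrl
{
  // Hypothetical helper: a GET URL for resource that is valid until expiresEpochSeconds.
  public static String presignGet(String accessKeyId, String secretKey,
                                  String resource, long expiresEpochSeconds)
  {
    // Same layout as a signed GET request, with Expires in place of the Date.
    String stringToSign = "GET\n\n\n" + expiresEpochSeconds + "\n" + resource;
    try
    {
      Mac mac = Mac.getInstance("HmacSHA1");
      mac.init(new SecretKeySpec(secretKey.getBytes(StandardCharsets.UTF_8), "HmacSHA1"));
      byte[] raw = mac.doFinal(stringToSign.getBytes(StandardCharsets.UTF_8));
      String signature = Base64.getEncoder().encodeToString(raw);
      // Base64 output may contain '+', '/' and '=', so it must be URL-encoded.
      return "http://s3.amazonaws.com" + resource
          + "?AWSAccessKeyId=" + accessKeyId
          + "&Expires=" + expiresEpochSeconds
          + "&Signature=" + URLEncoder.encode(signature, "UTF-8");
    }
    catch (GeneralSecurityException | UnsupportedEncodingException e)
    {
      throw new IllegalStateException(e);
    }
  }

  public static void main(String[] args)
  {
    long expires = System.currentTimeMillis() / 1000L + 3600L; // one hour from now
    System.out.println(presignGet("YOUR_ACCESS_KEY_ID", "YOUR_SECRET_KEY",
                                  "/onjava/photo21.jpg", expires));
  }
}
```

Anyone holding the URL can download the object until the Expires time passes, without knowing the secret key.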

Creating a Bucket

The code below details the Java implementation of the "onjava" S3 bucket creation. It relies on the java.net package for HTTP, java.text for date formatting, and java.util for timestamping. All these packages are included in J2SE; no external library is needed to talk to the S3 REST interface. First, the code generates the String to sign; then it opens the HTTP REST connection with the required headers. Finally, it issues the request to the s3.amazonaws.com web server.

import java.io.InputStream;
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.URL;
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.Locale;
import java.util.TimeZone;
...
// keyId holds the AWSAccessKeyID; sign() is the method shown earlier.
public void createBucket() throws Exception
{
  // S3 timestamp pattern.
  String fmt = "EEE, dd MMM yyyy HH:mm:ss ";
  SimpleDateFormat df = new SimpleDateFormat(fmt, Locale.US);
  df.setTimeZone(TimeZone.getTimeZone("GMT"));

  // Data needed for signature
  String method = "PUT";
  String contentMD5 = "";
  String contentType = "";
  String date = df.format(new Date()) + "GMT";
  String bucket = "/onjava";

  // Generate signature: one component per line, URI path last.
  StringBuilder buf = new StringBuilder();
  buf.append(method).append("\n");
  buf.append(contentMD5).append("\n");
  buf.append(contentType).append("\n");
  buf.append(date).append("\n");
  buf.append(bucket);
  String signature = sign(buf.toString());

  // Connection to s3.amazonaws.com
  URL url = new URL("http", "s3.amazonaws.com", 80, bucket);
  HttpURLConnection httpConn = (HttpURLConnection) url.openConnection();
  httpConn.setDoInput(true);
  httpConn.setDoOutput(true);
  httpConn.setUseCaches(false);
  httpConn.setRequestMethod(method);
  httpConn.setRequestProperty("Date", date);
  // Content-Length is a restricted header that HttpURLConnection may silently
  // drop, so declare the empty body through fixed-length streaming mode.
  httpConn.setFixedLengthStreamingMode(0);
  String AWSAuth = "AWS " + keyId + ":" + signature;
  httpConn.setRequestProperty("Authorization", AWSAuth);

  // Send the HTTP PUT request with an empty body.
  OutputStream out = httpConn.getOutputStream();
  out.close();
  int statusCode = httpConn.getResponseCode();
  if ((statusCode / 100) != 2)
  {
    // Deal with S3 error stream.
    InputStream in = httpConn.getErrorStream();
    String errorStr = getS3ErrorCode(in);
    // ...
  }
  httpConn.disconnect();
}

Dealing with REST Errors

All HTTP 2xx response status codes indicate success, while 3xx, 4xx, and 5xx codes report some kind of error. Details of the error are available in the HTTP response body as an XML document. REST error responses are defined in the S3 Developer Guide. For instance, an attempt to create a bucket that already exists will return:

HTTP/1.1 409 Conflict
x-amz-request-id: 64202856E5A76A9D
x-amz-id-2: cUKZpqUBR/RuwDVq+3vsO9mMNvdvlh+Xt1dEaW5MJZiL
Content-Type: application/xml
Transfer-Encoding: chunked
Date: Sun, 05 Aug 2007 15:57:11 GMT
Server: AmazonS3

<?xml version="1.0" encoding="UTF-8"?>
<Error>
  <Code>BucketAlreadyExists</Code>
  <Message>The named bucket you tried to create already exists</Message>
  <RequestId>64202856E5A76A9D</RequestId>
  <BucketName>awsdownloads</BucketName>
  <HostId>cUKZpqUBR/RuwDVq+3vsO9mMNvdvlh+Xt1dEaW5MJZiL</HostId>
</Error>

The Code element is the most useful value in the XML document; generally, it can be displayed as an error message to the end user. It can be extracted by parsing the XML stream with the SAXParserFactory, SAXParser, and DefaultHandler classes from the javax.xml.parsers and org.xml.sax packages. You instantiate a SAX parser, then implement an S3ErrorHandler that collects the text of the Code element when notified by the SAX parser. Finally, you return the S3 error code as a String:

import java.io.InputStream;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;
...
public String getS3ErrorCode(InputStream doc) throws Exception
{
  SAXParserFactory parserfactory = SAXParserFactory.newInstance();
  parserfactory.setNamespaceAware(false);
  parserfactory.setValidating(false);
  SAXParser xmlparser = parserfactory.newSAXParser();
  S3ErrorHandler handler = new S3ErrorHandler();
  xmlparser.parse(doc, handler);
  return handler.getErrorCode();
}

// This inner class implements a SAX handler that collects the text of <Code>.
class S3ErrorHandler extends DefaultHandler
{
  private StringBuilder code = new StringBuilder();
  private boolean append = false;

  @Override
  public void startElement(String uri, String ln, String qn, Attributes atts)
  {
    if (qn.equalsIgnoreCase("Code")) append = true;
  }

  @Override
  public void endElement(String uri, String ln, String qn)
  {
    if (qn.equalsIgnoreCase("Code")) append = false;
  }

  @Override
  public void characters(char[] ch, int start, int length)
  {
    // May be called several times for one element, so accumulate the text.
    if (append) code.append(ch, start, length);
  }

  public String getErrorCode()
  {
    return code.toString();
  }
}
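For error documents as small as these, a DOM parser is a shorter alternative to the SAX handler above. This swaps in a different technique (javax.xml.parsers.DocumentBuilder instead of SAX); the class and method names are hypothetical, and the sketch returns null when the document has no Code element.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;

public class S3ErrorDom
{
  // Hypothetical alternative to S3ErrorHandler: text of the first <Code>
  // element, or null if there is none.
  public static String errorCode(InputStream doc)
  {
    try
    {
      DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
      factory.setNamespaceAware(false);
      Document xml = factory.newDocumentBuilder().parse(doc);
      NodeList codes = xml.getElementsByTagName("Code");
      return codes.getLength() == 0 ? null : codes.item(0).getTextContent().trim();
    }
    catch (Exception e)
    {
      throw new IllegalStateException("Unreadable S3 error document", e);
    }
  }

  public static void main(String[] args)
  {
    String body = "<?xml version=\"1.0\" encoding=\"UTF-8\"?>"
        + "<Error><Code>BucketAlreadyExists</Code>"
        + "<Message>The named bucket you tried to create already exists</Message>"
        + "</Error>";
    InputStream in = new ByteArrayInputStream(body.getBytes(StandardCharsets.UTF_8));
    System.out.println(errorCode(in)); // prints BucketAlreadyExists
  }
}
```

SAX remains the better choice for large responses such as bucket listings, since DOM loads the whole document into memory.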

A list of all error codes is provided in the S3 Developer Guide. You're now able to create a bucket on Amazon S3 and deal with errors. Full source code is available in the Resources section.

File Uploading

Upload and download operations require more attention: S3 storage is unlimited, but each object is limited to a 5 GB transfer. An optional Content-MD5 check is supported to make sure that the transfer has not been corrupted, although an MD5 computation on a 5 GB file will take some time even on fast hardware.
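The Content-MD5 header carries the Base64 encoding of the binary 128-bit MD5 digest (not its hex form), as defined by RFC 1864. A minimal sketch, assuming Java 8 or later; the class and method names are hypothetical. Because the header has to be sent before the body, a file is typically read twice: once to compute the digest and once to upload it. The same value must also appear in the string to sign.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Base64;

public class ContentMd5
{
  // Hypothetical helper: Base64-encoded MD5 digest of a stream. The stream
  // is read in 8 KB chunks, so even a 5 GB file never has to fit in memory.
  public static String compute(InputStream in)
  {
    try
    {
      MessageDigest md5 = MessageDigest.getInstance("MD5");
      byte[] buffer = new byte[8192];
      int n;
      while ((n = in.read(buffer)) != -1)
        md5.update(buffer, 0, n);
      return Base64.getEncoder().encodeToString(md5.digest());
    }
    catch (NoSuchAlgorithmException e)
    {
      throw new IllegalStateException(e); // every Java SE platform must provide MD5
    }
    catch (IOException e)
    {
      throw new UncheckedIOException(e);
    }
  }

  public static void main(String[] args)
  {
    // MD5 of empty input; prints 1B2M2Y8AsgTpgAmY7PhCfg==
    System.out.println(compute(new ByteArrayInputStream(new byte[0])));
  }
}
```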

S3 stores the uploaded object only if the transfer is successfully completed. If a network issue occurs, the file has to be uploaded again from the start. S3 doesn't support resuming transfers or partial updates of object content. That's one of the limits of the first "S" (Simple) in S3, but the simplicity also makes dealing with the API much easier.

When performing a file transfer with S3, you are responsible for streaming the objects. A good implementation always streams objects; otherwise the whole object is held in Java's heap, and with objects of up to 5 GB you could quickly see an OutOfMemoryError.
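A minimal sketch of a streamed upload using only the JDK, assuming Java 7 or later (for the long overload of setFixedLengthStreamingMode and try-with-resources). The names upload and copy are hypothetical, and the caller is assumed to compute the Date and Authorization values as in createBucket. The key line is the streaming mode: without one, HttpURLConnection buffers the entire request body in memory before sending it.

```java
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.URL;

public class StreamingUpload
{
  // Copies in fixed-size chunks; only one 64 KB buffer is held in memory.
  public static long copy(InputStream in, OutputStream out) throws IOException
  {
    byte[] buffer = new byte[64 * 1024];
    long total = 0;
    int n;
    while ((n = in.read(buffer)) != -1)
    {
      out.write(buffer, 0, n);
      total += n;
    }
    return total;
  }

  // Hypothetical helper: PUTs a file to url and returns the HTTP status code.
  // date and authorization are computed and signed by the caller.
  public static int upload(File file, URL url, String date, String authorization)
      throws IOException
  {
    HttpURLConnection conn = (HttpURLConnection) url.openConnection();
    conn.setRequestMethod("PUT");
    conn.setDoOutput(true);
    conn.setRequestProperty("Date", date);
    conn.setRequestProperty("Authorization", authorization);
    // Declaring the length up front lets the body stream straight to the
    // socket instead of being buffered in the heap.
    conn.setFixedLengthStreamingMode(file.length());
    try (InputStream in = new FileInputStream(file);
         OutputStream out = conn.getOutputStream())
    {
      copy(in, out);
    }
    int status = conn.getResponseCode();
    conn.disconnect();
    return status;
  }
}
```

A fixed length is used rather than chunked mode because S3 expects a Content-Length on uploads; if a Content-Type or Content-MD5 header is added, it must also go into the string to sign.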

An example of a good upload implementation is available in the resources section of this article.

Beyond This Example

Many other operations are available through the S3 APIs:

  • List buckets and objects
  • Delete buckets and objects
  • Upload and download objects
  • Add meta-data to objects
  • Apply permissions
  • Monitor traffic and get statistics (still a beta API)
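Most of these operations follow the same pattern as bucket creation: an HTTP method applied to a resource path, signed the same way. The enum below is a hypothetical summary of that mapping for the straightforward cases; ACL operations address the ?acl sub-resource, which is also part of the signed URI path. Metadata travels as extra headers on an upload, and the statistics API is not covered here.

```java
public enum S3Operation
{
  // HTTP method and resource template; {bucket} and {key} are placeholders.
  LIST_BUCKETS("GET", "/"),
  LIST_OBJECTS("GET", "/{bucket}"),
  CREATE_BUCKET("PUT", "/{bucket}"),
  DELETE_BUCKET("DELETE", "/{bucket}"),
  UPLOAD_OBJECT("PUT", "/{bucket}/{key}"),
  DOWNLOAD_OBJECT("GET", "/{bucket}/{key}"),
  DELETE_OBJECT("DELETE", "/{bucket}/{key}"),
  GET_OBJECT_ACL("GET", "/{bucket}/{key}?acl"),
  SET_OBJECT_ACL("PUT", "/{bucket}/{key}?acl");

  public final String method;
  private final String template;

  S3Operation(String method, String template)
  {
    this.method = method;
    this.template = template;
  }

  // Fills in the placeholders; null values become empty strings.
  public String resource(String bucket, String key)
  {
    return template.replace("{bucket}", bucket == null ? "" : bucket)
                   .replace("{key}", key == null ? "" : key);
  }

  public static void main(String[] args)
  {
    for (S3Operation op : values())
      System.out.println(op.method + " " + op.resource("onjava", "photo21.jpg"));
  }
}
```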

Adding custom metadata to an object is an interesting feature. For example, when uploading a video file, you could add "author," "title," and "location" properties, and retrieve them later with a GET or HEAD request on the object. Getting statistics (IP address, referrer, bytes transferred, time to process, etc.) on buckets could also be useful for monitoring traffic.
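S3 stores custom metadata as request headers whose names start with x-amz-meta-. They are sent with the PUT that uploads the object (and, like other x-amz-* headers, included in the string to sign), and they come back as response headers when the object is retrieved. A minimal sketch with hypothetical helper names; property names are lowercased because HTTP header names are case-insensitive, and fromHeaders accepts the map shape returned by HttpURLConnection.getHeaderFields().

```java
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;

public class S3Metadata
{
  static final String PREFIX = "x-amz-meta-";

  // Hypothetical helper: turns properties such as "author" into
  // x-amz-meta-* request headers to send (and sign) with the upload.
  public static Map<String, String> toHeaders(Map<String, String> properties)
  {
    Map<String, String> headers = new TreeMap<>();
    for (Map.Entry<String, String> e : properties.entrySet())
      headers.put(PREFIX + e.getKey().toLowerCase(Locale.US), e.getValue());
    return headers;
  }

  // Hypothetical helper: recovers the properties from response headers.
  public static Map<String, String> fromHeaders(Map<String, List<String>> headers)
  {
    Map<String, String> properties = new TreeMap<>();
    for (Map.Entry<String, List<String>> e : headers.entrySet())
    {
      // getHeaderFields() reports the HTTP status line under a null key.
      if (e.getKey() == null || e.getValue().isEmpty()) continue;
      String name = e.getKey().toLowerCase(Locale.US);
      if (name.startsWith(PREFIX))
        properties.put(name.substring(PREFIX.length()), e.getValue().get(0));
    }
    return properties;
  }

  public static void main(String[] args)
  {
    Map<String, String> props = new TreeMap<>();
    props.put("author", "Eric");
    props.put("title", "Sunset");
    System.out.println(toHeaders(props)); // prints {x-amz-meta-author=Eric, x-amz-meta-title=Sunset}
  }
}
```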

Conclusion

This article introduced the basics of the Amazon Simple Storage Service REST API. It detailed how to implement bucket creation in Java and how to deal with S3 security principles. It showed that HTTP and XML skills are needed when developing with the S3 REST API. Some S3 operations could be improved (especially uploads), but overall Amazon S3 rocks. To go beyond what was presented in this article, you could check out the Java S3 tools listed in the Resources section.

References and Resources

  • Source code: Source code for this article
  • SOAP: Simple Object Access Protocol
  • REST: REpresentational State Transfer
  • S3 APIs: Amazon S3 Developer Guide
  • HMAC: Keyed-Hashing for Message Authentication (RFC 2104)
  • S3 forum: S3 forum for developers
  • S3 upload applet: A Java applet to upload files and folders to S3
  • Java S3 toolkit: An S3 toolkit for J2SE and J2ME provided by Amazon
  • Jets3t: Another Java toolkit for S3

Eric Heuveneers is a software developer and an IT consultant with more than eight years of experience. His main skills are in Java/JEE and open source solutions.

