The last Upload Part (Copy) patches went upstream in Ceph some days ago. This new feature is available in the master branch now, and it will ship with the first development checkpoint for Kraken.

In S3, this feature copies or moves data by using an existing object in the storage backend as the data source, instead of downloading the object and uploading it again through the request body.

This extension, part of the Multipart Upload API, reduces the required bandwidth between the RGW cluster and the final user when copying/moving existing objects in specific use cases.

In this post I will introduce the feature to show how this concept maps to Ceph and how it works under the hood.

Amazon S3 and the Multipart Upload API

In November 2010 AWS announced the Multipart Upload API to allow faster and more flexible uploads into Amazon S3.

Before the Multipart Upload API, large object uploads often ran into network issues on limited-bandwidth connections. If an error occurred during the upload, you had to restart it from the beginning. In some cases the network issues were not transient, and it was difficult (or maybe impossible) to upload those larger objects over low-end connections.

The other issue is related to the size of the object and the moment that size becomes known. Before the Multipart Upload API it was not possible to upload an object of unknown size. You had to wait until the whole object was in place before starting the upload.

In addition to these inconveniences there was also the inability to upload in parallel. The uploads had to be linear.

The next picture shows how things were working before the Multipart Upload API.

As you can see, the upload process with the usual Upload API runs straight through, but it may raise issues with larger objects in some cases.

The Multipart Upload API resolves these limitations by letting you upload a single object as a set of parts. After all the parts of your object are uploaded, Amazon S3 presents the data as a single object.

With this feature you can create parallel uploads, pause and resume an object upload, and begin uploads before you know the total object size.

A typical multipart upload consists of four steps:

  1. Initiate the Multipart Upload
  2. Separate the object into multiple parts
  3. Upload the parts in any order, one at a time or in parallel
  4. Complete the upload
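The four steps above can be sketched in Python. This is only a minimal sketch under stated assumptions: `client` is assumed to be a boto3-style S3 client (for example one created with `boto3.client("s3", endpoint_url=...)` pointing at an S3-compatible endpoint), and `multipart_upload` and `PART_SIZE` are illustrative names that do not come from this post. Parts are uploaded one at a time here; they could equally be sent in parallel.

```python
# Amazon S3 requires every part except the last to be at least 5 MiB.
PART_SIZE = 5 * 1024 * 1024


def multipart_upload(client, bucket, key, stream, part_size=PART_SIZE):
    """Upload a non-empty binary stream to bucket/key as a multipart upload.

    `client` is assumed to expose the boto3 S3 client methods used below.
    The total size of `stream` does not need to be known in advance.
    """
    # Step 1: initiate the multipart upload and remember its upload id.
    upload_id = client.create_multipart_upload(Bucket=bucket, Key=key)["UploadId"]
    parts = []
    try:
        part_number = 1
        while True:
            # Step 2: cut the next part out of the stream.
            chunk = stream.read(part_size)
            if not chunk:
                break
            # Step 3: upload the part; its data travels in the request body.
            response = client.upload_part(
                Bucket=bucket,
                Key=key,
                UploadId=upload_id,
                PartNumber=part_number,
                Body=chunk,
            )
            parts.append({"PartNumber": part_number, "ETag": response["ETag"]})
            part_number += 1
        # Step 4: complete the upload so the parts become a single object.
        return client.complete_multipart_upload(
            Bucket=bucket,
            Key=key,
            UploadId=upload_id,
            MultipartUpload={"Parts": parts},
        )
    except Exception:
        # Drop the already uploaded parts if anything goes wrong.
        client.abort_multipart_upload(Bucket=bucket, Key=key, UploadId=upload_id)
        raise
```

Because each part is acknowledged on its own, a failed part can be retried without restarting the whole upload.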

The official documentation also describes how the Multipart Upload API should interact with the Object Versioning or Bucket Lifecycle support in S3.

Another great overview of the Multipart Upload API is the official Amazon S3 Multipart Upload introduction by the S3 product managers.

Ceph and the Multipart Upload (Copy part) API

Ceph got the initial Multipart Upload API core support in June 2011, six to seven months after AWS publicly announced it as part of Amazon S3. Although the feature went in around v0.29, it shipped with Argonaut (v0.48), the first major 'stable' release of Ceph, in July 2012. Truly impressive timing.

The Multipart Upload API feature is a sensitive feature. It works in tandem with other parts of the code such as Object Versioning, ACL granting, AWS2/AWS4 auth and so on.

Some of these features were merged after the original Multipart Upload API code was upstreamed, so the Multipart Upload API may be considered one of the RGW S3 features that requires continuous evolution to integrate and grow with the new features going in.

Notice that if you look at the previous Multipart Upload API description and you need to copy/move an existing object, there is no step that lets you do it efficiently.

Before this API, RGW S3 users copied/moved existing objects by downloading them and uploading them again.

Now that this new feature has gone upstream, the efficient way to copy/move an existing object is the 'Upload Part (Copy)' option. The 'Upload Part (Copy)' option works by specifying the source object through the 'x-amz-copy-source' and 'x-amz-copy-source-range' request headers.
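To make the two headers concrete, here is a small Python sketch that builds them for a given source object and splits an object into byte ranges. The helper names `copy_source_headers` and `part_ranges` are hypothetical and exist only for illustration. The sketch relies on the S3 conventions that the copy source is the URL-encoded `bucket/key` path and that the range uses the form `bytes=first-last`, zero-based and inclusive.

```python
from urllib.parse import quote


def copy_source_headers(bucket, key, first_byte=None, last_byte=None):
    """Build the request headers that point an Upload Part (Copy) at a source.

    Without a byte range, the whole source object is used as the part.
    """
    # The source is named as "/bucket/key"; quote() keeps the slashes intact.
    headers = {"x-amz-copy-source": "/" + quote(f"{bucket}/{key}")}
    if first_byte is not None and last_byte is not None:
        if first_byte < 0 or last_byte < first_byte:
            raise ValueError("invalid byte range")
        # Both ends of the range are inclusive and counted from zero.
        headers["x-amz-copy-source-range"] = f"bytes={first_byte}-{last_byte}"
    return headers


def part_ranges(object_size, part_size):
    """Yield inclusive (first_byte, last_byte) pairs covering the object."""
    for first in range(0, object_size, part_size):
        yield first, min(first + part_size, object_size) - 1


if __name__ == "__main__":
    for first, last in part_ranges(10, 4):
        print(copy_source_headers("src-bucket", "dir/big file.bin", first, last))
```

These headers replace the request body of a regular part upload: the part data is read from the source object in the backend rather than sent by the client.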

This improvement allows the user to drive the multipart upload process (steps 1, 2, 3 and 4) while steps 2 and 3 handle data coming from the storage backend instead of the request body.

The next picture shows how the data is read in RGW S3 during the new copy process (steps 2 and 3 are affected).

Using the Multipart Upload (Copy part) API with Ceph

As you can imagine, the new code is a bandwidth saver in specific use cases where the final object needs to be composed, at least partially, of objects that already exist in the system.
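As an illustration of that use case, the following Python sketch assembles a new object from byte ranges of objects that already exist. It is not the script referenced in this post. It assumes `client` is a boto3-style S3 client (for example `boto3.client("s3", endpoint_url=...)` pointing at an RGW endpoint), and `compose_from_existing` is a hypothetical helper name.

```python
def compose_from_existing(client, dest_bucket, dest_key, sources):
    """Build dest_bucket/dest_key out of ranges of existing objects.

    `sources` is a list of (bucket, key, first_byte, last_byte) tuples with
    inclusive byte offsets; each tuple becomes one part of the new object.
    `client` is assumed to expose the boto3 S3 client methods used below.
    """
    upload_id = client.create_multipart_upload(
        Bucket=dest_bucket, Key=dest_key
    )["UploadId"]
    parts = []
    try:
        for part_number, (bucket, key, first, last) in enumerate(sources, start=1):
            # No request body here: the part data is read from the source
            # object in the storage backend.
            response = client.upload_part_copy(
                Bucket=dest_bucket,
                Key=dest_key,
                UploadId=upload_id,
                PartNumber=part_number,
                CopySource={"Bucket": bucket, "Key": key},
                CopySourceRange=f"bytes={first}-{last}",
            )
            etag = response["CopyPartResult"]["ETag"]
            parts.append({"PartNumber": part_number, "ETag": etag})
        return client.complete_multipart_upload(
            Bucket=dest_bucket,
            Key=dest_key,
            UploadId=upload_id,
            MultipartUpload={"Parts": parts},
        )
    except Exception:
        # Discard the partial upload if any copy fails.
        client.abort_multipart_upload(
            Bucket=dest_bucket, Key=dest_key, UploadId=upload_id
        )
        raise
```

Only the small requests and responses cross the network between the user and RGW; the object data itself stays inside the cluster.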

Feel free to use this Python code to test the new API.

Enjoy!

Acknowledgments

My work in Ceph is sponsored by Outscale and has been made possible by Igalia and the invaluable help of the Ceph development team. Thanks Yehuda and Casey for all your support to go upstream!
