The last Requester Pays Buckets patches went upstream in Ceph some days ago. This new feature is available in the master branch now, and it will be part of the next Ceph Jewel release.

In S3, this feature is used to configure buckets in such a way that the user who request the contents will pay transfer fee.

Along this post I will introduce the feature in order to know how this concept maps to Ceph and how it works under the hood.

Understanding the feature

The Requester Pays Buckets feature originates in the Amazon S3 storage. It is part of the Amazon business model related to the Cloud storage.

In S3, the bucket owners pay for all Amazon S3 storage and data transfer costs associated with their buckets. This approach makes sense to cover the use cases where the bucket owners use the service to host/consume the content and/or they want to share the content with some authenticated users.

On the other hand, a relevant number of use cases use S3 to share a huge amount of content requiring some kind of option to balance the costs among the different content consumers in that bucket. This option is known as 'Requester Pays Buckets'

Bear in mind this feature becomes critical when the content is fairly popular and the bucket owner have many requests. This is the typical use case among global content distributors. In this case, the transfer fees may become a major issue.

Mapping the feature to Ceph

As mentioned, this feature comes from the S3 storage where it is used to balance costs related to data transfer in buckets. When enabled the requester pays for the data transfer and the request although the bucket owner still pays for the data storage.

The S3 algorithm also charges the bucket owner for the request under the following conditions:

  • The requester doesn't tag the request as 'Requester Pays Bucket'
  • The request authentication fails
  • The request is anonymous
  • The request is a SOAP request

Ceph does not implement any billing and account management service oriented to charge users so the feature can not be ported as it is.

In this point we made the decision to implement the mechanisms behind of this feature but keeping out the billing policies of Ceph. This way you can find the proper support to reproduce the original Amazon S3 billing behaviour although you will be free to wrap this support with different and more flexible billing policies if needed.

To keep the compatibility with the tools in the S3 ecosystem, the S3 interface of this feature is in place. The Boto library was used to test the proper behaviour.

In the backend, the usage logging was extended to accommodate the new feature. Now the usage logging records are not displayed by the bucket owner. They are listed by the bucket user where this user may be the owner or not.

This change required a new way to index the records in the usage logging although it doesn't break the compatibility with the previous Ceph versions. Bear in mind the old records are displayed in the new format.

There are two notable differences between the S3 and the RGW S3 algorithms. The S3 algorithm charges the bucket owner for the request if the requester doesn't tag the request as 'Requester Pays Bucket' or the request authentication fails. In the case of RGW S3 both cases are logged under the requester instead of the owner.

The S3 REST API

The Ceph RGW S3 REST API was extended to support the following use cases:

Ceph RGW S3 REST API implements the same behaviour and semantics as S3. It is needed to support the S3 tooling ecosystem in a transparent way.

Some examples with Python and Boto

You can use the following Python scripts to set, retrieve and download objects using the Requester Pays Buckets feature in Ceph. Those examples require Boto.

The usage log

The usage log is the place where the Requester Pays Bucket information is aggregated. There are three fields related with the feature ('user', 'owner' and 'payer').

The 'user' (or 'requester') is the client credential accessing the bucket content.

The 'owner' is the client credential creating the bucket.

The 'payer' is the client credential paying the data transfer. If this field doesn't exist the 'owner' is the client credential paying the data transfer.

The new virtual error buckets

One virtual error bucket is a new abstraction to log the usage on non existent resources (404 Not Found). All virtual error buckets have the same name ('-').

Having a look in the 'ops' and 'successful_ops' fields under the virtual error buckets, you will see the second one is always zero.

Each user has its own virtual error bucket to collect 404 errors. Ceph will add a virtual error bucket with the first 404 error available. The virtual error buckets live in the usage logging only.

Wrap up

With this new feature in place, Ceph implements the required support to know in detail who is accessing the RGW S3 buckets (owners vs authenticated users)

The feature brings in new ways to understand and track the bucket content in massive use cases where the costs may be assigned to different users.

The new usage logging also contains more detailed information to be used in regular reports to customers ('owner vs payer' categories, ops on virtual error buckets, etc)

Acknowledgments

My work in Ceph is sponsored by Outscale and has been made possible by Igalia and the invaluable help of the Ceph development team. Thank you guys!

Comments

comments powered by Disqus