In this blog post I will talk about AWS4 request authentication, a new feature shipping in Ceph Jewel, and about the current effort by Outscale and Igalia to raise the level of compatibility between the Ceph RGW S3 and Amazon S3 interfaces.
In detail, I will describe the signing process in AWS4, how it works in Ceph RGW, the current coverage and the next steps in the pipeline around this authentication algorithm.
S3 request authentication algorithms
If you are not familiar with request authentication, regions, endpoints, credential scopes, etc. in Amazon S3, you may want to read one of my previous posts on the topic. It offers a simple and quick overview while introducing the concepts and terms I will use in this blog post. A longer and more low-level technical description is available in the AWS documentation too. I will use the latter to drive and compare the implementations at a level that is reasonable for everybody.
Amazon S3 provides storage through web services interfaces (REST, SOAP and BitTorrent). Ceph RGW implements a compatible S3 REST interface in order to be interoperable with the Amazon S3 REST ecosystem (tools, libraries, third-party services and so on).
This S3 REST interface works over the Hypertext Transfer Protocol (HTTP) with the same HTTP verbs (GET, POST, PUT, DELETE, etc) that web browsers use to retrieve web pages and to send data to remote servers.
There are two kinds of Amazon S3 RESTful interactions: authenticated and anonymous. Request authentication is implemented by signing these requests with an authentication algorithm. The public specification of the Amazon signing process describes two authentication algorithms currently in use: AWS2 and AWS4.
AWS2 and AWS4 in Ceph
The new Signature Version 4 (AWS4) is the current AWS signing protocol. It improves significantly on the previous Signature Version 2 (AWS2). Consider these strengths of AWS4 over AWS2:
- To sign a message, the signer uses a signing key that is derived from her secret access key rather than using the secret access key itself
- The signer derives the signing key from the credential scope, which means that she doesn't need to include the key itself in the request
- Each signing task requires the credential scope
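The key derivation described in the list above can be sketched in a few lines of Python. This is a minimal sketch that follows the public AWS4 specification, not the Ceph RGW code. The helper names `hmac_sha256` and `derive_signing_key` are made up for illustration, and the service name is assumed to be `s3`.

```python
import hashlib
import hmac


def hmac_sha256(key: bytes, msg: str) -> bytes:
    """Return the raw HMAC-SHA256 digest of msg under key."""
    return hmac.new(key, msg.encode("utf-8"), hashlib.sha256).digest()


def derive_signing_key(secret_key: str, date: str, region: str,
                       service: str = "s3") -> bytes:
    """Derive the AWS4 signing key from the secret key and the credential scope.

    date is the scope date in YYYYMMDD form. Each element of the credential
    scope (date/region/service/aws4_request) is chained through HMAC-SHA256,
    so the secret access key itself is never used to sign a request directly.
    """
    k_date = hmac_sha256(("AWS4" + secret_key).encode("utf-8"), date)
    k_region = hmac_sha256(k_date, region)
    k_service = hmac_sha256(k_region, service)
    return hmac_sha256(k_service, "aws4_request")
```

Because the date and the region are part of the derivation, a signing key is only valid for one day and one region, which limits the damage if a derived key leaks.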
The benefits of using AWS4 in Ceph are clear:
- Verification of the identity of the requester via access key ID and secret access key
- Prevention of request tampering while the request is in transit
- Protection against replay attacks: a request is only accepted within 15 minutes of the timestamp it carries
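The 15-minute replay window mentioned in the list above could be checked along these lines. This is only an illustrative sketch, not the check Ceph RGW performs; it assumes the timestamp arrives in the `YYYYMMDDTHHMMSSZ` format used by the `x-amz-date` header, and `is_within_window` is a hypothetical helper name.

```python
from datetime import datetime, timedelta, timezone

# Maximum allowed distance between the request timestamp and the server clock.
MAX_SKEW = timedelta(minutes=15)


def is_within_window(amz_date: str, now: datetime) -> bool:
    """Return True if the request timestamp is within 15 minutes of now.

    amz_date is expected in the ISO 8601 basic format, e.g. 20130524T000000Z.
    now must be a timezone-aware datetime (UTC on the server side).
    """
    request_time = datetime.strptime(amz_date, "%Y%m%dT%H%M%SZ")
    request_time = request_time.replace(tzinfo=timezone.utc)
    return abs(now - request_time) <= MAX_SKEW
```

Passing `now` explicitly instead of reading the clock inside the function keeps the check easy to test.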
The signing process can express authentication information by using one of the following methods:
- HTTP Authorization header. The most common method of authenticating. The signature calculations vary depending on the method you choose to transfer the request payload; 'transfer payload in a single chunk' vs 'transfer payload in multiple chunks (chunked upload)'
- Query string parameters. This method expresses a request entirely in a URL, together with the authorization information. This type of URL is also known as a presigned URL.
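To make the query string method more concrete, the following sketch builds the authorization parameters that a presigned URL carries before the signature is computed. The parameter names come from the public AWS4 specification; the function name `presign_query` and the example values are hypothetical, and the `X-Amz-Signature` parameter would be appended once the signature has been calculated.

```python
from urllib.parse import quote


def presign_query(access_key: str, amz_date: str, region: str,
                  expires: int = 3600, signed_headers: str = "host") -> str:
    """Build the canonical query string of a presigned URL (without signature).

    amz_date uses the YYYYMMDDTHHMMSSZ format; its first eight characters
    are the date part of the credential scope.
    """
    scope = f"{amz_date[:8]}/{region}/s3/aws4_request"
    params = {
        "X-Amz-Algorithm": "AWS4-HMAC-SHA256",
        "X-Amz-Credential": f"{access_key}/{scope}",
        "X-Amz-Date": amz_date,
        "X-Amz-Expires": str(expires),
        "X-Amz-SignedHeaders": signed_headers,
    }
    # Parameters are sorted by name and percent-encoded, "/" included.
    return "&".join(
        f"{quote(name, safe='-_.~')}={quote(value, safe='-_.~')}"
        for name, value in sorted(params.items())
    )
```

Note that the slashes in the credential scope end up encoded as `%2F`, which is easy to overlook when building these URLs by hand.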
The current Ceph AWS4 implementation supports all authentication methods except transferring the payload in multiple chunks (chunked upload). It is in the pipeline though.
Lacking chunked upload does not impact the Ceph RGW performance. The server side always uses a streaming-hash approach to compute the signature.
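The general idea behind a streaming hash can be illustrated as follows. This is a generic sketch of the technique and not the Ceph RGW code; `streaming_sha256` is a hypothetical helper name.

```python
import hashlib
from typing import Iterable


def streaming_sha256(chunks: Iterable[bytes]) -> str:
    """Hash a payload incrementally as its pieces arrive.

    The digest is updated chunk by chunk, so the whole payload never has
    to be held in memory to obtain the payload hash.
    """
    digest = hashlib.sha256()
    for chunk in chunks:
        digest.update(chunk)
    return digest.hexdigest()
```

Feeding the data in pieces yields the same digest as hashing the complete payload in one call, which is why the payload hash can be computed while the body is still being received.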
Computing a Signature
The idea behind computing a signature is to apply a cryptographic hash function to the request, and then to use the hash value, some other values from the request, and a secret access key to create a signed hash. That signed hash is the signature.
The algorithm requires different inputs depending on the authentication method and the concrete request. To illustrate one of the authentication paths, we can walk through the steps required to craft a signature in the HTTP Authorization header case.
The process computes a canonical request, a string to sign and a signing key.
The final signature is the hex-encoded HMAC-SHA256 of the string to sign, keyed with the signing key. HMAC-SHA256 is also the keyed-hash message authentication code used to derive the signing key.
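The steps for the HTTP Authorization header case can be sketched as follows. This is a simplified illustration based on the public AWS4 specification, not the Ceph RGW implementation. It assumes that the path and the query string are already URI-encoded and sorted, that every header passed in is signed, and that the signing key has already been derived from the credential scope. All function names are hypothetical.

```python
import hashlib
import hmac


def sha256_hex(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


def canonical_request(method: str, path: str, query: str,
                      headers: dict, payload: bytes) -> str:
    """Step 1: normalise the request into its canonical text form."""
    # Header names are lowercased; values are trimmed and inner spaces collapsed.
    lowered = {k.lower(): " ".join(v.split()) for k, v in headers.items()}
    names = sorted(lowered)
    canonical_headers = "".join(f"{name}:{lowered[name]}\n" for name in names)
    signed_headers = ";".join(names)
    return "\n".join([method, path, query, canonical_headers,
                      signed_headers, sha256_hex(payload)])


def string_to_sign(amz_date: str, scope: str, canonical: str) -> str:
    """Step 2: combine algorithm, timestamp, scope and canonical request hash."""
    return "\n".join(["AWS4-HMAC-SHA256", amz_date, scope,
                      sha256_hex(canonical.encode("utf-8"))])


def sign_request(method: str, path: str, query: str, headers: dict,
                 payload: bytes, signing_key: bytes,
                 amz_date: str, scope: str) -> str:
    """Step 3: HMAC-SHA256 the string to sign with the derived signing key."""
    canonical = canonical_request(method, path, query, headers, payload)
    to_sign = string_to_sign(amz_date, scope, canonical)
    return hmac.new(signing_key, to_sign.encode("utf-8"),
                    hashlib.sha256).hexdigest()
```

The resulting hex string is what ends up in the `Signature=` field of the Authorization header, next to the credential and the list of signed headers.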
Default configuration in Ceph Jewel
Ceph Jewel is planned to ship with AWS2 and AWS4 enabled by default. You will not need to configure any extra switch to authenticate with AWS2 or AWS4.
In Amazon S3 the region enforces the allowed authentication algorithms.
In the case of Ceph RGW the code doesn't implement any kind of constraint related to the region names.
The next steps in the pipeline
The chunked upload feature, which transfers the payload in multiple chunks, is definitely in the pipeline.
Some kind of integration with zones/regions to provide 'signature binding' could make sense too. It would help to enforce auth policies and so on.
- AWS4 chunked upload goes upstream in Ceph RGW S3
- Ceph, a free unified distributed storage system
- On S3, endpoints, regions, signatures and Boto 3