Boto3, the AWS SDK for Python, is the reference implementation for consuming Amazon cloud services.

The SDK was designed around the AWS lifecycle, so any solution built on it expects valid Amazon endpoints or regions to work.

In this post I will look into the current Boto3 implementation to see how endpoints and regions are supported in S3, and how Boto3 could be used with compatible S3 REST interfaces if needed.

I will use the two available request signature processes, v2 and v4, to confirm everything works as expected.

I will also cover setting up new, compatible regions in Boto3 to consume a compatible S3 API, and how the current region constraints can be enabled or disabled.

Let's start with AWS...

Amazon Web Services (AWS) is a collection of cloud computing services, also called web services, that make up the cloud computing platform offered by Amazon.

Among those services the Amazon Elastic Compute Cloud (EC2) and the Amazon Simple Storage Service (S3) are the most central and well-known services.

These services operate from 12 geographical regions across the world, with 32 availability zones (AZs) within those regions.

Each region has multiple AZs, which are distinct data centers providing AWS services.

AZs are isolated from each other to prevent outages from spreading between zones.

Several services operate across AZs (e.g., S3) while others can be configured to replicate across zones to spread demand and avoid downtime from failures.

On S3 and APIs...

The Amazon Simple Storage Service (S3) is an online file storage web service offered by AWS.

S3 provides storage through web services interfaces (REST, SOAP, and BitTorrent). We will be interested in the REST interface though.

S3's design aims to provide scalability, high availability, and low latency at commodity costs. It follows an object storage architecture.

Objects are organized into buckets (each owned by an AWS account) and identified within each bucket by a unique, user-assigned key. As expected, buckets and objects can be created, listed, and retrieved using these S3 APIs.

The broad adoption of Amazon S3 and related tooling has given rise to competing services based on the S3 API. These services use the standard programming interface; however, they are differentiated by their underlying technologies and supporting business models.

This broad adoption is happening on both the server side and the client side, with different levels of API coverage and maturity.

On AWS endpoints, regions and request signatures...

AWS offers regional endpoints for making requests. An endpoint is a URL that serves as the entry point for a web service.

Some services don't support regions, but that is not the case with S3. S3 supports regions, and its endpoints can include a region explicitly. It also lets you specify an endpoint that does not include a specific region (https://s3.amazonaws.com); in that case, AWS routes requests to the 'us-east-1' region.

Beyond handling endpoints and regions to connect to S3 properly, we need to take the request signature process into consideration.

This process is required to verify the identity of the requester, protect the data in transit and protect against potential replay attacks.

AWS supports two signature versions: signature version 2 and signature version 4. Most services support version 4 and, where a service supports it, version 4 is strongly recommended over version 2.

Under the hood, to sign a request you calculate a hash (digest) of the request, and then use that hash value, some other values from the request, and a secret access key to create a signed hash. That is the signature.

An example signature follows:

AWS4-HMAC-SHA256 Credential=AKIDEXAMPLE/20150830/us-east-1/s3/aws4_request, SignedHeaders=host;x-amz-date, Signature=b97d918cfa904a5beff61c982a1b6f458b799221646efd99d3219ec94cdf2500
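As a side note, the signing-key derivation behind that 'AWS4-HMAC-SHA256' header can be sketched with the Python standard library alone (the secret key below is a made-up placeholder):

```python
import hashlib
import hmac

def _hmac_sha256(key: bytes, msg: str) -> bytes:
    return hmac.new(key, msg.encode('utf-8'), hashlib.sha256).digest()

def signing_key(secret_key: str, date: str, region: str, service: str) -> bytes:
    # The key is derived by chaining HMAC-SHA256 over the credential
    # scope fields: date, region, service and the 'aws4_request' string.
    k_date = _hmac_sha256(('AWS4' + secret_key).encode('utf-8'), date)
    k_region = _hmac_sha256(k_date, region)
    k_service = _hmac_sha256(k_region, service)
    return _hmac_sha256(k_service, 'aws4_request')

# Derive a key for the credential scope used in the example above.
key = signing_key('FAKE/SECRET/KEY', '20150830', 'us-east-1', 's3')
print(key.hex())
```

The final signature is then the HMAC-SHA256 of the 'string to sign' under this key, hex-encoded.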

The issue with the signatures...

If you look at the example signature, you will see a field called 'Credential'. This field is a required part of signature version 4 and contains the 'credential scope'. The credential scope is a string that includes the date, the region, the service, and a termination string ('aws4_request'), in lowercase characters.

In the example case, the credential scope is:

20150830/us-east-1/s3/aws4_request

Including the region in the credential scope does not seem like a bad design decision. The issue is the decision to enforce the request signature policy in the SDK, instead of leaving it to the region's server only.

The SDK's request signature policy uses the region to determine that region's expected signature support (v2/v4 or v4 only), and updates the signature version if needed.

In other words, if you want to use the Amazon SDK with your own compatible S3 API, you will need to align your infrastructure and its lifecycle with Amazon's. That means using the same AWS region names if you want v2 support.

If you use a region with v4-only support, or an arbitrary region name, the signature version will default to v4.

Testing the v2/v4 support with Boto3...

Let's start by looking at a script that connects to AWS S3 via Boto3...

This script connects to Ireland (the 'eu-west-1' region). This region supports http/https and signatures v2/v4.

The script lists all the buckets in the account. The connection goes under v2, as expected.

If you want to connect under v4, you provide an explicit 'config' object with the proper version...

Okay, it works as expected.

With the previous code working, we will run 5 tests to check the v2/v4 support in Boto3...

The test #1 connects to Frankfurt ('eu-central-1' region). This region supports v4 only. Its endpoints are 's3.eu-central-1.amazonaws.com' and 's3-eu-central-1.amazonaws.com'.

The test connection parameters are:

  • region_name='eu-central-1'
  • endpoint_url='http://s3.amazonaws.com'
  • config=boto3.session.Config(signature_version='s3v4')

The output shows the error:

An error occurred (AuthorizationHeaderMalformed) when calling the ListBuckets operation: The authorization header is malformed; the region 'eu-central-1' is wrong; expecting 'us-east-1'

The error comes from the server, so it shows the expected behaviour.

The test #2 connects to Frankfurt ('eu-central-1' region) but tries v2 instead of v4.

The test connection parameters are:

  • region_name='eu-central-1'
  • endpoint_url='http://s3.eu-central-1.amazonaws.com'

It lists the buckets... but it uses v4 instead of v2. In this case, the SDK enforces the signature version before making the request.

The test #3 connects to Frankfurt ('eu-central-1' region) but provokes a deliberate conflict between regions.

The test connection parameters are:

  • region_name='eu-west-1'
  • endpoint_url='http://s3.eu-central-1.amazonaws.com'
  • config=boto3.session.Config(signature_version='s3v4')

The output shows the error:

An error occurred (AuthorizationHeaderMalformed) when calling the ListBuckets operation: The authorization header is malformed; the region 'eu-west-1' is wrong; expecting 'eu-central-1'

The test #4 connects to a compatible S3 API (Ceph RGW). The test uses an arbitrary region and endpoint. It tries under v2.

The test connection parameters are:

  • region_name='free-region'
  • endpoint_url='http://x4.dragon.arbitrary.zzz'

It works... but the SDK uses v4 instead of v2.

The last test, the test #5, connects with an arbitrary endpoint and a 'valid' AWS region with v2 support. It tries under v2.

The test connection parameters are:

  • region_name='eu-west-1'
  • endpoint_url='http://x4.dragon.arbitrary.zzz'

It works... and the SDK uses v2 as expected.

The results...

After running the previous tests, it looks like the major issue with Boto3 is the way it enforces the request signature depending on the region being connected to.

Beyond this behaviour, the tests show the expected results.

It is also possible to connect with arbitrary endpoints and regions. To use signature version 2, it is required to use some well-known Amazon region, or Boto3 will switch the request signature to version 4 before making the request.

This behaviour is also consistent with the AWS CLI command line utilities, as expected.

The next command connects under v2...

aws s3 ls --endpoint-url http://x4.dragon.arbitrary.zzz

While the next command will force v4...

aws s3 ls --region free-region --endpoint-url http://x4.dragon.arbitrary.zzz

Setting up your own regions and constraints...

If you want to set up your own regions and constraints, you can modify the file '_endpoints.json' accordingly. In my configuration it lives in '.../botocore/data/_endpoints.json'.

This file maps constraints to services. In the case of the S3 service, you need to take the following lines into consideration:
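The exact contents vary between botocore releases, but the S3 entries look roughly like this simplified, illustrative sketch (region lists abbreviated):

```json
"s3": [
    {
        "uri": "{scheme}://s3.amazonaws.com",
        "constraints": [["region", "oneOf", ["us-east-1", null]]]
    },
    {
        "uri": "{scheme}://{service}.{region}.amazonaws.com.cn",
        "constraints": [["region", "startsWith", "cn-"]],
        "properties": {"signatureVersion": "s3v4"}
    },
    {
        "uri": "{scheme}://{service}-{region}.amazonaws.com",
        "constraints": [["region", "oneOf", ["us-west-1", "us-west-2",
            "eu-west-1", "ap-southeast-1", "ap-southeast-2",
            "ap-northeast-1", "sa-east-1"]]]
    },
    {
        "uri": "{scheme}://{service}.{region}.amazonaws.com",
        "properties": {"signatureVersion": "s3v4"}
    }
]
```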

Each constraint consists of an attribute in the first index ('scheme', 'service', 'region'), an assertion in the second index ('startsWith', 'equals', 'oneOf') and a value in the third index (e.g. 'us-east-1')

Let's have a look at the second entry. That entry's constraint matches any region starting with 'cn-'. If the constraint succeeds, the signature version becomes v4. The code implements a first-match policy.

With the current mapping, if you want to add a new region supporting v2 and v4, you can quickly add it to the third entry.

By the way, you may want to disable all these constraints. In that case you could use a first entry such as:
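One illustrative possibility (assuming the resolver also supports a 'notEquals' assertion) is an entry with a catch-all constraint and no 'properties' section:

```json
{
    "uri": "{scheme}://{service}.{region}.amazonaws.com",
    "constraints": [["region", "notEquals", null]]
}
```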

This entry will match all regions and it will modify nothing.
