Download an object from S3 in parallel using a presigned URL
How to download an object from S3 in parallel using a presigned URL.
Background
S3 supports downloading a large object in parts, i.e. subsets of the file, by specifying a byte range. The AWS CLI and SDKs provide built-in support, e.g. TransferManager in Java and s3manager in Go. However, using the AWS SDK or CLI requires AWS credentials. For sharing objects in a private bucket, it is common practice to use a presigned URL to grant temporary access for downloading or uploading files.
The question is: can we use range downloads with a presigned URL? The answer is yes. You don't need the AWS SDK to set the byte range; you only need it to create the presigned URL. A single presigned URL can serve multiple parallel requests with different byte ranges. The following steps show how to download from a presigned URL in parallel using curl. The same approach applies to programmatic downloads in Go, Python, etc.
Steps
First, generate a presigned URL for an existing object in an existing bucket. The command appends ?X-Amz-Algorithm .... to the object's S3 URL. The --expires-in value is in seconds; 604800 is seven days.
aws s3 presign s3://find-a-unique-bucket-name-is-hard/my-large-file.mp4 --expires-in 604800
Get the size of the object in bytes so we can define the byte ranges based on the total size.
aws s3api head-object --bucket find-a-unique-bucket-name-is-hard --key my-large-file.mp4 --query 'ContentLength' --output text
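The head-object call above needs AWS credentials, which someone who only received the presigned URL may not have. One possible workaround, sketched here, is to request a single byte with Range: bytes=0-0 and read the total size from the Content-Range response header (for example "bytes 0-0/100"). The function name total_from_content_range and the variable PRESIGNED_URL are illustrative assumptions, not part of any AWS tool.

```shell
# Print the total object size from HTTP response headers read on stdin.
# Expects a header line such as "Content-Range: bytes 0-0/100".
total_from_content_range() {
  # Strip carriage returns, then take the part after "/" from the first match.
  tr -d '\r' | awk '!seen && tolower($1) == "content-range:" {
    n = split($3, a, "/"); print a[n]; seen = 1
  }'
}

# Hypothetical usage (PRESIGNED_URL is assumed to hold the presigned URL):
#   curl -s -D - -o /dev/null -H "Range: bytes=0-0" "$PRESIGNED_URL" \
#     | total_from_content_range
```

This fetches only one byte of the object, so it is cheap even for very large files.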
Then download part of the object by setting the Range HTTP header with curl. The range is inclusive at both ends and byte offsets start at 0, so for a 100-byte object the last byte is at offset 99 and two parallel requests use ranges 0-49 and 50-99. To download in parallel, open two terminals and run the two commands at about the same time.
curl -H "Range: bytes=0-49" "https://s3.us-west-1.amazonaws.com/find-a-unique-bucket-name-is-hard/my-large-file.mp4?X-Amz..." > p1.part
curl -H "Range: bytes=50-99" "https://s3.us-west-1.amazonaws.com/find-a-unique-bucket-name-is-hard/my-large-file.mp4?X-Amz..." > p2.part
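For more than two parts, working out the ranges by hand and opening extra terminals gets tedious. The following is a minimal sketch, not a definitive implementation: byte_ranges splits a size into inclusive ranges, and download_parts starts one background curl per range and waits for all of them. Both function names and the pN.part naming are assumptions made for illustration.

```shell
# Print up to PARTS inclusive byte ranges ("start-end", one per line)
# that together cover offsets 0 .. SIZE-1.
byte_ranges() {
  size=$1
  parts=$2
  chunk=$(( (size + parts - 1) / parts ))   # ceiling division
  start=0
  while [ "$start" -lt "$size" ]; do
    end=$(( start + chunk - 1 ))
    if [ "$end" -ge "$size" ]; then
      end=$(( size - 1 ))                   # clamp the final range
    fi
    echo "$start-$end"
    start=$(( end + 1 ))
  done
}

# Download every range in the background, then wait for all of them.
# Arguments: presigned URL, object size in bytes, number of parts.
# Writes p1.part, p2.part, ... in the current directory.
download_parts() {
  url=$1
  i=1
  for r in $(byte_ranges "$2" "$3"); do
    curl -s -f -H "Range: bytes=$r" -o "p$i.part" "$url" &
    i=$(( i + 1 ))
  done
  wait   # a bare wait does not report individual curl failures
}
```

For a 100-byte object, byte_ranges 100 2 prints 0-49 and 50-99, matching the two curl commands above. With ten or more parts, pass the files to cat in numeric order, because a glob such as p*.part sorts p10.part before p2.part.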
Finally, merge the parts into a new file (cat is pretty fast...).
cat p1.part p2.part > my-large-file.mp4
Now you can play the merged video file to check that it is correct.
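Playing the video is a quick sanity check, but a missing or truncated part may not be obvious. As a rough additional check, one could compare the merged file's size with the ContentLength reported earlier. The helper name check_size is hypothetical.

```shell
# Succeed only if FILE ($1) is exactly EXPECTED ($2) bytes long.
check_size() {
  actual=$(wc -c < "$1" | tr -d ' ')
  [ "$actual" -eq "$2" ]
}

# Hypothetical usage, with 100 standing in for the real ContentLength:
#   check_size my-large-file.mp4 100 && echo "size matches"
```

A matching size does not prove the contents are identical; it only catches missing or overlapping bytes.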
References
Go
Java
- AWS - Parallelizing large downloads for optimal speed: the TransferManager in the AWS SDK for Java.
Python