How long does it take for AWS S3 to save and load an item?
S3 FAQ mentions that "Amazon S3 buckets in all Regions provide read-after-write consistency for PUTS of new objects and eventual consistency for overwrite PUTS and DELETES." However, I don't know how long it takes to get eventual consistency. I tried to search for this but couldn't find an answer in S3 documentation.
Situation:
We have a website that consists of 7 steps. When the user clicks Save at each step, we want to save a JSON document (containing the information of all 7 steps) to Amazon S3. Currently we plan to:
- Create a single S3 bucket to store all json documents.
- When the user saves step 1, we create a new item in S3.
- When the user saves steps 2-7, we overwrite the existing item.
- After the user saves a step and refreshes the page, they should be able to see the information they just saved, i.e. we want to guarantee that we always read after write.
The full JSON document (all 7 steps completed) is around 20 KB. After the user clicks the Save button, we can freeze the page for some time so they cannot make other changes until the save is finished.
Question:
- How long does it take for AWS S3 to save and load an item? (We can freeze our website when document is being saved to S3)
- Is there a function to calculate save/load time based on item size?
- Is the save/load time going to be different if I choose another S3 region? If so, which is the best region for Seattle?
I wanted to add to @error2007s's answer.
How long does it take for AWS S3 to save and load an item? (We can freeze our website when document is being saved to S3)
It's not only that you will not find the exact time anywhere – there's actually no such thing as an exact time. That's just what "eventual consistency" is all about: consistency will be achieved eventually, and you can't know when.
If somebody gave you an upper bound for how long a system would take to achieve consistency, then you wouldn't call it "eventually consistent" anymore. It would be "consistent within X amount of time".
The problem now becomes, "How do I deal with eventual consistency?" (instead of trying to "beat it")
To really find the answer to that question, you need to first understand what kind of consistency you truly need, and how exactly the eventual consistency of S3 could affect your workflow.
Based on your description, I understand that you would write a total of 7 times to S3, once for each step you have. For the first write, as you correctly cited the FAQs, you get strong consistency for any reads after that. For all the subsequent writes (which are really "replacing" the original object), you might observe eventual consistency – that is, if you try to read the overwritten object, you might get the most recent version, or you might get an older version. This is what is referred to as "eventual consistency" on S3 in this scenario.
A few alternatives for you to consider:
- don't write to S3 on every single step; instead, keep the data for each step on the client side, and then write a single object to S3 after the 7th step. This way there's only one write and no overwrites, so no eventual consistency issues. This might or might not be possible for your specific scenario; you need to evaluate that.
- alternatively, write S3 objects with different names for each step. E.g., after step 1, save the results to bruno-preferences-step-1.json; after step 2, save the results to bruno-preferences-step-2.json; and so on. Then save the final preferences file to bruno-preferences.json, or maybe even bruno-preferences-step-7.json, giving yourself the flexibility to add more steps in the future. The idea here is to avoid overwrites, which could cause eventual consistency issues: with this approach you only ever write new objects, you never overwrite them.
- finally, you might want to consider Amazon DynamoDB. It's a NoSQL database, and you can securely connect to it directly from the browser or from your server. It provides replication, automatic scaling, and load distribution (just like S3). You also have the option to tell DynamoDB that you want strongly consistent reads (the default is eventually consistent reads; you have to change a parameter to get strongly consistent reads). DynamoDB is typically used for "small" records, and 20 kB is well within range: the maximum record size is 400 kB as of today. You might want to check this out: DynamoDB FAQs: "What is the consistency model of Amazon DynamoDB?"
Thanks for the alternative solutions. Do we know how long it takes to upload a 20kb file to S3 the first time? If that doesn't take much time, would it make sense to create a new file each time and delete the old one (i.e. no update to existing files)?
– EV3
@EV3 about the upload time, I would definitely recommend that you run a simple benchmark: it will depend strongly on where your upload originates from. If it's a server running inside AWS uploading to S3, it will likely be blazing fast; if it's a browser uploading to S3, it can depend on many other factors (how good/fast/stable is the client's internet connection? what's the latency between the client and S3? etc.). I would definitely investigate that further; based on my experience, it's likely that 20 kB uploads to S3 will give you good enough performance!
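A minimal benchmark harness for that suggestion might look like the sketch below. The fake_upload function is a placeholder: in a real test you would pass a callable that actually does the S3 PUT (e.g. wrapping boto3's put_object), and the ~20 kB payload size matches the question:

```python
import time

def benchmark(upload, payload: bytes, runs: int = 5) -> float:
    """Call `upload(payload)` several times and return the average seconds per call."""
    start = time.perf_counter()
    for _ in range(runs):
        upload(payload)
    return (time.perf_counter() - start) / runs

def fake_upload(payload: bytes) -> None:
    """Stand-in for a real upload (e.g. s3.put_object via boto3); does nothing."""
    pass

# A ~20 kB payload, as in the question:
avg = benchmark(fake_upload, b"x" * 20_000)
print(f"average upload time: {avg:.6f} s")
```

Run the same harness from the place your real uploads will originate (server vs. browser-side proxy) to get numbers that actually reflect your setup.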
Thanks for your advice about DynamoDB. My final design is to create a new S3 document when the user saves a step, save its S3 location to DynamoDB (keeping a mapping of document identifier to S3 location), and remove the old S3 document to free up space.
– EV3
@EV3: that sounds like a very good design! That approach (S3 + "pointers" in DynamoDB) is a common pattern when you need stronger consistency than what S3 alone can give you, and you need to store "larger" blobs (more than a few kB). One final suggestion: since you are going to keep updating your mapping in DynamoDB, you need to make sure there are no data races – for this, you can use Consistent Reads on DynamoDB plus conditional updates. E.g.: (consistently) read the mapping for EV3 and the current "version" number, create the new S3 object, then save the new S3 key back to DynamoDB with version+1, conditioned on the stored version still equalling the old one.
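That read-then-conditionally-write pattern can be sketched in plain Python. The in-memory dict below stands in for the DynamoDB mapping table, and the version check mirrors what a DynamoDB ConditionExpression would enforce atomically on the server side; the user id and keys are illustrative:

```python
# In-memory stand-in for the DynamoDB mapping table: user_id -> {"s3_key", "version"}
table = {"EV3": {"s3_key": "ev3-preferences-step-1.json", "version": 1}}

def conditional_update(user_id: str, new_key: str, expected_version: int) -> bool:
    """Update the pointer only if the stored version still matches (compare-and-set).

    DynamoDB enforces the same check atomically via a ConditionExpression
    (e.g. "version = :expected"); here the check-and-write is illustrative only.
    """
    item = table.get(user_id)
    if item is None or item["version"] != expected_version:
        return False  # somebody else updated the mapping in the meantime
    table[user_id] = {"s3_key": new_key, "version": expected_version + 1}
    return True

# Read the current version, write the new S3 object, then swing the pointer:
current = table["EV3"]["version"]
ok = conditional_update("EV3", "ev3-preferences-step-2.json", current)
print(ok, table["EV3"])  # True {'s3_key': 'ev3-preferences-step-2.json', 'version': 2}
```

If the conditional update fails, re-read the mapping and retry; that is the standard optimistic-concurrency loop.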
Is there a flag in S3 object metadata to figure out if the object has reached a consistent state after the last update?
@JayKumar – no, there's no such flag. In CS terms, there can't be such a flag – the existence of such a flag would imply that the system is strongly consistent, violating the hypothesis of eventual consistency.
Starting from Dec 2020, S3 apparently delivers "Strong Read-After-Write Consistency". I would try running the tests again.
– Larytet
How long does it take for AWS S3 to save and load an item? (We can freeze our website when document is being saved to S3)
You will not find the exact time anywhere. If you ask AWS they will give you approximate timings. Your file is 20 KB, so from my experience with S3 the time will be more or less 60-90 seconds.
Is there a function to calculate save/load time based on item size?
No, there is no function with which you can calculate this.
Is the save/load time going to be different if I choose another S3 region? If so, which is the best region for Seattle?
For Seattle, US West (Oregon) will work with no problem.
You can also take a look at this experiment for comparison https://github.com/andrewgaul/are-we-consistent-yet
Versioning / Visibility Factors
In Amazon S3, the time it takes for a new image with the same name to show up and replace the original depends on several factors, including the consistency model and caching mechanisms in use. Here’s a detailed breakdown:
Factors Affecting Visibility
Consistency Model:
- Read-After-Write Consistency: S3 provides read-after-write consistency for PUTs of new objects in your bucket in all AWS Regions. This means if you upload a new object with a new name, it should be immediately visible.
- Eventual Consistency: For overwrite PUTs and DELETEs (for example, if you overwrite an existing object), S3 offers eventual consistency. This means it might take some time for changes to propagate and become visible to all clients.
Caching:
- Local Cache: Your local browser or application might cache the old image. Clearing the cache or doing a hard refresh (Ctrl+F5) can sometimes resolve visibility issues.
- CDN Cache: If you are using a Content Delivery Network (CDN) like Amazon CloudFront, the CDN might cache the old image. You might need to invalidate the cache to see the updated image.
- S3 Object Cache: While S3 itself doesn’t cache objects, any proxies or caching layers in between might.
Practical Steps to Ensure Immediate Visibility
Invalidate CDN Cache: If you are using Amazon CloudFront, you can create an invalidation request to remove the old image from the cache:
aws cloudfront create-invalidation --distribution-id <distribution_id> --paths "/path/to/your/image.jpg"
Clear Browser Cache: Ask users to clear their browser cache or perform a hard refresh to see the updated image.
Use Versioning: If immediate visibility is critical, consider using versioning or unique query parameters to bypass the cache:
<img src="https://your-bucket.s3.amazonaws.com/image.jpg?v=2">
Example Scenario
Here’s an example of how you might handle uploading and viewing a new image with the same name in S3:
Upload New Image: Upload the new image to S3, replacing the old image.
aws s3 cp new-image.jpg s3://your-bucket/path/to/image.jpg
Invalidate CDN Cache (if using CloudFront):
aws cloudfront create-invalidation --distribution-id E1A2B3C4D5 --paths "/path/to/image.jpg"
Clear Browser Cache: Perform a hard refresh or clear the browser cache to ensure the latest image is loaded.
Conclusion
The time it takes for a new image to replace the old one in S3 can vary due to the factors mentioned above. Generally, for most applications, the visibility of the updated image should be relatively quick, but considering caching mechanisms is crucial to ensure immediate updates are reflected to end-users.
adding ?v=2 to any image will automatically show the new one?
ChatGPT
Adding a query parameter like ?v=2 to an image URL will effectively force the browser to fetch the new image, bypassing any cached versions. Here's why and how it works:
How Query Parameters Bypass Cache
When you add a query parameter to the URL, the browser treats it as a different URL, even though the resource is the same. This way, the browser will make a new request to fetch the image rather than using the cached version. This method is commonly used for cache-busting.
Implementation
- Upload the new image: Ensure the new image is uploaded to your S3 bucket, replacing the old one.
- Update the image URL with a version parameter: Update your HTML or application code to append a version parameter to the image URL.
Example
Let's say you have an image at https://your-bucket.s3.amazonaws.com/image.jpg. When you upload a new version of this image, you can force the browser to fetch the new version by changing the URL to:
<img src="https://your-bucket.s3.amazonaws.com/image.jpg?v=2">
If you update the image again in the future, you can change the version number:
<img src="https://your-bucket.s3.amazonaws.com/image.jpg?v=3">
Practical Steps
Initial Image URL:
<img src="https://your-bucket.s3.amazonaws.com/image.jpg">
After Updating the Image in S3: Change the URL to include a version parameter:
<img src="https://your-bucket.s3.amazonaws.com/image.jpg?v=2">
For Future Updates: Increment the version parameter:
<img src="https://your-bucket.s3.amazonaws.com/image.jpg?v=3">
Automatic Version Management
If you are frequently updating images and want to manage versions automatically, you can use a script or backend logic to append timestamps or hash values based on the image content:
Example with Timestamp
<img src="https://your-bucket.s3.amazonaws.com/image.jpg?v=<?=time();?>">
Example with Content Hash (PHP)
<?php $imageHash = md5_file('/path/to/image.jpg'); ?>
<img src="https://your-bucket.s3.amazonaws.com/image.jpg?v=<?=$imageHash;?>">
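The same content-hash idea can be sketched in Python. The bucket URL is the example one used above, and truncating the digest to 8 characters is just a stylistic choice; hashing the bytes means the URL only changes when the image content actually changes:

```python
import hashlib

def busted_url(base_url: str, content: bytes) -> str:
    """Append a short content-hash query parameter so caches treat each new version as a new URL."""
    digest = hashlib.md5(content).hexdigest()[:8]
    return f"{base_url}?v={digest}"

url = busted_url("https://your-bucket.s3.amazonaws.com/image.jpg", b"image bytes here")
print(url)
```

Unlike a manual ?v=2 counter, this never serves a stale image after an upload and never busts the cache when nothing changed.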
Summary
By adding a version query parameter to the image URL, you can ensure that the latest version of the image is always fetched, effectively bypassing any cached versions. This method is simple and effective for cache-busting in web applications.