Thursday 02 Mar 2023

Data Loss, Data Loss Everywhere!


I was reviewing and tidying my S3 buckets using the AWS web-based console. I had recently finished backing up my photos as one tar archive per year instead of thousands of files within a directory. I figured this would be easier to restore in the future and would carry less deep archive overhead. It also let me tidy up my buckets, as I have had data stored there since 2015.

Once I finished uploading, I went to wipe the previous bucket. However, I accidentally emptied the wrong bucket. I noticed within a second and hit the back button in the browser, but the damage was done: S3 had wiped the majority of the bucket. I had versioning enabled, but this does not matter if you use the "empty" button in the web interface, because it deletes every version of every object. AWS do ask you to type "permanently delete" to confirm, but I had misread the bucket name as that of a similarly named bucket. So the problem here was the human.

In this case there was no panic, though there was annoyance: this was my off-site backup, so I was able to re-upload the data. Luckily I use many individual buckets, so the others were safe.

After 2 days of uploading the lost data I was back on track! My off-site backup was restored. This demonstrates the importance of the 3-2-1 backup strategy (3 copies of data, on 2 different media, with 1 off-site). Despite using bucket versioning, I still lost the data. I have since learned that the “empty” button is irreversible.

I am considering using S3 replication, either to a bucket in another account (which would add complexity and administration) or to another bucket in another region. I use eu-west-1, so I could perhaps use us-east-1 or similar. I cannot use AWS Backup because deep archive is not supported. If I choose this option I will write about it. As I already have data in the bucket, I would need to use a batch replication process for the existing objects.
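
As a rough idea of what the replication option might look like, here is a minimal Python sketch that builds a replication rule and applies it with boto3's put_bucket_replication. It is only a sketch under assumptions: the bucket names, the IAM role ARN and the rule ID are hypothetical placeholders, both buckets are assumed to already exist with versioning enabled, and the destination region is simply whichever region the destination bucket was created in. A rule like this only covers new objects, which is why existing data would still need batch replication.

```python
import json


def build_replication_config(role_arn, dest_bucket, storage_class="DEEP_ARCHIVE"):
    """Build the ReplicationConfiguration dict for put_bucket_replication.

    role_arn and dest_bucket are placeholders supplied by the caller; the
    IAM role must allow S3 to read from the source and write to the destination.
    """
    return {
        "Role": role_arn,
        "Rules": [
            {
                "ID": "replicate-everything",  # hypothetical rule name
                "Priority": 1,
                "Status": "Enabled",
                # An empty prefix matches every object in the source bucket.
                "Filter": {"Prefix": ""},
                # Do not copy delete markers to the replica.
                "DeleteMarkerReplication": {"Status": "Disabled"},
                "Destination": {
                    "Bucket": f"arn:aws:s3:::{dest_bucket}",
                    "StorageClass": storage_class,
                },
            }
        ],
    }


def apply_replication(source_bucket, config):
    """Attach the replication configuration to the source bucket."""
    import boto3  # imported here so the config can be built without boto3 installed

    s3 = boto3.client("s3")
    s3.put_bucket_replication(
        Bucket=source_bucket,
        ReplicationConfiguration=config,
    )


if __name__ == "__main__":
    # Hypothetical names, for illustration only.
    config = build_replication_config(
        role_arn="arn:aws:iam::123456789012:role/example-replication-role",
        dest_bucket="example-photos-replica",
    )
    # Print the configuration for review; nothing is sent to AWS here.
    print(json.dumps(config, indent=2))
    # apply_replication("example-photos-source", config)  # uncomment to apply
```

Printing the configuration first and leaving the apply step commented out is deliberate: after emptying the wrong bucket once, a review step before touching a real bucket seems worth the extra line.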

I expect a higher AWS bill this month.