In this article, we are going to see how to use the powerful aws s3 sync CLI command.
In the Linux world, when we want to compare two directories and copy files between them, either on the same machine or between two machines over SSH, we use rsync.
Those who have used it know how powerful it is. Similarly, aws s3 sync does the same job for the S3 bucket or S3 file system (technically not a file system, but an object store).
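To make the analogy concrete, here is a quick side-by-side sketch; the host and bucket names are made up for illustration.
⚡ rsync -av ./mydata/ user@remotehost:/backup/mydata/
⚡ aws s3 sync ./mydata s3://my-example-bucket/mydata
The first command syncs a local directory to a remote machine over SSH; the second does the same with an S3 bucket as the destination.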
Now let us see some real-time use cases for the aws s3 sync command.
Where can aws s3 sync be used?
- To upload a large directory of files to a remote S3 bucket recursively
- To download a large directory of files from S3 to local
- To compare two S3 buckets and copy only the missing files between them
- To compare a local directory with a remote S3 bucket and copy the missing files to the remote, or vice versa
- To sync two buckets in two different regions
- To include or exclude files during the copy between source and destination
- To update website content and set cache headers if your website is hosted on S3 (see the sketch after this list)
- To upload files from local with different file permissions in bulk
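For instance, for the website and permissions use cases, a minimal sketch might look like this; the bucket name and max-age value are made up, while --cache-control and --acl are standard aws s3 sync options.
⚡ aws s3 sync ./site s3://my-website-bucket --cache-control "max-age=3600"
⚡ aws s3 sync ./public s3://my-website-bucket --acl public-read
The first command sets a Cache-Control header on the uploaded objects; the second uploads them with a public-read canned ACL.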
There are more use cases of aws s3 sync than I have listed here.
While not all of them are widely used across the industry, we will try to cover the most useful ones with examples of the aws s3 sync command.
Let us start with some basics
AWS CLI Installation and Configuration
For us to be able to use the aws s3 sync command, we need to have the AWS CLI installed and configured.
If you already have the AWS CLI installed and configured on your local machine, you can skip this section or follow along as a refresher.
Here we are creating a new user with full Admin access privileges for testing purposes.
You can grant fewer privileges; S3-related permissions alone are enough to try the examples in this article.
Set up your Programmatic Access – Create Access Key
If you would like to create a new user in IAM along with the Access Key, follow these steps.
- Log in to the AWS Console
- In the services, go to IAM
- Create a User and click on map existing Policies
- Choose a UserName and select the Policy (Administrator Access Policy)
- Create the user
- The final stage will present the Access Key ID and Secret Access Key
If you would like to choose an existing user and create an Access Key, follow these steps.
- Sign in to the AWS Management Console and open the IAM console at https://console.aws.amazon.com/iam/.
- In the navigation pane, choose Users.
- Choose the name of the user whose access keys you want to create, and then choose the Security credentials tab.
- In the Access keys section, choose Create access key.
- To view the new access key pair, choose Show. You will not have access to the secret access key again after this dialog box closes.
Install AWS CLI
The AWS CLI installation steps and commands might vary based on your base machine.
Amazon has given clear instructions on how to install the AWS CLI on each platform. Choose any of the following links to get your AWS CLI installed and ready:
- AWS CLI version 2 on Linux or Unix
- AWS CLI version 2 on macOS
- AWS CLI version 2 on Windows
Configure AWS CLI
I presume that you have installed the AWS CLI package and that everything went well.
You should be able to see the version of the AWS CLI installed when entering the following command in your terminal or command prompt:
aws --version
I am using AWS CLI Version 1, as CLI Version 2 is still in beta.
Now it is time to configure the AWS CLI. Just enter the following command and you will be prompted with a few questions about the Access Key and Secret Access Key.
aws configure
It would look like this as you are setting it up. Enter your own AWS Access Key ID and Secret Access Key; the ones given below are not real, just made up.
➜ ~ aws configure
AWS Access Key ID [None]: AKIAS790KQGK63WUK6T5
AWS Secret Access Key [None]: kkQEiBjJSKrDkWBLO9G/JJKQWIOKL/CpHjMGyoiJWW
Default region name [None]: us-east-1
Default output format [None]:
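If you work with multiple AWS accounts, you could keep this test setup isolated under a named profile; the profile name here is made up, while --profile is a standard AWS CLI option.
⚡ aws configure --profile s3sync-test
⚡ aws s3 ls --profile s3sync-test
Every aws s3 sync command in this article would then take the same --profile flag.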
Well done. You are now ready with the AWS CLI.
AWS S3 SYNC Examples ( Local and S3 bucket )
Hope you are ready with all the necessary items to test things in practice.
Here are some examples of aws s3 sync.
First, let us see how to use aws s3 sync to upload and download data; in other words, syncing the local directory to the S3 bucket and the S3 bucket to local.
AWS S3 sync Local to S3 bucket - Copying/Uploading files
When we want to copy a large number of files in Linux, we always use rsync, as it compares the existence of files in both source and destination and copies only the files which are not present or have been recently modified.
rsync also provides options to delete files at the destination in case you want the sync to be exact.
AWS S3 sync provides similar options too; we will get there.
For now, let us see a simple example of uploading a directory using s3 sync.
I have a directory named tobeuploaded on my local machine, and I want to sync it to a remote S3 bucket which is empty.
Let us execute the aws s3 sync command to upload the files/directories in the tobeuploaded directory to the S3 bucket recursively.
⚡ aws s3 sync tobeuploaded/. s3://gritfy-s3-bucket1
You can also execute this command in another way.
⚡ cd tobeuploaded
⚡ aws s3 sync . s3://gritfy-s3-bucket1
In this example, we cd into the directory and sync from there; both approaches give the same result.
Here is the terminal record of the execution.
After the upload, if you execute the aws s3 ls command, you will see the output shown below.
⚡ aws s3 ls s3://gritfy-s3-bucket1
2022-06-17 21:42:30 0 testfile1
2022-06-17 21:42:30 0 testfile2
2022-06-17 21:42:30 0 testfile3
If you notice, you can see the subdirectories have not been copied.
Note: s3 sync ignores empty directories on both upload and download; directories that are empty on the source are not created at the destination.
Now you know why the subdirectories were not considered by the aws s3 sync command.
Let us create some test files in the subdirectories and re-sync.
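Here is a minimal sketch of that re-sync, assuming the same bucket as above; the exact upload lines in your output may differ.
⚡ touch testdir1/testfile1 testdir2/testfile1
⚡ aws s3 sync . s3://gritfy-s3-bucket1
upload: testdir1/testfile1 to s3://gritfy-s3-bucket1/testdir1/testfile1
upload: testdir2/testfile1 to s3://gritfy-s3-bucket1/testdir2/testfile1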
As you can see in the preceding terminal record, the subdirectories are now copied, or synced, to S3.
The output of aws s3 ls with the recursive flag would yield the following result:
⚡ aws s3 ls s3://gritfy-s3-bucket1 --recursive
2022-06-18 00:47:40 0 testdir1/testfile1
2022-06-18 00:47:41 0 testdir2/testfile1
2022-06-17 21:42:30 0 testfile1
2022-06-17 21:42:30 0 testfile2
2022-06-17 21:42:30 0 testfile3
Sync S3 bucket to Local - Downloading Files from S3 to local
In the previous example, we saw in detail how to sync the local directory and files to the S3 bucket.
Now let us see how we can sync the S3 bucket to the local directory.
In other words, downloading the content from the S3 bucket to local.
To sync the S3 bucket to the local directory, all we need to change is the order of the S3 URI and the local path.
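That is, the download direction is simply the same command with the arguments reversed, using our same bucket.
⚡ aws s3 sync s3://gritfy-s3-bucket1 .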
Here is the recorded video of me trying to sync the S3 bucket to local
As you can see in the preceding video, existing files are not copied or overwritten by default.
Let's suppose our local directory has these files
⚡ ls -lrt
total 0
-rw-r--r-- 1 sarav staff 0 Jun 17 21:40 testfile1
-rw-r--r-- 1 sarav staff 0 Jun 17 21:40 testfile2
-rw-r--r-- 1 sarav staff 0 Jun 17 21:40 testfile3
drwxr-xr-x 3 sarav staff 96 Jun 18 00:46 testdir1
drwxr-xr-x 3 sarav staff 96 Jun 18 00:46 testdir2
Our S3 bucket has these files
⚡ aws s3 ls s3://gritfy-s3-bucket1 --recursive
2022-06-18 00:47:40 0 testdir1/testfile1
2022-06-18 00:47:41 0 testdir2/testfile1
2022-06-19 22:29:58 0 testdir3/testfile1
2022-06-17 21:42:30 0 testfile1
2022-06-17 21:42:30 0 testfile2
2022-06-17 21:42:30 0 testfile3
As you can see, we have an extra directory on the S3 bucket which is not present on the local path.
If I sync my local path to S3, no action is taken, as the source is our local path and the destination is the S3 bucket.
⚡ aws s3 sync . s3://gritfy-s3-bucket1
If I sync the S3 bucket to my local path, then the source is the S3 bucket and the destination is the local path.
The missing file or (non-empty) directory on my local would be copied over from S3.
⚡ aws s3 sync s3://gritfy-s3-bucket1 .
download: s3://gritfy-s3-bucket1/testdir3/testfile1 to testdir3/testfile1
As you can see, testdir3 is now copied from S3 to local.
If you sync the S3 bucket to an empty local directory, you can download all the files from the S3 bucket recursively.
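Here is a quick sketch of that full download into a fresh directory; the directory name fresh-copy is made up.
⚡ mkdir fresh-copy && cd fresh-copy
⚡ aws s3 sync s3://gritfy-s3-bucket1 .
Since the destination starts empty, every object in the bucket is downloaded.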
Hope you have now understood s3 sync and the roles of source and destination.
Here is a pictorial representation of what we did: how to upload and download with s3 sync.
Sync and delete the extra files on the S3 destination
While performing the sync, what if we want to remove some files at the destination, be it the local path or S3, to maintain a perfect sync state?
Let's suppose we have a file in the S3 bucket but not on the local path.
As we have seen, if we use the S3 path as the source and the local path as the destination and perform the sync, the missing file would be downloaded.
Let's suppose we have a file in the local path but not in the S3 bucket.
If we do a sync with the local path as the source and the S3 bucket as the destination, the file would be uploaded to the S3 bucket.
Now let's say we want to delete the file at the destination if the file is not present in the source.
How to do it?
You need a special option to be passed to the aws s3 sync command:
--delete (boolean) Files that exist in the destination but not in the
source are deleted during sync.
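Since --delete is destructive, it is worth pairing it with --dryrun, a standard aws s3 sync flag that prints what would happen without changing anything; a minimal sketch against our bucket:
⚡ aws s3 sync --delete --dryrun . s3://gritfy-s3-bucket1
(dryrun) delete: s3://gritfy-s3-bucket1/testdir3/testfile1
Once the preview looks right, drop --dryrun and run the real sync.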
Let us see it in real time
I have the local directory (source) with the following files
⚡ tree .
.
├── testdir1
│ └── testfile1
├── testdir2
│ └── testfile1
├── testfile1
├── testfile2
└── testfile3
I am trying to sync this local directory with a remote S3 bucket which has the following files
⚡ aws s3 ls s3://gritfy-s3-bucket1 --recursive
2022-06-18 00:47:40 0 testdir1/testfile1
2022-06-18 00:47:41 0 testdir2/testfile1
2022-06-19 22:27:15 0 testdir3/
2022-06-19 22:29:58 0 testdir3/testfile1
2022-06-17 21:42:30 0 testfile1
2022-06-17 21:42:30 0 testfile2
2022-06-17 21:42:30 0 testfile3
As you can see, there is a directory named testdir3, with a file inside it, which is not present on the source.
Let us see how to delete it on the S3 bucket.
Remember, files are deleted only on the destination, and only if they are not present on the source.
In our case, the destination is the S3 bucket.
In the preceding screenshot, you can see we are using the --delete flag.
⚡ aws s3 sync --delete . s3://gritfy-s3-bucket1
delete: s3://gritfy-s3-bucket1/testdir3/testfile1
You can see from the output that the extra file, present on the destination but not on the source, has been deleted.
Sync and delete the extra files on the Local destination
We have seen how to delete the extra files which are present on the destination S3 bucket but not on the local source.
What if the destination is local and it has an extra file?
No difference except the change of destination: the --delete flag makes sure the extra files are deleted on the destination, irrespective of whether it is local or S3.
Take a look at the list of files on the S3 bucket.
Here is the list of files on my local. Let us try to sync, keeping the local path as the destination.
⚡ ls -rlt
-rw-r--r-- 1 sarav staff 0 Jun 17 21:40 testfile1
-rw-r--r-- 1 sarav staff 0 Jun 17 21:40 testfile2
-rw-r--r-- 1 sarav staff 0 Jun 17 21:40 testfile3
drwxr-xr-x 3 sarav staff 96 Jun 18 00:46 testdir1
drwxr-xr-x 3 sarav staff 96 Jun 18 00:46 testdir2
As you can see, testfile3 is present on the destination (local path) but not on the source (S3 path).
When we execute the following command, testfile3 should be deleted from the local path.
⚡ aws s3 sync s3://gritfy-s3-bucket1 . --delete
Here is the output of this command when I tested it at my end.
As you can see, testfile3 is deleted on the local path (destination).
AWS S3 Sync two S3 buckets
So far we have seen how to upload, download, and delete while syncing between a local path and an S3 bucket.
Let us now look at the most important feature: comparing and syncing two S3 buckets.
Just know that all the examples we have seen earlier can be replicated between S3 buckets.
The only difference is that we are going to use S3 buckets for both source and destination, unlike our previous examples.
Let us see examples of how to sync two S3 buckets.
AWS S3 sync two s3 buckets
Always remember that source and destination play an important role while performing a sync.
While syncing two S3 buckets, the source bucket is treated as the ideal, desired state, and the destination bucket is brought in line to match it.
Let us validate this by testing it ourselves.
I have two buckets.
Here is the state of the first S3 bucket,
and here is the state of the second S3 bucket.
As you can see, the second bucket gritfy-s3-bucket2 is empty.
To reach the desired state, the files need to be created on bucket2.
⚡ aws s3 sync s3://gritfy-s3-bucket1 s3://gritfy-s3-bucket2
Let us try to sync it and see it in action.
You can see the files are copied from the source bucket to the destination bucket, as bucket2 did not have the files present on bucket1.
In case you want to keep bucket2 as the ideal desired state, all you have to do is swap the source and destination.
In the following command, you can see that bucket2 is now at the source and bucket1 is on the destination end.
⚡ aws s3 sync s3://gritfy-s3-bucket2 s3://gritfy-s3-bucket1
Remember, the source is considered the desired state and the destination is matched to the source.
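One of the use cases listed at the start was syncing buckets across regions. Here is a minimal sketch assuming the two buckets live in different regions (the regions shown are made up); --source-region and --region are standard options for S3-to-S3 transfers.
⚡ aws s3 sync s3://gritfy-s3-bucket1 s3://gritfy-s3-bucket2 --source-region us-east-1 --region eu-west-1
--source-region tells the CLI the region of the source bucket, while --region applies to the destination.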
AWS S3 sync two s3 buckets - Delete extra files
As we have seen earlier, we can delete the extra files/objects from the destination if they are not present on the source.
The same can be done while syncing between two S3 buckets
Let us see how it works in real time.
We will take the same buckets we have used in the previous example.
bucket1 has the following files and directories
⚡ aws s3 ls s3://gritfy-s3-bucket1 --recursive
2022-06-18 00:47:40 0 testdir1/testfile1
2022-06-18 00:47:41 0 testdir2/testfile1
2022-06-17 21:42:30 0 testfile1
2022-06-17 21:42:30 0 testfile2
bucket2 has the following files and directories
⚡ aws s3 ls s3://gritfy-s3-bucket2 --recursive
2022-06-20 23:00:20 0 testdir1/testfile1
2022-06-20 23:00:21 0 testdir2/testfile1
2022-06-20 23:03:53 0 testdir3/testfile1
2022-06-20 23:03:53 0 testdir4/testfile1
2022-06-20 23:00:21 0 testfile1
2022-06-20 23:00:21 0 testfile2
2022-06-20 23:03:53 0 testfile3
As you can see, we have two extra directories and an extra file on bucket2 which are not present on bucket1.
If we use bucket1 as the source and bucket2 as the destination and perform a simple sync, no action is taken, as the source files are already present at the destination.
If you want to make bucket2 exactly the same as bucket1, you should delete the extra files on bucket2.
The --delete option can help here; we have seen how it works earlier.
It deletes the files on the destination which are not available on the source.
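Put together, the command would be along these lines (a sketch using our two buckets):
⚡ aws s3 sync --delete s3://gritfy-s3-bucket1 s3://gritfy-s3-bucket2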
In the preceding screenshot, you can see while performing the sync, the extra files on the destination are removed.
Note: You can also use an S3 access point ARN in place of the S3 bucket URL.
Hope you have learnt how to perform aws s3 sync between two buckets.
AWS S3 Sync - Exclude and Include files
So far, we have seen how to upload/download files, how to sync a local path with an S3 bucket, and how to sync two S3 buckets.
Sometimes we might not want all of our files to be considered while performing the sync.
For example, let's say you have a directory with various file types like pdf, jpg, png, txt, etc.
You are interested in uploading only the image files. How would you do it?
There are two special options that you can use: --exclude and --include.
Let us see how they work.
I have an S3 bucket with images, used as a backend for a CDN (Content Delivery Network).
I have some files on my local and want to upload (sync) only the image files from the local path (source) to the S3 bucket (destination).
Look at the following directory listing, where I have a variety of files: pdf, jpg, png, txt.
⚡ ls -lrt
total 0
-rw-r--r-- 1 sarav staff 0 Jun 21 11:35 test.jpg
-rw-r--r-- 1 sarav staff 0 Jun 21 11:35 test1.png
-rw-r--r-- 1 sarav staff 0 Jun 21 11:35 test3.pdf
-rw-r--r-- 1 sarav staff 0 Jun 21 11:35 test.txt
-rw-r--r-- 1 sarav staff 0 Jun 21 11:36 test2.jpg
-rw-r--r-- 1 sarav staff 0 Jun 21 11:36 test2.png
-rw-r--r-- 1 sarav staff 0 Jun 21 11:36 test1.pdf
-rw-r--r-- 1 sarav staff 0 Jun 21 11:36 test1.txt
Let us see how I can sync (upload) only the image files, with extension jpg or png, to our S3 bucket.
See it in action here.
As you can see in the terminal record, we are including and excluding certain files using --include and --exclude.
To include only image files, you need to exclude all the files first and then include only the images, like this:
⚡ aws s3 sync . s3://gritfy-cdn --exclude "*" --include "*.jpg" --include "*.png"
To exclude certain files only, you can simply use --exclude alone:
⚡ aws s3 sync . s3://gritfy-cdn --exclude "*.txt"
You can use multiple --include and --exclude options as per your requirement. Note that filters appearing later in the command take precedence, which is why we exclude everything first and then include the image extensions.
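For instance, because of that ordering rule, you could exclude all text files but still keep one specific file; notes.txt here is a made-up name.
⚡ aws s3 sync . s3://gritfy-cdn --exclude "*.txt" --include "notes.txt"
Every .txt file is skipped except notes.txt, since the later --include overrides the earlier --exclude for that file.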
You can read more about the include and exclude filters in the AWS CLI documentation.
Conclusion
In this article, we have learnt various use cases and examples of aws s3 sync, such as:
- How to upload a directory recursively to S3 using the AWS S3 CLI
- How to download a directory recursively from S3 to local using the AWS S3 CLI
- Syncing local directories and S3 buckets and uploading/downloading the files
- The importance of source and destination in the s3 sync command
- Syncing and deleting the extra files at the destination
- Syncing two S3 buckets and deleting extra files using aws s3 sync
Hope this article helps.
If you have any requirements in Cloud and DevOps or Product Development, reach out to us at [email protected] for a free consultation.
Cheers
Sarav AK