I. Introduction
- AWS DataSync is an online data transfer service that simplifies, automates, and accelerates copying large amounts of data to and from AWS storage services over the internet or AWS Direct Connect.
- Can copy data between NFS shares, SMB shares, Amazon S3, Amazon EFS, and Amazon FSx for Windows File Server.
- Allows you to copy millions of files without having to build custom solutions or acquire licenses.
- You can schedule copy jobs and pass filters to only include the desired files.
- Replication tasks can be scheduled hourly, daily, or weekly.
- Requires a DataSync agent to initiate the transfer: a virtual machine deployed on-premises, or an EC2 instance launched from the DataSync agent AMI when the agent runs in AWS.
- Performs integrity checks to make sure the data is copied correctly to the destination.
- You can see the status of the data being copied using CloudWatch metrics.
- Preserves directory structure when copying data.
- Can transfer over Direct Connect, a private VPC endpoint (PrivateLink), or the regular internet.
- When using a VPC endpoint to transfer, data does not traverse the public internet.
- To use VPC endpoints with AWS DataSync, you create an AWS PrivateLink interface VPC endpoint for the DataSync service in your chosen VPC, and then choose this endpoint elastic network interface (ENI) when creating your DataSync agent. Your agent will connect to this ENI to activate, and subsequently all data transferred by the agent will remain within your configured VPC.
- If a task is interrupted, the agent restarts and resumes from the previous location.
- AWS DataSync assumes an IAM role that you provide. The policy you attach to the role determines which actions the role can perform.
- A single DataSync agent is capable of fully utilizing a 10 Gbps network link.
- All data transferred between the source and destination is encrypted via Transport Layer Security (TLS), which replaced Secure Sockets Layer (SSL). Data is never persisted in AWS DataSync itself.
- It is PCI compliant and HIPAA eligible.
- NOT suited for bandwidth-constrained customers (use AWS Snowball or AWS Snowcone instead).
- You are only charged for the data you move.
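
The scheduling and filtering points above can be sketched with the boto3 DataSync client. This is a minimal illustration, not a definitive setup: the task name, the folder names in the include filter, and the 02:00 UTC schedule are all hypothetical, and the two location ARNs are assumed to have been created beforehand. The request is built as a plain dictionary so it can be inspected before any AWS call is made.

```python
def build_task_request(source_location_arn, destination_location_arn):
    """Build keyword arguments for the boto3 DataSync create_task call."""
    return {
        "SourceLocationArn": source_location_arn,
        "DestinationLocationArn": destination_location_arn,
        "Name": "nightly-reports-copy",  # hypothetical task name
        # Daily at 02:00 UTC (assumed schedule, AWS cron syntax).
        "Schedule": {"ScheduleExpression": "cron(0 2 * * ? *)"},
        # Copy only two hypothetical folders; patterns are pipe-delimited.
        "Includes": [
            {"FilterType": "SIMPLE_PATTERN", "Value": "/reports|/invoices"}
        ],
        # Run the integrity check on the files that were transferred.
        "Options": {"VerifyMode": "ONLY_FILES_TRANSFERRED"},
    }


def create_task(request):
    """Submit the request; needs boto3 and valid AWS credentials."""
    import boto3  # imported lazily so the builder works without boto3

    client = boto3.client("datasync")
    return client.create_task(**request)["TaskArn"]
```

Calling `create_task(build_task_request(src_arn, dst_arn))` would register the task; keeping the builder separate makes the schedule and filter easy to review or unit test first.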
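
The IAM role point above can be illustrated with two policy documents: a trust policy that lets the DataSync service assume the role, and a permissions policy that limits what the role can do. This is a sketch under assumptions: the bucket name is hypothetical, and the permissions shown cover only reading from an S3 source; a destination bucket would need additional write actions.

```python
import json


def datasync_trust_policy():
    """Trust policy allowing the DataSync service to assume the role."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {"Service": "datasync.amazonaws.com"},
                "Action": "sts:AssumeRole",
            }
        ],
    }


def s3_read_policy(bucket_name):
    """Read-only permissions on one bucket (assumed S3 source location)."""
    bucket_arn = f"arn:aws:s3:::{bucket_name}"
    return {
        "Version": "2012-10-17",
        "Statement": [
            {   # Bucket-level actions apply to the bucket ARN itself.
                "Effect": "Allow",
                "Action": ["s3:GetBucketLocation", "s3:ListBucket"],
                "Resource": bucket_arn,
            },
            {   # Object-level actions apply to keys inside the bucket.
                "Effect": "Allow",
                "Action": ["s3:GetObject"],
                "Resource": f"{bucket_arn}/*",
            },
        ],
    }


if __name__ == "__main__":
    # "example-source-bucket" is a placeholder name.
    print(json.dumps(datasync_trust_policy(), indent=2))
    print(json.dumps(s3_read_policy("example-source-bucket"), indent=2))
```

Whatever actions are left out of the permissions policy are unavailable to DataSync, which is how the attached policy determines what the role can perform.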
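
The 10 Gbps and bandwidth-constrained points above come down to simple arithmetic. The sketch below estimates transfer time from data size and link speed; the 100 TB dataset and the 100 Mbps link are illustrative assumptions, and the calculation ignores protocol overhead and any throttling.

```python
def transfer_days(data_terabytes, link_mbps, utilization=1.0):
    """Estimate days needed to move data over a link.

    Uses decimal units: 1 TB = 1e12 bytes, 1 Mbps = 1e6 bits per second.
    `utilization` is the fraction of the link actually available.
    """
    bits = data_terabytes * 1e12 * 8
    seconds = bits / (link_mbps * 1e6 * utilization)
    return seconds / 86400  # seconds per day


if __name__ == "__main__":
    # Hypothetical 100 TB dataset on two different links.
    print(f"10 Gbps:  {transfer_days(100, 10_000):.2f} days")
    print(f"100 Mbps: {transfer_days(100, 100):.2f} days")
```

Under these assumptions the same dataset takes just under a day on a saturated 10 Gbps link but roughly three months at 100 Mbps, which is the situation where an offline device such as Snowball becomes the better fit.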