I. Introduction
- Fully managed, petabyte-scale data warehousing solution (OLAP) for BI
- Based on PostgreSQL; claimed up to 10x better performance than traditional data warehouses
- Customers can start at $0.25/hr with no commitment
- Two configurations
  - Single-node (160 GB)
  - Multi-node
    - Leader node (manages client connections and receives queries)
    - Compute nodes (store data and execute queries; up to 128 nodes)
- Uses column-based storage and compression
- Doesn't require indexes, so it uses less space than traditional solutions
- Uses Massively Parallel Processing (MPP): data and query load are distributed across all nodes, enabling fast query performance
- Automated backups are on by default with a 1-day retention period (max 35 days)
- Redshift always attempts to keep at least three copies of the data
  - Original
  - Replica
  - S3 backup (async, for disaster recovery)
- Priced on compute node-hours (1 unit per node per hour), backup storage, and data transfer (within a VPC)
- Supports encryption at rest with AES-256
- Multi-AZ is NOT supported
- Data can be loaded from S3, DynamoDB, DMS, and other services
- Supports all popular open data formats (Avro, Parquet, ORC, etc.)
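
Loading from S3 in one of the formats above is typically done with a COPY statement. The sketch below only assembles that SQL text; it does not connect to a cluster. `build_copy_statement` is a hypothetical helper (not part of any AWS SDK), and the table name, bucket path, and IAM role ARN in the usage lines are placeholder assumptions.

```python
# Hypothetical helper: builds the text of a Redshift COPY statement.
# It does not run the statement or check that the table/objects exist.

# Format clauses for the open formats named in the notes.
FORMAT_CLAUSES = {
    "parquet": "FORMAT AS PARQUET",
    "orc": "FORMAT AS ORC",
    "avro": "FORMAT AS AVRO 'auto'",
}


def build_copy_statement(table: str, s3_path: str, iam_role_arn: str, data_format: str) -> str:
    """Return a COPY statement that loads `table` from `s3_path`."""
    key = data_format.lower()
    if key not in FORMAT_CLAUSES:
        raise ValueError(f"unsupported format: {data_format}")
    if not s3_path.startswith("s3://"):
        raise ValueError("s3_path must start with s3://")
    return (
        f"COPY {table}\n"
        f"FROM '{s3_path}'\n"
        f"IAM_ROLE '{iam_role_arn}'\n"
        f"{FORMAT_CLAUSES[key]};"
    )


if __name__ == "__main__":
    # Placeholder names, for illustration only.
    print(build_copy_statement(
        "sales",
        "s3://example-bucket/sales/",
        "arn:aws:iam::123456789012:role/ExampleRedshiftRole",
        "parquet",
    ))
```

The resulting text would then be executed through any SQL client connected to the cluster's leader node.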
II. Redshift Spectrum
- Query data that is already in S3 without loading it.
- To use it, you must have a Redshift cluster from which to start the query.
- The query is then submitted to thousands of Redshift Spectrum nodes.
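
As a rough sketch of the flow above, querying S3 data in place usually means registering an external schema and table on the cluster, then selecting from it. The helper below only generates the SQL text; `spectrum_statements` is a hypothetical function, and the schema, catalog database, role ARN, columns, and S3 location are all placeholder assumptions. It assumes the files are stored as Parquet.

```python
# Hypothetical helper: generates the SQL a cluster would run to query
# S3 data through Redshift Spectrum. Nothing is executed here.
from typing import List, Tuple


def spectrum_statements(
    schema: str,
    catalog_db: str,
    iam_role_arn: str,
    table: str,
    columns: List[Tuple[str, str]],
    s3_location: str,
) -> List[str]:
    """Return [create schema, create external table, sample query]."""
    column_sql = ",\n    ".join(f"{name} {sql_type}" for name, sql_type in columns)
    create_schema = (
        f"CREATE EXTERNAL SCHEMA {schema}\n"
        f"FROM DATA CATALOG\n"
        f"DATABASE '{catalog_db}'\n"
        f"IAM_ROLE '{iam_role_arn}'\n"
        f"CREATE EXTERNAL DATABASE IF NOT EXISTS;"
    )
    create_table = (
        f"CREATE EXTERNAL TABLE {schema}.{table} (\n"
        f"    {column_sql}\n"
        f")\n"
        f"STORED AS PARQUET\n"
        f"LOCATION '{s3_location}';"
    )
    # The data stays in S3; only the table definition lives in the catalog.
    sample_query = f"SELECT COUNT(*) FROM {schema}.{table};"
    return [create_schema, create_table, sample_query]


if __name__ == "__main__":
    # Placeholder names, for illustration only.
    for sql in spectrum_statements(
        "spectrum_demo",
        "demo_catalog_db",
        "arn:aws:iam::123456789012:role/ExampleSpectrumRole",
        "page_views",
        [("user_id", "BIGINT"), ("url", "VARCHAR(2048)")],
        "s3://example-bucket/page_views/",
    ):
        print(sql, end="\n\n")
```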