Amazon Redshift – DevOps and Cloud Blog

Table of Contents

Data Warehouse

In computing, a data warehouse is a system used for reporting and data analysis and is considered a core component of business intelligence.
A data warehouse centralizes and consolidates large amounts of data from multiple source
The data within a data warehouse is usually derived from a wide range of sources such as application log files and transaction applications.
It is a relational database that is designed for query and analysis rather than for transaction processing.
- It usually contains historical data derived from transaction data, but it can include data from other sources.
- To perform analytics you need a data warehouse not a regular database
OLAP (On-line Analytical Processing) is characterized by relatively low volume of transactions. Queries are often very complex and involve aggregations.
RDS (MySQL..etc ) is an OLTP database, where there is detailed and current data, and a schema used to store transactional data

Amazon Redshift is a fast, fully managed, petabyte-scale data warehouse service .
It is optimized for datasets ranging from a few hundred gigabytes to a petabyte or more
It costs less than $1,000 per terabyte per year, a tenth the cost of most traditional data warehousing solutions.
Amazon Redshift gives you fast querying capabilities over structured data using familiar SQL based clients and business intelligence (BI) tools using standard ODBC and JDBC connections.
Queries are distributed and parallelized across multiple physical resources.
Amazon Redshift uses replication and continuous backups to enhance availability and improve data durability and can automatically recover from component and node failures
Redshift is a SQL based data warehouse used for analytics applications (Analytics DB)
- Example use cases: Sales Reporting, Health Care analytics
- It is suited for OLAP-based use cases (On Line Analytics Processing)
- Can store huge amount of data (a database), but can’t ingest huge amounts of data in real time (not like what Kinesis can do)
You can launch & Configure it from AWS Console or through AWS Redshift APIs
Redshift can:
- Fully recover from a node or component failure
- It automatically patches and performs data backup
- Backups can be stored for a user defined retention period
- Is 10 times faster than traditional SQL RDBMS

Redshift automatically replicates all your data within your data warehouse cluster
Redshift always keeps three copies of your data:
- The original one
- A replica on compute nodes (within the cluster)
- A backup copy on S3
Cross Region Replication
- Redshift can asynchronously replicate your snapshots to S3 in another region for DR

Automatically patches and backs up (Snapshots) your data warehouse, storing the
- backups for a user-defined retention period in AWS S3.
- By default for one day (24 hours) but you can configure it from 0 – 35 days
Automatic backups are stopped if you choose retention period of 0
- You have access to these automated snapshots during the retention period
Manual backups are not deleted automatically, if you do not manually delete them, you will be charged standard S3 storage rates
AWS Redshift currently supports only one AZ (no Multi-AZ option)
You can restore from your backup to a new Redshift cluster in the same or a different AZ

Supports encryption of data at rest
By default, AWS Redshift takes care of encryption key management
You can choose to manage your own keys through
- HSM (Hardware Security Modules)
- AWS KMS (Key Management Service)
Supports SSL Encryption, in-transit, between client applications and Redshift data warehouse cluster
You can’t have direct access to your AWS Redshift cluster nodes

Amazon Redshift