Redshift to Snowflake Migration: SQL Function Mapping

Gather around, kids! We have a new hero in town - Snowflake. Snowflake is killing other databases like Thanos killed spoiler alert in Avengers. They started with Teradata and Vertica, and are now aiming their guns at AWS Redshift. And they have earned this right. They have really addressed the pain points people are facing while using these other databases. For example, scaling up and down in Redshift is a painful process. On the other hand, in Snowflake you can spin up clusters on the fly without much hassle or effort with almost zero impact on running pipelines. 

While this post is not about finding the better one among Snowflake and Redshift, you should still read it if you are one of those thinking about making the move from Redshift to Snowflake. This post lists out the most common functions used in Redshift and points out their Snowflake counterpart. This list is not exhaustive by any means and I will strive to keep on updating it as and when I find new ones.

Function Mapping


Redshift
Snowflake
bpchar
char
character varying
varchar
'now'::character varying
current_timestamp
identity(start, 0, ([0-9],[0-9])::text)
identity(start, 1)
select GETDATE()
select CURRENT_TIMESTAMP()
select TRUNC(CURRENT_TIMESTAMP())
select DATE_TRUNC(DAY,CURRENT_TIMESTAMP())
select SYSDATE
select TO_TIMESTAMP_NTZ (CONVERT_TIMEZONE
('UTC',CURRENT_TIMESTAMP()))
select DATE(CURRENT_TIMESTAMP())
select TO_DATE(CURRENT_TIMESTAMP())
date_time_utc > GETDATE() - 7
date_time_utc > dateadd('DAY', -7, TO_TIMESTAMP_NTZ(LOCALTIMESTAMP))
select coalesce('a')
select coalesce('a', 'b') -- throws error message with only one argument "SQL compilation error: error line 1 at position 7 not enough arguments for function [COALESCE('a')], expected 2, got 1"

Apart from these functions, there are few other subtle differences in Snowflake SQL.

Named columns in DDL

In Redshift, you don't necessarily need to name all the columns in your Create Table/ View scripts. Even though it is a bad practice to have unnamed columns in your script, Redshift is quite forgiving. In this case, it will just name the column based on the function used. But, in Snowflake, your script will fail if you have not named all the columns explicitly. E.g.

-- Redshift 
create table my_table as select getdate();
-- This statement is successful. The column name will be getdate()

- Snowflake
create table my_table as select current_timestamp();
--This statement fails with error "Missing column specification"

Object Name Casing

By default, Redshift lowercases all object names. This is opposite in Snowflake, it uppercases all unquoted object names. To avoid confusion, avoid using quoted identifiers in Snowflake.

select * from "my_test_table";
-- This statement works
select * from my_test_table;
-- This statement fails
select * from "MY_TEST_TABLE";

-- This statement fails 

Delete

There is a minor difference in the Delete syntax between Redshift and Snowflake. Snowflake is again more close to the ANSI standard. While Redshift does not really needs the FROM keyword in Delete statements, Snowflake query will fail if you do not have the FROM keyword.

-- Redshift 
Delete my_table;
-- This statement is successful. 


- Snowflake
Delete from my_table;
-- This statement is successful.

While the jury is still out on which is better among these two technologies, this post was an attempt to bridge the gap for people who are planning to transition from one to another. Do share your feedback!

Comments

Popular posts from this blog

Uber Data Model

Data Engineer Interview Questions: SQL

Cracking Data Engineering Interviews

Hive Challenges: Bucketing, Bloom Filters and More

UBER Data Architecture