Redshift COPY gotcha with epoch seconds
Redshift has a COPY statement that will load CSV or JSON files from S3 into a table. I was using this feature the other day to load CSV data from several different sources in S3 into a single Redshift table.
This was part of an effort to streamline some of our analysts' workflows, replacing multiple bespoke, boto3-powered download scripts with a single connection string.
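As a sketch of what that looks like (the table, bucket, and role names here are made up), a basic COPY from S3 goes something like:

-- Load every file under an S3 prefix into one table (names are hypothetical)
COPY analytics.events
FROM 's3://example-bucket/exports/source-a/'  -- all files under this prefix are loaded
IAM_ROLE 'arn:aws:iam::123456789012:role/example-redshift-load'
FORMAT AS CSV
IGNOREHEADER 1;  -- skip the header row in each file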
It took a few iterations to determine the data types I needed for each column, but most of the datetime fields we write as epoch seconds, which can safely be loaded into an INTEGER column.
I did the math. INTEGER supports values up to 2147483647, which works out to 2038-01-19T03:14:07+00:00. Plenty of time.
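If you want to double-check that limit yourself, the same interval arithmetic that shows up below makes for a quick sanity check:

SELECT (TIMESTAMP 'epoch' + 2147483647 * INTERVAL '1 second') AS integer_max_ts;
-- integer_max_ts: 2038-01-19 03:14:07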
I knew that we would want to leverage the timestamp functions that Redshift's SQL dialect offers, so I went looking for a way to convert those epoch-seconds fields into TIMESTAMP and found this in the docs:
Automatic recognition does not support epochsecs and epochmillisecs.
Kind of a bummer, but the workaround isn't so bad:
SELECT (TIMESTAMP 'epoch' + 1555734000 * INTERVAL '1 second') AS my_timestamp;
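Multiplying the epoch-seconds value by a one-second INTERVAL yields an interval, which is then added to TIMESTAMP 'epoch' (1970-01-01 00:00:00); here that comes out to 2019-04-20 04:20:00.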
And since Redshift supports user-defined functions (UDFs), we can wrap this expression up into something that's less visually noisy when you're casting a lot of these columns in your ELT/ETL COPY script:
-- Convert an epoch-seconds INTEGER into a TIMESTAMP
CREATE OR REPLACE FUNCTION f_epochsecs_to_timestamp (INTEGER)
RETURNS TIMESTAMP
STABLE AS $$
  SELECT (TIMESTAMP 'epoch' + $1 * INTERVAL '1 second')
$$ LANGUAGE SQL;
Which turns our previous conversion into:
SELECT f_epochsecs_to_timestamp(1555734000) AS my_timestamp;
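And since the docs call out epochmillisecs too, the same pattern extends to millisecond timestamps. This is a sketch (the function name is mine, and it deliberately truncates to whole seconds); note that the argument needs to be BIGINT, since millisecond values overflow INTEGER:

-- Convert epoch milliseconds into a TIMESTAMP (sub-second part is dropped)
CREATE OR REPLACE FUNCTION f_epochmillisecs_to_timestamp (BIGINT)
RETURNS TIMESTAMP
STABLE AS $$
  -- integer division truncates the milliseconds component
  SELECT (TIMESTAMP 'epoch' + ($1 / 1000) * INTERVAL '1 second')
$$ LANGUAGE SQL;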