[ad_1]
Amazon Redshift is a quick, scalable, safe, and absolutely managed cloud knowledge warehouse that makes it easy and cost-effective to research all of your knowledge utilizing normal SQL and your current ETL (extract, rework, and cargo), enterprise intelligence (BI), and reporting instruments. Tens of 1000’s of shoppers use Amazon Redshift to course of exabytes of information per day and energy analytics workloads resembling BI, predictive analytics, and real-time streaming analytics.
There are numerous methods to work together with Amazon Redshift. You possibly can programmatically entry knowledge in your Amazon Redshift cluster by utilizing the Amazon Redshift Knowledge API, or you need to use a web-based interface resembling Amazon Redshift Question Editor V2 to creator SQL queries. You too can work together with Amazon Redshift in interactive or batch mode by way of Amazon Redshift RSQL.
Beforehand, you had to make use of the PostgreSQL psql command line instrument to work together with Amazon Redshift for easy use instances resembling importing and exporting knowledge to and from Amazon Redshift or submitting SQL in batch or interactive mode, or for superior use instances resembling performing ETL operations with out writing complicated ETL jobs.
Now you need to use Amazon Redshift RSQL, a brand new command line instrument to connect with an Amazon Redshift cluster and carry out operations resembling describe database objects, question knowledge, view and export question leads to varied output file codecs, run scripts that embrace each SQL and sophisticated enterprise logic, carry out knowledge cleaning and transformation operations utilizing acquainted SQL, and write ETL and ELT scripts utilizing enhanced management circulation and superior error dealing with. Furthermore, if you happen to’re migrating from self-managed knowledge warehousing engines resembling Teradata to Amazon Redshift and utilizing Teradata BTEQ (Fundamental Teradata Question) scripting for knowledge automation, ETL or different duties can now migrate to Amazon Redshift RSQL.
This put up explains find out how to use Amazon Redshift RSQL for ETL, knowledge cleaning and preparation, enhanced management circulation, and exception and error dealing with.
Resolution overview
Amazon Redshift RSQL permits you to connect with your Amazon Redshift cluster by way of a command line. It helps the capabilities of the PostgreSQL psql command line instrument with a further set of Amazon Redshift particular capabilities:
- Describe properties or attributes of exterior tables in an AWS Glue catalog or Apache Hive metastore, exterior databases in Amazon RDS for PostgreSQL or Amazon Aurora PostgreSQL, and tables shared utilizing Amazon Redshift knowledge sharing
- Use enhanced management circulation instructions resembling
IF
,ELSEIF
,ELSE
,ENDIF
,GOTO
, andLABEL
- Use single sign-on (SSO) authentication utilizing ADFS, PingIdentity, Okta, Azure AD, or different SAML/JWT based mostly id suppliers (IdPs), in addition to use browser-based SAML IdPs with multi-factor authentication (MFA)
The next are some generally used instructions in Amazon Redshift RSQL. We use these instructions on this put up to show completely different attainable use instances utilizing Amazon Redshift RSQL scripts.
- echo – Prints the required string to the output.
- comment – An extension of the
echo
command that has the flexibility to interrupt the output over separate traces. - goto – Skips all intervening instructions and SQL statements and resume the processing on the specified LABEL in sequence. The
LABEL
should be a ahead reference. You possibly can’t bounce to aLABEL
that lexically precedes theGOTO
. - label – Establishes an entry level for operating this system because the goal for a
GOTO
command. - exit – Stops operating Amazon Redshift RSQL. You too can specify an non-obligatory exit code.
- q – Logs off database classes and exits Amazon Redshift RSQL.
Stipulations
The next are the stipulations for utilizing Amazon Redshift RSQL and carry out the steps on this put up:
- An AWS account
- Linux, Home windows, or MacOs working system (Amazon Redshift RSQL is out there for these working programs)
- An Amazon Redshift cluster
- SQL data
Moreover, full the next prerequisite steps:
- Set up Amazon Redshift RSQL in your native machine. For directions, discuss with Getting began with Amazon Redshift RSQL.
- Join to the Amazon Redshift cluster.
- Create the
orders
andorders_summary
tables utilizing the next DDL statements:
Import knowledge into the Amazon Redshift cluster
There are a number of methods to load knowledge into Amazon Redshift tables, together with utilizing Amazon Redshift RSQL. On this part, we assessment the syntax and an instance of the Amazon Redshift RSQL script to load knowledge into an Amazon Redshift desk utilizing the COPY command.
We use the next syntax:
We offer the next parameters:
- <location> – The placement of the supply knowledge to be loaded into the goal desk
- <arn> – The AWS Id and Entry Administration (IAM) function for accessing the information
- <area> – The AWS Area of the supply knowledge
Within the following instance Amazon Redshift RSQL script, we load knowledge from an Amazon Easy Storage Service (Amazon S3) bucket location into the orders
desk:
Enhanced management circulation
You should use Amazon Redshift RSQL to outline programmatic enhanced management circulation and conditional blocks in your ETL script. We use the next syntax:
The syntax contains the next parts:
- <situation> –The conditional assertion
- <code_block1> – The code block to run when the situation is met
- <code_block2> – The code block to run when the situation isn’t met
Within the following instance script, we carry out some conditional checks utilizing if
, elseif
, and else
instructions based mostly on the rely of data from the orders
desk, and we show some messages based mostly on the file rely worth:
Error dealing with
You should use Amazon Redshift RSQL to outline exception dealing with blocks in your in ETL script to deal with varied user-defined and system-generated error situations that you just would possibly encounter whereas operating the ETL course of.
The next are among the error dealing with choices supported in Amazon Redshift RSQL:
- :ACTIVITYCOUNT – This variable is much like the psql variable
ROW_COUNT
, which returns the variety of rows affected by final SQL assertion. Nevertheless,ACTIVITYCOUNT
stories the affected row rely for SELECT, COPY, and UNLOAD statements, whichROW_COUNT
doesn’t. For SQL statements returning knowledge,ACTIVITYCOUNT
specifies the variety of rows returned to Amazon Redshift RSQL from the database. - :ERRORCODE – This variable accommodates the return code for the final submitted SQL assertion to the database. Error code 0 specifies that SQL assertion accomplished with none errors.
- :ERRORLEVEL – This variable is used to assign severity ranges to errors. You should use these severity ranges to find out a plan of action. The default worth is ON.
- :MAXERROR – This variable designates a most error severity stage past which Amazon Redshift RSQL halts job processing. If SQL statements in Amazon Redshift RSQL scripts produce an error severity stage higher than the designated maxerror worth, Amazon Redshift RSQL instantly exits.
- :LAST_ERROR_MESSAGE – This variable accommodates the error message of the latest failed question.
We use the next syntax:
The syntax contains the next info:
- <ERRORCODE> –The error code quantity
- <LAST_ERROR_MESSAGE> – The error message of the latest failed question
- <code_block1> – The code block to run when the error situation is met
- <code_block2> – The code block to run when the error situation isn’t met
Within the following instance script, we create the orders_staging
desk and replica data into the desk from an Amazon S3 location. The script additionally accommodates an exception dealing with part for each the desk creation and replica course of to deal with the attainable errors encountered through the course of.
Knowledge transformation and preparation
You possibly can carry out some frequent knowledge preparation and transformation operations in your dataset utilizing SQL statements within the Amazon Redshift RSQL ETL script. On this part, we show knowledge transformation and preparation operations resembling casting, new knowledge column creation, and splitting an enter column into a number of columns.
We use business normal SQL statements for remodeling and making ready knowledge for downstream consumption.
Within the following instance script, we use a SQL assertion to remodel the information from the orders_staging
desk and insert it into the orders_summary
desk:
Export knowledge from an Amazon Redshift cluster and output file formatting choices
You should use Amazon Redshift RSQL to extract knowledge from one or a number of Amazon Redshift tables and write to your disk for consumption by downstream purposes. Amazon Redshift RSQL makes use of the EXPORT choice to export the results of question to an output file.
The next are among the helpful output formating choices supported in RSQL:
- rset rformat on – This command is required for all of the formatting instructions to take impact.
- pset format – Codecs can embrace aligned, AsciiDoc, CSV, HTML, LaTeX, LaTeX
longtable
,troff-ms
, unaligned, and wrapped. - pset border – This feature specifies border info for output knowledge. Worth 0 means no border, 1 means inside dividing traces, and a couple of means desk body.
- rset heading – This command provides the required heading to the output outcome.
- rset rtitle – This command provides the required heading in addition to present system date of the consumer pc.
- rset titledashes on/off – This command specifies whether or not to print a line of sprint characters between the column names and column knowledge returned for the SQL question.
- rset width – This command specifies the goal width for every line in a report
- t – This command turns off printing column names in addition to outcome row rely on the finish of the output (footers).
We use the next syntax:
We offer the next info:
- <‘FilePath/Filename’> – The file identify and path for the output file
- <SQL Question> – The SQL question to run
Within the following instance script, we export the information from the orders_summary
desk utilizing a SQL question and write it into the orders.txt
textual content file on the native machine:
Automate the Amazon Redshift RSQL ETL script
One of many choices to automate Amazon Redshift RSQL scripts to run on a particular schedule is by way of shell scripting. You possibly can schedule the shell script by way of a CRON job, a command line utility.
We use the next syntax:
We offer the next info:
- <RSQL_SCRIPT> – The SQL scripts to un
- <LOG_FILE> – The output log file
Within the following instance shell script, we run the Amazon Redshift RSQL script that we created and write the output log in a log file within the native machine. You possibly can schedule the shell script by way of a CRON job.
Clear up
To keep away from incurring future costs, cease the Amazon Redshift cluster created for the aim of this put up.
Conclusion
On this put up, we defined find out how to use Amazon Redshift RSQL to carry out ETL operations. We additionally demonstrated find out how to implement superior error dealing with and enhanced management circulation in your Amazon Redshift RSQL ETL script.
When you’re utilizing scripts by way of the psql command line consumer on Amazon Redshift, you’ll be able to function on Amazon Redshift RSQL with no modification. When you’re migrating your Teradata BTEQ scripts to Amazon Redshift RSQL, you need to use the AWS Schema Conversion Software (AWS SCT) to routinely convert BTEQ scripts to Amazon Redshift RSQL scripts.
To be taught extra, discuss with Amazon Redshift RSQL.
Concerning the Authors
Saman Irfan is a Specialist Options Architect at Amazon Net Providers. She focuses on serving to prospects throughout varied industries construct scalable and high-performant analytics options. Outdoors of labor, she enjoys spending time along with her household, watching TV collection, and studying new applied sciences.
Sudipta Bagchi is a Specialist Options Architect at Amazon Net Providers. He has over 12 years of expertise in knowledge and analytics, and helps prospects design and construct scalable and high-performant analytics options. Outdoors of labor, he loves operating, touring, and taking part in cricket.
[ad_2]