Saturday, December 21, 2024

Automate Amazon Redshift load testing with the AWS Analytics Automation Toolkit


Amazon Redshift is a fast, fully managed, widely popular cloud data warehouse that powers the modern data architecture, empowering you with fast and deep insights and machine learning (ML) predictions using SQL across your data warehouse, data lake, and operational databases. A key differentiating factor of Amazon Redshift is its native integration with other AWS services, which makes it easy to build complete, comprehensive, and enterprise-grade analytics applications. The AWS Analytics Automation Toolkit enables automated provisioning and integration of not only Amazon Redshift, but also database migration services like AWS Database Migration Service (AWS DMS) and the AWS Schema Conversion Tool (AWS SCT).

This post discusses new additions to the AWS Analytics Automation Toolkit that enable you to perform advanced load testing on Amazon Redshift. This is accomplished by provisioning Apache JMeter as part of the analytics stack.

Solution overview

Apache JMeter is an open-source load testing application written in Java that you can use to load test web applications, backend server applications, databases, and more. In the database context, it's an extremely valuable tool for repeating benchmark tests in a consistent manner, simulating concurrency workloads, and scalability testing on different database configurations.

For example, you can use JMeter to simulate a single business intelligence (BI) user or hundreds of BI users concurrently running various SQL queries on an Amazon Redshift cluster for performance benchmarking, scalability, and throughput testing. Additionally, you can rerun the exact same simulation on a different Amazon Redshift cluster that perhaps has twice as many nodes as the original cluster, to compare the price/performance ratio of each cluster.

Similarly, you can use JMeter to load test and assess the performance and throughput achieved for mixed extract, transform, and load (ETL) and BI workloads running on different Amazon Redshift cluster configurations.

For a deeper discussion of JMeter and its use for benchmarking Amazon Redshift, refer to Building high-quality benchmark tests for Amazon Redshift using Apache JMeter.

Although installing JMeter is a relatively simple process, consisting mainly of downloading and installing a Java virtual machine and JMeter, the thought of having to download, install, and set up any tool for benchmarking purposes can deter many people. Starting a test setup from scratch can be intimidating.

The AWS Analytics Automation Toolkit now includes the option to automatically deploy JMeter on Amazon Elastic Compute Cloud (Amazon EC2) in the same virtual private cloud (Amazon VPC) as Amazon Redshift. This includes a dedicated Windows instance with all required JMeter dependencies, such as the JVM, and a sample test plan, making it easy to run powerful load tests on Amazon Redshift. In this post, we demonstrate the use of the AWS Analytics Automation Toolkit for JMeter load tests on cloud benchmark data, using Amazon Redshift as the target environment.

This solution has the following features:

  • It deploys resources automatically, including JMeter
  • You can point JMeter to an existing Amazon Redshift cluster, or automatically create a new cluster
  • You can bring your own data and queries, or use a sample TPC dataset
  • You can easily customize the test plan into separate thread groups, each with different workloads and concurrency as needed

To use the AWS Analytics Automation Toolkit to run a JMeter load test, deploy the toolkit with the JMeter option, load data into your Amazon Redshift cluster, and customize the default test plan as you see fit.

The following diagram illustrates the solution architecture:

Prerequisites

Before deploying the AWS Analytics Automation Toolkit, complete the prerequisites and prepare the config file, user-config.json. For instructions, refer to Automate building an integrated analytics solution with AWS Analytics Automation Toolkit.

The config file has a new parameter called JMETER in the top section. To provision a JMeter instance, enter the value CREATE for this parameter.

To provision a new Amazon Redshift cluster, enter the value CREATE for the REDSHIFT_ENDPOINT parameter. Then fill in values for the fields in the redshift section of the config file. You can use the sizing calculator on the Amazon Redshift console to get a recommended cluster configuration based on your data size.

If you want to load the industry-standard sample TPC-DS dataset (3 TB) into your cluster, enter the value "Y" for the "loadTPCdata" parameter in the redshift section.

To use an existing cluster, enter the endpoint of your cluster for the REDSHIFT_ENDPOINT parameter in the top section of the user-config.json file.
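As a sketch of the settings described above, the following Python snippet updates an already-parsed user-config.json held as a plain dictionary. Only the key names JMETER, REDSHIFT_ENDPOINT, redshift, and loadTPCdata come from this post; the assumption that JMETER and REDSHIFT_ENDPOINT are top-level keys and that loadTPCdata sits inside a redshift object, as well as the helper name configure_load_test, are hypothetical, so check the toolkit's own sample config for the real layout.

```python
import json


def configure_load_test(config: dict, redshift_endpoint: str = "CREATE",
                        load_tpc_data: bool = False) -> dict:
    """Hypothetical helper: return a copy of a parsed user-config.json with
    the JMeter-related options set.

    Assumes JMETER and REDSHIFT_ENDPOINT are top-level keys and that
    loadTPCdata lives in the "redshift" section.
    """
    updated = dict(config)  # shallow copy so the caller's dict is not mutated
    updated["JMETER"] = "CREATE"  # provision the JMeter EC2 Windows instance
    # "CREATE" provisions a new cluster; any other value is treated as the
    # endpoint of an existing cluster.
    updated["REDSHIFT_ENDPOINT"] = redshift_endpoint
    if load_tpc_data:
        # Copy the nested section too, so the original stays untouched.
        redshift = dict(updated.get("redshift", {}))
        redshift["loadTPCdata"] = "Y"  # load the 3 TB TPC-DS sample data
        updated["redshift"] = redshift
    return updated


if __name__ == "__main__":
    # Example: a new cluster with the sample TPC-DS data, starting from an empty config.
    print(json.dumps(configure_load_test({}, load_tpc_data=True), indent=2))
```

In practice you would load the real file with json.load, pass the resulting dictionary in, and write the result back with json.dump; the remaining fields of the redshift section still need to be filled in by hand.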

Deploy resources using the AWS Analytics Automation Toolkit

To deploy your resources, complete the following steps:

  1. Launch the toolkit as described in Automate building an integrated analytics solution with AWS Analytics Automation Toolkit.
  2. Create tables and ingest your test data into your Amazon Redshift cluster.
    • If you chose to load the sample TPC-DS 3 TB data, loading takes some time, so please allow for this. If you're loading your own data, you can do so at this point.

Launch JMeter

To launch JMeter, complete the following steps:

  1. Using RDP, log in to the JMeter EC2 Windows instance created by the AWS Analytics Automation Toolkit.
  2. Launch the JMeter GUI by choosing (double-clicking) the JMETER shortcut on the Windows desktop.

In our experience, changing the JMeter Look and Feel option to Windows (instead of dark mode) increases JMeter stability, so we highly recommend making that change and choosing Yes to restart the GUI.

Customize the JMeter test plan

To customize the JMeter test plan, we modify the JDBC connection, and optionally modify the thread ramp-up schedule and customize the SQL.

  1. Using the JMeter GUI, open the AWS Analytics Automation Toolkit's default test plan file C:\JMETER\apache-jmeter-5.4.1\Redshift Load Test.jmx.
  2. Choose the test plan name and set the JdbcUser value to the correct user name for your Amazon Redshift cluster.

If you used the CREATE cluster option, this value is the same as the master_user_name value in your user-config.json file.

  3. In JDBC Connection, set the DatabaseURL value and password to the correct values for your cluster.

If you used the CREATE cluster option, the password is stored in a secret named <stackname>-RedshiftPassword. You can find the endpoint by choosing the new Amazon Redshift cluster on the Amazon Redshift console and copying the endpoint value at the upper right.

This test plan already has an initialization command set to turn off the result cache feature of Amazon Redshift: set enable_result_cache_for_session to off. No action is required to configure this.
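For the DatabaseURL value in the JDBC connection step, the Amazon Redshift JDBC driver expects a URL of the form jdbc:redshift://host:port/database. The following sketch assumes that the endpoint you copy from the console has the host:port/database form and simply prefixes it; the helper names jdbc_url_from_endpoint and password_secret_name are hypothetical, and the example endpoint in the usage block is made up.

```python
JDBC_PREFIX = "jdbc:redshift://"


def jdbc_url_from_endpoint(endpoint: str) -> str:
    """Hypothetical helper: turn a console endpoint such as
    'cluster.abc123.us-east-1.redshift.amazonaws.com:5439/dev'
    into an Amazon Redshift JDBC URL for JMeter's DatabaseURL field."""
    endpoint = endpoint.strip()
    if endpoint.startswith(JDBC_PREFIX):
        endpoint = endpoint[len(JDBC_PREFIX):]  # already a JDBC URL; normalize it
    host_port, slash, database = endpoint.partition("/")
    host, colon, port = host_port.rpartition(":")
    # Require all three parts so a bare hostname is rejected early.
    if not slash or not database or not colon or not host or not port.isdigit():
        raise ValueError("expected an endpoint of the form host:port/database")
    return f"{JDBC_PREFIX}{host}:{port}/{database}"


def password_secret_name(stack_name: str) -> str:
    """Secret name pattern described in this post: <stackname>-RedshiftPassword."""
    return f"{stack_name}-RedshiftPassword"


if __name__ == "__main__":
    # Made-up endpoint, for illustration only.
    print(jdbc_url_from_endpoint(
        "examplecluster.abc123.us-east-1.redshift.amazonaws.com:5439/dev"))
    print(password_secret_name("mystack"))
```

Validating the endpoint before pasting it into the test plan catches the most common mistake, a missing port or database name, before JMeter reports a connection failure at run time.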

  4. Optionally, modify the thread ramp-up schedule.

This test plan uses the Ultimate Thread Group, which is automatically installed when you open the test plan. Each thread group contains a ramp-up schedule, as well as a query or set of queries. Modify these in line with your testing preferences and dataset. If you loaded the TPC-DS dataset, the queries included by default in the three thread groups will work.

In the following example, SmallThread Group has four rows, each of which launches the specified number of sessions at staggered times. The ramp-up time to reach the maximum session count is 45 seconds, because the threads in the last row don't start until 15 seconds into the test and have a 30-second startup time. You can adjust the ramp-up schedule, as well as the hold duration and shutdown time, by modifying, adding, or deleting rows in the Thread Schedule section. The graph then adjusts automatically.
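The 45-second figure above follows from taking the latest "initial delay + startup time" across the schedule rows. The following sketch reproduces that arithmetic and approximates how many threads have started at a given moment, assuming each row starts its threads evenly over its startup time. Only the last row's 15-second delay and 30-second startup come from this post; the thread counts and the first three rows' timings are hypothetical, and the class and function names are illustrative.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ScheduleRow:
    """Ramp-up fields of one Ultimate Thread Group Thread Schedule row."""
    start_threads: int      # Start Threads Count
    initial_delay_s: float  # Initial Delay, sec
    startup_time_s: float   # Startup Time, sec


def full_ramp_up_time(rows: list[ScheduleRow]) -> float:
    """Seconds until every row has started all of its threads."""
    return max(r.initial_delay_s + r.startup_time_s for r in rows)


def threads_started(rows: list[ScheduleRow], t: float) -> float:
    """Approximate number of threads started by time t (seconds).

    Assumes each row starts its threads evenly over its startup time;
    hold and shutdown phases are ignored.
    """
    total = 0.0
    for r in rows:
        if r.startup_time_s <= 0:
            fraction = 1.0 if t >= r.initial_delay_s else 0.0
        else:
            fraction = (t - r.initial_delay_s) / r.startup_time_s
        total += r.start_threads * min(max(fraction, 0.0), 1.0)
    return total


if __name__ == "__main__":
    # Hypothetical four-row schedule; the last row matches the 15 s / 30 s example.
    rows = [
        ScheduleRow(5, 0, 30),
        ScheduleRow(5, 5, 30),
        ScheduleRow(5, 10, 30),
        ScheduleRow(5, 15, 30),
    ]
    print(full_ramp_up_time(rows))  # 45
    for t in (0, 15, 30, 45):
        print(t, round(threads_started(rows, t), 1))
```

Checking a schedule this way before a run makes it easier to confirm that peak concurrency is reached well before the hold period ends, so the steady-state part of the test is long enough to measure.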

  5. Optionally, customize the SQL.

In the following example, we choose the JDBC Request item called SmallSQL under the same SmallThread Group, and review the query run by this thread group. If you added your own data, insert the query or queries you want to run for this thread group. Do the same for the medium and large thread groups, or delete or add thread groups as needed.

Run the test

Run the test plan by choosing the green arrow in the GUI.

Alternatively, enter the following command at a Windows command prompt to run JMeter in command-line (non-GUI) mode:

C:\JMETER\apache-jmeter-5.4.1\bin\jmeter -n -t RedshiftTPCDWTest.jmx -e -l test.out
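The same non-GUI run can be scripted, for example to repeat it unchanged against several clusters. This sketch only assembles the command shown above and hands it to subprocess: -n selects non-GUI mode, -t names the test plan, -e generates the report dashboard after the run, and -l names the results file. It uses jmeter.bat, the Windows launcher script in JMeter's bin folder, because subprocess does not resolve a bare jmeter name the way the command prompt does. JMETER_HOME assumes the toolkit's install path from this post, and the helper names are hypothetical.

```python
import subprocess
from pathlib import PureWindowsPath

# Install path used by the toolkit in this post (assumption for other setups).
JMETER_HOME = PureWindowsPath(r"C:\JMETER\apache-jmeter-5.4.1")


def jmeter_command(test_plan: str, results_file: str = "test.out") -> list[str]:
    """Build the non-GUI JMeter command as an argument list."""
    return [
        str(JMETER_HOME / "bin" / "jmeter.bat"),
        "-n",                # non-GUI (command-line) mode
        "-t", test_plan,     # test plan (.jmx) to run
        "-e",                # generate the report dashboard after the test
        "-l", results_file,  # file that receives the sample results
    ]


def run_jmeter(test_plan: str, results_file: str = "test.out") -> int:
    """Run JMeter and return its exit code (requires JMeter on this machine)."""
    return subprocess.run(jmeter_command(test_plan, results_file)).returncode
```

Pass the path of the .jmx file you customized, for example run_jmeter("RedshiftTPCDWTest.jmx"), and use a different results file name for each run so earlier results are not mixed with new ones.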

Use case example: Evaluating concurrency scaling benefits

For our example use case, we want to determine how Amazon Redshift concurrency scaling benefits our workload performance. We can use the JMeter setup outlined in this post to quickly answer this question.

In the following example, we ran the sample test plan as is against the sample TPC-DS data, with 80 concurrent users across three thread groups. We ran the test first with concurrency scaling enabled, then reran it after disabling concurrency scaling on the Amazon Redshift cluster.

To monitor the results, open the Amazon Redshift console, choose Clusters in the navigation pane, and choose the cluster you're using for the performance test. On the Cluster performance tab, you can monitor the CPU utilization of all the nodes in your cluster, as shown in the following screenshot.

On the Query monitoring tab, you can monitor the queue activity, as well as the concurrency scaling activity of your cluster.

The preceding graphs cover two different tests on a 4-node ra3.16xlarge cluster. On the left side, concurrency scaling was enabled for the cluster, and on the right side it was disabled. In the test with concurrency scaling enabled, there is less queuing and the test completes in less time.

Note that the test duration is the time to run the workload on Amazon Redshift, not the time to download query results.

Finally, to review the actual query performance in the test results, you can download the file C:\JMETER\apache-jmeter-5.4.1\SummaryReportIndividualRecords.csv from the EC2 instance and review the query performance for each thread. The following screenshot is a summary plot of the test results for this example, with and without concurrency scaling.
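To turn the downloaded CSV into per-query numbers, a short script can group the samples and compute latency statistics. This sketch assumes the file uses JMeter's default CSV result layout with a header row containing at least elapsed (milliseconds), success, and the grouping column (label for the JDBC Request name, such as SmallSQL, or threadName); it skips failed samples and uses a nearest-rank 90th percentile. The function name summarize_results is hypothetical.

```python
import csv
import io
import math
import statistics
from collections import defaultdict


def summarize_results(csv_text: str, group_by: str = "label") -> dict[str, dict[str, float]]:
    """Per-group latency summary (in ms) from JMeter CSV results.

    Assumes JMeter's default CSV layout with a header row that includes
    'elapsed', 'success', and the grouping column ('label' or 'threadName').
    """
    elapsed_by_group: dict[str, list[int]] = defaultdict(list)
    for row in csv.DictReader(io.StringIO(csv_text)):
        # Treat a missing success value as success; skip explicit failures
        # so errors don't skew the latency figures.
        if (row.get("success") or "true").lower() != "true":
            continue
        elapsed_by_group[row[group_by]].append(int(row["elapsed"]))

    summary = {}
    for group, values in elapsed_by_group.items():
        values.sort()
        p90_index = math.ceil(0.9 * len(values)) - 1  # nearest-rank 90th percentile
        summary[group] = {
            "count": len(values),
            "mean_ms": statistics.fmean(values),
            "p90_ms": values[p90_index],
            "max_ms": values[-1],
        }
    return summary
```

For example, read the file with open(path, newline="") and pass its contents to summarize_results, once for the run with concurrency scaling and once for the run without it, then compare mean_ms and p90_ms for each query label.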

As the chart illustrates, concurrency scaling can significantly reduce the latency of your workloads, and is especially useful for short bursts of activity in your application.

Conclusion

JMeter allows you to create a flexible and powerful load test for your Amazon Redshift clusters to assess cluster performance. With the new capability in the AWS Analytics Automation Toolkit, you can provision and configure JMeter, along with a test plan, in a fraction of the time it would normally take.


About the Authors

Samir Kakli is an Analytics POC Specialist Solutions Architect based out of Florida. He is focused on helping customers quickly and effectively align Amazon Redshift's capabilities to their business needs.

Asser Moustafa is an Analytics Specialist Solutions Architect at AWS based out of Dallas, Texas. He advises customers in the Americas on their Amazon Redshift and data lake architectures and migrations, from the POC stage through production deployment and maintenance.
