AWS Data Analytics Specialty Certification - My Experience & Tips

Key Takeaways

The AWS Data Analytics Specialty certification was a challenging learning experience that requires a broad and deep understanding of designing, building, securing, and maintaining data analytics solutions on AWS. There is a good split between testing domain expertise (databases, visualisation, storage, ETL, etc.) and testing solution architecture and design. Though I found much of the information tested on the exam beneficial to learn and understand, it’s worth noting that some specific elements will not be valuable on a day-to-day basis (looking at you, DynamoDB capacity unit calculations). I do, however, feel that undertaking this certification has had an overall positive impact on my day job.

I personally found the most challenging element of the certification to be the mental stamina required to sit the 3-hour exam, so do try some practice exams beforehand.

Background

AWS have a number of specialty exams (6 at the time of writing). I decided to target the Data Analytics Specialty as my third AWS certification, after the Cloud Practitioner and Solution Architect Associate certificates. I highly recommend Adrian Cantrill’s guidance around certification pathways, which I found really useful, and I would echo his sentiment around the level of difficulty of the Data Analytics Specialty exam - there was definitely some above-associate-level solution architecture knowledge required, which I felt equipped for given my existing analytics domain knowledge. I would therefore see either Solution Architect Professional (with little to no analytics experience) or Solution Architect Associate (with a lot of analytics experience) as prerequisite paths.

I have some information on my background on my “About” page, but I would say the most relevant information is that I had been working with AWS for about a year prior to undertaking this certification, I had at least 5 years of experience across analytics-related technologies (databases, visualisation, distributed computing, and ETL), and I already had foundational knowledge around storage, compute, and networking. All of these were essential, though some would be covered through the SAA certification.

Exam Topics & Structure

The exam content is fully described in the exam guide. At a glance, the structure of the data analytics specialty content is very clear and should make it easy to align your learning to it. I found this to be the case, mostly. The exception is that I felt it was more helpful to focus specifically on services and solutions rather than the domains (Collection, Storage & Data Management, Processing, Analysis & Visualisation, Security) when shifting from theory and hands-on labs to exam preparation. This is somewhat arbitrary, but I thought it worth mentioning for anyone who has a little difficulty when switching to exam prep.

Typical guidance for AWS exams, such as paying attention to key terms like “most cost effective” or “least operational overhead”, still applies here, as does ensuring you have a good understanding of the implications of single-AZ, multi-AZ, and multi-region deployments. Beyond that, the most useful thing I found when selecting a question response was specific awareness of integrations (or the inability to integrate, as may be the case) - with multiple services involved in a single question, it can be tricky to remember all the available combinations.

Given that AWS scale scoring so that more difficult or complex questions contribute more to your overall score, your learning of course has to cover all aspects. However, I also think this means it’s important to be really confident on the less complex questions. In my exam, these included:

  • Quicksight - choosing the appropriate Quicksight visualisation type given a certain set of data (identifying when a scatter plot might be more useful than a bar chart, for example), and implementing AI-driven forecasting
  • How to flatten JSON data in Glue (a sketch follows this list)
  • Calculating DynamoDB WCU and RCU (a worked example follows this list)
  • Performance and log analytics (i.e. knowing when to use ELK / OpenSearch)
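
For the Glue item above, here is a minimal sketch of flattening nested JSON with Glue’s Relationalize transform. This is not taken from the exam itself; the database, table, and S3 staging path are placeholders I have assumed.

```python
# Minimal Glue job sketch: flatten nested JSON with the Relationalize transform.
# The database, table, and bucket names are placeholder assumptions.
from awsglue.context import GlueContext
from awsglue.transforms import Relationalize
from pyspark.context import SparkContext

glue_context = GlueContext(SparkContext.getOrCreate())

# Read nested JSON via the Glue Data Catalog
nested = glue_context.create_dynamic_frame.from_catalog(
    database="analytics_db", table_name="raw_events"
)

# Relationalize flattens nested structs and pivots arrays out into separate
# frames, returning a DynamicFrameCollection keyed back to the root frame
flattened = Relationalize.apply(
    frame=nested,
    staging_path="s3://example-temp-bucket/glue-staging/",
    name="root",
)

# "root" holds the flattened top-level records; array fields appear as "root_<field>"
root = flattened.select("root")
root.toDF().show()
```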
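
For the DynamoDB capacity item, the arithmetic is worth being able to do quickly. A worked example using the standard sizing rules (a strongly consistent read costs 1 RCU per 4 KB of item size, rounded up; a write costs 1 WCU per 1 KB, rounded up; eventually consistent reads cost half) - the traffic numbers below are made up:

```python
import math

def rcu(reads_per_sec, item_size_kb, strongly_consistent=True):
    """Read capacity units: item size rounds up to the nearest 4 KB;
    eventually consistent reads cost half (rounded up)."""
    per_read = math.ceil(item_size_kb / 4)
    total = reads_per_sec * per_read
    return total if strongly_consistent else math.ceil(total / 2)

def wcu(writes_per_sec, item_size_kb):
    """Write capacity units: item size rounds up to the nearest 1 KB."""
    return writes_per_sec * math.ceil(item_size_kb)

# 10 strongly consistent reads/sec of 6 KB items -> 2 RCU per read = 20 RCU
print(rcu(10, 6))                             # 20
print(rcu(10, 6, strongly_consistent=False))  # 10 (eventually consistent)
# 10 writes/sec of 2.5 KB items -> 3 WCU per write = 30 WCU
print(wcu(10, 2.5))                           # 30
```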

Though all aspects of a relatively large number of services are examined, I found some topic areas came up slightly more regularly during the exam, including:

  • File type limitations (ORC, Avro, Parquet, CSV), including, for example, which options are best suited to given partitioning or formatting requirements, or reading Parquet into Quicksight (via Athena)
  • Scaling DBs for HA or DR, auto scaling vs manual scaling, scaling out and up (or horizontally and vertically)
  • Redshift Workload Management, concurrency, node types
  • DynamoDB came up a lot, but I had multiple questions specifically around partition keys
  • Streaming data probably came up most often. I didn’t get many Kafka / MSK questions (only 2-3 that I can remember), but I did encounter a number of Kinesis questions, probably because Kinesis relates to multiple exam domains when you consider its integration with Kinesis Firehose, Kinesis Data Analytics, OpenSearch, etc. Understanding all of the integrations and limitations is critical, but you’ll also need to know about consumers and producers, and error handling / retry configuration (a shard-sizing sketch follows this list)
  • Message queues (SQS, SNS), message ordering, visibility timeout, etc.
  • Security - encryption, row and column level security, policies and roles, cross account access
  • EMR provisioning and the open-source applications that run on it (Sqoop, Hive, Pig, NiFi, Flink, Spark) and what they do
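
On the Kinesis point above, being quick at the shard-sizing arithmetic helps. Here is a rough sketch under the provisioned-mode limits I relied on (1 MB/s or 1,000 records/s of writes per shard; 2 MB/s of reads per shard, shared across standard consumers) - the traffic figures below are invented:

```python
import math

def shards_needed(records_per_sec, record_size_kb, standard_consumers=1):
    """Estimate a provisioned-mode shard count from write and read throughput.
    Per-shard limits assumed: 1 MB/s or 1,000 records/s in; 2 MB/s out,
    shared across all standard (non enhanced fan-out) consumers."""
    write_mb_per_sec = records_per_sec * record_size_kb / 1024
    read_mb_per_sec = write_mb_per_sec * standard_consumers
    return max(
        math.ceil(write_mb_per_sec / 1),    # ingest bandwidth limit
        math.ceil(records_per_sec / 1000),  # ingest record-count limit
        math.ceil(read_mb_per_sec / 2),     # egress bandwidth limit
    )

# 5,000 records/sec of 3 KB each, read by 2 standard consumers -> 15 shards
print(shards_needed(5000, 3, standard_consumers=2))
```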

If I had to pick a handful of services in which a lack of expertise would have had the greatest impact on my exam experience, I would say these are Kinesis (plus its integrations), Redshift, EMR, DynamoDB, Glue, and the relevant security services (KMS, CloudHSM, and others). This is, of course, entirely subjective and will vary with everyone’s question set. It is just what I observed from my exam, but it will not be a surprising list to anybody already working in AWS data analytics.

Admittedly, Glue didn’t feature as much in my set of questions as I was expecting from my exam prep, compared to the others listed above. I also didn’t get any ML questions based around SageMaker, but I did have questions focused on the use of EMR for ML applications / solutions.

All that said, ultimately, if a service is data-specific, you’ll be expected to have expert knowledge of it. These services are well described in the big data analytics whitepaper.

Exam Preparation

As with any AWS exam, working with AWS on a daily basis will almost certainly make the experience easier - working with the exact services in the exam would have the most benefit, but even if you’re not doing so, having a foundational understanding of networking fundamentals (VPCs, CIDR notation and blocks), common integrations, endpoints, High Availability and Disaster Recovery configurations, etc., will also be of some help. All that said, I undertook the exam when I was not using AWS day-to-day. It’s totally possible, but I would suggest that the learning will likely take a little longer.

Though I know people who have spent a matter of days or a couple of weeks preparing for this exam full time, I was dedicating 2-4 hours per week to studying, and it took me 2.5-3 months to prepare for and pass the exam.

I have used a variety of learning collateral for previous certifications, but I tend to use Adrian Cantrill’s (cantrill.io) and/or Stephane Maarek’s learning material, and Tutorials Dojo’s practice exams. For the Data Analytics Specialty, Adrian did not have a course, so I used Stephane’s Udemy course. I also read (a few times) the AWS Big Data Analytics whitepaper - though it was published in 2021, I believe it is still very relevant.

Taking the exam

First of all, if you’ve not taken a professional- or specialty-level AWS exam before, it’s a vastly different experience from any other certification I’ve undertaken: the question structure and length are different, and there are far fewer questions where one answer feels obviously more correct than the rest. The exam is not only longer (180 minutes compared to 130 at associate level), but I also needed nearly all of that time to complete it - I finished with about 4 minutes remaining. This makes maintaining focus a real challenge, and I would highlight the importance of taking practice exams to adjust to this.

I think both points above reflect the fact that the exam requires specialist domain knowledge across all data domains and services (the “T”-shaped knowledge described in Adrian’s video).

In all honesty, I feel that many technical specialists may still struggle with the exam despite great analytics domain knowledge, because an understanding of architectural considerations - how services and solutions fit together and integrate - is essential. The exam will be much easier for those working in technical or data architect roles (or with experience in those roles).

I took the exam virtually (through Pearson) and had no issues, but if you’ve had any challenges with this previously (being warned for looking away from the camera, or mouthing words as you read questions, etc.), the likely challenge here will again be the exam duration.

One further consideration, depending on how you typically work, is that you don’t have the ability to use a calculator or pen and paper, either to draw out a solution or to do calculations. There will almost certainly be some mental arithmetic required for specific capacity-based questions (throughput and DynamoDB WCU and RCU calculations, for example).

What next?

Go, dominate the world with your AWS Data Analytics expertise… as for me, I decided to take a small break before going after the remaining associate-level certifications (SysOps and Developer). Once I decide whether to go for the Solution Architect Professional or the Database Specialty, I will be sure to capture my process and experience in a future post.