AWS Certified Big Data Slides
Exam Overview
Course: AWS Certified Data Analytics Specialty Course DAS-C01
Objective: Prepare for the AWS Certified Data Analytics Specialty exam (DAS-C01), the successor to the Big Data Specialty exam (BDS-C00).
Recommended Background:
Prior knowledge of AWS services (EC2, networking).
Familiarity with data and analytics concepts.
Duration: A long course; take your time working through it.
Instructors
Stephane Maarek
IT Consultant & AWS Big Data Architect.
Veteran instructor; 94% certification score.
Links: GitHub, LinkedIn, Medium, Twitter.
Frank Kane
Former Amazon Sr. Software Engineer and Manager.
Focus on Machine Learning and Big Data.
Owner of Sundog Education.
Links: LinkedIn, Twitter, Facebook.
Course Coverage
AWS Big Data Services:
Amazon Kinesis, AWS Lambda, AWS Glue, Amazon EMR, Amazon ML, SageMaker, etc.
Services categorized into:
Collection: Kinesis, AWS IoT, SQS.
Storage: S3, DynamoDB.
Processing: AWS Lambda, Glue, EMR.
Analysis: Athena, Redshift, QuickSight.
Security: AWS KMS, CloudHSM.
Case Study
Case Study Overview: cadabra.com
Requirements:
Order History App: Client app, server logs.
Product Recommendations: Server logs.
Transaction Rate Alarm: Server logs.
Near Real-Time Log Analysis: Amazon OpenSearch, Kinesis Data Firehose.
Data Warehousing & Visualization: (Managed Serverless).
AWS Data Collection Methods
Real-Time:
Tools: Kinesis Data Streams, SQS.
Near Real-Time:
Tools: Kinesis Data Firehose, DMS.
Batch Analysis:
Tools: Snowball, Data Pipeline.
Kinesis Data Streams
Architecture: Consists of shards, producers, consumers.
Data Ingestion Constraints:
Retention (1 to 365 days).
Record limits: up to 1 MB per record; each shard accepts writes of up to 1 MB/sec or 1,000 records/sec.
Security:
Control access using IAM.
Encryption in flight (HTTPS endpoints) and at rest with KMS.
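The ingestion constraints above lend themselves to a quick sizing calculation. The following is a minimal sketch, assuming the commonly documented per-shard write limits of 1 MB/sec and 1,000 records/sec and a 1 MB maximum record size. The function name `estimate_shards` and the choice to treat 1 MB as 10^6 bytes are illustrative assumptions, not part of any AWS API.

```python
import math

# Per-shard write limits (assumed; 1 MB is treated as 10^6 bytes for simplicity).
MAX_BYTES_PER_SEC_PER_SHARD = 1_000_000
MAX_RECORDS_PER_SEC_PER_SHARD = 1_000
MAX_RECORD_BYTES = 1_000_000


def estimate_shards(records_per_sec: float, avg_record_bytes: float) -> int:
    """Return the smallest shard count that satisfies both write limits."""
    if avg_record_bytes > MAX_RECORD_BYTES:
        raise ValueError("a single record may not exceed 1 MB")
    # Shards needed to stay under the records/sec limit.
    by_count = math.ceil(records_per_sec / MAX_RECORDS_PER_SEC_PER_SHARD)
    # Shards needed to stay under the bytes/sec limit.
    by_bytes = math.ceil(
        records_per_sec * avg_record_bytes / MAX_BYTES_PER_SEC_PER_SHARD
    )
    # A stream always has at least one shard.
    return max(1, by_count, by_bytes)
```

Whichever limit is hit first drives the result: many small records are bounded by the record count, while fewer large records are bounded by throughput. Read-side limits and headroom for traffic spikes are deliberately left out of this sketch.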
AWS Lambda
Functionality: Serverless computing and event-driven architecture.
Common Integrations:
Kinesis, S3, DynamoDB.
Cost Model: Pay for number of requests and duration of execution.
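The cost model can be worked through as simple arithmetic. The sketch below assumes that duration is billed in GB-seconds (execution time multiplied by configured memory) and uses illustrative default rates. The rates, the function name `lambda_monthly_cost`, and the omission of any free tier are assumptions; actual prices vary by region and architecture.

```python
def lambda_monthly_cost(
    invocations: int,
    avg_duration_ms: float,
    memory_mb: int,
    price_per_million_requests: float = 0.20,   # assumed rate in USD
    price_per_gb_second: float = 0.0000166667,  # assumed rate in USD
) -> float:
    """Estimate a monthly bill as request charges plus duration charges."""
    # Request charge: a flat price per million invocations.
    request_cost = invocations / 1_000_000 * price_per_million_requests
    # Duration charge: seconds of execution scaled by configured memory in GB.
    gb_seconds = invocations * (avg_duration_ms / 1000) * (memory_mb / 1024)
    return request_cost + gb_seconds * price_per_gb_second
```

For example, `lambda_monthly_cost(3_000_000, 120, 512)` estimates three million invocations of a 512 MB function averaging 120 ms each, under the assumed rates.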
Supported Languages: Node.js, Python, Java, C#, Go.
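To make the Kinesis integration concrete, here is a minimal sketch of a Python handler for a Kinesis-triggered function. Lambda delivers Kinesis records in batches with each payload base64-encoded under `record["kinesis"]["data"]`. That producers write JSON payloads, and the shape of the returned summary, are assumptions made for illustration.

```python
import base64
import json


def handler(event, context):
    """Decode each Kinesis record in the batch and return the parsed payloads."""
    payloads = []
    for record in event.get("Records", []):
        # Kinesis payloads arrive base64-encoded inside the event.
        raw = base64.b64decode(record["kinesis"]["data"])
        # Assumption: producers wrote JSON documents.
        payloads.append(json.loads(raw))
    return {"processed": len(payloads), "payloads": payloads}
```

A real function would typically write the decoded records somewhere (S3, DynamoDB) rather than return them; returning them here keeps the sketch easy to test locally.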
AWS Glue
Serverless ETL Service: Effective for data cleaning and transformation.
Crawler Feature: Automatic schema discovery from data sources.
Key Transformations:
Machine learning transformations (FindMatches).
Supports a variety of data formats.
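The idea behind schema discovery can be illustrated with a toy example. The sketch below infers column types from a handful of sample records. It is not how Glue crawlers work internally (they rely on classifiers); the function name `infer_schema` and the type-widening rules are assumptions chosen only to show the concept.

```python
def infer_schema(records):
    """Infer a column -> type-name mapping from a list of dict records."""
    schema = {}
    for record in records:
        for column, value in record.items():
            if value is None:
                # Remember the column, but let a later non-null value set its type.
                schema.setdefault(column, "null")
                continue
            observed = type(value).__name__
            previous = schema.get(column)
            if previous in (None, "null"):
                schema[column] = observed
            elif previous != observed:
                # Widen int/float mixes to float; anything else falls back to str.
                if {previous, observed} == {"int", "float"}:
                    schema[column] = "float"
                else:
                    schema[column] = "str"
    return schema
```

Given two records where `price` appears once as a float and once as an int, the column is reported as `float`, and a column that is null in one record takes its type from the other.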
Amazon Redshift
Service Type: Fully-managed, petabyte-scale data warehouse.
Performance: Marketed as up to 10x faster than traditional data warehouses, through massively parallel processing (MPP) and columnar storage.
Key Features:
Scaling, backups, and concurrency management.
Storage options: RA3 nodes offering independent scaling of compute and storage.
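Why columnar storage helps analytic queries can be shown with a toy comparison. The sketch below stores the same data row-wise and column-wise and counts how many values an aggregate over one column has to touch. The sample data and function names are hypothetical, and this is a conceptual illustration rather than a description of Redshift's actual storage engine.

```python
def to_columns(rows):
    """Pivot a list of row dicts into a dict of column lists."""
    columns = {}
    for row in rows:
        for name, value in row.items():
            columns.setdefault(name, []).append(value)
    return columns


def sum_row_store(rows, column):
    """Sum one column in a row layout; every field of every row is read."""
    values_read = sum(len(row) for row in rows)
    return sum(row[column] for row in rows), values_read


def sum_column_store(columns, column):
    """Sum one column in a columnar layout; only that column is read."""
    values = columns[column]
    return sum(values), len(values)


# Hypothetical sample data for illustration.
rows = [
    {"order_id": 1, "customer": "a", "amount": 20.0},
    {"order_id": 2, "customer": "b", "amount": 15.5},
    {"order_id": 3, "customer": "a", "amount": 9.5},
]
columns = to_columns(rows)
```

Both layouts return the same total for `amount`, but the row layout reads all nine values while the columnar layout reads only three. The gap grows with the number of columns in the table.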
Amazon QuickSight
Functionality: Fast, serverless business analytics service.
Key Features: Commonly used for dashboards, ad-hoc analysis, and visualizations.
Security: Multi-factor authentication, IAM policies, row-level security.
Exam Preparation Tips
Timing: 65 questions in 170 minutes (~2.6 minutes per question).
Practice: Take practice exams, read the relevant AWS whitepapers, and use AWS training resources.
Day of Exam:
Bring two forms of ID; no notes or electronic devices allowed.
Arrive early and prepared to reduce stress.
Additional Resources
Review the AWS Big Data whitepaper and the documentation for specific services.
Join AWS forums and communities for tips and advice from others who have taken the exam.