Table of Contents
api
#- api gateway throttling limits (sketch below)
- aws throttling limit (region level)
- per account
- per-api per-stage (methods)
- per-client (usage plan)
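- a minimal boto3 sketch of the per-client usage-plan throttle above (api id, key id, and limits are hypothetical):

```python
import boto3

apigw = boto3.client('apigateway')

# per-client throttling: a usage plan caps rate/burst for each api key
plan = apigw.create_usage_plan(
    name='basic-tier',                      # hypothetical plan name
    apiStages=[{
        'apiId': 'a1b2c3d4e5',              # hypothetical api id
        'stage': 'prod',
        # per-api per-stage method override, keyed as "resourcePath/httpMethod"
        'throttle': {'/orders/GET': {'rateLimit': 50.0, 'burstLimit': 100}},
    }],
    throttle={'rateLimit': 100.0, 'burstLimit': 200},   # plan-wide limit
)

# bind a client's api key to the plan
apigw.create_usage_plan_key(usagePlanId=plan['id'],
                            keyId='key-id-here', keyType='API_KEY')
```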
- three types of endpoints
- edge-optimized (default) - routes through the nearest cloudfront edge location
- regional
- private
application discovery service
#- for migration planning
- connection type
asg
#athena
#- performance tuning (sketch below)
- partition data
- compression (glue)
- optimise the file size (aws glue)
- use columnar (apache orc, parquet with spark or hive on EMR)
- avoid select *
- use LIMIT (see the columnar-format guidance)
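- several of these tips can be applied in one CTAS query; a sketch assuming a hypothetical logs_csv table and bucket:

```python
import boto3

athena = boto3.client('athena')

# CTAS: rewrite a csv table as partitioned, snappy-compressed parquet
# (partition columns must come last in the SELECT list)
athena.start_query_execution(
    QueryString="""
        CREATE TABLE logs_parquet
        WITH (format = 'PARQUET',
              parquet_compression = 'SNAPPY',
              external_location = 's3://my-data-lake/logs-parquet/',
              partitioned_by = ARRAY['dt'])
        AS SELECT request_id, status, latency_ms, dt FROM logs_csv
    """,
    QueryExecutionContext={'Database': 'analytics'},
    ResultConfiguration={'OutputLocation': 's3://my-data-lake/athena-results/'},
)
```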
billing
#- cost allocation tags - tags will show in the cost & usage report
- budget - create an alert if cost exceeds the budget
- setup for cost analysis
- enable cost allocation tags in billing
- allow users to access billing (activate IAM user access to billing)
- cost allocation tags => tags
- user-defined => user:XXXX
- aws generated => aws:XXXX
- cost explorer => ui for search and filtering
- cost category => filter in cost explorer (saved filter)
- cost budget => alarm on forecasted charges + filtering + linked accounts (sketch below); billing alert => only amounts already charged
- billing alerts include recurring fees (eg premium support) and ec2 instance-hours
- billing alerts exclude one-off fees, refunds, and forecasts
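- a sketch of a cost budget alerting on forecasted spend (account id, amount, and email are placeholders):

```python
import boto3

budgets = boto3.client('budgets')

budgets.create_budget(
    AccountId='111122223333',               # placeholder account id
    Budget={
        'BudgetName': 'monthly-cost',
        'BudgetType': 'COST',
        'TimeUnit': 'MONTHLY',
        'BudgetLimit': {'Amount': '500', 'Unit': 'USD'},
    },
    NotificationsWithSubscribers=[{
        # FORECASTED = alert before the money is spent (vs ACTUAL)
        'Notification': {'NotificationType': 'FORECASTED',
                         'ComparisonOperator': 'GREATER_THAN',
                         'Threshold': 80.0, 'ThresholdType': 'PERCENTAGE'},
        'Subscribers': [{'SubscriptionType': 'EMAIL',
                         'Address': 'ops@example.com'}],
    }],
)
```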
- access control
- reasons for cfn s3 access denied errors
- s3 block public access must be turned off when access relies on a public-read policy/acl - block public access overrides permissions that allow public reads
- if requester pays is turned on, the request must include the x-amz-request-payer header
- object cannot be kms encrypted
cfn
#cloudhsm
#cloudtrail
#- best practice to migrate to org trail
- create org trail in central account
- create bucket for org (need to set bucket policy to allow member account to write to it)
- enable cloudtrail feature in org
- create the org trail through the cli (boto3 sketch below)
- move old trail data from member accounts to org trail bucket
- stop cloudtrail in member accounts and remove the old trail buckets
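- the enable/create steps sketched with boto3, run from the central (management) account; trail and bucket names are hypothetical:

```python
import boto3

# enable the cloudtrail org feature (trusted access) first
boto3.client('organizations').enable_aws_service_access(
    ServicePrincipal='cloudtrail.amazonaws.com')

ct = boto3.client('cloudtrail')
ct.create_trail(
    Name='org-trail',
    S3BucketName='org-trail-logs',      # bucket policy must allow cloudtrail writes
    IsMultiRegionTrail=True,
    IsOrganizationTrail=True,           # applies the trail to all member accounts
)
ct.start_logging(Name='org-trail')
```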
codecommit
#- data protection
- use macie => can help protect data in s3
codedeploy
#- need to connect to s3 and codedeploy endpoints
- cw embedded metric format => can automatically create metrics from logs (sketch below)
- cw endpoints => monitoring.us-east-2.amazonaws.com
- treats each unique combination of dimensions as a separate metric, even if the metrics have the same metric name
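- a sketch of an emf log line (namespace and names are hypothetical); printing this json, eg from lambda, is enough for cloudwatch to create the metric with no PutMetricData call:

```python
import json
import time

def log_latency(latency_ms: float) -> None:
    # each unique value of the "Service" dimension becomes a separate metric,
    # even though the metric name "Latency" is the same
    print(json.dumps({
        "_aws": {
            "Timestamp": int(time.time() * 1000),
            "CloudWatchMetrics": [{
                "Namespace": "MyApp",
                "Dimensions": [["Service"]],
                "Metrics": [{"Name": "Latency", "Unit": "Milliseconds"}],
            }],
        },
        "Service": "checkout",
        "Latency": latency_ms,
    }))

log_latency(123.0)
```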
data pipeline
#data sync
#ddb
#eb
#- can stop/start an eb environment with a scheduled lambda
- doesn’t support HTTPS_PROXY
ebs
#- aws only recommends raid 0
- summary table for different volume types
- gp2
- range: 100-16k iops
- baseline: based on volume size (3 iops/gb, limited by burst credits)
- provision: no
- gp3
- range: 3k-16k iops
- baseline: consistent 3k iops
- provision: up to 500 iops/gb (sketch at the end of this section)
- io2, io1
- range: 100-32k iops; 32k-64k iops only on nitro-based instances
- provision: io1: 50iops/gb; io2: 500iops/gb
- io2 block express
- instance store - temporary block-level storage (physically attached to the host, so not a network drive) (only on specific instance types; included in the instance price)
- i/o performance is also capped by the ec2 instance type - raid 0 can raise volume iops, but the instance-level maximum still applies
- recommended queue length on ssd: 1 per 1000 iops
- default block size is 4kb
- use case of each volume type
- gp2, gp3 => boot, dev, test
- io1, io2 => db
- st1 (hdd) => large sequential workloads like data / log processing (EMR, ETL, data warehouse)
- sc1 (cold hdd) => lowest cost for infrequently accessed data
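- the gp3 numbers above as a volume-creation call (size/iops/throughput are arbitrary examples):

```python
import boto3

ec2 = boto3.client('ec2')

# gp3: consistent 3k iops baseline regardless of size; extra iops
# provisionable up to 500 iops per GiB and the 16k cap
ec2.create_volume(
    AvailabilityZone='us-east-2a',
    Size=200,            # GiB
    VolumeType='gp3',
    Iops=6000,           # within 200 GiB * 500 iops/GiB and the 16k cap
    Throughput=500,      # MiB/s, gp3 only
)
```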
ec2
#- use cases of dual-homed instances (multiple enis)
- separate the traffic by role (frontend, backend)
- ha (move the eni to another instance; sketch below)
- security appliance reason
- an eni is bound to a subnet (and therefore an az)
- when created, an eni inherits the public ipv4 addressing attribute from its subnet
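- a sketch of the ha pattern (moving an eni to a standby instance in the same az); ids are placeholders:

```python
import boto3

ec2 = boto3.client('ec2')

def fail_over_eni(eni_id: str, standby_instance_id: str) -> None:
    eni = ec2.describe_network_interfaces(
        NetworkInterfaceIds=[eni_id])['NetworkInterfaces'][0]
    attachment = eni.get('Attachment')
    if attachment:
        ec2.detach_network_interface(
            AttachmentId=attachment['AttachmentId'], Force=True)
        # wait until the eni is 'available' before re-attaching
        ec2.get_waiter('network_interface_available').wait(
            NetworkInterfaceIds=[eni_id])
    # attach as a secondary interface on the standby
    ec2.attach_network_interface(NetworkInterfaceId=eni_id,
                                 InstanceId=standby_instance_id,
                                 DeviceIndex=1)

fail_over_eni('eni-0123456789abcdef0', 'i-0123456789abcdef0')
```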
efs
#- HA - regional replication => note that this means multi-az, not multi-region
- cross region backup
- create a 1st lambda to back up data from efs to s3 in region a; turn on s3 cross-region replication; create a 2nd lambda to restore data from s3 to efs in region b (sketch after this list)
- data sync
- backup solution which does not work in cross region
- data pipeline => the backup instance cannot mount two efs file systems in different regions
- efs-to-efs => same as data pipeline solution but implemented by lambda function only
- aws backup => does not support cross region backup
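- a sketch of the 1st lambda from the flow above, assuming the function has an efs access point mounted at /mnt/efs and a hypothetical backup bucket with cross-region replication enabled (the 2nd lambda is the mirror image, downloading from s3 into the region-b mount):

```python
import os
import boto3

s3 = boto3.client('s3')
MOUNT = '/mnt/efs'                 # efs access point mounted into the function
BUCKET = 'efs-backup-region-a'     # hypothetical bucket, CRR to region b

def handler(event, context):
    # walk the file system and mirror every file into s3
    for root, _dirs, files in os.walk(MOUNT):
        for name in files:
            path = os.path.join(root, name)
            key = os.path.relpath(path, MOUNT)
            s3.upload_file(path, BUCKET, key)
```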
- dns name - file-system-id.efs.aws-region.amazonaws.com (region-based, like the cw endpoint, eg us-east-2)
- efs can deliver sub-millisecond to low single-digit millisecond latencies, with > 10gbps throughput and 500k iops
- ec2 instance launches are limited by the number of running vcpus per account per region
elasticache
#elb
#iam
#mTurk
#- submit a request to mTurk to outsource manual tasks like taking surveys, text recognition, or data migration to a public workforce
opsworks
#org
#- org features to enable
- scp is one of the aws organizations features
- default FullAWSAccess allows everything => so scps are usually written as a deny list
- to use an allow list => must remove FullAWSAccess (the default allow-all policy); sketch below
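- a deny-list scp sketched with boto3 (policy content and target ou id are examples):

```python
import json
import boto3

org = boto3.client('organizations')

# with FullAWSAccess left attached, everything is allowed; this policy
# carves out one explicit deny (deny-list style)
policy = org.create_policy(
    Name='deny-leave-org',
    Description='Stop member accounts from leaving the organization',
    Type='SERVICE_CONTROL_POLICY',
    Content=json.dumps({
        'Version': '2012-10-17',
        'Statement': [{'Effect': 'Deny',
                       'Action': 'organizations:LeaveOrganization',
                       'Resource': '*'}],
    }),
)
org.attach_policy(PolicyId=policy['Policy']['PolicySummary']['Id'],
                  TargetId='ou-ab12-cdef3456')   # example OU id
```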
other
#rds
#route53
#s3
#- access control: user a (account a) requests object c (owned by account c) via a bucket in account b
- check the iam role in account a
- check the bucket policy in account b
- check the object acl set by the object owner (account c)
- event notifications work at object and bucket level, but delivery can be duplicated and sometimes delayed, so people use cloudwatch events instead
- trusted advisor has a check for open-access s3 buckets but no remediation; to fix bucket permissions automatically, use lambda + cloudwatch events (sketch below)
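- a remediation sketch: a lambda that re-applies block public access; the event shape assumes a hypothetical cloudtrail-based cloudwatch/eventbridge rule forwarding the bucket name:

```python
import boto3

s3 = boto3.client('s3')

def handler(event, context):
    # hypothetical: a rule on cloudtrail PutBucketAcl / PutBucketPolicy
    # calls forwards the bucket name here
    bucket = event['detail']['requestParameters']['bucketName']
    s3.put_public_access_block(
        Bucket=bucket,
        PublicAccessBlockConfiguration={
            'BlockPublicAcls': True,
            'IgnorePublicAcls': True,
            'BlockPublicPolicy': True,
            'RestrictPublicBuckets': True,
        },
    )
```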
- cloudfront cannot cache objects larger than 30gb; use range requests to split a large file into smaller cacheable chunks
- requester pays doesn't support 1. anonymous requests 2. SOAP requests (sketch below)
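- the payer header from the authenticated side, as boto3 sets it (bucket/key are hypothetical):

```python
import boto3

s3 = boto3.client('s3')

# RequestPayer adds the x-amz-request-payer header; anonymous (unsigned)
# requests are rejected on requester-pays buckets
obj = s3.get_object(Bucket='shared-dataset',     # hypothetical bucket
                    Key='genomes/sample1.vcf',
                    RequestPayer='requester')
data = obj['Body'].read()
```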
- bucket quota: default 100, max 1000 per account
- genomics data processing use case
- sync data to s3 with data sync
- use s3 for data storage
- use storage gateway (on-premise access) / fsx (ec2 access)
- s3 encryption - supports symmetric kms keys only
- when downloading a client-side encrypted s3 object, the client downloads the encrypted object along with a cipher-blob version of the data key; it sends the cipher blob to kms to get the plaintext data key, then decrypts the object (sketch below)
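- the decrypt half of that flow, sketched for a client-side envelope scheme where the cipher-blob data key is stored next to the object (layout and names are hypothetical):

```python
import boto3

s3 = boto3.client('s3')
kms = boto3.client('kms')

# 1. download the encrypted object and the encrypted (cipher blob) data key
ciphertext = s3.get_object(Bucket='vault', Key='report.bin.enc')['Body'].read()
key_blob = s3.get_object(Bucket='vault', Key='report.bin.key')['Body'].read()

# 2. ask kms for the plaintext data key
plaintext_key = kms.decrypt(CiphertextBlob=key_blob)['Plaintext']

# 3. decrypt locally with the data key (eg AES-GCM via the `cryptography`
#    package) - symmetric keys only, per the note above
```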
- reduced redundancy is an s3 storage class but is not recommended by aws - there is a chance of losing objects
secrets manager
#- set a RotationSchedule to schedule auto-rotation of the rds password via a custom lambda (sketch below)
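- the same rotation configured via the api rather than cloudformation; secret id and lambda arn are placeholders:

```python
import boto3

sm = boto3.client('secretsmanager')

sm.rotate_secret(
    SecretId='prod/rds/master-password',     # placeholder secret
    RotationLambdaARN='arn:aws:lambda:us-east-2:111122223333:function:rds-rotator',
    RotationRules={'AutomaticallyAfterDays': 30},
)
```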
snowball
#- tips to increase performance
- batch small files (tar sketch below)
- run multiple copy operations at once (eg 2 terminals, 2 cp commands)
- connect multiple workstations (1 snowball can serve multiple workstations)
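- the small-file batching tip sketched as a tar step before the copy (paths are hypothetical):

```python
import tarfile
from pathlib import Path

# bundle many small files into one archive so the snowball client
# transfers a single large object instead of thousands of tiny ones
src = Path('/data/small-files')              # hypothetical source dir
with tarfile.open('/staging/batch-0001.tar', 'w') as tar:
    for f in src.rglob('*'):
        if f.is_file():
            tar.add(f, arcname=f.relative_to(src))
```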
- step for using snowball
- start the snowball
- set up a workstation by downloading the ova image and importing it into vmware
- use the snowball client's cp command (syntax similar to aws s3 cp) to copy files to the snowball
- can upload through gui / command line
- send the device back to aws. they will import your data to s3
- takes at least 1 week
sso
#- permission sets - 1 permission set holds multiple iam policies => associated with users / groups
- sso
- ad (identity provider) -> aws sso -> application (github, dropbox) / aws accounts
- sources of identity provider
- aws sso
- ad connector
- aws managed ad
- external ad (two way trust)
- server -> client
- server = adfs
- create an app
- configure the app's sign-in and sign-out urls
- client = integrated website
- trusted idp
- configure the idp's sign-in and sign-out urls + cert
- user logs in at ad's app endpoint => ad posts data to the app's sign-in url => the app receives and decrypts the data from ad and grants the user permissions
- aws iam federation => single account only
storage gateway
#- need to download an ova and import the vm to create an endpoint bridging on-premises and aws
- storage gateway type
- volume => mounted as a disk (iscsi), backed by s3, restorable as ebs snapshots
- cached => primary data in s3; frequently used data cached on the local vm
- stored => full dataset kept on premises; asynchronously backed up to s3
- file => smb / nfs
- tape => tape backup software
support
#swf
#vpc
#workspaces
#- use connection aliases for cross-region workspaces redirection (sketch below)
- create connection alias
- share to other account
- associate with directories in each region
- setup route53 for failover
- setup connection string
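- the create/associate steps sketched with boto3 (connection string and directory ids are placeholders):

```python
import boto3

CONN_STR = 'desktop.example.com'   # placeholder connection string

primary = boto3.client('workspaces', region_name='us-east-1')
failover = boto3.client('workspaces', region_name='us-west-2')

# create an alias with the same connection string in each region,
# then associate it with that region's directory
for client, directory in ((primary, 'd-9067xxxxxx'),     # placeholder ids
                          (failover, 'd-9267xxxxxx')):
    alias = client.create_connection_alias(ConnectionString=CONN_STR)
    client.associate_connection_alias(AliasId=alias['AliasId'],
                                      ResourceId=directory)

# cross-account sharing (if a directory lives in another account) uses
# update_connection_alias_permission; route53 failover records on CONN_STR
# complete the redirection setup
```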
- maintenance - supports regular maintenance windows (eg 15:00-16:00) or manual maintenance, but not custom schedules like patching on tue 03:00
- workspaces application manager - package manager that helps install software
- workspaces supports Windows 10 desktops but not Windows Server