50 AWS Solutions Architect Interview Questions for 2026 (with Real Answers)
The exact questions AWS hiring managers ask in 2026 — VPC, S3, EC2, RDS, security, scaling, well-architected pillars, and the design scenarios that separate seniors from juniors.

AWS Solutions Architect interviews in 2026 are a mix of service-level depth, design whiteboarding, and the famous "design Netflix on AWS" type questions. If your CV says SAA-C03, expect interviewers to assume you know the services and probe the design judgement layer. This list covers both.
Tip: The biggest differentiator between juniors and seniors in AWS interviews is whether you can articulate trade-offs. Every architecture has them. Saying "I would use S3 because it is cheap and durable" is junior. Saying "I would use S3 Standard for the first 30 days, then lifecycle to Glacier Deep Archive for retention, accepting a 12-hour retrieval SLA for the cost savings" is senior.
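The lifecycle policy in that senior answer can be written down concretely. Here is a minimal sketch that builds the rule as a Python dict in the shape boto3's put_bucket_lifecycle_configuration accepts. The rule ID and the empty prefix (whole bucket) are assumptions for illustration.

```python
import json


def archive_after_30_days(rule_id="archive-after-30-days"):
    """Build a lifecycle config: S3 Standard for 30 days, then Glacier Deep Archive."""
    return {
        "Rules": [
            {
                "ID": rule_id,             # hypothetical rule name
                "Status": "Enabled",
                "Filter": {"Prefix": ""},  # assumption: rule applies to every object
                "Transitions": [
                    {"Days": 30, "StorageClass": "DEEP_ARCHIVE"},
                ],
            }
        ]
    }


lifecycle = archive_after_30_days()
print(json.dumps(lifecycle, indent=2))
```

The dict would be passed as the LifecycleConfiguration argument. No AWS call is made here, so the sketch runs without credentials.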
How to Prepare
- Pass SAA-C03 first. The certification is interview signal #1. If you have it, the technical bar is assumed met.
- Practice with our free SAA-C03 AI practice exam — the scenario style is exactly what interviews ask.
- Whiteboard 5 reference architectures cold: three-tier web app, serverless API, data lake, multi-region active-active, hybrid VPN/Direct Connect.
- Read AWS Well-Architected pillar whitepapers. Senior interviewers quote them directly.
- Bring stories. "Tell me about a system you designed" — have two ready, with the trade-offs you accepted.
Foundations & Well-Architected (Q1-Q8)
Q1. What are the six pillars of the AWS Well-Architected Framework?
Operational Excellence, Security, Reliability, Performance Efficiency, Cost Optimization, Sustainability. The senior answer also mentions that pillars trade off — e.g. multi-AZ improves Reliability but increases Cost — and that the framework provides a structured way to make those trade-offs explicit.
Q2. Region vs Availability Zone vs Edge Location?
Region: geographic area (e.g. us-east-1). AZ: one or more data centers within a region with redundant power and networking. Edge location: CloudFront PoPs near end users for caching and Lambda@Edge. Design multi-AZ for HA within a region, multi-region for DR or latency-sensitive global users.
Q3. What is the difference between scalability and elasticity?
Scalability is the ability to handle increased load — vertical (bigger instance) or horizontal (more instances). Elasticity is automatic scaling up and down based on demand. AWS is elastic by design; on-prem is scalable but rarely elastic.
Q4. Explain stateful vs stateless architecture.
Stateless: any instance can serve any request because state is externalized (DB, cache, session store). Stateless tiers scale horizontally trivially. Stateful: instances hold local state (in-memory session, local disk). Stateful tiers need session affinity or careful replication. Modern AWS designs externalize state to ElastiCache, DynamoDB, or S3.
Q5. What is the AWS Shared Responsibility Model?
AWS is responsible for security of the cloud (physical data centers, hypervisor, managed service internals). Customer is responsible for security in the cloud (IAM, network config, OS patching for self-managed, encryption choices, data classification). The line moves depending on service: EC2 customer responsibility is large; Lambda customer responsibility is small.
Q6. What are AWS Organizations and Service Control Policies (SCPs)?
Organizations is a multi-account management service with consolidated billing and hierarchy (Organization Root → OUs → Accounts). SCPs are guardrails attached to the root, an OU, or an individual account; they set the maximum permissions available in the accounts beneath them. An SCP cannot grant access; it can only restrict what IAM otherwise allows.
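To make the "restrict, never grant" point concrete, here is a small sketch of a region-restriction SCP plus a toy model of how an SCP combines with IAM. The list of exempted global services and the helper names are assumptions for illustration; a real guardrail needs an exemption list reviewed for your own organization.

```python
import json


def region_guardrail(allowed_regions):
    """Build an SCP that denies requests outside the approved regions."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "DenyOutsideApprovedRegions",
                "Effect": "Deny",
                # Assumption: an illustrative subset of global services to exempt.
                "NotAction": ["iam:*", "organizations:*", "route53:*",
                              "cloudfront:*", "sts:*", "support:*"],
                "Resource": "*",
                "Condition": {
                    "StringNotEquals": {"aws:RequestedRegion": allowed_regions}
                },
            }
        ],
    }


def effective_allow(iam_allows, scp_allows):
    """Toy model: an action needs an IAM allow AND must not be blocked by the SCP."""
    return iam_allows and scp_allows


scp = region_guardrail(["eu-west-1", "eu-central-1"])
print(json.dumps(scp, indent=2))
print(effective_allow(iam_allows=False, scp_allows=True))  # False: the SCP grants nothing
```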
Q7. AWS Landing Zone / Control Tower — when to use?
Control Tower automates landing-zone setup: multi-account structure, baseline guardrails, centralized logging, identity federation, and account vending. Use whenever you have more than 3-4 AWS accounts. Senior architects also know about Customizations for Control Tower (CfCT) and AFT (Account Factory for Terraform).
Q8. Hybrid cloud connectivity options on AWS?
Site-to-Site VPN (IPsec over internet, quick to set up, lower bandwidth), AWS Direct Connect (dedicated fibre, predictable performance, longer to provision), Transit Gateway (hub for multi-VPC + on-prem), Cloud WAN (managed global network). Most enterprises use Direct Connect with VPN as backup.
Networking & VPC (Q9-Q15)
Q9. Walk me through the components of a VPC.
CIDR block, subnets (one per AZ for HA), route tables, internet gateway (for public subnets), NAT gateway (for private subnets outbound), security groups (instance-level firewall), NACLs (subnet-level), VPC endpoints (private connectivity to AWS services), VPC peering, Transit Gateway attachments. Mention IPv6 dual-stack for senior roles.
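The CIDR and per-AZ subnet layout is easy to sketch with Python's standard ipaddress module. The 10.0.0.0/16 range, the /20 subnet size, and the AZ names below are assumptions chosen for illustration.

```python
import ipaddress


def plan_subnets(vpc_cidr, azs, new_prefix=20):
    """Carve one public and one private subnet per AZ out of the VPC CIDR."""
    blocks = ipaddress.ip_network(vpc_cidr).subnets(new_prefix=new_prefix)
    plan = {}
    for az in azs:
        # Take the next two free blocks for this AZ.
        plan[az] = {"public": str(next(blocks)), "private": str(next(blocks))}
    return plan


plan = plan_subnets("10.0.0.0/16", ["us-east-1a", "us-east-1b"])
for az, subnets in plan.items():
    print(az, subnets)
```

A /16 split into /20 blocks leaves 12 of the 16 blocks unused here, which is the headroom you want for a third AZ or extra tiers later.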
Q10. Public subnet vs private subnet?
Public subnet: route table has 0.0.0.0/0 → Internet Gateway. Instances with public IP can reach internet directly. Private subnet: route table has 0.0.0.0/0 → NAT Gateway. Instances have no public IP; outbound internet via NAT, inbound only through ALB/NLB in the public subnet.
Q11. Security group vs NACL?
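Security group: stateful, instance-level, allow rules only. NACL: stateless, subnet-level, allow + deny rules. Inbound traffic crosses the subnet boundary first, so the NACL is evaluated before the security group; outbound traffic passes the security group first, then the NACL. Because NACLs are stateless, return traffic must be allowed explicitly in the opposite direction. Use SGs for application-level rules and NACLs for coarse subnet-wide deny lists.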
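A toy model helps when explaining the rule semantics on a whiteboard. The sketch below is a simplification, not how AWS implements either feature: it matches on port only, and the rule tuples and function names are hypothetical.

```python
def nacl_allows(rules, port):
    """NACL: rules are checked in ascending rule number, first match wins."""
    for _number, low, high, action in sorted(rules):
        if low <= port <= high:
            return action == "allow"
    return False  # implicit deny when nothing matches


def sg_allows(allowed_ports, port):
    """Security group: allow rules only, anything not listed is denied."""
    return port in allowed_ports


def inbound_permitted(nacl_rules, sg_ports, port):
    """Inbound traffic must pass both the subnet NACL and the instance SG."""
    return nacl_allows(nacl_rules, port) and sg_allows(sg_ports, port)


# (rule number, port range start, port range end, action)
nacl = [(200, 0, 65535, "allow"), (100, 22, 22, "deny")]
sg = {22, 443}

print(inbound_permitted(nacl, sg, 443))  # True
print(inbound_permitted(nacl, sg, 22))   # False: NACL rule 100 denies SSH
```

The SSH case shows why NACLs suit coarse deny lists: the security group would have allowed port 22, but the lower-numbered NACL deny wins for the whole subnet.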
Q12. VPC peering vs Transit Gateway?
VPC peering is point-to-point and does not transit (A<->B and B<->C does not give A<->C). Transit Gateway is a hub-and-spoke router connecting many VPCs, VPN, Direct Connect Gateways. Use TGW once you have more than a few VPCs. TGW also supports inspection patterns with Network Firewall.
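Two quick calculations back up the "more than a few VPCs" rule of thumb. The sketch below models peering as direct links only and counts the connections a full mesh would need. The VPC names are hypothetical.

```python
from itertools import combinations


def can_talk_via_peering(peerings, a, b):
    """Peering is non-transitive: only a direct connection between a and b counts."""
    return frozenset((a, b)) in {frozenset(p) for p in peerings}


def full_mesh_peerings(vpcs):
    """Number of peering connections needed so every VPC reaches every other."""
    return len(list(combinations(vpcs, 2)))


peerings = [("A", "B"), ("B", "C")]
print(can_talk_via_peering(peerings, "A", "B"))  # True
print(can_talk_via_peering(peerings, "A", "C"))  # False: no transit through B

vpcs = [f"vpc-{i}" for i in range(10)]
print(full_mesh_peerings(vpcs))  # 45 peerings, versus 10 Transit Gateway attachments
```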
Q13. What is a VPC endpoint?
Private connection to AWS services without traversing the internet. Gateway endpoint (free) for S3 and DynamoDB. Interface endpoint (paid, powered by PrivateLink) for most other services and SaaS partners. Reduces egress cost and improves security.
Q14. NAT Gateway vs NAT Instance?
NAT Gateway is managed, scales automatically from 5 Gbps up to 100 Gbps, and is highly available within a single AZ (deploy one per AZ). NAT Instance is a self-managed EC2 instance: cheaper at low volumes, but you handle patching, HA and scaling. Default to NAT Gateway in production.
Q15. ALB vs NLB vs CLB vs Gateway Load Balancer?
ALB: Layer 7 (HTTP/HTTPS), path/host-based routing, WAF integration, OIDC auth. NLB: Layer 4 (TCP/UDP/TLS), millions of requests per second, static IP, preserves source IP. CLB: legacy, avoid. Gateway Load Balancer: for inserting third-party network appliances (firewalls, IDS/IPS) into the data path.
Compute & Scaling (Q16-Q22)
Q16. EC2 instance types — how do you pick?
General purpose (M-series) for balanced workloads, Compute optimized (C-series) for CPU-heavy, Memory optimized (R/X) for in-memory dbs, Storage optimized (I/D) for high IOPS, Accelerated (P/G/Inf) for GPU/ML. Pick based on benchmarks of your workload, not guesses. Use Graviton (ARM, "g" suffix) for 20-40% better price/performance on supported workloads.
Q17. On-Demand vs Reserved vs Savings Plans vs Spot?
On-Demand: pay as you go, no commitment. Reserved Instances: 1- or 3-year commitment for 30-70% discount, tied to instance family. Savings Plans: 1- or 3-year commitment, more flexible (compute or EC2-family). Spot: spare capacity, up to 90% discount, can be reclaimed with 2-minute warning. Mix for steady + variable load.
Q18. Auto Scaling Group — what scaling policies exist?
Target tracking (e.g. keep CPU at 60%), step scaling (different actions based on threshold breach magnitude), simple scaling (single action), scheduled scaling (predictable load), predictive scaling (ML-driven, looks at historical patterns). Combine target tracking + scheduled for the common case.
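Target tracking is easier to reason about with the proportional rule behind it written out. The sketch below is a simplified model under stated assumptions: the real service also applies warm-up periods and scales in more conservatively, and the min/max bounds here are hypothetical.

```python
import math


def desired_capacity(current, metric, target, minimum=1, maximum=20):
    """Scale capacity in proportion to how far the metric is from its target."""
    raw = math.ceil(current * metric / target)
    return max(minimum, min(maximum, raw))  # clamp to the group's bounds


print(desired_capacity(4, 90.0, 60.0))    # 6: CPU at 90% against a 60% target
print(desired_capacity(4, 30.0, 60.0))    # 2: load halved, so capacity halves
print(desired_capacity(10, 150.0, 60.0))  # 20: capped by the group maximum
```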
Q19. Launch template vs Launch configuration?
Launch templates are the modern, versionable replacement. Support newer features (mixed instance types, Spot allocation strategies, T2/T3 unlimited). Launch configurations are deprecated — always use templates.
Q20. EC2 vs ECS vs EKS vs Fargate vs Lambda — how do you choose?
EC2: full control, traditional. ECS: AWS-managed container orchestration, simpler than k8s. EKS: managed Kubernetes for portability. Fargate: serverless containers, no node management (works with both ECS and EKS). Lambda: function-level, event-driven. Decision tree: stateless event-driven small workloads → Lambda; container-native → Fargate; need full k8s ecosystem → EKS; legacy server-based → EC2.
Q21. ECS Fargate vs ECS on EC2?
Fargate: no servers to manage, pay per task, faster scaling. EC2: cheaper at sustained scale and more flexible (GPU, custom AMIs, larger volumes), but requires capacity provider planning. Default to Fargate for new workloads; switch to EC2 only if cost or feature reasons warrant.
Q22. Lambda cold start — what is it and how to mitigate?
The first invocation of a Lambda function after a period of inactivity, or on a newly scaled-out execution environment, has higher latency (50ms-2s depending on runtime + memory + VPC). Mitigations: provisioned concurrency, smaller deployment packages, faster runtimes (Go, Rust, Node), avoid VPC unless needed, Lambda SnapStart (launched for Java, since extended to Python and .NET).
Storage (Q23-Q28)
Q23. S3 storage classes — pick one for each scenario.
Frequent access → Standard. Unknown / mixed access → Intelligent-Tiering. Infrequent 30+ days → Standard-IA. Non-critical backups → One Zone-IA. Archive with millisecond retrieval → Glacier Instant Retrieval. Archive with hours-OK → Glacier Flexible Retrieval. Compliance retention 12h+ retrieval acceptable → Glacier Deep Archive.
Q24. How do you secure an S3 bucket?
Block Public Access (account + bucket level), bucket policy and IAM policies (least privilege), VPC endpoint for VPC-only access, encryption at rest (SSE-S3 / SSE-KMS), enforce HTTPS via bucket policy, MFA Delete on critical buckets, versioning + S3 Object Lock for ransomware protection, CloudTrail data events for audit, Macie for sensitive data discovery.
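Of the controls listed, "enforce HTTPS via bucket policy" is the one interviewers most often ask you to write out. A minimal sketch, with a hypothetical bucket name:

```python
import json


def deny_insecure_transport(bucket):
    """Bucket policy that rejects any request not made over TLS."""
    arn = f"arn:aws:s3:::{bucket}"
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "DenyInsecureTransport",
                "Effect": "Deny",
                "Principal": "*",
                "Action": "s3:*",
                "Resource": [arn, f"{arn}/*"],  # the bucket and every object in it
                "Condition": {"Bool": {"aws:SecureTransport": "false"}},
            }
        ],
    }


policy = deny_insecure_transport("example-audit-logs")  # hypothetical bucket
print(json.dumps(policy, indent=2))
```

An explicit Deny is used because it overrides any Allow elsewhere, so the rule holds regardless of which IAM policies are attached to the caller.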
Q25. EBS volume types?
gp3 (general purpose SSD, default), gp2 (older), io2 / io2 Block Express (high IOPS SSD for dbs), st1 (throughput HDD), sc1 (cold HDD). gp3 decouples IOPS/throughput from size; for new workloads default to gp3.
Q26. EBS vs EFS vs FSx?
EBS: block storage confined to one AZ and attached to a single instance (io1/io2 Multi-Attach allows several instances, still within the same AZ). EFS: NFS file system, multi-AZ, mounted by multiple instances. FSx: managed file systems (Windows File Server, Lustre, NetApp ONTAP, OpenZFS). Use EBS for OS/database, EFS for shared content, FSx for specialty workloads.
Q27. How do you migrate large data to AWS?
Direct upload for under a terabyte, AWS DataSync for online sync (filesystem to S3/EFS), Snowball (50/80 TB physical appliance), Snowmobile (PB, decommissioned 2024), Snowcone (small portable), S3 Transfer Acceleration for global uploads to a central bucket. Choose based on data size and network bandwidth.
Q28. S3 Cross-Region Replication vs Same-Region Replication — when?
CRR for DR, compliance with data residency, lower-latency reads in another region. SRR for log aggregation across accounts, sandbox-to-prod copies, cross-team access. Both require versioning on both buckets, increase storage cost ~2x for replicated objects.
Database (Q29-Q34)
Q29. RDS vs Aurora?
RDS is managed traditional DB engines (MySQL, Postgres, MariaDB, Oracle, SQL Server). Aurora is AWS-built and MySQL/Postgres-compatible, with a separated storage layer, up to 5x MySQL / 3x Postgres throughput by AWS's own figures, and automatic 6-way replication across 3 AZs. Aurora costs more but is the default for production-grade Postgres/MySQL on AWS.
Q30. RDS Multi-AZ vs Read Replicas?
Multi-AZ is for HA — synchronous standby in another AZ, automatic failover, not for read traffic. Read replicas are for scaling read throughput — asynchronous, can be promoted manually, can be cross-region. Use both: Multi-AZ for HA + replicas for read scaling.
Q31. When to use DynamoDB?
Single-digit millisecond latency at any scale, predictable performance, key-value or document workload, serverless billing if access is spiky (on-demand mode). Anti-patterns: complex relational queries, large blobs, ad-hoc analytics. Use DynamoDB Streams + Lambda for change feeds, Global Tables for multi-region active-active.
Q32. DynamoDB partition key design?
Choose a partition key with high cardinality to avoid hot partitions. Composite key (partition + sort) gives you query flexibility. Use Global Secondary Indexes (GSI) for alternative access patterns. Pre-compute aggregates when possible; do not scan the table.
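A short sketch of what a composite key looks like in practice. The CUSTOMER#/ORDER# prefixes and attribute names are a common convention used here as an assumption. The second helper shows write sharding, an additional technique not covered in the answer above, for keys that would otherwise be hot.

```python
import hashlib


def order_key(customer_id, order_date, order_id):
    """Composite key: partition by customer, sort by date so range queries work."""
    return {
        "PK": f"CUSTOMER#{customer_id}",
        "SK": f"ORDER#{order_date}#{order_id}",
    }


def sharded_pk(base, item_id, shards=10):
    """Write sharding: spread one hot key over N suffixes, stable per item."""
    digest = hashlib.sha256(item_id.encode()).hexdigest()
    return f"{base}#{int(digest, 16) % shards}"


print(order_key("42", "2026-01-15", "o-1001"))
print(sharded_pk("EVENTS#2026-01-15", "click-789"))
```

With this layout, "all orders for customer 42 in January" is a single Query on PK with a begins_with condition on SK, not a Scan.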
Q33. ElastiCache Redis vs Memcached?
Redis supports persistence, pub/sub, transactions, complex data structures, replication, HA via Multi-AZ. Memcached is simpler, multithreaded, no persistence, no replication. Default to Redis unless you have a specific reason for Memcached.
Q34. When to use Redshift?
Petabyte-scale data warehouse for analytics: SQL queries over columnar storage. Use for BI dashboards, reporting, batch analytics. Redshift Serverless removes capacity planning. For ad-hoc SQL on S3 use Athena; for streaming analytics use Amazon Managed Service for Apache Flink (MSF, formerly Kinesis Data Analytics).
Security & IAM (Q35-Q40)
Q35. IAM user vs role vs group?
User: long-lived identity with credentials — avoid for application use. Group: collection of users for permission management. Role: assumed identity with temporary credentials via STS — used by services, federated users, and cross-account access. Modern best practice: no IAM users for humans (SSO via Identity Center), no IAM users for workloads (IRSA / Pod Identity / EC2 instance role).
Q36. What is the principle of least privilege in AWS?
Grant only the permissions required, scoped as narrowly as possible. Tools: IAM Access Analyzer (flags unused access and external sharing) and service last-accessed data (the Access Advisor tab in the IAM console). In practice, start from a working policy and tighten it after observing actual usage.
Q37. KMS vs Secrets Manager vs Parameter Store?
KMS: key management service, used to encrypt data and other secrets. Secrets Manager: managed rotation of credentials, $0.40/secret/month. Parameter Store: free hierarchical key-value with encryption-at-rest via KMS, no rotation. Use Secrets Manager for DB credentials (rotation matters); Parameter Store for config and feature flags.
Q38. What is AWS WAF and when do you use it?
Web Application Firewall: managed rule groups covering common OWASP Top 10 attacks, rate limiting, geo-blocking, bot control. Attach to CloudFront, ALB, API Gateway, AppSync, App Runner, or a Cognito user pool. Combine with AWS Shield (DDoS protection, Standard free, Advanced paid).
Q39. GuardDuty vs Security Hub vs Inspector?
GuardDuty: threat detection from CloudTrail, VPC Flow Logs, DNS logs. Security Hub: central dashboard aggregating findings + CIS/PCI compliance scoring. Inspector: vulnerability assessment for EC2, ECR images, Lambda. Three different layers — enable all in mature environments.
Q40. How do you secure cross-account access in AWS Organizations?
Use IAM roles with sts:AssumeRole and external IDs (for third-party); centralized identity provider (AWS Identity Center / Okta) federated to each account; SCPs at the org level to block dangerous regions/services; CloudTrail aggregation to a central audit account; Control Tower for guardrails.
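The role-plus-external-ID pattern comes down to the trust policy on the role being assumed. A minimal sketch follows; the account ID and external ID values are placeholders, not real identifiers.

```python
import json


def third_party_trust_policy(trusted_account_id, external_id):
    """Trust policy: only the named account may assume the role, and only
    when it presents the agreed external ID."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {"AWS": f"arn:aws:iam::{trusted_account_id}:root"},
                "Action": "sts:AssumeRole",
                "Condition": {"StringEquals": {"sts:ExternalId": external_id}},
            }
        ],
    }


trust = third_party_trust_policy("111122223333", "example-external-id")  # placeholders
print(json.dumps(trust, indent=2))
```

The external ID addresses the confused-deputy problem: a vendor that serves many customers cannot be tricked into assuming your role on behalf of someone else.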
Integration & Serverless (Q41-Q44)
Q41. SQS vs SNS vs EventBridge vs Kinesis?
SQS: pull-based queue. SNS: pub/sub fan-out, push-based. EventBridge: event bus with rules + schema registry + cross-account routing, replaces CloudWatch Events. Kinesis Data Streams: real-time streaming with replay, sharded throughput. Default: EventBridge for AWS-service events, SNS+SQS for cross-service fanout, Kinesis for high-throughput streaming.
Q42. What is the difference between SQS Standard and FIFO queues?
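Standard: at-least-once delivery, best-effort ordering, virtually unlimited throughput. FIFO: exactly-once processing and strict ordering within each MessageGroupId, with a default limit of 300 API calls per second per queue (3,000 messages per second with batches of 10); high-throughput mode raises this substantially. Use FIFO when ordering or dedup matters (financial transactions, state machines).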
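Ordering and deduplication on a FIFO queue are driven by two send parameters. The sketch below builds them without calling AWS. Using the account ID as the message group and a SHA-256 of the body as the deduplication ID are design assumptions for this example.

```python
import hashlib
import json


def fifo_message(account_id, event):
    """Build SendMessage parameters for a FIFO queue."""
    body = json.dumps(event, sort_keys=True)  # stable serialization
    return {
        "MessageBody": body,
        # Messages sharing a group ID are delivered in order.
        "MessageGroupId": account_id,
        # Identical bodies produce the same ID, so retries are deduplicated.
        "MessageDeduplicationId": hashlib.sha256(body.encode()).hexdigest(),
    }


first = fifo_message("acct-42", {"type": "debit", "amount": 100})
retry = fifo_message("acct-42", {"amount": 100, "type": "debit"})
print(first["MessageDeduplicationId"] == retry["MessageDeduplicationId"])  # True
```

Grouping by account keeps each account's transactions ordered while letting different accounts be processed in parallel.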
Q43. Lambda concurrency — reserved vs provisioned?
Reserved concurrency: cap on max simultaneous executions — protects downstream systems. Provisioned concurrency: pre-warmed execution environments — eliminates cold starts, costs even when idle. Combine: reserve a limit, provision for latency-sensitive paths.
Q44. Step Functions vs SQS-driven Lambda?
Step Functions: explicit state machine, visible workflow, error handling, retries, parallel branches, human approval steps. SQS+Lambda: simpler, cheaper, no visibility into workflow. Use Step Functions for orchestration of 3+ steps; use SQS+Lambda for fire-and-forget event processing.
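To show what "explicit state machine with retries and error handling" means, here is a sketch of an Amazon States Language definition built as a Python dict. The workflow, state names, and function ARNs are hypothetical.

```python
import json


def order_workflow(charge_arn, ship_arn):
    """Two-step workflow with retry on the first task and a failure branch."""
    return {
        "Comment": "Hypothetical order workflow",
        "StartAt": "ChargeCard",
        "States": {
            "ChargeCard": {
                "Type": "Task",
                "Resource": charge_arn,
                "Retry": [
                    {
                        "ErrorEquals": ["States.TaskFailed"],
                        "IntervalSeconds": 2,
                        "MaxAttempts": 3,
                        "BackoffRate": 2.0,  # waits 2s, 4s, 8s between attempts
                    }
                ],
                "Catch": [{"ErrorEquals": ["States.ALL"], "Next": "NotifyFailure"}],
                "Next": "ShipOrder",
            },
            "ShipOrder": {"Type": "Task", "Resource": ship_arn, "End": True},
            "NotifyFailure": {
                "Type": "Fail",
                "Error": "OrderFailed",
                "Cause": "Charge did not succeed",
            },
        },
    }


definition = order_workflow(
    "arn:aws:lambda:us-east-1:111122223333:function:charge",  # placeholder ARN
    "arn:aws:lambda:us-east-1:111122223333:function:ship",    # placeholder ARN
)
print(json.dumps(definition, indent=2))
```

With SQS+Lambda, the retry and failure logic above would live inside each function's code and queue configuration, which is the visibility trade-off the answer describes.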
Design Scenarios (Q45-Q50)
Q45. Design a highly available web application on AWS.
CloudFront in front of an ALB across two AZs, ASG of EC2 (or ECS Fargate) behind the ALB, RDS Aurora Multi-AZ for the database, ElastiCache Redis for sessions / cache, S3 for static assets, Route 53 with health checks. Mention WAF + Shield, CloudWatch monitoring with alarms, multi-region for disaster recovery if RTO matters.
Q46. Design a serverless REST API for 100K requests/second.
API Gateway (REST or HTTP API depending on features) → Lambda functions → DynamoDB. Cache hot responses at API Gateway, use Lambda provisioned concurrency for cold-start-sensitive endpoints, DynamoDB on-demand or auto-scaling, X-Ray for tracing, CloudWatch alarms on Lambda errors. Mention rate limiting via API Gateway usage plans and AWS WAF.
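A quick sizing calculation strengthens this answer. Required Lambda concurrency is roughly requests per second multiplied by average duration in seconds. The 50 ms average duration below is an assumption, so substitute your own measurement.

```python
import math


def required_concurrency(rps, avg_duration_ms):
    """Estimate concurrent executions: request rate x average duration."""
    return math.ceil(rps * avg_duration_ms / 1000)


print(required_concurrency(100_000, 50))   # 5000
print(required_concurrency(100_000, 200))  # 20000
```

Both figures are well above the default regional concurrency quota of 1,000, and 100K requests/second also exceeds the default API Gateway account throttle, so a credible answer mentions requesting quota increases before launch.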
Q47. Design a data lake on AWS.
S3 as the storage layer, organized by raw / curated / consumption zones (medallion pattern); AWS Glue for catalog and ETL; Athena for ad-hoc SQL; Redshift Spectrum or Redshift for warehouse workloads; EMR for big batch; Lake Formation for governance (column / row-level security, fine-grained permissions); MSK or Kinesis for streaming ingest; QuickSight for BI. Senior: discuss Apache Iceberg for ACID transactions on the lake.
Q48. The application is suddenly slow. Walk me through troubleshooting.
Look at CloudWatch dashboards first — ALB latency, Lambda duration, RDS CPU, DynamoDB throttles. X-Ray traces to find the slow segment. Logs for errors. Check if a deploy correlates with the slowdown. If DB: slow query log, RDS Performance Insights. If network: VPC Flow Logs. If client-side: CloudFront cache hit rate.
Q49. How would you migrate a monolithic on-prem app to AWS?
Discovery (Application Discovery Service, Migration Hub Strategy), 6Rs framework (Rehost / Replatform / Refactor / Repurchase / Retire / Retain). For most monoliths: rehost via Application Migration Service first, then refactor incrementally (strangler pattern) into containers / Lambda. Database via DMS with minimal downtime. Migrate non-prod first, validate, then prod with cutover window.
Q50. Tell me about a system you designed.
The behavioural question every AWS interview includes. Use STAR. Pick a real system. Cover: (a) the problem and the constraints (scale, budget, latency), (b) the architecture decisions you made and what you rejected, (c) the trade-offs you accepted explicitly, (d) what you would change with hindsight. Bonus points for mentioning the Well-Architected pillar trade-offs in your reasoning.
Frequently Asked Questions
What are the six pillars of the AWS Well-Architected Framework?
Operational Excellence, Security, Reliability, Performance Efficiency, Cost Optimization, Sustainability. Senior answers mention trade-offs between pillars.
What is the difference between Availability Zones and Regions?
Region = geographic area. AZ = one or more data centers within a region with redundant power and networking. Design multi-AZ for HA within a region, multi-region for DR.
S3 storage classes — when to use each?
Standard (frequent), Intelligent-Tiering (unknown patterns), Standard-IA (infrequent 30+ days), One Zone-IA (non-critical), Glacier Instant / Flexible / Deep Archive (archive tiers).
What is the difference between SQS and SNS?
SQS is a pull-based queue: any number of producers and consumers, but each message is processed by a single consumer. SNS is pub/sub fanout (one publisher to many subscribers). Combine for fanout-with-buffer patterns.
What is the difference between security groups and NACLs?
Security groups: stateful, instance-level, allow only. NACLs: stateless, subnet-level, allow + deny. Use SGs for app rules, NACLs for coarse subnet deny lists.
