Cloud-Native and Serverless Design
July 15, 2025
How to design services that scale predictably on AWS, Azure, or GCP with sensible tradeoffs.
Cloud-native is about the operating model
“Cloud-native” usually means you can:
- Deploy frequently without fragile runbooks
- Scale horizontally when traffic changes
- Recover quickly from failures
It’s less about which vendor you pick and more about how you build and run software.
Serverless fits spiky, event-driven workloads
Serverless can be a great default when:
- Work arrives as events (webhooks, queue messages, schedules)
- You want minimal infrastructure management
- You can tolerate cold starts or mitigate them
Common pitfalls are hidden coupling between functions and difficult local debugging. Offset them with distributed tracing and small, single-purpose functions.
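As a rough sketch of what a small, event-driven function can look like, here is a Python handler for queue messages. It assumes AWS Lambda with an SQS trigger that has partial batch responses enabled; `_SETTINGS` and `process_order` are hypothetical names used only for illustration. Setup lives at module level so it runs once per cold start, and the business logic is a plain function you can call locally.

```python
import json

# Module-level code runs once per execution environment (at cold start),
# so later warm invocations reuse it. Hypothetical stand-in for real setup.
_SETTINGS = {"currency": "EUR"}


def process_order(order: dict) -> str:
    """Hypothetical business logic, kept separate so it is easy to test locally."""
    return f"processed order {order['id']} in {_SETTINGS['currency']}"


def handler(event: dict, context: object) -> dict:
    """Entry point for a batch of SQS-style records."""
    failures = []
    for record in event.get("Records", []):
        try:
            order = json.loads(record["body"])
            print(process_order(order))
        except (KeyError, TypeError, ValueError) as exc:
            # Report only the failed message so the rest of the batch
            # is not redelivered.
            print(f"failed {record.get('messageId')}: {exc}")
            failures.append({"itemIdentifier": record.get("messageId")})
    return {"batchItemFailures": failures}
```

Because the handler is an ordinary function, you can debug it locally by calling it with a hand-written event dictionary before deploying.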
Example (make operational constraints explicit in IaC):
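resource "aws_lambda_function" "handler" {
  function_name = "api-handler"
  role          = aws_iam_role.handler.arn # placeholder: execution role defined elsewhere
  runtime       = "python3.12"             # placeholder runtime
  handler       = "app.handler"            # placeholder entry point
  filename      = "build/handler.zip"      # placeholder deployment package
  timeout       = 10  # seconds; fail fast instead of hanging on a slow dependency
  memory_size   = 512 # MB; Lambda allocates CPU in proportion to memory
}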
Design for failure, not perfection
Across AWS/Azure/GCP, the same resilience patterns show up:
- Timeouts everywhere (client, server, downstream calls)
- Retries with jitter (but only when operations are safe to retry)
- Circuit breakers and bulkheads for noisy dependencies
- Dead-letter queues for poisoned messages
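The patterns above are vendor-neutral, so they can be sketched in plain Python. This is a minimal illustration, not a production library: it assumes the wrapped operation is idempotent, uses an in-memory list as a stand-in for a real dead-letter queue, and leaves timeouts to the client you are calling. All names (`retry_with_jitter`, `CircuitBreaker`, `consume`) are hypothetical.

```python
import random
import time
from typing import Callable, Optional


class CircuitOpenError(Exception):
    """Raised when the breaker is open and the call is rejected immediately."""


class CircuitBreaker:
    """Opens after `max_failures` consecutive failures; allows a trial call
    once `reset_after` seconds have passed."""

    def __init__(self, max_failures: int = 3, reset_after: float = 30.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.max_failures = max_failures
        self.reset_after = reset_after
        self.clock = clock
        self.failures = 0
        self.opened_at: Optional[float] = None

    def call(self, fn: Callable[[], object]) -> object:
        if self.opened_at is not None:
            if self.clock() - self.opened_at < self.reset_after:
                raise CircuitOpenError("dependency is failing; call skipped")
            # Cool-down elapsed: fall through and allow one trial call.
        try:
            result = fn()
        except Exception:
            self.failures += 1
            if self.failures >= self.max_failures:
                self.opened_at = self.clock()
            raise
        self.failures = 0
        self.opened_at = None
        return result


def retry_with_jitter(fn: Callable[[], object], attempts: int = 4,
                      base_delay: float = 0.1, max_delay: float = 2.0,
                      sleep: Callable[[float], None] = time.sleep) -> object:
    """Retry an idempotent operation with exponential backoff and full jitter."""
    for attempt in range(attempts):
        try:
            return fn()
        except Exception:
            if attempt == attempts - 1:
                raise
            cap = min(max_delay, base_delay * 2 ** attempt)
            # Full jitter: a random delay in [0, cap] spreads out retry storms.
            sleep(random.uniform(0, cap))


def consume(message: dict, handle: Callable[[dict], object],
            dead_letters: list, **retry_kwargs) -> bool:
    """Process one message; park it in `dead_letters` if retries are exhausted."""
    try:
        retry_with_jitter(lambda: handle(message), **retry_kwargs)
        return True
    except Exception as exc:
        dead_letters.append({"message": message, "error": repr(exc)})
        return False
```

Injecting `sleep` and `clock` keeps the sketch testable without real waiting. In a managed setup you would usually let the platform's own dead-letter queue replace the in-memory list.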
Treat IAM as part of your application
Security and operability depend on identity and permissions:
- Use least-privilege roles per service
- Prefer short-lived credentials (workload identity)
- Keep secrets in a managed store, not environment files
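To make least privilege concrete, here is a small sketch that builds an AWS-style IAM policy document for a single queue consumer and flags wildcards. The JSON layout follows the standard IAM policy format; the function names and the example queue ARN are hypothetical, and the action list assumes a service that only reads from one SQS queue.

```python
import json


def queue_consumer_policy(queue_arn: str) -> dict:
    """Build a policy that allows consuming from exactly one SQS queue."""
    if "*" in queue_arn:
        raise ValueError("use a concrete queue ARN, not a wildcard")
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": [
                    "sqs:ReceiveMessage",
                    "sqs:DeleteMessage",
                    "sqs:GetQueueAttributes",
                ],
                "Resource": queue_arn,
            }
        ],
    }


def _as_list(value) -> list:
    """IAM allows a single string or a list for Action and Resource."""
    return [value] if isinstance(value, str) else list(value)


def find_wildcards(policy: dict) -> list:
    """Return every Action or Resource entry that contains a wildcard."""
    findings = []
    for statement in policy["Statement"]:
        for key in ("Action", "Resource"):
            for item in _as_list(statement.get(key, [])):
                if "*" in item:
                    findings.append(f"{key}: {item}")
    return findings


if __name__ == "__main__":
    # Hypothetical ARN, for illustration only.
    policy = queue_consumer_policy("arn:aws:sqs:eu-west-1:123456789012:orders")
    print(json.dumps(policy, indent=2))
    print("wildcards:", find_wildcards(policy))
```

A check like `find_wildcards` can run in CI so that overly broad permissions are caught in review, the same way a failing unit test would be.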
Optimize cost like you optimize performance
Cloud bills are feedback. Make them actionable:
- Attribute cost by service/team (tags/labels)
- Watch top drivers (egress, logs, idle databases)
- Right-size and autoscale with clear SLOs
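Cost attribution can start very simply. The sketch below groups billing line items by a `team` tag and surfaces untagged spend. The record shape (`service`, `cost_usd`, `tags`) and the figures are made up for illustration; real billing exports from each provider use their own schemas.

```python
from collections import defaultdict


def cost_by_tag(line_items: list, tag_key: str = "team") -> dict:
    """Sum cost per tag value, largest first; missing tags count as 'untagged'."""
    totals = defaultdict(float)
    for item in line_items:
        owner = item.get("tags", {}).get(tag_key, "untagged")
        totals[owner] += item["cost_usd"]
    return dict(sorted(totals.items(), key=lambda kv: kv[1], reverse=True))


if __name__ == "__main__":
    # Made-up sample data, not real billing figures.
    items = [
        {"service": "egress", "cost_usd": 120.0, "tags": {"team": "payments"}},
        {"service": "logs", "cost_usd": 40.5, "tags": {"team": "payments"}},
        {"service": "database", "cost_usd": 75.25, "tags": {"team": "search"}},
        {"service": "storage", "cost_usd": 10.0},
    ]
    for owner, total in cost_by_tag(items).items():
        print(f"{owner}: ${total:.2f}")
```

Keeping an explicit "untagged" bucket makes missing tags visible, which is usually the first thing to fix before the numbers can drive decisions.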
Hi, I'm Martin Duchev. You can find more about my projects on my GitHub page.