Running Containers on AWS
I've spent the last week exploring how to run containers on AWS; most of my experience is with GCP, particularly GKE.
Here's what I've learned.
Your Options
There are basically four options. We're excluding AWS Lambda and other PaaS offerings, because those aren't container-based.
- Elastic Container Service (ECS) where you manage the EC2 instances
- Elastic Container Service (ECS) with AWS Fargate, where AWS manages the instances
- Elastic Kubernetes Service (EKS) where you manage the node groups (node pools in GKE)
- Elastic Kubernetes Service (EKS) with AWS Fargate, where AWS manages the instances
The most interesting options are the two Fargate ones; infra management is clearly moving further and further behind the scenes. We probably won't be upgrading node pools in two years.
The ECS API
Letās go through the ECS API and see how it compares to the Kubernetes API.
A `Container Instance` is a node: a VM that is part of your ECS cluster. I'm not sure why they didn't call these nodes.
Launch Types
I only named two launch types above, but there are three available.
- Fargate - AWS provisions and manages the VMs behind the scenes. No version upgrades, nada. A little more expensive, but less to deal with.
- EC2 - you have to manage the VMs. This was the original mode of ECS. You do this with an abstraction called a capacity provider (see the sketch below).
- External - you have on-prem VMs that are registered with your cluster. See AWS ECS Anywhere.
If you don't have some weird compliance requirement that's stopping you, I'd recommend using Fargate.
It's less a question of "which is better", because that's clear. The real question is: "Is Fargate mature enough to replace the EC2 launch type for most use cases?" And I believe that's a yes.
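For example, here's a minimal Pulumi (TypeScript) sketch of a cluster that defaults to the Fargate capacity provider; the resource names are illustrative, not a definitive setup.

```typescript
import * as aws from "@pulumi/aws";

// A cluster that defaults to Fargate, so services don't have to
// specify a launch type explicitly. Names are illustrative.
const cluster = new aws.ecs.Cluster("demo");

new aws.ecs.ClusterCapacityProviders("demo-capacity", {
    clusterName: cluster.name,
    capacityProviders: ["FARGATE", "FARGATE_SPOT"],
    defaultCapacityProviderStrategies: [{
        capacityProvider: "FARGATE",
        weight: 1,
    }],
});
```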
TaskDefinition
- This is similar to a Kubernetes `Pod`
- Can have multiple containers, which can communicate with each other via `localhost`
- Can share volumes
- You want them to scale up and down together
- You deploy them together
- "You should create task definitions that group the containers that are used for a common purpose." - the docs.
I would go further and say: at most one main container, plus any supporting containers, i.e. the sidecar pattern.
`portMappings`, a nested field, is very similar to a `Service` of type `NodePort` in Kubernetes. It allows the container to access port(s) on the host container instance.
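Putting that together, here's a hedged Pulumi (TypeScript) sketch of a task definition with one main container, a sidecar, and a port mapping; the images, names, and sizes are all illustrative (the execution role is omitted for brevity).

```typescript
import * as aws from "@pulumi/aws";

// A task definition with one main container plus a supporting sidecar.
const taskDef = new aws.ecs.TaskDefinition("web", {
    family: "web",
    cpu: "256",
    memory: "512",
    networkMode: "awsvpc",
    requiresCompatibilities: ["FARGATE"],
    containerDefinitions: JSON.stringify([
        {
            name: "app",
            image: "nginx:1.25",          // illustrative image
            essential: true,
            // Expose port 80 via a port mapping.
            portMappings: [{ containerPort: 80, protocol: "tcp" }],
        },
        {
            // Sidecar; reaches "app" over localhost within the task.
            name: "log-router",
            image: "fluent/fluent-bit:2.2", // illustrative image
            essential: false,
        },
    ]),
});
```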
Service
Not to be confused with a Kubernetes `Service` (which provides a stable IP, among other things). Services maintain the availability of your tasks (closer to a Kubernetes `ReplicaSet` or `Deployment`). You provide it with a task definition and a launch type.
`placementConstraints` is similar to node affinity / anti-affinity or taints and tolerations.
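As a sketch in Pulumi (TypeScript): the ARNs and the availability-zone expression below are hypothetical, and note that placement constraints only apply to the EC2 launch type.

```typescript
import * as aws from "@pulumi/aws";

// A service keeps the desired number of tasks running, like a Deployment.
new aws.ecs.Service("web", {
    cluster: "arn:aws:ecs:us-east-1:123456789012:cluster/demo",                  // hypothetical
    taskDefinition: "arn:aws:ecs:us-east-1:123456789012:task-definition/web:1",  // hypothetical
    desiredCount: 3,
    launchType: "EC2",
    // Roughly analogous to node affinity in Kubernetes.
    placementConstraints: [{
        type: "memberOf",
        expression: "attribute:ecs.availability-zone in [us-east-1a, us-east-1b]",
    }],
});
```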
Networking
There are a few different `networkConfiguration` options available.
- In `awsvpc`, tasks receive their own elastic network interface and a private IPv4 address. This puts them on par with an EC2 instance, from a networking perspective.
- In `bridge`, the task uses Docker's virtual network.
- In `host`, the task maps container ports to the ENI of the host EC2 instance. Keep in mind ports on host nodes are a finite resource in your cluster.
If you're using the Fargate launch type, you have to use `awsvpc`.
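With Fargate that looks something like the following sketch; the ARNs, subnet IDs, and security-group IDs are hypothetical placeholders.

```typescript
import * as aws from "@pulumi/aws";

// A Fargate service must use awsvpc, so each task gets its own ENI
// and private IP.
new aws.ecs.Service("web-fargate", {
    cluster: "arn:aws:ecs:us-east-1:123456789012:cluster/demo",                  // hypothetical
    taskDefinition: "arn:aws:ecs:us-east-1:123456789012:task-definition/web:1",  // hypothetical
    desiredCount: 2,
    launchType: "FARGATE",
    networkConfiguration: {
        subnets: ["subnet-aaaa1111", "subnet-bbbb2222"], // private subnets (assumed)
        securityGroups: ["sg-cccc3333"],                 // hypothetical
        assignPublicIp: false,                           // keep tasks off the public internet
    },
});
```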
This is interesting to compare to Kubernetes, because Kubernetes is like a combination of `awsvpc` and `bridge`. Pods are given their own IPs, but they're virtual (`kube-proxy` edits the node's iptables).
In Kubernetes, networking can also be implemented in many different ways; you have to choose a network plugin. Managed Kubernetes offerings ship good defaults, so you usually don't think about this.
Service Discovery
It's very common for one microservice to want to call another. You don't want to call public endpoints, because that's additional load on your networking infrastructure (e.g. a NAT Gateway, an API Gateway), and it's also going over the public internet.
In ECS, to accomplish this, you use service discovery, which is integrated with Amazon Route 53.
You register the service into a private DNS namespace, and DNS records referencing the private IPs are created for the service. You can then hit your service at `<service discovery service name>.<service discovery namespace>`. Good thing we're not overloading the word "service". 🙂
A typical workflow creates one "service discovery service" per ECS Service, with `A` records for all the task IP addresses.
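Here's a hedged Pulumi (TypeScript) sketch of that workflow; the namespace name and VPC ID are illustrative.

```typescript
import * as aws from "@pulumi/aws";

// One private DNS namespace for the cluster, and one service-discovery
// "service" per ECS service.
const ns = new aws.servicediscovery.PrivateDnsNamespace("internal", {
    name: "internal.local",    // illustrative namespace
    vpc: "vpc-dddd4444",       // hypothetical VPC ID
});

const ordersDiscovery = new aws.servicediscovery.Service("orders", {
    name: "orders", // tasks become reachable at orders.internal.local
    dnsConfig: {
        namespaceId: ns.id,
        dnsRecords: [{ type: "A", ttl: 10 }],
        routingPolicy: "MULTIVALUE",
    },
    healthCheckCustomConfig: { failureThreshold: 1 },
});
// An ECS service registers into this via its serviceRegistries property.
```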
Service discovery was added in 2018, and it's a good example of ECS starting out overly simple and growing more complex, towards Kubernetes.
Relationship To Load Balancing
To understand this, we need to go over some of the load balancing abstractions in AWS.
- `TargetGroup` - a set of endpoints, or `Targets`. We will have one of these per ECS service.
- `Listener` - listens for requests from clients on a protocol and port. We'll have one of these per (protocol, port) combination that we support. In the example below, just one, for HTTP.
- `ListenerRule` - this is what connects the `Listener` and the `TargetGroup`.
e.g. if the path is `/hello`, go to this `TargetGroup`; or if it's `/foo`, redirect to `/bar`.
So, we will have:
- 1 load balancer
- 1 listener, for HTTP and port 80
- 1 target group per ECS Service
- 1 listener rule per ECS Service
Here's an example in Pulumi (TypeScript); a sketch where the VPC, subnets, and resource names are illustrative.
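```typescript
import * as aws from "@pulumi/aws";

const vpcId = "vpc-dddd4444";                           // hypothetical
const subnets = ["subnet-aaaa1111", "subnet-bbbb2222"]; // hypothetical

// 1 load balancer
const alb = new aws.lb.LoadBalancer("web", {
    loadBalancerType: "application",
    subnets: subnets,
});

// 1 target group per ECS service; targetType "ip" matches awsvpc tasks
const helloTg = new aws.lb.TargetGroup("hello", {
    port: 80,
    protocol: "HTTP",
    targetType: "ip",
    vpcId: vpcId,
});

// 1 listener, for HTTP on port 80
const http = new aws.lb.Listener("http", {
    loadBalancerArn: alb.arn,
    port: 80,
    protocol: "HTTP",
    defaultActions: [{
        type: "fixed-response",
        fixedResponse: {
            contentType: "text/plain",
            messageBody: "not found",
            statusCode: "404",
        },
    }],
});

// 1 listener rule per ECS service: route /hello* to its target group
new aws.lb.ListenerRule("hello-rule", {
    listenerArn: http.arn,
    priority: 10,
    conditions: [{ pathPattern: { values: ["/hello*"] } }],
    actions: [{ type: "forward", targetGroupArn: helloTg.arn }],
});
```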
Autoscaling?
Yeah, ECS autoscales well. You do this by adding a "scaling policy". You've got a few options there.
- Target tracking scaling - scale to keep some metric near a target value
- Step scaling - when an alarm fires, scale to the next step. When the next alarm fires, scale to the step after that.
- Scheduled scaling - scale based on date and time.
These are really good options. Many companies know their system is going to have a lot of traffic at a given time, e.g. 09:00 on Monday morning, and scheduled scaling is simple.
The other two seem a bit more complex to tune, but they're really good options.
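As an illustration, here's a target tracking policy on CPU via Application Auto Scaling, sketched in Pulumi (TypeScript); the cluster and service names are hypothetical.

```typescript
import * as aws from "@pulumi/aws";

// Register the service's DesiredCount as a scalable target.
const target = new aws.appautoscaling.Target("web-scaling", {
    serviceNamespace: "ecs",
    resourceId: "service/demo-cluster/web", // service/<cluster>/<service>, hypothetical
    scalableDimension: "ecs:service:DesiredCount",
    minCapacity: 2,
    maxCapacity: 10,
});

// Target tracking: scale to keep average CPU near 60%.
new aws.appautoscaling.Policy("web-cpu-target", {
    policyType: "TargetTrackingScaling",
    serviceNamespace: target.serviceNamespace,
    resourceId: target.resourceId,
    scalableDimension: target.scalableDimension,
    targetTrackingScalingPolicyConfiguration: {
        targetValue: 60,
        predefinedMetricSpecification: {
            predefinedMetricType: "ECSServiceAverageCPUUtilization",
        },
    },
});
```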
Additional Notes
- ECS does rolling deployments by default, has an option for blue/green (`CODE_DEPLOY`), and a way to have even finer-grained control.
- Workloads are a bit slower to start than on Kubernetes. I changed two environment variables across two tasks, and the rollout took 8 minutes for me.
- Fargate is especially slow to start, because it can involve scaling up. GKE Autopilot has the same problem.
Fargate on EKS
Fargate on EKS might only be similar to Fargate on ECS in name and capability; certainly not in implementation or in how you use it.
In order to use Fargate on EKS, you have to create a Fargate profile.
You then use label selectors on pods to determine which Fargate profile, if any, applies. EKS schedules the pod with its own scheduler, onto what is basically its own managed node pool, and AWS handles scaling and upgrading for you.
You just have to think through your memory and vCPU requests, which you should be doing anyway.
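A hedged Pulumi (TypeScript) sketch of a Fargate profile; the cluster name, role ARN, subnet IDs, and label are all illustrative.

```typescript
import * as aws from "@pulumi/aws";

// Pods in the "apps" namespace carrying this label get scheduled
// onto Fargate instead of your node groups.
new aws.eks.FargateProfile("default-profile", {
    clusterName: "demo-eks",                                            // hypothetical
    podExecutionRoleArn: "arn:aws:iam::123456789012:role/eks-fargate",  // hypothetical
    subnetIds: ["subnet-aaaa1111", "subnet-bbbb2222"],                  // private subnets (assumed)
    selectors: [{
        namespace: "apps",
        labels: { runtime: "fargate" },
    }],
});
```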
Fargate on EKS is very similar to GKE Autopilot. It's clear that these Containers-as-a-Service tools are the future of container orchestration. Few people really want to deal with version upgrades and manual scaling.
Wow! You read the whole thing. People who make it this far sometimes want to receive emails when I post something new.
I also have an RSS feed.