App Scaling
Learn about scaling Apps' CPU, RAM, and containers, manually or automatically
Overview
Aptible Apps are scaled at the Service level, meaning each App Service is scaled independently.
App Services can be scaled by adding more CPU/RAM (vertical scaling) or by adding more containers (horizontal scaling). App Services can be scaled manually via the CLI or UI, automatically with Autoscaling, or programmatically with Terraform.
Apps scaled to two or more containers are deployed in a high-availability configuration, ensuring redundancy across different Availability Zones.
When Apps are scaled, a new set of containers will be launched to replace the existing ones for each of your App’s Services.
High-availability Apps
Apps scaled to 2 or more Containers are automatically deployed in a high-availability configuration, with Containers deployed in separate AWS Availability Zones.
Horizontal Scaling
Scale Apps horizontally by adding more Containers to a given Service. Each App Service can scale up to 32 Containers.
Manual Horizontal Scaling
App Services can be manually scaled via the Dashboard or the aptible apps:scale CLI command. Example:
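A minimal sketch, assuming an App named `my-app` with a Service named `web` (the App and Service names are illustrative):

```shell
# Scale the "web" Service of "my-app" to 3 containers
aptible apps:scale web --app my-app --container-count 3
```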
Horizontal Autoscaling
When Horizontal Autoscaling is enabled on a Service, the autoscaler evaluates the Service every 5 minutes and generates scaling adjustments based on CPU usage (as a percentage of total cores). Data is analyzed over a 30-minute lookback period, with post-scaling cooldowns of 5 minutes for scaling down and 1 minute for scaling up. After any release, an additional 5-minute cooldown applies. Metrics are evaluated at the 99th percentile, aggregated across all of the Service's containers over the past 30 minutes.
This feature can also be configured via Terraform or the aptible services:autoscaling_policy:set CLI command.
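A sketch of what a CLI invocation might look like; the flag names here are illustrative assumptions, so run `aptible services:autoscaling_policy:set --help` for the authoritative list:

```shell
# Hypothetical flags shown for illustration; verify against --help
aptible services:autoscaling_policy:set web --app my-app \
  --autoscaling-type horizontal \
  --min-containers 2 \
  --max-containers 6
```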
By default, a Horizontal Autoscaling Operation follows the regular Container Lifecycle and Releases pattern of restarting all current containers when modifying the number of running containers. However, this behavior can be disabled by enabling the Restart Free Scaling setting (use_horizontal_scale in Terraform) when configuring autoscaling for the service. With restart free scaling enabled, containers are added and removed without restarting the existing ones. When removing containers in this configuration, the service's stop timeout is still respected. Note that if the service has a TCP, ELB, or gRPC endpoint, the regular full restart will still occur even with restart free scaling enabled.
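A sketch of how this might be expressed in Terraform; apart from `use_horizontal_scale`, which the text names, the resource and attribute names below are assumptions, so consult the Aptible Terraform provider documentation for the exact schema:

```hcl
resource "aptible_app" "my_app" {
  env_id = 123        # placeholder environment ID
  handle = "my-app"

  service {
    process_type    = "web"
    container_count = 2

    # Hypothetical autoscaling block; attribute names are assumptions
    autoscaling_policy {
      autoscaling_type     = "horizontal"
      min_containers       = 2
      max_containers       = 6
      use_horizontal_scale = true # restart-free scaling
    }
  }
}
```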
Guide for Configuring Horizontal Autoscaling
Configuration Options
Container & CPU Threshold Settings
The following container & CPU threshold settings are available for configuration:
- Percentile: Determines the percentile for evaluating RAM and CPU usage.
- Minimum Container Count: Sets the lowest container count to which the service can be scaled down by Autoscaler.
- Maximum Container Count: Sets the highest container count to which the service can be scaled up by Autoscaler.
- Scale Up Steps: Sets the amount of containers to add when autoscaling (ex: a value of 2 will go from 1->3->5). Container count will never exceed the configured maximum.
- Scale Down Steps: Sets the amount of containers to remove when autoscaling (ex: a value of 2 will go from 4->2->1). Container count will never exceed the configured minimum.
- Scale Down Threshold (CPU Usage): Specifies the CPU usage percentage at or below which a down-scaling action is triggered.
- Scale Up Threshold (CPU Usage): Specifies the CPU usage percentage at or above which an up-scaling action is triggered.
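The step and bound settings above combine in a simple way; the following is a minimal sketch in Python of that arithmetic (the function name and logic are illustrative, not Aptible's implementation):

```python
def next_container_count(current, direction, scale_up_steps=1,
                         scale_down_steps=1, min_containers=1,
                         max_containers=32):
    """Return the next container count, clamped to the configured bounds."""
    if direction == "up":
        return min(current + scale_up_steps, max_containers)
    return max(current - scale_down_steps, min_containers)

# Scale Up Steps = 2: container count moves 1 -> 3 -> 5
print(next_container_count(1, "up", scale_up_steps=2))      # 3
print(next_container_count(3, "up", scale_up_steps=2))      # 5
# Scale Down Steps = 2 with a minimum of 1: 4 -> 2 -> 1
print(next_container_count(2, "down", scale_down_steps=2))  # 1
```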
Time-Based Settings
The following time-based settings are available for configuration:
- Metrics Lookback Time Interval: The duration in seconds for retrieving past performance metrics.
- Post Scale Up Cooldown: The waiting period in seconds after an automated scale-up before another scaling action can be considered. The period of time the service is on cooldown is still considered in the metrics for the next potential scale.
- Post Scale Down Cooldown: The waiting period in seconds after an automated scale-down before another scaling action can be considered. The period of time the service is on cooldown is still considered in the metrics for the next potential scale.
- Post Release Cooldown: The time in seconds ignored following any general scaling operation, allowing stabilization before additional scaling changes are considered. Unlike the other cooldowns, this period is not considered in the metrics for the next potential scale.
General Settings
The following general settings are available for configuration:
- Restart Free Scaling: When enabled, scale operations for modifying the number of running containers will not restart the other containers in the service.
Vertical Scaling
Scale Apps vertically by changing the size of Containers, i.e., changing their Memory Limits and CPU Limits. The available sizes are determined by the Container Profile.
Manual Vertical Scaling
App Services can be manually scaled via the Dashboard or the aptible apps:scale CLI command. Example:
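A minimal sketch, assuming an App named `my-app` with a Service named `web` (the App and Service names are illustrative):

```shell
# Scale the "web" Service of "my-app" to 2 GB of RAM per container
aptible apps:scale web --app my-app --container-size 2048
```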
Vertical Autoscaling
When Vertical Autoscaling is enabled on a Service, the autoscaler likewise evaluates the Service every 5 minutes and generates scaling recommendations based on:
- The ratio of RSS usage (in GB) to CPU (in cores)
- RSS usage levels
Data is analyzed over a 30-minute lookback period. Post-scaling cooldowns are 5 minutes for scaling down and 1 minute for scaling up. An additional 5-minute cooldown applies after a service release. Metrics are evaluated at the 99th percentile, aggregated across all of the Service's containers over the past 30 minutes.
This feature can also be configured via Terraform or the aptible services:autoscaling_policy:set CLI command.
Configuration Options
RAM & CPU Threshold Settings
The following RAM & CPU Threshold settings are available for configuration:
- Percentile: Determines the percentile for evaluating RAM and CPU usage.
- Minimum Memory (MB): Sets the lowest memory limit to which the service can be scaled down by Autoscaler.
- Maximum Memory (MB): Defines the upper memory threshold, capping the maximum memory allocation possible through Autoscaler. If blank, the container can scale to the largest size available.
- Memory Scale Up Percentage: Specifies the percentage of the current memory limit at which the service’s memory usage triggers an up-scaling action.
- Memory Scale Down Percentage: Determines the percentage of the next lower memory limit that, when reached or exceeded by memory usage, initiates a down-scaling action.
- Memory Optimized Memory/CPU Ratio Threshold: Sets the Memory (in GB) to CPU (in CPUs) ratio above which the service is shifted to an R (Memory Optimized) profile.
- Compute Optimized Memory/CPU Ratio Threshold: Sets the Memory-to-CPU ratio threshold, below which the service is transitioned to a C (Compute Optimized) profile.
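The scale-up and profile-ratio checks above can be sketched in Python; the function names, threshold values, and profile labels below are illustrative, not Aptible's actual defaults or implementation:

```python
def should_scale_up_memory(rss_mb, limit_mb, scale_up_pct):
    """True when RSS has reached the scale-up percentage of the memory limit."""
    return rss_mb >= limit_mb * scale_up_pct / 100

def pick_profile(memory_gb, cpus, r_threshold, c_threshold):
    """Pick a container profile from the Memory (GB) to CPU (cores) ratio."""
    ratio = memory_gb / cpus
    if ratio > r_threshold:
        return "R (Memory Optimized)"
    if ratio < c_threshold:
        return "C (Compute Optimized)"
    return "General Purpose"

# 1900 MB RSS against a 2048 MB limit with a 90% scale-up threshold
print(should_scale_up_memory(1900, 2048, 90))  # True
# 8 GB / 1 CPU with an illustrative R threshold of 6 GB per CPU
print(pick_profile(8, 1, 6, 2))  # R (Memory Optimized)
```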
Time-Based Settings
The following time-based settings are available for configuration:
- Metrics Lookback Time Interval: The duration in seconds for retrieving past performance metrics.
- Post Scale Up Cooldown: The waiting period in seconds after an automated scale-up before another scaling action can be considered. The period of time the service is on cooldown is still considered in the metrics for the next potential scale.
- Post Scale Down Cooldown: The waiting period in seconds after an automated scale-down before another scaling action can be considered. The period of time the service is on cooldown is still considered in the metrics for the next potential scale.
- Post Release Cooldown: The time in seconds ignored following any general scaling operation, allowing stabilization before additional scaling changes are considered. Unlike the other cooldowns, this period is not considered in the metrics for the next potential scale.
FAQ
How do I scale my apps and services?
See our guide: How to scale apps and services