
HOW TO AVOID OVER-PROVISIONING JAVA RESOURCES


Horizontal vs. vertical scaling: Which is better for adjusting application resource consumption and optimizing cloud costs?

Translated from How to Avoid Overprovisioning Java Resources, by Pratik Patel.

Developers are strange creatures: we use shiny new tools and libraries in our applications without much thought, but we're cautious when deploying to production. No one wants their pager to go off in the middle of the night, let alone the pressure of keeping an app running at many nines of availability. Developers are adventurous when building and coding apps, but very conservative when it comes to operations.

This leads to a phenomenon known as overprovisioning: adding extra horsepower (typically CPU and RAM) to an application's cloud deployment to ensure the application has enough room to boot and to absorb spikes that occur while it's running.

Fortunately, there are ways to reduce your need for overprovisioning, which can lead to significant savings in cloud spending. I'm going to look specifically at overprovisioning in Java applications.

Application load is never steady

As any developer or DevOps person will tell you, traffic to an application is almost never consistent over a day or a week, and the vast majority of applications are unevenly loaded over time. Every application has troughs where it isn't handling many user requests or processing much data, and spikes where utilization is very high. These spikes are fine as long as application instances aren't pushed to the point where problems appear, such as:

  • Response latency too long to meet service-level agreements (SLAs).
  • Excessive memory usage causing garbage collector (GC) thrashing in the Java Virtual Machine (JVM).
  • Insufficient resources (CPU threads, file/network handles) causing incoming requests to be rejected and left unprocessed.

The last two issues can make the application completely unresponsive, as if it were down. During testing, developers note this ceiling and adjust the number of CPU cores and the amount of memory the application instance needs. They then add a typically arbitrary amount of extra CPU and memory to accommodate spikes, overprovisioning the application's available resources. Overprovisioning is a safety net for the development team: it keeps everything running smoothly and users happy with response times.

However, overprovisioning adds significant cost to running applications. Cloud VMs typically come with a fixed number of CPUs (cores or virtual CPUs) and a fixed amount of memory, and aren't elastic. This means you pay for the capacity you've provisioned whether you fully utilize it or not. This extra headroom can account for anywhere from 5% to 50% of the capacity you've provisioned, depending on how much slack the development team thinks it needs to accommodate spikes.

To help you deal with overprovisioning and save money on cloud spending, there are strategies you can use, depending on whether you're scaling vertically or horizontally. I'll describe both scaling models and the strategies for each. You can use these strategies and techniques whether you run in the cloud or on-premises.

Vertical scaling

Vertical scaling is the simpler strategy for scaling your application to handle more load, but it's not as flexible as horizontal scaling. Vertical scaling means adding more CPU cores and memory to the physical or virtual server your application runs on (or faster or larger SSD storage if your application is I/O intensive). Changing these requires stopping and restarting the application, which can be disruptive. Still, there are ways to reduce overprovisioning with this type of scaling.

Better load testing and estimation

Performance testing is considered the most difficult type of testing: it requires an in-depth understanding of the entire application and all connected services. Setting up a performance test environment is a lot of work, and matching it to the characteristics of the production environment can be a challenge. Generating simulated production load and realistic application data (the size and shape of production data) takes thought and effort to get right.

As a result, development teams often make assumptions and take shortcuts. That's fine, but it can lead to overestimating and oversizing the application's production instances.
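Even a rough load test beats guessing. Below is a minimal sketch of a load generator using the JDK's built-in HttpClient; the endpoint URL, concurrency, and test duration are placeholders you'd replace with values that mirror your production traffic. Dedicated tools such as JMeter or Gatling give you far more realistic load shapes and reporting, but even a sketch like this gets you measured numbers instead of assumptions.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.time.Duration;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;

public class MiniLoadTest {
    public static void main(String[] args) throws Exception {
        // Hypothetical endpoint; point this at your test environment.
        URI target = URI.create("http://localhost:8080/api/orders");
        int concurrency = 50;                    // simulated concurrent users
        Duration testLength = Duration.ofMinutes(2);

        HttpClient client = HttpClient.newHttpClient();
        HttpRequest request = HttpRequest.newBuilder(target).GET().build();
        AtomicLong requests = new AtomicLong();
        AtomicLong totalNanos = new AtomicLong();

        ExecutorService pool = Executors.newFixedThreadPool(concurrency);
        long deadline = System.nanoTime() + testLength.toNanos();
        for (int i = 0; i < concurrency; i++) {
            pool.submit(() -> {
                while (System.nanoTime() < deadline) {
                    long start = System.nanoTime();
                    try {
                        client.send(request, HttpResponse.BodyHandlers.discarding());
                        requests.incrementAndGet();
                        totalNanos.addAndGet(System.nanoTime() - start);
                    } catch (Exception e) {
                        // A real test would count and classify errors.
                    }
                }
            });
        }
        pool.shutdown();
        pool.awaitTermination(testLength.toSeconds() + 30, TimeUnit.SECONDS);

        long n = requests.get();
        System.out.printf("requests: %d, avg latency: %.1f ms%n",
                n, n == 0 ? 0 : totalNanos.get() / n / 1_000_000.0);
    }
}
```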

What can developers do to get better performance data so that their Java applications can be properly sized? Here are the three main things you can do to determine your application's peak capacity needs.

1. Measure CPU and memory utilization for servers and JVMs

Typically, developers look only at the CPU and memory usage of the server (or virtual machine) to determine how much CPU and memory is needed to handle peak load. Using tools that also monitor these inside the JVM will help you set them at the right level (a minimal sketch follows this list):

  • JVM GC monitoring: This helps detect low memory, which can drive up CPU utilization as the JVM gets stuck thrashing in GC. It also helps detect over-allocated memory, which results in long GC pauses and, in turn, higher latency than expected. Trimming unnecessary memory also saves money.
  • JVM thread monitoring: This helps detect when CPU is exhausted, which can lead to long response times or no response at all. It also helps detect too many idle threads; reducing the number of allocated cores saves money, too.
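As a minimal sketch of what this monitoring can look like, the snippet below polls the JVM's standard management beans for heap usage, per-collector GC counts and times, and thread counts. In practice you'd feed these numbers into a metrics system (or use JDK Flight Recorder) rather than print them, but these MXBeans are the underlying data source.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryMXBean;
import java.lang.management.MemoryUsage;
import java.lang.management.ThreadMXBean;

public class JvmStats {
    public static void main(String[] args) throws InterruptedException {
        MemoryMXBean memory = ManagementFactory.getMemoryMXBean();
        ThreadMXBean threads = ManagementFactory.getThreadMXBean();

        while (true) {
            // Heap in use vs. heap the JVM has actually claimed from the OS.
            MemoryUsage heap = memory.getHeapMemoryUsage();
            System.out.printf("heap used: %d MB / committed: %d MB%n",
                    heap.getUsed() >> 20, heap.getCommitted() >> 20);

            // Cumulative GC counts and time per collector.
            for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
                System.out.printf("gc %s: count=%d time=%d ms%n",
                        gc.getName(), gc.getCollectionCount(), gc.getCollectionTime());
            }

            // Rising peak thread counts can indicate CPU or handle pressure.
            System.out.printf("live threads: %d (peak %d)%n",
                    threads.getThreadCount(), threads.getPeakThreadCount());

            Thread.sleep(10_000);  // sample every 10 seconds
        }
    }
}
```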

2. Newer JVM versions perform better than older ones

In our tests moving from JDK 11 to 17 to 21, we saw CPU usage improve with each release of the JVM. Of course, your application code may need some tweaking, especially if it's based on a version earlier than Java 11.

There are also different GC algorithms that can get more efficiency out of your cloud virtual machines; however, the right choice depends heavily on your app's memory usage. For example, an application that does a lot of data processing and transformation will have a different GC profile than a RESTful application. You can check out the GC section of the Azul blog for more information.
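As a quick way to experiment, HotSpot's collectors are selected with standard command-line flags, and a small program can confirm which one is active. The flags in the comments below are real HotSpot options; which collector wins for your application is something only measurement under your own load can tell you.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

// Run with different collectors to compare GC profiles, e.g.:
//   java -XX:+UseG1GC       WhichGc   (the HotSpot default since JDK 9)
//   java -XX:+UseParallelGC WhichGc   (throughput-oriented)
//   java -XX:+UseZGC        WhichGc   (low pause times)
// Add -Xlog:gc to print GC events for offline analysis.
public class WhichGc {
    public static void main(String[] args) {
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            System.out.println("active collector: " + gc.getName());
        }
    }
}
```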

3. Understand how the JVM works

The following diagram shows how a typical Java application progresses from JVM startup to steady-state execution over time. CPU usage is high at startup; this is the JVM booting, loading classes, and so on. Your application framework, such as Spring Boot, then starts, initializes, and enters a ready-to-serve state.

[Diagram: CPU utilization of a Java application over time, from JVM startup through JIT warm-up, with the provisioned-capacity line above the peak]

Notice the line above the peak, which shows how over-provisioned this application's VM deployment is in CPU terms (a safety net for bursts of high load). As the JVM's just-in-time (JIT) compiler optimizes code paths, the application becomes more efficient: it uses less CPU to serve the same amount of load. What ultimately happens is that, on top of the headroom you've already provided, the JVM settles at a lower CPU utilization baseline thanks to JIT optimizations. As a result, the amount of over-provisioning grows! You now have even more waste in your allocated CPU, and an opportunity to save even more money.
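You can see a miniature version of this curve yourself. The toy benchmark below times the same work repeatedly from a cold start: early rounds run in the interpreter and lightly compiled code, and later rounds level off once the JIT has optimized the hot method. For real measurements use a harness like JMH; this sketch only illustrates the shape of the curve.

```java
public class WarmupDemo {
    static long work(int n) {
        long sum = 0;
        for (int i = 0; i < n; i++) {
            sum += (i ^ (i << 1)) % 7;
        }
        return sum;
    }

    public static void main(String[] args) {
        long sink = 0;  // consume results so the JIT can't discard the work
        for (int round = 1; round <= 10; round++) {
            long start = System.nanoTime();
            sink += work(5_000_000);
            long ms = (System.nanoTime() - start) / 1_000_000;
            System.out.printf("round %d: %d ms%n", round, ms);
        }
        System.out.println("(ignore) sink=" + sink);
        // Typical output: the first rounds are slowest (interpreter, then
        // tiered C1 code) and later rounds level off on optimized code.
    }
}
```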

Using a high-performance JVM means you can reduce (or completely eliminate) overprovisioning. Understanding this curve and its impact on your application can help you shrink the safety net allocated to your application's VM instances. If you know where the long-tail peaks will be, you can lower the top line (the over-provisioning), allocate fewer CPU cores, and save on cloud spending.

Horizontal scaling

For years, elastic computing has been touted as the holy grail of scalable application development, and horizontal scaling is the foundation of elastic computing. Horizontal scaling means adding capacity to your application by adding more servers (with their own CPU and memory) rather than adding more CPU cores and memory to existing servers.

However, horizontal scaling is more complex than vertical scaling and requires more planning and more setup external to the application. It's also less efficient than vertical scaling because you have to introduce a routing layer, which means more processing and network overhead.

In horizontally scaled deployments of Java applications, you reduce overprovisioning by adding and removing capacity as needed, typically in an automated way that detects load and starts or stops application node instances. You'll still have some over-provisioned capacity, but in small amounts and for short durations (depending on how you've configured it).
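Conceptually, that automation is a simple control loop. The sketch below shows the scale-out/scale-in decision in Java; `CloudApi` is a hypothetical stand-in for whatever your platform exposes (Kubernetes' Horizontal Pod Autoscaler and the cloud providers' autoscalers implement a more sophisticated version of this loop for you). The scale-in branch is where the cost savings come from.

```java
import java.util.concurrent.TimeUnit;

public class NaiveAutoscaler {
    // Hypothetical platform API; real platforms expose equivalents.
    interface CloudApi {
        double averageCpuUtilization();   // 0.0 - 1.0 across all instances
        int instanceCount();
        void setInstanceCount(int n);
    }

    static void controlLoop(CloudApi cloud) throws InterruptedException {
        final double scaleOutAt = 0.75;   // add capacity above 75% CPU
        final double scaleInAt = 0.30;    // remove capacity below 30% CPU
        final int min = 2, max = 20;      // keep a floor for availability

        while (true) {
            double cpu = cloud.averageCpuUtilization();
            int n = cloud.instanceCount();
            if (cpu > scaleOutAt && n < max) {
                cloud.setInstanceCount(n + 1);   // scale out for the spike
            } else if (cpu < scaleInAt && n > min) {
                cloud.setInstanceCount(n - 1);   // scale in: the savings step
            }
            TimeUnit.SECONDS.sleep(60);          // cooldown between decisions
        }
    }
}
```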

Reduce the size of your application

As we've moved from building monolithic applications to microservices (or even smaller cloud functions), applications have gotten smaller and smaller. Each of these architectures has advantages and trade-offs, but in the context of cloud cost optimization, especially when using horizontal scaling for elastic compute, smaller (or small-to-medium-sized) deployment units work best.

Reducing the size of your application reduces the CPU and memory you need to allocate to each instance. This allows for more incremental scaling and more efficient use of resources, which in turn means more granular control over cloud costs. The smaller the deployment unit, the more closely your costs track your actual load as you scale up and down. Of course, this is only possible if you use autoscaling.

Use autoscaling

Autoscaling is the ability to add or remove application instance nodes as load rises and falls. For cloud cost optimization, we're most interested in scaling down aggressively, that is, stopping application instance nodes. Depending on the environment you use to run your application cluster, you'll get different autoscaling options. The most popular platform is Kubernetes, which has built-in autoscaling support. Kubernetes' main trade-off is that it adds considerable complexity compared to a standard fixed, distributed cluster deployment.

A simpler alternative to Kubernetes is containers-as-a-service (CaaS), such as AWS Fargate, Google Cloud Run, or Microsoft Azure Container Instances. These services provide an easier way to deploy your applications: you hand your application (in a Docker container) to the service, and it handles scaling up and down. The trade-off is that CaaS solutions cost more than standard VMs and may cost more than managed Kubernetes deployments.

Conclusion

Reducing overprovisioning can help you save on cloud costs. Ultimately, what you can achieve depends largely on your application and its performance profile. Knowing what happens as your application starts and runs informs which strategy you use to reduce overprovisioning, and understanding your Java application's CPU and memory profiles will help you understand how it behaves at runtime.

Consider using a more efficient, high-performance JVM, such as Azul Platform Prime, for small to large Java application deployments. Azul Platform Prime:

  • Handles peak loads better than other JVMs, thanks to its advanced C4 GC, low-level optimizations, and Falcon JIT compiler.
  • Avoids slow JIT ramp-up (and the high CPU cost of JIT compilation) with ReadyNow.
  • Provides lower latency under load while handling higher peaks.

To learn more, download IDC's white paper on Optimizing Java Application Performance to Improve Business Outcomes and Cloud Cost Efficiency.
