The wealth of information on scaling is profound, but the lesson I found most interesting is that every web application is different. This is not to say we cannot learn from those who have come before us, but every application has its own demanding requirements: a particular read/write mixture of database operations, say, or an inability to cache content effectively. Because of these differences, developers must be keenly perceptive of their own system, which means constantly monitoring the site's trends.

All sites have an average load, and most also have peak loads at specific times of the day, week, and year. Rapid spikes in I/O are also frequently triggered by popular news and blog sites. Planning for these spikes matters: many sites use only 10-20% of their capacity during regular hours, reserving the rest for the eventual peaks. Even if you don't want to build out your own system to handle a few extreme user loads, Amazon and others provide virtually unlimited on-demand capacity, provided your software is designed to scale in such a fashion. A wide variety of sites run a wide variety of configurations today, and many of the largest web services have developed unique, quite different solutions to their scaling problems. Not all of those solutions were expensive, proprietary hardware or software.
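The headroom arithmetic above can be sketched in a few lines. All numbers here are hypothetical (they are not from any site mentioned in this post): a service averaging 200 requests/sec that sees 10x spikes, with each server handling roughly 500 requests/sec.

```python
import math

def servers_needed(peak_rps: float, per_server_rps: float,
                   headroom: float = 0.2) -> int:
    """Servers required to absorb peak_rps while keeping a `headroom`
    fraction of each server's capacity in reserve."""
    usable = per_server_rps * (1 - headroom)
    return math.ceil(peak_rps / usable)

average_rps = 200
spike_multiplier = 10          # e.g. a link from a popular news site
peak_rps = average_rps * spike_multiplier

print(servers_needed(peak_rps, per_server_rps=500))  # 5
```

The point of keeping the reserve fraction explicit is that "regular hours" traffic then sits at a small slice of total capacity, which is exactly the 10-20% utilization pattern described above.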

Segmentation of a site's features is the first and most frequently mentioned method of scaling. Once a system can be segmented, it can be clustered, sharded, and/or load balanced, and this applies across the entire architecture. This is the point at which you stop scaling vertically and begin to scale horizontally: developers can stop tweaking for minor performance improvements and instead design the software to handle indefinite user loads at the cost of cheap, redundant hardware.
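As a minimal sketch of what horizontal partitioning looks like in practice, consider routing each user's data to one of several database shards by hashing the user id. The shard names and count here are purely illustrative:

```python
import hashlib

# Hypothetical shard pool; in a real deployment these would be
# connection strings or client handles for separate database servers.
SHARDS = ["db-shard-0", "db-shard-1", "db-shard-2", "db-shard-3"]

def shard_for(user_id: str) -> str:
    """Pick a shard with a stable hash, so the same user always
    lands on the same database server."""
    digest = hashlib.md5(user_id.encode()).hexdigest()
    return SHARDS[int(digest, 16) % len(SHARDS)]

print(shard_for("alice") == shard_for("alice"))  # True: routing is stable
```

Adding capacity then means adding shards rather than buying a bigger machine, though note that naive modulo hashing reshuffles most keys when the shard count changes; schemes like consistent hashing exist to soften that.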

The most popular solution begins by separating the web server or web application from the database. At that point, each tier can add clusters, layers of redundancy, and optimizations suited to its read/write mixture. This can be quickly followed by application caching to relieve the inevitable bottleneck: the relational database is almost always the slowest part of a system, so avoiding database accesses is best. Expensive operations such as joins can be performed at the application level instead, or avoided altogether.
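The caching idea above is usually implemented as a cache-aside lookup: check the cache before touching the database, and populate it on a miss. This sketch uses an in-process dict as a stand-in for memcached, and the query function is a made-up placeholder for an expensive SELECT:

```python
cache: dict = {}
db_hits = 0

def fake_db_query(user_id: str) -> dict:
    """Illustrative stand-in for an expensive database query."""
    global db_hits
    db_hits += 1
    return {"id": user_id, "name": user_id.title()}

def get_user(user_id: str) -> dict:
    key = f"user:{user_id}"
    if key in cache:            # cache hit: no database round trip
        return cache[key]
    row = fake_db_query(user_id)
    cache[key] = row            # populate for subsequent readers
    return row

get_user("alice")
get_user("alice")
print(db_hits)  # 1 -- the second read never reached the database
```

With a read-heavy mixture, most traffic is absorbed by the cache tier and the database sees only misses, which is what makes this the standard first optimization after splitting the tiers.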

Some bloggers have a tendency to blame the language an application was built upon as an impediment to scale and responsiveness. While this may play a factor, I have never seen a developer of a large web service blame the system's shortcomings on the language in use. In fact, most problems arise from applications being poorly designed for their actual purpose, poorly designed for scale, and poorly designed for transparency.

I’ve gathered a ton of articles on how to make scaling improvements to a general installation, and I would recommend looking through a few of them. I found the articles on the Amazon, Google, and Facebook architectures particularly interesting.

Very informative on scaling Drupal

Maximum performance from Apache articles

Current server types in use today

This article is a little old, but it is quite funny.

A slideshow of current scaling strategies

Ruby on Rails and S3 Slideshow

May not be realistic, but these are the claims of the VMware crowd.

Scaling a website with a focus on caching and read pools

The number of servers some of the top websites are currently using. And some comments!

Slideshow on scaling

Short article on load balancing and clustering

Tuning the iPlanet web server

Very interesting articles on how the largest web services scale. This is how the big dogs do it.

A definition of scalability

An interesting article on performance planning and the ability to determine the maximum capacity of the server

A good article on how to determine metrics and scale up capacity by Flickr

A short blog post on sharding

Useful Resource Links