Some traps of utilization as a KPI driving a Virtualization Strategy
You must have seen the comments, even from reputable analysts and commentators: ‘servers in data centers run at an average of 8% CPU utilization, that’s a bad thing, and it’s down to the data center operations people to fix this atrocious misuse of corporate assets by increasing the average utilization rate.’ Often virtualization is offered as the panacea. A few observations on this type of analysis:
1/ Much of the software in the data center runs on Windows. For various reasons, the original systems are likely to have been built on the assumption that they had all of the hardware to themselves, and it’s quite hard to consolidate services onto Windows platforms. The result is a lot of small machines that cannot share resources.
2/ 8% (or 5% or 15%) is not necessarily a poor average utilization for a small server. Look at these machines over a day and you’ll find that many of them top out at or near 100%. This is a simple consequence of the volatility of the load that’s put on them: if the average were higher, there’d be many more service interruptions. It’s not difficult to work out why this is so; you just need a bit of queuing theory.
3/ Actually, the situation’s worse than it appears at the server level: on average, a server in a data center has about half the performance of the latest equipment being installed today. With a three-year asset lifetime for servers, the computers being decommissioned have a quarter of the performance of the equipment replacing them. So if there’s no consolidation of services or business applications, the same workloads land on ever more powerful machines and average utilization will naturally get worse.
4/ Virtualization will certainly raise the average utilization, in the same way that putting a screen saver on each server would: the overhead of virtualization technology is significant when compared to the ‘average’ utilization of a server. It takes more than a technology to fix this business problem.
5/ The people who run the data centers are not usually in a position to judge where a particular business service could tolerate slightly lower performance or a few more outages. Indeed, it’s rare for an organization to have correct information on which servers support which business applications or services for more than 60% of the estate.
6/ There are two main factors that must be brought to bear to get better asset utilization:
— more ‘liquidity’ for the assets, so that the demand fluctuations of the business applications that the servers support can be better averaged out. Liquidity is used here in the same sense as in financial markets: if I need a new server for a particular task, can I get one, and can I dispose of it (pass it on to another user of servers) when I need to?
— end-to-end capacity planning and management, tying together the whole IT value chain, breaking down the silos between IT operations, Applications Development, Engineering and the technology silos within these groups.
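To put a rough number on the queuing theory behind point 2, here is a minimal sketch. It assumes the simplest textbook model, a single-server M/M/1 queue (random arrivals, exponentially distributed service times), which is an idealization and not a model of any particular server. The function names are made up for this illustration.

```python
def mm1_mean_response_time(utilization: float, service_time: float = 1.0) -> float:
    """Mean time a request spends in an M/M/1 system (waiting plus service).

    Uses the standard result W = S / (1 - rho), where S is the mean
    service time and rho is the average utilization.
    """
    if not 0.0 <= utilization < 1.0:
        raise ValueError("utilization must be in [0, 1) for a stable queue")
    return service_time / (1.0 - utilization)


def response_time_table(utilizations):
    """Return (utilization, mean response time) pairs for a list of loads."""
    return [(rho, mm1_mean_response_time(rho)) for rho in utilizations]


if __name__ == "__main__":
    # Illustrative utilization levels only; 8% is the figure quoted in the post.
    for rho, w in response_time_table([0.08, 0.30, 0.50, 0.80, 0.90, 0.95]):
        print(f"average utilization {rho:4.0%}: "
              f"mean response time = {w:5.2f} x service time")
```

Under this model the mean response time is twice the bare service time at 50% average utilization and ten times at 90%, which is why a volatile load on a small dedicated server pushes the sensible average so low.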
Virtualization may help to increase the ‘liquidity’ of the assets, provided it is properly implemented and has suitable supporting organization, governance and processes. But it’s not purely a technical issue (the same result could be achieved at greater cost through more aggressive standardization and consolidation of services onto fewer platforms), and it requires a degree of insight into how to make it happen. A significant, but probably invisible, win is that deployment and decommissioning time for servers should drop from around 15% of the assets’ useful life to under 5%, merely from re-engineering these processes.
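The ‘averaging out’ that liquidity buys can be sketched with a little arithmetic. The assumptions here are mine, not measurements: each workload has the same mean and standard deviation, the workloads fluctuate independently of each other, and capacity is sized at mean plus k standard deviations. The function names and the example figures (mean 8, standard deviation 30, k = 3, in percent of one server) are hypothetical.

```python
import math


def dedicated_utilization(mean: float, std: float, k: float) -> float:
    """Average utilization when each workload gets its own server.

    Capacity per workload is sized at mean + k * std (an assumed sizing rule).
    """
    return mean / (mean + k * std)


def pooled_utilization(mean: float, std: float, k: float, n: int) -> float:
    """Average utilization when n independent workloads share one pool.

    For independent loads the variances add, so the standard deviation of
    the total grows with sqrt(n) while the mean grows with n.
    """
    total_mean = n * mean
    total_std = std * math.sqrt(n)
    return total_mean / (total_mean + k * total_std)


if __name__ == "__main__":
    mean, std, k = 8.0, 30.0, 3.0  # illustrative figures, not measurements
    print(f"dedicated servers: {dedicated_utilization(mean, std, k):.1%}")
    for n in (4, 16, 100):
        print(f"pool of {n:3d} workloads: {pooled_utilization(mean, std, k, n):.1%}")
```

The more workloads that can genuinely share a pool, the less headroom each one needs. Real workloads that peak together (month-end, market open) break the independence assumption and shrink the benefit, which is one reason the capacity planning has to be end-to-end.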
