As data centers begin to virtualize processors and memory to increase server utilization, IT managers could be
setting themselves up for a massive I/O bottleneck crisis.
Sharad Mehrotra, the CEO and co-founder of Mountain View, Calif.-based Fabric7 Systems Inc., said such a scenario is possible given that many virtual deployments today are generally grids or clusters in the Web-tier of the data center.
The benefits of consolidating servers in the near term are obvious, Mehrotra said, given the proliferation of low-end hardware in the marketplace. However, he added that the industry should be laying the groundwork now to address a looming traffic jam of data caused by the long-overlooked I/O capacity of data center servers.
In this interview, SearchOpenSource.com scored an exclusive first look at what Mehrotra would offer during his session on virtualization and I/O at LinuxWorld. Joining Mehrotra for the interview was Fabric7 co-founder and server architect Thomas Lovett. After explaining why I/O has been overlooked, Mehrotra and Lovett offered their ideas on how IT managers can best prepare their data centers to meet the challenges of I/O in a virtual environment.
SearchOpenSource.com: Why has the I/O capacity of today's data centers been overlooked?
Sharad Mehrotra: My feeling is that I/O has historically been treated as a bit of a stepchild in the industry. The more glitzy thing to do was to work on processor technology in general. It's sort of like going to the gym and seeing the guy with the really big arms and then looking down and [seeing his] really weak legs.
Tom Lovett: A lot of benchmarks have come out over the past 10 years that were driven by the server architecture. Vendors really pushed the processor and memory portions of the servers and not the I/O at all. Designs tended to focus on benchmarks because that's how you differentiate the products when selling to customers. As a result, not a whole lot of people paid attention to I/O over the years.
Mehrotra: This is where the mainframe did a real good job. In that age, they did not know how to build very large processors. In order to move information in and out, IBM had to find ways to make do with what they had. They had to figure out how to build separate computers for handling I/O and then attach them to the main computer. I/O processing was offloaded or shifted away. That effort launched a whole channel architecture for I/O. IBM realized that with channel processing they could allocate different I/O channels to different virtual machines.
The mainframe community knew back in the '70s that I/O was equally important, and they invented a whole lot of machinery to handle good I/O. Part of what we are trying to point out is that, fast-forwarding 35 years to today, the situation that prevails is somewhat similar to the mainframe era, especially with x86-based servers, because we still cannot build very large multiprocessors -- only midrange ones.
What's the timetable for the bottlenecks?
Mehrotra: Generally with the larger networks and grid architectures, we are seeing something of a bottleneck already occurring. The amount of data produced by grid computations is so large that removing the data from the grid can take a long time. And now the time has come where people are starting to build dedicated-edge infrastructure or a dedicated server whose only job is to move data in and out of these grids. If we magnify this problem, and we start with grids of servers and imagine virtual machines running on them, we can now see even more workloads running on grids.
So this bottleneck is forming today, but only in some of the larger deployments?
Lovett: Part of the reason is that there is a lot of overhead in virtualization out there today. In many respects, the technology is very inefficient, and because it is so inefficient, it really doesn't matter what's going on with I/O. However, with the new AMD technology coming out, we will see some of that processor and memory virtualization overhead disappear. When people deploy on top of that technology, we will see I/O start to stick out.
Mehrotra: It's fair to say that grids in the hundreds to thousands of servers are common now, but they don't host many virtual machines on them. When we start to see a rise in the number of virtual machines on these grids over the next two to three years, then I think the next issue to move front and center will be that users can no longer move data in and out of grids.
What can IT managers do today to prepare for the I/O bottleneck?
Mehrotra: Essentially they can begin by embracing the fact that virtual technology is going to become pervasive, just like multi-core processors: in a year or two it will be very hard to find x86 computers that are not multi-core. [Virtualization] will be here, and you won't be able to avoid it because it will be built in. Microsoft will be baking it into the next release of Longhorn, and Xen is shipping with SUSE Linux 10 and will eventually ship with Red Hat Linux 5. VMware will promote the technology aggressively with a closed source model as well as a community model initiative.
And so, I think it will be necessary for users to be educated about the importance of I/O in a virtual machine-rich environment. The only way for IT managers to get their hands around it is to experiment relentlessly with some of their own important applications...in order to understand how they will perform in the physical world versus a virtual one. They should not buy into the vendor hype around CPU virtualization.
Lovett: I'd add that people need to start viewing I/O as a first-class citizen and they need to understand what the I/O requirements are for the virtual workloads they are running today.
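The experimentation Mehrotra and Lovett recommend -- measuring how an I/O-heavy workload actually behaves on physical versus virtual hardware rather than trusting vendor CPU numbers -- can start very simply. The sketch below is not from the interview; it is a minimal, hypothetical probe using only the Python standard library that times a sequential disk write and reports throughput, so the same script can be run on a physical host and inside a virtual machine and the results compared. The file name and sizes are arbitrary assumptions.

```python
import os
import time

def measure_write_throughput(path, total_mb=256, block_kb=64):
    """Write total_mb of data in block_kb chunks and return MB/s.

    Calls fsync before stopping the clock so buffered writes are
    actually pushed to disk and real I/O is being timed.
    """
    block = os.urandom(block_kb * 1024)          # one reusable data block
    blocks = (total_mb * 1024) // block_kb       # how many blocks to write
    start = time.perf_counter()
    with open(path, "wb") as f:
        for _ in range(blocks):
            f.write(block)
        f.flush()
        os.fsync(f.fileno())                     # force data out of the page cache
    elapsed = time.perf_counter() - start
    os.remove(path)                              # clean up the probe file
    return total_mb / elapsed

if __name__ == "__main__":
    rate = measure_write_throughput("io_probe.bin")
    print(f"Sequential write throughput: {rate:.1f} MB/s")
```

Running the identical probe bare-metal and then under a hypervisor gives a rough first estimate of the virtualization I/O overhead for that one access pattern; a real evaluation would also cover reads, random access, and network I/O with the application's own traffic.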