The mainframe -- an IBM 2066-002, part of the zSeries line -- was unable to keep up with the demands of crunching data on 500 million unique vehicles every year. Some of the data processing jobs on the mainframe would often take several days to complete. Parts of the infrastructure at Polk were close to 20 years old when executives first started looking elsewhere for solutions in 2004.
With about 4.5 petabytes of stored information on hand, the mainframe took on the persona of a lumbering behemoth. This was especially the case when the IT staff had to accommodate new business requirements, such as a car dealership adding a new type of vehicle to its inventory. Each update required a major rework of the program, said Mick Isiminger, director of IT operations at RLP Technologies, a wholly owned research and development subsidiary of Polk.
And the amount of information on hand was growing. As one of the largest providers of automotive consumer information, Polk tasked RLP Technologies (RLPT) with finding, configuring and deploying a higher performance and more flexible alternative.
Grid computing versus mainframe
Leading the effort, Isiminger said the decision was made early on to switch out the mainframe and go with a grid computing environment. Why a grid? Because it offered a "loosely coupled environment" that could adapt to change more easily than a mainframe, he said.
Grid computing achieves higher throughput by distributing processes across a group of networked servers. A grid pools the resources of those servers to solve large-scale computations.
They can also perform computations on large data sets by breaking them into many smaller ones. Grids are a popular form of computing in academic and scientific environments, and organizations like the Open Grid Forum have formed to promote their use in the enterprise.
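The split-and-distribute idea described above can be sketched in a few lines of Python. This is only an illustration, not RLPT's code: the function names, the sample vehicle records and the use of a local thread pool as a stand-in for a group of servers are all assumptions made for the example.

```python
from concurrent.futures import ThreadPoolExecutor


def split_into_chunks(records, chunk_size):
    """Break one large data set into smaller, independent pieces."""
    return [records[i:i + chunk_size] for i in range(0, len(records), chunk_size)]


def count_by_make(chunk):
    """Hypothetical per-node job: tally vehicles by make within one chunk."""
    counts = {}
    for make, _model in chunk:
        counts[make] = counts.get(make, 0) + 1
    return counts


def merge_counts(partials):
    """Combine the partial results from every worker into one answer."""
    totals = {}
    for partial in partials:
        for make, n in partial.items():
            totals[make] = totals.get(make, 0) + n
    return totals


def run_on_grid(records, chunk_size=2, workers=4):
    """Distribute chunks across workers (threads stand in for grid servers)."""
    chunks = split_into_chunks(records, chunk_size)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(count_by_make, chunks))
    return merge_counts(partials)


# Made-up sample data, purely for illustration.
vehicles = [
    ("Ford", "F-150"),
    ("Honda", "Civic"),
    ("Ford", "Focus"),
    ("Toyota", "Camry"),
    ("Honda", "Accord"),
]
totals = run_on_grid(vehicles)
```

Because each chunk is processed independently, adding workers raises throughput without changing the per-chunk logic, which is the property that makes a grid easier to scale than a single large batch job.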
The servers that would go into the grid were still a matter of debate, however. RLPT's search began with SPARC-based servers from Sun Microsystems Inc. and x86 servers from Dell Inc. Testing covered performance benchmarking and scalability for future updates, and it determined which servers handled floating-point and integer calculations the fastest.
Price was also a consideration, although it actually trailed performance and scalability in the pecking order. Truth be told, the Sun SPARC and Dell x86 servers were similarly priced, Isiminger said.
Ultimately, the choice was made to deploy the grid on Intel-based Dell PowerEdge 6850 servers. The production grid comprises 49 servers and 118 processors running Red Hat Enterprise Linux. The grid also runs an RLPT custom-built internal data management application, called OneView360, which handles all information processing.
Isiminger said his IT staff moved to the grid from a mainframe environment built on high-performance code developed in Assembler and PL/1, business logic in COBOL, proprietary mainframe development tools, flat file processing, VSAM and some IMS.
Most of the OneView360 architecture was developed by RLPT, but Cap Gemini and several smaller local boutique consultancy firms were also used to build out the application. Application development took 14 months, while implementation and integration into the business took six months to complete, Isiminger said.
The benefits of grid computing were immediately apparent to Isiminger and his staff.
"In the old world," Isiminger said of life with a mainframe, "we would have passed all the [customer and vehicle] information through in big batches. These batches would run for days in the mainframe, and we're talking batches upon batches upon batches."
But with a grid, this week-long process was reduced to "a matter of hours," said Norm Marks, RLPT's head of marketing. The grid was "elegant" in comparison, he said.
"As the data flows through the grid, we can carve out just the information we need to make the service run. We can parse out individual names, add and separate information all at the same time. Eventually, all of the information meets up at a single end point and is joined together again," Marks said. And instead of having to deal with the entire batch at one time, the grid allows the IT staff to scale capacity on demand.
The grid also grants the ability to automate manual tasks and cut hours off the day, Marks said. "Typical up-front data captures used to take four to 14 hours on the mainframe, but today we're completing them as automated tasks in [less than 30 minutes]," he said.
Internal tests have shown speed improvements in data-file processing of up to 70% over what the mainframe could provide. The grid averaged 100 transactions per second, four times parent company [R.L. Polk's] 25 per second. Marks said the differential provides room for future transactions, business growth and processing spikes.
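The headroom Marks describes follows directly from the two throughput figures quoted above. The short calculation below is a sketch that uses only those two numbers. The variable names are made up for illustration, and treating the 25-per-second figure as the load the grid must absorb is an assumption.

```python
# Figures quoted in the article (average transactions per second).
grid_tps = 100
polk_tps = 25

# How many times larger the grid's average rate is.
ratio = grid_tps / polk_tps

# Spare capacity left for growth and processing spikes,
# assuming the 25 tps figure is the load to be carried.
headroom_tps = grid_tps - polk_tps

# Share of the grid's average rate that the 25 tps load would use.
utilization = polk_tps / grid_tps

print(f"ratio: {ratio:.0f}x, headroom: {headroom_tps} tps, utilization: {utilization:.0%}")
```

Under that assumption, three quarters of the grid's average rate remains available for the future transactions, business growth and processing spikes that Marks mentions.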
Life on the grid also saved money. Millions, to be precise. Isiminger's IT group now operates at a 43% smaller size, and the move away from the mainframe to a grid computing model reduced hardware costs by 65% overall.
Why Linux and open source?
When RLPT compared servers, it also compared operating systems. The winner was "easily Red Hat," but Isiminger said Sun Solaris was in the running when his company debated whether to use SPARC or Intel-based servers.
The OS comparison focused on security. Isiminger wanted assurances that patching and other security updates would be delivered in a timely manner by the vendor.
Security was the most important criterion because OneView360 functioned as a secure door through which all data for R.L. Polk would flow, Isiminger said. The data included sensitive information like vehicle identification numbers (VINs), car pricing and customer names. Linux in general was prized because of its lower price, flexibility and security.
Red Hat Linux impressed Isiminger with its security chops, but the company's partnership with open source middleware provider JBoss Inc. sealed the deal. Red Hat acquired JBoss for $350 million in April.
Isiminger said JBoss became an integral part of OneView360 both as an application server for its user interface (UI) and as a platform for running the application's complex custom service orchestration engine. The orchestration engine handles the most complex processing in the system and controls massively parallel processing across the grid, he said.
JBoss staff was also onsite to provide subject matter expertise and training for their technologies, he said.
When he was asked to advise potential adopters of grid computing technology, Isiminger kept it simple: "Do your research," he said. "In our case, we needed to enhance a service. Grid computing is certainly no silver bullet, but it worked perfectly for us."
And what of that old mainframe? It's still around, but Isiminger wouldn't say exactly what it was up to. It operates in a "reduced capacity," he said.