A major benefit of the Linux operating system is its multi-platform capability, but each hardware platform comes with its own set of questions: Why might we want to run Linux on higher-end machines? How would such a system be architected differently on midrange offerings than on commodity PCs?
In this article we discuss the midrange offering, IBM System p: specifically, what virtualization tools are available on platforms such as IBM's System p, how we configure and administer our partitions in this kind of environment and what tools are available for System p.
Reasons for System p
Why IBM System p? There are three reasons to run Linux on this platform: reliability, scalability and performance. IBM incorporates RAS (reliability, availability and serviceability) characteristics into the hardware which drives both their Linux (RHEL and SLES) and Unix (AIX) distributions.
To minimize outages, a reliable system proactively manages possible hardware failures. System p does this by using mainframe-inspired components and reduced power consumption for increased reliability. The technology even provides self-correcting capabilities: predictive failure analysis on processors, caches and memory.
For many, scalability is the key reason for moving to this architecture. New host instances are created in System p by partitioning the server after determining what the requirements are for the partition (CPU, RAM, I/O). And the virtualization capabilities
With an architecture that allows one to scale vertically rather than horizontally, increased scalability leads to increased performance. This provides large growth capacity within a single box, without having to buy new hardware, and a smaller footprint in the data center. It can scale up (as opposed to out) because of the massive horsepower available on systems such as the IBM p595, which can house up to 64 CPUs and 2 TB of RAM and can provide up to 254 logical partitions. This allows for better server consolidation and a lower cost of ownership because of reduced power and cooling requirements.
System p is definitely a performer. Recent SPECfp_2006 benchmarks on IBM's most popular high-end midrange server, the IBM System p570, powered by the new POWER6 chip (4.7 GHz) and running SUSE Linux, scored 22.4, the highest result in the industry and approximately 23% better than an HP Integrity rx6600 running HP-UX.
System p virtualization
Virtualization capabilities are another integral component of the POWER architecture. The virtual small computer system interface (SCSI) and shared Ethernet capabilities of IBM's Advanced Power Virtualization (APV) can handle I/O from Virtual I/O (VIO) servers, further increasing flexibility and lowering total cost of ownership. The VIO servers are actually partitions that have physical resources (adapters and/or disks). Your logical partitions, which are VIO clients, can then share these resources with other logical partitions. Virtual I/O is invaluable for environments that do not require a huge amount of bandwidth, such as development or staging environments, and for environments that have run out of physical capacity in the managed frame.
APV is not implemented as operating system functionality (as VMware is) but is built directly into the POWER architecture, which uses a hypervisor to integrate with the OS. The Power Hypervisor (PHYP) is the firmware layer that runs underneath AIX, Linux and i5/OS on pSeries machines. It resides in flash memory and provides for the configuration of the POWER processor and the virtualization support.
LoP installation and configuration
So how do we install and configure Linux on Power (LoP)? The IBM Installation Toolkit for Linux on POWER is available as an ISO image, which allows you to burn your own DVD for the Toolkit. Version 2.1, released in late September, provides the following enhancements: support for the new POWER6 machines, advanced POWER6-enabled Toolchain installation, support for RHEL5 and SLES10 SP1, an enhanced user interface, improved disk partitioning and improved documentation. There are also special Service and Productivity tools available for both Red Hat and SUSE distributions, which provide specific tools to help you configure and optimize your environment.
How do we build and maintain the partitions themselves? We can use either a Hardware Management Console (HMC) or a software-based utility called the Integrated Virtualization Manager (IVM). You can create partitions with dedicated resources, micro-partitions with dedicated I/O or micro-partitions with shared I/O. The type of environment one is running usually determines the configuration. If the environment is a QA or test instance, shared I/O components are usually the way to go. In production environments, where optimum performance is usually needed, dedicated I/O is usually recommended. To use shared I/O, you need to configure Virtual I/O servers, whose resources are shared by the VIO clients that use these features. Supported Linux clients include SLES9, SLES10 and RHEL AS4 for POWER.
What about the server itself? How can you configure or monitor important components of the System p? Let's start by looking around the box.
172_29_137_157:/proc # more cpuinfo
processor : 0
cpu : POWER6 (architected), altivec supported
clock : 4704.000000MHz
revision : 3.1 (pvr 003e 0301)
processor : 1
cpu : POWER6 (architected), altivec supported
clock : 4704.000000MHz
revision : 3.1 (pvr 003e 0301)
timebase : 512000000
machine : CHRP IBM,9117-MMA
172_29_137_157:/etc # more SuSE-release
SUSE Linux Enterprise Server 10 (ppc)
VERSION = 10
PATCHLEVEL = 1
172_29_137_157:/etc #
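When several partitions need to be checked, reading this output by eye gets tedious. The following is a minimal sketch, not part of the original walkthrough, that condenses a /proc/cpuinfo-style file into three lines. The function name `cpu_summary` is hypothetical, and the field names (`processor`, `cpu`, `machine`) are assumed to match the POWER output shown above.

```shell
#!/bin/sh
# Summarize a /proc/cpuinfo-style file: logical processor count,
# CPU model and machine type. Field names are assumed to match the
# POWER layout shown in the article.
# Usage: cpu_summary [file]   (defaults to /proc/cpuinfo)
cpu_summary() {
    awk -F': *' '
        { sub(/[ \t]+$/, "", $1) }                 # trim padding before the colon
        $1 == "processor" { n++ }                  # one entry per logical processor
        $1 == "cpu" && model == "" { model = $2 }  # keep the first model string
        $1 == "machine" { machine = $2 }
        END {
            printf "logical_processors=%d\n", n
            printf "cpu=%s\n", model
            printf "machine=%s\n", machine
        }
    ' "${1:-/proc/cpuinfo}"
}
```

Calling `cpu_summary` with no argument reads the live /proc/cpuinfo; passing a saved copy of the file makes it easy to compare partitions offline.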
Using standard Linux commands, we determine that this is a POWER6 server with SLES10. One of the most important features of Advanced Power Virtualization is simultaneous multithreading (SMT). Enabling SMT on a partition can provide an extra 30-40% improvement in horsepower by allowing two separate instruction streams (threads) to run concurrently on the same physical processor. In Linux, each thread appears to run on an independent logical processor. This technology is only available on Linux on Power (LoP), whose core chipset consists of a dual processor core with each core supporting two hardware threads of execution. One can enable or disable SMT by using a Linux boot console command and a system reboot (on an AIX partition, you can enable and disable this dynamically without a reboot). The following line in the /etc/yaboot.conf file turns it off:
SMT enablement really helps when the CPU is in a bottleneck. A wonderful performance tool, nmon, has been optimized for use on IBM System p for both AIX and Linux.
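Before attributing a CPU bottleneck to SMT, it helps to confirm how the partition was booted. The sketch below assumes the ppc64 kernel boot argument `smt-enabled` (an assumption about how the partition was configured, not something shown in the article) and looks for it on the kernel command line. The function name `smt_boot_setting` is hypothetical.

```shell
#!/bin/sh
# Report the SMT setting passed on the kernel command line.
# Assumes the ppc64 "smt-enabled=" boot argument; prints "default"
# when the argument is absent, meaning the kernel default applies.
# Usage: smt_boot_setting [file]   (defaults to /proc/cmdline)
smt_boot_setting() {
    setting=$(tr ' ' '\n' < "${1:-/proc/cmdline}" |
        sed -n 's/^smt-enabled=//p' |
        tail -n 1)                      # the last occurrence is reported
    printf 'smt-enabled=%s\n' "${setting:-default}"
}
```

This only reflects what was requested at boot; it does not measure whether the second hardware thread is actually doing useful work, which is where a tool such as nmon comes in.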
Let's look at Oracle, specifically 10g Release 2 (the most recent release supported on POWER). The good news is that Linux has SMT enabled by default and Oracle uses the extra hardware threads automatically. What about RAM? The POWER architecture supports page sizes of 4 KB and 16 MB. LoP normally uses 4 KB pages, which are much too small for databases and make memory operations inefficient. To enable large pages, you need to change the vm.nr_hugepages parameter.
First we'll determine our page size.
172_29_137_157:/home/u0004773 # grep -i hugepages /proc/meminfo
HugePages_Total: 0
HugePages_Free: 0
HugePages_Rsvd: 0
Hugepagesize: 16384 kB
172_29_137_157:/home/u0004773 #
Page size is 16 MB. Next, we'll enable the large pages. For example, if the Oracle System Global Area (SGA) size is 4 GB, you would need to allocate about 256 pages to hold the SGA. We'll round it up to 260.
172_29_137_157:/home/u0004773 # sysctl -w vm.nr_hugepages=260
vm.nr_hugepages = 260
172_29_137_157:/home/u0004773 #
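The page count used above comes from simple division: 4 GB is 4096 MB, and 4096 MB / 16 MB = 256 pages. As a rough sketch of that arithmetic, the function below reads the huge page size from a meminfo-style file and rounds up to whole pages. The name `hugepages_needed` is hypothetical, and any extra headroom (such as rounding 256 up to 260) is left to the administrator.

```shell
#!/bin/sh
# Compute how many huge pages are needed to hold a region of SIZE_MB
# megabytes, using the Hugepagesize value from a meminfo-style file.
# Usage: hugepages_needed SIZE_MB [file]   (file defaults to /proc/meminfo)
hugepages_needed() {
    size_mb=$1
    page_kb=$(awk '/^Hugepagesize:/ { print $2 }' "${2:-/proc/meminfo}")
    # Bail out if the kernel does not report a huge page size.
    [ -n "$page_kb" ] || return 1
    # Ceiling division: convert MB to kB, then round up to whole pages.
    echo $(( (size_mb * 1024 + page_kb - 1) / page_kb ))
}
```

For a 4 GB SGA on this system, `hugepages_needed 4096` yields the 256 pages mentioned in the text. Note that `sysctl -w` changes only the running kernel; on most distributions the setting must also be placed in /etc/sysctl.conf to survive a reboot.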
Partitioning on System p
The great thing about System p is that you get to choose how to architect your partitioned frame and run your databases. You can choose between separate partitions (separate OS images), with greater application segregation and fewer resources at each one's disposal, and larger partitions with more resources and fewer partitions to administer and patch. It also gives you a convenient way to virtualize your environment. Linux and Solaris administrators sometimes have trouble understanding the concept because with Solaris you can have separate workgroups that are part of the same OS image. (Interestingly enough, AIX 6.1 will soon allow for the configuration of workgroup partitioning inside of logical partitions.) On System p, these workgroups are really separate partitions with separate operating system images. While it's great to have more options, those choices can complicate things. It is important to really understand partitioned environments and their best practices. IBM Redbooks are a great source of information. A Redbook that I highly recommend is Partitioning Implementation for IBM eServer p5 Servers.
An important tool worth noting is the IBM System Planning Tool (SPT). This tool can help you configure your partitioned environment and even validate your proposed architecture prior to deployment. The SPT, which runs on a Windows PC, assists in system planning and design. A strong understanding of partitioning is critical to running Linux on System p.
While the IBM System p environment is a great method of deploying Linux, it's important to note that not all Linux applications that run on Intel servers are certified to run on IBM's System p. This is because of the architecture (see endian) and the fact that Linux applications run natively on the POWER architecture. You will need to obtain the binaries for this architecture or compile and port the application yourself. To date, there are close to 3000 supported LoP applications. To address the problem of users wanting to run Linux applications that have not yet been ported natively to the POWER architecture, IBM has started a new initiative: IBM System p Application Virtual Environment (AVE), which allows users to run most x86 Linux applications on a System p without a recompile. The Beta version is now available as a free download.
About the author: Ken Milberg is a systems consultant with two decades of experience working with Unix and Linux systems. He is a SearchEnterpriseLinux.com Ask the Experts advisor and columnist.
This was first published in November 2007