|
Early clusters started out as a variation on fully installed machines, with additional communication libraries and ad hoc administration tools. They were installed and controlled as if they were collections of workstations. This was a natural result of using existing tools and well-understood approaches while development work focused on faster communication and application library support. Developers were trying to make clusters usable as well as to demonstrate that they were useful. Some of today's cluster systems are little more than incremental improvements on this original approach. But far better cluster approaches -- actual architectural designs, rather than a motley collection of ad hoc programs -- have been developed in the dozen years since clusters started being widely used.
New cluster system architectures recognize that the goal of clustering machines (we will call them "nodes" to distinguish them from the overall "cluster") is to present a unified structure: the cluster should appear as a single virtual system. And the simplest such system is one that is unchanged from what people already use.
The challenge is presenting a single virtual system image while maintaining performance and handling failures. A single system image (SSI) has many aspects and viewpoints: is the SSI for the programmer, the application, the administrator or the end user?
Some of the latest developments in clusters are new-generation cluster systems that help create a single system image. Generally, they have the following attributes:
- A single point OS installation on a front-end node
- Other nodes designated as computational resources
- Single point administration (configuration files)
- Single point of updates (kernel, services, libraries and application executables)
- Single-process space view
- Centralized monitoring and job control
For example, the Scyld Beowulf system introduced a unique architecture to implement these attributes. We designed a master-based cluster system, with compute nodes as minimally provisioned process execution slaves. The full operating system is installed only on the front-end node, and it becomes the user-visible environment for the cluster. Additional nodes are network booted with a matching kernel, and automatically provisioned by the master with device drivers matching their unique hardware.
A key element of the Scyld Beowulf system is that it implements all of these features to create a single system illusion for the end user and system administrator, not for the application or programmer. The application still sees the machine as a collection of nodes, with independent memory spaces, IP addresses and potential file systems.
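The single process space that the user and administrator see can be pictured with a small model. The Python sketch below is illustrative only: the names `Master`, `spawn`, `ps` and `signal` are hypothetical and do not correspond to Scyld's actual interfaces. It assumes nothing more than a master that records every remote process in one table, so that monitoring and control happen from the front-end node.

```python
from dataclasses import dataclass, field
from itertools import count


@dataclass
class Master:
    """Toy model of a front-end node that owns the cluster-wide process table."""
    nodes: list
    table: dict = field(default_factory=dict)  # master PID -> (node, command)
    _next_pid: count = field(default_factory=lambda: count(1000))

    def spawn(self, node, command):
        """Start `command` on a compute node and record it under a master PID."""
        if node not in self.nodes:
            raise ValueError(f"unknown node: {node}")
        pid = next(self._next_pid)
        self.table[pid] = (node, command)
        return pid

    def ps(self):
        """List every process in the cluster from the master's single view."""
        return [(pid, node, cmd) for pid, (node, cmd) in sorted(self.table.items())]

    def signal(self, pid):
        """Remove a process by master PID; the caller never names the node."""
        node, _ = self.table.pop(pid)
        return node  # a real system would forward the signal to this node


master = Master(nodes=["n0", "n1"])
first = master.spawn("n0", "solver")
second = master.spawn("n1", "solver")
listing = master.ps()          # both processes, regardless of where they run
stopped_on = master.signal(first)
```

The application itself would still see two separate nodes; only the view from the master is unified.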
Other cluster architectures have created single system illusions for the programmer. They typically do this by emulating shared memory ("Network Virtual Memory" or "Distributed Shared Memory") or by basing the system around a cluster file system.
An interesting approach is one taken by the Mosix system. It is based around "transparent process migration." Under Mosix, the OS kernel identifies candidate processes to move to other cooperating machines. Likely candidates are processes that are CPU intensive and make few system calls or I/O requests. The kernel migrates the process by copying the process address space to the remote machine and continuing execution there. When the process makes a system call, the parameters are passed back to the original machine's kernel.
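The selection step can be illustrated with a simple filter. This is a minimal sketch, not Mosix's actual algorithm: the `ProcStats` fields, the function name and all three thresholds are assumptions chosen for illustration. It only encodes the stated idea that good candidates are CPU intensive and make few system calls or I/O requests.

```python
from dataclasses import dataclass


@dataclass
class ProcStats:
    pid: int
    cpu_seconds: float  # CPU time consumed in the sampling window
    syscalls: int       # system calls issued in the same window
    io_bytes: int       # bytes read or written in the same window


def migration_candidates(stats, min_cpu=1.0, max_syscalls_per_cpu_s=50.0,
                         max_io_bytes_per_cpu_s=1_000_000):
    """Return PIDs that look CPU-bound, busiest first (thresholds are hypothetical)."""
    picked = []
    for p in stats:
        if p.cpu_seconds < min_cpu:
            continue  # too little history to judge, or mostly idle
        if p.syscalls / p.cpu_seconds > max_syscalls_per_cpu_s:
            continue  # each call would become a network round trip
        if p.io_bytes / p.cpu_seconds > max_io_bytes_per_cpu_s:
            continue  # I/O would be shipped back to the originating machine
        picked.append(p)
    picked.sort(key=lambda p: p.cpu_seconds, reverse=True)
    return [p.pid for p in picked]


sample = [
    ProcStats(101, 9.5, 40, 4096),         # CPU-bound, quiet
    ProcStats(102, 8.0, 90_000, 0),        # system-call heavy
    ProcStats(103, 0.2, 3, 0),             # not enough CPU history
    ProcStats(104, 12.0, 100, 50_000_000), # I/O heavy
    ProcStats(105, 20.0, 10, 0),           # CPU-bound, quiet
]
candidates = migration_candidates(sample)
```

Note that every test depends on statistics gathered while the process runs, so a freshly started job cannot be judged immediately.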
The Mosix approach has several fascinating attributes:
- It creates a unified process space -- the user doesn't need to know if a process is local or remote to monitor and control it.
- It implements one aspect of dynamic provisioning -- the application and libraries need not already exist on the remote machine.
- It has an application version and execution consistency model. The migrated process is a copy of the original, and will thus have the same executable and library versions, as well as the same library linkage order.
- It effectively creates a single consistent file system, albeit by causing most I/O to occur on the originating machine. (In some cases, such as with NFS file systems, it can optimize I/O to be local.)
- It creates a single network address, again by forwarding system calls to the originating machine.
- It can migrate unmodified applications, and do so at any point during execution.
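Several of these attributes follow from one mechanism: computation runs remotely while system calls are forwarded home. The sketch below models that in Python under loud assumptions: `HomeKernel`, `MigratedProcess` and the three toy "system calls" are hypothetical stand-ins, and an ordinary method call replaces what would be a network transaction.

```python
class HomeKernel:
    """Stands in for the originating machine, which keeps files and identity."""

    def __init__(self, hostname, files):
        self.hostname = hostname
        self.files = dict(files)
        self.forwarded_calls = 0

    def syscall(self, name, *args):
        self.forwarded_calls += 1  # each call would be a network transaction
        if name == "gethostname":
            return self.hostname
        if name == "read":
            (path,) = args
            return self.files[path]
        if name == "write":
            path, data = args
            self.files[path] = data
            return len(data)
        raise NotImplementedError(name)


class MigratedProcess:
    """A process that computes remotely but sends its system calls home."""

    def __init__(self, home, running_on):
        self.home = home
        self.running_on = running_on

    def run(self):
        data = self.home.syscall("read", "/input")          # forwarded
        total = sum(int(x) for x in data.split())           # stays remote
        self.home.syscall("write", "/output", str(total))   # forwarded
        return self.home.syscall("gethostname")             # home identity


home = HomeKernel("origin", {"/input": "1 2 3"})
proc = MigratedProcess(home, running_on="node7")
seen_hostname = proc.run()
```

In this model the process runs on `node7` yet reads and writes the originating machine's files and reports its hostname, and the counter on `HomeKernel` shows how many calls had to travel back.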
It has matching drawbacks:
- Mosix can have considerable execution overhead, with normally fast system calls now requiring network transactions.
- The kernel is involved in migration policy, as only it can estimate if migrating a process will be faster than continuing to execute it locally.
- Deciding that a process is a candidate for migration requires an execution history, so spreading jobs over a cluster is slow and unpredictable.
- Because the remote process intimately depends on the originating machine, failure of either machine is fatal.
- All process actions affect the originating machine, rather than changing the remote machine. While desirable for applications, a different mechanism is needed for remote administration.
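The trade-off between forwarding overhead and a less loaded remote CPU can be made concrete with a rough estimate. The function below is a sketch under stated assumptions, not the policy any real kernel uses: it assumes equal CPU speeds, a fair-share scheduler (a job sharing a CPU with `n` others takes `1 + n` times as long), and that every parameter is known in advance, which in practice it is not.

```python
def migration_gain(remaining_cpu_s, local_load, remote_load,
                   address_space_bytes, bandwidth_bytes_per_s,
                   expected_syscalls, round_trip_s):
    """Estimated seconds saved by migrating; a negative value means stay local."""
    # Time to finish locally while sharing the CPU with `local_load` other jobs.
    stay = remaining_cpu_s * (1 + local_load)
    # Time to finish remotely: copy the address space, compute, forward calls.
    copy = address_space_bytes / bandwidth_bytes_per_s
    forward = expected_syscalls * round_trip_s
    move = copy + remaining_cpu_s * (1 + remote_load) + forward
    return stay - move


# CPU-bound job on a busy machine, idle remote node, few system calls.
cpu_bound = migration_gain(60, 3, 0, 100e6, 100e6, 1_000, 0.0002)

# Same job shape, but a million system calls each paying a round trip.
syscall_heavy = migration_gain(60, 1, 0, 100e6, 100e6, 1_000_000, 0.0002)
```

With these illustrative numbers the first job gains from migration and the second loses, because the forwarded system calls alone cost more time than the computation.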
|