In my OS class, instead of doing an assignment provided by the prof, we had the option of doing something else. We decided to "build" a Beowulf cluster. We didn't do much with it beyond actually figuring out how to install and manage everything. My partner did most of the work; I was left with the writeup. It sort of glosses over the results: we successfully built a cluster and ran the test suite. That's about it. So here it is, broken English and all.
Node your homework!
What is a Beowulf cluster, or: Beowulf cluster vs. supercomputer
A Beowulf cluster is essentially a number of commodity, off-the-shelf
PCs tied together over some sort of network. Developed at NASA,
Beowulf clusters have gained much attention in the last few years.
Often at a fraction of the cost of a "real" supercomputer, a Beowulf
cluster can match (or even exceed) the computational power of a
supercomputer. For instance, a cluster made out of 320 ubiquitous
P3-800 CPUs takes 34th place on the TOP500 supercomputer list, with
peak performance of over a teraflop!
What is required for a Beowulf cluster
Essentially, all that is needed is a number of available machines.
Historically, most of the software written for Beowulf clusters is
Linux-based. Due to the centralized nature of a Beowulf cluster, the
machines need to be able to communicate with a central server;
Ethernet is both cheap and widely available, which makes it
especially convenient. On top of Linux you'll have to run some sort
of communication layer that provides the required infrastructure
between the nodes of your cluster. There are a few choices, among
them OSCAR from NCSA and LAM/MPI, developed by a team from the
University of Notre Dame. We chose LAM/MPI for its simplicity and
because it is being continuously developed.
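To give a flavour of what driving the cluster looks like, here is
roughly how the LAM runtime gets brought up. This is a sketch: the
hostnames in the boot schema are placeholders, not our actual
machines.

    # bhost: LAM's "boot schema", one node per line
    server.cluster
    node1.cluster
    node2.cluster

    $ lamboot -v bhost   # start the LAM daemons on every listed node
    $ lamnodes           # sanity check: list the nodes LAM can see
    $ lamhalt            # shut the runtime down when you're done

Once lamboot succeeds, MPI programs can be launched across the nodes
with mpirun.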
Why would you want a Beowulf cluster
Although there are certain drawbacks to Beowulf clusters, they
provide a cheap alternative to buying a "real" supercomputer. For
instance, in a school like Langara there are hundreds of unused
computers during off-peak hours. Students doing simulations or other
research could use these idle computers to their advantage.
Supercomputers in general, and Beowulf clusters in particular, have
been used for everything from chess playing to cancer and nuclear
research. Moving away from serious research subjects, we've found
that there are applications out there that will let you encode your
music into MP3 format using a distributed approach, namely
clustering software.
LAM/MPI
The Message Passing Interface (MPI) is one of the most widely
recognized standards for communication between Beowulf nodes. Local
Area Multicomputer (LAM), from the University of Notre Dame, is one
of the most popular implementations of the MPI standard. It provides
an abstraction layer for C, C++, Perl, Python, and other popular
languages, and enables them to create multi-computer parallel
applications. LAM/MPI deals with communication between heterogeneous
nodes and provides the standard MPI interface. LAM also comes with
an extensive set of debugging tools. It supports MPI-1 and parts of
MPI-2, and it is not restricted to Linux: it will compile on many
commercial Unices.
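As an illustration of the abstraction layer, here is a minimal MPI
"hello world" in C. This is a sketch rather than anything from our
test suite, but it is the canonical first program:

    #include <stdio.h>
    #include <mpi.h>

    int main(int argc, char *argv[])
    {
        int rank, size;

        MPI_Init(&argc, &argv);               /* join the MPI world       */
        MPI_Comm_rank(MPI_COMM_WORLD, &rank); /* this process's id        */
        MPI_Comm_size(MPI_COMM_WORLD, &size); /* how many processes total */

        printf("Hello from node %d of %d\n", rank, size);

        MPI_Finalize();                       /* leave cleanly            */
        return 0;
    }

Under LAM you'd compile this with the mpicc wrapper and launch it
across the cluster with something like mpirun -np 3 hello.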
Our particular implementation
For ease of implementation, we decided to use NFS to mount the
shared software (LAM/MPI) across all the nodes. At boot time we pass
the required parameters to LILO, which boots the workstation from
the network. On the server we have a directory structure which is
mirrored over the network to every node. Since LAM does all of its
processing in /tmp, we created /tmp as a RAM drive, which improves
performance immensely. And since we're using a very recent kernel,
we can make the RAM disk dynamic (max 8MB), so it only uses as much
RAM as there actually are files in /tmp.
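Concretely, this boils down to a kernel boot line and an fstab entry
along the lines of the following. The server address and export path
are made up for illustration; the 8MB cap is the one we used.

    # lilo.conf append line: mount the node's root filesystem over NFS
    append="root=/dev/nfs nfsroot=192.168.1.1:/export/nodes/node1 ip=dhcp"

    # /etc/fstab entry: a RAM-backed /tmp that grows on demand, capped at 8MB
    tmpfs   /tmp   tmpfs   size=8m   0 0

The dynamic behaviour comes from tmpfs, which only claims memory as
files are actually written.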
Thoughts on this setup
Having been through this process, we'd say that using NFS-root is
worthwhile on a large, fast network, but that on a 10Mbit network
it's too expensive to be justified. We tested nodes running at
10Mbit while playing with the home network, and while stability
wasn't compromised, the time the test suite took to run was hugely
magnified. And even though mounting the shared software from one
central repository eases maintenance, performance takes a hit,
because Ethernet is also the primary method of communication between
the nodes of the cluster. Further tests should be performed
comparing performance with and without NFS.
Hardware used for the tests
Home test network (successfully tested with the LAM/MPI test suite):
  server: Celeron 500, 192MB RAM
  node1:  Celeron 900, 128MB RAM
  node2:  Pentium III 933, 256MB RAM

Dungeon test network (marginal stability at best):
  server:      Pentium 133, 32MB RAM
  nodes 1,4,5: Pentium 133, 32MB RAM
These nodes were the most stable, likely because they were on
the same hub as the server. The rest were on the other side of
a coax uplink, which seemed a bit flaky under load.
Problems & Difficulties for a prospective user of a Beowulf cluster
Like practically any other task in life, the difficulty for a
first-time user will depend on his or her expertise. As mentioned
above, Beowulf clusters are usually built on Linux, although with
LAM/MPI many other flavours of Unix become available. Once the user
has established an operating system preference, a network topology
has to be chosen; Ethernet seems to be the winning choice because of
its availability. Last but not least comes the software to drive the
cluster. The communication software doesn't have to be written from
scratch; instead, communication libraries like LAM/MPI can be used.
The user then has to write his or her own application, employing the
communication abstraction layer to achieve the particular goal.
To summarize, the decisions in building a Beowulf cluster fall
roughly into the following four categories:
1. Operating System
2. Network Topology
3. Communication
4. Software
From personal experience (we're fairly familiar with everything up
to "Software"), it seems that writing a massively parallel
application is the hardest task of all; a sketch of what even a
trivial one looks like follows.
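To show what we mean, here is a toy parallel application in C: every
node contributes one number and the master adds them up. It is
nowhere near "massively parallel", and it's our illustration rather
than course code, but the shape is the same.

    #include <stdio.h>
    #include <mpi.h>

    int main(int argc, char *argv[])
    {
        int rank, size, sum;

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &size);

        /* every node contributes its rank; node 0 receives the total */
        MPI_Reduce(&rank, &sum, 1, MPI_INT, MPI_SUM, 0, MPI_COMM_WORLD);

        if (rank == 0)
            printf("%d nodes, ranks sum to %d\n", size, sum);

        MPI_Finalize();
        return 0;
    }

Everything hard about real parallel software (decomposing the
problem, load balancing, minimizing communication) lives where that
MPI_Reduce call is.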
Summary & Conclusion
The purpose of this assignment was to research and try to build a
Linux Beowulf cluster. Hopefully this paper has managed to show that
we successfully tackled both problems. As a result of our research,
LAM/MPI was chosen to implement the cluster. We successfully built a
network of nodes and ran the tests, which show that the cluster
functions as promised.
Although writing software for a parallel cluster seems beyond the
scope of this course (and orders of magnitude harder than this
project), it's nevertheless something we may want to pursue for the
next assignment.