Each compute node uses the IPoIB fabric to NFSv3-mount the SAN-attached file systems from the head node. An Ethernet gateway module, also installed in the SFS-3012 switch, provides Ethernet IP connectivity to the IPoIB network, bridging it to a NAT-routed VLAN (104) on the Cisco 3650 switch. This Ethernet switch trunks both VLANs over a two-fiber-pair port channel (using differential routing) back to 12 Oxford Street, where a departmental Cisco 6509 with dual Sup720s provides routing services.
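On a compute node, the mount described above might look something like the following fstab entry; the IPoIB address and export path here are hypothetical placeholders, not values from the actual deployment:

```
# /etc/fstab on a compute node (sketch only; address and paths are hypothetical)
# The head node is reached over the IPoIB interface, so NFS traffic
# rides the InfiniBand fabric rather than the Ethernet network.
192.168.1.1:/san  /san  nfs  vers=3,tcp,hard,intr  0 0
```

The key point is simply that the NFS server address resolves to the head node's IPoIB interface, so the bulk file traffic stays on the IB fabric.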
"When it was all said and done, the cluster, with its 14 blades, was just about the right size for the kind of projects and student user load we were shooting for - 50 students: 56 cores," says Lotto. "The different blades, with their varying amounts of memory and hard drive, are ideally suited to specific computational workloads and so far we haven't had one student complaint."
As complex as this cluster sounds when broken down into its component parts, for the end-user students, accessing the system is as easy as opening a session in a Web browser.
"We identified some software - we use WebMO and Q-Chem as two particular packages," says Lotto. "WebMO provides an intuitive graphical front end for drawing molecules and submitting jobs for calculation using the cluster, while Q-Chem acts as an advanced computational chemistry package supporting several molecular orbital calculation methods running behind WebMO. So the students can focus on the chemistry, the software interface is as simple as opening a Web browser on their own computer, following the instructions given to them and submitting their jobs."
The BladeCenter cluster configured by Harvard's chemistry and chemical biology department was designed with future growth in mind, as well as the flexibility to address potential problems and bottlenecks. And, even though student and faculty satisfaction has been high, Lotto admits there are some aspects of the system he'd like to improve.
"The system, as it's currently set up, is working beautifully in most situations," says Lotto. "I'm having some trouble with the shared file system when the cluster is under a high load, and I'm not at all surprised by that given the fact NFS V3 is really not the protocol designed for high-performance cluster file systems. So what I'm doing right now is working with several IBM GPFS* (general parallel file system) folks and looking to migrate the existing infrastructure to a GPFS file system that would take better advantage of the IB SAN fabric. "