Infrastructure

High Performance Computing

Throughout the lifetime of IIDS we have built a state-of-the-art computational facility that provides the resources essential to contemporary research in evolutionary and computational biology, machine learning, and many other fields. With a modern data center, computing clusters, and data storage systems, we support varied scientific research at every stage, from project development to data reporting.

The RCDS data center is a 1,400 square foot facility in room 120B of McClure Hall on the University of Idaho campus that was specifically designed and renovated for the Core. Optical fiber and copper interconnections provide 1-100 Gb/s data transfer rates within the Core, which is connected to the 10 Gb/s university backbone and from there to Internet2. The room has a dedicated UPS with three-phase power and four forced-air handlers attached to redundant university chilled water systems.

Falcon

Falcon currently consists of approximately 932 nodes with dual Intel Xeon E5-2695v4 18-core processors running at 2.1 GHz (36 cores per node), for a total of 33,552 cores capable of more than 1 PetaFLOPS of compute capacity. Each node on Falcon is configured with 128 GB of RAM, for a total of about 120 TB of overall system memory, uses an Infiniband-based interconnect configured as a 7-dimensional hypercube, and uses a 1.3 petabyte fault-tolerant Lustre filesystem.
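
The aggregate figures above follow directly from the per-node configuration. As a quick sanity check, the short sketch below simply reproduces the arithmetic using the node count, cores per node, and RAM per node quoted in the description; it makes no further assumptions about the system.

    # Reproduce Falcon's aggregate figures from its per-node configuration.
    NODES = 932             # approximate node count
    CORES_PER_NODE = 36     # dual 18-core Xeon E5-2695v4 per node
    RAM_PER_NODE_GB = 128

    total_cores = NODES * CORES_PER_NODE
    total_ram_tb = NODES * RAM_PER_NODE_GB / 1000   # GB -> TB (decimal)

    print(f"Total cores: {total_cores}")            # 33552
    print(f"Total RAM:   {total_ram_tb:.1f} TB")    # ~119 TB, i.e. roughly 120 TB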

Computing

The main cluster provides over 1,500 processor cores and over 8 terabytes of system memory, with Infiniband network connections. The modular design of this cluster, built primarily from enclosures (blade chassis) and blade servers, makes it possible to service or upgrade components without interrupting our users. Removable and redundant fans and power supplies located at the back of each enclosure allow easy access and replacement without powering down individual systems, and each enclosure contains its own network components (1GigE) to maximize communication among the servers within it. A higher-bandwidth network (10GigE/CX4 uplinks) handles communication with servers outside the chassis. Components include Dell M1000e blade enclosures and M605 blade servers, plus Dell R815 and R730 servers as ‘fat’ nodes. The cluster includes two Dell R730 servers, each with four Nvidia K80 coprocessors, for use by CMCI-associated researchers. In 2016 we added 32 SuperMicro compute nodes, each with 128 GB RAM and 8 physical cores (16 logical), and we added several GPU nodes during 2017-2019.

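Jobs on this cluster typically span many blade servers over the Infiniband interconnect using MPI. The sketch below is a minimal illustration of that model, assuming an MPI runtime and the mpi4py Python bindings are available (neither is guaranteed by the description above); the exact launch command depends on the site's scheduler and MPI installation.

    # Minimal MPI example: each process reports its rank and host.
    # Assumes mpi4py and an MPI runtime are installed; launch with e.g.
    # `mpirun -n 4 python hello_mpi.py` (launcher details vary by site).
    import socket
    from mpi4py import MPI

    comm = MPI.COMM_WORLD
    rank = comm.Get_rank()      # this process's id within the job
    size = comm.Get_size()      # total number of processes in the job

    print(f"Rank {rank} of {size} running on {socket.gethostname()}")

    comm.Barrier()              # wait for all ranks before exiting
    if rank == 0:
        print("All ranks reached the barrier; inter-node communication works.")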

Standalone Servers

We also maintain several servers that are not connected to the cluster systems. These are intended for jobs that require very large shared-memory machines, such as distance-based phylogenetic analyses and molecular simulations, for software development, and for investigators who are unfamiliar with or do not require a cluster environment. The most powerful servers in this group contain over 100 times the system memory of a standard desktop (1,000 GB) and are used heavily for hybrid sequence assembly of Illumina and PacBio sequence data.

These are SuperMicro rack servers, each with 48 logical cores and 256 GB of system RAM. Colin features 4 Intel Xeon Phi 5110p coprocessor cards. Arthur has 2 NVIDIA P100 cards.

This is a Dell R730 with 72 logical cores and 96 GB of system RAM.

These two servers are Dell M820 blades, each with 64 cores and 1 TB of system RAM. Only jobs requiring very large amounts of memory should be run on these two servers.

These are SuperMicro systems with 40 cores and 192 GB of system RAM. They are general-purpose servers that will meet most users' needs and are a good place to start if you are unsure which server to use.

This is a Dell M905 blade with 24 cores and 192 GB of system RAM, and it is well suited to analyses requiring large amounts of RAM.

This is a Dell R815 with 24 cores and 80 GB of RAM.

The newest standalone server is a 1U SuperMicro with 40 cores and 192 GB of RAM, offering high single-thread compute speed.

This is a SuperMicro chassis with 56 cores and 1 TB of system memory. Access to this server is restricted to the GRC or granted by special permission.

The IBEST Genomics Resources Core has exclusive use of this server, Huxley, for post-processing of raw sequencer data. Huxley has 30 TB of usable space for storing sequence data generated by the GRC, along with the capability to process it.

Because this scale of operation falls well outside typical University of Idaho information technology and computing services, we maintain our own support infrastructure. This includes several servers for storage and authentication of user accounts (LDAP), domain name resolution (DNS), Internet address assignment (DHCP), and secure connections to private networks (VPN). We also provide web and database services for online documentation and data sharing.
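
As one illustration, account information held in an LDAP directory can be queried programmatically. The sketch below uses the third-party ldap3 library with placeholder hostname, base DN, and username; these are hypothetical values for demonstration only, not details of our actual directory.

    # Illustrative LDAP lookup using the ldap3 library.
    # The hostname, base DN, and uid below are placeholders, not real RCDS values.
    from ldap3 import Server, Connection, ALL

    server = Server("ldap.example.edu", get_info=ALL)   # hypothetical directory host
    conn = Connection(server, auto_bind=True)           # anonymous bind for a read-only query

    # Look up a (hypothetical) user entry and a couple of common attributes.
    conn.search(
        search_base="ou=people,dc=example,dc=edu",
        search_filter="(uid=jdoe)",
        attributes=["cn", "mail"],
    )
    for entry in conn.entries:
        print(entry)

    conn.unbind()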

Data Storage

Our high performance storage system (298 TB realized) uses fast disk drives and multiple control systems linked together through a special type of file system (Lustre) that allows us to group storage components into logical units. This makes it possible to access portions of data from multiple storage devices and aggregates reading and writing across multiple disk drives and network connections, thereby increasing overall performance. Metadata servers hold typical file system information such as ownership, permissions, and physical location. We run multiple metadata servers in parallel to recognize failures and automate device control, minimizing staff intervention and disruption of services. Each individual disk storage system (array) combines multiple disks into a single logical unit (RAID), which provides redundancy at the disk level. Components currently include Dell R630 servers with MD3420 storage arrays for metadata, and Dell R515 servers for data storage.
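
The performance benefit of this design comes from striping: a single file is spread across several storage targets so reads and writes proceed in parallel. The sketch below only illustrates that idea with a hypothetical per-device rate; it is not a measurement of this system, and real throughput depends on hardware, network, and workload.

    # Illustration of why striping data across storage targets raises throughput.
    # The per-target bandwidth figure is hypothetical, purely for demonstration.
    def ideal_aggregate_mb_s(per_target_mb_s: float, stripe_count: int) -> float:
        """Best-case throughput when a file is striped across `stripe_count` targets."""
        return per_target_mb_s * stripe_count

    PER_TARGET = 200.0  # MB/s, assumed for illustration only
    for stripes in (1, 2, 4, 8):
        print(f"{stripes:2d} target(s): ~{ideal_aggregate_mb_s(PER_TARGET, stripes):6.0f} MB/s (ideal)")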

Our next-generation storage system, intended to replace Gluster, is self-balancing and self-healing to a greater degree than its predecessor. It is composed of 40 servers with a total realized capacity of approximately 611 TB (and growing) and is backed up offsite.

Applications

Listed below are a few of the applications installed on our servers.