CORRECTION: A previous version of this article mischaracterized the number of hard drives in ACCRE. The article has been corrected to state that there are about 3,000 hard drives that are split into smaller groups.
For 17 years, Vanderbilt students and researchers have been able to analyze data far faster than any ordinary laptop allows, using a supercomputer just steps away from the Commons Center.
The Advanced Computing Center for Research and Education (ACCRE), established in 2003, is housed in the Hill Center, between the Commons Center and the Dean of the Commons Residence. Today its 4,000-square-foot facility includes about 3,000 hard drives and more than 600 multi-core systems, which translates to a lot of saved time and effort for researchers like Dr. Ipek Oguz, a computer engineering professor at Vanderbilt.
“If you’re trying to analyze 100 CT scans, let’s say, if you’re doing it on your laptop, you have to do it one at a time,” Oguz said. “Versus on ACCRE you could run 100 of them in parallel so you would get your results that much faster.”
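For readers who want to see the idea Oguz describes, here is a minimal sketch in Python. It is an analogy only, not ACCRE's actual workflow: `analyze_scan` is a hypothetical stand-in that simply waits a few milliseconds, and a pool of threads on one machine plays the role of the cluster. A real cluster would hand the work to its job scheduler instead.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def analyze_scan(scan_id: int) -> str:
    """Hypothetical stand-in for analyzing one CT scan (assumption, not ACCRE code)."""
    time.sleep(0.005)  # pretend each scan costs a fixed amount of time
    return f"scan-{scan_id:03d}: done"


scan_ids = list(range(100))  # 100 scans, as in Oguz's example

# Laptop style: process the scans one at a time.
start = time.perf_counter()
serial_results = [analyze_scan(s) for s in scan_ids]
serial_seconds = time.perf_counter() - start

# Cluster style (simulated): 100 workers each take one scan at once.
start = time.perf_counter()
with ThreadPoolExecutor(max_workers=100) as pool:
    parallel_results = list(pool.map(analyze_scan, scan_ids))
parallel_seconds = time.perf_counter() - start

print(f"one at a time: {serial_seconds:.2f} s")
print(f"in parallel:   {parallel_seconds:.2f} s")
```

Both runs produce the same 100 results; only the waiting time differs, which is the point of Oguz's comparison.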
Oguz uses ACCRE to process data daily in her Medical Image Computing Lab, but she also brings it into the classroom as a teaching tool.
In a graduate-level course she teaches on open-source programming, all students receive ACCRE accounts that they can use remotely for the duration of the semester to analyze data for in-class work and homework. The students gain a different set of skills this way, Oguz said.
“I’m sure we could construct something in my local lab here, but that’s more of a play example, whereas ACCRE is very much a real, live environment,” Oguz said. “So I think they enjoy that.”
ACCRE doesn’t limit its resources to the Vanderbilt community, however. Researchers from around the world can submit data-processing jobs. Currently, the supercomputer stores and processes 70 percent of the data from a project of the European Organization for Nuclear Research (CERN), based in Geneva, Switzerland.
The data center that houses the supercomputer contains row upon row of large, black, humming machines in protective cases. The floor of the room is raised two feet off the ground to make room for the massive bundles of cables running beneath it. In addition to precise humidity and temperature control, the facility has a set of enormous batteries. If the electricity goes out, these batteries have enough power to keep the whole system running for five minutes before the generator kicks in.
Just last week, the ACCRE team had to deal with its worst hardware breakdown in five years.
ACCRE splits its 3,000 hard drives into smaller sets. As the team was replacing two hard drives within one such set, a third drive in the same set failed, three years before its anticipated end of life. Suddenly, everything came to a grinding halt.
The outage affected six percent of the stored data, and a total of four million files had to be restored from backups on tape. ACCRE uses tape for long-term archival storage because it is immune to power surges.
“The good news is, from an operational perspective, the backups worked and the restores worked,” ACCRE Director of Research Computing Operations Hunter Hagewood said. “Because most people have a hard time restoring 100 gigabytes, and we did a good job with 450 terabytes.”
To access ACCRE, researchers pay to either rent or buy compute nodes (the parts of the supercomputer that actually execute tasks), but they usually end up getting more than they pay for, Hagewood said. For instance, a researcher might buy 40 nodes but can draw on the power of the whole cluster when needed.
Warren Eckstein works as a systems administrator and self-described janitor of ACCRE. For him, the best part of the job is the opportunity to work with cutting-edge technology that wouldn’t be accessible elsewhere. His office, a creative chaos of screens and wires, bears witness to his love for the challenges technology brings.
“ACCRE provides me an avenue to actually do that on a larger scale,” Eckstein said.
Looking ahead, Hagewood hopes that growing research in artificial intelligence and machine learning will attract the interest of students and researchers who don’t traditionally use resources like ACCRE. Already, the supercomputer runs 1.5 times as many jobs as it did six years ago.
“As the environment continues to grow, we’ve obviously found ways to increase efficiencies, but to sustain the growth, we look forward to growing in both hardware, personnel, and the research community,” Hagewood said.