Windows virtual machines in the Blackburn laboratory

7 Dec 2015 - 10:00


In 2013, researchers in the Blackburn laboratory within the Division of Medical Biochemistry at UCT acquired a mass spectrometer (an apparatus used to measure the mass of isotopes, molecules and molecular fragments) with which to analyse their samples. The acquisition, however, presented them with huge challenges of data analysis and storage. Drowning in data, the researchers in the laboratory reached out to UCT eResearch. The result: they partnered with eResearch to get access to Windows virtual machines with far greater compute and storage capacity.

There are roughly 20 researchers working in the Blackburn Lab under the supervision of Professor Jonathon Blackburn, principal investigator in the Applied Proteomics and Chemical Biology Group. They work in a range of different fields, primarily within tuberculosis research.

According to PhD researcher Dr Shaun Garnett, understanding proteins in the cells is a pretty tricky affair: “Proteins have 20 different amino acids, and every one of those has a different chemical property.”

Battling with the data deluge

Before they got the mass spectrometer, researchers would send their samples off to external providers for analysis and get only small amounts of data in return. With the mass spectrometer, however, researchers are generating about a gigabyte of data per hour on a machine that runs 24 hours a day.

First they bought a network-attached storage (NAS) drive, housed in the high performance computing (HPC) data centre on Upper Campus. This offered nine terabytes of space (one terabyte is equal to 1 000 gigabytes), but they still ran out of space pretty quickly, explains Garnett.

Next, they bought some high-spec machines with 8GB of RAM (random access memory). “This was enough for one person to run their experiments for roughly three days,” he says, “but no-one else could do anything.”

These high-spec machines - computers with advanced processing power and storage - could only just keep up with the amount of data the mass spectrometer was generating. “So if you need to analyse your data more than once, or try running your experiments slightly differently, which is something a researcher often wants to do, this would be a major task,” says Garnett.

The Solution

Faced with this massive computing challenge, the group decided it was time to upgrade. Instead of acquiring their own equipment at great cost, both financially and in terms of time, the group reached out to the HPC team within eResearch to discuss options for collaboration. Garnett spoke to Heine de Jager of UCT eResearch, who suggested that researchers in the Blackburn Lab beta-test the new Windows virtual machines that were about to be launched by eResearch. This meant the researchers would be the first sample of the intended audience to try out the new capability.

In return for this help, the Blackburn lab will make a financial contribution towards eResearch rather than spending money on their own computers. “Buying equipment with the HPC team is more expensive than buying it ourselves,” explains Garnett, “but because they buy better quality equipment, we really feel the value when it comes to servicing and maintaining that equipment. We are not IT people. We want to be biologists, and we want the IT people to do the IT.”

Windows Virtual Machines

The group now have access to two eight-vCPU (virtual central processing unit) machines with 50GB of RAM each, and a third virtual machine recently upgraded to 20 cores and 100GB of RAM. “These virtual machines behave like any Windows personal computer (PC) you would use,” says Garnett. “Once you log in, via a remote desktop client, you can see the PC screen, which looks like your own computer - then you use it as if it is.”

One great benefit of a virtual machine is that you can log into it from anywhere, and a number of different researchers can use it at the same time. It is also secure: a 25-terabyte drive is hooked up to each virtual machine, and the data contained on that drive is backed up in three different locations across South Africa.

Garnett says the virtual machines have relieved a great deal of the strain around data analysis and storage in the lab. “We have all the software we need on these machines, and as long as one is available a researcher can just log in and use it.” He says at least one of these machines is in use 60 to 80 per cent of the time.

Story by Natalie Simon

IMAGE: "IRLDESI Side View" by Kermit K. Murray - Own work. Licensed under CC BY 4.0 via Wikimedia Commons.