Investing in a shared resource for scientific software sustainability in computational chemistry

20 Dec 2019 - 11:00

Understanding the role of proteins and sugars embedded in cellular membranes, particularly on diseased cells, could be key to cracking the code in diseases like cancer or malaria. The action of multiple enzymes results in complex glycoproteins that decorate the cell. To gain insight into enzyme function, advanced computational expertise is needed to build and run molecular simulations. Dr Chris Barnett has invested in the research cloud-computing platform ilifu to create a central resource where a like-minded research community can come together and share these tools and expertise, using the open-source platform Galaxy.

Barnett, a computational chemist based in the Chemistry Department and Scientific Computing Research Unit, has been studying the role of sugars in cells in specific diseases. Simulation of these enzymes is key to understanding them because different molecules will behave differently in particular contexts. Molecular modelling is practiced by computational chemists, computational biologists and others in bioinformatics. 

“Reusability and reproducibility are big buzzwords in science right now,” says Barnett. “When you publish in journals these days they also want access to the data and the code to be able to confirm that it works.”

It’s very difficult to build software that other people can use, and that is sustainable in the long term. This then becomes a very complex problem for researchers.

“Time and money need to be invested to keep scientific software in a good state, and these are resources most researchers don’t have in abundance.” The skills needed to develop and install software, or even to manage simulation data, are a major barrier to entry.

Galaxy: the solution

Barnett believes the solution to these issues lies in a web-based platform called Galaxy, where a user community can form and share tools and workflows around enzyme modelling.

Galaxy is a platform for accessible, reproducible and collaborative science. Galaxy Europe hosts multiple tools, including a cheminformatics subdomain: a web server for processing, analysing and visualizing chemical data, and for performing molecular simulations.

“It’s a totally open and transparent platform,” explains Barnett. “It provides workflows and histories so you can share it with other people and build up complex experiments.”

The web interface is designed to prevent people from making easily avoidable mistakes like typos. This is critical, as such errors can negate the value of an entire scientific study. Galaxy is ‘batteries included’: the provenance, metadata and choice of simulation parameters are readily accessible for review.

Ilifu: a home for Galaxy South Africa

Barnett’s end goal was to create a collaboration platform, similar to the cheminformatics subdomain of Galaxy Europe, for the community of glycobiology and computational chemistry researchers in South Africa and, hopefully in time, the rest of Africa. He held introductory workshops and found an enthusiastic group of researchers, ranging from very interested novice users who had never run a simulation before to advanced users who would find it valuable to repeat simulations on a platform like this. The trouble was finding a platform to house it.

For Barnett, ilifu, a cloud-computing research infrastructure operated by a consortium of universities and research organisations in the Western and Northern Cape, was the obvious choice of a home.

The challenge was how Barnett could gain access to the infrastructure, as it was built first to serve the data-intensive fields of astronomy and bioinformatics. He did this by buying in to the infrastructure with grants from both the National Research Foundation (NRF) and the UCT Advanced Computing Committee (ACC).

For Barnett, the fact that ilifu is an infrastructure managed by researchers for researchers makes it preferable to commercial cloud computing platforms like Amazon Web Services or Google Compute Engine.

“It’s a virtualized and flexible computing infrastructure where I don’t need to worry about any of the hardware as the experts take care of that,” he says. “In addition, unlike the commercial platforms, the costing is transparent. And I know the people behind the infrastructure, Andrew Lewis, Timothy Carr and Dane Kennedy. And because of the relationship I have with them I am confident I will get great support.”

Image reference: Cartoon representation of NanA sialidase (PDB:2VVZ) from Streptococcus pneumoniae solvated in water. A trisaccharide substrate (van der Waals representation) is bound in the active site of the enzyme with sodium and chloride ions present in solution.