# Chapter 5. Biotechnology Must Head for the Cloud

It’s clear that the rate of hardware innovation in life science is staggering. Illumina recently made headlines by announcing it had dropped the cost of sequencing one human genome to a mere $1,000. While this cost excludes the much greater costs associated with data analysis and interpretation, it’s still a remarkable milestone considering that the first human genome cost $3 billion to sequence just over a decade ago. That’s a 3,000,000x improvement.

In contrast, the state of software is unacceptable and progressing much more slowly. How many scientists do you know who use spreadsheets to organize DNA? Or who collaborate by emailing files around? Or who can’t actually search their colleagues’ sequence data? If synthetic biology is going to reimagine genetic engineering, it won’t be on the foundation of archaic software tools.

We need a cloud-based platform for scientific research, designed from the ground up for collaboration. Legacy desktop software has compounded systemic problems in science: poor scientific reproducibility, delayed access to new computational techniques, and rampant IT overhead. These issues are a thorn in the side of all scientists, and it’s our responsibility to fix them if we want to accelerate science.

## Reproducibility

The reproducibility of peer-reviewed research is currently under fire. Scientists at Amgen tried to reproduce 53 landmark cancer studies and could confirm only 6 of them. Part of the problem is that many journals do not have strict guidelines for publishing all datasets associated with a project.

Like life scientists, computer scientists care about peer-reviewed research. However, a powerful prestige economy also exists around creating and maintaining open source software. If you release broken software, people will point it out and contribute fixes. This feedback loop is broken in biology. It can take months for journals to accept corrections, and spotting flaws is incredibly difficult without access to all of the project materials.

Replicating this prestige economy around practical, usable output requires a culture shift. Nonetheless, that doesn’t mean we can’t facilitate the process. Preparing a manuscript for publication is tedious and time-consuming. Thus, there’s little incentive for scientists to expend additional effort preparing and hosting project materials online after the fact. Files need to be wrangled from collaborators, data needs to be hosted and maintained indefinitely, and code used in data analysis needs to be documented. This process of "open-sourcing" a project must be as frictionless as flipping a switch.

Another major problem is the speed at which the latest computational techniques are disseminated. New versions of desktop software are often released only once a year because of the overhead involved in developing patches and getting them installed. The upgrades are often tied to expensive license renewals, which slows uptake further. In a fast-moving field like synthetic biology, algorithms and methods change too quickly for traditional desktop software to keep up. With web-based software, developers can easily push updates multiple times per day without any user intervention, so scientists get access to cutting-edge tools without any inconvenience.

More importantly, a cloud solution would eliminate the need to reinvent the wheel for each new computational technique. Imagine you devise a new algorithm for aligning DNA sequences. Today, even a small script that does some data processing or analysis forces you to make sure it runs properly on your colleagues’ machines. Hosting the tool yourself so that other scientists could use it is no easier: you would have to build your own data storage, visualization, and serving infrastructure. A shared cloud platform would expose this functionality through web APIs, allowing anyone with basic programming skills to develop new tools for the community.
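To give a sense of how small such a community tool could be, here is a minimal sketch of an alignment scorer written as a pure function over plain data. It uses the standard Needleman-Wunsch recurrence for global alignment. The function names, the scoring parameters, and the `handle_request` wrapper are assumptions made for illustration; they do not describe any existing platform's API.

```python
def global_alignment_score(a: str, b: str, match: int = 1,
                           mismatch: int = -1, gap: int = -1) -> int:
    """Return the Needleman-Wunsch global alignment score of a and b.

    The scoring values are illustrative defaults, not a recommendation.
    """
    # prev[j] is the best score for aligning the processed prefix of a
    # with b[:j]; initially a's prefix is empty, so b[:j] is all gaps.
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        # Aligning a[:i] with an empty prefix of b costs i gaps.
        curr = [i * gap]
        for j in range(1, len(b) + 1):
            pair = match if a[i - 1] == b[j - 1] else mismatch
            diag = prev[j - 1] + pair   # align a[i-1] with b[j-1]
            up = prev[j] + gap          # a[i-1] aligned with a gap
            left = curr[j - 1] + gap    # b[j-1] aligned with a gap
            curr.append(max(diag, up, left))
        prev = curr
    return prev[-1]


def handle_request(payload: dict) -> dict:
    """Hypothetical entry point a shared platform might call.

    Takes and returns plain dictionaries so that storage, serving, and
    visualization could be handled by the platform rather than the tool.
    """
    score = global_alignment_score(payload["seq_a"], payload["seq_b"])
    return {"score": score}
```

Because the tool only consumes and produces plain data, its author never has to think about where sequences are stored or how results are displayed; under this assumption, those concerns belong to the platform.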