The team behind this project is a biology A-Team that I'm honored to be a member of. Heading up this effort: noted evolutionary bioinformaticist Dan Graur. The core team also includes renowned bioinformaticists Yichen Zheng and Nicholas Price, talented genomic architect Ricardo B. R. Azevedo, and population geneticist Eran Elhaik. Advisers include Larry Wall, fresh off the successful Perl 6 project; noted computer language theorist Edmund Arranga; and celebrated biostatistician and data analyst Lawrence Sanna. Beyond the specific technology, important philosophical guidance was provided by Francis Galton. This project makes use of important work by Francois Pinard, although he is not directly involved in the project.
The thesis of this star-studded group is simple, but like many simple things hidden in plain sight, it takes a true genius like Graur to realize the potential that lies before us, within reach. When you ask Graur why nobody has proposed such a project before, he just smiles wryly and shakes his head. "The human genome is rife with dead copies of protein-coding and RNA-specifying genes that have been rendered inactive by mutation. It's time we stopped passively accepting this situation and did something about it."
Our project: refactoring the human genome. "Refactoring" is a term from computer science, meaning to reorganize the internal structure of a code base without changing the (intended) external behavior. This term has been borrowed into bioinformatics, where the meaning is equivalent: the reorganization of the structure of a genome without changing the behavior of the organism. As Graur eloquently observed, at least 85% of the human genome is plain junk. Most of it is not transcribed, and even much of what is transcribed is by definition nonfunctional, because it is not subject to purifying selection.
Why does this matter? Even though it is not important to our normal genomic lives, this junk DNA is an important source of disease. Most obviously cancer, but also genetic disorders such as converse errors that can cause acute adhominem ataxia.
From an evolutionary viewpoint, a function can be assigned to a DNA sequence if and only if it is possible to destroy the function by removing the DNA sequence. Put another way: by definition, eliding nonfunctional DNA does not change the organism's function. Since 85% of DNA is nonfunctional, eliding it has no effect on normal biology. Why is it worth eliding? Studies by Graur's colleagues show that transcription of junk DNA is highly active in immortal cancer cell lines. The implication is obvious: the key to curing cancer is to attack it at the root, junk DNA.
Thus our project, dubbed RECODE (short for "Refactoring to Eliminate redundant non-CODEing regions"), will use well-understood recombinant gene technology. The result will be a human genome that is between one third and one sixth the present size. The project is ambitious. The benefits are clear. The savings in file storage alone amount to millions of dollars a year, and will only grow as genome sequencing becomes more common.
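For a rough sense of scale, here is a back-of-the-envelope sketch in Python. The genome length (about 3.1 billion base pairs for a haploid human genome) and the two-bits-per-base packing are my own assumptions, not project figures, and the function names are made up for illustration. It compares the "one third" and "one sixth" targets with what you get by taking the 85% junk figure at face value, which lands slightly under one sixth.

```python
from fractions import Fraction

# Assumption: roughly 3.1 billion base pairs in a haploid human genome.
GENOME_BP = 3_100_000_000
# Assumption: each base (A, C, G, T) is packed into two bits.
BITS_PER_BASE = 2


def refactored_bp(retained: Fraction, total_bp: int = GENOME_BP) -> int:
    """Base pairs left after keeping only the `retained` fraction."""
    return int(total_bp * retained)


def packed_megabytes(base_pairs: int) -> float:
    """Size in megabytes (10**6 bytes) of a 2-bit-packed sequence."""
    return base_pairs * BITS_PER_BASE / 8 / 1_000_000


scenarios = [
    ("present genome", Fraction(1)),
    ("one third", Fraction(1, 3)),
    ("one sixth", Fraction(1, 6)),
    ("85% junk removed", Fraction(15, 100)),
]

for label, retained in scenarios:
    bp = refactored_bp(retained)
    print(f"{label}: {bp:,} bp, about {packed_megabytes(bp):.0f} MB packed")
```

Real sequencing files carry quality scores and metadata and are far larger than a packed sequence, so treat these numbers as a lower bound on storage, not an estimate of the savings.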
Technically, the project is straightforward, although the bioinformatics is a challenge. The hardest part, and the part I'm personally most excited about, is ensuring backward compatibility of the new genome, which we've taken to calling "H. sapiens 2.0", with the present "H. sapiens 1.0" genome. Personally, I'm not sure backward compatibility is actually essential, but politically, it's probably indispensable. For one thing, there is the installed base to think of.
As often happens, backward compatibility will involve compromises. In particular, rebalancing chromosomes will probably have to wait for a future "H. sapiens 3.0" project. But an elegant implementation of backward compatibility gives humanity a smooth upgrade path to a genomically healthier future.
This project is so new, it doesn't even have a proper web site yet. However, basic information is available here.