The Origin and Diversification of Proteins
by Bradford
An ongoing exchange involving Matt, John, Eric and Olegt led me to post this:
The authors are Douglas D. Axe, Brendan W. Dixon and Philip Lu. It is published in PLoS ONE. The abstract:
The study of protein evolution is complicated by the vast size of protein sequence space, the huge number of possible protein folds, and the extraordinary complexity of the causal relationships between protein sequence, structure, and function. Much simpler model constructs may therefore provide an attractive complement to experimental studies in this area. Lattice models, which have long been useful in studies of protein folding, have found increasing use here. However, while these models incorporate actual sequences and structures (albeit non-biological ones), they incorporate no actual functions—relying instead on largely arbitrary structural criteria as a proxy for function. In view of the central importance of function to evolution, and the impossibility of incorporating real functional constraints without real function, it is important that protein-like models be developed around real structure–function relationships. Here we describe such a model and introduce open-source software that implements it. The model is based on the structure–function relationship in written language, where structures are two-dimensional ink paths and functions are the meanings that result when these paths form legible characters. To capture something like the hierarchical complexity of protein structure, we use the traditional characters of Chinese origin. Twenty coplanar vectors, encoded by base triplets, act like amino acids in building the character forms. This vector-world model captures many aspects of real proteins, including life-size sequences, a life-size structural repertoire, a realistic genetic code, secondary, tertiary, and quaternary structure, structural domains and motifs, operon-like genetic structures, and layered functional complexity up to a level resembling bacterial genomes and proteomes. Stylus is a full-featured implementation of the vector world for Unix systems. 
To demonstrate the utility of Stylus, we generated a sample set of homologous vector proteins by evolving successive lines from a single starting gene. These homologues show sequence and structure divergence resembling those of natural homologues in many respects, suggesting that the system may be sufficiently life-like for informative comparison to biology.
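To make the abstract's central idea concrete, here is a minimal Python sketch of how base triplets could select coplanar vectors that are chained into an ink path. It is not Stylus code. The vector set (ten directions at two lengths), the codon assignment (three codons per vector, four stop codons) and the names `VECTORS`, `CODON_TO_VECTOR` and `trace_path` are all assumptions made for illustration; the abstract only says that twenty coplanar vectors are encoded by base triplets.

```python
import math
from itertools import product

BASES = "ACGT"

# All 64 base triplets in lexicographic order.
CODONS = ["".join(c) for c in product(BASES, repeat=3)]

# Hypothetical vector set: 10 evenly spaced directions x 2 lengths = 20 vectors.
VECTORS = [
    (length * math.cos(2 * math.pi * k / 10), length * math.sin(2 * math.pi * k / 10))
    for length in (1.0, 2.0)
    for k in range(10)
]

# Hypothetical degenerate code: the first 60 codons map to the 20 vectors
# (three codons per vector); the remaining 4 codons act as stops.
CODON_TO_VECTOR = {codon: VECTORS[i // 3] for i, codon in enumerate(CODONS[:60])}


def trace_path(gene):
    """Translate a gene into the list of points visited by the ink path."""
    x, y = 0.0, 0.0
    path = [(x, y)]
    for i in range(0, len(gene) - 2, 3):
        vector = CODON_TO_VECTOR.get(gene[i:i + 3])
        if vector is None:  # stop codon (or unknown triplet) ends translation
            break
        x += vector[0]
        y += vector[1]
        path.append((x, y))
    return path
```

Calling `trace_path` on a gene returns the successive pen positions. In the model described in the abstract, whether such a path forms a legible character is what determines function, and that step is not attempted here.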
There was also this:
However, the models purporting to explain structural radiation generally use simplistic representations of selectable function. As Zeldovich et al. point out, many evolutionary models lack any causal connection at all between sequence and function [8]. But even when causal models are used, they tend to be simplistic. Hirst has discussed the various aspects of structural soundness (e.g., folding stability or speed) that are singled out as proxies for selectable function [9]. Recognizing the distinction between structural soundness and functional utility, he required lattice structures to form a pocket (analogous to an active-site cleft) in order to be deemed functional [9]. This was certainly a step in the right direction, but the underlying problem remains: While these properties are all necessary for the function of real proteins, they are not sufficient. If they were, one good structure would suffice, whereas in reality we see not only a great variety of structures but also a strong connection between this variety and the great variety of specific functions they perform.
Oversimplification of function tends to obscure this fundamental connection. As an example, consider the recent lattice study of Zeldovich et al., which ties a genome's fitness to the lowest stability of its encoded proteins [8]. Their model enables a population carrying the gene for a single lattice structure to diversify to the point where evolved structures span the entire space of possibilities. But it achieves this not only by using stability as a proxy for function, but also by dispensing with the notion of a stability threshold—a minimal stability, below which structures are deemed non-functional [8]. In the end, structure space is freely explored here because it is entropically favorable for it to be explored, making structural variety an entropic artifact rather than a functional necessity. Because one good structure really does suffice in such a world, it seems unlike the real world, where “the great functional capacity and importance of proteins largely stems from the remarkable ability of these polymers to adopt distinct 3-dimensional structures” [3].
Can a new model be framed so as to capture this fundamental aspect of biology? A key step in this direction may be to base it on real function rather than a definitional substitute for function. Because real functions involve both specificity and real constraints, this would guarantee a level of functional realism that is not otherwise easily achieved. This principle is demonstrated by artificial-life simulations, like Avida [10], where computational tasks must be performed in order to gain a selective advantage. But because these tasks are performed by instructions rather than structures, Avida does not readily lend itself to protein studies.
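The point about stability thresholds in the passage above can be illustrated with a few lines of Python. This is a toy sketch, not the fitness function used by Zeldovich et al.: the function name `genome_fitness`, the `threshold` parameter and the numbers in the usage note are invented for illustration. It only shows the difference between tying fitness to the least stable protein with and without a cutoff below which a structure counts as non-functional.

```python
def genome_fitness(stabilities, threshold=None):
    """Toy fitness: the stability of the least stable encoded protein.

    If a threshold is given, any genome whose weakest protein falls below
    it is treated as non-functional and gets zero fitness.
    """
    weakest = min(stabilities)
    if threshold is not None and weakest < threshold:
        return 0.0
    return weakest
```

With made-up stabilities `[0.9, 0.4, 0.7]`, the call without a threshold returns the weakest value, so the genome stays viable however its structures drift. With `threshold=0.5` the same genome scores zero, which is the kind of constraint the authors say the lattice study dispensed with.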


















