Proteins employ a wide variety of folds to perform their biological functions. How are these folds first acquired?
An important step toward answering this is to obtain an estimate of the overall prevalence of sequences adopting functional folds. Since tertiary structure is needed for a typical enzyme active site to form, one way to obtain this estimate is to measure the prevalence of sequences supporting a working active site.
Although the immense number of sequence combinations makes wholly random sampling unfeasible, two key simplifications may provide a solution.
First, given the importance of hydrophobic interactions to protein folding, it seems likely that the sample space can be restricted to sequences carrying the hydropathic signature of a known fold.
Second, because folds are stabilized by the cooperative action of many local interactions distributed throughout the structure, the overall problem of fold stabilization may reasonably be viewed as a collection of coupled local problems.
This allows the difficulty of the whole problem to be estimated from the difficulty of several smaller problems. Using these simplifications, the difficulty of specifying a working β-lactamase domain is assessed here.
An alignment of homologous domain sequences is used to deduce the pattern of hydropathic constraints along chains that form the domain fold.
Starting with a weakly functional sequence carrying this signature, clusters of ten side-chains within the fold are replaced randomly, within the boundaries of the signature, and tested for function.
The prevalence of low-level function in four such experiments indicates that roughly one in 10^64 signature-consistent sequences forms a working domain.
Combined with the estimated prevalence of plausible hydropathic patterns (for any fold) and of relevant folds for particular functions, this implies the overall prevalence of sequences performing a specific function by any domain-sized fold may be as low as 1 in 10^77, adding to the body of evidence that functional folds require highly extraordinary sequences.
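
The two computational steps described above (deducing a hydropathic signature from an alignment, then randomizing a cluster of positions within that signature) can be illustrated with a minimal Python sketch. It is not the procedure used in the study. The two-class hydrophobic/hydrophilic grouping, the rule that a mixed column is unconstrained, and the names `column_signature`, `hydropathic_signature` and `randomize_cluster` are all assumptions made for illustration. The toy alignment is invented and is assumed to be gap-aligned to equal length.

```python
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

# Assumption: a crude two-class hydropathy grouping, chosen only for illustration.
HYDROPHOBIC = set("AVILMFWC")
HYDROPHILIC = set(AMINO_ACIDS) - HYDROPHOBIC


def column_signature(column):
    """Return the set of residues allowed at one alignment column.

    A column whose observed residues are all hydrophobic (or all hydrophilic)
    is constrained to that class; a mixed or empty column is unconstrained.
    """
    residues = {r for r in column if r != "-"}  # ignore gap characters
    if residues and residues <= HYDROPHOBIC:
        return set(HYDROPHOBIC)
    if residues and residues <= HYDROPHILIC:
        return set(HYDROPHILIC)
    return set(AMINO_ACIDS)


def hydropathic_signature(alignment):
    """Deduce a per-position signature from equal-length aligned sequences."""
    return [column_signature(col) for col in zip(*alignment)]


def randomize_cluster(sequence, signature, positions, rng):
    """Replace the residues at `positions` with random signature-consistent ones."""
    variant = list(sequence)
    for i in positions:
        variant[i] = rng.choice(sorted(signature[i]))
    return "".join(variant)


# Toy data (hypothetical, not taken from any real beta-lactamase alignment).
alignment = ["AVKDA", "ILREK", "VAKSL"]
signature = hydropathic_signature(alignment)
variant = randomize_cluster("AVKDA", signature, [0, 2], random.Random(0))
print(variant)
```

In an experiment of the kind described, each such variant would then be tested for function, and the fraction of variants that retain function would give the prevalence estimate for that cluster.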