Overview
- The peer-reviewed study in PNAS by teams at OIST, ISTA, the University of Vienna, and CAB examines how proteins spread through the vast space of possible amino acid sequences.
- The researchers find ancestry is the strongest brake on diversification, leaving many protein families tightly clustered and low in effective dimensionality.
- Effective dimensionality here means the number of independent directions evolution has explored within a family; many families show very few such directions.
- Simulations suggest the first protein families likely required DNA recombination to form, rather than arising through mutation from a single starting sequence.
- The collaboration cautions that AI design tools learn from a tiny slice of functional proteins and calls for new datasets. The work was supported by Japan’s JST ASPIRE program, and the authors report no competing interests.
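
To make the idea of effective dimensionality concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not the study's method: this summary does not say which measure the authors used, so the sketch substitutes the participation ratio, one common way to count how many independent directions carry the variance in a data set. The one-hot encoding, the toy sequences, and the function names `one_hot` and `participation_ratio` are all hypothetical.

```python
# Illustrative sketch only: the participation ratio is ASSUMED here as a
# stand-in for "effective dimensionality"; the study's own measure may differ.

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard amino acids


def one_hot(seq):
    """Encode an aligned sequence as a flat 0/1 vector (20 entries per position)."""
    vec = []
    for residue in seq:
        vec.extend(1.0 if residue == aa else 0.0 for aa in AMINO_ACIDS)
    return vec


def participation_ratio(vectors):
    """Return (sum of eigenvalues)^2 / (sum of squared eigenvalues).

    The ratio is computed from the Gram matrix of the centered data, which
    has the same nonzero eigenvalues as the scatter matrix. For a symmetric
    matrix G, the sum of eigenvalues is trace(G) and the sum of squared
    eigenvalues is the sum of squared entries, so no eigensolver is needed.
    """
    n = len(vectors)
    d = len(vectors[0])
    mean = [sum(v[j] for v in vectors) / n for j in range(d)]
    centered = [[v[j] - mean[j] for j in range(d)] for v in vectors]
    gram = [[sum(a * b for a, b in zip(x, y)) for y in centered] for x in centered]
    trace = sum(gram[i][i] for i in range(n))
    trace_of_square = sum(g * g for row in gram for g in row)
    if trace_of_square == 0:
        return 0.0  # all sequences identical: no variation at all
    return trace * trace / trace_of_square


# Hypothetical toy families of aligned sequences (not real protein data).
clustered_family = ["AAAA", "AAAA", "AAAC", "AAAC"]        # varies at one site only
diverse_family = ["ACDE", "CDEF", "DEFG", "EFGH", "FGHI"]  # varies at every site

pr_clustered = participation_ratio([one_hot(s) for s in clustered_family])
pr_diverse = participation_ratio([one_hot(s) for s in diverse_family])
print("clustered family:", pr_clustered)
print("diverse family:", pr_diverse)
```

In this toy setup, a family whose members differ along a single direction scores lower than one whose members differ in many unrelated ways, which mirrors the sense in which tightly clustered families are described as low-dimensional.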