Digital and nondigital information in genetic language

Vasily Ogryzko

1996. In: Rauch, I. and G. F. Carr (eds.), Semiotics around the world. Synthesis in diversity. Berlin: Mouton de Gruyter. Pp. 227-230.

Among the broad variety of semiotic systems found in nature, the paradigmatic ones are human language and the genetic code. The most striking feature common to both systems is the digital way in which information functions, is stored and is processed. First, in both cases there is a mechanism that filters the sense (or meaning) of a symbol from noise during its recognition, transmission, etc., whether the symbol is a nucleotide or an amino acid, a letter of the alphabet or a sound of speech. Second, the symbols form linear strings (texts), combining independently or almost independently of each other.

The digital nature of language and of the genetic code was, I believe, essential for the emergence of both life and human phenomena. First, the filtering mechanism provided more stability in the transmission, functioning and maintenance of information. Together with the second characteristic of digitality, the independence of recombination between symbols, it created a huge new space of freedom, which was necessary for the development of complex nonchaotic systems and cleared the way for rapid biological or cultural evolution. In other words, digitality provided the basis for what can be called the "biological" and "human" big bangs, when, in a manner similar to the creation of the physical Universe, the biological and cultural worlds emerged.

An essential aspect of the emergence and evolution of these new worlds is the creation of new information. Considering the essential role of digitality in natural evolution, one can ask a question that ultimately concerns the relationship between the concepts of symbol and information: can all the information specific to a particular natural system be exhausted by the digital information contained in this system, i.e., by the order of symbols in the text used by this system? Can, for example, the structure and behavior of a living organism be deduced from the full sequence of its genome, assuming complete knowledge of its environment and of the laws of physics? Or, similarly, can all the knowledge accumulated by humankind ever be reduced to a text in a huge CD-ROM library?

One can call the positive answer to this question the "principle of text reductionism". Although this view has been popular for some time in biology, up to the definition of the evolutionary process as the population dynamics of genes, it is now clear that there are examples of information that does not exist in digital (symbolic) form, in both the biological and the human case. Epigenetic inheritance (e.g., the lambda switch) is a biological example. Intonation is an example from human culture. The question posed above, however, is whether there are any fundamental limitations on reducing all information to digital form. For example, assuming that digitality is so important, can the future bring us to an absolute digitalization of the information specific to such a particular structure as humankind? I will give examples from biology indicating that this is not possible: the very principle of a semiotic system requires that there always be some additional information, an irreducible nondigital residue, specific to the semiotic system and essential for its being.

The first example is chirality. Living organisms are built from left-handed amino acids and right-handed sugars; however, there is no obvious reason why it could not be otherwise. The choice between right and left had to be made at the very beginning of life, since molecular recognition and reproduction could not be accurate enough without chiral purity. By definition, since it is a choice between two alternatives, the particular chirality of biomolecules is information. It is information specific to life, since chiral purity is not found in inanimate matter, and it is essential for life, since the genetic code could not work properly without it. However, this information is not coded in the genome of any living organism: there is nothing in the genetic text that could tell us that the amino acids are left-handed and the sugars right-handed. If one day the whole biosphere were transformed by reflection in a giant mirror, this would neither change the genetic sequences nor affect any process in the biosphere.
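The mirror argument can be restated as a toy model. The following Python sketch is only an illustration under stated assumptions: the class `Biosphere`, the function `mirror` and the sample genome are hypothetical names and data introduced here, and handedness is reduced to a single label ("L" for left, "D" for right) per class of molecule.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Biosphere:
    """Toy model: a genetic text plus two handedness labels (hypothetical)."""
    genome: str       # the digital information
    amino_acids: str  # "L" (left) or "D" (right)
    sugars: str       # "L" (left) or "D" (right)


def mirror(b):
    """Reflect the biosphere: every handedness flips, the text is untouched."""
    flip = {"L": "D", "D": "L"}
    return replace(b, amino_acids=flip[b.amino_acids], sugars=flip[b.sugars])


life = Biosphere(genome="ATGGCC", amino_acids="L", sugars="D")
mirrored = mirror(life)

print(mirrored.genome == life.genome)  # True: the genetic text is identical
print(mirrored == life)                # False: the two systems still differ
```

In this model the difference between the two systems is carried entirely by fields that no operation on `genome` can read or change, which is the sense in which chirality is information not contained in the text.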

Being independent of genetic information, the chirality of biomolecules differs from it in an important respect: how it is reproduced. Genetic information, like any other digital information, is reproduced "locally" in space and time, each symbol (letter) independently of the others and at a particular moment of time. Chirality is a property of the whole organism, or even of the whole biosphere, and is reproduced "globally", together with the whole system it belongs to. Because of this fundamental difference, we will call this kind of information nondigital.

However paradoxical it may seem, another example of nondigital information in biological systems is the genetic code itself. The key elements in providing the correspondence between nucleotide triplets and amino acids are the enzymes aminoacyl-tRNA synthetases. Although the primary sequence of these proteins is coded in the genome, this does not mean that the information about the genetic code is completely digitalized: we cannot deduce the amino acid sequence of these enzymes from the nucleotide sequence of their genes without a priori knowledge of the code. The genetic code shares with chirality the properties of nondigital information: it is information specific to life, it is not reducible to the genetic text, and it is reproduced as a global property of an organism. Importantly, the distinction between digital and nondigital information holds only in the frame of reference of the semiotic system under consideration: when I describe the genetic code in a book, I can do it digitally.
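The dependence of translation on a code supplied from outside the text can be illustrated with a minimal Python sketch. The message, the function name `translate` and the table `ALTERNATIVE` are assumptions invented for the illustration; `STANDARD` is a four-codon subset of the standard genetic code, not the full table.

```python
# Minimal sketch: the same genetic text yields different proteins
# under different codon-to-amino-acid conventions.

# Small subset of the standard genetic code (RNA codons).
STANDARD = {"AUG": "Met", "UUU": "Phe", "GGC": "Gly", "UAA": "Stop"}

# Hypothetical alternative convention, invented for illustration:
# the assignments of UUU and GGC are swapped.
ALTERNATIVE = {"AUG": "Met", "UUU": "Gly", "GGC": "Phe", "UAA": "Stop"}


def translate(rna, code):
    """Read the text codon by codon and look each codon up in `code`."""
    protein = []
    for i in range(0, len(rna) - 2, 3):
        residue = code[rna[i:i + 3]]
        if residue == "Stop":
            break
        protein.append(residue)
    return protein


text = "AUGUUUGGCUAA"  # hypothetical message

print(translate(text, STANDARD))     # ['Met', 'Phe', 'Gly']
print(translate(text, ALTERNATIVE))  # ['Met', 'Gly', 'Phe']
```

The string `text` alone does not determine which of the two outputs is the right one; the table has to be given as a separate argument, just as the cell's synthetases have to exist before their own genes can be read.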

Another important difference between the two kinds of information lies in their "meaning": while a text is supposed to have a meaning (it is a symbol of something else), nondigital information does not symbolize anything except itself.

This difference, and the very necessity of the two types of information, can be better understood if we consider that a semiotic system, especially one with a symbolic component, must have a law governing how meanings are assigned to its symbols. By definition, this law cannot be deduced from the laws of physics and chemistry. It is a convention, and although it is usually reflected in the text, there will always be some residual information which exists only in nondigital form. During the maintenance and reproduction of the semiotic system, considered as a natural system, this information is as essential for the reproduction of the text as the digital information is for the reproduction of the law.

Therefore, from the perspective of the semiotic triad, nondigital information can be put in correspondence with the interpreter, while digital information obviously corresponds to the sign itself. There is, however, another view of nondigital information, most clearly seen in the case of chirality: the choice between left and right can be considered as a choice between two alternative interpretations of the same text; on this view, the nondigital information is the information in the interpretation, not in the interpreter.

These two views of nondigital information can be reconciled if we take into account that an organism interprets its genetic information in order to reproduce itself. So, at least in this particular case, the interpreter is identical with the interpretation. Because of this self-referential character, biological systems have been claimed to differ from real triadic semiotic systems (Moreno et al. 1993), since by identifying the interpreter with the interpretation we arrive at a dyad. My view is different. I suggest a hierarchical view of a semiotic system, described below, which is more flexible and allows a unified approach to both biological and human phenomena.

We notice that there are two kinds of hierarchy in language and in the genetic code. The first is a horizontal hierarchy in texts: from letters to words to sentences (or from nucleotides to triplets to genes, etc.). The second is a vertical hierarchy of steps in the interpretation of a text: for example, from DNA to RNA to protein to function, etc. Then we ask: of all these steps in the vertical hierarchy, which can be considered the most natural interpretation of a particular element of the text? We notice that the answer depends on the level of the horizontal hierarchy to which this element belongs. For example, the letter L codes the sound L; however, the written word TABLE does not code just the sound "TABLE", but rather the concept of a table. Moving further, the meaning of the sentence "the coffin is on the table" is neither the sound nor even the situation described by it: it is rather someone's death. Even the situation itself can serve as a symbol and have the same dire meaning. Thus, with each step along the horizontal hierarchy of the text, we move one step higher in what we consider to be the most appropriate level of interpretation. Similarly, the meaning of a DNA nucleotide is an RNA nucleotide, the meaning of a codon is an amino acid, and the meaning of a gene is not just a protein, but rather a function, an activity, etc.

This observation lets me formulate the following hypothesis: although on most levels of the hierarchy the difference between interpreter and interpretation is clear (for example, between RNA polymerase and mRNA), on the highest level of the hierarchy (an organism) the ultimate text (the genome) is interpreted by an ultimate interpreter which reproduces itself and is therefore identical with the ultimate interpretation. According to this view, a dyadic relationship exists even in the case of human culture, if we take culture itself as an ultimate interpreter which reproduces itself by interpreting all existing texts. On the lower levels of the hierarchy we still have a classical triad: a particular individual is not identical with the meaning of a text he is reading, even if it is his autobiography.


Moreno, A., A. Etxeberria and J. Umerez 1993 "Semiotics and interlevel causality in biology", Rivista di Biologia - Biology Forum 86(2): 197-209.