Team:Peking/Background

Background

The spark of thought - the art of DNA

The idea of DNA storage began with art. In the second half of the 20th century, the explosive progress of the life sciences gradually revealed the secrets of life, and a new field opened up for artists. As the programming language of life, DNA is copied with astonishing fidelity. What if a masterpiece were stored in DNA? Barring mutation, its beauty, creativity and philosophy would never be lost!

As art historian Jack Burnham described in his 1970 book, the equilibrium found in dynamical systems would not only exist in natural life but also enter artistic life, as a major means of pursuing aesthetics. As a new generation became more adept at systems theory, the ancient Greeks' obsession with "living sculptures" might become a dreamlike reality.

In the 1980s and 1990s, the "Microvenus" project was carried out: the logo of the project was written into DNA using a fairly primitive scheme.

Coding rules:

1. The logo is represented as a 7 x 5 black-and-white bitmap;

2. The bitmap is flattened into a binary sequence, row by row;

3. The binary sequence is read and the length of each run of identical bits (consecutive 0s or 1s before the bit value changes) is counted. Runs of 1 to 4 are represented by C, T, A and G respectively.

This scheme is closely related to simple binary encoding: it records the lengths of runs of bits rather than the bits themselves.
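The three coding rules can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the bitmap below is a made-up 7 x 5 example rather than a confirmed copy of the project logo, the function names are hypothetical, and because the rules do not say what happens to runs longer than 4, the sketch simply rejects them. The decoder also has to be told the value of the first bit, which the rules leave implicit.

```python
from itertools import groupby

# Rule 3: run lengths 1 to 4 map to C, T, A, G.
RUN_TO_BASE = {1: "C", 2: "T", 3: "A", 4: "G"}
BASE_TO_RUN = {base: run for run, base in RUN_TO_BASE.items()}


def encode_runs(bits: str) -> str:
    """Encode a string of '0'/'1' characters as one base per run of equal bits."""
    bases = []
    for _, group in groupby(bits):
        run = len(list(group))
        if run not in RUN_TO_BASE:
            # Runs longer than 4 are not covered by the rules; reject them here.
            raise ValueError(f"run of length {run} is not covered by the rules")
        bases.append(RUN_TO_BASE[run])
    return "".join(bases)


def decode_runs(dna: str, first_bit: str = "1") -> str:
    """Invert encode_runs. The starting bit value is an assumption of this sketch."""
    bit = first_bit
    out = []
    for base in dna:
        out.append(bit * BASE_TO_RUN[base])
        bit = "0" if bit == "1" else "1"  # runs alternate between 1s and 0s
    return "".join(out)


# Rule 1: an illustrative 7 x 5 bitmap (7 rows of 5 pixels).
rows = ["10101", "01110", "00100", "00100", "00100", "00100", "00100"]
# Rule 2: flatten the bitmap row by row.
bits = "".join(rows)
dna = encode_runs(bits)
print(dna)
```

Note that 35 bits shrink to 18 bases here, but the saving depends entirely on how long the runs in the bitmap are.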

Many scientists and musicians cannot resist the temptation to encode and store music in DNA, so that beautiful music can be preserved forever. In 2017, two songs ("Smoke on the Water" and "Tutu") were stored in DNA and read back with 100% accuracy, becoming the first DNA-encoded items in UNESCO's Memory of the World archive. In 2018, the British band Massive Attack teamed up with ETH Zurich to store Mezzanine, their most popular album, released 20 years earlier, in DNA molecules. The file was 15 MB in size, making it the second-largest DNA-stored file at that time (the largest was Microsoft's 200 MB). The DNA carrying the information was sealed inside 5000 glass beads too small to see with the naked eye and placed in a small bottle of water.

With the development of technology, more and more attempts have been made to use DNA as an information storage medium. The purpose is no longer limited to art: the focus has shifted to high-capacity, long-term and highly error-tolerant storage. As a result, the coding methods have become increasingly complex (as introduced in our previous post).

When a piece of natural DNA is decoded by running such an encoding scheme in reverse, the result is usually garbled. However, the rules can be designed so that the decoded result is acceptable. A genetic algorithm can be used to search for a good decoding method, from which the matching encoding follows as its inverse. The core of a genetic algorithm is its fitness function.

Scoring strategy

In this iGEM project, our work is to provide a fitness function for music. The score measures how pleasant a song sounds, and so provides the basis for screening mutated music. Because music is an art, it is difficult to give a strict scoring standard. We plan to try two approaches: one based on music theory and the other based on machine learning.

The music-theory approach derives a scoring standard from basic music theory. The current basis is: take the first note of each bar as a chord root, and score the coherence and pleasantness between bars by the tendency of that root's chord to progress by a fourth; penalize melodic features that tend to sound unpleasant, such as large leaps. Of course, this is only a very rough scoring method. The specific parameters still need tuning, and further rules need to be added.
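As a rough illustration of such a rule-based score, the sketch below rewards root motion by a fourth between consecutive bars and penalizes large melodic leaps. Everything concrete in it is an assumption made for the example: notes are MIDI pitch numbers, a "fourth" is taken as root motion of 5 semitones upward modulo the octave, a "large leap" is anything over 7 semitones, and each reward or penalty counts as one point. The project's actual parameters are not specified here.

```python
from typing import List

FOURTH = 5    # semitones in a perfect fourth (assumed definition of the reward)
MAX_LEAP = 7  # leaps larger than this many semitones are penalized (assumed)


def score_melody(bars: List[List[int]], max_leap: int = MAX_LEAP) -> int:
    """Score a melody given as a list of bars, each a list of MIDI pitches."""
    score = 0

    # Reward: the first note of each bar is treated as the root; moving up a
    # fourth (equivalently, down a fifth) between consecutive bars earns a point.
    roots = [bar[0] for bar in bars if bar]
    for prev, nxt in zip(roots, roots[1:]):
        if (nxt - prev) % 12 == FOURTH:
            score += 1

    # Penalty: every leap between adjacent notes above max_leap costs a point.
    notes = [note for bar in bars for note in bar]
    for a, b in zip(notes, notes[1:]):
        if abs(b - a) > max_leap:
            score -= 1

    return score


# Roots G -> C -> F move by fourths; there is one large leap (71 -> 60).
smooth = [[67, 69, 71], [60, 64, 67], [65, 64, 62]]
jumpy = [[60, 84], [61, 40]]
print(score_melody(smooth), score_melody(jumpy))
```

A score of this kind is easy to extend with further rules, since each rule is just another pass over the roots or the notes.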

Base editor: directed mutation and random mutation

At the beginning of this line of work, David Liu et al. connected the N-terminus of Cas9 (ASP in the figure above) to the cytosine deaminase APOBEC1 through a linker, so that APOBEC1 can "reach" bases at the PAM-distal end.

Later, to keep uracil DNA glycosylase (UDG) in mammalian cells from repairing the mutation introduced by the deaminase, UGI was fused to the C-terminus of Cas9.

The UDG superfamily contains five families; the first family is responsible for correcting lesions such as cytosine deamination. The sequence alignment of human and E. coli UDG is shown above.

UGI is a UDG inhibitor protein isolated from a Bacillus subtilis phage. It inhibits the family 1 UDGs of both E. coli and humans.

TadA is natively a tRNA adenosine deaminase (A to I) in E. coli. Through directed evolution, TadA was converted into an enzyme for gene editing and linked to the N-terminus of Cas9 via a linker, yielding ABE, which converts A-T base pairs into G-C base pairs.

EvolvR, which relies on an error-prone polymerase, can introduce random mutations.
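The difference between directed and random mutation can be made concrete with a toy string model. The sketch below is a deliberate simplification with hypothetical function names: it treats the editable region as a half-open index window chosen by the caller, assumes every target base in the window is converted, and models random mutation as independent substitutions at a fixed rate. Real editors have sequence preferences and incomplete efficiency, none of which is modeled.

```python
import random
from typing import Tuple

BASES = "ACGT"


def _convert(seq: str, window: Tuple[int, int], src: str, dst: str) -> str:
    """Replace every `src` base inside the half-open window [start, end) by `dst`."""
    start, end = window
    return seq[:start] + seq[start:end].replace(src, dst) + seq[end:]


def cytosine_base_edit(seq: str, window: Tuple[int, int]) -> str:
    """Directed mutation, CBE-like: C becomes T within the window."""
    return _convert(seq, window, "C", "T")


def adenine_base_edit(seq: str, window: Tuple[int, int]) -> str:
    """Directed mutation, ABE-like: A becomes G within the window."""
    return _convert(seq, window, "A", "G")


def random_mutate(seq: str, rate: float, rng: random.Random) -> str:
    """Random mutation: each base is swapped for a different base with probability `rate`."""
    out = []
    for base in seq:
        if rng.random() < rate:
            base = rng.choice([b for b in BASES if b != base])
        out.append(base)
    return "".join(out)


target = "ACCGTACA"
cbe = cytosine_base_edit(target, (1, 4))  # only positions 1..3 are editable
abe = adenine_base_edit(target, (4, 8))   # only positions 4..7 are editable
scrambled = random_mutate(target, 0.25, random.Random(0))
print(cbe, abe, scrambled)
```

The directed edits are fully predictable from the window, whereas the random variant can only be described statistically through its rate.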

Algorithm of music generation

As early as the end of the 20th century, programmers turned their magic wands to music creation, trying to make computers compose for people and crowd musicians out of their living space. Over the past 30 years, music generation algorithms have made great progress, from early approaches such as genetic algorithms and Markov chains to contemporary ones such as RNN (recurrent neural network), LSTM (long short-term memory network), GAN (generative adversarial network) and ALS (alternating least squares). The output has also evolved from almost intolerable noise to rather fluent pieces of music. The good news for musicians is that even today's Jukebox (a neural network developed by OpenAI that can not only generate pop-style music but also closely imitate real musicians, living up to its name) is still far from producing a masterpiece.

How are these algorithms used in music generation?

Take the genetic algorithm (GA) as an example. Generally speaking, the first step is to transform the binary code into a musical score through a mapping rule, for example by defining one part of the garbled code as the pitch value, another part as the volume value, and so on, obtaining the notes one by one. But such a note string obviously cannot stand as a song in its own right. This is why we need an algorithm to process the note string, "mutating" and "hybridizing" (crossing over) it to obtain a new generation of note strings.
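One possible mapping rule of this kind is sketched below. The layout is purely an assumption for illustration: each note takes 8 bits, of which the first 5 select a pitch above a base MIDI pitch of 48 and the last 3 select a volume level from 0 to 7. The `Note` type, the function name and all parameters are hypothetical, and leftover bits that do not fill a whole note are dropped.

```python
from typing import List, NamedTuple


class Note(NamedTuple):
    pitch: int   # MIDI pitch number
    volume: int  # loudness level, 0 (quietest) to 2**volume_bits - 1


def bits_to_notes(
    bits: str, pitch_bits: int = 5, volume_bits: int = 3, base_pitch: int = 48
) -> List[Note]:
    """Cut a bit string into fixed-size chunks and read each chunk as one note."""
    size = pitch_bits + volume_bits
    notes = []
    # Step through whole chunks only; trailing bits that do not fill a chunk are ignored.
    for i in range(0, len(bits) - size + 1, size):
        chunk = bits[i : i + size]
        pitch = base_pitch + int(chunk[:pitch_bits], 2)
        volume = int(chunk[pitch_bits:], 2)
        notes.append(Note(pitch, volume))
    return notes


notes = bits_to_notes("0110010100001111")
print(notes)
```

With this layout every bit string decodes to some note string, which is exactly why the raw result is rarely musical and needs further selection.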

Then the most important step is to select the appropriate evaluation function.

This evaluation function is also known as the scoring function. It must judge every mutant we produce sensibly, deciding which ones score high (those that sound listenable to people) and which ones score low. After screening for the higher-scoring sequences, we will insert them into the E. coli genome to mutate and evolve. After generations of screening, we can obtain sequences of generally higher quality.
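The screen-mutate-screen cycle can be sketched as a small genetic algorithm run entirely in software. This is not the wet-lab procedure: mutation and crossover happen on lists of MIDI pitches instead of inside E. coli, and the `smoothness` fitness is a stand-in chosen only to make the example runnable. Population size, mutation rate and the fraction of survivors are likewise assumed values.

```python
import random
from typing import Callable, List

Melody = List[int]  # MIDI pitch numbers; an assumed representation


def mutate(melody: Melody, rng: random.Random, rate: float = 0.1, step: int = 2) -> Melody:
    """With probability `rate`, shift a note by a random amount between -step and +step."""
    return [n + rng.randint(-step, step) if rng.random() < rate else n for n in melody]


def crossover(a: Melody, b: Melody, rng: random.Random) -> Melody:
    """Single-point crossover ("hybridization") of two equal-length melodies."""
    point = rng.randrange(1, len(a))
    return a[:point] + b[point:]


def evolve(
    population: List[Melody],
    fitness: Callable[[Melody], float],
    rng: random.Random,
    generations: int = 50,
    keep: float = 0.5,
) -> List[Melody]:
    """Repeatedly keep the best-scoring melodies and refill the population with their offspring."""
    for _ in range(generations):
        ranked = sorted(population, key=fitness, reverse=True)
        survivors = ranked[: max(2, int(len(ranked) * keep))]
        children = []
        while len(survivors) + len(children) < len(population):
            a, b = rng.sample(survivors, 2)
            children.append(mutate(crossover(a, b, rng), rng))
        population = survivors + children
    return sorted(population, key=fitness, reverse=True)


def smoothness(melody: Melody) -> float:
    """Toy fitness (an assumption): melodies with smaller steps between notes score higher."""
    return -sum(abs(b - a) for a, b in zip(melody, melody[1:]))


rng = random.Random(0)
initial = [[rng.randint(48, 84) for _ in range(8)] for _ in range(20)]
evolved = evolve(initial, smoothness, rng)
print(smoothness(evolved[0]))
```

Because the survivors are carried over unchanged, the best score in the population can never get worse from one generation to the next; how fast it improves depends on the fitness function, which is the hard part.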

Unfortunately, it is difficult to arrive at a suitable evaluation function, even after applying a good deal of music theory. After all, "there are a thousand Hamlets in a thousand people's eyes", and the same goes for taste in music.

Art coding of painting

In addition to coding DNA into beautiful music, we also hope to turn DNA into moving pictures.

We use specific rules to define a color table, which then lets us visualize DNA.
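Since the rules themselves are not listed here, the following is only a guess at what a color-table visualization might look like. It assumes the simplest possible table, one RGB color per base, lays the bases out row by row at a chosen width, and pads the last row with white. The colors, the function names and the choice of plain-text PPM as an output format are all assumptions made for this sketch.

```python
from typing import List, Tuple

Pixel = Tuple[int, int, int]

# Hypothetical color table: one RGB color per base.
COLOR_TABLE = {
    "A": (0, 160, 0),
    "C": (0, 0, 200),
    "G": (240, 200, 0),
    "T": (200, 0, 0),
}
PAD: Pixel = (255, 255, 255)  # white filler for an incomplete last row


def dna_to_pixels(dna: str, width: int) -> List[List[Pixel]]:
    """Map each base to a color and arrange the pixels in rows of `width`."""
    pixels = [COLOR_TABLE[base] for base in dna.upper()]
    rows = [pixels[i : i + width] for i in range(0, len(pixels), width)]
    if rows and len(rows[-1]) < width:
        rows[-1] = rows[-1] + [PAD] * (width - len(rows[-1]))
    return rows


def to_ppm(rows: List[List[Pixel]]) -> str:
    """Serialize the pixel grid as a plain-text PPM (P3) image."""
    width = len(rows[0]) if rows else 0
    lines = ["P3", f"{width} {len(rows)}", "255"]
    for row in rows:
        lines.append(" ".join(f"{r} {g} {b}" for r, g, b in row))
    return "\n".join(lines) + "\n"


grid = dna_to_pixels("ACGTAC", width=4)
ppm_text = to_ppm(grid)
print(ppm_text)
```

Writing `ppm_text` to a file with a `.ppm` extension gives an image that common viewers can open; a richer table could map codons or longer words to colors instead of single bases.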