The genome is the complete set of DNA instructions found in each cell of a living thing. DNA (deoxyribonucleic acid) is the hereditary material in humans and other organisms. The genome contains all the information needed to build an individual organism and to allow it to grow and develop.
As the Swedish geneticist and Nobel laureate Dr Svante Pääbo has argued, detecting single nucleotide polymorphisms (SNPs, pronounced ‘snips’) is essential across biology, anthropology and the medical sciences. Among other things, SNPs can help us predict an individual’s risk of developing specific diseases.
SNPs are the most common type of genetic variation among people. Detecting these tiny differences requires reliable, publicly available reference sequences. A reference sequence records the order of the four bases (nucleotides) adenine, thymine, guanine and cytosine along an organism’s genome. Unfortunately, the software that assembles sequencing output often cannot reconstruct the true sequence from the short fragments of bases that sequencing machines generate, even with advanced ‘next-generation’ approaches.
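To show why a reference matters, here is a minimal Python sketch of SNP detection. It assumes the individual's sequence has already been aligned base-for-base to the reference, meaning both have the same length and there are no insertions or deletions. The function name `find_snps` and the example sequences are hypothetical. Real variant callers also have to handle alignment, indels and sequencing errors.

```python
from __future__ import annotations


def find_snps(reference: str, sample: str) -> list[tuple[int, str, str]]:
    """Return (position, reference_base, sample_base) for every mismatch.

    Assumes `sample` is already aligned base-for-base to `reference`
    (same length, no insertions or deletions). Positions are 1-based.
    """
    if len(reference) != len(sample):
        raise ValueError("sequences must be aligned to the same length")
    return [
        (pos, ref_base, alt_base)
        for pos, (ref_base, alt_base) in enumerate(zip(reference, sample), start=1)
        if ref_base != alt_base
    ]


if __name__ == "__main__":
    # Hypothetical 10-base stretch of a reference and one individual's sequence.
    print(find_snps("ACGTACGTAC", "ACGTTCGTAA"))  # [(5, 'A', 'T'), (10, 'C', 'A')]
```

If the reference itself contains an error, that error will be reported as a "SNP" in every individual. This is why an accurately reconstructed reference is a prerequisite.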
With this in mind, Associate Professor Yukihiko Toquenaga and Mr Takuya Gagné at the University of Tsukuba in Japan set out to reconstruct the base sequence of one of the simplest organisms, the bacterial virus phiX174. PhiX174 is a bacteriophage of Escherichia coli (E. coli); bacteriophages are viruses that selectively infect and kill bacteria.
Dr Toquenaga and Mr Gagné generated error-free random base segments from the phiX174 reference sequence and fed these segments to more than ten freely available software applications. Of these, only one reconstructed the phiX174 sequence perfectly.
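The simulation step can be sketched in Python as follows. This is an illustration, not the authors' code. The helper names, the fixed read length and the choice to sample reads from one strand only are all assumptions. The sketch relies on the fact that phiX174 has a circular genome. As a result, segments may wrap around the origin, and a correct assembly may start at any position or be reported as the opposite strand (reverse complement).

```python
from __future__ import annotations

import random

# Maps each base to its Watson-Crick partner.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Sequence of the opposite strand, read 5' to 3'."""
    return seq.translate(COMPLEMENT)[::-1]


def sample_reads(reference: str, n_reads: int, read_len: int, seed: int = 0) -> list[str]:
    """Draw error-free reads of fixed length from random start points on a circular genome.

    Reads come from the given strand only (an assumption of this sketch).
    """
    if not 1 <= read_len <= len(reference):
        raise ValueError("read_len must be between 1 and the genome length")
    rng = random.Random(seed)
    # Append the first read_len - 1 bases so reads can wrap past the origin.
    circular = reference + reference[: read_len - 1]
    reads = []
    for _ in range(n_reads):
        start = rng.randrange(len(reference))
        reads.append(circular[start : start + read_len])
    return reads


def is_perfect_reconstruction(reference: str, assembly: str) -> bool:
    """True if `assembly` equals the circular `reference` up to start position or strand."""
    if len(assembly) != len(reference):
        return False
    doubled = reference + reference  # contains every rotation of the genome
    return assembly in doubled or reverse_complement(assembly) in doubled


if __name__ == "__main__":
    reference = "ATGCGTACGTTAGC"  # hypothetical toy genome, not phiX174
    print(sample_reads(reference, n_reads=5, read_len=6, seed=42))
    print(is_perfect_reconstruction(reference, reference[5:] + reference[:5]))  # True
    print(is_perfect_reconstruction(reference, reference[:-1]))  # False
```

In this setup, the reads would be written to a file for each assembler. Each program's output would then be checked with `is_perfect_reconstruction`. A rotated or reverse-complemented assembly still counts as correct, because a circular genome has no natural start point.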
The next step was to build an ensemble answer from all of the assembled sequences, correct and incorrect, using non-metric multidimensional scaling (NMS). NMS is a powerful statistical tool for reconstructing the spatial relationships among data points when only the distances between each pair of points are known; the non-metric variant uses just the rank order of those distances. In this case, the NMS plot gave Dr Toquenaga and Mr Gagné insight into the actual sequence of phiX174.
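The NMS step can be sketched as follows. The article does not say which distance was measured between assemblies. This Python sketch therefore assumes a plain edit (Levenshtein) distance between linear sequences that have been rotated to a common start. It implements non-metric MDS in the SMACOF style. Each round fits monotone "disparities" to the current distances using isotonic regression (pool-adjacent-violators), then moves the points with a Guttman transform. All function names and the toy sequences are hypothetical. Pure-Python edit distance is also slow for full 5,386-base phiX174 assemblies, so a real analysis would use an optimised library.

```python
import math
import random


def edit_distance(a, b):
    """Levenshtein distance: minimum number of substitutions, insertions and deletions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,                # deletion
                            curr[j - 1] + 1,            # insertion
                            prev[j - 1] + (ca != cb)))  # substitution or match
        prev = curr
    return prev[-1]


def distance_matrix(seqs):
    n = len(seqs)
    dm = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            dm[i][j] = dm[j][i] = edit_distance(seqs[i], seqs[j])
    return dm


def pav(values):
    """Pool-adjacent-violators: least-squares non-decreasing fit to `values`."""
    blocks = []  # each block is [sum, count]
    for v in values:
        blocks.append([v, 1])
        while len(blocks) > 1 and blocks[-2][0] / blocks[-2][1] > blocks[-1][0] / blocks[-1][1]:
            s, c = blocks.pop()
            blocks[-1][0] += s
            blocks[-1][1] += c
    return [s / c for s, c in blocks for _ in range(c)]


def _disparities(X, dissim, pairs):
    """Current distances, monotone disparities and Kruskal stress-1 for configuration X."""
    d = {p: math.dist(X[p[0]], X[p[1]]) for p in pairs}
    # Rank pairs by dissimilarity; ties broken by current distance (Kruskal's primary approach).
    order = sorted(pairs, key=lambda p: (dissim[p[0]][p[1]], d[p]))
    dhat = dict(zip(order, pav([d[p] for p in order])))
    num = sum((d[p] - dhat[p]) ** 2 for p in pairs)
    den = sum(v * v for v in d.values())
    return d, dhat, (math.sqrt(num / den) if den > 0 else 0.0)


def nmds(dissim, dims=2, n_iter=300, tol=1e-9, seed=0):
    """Non-metric MDS: alternate isotonic regression with Guttman (SMACOF) updates."""
    n = len(dissim)
    if n < 2:
        raise ValueError("need at least two items")
    rng = random.Random(seed)
    X = [[rng.uniform(-1.0, 1.0) for _ in range(dims)] for _ in range(n)]
    pairs = [(i, j) for i in range(n) for j in range(i + 1, n)]
    prev_stress = float("inf")
    for _ in range(n_iter):
        d, dhat, stress = _disparities(X, dissim, pairs)
        if prev_stress - stress < tol:
            break
        prev_stress = stress
        # Rescale disparities to a fixed sum of squares so the points cannot collapse.
        total = sum(v * v for v in dhat.values())
        scale = math.sqrt(len(pairs) / total) if total > 0 else 1.0
        # Guttman transform: x_i <- (1/n) * sum_j (dhat_ij / d_ij) * (x_i - x_j)
        new_X = []
        for i in range(n):
            coords = [0.0] * dims
            for j in range(n):
                p = (i, j) if i < j else (j, i)
                if i != j and d[p] > 0:
                    ratio = scale * dhat[p] / d[p]
                    for k in range(dims):
                        coords[k] += ratio * (X[i][k] - X[j][k])
            new_X.append([c / n for c in coords])
        X = new_X
    return X, _disparities(X, dissim, pairs)[2]


if __name__ == "__main__":
    # Hypothetical assemblies of one short region: two agree, three carry errors.
    candidates = ["ACGTACGTAC", "ACGTACGTAC", "ACGTTCGTAC", "TTTTACGTAC", "ACGAAAGTAC"]
    coords, stress = nmds(distance_matrix(candidates))
    for seq, (x, y) in zip(candidates, coords):
        print(f"{seq}  x={x:+.3f}  y={y:+.3f}")
    print(f"stress-1 = {stress:.3f}")
```

Only the rank order of the edit distances enters the fit. Assemblies that agree with one another should therefore sit close together on the plot, however large the raw distances are. The stress value ranges from 0 (a perfect fit) to 1 and indicates how faithfully two dimensions capture those ranks.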
As a significant methodological step forward, this work provides a much-needed guideline for base sequencing in future research. Dr Toquenaga and Mr Gagné propose that researchers should not rely on a single program to assemble a sequence. Instead, they recommend combining the output of multiple software applications into an ensemble answer, as they have demonstrated. Much like ensemble approaches commonly used in statistics, this can help us determine reliable sequences for detecting SNPs.
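The general idea of combining several programs' answers can be illustrated with a simple rule familiar from statistics. This Python sketch swaps in a position-wise plurality vote, in which the most frequent base wins. It is not the NMS-based procedure Dr Toquenaga and Mr Gagné used. The sketch assumes the candidate assemblies are already aligned to the same length and reports 'N' (the IUPAC code for "any base") wherever the most common bases tie. The function name and example sequences are hypothetical.

```python
from __future__ import annotations

from collections import Counter


def plurality_consensus(candidates: list[str]) -> str:
    """Position-wise plurality vote over equal-length, pre-aligned sequences.

    Positions where the two most common bases tie are reported as 'N'.
    """
    if not candidates:
        raise ValueError("need at least one candidate sequence")
    length = len(candidates[0])
    if any(len(seq) != length for seq in candidates):
        raise ValueError("candidates must be aligned to the same length")
    consensus = []
    for column in zip(*candidates):  # one column = one position across all candidates
        ranked = Counter(column).most_common(2)
        if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
            consensus.append("N")  # no single most frequent base at this position
        else:
            consensus.append(ranked[0][0])
    return "".join(consensus)


if __name__ == "__main__":
    # Hypothetical outputs of five programs: each error appears in only one of them.
    outputs = ["ACGTACGTAC", "ACGTACGTAC", "ACGTTCGTAC", "TTTTACGTAC", "ACGAAAGTAC"]
    print(plurality_consensus(outputs))  # ACGTACGTAC
```

A vote like this only helps when different programs make different mistakes. The example is built so that each error appears in only one output, which lets the shared, correct base win at every position.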