| Home | Phylogenetic Tree | Small genomes | Methods | Tools | Other sites | References | Contact Us |


Methods

Introduction

A DNA walk of a genome shows how the relative frequency of the two nucleotides of each complementary pair changes locally along the sequence. In practice, it measures the local proportion of Gs among G and C, and of Ts among T and A. Lobry was the first to propose this analysis (1996, 1999). Two complementary representations can be derived from the DNA walk: the cumulative TA-skew and the cumulative GC-skew analyses.

Aim: After reading this description of the algorithm, a reader not trained in genomics should be able to redraw our graphs using the basic genometric data file that is posted on our web resource for each organism as a compressed file (.zip).

1) DNA walk

1.1) Drawing a DNA walk by reading a sequence file nucleotide by nucleotide.

A DNA walk is drawn by assigning a direction to each nucleotide. We propose the following assignment, slightly different from Lobry's (Lobry, 1999): T, C, A, and G correspond to the E(ast), S(outh), W(est), and N(orth) directions, respectively. Reading the sequence nucleotide by nucleotide, and moving one step in the assigned direction each time, traces a path on the graph (Figure 1).

Figure 1: DNA walk of the sequence

 GTCTGGTGTCTGGAGTTCCTGGGTCTTGAGACCACAGGACCCACCAGGGACCCAGGACCC

Starting from the bottom left (bold blue line), the curve ends at the bottom left (pink line).
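The nucleotide-by-nucleotide rule can be sketched in a few lines of Python. This is only an illustrative sketch: the names `STEPS` and `dna_walk` are hypothetical, and ignoring symbols other than A, C, G, and T is an assumption that the text does not address.

```python
# Direction assigned to each nucleotide in this document's convention:
# T -> East (+x), A -> West (-x), G -> North (+y), C -> South (-y).
STEPS = {"T": (1, 0), "A": (-1, 0), "G": (0, 1), "C": (0, -1)}


def dna_walk(sequence):
    """Return the list of (x, y) points visited, starting at the origin."""
    x, y = 0, 0
    path = [(x, y)]
    for nucleotide in sequence.upper():
        # Assumption: any other symbol (for example N) leaves the pointer in place.
        dx, dy = STEPS.get(nucleotide, (0, 0))
        x, y = x + dx, y + dy
        path.append((x, y))
    return path


SEQ = "GTCTGGTGTCTGGAGTTCCTGGGTCTTGAGACCACAGGACCCACCAGGGACCCAGGACCC"
path = dna_walk(SEQ)
print(path[:5])   # first points of the walk
print(path[-1])   # end point of the walk
```

For the 60-nucleotide sequence of Figure 1, the numbers of Ts and As are equal, as are the numbers of Gs and Cs, so the walk returns to its starting point, in agreement with the figure caption.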

1.2) Drawing a DNA walk by slicing a sequence file into small windows

Lobry (1996) suggested a quick way to draw this kind of graph: cut the genome into windows of equal length.

Figure 2: DNA walk of the same sequence as the one presented in Figure 1: GTCTGGTGTCTGGAGTTCCTGGGTCTTGAGACCACAGGACCCACCAGGGACCCAGGACCC

The sequence was sliced into 5-nucleotide windows, and only the fifth nucleotide of each window is plotted. The mean values of each window can also be used…

Comment: this method is less precise than the first one, but it can be used with spreadsheet software without affecting the final resolution of the curve at the genome level.
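As a rough illustration of the Figure 2 approach, the sketch below keeps only the walk position reached at the last nucleotide of each 5-nucleotide window. Reading "only the fifth nucleotide per window is plotted" this way is an assumption, and the variable names are hypothetical.

```python
from itertools import accumulate

SEQ = "GTCTGGTGTCTGGAGTTCCTGGGTCTTGAGACCACAGGACCCACCAGGGACCCAGGACCC"
WINDOW = 5

# Signed step per nucleotide: x moves with T (+1) and A (-1), y with G (+1) and C (-1).
dx = [(n == "T") - (n == "A") for n in SEQ]
dy = [(n == "G") - (n == "C") for n in SEQ]

# Walk position after each nucleotide (the origin itself is not included).
xs = list(accumulate(dx))
ys = list(accumulate(dy))

# Keep only the position reached at the end of each complete window.
sampled = [(xs[i], ys[i]) for i in range(WINDOW - 1, len(SEQ), WINDOW)]
print(sampled)
```

The sampled points lie on the full nucleotide-by-nucleotide path, which is why the overall shape of the curve is preserved while the fine detail inside each window is lost. In this sketch a trailing incomplete window would simply be dropped.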

 

1.2.1) The genome is cut into n windows W of equal size (the last window may be smaller than or equal to the others).

W1, W2, W3, ..., Wn-1, Wn
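A minimal sketch of this slicing step, assuming 0-based start positions and exclusive end positions (a convention chosen here for illustration; `window_bounds` is a hypothetical helper name):

```python
def window_bounds(genome_length, window_size):
    """Return (start, end) pairs covering the genome; the last window may be shorter."""
    return [
        (start, min(start + window_size, genome_length))
        for start in range(0, genome_length, window_size)
    ]


# Toy example: a 23-nucleotide sequence cut into windows of 5.
print(window_bounds(23, 5))
```

The last pair covers only the three remaining nucleotides, which corresponds to the smaller final window Wn described above.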

1.2.2) In each of these windows, the occurrences of A, C, G, and T are counted, giving cA, cC, cG, and cT, respectively.

W1      cA1     cC1     cG1     cT1
W2      cA2     cC2     cG2     cT2
W3      cA3     cC3     cG3     cT3
...     ...     ...     ...     ...
Wn-1    cAn-1   cCn-1   cGn-1   cTn-1
Wn      cAn     cCn     cGn     cTn
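The counting step could look like the following sketch, shown on the short sequence of Figure 1 with 10-nucleotide windows so that the output stays readable. The function name `count_windows` and the window size are illustrative choices, not part of the original method description.

```python
from collections import Counter


def count_windows(sequence, window_size):
    """Return one (cA, cC, cG, cT) tuple per window, in sequence order."""
    rows = []
    for start in range(0, len(sequence), window_size):
        counts = Counter(sequence[start:start + window_size].upper())
        rows.append((counts["A"], counts["C"], counts["G"], counts["T"]))
    return rows


SEQ = "GTCTGGTGTCTGGAGTTCCTGGGTCTTGAGACCACAGGACCCACCAGGGACCCAGGACCC"
for i, (cA, cC, cG, cT) in enumerate(count_windows(SEQ, 10), start=1):
    print(f"W{i}", cA, cC, cG, cT)
```

`Counter` returns 0 for a nucleotide that is absent from a window, so no special case is needed for windows that lack one of the four bases.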

Example: Mycoplasma genitalium G37 complete genome (L43967.1, 580074 bp; available as a compressed text file), cut into windows of 1000 nucleotides.

Position of the window center (nt)    cA     cC     cG     cT
500                                   453    93     86     368
1500                                  400    120    133    347
2500                                  374    122    164    340
3500                                  345    145    200    310
...                                   ...    ...    ...    ...
578500                                313    138    141    408
579500                                318    149    145    388
580037                                33     8      4      29
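The window centers in this table are consistent with taking the midpoint of each window, including the shorter last one. The sketch below reproduces them under that assumption (the midpoint rule is inferred from the table values, not stated in the text) and checks that each row sums to the window length.

```python
GENOME_LENGTH = 580074  # Mycoplasma genitalium G37, from the example above
WINDOW = 1000

# Midpoint of each window, using 0-based starts and exclusive ends (assumed convention).
centers = []
for start in range(0, GENOME_LENGTH, WINDOW):
    end = min(start + WINDOW, GENOME_LENGTH)
    centers.append((start + end) // 2)

print(centers[:4], centers[-3:])

# A few rows copied from the table: (center, cA, cC, cG, cT).
TABLE = [
    (500, 453, 93, 86, 368),
    (1500, 400, 120, 133, 347),
    (580037, 33, 8, 4, 29),
]
for center, cA, cC, cG, cT in TABLE:
    # The four counts add up to the window length (74 for the last window).
    print(center, cA + cC + cG + cT)
```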

1.2.3) Two values, xi and yi, are calculated for each window.

W1     cA1    cC1    cG1    cT1    x1=cT1-cA1    y1=cG1-cC1
W2     cA2    cC2    cG2    cT2    x2=cT2-cA2    y2=cG2-cC2
...    ...    ...    ...    ...    ...           ...
Wn     cAn    cCn    cGn    cTn    xn=cTn-cAn    yn=cGn-cCn
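As a worked example, the two differences can be computed for the first four windows of the Mycoplasma genitalium table. The helper name `window_differences` is hypothetical.

```python
def window_differences(cA, cC, cG, cT):
    """Return (x, y) for one window: x = cT - cA and y = cG - cC."""
    return cT - cA, cG - cC


# First four rows of the example table: (center, cA, cC, cG, cT).
ROWS = [
    (500, 453, 93, 86, 368),
    (1500, 400, 120, 133, 347),
    (2500, 374, 122, 164, 340),
    (3500, 345, 145, 200, 310),
]

differences = [window_differences(*row[1:]) for row in ROWS]
for (center, *_), (x, y) in zip(ROWS, differences):
    print(center, x, y)
```

In the first window, for instance, x1 = 368 - 453 = -85 and y1 = 86 - 93 = -7.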

1.2.4) A cumulative curve is calculated: the running sums Xi and Yi are determined.

W1 ...    x1=cT1-cA1    y1=cG1-cC1    X1=sum(x1 to x1)    Y1=sum(y1 to y1)
W2 ...    x2=cT2-cA2    y2=cG2-cC2    X2=sum(x1 to x2)    Y2=sum(y1 to y2)
...       ...           ...           ...                 ...
Wn ...    xn=cTn-cAn    yn=cGn-cCn    Xn=sum(x1 to xn)    Yn=sum(y1 to yn)
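The running sums can be obtained with `itertools.accumulate` from the Python standard library. In this sketch the x and y values are the differences cT - cA and cG - cC for the first four windows of the Mycoplasma genitalium table, so the result covers only the start of the curve.

```python
from itertools import accumulate

# Per-window differences for the first four windows of the example table.
x = [-85, -53, -34, -35]  # cT - cA
y = [-7, 13, 42, 55]      # cG - cC

# Xi = x1 + ... + xi and Yi = y1 + ... + yi.
X = list(accumulate(x))
Y = list(accumulate(y))

print(X)
print(Y)
```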

1.2.5) The cumulative curve is drawn by plotting the points (Xi, Yi) in data order, from i = 1 to n.

1.2.6) Following the previous description, the DNA walk is labelled as follows on our graphs, which were generated by the "nucleotide by nucleotide" method:

TmAc vs GmCc, meaning that x is the cumulative number of Ts minus the number of As, and y is the cumulative number of Gs minus the number of Cs.

Lobry chose this assignment: T, G, A, and C correspond to the E, S, W, and N directions, respectively. Lobry's outputs are therefore mirror images of ours, reflected about the X axis. Compare the DNA walk of Borrelia burgdorferi in Lobry's drawing system and in ours.

Figure 3: DNA walk of Borrelia burgdorferi in Lobry's system and in our system
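The mirror relationship between the two drawing systems can be checked on a short sequence. The sketch below uses the first ten nucleotides of the Figure 1 sequence; the dictionary and function names are hypothetical.

```python
# Our assignment: T -> E, A -> W, G -> N, C -> S.
OURS = {"T": (1, 0), "A": (-1, 0), "G": (0, 1), "C": (0, -1)}
# Lobry's assignment as described above: T -> E, A -> W, G -> S, C -> N.
LOBRY = {"T": (1, 0), "A": (-1, 0), "G": (0, -1), "C": (0, 1)}


def walk(sequence, steps):
    """Return the (x, y) points of a DNA walk for a given direction assignment."""
    x = y = 0
    path = [(0, 0)]
    for nucleotide in sequence.upper():
        dx, dy = steps.get(nucleotide, (0, 0))
        x, y = x + dx, y + dy
        path.append((x, y))
    return path


SEQ = "GTCTGGTGTC"
ours = walk(SEQ, OURS)
lobry = walk(SEQ, LOBRY)

# Reflecting our walk about the X axis (y -> -y) gives Lobry's walk.
mirrored = [(x, -y) for x, y in ours]
print(lobry == mirrored)
```

Only the G and C directions are swapped between the two systems, so the x coordinates are identical and the y coordinates change sign.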

2) The cumulative TA- and the GC-skew analyses.

2.1) Drawing a cumulative TA- or a GC-skew analysis by reading a sequence file nucleotide by nucleotide.

Cumulative TA-skew analysis: Assign the following direction to each nucleotide: A, T, C, and G correspond to the S, N, nd (no direction), and nd directions, respectively. On the graph, after each nucleotide is read, the pointer moves one step eastward. If an A or a T is read, a further step is added, southward or northward, respectively.

Figure 4: Cumulative TA-skew analysis of the sequence of Figure 1

Cumulative GC-skew analysis: Assign the following direction to each nucleotide: A, T, C, and G correspond to the nd, nd, S, and N directions, respectively. On the graph, after each nucleotide is read, the pointer moves one step eastward. If a C or a G is read, a further step is added, southward or northward, respectively.

Figure 5: Cumulative GC-skew analysis of the sequence of Figure 1
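Both cumulative skews follow the same rule with different nucleotide pairs, so one function can cover them. This is a minimal sketch; the function name `cumulative_skew` and its `up` and `down` parameters are illustrative names, not taken from the original.

```python
def cumulative_skew(sequence, up, down):
    """Return (position, level) points: one step east per nucleotide,
    plus one step north for `up` or one step south for `down`."""
    level = 0
    points = [(0, 0)]
    for position, nucleotide in enumerate(sequence.upper(), start=1):
        if nucleotide == up:
            level += 1
        elif nucleotide == down:
            level -= 1
        points.append((position, level))
    return points


SEQ = "GTCTGGTGTCTGGAGTTCCTGGGTCTTGAGACCACAGGACCCACCAGGGACCCAGGACCC"
ta_skew = cumulative_skew(SEQ, up="T", down="A")  # T north, A south
gc_skew = cumulative_skew(SEQ, up="G", down="C")  # G north, C south
print(ta_skew[-1], gc_skew[-1])
```

The x coordinate is the position in the sequence, and the y coordinate is the x (for TA) or y (for GC) coordinate that the DNA walk of section 1.1 would have at that position.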

2.2.1) Drawing a cumulative TA-skew analysis by slicing a sequence file into small windows

Using the notation developed in 1.2, assign the Xi value to each window center cwi.

cw1    X1
cw2    X2
...    ...
cwn    Xn

2.2.2) Drawing a cumulative GC-skew analysis by slicing a sequence file into small windows

For the TA skew, the cumulative curve is drawn in data order, from cw1 to cwn, by plotting Xi against cwi. The GC-skew curve is drawn in the same way, with Y in place of X.

cw1    Y1
cw2    Y2
...    ...
cwn    Yn

On our graphs, generated by the "nucleotide by nucleotide" method, the cumulative TA skew is labelled Center vs. TmAc, and the cumulative GC skew follows the same labelling system: Center vs. GmCc.
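Putting the windowed steps together, the sketch below builds one row per window with the window center and the two cumulative values, then writes them as tab-separated text that spreadsheet software can plot. Using Center, TmAc, and GmCc as column headers is an assumption based on the graph labels, the midpoint rule for the center is inferred from the example table, and `skew_table` is a hypothetical name.

```python
import csv
import io
from collections import Counter


def skew_table(sequence, window_size):
    """Return (center, TmAc, GmCc) rows, one per window."""
    rows = []
    tmac = gmcc = 0
    for start in range(0, len(sequence), window_size):
        window = sequence[start:start + window_size].upper()
        counts = Counter(window)
        tmac += counts["T"] - counts["A"]  # cumulative Xi
        gmcc += counts["G"] - counts["C"]  # cumulative Yi
        rows.append((start + len(window) // 2, tmac, gmcc))
    return rows


SEQ = "GTCTGGTGTCTGGAGTTCCTGGGTCTTGAGACCACAGGACCCACCAGGGACCCAGGACCC"
rows = skew_table(SEQ, 10)

# Write to an in-memory buffer; a real run would open a file instead.
buffer = io.StringIO()
writer = csv.writer(buffer, delimiter="\t")
writer.writerow(["Center", "TmAc", "GmCc"])
writer.writerows(rows)
print(buffer.getvalue())
```

Plotting Center against TmAc gives the cumulative TA skew, Center against GmCc gives the cumulative GC skew, and TmAc against GmCc gives the DNA walk of section 1.2.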
 

Figure 6: Cumulative TA-skew analysis of the sequence of Borrelia burgdorferi

Figure 7: Cumulative GC-skew analysis of the sequence of Borrelia burgdorferi


 
Lobry, J.R. (1996) A simple vectorial representation of DNA sequences for the detection of replication origins in bacteria. Biochimie, 78, 323-326.
 
Lobry, J.R. (1999) Genomic landscapes. Microbiology Today, 26, 164-165.
 



Copyright 2001, IGBM and Université de Lausanne