Methods
Introduction
A DNA walk of a genome shows how the frequency of each nucleotide of a pairing nucleotide couple changes locally along the sequence. The analysis measures the local distribution of Gs within the GC content and of Ts within the TA content. Lobry was the first to propose this analysis (1996, 1999). Two complementary representations can be derived from the DNA walk: the cumulative TA-skew and the cumulative GC-skew analyses.
Aim: By reading this description of the algorithm, a reader not trained in genomics should be able to redraw our graphs, using the basic genometric data file that is posted on our web resource for each organism as a zip file (.zip).
1) DNA walk
1.1) Drawing a DNA walk by reading a sequence file nucleotide by nucleotide.
A DNA walk is drawn by assigning a direction to each nucleotide. We propose the following assignment, slightly different from Lobry's (Lobry, 1999): T, C, A, and G correspond to the E(ast), S(outh), W(est), and N(orth) directions, respectively. Reading the sequence nucleotide by nucleotide and following this rule, a path emerges on the graph (Figure 1).
Figure 1: DNA walk of the sequence
GTCTGGTGTCTGGAGTTCCTGGGTCTTGAGACCACAGGACCCACCAGGGACCCAGGACCC
Starting from the bottom left (bold blue line), the curve ends at the bottom left (pink line).
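The rule of section 1.1 can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the names `STEPS`, `dna_walk` and `FIGURE_1` are ours, and symbols other than A, C, G and T are assumed to leave the pointer where it is.

```python
# Direction assigned to each nucleotide in our system (section 1.1):
# T -> East, C -> South, A -> West, G -> North.
STEPS = {"T": (1, 0), "C": (0, -1), "A": (-1, 0), "G": (0, 1)}


def dna_walk(sequence):
    """Return the list of (x, y) points visited, starting at the origin."""
    x, y = 0, 0
    points = [(x, y)]
    for base in sequence.upper():
        # Assumption: any other symbol (for example N) does not move the pointer.
        dx, dy = STEPS.get(base, (0, 0))
        x, y = x + dx, y + dy
        points.append((x, y))
    return points


# Sequence of Figure 1.
FIGURE_1 = "GTCTGGTGTCTGGAGTTCCTGGGTCTTGAGACCACAGGACCCACCAGGGACCCAGGACCC"

path = dna_walk(FIGURE_1)
print(path[:4])   # [(0, 0), (0, 1), (1, 1), (1, 0)]
print(path[-1])   # (0, 0)
```

The sequence of Figure 1 contains 19 Gs, 19 Cs, 11 Ts and 11 As, so the walk returns to its starting point, which agrees with the caption of Figure 1.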
1.2) Drawing a DNA walk by slicing a sequence file into small windows
A quick way to draw this kind of graph, suggested by Lobry (1996), is to cut the genome into windows of equal length.
Figure 2: DNA walk of the same sequence as the one presented in Figure 1: GTCTGGTGTCTGGAGTTCCTGGGTCTTGAGACCACAGGACCCACCAGGGACCCAGGACCC
The sequence was sliced into 5-nucleotide windows. Only the fifth nucleotide of each window is plotted. The mean values of each window can also be used.
Comment: this method is not as precise as the first one, but it can be used with spreadsheet software without affecting the final resolution of the curve at the genome level.
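As a rough sketch of what Figure 2 shows, the walk can be recorded only at the end of each window. We assume here that "only the fifth nucleotide per window is plotted" means that the pointer position is kept after every fifth nucleotide; the function name `sampled_walk` is hypothetical.

```python
STEPS = {"T": (1, 0), "C": (0, -1), "A": (-1, 0), "G": (0, 1)}


def sampled_walk(sequence, window=5):
    """Walk positions kept only at the end of each window (and at the very end)."""
    seq = sequence.upper()
    n = len(seq)
    x, y = 0, 0
    points = [(0, 0)]
    for i, base in enumerate(seq, start=1):
        dx, dy = STEPS.get(base, (0, 0))
        x, y = x + dx, y + dy
        # Keep the point after a complete window, or after a shorter last window.
        if i % window == 0 or i == n:
            points.append((x, y))
    return points


FIGURE_1 = "GTCTGGTGTCTGGAGTTCCTGGGTCTTGAGACCACAGGACCCACCAGGGACCCAGGACCC"
sampled = sampled_walk(FIGURE_1, window=5)
print(len(sampled))   # 13 points: the origin plus one per window
print(sampled[:3])    # [(0, 0), (2, 1), (4, 2)]
```

The sampled points lie on the full walk, so the overall shape is preserved while the number of points to plot is divided by the window size.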
1.2.1) The genome is cut into n windows W of equal size (the last window being smaller than or equal to the others).
- W1
- W2
- W3
- ...
- ...
- Wn-1
- Wn
1.2.2) In each of these windows a count for each nucleotide is performed: cA, cC, cG, and cT respectively.
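| Window | cA | cC | cG | cT |
| --- | --- | --- | --- | --- |
| W1 | cA1 | cC1 | cG1 | cT1 |
| W2 | cA2 | cC2 | cG2 | cT2 |
| W3 | cA3 | cC3 | cG3 | cT3 |
| ... | ... | ... | ... | ... |
| Wn-1 | cAn-1 | cCn-1 | cGn-1 | cTn-1 |
| Wn | cAn | cCn | cGn | cTn |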
- Example: Mycoplasma genitalium genome (download the compressed text file), cut into windows of 1000 nucleotides.
- (Mycoplasma genitalium G37 complete genome, L43967.1, 580074 bp, window: 1000 bp).
| Position of the window center (nt) | cA | cC | cG | cT |
| --- | --- | --- | --- | --- |
| 500 | 453 | 93 | 86 | 368 |
| 1500 | 400 | 120 | 133 | 347 |
| 2500 | 374 | 122 | 164 | 340 |
| 3500 | 345 | 145 | 200 | 310 |
| ... | ... | ... | ... | ... |
| 578500 | 313 | 138 | 141 | 408 |
| 579500 | 318 | 149 | 145 | 388 |
| 580037 | 33 | 8 | 4 | 29 |
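Steps 1.2.1 and 1.2.2 could be sketched as follows. The function name `window_counts` is hypothetical, and the position of the window center (zero-based start of the window plus half its length) is an assumption chosen because it reproduces the centers 500, 1500, ... and 580037 listed for Mycoplasma genitalium.

```python
from collections import Counter


def window_counts(sequence, window=1000):
    """Return one row (center, cA, cC, cG, cT) per window of the sequence."""
    seq = sequence.upper()
    rows = []
    for start in range(0, len(seq), window):
        chunk = seq[start:start + window]   # the last chunk may be shorter
        counts = Counter(chunk)
        # Assumed convention: zero-based start plus half the window length.
        center = start + len(chunk) // 2
        rows.append((center, counts["A"], counts["C"], counts["G"], counts["T"]))
    return rows


# Small demonstration on the sequence of Figure 1, cut into 25-nucleotide windows.
FIGURE_1 = "GTCTGGTGTCTGGAGTTCCTGGGTCTTGAGACCACAGGACCCACCAGGGACCCAGGACCC"
for row in window_counts(FIGURE_1, window=25):
    print(row)
# (12, 1, 5, 10, 9)
# (37, 8, 8, 7, 2)
# (55, 2, 6, 2, 0)
```

With a genome read from a file, the same function would be called with `window=1000` to obtain rows comparable to the Mycoplasma genitalium example.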
1.2.3) Two calculations are performed for each window: xi and yi are determined.
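| Window | cA | cC | cG | cT | xi | yi |
| --- | --- | --- | --- | --- | --- | --- |
| W1 | cA1 | cC1 | cG1 | cT1 | x1=cT1-cA1 | y1=cG1-cC1 |
| W2 | cA2 | cC2 | cG2 | cT2 | x2=cT2-cA2 | y2=cG2-cC2 |
| ... | ... | ... | ... | ... | ... | ... |
| Wn | cAn | cCn | cGn | cTn | xn=cTn-cAn | yn=cGn-cCn |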
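As a worked example of step 1.2.3, the sketch below applies xi = cTi - cAi and yi = cGi - cCi to the first four windows of the Mycoplasma genitalium example of 1.2.2. The function name `window_differences` is ours.

```python
# First four windows of the Mycoplasma genitalium example:
# (window center, cA, cC, cG, cT)
rows = [
    (500, 453, 93, 86, 368),
    (1500, 400, 120, 133, 347),
    (2500, 374, 122, 164, 340),
    (3500, 345, 145, 200, 310),
]


def window_differences(rows):
    """Return (xi, yi) = (cT - cA, cG - cC) for each window."""
    return [(cT - cA, cG - cC) for _center, cA, cC, cG, cT in rows]


xy = window_differences(rows)
print(xy)   # [(-85, -7), (-53, 13), (-34, 42), (-35, 55)]
```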
1.2.4) A cumulative curve is calculated: Xi and Yi are determined.
| Window | xi | yi | Xi | Yi |
| --- | --- | --- | --- | --- |
| W1 ... | x1=cT1-cA1 | y1=cG1-cC1 | X1=sum(x1 to x1) | Y1=sum(y1 to y1) |
| W2 ... | x2=cT2-cA2 | y2=cG2-cC2 | X2=sum(x1 to x2) | Y2=sum(y1 to y2) |
| ... | ... | ... | ... | ... |
| Wn ... | xn=cTn-cAn | yn=cGn-cCn | Xn=sum(x1 to xn) | Yn=sum(y1 to yn) |
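The running sums of step 1.2.4 can be obtained with `itertools.accumulate` from the Python standard library. The sketch below starts from the (xi, yi) values of the first four Mycoplasma genitalium windows (cT - cA and cG - cC computed from the counts given in 1.2.2); the function name `cumulative` is hypothetical.

```python
from itertools import accumulate

# (xi, yi) for the first four windows of the Mycoplasma genitalium example.
xy = [(-85, -7), (-53, 13), (-34, 42), (-35, 55)]


def cumulative(xy):
    """Return the points (Xi, Yi), where Xi = x1 + ... + xi and Yi = y1 + ... + yi."""
    X = accumulate(x for x, _y in xy)
    Y = accumulate(y for _x, y in xy)
    return list(zip(X, Y))


XY = cumulative(xy)
print(XY)   # [(-85, -7), (-138, 6), (-172, 48), (-207, 103)]
```

Plotting these points in order, with Xi on the x axis and Yi on the y axis, gives the windowed DNA walk.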
1.2.5) The cumulative curve is drawn by plotting the points (Xi, Yi) in the order of the data, from i = 1 to n, and connecting them.
1.2.6) Following the description above, the DNA walk is labelled as follows on our graphs, which were generated by the "nucleotide by nucleotide" method:
- TmAc vs GmCc, meaning that the cumulative number of Ts minus the number of As is plotted on the x axis, against the cumulative number of Gs minus the number of Cs on the y axis.
Lobry chose the following assignment: T, G, A, and C correspond to the E, S, W, and N directions, respectively. Lobry's outputs are therefore similar to ours (mirror images along the X axis). Compare the DNA walk of Borrelia burgdorferi in Lobry's drawing system and in ours.
Figure 3: DNA walk of Borrelia burgdorferi (panels: Lobry's system and our system)
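The mirror relationship between the two drawing systems can be checked with a short sketch. The dictionaries `OURS` and `LOBRY` encode the two assignments given in the text; the function name `walk` and the test sequence (the first ten nucleotides of Figure 1) are our own choices.

```python
# Our system: T -> E, C -> S, A -> W, G -> N.
OURS = {"T": (1, 0), "C": (0, -1), "A": (-1, 0), "G": (0, 1)}
# Lobry's system: T -> E, G -> S, A -> W, C -> N.
LOBRY = {"T": (1, 0), "G": (0, -1), "A": (-1, 0), "C": (0, 1)}


def walk(sequence, steps):
    """Return the (x, y) points of a DNA walk for a given direction assignment."""
    x, y = 0, 0
    points = [(0, 0)]
    for base in sequence.upper():
        dx, dy = steps.get(base, (0, 0))
        x, y = x + dx, y + dy
        points.append((x, y))
    return points


seq = "GTCTGGTGTC"
ours = walk(seq, OURS)
lobry = walk(seq, LOBRY)

# Each point of Lobry's walk is the reflection of ours about the X axis.
print(all((x, -y) == p for (x, y), p in zip(ours, lobry)))   # True
```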
2) The cumulative TA- and the GC-skew analyses.
2.1) Drawing a cumulative TA- or a GC-skew analysis by reading a sequence file nucleotide by nucleotide.
Cumulative TA-skew analysis: assign to each nucleotide the following direction: A, T, C, and G correspond to the S, N, nd (no direction), and nd directions, respectively. On the graph, after each nucleotide is read, the pointer moves one step eastward. If an A or a T is read, a further step is added, southward or northward, respectively.
Figure 4: Cumulative TA-skew analysis of the sequence of Figure 1
Cumulative GC-skew analysis: assign to each nucleotide the following direction: A, T, C, and G correspond to the nd, nd, S, and N directions, respectively. On the graph, after each nucleotide is read, the pointer moves one step eastward. If a C or a G is read, a further step is added, southward or northward, respectively.
Figure 5: Cumulative GC-skew analysis of the sequence of Figure 1
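Both nucleotide-by-nucleotide skew curves follow the same rule, so a single function can sketch them. The name `cumulative_skew` and its `up` and `down` parameters are hypothetical; the eastward step is represented by the position in the sequence.

```python
def cumulative_skew(sequence, up, down):
    """Return points (position, skew): one step east per nucleotide,
    plus one step north for `up` or one step south for `down`."""
    skew = 0
    points = [(0, 0)]
    for position, base in enumerate(sequence.upper(), start=1):
        if base == up:
            skew += 1
        elif base == down:
            skew -= 1
        points.append((position, skew))
    return points


FIGURE_1 = "GTCTGGTGTCTGGAGTTCCTGGGTCTTGAGACCACAGGACCCACCAGGGACCCAGGACCC"

ta_skew = cumulative_skew(FIGURE_1, up="T", down="A")   # as in Figure 4
gc_skew = cumulative_skew(FIGURE_1, up="G", down="C")   # as in Figure 5

print(ta_skew[30], ta_skew[-1])   # (30, 9) (60, 0)
print(gc_skew[30], gc_skew[-1])   # (30, 7) (60, 0)
```

For the sequence of Figure 1, both curves rise over the first half and come back to zero at the end, because the sequence contains as many Ts as As and as many Gs as Cs.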
2.2.1) Drawing a cumulative TA-skew analysis by slicing a sequence file into small windows
Following the notation developed in 1.2, assign to each window center cwi the value Xi.
| Window center | Xi |
| --- | --- |
| cw1 | X1 |
| cw2 | X2 |
| ... | ... |
| cwn | Xn |
A cumulative curve is drawn by respecting the order of the data, from cw1 to cwn, and by assigning to each cwi the value of Xi.
2.2.2) Drawing a cumulative GC-skew analysis by slicing a sequence file into small windows
The GC-skew analysis is similar to the TA-skew analysis: replace X by Y.
| Window center | Yi |
| --- | --- |
| cw1 | Y1 |
| cw2 | Y2 |
| ... | ... |
| cwn | Yn |
- On our graphs, generated by the "nucleotide by nucleotide" method, the cumulative TA skew is labelled Center vs. TmAc and the cumulative GC skew is labelled, with the same notation, Center vs. GmCc.
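The windowed skew curves of 2.2.1 and 2.2.2 pair each window center with the cumulative sums Xi and Yi. A minimal sketch, using the first four windows of the Mycoplasma genitalium example of 1.2.2 and a hypothetical function name `windowed_skews`:

```python
from itertools import accumulate

# First four windows of the Mycoplasma genitalium example:
# (window center, cA, cC, cG, cT)
rows = [
    (500, 453, 93, 86, 368),
    (1500, 400, 120, 133, 347),
    (2500, 374, 122, 164, 340),
    (3500, 345, 145, 200, 310),
]


def windowed_skews(rows):
    """Return two curves: [(cwi, Xi)] for the TA skew and [(cwi, Yi)] for the GC skew."""
    centers = [row[0] for row in rows]
    X = accumulate(cT - cA for _cw, cA, _cC, _cG, cT in rows)
    Y = accumulate(cG - cC for _cw, _cA, cC, cG, _cT in rows)
    return list(zip(centers, X)), list(zip(centers, Y))


ta_curve, gc_curve = windowed_skews(rows)
print(ta_curve)   # [(500, -85), (1500, -138), (2500, -172), (3500, -207)]
print(gc_curve)   # [(500, -7), (1500, 6), (2500, 48), (3500, 103)]
```

Each list can be pasted into a spreadsheet or passed to any plotting tool, with the window center on the x axis.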
Figure 6: Cumulative TA-skew analysis of the sequence of Borrelia burgdorferi
Figure 7: Cumulative GC-skew analysis of the sequence of Borrelia burgdorferi
- Lobry, J.R. (1996) A simple vectorial representation of DNA sequences for the detection of replication origins in bacteria. Biochimie, 78, 323-326.
- Lobry, J.R. (1999) Genomic landscapes. Microbiology Today, 26, 164-165.
Copyright 2001, IGBM et Université de Lausanne