Top-bottom differences in retinal vascular properties

Differences in tortuosity between top and bottom retinal vasculature

Jonathan NICOLET-DIT_FÉLIX, Bertille BOURG, Louis HEAU

Supervisor: Sacha BORS

To find out whether tortuosity differs between the top and bottom of the retina, we first built a method to assign blood vessel segments to each half, then computed their tortuosity and found a very small positive asymmetry (top slightly more tortuous than bottom).

Presentation of the project: why is it important/relevant?

Retinal fundus images are pictures of the back of the eye taken by a specialised camera through the pupil. They allow us to visualise the vasculature of the retina. This is a convenient, non-invasive way to gain insight into the body's vascular features. The ultimate goal is to use these pictures as a proxy for diseases linked to the vasculature: ocular diseases (diabetic retinopathy, glaucoma), but also more general cardiovascular diseases such as stroke, coronary heart disease or hypertension.

To link diseases to the retinal vasculature, we need to extract features from the fundus images, such as the number of bifurcations, vessel length, or how the diameter changes along a vessel. One feature that has recently drawn attention is tortuosity: how sinuous a blood vessel segment is compared to the straight-line distance between its two ends. Tortuosity is worth studying because of its potential links with cardiovascular diseases.
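As an illustration of the definition above, here is a minimal sketch of one common tortuosity measure, the ratio of path length to straight-line distance between the endpoints. It is an assumption that this matches the measure computed by vascX, which may use a different definition; the function name `arc_chord_tortuosity` and the sample points are hypothetical.

```python
import math


def arc_chord_tortuosity(points):
    """Path length along the segment divided by the distance between its ends.

    `points` is an ordered list of (x, y) coordinates along one vessel segment.
    A perfectly straight segment gives 1.0; more sinuous segments give more.
    """
    if len(points) < 2:
        raise ValueError("need at least two points")
    arc = sum(math.dist(p, q) for p, q in zip(points, points[1:]))
    chord = math.dist(points[0], points[-1])
    if chord == 0:
        raise ValueError("endpoints coincide; the ratio is undefined")
    return arc / chord


# Hypothetical toy segments, not real vessel data.
straight = [(0, 0), (1, 0), (2, 0)]
bent = [(0, 0), (3, 4), (6, 0)]

print(arc_chord_tortuosity(straight))
print(arc_chord_tortuosity(bent))
```

With this definition, any detour away from the straight line pushes the value above 1, which is why it is a natural way to quantify "sinuous".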

What is unknown is whether tortuosity differs between the top and the bottom of the retina, for arteries or for veins.

Basic concepts

When we look at a fundus image, what is striking is that there are two large vascular networks, one above and one below the line joining the fovea and the optic disc. Arteries and veins form two stacked layers with different biological functions, so we analyse them separately. That leaves the two halves: are they different, especially in tortuosity? Should we compute tortuosity separately for each half, or can we use a single "global" tortuosity?

The dataset we use for the analysis is the UK Biobank, which contains, for each participant, their genome sequence and a very large number of traits, including fundus images of their retinas. This gives us a large number of images. The images can be analysed automatically with vascX (LWNET), a machine-learning-based Python package under development that extracts features from retinal fundus images. The extracted features are stored in a retina object, which holds, among other things, a representation of the vasculature as a graph with nodes and edges, and the positions of landmarks such as the optic disc (the root of all the vasculature supplying the retinal cells) and the fovea (the focal point of the image on the retina, where cone density is highest).

Our research goals

We want to compare the tortuosity of each vessel layer (veins and arteries) between the top hemisphere and the bottom hemisphere. To do this, we first need a method in Python that separates the top and bottom components. Once they are separated, we compute the tortuosity of each hemisphere and compare it to the global tortuosity. We want to see whether there is a difference, that is, an asymmetry. We then calculate an asymmetry coefficient and check whether it correlates with any other retinal vasculature trait. If it correlates with a trait we already have, there is no need to calculate it, because we already have a good estimator.

Methods

Before computing the tortuosity, we need a method to separate the vascular elements between the top and the bottom. For this, we use the graph representation in the 'retina' object output by vascX: retina.arteries.graph or retina.veins.graph (with .nodes, .edges, etc.; see https://networkx.org/). This lets us use the graph functions implemented in NetworkX.

We considered three methods for assigning the connected components to a side. The first is to cut the picture in half: everything that falls above the midline is "top" and everything below is "bottom". This is not very precise, because the images can be skewed.

A more precise separation uses anchor points within the retina graph, such as the positions of the fovea and the optic disc. We can draw a line through them and again assign everything above it to the top and everything below it to the bottom. But a problem remains: vessels are intertwined, so parts of a vessel from one side can cross the line and be wrongly attributed to the other.


To account for this, we took the connected components of the graphs and calculated their barycentres, that is, the centres of the clusters. This works like a majority vote: instead of deciding point by point whether a node is above or below the line, we decide whether each cluster centre is above or below the line, then assign all points of the cluster to the side of its centre, which is more likely to be accurate. This is the method we used for the final calculation.
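The barycentre approach can be sketched as follows. This is a minimal illustration and not the project's actual code. It assumes an undirected NetworkX graph whose nodes carry a hypothetical `pos` attribute holding (x, y) coordinates with y pointing upwards; in image coordinates, where y usually points downwards, the sign test would have to be flipped. How vascX actually stores node positions is not stated here.

```python
import networkx as nx


def split_top_bottom(graph, disc, fovea):
    """Assign each connected component to 'top' or 'bottom' by its barycentre.

    `disc` and `fovea` are (x, y) landmarks defining the separating line.
    Node coordinates are read from the hypothetical node attribute "pos".
    """
    dx, dy = fovea[0] - disc[0], fovea[1] - disc[1]
    if dx < 0:
        # Orient the line left-to-right so that "above" always has the same
        # sign, whichever side of the disc the fovea is on.
        dx, dy = -dx, -dy
    sides = {"top": [], "bottom": []}
    for component in nx.connected_components(graph):
        xs = [graph.nodes[n]["pos"][0] for n in component]
        ys = [graph.nodes[n]["pos"][1] for n in component]
        bx, by = sum(xs) / len(xs), sum(ys) / len(ys)
        # The sign of the 2D cross product tells on which side of the line
        # the barycentre lies (positive = above, with y pointing upwards).
        cross = dx * (by - disc[1]) - dy * (bx - disc[0])
        sides["top" if cross > 0 else "bottom"].append(set(component))
    return sides


# Toy graph with two components; node 2 dips below the line, but its
# component is still assigned to the top because the barycentre is above.
g = nx.Graph()
g.add_nodes_from([
    (0, {"pos": (1, 1)}), (1, {"pos": (2, 3)}), (2, {"pos": (3, -0.5)}),
    (3, {"pos": (1, -1)}), (4, {"pos": (2, -2)}),
])
g.add_edges_from([(0, 1), (1, 2), (3, 4)])

sides = split_top_bottom(g, disc=(0, 0), fovea=(4, 0))
print(sides)
```

Node 2 in the toy graph shows why the whole component is classified at once: a point-by-point test would have put it in the bottom half.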

Once we have the list of segments belonging to the top or the bottom, for both arteries and veins, we can compute the tortuosity of each segment and extract summary values such as the median tortuosity for top/bottom and for veins/arteries. We also calculate the asymmetry coefficient with the following formula: \tau = \frac{tort_{top} - tort_{bottom}}{tort_{top} + tort_{bottom}}. The absolute value is sometimes used instead, but then the direction of a potential asymmetry (whether top > bottom or the reverse) is lost. As a control, to check whether the difference between top and bottom is real, we shuffled the list of per-segment tortuosities, randomly assigning each segment to 'top' or 'bottom'. After producing a dataframe of these values, we switched to R for the statistical analysis.
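The asymmetry coefficient and the shuffling control can be sketched as below. This is an illustration under assumptions: the regional tortuosity is taken to be the median over segments, and the shuffle keeps the original number of segments in each half. The text does not specify either detail, and the function names and sample values are hypothetical.

```python
import random
from statistics import median


def asymmetry(top, bottom):
    """(tort_top - tort_bottom) / (tort_top + tort_bottom), using medians."""
    t, b = median(top), median(bottom)
    return (t - b) / (t + b)


def shuffled_asymmetry(top, bottom, rng):
    """Control: pool all segments, shuffle, and re-split into two groups
    of the original sizes before computing the coefficient."""
    pooled = list(top) + list(bottom)
    rng.shuffle(pooled)
    return asymmetry(pooled[:len(top)], pooled[len(top):])


# Hypothetical per-segment tortuosities for one retina (not real data).
top = [1.10, 1.20, 1.30]
bottom = [1.00, 1.05, 1.10]

tau = asymmetry(top, bottom)
rng = random.Random(0)  # fixed seed so the control is reproducible
tau_random = shuffled_asymmetry(top, bottom, rng)
print(tau, tau_random)
```

A positive coefficient means the top half is more tortuous than the bottom half, and the coefficient always lies between -1 and 1 for positive tortuosities.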


Results

One way to check whether the top and bottom tortuosities resemble the global one is to fit linear regression models between the global tortuosity for arteries/veins and each regional tortuosity (top/bottom). We obtain coefficients of determination (R²) in the 0.6 to 0.75 range. This indicates a correlation, but not a perfect one: each half of the retina carries information that is specific to it and that the global tortuosity cannot fully explain.

An alternative hypothesis is that R² is below 1 simply because each half contains only half of the data. If tortuosity were equivalent between top and bottom, however, such a "representative half" would be a much better estimator of the global value. To distinguish the two explanations, we randomised the assignment of segments to top/bottom and recomputed R². If the values stay the same, the low R² comes from the missing half of the data; if they rise, the information in the real top/bottom halves is specific to each half. In the 500-retina sample, R² becomes considerably higher after randomisation than before. So a random half of the data explains the global tortuosity much better than a real half does. The fact that R² does not go higher still is probably because the sample is not large enough (we ran this on the 500 retinas, not the 10,000). In any case, a good part of the global tortuosity cannot be explained by the top alone or by the bottom alone, because the information they contain is slightly different.
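For reference, the coefficient of determination of a simple linear regression can be computed as in the sketch below. The actual analysis was done in R; this pure-Python version only illustrates the quantity being compared, and the function name and data points are made up for the example.

```python
def r_squared(x, y):
    """R² of the least-squares line y = intercept + slope * x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    syy = sum((b - mean_y) ** 2 for b in y)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # Residual sum of squares around the fitted line.
    ss_res = sum((b - (intercept + slope * a)) ** 2 for a, b in zip(x, y))
    return 1 - ss_res / syy


# Hypothetical values: x = global tortuosity, y = regional tortuosity.
x = [1, 2, 3, 4]
y = [1, 3, 2, 4]

print(r_squared(x, y))
```

In simple linear regression, R² equals the squared correlation between the two variables, so it does not depend on which one is treated as the predictor.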

The proportion of the variance of top/bottom tortuosity (dependent variable Y) explained by global tortuosity (independent variable X) is between 60% and 70%. This means that the global tortuosity of the veins/arteries explains a large proportion (62%) of the variance observed in the tortuosity of the veins/arteries in the upper hemisphere, but not all of it (100%).

The regional measure is therefore not fully redundant with the global one, and it is worth considering. Since there is a difference between top and bottom tortuosity, we summarise this information in a new feature: asymmetry. How does tortuosity differ between the bottom and the top of the retinal vasculature? We looked at the distribution of the asymmetries per retina, for veins and for arteries. To see whether the asymmetry is significant, we compared it with the distribution obtained when segments are attributed to top/bottom at random.

For reasons of computing power, we first took a subset of 500 retinas from the original data. In this subset, the two distributions are not significantly different from each other: the distribution of asymmetries is what we would get if segments were randomly attributed to top/bottom, so there is no significant difference between top and bottom tortuosities. Suspecting that the difference was too small to detect at this sample size, we repeated the analysis with 10,000 retinas. There we find a highly significant difference between the two distributions. The random one is, as expected, centred on zero, while for both veins and arteries the observed distribution is shifted towards positive values. Given the sign convention of the asymmetry coefficient (top minus bottom in the numerator), this means more sinuous vessels at the top.

The effect is thus significant with 10,000 retinas but not with 500: larger samples provide more power to detect smaller effects. Is it *practically* significant, though? An effect can be statistically significant without being practically or biologically significant if it is too small to matter. When an effect requires an enormous sample to be detected, its practical relevance is open to question. The differences are so small that errors made by the software may be enough to bias a single observation and make its classification unreliable; such errors cancel out over a very large number of observations, which is what we have. So there is an effect, but we have to keep in mind that it is very small.
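The dependence on sample size can be illustrated with a one-sample z statistic, used here as a simplified stand-in for the comparison of distributions actually performed. The effect size and standard deviation below are invented purely for the arithmetic and are not results from this project.

```python
import math


def z_score(mean_effect, sd, n):
    """z statistic for a mean differing from zero: effect / standard error."""
    standard_error = sd / math.sqrt(n)
    return mean_effect / standard_error


# Hypothetical numbers: a tiny mean asymmetry with a much larger spread.
effect, sd = 0.002, 0.05

z_500 = z_score(effect, sd, 500)
z_10000 = z_score(effect, sd, 10_000)
print(z_500, z_10000)
```

With these made-up values, the same effect falls below the usual 1.96 threshold at n = 500 and well above it at n = 10,000, because the standard error shrinks with the square root of n while the effect itself stays the same size.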

We now have a new value, asymmetry, which appears to encode the difference between top and bottom tortuosity. To see whether it is really useful, we need to know whether this new information correlates with any existing trait. The correlation heatmap looked sound and asymmetry did not seem correlated with anything else, so it appears to be a useful new parameter.

Next steps of such an analysis

Asymmetry is a new variable, so the first step is to test its heritability using a GWAS. GWAS stands for "Genome-Wide Association Study" and is used to link a trait to a gene or a combination of genes. This was done, and the preliminary results suggest that the asymmetry trait does not have a large genetic component. We would therefore need to test the role of environmental exposure. The UK Biobank dataset contains additional data such as sun exposure or diseases (skin cancer, cardiovascular disease), and these correlations have yet to be tested. We have also been told that the top repeatedly shows higher tortuosity, so we could investigate how the environment contributes to that.


Feedback

This project was really interesting. It allowed us to work on something concrete, which contrasted with the other courses, which were much more theoretical. It also enabled us to develop new skills and understand what real bioinformatics research is all about. The only problem we have identified is that the level required to complete this project is much higher than the level acquired during the bachelor's degree. It might be interesting to have one or two courses on the functionalities used.

Additional files

File:Slides lh bb jndf.pdf