-
Notifications
You must be signed in to change notification settings - Fork 0
/
Copy pathpedigree_plotting_in_R.Rmd
256 lines (169 loc) · 9.06 KB
/
pedigree_plotting_in_R.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
---
title: "Plotting Pedigrees in R is Surprisingly Easy (even without ggplot)"
output: html_document
---
```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE)
```
<body>
<p>This is my first attempt at a tutorial, I don't know if anyone will find it useful, but perhaps someone will stumble across it and find it interesting.</p>
<p>I've always been a little overwhelmed by packages that plot pedigrees, they always seem to have hundreds of lines of code and it always seemed like a very involved process to be able to go from a huge pedigree to such a complicated diagram.</p>
<p>However, as I've become more familiar with R the the mystery behind plotting a pedigree has slowly went away. The truth is that plotting a pedigree is easy, really easy.</p>
<p>I'd argue that in terms of quickly visualising your data, this will be as quick as loading up a package and getting your pedigree into the correct format.</p>
<p>Required data and additional files can be found at:
<a href="https://github.com/simplydch/Pedigree-Plotting-in-R">github.com/simplydch/Pedigree-Plotting-in-R</a></p>
<p>Please feel free to get in touch with any questions or comments.</p>
<h3><u>Plotting a pedigree</u></h3>
<p>To demonstrated I have simulated a small pedigree consisting of 136 individuals
in 10 cohorts:<p>
```{r}
load("short_ped.RData")
head(short_ped)
dim(short_ped)
```
In order to plot the pedigree the first step is to set up the coordinates of the points for each individual.
```{r}
## We need to take the Cohorts from the pedigree to provide our y-axis values
y_coor <- short_ped[, "Cohort"]
## We then need x-coordinates that spread each set of cohorts out evenly across the x-axis
## First we can use table to find how many individuals are in each cohort
## For each of these values we generate an evenly spaced sequence between 1 and 10
## The output of sapply is a list, so do.call is used to join the values
## into a single vector
y_counts <- unname(table(y_coor))
x_coor <- do.call(c, sapply(y_counts, function(x) seq(1, 10, length = x)))
# And we can have our first attempt at plotting
plot(x_coor, y_coor)
```
<p>So we have every individual represented as a dot in the line that relates to their cohort.</p>
<p>Now we need to join those dots, using the under-appreciated, but incredibly useful match function!</p>
<p>This allows us to find the row reference of the sire that relates to each id.
If we re-arrange the order of the plotted x and y values using these references that's all the information needed to plot all the sire links using the segments function. The segments function draws lines between two sets of coordinates </p>
```{r}
## First we need to all pull the ids, sires and dams from the pedigree.
ids <- short_ped[, "ID"]
sires <- short_ped[, "SireID"]
dams <- short_ped[, "MumID"]
# And then get row references, and then coordinates for the sires.
sire_ref <- match(sires, ids)
x_sire <- x_coor[sire_ref]
y_sire <- y_coor[sire_ref]
# Segments creates lines between the points in the provided x and y - values
plot(x_coor, y_coor)
segments(x_coor, y_coor, x_sire, y_sire)
```
<p>This looks good, but obviously isn't quite right. Why are there individuals at the bottom of the plot that have no sire links? Well, out plot is currently upside down. Those must be the female founders!<p>
```{r}
# Lets rearrange the plot reversing the y-values.
plot(x_coor, y_coor, ylim = c(max(y_coor), min(y_coor)))
# And add those sire links again, this time making them orange.
segments(x_coor, y_coor, x_sire, y_sire, col = "orange")
# And we can then also add in the dams using blue lines
dam_ref <- match(dams, ids)
x_dam <- x_coor[dam_ref]
y_dam <- y_coor[dam_ref]
segments(x_coor, y_coor, x_dam, y_dam, col = "blue")
```
<p>And that's it. We've plotted the pedigree, using different coloured lines to represent sires and dams. </p>
<p>We probably still want to change a few settings to make the plot neater.</p>
```{r}
par(mar = c(0.5, 4, 0.5, 0.5))
plot(x_coor, y_coor, ylim = c(max(y_coor), min(y_coor)), col = rgb(0, 0, 0, 0.5), pch = ".", cex = 1, frame.plot = FALSE, ylab = "", xlab = "", axes = FALSE)
## We can add in the Cohort information
axis(side = 2, labels = TRUE, tick = FALSE, las = 2, at = unique(y_coor))
segments(x_coor, y_coor, x_sire, y_sire, col = "orange")
segments(x_coor, y_coor, x_dam, y_dam, col = "blue")
```
<p>It really is that quick and easy. If you wish to only plot some links, for example the descendants of one individual, things become slightly more challenging. However, you have all your links already, you just need to figure out which ones to 'turn on'.</p>
<p>To demonstrate this next stage we can plot all the offspring of individual ID4.<p>
```{r}
# Lets start with the background pedigree, we can specify the colours as rgb
# and make them grey and transparent.
plot(x_coor, y_coor,
ylim = c(max(y_coor), min(y_coor)), col = rgb(0, 0, 0, 0.5), pch = ".", cex = 1,
frame.plot = FALSE, axes = F, ylab = "", xlab = "")
axis(side = 2, labels = TRUE, tick = FALSE, las = 2, at = unique(y_coor))
segments(x_coor, y_coor, x_sire, y_sire, col = rgb(0.9, 0.9, 0.9, 0.6))
segments(x_coor, y_coor, x_dam, y_dam, col = rgb(0.9, 0.9, 0.9, 0.6))
# Lets plot the focal id (in red) using points
id_plot <- which(ids %in% "ID4")
points(x_coor[id_plot], y_coor[id_plot], col = "red")
# And also plot the points representing his offspring, this time in black.
ID_wSire <- which(sire_ref %in% id_plot)
points(x_coor[ID_wSire], y_coor[ID_wSire])
# So to plot the links, lets just turn off all the links that aren't connected to individual ID4. Since he is a sire we only have to plot sire links
x_sire2 <- x_sire
y_sire2 <- y_sire
x_sire2[sire_ref != id_plot] <- NA
y_sire2[sire_ref != id_plot] <- NA
segments(x_coor, y_coor, x_sire2, y_sire2, col = "orange")
```
<p>The next stage is a bit harder still. How do we plot all descendants? Unsurprisingly it requires a few more steps and a little more thinking, but exactly same logic as before.</p>
```{r}
## Lets start again with a plain background
plot(x_coor, y_coor,
ylim = c(max(y_coor), min(y_coor)), col = rgb(0, 0, 0, 0.5), pch = ".", cex = 1,
frame.plot = FALSE, axes = F, ylab = "", xlab = ""
)
axis(side = 2, labels = TRUE, tick = FALSE, las = 2, at = unique(y_coor))
segments(x_coor, y_coor, x_sire, y_sire, col = rgb(0.9, 0.9, 0.9, 0.6))
segments(x_coor, y_coor, x_dam, y_dam, col = rgb(0.9, 0.9, 0.9, 0.6))
# The main hurdle is identifying all the descendants in the first place, so we'll use
# a simple loop
# We set up vectors to hold the parents each iteration and also all the descendants
# our focal individual is included in both. At each iteration of the loop we find all the
# individuals that are offspring of the individuals in the list of parents, and continue until there are no further offspring.
parents <- id_plot
desc_ref <- c(id_plot)
repeat{
offspring <- c(ids[sire_ref %in% parents], ids[dam_ref %in% parents])
off_ref <- which(ids %in% offspring)
if (length(offspring) == 0) {
break
}
desc_ref <- c(desc_ref, off_ref)
parents <- off_ref
}
#
# We have now got all the relevant individual references
desc_ref
# So again we can highlight the focal individual and all it's descendants
points(x_coor[id_plot], y_coor[id_plot], col = "red")
points(x_coor[desc_ref], y_coor[desc_ref])
# This time we are interested in plotting both sire and dam links, but only to other
# individuals in the list of descendants
x_sire2 <- x_sire
y_sire2 <- y_sire
x_sire2[!(sire_ref %in% desc_ref & ids %in% ids[desc_ref])] <- NA
y_sire2[!(sire_ref %in% desc_ref & ids %in% ids[desc_ref])] <- NA
segments(x_coor, y_coor, x_sire2, y_sire2, col = "orange")
x_dam2 <- x_dam
y_dam2 <- y_dam
x_dam2[!(dam_ref %in% desc_ref & ids %in% ids[desc_ref])] <- NA
y_dam2[!(dam_ref %in% desc_ref & ids %in% ids[desc_ref])] <- NA
segments(x_coor, y_coor, x_dam2, y_dam2, col = "blue")
```
<p>
In conclusion, I maybe exaggerated a little about how simple it is, but actually plotting a whole pedigree isn't at all complicated. The key challenge is when you are only interested in plotting some links and sub-setting the lines to just the ones that you require.</p>
<p>Once you understand the basic principle you can easily plot any set of relationships you like, and make a professional looking pedigree plot.</p>
```{r}
plot(x_coor, y_coor,
ylim = c(max(y_coor), min(y_coor)), col = rgb(0, 0, 0, 0.5), pch = ".", cex = 1,
frame.plot = FALSE, axes = F, ylab = "", xlab = ""
)
axis(side = 2, labels = TRUE, tick = FALSE, las = 2, at = unique(y_coor), cex = 0.8)
segments(x_coor, y_coor, x_sire, y_sire, col = rgb(0.9, 0.9, 0.9, 0.6))
segments(x_coor, y_coor, x_dam, y_dam, col = rgb(0.9, 0.9, 0.9, 0.6))
x_sire2 <- x_sire
y_sire2 <- y_sire
x_sire2[!(sire_ref %in% desc_ref & ids %in% ids[desc_ref])] <- NA
y_sire2[!(sire_ref %in% desc_ref & ids %in% ids[desc_ref])] <- NA
segments(x_coor, y_coor, x_sire2, y_sire2, col = "#f1a340")
x_dam2 <- x_dam
y_dam2 <- y_dam
x_dam2[!(dam_ref %in% desc_ref & ids %in% ids[desc_ref])] <- NA
y_dam2[!(dam_ref %in% desc_ref & ids %in% ids[desc_ref])] <- NA
segments(x_coor, y_coor, x_dam2, y_dam2, col = "#998ec3")
```
</body>
</html>