--- title: "From trio information to full families" author: "Emil M. Pedersen" date: "`r Sys.Date()`" output: html_document vignette: > %\VignetteIndexEntry{From trio information to full families} %\VignetteEngine{knitr::rmarkdown} %\VignetteEncoding{UTF-8} --- ```{r message=FALSE, warning=FALSE} library(LTFGRS) library(dplyr) library(igraph) library(kinship2) ``` *Note: This vignette was originally published with the LTFHPlus package [here](https://emilmip.github.io/LTFHPlus/articles/FromTrioToFamilies.html)* In this document, we will present how we can go from trio information to full families that can be used to calculate kinship matrices. By trio information, we specifically mean knowing the id of the child and the id of the child's mother and father. Kinship matrices are essential when estimating the liabilities with the `estimate_liability()` function of the package. This addition help with the process of identifying related individuals and subsequent construction of the kinship matrix. ### From trio information to graph The trio information can be used to create extended families manually by first identifying parents, grandparents, great-grandparents, etc.. From there, siblings, aunts and uncles, cousins, etc.. can also be identified. However, this is a tedious process and it is easy to miss family members. We have developed a function that can find all family member that are related of degree $n$ or closer that does not rely on the tedious process of identifying each family role manually. Below is an example data set of a family. It contains half-siblings, half-aunts and -uncles, as well as cousins and individuals that have married into the family. An example is *mgm* meaning *maternal grandmother*, *hspaunt* meaning *paternal half-aunt*, or *hsmuncleW* meaning *maternal half-uncle's wife*. We construct the example dataset with tribble because it enables row-wise construction, which mirrors how trio information is typically stored and helps readability in small example datasets. ```{r} # Setting seed: set.seed(555) # constructing family (trio) data family = tribble( ~id, ~momcol, ~dadcol, "pid", "mom", "dad", "sib", "mom", "dad", "mhs", "mom", "dad2", "phs", "mom2", "dad", "mom", "mgm", "mgf", "dad", "pgm", "pgf", "dad2", "pgm2", "pgf2", "paunt", "pgm", "pgf", "pacousin", "paunt", "pauntH", "hspaunt", "pgm", "newpgf", "hspacousin", "hspaunt", "hspauntH", "puncle", "pgm", "pgf", "pucousin", "puncleW", "puncle", "maunt", "mgm", "mgf", "macousin", "maunt", "mauntH", "hsmuncle", "newmgm", "mgf", "hsmucousin", "hsmuncleW", "hsmuncle" ) # simulating sex status on ambiguous individuals thrs = tibble( id = family %>% select(1:3) %>% unlist() %>% unique(), sex = case_when( id %in% family$momcol ~ "F", id %in% family$dadcol ~ "M", TRUE ~ NA)) %>% mutate(sex = sapply(sex, function(x) ifelse(is.na(x), sample(c("M", "F"), 1), x))) ``` The object `family` is meant to represent the trio information that can be found in registers. It is possible to have multiple families in the same input data or single individuals with no family links. ```{r} graph = prepare_graph(.tbl = family, node_attributes = thrs, fcol = "dadcol", mcol = "momcol", icol = "id") graph ``` The object `graph` is a directed graph constructed from the trio information in `family` and is build using the ***igraph*** package. The direction in the graph is from parent to offspring. ### From graph to subgraph and kinship matrix We can construct a kinship matrix from all family members present in `family`, or we can consider only the family members that are of degree $n$. We can identify the family members of degree $2$ like this: ```{r, fig.alt = "Plot of the identified pedigree. Pedigree plotted with igraph package."} # get_family_graphs wraps make_ego_graph returns a formatted tbl fam_graph = get_family_graphs(pop_graph = graph, ndegree = 2, proband_vec = V(graph)$name) fam_graph ``` `fam_graph` is a tibble with one row per proband (here all individuals in the graph are probands). The first column is the family ID, typically the name of the proband the family graph is centred around, and a column `fam_graph` containing the corresponding family graphs as igraph objects. We can plot one of the identified family graphs with the standard plot function from the igraph package, however it is not ideal for pedigrees past a certain size. ```{r, fig.alt = "Plot of the identified pedigree. Pedigree plotted with igraph package."} plot(fam_graph$fam_graph[[1]], # choosing pid's family graph layout = layout_as_tree, vertex.size = 27.5, vertex.shape = "rectangle", vertex.label.cex = .75, edge.arrow.size = .3) ``` In particular, individuals such as paternal uncle's child (i.e a cousin, coded as pucousin above) is not present with this relatedness cut-off as such family members are of degree $3$. ### Calculate kinship matrix Finally, the kinship matrix can be calculated with `get_covmat()` in the following way: ```{r} get_covmat(fam_graph$fam_graph[[1]], h2 = 1, index_id = "pid", add_ind = FALSE) ``` A function called `graph_to_trio()` has been included in the package, which can convert from the graph object back into a trio object. This function is useful if you want to use the functionality of other packages that rely on trio information. One such example is using the plotting functionality of pedigrees in kinship2. ```{r} trio = graph_to_trio(graph = fam_graph$fam_graph[[1]], fixParents = TRUE) trio ``` which can be used to utilise the powerful plotting tool kit available in the kinship2 package. ```{r, fig.alt = "Plot of the identified pedigree. Pedigree plotted with kinship2 package."} pedigree = with(trio,kinship2::pedigree(id = id, dadid = dadid,momid = momid,sex = sex)) plot(pedigree) ```