magnusdv / dvir Goto Github PK

Disaster Victim Identification in R

R 100.00%

dvir's Introduction

Disaster Victim Identification in R

The purpose of dvir is to implement state-of-the-art algorithms for DNA-based disaster victim identification (DVI). In particular, dvir performs joint identification of multiple victims.

The methodology and algorithms of dvir are described in Vigeland & Egeland (2021): DNA-based Disaster Victim Identification.

The dvir package is part of the pedsuite, a collection of R packages for pedigree analysis. Much of the machinery behind dvir is imported from other pedsuite packages, especially pedtools for handling pedigree data, and forrel for the calculation of likelihood ratios. A comprehensive presentation of these packages, and much more, can be found in the book Pedigree Analysis in R.

Installation

To get the current official version of dvir, install from CRAN as follows:

install.packages("dvir")

Alternatively, the latest development version may be obtained from GitHub:

# install.packages("remotes")
remotes::install_github("magnusdv/dvir")

Tutorial example

In the following we will use a toy DVI example from the paper (see above) to illustrate how to use dvir.

To get started, we load the dvir package.

library(dvir)
#> Loading required package: pedtools

Introduction

We consider the DVI problem shown below, in which three victim samples (V1, V2, V3) are to be matched against three missing persons (M1, M2, M3) belonging to two different families.

The hatched symbols indicate genotyped individuals. In this simple example we consider only a single marker, with 10 equifrequent alleles denoted 1, 2,…, 10. The available genotypes are shown in the figure.

DNA profiles from victims are generally referred to as post mortem (PM) data, while the ante mortem (AM) data contains profiles from the reference individuals R1 and R2.

Assignments

A possible solution to the DVI problem is called an assignment. In our toy example, there are a priori 14 possible assignments, which can be listed as follows:

#>    V1 V2 V3
#> 1   *  *  *
#> 2   *  * M3
#> 3   * M1  *
#> 4   * M1 M3
#> 5   * M2  *
#> 6   * M2 M3
#> 7  M1  *  *
#> 8  M1  * M3
#> 9  M1 M2  *
#> 10 M1 M2 M3
#> 11 M2  *  *
#> 12 M2  * M3
#> 13 M2 M1  *
#> 14 M2 M1 M3

Each row indicates the missing persons corresponding to V1, V2 and V3 (in that order) with * meaning not identified. For example, the first line contains the null model corresponding to none of the victims being identified, while the last line gives the assignment where (V1, V2, V3) = (M1, M2, M3), For each assignment a we can calculate the likelihood, denoted L(a). The null likelihood is denoted L0.

Goals

We consider the following to be two of the main goals in the analysis of a DVI case with multiple missing persons:

Rank the assignments according to how likely they are. We measure this by calculating the LR comparing each assignment a to the null model: LR = L(a)/L0.
Find the posterior pairing probabilities P(Vi = Mj | data) for all combinations of i and j, and the posterior non-pairing probabilities P(Vi = '*' | data) for all i.

The data

The pedigrees and genotypes for this toy example are available within dvir as a built-in dataset, under the name example2.

example2
#> DVI dataset: 
#>  3 victims (2M/1F): V1, V2, V3
#>  3 missing (2M/1F): M1, M2, M3
#>  2 typed refs: R1, R2
#>  2 ref families: 1, 2
#> Number of markers, PM and AM: 1

Internally, all DVI datasets in dvir have the structure of a list, with elements pm (the victim data), am (the reference data) and missing (a vector naming the missing persons): We can inspect the data by printing each object. For instance, in this case am is a list of two pedigrees:

example2$am
#> [[1]]
#>  id fid mid sex  L1
#>  M1   *   *   1 -/-
#>  R1   *   *   2 2/2
#>  M2  M1  R1   1 -/-
#> 
#> [[2]]
#>   id fid mid sex  L1
#>   R2   *   *   1 3/3
#>  MO2   *   *   2 -/-
#>   M3  R2 MO2   2 -/-

Note that the two pedigrees are printed in so-called ped format, with columns id (ID label), fid (father), mid (mother), sex (1 = male; 2 = female) and L1 (genotypes at locus L1).

The code generating this dataset can be found in the github repository of dvir, more specifically here: https://github.com/magnusdv/dvir/blob/master/data-raw/example2.R.

A great way to inspect a DVI dataset is to plot it with the function plotDVI().

plotDVI(example2)

The plotDVI() function offers many parameters for tweaking the plot; see the help page ?plotDVI() for details.

Joint identification

The jointDVI() function performs joint identification of all three victims, given the data. It returns a data frame ranking all assignments with nonzero likelihood:

jointRes = jointDVI(example2, verbose = FALSE)

# Print the result
jointRes
#>    V1 V2 V3    loglik  LR   posterior
#> 1  M1 M2 M3 -16.11810 250 0.718390805
#> 2  M1 M2  * -17.72753  50 0.143678161
#> 3   * M2 M3 -18.42068  25 0.071839080
#> 4  M1  * M3 -20.03012   5 0.014367816
#> 5   * M1 M3 -20.03012   5 0.014367816
#> 6   * M2  * -20.03012   5 0.014367816
#> 7   *  * M3 -20.03012   5 0.014367816
#> 8  M1  *  * -21.63956   1 0.002873563
#> 9   * M1  * -21.63956   1 0.002873563
#> 10  *  *  * -21.63956   1 0.002873563

The output shows that the most likely joint solution is (V1, V2, V3) = (M1, M2, M3), with an LR of 250 compared to the null model.

The function plotSolution() shows the most likely solution:

plotSolution(example2, jointRes, marker = 1)

By default, the plot displays the assignment in the first row of jointRes. To examine the second most likely, add k = 2 (and so on to go further down the list).

Posterior pairing probabilities

Next, we compute the posterior pairing (and non-pairing) probabilities. This is done by feeding the output from jointDVI() into the function Bmarginal().

Bmarginal(jointRes, example2$missing, prior = NULL)
#>            M1        M2        M3          *
#> V1 0.87931034 0.0000000 0.0000000 0.12068966
#> V2 0.01724138 0.9482759 0.0000000 0.03448276
#> V3 0.00000000 0.0000000 0.8333333 0.16666667

Here we used a default flat prior for simplicity, assigning equal prior probabilities to all assignments.

we see that the posterior pairing probabilities for the most likely solution are

P(V1 = M1 | data) = 0.88,
P(V2 = M2 | data) = 0.95,
P(V3 = M2 | data) = 0.83.

dvir's People

Contributors

Stargazers

Watchers

Forkers

nganbaohuynh

dvir's Issues

An error occurs after removing a family and the missing in the family

library(dvir)
#> Loading required package: pedtools
foo = example2
foo$am = foo$am[[-2]]
foo$missing = foo$missing[-3]
plotDVI(foo, titles = c("PM", "AM 2 and M3 removed"))

jointDVI(foo)
#> DVI problem:
#>  3 victims: V1, V2, V3
#>  2 missing: M1, M2
#>  1 typed refs: R1
#>  1 reference family
#> 
#> Joint identification
#> 
#> Undisputed matches:
#>  Pairwise LR threshold = 10000
#> 
#> Iteration 1:
#> No undisputed matches
#> 
#> Assignments to consider in the joint analysis: 5
#> Error in UseMethod("likelihood", x): no applicable method for 'likelihood' applied to an object of class "character"

^{Created on 2022-11-14 with reprex v2.0.2}

`sequentialDVI`: check-problem. Add seed, mutation

Fra/til Thore: Sjekk sequentialDVI. Legge til seed slik at resultater blir reproduserbare?
Hvordan håndteres mutasjoner?

library(dvir)
data(planecrash)
pm = planecrash$pm
am = planecrash$am
missing = planecrash$missing
sequentialDVI(pm, am, missing)
#> Error in checkDVI(pm, am, missing = missing, moves = moves): Third vector must a vector with names of missing persons

^{Created on 2021-01-15 by the reprex package (v0.3.0)}

More detailed `print.dviData()`

Feature request: Include sex distribution and some marker info (at least the number of markers) when printing a dviData object.

library(dvir)
#> Loading required package: pedtools
grave
#> DVI dataset:
#>  8 victims: V1, V2, V3, V4, V5, V6, V7, V8
#>  8 missing: MP1, MP2, MP3, MP4, MP5, MP6, MP7, MP8
#>  5 typed refs: R3, R5, R4, R1, R2
#>  1 ref family: (unnamed)

^{Created on 2023-08-31 with reprex v2.0.2}

Thanks!

`dviJoint()` with `assignment`

Consider

library(dvir, quietly = T)
res1 = jointDVI(example2, verbose = F)
dviJoint(example2, assignments = res1[1:2, 1:3], verbose = F)
#>   V1 V2 V3 cLR_V1 cLR_V2 cLR_V3    loglik jLR posterior
#> 1 M1 M2 M3      5      5     NA -16.11810 250 0.8333333
#> 2 M1 M2  *     NA     NA     NA -17.72753  50 0.1666667

^{Created on 2023-10-18 with reprex v2.0.2}
I expected cLR_V1, cLR_V2, cLR_V3 to be NA, NA, 5 in the first line above

Fjerne 'export' av noen funksjoner

'sequential1' eksporteres ikke lenger, men help filen er der!

`findExcluded` prints empty lines

library(dvir)
#> Loading required package: pedtools
for(i in 1:10)
 findExcluded(example1, verbose = F)
#> 
#>

^{Created on 2023-02-16 with reprex v2.0.2}

Increasing limit in `jointDVI` gives error

library(dvir)
#> Loading required package: pedtools
jointDVI(example2, limit = 10)
#> DVI dataset:
#>  3 victims: V1, V2, V3
#>  3 missing: M1, M2, M3
#>  2 typed refs: R1, R2
#>  2 reference families
#> 
#> Finding undisputed matches
#> Pairwise LR threshold = 10000
#> 
#> Iteration 1:
#> Computing matrix of pairwise LR
#> No undisputed matches
#> Error in (function (..., row.names = NULL, check.rows = FALSE, check.names = TRUE, : arguments imply differing number of rows: 0, 3

^{Created on 2023-01-12 with reprex v2.0.2}

Problem when all are undisputed

The last slot below, dviReduced , not properly returned:

library(dvir)
#> Loading required package: pedtools
findUndisputed(example2, threshold = 1)
#> 
#> Finding undisputed matches
#> Pairwise LR threshold = 1
#> 
#> Step 1:
#> Computing matrix of pairwise LR
#> 1 undisputed match
#>  V3 = M3 (LR = 5)
#> Reducing DVI dataset
#> Removing 1 AM family with no remaining missing persons: 2
#> 
#> Step 2:
#> Computing matrix of pairwise LR
#> 1 undisputed match
#>  V2 = M2 (LR = 5)
#> Reducing DVI dataset
#> 
#> Step 3:
#> Computing matrix of pairwise LR
#> 1 undisputed match
#>  V1 = M1 (LR = 10)
#> Reducing DVI dataset
#> Removing 1 AM family with no remaining missing persons: 1
#> $undisputed
#>   Sample Missing Family LR Step
#> 1     V3      M3      2  5    1
#> 2     V2      M2      1  5    2
#> 3     V1      M1      1 10    3
#> 
#> $dviReduced
#> Error: Input to `nMarkers()` must be a `ped` object or a list of such

^{Created on 2023-09-25 with reprex v2.0.2}

Dealing with apriori certain and impossible identifications

Please consider the feature requests below:

library(dvir, quietly = T)
example1
#> DVI dataset:
#>  3 victims (3M/0F): V1, V2, V3
#>  3 missing (3M/0F): M1, M2, M3
#>  1 typed refs: R1
#>  1 ref family: (unnamed)
#> Number of markers, PM and AM: 1

# Assume V3 = M2, identified by other data.
# Can this identification be fixed in future analyses?

# Add MP4, not identifiable from marker data
# Can MP4 be classified as non-identifiable and removed from further analyses?
# Example of adding and removal
example1$am[[1]] = example1$am[[1]] |> 
 addSon(parents = c("NN", "M4"), id =  "NN2")
#> Father: Creating new individual with ID = M4
example1$missing = c(example1$missing, "M4")
example1$missing = example1$missing[-4]

^{Created on 2023-10-08 with reprex v2.0.2}

plotSolution problem

plotSolution fails:

library(dvir, quietly = T)
example2.1 = example2
example2.1$pm = example2.1$pm[[1]]
joint = jointDVI(example2.1, verbose = F)
plotSolution(example2.1, joint)
#> Error: Mismatch in transfer individuals:
#>  `idsFrom` = 
#>  `idsTo` = =M1

^{Created on 2023-01-16 with reprex v2.0.2}
I believe this happens when dim(joint)[2] == 4

No victims left creates problems for jointDVI

The only victim is excluded and this creates problems

library(dvir, quietly =T)

# Attributes of the marker locus
loc = list(name = "L1", alleles = 1:2)

### 1. PM data

pm = list(
  singleton("V1", sex = 1)
  ) |> 
  addMarker(V1 = "2/2",  locusAttr = loc)

### 2. AM data

am = list(
  nuclearPed(father = "R1", mother = "R2",  child = "M1") 
  ) |> 
  addMarker(R1 = "1/1", R2 = "1/1", locusAttr = loc)

### 3. Missing persons
missing = "M1"

### 4. Create DVI object
dvi = dviData(pm = pm, am = am, missing = missing)
excl = findExcluded(dvi, maxInc = 0, verbose = F)
#> 
jointDVI(dvi, pairings = excl$pairings)
#> DVI dataset:
#>  1 victims: V1
#>  1 missing: M1
#>  2 typed refs: R1, R2
#>  1 reference family
#> Error in pairings[[v]]: subscript out of bounds

^{Created on 2023-02-28 with reprex v2.0.2}

`expand.grid.nodup()` should be more clever

The fact that it wraps expand.grid() makes it fail in some instances where it shouldn't.

library(dvir)

toyExample = function(n) rep(list(c("*", "M")), n)

toyExample(3)
#> [[1]]
#> [1] "*" "M"
#> 
#> [[2]]
#> [1] "*" "M"
#> 
#> [[3]]
#> [1] "*" "M"

# Correct solution 
expand.grid.nodup(toyExample(3))
#>   Var1 Var2 Var3
#> 1    *    *    *
#> 2    M    *    *
#> 3    *    M    *
#> 4    *    *    M

# Chokes already at 40 reps
expand.grid.nodup(toyExample(40))
#> Error: cannot allocate vector of size 4096.0 Gb

# Complete breakdown by 100 reps 
expand.grid.nodup(toyExample(100))
#> Error in rep.int(rep.int(seq_len(nx), rep.int(rep.fac, nx)), orep): invalid 'times' value

^{Created on 2020-12-17 by the reprex package (v0.3.0)}

NULL in dviData

An error is returned below, maybe this is as it should be ...

library(dvir)
#> Loading required package: pedtools
dviData(pm = singleton(1), am = NULL, missing = NULL)
#> Error: Input to `nMarkers()` must be a `ped` object or a list of such

^{Created on 2022-11-20 with reprex v2.0.2}

Bug in summariseDVI

9 families rather than the correct 1 reported below:

library(dvir)
#> Loading required package: pedtools
summariseDVI(example1$pm, example1$am, example1$missing)
#> DVI problem:
#>  3 victims: V1, V2, V3
#>  3 missing: M1, M2, M3
#>  9 families
#>  1 typed refs: R1

^{Created on 2022-04-07 by the reprex package (v2.0.1)}

Missing consolidateDVI in checkDVI()

library(dvir)
#> Loading required package: pedtools
KETPex481
#> DVI dataset: 
#>  2 victims (0M/2F): V1, V2
#>  2 missing (0M/2F): MP1, MP2
#>  1 typed ref: R
#>  1 ref family: (unnamed)
#> Number of markers, PM and AM: 21
checkDVI(KETPex481)
#> Warning: AM families with no missing individuals: 2, 3, 4, 5, 6, 7, 8, 9

^{Created on 2023-10-14 with reprex v2.0.2}

`jointDVI()` returns error if there is nothing left to do

jointDVI() returns an error below (I also encountered this problem with default value of threshold):

library(dvir)
#> Loading required package: pedtools
jointDVI(example2, threshold = 1)
#> DVI dataset:
#>  3 victims: V1, V2, V3
#>  3 missing: M1, M2, M3
#>  2 typed refs: R1, R2
#>  2 reference families
#> 
#> Finding undisputed matches
#> Pairwise LR threshold = 1
#> 
#> Iteration 1:
#> Computing matrix of pairwise LR
#> 1 undisputed match
#>  V3 = M3 (LR = 5)
#> Reducing DVI dataset
#> Removing AM families with no missing persons: 2
#> 
#> Iteration 2:
#> Computing matrix of pairwise LR
#> 1 undisputed match
#>  V2 = M2 (LR = 5)
#> Reducing DVI dataset
#> 
#> Iteration 3:
#> Computing matrix of pairwise LR
#> 1 undisputed match
#>  V1 = M1 (LR = 10)
#> Reducing DVI dataset
#> Removing AM families with no missing persons: 1
#> Error: The dataset has no missing persons

^{Created on 2022-12-12 with reprex v2.0.2}

Add parameter ignoreSex to `exclusionMatrix()`

# Could exclusionMatrix() gain an argument, ignoreSex,
# and return NA in the first row of the exclusion matrix if ignoreSex = F (default):
library(dvir)
#> Loading required package: pedtools
example1$pm["V1"] = swapSex(example1$pm["V1"], "V1")
example1
#> DVI dataset: 
#>  3 victims (2M/1F): V1, V2, V3
#>  3 missing (3M/0F): M1, M2, M3
#>  1 typed refs: R1
#>  1 ref family: (unnamed)
#> Number of markers, PM and AM: 1
exclusionMatrix(example1)
#>    M1 M2 M3
#> V1  0  0  0
#> V2  0  1  0
#> V3  0  0  0

^{Created on 2023-10-13 with reprex v2.0.2}

`dviSim()

This extension of the example in the documentation returns an error:

library(dvir, quietly = T)
dviSim(example2, truth = c(V1 = "M1", V2 = "M2", V3 = "M3"))
#> Error: The first argument must be a `ped` object or a list of such

^{Created on 2023-10-15 with reprex v2.0.2}

jointDVI ignores argument `limit`

library(dvir)
pm = example1$pm
am = example1$am
missing = example1$missing
jointDVI(pm, am, missing, limit = 100000)
#>    V1 V2 V3    loglik     LR    posterior
#> 1  M3 M1 M2 -15.67181 2000.0 0.6895362868
#> 2  M2  * M3 -17.97439  200.0 0.0689536287
#> 3  M2  * M1 -17.97439  200.0 0.0689536287
#> 4   * M1 M2 -17.97439  200.0 0.0689536287
#> 5  M3  * M2 -18.66754  100.0 0.0344768143
..

^{Created on 2021-01-13 by the reprex package (v0.3.0)}

`mergePM` omits last sample?

It appears that the last sample is omitted

library(dvir, quietly = T)
pm1 = singleton("V1") |> 
  addMarker(alleles = 1:3, "V1" = "1/1", name = "L1")
pm2 = singleton("V2") |> 
  addMarker(alleles = 1:3, "V2" = "2/2", name = "L1")

mergePM(list(pm1, pm2))$pmReduced
#> No shared markers
#> $V1
#>  id fid mid sex  L1
#>  V1   *   *   1 1/1
mergePM(list(pm2, pm1))$pmReduced
#> No shared markers
#> $V2
#>  id fid mid sex  L1
#>  V2   *   *   1 2/2

^{Created on 2023-09-08 with reprex v2.0.2}

magnusdv / dvir Goto Github PK

dvir's Introduction

Disaster Victim Identification in R

Installation

Tutorial example

Introduction

Assignments

Goals

The data

Joint identification

Posterior pairing probabilities

dvir's People

Contributors

Stargazers

Watchers

Forkers

dvir's Issues

Recommend Projects

Recommend Topics

Recommend Org