
phenopacketgenerator's Introduction

PhenopacketGenerator

Generate a phenopacket for use with LIRICAL or Exomiser.

Building PhenopacketGenerator

Most users should download the latest prebuilt executable from the releases tab. PhenopacketGenerator can also be built from source using Maven as follows.

$ git clone https://github.com/TheJacksonLaboratory/PhenopacketGenerator.git
$ cd PhenopacketGenerator
$ mvn package

This will build PhenopacketGenerator in the target/ subdirectory.

Running PhenopacketGenerator

On most systems, PhenopacketGenerator can be started with a double click. It can also be started from the command line as follows.

$ java -jar Phenopacket-Generator.jar 

To set up the executable, you will need to indicate the path to the Human Phenotype Ontology file (hp.obo). The current version of this file can always be downloaded from the HPO Download page.

When Phenopacket Generator is started for the first time, it will indicate that the path to hp.obo needs to be set.

Phenopacket Generator Start Screen

Use the file chooser dialog in the Edit menu to do so. Also set the biocurator ID, which will be used to denote the creator of the Phenopacket. Once the hp.obo path has been set, the Enter HPO Terms and Export Phenopacket buttons will be activated. Enter the data as indicated. Clicking the Enter HPO Terms button opens a separate dialog in which users can navigate the HPO hierarchy, use an autocomplete window, or use text mining to enter HPO terms.

HPO Text Mining

Once all data has been entered, click on Export Phenopacket to save the Phenopacket file to disk. If any required data is missing or malformed, an error dialog will appear, and users will need to correct the data before saving the file.

Data Entry

The following fields can be entered.

  • Sex (optional)
  • Age (optional). Use one or more of the pull down menus to enter the age in years, months, or days. The age information will be stored using the ISO 8601 format, e.g., P42Y for 42 years, P12Y2M3D for 12 years, 2 months, and 3 days
  • Phenopacket ID (required). This ID cannot be empty but can be any user-defined string
  • Proband ID (required). This ID cannot be empty but can be any user-defined string
  • HPO terms. At least one term must be entered. Observed or excluded (negated) terms can be entered. There is no limit to the total number of HPO terms that can be entered.
  • VCF file (optional). The path to a VCF File that is expected to represent the results of NGS Gene Panel, Exome, or Genome sequencing on the proband. The file must have the suffix vcf or vcf.gz
  • Genome assembly (required if a VCF file is provided). The assembly of the VCF file.
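
The age format and the VCF suffix rule above can be illustrated with the standard java.time.Period class. This is only a minimal sketch: the class DataEntryChecks and its method names are hypothetical and not taken from the PhenopacketGenerator source, and the case-insensitive suffix check is an assumption.

```java
import java.time.Period;
import java.util.Locale;

// Hypothetical helper for illustration; not part of PhenopacketGenerator.
final class DataEntryChecks {

    // Formats an age as an ISO 8601 duration such as P12Y2M3D.
    static String isoAge(int years, int months, int days) {
        if (years < 0 || months < 0 || days < 0) {
            throw new IllegalArgumentException("Age components must not be negative");
        }
        // Period.toString() leaves out components that are zero.
        return Period.of(years, months, days).toString();
    }

    // Accepts file names ending in .vcf or .vcf.gz.
    // Ignoring case is an assumption made for this sketch.
    static boolean hasVcfSuffix(String fileName) {
        String lower = fileName.toLowerCase(Locale.ROOT);
        return lower.endsWith(".vcf") || lower.endsWith(".vcf.gz");
    }

    public static void main(String[] args) {
        System.out.println(isoAge(42, 0, 0));               // P42Y
        System.out.println(isoAge(12, 2, 3));               // P12Y2M3D
        System.out.println(hasVcfSuffix("example.vcf.gz")); // true
    }
}
```

Period.parse can be used in the other direction to read a stored age string such as P6Y5M back into its year, month, and day components.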

Phenopacket export

The phenopacket-schema defines the phenotypic description of a patient or sample, for instance in the context of rare disease or cancer genomic diagnosis. It aims to make this information sufficient and shareable outside of the EHR (Electronic Health Record), so that a clinician or clinical geneticist can capture structured data at the point of care and share it with other labs or use it for computational analysis in clinical or research environments.

PhenopacketGenerator is currently designed to generate a Phenopacket that represents the phenotypic features of an individual with suspected Mendelian disease for whom genomic diagnostics is being performed. The resulting phenopacket can be used as an input file for programs such as LIRICAL.

Here is an example phenopacket for an individual with Portal vein thrombosis and Splenomegaly. The path to the VCF file is given in the uri field of the htsFiles element.

{
  "id": "ID:1",
  "subject": {
    "id": "Patient A",
    "ageAtCollection": {
      "age": "P6Y5M"
    },
    "sex": "MALE"
  },
  "phenotypicFeatures": [{
    "type": {
      "id": "HP:0001744",
      "label": "Splenomegaly"
    },
    "evidence": [{
      "evidenceCode": {
        "id": "ECO:0000302",
        "label": "author statement used in manual assertion"
      }
    }]
  }, {
    "type": {
      "id": "HP:0030242",
      "label": "Portal vein thrombosis"
    },
    "evidence": [{
      "evidenceCode": {
        "id": "ECO:0000302",
        "label": "author statement used in manual assertion"
      }
    }]
  }],
  "htsFiles": [{
    "uri": "file://home/peter/data/lirical/SRR8906477.filtered.vcf",
    "htsFormat": "VCF",
    "genomeAssembly": "hg38"
  }],
  "metaData": {
    "created": "2019-11-10T15:47:06.750Z",
    "createdBy": "ExampleOrg:ExampleCurator",
    "resources": [{
      "id": "hp",
      "name": "human phenotype ontology",
      "url": "http://purl.obolibrary.org/obo/hp.owl",
      "version": "unknown HPO version",
      "namespacePrefix": "HP",
      "iriPrefix": "http://purl.obolibrary.org/obo/HP_"
    }, {
      "id": "eco",
      "name": "Evidence and Conclusion Ontology",
      "url": "http://purl.obolibrary.org/obo/eco.owl",
      "version": "2019-10-16",
      "namespacePrefix": "ECO",
      "iriPrefix": "http://purl.obolibrary.org/obo/ECO_"
    }],
    "phenopacketSchemaVersion": "1.0.0"
  }
}

phenopacketgenerator's People

Contributors

pnrobinson, ielis


phenopacketgenerator's Issues

YAML output option?

Could we have an option to output YAML as well as JSON? It reads better for configuration files, and I'd like to use this for Exomiser input, which uses a YAML config.

It's trivial to convert JSON to YAML:

String yaml = new YAMLMapper().writeValueAsString(jsonNode);
---
family:
  proband:
    subject:
      id: "manuel"
    phenotypicFeatures:
    - type:
        id: "HP:0001156"
        label: "Brachydactyly"
    - type:
        id: "HP:0001363"
        label: "Craniosynostosis"
    - type:
        id: "HP:0011304"
        label: "Broad thumb"
    - type:
        id: "HP:0010055"
        label: "Broad hallux"
    htsFiles:
    - uri: "file://Pfeiffer.vcf"
      htsFormat: "VCF"
      genomeAssembly: "GRCh37"
  pedigree:
    persons:
    - individualId: "manuel"
      sex: "MALE"
      affectedStatus: "AFFECTED"

compared to

{
  "family": {
    "proband": {
      "subject": {
        "id": "manuel"
      },
      "phenotypicFeatures": [{
        "type": {
          "id": "HP:0001156",
          "label": "Brachydactyly"
        }
      }, {
        "type": {
          "id": "HP:0001363",
          "label": "Craniosynostosis"
        }
      }, {
        "type": {
          "id": "HP:0011304",
          "label": "Broad thumb"
        }
      }, {
        "type": {
          "id": "HP:0010055",
          "label": "Broad hallux"
        }
      }],
      "htsFiles": [{
        "uri": "file://Pfeiffer.vcf",
        "htsFormat": "VCF",
        "genomeAssembly": "GRCh37"
      }]
    },
    "pedigree": {
      "persons": [{
        "individualId": "manuel",
        "sex": "MALE",
        "affectedStatus": "AFFECTED"
      }]
    }
  }
}

Evidence code

Currently, we are using this evidence code:

"id": "ECO:0000033",
"label": "author statement supported by traceable reference"

But the code is designed to denote that the statement refers to a published paper.
We need a code to say that the biocurator is saying this without more evidence (we do not know how this tool will be used). Even the superclass 'author statement' refers to a paper.

The nearest I can find is https://www.ebi.ac.uk/ols/ontologies/eco/terms?iri=http%3A%2F%2Fpurl.obolibrary.org%2Fobo%2FECO_0000302&viewMode=All&siblings=false

author statement used in manual assertion

I am changing this, but please shout if there is something more appropriate.

hp.obo path is not saved

The path is not being saved correctly (Macintosh) and so the user needs to reenter the path each time the program is started.

Days and Months missing

The drop-downs for age only allow 11 months and 30 days. This should be 12 and 31 respectively.

Always override equals() and hashCode() in value-type classes (toString() is always nice too)

e.g. PgOntologyClass - if you create two of these with the same parameters they will not be considered equals and cannot be safely used as keys in maps or as values in sets which rely on these methods.

For simple immutable value classes consider using the Immutables library:

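import org.immutables.value.Value;

@Value.Immutable
abstract class AbstractPgOntologyClass {
    // Immutables generates the implementation from abstract accessor methods, not fields.
    abstract String id();
    abstract String label();
    abstract boolean not_observed();
}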

will auto-generate the class for you with equals(), hashCode(), toString(), getters, builders and all the rest.
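
For comparison, the same contract can be written by hand without any extra library. The following is only a sketch: the class name OntologyTerm is hypothetical, and the three fields are assumed from the snippet above rather than copied from the real PgOntologyClass.

```java
import java.util.HashSet;
import java.util.Objects;
import java.util.Set;

// Hypothetical value class; fields are assumed from the discussion above.
final class OntologyTerm {
    private final String id;
    private final String label;
    private final boolean notObserved;

    OntologyTerm(String id, String label, boolean notObserved) {
        this.id = Objects.requireNonNull(id);
        this.label = Objects.requireNonNull(label);
        this.notObserved = notObserved;
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof OntologyTerm)) return false;
        OntologyTerm other = (OntologyTerm) o;
        return notObserved == other.notObserved
                && id.equals(other.id)
                && label.equals(other.label);
    }

    // Must use the same fields as equals() so that equal objects hash alike.
    @Override
    public int hashCode() {
        return Objects.hash(id, label, notObserved);
    }

    @Override
    public String toString() {
        return "OntologyTerm{id=" + id + ", label=" + label
                + ", notObserved=" + notObserved + "}";
    }

    public static void main(String[] args) {
        Set<OntologyTerm> terms = new HashSet<>();
        terms.add(new OntologyTerm("HP:0001744", "Splenomegaly", false));
        terms.add(new OntologyTerm("HP:0001744", "Splenomegaly", false));
        System.out.println(terms.size()); // 1, because the two instances are equal
    }
}
```

Without the two overrides, the set in main would hold two entries, because Object's default equals() compares references only.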

Approved terms should not contain duplicates

In the window which pops up from the 'Enter HPO terms' button, the bottom right table titled 'Approved terms' should not contain duplicate HPO terms. Currently it is possible to add the same term more than once. On output it would be nice to sort the terms by ID, for the sake of consistency and reproducibility.
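
One possible way to get both properties at once is to pass the term IDs through a sorted set. This is a sketch only: the class ApprovedTerms and the method uniqueSortedIds are hypothetical names, and terms are reduced to plain ID strings for brevity.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.TreeSet;

// Hypothetical helper; not taken from the PhenopacketGenerator source.
final class ApprovedTerms {

    // Returns the IDs without duplicates, in ascending order.
    // HPO IDs have a fixed width (HP: plus seven digits), so plain
    // string ordering matches numeric ordering.
    static List<String> uniqueSortedIds(List<String> ids) {
        return new ArrayList<>(new TreeSet<>(ids));
    }

    public static void main(String[] args) {
        List<String> entered = Arrays.asList("HP:0030242", "HP:0001744", "HP:0030242");
        System.out.println(uniqueSortedIds(entered)); // [HP:0001744, HP:0030242]
    }
}
```

With a value class that implements equals(), hashCode(), and Comparable, the same approach would work on whole term objects instead of ID strings.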

HtsFile URI

I'm using the phenopacket generator to generate phenopackets to test as input to Exomiser (it's working!). However, there is an issue where the URI is not valid for a file when generated on Windows. This is probably a Windows-specific issue, but it's easily solved.

private String getVcfUri() {
    if (vcfPath.startsWith("file")) {
        return vcfPath;
    } else if (this.vcfPath.startsWith("//")) {
        return String.format("file:%s", this.vcfPath);
    } else if (this.vcfPath.startsWith("/")) {
        return String.format("file:/%s", this.vcfPath);
    } else {
        File f = new File(vcfPath);
        return String.format("file://%s", f.getAbsolutePath());
    }
}

What's the logic for checking the prefix in the first place? Given this is generated from a file chooser dialogue, why not just replace this all with:

Path path = Paths.get(vcfPath);
return path.toUri().toString();

this will return a correctly formatted URI with all the \\ replaced with a nice / e.g.

file:///C:/Users/...

Does this also output the three /// on Mac/Linux? This is required to be read correctly on Windows.

Addendum: this explains it concisely: https://en.wikipedia.org/wiki/File_URI_scheme so it looks like this is a Mac/Linux issue too.

To read the file URI you need to reverse the input like this:

URI uri = URI.create(htsFile.getUri());
Path filePath = Paths.get(uri);
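
Putting the two directions together, a round trip could look like the sketch below. The class VcfUriRoundTrip and its method names are hypothetical, and the exact URI text depends on the platform and working directory, so no fixed output is shown.

```java
import java.net.URI;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper; only the Path and URI calls come from the JDK.
final class VcfUriRoundTrip {

    // Converts a file system path into a file: URI string.
    static String toUriString(String vcfPath) {
        // toAbsolutePath() resolves relative input against the working directory.
        return Paths.get(vcfPath).toAbsolutePath().normalize().toUri().toString();
    }

    // Converts a file: URI string back into a Path.
    static Path toPath(String uri) {
        return Paths.get(URI.create(uri));
    }

    public static void main(String[] args) {
        String uri = toUriString("example.vcf");
        System.out.println(uri);         // a file:/// URI for the absolute path
        System.out.println(toPath(uri)); // the absolute path again
    }
}
```

Because both conversions go through java.nio.file, separators and characters that need escaping (for example spaces) are handled by the JDK rather than by string formatting.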

HPO text-mining analysis terms API

Hi, I only want to use the 'HPO text-mining analysis terms' function to extract HPO terms from text in our code, without the GUI. Do you provide an API for this function? If not, how can I use the function in our code?
Thanks a lot.

1.0.0-RC3

Upgrade phenopacket version to 1.0.0.
