Comments (19)

zachary-foster commented on July 29, 2024

Here is an updated version of my ideas for a set of classes that takes into account our discussion in #4:

taxa_class_ideas

I did most of this before looking at the code you pushed recently, so it does not line up well with it.

from taxa.

sckott commented on July 29, 2024

@zachary-foster reworked taxa classes based on your new schematic above, let me know what you think.

I'm not sure how taxonomy is different from hierarchy, thoughts? I didn't implement taxonomy yet

sckott commented on July 29, 2024

Sorry for the delay. I like it. It will be good to have something like this in the pkg documentation to help users navigate the different classes.

sckott commented on July 29, 2024

looking at the PR

zachary-foster commented on July 29, 2024

No problem. I'm glad you like it! Yes, it would be useful to have something like this in the documentation.

One change I think I would make after thinking about this more would be to not use taxon objects in taxonomy, but rather use taxon_ID objects directly and make functions to convert between a list of taxon and a taxonomy.
I think this would aid in subsetting and memory use.

Taxon objects should have a complete classification, yet this information would be largely redundant in a taxonomy object. For example, a list of 10,000 taxon objects in a taxonomy would hold 10,000 copies of whatever the root taxon was, whereas that information could be stored as one taxon_id in an edge list. Having both the list of taxon objects and the edge list of taxon_id in taxonomy would be redundant and hard to keep in sync. Having just a list of taxon objects in taxonomy would require inferring a graph structure every time the taxonomy was manipulated (e.g. subsetting). Therefore, I think that taxonomy objects should have only a list of taxon_id and a vector of indexes that correspond to the location of parents of taxa in that list. We can then make functions to convert between lists of taxon objects and taxonomy objects.
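To make the proposal concrete, here is a minimal sketch of a taxonomy stored as a vector of taxon IDs plus a vector of parent indexes, with a helper that rebuilds a full classification on demand. The field names (`taxon_ids`, `parents`), the function `classification()`, and the letters used as IDs are assumptions for illustration only, not anything implemented in the package.

```r
# Sketch only: names and structure are illustrative, not the package's API.
taxonomy <- list(
  taxon_ids = c("A", "B", "C", "D", "E", "F"),
  parents   = c(NA, 1L, 2L, 1L, 4L, 2L)  # index of each taxon's parent; NA marks the root
)

# Rebuild the classification (root first) for the taxon at `index`
# by walking up the parent indexes until the root is reached.
classification <- function(taxonomy, index) {
  path <- character(0)
  while (!is.na(index)) {
    path <- c(taxonomy$taxon_ids[index], path)
    index <- taxonomy$parents[index]
  }
  path
}

# Convert the compact taxonomy back into one classification per taxon.
all_classifications <- lapply(seq_along(taxonomy$taxon_ids),
                              classification, taxonomy = taxonomy)
```

The root "A" is stored once, however many taxa sit below it, and the complete classification of any taxon can still be recovered when needed.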

sckott commented on July 29, 2024

I agree we want to make sure data is stored efficiently. Hopefully it's a solution that doesn't take much time to serialize from whatever the object is to, e.g., data.frames.

sckott commented on July 29, 2024

I'll give it a shot on a different branch, and I'll ping you for feedback when it's up

sckott commented on July 29, 2024

oh yeah, install from this branch https://github.com/ropenscilabs/taxa/tree/taxa-class-rework

devtools::install_github("ropenscilabs/taxa@taxa-class-rework")

zachary-foster commented on July 29, 2024

Cool, I will look at the updates soon.

> I'm not sure how taxonomy is different from hierarchy, thoughts? I didn't implement hierarchy yet

So taxonomy could store the same information as hierarchies. hierarchies would store each classification sequentially, like so:

A > B > C
A > B > F
A > D > E

Where each letter is a taxon object or character.

taxonomy could store the same information hierarchically by having a list of unique taxa and an associated edge list defining the relationships:

A, B, C, D, E, F

1, 2
2, 3
2, 6
1, 4
4, 5
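As a rough sketch of how the sequential form could be converted into the taxa-plus-edge-list form shown above, the following assumes each classification is a plain character vector ordered from root to tip. The function name `hierarchies_to_edges()` is hypothetical and not part of the package.

```r
# Hypothetical helper: turn a list of classifications (character vectors,
# root first) into a sorted vector of unique taxa plus an edge list of indexes.
hierarchies_to_edges <- function(hierarchies) {
  taxa <- sort(unique(unlist(hierarchies)))
  edges <- do.call(rbind, lapply(hierarchies, function(h) {
    idx <- match(h, taxa)
    # Pair each taxon with the next one down in its classification.
    cbind(from = idx[-length(idx)], to = idx[-1])
  }))
  # Shared supertaxa produce repeated edges; keep each edge once.
  edges <- edges[!duplicated(edges), , drop = FALSE]
  list(taxa = taxa, edges = edges)
}

hierarchies <- list(c("A", "B", "C"), c("A", "B", "F"), c("A", "D", "E"))
result <- hierarchies_to_edges(hierarchies)
```

For the three classifications above, `result$taxa` holds the six letters A to F and `result$edges` holds the five index pairs listed in the example.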

How these differ:

  • taxonomy would use less RAM, since shared supertaxa are represented only once, but could not represent multiple instances (what I call "observations" in taxmap) of the same hierarchy, e.g.:
    A > B > C
    A > B > C
    A > D > E
  • taxonomy would be most easily subset by taxa (e.g. get subtaxa of B), whereas hierarchies would be subset by index (e.g., sequences 2:3), although both could be subset both ways in theory.
  • The conceptual difference is whether the user is mainly interested in instances (seqs in a FASTA file represented by hierarchies) or if they are mainly interested in taxa (e.g. taxonomic trees represented by taxonomy)
  • subsetting taxonomy would be best done with recursive programming and I am guessing would be faster than subsetting the taxa in hierarchies. I have already worked out the structure of these functions for taxmap and they could be easily adapted.
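The recursive subsetting mentioned in the last point might look something like the sketch below. It assumes the taxonomy is stored as a vector of taxon IDs and a vector of parent indexes (NA for the root); `subtaxa()` and `subset_taxonomy()` are illustrative names, not the existing taxmap functions.

```r
# Illustrative data: the same six taxa as in the example above.
taxon_ids <- c("A", "B", "C", "D", "E", "F")
parents   <- c(NA, 1L, 2L, 1L, 4L, 2L)  # index of each taxon's parent; NA marks the root

# Recursively collect the indexes of all subtaxa of the taxon at `index`.
subtaxa <- function(index, parents) {
  children <- which(parents == index)
  c(children, unlist(lapply(children, subtaxa, parents = parents)))
}

# Subset the taxonomy to one taxon and everything below it,
# re-indexing the parents so they point into the new, shorter vector.
subset_taxonomy <- function(index, taxon_ids, parents) {
  keep <- c(index, subtaxa(index, parents))
  list(taxon_ids = taxon_ids[keep],
       parents   = match(parents[keep], keep))  # the new root gets NA
}

sub <- subset_taxonomy(2L, taxon_ids, parents)
```

Here `taxon_ids[subtaxa(2L, parents)]` gives the subtaxa of B, and `sub` is a smaller taxonomy rooted at B.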

relationship to taxmap

I am thinking of eventually making taxmap be constructed from a taxonomy object with any number of user-defined tables that have a taxon_index or taxon_id column. Right now, taxmap has two tables: one that has the edge list, with one row per taxon, and one that has observation data, with one row per observation. I am thinking of using a taxonomy class for the edge list, and abstracting the taxon and observation data into a single type of table that only requires a column with a taxon_index or taxon_id. I'm sorry if this is confusing; it sounds unnecessarily abstract, but the reason I have arrived at this solution is to be able to combine any number of tables of taxonomic data, long or wide, into the same object. This would be useful for metabarcoding data in particular.
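One way to picture this design is the sketch below: a taxonomy plus a named list of tables, each of which only needs a `taxon_id` column. All names and the table contents are made up for illustration; this is not the taxmap implementation.

```r
# Sketch only: a taxonomy plus any number of tables keyed by taxon_id.
obj <- list(
  taxon_ids = c("A", "B", "C", "D", "E", "F"),
  parents   = c(NA, 1L, 2L, 1L, 4L, 2L),
  tables = list(
    # One row per taxon (made-up values).
    taxon_info = data.frame(taxon_id = c("A", "B", "C"),
                            rank = c("family", "genus", "species")),
    # One row per observation (made-up values).
    observations = data.frame(taxon_id = c("C", "C", "F", "E"),
                              count = c(10, 3, 7, 1))
  )
)

# Filter every table to the rows that belong to the requested taxa.
filter_tables <- function(obj, ids) {
  lapply(obj$tables, function(tab) tab[tab$taxon_id %in% ids, , drop = FALSE])
}

filtered <- filter_tables(obj, c("C", "F"))
```

Because every table is tied to the taxonomy only through its `taxon_id` column, per-taxon and per-observation tables can be handled by the same code.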

zachary-foster commented on July 29, 2024

Oops, did not mean to close this

sckott commented on July 29, 2024

Thanks for the clarification. Makes sense now. Above I meant to say that taxonomy hasn't been implemented yet, not hierarchy

I'll get taxonomy done now

sckott commented on July 29, 2024

Actually, I'm still not sure how taxonomy will work, since you have to have some way to group taxa that belong together. So would taxonomy take in hierarchy objects? So in your graphic above taxonomy would be on the same level as hierarchies and hierarchy feeds into either taxonomy or hierarchies?

zachary-foster commented on July 29, 2024

So taxonomy would be made with taxa objects, but it could be converted to and from hierarchies objects. The levels in my graph correspond to dependency. taxonomy is conceptually on the same level as hierarchies, but only depends on taxa.

> Actually, I'm still not sure how taxonomy will work, since you have to have some way to group taxa that belong together.

taxon objects are grouped together using the edge list. This takes the place of the hierarchies structure. Think of taxonomy as a tree structure and hierarchies as a list. For example, a_taxonomy[1] would be a taxon, whereas a_hierarchies[1] would be a hierarchy.

taxonomy works pretty much the same way as taxmap, but without user-defined data.

sckott commented on July 29, 2024

@zachary-foster Ah, so the inputs to taxonomy are taxa class objects?

zachary-foster commented on July 29, 2024

Yes, either a list of taxon or a taxa, but an edge list would also be needed. I imagine that objects of this class will normally be made by file parsing functions or by converting hierarchies objects, not "by hand" using the class constructor, since an edge list is not a very intuitive data structure, at least for me; I usually have to think pretty hard to create an edge list for a tree I have in mind.

sckott commented on July 29, 2024

Okay, will do. Yeah, I don't think we can expect users to supply an edge list.

sckott commented on July 29, 2024

okay, merging to master now, we can continue changes there

zachary-foster commented on July 29, 2024

Do you think we can close this? It seems the class structure is in place as far as I can tell.

sckott commented on July 29, 2024

yep
