Here is an updated version of my ideas for a set of classes that takes into account our discussion in #4:
I did most of this before looking at the code you pushed recently, so it does not line up well with it.
from taxa.
@zachary-foster reworked taxa classes based on your new schematic above, let me know what you think.
I'm not sure how `taxonomy` is different from `hierarchy`, thoughts? I didn't implement `taxonomy` yet
from taxa.
Sorry for the delay. I like it. It will be good to have something like this in the pkg documentation to help users navigate the different classes.
from taxa.
looking at the PR
from taxa.
No problem. I'm glad you like it! Yes, it would be useful to have something like this in the documentation.
One change I think I would make after thinking about this more would be to not use `taxon` objects in `taxonomy`, but rather use `taxon_ID` objects directly and make functions to convert between a list of `taxon` and a `taxonomy`.
I think this would aid in subsetting and memory use.
`taxon` objects should have a complete classification, yet this information would be largely redundant in a `taxonomy` object. For example, a list of 10,000 `taxon` objects in a `taxonomy` would hold 10,000 copies of whatever the root taxon was, whereas that information could be stored as one `taxon_id` in an edge list. To have both the list of `taxon` and the edge list of `taxon_id` in `taxonomy` would be redundant and hard to keep in sync. To have just a list of `taxon` in `taxonomy` would require inferring a graph structure every time the `taxonomy` was manipulated (e.g. subsetting). Therefore, I think that `taxonomy` objects should have only a list of `taxon_id` and a vector of indexes that correspond to the location of parents of taxa in that list. We can then make functions to convert between lists of `taxon` objects and `taxonomy` objects.
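A minimal sketch of the compact representation proposed above, using plain character IDs in place of real `taxon_id` objects. The names `taxon_ids`, `parent`, `classification`, and `as_classifications` are hypothetical and only illustrate the idea; they are not part of the package.

```r
# Hypothetical compact form: one entry per unique taxon.
taxon_ids <- c("A", "B", "C", "D", "E", "F")
# parent[i] is the index in taxon_ids of the parent of taxon i; NA marks a root.
parent <- c(NA, 1L, 2L, 1L, 4L, 2L)

# Walk up the parent vector to rebuild the full classification of one taxon.
classification <- function(i, ids, parent) {
  path <- character(0)
  while (!is.na(i)) {
    path <- c(ids[i], path)
    i <- parent[i]
  }
  path
}

# Convert the compact form to a list of complete classifications, one per taxon.
as_classifications <- function(ids, parent) {
  lapply(seq_along(ids), classification, ids = ids, parent = parent)
}

classifications <- as_classifications(taxon_ids, parent)
paste(classifications[[3]], collapse = " > ")
```

The root `A` is stored once, and each complete classification is only rebuilt when it is asked for.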
from taxa.
I agree we want to make sure data is stored efficiently. Hopefully it's a solution that doesn't take much time to serialize from whatever the object is to, e.g., data.frames.
from taxa.
I'll give it a shot on a different branch, and I'll ping you for feedback when it's up
from taxa.
oh yeah, install from this branch https://github.com/ropenscilabs/taxa/tree/taxa-class-rework
devtools::install_github("ropenscilabs/taxa@taxa-class-rework")
from taxa.
Cool, I will look at the updates soon.
I'm not sure how taxonomy is different from hierarchy, thoughts? I didn't implement hierarchy yet
So `taxonomy` could store the same information as `hierarchies`. `hierarchies` would store each classification sequentially, like so:
A > B > C
A > B > F
A > D > E
Where each letter is a `taxon` object or `character`.
`taxonomy` could store the same information hierarchically by having a list of unique taxa and an associated edge list defining the relationships:
A, B, C, D, E, F
1, 2
2, 3
2, 6
1, 4
4, 5
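As a rough illustration of how the sequential form could be turned into the list of unique taxa and the edge list shown above, here is a base R sketch. It assumes each hierarchy is a plain character vector ordered from root to tip; the variable names are made up for the example.

```r
# Each hierarchy is a character vector ordered from root to tip.
hierarchies <- list(c("A", "B", "C"), c("A", "B", "F"), c("A", "D", "E"))

# Unique taxa, sorted so that indexes do not depend on input order.
taxa <- sort(unique(unlist(hierarchies)))

# Collect every parent-child pair as indexes into `taxa`.
edges <- do.call(rbind, lapply(hierarchies, function(h) {
  cbind(from = match(h[-length(h)], taxa), to = match(h[-1], taxa))
}))

# Shared supertaxa produce repeated edges; keep each edge once.
edge_list <- unique(edges)
edge_list
```

Dropping the repeated edges is what makes the shared supertaxa appear only once.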
How these differ:
- `taxonomy` would use less RAM, since shared supertaxa are represented only once, but could not represent multiple instances (what I call "observations" in `taxmap`) of the same `hierarchy`, e.g.:
A > B > C
A > B > C
A > D > E
- `taxonomy` would be most easily subset by taxa (e.g. get subtaxa of `B`), whereas `hierarchies` would be subset by index (e.g., sequences `2:3`), although both could be subset both ways in theory.
- The conceptual difference is whether the user is mainly interested in instances (seqs in a FASTA file represented by `hierarchies`) or if they are mainly interested in taxa (e.g. taxonomic trees represented by `taxonomy`).
- Subsetting `taxonomy` would be best done with recursive programming and I am guessing would be faster than subsetting the taxa in `hierarchies`. I have already worked out the structure of these functions for `taxmap` and they could be easily adapted.
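The two subsetting styles described in the list above can be sketched in base R. This is only an illustration on the example data; `subtaxa` is a hypothetical helper, not the existing `taxmap` function.

```r
taxa <- c("A", "B", "C", "D", "E", "F")
edge_list <- rbind(c(1, 2), c(2, 3), c(2, 6), c(1, 4), c(4, 5))
hierarchies <- list(c("A", "B", "C"), c("A", "B", "F"), c("A", "D", "E"))

# Recursively collect the indexes of all taxa below `index`.
subtaxa <- function(index, edge_list) {
  children <- edge_list[edge_list[, 1] == index, 2]
  c(children, unlist(lapply(children, subtaxa, edge_list = edge_list)))
}

# Subsetting a taxonomy by taxon: everything below B (index 2).
below_b <- taxa[subtaxa(2, edge_list)]

# Subsetting hierarchies by index: sequences 2:3.
second_and_third <- hierarchies[2:3]
```

The recursion stops when a taxon has no children, because `lapply` over an empty vector returns nothing to append.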
relationship to taxmap
I am thinking of eventually making `taxmap` be constructed from a `taxonomy` object with any number of user-defined tables that have a `taxon_index` or `taxon_id` column. Right now, `taxmap` has two tables: one has the edge list, with one row per taxon, and the other has observation data, with one row per observation. I am thinking of using a `taxonomy` class for the edge list, and abstracting the taxon and observation data into a single type of table that only requires a column with a `taxon_index` or `taxon_id`. I'm sorry if this is confusing; it sounds unnecessarily abstract, but the reason I have arrived at this solution is to be able to combine any number of tables of taxonomic data, long or wide, into the same object. This would be useful for metabarcoding data in particular.
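One way to picture this design is a plain list holding one taxonomy and any number of tables keyed by `taxon_id`. The sketch below is hypothetical: the structure, the `filter_tables` helper, and the ranks and counts are all invented for illustration and are not the `taxmap` API.

```r
# Hypothetical structure: one taxonomy plus any number of user-defined tables.
obj <- list(
  taxonomy = list(
    taxon_id = c("A", "B", "C", "D", "E", "F"),
    parent   = c(NA, 1L, 2L, 1L, 4L, 2L)
  ),
  tables = list(
    # One row per taxon (made-up ranks).
    taxon_info = data.frame(taxon_id = c("A", "B", "D"),
                            rank = c("family", "genus", "genus")),
    # One row per observation; a taxon may appear many times (made-up counts).
    reads = data.frame(taxon_id = c("C", "C", "E", "F"),
                       count = c(10, 3, 7, 1))
  )
)

# Keep only the rows of every table that refer to the given taxon IDs.
filter_tables <- function(x, ids) {
  x$tables <- lapply(x$tables, function(tab) {
    tab[tab$taxon_id %in% ids, , drop = FALSE]
  })
  x
}

sub <- filter_tables(obj, c("B", "C", "F"))
```

Because every table only needs a `taxon_id` column, the same filter applies to all of them regardless of their shape.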
from taxa.
Oops, did not mean to close this
from taxa.
Thanks for the clarification. Makes sense now. Above I meant to say that `taxonomy` hasn't been implemented yet, not `hierarchy`. I'll get `taxonomy` done now
from taxa.
Actually, I'm still not sure how `taxonomy` will work, since you have to have some way to group taxa that belong together. So would `taxonomy` take in `hierarchy` objects? So in your graphic above `taxonomy` would be on the same level as `hierarchies` and `hierarchy` feeds into either `taxonomy` or `hierarchies`?
from taxa.
So `taxonomy` would be made with `taxa` objects, but it could be converted to and from `hierarchies` objects. The levels in my graph correspond to dependency. `taxonomy` is conceptually on the same level as `hierarchies`, but only depends on `taxa`.
Actually, I'm still not sure how taxonomy will work, since you have to have some way to group taxa that belong together.
`taxon` objects are grouped together using the edge list. This takes the place of the `hierarchies` structure. Think of `taxonomy` as a tree structure and `hierarchies` as a list. For example, `a_taxonomy[1]` would be a `taxon`, whereas `a_hierarchies[1]` would be a `hierarchy`.
`taxonomy` works pretty much the same way as `taxmap`, but without user-defined data.
from taxa.
@zachary-foster Ah, the input to `taxonomy` are `taxa` class objects?
from taxa.
Yes, either a list of `taxon` or a `taxa`, but an edge list would also be needed. I imagine that objects of this class will normally be made by file parsing functions or by converting `hierarchies` objects, not "by hand" using the class constructor, since an edge list is not a very intuitive data structure, at least for me; I usually have to think pretty hard to create an edge list for a tree I have in mind.
from taxa.
Okay, will do. Yeah, I don't think we can expect users to supply an edge list.
from taxa.
okay, merging to master now, we can continue changes there
from taxa.
Do you think we can close this? It seems the class structure is in place as far as I can tell
from taxa.
yep
from taxa.