
dotkegg's People

Contributors

rabadash8820


dotkegg's Issues

Add Unit Tests for the New KeggReader Class

These tests should include:

  • Can't create a KeggReader with a null Stream
  • Can call Dispose() multiple times
  • Calling any other method after Dispose() throws an InvalidOperationException
  • Several Read() calls return the expected KeggIds
  • ReadBlock() saves the expected KeggIds
  • ReadBlock() returns the actual number of KeggIds saved, even if EOS is reached
  • ReadBlock() throws similar exceptions to StreamReader.Read()
    • ArgumentException: The buffer length minus index is less than count.
    • ArgumentNullException: buffer is null.
    • ArgumentOutOfRangeException: index or count is negative.
    • IOException: An I/O error occurs, such as the stream is closed.
  • ReadToEnd() returns all remaining KeggIds
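
The ReadBlock() argument checks can follow the same order StreamReader.Read(char[], int, int) uses: a null buffer first, then a negative index or count, then a buffer that is too short. The sketch below writes them as a standalone generic helper, so it works for any element type, including a KeggId array. The ReadBlockArgs name is hypothetical. The IOException case comes from the underlying Stream, so it is not covered here.

```csharp
using System;

// Hypothetical helper: validates ReadBlock() arguments in the same order as
// StreamReader.Read(char[], int, int), so KeggReader throws the same
// exception types in the same situations.
public static class ReadBlockArgs
{
    public static void Validate<T>(T[] buffer, int index, int count)
    {
        if (buffer == null)
            throw new ArgumentNullException("buffer");
        if (index < 0)
            throw new ArgumentOutOfRangeException("index", "index is negative.");
        if (count < 0)
            throw new ArgumentOutOfRangeException("count", "count is negative.");
        if (buffer.Length - index < count)
            throw new ArgumentException("The buffer length minus index is less than count.");
    }
}
```

ArgumentNullException and ArgumentOutOfRangeException both derive from ArgumentException. Tests for the length check should therefore assert the exact type, not just "some ArgumentException".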

KeggId Classes Should Not Have Tokens In Summary XML Comments

Currently, all the derived KeggId classes have a KEGG database link token in their summaries. However, IntelliSense does not resolve the token. It just shows the token's name as plain text. For example, the summary for TNumber becomes "Represents an identifier in the GenomeDbLink database", which of course looks silly. So we'll have to drop this nice link from the documentation for the sake of cleaner IntelliSense (which is arguably more important anyway).

Add Unit Tests for KeggId Classes

Like the KeggDb tests, these just check that KeggIds can be constructed, have unique hash codes, and their equality methods/operators work as expected. Also add tests for the new TNumber.GetOrganismCode() method (this can just be [Ignore]d until we actually implement that method).

Use Expression-Bodied Members Throughout DotKEGG

We should take advantage of the new C#7 features, like expression-bodied constructors, finalizers, and property getters/setters (expression-bodied methods and read-only properties already arrived in C#6). These changes would prevent collaborators from using Visual Studio versions older than 2017, however, so we should probably wait a while before making them.
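
The sketch below shows where the VS2015/VS2017 line falls. It uses a hypothetical Sample class (not a DotKEGG type) and labels each member with the C# version that introduced its syntax. Only the members marked C#7 would break a VS2015 build.

```csharp
using System;

// Hypothetical type used only to illustrate expression-bodied syntax.
public class Sample
{
    private string _id;

    // C#7: expression-bodied constructor (routes through the setter's null check)
    public Sample(string id) => Id = id;

    // C#7: expression-bodied get/set accessors; "?? throw" is also C#7
    public string Id
    {
        get => _id;
        set => _id = value ?? throw new ArgumentNullException(nameof(value));
    }

    // C#6: expression-bodied read-only property
    public int Length => _id.Length;

    // C#6: expression-bodied method
    public override string ToString() => "id:" + _id;
}
```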

Edit README to Say VS 2017 is Required

After using <ProjectReference> items in our .csproj files with #24, and after adding .editorconfig preferences, VS 2017 will now be required. This must be reflected in the main repo README.

Add GetOrganismCode() Method to TNumber

Since genome TNumbers map one-to-one to organism codes, we should add a GetOrganismCode() method on the TNumber class, similar to the GetGenomeId() method on the new OrganismCode class (#29). The method will probably require a call to KeggGet or KeggLink, so for now it should just throw a NotImplementedException. This method should also use the lazy pattern, so that future calls don't require accessing KEGG. If the TNumber doesn't exist, then this method should throw an InvalidOperationException.
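
Here is a minimal sketch of the lazy pattern, assuming a TNumber that wraps its ID string. The lookup delegate and the internal constructor that accepts it are hypothetical. They exist only so the caching can be exercised without KEGG. The public constructor's default lookup throws NotImplementedException, as described above. To stay self-contained, the sketch returns a plain string rather than an OrganismCode. It also avoids Lazy<T>, which isn't available in .NET 3.5.

```csharp
using System;

// Sketch only: a stand-in for DotKEGG's TNumber showing cache-on-first-call.
public sealed class TNumber
{
    private readonly string _id;
    private readonly Func<string, string> _lookup; // hypothetical; would wrap KeggGet/KeggLink
    private string _organismCode;                  // null until the first successful lookup

    public TNumber(string id)
        : this(id, t => { throw new NotImplementedException(); }) { }

    internal TNumber(string id, Func<string, string> lookup)
    {
        if (id == null) throw new ArgumentNullException("id");
        _id = id;
        _lookup = lookup;
    }

    public string Id { get { return _id; } }

    public string GetOrganismCode()
    {
        if (_organismCode == null)
        {
            // Only reaches KEGG the first time; later calls return the cached value.
            string code = _lookup(_id);
            if (code == null)
                throw new InvalidOperationException("Genome " + _id + " does not exist in KEGG.");
            _organismCode = code;
        }
        return _organismCode;
    }
}
```

Two threads making the first call at the same time may both run the lookup. That is harmless here, because both would cache the same value.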

Split DotKEGG into Domain and Operations Assemblies

DotKEGG needs to target .NET 3.5 so it can be used in Cell. However, most people using DotKEGG will probably be using .NET 4 or later, with access to the TPL and async/await programming, and will want those more modern features. Therefore, we should split DotKEGG into two assemblies:

  1. DotKEGG.Domain: contains the types that are basically just POCOs, including KeggId and its subclasses, KeggDb and its subclasses, and all the future database entry classes. This assembly should target .NET Framework 3.5.
  2. DotKEGG.Operations: contains the KEGG operation classes, including KeggInfo, KeggList, KeggGet, KeggLink, etc. There will be two versions of this assembly, one targeting .NET Framework 3.5, and one targeting .NET Framework 4 and later. Only the latter will support async methods.

Of course, both assemblies should be provided in the NuGet package. NuGet handles adding references to the proper assembly versions.

Add KeggReader Class

This is the class that will be returned by KEGG operations that would otherwise return massive result sets. It allows users to get results one or a few at a time. This class should have the following members:

  • internal constructor, so that KeggRestApi can construct one, but API users must call an operation method. The constructor takes a Stream argument, which will always be the Stream returned by WebClient.OpenRead().
  • Dispose() method and finalizer, implementing the IDisposable interface
  • Read(): returns a single KeggId, initialized from the next line of the Stream
  • ReadBlock(): saves a user-provided number of KeggIds, each initialized from a line of the Stream, into an array. Returns the number of KeggIds that were actually saved (in case EOS was reached before the requested number of KeggIds were saved)
  • ReadToEnd(): returns an array of KeggIds, initialized from remaining lines of the Stream

Don't forget API documentation!
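
Below is a sketch of the members listed above. It makes a few assumptions:

  • each line of the Stream has KEGG's tab-separated "<id>\t<description>" layout
  • Read() returns null at end of stream
  • IDs are returned as raw strings (DotKEGG would pass each one to the matching KeggId constructor)

Disposed-state errors use ObjectDisposedException, which derives from InvalidOperationException, so the "throws InvalidOperationException" tests would still pass. The finalizer is left out on purpose: the reader only holds a managed Stream, and .NET's dispose-pattern guidance reserves finalizers for types that directly own unmanaged resources.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

// Sketch of the proposed KeggReader; returns raw ID strings instead of KeggIds.
public sealed class KeggReader : IDisposable
{
    private StreamReader _reader; // null once disposed

    internal KeggReader(Stream stream)
    {
        if (stream == null)
            throw new ArgumentNullException("stream");
        _reader = new StreamReader(stream);
    }

    // Returns the ID on the next non-blank line, or null at end of stream.
    public string Read()
    {
        ThrowIfDisposed();
        string line = _reader.ReadLine();
        while (line != null && line.Length == 0)
            line = _reader.ReadLine();
        if (line == null)
            return null;
        int tab = line.IndexOf('\t');
        return tab < 0 ? line : line.Substring(0, tab);
    }

    // Saves up to count IDs into buffer starting at index; returns how many were saved.
    public int ReadBlock(string[] buffer, int index, int count)
    {
        ThrowIfDisposed();
        if (buffer == null) throw new ArgumentNullException("buffer");
        if (index < 0) throw new ArgumentOutOfRangeException("index");
        if (count < 0) throw new ArgumentOutOfRangeException("count");
        if (buffer.Length - index < count)
            throw new ArgumentException("The buffer length minus index is less than count.");

        int saved = 0;
        string id;
        while (saved < count && (id = Read()) != null)
        {
            buffer[index + saved] = id;
            saved++;
        }
        return saved; // may be less than count if the end of the stream was reached
    }

    public string[] ReadToEnd()
    {
        ThrowIfDisposed();
        var ids = new List<string>();
        string id;
        while ((id = Read()) != null)
            ids.Add(id);
        return ids.ToArray();
    }

    // Safe to call more than once; disposing the StreamReader also closes the Stream.
    public void Dispose()
    {
        if (_reader != null)
        {
            _reader.Dispose();
            _reader = null;
        }
    }

    private void ThrowIfDisposed()
    {
        if (_reader == null)
            throw new ObjectDisposedException("KeggReader");
    }
}
```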

Add a "Release Process" Document

Add a simple Markdown file to the root of the repo, explaining the steps to take on a release branch. These should include the steps that were taken during the v0.2.0-alpha release, as well as updating documentation in the S3 Bucket, and updating the latest version numbers on the main README.

Then, it will be easy to see what should be included in a future publish script!

Add AWS CloudFormation Template for Documentation Site

Create an AWS CloudFormation template to define the cloud resources necessary to host the DotKEGG static documentation site. The template should be in YAML format and define the following resources:

  • S3 bucket

    • Logging
    • No versioning (revision numbers in documentation files, if we use them, will be enough)
    • Set up for static web-hosting at docs.dotkegg.com
  • DNS routing

    • docs.dotkegg.net -> docs.dotkegg.com (CNAME)
    • docs.dotkegg.org -> docs.dotkegg.com (CNAME)
    • docs.dotkegg.com -> static hosted documentation (ALIAS for S3 bucket that hosts)
    • dotkegg.net -> DotKEGG website (ALIAS for S3 bucket that redirects)
    • dotkegg.org -> DotKEGG website (ALIAS for S3 bucket that redirects)
    • dotkegg.com -> DotKEGG website (ALIAS for S3 bucket that redirects)
    • www.dotkegg.net -> DotKEGG website (ALIAS for S3 bucket that redirects, since URL has path)
    • www.dotkegg.org -> DotKEGG website (ALIAS for S3 bucket that redirects, since URL has path)
    • www.dotkegg.com -> DotKEGG website (ALIAS for S3 bucket that redirects, since URL has path)

Add OrganismCode Class

This class will provide a strongly typed representation of an organism code, effectively wrapping a string such as "hsa" or "ece". It should not inherit from KeggId, because it's not a number. It should have the following members:

  • constructor that takes a string, e.g. "hsa" or "ece"
  • public GetGenomeId() method that looks up the TNumber corresponding to this organism code. This will probably require a call to KeggGet or KeggLink so for now just throw a NotImplementedException. This method should also use the lazy pattern, so future calls don't require accessing KEGG. If the OrganismCode doesn't exist, then this method should throw an InvalidOperationException.
  • Equals() overrides/implementations from Object and IEquatable<OrganismCode>, along with a matching GetHashCode() override
  • The == and != operators
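
Here is a minimal sketch of the equality members. It assumes OrganismCode wraps its code string and compares it ordinally (case-sensitive). The constructor only rejects null or blank input; any stricter validation of the code's format is left open. Equal objects must return equal hash codes, but distinct ones may collide. "Unique hash code" tests are therefore best written against a small fixed set of sample codes.

```csharp
using System;

// Sketch of OrganismCode's equality members only (GetGenomeId() omitted).
public sealed class OrganismCode : IEquatable<OrganismCode>
{
    private readonly string _code;

    public OrganismCode(string code)
    {
        if (code == null) throw new ArgumentNullException("code");
        // Trim().Length instead of string.IsNullOrWhiteSpace, which needs .NET 4.
        if (code.Trim().Length == 0) throw new ArgumentException("Organism code cannot be blank.", "code");
        _code = code;
    }

    public string Code { get { return _code; } }

    public bool Equals(OrganismCode other)
    {
        return !ReferenceEquals(other, null)
            && string.Equals(_code, other._code, StringComparison.Ordinal);
    }

    public override bool Equals(object obj)
    {
        return Equals(obj as OrganismCode);
    }

    // Must agree with Equals(): equal codes give equal hash codes.
    public override int GetHashCode()
    {
        return StringComparer.Ordinal.GetHashCode(_code);
    }

    public static bool operator ==(OrganismCode left, OrganismCode right)
    {
        if (ReferenceEquals(left, null))
            return ReferenceEquals(right, null);
        return left.Equals(right);
    }

    public static bool operator !=(OrganismCode left, OrganismCode right)
    {
        return !(left == right);
    }

    public override string ToString() { return _code; }
}
```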

Make KeggDb Classes Thread-Safe

KeggDb, and all the database types that derive from it, should be made thread-safe. That means all the Db.Instance properties should have locks on the null check. None of the other methods/properties really pose concurrency issues.
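
Below is a sketch of a locked null check for one hypothetical KeggDb subclass. The field is volatile and the null check is repeated inside the lock (double-checked locking), so threads only take the lock while the instance is still being created. A simpler alternative is a static readonly field initializer, since the CLR runs a type's static initialization only once.

```csharp
using System;

// Sketch of a thread-safe Instance property; PathwayDb here is a stand-in.
public sealed class PathwayDb
{
    private static volatile PathwayDb _instance;
    private static readonly object _instanceLock = new object();

    private PathwayDb() { }

    public string Name { get { return "pathway"; } }

    public static PathwayDb Instance
    {
        get
        {
            // Fast path: no lock once the instance exists.
            if (_instance == null)
            {
                lock (_instanceLock)
                {
                    // Re-check inside the lock: another thread may have
                    // created the instance while this one was waiting.
                    if (_instance == null)
                        _instance = new PathwayDb();
                }
            }
            return _instance;
        }
    }
}
```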

Run All Unit Tests In Parallel

With NUnit, tests are actually executed sequentially by default, even with Visual Studio's "Run In Parallel" option checked. Also, before NUnit 3.7 (which hasn't been released yet), test methods can only run in parallel if they are in different fixtures. So, we need to add the Parallelizable attribute to the DotKEGG.Test assembly, but we probably won't see any real benefit until NUnit 3.7 is released.

See these NUnit Wiki articles for more info:

Add .editorconfig File for VS2017 Developers

Check out the EditorConfig documentation to see how this is done. We just want to add typical Danware-style C# coding guidelines. Leaving out C#7 features should not produce errors or warnings, since we'll most likely need to support VS2015 (i.e., C#6) for a while yet.

Custom Error Pages for Documentation

AWS allows custom HTML pages to be displayed when 4xx and 5xx error codes occur. This would be a really nice feature to add, but would require more sophisticated use of HTML/CSS than we have right now... We put in an issue on the SHFB GitHub page to request error pages as a new feature.

Until then we'll just leave this in the backlog in case some future developer wants to be awesome and make such a page. If they do, the specific properties in the CloudFormation template are the CloudFront Distribution's CustomErrorResponses and the S3 Bucket ErrorDocument properties.

Info Operation Should be Implemented as KeggDb Methods

We are trying to move away from the static operation class pattern to using instance methods on the KeggDb classes. Compare:

KeggInfo.ForDatabase(PathwayDb.Instance)
vs
PathwayDb.Instance.Info()

The second form is much cleaner and more fluent. Luckily, since all KeggDb classes' Info() methods will return the same type (an InfoResults object), we can just put this method on the KeggDb and KeggCompositeDb base classes. Also, GenomeDb should have a GeneInfo() method, with one overload that takes a TNumber and one that takes an OrganismCode.

Don't forget XML comments on these methods!
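
Below is a sketch of the instance-method design. Info() is written once on the KeggDb base class and every derived database inherits it, while GenomeDb adds the two GeneInfo() overloads. The request URLs follow the KEGG REST info operation (https://rest.kegg.jp/info/<database>, which also accepts an organism code or T number). InfoResults, OrganismCode, TNumber, and the Fetch hook are minimal hypothetical stand-ins, not DotKEGG's real types.

```csharp
using System;

// Hypothetical stand-ins so the sketch compiles on its own.
public sealed class InfoResults
{
    public InfoResults(string text) { Text = text; }
    public string Text { get; private set; }
}

public sealed class OrganismCode
{
    public OrganismCode(string code) { Code = code; }
    public string Code { get; private set; }
}

public sealed class TNumber
{
    public TNumber(string id) { Id = id; }
    public string Id { get; private set; }
}

public abstract class KeggDb
{
    // Hypothetical HTTP hook; DotKEGG would route this through KeggRestApi.
    internal static Func<Uri, string> Fetch = uri => { throw new NotImplementedException(); };

    protected KeggDb(string name) { Name = name; }

    public string Name { get; private set; }

    // Defined once here; every derived database inherits it.
    public InfoResults Info()
    {
        return InfoFor(Name);
    }

    // KEGG REST "info" operation: https://rest.kegg.jp/info/<target>
    protected static InfoResults InfoFor(string target)
    {
        return new InfoResults(Fetch(new Uri("https://rest.kegg.jp/info/" + target)));
    }
}

public sealed class PathwayDb : KeggDb
{
    public static readonly PathwayDb Instance = new PathwayDb();
    private PathwayDb() : base("pathway") { }
}

public sealed class GenomeDb : KeggDb
{
    public static readonly GenomeDb Instance = new GenomeDb();
    private GenomeDb() : base("genome") { }

    // The info operation also accepts an organism code or a T number.
    public InfoResults GeneInfo(OrganismCode organism) { return InfoFor(organism.Code); }
    public InfoResults GeneInfo(TNumber genome) { return InfoFor(genome.Id); }
}
```

With Fetch pointed at a real HTTP client, PathwayDb.Instance.Info() would request https://rest.kegg.jp/info/pathway.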

Add KeggDb Unit Tests

Add unit tests for KeggDb and its derived classes. This includes, but is not limited to:

  • Not null check on Db.Instance properties
  • Test various equality/inequality methods
  • Not null check on Db.Entry methods

Add KeggId Examples to KeggId Remarks

Have a basic example showing how to create a KeggId derived instance using either the derived class's constructor, or a derived KeggDb.Entry() type method. This example should have some sample output of KeggId.ShortForm and KeggId.DBGETForm() calls.

Use New <PackageReference> Tags in Project File

We should be using the leaner .csproj format offered in Visual Studio 2017, particularly the <PackageReference> element for referencing NuGet packages more succinctly. By doing so, we should also be able to use the new package features in the VS Project Properties page. The end goal of all of this is to make it easier to make DotKEGG target multiple versions of .NET.

Here are a couple of webpages with more information about migrating from old .csproj and packages.config files to the newer format:

Add CloudFront Distribution to Documentation Template

We should place the S3 Bucket currently hosting the documentation in a CloudFront distribution to decrease latency. All docs subdomains will then ALIAS to that distribution, rather than the bucket itself. Here is a link to the AWS::CloudFront::Distribution reference page.

Add Unit Tests for the new OrganismCode Class

Make sure construction, GetGenomeId(), and equality operators/methods work as expected, and that hash codes are unique. We can also remove some of the KeggInfo unit tests for invalid organism codes, since that logic has been moved to the OrganismCode constructor.

Inherit XML Comments in KeggDb Classes

All database types inheriting from KeggDb should use <inheritdoc/> in the XML comments of their various methods/properties. That way, we won't have to copy/pasta the same comments between all these classes.

Inherit XML Comments in KeggId Classes

All KEGG ID types inheriting from KeggId should have <inheritdoc/> in the XML comments of their various methods/properties. That way, we won't have to copy/pasta the same comments between all these classes.
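
For illustration, here is a hypothetical MapNumber override that inherits its summary from KeggId. The summary text is a placeholder, not DotKEGG's actual documentation. Sandcastle Help File Builder resolves <inheritdoc/> when it builds the docs. Whether IntelliSense shows the inherited text depends on the Visual Studio version.

```csharp
// Placeholder documentation text, for illustration only.
public abstract class KeggId
{
    /// <summary>
    /// Gets the short form of this KEGG identifier.
    /// </summary>
    public abstract string ShortForm { get; }
}

// Hypothetical derived ID: <inheritdoc/> reuses the base summary above.
public sealed class MapNumber : KeggId
{
    private readonly string _shortForm;

    public MapNumber(string shortForm) { _shortForm = shortForm; }

    /// <inheritdoc/>
    public override string ShortForm { get { return _shortForm; } }
}
```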

Implement KEGG List Operation

The KEGG list operation will be implemented as a set of methods on our various KeggDb classes. We are moving away from the static operation class pattern because it makes a lot more sense to have operations be actions that a KeggDb instance can perform. Compare:

KeggList.InDatabase(PathwayDb.Instance)

vs

PathwayDb.Instance.ListPathways()

The second form is a bit more clean/fluent, and allows us to have very descriptive method names on each KeggDb class (ListModules(), ListCompounds(), etc.). Of course, this will require a little more dev effort, as we must implement the methods in each class, not some base abstract class.

Some notes on the List() methods:

  • They will return a ListResults<ID> of all KeggIds in the KeggDb instance, where ID derives from KeggId.
  • For each List() method, there should be two versions: one that returns all entries in a single massive list (very time- and memory-intensive), and one that returns them in a ListResults<T> data structure (faster, and easier to get results a couple at a time)
    • ListResults<T> will need to either inherit from Stream, or wrap a Stream
  • EnzymeDb is the only composite database that will have a List() method, returning a ListResults<ECNumber> of all ECNumbers
  • GenomeDb will have the following methods:
    • ListGenomeIds(): returns a ListResults<TNumber> of all genomes
    • ListOrganismCodes(): returns a ListResults<OrganismCode> of all organisms
    • ListGenesInGenome(TNumber): returns a ListResults<GeneId> of all genes in the provided genome
    • ListGenesInOrganism(OrganismCode): returns a ListResults<GeneId> of all genes belonging to the provided organism
  • PathwayDb will have the following methods:
    • ListPathways(): Returns a ListResults<MapNumber> of all KEGG Pathways
    • ListPathwaysInOrganism(OrganismCode): Returns a ListResults<MapNumber> of all pathways belonging to the provided organism
    • ListPathwaysInGenome(TNumber): Returns a ListResults<MapNumber> of all pathways belonging to the provided genome
  • ModuleDb will have the following methods:
    • ListModules(): Returns a ListResults<MNumber> of all KEGG Modules
    • ListModulesInOrganism(OrganismCode): Returns a ListResults<MNumber> of all modules belonging to the provided organism
    • ListModulesInGenome(TNumber): Returns a ListResults<MNumber> of all modules belonging to the provided genome
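
One way to implement the "wrap a Stream" option is sketched below. ListResults<T> reads one line per iteration and turns its ID field into a T with a caller-supplied parser, so results can be consumed a few at a time. The parseId delegate is hypothetical; in DotKEGG it would construct the right KeggId or OrganismCode. There is no where T : KeggId constraint, because the OrganismCode issue says OrganismCode does not inherit from KeggId.

```csharp
using System;
using System.Collections;
using System.Collections.Generic;
using System.IO;

// Sketch: ListResults<T> wraps the response Stream and parses lines lazily.
public sealed class ListResults<T> : IEnumerable<T>, IDisposable
{
    private readonly TextReader _reader;
    private readonly Func<string, T> _parseId; // hypothetical: builds a T from the ID field

    public ListResults(Stream stream, Func<string, T> parseId)
    {
        if (stream == null) throw new ArgumentNullException("stream");
        if (parseId == null) throw new ArgumentNullException("parseId");
        _reader = new StreamReader(stream);
        _parseId = parseId;
    }

    public IEnumerator<T> GetEnumerator()
    {
        string line;
        while ((line = _reader.ReadLine()) != null)
        {
            if (line.Length == 0)
                continue;
            // KEGG list output is tab-separated: "<id>\t<description>".
            int tab = line.IndexOf('\t');
            yield return _parseId(tab < 0 ? line : line.Substring(0, tab));
        }
    }

    IEnumerator IEnumerable.GetEnumerator() { return GetEnumerator(); }

    public void Dispose() { _reader.Dispose(); }
}
```

Because it reads straight from the stream, a ListResults<T> can only be enumerated once. A second foreach would find the stream already at its end.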

Add Admin Group for the S3 Bucket Hosting Documentation

The docs.dotkegg.com S3 bucket needs permissions in the CloudFormation template. We should thus define a new IAM group called "dotkegg-docs-admins" whose members have explicit permission to manage objects in that bucket, and that bucket only. This group should have no other AWS permissions, and no S3 permissions on any other objects.

Add SeeAlso XML Comments to DotKEGG Types

Currently, the only See Also link in the documentation is for the DotKEGG namespace. We should add some more, including:

  • KeggId links
    • From KeggId to KeggDb and KeggCompositeDb
    • From all derived KeggId classes back to KeggId
    • From all derived KeggId classes to KeggDb
    • From all derived KeggId classes to KeggCompositeDb if they're referenced in a composite database
    • From all derived KeggId classes to their respective derived KeggDb class
    • From all derived KeggId classes to any derived KeggCompositeDb classes that they are referenced in
    • From ECNumber to KNumber and RNumber, and to OrthologyDb, ReactionDb, EnzymeDb, LigandDb, and KeggCompositeDb
    • From OrganismCode to TNumber, GenomeDb, GenomesDb, KeggDb, and KeggCompositeDb
  • KeggDb links
    • From KeggDb to KeggId and KeggCompositeDb
    • From all derived KeggDb classes back to KeggDb
    • From all derived KeggDb classes to KeggId
    • From all derived KeggDb classes to KeggCompositeDb, if they're included in a composite database
    • From all derived KeggDb classes to their respective derived KeggId class
  • KeggCompositeDb links
    • From KeggCompositeDb to KeggId and KeggDb
    • From all derived KeggCompositeDb classes back to KeggCompositeDb
    • From all derived KeggCompositeDb classes to KeggId and KeggDb
    • From all derived KeggCompositeDb classes to their respective derived KeggId class(es)
    • From LigandDb and EnzymeDb to ECNumber
  • Operation links
    • All KEGG operation static classes should link to every other KEGG operation
    • If operation classes return or work with custom data types, then there should also be links to those types

Create Repo Wiki Page

We should have something on the Wiki page for this repo. So add a Home page that just refers users to the actual documentation hosted on S3.

Add Unit Tests for ECNumber Class

These tests should include:

  • Construction works as expected, generating correct property and ToString() values
  • Static construction methods for specific Enzyme classes work as expected
  • Equality operators/methods work as expected
  • Hash codes are unique

Add Documentation for new OrganismCode Struct

Don't forget examples, and any relevant remarks:

  • Describing the one-to-one relationship between organisms and genomes
  • Describing the lazy functionality of the GetGenomeId() function

Better Error Handling in KeggRestApi

The KeggRestApi class should have try/catch blocks in all of its methods, to free calling code from having to handle WebExceptions.

The various Info() methods should also be looked over in light of these changes.
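
One possible approach assumes the goal is to translate WebExceptions rather than swallow them. Each KeggRestApi method catches the WebException and rethrows a library-specific exception that keeps the original as InnerException. KeggException and the Download hook are hypothetical names, and this KeggRestApi is a stand-in for the real class.

```csharp
using System;
using System.Net;

// Hypothetical exception type that DotKEGG callers would catch instead of WebException.
public class KeggException : Exception
{
    public KeggException(string message, Exception inner) : base(message, inner) { }
}

public static class KeggRestApi
{
    // Hypothetical download hook, defaulting to WebClient.DownloadString.
    internal static Func<Uri, string> Download = uri =>
    {
        using (var client = new WebClient())
            return client.DownloadString(uri);
    };

    public static string Get(Uri uri)
    {
        try
        {
            return Download(uri);
        }
        catch (WebException ex)
        {
            // Report the HTTP status if there was a response, else the transport status.
            var response = ex.Response as HttpWebResponse;
            string detail = response != null
                ? "HTTP " + (int)response.StatusCode
                : ex.Status.ToString();
            throw new KeggException("KEGG request failed (" + detail + "): " + uri, ex);
        }
    }
}
```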
