rabadash8820 / dotkegg
A .NET wrapper for the Kyoto Encyclopedia of Genes and Genomes (KEGG) API.
License: MIT License
Add a .cmd script that uses the AWS CLI to upload the entire hierarchy of help website files generated by SHFB to the hosting S3 bucket.
These tests should include:
Currently, all the derived KeggId classes have a KEGG database link token in their summaries. However, the token name is converted to text in IntelliSense. For example, the summary for TNumber becomes "Represents an identifier in the GenomeDbLink database", which of course looks silly. So it looks like we'll have to scrap this nice link from the documentation, for the sake of cleaner IntelliSense (which arguably is more important anyway).
The ECNumber is pretty different from the other derived KeggId classes, so its documentation kind of deserves its own issue...
Like the KeggDb tests, these just check that KeggIds can be constructed, have unique hash codes, and their equality methods/operators work as expected. Also add tests for the new TNumber.GetOrganismCode() method (this can just be [Ignore]d until we actually implement that method).
This is basically in keeping with the issue that we opened in AwsUtil. The !Sub notation is quite a bit cleaner than !Join, almost like String.Format() in .NET, so we should use it instead.
Main README file should have a "Contribute" section, with details about how to clone the repository, set up the solution, test it, etc.
We should take advantage of the new C#7 features, like expression-bodied methods and getters/setters. These changes would prevent collaborators from using Visual Studio versions older than 2017, however, so we should probably wait a while before making these changes.
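As a rough illustration of the syntax in question, here is a minimal sketch. The Entry class is hypothetical and not part of DotKEGG. Note that expression-bodied methods already compile under C# 6; it is the expression-bodied constructors and accessors, and the throw expression, that need C# 7.

```csharp
using System;

// Hypothetical class, used only to show the syntax.
public class Entry
{
    private string _name;

    // C# 7: expression-bodied constructor.
    public Entry(string name) => _name = name;

    // C# 7: expression-bodied get/set accessors and a throw expression.
    public string Name
    {
        get => _name;
        set => _name = value ?? throw new ArgumentNullException(nameof(value));
    }

    // C# 6 and later: expression-bodied method.
    public override string ToString() => "Entry: " + _name;
}
```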
This should be a more fluent syntax. Compare KeggInfo.ForDatabase(db) to KeggInfo.Database(db).
After using <ProjectReference> items in our .csproj files with #24, and after adding .editorconfig preferences, VS 2017 will now be required. This must be reflected in the main repo README.
Add XML comments, remarks, examples, etc. to the KeggId base class.
This parameter will be used to set the OriginPath property of the CloudFront distribution for the documentation website. That way, we can just update the parameter during each release, rather than forever making commits to update it.
Add unit tests to verify that the KeggDb Equals() method (and overloads) work correctly, and that the == and != operators work correctly. Also do this for KeggCompositeDb.
Since genome TNumbers map one-to-one to organism codes, we should add a GetOrganismCode() method on the TNumber class, similar to the GetGenomeId() method on the new OrganismCode class (#29). The method will probably require a call to KeggGet or KeggLink, so for now just throw a NotImplementedException. This method should also use the lazy pattern, so future calls don't require accessing KEGG. If the TNumber doesn't exist, then this method should throw an InvalidOperationException.
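One way the lazy pattern could eventually look is sketched below. The types are simplified stand-ins for the real DotKEGG classes, and the injected lookup delegate is a hypothetical placeholder for whatever KeggGet/KeggLink call ends up being used; a null result is assumed to mean "this T number does not exist".

```csharp
using System;

// Simplified stand-ins for the real DotKEGG types; constructors are hypothetical.
public sealed class OrganismCode
{
    public OrganismCode(string code) { Code = code; }
    public string Code { get; }
}

public sealed class TNumber
{
    // Hypothetical stand-in for a KeggGet/KeggLink call; returns null if the genome is unknown.
    private readonly Func<uint, string> _lookup;

    // Cached after the first successful lookup, so later calls skip KEGG.
    private OrganismCode _organismCode;

    public TNumber(uint number, Func<uint, string> lookup)
    {
        Number = number;
        _lookup = lookup ?? throw new ArgumentNullException(nameof(lookup));
    }

    public uint Number { get; }

    public OrganismCode GetOrganismCode()
    {
        if (_organismCode == null)
        {
            string code = _lookup(Number);
            if (code == null)
                throw new InvalidOperationException("T number " + Number + " does not exist.");
            _organismCode = new OrganismCode(code);
        }
        return _organismCode;
    }
}
```

This version is not thread-safe; concurrent first calls could each perform the lookup.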
DotKEGG needs to target .NET 3.5 so it can be used in Cell. However, most people using DotKEGG will probably be using .NET 4 or later, with access to the TPL and async/await programming, and will want those more modern features. Therefore, we should split DotKEGG into two assemblies:
Of course, both assemblies should be provided in the NuGet package. NuGet handles adding references to the proper assembly versions.
This is the class that will be returned by KEGG operations that would otherwise return massive result sets. It allows users to get results one or a few at a time. This class should have the following members:
Read(): returns a single KeggId, initialized from the next line of the Stream
ReadBlock(): saves a user-provided number of KeggIds, each initialized from a line of the Stream, into an array. Returns the number of KeggIds that were actually saved (in case EOS was reached before the requested number of KeggIds were saved)
ReadToEnd(): returns an array of KeggIds, initialized from remaining lines of the Stream
Don't forget API documentation!
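The three members above could be sketched roughly as follows. The class name LineResults, the parse delegate, the ReadBlock(buffer, count) signature, and the choice to return null from Read() at end of stream are all assumptions made for illustration; the issue does not specify them.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

// Hypothetical sketch: T stands in for a KeggId-derived type.
public sealed class LineResults<T> : IDisposable where T : class
{
    private readonly TextReader _reader;
    private readonly Func<string, T> _parse; // turns one line of text into an ID

    public LineResults(TextReader reader, Func<string, T> parse)
    {
        _reader = reader ?? throw new ArgumentNullException(nameof(reader));
        _parse = parse ?? throw new ArgumentNullException(nameof(parse));
    }

    // Returns the ID built from the next line, or null at end of stream (assumption).
    public T Read()
    {
        string line = _reader.ReadLine();
        return line == null ? null : _parse(line);
    }

    // Saves up to count IDs into buffer and returns how many were actually saved.
    public int ReadBlock(T[] buffer, int count)
    {
        if (buffer == null) throw new ArgumentNullException(nameof(buffer));
        if (count < 0 || count > buffer.Length) throw new ArgumentOutOfRangeException(nameof(count));

        int saved = 0;
        while (saved < count)
        {
            T item = Read();
            if (item == null) break; // end of stream reached early
            buffer[saved++] = item;
        }
        return saved;
    }

    // Returns IDs built from all remaining lines.
    public T[] ReadToEnd()
    {
        var items = new List<T>();
        for (T item = Read(); item != null; item = Read())
            items.Add(item);
        return items.ToArray();
    }

    public void Dispose() => _reader.Dispose();
}
```

Wrapping a TextReader keeps the sketch testable with a StringReader; the real class would presumably wrap the response stream from KEGG.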
Currently, all S3 Buckets in the documentation template have an AccessControl value of PublicRead. With CloudFront in front of them, these buckets can now be Private, so leaving them public is a potential security hole.
We forgot to refactor the calls to Enzyme() constructors in the test project, after changing its first parameter from a uint to an enum.
Add a simple Markdown file to the root of the repo, explaining the steps to take on a release branch. These should include the steps that were taken during the v0.2.0-alpha release, as well as updating documentation in the S3 Bucket, and updating the latest version numbers on the main README.
Then, it will be easy to see what should be included in a future publish script!
Sign the main DotKEGG assembly with a .pfx file.
Create an AWS CloudFormation template to define the cloud resources necessary to host the DotKEGG static documentation site. Template should be in YAML format and define an S3 bucket as follows:
S3 bucket
DNS routing
Currently, both https://docs.dotkegg.net and https://docs.dotkegg.org can be used to view the documentation website. Both of these domains should be redirecting to https://docs.dotkegg.com. This article seems to present a solution.
This class will provide a strongly typed representation of an organism code, effectively wrapping a string such as "hsa" or "ece". It should not inherit from KeggId, because it's not a number. It should have the following members:
KeggDb, and all the database types that derive from it, should be made thread-safe. That means all the Db.Instance properties should have locks on the null check. None of the other methods/properties really pose concurrency issues.
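A minimal sketch of a locked null check is shown below, using a stripped-down stand-in for PathwayDb (the real class has more members and derives from KeggDb). A plain lock is used rather than Lazy<T>, since Lazy<T> only arrived in .NET 4 and would not be available to a .NET 3.5 build.

```csharp
// Simplified stand-in for a KeggDb-derived singleton.
public sealed class PathwayDb
{
    private static readonly object _instanceLock = new object();
    private static PathwayDb _instance;

    private PathwayDb() { }

    public static PathwayDb Instance
    {
        get
        {
            // The null check and the assignment happen under the same lock,
            // so two threads can never both construct an instance.
            lock (_instanceLock)
            {
                if (_instance == null)
                    _instance = new PathwayDb();
                return _instance;
            }
        }
    }
}
```

Taking the lock on every access is the simplest correct form; a static readonly field initialized inline would avoid the lock entirely if lazy construction turns out not to matter.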
The following buckets should never contain any objects. Therefore, they should all be given a bucket policy in the CloudFormation template that explicitly allows only the most basic S3 operations.
AWS CloudFormation RecordSet resources have a HealthCheckId property, of type AWS::Route53::HealthCheck. For our first website that may get real traffic, this seems like something we should look into implementing...
Using the WebClient.BaseAddress property, we can specify the KEGG REST domain in just one place.
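For example, something along these lines, where RestClientFactory and its members are hypothetical names and the base URL is assumed to be the public KEGG REST endpoint. Relative addresses passed to DownloadString() are resolved against BaseAddress.

```csharp
using System.Net;

// Hypothetical helper; the real code would live in the KEGG REST wrapper class.
public static class RestClientFactory
{
    // Assumed KEGG REST domain, defined in exactly one place.
    public const string BaseUrl = "http://rest.kegg.jp/";

    public static WebClient Create()
    {
        return new WebClient { BaseAddress = BaseUrl };
    }

    // Performs the KEGG "info" operation; "info/pathway" is resolved against BaseAddress.
    public static string GetInfo(WebClient client, string database)
    {
        return client.DownloadString("info/" + database);
    }
}
```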
Something just doesn't seem right about naming a type "Class". But if we rename it to "ECEnzymeClass", then we might as well just make it a direct member of the DotKEGG namespace and give it its own file.
Revert #13
We should move away from defining security resources in CloudFormation templates. For the reasons described in this issue, they should be defined manually.
With NUnit, tests are actually executed sequentially by default, even with Visual Studio's "Run In Parallel" option checked. Also, before NUnit 3.7 (which hasn't been released yet), test methods can only be run in parallel if they are in different fixtures. So, we need to add the Parallelizable attribute to the DotKEGG.Test assembly, but we probably won't see any real benefit until NUnit 3.7 is released.
See these NUnit Wiki articles for more info:
Check out the EditorConfig documentation to see how this is done. We just want to add typical Danware-style C# coding guidelines. C#7 features should not throw errors or warnings if left out, since we'll most likely need to support VS2015 (i.e., C#6) for a while yet.
AWS allows custom HTML pages to be displayed when 4xx and 5xx error codes occur. This would be a really nice feature to add, but would require more sophisticated use of HTML/CSS than we have right now... We put in an issue on the SHFB GitHub page to request error pages as a new feature.
Until then we'll just leave this in the backlog in case some future developer wants to be awesome and make such a page. If they do, the specific properties in the CloudFormation template are the CloudFront Distribution's CustomErrorResponses and the S3 Bucket ErrorDocument properties.
We are trying to move away from the static operation class pattern to using instance methods on the KeggDb classes. Compare:
KeggInfo.ForDatabase(PathwayDb.Instance)
vs
PathwayDb.Instance.Info()
The second form is much cleaner and more fluent. Luckily, since all KeggDb classes' Info() methods will return the same thing (an InfoResults object), we can just put this method on the KeggDb and KeggCompositeDb classes. Also, GenomeDb should have a GeneInfo() method, with one overload that takes a TNumber and one that takes an OrganismCode.
Don't forget XML comments on these methods!
Change the string argument to KeggInfo.ForOrganism() to an OrganismCode, and adjust the docs and unit tests accordingly.
Add unit tests for KeggDb and its derived classes. This includes, but is not limited to:
Have a basic example showing how to create a KeggId derived instance using either the derived class's constructor, or a derived KeggDb.Entry() type method. This example should have some sample output of KeggId.ShortForm and KeggId.DBGETForm() calls.
We should be using the leaner .csproj format offered in Visual Studio 2017, particularly the <PackageReference> element for referencing NuGet packages more succinctly. By doing so, we should also be able to use the new package features in the VS Project Properties page. The end goal of all of this is to make it easier to make DotKEGG target multiple versions of .NET.
Here are a couple of webpages with more information about migrating from old .csproj and package.config files to the newer format:
We should place the S3 Bucket currently hosting the documentation in a CloudFront distribution to decrease latency. All docs subdomains will then ALIAS to that distribution, rather than the bucket itself. Here is a link to the AWS::CloudFront::Distribution reference page.
TNumbers represent genome entries, not organisms, so this makes more sense.
Make sure construction, GetGenomeId(), and equality operators/methods work as expected, and that hash codes are unique. We can also remove some of the KeggInfo unit tests for invalid organism codes, since that logic has been moved to the OrganismCode constructor.
All database types inheriting from KeggDb should have <inheritdoc/> in the XML comments of their various methods/properties. That way, we won't have to copy/paste the same comments between all these classes.
All KEGG ID types inheriting from KeggId should have <inheritdoc/> in the XML comments of their various methods/properties. That way, we won't have to copy/paste the same comments between all these classes.
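A small sketch of what this looks like in practice follows. The classes are simplified stand-ins, and the "T" plus five digits formatting of ShortForm is an assumption made for the example. The summary is written once on the base member, and documentation tools that support the tag, such as SHFB, copy it to the override.

```csharp
// Simplified stand-ins for the real DotKEGG types.
public abstract class KeggId
{
    /// <summary>Gets the short form of this identifier.</summary>
    public abstract string ShortForm { get; }
}

public sealed class TNumber : KeggId
{
    private readonly uint _number;

    /// <summary>Creates a T number from its numeric part.</summary>
    public TNumber(uint number) { _number = number; }

    /// <inheritdoc/>
    public override string ShortForm
    {
        // Assumed format: "T" followed by a zero-padded, five-digit number.
        get { return "T" + _number.ToString("D5"); }
    }
}
```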
The KEGG list operation will be implemented as a set of methods on our various KeggDb classes. We are moving away from the static operation class pattern because it makes a lot more sense to have operations be actions that a KeggDb instance can perform. Compare:
KeggList.InDatabase(PathwayDb.Instance)
vs
PathwayDb.Instance.ListPathways()
The second form is a bit cleaner and more fluent, and allows us to have very descriptive method names on each KeggDb class (ListModules(), ListCompounds(), etc.). Of course, this will require a little more dev effort, as we must implement the methods in each class, not some base abstract class.
Some notes on the List() methods:
ID derives from KeggId.
ListResults<T> data structure (faster and easier to get results a couple at a time)
ListGenomeIds(): returns a ListResults<TNumber> of all genomes
ListOrganismCodes(): returns a ListResults<OrganismCode> of all organisms
ListGenesInGenome(TNumber): returns a ListResults<GeneId> of all genes in the provided genome
ListGenesInOrganism(OrganismCode): returns a ListResults<GeneId> of all genes belonging to the provided organism
ListPathways: Returns a ListResults<MapNumber> of all KEGG Pathways
ListPathwaysInOrganism(OrganismCode): Returns a ListResults<MapNumber> of all pathways belonging to the provided organism
ListPathwaysInGenome(TNumber): Returns a ListResults<MapNumber> of all pathways belonging to the provided genome
ListPathways: Returns a ListResults<MNumber> of all KEGG Modules
ListPathwaysInOrganism(OrganismCode): Returns a ListResults<MNumber> of all modules belonging to the provided organism
ListPathwaysInGenome(TNumber): Returns a ListResults<MNumber> of all modules belonging to the provided genome
According to this article, CloudFront distributions cannot yet be created with IPv6 support through CloudFormation. So we'll just keep this issue in the backlog until that support comes around.
The docs.dotkegg.com S3 bucket needs permissions in the CloudFormation template. We should thus define a new IAM group called "dotkegg-docs-admins" who have explicit permission to manage objects on that bucket, and that bucket only. This group should have no other AWS permissions, and S3 permissions on no other objects.
Currently, the only See Also link in the documentation is for the DotKEGG namespace. We should add some more, including:
We should have something on the Wiki page for this repo. So add a Home page that just refers users to the actual documentation hosted on S3.
These tests should include:
Don't forget examples, and any relevant remarks:
The KeggRestApi class should have try/catch blocks in all of its methods, to free calling code from having to handle WebExceptions.
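The issue does not say what a method should do once it has caught the exception, so the sketch below simply assumes a Try-style method that reports failure through its return value. SafeRestApi and the injected download delegate are hypothetical; the delegate stands in for the actual WebClient call so the behavior can be exercised without a network.

```csharp
using System;
using System.Net;

// Hypothetical sketch of the catch-and-report pattern.
public sealed class SafeRestApi
{
    private readonly Func<string, string> _download; // stand-in for a WebClient.DownloadString call

    public SafeRestApi(Func<string, string> download)
    {
        _download = download ?? throw new ArgumentNullException(nameof(download));
    }

    // Returns false instead of letting a WebException reach the caller.
    public bool TryGet(string path, out string body)
    {
        try
        {
            body = _download(path);
            return true;
        }
        catch (WebException)
        {
            body = null;
            return false;
        }
    }
}
```

Only WebException is caught here, so programming errors such as a null argument still surface normally.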
The various Info() methods should also be looked over in light of these changes.