Comments (2)
The fact that you can use Nextclade JSON to seed your particular database at all is a little miracle, and I would not recommend relying on it going forward. As mentioned in the docs, JSON output is unstable. Also, there will be massive breaking changes in the coming weeks in Nextclade v3.
The JSON format is used for internal communication between different parts of Nextclade, and as you've discovered, it is just a serialized internal struct. It naturally changes during routine development.
As a small research lab we are focusing on science. We don't have time to commit to maintaining a stable external JSON format at this point, and we will not have the resources to adjust to the requirements of downstream projects. We experiment and break things a lot, and we reserve the right to change the JSON format at any time without warning.
So while you can submit a PR to change the format now (assuming there is no loss of functionality or correctness, we will likely accept it), I don't see it helping much in the long term.
One thing we considered to make the JSON output easier to use is providing a JSON schema for the format, but this would not help much in your use case.
Perhaps writing a middleware tool to ingest TSV output is a better solution for downstream projects? TSV output is much more stable: it follows semantic versioning. You can then maintain a stable output format of your liking, and open-source the tool for those in the community who happen to use your particular toolset.
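As a rough illustration of what such a middleware step could look like, here is a minimal sketch using only the Python standard library. The column names on the left of `COLUMN_MAP` (`seqName`, `clade`, `qc.overallStatus`) are assumptions about the TSV header and should be checked against your own output; the output field names, the `convert_rows` helper, and the sample row are hypothetical.

```python
import csv
import io

# Hypothetical mapping from Nextclade TSV column names (assumed; verify them
# against the header of your own TSV file) to field names that you control.
COLUMN_MAP = {
    "seqName": "sequence_name",
    "clade": "clade",
    "qc.overallStatus": "qc_status",
}


def convert_rows(tsv_file):
    """Yield one dict per TSV row, keeping only the mapped columns."""
    reader = csv.DictReader(tsv_file, delimiter="\t")
    header = reader.fieldnames or []
    # Fail loudly if an expected column is absent, instead of silently
    # writing incomplete records downstream.
    missing = [name for name in COLUMN_MAP if name not in header]
    if missing:
        raise ValueError(f"missing expected columns: {missing}")
    for row in reader:
        yield {out_name: row[src_name] for src_name, out_name in COLUMN_MAP.items()}


# Made-up sample input standing in for a real TSV file.
sample = (
    "seqName\tclade\tqc.overallStatus\ttotalSubstitutions\n"
    "sample1\t21L\tgood\t42\n"
)
records = list(convert_rows(io.StringIO(sample)))
print(records)
```

Keeping the mapping in one place means that a renamed or dropped upstream column only requires a change to `COLUMN_MAP`, while the downstream schema stays the same. Columns that are not listed in the mapping are ignored.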
Also, Spark seems like massive overkill to me. Internally, our scientists use TSV with pandas/polars and it works decently well. Maybe this could also fit your project?
If you have other ideas let us know.
from nextclade.
Thanks for your comments and suggestions.
I discussed this with a few of our team members and we will look into using the TSV output in lieu of JSON.
We do want to thank you for your work and for making this tool available.
It has enabled us to do research and helped us make some contributions in the public health space.