Comments (12)
thanks for weighing in, Nick. so if offering support for other languages is important, i'd vote that we not take this existing functionality out, and go ahead with updating the value of the default.
from schemas.
Is there a known use case for allowing hubverse admins to customize this directory?
@elray1 @LucieContamin pinging you both because you weighed in about this in today's Hubverse dev meeting.
Re: customizing the `model-output` directory, I'm trying to understand the balance between:
- flexibility for hub admins
- potential value of standardization to the Hubverse at large
- long-term maintenance/dev work for Hubverse devs
In this specific case, the impact of allowing people to customize the model-output directory is, perhaps, minor, but it's worth being explicit about the above tradeoffs (for this and for every customization we decide to accommodate).
A few examples/questions related to items 2 and 3 above:
- If there is ever a need to join model outputs across hubs (e.g., current and archived FluSight data, or seasonal hubs with multiple repos), having a standardized Hubverse directory name for the files makes this process easier and less error-prone
- As we move to the cloud, we are providing a path for people to access data without Hubverse context. My .02 is that we'd be doing people a service if they know exactly where to find the model outputs in a Hubverse hub without having to reference a config file.
- Having a configurable model-output directory adds a small but measurable task to our internal downstream processes (e.g., hubData, the cloud sync action, visualizations/dashboards). Parsing a hub's config file to find the model-output directory is one more thing that every tool built on top of these files will have to do.
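To make the last point concrete, here is a minimal sketch of the lookup that each downstream tool would need when the directory name is configurable. The config location (`hub-config/admin.json`) and the key name (`model_output_dir`) are assumptions made for illustration, not details confirmed in this thread, and `resolve_model_output_dir` is a hypothetical helper rather than part of any Hubverse package.

```python
import json
from pathlib import Path

# Fallback used when a hub does not set a custom directory name.
DEFAULT_MODEL_OUTPUT_DIR = "model-output"


def resolve_model_output_dir(
    hub_root,
    config_relpath="hub-config/admin.json",  # assumed config location
    key="model_output_dir",                  # assumed property name
):
    """Return the path of a hub's model-output directory.

    Reads the hub's admin config if it exists and falls back to the
    default directory name when the config or the key is missing.
    """
    hub_root = Path(hub_root)
    config_path = hub_root / config_relpath
    dir_name = DEFAULT_MODEL_OUTPUT_DIR
    if config_path.is_file():
        with config_path.open(encoding="utf-8") as f:
            config = json.load(f)
        # A missing or empty value also falls back to the default.
        dir_name = config.get(key) or DEFAULT_MODEL_OUTPUT_DIR
    return hub_root / dir_name
```

With a fixed directory name, the whole function reduces to `Path(hub_root) / "model-output"`, which is the simplification being weighed against flexibility for hub admins.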
Accepting internal complexity in the name of simplicity for Hubverse users is what we should be doing. In this case, however, it seems like there's potential to do the work to accommodate flexibility and also make things harder for a certain population of data consumers.
Aside from historical data/hubs, is there a compelling reason to let hubverse admins use a different name than `model-output`? Do we know what hubs are currently using a name other than `model-output`?
Thanks for the detailed information and context.
About allowing customization of the `model-output` directory: as it's currently used and well integrated, I would prefer not to go back on that decision and force it to be a fixed "model-output", for example.
The information is stored in a "standardized" .json and is easily accessible if necessary. It's also already integrated in the `hubData` and `hubAdmin` tools.
For the user, I don't think it's that big of an issue if you provide documentation and example code on how to load the data, especially if using the `hubData` package.
Also, it allows new hubs to choose their folder name, which might be useful, especially if they don't want to do everything in English, for example.
Currently, two of the US SMH are using another folder name.
Thanks for raising the issue, Becky, and for your thoughts Lucie!
Summing up thoughts so far:
- some existing hubs use names other than `model-output` for this folder
- this feature could be useful for hubs that want to organize in languages other than English
- we don't see any reason for a new English-based hub that is starting up to use anything different than `model-output`.
Here are some thoughts/reactions to points 1 and 2:
- My feeling is that in instances where we think the right way to go breaks with existing hubs, we shouldn't let that hold us back. The way we've handled this in the covid and flu forecast hubs is to restart the hubs in new repos following hubverse standards (this has included changing the name of the model-output folder for both of those hubs), with a planned migration of older data into archival repos that make the historical data available to hubverse tooling. So to me, the fact that the SMHs are using another folder name currently doesn't seem like a key reason to make a decision about hubverse infrastructure setup going forward, if that would allow for simpler tooling and lower maintenance in the future. I agree that the maintenance cost for this particular item is low, but if we can identify several things like this, it could add up to a simpler system overall.
- However, I think the idea of offering support for other languages is worth considering carefully, and is tied to this as Lucie points out. There is a related issue here. I might vote for making a decision about whether we plan to follow through on that issue, and letting that inform what we do about this.
FWIW, I don't feel extremely strongly about this one way or another, these are just my thoughts on this since I was asked :)
Thanks for the discussion here, and input from all the varied perspectives.
I envision localization as something that we might want to head towards in something like 9-18 months from now. Not on our short term list of "big" things to tackle, but on a list of fairly high priority "nice to haves". I don't know what our priority list will look like in 9+ months, but my inclination is to not engineer for something that is that far into the future, and instead to focus on the immediate use-cases in hand. I'd welcome input from others who have more software dev experience on this than I have! I do think that offering support for other languages is important, just not as important as locking down base hubverse functionality.
Great point about the need to consider localization--hadn't considered that!
@elray1 can you clarify what you mean by "existing functionality"?
Are there any active, fully-Hubverse-compatible hubs (i.e., not SMH) that use this feature?
What I mean by "existing functionality" is that hubData already supports handling of the model output directory as a configurable variable rather than a hard-coded thing, e.g. here: https://github.com/Infectious-Disease-Modeling-Hubs/hubData/blob/059aa13fd0f97b31ef52dca396b297385491149a/R/utils-connect_hub.R#L46-L50
Got it--yep, I realize we can already do that.
As I understand the downsides of removing this customization:
- historic hubs: we can fix things up in the data conversion process to hubverse format, if needed
- localization: we're at least 9-18 months out and can revisit as a part of that work
- Revising a past decision is annoying for sure
Are there other cons to removing this customization? The ones above don't outweigh the cons of non-standard folder names as outlined above, imo.
(Unless doing so would break an existing Hubverse hub, then agree we shouldn't do it)
To clarify the above...what I'm getting at is that allowing customization of the model-output folder name adds actual work in the present to maybe enable a future feature that isn't solidly roadmapped.
It's always a tricky call, and favoring the future usually feels like the right thing. In my experience, however, it's hard to predict more than a quarter or two out, and I've often regretted a code base with complexity that felt necessary at the time but ended up unused as priorities evolved.
Again, in this case the complexity isn't terrible, but it's worth considering for a small team and because it potentially impacts users (i.e., not just the dev team).
I don't feel like I have much more to add :)
I understand your points, and also continue to feel some reluctance about putting effort into removing stuff that's done and that we would certainly want as part of a localization feature which it sounds like is a fairly high priority medium-term thing, and I also continue to not really have strong feelings about this either way.
> I don't feel like I have much more to add :)
Same!
I can "disagree and commit" on this, though I won't prioritize the corresponding cloud sync issue to accommodate customized model-output directory names (because there is more pressing work, and someone using the action can easily change the directory name if they need to).
Returning to the original purpose of the issue, sounds like there's no good reason not to update the default value as you suggest?