etsy / sahale
A Cascading Workflow Visualizer
License: MIT License
Support other Cascading front ends, like the pure Java API in Cascading, Cascalog, etc.
Pull requests welcome. I'll take a swipe at this myself sooner or later, but it's low priority, since Etsy's piece of the Cascading landscape really focuses on Scalding.
In progress
Filter the list of taps to include only real directories (i.e., not in the temp directory) and display this information as another tab alongside the job links and step running times.
We have found that the shutdown hook causes instability, so it should be removed. These operations are too complex to run at JVM shutdown.
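A minimal sketch of the filtering step, assuming taps are identified by plain path strings and that the caller knows which temp directory (or directories) the cluster uses. `TapFilter` and `realTaps` are hypothetical names, not part of Sahale:

```scala
object TapFilter {
  // Appends a trailing slash so "/tmp" and "/tmp/" compare the same way.
  private def withSlash(p: String): String = if (p.endsWith("/")) p else p + "/"

  // Keeps tap paths that are not under any of the given temp directories,
  // preserving the original order and dropping duplicates.
  // `tempDirs` is an assumption: pass whatever temp location(s) your cluster uses.
  def realTaps(tapPaths: Seq[String], tempDirs: Seq[String]): Seq[String] = {
    val prefixes = tempDirs.map(withSlash)
    tapPaths.distinct.filterNot(path => prefixes.exists(prefix => withSlash(path).startsWith(prefix)))
  }
}
```

Normalizing both sides with a trailing slash keeps a path like `/tmpfoo/out` from being mistaken for something under `/tmp`. Fully qualified URIs (e.g. with an `hdfs://` scheme) would need their path component extracted first.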
When a job has map-only stages the progress bar can be incorrect (e.g. converging to 50% for a job with only map tasks).
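One plausible cause is averaging map and reduce progress equally for every step, so a step with no reducers can never climb past 50%. The sketch below assumes a hypothetical per-step record (`StepProgress` is illustrative, not Sahale's actual model) and counts only the phases a step actually has:

```scala
// Hypothetical per-step progress record; progress values are in [0, 1].
final case class StepProgress(mapProgress: Double, reduceProgress: Double, numReduceTasks: Int)

object Progress {
  // Map-only steps (no reduce tasks) report map progress alone;
  // otherwise map and reduce phases are weighted equally.
  def stepProgress(s: StepProgress): Double =
    if (s.numReduceTasks == 0) s.mapProgress
    else (s.mapProgress + s.reduceProgress) / 2.0

  // Overall flow progress as the mean of per-step progress (0.0 when there are no steps).
  def flowProgress(steps: Seq[StepProgress]): Double =
    if (steps.isEmpty) 0.0 else steps.map(stepProgress).sum / steps.size
}
```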
If a job has only a single stage it should be automatically selected in the graph when the page loads.
Internally Etsy has a great staff directory that we link to from the displayed usernames in Sahale. Make this pluggable so others can take advantage of this easily.
There are some log links in the UI that can be moved to the new "Links" tab. Some kind of link/button to open up that tab can replace the existing "View Hadoop Logs" link.
The goal is to deal with major version differences in Scala and the various Hadoop frameworks used in Sahale/FlowTracker. We're not sure we'll go this route yet, but we should look at the various ways it could be broken down and think it through carefully.
java.lang.NullPointerException
at com.etsy.sahale.FlowTracker.pushReport(FlowTracker.scala:271)
at com.etsy.sahale.FlowTracker.pushFlowReport(FlowTracker.scala:244)
at com.etsy.sahale.FlowTracker.updateFlow(FlowTracker.scala:157)
at com.etsy.sahale.FlowTracker.run(FlowTracker.scala:137)
at java.lang.Thread.run(Thread.java:745)
It would be nice to add properties to the Sahale-side config file (db-config.json) that map the expected flow.jt_url props coming from tracked jobs to shorthand cluster names, so those names could be displayed in the main Running/Completed jobs tables.
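Sahale's UI server is a node application, so the real change would live there; this Scala sketch only illustrates the lookup logic. `aliases` stands in for a hypothetical `jt_url`-to-name mapping read from db-config.json, with URLs assumed to be stored without a trailing slash, and `ClusterNames` is an illustrative name:

```scala
object ClusterNames {
  // Returns the shorthand cluster name for a job's flow.jt_url value.
  // Unknown URLs fall back to the raw URL; a missing value shows "unknown".
  def displayName(jtUrl: Option[String], aliases: Map[String, String]): String =
    jtUrl match {
      case Some(url) => aliases.getOrElse(url.stripSuffix("/"), url)
      case None      => "unknown"
    }
}
```

Falling back to the raw URL keeps jobs from unmapped clusters visible instead of hiding them behind a blank name.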
Greetings! I have a small request: When you cut a Sahale release, would you kindly tag it and push the tag to github? Currently, I only see a v1.1.0 and a v0.5 tag.
I'm automating a deployment of the node application, and to get v2.0.1 I have to poke through the commit history to find the commit where that version was set in pom.xml and use that commit's hash as the git ref in config management. I'd much rather just have a tag.
Thanks for your consideration!
When a job fails, pull out the links to error logs and display this information as another tab with the job links and step running times.
It should also make that pane default to the error-link tab instead of the running times.
After changing the Scala version and submitting the fat jar to the cluster, it throws this error:
NoClassDefFoundError: scala/reflect/ClassManifest
I think this happens because some classes are still compiled with an older Scala version.
Ideally, there would be cross-compiled jars in Maven (or other) repositories.
What do you think?
I would be glad to help if there is anything I can do.
Best
Hi there! I didn't find any end-to-end tests in Sahale. I maintain an open source library that creates Playwright/Jest tests and runs them in CI.
Would you like me to set up end-to-end tests on this project? If you'd rather I add tests to a different project that's fine too!
Targeting Apache Hadoop 2.7.0
We check at the end of each step to capture the full step running times. For this, we need to reject task minStartTimes that erroneously report 0 (or anything earlier than the workflow's start time); otherwise we end up with a UNIX timestamp instead of a task duration in seconds.
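A minimal sketch of the rejection rule, assuming all times are epoch milliseconds. The object and parameter names are hypothetical, and falling back to the flow start time when no valid task start remains is an assumption:

```scala
object StepTiming {
  // Step running time in seconds. Task start times of 0, or earlier than the
  // flow's start, are treated as bogus and dropped before taking the minimum.
  def stepRunningSeconds(taskStartTimes: Seq[Long], flowStartMillis: Long, stepEndMillis: Long): Long = {
    val valid = taskStartTimes.filter(_ >= flowStartMillis)
    // Assumption: if every reported start is bogus, measure from the flow start.
    val minStart = if (valid.isEmpty) flowStartMillis else valid.min
    math.max(0L, (stepEndMillis - minStart) / 1000L)
  }
}
```

Because a real flow start time is always later than 0, the single `>= flowStartMillis` check covers both bogus cases.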
Instead of hard-coding the monitoring server in the flow-tracker.properties file, would it be a good idea to pass it as a parameter to Cascading/Scalding jobs?
Similar to --track-job, we could also provide --host and --port.
This way, reassembling the fat jar won't be needed if the host server changes.
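A sketch of how command-line values could take precedence over the properties file. The `--host`/`--port` flags are the proposal above; the property keys `host` and `port` and the `TrackerEndpoint` object are hypothetical placeholders, not Sahale's actual configuration keys:

```scala
import java.util.Properties

object TrackerEndpoint {
  // Turns "--key value" pairs into a map; flags without a value (e.g. --track-job) are skipped.
  def parseArgs(args: Array[String]): Map[String, String] =
    args.sliding(2).collect {
      case Array(k, v) if k.startsWith("--") && !v.startsWith("--") => k.stripPrefix("--") -> v
    }.toMap

  // Resolves the tracker host and port: command-line values win over the properties file.
  // The "host"/"port" property keys are hypothetical.
  def resolve(cliArgs: Map[String, String], props: Properties): (String, Int) = {
    val host = cliArgs.get("host").orElse(Option(props.getProperty("host")))
      .getOrElse(throw new IllegalArgumentException("no tracker host configured"))
    val port = cliArgs.get("port").orElse(Option(props.getProperty("port")))
      .map(_.trim.toInt)
      .getOrElse(throw new IllegalArgumentException("no tracker port configured"))
    (host, port)
  }
}
```

A Scalding job could pass values from its own parsed `Args` instead of the hand-rolled `parseArgs`, which is only there to keep the sketch dependency-free.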
In progress