Comments (16)
@zstumgoren, yes, the Wisconsin system is based on three factors: the revised flag, the sort order and the company name. It passes my eyeball test.
I'm considering having it raise a hard error rather than just log a message when no ancestor filing is found. That might prevent some new data entry practice by the state from introducing silent bugs.
The policy is limited to WI thus far. The code in the current PR wouldn't affect anywhere else.
from warn-transformer.
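A minimal sketch of the hard-error idea above, for illustration only. It assumes each row is a dict with hypothetical `company` and `is_revised` keys and that rows keep the state's sheet order; `find_ancestor` and `AncestorNotFoundError` are made-up names, not taken from the PR.

```python
class AncestorNotFoundError(ValueError):
    """Raised when a revised filing has no earlier filing to supersede."""


def _normalize(name):
    # Compare company names case-insensitively and ignore stray whitespace.
    return " ".join(name.split()).lower()


def find_ancestor(rows, index):
    """Return the index of the filing superseded by the revised row at `index`.

    Assumption: the ancestor is the row directly above in sheet order,
    and it must carry the same company name.
    """
    row = rows[index]
    if not row["is_revised"]:
        raise ValueError(f"Row {index} is not flagged as revised")
    if index == 0 or _normalize(rows[index - 1]["company"]) != _normalize(row["company"]):
        # Fail loudly instead of logging, so a change in the state's
        # data entry habits cannot slip through unnoticed.
        raise AncestorNotFoundError(
            f"No ancestor filing found for {row['company']!r} at row {index}"
        )
    return index - 1
```

Raising instead of logging trades some pipeline robustness for visibility: a run would stop as soon as the sheet stops matching the assumed pattern.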
If the state's raw data has an amendment column, could we use that to strike certain records?
from warn-transformer.
Another thought: would a true/false boolean called something like is_amendment be a reasonable way to start flagging these?
from warn-transformer.
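As a rough sketch of what such a flag could look like, the snippet below adds an is_amendment boolean to a row by scanning free-text columns for marker words. The column name `notes`, the marker list, and the function name `flag_amendment` are all assumptions for illustration; each state's raw data would need its own rules.

```python
import re

# Hypothetical marker words; real states may use different wording.
AMENDMENT_PATTERN = re.compile(r"\b(revis\w*|amend\w*|update\w*)\b", re.IGNORECASE)


def flag_amendment(row, columns=("notes",)):
    """Return a copy of `row` with an `is_amendment` boolean added."""
    # Join the chosen columns, treating missing or None values as empty text.
    text = " ".join(str(row.get(column) or "") for column in columns)
    return {**row, "is_amendment": bool(AMENDMENT_PATTERN.search(text))}
```

This only marks records; deciding which earlier record an amendment supersedes is a separate step.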
Yeah, good start! I think what we really want to know is whether something was amended, but that's obviously harder to flag.
from warn-transformer.
In Wisconsin's case, revisions seem to always appear below the original, which could be used to exclude records superseded by amendments.
In this case, we want to keep revision 2 and discard revision 1 and the original.
from warn-transformer.
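The Wisconsin pattern described above could be sketched roughly as follows. This assumes rows are dicts in sheet order with hypothetical `company` and `is_revised` keys; `drop_superseded` is an illustrative name, not the project's actual function. A revised row with no matching row directly above it is simply kept in this sketch.

```python
def drop_superseded(rows):
    """Keep only the latest filing in each run of revisions.

    Assumption: a revised row always sits directly below the filing
    it replaces and shares that filing's company name.
    """
    kept = []
    for row in rows:
        is_same_company = (
            bool(kept)
            and kept[-1]["company"].strip().lower() == row["company"].strip().lower()
        )
        if row["is_revised"] and is_same_company:
            # The row above is superseded by this revision, so strike it.
            kept.pop()
        kept.append(row)
    return kept
```

Given an original followed by revision 1 and revision 2 for the same company, only revision 2 survives, because each revision strikes the row kept just before it.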
Hmm. There must be some way to suss this out.
from warn-transformer.
I think it's going to be a state-by-state fight, because the date and amount can change and I suspect in rare cases the location can change, too. It's pretty clear to me what to do in WI, but other states may not be so straightforward.
from warn-transformer.
Gotcha. I think a first step is to go state by state and try to mark amendments. I'm going to try that first, and then we can figure out how to handle them next.
from warn-transformer.
I went through all of the CSV files. I only found one other state, Iowa, that had a clear and obvious indicator of an amendment. That was added in #69.
Have you seen other states where you know there to be amendments, @chriszs?
I wonder if @zstumgoren has some wisdom for us here, as well.
from warn-transformer.
I've drafted an addition to our transformer system for excluding the ancestors of amendments. You can find that here. It encodes my latest opinions on WI and IA. For WI, it strikes the previous record in the sheet if the name is the same. For IA, it does nothing, because I can't spot any duplication introduced by the amendments.
from warn-transformer.
Yes, I think. IL comes to mind because I did that recently. It's at the bottom of the spreadsheet there. I'd have to go back and look at other states to see how they handle it. Some just add something like "UPDATE" to the record and then update it, so that may not require special handling.
from warn-transformer.
Hey @palewire, I hadn't noticed amendments in other states, but I'd defer to you and @chriszs at this point on where else this may apply. I do like the strategy of giving precedence to the most recently filed amendment, which feels very much in line with how FEC filings work. Alas, it sounds like in WARN-land we need to devise a heuristic for identifying related records in an "amendment chain". @palewire, am I reading your PR correctly that the v1 strategy is based primarily on identification of a repeated company name?
from warn-transformer.
@chriszs, when you have a minute can you point out to me what you see in the IL data?
In the meantime, I am going to merge this PR.
from warn-transformer.
So this may be moot if the state's live data portal provides a better alternative to the archive page we're using (I've put this in a ticket), but the archive has these supplemental notices at the bottom of sheets:
The employee number is additional, so we may not be inflating that, but it's unclear whether we're handling this well in every case.
from warn-transformer.
I think we've got this handled. Correct me if we don't.
from warn-transformer.