Comments (14)
Here's a CSV download link that we can use in the crawler: https://docs.google.com/spreadsheets/d/e/2PACX-1vT17qv7NxgWJnqmJJiGTncmAQeWI2QKW9Z92CZOXxWJi071xJr5V8CxtnB3AxgFkFZLCg2eGgBizxXs/pub?output=csv
from crawler-planning.
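Here is a minimal sketch of how a crawler could consume the published CSV link above using only the Python standard library. It assumes the sheet's first row holds the column headers and that the export is UTF-8. The function names `fetch_csv_text` and `parse_rows` are hypothetical and are not part of any crawler framework.

```python
import csv
import io
import urllib.request

# Published Google Sheets CSV link from the comment above.
CSV_URL = (
    "https://docs.google.com/spreadsheets/d/e/"
    "2PACX-1vT17qv7NxgWJnqmJJiGTncmAQeWI2QKW9Z92CZOXxWJi071xJr5V8CxtnB3AxgFkFZLCg2eGgBizxXs"
    "/pub?output=csv"
)


def fetch_csv_text(url: str = CSV_URL, timeout: float = 30.0) -> str:
    """Download the CSV export and decode it (UTF-8 is an assumption)."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.read().decode("utf-8")


def parse_rows(text: str) -> list[dict[str, str]]:
    """Parse CSV text into one dict per row, keyed by the header row."""
    reader = csv.DictReader(io.StringIO(text))
    # Skip rows whose cells are all empty, which are common in hand-edited sheets.
    return [
        row
        for row in reader
        if any(isinstance(value, str) and value.strip() for value in row.values())
    ]
```

Usage: `rows = parse_rows(fetch_csv_text())` downloads the sheet once and returns one dict per non-empty data row.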
Let's not make a `Sanction` here. I would just stuff the text into `entity.add('program', ...)` and tag it as `debarment`.
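To illustrate this suggestion without pulling in the real crawler framework, here is a sketch built on a hypothetical `Entity` stand-in that, like FollowTheMoney entities, stores every property as a list of values. The column names `Business enterprise` and `Section`, the `Company` schema choice, the ID scheme, and the `make_entity` helper are all assumptions for illustration; the actual crawler would use its framework's own entity API.

```python
from dataclasses import dataclass, field


@dataclass
class Entity:
    """Hypothetical stand-in for a FollowTheMoney-style entity.

    Every property is multi-valued, so add() appends instead of overwriting.
    """

    schema: str
    id: str
    properties: dict[str, list[str]] = field(default_factory=dict)

    def add(self, prop: str, value: str) -> None:
        value = value.strip()
        if not value:
            return  # ignore empty cells
        values = self.properties.setdefault(prop, [])
        if value not in values:
            values.append(value)


def make_entity(row: dict[str, str]) -> Entity:
    """Map one spreadsheet row to a company entity, without any Sanction object."""
    name = row["Business enterprise"]  # assumed column name
    # Naive, hypothetical ID scheme for illustration only.
    entity = Entity(schema="Company", id=f"ohchr-{name.lower().replace(' ', '-')}")
    entity.add("name", name)
    # The section text goes into the free-text `program` property...
    entity.add("program", row["Section"])
    # ...and the entity is tagged with the `debarment` topic.
    entity.add("topics", "debarment")
    return entity
```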
The US just started sanctioning these; we should do a one-off extract of this list.
Here's a Google Sheet with the data. Send me an access request, as it still needs quite a bit of formatting: https://docs.google.com/spreadsheets/d/1KszxKHQ6VTkMCQfjBkaPace5DdbDv7e9gIFmJrPtyy0/edit#gid=833768193
Let's use `debarment` as both the topic and the collection.
Hi! I could take this one. I assume you generated the spreadsheet semi-automatically from the PDF? Should we include it in the repository or keep hosting it on Google Sheets?
We have a whole folder of these and just fetch them as CSVs. Let me see if I can do this from mobile.
Nope. Gotta do it tomorrow :/
No problem!
Great, I'll make a preliminary crawler. I'm not sure exactly how to create the `Sanction` entities for this: is the authority OHCHR? Do we have a reference for the US sanctioning these (some interesting names in there...)?
> Let's not make a `Sanction` here. I would just stuff the text into `entity.add('program', ...)` and tag it as `debarment`.
Two questions:
- Is the `Program` entity documented anywhere? I can't find any details on it in the data dictionary. Is it just free text?
- Companies that have been removed from the list would be useful to add with an end date, but without a `Sanction` we can't do that. Should I simply omit them from the list?
Regarding `entity.add('program', ...)`: that refers to the `program` property of `Thing` and its descendants. Search for `Thing:program` at https://www.opensanctions.org/reference/#schema.LegalEntity
Regarding companies that were removed from the database but are still included in the spreadsheet, I'm not sure how to treat them in this case. @pudo?
From the PDF:
- Of the 112 business enterprises included in the 2020 database report A/HRC/43/71, OHCHR found reasonable grounds for the removal of 15 business enterprises on basis that they were ceasing or were no longer involved in one or more of the listed activities in the Occupied Palestinian Territory, according to the standard described above. They were, as a result, removed from the updated database set out in Section A below.
Section A is the table of companies whose `Section` value is "A. Business enterprises no longer involved in listed activities". Judging by the number of entries, Section A lists the companies that were indeed removed from the database; they are only included here for clarity.
With sanctions, it seems we list them as long as they're on the official list, but stop adding the `sanction` topic. With debarments, we usually continue listing them, with an end date on the `Sanction` entity.
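For comparison, the debarment convention described in this comment (keep listing the company, but put an end date on its `Sanction`) can be sketched with plain dicts. The keys borrow FollowTheMoney-style property names (`entity`, `authority`, `program`, `endDate`), but the record layout and the helpers `sanction_record` and `is_active` are hypothetical stand-ins, not the crawler framework's API. The OHCHR authority is the unconfirmed guess raised earlier in the thread, and any end date you pass in is a placeholder.

```python
from datetime import date
from typing import Optional


def sanction_record(entity_id: str, program: str, end_date: Optional[date] = None) -> dict:
    """Build a hypothetical Sanction-like record for one listed company.

    An end date marks the listing as no longer in force, while the company
    itself would still be emitted (the debarment convention).
    """
    record = {
        "schema": "Sanction",
        "entity": entity_id,
        "authority": "OHCHR",  # assumed authority, not confirmed in the thread
        "program": program,
    }
    if end_date is not None:
        record["endDate"] = end_date.isoformat()
    return record


def is_active(record: dict, today: date) -> bool:
    """Treat a listing as active if it has no end date or the end date is still in the future."""
    end = record.get("endDate")
    return end is None or date.fromisoformat(end) > today
```

This is only the alternative that was considered. The thread settles on not creating `Sanction` entities for this list at all.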
I'd vote for dropping them. I don't think we have very good grounds for listing them if they were delisted around the first time we mention them.
> Regarding `entity.add('program', ...)`: that refers to the `program` property of `Thing` and its descendants. Search for `Thing:program` at https://www.opensanctions.org/reference/#schema.LegalEntity
Yes, I saw it there, but I can't find a description of what it actually means, either there or in the FollowTheMoney ontology. For the moment, I am putting in the section name, e.g. "B. Business enterprises involved..." or "C. Business enterprises involved as parent companies".
I'll just drop the ones from Section A; that makes sense to me.
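The approach agreed on here (drop Section A, keep the section name as the `program` value for everything else) could look roughly like this as a row filter. The `Section` column name and the `"A."` prefix check are assumptions based on the section titles quoted in this thread, and `select_listed` is a hypothetical helper.

```python
def select_listed(rows: list[dict[str, str]]) -> list[dict[str, str]]:
    """Keep only rows still on the database, dropping Section A (removed companies)."""
    kept = []
    for row in rows:
        section = (row.get("Section") or "").strip()
        if not section:
            continue  # skip rows without a section value
        if section.startswith("A."):
            continue  # Section A: no longer involved in listed activities, so not listed
        kept.append(row)
    return kept
```

Each kept row's `Section` text can then go into the `program` property unchanged.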
Related Issues (20)
- BIS Entity List
- BIS Unverified List
- BIS Military End User List
- Council Regulation (EU) 2022/398
- Council Implementing Regulation (EU) 2022/2476
- US Nonproliferation Sanctions
- US Terrorist Exclusion List
- US Section 7031(c) of the DoS, Foreign Operations, and Related Programs Appropriations Act
- European Council Decision 2014/145
- European Council Decision 2022/399
- European Council Decision 2022/2477
- European Council Decision 2022/2478
- European Council Decision 2023/2871
- Luxembourg Administrative sanctions
- Czech Republic National Sanctions List
- Organizations designated as terrorist by Bahrain
- Executive Order 13959
- parltrack as source of european MEPs
- Armenia National Assembly
- FCC Covered List