kanedata / find-that-charity
Reconciliation for UK Charities and other nonprofit organisations, with elasticsearch back end.
Home Page: https://findthatcharity.uk/
License: MIT License
@drkane if you check the branch at https://github.com/NICVA/find-that-charity/tree/es_query_yaml I've made a couple of changes that might make ongoing maintenance easier. Let me know if you like the look of it and I'll merge it into NICVA/master and submit a PR.
es_config.yaml, which will be read in to the server command and converted to the JSON string. I find YAML easier to read and adapt than a JSON string, so it is good for ongoing development.
recon_config.yml, and it's used in /reconcile GET and POST now instead of the previous search query.
From user:
you mentioned that the service requires for NI charity numbers to be pre-pended with ‘NIC’. I’m afraid to say this runs contrary to how we record NI charity numbers (without the NIC) and the file required a bit of tweaking to make it work. I know that CCNI records charity numbers without the NIC and the Find that charity search works in a similar way (although the URLs for relevant charity pages all contain the ‘NIC’, oddly enough – e.g. https://findthatcharity.uk/charity/NIC100012). We actually have a data cleansing routine to remove ‘NIC’ from charity numbers too. It’s not the end of the world – it just adds an extra step to the process.
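One way to remove the extra step described above would be to accept NI charity numbers in either form and convert between them. The following is a minimal sketch of that idea, not the service's actual behaviour; the function names are hypothetical, and it assumes NI charity numbers are a run of digits with an optional "NIC" prefix.

```python
import re

# Assumption: an NI charity number is digits, optionally prefixed with "NIC".
NIC_PATTERN = re.compile(r"^\s*(?:NIC)?\s*(\d+)\s*$", re.IGNORECASE)


def bare_ni_number(value):
    """Return the number without the 'NIC' prefix (hypothetical helper)."""
    match = NIC_PATTERN.match(str(value))
    if match is None:
        raise ValueError(f"not an NI charity number: {value!r}")
    return match.group(1)


def with_nic_prefix(value):
    """Return the number in the 'NIC'-prefixed form used in the charity URLs."""
    return "NIC" + bare_ni_number(value)
```

With helpers like these, a file that records `100012` and a URL that uses `NIC100012` would both resolve to the same record.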
NI local authorities have the LAN prefix here: https://dev.findthatcharity.uk/orgid/source/lan
It should be GB-LANI: http://org-id.guide/list/GB-LANI
To allow autocomplete/suggestions based on user input.
Would need to incorporate typeahead functionality.
For an example of use (in jQuery): http://api.jqueryui.com/autocomplete/#option-source
https://www.elastic.co/guide/en/elasticsearch/reference/current/search-suggesters-completion.html
See implementation in similar service here:
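As a rough illustration of the completion suggester linked above (this is not the similar service's implementation), the sketch below builds a suggest request body and flattens a response into the label/value pairs that a typeahead widget such as jQuery UI autocomplete can consume. The field name `name_suggest` and the label `org-suggest` are hypothetical; the index would need a field mapped with the `completion` type, and the body shape follows the current Elasticsearch documentation, so older Elasticsearch versions may expect a different request format.

```python
SUGGEST_NAME = "org-suggest"  # arbitrary label for this suggestion (assumption)


def build_suggest_body(prefix, field="name_suggest", size=10):
    """Build a completion-suggester request body for the text typed so far."""
    return {
        "suggest": {
            SUGGEST_NAME: {
                "prefix": prefix,
                "completion": {"field": field, "size": size},
            }
        }
    }


def to_autocomplete_items(response):
    """Flatten a suggest response into label/value pairs for a typeahead."""
    entries = response.get("suggest", {}).get(SUGGEST_NAME, [])
    return [
        {"label": option["text"], "value": option["_id"]}
        for entry in entries
        for option in entry.get("options", [])
    ]
```

The body returned by `build_suggest_body` would be sent to the index's `_search` endpoint, and the list returned by `to_autocomplete_items` could be returned as JSON to the browser.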
Search results / reconciliation suggestions put XI-GRID results higher than official registers
https://dev.findthatcharity.uk/?q=queen%27s+university+belfast&orgtype=all.
Local authority search results can show the GB-SHPE- results before the government register results.
https://dev.findthatcharity.uk/?q=%22manchester+city+council%22&orgtype=all
Search results / OpenRefine suggestions should ideally prioritise IDs by org-id list % criteria or a defined ‘primary identity’, e.g. a registered charity's primary identity is charity, even if it is also registered as a company or social housing provider. A local authority's primary identity is local authority, even if it is also a social landlord or a research org.
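The ‘primary identity’ idea could be sketched as a re-ranking step applied to search results. This is an illustration only: the priority order below is a hypothetical example rather than an agreed list, the result dictionaries (with `id` and `score` keys) are an assumed shape, and sorting by register first and score second is just one possible trade-off between register priority and match quality.

```python
# Hypothetical priority order of org-id prefixes, most "primary" first.
PRIMARY_PRIORITY = ["GB-CHC", "GB-LANI", "GB-SHPE", "XI-GRID"]


def register_rank(org_id):
    """Return the position of the identifier's register in the priority list."""
    for rank, prefix in enumerate(PRIMARY_PRIORITY):
        if org_id.startswith(prefix + "-"):
            return rank
    return len(PRIMARY_PRIORITY)  # unknown registers sort last


def rerank(results):
    """Order results by register priority, then by descending match score."""
    return sorted(results, key=lambda r: (register_rank(r["id"]), -r["score"]))
```

Under this ordering an official register entry would appear above an XI-GRID entry for the same organisation even when the XI-GRID entry has the higher text-match score.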
I find that in these lines https://github.com/TechforgoodCAST/find-that-charity/blob/master/data_import/create_elasticsearch.py#L8-L28, because mapping is a tuple, INDEXES[0]["mapping"][0] = args.es_type fails (tuples being immutable).
Changing mapping from a tuple to a list solves the problem and doesn't seem to create any problems down the line.
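The failure can be reproduced in isolation. The sketch below uses a simplified stand-in for INDEXES (the index name and mapping contents are hypothetical, not copied from create_elasticsearch.py) and a `SimpleNamespace` in place of the parsed command-line arguments.

```python
from types import SimpleNamespace

# Stand-in for the argparse result; "charity" is an illustrative value.
args = SimpleNamespace(es_type="charity")

# With a tuple, item assignment raises TypeError.
indexes_tuple = [{"name": "example-index", "mapping": ("old_type", {"properties": {}})}]
tuple_error = None
try:
    indexes_tuple[0]["mapping"][0] = args.es_type
except TypeError as err:
    tuple_error = str(err)

# With a list, the same assignment succeeds.
indexes_list = [{"name": "example-index", "mapping": ["old_type", {"properties": {}}]}]
indexes_list[0]["mapping"][0] = args.es_type
```

After running this, `tuple_error` holds the TypeError message and `indexes_list[0]["mapping"][0]` is the new type name.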
Perhaps as a result of reconciliation process?
Maybe just CICs and CLGs
Hi @drkane
I think you might be familiar with http://org-id.guide? What I propose is adding the prefixes so that we have org-id format identifiers for the relevant registers appearing on the charity's page and as an object in the reconciliation api.
... where those exist.
P.S. great job on getting the finder online.
E.g. if a charity record hasn't been imported but the charity does exist, then scrape the data from the Charity Commission site.
Ltd > Limited
& > and
Remove apostrophes '
The >
YMCA
Charities > Charity
Inc > Incorporated
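The substitutions listed above could be applied with a small normalisation function before names are compared. This is a minimal sketch under stated assumptions: lower-casing, stripping other punctuation, and dropping "The" only when it is the first word are choices made here, not rules from the list, and the YMCA entry is left out because its intended replacement is not given.

```python
import re

# Whole-word replacements taken from the list above.
WORD_REPLACEMENTS = {
    "ltd": "limited",
    "charities": "charity",
    "inc": "incorporated",
}


def normalise_name(name):
    """Return a simplified form of an organisation name for matching."""
    text = name.lower()                    # assumption: compare case-insensitively
    text = text.replace("&", " and ")
    text = re.sub(r"['’]", "", text)       # remove straight and curly apostrophes
    text = re.sub(r"[^\w\s]", " ", text)   # assumption: drop other punctuation, e.g. "Ltd."
    words = [WORD_REPLACEMENTS.get(word, word) for word in text.split()]
    if words and words[0] == "the":        # assumption: only a leading "The" is removed
        words = words[1:]
    return " ".join(words)
```

The same function would need to be applied both to the indexed names and to the incoming query for the normalised forms to line up.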
"This website is supported by 360 giving"
Does using quotation marks make a difference to the search?
How does the closeness of the name match relate to position in results vs weighting for the different lists?
Clarity about ownership
Including filtering organisations?
Moved to: https://github.com/drkane/find-that-charity-scrapers
Make sure it is removed from readme.md too.
To the organisation info page
So we can spread the word about it
Currently not possible to use elasticsearch versions above 2.x with dokku. See:
This section should be just for further links, as there are hyperlinks via numbers themselves. Re-name ‘Further links’ and move to lower on page.
Can this be combined with the data source in boxes, e.g. a link to access the data, or an anchor point in the table, using the format on this page: https://dev.findthatcharity.uk/about#data-sources
What is the source of this data - area of operation, or based on postcode? Not sure of the utility of having this on the record pages.
Just leaving this up as an issue that isn't working for me but might work for others, would be good to get feedback.
Though /reconcile does define the preview window as per the OpenRefine Preview API, I can't get it to work (i.e. display the preview iframe on hover over a reconcile candidate), though preview windows do work for other services (e.g. OpenCorporates).
{
  "name": "charitysearch",
  "identifierSpace": "http://rdf.freebase.com/ns/type.object.id",
  "schemaSpace": "http://rdf.freebase.com/ns/type.object.id",
  "view": {
    "url": "http://localhost:8080/charity/{{id}}"
  },
  "preview": {
    "url": "http://localhost:8080/preview/charity/{{id}}",
    "width": 430,
    "height": 300
  },
  "defaultTypes": [{
    "id": "/charity",
    "name": "charity"
  }]
}
and this preview itself also works fine outside of OpenRefine, e.g. http://localhost:8080/preview/charity/NIC100012
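When debugging this, it may help to check the service metadata mechanically before looking at OpenRefine itself. The sketch below is a hypothetical helper, not part of the project: it checks that the `preview` object has a `url` containing the `{{id}}` placeholder plus integer `width` and `height`, and shows the URL that results for a given id. It does not diagnose why the iframe fails to appear.

```python
import json

# The relevant part of the /reconcile metadata quoted above.
SERVICE_METADATA = json.loads("""
{
  "name": "charitysearch",
  "preview": {
    "url": "http://localhost:8080/preview/charity/{{id}}",
    "width": 430,
    "height": 300
  }
}
""")


def expand(template, entity_id):
    """Substitute an entity id into a {{id}} URL template."""
    return template.replace("{{id}}", entity_id)


def check_preview(metadata):
    """Return a list of problems found in the metadata's preview settings."""
    preview = metadata.get("preview")
    if not isinstance(preview, dict):
        return ["no 'preview' object"]
    problems = []
    if "{{id}}" not in str(preview.get("url", "")):
        problems.append("preview.url has no {{id}} placeholder")
    for key in ("width", "height"):
        if not isinstance(preview.get(key), int):
            problems.append("preview.%s is not an integer" % key)
    return problems
```

An empty list from `check_preview` suggests the metadata itself is well-formed, which would point the investigation towards how the preview page is served or embedded.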
How to implement this?
FWIW, we scrape some data from the NI Charity Commission website: data that doesn't appear in the downloadable .csv (e.g. number of volunteers, trustees and employees, the regions it operates in, and a free-text description of what the charity does, how it meets the public benefit test, its charitable purposes, etc.).
Morph.io scraper and data here https://morph.io/BobHarper1/charity-commish-ni
Might be useful for people interested in text mining, and could also provide more info to this search tool.