innovate-inc / edg_metadata
EDG metadata on staging created for Innovate-Inc
The idea is to use EDG's custom compilations to manage the galleries - the top 50 most popular datasets are already managed in a compilation, visible via API here:
https://edg.epa.gov/metadata/rest/find/document?f=json&childrenof={9007D9FF-E18F-9A91-564F-5C4FF3FAB904}
If a record has a thumbnail, it seems to be included in the list of links. This search shows a couple of examples:
https://edg.epa.gov/metadata/rest/find/document?searchText=usgs&start=1&max=25&f=json
Below is a clumsily marked-up mockup.
Curious to see how feasible this is; it would definitely help out with our Google Analytics reporting. Right now the HTML title (<title>) is the same for all details pages: EPA Environmental Dataset Gateway. It would be very handy if we could set this title to match the title of the metadata record being displayed. Worth discussing.
Today a search can only be run on the main search page via POST, but the EDG has a rich REST API. Can we allow folks to link to the main search page with a search already run via syntax like this?
https://edg.epa.gov/metadata/catalog/search/search.page?searchText=EJScreen
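As a rough illustration of the link format proposed above, the sketch below builds such a URL from a search term. The helper name `search_link` is hypothetical, and whether search.page would actually honor a `searchText` query parameter is exactly what this ticket is asking for, so treat this as a client-side sketch only.

```python
from urllib.parse import urlencode

# Search page URL taken from the example in this ticket.
SEARCH_PAGE = "https://edg.epa.gov/metadata/catalog/search/search.page"


def search_link(search_text: str) -> str:
    """Build a link to the search page with the term passed as a query parameter.

    Hypothetical helper: the page does not necessarily read this parameter today.
    """
    return f"{SEARCH_PAGE}?{urlencode({'searchText': search_text})}"


if __name__ == "__main__":
    print(search_link("EJScreen"))
```

Using `urlencode` means that terms containing spaces or ampersands are escaped rather than breaking the query string.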
For reporting on a monthly basis - somehow capture or be able to report most frequently searched terms in just the last month.
This issue has two parts:
From: Harness, Catherine
Sent: Thursday, August 25, 2016 11:44 AM
To: Hultgren, Torrin [email protected]
Subject: EDG Staging Publishing Errors
Hi Torrin,
I was working with Mark to test the new OAR-OAP record thumbnails. They appear to be working in the production EDG, however we thought it would be best to test them in the staging version since that’s where Ana wants to see the thumbnails.
When attempting to create a harvest repository on the staging server, after selecting “Register resource on the network” and clicking the “Proceed” button, I get sent back to the homepage.
Then I thought we could easily upload a file for testing purposes, I get the following error when I hit the upload button:
Thanks,
Catherine
Catherine Harness
EPA National Geospatial Support Team
Innovate!, Inc. | [email protected] | 513-713-0260
Also need to ensure that direct links to "About" page are redirected to home page.
We logged this issue with Esri:
Esri/geoportal-server#254
and they implemented this fix:
Esri/geoportal-server@988628f
I'm not completely satisfied with the fix, but it's acceptable as a workaround. Could we grab this file, compile it, and implement it in our catalog?
The UI for the EDG Metrics page is clunky and confusing, but reworking it into something modern, clean, and intuitive like this interface might not require starting over from scratch.
The legacy metrics code has 2 components:
Editing the JSP code to adjust data for metrics is pretty straightforward/low effort (I was able to do it myself!) so even though the way xpath queries are handled in the database is a little clunky, I don't know that there's much value in replacing the back end.
I'm unclear on how challenging it might be to apply purely cosmetic changes to the existing Simile Exhibit front-end. There is an Exhibit3 framework, but development seems to have tapered off, and the UI doesn't look significantly better. Modern JavaScript frameworks seem to offer significantly more capabilities out of the box for interacting with data. The example above uses Esri's Calcite framework, which has this caution:
Calcite Web, while still a CSS Framework, has some profound differences from projects like Bootstrap or Foundation. Where Bootstrap and Foundation both aim to provide a robust set of patterns and utilities for the general, third party developer, Calcite Web only concerns itself with Esri projects. Calcite Web is not designed for a developer who is not directly working for Esri on Esri products or properties. In other words, every project created with Calcite Web will look like an Esri-branded site.
Looking like an Esri-branded site is fine, it's a clean and professional look, but this metrics page is several steps removed from any Esri products or properties, so we don't gain much by locking into an Esri framework, and we might lose out on widgets and functions available in the bigger frameworks. I know Bootstrap is very popular and is underneath the dashboard that Ana really likes, but I'm not sure whether we need a complete framework or might be able to take a widget approach using something like JqueryUI. The decision might also be determined by other requirements, but I will enter those as separate tickets #80 , #81 , #82 , #82 , #83, #84, with this ticket capturing the big-picture decision on what framework to use.
First ticket to track accomplishment of UUID fix.
I think we didn't pursue this issue because it was linking to staging where we didn't expect the search to work, but now that we've updated the data in staging, it's more apparent. The see more link submits the http-encoded version of the search string:
sys.collection%3a%22%7b9B7778AC-DE79-287A-2A79-F05863C8A212%7d%22
which the EDG search apparently can't handle properly (unless it's in an actual URL being interpreted as a REST query) so when triggering a search on the search page, the "See More" link needs to submit this equivalent version:
sys.collection:"{9B7778AC-DE79-287A-2A79-F05863C8A212}"
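The relationship between the two strings above is plain percent-decoding, which the sketch below demonstrates with Python's standard library. The helper name `decode_search_text` is hypothetical, and the actual fix would live in the page's own JavaScript or JSP code rather than in Python.

```python
from urllib.parse import unquote


def decode_search_text(encoded: str) -> str:
    """Percent-decode a search string before submitting it to the search form."""
    return unquote(encoded)


if __name__ == "__main__":
    encoded = "sys.collection%3a%22%7b9B7778AC-DE79-287A-2A79-F05863C8A212%7d%22"
    print(decode_search_text(encoded))
```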
The search terms from any search (web page, REST API, CSW, etc) should be captured and stored in a database table. The database table should contain two columns - one for the search term itself, and a second that contains the count of how many times that term was used in a search.
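A minimal sketch of the two-column table described above, using SQLite purely for illustration. The table name `search_terms`, the column names, and the lower-casing of terms are all assumptions; the production database and its schema conventions are not specified in this ticket.

```python
import sqlite3


def open_store(path: str = ":memory:") -> sqlite3.Connection:
    """Create the (hypothetical) two-column table: the term and its usage count."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS search_terms ("
        "term TEXT PRIMARY KEY, hits INTEGER NOT NULL)"
    )
    return conn


def record_search(conn: sqlite3.Connection, term: str) -> None:
    """Increment the counter for a term, inserting the row on first use."""
    normalized = term.strip().lower()  # assumption: case-insensitive counting
    if not normalized:
        return
    cur = conn.execute(
        "UPDATE search_terms SET hits = hits + 1 WHERE term = ?", (normalized,)
    )
    if cur.rowcount == 0:
        conn.execute(
            "INSERT INTO search_terms (term, hits) VALUES (?, 1)", (normalized,)
        )
    conn.commit()


def top_terms(conn: sqlite3.Connection, limit: int = 10):
    """Return the most frequently used terms, highest count first."""
    return conn.execute(
        "SELECT term, hits FROM search_terms ORDER BY hits DESC, term LIMIT ?",
        (limit,),
    ).fetchall()
```

A single running count per term cannot answer "most searched in the last month"; that kind of reporting would need a timestamp per search or per-period counts.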
Potential enhancements down the road:
Last remaining links from about page....
Let's aim to complete this by 6/24/06. If there are any challenges with ports or IIS, I'm happy to assist.
This page:
https://github.com/Innovate-Inc/EDG_metadata/blob/master/catalog/skins/lookAndFeel.jsp
programmatically references a single folder for javascript files, and right now it is set to v1. But there are much newer javascript folders available, and I think there might be some files that could be updated.
https://github.com/Innovate-Inc/EDG_metadata/tree/master/catalog/js
I might be wrong, but I'd appreciate a review/comparison of these files to be sure we're using the latest.
Clicking on keyword hints fires this URL:
https://edg-staging.epa.gov/metadata/rest/find/document?f=searchpage&searchText=keywords%3A%22soils%20%26%20land%22
which results in a null response. Log file attached.
metadata.2016-08-17.zip
This ticket will try to capture the major page layout changes.
Esri example has fantastic time slider filter tool - it would be awesome if we could incorporate this, but not a high priority.
Showing bar or pie charts in facet panes alongside filter elements would be a higher priority, as long as those charts are straightforward to implement and are linked with the facet functionality.
Determine whether incompatibility can be addressed or if we're truly stuck at 7.
Sending logs over.
This was a big challenge but it's finally working, hooray!
Envision this would be a row of 10 tabs that look similar to the "Featured Data Products" (6 thumbnails and a "See More" link to a full search page) but would be located under Popular Datasets.
The search syntax for each region is:
https://edg.epa.gov/metadata/rest/find/document?owner=Region%201&f=json
https://edg.epa.gov/metadata/rest/find/document?owner=Region%202&f=json
etc.
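The per-region URLs follow one pattern, so they can be generated rather than listed. The sketch below assumes the ten tabs correspond to Region 1 through Region 10; the helper name `region_url` is hypothetical.

```python
from urllib.parse import quote, urlencode

# REST endpoint taken from the examples in this ticket.
REST_FIND = "https://edg.epa.gov/metadata/rest/find/document"


def region_url(region: int) -> str:
    """Build the JSON search URL for one EPA region's records."""
    # quote_via=quote encodes the space as %20, matching the examples above.
    query = urlencode({"owner": f"Region {region}", "f": "json"}, quote_via=quote)
    return f"{REST_FIND}?{query}"


# Assumption: one tab per region, numbered 1 through 10.
REGION_URLS = {n: region_url(n) for n in range(1, 11)}
```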
There should also be a "Find my region" link that pops up a modal with an image map that shows the "US Map Split into regions" and allows a user to click on a region that closes the modal and selects the corresponding tab. All details for implementing this image map are at this link:
https://www.epa.gov/webguide/how-build-standard-us-national-maps
As filters are applied, they should appear in a list of "filters" at the top of the results so they can be quickly seen and possibly removed, analogous to Esri example site
Try to minimize the use of hardcoded URLs (localhost, edg-staging, etc.); obtain the URL either from the live path or from the main config file.
This is a placeholder ticket for capturing the revised order of facets (and potentially addition/deletion of facets). Emphasis should be on mandatory elements for validation - or possibly even a single button/facet that filters out completely valid records (showing invalid for any reason).
When they're stored as binary objects in the metadata.
Below is the email chain for context. Asking Esri for their thoughts before we get started on this. Will work to formulate more specific requirements.
From: Greene, Ana
Sent: Wednesday, February 22, 2017 8:59 AM
To: Hultgren, Torrin [email protected]
Cc: Pierson, Suzanne [email protected]; Harness, Catherine [email protected]; Suma Malothu [email protected]
Subject: RE: Full text search thoughts
Hi guys,
Did I ever respond to this? Just catching up…only 2 weeks behind on email…
I totally agree that the wildcard and fuzzy searches should be the default. And like the advanced search dialog. I’d like to go ahead and put all of this on our list of near term development projects.
Thanks,
Ana Greene, M.S., PMP
Environmental Dataset Gateway (EDG) Program Manager
Office of Environmental Information (OEI)
Office of Information Management (OIM)
U.S. Environmental Protection Agency
(o): 202-566-2132
(c): 571-232-7860
[email protected]
https://edg.epa.gov/
From: Hultgren, Torrin
Sent: Tuesday, February 07, 2017 7:26 PM
To: Greene, Ana [email protected]
Cc: Pierson, Suzanne [email protected]; Harness, Catherine [email protected]; Suma Malothu [email protected]
Subject: Full text search thoughts
Hi Ana,
I believe I’ve figured out the source of our continuing confusion about full text search. It was legitimately disabled years ago, but has been working for some time, yet perhaps not in the way we might expect, so I think there’s still some room for improvement, or at least adjustment. I think a lot of our confusion revolves around partial search terms and whether or not they’re considered a match. I think we can all remember a time when we used to have to be very careful about our search terms, and we couldn’t assume that search engines would appropriately match partial words or misspellings, yet these days we take it for granted. Lucene is quite capable of handling any match type we want it to, but the default is the old strict way. If we do a search for the first part of your email address, by default it will come up blank, even though there are records containing your email address:
https://edg.epa.gov/metadata/rest/find/document?f=searchpage&searchText=greene.ana
EDG has “advanced Lucene syntax” if anyone chose to read the help, and could apply a wildcard to their search, which just means that indexed terms that aren’t exact matches but contain the string are returned:
https://edg.epa.gov/metadata/rest/find/document?f=searchpage&searchText=*greene.ana*
Which gives us all 6 records that contain your email address. In theory this slows performance, but we’d need orders of magnitude more records in our index before we’d notice any difference. There’s a last option that’s kind of fun – though it doesn’t seem to work with the direct link, so you’ll have to try it manually. If you do a search for greene.ana~ it will conduct a “fuzzy search”, where it will include “misspellings” or words that are very similar – it should return a bunch of records with “Greenspace” in the title.
I’m not sure about you, but I think my own expectation these days is that wildcards and fuzzy searches would be the default – I’d prefer a search to return too many results that I could filter through or refine than too few. But that may also be because of an assumption that the search engine would do a good job of ranking/sorting those results so the most relevant ones would appear first, and I don’t know how valid an assumption that is with the EDG. I think we could figure out how to adjust the scoring/ranking algorithm under the hood of the EDG, but I’m not at all sure how we’d measure whether our tweaks were making search results more or less relevant. And if we were to make fuzzy searches the default, I wonder how we’d allow someone to opt out if they wanted a stricter match? Perhaps we could show an “advanced search” dialog if they wished:
http://www.lucenetutorial.com/lucene-query-builder.html
https://www.google.com/advanced_search
Anyway, curious to know your thoughts. Definitely been on the brain today.
Torrin Hultgren
EPA National Geospatial Support Team
Innovate!, Inc. | [email protected] | 703-922-9090 x737
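The three match styles discussed in the email chain above (strict, wildcard, fuzzy) differ only in how the term is decorated before it reaches Lucene. The sketch below shows that decoration; the function names are hypothetical, and it does not reflect how the EDG itself would implement a default.

```python
from urllib.parse import quote, urlencode

# REST endpoint taken from the links in the email above.
REST_FIND = "https://edg.epa.gov/metadata/rest/find/document"


def wildcard(term: str) -> str:
    """Match indexed terms that contain the string anywhere."""
    return f"*{term}*"


def fuzzy(term: str) -> str:
    """Match indexed terms that are similar to the string (Lucene '~' operator)."""
    return f"{term}~"


def searchpage_url(search_text: str) -> str:
    """Build a direct link that runs the search and shows the results page."""
    query = urlencode({"f": "searchpage", "searchText": search_text}, quote_via=quote)
    return f"{REST_FIND}?{query}"


if __name__ == "__main__":
    for text in ("greene.ana", wildcard("greene.ana"), fuzzy("greene.ana")):
        print(searchpage_url(text))
```

The email notes that the fuzzy form did not seem to work through the direct link, so the last URL printed here may need to be tried manually in the search box instead.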
So this URL is broken:
https://edg.epa.gov/metadata/catalog/search/resource/details.page?uuid={c5e1e678-1b6b-40ff-b8dc-a89938fb4814}
but this upper-case URL works:
https://edg.epa.gov/metadata/catalog/search/resource/details.page?uuid={C5E1E678-1B6B-40FF-B8DC-A89938FB4814}
and in contrast this URL works:
https://edg.epa.gov/metadata/catalog/search/resource/details.page?uuid={0fd2712b-62d0-4aaf-ab20-2cbfe8c26b30}
but this upper-case equivalent is broken:
https://edg.epa.gov/metadata/catalog/search/resource/details.page?uuid={0FD2712B-62D0-4AAF-AB20-2CBFE8C26B30}
Strict case sensitivity would seem to be an acceptable thing, but it has caused our team a whole lot of confusion and frustration. Would it be possible to adjust the code so that at the very least the UUID portion of the URL is not case sensitive, so both uppercase and lowercase versions work across the board, at all the different endpoints?
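One way to picture the requested behavior is a lookup that tries both casings of the UUID before giving up. The sketch below is only an illustration of that idea with hypothetical names; the real change would need to be made in the Geoportal Java code at each endpoint.

```python
import re

# Matches a UUID with or without surrounding braces, in either case.
UUID_RE = re.compile(
    r"^\{?([0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-"
    r"[0-9a-fA-F]{4}-[0-9a-fA-F]{12})\}?$"
)


def uuid_candidates(raw: str) -> list:
    """Return the upper- and lower-case braced forms of a UUID.

    Identifiers that are not UUIDs (custom IDs) are returned unchanged.
    """
    match = UUID_RE.match(raw.strip())
    if not match:
        return [raw]
    core = match.group(1)
    return ["{" + core.upper() + "}", "{" + core.lower() + "}"]


def find_record(raw: str, known_ids: set):
    """Return the stored identifier that matches either casing, or None."""
    for candidate in uuid_candidates(raw):
        if candidate in known_ids:
            return candidate
    return None
```

Trying both casings covers the two examples above; a record stored with mixed-case hex digits would still need a truly case-insensitive comparison.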
New concerns have been raised about needing to track what metadata records have been deleted. Proposed functionality - figure out what functions "delete" records from the database (manual deletion, harvest that occurs with metadata record no longer present, etc) and add an additional procedure to copy that metadata record and associated geoportal attributes to a new table in the database so that we have an archive of all deleted records. (Could this be a database stored procedure?). Ana would first like a rough level of effort estimate, but considers this a pretty high priority.
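On the question of whether this could be handled inside the database: a delete trigger is one possible approach, since it fires regardless of which code path removes the row. The sketch below uses SQLite and entirely hypothetical table and column names (`records`, `deleted_records`, `docuuid`, `title`, `xml`); the EDG's actual database engine and schema are not described here, so this only illustrates the pattern.

```python
import sqlite3

# Hypothetical schema: a live table, an archive table, and a trigger that
# copies each row into the archive just before it is deleted.
SCHEMA = """
CREATE TABLE records (docuuid TEXT PRIMARY KEY, title TEXT, xml TEXT);
CREATE TABLE deleted_records (
    docuuid TEXT,
    title TEXT,
    xml TEXT,
    deleted_at TEXT DEFAULT CURRENT_TIMESTAMP
);
CREATE TRIGGER archive_deleted BEFORE DELETE ON records
BEGIN
    INSERT INTO deleted_records (docuuid, title, xml)
    VALUES (OLD.docuuid, OLD.title, OLD.xml);
END;
"""


def make_db() -> sqlite3.Connection:
    """Create an in-memory database with the archive trigger installed."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    return conn


def delete_record(conn: sqlite3.Connection, docuuid: str) -> None:
    """Delete a record; the trigger archives it automatically."""
    conn.execute("DELETE FROM records WHERE docuuid = ?", (docuuid,))
    conn.commit()
```

Because the trigger runs in the database, manual deletions and harvest-driven deletions would both be archived without changing the application code that issues the delete.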
Login appears successful, but clicking on any site links redirects to homepage and logs out - acts like "log out" button.
In the staging version of the site, on the Details page for every metadata record there is currently a link that says "Discussion Forum". Ana would like this link to be moved to the right side of the screen to make it stand out, and the text be changed to "Share Your Feedback". There is already a big green "Share Your Feedback" link shown at the top of every page. Ana would like this big green button to be hidden on Details pages, but to remain on any other page.
The thumbnails look much better, but now it seems that they're being stretched to completely fill the allotted space. If this is the only way bootstrap handles these, ok, but it'd be nice if they maintained their original aspect ratio and left whitespace in the remaining area. This stackoverflow describes the same scenario, and while they're working in the Rails world, I think the CSS should be the same:
http://stackoverflow.com/questions/25448371/bootstrap-css-thumbnail-image-resize-responsive
Guidance for the EPA Template is here (inaccessible off the EPA LAN, basic content in comment below):
https://www.epa.gov/webguide/applications-and-one-epa-web-template
The HTML for the template is this page:
https://www.epa.gov/sites/all/libraries/template2/standalone.html
The main CSS file is here:
https://www.epa.gov/sites/all/libraries/template/s.css
and the main JS file is here:
https://www.epa.gov/sites/all/libraries/template/js.js
We do not expect this to be a quick, easy, or seamless conversion, but getting it cleaned up and polished will mean a great deal to OEI Management and the user community.
Per the email chain below, it appears that the output of the REST API is returning ISO-8859-1 even though the raw metadata records are being stored as UTF-8, which does funky things with some characters. It's not clear where this encoding switch occurs - is it just the HTTP header setting, or is there some constraint in Java? Is it an easy fix or something major? Let's investigate and/or ask Esri.
From: Hultgren, Torrin [mailto:[email protected]]
Sent: Wednesday, March 08, 2017 4:38 PM
To: Felsher, Maxwell (CGI Federal)
Cc: Greene, Ana; Suma Malothu; [email protected]; Harness, Catherine
Subject: RE: Character encoding of DCAT output?
Hi Max,
I can’t think of any reason the charset of the response should be restricted to ISO-8859-1 rather than the full domain of UTF-8, and it only seems to be applying to the REST API (https://edg.epa.gov/metadata/rest) rather than other URLs. I believe we should be able to fix it, but would you mind sharing an example of one of your records that had an encoding issue that we can use for testing?
The approach you’re working with is fine – it’s conducting a full search across all indexed fields, but seems to respond very quickly. To limit the search to just the fileIdentifier field, you could use this syntax:
https://edg.epa.gov/metadata/rest/find/document?f=dcat&searchText=fileIdentifier:A-280j-22
but if there’s a performance improvement, it’s all but impossible to tell. But actually, if all you’re looking for is a way to directly reference your own records, you may also use your own identifiers – the EDG will respect them:
https://edg.epa.gov/metadata/catalog/search/resource/details.page?uuid=A-280j-22
Might simplify things on your end?
Torrin
From: Felsher, Maxwell (CGI Federal) [mailto:[email protected]]
Sent: Wednesday, March 08, 2017 2:55 PM
To: Hultgren, Torrin [email protected]
Cc: Greene, Ana [email protected]
Subject: Character encoding of DCAT output?
Hi Torrin,
We were trying to search for some EDG records in the DCAT JSON-LD format (e.g., https://edg.epa.gov/metadata/rest/find/document?searchText=A-280j-22&f=dcat), and we ran into an issue with character encoding; our code was assuming it was in UTF-8, but now we see that the HTTP response specifies ISO-8859-1 in the Content-Type header. We’re fixing our code to not assume UTF-8, but I was wondering whether it was intentional to use ISO-8859-1?
(As an aside, we’re doing this in order to be able to retrieve the corresponding EDG URL for a particular dataset we put in our metadata. We search for our identifiers using URLs like the above and then parse the response and extract the landingPage property. That was the best option we could figure out, but if you have other suggestions, let us know.)
Best,
Max Felsher
Consultant, CGI Federal
Contractor to ORD (ScienceHub team)
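To make the symptom in the email chain above concrete, the sketch below shows what happens when UTF-8 bytes are read as ISO-8859-1, and how a client can read the charset from the Content-Type header instead of assuming one. The sample string and helper name are illustrative assumptions; this does not identify where the EDG's encoding switch actually occurs.

```python
def charset_from_content_type(header: str):
    """Extract the charset parameter from a Content-Type header, or None."""
    for part in header.split(";")[1:]:
        name, _, value = part.strip().partition("=")
        if name.strip().lower() == "charset" and value:
            return value.strip().strip('"')
    return None


# Illustrative sample text containing a non-ASCII character.
original = "café"
raw_bytes = original.encode("utf-8")

# Reading UTF-8 bytes as ISO-8859-1 yields the familiar two-character garbling.
garbled = raw_bytes.decode("iso-8859-1")

# Decoding with the charset the bytes were really written in restores the text.
restored = raw_bytes.decode("utf-8")

if __name__ == "__main__":
    print(charset_from_content_type("application/json;charset=ISO-8859-1"))
    print(garbled, restored)
```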
Obtain code from here:
https://github.com/Esri/geoportal-server/wiki/Geoportal-Server-Downloads
Merge with existing code base and commit so we can deploy to staging.
Install on staging and test...
Revisit issue #68 after EPA Template upgrade is complete.
The stylesheet used to display metadata at an endpoint like this (note the xsl=metadata_to_html_full):
https://edg.epa.gov/metadata/rest/document?id=%7B4806F6B7-E980-4307-89AD-9436DC377EE3%7D&xsl=metadata_to_html_full
never looked very good to begin with, and its usage was dropped because it didn't accommodate the new project open data dcat format - the page appeared blank. That xsl=metadata_to_html_full term is currently ignored by the application and the raw unstyled xml sent to the end user - an ok compromise, but not pretty.
Esri's desktop products do include more polished stylesheets (attached to this issue).
Stylesheets.zip
If possible, the goal is to switch out the old stylesheet with the desktop stylesheet, and then upgrade the desktop stylesheet to include the DCAT elements.
EJ and FRS
Can we eliminate one of these safely without breaking anything?
The WAFer application is designed to emulate a typical web accessible folder for harvesting metadata, however, it does not appear to be working for ftp://newftp.epa.gov in the same way that it works for ftp://ftp.epa.gov. We need to investigate why it's working for one and not the other and fix it if possible. In the internal\wafconfig.xml file, the two side-by-side configurations are:
<SOURCE type="FTP" shortName="ORD_NHEERL_WED" longName="U.S. EPA ORD-NHEERL-WED" serviceUrl="ftp://ftp.epa.gov/wed/ecoregions/gdg/" recurse="1" user="anonymous" pwd="" />
<SOURCE type="FTP" shortName="ORD_NHEERL_WED_New" longName="U.S. EPA ORD-NHEERL-WED New" serviceUrl="ftp://newftp.epa.gov/EPADataCommons/ORD/NHDPlusLandscapeAttributes/StreamCat/Documentation/Metadata/XML%20Files/" recurse="1" user="anonymous" pwd="" />
all other configurations can be ignored or commented out for the purpose of this issue. Additional details are being sent via email.
Some stakeholders want to switch to using DOIs (https://en.wikipedia.org/wiki/Digital_object_identifier) as their unique IDs, which seems to be OK in the database but isn't supported as a direct link to the record using the standard syntax.
A custom Identifier may be used to return a record via an indexed field:
https://edg.epa.gov/metadata/rest/find/document?f=dcat&searchText=fileIdentifier:A-280j-22
or by pretending it's the UUID:
https://edg.epa.gov/metadata/catalog/search/resource/details.page?uuid=A-280j-22
https://edg.epa.gov/metadata/rest/document?id=A-280j-22
But if the custom ID includes a / which is part of the DOI specification, those direct linkages seem to break. Is there anything we can do, or is this a problem with how browsers parse URLs?
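One thing worth trying is percent-encoding the slash so it travels as `%2F` inside the query value. The sketch below shows the encoding step only; the DOI used is just a well-formed sample, the helper name is hypothetical, and whether the EDG (or the servlet container in front of it) decodes and accepts an encoded slash is untested here.

```python
from urllib.parse import quote, unquote

# Details page URL taken from the examples in this ticket.
DETAILS_PAGE = "https://edg.epa.gov/metadata/catalog/search/resource/details.page"


def details_url(identifier: str) -> str:
    """Build a details link with every reserved character in the ID escaped."""
    # safe="" forces "/" to be encoded as %2F instead of being left as-is.
    return f"{DETAILS_PAGE}?uuid={quote(identifier, safe='')}"


if __name__ == "__main__":
    sample_doi = "10.1000/182"  # illustrative identifier, not an EDG record
    url = details_url(sample_doi)
    print(url)
    # The receiving side can recover the original identifier by decoding.
    print(unquote(url.split("uuid=", 1)[1]))
```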
The individual elements that appear in a result need to be reorganized and validation errors need to be highlighted - for now this is a placeholder.
Stewards should have the ability to launch an editor tool (separate ticket) for addressing validation errors.
Many EPA staff web browsers are set to default to IE7 emulation for *.epa.gov websites, meaning we need to resolve at least the basic look-and-feel issues. The home page looks pretty awful in IE7.
https://www.epa.gov/webguide/what-browsers-does-epa-support
This comes out of the work on issue #53, which is complete and soon in production. The goal is to have check boxes (facets) allowing quick filtering of results to just EPA regions. If we get this working with a nice UI, it's quite likely Ana will want to add other facets to the search page in the future.
Instead of saving previous search term.
This ticket is designed to capture the major requirements/architecture for a web-based editing tool. Per Ana's vision:
Metadata editing vision:
Metadata editing needs to be limited to known stewards, so it should be behind the agency login and recognize EDG authorization groups.
If non-geo records are going to be edited in place, the editor code should probably sit on the edg-intranet server where the raw .json files also sit - that way the server side code could have direct access to the source records, all records from a single owner (in a single json file) could be loaded/edited/updated at once, and the result written immediately to the server without the user needing to download anything. Ideally after the edit, a re-harvest would also be triggered.
The editor tool should follow the EPA metadata technical spec and focus exclusively on non-geo/POD fields.
Users creating brand new files from scratch (without pointing at an existing .json file on the server) would be able to save a copy of the record to their desktop that they could then send to [email protected] for posting to our server.
Users editing existing JSON files would also be able to save a copy of the complete/updated JSON file locally if they wish.
Editing Geo records is more challenging - tool would need to be able to bring source XML to browser, make edits and allow updated XML to be saved locally (not pushed back to the server). User would then be responsible for placing the XML in an appropriate harvest location.
This ticket probably needs to be broken up into smaller tickets.