edg_metadata's People

Contributors

baohong, torrin47

Forkers

saisuma004

edg_metadata's Issues

Add gallery functionality to new home page

  • Ana would like two visible gallery sections on the home page - the first one would be titled "Featured Data Products" and the second one "Popular Datasets".
  • Under Featured Data Products would be 3 tabs - "Climate Change", "Environmental Justice", and "Facility Data", and selecting a tab would show a corresponding set of thumbnails. Popular datasets would not have a similar set of tabs.
  • Each gallery would display 6 thumbnails across. Most EDG metadata records do not have thumbnails today (we're working on that), so there should be a default thumbnail image that displays when the metadata does not have a thumbnail. Some example galleries seem to allow horizontal scrolling (http://fema.maps.arcgis.com/home/index.html); maybe we can add this in a later sprint.
  • At the bottom right of each gallery there should be a "see more" link that would link users to the main search results page with an appropriate search for the rest of the records corresponding to the featured or popular list.
  • Each thumbnail should link to the details page for the corresponding metadata record (Template URL: https://edg.epa.gov/metadata/catalog/search/resource/details.page?uuid={d8866557-05f6-4622-a873-f74ee726a6ee}) the metadata title would appear below the thumbnail, and the abstract/description would appear as a hovertip.

The idea is to use EDG's custom compilations to manage the galleries. The top 50 most popular datasets are already managed in a compilation, visible via the API here:
https://edg.epa.gov/metadata/rest/find/document?f=json&childrenof={9007D9FF-E18F-9A91-564F-5C4FF3FAB904}
If a record has a thumbnail, it seems to be included in the list of links. This search shows a couple of examples:
https://edg.epa.gov/metadata/rest/find/document?searchText=usgs&start=1&max=25&f=json
Below is a clumsily marked-up mockup.

screenshot
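
A minimal sketch of how the gallery entries described above could be assembled, with the default-thumbnail fallback. It assumes each record in the REST JSON response is a dictionary with `id`, `title`, `summary`, and a `links` list whose thumbnail entry has `type` set to `"thumbnail"`. That response shape and the default image path are assumptions, not confirmed by this ticket.

```python
from typing import Dict, List

# Hypothetical path to the fallback image; the real asset is not named in the ticket.
DEFAULT_THUMBNAIL = "images/default_thumbnail.png"
# Details page template taken from the ticket; the record id is appended as-is.
DETAILS_URL = "https://edg.epa.gov/metadata/catalog/search/resource/details.page?uuid={}"


def thumbnail_for(record: Dict) -> str:
    """Return the record's thumbnail link, or the default image if it has none."""
    for link in record.get("links", []):
        # Assumption: thumbnail links are tagged with type == "thumbnail".
        if link.get("type") == "thumbnail" and link.get("href"):
            return link["href"]
    return DEFAULT_THUMBNAIL


def gallery_items(records: List[Dict], count: int = 6) -> List[Dict]:
    """Build the first `count` gallery entries from a list of records."""
    items = []
    for record in records[:count]:
        items.append({
            "title": record.get("title", ""),      # shown below the thumbnail
            "hover": record.get("summary", ""),    # abstract shown as a hovertip
            "href": DETAILS_URL.format(record.get("id", "")),
            "thumbnail": thumbnail_for(record),
        })
    return items
```

The same function could serve both galleries, since each one is just a different compilation query feeding in its own list of records.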

On metadata details page, change page "title" to match metadata record title

Curious to see how feasible this is; it would definitely help out with our Google Analytics reporting. Right now the HTML title (<title>) for all details pages is the same: EPA Environmental Dataset Gateway. It would be very handy if we could set this title to match the title of the metadata record being displayed. Worth discussing.

Publishing Errors

This issue has two parts:

  • Register resource on the network takes you back to the home page
  • Upload metadata record returns error

From: Harness, Catherine
Sent: Thursday, August 25, 2016 11:44 AM
To: Hultgren, Torrin [email protected]
Subject: EDG Staging Publishing Errors

Hi Torrin,

I was working with Mark to test the new OAR-OAP record thumbnails. They appear to be working in the production EDG, however we thought it would be best to test them in the staging version since that’s where Ana wants to see the thumbnails.

When attempting to create a harvest repository on the staging server, after selecting “Register resource on the network” and clicking the “Proceed” button, I get sent back to the homepage.

Then I thought we could easily upload a file for testing purposes, I get the following error when I hit the upload button:

screenshot

Thanks,

Catherine

Catherine Harness
EPA National Geospatial Support Team
Innovate!, Inc. | [email protected] | 513-713-0260

Select UI framework for metrics page redevelopment.

The UI for the EDG Metrics page is clunky and confusing, but reworking it into something modern, clean, and intuitive like this interface might not require starting over from scratch.

The legacy metrics code has 2 components:

  • Java/JSP code that generates a JSON file with all the data for the page, pulled out of the database nightly
  • Simile Exhibit front end to facilitate faceted search/filter

Editing the JSP code to adjust data for metrics is pretty straightforward/low effort (I was able to do it myself!) so even though the way xpath queries are handled in the database is a little clunky, I don't know that there's much value in replacing the back end.
I'm unclear on how challenging it might be to apply purely cosmetic changes to the existing Simile Exhibit front-end. There is an Exhibit3 framework, but development seems to have tapered off, and the UI doesn't look significantly better. Modern JavaScript frameworks seem to offer significantly more capabilities out of the box for interacting with data. The example above uses Esri's Calcite framework, which has this caution:

Calcite Web, while still a CSS Framework, has some profound differences from projects like Bootstrap or Foundation. Where Bootstrap and Foundation both aim to provide a robust set of patterns and utilities for the general, third party developer, Calcite Web only concerns itself with Esri projects. Calcite Web is not designed for a developer who is not directly working for Esri on Esri products or properties. In other words, every project created with Calcite Web will look like an Esri-branded site.

Looking like an Esri-branded site is fine (it's a clean and professional look), but this metrics page is several steps removed from any Esri products or properties, so we don't gain much by locking into an Esri framework, and we might lose out on widgets and functions available in the bigger frameworks. I know Bootstrap is very popular and is underneath the dashboard that Ana really likes, but I'm not sure whether we need a complete framework or might be able to take a widget approach using something like jQuery UI. The decision might also be determined by other requirements, but I will enter those as separate tickets (#80, #81, #82, #83, #84), with this ticket capturing the big-picture decision on what framework to use.

Fix character encoding in "see more" link

I think we didn't pursue this issue because it was linking to staging where we didn't expect the search to work, but now that we've updated the data in staging, it's more apparent. The "See More" link submits the URL-encoded (percent-encoded) version of the search string:
sys.collection%3a%22%7b9B7778AC-DE79-287A-2A79-F05863C8A212%7d%22
which the EDG search apparently can't handle properly (unless it's in an actual URL being interpreted as a REST query) so when triggering a search on the search page, the "See More" link needs to submit this equivalent version:
sys.collection:"{9B7778AC-DE79-287A-2A79-F05863C8A212}"
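
As an illustration of the conversion between the two strings above, the percent-encoded form can be decoded with a standard URL-decoding call. This sketch uses Python's `urllib.parse.unquote`; in the page's own JavaScript the analogous built-in would be `decodeURIComponent`. Where exactly the decoding should happen in the EDG code is not established here.

```python
from urllib.parse import unquote

# The percent-encoded search string currently submitted by the "See More" link.
encoded = "sys.collection%3a%22%7b9B7778AC-DE79-287A-2A79-F05863C8A212%7d%22"

# Decode %3a -> ":", %22 -> '"', %7b -> "{", %7d -> "}".
decoded = unquote(encoded)
print(decoded)  # sys.collection:"{9B7778AC-DE79-287A-2A79-F05863C8A212}"
```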

Track usage of search terms

The search terms from any search (web page, REST API, CSW, etc) should be captured and stored in a database table. The database table should contain two columns - one for the search term itself, and a second that contains the count of how many times that term was used in a search.
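
A rough sketch of the two-column table and the increment-or-insert logic, using Python's built-in `sqlite3` purely for illustration. The table and column names are hypothetical, the EDG database is a different engine, and lower-casing the term before counting is an assumption rather than a stated requirement.

```python
import sqlite3


def create_table(conn: sqlite3.Connection) -> None:
    """Create the two-column table: the term and how many times it was searched."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS search_terms ("
        " term TEXT PRIMARY KEY,"
        " search_count INTEGER NOT NULL)"
    )


def record_search(conn: sqlite3.Connection, term: str) -> None:
    """Increment the counter for a term, inserting it on first use."""
    # Assumption: terms are trimmed and compared case-insensitively.
    normalized = term.strip().lower()
    if not normalized:
        return
    cur = conn.execute(
        "UPDATE search_terms SET search_count = search_count + 1 WHERE term = ?",
        (normalized,),
    )
    if cur.rowcount == 0:
        # No existing row was updated, so this is the first time the term appears.
        conn.execute(
            "INSERT INTO search_terms (term, search_count) VALUES (?, 1)",
            (normalized,),
        )
    conn.commit()
```

Each search entry point (web page, REST API, CSW) would call `record_search` once per request with the submitted search text.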

Potential enhancements down the road:

  • Would we ever want to track usage of search terms over time (like Google does to analyze spikes) or would that be overkill for a site like the EDG with so little traffic?
  • Would be fantastic to add a page to the "Inventory" webapp that displays summary statistics for search terms in a nice user interface.

Investigate JavaScript Version discrepancy

This page:
https://github.com/Innovate-Inc/EDG_metadata/blob/master/catalog/skins/lookAndFeel.jsp
programmatically references a single folder for javascript files, and right now it is set to v1. But there are much newer javascript folders available, and I think there might be some files that could be updated.
https://github.com/Innovate-Inc/EDG_metadata/tree/master/catalog/js
I might be wrong, but I'd appreciate a review/comparison of these files to be sure we're using the latest.

Major UI reorganization components

This ticket will try to capture the major page layout changes.

  • The metrics page should fill the full width of the window.
  • All facets should be displayed on the left (re-ordering them will be in a different ticket #81 )
  • All facets should be expandable but collapsed by default
  • Results should appear on the right (redesigning the design of a result will be in a different ticket #84)
  • No charts should appear at the top
  • Result totals and sort/navigation should be neater and cleaner, similar to Esri demo

Incorporate graphics/dynamic visualization into filter functionality

Esri example has fantastic time slider filter tool - it would be awesome if we could incorporate this, but not a high priority.
Showing bar or pie charts in facet panes alongside filter elements would be a higher priority, as long as those charts are straightforward to implement and are linked with the facet functionality.

Add EPA Regions section to Home page.

Envision this would be a row of 10 tabs that look similar to the "Featured Data Products" (6 thumbnails and a "See More" link to a full search page) but would be located under Popular Datasets.
The search syntax for each region is:
https://edg.epa.gov/metadata/rest/find/document?owner=Region%201&f=json
https://edg.epa.gov/metadata/rest/find/document?owner=Region%202&f=json
etc.
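
The ten per-region queries could be generated rather than typed out. This sketch assumes the remaining regions follow the same "Region N" owner naming as the two URLs above, which this ticket shows only for regions 1 and 2.

```python
from urllib.parse import quote

BASE = "https://edg.epa.gov/metadata/rest/find/document"


def region_url(region: int) -> str:
    """Build the REST search URL for one EPA region."""
    # quote() turns the space in "Region 1" into %20.
    owner = quote(f"Region {region}")
    return f"{BASE}?owner={owner}&f=json"


# One URL per tab, regions 1 through 10.
region_urls = [region_url(n) for n in range(1, 11)]
```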

There should also be a "Find my region" link that pops up a modal with an image map that shows the "US Map Split into regions" and allows a user to click on a region that closes the modal and selects the corresponding tab. All details for implementing this image map are at this link:
https://www.epa.gov/webguide/how-build-standard-us-national-maps

Review code for hard-coded URLs

Minimize the use of hard-coded URLs (localhost, edg-staging, etc.). Obtain the URL either from the live request path or from the main config file.

Reorganize metrics facets

This is a placeholder ticket for capturing the revised order of facets (and potentially addition/deletion of facets). Emphasis should be on mandatory elements for validation - or possibly even a single button/facet that filters out completely valid records (showing invalid for any reason).

Enable fuzzy search as default search mode

Below is the email chain for context. Asking Esri for their thoughts before we get started on this. Will work to formulate more specific requirements.

From: Greene, Ana
Sent: Wednesday, February 22, 2017 8:59 AM
To: Hultgren, Torrin [email protected]
Cc: Pierson, Suzanne [email protected]; Harness, Catherine [email protected]; Suma Malothu [email protected]
Subject: RE: Full text search thoughts

Hi guys,
Did I ever respond to this? Just catching up…only 2 weeks behind on email…

I totally agree that the wildcard and fuzzy searches should be the default. And like the advanced search dialog. I’d like to go ahead and put all of this on our list of near term development projects.

Thanks,

Ana Greene, M.S., PMP
Environmental Dataset Gateway (EDG) Program Manager
Office of Environmental Information (OEI)
Office of Information Management (OIM)
U.S. Environmental Protection Agency
(o): 202-566-2132
(c): 571-232-7860
[email protected]
https://edg.epa.gov/

From: Hultgren, Torrin
Sent: Tuesday, February 07, 2017 7:26 PM
To: Greene, Ana [email protected]
Cc: Pierson, Suzanne [email protected]; Harness, Catherine [email protected]; Suma Malothu [email protected]
Subject: Full text search thoughts

Hi Ana,

I believe I’ve figured out the source of our continuing confusion about full text search. It was legitimately disabled years ago, but has been working for some time, yet perhaps not in the way we might expect, so I think there’s still some room for improvement, or at least adjustment. I think a lot of our confusion revolves around partial search terms and whether or not they’re considered a match. I think we can all remember a time when we used to have to be very careful about our search terms, and we couldn’t assume that search engines would appropriately match partial words or misspellings, yet these days we take it for granted. Lucene is quite capable of handling any match type we want it to, but the default is the old strict way. If we do a search for the first part of your email address, by default it will come up blank, even though there are records containing your email address:

https://edg.epa.gov/metadata/rest/find/document?f=searchpage&searchText=greene.ana

EDG has “advanced Lucene syntax” if anyone chose to read the help, and could apply a wildcard to their search, which just means that indexed terms that aren’t exact matches but contain the string are returned:

https://edg.epa.gov/metadata/rest/find/document?f=searchpage&searchText=*greene.ana*

Which gives us all 6 records that contain your email address. In theory this slows performance, but we’d need orders of magnitude more records in our index before we’d notice any difference. There’s a last option that’s kind of fun – though it doesn’t seem to work with the direct link, so you’ll have to try it manually. If you do a search for greene.ana~ it will conduct a “fuzzy search”, where it will include “misspellings” or words that are very similar – it should return a bunch of records with “Greenspace” in the title.

I’m not sure about you, but I think my own expectation these days is that wildcards and fuzzy searches would be the default – I’d prefer a search to return too many results that I could filter through or refine than too few. But that may also be because of an assumption that the search engine would do a good job of ranking/sorting those results so the most relevant ones would appear first, and I don’t know how valid an assumption that is with the EDG. I think we could figure out how to adjust the scoring/ranking algorithm under the hood of the EDG, but I’m not at all sure how we’d measure whether our tweaks were making search results more or less relevant. And if we were to make fuzzy searches the default, I wonder how we’d allow someone to opt out if they wanted a more strict match? Perhaps we could show an “advanced search” dialog if they wished:

http://www.lucenetutorial.com/lucene-query-builder.html
https://www.google.com/advanced_search

Anyway, curious to know your thoughts. Definitely been on the brain today.
Torrin Hultgren
EPA National Geospatial Support Team
Innovate!, Inc. | [email protected] | 703-922-9090 x737
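
The wildcard and fuzzy forms described in the email chain above could be produced from plain user input along these lines. This is only a sketch: the `mode` parameter and function name are hypothetical, and it does not handle input that already contains Lucene operators or quoted phrases, which a real implementation would need to leave untouched.

```python
def default_query(text: str, mode: str = "wildcard") -> str:
    """Rewrite each whitespace-separated term into the chosen Lucene match style."""
    terms = text.split()
    if mode == "strict":
        # Opt-out: pass the terms through unchanged for exact matching.
        return " ".join(terms)
    if mode == "wildcard":
        # *term* matches indexed terms that contain the string.
        return " ".join(f"*{term}*" for term in terms)
    if mode == "fuzzy":
        # term~ matches similar spellings.
        return " ".join(f"{term}~" for term in terms)
    raise ValueError(f"unknown mode: {mode}")


print(default_query("greene.ana"))           # *greene.ana*
print(default_query("greene.ana", "fuzzy"))  # greene.ana~
```

A "strict" option like this is one way an advanced search dialog could let users opt out of the looser default.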

Can we make UUIDs not case-sensitive?

So this URL is broken:
https://edg.epa.gov/metadata/catalog/search/resource/details.page?uuid={c5e1e678-1b6b-40ff-b8dc-a89938fb4814}
but this upper-case URL works:
https://edg.epa.gov/metadata/catalog/search/resource/details.page?uuid={C5E1E678-1B6B-40FF-B8DC-A89938FB4814}
and in contrast this URL works:
https://edg.epa.gov/metadata/catalog/search/resource/details.page?uuid={0fd2712b-62d0-4aaf-ab20-2cbfe8c26b30}
but this upper-case equivalent is broken:
https://edg.epa.gov/metadata/catalog/search/resource/details.page?uuid={0FD2712B-62D0-4AAF-AB20-2CBFE8C26B30}
Strict case sensitivity would seem to be an acceptable thing, but it has caused our team a whole lot of confusion and frustration. Would it be possible to adjust the code so that at the very least the UUID portion of the URL is not case sensitive, so both uppercase and lowercase versions work across the board, at all the different endpoints?
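
One possible approach, sketched here under assumptions: when an identifier looks like a UUID, try the value as given and then its upper- and lower-case forms, while leaving custom identifiers untouched because their case may be significant. The function name is hypothetical and the actual lookup code in the geoportal is not shown.

```python
import re
from typing import List

# A UUID with optional surrounding braces, in either case.
UUID_RE = re.compile(
    r"^\{?[0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}"
    r"-[0-9a-fA-F]{4}-[0-9a-fA-F]{12}\}?$"
)


def lookup_candidates(identifier: str) -> List[str]:
    """Return the identifier forms to try, in order, without duplicates."""
    if not UUID_RE.match(identifier):
        # Custom identifiers are passed through exactly as given.
        return [identifier]
    candidates = []
    for variant in (identifier, identifier.upper(), identifier.lower()):
        if variant not in candidates:
            candidates.append(variant)
    return candidates
```

A case-insensitive comparison in the database query itself would avoid the repeated lookups, if the database supports it for that column.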

Store copy of deleted records

New concerns have been raised about needing to track what metadata records have been deleted. Proposed functionality - figure out what functions "delete" records from the database (manual deletion, harvest that occurs with metadata record no longer present, etc) and add an additional procedure to copy that metadata record and associated geoportal attributes to a new table in the database so that we have an archive of all deleted records. (Could this be a database stored procedure?). Ana would first like a rough level of effort estimate, but considers this a pretty high priority.
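
On the stored-procedure question, a database trigger is one option worth considering, since it would catch deletions regardless of which code path issues them. The sketch below demonstrates the idea with Python's built-in `sqlite3`; the table and column names are hypothetical, and the trigger syntax would differ in the EDG's actual database engine.

```python
import sqlite3

# Hypothetical, simplified schema: a live table, an archive table, and a
# trigger that copies each row into the archive just before it is deleted.
SCHEMA = """
CREATE TABLE records (
    uuid TEXT PRIMARY KEY,
    title TEXT,
    xml TEXT
);
CREATE TABLE deleted_records (
    uuid TEXT,
    title TEXT,
    xml TEXT,
    deleted_at TEXT DEFAULT CURRENT_TIMESTAMP
);
CREATE TRIGGER archive_deleted_record
BEFORE DELETE ON records
BEGIN
    INSERT INTO deleted_records (uuid, title, xml)
    VALUES (OLD.uuid, OLD.title, OLD.xml);
END;
"""

conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)

# Insert a record, then delete it; the trigger archives it automatically.
conn.execute("INSERT INTO records VALUES (?, ?, ?)", ("{ABC}", "Example", "<metadata/>"))
conn.execute("DELETE FROM records WHERE uuid = ?", ("{ABC}",))

archived = conn.execute("SELECT uuid, title FROM deleted_records").fetchall()
remaining = conn.execute("SELECT COUNT(*) FROM records").fetchone()[0]
```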

Edits to Discussion Forum Link

In the staging version of the site, on the Details page for every metadata record there is currently a link that says "Discussion Forum". Ana would like this link to be moved to the right side of the screen to make it stand out, and the text be changed to "Share Your Feedback". There is already a big green "Share Your Feedback" link shown at the top of every page. Ana would like this big green button to be hidden on Details pages, but to remain on any other page.

Avoid stretching thumbnails to aspect ratio.

The thumbnails look much better, but now it seems that they're being stretched to completely fill the allotted space. If this is the only way Bootstrap handles these, ok, but it'd be nice if they maintained their original aspect ratio and left whitespace in the remaining area. This Stack Overflow question describes the same scenario, and while they're working in the Rails world, I think the CSS should be the same:
http://stackoverflow.com/questions/25448371/bootstrap-css-thumbnail-image-resize-responsive

Transition all EDG look and feel to modernized EPA Template

Guidance for the EPA Template is here (inaccessible off the EPA LAN, basic content in comment below):
https://www.epa.gov/webguide/applications-and-one-epa-web-template
The HTML for the template is this page:
https://www.epa.gov/sites/all/libraries/template2/standalone.html
The main CSS file is here:
https://www.epa.gov/sites/all/libraries/template/s.css
and the main JS file is here:
https://www.epa.gov/sites/all/libraries/template/js.js

We do not expect this to be a quick, easy, or seamless conversion, but getting it cleaned up and polished will mean a great deal to OEI Management and the user community.

Can REST output be UTF-8 instead of ISO-8859-1

Per the email chain below, it appears that the output of the REST API is returning ISO-8859-1 even though the raw metadata records are being stored as UTF-8, which does funky things with some characters. It's not clear where this encoding switch occurs - is it just the HTTP header setting, or is there some constraint in Java? Is it an easy fix or something major? Let's investigate and/or ask Esri.

From: Hultgren, Torrin [mailto:[email protected]]
Sent: Wednesday, March 08, 2017 4:38 PM
To: Felsher, Maxwell (CGI Federal)
Cc: Greene, Ana; Suma Malothu; [email protected]; Harness, Catherine
Subject: RE: Character encoding of DCAT output?

Hi Max,

I can’t think of any reason the charset of the response should be restricted to ISO-8859-1 rather than the full domain of UTF-8, and it only seems to be applying to the REST API (https://edg.epa.gov/metadata/rest) rather than other URLs. I believe we should be able to fix it, but would you mind sharing an example of one of your records that had an encoding issue that we can use for testing?

The approach you’re working with is fine – it’s conducting a full search across all indexed fields, but seems to respond very quickly. To limit the search to just the fileIdentifier field, you could use this syntax:
https://edg.epa.gov/metadata/rest/find/document?f=dcat&searchText=fileIdentifier:A-280j-22
but if there’s a performance improvement, it’s all but impossible to tell. But actually, if all you’re looking for is a way to directly reference your own records, you may also use your own identifiers – the EDG will respect them:
https://edg.epa.gov/metadata/catalog/search/resource/details.page?uuid=A-280j-22
Might simplify things on your end?

Torrin

From: Felsher, Maxwell (CGI Federal) [mailto:[email protected]]
Sent: Wednesday, March 08, 2017 2:55 PM
To: Hultgren, Torrin [email protected]
Cc: Greene, Ana [email protected]
Subject: Character encoding of DCAT output?

Hi Torrin,

We were trying to search for some EDG records in the DCAT JSON-LD format (e.g., https://edg.epa.gov/metadata/rest/find/document?searchText=A-280j-22&f=dcat), and we ran into an issue with character encoding; our code was assuming it was in UTF-8, but now we see that the HTTP response specifies ISO-8859-1 in the Content-Type header. We’re fixing our code to not assume UTF-8, but I was wondering whether it was intentional to use ISO-8859-1?

(As an aside, we’re doing this in order to be able to retrieve the corresponding EDG URL for a particular dataset we put in our metadata. We search for our identifiers using URLs like the above and then parse the response and extract the landingPage property. That was the best option we could figure out, but if you have other suggestions, let us know.)

Best,
Max Felsher
Consultant, CGI Federal
Contractor to ORD (ScienceHub team)
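
To make the "funky characters" concrete, here is a small sketch of the two ways a UTF-8 / ISO-8859-1 mismatch can show up, plus a helper that reads the charset from a Content-Type header as Max's client would need to. The sample text is made up, and which of the two failure modes the EDG REST API actually exhibits has not been determined.

```python
from email.message import Message


def charset_from_content_type(header_value: str, default: str = "utf-8") -> str:
    """Extract the charset parameter from a Content-Type header value."""
    msg = Message()
    msg["Content-Type"] = header_value
    # get_content_charset() returns the charset in lower case, or None.
    return msg.get_content_charset() or default


text = "EPA’s café"  # made-up sample with a curly apostrophe and an accent

# Case 1: the bytes are really UTF-8, but the client trusts an ISO-8859-1 header.
garbled = text.encode("utf-8").decode("iso-8859-1")

# Case 2: the server really encodes as ISO-8859-1, so characters outside
# that character set (the curly apostrophe) are lost.
lossy = text.encode("iso-8859-1", errors="replace").decode("iso-8859-1")

print(charset_from_content_type("application/json;charset=ISO-8859-1"))  # iso-8859-1
print(lossy)  # EPA?s café
```

In case 1 the original text is still recoverable by re-encoding as ISO-8859-1 and decoding as UTF-8; in case 2 the replaced characters are gone, which is why a test record from Max would help tell the two apart.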

Upgrade stylesheet

The stylesheet used to display metadata at an endpoint like this (note the xsl=metadata_to_html_full):
https://edg.epa.gov/metadata/rest/document?id=%7B4806F6B7-E980-4307-89AD-9436DC377EE3%7D&xsl=metadata_to_html_full
never looked very good to begin with, and its usage was dropped because it didn't accommodate the new project open data dcat format - the page appeared blank. That xsl=metadata_to_html_full term is currently ignored by the application and the raw unstyled xml sent to the end user - an ok compromise, but not pretty.
Esri's desktop products do include more polished stylesheets (attached to this issue).
Stylesheets.zip
If possible, the goal is to switch out the old stylesheet with the desktop stylesheet, and then upgrade the desktop stylesheet to include the DCAT elements.

Investigate problem with WAFer accessing newftp.epa.gov

The WAFer application is designed to emulate a typical web accessible folder for harvesting metadata, however, it does not appear to be working for ftp://newftp.epa.gov in the same way that it works for ftp://ftp.epa.gov. We need to investigate why it's working for one and not the other and fix it if possible. In the internal\wafconfig.xml file, the two side-by-side configurations are:

<SOURCE type="FTP" shortName="ORD_NHEERL_WED" longName="U.S. EPA ORD-NHEERL-WED" serviceUrl="ftp://ftp.epa.gov/wed/ecoregions/gdg/" recurse="1" user="anonymous" pwd="" />
<SOURCE type="FTP" shortName="ORD_NHEERL_WED_New" longName="U.S. EPA ORD-NHEERL-WED New" serviceUrl="ftp://newftp.epa.gov/EPADataCommons/ORD/NHDPlusLandscapeAttributes/StreamCat/Documentation/Metadata/XML%20Files/" recurse="1" user="anonymous" pwd="" />

all other configurations can be ignored or commented out for the purpose of this issue. Additional details are being sent via email.
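
As a starting point for the investigation, the two folders could be listed by hand outside of WAFer to see whether the servers themselves behave differently. This sketch uses Python's standard `ftplib` and assumes anonymous FTP access is reachable from where it runs. One visible difference between the two configurations is the `%20` in the second path, so the helper decodes it before changing directory; whether WAFer does the same is unknown and is only a hypothesis to check.

```python
from ftplib import FTP
from typing import List, Tuple
from urllib.parse import unquote, urlparse


def split_ftp_url(service_url: str) -> Tuple[str, str]:
    """Split an ftp:// URL into (host, path), decoding %20 and similar escapes."""
    parts = urlparse(service_url)
    return parts.hostname, unquote(parts.path)


def list_folder(service_url: str, timeout: int = 30) -> List[str]:
    """Log in anonymously and return the names in the target folder."""
    host, path = split_ftp_url(service_url)
    with FTP(host, timeout=timeout) as ftp:
        ftp.login()    # anonymous login, matching user="anonymous" pwd=""
        ftp.cwd(path)  # uses the decoded path, e.g. "XML Files" with a space
        return ftp.nlst()
```

Calling `list_folder` with each `serviceUrl` value from the configuration and comparing the results (or the errors raised) should show whether the problem lies with the server or with how WAFer builds its requests.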

Is it possible to include / in fileIdentifier field?

Some stakeholders want to switch to using DOIs (https://en.wikipedia.org/wiki/Digital_object_identifier) as their unique IDs, which seems to be ok in the database, but isn't supported as a direct link to the record using the standard syntax.
A custom Identifier may be used to return a record via an indexed field:
https://edg.epa.gov/metadata/rest/find/document?f=dcat&searchText=fileIdentifier:A-280j-22
or by pretending it's the UUID:
https://edg.epa.gov/metadata/catalog/search/resource/details.page?uuid=A-280j-22
https://edg.epa.gov/metadata/rest/document?id=A-280j-22

But if the custom ID includes a / which is part of the DOI specification, those direct linkages seem to break. Is there anything we can do, or is this a problem with how browsers parse URLs?
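
One thing to try is percent-encoding the identifier so the slash travels as `%2F`. The sketch below builds the details link that way; the DOI-style identifier is a made-up example, and whether the EDG application decodes `%2F` correctly on its side has not been tested.

```python
from urllib.parse import quote

DETAILS_BASE = "https://edg.epa.gov/metadata/catalog/search/resource/details.page?uuid="


def details_url(identifier: str) -> str:
    """Build a details link, percent-encoding every reserved character."""
    # safe="" makes quote() encode "/" as well, which it leaves alone by default.
    return DETAILS_BASE + quote(identifier, safe="")


print(details_url("A-280j-22"))        # ...details.page?uuid=A-280j-22
print(details_url("10.1234/example"))  # ...details.page?uuid=10.1234%2Fexample
```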

Metrics results layout revisions

The individual elements that appear in a result need to be reorganized and validation errors need to be highlighted - for now this is a placeholder.
Stewards should have the ability to launch an editor tool (separate ticket) for addressing validation errors.

Add regional facets to Search Page

This comes out of the work on issue #53, which is complete and soon in production. The goal is to have check boxes (facets) allowing quick filtering of results to just EPA regions. If we get this working with a nice UI, it's quite likely Ana will want to add other facets to the search page in the future.

  • Instead of entering owner=Region 0X into the search text, it would be more intuitive to leave the search box empty, and add checkboxes to the page that would filter the search results based on the regional owners. (In effect, this is creating the first "faceted search" functionality on the search page).
  • There would also be an "All Regions" checkbox that would be checked by default (all of the individual regions would be unchecked by default). If a user checks a box by a region, the "all regions" checkbox would toggle off. (We decided this has no value to end users.)
  • A user may check multiple regions and the results would be additive.
  • The search results would not be updated until a user clicks the "Search" button.
  • If a user clicks the "See More" button for one of the regional tabs on the home page, the search page would appear with that region's box checked and the search results showing.

Confirm workflow for web-based editing tool

This ticket is designed to capture the major requirements/architecture for a web-based editing tool. Per Ana's vision:

Metadata editing vision:

  • Web-based editor would be used as a lightweight editor for non-geo records (POD fields) and geo records (POD fields only)
  • Web-based editor would also be able to create new non-geo records
  • EME would be used to create new metadata records and to edit geo records beyond the POD fields

Metadata editing needs to be limited to known stewards, so it should be behind the agency login and recognize EDG authorization groups.
If non-geo records are going to be edited in place, the editor code should probably sit on the edg-intranet server where the raw .json files also sit - that way the server side code could have direct access to the source records, all records from a single owner (in a single json file) could be loaded/edited/updated at once, and the result written immediately to the server without the user needing to download anything. Ideally after the edit, a re-harvest would also be triggered.

The editor tool should follow the EPA metadata technical spec and focus exclusively on non-geo/POD fields.

Users creating brand new files from scratch (without pointing at an existing .json file on the server) would be able to save a copy of the record to their desktop that they could then send to [email protected] for posting to our server.

Users editing existing JSON files would also be able to save a copy of the complete/updated JSON file locally if they wish.

Editing Geo records is more challenging - tool would need to be able to bring source XML to browser, make edits and allow updated XML to be saved locally (not pushed back to the server). User would then be responsible for placing the XML in an appropriate harvest location.

This ticket probably needs to be broken up into smaller tickets.
