edrn / p5

EDRN Production Program for the Public/Private Portal (P5)

Home Page: https://edrn.nci.nih.gov/

License: Other

Python 70.06% Shell 2.47% CSS 0.56% HTML 24.32% Dockerfile 0.75% JavaScript 1.84%

cancer bigdata early-detection knowledge portal docker django python wagtail

p5's Introduction

Early Detection Research Network Portal

This is the software for the Early Detection Research Network (EDRN) public portal and knowledge environment. It nominally runs the site at https://edrn.nci.nih.gov/

🤓 Development

To develop the portal software for the Early Detection Research Network, you'll need Python, PostgreSQL, Elasticsearch, Redis, and a couple of environment variables. Note that these environment variables should be provided in the development environment, by the continuous integration, by the containerization system, etc. They must always be set:

Variable Name	Use	Value
`DATABASE_URL`	URL to the database where the portal persists data	`postgresql://:@/edrn`
`LDAP_BIND_PASSWORD`	Credential for the EDRN Directory `service` user	Contact the directory administrator
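
A start-up check along the following lines can catch a missing variable before Django fails later with a less obvious error. This is only a sketch: the name `missing_required` is hypothetical and not part of the portal's code; the two variable names come from the table above.

```python
import os

# The two variables the table above says must always be set.
REQUIRED_VARIABLES = ("DATABASE_URL", "LDAP_BIND_PASSWORD")


def missing_required(environ=None):
    """Return the names of required variables that are unset or empty."""
    env = os.environ if environ is None else environ
    return [name for name in REQUIRED_VARIABLES if not env.get(name)]
```

Calling `missing_required()` with no argument inspects the real process environment; passing a plain dictionary makes the check easy to try out without touching the shell.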

Next, set up a PostgreSQL database:

$ createdb edrn

This has to be done just for the first time—or if you ever get rid of the database with dropdb edrn. Then, set up the software and database schema and content:

$ python3 -m venv .venv
$ .venv/bin/pip install --quiet --upgrade pip setuptools wheel build
$ .venv/bin/pip install --editable 'src/eke.geocoding[dev]'
$ .venv/bin/pip install --editable 'src/edrnsite.streams[dev]'
$ .venv/bin/pip install --editable 'src/edrnsite.controls[dev]'
$ .venv/bin/pip install --editable 'src/edrnsite.content[dev]'
$ .venv/bin/pip install --editable 'src/edrn.collabgroups[dev]'
$ .venv/bin/pip install --editable 'src/eke.knowledge[dev]'
$ .venv/bin/pip install --editable 'src/eke.biomarkers[dev]'
$ .venv/bin/pip install --editable 'src/edrnsite.search[dev]'
$ .venv/bin/pip install --editable 'src/edrn.theme[dev]'
$ .venv/bin/pip install --editable 'src/edrnsite.ploneimport[dev]'
$ .venv/bin/pip install --editable 'src/edrnsite.policy[dev]'
$ .venv/bin/pip install --editable 'src/edrnsite.test[dev]'
$ .venv/bin/django-admin migrate --pythonpath . --settings local
$ .venv/bin/django-admin createsuperuser --pythonpath . --settings local --username root --email [email protected]

When prompted for a password, enter a suitably secure root-level password for the Django super user (twice).

👉 Note: This password is for the application server's "manager" or "root" superuser and is unrelated to any usernames or passwords used with the EDRN Directory Service. But because it affords such deep and penetrative access, it must be kept double-plus super-secret probationary secure.

Then, to run a local server so you can point your browser at http://localhost:8000/ simply do:

$ .venv/bin/django-admin runserver --pythonpath . --settings local

You can also visit the Wagtail admin at http://localhost:8000/admin/ and the Django admin at http://localhost:8000/django-admin/

To see all the commands besides runserver and migrate that Django supports:

$ .venv/bin/django-admin help --pythonpath . --settings local

🍃 Environment Variables

Here is a table of the environment variables that may affect the portal server (some of these have explicit values depending on context, such as containerization):

Variable	Use	Default
`ALLOWED_HOSTS`	What valid hostnames to serve the site on (comma-separated)	`.nci.nih.gov,.cancer.gov`
`AWS_ACCESS_KEY_ID`	Amazon Location Service account access key	Unset
`AWS_SECRET_ACCESS_KEY`	Amazon Location Service secret access key	Unset
`BASE_URL`	Full URL base for generating URLs in notification emails	`https://edrn.nci.nih.gov/`
`CACHE_URL`	URL to the caching & message brokering service	`redis://`
`CSRF_TRUSTED_ORIGINS`	Comma-separated list of origins we implicitly trust in form requests	`http://.nci.nih.gov,https://.nci.nih.gov`
`DATABASE_URL`	URL to persistence	Unset
`ELASTICSEARCH_URL`	Where the search engine's ReST API is	`http://localhost:9200/`
`FORCE_SCRIPT_NAME`	Base URI path (Apache "script name") if app is not on `/`	Unset
`LDAP_BIND_DN`	Distinguished name to use for looking up users in the directory	`uid=service, dc=edrn, dc=jpl, dc=nasa, dc=gov`
`LDAP_BIND_PASSWORD`	Password for the `LDAP_BIND_DN`	Unset
`LDAP_CACHE_TIMEOUT`	How many seconds to cache directory lookups	`3600` seconds (1 hour)
`LDAP_URI`	URI to locate the EDRN Directory Service	`ldaps://edrn-ds.jpl.nasa.gov`
`MEDIA_ROOT`	Where to save media files	Current dir + `/media`
`MEDIA_URL`	URL prefix of media files; must end with `/`	`/media/`
`MQ_URL`	URL to the message queuing service	`redis://`
`RECAPTCHA_PRIVATE_KEY`	Private key for reCAPTCHA	Unset
`RECAPTCHA_PUBLIC_KEY`	Public key for reCAPTCHA	Unset
`SECURE_COOKIES`	`True` for secure handling of session and CSRF cookies	`True`
`SIGNING_KEY`	Cryptographic key to protect sessions, messages, tokens, etc.	Unset in operations; set to a known bad value in development
`STATIC_ROOT`	Where to collect static files	Current dir + `/static`
`STATIC_URL`	URL prefix of static files; must end with `/`	`/static/`

Note that the AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY can be set through-the-web; the environment variables are just a fallback. Sadly, neither RECAPTCHA_PRIVATE_KEY nor RECAPTCHA_PUBLIC_KEY can be set that way, due to limitations of wagtail-django-recaptcha.
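
Several of the variables in the table (ALLOWED_HOSTS, CSRF_TRUSTED_ORIGINS) hold comma-separated lists. The sketch below shows one way such a value could be turned into a Python list, falling back to a default when the variable is unset. It is an illustration only: `csv_setting` is a hypothetical helper, and the portal's own settings modules may parse these values differently.

```python
import os


def csv_setting(name, default, environ=None):
    """Read a comma-separated variable and return a list of trimmed entries."""
    env = os.environ if environ is None else environ
    raw = env.get(name, default)
    # Drop empty entries so a trailing comma does not produce "".
    return [item.strip() for item in raw.split(",") if item.strip()]
```

For example, `csv_setting("ALLOWED_HOSTS", ".nci.nih.gov,.cancer.gov")` yields the two default host patterns from the table when ALLOWED_HOSTS is not set.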

🪶 Apache HTTPD Configuration with Jenkins

This section describes how you'd use Apache HTTPD with Jenkins in order to make the site accessible to the world.

First up, the HTTPD configuration:

WSGIDaemonProcess edrnportal user=edrn group=edrn python-home=/usr/local/edrn/portal/p5-renaissance/venv 
WSGIProcessGroup edrnportal
WSGIScriptAlias /portal/renaissance /usr/local/edrn/portal/p5-renaissance/jenkins.wsgi process-group=edrnportal
Alias /portal/renaissance/media/ /usr/local/edrn/portal/p5-renaissance/media/
Alias /portal/renaissance/static/ /usr/local/edrn/portal/p5-renaissance/static/
<Directory "/usr/local/edrn/portal/p5-renaissance">
    <IfVersion < 2.4>
        Order allow,deny
        Allow from all
    </IfVersion>
    <IfVersion >= 2.4>
        Require all granted
    </IfVersion>
</Directory>
<Directory "/usr/local/edrn/portal/p5-renaissance/static/">
    Options FollowSymLinks
</Directory>

Next, here's the jenkins.wsgi that was referenced in the HTTPD configuration above (Jenkins should generate this with each build):

from django.core.wsgi import get_wsgi_application
import os
os.environ.setdefault('DJANGO_SETTINGS_MODULE', 'edrnsite.policy.settings.ops')
os.environ.setdefault('LDAP_BIND_DN', 'uid=service,dc=edrn,dc=jpl,dc=nasa,dc=gov')
os.environ.setdefault('LDAP_BIND_PASSWORD', 'REDACTED')
os.environ.setdefault('SIGNING_KEY', 'REDACTED')
os.environ.setdefault('DATABASE_URL', 'postgresql://:@/edrn')
os.environ.setdefault('ALLOWED_HOSTS', '.jpl.nasa.gov')
os.environ.setdefault('STATIC_ROOT', '/usr/local/edrn/portal/p5-renaissance/static')
os.environ.setdefault('MEDIA_ROOT', '/usr/local/edrn/portal/p5-renaissance/media')
os.environ.setdefault('BASE_URL', 'https://edrn-dev.jpl.nasa.gov/portal/renaissance/')
os.environ.setdefault('STATIC_URL', '/portal/renaissance/static/')
os.environ.setdefault('MEDIA_URL', '/portal/renaissance/media/')
os.environ.setdefault('SECURE_COOKIES', 'False')
os.environ.setdefault('ELASTICSEARCH_URL', 'http://localhost:9200/')
os.environ.setdefault('CACHE_URL', 'redis://')
# We don't need FORCE_SCRIPT_NAME here since Apache's WSGIScriptAlias does the right thing
application = get_wsgi_application()
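
One detail of the file above is worth spelling out: `setdefault` only fills in a variable that is not already present, so a value already in the server process's environment wins over the one written in jenkins.wsgi. The sketch below demonstrates that behavior on an ordinary dictionary; `apply_defaults` is a hypothetical name used only for this illustration.

```python
def apply_defaults(environ, defaults):
    """Mimic a series of os.environ.setdefault calls on any mapping."""
    for name, value in defaults.items():
        environ.setdefault(name, value)  # existing entries are left untouched
    return environ
```

Given an environment that already contains ALLOWED_HOSTS, applying defaults for ALLOWED_HOSTS and CACHE_URL adds only CACHE_URL.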

Finally, this needs to be run on each deployment:

$ venv/bin/django-admin collectstatic --settings edrnsite.policy.settings.ops --clear --link
$ mkdir -p media

🚢 Container Setup

To use this software in a Docker container environment, first collect the wheels by running:

support/build-wheels.sh

Or by hand:

.venv/bin/python -m build --outdir dist src/eke.geocoding
.venv/bin/python -m build --outdir dist src/edrnsite.streams
.venv/bin/python -m build --outdir dist src/edrnsite.controls
.venv/bin/python -m build --outdir dist src/edrnsite.content
.venv/bin/python -m build --outdir dist src/edrn.collabgroups
.venv/bin/python -m build --outdir dist src/edrn.theme
.venv/bin/python -m build --outdir dist src/edrnsite.search
.venv/bin/python -m build --outdir dist src/eke.knowledge
.venv/bin/python -m build --outdir dist src/eke.biomarkers
.venv/bin/python -m build --outdir dist src/edrnsite.ploneimport
.venv/bin/python -m build --outdir dist src/edrnsite.policy

You don't need src/edrnsite.test since it's just used for testing.

Repeat this for any other source directory in src. Then build the image:

docker image build --build-arg user_id=NUMBER --tag edrn-portal:latest --file docker/Dockerfile .

Replace NUMBER with the number of the user ID of the user under which to run the software in the container. Typically you'll want

500 for running at the Jet Propulsion Laboratory.
26013 for running at the National Cancer Institute.

Spot check: see if the image is working by running:

docker container run --rm --env LDAP_BIND_PASSWORD='[REDACTED]' --env SIGNING_KEY='s3cr3t' \
    --env ALLOWED_HOSTS='*' --publish 8000:8000 edrn-portal:latest

and visit http://localhost:8000/ and you should get Server Error (500) since the database connection isn't established.
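
If you'd rather script the spot check than use a browser, something like the following could work. It is a sketch using only the Python standard library; `probe` and `spot_check_verdict` are hypothetical names, and the interpretation of the status codes simply restates the paragraph above.

```python
import urllib.error
import urllib.request


def probe(url="http://localhost:8000/", timeout=5):
    """Return the HTTP status code for url, or None if nothing answers."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as error:
        return error.code  # 4xx and 5xx responses arrive as HTTPError
    except (urllib.error.URLError, OSError):
        return None  # connection refused, timeout, and similar


def spot_check_verdict(status):
    """Interpret a status code in the context of the spot check."""
    if status is None:
        return "no response: the container is probably not running"
    if status == 500:
        return "expected: the portal answered but has no database"
    return f"unexpected status {status}"
```

With the container running, `spot_check_verdict(probe())` should report the expected 500.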

For a Docker Composition, the accompanying docker/docker-compose.yaml file enables you to run the orchestrated set of needed processes in production, including the portal, maintenance worker, search engine, cache and message queue, and a database. You can launch all the processes at once with docker compose up.

👉 Note: On some systems, docker compose is actually docker-compose.

The environment variables listed above also apply to the docker compose command. The defaults in docker/docker-compose.yaml are suitable for running at the National Cancer Institute, but the environment variables absolutely need to be adjusted for every other context. A table of the additional environment variables follows:

Variable	Use	Default
`EDRN_DATA_DIR`	Volume to bind to provide media files and PostgreSQL DB	`/local/content/edrn`
`EDRN_IMAGE_OWNER`	Name of image owning org.; use an empty string for a local image	`edrndocker/`
`EDRN_PUBLISHED_PORT`	TCP port on which to make the HTTP service available	8080
`EDRN_TLS_PORT`	Encrypted TCP port, if the `tls-proxy` profile is enabled	4134
`EDRN_VERSION`	Version of the image to use, such as `latest`	`6.0.0`
`POSTGRES_PASSWORD`	Root-level password to the PostgreSQL database server	Unset

These variables are also necessary while setting up the containerized database.

After setting the needed variables, start the composition as follows:

docker compose --project-name edrn --file docker/docker-compose.yaml up --detach

You can now proceed to set up the database, search engine, and populate the portal with its content.

📀 Containerized Database Setup

Next, we need to set up the database with initial structure and content. This section tells you how.

🏛 Database Structure

To set up the initial database and its schema inside a Docker Composition, we start by creating the database:

docker compose --project-name edrn --file docker/docker-compose.yaml \
    exec db createdb --username=postgres --encoding=UTF8 --owner=postgres edrn

👉 Note: You must set the same environment variables for the above command, and for the subsequent commands, as you do when running the entire composition, especially POSTGRES_PASSWORD.

Next, run the Django database migrations (again, with the environment set):

docker compose --project-name edrn --file docker/docker-compose.yaml \
    exec portal django-admin makemigrations
docker compose --project-name edrn --file docker/docker-compose.yaml \
    exec portal django-admin migrate
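
Every composition command in this document repeats the same `--project-name` and `--file` options, so a small wrapper can reduce typing mistakes. The following is a sketch, not a supported tool: `compose`, `SETUP_STEPS`, and `run_steps` are hypothetical names, while the commands themselves are copied from this section. The environment variables described above must still be set in the process that runs it.

```python
import subprocess

# Prefix shared by every composition command in this document.
COMPOSE_PREFIX = [
    "docker", "compose",
    "--project-name", "edrn",
    "--file", "docker/docker-compose.yaml",
]


def compose(*args):
    """Return the full argument list for one docker compose invocation."""
    return COMPOSE_PREFIX + list(args)


# The database-structure steps from this section, in order.
SETUP_STEPS = [
    compose("exec", "db", "createdb", "--username=postgres",
            "--encoding=UTF8", "--owner=postgres", "edrn"),
    compose("exec", "portal", "django-admin", "makemigrations"),
    compose("exec", "portal", "django-admin", "migrate"),
]


def run_steps(steps, runner=subprocess.run):
    """Run each step in order; check=True stops at the first failure."""
    for argv in steps:
        runner(argv, check=True)
```

Calling `run_steps(SETUP_STEPS)` executes the three commands in sequence; the `runner` parameter exists so the sequence can be inspected without actually invoking Docker.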

👴 Import Content from Plone

👉 Note: This is no longer necessary. Plone hasn't been used in a while now; instead you just upgrade from the previous Wagtail-based version. However, I'm leaving this information intact for posterity and reference. Skip down to "Populate the Rest of the Content".

The next step is to bring the content from the older Plone-based site into the new Wagtail-based site. You will need the following:

edrn.json, a file containing the hierarchical content; ask the portal developer for a copy.
export_defaultpages.json, a file indicating the view for "folderish" content types; ask the portal developer for a copy.
The Plone URL prefix used to construct the edrn.json file; ask the developer for the correct value.
The blobstorage directory used by Plone; this is available on the host running the Plone version of the portal.

Import the content from the old Plone site (with the environment variables from above still set):

docker compose --project-name edrn --file docker/docker-compose.yaml \
    run --volume PLONE_EXPORTS_DIR:/mnt/zope --volume PLONE_BLOBS:/mnt/blobs \
    --entrypoint /usr/bin/django-admin --no-deps portal importfromplone \
    PLONE_URL /mnt/zope/edrn.json /mnt/zope/export_defaultpages.json /mnt/blobs

Substituting:

PLONE_EXPORTS_DIR with the path to the directory containing the edrn.json and export_defaultpages.json files (sold separately)
PLONE_BLOBS with the path to the Zope blobstorage directory, such as /local/content/edrn/blobstorage
PLONE_URL with the prefix URL (provided by the portal developer)

For example, you might save edrn.json and export_defaultpages.json to /tmp, have your blobs in /local/content/edrn/blobstorage, and be told that the prefix URL is http://nohost/edrn/; in that case, you'd run:

docker compose --project-name edrn --file docker/docker-compose.yaml \
    run --volume /tmp:/mnt/zope --volume /local/content/edrn/blobstorage:/mnt/blobs \
    --entrypoint /usr/bin/django-admin --no-deps portal importfromplone \
    http://nohost/edrn/ /mnt/zope/edrn.json /mnt/zope/export_defaultpages.json /mnt/blobs

🥤 Populate the Rest of the Content

You can then populate the rest of the database with EDRN content, maps, menus, and so forth (with the environment still set) by running:

docker compose --project-name edrn --file docker/docker-compose.yaml \
    exec portal django-admin collectstatic --no-input

The following edrnbloom step is no longer necessary; it was used only for the first instance of the Wagtail-based site. I'm leaving it here for future reference. Don't try to run it; it won't work.

env AWS_ACCESS_KEY_ID=KEY AWS_SECRET_ACCESS_KEY=SECRET docker compose --project-name edrn \
    --file docker/docker-compose.yaml \
    exec portal django-admin edrnbloom --hostname HOSTNAME

Instead, skip down to this next step:

docker compose --project-name edrn --file docker/docker-compose.yaml \
    exec portal django-admin ldap_group_sync
docker compose --project-name edrn --file docker/docker-compose.yaml \
    exec portal django-admin rdfingest  # This can take a long time, 10–20 minutes
docker compose --project-name edrn --file docker/docker-compose.yaml \
    exec portal django-admin autopopulate_main_menus

In the edrnbloom command above, replace HOSTNAME with the host name of the portal, such as edrn-dev.nci.nih.gov or edrn-stage.nci.nih.gov or even edrn.nci.nih.gov. Replace KEY and SECRET with the values of AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY. If you don't know them, leave the variables unset.

Lastly, stop the entire service and remove the orphaned containers made during the above steps:

docker compose --project-name edrn --file docker/docker-compose.yaml \
    down --remove-orphans

Then, start it up again, officially!

docker compose --project-name edrn --file docker/docker-compose.yaml \
    up --detach

Then you can point a browser at http://localhost:4135/ (or whatever the EDRN_PUBLISHED_PORT is) and see if it worked. Note that things won't look quite right because static resources aren't loaded on this endpoint URL. The front-end application load balancer or reverse-proxy must serve those.

🕸 Reverse Proxy: ELB, ALB, Nginx, Apache HTTPD, etc.

The Docker Composition itself is not enough, of course. The last step is to set up an actual web server to accept requests, serve static and media files, reverse-proxy to the portal container, handle TLS/SSL encryption, load balancing, and so forth.

The web server is also responsible for serving media files and static assets. This is for efficiency: there's no need to involve the backend content management system for such files (which can be large). Furthermore, by giving the server direct filesystem access, it can use the sendfile system call, which is blazingly efficient.

In a nutshell, the web server must serve MEDIA_URL requests to the MEDIA_ROOT directory (which is EDRN_DATA_DIR/media), STATIC_URL requests to the STATIC_ROOT directory (which is EDRN_DATA_DIR/static), and all other requests reverse-proxied to the EDRN_PUBLISHED_PORT (or EDRN_TLS_PORT if you're using it) TCP socket.

How you configure an Elastic Load Balancer, Application Load Balancer, Nginx, Apache HTTPD, or other web server to handle reverse-proxying to the portal container as well as serving static and media files depends on the software in use. In the interests of including a working example, though, see the following Nginx configuration:

server {
    listen …;
    location /media/ {                 # Request = http://whatever/media/documents/sentinel.dat
        root /local/web/content/edrn;  # Response = /local/web/content/edrn/media/documents/sentinel.dat
    }
    location /static/ {                # Request = http://whatever/static/edrn.theme/css/edrn-overlay.css
        root /local/web/content/edrn;  # Response = /local/web/content/edrn/static/edrn.theme/css/edrn-overlay.css
    }
    location / {                           # All other requests go to the portal container
        proxy_pass http://localhost:4135;  # EDRN_PUBLISHED_PORT = 4135
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_set_header X-Forwarded-Proto $scheme;
        proxy_redirect default;        
    }
}
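
To make the routing rule in that configuration concrete, here is a small Python model of the decision the web server makes for each request. It is only an illustration of the example above: `route`, `DATA_ROOT`, and `UPSTREAM` are hypothetical names, and the paths and port are the ones used in the Nginx sample, not necessarily those of your deployment.

```python
DATA_ROOT = "/local/web/content/edrn"   # the "root" directive in the example
UPSTREAM = "http://localhost:4135"      # the "proxy_pass" target in the example


def route(request_path):
    """Return ("file", filesystem_path) or ("proxy", upstream_url)."""
    for prefix in ("/media/", "/static/"):
        if request_path.startswith(prefix):
            # Nginx's "root" appends the whole request path to the directory.
            return ("file", DATA_ROOT + request_path)
    # Everything else is handed to the portal container unchanged.
    return ("proxy", UPSTREAM + request_path)
```

A request for /media/documents/sentinel.dat maps to a file under the data directory, matching the comment in the configuration, while a request for a portal page is proxied.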

🔻 Subpath Serving

Normally, the EDRN portal is hosted on the root path / of a host; for example, in production it's at https://edrn.nci.nih.gov/. However, for certain demonstrations and other expositions, it may be necessary to host it on a "subpath", such as https://edrn.jpl.nasa.gov/portal/renaissance/. Here, the subpath is /portal/renaissance/.

Depending on the web server, you may not need to do anything to support such a configuration, because the web server recognizes the subpath and sets the SCRIPT_NAME environment variable to the subpath (this is the case for mod_wsgi). Others, such as reverse-proxies, make no assumptions and make no such setting. When this is the case, you can set the FORCE_SCRIPT_NAME environment variable in the Docker composition to force the portal to believe a SCRIPT_NAME was set even when it wasn't.

As an example, if the web server is reverse-proxying to the Docker composition for URLs such as https://edrn.jpl.nasa.gov/portal/renaissance/, then we'd set FORCE_SCRIPT_NAME when starting the composition to /portal/renaissance/.
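
The practical effect of a script name is that every link the portal generates gains the subpath as a prefix. The sketch below models just that effect; `public_path` is a hypothetical function written for this explanation and is not how Django implements FORCE_SCRIPT_NAME internally.

```python
def public_path(script_name, app_path):
    """Join a script-name prefix and an application path into one URL path."""
    trimmed = script_name.strip("/")
    prefix = "/" + trimmed if trimmed else ""
    return prefix + "/" + app_path.lstrip("/")
```

With a script name of /portal/renaissance/, the application path /biomarkers/ becomes /portal/renaissance/biomarkers/; with an empty script name it stays /biomarkers/.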

👩‍💻 Software Environment

To develop for this system, you'll need

PostgreSQL 13 or later, but not 15 or later
Python 3.9 or later, but not 4.0 or later
Elasticsearch 7.17 or later, but not 8.0 or later
Redis 7.0 or later, but not 8.0 or later
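
Each requirement above is a half-open range: a minimum that is allowed and a maximum that is not. As a sketch of how such ranges could be checked, the following compares version numbers as tuples. The names `parse_version`, `SUPPORTED`, and `is_supported` are hypothetical, and only plain dotted numeric versions are handled.

```python
def parse_version(text):
    """Turn "7.17.3" into (7, 17, 3); only dotted numeric versions work."""
    return tuple(int(part) for part in text.split("."))


# (minimum inclusive, maximum exclusive), copied from the list above.
SUPPORTED = {
    "PostgreSQL": ((13,), (15,)),
    "Python": ((3, 9), (4, 0)),
    "Elasticsearch": ((7, 17), (8, 0)),
    "Redis": ((7, 0), (8, 0)),
}


def is_supported(name, version_text):
    """Report whether a version falls inside the documented range."""
    minimum, maximum = SUPPORTED[name]
    return minimum <= parse_version(version_text) < maximum
```

For instance, PostgreSQL 14.9 is inside its range while 15.0 is not.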

👥 Contributing

You can start by looking at the open issues, forking the project, and submitting a pull request. You can also contact us by email with suggestions.

🔢 Versioning

We use the SemVer philosophy for versioning this software. For versions available, see the releases made on this project. We're starting off with version 5 because reasons.

👩‍🎨 Creators

The principal developer is:

Sean Kelly

The QA team consists of:

To contact the team as a whole, email the Informatics Center.

📃 License

The project is licensed under the Apache version 2 license.

p5's Issues

Group Workspaces' login message isn't obvious

In P4 the text "If you are a member of this group, log in to gain full access" was in bold. In P5 it is lighter.

Dan Crichton is concerned people may not realize they need to log into the NCT to see documents there.

Broken Link on the Collaborative Opportunities Page

The "Request for Biomarkers" link is broken on the Collaborative Opportunities page.

To reproduce:

Visit the portal with a browser.
At the bottom, click "Collaborative Opportunities".
Click "Request for Biomarkers", result: 404 ❌

Broken Link on the Secretome Tool Page

There is a broken link on the page describing the Secretome Tool.

To reproduce:

Visit the portal.
Click "Resources" in the global navigation bar.
Under "Research Tools", click "Secretome".
Under "Contact Information", click "Michael Birrer"; result: 404 ❌

Revamp Home Page

🤔 Tell Us About the Feature

Make the EDRN home page like the suggestion on page 3 of the file EDRNPortalUpdatesSOWandNCISuggestions.pdf

🎇 What Solution You'd Like

Make the EDRN home page like the suggestion on page 3 of the attached.

〽️ Alternative Ideas

We could find even better images to use instead of the one on page 3. For example, Christopher Hawley at JPL has a group that produced several nice images.

🗺 Context

This is for the new portal Statement of Work.

Biomarker ingest for THORLNC doesn't link to study

🐛 Describe the Bug

After a full ingest, the biomarker THORLNC should have a link to the protocol "Clinical utility of novel ncRNAs as biomarkers for prostate cancer: Discovery of Novel Gene Elements Associated with Prostate Cancer Progression" but it doesn't.

It does on Focus BMDB.

📜 How To Reproduce

Log in
Run a full ingest at https://edrn-dev.jpl.nasa.gov/portal/5/dev/@@ingestRDF
Visit https://edrn-dev.jpl.nasa.gov/portal/5/dev/biomarkers/thorlnc
Click Organs
Scroll down to "Organ-Specific Protocols"

There should be a link here to "Clinical utility of novel ncRNAs as biomarkers for prostate cancer: Discovery of Novel Gene Elements Associated with Prostate Cancer Progression". Note that this protocol is indeed on the portal.

❗️ Mention if the bug appears:

When logged in
When not logged in

🔎 Expected Behavior

Maybe there should be a link?

PubMed link not working

links only take you to search page, not actual pub. This seems to be happening for every link on the publications tab that I click on (I tried maybe 5). Here’s an example - https://edrn.nci.nih.gov/publications/12432559-a-case-control-analysis-of-lymphocytic , then click on the pubmed id - https://pubmed.ncbi.nlm.nih.gov/?Db=pubmed&Cmd=DetailsSearch&Term=12432559%5Buid%5D

P5 Search Box - Not returning expected results

In the top search box - I type in an investigator name “Wan Lam” it returns his site, but not a link to him – seems like it should also return a link to him
If I enter “Identification of biomarkers for lung cancer in never smokers” – which is a protocol title. It doesn’t find it. I can find this when I click on Wan Lam.
If I search for “Dan Crichton” it also returns many results including his site, but again, not his name.

Broken link to JPL on Informatics Center page

The link to the Jet Propulsion Laboratory from the Informatics Center page is a 404.

To reproduce:

Visit the portal.
Click "About EDRN" in the global navigation bar.
Click "Scientific Components".
Click "Informatics Center".
Click "NASA Jet Propulsion Laboratory".

Slow 404s

🐛 Describe the Bug

There are hidden 404s for

jquery-integration.js
++resource+plone.app.jquerytools.js

on pages with David's statistical graphics. The pages load and the graphics work, but the hidden 404s slow down Plone.

📜 How To Reproduce

Open the browser's developer console and visit

/biomarkers
/data
/publications

❗️ Mention if the bug appears:

When logged in
When not logged in

🔎 Expected Behavior

No 404s please.

🖼 Screenshots

Ingest Resilience from Bad Data

🤔 Tell Us About the Feature

Dan said:

Can we not fix the ingest software to a) detect the issue, b) recover and ignore and c) report the ingest errors? It seems like all our ingest software agents should do this.

This is going to generate a lot of email.

🎇 What Solution You'd Like

What Dan said.

〽️ Alternative Ideas

Force upstream data providers to be clean.

🗺 Context

DMCC Pub 2588 says the PubMed ID is "PMC5677596", but it should be "28482112"
DMCC Pub 2593 says the PubMed ID is "PMC6175854", but it should be "30297783"
DMCC Pub 2594 says the PubMed ID is "PMC6175854", but it should be "30209824"
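
The three examples share a pattern: the upstream value is a PubMed Central identifier (it starts with "PMC") where a numeric PubMed ID was expected. A detect, recover, and report step might look like the sketch below. The function names are hypothetical and this is not the portal's ingest code; it only shows that such records can be set aside and reported without stopping the rest of the ingest.

```python
import re

PMID_PATTERN = re.compile(r"[0-9]+")      # PubMed IDs are plain digits
PMCID_PATTERN = re.compile(r"PMC[0-9]+")  # PubMed Central IDs start with PMC


def classify_identifier(value):
    """Return "pmid", "pmcid", or "unknown" for an identifier string."""
    value = value.strip()
    if PMID_PATTERN.fullmatch(value):
        return "pmid"
    if PMCID_PATTERN.fullmatch(value):
        return "pmcid"
    return "unknown"


def partition_publications(records):
    """Split (publication_number, identifier) pairs into usable and rejected."""
    accepted, rejected = [], []
    for number, identifier in records:
        kind = classify_identifier(identifier)
        if kind == "pmid":
            accepted.append((number, identifier.strip()))
        else:
            # Keep the reason so it can go into an ingest report.
            rejected.append((number, identifier, kind))
    return accepted, rejected
```

The rejected list is what would feed the error report, ideally as one summary per ingest rather than one email per record.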

Viewing a Disease gives a stack trace

When attempting to view a Disease object on the portal, a stack trace is generated.

Reproduction

Visit https://edrn.nci.nih.gov/resources/diseases/malignant-neoplasm-of-bronchus-and-lung/view while logged in.
Get the stack trace.

If you're not logged in, you just get an error message.

Stack Trace

Traceback (innermost last):
  Module ZPublisher.Publish, line 138, in publish
  Module ZPublisher.mapply, line 77, in mapply
  Module ZPublisher.Publish, line 48, in call_object
  Module plone.autoform.view, line 40, in __call__
  Module plone.autoform.view, line 50, in _update
  Module z3c.form.form, line 136, in updateWidgets
  Module z3c.form.field, line 277, in update
  Module z3c.form.browser.multi, line 63, in update
  Module z3c.form.browser.widget, line 171, in update
  Module z3c.form.widget, line 496, in update
  Module Products.CMFPlone.patches.z3c_form, line 47, in _wrapped
  Module z3c.form.widget, line 132, in update
  Module z3c.form.converter, line 387, in toWidgetValue
  Module z3c.form.object, line 108, in toWidgetValue
  Module z3c.form.datamanager, line 76, in query
  Module z3c.form.datamanager, line 71, in get
  Module z3c.form.datamanager, line 66, in adapted_context
TypeError: ('Could not adapt', <z3c.relationfield.relation.RelationValue object at 0x7f9b6aee99d0>, <SchemaClass eke.knowledge.knowledgeobject.IKnowledgeObject>)

Google Search Console disconnected

🐛 Describe the Bug

With the upgrade to P5, the EDRN portal's connection to Google Search Console is cut off.

📜 How To Reproduce

Log into Google Search Console and see if edrn.nci.nih.gov is active. It's not.

🔎 Expected Behavior

That it's active.

Dataset Search Result Gives Error

When searching for "Tabb" and clicking on a dataset result in the live results you get an error.

To reproduce:

Visit the portal.
In the search box (upper-right), type "Tabb".
Click "CPTAC Phase 1 Data".
Result: "We're sorry, but there seems to be an error…" ❌

Some biomarkers - page doesn't exist

When clicking on some of the biomarkers on the biomarker tab I get a page doesn't exist error.
Examples:
https://edrn-aws.nci.nih.gov/biomarkers/ACTC1
https://edrn-aws.nci.nih.gov/biomarkers/ADIPOQ

Biomarkers page is "kinda of funky"

Reported by Dan Crichton [email protected] in email <[email protected]>:

I did just notice that biomarker is kinda of funky.

Context of this report: responsive design; EDRN is required to have a responsive web UI for graceful display on mobile devices (iPhones, iPads, etc.) 📱

Investigators repeating in Protocols

Investigators repeating in protocols. It seems to be happening in every protocol – here are a few links:
https://edrn.jpl.nasa.gov/protocols/421-a-biomarker-bakeoff-in-early-stage-pancreatic

https://edrn.jpl.nasa.gov/protocols/258-a-methylation-panel-for-bladder-cancer

https://edrn.jpl.nasa.gov/protocols/447-an-automated-system-for-breast-cancer
https://edrn.jpl.nasa.gov/protocols/434-a-prospective-study-to-establish-a-new-onset

504 Gateway Timeout

clicked on JPL - https://edrn.nci.nih.gov/sites/128-nasa-jet-propulsion-laboratory.
Then click on Kristen’s link – get a 504 Gateway Timeout ☹

Portal Ingest needs an end signal, error handling

🤔 Tell Us About the Feature

When a full ingest happens, you know when it fails because there will be an error message. But if it succeeds, you get nothing—no indication at all that it's done. You just have to "know" from the log files that it's truly complete.

On top of that, there's this code:

for path in paths:
    try:
        ingestor = IIngestor(folder)
        results = ingestor.ingest()
        …
    except Exception as ex:
        # What should we do here??!
        raise ex

We should do something there! Maybe keep track of failed ingests?

🎇 What Solution You'd Like

Add both "INGEST START" and "INGEST COMPLETE" log indicators with lots of emoji so they're easy to find. Handle ingest errors better.
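
One possible shape for this, offered as a sketch rather than the actual fix: log a start marker, collect failures instead of re-raising, and log a completion marker with counts. `ingest_all` and the `ingest_one` callable are hypothetical stand-ins for the real ingestor lookup shown in the loop above.

```python
import logging

logger = logging.getLogger("ingest")


def ingest_all(paths, ingest_one):
    """Ingest every path, logging start and end and collecting failures."""
    logger.info("🚀 INGEST START: %d paths", len(paths))
    failures = {}
    for path in paths:
        try:
            ingest_one(path)
        except Exception as ex:  # one bad source should not stop the rest
            logger.exception("ingest failed for %s", path)
            failures[path] = ex
    logger.info("🏁 INGEST COMPLETE: %d succeeded, %d failed",
                len(paths) - len(failures), len(failures))
    return failures
```

The returned dictionary maps each failed path to its exception, so the caller can decide whether to report, retry, or abort.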

Milestone Release Number

🤔 Tell Us About the Feature

It'd be nice to know which version of the EDRN portal you're looking at. How about put a version in the colophon?

🎇 What Solution You'd Like

Put a version in the colophon.

〽️ Alternative Ideas

Put the version somewhere else.

🗺 Context

Well we have the following portal instantiations:

Dev site
Demo, also known as https://edrn.jpl.nasa.gov/ at the present
NCI Dev
NCI Stage
NCI Production, also known as https://edrn.nci.nih.gov/
Plus those run in the development lab environments of each developer on the team

Change container user ID to "edrn"

Although I'm not sure why, CBIIT wants the container user name and ID that runs the Zope/Plone appserver to match that of the host:

uid=26013(edrn) gid=26013(edrn) groups=26013(edrn)

We need to change the Dockerfile to use this ID instead of the one inherited from the Plone image, "plone".

VSIMS should link to https://www.compass.fhcrc.org/vs/login.asp?pt=&m=

On the Informatics Page, VSIMS should link to https://www.compass.fhcrc.org/vs/login.asp?pt=&m=

To reproduce:

Visit the portal.
In the "Quick Links" portlet, click "Informatics".
In the table under "Tools", notice the hyperlink to "VSIMS" in the left column: it's http://www.compass.fhcrc.org/vsims/

It should be https://www.compass.fhcrc.org/vs/login.asp?pt=&m= 🤷‍♀️

update member directory page

Add text box search and faceted search.
Faceted search by PI, by site, maybe by funding type.
Can the faceted search allow you to start typing and then show matching entries? Just trying to get around these lists being so long. IDK.

Broken Link on Informatics Page

On the Informatics Page, the link to the LabCAS user guide doesn't work.

To reproduce:

Visit the portal.
On the "Quick Links" portlet, click "Informatics".
In the table under "Tools", across from "LabCAS", click "LabCAS User Guide".
After several minutes, the browser gives up trying to contact oodt.jpl.nasa.gov 💥

Add OncoMX

On the Informatics Page, add a row in the Tools table to OncoMX.

First Column

For the first column, use this icon:

And use this text: OncoMX

Second Column

For the second column: Use this description: An integrated cancer mutation and expression resource for exploring cancer biomarkers alongside related experimental data and functional information. Open to the public.

Where to Put It

Visit the portal.
In the "Quick Links" portlet, click "Informatics".
In the table under "Tools", insert a row.

Stack trace after editing BiomarkerFolder

When editing a BiomarkerFolder, the disclaimer text field doesn't contain any of the disclaimer text but a complaint from Plone's "RichTextField". Replacing that with actual valid disclaimer text and saving makes the problem much worse: the site stops displaying any biomarkers and shows instead "The site encountered an error trying to fulfill your request."

The stack trace in the log shows an issue trying to render the disclaimer field:

Traceback (innermost last):
  Module five.pt.engine, line 98, in __call__
  Module z3c.pt.pagetemplate, line 163, in render
  Module chameleon.zpt.template, line 261, in render
  Module chameleon.template, line 191, in render
  Module chameleon.template, line 171, in render
  Module 8cf79282a3bb1497910773cd71ed0c10.py, line 191, in render
  Module 44cf5528f1bcf1906524392197259774.py, line 511, in render_content_core
  Module five.pt.expressions, line 154, in __call__
  Module five.pt.expressions, line 126, in traverse
  Module zope.traversing.adapters, line 142, in traversePathElement
   - __traceback_info__: (u'Lorem', 'output')
  Module zope.traversing.adapters, line 56, in traverse
   - __traceback_info__: (u'Lorem', 'output', ())
LocationError: 'getText'
 - Location:   (line 0: col 0)
 - Arguments:  repeat: {...} (0)
               template: <ViewPageTemplateFile - at 0x7f3408c0b150>
               views: <ViewMapper - at 0x7f3408bc80d0>
               modules: <instance - at 0x7f34177ac370>
               args: <tuple - at 0x7f341af57050>
               here: <ImplicitAcquisitionWrapper - at 0x7f340afdb690>
               user: <ImplicitAcquisitionWrapper - at 0x7f34080e25f0>
               view: <SimpleViewClass from /plone/buildout-cache/eggs/eea.facetednavigation-11.7-py2.7.egg/eea/facetednavigation/browser/template/query.pt faceted_query at 0x7f3408bc8110>
               nothing: <NoneType - at 0x7f341b4f8da0>
               container: <ImplicitAcquisitionWrapper biomarkers at 0x7f33ff3695f0>
               kssClassesView: <DefaultFieldDecoratorView kss_field_decorator_view at 0x7f33fe0e0d90>
               contentFilter: {...} (0)
               plone_view: <Plone plone at 0x7f33feff4190>
               batch_base_url: https://edrn-new.jpl.nasa.gov/portal/biomarkers
               getKssClasses: <instancemethod getKssClassesInlineEditable at 0x7f34081c7fa0>
               root: <ImplicitAcquisitionWrapper Zope at 0x7f33fe043d20>
               request: <instance - at 0x7f33fe043c80>
               wrapped_repeat: <SafeMapping - at 0x7f33fe0546b0>
               traverse_subpath: <list - at 0x7f33fe0f6c30>
               default: <object - at 0x7f341a1787a0>
               loop: {...} (1)
               context: <ImplicitAcquisitionWrapper biomarkers at 0x7f33ff3695f0>
               templateId: query.pt
               translate: <function translate at 0x7f33fe57ded0>
               folderContents: <Batch - at 0x7f33ff113910>
               options: {...} (0)
               target_language: <NoneType - at 0x7f341b4f8da0>

Setting the disclaimer to an empty string is a temporary workaround.

Remove Statistics Link

On the Informatics Page, the "Statistics" link shouldn't be there.

To reproduce:

Visit the portal.
In the "Quick Links" portlet, click "Informatics".
In the table under "Tools", notice the entire row labeled "Statistics" in the first column and "The Diagnostic and Biomarkers Statistical …" in the second column.

Expected: Not this row.
Got: This row.

Can't tag Collaborations Folder

When editing a Collaborations Folder, such as the Groups folder, there is no way to add subject keywords ("tags") as there is no "Categorization" tab on the edit pane.

This is probably due to a missing Dexterity behavior. This probably affects all types in eke.knowledge.

Dev Warning Banner is Off

🐛 Describe the Bug

The development warning banner is off. That's fine in production but it should be on by default everywhere else.

The problem is that the production database is used for development, but the Jenkins job to set up the development version doesn't toggle the banner on.

Dataset Search Results Shows Plone Page

When clicking on a search result for "Gazdar", we get a hated Plone page instead of eCAS.

To reproduce:

Visit the portal.
Type "Gazdar" in the search box (upper-right).
Click "FHCRC Tewari Efficiencies".
Result: it's a Plone page, not eCAS.

CBIIT Vulnerabilities Detected by TwistLock

CBIIT ran TwistLock on the P5 image and found numerous critical and high vulnerabilities. These must be addressed. The attached spreadsheet shows the issues.

twistlock_registry_01_03_20_16_41_31.xlsx

P5 Slow

Some things seem really slow on P5. For example, going from the list of members and clicking on JPL (https://edrn.nci.nih.gov/sites/128-nasa-jet-propulsion-laboratory) took forever.

Filtered DMCC publications no longer needed

✋ Hold it!

I created the specially edited publications RDF feed to filter out bad PubMed IDs from the DMCC that were of the weird form "PMC123456" instead of "123456". This was to make RDF ingest work in P4.

Well, it turns out P5 already has code to filter out bad PubMed IDs, so this specially filtered feed is no longer needed. P5 should ingest publications directly from the CancerDataExpo.
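For illustration only, a filter of this kind could look like the sketch below. It is not the code P5 actually uses; the function names are hypothetical, and it assumes a valid PubMed ID is a non-empty string of ASCII digits, so that values such as "PMC123456" are rejected.

```python
import re

# Assumption: a valid PubMed ID consists solely of ASCII digits.
_PUBMED_ID = re.compile(r"[0-9]+")


def is_valid_pubmed_id(identifier):
    """Return True if the identifier looks like a plain numeric PubMed ID."""
    return _PUBMED_ID.fullmatch(identifier.strip()) is not None


def filter_pubmed_ids(identifiers):
    """Keep only the plain numeric IDs, with surrounding whitespace removed."""
    return [i.strip() for i in identifiers if is_valid_pubmed_id(i)]


if __name__ == "__main__":
    print(filter_pubmed_ids(["123456", "PMC123456"]))
```

With a check like this applied at ingest time, the unfiltered feed can be consumed directly, which is the point of this issue.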

📕 Summary

Change the RDF feed for /publications to https://edrn.jpl.nasa.gov/cancerdataexpo/rdf-data/publications/@@rdf.

Resources tab - links broken

From resources tab

Click on the secretome link (https://edrn.jpl.nasa.gov/secretome): from the resources tab it doesn't seem to take me to the tool.
Click on the miRNA link (https://edrn-aws.nci.nih.gov/microrna), then click on the dashboard link (https://edrn.jpl.nasa.gov/miRNA/): I get "page doesn't exist".
Click on specimen reference sets (https://edrn-aws.nci.nih.gov/resources/sample-reference-sets), then click on a reference set document (https://edrn-aws.nci.nih.gov/specimens/reference-sets/bbd-tissue-reference-set/bbd-reference-set-summary-document): page doesn't exist.

Protocol missing Lead PI

The Lead Investigator is missing from Protocol 421. When looking at this protocol on the EDRN portal that’s running at NCI it shows Randall Brand as the Lead PI - https://edrn.nci.nih.gov/protocols/421-a-biomarker-bakeoff-in-early-stage-pancreatic

Protocol link - returns error

🐛 Describe the Bug

From Ziding Feng's "page", clicking on a Protocol link gives an error (see attached screenshot).

📜 How To Reproduce

From Ziding's "Page" - https://edrn.nci.nih.gov/sites/5-fred-hutchinson-cancer-research-center/feng-ziding
Click on closed protocol - MALDI Dilution Data: Randolph
Get page "We’re sorry, but there seems to be an error…"

❗️ Mention if the bug appears:

[X] When not logged in

or possibly all the time!

🖼 Screenshots

🕵️‍♀️ Extra Details

page-within-a-page

🐛 Describe the Bug

In the biomarker tab of the portal, when I click on a biomarker name to drill into a marker, I get a portal page within a portal page. See attached screenshot.

📜 How To Reproduce

Cannot reproduce

❗️ Mention if the bug appears:
[X] When not logged in

or possibly all the time!

🖼 Screenshots

Ingest new Publications

Ingest publications found based on grant numbers provided by Christos.

Add only new publications, ignore existing publications
https://edrn.nci.nih.gov/publications
EDRNgrant_to_pubmed-excel.xlsx

Dev P5 sites using Prod Analytics and Robots

🐛 Describe the Bug

Dev instances (such as on tumor.jpl.nasa.gov, edrn-docker.jpl.nasa.gov, desktop computers, etc.) are using Google Analytics and robots.txt files that are appropriate for production, but are giving bad data when used in non-production environments.

📜 How To Reproduce

Visit any dev instance page and notice the Google Analytics JavaScript code.
Visit any dev instance and get the /robots.txt file.

❗️ Mention if the bug appears:

When logged in
When not logged in

🔎 Expected Behavior

There should be no Google Analytics JavaScript code. And the robots.txt should say it disallows the entire site from being crawled.
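As a rough sketch of the expected behavior, the robots.txt body could be chosen by environment as shown below. The `robots_txt` function and its `is_production` flag are hypothetical; how the portal would actually detect a production deployment is not specified in this issue.

```python
def robots_txt(is_production):
    """Return a robots.txt body appropriate for the kind of deployment.

    Assumption: the caller already knows whether this is the production site.
    """
    if is_production:
        # An empty Disallow permits crawling of the entire site.
        return "User-agent: *\nDisallow:\n"
    # "Disallow: /" asks all crawlers to stay away from the entire site.
    return "User-agent: *\nDisallow: /\n"


if __name__ == "__main__":
    print(robots_txt(is_production=False))
```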

Data Dispatcher

We need a data dispatcher.

Explanation

The various objects of the EDRN Knowledge Environment (EKE) all have unique RDF identifiers (URIs), such as http://edrn.jpl.nasa.gov/bmdb/biomarkers/view/747 being the MTO1 biomarker or urn:edrn:data:null being the null EKE object. These are URIs that aren't necessarily URLs, though, which is fine in RDF.

For SEO reasons, the portal uses readable URLs that don't match the URIs. For example, the protocol with RDF URI http://edrn.nci.nih.gov/data/protocols/421 could live at https://edrn.nci.nih.gov/protocols/421-a-biomarker-bakeoff-in-early-stage-pancreatic.

When other applications like LabCAS want to link to that protocol, though, they know the RDF URI, but can't reliably generate 421-a-biomarker-bakeoff-in-early-stage-pancreatic.

Implementation

It'd be great if the portal had an endpoint like

https://edrn.nci.nih.gov/dispatch?id=URI

where URI could be any URI. For example, you could plug in

https://edrn.nci.nih.gov/dispatch?id=http://edrn.nci.nih.gov/data/protocols/421

and the software would give an automatic redirect to https://edrn.nci.nih.gov/protocols/421-a-biomarker-bakeoff-in-early-stage-pancreatic.

This way LabCAS wouldn't need to duplicate how Plone generates URLs, which would ease @yuliujpl's life.
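A minimal sketch of the lookup behind such an endpoint might look like the following. It assumes a plain mapping from RDF URI to portal URL (in the real portal this would presumably be a catalog query); the `dispatch` function and the `index` mapping are hypothetical names used only for illustration.

```python
from urllib.parse import parse_qs, urlparse


def dispatch(request_url, index):
    """Return the portal URL to redirect to, or None if the URI is unknown.

    `index` is an assumed mapping of RDF URI -> readable portal URL.
    """
    query = parse_qs(urlparse(request_url).query)
    uris = query.get("id", [])
    if not uris:
        return None  # No "id" parameter was supplied.
    return index.get(uris[0])


if __name__ == "__main__":
    index = {
        "http://edrn.nci.nih.gov/data/protocols/421":
            "https://edrn.nci.nih.gov/protocols/421-a-biomarker-bakeoff-in-early-stage-pancreatic",
    }
    print(dispatch(
        "https://edrn.nci.nih.gov/dispatch?id=http://edrn.nci.nih.gov/data/protocols/421",
        index,
    ))
```

A real view would turn the returned URL into an HTTP redirect and a `None` result into a 404. Callers should percent-encode URIs that contain characters such as `&` or `#`, since those would otherwise be cut off by query-string parsing.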

CDE spreadsheet missing

CDE spreadsheet missing - https://edrn.nci.nih.gov/docs/cde
Please post sheet sent recently from DMCC. Let me know if you need a copy or want me to do this :)

"A Biomarker Bakeoff" protocol is giving a stack trace

🐛 Describe the Bug

Recent changes to site ingest have caused a problem with some protocols.

📜 How To Reproduce

Do a full ingest
Visit /protocols/421-a-biomarker-bakeoff-in-early-stage-pancreatic

❗️ Mention if the bug appears:

When logged in
When not logged in

🔎 Expected Behavior

You should see the protocol

🖼 Stacktrace

What you get instead is:

Traceback (innermost last):
  Module ZPublisher.Publish, line 138, in publish
  Module ZPublisher.mapply, line 77, in mapply
  Module ZPublisher.Publish, line 48, in call_object
  Module grokcore.view.components, line 150, in __call__
  Module grokcore.view.components, line 154, in _render_template
  Module five.grok.components, line 130, in render
  Module zope.pagetemplate.pagetemplate, line 137, in pt_render
  Module five.pt.engine, line 98, in __call__
  Module z3c.pt.pagetemplate, line 163, in render
  Module chameleon.zpt.template, line 261, in render
  Module chameleon.template, line 171, in render
  Module 6ca8aa17614a9d4875b6cd10a0666e8b.py, line 2190, in render
  Module dd615a19901f2a251bc852b71bd5ae29.py, line 1223, in render_master
  Module dd615a19901f2a251bc852b71bd5ae29.py, line 458, in render_content
  Module 6ca8aa17614a9d4875b6cd10a0666e8b.py, line 2178, in __fill_main
  Module 6ca8aa17614a9d4875b6cd10a0666e8b.py, line 664, in render_main
  Module five.pt.expressions, line 161, in __call__
  Module Products.CMFDynamicViewFTI.browserdefault, line 76, in __call__
  Module grokcore.view.components, line 150, in __call__
  Module grokcore.view.components, line 154, in _render_template
  Module five.grok.components, line 130, in render
  Module zope.pagetemplate.pagetemplate, line 137, in pt_render
  Module five.pt.engine, line 98, in __call__
  Module z3c.pt.pagetemplate, line 163, in render
  Module chameleon.zpt.template, line 261, in render
  Module chameleon.template, line 191, in render
  Module chameleon.template, line 171, in render
  Module 434cfe228028911c829575e4cac5ffb4.py, line 1272, in render
  Module dd615a19901f2a251bc852b71bd5ae29.py, line 1223, in render_master
  Module dd615a19901f2a251bc852b71bd5ae29.py, line 420, in render_content
  Module 434cfe228028911c829575e4cac5ffb4.py, line 1260, in __fill_content_core
  Module 434cfe228028911c829575e4cac5ffb4.py, line 736, in render_content_core
  Module five.pt.expressions, line 154, in __call__
  Module five.pt.expressions, line 126, in traverse
  Module zope.traversing.adapters, line 142, in traversePathElement
   - __traceback_info__: (None, 'absolute_url')
  Module zope.traversing.adapters, line 56, in traverse
   - __traceback_info__: (None, 'absolute_url', ())
LocationError: (None, 'absolute_url')

 - Expression: "not:site"
 - Filename:   ... e.knowledge/src/eke/knowledge/protocol_templates/view.pt
 - Location:   (line 88: col 46)
 - Source:     <p tal:condition='not:site' class='discreet' i18n:translate= ...
                                 ^^^^^^^^
 - Expression: "investigator/to_object/absolute_url"
 - Filename:   ... c/eke.knowledge/src/eke/knowledge/site_templates/view.pt
 - Location:   (line 86: col 69)
 - Source:     ... ttributes='href investigator/to_object/absolute_url'
                                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
 - Arguments:  repeat: {...} (0)
               template: <ViewPageTemplateFile - at 0x113a1a1d0>
               views: <ViewMapper - at 0x116afb150>
               modules: <instance - at 0x10aedcd88>
               args: <tuple - at 0x10a3d8050>
               here: <ImplicitAcquisitionWrapper 5-fred-hutchinson-cancer-research-center at 0x116b99c80>
               static: <DirectoryResource15 None at 0x116afb510>
               user: <ImplicitAcquisitionWrapper - at 0x116ccea50>
               nothing: <NoneType - at 0x10a32ff28>
               container: <ImplicitAcquisitionWrapper 5-fred-hutchinson-cancer-research-center at 0x116b99c80>
               request: <instance - at 0x116962320>
               wrapped_repeat: <SafeMapping - at 0x116f88100>
               traverse_subpath: <list - at 0x1170f4950>
               default: <object - at 0x10a42ea90>
               context: <ImplicitAcquisitionWrapper 5-fred-hutchinson-cancer-research-center at 0x116b99c80>
               view: <View view at 0x116b49d50>
               translate: <function translate at 0x1170e2320>
               root: <ImplicitAcquisitionWrapper Zope at 0x116cdc870>
               options: {...} (0)
               loop: {...} (1)
               target_language: <NoneType - at 0x10a32ff28>

Dev portal must remove Google Analytics

🤔 Tell Us About the Feature

Now that P5 is in operations, we are importing the operational database and prepping it for development on the next release of the portal, automated by Jenkins. The operational database includes Google Analytics, though, which makes sense only on edrn.nci.nih.gov. It messes up on all the development environments.

🎇 What Solution You'd Like

Jenkins should drop the Google Analytics in the development post-processed DB.

〽️ Alternative Ideas

No other ideas.

🗺 Context

No further context.

Unexpected search results - search w/i biomarker tab

🐛 Describe the Bug

Searching for "14-3-3 theta" using the free text search box within the biomarker tab did not return any results.

📜 How To Reproduce

Cannot reproduce; it seems to be working.
https://edrn.nci.nih.gov/biomarkers#c0=20&b_start=0&c3=14-3-3

Dev Banner

🤔 Tell Us About the Feature

In P4 days, we had a notice banner or similar that warned people that the version of the portal we were using was in development and to please visit the actual production portal. It sure would be nice to have that back.

🎇 What Solution You'd Like

Any time we build a dev portal, it should automatically have a notice about development.

〽️ Alternative Ideas

I got nothin'.

🗺 Context

Missing Science Data

In the "Data" section, the following datasets are not showing information:

All of the Basophile files (first page) should list David Tabb as PI.
All of the Lung Carcinoma (third page) should list Daniel Liebler as PI.
UAB Test EGFR Translocation Data (fourth page) should list Bill Grizzle as PI.
UAB Test Preinvasive Neoplasia Image and Summary Data should list Bill Grizzle as PI.

These datasets have a PI in eCAS, but they have either invalid protocols or no protocols at all, which may be the problem. (None of these appear in the P4 portal.)

Stack Trace from Biomarker → Protocol

Following a link from a biomarker to a protocol gives a stack trace for non-logged in users.

Reproduction

Visit https://edrn-new.jpl.nasa.gov/portal/biomarkers/muc1
Click the "Studies" tab
Click "SPORE/EDRN/PRE-PLCO Ovarian Phase II Validation Study"

Results

We’re sorry, but there seems to be an error…
The error has been logged as entry number 1582059591.270.623172474017.

If you need to report this to the
site administration, please include this entry number in your message.

Traceback from Site Ingest

🐛 Describe the Bug

When ingesting from RDF Sites, we are getting a stack trace:

  File "/Users/kelly/Documents/Clients/JPL/Cancer/Portal/Development/P5/src/eke.knowledge/src/eke/knowledge/rdfingestor.py", line 44, in update
    results = ingestor.ingest()
  File "/Users/kelly/Documents/Clients/JPL/Cancer/Portal/Development/P5/src/eke.knowledge/src/eke/knowledge/sitefolder.py", line 227, in ingest
    self.addInvestigators(siteURI, sites, _piURI, people, predicates, 'principalInvestigator', False)
  File "/Users/kelly/Documents/Clients/JPL/Cancer/Portal/Development/P5/src/eke.knowledge/src/eke/knowledge/sitefolder.py", line 206, in addInvestigators
    setattr(site, fieldName, RelationValue(personIDs[0]))

The error is: IndexError('list index out of range')
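The traceback points at `personIDs[0]` being evaluated on an empty list. One possible guard is sketched below; it is not the project's actual fix, and `set_investigator` and the `make_relation` parameter (standing in for `RelationValue`) are hypothetical names.

```python
from types import SimpleNamespace


def set_investigator(site, field_name, person_ids, make_relation):
    """Set the relation only when at least one person ID was found.

    Returns True if the field was set, or False if there was nothing to set.
    """
    if not person_ids:
        # No matching person: skip instead of raising IndexError.
        return False
    setattr(site, field_name, make_relation(person_ids[0]))
    return True


if __name__ == "__main__":
    site = SimpleNamespace()
    # `int` is a stand-in here for the real relation constructor.
    print(set_investigator(site, "principalInvestigator", [], int))
    print(set_investigator(site, "principalInvestigator", [42], int))
```

Whether a site without a matching investigator should be skipped silently or logged as a data problem is a design decision this sketch leaves open.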

📜 How To Reproduce

Start the EDRN P5 portal.

Ensure ingest is enabled
Visit /@@ingestRDF

❗️ Mention if the bug appears:

When logged in
When not logged in

🔎 Expected Behavior

No stack trace.

Broken Links on Collaborative Opportunities Page

Two links are broken under the Collaborative Opportunities page.

To reproduce:

Visit the portal with a browser.
At the bottom, click "Collaborative Opportunities".
Click "Application Procedure, Receipt Dates, Review".
Click "Collaborative Groups", result: 404 ❌
Click "Executive Committee", result: 404 ❌

Standalone Ingest (`devrbuild` and Jenkins) fails

🐛 Describe the Bug

Building a local dev env P5 fails during the ingest step. It also fails in Jenkins.

📜 How To Reproduce

Run support/devrbuild.sh with a call to support/ingest.py.

🔎 Expected Behavior

Ingest should succeed

🖼 Stacktrace

2020-06-19 20:14:23 ERROR root This is most unfortunate: ((<Item at /edrn/protocols/286-prostate-rapid-reference-set-application-brian>, <HTTPRequest, URL=http://foo>, None, <z3c.relationfield.schema.RelationChoice object at 0x10ca44050>, <SelectWidget 'principalInvestigator'>), <InterfaceClass z3c.form.interfaces.ITerms>, u'')
Traceback (most recent call last):
  File "/Users/kelly/Documents/Clients/JPL/Cancer/Portal/Development/P5/support/ingest.py", line 149, in main
    _main(app)
  File "/Users/kelly/Documents/Clients/JPL/Cancer/Portal/Development/P5/support/ingest.py", line 138, in _main
    _ingest(portal)
  File "/Users/kelly/Documents/Clients/JPL/Cancer/Portal/Development/P5/support/ingest.py", line 98, in _ingest
    ingestor.ingest()
  File "/Users/kelly/Documents/Clients/JPL/Cancer/Portal/Development/P5/src/eke.knowledge/src/eke/knowledge/sitefolder.py", line 230, in ingest
    people = self._ingestPeople(peopleStatements, sites)
  File "/Users/kelly/Documents/Clients/JPL/Cancer/Portal/Development/P5/src/eke.knowledge/src/eke/knowledge/sitefolder.py", line 173, in _ingestPeople
    person = self.createPerson(site, uri, predicates)
  File "/Users/kelly/Documents/Clients/JPL/Cancer/Portal/Development/P5/src/eke.knowledge/src/eke/knowledge/sitefolder.py", line 144, in createPerson
    personID=urlparse.urlparse(unicode(identifier)).path.split(u'/')[-1]
  File "/Users/kelly/.buildout/eggs/plone.dexterity-2.6.2-py2.7.egg/plone/dexterity/utils.py", line 203, in createContentInContainer
    checkConstraints=checkConstraints
  File "/Users/kelly/.buildout/eggs/plone.dexterity-2.6.2-py2.7.egg/plone/dexterity/utils.py", line 189, in addContentToContainer
    newName = container._setObject(name, object)
  File "/Users/kelly/.buildout/eggs/Products.BTreeFolder2-2.14.0-py2.7.egg/Products/BTreeFolder2/BTreeFolder2.py", line 461, in _setObject
    notify(ObjectAddedEvent(ob, self, id))
  File "/Users/kelly/.buildout/eggs/zope.event-3.5.2-py2.7.egg/zope/event/__init__.py", line 31, in notify
    subscriber(event)
  File "/Users/kelly/.buildout/eggs/zope.component-4.4.1-py2.7.egg/zope/component/event.py", line 27, in dispatch
    component_subscribers(event, None)
  File "/Users/kelly/.buildout/eggs/zope.component-4.4.1-py2.7.egg/zope/component/_api.py", line 139, in subscribers
    return sitemanager.subscribers(objects, interface)
  File "/Users/kelly/.buildout/eggs/zope.interface-4.4.3-py2.7-macosx-10.14-x86_64.egg/zope/interface/registry.py", line 442, in subscribers
    return self.adapters.subscribers(objects, provided)
  File "/Users/kelly/.buildout/eggs/zope.interface-4.4.3-py2.7-macosx-10.14-x86_64.egg/zope/interface/adapter.py", line 607, in subscribers
    subscription(*objects)
  File "/Users/kelly/.buildout/eggs/zope.component-4.4.1-py2.7.egg/zope/component/event.py", line 36, in objectEventNotify
    component_subscribers((event.object, event), None)
  File "/Users/kelly/.buildout/eggs/zope.component-4.4.1-py2.7.egg/zope/component/_api.py", line 139, in subscribers
    return sitemanager.subscribers(objects, interface)
  File "/Users/kelly/.buildout/eggs/zope.interface-4.4.3-py2.7-macosx-10.14-x86_64.egg/zope/interface/registry.py", line 442, in subscribers
    return self.adapters.subscribers(objects, provided)
  File "/Users/kelly/.buildout/eggs/zope.interface-4.4.3-py2.7-macosx-10.14-x86_64.egg/zope/interface/adapter.py", line 607, in subscribers
    subscription(*objects)
  File "/Users/kelly/.buildout/eggs/Products.CMFCore-2.2.12-py2.7.egg/Products/CMFCore/CMFCatalogAware.py", line 266, in handleContentishEvent
    ob.notifyWorkflowCreated()
  File "/Users/kelly/.buildout/eggs/Products.CMFCore-2.2.12-py2.7.egg/Products/CMFCore/CMFCatalogAware.py", line 192, in notifyWorkflowCreated
    wftool.notifyCreated(self)
  File "/Users/kelly/.buildout/eggs/Products.CMFCore-2.2.12-py2.7.egg/Products/CMFCore/WorkflowTool.py", line 289, in notifyCreated
    self._reindexWorkflowVariables(ob)
  File "/Users/kelly/.buildout/eggs/Products.CMFCore-2.2.12-py2.7.egg/Products/CMFCore/WorkflowTool.py", line 639, in _reindexWorkflowVariables
    ob.reindexObjectSecurity()
  File "/Users/kelly/.buildout/eggs/Products.CMFCore-2.2.12-py2.7.egg/Products/CMFCore/CMFCatalogAware.py", line 103, in reindexObjectSecurity
    for brain in catalog.unrestrictedSearchResults(path=path):
  File "/Users/kelly/.buildout/eggs/Products.CMFCore-2.2.12-py2.7.egg/Products/CMFCore/CatalogTool.py", line 260, in unrestrictedSearchResults
    processQueue()
  File "/Users/kelly/.buildout/eggs/Products.CMFCore-2.2.12-py2.7.egg/Products/CMFCore/indexing.py", line 91, in processQueue
    processed = queue.process()
  File "/Users/kelly/.buildout/eggs/Products.CMFCore-2.2.12-py2.7.egg/Products/CMFCore/indexing.py", line 220, in process
    util.reindex(obj, attributes, update_metadata=metadata)
  File "/Users/kelly/.buildout/eggs/Products.CMFCore-2.2.12-py2.7.egg/Products/CMFCore/indexing.py", line 46, in reindex
    update_metadata=update_metadata
  File "/Users/kelly/.buildout/eggs/Products.CMFCore-2.2.12-py2.7.egg/Products/CMFCore/CatalogTool.py", line 359, in _reindexObject
    self.catalog_object(object, uid, idxs, update_metadata)
  File "/Users/kelly/.buildout/eggs/Products.CMFPlone-5.1.5-py2.7.egg/Products/CMFPlone/CatalogTool.py", line 421, in catalog_object
    update_metadata, pghandler=pghandler)
  File "/Users/kelly/.buildout/eggs/Products.ZCatalog-3.0.3-py2.7.egg/Products/ZCatalog/ZCatalog.py", line 476, in catalog_object
    update_metadata=update_metadata)
  File "/Users/kelly/.buildout/eggs/Products.ZCatalog-3.0.3-py2.7.egg/Products/ZCatalog/Catalog.py", line 360, in catalogObject
    blah = x.index_object(index, object, threshold)
  File "/Users/kelly/.buildout/eggs/Products.ZCTextIndex-2.13.5-py2.7-macosx-10.14-x86_64.egg/Products/ZCTextIndex/ZCTextIndex.py", line 180, in index_object
    text = getattr(obj, attr, None)
  File "/Users/kelly/.buildout/eggs/plone.indexer-1.0.5-py2.7.egg/plone/indexer/wrapper.py", line 65, in __getattr__
    return indexer()
  File "/Users/kelly/.buildout/eggs/plone.indexer-1.0.5-py2.7.egg/plone/indexer/delegate.py", line 20, in __call__
    return self.callable(self.context)
  File "/Users/kelly/.buildout/eggs/collective.dexteritytextindexer-2.2.1-py2.7.egg/collective/dexteritytextindexer/indexer.py", line 69, in dynamic_searchable_text_indexer
    widget = get_field_widget(obj, form_field, request)
  File "/Users/kelly/.buildout/eggs/collective.dexteritytextindexer-2.2.1-py2.7.egg/collective/dexteritytextindexer/indexer.py", line 142, in get_field_widget
    widget.update()
  File "/Users/kelly/.buildout/eggs/z3c.form-3.6-py2.7.egg/z3c/form/browser/select.py", line 51, in update
    super(SelectWidget, self).update()
  File "/Users/kelly/.buildout/eggs/z3c.form-3.6-py2.7.egg/z3c/form/browser/widget.py", line 171, in update
    super(HTMLFormElement, self).update()
  File "/Users/kelly/.buildout/eggs/z3c.form-3.6-py2.7.egg/z3c/form/widget.py", line 233, in update
    self.updateTerms()
  File "/Users/kelly/.buildout/eggs/z3c.form-3.6-py2.7.egg/z3c/form/widget.py", line 227, in updateTerms
    interfaces.ITerms)
  File "/Users/kelly/.buildout/eggs/zope.component-4.4.1-py2.7.egg/zope/component/_api.py", line 112, in getMultiAdapter
    raise ComponentLookupError(objects, interface, name)
ComponentLookupError: ((<Item at /edrn/protocols/286-prostate-rapid-reference-set-application-brian>, <HTTPRequest, URL=http://foo>, None, <z3c.relationfield.schema.RelationChoice object at 0x10ca44050>, <SelectWidget 'principalInvestigator'>), <InterfaceClass z3c.form.interfaces.ITerms>, u'')

Cache RDF files?

🤔 Tell Us About the Feature

Make ingest more resilient when the many network resources and web services it accesses are down.

🎇 What Solution You'd Like

Have ingest cache every RDF file it ingests so that if they're ever unavailable in the future it can just use the local copy!
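One way this could work is sketched below, under the assumption that each RDF source is retrieved by URL and that a failed retrieval raises `OSError` (which `urllib.error.URLError` subclasses). The `fetch_with_cache` function, the cache layout, and the injected `fetch` callable are all hypothetical.

```python
import hashlib
from pathlib import Path


def fetch_with_cache(url, cache_dir, fetch):
    """Fetch `url`, falling back to the last good local copy on failure.

    `fetch` is any callable taking a URL and returning bytes; it is injected
    so the sketch does not depend on a particular HTTP library.
    """
    cache_dir = Path(cache_dir)
    cache_dir.mkdir(parents=True, exist_ok=True)
    # Name the cache file after a hash of the URL to avoid unsafe characters.
    cached = cache_dir / hashlib.sha256(url.encode("utf-8")).hexdigest()
    try:
        data = fetch(url)
    except OSError:
        if cached.exists():
            return cached.read_bytes()  # Source is down: use the local copy.
        raise  # Nothing cached yet, so the failure must surface.
    cached.write_bytes(data)  # Refresh the cache after every successful fetch.
    return data
```

A stale copy keeps ingest running, but it can also hide an outage, so it may be worth logging whenever the fallback is used.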

〽️ Alternative Ideas

Well we could colocate a lot more of these services. Not sure why we have to run this at NCI except maybe for appearances? 🤔

🗺 Context

No other context.