
meta's Introduction

A meta crawler for PACS


Purpose

This web application lets the user search through PACS metadata. The data needs to be stored in an Apache Solr instance.

Running the application

Installation

To run the application, Python 3.6 is needed. Anaconda is the recommended way to manage different Python versions and environments.

The required Python libraries are listed in requirements.txt.
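
A possible setup, assuming Anaconda is already installed (the environment name is arbitrary):

conda create -n meta python=3.6
conda activate meta
pip install -r requirements.txt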

Setup Solr

  • Install Solr; see the Solr website for instructions
  • Create a core with solr create -c <core_name>
  • Delete the managed-schema file (the path will be something like /usr/local/solr-6.2.1/server/solr/<core_name>/conf)
  • Copy import/schema/schema.xml to the Solr conf directory
  • In solrconfig.xml check/do:
    • Remove any ManagedIndexSchemaFactory definition if it exists
    • Add <schemaFactory class="ClassicIndexSchemaFactory"/>
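
Putting the steps together, a possible command sequence might look like this (the Solr path is only an example and depends on your installation):

solr create -c <core_name>
rm /usr/local/solr-6.2.1/server/solr/<core_name>/conf/managed-schema
cp import/schema/schema.xml /usr/local/solr-6.2.1/server/solr/<core_name>/conf/
# then edit solrconfig.xml in the same conf directory as described above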

Configuration

There is a settings.py file which holds all configuration options.

Create a directory called instance and, inside it, a file called config.cfg. This holds all instance-specific configuration options.

An example would be:

DEBUG=False
# Don't show transfer and download options
DEMO=True
SOLR_HOSTNAME='solr'
SOLR_CORE_NAME='grouping'
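
For reference, a Flask app typically loads such an instance configuration roughly as follows; this is only a sketch and not necessarily the exact wiring used in this project:

from flask import Flask

# sketch: settings.py provides the defaults, instance/config.cfg overrides them
app = Flask(__name__, instance_relative_config=True)
app.config.from_object('settings')
app.config.from_pyfile('config.cfg', silent=True)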

Run

To run the application, run

python runserver.py

Run development mode

Another option is to use nodemon, which reloads the server on changes. The advantage is that even with compile errors nodemon is still able to reload, while the Flask dev server crashes and needs to be restarted manually. To run with nodemon, run

./run-dev.sh
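
For reference, a minimal run-dev.sh could look like this, assuming nodemon is installed via npm (a sketch, not necessarily the script's actual content):

#!/bin/bash
# restart the Flask dev server whenever a Python file changes
nodemon --exec "python runserver.py" --ext py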

Run tests and coverage

python -m unittest
coverage run --source=. -m unittest

# generate console reports
coverage report

# generate html reports
coverage html -d coverage

meta's People

Contributors

drtjre, fatkaratekid, joshy, kmader


meta's Issues

Fix broken facets

Facets are now broken, because e.g. StudyDescription is an input now. The rendered link now contains two StudyDescriptions, e.g. ['', 'lorem ipsum'].

Pasting multiple names

Pasting multiple names is not working. The names need to be quoted and separated (e.g. by a semicolon), for example:
"Hans Mueller" "Meier Jonas" etc.

Add solr_api package

Incorporate table / pandas export in Meta

# make a simple query against the Solr core and summarize the results
import json
from datetime import datetime
from itertools import chain
from warnings import warn

import dateutil.relativedelta as rd
import numpy as np
import requests

# query, group and solr_url are assumed to come from the meta package (not shown here)

def conv_date(x):
    """Parse an ISO date string (YYYY-MM-DD)."""
    return datetime.strptime(x, '%Y-%m-%d')

def as_solr_date(x):
    """Format a datetime as the compact Solr/DICOM date YYYYMMDD."""
    return x.strftime('%Y%m%d')

def conv_diag(x):
    """Return a +/- 18 month window around a diagnosis date (pandas Timestamp)."""
    return {'end_date': as_solr_date(x.to_pydatetime() + rd.relativedelta(months=18)),
            'start_date': as_solr_date(x.to_pydatetime() - rd.relativedelta(months=18))}

def str_diff(seq1, seq2):
    """Count differing characters plus the length difference."""
    return sum(1 for a, b in zip(seq1, seq2) if a != b) + abs(len(seq1) - len(seq2))

def search_solr(in_query='*:*'):
    """Run a grouped query and return (flat list of series docs, match count)."""
    params = dict(query.DEFAULT_PAYLOAD)  # copy so the module default is not mutated
    params['limit'] = 2000
    payload = query.query_body(params).copy()
    payload['query'] = in_query
    headers = {'content-type': 'application/json'}
    response = requests.get(solr_url(), data=json.dumps(payload), headers=headers)
    data = response.json()
    docs = group(data['grouped']['PatientID'])
    series = list(chain(*chain(*[cgrp['by_AccessionNumber'].values()
                                 for cgrp in docs['groups']])))
    return series, docs['matches']

def summarize_solr(in_row, in_query):
    """Summarize the Solr results for one row of the input table."""
    c_results, n_cnt = search_solr(in_query)
    if len(c_results) != n_cnt:
        warn('Results should be equal to matches, check limits {} != {}'.format(
            len(c_results), n_cnt), RuntimeWarning)

    # check the name first (DICOM PatientName format: LASTNAME^FIRSTNAME)
    onco_name = '{NACHNAME}^{VORNAME}'.format(**in_row).upper()
    avg_diff = [(str_diff(x['PatientName'], onco_name), x['PatientName']) for x in c_results]
    nmis_field = ', '.join(np.unique([name for cnt, name in avg_diff if cnt > 0]))
    n_matches = sum(1 for cnt, _ in avg_diff if cnt == 0)
    full_count = len(c_results)
    if n_matches > 0:
        # filter the non-matches out
        c_results = [x for x in c_results if str_diff(x['PatientName'], onco_name) == 0]

    def u_vals(res_list, key):
        return ', '.join(np.unique([i[key] for i in res_list]))

    n_field = u_vals(c_results, 'PatientName')  # unique matched names (kept for reference)
    mod_field = u_vals(c_results, 'Modality')
    acc_num = u_vals(c_results, 'AccessionNumber')
    try:
        studies = [(x['Modality'], x['StudyDate'], x['AccessionNumber'],
                    x.get('SeriesDescription', None)) for x in c_results]
    except KeyError:
        # show the exact result which is missing something
        for x in c_results:
            if not all(k in x for k in ('Modality', 'StudyDate', 'AccessionNumber')):
                raise ValueError('Missing key! {}'.format(x))
        raise

    return {'solr_mismatched_names': nmis_field,
            'solr_modalities': mod_field,
            'solr_count': full_count,
            'solr_name_matches': n_matches,
            'solr_accession_number': acc_num,
            'solr_studies': studies,
            'has_petct': ('PT' in mod_field) and ('CT' in mod_field)}
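
A hypothetical usage with pandas, assuming a DataFrame onco_df with NACHNAME and VORNAME columns and a diagnosis-date column (the column name DIAGNOSEDATUM is an assumption):

import pandas as pd

rows = []
for _, row in onco_df.iterrows():
    # build a +/- 18 month StudyDate window around the diagnosis date
    window = conv_diag(pd.Timestamp(conv_date(row['DIAGNOSEDATUM'])))
    q = 'PatientName:{}* AND StudyDate:[{} TO {}]'.format(
        row['NACHNAME'].upper(), window['start_date'], window['end_date'])
    rows.append(summarize_solr(row, q))
summary_df = pd.DataFrame(rows)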

Difficult to select report for accession

Current state:
In order to get a report at the accession level, you have to uncollapse the accession and select all or one of the series.

Problems:
You have to go down to the series level to get the accession-level report.
If you collapse the accession, you cannot see what is selected.

Solution:
It would be more intuitive and user-friendly to have a check box for the whole accession when it is collapsed, so no uncollapsing is needed to see what is selected, and reports could be obtained more easily.

Suggestion:

  • use a check mark if all series have been selected
  • use a square if some, but not all, have been selected
  • use an empty check box if nothing has been selected

Empty or invalid results throw an error

solr_api.search_solr({'query': 'AccessionNumber:-1'})

This should return no records (an empty DataFrame); instead, there is an error message:

KeyError                                  Traceback (most recent call last)
<ipython-input-103-c7e211f9f9ce> in <module>()
----> 1 solr_api.search_solr({'query': 'AccessionNumber:-1'})

/Users/pacs/meta/meta/solr_api.py in search_solr(params)
     39     """
     40 
---> 41     results = _result(params)
     42     data_frames = []
     43     for page in range(0, results, INTERNAL_LIMIT):

/Users/pacs/meta/meta/solr_api.py in _result(params)
     53     response = get(solr_url(app.config), data=json.dumps(payload), headers=headers)
     54     data = response.json()
---> 55     docs = data['grouped']['PatientID']
     56     results = int(docs['matches'])
     57     # add additional request to get the remaining results

KeyError: 'grouped'
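
A possible guard in _result (a sketch, not the actual fix): fall back to zero matches when the response has no grouped section, so the caller can build an empty DataFrame:

# inside _result, after parsing the response
data = response.json()
grouped = data.get('grouped', {}).get('PatientID')
if not grouped or int(grouped.get('matches', 0)) == 0:
    return 0  # no matches; the paging loop in search_solr then yields nothing
results = int(grouped['matches'])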

Adding user id to the report name

Currently the report names are given in the format accession_id-report.txt. It would be useful to have information about the patient as well, e.g. patient_id-accession_id-report.txt.

Export results as csv

Ideally there should be an option to export the results to CSV. Then all results need to be collected (using solr_api?) and returned to the user as a file. Maybe two options are needed:

  • output=file | None
  • format=html | json?

This needs to be in the results and not in the form, so the user can first drill down to the information they need and then export it.
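
A minimal sketch of such an export endpoint, assuming Flask, pandas, and that solr_api.search_solr returns a DataFrame (the route name and query parameter are illustrative):

import io
from flask import Response, request

@app.route('/export')
def export_csv():
    # collect all results for the current query and return them as a CSV file
    params = {'query': request.args.get('query', '*:*')}
    df = solr_api.search_solr(params)
    buf = io.StringIO()
    df.to_csv(buf, index=False)
    return Response(buf.getvalue(), mimetype='text/csv',
                    headers={'Content-Disposition': 'attachment; filename=results.csv'})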
