
meta's Introduction

A meta crawler for PACS


Purpose

This web application lets the user search through PACS metadata. The data needs to be stored in an Apache Solr instance.

Running the application

Installation

To run the application, Python 3.6 is needed. Anaconda is the recommended way to manage different Python versions and environments.

The required Python libraries are listed in requirements.txt.
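
A possible setup, assuming Anaconda is already installed (the environment name is arbitrary):

conda create -n meta python=3.6
conda activate meta
pip install -r requirements.txt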

Setup Solr

  • Install Solr; see the Solr website for instructions
  • Create a core with solr create -c <core_name>
  • Delete the managed-schema file (the path will be something like /usr/local/solr-6.2.1/server/solr/<core_name>/conf)
  • Copy import/schema/schema.xml to the Solr conf directory
  • In solrconfig.xml check/do:
    • Remove any ManagedIndexSchemaFactory definition if it exists
    • Add <schemaFactory class="ClassicIndexSchemaFactory"/>
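
Putting the steps together, a possible command sequence might look like this (the Solr path is only an example and depends on your installation):

solr create -c <core_name>
rm /usr/local/solr-6.2.1/server/solr/<core_name>/conf/managed-schema
cp import/schema/schema.xml /usr/local/solr-6.2.1/server/solr/<core_name>/conf/
# then edit solrconfig.xml in the same conf directory as described above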

Configuration

There is a settings.py file which holds all configuration options.

Create a directory called instance and, inside it, a file called config.cfg. This holds all instance-specific configuration options.

An example would be:

DEBUG=False
# Don't show transfer and download options
DEMO=True
SOLR_HOSTNAME='solr'
SOLR_CORE_NAME='grouping'
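
For reference, a Flask app typically loads such an instance configuration roughly as follows; this is only a sketch and not necessarily the exact wiring used in this project:

from flask import Flask

# sketch: settings.py provides the defaults, instance/config.cfg overrides them
app = Flask(__name__, instance_relative_config=True)
app.config.from_object('settings')
app.config.from_pyfile('config.cfg', silent=True)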

Run

To run the application, run

python runserver.py

Run development mode

Another option is to use nodemon, which reloads the server on changes. The advantage is that even with compile errors nodemon is still able to reload, while the Flask dev server crashes and needs to be restarted manually. To run with nodemon, run

./run-dev.sh
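
For reference, a minimal run-dev.sh could look like this, assuming nodemon is installed via npm (a sketch, not necessarily the script's actual content):

#!/bin/bash
# restart the Flask dev server whenever a Python file changes
nodemon --exec "python runserver.py" --ext py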

Run tests and coverage

python -m unittest
coverage run --source=. -m unittest

# generate console reports
coverage report

# generate html reports
coverage html -d coverage

meta's People

Contributors

drtjre, fatkaratekid, joshy, kmader


meta's Issues

Fix broken facets

Facets are now broken, because e.g. StudyDescription is an input now. The rendered link now contains two StudyDescriptions, e.g. ['', 'lorem ipsum'].

Pasting multiple names

Pasting multiple names is not working. The names need to be quoted and separated (e.g. by a semicolon), for example:
"Hans Mueller" "Meier Jonas" etc.

Add solr_api package

Incorporate table / pandas export in Meta

# make a simple query against the Solr core and summarize the results
import json
from datetime import datetime
from itertools import chain
from warnings import warn

import dateutil.relativedelta as rd
import numpy as np
import requests

# query, group and solr_url are assumed to come from the meta package (not shown here)

def conv_date(x):
    """Parse an ISO date string (YYYY-MM-DD)."""
    return datetime.strptime(x, '%Y-%m-%d')

def as_solr_date(x):
    """Format a datetime as the compact Solr/DICOM date YYYYMMDD."""
    return x.strftime('%Y%m%d')

def conv_diag(x):
    """Return a +/- 18 month window around a diagnosis date (pandas Timestamp)."""
    return {'end_date': as_solr_date(x.to_pydatetime() + rd.relativedelta(months=18)),
            'start_date': as_solr_date(x.to_pydatetime() - rd.relativedelta(months=18))}

def str_diff(seq1, seq2):
    """Count differing characters plus the length difference."""
    return sum(1 for a, b in zip(seq1, seq2) if a != b) + abs(len(seq1) - len(seq2))

def search_solr(in_query='*:*'):
    """Run a grouped query and return (flat list of series docs, match count)."""
    params = dict(query.DEFAULT_PAYLOAD)  # copy so the module default is not mutated
    params['limit'] = 2000
    payload = query.query_body(params).copy()
    payload['query'] = in_query
    headers = {'content-type': 'application/json'}
    response = requests.get(solr_url(), data=json.dumps(payload), headers=headers)
    data = response.json()
    docs = group(data['grouped']['PatientID'])
    series = list(chain(*chain(*[cgrp['by_AccessionNumber'].values()
                                 for cgrp in docs['groups']])))
    return series, docs['matches']

def summarize_solr(in_row, in_query):
    """Summarize the Solr results for one row of the input table."""
    c_results, n_cnt = search_solr(in_query)
    if len(c_results) != n_cnt:
        warn('Results should be equal to matches, check limits {} != {}'.format(
            len(c_results), n_cnt), RuntimeWarning)

    # check the name first (DICOM PatientName format: LASTNAME^FIRSTNAME)
    onco_name = '{NACHNAME}^{VORNAME}'.format(**in_row).upper()
    avg_diff = [(str_diff(x['PatientName'], onco_name), x['PatientName']) for x in c_results]
    nmis_field = ', '.join(np.unique([name for cnt, name in avg_diff if cnt > 0]))
    n_matches = sum(1 for cnt, _ in avg_diff if cnt == 0)
    full_count = len(c_results)
    if n_matches > 0:
        # filter the non-matches out
        c_results = [x for x in c_results if str_diff(x['PatientName'], onco_name) == 0]

    def u_vals(res_list, key):
        return ', '.join(np.unique([i[key] for i in res_list]))

    n_field = u_vals(c_results, 'PatientName')  # unique matched names (kept for reference)
    mod_field = u_vals(c_results, 'Modality')
    acc_num = u_vals(c_results, 'AccessionNumber')
    try:
        studies = [(x['Modality'], x['StudyDate'], x['AccessionNumber'],
                    x.get('SeriesDescription', None)) for x in c_results]
    except KeyError:
        # show the exact result which is missing something
        for x in c_results:
            if not all(k in x for k in ('Modality', 'StudyDate', 'AccessionNumber')):
                raise ValueError('Missing key! {}'.format(x))
        raise

    return {'solr_mismatched_names': nmis_field,
            'solr_modalities': mod_field,
            'solr_count': full_count,
            'solr_name_matches': n_matches,
            'solr_accession_number': acc_num,
            'solr_studies': studies,
            'has_petct': ('PT' in mod_field) and ('CT' in mod_field)}
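
A hypothetical usage with pandas, assuming a DataFrame onco_df with NACHNAME and VORNAME columns and a diagnosis-date column (the column name DIAGNOSEDATUM is an assumption):

import pandas as pd

rows = []
for _, row in onco_df.iterrows():
    # build a +/- 18 month StudyDate window around the diagnosis date
    window = conv_diag(pd.Timestamp(conv_date(row['DIAGNOSEDATUM'])))
    q = 'PatientName:{}* AND StudyDate:[{} TO {}]'.format(
        row['NACHNAME'].upper(), window['start_date'], window['end_date'])
    rows.append(summarize_solr(row, q))
summary_df = pd.DataFrame(rows)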

Difficult to select report for accession

Current state:
In order to get a report at the accession level, you have to uncollapse the accession and select all or one of the series.

Problems:
You have to go down to the series level to get the accession-level report.
If you collapse the accession, you cannot see what is selected.

Solution:
It would be more intuitive and user-friendly to have a check box for the whole accession when it is collapsed, so no uncollapsing is needed to see what is selected, and reports could be obtained more easily.

Suggestion:

  • use a check mark if all series have been selected
  • use a square if some, but not all, have been selected
  • use an empty check box if nothing has been selected

Empty or invalid results throw an error

solr_api.search_solr({'query': 'AccessionNumber:-1'})

This should return no records (an empty DataFrame); instead, there is an error message:

KeyError                                  Traceback (most recent call last)
<ipython-input-103-c7e211f9f9ce> in <module>()
----> 1 solr_api.search_solr({'query': 'AccessionNumber:-1'})

/Users/pacs/meta/meta/solr_api.py in search_solr(params)
     39     """
     40 
---> 41     results = _result(params)
     42     data_frames = []
     43     for page in range(0, results, INTERNAL_LIMIT):

/Users/pacs/meta/meta/solr_api.py in _result(params)
     53     response = get(solr_url(app.config), data=json.dumps(payload), headers=headers)
     54     data = response.json()
---> 55     docs = data['grouped']['PatientID']
     56     results = int(docs['matches'])
     57     # add additional request to get the remaining results

KeyError: 'grouped'
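
A possible guard in _result (a sketch, not the actual fix): fall back to zero matches when the response has no grouped section, so the caller can build an empty DataFrame:

# inside _result, after parsing the response
data = response.json()
grouped = data.get('grouped', {}).get('PatientID')
if not grouped or int(grouped.get('matches', 0)) == 0:
    return 0  # no matches; the paging loop in search_solr then yields nothing
results = int(grouped['matches'])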

Adding user id to the report name

Currently the report names are given in the format accession_id-report.txt. It would be useful to have information about the patient as well, e.g. patient_id-accession_id-report.txt.

Export results as csv

Ideally there should be an option to export the results to CSV. Then all results need to be collected (using solr_api?) and returned to the user as a file. Maybe two options are needed:

  • output=file | None
  • format=html | json?

This needs to be in the results and not in the form, so the user can first drill down to the information they need and then export it.
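
A minimal sketch of such an export endpoint, assuming Flask, pandas, and that solr_api.search_solr returns a DataFrame (the route name and query parameter are illustrative):

import io
from flask import Response, request

@app.route('/export')
def export_csv():
    # collect all results for the current query and return them as a CSV file
    params = {'query': request.args.get('query', '*:*')}
    df = solr_api.search_solr(params)
    buf = io.StringIO()
    df.to_csv(buf, index=False)
    return Response(buf.getvalue(), mimetype='text/csv',
                    headers={'Content-Disposition': 'attachment; filename=results.csv'})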
