
maic's Introduction

Version 0.2

Authors

  • A Law
  • D Farr
  • B Wang
  • JK Baillie

Meta-analysis by information content (MAIC)

Data-driven aggregation of ranked and unranked lists

https://baillielab.net/maic

Code availability

This branch of the maic repository contains the original code, designed to be run as a Python script from the command line. It also includes a variety of supplementary files, including example input and output.

A refactored, functionally identical version of the code, which can be run from the command line or incorporated into other Python scripts, is available from PyPI at https://pypi.org/project/pymaic/0.2/ and on GitHub at https://github.com/baillielab/maic/tree/packaging

Basic usage

python maic.py -f <inputfilename>

Input file format

Input is a series of lists of named entities, which may belong to categories. Each line of the input file is one list, with all fields separated by tabs. The first four columns of each line describe the list itself: its category, a label for the list, whether the list is RANKED or UNRANKED, and an unused column. The entities follow from the fifth column onwards:

<category> <list_label> RANKED <unused> entity1 entity2 entity3 ...

<category> <list_label> RANKED <unused> entity1 entity2 entity3 ...

<category> <list_label> UNRANKED <unused> entity1 entity2 entity3 ...

<category> <list_label> UNRANKED <unused> entity1 entity2 entity3 ...
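
A minimal Python sketch of reading this format follows. It assumes strictly tab-separated lines, with the four descriptive columns first and entities from the fifth column onwards. The names MaicList, parse_maic_line and parse_maic_file are hypothetical helpers for illustration. They are not part of maic.py or pymaic, and this is not necessarily how maic.py itself parses the file.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class MaicList:
    """One input list: its descriptive columns plus its entities (hypothetical helper)."""
    category: str
    list_label: str
    ranked: bool
    entities: List[str] = field(default_factory=list)


def parse_maic_line(line: str) -> MaicList:
    """Parse one line: category, list_label, RANKED/UNRANKED, unused, entity1, entity2, ..."""
    fields = line.rstrip("\r\n").split("\t")
    if len(fields) < 4:
        raise ValueError(f"expected at least 4 tab-separated columns, got {len(fields)}")
    category, list_label, rank_flag, _unused = fields[:4]
    if rank_flag not in ("RANKED", "UNRANKED"):
        raise ValueError(f"third column must be RANKED or UNRANKED, got {rank_flag!r}")
    # Entity order is preserved because, for RANKED lists, it carries the ranking.
    # Empty cells (e.g. from trailing tabs) are dropped.
    entities = [e for e in fields[4:] if e.strip()]
    return MaicList(category, list_label, rank_flag == "RANKED", entities)


def parse_maic_file(text: str) -> List[MaicList]:
    """Parse every non-blank line of an input file's contents."""
    return [parse_maic_line(line) for line in text.splitlines() if line.strip()]


if __name__ == "__main__":
    # Illustrative content only: the category, labels and entity names are made up.
    sample = (
        "screen\tlist_A\tRANKED\tNA\tGENE1\tGENE2\tGENE3\n"
        "screen\tlist_B\tUNRANKED\tNA\tGENE2\tGENE4\n"
    )
    for lst in parse_maic_file(sample):
        print(lst.list_label, "ranked" if lst.ranked else "unranked", lst.entities)
```

The strict check on the third column is a deliberate choice in this sketch. Whether maic.py tolerates other spellings is not stated here.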

Options

-f, --filename

path to the file containing data to be analysed

-o, --output-folder

path to the folder in which to write the results files

-p, --plot

draw plots for each list at each iteration

-d, --dump-scores

dump maic scores at each iteration

-l, --max_input_len

maximum list length (default: 2000)

-v, --verbose

increase the detail of logging messages

-q, --quiet

decrease the detail of logging messages (overrides the -v/--verbose flag)
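
If your copy of maic.py does not accept -l, lists can be trimmed before running the analysis. The following is a minimal Python sketch of that, assuming the tab-separated input format described above and the documented default of 2000. The names trim_line and trim_file are hypothetical, and this may not reproduce whatever maic.py does internally with its maximum list length.

```python
import sys
from typing import List


def trim_line(line: str, max_len: int = 2000) -> str:
    """Keep the four descriptive columns and at most max_len entities of one line."""
    fields = line.rstrip("\r\n").split("\t")
    if len(fields) < 4:
        raise ValueError(f"expected at least 4 tab-separated columns, got {len(fields)}")
    header = fields[:4]
    entities: List[str] = [e for e in fields[4:] if e.strip()]  # drop empty cells
    # For a RANKED list this keeps the top max_len entities; for an UNRANKED
    # list the order is arbitrary, so truncation drops arbitrary entities.
    return "\t".join(header + entities[:max_len])


def trim_file(in_path: str, out_path: str, max_len: int = 2000) -> None:
    """Write a copy of an input file with every list trimmed to max_len entities."""
    with open(in_path) as src, open(out_path, "w") as dst:
        for line in src:
            if line.strip():  # skip blank lines
                dst.write(trim_line(line, max_len) + "\n")


if __name__ == "__main__":
    # Hypothetical usage: python trim_lists.py input.txt trimmed.txt 500
    limit = int(sys.argv[3]) if len(sys.argv) > 3 else 2000
    trim_file(sys.argv[1], sys.argv[2], limit)
```

The trimmed file can then be passed to the analysis as usual with `python maic.py -f trimmed.txt`.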

Dataset analysis for methods selection

Features of the dataset, including ranking information, the number of sources included and the heterogeneity of their quality, are analysed to estimate which ranking aggregation method is likely to perform best for the given dataset. See Wang et al [https://doi.org/10.1093/bioinformatics/btac621] for an explanation of how we evaluated this.

Examples with simulated input data and the corresponding MAIC output are included in the folder example_input_and_result.

When MixLarge data with high heterogeneity (see Wang et al [https://doi.org/10.1093/bioinformatics/btac621]) is used, the algorithm outputs:

"Based on the characteristics of your dataset, we have estimated that MAIC is the best algorithm for this analysis! See Wang et al [https://doi.org/10.1093/bioinformatics/btac621] for an explanation of how we evaluated this."

When RankLarge data with high heterogeneity (see Wang et al [https://doi.org/10.1093/bioinformatics/btac621]) is used, the algorithm outputs:

"Warning! Your dataset has the unusual combination of ranked-only data, high heterogeneity and a relatively large number of sources (11) included. Based on these features we think you'd get better results from running BIRRA [http://www.pitt.edu/~mchikina/BIRRA/]. See Wang et al [https://doi.org/10.1093/bioinformatics/btac621] for an explanation of how we evaluated this."
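
The README does not say how maic.py measures heterogeneity or where it draws the line for a "relatively large" number of sources. The following is therefore only a toy Python sketch of the decision logic implied by the two messages above, not the method used by maic.py or in Wang et al. It makes three assumptions:

- Each input list counts as one source.
- Heterogeneity is passed in as a caller-supplied flag.
- The cut-off for a large number of sources is a hypothetical threshold, LARGE_SOURCE_THRESHOLD.

The names dataset_features and suggest_method are illustrative only.

```python
from dataclasses import dataclass
from typing import Iterable, Tuple

# HYPOTHETICAL cut-off for a "relatively large" number of sources. The README
# only shows 11 sources being treated as large; maic.py's real rule is not stated.
LARGE_SOURCE_THRESHOLD = 10


@dataclass
class DatasetFeatures:
    n_sources: int     # assumption: one source per input list
    ranked_only: bool  # True when every list is RANKED


def dataset_features(lists: Iterable[Tuple[str, bool]]) -> DatasetFeatures:
    """Summarise (list_label, ranked) pairs into the features named above."""
    items = list(lists)
    return DatasetFeatures(
        n_sources=len(items),
        ranked_only=bool(items) and all(ranked for _, ranked in items),
    )


def suggest_method(features: DatasetFeatures, high_heterogeneity: bool,
                   large_threshold: int = LARGE_SOURCE_THRESHOLD) -> str:
    """Toy version of the two outcomes shown above; not maic.py's actual rule."""
    if features.ranked_only and high_heterogeneity and features.n_sources >= large_threshold:
        return "BIRRA"
    # Assumption: every other case falls back to MAIC in this sketch.
    return "MAIC"


if __name__ == "__main__":
    ranked_large = dataset_features((f"list_{i}", True) for i in range(11))
    print(suggest_method(ranked_large, high_heterogeneity=True))
```

Only the two outcomes reported above are modelled. Any finer distinctions are left to Wang et al.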


maic's Issues

<list_label> and <unused> columns in input file

Hi,

I would be interested in using the MAIC suite of scripts. However, I am unsure about the values required in each column of the input file. What are the <list_label> and <unused> columns?

Can you please provide an example of input file?

Thanks
Claudia

-l option is not exposed in CLI

It is simple enough to trim the lists manually beforehand, but the README documents this option as accepting a length argument, while it is not implemented in options.py. I can submit a PR.
