
koudyk_bhs_project's People

Contributors

koudyk, pbellec, samguay

koudyk_bhs_project's Issues

How to mine the text?

Hi @koudyk,

Congrats on this project description!

What type of methods are you planning to use for text mining? Are you planning to use word2vec, for example?

Tools for extracting features from text files in Python

Hello @koudyk, I'm very interested in the results of your project! This is indeed a very worrying issue in neuroimaging. A big paper on that topic was published in Nature yesterday (https://www.nature.com/articles/s41586-020-2314-9); maybe it could be helpful.
Here is also an article about some useful ways to extract features from text files using Python:
https://www.analyticsvidhya.com/blog/2018/02/the-different-methods-deal-text-data-predictive-python/
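As a minimal illustration of one of the simplest text features, word counts (a "bag of words"), here is a sketch using only the Python standard library. The tokenizer and the tiny stop-word list are assumptions made for the example; the linked article and libraries such as scikit-learn cover more complete options.

```python
import re
from collections import Counter

# A tiny, illustrative stop-word list (an assumption; real projects usually use a fuller one).
STOP_WORDS = frozenset({"the", "a", "an", "and", "of", "in", "is", "was", "were", "with"})


def bag_of_words(text):
    """Count lowercase alphanumeric tokens in `text`, skipping stop words."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return Counter(t for t in tokens if t not in STOP_WORDS)


features = bag_of_words("Data were analysed in Python and the Python code is shared.")
print(features["python"], features.most_common(3))
```

Each paper then becomes a set of word counts that can be compared across papers or fed to a model.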

Scope - abstract or full text

Cool idea! Does your dataset give you access to the full text, or only the abstract? Would you limit your data mining to the abstract?

5. Visualize real data

Once the real data is ready, visualize it in the same way we visualized the simulated data.

colour issue in gif

Something odd seems to happen to the colours while the GIF is being made.

The individual images have normal colours, so the problem must be in the GIF step.

4. Make lists of papers containing keywords

For each paper, search for methods-related keywords (perhaps starting with 'python' and 'matlab'). If a keyword is found, append the paper ID to that keyword's list. (Need to verify whether this is a good format for the visualization.)
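A minimal sketch of this step, assuming the paper texts are already loaded into a dict that maps a paper ID to its text (a hypothetical format; the real data may come as a dataframe instead). It uses whole-word, case-insensitive matching so that, for example, 'python' does not match 'pythonic'; that matching rule is a choice made here, not something decided in the issue. The IDs below are made up.

```python
import re


def papers_by_keyword(papers, keywords):
    """Map each keyword to the IDs of the papers whose text mentions it.

    papers: dict mapping a paper ID to its text (hypothetical input format).
    keywords: methods-related keywords, e.g. 'python' and 'matlab'.
    """
    keywords = list(keywords)  # allow any iterable; it is traversed once per paper
    hits = {kw: [] for kw in keywords}
    for paper_id, text in papers.items():
        for kw in keywords:
            # Whole-word, case-insensitive match ('python' does not match 'pythonic').
            if re.search(r"\b" + re.escape(kw) + r"\b", text, flags=re.IGNORECASE):
                hits[kw].append(paper_id)
    return hits


papers = {
    "PMID1": "Analyses were run in Python 3 with nilearn.",
    "PMID2": "Preprocessing used SPM12 in MATLAB.",
    "PMID3": "Scripts were written in MATLAB and Python.",
}
print(papers_by_keyword(papers, ["python", "matlab"]))
# {'python': ['PMID1', 'PMID3'], 'matlab': ['PMID2', 'PMID3']}
```

A dict of lists keeps one list per keyword, as described above; a paper that mentions both keywords appears in both lists.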

3. Make a matrix of who cites whom

For each paper, list the papers it cites. Put this information into a binary matrix covering all papers in the search results plus every paper they cite. (Need to verify in step 1 whether this is a good format for the visualization.)
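A minimal sketch, assuming the citations are already available as a dict that maps each paper in the search results to the IDs of the papers it cites (a hypothetical format). The matrix is a plain list of lists so the example needs only the standard library, and the IDs are assumed to be sortable strings.

```python
def citation_matrix(citations):
    """Build a binary who-cites-whom matrix.

    citations: dict mapping each paper in the search results to the IDs of the
    papers it cites (hypothetical input format).
    Returns (ids, matrix) where matrix[i][j] == 1 if ids[i] cites ids[j].
    """
    # Rows and columns cover the search results plus every paper they cite.
    ids = sorted(set(citations) | {c for cited in citations.values() for c in cited})
    index = {paper_id: i for i, paper_id in enumerate(ids)}
    matrix = [[0] * len(ids) for _ in ids]
    for citing, cited_ids in citations.items():
        for cited in cited_ids:
            matrix[index[citing]][index[cited]] = 1
    return ids, matrix


ids, matrix = citation_matrix({"A": ["B", "C"], "B": ["C"], "D": ["A"]})
for paper_id, row in zip(ids, matrix):
    print(paper_id, row)
# A [0, 1, 1, 0]
# B [0, 0, 1, 0]
# C [0, 0, 0, 0]
# D [1, 0, 0, 0]
```

A dense matrix holds one entry per pair of papers, so it grows with the square of the number of papers; a sparse structure (e.g. `scipy.sparse`) or a plain edge list stores only the citations that actually exist.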

1. Visualize simulated data

Visualize simulated data in the same way as we want to visualize the real data. This step should be done first so we know what outcomes are possible; this will inform our code for getting data from PubMed.

I'm envisioning a visualization of the entire citation network (over all years), with each paper coloured by whether it mentions 'python' or 'matlab' (or some other keyword). The interactive component will be a slider that lets the user step through the years, so that papers from later years are not visible in the figure. Let's see if this is possible.
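One way to prototype this with simulated data, sketched under assumptions: papers get a random year between 2000 and 2020, a random keyword ('python', 'matlab' or none) and up to three citations to papers from earlier years. `network_up_to` returns only what should be visible for a given slider position. All names, colours and parameters here are hypothetical.

```python
import random

# Hypothetical colour scheme: one colour per software keyword, grey for neither.
COLOURS = {"python": "tab:blue", "matlab": "tab:orange", None: "lightgrey"}


def simulate_papers(n_papers=50, first_year=2000, last_year=2020, seed=0):
    """Simulate papers with a publication year, a software keyword (or None)
    and up to three citations to papers published in earlier years."""
    rng = random.Random(seed)
    papers = [
        {
            "id": i,
            "year": rng.randint(first_year, last_year),
            "keyword": rng.choice(["python", "matlab", None]),
            "cites": [],
        }
        for i in range(n_papers)
    ]
    for paper in papers:
        earlier = [p["id"] for p in papers if p["year"] < paper["year"]]
        paper["cites"] = rng.sample(earlier, k=min(3, len(earlier)))
    return papers


def network_up_to(papers, year):
    """Return node colours and citation edges visible when the slider is at
    `year`; papers from later years are hidden."""
    visible = {p["id"]: p for p in papers if p["year"] <= year}
    nodes = {pid: COLOURS[p["keyword"]] for pid, p in visible.items()}
    # Cited papers are always from earlier years, so they are visible too.
    edges = [(pid, cited) for pid, p in visible.items() for cited in p["cites"]]
    return nodes, edges


papers = simulate_papers()
for year in (2005, 2010, 2020):
    nodes, edges = network_up_to(papers, year)
    print(year, len(nodes), "papers,", len(edges), "citations")
```

The nodes and edges could then be drawn with a plotting library (e.g. networkx with matplotlib), and in a Jupyter notebook `ipywidgets.interact` with an `IntSlider` could call `network_up_to` for the selected year. These are suggestions, not tools the project has settled on.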

Evaluation

Hi Kendra,

Cool idea :) I was wondering: do you already know how you will evaluate the performance of the keyword extraction?

Improve visualization

Improve the visualization in these ways:

  • get citation data directly from the dataframe, without building the citation matrix (which is time-consuming)
  • add the software information by colouring the nodes
  • make a new figure for each month or year, then save all the images as a GIF
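A sketch of these three points, assuming pandas and a hypothetical one-row-per-paper dataframe with columns `pmid`, `year`, `software` and `cited_pmids` (the real column names are not given here). `DataFrame.explode` turns each list of cited papers into one row per citation, which yields an edge list directly instead of a citation matrix.

```python
import pandas as pd


def edges_from_dataframe(df):
    """Turn a one-row-per-paper dataframe into (citing, cited) edge pairs,
    without building a full citation matrix.

    Assumes hypothetical columns 'pmid' (paper ID) and 'cited_pmids'
    (a list of the IDs that the paper cites).
    """
    # explode() gives one row per cited paper; empty lists become NaN and are dropped.
    pairs = df[["pmid", "cited_pmids"]].explode("cited_pmids").dropna()
    return list(pairs.itertuples(index=False, name=None))


def node_colours(df, palette, default="lightgrey"):
    """Map each paper to a colour based on its hypothetical 'software' column."""
    return {pmid: palette.get(sw, default) for pmid, sw in zip(df["pmid"], df["software"])}


def frames_by_year(df):
    """Yield (year, papers published up to that year), one frame per year."""
    for year in sorted(df["year"].unique()):
        yield int(year), df[df["year"] <= year]


df = pd.DataFrame({
    "pmid": ["A", "B", "C"],
    "year": [2010, 2012, 2015],
    "software": ["python", "matlab", None],
    "cited_pmids": [[], ["A"], ["A", "B"]],
})
print(edges_from_dataframe(df))  # [('B', 'A'), ('C', 'A'), ('C', 'B')]
print(node_colours(df, {"python": "tab:blue", "matlab": "tab:orange"}))
for year, frame in frames_by_year(df):
    print(year, list(frame["pmid"]))
```

Each frame could then be drawn and saved as an image before the images are combined into a GIF (e.g. with imageio or `matplotlib.animation`); that plotting step is left out here. One possibly relevant detail for the colour issue above: a GIF frame can hold at most 256 colours.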
