bibliotools3.0's People

Contributors

boogheta, paulgirard, tommv

bibliotools3.0's Issues

Distinguishing different types of keywords

The script does not distinguish between:

  • the Title Keywords (TK: the words present in the title)
  • the Author Keywords (IK: the keywords assigned by the author)

In particular:

  • it is not possible to assign them different thresholds
  • in the GraphML output, they end up in the same partition, "keyword".
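
A minimal sketch of what the requested behaviour could look like, assuming each parsed record exposes its title keywords and author keywords under hypothetical "TK" and "IK" keys. The partition names and threshold values are illustrative and are not taken from the script.

```python
from collections import Counter

# Hypothetical thresholds: minimum occurrence count for each keyword type.
THRESHOLDS = {"title_keyword": 3, "author_keyword": 2}


def filter_keywords(records, thresholds=THRESHOLDS):
    """Count TK and IK keywords separately and apply one threshold per type.

    `records` is a list of dicts with hypothetical "TK" and "IK" keys,
    each holding a list of keyword strings.
    Returns {partition_name: {keyword: occurrence_count}}.
    """
    counts = {"title_keyword": Counter(), "author_keyword": Counter()}
    for record in records:
        # A keyword is counted at most once per record.
        counts["title_keyword"].update(set(record.get("TK", [])))
        counts["author_keyword"].update(set(record.get("IK", [])))
    return {
        partition: {kw: n for kw, n in counter.items() if n >= thresholds[partition]}
        for partition, counter in counts.items()
    }
```

Each key of the returned dict could then be written as its own partition in the graph, instead of a single "keyword" partition.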

Scripting the graph visualisation

It would save a lot of time if the default visualisation of the networks were automated (scripted with the Gephi Toolkit, Sigma or another tool).

In particular, these are the operations to script:

SPATIALISATION
LinLog Mode (yes)
Scaling (0.02)
Prevent Overlap (no)
gravity (1)

COLORS (as named in Gephi)
reference: light grey
institutions: olivedrab
authors: gold
keywords: hot pink
subjects: dark orange
countries: powder blue

RANKING (according to the occurrence count)
5-150
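
As a starting point, the defaults above could be stored in a small configuration that any rendering backend reads. This is only a sketch: the key names are hypothetical, the hex codes are the CSS/X11 values of the colour names listed above, and "5-150" is assumed to mean the minimum and maximum node size, mapped linearly from the occurrence count.

```python
# Layout options as listed in the issue. The key names are hypothetical; the
# options resemble those of Gephi's ForceAtlas 2 layout (an assumption).
LAYOUT = {
    "linlog_mode": True,
    "scaling": 0.02,
    "prevent_overlap": False,
    "gravity": 1,
}

# Hex values of the CSS/X11 colours carrying the names given in the issue.
COLORS = {
    "reference": "#D3D3D3",     # light grey
    "institutions": "#6B8E23",  # olivedrab
    "authors": "#FFD700",       # gold
    "keywords": "#FF69B4",      # hot pink
    "subjects": "#FF8C00",      # dark orange
    "countries": "#B0E0E6",     # powder blue
}

# Assumption: "5-150" is the node size range used for the ranking.
SIZE_MIN, SIZE_MAX = 5.0, 150.0


def node_size(occ, occ_min, occ_max):
    """Linearly map an occurrence count onto [SIZE_MIN, SIZE_MAX]."""
    if occ_max == occ_min:
        return SIZE_MIN
    ratio = (occ - occ_min) / (occ_max - occ_min)
    return SIZE_MIN + ratio * (SIZE_MAX - SIZE_MIN)


def node_style(node_type, occ, occ_min, occ_max):
    """Return the colour and size to apply to one node."""
    return {"color": COLORS[node_type], "size": node_size(occ, occ_min, occ_max)}
```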

Threshold by average and quartile

Extracting comparable networks (that is, networks with roughly the same number of nodes) from time-spans that contain very different numbers of bibliographical notices requires setting different filtering thresholds: lower for time-spans containing fewer nodes, higher for time-spans containing more nodes.
A more systematic way of doing this may be to use the average and the quartiles.
Instead of filtering out all the nodes with an occurrence count ("occ") lower than N, or the edges with a weight ("weight") lower than N, we could filter out all the nodes and edges with an occurrence count or weight lower than the average (or the 1st quartile, or the 3rd quartile).
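
The idea can be sketched with Python's standard `statistics` module. The function names and the `mode` values are hypothetical, and the "inclusive" quantile method is one possible choice among several.

```python
import statistics


def adaptive_threshold(values, mode="mean"):
    """Derive a threshold from the data: the mean, or one of the quartiles."""
    values = list(values)
    if mode == "mean":
        return statistics.mean(values)
    # quantiles(n=4) returns the three cut points Q1, Q2 (median) and Q3.
    q1, q2, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return {"q1": q1, "median": q2, "q3": q3}[mode]


def filter_by_threshold(items, mode="mean"):
    """Keep the entries of {name: occ_or_weight} that reach the threshold."""
    threshold = adaptive_threshold(items.values(), mode)
    return {name: value for name, value in items.items() if value >= threshold}
```

Because the threshold is computed from each time-span's own distribution, a sparse period and a dense period are filtered proportionally rather than by the same fixed N.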

Problem with the connection of the metadata in the latest version of the script

While analysing them with the PhD students in Leuven, we found something odd in the maps you sent me last Thursday.
The references are spatialised correctly, but there is a problem with the metadata.
On quite a few of the maps (notably those of the last time periods), almost all the metadata nodes end up in the centre (because they are uniformly linked to all parts of the reference graph).

At first I thought it was a threshold problem, but then we ran the extraction of the same maps (with the same thresholds) on Kari's computer and the problem did not appear (on the contrary, the metadata are spatialised properly, where one would expect to find them).

Since the problem seems to affect only the last time periods, I wonder whether it might be due to the parallelisation.

Blacklists of entities

Blacklist
Users often want to exclude some entities from all the extracted graphs. A typical case is the exclusion of the keywords that were present in the original query and are therefore, by construction, connected to almost all the items in the graph.
It would save users a lot of time to be able to define these entities once and for all in a blacklist applied to all the extracted graphs.

Whitelist
Researchers or experts may want to include in the maps some items that would be excluded by the selected thresholds. It would therefore be useful to be able to 'impose' some items in the networks.
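
A minimal sketch of how both lists could interact with the thresholds, assuming entities are available as a mapping from name to occurrence count. The function name is hypothetical, and letting the blacklist take precedence over the whitelist is a design assumption, not something the issue specifies.

```python
def apply_lists(counts, threshold, blacklist=(), whitelist=()):
    """Filter {entity: occurrence_count} by threshold, then apply the lists.

    Blacklisted entities are always dropped. Whitelisted entities are kept
    even below the threshold. If an entity is in both lists, the blacklist
    wins (an assumption of this sketch).
    """
    blacklist, whitelist = set(blacklist), set(whitelist)
    return {
        entity: n
        for entity, n in counts.items()
        if entity not in blacklist and (n >= threshold or entity in whitelist)
    }
```

For example, with a threshold of 5, blacklisting a query keyword removes it however frequent it is, while a whitelisted rare keyword survives the filter.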

Writing all the thresholds in the report

In the config of the script it is possible to define two different thresholds for each type of node:

  • occurrence count
  • weight

In the report, however, only the 'weight' threshold is mentioned. Both should be indicated.
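
A sketch of the report lines that would cover both thresholds, assuming the config can be read as a dict of the form {node_type: {"occ": ..., "weight": ...}}. This structure and the function name are hypothetical; the actual config format of the script may differ.

```python
def threshold_report(thresholds):
    """Return one report line per node type, listing both thresholds."""
    lines = []
    for node_type in sorted(thresholds):
        t = thresholds[node_type]
        lines.append(f"{node_type}: occ >= {t['occ']}, weight >= {t['weight']}")
    return "\n".join(lines)
```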

Adding the journals

It would be nice to have the titles of the journals appearing in the corpus as a new type of node (apparently the parser is able to parse them, but they are not exploited).

User-interface

Develop a minimal user-interface for the script, notably to:

  1. merge a series of CSV files exported from ISI into one corpus
  2. define a series of temporal periods (with feedback on the number of bibliographical notices in each)
  3. define the thresholds for the different types of nodes (with feedback on the number of nodes obtained)
  4. define the blacklist of items not to be included in the graph (e.g. the keywords present in the original query; see the "Blacklists of entities" issue)
  5. extract the graphs
  6. set the parameters of spatialisation, colour and ranking to pre-visualise the network (see the "Scripting the graph visualisation" issue)
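
The feedback mentioned in step 2 could be computed along these lines. This is a sketch that assumes the publication year of each notice is already available as an integer and that periods are inclusive (start, end) pairs; the function name is hypothetical.

```python
def count_per_period(years, periods):
    """Return {(start, end): number of notices published in [start, end]}."""
    return {
        (start, end): sum(1 for year in years if start <= year <= end)
        for start, end in periods
    }
```

An interface could recompute these counts whenever the user moves a period boundary, which makes it easier to pick periods of comparable size.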
