strumentalia-Seealsology

Seealsology is a simple tool that lets you explore, in a quick and dirty way, the semantic area related to any Wikipedia page. Put simply, it extracts all the links in the "See also" section and produces a graph from them. The tool currently works only for the following versions of Wikipedia: English, French and Italian. Adding other languages requires identifying the title of the "See also" section in each of them. Feel free to contribute by identifying them and proposing new languages via pull requests!
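To make the idea concrete, here is a minimal sketch of extracting "See also" links from raw wikitext. It is not the tool's actual code: the function name, the regular expressions and the per-language section titles in SEE_ALSO_TITLES are assumptions made for illustration, and the real tool works through Wikipedia's API.

```python
import re

# Assumed section titles per language; the tool's real list may differ.
SEE_ALSO_TITLES = {
    "en": ["See also"],
    "fr": ["Articles connexes", "Voir aussi"],
    "it": ["Voci correlate"],
}


def see_also_links(wikitext, lang="en"):
    """Return link targets found in the "See also" section of raw wikitext."""
    titles = "|".join(re.escape(t) for t in SEE_ALSO_TITLES[lang])
    # Capture the text after the level-2 heading, up to the next
    # level-2 heading or the end of the text.
    section = re.search(
        r"^==\s*(?:%s)\s*==[ \t]*\n(.*?)(?=^==[^=]|\Z)" % titles,
        wikitext,
        flags=re.IGNORECASE | re.MULTILINE | re.DOTALL,
    )
    if section is None:
        return []
    # [[Target]] or [[Target|label]]: keep only the target part.
    return [m.strip() for m in re.findall(r"\[\[([^\]|#]+)", section.group(1))]
```

Supporting a new language in this sketch would only mean adding its section title to SEE_ALSO_TITLES, which mirrors the contribution described above.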

Try it here: http://tools.medialab.sciences-po.fr/seealsology/

or here: http://labs.densitydesign.org/seealsology/

https://raw.githubusercontent.com/densitydesign/strumentalia-seealsology/master/prev

Usage

You can take a look at this video tutorial from TantLab: https://www.youtube.com/watch?v=Ipb7DiyDt48

Paste the full link to one or more English Wikipedia articles (one per line).

The "distance" value defines the number of iterations. With distance 1, you'll get the original pages and the ones linked in their "See also" sections. With higher values, the tool repeats the same operation on each retrieved page.
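The effect of the distance value can be sketched as a depth-limited breadth-first crawl. This is an illustration only: the function crawl and the in-memory see_also mapping (page title to linked titles) are hypothetical stand-ins for the requests the tool sends to Wikipedia.

```python
from collections import deque


def crawl(seeds, see_also, distance):
    """Breadth-first crawl returning (source, target, level) edges."""
    edges = []
    seen = set(seeds)
    queue = deque((seed, 0) for seed in seeds)
    while queue:
        page, level = queue.popleft()
        if level >= distance:
            continue  # pages at the maximum distance are not expanded
        for target in see_also.get(page, []):
            edges.append((page, target, level + 1))
            if target not in seen:
                seen.add(target)
                queue.append((target, level + 1))
    return edges
```

With distance 1 only the seeds are expanded; each extra level expands every page found at the previous one, so the number of requests can grow quickly.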

The "Get parents" button makes the tool also look for pages whose "See also" sections link to the explored pages and their children. Note that this process is much more demanding for the browser, which has to send many requests to Wikipedia's API. Use it with caution and avoid it when crawling at a distance higher than 2.
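The sketch below hints at why the parent search is costly: every candidate page needs its own lookup. How the tool actually finds candidates is not described here, so find_parents, the candidates list and the fetch_see_also callable are all hypothetical.

```python
def find_parents(page, candidates, fetch_see_also):
    """Return (parents, lookups): candidates whose "See also" links
    include `page`, and the number of lookups that were needed."""
    parents = []
    lookups = 0
    for candidate in candidates:
        lookups += 1  # one lookup per candidate page
        if page in fetch_see_also(candidate):
            parents.append(candidate)
    return parents, lookups
```

In this sketch the number of lookups equals the number of candidates for each explored page, which adds up fast at larger distances.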

The "stop words" field lets you define which pages should be discarded. The software looks for each "stop word" in the article title; if there is a match, the article is discarded.
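A minimal sketch of this filter follows. It assumes a case-insensitive substring match, which may not be exactly how the tool compares titles; the function name is_stopped is hypothetical.

```python
def is_stopped(title, stop_words):
    """Return True if any non-empty stop word occurs in the title.

    Assumption: matching is a case-insensitive substring test.
    """
    lowered = title.lower()
    return any(word.lower() in lowered for word in stop_words if word)
```

For example, the stop word "list of" would discard "List of graph theory topics" but keep "Graph theory".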

For sanity and performance reasons, results are cached in the browser's localStorage for 24 hours, allowing you to quickly regenerate a previous crawl or restart a canceled one. Click the "Clear cache" button to reset the cache if the results seem out of sync with the current Wikipedia pages.
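The caching behaviour can be pictured with a small time-limited cache. This is a language-neutral sketch written in Python, not the browser code: the class TimedCache and its methods are hypothetical, and only the 24-hour lifetime comes from the description above.

```python
import time

CACHE_TTL_SECONDS = 24 * 60 * 60  # 24 hours, as stated above


class TimedCache:
    """Keeps values until they are older than `ttl` seconds."""

    def __init__(self, ttl=CACHE_TTL_SECONDS, clock=time.time):
        self._ttl = ttl
        self._clock = clock  # injectable so expiry can be simulated
        self._entries = {}

    def get(self, key):
        entry = self._entries.get(key)
        if entry is None:
            return None
        stored_at, value = entry
        if self._clock() - stored_at > self._ttl:
            del self._entries[key]  # expired: drop it and report a miss
            return None
        return value

    def set(self, key, value):
        self._entries[key] = (self._clock(), value)

    def clear(self):
        """Counterpart of the "Clear cache" button."""
        self._entries.clear()
```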

Output

While data is being collected, results are displayed as a network graph and printed below it as a list. Click on a page's name or node to open the Wikipedia page in another tab. Errors and stopped pages are printed in the bottom-right panel.

If the crawl returns too many pages and you cannot wait, you can stop it using the Stop button.

Alternatively, you can look for more connections by adding new seeds from the graph using Ctrl+click on a node.

A "Download" button will allow you to download the results. Three formats are available:

  • TSV. A tab-separated table, easily editable in LibreOffice, Google Sheets or Excel. Each line represents a connection from a source article to one target article cited in its "See also" section. The table contains three columns: "Source" contains the analyzed articles, "Target" contains the collected ones, and "Level" is the distance from the original node.
  • JSON. The network described as an object, compatible with both D3.js and sigma.js. It contains two arrays of objects: the first one contains the nodes, the second one the edges.
  • GEXF. The network in an XML-compliant format, easily importable in Gephi or Manylines, two open-source tools for network visualization.
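As a rough illustration of how the TSV and JSON outputs relate, the sketch below reads a TSV export and builds an object with a node array and an edge array. The header spelling ("Source", "Target", "Level") and the JSON keys ("nodes", "edges", "id", "source", "target", "level") are assumptions; check them against a real download before relying on them.

```python
import csv
import io


def tsv_to_graph(tsv_text):
    """Turn a Source/Target/Level table into a nodes-and-edges object."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    nodes = {}  # keyed by title so each article appears only once
    edges = []
    for row in reader:
        for title in (row["Source"], row["Target"]):
            nodes.setdefault(title, {"id": title})
        edges.append({
            "source": row["Source"],
            "target": row["Target"],
            "level": int(row["Level"]),
        })
    return {"nodes": list(nodes.values()), "edges": edges}
```

The returned dictionary can be passed to json.dumps to write it to a file.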

Installation

First install Node.js, then run the following from Seealsology's root directory:

npm install
npm install -g bower
npm install -g grunt
bower install

To serve a development instance:

grunt serve

Or to build a static instance for a production server:

grunt build

and serve the dist directory to whatever path you like with your favorite server software.

strumentalia-seealsology's People

Contributors

boogheta, fenicento, yomguithereal, danieleguido, mikima, boamaod, helder-mattos, sibicoder, sgottsch, bverjat, framawiki
