extractify's Introduction

Important: to cite Extractify in a publication, please use the following: « Extractify. Frederic Vergnaud, Mines Paris, PSL University, Centre for the Sociology of Innovation, i3 CNRS, France, https://github.com/fredericvergnaud/extractify »

Presentation

Extractify is a free extension for Chromium, developed in JavaScript under Atom, whose purpose is to scrape structured data from the web. It is designed in particular for collecting online comments and online conversations, such as those found on forums.

It allows you to:

  1. Select structured information on a web page (like tables with rows and columns), by direct selection on the web page, or manual selection by entering HTML tags and related CSS code
  2. Select the pagination of pages with the same structure and level
  3. Repeat the process as many times as desired for lower levels
  4. Scrape the whole selection
  5. Finally, obtain a file in JSON format that can easily be imported into other software, such as L@ME.
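
As an illustration of the last step, here is a minimal sketch of inspecting an exported file before importing it elsewhere. The structure of the export is not described on this page, so the helper `summarizeExport` is hypothetical and only looks at the top level of whatever JSON it receives; the sample data is made up.

```javascript
// Hypothetical helper: the layout of Extractify's export is not documented
// here, so this only reports the top-level shape of the parsed JSON.
function summarizeExport(jsonText) {
  const data = JSON.parse(jsonText); // throws SyntaxError on invalid JSON
  if (Array.isArray(data)) {
    return { kind: "array", size: data.length };
  }
  if (data !== null && typeof data === "object") {
    return { kind: "object", keys: Object.keys(data) };
  }
  return { kind: data === null ? "null" : typeof data };
}

// Made-up sample, NOT real Extractify output.
const sample = '[{"author": "a", "text": "hello"}, {"author": "b", "text": "hi"}]';
console.log(summarizeExport(sample));
```

In Node.js, the text of a real export could be read with `require("fs").readFileSync(path, "utf8")` and passed to the same function.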

What it does not allow: everything else!

Manual installation for Chrome

  1. Press the green « Clone or download » button on this page to download the latest version
  2. Unzip the downloaded archive
  3. In the Chrome address bar, go to the extensions page by typing « chrome://extensions/ »
  4. Switch to « Developer mode » in the upper right corner
  5. Finally, load the extractify-master folder as an « unpacked extension »

Usage

Go to the wiki to see how to use Extractify.

Love it? Tell me!

Found a bug? Don’t be afraid to open an issue.

extractify's Issues

pagination

Hello,

First of all, thank you for this excellent Chrome extension for scraping.

A quick question: the latest release brought some changes to the pagination part.
Where could I find the reference for these fields, please, so that I can fill in this part and try different configurations for my research subjects?

"pagination": {
"dataType": "pagination",
"selector": "",
"constantUrl": "",
"start": 0,
"step": 0,
"stop": 0,
"selectionType": ""

Thank you in advance for your reply.
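
The meaning of these fields is not documented on this page. Purely as an illustration, the sketch below shows one plausible reading, in which `constantUrl` is a fixed URL prefix and `start`, `step` and `stop` enumerate page values appended to it. That interpretation, the function name `pageUrls` and the example URL are all assumptions, not Extractify's confirmed behaviour; `selector` and `selectionType` are left out because nothing here indicates how they are used.

```javascript
// ASSUMPTION: constantUrl is a URL prefix and each page value is appended to it.
// This is a guess at the semantics, not Extractify's documented behaviour.
function pageUrls({ constantUrl, start, step, stop }) {
  // A step of 0 (as in the empty template) or a negative step would never
  // reach stop, so return no pages instead of looping forever.
  if (!(step > 0)) return [];
  const urls = [];
  for (let n = start; n <= stop; n += step) {
    urls.push(constantUrl + n);
  }
  return urls;
}

// Hypothetical configuration for a forum whose pages are numbered 1, 2, 3.
console.log(
  pageUrls({ constantUrl: "https://example.org/forum?page=", start: 1, step: 1, stop: 3 })
);
```

Under this reading, a forum that paginates by message offset (0, 20, 40, ...) would use a `step` of 20 rather than 1.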