doc2map's Introduction

⚠️ Needs Refactoring ⚠️

This project was designed only as a prototype to explore the possibilities of this map concept.

(It was one of my first projects.)

Doc2Map

Doc2Map is an algorithm for topic modeling and visualization. It can read virtually any type of document file, but it cannot OCR them. It finds topics based on the core idea of Top2Vec and displays them hierarchically on a map similar to Google Maps:

Leaflet Map

Live Demo 1 With Wikipedia Dataset

Live Demo 2 With 20 News Groups

Or on a scatter plot with a manual zoom level:

Plotly Map

Live Demo 1 With Wikipedia Dataset

Live Demo 2 With 20 News Groups

Why use Doc2Map?

With Doc2Map, you can create beautiful, intuitive, and interactive visuals that summarise your document corpus as a map, similar to Google Maps, with topics, clusters, and documents instead of the names of countries, states, and cities.

Thanks to Apache Tika, a toolkit able to detect and extract text from over a thousand different file types, Doc2Map can read virtually any kind of file.

Note: This is not OCR; Doc2Map cannot extract text from pictures.
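
As a rough illustration of this extraction step, the sketch below walks a folder and hands each file to an extractor function. It assumes the third-party tika package (tika-python, which needs Java), whose parser.from_file call returns a dictionary with a "content" entry. The helper names tika_extract and collect_texts are hypothetical, and Doc2Map's own internals may differ.

```python
from pathlib import Path
from typing import Callable, Dict, Optional


def tika_extract(path: Path) -> Optional[str]:
    """Extract plain text with Apache Tika (assumes `pip install tika` and Java)."""
    from tika import parser  # imported lazily so this module loads without Tika

    parsed = parser.from_file(str(path))
    return parsed.get("content")  # None when no text is found (e.g. a picture)


def collect_texts(
    folder: str,
    extract: Callable[[Path], Optional[str]] = tika_extract,
) -> Dict[str, str]:
    """Return {file path: extracted text} for every file under `folder`."""
    texts = {}
    for path in sorted(Path(folder).rglob("*")):
        if not path.is_file():
            continue
        content = extract(path)
        if content and content.strip():  # skip files with no extractable text
            texts[str(path)] = content.strip()
    return texts
```

Passing the extractor as a parameter keeps the folder walk testable without a running Tika server; files that yield no text, such as images, are simply skipped.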

Using Doc2Map

There are two ways to use Doc2Map:

  • Launching the Python module directly
  • Importing the Doc2Map library in your script

Launching Doc2Map Module

Your first option is to launch the module directly. Once launched, you will have to wait a little for the program to start, then you will be asked which folder you want to analyse.

Select the folder containing the documents you want to map.

For the next step, you will have to be patient. Doc2Map will convert your documents into plain text, analyse them, and then organise them. Depending on the format, size, and number of documents, this may take a long time...

When it has finished, two web pages will automatically open in your browser to show you different maps of your documents.

These pages are loaded from newly created HTML files. You can find their location by looking at the address bar of your browser, where you will see something like file://Your/Path/To/Your/Visuals
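
If you would rather locate that folder from a script, a file URL copied from the address bar can be converted to a local path with the standard library. This is only a small sketch; the function name folder_from_file_url and the example URL are made up for illustration.

```python
from pathlib import Path
from urllib.parse import urlparse
from urllib.request import url2pathname


def folder_from_file_url(url: str) -> Path:
    """Return the folder containing the file that a file:// URL points to."""
    parsed = urlparse(url)
    if parsed.scheme != "file":
        raise ValueError(f"not a file URL: {url}")
    # url2pathname also decodes percent-escapes such as %20
    return Path(url2pathname(parsed.path)).parent


# Hypothetical URL, as it might appear in the browser address bar
print(folder_from_file_url("file:///home/user/visuals/DocMap.html"))
```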

These files can easily be exported to another machine, with a few caveats:

  • If your visualisation is based on local files, those files may no longer be reachable from the visualisation once it has been exported.
  • However, there will be no problem if you and the people you share the visualisations with use a common shared drive (as is often the case in companies, in the form of a local network).

For the visualisation DocMap.html, you will also have to include the files DocMapdensity.svg and data.js.
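
The export can be sketched as a small copy script. The three file names come from the text above; the function name export_leaflet_map and the assumption that all three files sit in the same folder are mine, so adjust them to your actual output layout.

```python
import shutil
from pathlib import Path

# File names taken from the text above; a single shared folder is an assumption.
LEAFLET_FILES = ("DocMap.html", "DocMapdensity.svg", "data.js")


def export_leaflet_map(source_dir: str, target_dir: str) -> list:
    """Copy DocMap.html and its companion files, failing early if one is missing."""
    source, target = Path(source_dir), Path(target_dir)
    missing = [name for name in LEAFLET_FILES if not (source / name).is_file()]
    if missing:
        raise FileNotFoundError(f"missing in {source}: {', '.join(missing)}")
    target.mkdir(parents=True, exist_ok=True)
    # copy2 preserves timestamps and returns the destination path
    return [Path(shutil.copy2(source / name, target / name)) for name in LEAFLET_FILES]
```

Checking for missing files before copying avoids shipping a DocMap.html that opens but shows an empty map because data.js was left behind.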

Importing in a Python Script

If you want to use Doc2Map from Python, you first have to install it:

pip install Doc2Map

Then, you will have to import it:

from Doc2Map import Doc2Map

How Does It Work?

Doc2Map is mainly based on the Top2Vec principle, and relies on Plotly and Leaflet to create beautiful visuals.
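
To give a feel for the Top2Vec idea: documents and words are embedded in the same vector space, dense clusters of document vectors are found, and each topic is described by the words closest to its cluster centroid. The toy sketch below illustrates only that last step, using hand-made 2-D vectors and pre-assigned cluster labels. It is not Doc2Map's actual code, and all names and data in it are made up for illustration.

```python
import math
from collections import defaultdict


def centroid(vectors):
    """Component-wise mean of a list of equal-length vectors."""
    return [sum(column) / len(vectors) for column in zip(*vectors)]


def cosine(a, b):
    """Cosine similarity between two vectors (0.0 if either has zero length)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def topic_words(doc_vectors, labels, word_vectors, top_n=2):
    """For each cluster label, return the words closest to the cluster centroid."""
    clusters = defaultdict(list)
    for vector, label in zip(doc_vectors, labels):
        clusters[label].append(vector)
    topics = {}
    for label, members in clusters.items():
        topic_vector = centroid(members)
        ranked = sorted(
            word_vectors,
            key=lambda word: cosine(word_vectors[word], topic_vector),
            reverse=True,
        )
        topics[label] = ranked[:top_n]
    return topics


# Toy 2-D embeddings; real ones would come from a trained embedding model.
docs = [[1.0, 0.1], [0.9, 0.0], [0.0, 1.0], [0.1, 0.9]]
labels = [0, 0, 1, 1]  # cluster assignments, e.g. from a density-based clusterer
words = {
    "football": [1.0, 0.0],
    "goal": [0.8, 0.2],
    "election": [0.0, 1.0],
    "vote": [0.2, 0.8],
}
print(topic_words(docs, labels, words))
# {0: ['football', 'goal'], 1: ['election', 'vote']}
```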

If you want to know the complete story and inner workings of Doc2Map, I invite you to read the Medium article about it.

doc2map's People

Contributors

louisgeisler


doc2map's Issues

d2m.interactive_map() running for more than 24 hours

Hi!

Thanks for your package!
I wanted to explore it and show the interactive map. I have loaded 9 JSON files, but it has been running for 24 hours and nothing is showing...
How can I find out what is happening?

Thank you again !
