Pulsar README

Pulsar is an intelligent data processing system focused on unstructured data. It extends SQL to handle the entire life cycle of data processing: collection, extraction, analysis, storage, BI, and more.

Chinese documentation

Features

  • X-SQL: eXtended SQL for every data job: collection, extraction, preparation, processing, storage, BI, etc.
  • Web spider: browser rendering, Ajax, scheduling, page scoring, monitoring, distributed, high performance, indexing with Solr/Elasticsearch
  • BI Integration: turn Web sites into tables and charts with a single SQL statement
  • Big data: large scale, with a choice of storage backends: HBase/MongoDB

For more information, check out platonic.fun.

X-SQL

Extract data from a single page:

SELECT
    DOM_TEXT(DOM) AS TITLE,
    DOM_ABS_HREF(DOM) AS LINK
FROM
    LOAD_AND_SELECT('https://en.wikipedia.org/wiki/Topology', '.references a.external');

The SQL above downloads a Web page from Wikipedia, finds the references section, and extracts all external reference links.

Extract data from a batch of pages, and turn them into a table:

SELECT
  DOM_BASE_URI(DOM) AS BaseUri,
  DOM_FIRST_TEXT(DOM, '.brand') AS Title,
  DOM_FIRST_TEXT(DOM, '.titlecon') AS Memo,
  DOM_FIRST_TEXT(DOM, '.pbox_price') AS Price,
  DOM_FIRST_TEXT(DOM, '#wrap_con') AS Parameters
FROM LOAD_OUT_PAGES_IGNORE_URL_QUERY('https://www.mia.com/formulas.html', '*:expr(width>=250 && width<=260 && height>=360 && height<=370 && sibling>30 ) a', 1, 20);

The SQL above visits an index page on mia.com, downloads the detail pages it links to, and then extracts data from them.

You can clone a copy of the Pulsar code and run the queries yourself, or run them from our online demo.

See sql-history.sql for more example queries. All SQL functions can be found under ai.platon.pulsar.ql.h2.udfs.

BI Integration

Use the customized Metabase to write X-SQL queries and turn Web sites into tables and charts immediately. Everyone in your company can now ask questions of Web data and learn from it.

Build & Run

Install dependencies

bin/tools/install-depends.sh

Install MongoDB

You can skip this step; in that case, all data will be lost when Pulsar shuts down. On Ubuntu/Debian:

sudo apt-get install mongodb

Build from source

git clone https://github.com/platonai/pulsar.git
cd pulsar && mvn -Pthird -Pplugins

Start the Pulsar server

bin/pulsar

Execute X-SQLs

The Web console at http://localhost:8082 should now be open in your browser. Enjoy playing with X-SQL.
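The steps above can be collected into one place. The following is a minimal dry-run sketch, not an official Pulsar script: it only prints the commands from this README in an order that should work on a fresh machine. The function name `pulsar_steps` and the `SKIP_MONGODB` variable are hypothetical, and the sketch assumes that `bin/tools/install-depends.sh` is run from the root of the cloned repository.

```shell
#!/bin/sh
# Dry-run sketch: print the README's setup commands in a runnable order.
# pulsar_steps and SKIP_MONGODB are hypothetical names, not part of Pulsar.

pulsar_steps() {
    # Fetch the source first so that bin/tools/install-depends.sh exists
    # (assumption: the script path is relative to the repository root).
    echo "git clone https://github.com/platonai/pulsar.git"
    echo "cd pulsar"
    echo "bin/tools/install-depends.sh"

    # MongoDB is optional; without it, data is lost when Pulsar shuts down.
    # The install command is the Ubuntu/Debian one given in the README.
    if [ "${SKIP_MONGODB:-0}" != "1" ]; then
        echo "sudo apt-get install mongodb"
    fi

    # Build command and profiles copied verbatim from the README.
    echo "mvn -Pthird -Pplugins"

    # Start the server; the Web console is served at http://localhost:8082.
    echo "bin/pulsar"
}

pulsar_steps
```

Running the script prints the commands without executing them, so they can be reviewed first. Setting `SKIP_MONGODB=1` omits the MongoDB step.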

Use Metabase

Metabase is the easy, open source way for everyone in your company to ask questions and learn from data. With X-SQL support, everyone can organize knowledge not just from the company's internal data, but also from the WWW.

git clone https://github.com/platonai/metabase.git
cd metabase
bin/build && bin/start

Enterprise Edition

Pulsar Enterprise Edition supports Auto Web Mining: unsupervised machine learning that turns Web sites into tables automatically, with no rules or training required. Here are some examples: Auto Web Mining Examples
