BabelDB

“The library will endure; it is the universe. As for us, everything has not been written; we are not turning into phantoms. We walk the corridors, searching the shelves and rearranging them, looking for lines of meaning amid leagues of cacophony and incoherence, reading the history of the past and our future, collecting our thoughts and collecting the thoughts of others, and every so often glimpsing mirrors, in which we may recognize creatures of the information.” ― Jorge Luis Borges, The Library of Babel

⚠️ BabelDB is an ongoing "Sci-Fi" experimentation project.

BabelDB is an in-memory website database. It combines a programmatic data extraction engine with scheduling and data clustering, and offers a standard, lightweight SQL syntax alongside a powerful DSL for querying, searching and information retrieval. BabelDB continuously ingests data from any pre-defined seed web source and lets you query that data with standard SQL. It also provides its own query language, BabelQL, built on top of the engine to provide search capabilities such as full-text search, term and phrase matching, regex and more.
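To make the idea of "query collected web data with standard SQL" more concrete, here is a minimal sketch in Python. It is not BabelDB's API or BabelQL syntax: the `pages` table, its columns, the sample rows and the `build_store` / `search` helpers are all hypothetical, and Python's built-in `sqlite3` in-memory database stands in for the engine. A plain `LIKE` match stands in for real full-text search.

```python
import sqlite3

# Hypothetical sample data standing in for pages ingested from seed sources.
SAMPLE_PAGES = [
    ("https://example.com/a", "example.com", "Climate report",
     "New climate change figures were published."),
    ("https://example.com/b", "example.com", "Sports",
     "The local team won again."),
    ("https://example.org/c", "example.org", "Energy",
     "Solar power and climate policy."),
]


def build_store(pages):
    """Load collected pages into an in-memory SQLite database (assumed schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE pages (url TEXT PRIMARY KEY, source TEXT, title TEXT, body TEXT)"
    )
    conn.executemany("INSERT INTO pages VALUES (?, ?, ?, ?)", pages)
    return conn


def search(conn, term):
    """Return (url, title) for pages whose body contains the term.

    SQLite's LIKE is case-insensitive for ASCII text by default.
    """
    rows = conn.execute(
        "SELECT url, title FROM pages WHERE body LIKE ? ORDER BY url",
        (f"%{term}%",),
    )
    return rows.fetchall()


if __name__ == "__main__":
    store = build_store(SAMPLE_PAGES)
    for url, title in search(store, "climate"):
        print(url, title)
```

The point of the sketch is only that, once pages are collected into a table, anyone comfortable with SQL can filter and compare them without writing a crawler.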

Traditionally, the building blocks of a database are its storage resources (e.g. disk, memory), how that storage is organized, and how data is distributed. For BabelDB, storage and distribution are already solved by the internet itself: interconnected computer networks that store and distribute data around the globe. BabelDB attempts to make all common DB features accessible to everyone, at any time, on any device.

Features

  • Data collection scheduling
  • Data clustering
  • Tag linking
  • Incrementally updated materialized views
  • Pattern matching
  • Deep collection
  • Stream data into pre-defined sinks
  • Define custom data collectors
  • Semantic subscription
  • Data discovery
  • Monitoring (404, etc.)
  • REST API connector
  • BabelQL

Motivation

From Wikipedia:

...a database is an organized collection of data stored and accessed electronically...

Can the Internet as a whole be considered a database in itself?

The internet is a vast space of information. Most of that information is free (which does not mean true) and accessible through browsers, search engines and dedicated tooling. Crawler and scraper bots are popular ways of automating data collection and indexing. Crawling is essentially what search engines do, while scraping is an automated way of extracting specific datasets. But when it comes to addressing more specific use cases, or serving non-technical users, this is sometimes not enough.

For example:

  • I want to collect all news articles automatically and compare the climate change narrative between site X and site Y.
  • I want to know what site X looked like 24 hours ago and retrieve only the updates.
  • I want to keep track of companies that are environmentally friendly or have sustainability programs.
  • I want to discover linked web resources that match some pattern.
  • I want to subscribe and be notified when certain semantics show up on site X.
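As an illustration of the second item (what changed on site X since the last snapshot), the sketch below compares two text snapshots of a page and keeps only the lines that are new. It assumes the snapshots have already been collected as plain text; the `new_lines` helper and the sample headlines are hypothetical and are not part of BabelDB.

```python
def new_lines(old_snapshot, new_snapshot):
    """Return lines that appear in the new snapshot but not in the old one,
    preserving their order in the new snapshot."""
    seen = set(old_snapshot.splitlines())
    return [line for line in new_snapshot.splitlines() if line not in seen]


if __name__ == "__main__":
    # Hypothetical snapshots of the same page taken 24 hours apart.
    yesterday = "Headline A\nHeadline B"
    today = "Headline A\nHeadline B\nHeadline C"
    print(new_lines(yesterday, today))
```

A line-level comparison like this ignores moved or slightly edited text, so it is only a starting point for "retrieve only the updates".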

OK, technically speaking this is not too complex with the tooling available nowadays. But let's say I want a marketing analyst who knows SQL to be able to do it.

BabelDB is the experimental attempt to solve that! 😀
