pinterest / querybook

Querybook is a Big Data Querying UI, combining collocated table metadata and a simple notebook interface.

License: Apache License 2.0

Shell 0.13% JavaScript 2.52% HTML 0.04% Makefile 0.05% Dockerfile 0.06% Python 30.80% Mako 0.01% TypeScript 58.34% CSS 0.01% SCSS 4.26% Mustache 0.03% MDX 3.76%

metastore analyses hive presto notebook typescript flask celery charting

querybook's Issues

Add scale options to x-axis as well

Add the ability to:

choose a range for x values in the chart-config UI
make the x-axis log scale

Add Explain Query functionality

Add an Explain option to the dropdown

Table warning system

Add a table warning system in DataHub where users can write their own warning messages for a table. The warning message will be shown by the linter while the user is writing code.

Favoriting is broken

A favorited DataDoc does not show up after a refresh

[Stores] Implement Local File Result Store

Add a chart option to hide legend

This is useful when there are a large number of series

Allow transferring DataDoc ownership to another user

Expand the permission dropdown so that it can grant ownership in addition to edit/read permission. The previous owner should keep write permission afterwards.

Improve ElasticSearch for code search

When a DataDoc contains xxx.yyy, searching for yyy returns nothing, and users have to search for xxx.yyy to find the result. The strategy is to provide multiple analyzers so that code and rich text are analyzed differently.
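
One way to picture what a code-specific analyzer would need to produce is sketched below. This is plain TypeScript rather than Elasticsearch analyzer configuration, and `codeTokens` is a hypothetical helper: it emits each dotted identifier both as a whole and as its individual parts, so that yyy alone can match xxx.yyy.

```typescript
// Hypothetical helper (not part of Querybook): tokenizes code so that
// dotted identifiers are searchable as a whole and by each part.
function codeTokens(text: string): string[] {
    // Identifiers made of word characters, optionally joined by dots.
    const identifiers: string[] =
        text.toLowerCase().match(/[a-z0-9_]+(?:\.[a-z0-9_]+)*/g) || [];
    const tokens: string[] = [];
    const add = (token: string): void => {
        if (tokens.indexOf(token) === -1) {
            tokens.push(token);
        }
    };
    for (const identifier of identifiers) {
        add(identifier); // full form, e.g. "xxx.yyy"
        for (const part of identifier.split('.')) {
            add(part); // each part, e.g. "xxx" and "yyy"
        }
    }
    return tokens;
}
```

Rich text would keep going through a standard text analyzer; only code cells would need the extra part tokens.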

Generalize search filters for DataHub table search

Make sidebar search and table search use a similar UI to control which parameters can be filtered, such as:

featured
tags
search fields
table creation date

Querybook is not aware of default schema information

Querybook assumes the default schema name is 'default', which does not hold in all cases since it can be overridden in the connection string. The default also differs between query engines; for example, sqlite's default is 'main' rather than 'default'.

Acceptance

Backend should perform query analysis using default schema information derived dynamically from the language and connection settings
Frontend should fetch this information from the backend and perform a similar analysis
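
A minimal sketch of the lookup described above, under stated assumptions: `resolveDefaultSchema` and `qualifyTableName` are hypothetical names, the schema from the connection settings is assumed to be extracted already, and only the 'main' default for sqlite and the generic 'default' fallback come from the issue text.

```typescript
// Per-language defaults. Only sqlite's 'main' comes from the issue;
// any other entry would have to be confirmed per engine.
const DEFAULT_SCHEMA_BY_LANGUAGE: { [language: string]: string } = {
    sqlite: 'main',
};
const FALLBACK_SCHEMA = 'default';

// Hypothetical helper: a schema given by the connection settings wins,
// then the language default, then the generic fallback.
function resolveDefaultSchema(language: string, schemaFromConnection?: string): string {
    if (schemaFromConnection) {
        return schemaFromConnection;
    }
    if (Object.prototype.hasOwnProperty.call(DEFAULT_SCHEMA_BY_LANGUAGE, language)) {
        return DEFAULT_SCHEMA_BY_LANGUAGE[language] || FALLBACK_SCHEMA;
    }
    return FALLBACK_SCHEMA;
}

// Qualifies a bare table name with the resolved default schema.
function qualifyTableName(tableName: string, language: string, schemaFromConnection?: string): string {
    if (tableName.indexOf('.') !== -1) {
        return tableName; // already schema-qualified
    }
    return resolveDefaultSchema(language, schemaFromConnection) + '.' + tableName;
}
```

If the backend exposed the resolved schema per query engine, the frontend could reuse it in its own analysis instead of duplicating the rules.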

Improve row samples

Add field selection to row samples; by default, all columns are selected
Users can export the raw query
Users can copy the result to clipboard as tsv
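
The field selection and TSV copy could be sketched roughly as follows. `toTsv` is a hypothetical helper, and replacing tabs and newlines inside cells with spaces is an assumption made for this sketch; other escaping schemes are possible.

```typescript
type Cell = string | number | boolean | null;

// Hypothetical helper: serializes the selected columns of a row sample
// as TSV. By default every column is selected.
function toTsv(columns: string[], rows: Cell[][], selected: string[] = columns): string {
    const indexes: number[] = [];
    for (const columnName of selected) {
        const index = columns.indexOf(columnName);
        if (index !== -1) {
            indexes.push(index);
        }
    }
    // Null becomes an empty cell; embedded tabs/newlines become spaces.
    const clean = (cell: Cell | undefined): string =>
        cell === null || cell === undefined ? '' : String(cell).replace(/[\t\r\n]+/g, ' ');
    const lines: string[] = [indexes.map((i) => clean(columns[i])).join('\t')];
    for (const row of rows) {
        lines.push(indexes.map((i) => clean(row[i])).join('\t'));
    }
    return lines.join('\n');
}
```

In the browser, the resulting string could then be handed to `navigator.clipboard.writeText`.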

Clicking a query example user in Overview opens the table in non-modal view

This is caused by the URL changing without taking the current state into account

Add private DataDoc to search functionality

Currently, private DataDocs are not indexed in Elasticsearch, to keep the logic simple. Since most DataDocs will be private by default with FGAC, it is essential to make them searchable from Elasticsearch. The new Elasticsearch index for DataDocs should include two more fields: public and readable_user_ids. The readable_user_ids field should include every user who can access the private DataDoc.
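
A sketch of how the two new fields might be used at search time. Apart from public and readable_user_ids, the document shape is assumed, and `buildAccessFilter` and `canSee` are hypothetical helpers; the filter uses the standard Elasticsearch bool/should/term query DSL.

```typescript
// Assumed document shape; only `public` and `readable_user_ids`
// come from the issue text.
interface DataDocSearchDocument {
    id: number;
    title: string;
    public: boolean;
    readable_user_ids: number[];
}

// Builds a bool filter that matches docs which are public OR list the
// given user in readable_user_ids.
function buildAccessFilter(userId: number) {
    return {
        bool: {
            should: [
                { term: { public: true } },
                { term: { readable_user_ids: userId } },
            ],
            minimum_should_match: 1,
        },
    };
}

// In-memory equivalent of the filter, for illustration only.
function canSee(doc: DataDocSearchDocument, userId: number): boolean {
    return doc.public || doc.readable_user_ids.indexOf(userId) !== -1;
}
```

The filter would be combined with the user's text query, so private DataDocs only surface for users who are listed as readers.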

Scheduled DataDoc V2

Story
As a user I want to export multiple query results externally

Acceptance

Scheduled DataDocs allow exporting multiple query results, each with its own custom exporter settings

Filter example queries by query engine

Story
As a user, I want to view only the query examples for a certain query engine

Assumption

Adding a query engine filter would not impact the count of other filters (by user, by table join)

Acceptance

User can filter query examples so that only examples run with the selected query engine show up

Remove Null Values in Chart Aggregation

Remove null values so that the chart is smooth
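
The issue does not say how nulls should be handled, so the following is only one possible reading: skip nulls during aggregation instead of counting them as zero. `sumIgnoringNulls` and `averageIgnoringNulls` are hypothetical helpers.

```typescript
type Value = number | null;

// Sums the values, skipping nulls.
function sumIgnoringNulls(values: Value[]): number {
    let total = 0;
    for (const value of values) {
        if (value !== null) {
            total += value;
        }
    }
    return total;
}

// Averages only the non-null values; returns null when there are none,
// so the chart can leave a gap instead of plotting 0.
function averageIgnoringNulls(values: Value[]): number | null {
    const present = values.filter((value): value is number => value !== null);
    if (present.length === 0) {
        return null;
    }
    return sumIgnoringNulls(present) / present.length;
}
```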

Being able to add a title for Table Chart visualization

Currently, picking a table chart does not let the user choose a title. This makes it hard to differentiate between a table chart and a query execution.

Datahub Notification Plugin

Create a Notifier plugin model that allows different orgs to add new notification services such as MS Teams. The Notifier will handle sending query completion messages as well as doc permission change messages to DataHub users.

Move Data Table examples to elasticsearch

Many to many query engine <-> environment + orderable query engine

Story
As an admin, I want to add the same query engine to different environments without worrying about duplicating the config.
As an admin, I want to be able to order the query engines in the dropdown so that I can arrange them differently for the user.

Assumption

A query engine and environment should be joined with an intermediate table
Extending the single-environment check to multiple environments should be easy

Acceptance

Admins can add the same query engine to multiple environments
Admins can order environment via drag and drop UI and it gets reflected in the query engine selector / query status etc

Cell Deletion UX

Disable deleting a cell with backspace
Add a keyboard shortcut with confirmation

[BUG] Using EXTRACT from presto syntax breaks syntax highlighting

Problem:
The sql-lexer assumes that any VARIABLE token following a FROM keyword is a table, which breaks the suggestions.

Root cause:
Presto allows FROM to appear in front of things other than table names, such as inside the extract function. From the Presto documentation:

The types supported by the extract function vary depending on the field to be extracted. Most fields support all date and time types.
extract(field FROM x) → bigint
Returns field from x.

Code where this fails:

    while (!stream.eol()) {
        // here the match fails, and because nothing gets consumed it goes off in an infinite loop if the match is handled
        // Maybe the right thing to do is, if there's no match, break out of the stream matching?
        const match = stream.match(/^([_\w\d]+|`.*`)\.?/, true);

        // this fails and kicks you out of the loop, but then the suggestions stop working
        if (match[1]) {
            let part = match[1];
            if (part.charAt(0) === '`') {
                // remove first and last char
                part = part.slice(1, -1);
            }
            parts.push(part);
        }

Short snippet that triggers this:

    SELECT *
    FROM table_2
    JOIN table_1
        ON table_1.field_1 = table_2.field_2
        AND extract(YEAR FROM field_1_date) = table_2.field_year
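
The comment in the failing code suggests breaking out of the loop when nothing matches. A minimal sketch of that guard follows. It uses a small stand-in class instead of the editor's real stream object, and both `SimpleStream` and `readNameParts` are hypothetical names. It also narrows the backtick pattern so that it stops at the closing backtick, which is a change from the original regex. It does not address the separate question of recognizing that FROM inside extract(...) is not followed by a table.

```typescript
// Minimal stand-in for the editor's string stream (an assumption for
// this sketch; the real lexer uses the editor's own stream object).
class SimpleStream {
    private pos = 0;
    private readonly text: string;

    constructor(text: string) {
        this.text = text;
    }

    eol(): boolean {
        return this.pos >= this.text.length;
    }

    // Consumes and returns the match, or returns null and consumes nothing.
    match(pattern: RegExp): RegExpMatchArray | null {
        const found = this.text.slice(this.pos).match(pattern);
        const consumed = found ? found[0] || '' : '';
        if (found === null || consumed.length === 0) {
            return null;
        }
        this.pos += consumed.length;
        return found;
    }
}

// Reads a dotted name such as schema.table, with optional backticks.
// Stops as soon as nothing matches, so the loop cannot spin forever.
function readNameParts(stream: SimpleStream): string[] {
    const parts: string[] = [];
    while (!stream.eol()) {
        const match = stream.match(/^([_\w\d]+|`[^`]*`)\.?/);
        const raw = match ? match[1] : undefined;
        if (!raw) {
            break; // nothing consumed: leave the loop instead of retrying
        }
        // Strip surrounding backticks if present.
        parts.push(raw.charAt(0) === '`' ? raw.slice(1, -1) : raw);
    }
    return parts;
}
```

With this guard the null match no longer throws on `match[1]`, and the caller still receives whatever parts were read before the mismatch.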

Query Execution Access Control

Add control over who can access query executions, logs, and results.
Add request-for-access functionality to query executions.

Add DataHub Result Column Statistics

Show column statistics for DataHub result

UI Should be similar to this

Add the ability to customize welcome & no environment messages

Story
As a user, I find it confusing when I go to DataHub and it does not tell me why I cannot see any environments.
As an admin, I want to give pointers to new users when they first visit DataHub.

Acceptance

Welcome & No environment messages can be customized through something similar to the plugins model
Admins can use markdown to provide a custom message

[Charts] Improve Transformations + Bubble & Scatter Charts

Don't show table fields if there is no information

This change will apply to the following views:

Tooltip view
Sidebar view
Full table view

Fields such as partition, Hive metastore information, and query users should all be hidden if there is no information to show

Auto format breaks when encountering s3 urls

Expected formatting:

DELETE JAR s3://test-bucket/hadoopusrs/prod/test-0.5-SNAPSHOT/test-0.5-SNAPSHOT.jar;
ADD JAR s3://test-bucket/hadoopusrs/bob/test-0.5-SNAPSHOT/test-0.5-SNAPSHOT.jar;

-> same

Actual formatting:

DELETE JAR s3://test-bucket/hadoopusrs/prod/test-0.5-SNAPSHOT/test-0.5-SNAPSHOT.jar;
ADD JAR s3://test-bucket/hadoopusrs/bob/test-0.5-SNAPSHOT/test-0.5-SNAPSHOT.jar;

->

DELETE JAR s3: / / test - bucket / hadoopusrs / prod / test -0.5 - SNAPSHOT / test -0.5 - SNAPSHOT.jar;
ADD JAR s3://test-bucket/hadoopusrs/bob/test-0.5-SNAPSHOT/test-0.5-SNAPSHOT.jar;
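
One possible workaround is to hide URLs from the formatter and restore them afterwards. The sketch below is not Querybook's formatter: `padOperators` is a stand-in that reproduces the kind of operator spacing seen above, `formatPreservingUrls` is a hypothetical wrapper, and the URL pattern (scheme://, up to whitespace or a semicolon) is an assumed definition.

```typescript
// Assumed definition of a URL for this sketch: scheme://... up to
// whitespace or a semicolon.
const URL_PATTERN = /[a-z][a-z0-9+.\-]*:\/\/[^\s;]+/gi;

// Stand-in for a formatter that pads operators with spaces.
function padOperators(sql: string): string {
    return sql.replace(/\s*([\/:\-])\s*/g, ' $1 ');
}

// Replaces each URL with a placeholder, formats, then restores the URLs.
function formatPreservingUrls(sql: string, format: (input: string) => string): string {
    const urls: string[] = [];
    const masked = sql.replace(URL_PATTERN, (url: string): string => {
        urls.push(url);
        return '__URL_' + (urls.length - 1) + '__';
    });
    return format(masked).replace(
        /__URL_(\d+)__/g,
        (_whole: string, index: string): string => urls[Number(index)] || ''
    );
}
```

A fix inside the tokenizer, treating the whole URL as a single token, would be cleaner; the placeholder approach only needs a wrapper around the existing formatter.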

Announcements not supporting emojis

[Chart] Convert Data Transformation to Pivot Tables

UI Component Snapshot Tests

Add the option to request access to private data docs

Impression Table does not sort correctly.

When sorting the impression count in the impression table in DataDoc/DataTable view, it does not sort from largest to smallest or vice versa.

Update Table Column UI

Change it from a table format to a row format similar to the announcement admin UI

Better unit test system with example test data

As a developer, I want to set up data source unit tests quickly with some example data in the database

Acceptance:

Use demo data to set up tests
Fixtures should be function-level so that they can be swapped in and out
Create one or more unit test examples that use this data

In DataDoc Query Cell full screen mode, the search box is not showing up

Make search box visible when entering full screen query cell mode
Make search box only care about the context of the current cell

Make the document title also contain the App name

This would be useful if the user opens DataHub in multiple browser windows

Improve sample queries

add quotes
provide more options to explore table sample

Explicit column search by allowing users to pick which field to search

Users should be able to set which fields cmd+k search looks at.
Here are some of the potential fields:

Title
Description
Column

Users should be able to pick multiple fields at once.

Add a user setting for query results text size

There is a user setting for editor text size. It would be nice to have a similar setting for the text size of the query results.
We could also reuse the existing setting for the query results text size.

Show frequent users of a table

As a user, I want to see who the frequent users of a table are so that I can ask them questions.

Assumption:
Use query samples to obtain info about the common query runners

Acceptance:

A new UI that shows a list of top 10 users of a table and their frequency
(optional) ability to filter query samples by users
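
Assuming each query sample records the id of the user who ran it, the top-10 list could be derived roughly as follows. The `QuerySample` shape and the `topUsers` helper are hypothetical.

```typescript
// Assumed shape of a query sample; only the user id matters here.
interface QuerySample {
    uid: number;
}

// Counts samples per user and returns the most frequent users first.
// Ties are broken by the smaller uid to keep the order stable.
function topUsers(samples: QuerySample[], limit: number = 10): { uid: number; count: number }[] {
    const counts: { [uid: number]: number } = {};
    for (const sample of samples) {
        counts[sample.uid] = (counts[sample.uid] || 0) + 1;
    }
    return Object.keys(counts)
        .map((key) => ({ uid: Number(key), count: counts[Number(key)] || 0 }))
        .sort((a, b) => b.count - a.count || a.uid - b.uid)
        .slice(0, limit);
}
```

In practice the counting would more likely happen in the database or in Elasticsearch as an aggregation; the function above only shows the intended result.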

[Auth] Add LDAP Support

Exporter V2

Together with #202, this should improve the exporting experience

Story
As a user, I want to export my entire DataHub query results without worrying about the preview size

Acceptance

The size of the exported query results is not subject to the size of the query result preview, only to the maximum size accepted by the exporter
The export process should now be async and should optionally report progress

Support Snowflake as a query engine

We can support Snowflake easily through the snowflake-sqlalchemy integration

https://docs.snowflake.com/en/user-guide/sqlalchemy.html
https://pypi.org/project/snowflake-sqlalchemy/

DataDoc Date Range filter does not work

The request does not return when you add a filter for start date or end date.
Things to check:

Why does the search request fail with a start date or end date as a filter
There should be an error UI when the request fails, instead of being stuck at the loading state.

Add vscode support for DataHub

Story
As a user, I want to use VS Code to develop DataHub with a minimal amount of effort

Acceptance

Suggest a list of extensions that are essential for DataHub development (prettier, black, etc.)
Suggest some standard VS Code settings (formatting on save, etc.)
Add a devcontainer.json so that users can easily launch DataHub with VS Code

Query Execution picker would auto select on running query update

To repro:

Have 1 finished query.
Start a new query run and pick the old query.
When there is a query update, the picker would jump back to the running query

Convert JSON column types to JSON for mysql 5.7

Currently most of these columns use varchar or mediumtext

Info Section in Sidebar

Change Logs
Keyboard Shortcuts
FAQs
Tours
