Giter Club home page Giter Club logo

linq-to-bigquery's Introduction

LINQ to BigQuery build status

LINQ to BigQuery is C# LINQ Provider for Google BigQuery. It also enables Desktop GUI Client with LINQPad and plug-in driver.

Installation

Binary from NuGet, LINQ-to-BigQuery

PM> Install-Package LINQ-to-BigQuery

LINQPad Driver can install from LINQ Pad driver gallery.
"Add connection" -> "View more drivers..." -> clcik "LINQ to BigQuery" - "Download & Enabled driver" link.
Note:Configuration's JSON is OAuth2 JSON Generate from GCP Management Page.

LastUpdate, ver 0.7.2(2016-02-15). See All Release Notes

LINQ with LINQPad

LINQPad is Desktop GUI Client for BigQuery!

bq_linqpad_first

  • Schema tree window in side explorer
  • "this" as BigQueryContext with authenticated connection
  • load dependented assembly and namespaces automatically(such as "BigQuery.Linq" namespace)
  • generate/load class of schema dynamically
  • add utility dump for BigQuery(DumpQuery/DumpRun/DumpRunToArray/DumpChart/DumpGroupChart//DumpToExcel)
  • of course you can save/load query in file

Result tab(Rows is expandable to grid view)

bq_linqpad_results

Legendary Dump to Excel, perfect integration to Excel:)

legendary_dump_to_excel

Why LINQ?

Strongly Typed with IntelliSense(and function documents).

bq_intellisense

bq_intellisense_func

All BigQuery functions is under BqFunc(except string.Contains). If you use C# 6.0(or LINQPad 5) you can enable static using.

// C# 6.0 Using Static
using static BigQuery.Linq.BqFunc;

LINQ to BigQuery is LINQ but is not IQueryable. It's constraint is old good method chain. The rule is

From((+TableDecorate)+Flatten) -> Join -> Where -| -> OrderBy(ThenBy) -> Select ->                     | -> Limit -> IgnoreCase
                                                 | -> Select | -> GroupBy -> Having -> OrderBy(ThenBy) | -> IgnoreCase
                                                             | -> OrderBy(ThenBy) ->                   |

After Select, you can call...

Run/RunAsync - Execute query with QueryResponse. 
ToArray/ToArrayAsync - Execute query and return rows.
AsEnumerable - Execute query and return rows, it's deferred(but resultset is not streaming).
RunDry - Dry run it's receive execution plan.
ToString - Build query string. It no needs network connection and BigqueryService. 
Into - Query as Subquery(same as From(query))

Powerfull Tabledecorator/Table wildcard integration.

// Table Decorators - WithRange(relative or absolute), WithSnapshot 

// FROM [githubarchive:github.timeline@-900000-]
.From<github_timeline>().WithRange(TimeSpan.FromMinutes(15))

// FROM [githubarchive:github.timeline@1411398000000000]
.From<github_timeline>().WithSnapshot(DateTimeOffset.Parse("2014-09-23"))

// Table wildcard functions - FromDateRange, FromDateRangeStrict, FromTableQuery

// FROM (TABLE_DATE_RANGE([mydata], TIMESTAMP('2013-11-10'), TIMESTAMP('2013-12-01')))
.FromDateRange<mydata>("mydata", DateTimeOffset.Parse("2013-11-10"), DateTimeOffset.Parse("2013-12-1"))

// FROM (TABLE_QUERY([mydata], "([table_id] CONTAINS 'oo' AND (LENGTH([table_id]) >= 4))"))
.FromTableQuery<mydata>("mydata", x => x.table_id.Contains("oo") && BqFunc.Length(x.table_id) >= 4)

// FROM (TABLE_QUERY([mydata], "REGEXP_MATCH([table_id], r'^boo[\d]{3,5}')"))
.FromTableQuery<mydata>("mydata", x => BqFunc.RegexpMatch(x.table_id, "^boo[\\d]{3,5}"))

Timezone convention

LINQ to BigQuery converts local time to UTC automatically for BigQuery's query string. Result set of BigQuery converts UTC to local time automatically, too. If you want to use UTC, configure BigQueryContext.IsConvertResultUtcToLocalTime.

var context = new BigQueryContext();
context.IsConvertResultUtcToLocalTime = false; // default is true

My thought of LINQ

LINQ to BigQuery is LINQ but is not IQueryable. It's my answer of LINQ.

IQueryable isn't necessary for LINQ to translate SQL. I need all BigQuery's sql can write LINQ to BigQuery, all specialized sql can execute, all function can execute(include window function). BigQuery's SQL different with standard Sql. I must support Multiple FROM, WITHIN, JOIN EACH, GROUP EACH BY, FLATTEN, IGNORECASE, etc) and LINQ to BigQuery is done.

Therefore control order, protect runtime error, satisfy all syntax. "IQueryable is all database's abastraction" is fantasy, LINQ needs specialized each Database. But LINQ to Xxx needs LINQ's atmosphere, filter as Where, projection as Select, order as OrderBy/ThenBy, like convention over configuration.

I think "IQueryable is dead, Long live Expression". Of course Expression Tree has performance issue, but execute BigQuery is very expensive so LINQ to BigQuery can ignore cost of Experssion relatively.

Basics

// create BigQueryContext with BigqueryService
var context = new BigQueryContext(/* BigqueryService, "projectId" */);

// From -> write Queries -> Run
var result = context.From<github_timeline>()
   .Select(x => x.repository_url)
   .Run();

tableName is provided from class definition's TableNameAttribute.

[TableName("[publicdata:samples.github_timeline]")]
public class github_timeline
{
    public string repository_url { get; set; }
    public bool? repository_has_downloads { get; set; }
    public string repository_created_at { get; set; }
    public bool? repository_has_issues { get; set; }
    // snip...(200 lines)
	public string url { get; set; }
	public string type { get; set; }
}

If table separated in YYYYMMDD suffix(for FromDateRange), you can use TablePrefixAttribute's table and BigQueryContext.BuildCSharpClass supports generate class.

[TablePrefix("[mytable.LoginInfo]")]
public class Login
{
    public DateTimeOffset timestamp { get; set; }
    public string url { get; set; }
    public string userAgent { get; set; }
}

// Generate all tableinfo and if table's suffix is YYYYMMDD, generates [TablePrefix] table.
var codes = context.BuildCSharpClass("mydataset");

TablePrefix's class is used by FromDateRange, FromDateRangeStrict.

// TABLE_DATE_RANGE([mytable.LoginInfo], TIMESTAMP('2015-03-03'), TIMESTAMP('2015-03-09'))
context.FromDateRange<LoginInfo>(new DateTime(2015, 3, 4), new DateTime(2015, 3, 10));

// You can write Today and One Date
context.FromDateRange<LoginInfo>();
context.FromDateRange<LoginInfo>(new DateTime(2015, 3, 4));

Class definition can generate from MetaTable and it's helper method.

new MetaTable("publicdata", "samples", "github_timeline")
	.GetTableSchema(context.BigQueryService)
	.BuildCSharpClass();

// or generate all tableinfo with guess table/tableprefix
context.BuildCSharpClass("mydataset");

Advanced Sample

Show github new repository every month by language.

bq_linqpaddemo_graph

Code

From<github_timeline>()
  .Where(x => x.repository_language != null && x.repository_fork == "false")
  .Select(x => new
  {
      x.repository_url,
      x.repository_created_at,
      language = LastValue(x, y => y.repository_language)
          .PartitionBy(y => y.repository_url)
          .OrderBy(y => y.created_at)
          .Value
  })
  .Into()
  .Select(x => new
  {
      x.language,
      yyyymm = StrftimeUtcUsec(BqFunc.ParseUtcUsec(x.repository_created_at), "%Y-%m"),
      count = CountDistinct(x.repository_url)
  })
  .GroupBy(x => new { x.language, x.yyyymm })
  .Having(x => GreaterThanEqual(x.yyyymm, "2010-01"))
  .Into()
  .Select(x => new
  {
      x.language,
      x.yyyymm,
      x.count,
      ratio = RatioToReport(x, y => y.count)
          .PartitionBy(y => y.yyyymm)
          .OrderBy(y => y.count)
          .Value
  })
  .Into()
  .Select(x => new
  {
      x.language,
      x.count,
      x.yyyymm,
      percentage = Round(x.ratio * 100, 2)
  })
  .OrderBy(x => x.yyyymm)
  .ThenByDescending(x => x.percentage)
  .Run()  // ↑BigQuery
  .Dump() // ↓LINQ to Objects(and LINQPad)
  .ToArray()
  .GroupBy(x => x.language)
  .DumpGroupChart(x => x.yyyymm, x => x.percentage);

It's query.

SELECT
  [language],
  [count],
  [yyyymm],
  ROUND(([ratio] * 100), 2) AS [percentage]
FROM
(
  SELECT
    [language],
    [yyyymm],
    [count],
    RATIO_TO_REPORT([count]) OVER (PARTITION BY [yyyymm] ORDER BY [count]) AS [ratio]
  FROM
  (
    SELECT
      [language],
      STRFTIME_UTC_USEC(PARSE_UTC_USEC([repository_created_at]), '%Y-%m') AS [yyyymm],
      COUNT(DISTINCT [repository_url]) AS [count]
    FROM
    (
      SELECT
        [repository_url],
        [repository_created_at],
        LAST_VALUE([repository_language]) OVER (PARTITION BY [repository_url] ORDER BY [created_at]) AS [language]
      FROM
        [githubarchive:github.timeline]
      WHERE
        (([repository_language] IS NOT NULL) AND ([repository_fork] = 'false'))
    )
    GROUP BY
      [language],
      [yyyymm]
    HAVING
      [yyyymm] >= '2010-01'
  )
)
ORDER BY
  [yyyymm], [percentage] DESC

Realtime Streaming Sample with Rx

// [githubarchive:github.timeline@1411511274158-1411511574167]
// [githubarchive:github.timeline@1411511574167-1411511874174]
// [githubarchive:github.timeline@1411511874174-1411512174175]
// ...
Observable.Timer(TimeSpan.Zero, TimeSpan.FromMinutes(5))
    .Timestamp()
	.Buffer(2, 1) // Buffer Window
	.SelectMany(xs =>
	{
		this.UseQueryCache = false;
		return this.From<github_timeline>().WithRange(xs[0].Timestamp, xs[1].Timestamp)
			.Select(x => new { x.repository_name, x.created_at })
			.ToArrayAsync();
	})
	.Dump();

More query samples, see:BigQuery.Linq.Tests. You can see How to Join? How to Window? etc...

Create table, insert data samples, see:SampleApps/StoreTwitterTimeline.

Author Info

Yoshifumi Kawai(a.k.a. neuecc) is software developer in Japan.
He is Director/CTO at Grani, Inc.
Grani is top social game developer in Japan.
He awarded Microsoft MVP for Visual C# since 2011.
He is known by creator of linq.js(LINQ to Objects for JavaScript)
Another his LINQ Project is UniRx - Reactive Extensions for Unity

Blog: http://neue.cc/ (JPN)
Twitter: https://twitter.com/neuecc (JPN)

License

This library is under MIT License.

linq-to-bigquery's People

Contributors

greigs avatar minato128 avatar neuecc avatar

Watchers

 avatar  avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.