Testing the schema with data on Mesa at Google about ga4gh-schemas HOT 4 CLOSED

ga4gh commented on August 14, 2024

Testing the schema with data on Mesa at Google


Comments (4)

max-biodatomics commented on August 14, 2024

Paul,

Mesa is closed source.
There are several open-source alternatives to it in the Hadoop stack,
including Impala, Hive + Tez, and Shark.

Max

On Sun, Aug 24, 2014 at 1:20 PM, Paul Grosu [email protected]
wrote:

Hi Cassie, David, et al,

I just read the paper regarding Google's Mesa from the following links:

http://research.google.com/pubs/pub42851.html

http://research.google.com/pubs/archive/42851.pdf

I noticed it has several advantages, such as online schema changes and
queries that apply functions to sets of values, which can run in near
real time. Another advantage is that it provides petascale data warehousing
with ACID properties for transactions. The functions on sets could be
especially useful for the many-to-many relationships we have in our schema.

This seems to have some advantages over Megastore, Spanner, and F1 that we
can try to leverage.

I was wondering if we can test the schema with data on a development area
of Mesa.

Thank you,
Paul

—
Reply to this email directly or view it on GitHub
#131.

Maxim Mikheev M.D. Ph.D.
Founder & CEO

www.BioDatomics.com http://www.biodatomics.com/
Tel +1.412.475.8886
Fax +1.470.201.6233


pgrosu commented on August 14, 2024

Hi Max,

I understand and thank you for the alternatives, but I'm not sure that it should preclude us - especially if we gain other benefits. You'll notice BigQuery is also closed source, but we have an implementation of it for Google Genomics here:

https://github.com/googlegenomics/bigquery-examples

Thus, using these capabilities via a service does not require seeing the source code. All I am saying is that we have a better platform in operation, ready to go for large-scale storage and analysis, and there seem to be advantages to such a closed platform whose components are implemented in C/C++ rather than Java, along with other optimizations (e.g. Colossus). Think of Google Caffeine, which is based on Percolator and replaced MapReduce-based processing because it allowed more efficient processing of Google's indexing system. In fact, Google Pregel could be very helpful for the variant analysis step, where GAVariationReference can involve very complex graphs, and it also has a C++ implementation that makes it fast. Again, the code itself is not necessary; what matters is processing the data via a service.

Since we are all working together as a team, everyone brings their own expertise, which makes this project great. At least for me, the source code of the system we use is not that critical. If the system's service can correctly accept and process a schema for updating the keys or data for storage and processing, then I see no downside. If a new platform exists with added benefits over the current implementations, and it does not impact production negatively, then I say let's try it out. Otherwise, we keep tweaking the schema because of limitations in older technologies, which might restrict some of the analysis possibilities down the line.

Paul


cassiedoll commented on August 14, 2024

@pgrosu - this kind of thing probably isn't a good fit for ga4gh.
This is very Google-specific, and so would fit best in a Google-specific area. Likewise, the BigQuery repo you linked to isn't part of ga4gh, and there are no plans to move it over.

Like Max said, there are plenty of open source alternatives for this use case if we decide we need this kind of solution.

In this repo though (ga4gh/schemas) we actually don't have a need for any large scale backend as this is just an API definition - and not an implementation. Because of that, I'm closing this issue.


pgrosu commented on August 14, 2024

@cassiedoll - I understand, no problem :)
