Comments (4)

max-biodatomics commented on August 14, 2024

Paul,

Mesa is closed source.
There are several open-source alternatives to it in the Hadoop stack,
including Impala, Hive + Tez, and Shark.

Max

On Sun, Aug 24, 2014 at 1:20 PM, Paul Grosu [email protected]
wrote:

Hi Cassie, David, et al,

I just read the paper regarding Google's Mesa from the following links:

http://research.google.com/pubs/pub42851.html

http://research.google.com/pubs/archive/42851.pdf

I noticed it has several advantages such as online schema changes as well
as query by function on sets of values, which can perform in near
real-time. Another advantage is that it has petascale data-warehousing with
ACID properties for transactions. The function on sets can be especially
useful for the many-to-many relationships we have in our schema.

This seems to have some advantages over Megastore, Spanner, and F1 that we
can try to leverage.

I was wondering if we can test the schema with data on a development area
of Mesa.

Thank you,
Paul


Reply to this email directly or view it on GitHub
#131.

Maxim Mikheev M.D. Ph.D.
Founder & CEO

www.BioDatomics.com http://www.biodatomics.com/
Tel +1.412.475.8886
Fax +1.470.201.6233

from ga4gh-schemas.

pgrosu commented on August 14, 2024

Hi Max,

I understand, and thank you for the alternatives, but I'm not sure that being closed source should preclude us from using it, especially if we gain other benefits. You'll notice BigQuery is also closed source, but we have an implementation that uses it for Google Genomics here:

https://github.com/googlegenomics/bigquery-examples

Thus, using the capabilities via a service does not require seeing the source code. All I am saying is that we have a better platform in operation, ready to go for large-scale storage and analysis, and there seem to be advantages to a closed platform whose components are implemented in C/C++ rather than Java, along with other optimizations (e.g. Colossus). Think of Google Caffeine, which is based on Percolator and replaced MapReduce-based processing because it allowed more efficient processing of the indexing system. In fact, Google Pregel could be very helpful for the variant analysis step, where for GAVariationReference we can have very complex graphs, and it also has a C++ implementation to make it fast. Again, the source code is not necessary; we only need to process the data via a service.

Since we are all working together as a team, everyone brings their own expertise, which makes this project great. At least for me, the source code of the system we use is not that critical. If the service can correctly accept and process a schema for updating the keys or data for storage and processing, then I see no downside. If a new platform exists with added benefits over the current implementations, and it does not impact production negatively, then I say let's try it out. Otherwise we keep tweaking the schema because of limitations in older technologies, which might limit some of the analysis possibilities down the line.

Paul


cassiedoll commented on August 14, 2024

@pgrosu - this kind of thing probably isn't a good fit for ga4gh.
This is very Google-specific, and so would fit best in a Google-specific area. Likewise, the BigQuery repo you linked to isn't part of ga4gh, and there are no plans to move it over.

Like Max said, there are plenty of open source alternatives for this use case if we decide we need this kind of solution.

In this repo though (ga4gh/schemas) we actually don't have a need for any large-scale backend, as this is just an API definition and not an implementation. Because of that, I'm closing this issue.


pgrosu commented on August 14, 2024

@cassiedoll - I understand, no problem :)

