Paul,
Mesa is closed source.
There are several open source alternatives to it in the Hadoop stack, including Impala, Hive + Tez, and Shark.
Max
On Sun, Aug 24, 2014 at 1:20 PM, Paul Grosu [email protected] wrote:
Hi Cassie, David, et al,
I just read the paper on Google's Mesa, available at the following links:
http://research.google.com/pubs/pub42851.html
http://research.google.com/pubs/archive/42851.pdf
I noticed it has several advantages, such as online schema changes and queries that apply functions to sets of values, which can run in near real time. Another advantage is that it provides petascale data warehousing with ACID properties for transactions. The functions on sets could be especially useful for the many-to-many relationships we have in our schema. It seems to have some advantages over Megastore, Spanner, and F1 that we could try to leverage.
I was wondering if we could test the schema with data in a development area of Mesa.
Thank you,
Paul
Maxim Mikheev M.D. Ph.D.
Founder & CEO
www.BioDatomics.com http://www.biodatomics.com/
Tel +1.412.475.8886
Fax +1.470.201.6233
from ga4gh-schemas.
Hi Max,
I understand, and thank you for the alternatives, but I'm not sure that should preclude us, especially if we gain other benefits. Note that BigQuery is also closed source, yet we have an implementation built on it for Google Genomics here:
https://github.com/googlegenomics/bigquery-examples
Using these capabilities through a service does not require access to the source code. All I am saying is that we have a better platform already in operation and ready to go for large-scale storage and analysis, and there seem to be advantages to such a closed platform, whose components are implemented in C/C++ rather than Java and include other optimizations (e.g., Colossus). Consider Google Caffeine, which is based on Percolator and replaced MapReduce-based processing because it made processing for Google's indexing system more efficient. In fact, Google Pregel could be very helpful for the variant analysis step, where we can have very complex graphs for GAVariationReference, and it also has a C++ implementation that makes it fast. Again, the code itself is not necessary; what matters is the processing of the data via a service.
We are all working together as a team, and everyone's expertise is what makes this project great. For me, at least, the source code of the system we use is not that critical. If the system's service can correctly accept and process a schema for updating the keys or data for storage and processing, then I see no downside. If a new platform offers added benefits over the current implementations and does not negatively impact production, then I say let's try it out. Otherwise, we will keep tweaking the schema to work around limitations in older technologies, which might restrict some analysis possibilities down the line.
Paul
@pgrosu, this kind of thing probably isn't a good fit for ga4gh.
This is very Google-specific, so it would fit best in a Google-specific area. Likewise, the BigQuery repo you linked to isn't part of ga4gh, and there are no plans to move it over.
Like Max said, there are plenty of open source alternatives for this use case if we decide we need this kind of solution.
In this repo (ga4gh/schemas), though, we don't actually need any large-scale backend, since this is just an API definition and not an implementation. Because of that, I'm closing this issue.
@cassiedoll - I understand, no problem :)