gimel's People

Contributors

abhishekchinni, ameya-pandilwar, anishanainani, ayushiagarwal, bagopalan, bhargav18, bthilakraj, dee-pac, dependabot[bot], kination, laxpatil, meisam, parikhv15, pkasinathan, praveen-kanamarlapudi, pravenne676, qwjlegend, r39132, rajarshisarkar, rampallydheeraj, sathishsri88, susheendar, theromit

gimel's Issues

[Standalone] Fix logging in quickstart scripts

The following fixes are needed in the quickstart scripts:

  1. Error and Process Aborted
    20180411 11:42:36 | ------------------------------------------------------------------------------
    20180411 11:42:36 | Attempting to Turn Off Safe Mode on Name Node..
    20180411 11:42:36 | ------------------------------------------------------------------------------
    20180411 11:42:36 | Attempting to Turn Off Safe Mode on Name Node...
    20180411 11:42:37 | Executing Command --> docker exec -it namenode hadoop dfsadmin -safemode leave
    DEPRECATED: Use of this script to execute hdfs command is deprecated.
    Instead use the hdfs command for it.

safemode: Call From e4e0ae5d275a/172.18.0.4 to namenode:8020 failed on connection exception: java.net.ConnectException: Connection refused; For more details see: http://wiki.apache.org/hadoop/ConnectionRefused
20180411 11:43:02 | Error | Stage | docker exec -it namenode hadoop dfsadmin -safemode leave
20180411 11:43:02 | Error | Stage | sh /Users/pkasinathan/workspace/gimel-oss/gimel-dataapi/gimel-quickstart/bootstrap.sh kafka,elasticsearch

  2. Error and Process Continued

20180411 16:39:39 | -----------------------------------------------------------------
20180411 16:39:39 | Deleting topic if exists...
20180411 16:39:39 | -----------------------------------------------------------------
20180411 16:39:39 | Executing Command --> docker exec -it kafka kafka-topics --delete --zookeeper zookeeper:2181 --topic gimel.demo.flights
Error while executing topic command : Topic gimel.demo.flights does not exist on ZK path zookeeper:2181
[2018-04-11 23:39:43,609] ERROR java.lang.IllegalArgumentException: Topic gimel.demo.flights does not exist on ZK path zookeeper:2181
at kafka.admin.TopicCommand$.deleteTopic(TopicCommand.scala:180)
at kafka.admin.TopicCommand$.main(TopicCommand.scala:71)
at kafka.admin.TopicCommand.main(TopicCommand.scala)
(kafka.admin.TopicCommand$)
20180411 16:39:42 | Error | Stage | docker exec -it kafka kafka-topics --delete --zookeeper zookeeper:2181 --topic gimel.demo.flights
20180411 16:39:47 | ------------------------------------------------------------------------------

  3. Error and Process Aborted

20180411 16:40:28 | ------------------------------------------------------------------------------
20180411 16:40:28 | ALL STORAGE CONTAINERS - LAUNCHED
20180411 16:40:28 | ------------------------------------------------------------------------------
20180411 16:40:28 |
20180411 16:40:28 |
20180411 16:40:28 | ------------------------------------------------------------------------------
20180411 16:40:28 | Setting up spark container...
20180411 16:40:28 | ------------------------------------------------------------------------------
20180411 16:40:28 | Executing Command --> docker cp /Users/pkasinathan/workspace/gimel-oss/gimel-dataapi/gimel-standalone/lib/gimel-sql-1.2.0-SNAPSHOT-uber.jar spark-master:/root/
20180411 16:40:33 | Success | Stage | docker cp /Users/pkasinathan/workspace/gimel-oss/gimel-dataapi/gimel-standalone/lib/gimel-sql-1.2.0-SNAPSHOT-uber.jar spark-master:/root/
20180411 16:40:33 | Executing Command --> docker cp hive-server:/opt/hive/conf/hive-site.xml /Users/pkasinathan/workspace/gimel-oss/tmp/hive-site.xml
20180411 16:40:33 | Success | Stage | docker cp hive-server:/opt/hive/conf/hive-site.xml /Users/pkasinathan/workspace/gimel-oss/tmp/hive-site.xml
20180411 16:40:33 | Executing Command --> docker cp /Users/pkasinathan/workspace/gimel-oss/tmp/hive-site.xml spark-master:/spark/conf/
20180411 16:40:34 | Success | Stage | docker cp /Users/pkasinathan/workspace/gimel-oss/tmp/hive-site.xml spark-master:/spark/conf/
20180411 16:40:34 | Executing Command --> docker cp hbase-master:/opt/hbase-1.2.6/conf/hbase-site.xml /Users/pkasinathan/workspace/gimel-oss/tmp/hbase-site.xml
Error: No such container:path: hbase-master:/opt/hbase-1.2.6/conf/hbase-site.xml
20180411 16:40:34 | Error | Stage | docker cp hbase-master:/opt/hbase-1.2.6/conf/hbase-site.xml /Users/pkasinathan/workspace/gimel-oss/tmp/hbase-site.xml
20180411 16:40:34 | Error | Stage | sh /Users/pkasinathan/workspace/gimel-oss/gimel-dataapi/gimel-quickstart/bootstrap.sh kafka,elasticsearch
pkasinathan@LM-SJN-21013346:~/workspace/gimel-oss $

  4. Error on last step here (http://gimel.readthedocs.io/en/latest/try-gimel/1-HDFS/2-analyze-hdfs-data/)

scala> import com.paypal.gimel._
import com.paypal.gimel._
scala> val dataSet = DataSet(spark)
dataSet: com.paypal.gimel.DataSet = com.paypal.gimel.DataSet@4c49471c
scala> val df = dataSet.read("pcatalog.flights_hdfs")
Catalog Provider is --> USER
Resolving Catalog Via catalogProvider --> USER
java.util.NoSuchElementException: key not found: pcatalog.flights_hdfs.dataSetProperties
at scala.collection.MapLike$class.default(MapLike.scala:228)
at scala.collection.AbstractMap.default(Map.scala:59)
at scala.collection.MapLike$class.apply(MapLike.scala:141)
at scala.collection.AbstractMap.apply(Map.scala:59)
at com.paypal.gimel.common.catalog.CatalogProvider$.getDataSetProperties(CatalogProvider.scala:72)
at com.paypal.gimel.DataSet.read(DataSet.scala:89)
... 52 elided
scala> df.count
res10: Long = 1398164
scala>

[Gimel-Core] Implement Class Loader Pattern to load requested Connectors

Problem Statement

The current implementation of the Gimel-Core Data API loads all the connectors, because their dependencies are resolved at compile time.
With this pattern, a use case that requires, say, just the Kafka and ElasticSearch connectors also loads every other connector Gimel supports.
The implications are:

  1. Unnecessary dependencies are packaged even when only, say, 2 of N connectors are used.
  2. Gimel cannot support different versions of the same storage (say Elastic 2.x and 5.x); the current implementation strictly supports only one version.

Solution

  • Improve the Gimel build to declare connector dependencies with scope=provided.
  • Modify the Gimel-Core DataSet / DataStream modules to load connectors through a class loader.
  • Resolve any APIs that break due to the above changes.
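
The class loader idea in the solution above can be sketched as follows. This is a minimal illustration, not Gimel's actual implementation: the `Connector` trait, the `KafkaConnector` stand-in, the `ConnectorLoader` object, and the `com.example.MissingConnector` class name are all hypothetical. The point is only that a connector class is resolved by name when it is requested, so a connector whose jar is absent fails at request time rather than at build time.

```scala
import scala.util.{Failure, Try}

// Hypothetical connector contract; Gimel's real interfaces may differ.
trait Connector {
  def name: String
}

// Hypothetical stand-in for a connector that would normally ship in its own jar.
class KafkaConnector extends Connector {
  val name: String = "kafka"
}

object ConnectorLoader {
  // Hypothetical mapping from storage type to implementation class name.
  // classOf is used here only to keep the sketch self-contained; in a real
  // build the names would be plain strings (for example from configuration),
  // so the core module needs no compile-time dependency on any connector.
  private val registry: Map[String, String] = Map(
    "kafka" -> classOf[KafkaConnector].getName,
    "elasticsearch" -> "com.example.MissingConnector" // deliberately not on the classpath
  )

  // Resolve and instantiate a connector only when it is requested.
  def load(
      storageType: String,
      loader: ClassLoader = getClass.getClassLoader
  ): Try[Connector] =
    registry.get(storageType) match {
      case None =>
        Failure(new IllegalArgumentException(s"Unknown storage type: $storageType"))
      case Some(className) =>
        Try {
          Class
            .forName(className, true, loader)
            .getDeclaredConstructor()
            .newInstance()
            .asInstanceOf[Connector]
        }
    }
}
```

Under these assumptions, `ConnectorLoader.load("kafka")` yields a `Success`, while `ConnectorLoader.load("elasticsearch")` yields a `Failure` wrapping a `ClassNotFoundException`, because that class is not on the classpath.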

[Docs] Fix Documentation errors in Gimel Quick Start

There are a few steps to correct in the Try-Gimel documentation.

  • Ask the user to set the GIMEL_HOME shell variable.
  • Remove references such as cd .. and always have the user refer to $GIMEL.
  • Change the download location from a personal Google Drive to another hosted repo (bintray).

[CheckStyle] Fix CheckStyle Warnings, Ensure we have 0 Warnings

Currently CheckStyle is set to warning level for several modules, because a lot of code cleanup is needed to adhere to the rules in the checkstyle specification.

Address only the style errors, without adding or changing any logic.

A plugin can be used to auto-fix the checkstyle violations, but this requires post-fix validation to ensure that the plugin has made no inadvertent logic changes.

The other way is to fix each module manually.

Ensure the current code base adheres to checkstyle by running the autofixer on the entire code base.

[Standalone] Add Gimel Standalone Feature to Gimel

The Gimel Standalone feature will let developers and users alike try Gimel on a local machine or laptop without requiring all the ecosystems on a Hadoop cluster.

  • Standalone would comprise docker containers spawned for each storage type that the user would like to explore. Storage type examples: kafka, elasticsearch.
  • Standalone would bootstrap these containers (storage types) with sample flights data.
  • Once the containers are spawned and the data is bootstrapped, the user can refer to the connector docs and try the Gimel Data API / Gimel SQL on the local laptop.
  • In the future, the standalone feature would also be useful for automating regression tests and running standalone Spark JVMs for container-based solutions.

[Test] scalatest maven plugin is missing from the modules

The test cases in the modules are not running because the scalatest maven plugin is missing from the module pom files.

To reproduce:

$ mvn test

Expected behavior:
The DataSetUtilsSpec.scala test suite should run and all test cases in it should pass.

Actual behavior:
DataSetUtilsSpec.scala does NOT run.

[Logging] Refactor Gimel Logger

It is very difficult to unit-test Gimel, because a lot of dependencies, configurations, and options are hard-coded in Gimel source code and it is almost impossible to change them or mock them. Gimel should be refactored so we can unit-test Gimel.
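
One common way to make hard-coded settings replaceable in tests is to pass them in through a small interface. The sketch below only illustrates that idea. `GimelSettings`, `SystemSettings`, `MapSettings`, and `CatalogResolver` are hypothetical names, and the key `gimel.catalog.provider` with its `USER` default is used purely as an example, not as a statement about Gimel's actual configuration.

```scala
// Hypothetical settings interface; all names here are illustrative only.
trait GimelSettings {
  def get(key: String): Option[String]
}

// Production-style implementation backed by JVM system properties.
object SystemSettings extends GimelSettings {
  def get(key: String): Option[String] = sys.props.get(key)
}

// Test double backed by an in-memory map.
final class MapSettings(values: Map[String, String]) extends GimelSettings {
  def get(key: String): Option[String] = values.get(key)
}

// The component receives its settings instead of reading a hard-coded source,
// so a unit test can substitute MapSettings without touching the environment.
final class CatalogResolver(settings: GimelSettings) {
  def provider: String =
    settings.get("gimel.catalog.provider").getOrElse("USER")
}
```

A test can then construct `new CatalogResolver(new MapSettings(Map("gimel.catalog.provider" -> "HIVE")))` and check the result directly, while production code passes `SystemSettings`.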

Restore Lost commits

While trying to amend commit messages, there was an accidental force push to master that caused several commits to be lost. This issue is to restore the lost commits.

[Docs] Refactor Getting Started

Currently the documentation lives in a single file, docs/getting-started.md.

We will move it to the folder docs/getting-started and break the single markdown file down into several files.

[Build] [CI] Enhance the build script

Currently we suggest that everyone use build/gimel to build Gimel.
However, the script has to be enhanced for the reasons below:

  1. build/gimel does not allow the build to be customized through arguments. Say we would like to build a specific module: build/gimel still installs the entire project.
  2. build/gimel also installs the dependencies, but we may need to install the dependencies alone, so this needs to be a separate step.
  3. Error handling is missing in the build script.
  4. We will use the same script to set up Travis CI.

[CI] Add Maven Integration for Artifacts

With UnPayPal & Open Sourcing, we need to start pushing Gimel Artifacts to Maven Central.

Add the necessary hooks and implementations to build and push artifacts to Maven Central.

[GitHub] Auto-close GH Issues when PRs are merged

As a developer, I would like GH Issues that are fixed to automatically close when the referencing PR is merged.

This can be done by adding a set of reserved words to the body of either the PR or the commit message as described here.

I propose updating the PR template to include this so that we ensure all PRs are compliant with this practice.

[Logging] Refactor code that captures and logs method names

The code that captures and logs method names is scattered over many places (40+ files).

    def MethodName: String = new Exception().getStackTrace.apply(1).getMethodName

    logger.info(" @Begin --> " + MethodName)

It is also cryptic, uses bad coding practices, and violates Scala coding conventions.

This code should be refactored into one place, ideally in gimel.logging.Logger.
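
A minimal sketch of what a single, shared helper could look like. `MethodNames` and `FlightsReader` are hypothetical names; the issue proposes gimel.logging.Logger as the eventual home. The sketch keeps the stack-trace technique from the snippet above, but guards against a short stack and follows the lowerCamelCase naming convention.

```scala
// Hypothetical helper object; not part of Gimel's published API.
object MethodNames {
  // Returns the name of the method that called `current()`.
  // Frame 0 of the stack trace is `current` itself; frame 1 is its caller.
  def current(): String =
    new Throwable().getStackTrace
      .drop(1)
      .headOption
      .map(_.getMethodName)
      .getOrElse("<unknown>")
}

// Illustrative caller showing the intended usage.
object FlightsReader {
  def readFlights(): String = {
    val method = MethodNames.current()
    println(" @Begin --> " + method)
    method
  }
}
```

Calling `FlightsReader.readFlights()` logs the caller's own method name. Note that when `current()` is invoked from inside a closure, the JVM reports the compiler-generated name of that closure rather than the enclosing method.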

[CI] Add CI for Gimel Build & PRs

With UnPayPal & Open Sourcing, the CI tied to PayPal Jenkins has been unpinned.

Add a new CI to continue with Builds for Master & Pull Requests.

One option is Travis CI.
