vrajat commented on July 19, 2024

There hasn't been any progress on this feature. IIRC, @jayeshagwan1 got stuck installing a test Hive cluster. @zer0pool, would you be able to help out?

from piicatcher.

vrajat commented on July 19, 2024

jayeshagwan1 commented on July 19, 2024

I am interested in contributing.

vrajat commented on July 19, 2024

Thanks!
Here are some guidelines on how to get started:

Install a developer version of piicatcher

  1. Fork the repo.
  2. Instructions are here: https://tokern.io/docs/piicatcher/development

Hive installation

I am not sure about your tech setup. A web search should turn up plenty of sites with instructions for setting up Hive.

Load data into Hive

I use a couple of simple datasets:

  1. https://github.com/tokern/piicatcher/blob/master/tests/test_databases.py#L19
  2. https://github.com/tokern/piicatcher/blob/master/tests/samples/sample-data.csv
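One way to get a CSV such as sample-data.csv into Hive is a plain-text table plus LOAD DATA. The sketch below only builds the HiveQL strings. The table name, the local path and the all-STRING column types are assumptions for illustration; the column names come from whatever header row the file actually has.

```python
from __future__ import annotations

import csv
import io


def read_header(csv_text: str) -> list[str]:
    """Return the first row of a CSV document as column names."""
    return next(csv.reader(io.StringIO(csv_text)))


def hive_ddl_for_csv(table: str, header: list[str], local_path: str) -> list[str]:
    """Build a CREATE TABLE and a LOAD DATA statement for a comma-separated file."""
    # Every column is typed STRING to keep the sketch simple; refine types as needed.
    columns = ",\n  ".join(f"`{name}` STRING" for name in header)
    create = (
        f"CREATE TABLE IF NOT EXISTS {table} (\n  {columns}\n)\n"
        "ROW FORMAT DELIMITED FIELDS TERMINATED BY ','\n"
        "STORED AS TEXTFILE\n"
        # Tell Hive to ignore the CSV header line when reading the table.
        "TBLPROPERTIES ('skip.header.line.count'='1')"
    )
    # LOCAL means the path is on the machine running the Hive client, not in HDFS.
    load = f"LOAD DATA LOCAL INPATH '{local_path}' INTO TABLE {table}"
    return [create, load]
```

The statements can then be run from beeline or through any DB-API cursor. If fields contain quoted commas, Hive's OpenCSVSerde is the usual alternative to ROW FORMAT DELIMITED.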

Add pyhive

Add pyhive as a requirement in requirements.txt

Rerun pipenv update to install pyhive.
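After pipenv update finishes, a quick check that the client modules are importable can save a round of debugging. This is a minimal sketch using only importlib; the module list is an assumption, since the exact dependencies pyhive pulls in for its Hive interface (for example thrift and thrift_sasl) vary between versions and extras.

```python
import importlib.util


def missing_modules(names):
    """Return the top-level module names from `names` that cannot be found."""
    return [name for name in names if importlib.util.find_spec(name) is None]


# Assumption: modules a pyhive-based Hive client typically needs; adjust per version.
HIVE_CLIENT_MODULES = ["pyhive", "thrift", "thrift_sasl"]

# Example: print(missing_modules(HIVE_CLIENT_MODULES)) prints [] when all are installed.
```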

Write an explorer

Explorer is the base class that each supported technology builds on.
You can use the AWS explorer as an example.

You'll have to:

  1. Create a new Python file, for example hive.py.
  2. Implement a cli function.
  3. Implement a HiveExplorer class similar to AthenaExplorer.
  4. Adapt the code in each function to work with Hive. For example, all the queries have to change. Use pyhive instead of pyathena, and so on.
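The shape of steps 3 and 4 can be sketched as follows. This does not use piicatcher's real classes: Explorer here is a hypothetical minimal base class, and the method names (list_tables_query, describe_query, sample_query, sample) are illustrative, not piicatcher's API. The idea is that query execution lives in the base class and the Hive subclass mostly swaps in HiveQL such as SHOW TABLES, DESCRIBE and SELECT ... LIMIT.

```python
from __future__ import annotations

from abc import ABC, abstractmethod


def quote(identifier: str) -> str:
    """Quote a Hive identifier with backticks, doubling any embedded backtick."""
    return "`" + identifier.replace("`", "``") + "`"


class Explorer(ABC):
    """Hypothetical minimal stand-in for an explorer base class."""

    def __init__(self, cursor):
        # Any DB-API 2.0 cursor (from pyhive, impyla, ...) can be passed in.
        self._cursor = cursor

    @abstractmethod
    def list_tables_query(self, schema: str) -> str: ...

    @abstractmethod
    def sample_query(self, schema: str, table: str, columns: list[str], limit: int) -> str: ...

    def sample(self, schema: str, table: str, columns: list[str], limit: int = 10):
        """Run the backend-specific sampling query and return all rows."""
        self._cursor.execute(self.sample_query(schema, table, columns, limit))
        return self._cursor.fetchall()


class HiveExplorer(Explorer):
    """Sketch of a Hive explorer: only the SQL differs from other backends."""

    def list_tables_query(self, schema: str) -> str:
        return f"SHOW TABLES IN {quote(schema)}"

    def describe_query(self, schema: str, table: str) -> str:
        # DESCRIBE returns one row per column: (col_name, data_type, comment).
        return f"DESCRIBE {quote(schema)}.{quote(table)}"

    def sample_query(self, schema: str, table: str, columns: list[str], limit: int) -> str:
        cols = ", ".join(quote(c) for c in columns)
        return f"SELECT {cols} FROM {quote(schema)}.{quote(table)} LIMIT {int(limit)}"
```

Because the explorer only needs a DB-API cursor, the same class would work with either a pyhive or an impyla connection.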

I can answer any questions while you develop.

jayeshagwan1 commented on July 19, 2024

Thanks, @vrajat. I will follow the steps above and let you know if I run into any issues.

jayeshagwan1 commented on July 19, 2024

I followed the steps above. Running the command piicatcher --config hiveconfig.ini hive gives the error below:

[screenshot]

It seems to be an issue with installing pyhive on Windows.

vrajat commented on July 19, 2024

jayeshagwan1 commented on July 19, 2024

[screenshot]

vrajat commented on July 19, 2024

jayeshagwan1 commented on July 19, 2024

Thanks. Now I am facing this error:

[screenshot]

jayeshagwan1 commented on July 19, 2024

https://community.cloudera.com/t5/Support-Questions/pyhive-connection-error-thrift-transport-TTransport/td-p/206372

vrajat commented on July 19, 2024

jayeshagwan1 commented on July 19, 2024

Hive2

vrajat commented on July 19, 2024

jayeshagwan1 commented on July 19, 2024

Yes.

vrajat commented on July 19, 2024

jayeshagwan1 commented on July 19, 2024

I am now able to connect to HiveServer2, but I get the error below:

raise ValueError("Password should be set if and only if in LDAP or CUSTOM mode; "
ValueError: Password should be set if and only if in LDAP or CUSTOM mode; Remove password or use one of those modes

Currently I am passing auth='NOSASL' in the connection. If I pass auth='Custom' or auth='none' instead, I get this error:

[screenshot]
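The ValueError above comes from a consistency check in pyhive: a password is accepted only when auth is LDAP or CUSTOM, so passing one together with auth='NOSASL' fails. The helper below is a hedged sketch that mirrors that rule before pyhive is ever called; pyhive_connect_kwargs and open_hive_connection are hypothetical names, and the mode list reflects pyhive's source as I understand it, so it may differ between versions. It also upper-cases auth, on the assumption that pyhive compares the string literally and would not recognize a value such as 'Custom'.

```python
from __future__ import annotations

# Auth modes pyhive's hive.Connection accepts (verify against your pyhive version).
PYHIVE_AUTH_MODES = {"NONE", "NOSASL", "KERBEROS", "LDAP", "CUSTOM"}
# The only modes in which pyhive allows, and requires, a password.
PASSWORD_MODES = {"LDAP", "CUSTOM"}


def pyhive_connect_kwargs(host: str, port: int = 10000, username: str | None = None,
                          password: str | None = None, auth: str = "NONE") -> dict:
    """Hypothetical helper: validate settings the way pyhive does, then build kwargs."""
    auth = auth.upper()  # assumption: pyhive compares auth strings literally
    if auth not in PYHIVE_AUTH_MODES:
        raise ValueError(f"unsupported auth mode: {auth}")
    if (password is not None) != (auth in PASSWORD_MODES):
        raise ValueError("password must be set if and only if auth is LDAP or CUSTOM")
    kwargs = {"host": host, "port": port, "username": username, "auth": auth}
    if password is not None:
        kwargs["password"] = password
    return kwargs


def open_hive_connection(**kwargs):
    """Open a pyhive connection; pyhive is imported only when this is called."""
    from pyhive import hive
    return hive.Connection(**kwargs)


# Example (needs pyhive and a reachable HiveServer2):
# conn = open_hive_connection(**pyhive_connect_kwargs("localhost", username="hive", auth="NOSASL"))
```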

vrajat commented on July 19, 2024

Can you confirm whether you get these errors when you connect to Hive from a Python console, with no PIICatcher involved?

Can you confirm that you can connect to Hive and run queries from a Python console?
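A connection check that involves no PIICatcher code could look like this minimal sketch. smoke_test is a hypothetical helper that works with any DB-API 2.0 connection; the pyhive lines in the comments assume a HiveServer2 on localhost:10000 with NONE authentication and a user named hive.

```python
def smoke_test(connection, query: str = "SHOW DATABASES") -> list:
    """Hypothetical helper: run one query on a DB-API 2.0 connection, return its rows."""
    cursor = connection.cursor()
    try:
        cursor.execute(query)
        return cursor.fetchall()
    finally:
        cursor.close()  # always release the cursor, even if the query fails


# Example from a Python console (assumes pyhive and HiveServer2 on localhost:10000):
# from pyhive import hive
# conn = hive.Connection(host="localhost", port=10000, username="hive", auth="NONE")
# print(smoke_test(conn))
```

If this prints a list of databases, the client side works and the problem is in the PIICatcher integration; if it fails, the issue is in the Python/Hive setup.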

jayeshagwan1 commented on July 19, 2024

Sure, I will confirm. I think there are similar open issues in pyhive as well. Do we have an alternative to pyhive?

vrajat commented on July 19, 2024

  1. https://github.com/cloudera/impyla
  2. https://dwgeek.com/steps-to-connect-hiveserver2-from-python-using-hive-jdbc-drivers.html/

Option 1 (impyla) is probably the better choice.
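Switching to impyla mostly means translating connection settings. In this sketch, impyla_connect_kwargs and open_impyla_connection are hypothetical helpers, and the mapping from pyhive auth values to impyla's auth_mechanism is an assumption to verify against the server's hive.server2.authentication setting; in particular, HiveServer2's NONE mode is usually reached with PLAIN in impyla. impyla's connect defaults to Impala's port 21050, so the HiveServer2 port is passed explicitly.

```python
from __future__ import annotations

# Assumed mapping from pyhive `auth` values to impyla `auth_mechanism` values.
PYHIVE_TO_IMPYLA_AUTH = {
    "NONE": "PLAIN",  # HiveServer2 "NONE" auth still uses SASL PLAIN on the wire
    "NOSASL": "NOSASL",
    "LDAP": "LDAP",
    "KERBEROS": "GSSAPI",
}


def impyla_connect_kwargs(host: str, port: int = 10000, user: str | None = None,
                          password: str | None = None, pyhive_auth: str = "NONE") -> dict:
    """Hypothetical helper: translate pyhive-style settings for impala.dbapi.connect."""
    key = pyhive_auth.upper()
    if key not in PYHIVE_TO_IMPYLA_AUTH:
        raise ValueError(f"no impyla mapping assumed for auth mode: {key}")
    # impyla's default port is Impala's 21050, so pass the HiveServer2 port explicitly.
    return {"host": host, "port": port, "auth_mechanism": PYHIVE_TO_IMPYLA_AUTH[key],
            "user": user, "password": password}


def open_impyla_connection(**kwargs):
    """Open an impyla connection; impyla is imported only when this is called."""
    from impala.dbapi import connect
    return connect(**kwargs)
```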

jayeshagwan1 commented on July 19, 2024

There is some issue with pyhive. I tried it from a Python console and still get the same error:

[screenshot]

jayeshagwan1 commented on July 19, 2024

Is it OS-specific? I haven't tried Linux or Ubuntu yet.

vrajat commented on July 19, 2024

I am not sure. I've used it on CentOS and it worked, but that was with a specific Hive configuration. The OS or the Python/Hive configuration could be the problem. I don't know how to help remotely without knowing more about the setup.

vrajat commented on July 19, 2024

Can you try impyla?

jayeshagwan1 commented on July 19, 2024

Does this use Impala?

jayeshagwan1 commented on July 19, 2024

I am trying on CentOS, but I get this error:

[Errno 14] problem making ssl connection
Trying other mirror.
Error: Cannot retrieve repository metadata (repomd.xml) for repository: bintray--sbt-rpm. Please verify its path and try again

So I could not install anything. I tried a couple of fixes for the SSL problem, but none of them worked.

vrajat commented on July 19, 2024

vrajat commented on July 19, 2024

FWIW, Superset uses pyhive: https://github.com/apache/incubator-superset/blob/master/superset/db_engine_specs/hive.py#L71

There are also Hive-related issues there, but in general it works. I still think something about your installation is not compatible with pyhive.
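Superset reaches Hive through the SQLAlchemy dialect that PyHive registers under the hive:// scheme, so one way to reproduce its setup is to build the same kind of URL. A minimal sketch; hive_sqlalchemy_url is a hypothetical helper, and the host, port, database and user values are placeholders.

```python
from __future__ import annotations

from urllib.parse import quote


def hive_sqlalchemy_url(host: str, port: int = 10000, database: str = "default",
                        username: str | None = None) -> str:
    """Build a hive:// SQLAlchemy URL of the kind PyHive's dialect understands."""
    # Percent-encode the user name so characters such as '@' cannot break the URL.
    user_part = f"{quote(username, safe='')}@" if username else ""
    return f"hive://{user_part}{host}:{port}/{database}"


# Example (assumes SQLAlchemy and PyHive are installed):
# from sqlalchemy import create_engine
# engine = create_engine(hive_sqlalchemy_url("localhost", username="hive"))
```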

jayeshagwan1 commented on July 19, 2024

I will start working on Hive next week and keep you posted.

zer0pool commented on July 19, 2024

@jayeshagwan1 hello. I am wondering how this implementation is going. It would be great if this feature could be added soon.

vrajat commented on July 19, 2024

Closing this as there is not much demand for Hive. There is more interest in Redshift, Snowflake and Trino.
