
datahub-gma's People

Contributors

alyiwang, arunvasudevan, camelliazhang, clojurians-org, cptran777, czbernard, dependabot[bot], ericsun2, igbopie, jerrybai2009, jiaomawhu, jphui, jsdonn, jywadhwani, kaliang1, liangjun-jiang, mars-lan, ramanbalagan, realchrisl, schuangv2, shakti-garg-saxo, shpark76, shridharsattur, sunzhaonan, theseyi, tsukaby, yangyangv2, ybz1013, zhixuanjia, ziveo

datahub-gma's Issues

feat: Move more code from linkedin/datahub

In the medium to long term, we'd like to move more code over from linkedin/datahub than was included in this initial move. This includes the restli DAO and jobs (or at least very easy-to-use libraries for the jobs).

Sync get, getAll, batchGet, and search getInternal behavior

Is your feature request related to a problem? Please describe.

PR#132 changes the intended behavior of get() when the aspectNames parameter is empty: instead of throwing a 404, it returns a VALUE with no aspects. To accomplish this, the getInternalNonEmpty call was changed to getInternal. We need to evaluate the getAll(), batchGet(), and search() methods to make sure they are aligned with get()'s new behavior.

Describe the solution you'd like

Double-check the calls to getInternalNonEmpty and update them if needed to match the new get() behavior.

doc: Create roadmap

Create a roadmap. Ideally we phrase this as the roadmap to a 1.0 (more stable) release.

doc: Clean up docs after move

We didn't just split up our code; we also split up our documentation, and it needs more cleanup than I gave it initially. For now, users should probably read the complete, untouched documentation in linkedin/datahub.

Enforce code + javadoc formatting

Spotless can help here.

Javadoc formatting also includes ensuring that (A) the Javadoc is valid (HTML) and (B) its references are valid.

build issues: `:dao-impl:ebean-dao:test` failing in time zones ahead of UTC.

Describe the bug

docs/developers.md says ./gradlew build should work, but it no longer does out of the box. We were previously on ea281ea and had no problems then.
With the current master (ff9a36b), I found I had to:

  • install libncurses5 (I found this in .github/workflows/build-and-test.yml); maybe this could simply be added to the docs?
  • run TZ=UTC ./gradlew build to make :dao-impl:ebean-dao:test pass; otherwise I got failures like:
12:01:29.310 [DEBUG] [TestEventLogger] Gradle suite > Gradle test > com.linkedin.metadata.dao.localrelationship.EbeanLocalRelationshipWriterDAOTest.testAddRelationshipWithRemoveAllEdgesToDestination FAILED
12:01:29.311 [DEBUG] [TestEventLogger]     javax.persistence.PersistenceException: Data truncation: Incorrect datetime value: '1970-01-01 00:00:01' for column 'lastmodifiedon' at row 1

I think this is 1970-01-01 00:00:01 in my time zone (GMT+1), and hence before 1970 in UTC. I thought the latest commits might have fixed it, since they mention lastmodifiedon, but the issue is still there (tested with a8fb6c9 and ff9a36b).
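To illustrate the hypothesis above, the sketch below interprets the wall-clock value from the error, 1970-01-01 00:00:01, in UTC and in GMT+1, and checks the result against the lower bound of a MySQL TIMESTAMP column (1970-01-01 00:00:01 UTC). It assumes the test database is MySQL-compatible and that lastmodifiedon is a TIMESTAMP column; EpochBoundaryDemo and its methods are hypothetical names, not part of datahub-gma.

```java
import java.time.LocalDateTime;
import java.time.ZoneId;

final class EpochBoundaryDemo {
  private EpochBoundaryDemo() {}

  // The wall-clock value from the failing insert.
  static final LocalDateTime WALL_CLOCK = LocalDateTime.of(1970, 1, 1, 0, 0, 1);

  // Seconds since the Unix epoch for WALL_CLOCK when interpreted in the given zone.
  static long epochSecondIn(String zoneId) {
    return WALL_CLOCK.atZone(ZoneId.of(zoneId)).toEpochSecond();
  }

  // Assumption: MySQL TIMESTAMP columns accept values from 1970-01-01 00:00:01 UTC,
  // i.e. epoch second 1 or later.
  static boolean fitsMySqlTimestamp(long epochSecond) {
    return epochSecond >= 1;
  }

  public static void main(String[] args) {
    long utc = epochSecondIn("UTC");        // 1
    long gmtPlus1 = epochSecondIn("GMT+1"); // -3599, i.e. 1969-12-31 23:00:01 UTC
    System.out.println(utc + " " + fitsMySqlTimestamp(utc));           // 1 true
    System.out.println(gmtPlus1 + " " + fitsMySqlTimestamp(gmtPlus1)); // -3599 false
  }
}
```

If this is indeed the cause, pinning the JVM default time zone for the test task (for example with -Duser.timezone=UTC, the JVM-level counterpart of TZ=UTC) would be one possible way to make the tests independent of the developer's time zone.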

Expected behavior

Tests should pass regardless of the time zone.

Desktop

Ubuntu 22.04

Support "CONTAIN" filter condition

Describe the bug

Currently, the "CONTAIN" filter condition is not supported in search, because getQueryBuilderFromCriterion in the SearchUtils class does not handle it and throws an UnsupportedOperationException:

```java
  public static QueryBuilder getQueryBuilderFromCriterion(@Nonnull Criterion criterion) {
    final Condition condition = criterion.getCondition();
    if (condition == Condition.EQUAL) {
      if (criterion.getValue().startsWith("urn:li:")) {
        return QueryBuilders.termsQuery(criterion.getField(), criterion.getValue().trim());
      }
      return QueryBuilders.termsQuery(criterion.getField(), criterion.getValue().trim().split("\\s*,\\s*"));
    } else if (condition == Condition.GREATER_THAN) {
      return QueryBuilders.rangeQuery(criterion.getField()).gt(criterion.getValue().trim());
    } else if (condition == Condition.GREATER_THAN_OR_EQUAL_TO) {
      return QueryBuilders.rangeQuery(criterion.getField()).gte(criterion.getValue().trim());
    } else if (condition == Condition.LESS_THAN) {
      return QueryBuilders.rangeQuery(criterion.getField()).lt(criterion.getValue().trim());
    } else if (condition == Condition.LESS_THAN_OR_EQUAL_TO) {
      return QueryBuilders.rangeQuery(criterion.getField()).lte(criterion.getValue().trim());
    }

    // CONTAIN (and any other unhandled condition) falls through to here.
    throw new UnsupportedOperationException("Unsupported condition: " + condition);
  }
```

This happens even though CONTAIN exists in the filter Condition enum.


#### To Reproduce

Steps to reproduce the behavior:

1. Deploy DataHub
2. Issue a search query with a "filter" criterion whose condition is "CONTAIN". The server returns an error.

![image](https://user-images.githubusercontent.com/17549204/125294210-efdd4080-e2d8-11eb-8a0d-143f87c93453.png)


#### Expected behavior

The CONTAIN operator should match substrings of string fields.

As reported by Lal Rishav at Saxo Bank!
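One possible fix, assuming the filtered fields are indexed as non-analyzed keyword strings, is to map CONTAIN to an Elasticsearch wildcard query whose pattern wraps the value in *. The sketch below only builds that pattern; ContainWildcard and containsPattern are hypothetical names, and the escaping assumes Elasticsearch wildcard syntax, in which * and ? are metacharacters and a backslash escapes them. Inside getQueryBuilderFromCriterion, the result could then be passed to QueryBuilders.wildcardQuery(criterion.getField(), pattern) in a new CONTAIN branch.

```java
final class ContainWildcard {
  private ContainWildcard() {}

  // Escapes the wildcard metacharacters (\, * and ?) so they are matched literally.
  static String escapeWildcard(String value) {
    StringBuilder sb = new StringBuilder(value.length());
    for (char c : value.toCharArray()) {
      if (c == '\\' || c == '*' || c == '?') {
        sb.append('\\');
      }
      sb.append(c);
    }
    return sb.toString();
  }

  // Builds a "*value*" pattern that matches any term containing the trimmed value.
  static String containsPattern(String rawValue) {
    return "*" + escapeWildcard(rawValue.trim()) + "*";
  }

  public static void main(String[] args) {
    System.out.println(containsPattern(" foo "));  // *foo*
    System.out.println(containsPattern("a*b?c"));  // *a\*b\?c*
  }
}
```

Note that a leading wildcard forces Elasticsearch to scan many terms, so this can be slow on large indices; an ngram-analyzed subfield is a common alternative when substring search is frequent.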


Cannot build datahub-gma

I can't build GMA on a Mac with an M1 chip because of this error: dyld: Library not loaded: /usr/local/opt/openssl/lib/libssl.1.0.0.dylib

To Reproduce

Steps to reproduce the behavior:

  1. Use a Mac with an M1 chip.
  2. Run ./gradlew build.

Expected behavior

The build should succeed.

  • OS: macOS Ventura 13.3.1
  • Chip version: Apple M1 Pro

Additional context

I tried to install openssl@1.0 myself, per the recommendation in #220.
However, I ran into a series of errors; here are the related posts:

  1. rbenv/homebrew-tap#4
  2. rbenv/homebrew-tap#2
  3. sidneys/homebrew-homebrew#2

I was not able to install this version. Can we switch to another OpenSSL version (I can install 1.1 successfully), or can someone help me install 1.0? Many thanks.

feat: Stop "pushing" code

Now that this code lives in this git repo and we have published jars, we should be able to stop pushing code from internal and make this repo the source of truth.

We first need to catch up (we're quite a way behind), and then we can switch.

[search][filter] Do not use comma as a delimiter to specify multiple criteria

Currently, the search/filter methods in ESSearchDAO let a key take multiple values by using a comma as the delimiter. This makes it impossible to specify values that themselves contain a comma.

Additional context

Places in the code that split the value provided in the Criterion model on commas include:

```java
return QueryBuilders.termsQuery(criterion.getField(), criterion.getValue().trim().split("\\s*,\\s*"));

Arrays.stream(criterion.getValue().trim().split("\\s*,\\s*"))
```
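To make the limitation concrete, the sketch below reproduces the current split (the same regex as above) and contrasts it with one hypothetical alternative in which a backslash-escaped comma is kept as a literal character. CriterionValueSplit and splitWithEscapes are illustrative names only, and the escape convention is an assumption, not an agreed design.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class CriterionValueSplit {
  private CriterionValueSplit() {}

  // Mirrors the current behavior: split on commas, dropping surrounding whitespace.
  static List<String> currentSplit(String value) {
    return Arrays.asList(value.trim().split("\\s*,\\s*"));
  }

  // Hypothetical alternative: "\," is kept as a literal comma,
  // while an unescaped comma still separates values.
  static List<String> splitWithEscapes(String value) {
    List<String> parts = new ArrayList<>();
    StringBuilder current = new StringBuilder();
    String s = value.trim();
    for (int i = 0; i < s.length(); i++) {
      char c = s.charAt(i);
      if (c == '\\' && i + 1 < s.length() && s.charAt(i + 1) == ',') {
        current.append(',');
        i++; // skip the escaped comma
      } else if (c == ',') {
        parts.add(current.toString().trim());
        current.setLength(0);
      } else {
        current.append(c);
      }
    }
    parts.add(current.toString().trim());
    return parts;
  }

  public static void main(String[] args) {
    // "Smith, John" is meant as one value but is split into two.
    System.out.println(currentSplit("Smith, John").size());            // 2
    System.out.println(splitWithEscapes("Smith\\, John, Doe").get(0)); // Smith, John
  }
}
```

An escape convention keeps existing comma-separated filters working; the other direction the issue title points to is to stop overloading a single string altogether, for example by letting the Criterion carry a list of values.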

Elasticsearch Integration Tests

We should add integration tests for Elasticsearch, or at least a framework for it in GMA so it is easy to write integration tests in DataHub.

Enable WError

We should promote warnings to errors and clean up the resulting errors. A few places in our code are a bit sloppy because these warnings were never enforced (e.g., rawtypes and unchecked warnings).
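As an illustration of the warnings in question, the sketch below contrasts raw-type code, which javac flags under -Xlint:rawtypes,unchecked (and which would fail the build once -Werror is added), with its warning-free generic equivalent. RawTypesExample is a hypothetical class; in a Gradle build the flags would typically be added through the JavaCompile task's options.compilerArgs.

```java
import java.util.ArrayList;
import java.util.List;

final class RawTypesExample {
  private RawTypesExample() {}

  // Compiles, but -Xlint:rawtypes warns about the raw List/ArrayList,
  // and -Xlint:unchecked warns about the unchecked call to add().
  static int sloppyCount() {
    List names = new ArrayList();
    names.add("a");
    return names.size();
  }

  // Parameterized version: no rawtypes or unchecked warnings.
  static int cleanCount() {
    List<String> names = new ArrayList<>();
    names.add("a");
    return names.size();
  }

  public static void main(String[] args) {
    System.out.println(sloppyCount() + " " + cleanCount()); // 1 1
  }
}
```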

Why does BaseQueryDAO expose a "raw graph query statement"?

BaseQueryDAO exposes four methods that take a Statement argument, referred to as a "raw graph query statement", for example:

```java
/**
 * Finds a list of entities of a specific type using a raw graph query statement.
 *
 * @param entityClass the entity class to query
 * @param queryStatement a {@link Statement} with query text and parameters
 * @param <ENTITY> returned entity type. Must be a type defined in com.linkedin.metadata.entity.
 * @return a list of entities from the outcome of the query statement
 */
@Nonnull
public abstract <ENTITY extends RecordTemplate> List<ENTITY> findEntities(@Nonnull Class<ENTITY> entityClass,
    @Nonnull Statement queryStatement);
```

What query language is expected here? Does it depend on the actual BaseQueryDAO implementation?

If that is the case, any code that uses these Statement-based methods becomes implementation-specific, which contradicts the purpose of the DAO abstraction.

Add support for Gremlin

Is your feature request related to a problem? Please describe.

We are considering using the upstream DataHub project. Our team is an AWS shop and would like to take advantage of AWS-hosted solutions such as Neptune whenever possible. It would be great to add support for Gremlin (one of the query interfaces Neptune implements) so that we can easily host DataHub's graph database.

Describe the solution you'd like

Implement BaseGraphWriterDAO and BaseQueryDAO for Gremlin-based graph data stores.

Describe alternatives you've considered

The alternatives would be finding a third-party Neo4j SaaS provider or hosting our own database cluster within AWS. We would prefer to avoid both if possible, for cost and business reasons.
