ldbc_finbench_docs's People

Contributors

bingtong0, qingfeng14, qishipengqsp, szarnyasg

ldbc_finbench_docs's Issues

[Audit] specify the validation rules

Three possible modes for validating the results:

  • run the ACID test, result validation, and throughput test separately
  • run result validation along with the ACID test
  • run benchmarking (throughput measurement) along with the ACID test and result validation at the same time.

As for how to validate the results, there are two possible approaches:

  • self-validation by the driver
  • cross-validation
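
As a rough illustration of the cross-validation option, the sketch below compares the results that two systems return for the same queries. The function names, the query ids, and the choice to compare rows as an order-insensitive multiset are all assumptions made for this example; queries with a defined sort order would need an ordered comparison instead.

```python
from collections import Counter


def canonical(rows):
    """Turn a list of result rows into an order-insensitive multiset."""
    return Counter(tuple(row) for row in rows)


def cross_validate(results_a, results_b):
    """Return the query ids whose results differ between two systems.

    results_a / results_b: dict mapping a query id to a list of rows.
    A query that is missing on one side counts as a mismatch.
    """
    mismatches = []
    for query_id in sorted(set(results_a) | set(results_b)):
        rows_a = results_a.get(query_id)
        rows_b = results_b.get(query_id)
        if rows_a is None or rows_b is None:
            mismatches.append(query_id)
        elif canonical(rows_a) != canonical(rows_b):
            mismatches.append(query_id)
    return mismatches


# Hypothetical results from two systems under test.
system_a = {"SR1": [(1, "x")], "SR2": [(2, 10), (3, 20)]}
system_b = {"SR1": [(1, "x")], "SR2": [(3, 20), (2, 11)]}
print(cross_validate(system_a, system_b))
```

Self-validation by the driver could reuse the same comparison, with one side replaced by a stored set of expected results.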

Simple reads: some questions

In the implementation phase:

simple read 4
result: there need to be two results, numEdges and sumAmount

simple read 5
result: the description includes a result for the transfer amount per day

Complex and ReadWrite

Complex 8
Is the distance from srcId to dstId the shortest?

Complex 11
How is the final share computed? The description does not make this clear.

Read Write 2
Is the abort condition for the transaction that both vertices match the fast-in and fast-out pattern?

Read Write 3
Does p1.id come from srcId or dstId, or from both?

Write Params Format

Write
It would be better to keep a single format such as personId; the other format in use is Person.personId.

TCR 10 should be moved to Simple Reads

Move TCR 10 to Simple Reads after solving the problem that the input of Simple Reads comes from the output of Complex Reads.

However, this pattern might be different in the next version based on new query profiling.

Simple Read

simple read 1:
The description says account or person; I think person should be removed.
The type in the result should also be removed.

simple read2:
The query takes only one account vertex as input. It would be better to rename id1 to id.
The pattern contains COUNT(edge1), but the result doesn't contain numEdge1. It would be better to be consistent.

simple read3:
It would be better to specify the precision of blockRatio.

simple read6:
The type of result could be ID rather than [ID].

simple read7:
Same as read6. The type of result could be ID rather than [ID].
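
To make the ID versus [ID] suggestion concrete, here is a minimal sketch that turns one row holding a collected list of IDs into one row per ID. The helper name `flatten_ids` and the column name `dstAccountId` are assumptions for illustration only, not part of the specification.

```python
def flatten_ids(rows, id_column):
    """Expand rows whose id_column holds a list into one row per single ID."""
    flat = []
    for row in rows:
        for single_id in row[id_column]:
            new_row = dict(row)          # copy the other columns unchanged
            new_row[id_column] = single_id
            flat.append(new_row)
    return flat


# One row with a collected list ([ID]) becomes three rows with a scalar (ID).
collected = [{"dstAccountId": [11, 12, 13]}]
print(flatten_ids(collected, "dstAccountId"))
```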

[Spec Writing] section 1.3 describing the differences should be clearer

Some comments from Mingxi Wu @ TigerGraph,

A couple of comments on FinBench, the LDBC Financial Benchmark
(version 0.0.1-SNAPSHOT)

  • 1.3 Differences between FinBench and SNB
    This section seems to mix graph schema and query characteristics when it tries to differentiate FinBench from SNB.
    I would suggest separating the differences into schema shape differences and query shape differences.

  • schema shape differences
    a. it supports multiple edges.
    b. dynamic attributes to mark entities (e.g., an account is marked as blocked)
    c. quantity attribute + dynamic attribute on edges (e.g., a transfer edge has the quantity attribute amount and the dynamic attribute timestamp)

  • query shape differences
    a. variable-length paths that are qualified by the sum of the edge quantity attributes.
    b. path qualification based on aggregating the path quantity attributes, either along one path or over a set of paths.
    c. ....

With the above taxonomy, we can clearly differentiate the focus of each benchmark. It will also guide us in finding different benchmark metrics and choke points.
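
The schema shape and query shape points above can be sketched in a few lines. This is only an illustration under assumptions: the class and field names (`Account`, `Transfer`, `TransferGraph`, `is_blocked`, `path_amount`) are made up for the example and are not taken from the FinBench schema.

```python
from dataclasses import dataclass, field


@dataclass
class Account:
    account_id: int
    is_blocked: bool = False  # dynamic attribute that marks the entity


@dataclass(frozen=True)
class Transfer:
    src: int
    dst: int
    amount: float   # quantity attribute on the edge
    timestamp: int  # dynamic attribute on the edge


@dataclass
class TransferGraph:
    accounts: dict = field(default_factory=dict)
    transfers: list = field(default_factory=list)  # a list permits multiple edges

    def add_transfer(self, src, dst, amount, timestamp):
        for account_id in (src, dst):
            self.accounts.setdefault(account_id, Account(account_id))
        self.transfers.append(Transfer(src, dst, amount, timestamp))

    def edges_between(self, src, dst):
        return [t for t in self.transfers if t.src == src and t.dst == dst]


def path_amount(path_edges):
    """Aggregate the quantity attribute along one path."""
    return sum(edge.amount for edge in path_edges)


graph = TransferGraph()
graph.add_transfer(1, 2, 100.0, 1000)
graph.add_transfer(1, 2, 50.0, 1005)   # second edge between the same pair
graph.add_transfer(2, 3, 120.0, 1010)
graph.accounts[3].is_blocked = True

path = [graph.edges_between(1, 2)[0], graph.edges_between(2, 3)[0]]
print(len(graph.edges_between(1, 2)), path_amount(path))
```

A path qualification in the sense of item a. would then be a predicate over `path_amount`, for example keeping only paths whose total exceeds some limit.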

Section 1.3 in the initial draft was written for the proposal, so it describes the differences in both data and workload only in general terms.

Besides section 1.3, we should rewrite section 1, the introduction section, further.

Writes: some questions

In the implementation phase:

Write 4
Is the type useful in the pattern?

Write 8
What does one-off mean?

Write 9
Maybe the description and params should say deposit, not repay, but then it would be the same as Write 8.

Write 11
The person vertex doesn't have an isBlock property.

[Format] Doc has some format or type errors.

Complex-read3 has a format error in params.
image
Please remove the special symbol.

In Complex-read8, the word "ratio" is in a different font from the other words.
image
Complex-read9
image
The result doesn't match the query result description.

Complex-read12
image
The pattern description doesn't match the result.

Refine the returned result

Result structure

Consider the exact returned result structure,

  • list of tuples
  • tuple of lists
  • nested JSON

E.g.:

  • CR 8
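
The three candidate structures can be compared on a small example. The column names `dstId` and `ratio` and the sample values are hypothetical placeholders, not the actual CR 8 result definition.

```python
import json

# Hypothetical column names and rows, used only to compare the shapes.
columns = ["dstId", "ratio"]
list_of_tuples = [(101, 0.5), (102, 0.25)]


def to_tuple_of_lists(rows):
    """Transpose row-oriented results into one list per column."""
    return tuple(list(column) for column in zip(*rows))


def to_nested_json(rows, names):
    """Serialize rows as a JSON array of objects keyed by column name."""
    return json.dumps([dict(zip(names, row)) for row in rows])


print(list_of_tuples)
print(to_tuple_of_lists(list_of_tuples))
print(to_nested_json(list_of_tuples, columns))
```

The list of tuples keeps each result row together, the tuple of lists keeps each column together, and the nested JSON form carries the column names with every row.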

Query template

  • add the group-by to the result, following the group-by description in SNB, e.g., CR 4, 6, 8, 13
  • add a new row for the sort order of the result.

Simple-read and write queries: some questions

SimpleRead1

result: properties id todo

SimpleRead3

result: The result does not need to be a set, e.g., accounts.id([ID]) -> accountId(ID)

SimpleRead4

params: add startTime,endTime

SimpleRead6

result: The result does not need to be a set, e.g., companies.id([ID]) -> companyId(ID)

SimpleRead7

result: The result does not need to be a set, e.g., COLLECT(DISTINCT dstAccount.id)([ID]) -> dstAccountId(ID)

write3

params: add srcId,dstId

params: "amount64-bit Integer" has a formatting problem

write5

params: add currentTime

write6

params: add currentTime

write8

params: amt -> amount; the full spelling is more consistent with the other queries

write9

params: amt -> amount; the full spelling is more consistent with the other queries

write11:

params: accountId -> mediumId

write12:

params:

  1. personId1
  2. personId2
  3. currentTime

write14:

pattern: it also has an apply edge

ReadWrite: some questions

Read Write 1
Does edge1 include historical edges?
The params do not provide attributes for the transfer edge.

Read Write 2
It would be better to add a description of fast-in and fast-out, though it can be found in ComplexRead7.
The params do not provide attributes.

Read Write 3
The person vertex doesn't have an isBlock property.
The params do not provide attributes.

[Schema] Guarantee between Company entities?

Some good points from @rickatultipa,

The schema looks good in the initial draft, and a few tweaks may help expand the schema to cover broader-spectrum fintech scenarios:

  1. If Person entities can have "guarantee" relationships, Company entities also have them in corporate guaranteed loan scenarios.
  2. As you indicated, Loan is a special kind of Account; for loan applications, Medium signin relationships are also tracked.
  3. Account seems to be very general in the initial draft; we might want to consider 2 special kinds of accounts -- ATMs and POSes, as these 2 are frequently encountered in all card transaction scenarios.
  4. Lastly, we might want to clarify the specification of all 10 types of relationships, particularly the attributes assigned to each type of relationship.
    Hope my inputs make sense, cheers.
    Ricky

[Query] thresholds and time sequencing in read query #8

  1. The threshold for edge1 (transfer) and the threshold for edge2 (withdraw) should not be the same.
  2. The timing of transfer and withdrawal is not significant as long as they are within the range of the inquiry time window.
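
A minimal sketch of the two points above, assuming edges are plain dictionaries with `amount` and `timestamp` keys, a strict "greater than" threshold, and an inclusive time window. None of these choices come from the specification. Each edge type gets its own threshold, and no ordering between a transfer and a withdrawal is checked.

```python
def qualifying_edges(edges, threshold, start_time, end_time):
    """Keep edges above the threshold whose timestamp lies inside the window."""
    return [
        edge
        for edge in edges
        if edge["amount"] > threshold and start_time <= edge["timestamp"] <= end_time
    ]


# Hypothetical sample data; the withdrawal at t=10 precedes the transfer at t=20.
transfers = [{"amount": 500, "timestamp": 20}, {"amount": 50, "timestamp": 12}]
withdrawals = [{"amount": 80, "timestamp": 10}, {"amount": 90, "timestamp": 99}]

# Separate thresholds per edge type, one shared time window.
kept_transfers = qualifying_edges(transfers, threshold=100, start_time=0, end_time=30)
kept_withdrawals = qualifying_edges(withdrawals, threshold=60, start_time=0, end_time=30)
print(kept_transfers, kept_withdrawals)
```

Both the transfer and the earlier withdrawal are kept, because only membership in the window matters here, not which one happened first.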

Complex-read queries: some questions

Complex-read 4
result: add otherAccount
image

Complex-read 5
question:
1. If there is a path a->b->c->d, do we also need to output a->b->c?
2. If there is a path a->b->c->b->d, is that path valid?

Complex-read 6
result: add mid
image

Complex-read 7
question: If count(e2)==0, what should the result be?
image

Complex-read 8
question: There may be many ratio1/ratio2/ratio3 values; how do we distinguish them?

Complex-read 9
params: add id
question: I think there will be many pairs of edge1 and edge2; how do we choose among them?

Complex-read 10
params: add start_time, end_time, truncation_limit, truncation_order

Complex-read 11
question: I don't understand how the final share is calculated.

Complex-read 12
title: Cycle -> Chain?
params: add start_time, end_time, truncation_limit, truncation_order
result: add Loan
pattern:
1. Do we want to consider the time of the apply relationship?
2. How do we handle termination if a cycle is found in the chain, e.g., p1->p2->p3->p4->p2?
image

Complex-read 13
result: add Company
image
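
The path questions raised for Complex-read 5 and Complex-read 12 depend on whether a vertex may repeat on a path. The sketch below shows one possible reading, not the specification's answer: it enumerates only simple paths (no repeated vertex), which excludes a->b->c->b->d and also guarantees termination on a cycle such as p1->p2->p3->p4->p2. The function name, the adjacency-dict representation, and the choice to report every prefix path are assumptions.

```python
def simple_paths(adjacency, source, max_hops):
    """Enumerate paths from source with at most max_hops edges and no repeated vertex."""
    paths = []

    def extend(path):
        if len(path) > 1:
            paths.append(list(path))      # report every prefix path as well
        if len(path) - 1 == max_hops:
            return
        for neighbor in adjacency.get(path[-1], []):
            if neighbor not in path:      # skip vertices already on this path
                path.append(neighbor)
                extend(path)
                path.pop()

    extend([source])
    return paths


# a->b->c->b->d is never produced because b would repeat.
adjacency = {"a": ["b"], "b": ["c", "d"], "c": ["b"], "d": []}
print(simple_paths(adjacency, "a", 3))

# The chain with a cycle back to p2 stops at p4.
chain = {"p1": ["p2"], "p2": ["p3"], "p3": ["p4"], "p4": ["p2"]}
print(simple_paths(chain, "p1", 10))
```

If only maximal paths should be returned, the prefixes could be filtered out afterwards; which behavior is wanted is exactly what the questions above ask the specification to state.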

[Raw comments] Concerning data distribution and AP workload

Comments from @rickatultipa

After reviewing the slides of the last data schema discussion meeting, @rickatultipa made some meaningful comments:

Hi Shipeng:
I look at the draft and have a few comments:
Slide#9 on Preliminary Data Profiling Result: there are lots of isolated vertices, 21B out of 21.6B, 97% are isolated vertices, which also means only 0.656B vertices are meaningful (connected) and worthy to be ingested for graph analysis. When banks are processing graph data, they tend to purge those isolated data points and the logic is simple: isolated data carry little value in graph-powered network/behavior analysis. I understand you may have used sample data from your group, but I'm wondering if the data can be representative of the average financial industry.
Also, on Slide#9: Hub-vertex degree: Clearly, there are hotspot supernodes with degrees exceeding 100M. This reveals that the underpinning data modeling is questionable, because this will bog down any graph system that tries to traverse such a supernode efficiently. On the other hand, consider alternative data modeling/schema that effectively lowers the max degree. Over 100M is a bit too extreme -- if the max degrees are in the range of 1-2M, that's probably more practical. (As far as I know, most graph systems today can NOT even traverse hotspot nodes with 10,000+ degrees with a tight latency bound, say, 100ms.)
Slide#14: Regarding TP vs. AP workload, it makes sense to formally introduce some graph algorithms as typical AP workload, from relatively straightforward ones to really sophisticated/time-consuming ones, like PageRank, LPA, Louvain, Node2Vec, etc. And all we need to define are input parameters and expected output formats.
Let me know what you think, thanks.
Best

[Spec Writing] unify the diagram and figures of queries

Consider these aspects,

  • expression-oriented: describe the query from the perspective of the query language expression
  • data-oriented: describe the query from the perspective of the query pattern (e.g., how many neighbors of the seed will be touched)

Complex reads: some questions

In the implementation phase:

complex read 1
How should the result be understood?
Does it have to be mediumId.size == numMedia and mediumType.size == numMedia?
We can add this to the pattern and description.

complex read 2
Does the result need to count distinct loans for sumLoanAmount and sumLoanBalance?

complex read 4
If there is no edge between src and dst, what should be done?

complex read 5
If there is more than one edge between two vertices, do we output the path only once?

complex read 6
The pattern contains 3 transfers, but the description says more than 3 transfers.
Maybe we need to choose one.

complex read 8
Does upstream mean a vertex, an edge, or an edge's amount?
The upstream in the pattern is different from the one in the description.

complex read 9
The params are missing Account.id.

complex read 10
The params are missing startTime and endTime.

complex read 11
How is finalShare computed?

complex read 13
The description says to return companies and the sum of their transfers, but the pattern and result have sumEdge2Amount.
Maybe we need to add Company.id or Company.name to the result.

About sorting:
It will be easier to understand if the description states descending order whenever a query needs sorting.

ACID has some errors

1 Atomicity

grammar: an -> a

image-20230105202017480

2 Observed Transaction Vanishes

cypher: The two variables should be the same

image-20230105205915377
