
Linköping GraphQL Benchmark (LinGBM)

LinGBM is a performance benchmark for GraphQL server implementations. The wiki of this repo provides an introduction to the project, the specification of the benchmark, and design artifacts. This repo contains artifacts created for the benchmark (such as GraphQL schemas, query templates, and query workloads) as well as benchmark software.

Publications related to LinGBM

lingbm's People

Contributors

chengsijin0817, daniel-dona, dependabot[bot], hartig, ljukas, rabnawazjansher


lingbm's Issues

Update LinGBM.wiki

  • GraphQL schema
  • Query templates
  • Performance metrics
  • Create sidebar
  • Restructure the directory of the Git repo

OrderFieldInput - Q8

I just want to ask how this is supposed to work.

In the following query:

offers(where: OfferWhereInput, limit: Int, order: [OrderFieldInput]): [Offer]

We can give an order argument, which is a list of OrderFieldInputs; each OrderFieldInput can have two fields to order by.

If there is just ONE OrderFieldInput, my guess would be that we sort first by orderField1, and second by orderField2 while preserving the ordering given by orderField1.

My question then is how should it work if there is a list of OrderFieldInputs? Q8 uses this input, but it says that $attrOffer1 is selected from the 10 possible values of OffersSortingField, the same for $attrOffer2. But the schema does not match here.

The query template should look like the following to use the current schema:

{
  offers(limit:$cnt, order: { orderField1: $attrOffer1, orderField2: $attrOffer2} ) {
    offerWebpage
    validFrom
    price
  }
}

Notice here that it does not take an array, but a single OrderFieldInput.

We need to decide which approach to take. I recommend the one in the example above: the order argument is always just a single OrderFieldInput, so we can sort by at most two fields.
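The single-OrderFieldInput interpretation described above can be sketched as follows. This is a minimal Python illustration, not benchmark code; the offer data and field values are made up:

```python
# Made-up offer data for illustration.
offers = [
    {"price": 20, "validFrom": "2008-05-01"},
    {"price": 10, "validFrom": "2008-03-01"},
    {"price": 10, "validFrom": "2008-01-01"},
]

def order_offers(offers, order_field_input):
    """Sort by orderField1 first; orderField2 only breaks ties among
    offers that share the same orderField1 value (the interpretation
    proposed in the issue)."""
    f1 = order_field_input["orderField1"]
    f2 = order_field_input["orderField2"]
    return sorted(offers, key=lambda o: (o[f1], o[f2]))

result = order_offers(offers, {"orderField1": "price", "orderField2": "validFrom"})
```

Sorting by a `(primary, secondary)` key tuple gives exactly the "keep the order given by orderField1" behavior: the two price-10 offers end up ordered by validFrom, but both still come before the price-20 offer.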

bring back BSBM

(#79 describes how LinGBM was initially based on BSBM but is now based on LUBM)

There are some benefits to BSBM:

  • In BSBM, the result set size doesn't increase with dataset size. In large LUBM variants, response time is dominated by the time to return the result; GraphQL results aren't streamed, so this may be a further problem
  • Some databases have been tested on very large BSBM variants
  • Other virtualization frameworks (e.g. ONTOP, Morph-GraphQL) have been tested against BSBM

And some disadvantages:

  • BSBM uses named graphs; I don't know how these could be mapped to GraphQL, and none of the GraphQL-RDF frameworks I know of supports named graphs

Neutral:

  • BSBM results are often dominated by one query that uses regexp. But LUBM QT5 does the same
  • I think BSBM has a relational variant

(Some of the info above is from @vassilmomtchev, Ontotext CTO)

Would it be a lot of effort to support both?

Typo Error

In the schema there are occurrences of int; these should be replaced with Int (GraphQL's built-in integer scalar is Int, and type names are case-sensitive).

QueryGen Errors

I've started work on a preliminary test driver. It runs all queries in the actualQueries folder synchronously.

When I send the following query:

{
  reviewSearch(field:text,
               criterion:contains,
               pattern:"elms")
  {
    title 
    label
  }
}

I get an error on the criterion field, because according to the schema it is a StringCriterion:

enum StringCriterion 
{
    CONTAINS
    START_with
    END_with
    EQUALS  
}

These values are all caps, so the querygen needs to be modified to fit the schema. I also recommend changing START_with to START_WITH, and the same for END_with.

NOTE: It's queryTemplate10.txt that needs to be edited.

Regarding this, could I be allowed to push to branches on this repo? I have a fix for this issue on my machine but I cannot push it into a branch and make a pull-request. If I could do this I can fix issues and you just have to approve them instead of spending time on fixing them yourselves 😃
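For reference, the querygen fix could be as simple as uppercasing the criterion values it emits. This is a hypothetical Python sketch of the idea, not the actual (Java) querygen code; it assumes the schema is also changed to START_WITH / END_WITH as recommended above:

```python
# Enum values as they would appear in the (fixed) schema.
SCHEMA_ENUMS = {"CONTAINS", "START_WITH", "END_WITH", "EQUALS"}

def normalize_criterion(value):
    """Uppercase a criterion value from a query template and verify it
    is a valid StringCriterion enum value."""
    upper = value.upper()
    if upper not in SCHEMA_ENUMS:
        raise ValueError(f"unknown StringCriterion: {value}")
    return upper
```

With this, the template value `contains` becomes the valid enum value `CONTAINS`, and the mixed-case `START_with` becomes `START_WITH`.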

issue in query template qt3.txt

query researchGroup_department_head_doctorDegreeFrom($researchGroupID:ID) { researchGroup(nr:$researchGroupID) { subOrgnizationOf { head { id email doctorDegreeFrom {id} } } } }

email must be replaced by emailAddress; the query should look like this:

query researchGroup_department_head_doctorDegreeFrom($researchGroupID:ID) { researchGroup(nr:$researchGroupID) { subOrgnizationOf { head { id emailAddress doctorDegreeFrom {id} } } } }

issue in query template qt4.txt

query lecturer_university_graduateStudent_professor_department($lecturerID:ID) { lecturer(nr:$lecturerID) { doctoralDegreeFrom { id undergraduateDegreeObtainedBystudent { id email advisor { id email worksFor {id} } } } } }

email should be replaced by emailAddress, and it should look like this:

query lecturer_university_graduateStudent_professor_department($lecturerID:ID) { lecturer(nr:$lecturerID) { doctoralDegreeFrom { id undergraduateDegreeObtainedBystudent { id emailAddress advisor { id emailAddress worksFor {id} } } } } }

issue Query template QT4.txt

query lecturer_university_graduateStudent_professor_department($lecturerID:ID) { lecturer(nr:$lecturerID) { doctoralDegreeFrom { id undergraduateDegreeObtainedBystudent { id email advisor { id emailAddress worksFor {id} } } } } }

There is an issue: email should be replaced with emailAddress in the above query template. Corrected, it looks like this:

query lecturer_university_graduateStudent_professor_department($lecturerID:ID) { lecturer(nr:$lecturerID) { doctoralDegreeFrom { id undergraduateDegreeObtainedBystudent { id emailAddress advisor { id emailAddress worksFor {id} } } } } }

-nm flag doesn't seem to work

I try to generate one of each queryTemplate using:

java -cp target/querygen-1.0-SNAPSHOT.jar se.liu.ida.querygen.generator -nm 1

But I still get 20 instances of each queryTemplate.

Error: Unknown type "publicationField".

In the type Query:

publicationSearch(field: publicationField!, criterion: StringCriterion!, pattern: String!): [Publication]

I think this should be replaced by:

publicationSearch(field: PublicationField!, criterion: StringCriterion!, pattern: String!): [Publication]

Issue in query Template q14.txt

query multipleFilters($departmentID:ID,$professorType:String, $interestkeyword:String) { university(nr:$universityID) { undergraduateDegreeObtainedBystudent(where: {AND:[{advisor:{age:{criterion:MORETHAN, pattern:$age}}},{advisor:{researchInterest:{criterion:CONTAINS, pattern:$interestKeyword}}}]}) { id emailAddress takeGraduateCourses {id} } } }

In the where filter, the parameter advisor: age is unknown.

I think a possible correct query structure is:

query multipleFilters( $departmentID: ID $professorType: String $interestkeyword: String ) { university(nr: $universityID) { undergraduateDegreeObtainedBystudent( where: { AND: { age: { pattern: $interestkeyword, criterion: MORETHAN } advisor: { researchInterest: { criterion: CONTAINS, pattern: $interestkeyword } } } } ) { id emailAddress takeGraduateCourses { id } } } }
That is, calling the AND filter with an object (AND: { ... }), not with a list (AND: [ ... ]).
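As an aside on the AND: [ ] versus AND: { } question: the list form can express several conditions on the same nested field (e.g. two separate conditions on advisor), which a single object cannot, because object keys must be unique. A minimal Python sketch of the list interpretation; the records and filter shapes are made up for illustration, not taken from the benchmark:

```python
def matches(record, condition):
    """A condition is a dict {field: expected}, where expected may itself
    be a nested condition dict."""
    for field, expected in condition.items():
        value = record.get(field)
        if isinstance(expected, dict):
            if not isinstance(value, dict) or not matches(value, expected):
                return False
        elif value != expected:
            return False
    return True

def where_and(records, conditions):
    """AND as a list: a record passes only if every condition matches."""
    return [r for r in records if all(matches(r, c) for c in conditions)]

# Made-up student records.
students = [
    {"id": 1, "advisor": {"age": 60, "researchInterest": "Research12"}},
    {"id": 2, "advisor": {"age": 35, "researchInterest": "Research12"}},
]

# Two conditions on the same advisor field, something AND:{} cannot express
# without merging them into one object.
result = where_and(students, [
    {"advisor": {"age": 60}},
    {"advisor": {"researchInterest": "Research12"}},
])
```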

Define reporting rules in the wiki

We need an additional page in the wiki that defines the rules for reporting benchmark results. For instance, when reporting benchmark results, it needs to be explicitly mentioned:

  • on what machine(s) the software was run (CPU, RAM, ...),
  • which versions of what software were used (incl. operating system, node.js version, etc),
  • which scale factor(s),
  • etc.

Concerns regarding the "scenario" of the approach outlined in the wiki

I love the motivation behind this effort and benchmarking suite! I think this would be very useful for the community if done in a manner useful to GraphQL authors and not just GraphQL vendors. Especially, I believe it will give the community ideas about different ways to implement a GraphQL backend. Also a fan of your work with GraphQL cost measurement!

With reference to the wiki I have some concerns about this benchmarking approach and am jotting down some thoughts here.

The focus of our benchmark will be a scenario in which data from a legacy database has to be exposed as a read-only GraphQL API with a user-specified GraphQL schema.

Systems that can be used to implement GraphQL servers but that are not designed to support this scenario out of the box can also be tested with the benchmark, for which they have to be extended with an integration component such as a schema delegation layer. In such a case, from the perspective of the benchmark, the combination of the system and the integration component is treated as a black box.

I find this approach extremely confusing because the aim is to test integration with a legacy database of a GraphQL server with a user-defined schema. There are 2 problems I have here:

  1. Won't your approach end up testing the schema delegation layer rather than the GraphQL server? Isn't this a flawed approach to testing the efficiency of a GraphQL server's implementation?
  2. If I was writing a user-defined GraphQL schema, why would that schema expose complex subquerying, filtering, traversal, limit, offset? I would expose exactly the GraphQL schema my apps need. What is the rationale behind putting a schema delegation layer in front of an "out-of-the-box" GraphQL server?

As a personal preference, if I were writing a user-defined schema and building a GraphQL server that queries a database, I would use an ORM such as SQLAlchemy, massive.js, or knex rather than another GraphQL server.

I find this approach of benchmarking a combination of a user-defined GraphQL server, going through a generic "GraphQL vendor", then going to the database, heavily biased towards a Prisma style of implementation and not applicable to anything else. Benchmarking is hard enough with just a server and a database, and adding a third component will make it harder to reason about the correctness of the benchmark.

Are there other GraphQL vendors or products that are GraphQL middleware/ORMs for authors of APIs, other than Prisma and Dreamfactory, that you intend to benchmark with this suite?

Open-source servers like Postgraphile and Hasura (I work here) and vendor solutions like AppSync are meant to be GraphQL servers that are optimised for large numbers of HTTP clients querying for simple to complex real-world queries. It seems pointless to add a GraphQL delegation layer in front of these GraphQL servers.

Recommendation 1:

If this effort is intent on benchmarking the nascent ecosystem of "GraphQL vendors" I would recommend instead:

  1. Choose a dataset on a database with a fixed schema: This is important and reflective of the real world where data is modelled on a database for the database
  2. For the same choke point, write GraphQL queries as exposed by the GraphQL API of the vendor, even if the GraphQL queries have slightly differing syntax
Recommendation 2:

Instead, if this effort is meant to benchmark the broader process of writing GraphQL servers with a hand-written schema, I would recommend:

  1. Choose a GraphQL server implementation and choose a database ORM
  2. Write different optimised backends with different languages
  3. Write the same GraphQL queries that would test similar choke points that your benchmarking suite will run against.

From a usefulness point of view, I think the latter (recommendation 2) is very useful for folks in the community building GraphQL servers! This would become a very useful benchmark for folks building GraphQL servers that have to inevitably talk to a database and that need to choose between different ORMs and approaches to processing the GraphQL query.

I think the former (recommendation 1) is a benchmark of "GraphQL vendors". These are 2 different things and should be treated differently as such.

issue in query template qt2.txt

query university_faculty_publications($universityID:ID) { university(nr:$universityID){ doctoralDegreeObtainers{ publications{title} } } }
I think doctoralDegreeObtainers must have a where clause in it.

QT13 Date-generation error

Hello.

This bug is quite funny, actually. The querygen gave me the following query:

{
  producer(nr:20) {
    products {
      offers(where:{ vendor:{ publishDate:{criterion:AFTER date:"2002-02-30"}}})
      {	
        price
        offerWebpage
        product {label comment}
      }
    }
  }
}

Can you spot the error?

Yes, the date doesn't exist; February does not have 30 days. :D Sending this query to the database makes it return an "Incorrect DATE value" error.
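A sketch of how the querygen could avoid generating impossible dates: pick the day only after looking up the actual length of the chosen month. This is illustrative Python (the querygen itself is Java), and the year range is an assumption:

```python
import calendar
import random
from datetime import date

def random_date(rng, year_min=2000, year_max=2008):
    """Generate a random valid ISO date string."""
    year = rng.randint(year_min, year_max)
    month = rng.randint(1, 12)
    # monthrange returns (weekday of first day, number of days in month),
    # so this correctly handles February and leap years.
    last_day = calendar.monthrange(year, month)[1]
    day = rng.randint(1, last_day)
    return date(year, month, day).isoformat()

rng = random.Random(42)
dates = [random_date(rng) for _ in range(1000)]
```

Because the day is drawn from 1..last_day of the specific month and year, a value like "2002-02-30" can never be produced.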

issue Query template q12.txt

query subqueryFilter1($universityID:ID, $departmentID:ID) 
{ 
  university(nr:$universityID) { 
    doctoralDegreeObtainers (where: {department: {nr:$departmentID} } ){ 
      id 
      emailAddress 
      publications {id } 
    } 
  } 
} 

I think this should be replaced by:

query subqueryFilter1($universityID:ID, $departmentID:ID) { university(nr:$universityID) { doctoralDegreeObtainers (where: {worksFor : { nr: $departmentID} } ){ id emailAddress publications {id } } } }

In doctoralDegreeObtainers, the schema has worksFor (not department) as the input parameter.

Q10 - Schema

Q10 looks like the following:

query stringMatching($textOfReviewKeyword:String)
{
  reviewSearch(field:text,
               criterion:contains,
               pattern:$textOfReviewKeyword)
  {
    title
    label
  }
}

reviewSearch returns a list of reviews according to the schema:

reviewSearch(field: ReviewFieldInput!, criterion: StringCriterion!, pattern: String!): [Review]

A Review has no field label, which the query template wants us to return. I propose removing that field, or changing it to one that exists.

Consider adding documents as a PR for feedback

Hey 👋, just stumbled on this, very cool idea and project. Maybe you could consider opening a PR with your initial assumptions / documents for easier feedback. It's hard to comment on a wiki. If not, how would you like to receive feedback?

Thanks!

Who has run LinGBM?

I don't know if you have plans to publish benchmark results; that usually requires fixed reporting rules and agreement from the vendors. Perhaps ldbcouncil.org can help you with this?

Here is a start for a list, because it's interesting for users and vendors to know who has tried it:

QueryGen - Duplicates

Hello.

When we met at the mid-term review we talked about the test methodology. One thing we mentioned was that we wanted to run throughput tests with the different query-templates.

For some query templates, only a few distinctly different queries can be generated; for example, query template 2 will only generate 22 different queries when the database is created with the regular settings.

One way to combat this would be for the querygen to be able to generate duplicates. I'm ready, or very nearly ready, to start running the real tests now. Would it be possible to include an option in the querygen that makes it generate duplicates? Or should I just copy-paste the generated queries to get more of them?
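The requested option could work by sampling from the distinct generated queries with replacement until the requested count is reached. A hypothetical Python sketch of the idea; the function name and seed handling are assumptions, not the querygen's actual (Java) code:

```python
import random

def pad_with_duplicates(distinct_queries, requested, seed=0):
    """Return exactly `requested` queries. If fewer distinct queries
    exist, pad the list with duplicates sampled uniformly at random."""
    if len(distinct_queries) >= requested:
        return distinct_queries[:requested]
    rng = random.Random(seed)
    extra = [rng.choice(distinct_queries)
             for _ in range(requested - len(distinct_queries))]
    return distinct_queries + extra

# e.g. pad the 22 distinct queries of query template 2 to a 200-query workload
workload = pad_with_duplicates([f"q{i}" for i in range(22)], 200)
```

Keeping all distinct queries first and only padding with duplicates preserves the full coverage of the template while still giving a workload large enough for throughput tests.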

Aggregation related schema and query

  • update schema: AggregateOffers, PriceAggregationOfOffers
  • update Query Template Q16 in Google doc
  • update Q16 in query template on Github repo
  • update Q16 on Github wiki

QT15 and QT16

Hello.

Short question and your thoughts around it.

Since the querygen never generates two queries with the same arguments, some of the query templates are hard to generate in large numbers.

For example, when I run the querygen with "-nm 200", I still only get 12 queries from each of these templates, since that's how many vendors there are.

I suppose one way to raise the querygen's limit is to generate a bigger database, which I think might be a good choice in my tests anyway.

But, maybe we should include an option that forces the querygen to generate the amount specified with the "-nm" flag? Or what do you think regarding this? @hartig @chengsijin0817

A sample size of 12 is not good for much, right?

different package name for querygen

The code for the query generator should be in the Java package se.liu.ida.lingbm.querygen (not se.liu.ida.querygen), and the generated jar file should be named lingbm-querygen-...jar. I will implement both changes now.

Schema - Person

I noticed that the Person type in the schema is not used in any of the query templates. So maybe we should remove it from the schema?

document RDF generation and the LUBM ontology

I assume this test targets GraphQL implementations over RDF (like the Ontotext Platform).

Product - Reviews ordering - ReviewSortingCriterion - Q9

As in #32 we have differing schema and query template here.

We can query reviews on a product with the following:

reviews(order:[ReviewSortingCriterion]): [Review]

Here it says that we can pass a list of ReviewSortingCriterion to the order argument.

The only place this is used in a query template is in Q9 where it is used in the following way:

vendor(nr:$vendorID) {
    offers(limit:50) {
      product {
        reviews( order:{field:$attrReview direction:DESC} ) {
          title
          rating1
          rating2
          rating3
        }
      }
   }
}

Here the order argument is not a list, but a single object again. So I propose that we change the schema to the following:

reviews(order:ReviewSortingCriterion): [Review]

What are your thoughts?

directories and file names of query templates

@chengsijin0817 After seeing the new directory/file structure with the query templates now, I realize that I would like this to be slightly different still.

First, I think that the individual files for the query templates and their "description" should not be in separate sub-directories. Instead, these files should all be under ./artifacts/queryTemplates/main/

Second, the files should be renamed:

  • The files with the query templates should be named QT1.txt, QT2.txt, etc.
  • The files with the variable names should be named QT1.vars, QT2.vars, etc.

/cc @ljukas

Heap Limit reached QT5

So... funny story...

This query:

{
  product(nr:405) {
    reviews {
      title
      reviewFor {
        reviews {
          title
          reviewFor {
            reviews {
              title
              reviewFor { 
                reviews {
                  text
                  title
                  reviewFor { label }
                }
              }
            }
          }
        }
      }
    }
  }
}

Because of the exponential growth of the returned object, the heap limit is reached on the server, and it crashes.

FATAL ERROR: Ineffective mark-compacts near heap limit Allocation failed - JavaScript heap out of memory

Is that something we want to account for in testing? Or would an alternative approach be to remove the last reviewFor level, or similar? @hartig @chengsijin0817
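A back-of-envelope calculation of why this query explodes: each reviews/reviewFor round trip multiplies the number of returned review objects by roughly the average number of reviews per product, so the result grows exponentially with nesting depth. A sketch with assumed, not measured, numbers:

```python
def leaf_objects(reviews_per_product, nesting_depth):
    """Approximate count of leaf review objects in the response, assuming
    every product has the same number of reviews (an idealization)."""
    return reviews_per_product ** nesting_depth

# The query above nests four `reviews` levels.
sizes = {r: leaf_objects(r, 4) for r in (5, 20, 50)}
# Even a modest 20 reviews per product yields 20**4 = 160,000 leaf objects;
# 50 reviews per product yields 6,250,000, easily enough to exhaust the heap
# when the whole response must be materialized at once.
```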
