Giter Club home page Giter Club logo

sqlwhat's Introduction

sqlwhat

Build Status PyPI version

sqlwhat enables you to write Submission Correctness Tests (SCTs) for interactive SQL exercises on DataCamp.

Installing

pip install sqlwhat     # install from pypi
make install            # install from source

Reference

Raising issues with how SQL is parsed

Please raise an issue on the respsective parser repo:

Basic Use

from sqlwhat.State import State    # State holds info needed for tests
from sqlwhat.Reporter import Reporter
from sqlwhat.checks import *       # imports all SCTs
from sqlalchemy import create_engine

code = "SELECT * FROM artists WHERE id < 100"

state = State(
    student_code = code,
    solution_code = code,
    pre_exercise_code = "",
    student_conn = create_engine('sqlite:///'),
    solution_conn = create_engine('sqlite:///'),
    student_result = {'id': [1,2,3], 'name': ['greg', 'jon', 'martha']},
    solution_result = {'id': [1,2,3], 'name': ['toby', 'keith', 'deb']},
    reporter = Reporter()
    )

# test below passes, since code is equal for student and solution
has_equal_ast(state)

# test below raises a TestFail error, since 'name' col of results
# doesn't match between student and solution results
check_result(state)
# shows error data
state.reporter.build_payload()

# can also be done using a chain
from sqlwhat.sct_syntax import Ex
Ex(state).check_result()

Running unit tests

pytest -m "not backend"

If you also want to run the backend tests, you need to set a GITHUB_TOKEN environment variable with access to the (private) sqlbackend repository. After this, you can:

make install
pytest

Rules of testing

  1. Running queries is the backend's job
  2. If a test doesn't run queries, it doesn't need the backend
  3. Very few tests should run queries

Building Docs

Install sqlwhat and run ..

cd docs
make html

sqlwhat's People

Contributors

arnoco avatar bogdanfloris avatar ddmkr avatar ewaldbervoets avatar filipsch avatar hermansje avatar machow avatar ncarchedi avatar timsangster avatar

Stargazers

 avatar  avatar  avatar

Watchers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar

sqlwhat's Issues

Implement basic messaging system

  • generic pattern for AST libraries to report human friendly name of AstNode instances
  • sqlwhat only uses most recent message for now, relies on highlighting for context. (e.g. rather than "Check the 1st Select statement. Check the WHERE clause. Did you _____?", Use highlighting + "Did you ____?".
  • sqlwhat uses human friendly names when can't find node or field (e.g. "Did you define a {ordinal} {node_name}" -> "Did you define a 2nd SELECT statement?")

where -> check_result - "incorrect third condition in WHERE?"

In general, producing a statement like this is very hard. It's unclear how this should be done. For reference, a WHERE clause might look like this...

image

(Note that we essentially run into the same problem here as for test_operator in pythonwhat.)

move out postgres parser and put it on pypi

Currently, we use travis to build sqlwhat (by generating the antlr python targets). However, if we move the parser out, then we can move all the building to the parser, which will likely be updated less frequently than sqlwhat itself. This will also help for creating a small app to visualize the AST, since it and sqlwhat can just depend on the parser.

use subset of unit tests to examine feedback messages

Many tests have the form

def test_test_has_columns_fail():
    state = prepare_state({'a': [1,2,3]}, {})
    with pytest.raises(TF): cr.test_has_columns(state)

For some of them, it would be useful to optionally view their feedback message. Could do this by doing something like

@pytest.mark.feedback
def test_something():
    ....

Then they can be viewed by running from command-line...

pytest -s -m "feedback"    # plus some meta-parameter telling tests to print feedback

replace ast list with ListNode

Currently, when you get the target_list of a SelectStmt, you get a literal python list. That's fine for comparing currently, since things like has_equal_ast use repr to compare AST representations. However, it could cause problems down the road.

override_sql function

has_equal_ast will be equivalent to using override_sql -> testing the asts are identical

Adding test_ncols to check_result

Overview

Currently check_result() will pass if the columns in the student submission are some superset of the columns in the solution.

This means that

SELECT *
FROM films;

will pass, when the desired solution is:

SELECT title, release_year
FROM films;

Enhancement Request

Adding test_ncols() to check_result() so the number of columns in student submission must match the number of columns in the solution code. I would argue that this is desirable default behavior, but it's up for discussion: cc @ncarchedi

check_clause - gets AST node attribute

  • can be skipped to get next statement (e.g. check select stmt -> check select stmt)
  • missing msg, if attribute is None
  • can be followed by has_equal_ast

Allow has_equal_ast to pass if student AST contains solution

Similar to pythonwhat. This will allow SCTs like

*** =solution

SELECT * 
FROM film
WHERE release_year = 2006
AND film_id < 5;

*** =sct


Ex().check_node('SelectStmt') \
    .check_field('where_clause') \
    .has_equal_ast(sql = "film_id < 5", start="expression", exact = False)

Note that we'll need to work out how to show CDs the names of parser rules in sqlwhat-viewer (so they can specify the start arg).

Dialect specific handling of case sensitivity

Currently, to workaround tsql issues...

  1. AST representations are converted to lowercase
  2. Column names are converted to lowercase

In postgres, these are not issues because unquoted column names should be made lowercase in the AST (as the official parser does). This also means that column names tend to be lowercase, too. Using casing is possible, though, so we don't want to convert everything to lowercase for postgres.

Support row order independent checking in check_column

When checking columns, there should be an option to compare the columns without accounting for the order of the rows. This can be done brutally, by simply ordering the column and seeing what happens, or ordering by a certain column, and keeping the permutation to order other columns accordingly (which is more correct I think).
As this can raise the number of steps in checking quite a bit, I think you should make this opt-in to start with.

Ignore AST tests if parsing error in antlr

It's possible that students will submit code that is valid postgresql, but the parser can't handle. In this case..

  1. It should raise (e.g.) a ParsingError, that is caught and short-circuits chaining
  2. Things like checking results should still run

AST node -> position in raw submission text

for ctx = ast_node._ctx, should be able to use something like

start = ctx.start     # starting token
start.start              # it's first letter
start.stop
start.line

stop = ctx.stop      # stopping token
# etc..

(I'm not sure what happens if a token spans multiple lines, but also can't think of when that would happen)

Provide option in check_result to make case-insenstive

Feedback from Graeme, Microsoft contact for the transact-sql course:

Also, I noticed that I get an error based on the column name case not matching the expected answer (e.g. if I were to use the alias CustomerName when it expects Customername) โ€“ by default , Transact-SQL is case-insensitive so this should not cause an error. Can you fix that?

I suggest we put an option in check_result() and test_column() to be forgiving for case sensitivity. Pretty urgent.

Mapping column aliases should work in clauses

If I understand the spec correctly people should be able to specify that these two options are both correct even when testing ASTs...

SELECT id FROM sometable ORDER BY id ASC

and

SELECT id as id2 FROM sometable ORDER BY id2 ASC

However, if the order by clause is incorrect in the second give feedback:

-- "incorrect ORDER BY clause"
SELECT id as id2 FROM sometable ORDER BY id2 DESC

implementation

This will require mapping identifiers from the solution onto the submission AST. An analogous case is mapping imports in pythonwhat. One solution is, like pythonwhat, keep a dictionary that maps each identifier and alias to the table it is referring to. E.g.

# where these are encountered in order
id as id2 -> sometable.id
id -> sometable.id
id2 -> sometable.id
sometable.id -> sometable.id

Note that this doesn't account for cases where an alias is given to, say, the result of COUNT or a CASE expression. For those situations it would just map to the corresponding COUNT or CASE ASTs...

# for CASE ... as id2
id2 -> CASE AST
# for COUNT(*) as count
count -> COUNT(*)

check_result option to be flexible on column name/order

Minimal example:

CREATE TABLE simple ( id INTEGER)
INSERT INTO simple (1)

Pass both SELECT id FROM simple and SELECT id as id2 FROM simple.

Realistic Example:

For example, both should pass:

SELECT 
    EXTRACT(MONTH FROM start_time) AS month, 
    COUNT(*) 
FROM trips 
GROUP BY month;
SELECT 
    EXTRACT(MONTH FROM start_time) AS month, 
    COUNT(*) as ttl
FROM trips 
GROUP BY month;

Complete AST parsing for most SELECT clauses

Most of these just involve removing the keyword

  • FROM list of tables: SELECT a FROM b, c
  • WHERE
  • GROUP BY
  • UNION
  • ORDER BY
  • OFFSET (not in parser, see issue #45 )

(bonus: HAVING, WINDOW, FETCH, FOR)

regex matching seemingly not working

Am I doing something wrong?

See screenshot below:
screen shot 2017-05-26 at 4 24 43 am

Test is on line 1071, and works fine when fixed=True but I want to check for and|AND.

Another example is here:
screen shot 2017-05-26 at 5 12 10 am

function names should reflect what is happening to state

Currently the three functions

  • check_result
  • check_statement
  • check_clause

all begin with check, but only the last two change state (by fetching a new AST node). It seems like we should change check_result, whose behavior will change depending on the state (e.g. called after check_statement('select', ...)) to test or test_result.

Multiple feedback messages when submission has syntax error

The first comes from the backend, and is passed to test_exercise. The second is from running the SCT.

Should...

  1. add test_error function, which fails if backend sent in an error.
  2. ensure that if SCT completes, and there is already an error, then it fails and returns the original error message.

Note that this does not include the case where someone wants an exercise to succeed, even if there is a syntax error (we can always through in an allow_error function if it comes to it, seems unlikely).

ORDER BY parsing is incorrect

edit:
UNION is treated as compound statement, so for two select statements, a trailing ORDER BY is the outermost piece. It's probably okay for now, but the ORDER BY will incorrectly be part of the final SELECT statement. Example below.

SELECT a FROM b
UNION
SELECT x FROM y
ORDER BY c

the postgres ast is

  [{"SelectStmt"=>
     {"sortClause"=>
       [{"SortBy"=>
          {"node"=>
            {"ColumnRef"=>
              {"fields"=>[{"String"=>{"str"=>"c"}}], "location"=>47}},
           "sortby_dir"=>0,
           "sortby_nulls"=>0,
           "location"=>-1}}],
      "op"=>1,
      "larg"=>
       {"SelectStmt"=>
         {"targetList"=>
           [{"ResTarget"=>
              {"val"=>
                {"ColumnRef"=>
                  {"fields"=>[{"String"=>{"str"=>"a"}}], "location"=>7}},
               "location"=>7}}],
          "fromClause"=>
           [{"RangeVar"=>
              {"relname"=>"b",
               "inhOpt"=>2,
               "relpersistence"=>"p",
               "location"=>14}}],
          "op"=>0}},
      "rarg"=>
       {"SelectStmt"=>
         {"targetList"=>
           [{"ResTarget"=>
              {"val"=>
                {"ColumnRef"=>
                  {"fields"=>[{"String"=>{"str"=>"x"}}], "location"=>29}},
               "location"=>29}}],
          "fromClause"=>
           [{"RangeVar"=>
              {"relname"=>"y",
               "inhOpt"=>2,
               "relpersistence"=>"p",
               "location"=>36}}],
          "op"=>0}}}}],

test_student_did_not_type functionality

Summary of Issue

At present we can check if a student typed something in the submission. It would be useful to be able to check that they did not type something.

Example

Currently for the TransactSQL course, the triple underscore (___ ) sometimes causes a parsing error in sqlwhat.

screen shot 2017-05-22 at 2 05 57 pm

Additionally, for exercises using ISQL, at present we can only check the AST.

Enhancement Idea

So, it might be useful to check that there is no ___ present in the submission using test_student_did_not_type(), rather than failing when trying to parse the submission. Thus in the SCT, test_student_did_not_type() would only pass and allow the AST checking to occur if there were no blanks present.

Related to #96

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.