Giter Club home page Giter Club logo

Comments (27)

Pvlerick avatar Pvlerick commented on July 28, 2024 1

Reading the code (I made a few comments), I think this adds the flexibility needed to add what we need for Cassandra.

I will try to merge this in my current work (or rebase on top of it) to see if I can already use it.

from evolve.

Pvlerick avatar Pvlerick commented on July 28, 2024 1

Good news: after implementing a CqlStatementBuilder and hacking the missing logic into the Evolve class, the migration test now works for Cassandra. It has the previously mentioned limitations, but I think these limitation are fairly reasonable.

Now, I noticed that using this strategy, we end up having multiple entries in the changelog table, manly because each migration script is divided into statements, and each statement is then considered as a migration. This does not look like a big issue to me since they will have the same checksum, name and description, but I'm not sure what are the ramifications of this (maybe that can be solved by changing the logic in Evolve and save migrations based on files and not statements).

Sources: https://github.com/Pvlerick/Evolve/tree/issue-39-add-support-for-cassandra

from evolve.

Pvlerick avatar Pvlerick commented on July 28, 2024

I started working on a prototype: https://github.com/Pvlerick/Evolve/tree/issue-39-add-support-for-cassandra

It is a bit clunky because Cassandra's nomenclature doesn't map so well with SQL concepts, that I think that in the end it will hold.

from evolve.

lecaillon avatar lecaillon commented on July 28, 2024

Hi @Pvlerick
First of all thank you for the interest and the work you put in Evole. πŸ‘
I'm not familiiar with Cassandra. Is it a database which needs a schema migration tool ?
I'm not convinced by the interest of Evolve for that kind of nosql database.
Can you enlighten me please ?

from evolve.

Pvlerick avatar Pvlerick commented on July 28, 2024

Cassandra is a distributed, high availability and scalable database. It is considered NoSQL but it behaves in a similar way in many regards as a classical RDBMS, although it has quite a lot of limitations when it comes to features compared to classical RDBMS (that's the trade-off for being distributed and such).

It has its own query language: CQL (Cassandra Query Language) which is quite similar to SQL. Keyspaces (databases/schema) and tables must be created using it, altered, dropped, etc... using that language, so the need for a migration tool is there.

We currently need such a tool where I work at and Evolve is the best candidate so far, as you wrote all the mechanics and it's just a matter of wiring it all together. As I said, some parts might be a bit clunky because of some of the concepts' names and because some of the abstractions are not going to fit 100%, but from what I have seen so far it could work.

If you don't want to include it in the project, we could try to make this an extension in a different package?

from evolve.

lecaillon avatar lecaillon commented on July 28, 2024

Thank you for the feedback. Appreciate it.
I thought the database schema was more or less managed in the app code with Cassandra.

So, I will be pleased to help you when I can in your effort to develop a Cassandra driver for Evolve.
And let's see where it leads us :)

from evolve.

lecaillon avatar lecaillon commented on July 28, 2024

@Pvlerick For your information, I'm working on the CassandraCSharpDriver integration part in MsBuild / dotnet build. It's a tricky part in Evolve I'm starting to get used to.

You can concentrate on the "Cassandra database stuff".

from evolve.

Pvlerick avatar Pvlerick commented on July 28, 2024

@lecaillon Ok, cool! I'm progressing well, the dialect part is almost complete and I didn't encounter anything blocking yet.

from evolve.

Pvlerick avatar Pvlerick commented on July 28, 2024

@lecaillon I have the dialect working but I started to run into some limitations due to the Cassandra drivers, mainly:

  • A script can only contain one statement, I didn't find a way to execute multiple statements in one file - obvious workaround is to have a file for each statement, not ideal to say the least;
  • The drivers return a CqlBatchTransaction when BeginTransaction is called on CqlConnection (see here), which means that any statement that is not idempotent (such as an alter table) will fail because it is executed twice (once when ExecuteNonQuery is called, once when Commit is called - at least that's my current assessment.

Sources updated: https://github.com/Pvlerick/Evolve/tree/issue-39-add-support-for-cassandra

cc @jorgebay, just in case he has something to chip in ;-)

from evolve.

lecaillon avatar lecaillon commented on July 28, 2024

@Pvlerick The Oracle driver acts exacty like this. It supports only one statement at a time (that's why I did not implement it until now). That is not the best case scenario for us :)
I see (at least) 2 options:

  • Use the DatabaseHelper.BatchDelimiter property in the cql scripts. Evolve use this string to split one script into several statements. You can give it a try, but it's more a hack than a clean solution IMO.
  • The proper way to do it is for each script, to parse the content of the file to extract queries one by one. I didn't do it earlier because no one seems to be interested in Oracle, and honestly it is quite tricky to do. I see a lot of edge cases. More over at the time I did not have a lot of unit tests, so I prefered to be careful on this point.
    If you want to work on this point with me I would be really happy. It would be very interresting. Tell me what you want to do.

I did not understand your second remark. :)

from evolve.

jorgebay avatar jorgebay commented on July 28, 2024

If I can give you some advice, I would stay away of the ADO.NET component of the driver and use the core API, that supports parallel execution, request pipelining, ...

from evolve.

Pvlerick avatar Pvlerick commented on July 28, 2024

@lecaillon I tried using the BatchDelimiter property, but it didn't work.
Extracting queries from the script is a possible solution, but I'm not sure implementing a parser on our own is a good idea: that's a lot of work and as you say there are tons of edge cases.
Maybe we could implement something simple and document limitations. From the top of my head, I would suggest something that splits the file in lines and look for the end-of-statement character (in CQL, ;) at the end of each line and regroup lines based on this.
The obvious limitation is that scripts should not have literals spanning on multiple lines that contain ; at the end of one of the lines, and not have a comment ending with ;.
To me, this sounds like a reasonable limitation for the tool and is easy to implement. Nothing prevents us from having a better implementation in the future that would remove this limitation, of course.

My second point was that the Evolve class wraps each statement in a BeginTransaction/Commit. It seems that the Cassandra's drivers do not handle this well when it comes to statements such as alter table.
Basically, when I run my tests with an alter table, the table gets the new column added when the ExecuteNonQuery is executed, but it seems that it is executed again when the Commit is called, which results in an error since the table already has the column. I will have to check if the behavior is the same for other statements (something that @jorgebay can probably confirm).

@jorgebay: we are using the ADO drivers because they allow us to reuse most of the code already written for SQL engines. Since this is for database/keyspace/whatever evolution and versioning, it does not need the extra features from the core API (which indeed should absolutely be used in any application). Should we take from your warning that you plan an decommissioning this, or can we use it safely with the warnings/limitations that you documented?

from evolve.

Pvlerick avatar Pvlerick commented on July 28, 2024

@lecaillon for the second point, one way to solve this would be to be able to optionnaly remove the BeginTransaction/Commit using a property on the Evolve class. I'll give it a try in another branch and see if this is sound.

from evolve.

lecaillon avatar lecaillon commented on July 28, 2024

@Pvlerick Knowing that some drivers do not support transaction for DDL instructions, it can be an interresting option. Disable it, even for SQL statements seems awkward.

from evolve.

lecaillon avatar lecaillon commented on July 28, 2024

@Pvlerick By the way, integration of the CassandraCSharpDriver is almost done. It should be easy to use in Evolve. That's a good news :)

from evolve.

Pvlerick avatar Pvlerick commented on July 28, 2024

@lecaillon ok, good news!

Here is the branch for optional transactions : Pvlerick@a657006
Tests are all green, but since this is your code you can probably assess if that's sound, as I might have missed something. The diff is a bit messy, but I did a reformat in VS and it removed some trailing spaces all around :-)

from evolve.

lecaillon avatar lecaillon commented on July 28, 2024

@Pvlerick Concerning the Query parser we can start with an naive implementation only activated for Cassandra migrations.

from evolve.

Pvlerick avatar Pvlerick commented on July 28, 2024

@lecaillon where would you put that implementation? In a subtype of MigrationScript?

from evolve.

lecaillon avatar lecaillon commented on July 28, 2024

@Pvlerick I'm going to create a specific branch on this one, for us to work on this.

I think the design (even naΓ―ve) must meet those 2 goals:

  • Be able to go on with the old fashioned MigrationScript when the database driver does not need this feature. ie for all existing drivers excepting Cassandra and Oracle (one day maybe).
  • Be able to extract all the queries of a script and execute the non DDL contiguous ones in the same transaction.

from evolve.

lecaillon avatar lecaillon commented on July 28, 2024

@Pvlerick

The proposal

SqlStatementBuilder

Introduce a new abstract class : SqlStatementBuilder, in charge of parsing the content of a MigrationScript. A builder is set on each specific DatabaseHelper. We use the SimpleSqlStatementBuilder on all existing databases, and the CassandraSqlStatementBuilder for the CassandraDatabase.
Each SqlStatementBuilder has to implement this method:

protected abstract IEnumerable<SqlStatement> Parse(string sqlScript, string delimiter);

SqlStatement

The new SqlStatement class holds the sql statement and the execution context (some props like CanExecuteInTransaction).

Sources updated : https://github.com/lecaillon/Evolve/tree/feature-query-parser

What do you think ?

from evolve.

Pvlerick avatar Pvlerick commented on July 28, 2024

Ok, in the meantime, I fixed the migration count issue on the Evolve class.

I think we now have all the pieces that we need to make this work :-)
I can probably add a bit more tests for Cassandra, but all in all it looks like we have a good starting point that I can start using at work.

from evolve.

lecaillon avatar lecaillon commented on July 28, 2024

@Pvlerick
How do you see the next steps ?

from evolve.

Pvlerick avatar Pvlerick commented on July 28, 2024

@lecaillon if you implement your proposal in master, I should be able to rebase on it and cleanup the history in my branch then open PR.

from evolve.

lecaillon avatar lecaillon commented on July 28, 2024

@Pvlerick
Ok, I will make my best to implement my part as soon as possible. I keep you in touch when I'm ready.
On your side have you got a deadline for this feature ?

from evolve.

Pvlerick avatar Pvlerick commented on July 28, 2024

@lecaillon no deadline, but the sooner the better, off course ;-)

I will already start the preparatory work on adding this to the build/deployment process.

from evolve.

lecaillon avatar lecaillon commented on July 28, 2024

@Pvlerick Merged the new SQL query parser in master. Implementation done. Lack few tests.
Feel free to give me your feedback concerning the design, the naming, comments ...
I'm waiting for your PR.

from evolve.

Pvlerick avatar Pvlerick commented on July 28, 2024

I guess this can be considered done for now, I'll open separate issues for the tweaks.

from evolve.

Related Issues (20)

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    πŸ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. πŸ“ŠπŸ“ˆπŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❀️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.