Giter Club home page Giter Club logo

Comments (25)

sgrif avatar sgrif commented on May 18, 2024 1

So I've started to give this some serious thought. @tafia has started some of the work on this in #72. Here's a rough overview of what has to change, in order of easiest to hardest, along with what I think might be possible solutions, but I'm not yet sure.

Just as a note, while I haven't decided if I will support MySQL in the long term, I do want whatever changes we make here to make it possible to theoretically support MySQL, as that will help ensure that we can truly support N adapters and not just 2. In an ideal world, this is done in a way that additional adapters can be added as third party crates.

I'm writing this up to help clarify my thoughts, get some eyes on it to see if I'm missing anything, and give a roadmap for anyone who wants to take on a piece of this. I would prefer that we solve the harder bits (or at least have a strong indication that we have a solution that will work) before we bother with the easier pieces.

  • Connection needs to become generic. Should be trivial, but need to audit the public API for anything that doesn't translate.
  • Cursor somehow needs to become generic.
    • This one is a pain in the ass, and I wish we didn't have to do it. Any way that we end up doing this will basically mean that users who want to point at this as a return type will have to put connection specific information in their signatures.
    • This will eventually just be replaced by impl trait in whatever form it eventually lands, as I want the return type here to be impl Iterator<Item=T>. The current direction that's heading seems like it'll be useable for our needs (basically anything that doesn't require us to state the concrete type higher than the function level, as that would end up requiring HKTs to work here).
      • I don't expect that feature to be in stable pre-1.0, so I expect that we'll just have to bite the bullet here, and bump to 2.0 at some point to change to impl trait.
  • Many of the types we support right now are specific to PG.
    • The easy answer is to move everything that isn't also supported by Sqlite to a separate diesel-postgres crate, or under a config flag.
      • I am comfortable treating Sqlite as the "lowest common denominator" for SQL implementations and leaving those in the main crate. To my knowledge there is no type that is supported by sqlite (without extensions) that isn't supported by all backends.
      • There is definitely value in leaving these types non-backend specific. The vast majority of schemas are dominated by strings, integers, dates, and their variants. We want crates to be able to add functionality around those without caring about backend when possible
    • We should change the constraint on Connection to be trait Connection where {all std lib primitives and all fundamental sql types}: ToSql<Type, Self>.
      • We should also constrain that an impl is required for the nullable versions of all of those. If we ever see Higher Ranked Types, what we should write instead is for <ST: NativeSqlType, T: ToSql<ST, Self>> Nullable<T>: ToSql<ST, Self>
      • An unresolved question here is whether or not we should also require implementations of optional 3rd party crates that we've deemed important enough to support in the core adapter (e.g. chrono). I'm leaning towards no, since those won't be in the core crate when compiled w/ no features, which is what crates abstracting over us would depend on.
    • Array is an unfortunate loss here, as it's the only other type that is generic besides Nullable, and might be painful to abstract over without coupling to the backend (Oracle and SQLServer also support arrays).
      • As far as I can tell, there is no other generic type in PG or any other backend that is unbounded. Range is the closest, but even it's restricted to a known subset.
      • I'd like to resolve the question of "is there any unbounded generic type supported by any backend other than array?"
    • JSON is an additional unfortunate loss. 2 of the 3 cases I'm looking at (MySQL and PG) have a JSON data type, and it would be nice to eventually write impl<T: serde::Deserialize> FromSql<Backend, JSON> for T where String: FromSql<Backend, JSON>, since that would not be backend dependent. I'm not sure we can ever do that though, since with these API changes that impl could access backend specific code. We might need some sort of MappedFromSql trait.
  • ToSql, FromSql, Expression, and Query need to become generic over the backend, as do any traits generic over them.
    • This is going to include things like AsExpression, AsQuery, Queryable, FromSqlRow, and likely more.
    • Are we sure that Expression and Query need to be generic? Query does if Expression does to be sure, but are there any fundamental SQL statements that change based on the back end?
      • As I wrote that, Bound becomes the obvious case.
    • Again, this shouldn't be too hard, but I'd like to canvas our existing users to see how often any of these traits show up in their code for internal abstractions they've built.
  • NativeSqlType is coupled to PG
    • oid and array_oid are the trouble cases here.
    • The best solution that comes to mind is to have it be NativeSqlType<Backend>, giving Backend an associated Metadata type, and having a metadata function that returns that struct.
      • We should try to eliminate the new function if we go this route
      • Is the compiler smart enough to eliminate some copying if I only use one field of the returned struct, and the function impl is just a struct literal?
        • Is this still true if it's going through a trait object?
        • Does it even matter? Is our worst case for a performance hit a few extra i32 copies? We should benchmark regardless
  • Note for the last 2 points: This is making it clear to me that I want the "backend" type to be different from the connection. So we would have Connection<Backend> where i32: ToSql<Integer, Backend>, ... and we'd have impl Connection<PG> for PgConnection. This is because there's a lot of structs that will need to become generic over the backend which need to remain 0-size, and I'd like to avoid using PhantomData any more than we are, as it tends to be an indication that you're doing something which cannot be made object safe.
  • Our handling of bind params is fundamentally at odds with SQLite's API.
    • Just to clarify:
      • PG's API takes a query, an array of oids, an array of char*s (for the data), an array of lengths, and an array of booleans indicating whether it's text or binary
        • As such, our push_bind_param function takes a type and an Option<Vec<u8>>. We can calculate the lengths ourselves, we get the oid from the type, and we always pass binary.
      • MySQL is similar. They take an array of MYSQL_BIND structs, which consists of an enum for the type, a void * for the data, and a length. Whether the data is binary or text is dictated by the type. Text data still fits into our Vec<u8> fine.
      • SQLite instead has several functions that you call, based on the type of the bind param. These functions in general take a statement pointer, the index of the bind param, the data itself which is usually by value and a specific type, not a pointer, and a length in places where it makes sense.
        • What's important to note here is that the requirement of a statement pointer means that we can't actually call these functions in the place where we currently call to_sql.
        • Our code related to constructing bind params currently does not know or care about the index.
        • SQLite is unique in that it has no IO protocol that you can use (which is sorta the point), so we are bound to its C API.
    • This is by far the hardest one to solve
    • One potential solution (I think I prefer the one proposed later) would basically be to create a BoxToSql<Backend> trait, take the type on construction, and return that. This would mean ToSql falls to dynamic dispatch. This is potentially a catastrophic perf hit.
    • This is potentially awful, but there's actually no reason we can't just continue to use our current API. We can read a Vec<u8> back into an i32. We lose all guarantees of type safety at that layer of abstraction, but we should in theory be guaranteed that we're fine by the layer above. Since we have a very small subset of types that we need to worry about, we can just match on a type identifier to determine what to do. If we see performance hits in the read, we can even do some unsafe casts from Vec<u8> -> &u8 -> *const u8 -> *const i32 and dereference it if we really need to.
      • This seriously sounds so horribly dirty, but the more I think about it the more it could actually work, and is starting to be my favored option as it doesn't require significant changes to the API for other adapters, and won't cause a performance hit for them either.
      • This is all dependent on the fact that SQLite has a known set of types that we have to deal with. As far as I can tell, no extensions add types which affect the C API, particularly in the handling of bind parameters. We need to confirm that this is true, otherwise we need to provide an API that crates can hook into.
  • A scorched earth solution if we truly can't find a type safe solution that abstracts over multiple backends is to just take the traits or structs that need to change, and just put them under #[cfg(feature = "postgresql")]. This would mean we don't have to find an actually generic API for different backends, but would be incredibly harmful to the ability for third party crates to provide additional abstractions, and would mean that Diesel would never be able to be used with more than one type of database in the same application.

from diesel.

sgrif avatar sgrif commented on May 18, 2024

This is likely not going to happen until we get closer to 1.0. I have said in the past that I want to support other databases eventually. My gut was to close this, as there's not anything immediately actionable. However, as you've presented a reasonable use case that isn't "I prefer MySQL", I'm accepting this feature for 1.0.

from diesel.

iqualfragile avatar iqualfragile commented on May 18, 2024

make sure (pre 1.0) that the libraries design allows generating database-agnostic sql (or well plugging in custom sql generators for different databases), i think that might be easily forgotten when mainly using postgress as backend.

from diesel.

sgrif avatar sgrif commented on May 18, 2024

Yup, supporting 1 means supporting N

from diesel.

tafia avatar tafia commented on May 18, 2024

Could you describe what has to be done to implement a new database ?
I'd also like to have sqlite (in particular) implemented in diesel and I'd like to help if possible.

from diesel.

sgrif avatar sgrif commented on May 18, 2024

@mfpiccolo @samphippen @mcasper It's long, and pretty technical, but I'd like to get thoughts on the above. Also if any of you are interested in helping tackle some of this, have at it.

@tenderlove, @rafaelfranca, @senny, @metaskills -- I know none of you are involved in this project or Rust, but I think you guys would be able to provide some valuable insight as well. I'm happy to clarify our API and how these problems relate to it in ways that don't require knowledge of the language if you'd like.

@metaskills: In addition, I'd be interested to hear if you think this will be enough for SQL Server. I've more or less looked at this from a high level WRT MySQL as the third case, but knowing that it can handle another DB is a good indication that we will in fact be generic after this.

After writing this up, I think I've got my thoughts organized enough that I can see a reasonable path to implementing this. From what I can tell, the proposed solutions will also put us in a place for others to write additional database adapters without having to have them in the primary crate. We can essentially put this to the test by having postgres and sqlite live in separate crates. @e-oz, you've expressed interest in having MySQL support in the past. It'd be great if you were interested in working on that as a third party crate, as well.

from diesel.

sgrif avatar sgrif commented on May 18, 2024

@matthewd If you don't mind, you might have some thoughts too

from diesel.

iqualfragile avatar iqualfragile commented on May 18, 2024

About differences in types supported by different databases:
Another option would be providing a fallback, so for example just saving an ip address, which is a native type in postgres into a 32 bit integer in sqlite.

from diesel.

iqualfragile avatar iqualfragile commented on May 18, 2024

Would it be an option to just put support for different engines into different modules, so you can just import the right one for your application? That would remove the ability to dynamically configure your application to use another database, but might reduce coding complexity.

from diesel.

sgrif avatar sgrif commented on May 18, 2024

@iqualfragile That's essentially the scorched earth option at the end, but it doesn't work terribly well if you want to have traits people can depend on regardless of backend. Ideally our core types (in the rust sense not the SQL sense, e.g. what comes out of the expression module) don't change based on your backend.

But yes, not having to be properly generic over the backend would reduce coding complexity, and I'm open to it as a last resort. I'm confident after writing up these thoughts that we can properly become generic though. Bind params are the hardest part and I think we have some good options.

from diesel.

Aatch avatar Aatch commented on May 18, 2024
  • Is the compiler smart enough to eliminate some copying if I only use one field of the returned struct, and the function impl is just a struct literal?

Probably.

  • Is this still true if it's going through a trait object?

Probably not, unfortunately. Since it can't inline a virtual call, it needs to allocate the full return value on the stack for the function to use.

  • Does it even matter? Is our worst case for a performance hit a few extra i32 copies? We should benchmark regardless

Depending on the size of the returned value, it probably won't make a huge difference. When dealing with databases, you have to compare this stuff to the performance of I/O. Reading/writing to the network or a file is going to dominate any profile, so worrying about a few extra copies here and there is probably not going to matter.

from diesel.

sgrif avatar sgrif commented on May 18, 2024

it probably won't make a huge difference. When dealing with databases, you have to compare this stuff to the performance of I/O.

I agree. That said, it'd still be nice to have benchmarks to back up the fact that we haven't caused any regressions in cases like finding a single record by primary key, where our query builder has the largest hit. Additionally, being able to show that our query builder is effectively 0 cost is a nice bullet point. That said, yeah I don't think the metadata case will be nearly as big of a hit as changing to_sql to dynamic dispatch would be.

from diesel.

iqualfragile avatar iqualfragile commented on May 18, 2024

there is another big problem with sqlite, as its column types are more of a recommendation, i.e. you can store a string into an integer field and sqlite won't complain. while i strongly doubt this should be supported for the insert/update, it needs to be kept in mind on the select side of things.

from diesel.

sgrif avatar sgrif commented on May 18, 2024

This is true. However, I think leaning on the type safety provided by our current query builder is fine here. Based on where sqlite has caused issues in Active Record, I believe we should be safe.

from diesel.

mfpiccolo avatar mfpiccolo commented on May 18, 2024

Nice job on the roadmap. I agree that if we can pull it off the adapters should be third party crates.

Here are some of my thoughts:

'Cursor' ... connection specific information in their signatures.

This sounds a bit rough but like you said we will be handled by impl Trait then that should be fine in the interim.

done in a way that additional adapters can be added as third party crates

👍

Array is an unfortunate loss.
JSON is an additional unfortunate loss

Meaning loss as in they will need to be supported in an external crate?

We should benchmark regardless

Ensuring that we don't have regressions seems like a good move if diesel is to power much of the database interaction in the Rust community.

unsafe casts from Vec -> &u8 -> *const u8 -> *const i32

The small subset of types that SQLite supports is going to be the default types that are supported in diesel core, so this will affect all the adapters even though it wouldn't be necessary if we didn't support SQLite. I certainly defer to your better judgement on this:

we should in theory be guaranteed that we're fine by the layer above.

but it might be prudent to be sure of this before deciding if it is worth it to support SQLite with this route.

from diesel.

metaskills avatar metaskills commented on May 18, 2024

Thanks for including me Sean. I think I can speak to one point well about bind params and my experience with SQL Server and Ruby's TinyTDS gem. I know of no C API outside of the ODBC layer for the DBLIB protocol that is essentially SQL Server. I punted on this use TSQL when I did support for bind params in the SQL Server adapter. Check out this really old engineyard post for details. Basically, we do bind params at the SQL layer by wrapping the SQL as a param to a stored procedure which has n arguments that include the type and value of each param. I am unsure if this, like SQLite, is at odds with your architecture. Happy to go into details on any topic if you think it helps.

from diesel.

sgrif avatar sgrif commented on May 18, 2024

Meaning loss as in they will need to be supported in an external crate?

Loss as in they will need to be adapter specific (which likely means being in another crate)

but it might be prudent to be sure of this before deciding if it is worth it to support SQLite with this route.

If our type system doesn't cover any potential problems, it's an issue regardless of if we're supporting SQLite or not.

from diesel.

sgrif avatar sgrif commented on May 18, 2024

Moving the timeframe on this forward to 0.5, as it's our most immediately actionable feature. I expect to have this finished this weekend.

from diesel.

placrosse avatar placrosse commented on May 18, 2024

While not perfect, www.freetds.org does have a 'C' library that can connect to Sybase and MS SQL Server, including some support for the dblib API. We used rust-bindgen with it, and are testing it out now.

I'm very interested in seeing if it can be made to work with Diesel (eventually).

from diesel.

metaskills avatar metaskills commented on May 18, 2024

@placrosse The TinyTDS gem makes exclusive use of the DBLIB API in FreeTDS. Love it.

from diesel.

sgrif avatar sgrif commented on May 18, 2024

@placrosse I'm interested in making sure that it's possible to add additional adapters that aren't part of the main diesel crate. If you're interested in trying to add TDS support as a third party crate once we finish the SQLite stuff, let me know -- I'd be interested in working with you on it.

from diesel.

placrosse avatar placrosse commented on May 18, 2024

Yes, @sgrif , I am very interested in doing just that.

from diesel.

sgrif avatar sgrif commented on May 18, 2024

We still have a little bit more work to do, but I've merged our SQLite support branch into master. We'll be releasing 0.5 with this feature later this week.

from diesel.

ixjf avatar ixjf commented on May 18, 2024

In regards to MySQL, are there any plans for support in the near future, or is it still slated for close-to-1.0? This is not an "I prefer MySQL" as in "I don't want to work with PostgreSQL", but the project I'm working on uses MySQL for other services and it's just not feasible to move to PostgreSQL and SQLite is not a good alternative due to its performance.

from diesel.

fables-tales avatar fables-tales commented on May 18, 2024

@ixjf the diesel core team have explicitly decided we will not be adding first party support for MySQL. That said, we welcome anyone who is interested in building a 3rd party crate that integrates with diesel to leverage this support. If you'd like to do that, we'd absolutely love to see it and we'd be able to support you to some degree. Thanks ❤️

from diesel.

Related Issues (20)

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.