Type conversions refactor split into more logical parts, plus a complete PEPification

<a class="user-mention notranslate" data-hovercard-type="user" data-hovercard-url="/us

Type conversions refactor bis,about django-nonrel/djangotoolbox

Comments (45)

emperorcezar commented on July 19, 2024

tests for the natural usage of ListField with ForeignKeys are on a separate branch);

Interested in this. Does it accomplish the same thing I tried with this pull request django-nonrel/mongodb-engine#83 and my custom ForeignKey field https://github.com/emperorcezar/Django-utils/blob/master/fields.py ?

from djangotoolbox.

wrwrwr commented on July 19, 2024

You'll still have to use model_instance.pk rather than simply model_instance (but other than that it should now work with Mongo, not just GAE). I did some research on the issue, but so far this only ended with some tests and a broken, one-sided solution:
https://github.com/django-nonrel/djangotoolbox/compare/feature/collections-with-foreignkeys-tests

The hard part seems to be making ForeignKeys in a ListField fetched from the database appear like model instances -- Django uses descriptors / managers for that, so it would be nice to do something along this line for collection fields.

from djangotoolbox.

emperorcezar commented on July 19, 2024

In your tests you seem to be pushing, querying, and returning model instances. In what circumstances do you need to use model_instance.pk?

from djangotoolbox.

jonashaag commented on July 19, 2024

Thanks so much for the amazing work!

In my opinion, however, a big part of the style changes are just wrong.

We really don't need to break all lines after 79 characters. 90 chars is okay sometimes. For instance, django-nonrel/mongodb-engine@12ae01c#L1L57 is not improving readability at all.
Parenthesis-imports are okay and much more readable than using \. django-nonrel/mongodb-engine@12ae01c#L2L8
Not all comments need to end in a dot, nor do exception messages need to end in a dot.
XXX is not the same thing as TODO
django-nonrel/mongodb-engine@12ae01c#L21L8 django-nonrel/mongodb-engine@12ae01c#L50L22 okay these are really stupid. We won't do any styling on autogenerated/auto"provided" code.
django-nonrel/mongodb-engine@12ae01c#L54L9 this doesn't help readability either.
...

I really really appreciate your hard work but you need to stop thinking about PEP8 (and friends) as hard rules. They're just guidelines and it's okay to think on your own sometimes.

from djangotoolbox.

wrwrwr commented on July 19, 2024

Returning model instances is not solved -- the test will just fail there; the solution on the foreignkey branch is just a (bad) idea. Take
a look at this test: https://github.com/django-nonrel/djangotoolbox/blob/feature/type-conversions-refactor-2/djangotoolbox/tests.py#L230, this is what works after refactor.

from djangotoolbox.

wrwrwr commented on July 19, 2024

Update: due to the renaming of DatabaseCreation.nonrel_db_type the nonrel branch:
https://github.com/django-nonrel/django-nonrel/compare/feature/drop-related-db-type
is required for the -2 branches to work (not having it results in an infinite recursion).

from djangotoolbox.

jonashaag commented on July 19, 2024

Woa GitHub's formatting of email replies is really horrible. Can you just re-post the whole thing using the webinterface?

from djangotoolbox.

emperorcezar commented on July 19, 2024

Returning model instances is not solved -- the test will just fail there; the solution on the foreignkey branch is just a (bad) idea. Take a look at this test: https://github.com/django-nonrel/djangotoolbox/blob/feature/type-conversions-refactor-2/djangotoolbox/tests.py#L230, this is what works after refactor.

The only way I found to solve that was overriding the to_python on the ForeignKey field and calling rel.to.objects.get which worked well in my case, but my bet is you're not overriding the ForeignKey field.

https://github.com/emperorcezar/Django-utils/blob/master/fields.py#L18

from djangotoolbox.

wrwrwr commented on July 19, 2024

In my opinion, however, a big part of the style changes are just wrong.

Probably you're simply right, nevertheless I'll try to explain myself.

We really don't need to break all lines after 79 characters. 90 chars is okay sometimes. For instance, django-nonrel/mongodb-engine@12ae01c#L1L57 is not improving readability at all.

I'd say 120 is always OK. There may be a few that did decrease readability, but in most cases there's no harm done. And the time spent thinking if 80 / 90 / 100 is fine here or there can be better used.

Parenthesis-imports are okay and much more readable than using \. django-nonrel/mongodb-engine@12ae01c#L2L8

OK, but lets use the same thing in toolbox and GAE too. Fixed.

Not all comments need to end in a dot, nor do exception messages need to end in a dot.

Yup, this is my personal bias. Can't be helped, sorry :-)

XXX is not the same thing as TODO

Undone. Maybe lets also explain XXX in docs somewhere -- I can think of several different meanings of it (like FIXME, like HACK, "had no time to properly check", "a critical problem" or any of these); some more TODOs should probably be turned to XXXs then.

django-nonrel/mongodb-engine@12ae01c#L21L8 django-nonrel/mongodb-engine@12ae01c#L50L22 okay these are really stupid. We won't do any styling on autogenerated/auto"provided" code.

Quite right, some mindless automation must have been in work here. Fixed.

django-nonrel/mongodb-engine@12ae01c#L54L9 this doesn't help readability either.

It (sometimes) does help maintainability though (you don't have to reindent after adding another case) and (often) does not harm readability significantly. Some guides recommend this explicitly -- the Google one above, but also some for different languages (surely you'll find some that say differently); I'd say the recommendation is sound. Of course, there are exceptions, but I don't think this specific case deserves to be one.

I really really appreciate your hard work but you need to stop thinking about PEP8 (and friends) as hard rules. They're just guidelines and it's okay to think on your own sometimes.

Maybe you'd be surprised to see the styling I tend to use in private (with a lot of camelCase, whole paragraphs on a single line, and even most inline comments lowercase ;-) However, I believe it's good to stick to something for a project that many are working on -- this doesn't have to be PEP or anything other, just something.

I wouldn't like to push these style changes against anyone, so if you'd rather have just the refactor that's OK (some renaming and merging will have to be done, but otherwise it shouldn't be very hard to just drop these first commits).

from djangotoolbox.

wrwrwr commented on July 19, 2024

@emperorcezar Yes, I had the ambition of making it work with the standard ForeignKey :-) I've also given a short attempt at the descriptor approach (something like defining __get__ and __set__ for AbstractIterableField) -- didn't work, but should be possible.

You should be able to drop the db_type method in your class and do just:

def get_db_prep_value(self, value, connection, prepared=False):
    if isinstance(type(value), type(self.rel.to)):
        return value.pk
    else:
        return value

instead of both get_db_prep_save and get_db_prep_lookup with the refactor.

from djangotoolbox.

jonashaag commented on July 19, 2024

Okay just did the review. This looks really really great. I added a few comments here and there. I didn't review the App Engine part because I'm not familiar with (nor interested in) it.

Thanks thanks thanks!

from djangotoolbox.

jonashaag commented on July 19, 2024

ah btw I did some MongoDB Engine compiler refactoring a while ago, if you like you can incorporate what's left after your refactoring. https://github.com/django-nonrel/mongodb-engine/tree/feature/compiler-cleanup

from djangotoolbox.

jonashaag commented on July 19, 2024

looks really great now. RFC from my point of view for djangotoolbox and MongoDB Engine.

from djangotoolbox.

wrwrwr commented on July 19, 2024

@jonashaag Thanks for the review!

Second update:

Fixed in-memory filtering (matches_filters only used by GAE). It was comparing non-primary key model values with database values, which doesn't work in a few cases.
Fixed decimal sorting for Mongo (by pulling GAE solution to toolbox).
Did some more compiler refactoring:
- incorporating Jonas's branch;
- making toolbox pass converted values to filter processing (thus mostly letting back-ends only worry about writing a proper value_for/from_db and otherwise not having to deal with conversions at all; toolbox now also knows not to convert isnull arguments and can handle lists of arguments for in, range or year lookups properly, so writing those conversion methods should be simple);
- passing related model primary key field instead of an original related field to conversions (as generally you should convert a foreign key value in the same way as a "parent" primary key value), passing fields instead of columns to order_by and lookup name to value_for_db.
Coded an alternative solution for the AutoField "just ints" problem -- getting rid of the value_to_db_auto (we're blanking rest of the value_to_db_* methods as most conversions needed are not tied to a specific field kind) and also giving AutoField's validation its strength back. There are different solutions possible (global setting / descriptor instead of the metaclass / just some public methods...), please take a closer look at this one -- I've tried a couple of different things, but nothing seems perfect.
Retested for GAE 1.6.3 and MongoDB 2.0.3.

The Django branch now uses the same name as the rest:
https://github.com/django-nonrel/django-nonrel/compare/feature/type-conversions-refactor-2

from djangotoolbox.

jonashaag commented on July 19, 2024

Hey Wojtek, are you going to merge your branches soon?

from djangotoolbox.

wrwrwr commented on July 19, 2024

Anytime we decide that's the time :-) I could find some more things to improve or fix, but it would be better done on some new branches, these are already big enough.

from djangotoolbox.

jonashaag commented on July 19, 2024

If you think they're ready for merging, go ahead.

from djangotoolbox.

charettes commented on July 19, 2024

@wrwrwr would you be kind enough to wait a couple of days before merging this? I'd like to take a look at the patch and issue a review. I agree that it might be hard if not impossible to stick with the django's value_to_db_* idiom but to me it's a big step backward to make DateField.get_db_prep_(value|save) return a value that's not ready for database insertion. I mean... all the other backend work this way, how is someone porting code from django to django-nonrel is supposed to know that? What happens with people writing new field subclasses, do they have to write their custom backend to wrap value_to_db to return the correct type for their field? How is someone calling SetField.value_to_string is supposed to know that it will return a list?

from djangotoolbox.

wrwrwr commented on July 19, 2024

@charettes Certainly, but please compare to what nonrel currently has to offer.

DateField.get_db_prep_* -- you already have to use the nonrel conversions for many (all) fields (nonrel fields, AutoFields, any "long string" fields with GAE etc.), so I find it more consistent this way (toolbox blanks most of the vanilla database conversions; additonal conversions are currently defined on compilers; at least you'll learn quicker that you should also pass values through them ;-) Potentially you could call value_for_db from Field.get_db_prep_* -- see a consideration of two future options: b4315fc#L0R109.

Field subclasses -- they don't need to do anything differently than they'd do with vanilla, the framework is due to back-ends needing to do additional value preparation.

value_to_string -- it's the same on develop... Nonrel fields serialization is still broken anyway, I believe the method should really look more or less like this (for List/Set/DictField, EmbeddedModelField needs a bit more):

return smart_unicode(self._map(self.item_field.value_to_string, self._get_val_from_obj(obj)))

However, I remember someone else was working on the serialization, and this is not the place to fix it.

from djangotoolbox.

wrwrwr commented on July 19, 2024

Third update:

Yet another AufoField type solution -- a bit nonstandard usage of Settings instead of the static variable, in case someone needs to know / set the type in a context in which fields can't be imported. Also turned the type mismatch exception into a warning -- there are some circumstances that back-ends with different AutoField types may still be used (e.g. without using AutoField at all).
Introduced a MONGO_INT_BASED_AUTOFIELDS setting (defaults to False) -- ObjectIds are the natural, scalable choice, but this seems the only way to run Django test suite (at the moment ~25% tests fail mostly due to ManyToManyFields / abstract models). The code is nearly a copy of the MongoDB docs one (an additional counters collection + findAndModify).
Moved remaining parts of the value conversion framework to DatabaseOperations, not only compilers may need to use them.
Did some small refactoring to _get_ordering that lets back-ends use query.standard_ordering directly when there is no field ordering defined.

from djangotoolbox.

jonashaag commented on July 19, 2024

I don't like the MONGO_INT_BASED_AUTOFIELDS thing. It doesn't help at all to make the Django test suite pass with integer IDs because all real-live use cases use ObjectIds.

I understand your desire to run the Django test suite against MongoDB Engine but please have the hacks required for that in a separate branch. (That of course may be on this GitHub repo!)

from djangotoolbox.

emperorcezar commented on July 19, 2024

I have to disagree. If we want our patches accepted into Django core, then we're going to have to deal with the tests. While this may not be the way, the issue needs to be tackled.

The "cleaner" way is to open tickets against the Django tests and patch those.

from djangotoolbox.

jonashaag commented on July 19, 2024

I totally agree Adam! But I don't think the Django folks care about our backend being testing well, only the changes we make to Django.

from djangotoolbox.

emperorcezar commented on July 19, 2024

Good point. I don't know how they test things. Do they have a test server that runs against each of the backends, or do they just run it against sqlite and call it good?

from djangotoolbox.

wrwrwr commented on July 19, 2024

Alright, I'll move the MONGO_INT_BASED_AUTOFIELD to a separate branch. The Django suite is quite slow anyway, hard to use in practice, but even with the AutoField substitute it can detect a lot of other problems (with the additional inconvenience of having to filter out all those "This query is not supported(...)" / "Multi-table inheritance(...)" / "Cursors (...)" failures). I had a slight hope you won't consider this a hack, but an alternative solution :-)

from djangotoolbox.

jonashaag commented on July 19, 2024

https://code.djangoproject.com/changeset/17641
https://code.djangoproject.com/changeset/17698

probably fixes the same thing you did with explicit lazy object handling?

from djangotoolbox.

wrwrwr commented on July 19, 2024

Almost looks so, thanks for the pointers, should I rather remove the checks already or mark them for consideration in 1.4?

Nevertheless, these changes make me wonder about proxies in updates / lookups (toolbox currently only does the lazy evaluation for lookups) and wrappers around objects that shouldn't be cast to a string... Take a look at this comment too: https://code.djangoproject.com/ticket/10498#comment:12.

from djangotoolbox.

jonashaag commented on July 19, 2024

I'd keep the test, skipping until 1.4, and remove the fixing code.

from djangotoolbox.

wrwrwr commented on July 19, 2024

Hmmm, the test will likely fail with 1.4 too (it has cases for lookups with lazy translations and it doesn't seem Django will get any further refactoring for 1.4 -- unless I've missed some big ticket; currently it doesn't need to handle this case because lookup values are always cast to strings for SQL back-ends).

The issue is not significant enough to really care about it and I'm not sure what will actually happen with 1.4, but I guess I'll leave the code on another branch; we might want to bring it back (at least to some future GAE database-adapter or just for lookups).

from djangotoolbox.

charettes commented on July 19, 2024

@emperorcezar here's the CI they use to test against each backend. Looks like they've broken something recently... I should have time to review the pull request today, sorry for the delay.

from djangotoolbox.

emperorcezar commented on July 19, 2024

Hmm. There's not oracle on there so they don't seem to run it against all official backends.

from djangotoolbox.

charettes commented on July 19, 2024

@emperorcezar It used to be there, they removed it because they were having trouble setting it up on the ci box.

from djangotoolbox.

wrwrwr commented on July 19, 2024

@jonashaag I've checked that the Django changes don't fix lookups (nor some potential cases with using conversions directly, not through the sql.Query) and confirmed that's also a problem for MongoDB (fails with InvalidDocument: cannot convert value of type <class 'django.utils.functional.__proxy__'> to bson), so I'm only marking this for reconsideration for now (testing if it's a lookup or not just adds an unnecessary condition).

from djangotoolbox.

jonashaag commented on July 19, 2024

right

from djangotoolbox.

wrwrwr commented on July 19, 2024

Merged into develop. Thanks to everyone involved!

from djangotoolbox.

jonashaag commented on July 19, 2024

There's a flaw in your AutoField handling implementation -- it doesn't not allow for using e.g. SQL and MongoDB at the same time (because the AutoField type is hardcoded).

The AutoField type should depend on the backend used for each model.

from djangotoolbox.

wrwrwr commented on July 19, 2024

Yes, that's true, this was by design.

Models/fields are not tied to back-ends, they only get a database connection when preparing a value for the database, however some AutoField (type-dependent) value processing may happen without any database interaction (value validation, serialization / deserialization).

If we want to support multi-database settings with differing AutoField types there won't be effective field-level value validation, what can potentially result in problems in non-database related areas (which may expect AutoField values to be properly validated even though they have not passed through the database layer). It's more or less what we had previously, so maybe that's acceptable?

I've repushed the branches with two more solutions (two commits for the nonrel repo: moving to a wrong place or dropping the validation). I can also bring back the original check:

if not (value is None or isinstance(value, (basestring, int, long))):
    raise exceptions.ValidationError(self.error_messages['invalid'])

but that doesn't seem much better than doing no validation at all (and doesn't work for things that could be cast to unicode or int, but are not instances of their subclasses).

from djangotoolbox.

emperorcezar commented on July 19, 2024

It's critical to support multi-database settings. That's been touted as a benefit of using non-rel for a very long time. To break it would break a lot of installations.

from djangotoolbox.

wrwrwr commented on July 19, 2024

I guess I'll merge one of the "database setting" solutions tomorrow then. I'm a bit worried about the ineffective validation, but seems there is no way around it. A careful analysis on how AutoField values are used would be nice -- with nonrel changes (master or develop-to-be) something like this: mark_safe("The id is <em>%s</em>." % field.pk) may become a script-injection vector where it previously wasn't and things like log spoofing may be a bit easier to carry out (but the lack of the (field-layer) validation shouldn't make any vulnerability on its own).

Do you want the old check back (so I'm not the one to blame ;-) or should I rather drop the validation completely?

from djangotoolbox.

jonashaag commented on July 19, 2024

These questions can get pretty philosophic btw ;-)

I guess I prefer the solution that does validation based on the backend's AutoField type (i.e. PKs in forms for a MongoDB model must be ObjectIds). So the backend-setting approach probably makes most sense. However why don't we just use connection.ops.convert-autofield-value-to-what-the-backend-understands()? So that to-autopk-conversions are nicely abstracted away from the model code.

from djangotoolbox.

wrwrwr commented on July 19, 2024

Back-end setting makes a lot of sense, but it's not possible to validate using it (field can validate a value that later may be used by various back-ends or no back-end; technically it's something separate from the database layer, you don't have the connection or anything like that available). I did an attempt of pushing the validation for AutoField to the database layer, but that turns rather ugly -- Django doesn't expect it there, and it's not effective until the field is stored in the database.

The database casts / conversions are another thing, you're right: a method rather than a type-setting is better here, and that's actually what we had before :-)

So the current situation is (on tcr-2 branches):

value_to_db_auto is back (with a minor cleanup I did some time in the past, but generally that's the original solution; the casts could be moved to value_for_db if it makes it into Django -- but that should probably happen together with a deeper abstraction refactor that's described on the SoC page);
AutoField's field-layer validation is dropped -- there is no way of doing it properly and I believe not keeping the instance check makes the situation clearer (you should expect AutoFields to hold values of various types, including arbitrary strings, and pass them through output filtering suitable for context).

from djangotoolbox.

jonashaag commented on July 19, 2024

I don't get why you think this isn't something that can be validated properly -- after all, this is code of the database abstraction layer, so we can safely assume we have a database connection (more specifically a DatabaseWrapper instance). Maybe you're confusing form validation and model validation here?

from djangotoolbox.

wrwrwr commented on July 19, 2024

This is all about field/form validation (done in to_python where there's no connection and for this purpose it can't be easily made available), this is the one that have been changed/dropped -- vanilla Django assumes that AutoField values will be ints and enforces this "on input". There's no impact on the database layer (it does it's own "validation", quoting or casts; everything looks good on this level), but the int-type may have, potentially, been assumed elsewhere.

from djangotoolbox.

jonashaag commented on July 19, 2024

Agreed. The problem is that you're not thrown a validation error if you input invalid values into a FK form field but a DatabaseError which results in a HTTP 500.

But that's kind of a no-realworld-relevance usecase and the fix promises to be complex, so I'm fine with no validation at all.

Feel free to merge the branches if you think they're ready.

Well, seems like we've got this beast done :-) Thank you so much for the hard work!

from djangotoolbox.

wrwrwr commented on July 19, 2024

Remerged, thanks again for all the reviews and discussions, wouldn't get anywhere without that.

from djangotoolbox.

Type conversions refactor bis about djangotoolbox HOT 45 CLOSED

Comments (45)

Related Issues (20)

Recommend Projects

React

Vue.js

Typescript

TensorFlow

Django

Laravel

D3

Recommend Topics

javascript

web

server

Machine learning

Visualization

Game

Recommend Org

Facebook

Microsoft

Google

Alibaba

D3

Tencent