Comments (8)
That's a good idea. Besides, this doesn't seem hard to add: just create a new ByteField that mimics marshmallow's String, but with a check on bytes instead of basestring in _deserialize.
PRs welcome 👍
from umongo.
@touilleMan @chenjr0719 @lafrech @martinjuhasz
Guys, this issue seems dead old, but this is where search engines lead you when you're looking for ways to add binary fields to a uMongo document. In my opinion, it is a shame that Marshmallow does not provide a binary field. They do it on purpose, but there are really no good reasons for doing so:
- BSON spec has Binary and Mongo supports it.
- A binary field is needed for numerous applications (whether it is an avatar, a hash or a small blob). And this is where Mongo is strong in terms of efficiency.
- People are trying to store bytes either as a UTF-8 encoded string, which will one day fail outright (example: b'\xd5\xce\xe1\x86\xcf'), or as a base64-encoded value. The latter is more reliable, but it introduces inconveniences (no obvious way to check length, slice, ... without decoding first) and computation overhead.
- Others are trying to store blobs in GridFS. Stack Overflow is full of such recommendations. Of course, this is not the use case GridFS was originally made for.
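To make the two workarounds above concrete, here is a small standard-library sketch. It only uses the byte sequence quoted in the list; the variable names are illustrative and nothing here is uMongo or Marshmallow API.

```python
import base64

raw = b'\xd5\xce\xe1\x86\xcf'  # the byte sequence quoted in the list above

# Treating arbitrary bytes as UTF-8 text fails as soon as they are not valid UTF-8.
try:
    raw.decode('utf-8')
    decoded_ok = True
except UnicodeDecodeError:
    decoded_ok = False

# base64 always round-trips, but the stored value grows (4 characters for
# every 3 bytes, padded) and has to be decoded before slicing or measuring.
encoded = base64.b64encode(raw)
restored = base64.b64decode(encoded)

print(decoded_ok)               # False
print(len(raw), len(encoded))   # 5 8
print(restored == raw)          # True
```

Storing the raw bytes as BSON Binary avoids both the decoding failure and the size and CPU overhead of the base64 round trip.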
Conclusion from the above: uMongo needs BinaryField. If Marshmallow guys refuse to add support for it – f*ck them, let's do it in uMongo
Unfortunately, I'm not a uMongo developer and haven't dug deep into how everything works. Here is an example of a BinaryField I came up with:
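import bson
# marshmallow.compat is only available in marshmallow 2.x
from marshmallow import compat as ma_compat, fields as ma_fields
from umongo import fields


class BinaryField(fields.BaseField, ma_fields.Field):
    default_error_messages = {
        'invalid': 'Not a valid byte sequence.'
    }

    # Document -> dumped data: pass the value through as bytes.
    def _serialize(self, value, attr, data):
        return ma_compat.binary_type(value)

    # Loaded data -> document: accept bytes only, no implicit encoding.
    def _deserialize(self, value, attr, data):
        if not isinstance(value, ma_compat.binary_type):
            self.fail('invalid')
        return value

    # Document -> MongoDB: wrap the bytes in a BSON Binary.
    def _serialize_to_mongo(self, obj):
        return bson.binary.Binary(obj)

    # MongoDB -> document: unwrap the BSON Binary into plain bytes.
    def _deserialize_from_mongo(self, value):
        return bytes(value)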
Maybe there are some obscure caveats, maybe not. This is the code I currently have in my project and it seems to work like a charm. (I'm using Motor.)
It would be nice if someone familiar with the internals of uMongo could take a look.
Okay, good. But what would the default serialization method do? Byte data isn't necessarily convertible into a string, right? How would someone want a byte field to be serialized?
In my special case the byte field represents some pickled state of an object that I don't even want to be serialized and sent over my API (is there a way to exclude fields on serialization?).
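from marshmallow import fields as ma_fields
from umongo.fields import BaseField


class ByteField(BaseField, ma_fields.String):
    def _deserialize(self, value, attr, data):
        # Accept bytes as-is; everything else goes through String's checks.
        if isinstance(value, bytes):
            return value
        return super()._deserialize(value, attr, data)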
This works fine for storing, and if the stored byte field is valid UTF-8 it gets converted into a string on serialization.
Bytes is a valid BSON type (named "Binary data" in the MongoDB types).
Besides, it seems pymongo does the bytes <=> Binary data conversion by itself:
>>> hello = 'héllo'
>>> doc_id = db.test.insert({'str': hello, 'bytes': hello.encode()})
>>> db.test.find_one(doc_id)
{'bytes': b'h\xc3\xa9llo', 'str': 'héllo', '_id': ObjectId('57ad9b0713adf23b7095fcee')}
So I think the _deserialize method should check that the incoming data is bytes, and that's it! pymongo will gladly take care of those bytes for us ;-)
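As a rough illustration of that check, here is a standalone sketch that does not depend on marshmallow or umongo. The helper name deserialize_bytes is hypothetical; in a real field the same test would live in _deserialize and report the failure through the field's own error mechanism.

```python
def deserialize_bytes(value):
    """Accept only bytes; reject str and everything else without decoding."""
    if not isinstance(value, bytes):
        raise ValueError('Not a valid byte sequence.')
    return value


accepted = deserialize_bytes('héllo'.encode())  # bytes pass through untouched

try:
    deserialize_bytes('héllo')  # a str is rejected, not silently encoded
    rejected = False
except ValueError:
    rejected = True
```

The value is returned unchanged on purpose, so whatever pymongo stores is exactly what the caller provided.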
> In my special case the byte field represents some pickled state of an object that I don't even want to be serialized and sent over my API (is there a way to exclude fields on serialization?).
Yes, there is! You should use the load_only attribute for your field. This way it will never be serialized.
I guess you should also use the dump_only attribute so that your API does not accept incoming data for this field during deserialization.
@instance.register
class MyDoc(Document):
    # BytesField is the field proposed in this issue
    pickled_stuff = fields.BytesField(load_only=True, dump_only=True)
    public_name = fields.StrField()

# inside your POST API
payload = get_payload_from_request()
# raises ValidationError if a 'pickled_stuff' field is present
my_doc = MyDoc(**payload)
assert my_doc.pickled_stuff is None
my_doc.pickled_stuff = pickle_my_stuff()  # must return bytes
my_doc.commit()
return 200, 'Ok'

# inside your GET API
my_doc = MyDoc.find_one({'id': my_id})  # find() would return a cursor
print(my_doc)
# <... {'pickled_stuff': b'<pickled data>', 'public_name': 'test' }...>
my_doc.dump()
# {'public_name': 'test'}
return 200, json.dumps(my_doc.dump())
You should also have a look at the flask example, which shows you how to use umongo inside an API with a custom loading/dumping schema.
Thanks for sharing this, great stuff!
So you think BytesField should try to serialize using ensure_text_type (as it does when inheriting from BaseField, ma_fields.String)? It will fail on binary data that's not UTF-8 encoded, but I guess that's fine, because if you want binary data to be serialized you would have thought about encoding before storing it.
My pickled data would fail on serialization:
UnicodeDecodeError: 'utf-8' codec can't decode byte 0x80 in position 0: invalid start byte
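The error above can be reproduced with the standard library alone. In this sketch the state dict is a made-up stand-in for the pickled object; the point is only that binary pickle output begins with byte 0x80, which is never a valid UTF-8 start byte.

```python
import pickle

state = {'counter': 3, 'tags': ['a', 'b']}  # hypothetical object state
blob = pickle.dumps(state)

# Binary pickle protocols (2 and later) begin with the PROTO opcode, 0x80.
first_byte = blob[0]

try:
    blob.decode('utf-8')
    error = None
except UnicodeDecodeError as exc:
    error = exc  # invalid start byte at position 0

# The bytes themselves are intact and unpickle back to the original state.
restored = pickle.loads(blob)
```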
I think we should not try to do any string encoding/decoding inside umongo. It should be the user's responsibility to provide bytes, given that we would otherwise have to make too many assumptions about their workflow (which encoding does the user want? what to do with bytes that can't be decoded? etc.).
Besides, this makes the implementation more straightforward and simple, so why bother ;-)
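As one possible reading of "user responsibility", the sketch below has the application, not the field, pick the text encoding at the API boundary. The helper names dump_for_api and load_from_api are hypothetical, and base64 is just one choice among many.

```python
import base64
import json


def dump_for_api(doc):
    """Return a JSON string; bytes values are base64-encoded explicitly."""
    out = {}
    for key, value in doc.items():
        if isinstance(value, bytes):
            out[key] = base64.b64encode(value).decode('ascii')
        else:
            out[key] = value
    return json.dumps(out, sort_keys=True)


def load_from_api(payload, binary_keys):
    """Parse JSON and turn the named base64 fields back into bytes."""
    data = json.loads(payload)
    for key in binary_keys:
        data[key] = base64.b64decode(data[key])
    return data


doc = {'public_name': 'test', 'avatar': b'\x80\x03\xff'}
payload = dump_for_api(doc)
round_tripped = load_from_api(payload, binary_keys=['avatar'])
```

The field itself only ever sees bytes, so it never has to guess an encoding or decide what to do with undecodable data.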
Is this enhancement still needed? If so, I'd like to give it a try.
@thodnev Thanks for the code, I'm going to use it as I need the ability to store binary data (in this case, a password salt created from os.urandom). If this code works, I can't imagine it would be too hard to add to a PR (if you haven't already).
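For context, this is roughly what the salt use case looks like with the standard library only. The function names and the iteration count are illustrative assumptions, not a recommendation; the relevant point is that both the salt and the derived hash are raw bytes that would go into a binary field.

```python
import hashlib
import hmac
import os


def hash_password(password, salt=None):
    """Return (salt, digest); both are raw bytes, usually not valid UTF-8."""
    if salt is None:
        salt = os.urandom(16)
    # The iteration count is illustrative only.
    digest = hashlib.pbkdf2_hmac('sha256', password.encode('utf-8'), salt, 100_000)
    return salt, digest


def verify_password(password, salt, expected):
    """Recompute the digest with the stored salt and compare in constant time."""
    _, digest = hash_password(password, salt)
    return hmac.compare_digest(digest, expected)


salt, digest = hash_password('s3cret')
```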