scottrogowski / mongita
"Mongita is to MongoDB as SQLite is to SQL"
License: BSD 3-Clause "New" or "Revised" License
mongita.errors.InvalidName: Collection cannot be named 'Nerve_Tibial.v8.egenes_ann_query_res.vcf'.
It seems to me that prohibiting non-letter symbols in collection names is superfluous. Incidentally, MongoDB does not have this restriction.
wrong: mongoose_types.insert_many([{'name': 'Meercat', 'not_into', 'Snakes'},{'name': 'Yellow mongoose': 'eats': 'Termites'}])
desc: the argument is not a valid Python dict
wrong: list(coll.find({'weight': {'$gt': 1}))
coll.delete_one({'name': 'Meercat'})
desc: the variable coll does not exist; it should be mongoose_types
I'm experiencing weird issues:
the API throws a KeyError exception when accessing a dict, and after that the db is erased.
It seems to be a thread-safety issue? Reads and writes can't happen at the same time.
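If racing reads and writes are indeed the cause, one caller-side workaround is to serialize all database access through a lock. A minimal sketch (the lock and wrapper below are my own illustration, not part of mongita's API):

```python
import threading

_db_lock = threading.Lock()

def locked(op, *args, **kwargs):
    # Serialize every read/write so operations cannot interleave across threads.
    with _db_lock:
        return op(*args, **kwargs)
```

Callers would then write e.g. `locked(collection.insert_one, {'k': 1})` instead of calling `insert_one` directly.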
Having a collection like this
[
{
"_id": 1,
"results": [
{
"product": "abc",
"score": 10
},
{
"product": "xyz",
"score": 5
}
]
},
{
"_id": 2,
"results": [
{
"product": "abc",
"score": 8
},
{
"product": "xyz",
"score": 7
}
]
},
{
"_id": 3,
"results": [
{
"product": "abc",
"score": 7
},
{
"product": "xyz",
"score": 8
}
]
},
{
"_id": 4,
"results": [
{
"product": "abc",
"score": 7
},
{
"product": "def",
"score": 8
}
]
}
]
and a query like this
list(db.collection.find({
"results.product": "xyz"
}))
produces no result
[]
when running it against mongodb the query finds the elements correctly
[
{
"_id": 1,
"results": [
{
"product": "abc",
"score": 10
},
{
"product": "xyz",
"score": 5
}
]
},
{
"_id": 2,
"results": [
{
"product": "abc",
"score": 8
},
{
"product": "xyz",
"score": 7
}
]
},
{
"_id": 3,
"results": [
{
"product": "abc",
"score": 7
},
{
"product": "xyz",
"score": 8
}
]
}
]
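For reference, MongoDB's dotted-path semantics say that a path segment applied to an array matches if it matches any element of that array. A pure-Python sketch of the rule (my own illustration, not mongita's internal matching code):

```python
def dotted_match(value, parts, target):
    # "results.product" == "xyz" matches a document when any element of the
    # `results` array has product == "xyz".
    if not parts:
        return value == target
    if isinstance(value, list):
        return any(dotted_match(item, parts, target) for item in value)
    if isinstance(value, dict) and parts[0] in value:
        return dotted_match(value[parts[0]], parts[1:], target)
    return False

# e.g. dotted_match(doc, "results.product".split('.'), "xyz")
```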
CC: @dgutson
I am trying to install mongita on Python 3.9 on a Windows machine and am getting the following error. Any ideas?
$ pip install mongita
Collecting mongita
Using cached mongita-1.0.0.tar.gz (33 kB)
ERROR: Command errored out with exit status 3221225477:
command: 'D:\anaconda\envs\py39\python.exe' -c 'import sys, setuptools, tokenize; sys.argv[0] = '"'"'C:\Users\len_w\AppData\Local\Temp\pip-install-ee8rj6s8\mongita\setup.py'"'"'; file='"'"'C:\Users\len_w\AppData\Local\Temp\pip-install-ee8rj6s8\mongita\setup.py'"'"';f=getattr(tokenize, '"'"'open'"'"', open)(file);code=f.read().replace('"'"'\r\n'"'"', '"'"'\n'"'"');f.close();exec(compile(code, file, '"'"'exec'"'"'))' egg_info --egg-base 'C:\Users\len_w\AppData\Local\Temp\pip-pip-egg-info-m1bl3ksn'
cwd: C:\Users\len_w\AppData\Local\Temp\pip-install-ee8rj6s8\mongita
Complete output (11 lines):
running egg_info
creating C:\Users\len_w\AppData\Local\Temp\pip-pip-egg-info-m1bl3ksn\mongita.egg-info
writing C:\Users\len_w\AppData\Local\Temp\pip-pip-egg-info-m1bl3ksn\mongita.egg-info\PKG-INFO
writing dependency_links to C:\Users\len_w\AppData\Local\Temp\pip-pip-egg-info-m1bl3ksn\mongita.egg-info\dependency_links.txt
writing requirements to C:\Users\len_w\AppData\Local\Temp\pip-pip-egg-info-m1bl3ksn\mongita.egg-info\requires.txt
writing top-level names to C:\Users\len_w\AppData\Local\Temp\pip-pip-egg-info-m1bl3ksn\mongita.egg-info\top_level.txt
writing manifest file 'C:\Users\len_w\AppData\Local\Temp\pip-pip-egg-info-m1bl3ksn\mongita.egg-info\SOURCES.txt'
reading manifest file 'C:\Users\len_w\AppData\Local\Temp\pip-pip-egg-info-m1bl3ksn\mongita.egg-info\SOURCES.txt'
reading manifest template 'MANIFEST.in'
warning: no files found matching 'LICENSE,'
writing manifest file 'C:\Users\len_w\AppData\Local\Temp\pip-pip-egg-info-m1bl3ksn\mongita.egg-info\SOURCES.txt'
----------------------------------------
ERROR: Command errored out with exit status 3221225477: python setup.py egg_info Check the logs for full command output.
Segmentation fault
In pymongo, using replace_one() with upsert=True will create a new document if it doesn't exist, and always use an ID provided in the filter if available. It appears that mongita will only use the provided ID if the document already exists. Otherwise, it creates a new ID via bson.ObjectId().
Here's an example to reproduce this:
from mongita import MongitaClientDisk
from pymongo import MongoClient

def test_ids(client):
    collection = client['test_db']['test_collection']
    collection.replace_one(
        {'_id': 'id_from_filter'},
        replacement={'key': 'value'},
        upsert=True,
    )
    doc = collection.find_one({'_id': 'id_from_filter'})
    print(f'Fetched document by ID: {doc}')
    print('All IDs:')
    for d in collection.find({}):
        print(d['_id'])

print('pymongo\n----------')
test_ids(MongoClient())
print('\nmongita\n----------')
test_ids(MongitaClientDisk())
Output:
pymongo
----------
Fetched document by ID: {'_id': 'id_from_filter', 'key': 'value'}
All IDs:
id_from_filter
mongita
----------
Fetched document by ID: None
All IDs:
6356e86fd4d2dac326e38371
I believe it comes down to this section in Collection.__find_one_id():
Lines 823 to 826 in 0bc8e57
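For comparison, MongoDB's documented upsert rule can be sketched in a few lines (a hypothetical helper for illustration, not the code at the linked lines):

```python
def upserted_id(filter_doc, replacement):
    # MongoDB upsert rule: an exact (non-operator) _id in the filter wins;
    # next, an _id supplied in the replacement; otherwise generate a new one.
    if '_id' in filter_doc and not isinstance(filter_doc['_id'], dict):
        return filter_doc['_id']
    if '_id' in replacement:
        return replacement['_id']
    return 'new-object-id'  # stand-in for bson.ObjectId()
```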
@scottrogowski, this is a really nice project. The name is awesome too!
I have not had the chance yet to really give it a spin, however I think the benchmarks can be improved a bit.
I believe the SQLite performance comparison can be improved if you look at the cost of serializing a dict to JSON on insertion.
I think this is where most of the CPU cycles in the row insertion are consumed, which makes SQLite look so bad ...
def _to_sqlite_row(doc):
    doc['_id'] = str(doc['_id'])
    return (doc['_id'], doc['name'], doc['dt'], doc['count'], doc['city'],
            doc['content'], doc['percent'],
            json.dumps(doc['dict'], default=json_util.default))
Thanks for publishing this nifty little project!
In the SQLite schema you forgot to declare id as a primary key, or even to add an index on it. Thus all of your find_one() calls have O(n) performance.
I'm struggling to believe that this was unintentional, especially since you did bother to add indexes on other columns. Surely when you saw the jaw-droppingly bad numbers for SQLite in "get 1000 docs by ID", you would have investigated?
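For what it's worth, declaring the id column PRIMARY KEY gives SQLite an implicit index, turning id lookups from full table scans into indexed lookups. A standalone sketch (the table name and columns are made up for illustration):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# PRIMARY KEY creates an implicit index, so WHERE id = ? is no longer O(n).
conn.execute("CREATE TABLE docs (id TEXT PRIMARY KEY, content TEXT)")
conn.execute("INSERT INTO docs VALUES (?, ?)", ("a1", "hello"))
row = conn.execute("SELECT content FROM docs WHERE id = ?", ("a1",)).fetchone()
```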
Would it be preferable to use mongita memory over redis for caching? The reason being that I'm already using mongodb as my main database and implementing mongita as a cache layer seems to be the easiest route. How's the performance of it compared to redis?
While it is possible to use MongoEngine 0.22.1 with mongita-1.1.0 MongitaClientMemory:
import pymongo
import mongita
# This works
pymongo.MongoClient = mongita.MongitaClientMemory
import mongoengine
mongoengine.connect(host='c:/temp/mongita')
It does not work with MongitaClientDisk:
import pymongo
import mongita
# This fails
pymongo.MongoClient = mongita.MongitaClientDisk
import mongoengine
mongoengine.connect(host='c:/temp/mongita')
The MongitaClientDisk constructor always fails.
Not clear if this is just a version mis-match with mongoengine or not.
The error also affects the unit tests for mongita.
The problem seems to be in mongita_client.py in the MongitaClientDisk constructor where it
invokes:
disk_engine.DiskEngine.create(host)
This is because, according to:
https://pymongo.readthedocs.io/en/stable/api/pymongo/mongo_client.html
the host parameter to MongoClient() may be a list rather than a string. Since the
DiskEngine.create() factory method expects a string, it reports the error:
"unhashable type: list"
The simple fix is to test in the MongitaClientDisk constructor in mongita_client.py
whether the parameter is a list and, if so, pluck the first element. Since the
default value for mongoengine is 'localhost', this is worth checking for and replacing
with DEFAULT_STORAGE_DIR.
While we are at it, a check for the existence of the parent directory where the
database is to be located is worthwhile. This leaves us with:
def __init__(self, host=DEFAULT_STORAGE_DIR, **kwargs):
    host = host or DEFAULT_STORAGE_DIR
    if isinstance(host, list):  # pymongo allows a list of hosts
        host = host[0]
    if host == 'localhost':
        host = DEFAULT_STORAGE_DIR
    if not os.path.exists(os.path.dirname(host)):
        raise NotADirectoryError(os.path.dirname(host))
    self.engine = disk_engine.DiskEngine.create(host)
I'd be happy to generate a pull request, but I'd like to know that the pre-existing unit tests work on some system before
requesting a pull for a fix that may be out of date.
I see that it is written in the docs:
It is not process-safe
But the particular case is not clear. What about creating one client that works with a single collection, as one would with MongoDB? Is that a safe approach?
Mongita's Collection.index_information() returns a List[Dict], and should instead return a MutableMapping[str, Any].
See https://pymongo.readthedocs.io/en/stable/api/pymongo/collection.html#pymongo.collection.Collection.index_information
My suggested implementation:
def index_information(self):
    ret = {'_id_': {'key': [('_id', 1)]}}
    metadata = self._Collection__get_metadata()
    for idx in metadata.get('indexes', {}).values():
        ret[idx['_id']] = {'key': [(idx['key_str'], idx['direction'])]}
    return ret
That's an awesome tool, and really very close to MongoDB. Is there any feature equivalent to mongodump and mongorestore (or mongoexport and mongoimport)? If not, it would be a useful addition to the module. Thanks.
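In the meantime, a rough dump/restore can be built on find() and insert_many(). A sketch (my own helpers, using a mongoexport-style JSON Lines format; it does not handle non-JSON BSON types such as ObjectId or datetime):

```python
import json

def dump_collection(coll, path):
    # Write one JSON document per line.
    with open(path, 'w') as f:
        for doc in coll.find({}):
            f.write(json.dumps(doc) + '\n')

def restore_collection(coll, path):
    # Read the JSON Lines file back and bulk-insert the documents.
    with open(path) as f:
        docs = [json.loads(line) for line in f]
    if docs:
        coll.insert_many(docs)
```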
In my PR #40, I suggest a change to lift the limitation preventing users from using bson.Binary data as an identifier.
In my project I need to store UUID data as _id; my ODM (beanie) turns them into bson.Binary. It seems a pretty legitimate behavior in mongo.
see: https://www.mongodb.com/docs/manual/reference/bson-types/#binary-data
First off, I love this library. It's one of the main reasons I made the jump from NodeJS with NEDB to Python. However, I'm trying to implement pagination on my Flask site, and the cursor.skip method would make life a little easier for me. Thank you for taking the time for implementing this feature. I appreciate all you do.
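Until cursor.skip() lands, pagination can be emulated by materializing the cursor and slicing. A sketch (the helper is my own; inefficient for large collections, but workable for a small Flask site):

```python
def paginate(cursor, page, page_size=10):
    # Materialize the cursor and slice out the requested 1-indexed page.
    docs = list(cursor)
    start = (page - 1) * page_size
    return docs[start:start + page_size]
```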
Mongita explicitly depends on pymongo <=4.
Since mongita relies only on pymongo's bson support, would it be complex to make mongita compatible with pymongo 4?
Hi!
I am not sure if it's against the design goals of the library, but it would be very useful to be able to provide custom Python callback functions to the _doc_matches_slow_filters code path of the find() method. That means a custom function which takes a single document and returns True or False depending on whether it should be included in the output. As far as I understand, this should be easy to implement, as that's what the non-indexed code path is basically doing anyway.
Using that, it would also be easy to work around operators which are not implemented yet.
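The proposal boils down to filtering with an arbitrary predicate. A sketch of the non-indexed path with a user callback (hypothetical; this is not an existing mongita parameter):

```python
def find_with_callback(docs, predicate):
    # Keep each document for which the user-supplied predicate returns True.
    return [doc for doc in docs if predicate(doc)]
```

Usage would look like `find_with_callback(collection.find({}), lambda doc: doc['n'] > 2)`.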
This is a really awesome lib. I wanted to see if I am able to start a db, and use it with mongoengine. I'm not sure if it requires an actual connection.
It is not clear whether collections are compressed. If so, what algorithm is used by default? Can the user specify a preferred algorithm and compression level?
An example from the MongoDB world:
create_collection(name, storageEngine={'wiredTiger': {'configString': 'block_compressor=zstd'}})
Hi there, really nice project you've got here! I'm using mongita as an embedded DB in one of my projects and it's great, but I've started making a few tweaks I thought you might be interested in?
Firstly, I see the on-disk storage engine currently has an infinitely growing cache, which naturally leads to memory leaks with large read/write cycles. I've had a go at writing a simple limited cache that should work as a drop-in replacement (#36). Limiting the cache size to below the benchmarking set size will obviously have a negative impact on performance, but when set larger (or to infinite) the change has limited impact.
Secondly, I think one of the current major bottlenecks for performance is the copy.deepcopy() calls on insertion and retrieval. For insertion, I'm fairly certain this can be replaced with a simple shallow copy, as all that's changed is the addition of the _id field? I've made a PR to test this out (#37) and all seems to work fine. On my system, the increase in insertion performance with the benchmarking set is ~50%.
For retrieval it looks like things are more complicated. Currently, the returned record is copied regardless of whether it's fetched from cache or from disk, but of course the record returned from disk is already unique so the copy is wasted. I don't see any easy way to change this at present without changing some other internals, most probably collection.__find_ids(). Is there a reason this function couldn't return the actual documents rather than just the IDs? It's already gone to the bother of fetching the records, so it seems wasteful to discard them only to retrieve them again later?
Cheers, and I hope you don't mind the comments!
According to https://docs.mongodb.com/manual/reference/operator/query/in/#mongodb-query-op.-in
If the field holds an array, then the $in operator selects the documents whose field holds an array that contains at least one element that matches a value in the specified array (for example, <value1>, <value2>, and so on).
mongita raises an error in this case:
Traceback (most recent call last):
File "test.py", line 57, in test_mongita
result = list(col.find({"names": {"$in": ["asd", "qwe"]}}))
File "Python38\lib\site-packages\mongita\cursor.py", line 56, in __iter__
for el in self._gen():
File "Python38\lib\site-packages\mongita\collection.py", line 870, in __find
for doc_id in self.__find_ids(filter, sort, limit, metadata=metadata):
File "Python38\lib\site-packages\mongita\collection.py", line 845, in __find_ids
if doc and _doc_matches_slow_filters(doc, slow_filters):
File "Python38\lib\site-packages\mongita\collection.py", line 193, in _doc_matches_slow_filters
if _doc_matches_agg(doc_v, query_ops):
File "Python38\lib\site-packages\mongita\collection.py", line 143, in _doc_matches_agg
if doc_v not in query_val:
TypeError: unhashable type: 'list'
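The documented behavior fits in a few lines of Python (my own illustration of the $in rule, not mongita's _doc_matches_agg):

```python
def matches_in(doc_value, candidates):
    # $in: if the field holds an array, match when ANY of its elements is
    # among the candidates; otherwise compare the scalar value itself.
    if isinstance(doc_value, list):
        return any(v in candidates for v in doc_value)
    return doc_value in candidates
```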
Hey Scott
First, thanks a lot for the package; I've found it really useful.
I'm having some issues ANDing together search conditions when trying to find something in a collection 'assessment'. It looks like using $and fails. Example:
test = [{'name': 'foo'}, {'name': 'bar'}]
db['assessment'].delete_many({})
db['assessment'].insert_many(test)
print(db['assessment'].count_documents({'name': {'$in': ['foo']}}))
db['assessment'].count_documents({'$and': [{'name': 'foo'}]})
Any thoughts appreciated!
Hi,
Very nice work with the mongita project! I have just started experimenting with it.
I have found that if I write to a collection in one process and read from it in another, the values when reading will not be updated without re-instantiating the MongitaClient. From the behaviour, it seems like the full db is loaded into memory? Is there a function I can use to refresh?
I was testing this with MongitaClientDisk.
If I open two instances of the same db and collection in the same process, the changes to the collection are reflected immediately.
Best regards.
First of all, thank you for this library.
I love it.
Would love to see $push implemented.
https://docs.mongodb.com/manual/reference/operator/update/push/
If I have a chance I'll make an attempt at it this week.
What modules would I need to touch for this?
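The core of $push is small. A sketch of the update semantics per the linked docs (my own, and it ignores modifiers like $each and $slice):

```python
def apply_push(doc, field, value):
    # $push appends to an array field, creating the array if it is absent.
    doc.setdefault(field, []).append(value)
    return doc
```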
You should mention in the README where the databases / collections are stored physically. I had to use strace to figure out they are in ~/.mongita under Linux.
Edit: how can we store the data in the current (project) directory (similarly to SQLite)?
Hello, I was wondering if we can access a mongita db created on the disk of one machine on another machine. We have RPyC connection between those 2 machines if it helps. (We don't want to use an actual mongoDB because we can't install mongo on them). Thanks!
When trying to create more than one index via create_indexes, a MongitaNotImplementedError occurs. I understand that implementing index intersection is a difficult task, but I really hope that it will be in Mongita someday.
Hi, I get the following error when trying to import mongita in python 3.6.
Traceback (most recent call last):
File "test.py", line 3, in <module>
from mongita import MongitaClientDisk
File "/home/chris/GitWS/tinydb_sqlite/venv/lib/python3.6/site-packages/mongita/__init__.py", line 1, in <module>
from . import collection
File "/home/chris/GitWS/tinydb_sqlite/venv/lib/python3.6/site-packages/mongita/collection.py", line 39, in <module>
re.Pattern: b'\n',
AttributeError: module 're' has no attribute 'Pattern'
Looks like 3.6 does not expose re.Pattern. Other projects have had the same issue: https://github.com/getsentry/responses/pull/196/files, beetbox/beets#2978
To continue to support 3.6, you may have to add a compatibility shim in an appropriate place.
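The snippet elided above is presumably a shim like the one the linked projects use, deriving the pattern type at runtime instead of referencing re.Pattern directly:

```python
import re

# On older Pythons where re.Pattern is not exposed, the compiled-pattern
# type can be derived from an actual compiled pattern instead.
Pattern = type(re.compile(''))
```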
I've dumped my Mongo database into a local folder and am now working with it via mongita.
PyMongo has functionality where I can retrieve documents from the database with specific fields only. For example:
mongo_client.db.col.find({}, {"_id": 1})
This line returns a cursor from which I will get only the "_id" field. I've tried something similar with mongita:
mongita_client.db.col.find({}, {"_id": 1})
It raises the following error:
mongita.errors.MongitaError: Unsupported sort parameter format. See the docs.
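For reference, an inclusion projection like {"_id": 1} just whitelists fields. A pure-Python sketch of that behavior (my own illustration, not mongita's find() signature):

```python
def project(doc, projection):
    # MongoDB-style inclusion projection: keep the requested fields,
    # plus _id unless it is explicitly excluded.
    keep = {k for k, v in projection.items() if v}
    if projection.get('_id', 1):
        keep.add('_id')
    return {k: v for k, v in doc.items() if k in keep}
```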