Comments (31)
I need reliable steps to reproduce, then I can bisect and figure out which exact version of Datasette introduced the problem.
I have a hunch that it relates to changes made to the datasette/database.py module - maybe one of the changes in this diff: 0.61...0.63.1#diff-4e20309c969326a0008dc9237f6807f48d55783315fbfc1e7dfa480b550e16f9
The good news is that this bug is currently unlikely to affect most users, since named in-memory databases (created using datasette.add_memory_database("airtable_refs"), see docs) are a pretty obscure feature, only available to plugins.
Just managed to get this exception trace:
    return await self.route_path(scope, receive, send, path)
  File "/Users/simon/.local/share/virtualenvs/airtable-export-Ca4U-3qk/lib/python3.8/site-packages/datasette/app.py", line 1354, in route_path
    response = await view(request, send)
  File "/Users/simon/.local/share/virtualenvs/airtable-export-Ca4U-3qk/lib/python3.8/site-packages/datasette/views/base.py", line 134, in view
    return await self.dispatch_request(request)
  File "/Users/simon/.local/share/virtualenvs/airtable-export-Ca4U-3qk/lib/python3.8/site-packages/datasette/views/base.py", line 91, in dispatch_request
    return await handler(request)
  File "/Users/simon/.local/share/virtualenvs/airtable-export-Ca4U-3qk/lib/python3.8/site-packages/datasette/views/base.py", line 361, in get
    response_or_template_contexts = await self.data(request, **data_kwargs)
  File "/Users/simon/.local/share/virtualenvs/airtable-export-Ca4U-3qk/lib/python3.8/site-packages/datasette/views/table.py", line 158, in data
    return await self._data_traced(request, default_labels, _next, _size)
  File "/Users/simon/.local/share/virtualenvs/airtable-export-Ca4U-3qk/lib/python3.8/site-packages/datasette/views/table.py", line 568, in _data_traced
    await gather(execute_facets(), execute_suggested_facets())
  File "/Users/simon/.local/share/virtualenvs/airtable-export-Ca4U-3qk/lib/python3.8/site-packages/datasette/views/table.py", line 177, in _gather_parallel
    return await asyncio.gather(*args)
asyncio.exceptions.CancelledError
INFO: 127.0.0.1:64109 - "GET /airtable_refs/airtable_refs?_facet=table_name&table_name=Sessions HTTP/1.1" 500 Internal Server Error
^CError in atexit._run_exitfuncs:
Traceback (most recent call last):
  File "/Users/simon/.pyenv/versions/3.8.17/lib/python3.8/concurrent/futures/thread.py", line 40, in _python_exit
    t.join()
  File "/Users/simon/.pyenv/versions/3.8.17/lib/python3.8/threading.py", line 1011, in join
    self._wait_for_tstate_lock()
  File "/Users/simon/.pyenv/versions/3.8.17/lib/python3.8/threading.py", line 1027, in _wait_for_tstate_lock
    elif lock.acquire(block, timeout):
KeyboardInterrupt
The bug shows up when I try to add a facet. I think it's caused by the parallel query execution I added to facets at some point.
http://127.0.0.1:8045/airtable_refs/airtable_refs - no error
http://127.0.0.1:8045/airtable_refs/airtable_refs?_facet=table_name#facet-table_name - hangs the server
Crucial line in the traceback:
await gather(execute_facets(), execute_suggested_facets())
From here:
datasette/datasette/views/table.py
Line 568 in 917272c
That line was added in 942411e which first shipped in 0.62a0.
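As I understand that commit, it routes the coroutines through a small gather helper so that ?_noparallel=1 can fall back to sequential awaits. A simplified sketch of that shape (not the actual TableView code - only the _noparallel parameter name comes from the thread, the rest is illustrative):

```python
import asyncio

async def gather_sequential(*coros):
    # Fallback for ?_noparallel=1: await each coroutine one at a time
    results = []
    for coro in coros:
        results.append(await coro)
    return results

async def fake_query(name):
    # Stand-in for a facet / suggested-facet SQL query
    await asyncio.sleep(0.01)
    return name

async def table_data(noparallel=False):
    gather = gather_sequential if noparallel else asyncio.gather
    return await gather(fake_query("facets"), fake_query("suggested_facets"))

parallel = asyncio.run(table_data())
sequential = asyncio.run(table_data(noparallel=True))
print(parallel, sequential)
```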
I wrote this test, but it passes:
@pytest.mark.asyncio
async def test_facet_against_in_memory_database():
    ds = Datasette()
    db = ds.add_memory_database("mem")
    await db.execute_write("create table t (id integer primary key, name text)")
    await db.execute_write_many(
        "insert into t (name) values (?)", [["one"], ["one"], ["two"]]
    )
    response1 = await ds.client.get("/mem/t.json")
    assert response1.status_code == 200
    response2 = await ds.client.get("/mem/t.json?_facet=name")
    assert response2.status_code == 200
    assert response2.json() == {
        "ok": True,
        "next": None,
        "facet_results": {
            "results": {
                "name": {
                    "name": "name",
                    "type": "column",
                    "hideable": True,
                    "toggle_url": "/mem/t.json",
                    "results": [
                        {
                            "value": "one",
                            "label": "one",
                            "count": 2,
                            "toggle_url": "http://localhost/mem/t.json?_facet=name&name=one",
                            "selected": False,
                        },
                        {
                            "value": "two",
                            "label": "two",
                            "count": 1,
                            "toggle_url": "http://localhost/mem/t.json?_facet=name&name=two",
                            "selected": False,
                        },
                    ],
                    "truncated": False,
                }
            },
            "timed_out": [],
        },
        "rows": [
            {"id": 1, "name": "one"},
            {"id": 2, "name": "one"},
            {"id": 3, "name": "two"},
        ],
        "truncated": False,
    }
Landing a version of that test anyway.
Maybe it's not related to faceting - I just got it on a hit to http://127.0.0.1:8045/airtable_refs/airtable_refs instead.
Sometimes it takes a few clicks for the bug to occur, but it does seem to always be within the in-memory database.
I managed to trigger it by loading http://127.0.0.1:8045/airtable_refs/airtable_refs - which worked - and then hitting refresh on that page a bunch of times until it hung.
I switched that particular implementation to using an on-disk database instead of an in-memory database and could no longer recreate the bug.
OK, I can trigger the bug like this:
datasette pottery2.db -p 8045 --get /airtable_refs/airtable_refs
Can I write a bash script that fails (and terminates the process) if it takes longer than X seconds?
This worked, including on macOS, even though GPT-4 thought timeout would not work there: https://chat.openai.com/share/cc4628e9-5240-4f35-b640-16a9c178b315
#!/bin/bash
# Run the command with a timeout of 5 seconds
timeout 5s datasette pottery2.db -p 8045 --get /airtable_refs/airtable_refs

# Check the exit code from timeout
if [ $? -eq 124 ]; then
    echo "Error: Command timed out after 5 seconds."
    exit 1
fi
I'm now trying this test script:
#!/bin/bash
port=8064

# Start datasette server in the background and get its PID
datasette pottery2.db -p $port &
server_pid=$!

# Wait for a moment to ensure the server has time to start up
sleep 2

# Initialize counters and parameters
retry_count=0
max_retries=3
success_count=0
path="/airtable_refs/airtable_refs"

# Function to run curl with a timeout
function test_curl {
    # Run the curl command with a timeout of 3 seconds
    timeout 3s curl -s "http://localhost:${port}${path}" > /dev/null
    if [ $? -eq 0 ]; then
        # Curl was successful
        ((success_count++))
    fi
}

# Try three parallel curl requests
while [[ $retry_count -lt $max_retries ]]; do
    # Reset the success counter
    success_count=0
    # Run the curls in parallel
    echo " Running curls"
    test_curl
    test_curl
    test_curl # & test_curl & test_curl &
    # Wait for all curls to finish
    #wait
    # Check the success count
    if [[ $success_count -eq 3 ]]; then
        # All curls succeeded, break out of the loop
        echo " All curl succeeded"
        break
    fi
    ((retry_count++))
done

# Kill the datasette server
echo "Killing datasette server with PID $server_pid"
kill -9 $server_pid
sleep 2

# Print result
if [[ $success_count -eq 3 ]]; then
    echo "All three curls succeeded."
    exit 0
else
    echo "Error: Not all curls succeeded after $retry_count attempts."
    exit 1
fi
I run it like this:
git bisect reset
git bisect start
git bisect good 0.59.4
git bisect bad 1.0a6
git bisect run ../airtable-export/testit.sh
But... it's not having the desired result. I think that's because the bug is intermittent, so each time I run the bisect it blames a different commit.
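One way to compensate for an intermittent bug during a bisect (my own suggestion, not something from the thread) is to have the script repeat the whole check several times and only report "good" if every run passes. A sketch - `true` here is a stand-in for the real check, which in practice would be ../airtable-export/testit.sh:

```shell
#!/bin/bash
# Repeat the flaky check several times; treat the commit as bad
# if any single run fails.
check_cmd=true   # stand-in for ../airtable-export/testit.sh
runs=5
status=good
for i in $(seq 1 $runs); do
    if ! $check_cmd; then
        echo "Run $i failed - marking this commit bad"
        status=bad
        break
    fi
done
echo "Result: $status after $runs runs"
```

git bisect run decides good/bad from the exit code, so the real script would `exit 1` when status is bad and `exit 0` otherwise.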
Output while it is running looks like this:
running '../airtable-export/testit.sh'
INFO: Started server process [75649]
INFO: Waiting for application startup.
INFO: Application startup complete.
INFO: Uvicorn running on http://127.0.0.1:8064 (Press CTRL+C to quit)
Running curls
Running curls
Running curls
Killing datasette server with PID 75649
../airtable-export/testit.sh: line 54: 75649 Killed: 9 datasette pottery2.db -p $port
Error: Not all curls succeeded after 3 attempts.
Bisecting: 155 revisions left to test after this (roughly 7 steps)
[247e460e08bf823142f7b84058fe44e43626787f] Update beautifulsoup4 requirement (#1703)
running '../airtable-export/testit.sh'
INFO: Started server process [75722]
INFO: Waiting for application startup.
INFO: Application startup complete.
INFO: Uvicorn running on http://127.0.0.1:8064 (Press CTRL+C to quit)
Running curls
Running curls
Running curls
Killing datasette server with PID 75722
../airtable-export/testit.sh: line 54: 75722 Killed: 9 datasette pottery2.db -p $port
Error: Not all curls succeeded after 3 attempts.
Bisecting: 77 revisions left to test after this (roughly 6 steps)
[3ef47a0896c7e63404a34e465b7160c80eaa571d] Link rel=alternate header for tables and rows
running '../airtable-export/testit.sh'
INFO: Started server process [75818]
INFO: Waiting for application startup.
INFO: Application startup complete.
INFO: Uvicorn running on http://127.0.0.1:8064 (Press CTRL+C to quit)
Running curls
I tried it with a path of / and everything passed - so it's definitely the path of /airtable_refs/airtable_refs (an in-memory database created by an experimental branch of https://github.com/simonw/airtable-export) that triggers the problem.
I knocked it down to 1 retry just to see what happened.
Turned out I wasn't running the datasette from the current directory, so it was not testing what I intended. Fixed that with pip install -e . in the datasette/ directory.
Now I'm seeing some passes, which look like this:
running '../airtable-export/testit.sh'
INFO: Started server process [77810]
INFO: Waiting for application startup.
INFO: Application startup complete.
INFO: Uvicorn running on http://127.0.0.1:8064 (Press CTRL+C to quit)
Running curls
INFO: 127.0.0.1:59439 - "GET /airtable_refs/airtable_refs HTTP/1.1" 200 OK
INFO: 127.0.0.1:59440 - "GET /airtable_refs/airtable_refs HTTP/1.1" 200 OK
INFO: 127.0.0.1:59441 - "GET /airtable_refs/airtable_refs HTTP/1.1" 200 OK
All curl succeeded
Killing datasette server with PID 77810
../airtable-export/testit.sh: line 54: 77810 Killed: 9 datasette pottery2.db -p $port
All three curls succeeded.
Bisecting: 4 revisions left to test after this (roughly 2 steps)
[7463b051cf8d7f856df5eba9f7aa944183ebabe5] Cosmetic tweaks after blacken-docs, refs #1718
running '../airtable-export/testit.sh'
INFO: Started server process [77826]
INFO: Waiting for application startup.
INFO: Application startup complete.
INFO: Uvicorn running on http://127.0.0.1:8064 (Press CTRL+C to quit)
Running curls
OK it looks like it found it!
942411ef946e9a34a2094944d3423cddad27efd3 is the first bad commit
commit 942411ef946e9a34a2094944d3423cddad27efd3
Author: Simon Willison <[email protected]>
Date: Tue Apr 26 15:48:56 2022 -0700
Execute some TableView queries in parallel
Use ?_noparallel=1 to opt out (undocumented, useful for benchmark comparisons)
Refs #1723, #1715
datasette/views/table.py | 93 ++++++++++++++++++++++++++++++++++--------------
1 file changed, 67 insertions(+), 26 deletions(-)
bisect found first bad commit
942411e does look like the cause of this problem.
I also confirmed that http://127.0.0.1:8064/airtable_refs/airtable_refs?_noparallel=1 does not trigger the bug but http://127.0.0.1:8064/airtable_refs/airtable_refs does.
Now that I've confirmed that parallel query execution of the kind introduced in 942411e can cause hangs (presumably some kind of locking issue) against in-memory databases, some options:
- Disable parallel execution entirely and rip out the related code.
- Disable parallel execution by leaving that code in place but having it always behave as if _noparallel=1 was passed.
- Continue digging and try to find some way to avoid this problem.
The parallel execution work is something I was playing with last year in the hope of speeding up Datasette pages like the table page which need to execute a bunch of queries - one for each facet, plus one for each column to see if it should be suggested as a facet.
I wrote about this at the time here: https://simonwillison.net/2022/May/6/weeknotes/
My hope was that despite Python's GIL this optimization would still help, because the SQLite C module releases the GIL once it gets to SQLite.
But... that didn't hold up. It looked like enough work was happening in Python land with the GIL that the optimization didn't improve things.
Running the nogil fork of Python DID improve things though! I left the code in partly in the hope that the nogil fork would be accepted into Python core.
... which it now has! But it will still be a year or two before it fully lands: https://discuss.python.org/t/a-steering-council-notice-about-pep-703-making-the-global-interpreter-lock-optional-in-cpython/30474
So I'm not particularly concerned about dropping the parallel execution. If I do drop it though do I leave the potentially complex code in that relates to it?
Looking again at this code:
datasette/datasette/database.py
Lines 87 to 117 in 6ed7908
check_same_thread=False
really stands out here.
Python docs at https://docs.python.org/3/library/sqlite3.html:
check_same_thread (bool) - If True (default), ProgrammingError will be raised if the database connection is used by a thread other than the one that created it. If False, the connection may be accessed in multiple threads; write operations may need to be serialized by the user to avoid data corruption. See threadsafety for more information.
I think I'm playing with fire by allowing multiple threads to access the same connection without doing my own serialization of those requests.
I do do that using the write connection - and in this particular case the bug isn't coming from write queries, it's coming from read queries - but perhaps SQLite has issues with threading for reads, too.
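To see what that flag guards against, here's a minimal stdlib-only demonstration (my own example, not from the thread) of the default check_same_thread=True behaviour:

```python
import sqlite3
import threading

# Default is check_same_thread=True: using the connection from another
# thread raises ProgrammingError instead of risking corruption
conn = sqlite3.connect(":memory:")

errors = []

def use_from_other_thread():
    try:
        conn.execute("select 1")
    except sqlite3.ProgrammingError as exc:
        errors.append(str(exc))

t = threading.Thread(target=use_from_other_thread)
t.start()
t.join()
print(errors)
```

Passing check_same_thread=False, as Datasette does, silences exactly this check - which is why the serialization then becomes the caller's responsibility.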
Using SQLite In Multi-Threaded Applications
That indicates that there's a SQLite option for "Serialized" mode where it's safe to access anything SQLite provides from multiple threads, but as far as I can tell Python doesn't give you an option to turn that mode on or off for a connection - you can read sqlite3.threadsafety to see if that mode was compiled in or not, but not actually change it.
On my Mac sqlite3.threadsafety returns 1, which means (https://docs.python.org/3/library/sqlite3.html#sqlite3.threadsafety) "Multi-thread: In this mode, SQLite can be safely used by multiple threads provided that no single database connection is used simultaneously in two or more threads." It would need to return 3 for that serialized mode.
My current hunch is that SQLite gets unhappy if multiple threads access the same underlying C object - which sometimes happens with in-memory connections in Datasette, presumably because they are faster than file-backed databases.
I'm going to remove the asyncio.gather() code from the table view. I'll ship a 0.x release with that fix too.
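The shape of that fix - replacing the gather with plain sequential awaits so only one query touches the shared connection at a time - looks roughly like this (a sketch of the idea, not the actual patch; both coroutine names are placeholders for the real query functions):

```python
import asyncio

async def execute_facets():
    await asyncio.sleep(0.01)  # stand-in for the facet SQL queries
    return "facet results"

async def execute_suggested_facets():
    await asyncio.sleep(0.01)  # stand-in for the suggested-facet queries
    return "suggested facets"

async def data():
    # Before: await asyncio.gather(execute_facets(), execute_suggested_facets())
    # After: run them one at a time on the event loop
    facets = await execute_facets()
    suggested = await execute_suggested_facets()
    return facets, suggested

result = asyncio.run(data())
print(result)
```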
The one other thing affected by this change is this documentation, which suggests a not-actually-safe pattern:
Lines 1292 to 1321 in 6ed7908
Added a note to that example in the documentation:
Line 1320 in 4e6a341
/me munches popcorn at a furious rate, utterly enthralled
I'm going to release this in 1.0a7, and I'll backport it to a 0.64.4 release too.
Release 0.64.4: https://docs.datasette.io/en/stable/changelog.html#v0-64-4
We're planning a breaking change in 1.0a7. Since that's a breaking change, I'm going to ship 1.0a7 right now with this fix, then ship that breaking change as 1.0a8 instead.
1.0a7 is out with this fix as well now: https://docs.datasette.io/en/1.0a7/changelog.html#a7-2023-09-21