Giter Club home page Giter Club logo

Comments (12)

tpetazzoni avatar tpetazzoni commented on August 16, 2024 1

This seems to be a regression that came with the new HTML popup. @fstachura, do you think you could have a look ?

from elixir.

fstachura avatar fstachura commented on August 16, 2024

@mwalle Sorry to hear that. As a temporary fix - middle or ctrl click will open the identifier in a new tab, bypassing the popup. The results seem to be correct there. Alternatively, disabling javascript for elixir.bootlin.com with a browser extension should revert back to the old behavior.

@tpetazzoni
Sure, it seems that the API incorrectly returns empty results for this identifier
https://elixir.bootlin.com/api/ident/u-boot/srand?version=latest&family=C

The same URL without the family parameter returns correct results (NOTE: try it with curl, for some reason that link still returns empty results when opened in a browser)
https://elixir.bootlin.com/api/ident/u-boot/srand?version=latest

from elixir.

mwalle avatar mwalle commented on August 16, 2024

@mwalle Sorry to hear that. As a temporary fix - middle or ctrl click will open the identifier in a new tab, bypassing the popup. The results seem to be correct there. Alternatively, disabling javascript for elixir.bootlin.com with a browser extension should revert back to the old behavior.

Thanks for the productivity boost ;)

from elixir.

tpetazzoni avatar tpetazzoni commented on August 16, 2024

Thanks a lot @fstachura for the initial investigation! Indeed, looks like a server-side issue then.

from elixir.

fstachura avatar fstachura commented on August 16, 2024

@tpetazzoni
I downloaded and indexed u-boot sources on my local machine, and the results are correct there, even with the family parameter. I have a suspicion that the cache is somehow involved.

from elixir.

fstachura avatar fstachura commented on August 16, 2024

I noticed another strange thing. I added a URL parameter to force Varnish to return uncached content. On some requests that bypass cache, the backend returns correct data.

% for n in `seq 4000 4100`; do echo https://elixir.bootlin.com/api/ident/u-boot/srand\?version\=latest\&family\=C\&test\=$n; curl https://elixir.bootlin.com/api/ident/u-boot/srand\?version\=latest\&family\=C\&test\=$n; echo; done

https://elixir.bootlin.com/api/ident/u-boot/srand?version=latest&family=C&test=4000
{"definitions": [], "references": [], "documentations": []}
https://elixir.bootlin.com/api/ident/u-boot/srand?version=latest&family=C&test=4001
{"definitions": [], "references": [], "documentations": []}
https://elixir.bootlin.com/api/ident/u-boot/srand?version=latest&family=C&test=4002
{"definitions": [], "references": [], "documentations": []}
https://elixir.bootlin.com/api/ident/u-boot/srand?version=latest&family=C&test=4003
{"definitions": [], "references": [], "documentations": []}
https://elixir.bootlin.com/api/ident/u-boot/srand?version=latest&family=C&test=4004
{"definitions": [], "references": [], "documentations": []}
https://elixir.bootlin.com/api/ident/u-boot/srand?version=latest&family=C&test=4005
{"definitions": [], "references": [], "documentations": []}
https://elixir.bootlin.com/api/ident/u-boot/srand?version=latest&family=C&test=4006
{"definitions": [], "references": [], "documentations": []}
https://elixir.bootlin.com/api/ident/u-boot/srand?version=latest&family=C&test=4007
{"definitions": [], "references": [], "documentations": []}
https://elixir.bootlin.com/api/ident/u-boot/srand?version=latest&family=C&test=4008
{"definitions": [], "references": [], "documentations": []}
https://elixir.bootlin.com/api/ident/u-boot/srand?version=latest&family=C&test=4009
{"definitions": [{"path": "include/rand.h", "line": 20, "type": "prototype"}, {"path": "drivers/crypto/ace_sha.c", "line": 122, "type": "function"}, {"path": "drivers/rng/npcm_rng.c", "line": 72, "type": "function"}, {"path": "lib/rand.c", "line": 29, "type": "function"}], "references": [{"path": "cmd/mem.c", "line": "1297", "type": null}, {"path": "drivers/crypto/ace_sha.c", "line": "152,181", "type": null}, {"path": "drivers/ram/stm32mp1/stm32mp1_tests.c", "line": "616,619", "type": null}, {"path": "drivers/rng/sandbox_rng.c", "line": "24", "type": null}, {"path": "lib/uuid.c", "line": "477,479", "type": null}, {"path": "net/dhcpv6.c", "line": "618", "type": null}, {"path": "net/net_rand.h", "line": "55,57", "type": null}, {"path": "scripts/kconfig/conf.c", "line": "537", "type": null}, {"path": "test/dm/mux-cmd.c", "line": "129,163", "type": null}, {"path": "tools/gen_eth_addr.c", "line": "17", "type": null}], "documentations": [{"path": "include/rand.h", "line": "12", "type": null}]}
https://elixir.bootlin.com/api/ident/u-boot/srand?version=latest&family=C&test=4010
{"definitions": [], "references": [], "documentations": []}
https://elixir.bootlin.com/api/ident/u-boot/srand?version=latest&family=C&test=4011
{"definitions": [{"path": "include/rand.h", "line": 20, "type": "prototype"}, {"path": "drivers/crypto/ace_sha.c", "line": 122, "type": "function"}, {"path": "drivers/rng/npcm_rng.c", "line": 72, "type": "function"}, {"path": "lib/rand.c", "line": 29, "type": "function"}], "references": [{"path": "cmd/mem.c", "line": "1297", "type": null}, {"path": "drivers/crypto/ace_sha.c", "line": "152,181", "type": null}, {"path": "drivers/ram/stm32mp1/stm32mp1_tests.c", "line": "616,619", "type": null}, {"path": "drivers/rng/sandbox_rng.c", "line": "24", "type": null}, {"path": "lib/uuid.c", "line": "477,479", "type": null}, {"path": "net/dhcpv6.c", "line": "618", "type": null}, {"path": "net/net_rand.h", "line": "55,57", "type": null}, {"path": "scripts/kconfig/conf.c", "line": "537", "type": null}, {"path": "test/dm/mux-cmd.c", "line": "129,163", "type": null}, {"path": "tools/gen_eth_addr.c", "line": "17", "type": null}], "documentations": [{"path": "include/rand.h", "line": "12", "type": null}]}

from elixir.

fstachura avatar fstachura commented on August 16, 2024

Ok, I think I know what the issue is, or at least why separate requests sometimes return different results for the same URL. I have managed to reproduce something similar on my local machine. Cache is not at fault (obviously).

api.py is a WSGI script. mod_wsgi starts multiple processes of the same WSGI script
https://github.com/bootlin/elixir/blob/master/api/api.py#L40
build_query imports query (local import, executed on each call)
https://github.com/bootlin/elixir/blob/master/query.py#L30
query.db is a module-global variable that is initialized on first import. From what I understand, it is an interface to project data
https://github.com/bootlin/elixir/blob/master/lib.py#L187
lib.getDataDir() simply returns $LXR_DATA_DIR which is set to the data directory of a project on each build_query call. This value is used to initialize query.db.

I think that the assumption made in build_query was that on each call, the local import statement will reimport the module, recreating all the global variables from that module. That does not seem to be true, imports are globally cached.
https://docs.python.org/3/reference/import.html#the-module-cache

So, there is separate query.db for each process, and it is initialized on the first bulid_query call.
On a fresh apache start, if the first request is for something from Linux codebase, and the second request is for something from Musl codebase, its likely that each request will go to a different process. In each process, query.db will be set to a different database. Then, following requests might unexpectedly return empty data (which will be then cached etc etc) because some requests will go to the process that has the Linux db, and some will go the the process that has the musl db.
This was never a problem in the original cgi script, because it's re-executed from scratch on each HTTP request.

I think that there still is something missing in my reasoning. The bug would imply that the number of projects working correctly is limited by the number of WSGI processes started by apache. I guess that most people use Elixir for browsing Linux sources, which would mean that on most processes query.db points to the Linux project database. Meanwhile all projects from elixir.bootlin.com that I tried to browse seemed to be working fine. But I wasn't looking very closely. Maybe the cache or the algorithm that picks the process that will handle a request somehow helped to maintain the illusion that everything is OK.

I see two possible fixes:

I'm gonna go ahead and open a PR with the first fix, since this probably impacts a sizeable chunk of the userbase and therefore is more or less urgent.

from elixir.

tpetazzoni avatar tpetazzoni commented on August 16, 2024

Thanks a lot @fstachura for the investigation and quick fix!

I am personally not familiar with the Elixir code base, so I find the need to force re-import a module to not be very great. And it seems like you agree: this is the quick and dirty fix, and we need a more long term fix that reworks the code so that query.py is better structured to handle multiple projects.

I'll let @michaelopdenacker handle your pull request with the quick fix.

from elixir.

fstachura avatar fstachura commented on August 16, 2024

The quick fix is not enough. Mod_wsgi utilizes multithreading and multiprocessing, meanwhile this solution does not seem thread safe (and I'm actually not sure if DB is thread safe at all)

from elixir.

fstachura avatar fstachura commented on August 16, 2024

@mwalle A fix for this was just deployed on elixir.bootlin.com. Could you try to reproduce the bug now?

from elixir.

mwalle avatar mwalle commented on August 16, 2024

@fstachura thanks, seems to work now!

from elixir.

michaelopdenacker avatar michaelopdenacker commented on August 16, 2024

Thanks for confirming. Closing this bug.

from elixir.

Related Issues (20)

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.