
Comments (28)

kohr-h avatar kohr-h commented on May 14, 2024

Sure thing. To make it easier to pull in everything, I propose making a metapackage odl on PyPI which pulls in all subpackages as namespace packages, so that installing odl also installs all the rest into its namespace.

To get just the core part, there could be an odl-core package.


adler-j avatar adler-j commented on May 14, 2024

Sure, we only need a way for odl not to "crash" if the user doesn't have CUDA, for example; in other words, a clean way of excluding odl-cuda.


kohr-h avatar kohr-h commented on May 14, 2024

The namespace package issue

I played around a bit with the namespace package idea yesterday, but to be honest I couldn't get it to work. Unfortunately the documentation on this topic is quite sparse and does not really cover the possible problems very well. Implementation-wise, my impression is that support for this feature is a bit fragile. And, worst of all, it does not seem to work properly with the pip install -e option, which is so useful that I don't want to miss it.

What one is supposed to do

There are two slightly different ways of creating a namespace package in Py2 and Py3 < 3.3, plus one for 3.3 and later.

In general, odl would be a namespace package and odl.core, odl.solvers etc the subpackages. In any implementation, we would have a directory structure as follows (the folders odl.core and odl.solvers need not be in the same parent folder, and names are also irrelevant):

├── odl.core
│   └── odl
│       └── core
│           └── <core modules>
├── odl.solvers
│   └── odl
│       └── solvers
│           └── <solvers modules>
└── ...
1. Using setuptools

According to the setuptools doc on namespace packages, each odl directory would have to contain an __init__.py file with only the line

__import__('pkg_resources').declare_namespace(__name__)

In addition, each subpackage would have to use the option namespace_packages=['odl'] in the respective setup.py call to setup().
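
For illustration, a minimal setup.py sketch for such a subpackage (the names odl-solvers and odl-core are just examples following the layout above, not decided names):

from setuptools import setup, find_packages

setup(
    name='odl-solvers',                  # hypothetical distro name for the solvers subpackage
    packages=find_packages(),            # picks up odl and odl.solvers
    namespace_packages=['odl'],          # declares odl as a shared namespace
    install_requires=['odl-core'],       # hypothetical core distro this subpackage builds on
)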

2. Using pkgutil (standard library module)

The recommended way of marking a package as a namespace package is via pkgutil.extend_path (documentation here). In that case, one adds an __init__.py to each odl directory in the tree containing the lines

from pkgutil import extend_path
__path__ = extend_path(__path__, __name__)

Side note from PEP 420:

Every distribution needs to provide the same contents in its __init__.py, so that extend_path is invoked independent of which portion of the package gets imported first. As a consequence, the package's __init__.py cannot practically define any names as it depends on the order of the package fragments on sys.path to determine which portion is imported first.

As I read it, we would no longer be able to propagate names to the top-level odl namespace.

3. Implicit namespace package (Python 3.3 and later)

This feature is triggered by a subpackage not having an __init__.py in its odl directory. Nothing else needs to be done, but I don't know exactly what the consequences are and how this feature interacts with pip.
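
For illustration, a minimal sketch of what that gives us (assuming both portions from the layout above are on sys.path and neither contains an odl/__init__.py):

import odl.core
import odl.solvers        # both portions merge into the single namespace package odl
print(odl.__path__)       # a namespace path listing the odl directories of both portions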

Pros and Cons of namespace packages

Pros
  • Nice import statements: import odl.solvers, just as if solvers were a submodule in the large odl distribution.
  • One import odl would make the namespace of the submodules immediately accessible. Apparently, that's not what happens. One still needs to explicitly import the submodule. It just looks like it is part of the "big blob" odl.
  • Each subpackage could be maintained separately and have its own (isolated) dependencies.
  • Distro and import package names would be nicely aligned.
Cons
  • Since it seems impossible to have a "parent" package and several "child" or "plugin" packages using the parent's namespace (at least with this mechanism), what is now odl would have to be moved to odl.core, for example.
  • The nice "propagate to top level" imports would only reach the next-to-top level, i.e. odl.Rn would then be odl.core.Rn. This counteracts the advantage of a unified namespace to some extent.

Alternative

Instead of making a namespace package odl and several subpackages (including odl.core which is now odl), we could keep odl as a proper package and use a common naming scheme for the packages which are based on odl and extend its functionality.

To make distribution simpler for users, the core part could get the distro name odl-core on PyPI, while odl would pull in all dependencies it can get (possibly excluding CUDA and other non-PyPI packages, e.g. ASTRA).

Conclusion

Before thinking further about how to get namespace packages working for our environment, we should decide if it provides the functionality we want to have (see Pros/Cons). What it boils down to is:

Do we want

  1. odl.core.Rn and nice namespace packages like odl.solvers with nicely aligned names in distro and import
    or
  2. odl.Rn and proper packages like odl-solvers imported as odl_solvers, where we need to tailor a solution to distribute the whole bunch?

Useful Links

Here's a collection of links where I got my information from:

  • PEP 420 (2012) dealing with implicit namespace packages. This PEP has two predecessors - PEP 382 (2009) and PEP 402 (2011) which were both rejected in favor of this one. It is implemented since Python 3.3, but not backported to Python 2. The situation for Py2 and Py3 pre-3.3 is described in a section of the PEP.
  • A stackoverflow question with some good answers, some of them, however, written before or between the PEPs, so statements about concepts being future-proof or not have to be read in that context.
  • Another useful stackoverflow discussion.
  • Yet another one relevant for us, rather describing the problem than a solution.
  • zope.interface is a namespace package, so by simply copying what they did we should succeed, too.
  • oslo, an OpenStack related project which apparently dropped namespace packages altogether due to poor usability. The issue description particularly mentions the trouble with the pip install -e option (which I have not been able to make work at all) and gives a workaround. They opted for a naming scheme with underscores, oslo_foo instead of the namespace oslo.foo, which we could consider, too. It would be easy to go back to namespace packages once a good solution is available (or once we can just forget about Python < 3.3).

Edit: Added conclusion.
Edit 2: Added a pro (the most important one) and complemented a con.
Edit 3: Added alternative.
Edit 4: Corrected the alleged extra pro added in edit 2 and added the choices in the conclusion.


kohr-h avatar kohr-h commented on May 14, 2024

Working namespace package implemented in a branch.


kohr-h avatar kohr-h commented on May 14, 2024

Regarding the Git repo layout, a possible solution would be to make a meta-repository odl which simply includes all existing extensions as Git submodules. This would still require manual installation of all submodules, but at least the fetching process would be much simpler if one wants everything. It might also be easier to sync revisions of the subpackages such that the latest master of the meta-repository always works (hopefully), possibly backed by CI.


adler-j avatar adler-j commented on May 14, 2024

Excellent writeup. I can certainly see the issues this causes. I'll try to do some further research myself as well.


adler-j avatar adler-j commented on May 14, 2024

Another alternative would be to have a single "main" package, let's call it odl, which could contain the current code (or the code could be moved to odl-core). We would then make a bunch of sub-libraries named as you suggested, odl-solvers etc.

We then add these as optional dependencies in setup.py, so that users should be able to install odl-cuda with something like (in the odl folder)

python setup.py cuda

which should then try to install the odl-cuda package (also hosted on PyPI).

If we additionally do some magic in the __init__ files of the odl package, we should be able to do something like

import odl_cuda as cuda    # the odl-cuda distro would have to install an importable package, e.g. odl_cuda
from odl_cuda import *
__all__ += cuda.__all__

and users could still do everything they can do at the moment.
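
A possible refinement (just a sketch; the import name odl_cuda is an assumption about what the odl-cuda distro would install): guard the import so that odl still imports cleanly when the extension is missing, in line with the "don't crash without CUDA" point above.

try:
    import odl_cuda as cuda     # optional extension from the hypothetical odl-cuda distro
    from odl_cuda import *
    __all__ += cuda.__all__
except ImportError:
    pass                        # odl-cuda not installed; plain odl keeps working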

If we finally put the doc inside the odl library, we should be able to have a unified doc somewhere.


kohr-h avatar kohr-h commented on May 14, 2024

For your proposed method, we could use the setuptools extras similar to what we do with 'testing' right now. For example, for CUDA we would add

setup(
    ...
    extras_require={
        ...
        'cuda': ['odl-cuda'],
    },
)

to the setup.py of odl. This option would then be triggered by the command pip install odl[cuda]. See example 6 here.


adler-j avatar adler-j commented on May 14, 2024

That should be quite fine, no? I assume a user could then do pip install odl[cuda,solvers] etc.?

The main issue then would be to keep the documentation organized. It would be nice if we had a single doc.


kohr-h avatar kohr-h commented on May 14, 2024

I hope that's how it works. In any case, we could add an all extras target so that pip install odl[all] installs everything.
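
A sketch of how that could look in setup.py (the extras besides cuda and the aggregation into all are illustrative; nothing is decided yet):

from setuptools import setup

extras = {
    'cuda': ['odl-cuda'],        # hypothetical optional distros
    'solvers': ['odl-solvers'],
}
extras['all'] = sorted(set(sum(extras.values(), [])))   # odl[all] pulls in every extra

setup(
    name='odl',
    packages=['odl'],
    extras_require=extras,
)

With that, pip install odl[cuda,solvers] and pip install odl[all] should both do what we want.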


kohr-h avatar kohr-h commented on May 14, 2024

And yes, I think we could go for that solution.


adler-j avatar adler-j commented on May 14, 2024

My opinion is that we don't make an odl-core package and just leave that stuff in the odl package.


kohr-h avatar kohr-h commented on May 14, 2024

I fully agree.


adler-j avatar adler-j commented on May 14, 2024

So we had a good discussion on this yesterday, which I'll summarize here.

At the core of this problem there are two options: having a monolithic package, or having a core package with "add-ons".

As discussed above, namespace packages seem to have a bunch of issues themselves, and do not seem fit for this package.

A branch was attempted for integrating the solvers with odl using a structure where odl_solvers is a subpackage of odl. This also turned out to have several issues, one being that we would not be able to import odl.solvers as one would expect. Another issue is that the doc becomes fragmented (we would need to have the solvers doc inside the main package).

Another method would be to have no formal connection between the libraries, instead letting odl_solvers have odl as a dependency. This is unfavorable since users would then need to import and manage several packages, causing a more complicated install process and workflow.

A final method is the one used by py.test, which has a loosely coupled add-on structure: a plugin architecture based on dynamic discovery. This would be useful if we had a very tightly specified interface and wanted users to be able to add plugins. As an example, plugins could register FnBase spaces, and we could discover and import them dynamically in uniform_discr; see the sketch below. The issue with this is that it seems you cannot directly access the plugins as an external user, nor is the doc automatically merged.
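
For concreteness, a minimal sketch of such dynamic discovery via setuptools entry points (the group name odl.space_impls and the whole registration interface are hypothetical, not something that exists today):

import pkg_resources

def discover_space_impls():
    """Collect FnBase implementations registered by installed plugin distros."""
    impls = {}
    for entry in pkg_resources.iter_entry_points(group='odl.space_impls'):
        impls[entry.name] = entry.load()   # the plugin exposes a class or factory
    return impls

# A plugin distro (e.g. odl-cuda) would declare in its setup.py:
#     entry_points={'odl.space_impls': ['cuda = odl_cuda.space:CudaFn']}

uniform_discr could then look up the requested implementation in the returned dict.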

Proposal

In light of this (and unless we find another method), I propose that we re-merge all subpackages into the main odl package. Each of them should live inside odl/SUB_PACKAGE, and they should not be imported together with odl, i.e. one needs to write odl.solvers.conjugate_gradient instead of simply odl.conjugate_gradient.


aringh avatar aringh commented on May 14, 2024

The proposal sounds like a good idea; however, I'm not completely sure what you mean. Will one have to do imports like
import odl
import odl.solvers
or is it simply that one has to call odl.solvers.whatever_method?


adler-j avatar adler-j commented on May 14, 2024

What I mean is that users can do either:

from odl.solvers import whatever_method
whatever_method

or

import odl
odl.solvers.whatever_method

or

import odl.solvers
odl.solvers.whatever_method

They won't be able to do

from odl import whatever_method

or

import odl
odl.whatever_method


kohr-h avatar kohr-h commented on May 14, 2024

import odl
import odl.solvers

@aringh This is how it would look with namespace packages: odl.solvers would be an independent package which only looks like it belongs to odl.

So in an ideal world, there would be a solution which allows us to

  1. have a main package odl,
  2. have several add-ons like the solvers as separate packages with isolated dependencies,
  3. install everything with a single command,
  4. get everything with a single import odl, with the add-ons available as subpackages, e.g. odl.solvers, and
  5. maintain continuous integration and documentation in one place (the main repo).

Since we're not living in an ideal world, we don't get all that.

  • Namespace packages: probably impossible to get 1. and 4., with a big question mark for 5.
  • Separate packages: 4. is not possible, especially not with the given syntax. We would have to settle for something like odl_solvers, and each import must be explicit. Question mark for 5.
  • Monolithic package: Naturally no 2., but the rest is possible. However, we have some experience with compartmentalizing dependencies (like CUDA), so this should be doable.

I agree to merging back to main.


adler-j avatar adler-j commented on May 14, 2024

Do you also agree on assigning the task to yourself? :)

Final note:

We could add labels "solvers", "core", etc to make working with issues easier.


kohr-h avatar kohr-h commented on May 14, 2024

Yeah, what the heck ;-)


adler-j avatar adler-j commented on May 14, 2024

Actually, putting odl on PyPI is the last of the major "making ODL a serious package" things we have to do. IMO this should be the priority now.


kohr-h avatar kohr-h commented on May 14, 2024

I have an account on PyPI, and the procedure seems to be very simple. The question is rather when we dare to attach a version number to it. Right now it's somewhat arbitrarily 0.9-ish. Hopefully we will only change things under the hood from now on, but since the library is quite heterogeneous in that respect, it's hard to get right.

What we could do is call it 0.9 and mark the shakier parts as experimental and likely to undergo large changes.


adler-j avatar adler-j commented on May 14, 2024

My suggestion is that we simply make a 0.10 tag about now (just making sure all tests pass and the doc builds), and then release it with a big disclaimer.


kohr-h avatar kohr-h commented on May 14, 2024

Okay, good. 0.1 it would then be. People probably expect 1.0 after 0.9, not 0.10


adler-j avatar adler-j commented on May 14, 2024

No, 0.10 comes after 0.9; see for example the examples in PEP 440.

See also wikipedia on the issue

Most free and open-source software packages, including MediaWiki, treat versions as a series of individual numbers, separated by periods, with a progression such as 1.7.0, 1.8.0, 1.8.1, 1.9.0, 1.10.0, 1.11.0, 1.11.1, 1.11.2, and so on.

If 0.1 came after 0.9, version checks like version >= (0, 9) would fail.
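
A quick sanity check (a sketch; using the packaging library is just one option, and any PEP 440 compliant parser orders these the same way):

from packaging.version import Version

assert Version("0.10") > Version("0.9")   # PEP 440: components compare numerically
assert not ("0.10" > "0.9")               # plain string comparison gets this wrong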


kohr-h avatar kohr-h commented on May 14, 2024

Okay, I'm convinced. Maybe that's also a way to indicate that the 1.0 release may come at any point, for example only after 89 further releases up to 0.99, so users don't expect 1.0 to come soon just because the current version is 0.9. Let's go for 0.10 then.


kohr-h avatar kohr-h commented on May 14, 2024

There seems to be a nice way to integrate Python package index testing into our upcoming Jenkins environment, namely with devpi. It's a simple index server implementation with some interesting features:

  • It caches packages from PyPI when installing with the devpi command.
  • One can upload a package to it and then test-install with pip install -i <devpi server>.
  • It can trigger a Jenkins test every time a package is uploaded, so installing with pip becomes part of the CI.
  • For compiled packages, one can upload a wheel, which speeds up installation significantly and also solves issues with virtualenv, where I had a hard time installing SciPy.
  • Possibly one can even combine these tools with tox to locally run a mini-CI.

I suggest we have a look into this and consider integrating it into our CI workflow.


adler-j avatar adler-j commented on May 14, 2024

Oh my goooood! does happy dance 💃

I guess odlpp is missing in that for now?


kohr-h avatar kohr-h commented on May 14, 2024

Yes, sorry :-) We can try to build a wheel from the source. I don't think it's that hard, we just need to add an option to CMake.

