mentalisttraceur / python-macaddress
License: BSD Zero Clause License
We should probably delete HWAddress.__ne__ and let Python automatically implement it.
Thoughts?
Verdict: no, unless someone brings a particularly compelling reason - probably only if people start asking for it.
Rationale: users can easily compose type-constraining functions on top of a type-flexible constructor, like this:
    import macaddress

    # When the hardware address class is known:
    def eui48_from_str(string: str) -> macaddress.EUI48:
        if not isinstance(string, str):
            raise TypeError(f"not a string: {string!r}")
        return macaddress.EUI48(string)

    # When the hardware address class is dynamic:
    from typing import Type, TypeVar

    HWAddressType = TypeVar('HWAddressType', bound=macaddress.HWAddress)

    def from_str(cls: Type[HWAddressType], string: str) -> HWAddressType:
        if not isinstance(string, str):
            raise TypeError(f"not a string: {string!r}")
        return cls(string)
However, Python is a batteries-included language, and I could imagine wanting these batteries in this library. So if someone finds themselves wanting it, I might be pretty easily swayed, especially by seeing an example use-case where this is nicer to have out-of-the-box.
Based on the decision in #8, starting with version 2.0.0, the canonical way to get the OUI of an address will be OUI(address) instead of address.oui.
However, should address.oui still work?
Jotting down rationale for why this library's classes should not implement __index__.
Abstractly: __index__ means "this is an integer", not "coercions of this to an integer are never lossy".
Concretely: an_array[a_mac_address] is usually nonsensical, an_array[one_mac_address:second_mac_address] has even more of its possibility space be nonsensical, an_array[::a_mac_address] is definitely nonsensical, and a_string * a_mac_address is profoundly nonsensical. And yet __index__ would enable all of those.
In every case I can imagine saying any of the above would be useful, I would reflexively think some code is designed wrong, and even if it's right and good in some edge case, I'm pretty sure code would literally always be more clear and self-descriptive to readers and maintainers if it was forced to explicitly convert to integers in those cases.
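To make the point concrete, here is a small self-contained sketch (AddressLike is a hypothetical stand-in, not a class from this library) of what omitting __index__ forces callers to write:

```python
class AddressLike:
    """Hypothetical stand-in for a hardware address:
    convertible to int, but deliberately without __index__."""
    def __init__(self, value):
        self._value = value
    def __int__(self):
        return self._value

an_array = ['a', 'b', 'c']
a_mac_address = AddressLike(1)

# The explicit conversion works, and reads as an intentional choice:
assert an_array[int(a_mac_address)] == 'b'

# Without __index__, implicit use as an index is rejected:
try:
    an_array[a_mac_address]
except TypeError:
    pass
```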
Jotting this down to clarify exactly where I stand on supporting MicroPython in this library: MicroPython support would go in a separate micropython-macaddress package (or a next-best-thing name if someone takes that name by then). The main reasons for this are that:
This does not affect any released version. I just wanted to jot down some thoughts about this design pitfall that I almost fell into with this library back when I first started it.
The behavior which I originally documented in the docstring of __eq__, but luckily accidentally did not implement, was that hardware address instances were equal if one's class was a subclass of the other's (unless the subclass had a different size for some reason).
Now that sounds pretty reasonable at face value, doesn't it? Substitution principle - subclass instances should be usable as a parent class instance, and that includes equality.
I even implemented the "fixed" version (but never merged or released) in commit bdb4d26, in case you want to play with it.
Here's the thing though: it would've resulted in this behavior:
>>> from macaddress import *
>>> class MAC1(EUI48): pass
...
>>> MAC1(0) == MAC1(0)
True
>>> MAC1(0) == EUI48(0)
True
>>> class MAC2(EUI48): pass
...
>>> MAC2(0) == EUI48(0)
True
>>> MAC2(0) == EUI48(0) == MAC1(0)
True
>>> MAC2(0) == MAC1(0)
False
>>> MAC2(0) == MAC1(0) == EUI48(0)
False
That should raise some alarms. foo == bar == qux is true but foo == qux == bar is false!?
This kind of inconsistency would've been a terrible API pitfall - most users would not expect equality to work this way, and the common case would absolutely lead to people forming expectations that equality is transitive.
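For reference, a minimal self-contained sketch (not the library's actual code) of that subclass-aware __eq__ reproduces the transcript's non-transitive behavior:

```python
class HWAddress:
    def __init__(self, value):
        self._value = int(value)
    def __eq__(self, other):
        # The almost-shipped rule: equal if either class is a
        # subclass of the other (sizes assumed equal for brevity).
        if isinstance(other, type(self)) or isinstance(self, type(other)):
            return self._value == other._value
        return NotImplemented

class EUI48(HWAddress): pass
class MAC1(EUI48): pass
class MAC2(EUI48): pass

assert MAC1(0) == EUI48(0)
assert MAC2(0) == EUI48(0)
# Sibling subclasses compare unequal, so equality is not transitive:
assert not (MAC2(0) == MAC1(0))
```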
This issue continues the discussion from pull request #2. Here's @mentalisttraceur's dump of thoughts:
Quick question: do you have any thoughts on different classes, or classes with different names, hashing differently vs the same?
I was thinking that the right default is for subclass instances to compare equal to their base class instances (they don't right now, despite the __eq__ docstring saying that they do, due to a bug that I haven't gotten around to committing a fix for), and this means that subclasses should probably hash the same way as their parent class. A couple of examples of this:
The MAC and EUI48 classes. I think the most sensible behavior is that they're effectively aliases, so much so that maybe I should have just done MAC = EUI48 instead of class MAC(EUI48): .... However, maybe someone sees a good use-case for them to count as two different members of a set?
MAC and the MACAllowsTrailingDelimiters example - this seems like a totally clear-cut case of a situation where the difference of type or even just type name shouldn't cause objects to compare or hash differently. Of course, it was kinda bad on my part to suggest changing formats by subclassing. If I provide and encourage a better way, then this becomes irrelevant.
But the current idea of class-aware equality comparison creates a weird hashing and set membership situation. Sets and dictionaries and so on consider things the same if they are equal, but they expect and rely on equal things to hash the same, right?
So for example, if we have two subclasses of MAC, then they both compare equal to MAC instances with the same address (once the bug I described is fixed), but they compare unequal to each other. Even if we get hashing to be the same, this means we can get different sets purely based on the order of adds to those sets! That's messed up.
For another example, if two HWAddress types compare equal, then you have to know how they hash, and if they hash differently then I think it becomes nondeterministic whether they get counted as one member or two in the set (deterministic only if we know the underlying implementation details and all operations done, like how many buckets are in the underlying hash table at any given moment and how the hash method's hash is mapped to each bucket).
My current thoughts are:
If addresses just reject anything that's not exactly the class that they are, it breaks the use of subclasses for various stuff, but that is easy to fix by overriding equality operations as appropriate, which could "cast" self to the class that we want to be equivalent to and forward to that equality method. This is consistent with a hash by type and address value. The only reasons right now for classes to do subclassing like this are the MAC/EUI-48 situation (honestly I should probably just alias them) and the formatting situation (I should probably expose the relevant logic so people could do it without subclassing - how often is there a good reason for this to actually come up anyway? never?)
If classes compare equal by just size and address value, then this is consistent with hashing by size and value. It might also break situations where people have identically-sized addresses that they want to be distinct (when though? is that a real case? I can imagine cases, but real ones? Like what if you have token ring and Ethernet MAC addresses - you probably want logic to distinguish them at the type level, but for equality or hashing purposes you might not? If you have two different hardware address types that have the same bit pattern, are you more benefitted from an "if equal: bad" case being easy to implement, or from keeping them both from colliding in a set or as dictionary keys?)
Of course I should also keep my eyes on the common cases! Those should guide a lot of these choices. The uncommon cases should just be reasonably doable.
The common case has no subclassing beyond what this library does.
The commonest appropriate use case for subclassing a HWAddress subclass would presumably be to add unrelated enriching behavior, and probably should not affect equality or hashing.
Treating MAC and EUI48 as equivalent in all ways seems like it is the right behavior more often than not. (In particular, a reasonable situation is that a user writes code like isinstance(something, macaddress.MAC), and this should not break if something was initialized by someone who used macaddress.EUI48.) So I think I've decided that MAC and EUI48 should be aliases of each other. This is a minor breaking change that would probably affect no one. The one downside of this is that error messages use the class name, so you could get error messages that refer to the other class name. Maybe this means that MAC just shouldn't exist as a name exported by this library, but that's a much bigger breaking change.
Alright, that concludes this chunk of the thought dump.
It would be great to have the license file included in the package that is downloaded from PyPI.
So one thing that comes up in real-world parsing is parsing part of a string, up to the end of the next parsed thing.
In other words you might want something like this:
    remaining_text = '01-23-45-67-89-AF whatever ...'
    mac, remaining_text = parse_next(remaining_text, MAC)
    assert mac == MAC('01-23-45-67-89-AF')
    assert remaining_text == 'whatever ...'
And of course you want some sort of sensible error behavior.
Notably, right now macaddress doesn't help you do this. It assumes you already have strings which are the right size to pass in to the class constructor or parse.
Of course, neither does ipaddress... is this a deficit on the part of ipaddress? Or should this kind of functionality be provided somehow by higher-level wrapper libraries?
Notably:
If an incremental parsing library can't share parsing logic, then it has to reimplement a lot of the same parsing logic.
If an incremental parsing library has to reuse ipaddress or macaddress as it is right now, then it must necessarily take on an inefficiency:
Imagine you have a string that you want to parse for the next "token" being either an EUI48 or OUI literal. (Maybe you're writing a mini language for working with networking settings, like for scripting a router or something.)
(Quick aside: when parsing stuff from the front of something longer, parsers might want to do either a greedy match or a conservative match: if you see the string 01-02-03-04-05-06, do you conservatively match just OUI('01-02-03') and leave -04... unconsumed, or do you greedily match the whole EUI48('01-02-03-04-05-06')? In this use-case I think it's obvious that you'd want to do the greedy match, but in some cases you might want the conservative one.)
So the point is, from an efficiency perspective, ideally the parsing consumes 01-02-03, and then decides based on the next character whether it is done (and has parsed an OUI instance) or needs to continue to try to parse a full EUI48 instance.
(There's another permutation: if the next character is -, then sometimes you want to consider the OUI possibility eliminated, so that if the rest doesn't parse into a full EUI48, it's now an error; and other times you want to consider the OUI possibility still "live", so that if the greedy match fails, you treat it as if you just parsed an OUI instance and stopped there, so that the next step of the parser can read from right after the OUI.)
In all of these cases though, it kinda makes sense to me to parse all the possibilities incrementally - check each character against all parse possibilities - the way I do in macaddress._parse. And while the logic of macaddress._parse assumes a string containing one complete address, and only tries the formats which match the size, it's probably pretty close to what an incremental parser would want in that case.
So it seems like a shame that I've written some careful logic, which can parse canonical and custom formats into all these hardware address types represented as distinctly typed objects in memory, and yet if someone wanted to write a parser which parses hardware addresses from the next characters in a string, they'd basically have to reimplement a lot of what I did, or use what I did at the cost of a lot of boilerplate and inefficiency.
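As a sketch of the shape such an incremental helper could take (parse_next is hypothetical, not part of macaddress; to stay self-contained this matches only the canonical dash-separated format with regular expressions, rather than reusing macaddress._parse):

```python
import re

# Greedy-first matching: try the longer EUI-48 literal, then fall
# back to the shorter OUI literal.
_EUI48 = re.compile(r'[0-9A-Fa-f]{2}(?:-[0-9A-Fa-f]{2}){5}')
_OUI = re.compile(r'[0-9A-Fa-f]{2}(?:-[0-9A-Fa-f]{2}){2}')

def parse_next(text):
    """Consume one hardware address literal from the front of `text`.

    Returns (kind, literal, remaining_text); raises ValueError if no
    address literal starts the string.
    """
    for kind, pattern in (('EUI48', _EUI48), ('OUI', _OUI)):
        match = pattern.match(text)
        if match:
            return kind, match.group(0), text[match.end():].lstrip()
    raise ValueError(f'no hardware address at the start of: {text!r}')

kind, literal, rest = parse_next('01-23-45-67-89-AF whatever ...')
assert (kind, literal, rest) == ('EUI48', '01-23-45-67-89-AF', 'whatever ...')
```

A real version would presumably return constructed address instances and support all of the class's formats; this only illustrates the greedy-match interface.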
Tentative verdict: add support for __add__ and __sub__ but not __radd__ and __rsub__. Don't wrap around at the edges.
Questions:
Should MAC('01-02-03-04-05-06') + 1 work to step to the "next" address, and produce MAC('01-02-03-04-05-07')?
What should MAC('FF-FF-FF-FF-FF-FF') + 1 do? (Error, or wraparound to MAC('00-00-00-00-00-00')?)
My thoughts on those questions:
The main benefits if incrementing is supported natively with operator overloads:
MAC(int(mac) + 1) becomes just mac + 1
mac = MAC(int(mac) + 1) becomes just mac += 1
type(instance)(int(instance) + 1) becomes just instance + 1
The first two are just convenient and ergonomic for humans, especially during interactive use in a REPL, Jupyter Notebook, whatever. For reference, the ipaddress module supports this, and I'm pretty sure I've used that before during prototyping/testing/fiddling.
That last one is particularly compelling to me in a "this should be a feature in the library" way because the ability to write type-generic/type-abstracted helper code is huge for empowering people to cleanly refactor repetitive code and build helper libraries that might be useful across a wider variety of use-cases.
This may also prove helpful for implementing sequences/prefixes (mostly discussed in #7 ). Not overwhelmingly compelling in itself, but the fact that I reached for it several times while playing with sequences/prefixes smells like another angle of this being a good idea.
Also, if I was implementing a MAC address allocator (normally I use the example of a hardware manufacturer, but actually an example closer to most of us is virtual networking stuff - think multiple containers or virtual machines on the same computer talking to each other), then the ability to concisely express "the next MAC address" seems like it would be rather nice.
One problem that I see with just doing simple duck typing is that this implementation would be broken:
    def __add__(self, other):
        return type(self)(int(self) + int(other))
because then stuff like this would work in surprising ways: a_mac_address + 0.5, a_mac_address + other_mac_address, and a_mac_address + an_ip_address - but all of those are nonsensical at face value and more likely to be an error than an intentional coercion. These also don't have a single natural interpretation/expectation/behavior/semantics, whereas if I do + 1 onto an address, I think there's a good argument that this has one interpretation that fits much better than any other.
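A quick self-contained demonstration (NaiveMAC is a stand-in class, not the library's) of how the naive version silently accepts nonsense:

```python
class NaiveMAC:
    def __init__(self, value):
        self._value = int(value)
    def __int__(self):
        return self._value
    def __add__(self, other):
        # Naive duck typing: coerces anything int()-convertible.
        return type(self)(int(self) + int(other))

a = NaiveMAC(10)
assert int(a + NaiveMAC(5)) == 15  # adding two addresses "works"
assert int(a + 0.5) == 10          # the float is silently truncated
```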
Similarly, this is less broken but still broken:
    def __add__(self, other):
        return type(self)(int(self) + other)

    def __radd__(self, other):
        return type(self)(other + int(self))
because while this avoids the problems with silently coercively tolerating floats and so on, this would still allow a_mac_address + another_mac_address to silently work.
That last one can be interpreted as either a problem with naive permissive duck-typing or as a problem with implementing __radd__. But notably, operator reversal can work so long as there's more robust type checking:
    def __add__(self, other):
        if not isinstance(other, int):
            return NotImplemented
        return type(self)(int(self) + other)

    def __radd__(self, other):
        if not isinstance(other, int):
            return NotImplemented
        return type(self)(other + int(self))
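Fleshed out into a runnable self-contained sketch (MAC here is a minimal stand-in, not the library's class), the type-checked version behaves as intended:

```python
class MAC:
    def __init__(self, value):
        self._value = int(value)
    def __int__(self):
        return self._value
    def __eq__(self, other):
        return type(other) is type(self) and int(other) == int(self)
    def __add__(self, other):
        if not isinstance(other, int):
            return NotImplemented
        return type(self)(int(self) + other)
    def __radd__(self, other):
        if not isinstance(other, int):
            return NotImplemented
        return type(self)(other + int(self))

assert MAC(5) + 1 == MAC(6)
assert 1 + MAC(5) == MAC(6)  # __radd__ makes the reversed order work
try:
    # Both methods return NotImplemented, so Python raises TypeError:
    MAC(5) + MAC(5)
except TypeError:
    pass
```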
Seems super easy to me: every justification for overloading + in our case supports overloading - as well, because + is basically defined as going that many "steps" along the natural ordering of addresses, so naturally - is just going back by the same number of steps.
So this one was a serious challenge for me: deciding if integer + mac should also work, or if only mac + integer should work.
As a practical matter: increment = functools.partial(operator.add, 1) is something that only works on address instances if they implement __radd__.
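To spell out that practical point (illustrated here on plain integers, which already support __radd__-style reversed addition):

```python
import functools
import operator

# operator.add(1, x) computes 1 + x, so this `increment` helper only
# works on a hardware address class if that class implements __radd__:
increment = functools.partial(operator.add, 1)

assert increment(41) == 42
```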
I also had a concern that implementing both __add__ and __radd__ could cause the problems described in the previous point - but I think I've convinced myself that this can be properly solved by implementing type-checking, and that this type checking is worth implementing anyway.
But the biggest thing was deciding if the order of + parameters should be meaningful. My initial reflex was to think that order shouldn't matter: all the typical + operations I got taught in school and that I normally see in math, science, and programming have + be commutative. But then I thought about - and it hit me that we can validly think of + in an order-matters way - similarly to -. Because obviously 1 - mac_address doesn't make sense. So the reason why mac_address - 1 makes sense is that we can think of - as "decrement by", and there is a natural way to sort addresses, etc. So we could also think of + as "increment by". And "increment by" is not a commutative operation. It makes sense to "increment this address by one" (go to the "next" address), but it doesn't make sense to "increment this number by this address".
Plus, this is what ipaddress does: the base classes implement __add__ and __sub__ but not __radd__ and __rsub__.
So for subtraction, it makes sense to not implement __rsub__, and so for symmetry between addition and subtraction, it seems better to not implement __radd__ until there's a really compelling argument.
Precedent: ipaddress classes raise ValueError on address overflow and underflow. Do I have good reason to do otherwise? (Also a lesser/looser/vaguer/more distant precedent: Python integers don't wrap around; they're arbitrary-precision.)
I think consistency is important within an ecosystem - so if Python developers are used to things not wrapping around, there needs to be a good reason for a Python macaddress library to wrap around. And if Python developers are not used to thinking of not-obviously-superficially-integer things as being "actually" integers (represented or just representable as integers), there needs to be a good reason for semantics that only make sense for integers.
Now, as someone very comfortable with fixed-width wrap-around integers at the bare-metal level and in languages such as C, Go, etc., and as someone comfortable with low-level implementation details and translating between functionally equivalent representations, I personally would find it intuitive to have MAC('FF-FF-FF-FF-FF-FF') + 1 == MAC('00-00-00-00-00-00').
But I think that's a very non-representative perspective. Near as I can tell, a lot of Python devs are used to operating at a higher level of abstraction, where this would be surprising and nonsensical.
So what good reasons would there be to overpower that? The only one I can think of is if you want to start with a random hardware address instance, and then perform a single operation that would always give you a different valid address. But obviously the most readable form of that is something like mac + 1 if mac < MAC('FF-FF-FF-FF-FF-FF') else mac - 1. In one-off interactive REPL fiddling, you could do MAC((int(mac) + 1) & ((1 << mac.size) - 1)). Which is annoying, but not awful - if you find that intuitive enough to spontaneously think of it, you can probably decide to do it. Similarly, if I come up with a good address range/sequence addition to this library, then that can be combined with a general-purpose itertools-style thing that turns any sequence into a cycle.
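The explicit wrap-around, spelled out on plain integers as a sketch of what that REPL fiddling does (size 48 assumed, as for an EUI-48):

```python
size = 48
all_ones = (1 << size) - 1  # the FF-FF-FF-FF-FF-FF bit pattern

# Incrementing under an explicit mask wraps around instead of
# producing an out-of-range value:
assert (all_ones + 1) & ((1 << size) - 1) == 0
assert (5 + 1) & ((1 << size) - 1) == 6  # non-edge values unaffected
```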
Should the HWAddress base class be changed to an abstract base class, a metaclass, or a decorator? (Edit: or get an __init_subclass__ method?)
Here's a concept: when the "constructor" (__init__) of a hardware address class gets a hardware address instance of a different size as its address parameter, we could truncate it or zero-pad it (from the "right") as needed to match the newly constructed address's size.
Should we do this?
Right now the code uses self.size. Should it use type(self).size instead?
I am tempted to say "yes".
The only behavior differences I can think of are that a HWAddress subclass where size is defined on the instance rather than on the class would no longer work, and that AttributeError messages may differ - in CPython, that difference is that it says type object 'YourSubclass' ... instead of 'YourSubclass' object ...
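The CPython message difference can be checked directly (Example is an arbitrary stand-in class):

```python
class Example:
    __slots__ = ()  # no instance dict, and no `size` defined anywhere

instance = Example()

try:
    instance.size          # instance attribute lookup
except AttributeError as error:
    instance_message = str(error)

try:
    type(instance).size    # class attribute lookup
except AttributeError as error:
    class_message = str(error)

assert "'Example' object" in instance_message
assert "type object 'Example'" in class_message
```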
If you have a use-case for weakly referencing classes provided by this library, let me know.
I don't see a use-case for it, because weak references exist to break reference cycles, but the classes provided by this library don't really reference anything.
So I've been holding off on adding __weakref__
to __slots__
, because it seems pointless. And I kinda want to wait and see if it ever actually comes up.
But I'm open to adding it if anyone runs into a good use-case for it.
Kinda like #6: the formats could be a class-only thing, or optionally an instance-specific thing. Should they be gotten from the instance instead of from the class?
I mostly reached a stable decision on this right after #6, matching that decision, but I wanted to open this issue to discuss/record the thinking for the format-specific stuff.
I'm noticing that there's an older version on PyPI (1.2.0), but the latest here is 2.0.0. Is there any plan to release that to PyPI? (I didn't compare the differences between what's in 1.2.0 and 2.0.0.)
Thanks!
If someone came to me in code review at a job with all four of __lt__, __le__, __gt__, and __ge__ implemented, I'd immediately say "delete three of them and just use @functools.total_ordering: less potential for error, better signal-to-noise when reading code, [...]".
So is there a good enough reason for me to keep not applying my own advice to library code like this?
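For reference, a sketch of what that advice looks like in practice (Addr is a stand-in class, not this library's code):

```python
import functools

@functools.total_ordering
class Addr:
    def __init__(self, value):
        self._value = value
    def __eq__(self, other):
        return self._value == other._value
    def __lt__(self, other):
        return self._value < other._value

# __le__, __gt__, and __ge__ are derived automatically:
assert Addr(1) < Addr(2)
assert Addr(1) <= Addr(1)
assert Addr(2) > Addr(1)
assert Addr(2) >= Addr(2)
```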
Not exactly actual hardware addresses, but the SNAP protocol uses an OUI as its first part, so if I ever had to write code that needs to work with SNAP, I could totally see myself doing something like this:
    class SNAPAddress(macaddress.HWAddress):
        size = 40
        formats = ...
And then maybe doing stuff like OUI(my_snap_address) (assuming #8), etc.
Anyway, nothing to do for now, just jotting it down as another example of almost-hardware-address things which this library could be useful for.
Should macaddress.MAC be provided by this library (it currently is)?
If so, should it be a subclass of macaddress.EUI48 (like it is now), or an alias of macaddress.EUI48?
Should this library provide something analogous to the ipaddress "network" classes? If so, what should that look like?
If a hardware address class does not specify formats, but does specify a size, we could generate at least one default format string, for example like this:
    'x' * (TheClass.size // 4)
But should we?
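That generation rule, spelled out on plain values (one 'x' placeholder per 4-bit nibble; default_format is a hypothetical helper, not library code):

```python
def default_format(size):
    # One 'x' placeholder per nibble (4 bits) of the address.
    return 'x' * (size // 4)

assert default_format(48) == 'xxxxxxxxxxxx'  # EUI-48: twelve hex digits
assert default_format(24) == 'xxxxxx'        # OUI: six hex digits
```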
Currently, if you need to parse with alternative formats, you need to make a subclass:
    class MAC(macaddress.MAC):
        formats = macaddress.MAC.formats + (
            'xxx-xxx-xxx-xxx',
            'xxx:xxx:xxx:xxx',
            'xxx xxx xxx xxx',
        )
And if you want to control the output format, you can do that by making the first format what you want:
    class MAC(macaddress.MAC):
        formats = (
            'xxx-xxx-xxx-xxx',
            'xxx:xxx:xxx:xxx',
            'xxx xxx xxx xxx',
        ) + macaddress.MAC.formats
This works, and it has a very "snug fit" to the problem: the classes you're creating are still MAC addresses, they can be used anywhere those classes can be used, and behave identically except for the exact output format and input formats.
But... I have never been 100% comfortable and satisfied with using subclassing for this. It works, it is pretty comfortable and ergonomic to use, and I can't think of a better way, but I can't help but feel like there might be a better way that I am just missing.
So, thoughts?
I've been thinking that the Right(tm) way to do SemVer looks like a version bump for every. single. change.
Basically, each "atomic change" (from the original meaning of the word "atom" - so small you can't divide it further - a change that can't be split up into smaller independently usable changes) would get its own release and version number.
But what this means in practice is that version numbers would sprint up - sometimes there might be several "major" version bumps in a single day.
For context, if I was following that idea, just the changes I have in mind to implement and release so far would get us to at least version 5.0.0:
macaddress.MAC becomes an alias instead of a subclass of macaddress.EUI48.
Getting the OUI of an address becomes OUI(my_mac_address).
The .oui attribute goes away.
But knowing me, I would implement and release all those in a flurry of activity over the course of a day or two. I wouldn't want to inject artificial delays between them. That's for your automation and compatibility testing to deal with, if you need the delays.
Of course this is fine and good if you really embrace SemVer as a robotic usage of version numbers to communicate programmatically useful information to tooling and automation about API compatibility.
In other words, if you think about SemVer as if you're a finely oiled machine which feels nothing except eager ruthless joy at precisely communicating API compatibility to tooling, then one release per atomic change is really nice, elegant, sensible, and very helpful.
In fact, if you're a human in a world where that tooling exists and works well, used to working on projects which take full advantage of that in their CI/CD, well then you'd probably really want projects to roll releases like that.
But if you're a human manually watching for updates of dependencies and manually checking what changed, this would lead to a worse experience. One day you check the dependency and see that you're "four major versions behind?!?" and if you're not used to this idea, it's surprising, weird, and would probably leave you feeling like your time was wasted after you make an effort to find what major thing changed in each one, when really each one was just a tiny breaking change which didn't matter to you.
It also has a slight impression management downside: like back when Google Chrome first came out and shot through major version after major version - Firefox had been sitting in the ballpark of 3.* for years, because they were saving major versions for the really big stuff, then Chrome was up at version 12.* in like a year - and people felt stuff like "why are there so many major versions?" and "they're just trying to pad their numbers to look on par with the major browsers".