mentalisttraceur / python-macaddress
License: BSD Zero Clause License
We should probably delete HWAddress.__ne__ and let Python automatically implement it.
Thoughts?
Verdict: no, unless someone brings a particularly compelling reason - probably only if people start asking for it.
Rationale: users can easily compose type-constraining functions on top of a type-flexible constructor, like this:
    import macaddress

    # When the hardware address class is known:
    def eui48_from_str(string: str) -> macaddress.EUI48:
        if not isinstance(string, str):
            raise TypeError(f"not a string: {string!r}")
        return macaddress.EUI48(string)

    # When the hardware address class is dynamic:
    from typing import Type, TypeVar

    HWAddressType = TypeVar('HWAddressType', bound=macaddress.HWAddress)

    def from_str(cls: Type[HWAddressType], string: str) -> HWAddressType:
        if not isinstance(string, str):
            raise TypeError(f"not a string: {string!r}")
        return cls(string)
However, Python is a batteries-included language, and I could imagine wanting these batteries in this library. So if someone finds themselves wanting it, I might be pretty easily swayed, especially by seeing an example use-case where this is nicer to have out-of-the-box.
Based on the decision in #8, starting with version 2.0.0, the canonical way to get the OUI of an address will be OUI(address) instead of address.oui.
However, should address.oui still work?
Jotting down rationale for why this library's classes should not implement __index__.
Abstractly: __index__ means "this is an integer", not "coercions of this to an integer are never lossy".
Concretely: an_array[a_mac_address] is usually nonsensical, an_array[one_mac_address:second_mac_address] has even more of its possibility space be nonsensical, an_array[::a_mac_address] is definitely nonsensical, and a_string * a_mac_address is profoundly nonsensical. And yet __index__ would enable all of those.
In every case I can imagine saying any of the above would be useful, I would reflexively think some code is designed wrong, and even if it's right and good in some edge case, I'm pretty sure code would literally always be more clear and self-descriptive to readers and maintainers if it was forced to explicitly convert to integers in those cases.
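To make the point concrete, here is a small self-contained sketch (AddressLike is a hypothetical stand-in, not a class from this library) of what omitting __index__ forces callers to write:

```python
class AddressLike:
    """Hypothetical stand-in for a hardware address:
    convertible to int, but deliberately without __index__."""
    def __init__(self, value):
        self._value = value
    def __int__(self):
        return self._value

an_array = ['a', 'b', 'c']
a_mac_address = AddressLike(1)

# The explicit conversion works, and reads as an intentional choice:
assert an_array[int(a_mac_address)] == 'b'

# Without __index__, implicit use as an index is rejected:
try:
    an_array[a_mac_address]
except TypeError:
    pass
```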
Jotting this down to clarify exactly where I stand on supporting MicroPython in this library: MicroPython support would go in a separate micropython-macaddress package (or a next-best-thing name if someone takes that name by then). The main reasons for this are that:
This does not affect any released version. I just wanted to jot down some thoughts about this design pitfall that I almost fell into with this library back when I first started it.
The behavior which I originally documented in the docstring of __eq__, but luckily accidentally did not implement, was that hardware address instances were equal if one's class was a subclass of the other's (unless the subclass had a different size for some reason).
Now that sounds pretty reasonable at face value, doesn't it? Substitution principle - subclass instances should be usable as a parent class instance, and that includes equality.
I even implemented the "fixed" version (but never merged or released) in commit bdb4d26, in case you want to play with it.
Here's the thing though: it would've resulted in this behavior:
>>> from macaddress import *
>>> class MAC1(EUI48): pass
...
>>> MAC1(0) == MAC1(0)
True
>>> MAC1(0) == EUI48(0)
True
>>> class MAC2(EUI48): pass
...
>>> MAC2(0) == EUI48(0)
True
>>> MAC2(0) == EUI48(0) == MAC1(0)
True
>>> MAC2(0) == MAC1(0)
False
>>> MAC2(0) == MAC1(0) == EUI48(0)
False
That should raise some alarms. foo == bar == qux is true but foo == qux == bar is false!?
This kind of inconsistency would've been a terrible API pitfall - most users would not expect equality to work this way, and the common case would absolutely lead to people forming expectations that equality is transitive.
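For reference, a minimal self-contained sketch (not the library's actual code) of that subclass-aware __eq__ reproduces the transcript's non-transitive behavior:

```python
class HWAddress:
    def __init__(self, value):
        self._value = int(value)
    def __eq__(self, other):
        # The almost-shipped rule: equal if either class is a
        # subclass of the other (sizes assumed equal for brevity).
        if isinstance(other, type(self)) or isinstance(self, type(other)):
            return self._value == other._value
        return NotImplemented

class EUI48(HWAddress): pass
class MAC1(EUI48): pass
class MAC2(EUI48): pass

assert MAC1(0) == EUI48(0)
assert MAC2(0) == EUI48(0)
# Sibling subclasses compare unequal, so equality is not transitive:
assert not (MAC2(0) == MAC1(0))
```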
This issue continues the discussion from pull request #2. Here's @mentalisttraceur's dump of thoughts:
Quick question: do you have any thoughts on different classes, or classes with different names, hashing differently vs the same?
I was thinking that the right default is for subclass instances to compare equal to their base class instances (they don't right now, despite the __eq__ docstring saying that they do, due to a bug that I haven't gotten around to committing a fix for), and this means that subclasses should probably hash the same way as their parent class. A couple of examples of this:
The MAC and EUI48 classes. I think the most sensible behavior is that they're effectively aliases, so much so that maybe I should have just done MAC = EUI48 instead of class MAC(EUI48): .... However, maybe someone sees a good use-case for them to count as two different members of a set?
MAC and the MACAllowsTrailingDelimiters example - this seems like a totally clear-cut case of a situation where the difference of type or even just type name shouldn't cause objects to compare or hash differently. Of course, it was kinda bad on my part to suggest changing formats by subclassing. If I provide and encourage a better way, then this becomes irrelevant.
But the current idea of class-aware equality comparison creates a weird hashing and set membership situation. Sets and dictionaries and so on consider things the same if they are equal, but they expect and rely on equal things to hash the same, right?
So for example, if we have two subclasses of MAC, then they both compare equal to MAC instances with the same address (once the bug I described is fixed), but they compare unequal to each other. Even if we get hashing to be the same, this means we can get different sets purely based on the order of adds to those sets! That's messed up.
For another example, if two HWAddress types compare equal, then you have to know how they hash, and if they hash differently then I think it becomes nondeterministic whether they get counted as one member or two in the set (deterministic only if we know the underlying implementation details and all operations done, like how many buckets are in the underlying hash table at any given moment and how the hash method's hash is mapped to each bucket).
My current thoughts are:
If addresses just reject anything that's not exactly the class that they are, it breaks the use of subclasses for various stuff, but that is easy to fix by overriding equality operations as appropriate, which could "cast" self to the class that we want to be equivalent to and forward to that equality method. This is consistent with a hash by type and address value. The only reasons right now for classes to do subclassing like this are the MAC/EUI-48 situation (honestly I should probably just alias them) and the formatting situation (I should probably expose the relevant logic so people could do it without subclassing - how often is there a good reason for this to actually come up anyway? never?)
If classes compare equal by just size and address value, then this is consistent with hashing by size and value. It might also break situations where people have identically-sized addresses that they want to be distinct (when though? is that a real case? I can imagine cases, but real ones? Like what if you have token ring and Ethernet MAC addresses - you probably want logic to distinguish them at the type level, but for equality or hashing purposes you might not? If you have two different hardware address types that have the same bit pattern, are you more benefitted from an "if equal: bad" case being easy to implement, or from keeping them both from colliding in a set or as dictionary keys?)
Of course I should also keep my eyes on the common cases! Those should guide a lot of these choices. The uncommon cases should just be reasonably doable.
The common case has no subclassing beyond what this library does.
The commonest appropriate use case for subclassing a HWAddress subclass would presumably be to add unrelated enriching behavior, and probably should not affect equality or hashing.
Treating MAC and EUI48 as equivalent in all ways seems like it is the right behavior more often than not. (In particular, a reasonable situation is that a user writes code like isinstance(something, macaddress.MAC), and this should not break if something was initialized by someone who used macaddress.EUI48.) So I think I've decided that MAC and EUI48 should be aliases of each other. This is a minor breaking change that would probably affect no one. The one downside of this is that error messages use the class name, so you could get error messages that refer to the other class name. Maybe this means that MAC just shouldn't exist as a name exported by this library, but that's a much bigger breaking change.
Alright, that concludes this chunk of the thought dump.
It would be great to have the license file included in the package that is downloaded from PyPI.
So one thing that comes up in real-world parsing is parsing part of a string, up to the end of the next parsed thing.
In other words you might want something like this:
    remaining_text = '01-23-45-67-89-AF whatever ...'
    mac, remaining_text = parse_next(remaining_text, MAC)
    assert mac == MAC('01-23-45-67-89-AF')
    assert remaining_text == 'whatever ...'
And of course you want some sort of sensible error behavior.
Notably, right now macaddress doesn't help you do this. It assumes you already have strings which are the right size to pass in to the class constructor or parse.
Of course, neither does ipaddress... is this a deficit on the part of ipaddress? Or should this kind of functionality be provided somehow by higher-level wrapper libraries?
Notably:
If an incremental parsing library can't share parsing logic, then it has to reimplement a lot of the same parsing logic.
If an incremental parsing library has to reuse ipaddress or macaddress as it is right now, then it must necessarily take on an inefficiency:
Imagine you have a string that you want to parse for the next "token" being either an EUI48 or OUI literal. (Maybe you're writing a mini language for working with networking settings, like for scripting a router or something.)
(Quick aside: when parsing stuff from the front of something longer, parsers might want to do either a greedy match or a conservative match: if you see the string 01-02-03-04-05-06, do you conservatively match just OUI('01-02-03') and leave -04... unconsumed, or do you greedily match the whole EUI48('01-02-03-04-05-06')? In this use-case I think it's obvious that you'd want to do the greedy match, but in some cases you might want the conservative one.)
So the point is, from an efficiency perspective, ideally the parsing consumes 01-02-03, and then decides based on the next character whether it is done (and has parsed an OUI instance) or needs to continue to try to parse a full EUI48 instance.
(There's another permutation: if the next character is -, then sometimes you want to consider the OUI possibility eliminated, so that if the rest doesn't parse into a full EUI48, it's now an error; and other times you want to consider the OUI possibility still "live", so that if the greedy match fails, you treat it as if you just parsed an OUI instance and stopped there, so that the next step of the parser can read from right after the OUI.)
In all of these cases though, it kinda makes sense to me to parse all the possibilities incrementally - check each character against all parse possibilities - the way I do in macaddress._parse. And while the logic of macaddress._parse assumes a string containing one complete address, and only tries the formats which match the size, it's probably pretty close to what an incremental parser would want in that case.
So it seems like a shame that I've written some careful logic, which can parse canonical and custom formats into all these hardware address types represented as distinctly typed objects in memory, and yet if someone wanted to write a parser which parses hardware addresses from the next characters in a string, they'd basically have to reimplement a lot of what I did, or use what I did at the cost of a lot of boilerplate and inefficiency.
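As a sketch of the shape such an incremental helper could take (parse_next is hypothetical, not part of macaddress; to stay self-contained this matches only the canonical dash-separated format with regular expressions, rather than reusing macaddress._parse):

```python
import re

# Greedy-first matching: try the longer EUI-48 literal, then fall
# back to the shorter OUI literal.
_EUI48 = re.compile(r'[0-9A-Fa-f]{2}(?:-[0-9A-Fa-f]{2}){5}')
_OUI = re.compile(r'[0-9A-Fa-f]{2}(?:-[0-9A-Fa-f]{2}){2}')

def parse_next(text):
    """Consume one hardware address literal from the front of `text`.

    Returns (kind, literal, remaining_text); raises ValueError if no
    address literal starts the string.
    """
    for kind, pattern in (('EUI48', _EUI48), ('OUI', _OUI)):
        match = pattern.match(text)
        if match:
            return kind, match.group(0), text[match.end():].lstrip()
    raise ValueError(f'no hardware address at the start of: {text!r}')

kind, literal, rest = parse_next('01-23-45-67-89-AF whatever ...')
assert (kind, literal, rest) == ('EUI48', '01-23-45-67-89-AF', 'whatever ...')
```

A real version would presumably return constructed address instances and support all of the class's formats; this only illustrates the greedy-match interface.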
Tentative verdict: add support for __add__ and __sub__ but not __radd__ and __rsub__. Don't wrap around at the edges.
Questions:
Should MAC('01-02-03-04-05-06') + 1 work to step to the "next" address, and produce MAC('01-02-03-04-05-07')?
What should MAC('FF-FF-FF-FF-FF-FF') + 1 do? (Error, or wraparound to MAC('00-00-00-00-00-00')?)
My thoughts on those questions:
The main benefits if incrementing is supported natively with operator overloads:
MAC(int(mac) + 1) becomes just mac + 1
mac = MAC(int(mac) + 1) becomes just mac += 1
type(instance)(int(instance) + 1) becomes just instance + 1
The first two are just convenient and ergonomic for humans, especially during interactive use in a REPL, Jupyter Notebook, whatever. For reference, the ipaddress module supports this, and I'm pretty sure I've used that before during prototyping/testing/fiddling.
That last one is particularly compelling to me in a "this should be a feature in the library" way because the ability to write type-generic/type-abstracted helper code is huge for empowering people to cleanly refactor repetitive code and build helper libraries that might be useful across a wider variety of use-cases.
This may also prove helpful for implementing sequences/prefixes (mostly discussed in #7 ). Not overwhelmingly compelling in itself, but the fact that I reached for it several times while playing with sequences/prefixes smells like another angle of this being a good idea.
Also, if I was implementing a MAC address allocator (normally I use the example of a hardware manufacturer, but actually an example closer to most of us is virtual networking stuff - think multiple containers or virtual machines on the same computer talking to each other), then the ability to concisely express "the next MAC address" seems like it would be rather nice.
One problem that I see with just doing simple duck typing is that this implementation would be broken:
    def __add__(self, other):
        return type(self)(int(self) + int(other))
because then stuff like this would work in surprising ways: a_mac_address + 0.5, a_mac_address + other_mac_address, and a_mac_address + an_ip_address - but all of those are nonsensical at face value and more likely to be an error than an intentional coercion. These also don't have a single natural interpretation/expectation/behavior/semantics, whereas if I do + 1 onto an address, I think there's a good argument that this has one interpretation that fits much better than any other.
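A quick self-contained demonstration (NaiveMAC is a stand-in class, not the library's) of how the naive version silently accepts nonsense:

```python
class NaiveMAC:
    def __init__(self, value):
        self._value = int(value)
    def __int__(self):
        return self._value
    def __add__(self, other):
        # Naive duck typing: coerces anything int()-convertible.
        return type(self)(int(self) + int(other))

a = NaiveMAC(10)
assert int(a + NaiveMAC(5)) == 15  # adding two addresses "works"
assert int(a + 0.5) == 10          # the float is silently truncated
```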
Similarly, this is less broken but still broken:
    def __add__(self, other):
        return type(self)(int(self) + other)

    def __radd__(self, other):
        return type(self)(other + int(self))
because while this avoids the problems with silently coercively tolerating floats and so on, this would still allow a_mac_address + another_mac_address to silently work.
That last one can be interpreted as either a problem with naive permissive duck-typing or as a problem with implementing __radd__. But notably, operator reversal can work so long as there's more robust type checking:
    def __add__(self, other):
        if not isinstance(other, int):
            return NotImplemented
        return type(self)(int(self) + other)

    def __radd__(self, other):
        if not isinstance(other, int):
            return NotImplemented
        return type(self)(other + int(self))
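Fleshed out into a runnable self-contained sketch (MAC here is a minimal stand-in, not the library's class), the type-checked version behaves as intended:

```python
class MAC:
    def __init__(self, value):
        self._value = int(value)
    def __int__(self):
        return self._value
    def __eq__(self, other):
        return type(other) is type(self) and int(other) == int(self)
    def __add__(self, other):
        if not isinstance(other, int):
            return NotImplemented
        return type(self)(int(self) + other)
    def __radd__(self, other):
        if not isinstance(other, int):
            return NotImplemented
        return type(self)(other + int(self))

assert MAC(5) + 1 == MAC(6)
assert 1 + MAC(5) == MAC(6)  # __radd__ makes the reversed order work
try:
    # Both methods return NotImplemented, so Python raises TypeError:
    MAC(5) + MAC(5)
except TypeError:
    pass
```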
Seems super easy to me: every justification for overloading + in our case supports overloading - as well, because + is basically defined as going that many "steps" along the natural ordering of addresses, so naturally - is just going back by the same number of steps.
So this one was a serious challenge for me: deciding if integer + mac should also work, or if only mac + integer should work.
As a practical matter: increment = functools.partial(operator.add, 1) is something that only works on address instances if they implement __radd__.
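To spell out that practical point (illustrated here on plain integers, which already support __radd__-style reversed addition):

```python
import functools
import operator

# operator.add(1, x) computes 1 + x, so this `increment` helper only
# works on a hardware address class if that class implements __radd__:
increment = functools.partial(operator.add, 1)

assert increment(41) == 42
```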
I also had a concern that implementing both __add__ and __radd__ could cause the problems described in the previous point - but I think I've convinced myself that this can be properly solved by implementing type-checking, and that this type checking is worth implementing anyway.
But the biggest thing was deciding if the order of + parameters should be meaningful. My initial reflex was to think that order shouldn't matter: all the typical + operations I got taught in school and that I normally see in math, science, and programming have + be commutative. But then I thought about - and it hit me that we can validly think of + in an order-matters way - similarly to -. Because obviously 1 - mac_address doesn't make sense. So the reason why mac_address - 1 makes sense is that we can think of - as "decrement by", and there is a natural way to sort addresses, etc. So we could also think of + as "increment by". And "increment by" is not a commutative operation. It makes sense to "increment this address by one" (go to the "next" address), but it doesn't make sense to "increment this number by this address".
Plus, this is what ipaddress does: the base classes implement __add__ and __sub__ but not __radd__ and __rsub__.
So for subtraction, it makes sense to not implement __rsub__, and so for symmetry between addition and subtraction, it seems better to not implement __radd__ until there's a really compelling argument.
Precedent: ipaddress classes raise ValueError on address overflow and underflow. Do I have good reason to do otherwise? (Also a lesser/looser/vaguer/more distant precedent: Python integers don't wrap around; they're arbitrary-precision.)
I think consistency is important within an ecosystem - so if Python developers are used to things not wrapping around, there needs to be a good reason for a Python macaddress library to wrap around. And if Python developers are not used to thinking of not-obviously-superficially-integer things as being "actually" integers (represented or just representable as integers), there needs to be a good reason for semantics that only make sense for integers.
Now, as someone very comfortable with fixed-width wrap-around integers at the bare-metal level and in languages such as C, Go, etc., and as someone comfortable with low-level implementation details and translating between functionally equivalent representations, I personally would find it intuitive to have MAC('FF-FF-FF-FF-FF-FF') + 1 == MAC('00-00-00-00-00-00').
But I think that's a very non-representative perspective. Near as I can tell, a lot of Python devs are used to operating at a higher level of abstraction, where this would be surprising and nonsensical.
So what good reasons would there be to overpower that? The only one I can think of is if you want to start with a random hardware address instance, and then perform a single operation that would always give you a different valid address. But obviously the most readable form of that is something like mac + 1 if mac < MAC('FF-FF-FF-FF-FF-FF') else mac - 1. In one-off interactive REPL fiddling, you could do MAC((int(mac) + 1) & ((1 << mac.size) - 1)). Which is annoying, but not awful - if you find that intuitive enough to spontaneously think of it, you can probably decide to do it. Similarly, if I come up with a good address range/sequence addition to this library, then that can be combined with a general-purpose itertools-style thing that turns any sequence into a cycle.
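The explicit wrap-around, spelled out on plain integers as a sketch of what that REPL fiddling does (size 48 assumed, as for an EUI-48):

```python
size = 48
all_ones = (1 << size) - 1  # the FF-FF-FF-FF-FF-FF bit pattern

# Incrementing under an explicit mask wraps around instead of
# producing an out-of-range value:
assert (all_ones + 1) & ((1 << size) - 1) == 0
assert (5 + 1) & ((1 << size) - 1) == 6  # non-edge values unaffected
```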
Should the HWAddress base class be changed to an abstract base class, a metaclass, or a decorator? (Edit: or get an __init_subclass__ method?)
Here's a concept: when the "constructor" (__init__) of a hardware address class gets a hardware address instance of a different size as its address parameter, we could truncate it or zero-pad it (from the "right") as needed to match the newly constructed address's size.
Should we do this?
Right now the code uses self.size. Should it use type(self).size instead?
I am tempted to say "yes".
The only behavior differences I can think of are that a HWAddress subclass where size is defined on the instance rather than on the class would no longer work, and that AttributeError messages may differ - in CPython, that difference is that it says type object 'YourSubclass' ... instead of 'YourSubclass' object ...
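The CPython message difference can be checked directly (Example is an arbitrary stand-in class):

```python
class Example:
    __slots__ = ()  # no instance dict, and no `size` defined anywhere

instance = Example()

try:
    instance.size          # instance attribute lookup
except AttributeError as error:
    instance_message = str(error)

try:
    type(instance).size    # class attribute lookup
except AttributeError as error:
    class_message = str(error)

assert "'Example' object" in instance_message
assert "type object 'Example'" in class_message
```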
If you have a use-case for weakly referencing classes provided by this library, let me know.
I don't see a use-case for it, because weak references exist to break reference cycles, but the classes provided by this library don't really reference anything.
So I've been holding off on adding __weakref__
to __slots__
, because it seems pointless. And I kinda want to wait and see if it ever actually comes up.
But I'm open to adding it if anyone runs into a good use-case for it.
Kinda like #6: the formats could be a class-only thing, or optionally an instance-specific thing. Should they be gotten from the instance instead of from the class?
I mostly reached a stable decision on this right after #6, matching that decision, but I wanted to open this issue to discuss/record the thinking for the format-specific stuff.
I'm noticing that there's an older version on PyPI (1.2.0), but the latest here is 2.0.0. Is there any plan to release that to PyPI? (I didn't compare the differences between what's in 1.2.0 and 2.0.0.)
Thanks!
If someone came to me in code review at a job with all four of __lt__, __le__, __gt__, and __ge__ implemented, I'd immediately say "delete three of them and just use @functools.total_ordering: less potential for error, better signal-to-noise when reading code, [...]".
So is there a good enough reason for me to keep not applying my own advice to library code like this?
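For reference, a sketch of what that advice looks like in practice (Addr is a stand-in class, not this library's code):

```python
import functools

@functools.total_ordering
class Addr:
    def __init__(self, value):
        self._value = value
    def __eq__(self, other):
        return self._value == other._value
    def __lt__(self, other):
        return self._value < other._value

# __le__, __gt__, and __ge__ are derived automatically:
assert Addr(1) < Addr(2)
assert Addr(1) <= Addr(1)
assert Addr(2) > Addr(1)
assert Addr(2) >= Addr(2)
```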
Not exactly actual hardware addresses, but the SNAP protocol uses an OUI as its first part, so if I ever had to write code that needs to work with SNAP, I could totally see myself doing something like this:
    class SNAPAddress(macaddress.HWAddress):
        size = 40
        formats = ...
And then maybe doing stuff like OUI(my_snap_address) (assuming #8), etc.
Anyway, nothing to do for now, just jotting it down as another example of almost-hardware-address things which this library could be useful for.
Should macaddress.MAC be provided by this library (it currently is)?
If so, should it be a subclass of macaddress.EUI48 (like it is now), or an alias of macaddress.EUI48?
Should this library provide something analogous to the ipaddress "network" classes? If so, what should that look like?
If a hardware address class does not specify formats, but does specify a size, we could generate at least one default format string, for example like this:
    'x' * (TheClass.size // 4)
But should we?
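That generation rule, spelled out on plain values (one 'x' placeholder per 4-bit nibble; default_format is a hypothetical helper, not library code):

```python
def default_format(size):
    # One 'x' placeholder per nibble (4 bits) of the address.
    return 'x' * (size // 4)

assert default_format(48) == 'xxxxxxxxxxxx'  # EUI-48: twelve hex digits
assert default_format(24) == 'xxxxxx'        # OUI: six hex digits
```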
Currently, if you need to parse with alternative formats, you need to make a subclass:
    class MAC(macaddress.MAC):
        formats = macaddress.MAC.formats + (
            'xxx-xxx-xxx-xxx',
            'xxx:xxx:xxx:xxx',
            'xxx xxx xxx xxx',
        )
And if you want to control the output format, you can do that by making the first format what you want:
    class MAC(macaddress.MAC):
        formats = (
            'xxx-xxx-xxx-xxx',
            'xxx:xxx:xxx:xxx',
            'xxx xxx xxx xxx',
        ) + macaddress.MAC.formats
This works, and it has a very "snug fit" to the problem: the classes you're creating are still MAC addresses, they can be used anywhere those classes can be used, and behave identically except for the exact output format and input formats.
But... I have never been 100% comfortable and satisfied with using subclassing for this. It works, it is pretty comfortable and ergonomic to use, and I can't think of a better way, but I can't help but feel like there might be a better way that I am just missing.
So, thoughts?
I've been thinking that the Right(tm) way to do SemVer looks like a version bump for every. single. change.
Basically, each "atomic change" (from the original meaning of the word "atom" - so small you can't divide it further - a change that can't be split up into smaller independently usable changes) would get its own release and version number.
But what this means in practice is that version numbers would sprint up - sometimes there might be several "major" version bumps in a single day.
For context, if I was following that idea, just the changes I have in mind to implement and release so far would get us to at least version 5.0.0:
macaddress.MAC becomes an alias instead of a subclass of macaddress.EUI48.
Getting the OUI of an address becomes OUI(my_mac_address).
The .oui attribute goes away.
But knowing me, I would implement and release all those in a flurry of activity over the course of a day or two. I wouldn't want to inject artificial delays between them. That's for your automation and compatibility testing to deal with, if you need the delays.
Of course this is fine and good if you really embrace SemVer as a robotic usage of version numbers to communicate programmatically useful information to tooling and automation about API compatibility.
In other words, if you think about SemVer as if you're a finely oiled machine which feels nothing except eager ruthless joy at precisely communicating API compatibility to tooling, then one release per atomic change is really nice, elegant, sensible, and very helpful.
In fact, if you're a human in a world where that tooling exists and works well, used to working on projects which take full advantage of that in their CI/CD, well then you'd probably really want projects to roll releases like that.
But if you're a human manually watching for updates of dependencies and manually checking what changed, this would lead to a worse experience. One day you check the dependency and see that you're "four major versions behind?!?" and if you're not used to this idea, it's surprising, weird, and would probably leave you feeling like your time was wasted after you make an effort to find what major thing changed in each one, when really each one was just a tiny breaking change which didn't matter to you.
It also has a slight impression management downside: like back when Google Chrome first came out and shot through major version after major version - Firefox had been sitting in the ballpark of 3.* for years, because they were saving major versions for the really big stuff, then Chrome was up at version 12.* in like a year - and people felt stuff like "why are there so many major versions?" and "they're just trying to pad their numbers to look on par with the major browsers".