more-itertools
More routines for operating on iterables, beyond itertools
Home Page: https://more-itertools.rtfd.io
License: MIT License
The documentation for more_itertools.first claims:
"It is marginally shorter than next(iter(...)) but saves you an entire try/except when you want to provide a fallback value."
This is completely false. The next function has always had a second argument that is used exactly as first uses it. (See the What's New in Python 2.6 document: https://docs.python.org/3.6/whatsnew/2.6.html#other-language-changes) So calling first doesn't save any try/except if you know how to use next properly.
I'd remove that final sentence and, instead, add the fact that first is simply a better name for that operation.
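For concreteness, the equivalence in question; the first below is a sketch of the one-line wrapper over next, not the library's actual implementation:

```python
def first(iterable, default=None):
    # Sketch: first() as a thin wrapper over next() with a default.
    return next(iter(iterable), default)

# next() with a default already avoids the try/except:
assert next(iter([1, 2, 3]), 'missing') == 1
assert next(iter([]), 'missing') == 'missing'

# first() does the same thing under a more descriptive name:
assert first([1, 2, 3], default='missing') == 1
assert first([], default='missing') == 'missing'
```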
Possibly controversial: I don't think the chunked function should yield tuples. I think it should yield lists.
The position of each item in the returned sub-iterable doesn't have intrinsic meaning, and the chunks are homogeneous sub-sections of the original iterable. Therefore, IMO, the most "correct" thing to return is a list.
http://wescpy.blogspot.co.uk/2012/05/tuples-arent-what-you-think-theyre-for.html
Making this change to the existing chunked function is pretty trivial:
diff --git a/more_itertools/__init__.py b/more_itertools/__init__.py
index 0c769f4..d309375 100644
--- a/more_itertools/__init__.py
+++ b/more_itertools/__init__.py
@@ -28,7 +28,7 @@ def chunked(iterable, n):
for group in izip_longest(*[iter(iterable)] * n, fillvalue=_marker):
if group[-1] is _marker:
# If this is the last group, shuck off the padding:
- group = tuple(x for x in group if x is not _marker)
+ group = [x for x in group if x is not _marker]
yield group
But if you agree with this, and also agree with #5, the new version of chunked that supports slicing should also yield lists.
The new context itertool tries to expose a context manager as an iterable. This breaks the context manager guarantee that __exit__ will be called. It's not enough to tell the caller that they have to iterate over the whole iterable: even if there are no break or return statements in the loop, there is always the possibility of exceptions. The whole point of context managers is to guarantee that __exit__ is always called when a block terminates. This is why context managers and iterables are orthogonal concepts; in general, one cannot be made to look like the other.
Please remove context, because it encourages people to write bad code.
There is no benefit to context in any case. Even the motivating example in the documentation is just:
consume(print(x, file=f) for f in context(file_obj) for x in it)
which can be written just as succinctly as:
with file_obj as f:
    consume(print(x, file=f) for x in it)
https://pypi.python.org/pypi/more-itertools/ is showing the raw reStructuredText source, not rendered HTML. Maybe a syntax error somewhere?
I came across this recipe by R. Hettinger:
>>> x = [False,True,True,False,True,False,True,False,False,False,True,False,True]
>>> nth_item(50, True, x)
-1
>>> nth_item(0, True, x)
1
>>> nth_item(1, True, x)
2
>>> nth_item(2, True, x)
4
>>> nth_item(3, True, x)
6
Code
>>> from itertools import compress, count, imap, islice
>>> from functools import partial
>>> from operator import eq
>>> def nth_item(n, item, iterable):
...     indices = compress(count(), imap(partial(eq, item), iterable))
...     return next(islice(indices, n, None), -1)
I thought this may be a useful addition.
Abstract
This feature request proposes extending the functionality of more_itertools.windowed by producing windows separated by a given step.
Sliding windows conventionally advance a fixed length from one adjacent item to the next (e.g. step=1). Can an option be made for implementing larger step sizes?
Example
>>> from more_itertools import windowed
>>> iterable = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
# Present: Continuous Sliding
>>> all_windows = windowed(iterable, 3)
>>> list(all_windows)
[(1, 2, 3), (2, 3, 4), (3, 4, 5), (4, 5, 6), (5, 6, 7), (6, 7, 8), (7, 8, 9), (8, 9, 10)]
# Proposed: Steps
>>> stepped_windows = windowed(iterable, 3, step=2)
>>> list(stepped_windows)
[(1, 2, 3), (3, 4, 5), (5, 6, 7), (7, 8, 9)]
By default, the iterator returns windows spanning the length of the given iterable. Therefore, sliding may stop abruptly when the length is not evenly divisible by the step (e.g. leaving out item 10 above). However, if all items in the iterable are desired, there can be an option for including the tail and wrapping the results back to the head, as requested in this SO post:
# Proposed: Steps and Wrapping
>>> stepped_windows = windowed(iterable, 3, step=2, wrap=True)
>>> list(stepped_windows)
[(1, 2, 3), (3, 4, 5), (5, 6, 7), (7, 8, 9), (9, 1, 2)]
Further Considerations
I believe I have a working prototype of this implementation, with passing unit tests and comparable performance, which I can post here if this is a plausible feature and you are interested in further discussion.
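In the meantime, a minimal list-based sketch of the proposed semantics (step and wrap are the proposed names, not an existing API, and the wrap behavior here is just one plausible reading of the request):

```python
def stepped_windowed(iterable, n, step=1, wrap=False):
    # Sketch only: materializes the input; a production version would be lazy.
    seq = list(iterable)
    starts = list(range(0, len(seq) - n + 1, step))
    for start in starts:
        yield tuple(seq[start:start + n])
    if wrap and starts:
        next_start = starts[-1] + step
        if next_start < len(seq):
            # Wrap the leftover tail back around to the head.
            yield tuple((seq + seq)[next_start:next_start + n])

assert list(stepped_windowed(range(1, 11), 3, step=2)) == [
    (1, 2, 3), (3, 4, 5), (5, 6, 7), (7, 8, 9)]
```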
You may want to steal the cardinality.count() implementation from https://github.com/wbolster/cardinality ;)
http://docs.python.org/library/itertools.html#recipes has generically useful tools. Let's include them here.
I just now discovered more-itertools. Thanks for that!
I'd like to leverage this project to supersede the implementations I have in jaraco.util.itertools (docs). I see in the README the invite to send pull requests, but since this effort would be a somewhat large undertaking, I'd like to preface with some questions and concerns.
The next() function takes a default arg, which the first() function currently does not use, so I think first() can be written as a one-liner: return next(iter(iterable), default).
Implementation at time of writing: https://github.com/erikrose/more-itertools/blob/master/more_itertools/more.py#L39
I just came up with another function that I don't think already exists and might make a useful addition to more-itertools: identifying elements in an iterable that match a predicate or are adjacent to those matching a predicate. As a stupid (but simple) example, suppose I want to detect which letters are vowels or adjacent to vowels in a word. The design I have in mind returns a tuple for each element in the iterable with a boolean indicating whether it is or is adjacent to a "selected" element, as well as the element itself.
>>> list(adjacent(lambda c: c in 'aeiou', 'thursday'))
[(False, 't'), (True, 'h'), (True, 'u'), (True, 'r'), (False, 's'), (True, 'd'), (True, 'a'), (True, 'y')]
In my application it's important to know where the (equivalents of the) vowel-centered groups end and begin, in addition to knowing which elements are in those groups, so I pass the result of adjacent() through groupby(). (This is akin to choosing context lines in a differ.) If I wanted an iterable of just ['h', 'u', 'r', 'd', 'a', 'y'] in that example, I could instead use filter() and map() or a generator, i.e. (e[1] for e in adjacent(...) if e[0]). I think this flexibility is important.
The question I want to bring up before sending in a pull request is how to generalize this. My base implementation is the following:
from itertools import chain, tee
from more_itertools import windowed

def adjacent(predicate, iterable):
    i1, i2 = tee(iterable)
    selected = chain([False], map(predicate, i1), [False])
    adjacent = map(any, windowed(selected, 3))
    return zip(adjacent, i2)
(This is designed to avoid calling predicate() more than once per item.) It's easy enough to change the number of elements of "context" by increasing the second argument to windowed(), and that would be a straightforward generalization. Is it also worthwhile to support arbitrary "masks" by using stagger() instead of windowed()? E.g. passing offsets=(-1, 1) to "mark" only elements which are before or after those which satisfy the predicate, leaving out the ones which do satisfy the predicate themselves? Or offsets=(0, 1, 2) to "mark" items which satisfy the predicate and the two that follow them?
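For discussion, here's a list-based sketch of what an offsets generalization might look like (adjacent_offsets and its semantics are hypothetical: a selected element at position j marks every position j + o for o in offsets):

```python
def adjacent_offsets(predicate, iterable, offsets=(-1, 0, 1)):
    # Sketch only: materializes the input; calls predicate once per item.
    items = list(iterable)
    selected = [predicate(x) for x in items]
    marked = [False] * len(items)
    for j, sel in enumerate(selected):
        if sel:
            for o in offsets:
                if 0 <= j + o < len(items):
                    marked[j + o] = True
    return list(zip(marked, items))

# Default offsets reproduce the adjacent() behavior above:
assert [m for m, _ in adjacent_offsets(lambda c: c in 'aeiou', 'thursday')] == \
    [False, True, True, True, False, True, True, True]

# offsets=(-1, 1) marks only the neighbors of matching items:
assert [m for m, _ in adjacent_offsets(lambda c: c in 'aeiou', 'thursday', offsets=(-1, 1))] == \
    [False, True, False, True, False, True, False, True]
```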
I suspect the README is the first (and often the only) stop for visitors to most packages. Through the visitor's lens, I observe the following:
First, it's actually not clear how to install the package (from either the README or the docs). I propose adding a line on installation to the README.
...
> pip install more_itertools
...
Second, right now the first tool in the API docs is the new more_itertools.adjacent, which has a heavier docstring than traditional ones. IMO, seeing this much text for the first tool is less inviting to newcomers trying to quickly figure out what this package does. I suggest adding a couple of elegant examples to the README, maybe a recipe and an original, e.g. flatten and chunked.
...
>>> import more_itertools as mit
# Itertools Recipe
>>> list(mit.flatten([[0, 1], [2, 3]]))
[0, 1, 2, 3]
# More-Itertools Original
>>> list(mit.chunked([1, 2, 3, 4, 5, 6, 7], 3))
[[1, 2, 3], [4, 5, 6], [7]]
...
The idea is to succinctly demonstrate up front that this package is a simple extension of itertools, and that it is easy to use.
I noticed the signature for windowed includes seq, i.e. windowed(seq, ...). I consider sequences to be "sliceable iterables", so I think the correct name should be iterable instead, i.e. windowed(iterable, ...).
From Python docs:
sequence
An iterable which supports efficient element access using integer indices via the __getitem__() special method and defines a __len__() method that returns the length of the sequence. Some built-in sequence types are list, str, tuple, and unicode. Note that dict also supports __getitem__() and __len__(), but is considered a mapping rather than a sequence because the lookups use arbitrary immutable keys rather than integers.
iterable
An object capable of returning its members one at a time. Examples of iterables include all sequence types (such as list, str, and tuple) and some non-sequence types like dict, file objects, and objects of any classes you define with an __iter__() method or with a __getitem__() method that implements sequence semantics.
The reason I bring this up is that sliced requires a sequence (as do certain Python builtins that exclude general iterables). Since windowed can also be applied to non-sequences (e.g. dicts), iterable seems the appropriate variable name, consistent with other patterns in the source.
As far as I can tell, changing the name has no negative effect on the code, as the iterable is made into an iterator and used nowhere else.
I like how peekable can look ahead and modify without affecting its iterator, and I like how spy can look ahead more than one item.
It would be really nice if I could specify how far I want to look ahead with peek.
>>> a = peekable((1,2,3,4,5))
>>> a.peek(2)
2
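A cache-based sketch of how that might work (the lookahead class below is hypothetical, not part of more-itertools):

```python
from collections import deque

class lookahead:
    # Hypothetical sketch: a peekable-like wrapper that can peek n items ahead.
    def __init__(self, iterable):
        self._it = iter(iterable)
        self._cache = deque()

    def peek(self, n=1):
        # Return the n-th upcoming item without advancing the iterator.
        while len(self._cache) < n:
            self._cache.append(next(self._it))
        return self._cache[n - 1]

    def __iter__(self):
        return self

    def __next__(self):
        return self._cache.popleft() if self._cache else next(self._it)

a = lookahead((1, 2, 3, 4, 5))
assert a.peek(2) == 2     # look two items ahead
assert next(a) == 1       # iteration is unaffected by the peek
assert list(a) == [2, 3, 4, 5]
```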
Add a generator that produces a generator for each chunk, where each chunk is defined by an "is first item" or "is last item" predicate.
Basic code for the "is first item" idea:
def chunk_by_first(seq, pred):
    """chunk_by_first(iterable, callable) -> list, ...

    Breaks up an iterable based on a predicate: if true, break before that item.
    """
    buf = []
    for i in seq:
        if pred(i) and len(buf):
            yield buf
            buf = []
        buf.append(i)
    yield buf
(This implementation is of course undesirable if chunks are of considerable size, but it should concisely express the idea.)
I'm not sure if this is a fitting addition to more-itertools, but it's a function I use quite often. It sorts multiple iterables in parallel using a defined order of priority, so you can sort iterables in concordance with a given sort pattern. I suppose it's tough to explain, so here are three examples.
# Will sort all iterables based on the ascending sort order of the first iterable
>>> sort_iterables_by([['a', 'd', 'c', 'd'], [1, 3, 2, 4]], key_list=(0,))
[('a', 'c', 'd', 'd'), (1, 2, 3, 4)]
# Will sort all iterables based on the ascending sort order of the first iterable,
# then the second iterable
>>> sort_iterables_by([['d', 'd', 'd', 'c'], [4, 3, 7, 10], [1, 2, 3, 4]],
...                   key_list=(0, 1))
[('c', 'd', 'd', 'd'), (10, 3, 4, 7), (4, 2, 1, 3)]
# Will sort all iterables based on the descending sort order of the first iterable,
# then the second iterable
>>> sort_iterables_by([['a', 'b', 'b'], [1, 3, 2]],
...                   key_list=(0, 1),
...                   reverse=True)
[('b', 'b', 'a'), (3, 2, 1)]
Here is the function I propose:
import operator

def sort_iterables_by(iterables, key_list=(0,), reverse=False):
    return list(zip(*sorted(zip(*iterables),
                            key=operator.itemgetter(*key_list),
                            reverse=reverse)))
What do you guys think? A useful addition? One remark: because zip is used, the iterables are trimmed to the length of the shortest iterable before sorting. An alternate form of the function could use zip_longest, although for lists of heterogeneous objects no fillvalue will make obvious sense.
Example:
import itertools
import operator

def sort_iterables_by(iterables, key_list=(0,), reverse=False, fillvalue=None):
    return list(zip(*sorted(itertools.zip_longest(*iterables, fillvalue=fillvalue),
                            key=operator.itemgetter(*key_list),
                            reverse=reverse)))
Hi,
First - really like this package. I've written most of these things several times for various projects so it's nice to have them all in one place.
I have a couple of suggestions which I'll add as tickets. I'm more than happy to do the work to implement them if you prefer (in fact I've already made a start) but I wanted to get a few design decisions first.
So this ticket relates to the chunked
function. Sometimes the existing behaviour is exactly what you want - yielding fixed-sized chunks of a possibly infinite-length iterable. However, sometimes you actually want to slice the iterable to get back the chunks. For example, if the API you're using makes a call to a database with an offset and a limit each time you slice the iterable. In this case you don't want to load all the rows into memory to start yielding chunks. You're essentially paginating the iterable, and yielding a page at a time.
The basic implementation of this pattern is here: http://stackoverflow.com/questions/3744451/is-this-how-you-paginate-or-is-there-a-better-algorithm
def getrows_byslice(seq, rowlen):
    for start in xrange(0, len(seq), rowlen):
        yield seq[start:start+rowlen]
This could be added to the library in a couple of ways. It could be an additional function alongside chunked (I'm thinking chunked_slices or maybe just paginate). Alternatively, it could be implemented as an alternative behaviour of chunked, by passing an argument, like this:
def chunked(iterable, n, slice=False):
    if slice:
        iterable = list(iterable)
        for start in xrange(0, len(iterable), n):
            yield tuple(iterable[start:start + n])
    else:
        for group in izip_longest(*[iter(iterable)] * n, fillvalue=_marker):
            if group[-1] is _marker:
                group = tuple(x for x in group if x is not _marker)
            yield group
What do you think?
I've been looking for a good Python functional programming library for a while now. PyFunctional is an OK start but is lacking a lot of features I would expect from a functional library. Most importantly, it does not make reusable pipes, meaning combined functions cannot be used more than once, which isn't good if a pipe is going to be used a lot.
Do you know of any libraries that do allow reusable pipes?
I also think that the functionality of more-itertools would be amazing if we could add them to the PyFunctional package or do something similar.
Let me know what you think!
Using tox 1.4.2, I get the following error when I run tox:
$ tox
ERROR: unknown environment 'py32 # Python 3.1 and 3.0 might work as well.'
It looks like tox doesn't handle comments in the env list. Removing the comment fixes the issue and tox runs successfully.
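For illustration, moving the comment onto its own line avoids the parse problem (the environment names below are guessed from the error message; only the comment placement matters):

```ini
[tox]
# Python 3.1 and 3.0 might work as well.
envlist = py26, py27, py32
```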
Hi there,
Nice idea to create the more_itertools package! But it seems more_itertools.collate duplicates heapq.merge from the standard library with minor differences. Do you really need it?
It is often useful to check whether a peekable has anything left, as part of more complex checks.
For example:
while p.peek(None) is not None and p.peek().type == 1:
would be shorter and more readable as:
while not p.empty() and p.peek().type == 1:
or even:
while p.has() and p.peek().type == 1:
# === variant
while p.more() and p.peek().type == 1:
Closing a file after yielding its last line is all well and good, but I wonder: can we be more general and add power without losing much brevity?
side_effect(log, some_file, last=lambda: some_file.close())
side_effect(ingest, listdir(some_dir), last=lambda: rmtree(some_dir))
I'm finally getting around to reconciling always_iterable in jaraco.itertools, as contributed to this project in #37 and #108.
In September last year, I discovered that one would be unlikely to want to iterate over a dictionary (or other Mapping) when using always_iterable, so in jaraco.itertools 2.0, Mappings were treated as non-iterable. The reasoning, as found in the docs, is that a dictionary is likely to be intended as a single object rather than a sequence of keys, but also that one can readily pass iter(dict) or dict.keys() if one does want the value to be treated as iterable.
In order for more_itertools.more.always_iterable to supplant jaraco.itertools.always_iterable, I'd like for more_itertools to adopt this behavior as well.
What do you think?
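A sketch of the proposed semantics (not the shipped implementation; strings stay atomic as before, and Mappings would now be treated as single objects too):

```python
from collections.abc import Mapping

def always_iterable(obj):
    # Sketch of the proposed behavior only.
    if obj is None:
        return iter(())
    if isinstance(obj, (str, bytes, Mapping)) or not hasattr(obj, '__iter__'):
        return iter((obj,))
    return iter(obj)

assert list(always_iterable({'a': 1})) == [{'a': 1}]   # Mapping: single object
assert list(always_iterable('abc')) == ['abc']         # string: single object
assert list(always_iterable([1, 2])) == [1, 2]
assert list(always_iterable(1)) == [1]
```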
Some atexit handler explodes after running the tests with python setup.py test:
----------------------------------------------------------------------
Ran 7 tests in 0.022s
OK
Error in atexit._run_exitfuncs:
Traceback (most recent call last):
bbedit +24 /usr/local/Cellar/python2.6/2.6.7/lib/python2.6/atexit.py # _run_exitfuncs
func(*targs, **kargs)
bbedit +258 /usr/local/Cellar/python2.6/2.6.7/lib/python2.6/multiprocessing/util.py # _exit_function
info('process shutting down')
TypeError: 'NoneType' object is not callable
Error in sys.exitfunc:
Traceback (most recent call last):
bbedit +24 /usr/local/Cellar/python2.6/2.6.7/lib/python2.6/atexit.py # _run_exitfuncs
func(*targs, **kargs)
bbedit +258 /usr/local/Cellar/python2.6/2.6.7/lib/python2.6/multiprocessing/util.py # _exit_function
info('process shutting down')
TypeError: 'NoneType' object is not callable
Hello,
sometimes I need to peek at the first element of an iterable. I usually implement it like this:
def peek(iterable):
    iterator = iter(iterable)
    item = next(iterator)
    return item, itertools.chain([item], iterator)

element, my_list = peek(my_list)
Would this be an interesting addition to more-itertools? If so, I'll send a pull request sometime in the future.
Cheers.
I expect the following peekable code to raise a StopIteration error, but it runs without warning:
import more_itertools as mit

iterable = "A B C".split()
p = mit.peekable(iterable)
while p:
line = next(p)
print(line, end=" ")
# A B C
By comparison, most iterators and generators I've tried raise an error:
iterable = "A B C".split()
p = iter(iterable)
while p:
line = next(p)
print(line, end=" ")
Output
A B C
---------------------------------------------------------------------------
StopIteration Traceback (most recent call last)
<ipython-input-67-fe7c490ca45a> in <module>()
3
4 while p:
----> 5 line = next(p)
6 print(line, end=" ")
StopIteration:
Note: if replaced with while True, the StopIteration is raised as expected. However, the sudden ending of the while loop in the first example seems like a bug, as it is unclear why the loop ended. I understand the peek method has an exception handler, although that method is not directly called in the former example.
Before investigating further, regarding the first example: is it expected behavior for peekable not to raise a StopIteration? What causes the while
loop to end?
I had thought to add an index to peek()
for grabbing, say, the 2nd item (and making peekable stack-based), but it'd be a little awkward to have a 2nd param on there. What if, instead, peekables were sliceable?
peekable(some_iter)[1] would be equivalent to peekable(some_iter).peek().
peekable(some_iter)[2:8] would also work (and look ahead, without appearing to advance the iterator).
We'd probably start off supporting only indexing and might never support negative indices.
We could also support peekable(some_iter).get(2, 'default') for having default fallbacks for arbitrary indexes.
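A cache-backed sketch of how indexing and slicing could work (the class name and details are hypothetical, not the shipped implementation):

```python
from collections import deque
from itertools import islice

class sliceable_peekable:
    # Hypothetical sketch: lookahead indexing/slicing backed by a cache.
    def __init__(self, iterable):
        self._it = iter(iterable)
        self._cache = deque()

    def _fill(self, n):
        # Cache at least n items (or stop early if the iterator runs out).
        needed = n - len(self._cache)
        if needed > 0:
            self._cache.extend(islice(self._it, needed))

    def __getitem__(self, index):
        if isinstance(index, slice):
            self._fill(index.stop if index.stop is not None else 0)
            return list(self._cache)[index]
        self._fill(index + 1)
        return self._cache[index]

    def __iter__(self):
        return self

    def __next__(self):
        return self._cache.popleft() if self._cache else next(self._it)

p = sliceable_peekable(iter(range(10)))
assert p[1] == 1                     # look ahead by index
assert p[2:8] == [2, 3, 4, 5, 6, 7]  # look ahead by slice
assert next(p) == 0                  # the iterator has not visibly advanced
```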
The Python 3 docs have additional recipes that aren't in more-itertools, in particular:
tail
all_equal
partition
first_true
I'll add them to recipes. I think we should also have accumulate for Python 2.7 users.
While looking up the package details to recommend it to someone, I noticed that the docs for more_itertools.first claim it saves a try/except for the case when the iterable might be empty.
This is not correct, as the next builtin also supports a default argument for the empty-iterator case: http://docs.python.org/2/library/functions.html#next
Recently, while looking at the recipes, I noticed roundrobin gives similar results to interleave_longest:
import more_itertools as mit
iterables = ['ABC', 'D', 'EF']
list(mit.roundrobin(*iterables))
# ['A', 'D', 'E', 'B', 'F', 'C']
list(mit.interleave_longest(*iterables))
# ['A', 'D', 'E', 'B', 'F', 'C']
I realize interleave_longest was discussed among other items in #22, but it seems its similarity to an existing recipe may have been overlooked. Is there a rationale for keeping both tools?
I'd like to suggest adding a wrapper that allows pushing a value back onto an iterator, so that the next call to next(it) will return the pushed value before the next element from the underlying iterable. I find myself wanting this from time to time (usually in parsing applications), and I could have sworn it was implemented somewhere standard, but I looked around and couldn't find it. Would this be a good addition to more-itertools?
I do have code to offer, but I'm posing this as an issue instead of a pull request because I have a dilemma. I've come up with two implementations, one as a generator function
from collections import deque

def pushback(iterable, maxlen=None):
    iterable = iter(iterable)
    # add 1 to account for the append(None)
    stack = deque(maxlen=maxlen + 1 if maxlen is not None else None)
    while True:
        if stack:
            e = stack.pop()
        else:
            e = next(iterable)
        sent = yield e
        if sent is not None:
            stack.append(sent)
            stack.append(None)  # dummy value to return from send()
and the other as a class
class pushback:
    def __init__(self, iterable, maxlen=None):
        self.iterable = iter(iterable)
        self.stack = deque(maxlen=maxlen)

    def __iter__(self):
        return self

    def __next__(self):
        return self.stack.pop() if self.stack else next(self.iterable)

    def send(self, value):
        self.stack.append(value)
The function implementation is about twice as fast in my preliminary tests (using IPython)
In [13]: %timeit list(pushback_function(range(10)))
100000 loops, best of 3: 5.45 µs per loop
In [14]: %timeit list(pushback_class(range(10)))
100000 loops, best of 3: 10.8 µs per loop
On the other hand, the class implementation is conceptually cleaner, and it does not need to be "primed" by calling next(it) before sending in a value with it.send(x).
Now, in most cases, you can prime the generator iterator without losing an item by running it.send(next(it)), and that could be done in a wrapper function to make it transparent to client code. But only the class implementation allows pushing in front of an empty iterable (admittedly a rather pathological use case):
>>> it = pushback([])
>>> it.send(10)
>>> list(it)
[10]
So my point is: if this is something you want for more-itertools, which implementation to use? Or is there a way to "fix" one of them to make it strictly better than the other, that I'm not seeing? (Or does this whole thing already exist and I wasted an evening?)
From this SO post, given
number = 123456789012345678901234567890
expected = "12345 67890 12345 67890 12345 67890"
This looks like an opportunity for intersperse. However, the present implementation "injects" (actually zips) one unique element between every element of the iterable. I propose modifying intersperse to inject an element between every n elements, e.g. a space every 5 characters in the expected string.
Here is a quick modification to the intersperse code adding an n keyword argument:
import itertools
import more_itertools as mit

def intersperse(e, iterable, n=1):
    it = iter(mit.chunked(iterable, n))  # dependency
    filler = itertools.repeat(e)
    zipped = mit.collapse(zip(filler, it))  # dependency
    next(zipped)
    return zipped
Results
print(list(intersperse('x', 'ABCD')))
print(list(intersperse('x', 'ABCD', 2)))
# ['A', 'x', 'B', 'x', 'C', 'x', 'D']
# ['A', 'B', 'x', 'C', 'D']
print(list(intersperse(None, [1,2,3])))
print(list(intersperse(None, [1,2,3], 2)))
# [1, None, 2, None, 3]
# [1, 2, None, 3]
print("".join(intersperse(" ", str(number), 5)))
# 12345 67890 12345 67890 12345 67890
These are minor changes, i.e. adding chunked and substituting flatten with collapse. The downside is that this modified implementation depends on other tools, and I imagine the hope is to keep new recipes independent. Before proceeding, are there any thoughts on adding a keyword, suggestions for a different implementation, or a desire to keep things as they are?
Related posts
Oof. I think this is the :func: directive again - our regex doesn't catch the . in :func:`run_length.decode`.
PyPI used to have a way to manually edit things to fix an existing release, but that seems to be gone.
PR #56 and PR #58 both raise the idea of having a function that splits an iterator into a group of sub-iterators with a fixed length. That is, a version of chunked() that emits iterators instead of lists.
I was hoping to be able to modify chunked() to do this via a parameter or something, but I think performance would suffer. The simple version from #58 isn't viable.
>>> for func in (original, ichunked_new, ichunked_pr58):
... def stmt():
... iterable = range(2000) # Obviously performance will vary with iterable and n
... n = 101
... all_chunks = list(func(iterable, n))
... assert len(all_chunks) == 20
...
... result = timeit(stmt, number=10000)
... print(func.__name__, result)
original 0.7484618649759796
ichunked_new 0.971211633994244
ichunked_pr58 17.300433913012967
So I think a separate function (ichunked, I guess) is called for!
from itertools import chain, islice, zip_longest
from more_itertools import consume, peekable
from timeit import timeit
def original(iterable, n):
    it = iter(iterable)
    while True:
        chunk = list(islice(it, n))
        if not chunk:
            return
        yield chunk

def ichunked_pr58(iterable, n, emit_lists=True):
    p = peekable(iterable)
    while p:
        chunk = islice(p, n)
        if emit_lists:
            yield list(chunk)
        else:
            yield chunk
        consume(chunk)

def ichunked_new(iterable, n, emit_lists=True):
    it = iter(iterable)
    while True:
        test_chunk = islice(it, n)
        try:
            item = next(test_chunk)
        except StopIteration:
            return
        chunk = chain([item], test_chunk)
        if emit_lists:
            yield list(chunk)
        else:
            yield chunk
        consume(test_chunk)

for func in (original, ichunked_new, ichunked_pr58):
    def stmt():
        iterable = range(2000)
        n = 101
        all_chunks = list(func(iterable, n))
        assert len(all_chunks) == 20

    result = timeit(stmt, number=10000)
    print(func.__name__, result)
I'm getting an error trying to access bucket:
>>> more_itertools.bucket(iterable, key=lambda s: s[0])
...
AttributeError: module 'more_itertools' has no attribute 'bucket'
Has bucket been deprecated? If so, the latest docs need to be updated.
Same for more_itertools.collapse.
This ticket is a minor feature suggestion, not a bug/issue.
I've been using chunked for years; the difference being that the lists (chunks) being yielded are often huge. The scenario is numerical computing.
To allow releasing the memory more quickly, I don't keep a reference to the yielded object inside chunked, so that it can be easily garbage collected (once all other outside references are gone).
Code:
def chunked(iterable, chunksize):
    """
    Return elements from the iterable in `chunksize`-ed lists. The last returned
    list may be smaller (if length of collection is not divisible by `chunksize`).

    >>> list(chunked(xrange(10), 3))
    [[0, 1, 2], [3, 4, 5], [6, 7, 8], [9]]
    """
    it = iter(iterable)
    while True:
        wrapped_chunk = [list(itertools.islice(it, chunksize))]
        if not wrapped_chunk[0]:
            break
        # memory opt: wrap the chunk and then pop(), to avoid leaving behind a reference
        yield wrapped_chunk.pop()
Is there any use-case for numeric_range with only one argument? It seems the type of the objects that are returned depends solely on the types of start and step.
When only stop is given (even if I use floats or Decimal as stop), it will always return integers. And it's a lot slower than range.
I am slightly confused, but it seems there might be either a bug of some kind (possibly related to more-itertools) or a misunderstanding on my part.
Under a Python 3.6 shell:
>>> from itertools import islice
>>> from more_itertools import ilen
>>> iterable = [0, 40, 20, 30]
>>> ilen(iterable)
4
>>> i = 0
>>> slicesz=2
>>> slc = islice(iterable, i, slicesz)
>>> slc
<itertools.islice object at 0x7fce0964c728>
>>> ilen(slc)
2
>>> avg = lambda l: sum(l)/ilen(l)
>>> avg(slc)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "<stdin>", line 1, in <lambda>
ZeroDivisionError: division by zero
How can ilen return zero inside the lambda?
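The likely explanation: islice returns a one-shot iterator, and the earlier ilen(slc) call consumed it entirely, so inside the lambda both sum(l) and ilen(l) see an exhausted iterator (and even with a fresh islice, sum(l) would drain it before ilen(l) ran, with the same result). A small demonstration using an equivalent of ilen, rather than more_itertools itself:

```python
from itertools import islice

def ilen(iterable):
    # Equivalent in spirit to more_itertools.ilen: counts by consuming.
    return sum(1 for _ in iterable)

slc = islice([0, 40, 20, 30], 0, 2)
assert ilen(slc) == 2          # consumes the islice object...
assert ilen(slc) == 0          # ...so a second pass finds nothing

# Materializing into a list makes the data safely re-iterable:
values = list(islice([0, 40, 20, 30], 0, 2))
assert sum(values) / ilen(values) == 20.0
```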
from functools import wraps

def consumer(func):
    """
    Decorator that automatically advances a "reverse iterator" to its first
    yield point when initially called
    """
    @wraps(func)
    def wrapper(*args, **kwargs):
        gen = func(*args, **kwargs)
        next(gen)  # gen.next() in Python 2 only; next(gen) works in 2.6+ and 3
        return gen
    return wrapper
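A quick usage example (the decorator restated here so the snippet is self-contained, with next(gen) in place of the Python-2-only gen.next(); the accumulator coroutine is just an illustration):

```python
from functools import wraps

def consumer(func):
    @wraps(func)
    def wrapper(*args, **kwargs):
        gen = func(*args, **kwargs)
        next(gen)  # advance to the first yield so send() works immediately
        return gen
    return wrapper

@consumer
def accumulator():
    # A coroutine that sums the values sent into it.
    total = 0
    while True:
        value = yield total
        total += value

acc = accumulator()        # no manual priming needed
assert acc.send(10) == 10
assert acc.send(5) == 15
```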
Observation
I understand the code for all_equal derives from the itertools recipe. This legacy implementation has the benefit of working with generic iterables. However, I came across this SO post, which shows an elegant implementation of the same operation for strings, that is, verifying that all letters in a string are equal.
import itertools
import more_itertools
s = "aaaa"
%timeit more_itertools.all_equal(s)
1000000 loops, best of 3: 1.1 µs per loop
%timeit s == s[0] * len(s)
1000000 loops, best of 3: 438 ns per loop
We see the SO algorithm in this case is 2x-3x faster for strings.
Request
Can the SO algorithm be included in more_itertools.all_equal, so that this faster algorithm is preferred when a string is passed as the argument?
For example:
def all_equal(iterable):
    """
    Returns True if all the elements are equal to each other.
    Uses a faster implementation for strings.
    http://stackoverflow.com/a/14321721/4531270

    >>> all_equal('aaaa')
    True
    >>> all_equal('aaab')
    False
    >>> all_equal([1,1,1,1])
    True
    >>> all_equal([1,1,1,0])
    False
    """
    if isinstance(iterable, str):
        s = iterable
        # s[:1] rather than s[0], so the empty string doesn't raise IndexError
        return s == s[:1] * len(s)
    g = itertools.groupby(iterable)
    return next(g, True) and not next(g, False)
Tests
My local tests confirm these results:
# New algorithm
all_equal("aaaa")
# True
all_equal("aaab")
# False
# Legacy algorithm
all_equal([1,1,1,1])
# True
all_equal([1,1,1,0])
# False
Performance
There is some improvement in speed over legacy with continued benefits for longer strings.
s = "a"*100000
# Legacy implementation
%timeit -n 1000 more_itertools.all_equal(s)
1000 loops, best of 3: 1.09 ms per loop
# Proposed implementation
%timeit -n 1000 all_equal(s)
1000 loops, best of 3: 9.64 µs per loop
A last() function, with an API mirroring first(), would come in handy for things like https://gist.github.com/4019721, removing the need to use the less readable deque(seq, 1)[0]. Plus, we could take advantage of the __reversed__ method if it exists on the sequence.
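A sketch of what last() might look like, using reversed() when the argument supports it and falling back to a one-element deque otherwise (hypothetical; the default handling just mirrors the spirit of first()):

```python
from collections import deque

_marker = object()

def last(iterable, default=_marker):
    # Sketch only: prefer reversed() for sequences, else drain into a deque.
    try:
        return next(reversed(iterable))
    except TypeError:
        # Not reversible: exhaust the iterable, keeping only the final item.
        tail = deque(iterable, maxlen=1)
        if tail:
            return tail[0]
    except StopIteration:
        pass  # empty sequence
    if default is _marker:
        raise ValueError('last() called on an empty iterable with no default')
    return default

assert last([1, 2, 3]) == 3          # fast path via reversed()
assert last(iter([1, 2, 3])) == 3    # fallback path via deque
assert last([], default='fallback') == 'fallback'
```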
I didn't see a float range function in the library, so frange could be helpful. I also used the recommendation from the itertools.count docs to reduce float error:
"When counting with floating point numbers, better accuracy can sometimes be achieved by substituting multiplicative code such as: (start + step * i for i in count())."
import itertools, operator

# frange(stop)
# frange(start, stop[, step])
def frange(*args):
    if len(args) == 1:
        start = 0
        stop = args[0]
        step = 1
    elif len(args) == 2:
        start, stop = args
        step = 1
    elif len(args) == 3:
        start, stop, step = args
    else:
        raise TypeError('frange expected 1 to 3 arguments, got {}.'.format(len(args)))
    if start < stop and 0 < step:
        compare_with = operator.lt
    elif start > stop and 0 > step:
        compare_with = operator.gt
    else:
        return
    for step_count in itertools.count():
        val = start + step * step_count
        if compare_with(val, stop):
            yield val
        else:
            break
import unittest
class Testfrange(unittest.TestCase):
def test_frange(self):
self.assertEqual(
tuple(frange(5)),
tuple( range(5))
)
self.assertEqual(
tuple(frange(-5)),
tuple( range(-5))
)
self.assertEqual(
tuple(frange(2, 5)),
tuple( range(2, 5))
)
self.assertEqual(
tuple(frange(2, 10, 2)),
tuple( range(2, 10, 2))
)
self.assertEqual(
tuple(frange(2, -5)),
tuple( range(2, -5))
)
self.assertEqual(
tuple(frange(2, -10, -2)),
tuple( range(2, -10, -2))
)
self.assertEqual(
tuple(frange(2, 10, -2)),
tuple( range(2, 10, -2))
)
self.assertEqual(
tuple(frange(2, -10, 2)),
tuple( range(2, -10, 2))
)
self.assertEqual(
tuple(frange(2.5, 4, 0.5)),
(2.5, 3, 3.5)
)
self.assertEqual(
tuple(frange(2.5, 4.1, 0.5)),
(2.5, 3, 3.5, 4)
)
with self.assertRaises(TypeError):
tuple(frange())
with self.assertRaises(TypeError):
tuple(frange(1, 2, 3, 4))
if __name__ == '__main__':
unittest.main()
How about adding a sliding window itertool?
There are multiple implementations with different trade-offs (memory consumption, speed, etc.), but here's a version that's worked fine for me:
from collections import deque

def window(seq, n=2):
    it = iter(seq)
    # If seq has fewer than n items, the window is padded with None.
    win = deque((next(it, None) for _ in range(n)), maxlen=n)
    yield win
    append = win.append
    for e in it:
        append(e)
        # The same deque object is yielded each time; copy it
        # (e.g. tuple(win)) if you need to keep a snapshot.
        yield win
Yes, this is a bug.
It's inconceivable that Python doesn't already ship with an implementation of flatten. It might be the most frequently asked Python question on Stack Overflow.
Since this package is so useful, what about pushing it into the standard library?
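For reference, the one-level flatten that most of those Stack Overflow questions want is just the standard itertools recipe:

```python
from itertools import chain

def flatten(list_of_lists):
    """Flatten one level of nesting (the classic itertools recipe)."""
    return chain.from_iterable(list_of_lists)
```

Deeply nested or mixed structures need a recursive variant, which is part of why no single flatten has made it into the standard library.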
Hi, I just found your more-itertools library and really like what's in it. I stumbled upon it while looking for an iterator similar to itertools.cycle that also reports how many full cycles have been completed, yielding (cycle count, item) pairs.
I have created this and think it would be a good addition to your package. Let me know if you want it incorporated into the package.
from itertools import count

def count_cycle(iterable):
    """Like itertools.cycle, but also yield the number of full cycles
    completed so far, as (cycle count, item) pairs.
    """
    # Materialize once so the items can be replayed. This also avoids
    # comparing identity against the first item, which miscounts when
    # that item appears more than once in the iterable.
    items = list(iterable)
    if not items:
        return
    for n in count():
        for item in items:
            yield n, item
import unittest
class TestCycleCount(unittest.TestCase):
def test_count_cycle(self):
self.assertEqual(
tuple(count_cycle(())),
()
)
self.assertEqual(
tuple(cc for i, cc in zip(range(9), count_cycle(range(3)))),
((0,0), (0,1), (0,2), (1,0), (1,1), (1,2), (2,0), (2,1), (2,2))
)
if __name__ == '__main__':
unittest.main()
Something I do relatively often is chunk data into a certain number of chunks, rather than chunks of a certain size. I find this useful when parallelizing work that depends on a bottleneck (e.g., a database or VCS server) and I want to ensure I don't overload it if a massive amount of work comes in. I have an implementation of this in https://github.com/bhearsum/chunkify, but it seems like it might fit well into chunked.
It would require an API break, with an interface such as:
def chunked(data, chunk_size=None, total_chunks=None)
...where one (and only one) of chunk_size or total_chunks is required. chunkify also supports returning only a specific chunk, which is a bit more efficient for large lists, but that's not crucial.
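A minimal sketch of the total_chunks behavior, assuming the input can be materialized as a list (the helper name is hypothetical, not part of more-itertools or chunkify):

```python
def chunked_into(data, total_chunks):
    """Split *data* into *total_chunks* lists of as equal size as possible.

    Hypothetical helper illustrating the proposed total_chunks mode.
    """
    items = list(data)
    q, r = divmod(len(items), total_chunks)
    pos = 0
    for i in range(total_chunks):
        # Spread the remainder over the first r chunks.
        size = q + (1 if i < r else 0)
        yield items[pos:pos + size]
        pos += size
```

With seven items and three chunks this yields sizes 3, 2, 2, which keeps each worker's load within one item of the others.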
I notice more_itertools.lstrip is similar to itertools.dropwhile:
import itertools as it
import more_itertools as mit
iterable = [0, None, 1, 2, 0, 3, None, 0]
pred = lambda x: x in {None, 0}
list(mit.lstrip(iterable, pred))
# [1, 2, 0, 3, None, 0]
list(it.dropwhile(pred, iterable))
# [1, 2, 0, 3, None, 0]
I recall lstrip is a derivative of strip, but it may be worth noting in the docstring the similarity between lstrip and dropwhile (see #122).
Recipes.grouper has this signature:
def grouper(n, iterable, fillvalue=None):
But more.chunked has this signature:
def chunked(iterable, n):
These two functions serve almost exactly the same purpose (chunked just doesn't provide a fill value), so it would be nice if they had congruent interfaces.
I know it's a lot to ask for an API to change so dramatically, but I think it would be worth the backward-incompatible change to make these congruent.
I make this post for your consideration and feedback. I won't be offended if the idea is rejected.
À la Haskell: http://hackage.haskell.org/packages/archive/base/latest/doc/html/Data-List.html#v:intersperse
Similar to str.join(), but for iterators.
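A lazy sketch of the Haskell-style intersperse built from itertools primitives; the argument order here is an assumption, not a settled API:

```python
from itertools import chain, islice, repeat

def intersperse(e, iterable):
    """Yield the items of *iterable* with *e* inserted between each pair.

    Sketch only; mirrors Data.List.intersperse.
    """
    # Pair every item with the separator, flatten, then drop the
    # leading separator.
    interleaved = chain.from_iterable(zip(repeat(e), iterable))
    return islice(interleaved, 1, None)
```

Unlike str.join(), this is fully lazy and works with arbitrary elements, not just strings.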