gabriella439 / foldl Goto Github PK

Composable, streaming, and efficient left folds

License: BSD 3-Clause "New" or "Revised" License

Haskell 98.74% Nix 1.26%

foldl's Introduction

`foldl`

Use this foldl library when you want to compute multiple folds over a collection in one pass over the data without space leaks.

For example, suppose that you want to simultaneously compute the sum of the list and the length of the list. Many Haskell beginners might write something like this:

sumAndLength :: Num a => [a] -> (a, Int)
sumAndLength xs = (sum xs, length xs)

However, this solution will leak space because it goes over the list in two passes. If you demand the result of sum the Haskell runtime will materialize the entire list. However, the runtime cannot garbage collect the list because the list is still required for the call to length.

Usually people work around this by hand-writing a strict left fold that looks something like this:

{-# LANGUAGE BangPatterns #-}

import Data.List (foldl')

sumAndLength :: Num a => [a] -> (a, Int)
sumAndLength xs = foldl' step (0, 0) xs
  where
    step (x, y) n = (x + n, y + 1)

That now goes over the list in one pass, but will still leak space because the tuple is not strict in both fields! You have to define a strict Pair type to fix this:

{-# LANGUAGE BangPatterns #-}

import Data.List (foldl')

data Pair a b = Pair !a !b

sumAndLength :: Num a => [a] -> (a, Int)
sumAndLength xs = done (foldl' step (Pair 0 0) xs)
  where
    step (Pair x y) n = Pair (x + n) (y + 1)

    done (Pair x y) = (x, y)

However, this is not satisfactory because you have to reimplement the guts of every fold that you care about and also define a custom strict data type for your fold. Hand-writing the step function, accumulator, and strict data type for every fold that you want to use gets tedious fast. For example, implementing something like reservoir sampling over and over is very error prone.

What if you just stored the step function and accumulator for each individual fold and let some high-level library do the combining for you? That's exactly what this library does! Using this library you can instead write:

import qualified Control.Foldl as Fold

sumAndLength :: Num a => [a] -> (a, Int)
sumAndLength xs = Fold.fold ((,) <$> Fold.sum <*> Fold.length) xs

-- or, more concisely:
sumAndLength = Fold.fold ((,) <$> Fold.sum <*> Fold.length)

To see how this works, the Fold.sum value is just a datatype storing the step function and the starting state (and a final extraction function):

sum :: Num a => Fold a a
sum = Fold (+) 0 id

Same thing for the Fold.length value:

length :: Fold a Int
length = Fold (\n _ -> n + 1) 0 id

... and the Applicative operators combine them into a new datatype storing the composite step function and starting state:

(,) <$> Fold.sum <*> Fold.length = Fold step (Pair 0 0) done
  where
    step (Pair x y) n = Pair (x + n) (y + 1)

    done (Pair x y) = (x, y)

... and then fold just transforms that to a strict left fold:

fold (Fold step begin done) = done (foldl' step begin)

Since we preserve the step function and accumulator, we can use the Fold type to fold things other than pure collections. For example, we can fold a Producer from pipes using the same Fold:

Fold.purely Pipes.Prelude.fold ((,) <$> sum <*> length)
    :: (Monad m, Num a) => Producer a m () -> m (a, Int)

To learn more about this library, read the documentation in the main Control.Foldl module.

Quick start

Install the stack tool and then run:

$ stack setup
$ stack ghci foldl
Prelude> import qualified Control.Foldl as Fold
Prelude Fold> Fold.fold ((,) <$> Fold.sum <*> Fold.length) [1..1000000]
(500000500000,1000000)

How to contribute

Contribute a pull request if you have a Fold that you believe other people would find useful.

Development Status

The foldl library is pretty stable at this point. I don't expect there to be breaking changes to the API from this point forward unless people discover new bugs.

License (BSD 3-clause)

Redistribution and use in source and binary forms, with or without modification, are permitted provided that the following conditions are met:

Redistributions of source code must retain the above copyright notice, this list of conditions and the following disclaimer.
Redistributions in binary form must reproduce the above copyright notice, this list of conditions and the following disclaimer in the documentation and/or other materials provided with the distribution.
Neither the name of Gabriella Gonzalez nor the names of other contributors may be used to endorse or promote products derived from this software without specific prior written permission.

THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.

foldl's People

Contributors

Stargazers

Watchers

foldl's Issues

pretraverse's type is too strict

pretraverse currently requires a Traversal', but it doesn't need the full power of a Traversal', it would be happen with a mere Control.Lens.Fold.

This came up on #haskell where someone wanted to use pretraverse with a fold.

If it's type was

pretraverse ::
  (forall x. (b -> Const (Endo x) b) -> a -> Const (Endo x) a) ->
  Fold b r -> Fold a r

-- or with lens type synonyms
pretraverse ::
  (forall x. Getting (Endo x) a b) ->
  Fold b r -> Fold a r

It would then be usable with Getters and Folds from lens package without any implementation change.

Add variant of `purely` for normal `foldl`s

Some libraries (e.g. bytestring) provide a function analogous to list foldl. I wrote an adapter wrap g f i o = g . f i o so that I can write purely (wrap foldl), but the types for purely' f = purely (wrap f) don't work out.

I think this would be a useful adapter function to have in the library, so that even code released without knowledge of foldl could be used easily in conjunction with it.

Can traversals be added?

Maybe I'm missing something, but I would think you could add versions of traverse and mapAccumL to take a FoldM and a Fold, respectively.

Edit: traverse probably doesn't work, but something similar might, that takes the extra argument into account.

Fold looks a Profunctor

Fold looks like a profunctor:

instance Profunctor Fold where
    lmap = premap
    rmap = fmap

Is there interest in declaring it as such?

Implement fold/foldM for MonoFoldable class?

I wonder if introducing mono-traversable dependency and implement fold/foldM for MonoFoldable is a good idea. If so, I can open a pull request for this.

Readme: about the space leak

For Haskell beginners/intermediates it would be helpful, if the issue of space leaks would be explained a little bit more.
What exactly is the 'leak' (usually in other Languages one simply thinks of a memory-leak that'll never get garbagecollected, but is this the situation here ?

That this library is more effiecient by folding in 'one go' is pretty clear, but the advantage concerning the memory-consumption should be made more understandable for beginners, I think

An idea concerning ST

I've had an interesting idea. There seems to be a way to specify all kinds of pure Fold as a case of FoldM spesialised to ST. This would solve the issue of the divergence of the API into pure and effectful. Also it would allow to use temporary mutable data-structures (those that have an ST API) in the accumulator.

type Fold =
   forall s. FoldM (ST s)

It's hard to predict how that would affect GC and performance, but it surely seems interesting to explore.

I'm posting this primarily as a matter of a discussion.

Drop the "Control." prefix from "Control.Foldl"

Since it's been recently established that you've joined the camp of those who name the modules after the package, it would be great if you updated "foldl" accordingly.

benchmarks

I made a branch with a bunch of benchmarks. I didn't make a pull request as it contains extraneous material, and a hideously boilerplated Bench.hs.

The benchmarks/Fold directory contains three replicas of the current Control.Foldl with different optimization ideas. Everything pertains to folds over lists; I was thinking of adding more pertaining to bytestring, vector and text -- and I suppose pipes, though perhaps pipes foldls should be benchmarked elsewhere. It contains a report.html made with carter s template. For some reason this is very large (and also javascripty); it takes a minute for my browser to open it, but it is much clearer than the default (perhaps because I don't know how best to arrange benchmarks)

The result of the list-foldling competition, so far, is very much in favor of benchmarks/Foldl/Inlines: it combines 1) a simple-minded early timing of the inlining of <*>, trying to make sure that <*> has exposed the constructor before fold tries anything, so to speak (I see now that I forgot to delay the inlining of fold in this version, but it seems not to have mattered); and a definition of fold that just imports Data.List.foldl' . I still wonder if a scheme using Dan Burton's idea of using a RULE pertaining to build might do something better, but maybe build really just is made for foldrs.

I meant to record that I ran the benchmark executable with

 dist/build/prelude-benchmarks/prelude-benchmarks --template=benchmarks/report.tpl -o benchmarks/report.html

having built with

 cabal configure --enable-bench --package-db=../.cabal-sandbox/x86_64-osx-ghc-7.6.2-packages.conf.d/

Add revList

Sometimes you want it backwards; sometimes you just want it fast.

revList :: Fold a [a]
revList = Fold (flip (:)) [] id

Should there be an `id` Fold?

Say I want "the current item and the last item" in a list, I'd need an id Fold, do I?

thisAndLast = (,) <$> F.id <*> F.previous

(obviously more useful on scans)

Or is that hidden in a typeclass somewhere?

Where id is probably:

id :: F.Fold a a
id = F.Fold (\_ x -> x) undefined id

But this actually evaluates undefined when used with F.scan and crashes.

Add wrapper function to interface with prelude folds and scans.

It's not hard to write, but it seems like it would be sufficiently common to merit inclusion; they don't meet the interface for purely.

Add scanM

I'm not sure if it would actually make sense, but I found myself wanting it (while working in the Either monad).

Increment version

The version number in the repository is behind the one on hackage, which is a little confusing if you try to build interdependent repositories.

Use "vector-builder"

According to the benchmark, it's several times faster than the current implementation and it doesn't require the FoldM abstraction, Fold is enough.

Following are the benchmark results:

benchmarking vector-builder
time                 82.43 μs   (82.16 μs .. 82.84 μs)
                     1.000 R²   (1.000 R² .. 1.000 R²)
mean                 82.79 μs   (82.41 μs .. 83.38 μs)
std dev              1.579 μs   (1.104 μs .. 2.078 μs)
variance introduced by outliers: 14% (moderately inflated)

benchmarking default
time                 217.1 μs   (211.7 μs .. 223.8 μs)
                     0.994 R²   (0.990 R² .. 0.998 R²)
mean                 212.8 μs   (210.2 μs .. 216.8 μs)
std dev              10.99 μs   (6.764 μs .. 16.42 μs)
variance introduced by outliers: 50% (moderately inflated)

A question on purely_ and impurely_

Hi Gabriel,

I was wondering if you've actually used purely_? I can clearly see how to use purely but not purely_.
For instance I could write a simple function foo as follows:

foo :: (x -> a -> x) -> x -> (x -> b) -> [a] -> b
foo f z extract = extract . foldl f z

The type of this matches with the first argument of purely which is forall x. (x -> a -> x) -> x -> (x -> b) -> r. So I can then define this

foo' :: Fold a b -> [a] -> b
foo' = purely foo

However, I can't quite see how to use purely_ as it expects a first argument of type forall x. (x -> a -> x) -> x -> x

As far as I can see the only function you could write that would match this type would be

 bar f z = z

I thought that perhaps the type of purely_ should be (forall x. (x -> a -> x) -> x -> r) -> Fold a b -> r (by looking at the type signature of purely) but then I was unable to write my own version of purely_ with that type signature.

Add `Num` and `Fractional` instances for `Fold`/`FoldM`

Foldl documentation not showing up properly on Hackage

For some reason, at https://hackage.haskell.org/package/foldl, under "Modules", the links to "Control.Foldl", "Control.Foldl.ByteString", and "Control.Foldl.Text" aren't clickable.

Thankfully, it is still working at Stackage (https://www.stackage.org/package/foldl).

Is this issue likely to be from the project, or from the Hackage maintainers' end?

Direction of `foldl` library

This is partially an update and partially a solicitation for opinions on certain design choices for this library.

So I've designed pipes libraries to not directlyl depend on foldl so that I can be much freer about adding dependencies to foldl. The reason why is that these other issues have indicated that foldl may require many other dependencies including, but not limited to:

profunctors
comonad
text and bytestring (to provide convenience folds for these types)

I wanted to see if people would be fine with the above dependencies or if they had other dependencies that they thought might be worth adding. I'd like to develop this into a more fully featured library.

Also, I've decided to freeze the core type (except perhaps to unify FoldM and Fold into a single type as @michaelt suggested), so there won't be any new changes on that front. I'm pretty happy with its current functionality.

Inaccurate version bounds in .cabal file

Hi again... :-)

...currently with GHC 7.0/base-4.3 & GHC 7.2/base-4.4 (i.e. a base >= 4.5 lower bound seems appopriate):

[1 of 4] Compiling Control.Foldl.Internal ( src/Control/Foldl/Internal.hs, dist/dist-sandbox-771bb167/build/Control/Foldl/Internal.o )
[2 of 4] Compiling Control.Foldl    ( src/Control/Foldl.hs, dist/dist-sandbox-771bb167/build/Control/Foldl.o )

src/Control/Foldl.hs:203:10:
    Could not deduce (Eq (Fold a b))
      arising from the superclasses of an instance declaration
    from the context (Num b)
      bound by the instance declaration at src/Control/Foldl.hs:203:10-32
    Possible fix:
      add (Eq (Fold a b)) to the context of the instance declaration
      or add an instance declaration for (Eq (Fold a b))
    In the instance declaration for `Num (Fold a b)'

src/Control/Foldl.hs:203:10:
    Could not deduce (Show (Fold a b))
      arising from the superclasses of an instance declaration
    from the context (Num b)
      bound by the instance declaration at src/Control/Foldl.hs:203:10-32
    Possible fix:
      add (Show (Fold a b)) to the context of the instance declaration
      or add an instance declaration for (Show (Fold a b))
    In the instance declaration for `Num (Fold a b)'

src/Control/Foldl.hs:338:10:
    Could not deduce (Eq (FoldM m a b))
      arising from the superclasses of an instance declaration
    from the context (Monad m, Num b)
      bound by the instance declaration at src/Control/Foldl.hs:338:10-46
    Possible fix:
      add (Eq (FoldM m a b)) to the context of the instance declaration
      or add an instance declaration for (Eq (FoldM m a b))
    In the instance declaration for `Num (FoldM m a b)'

src/Control/Foldl.hs:338:10:
    Could not deduce (Show (FoldM m a b))
      arising from the superclasses of an instance declaration
    from the context (Monad m, Num b)
      bound by the instance declaration at src/Control/Foldl.hs:338:10-46
    Possible fix:
      add (Show (FoldM m a b)) to the context of the instance declaration
      or add an instance declaration for (Show (FoldM m a b))
    In the instance declaration for `Num (FoldM m a b)'
xcabal: Error: some packages failed to install:
foldl-1.1.4 failed during the building phase. The exception was:
ExitFailure 1

`Fold` vs `FoldM Identity`

This isn't really an issue, I'm just curious: why not have type Fold a b = FoldM Identity a b? Is there any real difference between the two?

Having type Fold a b = FoldM Identity a b and type Handler a b = HandlerM Identity a b would mean that generalize, simplify, premapM, and handlesM could disappear. However, I think Comonad (FoldM Identity a) would require FlexibleInstances or something.

Fold vs FoldM

In the little flurry of benchmarking a couple weeks ago I came across some old material that had come up in issue 10 and maybe earlier, about the possibility of assimilating Fold and FoldM. Trapped in an airport, I reworked and updated the old implementation to be a dependent module Control.FoldM exporting the same API as Control.Foldl but ignoring the existence of Fold, so to speak. https://github.com/michaelt/foldm

I added benchmarks for a bunch of stock folds. They seem to show that, at least in the somewhat weird world criterion creates, pure folds have no advantage. You can see the results here fwiw http://michaelt.github.io/bench/stock_folds.html
In each quartet, the first benchmark is of the direct fold over the appropriate type, e.g. Foldable.sum, Vector.Unboxed.sum, etc. (marked prelude); the second applies the associated Control.Foldl.Fold (marked pure); the third and fourth apply a Control.Foldl.FoldM. Notice that the impure folds are frequently faster, though I expect this is just curiosities about inlining and so on in the presence of whnf. The labor of implementing the folds directly (these are the ones marked impure) rather than just applying generalize seems to have been basically pointless (the earlier source was supposed to be a monadic replacement for Fold). This proves the merits of generalize, not that anyone was doubting them. There is also an ancillary benchmark of folds composed with the applicative instance http://michaelt.github.io/bench/compose_folds.html which had been causing me trouble earlier.

It seems to me that, at least where one is using effectful sequences like Pipes import Control.FoldM might be conceptually simpler anyway, since e.g. the transition from say

 >>> M.impurely P.foldM M.sum $ each [1..5]
 15

 >>> M.impurely P.foldM (M.sum <* M.sink print) $ each [1..5]
 1
 2
 3
 4
 5
 15

is so much simpler.

Anyway, I thought these benchmarks might interest you.

add mapM_/forM_?

Hi, do you think mapM_ would be a useful addition to foldl?

mapM_ :: Monad m => (a -> m b) -> FoldM m a ()
mapM_ action = sink (void . action)

It took me a while to find sink as the name is unfamiliar :)

Bump transformers dependency (allow > 0.5)

The transformers library is currently at 0.5.2.0. I have compiled foldl using --allow-newer and worked ok, so perhaps the dependency (currently at < 0.5) could be bumped.

Changelog missing

It would be nice to know breaking changes between major versions.

Add `vector` fold

The type I have in mind would be something like:

vector :: (MonadPrim m, Vector v a) => FoldM m a (v a)

This would probably obsolete pipes-vector, so I'm pinging @bgamari on this. You'd now be able to write:

import Control.Foldl (impurely, vector)
import qualified Pipes.Prelude as Pipes

impurely Pipes.fold vector
    :: (Monad m, Vector v a) => Producer a m () -> m (v a)

Add `Foldable` version for `fold`

Right now fold only has a monomorphic version specialized to lists, the reason being that fusion doesn't occur if you change it to a Foldable-based version. However, the Foldable-based version is still useful and is worth providing as a separate function.

Q: Monoid for Sum and length ?

Hi,

I have a list of (Num a) => Maybe a that I want to fold in one pass to calculate the average (discarding Nothing records).

What is the most idiomatic way to achieve this ?

Do I need to re-implement sum and length to take monoid parametric values instead of Num ?

Thanks for your help.

lmapM

I noticed premapM:

premapM :: (a -> b) -> FoldM m b x -> FoldM a x

But how about a monad variant?

lmapM :: (a -> m b) -> FoldM m b x -> FoldM m a x

Support for early termination

Is there a good way to support early termination in the foldl library? Currently fold has to go though the list until the end even if head is passed.

More stock `FoldM`s?

Is there are reason why FoldMs more or less like these

toHandle :: IO.Handle -> FoldM IO String ()  -- for string this should maybe (also) be toHandleLn?
toHandle h = FoldM step begin done where
  done  = return 
  begin = return ()
  step () str = IO.hPutStr h str

stdout ::  FoldM IO String () -- for string, this should perhaps be stdoutLn
stdout = FoldM step begin done where
  done  = return 
  begin = return ()
  step () str = IO.putStr str 

stderr :: FoldM IO String ()
stderr = FoldM step begin done where
  done  = return 
  begin = return ()
  step () str = IO.hPutStrLn IO.stderr str -- here putStrLn seems very natural

are unsafe or unreasonable?

I was wanting to use bytestring equivalents for a still somewhat imaginary tutorial for these streaming and streaming-bytestring libraries I pieced together. (Foldr and FoldM are the natural consumer/sink types for these discount producer types, I wanted to say.)

Examples like

 L.handlesM _Left stderr <> L.handlesM _Right (L.premapM show (toHandle h))

are extremely attractive, and would make a nice demo in the haddocks for foldl

Profunctor instance for (FoldM m)

If I'm not mistaken,

instance Monad m => Profunctor (FoldM m) where
  rmap = fmap
  lmap = premapM

Speaking of premapM, it appears that its Monad m context is redundant.

Add pretraverse and pretraverseM

pretraverse :: Traversal' a b -> Fold b r -> Fold a r
pretraverse k (Fold step begin done) = Fold step' begin done
  where
    step' = flip (appEndo . getConstant . k (Constant . Endo . flip step))

newtype EndoM m a = Endo { appEndoM :: a -> m a }

instance Monoid m => Monad (EndoM m a) where
    mempty = EndoM return
    mappend (EndoM f) (EndoM g) = EndoM (f >=> g)

pretraverse :: Monad m => Traversal' a b -> FoldM m b r -> FoldM m a r
pretraverse k (FoldM step begin done) = FoldM step' begin done
  where
    step' = flip (appEndoM . getConstant . k (Constant . EndoM . flip step))

Suggestion: rename `Fold` to `Foldl`

If this is possible given backwards compatibility concerns I'm +1. The main rationale is to distinguish it from Control.Lens.Fold.

Add type classes for accumulators

I propose to add following type classes for accumulators. In current
desing accumulator data type is incapsulated and inaccessible. It's
however useful to have such data types accesible if someone wants to
avoid use of Fold machinery for whatever reason.

class NullEst m where
  nullEst :: m

class NullEst m ⇒ FoldEst m a where
  addElem :: m → a → m

class NullEst m ⇒ MonoidEst m where
  mergeEst :: m → m → m

Meaning of those types is quite straightforward. m is a return value
of some function of type [a] → m which could be expressed as a
fold. Thus:

nullEst — corresponds to empty list
addElem — folding function
mergeEst — describes merging of two results obtained from two
data samples and corresponds to ++ operator

It's possible to merge NullEst and MonoidEst type classes but I decided
to keep them separate because some estimator could lack reasonable merge. I
don't have an example though.

Here are laws for the type classes. It seems impossible to write write laws that
doesn't involve monoid in some way.

1,2,3 - Usual monoid laws
4     - addElem m a = m `mergeEst` addElem nullEst a

Yet another reason to expose accumulator types is that's possible to extract
many values from single accumulator. For example online variance allows to
estimate simultaneously number of elements, mean, unbiased/maximum
likelyhood variance/stardard deviation or any combinator of them. Obviously
we need some composble way to extract final values from accumulator.

Fold zoo

With these type classes it becomes possible to write a whole zoo of folds.
I'm not advocating to add them all but at least some of them could be useful.

        Fold0 ← FoldMonoid0  
         ↓       ↓
FoldM ← Fold  ← FoldMonoid

data FoldM m a b = FoldM (∀ x. (x → a → m x) x (x → m b))
data Fold    a b = Fold  (∀ x. (x → a →   x) x (x →   b))
data FoldMonoid a b = FoldMonoid
   (∀ x. x (x → a → x) (x → x → x) x (x → b))

data Fold0 a b = Fold0 (∀ x. FoldEst x a ⇒ x -> b)
data FoldMonoid0 a b = FoldMonoid0 (∀ x. (MonoidEst x, FoldEst x a) ⇒ x -> b)

0-family are folds which are guarenteed to be empty. In other words that
they accumulator corresponds to empty data sample.

P.S. This will move library towards Edward Kmett's reducers.

Severe performance regression in `Control.Foldl.vector` when using `vector-builder`

foldl-1.2.5

module Main where

import qualified Data.Vector.Unboxed as V
import qualified Control.Foldl as L

main :: IO ()
main = do 
    v <- L.foldM L.vector [1..10000000::Int]
    print . V.maximum $ v

10000000
     971,706,320 bytes allocated in the heap
          13,496 bytes copied during GC
      83,887,216 bytes maximum residency (9 sample(s))
       2,071,424 bytes maximum slop
             168 MB total memory in use (5 MB lost due to fragmentation)

                                     Tot time (elapsed)  Avg pause  Max pause
  Gen  0      1388 colls,     0 par    0.000s   0.002s     0.0000s    0.0000s
  Gen  1         9 colls,     0 par    0.000s   0.016s     0.0018s    0.0147s

  INIT    time    0.000s  (  0.000s elapsed)
  MUT     time    0.172s  (  0.164s elapsed)
  GC      time    0.000s  (  0.018s elapsed)
  EXIT    time    0.000s  (  0.015s elapsed)
  Total   time    0.172s  (  0.197s elapsed)

  %GC     time       0.0%  (9.2% elapsed)

  Alloc rate    5,653,564,043 bytes per MUT second

  Productivity 100.0% of total user, 90.6% of total elapsed

vector-builder-0.3.1

module Main where

import qualified Data.Vector.Unboxed as V
import qualified Control.Foldl as L
import qualified VectorBuilder.Builder as N
import qualified VectorBuilder.Vector as O

main :: IO ()
main = print . V.maximum $ L.fold (L.foldMap N.singleton O.build) [1..10000000::Int]

10000000
   2,914,242,624 bytes allocated in the heap
   4,551,665,120 bytes copied during GC
     823,919,352 bytes maximum residency (15 sample(s))
      68,425,232 bytes maximum slop
            1697 MB total memory in use (0 MB lost due to fragmentation)

                                     Tot time (elapsed)  Avg pause  Max pause
  Gen  0      5425 colls,     0 par    2.859s   2.754s     0.0005s    0.0012s
  Gen  1        15 colls,     0 par    3.219s   3.557s     0.2371s    0.8895s

  INIT    time    0.000s  (  0.000s elapsed)
  MUT     time    0.906s  (  0.911s elapsed)
  GC      time    6.078s  (  6.311s elapsed)
  EXIT    time    0.016s  (  0.161s elapsed)
  Total   time    7.000s  (  7.383s elapsed)

  %GC     time      86.8%  (85.5% elapsed)

  Alloc rate    3,215,715,998 bytes per MUT second

  Productivity  13.2% of total user, 14.5% of total elapsed

foldl-1.3.1

module Main where

import qualified Data.Vector.Unboxed as V
import qualified Control.Foldl as L

main :: IO ()
main = print . V.maximum $ L.fold L.vector [1..10000000::Int]

10000000
   2,914,242,624 bytes allocated in the heap
   4,551,665,120 bytes copied during GC
     823,919,352 bytes maximum residency (15 sample(s))
      68,425,232 bytes maximum slop
            1697 MB total memory in use (0 MB lost due to fragmentation)

                                     Tot time (elapsed)  Avg pause  Max pause
  Gen  0      5425 colls,     0 par    2.625s   2.750s     0.0005s    0.0011s
  Gen  1        15 colls,     0 par    3.078s   3.549s     0.2366s    0.8859s

  INIT    time    0.000s  (  0.000s elapsed)
  MUT     time    1.156s  (  0.907s elapsed)
  GC      time    5.703s  (  6.299s elapsed)
  EXIT    time    0.016s  (  0.162s elapsed)
  Total   time    6.875s  (  7.369s elapsed)

  %GC     time      83.0%  (85.5% elapsed)

  Alloc rate    2,520,426,053 bytes per MUT second

  Productivity  17.0% of total user, 14.5% of total elapsed

Add `Num` instance for `Fold`/`FoldM`?

I'm tempted to add a Num instance for the Fold and FoldM types. It would basically be the standard way to lift Num over an Applicative:

instance Num b => Num (Fold a b) where
    fromInteger n = pure (fromInteger n)
    (+) = liftA2 (+)
    (*) = liftA2 (*)
    (-) = liftA2 (-)
    negate = fmap negate
    abs = fmap abs
    signum = fmap signum

The main reason I want these Num instances is primarily to lift semirings over Folds. The main objection I can see is that if you add Num then you should probably add instances for the other numeric type classes, too, otherwise people will wonder:

"Why can I multiply Folds but not divide them?"
"Why do integer literals type-check as Folds, but not floating point literals?"

So I guess there are three main choices:

Don't add a Num instance
Add a Num instance, but not other numeric type class instances
Add all numeric type class instances

Post-map for FoldM?

I noticed that Fold has a Functor instance, so that we can "post-map" the result of our fold using fmap:

postMap :: (b -> c) -> Fold a b -> Fold a c
postMap = fmap

Is there a monadic analogue of this for FoldM, like the following

postMapM :: (b -> m c) -> FoldM m a b -> FoldM m a c

I apologize if I missed something by not reading the docs carefully. I looked at handles and handlesM, but, as I understand it, those functions are for transforming the input before the actual fold, not after.

Test speed of new `foldable` functions

The foldable branch of the repository has new Foldable versions of the pure and monadic folds. On my machine these now run as fast as the original list-specialized folds, but I wanted to ask if @michaelt could test them on his machine. If he gets the same results as I do then I will go ahead and replace the list-based folds with the more general Foldable versions and merge this into master.

allow 'map' and 'zip' before fold

I like to have a way to do 'map's and 'zip's before the fold. That is, there should be a function L.compose satisfying

L.fold (L.compose f g) xs = L.fold f $ fmap g xs

and a function L.feed with

L.fold (L.feed f gen) xs = L.fold f $ zip (L.unfold gen) xs

'gen' should be a stream producer, in contrast to the fold 'f', which is a stream consumer. 'L.unfold gen' should generate the according list, like List.unfoldr does.

Those extensions would be great, since we could use 'map' and 'zip' on types like 'StorableVector', where certain element types are not allowed (e.g. functions). That is, we could use 'map' and 'zip' in combination with a fold, where we could not use them alone.

status?

I was wondering if you had abandoned this project? I was thinking of starting a new version, or perhaps requesting maintainership, or addition to maintainership, though I'm not sure about some existing features nor whether I have any sufficiently powerful ideas to add. I have to say that I constantly feel the lack of this module with Pipes and ListT -- note that every module like Pipes-Text ends up going through the boilerplate of writing a fixed familiar list of folds ... that cant be composed. Every time you reuse fold in the definitions in Pipes.Prelude, it is to produce something that cant be composed. Which is okay in a Pipes-Prelude of course.

The package of course needs fold-functions for the suitable types in base and boot packages like bytestring I was thinking of naming these 'interpreters' after the type folded over, e.g. bytestring lazyBytestring vector etc. (Imported as Foldl.bytestring etc.) One also needs to add some ancillary one-line micro-projects like foldl-vector, foldl-pipe (and foldl-listt), and foldl-conduit etc etc. for things more remote from the core and from Haskell platform. Maybe pipes-foldl should include a file-system fold for dirstream though restrictions on rational fords are maybe greater there?

It could also use definitions for more concrete folds, partly just for demonstration, to bring out various peculiariities to watch for -- the threat of overflows and so on The addition of he different kind of mean (harmonicMean etc.) in the statistics package, explicitly marked as such, might be worthwhile. I have a suspicion that there are Foldl's like those all over hackage, I discovered those by chance. The documentation should emphasize the need to use strict operations in user-written unrelentingly.

It is true that the ideas in the package immediately suggest many generalizations, and so one naturally thinks that one hasnt hit the right abstraction, but I wonder if this isn't a perpetual illusion with broadly speaking mathematical material. The merit of the project is that it can be immediately understood and used by a beginning Haskell user. If these combinators can be fixed in her intelligence, than she will be ready for other more spectacular projects like Edward's interesting fold project (which is also on the backburner). Thus it is if you like a little prelude to something more powerful (I could use a bit of this pedagogical help before working with edward's package)

For me the strength of the package is the close combination of nice haskell composability features, and excellent optimization properties. It is a place where we would like to take the necessity of reasoning this out of the users hand deposit in a carefully tuned library, as with text vector and co. Another reason I suspect generalizations is just this: I am not sure that they will be able to retain the optimization model one is trying to secure. As things stand, the compiler can see it has a real left fold and can deduce an appropriate loop. Nothing like this is so far in the cards with e.g. edwards model, though he is certainly planning things like this. It may be that this is a case where restriction and specialization are exactly what makes for optimization friendliness. (Note also the avoidance of type classes. It would be possible to have a typeclass for Foldlable things but I think you might have to introduce ugly type-class extensions for things like Text or ByteString.)

One thing that has surprised me -- maybe this shows I am a victim of generalization -- is that Fold is dispensible for FoldM with the Identity wrapper. This is obvious in theory , but seems to make no difference when we optimize with -O2. I'm not sure that it will be carried over to fold interpreters for more complicated types like Producer

Anyway these are just a few half-baked ideas occurring too early in the morning.

Allow vector >=0.11

`foldl` build failure on GHC-7.10

This was identified by Stackage and I'm copying the failed build log here:

[1 of 4] Compiling Control.Foldl.Internal ( src/Control/Foldl/Internal.hs, dist/build/Control/Foldl/Internal.o )
[2 of 4] Compiling Control.Foldl    ( src/Control/Foldl.hs, dist/build/Control/Foldl.o )

src/Control/Foldl.hs:42:7:
    Ambiguous occurrence ‘mconcat’
    It could refer to either ‘Control.Foldl.mconcat’,
                             defined at src/Control/Foldl.hs:396:1
                          or ‘Prelude.mconcat’,
                             imported from ‘Prelude’ at src/Control/Foldl.hs:(107,1)-(122,5)
                             (and originally defined in ‘GHC.Base’)

src/Control/Foldl.hs:43:7:
    Ambiguous occurrence ‘foldMap’
    It could refer to either ‘Control.Foldl.foldMap’,
                             defined at src/Control/Foldl.hs:401:1
                          or ‘Prelude.foldMap’,
                             imported from ‘Prelude’ at src/Control/Foldl.hs:(107,1)-(122,5)
                             (and originally defined in ‘Data.Foldable’)

It looks like I just need to qualify the use of foldMap and mconcat. I'm just opening this issue to remind myself.

README missing

It would be nice with a README that had examples and such.

[Question] How to fold a list of IO

I have some input of type [IO (Maybe a)]. I need to get the number of failure and the length of the array.

How can I possibly do that using an applicative fold ?

This is the real code and I was wondering if it was possible to do something more clever (more efficient) with foldl:

countFailure :: [Maybe a] -> Int
countFailure = length . filter isNothing

-- | For each node, queryfunc the catalog and return stats
computeStats :: QueryFunc -> [Nodename] -> IO ()
computeStats queryfunc nx = do
  failures <- countFailure <$> parallel (map (computeCatalog queryfunc) nx)
  let nbnodes = length nx
  putStr ("Tested " ++ show nbnodes ++ " nx. ")

  if failures > 0
    then do {putDoc ("Found" <+> red (int failures) <+> "failure(s)." <> line) ; exitFailure}
    else do {putDoc (dullgreen "All green."  <> line) ; exitSuccess}

  where
    computeCatalog :: QueryFunc -> Nodename -> IO (Maybe (FinalCatalog, EdgeMap, FinalCatalog, [Resource]))
    computeCatalog func node = func node >>= S.either
                                 (\err -> putDoc ("Problem with" <+> ttext node <+> ":" <+> getError err </> mempty) >> return Nothing)
                                 (return . Just)

Thanks for your help. I hope you don't mind me crying for help through the list of issues ;-)

Provide a mapped function

mapped :: (a -> b) -> L.Fold b r -> L.Fold a r
mapped f (L.Fold step begin done) = L.Fold step' begin done
  where
    step' x = step x . f

MonadIO, hoisting etc.

I was defeated trying to apply randomN inside a MonadResource/MonadSafe sort of constraint - basically "get a vector of n random words from /usr/share/dict/words" . If the parameters for FoldM were rearranged, I guess it would have a valid MFunctor instance since the monadic elements are always in positive position. But is there place for something like hoist or, for the special case of MonadIO

liftingIO (L.FoldM step begin done) = L.FoldM (\a b -> liftIO (step a b)) (liftIO begin) (\x -> liftIO (done x))

I feel like I'm missing some obvious way to do this that is already present in the library.

It's not possible to use FoldM incrementally

Here is problem. It's very easy to write function to incrementally fold over many lists using Fold in constant space.

incrementally :: [a] → Fold a b → Fold a b
incrementally as (Fold step x out) = Fold step (foldl' step x as) out

It's still possible to write similar function for FoldM but we will lose constant space property.

incrementallyM :: Monad m => [a] -> FoldM m a b -> m (FoldM m a b)
incrementallyM as (FoldM step mx0 out) = do
  x0 <- mx0
  return $ FoldM step (foldM step x0 as) out

In general case we have to retain all elements in the list. It's easy to fix. All we need is to data type chage definition to:

data FoldM m a b = forall x. FoldM (x → a → m x) x (x → m x)

It however comes at a price. Most of useful FoldMs could be only created in the monad. Basically anything that involves mutable variables and/or arrays. This makes such folds very cumbersome to work with. This issue could worked around by introducing another type:

data WrappedFoldM ma b = WrappedFoldM (m (FoldM m a b))

It's not a fold but rather an instruction on creating actual fold. It's still possible to transform inputs, combine folds using applicative interface but now we a not confined to monad unless we actually start folding data.

At this point we have 3 different types of fold and question of common API for all of them becomes very important. Especially one about input data stream transformation

Here I'm speaking from experience of histogram-fill

Add fold analogous to `Data.ByteString.count`

It would have the following type:

count :: Num n => Word8 -> Fold ByteString n

Provide a function from attoparsec parsers to `Fold`s

This is the signature I would like:

parse :: Attoparsec.Parser i r -> Fold i r