Comments (4)
For your information, the issue is still there if I replace traceShow
with just regular putStrLn
:
module Main where
import Control.Concurrent
import Data.Function
import Debug.Trace
import qualified Streamly.Data.Stream.Prelude as S
import qualified Streamly.Data.Unfold as U
import qualified Streamly.Internal.Data.Fold as F
main :: IO ()
main = do
let fileCount :: Int = 2222222
fileSize = 5555555
S.fromList [1 :: Int .. fileCount] -- Iterate through files.
& S.mapM (\i -> putStrLn ("processing file: " ++ show i) >> return i)
& S.parConcatMap
(S.maxBuffer 10 . S.maxThreads 10 . S.ordered True)
( \i ->
let i' = i -- traceShow ("processing file: " ++ show i) i
in S.unfold U.fromList [i' + 1 .. i' + fileSize] -- Read file.
& S.mapM
( \x -> do
threadDelay 10 -- Computation on file chunk.
return x
)
)
& S.fold F.drain
However, the problem appears to occur less frequently (and is harder to reproduce). Here is the example I came across:
# 12 lines (unexpected)
$ cabal exec -- my-test-exe
processing file: 1
processing file: 2
processing file: 3
processing file: 4
processing file: 5
processing file: 6
processing file: 7
processing file: 8
processing file: 9
processing file: 10
processing file: 11
processing file: 12
^C
# 10 lines (expected)
$ cabal exec -- my-test-exe
processing file: 1
processing file: 2
processing file: 3
processing file: 4
processing file: 5
processing file: 6
processing file: 7
processing file: 8
processing file: 9
processing file: 10
^C
from streamly.
Possibly related to this: Ahead streams might dispatch too many workers doing nothing #1922
(Although, as shown above, I have seen it dispatch too few workers (in the 9 lines example).)
from streamly.
Thanks for reporting. There seems to be a bug in the ordered parConcatMap. I cannot reproduce it without ordered
though. I see 100's of workers being dispatched in some cases. The problem seems to go away if the delay is increased beyond a threshold. Also, the problem does not seem to occur with parMapM.
There are some XXX comments in dispatchWorker, it may have something to do with those.
Fewer workers are expected in some cases, but too many are not.
from streamly.
We should check the worker limit in pushWorker
when we are incrementing the count under CAS. We should not dispatch if the count has gone beyond the limit.
from streamly.
Related Issues (20)
- Implementation of `MonadReader` instance for `ParserK`
- Missing `infixr` annotation for `consM`, `cons` in `streamly-core`
- Use ByteArray# instead of MutableByteArray# in the immutable Array HOT 1
- asPtrUnsafe should return the new array
- Change the signature of interpose a la intercalate
- Deprecate the array stream parsing functions from StreamK module
- Unreproducible test failure HOT 1
- fromPtr, fromBytes in Array module should be monadic
- Add a follow-symlinks option to readdir
- Track the major version changes according to the pacdiff output in the CI
- Rename some concurrent stream APIs
- Work on the Pipe type
- Remove code duplication in Parser and ParserK tests
- StreamK/ParserK conversion functions
- Rename getIndexUnsafe etc to unsafeGetIndex in mutArray/generic modules
- Consider re-exporting all modules in streamly-core via streamly
- Rename writeLastN
- Rename lengthGeneric, indexGeneric in Data.Fold
- Unreproducible failure in Unicode.Stream test suite
- Using resourcet with streamly
Recommend Projects
-
React
A declarative, efficient, and flexible JavaScript library for building user interfaces.
-
Vue.js
🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.
-
Typescript
TypeScript is a superset of JavaScript that compiles to clean JavaScript output.
-
TensorFlow
An Open Source Machine Learning Framework for Everyone
-
Django
The Web framework for perfectionists with deadlines.
-
Laravel
A PHP framework for web artisans
-
D3
Bring data to life with SVG, Canvas and HTML. 📊📈🎉
-
Recommend Topics
-
javascript
JavaScript (JS) is a lightweight interpreted programming language with first-class functions.
-
web
Some thing interesting about web. New door for the world.
-
server
A server is a program made to process requests and deliver data to clients.
-
Machine learning
Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.
-
Visualization
Some thing interesting about visualization, use data art
-
Game
Some thing interesting about game, make everyone happy.
Recommend Org
-
Facebook
We are working to build community through open source technology. NB: members must have two-factor auth.
-
Microsoft
Open source projects and samples from Microsoft.
-
Google
Google ❤️ Open Source for everyone.
-
Alibaba
Alibaba Open Source for everyone
-
D3
Data-Driven Documents codes.
-
Tencent
China tencent open source team.
from streamly.