taskflow / taskflow Goto Github PK

A General-purpose Parallel and Heterogeneous Task Programming System

License: Other

C++ 80.62% CMake 2.02% HTML 0.19% Makefile 0.09% C 5.21% Python 0.22% Cuda 10.37% Shell 0.18% CSS 0.11% JavaScript 0.99%

parallel-programming threadpool concurrent-programming high-performance-computing multicore-programming multi-threading taskparallelism multithreading parallel-computing work-stealing

taskflow's Introduction

Taskflow

Taskflow helps you quickly write parallel and heterogeneous task programs in modern C++

Why Taskflow?

Taskflow is faster, more expressive, and easier for drop-in integration than many of existing task programming frameworks in handling complex parallel workloads.

Taskflow lets you quickly implement task decomposition strategies that incorporate both regular and irregular compute patterns, together with an efficient work-stealing scheduler to optimize your multithreaded performance.

Static Tasking	Subflow Tasking

Taskflow supports conditional tasking for you to make rapid control-flow decisions across dependent tasks to implement cycles and conditions that were otherwise difficult to do with existing tools.

Conditional Tasking

Taskflow is composable. You can create large parallel graphs through composition of modular and reusable blocks that are easier to optimize at an individual scope.

Taskflow Composition

Taskflow supports heterogeneous tasking for you to accelerate a wide range of scientific computing applications by harnessing the power of CPU-GPU collaborative computing.

Concurrent CPU-GPU Tasking

Taskflow provides visualization and tooling needed for profiling Taskflow programs.

Taskflow Profiler

We are committed to support trustworthy developments for both academic and industrial research projects in parallel computing. Check out Who is Using Taskflow and what our users say:

"Taskflow is the cleanest Task API I've ever seen." Damien Hocking @Corelium Inc
"Taskflow has a very simple and elegant tasking interface. The performance also scales very well." Glen Fraser
"Taskflow lets me handle parallel processing in a smart way." Hayabusa @Learning
"Taskflow improves the throughput of our graph engine in just a few hours of coding." Jean-Michaël @KDAB
"Best poster award for open-source parallel programming library." Cpp Conference 2018
"Second Prize of Open-source Software Competition." ACM Multimedia Conference 2019

See a quick presentation and visit the documentation to learn more about Taskflow. Technical details can be referred to our [IEEE TPDS paper][TPDS21].

Start Your First Taskflow Program

The following program (simple.cpp) creates four tasks A, B, C, and D, where A runs before B and C, and D runs after B and C. When A finishes, B and C can run in parallel. Try it live on Compiler Explorer (godbolt)!

#include <taskflow/taskflow.hpp>  // Taskflow is header-only

int main(){
  
  tf::Executor executor;
  tf::Taskflow taskflow;

  auto [A, B, C, D] = taskflow.emplace(  // create four tasks
    [] () { std::cout << "TaskA\n"; },
    [] () { std::cout << "TaskB\n"; },
    [] () { std::cout << "TaskC\n"; },
    [] () { std::cout << "TaskD\n"; } 
  );                                  
                                      
  A.precede(B, C);  // A runs before B and C
  D.succeed(B, C);  // D runs after  B and C
                                      
  executor.run(taskflow).wait(); 

  return 0;
}

Taskflow is header-only and there is no wrangle with installation. To compile the program, clone the Taskflow project and tell the compiler to include the headers.

~$ git clone https://github.com/taskflow/taskflow.git  # clone it only once
~$ g++ -std=c++17 examples/simple.cpp -I. -O2 -pthread -o simple
~$ ./simple
TaskA
TaskC 
TaskB 
TaskD

Visualize Your First Taskflow Program

Taskflow comes with a built-in profiler, TFProf, for you to profile and visualize taskflow programs in an easy-to-use web-based interface.

# run the program with the environment variable TF_ENABLE_PROFILER enabled
~$ TF_ENABLE_PROFILER=simple.json ./simple
~$ cat simple.json
[
{"executor":"0","data":[{"worker":0,"level":0,"data":[{"span":[172,186],"name":"0_0","type":"static"},{"span":[187,189],"name":"0_1","type":"static"}]},{"worker":2,"level":0,"data":[{"span":[93,164],"name":"2_0","type":"static"},{"span":[170,179],"name":"2_1","type":"static"}]}]}
]
# paste the profiling json data to https://taskflow.github.io/tfprof/

In addition to execution diagram, you can dump the graph to a DOT format and visualize it using a number of free GraphViz tools.

// dump the taskflow graph to a DOT format through std::cout
taskflow.dump(std::cout);

Express Task Graph Parallelism

Taskflow empowers users with both static and dynamic task graph constructions to express end-to-end parallelism in a task graph that embeds in-graph control flow.

Create a Subflow Graph
Integrate Control Flow to a Task Graph
Offload a Task to a GPU
Compose Task Graphs
Launch Asynchronous Tasks
Execute a Taskflow
Leverage Standard Parallel Algorithms

Create a Subflow Graph

Taskflow supports dynamic tasking for you to create a subflow graph from the execution of a task to perform dynamic parallelism. The following program spawns a task dependency graph parented at task B.

tf::Task A = taskflow.emplace([](){}).name("A");  
tf::Task C = taskflow.emplace([](){}).name("C");  
tf::Task D = taskflow.emplace([](){}).name("D");  

tf::Task B = taskflow.emplace([] (tf::Subflow& subflow) { 
  tf::Task B1 = subflow.emplace([](){}).name("B1");  
  tf::Task B2 = subflow.emplace([](){}).name("B2");  
  tf::Task B3 = subflow.emplace([](){}).name("B3");  
  B3.succeed(B1, B2);  // B3 runs after B1 and B2
}).name("B");

A.precede(B, C);  // A runs before B and C
D.succeed(B, C);  // D runs after  B and C

Integrate Control Flow to a Task Graph

Taskflow supports conditional tasking for you to make rapid control-flow decisions across dependent tasks to implement cycles and conditions in an end-to-end task graph.

tf::Task init = taskflow.emplace([](){}).name("init");
tf::Task stop = taskflow.emplace([](){}).name("stop");

// creates a condition task that returns a random binary
tf::Task cond = taskflow.emplace(
  [](){ return std::rand() % 2; }
).name("cond");

init.precede(cond);

// creates a feedback loop {0: cond, 1: stop}
cond.precede(cond, stop);

Offload a Task to a GPU

Taskflow supports GPU tasking for you to accelerate a wide range of scientific computing applications by harnessing the power of CPU-GPU collaborative computing using CUDA.

__global__ void saxpy(size_t N, float alpha, float* dx, float* dy) {
  int i = blockIdx.x*blockDim.x + threadIdx.x;
  if (i < n) {
    y[i] = a*x[i] + y[i];
  }
}
tf::Task cudaflow = taskflow.emplace([&](tf::cudaFlow& cf) {

  // data copy tasks
  tf::cudaTask h2d_x = cf.copy(dx, hx.data(), N).name("h2d_x");
  tf::cudaTask h2d_y = cf.copy(dy, hy.data(), N).name("h2d_y");
  tf::cudaTask d2h_x = cf.copy(hx.data(), dx, N).name("d2h_x");
  tf::cudaTask d2h_y = cf.copy(hy.data(), dy, N).name("d2h_y");
  
  // kernel task with parameters to launch the saxpy kernel
  tf::cudaTask saxpy = cf.kernel(
    (N+255)/256, 256, 0, saxpy, N, 2.0f, dx, dy
  ).name("saxpy");

  saxpy.succeed(h2d_x, h2d_y)
       .precede(d2h_x, d2h_y);
}).name("cudaFlow");

Compose Task Graphs

Taskflow is composable. You can create large parallel graphs through composition of modular and reusable blocks that are easier to optimize at an individual scope.

tf::Taskflow f1, f2;

// create taskflow f1 of two tasks
tf::Task f1A = f1.emplace([]() { std::cout << "Task f1A\n"; })
                 .name("f1A");
tf::Task f1B = f1.emplace([]() { std::cout << "Task f1B\n"; })
                 .name("f1B");

// create taskflow f2 with one module task composed of f1
tf::Task f2A = f2.emplace([]() { std::cout << "Task f2A\n"; })
                 .name("f2A");
tf::Task f2B = f2.emplace([]() { std::cout << "Task f2B\n"; })
                 .name("f2B");
tf::Task f2C = f2.emplace([]() { std::cout << "Task f2C\n"; })
                 .name("f2C");

tf::Task f1_module_task = f2.composed_of(f1)
                            .name("module");

f1_module_task.succeed(f2A, f2B)
              .precede(f2C);

Launch Asynchronous Tasks

Taskflow supports asynchronous tasking. You can launch tasks asynchronously to dynamically explore task graph parallelism.

tf::Executor executor;

// create asynchronous tasks directly from an executor
std::future<int> future = executor.async([](){ 
  std::cout << "async task returns 1\n";
  return 1;
}); 
executor.silent_async([](){ std::cout << "async task does not return\n"; });

// create asynchronous tasks with dynamic dependencies
tf::AsyncTask A = executor.silent_dependent_async([](){ printf("A\n"); });
tf::AsyncTask B = executor.silent_dependent_async([](){ printf("B\n"); }, A);
tf::AsyncTask C = executor.silent_dependent_async([](){ printf("C\n"); }, A);
tf::AsyncTask D = executor.silent_dependent_async([](){ printf("D\n"); }, B, C);

executor.wait_for_all();

Execute a Taskflow

The executor provides several thread-safe methods to run a taskflow. You can run a taskflow once, multiple times, or until a stopping criteria is met. These methods are non-blocking with a tf::Future<void> return to let you query the execution status.

// runs the taskflow once
tf::Future<void> run_once = executor.run(taskflow); 

// wait on this run to finish
run_once.get();

// run the taskflow four times
executor.run_n(taskflow, 4);

// runs the taskflow five times
executor.run_until(taskflow, [counter=5](){ return --counter == 0; });

// block the executor until all submitted taskflows complete
executor.wait_for_all();

Leverage Standard Parallel Algorithms

Taskflow defines algorithms for you to quickly express common parallel patterns using standard C++ syntaxes, such as parallel iterations, parallel reductions, and parallel sort.

// standard parallel CPU algorithms
tf::Task task1 = taskflow.for_each( // assign each element to 100 in parallel
  first, last, [] (auto& i) { i = 100; }    
);
tf::Task task2 = taskflow.reduce(   // reduce a range of items in parallel
  first, last, init, [] (auto a, auto b) { return a + b; }
);
tf::Task task3 = taskflow.sort(     // sort a range of items in parallel
  first, last, [] (auto a, auto b) { return a < b; }
);

// standard parallel GPU algorithms
tf::cudaTask cuda1 = cudaflow.for_each( // assign each element to 100 on GPU
  dfirst, dlast, [] __device__ (auto i) { i = 100; }
);
tf::cudaTask cuda2 = cudaflow.reduce(   // reduce a range of items on GPU
  dfirst, dlast, init, [] __device__ (auto a, auto b) { return a + b; }
);
tf::cudaTask cuda3 = cudaflow.sort(     // sort a range of items on GPU
  dfirst, dlast, [] __device__ (auto a, auto b) { return a < b; }
);

Additionally, Taskflow provides composable graph building blocks for you to efficiently implement common parallel algorithms, such as parallel pipeline.

// create a pipeline to propagate five tokens through three serial stages
tf::Pipeline pl(num_parallel_lines,
  tf::Pipe{tf::PipeType::SERIAL, [](tf::Pipeflow& pf) {
    if(pf.token() == 5) {
      pf.stop();
    }
  }},
  tf::Pipe{tf::PipeType::SERIAL, [](tf::Pipeflow& pf) {
    printf("stage 2: input buffer[%zu] = %d\n", pf.line(), buffer[pf.line()]);
  }},
  tf::Pipe{tf::PipeType::SERIAL, [](tf::Pipeflow& pf) {
    printf("stage 3: input buffer[%zu] = %d\n", pf.line(), buffer[pf.line()]);
  }}
);
taskflow.composed_of(pl)
executor.run(taskflow).wait();

Supported Compilers

To use Taskflow, you only need a compiler that supports C++17:

GNU C++ Compiler at least v8.4 with -std=c++17
Clang C++ Compiler at least v6.0 with -std=c++17
Microsoft Visual Studio at least v19.27 with /std:c++17
AppleClang Xode Version at least v12.0 with -std=c++17
Nvidia CUDA Toolkit and Compiler (nvcc) at least v11.1 with -std=c++17
Intel C++ Compiler at least v19.0.1 with -std=c++17
Intel DPC++ Clang Compiler at least v13.0.0 with -std=c++17 and SYCL20

Taskflow works on Linux, Windows, and Mac OS X.

Learn More about Taskflow

Visit our project website and documentation to learn more about Taskflow. To get involved:

See release notes to stay up-to-date with newest versions
Read the step-by-step tutorial at cookbook
Submit an issue at GitHub issues
Find out our technical details at references
Watch our technical talks at YouTube

CppCon20 Tech Talk	MUC++ Tech Talk

We are committed to support trustworthy developments for both academic and industrial research projects in parallel and heterogeneous computing. If you are using Taskflow, please cite the following paper we publised at 2021 IEEE TPDS:

Tsung-Wei Huang, Dian-Lun Lin, Chun-Xun Lin, and Yibo Lin, "Taskflow: A Lightweight Parallel and Heterogeneous Task Graph Computing System," IEEE Transactions on Parallel and Distributed Systems (TPDS), vol. 33, no. 6, pp. 1303-1320, June 2022

More importantly, we appreciate all Taskflow contributors and the following organizations for sponsoring the Taskflow project!

License

Taskflow is licensed with the MIT License. You are completely free to re-distribute your work derived from Taskflow.

taskflow's People

Contributors

Stargazers

Watchers

Forkers

gitter-badger woodsky p78001324 mikin0923 nanxiao thuang19 vtronko lzhkui rockonedege alarouche oceancx vblanco20-1 myousefi2016 jeffhammond yuanjie-ai danielkrupinski forandom xuanhan863 neinnil skyformat99 junneyang chandanksingh dongfangduoshou123 tlming16 sniealu granyy joyifbam5 edawson jonike alan0526 keinen-lab totalgee justinhachemeister linecode mad-thanos guannan-git yanyusu jacksparal 91yuan mcthemax777 728443778 jhonconal abduld patrikhuber jwmneu armorsuitable sriharivignesh harlowja hunglethanh9 tisma rjammala abhi-jha sigino importantprojects alakia forgemistress ivan-v-kush lshappylife dreamplayerzhang frank-y-liu clickto xjtueducation jiafenggit derthorsten yqyunjie gevarakelyan audiobucket aimoonchen neumann-a loveleon swq0553 retval navono ashgoel joe2hpimn arthurwangsuper lifeiteng peilix kshitijaggrwl seanshpark edmig feiyunwill tide999 hug2008 funnypp dancal luoyouchun dima-huang dendisuhubdy haohuixin bengisubatmaz lopesivan hhy5277 artivis bo-yuan-huang nguyen-viet-hung neon77 mutesk buckaroo-pm fcccode

taskflow's Issues

Taskflow reuse

I came across cpp-taskflow recently and I find it very interesting and well designed. However, from my point of view, it would be great to have the possibility to reuse a Taskflow graph after its completion.

To give a bit of background, I work on robot control and we always have a bunch of stuff to compute in a real time loop (up to 1kHz/1ms) in order to send the robot(s) a new command. Taskflow would fit perfectly to describe and perform all the computations but having to reconstruct the graph at each iteration is not very elegant and may also bring some latencies due to the dynamic memory allocations that happen behind the scene.

Is it something that could be implemented in cpp-taskflow?

Register callback function to run when task is completed

Hi,

C++ has a bit of a weird API when it comes to std::future and monitoring progress. It is "doable" but certainly not trivial and requires quite a bit of hackery. What i once did is starting the tasks from within a sub thread and let that thread block on the tasks, like your wait_for_all() function. Then, once those tasks were done the block lifted and the next line would be called to in effect notify some other class that the task is done.

I'm hoping for an api function like that in taskflow.
You now have the functions: dispatch/silent_dispatch/wait_for_topologies/wait_for_all
I would like to request a new addition to those, lets call it dispatch_callback which would be non blocking and takes one callable as argument. That callable would be called once all tasks are done processing.

The implementation could be what i described above as hackery :) But i'm guessing you have much neater options at your disposal as you're already managing the threadpool behind it.

For me, it would make this project really useful! Even though i'm merely using it as a threadpool, it simply becomes very easy and convenient to use and implement multithreading in something :)

Cheers,
Mark

WorkStealingThreadpool - tree of tasks - unexpected behaviour

Hi,

I am observing some unexpected behaviour when trying to dynamically spawn a tree of tasks.

Here is a simple example:

void syncLog(std::string const& msg)
{
    static std::mutex logMutex;
    std::lock_guard<std::mutex> lock(logMutex);
    std::cout << msg << '\n';
}

void grow(tf::SubflowBuilder& subflow, uint64_t depth)
{
    syncLog("Depth: " + std::to_string(depth));
    std::this_thread::sleep_for(std::chrono::seconds(1));
    if(depth < 3)
    {
        subflow.silent_emplace(
                [depth](tf::SubflowBuilder& subsubflow){grow(subsubflow, depth+1);},
                [depth](tf::SubflowBuilder& subsubflow){grow(subsubflow, depth+1);});
        subflow.detach();
    }
}

int main(int argc, char *argv[])
{
    tf::Taskflow mainTaskFlow(8);
    mainTaskFlow.silent_emplace([](tf::SubflowBuilder& subflow){grow(subflow, 0);});
    mainTaskFlow.wait_for_all();
}

Here is the dumped topology of the main taskflow: here

Using "WorkStealingThreadpool" I have the following (unexpected) output:

[09:54:22] Depth: 0 
[09:54:23] Depth: 1
[09:54:24] Depth: 2
[09:54:25] Depth: 3 **Only a single thread running until here**
[09:54:26] Depth: 2 **Only two threads from here**
[09:54:26] Depth: 3 
[09:54:27] Depth: 3
[09:54:27] Depth: 1
[09:54:28] Depth: 3
[09:54:28] Depth: 2
[09:54:29] Depth: 3
[09:54:29] Depth: 2
[09:54:30] Depth: 3
[09:54:30] Depth: 3
[09:54:31] Depth: 3

Using "SimpleThreadpool" I have the following (expected) output:

[10:02:31] Depth: 0 *one thread*
[10:02:32] Depth: 1 *two threads*
[10:02:32] Depth: 1
[10:02:33] Depth: 2 *four threads*
[10:02:33] Depth: 2
[10:02:33] Depth: 2
[10:02:33] Depth: 2
[10:02:34] Depth: 3 *eight threads*
[10:02:34] Depth: 3
[10:02:34] Depth: 3
[10:02:34] Depth: 3
[10:02:34] Depth: 3
[10:02:34] Depth: 3
[10:02:34] Depth: 3
[10:02:34] Depth: 3

tf::Taskflow cannot be declare as extern nor static variable or even as namespace variable

Describe the bug
tf::Taskflow cannot be declare as extern nor static variable or place it in a namespace

To Reproduce
Visual Studio 2017/2019 just declare it as static, extern or in a namespace. Will get stuck here:
// Procedure: _spawn
template
void WorkStealingExecutor::_spawn(unsigned N) {

// Lock to synchronize all workers before creating _worker_mapss
for(unsigned i=0; i<N; ++i) {
_threads.emplace_back([this, i, N] () -> void {
...

Desktop (please complete the following information):

OS: Win10
Version: latest in master

If someone can fast-test it? I'll delay the initialization in my source code at the moment.
Thanks!

Create a Vcpkg package

On Windows at least, having the library as a vcpkg is incredibly convenient

[Question] example/dynamic_traversal performance

Hello,

why is the parallel version of dynamic traversal example so much slower? I understand that it is caused by communication overhead between the worker processes. I expect that there is some queue of tasks and that is the limit.

Is it possible to tune-up taskflow for large number of small tasks (for BFS like algorithms)?

Is there any performance comparisons with boost threadpool?

$ ./dynamic_traversal seq
Seq runtime: 37
./dynamic_traversal seq tf
Tf runtime: 677

./dynamic_traversal
Tf runtime: 668
./dynamic_traversal tf
Tf runtime: 651

Support for “nameless” tasks?

It might be expected that tasks can get a name for this software.

How do you think about to identify tasks by any other means than the data type “std::string”?
Can this class attribute be occasionally omitted generally?

Adjust two constants in implementation of the function “throw_re”

I suggest to use constant characters instead of passing string literals which contain only single characters.
Would you like to integrate a source code adjustment like the following?

diff --git a/taskflow.hpp b/taskflow.hpp
index fd8e7f5..d30ea83 100644
--- a/taskflow.hpp
+++ b/taskflow.hpp
@@ -45,7 +45,7 @@ namespace tf {
 template <typename... ArgsT>
 inline void throw_re(const char* fname, const size_t line, ArgsT&&... args) {
   std::ostringstream oss;
-  oss << "[" << fname << ":" << line << "] ";
+  oss << '[' << fname << ':' << line << "] ";
   (oss << ... << std::forward<ArgsT>(args));
   throw std::runtime_error(oss.str());
 }

Add object/memory pool for task nodes

We are planning to add object pool or memory pool for task nodes to boost the performance. In large applications, frequently creating task nodes can introduce overhead due to the dynamic memory allocation.

Passing data from tasks to their successors

Could you suggest an easy way to pass data (return values) from tasks to their successors?
Suppose A->C and B->C and C needs the results of A and B to proceed (typical of divide & conquer algorithms - BTW an example of d&q would be really helpful - e.g. TBB has a great example for recursive Fibonacci).

As far as I understand, I'd need to emplace A and B, getting the futures; then I need to capture the futures in a lambda that is passed to C.

Alternatively, I'd have to create variables for the results of A and B, and capture one of them in A and B, and both of them in C. This variables cannot be allocated on stack as it can get destroyed before the tasks have a chance to run, so one has to use dynamic memory allocation.

These approaches have major disadvantages:

futures are heavy objects, allocating memory and requiring synchronisation - I believe they are an overkill for this scenario: We already know that A and B have stopped once C is running, so it is safe to retrieve the return values without synchronisation; moreover, if A and B have not been destroyed yet (my understanding is that the clean-up is only performed when the graph terminates), no memory allocation is necessary.
The code becomes unnecessarily ugly - I think it would be much cleaner to pass a lambda accepting 2 parameters to C, and guarantee that the order of parameters corresponds to the order of precede() etc. calls.

Fix add_test() calls in a CMake script

The prefix “${PROJECT_SOURCE_DIR}/unittest/” was specified for calls of the CMake command “add_test”.
I suggest to delete it there.

BasicTaskflow should let us specify the Closure type

Hi,

I think it would be great if we could define our own Closure type for BasicTaskflow, and for multiple reasons:

One could limit the usage to a fixed sized type instead of relying on std::function. For example use a function_ref http://wg21.link/p0792
One could implement Executors that would recognize properties of the Closure, such as priority, long running task, IO task. What I'm thinking about here is having something similar to the windows threadpool https://docs.microsoft.com/fr-fr/windows/desktop/ProcThread/thread-pool-api that can spawn more threads when needed, or change their priority.
On some systems, some functions are restricted to a given CPU core, we could imagine adding some kind of core affinity to the closure

Some of those features could be implemented by other means, but I think customizing the BasicTaskflow::Closure type would be the easiest way to have a system with mixed requirements.

Dynamically Scheduled parallel_for (questions)

Hello.
At the moment, im moving one of my game experiments to taskflow (a spaceship battle simulation with 150.000 game objects moving around). details

For this, ive implemented taskflow to schedule my whole game loop, having each of the game simulation steps be a task. This has worked quite well, and it runs fine, but this is not at full speed yet. Just running multiple systems at once increases parallelism but not enough.

In the original implementation, i was using std::for_each constantly (the c++17 std parallel for algorithm version), and it was netting me a 6-7 times scalability on a ryzen with 8 cores and 16 threads. I tested running std::for_each inside tasks and it actually runs and scales, but i dont think thats good as i would like to leave taskflow to schedule everything instead.

On the engine, i have a "schedule" step that builds the task graph for the whole engine, and then i execute it by making the main thread wait until everything is finished.

This means there is a delay beetween the task creation and the task execution, and this causes considerable issues with the parallel for. The parallel for creates a few tasks but there is no dynamic version. On the engine, i have a task that "prepares" the game objects that will run for a system (for example, the renderables), and then have a parallel for that goes over all that list of renderables and calculates culling. Next, a different system has a different task that just iterates over the visible renderables and draws them to the screen. Sadly there is no way to create tasks within tasks, wich would solve this instantly.

//pseudocode mockup
//-- somewhere , basically  globals (non owning, temporal) so they can be shared from place to place
vector<RenderObject*> *renderables;// pointer to a renderables vector, can be any size from 10 to 200.000
CameraData * camdata;//also global

//There should be no data races by design given the constraints within tasks.


//--------------------------------
//over multiple systems and classes but simplified

//gameplay systems scheduling
auto root_task = task_engine.placeholder();

auto game_final_task = GameSystems.Schedule(root_task);

//grabs renderable data and camera data and stores them in the temporal globals to share them across time. 
auto prepare_cull_task = task_engine.silent_emplace([&]() {
//copy camera data to the temporal storage.
camdata = Engine.GetCameraData();
//gather all the renderables and put them in an array, then point that to the global one so it can be used out of scope.
renderables = Engine.GetRenderables(); 

}

//performs culling
auto cull_task = task_engine.silent_emplace([&]() {
    //serial, should actually be parallel
    for(auto r : Renderables){
       r->cull(camdata);
    }
}

//renders the renderable list
auto render_task =  task_engine.silent_emplace([&]() {
   //serial becouse gpu
    for(auto r: Renderables)    {
         if(r->visible)  Engine.DrawRenderable(r);
    }
}


prepare_cull_task.gather(game_final_task);
cull_task.gather(prepare_cull_task);
render_task.gather(cull_task);

//execute the whole engine iteration
task_engine.wait_for_all();

The provided parallel_for will not work in a case like this, becouse it needs to have a set size. Even if i create the parallel for task on the schedule step, this will not work becouse the amount of renderables is set by the gameplay systems, wich can create or destroy game objects.

For this, i was going to implement my own dynamically scheduled parallel for, more similar to how std::for_each(parallel) works, but i wanted to ask about the design.

My plan is to have a new type of high level task creator that supports this pattern. It would have a lambda for the "preparation" step, and then it will create a customizable number of tasks. (for example 4 tasks). Those tasks will execute the work prepared on the preparation step.

For example, my culling system has the preparation step of grabbing all the renderables, and they are 100.000. Becouse its that many, the intermediate step (that prepares the parallel for) divides it in batches of 4048 (configurable) and inserts the chunks into a parallel queue (just begin/end ranges). Then the worker tasks just dequeue the work chunks and execute the function inside.

Is a pattern like that alright? Given how common the pattern is, i would like to have it be generic and add it to the library.

Are tasks bound to threads, or can they run on any thread in the pool?

Reconsider influence of value objects on interfaces

This class library was designed in the way that specific member functions work with references.
A design pattern suggests to omit them because value objects like strings can be occasionally handled also in a more efficient way by an other programming interface style.
Would you like to support the configuration for this implementation detail?

Longer-running tasks with Framework

Hello! First off, thank you for implementing Framework, this is exactly what I would need to use TaskFlow in my physics simulation code. In particular, each time step of the simulation is a set of tasks with dependencies between them, and the simulation should continue to execute this task graph in a loop until the time variable reaches a user-specified value.

I had a question about a slightly more complex case that I'd like to do as well:

My simulation spends a long time writing to files, but it doesn't write to a file every single time step. It only writes to a file when the time variable is a multiple of some large number. Is there a way to simultaneously dispatch one task (the file writing task) and a framework run_until (the normal simulation time steps), then wait on both of them? That way I could allow the file writing to be concurrent with many time steps, until the next file write is needed.

Conan package

It's can be great if we can make a contact with official bincrafters from conan so they add cpp-taskflow officialy to their remote

Segmentation fault in parallel_for

There is a bug in parallel_for that can cause segmentation fault from captured variables in the lambda. We are fixing it.

[ProactiveThreadpool Bug] shutdown hangs

The proactive thread pool hangs during dynamic tasking - task adds another tasks to the threadpool. For example, the following code can cause proactive threadpool to hang when calling shutdown:

std::function<void(int)> insert;
insert = [&threadpool, &insert, &sum] (int i) {
  if(i > 0) {
    threadpool.silent_async([i=i-1, &insert] () {
      insert(i);
    }); 
  }   
  else {
    if(auto s = ++sum; s == num_threads) {
      promise.set_value();
    }   
  }   
};

// emplace dynamic tasks
for(size_t i=0; i<num_threads; i++){
  insert(num_tasks / num_threads);
}
  
// this will hang
threadpool.shutdown();

Question about taskflow example in readme

I found this quite confusing at first, there's this graph you use as an example of comprehensive taskflow.

I might be nitpicking, but isn't S-a1 edge quite redundant here? You can't proceed to a1 without a0 being complete, and that one requires S being finished anyway.
Correct me if I am wrong here. Thanks.

CMake failed on os X

Describe the bug
mkdir build; cd build; cmake ..
It's seems that cannot detect the Clang++ version ,
error log is:
Cpp-Taskflow currently supports the following compilers:\n\

g++ v7.3 or above\n\
clang++ v6.0 or above\n\
MSVC++ v19.14 or above\n\

so I replace
elseif (CMAKE_CXX_COMPILER_ID STREQUAL "Clang")
with
elseif (CMAKE_CXX_COMPILER_ID MATCHES "Clang")
and it's works for me!

Desktop (please complete the following information):

OS: Mac OSX 10.14.2
CMake Version 3.12.1
-Clang++ Version:
Apple LLVM version 10.0.1 (clang-1001.0.46.3)
Target: x86_64-apple-darwin18.2.0
Additional context*
Add any other context about the problem here.

Threadpools limited to core count within a single processor group on Windows

Just making an issue that people are aware of the limitations of cpp-TaskFlow on the Windows platform

https://bitsum.com/general/the-64-core-threshold-processor-groups-and-windows/
https://blogs.technet.microsoft.com/hardtofind/2018/01/29/windows-server-with-more-than-64-logical-processors/

The solution to solve this problem would be to change the affinities of each individual thread by hand.

Can this be used to run a "single" task a thousand times? (each time with different arguments)

Hi,

I have a fairly simple task: make thumbnails of a thousand large images.
Right now i'm using a thread pool which works great! I basically have one function that accepts two input arguments (the source of the large image and the destination of the thumbnail), that is then added to a queue which is then emptied by the threadpool.

But how would you do such a thing in this tasks library? Is it even intended to be used like that, and if so, how?
An example would be really great :)

I've been looking at the parallel_for example but i don't quite see how i would apply that to my case yet.

Cheers,
Mark

Warning in basic_taskflow.hpp on Visual Studio 2017 x64 (15.8.4)

When compiling Taskflow in VS2017 we get a warning like this:
Warning C4267 'argument': conversion from 'size_t' to 'int', possible loss of data
In \taskflow\taskflow\graph\basic_taskflow.hpp on line 478 (this points to the wrong line but whatever)

The actual line that causes this is the following:
node->_num_dependents = node->_dependents.size();
By static casting it like so, we fixed the issue:
node->_num_dependents = static_cast(node->_dependents.size());

I would like to end this issue by saying that I absolutely love taskflow, and the fact that my first reported issue is something as minor as a warning (which triggers our "treat warnings as error" flag, but still) speaks for itself. Great job so far and thank you! :)

Possible Memory Leak

I have encountered std::bad_alloc issues in my codebase and all of the profiling I have done is pointing to the node pool not being properly cleaned up or handled. It seems like the graph is allocating new nodes instead of reusing existing ones. This occurred as well in the prior version that did not include the pooling.

What I am observing is a steady climb in memory usage where I would expect there to be an initial spike and then some settling as the pool begins to fill up. Has there been any investigation on this?

Disposal of completed tasks and destruction of task trees.

I've been trying to figure out where I am able to remove tasks from the graph or how the lifetimes of the tasks are managed, but this library doesn't seem to make it obvious where that is. The reason for this is that I want to be able to build temporary graphs for asynchronous resource loading that I can then discard once they've been processed. The graph would be perfect since I can use the graph to ensure resources that have dependencies are loaded in the correct order. However, once the resources are loaded, it is safe for the graph to be cleaned up to keep memory usage low.

Are there any examples that you could provide that display this type of behavior? It seems like the only way for me to do this would be to discard the Taskflow object entirely and construct a new one, however that seems a little bit excessive since I feel like there should be a way to selectively remove branches of tasks from the graph.

Subflows in a parallel_for

I'm experimenting with taskflow and in some cases would be useful to have dynamic subflows inside a parallel_for to continue to generate tasks. Is this possible with the current interface? If yes could you write a small example?

Warnings on Visual Studio 2017 x64 (15.9.4)

\cpp-taskflow\include\taskflow\threadpool\workstealing_threadpool.hpp(570): warning C4267: 'initializing': conversion from 'size_t' to 'unsigned int', possible loss of data

Release request

Hi, have you considered to have releases?

If so, when do you plan to have the first one?
If not, can you make release?

Thanks.

Support for common reduce operations

We are adding the following method to taskflow to support common reduce operations:

reduce_min
reduce_max
reduce_sum

separate declaration and implementation

Is your feature request related to a problem? Please describe.
I want to use TaskFlow in a C++ project which depends on TensorFlow. When compiling the Op & Kernel with c++17, there is a problem tensorflow/tensorflow#23561

Describe the solution you'd like
separate declaration and implementation, e.g.

# depends
g++ -std=c++17 taskflow/graph taskflow/threadpool ..

# keep header simple
g++ -std=c++11 taskflow/taskflow.hpp ..

In this way, I can use taskflow/taskflow.hpp with TensorFlow without c++17

Describe alternatives you've considered
separate declaration and implementation in Ops & Kernels, there is a lot of work to do.

Additional context
Can you give some suggestions for doing this (separate declaration and implementation).

Exception handling for dispatch()

I was running into issues where my application crashed with std::terminate().
After some debugging I found out that in one task an exception is thrown.
I switched from silent_dispatch() to dispatch() and thought that I could catch these exceptions with the shared_future being returned from dispatch... However, this is not the case.
I assume that std::terminate is called because the worker threads are running in a detached state (aren`t they?)...

Do you think it's possible to add exception handling, at least for dispatch?
I'm not familiar enough with the internals of Taskflow (yet) to prepare code...

Visual Studio/Windows support

Hello. I went to try this library on windows, but ive had some issues with visual studio 2017 latest update.

With a quick source code edit, i was able to have the matrix example running well, but thats a hack.
When compiled on visual studio, the library throws a huge amount of warnings. They are mostly about type casting and data padding.

The library fails to compile due to Threadpool::async<>, wich somehow fails the second if constexpr
(Line 279 of taskflow.hpp). The branch is ignored and it tries to call .set_value(c) when the std::promise is of void type. By commenting that out, the library compiles and i can run the matrix demo with full parallism. The issue is that doing that is a huge hack.

Another common error, specifically in the matrix example, is that openmp refuses to compile with the loop variable as a size_t. It requests a "signed" integer and size_t is unsigned.

Readme still up to date?

I'm referring to this line from the readme:

auto [A, fu] = tf.emplace([] (tf::SubflowBuilder& subflow) {});

This used to work fine with v2.0 in my project but after updating to master today I get a build error: "cannot decompose inaccessible member tf::Task::_node of tf::Task".

Is it still possible to obtain a future from a subtask? I need that future so I can pass it to task B (A.precede(B);)

Segmentation fault

I run the unit test and sometimes it reports a segmentation fault. Can you look into this?

Make data export format configurable

The data format “DOT” is supported by the dump() function so far.

How do you think about to support further file formats there by passing a corresponding data export object?
Would you like to select any related parameters?

gcc 7.3.0 with c++17 compile error

error screenshot: (the source code only include #<taskflow.hpp>, cflags is: -std=c++17)
http://98.142.142.53/cpp-taskflow-compile.png

here is my gcc info:
root@imx8mqevk:~/mnt/source/capture_and_encode# gcc -v Using built-in specs. COLLECT_GCC=gcc COLLECT_LTO_WRAPPER=/usr/libexec/gcc/aarch64-poky-linux/7.3.0/lto-wrapper Target: aarch64-poky-linux Configured with: ../../../../../../work-shared/gcc-7.3.0-r0/gcc-7.3.0/configure --build=x86_64-linux --host=aarch64-poky-linux --target=aarch64-poky-linux --prefix=/usr --exec_prefix=/usr --bindir=/usr/bin --sbindir=/usr/sbin --libexecdir=/usr/libexec --datadir=/usr/share --sysconfdir=/etc --sharedstatedir=/com --localstatedir=/var --libdir=/usr/lib --includedir=/usr/include --oldincludedir=/usr/include --infodir=/usr/share/info --mandir=/usr/share/man --disable-silent-rules --disable-dependency-tracking --with-libtool-sysroot=/home/chunrong/fsl-arm-yocto-bsp/build-xwaylandimx8/tmp/work/aarch64-poky-linux/gcc/7.3.0-r0/recipe-sysroot --with-gnu-ld --enable-shared --enable-languages=c,c++ --enable-threads=posix --enable-multilib --enable-c99 --enable-long-long --enable-symvers=gnu --enable-libstdcxx-pch --program-prefix=aarch64-poky-linux- --without-local-prefix --enable-lto --enable-libssp --enable-libitm --disable-bootstrap --disable-libmudflap --with-system-zlib --with-linker-hash-style=gnu --enable-linker-build-id --with-ppl=no --with-cloog=no --enable-checking=release --enable-cheaders=c_global --without-isl --with-sysroot=/ --with-build-sysroot=/home/chunrong/fsl-arm-yocto-bsp/build-xwaylandimx8/tmp/work/aarch64-poky-linux/gcc/7.3.0-r0/recipe-sysroot --with-gxx-include-dir=/usr/include/c++/7.3.0 --without-long-double-128 --enable-nls --enable-initfini-array --enable-__cxa_atexit Thread model: posix gcc version 7.3.0 (GCC)

Adding reduce operation

We are adding a reduce operation to Taskflow. The API will follow the STL-style reduce in C++17.

thinking about how to integrate into bazel

Is your feature request related to a problem? Please describe.
I want to integrate this library into a bazel project. This would require building it from source and then using it in bazel rules.

Describe the solution you'd like
A bazel BUILD file that can build and use this project.
I think I can pull it off by using the foreign rules here:
https://github.com/bazelbuild/rules_foreign_cc

Describe alternatives you've considered
It would probably be better than trying to write a bunch of individual BUILD files and then try to get you to merge and support those.

Additional context
I just wanted to make an issue so you knew I was working on it, and if you had any comments yourself. thanks!

what do you think it would take to get a c++11 supported version?

Any thoughts on the work that would be needed to backport the library for c+11 also?
I know the examples show structured bindings and stuff, but a user does not strictly need to use that.
Is there anything specifically about the API that would not work in c++11?
Curious what the big blockers would be, and if you think that's even something worth doing. Thanks!

Change the number of worker threads

Plan to add a new method called num_workers to allow users to configure the number of worker threads in the taskflow. Currently, user either pass the argument to the constructor of taskflow or leave it to default std::thread::hardware_concurrency().

Support custom threadpool and executor

Currently the taskflow implements its own threadpool object. We are planning to standardize the interface to allow custom threadpool or executor.

[API request] wait for all previous topologies to finish

Need an API to wait for existing topologies to finish before dispatching the current graph.

wait_for_all_topologieis();

Sharing threadpool between taskflows

Is it possible to create several independent taskflows which share the same threadpool? The rational is that I would like to set the total number of threads to correspond to the number of h/w threads, but it's tricky with several taskflows.

reserved identifier violation

I would like to point out that an identifier like “__throw__” does eventually not fit to the expected naming convention of the C++ language standard.
Would you like to adjust your selection for unique names?

QUESTION: independent tasks to be run without ordering

This is even less than a feature request, and more of a conceptual question on tasking.

let's imagine that i have the tasks

A B C D E F G

because of their structure the tasks can be run in any order. HOWEVER a task can be running only when its neighbour tasks are not being run (however it does not matter if the neighbouring tasks were actually executed or not).
To make an example, i could run B only under the guarantee that neither A or C are run while B is running.

the question is:
how can i prescribe such logical dependency without implying an order between the tasks?

thank you in advance for any hint on this

Adjust condition checks in implementation of the function “dump”

I suggest to determine the emptiness of strings in a consistent way for a succinct data output.
How do you think about to integrate a source code adjustment like the following?

diff --git a/taskflow.hpp b/taskflow.hpp
index d30ea83..ae3ae89 100644
--- a/taskflow.hpp
+++ b/taskflow.hpp
@@ -1017,20 +1017,13 @@ std::string BasicTaskflow<F>::dump() const {
   os << "digraph Taskflow {\n";
   
   for(const auto& node : _nodes) {
-
-    os << "  \"";
-    if(!node.name().empty()) os << node.name();
-    else os << &node;
-    os << "\";\n";
+    os << "  \"" << (node.name().empty() ? &node : node.name()) << "\";\n";
 
     for(const auto s : node._successors) {
-      os << "  \"";
-      if(!node.name().empty()) os << node.name();
-      else os << &node;
-      os << "\" -> \"";
-      if(s->name() != "") os << s->name();
-      else os << s;
-      os << "\";\n";  
+      os << "  \"" << (node.name().empty() ? &node : node.name())
+         << "\" -> \""
+         << (s->name().empty() ? s : s->name())
+         << "\";\n";  
     }
   }

It is possible to create a continuous flow?

The situation I am describing is based on my use case, but it has simplified. So, please, do not focus on the details unless they are needed to answer.

I have a flow of data, from our point of view we can see it as a volatile variable I have to read regularly.
From the value of the variable I need to compute five functions: f0, f1, f2, f3, and f4. Each takes longer than the previous and uses a input the output of the previous. f0 starts with the volatile variable value.

This situation calls for the pipeline pattern, basically after five readings we are doing all the computations in parallel.

Playing around with cppflow (great library, by the way) I understood how to make one loop, however this is not helpful as if there is only one reading then the computation is basically single threaded.

So my question is, it is possible to implement something like this? I would be happy to read the fine manual if the answer is there, just point me in the right direction.

OpenMP Comparison in matrix.cpp is unfair

Describe the bug
The OpenMP code in example/matrix.cpp is exceptionally naive. You know that the matrices are of different sizes and will therefore take different times to multiply, yet you are using the default (i.e. static in all compilers of which I am aware) loop schedule. To be fair you should at least use schedule(dynamic) on all of the parallel for statements. You could also experiment with schedule(nonmonotonic:dynamic) if you have a modern compiler.

You could also reduce the number of fork/join operations by using code more like this

void openmp(const std::vector<size_t>& D) {

  std::cout << "========== OpenMP ==========\n";

  auto tbeg = std::chrono::steady_clock::now();

  std::vector<matrix_t> As(D.size());
  std::vector<matrix_t> Bs(D.size());
  std::vector<matrix_t> Cs(D.size());

  std::cout << "Generating matrix As ...\n";

  #pragma omp parallel
  {
    #pragma omp for schedule(dynamic)
    for(int j=0; j<(int)D.size(); ++j) {
      As[j] = random_matrix(D[j]);
    }

    #pragma omp single
      std::cout << "Generating matrix Bs ...\n";

    #pragma omp for schedule(dynamic)
    for(int j=0; j<(int)D.size(); ++j) {
      Bs[j] = random_matrix(D[j]);
    }

    #pragma omp single
      std::cout << "Computing matrix product values Cs ...\n";

    #pragma omp for schedule(dynamic), nowait
    for(int j=0; j<(int)D.size(); ++j) {
      Cs[j] = As[j] * Bs[j];
    }
  }

  auto tend = std::chrono::steady_clock::now();

  std::cout << "OpenMP takes "
            << std::chrono::duration_cast<std::chrono::milliseconds>(tend-tbeg).count()
            << " ms\n";
}

Or you could experiment with omp taskloop :-)

To Reproduce
Steps to reproduce the behavior:

Read the code
Be amazed at its naivety

Expected behavior
Be fair in your comparisons.

Compile error when compiling with clang 7

Version:
504e78b (latest master)

To Reproduce
Steps to reproduce the behavior:

clone the repository
~$ cd cpp-taskflow && mkdir build && cd build && cmake -DCMAKE_CXX_COMPILER=clang++-7 .. && make -j1

Expected behavior
cpp-taskflow compiles without errors

Actual behavior

$ make -j1
/usr/bin/cmake -H/home/ceeac/Projects/code/cpp-taskflow -B/home/ceeac/Projects/code/cpp-taskflow/build --check-build-system CMakeFiles/Makefile.cmake 0
/usr/bin/cmake -E cmake_progress_start /home/ceeac/Projects/code/cpp-taskflow/build/CMakeFiles /home/ceeac/Projects/code/cpp-taskflow/build/CMakeFiles/progress.marks
make -f CMakeFiles/Makefile2 all
make[1]: Entering directory '/media/ceeac/Projects/code/cpp-taskflow/build'
make -f CMakeFiles/threadpool_test_tmp.dir/build.make CMakeFiles/threadpool_test_tmp.dir/depend
make[2]: Entering directory '/media/ceeac/Projects/code/cpp-taskflow/build'
cd /home/ceeac/Projects/code/cpp-taskflow/build && /usr/bin/cmake -E cmake_depends "Unix Makefiles" /home/ceeac/Projects/code/cpp-taskflow /home/ceeac/Projects/code/cpp-taskflow /home/ceeac/Projects/code/cpp-taskflow/build /home/ceeac/Projects/code/cpp-taskflow/build /home/ceeac/Projects/code/cpp-taskflow/build/CMakeFiles/threadpool_test_tmp.dir/DependInfo.cmake --color=
make[2]: Leaving directory '/media/ceeac/Projects/code/cpp-taskflow/build'
make -f CMakeFiles/threadpool_test_tmp.dir/build.make CMakeFiles/threadpool_test_tmp.dir/build
make[2]: Entering directory '/media/ceeac/Projects/code/cpp-taskflow/build'
make[2]: Nothing to be done for 'CMakeFiles/threadpool_test_tmp.dir/build'.
make[2]: Leaving directory '/media/ceeac/Projects/code/cpp-taskflow/build'
[  6%] Built target threadpool_test_tmp
make -f CMakeFiles/executor.dir/build.make CMakeFiles/executor.dir/depend
make[2]: Entering directory '/media/ceeac/Projects/code/cpp-taskflow/build'
cd /home/ceeac/Projects/code/cpp-taskflow/build && /usr/bin/cmake -E cmake_depends "Unix Makefiles" /home/ceeac/Projects/code/cpp-taskflow /home/ceeac/Projects/code/cpp-taskflow /home/ceeac/Projects/code/cpp-taskflow/build /home/ceeac/Projects/code/cpp-taskflow/build /home/ceeac/Projects/code/cpp-taskflow/build/CMakeFiles/executor.dir/DependInfo.cmake --color=
make[2]: Leaving directory '/media/ceeac/Projects/code/cpp-taskflow/build'
make -f CMakeFiles/executor.dir/build.make CMakeFiles/executor.dir/build
make[2]: Entering directory '/media/ceeac/Projects/code/cpp-taskflow/build'
[  9%] Building CXX object CMakeFiles/executor.dir/example/executor.cpp.o
/usr/bin/clang++-7   -I/home/ceeac/Projects/code/cpp-taskflow  -Wall -O2 -g   -std=gnu++17 -o CMakeFiles/executor.dir/example/executor.cpp.o -c /home/ceeac/Projects/code/cpp-taskflow/example/executor.cpp
In file included from /home/ceeac/Projects/code/cpp-taskflow/example/executor.cpp:6:
In file included from /home/ceeac/Projects/code/cpp-taskflow/taskflow/taskflow.hpp:3:
In file included from /home/ceeac/Projects/code/cpp-taskflow/taskflow/threadpool/threadpool.hpp:10:
In file included from /usr/bin/../lib/gcc/x86_64-linux-gnu/8/../../../../include/c++/8/functional:61:
In file included from /usr/bin/../lib/gcc/x86_64-linux-gnu/8/../../../../include/c++/8/unordered_map:46:
In file included from /usr/bin/../lib/gcc/x86_64-linux-gnu/8/../../../../include/c++/8/bits/hashtable.h:37:
In file included from /usr/bin/../lib/gcc/x86_64-linux-gnu/8/../../../../include/c++/8/bits/node_handle.h:39:
/usr/bin/../lib/gcc/x86_64-linux-gnu/8/../../../../include/c++/8/optional:575:9: error: call to implicitly-deleted copy constructor of 'std::_Optional_payload<tf::BasicTaskflow<WorkStealingThreadpool>::Closure, true, false, false>'
      : _Optional_payload(__engaged
        ^                 ~~~~~~~~~
/usr/bin/../lib/gcc/x86_64-linux-gnu/8/../../../../include/c++/8/optional:739:4: note: in instantiation of member function 'std::_Optional_payload<tf::BasicTaskflow<WorkStealingThreadpool>::Closure, true, false, false>::_Optional_payload' requested here
        : _M_payload(__other._M_payload._M_engaged,
          ^
/usr/bin/../lib/gcc/x86_64-linux-gnu/8/../../../../include/c++/8/optional:985:11: note: in instantiation of member function 'std::_Optional_base<tf::BasicTaskflow<WorkStealingThreadpool>::Closure, false, false>::_Optional_base' requested here
    class optional
          ^
/home/ceeac/Projects/code/cpp-taskflow/taskflow/threadpool/workstealing_threadpool.hpp:248:10: note: in implicit move constructor for 'std::optional<tf::BasicTaskflow<WorkStealingThreadpool>::Closure>' first required here
  return item;
         ^
/home/ceeac/Projects/code/cpp-taskflow/taskflow/threadpool/workstealing_threadpool.hpp:442:24: note: in instantiation of member function 'tf::WorkStealingQueue<tf::BasicTaskflow<WorkStealingThreadpool>::Closure>::pop' requested here
        if(t = w.queue.pop(); !t) {
                       ^
/home/ceeac/Projects/code/cpp-taskflow/taskflow/threadpool/workstealing_threadpool.hpp:375:3: note: in instantiation of member function 'tf::WorkStealingThreadpool<tf::BasicTaskflow<WorkStealingThreadpool>::Closure>::_spawn' requested here
  _spawn(N);
  ^
/usr/bin/../lib/gcc/x86_64-linux-gnu/8/../../../../include/c++/8/ext/new_allocator.h:136:23: note: (skipping 10 contexts in backtrace; use -ftemplate-backtrace-limit=0 to see all)
        { ::new((void *)__p) _Up(std::forward<_Args>(__args)...); }
                             ^
/usr/bin/../lib/gcc/x86_64-linux-gnu/8/../../../../include/c++/8/bits/alloc_traits.h:475:8: note: in instantiation of function template specialization '__gnu_cxx::new_allocator<std::_List_node<tf::BasicTaskflow<WorkStealingThreadpool> > >::construct<tf::BasicTaskflow<WorkStealingThreadpool>, const unsigned long &>' requested here
        { __a.construct(__p, std::forward<_Args>(__args)...); }
              ^
/usr/bin/../lib/gcc/x86_64-linux-gnu/8/../../../../include/c++/8/bits/stl_list.h:644:24: note: in instantiation of function template specialization 'std::allocator_traits<std::allocator<std::_List_node<tf::BasicTaskflow<WorkStealingThreadpool> > > >::construct<tf::BasicTaskflow<WorkStealingThreadpool>, const unsigned long &>' requested here
          _Node_alloc_traits::construct(__alloc, __p->_M_valptr(),
                              ^
/usr/bin/../lib/gcc/x86_64-linux-gnu/8/../../../../include/c++/8/bits/stl_list.h:1902:18: note: in instantiation of function template specialization 'std::__cxx11::list<tf::BasicTaskflow<WorkStealingThreadpool>, std::allocator<tf::BasicTaskflow<WorkStealingThreadpool> > >::_M_create_node<const unsigned long &>' requested here
         _Node* __tmp = _M_create_node(std::forward<_Args>(__args)...);
                        ^
/usr/bin/../lib/gcc/x86_64-linux-gnu/8/../../../../include/c++/8/bits/stl_list.h:1234:10: note: in instantiation of function template specialization 'std::__cxx11::list<tf::BasicTaskflow<WorkStealingThreadpool>, std::allocator<tf::BasicTaskflow<WorkStealingThreadpool> > >::_M_insert<const unsigned long &>' requested here
          this->_M_insert(end(), std::forward<_Args>(__args)...);
                ^
/home/ceeac/Projects/code/cpp-taskflow/example/executor.cpp:76:20: note: in instantiation of function template specialization 'std::__cxx11::list<tf::BasicTaskflow<WorkStealingThreadpool>, std::allocator<tf::BasicTaskflow<WorkStealingThreadpool> > >::emplace_back<const unsigned long &>' requested here
    auto& tf = tfs.emplace_back(MAX_THREAD);
                   ^
/usr/bin/../lib/gcc/x86_64-linux-gnu/8/../../../../include/c++/8/optional:581:7: note: explicitly defaulted function was implicitly deleted here
      _Optional_payload(const _Optional_payload&) = default;
      ^
/usr/bin/../lib/gcc/x86_64-linux-gnu/8/../../../../include/c++/8/optional:622:24: note: copy constructor of '_Optional_payload<tf::BasicTaskflow<WorkStealingThreadpool>::Closure, true, false, false>' is implicitly deleted because variant field '_M_payload' has a non-trivial copy constructor
          _Stored_type _M_payload;
                       ^
1 error generated.
make[2]: *** [CMakeFiles/executor.dir/build.make:66: CMakeFiles/executor.dir/example/executor.cpp.o] Error 1
make[2]: Leaving directory '/media/ceeac/Projects/code/cpp-taskflow/build'
make[1]: *** [CMakeFiles/Makefile2:113: CMakeFiles/executor.dir/all] Error 2
make[1]: Leaving directory '/media/ceeac/Projects/code/cpp-taskflow/build'
make: *** [Makefile:144: all] Error 2

Desktop (please complete the following information):

OS: Ubuntu 18.10

Additional context

$ clang --version
clang version 7.0.0-3 (tags/RELEASE_700/final)
Target: x86_64-pc-linux-gnu
Thread model: posix
InstalledDir: /usr/bin

taskflow / taskflow Goto Github PK

taskflow's Introduction

Taskflow

Why Taskflow?

Start Your First Taskflow Program

Visualize Your First Taskflow Program

Express Task Graph Parallelism

Create a Subflow Graph

Integrate Control Flow to a Task Graph

Offload a Task to a GPU

Compose Task Graphs

Launch Asynchronous Tasks

Execute a Taskflow

Leverage Standard Parallel Algorithms

Supported Compilers

Learn More about Taskflow

License

taskflow's People

Contributors

Stargazers

Watchers

Forkers

taskflow's Issues

Recommend Projects

Recommend Topics

Recommend Org