bistro's Introduction

Bistro: A fast, flexible toolkit for scheduling and running distributed tasks

This README is a very abbreviated introduction to Bistro. Visit http://facebook.github.io/bistro for a more structured introduction, and for the docs.

Bistro is a toolkit for making distributed computation systems. It can schedule and run distributed tasks, including data-parallel jobs. It enforces resource constraints for worker hosts and data-access bottlenecks. It supports remote worker pools, low-latency batch scheduling, dynamic shards, and a variety of other possibilities. It has command-line and web UIs.

Some of the diverse problems that Bistro solved at Facebook:

  • Safely run map-only ETL tasks against live production databases (MySQL, HBase, Postgres).
  • Provide a resource-aware job queue for batch CPU/GPU compute jobs.
  • Replace Hadoop for a periodic online data compression task on HBase, improving time-to-completion and reliability by over 10x.

You can run Bistro "out of the box" to suit a variety of different applications, but even so, it is a tool for engineers. You should be able to get started just by reading the documentation, but when in doubt, look at the code --- it was written to be read.

Some applications of Bistro may involve writing small plugins to make it fit your needs. The code is built to be extensible. Ask for tips, and we'll do our best to help. In return, we hope that you will send a pull request to allow us to share your work with the community.

Early release

Although Bistro has been in production at Facebook for over 3 years, the present public release is partial, including just the server components.

Install the dependencies and build

Bistro needs a 64-bit Linux, Folly, FBThrift, Proxygen, boost, and libsqlite3. You need 2-3GB of RAM to build, as well as GCC 4.9 or above.

build/README.md documents the usage of Docker-based scripts that build Bistro on Ubuntu 14.04, 16.04, and Debian 8.6. You should be able to follow very similar steps on most modern Linux distributions.

If you run into dependency problems, look at bistro/cmake/setup.cmake for a full list of Bistro's external dependencies (direct and indirect). We gratefully accept patches that improve Bistro's builds, or add support for various flavors of Linux and Mac OS.

The binaries will be in bistro/cmake/{Debug,Release}. You can start Bistro's unit tests by running ctest in those directories. Available build targets are explained here: http://cmake.org/Wiki/CMake_Useful_Variables#Compilers_and_Tools

Your first Bistro run

This is just one simple demo, but Bistro is a very flexible tool. Refer to http://facebook.github.io/bistro/ for more in-depth information.

We are going to start a single Bistro scheduler talking to one 'remote' worker.

Aside: The scheduler tracks jobs, and data shards on which to execute them. It also makes sure only to start new tasks when the required resources are available. The remote worker is a module for executing centrally scheduled work on many machines. The UI can aggregate many schedulers at once, so using remote workers is optional --- a share-nothing, many-scheduler system is sometimes preferable.

Let's make a task to execute:

cat <<EOF > ~/demo_bistro_task.sh
#!/bin/bash
echo "I got these arguments: \$@"
echo "stderr is also logged" 1>&2
echo "done" > "\$2"  # Report the task status to Bistro via a named pipe
EOF
chmod u+x ~/demo_bistro_task.sh
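The worker command does not have to be a shell script. As a minimal sketch, assuming only what the script above shows (the task receives its arguments on the command line, and the second argument is the path used to report status), a Python counterpart could look like this. The function name run_task is hypothetical and chosen for illustration.

```python
#!/usr/bin/env python3
"""Python counterpart of demo_bistro_task.sh (illustrative sketch)."""
import sys


def run_task(argv):
    """Mimic the shell demo: log the arguments, then report "done".

    argv follows the shell convention used above: argv[2] (i.e. "$2") is
    the path the task writes its status to.
    """
    print("I got these arguments: " + " ".join(argv[1:]))
    print("stderr is also logged", file=sys.stderr)
    # Report the task status via the path given as the second argument.
    with open(argv[2], "w") as status_pipe:
        status_pipe.write("done\n")


# Only run when invoked as a script with at least two task arguments.
if __name__ == "__main__" and len(sys.argv) >= 3:
    run_task(sys.argv)
```

To try it, save the file, make it executable, and pass its path to --worker_command in place of the shell script.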

Open two terminals, one for the scheduler, and one for the worker.

# In both terminals
cd bistro/bistro
# Start the scheduler in one terminal
./cmake/Debug/server/bistro_scheduler \
  --server_port=6789 --http_server_port=6790 \
  --config_file=scripts/test_configs/simple --clean_statuses \
  --CAUTION_startup_wait_for_workers=1 --instance_node_name=scheduler
# Start the worker in another
mkdir /tmp/bistro_worker
./cmake/Debug/worker/bistro_worker --server_port=27182 --scheduler_host=:: \
  --scheduler_port=6789 --worker_command="$HOME/demo_bistro_task.sh" \
  --data_dir=/tmp/bistro_worker

You should be seeing some lively log activity on both terminals. In several seconds, the worker-scheduler negotiation should complete, and you should see messages like "Task ... quit with status" and "Got status".

Since we passed --clean_statuses, the scheduler will not persist any task completions that happened during this run. The worker, on the other hand, will keep a record of the task logs in /tmp/bistro_worker/task_logs.sql3.
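To peek at what the worker recorded, you can open that SQLite file directly. The sketch below is an assumption-laden example rather than a documented interface: the table name statuses and the columns job_id, node_id, time_and_count, and line are taken from worker log lines quoted in the issues further down this page, and the on-disk schema may differ between versions. The helper name recent_statuses is hypothetical.

```python
import sqlite3


def recent_statuses(db_path, limit=5):
    """Return the newest rows of the worker's `statuses` log table.

    Table and column names are assumptions taken from the worker's own
    log output; adjust them if your build uses a different schema.
    """
    conn = sqlite3.connect(db_path)
    try:
        cursor = conn.execute(
            "SELECT job_id, node_id, time_and_count, line FROM statuses "
            "ORDER BY time_and_count DESC LIMIT ?",
            (limit,),
        )
        return cursor.fetchall()
    finally:
        conn.close()


# Example call against the demo worker's data directory:
#   for row in recent_statuses("/tmp/bistro_worker/task_logs.sql3"):
#       print(row)
```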

If you want task completions to persist across runs, tell Bistro where to put the SQLite database, via --data_dir=/tmp/bistro_scheduler and --status_table=task_statuses:

mkdir /tmp/bistro_scheduler
./cmake/Debug/server/bistro_scheduler \
  --server_port=6789 --http_server_port=6790 \
  --config_file=scripts/test_configs/simple \
  --data_dir=/tmp/bistro_scheduler --status_table=task_statuses \
  --CAUTION_startup_wait_for_workers=1 --instance_node_name=scheduler

You can query the running scheduler via its REST API:

curl -d '{"a":{"handler":"jobs"},"b":{"handler":"running_tasks"}}' :::6790
curl -d '{"my subquery":{"handler":"task_logs","log_type":"stdout"}}' :::6790

Pro-tip: For ease of reading, pipe the output through either jq or json_pp (from a Perl package). For longer outputs, try | jq -C . | less -R.
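The same queries can be issued from a script. The following is a minimal sketch using only the Python standard library; it assumes the scheduler replies with JSON (as the jq tip suggests), that the top-level keys are caller-chosen labels for subqueries (as "my subquery" in the curl example suggests), and that the scheduler is reachable at the URL shown in the comment. The helper names build_query and query_scheduler are hypothetical.

```python
import json
import urllib.request


def build_query(subqueries):
    """Encode named subqueries as the JSON body the curl examples send."""
    return json.dumps(subqueries).encode("utf-8")


def query_scheduler(url, subqueries, timeout=5.0):
    """POST the subqueries to the scheduler's HTTP port and decode the reply."""
    request = urllib.request.Request(url, data=build_query(subqueries))
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


# Same two subqueries as the first curl command above.
DEMO_QUERY = {"a": {"handler": "jobs"}, "b": {"handler": "running_tasks"}}

# Example call (assumes the demo scheduler is listening locally on 6790):
#   reply = query_scheduler("http://[::1]:6790/", DEMO_QUERY)
#   print(json.dumps(reply, indent=2))
```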

You should also take a look at the scheduler configuration to see how its jobs, nodes, and resources were specified.

less scripts/test_configs/simple

For debugging, we typically invoke the binaries like this:

gdb cmake/Debug/worker/bistro_worker -ex "r ..." 2>&1 | tee WORKER.txt

When configuring a real deployment, be sure to carefully review the --help of the scheduler & worker binaries, as well as the documentation on http://facebook.github.io/bistro. And don't hesitate to ask for help in the group: https://www.facebook.com/groups/bistro.scheduler

License

See LICENSE.

bistro's People

Contributors

ahornby, andrewjcg, bkoray, chadaustin, cooperlees, dgrnbrg-meta, fanzeyi, genevievehelsel, igorsugak, jstrizich, leehowes, lnicco, lukaspiatkowski, meyering, mizuchi, nataliejameson, orvid, pedroerp, pkaush, raheelshahzad, saifhhasan, shri-khare, simpkins, snarkmaster, vgao1996, vitaut, wez, xavierd, yfeldblum, zertosh

bistro's Issues

Example doesn't work with Docker-based build

Hi,
Thank you for contributing this great tool to GitHub. I couldn't find any similar tool, at least not one as simple.

However, I have a problem running the example program on a Docker build.
TL;DR: when trying to connect the worker to the server, I get an error: 111 (Connection refused). I've checked the port (with lsof -i) from the worker's terminal, and the server is indeed listening on 6789 on both IPv4 and IPv6.

First, I had some issues building the "master" branch. IIRC some build script was using the thrift1 command instead of /home/install/bin/thrift1. I looked at the issue tracker and found #18, and I used the commit pointed to there (044cd9f...). It worked: the build finished and, although some tests fail, the binaries were built and they work.
For reference, my command for making the Docker image:
os_image=ubuntu:16.04 gcc_version=5 make_parallelism=2 travis_cache_dir=~/travis_ccache ./fbcode_builder/travis_docker_build.sh &> build_at_$(date +'%Y%m%d_%H%M%S').log

Then I connected to my image (using the instructions from https://github.com/facebook/bistro/blob/master/build/fbcode_builder/README.docker) and tried to run the example from here: https://github.com/facebook/bistro/blob/master/README.md#your-first-bistro-run.
I'm running exactly the same commands as in the README, in the directory /home/bistro/bistro, in the same Docker session, using screen, and the worker returns this error:

W0928 12:12:55.081917   157 BistroWorkerHandler.cpp:666] Waiting for this worker to start listening on ServiceAddress {
  1: ip_or_host (string) = "172.17.0.2",
  2: port (i32) = 27182,
}: AsyncSocketException: connect failed, type = Socket not open, errno = 111 (Connection refused): Connection refused

I was wondering whether there's something wrong with my Docker configuration. I installed it using this guide: https://www.digitalocean.com/community/tutorials/how-to-install-and-use-docker-on-ubuntu-16-04

Worker log:

root@cc646d054226:/home/bistro/bistro# ./cmake/Debug/worker/bistro_worker --server_port=27182 --scheduler_host=:: \
>   --scheduler_port=6789 --worker_command="$HOME/demo_bistro_task.sh" \
>   --data_dir=/tmp/bistro_worker
W0928 12:25:39.609571   215 server_socket.cpp:90] Found no 10 interfaces that are not link-local or loopback
I0928 12:25:39.612613   215 LogWriter.cpp:79] Created table stderr
I0928 12:25:39.612731   215 LogWriter.cpp:79] Created table stdout
I0928 12:25:39.612826   215 LogWriter.cpp:79] Created table statuses
I0928 12:25:39.613024   217 AutoTimer.h:142] Pruned logs with cutoff 1504009539 in 57.89 us
I0928 12:25:40.873081   215 BistroWorkerHandler.cpp:102] Worker is ready: BistroWorker {
  1: shard (string) = "cc646d054226",
  2: machineLock (struct) = MachinePortLock {
    1: hostname (string) = "cc646d054226",
    2: port (i32) = 27182,
  },
  3: addr (struct) = ServiceAddress {
    1: ip_or_host (string) = "172.17.0.2",
    2: port (i32) = 27182,
  },
  4: id (struct) = BistroInstanceID {
    1: startTime (i64) = 1506601540,
    2: rand (i64) = -6770707008561318671,
  },
  5: heartbeatPeriodSec (i32) = 15,
  6: protocolVersion (i16) = 2,
  7: usableResources (struct) = UsablePhysicalResources {
    1: msSinceEpoch (i64) = 0,
    2: cpuCores (double) = 0,
    3: memoryMB (double) = 0,
    4: gpus (list) = list<struct>[0] {
    },
  },
}
W0928 12:25:40.892567   230 BistroWorkerHandler.cpp:666] Waiting for this worker to start listening on ServiceAddress {
  1: ip_or_host (string) = "172.17.0.2",
  2: port (i32) = 27182,
}: AsyncSocketException: connect failed, type = Socket not open, errno = 111 (Connection refused): Connection refused
I0928 12:25:41.894337   246 AutoTimer.h:142] Query: 'SELECT job_id, node_id, time_and_count, line FROM statuses WHERE (time_and_count <= 0) ORDER BY time_and_count DESC LIMIT 2'; args: ' in 182 ns
I0928 12:25:41.894436   246 LogWriter.cpp:220] Got 0 statuses lines
E0928 12:25:41.895129   230 BistroWorkerHandler.cpp:754] Unable to send heartbeat to scheduler: Channel is !good()

Scheduler log:

# ./cmake/Debug/server/bistro_scheduler \
  --server_port=6789 --http_server_port=6790 \
  --config_file=scripts/test_configs/simple --clean_statuses \
  --CAUTION_startup_wait_for_workers=1 --instance_node_name=scheduler
I0928 12:26:42.317178   255 AutoTimer.h:142] Read config from /home/bistro/bistro/scripts/test_configs/simple in 106.4 us
I0928 12:26:42.317651   255 AutoTimer.h:142] Parsed config with 1 jobs in 352.2 us
I0928 12:26:42.317860   255 AutoTimer.h:142] Have 7 nodes after manual in 62.42 us
I0928 12:26:42.318045   258 Monitor.cpp:79] Updating monitor histogram (/home/bistro/bistro/monitor/Monitor.cpp:65): Monitor transiently not making a histogram for simple_job since it is not loaded
W0928 12:26:42.318713   260 RemoteWorkerRunner.cpp:93] RemoteWorkerRunner initial wait (/home/bistro/bistro/runners/RemoteWorkerRunner.cpp:79): DANGER! DANGER! Your --CAUTION_startup_wait_for_workers of 1 is lower than the max healthcheck gap of 125, which makes it very likely that you will start second copies of tasks that are already running (unless your heartbeat interval is much smaller). No initial worker set ID consensus. Waiting for all workers to connect before running tasks.
I0928 12:26:42.319443   261 Bistro.cpp:184] Idle wait...

bistro_scheduler startup error Singleton N6wangle12_GLOBAL__N_113PollerContextE requested before registrationComplete() call

I got the Docker build to run, but 4 tests failed under ctest, and bistro_scheduler aborted with the error "wangle...PollerContextE requested before registrationComplete() call". What am I missing?

Test failure:
export os_image=ubuntu:16.04
export gcc_version=5
make_parallelism=2 ./build/fbcode_builder/travis_docker_build.sh

$ docker run -it 1e47cff229f0 bash
nobody@5fa08110f0b5:/home/bistro/bistro/cmake/Debug$ ctest
Test project /home/bistro/bistro/cmake/Debug
Start 1: test_async_read_pipe
1/56 Test #1: test_async_read_pipe .................. Passed 0.02 sec
Start 2: test_async_read_pipe_rate_limiter
...
93% tests passed, 4 tests failed out of 56

Total Test time (real) = 38.80 sec

The following tests FAILED:
11 - test_worker (OTHER_FAULT)
19 - test_thrift_monitor (OTHER_FAULT)
28 - test_scheduler (OTHER_FAULT)
51 - test_remote_runner (OTHER_FAULT)
Errors while running CTest


bistro_scheduler startup error:

root@27cb23c3eb07:/home/bistro/bistro# ./cmake/Debug/server/bistro_scheduler \
  --server_port=6789 --http_server_port=6790 \
  --config_file=scripts/test_configs/simple --clean_statuses \
  --CAUTION_startup_wait_for_workers=1 --instance_node_name=scheduler
I0406 14:21:51.122525 37 AutoTimer.h:142] Read config from /home/bistro/bistro/scripts/test_configs/simple in 89.35 us
I0406 14:21:51.122921 37 AutoTimer.h:142] Parsed config with 1 jobs in 275.8 us
I0406 14:21:51.123087 37 AutoTimer.h:142] Have 7 nodes after manual in 48.02 us
I0406 14:21:51.123237 40 Monitor.cpp:74] Updating monitor histogram (/home/bistro/bistro/monitor/Monitor.cpp:60): Monitor transiently not making a histogram for simple_job since it is not loaded
W0406 14:21:51.124105 42 RemoteWorkerRunner.cpp:89] RemoteWorkerRunner initial wait (/home/bistro/bistro/runners/RemoteWorkerRunner.cpp:75): DANGER! DANGER! Your --CAUTION_startup_wait_for_workers of 1 is lower than the max healthcheck gap of 125, which makes it very likely that you will start second copies of tasks that are already running (unless your heartbeat interval is much smaller). No initial worker set ID consensus. Waiting for all workers to connect before running tasks.
I0406 14:21:51.124487 43 Bistro.cpp:184] Idle wait...
I0406 14:21:51.126633 37 HTTPMonitorServer.cpp:130] Launched HTTP Monitor Server on port 6790, result 0
F0406 14:21:51.127137 37 Singleton-inl.h:241] Singleton N6wangle12_GLOBAL__N_113PollerContextE requested before registrationComplete() call.
*** Check failure stack trace: ***
@ 0x7f3873f8e5cd google::LogMessage::Fail()
@ 0x7f3873f90433 google::LogMessage::SendToLog()
@ 0x7f3873f8e15b google::LogMessage::Flush()
@ 0x7f3873f8e379 google::LogMessage::~LogMessage()
@ 0x7f38713e7ba2 folly::detail::SingletonHolder<>::createInstance()
@ 0x7f38713e6b9c folly::detail::SingletonHolder<>::try_get()
@ 0x7f38713e64d7 folly::Singleton<>::try_get()
@ 0x7f38713e5790 wangle::FilePoller::init()
@ 0x7f38713e5654 wangle::FilePoller::FilePoller()
@ 0x7f3871fc3ccd ZSt11make_uniqueIN6wangle10FilePollerEJRKNSt6chrono8durationIlSt5ratioILl1ELl1EEEEEENSt9_MakeUniqIT_E15__single_objectEDpOT0
@ 0x7f3871fc2b4d apache::thrift::SecurityKillSwitchPoller::SecurityKillSwitchPoller()
@ 0x7f3871fc2a89 apache::thrift::SecurityKillSwitchPoller::SecurityKillSwitchPoller()
@ 0x7f3871fe4dcd apache::thrift::ThriftServer::ThriftServer()
@ 0x7f3871fe49ef apache::thrift::ThriftServer::ThriftServer()
@ 0xc796e0 ZN9__gnu_cxx13new_allocatorIN6apache6thrift12ThriftServerEE9constructIS3_JEEEvPT_DpOT0
@ 0xc78c37 ZNSt16allocator_traitsISaIN6apache6thrift12ThriftServerEEE9constructIS2_JEEEvRS3_PT_DpOT0
@ 0xc781d4 std::_Sp_counted_ptr_inplace<>::_Sp_counted_ptr_inplace<>()
@ 0xc7719d ZNSt14__shared_countILN9__gnu_cxx12_Lock_policyE2EEC2IN6apache6thrift12ThriftServerESaIS6_EJEEESt19_Sp_make_shared_tagPT_RKT0_DpOT1
@ 0xc7629e std::__shared_ptr<>::__shared_ptr<>()
@ 0xc75676 ZNSt10shared_ptrIN6apache6thrift12ThriftServerEEC2ISaIS2_EJEEESt19_Sp_make_shared_tagRKT_DpOT0
@ 0xc74462 std::allocate_shared<>()
@ 0xc73004 ZSt11make_sharedIN6apache6thrift12ThriftServerEJEESt10shared_ptrIT_EDpOT0
@ 0xc7004e main
@ 0x7f387019f830 __libc_start_main
@ 0xc6f6c9 _start
@ (nil) (unknown)
Aborted (core dumped)

Cron schedule runs 3-5 minutes late

A cron schedule with a specific date and time seems to run about 3-5 minutes late, and a schedule meant to run every minute seems to run every 5 minutes. Is that normal? Thank you.

Configuration example

Schedule for specific datetime:

{
  "bistro_settings": {
    "resources": {
      "instance": {"concurrency": {"limit": 10, "default": 1}},
      "level1": {
        "my_resource": {"limit": 3, "default": 0}
      }
    },
    "nodes": {
      "levels": ["level1", "level2"],
      "node_sources": [
        {
          "source": "manual",
          "prefs": {
            "node1": []
          }
        },

        {
          "source": "add_time",
          "prefs": {
            "parent_level": "level1",
            "schedule": [
              {
                "cron": {
                  "year": 2017,
                  "month": 5,
                  "day_of_month": 17,
                  "hour": 8,
                  "minute": 50,
                  "dst_fixes": ["unskip", "repeat_use_only_early"]
                },
                "lifetime": 6000,
                "tags": ["tag_job1"]
              }
            ]
          }
        }

      ]
    },
    "enabled" : true
  },

  "bistro_job->job1" : {
    "owner" : "test",
    "enabled" : true,
    "command" : ["/bs/job_script.sh"],
    "priority": 1,
    "resources": {
      "my_resource": 1
    },
    "filters": {
      "level2": {
        "tag_whitelist": ["tag_job1"]
      }
    }
  }

}
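When a tagged schedule never seems to fire, one thing worth ruling out is a mismatch between the tags on the schedule entry and the tag_whitelist in the job's filters. The sketch below is a sanity check written for configurations shaped like the one above; it is not part of Bistro, the helper names are hypothetical, and it assumes that tags only come from add_time schedule entries (other node sources may also emit tags).

```python
def schedule_tags(config):
    """Collect every tag attached to an `add_time` schedule entry."""
    tags = set()
    for source in config["bistro_settings"]["nodes"]["node_sources"]:
        if source.get("source") != "add_time":
            continue
        for entry in source["prefs"].get("schedule", []):
            tags.update(entry.get("tags", []))
    return tags


def unmatched_whitelist_tags(config):
    """Map job key -> whitelisted tags that no schedule entry carries."""
    known = schedule_tags(config)
    problems = {}
    for key, job in config.items():
        # Jobs are the top-level keys prefixed with "bistro_job->".
        if not key.startswith("bistro_job->"):
            continue
        for level_filter in job.get("filters", {}).values():
            missing = set(level_filter.get("tag_whitelist", [])) - known
            if missing:
                problems.setdefault(key, set()).update(missing)
    return problems
```

For the configuration above, loading the file with json.load and passing the result to unmatched_whitelist_tags should return an empty dict, since tag_job1 appears on both sides.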

Here is my configuration for the task running every minute:

        {
          "source": "add_time",
          "prefs": {
            "parent_level": "level1",
            "schedule": [
              {
                "cron": {
                  "minute": {"period": 1},
                  "dst_fixes": ["unskip", "repeat_use_only_early"]
                },
                "lifetime": 40
              }
            ]
          }
        }

Example Output:

Here are example timestamps showing gaps of approximately 5 minutes between job runs:

$ date -d +@1494852746
2017-05-15T08:52:26
$ date -d +@1494853041
2017-05-15T08:57:21
$ date -d +@1494853342
2017-05-15T09:02:22
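For reference, the exact gaps between these three runs can be computed from the epoch values directly. This is just arithmetic on the numbers quoted above; the function name is chosen for illustration.

```python
def gaps_in_seconds(timestamps):
    """Return the difference between each pair of consecutive timestamps."""
    return [later - earlier for earlier, later in zip(timestamps, timestamps[1:])]


# Epoch seconds of the three job runs listed above.
RUN_STARTS = [1494852746, 1494853041, 1494853342]

print(gaps_in_seconds(RUN_STARTS))  # prints [295, 301]
```

So the runs are 295 and 301 seconds apart, rather than the 60 seconds the schedule asks for.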

Fix test_scheduler on Travis

As of recently, this test is failing on Travis, but not on FB-internal CI:

I0509 20:01:23.618247   720 AutoTimer.h:142] Found 4 orphan running tasks in 152.4 us
/home/bistro/bistro/scheduler/test/test_scheduler.cpp:222: Failure
Expected equality of these values:
  expected_orphans
    Which is: { 224-byte object <90-DC AB-02 00-00 00-00 07-00 00-00 00-00 00-00 62-61 64-5F 6A-6F 62-00 00-00 00-00 00-00 00-00 B0-DC AB-02 00-00 00-00 09-00 00-00 00-00 00-00 68-6F 73-74 32-2E 64-62 31-00 00-00 00-00 00-00 ... 00-00 00-00 00-00 00-00 00-00 00-00 00-00 00-00 00-00 00-00 00-00 00-00 00-00 00-00 00-00 00-00 00-00 00-00 00-00 00-00 00-00 00-00 69-73 74-29 88-13 00-00 00-00 00-00 00-00 00-00 63-74 3E-5B>, 224-byte object <70-DD AB-02 00-00 00-00 03-00 00-00 00-00 00-00 6A-6F 62-00 00-00 00-00 00-00 00-00 00-00 00-00 90-DD AB-02 00-00 00-00 08-00 00-00 00-00 00-00 62-61 64-5F 6E-6F 64-65 00-00 00-00 00-00 00-00 ... 00-00 00-00 00-00 00-00 00-00 00-00 00-00 00-00 00-00 00-00 00-00 00-00 00-00 00-00 00-00 00-00 00-00 00-00 00-00 00-00 00-00 00-00 28-69 33-32 88-13 00-00 00-00 00-00 00-00 00-00 20-20 20-7D>, 224-byte object <50-DE AB-02 00-00 00-00 0C-00 00-00 00-00 00-00 6A-6F 62-5F 64-69 73-61 62-6C 65-64 00-6F 72-6B 70-DE AB-02 00-00 00-00 09-00 00-00 00-00 00-00 68-6F 73-74 31-2E 64-62 32-00 3A-20 69-6E 76-6F ... 00-00 00-00 00-00 00-00 00-00 00-00 00-00 00-00 00-00 00-00 00-00 00-00 00-00 00-00 00-00 00-00 00-00 00-00 00-00 00-00 00-00 00-00 72-61 74-69 88-13 00-00 00-00 00-00 00-00 00-00 3D-20 42-61>, 224-byte object <30-DF AB-02 00-00 00-00 03-00 00-00 00-00 00-00 6A-6F 62-00 31-3A 20-6E 6F-4D 6F-72 65-42 61-63 20-E2 AB-02 00-00 00-00 11-00 00-00 00-00 00-00 11-00 00-00 00-00 00-00 20-20 32-3A 20-73 65-63 ... 00-00 00-00 00-00 00-00 00-00 00-00 00-00 00-00 00-00 00-00 00-00 00-00 00-00 00-00 00-00 00-00 00-00 00-00 00-00 00-00 00-00 00-00 00-00 00-00 88-13 00-00 00-00 00-00 00-00 00-00 00-00 00-00> }
  res.orphanTasks_
    Which is: { 224-byte object <F0-02 AC-02 00-00 00-00 07-00 00-00 00-00 00-00 62-61 64-5F 6A-6F 62-00 00-00 00-00 00-00 00-00 10-03 AC-02 00-00 00-00 09-00 00-00 00-00 00-00 68-6F 73-74 32-2E 64-62 31-00 00-00 00-00 00-00 ... 00-00 00-00 00-00 00-00 00-00 00-00 00-00 00-00 00-00 00-00 00-00 00-00 00-00 00-00 00-00 00-00 00-00 00-00 00-00 00-00 00-00 00-00 00-00 00-00 88-13 00-00 00-00 00-00 00-00 00-00 00-00 00-00>, 224-byte object <D0-03 AC-02 00-00 00-00 03-00 00-00 00-00 00-00 6A-6F 62-00 00-00 00-00 00-00 00-00 00-00 00-00 B0-FA AB-02 00-00 00-00 11-00 00-00 00-00 00-00 11-00 00-00 00-00 00-00 00-00 00-00 00-00 00-00 ... 00-00 00-00 00-00 00-00 00-00 00-00 00-00 00-00 00-00 00-00 00-00 00-00 00-00 00-00 00-00 00-00 00-00 00-00 00-00 00-00 00-00 00-00 00-00 00-00 88-13 00-00 00-00 00-00 00-00 00-00 00-00 00-00>, 224-byte object <B0-04 AC-02 00-00 00-00 03-00 00-00 00-00 00-00 6A-6F 62-00 00-00 00-00 00-00 00-00 00-00 00-00 D0-04 AC-02 00-00 00-00 08-00 00-00 00-00 00-00 62-61 64-5F 6E-6F 64-65 00-00 00-00 00-00 00-00 ... 00-00 00-00 00-00 00-00 00-00 00-00 00-00 00-00 00-00 00-00 00-00 00-00 00-00 00-00 00-00 00-00 00-00 00-00 00-00 00-00 00-00 00-00 00-00 00-00 88-13 00-00 00-00 00-00 00-00 00-00 00-00 00-00>, 224-byte object <90-05 AC-02 00-00 00-00 0C-00 00-00 00-00 00-00 6A-6F 62-5F 64-69 73-61 62-6C 65-64 00-00 00-00 B0-05 AC-02 00-00 00-00 09-00 00-00 00-00 00-00 68-6F 73-74 31-2E 64-62 32-00 00-00 00-00 00-00 ... 00-00 00-00 00-00 00-00 00-00 00-00 00-00 00-00 00-00 00-00 00-00 00-00 00-00 00-00 00-00 00-00 00-00 00-00 00-00 00-00 00-00 00-00 00-00 00-00 88-13 00-00 00-00 00-00 00-00 00-00 00-00 00-00> }
[  FAILED  ] TestScheduler.InvokePolicyAndCheckOrphans (4 ms)

E.g.

Can't build on fresh Ubuntu 12.04.5

Building from the Docker image for Ubuntu 12.04.5 fails at the cmake step, because the cmake that the build script ends up using is too old:

+ sudo update-alternatives --install /usr/bin/gcc gcc /usr/bin/gcc-4.8 50
update-alternatives: using /usr/bin/gcc-4.8 to provide /usr/bin/gcc (gcc) in auto mode.
+ sudo update-alternatives --install /usr/bin/g++ g++ /usr/bin/g++-4.8 50
update-alternatives: using /usr/bin/g++-4.8 to provide /usr/bin/g++ (g++) in auto mode.
+ CMAKE_NAME=cmake-2.8.12.1
+ GFLAGS_VER=2.1.1
+ GLOG_NAME=glog-0.3.3
+ pushd .
+ git clone https://github.com/google/double-conversion
/bistro/bistro/build/deps/fbthrift/thrift/build/deps/folly/folly/build/deps /bistro/bistro/build/deps/fbthrift/thrift/build/deps/folly/folly/build/deps
Cloning into 'double-conversion'...
+ cd double-conversion
+ cmake -DBUILD_SHARED_LIBS=ON .
CMake Error at CMakeLists.txt:1 (cmake_minimum_required):
  CMake 2.8.12 or higher is required.  You are running version 2.8.7

I then attempted to build using 14.04.4. The build script gets further, but fails trying to fetch from a PPA:

W: Failed to fetch http://ppa.launchpad.net/boost-latest/ppa/ubuntu/dists/trusty/main/binary-amd64/Packages 404 Not Found

What am I missing here?
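A note on the first failure: the log shows the script setting CMAKE_NAME=cmake-2.8.12.1, yet the cmake that actually runs reports 2.8.7, which is older than the required 2.8.12. Dotted versions have to be compared numerically, component by component. A small sketch (helper names are hypothetical) that makes the comparison explicit:

```python
def parse_version(text):
    """Turn a dotted version such as "2.8.12.1" into a tuple of integers."""
    return tuple(int(part) for part in text.split("."))


def version_at_least(found, required):
    """True if the `found` version is the same as or newer than `required`."""
    return parse_version(found) >= parse_version(required)


print(version_at_least("2.8.7", "2.8.12"))     # prints False
print(version_at_least("2.8.12.1", "2.8.12"))  # prints True
print("2.8.7" >= "2.8.12")                     # prints True: plain string comparison misleads
```

Running cmake --version inside the image before the double-conversion step would show which cmake is first on the PATH.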

Building Docker image fails: liblib_bistro_if.a(common_types.cpp.o): undefined reference to symbol

Hi, I am struggling to build Bistro in Docker and I am not sure what is wrong. I am quite new to C++, so I really don't know what the issue is or whether it is on my side.

Here are the steps to reproduce:

git clone https://github.com/facebook/bistro.git
cd bistro/build && ./fbcode_builder/make_docker_context.py \
    --make-parallelism=$(nproc) \
    --docker-context-dir=../../ \
    --local-repo-dir="../" \
    --os-image="ubuntu:18.04" \
    --gcc-version=7
cd ../.. && docker build .

You can check my attempt to automate the build: https://github.com/rectorphp/docker-base-bistro-image-builder/blob/master/.github/workflows/public-docker-image.yml

Here are the most recent (and I think the most relevant) logs:

make[1]: *** Waiting for unfinished jobs....
[ 46%] Linking CXX executable test_cgroup_resources
cd /home/bistro/bistro/cmake/Debug/physical/test && /usr/bin/cmake -E cmake_link_script CMakeFiles/test_cgroup_resources.dir/link.txt --verbose=1
/usr/bin/c++  -g  -rdynamic CMakeFiles/test_cgroup_resources.dir/test_cgroup_resources.cpp.o  -o test_cgroup_resources  -L/home/install/lib -Wl,-rpath,/home/install/lib ../../cmake/deps/gtest-1.8.1/googlemock/gtest/libgtestd.a ../../libfolly_gtest_main.a ../libphysical_lib.a -lfolly -lfmt -lglog -lgflags -lboost_context -lboost_date_time -lboost_regex -lboost_system -lboost_thread -lboost_filesystem -ldouble-conversion -lproxygenhttpserver -lproxygen -lcrypto -lfizz -lpthread -lsqlite3 -lwangle -lssl -lsodium -lz -lzstd -lasync -lconcurrency -lprotocol -lthrift-core -lthriftcpp2 -lthriftmetadata -lthriftprotocol -ltransport -pthread ../../processes/libsubprocess_with_timeout.a ../../utils/libexception_lib.a ../../utils/libutils_lib.a ../../if/liblib_bistro_if.a ../../sqlite/libsqlite_lib.a -lfolly -lfmt -lglog -lgflags -lboost_context -lboost_date_time -lboost_regex -lboost_system -lboost_thread -lboost_filesystem -ldouble-conversion -lproxygenhttpserver -lproxygen -lcrypto -lfizz -lpthread -lsqlite3 -lwangle -lssl -lsodium -lz -lzstd -lasync -lconcurrency -lprotocol -lthrift-core -lthriftcpp2 -lthriftmetadata -lthriftprotocol -ltransport
/usr/bin/ld: ../../if/liblib_bistro_if.a(common_types.cpp.o): undefined reference to symbol '_ZN6apache6thrift6detail2st20translate_field_nameEN5folly5RangeIPKcEERsRNS0_8protocol5TTypeERKNS2_26translate_field_name_tableE'
//home/install/lib/librpcmetadata.so: error adding symbols: DSO missing from command line
collect2: error: ld returned 1 exit status
make[2]: *** [physical/test/test_cgroup_resources] Error 1
physical/test/CMakeFiles/test_cgroup_resources.dir/build.make:102: recipe for target 'physical/test/test_cgroup_resources' failed
make[2]: Leaving directory '/home/bistro/bistro/cmake/Debug'
make[1]: *** [physical/test/CMakeFiles/test_cgroup_resources.dir/all] Error 2
CMakeFiles/Makefile2:4808: recipe for target 'physical/test/CMakeFiles/test_cgroup_resources.dir/all' failed
make[1]: Leaving directory '/home/bistro/bistro/cmake/Debug'
Makefile:140: recipe for target 'all' failed

Since I tried to automate the build using GitHub Actions, the full logs are here: https://github.com/rectorphp/docker-base-bistro-image-builder/runs/1149345804?check_suite_focus=true

How to install

I tried running /home/avner/bistro/bistro/cmake/run-cmake.sh

I get an error about "thrift1". I went ahead and installed thrift, but thrift1 doesn't exist.

Are there instructions for installing it?

How to run the simple example with docker?

I successfully built a Docker image locally, and I want to run the simple example from README.md using docker-compose. But the worker says: BistroWorkerHandler.cpp:755] Unable to send heartbeat to scheduler: Channel is !good()

Below is my docker-compose.yml:

version: "3.2"
services:
  scheduler:
    image: bistro
    working_dir: "/home/bistro/bistro"
    command: ["./cmake/Debug/server/bistro_scheduler", "--server_port=6789", "--http_server_port=6790", "--config_file=scripts/test_configs/simple", "--clean_statuses", "--CAUTION_startup_wait_for_workers=1", "--instance_node_name=scheduler"]
    ports:
      - "6789:6789"
      - "6790:6790"

  worker:
    image: bistro
    working_dir: "/home/bistro/bistro"
    command: ["./cmake/Debug/worker/bistro_worker", "--server_port=27182", "--scheduler_host=scheduler", "--scheduler_port=6789", "--worker_command=$HOME/demo_bistro_task.sh", "--data_dir=."]
    ports:
      - "27182:27182"
    links:
      - scheduler
    depends_on:
      - scheduler

I tried the following settings for scheduler_host, and none of them work:
--scheduler_host=scheduler, --scheduler_host=::, --scheduler_host=0.0.0.0

What's the problem? Has anybody used docker-compose with Bistro?
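One way to narrow this down is to test plain TCP reachability between the two containers before involving Bistro at all. The sketch below uses only the Python standard library; the helper name can_connect is hypothetical, and the service names and ports in the comment are the ones from the docker-compose.yml above.

```python
import socket


def can_connect(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name-resolution failures.
        return False


# Example checks, run from inside the respective containers:
#   can_connect("scheduler", 6789)   # from the worker container
#   can_connect("worker", 27182)     # from the scheduler container
```

If the first check fails from inside the worker container, the problem lies in the container networking or in the address the scheduler binds to, not in the worker flags.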

cron schedule day_of_week with single item gets error: dow 3 carried to 3

A cron schedule whose day_of_week contains a single item fails with the error "dow 3 carried to 3".

These examples get the error (with today being Tuesday):

"day_of_week": 3,
"day_of_week": [3],
"day_of_week": [3,3],
"day_of_week":["tue"],
"day_of_week":"tue",
"day_of_week":"tuesday",

But multiple days of the week work fine (the task runs):

"day_of_week": [1,3],
"day_of_week": ["mon","tue"],

Example error message:

E0516 09:20:18.108150 31104 Monitor.cpp:141] Updating monitor histogram (/home/username/src/bistro/bistro/monitor/Monitor.cpp:60): dow 3 carried to 3
E0516 09:20:19.658288 31107 Bistro.cpp:75] Main loop (/home/username/src/bistro/bistro/Bistro.cpp:48): Error getting nodes: dow 3 carried to 3

Example configuration that produces the error:

{
  "bistro_settings": {
    "resources": {
      "instance": {"concurrency": {"limit": 10, "default": 1}},
      "level1": {
        "my_resource": {"limit": 3, "default": 0}
      }
    },
    "nodes": {
      "levels": ["level1", "level2"],
      "node_sources": [
        {
          "source": "manual",
          "prefs": {
            "node1": []
          }
        },
        {
          "source": "add_time",
          "prefs": {
            "parent_level": "level1",
            "schedule": [
              {
                "cron": {
                  "hour": 8,
                  "minute": 50,
                  "day_of_week": 3,
                  "dst_fixes": ["unskip", "repeat_use_only_early"]
                },
                "lifetime": 6000,
                "tags": ["tag_job2"]
              }
            ]
          }
        }
      ]
    },
    "enabled" : true
  },
  "bistro_job->job2" : {
    "owner" : "test",
    "enabled" : true,
    "command" : ["/bs/job_script.sh"],
    "priority": 1,
    "resources": {
      "my_resource": 1
    },
    "filters": {
      "level2": {
        "tag_whitelist": ["tag_job2"]
      }
    }
  }
}

Example scheduler startup:

#!/bin/bash
cd $HOME/src/bistro/bistro

# Start the scheduler in one terminal
./cmake/Debug/server/bistro_scheduler \
  --server_port=6789 \
  --http_server_port=6790 \
  --config_file=$HOME/bs/config.json \
  --clean_statuses \
  --CAUTION_startup_wait_for_workers=700 \
  --instance_node_name=scheduler

Example worker startup:

cd $HOME/src/bistro/bistro
./cmake/Debug/worker/bistro_worker \
  --server_port=27182 \
  --scheduler_host=:: \
  --scheduler_port=6789 \
  --worker_command="$HOME/bs/default_task.sh" \
  --data_dir=/tmp/bistro_worker

Cron schedule not working?

A cron schedule with a specific date and time (so that it runs only once) does not seem to work for me; it may be a configuration issue. Here is my configuration:

{
  "bistro_settings": {
    "resources": {
      "instance": {"concurrency": {"limit": 10, "default": 1}},
      "level1": {
        "my_resource": {"limit": 3, "default": 0}
      }
    },
    "nodes": {
      "levels": ["level1", "level2"],
      "node_sources": [{
          "source": "manual",
          "prefs": {
            "node1": []
          }
        }, {
          "source": "add_time",
          "prefs": {
            "parent_level": "level1",
            "schedule": [{
                "cron": {
                  "year": 2017,
                  "month": 5,
                  "day_of_month": 3,
                  "hour": 7,
                  "minute": 15,
                  "dst_fixes": ["unskip", "repeat_use_only_early"]
                },
                "lifetime": 50
              }
            ]
          }
        }
      ]
    },
    "enabled" : true
  },

  "bistro_job->job1" : {
    "owner" : "test",
    "enabled" : true,
    "command" : ["/path/to/task_script.sh"],
    "priority": 2,
    "resources": {
      "my_resource": 1
    }
  }
}

However, if I change the schedule item to epoch, then it works right away:
{
"cron": {"epoch": {"period": 60}},
"lifetime": 20
}

Could you give a working Cron example, please? Thank you.

How to use bistro as workqueue?

Hi Developers,

I'm working on setting up a distributed task scheduling system to migrate data from HDFS to a local file system. The example use case is to query the namenode for a file list, then create and schedule a task for each individual file. The example in the README shows how to schedule and execute one task written as a script. Here are some of my questions:

  1. Is there an API for integrating with Bistro from other languages?
  2. How do I use Bistro as a work queue? In other words, after creating each task, how do I persist the created tasks? I noticed that Bistro can talk to MySQL, HBase and Postgres. Could you please provide more information about this?
  3. My plan is to run a "task agent" on each worker instead of a shell script: a daemon process that runs on the worker and waits for work from the scheduler. Could you please provide some examples?

Thanks in advance!

How do I send a job to bistro_worker?

Hi admin,

I started:
bistro_scheduler --server_port=6789 --http_server_port=6790 --config_file=${ROOT_PATH}/configs/bistro/scheduler.json --data_dir=/mnt/data/bistro/scheduler --clean_statuses --CAUTION_startup_wait_for_workers=1 --instance_node_name=scheduler

bistro_worker --server_port=27182 --scheduler_host=:: --scheduler_port=6789 --worker_command="${ROOT_PATH}/configs/bistro/demo_bistro_task.sh" --data_dir=/mnt/data/bistro/worker

At the moment, the scheduler and worker servers are running on two ports: 6789 (scheduler) and 27182 (worker). How do I post a new job (with parameters) to the Bistro worker?

Is Bistro's architecture the same as Gearman's? (http://gearman.org/)

Thank you!

Python 2 syntax error in scratch_test.py

$ python2 -c "C:\users"

  File "<string>", line 1
    C:\users
     ^
SyntaxError: invalid syntax

On Python 3, this is not a syntax error.

$ flake8 . --count --select=E9,F63,F7,F82 --show-source --statistics

./build/fbcode_builder/getdeps/test/scratch_test.py:20:1: E999 SyntaxError: (unicode error) 'rawunicodeescape' codec can't decode bytes in position 2-3: truncated \uXXXX
                r"C:\users\alice\appdata\local\temp\fbcode_builder_getdeps",
^
1     E999 SyntaxError: (unicode error) 'rawunicodeescape' codec can't decode bytes in position 2-3: truncated \uXXXX
1
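
The flake8 output above points at a raw string literal beginning r"C:\users at scratch_test.py:20. As a minimal sketch (assuming it is run with Python 3; the helper name compiles is made up for illustration), the snippet below shows that Python 3 accepts such a raw literal, whereas the same text without the r prefix is rejected because \u starts a Unicode escape:

```python
def compiles(source):
    """Return True if `source` is valid syntax for the running interpreter."""
    try:
        compile(source, "<example>", "exec")
        return True
    except SyntaxError:
        return False


# The doubled backslashes below are only quoting for this demo; the source
# text being compiled contains single backslashes, as in scratch_test.py.
RAW = 'path = r"C:\\users\\alice"\n'     # raw literal: backslashes kept as-is
PLAIN = 'path = "C:\\users\\alice"\n'    # non-raw literal: \u begins an escape

if __name__ == "__main__":
    print("raw literal compiles:", compiles(RAW))
    print("plain literal compiles:", compiles(PLAIN))
```

Running the same check under each interpreter a project supports is a quick way to see which literal forms are portable.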

Cannot build Bistro

I have made some progress on building Bistro. I can run travis_docker_build.sh, but the following error blocks the build. I found that TMultiplexedProtocol.cpp is an Apache Thrift file and is not part of the fbthrift project. I also still cannot find a successful build in Travis.

Could you tell me how to fix the problem so that I can move forward? Many thanks!

make[4]: *** No rule to make target 'protocol/TMultiplexedProtocol.cpp', needed by 'protocol/TMultiplexedProtocol.lo'. Stop.
make[4]: *** Waiting for unfinished jobs....
make[4]: Leaving directory '/home/fbthrift/thrift/lib/cpp'
Makefile:1287: recipe for target 'all-recursive' failed
make[3]: Leaving directory '/home/fbthrift/thrift/lib/cpp'
make[3]: *** [all-recursive] Error 1
Makefile:416: recipe for target 'all-recursive' failed
make[2]: Leaving directory '/home/fbthrift/thrift/lib'
make[2]: *** [all-recursive] Error 1
Makefile:498: recipe for target 'all-recursive' failed
make[1]: Leaving directory '/home/fbthrift/thrift'
make[1]: *** [all-recursive] Error 1

Docker build not working (FunctionInfo.h missing)

Hey,

I just tried compiling bistro through Docker and was not able to complete the process. Somewhere during the build the compiler throws an error that FunctionInfo.h is missing from the fbthrift dependency.

After digging through the container, I was not able to recover a log file containing the error message. It would be great to get some info as to where these messages are stored. Running find /home -type f -name *.log -exec /bin/bash -c "cat {} | grep FunctionInfo" did not return anything useful.
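
The find command quoted above has no terminator for -exec (find expects \; or +) and leaves the *.log pattern unquoted, so it may not have searched what was intended. As a rough alternative, here is a small Python sketch of the same search. The function name grep_logs is made up for illustration, and where the build actually writes its logs is not known here:

```python
import sys
from pathlib import Path


def grep_logs(root, needle, pattern="*.log"):
    """Yield (path, line_number, line) for each matching line under `root`."""
    for path in sorted(Path(root).rglob(pattern)):
        if not path.is_file():
            continue
        try:
            with path.open(errors="replace") as handle:
                for number, line in enumerate(handle, start=1):
                    if needle in line:
                        yield path, number, line.rstrip("\n")
        except OSError:
            continue  # unreadable file: skip it and keep searching


if __name__ == "__main__":
    # Usage: python grep_logs.py [ROOT]; ROOT defaults to the current directory.
    root = sys.argv[1] if len(sys.argv) > 1 else "."
    for path, number, line in grep_logs(root, "FunctionInfo"):
        print("%s:%d: %s" % (path, number, line))
```

If the compiler output was only written to the Docker build's stdout, no log file will contain it, and this search will come back empty as well.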

I did manage to find a probable cause for this issue by searching for references to FunctionInfo.h in the source files (/home/*) and working backwards from there. The Makefile.am in the fbthrift repository includes only the transport/core/TransportRoutingHandler.h and transport/core/ThriftProcessor.h headers. My guess is that FunctionInfo.h should also be included here.

https://github.com/facebook/fbthrift/blob/76d376e1b7d0f189708ef6438abb40862be360d5/thrift/lib/cpp2/Makefile.am#L182-L184

Bistro source code does not match the folly library

-- Configuring done
-- Generating done
-- Build files have been written to:  workspacecpp/bistro/bistro/build/Debug
[  1%] Built target lib_bistro_sqlite
[ 17%] Built target lib_bistro_if
[ 18%] Building CXX object utils/CMakeFiles/lib_bistro_utils.dir/ProcessRunner.cpp.o
 workspacecpp/bistro/bistro/utils/ProcessRunner.cpp: In member function ‘std::unique_ptr<folly::Subprocess> facebook::bistro::ProcessRunner::makeSubprocess(facebook::bistro::ProcessRunner::WriteCallback, const boost::filesystem::path&, const std::vector<std::basic_string<char> >&, const std::vector<std::basic_string<char> >&)’:
 workspacecpp/bistro/bistro/utils/ProcessRunner.cpp:138:5: error: ‘std::string’ has no member named ‘toStdString’
   ).toStdString());
     ^
 workspacecpp/bistro/bistro/utils/ProcessRunner.cpp: In member function ‘void facebook::bistro::ProcessRunner::exceptionCallback(folly::StringPiece, folly::StringPiece, const std::exception&)’:
 workspacecpp/bistro/bistro/utils/ProcessRunner.cpp:183:5: error: ‘std::string’ has no member named ‘toStdString’
   ).toStdString());
     ^
utils/CMakeFiles/lib_bistro_utils.dir/build.make:123: recipe for target 'utils/CMakeFiles/lib_bistro_utils.dir/ProcessRunner.cpp.o' failed
make[2]: *** [utils/CMakeFiles/lib_bistro_utils.dir/ProcessRunner.cpp.o] Error 1
CMakeFiles/Makefile2:2907: recipe for target 'utils/CMakeFiles/lib_bistro_utils.dir/all' failed
make[1]: *** [utils/CMakeFiles/lib_bistro_utils.dir/all] Error 2
Makefile:86: recipe for target 'all' failed
make: *** [all] Error 2

I rechecked the folly source code: in FBString.h, toStdString() returns std::string.

#ifndef _LIBSTDCXX_FBSTRING
  // Compatibility with std::string
  basic_fbstring & operator=(const std::string & rhs) {
    return assign(rhs.data(), rhs.size());
  }

  // Compatibility with std::string
  std::string toStdString() const {
    return std::string(data(), size());
  }
#else
  // A lot of code in fbcode still uses this method, so keep it here for now.
  const basic_fbstring& toStdString() const {
    return *this;
  }
#endif

The Bistro source code should compile cleanly, but running the script build.sh Debug produces the error shown above.

Build failure

Hello

Thanks for this project - Bistro seems to be an awesome tool!

I am trying to build with the following commands:
./fbcode_builder/make_docker_context.py --os-image ubuntu:16.04 --gcc-version 5
then, in the directory returned by the command above:
docker build .

I got the following error:

Step 119/120 : RUN make -j '1'
 ---> Running in 0a59e2acdb6b
Scanning dependencies of target bistro_lib
[  0%] Building CXX object CMakeFiles/bistro_lib.dir/Bistro.cpp.o
Linking CXX static library libbistro_lib.a
[  0%] Built target bistro_lib
Scanning dependencies of target gtest
[  0%] Building CXX object cmake/deps/gtest-1.7.0/CMakeFiles/gtest.dir/src/gtest-all.cc.o
Linking CXX static library libgtest.a
[  0%] Built target gtest
Scanning dependencies of target gtest_main
[  1%] Building CXX object cmake/deps/gtest-1.7.0/CMakeFiles/gtest_main.dir/src/gtest_main.cc.o
Linking CXX static library libgtest_main.a
[  1%] Built target gtest_main
Scanning dependencies of target subprocess_with_timeout
[  2%] Building CXX object processes/CMakeFiles/subprocess_with_timeout.dir/SubprocessOutputWithTimeout.cpp.o
Linking CXX static library libsubprocess_with_timeout.a
[  2%] Built target subprocess_with_timeout
Scanning dependencies of target processes
[  3%] Building CXX object processes/CMakeFiles/processes.dir/AsyncCGroupReaper.cpp.o
[  3%] Building CXX object processes/CMakeFiles/processes.dir/AsyncReadPipeRateLimiter.cpp.o
[  4%] Building CXX object processes/CMakeFiles/processes.dir/CGroupSetup.cpp.o
[  4%] Building CXX object processes/CMakeFiles/processes.dir/TaskSubprocessQueue.cpp.o
Linking CXX static library libprocesses.a
[  4%] Built target processes
Scanning dependencies of target test_async_read_pipe
[  4%] Building CXX object processes/tests/CMakeFiles/test_async_read_pipe.dir/test_async_read_pipe.cpp.o
Linking CXX executable test_async_read_pipe
[  4%] Built target test_async_read_pipe
Scanning dependencies of target test_async_read_pipe_rate_limiter
[  5%] Building CXX object processes/tests/CMakeFiles/test_async_read_pipe_rate_limiter.dir/test_async_read_pipe_rate_limiter.cpp.o
Linking CXX executable test_async_read_pipe_rate_limiter
CMakeFiles/test_async_read_pipe_rate_limiter.dir/test_async_read_pipe_rate_limiter.cpp.o: In function `facebook::bistro::AsyncReadPipeRateLimiter::AsyncReadPipeRateLimiter(folly::EventBase*, unsigned int, long, std::vector<std::shared_ptr<facebook::bistro::AsyncReadPipe>, std::allocator<std::shared_ptr<facebook::bistro::AsyncReadPipe> > >)':
/home/bistro/bistro/../../bistro/bistro/processes/AsyncReadPipeRateLimiter.h:35: undefined reference to `vtable for facebook::bistro::AsyncReadPipeRateLimiter'
CMakeFiles/test_async_read_pipe_rate_limiter.dir/test_async_read_pipe_rate_limiter.cpp.o: In function `_ZZN4PipeC4EPKcPN5folly9EventBaseElNS2_5RangeIS1_EEENKUlPN8facebook6bistro13AsyncReadPipeES6_E_clESA_S6_':
/home/bistro/bistro/processes/tests/test_async_read_pipe_rate_limiter.cpp:45: undefined reference to `facebook::bistro::AsyncReadPipeRateLimiter::reduceQuotaBy(long)'
collect2: error: ld returned 1 exit status
make[2]: *** [processes/tests/test_async_read_pipe_rate_limiter] Error 1
make[1]: *** [processes/tests/CMakeFiles/test_async_read_pipe_rate_limiter.dir/all] Error 2
make: *** [all] Error 2
The command '/bin/bash -c make -j '1'' returned a non-zero code: 2

CLI tools are still missing

"Although Bistro has been in production at Facebook for over 3 years, the present public release is partial, including just the server components. The CLI tools and web UI will be shipping shortly."

It would be great to see these.

Trouble when using host physical resources

Hi,

I'm trying to use Bistro's "Discovering available physical resources" capability with the toy example from the README, without success.
I ran into two problems:

Bistro does not seem to find any of my computer's resources:

Running:

./cmake/Debug/worker/bistro_worker --scheduler_host=:: --scheduler_port=6789 --worker_command="$HOME/demo_bistro_task.sh" --data_dir=/tmp/bistro_worker --nvidia_smi=/usr/bin/nvidia-smi

Among other output, I get this (worker terminal):

I0515 18:47:10.139637 107952 BistroWorkerHandler.cpp:100] Worker is ready: BistroWorker {
[...]
7: usableResources (struct) = UsablePhysicalResources {
1: msSinceEpoch (i64) = 0,
2: cpuCores (double) = 0.0,
3: memoryMB (double) = 0.0,
4: gpus (list) = list[0] {
},
},
}

It may just be a synchronization problem, as I am able to get my machine's real configuration by running the following steps (the order is important): run bistro_scheduler, then run bistro_worker, then kill bistro_scheduler, then run bistro_scheduler again (scheduler terminal):

I0515 18:12:31.293471 107047 AutoTimer.h:143] Got 0 running tasks from worker BistroWorker {
[...]
7: usableResources (struct) = UsablePhysicalResources {
1: msSinceEpoch (i64) = 1589559121458,
2: cpuCores (double) = 128.0,
3: memoryMB (double) = 128737.11328125,
4: gpus (list) = list[3] {
[0] = GPUInfo {
1: name (string) = "GeForce RTX 2080 SUPER",
2: pciBusID (string) = "00000000:01:00.0",
3: memoryMB (double) = 7979.0,
4: compute (double) = 1.0,
},
[1] = GPUInfo {
1: name (string) = "GeForce RTX 2080 SUPER",
2: pciBusID (string) = "00000000:21:00.0",
3: memoryMB (double) = 7982.0,
4: compute (double) = 1.0,
},
[2] = GPUInfo {
1: name (string) = "GeForce RTX 2080 SUPER",
2: pciBusID (string) = "00000000:4B:00.0",
3: memoryMB (double) = 7982.0,
4: compute (double) = 1.0,
},
},

I'm not able to properly configure the bistro_settings with physical resources

My setting file is the following:

{

"bistro_settings" : {
  "resources" : {
    "worker" : {
      "ram" : {
        "limit" : 0,
        "default" : 0
      },
      "cpu" : {
        "limit" : 0,
        "default" : 1
      },
      "gpu" : {
        "limit" : 0,
        "default" : 0
      }
    }
  },
  "nodes" : {
    "levels": ["worker", "level1", "level2"],
    "node_sources": [{
      "source": "manual",
      "prefs": {
        "node1": ["node11", "node12"],
        "node2": ["node21", "node22"]
      }
    }]
  },
  "enabled" : true,
  "physical_resources": {
    "ram_mb": {
        "logical_resource": "ram",
        "multiply_logical_by": 1024,
        "physical_reserve_amount": 4096
    },
    "cpu_core": {
        "logical_resource": "cpu",
        "enforcement": "none"
    },
    "gpu_card": {
        "logical_resource": "gpu"
    }
  },
},


"bistro_job->simple_job" : {
  "owner" : "test",
  "enabled" : true
}
}

If I have understood the documentation correctly, every node will require 1 cpu for simple_job to run on it, and I expect the ram, cpu and gpu limits to be updated according to the worker's capabilities. But when I run the simple example with this configuration file, no jobs are launched and the scheduler waits for a worker with enough capacity.

I would be very grateful if you could give me some insight into solving my problem.

Nathan
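
As posted, the settings file above has a comma after the closing brace of "physical_resources", directly before the brace that closes "bistro_settings". Strict JSON parsers reject that; whether Bistro's own parser tolerates it is not stated here, and it may be unrelated to the scheduling problem. A minimal sketch of a strict pre-flight check with Python's standard json module follows (check_config and SAMPLE are illustrative names, and SAMPLE is a shortened stand-in for the real file):

```python
import json


def check_config(text):
    """Return None if `text` is strict JSON, else a short error description."""
    try:
        json.loads(text)
    except json.JSONDecodeError as err:
        return "line %d, column %d: %s" % (err.lineno, err.colno, err.msg)
    return None


# Shortened stand-in for the settings file, keeping the trailing comma.
SAMPLE = """{
  "bistro_settings": {
    "enabled": true,
    "physical_resources": {
      "gpu_card": {"logical_resource": "gpu"}
    },
  }
}"""

if __name__ == "__main__":
    problem = check_config(SAMPLE)
    print(problem if problem else "config is valid JSON")
```

Running a check like this before starting the scheduler separates syntax problems from resource-configuration problems.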

How can I achieve HA/failover of the scheduler?

Hi,

My company is interested in using Bistro for our distributed task system. We are reading the design and code of Bistro, and one important factor for us is how to achieve high availability of the scheduler. Can you let me know whether this is implemented? If not, how can I achieve it, and where is the best place to add HA logic?

Best regards
Nathan

Scheduler sometimes reports "No initial worker set ID consensus"

Sometimes the scheduler logs this message and does not run tasks:
W0517 08:44:53.419775 11057 RemoteWorkerRunner.cpp:89] RemoteWorkerRunner initial wait (/home/user/src/bistro/bistro/runners/RemoteWorkerRunner.cpp:75): No initial worker set ID consensus. Waiting for all workers to connect before running tasks.

It seems to work more consistently if the worker is started (fully) before the scheduler(?)

Scheduler startup:

$HOME/src/bistro/bistro/cmake/Debug/server/bistro_scheduler \
  --server_port=6789 \
  --http_server_port=6790 \
  --config_file=/etc/bs/config.json \
  --clean_statuses \
  --CAUTION_startup_wait_for_workers=700 \
  --instance_node_name=scheduler

Worker startup:

$HOME/src/bistro/bistro/cmake/Debug/worker/bistro_worker \
  --server_port=27182 \
  --scheduler_host=:: \
  --scheduler_port=6789 \
  --worker_command="/etc/bs/default_task.sh" \
  --data_dir=/tmp/bistro_worker
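
To test the observation that start order matters, a launcher could wait until the worker's port accepts TCP connections before starting the scheduler. The helper below is only a sketch: wait_for_port is a made-up name, 27182 is the worker --server_port from the command above, and an open port does not prove that the worker has finished initializing or that this ordering actually avoids the warning.

```python
import socket
import time


def wait_for_port(host, port, timeout=30.0, interval=0.5):
    """Return True once a TCP connect to (host, port) succeeds, False on timeout."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval)


if __name__ == "__main__":
    # Poll the worker port for a few seconds; start the scheduler afterwards.
    ready = wait_for_port("localhost", 27182, timeout=5)
    print("worker port open" if ready else "worker port not reachable")
```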

custom synonyms list not saved

Hi,

I created a custom synonyms list (Tools > create custom synonyms), gave it a name, entered about 200 positions and clicked 'save'. However, when I try to filter ('search this file'), the name doesn't autocomplete, and going back to Tools > create custom synonyms shows that the list that was just created and saved is not there.

thanks,

Pankaj

Support Encrypted Traffic

It might just be the docs being in a fresh state but I can't see anything about encrypting inter-node traffic. It would be nice to be able to run Bistro without trusting the network.

How to optimize Docker image size?

Hi, I was finally able to build a Docker image with Bistro, but I am a bit worried about its enormous size: roughly 5.2 GB.

Do you have any tips on how to reduce its size?

The Dockerfile is generated automatically by fbcode_builder.

It basically consists of repeated download + build + install blocks:

### Check out fmtlib/fmt, workdir build ###

USER root
RUN mkdir -p '/home' && chown 'nobody' '/home'
USER 'nobody'
WORKDIR '/home'
RUN git clone  https://github.com/'fmtlib/fmt'
USER root
RUN mkdir -p '/home'/'fmt'/'build' && chown 'nobody' '/home'/'fmt'/'build'
USER 'nobody'
WORKDIR '/home'/'fmt'/'build'
RUN git checkout '6.2.1'

### Build and install fmtlib/fmt ###

RUN CXXFLAGS="$CXXFLAGS -fPIC -isystem "'/home/install'"/include" CFLAGS="$CFLAGS -fPIC -isystem "'/home/install'"/include" cmake -D'CMAKE_INSTALL_PREFIX'='/home/install' -D'BUILD_SHARED_LIBS'='ON' '..'
RUN make -j '4' VERBOSE=1 
RUN make install VERBOSE=1 

I was wondering whether I could somehow remove the build cache. Maybe just running rm -rf /fmt (and the same for every other cloned repository) after each package is installed would help to reduce the size.

Also, I do not usually use C++, so I do not know how it really works internally; if my idea is mistaken, please correct me 😄 Could we take only the final binaries and copy them into a different, clean Docker image?

Another idea was to use an Alpine-based or other base image instead of Ubuntu (quick googling brought me to https://github.com/madduci/docker-cpp-env).

Could any of this work, or would you suggest something completely different?

I was thinking about an autoscaling mechanism for Bistro workers on AWS spot instances (maybe even as Lambdas), and for that purpose I want the image to be as thin as possible.

Docker build failed with permissions denied error on step 119

Hello,

I tried building Bistro using the Docker image and I am facing this issue.

Command:
os_image=ubuntu:16.04 gcc_version=5 make_parallelism=2 travis_cache_dir=~/travis_ccache ./../build/fbcode_builder/travis_docker_build.sh

Error:
Step 119/126 : RUN PATH="$PATH:"'/home/install'/bin TEMPLATES_PATH='/home/install'/include/thrift/templates ./cmake/run-cmake.sh Debug -DCMAKE_INSTALL_PREFIX='/home/install'
---> Running in e731295f3654
/bin/bash: ./cmake/run-cmake.sh: Permission denied
The command '/bin/bash -c PATH="$PATH:"'/home/install'/bin TEMPLATES_PATH='/home/install'/include/thrift/templates ./cmake/run-cmake.sh Debug -DCMAKE_INSTALL_PREFIX='/home/install'' returned a non-zero code: 126

+ build_exit_code=126
+ echo 'Build failed with code 126, trying to save ccache'
Build failed with code 126, trying to save ccache

This file seems to have all the permissions. I don't know what is going wrong.
rwxrwxrwx 1 psachan psachan 3K Jan 11 11:58 ./cmake/run-cmake.sh*

Any help is appreciated.
Thanks

Docker build is failing

Hi,
I am trying to install Bistro in the environment below.
os: ubuntu:16.04
gcc: 5
This is blocking me; please suggest a quick workaround for installing any stable version of Bistro.

Below is the error I am getting in the log file:

Step 124/126 : RUN make -j '1'
---> Running in bfef425bf6ae
Scanning dependencies of target bistro_lib
[ 0%] Building CXX object CMakeFiles/bistro_lib.dir/Bistro.cpp.o
[ 0%] Linking CXX static library libbistro_lib.a
[ 0%] Built target bistro_lib
Scanning dependencies of target gtest
[ 1%] Building CXX object cmake/deps/gtest-1.7.0/CMakeFiles/gtest.dir/src/gtest-all.cc.o
[ 1%] Linking CXX static library libgtest.a
[ 1%] Built target gtest
Scanning dependencies of target gtest_main
[ 1%] Building CXX object cmake/deps/gtest-1.7.0/CMakeFiles/gtest_main.dir/src/gtest_main.cc.o
[ 2%] Linking CXX static library libgtest_main.a
[ 2%] Built target gtest_main
Scanning dependencies of target subprocess_with_timeout
[ 2%] Building CXX object processes/CMakeFiles/subprocess_with_timeout.dir/SubprocessOutputWithTimeout.cpp.o
[ 3%] Linking CXX static library libsubprocess_with_timeout.a
[ 3%] Built target subprocess_with_timeout
Scanning dependencies of target processes
[ 4%] Building CXX object processes/CMakeFiles/processes.dir/AsyncCGroupReaper.cpp.o
[ 4%] Building CXX object processes/CMakeFiles/processes.dir/AsyncReadPipeRateLimiter.cpp.o
[ 4%] Building CXX object processes/CMakeFiles/processes.dir/CGroupSetup.cpp.o
[ 5%] Building CXX object processes/CMakeFiles/processes.dir/TaskSubprocessQueue.cpp.o
[ 5%] Linking CXX static library libprocesses.a
[ 5%] Built target processes
Scanning dependencies of target test_cgroup_setup
[ 5%] Building CXX object processes/tests/CMakeFiles/test_cgroup_setup.dir/test_cgroup_setup.cpp.o
[ 6%] Linking CXX executable test_cgroup_setup
CMakeFiles/test_cgroup_setup.dir/test_cgroup_setup.cpp.o: In function `TestCGroupSetup_TestSetup_Test::TestBody()::{lambda()#1}::operator()() const':
/home/bistro/bistro/processes/tests/test_cgroup_setup.cpp:72: undefined reference to `facebook::bistro::cgroupSetup(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, facebook::bistro::cpp2::CGroupOptions const&)'
CMakeFiles/test_cgroup_setup.dir/test_cgroup_setup.cpp.o: In function `TestCGroupSetup_TestSetup_Test::TestBody()::{lambda()#4}::operator()() const':
/home/bistro/bistro/processes/tests/test_cgroup_setup.cpp:93: undefined reference to `facebook::bistro::cgroupSetup(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, facebook::bistro::cpp2::CGroupOptions const&)'
CMakeFiles/test_cgroup_setup.dir/test_cgroup_setup.cpp.o: In function `TestCGroupSetup_TestSetup_Test::TestBody()::{lambda()#6}::operator()() const':
/home/bistro/bistro/processes/tests/test_cgroup_setup.cpp:118: undefined reference to `facebook::bistro::cgroupSetup(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, facebook::bistro::cpp2::CGroupOptions const&)'
CMakeFiles/test_cgroup_setup.dir/test_cgroup_setup.cpp.o: In function `TestCGroupSetup_TestSetup_Test::TestBody()::{lambda()#7}::operator()() const':
/home/bistro/bistro/processes/tests/test_cgroup_setup.cpp:139: undefined reference to `facebook::bistro::cgroupSetup(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, facebook::bistro::cpp2::CGroupOptions const&)'
CMakeFiles/test_cgroup_setup.dir/test_cgroup_setup.cpp.o: In function `TestCGroupSetup_TestSetup_Test::TestBody()':
/home/bistro/bistro/processes/tests/test_cgroup_setup.cpp:63: undefined reference to `facebook::bistro::cpp2::CGroupOptions::CGroupOptions()'
/home/bistro/bistro/processes/tests/test_cgroup_setup.cpp:67: undefined reference to `facebook::bistro::cgroupSetup(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, facebook::bistro::cpp2::CGroupOptions const&)'

Cannot build Bistro on Debian 9

When I run run-cmake.sh, I get the following error:
admin@instance-bistro-test:/bistro/bistro/bistro/cmake$ ./run-cmake.sh Debug
/bistro/bistro/bistro/cmake/deps /bistro/bistro/bistro/cmake
/bistro/bistro/bistro/cmake
Generating Thrift Files
/bistro /bistro/bistro/bistro
!!! Unrecognized option: /bistro/bistro/bistro/cmake/../../..
Usage: thrift [options] file

Worker console shows error with 1000 jobs: Failed to fully qualify local hostname: Bad file descriptor

With 1000 jobs, the worker console shows errors (but the jobs actually ran to completion):
I0517 09:02:11.589076 11198 TaskSubprocessQueue.cpp:113] Task job993, node1:1495025520 message: {"status":{"result_bits":4},"invocation_rand":6304019436659602736,"event":"got_status","worker_host":"","invocation_start_time":1495026072,"raw_status":"done"}
I0517 09:02:48.227895 11216 BistroWorkerHandler.cpp:285] Queueing healthcheck started at 1495026168
E0517 09:02:48.228618 11191 hostname.cpp:40] Failed to fully qualify local hostname: Bad file descriptor [9]
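
When reading such logs, it can help to decode the node name. The task above ran on "node1:1495025520"; assuming (an inference from this log, not something the report confirms) that the suffix is the Unix timestamp attached by the add_time node source, it can be converted to a readable time. The function name split_time_node is made up for illustration:

```python
from datetime import datetime, timezone


def split_time_node(node_name):
    """Split 'parent:unix_seconds' into (parent, timezone-aware UTC datetime)."""
    parent, _, seconds = node_name.rpartition(":")
    return parent, datetime.fromtimestamp(int(seconds), tz=timezone.utc)


if __name__ == "__main__":
    parent, when = split_time_node("node1:1495025520")
    print(parent, when.isoformat())  # node1 2017-05-17T12:52:00+00:00
```

Comparing the decoded time with the cron settings in the generated configuration shows which schedule entry produced a given node.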

Python script to generate test configuration:

#!/usr/bin/env python
# Example generator of load test configuration.
# Usage: python genconfig.py [FILE]
# Example: python genconfig.py /etc/bs/config.json
# If FILE is not specified, it outputs to stdout.
# Need to configure the time schedule at the bottom (look for 'Configuration').

import sys

templ = """{
  \"bistro_settings\": {
    \"resources\": {
      \"instance\": {\"concurrency\": {\"limit\": 1000, \"default\": 1}},
      \"level1\": {
        \"my_resource\": {\"limit\": 1000, \"default\": 0}
      }
    },
    \"nodes\": {
      \"levels\": [\"level1\", \"level2\"],
      \"node_sources\": [
        {
          \"source\": \"manual\",
          \"prefs\": {
            \"node1\": []
          }
        },

        {
          \"source\": \"add_time\",
          \"prefs\": {
            \"parent_level\": \"level1\",
            \"schedule\": [
              %s
            ]
          }
        }

      ]
    },
    \"enabled\" : true
  },

  %s
}
"""

def gen_cron_item(month, day_of_month, hour, minute, job_name):
  cron_templ = """
              {
                \"cron\": {
                  \"month\": %d,
                  \"day_of_month\": %d,
                  \"hour\": %d,
                  \"minute\": %d,
                  \"dst_fixes\": [\"unskip\", \"repeat_use_only_early\"]
                },
                \"lifetime\": 6000,
                \"tags\": [\"tag_%s\"]
              }
""" % (month, day_of_month, hour, minute, job_name)
  return cron_templ

def gen_job(job_name):
  job_templ = """
    \"bistro_job->%s\" : {
    \"owner\" : \"test\",
    \"enabled\" : true,
    \"command\" : [\"/etc/bs/job_script.py\"],
    \"priority\": 2,
    \"resources\": {
      \"my_resource\": 1
    },
    \"filters\": {
      \"level2\": {
        \"tag_whitelist\": [\"tag_%s\"]
      }
    }
  }
""" % (job_name, job_name)
  return job_templ

if __name__ == "__main__":
  # Configuration
  month = 5
  day_of_month = 17
  hour = 9
  minute = 27
  num_jobs = 1000

  cron_items = []
  jobs = []
  for i in range(1, num_jobs+1):
    job_name = "job%d" % i
    cron_item = gen_cron_item(month, day_of_month, hour, minute, job_name)
    cron_items.append(cron_item)
    job = gen_job(job_name)
    jobs.append(job)

  config = templ % (', '.join(str(x) for x in cron_items), ', '.join(str(x) for x in jobs))
  if len(sys.argv) > 1:
    file_name = sys.argv[1]
    with open(file_name, 'w') as f:
      f.write(config)
  else:
    print(config)
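
The generator above assembles JSON from string templates with hand-escaped quotes. As an alternative sketch, the same structure can be built from Python dicts and serialized with json.dumps, which handles quoting and commas. The field values are copied from the script above; the function names are illustrative only:

```python
import json


def cron_item(month, day_of_month, hour, minute, job_name):
    """One add_time schedule entry, tagged so that only `job_name` matches it."""
    return {
        "cron": {
            "month": month, "day_of_month": day_of_month,
            "hour": hour, "minute": minute,
            "dst_fixes": ["unskip", "repeat_use_only_early"],
        },
        "lifetime": 6000,
        "tags": ["tag_%s" % job_name],
    }


def job(job_name):
    """Job definition filtered to the level2 nodes carrying its own tag."""
    return {
        "owner": "test",
        "enabled": True,
        "command": ["/etc/bs/job_script.py"],
        "priority": 2,
        "resources": {"my_resource": 1},
        "filters": {"level2": {"tag_whitelist": ["tag_%s" % job_name]}},
    }


def build_config(num_jobs, month, day_of_month, hour, minute):
    names = ["job%d" % i for i in range(1, num_jobs + 1)]
    schedule = [cron_item(month, day_of_month, hour, minute, n) for n in names]
    config = {
        "bistro_settings": {
            "resources": {
                "instance": {"concurrency": {"limit": 1000, "default": 1}},
                "level1": {"my_resource": {"limit": 1000, "default": 0}},
            },
            "nodes": {
                "levels": ["level1", "level2"],
                "node_sources": [
                    {"source": "manual", "prefs": {"node1": []}},
                    {"source": "add_time",
                     "prefs": {"parent_level": "level1", "schedule": schedule}},
                ],
            },
            "enabled": True,
        },
    }
    for name in names:
        config["bistro_job->%s" % name] = job(name)
    return config


if __name__ == "__main__":
    # Small run for inspection; the load test above used num_jobs = 1000.
    print(json.dumps(build_config(2, 5, 17, 9, 27), indent=2))
```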

server.sh

#!/bin/bash
$HOME/src/bistro/bistro/cmake/Debug/server/bistro_scheduler \
  --server_port=6789 \
  --http_server_port=6790 \
  --config_file=/etc/bs/config.json \
  --clean_statuses \
  --CAUTION_startup_wait_for_workers=700 \
  --instance_node_name=scheduler

worker.sh

#!/bin/bash
# Create the data directory if it does not exist yet (-p is a no-op otherwise)
mkdir -p /tmp/bistro_worker
$HOME/src/bistro/bistro/cmake/Debug/worker/bistro_worker \
  --server_port=27182 \
  --scheduler_host=:: \
  --scheduler_port=6789 \
  --worker_command="/etc/bs/default_task.sh" \
  --data_dir=/tmp/bistro_worker

job_script.py

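#!/usr/bin/env python
# Lets print(..., file=...) below work when "python" resolves to Python 2
from __future__ import print_function

import sys
import json
import time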

# args: ScriptPath ShardID NamedPipe JobArgs
print("python job script args: %s" % json.dumps(sys.argv))
print("stderr is logged too", file=sys.stderr)
# Simulate random work
N = 1 * 60  # about N seconds: the loop below sleeps 1 s per iteration
for i in range(1,N):
  for j in range(1,10):
    a = 3 + j * 100 / 25
    b = a * a / 2
    c = b * b * b / b + 35
  # print("loop %d" % i) # debug
  time.sleep(1.0)
with open('/tmp/test.log', 'a') as f:
  f.write('done\n')
with open(sys.argv[2], 'w') as f:
  f.write("done")  # Report the task status to Bistro via a named pipe
