
ironfan's Introduction

Ironfan Core: Knife Tools and Core Models

Ironfan, the foundation of The Infochimps Platform, is an expressive toolset for constructing scalable, resilient architectures. It works in the cloud, in the data center, and on your laptop, and it makes your system diagram visible and inevitable. Inevitable systems coordinate automatically to interconnect, removing the hassle of manual configuration of connection points (and the associated danger of human error). For more information about Ironfan and the Infochimps Platform, visit infochimps.com.

This repo implements:

  • Core models to describe your system diagram with a clean, expressive domain-specific language
  • Knife plugins to orchestrate clusters of machines using simple commands like knife cluster launch
  • Logic to coordinate truth between the chef server and cloud providers

Getting Started

To jump right into using Ironfan, follow our Installation Instructions. For an explanatory tour, check out our Web Walkthrough. Please file all issues on Ironfan issues.

Tools

Ironfan consists of the following Toolset:

  • ironfan-homebase: centralizes the cookbooks, roles and clusters. A solid foundation for any chef user.
  • ironfan gem:
    • core models to describe your system diagram with a clean, expressive domain-specific language
    • knife plugins to orchestrate clusters of machines using simple commands like knife cluster launch
    • logic to coordinate truth between the chef server and cloud providers.
  • ironfan-pantry: Our collection of industrial-strength, cloud-ready recipes for Hadoop, HBase, Cassandra, Elasticsearch, Zabbix and more.
  • silverware cookbook: coordinates discovery of services ("list all the machines for awesome_webapp, that I might load balance them") and aspects ("list all components that write logs, that I might logrotate them, or that I might monitor the free space on their volumes").

Documentation

Note: Ironfan is not compatible with Ruby 1.8. All versions later than 1.9.2-p136 should work fine.

The Ironfan Way

  • Core Concepts -- Components, Announcements, Amenities and more.
  • Philosophy -- Best practices and lessons learned
  • Style Guide -- Common attribute names, how and when to include other cookbooks, and more
  • Homebase Layout -- How this homebase is organized, and why

Getting Help


ironfan's Issues

Omnibus plan for landing version_2 => Master

These are the planned changes we will make to move the version_2 branch into production and actually release as v2.0.

High Priority:

H-1. 20+ Node Cluster Launch

Cluster Chef 2 should be able to handle a timely cluster launch that involves 20 or more nodes at the same time with no intervention from the user. (It currently does not do so efficiently: all of the nodes are launched in series, and old Broham occasionally glitches and assigns two nodes the same identity.)

H-2. New Purpose Assigner (aka Patty)

Replace old Broham. (Patty Bernstein showed Navin R. Johnson, Steve Martin's character in The Jerk, what his special purpose was.) Exact requirements for this are still evolving in conversations between me and Nathan, but whatever it is, it needs to be able to ensure that requirement H-1 does its thing.

H-3. "Cloud SSH" equivalent

A user should be able to execute a command locally like "knife cluster ssh sudo sv restart hadoop-0.20-datanode" and expect it to run the command on every one of the machines in the cluster.

H-4. Unified recipes

At the end of the cluster chef 2 project, we should have just one set of chef recipes in the cluster chef repo.

H-5. Virgin AMI Bootstrapping

"knife cluster launch xxxx yyyy" should work even if the cluster in question uses stock AMIs.

H-6. Don't use AWS user data to pass DNA in

Bootstrapping DNA data should not be inserted into the AWS instance userdata. It should go through another mechanism - perhaps scp to the node file system as part of the bootstrap.

H-7. Keep Jacob happy

While we are developing, Jacob should be able to set up chimpmark using poolparty. When we are done, Jacob should be able to set up chimpmark using cluster chef 2, with a minimum of retooling.

Medium Priority:

M-1. Idempotent Cluster Start

If you say "knife cluster launch xxx yyy; knife cluster launch xxx yyy", you should end up with one set of yyy-faceted nodes in the xxx cluster (not two).

M-2. Spot pricing

You should be able to specify spot-priced nodes in the cluster definition.

M-3 Cluster Visibility Tool

There should be a command like "knife cluster show" which gives a report on what the cluster is defined to be, what nodes exist in chef, and what nodes are part of the cluster as reported by AWS.

M-4 Use chef 0.10

Although setting up chef environments is not part of this phase of the project, we should pave the way as much as possible by ensuring that our internal clients are on version 0.10. (All clients being on chef 0.10 is a requirement for chef environments to work.)

Low Priority:

L-1 Elastic IP Assignment

Probably as a chef recipe, akin to the EBS volume assignment stuff.

EMR compatibility

Investigate using chef with EMR.

Project Milestones

Major

Be able to start the Bonobo hadoop cluster as a throwaway hadoop workhorse. Once this is accomplished, we can ensure that there is only one version of all the cookbooks and then start getting users within the company working with cluster chef 2 and getting rid of pool party.

Minor

TBD

Cookbook standardizations

We should standardize a few things that are mostly-agreed on, but still have some variation.

default, client, server

Recipes should differentiate between client and server recipes (the redis recipe is one notable offender). Taking redis as an example:

  • recipes/default -- information shared by anyone using redis
  • recipes/client -- configure me as a client of a redis facet
  • recipes/server -- configure me as a server of a redis facet

We should standardize these names and that separation across the cookbooks. The redis cookbook has the dangerous property that including the 'redis' recipe makes you a server. We need to make sure that's not the case.
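
As a concrete illustration, here is a minimal sketch of the proposed split for the redis cookbook; the recipe bodies, package names and service name are hypothetical, not the current cookbook contents:

  # recipes/default.rb -- information shared by anyone using redis (attributes, common dirs)
  directory '/etc/redis' do
    mode '0755'
  end

  # recipes/client.rb -- configure me as a client of a redis facet
  include_recipe 'redis::default'
  package 'redis-tools'              # hypothetical client-side package

  # recipes/server.rb -- configure me as a server of a redis facet; only this recipe starts the daemon
  include_recipe 'redis::default'
  package 'redis-server'
  service 'redis-server' do
    action [:enable, :start]
  end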

include_recipe

Use include_recipe:

  • only if putting it in the role would be utterly un-interesting.
  • never for anything that will start a daemon

If you leave it out, it will break unless I include the recipe in my role: this is a GOOD thing. include_recipe yes: java, ruby, cluster_service_discovery, etc. include_recipe no: zookeeper:client, nfs:server.
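
A minimal sketch of the convention, using a hypothetical namenode role and recipe (the names are invented for illustration):

  # roles/hadoop_namenode.rb -- daemon-starting recipes are named in the role, never via include_recipe
  name        'hadoop_namenode'
  description 'hypothetical example role'
  run_list    'role[base]', 'recipe[zookeeper::client]', 'recipe[hadoop_cluster::namenode]'

  # cookbooks/hadoop_cluster/recipes/namenode.rb -- library-like dependencies may use include_recipe
  include_recipe 'java'
  include_recipe 'cluster_service_discovery'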

fooservice/thingy, not fooservice/fooservice_thingy

Recipes shouldn't repeat their service name: 'hbase:master' and not 'hbase:hbase_master'; 'zookeeper:server' not 'zookeeper:zookeeper_server'.

node[:cluster_name] vs node[:service][:cluster_name]

Right now recipes inconsistently reference either

  • the scoped cluster name: node[:zookeeper][:cluster_name]
  • the cluster name node[:cluster_name] directly

Cluster identity is currently used to reference

  • the cluster I am a member of
  • for a service I provide, the cluster I represent
  • for a service I consume, the cluster I am patronizing.

The first two DO seem to be identical, but the third is different and shouldn't use the word 'cluster_name'. I propose that recipes reference:

  • the scoped cluster name: node[:zookeeper][:providing_cluster] if present
  • the cluster name node[:cluster_chef][:cluster] otherwise

so with a new method

def cluster_providing(service_scope, service_role=nil)
  service_role ||= service_scope
  # prefer an explicit override naming the cluster that provides this service;
  # otherwise assume the node's own cluster provides it
  providing_cluster = (node[service_scope][:providing_cluster] || node[:cluster_chef][:cluster])
  "#{providing_cluster}-#{service_role}"
end

then if cluster hambone had

  :zookeeper => { :providing_cluster => 'turkey' },
  :redis     => { },
  :cassandra => { },

you would say

  private_ip_of(cluster_providing(:zookeeper))                   # "turkey-zookeeper"
  private_ip_of(cluster_providing(:redis))                        # "hambone-redis"
  private_ip_of(cluster_providing(:cassandra, 'cassandra-seed')) # "hambone-cassandra-seed"

Can a cluster be a dependency for another cluster

We are looking for a solution that would help us in a continuous deployment scenario. From what I can see, cluster_chef can do a lot of the heavy lifting. I'm curious about this, though. The infrastructure is defined as a number of components that work together. A database, a web server, a few app servers, a proxy, etc. Once in a while, an entire class of application servers will be replaced to effect a clean software build. So if I have software foo-1.0.1 and foo-1.0.2, and I run foo on 5 servers, I would like to bring up the new version by deploying 1.0.2 onto 5 new servers, and then when they're up I can phase out the 5 old servers gradually to make sure I haven't introduced a regression.

Can cluster_chef give me a way to describe that two (or more) instances of the same service, at two (or more) revisions, should both be running? My current thinking is that this could take the form of defining each deployment as a set of dependencies on specific clusters. E.g. app-cluster-release-20110926-0001 and app-cluster-release-20110926-0002 are the same app at different release levels; each one is a "deployment" of the app. If I have a dependent app, the deployment process would need a way to refer to "the clusters of applications running production code" (where there can be more than one revision of production code at any time). These references would be replaced over time by something (probably a continuous integration hook), so that when a deployment is triggered a new app-cluster-release is created, and everything that depends on the app-cluster will, at its next cluster_chef client invocation, note that the revision has been updated and re-configure itself appropriately (according to logic in our cookbooks).

There are probably some gaps in the explanation I'm providing here, but I'd love to talk about this some more. It'd be nice to have this functionality, especially if there's a way to do this already (e.g. perhaps I should be treating the entire environment as one very large cluster, and add and remove facets to make this work?)

Thanks!

-Peter

big_package is a disaster waiting to happen

We have been using big_package as a dumping ground for library installs that we are pretty sure we would like to have, but we don't really have a good reason for them being there. This is bad for a couple of reasons. Reason #1: other cookbooks may be silently relying on packages that are installed by big_package, so removing big_package from a node's run list may cause failures because of the undocumented dependency. Reason #2: the libraries in big_package have their own dependencies, which could easily conflict with dependencies found in other cookbooks.

Integration cookbooks FTW

@temujin9 has proposed, and it's a good proposal, that there should exist such a thing as an 'integration cookbook'.

The hadoop_cluster cookbook should describe the hadoop cluster, the ganglia cookbook ganglia, and the zookeeper cookbook zookeeper. Each should provide hooks that are neighborly but not exhibitionist, and should otherwise mind its own business.

The job of tying those components together should belong to a separate concern. It should know how and when to copy hbase jars into the pig home dir, or what cluster service_provide'r a redis client should reference.

Practical implications

  • I'm going to revert out the node[:zookeeper][:cluster_name] attributes -- services should always announce under their cluster.
  • Until we figure out how and when to separate integrations, I'm going to isolate entanglements into their own recipes within cookbooks: so, the ganglia part of hadoop will become ganglia_integration or somesuch.

Example integrations

Copying jars

Pig needs jars from hbase and zookeeper. They should announce they have jars; pig should announce its home directory; the integration should decide how and where to copy the jars.
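
A minimal sketch of what such an integration recipe could look like, assuming hypothetical home_dir attributes announced by the hbase and pig cookbooks:

  # integration cookbook, recipes/pig_hbase.rb (hypothetical): link hbase jars into pig's lib dir
  hbase_jars = Dir.glob("#{node[:hbase][:home_dir]}/hbase*.jar")    # hbase announces where its jars live
  hbase_jars.each do |jar|
    link "#{node[:pig][:home_dir]}/lib/#{File.basename(jar)}" do    # pig announces its home directory
      to jar
    end
  end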

Reference a service

Right now in several places we have attributes like node[:zookeeper][:cluster_name], used to specify the cluster that provides_service zookeeper.

  • Server recipes should never use node[:zookeeper][:cluster_name] -- they should always announce under their native cluster. (I'd kinda like to give provides_service some sugar to assume the cluster name, but need to find something backwards-compatible to use)
  • Need to take a better survey of usage among clients to determine how to do this.
  • cases:
    • hbase cookbook refs: hadoop, zookeeper, ganglia
    • flume cookbook refs: zookeeper, ganglia.
    • flume nodes may reference several different flume provides_service'rs
    • API using two different elasticsearch clusters

Logging, monitoring

  • tell flume you have logs to pick up
  • tell ganglia to monitor you

Service Dashboard

Let everything with a dashboard say so, and then let one resource create a page that links to each.
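
A minimal sketch, assuming components announce a hypothetical node[:announces][:dashboard] attribute and that the search index flattens it as shown:

  dashboards = search(:node, 'announces_dashboard:*').map do |n|     # hypothetical index name
    { :name => n.name, :url => "http://#{n[:fqdn]}:#{n[:announces][:dashboard][:port]}" }
  end
  template '/var/www/dashboards.html' do
    source    'dashboards.html.erb'                                  # hypothetical template
    variables :dashboards => dashboards
  end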


These are still forming; ideas welcome.

Remaining cookbook differences between master and version_2

Most cookbooks are now the same between master (infochimps production versions) and version_2:

Here are the few remaining cookbooks that are different between the branches:

  • hadoop_cluster, hbase, hive, cloudera_hue: at some point in the future the versions in master will replace those in version_2. We will try to ensure stability for people using the version_2 copies of those cookbooks.
  • big_package: these are probably better in version_2 and I'll sort them out at some point.
  • elasticsearch: one uses cluster discovery, one uses explicit seeds. You're working with these @temujin9, go with the one you think makes more sense.
  • redis -- see infochimps/cluster_chef#10 (cookbook standardization) for more
  • zookeeper -- see infochimps/cluster_chef#10 (cookbook standardization) for more
  • cookbooks/java, cookbooks/ubuntu -- the version_2 cookbook forces sun java, which is GOOD. openjdk is an infestation. I didn't pull this in to master, because it should be done thoughtfully.

Changes made to each branch

You'll see a whole flurry of changes; I tried to only do things that wouldn't impact production.

To version_2:

  • new to the version_2 branch, pulled in wholesale from master: apache2, graphite, nagios, nodejs, ntp, python, statsd
  • fixes to cassandra and thrift from Mike Heffner.
  • From the production versions in master, patches applied cleanly: flume, zookeeper, pig, jruby

To master:

  • backported fixes from the opscode-master for cookbooks/{aws,nfs,runit,thrift}
  • nuked some bad symlinks
  • fixes to the cassandra recipes
  • fix to nfs recipe for NFS servers under ubuntu maverick
  • new to the master branch, pulled in wholesale from version_2: Rstats, motd/...node_name
  • a one-line nitpick in hbase.

site-cookbooks vs cookbooks:

I pulled apache2, ntp, python into site-cookbooks because that's where it was in master. I think many of these are unmolested copies from the opscode-cookbooks versions -- @temujin9, would you please relocate them into cookbooks and put only the overrides if any in site-cookbooks?

"Safeties" for dangerous operations (hadoop namenode -format)

It would be nice (not a high priority) to have at least a couple of mechanisms to make it hard to do something really destructive to a cluster. hadoop namenode -format is the scariest.

The use of a role that you have to add and subtract seems like it could backfire if you forget to take it out.

I'm trying something like this:

      # build the list of namenode metadata directories on dfs-flagged mount points
      dfs_name_dir_paths = hadoop[:mount_points][environment].map { |m| "#{m[:mount]}/#{m[:dfs_name_path]}" if m[:dfs] }.compact
      Chef::Log.warn "About to hadoop namenode -format filesystems #{dfs_name_dir_paths.inspect}"
      execute "Format hadoop file system" do
        user    hadoop[:user]
        group   hadoop[:group]
        command "#{hadoop_dir}/bin/hadoop namenode -format"
        # only format if none of the name dirs already exist -- this is the safety
        not_if do
          dfs_name_dir_paths.any? { |m| File.exist?(m) }
        end
      end
      node.set[:hadoop_namenode_format_safety] = true
      node.save
      Chef::Log.warn "NAMENODE SAFETY ENGAGED: Future chef-client runs will NOT format name server filesystem."
      Chef::Log.warn "Set node[:hadoop_namenode_format_safety] to false to enable destructive format"
      unless dfs_name_dir_paths.all? { |m| File.exist?(m) }
        msg = "One or more of #{dfs_name_dir_paths.join(', ')} was not namenode formatted"
        Chef::Log.error msg
        raise msg
      end

I'm also having all the EBS-mounted filesystems specified in environment-scoped data bags. They include a "safety" flag that can be used to enable/disable formatting and such in a fine-grained way. I may add an automatic node-based safety mechanism like the one above as well.
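
A minimal sketch of the per-volume safety flag, assuming a hypothetical cluster_ebs_volumes data bag item keyed by cluster name with a 'formattable' flag per volume:

  volumes = data_bag_item('cluster_ebs_volumes', node[:cluster_name])   # hypothetical item layout
  volumes['volumes'].each do |vol|
    execute "format #{vol['device']}" do
      command "mkfs.xfs -f #{vol['device']}"
      only_if { vol['formattable'] }          # the "safety" flag: defaults to false in the data bag
      not_if  "blkid #{vol['device']}"        # never reformat a device that already has a filesystem
    end
  end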

replace `cloud` with `ec2` (Proposed Breaking Change)

The cloud statement needs rethunk

tl;dr -- The bare cloud statement (generic cloud attributes) was well-meaning but useless.

Given that the only cloud we currently allow is EC2, I propose we remove the cloud directive (cloud.flavor('t1.micro')) in favor of ec2 (ec2.flavor('t1.micro')). I'd like to not deprecate the term but remove it -- breaking any script that currently calls it. It's a simple regex replace and it will be unambiguous why the call failed.

If you don't like that idea, speak up now.


Longer version:

The cloud statement is intended to let me say "Here, friends, is the platonic ideal of an industrial-strength hadoop cluster, wheresoever you are. Should you find yourself in Rackspace, apply that ideal on various components sized thusly; if instead, EC2, on components sized like this."

We use the terms that fog very nicely provides for describing aspects of a machine (flavor, image_name, etc), so that if you say cloud.flavor('whatever') the code to apply that directive is shared across providers.

The DSL looks like this:

  # no cloud specified
  cloud do
    flavor 'c1.medium'
    elastic_ip '123.45.67.89'
  end

  # an equivalent way of doing the above
  cloud.flavor 'c1.medium'
  cloud.elastic_ip '123.45.67.89'

  cloud(:ec2) do
    flavor 'c1.medium'
    elastic_ip '123.45.67.89'
   end

  cloud(:ec2).flavor 'c1.medium'         # (yuck)
  cloud(:ec2).elastic_ip '123.45.67.89'

The idea was that with a bare cloud statement you would say "Here's generic description of cloud shape", vs cloud(:ec2) defining "Here's specifics if the cluster is launched on EC2".

Now: most things on the left are generic across clouds. The rest are typically harmless even if the cloud doesn't handle them.

However! Almost everything that you'd put on the right of those is not cloud-agnostic. Even things like 'elastic_ip' seem global, but of course only exist in the cloud that owns them.

This shows that a generic 'cloud' statement doesn't make any sense, and while we have the chance to make breaking changes I'd like to delete it.

Instead, we define directives ec2 (and so on for other cloud providers). Each provider's class inherits from cloud (and so can be decorated with things that only make sense on that cloud).

  ec2.flavor 'c1.medium' 
  ec2.elastic_ip '123.45.67.89'

  rackspace.flavor     '1024MB'
  rackspace.image_name 'rs_maverick'

  vagrant do
    vfs_path '/disk2/vagrants/gibbon'
  end

Since everywhere we currently say cloud we really mean ec2, it's a simple regex-replace. So I'd like to not deprecate the term but remove it -- breaking any script that currently calls it. This will also help isolate the provider-specific stuff in the cluster_chef tools (though it's the cookbooks that need the real de-linting).

If you don't like the idea of breakage, speak up now.

Standardize cluster facet node addressing

Some of the knife cluster commands will take a cluster name and an optional facet name. Others will also take an optional instance index.

The command line interface for cluster facet node addressing should be uniform across knife cluster commands.

Converge External State (EBS attachments, elastic IP, load balancer, etc) externally

says @howech,

Folks,

I would like to propose a change to the way ebs volumes get mounted.

In the current scheme, there is an ebs volume mounter chef script that runs as a part of the chef-client run. It pulls data out of the cluster_ebs_volumes data bag, and does an excellent job of mounting volumes in the right places.

But if it works so well, why change it?

Security.

Because of this recipe, we have to put a too-general AWS key on all of our EC2 instances. Should a node really be given the power to manage its own AWS EC2 destiny - maybe. Should it also be given the ability to manage the destiny of other nodes - probably not. However, the idea of setting up custom api-keys, security groups and policies that make it so that the key on bonobo-slave-17 is only given power over bonobo or the bonobo-slave facet, or even the bonobo-slave-17 node seems extraordinarily complicated.

Instead, imagine a world where "knife cluster launch bonobo" would look into the same databag and pull out the mapping of instance names to ebs volumes. (Ok, bad example, since bonobo does not have ebs volumes, but you get the idea). Most of the time, you just want to make sure that the ebs volumes get set up in the right place when the node is started: I think that an ebs volume will remain attached to an instance through anything you can do to the instance (except of course to detach it, and to terminate the node...). Mounting the device does not require AWS EC2 privs, so that part of the process can remain the way it is. If we think we need it, we could maybe write something to make sure that a running cluster has its ebs volumes attached in the right places, and optionally fix it. The "fix it" part is problematic, as you have to ensure that the drives are unmounted if you want to detach them - as this could fatally interfere with running processes on a server, it is probably best to keep this part manual. The rest should be relatively simple to do with Fog.
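
A minimal sketch of doing the attachment from the knife side with fog rather than on the node; the volume mapping shown is hypothetical, and the credentials are assumed to come from knife configuration:

  require 'fog'
  compute = Fog::Compute.new(:provider => 'AWS',
    :aws_access_key_id     => Chef::Config[:knife][:aws_access_key_id],
    :aws_secret_access_key => Chef::Config[:knife][:aws_secret_access_key])

  volume_map = { 'bonobo-slave-17' => { 'vol-12345678' => '/dev/sdf' } }    # hypothetical data bag contents
  volume_map.each do |node_name, vols|
    server = compute.servers.all('tag:Name' => node_name).first             # assumes instances carry Name tags
    vols.each { |vol_id, device| compute.attach_volume(server.id, vol_id, device) }
  end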

Comments?

Proposal: add identification information to aws tags

Currently, ClusterChef discovers cluster/facet membership by looking at aws security groups. If we were to add instance tags, we could unequivocally identify a cluster node just from the aws data. This could become the primary method for cluster discovery, with internal chef data acting as a backup if the tagging information is missing. Also, cluster chef could ensure that the tags and internal chef data match up on a "cluster show".

This would make it so that cluster chef would not have to wait for chef-client to run to be able to reconcile cluster nodes.
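
A minimal sketch of both halves with fog, tagging at launch and discovering by tag; the tag names follow the proposal, the credentials are assumed to come from knife configuration, and server_id stands in for the instance id returned by the launch call:

  require 'fog'
  compute = Fog::Compute.new(:provider => 'AWS',
    :aws_access_key_id     => Chef::Config[:knife][:aws_access_key_id],
    :aws_secret_access_key => Chef::Config[:knife][:aws_secret_access_key])

  # at launch time, stamp each instance with its identity
  { 'cluster' => 'greeneggs', 'facet' => 'delta', 'facet_index' => '0' }.each do |key, value|
    compute.tags.create(:resource_id => server_id, :key => key, :value => value)
  end

  # discovery then becomes a tag filter rather than a security-group scan
  compute.servers.all('tag:cluster' => 'greeneggs', 'instance-state-name' => 'running')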

Complex hadoop cluster convergence should be more reliable

Cluster convergence of complex hadoop clusters needs to be more solid.

While it would be awesome to have fire-and-forget for a full cluster, orchestrating the convergence and invocation of nearly a dozen services across as many as four supervisory machines is demonstrably non-trivial. I'm at peace with a workflow that goes something like

  • launch namenode, wait for initial chef-client run to complete
  • single-command script to format namenode and start services
  • launch all remaining nodes
    ... and from there, evaluate full fire-and-forget

TODO

  • Separate initial chef convergence of configuration from hadoop service launch
    • Cluster recipes give you control over whether to launch+enable, enable, etc services by default.
    • a single-step mechanism to switch the services on
    • take another look at the namenode formatting step
  • compatibility with CDH3u1 (which shuffles a few paths around).
  • cluster_service provider racing
    • When you stop-start machines, or otherwise need to reconverge, this will happily sail through with no notion of the race condition. Add a knife cluster unprovide that unregisters to provide all services, and have knife cluster stop invoke it along the way.
    • add a wait_for_provider helper (or option) -- see fiksu@b65a7a5b74402c0c222ae9da27bd8163045997e8
  • checkpoints so that if a service provider is missing (e.g. namenode not running) it doesn't cause chef to bomb out
  • cluster reconvergence when machines are stopped and then started

config/bootstrap_chef_client.sh.erb relies on runit

Runit is required to start chef-client. Without it, that fails. It seems that runit is a requirement for most of the cookbooks that you provide, but not for e.g. nfs-server.

Runit should probably just be installed by default.

Add facet.choose_by_index([array]) to let me cycle AZs, etc

For launching into the correct availability zone (related to @librato 's SHA:fe7a1fd37f694a8104364eeb6fe868020b22dd3b), we need a method that lets me hand a facet an array like %w[ us-east-1a us-east-1b us-east-1c ] and have it choose from the array based on its facet_index mod the length.
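
A minimal sketch of the requested helper on the facet class (assuming facet_index is available there; the availability_zone call in the usage comment is illustrative):

  def choose_by_index(arr)
    arr[facet_index.to_i % arr.length]
  end

  # usage in a facet definition, e.g. to spread servers across availability zones:
  #   availability_zone choose_by_index(%w[ us-east-1a us-east-1b us-east-1c ])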

README.textile appears to be different from the implementation

The README.textile indicates that cluster_chef records data in git. From what I've found, role information is currently stored and queried in AWS server properties and merged with config settings (at least for knife; I assume the same is done for cluster_chef in general).

Specifically:

* *Chef server is never the repository of truth* -- it only mirrors the truth.
  - a file is tangible and immediate to access
* Specifically, we want truth to live in the git repo, and be enforced by the chef server. *There is no truth but git, and chef is its messenger*.
  - this means that everything is versioned, documented and exchangeable.
* *Systems, services and significant modifications cluster should be obvious from the `clusters` file*.  I don't want to have to bounce around nine different files to find out which thing installed a redis:server.
  - basically, the existence of anything that opens a port should be obvious when I look at the cluster file.

Is this planned, or has this approach been discarded? It would be very nice for cluster_chef to allow the state of a cluster at a point in time to be seen via git or some other dvcs, since it would allow the state of a cluster to be materialized and monitored from anywhere, not just via AWS tools, and without any aws authentication artifacts (keys, ids, etc.).

Proposal: move cluster definition data into a databag

This would make cluster definitions behave more like native chef roles.

Currently, the cluster representation gets built from the cluster file when you run a "knife cluster" command. Unfortunately, in a multi-user environment, this leaves it up to the users to make sure they are in sync with the cluster definition.

This proposal would be to add a "knife cluster from file" command which would take the cluster definition and compile it into a static json representation and upload it into a chef databag. At the same time, commands like "knife cluster kill", "knife cluster launch", etc would pull the cluster definition down from the databag instead of trying to load it from the ruby files.

This proposal would help ease multi-user synchronization issues.

This proposal would eliminate the ugly ruby lib manipulation code that gets cluster definitions to load, easing the way forward for a cluster_chef gem installation.
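
A minimal sketch of the upload half of a hypothetical "knife cluster from file", assuming the cluster object can render itself to a hash:

  require 'chef/data_bag_item'
  item = Chef::DataBagItem.new
  item.data_bag('clusters')
  item.raw_data = { 'id' => cluster.name.to_s }.merge(cluster.to_hash)   # to_hash is assumed, not existing API
  item.save   # "knife cluster launch" etc. would later read it back with data_bag_item('clusters', name)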

Config files referenced in notes/pt1[...] don't exist

notes/pt1-initial-settings-and-credentials.textile mentions:


  cd PATH/TO/cluster_chef
  cp ./config/poolparty-example.yaml   ~/.hadoop-ec2/poolparty.yaml 
  ln -nfs ~/.hadoop-ec2/poolparty.yaml ~/.hadoop-ec2/aws
  # optional:
  ( cd ~/.hadoop-ec2 && git init && git add . && git commit -m "Initial commit" )

From the commit history, it looks like poolparty-example.yaml became cluster_chef.yaml, and was then removed in/around https://github.com/infochimps/cluster_chef/tree/f88f0830b8b6e1da9f36d4a298e003127165c224/config.

Should these notes files be removed or updated?

Spot Pricing

Clusters should have a mechanism to launch instances with spot pricing.
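
A minimal sketch of what the DSL could look like; spot_price is a hypothetical directive, not something the DSL currently supports, and the facet shown is invented:

  facet 'worker' do
    instances 20
    cloud(:ec2) do
      flavor     'c1.medium'
      spot_price 0.08        # hypothetical: maximum bid in USD/hour; omit for on-demand instances
    end
  end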

Changing the number of instances in a facet

What is the expected behavior of cluster_chef if a cookbook is altered to increase/decrease the number of desired instances and then the cluster_chef client is run?

What, if anything, would need to be changed to allow clients to be removed from a cluster facet, and appropriate de-configuration done on configuration that was created to accommodate them?

Is there a command or configuration setting to tell cluster_chef to resize itself?

Cookbooks and cluster configuration should be cloud agnostic

The recipes have EC2 intertwingled in various places.

We're hoping some folks in the community can help abstract the cloud dependencies, so that all the recipes work on EC2, Rackspace, other cloud providers and non-cloud machines.

Pull requests are welcome. The best solution will leave the recipes modular and not require conditional logic.

The cluster_chef library needs integration:

  • /lib/cluster_chef/compute.rb -- has only ec2 cloud integration so far

  • lib/cluster_chef/cloud.rb: The cloud statement should allow

    • with no argument, describes physical attributes of the machine generically, using the terminology of Fog.
    • with a provider, describes provider-specific configuration, and settings that apply only when on that provider.
    cloud{}             # ...machine's physical configuration
    cloud(:ec2){}       # ...physical configuration and attribute overrides when on ec2
    cloud(:rackspace){} # ...physical configuration and attribute overrides when on rackspace
    cloud(:local){}     # ...physical configuration and attribute overrides when on raw hardware

Several cookbooks require abstraction:

  • In several places, we access the @node[:ec2][:public_hostname] explicitly. This should be abstracted -- perhaps through a chef library?
  • site-cookbooks/motd: report parameters regardless of cloud.
  • site-cookbooks/big_package: separate out aws-specific packages.
  • site-cookbooks/elasticsearch/recipes/http.rb: abstract the load_balancer logic.
  • site-cookbooks/hadoop_cluster:
    • much of the configuration takes place in recipes/ec2_conf.rb.
    • The machine parameters tuning in attributes/hadoop_cluster.rb is very intertwingled
    • templates/webfront_index.html.erb has a lot of references to ec2 parameters
    • roles/hadoop_s3_keys.rb should be cloud agnostic

Probably safe, but could benefit from a clean way to inject cloud-specific template content

  • site-cookbooks/cassandra
  • site-cookbooks/elasticsearch
  • site-cookbooks/flume/templates/default/flume-site.xml.erb
  • site-cookbooks/hadoop_cluster/templates/default/core-site.xml.erb

knife cluster launch --bootstrap not working

knife cluster launch --bootstrap is not working: cluster_bootstrap.rb sets the following config variables, but cluster_launch does not.

        config[:ssh_user]       = facet.cloud.ssh_user
        config[:identity_file]  = facet.cloud.ssh_identity_file
        config[:distro]         = facet.cloud.bootstrap_distro
        config[:run_list]       = facet.run_list

Cluster ssh fails when there is no chef_node present

Currently, cluster ssh gets its list of addresses from chef attributes. When you first spin up a cluster, the chef nodes do not exist until chef-client runs successfully on the nodes, so until that happens cluster ssh is blind. If something goes wrong with the first chef run, this makes it very difficult to fix what has gone wrong.

In many cases, if a fog_server is present, it has a better idea of what the ip address for a node really is. We should use the fog_server.public_ip_address as a first option and fall back to chef_node if something goes wrong.
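
A minimal sketch of the proposed fallback (the method and attribute names are assumed):

  def ssh_address(svr)
    if svr.fog_server && svr.fog_server.public_ip_address
      svr.fog_server.public_ip_address            # the cloud API knows the address even before the first chef run
    elsif svr.chef_node
      svr.chef_node[:cloud][:public_hostname]     # fall back to what chef has recorded
    end
  end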

Relocate cookbooks, roles and clusters into separate repo(s)

Cluster chef currently consists of the following separate(able) concerns:

  • cluster_chef tools
    • the DSL that lets you define clusters
    • the knife commands which use that DSL
    • optional bootstrap scripts for a machine image that can then launch bootstrap-less
  • cluster-oriented cookbooks
    • cluster_service_discovery (recipes to let clusters discover services using chef-server's search)
    • ?others?
  • cloud utility cookbooks
    • motd, system_internals (swappiness, ulimit, etc)
  • big data cookbooks (hadoop, cassandra, redis, etc):
    • cookbooks
    • roles
    • clusters

I think it's time to separate those into at least two repos.

REQUEST FOR COMMENTS:

Division of concerns

It's clear that the cluster_chef tools and the big data cookbooks should be divorced, but I don't know whether there should be further subdivision.

Proposed:

  • cluster_chef holds only the DSL, knife commands, and bootstrap scripts -- basically the stuff in lib/, along with the gemspec etc.
  • cluster_chef-systems -- holds cookbooks, roles and example clusters that use them.
    • Utility cookbooks (cluster_service_discovery, motd, etc) and system cookbooks(hadoop, cassandra, etc) are housed in two separate folders.
    • The standard layout would just include the cookbooks, but a cluster-oriented approach demands that the roles travel along too
  • (possibly) cluster_chef-chef-repo (??better name, anyone??) -- a fork of https://github.com/opscode/chef-repo that integrates the above

Handling of cookbooks that originate from opscode-cookbooks

Right now we copy standard cookbooks from opscode's repo into the cookbooks directory. This lets us version them separately, but means we have to track them, and could cause conflicts with the majority of people who will be pulling from opscode-cookbooks already.

  1. omit entirely, but list as dependencies (my vote)
  2. git subtree pull them into the cluster_chef-cookbooks repo
  3. copy them in as we've been doing
  4. git submodule opscode-cookbooks and symlink

Organization of new repo(s)

Opscode recommends a standard layout for your chef repo. We should make the new arrangement work seamlessly within that structure.

The new layout should

  • be easy to integrate if you have your own existing chef-repo
  • be straightforward for a new chef user to adopt
  • either mirror or be what we actually use

Proposed, but this doesn't feel solid to me yet:

  clusters/                 # internal clusters
  roles/                    # internal roles
  site-cookbooks/           # internal cookbooks

  systems-cookbooks@        # symlink to cluster_chef-systems/systems-cookbooks
  cloud-cookbooks@          # symlink to cluster_chef-systems/cloud-cookbooks
  opscode-cookbooks/        # git submodule of https://github.com/opscode/cookbooks

  cluster_chef-systems      # git submodule of https://github.com/infochimps/cluster_chef-systems
    clusters/               
    roles/                  
    systems-cookbooks/      # hadoop, cassandra, etc
    cloud-cookbooks/        # cluster_service_discovery, motd, etc

  .chef/                    # knife config, keypairs, etc
  certificates/
  config/
  data_bags/
  environments/    

The idea is that you can either just include the cookbooks or clusters directories in your config file, or symlink selectively.

Support Spot Instances

Assuming my priorities don't get swapped out from under me, I'm going to be working on adding Spot Instance support to Fog and then to the knife-ec2 plugin. Offhand, it looks like most of the work is on the fog end, so that should be leverageable into the knife-cluster plugin.

So you can count me in to work on at least the Fog support starting within a week...

Project has no license

Please add a LICENSE file or in some other way indicate under what terms this project is usable.

cluster_chef discovery fails with fog version 0.90

It looks like the way to access security groups through fog has changed. This caused the discovery mechanism to fail to find any appropriate servers.

Searching by security group is probably obsolete. We can probably just go with tags.

Alternatively, we could figure how to do the same thing we were doing with fog-0.90 and make it work like before.

cluster chef DSL should handle the runlist exclusively

We should only use the outer role / recipe keywords to fix the runlist. Right now many cluster definitions have these duplicated inside the facet_role block.

We need to

  • add the code that does the runlist injection for us
  • make knife create the node if it doesn't exist.
  • a cleanup of cluster_role_implication wouldn't be terrible
  • clean up first_boot.json

A few reasons why you must/should have the roles under control of cluster_chef:

  • security groups implied by roles (nfs_client etc)
  • separation of cluster/facet runlist contributions
  • It's a cleaner expression
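
A minimal sketch of the preferred shape, following the reasons above; the cluster, facet and recipe names are invented, and the exact DSL method names may differ:

  ClusterChef.cluster 'hambone' do
    role   'base_role'                    # runlist contributions live at cluster/facet scope...
    facet 'webnode' do
      instances 2
      role   'nginx'
      recipe 'silverware::announce'       # hypothetical recipe name
      facet_role do
        # ...while facet_role carries only attribute overrides, not duplicated roles/recipes
        default_attributes :nginx => { :worker_processes => 4 }
      end
    end
  end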

Use alternatives to AWS user data

There should be a way to bootstrap a node without having to rely on the AWS user data feature. This could be nice, for instance, in an environment that is not AWS EC2.

What permissions need to be set to use hosted chef with v3

I'm getting 403 errors from opscode when creating clients. I noticed that you had a 500 error last week (http://help.opscode.com/discussions/problems/866-opscode-platform-500-error-creating-chef-client-with-site-or-with-knife) but I think this is different.

Based on this: http://help.opscode.com/discussions/problems/873-how-can-i-find-out-what-permission-is-being-denied it looks like the v3 client creation needs to do some more work on the knife side... maybe? I'm trying to run knife from a cluster definition, and all of my clients end up seeing this stack trace when running chef-client --once:

Generated at Tue Nov 08 21:54:17 +0000 2011
Net::HTTPServerException: 403 "Forbidden"
/usr/lib/ruby/1.8/net/http.rb:2105:in `error!'
/usr/lib/ruby/gems/1.8/gems/chef-0.10.4/bin/../lib/chef/rest.rb:237:in `api_request'
/usr/lib/ruby/gems/1.8/gems/chef-0.10.4/bin/../lib/chef/rest.rb:288:in `retriable_rest_request'
/usr/lib/ruby/gems/1.8/gems/chef-0.10.4/bin/../lib/chef/rest.rb:218:in `api_request'
/usr/lib/ruby/gems/1.8/gems/chef-0.10.4/bin/../lib/chef/rest.rb:130:in `put_rest'
/usr/lib/ruby/gems/1.8/gems/chef-0.10.4/bin/../lib/chef/api_client.rb:247:in `save'
/usr/lib/ruby/gems/1.8/gems/chef-0.10.4/bin/../lib/chef/rest.rb:81:in `register'
/usr/lib/ruby/gems/1.8/gems/chef-0.10.4/bin/../lib/chef/rest.rb:79:in `upto'
/usr/lib/ruby/gems/1.8/gems/chef-0.10.4/bin/../lib/chef/rest.rb:79:in `register'
/usr/lib/ruby/gems/1.8/gems/chef-0.10.4/bin/../lib/chef/rest.rb:77:in `catch'
/usr/lib/ruby/gems/1.8/gems/chef-0.10.4/bin/../lib/chef/rest.rb:77:in `register'
/usr/lib/ruby/gems/1.8/gems/chef-0.10.4/bin/../lib/chef/client.rb:280:in `register'
/usr/lib/ruby/gems/1.8/gems/chef-0.10.4/bin/../lib/chef/client.rb:150:in `run'
/usr/lib/ruby/gems/1.8/gems/chef-0.10.4/bin/../lib/chef/application/client.rb:239:in `run_application'
/usr/lib/ruby/gems/1.8/gems/chef-0.10.4/bin/../lib/chef/application/client.rb:229:in `loop'
/usr/lib/ruby/gems/1.8/gems/chef-0.10.4/bin/../lib/chef/application/client.rb:229:in `run_application'
/usr/lib/ruby/gems/1.8/gems/chef-0.10.4/bin/../lib/chef/application.rb:67:in `run'
/usr/lib/ruby/gems/1.8/gems/chef-0.10.4/bin/chef-client:26
/usr/bin/chef-client:19:in `load'
/usr/bin/chef-client:19

What are you doing to prevent this when you test?

Deprecate role/recipe inclusion in cluster definitions

Roles and recipes included directly from cluster definitions are written into node metadata on opscode, and must be manually edited with knife to remove. This is the core of the "hadoop_cluster::system_internals" bug we keep encountering: although it was updated in all roles, it was also present in some node metadata.

Both the cluster and its individual facets can establish roles and recipes associated with a collection of systems, in a way that allows for more consistent editing, updating, and revision tracking. There's no benefit to duplicating that capability inside of cluster definitions, and several drawbacks. I'd recommend we stop using that capability, and deprecate it in cluster_chef, ASAP.

knife cluster ssh does not work

Seems to default to the internal hostname when invoked from outside EC2? Not sure if that's why it's failing.

$ knife cluster ssh mycluster datanode uptime  -VV
DEBUG: Using configuration from cluster_chef/.chef/knife.rb
DEBUG: Signing the request as xxxx
DEBUG: Sending HTTP Request via GET to ..../search/node
DEBUG: Adding ip-....ec2.internal
DEBUG: Adding ip-.....ec2.internal
DEBUG: Adding ip-.....ec2.internal
DEBUG: Adding domU-.....compute-1.internal
DEBUG: establishing connection to ip-.....ec2.internal:22
/home/mheffner/.rvm/gems/ree-1.8.7-2011.03@cluster_chef/gems/chef-0.10.0/lib/chef/knife/ssh.rb:90:in `session': undefined method `each' for nil:NilClass (NoMethodError)
    from /home/mheffner/.rvm/gems/ree-1.8.7-2011.03@cluster_chef/gems/net-ssh-multi-1.0.1/lib/net/ssh/multi/session_actions.rb:37:in `join'
    from /home/mheffner/.rvm/gems/ree-1.8.7-2011.03@cluster_chef/gems/net-ssh-multi-1.0.1/lib/net/ssh/multi/session_actions.rb:37:in `sessions'
    from /home/mheffner/.rvm/gems/ree-1.8.7-2011.03@cluster_chef/gems/net-ssh-multi-1.0.1/lib/net/ssh/multi/session_actions.rb:37:in `each'
    from /home/mheffner/.rvm/gems/ree-1.8.7-2011.03@cluster_chef/gems/net-ssh-multi-1.0.1/lib/net/ssh/multi/session_actions.rb:37:in `sessions'
    from /home/mheffner/.rvm/gems/ree-1.8.7-2011.03@cluster_chef/gems/net-ssh-multi-1.0.1/lib/net/ssh/multi/session_actions.rb:81:in `open_channel'
    from /home/mheffner/.rvm/gems/ree-1.8.7-2011.03@cluster_chef/gems/chef-0.10.0/lib/chef/knife/ssh.rb:161:in `ssh_command'
    from /home/mheffner/.chef/plugins/knife/cluster_ssh.rb:151:in `run'
    from /home/mheffner/.rvm/gems/ree-1.8.7-2011.03@cluster_chef/gems/chef-0.10.0/lib/chef/knife.rb:391:in `run_with_pretty_exceptions'
    from /home/mheffner/.rvm/gems/ree-1.8.7-2011.03@cluster_chef/gems/chef-0.10.0/lib/chef/knife.rb:166:in `run'
    from /home/mheffner/.rvm/gems/ree-1.8.7-2011.03@cluster_chef/gems/chef-0.10.0/lib/chef/application/knife.rb:128:in `run'
    from /home/mheffner/.rvm/gems/ree-1.8.7-2011.03@cluster_chef/gems/chef-0.10.0/bin/knife:25
    from /home/mheffner/.rvm/gems/ree-1.8.7-2011.03@cluster_chef/bin/knife:19:in `load'
    from /home/mheffner/.rvm/gems/ree-1.8.7-2011.03@cluster_chef/bin/knife:19

All fog calls should specify the region given in the cluster definition

https://github.com/infochimps/cluster_chef/blob/master/lib/cluster_chef/discovery.rb#L101

The region isn't requested. If no region is provided, the connection will be made to the default region (us-east-1, I guess) and the cluster-creation groundwork will be done there: e.g. groups will be created, permissions assigned, etc. However, when creating the server, nothing has been done in the region that was requested for the cluster.

Quick hack: add knife[:region] 'us-west-1' to the .knife being used, and then change line 101 in the link above to use Chef::Config[:knife][:region].

The right fix is probably to check if the region where the requested AZ is located is different from the region the current connection is in, and do something special (create a region pool and switch to the right region? Re-connect in the correct region before dealing with security groups?)

If there's a better way to do this, let me know. knife cluster --region returned that the region option was invalid with the cluster module.
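
A minimal sketch of the "right fix" direction: build the fog connection from the cluster's region rather than the default (the cluster.cloud.region accessor is assumed, as is the knife credential configuration):

  require 'fog'
  region  = cluster.cloud.region || Chef::Config[:knife][:region] || 'us-east-1'
  compute = Fog::Compute.new(:provider => 'AWS',
    :region                => region,
    :aws_access_key_id     => Chef::Config[:knife][:aws_access_key_id],
    :aws_secret_access_key => Chef::Config[:knife][:aws_secret_access_key])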

Fallback to {cluster_chef_path}/clusters

Right now when creating a custom cluster definition and placing that definition in a custom cluster_path, there's no way to do:

use :defaults

without also copying/symlinking the default cluster definition into the custom cluster_path. It would be nice to be able to have a custom cluster definition in a custom directory while also being able to reference the default cluster definitions shipped with cluster_chef.

How about a fallback mechanism that will check cluster_path -> cluster_chef_path/clusters in order?
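
A minimal sketch of the fallback lookup; the setting names are assumed, and cluster_path is treated as a single directory for simplicity:

  def cluster_file_for(cluster_name)
    search_dirs = [ Chef::Config[:cluster_path],
                    File.join(Chef::Config[:cluster_chef_path], 'clusters') ].compact
    search_dirs.map { |dir| File.join(dir, "#{cluster_name}.rb") }.find { |f| File.exist?(f) }
  end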

Split off cluster management into a gem

It would be very convenient to have the cluster management plugin part of the project split off into its own gem. This would include the knife plugin and cluster DSL, but not the cookbooks and hadoop-specific kludges. Right now, using this project to manage a non-hadoop cluster is much more difficult than it could be: ideally it would be 1. install the gem, 2. require it in knife.rb.

Converge on right NTP cookbook

There are two versions of the NTP cookbook in play, one from infochimps and one from librato... unless there's a reason it should be modified, can we go with the opscode-cookbooks version, placed into cookbooks/ntp?

(I haven't looked carefully at either branch, so don't know the differences, hoping you can sort it out Nathan)

knife cluster doesn't respect -i

It seems that when launching a cluster the -i flag doesn't get used, so an alternate ssh key can't be provided:

$ knife cluster -c /Users/pn/.chef/knife-utility.rb -i ~/.ssh/id_1XXXXXX2 launch demosimple homebase --bootstrap
  +------------+-----------------------+----+------------+---------+------------+------------+-------+-------------+-------------+--------+-----------+
  | Elastic IP | Name                  | AZ | Created At | Volumes | Private IP | InstanceID | Image | State       | launchable? | Flavor | Public IP |
  +------------+-----------------------+----+------------+---------+------------+------------+-------+-------------+-------------+--------+-----------+
  |            | demosimple-homebase-0 |    |            |         |            |            |       | not running | true        |        |           |
  +------------+-----------------------+----+------------+---------+------------+------------+-------+-------------+-------------+--------+-----------+

Making security groups:
["authorizing group for", "demosimple", "demosimple", nil]
["authorizing group for", "nfs_server", "nfs_client", nil]

Launching machines:
ERROR: Fog::Service::NotFound: The key pair 'demosimple' does not exist

I think it'd make sense to allow an alternate id to be provided for the cluster so multiple clusters can be managed.

Virgin AMI Cluster boot

There should be a mechanism to boot a cluster from a virgin ubuntu AMI. This was supposed to happen as a part of the version_2 merge, but we decided to press forward with the merge without it.

Make clusters+facets more easily searchable

Quoting howech

Various of the requirements for cluster chef point toward having a way to glean from the chef node data exactly which nodes belong to a particular cluster (and facet). This is easy enough to do if you iterate through all of the nodes and filter on the cluster_name and cluster_role attributes:

chris@basqueseed:/usr/lib/ruby/gems/1.8/gems/chef-0.9.14/lib/chef/knife$ knife exec -E 'nodes.all {|n| next unless n.cluster_name == "greeneggs"; puts n.node_name}'
greeneggs-delta-0
greeneggs-delta-1
greeneggs-delta-2
greeneggs-delta-3
greeneggs-delta-4
greeneggs-beta
... lots

Iterating through all of the nodes is a little slow, so I would really like to use searching instead. The problem is that, for searching, chef indexes all of the nested attributes as if they were top-level attributes, making it easy to search for values buried deep within the node structure. This feature frustrates using search for cluster discovery, because of some unfortunate choices in other configuration environments:

chris@basqueseed:~/.ssh$ knife exec -E 'nodes.search("cluster_name:greeneggs") {|n| puts n.node_name}' | sort
chimpmark-master-0
chimpmark-slave-0
chimpmark-slave-1
...
goldencap-nikko-0
goldencap-twscraper-0
...
greeneggs-alpha
greeneggs-beta
greeneggs-delta-0
greeneggs-delta-1
greeneggs-delta-2
greeneggs-delta-3
...

What happened? goldencap and chimpmark are configured to talk to the hbase residing on greeneggs by the hbase.cluster_name attribute. Maybe this was a poor choice of attributes to use to indicate which hbase cluster to talk to (it was my choice, actually). But it brings up a general problem with chef search and attribute collisions that we need to be aware of.

Another feature of chef search is that node attributes are also indexed by their full path. A node's node.hbase.cluster_name attribute gets indexed as both "cluster_name:value" and "hbase_cluster_name:value":

chris@basqueseed:~/.ssh$ knife exec -E 'nodes.search("hbase_cluster_name:greeneggs") {|n| puts "#{n.node_name} #{n.cluster_name} #{n.hbase.cluster_name}" }' | sort
chimpmark-master-0 chimpmark greeneggs
chimpmark-slave-0 chimpmark greeneggs
chimpmark-slave-10 chimpmark greeneggs
chimpmark-slave-11 chimpmark greeneggs
chimpmark-slave-12 chimpmark greeneggs
...
greeneggs-delta-3 greeneggs greeneggs
greeneggs-delta-4 greeneggs greeneggs
greeneggs-delta-5 greeneggs greeneggs
greeneggs-delta-6 greeneggs greeneggs
greeneggs-gamma-0 greeneggs greeneggs
greeneggs-gamma-1 greeneggs greeneggs
greeneggs-gamma-2 greeneggs greeneggs

I propose that cluster chef should mark nodes in an unequivocal way so that it can search for them and get the answer it expects. Basically, this means adding a top-level "clusterchef" attribute that contains cluster_name, cluster_facet, and cluster_facet_index. The top-level "clusterchef" attribute gives us a unique namespace that we can use with the chef search interface. (Note that the top-level cluster_name attribute will still be there.)
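
A minimal sketch of the proposed namespaced attributes, as set by the cluster_chef tooling (values are illustrative):

  node.set[:clusterchef][:cluster_name]        = 'greeneggs'
  node.set[:clusterchef][:cluster_facet]       = 'delta'
  node.set[:clusterchef][:cluster_facet_index] = 0

  # which makes the search unambiguous, since the flattened index key is unique:
  #   knife exec -E 'nodes.search("clusterchef_cluster_name:greeneggs") {|n| puts n.node_name }'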

cluster bootstrap

Hi, I'm new to chef and certainly cluster_chef, so I may have a basic misunderstanding...

When I run "knife cluster launch demohadoop master --bootstrap", it starts an EC2 instance, but that instance appears to have nothing installed on it. My understanding was that some chef-specific stuff was supposed to land out there, and even if it didn't install correctly, I could "manually" fire up the chef client from the EC2 instance to do the install. It looks like not even chef-client is being installed. What am I missing?

v3 documentation needs to note the new knife plug-in location

I'm testing out v3 from the git head via a pull from upstream to my local repo, and I've noticed that the location of the knife plugin has moved in the cluster_chef repo. Can you update the readme to add:

mkdir -p ~/.chef/plugins   # probably not necessary, but I'm not 100% sure since I haven't re-built my chef install
cd ~/.chef/plugins && \
  ln -s $CLUSTER_CHEF/lib/chef/knife knife

Limit on how much data can be sent to a newly bootstrapped node?

/Users/pn/gems/gems/excon-0.6.6/lib/excon/connection.rb:190:in `request': RequestLimitExceeded => Request limit exceeded. (Fog::Service::Error)

I get this trying to launch a cluster of 18 nodes after adding this hack to chef, to send an encrypted data bag secret:

pcn/chef@953170f

"description"=>"cluster_chef generated group PNTestCluster"}, @cloud=#<ClusterChef::Cloud::Ec2:0x1023e29e0 ...>>, "ssh"=>#<ClusterChef::Cloud::SecurityGroup:0x1023deef8 ...>}}>>}}>, @cluster=#<ClusterChef::Cluster:0x1023e3868 ...>, @FullName="PNTestCluster-knewdle-0">}>}, @fog_servers= <Fog::AWS::Compute::Servers
filters={}
[]

, @aws_instance_hash={}>, @FullName="PNTestCluster-web_proxy-0">
WARNING:
18/18 |**************************************************| 1:55
/Users/pn/gems/gems/excon-0.6.6/lib/excon/connection.rb:190:in `request': RequestLimitExceeded => Request limit exceeded. (Fog::Service::Error)
    from /Users/pn/gems/gems/chef-0.10.4/lib/chef/knife/core/bootstrap_context.rb:54:in `join'
    from /Users/pn/gems/gems/chef-0.10.4/lib/chef/knife/core/bootstrap_context.rb:54:in `to_proc'
    from /Users/pn/.chef/plugins/knife/knife_common.rb:109:in `each'
    from /Users/pn/.chef/plugins/knife/knife_common.rb:109:in `progressbar_for_threads'
    from /Users/pn/.chef/plugins/knife/cluster_launch.rb:99:in `run'
    from /Users/pn/gems/gems/chef-0.10.4/lib/chef/knife.rb:391:in `run_with_pretty_exceptions'
    from /Users/pn/gems/gems/chef-0.10.4/lib/chef/knife.rb:166:in `run'
    from /Users/pn/gems/gems/chef-0.10.4/lib/chef/application/knife.rb:128:in `run'
    from /Users/pn/gems/gems/chef-0.10.4/bin/knife:25
    from /Users/pn/gems/bin/knife:19:in `load'
    from /Users/pn/gems/bin/knife:19

How has the amount of data being sent to clients changed recently? What is the limit I'm running into?

Thanks,

-Peter
