
clusterdata's Introduction

Overview

The trace data, ClusterData201708, describes a production cluster over a 12-hour period (see the note below). The cluster has about 1.3k machines that run both online services and batch jobs.

The data is provided to address the challenges Alibaba faces in Internet data centers (IDCs) where online services and batch jobs are co-allocated. We distill the challenges into the following topics:

  1. Workload characterization: How can we characterize Alibaba workloads so that various production workloads can be simulated in a representative way for scheduler studies?
  2. New algorithms to assign workloads to machines and to CPU cores: How can we assign and re-adjust workloads across machines and CPUs for better resource utilization and acceptable resource contention?
  3. Cooperation between online service and batch job schedulers: How can we adjust resource allocation between online services and batch jobs to improve batch job throughput while maintaining acceptable service quality and fast failure recovery for online services?

Please let us know if you have any issues, ideas, or papers about this data by sending email to alibaba-clusterdata. The more specific the feedback, the more likely we are to be able to help you.

Note on the 12-hour period: although the data for servers and batch jobs spans about 24 hours, the data for containers is restricted to 12 hours. We will release another version in the near future.

Trace data

The format of trace data is described in the schema description, and defined in the specification file schema.csv in the repository.

Downloading the trace

The data is stored in Alibaba Cloud Object Storage Service. You do not need to have an Alibaba account or sign up for Object Storage Service to download the data.

Download information can be found (after a short survey) at this link. We use the contact information to keep in touch with you and to announce goodies such as new traces. The trace includes a SHA256SUM file, which can be used to verify the integrity of a download with the sha256sum command from GNU coreutils, using a command like

sha256sum --check SHA256SUM
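
On systems without GNU coreutils, the same check can be approximated in Python. The following is a minimal sketch, not part of the trace distribution: it assumes the SHA256SUM file uses the usual coreutils layout (one hex digest, whitespace, then a file name per line) and that the listed files sit next to the manifest. The function names `sha256_of` and `check_sha256sum` are hypothetical.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Return the hex SHA-256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def check_sha256sum(manifest="SHA256SUM"):
    """Compare each file listed in the manifest with its recorded digest.

    Assumes each non-empty line looks like "<hex digest>  <file name>".
    Returns a dict mapping file name to True (match) or False
    (mismatch or missing file).
    """
    manifest_path = Path(manifest)
    results = {}
    for line in manifest_path.read_text().splitlines():
        if not line.strip():
            continue  # skip blank lines
        expected, name = line.split(maxsplit=1)
        if name.startswith("*"):
            name = name[1:]  # a leading '*' marks binary mode in coreutils output
        target = manifest_path.parent / name
        results[name] = target.is_file() and sha256_of(target) == expected.lower()
    return results
```

Calling `check_sha256sum("SHA256SUM")` from the download directory returns one entry per listed file; any `False` value suggests the file is missing or should be downloaded again.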

clusterdata's People

Contributors

furykerry, haiyangding, lioncruise
