rgw-autotier

Lua script and tools for configuring dynamic auto-tiering (automatic assignment) of objects into different Storage Classes (bucket data pools) via the Ceph RGW (S3 gateway).

Overview

The Ceph object storage gateway (CephRGW) integrates with Lua, which allows one to augment the gateway with additional features. Arguably one of the most important features one can add to Ceph object storage is the ability to assign objects to different pools of storage. This enables one to place objects optimally to boost performance, increase usable capacity, and apply organizational policies.

For example, one might have a rule that all .pdf files are stored in an erasure-coded 8k+3m layout, except those smaller than 64K, which should be stored in a replica-3 layout. Simple rules like this can have a profound effect on the usability of an object store and enable one to use multiple types of storage more effectively within a given storage cluster.

Storage Classes

To use the scripts effectively you'll need to set up a Ceph object storage configuration with at least one gateway and at least two data pools (e.g. default.rgw.buckets.data and default.rgw.buckets.glacier.data). We recommend a default 'STANDARD' data pool (default.rgw.buckets.data) based on a replica layout such as replica=3, plus one or more additional data pools based on an erasure-coding layout such as 4k+2m or 8k+3m. You might call this second storage class 'ARCHIVE' or 'GLACIER' (default.rgw.buckets.glacier.data). You'll also need at least one CephRGW instance set up within your cluster, but for best performance you should generally run a CephRGW instance on all cluster nodes. Last, when defining a new storage class, consider naming it after one of the names that AWS S3 uses (STANDARD | REDUCED_REDUNDANCY | STANDARD_IA | ONEZONE_IA | INTELLIGENT_TIERING | GLACIER | DEEP_ARCHIVE | GLACIER_IR), because some S3 clients (e.g. Veritas NetBackup) don't support storage class names other than those defined by AWS.

Setup

To enable auto-tiering you'll first need to create a rules file at /etc/ceph/rgw_autotier.prop. This file must be copied to every system in your Ceph cluster that runs a CephRGW instance.

Auto-Tiering Rules Format

Name matching with a PATTERN and size matching using an OPERATOR with a CAPACITY may be used together or separately. For any field you are not using, put an asterisk '*' to denote an any/all match.

Auto-tiering rules are written one per line in one of the following formats, with a semicolon (;) as the delimiter.

STORAGECLASS;PATTERN
STORAGECLASS;PATTERN;OPERATOR;CAPACITY
STORAGECLASS;PATTERN;OPERATOR;CAPACITY;BUCKET
STORAGECLASS;PATTERN;OPERATOR;CAPACITY;BUCKET;TENANT

STORAGECLASS

STORAGECLASS must be a valid storage class associated with a Ceph bucket data pool. If the assigned STORAGECLASS is not valid/defined, the object PUT request will be rejected.

PATTERN

PATTERN is matched against the object name. The pattern must be compatible with Lua's pattern syntax (see Lua string.find()), which is not the same as POSIX or PCRE regular expressions, and must not contain semicolons (;) because the semicolon is the rules line delimiter.
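In Lua patterns, characters such as '.' and '-' are magic, so a rule pattern of .pdf matches any character followed by "pdf" unless the dot is escaped as %.pdf (the form that appears in the log output in the Debugging section). The helper below is a minimal offline sketch, written in Python and not taken from rgw_autotier.lua, for escaping a literal string before pasting it into a rule. The function name is hypothetical, and this README does not say whether the script does its own escaping, so check that before escaping by hand.

```python
# Characters that have special meaning in Lua patterns.
LUA_MAGIC = "^$()%.[]*+-?"


def escape_lua_pattern(literal: str) -> str:
    """Prefix each Lua magic character with '%' so it matches literally."""
    return "".join("%" + ch if ch in LUA_MAGIC else ch for ch in literal)


print(escape_lua_pattern(".pdf"))        # %.pdf
print(escape_lua_pattern("a-b.tar.gz"))  # a%-b%.tar%.gz
```

Do not run the helper on a bare '*', since the rules format reserves that for "match anything".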

OPERATOR

The OPERATOR field only takes effect when a valid CAPACITY is specified. OPERATOR can be greater-than '>', less-than '<', equals '=', or '*' to indicate any object size. The example rules and log output in this document also use '<=' and '>='.

CAPACITY

CAPACITY is the size in bytes that the OPERATOR compares against Request.ContentLength. For example, if CAPACITY is 65536 and OPERATOR is <, then only PUT requests for objects with a Request.ContentLength of less than 65536 bytes will match.
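The OPERATOR/CAPACITY comparison can be sketched as follows. This is an illustrative Python model of the behavior described above, not the Lua script's actual code; the function name is hypothetical, and treating a '*' in either field as "any size" is an assumption based on the rules format.

```python
import operator

# '<', '>' and '=' are the documented operators; '<=' and '>=' are included
# because they appear in this README's example rules and log output.
COMPARATORS = {
    "<": operator.lt,
    ">": operator.gt,
    "=": operator.eq,
    "<=": operator.le,
    ">=": operator.ge,
}


def size_matches(op: str, capacity: str, content_length: int) -> bool:
    """Return True when content_length satisfies `op capacity`."""
    if op == "*" or capacity == "*":
        return True  # assumption: '*' disables the size check
    return COMPARATORS[op](content_length, int(capacity))


print(size_matches("<", "65536", 4096))   # True
print(size_matches("<", "65536", 65536))  # False
print(size_matches("*", "*", 10**9))      # True
```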

BUCKET

The BUCKET field is an exact match against the bucket name (Request.Bucket.Name). Leave it blank or use the '*' character to indicate any bucket.

TENANT

The TENANT field is an exact match against the tenant name (Request.Bucket.Tenant). Leave it blank or use the '*' character to indicate any tenant.
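To check a rules file before copying it to every gateway node, the line format above can be parsed with a few lines of Python. This is a sketch under stated assumptions rather than the parser used by rgw_autotier.lua: the Rule type and parse_rule name are hypothetical, lines starting with '#' are assumed to be comments (as in the example file in this README), and only the four documented field counts (2, 4, 5, 6) are accepted.

```python
from typing import NamedTuple, Optional


class Rule(NamedTuple):
    storage_class: str
    pattern: str
    operator: str = "*"
    capacity: str = "*"
    bucket: str = "*"
    tenant: str = "*"


def parse_rule(line: str) -> Optional[Rule]:
    """Parse one rules-file line; return None for blank lines and comments."""
    line = line.strip()
    if not line or line.startswith("#"):
        return None
    fields = [field.strip() for field in line.split(";")]
    if len(fields) not in (2, 4, 5, 6):
        raise ValueError(f"expected 2, 4, 5 or 6 fields, got {len(fields)}: {line!r}")
    # A blank BUCKET or TENANT means "any", the same as '*'.
    for index in (4, 5):
        if index < len(fields) and fields[index] == "":
            fields[index] = "*"
    return Rule(*fields)


for text in ("STANDARD_IA;.eml", "STANDARD;.pdf;>;1048576", "STANDARD_IA;*;<=;65536;bucket123"):
    print(parse_rule(text))
```

A malformed line raises ValueError, which makes it easy to reject a bad file before it reaches the cluster.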

Example Rules Configuration File (/etc/ceph/rgw_autotier.prop)

# put all objects less than 32K into INTELLIGENT_TIERING regardless of object name
INTELLIGENT_TIERING;*;<;32768
# put all .eml objects into STANDARD_IA regardless of size
STANDARD_IA;.eml
STANDARD_IA;.eml;*;*
# put all .pdf objects greater than 1MiB into the STANDARD storage class
STANDARD;.pdf;>;1048576
# put all .iso images less than 1GiB into the REDUCED_REDUNDANCY storage class
REDUCED_REDUNDANCY;.iso;<;1073741824
# put all .xlsx objects less than 64K into STANDARD_IA
STANDARD_IA;.xlsx;<;65536
# put all objects of 64K or less being written to bucket bucket123 into STANDARD_IA
STANDARD_IA;*;<=;65536;bucket123
# put all objects of 64K or less being written to tenant abcdef into STANDARD_IA
STANDARD_IA;*;<=;65536;*;abcdef

Installing

Once you have your rules file installed at /etc/ceph/rgw_autotier.prop, run the following command to install the Lua script into the cluster. The script is applied immediately to all CephRGW instances, which do not need to be restarted. You can also update the rules file at any time without restarting the CephRGW instances. Note, though, that any change to the rgw_autotier.prop file must be propagated to all the nodes with CephRGW instances.

radosgw-admin script put --infile=rgw_autotier.lua --context=preRequest

Un-installing

This will remove the script from all the CephRGW instances.

radosgw-admin script rm --context preRequest

Debugging

Edit the ceph.conf file and in the RGW section(s) add this line, then restart the CephRGW instance(s).

        debug rgw = 20

To see which objects are being tagged with a Storage Class per the rules configuration, monitor the radosgw log file with something like the following command. The log can be noisy, hence the "grep Lua" filter is helpful.

tail -f /var/log/radosgw/client.radosgw.*.log | grep Lua

This will show filtered log output like so with just log entries from the Lua auto-tiering script:

2024-03-20T06:47:55.945+0000 7f6afc6f8700 20 Lua INFO:   Object hello.pdf matched: storageClass 'GLACIER' patternMatch '%.pdf' operator '*' capacityThreshold '0' bucketMatch '' tenantMatch ''
2024-03-20T06:48:10.301+0000 7f6ab0660700 20 Lua INFO:   Object world.pptx matched: storageClass 'GLACIER' patternMatch '*' operator '>=' capacityThreshold '10485760' bucketMatch '' tenantMatch ''
2024-03-20T06:48:24.785+0000 7f6ab1662700 20 Lua INFO:   Object testing.pdf matched: storageClass 'GLACIER' patternMatch '%.pdf' operator '*' capacityThreshold '0' bucketMatch '' tenantMatch ''
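To get a quick summary instead of reading individual entries, the filtered lines can be tallied per storage class. The sketch below is a hypothetical helper, not part of this repository; its regular expression assumes the "Lua INFO: Object ... matched: storageClass '...'" layout shown in the sample output above and would need adjusting if the script's log format changes.

```python
import re
from collections import Counter

# Matches the object name and storage class in the log format shown above.
MATCH_RE = re.compile(
    r"Lua INFO:\s+Object (?P<name>.+?) matched: storageClass '(?P<cls>[^']*)'"
)


def tally_storage_classes(lines):
    """Count matched objects per storage class in an iterable of log lines."""
    counts = Counter()
    for line in lines:
        found = MATCH_RE.search(line)
        if found:
            counts[found.group("cls")] += 1
    return counts


sample = [
    "2024-03-20T06:47:55.945+0000 7f6afc6f8700 20 Lua INFO:   Object hello.pdf matched: "
    "storageClass 'GLACIER' patternMatch '%.pdf' operator '*' capacityThreshold '0' "
    "bucketMatch '' tenantMatch ''",
    "2024-03-20T06:48:10.301+0000 7f6ab0660700 20 Lua INFO:   Object world.pptx matched: "
    "storageClass 'GLACIER' patternMatch '*' operator '>=' capacityThreshold '10485760' "
    "bucketMatch '' tenantMatch ''",
    "an unrelated log line",
]
print(dict(tally_storage_classes(sample)))  # {'GLACIER': 2}
```

The function accepts any iterable of lines, so sys.stdin can be passed in when the script is fed from the tail and grep pipeline.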
