
ottertune's People

Contributors

apavlo, arifiorino, bohanjason, dependabot[bot], dvanaken, yangdsh


ottertune's Issues

ConversionUtil

ConversionUtil.get_human_readable essentially passes the value we're converting, along with a system (a list of tuples pairing each suffix with its numeric factor), to hurry.filesize's size function. The issue is in the conversion to the POSTGRES_TIME_SYSTEM in PostgresUtilImpl.

The POSTGRES_TIME_SYSTEM is currently in alphabetical order of the abbreviations - ['d', 'hr', 'min', 'ms', 's']. This poses an issue for the size function since it converts the numeric value to the first unit whose factor is less than the value passed in.
e.g., 55000 converted to a time becomes "55000ms" rather than "55s".

The quick fix is to reorder POSTGRES_TIME_SYSTEM into ['d', 'hr', 'min', 's', 'ms']. However, this causes another issue: the current implementation of ConversionUtil.get_raw_size tries to parse '1000ms' as '1000m' seconds and throws a ValueError.

This can be fixed with a try-except in get_raw_size in addition to the reordering.
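A minimal sketch of the reordering plus the defensive parse (the millisecond factors and the get_raw_size signature below are assumptions for illustration, not the actual OtterTune code):

# Hypothetical sketch: units ordered largest-to-smallest, factors in milliseconds.
POSTGRES_TIME_SYSTEM = [
    (1000 * 60 * 60 * 24, 'd'),
    (1000 * 60 * 60, 'hr'),
    (1000 * 60, 'min'),
    (1000, 's'),
    (1, 'ms'),
]

def get_raw_size(value, system=POSTGRES_TIME_SYSTEM):
    """Convert a human-readable string like '55s' back to a raw number."""
    for factor, suffix in system:
        if not value.endswith(suffix):
            continue
        try:
            return float(value[:-len(suffix)]) * factor
        except ValueError:
            # e.g. '1000ms' first matches the 's' suffix and leaves '1000m';
            # keep trying the remaining (smaller) suffixes instead of failing.
            continue
    raise ValueError('Invalid value: {}'.format(value))

With this ordering, '1000ms' falls through the 's' match and is parsed correctly by the 'ms' entry.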

deploy?

Hi, can you provide a detailed deployment document?
It would be best to walk through a deployment with an example database, explain it as much as you can, and tell us how to use it.
Thank you!

Max Value

We need to set max values for some knobs on different hardware. For example, the buffer memory cannot exceed the total memory of the machine.

We can get the hardware information (e.g. memory, cpu, storage info) from here.

class Hardware(BaseModel):

We select a hardware type when creating a session. Then we can get the hardware info, like memory.
When tuning memory-related knobs, we do not want to exceed the hardware memory. X_max values can be set here:

model.fit(X_scaled, y_scaled, X_min, X_max, ridge)

The first step is to add KnobResourceType in the types file ottertune/server/website/website/types.py
1: Memory, 2: CPU, 3: Storage, 4: Other
Then add the resource attribute in the KnobCatalog model

class KnobCatalog(BaseModel):
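A hypothetical sketch of both steps (this is not the exact OtterTune source; the constant names, field name, and default are assumptions):

# in website/types.py
class KnobResourceType(object):
    MEMORY = 1
    CPU = 2
    STORAGE = 3
    OTHER = 4

# in website/models.py (assumes django.db.models is already imported there)
class KnobCatalog(BaseModel):
    # ... existing fields ...
    resource = models.IntegerField(default=KnobResourceType.OTHER)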

I have selected some memory-related knobs for Postgres; you can add the info in the knob fixture ottertune/server/website/website/fixtures/postgres-96_knobs.json and reload the data.

For now we only support memory. The following knobs should have their resource type set to 1 (i.e., Memory); the others can be set to 4 (Other) in the knob fixture.

global.work_mem
global.shared_buffers
global.temp_buffers
global.wal_buffers
global.maintenance_work_mem
global.effective_cache_size

Fix target objective

Only show the selected target objective instead of all target objectives in the session view. E.g., when the user selects throughput as the target objective, latency should not be shown in the session view; otherwise the page breaks because it has no latency data.

Encode categorical variables

When tuning knobs that are categorical variables (enums), OtterTune currently just replaces the enum with its index in the list of all possible enums. We need to add support to correctly handle categorical variables before running the machine learning algorithms by using an encoding scheme (e.g., dummy encoding, one-hot encoding).
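A minimal sketch of one-hot (dummy) encoding for an enum knob, using only numpy. The helper name and column layout are illustrative, not OtterTune's actual schema; the example categories are Postgres wal_sync_method values:

import numpy as np

def one_hot_encode(values, categories):
    """Map each categorical value to a 0/1 indicator vector (one column per category)."""
    index = {c: i for i, c in enumerate(categories)}
    encoded = np.zeros((len(values), len(categories)))
    for row, value in enumerate(values):
        encoded[row, index[value]] = 1.0
    return encoded

# e.g. one_hot_encode(['fsync', 'open_sync'], ['fsync', 'fdatasync', 'open_sync'])
# -> [[1., 0., 0.], [0., 0., 1.]]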

Support other Objective Functions

Now we hardcode the objective function as throughput. See:

if self.transactions_counter not in metric_data:

and

def transactions_counter(self):

Also, we use gradient descent to minimize -throughput.

# FIXME: Maximize the throughput, hardcode

We want to remove the hardcoded part and support other objective functions like latency (less is better), using gradient descent to minimize these less-is-better objectives directly. Users can choose the objective function from a list (for now, throughput and latency are enough). You may want to change the above files. The following files may also need to be changed.

class MetricManager(models.Manager):

<td><div class="text-right">{{ labels.target_objective }}</div></td>
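One hedged way to remove the hardcoded sign flip is to keep a small table of which objectives are less-is-better; the set name and function below are assumptions, not the project's actual API:

LESS_IS_BETTER_OBJECTIVES = {'latency'}

def to_minimization_target(objective_name, value):
    """Gradient descent always minimizes, so flip the sign only for more-is-better metrics."""
    if objective_name in LESS_IS_BETTER_OBJECTIVES:
        return value   # e.g. latency: minimize directly
    return -value      # e.g. throughput: minimize the negative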

Some files are missing from the code

Hi, I read your code and found some files missing, for example, analysis.preprocessing.

There exists:

from .util import NEARZERO
from common.typeutil import is_numeric_matrix, is_lexical_matrix

but I can't find the files util and typeutil.

LabelUtil outputs have inconsistent encodings

Using LabelUtil's style_util method with LabelStyle.CAPFIRST outputs labels in Unicode. Other LabelStyles output standard Python 2.x strings.

To be consistent, all label styles should output Unicode.

speed up the website

The website is too slow when many results are uploaded. When uploading ~1000 results, the page response time is ~20 seconds. We need to speed up the website by optimizing the Django code (e.g., using Django's cache). From my previous experience, the response time can be reduced to less than 1 second.
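As one illustrative option (the view name is hypothetical, not OtterTune's actual code), Django's per-view cache avoids recomputing an expensive page on every request:

from django.http import HttpResponse
from django.views.decorators.cache import cache_page

@cache_page(60)  # cache the rendered response for 60 seconds
def result_list(request):
    # the expensive query/rendering runs at most once per minute per URL
    return HttpResponse('...')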

support boolean/binary categoricals

PRs #88 and #131 add support for categorical variables when the variable can take at least 3 values.

There is no reason the same code cannot work for 2 values, except that dummy encoding a boolean just adds a redundant column.

The hacky way is to just take what's in the above PRs and turn off the check that the variable takes at least 3 values. Unless that's actually fine to do, I plan to re-add support for binary categoricals in a way that is both correct and doesn't require dummy encoding.
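A small sketch of the non-redundant encoding for the two-value case (the helper name is illustrative):

import numpy as np

def encode_binary(values, categories):
    """A binary categorical needs one 0/1 column, not two dummy columns."""
    assert len(categories) == 2
    return np.array([[float(v == categories[1])] for v in values])

# e.g. encode_binary(['off', 'on', 'off'], ['off', 'on']) -> [[0.], [1.], [0.]]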

User-Defined Function - Server side

Add a target_objective.json file which has information about latency and its value. Upload it together with the other four files (knobs.json, metrics_before/after, summary.json) to the tuner. The tuner needs to parse the file and get the value of the target objective.

def handle_result_files(session, files):

You may also want to modify the parser.

def latency_timer(self):

Test your code locally to make sure it can tune latency successfully.
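A hedged sketch of the parsing step on the server; the JSON layout ({"target_objective": ..., "value": ...}) and the helper name are assumptions, not a spec:

import json

def parse_target_objective(files):
    """Pull the target objective name and value out of the uploaded file blob."""
    data = json.loads(files['target_objective'])
    return data['target_objective'], float(data['value'])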

upload_data.py not working

When Bohan and I tried to upload data using upload_data.py, we encountered the following error.

ubuntu@ubuntu-xenial:/ottertune/server/website$ python script/controller_simulator/upload_data.py script/controller_simulator/generated_data 0987654321
Traceback (most recent call last):
File "script/controller_simulator/upload_data.py", line 64, in
main()
File "script/controller_simulator/upload_data.py", line 60, in main
upload(args.datadir, args.upload_code)
File "script/controller_simulator/upload_data.py", line 30, in upload
for wkld_dir in sorted(glob.glob(os.path.join(basedir, '*'))):
File "/usr/lib/python2.7/posixpath.py", line 70, in join
elif path == '' or path.endswith('/'):
AttributeError: 'list' object has no attribute 'endswith'

Change Celery log path

Celery and Django share the same log file, website.log, which may cause permission issues when restarting the Celery worker or Django. For now we can grant permissions for /tmp/log/website.log* on the ottertune machine when encountering such problems. It is not urgent, and I will change the Celery log file to celery.log when I am free.
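One possible fix, for illustration only (the app name "website" and the log directory are assumptions), is to start the worker with its own log file:

celery -A website worker --loglevel=INFO --logfile=/tmp/log/celery.log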

Invalid target objective in tuning session crashes session page

During the creation or editing of a session, if a non-existent target objective is entered, as below:

(screenshot omitted: session form with an invalid target objective, 2018-02-17)

then the session page crashes. Also, if the user doesn't specify a target objective when tuning session is true, the table page gets a bit messed up.

Support tuning without previous data

Right now the automatic tuning component fails if there is no knob/metric data in its repository. To add support for tuning with an empty repository, OtterTune should start collecting some initial data so that it can train its ML models, then proceed as usual after it has collected enough training data.
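One hedged way to bootstrap an empty repository is to generate the first few configurations by sampling each tunable knob uniformly within its allowed range; the field names (name, minval, maxval) mirror the KnobCatalog description elsewhere in these issues but are assumptions here:

import random

def random_initial_configs(knobs, num_configs=10):
    """Sample each numeric knob uniformly within [minval, maxval]."""
    configs = []
    for _ in range(num_configs):
        configs.append({knob.name: random.uniform(float(knob.minval), float(knob.maxval))
                        for knob in knobs})
    return configs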

Ottertune doesn't start due to a missing lib

When trying to run the ottertune server, it blows up with the following message:

  File "/Users/edward/.virtualenvs/ottertune/lib/python2.7/site-packages/django/db/backends/mysql/base.py", line 28, in <module>
    raise ImproperlyConfigured("Error loading MySQLdb module: %s" % e)
django.core.exceptions.ImproperlyConfigured: Error loading MySQLdb module: No module named MySQLdb

Incremental Knobs

We may want to use an incremental approach where OtterTune dynamically increases the number of knobs used in a tuning session over time. Details can be found in our paper.

Settings to replicate paper

Hi all,
I'm trying to use ottertune to replicate the results obtained in "Automatic Database Management System Tuning Through Large-scale Machine Learning".
More specifically, I'm trying to run the TPC-C workload of OLTPBench. I've seen that the number of warehouses (scale) has been set to 200, but I can't find the number of terminals or the weights for the various transaction types. Could you please share the XML configuration file you used?

I'd also like to know how long you let the workload run before evaluating a single trial, as I think the default duration of 10 seconds is too short to obtain stable results. I'm leaning toward longer trials (~10 minutes) and discarding the first few minutes to avoid warm-up effects; what do you think?

Thanks in advance!

Below is a sample configuration provided by OLTPBench:

<?xml version="1.0"?>
<parameters>

        <!-- Connection details -->
        <dbtype>postgres</dbtype>
        <driver>org.postgresql.Driver</driver>
        <DBUrl>jdbc:postgresql://DB_IP:5432/</DBUrl>
        <username>postgres</username>
        <password>password</password>
        <isolation>TRANSACTION_READ_COMMITTED</isolation>

        <!-- Scale factor is the number of warehouses in TPCC -->
        <scalefactor>2</scalefactor> <!-- instead of 2 -->

        <!-- The workload -->
        <terminals>2</terminals> <!--instead of 2 -->
        <works>
                <work>
                        <time>10</time> <!-- instead of 10 -->
                        <rate>10000</rate>
                        <ratelimited bench="tpcc">true</ratelimited>
                        <weights>45,43,4,4,4</weights>
                </work>
        </works>

        <!-- TPCC specific -->  
        <transactiontypes>
                <transactiontype>
                        <name>NewOrder</name>
                </transactiontype>
                <transactiontype>
                        <name>Payment</name>
                </transactiontype>
                <transactiontype>
                        <name>OrderStatus</name>
                </transactiontype>
                <transactiontype>
                        <name>Delivery</name>
                </transactiontype>
                <transactiontype>
                        <name>StockLevel</name>
                </transactiontype>
        </transactiontypes>
</parameters>

Workload Issue

The workload model is kind of messy. (1) We do not distinguish between training-data workloads and tuning-data workloads. (2) OtterTune considers two workloads the same if they have the same name, so if two users both name their workloads 'workload-0', OtterTune will treat them as the same workload. (3) When deleting a session, the corresponding workloads are not deleted.

Controller Signal

When the controller is not given a running time, it will continue to run until it receives a 'stop' signal.

update Vagrant

I have not tested Python3 on Vagrant. Update Vagrant to support it. Remove supervisor in the fab file if necessary.

What's your plan about open source this project

@dvanaken, it is very exciting that you have made this project open source. According to the article https://aws.amazon.com/cn/blogs/ai/tuning-your-dbms-automatically-with-machine-learning/, the OtterTune project consists of several modules like the Tuning Manager, the JDBC Manager, and so on. It seems the current release is incomplete; we can't run a complete scenario with OtterTune and the Website project.
You also mentioned you will offer it as an online tuning service, so I'm wondering what your plan for this project is: will you fully open source it, or just part of it? If you will open source a 'working' release, what is the schedule?

Thanks.

convert_numeric_metrics in BaseParser

Convert_numeric_metrics is a method in the BaseParser class that iterates over the numeric_metric_catalog_ field.

  1. There's a redundant conditional within the for loop that checks if the metrictype of a specified metric is MetricType.COUNTER. By definition, the numeric_metric_catalog_ contains only COUNTER metrics.

  2. The method assumes that every numeric metric is found within the input metrics argument. This would raise a KeyError if not true. A fix would be to replace it with the 'get' method and check whether the value of the metric is None.

  3. Convert_numeric_metrics uses convert_integer rather than convert_real, then uses "float(converted) / observation_time". There is a possible loss of precision.
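A hedged sketch of the three fixes together; the method and helper signatures follow the description above, not the exact BaseParser source:

def convert_numeric_metrics(self, metrics, observation_time):
    converted = {}
    for name, metadata in self.numeric_metric_catalog_.items():
        value = metrics.get(name)  # fix 2: tolerate metrics missing from the input
        if value is None:
            continue
        # fix 3: convert_real avoids the integer truncation / precision loss
        real_value = self.convert_real(value, metadata)
        # fix 1: no COUNTER check needed; numeric_metric_catalog_ holds only COUNTERs
        converted[name] = real_value / observation_time
    return converted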

Collector doesn't support Postgres9.3

The pg_stat_archiver statistics view didn't come to PostgreSQL until 9.4. As a result, collecting data fails on 9.3.

We should add some way to check versions of the DBMS in the collector before looking for stat views that don't exist yet.

BaseParser parse_dbms_metrics supports global and local scopes only

Postgres96Parser metrics have different scopes - database, table, index, global. Currently, the BaseParser machinery supports global and local scopes only, and raises Exceptions otherwise.

Postgres96Parser needs an implementation of parse_dbms_metrics that doesn't rely on the BaseParser's parse_dbms_variables method, or the BaseParser's parse_dbms_variables method needs to be able to deal with various common scopes, or condense them into one scope.

Vagrant File Synced Folder

Vagrant has the capability to sync folders between the host machine and the VM.

This is potentially more secure than symlinks and portable to other hypervisors since Vagrant handles the configurations.

It would probably also help with the long-filename issues and the development process.

User-Defined Function - Client side

When the driver stops OLTPBench and the controller, OLTPBench will output some statistics like latency, and the controller will output the collected data (knobs.json, metrics_before/after.json, summary.json). You may want to parse the OLTPBench output file and write it into a new file, target_objective.json, which holds the target objectives and their values (e.g., latency).

Then upload target_objective.json together with the four other JSON files collected from the controller to the OtterTune server.
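An illustrative sketch of the client-side step; the JSON layout and the way the latency value is obtained from the OLTPBench output are assumptions, not a spec:

import json
import os

def write_target_objective(avg_latency_ms, out_dir):
    """Write target_objective.json next to the other controller output files."""
    payload = {'target_objective': 'latency', 'value': avg_latency_ms}
    with open(os.path.join(out_dir, 'target_objective.json'), 'w') as f:
        json.dump(payload, f)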

upload_result()

Fix the PR #147 first.

Deployment Instructions?

I'm interested in trying out OtterTune, and I'd love more detailed information about where to start! I can see that most questions about deployment were closed last November with notes that more detailed instructions would be coming soon. Is there an estimate of when more detailed information might be available, or any pointers on where to begin in the meantime?

How to install ottertune?

Hi,
Is there any document for the full installation? Is the website already included in ottertune?
Thanks

move parameters to the constant file

Move the parameters in the analysis part to the constants file: ottertune/server/website/website/settings/constants.py

Tune these parameters if you have free time.

Change Password

It seems that we do not have a page view for users to change their password. Can anyone help with it?

Port Ottertune to Python3

As seen here: https://pythonclock.org/, Python 2 is being retired in the next two years. It would be very handy to port the project to Python 3 as soon as possible. One idea is to use the 2to3 tool, which automatically rewrites the project's source files. It can be executed as a dry run so that you can see the impact of the changes. The main impediment is that some libs like poster are not ported to Python 3 (one of the motivations for my PR #49, besides requests being awesome), but that one in particular looks like abandonware, imho.

Wdyt, @dvanaken ?

Starting Points

For now we only use random samples as starting points in GPR.

X_samples = np.empty((num_samples, X_scaled.shape[1]))

We may want to add the previous top-K configs with the best performance as starting points. K is a hyperparameter included in the constants file ottertune/server/website/website/settings/constants.py; the default K value is 20.
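A hedged sketch of how the top-K configs could be stacked onto the random samples; the constant name TOP_NUM_CONFIG and the smaller-is-better assumption on y_scaled are illustrative:

import numpy as np

TOP_NUM_CONFIG = 20  # default K, to live in settings/constants.py

def add_top_configs(X_samples, X_scaled, y_scaled):
    """Append the K best previously-seen configs to the random starting points."""
    # assumes y_scaled is already "smaller is better" after any sign flip
    top_idx = np.argsort(y_scaled.ravel())[:TOP_NUM_CONFIG]
    return np.vstack([X_samples, X_scaled[top_idx]])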

Validate knob inputs and outputs

OtterTune currently does not validate the values of the knobs when a user uploads new data to the website. It also does not validate them before creating the next configuration for the user to run.

We should perform the following checks for each knob using the information in the KnobCatalog:

  • If it's an integer or real, check that the value is between minval and maxval
  • If it's an enum, check that the value is in the enumvals list
  • If it's a bool, check that the value is one of the formats supported by the DBMS (e.g., true, false, 0, 1)
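A minimal sketch of these checks; the KnobCatalog field names (vartype, minval, maxval, enumvals) are assumptions about the schema:

def validate_knob_value(knob, value):
    """Return True if the value is legal according to the KnobCatalog entry."""
    if knob.vartype in ('integer', 'real'):
        return float(knob.minval) <= float(value) <= float(knob.maxval)
    if knob.vartype == 'enum':
        return str(value) in knob.enumvals.split(',')
    if knob.vartype == 'bool':
        return str(value).lower() in ('true', 'false', 'on', 'off', '0', '1')
    return True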

Fixtures for different DBMS versions

This is not urgent, but I think we can store the fixtures for different database versions in a better way. For example, most of the knobs and metrics in Postgres 9.2, 9.3, 9.4, and 9.6 are the same. We could store these common knobs/metrics in a single Postgres fixture and store the version-specific knobs/metrics separately. For now we have duplicated knobs/metrics across all Postgres versions.

ottertune/server/website/website/fixtures/

OLTPBench Log

Run OLTPBench on ottertune-dev and write a Python script that checks its log periodically (e.g., every second). OLTPBench will create some terminals, start the experiment, finish the experiment, and kill the processes. Your Python script should detect when OLTPBench starts and finishes the experiment by checking its log, then send a signal to the controller to start and terminate. You may want to add this glue code to the driver code.

# run oltpbench as a background job
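An illustrative sketch of the log-polling glue; the log path, marker strings, and how the controller is signaled are all assumptions:

import time

def wait_for_marker(log_path, marker, poll_interval=1.0):
    """Block until `marker` appears in the OLTPBench log."""
    position = 0
    while True:
        try:
            with open(log_path) as f:
                f.seek(position)
                chunk = f.read()
                position = f.tell()
        except IOError:
            chunk = ''  # the log file may not exist yet
        if marker in chunk:
            return
        time.sleep(poll_interval)

# e.g. wait for a start-of-benchmark marker, signal the controller to start,
# then wait for a completion marker and signal it to stop.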

Negative Value for cache size?

The loop script is almost done. However, OtterTune sometimes recommends knob values outside the reasonable range. I am still looking into it and will fix it soon.

analysis/workload_mapping.py -- is this still relevant?

It doesn't look like anything else in the project depends on either the WorkloadState or WorkloadMapper classes defined here, and there's an independent map_workload task that really does the work right now. Or am I missing something?

How is ottertune deployed?

I am very interested in ottertune and have viewed the relevant documents, but I do not know how to use it. Could you provide detailed installation and usage documentation? Thank you.
