cmu-db / ottertune Goto Github PK
View Code? Open in Web Editor NEWThe automatic DBMS configuration tool
License: Other
The automatic DBMS configuration tool
License: Other
ConversionUtil.get_human_readable essentially passes on the value we're converting and some system (list of tuples with numeric value for the suffix and said suffix) to hurry.filesize size function. The issue is in the conversion to the POSTGRES_TIME_SYSTEM in PostgresUtilImpl.
The POSTGRES_TIME_SYSTEM is currently in alphabetical order of the abbreviations - ['d', 'hr', 'min', 'ms', 's']. This poses an issue for the size function since it will convert the numeric value to the first unit that has a numeric value less than the one passed in.
i.e. - 55000 converted to time becomes "55000ms" rather than "55s".
The quick fix for this is to reorder the POSTGRES_TIME_SYSTEM into ['d', 'hr', 'min', 's', 'ms']. However, this causes another issue - the current implementation of ConversionUtil.get_raw_size tries to parse '1000ms' into '1000m' seconds and throws an ValueError.
This can be fixed with a try-except in get_raw_size in addition to the reordering.
from dbms.param import ConfigManager
can not find file dbms
hi, Can you provide a detailed deployment document?
It's best to use a database case to deploy and explain as you can, and tell us how to use it
thank you!!!
We need to set max values for some knobs in different hardwares. For example, the buffer memory cannot exceeds the total memory of the machine.
We can get the hardware information (e.g. memory, cpu, storage info) from here.
ottertune/server/website/website/models.py
Line 157 in 3022061
We select a hardware type when creating a session. Then we can get the hardware info like memory.
When tuning memory-related knobs, we do not want to exceeds the hardware memory. X_max values can be set here
The first step is to add KnobResourceType in the types file ottertune/server/website/website/types.py
1: Memory , 2: CPU, 3: Storage, 4:Other
Then add the resource attribute in the KnobCatalog model
I have selected some memory-related knobs for Postgres, you can add the info in the knob fixture ottertune/server/website/website/fixtures/postgres-96_knobs.json, and reload the data.
For now we can only support memory. The following knobs type should be set 1(i.e. Memory), the others can be set 4(Others) in the knob fixture.
global.work_mem
global.shared_buffers
global.temp_buffers
global.wal_buffers
global.maintenance_work_mem
global.effective_cache_size
ottertune/server/website/tests/test_parser.py
Line 454 in 03c4801
We need to update the test above because the create_knob_config function has been changed
Only show the selected target objective instead of all target objectives in the session view. e.g. when the user selects throughput as the target objective, then the latency should not be shown in the session view. Otherwise it gets broken because it does not have latency data.
When tuning knobs that are categorical variables (enums), OtterTune currently just replaces the enum with its index in the list of all possible enums. We need to add support to correctly handle categorical variables before running the machine learning algorithms by using an encoding scheme (e.g., dummy encoding, one-hot encoding).
Now we hardcode the objective function as throughput. See:
and
Also, we use the gradient descent to minimize -throughput.
We want to remove the hardcode part, and support other objective functions like latency (less is better), and use gradient descent to minimize these less-is-better objectives. Users can choose the objective function from the list (for now throughput and latency are enough). You may want to change the above files. The following files may also be changed.
hi, i read your code and found some file missing,for example, analysis.preprocessing
there exists
from .util import NEARZERO
from common.typeutil import is_numeric_matrix, is_lexical_matrix
but i can't find file util and typeutil
Use of LabelUtil's style_util method with a LabelStyle.CAPFIRST outputs labels in unicode. Other LabelStyles output standard Python 2.x strings.
To be consistent, all label styles should output Unicode.
The website is too slow when many results are uploaded. When uploading ~1000 results, the page response time is ~20 seconds. We need to speed up the website by optimizing Django code (e.g. use cache in Django). The response time can be reduced to less than 1 sec from my previous experience.
PR #88 and #131 add support for categorical variables when the variable can take at least 3 values.
There is no reason the same code cannot work for 2 values, except that dummy encoding a boolean just adds a redundant column.
The hacky way is to just take what's in the above PRs and turn off the check that the variable take at least 3 values. Unless that's actually fine to do, I plan to re-add support for binary categoricals that is both correct and doesn't require dummy encoding.
Add a target_objective.json file which has information about latency and its value. Upload it together with other four files (knobs.json, metrics_before/after, summary.json) to the tuner. Tuner needs to parse the file and gets the value of target objective.
ottertune/server/website/website/views.py
Line 384 in eeebeb9
You may also want to modify the parser .
Test your code locally to make sure it can tune latency successfully.
For now we show all the plot metrics on the left side, which is too long. We want to divide them into different pages like the right-side filtered results.
When I and Bohan tried to upload data using upload_data.py, we encountered following error.
ubuntu@ubuntu-xenial:/ottertune/server/website$ python script/controller_simulator/upload_data.py script/controller_simulator/generated_data 0987654321
Traceback (most recent call last):
File "script/controller_simulator/upload_data.py", line 64, in
main()
File "script/controller_simulator/upload_data.py", line 60, in main
upload(args.datadir, args.upload_code)
File "script/controller_simulator/upload_data.py", line 30, in upload
for wkld_dir in sorted(glob.glob(os.path.join(basedir, '*'))):
File "/usr/lib/python2.7/posixpath.py", line 70, in join
elif path == '' or path.endswith('/'):
AttributeError: 'list' object has no attribute 'endswith'
Hi ,
Is ottertune is ready for production use.
Celery and Django have the same log file website.log, it may cause some permission issues when restarting Celery worker or Django. For now we can grant permissions for /tmp/log/website.log* on ottertune machine when encountering such problem. It is not urgent, and I will change the Celery log file to celery.log when I am free.
Right now the automatic tuning component fails if there is no knob/metric data in its repository. To add support for tuning with an empty repository, OtterTune should start collecting some initial data so that it can train its ML models, then proceed as usual after it has collected enough training data.
When trying to run ottertune server it blows up with the following message:
File "/Users/edward/.virtualenvs/ottertune/lib/python2.7/site-packages/django/db/backends/mysql/base.py", line 28, in <module>
raise ImproperlyConfigured("Error loading MySQLdb module: %s" % e)
django.core.exceptions.ImproperlyConfigured: Error loading MySQLdb module: No module named MySQLdb
We may want to use an incremental approach where OtterTune dynamically increases the number of knobs used in a tuning session over time. Details can be found in our paper.
Hi all,
I'm trying to use ottertune to replicate the results obtained in "Automatic Database Management System Tuning Through Large-scale Machine Learning".
More specifically, I'm trying to run the TPC-C workload of OLTP. I've seen that the number of warehouses (scale) has been set to 200, but I can't find the number of terminals nor the weights for the various transaction types. Could you please share the xml configuration file you used?
I'd also like to know for how much time you let the workload run before evaluating the single trial, as I think that the default duration of 10 seconds is too short to obtain stable results. I'm orienting toward longer trials (~10 minutes) and discarding the first minutes to avoid warm up effects, what do you think?
Thanks in advance!
Below is a sample configuration provided by OLTP:
<?xml version="1.0"?>
<parameters>
<!-- Connection details -->
<dbtype>postgres</dbtype>
<driver>org.postgresql.Driver</driver>
<DBUrl>jdbc:postgresql://DB_IP:5432/</DBUrl>
<username>postgres</username>
<password>password</password>
<isolation>TRANSACTION_READ_COMMITTED</isolation>
<!-- Scale factor is the number of warehouses in TPCC -->
<scalefactor>2</scalefactor> <!-- instead of 2 -->
<!-- The workload -->
<terminals>2</terminals> <!--instead of 2 -->
<works>
<work>
<time>10</time> <!-- instead of 10 -->
<rate>10000</rate>
<ratelimited bench="tpcc">true</ratelimited>
<weights>45,43,4,4,4</weights>
</work>
</works>
<!-- TPCC specific -->
<transactiontypes>
<transactiontype>
<name>NewOrder</name>
</transactiontype>
<transactiontype>
<name>Payment</name>
</transactiontype>
<transactiontype>
<name>OrderStatus</name>
</transactiontype>
<transactiontype>
<name>Delivery</name>
</transactiontype>
<transactiontype>
<name>StockLevel</name>
</transactiontype>
</transactiontypes>
</parameters>
The workload model is kind of messy. (1) We do not distinguish the training data workloads and the tuning data workloads. (2) Also, OtterTune thinks two workloads are same if they have same names. If two users both name their workloads as 'workload-0', OtterTune will treat them as the same workload now. (3) When deleting a session, the corresponding workloads are not deleted.
When the controller is not specified a running time, it will continue to run until it receives a 'stop' signal.
The BaseParser class convert_integer and convert_real don't perform any checking to see if a value is negative.
I think it would be reasonable to expect knobs to have non-negative values. A ValueError should be raised if a PostgresParser is given a negative knob value as input.
I have not tested Python3 on Vagrant. Update Vagrant to support it. Remove supervisor in the fab file if necessary.
@dvanaken , it is so excited you make this project open source, according to the article https://aws.amazon.com/cn/blogs/ai/tuning-your-dbms-automatically-with-machine-learning/ the OtterTune project consists of several module like Tuning Manager, JDBC Manager and so on. It seems the current release is incomplete, we can't run a complete scenario with OtterTunne and the Website project.
You also mentioned you will make it as online tuning service, so I wondering what's your plan about this project, will fully open source it or just part of it. If you will open source a 'work' release, what about the schedule?
Thanks.
Convert_numeric_metrics is a method in the BaseParser class that iterates over the numeric_metric_catalog_ field.
There's a redundant conditional within the for loop that checks if the metrictype of a specified metric is MetricType.COUNTER. By definition, the numeric_metric_catalog_ contains only COUNTER metrics.
Method assumes that every numeric metric is found within the input metrics argument. This would raise a KeyError if not true. A fix would be to replace it with the 'get' method an check if the value of the metric is None.
Convert_numeric_metrics uses convert_integer rather than convert_real, then uses "float(converted) / observation_time". There is a possible loss of precision.
We have some hardcoded variables now. It's better to have a constant file and user can set all these variables in the constant file.
e.g. The number of samples is hardcoded.
pg_stat_archiver statistics collector didn't come to PostgreSQL till 9.4. As a result, collecting data fails.
We should add some way to check versions of the DBMS in the collector before looking for stat views that don't exist yet.
BaseParser leaves the methods convert_string and format_string unimplemented. PostgresParser does have knobs and metrics with VarType.STRING.
Postgres96Parser metrics have different scopes - database, table, index, global. Currently, the BaseParser machinery supports global and local scopes only, and raises Exceptions otherwise.
Postgres96Parser needs an implementation of parse_dbms_metrics that doesn't rely on the BaseParser's parse_dbms_variables method, or the BaseParser's parse_dbms_variables method needs to be able to deal with various common scopes, or condense them into one scope.
Vagrant has the capability to sync folders between the host machine and the VM.
This is potentially more secure than symlinks and portable to other hypervisors since Vagrant handles the configurations.
Probably helps deal with the long filenames and development process.
When the driver stops OLTPBench and controller, OLTPBench will output some statistics like latency, and controller will output some collected data (knobs.json, metrics_before/after.json, summary.json) . You may want to parse the OLTPBench output file and write it into a new file target_objective.json, which has the information of target objectives and their values (e.g. latency)
Then you will upload the target_objective.json together with four other json files collected from controller to the OtterTune server.
ottertune/client/driver/fabfile.py
Line 178 in eeebeb9
Fix the PR #147 first.
I'm interested in trying out OtterTune, and I'd love more detailed information about where to start! I can see that most questions about deployment were closed last November with notes that more detailed instructions would be coming soon, is there an estimate on when more detailed information might be available, or any pointers on where to begin in the meantime?
Hi,
Is there any document for the full installation ? is the website already included in the ottertune ?
Thanks
Move the parameters in the analysis part to the constant file. ottertune/server/website/website/settings/constants.py
Tune these parameters if you have free time.
Migrate to Python 3 ASAP to support GPFlow.
It seems that we do not have a page view to change the password for users. Can anyone help with it ?
As seen here: https://pythonclock.org/ , python2 is being retired in the next two years. It would be very handy to port it to python3 as soon as possible. One idea is to use 2to3
tool that allows for an automatic way of rewriting the source files of the project. It can be executed as a dry-run so that you can see the impact of the changes. The main impediment is that some libs like poster
are not ported to python3 (one of the motivations to my PR #49 , besides requests
being awesome) , but this one in particular looks like abandonware, imho.
Wdyt, @dvanaken ?
Add support for Postgres 9.3
Add fixture here
ottertune/server/website/website/fixtures/dbms_catalog.json
Add Postgres9.3 parser here
For now we only use random samples as starting points in GPR.
We may want to add the previous top K configs with the best performance as the starting points. K is a hyperparameter included in the constant file ottertune/server/website/website/settings/constants.py
The default K value is 20
OtterTune currently does not validate the values of the knobs when a user uploads new data to the website. It also does not validate them before creating the next configuration for the user to run.
We should perform the following checks for each knob using the information in the KnobCatalog:
This is not urgent. But I think we can have a better way to store the fixtures for different database versions. For example, Most of knobs and metrics in Postgres 9.2, 9.3, 9.4, 9.6 are same. We can only store these common knobs/metrics in a single Postgres fixture, and store the version-specific knobs/metrics separately. For now we have duplicated knobs/metrics in all Postgres versions.
ottertune/server/website/website/fixtures/
Run OLTPBench on the ottertune-dev, and write a python file to check its LOG periodically (like every 1 second). The OLTPBench will create some terminals, start the experiment, finish the experiment and kill the processes. Your python file should know when OLTPBench starts and finishes the experiment by checking its Log. Then send a signal to the controller to start and terminate. You may want to add the glue code in the driver code.
ottertune/client/driver/fabfile.py
Line 150 in 490a9d5
The loop script is almost done. However, OtterTune recommends some knobs out of the reasonable range sometimes. I am still looking into it and will fix it soon .
It doesn't look like anything else in the project depends on either the WorkloadState or WorkloadMapper classes defined here, and that there's a independent map_workload task that really does the work right now. Or am I missing something?
Currently MySQL, PostgreSQL, and Actian Vector are supported based on the whitepaper, can you theorize difficulty in application to TSDB tunings like OpenTSDB, InfluxDB, etc.?
I am very interested in ottertune, view the relevant documents, but I do not know how to use, can provide detailed installation using documentation. Thank you.
A declarative, efficient, and flexible JavaScript library for building user interfaces.
🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.
TypeScript is a superset of JavaScript that compiles to clean JavaScript output.
An Open Source Machine Learning Framework for Everyone
The Web framework for perfectionists with deadlines.
A PHP framework for web artisans
Bring data to life with SVG, Canvas and HTML. 📊📈🎉
JavaScript (JS) is a lightweight interpreted programming language with first-class functions.
Some thing interesting about web. New door for the world.
A server is a program made to process requests and deliver data to clients.
Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.
Some thing interesting about visualization, use data art
Some thing interesting about game, make everyone happy.
We are working to build community through open source technology. NB: members must have two-factor auth.
Open source projects and samples from Microsoft.
Google ❤️ Open Source for everyone.
Alibaba Open Source for everyone
Data-Driven Documents codes.
China tencent open source team.