gurobi-logtools's Introduction

gurobi-logtools

Extract information from Gurobi log files and generate pandas DataFrames or Excel worksheets for further processing. Also includes a wrapper for out-of-the-box interactive visualizations using the plotting library Plotly.

Note

We have renamed the project to gurobi-logtools, so please also adapt the import statement accordingly:

import gurobi_logtools as glt

[Figure: performance plot]

Installation

python -m pip install gurobi-logtools

Prepending python -m to the pip install command ensures that the package is installed for the Python version currently active in your environment.

See CHANGELOG for added, removed or fixed functionality.

Usage

First, you need a set of Gurobi log files to compare, e.g.,

  • results from several model instances
  • comparisons of different parameter settings
  • performance variability experiments involving multiple random seed runs
  • ...

You may also use the provided gurobi-logtools.ipynb notebook with the example data set to get started. There is also a Gurobi TechTalk on YouTube demonstrating how to use it.

Pandas/Plotly

  1. parse log files:

    import gurobi_logtools as glt
    
    # Parse all log files matching the given glob patterns.
    results = glt.parse(["run1/*.log", "run2/*.log"])
    # Summary DataFrame: one row per run with the final results.
    summary = results.summary()
    # Progress DataFrame: timeline data from the branch-and-bound node log.
    nodelog_progress = results.progress("nodelog")

    Depending on your requirements, you may need to filter or modify the resulting DataFrames (see the filtering sketch after this list).

  2. draw interactive charts, preferably in a Jupyter Notebook:

    • final results from the individual runs:
    glt.plot(summary, type="box")
    • progress charts for the individual runs:
    glt.plot(nodelog_progress, y="Gap", color="Log", type="line")
    • progress of the NoRel heuristic (note: the time recorded here is measured from the start of NoRel and does not include presolve and read time):
    glt.plot(results.progress("norel"), x="Time", y="Incumbent", color="Log", type="line")

    These are just examples using the Plotly Python library; of course, any other plotting library of your choice can be used to work with these DataFrames.
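
As mentioned in step 1, filtering the DataFrames is plain pandas work. A minimal sketch (the "Status", "MIPGap", and "Log" column names follow the DataFrames produced above; adjust them to whatever your logs actually contain):

# Keep only the runs that hit the time limit, sorted by final gap.
timeouts = summary[summary["Status"] == "TIME_LIMIT"].sort_values("MIPGap")

# Restrict the progress data to a single log file before plotting.
first_log = nodelog_progress["Log"].iloc[0]
one_run = nodelog_progress[nodelog_progress["Log"] == first_log]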

Excel

Convert your log files to Excel worksheets right on the command-line:

python -m gurobi_logtools myrun.xlsx data/*.log

List all available options and how to use the command-line tool:

python -m gurobi_logtools --help

Rename log files

The command line tool can also rename log files according to the parameters set and model solved in a given run. This is useful if your log files do not have a consistent naming scheme, or if multiple runs are logged per file and you want to extract the individual runs.

For example:

python -m gurobi_logtools --write-to-dir nicenames summary.xlsx tests/assets/combined/*.log

separates logs for individual runs in the input files and writes copies to the 'nicenames' folder with a consistent naming scheme:

> ls nicenames
912-MIPFocus1-Presolve1-TimeLimit600-glass4-0.log
912-MIPFocus1-Presolve1-TimeLimit600-glass4-1.log
912-MIPFocus1-Presolve1-TimeLimit600-glass4-2.log
912-MIPFocus2-Presolve1-TimeLimit600-glass4-0.log
912-MIPFocus2-Presolve1-TimeLimit600-glass4-1.log
912-MIPFocus2-Presolve1-TimeLimit600-glass4-2.log
912-Presolve1-TimeLimit600-glass4-0.log
912-Presolve1-TimeLimit600-glass4-1.log
912-Presolve1-TimeLimit600-glass4-2.log

gurobi-logtools's People

Contributors

erodriguezheck, maliheha, mattmilten, ronaldvdv-gurobi, siefen, simonbowly, torressa, venaturum


gurobi-logtools's Issues

Uncaught ReferenceError: Plotly is not defined

Hi! While trying to use gurobi-logtools, I am encountering the following error:

Uncaught ReferenceError: Plotly is not defined.

This is the code I'm executing in Databricks:

%pip install gurobi_logtools

import gurobi_logtools as glt
import pandas as pd

results = glt.parse(["gurobi_log.log"])
summary = results.summary()
nodelog = results.progress("nodelog")

glt.plot(summary, type="box")
glt.plot(nodelog, y="Gap", color="Log", type="line")
glt.plot(results.progress("norel"), x="Time", y="Incumbent", color="Log", type="line")

I already tried to reinstall plotly and import plotly, but this does not solve the problem. Do you have any idea what could help? Thanks!

Extract changed parameter summary

When the tool extracts parameter settings into individual columns, each run that did not have an explicitly defined value for a certain parameter will have the default parameter value in that column. This means that from the parameter columns, one cannot easily see which parameters were defined.

The default values are added in fill_default_parameters.

The most important use case for me would be to show a simple summary of the parameter combination in tables and plots. For example, if Heuristics was not set explicitly for a particular run but NodeMethod was set to 1, then we could summarize the log file as "NodeMethod=1".

I would propose we change the fill_default_parameters function. We first find the columns that relate to parameter settings (using re_parameter_column.match(column) like here). Then, for each log file, we collect the non-NaN values of these columns, pass them through one or more callbacks (together with the parameter names), and store the result in a new column. One example of a (default) callback would be the string formatter above. Another would be to count the number of non-default values, which is often relevant for tuning, where smaller parameter combinations are preferred. A sketch of this idea follows below.
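
A minimal sketch of this idea, assuming parameter columns can be recognized by a pattern like the one below (the real re_parameter_column in the library may differ):

import re

import pandas as pd

# Illustrative stand-in for the library's re_parameter_column.
re_parameter_column = re.compile(r"(?P<Name>\w+) \(Parameter\)$")


def summarize_changed_parameters(summary: pd.DataFrame) -> pd.DataFrame:
    """Add a string summary and a count of the explicitly set parameters,
    built only from the non-NaN entries of the parameter columns."""
    param_cols = [c for c in summary.columns if re_parameter_column.match(c)]
    names = {c: re_parameter_column.match(c)["Name"] for c in param_cols}

    def format_row(row):
        return " ".join(
            f"{names[c]}={row[c]}" for c in param_cols if pd.notna(row[c])
        )

    summary["ChangedParams"] = summary.apply(format_row, axis=1)
    summary["NumChangedParams"] = summary[param_cols].notna().sum(axis=1)
    return summary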

Cut name pattern too restrictive

When extracting statistics about generated cuts, the pattern for matching cut names is too restrictive. It does not match Relax-and-lift, which is generated for 912-glass4-0.log. I believe the hyphen is not included in the [\w ] pattern for the cut name.
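
Assuming the cut statistics line looks like "  Relax-and-lift: 3", adding the hyphen to the character class fixes it (the surrounding pattern here is illustrative, not the library's exact regex):

import re

# Hyphen missing: "Relax-and-lift" does not match.
old_pattern = re.compile(r" +(?P<Name>[\w ]+): (?P<Count>\d+)")
# Hyphen included in the character class.
new_pattern = re.compile(r" +(?P<Name>[\w\- ]+): (?P<Count>\d+)")

line = "  Relax-and-lift: 3"
assert old_pattern.match(line) is None
assert new_pattern.match(line)["Name"] == "Relax-and-lift"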

Improve handling of multi-objective optimization logs

We need to have a better parsing of multi-objective runs so that we can clearly distinguish the individual objectives.

  • track data like gap and primal and dual bound progress for each objective
  • track how many objectives have been solved
  • identify how much time was spent for each objective

Throw a more useful error message if no logs were parsed

If the glob pattern matches no files (which can easily happen due to a typo, or being in the wrong directory), we get this unhelpful error during post-processing:

>>> import grblogtools as glt
>>> glt.parse("asdjkhaskd")
<grblogtools.api.ParseResult object at 0x1023f75b0>
>>> glt.parse("asdjkhaskd").summary()
Traceback (most recent call last):
  File "/Users/bowly/.pyenv/versions/3.10.3/lib/python3.10/site-packages/pandas/core/indexes/base.py", line 3800, in get_loc
    return self._engine.get_loc(casted_key)
  File "pandas/_libs/index.pyx", line 138, in pandas._libs.index.IndexEngine.get_loc
  File "pandas/_libs/index.pyx", line 165, in pandas._libs.index.IndexEngine.get_loc
  File "pandas/_libs/hashtable_class_helper.pxi", line 5745, in pandas._libs.hashtable.PyObjectHashTable.get_item
  File "pandas/_libs/hashtable_class_helper.pxi", line 5753, in pandas._libs.hashtable.PyObjectHashTable.get_item
KeyError: 'Version'

We should throw something more useful, like a FileNotFoundError at the parsing step if no files match.
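
A minimal sketch of that check (parse_patterns is a hypothetical helper, not the library's actual API):

import glob


def parse_patterns(patterns):
    """Resolve glob patterns up front and fail fast if nothing matches."""
    if isinstance(patterns, str):
        patterns = [patterns]
    logfiles = [f for pattern in patterns for f in glob.glob(pattern)]
    if not logfiles:
        raise FileNotFoundError(f"No log files found matching: {patterns}")
    return logfiles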

better handling of warnings

Numerical warnings should be easily accessible after parsing. There could also be a dedicated function to display the "health" of the run, i.e., count the warnings and violations.
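
For example, a hypothetical health helper could look like this (column names are illustrative; the parser would first need to store such counts):

def health(summary):
    """Hypothetical helper: show warning- and violation-related columns
    per run, assuming the parser stores such counts in columns whose
    names contain 'Warning' or 'Violation'."""
    cols = [c for c in summary.columns if "Warning" in c or "Violation" in c]
    return summary[cols]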

Parse deterministic work metric

For example, in MIP logs we parse this line but don't collect the work units info:

Explored 1466 nodes (267947 simplex iterations) in 12.83 seconds (18.68 work units)
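
Extending the existing pattern to also capture the work units could look roughly like this (group names illustrative; the optional group keeps older logs without work units parseable):

import re

node_summary = re.compile(
    r"Explored (?P<NodeCount>\d+) nodes \((?P<IterCount>\d+) simplex iterations\)"
    r" in (?P<Runtime>[\d.]+) seconds(?: \((?P<Work>[\d.]+) work units\))?"
)

m = node_summary.match(
    "Explored 1466 nodes (267947 simplex iterations) in 12.83 seconds (18.68 work units)"
)
assert m["Work"] == "18.68"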

add support for VIVA .trc file export

Add a method that can output a TRC file to be read by VIVA; a sketch follows after the field list below.

example TRC file:

alan,MINLP,scip,CONOPT,CPLEX,43358.445,0,8,9,4,24,3,1,8,1,2.92500006672242,2.92211381078608,0.137,165,0,1,#
ball_mk2_10,MINLP,scip,CONOPT,CPLEX,43358.445,0,2,11,10,21,10,1,1,1,0,0,0.002,0,0,1,#

meaning of the values:

InputFileName
ModelType
SolverName
NLP
MIP
JulianDate
Direction
NumberOfEquations
NumberOfVariables
NumberOfDiscreteVariables
NumberOfNonZeros
NumberOfNonlinearNonZeros
OptionFile
ModelStatus
TermStatus
ObjectiveValue
ObjectiveValueEstimate
SolverTime
NumberOfIterations
NumberOfDomainViolations
NumberOfNodes
UserComment
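
A sketch of how the export could look. The mapping from parsed summary fields to TRC columns is illustrative, and several fields without a Gurobi-log equivalent are left as placeholders:

import csv


def write_trc(summary_rows, path):
    """Hypothetical TRC export: one comma-separated record per run,
    terminated with '#' as in the example above."""
    with open(path, "w", newline="") as f:
        writer = csv.writer(f)
        for row in summary_rows:
            writer.writerow([
                row.get("ModelFilePath", ""),  # InputFileName
                "MIP",                         # ModelType (placeholder)
                "GUROBI",                      # SolverName
                "", "",                        # NLP, MIP (sub-solvers)
                "",                            # JulianDate
                "0",                           # Direction
                row.get("NumConstrs", ""),     # NumberOfEquations
                row.get("NumVars", ""),        # NumberOfVariables
                row.get("NumIntVars", ""),     # NumberOfDiscreteVariables
                row.get("NumNZs", ""),         # NumberOfNonZeros
                "0",                           # NumberOfNonlinearNonZeros
                "1",                           # OptionFile
                "", "",                        # ModelStatus, TermStatus
                row.get("ObjVal", ""),         # ObjectiveValue
                row.get("ObjBound", ""),       # ObjectiveValueEstimate
                row.get("Runtime", ""),        # SolverTime
                row.get("IterCount", ""),      # NumberOfIterations
                "0",                           # NumberOfDomainViolations
                row.get("NodeCount", ""),      # NumberOfNodes
                "#",                           # UserComment / terminator
            ])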

Model size before/after presolve

In 912-glass4-0.log we have the following:

Optimize a model with 396 rows, 322 columns and 1815 nonzeros
Model fingerprint: 0x0a9d9037
Variable types: 20 continuous, 302 integer (0 binary)
Coefficient statistics:
...
Presolve removed 6 rows and 6 columns
Presolve time: 0.00s
Presolved: 390 rows, 316 columns, 1803 nonzeros
Variable types: 19 continuous, 297 integer (297 binary)

We extract the following information:

  • NumConstrs = 396
  • NumVars = 322
  • NumNZs = 1815
  • PresolvedNumConVars = 19
  • PresolvedNumIntVars = 297
  • PresolvedNumBinVars = 297

Should we also extract the third line above, e.g. the detailed model size before presolve (20 continuous, 302 integer, 0 binary variables)?
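
If so, a pattern along these lines would capture it (column names illustrative):

import re

variable_types = re.compile(
    r"Variable types: (?P<NumConVars>\d+) continuous, "
    r"(?P<NumIntVars>\d+) integer \((?P<NumBinVars>\d+) binary\)"
)

m = variable_types.match("Variable types: 20 continuous, 302 integer (0 binary)")
assert m.groupdict() == {"NumConVars": "20", "NumIntVars": "302", "NumBinVars": "0"}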

Incorrect status reported for incomplete logs

On the latest master branch, if a MIP log is incomplete (i.e., cut off with no termination message for whatever reason), we might incorrectly report optimal status. For example:

Variable types: 23522 continuous, 2343 integer (0 binary)

Root barrier log...

Barrier solved model in 50 iterations and 72.71 seconds (53.24 work units)
Optimal objective -1.76339641e+08

Solved with barrier

Root relaxation: objective -1.763396e+08, 104343 iterations, 108.23 seconds (79.42 work units)

    Nodes    |    Current Node    |     Objective Bounds      |     Work
 Expl Unexpl |  Obj  Depth IntInf | Incumbent    BestBd   Gap | It/Node Time

Here we get 'OPTIMAL' status from ContinuousParser, but no termination message from NodeLogParser.

grblogtoolsv1 would give an 'incomplete log' warning in this situation (and report unknown status? I'm not sure).

We should check for this with some custom logic for Status and Runtime, something like the following (see the sketch after this list):

  • If the model is continuous, we can get Runtime and Status from ContinuousParser
  • If the model is (a) a MIP or (b) a continuous model solved as a MIP, we should ignore Runtime and Status from ContinuousParser
    • (a) We can check using model type in SingleLogParser
    • (b) Look for the message Solving as a MIP in header or presolve
  • If TerminationParser reports runtime or status, it should take precedence (this already happens)
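
As a sketch of this precedence (attribute and helper names here are illustrative, not the actual parser internals):

def resolve_status(header, continuous, nodelog, termination):
    """Hypothetical resolution of the final Status, following the rules
    above."""
    # The termination parser always wins if it saw a termination message.
    if termination.status is not None:
        return termination.status
    # MIPs (and continuous models solved as a MIP) must not inherit the
    # 'OPTIMAL' root relaxation status from ContinuousParser.
    if header.model_type == "MIP" or header.solving_as_mip:
        return nodelog.status  # None here means: incomplete log
    return continuous.status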

Heuristic solutions found prior to nodelog (but not by NoRel) not parsed

E.g., we should pick up the following two incumbents:

 Coefficient statistics:
  Matrix range     [8e-07, 1e+00]
  Objective range  [3e-04, 1e+00]
  Bounds range     [0e+00, 0e+00]
  RHS range        [2e+00, 1e+04]
Found heuristic solution: objective 2.6508070                             <- here
Presolve removed 751 rows and 0 columns
Presolve time: 0.63s
Presolved: 249 rows, 1000 columns, 249000 nonzeros
Variable types: 0 continuous, 1000 integer (0 binary)
Found heuristic solution: objective 15.5735142                             <- and here

Root relaxation: objective 2.467344e+01, 32 iterations, 0.02 seconds (0.01 work units)

    Nodes    |    Current Node    |     Objective Bounds      |     Work
Expl Unexpl |  Obj  Depth IntInf | Incumbent    BestBd   Gap | It/Node Time

     0     0   24.67344    0    5   15.57351   24.67344  58.4%     -    0s

The number of these incumbents found should be reported in the summary.
The objective value of the incumbents should be reported in timelines.
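
These lines have a fixed format, so a pattern like the following would pick them up (group name illustrative):

import re

heuristic_solution = re.compile(
    r"Found heuristic solution: objective (?P<Incumbent>[-+\d.e]+)"
)

m = heuristic_solution.match("Found heuristic solution: objective 2.6508070")
assert m["Incumbent"] == "2.6508070"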

[BUG] cannot join with no overlapping index names

Although the log file seems to be a valid one (MILP model, feasible, and optimal value found), grblogtools cannot correctly parse the log (attached: solver_gurobi_8_0_ir.log).

The same issue appears when using the parse and get_dataframe functions of the Python API directly, with one or several valid logs.

Versions

  • Gurobi Optimizer version 9.1.2 build v9.1.2rc0 (linux64)
  • python3.9.5
  • grblogtools=2.0.0
  • EDIT pandas==2.0.1

Buggy code

python3 -m grblogtools test.xlsx solver_gurobi_8_0_ir.log

Error messages

Traceback (most recent call last):
  File "/usr/lib/python3.9/runpy.py", line 197, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/usr/lib/python3.9/runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File ".venv_39-dev/lib/python3.9/site-packages/grblogtools/__main__.py", line 4, in <module>
    cli(
  File ".venv_39-dev/lib/python3.9/site-packages/grblogtools/cli.py", line 28, in cli
    summary = result.summary()
  File ".venv_39-dev/lib/python3.9/site-packages/grblogtools/api.py", line 118, in summary
    summary = summary.join(parameters)
  File ".venv_39-dev/lib/python3.9/site-packages/pandas/core/frame.py", line 9734, in join
    return self._join_compat(
  File ".venv_39-dev/lib/python3.9/site-packages/pandas/core/frame.py", line 9773, in _join_compat
    return merge(
  File ".venv_39-dev/lib/python3.9/site-packages/pandas/core/reshape/merge.py", line 158, in merge
    return op.get_result(copy=copy)
  File ".venv_39-dev/lib/python3.9/site-packages/pandas/core/reshape/merge.py", line 805, in get_result
    join_index, left_indexer, right_indexer = self._get_join_info()
  File ".venv_39-dev/lib/python3.9/site-packages/pandas/core/reshape/merge.py", line 1039, in _get_join_info
    join_index, left_indexer, right_indexer = left_ax.join(
  File ".venv_39-dev/lib/python3.9/site-packages/pandas/core/indexes/base.py", line 259, in join
    join_index, lidx, ridx = meth(self, other, how=how, level=level, sort=sort)
  File ".venv_39-dev/lib/python3.9/site-packages/pandas/core/indexes/base.py", line 4455, in join
    return self._join_multi(other, how=how)
  File ".venv_39-dev/lib/python3.9/site-packages/pandas/core/indexes/base.py", line 4578, in _join_multi
    raise ValueError("cannot join with no overlapping index names")
ValueError: cannot join with no overlapping index names

gurobi-logtools.ipynb broken

There are cells which compare the Seed column to strings, but the data is numeric, e.g.:

fastest_run = selected_run[selected_run["Seed"] == "1"]
slowest_run = selected_run[selected_run["Seed"] == "2"]
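
Since the Seed values are parsed as numbers, these comparisons should use integers instead:

fastest_run = selected_run[selected_run["Seed"] == 1]
slowest_run = selected_run[selected_run["Seed"] == 2]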

Update default parameters for V11

The latest JSON data for default parameter values is for v10.0.1.

A few important parameters are missing, namely 'ConcurrentMethod', 'FuncNonlinear', and 'MixingCuts'.
