
ccdb's Introduction

ccdb


The Jefferson Lab Calibration Constants Database (CCDB) is a framework to store and manage calibration constants for experiments in high energy and nuclear physics. Primary access to the constants sets is by run number. Constants sets themselves are organized in a tree of constant set types, customized for the experiment and of arbitrary depth. Alternate versions of constants are supported. The complete time history of the constant set tree is kept. Access to alternate versions and to older versions is supported via configuration of the access routines.

CCDB provides readout interfaces for:

  • C++
  • Python
  • Java
  • JANA framework
  • Command line
  • Web site

To manage data (add, update, delete):

  • Command line tools ('ccdb' command)
  • Python API

Platforms:

  • Linux (tested with RedHat, Debian families)
  • MacOS
  • Windows (partial support)

Installation

The minimal installation needed to view and manage constants:

git clone [email protected]:JeffersonLab/ccdb.git ccdb
source ccdb/environment.bash

# That's it! Check that it works:
ccdb -i

Instructions on how to build CCDB for different programming languages, along with other information, are in the wiki.

ccdb's People

Contributors

collinm8, dmitryromanovtest, drateots, faustus123, lendackya, markito3, theodoregoetz


ccdb's Issues

ccdb command ignores variation parent

As we discussed yesterday in the BCAL reconstruction meeting, we had
decided for the variation=calib to duplicate some of the constants
(e.g. ADC_gains) from the variation=default. According to Mark Ito, a
variation will default to the parent data if no specific 'variation
data' has been loaded. This would save us considerable effort in
completing the variation=calib. However, this does not seem to work:

ifarm1102> ccdb dump /BCAL/ADC_gains::calib > ! ADC_gains.txt
ifarm1102> more ADC_gains.txt
There is no data for table /BCAL/ADC_gains, run 0, variation 'calib'
Cannot fill data for assignment with this ID
ifarm1102> ccdb dump /BCAL/ADC_gains::default > ! ADC_gains.txt
ifarm1102> more ADC_gains.txt

 #
   3.32441e-05
   6.81519e-05
   4.44338e-05
   8.16211e-05
   3.15279e-05
   5.12456e-05
   5.01645e-05
   5.9035e-05
 ........
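The fallback behavior Mark Ito describes (using the parent variation's data when the requested variation has none) can be sketched as a simple chain lookup. This is a hypothetical illustration with made-up data, not the real CCDB API:

```python
# Hypothetical sketch of parent-variation fallback: if a table has no
# assignment in the requested variation, walk up the parent chain.
# PARENTS, ASSIGNMENTS and get_constants are illustrative names.

PARENTS = {"calib": "default", "default": None}  # variation -> parent

# (table, variation) -> constants; 'calib' has no ADC_gains data here
ASSIGNMENTS = {
    ("/BCAL/ADC_gains", "default"): [3.32441e-05, 6.81519e-05],
}

def get_constants(table, variation):
    """Return constants for a table, falling back to ancestor variations."""
    while variation is not None:
        data = ASSIGNMENTS.get((table, variation))
        if data is not None:
            return data
        variation = PARENTS.get(variation)
    raise LookupError("no data for %s in variation chain" % table)

print(get_constants("/BCAL/ADC_gains", "calib"))  # falls back to 'default'
```

With this logic, the dump of `/BCAL/ADC_gains::calib` above would have returned the `default` data instead of the "no data" error.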

ccdb command print errors to stderr

For debugging purposes, it would be useful if the ccdb CLI program printed errors to stderr instead of stdout. In some batch jobs, I was getting errors like the following which were showing up in stdout, and it would be easier to track them down if they showed up in stderr:

CCDB provider unable to connect to sqlite:///home/gxproj3/calib_challenge/ccdb.sqlite. Aborting command. Exception details: (OperationalError) unable to open database file None None
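The requested behavior is the standard pattern of routing diagnostics to the error stream. A minimal sketch (the helper name is hypothetical):

```python
import sys

def report_error(message):
    """Write diagnostics to stderr so batch-job stdout stays clean."""
    print(message, file=sys.stderr)

report_error("CCDB provider unable to connect. Aborting command.")
```

Batch systems typically capture stdout and stderr in separate files, so this one change would make the failing jobs easy to grep for.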

Problems using low-level API to write tables to SQLite files

I've run into a possible problem using the python low-level API to add assignments to an SQLite file. This code had worked up until about a month ago, for what it's worth.

I've attached a python script that reproduces the problem, along with the output that it produces.

I've been able to reproduce the bug using several SQLite files on various locations in the JLab CUE. It works fine with the MySQL master DB.

example.zip

[C++ API] Slow GetValue

Reading columns with different types, like this:

    auto_ptr<Assignment> calibModel(calib->GetAssignment(dcc.database));

    for (size_t rowI = 0; rowI < calibModel->GetRowsCount(); rowI++)
    {
        cout << "  par Name: "  << calibModel->GetValue(rowI, 1)
             << "  par Value: " << calibModel->GetValue(rowI, 2) << endl;
    }

This looks slower than when all the columns have the same type. Or is it maybe because I'm reading every item in every column?

It seems that filling a vector<vector> with the whole table and reading from that is faster, but I'm not sure.
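One plausible explanation is that a generic per-cell accessor repeats type inspection and conversion on every call, while copying the table into a typed container once pays that cost a single time. A language-agnostic sketch of the two patterns (in Python, with made-up data; this is not the CCDB C++ API):

```python
# Illustrative comparison: per-cell generic access vs. caching the table.
RAW = [["gain_0", "3.3e-05"], ["gain_1", "6.8e-05"]]  # name, value as strings

def get_value(row, col):
    """Generic accessor: re-inspects and converts the cell on every call."""
    cell = RAW[row][col]
    return float(cell) if col == 1 else cell

# Faster pattern: materialise the typed table once, then index freely.
table = [[row[0], float(row[1])] for row in RAW]
print(table[1][1])
```

Mixed-type rows force the conversion branch on every cell, which matches the observation that same-type columns read faster.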

Deprecated tables

Mark tables as deprecated.
Deprecated tables would not be shown by the ccdb command by default, but would still exist.
To see all tables, the --show-all flag should be passed to ccdb.

Tools for moving data between variations

These would just help streamline usage. Things like tools for copying assignments from one variation to another, and assigning the average values determined over one run range in a "working" variation to that run range in another variation.
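The two tools described above can be sketched with plain dicts standing in for the database (helper names and data are hypothetical):

```python
# Hypothetical helpers for moving data between variations:
# copy an assignment, and average a "working" run range column-wise.

# (variation, run) -> list of constants
DB = {
    ("working", 100): [1.0, 2.0],
    ("working", 101): [3.0, 4.0],
}

def copy_assignment(db, src_var, dst_var, run):
    """Copy one run's constants from src_var to dst_var."""
    db[(dst_var, run)] = list(db[(src_var, run)])

def average_range(db, src_var, runs):
    """Column-wise average of constants over a run range."""
    rows = [db[(src_var, r)] for r in runs]
    return [sum(col) / len(col) for col in zip(*rows)]

copy_assignment(DB, "working", "default", 100)
print(average_range(DB, "working", [100, 101]))  # [2.0, 3.0]
```

The averaged row could then be written to the target variation with the same copy mechanism.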

Better user rights system

Required by #7 and #6

Users will be added directly to the SQLAlchemy model classes, so one can use something like:

table.creator.name
variation.creator.logs

"log" command enhancements

I find myself using the log command a lot, and it would be nice to have the following features:

  • ability to set which fields are displayed
  • ability to filter on various fields (e.g. certain user names, table names, dates, etc.)

Use of cat --id

Using cat --id command corrupts the command line interface ("No row was found for one()").


Lustre issues?

I'd forgotten, with the latest releases, are we moving towards Lustre compatibility?

I'd had some questions sent my way recently in which there were problems with SQLite DBs, which may be due to a filesystem problem, but the usual errors were not thrown. If the diagnosis is correct, I'll open another issue, but wanted to check on this point first.

Adding blame-info in "vers" output

It would be really useful to have the name of the person who added the constants listed in the "vers" command output. Right now, to find this information, one has to cross-reference a couple of different commands.

Comment dump fix

Do we need a new tag to include the fix to the issue that Elton reported? The one about comments not being dumped out properly?

Management ideas

This note contains a few different ideas and is posted for the sake of discussion. It should probably be broken up into some actual issues for implementation.

There are a few improvements that would make calibration processing easier.

  1. Getting data into the CCDB.
    Right now, the GlueX jobs generate calibrations for ~20-30 tables for each production run. It would be nice to do further data reduction in the CCDB framework. I'd prefer to automatically add all of this information in the jobs to the main CCDB in some specified variation. I'm hesitant to do so since it would spam up the logs and potentially inflate the size of the DB. So I see a few possible solutions:
    1. Just do the work in an SQLite CCDB - this can be a headache, with many jobs potentially writing to the same file at once. I could write some scripts to do post-processing, though.
    2. Hide the commits with improved logging. Maybe only show changes to the default variation by default?
    3. Meta-commits. One could also imagine building a commit where, instead of just applying one set of constants to a table for a given run range, one constructs a mapping of constants files to particular runs (or ranges of runs?) and adds those to the DB all at once. This might be an overcomplication, though.

[Note that some constants require data from multiple runs. These procedures are still mainly in the hands of the experts at the moment.]

  2. Analysis of data in the CCDB.
    Once we've determined constants for the individual runs, assuming they are put in the CCDB, we'd like to monitor them and determine values for certain subranges (if need be). So, the following would be useful:
    1. Time series: plots that show the variation of constants as a function of run. These should be able to show individual constants as well as summary values (e.g. for individual channel timings, it is useful to see the mean, std. dev., quartiles, etc.).
    2. Visualization of comparison between two different variations. Here the idea is to show the difference in values between two variations. For example, let's say one has a "working" variation and is trying to decide what to commit to the default variation for physics analysis. One wants to see the differences between the variations for a given set of constants, and summary values for tables with many entries. These could be shown for many tables in a given run, or for many runs for a given table. This is very key in figuring out which runs need recalibration.
    3. Tools for moving data between variations. These would just help streamline usage. Things like tools for copying assignments from one variation to another, and assigning the average values determined over one run range in a "working" variation to that run range in another variation.

Questions about data model

Just a few questions...

Is it by design that assignments to a table must always have a fixed number of rows? For instance, I have a bad channel set that has a variable number of rows. Would it be better to model this so that every channel has a value with a boolean value so that every dataset has the same number of rows?

Can I assign multiple independent run ranges to the same set of calibration data? Or should I instead duplicate the data for the new run range?

Thanks.
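The fixed-row modelling the question proposes (one row per channel with a boolean status column) can be sketched like this; the data and names are illustrative:

```python
# Sketch of fitting a variable-length bad-channel list into a fixed-shape
# table: one row per channel with a boolean "is_bad" flag, so every
# assignment has the same number of rows.

N_CHANNELS = 8
bad_channels = {2, 5}  # variable-length input

# Fixed-shape table: always N_CHANNELS rows of (channel_id, is_bad)
table = [(ch, ch in bad_channels) for ch in range(N_CHANNELS)]

# Recovering the variable-length set at readout is trivial.
recovered = {ch for ch, is_bad in table if is_bad}
print(recovered)  # {2, 5}
```

The trade-off is a constant table shape (which suits a fixed-row schema) against storing a row for every channel, including the good ones.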

Error message when modifying assignment

When modifying an assignment using the low-level API, the command completes successfully and the database is modified, but the following error message is thrown:

Traceback (most recent call last):
File "fix_run_range.py", line 12, in
provider.update_assignment(assignment)
File "/group/halld/Software/builds/Linux_CentOS6-x86_64-gcc4.9.2/ccdb/ccdb_1.06.01/python/ccdb/provider.py", line 1252, in update_assignment
affected_ids=[assignment.tablename + assignment(assignment.id)],
TypeError: 'Assignment' object is not callable

Visualization of comparison between two different variations

Visualization of comparison between two different variations. Here the idea is to show the difference in values between two variations. For example, let's say one has a "working" variation and is trying to decide what to commit to the default variation for physics analysis. One wants to see the differences between the variations for a given set of constants, and summary values for tables with many entries. These could be shown for many tables in a given run, or for many runs for a given table.

[Java] Any Java examples available?

Are there Java examples available anywhere showing how to use this API?

It would be especially helpful to see simple examples of adding, selecting, modifying and deleting data sets from the DB.

Thanks.

JS API

JS API for CCDB. Let's specify:

  • the request
  • the response

What should be there, etc.

CCDB allows inappropriate types for columns

This works:

ifarm1101> ccdb mktbl /calibration/dc/signal_generation/intrinsic_inefficiency -r 6 parameter3=double parameter4=float 
/group/clas12/bin/ccdb/sqlalchemy/engine/default.py:425: Warning: Data truncated for column 'columnType' at row 1 saving table to database...  completed
ifarm1101> ccdb cat /calibration/dc/signal_generation/intrinsic_inefficiency 
+-------------------------+
| parameter3 | parameter4 | 
| double     |            | 
...

But it should not.

[C++] scons install

Implement

scons install

If no prefix is given, copy to the standard system folders.

Time series

Plots that show the variation of constants as a function of run. These should be able to show individual constants as well as summary values (e.g. for individual channel timings, it is useful to see the mean, std. dev., quartiles, etc.).
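The per-run summary values mentioned above can be computed with the standard library alone. A sketch with made-up timing constants:

```python
# Sketch of per-run summaries (mean, std. dev., quartiles) for a time
# series of constants; the run numbers and values are illustrative.
import statistics

# run -> channel timing constants
runs = {
    1001: [1.0, 2.0, 3.0, 4.0],
    1002: [2.0, 3.0, 4.0, 5.0],
}

def summarize(values):
    """Summary statistics for one run's constants."""
    q1, q2, q3 = statistics.quantiles(values, n=4)
    return {
        "mean": statistics.mean(values),
        "stdev": statistics.stdev(values),
        "quartiles": (q1, q2, q3),
    }

for run in sorted(runs):
    print(run, summarize(runs[run]))
```

Plotting these summaries against run number gives exactly the time-series view requested.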

Error when creating directory in ccdb command line tool

When I execute this in the command shell

/> mkdir /test2

I get this error with a blank exception message

Failed to create directory. Exception message:

I am using a copy of the sqlite data file provided in the ccdb git project e.g.

ccdb -c sqlite:///$PWD/src/main/sql/ccdb.sqlite -i

Several other commands such as ls, etc. seem to work fine.

Am I able to create new tables via this interface or no?

Thanks.

Problems with SQLite files on the ifarm?

I've been working on some calibrations on the ifarm machines, using SQLite CCDB files. When I make changes to these files, it looks like they are modified, yet ccdb log shows no changes.

An example file is here: /u/scratch/sdobbs/ccdb.sqlite
I changed the /PHOTON_BEAM/RF/time_offset table, but this change doesn't show.

password prompt when adding variation

This one came in over email from Sean Dobbs (@sdobbs):

When adding the variation for the 2016 simulations on the ifarm, it asked me for a password for some reason [terminal capture at the end of the email]. I put in "ccdb" and that worked fine, but I'm not sure if it's supposed to do that.

ifarm1102> ccdb mkvar mc_sim1 -p mc
Variation mc_sim1 created
Enter MySql password:
Password:

Better Control of Timestamps on Ancestor Variations

@DraTeots and I ( @markito3 ) discussed this yesterday.

ccdb_ancestor_time

The issue is that since any given variation may have one or more ancestors (parents, parents of parents, etc.), the user may want to have different calibration times (CALIBTIME or historical timestamp) for the variation being used and each of its ancestors. For example when working on a TOF calibration using the "tofcal" variation, one may want to have a fixed version of all constants not associated with the TOF, i. e., not explicitly named in the "tofcal" variation. If tofcal's parent is "default", then the user would want to use a fixed version of "default", identified by date, but always use the latest version of the "tofcal" variation. Currently the only behavior available is the opposite of this use case; the user would get the latest version of "default" and can only specify a fixed CALIBTIME for "tofcal".

The proposed solution has two parts:

  1. Make another signature-differentiated version of the SetTime function of the API. The current version takes only a time as an argument. The new version takes a time and a variation name.
  2. Add a new option to the JANA_CALIB_CONTEXT parameter: VARTIME, which specifies a variation and a time, e.g., VARTIME=mc:2016-04-01. Multiple instances of VARTIME could appear. The implementation would use the API function defined above.
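A sketch of parsing the proposed VARTIME=variation:timestamp tokens into a per-variation calibration-time map (the token name and format come from the proposal; the parser itself is hypothetical):

```python
# Parse the proposed VARTIME=<variation>:<date> context tokens into a
# per-variation calibration-time map. Illustrative, not the real JANA code.
from datetime import datetime

def parse_context(context):
    """Parse e.g. 'variation=tofcal VARTIME=default:2016-04-01'."""
    var_times = {}
    for token in context.split():
        if token.startswith("VARTIME="):
            name, _, stamp = token[len("VARTIME="):].partition(":")
            var_times[name] = datetime.strptime(stamp, "%Y-%m-%d")
    return var_times

print(parse_context("variation=tofcal VARTIME=default:2016-04-01"))
```

In the TOF use case above, this would pin "default" to a fixed date while leaving "tofcal" at the latest version.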

Errors running LLAPI example?

I've run into some errors that I hadn't before when using the low-level API. Maybe the best illustration is using the sample script $CCDB_HOME/examples/llapi_readout.py

I tried running on ifarm1401 with python2.7 using ccdb_1.06.01 and ccdb_1.06.02 and get the following output:

=======================

ifarm1401> python llapi_readout.py
<Directory 3 'test_vars'>
test_vars
/test/test_vars
<Directory 1 'test'>
[]
== TABLE == 'test_table'
/test/test_vars/test_table
Test type
2014-04-10 17:20:28
test_vars
[<ConstantSet '1'>, <ConstantSet '2'>, <ConstantSet '4'>, <ConstantSet '5'>]
x y z
double double double
rows 2 x 3 columns
2
== TABLE == 'test_table2'
/test/test_vars/test_table2
Test type 2
2014-04-10 17:20:28
test_vars
[<ConstantSet '3'>]
c1 c2 c3
int int int
rows 1 x 3 columns
2

== Getting tables another way ==
/test/test_vars/test_table2
test_table
test_table2
test_table
test_table2

== Getting all table data ==
test
0
2147483647
/test/test_vars/test_table2:0:test:2012-09-30_23-48-42
2
[[u'10', u'20', u'30']]
[u'10', u'20', u'30']

== Getting assignment ==
Traceback (most recent call last):
File "llapi_readout.py", line 91, in
assignment = provider.get_assignment(1, "/test/test_vars/test_table2", "test") # run, table, variation
File "/u/group/halld/Software/builds/Linux_CentOS6-x86_64-gcc4.9.2/ccdb/ccdb_1.06.02/python/ccdb/provider.py", line 1026, in get_assignment
assert isinstance(path_or_table, TypeTable)
AssertionError

[C++] Memory Leak

There seems to be a memory leak in DataProvider where it does not delete mAuthentication in the destructor.

Option to disable anonymous checkins from CCDB command line tool

It would be nice to disable anonymous checkins from the CCDB command line tool in some way. Ideally this would be the default behavior.

Since the name associated with a change is taken from CCDB_USER, many people forget to properly set this (despite sustained nagging), which is beginning to make changes difficult to manage.
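The requested guard amounts to refusing a check-in when CCDB_USER is unset. A minimal sketch (the function name and flag are hypothetical, not part of the real CLI):

```python
# Sketch of a guard against anonymous check-ins: require CCDB_USER unless
# anonymous access is explicitly allowed.
import os

def resolve_user(env=os.environ, allow_anonymous=False):
    """Return the committing user's name, or raise if it is not set."""
    user = env.get("CCDB_USER")
    if user:
        return user
    if allow_anonymous:
        return "anonymous"
    raise RuntimeError("CCDB_USER is not set; refusing anonymous check-in")

print(resolve_user({"CCDB_USER": "sdobbs"}))  # sdobbs
```

Making the strict behavior the default, with an explicit opt-out flag, matches the "ideally this would be the default" request above.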

[Java] Column data with 'long' type cannot be retrieved

I have a simple main where I'm trying to read long values from a column in Java:

import java.util.Vector;

import org.jlab.ccdb.Assignment;
import org.jlab.ccdb.CcdbPackage;
import org.jlab.ccdb.JDBCProvider;
import org.jlab.ccdb.TypeTableColumn;

public class ReadTest {
        
    static final String CONNECTION = "sqlite:////u/ey/jeremym/hps-dev/ccdb-scratch/scratch/ccdb.sqlite";
    static final int RUN = 5772;
    static final String TABLE = "/ECAL/calibrations";
    
    public static void main(String[] args) {
        
        JDBCProvider provider = CcdbPackage.createProvider(CONNECTION);
        provider.connect();
        provider.setDefaultRun(RUN);
        Assignment a = provider.getData(TABLE);
        
        Vector<TypeTableColumn> typeTable = a.getTypeTable().getColumns();
        for (TypeTableColumn col : typeTable) {
            System.out.println(col.getName() + ":" + col.getCellType());
        }
        
        Vector<Long> channelIds = a.getColumnValuesLong(0); // Throws exception but column is actually a long!
        Vector<Double> pedestals = a.getColumnValuesDouble(1);
        Vector<Double> noise = a.getColumnValuesDouble(2);
        
        int len = channelIds.size();
        for (int i = 0; i < len; i++) {
            System.out.println(channelIds.get(i) + " " + pedestals.get(i) + " " + noise.get(i));
        }
        
        provider.close();
    }
}

The column info in the db looks like:

+------------------------------------------+
| Columns info                             |
+------------------------------------------+

Columns info
 N.   (type)    : (name)
 0    long      : ecal_channel_id
 1    double    : pedestal
 2    double    : noise

The test does not work though. The Java API is not able to read back the long values, e.g.

Exception in thread "main" java.lang.NumberFormatException: For input string: "1L"
        at java.lang.NumberFormatException.forInputString(NumberFormatException.java:65)
        at java.lang.Long.parseLong(Long.java:589)
        at java.lang.Long.parseLong(Long.java:631)
        at org.jlab.ccdb.Assignment.getColumnValuesLong(model.kt:355)
        at ReadTest.main(ReadTest.java:26)

The Java API seems to know the correct column types though:

ecal_channel_id:long
pedestal:double
noise:double

Any idea why this might be?

I was seeing similar problems for int columns as well.

This is using Java 1.8 with the CCDB master and Python 2.7 (I'm suspecting there's an issue here with python 2.7 adding the 'L' to these values).
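The suspicion fits the stack trace: under Python 2, repr() of a long appends an 'L' suffix ("1L"), which Java's Long.parseLong rejects. A sketch of a tolerant parse that would work around it (the function name is hypothetical):

```python
# Sketch of tolerating the Python 2 long repr suffix: "1L" -> 1.
# Java's Long.parseLong("1L") throws NumberFormatException, which matches
# the traceback above; stripping a trailing 'L'/'l' before parsing avoids it.

def parse_ccdb_long(text):
    """Parse an integer cell, tolerating a Python 2 long suffix like '1L'."""
    text = text.strip()
    if text and text[-1] in "Ll":
        text = text[:-1]
    return int(text)

print(parse_ccdb_long("1L"))  # 1
print(parse_ccdb_long("42"))  # 42
```

The cleaner fix would be on the write side (store "1", not repr of a Python 2 long), but a tolerant reader keeps existing databases usable.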

Very slow startup with CCDB sqlite file

Mark Dalton recently told me of an issue with hd_root taking several minutes to start
processing events when running on the gluons. After a little investigation, I was able
to reproduce the problem in my own account, but only if using an sqlite file for CCDB.
If I use mysql, then it starts up normally in just a few seconds.

Has anyone else observed this recently? I have successfully used sqlite for CCDB
quite a bit so I'm suspicious this is not a global issue but wanted to check with others
before fully escalating it to defcon Romanov.

-David

https://groups.google.com/forum/#!topic/gluex-software/stOBsyHJsoE
