
ccdb's Introduction

ccdb


The Jefferson Lab Calibration Constants Database (CCDB) is a framework to store and manage calibration constants for experiments in high energy and nuclear physics. Primary access to the constants sets is by run number. Constants sets themselves are organized in a tree of constant set types, customized for the experiment and of arbitrary depth. Alternate versions of constants are supported. The complete time history of the constant set tree is kept. Access to alternate versions and to older versions is supported via configuration of the access routines.

CCDB provides readout interfaces for:

  • C++
  • Python
  • Java
  • JANA framework
  • Command line
  • Web site

To manage data (add, update, delete):

  • Command line tools ('ccdb' command)
  • Python API

Platforms:

  • Linux (tested with RedHat, Debian families)
  • MacOS
  • Windows (partial support)

Installation

The minimal installation needed to view and manage constants:

git clone [email protected]:JeffersonLab/ccdb.git ccdb
source ccdb/environment.bash

# That's it! Check that it works:
ccdb -i

Instructions on how to build CCDB for different programming languages, along with other information, are in the wiki.

ccdb's People

Contributors

collinm8, dmitryromanovtest, drateots, faustus123, lendackya, markito3, theodoregoetz


ccdb's Issues

ccdb command ignores variation parent

As we discussed yesterday in the BCAL reconstruction meeting, we had
decided for the variation=calib to duplicate some of the constants
(e.g. ADC_gains) from the variation=default. According to Mark Ito, a
variation will default to the parent data if no specific 'variation
data' has been loaded. This would save us considerable effort in
completing the variation=calib. However, this does not seem to work:

ifarm1102> ccdb dump /BCAL/ADC_gains::calib > ! ADC_gains.txt
ifarm1102> more ADC_gains.txt
There is no data for table /BCAL/ADC_gains, run 0, variation 'calib'
Cannot fill data for assignment with this ID
ifarm1102> ccdb dump /BCAL/ADC_gains::default > ! ADC_gains.txt
ifarm1102> more ADC_gains.txt

 #
   3.32441e-05
   6.81519e-05
   4.44338e-05
   8.16211e-05
   3.15279e-05
   5.12456e-05
   5.01645e-05
   5.9035e-05
 ........
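The fallback behavior Mark Ito describes (using the parent variation's data when the requested variation has none) can be sketched as a simple chain lookup. This is a hypothetical illustration with made-up data, not the real CCDB API:

```python
# Hypothetical sketch of parent-variation fallback: if a table has no
# assignment in the requested variation, walk up the parent chain.
# PARENTS, ASSIGNMENTS and get_constants are illustrative names.

PARENTS = {"calib": "default", "default": None}  # variation -> parent

# (table, variation) -> constants; 'calib' has no ADC_gains data here
ASSIGNMENTS = {
    ("/BCAL/ADC_gains", "default"): [3.32441e-05, 6.81519e-05],
}

def get_constants(table, variation):
    """Return constants for a table, falling back to ancestor variations."""
    while variation is not None:
        data = ASSIGNMENTS.get((table, variation))
        if data is not None:
            return data
        variation = PARENTS.get(variation)
    raise LookupError("no data for %s in variation chain" % table)

print(get_constants("/BCAL/ADC_gains", "calib"))  # falls back to 'default'
```

With this logic, the dump of `/BCAL/ADC_gains::calib` above would have returned the `default` data instead of the "no data" error.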

ccdb command print errors to stderr

For debugging purposes, it would be useful if the ccdb CLI program printed errors to stderr instead of stdout. In some batch jobs, I was getting errors like the following which were showing up in stdout, and it would be easier to track them down if they showed up in stderr:

CCDB provider unable to connect to sqlite:///home/gxproj3/calib_challenge/ccdb.sqlite. Aborting command. Exception details: (OperationalError) unable to open database file None None
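The requested behavior is the standard pattern of routing diagnostics to the error stream. A minimal sketch (the helper name is hypothetical):

```python
import sys

def report_error(message):
    """Write diagnostics to stderr so batch-job stdout stays clean."""
    print(message, file=sys.stderr)

report_error("CCDB provider unable to connect. Aborting command.")
```

Batch systems typically capture stdout and stderr in separate files, so this one change would make the failing jobs easy to grep for.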

Problems using low-level API to write tables to SQLite files

I've run into a possible problem using the python low-level API to add assignments to an SQLite file. This code had worked up until about a month ago, for what it's worth.

I've attached a python script that reproduces the problem, along with the output that it produces.

I've been able to reproduce the bug using several SQLite files on various locations in the JLab CUE. It works fine with the MySQL master DB.

example.zip

[C++ API] Slow GetValue

Reading columns with different types, like this:

    auto_ptr<Assignment> calibModel(calib->GetAssignment(dcc.database));

    for (size_t rowI = 0; rowI < calibModel->GetRowsCount(); rowI++)
    {
        cout << "  par Name: "  << calibModel->GetValue(rowI, 1)
             << "  par Value: " << calibModel->GetValue(rowI, 2) << endl;
    }

This looks slower than when all the columns have the same type. Or is it maybe because I'm reading every item in every column?

It seems that filling a vector<vector> with the whole table and reading from that is faster, but I'm not sure.
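One plausible explanation is that a generic per-cell accessor repeats type inspection and conversion on every call, while copying the table into a typed container once pays that cost a single time. A language-agnostic sketch of the two patterns (in Python, with made-up data; this is not the CCDB C++ API):

```python
# Illustrative comparison: per-cell generic access vs. caching the table.
RAW = [["gain_0", "3.3e-05"], ["gain_1", "6.8e-05"]]  # name, value as strings

def get_value(row, col):
    """Generic accessor: re-inspects and converts the cell on every call."""
    cell = RAW[row][col]
    return float(cell) if col == 1 else cell

# Faster pattern: materialise the typed table once, then index freely.
table = [[row[0], float(row[1])] for row in RAW]
print(table[1][1])
```

Mixed-type rows force the conversion branch on every cell, which matches the observation that same-type columns read faster.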

Deprecated tables

Mark tables as deprecated.
Deprecated tables would not be shown by the ccdb command by default, but would still exist.
To see all tables, the --show-all flag should be passed to ccdb.

Tools for moving data between variations

These would just help streamline usage. Things like tools for copying assignments from one variation to another, and assigning the average values determined over one run range in a "working" variation to that run range in another variation.
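The two tools described above can be sketched with plain dicts standing in for the database (helper names and data are hypothetical):

```python
# Hypothetical helpers for moving data between variations:
# copy an assignment, and average a "working" run range column-wise.

# (variation, run) -> list of constants
DB = {
    ("working", 100): [1.0, 2.0],
    ("working", 101): [3.0, 4.0],
}

def copy_assignment(db, src_var, dst_var, run):
    """Copy one run's constants from src_var to dst_var."""
    db[(dst_var, run)] = list(db[(src_var, run)])

def average_range(db, src_var, runs):
    """Column-wise average of constants over a run range."""
    rows = [db[(src_var, r)] for r in runs]
    return [sum(col) / len(col) for col in zip(*rows)]

copy_assignment(DB, "working", "default", 100)
print(average_range(DB, "working", [100, 101]))  # [2.0, 3.0]
```

The averaged row could then be written to the target variation with the same copy mechanism.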

Better user rights system

Required by #7 and #6

Users will be added directly to the SQLAlchemy model classes, so one can use something like:

table.creator.name
variation.creator.logs

"log" command enhancements

I find myself using the log command a lot, and it would be nice to have the following features:

  • ability to set which fields are displayed
  • ability to filter on various fields (e.g. certain user names, table names, dates, etc.)

Use of cat --id

Using cat --id command corrupts the command line interface ("No row was found for one()").


Lustre issues?

I'd forgotten, with the latest releases, are we moving towards Lustre compatibility?

I'd had some questions sent my way recently in which there were problems with SQLite DBs, which may be due to a filesystem problem, but the usual errors were not thrown. If the diagnosis is correct, I'll open another issue, but wanted to check on this point first.

Adding blame-info in "vers" output

It would be really useful to have the name of the person who added the constants listed in the "vers" command output. Right now, to find this information, one has to cross-reference a couple of different commands.

Comment dump fix

Do we need a new tag to include the fix to the issue that Elton reported? The one about comments not being dumped out properly?

Management ideas

This note contains a few different ideas and is posted for the sake of discussion. It should probably be broken up into some actual issues for implementation.

There are a few improvements that would make calibration processing easier.

  1. Getting data into the CCDB.
    Right now, the GlueX jobs generate calibrations for ~20-30 tables for each production run. It would be nice to do further data reduction in the CCDB framework. I'd prefer to automatically add all of this information in the jobs to the main CCDB in some specified variation. I'm hesitant to do so since it would spam up the logs and potentially inflate the size of the DB. So I see a few possible solutions:
    1. Just do the work in an SQLite CCDB - this can be a headache, with many jobs potentially writing to the same file at once. I could write some scripts to do post-processing, though.
    2. Hide the commits with improved logging. Maybe only show changes to the default variation by default?
    3. Meta-commits. One could also imagine building a commit where, instead of just applying one set of constants to a table for a given run range, one constructs a mapping of constants files to particular runs (or ranges of runs?) and adds those to the DB all at once. This might be an overcomplication, though.

[Note that some constants require data from multiple runs. These procedures are still mainly in the hands of the experts at the moment.]

  2. Analysis of data in the CCDB.
    Once we've determined constants for the individual runs, assuming they are put in the CCDB, we'd like to monitor them and determine values for certain subranges (if need be). So, the following would be useful:
    1. Time series: plots that show the variation of constants as a function of run. These should be able to show individual constants as well as summary values (e.g. for individual channel timings, it is useful to see the mean, std. dev., quartiles, etc.).
    2. Visualization of comparison between two different variations. Here the idea is to show the difference in values between two variations. For example, let's say one has a "working" variation and is trying to decide what to commit to the default variation for physics analysis. One wants to see the differences between the variations for a given set of constants, and summary values for tables with many entries. These could be shown for many tables in a given run, or for many runs for a given table. This is very key in figuring out which runs need recalibration.
    3. Tools for moving data between variations. These would just help streamline usage. Things like tools for copying assignments from one variation to another, and assigning the average values determined over one run range in a "working" variation to that run range in another variation.

Questions about data model

Just a few questions...

Is it by design that assignments to a table must always have a fixed number of rows? For instance, I have a bad channel set that has a variable number of rows. Would it be better to model this so that every channel has a value with a boolean value so that every dataset has the same number of rows?

Can I assign multiple independent run ranges to the same set of calibration data? Or should I instead duplicate the data for the new run range?

Thanks.
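The fixed-row modelling the question proposes (one row per channel with a boolean status column) can be sketched like this; the data and names are illustrative:

```python
# Sketch of fitting a variable-length bad-channel list into a fixed-shape
# table: one row per channel with a boolean "is_bad" flag, so every
# assignment has the same number of rows.

N_CHANNELS = 8
bad_channels = {2, 5}  # variable-length input

# Fixed-shape table: always N_CHANNELS rows of (channel_id, is_bad)
table = [(ch, ch in bad_channels) for ch in range(N_CHANNELS)]

# Recovering the variable-length set at readout is trivial.
recovered = {ch for ch, is_bad in table if is_bad}
print(recovered)  # {2, 5}
```

The trade-off is a constant table shape (which suits a fixed-row schema) against storing a row for every channel, including the good ones.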

Error message when modifying assignment

When modifying an assignment using the low-level API, the command completes successfully and the database is modified, but the following error message is thrown:

Traceback (most recent call last):
File "fix_run_range.py", line 12, in
provider.update_assignment(assignment)
File "/group/halld/Software/builds/Linux_CentOS6-x86_64-gcc4.9.2/ccdb/ccdb_1.06.01/python/ccdb/provider.py", line 1252, in update_assignment
affected_ids=[assignment.tablename + assignment(assignment.id)],
TypeError: 'Assignment' object is not callable

Visualization of comparison between two different variations

Visualization of comparison between two different variations. Here the idea is to show the difference in values between two variations. For example, let's say one has a "working" variation and is trying to decide what to commit to the default variation for physics analysis. One wants to see the differences between the variations for a given set of constants, and summary values for tables with many entries. These could be shown for many tables in a given run, or for many runs for a given table.

[Java] Any Java examples available?

Are there Java examples available anywhere showing how to use this API?

It would be especially helpful to see simple examples of adding, selecting, modifying and deleting data sets from the DB.

Thanks.

JS API

JS API for CCDB. Let's specify:

  • the request
  • the response

What should be there, etc.

CCDB allows inappropriate types for columns

This works:

ifarm1101> ccdb mktbl /calibration/dc/signal_generation/intrinsic_inefficiency -r 6 parameter3=double parameter4=float 
/group/clas12/bin/ccdb/sqlalchemy/engine/default.py:425: Warning: Data truncated for column 'columnType' at row 1 saving table to database...  completed
ifarm1101> ccdb cat /calibration/dc/signal_generation/intrinsic_inefficiency 
+-------------------------+
| parameter3 | parameter4 | 
| double     |            | 
...

But it should not.

[C++] scons install

Implement

scons install

If no prefix is given, copy to the standard system folders.

Time series

Plots that show the variation of constants as a function of run. These should be able to show individual constants as well as summary values (e.g. for individual channel timings, it is useful to see the mean, std. dev., quartiles, etc.).
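The per-run summary values mentioned above can be computed with the standard library alone. A sketch with made-up timing constants:

```python
# Sketch of per-run summaries (mean, std. dev., quartiles) for a time
# series of constants; the run numbers and values are illustrative.
import statistics

# run -> channel timing constants
runs = {
    1001: [1.0, 2.0, 3.0, 4.0],
    1002: [2.0, 3.0, 4.0, 5.0],
}

def summarize(values):
    """Summary statistics for one run's constants."""
    q1, q2, q3 = statistics.quantiles(values, n=4)
    return {
        "mean": statistics.mean(values),
        "stdev": statistics.stdev(values),
        "quartiles": (q1, q2, q3),
    }

for run in sorted(runs):
    print(run, summarize(runs[run]))
```

Plotting these summaries against run number gives exactly the time-series view requested.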

Error when creating directory in ccdb command line tool

When I execute this in the command shell

/> mkdir /test2

I get this error with a blank exception message

Failed to create directory. Exception message:

I am using a copy of the sqlite data file provided in the ccdb git project e.g.

ccdb -c sqlite:///$PWD/src/main/sql/ccdb.sqlite -i

Several other commands such as ls, etc. seem to work fine.

Am I able to create new tables via this interface or no?

Thanks.

Problems with SQLite files on the ifarm?

I've been working on some calibrations on the ifarm machines, using SQLite CCDB files. When I make changes to these files, it looks like they are modified, yet ccdb log shows no changes.

An example file is here: /u/scratch/sdobbs/ccdb.sqlite
I changed the /PHOTON_BEAM/RF/time_offset table, but this change doesn't show.

password prompt when adding variation

This one came in over email from Sean Dobbs (@sdobbs):

When adding the variation for the 2016 simulations on the ifarm, it asked me for a password for some reason [terminal capture at the end of the email]. I put in "ccdb" and that worked fine, but I'm not sure if it's supposed to do that.

ifarm1102> ccdb mkvar mc_sim1 -p mc
Variation mc_sim1 created
Enter MySql password:
Password:

Better Control of Timestamps on Ancestor Variations

@DraTeots and I ( @markito3 ) discussed this yesterday.

ccdb_ancestor_time

The issue is that since any given variation may have one or more ancestors (parents, parents of parents, etc.), the user may want to have different calibration times (CALIBTIME or historical timestamp) for the variation being used and each of its ancestors. For example when working on a TOF calibration using the "tofcal" variation, one may want to have a fixed version of all constants not associated with the TOF, i. e., not explicitly named in the "tofcal" variation. If tofcal's parent is "default", then the user would want to use a fixed version of "default", identified by date, but always use the latest version of the "tofcal" variation. Currently the only behavior available is the opposite of this use case; the user would get the latest version of "default" and can only specify a fixed CALIBTIME for "tofcal".

The proposed solution has two parts:

  1. Make another signature-differentiated version of the SetTime function of the API. The current version takes only a time as an argument. The new version takes a time and a variation name.
  2. Add a new option to the JANA_CALIB_CONTEXT parameter: VARTIME, which specifies a variation and a time, e.g., VARTIME=mc:2016-04-01. Multiple instances of VARTIME could appear. The implementation would use the API function defined above.
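A sketch of parsing the proposed VARTIME=variation:timestamp tokens into a per-variation calibration-time map (the token name and format come from the proposal; the parser itself is hypothetical):

```python
# Parse the proposed VARTIME=<variation>:<date> context tokens into a
# per-variation calibration-time map. Illustrative, not the real JANA code.
from datetime import datetime

def parse_context(context):
    """Parse e.g. 'variation=tofcal VARTIME=default:2016-04-01'."""
    var_times = {}
    for token in context.split():
        if token.startswith("VARTIME="):
            name, _, stamp = token[len("VARTIME="):].partition(":")
            var_times[name] = datetime.strptime(stamp, "%Y-%m-%d")
    return var_times

print(parse_context("variation=tofcal VARTIME=default:2016-04-01"))
```

In the TOF use case above, this would pin "default" to a fixed date while leaving "tofcal" at the latest version.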

Errors running LLAPI example?

I've run into some errors that I hadn't before when using the low-level API. Maybe the best illustration is using the sample script $CCDB_HOME/examples/llapi_readout.py

I tried running on ifarm1401 with python2.7 using ccdb_1.06.01 and ccdb_1.06.02 and get the following output:

=======================

ifarm1401> python llapi_readout.py
<Directory 3 'test_vars'>
test_vars
/test/test_vars
<Directory 1 'test'>
[]
== TABLE == 'test_table'
/test/test_vars/test_table
Test type
2014-04-10 17:20:28
test_vars
[<ConstantSet '1'>, <ConstantSet '2'>, <ConstantSet '4'>, <ConstantSet '5'>]
x y z
double double double
rows 2 x 3 columns
2
== TABLE == 'test_table2'
/test/test_vars/test_table2
Test type 2
2014-04-10 17:20:28
test_vars
[<ConstantSet '3'>]
c1 c2 c3
int int int
rows 1 x 3 columns
2

== Getting tables another way ==
/test/test_vars/test_table2
test_table
test_table2
test_table
test_table2

== Getting all table data ==
test
0
2147483647
/test/test_vars/test_table2:0:test:2012-09-30_23-48-42
2
[[u'10', u'20', u'30']]
[u'10', u'20', u'30']

== Getting assignment ==
Traceback (most recent call last):
File "llapi_readout.py", line 91, in
assignment = provider.get_assignment(1, "/test/test_vars/test_table2", "test") # run, table, variation
File "/u/group/halld/Software/builds/Linux_CentOS6-x86_64-gcc4.9.2/ccdb/ccdb_1.06.02/python/ccdb/provider.py", line 1026, in get_assignment
assert isinstance(path_or_table, TypeTable)
AssertionError

[C++] Memory Leak

There seems to be a memory leak in DataProvider where it does not delete mAuthentication in the destructor.

Option to disable anonymous checkins from CCDB command line tool

It would be nice to disable anonymous checkins from the CCDB command line tool in some way. Ideally this would be the default behavior.

Since the name associated with a change is taken from CCDB_USER, many people forget to properly set this (despite sustained nagging), which is beginning to make changes difficult to manage.
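The requested guard amounts to refusing a check-in when CCDB_USER is unset. A minimal sketch (the function name and flag are hypothetical, not part of the real CLI):

```python
# Sketch of a guard against anonymous check-ins: require CCDB_USER unless
# anonymous access is explicitly allowed.
import os

def resolve_user(env=os.environ, allow_anonymous=False):
    """Return the committing user's name, or raise if it is not set."""
    user = env.get("CCDB_USER")
    if user:
        return user
    if allow_anonymous:
        return "anonymous"
    raise RuntimeError("CCDB_USER is not set; refusing anonymous check-in")

print(resolve_user({"CCDB_USER": "sdobbs"}))  # sdobbs
```

Making the strict behavior the default, with an explicit opt-out flag, matches the "ideally this would be the default" request above.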

[Java] Column data with 'long' type cannot be retrieved

I have a simple main where I'm trying to read long values from a column in Java:

import java.util.Vector;

import org.jlab.ccdb.Assignment;
import org.jlab.ccdb.CcdbPackage;
import org.jlab.ccdb.JDBCProvider;
import org.jlab.ccdb.TypeTableColumn;

public class ReadTest {
        
    static final String CONNECTION = "sqlite:////u/ey/jeremym/hps-dev/ccdb-scratch/scratch/ccdb.sqlite";
    static final int RUN = 5772;
    static final String TABLE = "/ECAL/calibrations";
    
    public static void main(String[] args) {
        
        JDBCProvider provider = CcdbPackage.createProvider(CONNECTION);
        provider.connect();
        provider.setDefaultRun(RUN);
        Assignment a = provider.getData(TABLE);
        
        Vector<TypeTableColumn> typeTable = a.getTypeTable().getColumns();
        for (TypeTableColumn col : typeTable) {
            System.out.println(col.getName() + ":" + col.getCellType());
        }
        
        Vector<Long> channelIds = a.getColumnValuesLong(0); // Throws exception but column is actually a long!
        Vector<Double> pedestals = a.getColumnValuesDouble(1);
        Vector<Double> noise = a.getColumnValuesDouble(2);
        
        int len = channelIds.size();
        for (int i = 0; i < len; i++) {
            System.out.println(channelIds.get(i) + " " + pedestals.get(i) + " " + noise.get(i));
        }
        
        provider.close();
    }
}

The column info in the db looks like:

+------------------------------------------+
| Columns info                             |
+------------------------------------------+

Columns info
 N.   (type)    : (name)
 0    long      : ecal_channel_id
 1    double    : pedestal
 2    double    : noise

The test does not work though. The Java API is not able to read back the long values, e.g.

Exception in thread "main" java.lang.NumberFormatException: For input string: "1L"
        at java.lang.NumberFormatException.forInputString(NumberFormatException.java:65)
        at java.lang.Long.parseLong(Long.java:589)
        at java.lang.Long.parseLong(Long.java:631)
        at org.jlab.ccdb.Assignment.getColumnValuesLong(model.kt:355)
        at ReadTest.main(ReadTest.java:26)

The Java API seems to know the correct column types though:

ecal_channel_id:long
pedestal:double
noise:double

Any idea why this might be?

I was seeing similar problems for int columns as well.

This is using Java 1.8 with the CCDB master and Python 2.7 (I'm suspecting there's an issue here with python 2.7 adding the 'L' to these values).
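The suspicion fits the stack trace: under Python 2, repr() of a long appends an 'L' suffix ("1L"), which Java's Long.parseLong rejects. A sketch of a tolerant parse that would work around it (the function name is hypothetical):

```python
# Sketch of tolerating the Python 2 long repr suffix: "1L" -> 1.
# Java's Long.parseLong("1L") throws NumberFormatException, which matches
# the traceback above; stripping a trailing 'L'/'l' before parsing avoids it.

def parse_ccdb_long(text):
    """Parse an integer cell, tolerating a Python 2 long suffix like '1L'."""
    text = text.strip()
    if text and text[-1] in "Ll":
        text = text[:-1]
    return int(text)

print(parse_ccdb_long("1L"))  # 1
print(parse_ccdb_long("42"))  # 42
```

The cleaner fix would be on the write side (store "1", not repr of a Python 2 long), but a tolerant reader keeps existing databases usable.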

Very slow startup with CCDB sqlite file

Mark Dalton recently told me of an issue with hd_root taking several minutes to start
processing events when running on the gluons. After a little investigation, I was able
to reproduce the problem in my own account, but only if using an sqlite file for CCDB.
If I use mysql, then it starts up normally in just a few seconds.

Has anyone else observed this recently? I have successfully used sqlite for CCDB
quite a bit so I'm suspicious this is not a global issue but wanted to check with others
before fully escalating it to defcon Romanov.

-David

https://groups.google.com/forum/#!topic/gluex-software/stOBsyHJsoE
