
pandra's People

Contributors

mjpearson


pandra's Issues

Use of undefined constant UUID_FMT_BIN - assumed 'UUID_FMT_BIN' in /Applications/MAMP/htdocs/app/mjpearson-Pandra-d709595/lib/ColumnContainer.class.php on line 438

Odds are I am doing something wrong but when using:

$keyspace = 'test';
$column_family = 'db';
$var = 'app';

$cf = new PandraColumnFamily();
$cf->setKeySpace($keyspace);
$cf->setName($column_family);
$cf->setType(PandraColumnContainer::TYPE_STRING);
$cf->addColumn($var, 'string');

With Cassandra ColumnFamily definition of:

<ColumnFamily Name="db" CompareWith="UTF8Type" />

I get the following PHP Notice error:

Notice: Use of undefined constant UUID_FMT_BIN - assumed 'UUID_FMT_BIN' in /Applications/MAMP/htdocs/app/mjpearson-Pandra-d709595/lib/ColumnContainer.class.php on line 438

Use PECL UUID (if present) instead of PHP UUID

On the note of high performance, there is a very nice PECL UUID generator, considerably better featured than the PHP UUID class.

While PHP UUID is very useful for those who don't have/can't use PECL (and for pure class reasons), performance dictates otherwise, especially if we need to generate many UUIDs quickly and with very low system overhead.

Installation:
http://pecl.php.net/package/uuid
Get the latest file, untar

cd uuid-1.0.2
phpize
./configure
make
make install
cp /usr/lib64/extensions/no-debug-non-zts-20090626/uuid.so /usr/lib64/php/modules
(for Centos x64, will differ by distro)

add uuid.so to /etc/php.ini or /etc/php.d/uuid.ini

After that, hand UUID generation off to the extension when it is available:

if (function_exists('uuid_create')) {
    // requires PECL's uuid compiled and added to php.ini
    // http://pecl.php.net/package/uuid
    // fully RFC-compliant, uses the MAC address
    $uuid = uuid_create(1);
} else {
    // use the PHP UUID class from
    // http://www.shapeshifter.se/2008/09/29/uuid-generator-for-php/
}

Besides the pure performance difference, the PECL extension also generates strictly compliant Type 1 UUIDs, which use the timestamp and MAC address in generation (the latter can be quite cumbersome to get in PHP directly without system calls).

In addition, the PECL extension provides a few very nifty functions, such as uuid_timestamp() and uuid_mac() to retrieve, respectively, the timestamp and the generator's MAC address from a type 1 UUID, as well as uuid_is_valid() and uuid_is_null(), which let you validate UUIDs without using preg_match (again, for performance reasons).
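A minimal sketch of the fork described above, assuming PHP 7+ for random_bytes(). The generateUuid() and isValidUuid() names are hypothetical. The fallback is a swapped-in technique: it builds a random version 4 UUID instead of calling the shapeshifter.se class, so it is not time-based and is no substitute where TimeUUIDs are required.

```php
<?php
// Hypothetical helper: prefer PECL uuid, else fall back to a random v4 UUID.
function generateUuid(): string
{
    if (function_exists('uuid_create')) {
        // PECL uuid; 1 is the time-based type used in the snippet above
        return uuid_create(1);
    }
    // Fallback (NOT time-based): RFC 4122 version 4 from random_bytes()
    $bytes = random_bytes(16);
    $bytes[6] = chr((ord($bytes[6]) & 0x0f) | 0x40); // version 4
    $bytes[8] = chr((ord($bytes[8]) & 0x3f) | 0x80); // RFC 4122 variant
    $hex = bin2hex($bytes);
    return sprintf('%s-%s-%s-%s-%s',
        substr($hex, 0, 8), substr($hex, 8, 4), substr($hex, 12, 4),
        substr($hex, 16, 4), substr($hex, 20, 12));
}

// Hypothetical validator: uuid_is_valid() when PECL is loaded, else a regex.
function isValidUuid(string $uuid): bool
{
    if (function_exists('uuid_is_valid')) {
        return uuid_is_valid($uuid);
    }
    return preg_match(
        '/^[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}$/i',
        $uuid) === 1;
}
```

Keeping both paths behind function_exists() means the same code runs on hosts with and without the extension.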

PandraCore: Set socket Timeout

I'm using Pandra on a VPN.

I've got the following error:

Warning: pfsockopen(): unable to connect to cassandra.somewhere.com:9160 (Operation timed out) in /.../Pandra/lib/thrift/transport/TSocket.php on line 176:

TSocket.php (line:176):

$this->handle_ = @pfsockopen($this->host_,
                               $this->port_,
                               $errno,
                               $errstr,
                               $this->sendTimeout_/1000.0);

The problem is due to the connection timeout reached by the socket, since the default timeout is:

TSocket.php (line:57):

 /**
  * Send timeout in milliseconds
  *
  * @var int
  */
  private $sendTimeout_ = 100;

So the socket gets 0.1 seconds to connect. That is fine on a LAN, where it is enough time to establish a connection.
Over a VPN you may want to increase it to allow for the longer network round trip.

So I modified PandraCore::connect to expose these settings and set them:

Core.php (line:219):

            // Create Thrift transport and binary protocol cassandra client
            $tSocket = new TSocket($host, $port, PERSIST_CONNECTIONS, 'PandraCore::registerError');

            // LP: set send and receive timeouts (milliseconds), for relaying over VPN tunnels
            $tSocket->setSendTimeout(THRIFT_PORT_SOCKET_TIMEOUT);
            $tSocket->setRecvTimeout(THRIFT_PORT_SOCKET_TIMEOUT);

            $transport = new TBufferedTransport($tSocket, 1024, 1024);
            $transport->open();

with the defines in the config file:

config.php (line:13):

define('THRIFT_PORT_DEFAULT', CASSANDRA_THRIFT_PORT_DEFAULT);
define('THRIFT_PORT_SOCKET_TIMEOUT', CASSANDRA_THRIFT_TOUT);

This way you can tune the socket timeouts to suit your network configuration.

Load and populate methods

This "issue" is going to be long, to write and to read. It's more a thought, an opinion I would like to share with you.

So, I have been playing with Pandra for two days. Installation and the basic examples are OK. Now I want to make some more sophisticated requests using the Thrift functions get_slice, multiget_slice and get_range_slice(s? Cassandra 0.6). So I looked for their equivalents in Pandra, and found them, more or less, as PandraCore::getRangeKeys, PandraCore::getCFSliceMulti and PandraCore::getCFSlice.

So I started to play with these three methods. Then I realized that they return basic Thrift objects, I mean, not wrapped by Pandra. I asked myself how the author (that means you :)) uses the PandraCore::* methods. Then I saw in the code that, after each call to these methods, you pass the result to a method called populate() (defined in ColumnContainer and SuperColumn).

Likewise, populate() is always called in load(), just after the calls to PandraCore::*.

So my point is: the cast from Thrift "raw" objects to Pandra "magic" objects should be done in the PandraCore::* methods. PandraCore::* should call a Core method that does the type cast and returns a nice Pandra object (not these basic, ugly :p Thrift ones). It's like moving the populate() method(s) into the Core class and giving an access point to the "casting method" (because it now lives in the core class and is not "hidden" in two other classes).

This idea could need a bit of refactoring (which I am willing to do), but IMHO it would improve Pandra's overall architecture. As I am new to this project I might be missing some important points, but please think about it twice :) And I feel this idea could help you with your OQL, or at least that, like me, you feel the way (and the place where) the current casting/populating is done is not perfect.

Thanks for reading, I hope I made myself clear :)
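A rough sketch of the proposal under stated assumptions: CoreHydrator::hydrate() is a hypothetical single casting point, stdClass objects stand in for cassandra_ColumnOrSuperColumn results (the same public column, super_column, name, value and columns properties), and plain arrays stand in for Pandra container objects.

```php
<?php
// Hypothetical core-side casting point for raw Thrift-shaped results.
final class CoreHydrator
{
    /** @param object[] $raw ColumnOrSuperColumn-like objects */
    public static function hydrate(array $raw): array
    {
        $result = [];
        foreach ($raw as $cosc) {
            if ($cosc->column !== null) {
                // standard column: name => value
                $result[$cosc->column->name] = $cosc->column->value;
            } elseif ($cosc->super_column !== null) {
                // super column: name => [subcolumn name => value]
                $sub = [];
                foreach ($cosc->super_column->columns as $column) {
                    $sub[$column->name] = $column->value;
                }
                $result[$cosc->super_column->name] = $sub;
            }
        }
        return $result;
    }
}
```

With one entry point like this, getCFSlice()-style methods and load() would share the same conversion instead of each calling populate() separately.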

Suggestion - use built in thrift protocol load balancer for connectivity rather than tracking your own

I noticed you have a pretty sophisticated tracker of up/down hosts for multi-host setups, with active/round robin/random support.

In reality, given the eventually consistent nature of Cassandra, most people will likely want a true random approach (seldom round-robin and almost never active). Up to you of course, but it simplifies code maintenance a lot and improves readability.

I am in the middle of writing a highly optimized, performance-oriented read/write Cassandra CRUD layer for our needs, and noticed that Thrift supports internal randomized (or round-robin, but not "active") load balancing without much ado, and in addition does a very good job of downed-host detection using APC as an intermediary.

All you need to do is use a TSocketPool object instead of TSocket during cassandra object initialization.

So in this case, $transport = new TBufferedTransport(new TSocket($host, $port), 1024, 1024); will need to be replaced with $transport = new TBufferedTransport(new TSocketPool($hosts, $port), 1024, 1024); where $hosts is an array of hostnames/IPs. $port can also be an array but in Thrift's case it's expected to be a 1:1 host->port relationship, or a single unified port, so if you have 5 hosts, you can use a single unified port (such as default 9160) or an array of 5 ports, otherwise things may not work as expected.

TSocket/TSocketPool also seems to track open()/isOpen() internally, so it's probably not needed to do that either.

If you want a round-robin approach, you can get it with the setRandomize(false) method of TSocketPool.

See TSocket.php and TSocketPool.php for more options too.
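A small sketch of the host/port rule above, for validating configuration before constructing the pool. normalizeHostPorts() is hypothetical and does not call Thrift; it only checks that there is either one shared port or exactly one port per host, and pairs them up.

```php
<?php
// Hypothetical config check: one shared port, or exactly one port per host.
function normalizeHostPorts(array $hosts, $ports): array
{
    if (!is_array($ports)) {
        // single unified port (e.g. 9160) applied to every host
        $ports = array_fill(0, count($hosts), (int) $ports);
    }
    if (count($ports) !== count($hosts)) {
        throw new InvalidArgumentException(sprintf(
            '%d hosts but %d ports: use one shared port or one port per host',
            count($hosts), count($ports)));
    }
    // zip into [[host, port], ...]
    return array_map(null, array_values($hosts), array_values($ports));
}
```

Failing early here is easier to debug than a pool that silently pairs the wrong host with the wrong port.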

Using TimeUUIDType results in Warning

Using a TimeUUID results in:
Warning: Illegal offset type: /Pandra/lib/SuperColumnFamily.class.php on line 73

Used UUID:
ffe25e40-3749-11df-a2f4-096c4076c368

Examples Don't Appear to Work

I know this should be run via web server, but I get the same effect either way. Run via command line for clarity.

[root@staging examples]# php address_supercolumn.php
PHP Fatal error: Call to undefined method PandraSuperColumn::setValue() in /var/www/html/Pandra/lib/ColumnContainer.class.php on line 407

Tried writing my own code based on the examples with the same issue.

ossp UUID adding trailing null on bin to string conversion

The UUID examples fail on Debian:

object(cassandra_ColumnOrSuperColumn)[20]
  public 'column' => 
    object(cassandra_Column)[21]
      public 'name' => string 'IÞÚç\Ó�ß )@@)�Ã`' (length=16)
      public 'value' => string '5' (length=1)
      public 'timestamp' => int 1273564839236
  public 'super_column' => null

...
echo UUID::convert($column->name, UUID::UUID_FMT_STR);

49dedae7-5cd3-11df-a029-40402912c360\u0000
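A pure-PHP conversion sidesteps the ossp output entirely. binUuidToString() and stripTrailingNul() are hypothetical helpers, not part of Pandra's UUID class; the usage line converts the 16-byte name from the dump above.

```php
<?php
// Hypothetical: format 16 raw bytes as the canonical 36-character UUID string.
function binUuidToString(string $bin): string
{
    if (strlen($bin) !== 16) {
        throw new InvalidArgumentException('expected 16 bytes, got ' . strlen($bin));
    }
    $hex = bin2hex($bin);
    return sprintf('%s-%s-%s-%s-%s',
        substr($hex, 0, 8), substr($hex, 8, 4), substr($hex, 12, 4),
        substr($hex, 16, 4), substr($hex, 20, 12));
}

// Hypothetical workaround when keeping the ossp output: drop trailing NUL bytes.
function stripTrailingNul(string $uuid): string
{
    return rtrim($uuid, "\0");
}

echo binUuidToString(hex2bin('49dedae75cd311dfa02940402912c360')), "\n";
// prints 49dedae7-5cd3-11df-a029-40402912c360
```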

Node auto-discover and Quorum consistency reads should downgrade on node failure

on initial connect try to auto discover additional nodes via thrift client get_string_property("token map")

where nodes have dropped from the pool during the instance's lifetime, downgrade any QUORUM reads to consistency ONE if the minimum of 2 nodes isn't available. The node id should also be dropped from the ACTIVE and RANDOM_APC connection pools.

  • note: drop ROUND connection select type
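A sketch of the downgrade rule, assuming QUORUM means floor(RF / 2) + 1 live replicas (with a replication factor of 3 that is the 2-node minimum above). CL_ONE and CL_QUORUM are local stand-ins for the cassandra_ConsistencyLevel values, and effectiveReadConsistency() is hypothetical.

```php
<?php
// Local stand-ins (assumption) for cassandra_ConsistencyLevel::ONE / QUORUM.
const CL_ONE = 1;
const CL_QUORUM = 2;

// Replicas needed for a quorum: floor(RF / 2) + 1.
function quorumSize(int $replicationFactor): int
{
    return intdiv($replicationFactor, 2) + 1;
}

// Hypothetical: downgrade QUORUM to ONE when too few replicas are live.
function effectiveReadConsistency(int $requested, int $liveReplicas, int $replicationFactor): int
{
    if ($requested === CL_QUORUM && $liveReplicas < quorumSize($replicationFactor)) {
        return CL_ONE; // a real client would also log the downgrade
    }
    return $requested;
}
```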

SuperColumn is not saved after a delete

I seem to have come across a bug with SuperColumns. I can add one successfully (i.e. add through Pandra and then go to the CLI and type get ks.cf['key']) However, if I delete the key by running "del ks.cf['key']" and then add that same column back through Pandra a subsequent get ks.cf['key'] always returns 0 results.

It seems that Pandra should add the SuperColumn back with that key, but maybe is getting tripped up because the row was "tombstoned"? FWIW, I can do a "set ks.cf['key']['super']['col'] = 'val'" from the CLI and the row shows up.

Michael, I have been digging through the lib folder to see if I can track the problem down, but I am still getting my bearings and haven't been able to find the culprit yet.

Minor bug in ColumnContainer.class.php (Line 254)

/**
 * autoCreate mutator
 * @param bool $autoCreate new mode
 */
public function setAutoCreate(bool $autoCreate) {
    $this->$_autoCreate = $autoCreate;
}

should be:

$this->_autoCreate = $autoCreate;

PandraCore::getRangeKey - Key range is overlapping keys

I'm using the key slice in this way on a Super Column Family:

$res = PandraCore::getRangeKeys(
    'myNS',
    array('start' => $start, 'finish' => ''),
    new cassandra_ColumnParent(array(
        'column_family' => 'mySCF',
    )),
    new PandraSlicePredicate(
        PandraSlicePredicate::TYPE_RANGE,
        array('start' => '',
              'finish' => '',
              'count' => $limit,
              'reversed' => true)),
    $limit
);

This is the only way I found, using the parameters $start and $limit, to achieve a sort of key range, slicing from $start=0 to $start=$limit.

But something is wrong, because I'm getting the same keys back across iterations:

Event: block:1 start:50 size:24 count:24 memory: 6.22 M

  • index:0 id 47ce2c4d31d8e4cdf8f8190e66be8b17 tag 0
  • index:1 id c58cafee1a1032ae4a54d41d342f4660 tag 0
  • index:2 id b4dfd506827bd2bfb4456d18b4d8e328 tag 0
  • index:3 id 450b9461e2a84c99149fd7b1a9c10ec4 tag 0

The following keys will be repeated in the next loop:

  • index:4 id 06c6e7ab3c8b2745716df257ac8fcdf9 tag 0
  • index:5 id __37cb2c60594cd5b8cb9419f275f2e2 tag 0
  • index:6 id 65c9eb832df73c5462fc69514fd2baac tag 0
    (...omissis)
  • index:23 id 49721dbb746319b5548e3b455622317f tag 0

And the repeating keys block:
Event: block:0 start:26 size:20 count:44 memory: 6.22 M

  • index:24 id 06c6e7ab3c8b2745716df257ac8fcdf9 tag 0
  • index:25 id __37cb2c60594cd5b8cb9419f275f2e2 tag 0
  • index:26 id 65c9eb832df73c5462fc69514fd2baac tag 0
    (...omissis)
  • index:43 id 49721dbb746319b5548e3b455622317f tag 0

Since the block total size should be 50, from the second block (I'm reading from the last one), starting from 26:
Event: block:0 start:26 size:20 count:44 memory: 6.22 M

I should have a new key range.
What's happening is that key ranges are overlapping in some way.

Any idea?
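Two things may be worth checking here, assuming getRangeKeys() passes 'start' through to Cassandra as the start key: that parameter is a key rather than a numeric offset, and it is inclusive, so a page that starts from the last key already seen returns that key again. A sketch of key-based paging that skips the repeat; $fetch is a hypothetical callable standing in for getRangeKeys() and must return keys in ring order.

```php
<?php
// Hypothetical: collect all keys when each call's start key is inclusive.
function collectAllKeys(callable $fetch, int $pageSize): array
{
    if ($pageSize < 2) {
        throw new InvalidArgumentException('page size must be at least 2');
    }
    $keys = [];
    $start = '';                        // '' = beginning of the ring
    do {
        $page = $fetch($start, $pageSize);
        $isLastPage = count($page) < $pageSize;
        if ($keys && $page && $page[0] === $start) {
            array_shift($page);         // inclusive start: seen on the previous page
        }
        foreach ($page as $key) {
            $keys[] = $key;
        }
        if ($page) {
            $start = end($page);        // next page starts at the last key seen
        }
    } while (!$isLastPage && $page);
    return $keys;
}
```

Each page after the first yields at most $pageSize - 1 new keys, which is why a page size of 1 is rejected.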

Framed transport

We are currently using framed transport for thrift. Pandra has support for this, but it would be nice to make this easily configurable. To be able to configure the transport type we made the following adjustments:

in file config.php:

define('THRIFT_TRANSPORT_BUFFERED', 1);
define('THRIFT_TRANSPORT_FRAMED', 2);
define('THRIFT_TRANSPORT', THRIFT_TRANSPORT_FRAMED);

in file lib/Core.class.php, in both functions connect() and connectSeededKeyspace():

$socket = new TSocket($host, $port, PERSIST_CONNECTIONS, 'PandraCore::registerError');
if (THRIFT_TRANSPORT == THRIFT_TRANSPORT_BUFFERED) {
    $transport = new TBufferedTransport($socket, 1024, 1024);
} else {
    // TFramedTransport takes ($transport, $read, $write) flags, not buffer sizes
    $transport = new TFramedTransport($socket);
}

And finally change the definition of the function _authOpen in file lib/Core.class.php to:

static private function _authOpen(TTransport &$transport, $keySpace)

Using TBinaryProtocolAccelerated (with additional requirements) instead of TBinaryProtocol at initialization greatly speeds up I/O on Cassandra

For production needs, on properly configured Thrift clients (described below), it is best to use TBinaryProtocolAccelerated, which does binary serialization in a native extension and in turn significantly speeds up I/O between a Thrift/PHP client and a Cassandra cluster.

in Pandra.class.php replace
'client' => new CassandraClient(new TBinaryProtocol($transport))
with
'client' => new CassandraClient((function_exists("thrift_protocol_write_binary") ? new TBinaryProtocolAccelerated($transport) : new TBinaryProtocol($transport)))

There are a few things that need to happen for this to work (i.e. "properly configured thrift clients")

  1. PECL APC must be compiled and added to php.ini (or /etc/php.d/apc.ini)
  2. thrift_protocol.so [which provides thrift_protocol_write_binary()] must be compiled and added to php.ini (or /etc/php.d/thrift.ini)

APC is well documented. To compile thrift_protocol.so, run the following in thrift-php/ext/thrift_protocol:
phpize
./configure
make
make install
cp /usr/lib64/extensions/no-debug-non-zts-XXXXXXXX/thrift_protocol.so /usr/lib64/php/modules

(XXXXXXXX will vary depending on your php version)

edit php.ini or (preferred) /etc/php.d/thrift.ini to include "extension=thrift_protocol.so"

After that, phpinfo() will contain a thrift protocol section (Version 1.0) and function_exists("thrift_protocol_write_binary") will return true.

In my testing, thrift_protocol + APC gave roughly four times the throughput on very heavy writes (~80,000 per second vs ~20,000 using the fallback mechanism).

php5-uuid on Mac OS X

This is not a code issue, it's more a setup problem that is project related. I'm developing on Linux and Mac OS X. On Ubuntu it's easy to install php5-uuid like you described in the documentation to use your UUID class. But does anyone have an idea how to install it under Mac OS X?

Pandra sometimes not adding

Assuming $email = '[email protected]';

        $this->scf->setKeyID($email);
        $info = new PandraSuperColumn('info');
        $info->addColumn('password', 'string');
        $info->addColumn('name', 'string');
        $info->addColumn('userlevel', 'string');
        $info->setColumn('password', md5($password));
        $info->setColumn('name', $name);
        $info->setColumn('userlevel',$userlevel);
        $this->scf->addSuper($info);
        if(!$this->scf->save()){
            die;
        }

If I run this code, then delete the record in the Cassandra CLI

        cassandra> del Keyspace.Users['[email protected]']

Then, if I re-run the above PHP code, the row is not added. I am, however, able to add it via the Cassandra CLI at that point.

Impossible to save a column with an empty value

Code Sample :
// Create column without validator
$c = new PandraColumn('aNonExistentCol', 'isempty');
$c->setValue('');
$c->setParent(new PandraColumnFamily('myKeyID', 'Keyspace1', 'Standard1'));
var_dump($c->isModified());
$c->save();

This displays bool(false) whereas it should display bool(true). As a consequence, the column is not saved.

IMHO the problem is in Column.class.php line 153 (method setValue()) :
if ($this->value == $value) return TRUE;
It should be
if ($this->value === $value) return TRUE;
Because, as I am sure you know, in PHP NULL == '' is true, and cassandra_Column initializes the 'value' property to NULL.
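A minimal illustration of the comparison problem. ColumnSketch is a hypothetical stand-in for PandraColumn that starts with a NULL value like cassandra_Column; the only difference between the two calls is loose (==) versus strict (===) comparison.

```php
<?php
// Hypothetical stand-in for a column with a "modified" flag.
final class ColumnSketch
{
    public $value = null;      // like cassandra_Column, starts as NULL
    private $modified = false;

    public function setValue($value, bool $strict = true): bool
    {
        // loose: NULL == '' is true, so '' looks like "no change"
        $unchanged = $strict ? ($this->value === $value) : ($this->value == $value);
        if ($unchanged) {
            return true;
        }
        $this->value = $value;
        $this->modified = true;
        return true;
    }

    public function isModified(): bool
    {
        return $this->modified;
    }
}

$loose = new ColumnSketch();
$loose->setValue('', false);
var_dump($loose->isModified());   // bool(false): the save would be skipped

$strict = new ColumnSketch();
$strict->setValue('');
var_dump($strict->isModified());  // bool(true)
```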

TYPE_LONG: (Possible) Fix for Packing to 8-byte long (big-endian) binary

It seems that using CF with type long is not working at all.

I defined this CF:

class RollupCheckpoint extends PandraColumnFamily {

// keyspace in storage.conf
var $keySpace = 'CheckPoints';

// Column name
var $columnFamilyName = 'RollupCheckpoint';

public function init() {
    $this->setKeySpace($this->keySpace); // keyspace
    $this->setName($this->columnFamilyName); // name
    $this->setType(PandraColumnFamily::TYPE_LONG);
}

}

Then I'm inserting new column this way:

$rollupObj = new RollupCheckpoint();
$rollupObj->setKeyID( self::pack_longtype($rollupTS) );
$info = array(
    't'               => $rollupTS,
    'checkpoint'      => $checkpointTS,
    'last-checkpoint' => $lastCheckpointTS,
);

foreach ($info as $name => $value) {
    $rollupObj->addColumn($name)->setValue($value); // add column to CF
}

$rollupObj->save();

Where two functions pack_longtype and unpack_longtype are from Cassandra FAQ:

http://wiki.apache.org/cassandra/FAQ#a_long_is_exactly_8_bytes

Pandra is responding:

 Warning: pack(): Type N: too few arguments in /Library/WebServer/Documents/logger /phplib/standalone/Logger/lib/pandra/lib/ColumnContainer.class.php on line 485

So, I modified the function this way:

   protected function typeConvert($columnName, $toFmt) {
(...)
} else if ($this->_containerType == self::TYPE_LONG) {
        $columnName = UUID::isBinary($columnName) ?
                        /*unpack('NN', $columnName) :
                        pack('NN', $columnName);*/
                        self::unpack_longtype($columnName) :
                        self::pack_longtype($columnName);

    }

Before that fix (the commented-out code), no inserts were made in the CF.

After the fix, CF stats then were:

Column Family: RollupCheckpoint
SSTable count: 1
Space used (live): 381
Space used (total): 381
Memtable Columns Count: 3
Memtable Data Size: 99
Memtable Switch Count: 1
Read Count: 5
Read Latency: 0,059 ms.
Write Count: 6
Write Latency: 0,013 ms.
Pending Tasks: 0
Key cache capacity: 128
Key cache size: 0
Key cache hit rate: NaN
Row cache: disabled
Compacted row minimum size: 0
Compacted row maximum size: 0
Compacted row mean size: 0

So some inserts were made!

The row converted to JSON was:

{ row : {"t":1291078926,"checkpoint":1290606987,"last-checkpoint":1279022588}}

But when trying to read it:

   $rollupObj = new MXMRollupCheckpoint();
   $rollupObj->setKeyID( self::pack_longtype($rollupTS) );
   $rollupObj->load();

   Logger::getInstance()->debug( '{ row:'.$rollupObj->toJSON(True).'}' );

I got

  { row : ["1279022588"]}

As you can see, it lacks the NS and CF names requested by

   $rollupObj->toJSON(True)

but there's something inside of it.

So, what's happening with TYPE_LONG?
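On the warning itself: pack('NN', ...) expects two 32-bit arguments, so passing a single value triggers "too few arguments". A sketch of 8-byte big-endian packing with pack's 'J' format (unsigned 64-bit, big-endian), assuming PHP 5.6.3+ on a 64-bit build; packLong() and unpackLong() are hypothetical names, an alternative to the FAQ helpers rather than a copy of them.

```php
<?php
// Hypothetical: pack an integer into exactly 8 big-endian bytes (LongType).
function packLong(int $value): string
{
    return pack('J', $value);
}

// Hypothetical inverse; rejects anything that is not exactly 8 bytes.
function unpackLong(string $bytes): int
{
    if (strlen($bytes) !== 8) {
        throw new InvalidArgumentException('a long is exactly 8 bytes');
    }
    return unpack('J', $bytes)[1];
}

echo bin2hex(packLong(1291078926)), "\n"; // 16 hex digits = 8 bytes
```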

Call to undefined method TBinaryProtocol

Unexpected behavior on getRangeKeys:

Fatal error: Call to undefined method TBinaryProtocol::BBBB��M�3„Š��() in /.../pandra/lib/thrift/packages/cassandra/cassandra_types.php on line 1069

Code snippet:

$res = PandraCore::getRangeKeys(
    'myNS',
    array('start' => '', 'finish' => ''),
    new cassandra_ColumnParent(array(
        'column_family' => 'myCF',
    )),
    new PandraSlicePredicate(
        PandraSlicePredicate::TYPE_RANGE,
        array('start' => '',
              'finish' => '',
              'count' => '',
              'reversed' => false))
);

Any idea?

@todo validator extensions, consolidate oql against column containers

rewrite validator as event listeners via filter_var (FILTER_CALLBACK) to be added to keys, (super) column names and value set mutators. Add validators automatically in the model based on container types (utf8, long, uuid).

key and value validators will be superseded by enumerated validator scopes (VALIDATOR_KEY, VALIDATOR_NAME, VALIDATOR_VALUE) and enforced by the ColumnPathable interface.

Can probably drop /query or related clauses entirely and use validators for filtering results out from ranging queries in the column containers (dynamically added ranging or enumerating validators). This will significantly simplify the predicate graph and validation scheme in general. Query can just handle relations and hydration of containers over key ranges (ie: become an in-memory memcachable view)

also need to support ip filters (used in conjunction with ip2long callbacks for classful/subnet range queries), eg :

filter_var($ip, FILTER_VALIDATE_IP, FILTER_FLAG_IPV4 | FILTER_FLAG_NO_PRIV_RANGE | FILTER_FLAG_NO_RES_RANGE);
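A sketch of what callback-based validators and the IP filters could look like with stock PHP filter functions. validateWithCallback(), isPublicIpv4() and ipInSubnet() are hypothetical helpers, not existing Pandra validators; ipInSubnet() shows the ip2long() masking for classful/subnet range checks.

```php
<?php
// Hypothetical: run any callable as a validator/filter via FILTER_CALLBACK.
function validateWithCallback($value, callable $check)
{
    return filter_var($value, FILTER_CALLBACK, ['options' => $check]);
}

// Public IPv4 only: rejects private and reserved ranges.
function isPublicIpv4(string $ip): bool
{
    return filter_var($ip, FILTER_VALIDATE_IP,
        FILTER_FLAG_IPV4 | FILTER_FLAG_NO_PRIV_RANGE | FILTER_FLAG_NO_RES_RANGE) !== false;
}

// Hypothetical subnet test, e.g. ipInSubnet('10.1.2.3', '10.0.0.0/8').
function ipInSubnet(string $ip, string $cidr): bool
{
    $parts = explode('/', $cidr, 2);
    $bits = isset($parts[1]) ? (int) $parts[1] : 32;
    $ipLong = ip2long($ip);
    $netLong = ip2long($parts[0]);
    if ($ipLong === false || $netLong === false || $bits < 0 || $bits > 32) {
        return false;
    }
    // /8 -> 0xFF000000; masked to 32 bits because PHP ints are 64-bit here
    $mask = $bits === 0 ? 0 : (-1 << (32 - $bits)) & 0xFFFFFFFF;
    return ($ipLong & $mask) === ($netLong & $mask);
}
```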

circular dependencies

ensure columnfamilies (ColumnContainer children) are decoupled from Columns and Pandra connection manager.

Examples to retrieve data

I'm a newbie to Pandra.
How about data retrieval?

I'm using these techniques by now:

Suppose we have this array:

$myArrayKeyValues = array(
'key1' => 'val1',
'key2' => 'val2',
);

// 1. Clause plugins extensions

class PandraClauseInAddon extends PandraClauseIn {
    public function match($value) {
        // you can implement your matching logic here
        return in_array($value, $this->getValueIn());
    }
}

And then apply the plugin to an array of key names to retrieve all matching keys:

$matches = $someSuperColumn->getColumn(
    new PandraClauseInAddon(array_keys($myArrayKeyValues))
);

// 2. Slicing - you'll love it! Look at the Pandra examples

$result = PandraCore::getCFSlice($ks,
    $keyID,
    new cassandra_ColumnParent(array(
            'column_family' => $cfName,
    )),
    new PandraSlicePredicate(
      PandraSlicePredicate::TYPE_RANGE,
       array('start' => '0',
            'finish' => '10',
            'count' => 5,
            'reversed' => true))
    );

NOTE: Due to a bug in Cassandra binaries prior to 0.6.3, you get an error if you don't specify 'start' and 'finish' values when comparing UUID types (CompareWith="TimeUUIDType"). That (now fixed) bug is described here: https://issues.apache.org/jira/browse/CASSANDRA-377

// 3. Iterating over columns to match inner values. I really hate this!

$subColumns = $col->getColumns(); // get sub-columns
foreach ($subColumns as $sub) {
    if (in_array($sub->getValue(), $myArrayKeyValues)) {
        // found some item
    }
}

// What are the best ways to match values / retrieve keys using Pandra's great functionality?

installation

Hi Michael. First I want to say thank you for writing this PHP library for Cassandra. I expect it will give me a great leg up into moving away from MySQL to Cassandra for our particular app.

I have installed and run Cassandra 0.51 successfully and done some basic get/set/del commands through the CLI. I think I am finally wrapping my RDBMS mind around this new architecture! ;) My next step is to begin to integrate Cassandra into my PHP application.

The problem I have right now is I am not entirely sure how to proceed using Pandra. I have downloaded the tarball, but am unsure over whether this is an extension I need to compile myself on all the clients (phpize, etc) or if I include specific files in the scripts I want to use it with, or both. Forgive me for asking such a newbie question here. I don't have a whole lot of experience with installing custom PHP extensions, about as far as I have gone is installing via pear and pecl.

I'm also a bit green on what Thrift is or how I need to install it on my client servers. The best thing I have found so far is: https://wiki.fourkitchens.com/display/PF/Using+Cassandra+with+PHP However, they stop at installing the Thrift API and don't talk about Pandra at all.

If you or anyone else reading this forum could point me in the right direction for how best to integrate Cassandra into an existing PHP script I would greatly appreciate it.

Thanks,
Mike

Use Time Range PandraClause

Is it possible to extend PandraClause with a
PandraClauseDateRange

in order to perform date range queries based on column family (not SCF, just CF) UUIDs?

Thanks in advance.
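A sketch of the core calculation such a clause would need, assuming version 1 UUIDs and 64-bit PHP: rebuild the 60-bit timestamp from the time_hi, time_mid and time_low fields, then convert from 100 ns ticks since 1582-10-15 to Unix seconds. PandraClauseDateRange does not exist yet; timeUuidToUnix() and timeUuidInRange() are hypothetical helpers its match() could call.

```php
<?php
// 100 ns ticks between 1582-10-15 (UUID epoch) and 1970-01-01 (Unix epoch).
const UUID_EPOCH_OFFSET = 0x01B21DD213814000;

// Hypothetical: Unix time (seconds, float) encoded in a version 1 UUID string.
function timeUuidToUnix(string $uuid): float
{
    $parts = explode('-', strtolower($uuid));
    if (count($parts) !== 5 || $parts[2][0] !== '1') {
        throw new InvalidArgumentException("not a version 1 UUID: $uuid");
    }
    // time_hi (without the version nibble) . time_mid . time_low
    $ticks = hexdec(substr($parts[2], 1) . $parts[1] . $parts[0]);
    return ($ticks - UUID_EPOCH_OFFSET) / 1e7;
}

// Hypothetical range test a date-range clause could use.
function timeUuidInRange(string $uuid, int $from, int $to): bool
{
    $t = timeUuidToUnix($uuid);
    return $t >= $from && $t <= $to;
}
```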

consistency downgrades (quorum)

need to support quorum downgrades when we lose or don't have the minimum number of nodes.

-also requires refactor away from Consistency function defaults. Should default to NULL and inherit from master Pandra::getConsistency()-

@todo 0.6, 0.7 feature gaps

need to audit and re-implement api calls in core

  • meta (describes)
  • batch mutation
  • authentication
  • ks and cf definition

type hinting with int in Core.class.php

There's a type hint with int in setConsistency (Core.class.php:397) which will trigger an error on call

Core.class.php:397
static public function setConsistency(int $consistencyLevel)

Calling code (from example)
PandraCore::setConsistency(cassandra_ConsistencyLevel::ONE);

Error:
PHP Catchable fatal error: Argument 1 passed to setConsistency() must be an instance of int, integer given, called in

The PHP manual says:
Traditional type hinting with int and string isn't supported. (http://www.php.net/manual/en/language.oop5.typehinting.php)

Is this a bug, or are you using a third-party library for type hinting? (There seems to be a unit test for setConsistency in PandraCoreTest.php :)
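For context: PHP 5 has no scalar type hints, so int in a signature is read as a class name, which explains "must be an instance of int, integer given"; scalar type declarations only arrived in PHP 7.0. A version-agnostic sketch checks the type at runtime instead. ConsistencySketch is hypothetical, not PandraCore.

```php
<?php
// Hypothetical: no scalar hint in the signature, so it also parses on PHP 5.
final class ConsistencySketch
{
    private static $consistencyLevel = 1;

    public static function setConsistency($consistencyLevel)
    {
        // runtime check replaces the unsupported "int" hint
        if (!is_int($consistencyLevel)) {
            throw new InvalidArgumentException('consistency level must be an integer');
        }
        self::$consistencyLevel = $consistencyLevel;
    }

    public static function getConsistency()
    {
        return self::$consistencyLevel;
    }
}
```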

"auto" is a vague name for a method

initialize_all_connections would be more descriptive, although somewhat verbose...

helped someone troubleshoot his app today where he was calling auto() for each request, resulting in a ton of connection overhead :)

BTW, good to see activity on a PHP client, Cassandra needs a standard there like Hector is for Java. Thanks for working on this.

Data model inversion?

It's possible I've got this all wrong, but I've thought about it for a while and I'm reasonably certain I understand Cassandra. In "WTF is a SuperColumn", the data model is described as Keyspace.SuperColumnFamily[Row][Super][Column] = value. Pandra does not have a Row type; instead each SCF has a keyID, which is the Row index (first key in brackets). This means that, in order to add rows to a SuperColumnFamily, we must create a NEW SuperColumnFamily object for every entry, with the same keyID (since Pandra uses that to mean the second member of the dotted pair) but a different name. This is backward: there should be a Row class or some such, which does what SCF does now (hold SuperColumns), and the SuperColumnFamily class should be repurposed to be solely a Map<String, Row>.

More than just a naming issue, this implementation has technical implications. Specifically, in PandraSuperColumnFamily::save(), there is a comment /* @todo there must be a better way */, followed by looping over all of the SuperColumn children. There is a better way! The Thrift method batch_mutate takes a keyspace, and a map<string, map<string, list>>. Mutation, meanwhile, can describe a SuperColumn insertion, which itself is a list of Column insertions. Pandra is not making use of all of these levels of hierarchy: every save() call in Pandra's API could be implemented as a single Thrift call, with no need for multiple requests.

My rough sketch of an implementation would be:

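class SCF {
  function save($client, $consistency = cassandra_ConsistencyLevel::ONE) {
    // batch_mutate expects map<rowKey, map<columnFamily, list<Mutation>>>
    $mutationMap = array();
    foreach ($this->getRows() as $key => $superCol) {
      $mutationMap[$key][$this->name][] = $superCol->getMutation(); // see below
    }
    // a single Thrift call saves every row of this SCF
    $client->batch_mutate($this->keyspace, $mutationMap, $consistency);
  }
}
class SuperColumn {
  function getMutation() {
    $timestamp = (int) round(microtime(true) * 1000); // milliseconds (assumed)
    $cols = array();
    foreach ($this->getColumns() as $name => $value) {
      $cols[] = new cassandra_Column(array(
        'name' => $name, 'value' => $value, 'timestamp' => $timestamp));
    }
    return new cassandra_Mutation(array(
      'column_or_supercolumn' => new cassandra_ColumnOrSuperColumn(array(
        'super_column' => new cassandra_SuperColumn(array(
          'name' => $this->name, 'columns' => $cols))))));
  }
}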

Obviously this glosses over quite a few details, like deletions, but I think the structure is right. I definitely sympathize with your erroneous (but see disclaimer at top!) implementation: even when you know exactly what to do it's hard to think about SuperColumnFamilies!

Problem recovering datas by Regex

Hi M. Pearson,

Thanks for your php Library, I use and like it. But I have a little problem, I would like to use your example :
"// By pluggable 'Clause'
$c = new PandraColumnFamily();
$c['username'] = 'myuser';
$c['homeAddress'] = ' MY HOUSE ';
$c['phone'] = '987654231';
$c['mobile'] = '011465987';
$c['workAddress'] = ' MY WORK ';

// regex extraction column references ending in 'address'
// (ie: homeAddress and workAddress)
$q = new PandraQuery();
$addresses = $c[$q->Regex('/address$/i')];
foreach ($addresses as $addressColumn) {
echo "QUERIED PATH : ".$addressColumn->value."\n";
}"

But I have this error :
"Catchable fatal error: Object of class PandraQuery could not be converted to string in /Pandra/lib/ColumnContainer.class.php on line 779"

Do you know a way to fix this problem ?

Thanks for your answer,

Nicolas
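A possible workaround, assuming the column values can be read as a plain name => value array: apply the regex to the column names with preg_grep() instead of passing a PandraQuery object as an array offset. filterColumnsByName() is a hypothetical helper.

```php
<?php
// Hypothetical: keep only columns whose names match a regex, preserving order.
function filterColumnsByName(array $columns, string $pattern): array
{
    $names = preg_grep($pattern, array_keys($columns));
    return array_intersect_key($columns, array_flip($names));
}

$c = [
    'username'    => 'myuser',
    'homeAddress' => ' MY HOUSE ',
    'phone'       => '987654231',
    'mobile'      => '011465987',
    'workAddress' => ' MY WORK ',
];

// column names ending in 'address' (ie: homeAddress and workAddress)
foreach (filterColumnsByName($c, '/address$/i') as $name => $value) {
    echo "QUERIED PATH : $name => $value\n";
}
```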

DEFAULT_CREATE_MODE = false, clean degradation

where devs have chosen not to auto create columns not existing in the schema, ensure this is properly degraded or logged.

*note: allowing ColumnContainer children to override master parent

Implement PandraCore as a singleton

Most methods of PandraCore are static, and those which are not static are used in a static way (PandraCore::getCFSlice, PandraCore::getCFSliceMulti).

IMHO there is no need to instantiate PandraCore at all, or at least no need to instantiate it several times.

To implement the singleton pattern, just add the following code before static public function getSupportedModes

private static $_instance = NULL;

private function __construct() {
}

public static function getInstance()
{
    // create the instance on first use only
    if (is_null(self::$_instance)) {
        $c = __CLASS__;
        self::$_instance = new $c;
    }

    return self::$_instance;
}

apc testing

test multi node APC round robin connecter selection

TException: A long is exactly 8 bytes

Maybe I'm missing something here, but I can't seem to save a column defined as LongType

              <ColumnFamily Name="MyTest"
                ColumnType="Super"
                CompareWith="LongType"
                KeyCached="0.1"
                CompareSubcolumnsWith="BytesType" />

Here is the PHP code:

if (!PandraCore::auto('192.168.1.1.')) {
die(PandraCore::$lastError);
}

//super column name is a timestamp, and compare with longtype, so column names are sorted by time
$superName = time();
$keyId = '101';
$cfName = 'testCF';
$sc = new PandraSuperColumn($superName, $keyId, $cfName);
$sc->setColumnFamilyName('MyTest');
$sc->addColumn('http://facebook.com')->setValue('37');
$sc->addColumn('http://google.com')->setValue('42');
$sc->save();

I got this error back:

Array
(
[0] => TException: A long is exactly 8 bytes
)

Does Pandra convert an integer to the long type that Thrift expects?
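The error suggests the super column name reached Thrift as a decimal string such as "1275000000" (10 bytes) rather than an 8-byte binary long. A minimal sketch of the conversion, assuming 64-bit PHP and PHP 5.6.3 or later for pack()'s 'J' format; packLong and unpackLong are hypothetical helpers, and whether Pandra passes such a binary name through unchanged is an assumption the issue does not confirm.

```php
<?php
/**
 * Encode a PHP integer as the 8-byte big-endian value LongType expects.
 * Assumes 64-bit PHP (PHP_INT_SIZE === 8).
 */
function packLong(int $value): string
{
    if (PHP_INT_SIZE !== 8) {
        throw new RuntimeException('64-bit PHP is required to pack a long');
    }
    // 'J' = 64-bit integer, big-endian (network) byte order.
    return pack('J', $value);
}

/** Decode an 8-byte big-endian long back into a PHP integer. */
function unpackLong(string $bytes): int
{
    if (strlen($bytes) !== 8) {
        throw new InvalidArgumentException('a long is exactly 8 bytes');
    }
    return unpack('J', $bytes)[1];
}
```

With a helper like this, the super column name would be packLong(time()) instead of time().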

$_subject declared twice.

Not sure if this was mentioned, but there is a PHP error:

private $_subject = '';

is declared twice in the PandraLoggerMail class, which PHP rejects with a "Cannot redeclare PandraLoggerMail::$_subject" fatal error.

ColumnContainer::destroyColumns

I can't understand the logic behind this method.

I think it should be

if ($columnName !== NULL) {

instead of

if ($columnName === NULL) {
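Assuming the intent is "destroy the named column, or all columns when no name is given", the corrected condition reads like this in a hypothetical PHP 8 container (ColumnSet is not Pandra's class, and the real method may do more than unset entries):

```php
<?php
// Hypothetical sketch of the intended destroyColumns() semantics.
final class ColumnSet
{
    /** @var array<string, mixed> */
    private array $columns = [];

    public function add(string $name, mixed $value): void
    {
        $this->columns[$name] = $value;
    }

    /** Destroy one named column, or every column when no name is given. */
    public function destroyColumns(?string $columnName = null): void
    {
        if ($columnName !== null) {
            // A name was passed: remove only that column.
            unset($this->columns[$columnName]);
        } else {
            // No name: clear the whole container.
            $this->columns = [];
        }
    }

    /** @return string[] */
    public function names(): array
    {
        return array_keys($this->columns);
    }
}
```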

Use Cassandra-CLI safe time stamps

Cassandra CLI uses 13-digit timestamps: the 10-digit Unix epoch in seconds plus 3 decimal places, represented as a 13-digit integer (i.e. milliseconds).

If you use Cassandra CLI and PHP/Thrift at the same time and make changes in the CLI, those fields (err, columns :)) will get a 13-digit timestamp, and all subsequent PHP modification requests to them will be ignored, because they will be deemed "older".
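The "deemed older" problem follows from last-write-wins: when two writes hit the same column, the one with the larger timestamp survives. A minimal sketch with a hypothetical resolveWrite() helper and example timestamps shows a seconds-based PHP write losing to an earlier millisecond-based CLI write (Cassandra's tie-breaking on equal timestamps is left out).

```php
<?php
/**
 * Last-write-wins between two versions of the same column: the larger
 * timestamp survives, regardless of which write arrived last.
 *
 * @param array{value: string, timestamp: int} $current
 * @param array{value: string, timestamp: int} $incoming
 */
function resolveWrite(array $current, array $incoming): array
{
    return $incoming['timestamp'] > $current['timestamp'] ? $incoming : $current;
}

// Example values: a CLI write in milliseconds, then a PHP write one minute later in seconds.
$cliWrite = ['value' => 'from CLI', 'timestamp' => 1275000000000];
$phpWrite = ['value' => 'from PHP', 'timestamp' => 1275000060];

// The PHP write is newer in wall-clock terms but carries the smaller number,
// so the CLI value is kept.
$winner = resolveWrite($cliWrite, $phpWrite);
```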

A solution would be to standardize on a special 13-digit time function, since PHP doesn't have one built in.

In my case, this is what I used, and I made sure that all $this->setTime() requests go through that method.

/**
 * Returns a Cassandra-safe 13-digit timestamp: a traditional 10-digit Unix timestamp
 * with 3 decimal places, converted to an integer (i.e. milliseconds)
 *
 * @param int $timestamp (optional) traditional Unix epoch timestamp of 10 digits or fewer
 * @return int
 */
public static function time13($timestamp = NULL)
{
    $timestamp = intval($timestamp);
    if (strlen((string)$timestamp) > 10)
    {
        // assume already properly sized; can make this strictly == 13 if desired
        return $timestamp;
    }
    // round() returns a float, so cast back to int for Thrift's integer timestamp
    return $timestamp ? $timestamp * 1000 : (int) round(microtime(true) * 1000);
}

Dealing with Predicate

At the moment Pandra lacks a correct implementation of predicates: you can do less with Pandra than with the raw Thrift API.

In Pandra, three methods deal with cassandra_SlicePredicate:
Core::getRangeKeys, Core::getCFSliceMulti and Core::getCFSlice.
In particular, the last two cannot use a SliceRange predicate.

IMHO, these three methods should not create a cassandra_SlicePredicate themselves but receive one as a parameter. That would break compatibility with previous versions of Pandra, though, as the number and type of arguments would change.

So I am going to propose three small modifications to these methods that keep compatibility.
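A sketch of one backwards-compatible approach, assuming PHP 8: each method could accept either its old argument (an array of column names, or null for a full slice) or a ready-made predicate, normalised by a helper. SlicePredicateStub and SliceRangeStub are hypothetical stand-ins for the Thrift-generated cassandra_SlicePredicate and cassandra_SliceRange (only their field names are borrowed), and toSlicePredicate is not part of Pandra.

```php
<?php
// Hypothetical stand-in for cassandra_SliceRange.
final class SliceRangeStub
{
    public function __construct(
        public string $start = '',     // '' = from the first column
        public string $finish = '',    // '' = to the last column
        public bool $reversed = false,
        public int $count = 100        // max columns per row; arbitrary for this sketch
    ) {}
}

// Hypothetical stand-in for cassandra_SlicePredicate: set one field or the other.
final class SlicePredicateStub
{
    /** @param string[]|null $column_names */
    public function __construct(
        public ?array $column_names = null,
        public ?SliceRangeStub $slice_range = null
    ) {}
}

/**
 * Accept the old argument (column names, or null/empty for "everything")
 * as well as a ready-made predicate, so existing callers keep working.
 */
function toSlicePredicate(SlicePredicateStub|array|null $arg): SlicePredicateStub
{
    if ($arg instanceof SlicePredicateStub) {
        return $arg;                                         // new style: pass through
    }
    if (is_array($arg) && $arg !== []) {
        return new SlicePredicateStub(array_values($arg));   // old style: named columns
    }
    return new SlicePredicateStub(null, new SliceRangeStub()); // old style: full slice
}
```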
