
p5-cassandra-simple's Introduction

No Maintenance Intended. Be cautious about using it in production!

NAME

Cassandra::Simple

VERSION

version 0.2

DESCRIPTION

Easy-to-use, Perl-oriented client interface to Apache Cassandra.

This module attempts to abstract away the underlying Thrift methods as
much as possible, so that any Perl developer faces only a small learning
curve when using Cassandra.

SYNOPSIS

        use Cassandra::Simple;

        my ($keyspace, $column_family) = qw/simple simple/;

        my $conn = Cassandra::Simple->new(keyspace => $keyspace,);

        $conn->create_column_family( column_family => $column_family);

        $conn->insert(column_family => $column_family, key => 'KeyA', columns => { 'ColumnA' => 'AA' , 'ColumnB' => 'AB' } );

        $conn->get(column_family => $column_family, key => 'KeyA');
        $conn->get(column_family => $column_family, key => 'KeyA', columns => [ qw/ColumnA/ ]);
        $conn->get(column_family => $column_family, key => 'KeyA', column_count => 1, column_reversed => 1);

        $conn->batch_insert(column_family => $column_family, rows => { 'KeyB' => [ [ 'ColumnA' => 'BA' ] , [ 'ColumnB' => 'BB' ] ], 'KeyC' => [ [ 'ColumnA' => 'CA' ] , [ 'ColumnD' => 'CD' ] ] });

        $conn->multiget(column_family => $column_family, 'keys' => [qw/KeyA KeyC/]);

        $conn->get_range(column_family => $column_family, start => 'KeyA', finish => 'KeyB', column_count => 1 );
        $conn->get_range(column_family => $column_family);

        $conn->get_indexed_slices(column_family => $column_family, expression_list => [ [ 'ColumnA' => 'BA' ] ]);

        $conn->remove(column_family => $column_family, 'keys' => [ 'KeyA' ], columns => [ 'ColumnA' ]);
        $conn->remove(column_family => $column_family, 'keys' => [ 'KeyA' ]);
        $conn->remove(column_family => $column_family);

        $conn->get_count(column_family => $column_family, key => 'KeyA');
        $conn->multiget_count(column_family => $column_family, 'keys' => [ 'KeyB', 'KeyC' ]);

get

Arguments:

  column_family, key, columns, column_start, column_finish,
  column_count, column_reversed, super_column, consistency_level

Returns a HASH of the form "{ column => value, column => value }".

multiget

Arguments:

  column_family, keys, columns, column_start, column_finish,
  column_count, column_reversed, super_column, consistency_level

Returns a HASH of the form "{ key => { column => value, column =>
value }, key => { column => value, column => value } }".

get_count

Arguments:

  column_family, key, columns, column_start, column_finish,
  super_column, consistency_level

Returns the count as an int.

multiget_count

Arguments:

  column_family, keys, columns, column_start, column_finish,
  super_column, consistency_level

Returns a mapping of "key -> count".

get_range

Arguments:

  column_family, start, finish, columns, column_start, column_finish,
  column_reversed, column_count, row_count, super_column,
  consistency_level

Returns a *HASH* of the form "{ key => { column => value, column =>
value }, key => { column => value, column => value } }".

get_indexed_slices

Arguments:

  column_family, expression_list, start, row_count, columns,
  column_start, column_finish, column_reversed, column_count,
  consistency_level

The *expression_list* is an *ARRAYREF* of *ARRAYREF*s, each containing
"$column[, $operator], $value". $operator can be '=', '<', '>', '<=' or
'>='.

Returns a *HASH* of the form "{ key => { column => value, column =>
value }, key => { column => value, column => value } }".
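Here is a minimal sketch of how an expression_list written in either form could be normalized to explicit (column, operator, value) triples before use. normalize_expressions is a hypothetical helper and is not part of Cassandra::Simple. Treating a missing operator as '=' is an assumption drawn from the optional [$operator] notation.

```perl
use strict;
use warnings;

# Operators accepted according to the get_indexed_slices documentation.
my %VALID_OP = map { $_ => 1 } ('=', '<', '>', '<=', '>=');

# Hypothetical helper (not part of Cassandra::Simple): turn each
# [column, value] or [column, operator, value] entry into an explicit
# [column, operator, value] triple.
sub normalize_expressions {
    my ($expression_list) = @_;
    my @normalized;
    for my $expr (@$expression_list) {
        my ($column, $op, $value);
        if (@$expr == 2) {
            ($column, $value) = @$expr;
            $op = '=';    # assumption: a missing operator means equality
        }
        elsif (@$expr == 3) {
            ($column, $op, $value) = @$expr;
        }
        else {
            die "expression must have 2 or 3 elements\n";
        }
        die "unsupported operator '$op'\n" unless $VALID_OP{$op};
        push @normalized, [ $column, $op, $value ];
    }
    return \@normalized;
}

# Same shape as the SYNOPSIS call, plus one entry with an explicit operator.
my $normalized = normalize_expressions([ [ 'ColumnA' => 'BA' ], [ 'uid', '>=', 10 ] ]);
```

Rejecting unknown operators up front turns a typo into an immediate, readable error. Without the check, it would only surface as an exception from the server.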

insert

Arguments:

  column_family, key, columns, timestamp, ttl, consistency_level

$columns is a *HASHREF* of the form "{ column => value, column =>
value }".

insert_super

Arguments:

  column_family, key, columns, timestamp, ttl, consistency_level

$columns is a *HASH* of the form "{ super_column => { column =>
value, column => value } }".

batch_insert

Arguments:

  column_family, rows, timestamp, ttl, consistency_level

$rows is a *HASH* of the form "{ key => { column => value , column =>
value }, key => { column => value , column => value } }".

add

Arguments:

  column_family, key, column, value, super_column, consistency_level

Increments or decrements counter $column by $value. $value defaults to
1.

batch_add

Arguments:

  column_family, rows, consistency_level

$rows is a *HASH* of the form "{ key => { column => value , column =>
value }, key => { column => value , column => value } }".

remove_counter

Removes counter $column on $key.

Arguments:

  column_family, key, column, super_column, consistency_level_write

remove

Arguments:

  column_family, keys, columns, super_column, write_consistency_level

$keys is a key or an *ARRAY* of keys to be deleted.

A removal without keys truncates the whole column family.

The timestamp used for the removal is returned.
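Accepting either a single key or an ARRAYREF of keys is a common Perl idiom. A minimal sketch of that normalization follows. normalize_keys is a hypothetical helper, not Cassandra::Simple's own code. Because an empty key list means truncation, a cautious caller might check for that case explicitly before calling remove.

```perl
use strict;
use warnings;

# Hypothetical helper: accept a single key, an ARRAYREF of keys, or nothing,
# and always return an ARRAYREF. An empty result is the "no keys" case,
# which remove documents as truncating the whole column family.
sub normalize_keys {
    my ($keys) = @_;
    return []       unless defined $keys;
    return [@$keys] if ref $keys eq 'ARRAY';   # copy, leave caller's array alone
    return [$keys];
}

my $one  = normalize_keys('KeyA');            # ['KeyA']
my $many = normalize_keys([qw/KeyA KeyC/]);   # ['KeyA', 'KeyC']
my $none = normalize_keys();                  # [] (truncation case)
```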

list_keyspace_cfs

Arguments: none.

Returns a HASH of "{ column_family_name => column_family_type }", where
the column family type is either "Standard" or "Super".

create_column_family

Arguments:

  keyspace, column_family, "cfdef"

"cfdef" is any Column Family Definition option (column_type,
comparator_type, etc.).

create_keyspace

Arguments:

  keyspace, strategy, strategy_options

list_keyspaces

Arguments: none.

drop_keyspace

Arguments:

  keyspace

create_index

Arguments:

  keyspace, column_family, columns, validation_class

Creates an index on $columns of $column_family. $columns is an ARRAY of
column names to be indexed. $validation_class only applies when a column
doesn't yet exist, and even then it is optional (defaults to
*BytesType*).

ring

Arguments:

  keyspace

Lists the addresses of all nodes in the cluster associated with the
keyspace $keyspace.

BUGS

Bugs should be reported on GitHub at https://github.com/fmgoncalves/p5-cassandra-simple.

TODO

Thrift Type Checking and Packing/Unpacking

The defined types (or defaults) for each column family are known and
should therefore be complied with. Adding Composite Type support has
already forced this to some extent, but the client should be refactored
so that it applies everywhere.

Error Handling

Exceptions raised when calling Cassandra code should be reported as
errors with an appropriate description.

Unit Tests

  Sort of done in the examples folder:
  <https://github.com/fmgoncalves/p5-cassandra-simple/tree/master/examples>

Tombstones

get, get_range and get_indexed_slices should probably filter out
tombstones, even if it means returning fewer results than requested.
Ideally they would retry until they got enough results.

Methods

The following are Thrift methods left unimplemented.

Not all of these will be implemented, since some aren't useful to the
common developer.

Priority will be given to live schema updating methods.

  describe_cluster_name

  string describe_cluster_name()

  describe_keyspace

  KsDef describe_keyspace(string keyspace)

  describe_partitioner

  string describe_partitioner()

  describe_snitch

  string describe_snitch()

  describe_version

  string describe_version()

  system_drop_column_family

  string system_drop_column_family(ColumnFamily column_family)

ACKNOWLEDGEMENTS

Implementation loosely based on Cassandra::Lite.

  <http://search.cpan.org/~gslin/Cassandra-Lite-0.0.4/lib/Cassandra/Lite.pm>

API based on Pycassa.

  <http://pycassa.github.com/pycassa/>

AUTHOR

Filipe Gonçalves "[email protected]"

COPYRIGHT AND LICENSE

This software is copyright (c) 2012 by Filipe Gonçalves.

This is free software; you can redistribute it and/or modify it under
the same terms as the Perl 5 programming language system itself.

p5-cassandra-simple's People

Contributors

fmgoncalves jcbf

Forkers

numkem schatt jcbf

p5-cassandra-simple's Issues

loop over all rows of a column family

I'm trying to loop over all rows in a column family. In pycassa this performs well and scales nicely:

for row in mycolfam.get_range():

My test data set of ~200000 rows is processed in 20s (10000 rows/s) on a weak development virtual machine.

However, I haven't found a working solution in cassandra-simple. Using something like

$conn->get_range('mycolfam', {'row_count'=>$maxrows});

will start loading all tokens from the database into memory. This works for a row_count of up to 10000; a bigger row_count makes the request take enormously long. I'm talking about several hours for 200000 rows.

Is there a scalable way in cassandra-simple that I missed? Or is this impossible at the moment?
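A common way to iterate a whole column family over Thrift is key-range paging. You request a fixed number of rows, start the next request at the last key returned, and skip that duplicated row. A minimal sketch follows. It assumes a hypothetical $fetch_page callback that returns an ordered ARRAYREF of [key, columns] pairs. This is not a drop-in fix for Cassandra::Simple: its get_range is documented to return a HASH, which does not preserve row order, so the last key of a page cannot be read back from it directly. The in-memory mock only stands in for a real column family.

```perl
use strict;
use warnings;

# Key-range paging sketch. $fetch_page->($start_key, $count) is a
# hypothetical callback returning an ARRAYREF of [key, columns] pairs in
# ring order, beginning AT $start_key ('' meaning the start of the ring).
sub for_each_row {
    my ($fetch_page, $page_size, $callback) = @_;
    die "page_size must be at least 2\n" if $page_size < 2;
    my $start      = '';
    my $first_page = 1;
    while (1) {
        my $page = $fetch_page->($start, $page_size);
        my @rows = @$page;
        # Later pages start with the last row of the previous page; skip it.
        shift @rows if !$first_page && @rows && $rows[0][0] eq $start;
        $callback->(@$_) for @rows;
        last if @$page < $page_size;    # a short page means we reached the end
        $start      = $page->[-1][0];
        $first_page = 0;
    }
    return;
}

# In-memory stand-in for a column family; sorted keys play the ring order.
my %data    = map { ("key$_" => { n => $_ }) } 1 .. 7;
my @ordered = sort keys %data;
my $fetch   = sub {
    my ($start, $count) = @_;
    my @hits = grep { $_ ge $start } @ordered;
    splice(@hits, $count) if @hits > $count;
    return [ map { [ $_, $data{$_} ] } @hits ];
};

my @seen;
for_each_row($fetch, 3, sub { my ($key, $columns) = @_; push @seen, $key });
```

Only one page is held in memory at a time. This keeps memory use flat, unlike a single request with a huge row_count.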

bulk update counters (compared to pycassa)

I am currently writing an application that makes heavy use of counter columns and often needs to bulk update counters. pycassa lets you use insert() on counters just like add(), and the same goes for batch_insert():

mycolfam.batch_insert({'foobar': {'ccol': {'count': 1}}, 'barfoo': {'ccol': {'count': 1}}})

In cassandra-simple this seems not to work:

$conn->batch_insert('mycolfam', {'foobar' => {'ccol' => {'count' => 1}}, 'barfoo' => {'ccol' => {'count' => 1}}});
$VAR1 = 'invalid operation for commutative columnfamily mycolfam at /usr/local/share/perl5/Cassandra/Cassandra.pm line 7022, <FIN> line 3.
';

Instead I need to loop over all keys that need to be updated and make individual add() calls, so the same procedure takes much longer in Perl than in Python. It would be great if cassandra-simple could support insert()/batch_insert() on counter columns.

lazy_query doesn't go through all keys

Hello,

I'm using this column family :

create column family sessions
with column_type = 'Standard'
and comparator = 'UTF8Type'
and default_validation_class = 'UTF8Type'
and key_validation_class = 'UTF8Type'
and read_repair_chance = 0.1
and dclocal_read_repair_chance = 0.0
and gc_grace = 864000
and min_compaction_threshold = 4
and max_compaction_threshold = 32
and replicate_on_write = true
and compaction_strategy = 'org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy'
and caching = 'KEYS_ONLY'
and column_metadata = [
{column_name : 'ip',
validation_class : UTF8Type},
{column_name : 'pubpoint_id',
validation_class : IntegerType},
{column_name : 'request',
validation_class : UTF8Type},
{column_name : 'key',
validation_class : UTF8Type,
index_name : 'sessions_key_idx',
index_type : 0},
{column_name : 'useragent',
validation_class : UTF8Type},
{column_name : 'timestamp',
validation_class : UTF8Type},
{column_name : 'uid',
validation_class : IntegerType},
{column_name : 'status',
validation_class : UTF8Type,
index_name : 'sessions_status_idx',
index_type : 0}]
and compression_options = {'sstable_compression' : 'org.apache.cassandra.io.compress.SnappyCompressor'};

And while using this call to lazy_query() :

my $query = $cass->lazy_query('get_indexed_slices', column_family => 'sessions', expression_list => [ [ 'status', 'starting' ] ]);
while (my $res = $query->run(1)) {
}

I'm noticing that the while loop doesn't return all the keys, just a fraction of them. Could deleting keys inside the loop be causing this?

Issue with Bit::Vector::new_Dec()

Hello, I am really impressed by how simple you made it to access Cassandra using Perl - very slick. However, I am having an issue getting data into my Cassandra instance. When I run your super_column_support example, or when I try to insert data using the insert command, I get this error:

$conn->insert(supersimple, 'KeyA', { 'SuperColumnA' => { 'SubColumnA' => 'AAA', 'SubColumnB' => 'AAB'}, 'SuperColumnB' => { 'SubColumnA' => 'ABA', 'SubColumnB' => 'ABB'} })

$VAR1 = 'Bit::Vector::new_Dec(): input string syntax error at /usr/local/share/perl5/Thrift/BinaryProtocol.pm line 203.';

I am not sure if this is an incompatibility on my end or something that can be fixed within your client implementation. Let me know if you need version information, etc.

Thank you!

adding 0 to counter column increases value by 1

I expected no change in value. See my example as I tried this in perlconsole:

Perl> $conn->get('mycolfam', 'foobar')->{ccol}->{'count'}
6

Perl> $conn->add('mycolfam', 'foobar', 'count', 0, {'super_column' => 'ccol'});
1

Perl> $conn->get('mycolfam', 'foobar')->{ccol}->{'count'}
7

All other values, including negative values, behave as expected.
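This report matches a common Perl pitfall. Suppose, though we have not confirmed this in the module's source, that add() applies its documented default of 1 with ||=. Then a value of 0 is treated as false and replaced by 1, while negative and positive values pass through unchanged. Perl 5.10's defined-or assignment //= substitutes the default only when no value was passed. Both subs below are hypothetical illustrations, not Cassandra::Simple code.

```perl
use strict;
use warnings;

# Applying the default with ||= treats 0 as "not given", so 0 becomes 1.
sub counter_delta_buggy {
    my ($value) = @_;
    $value ||= 1;
    return $value;
}

# Defined-or (//=, Perl 5.10+) falls back to 1 only when $value is undef.
sub counter_delta_fixed {
    my ($value) = @_;
    $value //= 1;
    return $value;
}

printf "%s -> buggy %d, fixed %d\n",
    $_ // 'undef', counter_delta_buggy($_), counter_delta_fixed($_)
    for (undef, 0, -3, 5);
```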

ResourcePool errors

I have discovered hard-to-reproduce ResourcePool errors under load/parallel access to Cassandra. Unfortunately, I know too little Perl to find the culprit myself. This is the error I get:

Use of uninitialized value $_[0] in sprintf at /usr/local/share/perl5/ResourcePool.pm line 247, <GEN70> line 3.
ResourcePool(): got failed resource from client
Use of uninitialized value $_[1] in sprintf at /usr/local/share/perl5/ResourcePool/LoadBalancer.pm line 307, <GEN70> line 3.
Use of uninitialized value $_[0] in sprintf at /usr/local/share/perl5/ResourcePool.pm line 247, <GEN70> line 3.
ResourcePool(): Downsizing
ResourcePool: Downsized... still 0 open (0)

I'm getting this error while connecting, reading, writing... Maybe you can give me a few pointers where to investigate further.

The number of errors could be reduced by increasing the Max option in ResourcePool->new() (I simply hardcoded a value of 256 instead of 5 in ResourcePool.pm).

This is how I connect to my cassandra cluster:

my $conn = Cassandra::Simple->new(keyspace => 'mykeyspace', server_name => '192.168.0.31', pool_from_ring => 0);
$conn->pool->add_pool('mykeyspace',  '192.168.0.32');
$conn->pool->add_pool('mykeyspace',  '192.168.0.33');

However, after the queries my application simply quits. Maybe my application gets stuck because of lingering connections. Is there a way to properly disconnect from cassandra?
Should I randomize to which server I connect first?

Exceptions are not catchable

Hi,

This is more of a feature request than anything, but it seems that when using Error.pm it's not possible to catch the exception classes specified in Types.pm.

It seems like it's missing the catch() method.

Thank you.

Trying to insert a BooleanType value

When I pass 'true' it fails. When I pass '1' it works but then I can't do: GET XXX where alive=true;

Filipe Gonçalves replied via email, because I asked through email :)

Using validation and comparator classes in Cassandra is something very difficult to deal with in Perl. Basically, since there is no type system in Perl, it's impossible to know when inserting or obtaining a value from Cassandra what the correct type is throughout the processing stages. For this reason, it's probably better to use UTF8Type everywhere in Cassandra and assign known values to your Perl app.

In this case you could use a UTF8Type instead of BooleanType and only assign it 'True' and 'False' values. It's not ideal, but this is a design limitation in Perl and there isn't much to be done about it.
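A minimal sketch of that workaround follows. It assumes the column's validation class has been changed to UTF8Type: Perl truth values are converted to the strings 'True' and 'False' on write and back on read. to_cass_bool and from_cass_bool are hypothetical helper names.

```perl
use strict;
use warnings;

# Hypothetical helpers: store booleans as 'True'/'False' strings in a
# UTF8Type column and convert them back to 1/0 when reading.
sub to_cass_bool { return $_[0] ? 'True' : 'False' }

sub from_cass_bool {
    my ($s) = @_;
    return 1 if $s eq 'True';
    return 0 if $s eq 'False';
    die "unexpected boolean string '$s'\n";   # catch stray values early
}
```

Failing loudly on anything other than the two agreed strings keeps stray values such as 'true' or '1' from silently reading as false.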

ResourcePool::LoadBalancer Issue

Hello,
We've been in touch before, and I wanted to let you know that for the most part I am extremely happy with this component. I have one question, though. I am running Cassandra::Simple under mod_perl, and that web service receives a fair number of calls. After a while I get these error messages, and apart from adding more Cassandra node names to the pool I can't find a way to prevent this. Do I need to close a connection to prevent this? Or should I increase the ResourcePool size from 5 to something higher (e.g. 100)? Do you have any recommendation?

==> /var/log/httpd/error_log <==
ResourcePool> sleeping 0 seconds...
ResourcePool(): Downsizing
ResourcePool: Downsized... still 5 open (0)
LoadBalancer(cassandra31436): Suspending pool to '' for 5 seconds
ResourcePool(): Downsizing
ResourcePool: Downsized... still 5 open (0)
ResourcePool> sleeping 0 seconds...
ResourcePool(): Downsizing
ResourcePool: Downsized... still 5 open (0)
LoadBalancer(cassandra31436): Suspending pool to '' for 5 seconds
ResourcePool(): Downsizing
ResourcePool: Downsized... still 5 open (0)
ResourcePool::LoadBalancer> sleeping 1 seconds...
ResourcePool::LoadBalancer> sleeping 1 seconds...
ResourcePool::LoadBalancer> sleeping 1 seconds...
ResourcePool::LoadBalancer> sleeping 1 seconds...

And the exception dumper tells me this:
Can't call method "batch_mutate" on an undefined value at Cassandra/Simple.pm line 648

Thank you,
Nils

get doesn't work with LazyQuery

The following code doesn't work:

my $query = $cassandra->lazy_query('get', column_family => 'foo', key => 'bar');

my $i = 3;
while( $i-- && (my $res = $query->run(10))){
print "Res $i: ".join("\n", keys %$res,'');
}

Overriding options in create_keyspace() doesn't work. Logic error

The Call:

$CASS->create_keyspace($ks,
{
'strategy' => 'org.apache.cassandra.locator.SimpleStrategy',
'strategy_options' => {
'replication_factor' => 1,
'datacenter1' => '1'
},

} );

The offending code in Cassandra::Simple.pm (lines 1109 and 1113):
$params->{strategy_class} = 'org.apache.cassandra.locator.NetworkTopologyStrategy' unless $opt->{strategy};

AND

$params->{strategy_options} = { 'datacenter1' => '1' } unless $opt->{strategy_options};

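The code's logic is wrong: "unless $opt->{strategy}" skips the assignment when the caller does pass a strategy, so these lines never copy the caller's value into $params. The same applies to strategy_options.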

The same logic as a standalone test:

my $strategy='one';
my $params = 'two' unless $strategy;
print "OUT: $params\n"; ### params should equal 'one'.

The corrected code should be:
$params->{strategy_class} = $opt->{strategy} || 'org.apache.cassandra.locator.NetworkTopologyStrategy';
AND
$params->{strategy_options} = $opt->{strategy_options} || { 'datacenter1' => '1' };
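The corrected defaulting can be checked in isolation. In the sketch below, keyspace_params is a hypothetical stand-alone function wrapping the two corrected lines from the report. In Cassandra::Simple they sit inside create_keyspace. Options the caller passes win, and the NetworkTopologyStrategy defaults apply only when an option is missing.

```perl
use strict;
use warnings;

# Hypothetical stand-alone version of the two corrected lines: the
# caller's options win, defaults fill in only what is missing.
sub keyspace_params {
    my ($opt) = @_;
    $opt ||= {};
    my $params = {};
    $params->{strategy_class} = $opt->{strategy}
        || 'org.apache.cassandra.locator.NetworkTopologyStrategy';
    $params->{strategy_options} = $opt->{strategy_options}
        || { 'datacenter1' => '1' };
    return $params;
}

# The options from the reported call, and an empty options hash.
my $custom = keyspace_params({
    'strategy'         => 'org.apache.cassandra.locator.SimpleStrategy',
    'strategy_options' => { 'replication_factor' => 1, 'datacenter1' => '1' },
});
my $defaults = keyspace_params({});
```

Note that || also falls back when an option is present but false (for example an empty string). Perl 5.10's // operator would fall back only on undef.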

wish: Consider using Thrift::XS

Andy Grundman's Thrift-XS is a drop-in replacement for Thrift module.

Thrift::XS provides faster versions of Thrift::BinaryProtocol and Thrift::MemoryBuffer.
Thrift compact protocol support is also available, just replace Thrift::XS::BinaryProtocol with Thrift::XS::CompactProtocol.
To use, simply replace your Thrift initialization code with the appropriate Thrift::XS version.

Please check http://search.cpan.org/~agrundma/Thrift-XS-1.04/ or https://github.com/andygrundman/thrift-xs
Something similar could be done with the JSON module: let the developer specify which backend to use, or load XS and fall back to pure Perl.

Benchmarks show significant improvement:

XS::MemoryBuffer write + read: 6x faster

XS::BinaryProtocol
    writeMessageBegin + readMessageBegin: 12.0x
    complex struct/field write+read:       6.6x
    writeMapBegin + readMapBegin:         24.0x
    writeListBegin + readListBegin:       20.0x
    writeSetBegin + readSetBegin:         21.0x
    writeBool + readBool:                 13.5x
    writeByte + readByte:                 13.9x
    writeI16 + readI16:                   14.4x
    writeI32 + readI32:                   12.9x
    writeI64 + readI64:                   29.4x
    writeDouble + readDouble:             13.5x
    writeString + readString:              7.5x

XS::CompactProtocol
    writeMessageBegin + readMessageBegin: 11.6x
    complex struct/field write+read:       6.2x
    writeMapBegin + readMapBegin:         18.7x
    writeListBegin + readListBegin:       14.1x
    writeSetBegin + readSetBegin:         13.3x
    writeBool + readBool:                 13.2x
    writeByte + readByte:                 13.9x
    writeI16 + readI16:                    9.0x
    writeI32 + readI32:                    7.5x
    writeI64 + readI64:                   10.0x
    writeDouble + readDouble:             13.5x
    writeString + readString:              7.4x
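The "load XS and fall back to pure Perl" idea can be sketched with a require inside eval. pick_protocol_class is a hypothetical helper. The two class names come from the Thrift and Thrift::XS distributions mentioned in this request. The sketch only chooses a class name rather than constructing a protocol object, since constructing one would need a transport.

```perl
use strict;
use warnings;

# Return the first class that loads, in order of preference, or undef if
# none of the candidates is installed.
sub pick_protocol_class {
    my @candidates = @_;
    for my $class (@candidates) {
        (my $file = "$class.pm") =~ s{::}{/}g;    # Foo::Bar -> Foo/Bar.pm
        return $class if eval { require $file; 1 };
    }
    return;
}

my $protocol_class = pick_protocol_class(
    'Thrift::XS::BinaryProtocol',    # XS implementation (Thrift::XS)
    'Thrift::BinaryProtocol',        # pure-Perl implementation (Thrift)
);
```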

Connection fails if rpc_address is set to 0.0.0.0

describe_ring() seems to list the values of the Cassandra config directive rpc_address. According to the Cassandra documentation, this directive can be set to 0.0.0.0. Therefore, if rpc_address is set to 0.0.0.0 rather than the node's real IP, Cassandra::Simple tries to connect to the node 0.0.0.0, which of course always fails with the error message:
Runtime error: TSocket: Could not connect to 0.0.0.0:9160 (Connection refused) at /usr/local/share/perl/5.10.1/Cassandra/Pool/CassandraServer.pm line 51.

The workaround on cassandra side is to explicitly set rpc_address to the real IP address of the node. Correct me if I'm wrong, but it would be much nicer if Cassandra::Simple could determine the ring description using the values of the listen_address directive, which must be set to the node's IP anyway.
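Until ring discovery changes, a client-side workaround is to discard wildcard addresses returned by the ring and fall back to the configured server. This is a minimal sketch. It assumes the ring addresses are available as a plain list of strings, and usable_ring_hosts is hypothetical, not part of Cassandra::Simple. The connection example in the ResourcePool issue also passes pool_from_ring => 0, which appears to skip building the pool from the ring when hosts are added by hand.

```perl
use strict;
use warnings;

# Hypothetical workaround: drop wildcard addresses reported by the ring and
# fall back to the seed server when nothing usable is left.
sub usable_ring_hosts {
    my ($ring_hosts, $seed) = @_;
    my @hosts = grep { $_ ne '0.0.0.0' } @$ring_hosts;
    return @hosts ? \@hosts : [$seed];
}

my $hosts = usable_ring_hosts([ '0.0.0.0', '192.168.0.32' ], '192.168.0.31');
```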
