Comments (21)

rbotzer commented on July 26, 2024

Hey Ken, let me try to reproduce that. Can you please give me the environment details: OS type and version, Python client release, server version (and the OS and version it runs on), etc.?

from aerospike-client-python.

whosken commented on July 26, 2024

Thanks! Let's see:

  • Python 2.6.9
  • Both server and client are on: Linux version 3.10.42-52.145.amzn1.x86_64 (mockbuild@gobi-build-64003) (gcc version 4.8.2 20131212 (Red Hat 4.8.2-7) (GCC) ) SMP Tue Jun 10 23:46:43 UTC 2014
  • Python client version: 1.0.41
  • Server version: aerospike-amc-enterprise-3.5.4-el5.x86_64-3.rpm

rbotzer commented on July 26, 2024

A couple of questions:

  • Do you also see this behavior with release 1.0.44?
  • Are you saying that you're missing records, or that for certain sets the entire query fails (nothing is printed at all from the foreach() callback)?
  • If you modify your code to use results(), is data missing, or is it only happening in foreach()?
  • Can you confirm that there is a secondary index on the bin 'index' in all the sets? Can you go into AQL and check with show indexes?

whosken commented on July 26, 2024
  • Yes, it is reproduced in 1.0.44.
  • Nothing is printed from the foreach. What should a missing record look like in contrast?
  • Unfortunately so. We are receiving [] from query.results() even though we know there are data in the set.
  • Yes. I can confirm that we are receiving no results while there are secondary indices on the bin.

The following is roughly what our tables look like:

+---------+---------+------+------+---------+---------+-----+-------------+--------+
| index   | value   | bids | wins | spend   | cost    | fee | impressions | clicks |
+---------+---------+------+------+---------+---------+-----+-------------+--------+
| "index" | "value" | 105  | 53   | 2661136 | 2866136 | 0   | 40          | 2      |
+---------+---------+------+------+---------+---------+-----+-------------+--------+

We have multiple sets like this. We query them separately and aggregate the results. Many of them return no results through the Python client, while we do receive results from the Node.js client and AQL.
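
The per-set aggregation described above might look like the following minimal sketch. It assumes each set's results arrive as a list of (key, meta, bins) tuples, the shape query.results() returns elsewhere in this thread, and that the counter bins are the ones in the table above; aggregate_sets is a hypothetical helper, not part of the client API. Keeping track of sets that came back empty makes the reported symptom visible instead of silently folding it into the totals.

```python
from collections import Counter

# Counter bins taken from the table above (assumed to hold integers).
METRIC_BINS = ("bids", "wins", "spend", "cost", "fee", "impressions", "clicks")


def aggregate_sets(results_by_set):
    """Sum the metric bins across sets.

    results_by_set maps a set name to the list of (key, meta, bins)
    tuples returned for that set. Returns (totals, empty_sets), where
    empty_sets lists the sets that returned no records at all.
    """
    totals = Counter()
    empty_sets = []
    for set_name, records in sorted(results_by_set.items()):
        if not records:
            empty_sets.append(set_name)
            continue
        for _key, _meta, bins in records:
            for name in METRIC_BINS:
                totals[name] += bins.get(name, 0)
    return dict(totals), empty_sets
```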

Thanks again.

ryanwitt commented on July 26, 2024

I think this behavior goes back at least a couple months, but I'm not sure which version I first saw it in.

rbotzer commented on July 26, 2024

Hi Ken. I haven't yet been able to reproduce the issue, but I noticed a mistake in the callback function. The parameter for it should be a tuple (key, meta, bins) rather than three separate arguments. What happens if you change your script to:

def print_result(record):
    # foreach() passes one (key, meta, bins) tuple per record
    key, meta, bins = record
    print(bins)
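
To see why the callback signature matters, here is a minimal, self-contained sketch. fake_foreach is a hypothetical stand-in for Query.foreach() that only mimics the behavior described above (the callback is called once per record with a single (key, meta, bins) tuple), so it runs without a server. It does not reproduce how the real C extension reports callback errors.

```python
def fake_foreach(records, callback):
    """Hypothetical stand-in for Query.foreach(): one tuple per call."""
    for record in records:
        callback(record)


def three_arg_callback(key, meta, bins):
    # Wrong: expects three separate positional arguments.
    return bins


collected = []


def tuple_callback(record):
    # Right: accept one argument and unpack the (key, meta, bins) tuple.
    key, meta, bins = record
    collected.append(bins)


records = [(("test", "demo", None, b"digest"), {"gen": 1, "ttl": 0}, {"index": "foo:bar"})]

fake_foreach(records, tuple_callback)
print(collected)  # [{'index': 'foo:bar'}]

try:
    fake_foreach(records, three_arg_callback)
except TypeError as exc:
    # Called with one argument, the three-argument callback fails at once.
    print("TypeError: {0}".format(exc))
```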

Also, can you show me what is in the setnames list? If you can post the output of show indexes in AQL, even a partial one, that would help.

One more thing: is the set name None the problem? A query over a namespace with the None set will give you records that are not part of any formal set. This is different from how a scan over a namespace with a None set behaves (that would return all records in the namespace).

arthurprs commented on July 26, 2024

We're experiencing similar problems with the latest (to date) AS driver and Python 2.7.

This code below returns no results (it's a slight abstraction on top of predicate and query):

uspib.search_by_field_value('iid_pn', concat([input_split['import_id'], page_num]))

But when using a local variable, it magically works:

x = concat([input_split['import_id'], page_num])
uspib.search_by_field_value('iid_pn', x)

Which makes me think it's somehow related to the reference count of the variables.

arthurprs commented on July 26, 2024

My coworker @pauloborges just showed me this

In [44]: query = client.query('test', 'demo')

In [45]: query.where(p.equals('index', 'foo:bar'))
Out[45]: <aerospike.Query at 0x1f309f0>

In [46]: query.results()
Out[46]: []

In [47]: query = client.query('test', 'demo'); query.where(p.equals('index', 'foo:bar')); query.results()
Out[47]: 
[(('test',
   'demo',
   None,
   bytearray(b'g-\xdemG\x10\xacIQ\xa1\x95bU\xc0RK3\xffl\xcd')),
  {'gen': 1, 'ttl': 4294967295},
  {'index': 'foo:bar'})]

whosken commented on July 26, 2024

@rbotzer Thanks for following up.

The argument error doesn't exist in our production code, it was my mistake when transcribing. Sorry about that.

The following is a snippet of show indexes. Hopefully it'll give you some insights.

+-----------+---------+-------------------------+----------+-------+----------------------------------------------+------------+--------+
| ns        | bins    | set                     | num_bins | state | indexname                                    | sync_state | type   |
+-----------+---------+-------------------------+----------+-------+----------------------------------------------+------------+--------+
| "week_2x" | "index" | "live_stats_utc_sun_08" | 1        | "RW"  | "idx__index__week_2x__live_stats_utc_sun_08" | "synced"   | "TEXT" |
| "week_2x" | "index" | "live_stats_utc_mon_08" | 1        | "RW"  | "idx__index__week_2x__live_stats_utc_mon_08" | "synced"   | "TEXT" |
| "week_2x" | "index" | "live_stats_utc_mon_23" | 1        | "RW"  | "idx__index__week_2x__live_stats_utc_mon_23" | "synced"   | "TEXT" |
| "week_2x" | "index" | "live_stats_utc_wed_14" | 1        | "RW"  | "idx__index__week_2x__live_stats_utc_wed_14" | "synced"   | "TEXT" |
| "week_2x" | "index" | "live_stats_utc_thu_06" | 1        | "RW"  | "idx__index__week_2x__live_stats_utc_thu_06" | "synced"   | "TEXT" |

I don't believe we have any None set.

While I'm not sure it's the same cause as the one @pauloborges described, we are observing similar results.

rbotzer commented on July 26, 2024

This is definitely something I want to prioritize because it's affecting core functionality, and because to be honest it's a weird bug. I will spin up similar EC2 instances and see if I can replicate it there.

Can I ask you to please run the unit tests, especially the following ones?

py.test -v test_query.py
py.test -v test_scan.py

ryanwitt commented on July 26, 2024

Thanks @rbotzer

OK, I had to do some tweaking because our cluster doesn't have a test.demo namespace/set, but the modified test is throwing a bunch of errors that look like this:

E       InvalidRequest: (4L, 'Invalid type must be [functional, userland, default]', 'src/main/client/sec_index.c', 502)

Does the test not match the driver? Our driver is now at 1.0.44 as of this test.

rbotzer commented on July 26, 2024

If you removed the test namespace (which is there by default) you'll have to edit the tests. demo is a set inside test, and will be created automatically.

Running py.test -v should return detailed error information (the lines around that error). Can you paste it here? This seems pertinent, because on our various QA environments we don't see that. I'm hoping it will help us focus in the right direction.

rbotzer commented on July 26, 2024

This should be fixed in release 1.0.45 and later. Please verify.

whosken commented on July 26, 2024

Unfortunately, we are still seeing the issue.

Apologies for not being able to get to the unit test. We don't have a test namespace in our cluster, making it a little messier to run. Are there internals from the query instance that we can quickly print out to get you some extra info?

rbotzer commented on July 26, 2024

Hey, Ken. It's fairly easy to add the test namespace back in, even as an in-memory one. You change aerospike.conf and restart a node, then do the same for the next nodes in the cluster. I'll get back to you about the method for getting extra information.

namespace test {
    storage-engine memory
    memory-size 2G
    replication-factor 2
    high-water-memory-pct 60
    stop-writes-pct 90
    default-ttl 0
}

rbotzer commented on July 26, 2024

Release 1.0.46 is now available. I'd appreciate it if you could try it and, if the error is still present, run the tests and quote their output.

rbotzer commented on July 26, 2024

Again, running the query tests and quoting the results would be very helpful if you are having this issue.
The following are two scripts that I'm using to try and reproduce the problem.

issue56_prep.py

from __future__ import print_function
import aerospike
from aerospike.exception import *
import sys

config = {'hosts': [('192.168.119.3', 3000)]}

try:
    client = aerospike.client(config).connect()
except ClientError as e:
    print("Error: {0} [{1}]".format(e.msg, e.code))
    sys.exit(1)

distribution = {}
ival = 0
rid = 1
while rid < 300:
    if rid <= 100 and (rid % 3 != 0):
        ival = ival + 1
    elif (rid > 100 and rid <= 200) and (rid % 5 == 0):
        ival = ival + 1
    elif (rid > 200 and rid <= 300) and (rid % 10 == 0):
        ival = ival + 1
    try:
        distribution[ival] = distribution[ival] + 1
    except KeyError:
        distribution[ival] = 1
    try:
        client.put(('test','i56',rid), {'id':rid, 'ival':ival, 'sval':str(ival)})
    except RecordError as e:
        print("Error: {0} [{1}]".format(e.msg, e.code))
        sys.exit(2)
    rid = rid + 1

client.index_integer_create('test', 'i56', 'ival', 'i56_int_idx')
client.index_string_create('test', 'i56', 'sval', 'i56_str_idx')
print(distribution)
client.close()

The distribution of ival values is printed when this script is done.
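
To know in advance how many records each query in the test script should return, the ival assignment from issue56_prep.py can be replayed offline. This sketch copies the loop's logic without touching a server (expected_distribution is a hypothetical helper name). For the four values queried in the test script it gives 10, 5, 2 and 1 records, which matches the AQL output.

```python
from collections import Counter


def expected_distribution(last_rid=299):
    """Replay the ival assignment from issue56_prep.py for rid = 1..last_rid.

    The prep script loops while rid < 300, so the default stops at 299.
    """
    counts = Counter()
    ival = 0
    for rid in range(1, last_rid + 1):
        if rid <= 100 and rid % 3 != 0:
            ival += 1
        elif 100 < rid <= 200 and rid % 5 == 0:
            ival += 1
        elif 200 < rid <= 300 and rid % 10 == 0:
            ival += 1
        counts[ival] += 1
    return counts


dist = expected_distribution()
for value in (96, 74, 6, 7):
    print("ival={0}: {1} records".format(value, dist[value]))
```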

issue56_test.py

from __future__ import print_function
import aerospike
from aerospike.exception import *
from aerospike import predicates as p
import sys

config = {'hosts': [('192.168.119.3', 3000)]}

try:
    client = aerospike.client(config).connect()
except ClientError as e:
    print("Error: {0} [{1}]".format(e.msg, e.code))
    sys.exit(1)

# query for an expected 10 record result:
query = client.query("test", "i56")
query.where(p.equals("ival", 96))
res = query.results()
print(res)
print("There are ", len(res), " results for this query")
print('-----------------------------------------')

# query for an expected 5 record result:
query = client.query("test", "i56")
query.where(p.equals("ival", 74))
res = query.results()
print(res)
print("There are ", len(res), " results for this query")
print('-----------------------------------------')

# query for an expected 2 record result:
query = client.query("test", "i56")
query.where(p.equals("ival", 6))
res = query.results()
print(res)
print("There are ", len(res), " results for this query")
print('-----------------------------------------')

# query for an expected 1 record result:
query = client.query("test", "i56")
query.where(p.equals("ival", 7))
res = query.results()
print(res)
print("There are ", len(res), " results for this query")

client.close()

So far I have seen the expected results on OS X 10.10, Debian 7, and Ubuntu 14.04. If you are getting unexpected results, please copy and paste them in a comment, and also check in AQL whether its results differ from the client's.

AQL output

aql> select * from test.i56 where ival=96
+------+------+-----+
| ival | sval | id  |
+------+------+-----+
| 96   | "96" | 293 |
| 96   | "96" | 298 |
| 96   | "96" | 290 |
| 96   | "96" | 292 |
| 96   | "96" | 294 |
| 96   | "96" | 299 |
| 96   | "96" | 296 |
| 96   | "96" | 297 |
| 96   | "96" | 291 |
| 96   | "96" | 295 |
+------+------+-----+
10 rows in set (0.032 secs)

aql> select * from test.i56 where ival=74
+------+------+-----+
| ival | sval | id  |
+------+------+-----+
| 74   | "74" | 139 |
| 74   | "74" | 137 |
| 74   | "74" | 136 |
| 74   | "74" | 138 |
| 74   | "74" | 135 |
+------+------+-----+
5 rows in set (0.008 secs)

aql> select * from test.i56 where ival=6
+------+------+----+
| ival | sval | id |
+------+------+----+
| 6    | "6"  | 9  |
| 6    | "6"  | 8  |
+------+------+----+
2 rows in set (0.006 secs)

aql> select * from test.i56 where ival=7
+------+------+----+
| ival | sval | id |
+------+------+----+
| 7    | "7"  | 10 |
+------+------+----+
1 row in set (0.007 secs)
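
Since the four queries in issue56_test.py differ only in the queried value and the expected count, they could be folded into one loop that reports only mismatches, which makes a partial result easier to spot in pasted output. A minimal sketch: check_counts is a hypothetical helper that accepts any callable returning a list of records for a value. The commented run_query shows how it might wrap the same client calls the script uses, and EXPECTED holds the counts from the AQL output above.

```python
def check_counts(run_query, expected):
    """Compare result sizes with expected counts.

    run_query(value) must return a list of records; expected maps each
    queried value to the number of records it should return. Returns a
    dict {value: (expected, actual)} containing only the mismatches.
    """
    mismatches = {}
    for value in sorted(expected):
        actual = len(run_query(value))
        if actual != expected[value]:
            mismatches[value] = (expected[value], actual)
    return mismatches


# With the client and predicates from issue56_test.py, run_query could be:
#
# def run_query(value):
#     query = client.query("test", "i56")
#     query.where(p.equals("ival", value))
#     return query.results()

EXPECTED = {96: 10, 74: 5, 6: 2, 7: 1}
```

An empty dict means every query returned the expected number of records.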

rbotzer commented on July 26, 2024

With the help of @arthurprs and @pauloborges we have debugged this issue, and it is particular to EC2.

Overview

EC2 nodes have a private IP, which is visible in their subnet within their availability zone, and a public IP. Application nodes (where the client lives) may or may not be able to reach the private IP, for example if they connect from a different zone.

Initial Connection

When the client connects to the public IP of a cluster node it will inquire about the other nodes in the cluster. This is equivalent to running asinfo -v 'services' on that node. Depending on its configuration, the cluster may respond with unreachable private IPs.

By default, the access-address config parameter is set to any, which means the node will advertise its own IP address. On EC2, the node knows only its private IP, so that is what it advertises.

Consequences

Key-Value Operations

These will continue to work through proxying. The client will send all reads and writes to the single node it has access to, and that node will proxy those operations. You will see a high number of proxy events, which normally only show up during migrations.

Queries

Queries are not proxied. As a result, the client will send the query request only to the single node it has a connection to. The records matched against that node's secondary index will stream back from that node, yielding fewer records than expected. No data will come back from any node that is unreachable by the client. This is not ideal behavior; it would be better for the client to return a clear error rather than a partial result. I have opened an internal ticket for this (AER-3903).
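
The difference between a silent partial result and the clear error argued for above can be modeled in a few lines. This is a toy simulation, not the C client's code: cluster is a dict mapping a node address to the records that node holds for the query, reachable is the set of addresses the client can connect to, and all names and addresses are hypothetical.

```python
class PartialResultError(Exception):
    """Raised when some cluster nodes could not be queried."""


def fan_out_query(cluster, reachable, strict=False):
    """Collect matching records from every node the client can reach.

    Queries are not proxied, so records held by unreachable nodes are
    simply missing. With strict=True, raise instead of returning a
    partial result.
    """
    unreachable = sorted(node for node in cluster if node not in reachable)
    if strict and unreachable:
        raise PartialResultError("unreachable nodes: " + ", ".join(unreachable))
    results = []
    for node in sorted(cluster):
        if node in reachable:
            results.extend(cluster[node])
    return results


cluster = {
    "10.0.0.1:3000": ["r1", "r2"],
    "10.0.0.2:3000": ["r3"],
    "10.0.0.3:3000": ["r4", "r5"],
}
# Only the seed node is reachable; the other two advertised private IPs are not.
print(fan_out_query(cluster, {"10.0.0.1:3000"}))  # ['r1', 'r2']
```

The non-strict call returns two of five records with no indication that anything is missing, which is the symptom reported earlier in this thread.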

Workaround

This problem is specific to a cloud environment such as EC2. There are two workarounds:

  • Locate the application (client) nodes in the same availability zone as the server nodes. This will also result in lower latency between them.
  • If the application nodes may be in a different zone, configure the access-address to be the public IP of the node. Even if the clients are all in the same zone, ensure that each client node can access the private IP of all the cluster nodes. Again, use asinfo or telnet to port 3000 on the private IPs to determine this.
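
The reachability check in the second workaround can be scripted from each application node. This is a minimal sketch that assumes the output of asinfo -v 'services' has been captured as a string of semicolon-separated host:port entries (for example 10.0.0.2:3000;10.0.0.3:3000). It then attempts a plain TCP connection to each address, the same test as telnet to port 3000. The helper names and the 2-second timeout are illustrative.

```python
import socket


def parse_services(services):
    """Split 'host:port;host:port' into a list of (host, port) pairs."""
    peers = []
    for entry in services.strip().split(";"):
        if not entry:
            continue
        host, _, port = entry.rpartition(":")
        peers.append((host, int(port)))
    return peers


def unreachable_peers(peers, timeout=2.0):
    """Return the (host, port) pairs that refuse or time out on TCP connect."""
    failed = []
    for host, port in peers:
        try:
            conn = socket.create_connection((host, port), timeout=timeout)
        except (socket.error, OSError):
            failed.append((host, port))
        else:
            conn.close()
    return failed
```

Run it on every application node. Any address it returns belongs to a node whose query results that client cannot receive.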

@whosken and @ryanwitt please check if this is your problem, and let me know if the workarounds solve it. Thank you everyone for your help in identifying and debugging this problem.

whosken commented on July 26, 2024

@rbotzer Thanks for your patience and efforts.

I have verified that our nodes are all in the same availability zone. While the cause may be the same, the workaround may not resolve our situation. I have also tested release 1.0.46 and still see the faulty behavior.

A similar bug is not present in the Node.js driver, however, and I am uncertain why. Would that suggest there's a workaround or solution we can apply on the client side?

rbotzer commented on July 26, 2024

@whosken thanks for looking into it. Did you also try asinfo -v services on all the app nodes? That should show you the IPs of the other nodes in the cluster, while asinfo -v service shows you the IP of the node you are connected to. Together they should give you a good idea of whether the cluster is defined correctly and accessible from all the app nodes. I'd appreciate it if you did that.

Beyond that, I am unsure. We seemed to see the same problem with AQL, which also wraps around the C client, and that suggested it wasn't specifically in Python. Today was a long day 😫 and I think I've maxed out on problem-solving. I will be talking to the main C client developer tomorrow about it, though.

PS: did you run AQL and node.js on the same app nodes as where your Python scripts run?

rbotzer commented on July 26, 2024

One last try to collect information. Release 1.0.49 and server release 3.5.15 are out. Please try again. In case the problem remains please answer the questions I had up the thread, and I'll reopen.
