test_stream_parity can fail if it's running on a slow disk (i.e., not SSD/ramdisk). This is bad because it's part of our make site_test suite, and presumably should pass even in less resource-heavy environments.
The particular failure mode that I saw was:
-- 2014-10-30 10:33:07,052 utils:50 INFO ===== test_set_insert_id (__main__.TestUpdateStream) [1847/8060]
-- 2014-10-30 10:33:07,052 utils:50 INFO ===== test_set_insert_id (__main__.TestUpdateStream) ...
-- 2014-10-30 10:33:07,243 update_stream:339 DEBUG test_set_insert_id: starting @ {'MariaDB': '0-62344-526'}
-- 2014-10-30 10:33:07,294 utils:50 INFO ===== test_set_insert_id (__main__.TestUpdateStream) ... ok
-- 2014-10-30 10:33:07,294 utils:50 INFO ===== test_stream_parity (__main__.TestUpdateStream)
-- 2014-10-30 10:33:07,294 utils:50 INFO ===== test_stream_parity (__main__.TestUpdateStream) ...
-- 2014-10-30 10:33:17,306 update_stream:284 DEBUG run_test_stream_parity starting @ {'MariaDB': '0-62344-652'}
-- 2014-10-30 10:33:18,240 update_stream:316 DEBUG Test Failed - # of records mismatch, master [{'Category': 'DML', 'GTIDField': {}, 'Timestamp': 1414690397L, 'TableName': 'vt_a', 'Sql': '', 'PkRows': [[('
eid', 1L), ('id', 1L)]]}, {'Category': 'DML', 'GTIDField': {}, 'Timestamp': 1414690397L, 'TableName': 'vt_a', 'Sql': '', 'PkRows': [[('eid', 2L), ('id', 2L)]]}, {'Category': 'DML', 'GTIDField': {}, 'Timest
amp': 1414690397L, 'TableName': 'vt_a', 'Sql': '', 'PkRows': [[('eid', 3L), ('id', 3L)]]}, {'Category': 'DML', 'GTIDField': {}, 'Timestamp': 1414690397L, 'TableName': 'vt_a', 'Sql': '', 'PkRows': [[('eid',
4L), ('id', 4L)]]}, {'Category': 'DML', 'GTIDField': {}, 'Timestamp': 1414690397L, 'TableName': 'vt_a', 'Sql': '', 'PkRows': [[('eid', 5L), ('id', 5L)]]}, {'Category': 'DML', 'GTIDField': {}, 'Timestamp':
1414690397L, 'TableName': 'vt_a', 'Sql': '', 'PkRows': [[('eid', 6L), ('id', 6L)]]}, {'Category': 'DML', 'GTIDField': {}, 'Timestamp': 1414690397L, 'TableName': 'vt_a', 'Sql': '', 'PkRows': [[('eid', 7L),
('id', 7L)]]}, {'Category': 'DML', 'GTIDField': {}, 'Timestamp': 1414690397L, 'TableName': 'vt_a', 'Sql': '', 'PkRows': [[('eid', 8L), ('id', 8L)]]}, {'Category': 'DML', 'GTIDField': {}, 'Timestamp': 1414
690397L, 'TableName': 'vt_a', 'Sql': '', 'PkRows': [[('eid', 9L), ('id', 9L)]]}, {'Category': 'DML', 'GTIDField': {}, 'Timestamp': 1414690397L, 'TableName': 'vt_a', 'Sql': '', 'PkRows': [[('eid', 10L), ('i
d', 10L)]]}, {'Category': 'DML', 'GTIDField': {}, 'Timestamp': 1414690397L, 'TableName': 'vt_a', 'Sql': '', 'PkRows': [[('eid', 11L), ('id', 11L)]]}, {'Category': 'DML', 'GTIDField': {}, 'Timestamp': 14146
90397L, 'TableName': 'vt_a', 'Sql': '', 'PkRows': [[('eid', 12L), ('id', 12L)]]}, {'Category': 'DML', 'GTIDField': {}, 'Timestamp': 1414690397L, 'TableName': 'vt_a', 'Sql': '', 'PkRows': [[('eid', 13L), ('
id', 13L)]]}, {'Category': 'DML', 'GTIDField': {}, 'Timestamp': 1414690397L, 'TableName': 'vt_a', 'Sql': '', 'PkRows': [[('eid', 14L), ('id', 14L)]]}, {'Category': 'DML', 'GTIDField': {}, 'Timestamp': 1414
690397L, 'TableName': 'vt_a', 'Sql': '', 'PkRows': [[('eid', 15L), ('id', 15L)]]}, {'Category': 'POS', 'GTIDField': {'MariaDB': '0-62344-653'}, 'Timestamp': 1414690397L, 'TableName': '', 'Sql': '', 'PkRows
': []}] replica [{'Category': 'DML', 'GTIDField': {}, 'Timestamp': 1414690397L, 'TableName': 'vt_insert_test', 'Sql': '', 'PkRows': [[('id', 1000252L)]]}, {'Category': 'DML', 'GTIDField': {}, 'Timestamp':
1414690397L, 'TableName': 'vt_insert_test', 'Sql': '', 'PkRows': [[('id', 1000253L)]]}, {'Category': 'DML', 'GTIDField': {}, 'Timestamp': 1414690397L, 'TableName': 'vt_insert_test', 'Sql': '', 'PkRows': [[
('id', 1000254L)]]}, {'Category': 'DML', 'GTIDField': {}, 'Timestamp': 1414690397L, 'TableName': 'vt_insert_test', 'Sql': '', 'PkRows': [[('id', 1000255L)]]}, {'Category': 'POS', 'GTIDField': {'MariaDB': '
0-62344-652'}, 'Timestamp': 1414690397L, 'TableName': '', 'Sql': '', 'PkRows': []}]
-- 2014-10-30 10:33:18,241 utils:50 INFO ===== test_stream_parity (__main__.TestUpdateStream) ... FAIL
-- 2014-10-30 10:33:18,242 update_stream:126 DEBUG Tearing down the servers and setup
-- 2014-10-30 10:33:18,242 tablet:624 DEBUG killing vttablet: test_nj-0000062344
-- 2014-10-30 10:33:18,242 tablet:624 DEBUG killing vttablet: test_nj-0000062345
-- 2014-10-30 10:33:18,282 vtclient:127 WARNING db connection failed: test_keyspace.0.master :15009, (error(111, 'Connection refused'), '')
-- 2014-10-30 10:33:18,283 vtdb_logger:24 INFO Fetched keyspace test_keyspace from topo_client in 0.000878 secs
-- 2014-10-30 10:33:18,283 vtdb_logger:64 WARNING vtclient_exception for test_keyspace.0.master: ('unable to create vt connection', 'test_keyspace.0.master', ':15009', FatalError
((error(111, 'Connection refused'), ':15009 master test_keyspace/0>'),))
Exception in thread write_thd:
Traceback (most recent call last):
File "/usr/lib/python2.7/threading.py", line 810, in __bootstrap_inner
self.run()
File "/usr/lib/python2.7/threading.py", line 763, in run
self.__target(*self.__args, **self.__kwargs)
File "test/update_stream.py", line 197, in perform_writes
self._exec_vt_txn(self._populate_vt_insert_test)
File "test/update_stream.py", line 269, in _exec_vt_txn
vtdb_conn = self._vtdb_conn('localhost:%u' % master_tablet.port)
File "test/update_stream.py", line 263, in _vtdb_conn
conn.connect()
File "mariadb/py-vtdb/vtdb/vtclient.py", line 95, in connect
return self._connect()
File "mariadb/py-vtdb/vtdb/vtclient.py", line 133, in _connect
'unable to create vt connection', db_key, host_addr, db_exception)
OperationalError: ('unable to create vt connection', 'test_keyspace.0.master', ':15009', FatalError((error(111, 'Connection refused'), ''),))
-- 2014-10-30 10:33:18,464 utils:214 DEBUG run: ['mariadb/bin/mysqlctl', '-log_dir', 'mariadb/vtdataroot/tmp', '-tablet_uid', '62344', 't
eardown', '-force'] extra_env={'EXTRA_MY_CNF': 'mariadb/src/github.com/youtube/vitess/config/mycnf/master_mariadb.cnf'}
-- 2014-10-30 10:33:18,466 utils:214 DEBUG run: ['mariadb/bin/mysqlctl', '-log_dir', 'mariadb/vtdataroot/tmp', '-tablet_uid', '62345', 't
eardown', '-force'] extra_env={'EXTRA_MY_CNF': 'mariadb/src/github.com/youtube/vitess/config/mycnf/master_mariadb.cnf'}
-- 2014-10-30 10:33:22,704 utils:50 INFO ===== ======================================================================
-- 2014-10-30 10:33:22,704 utils:50 INFO ===== FAIL: test_stream_parity (__main__.TestUpdateStream)
@enisoc and I hypothesize that this is flaky because of the first two lines of the test:
def test_stream_parity(self):
master_start_position = _get_master_current_position()
replica_start_position = _get_repl_current_position()
It could be that if there is replication lag at the time that the test starts, the master and replica have different current positions, which cause their events streams to not be equal.