collage's Issues

Thread safety question on co/instanceCache.cpp

Is it thread safe to return a reference in:

const InstanceCache::Data& InstanceCache::operator[]( const UUID& id )

?

I don't see the lock being returned as part of InstanceCache::Data.
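For illustration, one common way to sidestep the question is to copy the entry while the lock is held instead of handing out a reference. The sketch below is only an assumption about how that could look: SketchCache and SketchData are hypothetical names, not the Collage types, and std::mutex stands in for whatever lock InstanceCache actually uses.

```cpp
#include <cstdint>
#include <mutex>
#include <string>
#include <unordered_map>

// Hypothetical stand-in for InstanceCache::Data; not the Collage type.
struct SketchData
{
    std::string payload;
};

class SketchCache
{
public:
    void add( const uint64_t id, const std::string& payload )
    {
        std::lock_guard< std::mutex > lock( _mutex );
        _items[ id ].payload = payload;
    }

    void erase( const uint64_t id )
    {
        std::lock_guard< std::mutex > lock( _mutex );
        _items.erase( id );
    }

    // Returns a copy made while the lock is held, so the caller never
    // touches memory that a concurrent erase() could invalidate.
    // An unknown id yields an empty SketchData.
    SketchData get( const uint64_t id ) const
    {
        std::lock_guard< std::mutex > lock( _mutex );
        const auto i = _items.find( id );
        if( i == _items.end( ))
            return SketchData();
        return i->second;
    }

private:
    mutable std::mutex _mutex;
    std::unordered_map< uint64_t, SketchData > _items;
};
```

The copy costs more than a reference, which may matter for large instance data. Whether the real operator[] is safe depends on how callers hold the lock, which this excerpt does not show.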

Add LocalNode::syncObject

syncObject( object, id, version, node ) is an optimized version of map + unmap which just copies the given instance data onto the object without mapping it. It will mostly be used by static objects such as init data in Equalizer.

boost version

I'd like to wake up this sleeping project.
Which Boost version fits Collage and its subprojects best?
I compiled with boost_1_75_0 and got errors: some Boost symbols cannot be found.

Global argument parsing broken

Reported by @hernando: when launching a node:

localNode.cpp:296 26 Invalid global variables string: ##100#100#100#10#1#5#20#3#64#5#65000#524288#1#1#8#512#5000#-1#300000#1023##, using default global variables.
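To make the shape of that string easier to inspect, here is a minimal tokenizer sketch. It only assumes what is visible in the log line, i.e. integer fields separated by '#' with empty fields at both ends. parseHashSeparated is a hypothetical helper, not the parser in localNode.cpp, and it does not reproduce whatever validation Collage applies.

```cpp
#include <sstream>
#include <string>
#include <vector>

// Splits a '#'-separated string and converts each non-empty field to an
// integer. Empty fields (such as the leading and trailing "##") are skipped.
// std::stol throws std::invalid_argument on a non-numeric field.
std::vector< long > parseHashSeparated( const std::string& text )
{
    std::vector< long > values;
    std::istringstream stream( text );
    std::string token;
    while( std::getline( stream, token, '#' ))
    {
        if( token.empty( ))
            continue;
        values.push_back( std::stol( token ));
    }
    return values;
}
```

Comparing the number of parsed fields with the number the receiving node expects could be a first check when both sides run different versions.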

BufferCache race may lead to segmentation fault

The BufferCache deletes free Buffers in compact(). After the refCount of a buffer reaches 0, i.e., it is free, notifyFree() is still being processed. This leaves a small window in which the buffer may be deleted while notifyFree() is still running on it.

The rtt branch contains a broken attempt to fix this.

LocalNode::handleData() asserts handling non-pending receive

From the barrier unit test (https://s3.amazonaws.com/archive.travis-ci.org/jobs/20283377/log.txt) and user report (see BBPRTN-307):

1: 22250 R PN2co9Loc src/Collage/co/connection.cpp:256  14 Assert: _impl->buffer [No pending receive on TCPIP#102400#testing-worker-linux-4-1-25432-linux-5-20283377##3109#default#]   in: 
1:   9: lunchbox::abort()
1:   8: co::Connection::recvSync(lunchbox::RefPtr<co::Buffer>&, bool)
1:   7: co::LocalNode::_readHead(lunchbox::RefPtr<co::Connection>)
1:   6: co::LocalNode::_handleData()
1:   5: co::LocalNode::_runReceiverThread()
1:   4: co::detail::ReceiverThread::run()
1:   3: lunchbox::Thread::_runChild()
1:   2: lunchbox::Thread::runChild(void*)
1:   1: /lib/x86_64-linux-gnu/libpthread.so.0(+0x7e9a) [0x2b8873267e9a]
1:   0: /lib/x86_64-linux-gnu/libc.so.6(clone+0x6d) [0x2b8873d4a4bd]
1: 22250 R PN2co9Loc c/Lunchbox/lunchbox/debug.cpp:44   16 
 1/21 Test  #1: barrier ..........................***Exception: Other  0.03 sec

Strict aliasing issue with gcc 4.4

While compiling a Release version of Collage with gcc 4.4 (for instance on RHEL6), you'll get a warning (as error) regarding breaking strict aliasing rules:

Building CXX object tests/CMakeFiles/dataStream.dir/dataStream.cpp.o
cc1plus: warnings being treated as errors
/home/nachbaur/dev/bytest/Build/install/include/lunchbox/bitOperation.h: In function ‘int testMain(int, char**)’:
/home/nachbaur/dev/bytest/Build/install/include/lunchbox/bitOperation.h:111: error: dereferencing pointer ‘value.133’ does break strict-aliasing rules
/home/nachbaur/dev/bytest/Build/install/include/lunchbox/bitOperation.h:119: note: initialized from here
/home/nachbaur/dev/bytest/Build/install/include/lunchbox/bitOperation.h:142: error: dereferencing pointer ‘value.136’ does break strict-aliasing rules
/home/nachbaur/dev/bytest/Build/install/include/lunchbox/bitOperation.h:150: note: initialized from here

Google tells you to disable strict-aliasing entirely on gcc as it is broken by design, especially if you follow Linus' argumentation (http://www.mail-archive.com/[email protected]/msg01647.html). Later versions of gcc do not have this issue, though. Also Clang 3.x works fine with the current code.
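As an alternative to disabling strict aliasing, type punning through std::memcpy is well defined and does not trigger this diagnostic. The sketch below is not the code from lunchbox/bitOperation.h, which is not shown here; floatBits and bitsToFloat are hypothetical helpers that only illustrate the technique.

```cpp
#include <cstdint>
#include <cstring>

// Reads the bit pattern of a float by copying its bytes, instead of
// dereferencing a reinterpret_cast'ed pointer (which breaks strict aliasing).
inline uint32_t floatBits( const float value )
{
    static_assert( sizeof( uint32_t ) == sizeof( float ), "size mismatch" );
    uint32_t bits = 0;
    std::memcpy( &bits, &value, sizeof( bits ));
    return bits;
}

// Inverse operation: rebuilds a float from its bit pattern.
inline float bitsToFloat( const uint32_t bits )
{
    static_assert( sizeof( uint32_t ) == sizeof( float ), "size mismatch" );
    float value = 0.f;
    std::memcpy( &value, &bits, sizeof( value ));
    return value;
}
```

Compilers generally optimize such a fixed-size memcpy into a plain register move, so the safe form should not cost anything in a Release build.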

Make CommandQueue maximum size configurable

Requested by @delyas: For larger client pools with a lot of requests (mapping), the appNode command queue blows up, blocking the receiver thread and eventually the sending clients.

This has the potential for deadlocks, but I can't see an obvious pattern atm.
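A configurable limit could look roughly like the sketch below. It is not co::CommandQueue: BoundedQueue and tryPush are hypothetical names, and it uses std primitives rather than Lunchbox ones. The non-blocking tryPush is one possible design; it leaves the decision of what to do on a full queue to the caller rather than stalling the receiver thread.

```cpp
#include <condition_variable>
#include <cstddef>
#include <deque>
#include <mutex>

template< class T > class BoundedQueue
{
public:
    // maxSize is the configurable upper bound on queued items.
    explicit BoundedQueue( const size_t maxSize ) : _maxSize( maxSize ) {}

    // Non-blocking push: returns false when the queue is full, so the
    // caller can apply back-pressure instead of blocking.
    bool tryPush( const T& item )
    {
        std::lock_guard< std::mutex > lock( _mutex );
        if( _items.size() >= _maxSize )
            return false;
        _items.push_back( item );
        _notEmpty.notify_one();
        return true;
    }

    // Blocks until an item is available, then removes and returns it.
    T pop()
    {
        std::unique_lock< std::mutex > lock( _mutex );
        _notEmpty.wait( lock, [this]{ return !_items.empty(); } );
        T item = _items.front();
        _items.pop_front();
        return item;
    }

    size_t size() const
    {
        std::lock_guard< std::mutex > lock( _mutex );
        return _items.size();
    }

private:
    const size_t _maxSize;
    mutable std::mutex _mutex;
    std::condition_variable _notEmpty;
    std::deque< T > _items;
};
```

A blocking push would reintroduce the stall described above, which is why the deadlock concern applies to any variant where the receiver thread waits on the queue.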

Make ConnectionSet processing round-robin

Reported by @delyas: The current ConnectionSet::select processing favors the first connections, which causes timeouts when a set of clients pushes a lot of requests fast (mapping) onto a single node, which will then first serve the 'first' clients.

Processing this in a round-robin fashion will ensure that all clients make progress and should speed up the rcv thread since commands are fully read already.
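The selection step could be sketched as follows, assuming the set of ready connections is available as a list of flags. nextReady is a hypothetical helper, not part of ConnectionSet, and the real select() works on OS-level events rather than a vector.

```cpp
#include <cstddef>
#include <vector>

// Returns the index of the next ready connection after 'last', wrapping
// around, or ready.size() if no connection is ready. Starting the search
// after the last served index keeps low indices from being favored.
inline size_t nextReady( const std::vector< bool >& ready, const size_t last )
{
    const size_t n = ready.size();
    for( size_t step = 1; step <= n; ++step )
    {
        const size_t candidate = ( last + step ) % n;
        if( ready[ candidate ] )
            return candidate;
    }
    return n;
}
```

For the first call, passing ready.size() - 1 as 'last' starts the search at index 0.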

Does Collage have online/offline management?

I found the ping() and pingIdleNodes() commands, but could not find any online-state management for the nodes.
Does Collage maintain the nodes' state via heartbeat and provide a state request API, or an API to register online/offline callbacks?
If not, do you have any suggestions for building it myself?

Thanks very much!

LocalNode::notifyDisconnect is invoked spuriously in case of simultaneous connect

The 'simultaneous connect' code path in co::LocalNode::_cmdConnectReply() will invoke _closeNode() on one of the connections, which results in a notifyDisconnect callback.

An application will receive two LocalNode::notifyConnect callbacks, followed by one notifyDisconnect, all referring to the same remote node.
An application that uses notifyDisconnect to detect the loss of connectivity to a remote node needs to implement additional logic to distinguish these spurious disconnect notifications from those actually indicating a loss of connectivity.

It might still be desirable to have matching numbers of connect and disconnect events, but in that case the documentation should clearly state the possibility of spurious disconnect notifications.
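The additional application-side logic could be as small as a per-node counter, as in the sketch below. It assumes the callback order described above (two connects, then one spurious disconnect). ConnectivityTracker and the uint64_t node identifier are hypothetical, and the wiring into the LocalNode callbacks is left out.

```cpp
#include <cstdint>
#include <unordered_map>

class ConnectivityTracker
{
public:
    // Call from the application's connect notification.
    void onConnect( const uint64_t nodeID ) { ++_links[ nodeID ]; }

    // Call from the application's disconnect notification. Returns true
    // only when the last counted connection to the node is gone, i.e. when
    // the disconnect indicates an actual loss of connectivity.
    bool onDisconnect( const uint64_t nodeID )
    {
        const auto i = _links.find( nodeID );
        if( i == _links.end( ))
            return false; // unknown node, nothing to report
        if( --i->second > 0 )
            return false; // spurious: another connection is still counted
        _links.erase( i );
        return true;
    }

private:
    std::unordered_map< uint64_t, unsigned > _links;
};
```

This only works if connect and disconnect events are always delivered in matching numbers, which is the behavior the last paragraph argues for.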

Node::connect race condition

The connection is added to the connection set before the recv or accept is posted, which can cause it to fire in the recv thread and cause the recvSync to be called before the recvNB from the thread calling connect().

g++ compile error at udtConnection.cpp:158:10

OS: Ubuntu 13.04
g++ (Ubuntu/Linaro 4.7.3-1ubuntu1) 4.7.3

I got the following compilation error after typing

make Equalizer

[ 22%] Performing build step for 'Collage'
[ 1%] Building CXX object co/CMakeFiles/Collage.dir/udtConnection.cpp.o
/home/lguyot/develop/Buildyard/src/Collage/co/udtConnection.cpp:158:10: error: 'void co::{anonymous}::CTCP::onACK(int32_t)' marked override, but does not override
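This diagnostic generally means that no virtual function with exactly that signature exists in the base class, for example because the installed library headers declare the parameter differently. The sketch below reproduces the pattern with hypothetical types; the base signature shown is an assumption for illustration, not a statement about the UDT headers on that system.

```cpp
#include <cstdint>

struct Base
{
    virtual ~Base() {}
    // Hypothetical base declaration taking the argument by const reference.
    virtual void onACK( const int32_t& ) { ++calls; }
    int calls = 0;
};

struct Derived : Base
{
    // The following would fail with "marked override, but does not
    // override", because the parameter type differs from the base:
    //     void onACK( int32_t ) override;

    // Matching the base signature exactly makes the override valid.
    void onACK( const int32_t& ) override { calls += 10; }
};
```

Comparing the signature in udtConnection.cpp with the one in the UDT headers actually picked up by the build should show which side needs to change.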

test/connection fails for RDMA connections

I've tested this on two different machines with IB interfaces, with the same result: the test hangs at line 132 when calling writer->connect().
I've tried moving acceptSync to the reader thread, in case connect was blocking because of that, but then it fails in _checkCQ at rmdaConnection.cpp:539 on one machine (EPFL vizcluster) and at rmdaConnection.cpp:595 on the other (Lugano cluster).
On the other hand, coNetPerf works.

co::Serializable fdConnection issue (related to compression?)

I've got a distributed object that inherits from co::Serializable. This object serializes a few kilobytes of data all at once, which is then pushed to slave instances. No subsequent commits happen to that object.

Everything was working fine and dandy with previous versions of EQ (this is actually code that's barely been touched in two years or so). Recently, I pulled from the Eyescale repos to the latest versions of all the libraries (Collage commit: 378ab5c).

Now I'm noticing the following behavior. If I commit less than 600-700 bytes of data, everything works fine. However, if I serialize more than that, I get a bunch of errors within co::fdConnection ("Invalid argument (22)", "Error during write after 0 bytes, closing connection", etc).

On the support forums (http://software.1713.n2.nabble.com/Issue-with-co-serializable-in-recent-Collage-version-td7585328.html#a7585332) Daniel noted that this is related to compression. Indeed, deactivating compression on the object seems to restore functionality.

RDMAConnection not endian-safe

At least the initial handshake commands to exchange information are not byte-swapped. Need test setup, current Cadmos setup does not allow us to log onto Intel admin servers.
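One endian-neutral way to exchange such handshake fields is to write them byte by byte in a fixed order, as sketched below. This is not the RDMAConnection code: storeBigEndian32 and loadBigEndian32 are hypothetical helpers, and big-endian wire order is an assumption chosen for the example.

```cpp
#include <cstdint>

// Writes a 32-bit value into 'out' in big-endian order, independent of the
// byte order of the host.
inline void storeBigEndian32( const uint32_t value, uint8_t* out )
{
    out[0] = static_cast< uint8_t >( value >> 24 );
    out[1] = static_cast< uint8_t >( value >> 16 );
    out[2] = static_cast< uint8_t >( value >> 8 );
    out[3] = static_cast< uint8_t >( value );
}

// Reads a 32-bit big-endian value from 'in', independent of host byte order.
inline uint32_t loadBigEndian32( const uint8_t* in )
{
    return ( static_cast< uint32_t >( in[0] ) << 24 ) |
           ( static_cast< uint32_t >( in[1] ) << 16 ) |
           ( static_cast< uint32_t >( in[2] ) << 8 ) |
             static_cast< uint32_t >( in[3] );
}
```

Because both functions use shifts instead of memory reinterpretation, the same code works unchanged on little- and big-endian hosts, which also makes it testable without the mixed-endian setup mentioned above.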

RSP distribution problem (Windows)

During an RSP session, one of the clients gets disconnected and the whole session stalls. It can be reproduced using the eqPly example with a model such as dragon_vrip. The issue occurs on a Windows-based cluster.

Steps to reproduce:
pre-launch clients with RSP option
rttScaleClient.exe --eq-client --eq-listen 10.0.0.2:12345 --eq-listen RSP#102400#239.255.42.42#10.0.0.2#11147##

run master
eqPly.exe -m \user\nlu\Models\dragon_vrip.ply --eq-config \user\nlu\rttScaleConfig_clean.eqc

config: http://pastebin.com/DjWXj87Q

Custom protocol support

In relation to @julitopower work:

  • Add a protocol magic identifier as first word on a connection
  • Save protocol with connection
  • Implement current code path for co protocol
  • Add raw protocol
  • Allow protocol -> boost::function( ConnectionPtr ) registration for custom protocols which will be called when data arrives on a non-co connection
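The registration idea in the last item could be sketched as below. All names are hypothetical: ProtocolRegistry, SketchConnection and the magic value are made up for illustration, and std::function and std::shared_ptr are used in place of boost::function and co::ConnectionPtr to keep the example self-contained.

```cpp
#include <cstdint>
#include <functional>
#include <map>
#include <memory>
#include <utility>

// Hypothetical stand-in for a connection; counts how often it was handled.
struct SketchConnection
{
    int handled = 0;
};
typedef std::shared_ptr< SketchConnection > SketchConnectionPtr;
typedef std::function< void( SketchConnectionPtr ) > ProtocolHandler;

class ProtocolRegistry
{
public:
    // Registers a handler for a protocol magic. Returns false if the magic
    // is already taken.
    bool registerProtocol( const uint32_t magic,
                           const ProtocolHandler& handler )
    {
        return _handlers.insert( std::make_pair( magic, handler )).second;
    }

    // Calls the handler registered for 'magic', i.e. for the first word read
    // from a connection. Returns false for an unknown protocol.
    bool dispatch( const uint32_t magic, SketchConnectionPtr connection ) const
    {
        const auto i = _handlers.find( magic );
        if( i == _handlers.end( ))
            return false;
        i->second( connection );
        return true;
    }

private:
    std::map< uint32_t, ProtocolHandler > _handlers;
};
```

In this layout the existing co code path would simply be one more registered handler, selected by its own magic.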

RSP connection performance

The test tests/connection.cpp fails on a machine with an Intel i7-3630QM @ 2.40 GHz because the protocol performs very poorly and the watchdog goes off.
If I reduce NPACKETS until the test finishes in time, the reported bandwidth is less than 1 MB/s. On another machine with an Intel i7 (non-mobile, but I can't recall the model and can't check it right now) I get 11 MB/s. On an Intel Xeon E5645 at 2.4 GHz the performance is 96 MB/s.
I've also noticed that it affects other network applications (Spotify streaming freezes on the machine with the mobile CPU).

build.BAT Error NMAKE U1073

NMAKE : fatal error U1073: don't know how to make 'preinstall'
Stop.
CMake Error: Error processing file: cmake_install.cmake
CMake Error: Error processing file: cmake_install.cmake
CMake Error: Error processing file: cmake_install.cmake
CMake Error: Error processing file: cmake_install.cmake

Remove Buffer reallocation on clone

Use lunchbox::Buffer* _data in co::Buffer to point to master buffer, and keep the pointers in the lunchbox::Buffer unchanged.

Make lunchbox::Buffer member private again

Use cloning in ObjectStore (for CommandFunc 'this')

Related to Eyescale/Equalizer#145

Barrier races and deadlocks with sync()

To reproduce: have multiple threads A share the same barrier, commit multiple versions ahead from a different thread B, then have the threads A sync and enter the barrier.

The sync after leaving the barrier can be executed before another thread has left the barrier: as specified for pthread_cond_broadcast(), the order of execution is determined by the scheduling policy. Since unpack() during sync() resets the monitor/barrier, this causes a deadlock.

Avoid copy of large data during send

The old implementation sent large data blobs at the end of the packet directly on locked connections, since the send call implicitly finished the packet. The new << operators do not know when the packet ends and can't send before the final size is known. (Eyescale/Equalizer#145)
