
Comments (9)

tomas-wood commented on August 21, 2024

Okay, I figured it out: it's related to this issue. Merci!

from graph_nets.

IMBurbank commented on August 21, 2024

To run the GPU version, follow the exact same steps again, but swap in the GPU image at the docker run step: run the GPU Docker image with a bash command to enter the container.

docker run --rm --runtime=nvidia --user $(id -u):$(id -g) -p 8888:8888 -v $(pwd):/my-devel -it imburbank/graph_nets:latest-gpu bash -l

That should duplicate a standard environment running tensorflow_gpu, tensorflow_probability_gpu, graph_nets and the standard dependencies.
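
The CPU and GPU invocations quoted in this thread differ only in the `--runtime=nvidia` flag and the `latest-gpu` image tag. As a rough sketch of that difference (not something from the graph_nets repository), the hypothetical helper `docker_run_args` below builds either command as an argument list. The image names, port, and mount point are copied from the commands in the thread; the function name and its parameters are assumptions made for illustration, and `os.getuid`/`os.getgid` assume a Unix-like host.

```python
import os

# Image names as quoted in the thread.
CPU_IMAGE = "imburbank/graph_nets"
GPU_IMAGE = "imburbank/graph_nets:latest-gpu"


def docker_run_args(gpu=False, workdir=None, uid=None, gid=None):
    """Build the `docker run` argument list for the CPU or GPU image.

    Hypothetical helper: defaults mirror `$(pwd)`, `$(id -u)` and `$(id -g)`
    from the shell commands in the thread.
    """
    workdir = os.getcwd() if workdir is None else workdir
    uid = os.getuid() if uid is None else uid
    gid = os.getgid() if gid is None else gid

    args = ["docker", "run", "--rm"]
    if gpu:
        # The only extra flag the GPU command in the thread uses.
        args.append("--runtime=nvidia")
    args += [
        "--user", "{}:{}".format(uid, gid),
        "-p", "8888:8888",
        "-v", "{}:/my-devel".format(workdir),
        "-it", GPU_IMAGE if gpu else CPU_IMAGE,
        "bash", "-l",
    ]
    return args
```

Passing the returned list to `subprocess.call` from a terminal should behave like the shell command, assuming Docker (and, for the GPU case, the NVIDIA runtime) is installed.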


I see you got it worked out. Cheers!


IMBurbank commented on August 21, 2024

How are you running the tests? What environment? What commands?

They all pass in my dev environment.

I ran the tests as follows:

Clone and enter repo

git clone https://github.com/deepmind/graph_nets.git
cd graph_nets/

I use docker images with all the dependencies included so I don't have to worry about system incompatibilities or version conflicts. If you have docker, you can try the Graph Nets images I'm currently hosting to see if it's an issue with your local dev environment.

# CPU Image
docker run --rm -u $(id -u):$(id -g) -p 8888:8888 -v $(pwd):/my-devel -it imburbank/graph_nets bash -l

# GPU image
docker run --rm --runtime=nvidia --user $(id -u):$(id -g) -p 8888:8888 -v $(pwd):/my-devel -it imburbank/graph_nets:latest-gpu bash -l

Then I ran each test

python graph_nets/tests/blocks_test.py
python graph_nets/tests/modules_test.py
python graph_nets/tests/utils_tf_test.py
...etc.
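
Running the test files one by one can also be scripted. The sketch below is only an illustration of that idea, not part of graph_nets: `run_test_files` is a hypothetical helper that runs every file matching `*_test.py` in a directory in its own process and records the exit codes. It assumes Python 3.5 or newer (for `subprocess.run`) and that each test file exits non-zero on failure.

```python
import subprocess
import sys
from pathlib import Path


def run_test_files(tests_dir, pattern="*_test.py", python=sys.executable):
    """Run each matching test file in a separate process.

    Returns a dict mapping file name to process exit code (0 means success).
    Hypothetical helper; the default pattern matches names such as
    blocks_test.py and modules_test.py mentioned in the thread.
    """
    results = {}
    for test_file in sorted(Path(tests_dir).glob(pattern)):
        # Each file gets its own interpreter so one crash does not stop the rest.
        completed = subprocess.run([python, str(test_file)])
        results[test_file.name] = completed.returncode
    return results
```

Calling `run_test_files("graph_nets/tests")` from the repository root would run the same files as the commands above; any entry with a non-zero value points at a test file worth rerunning by hand.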


tomas-wood commented on August 21, 2024

Hi @IMBurbank thank you for commenting.

I was just cding into graph_nets/tests and running python blocks_test.py after installing. I'm pulling your docker images right now and will try it out through them. Alright I tried 'em out and it got pretty ugly.

Realized I had sshed into the wrong machine and had just installed the binaries for tensorflow through pip instead of building them myself with bazel as I always do. Though in this case it seems like the new binary for tensorflow installed through pip isn't sending me the mangled stack traces that my own build is.

Running with my own compiled binaries (no docker, no conda env, just Ubuntu 16.04) gave me something similar to your docker image.

2018-10-22 14:49:38.296960: I tensorflow/stream_executor/stream.cc:1960] stream 0x8cda6960 did not wait for stream: 0x18423e90
2018-10-22 14:49:38.296978: I tensorflow/stream_executor/stream.cc:4793] stream 0x8cda6960 did not memcpy host-to-device; source: 0x7fc8f2c00000
2018-10-22 14:49:38.297064: F tensorflow/core/common_runtime/gpu/gpu_util.cc:339] CPU->GPU Memcpy failed
*** Received signal 6 ***
*** BEGIN MANGLED STACK TRACE ***
/usr/local/lib/python2.7/dist-packages/tensorflow/python/../libtensorflow_framework.so(+0x6ba3ee)[0x7fd10b6163ee]
/lib/x86_64-linux-gnu/libpthread.so.0(+0x11390)[0x7fd15e6f4390]
/lib/x86_64-linux-gnu/libc.so.6(gsignal+0x38)[0x7fd15e34e428]
/lib/x86_64-linux-gnu/libc.so.6(abort+0x16a)[0x7fd15e35002a]
/usr/local/lib/python2.7/dist-packages/tensorflow/python/_pywrap_tensorflow_internal.so(+0x4fadaa7)[0x7fd110eddaa7]
/usr/local/lib/python2.7/dist-packages/tensorflow/python/../libtensorflow_framework.so(+0x5f75ff)[0x7fd10b5535ff]
/usr/local/lib/python2.7/dist-packages/tensorflow/python/../libtensorflow_framework.so(_ZN5Eigen26NonBlockingThreadPoolTemplIN10tensorflow6thread16EigenEnvironmentEE10WorkerLoopEi+0x241)[0x7fd10b5ee581]
/usr/local/lib/python2.7/dist-packages/tensorflow/python/../libtensorflow_framework.so(_ZNSt17_Function_handlerIFvvEZN10tensorflow6thread16EigenEnvironment12CreateThreadESt8functionIS0_EEUlvE_E9_M_invokeERKSt9_Any_data+0x37)[0x7fd10b5ec317]
/usr/lib/x86_64-linux-gnu/libstdc++.so.6(+0xb8c80)[0x7fd11d65fc80]
/lib/x86_64-linux-gnu/libpthread.so.0(+0x76ba)[0x7fd15e6ea6ba]
/lib/x86_64-linux-gnu/libc.so.6(clone+0x6d)[0x7fd15e42041d]
*** END MANGLED STACK TRACE ***

*** Begin stack trace ***
	tensorflow::CurrentStackTrace[abi:cxx11]()
	
	
	gsignal
	abort
	
	
	Eigen::NonBlockingThreadPoolTempl<tensorflow::thread::EigenEnvironment>::WorkerLoop(int)
	std::_Function_handler<void (), tensorflow::thread::EigenEnvironment::CreateThread(std::function<void ()>)::{lambda()#1}>::_M_invoke(std::_Any_data const&)
	
	
	clone
*** End stack trace ***

Aborted (core dumped)

It looks like it couldn't find BLAS with your docker image, and in my local environment I'm having trouble getting data from the CPU to the GPU because of a misbehaving stream.
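
In the log above, the single letter after each timestamp is the severity (`I` info, `W` warning, `E` error, `F` fatal), so the line that actually aborted the process is the one marked `F`. As a small sketch for picking such lines out of a long log, the hypothetical helper `fatal_messages` below parses lines in the format shown in this thread. The regular expression is an assumption based only on those sample lines, not an official TensorFlow log grammar.

```python
import re

# Pattern derived from the sample lines in this thread, e.g.
# 2018-10-22 14:49:38.297064: F tensorflow/.../gpu_util.cc:339] CPU->GPU Memcpy failed
LOG_LINE = re.compile(
    r"^(?P<timestamp>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d+): "
    r"(?P<severity>[IWEF]) (?P<source>\S+):(?P<line>\d+)\] (?P<message>.*)$"
)


def fatal_messages(log_text):
    """Return (source file, line number, message) for every fatal log line."""
    found = []
    for raw in log_text.splitlines():
        match = LOG_LINE.match(raw.strip())
        # Lines that are not log records (stack frames, banners) are skipped.
        if match and match.group("severity") == "F":
            found.append(
                (match.group("source"), int(match.group("line")), match.group("message"))
            )
    return found
```

Feeding it the captured stderr of a failing test run should leave only the fatal records, which are usually the most useful lines to quote in a bug report.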


IMBurbank commented on August 21, 2024

As long as Docker is working, your local installations of python, conda, bazel, tensorflow, etc. won't matter. Everything needed to run the tests is already in the container environments.

Let's start with CPU (I'm not sure if you have GPU configured).

  1. Make sure you're in your normal local environment at a location where you can download the graph_nets repository.

  2. Clone a fresh version of graph_nets to make sure tests are passing with the current build.

git clone https://github.com/deepmind/graph_nets.git

  3. Enter the graph_nets project directory.

cd graph_nets/

  4. Run the CPU docker file image with a bash command to enter the container.

docker run --rm -u $(id -u):$(id -g) -p 8888:8888 -v $(pwd):/my-devel -it imburbank/graph_nets bash -l

  5. In that same terminal, so that you're using the container environment, run the tests.

python graph_nets/tests/blocks_test.py
python graph_nets/tests/modules_test.py
python graph_nets/tests/utils_tf_test.py

This will not use your locally-compiled tensorflow. The tests should pass. From there, you may be able to work on isolating the problem in your local dev environment.

I would recommend trying the tests on your local dev system with a standard tensorflow package and seeing if they pass. If they do, move to the next link in the chain with your compiled tensorflow.
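
When switching between a pip-installed package and a locally compiled one, it can help to confirm which installation the interpreter would actually pick up. The sketch below uses the standard-library `importlib.util.find_spec`, which locates a top-level package without importing it. `locate_package` is a hypothetical helper name, and the snippet assumes Python 3.4 or newer.

```python
import importlib.util


def locate_package(name):
    """Return the file path Python would load for a top-level package.

    Returns None when the package cannot be found. The package itself is
    not imported, so a broken build will not crash this check.
    """
    spec = importlib.util.find_spec(name)
    if spec is None:
        return None
    return spec.origin


if __name__ == "__main__":
    # "tensorflow" is the package discussed in this thread.
    print(locate_package("tensorflow"))
```

A path under a `dist-packages` or `site-packages` directory of an unexpected environment would suggest the tests are running against a different build than intended.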


tomas-wood commented on August 21, 2024

I'll try out your CPU version, but I have the GPU configured. My locally installed tensorflow-r1.10 build works on the GPU. All tests passing. Lots of code run with it. If it's causing the problem, I'm only seeing it when trying to run the tests in graph_nets. I also know how docker works. I'm not a complete idiot (just a touch, now and then, for character).

Looks like your CPU binaries work. Doesn't really do me a bit of good, but they work. Kudos.

The thing you recommend, trying the tests on a local dev system with standard tf (no GPU) installed with pip, is what produced the "Failed to run optimizer ArithmeticOptimizer, stage RemoveStackStridedSliceSameAxis." errors I first reported.


tomas-wood commented on August 21, 2024

I'm still using nvidia-docker because I'm trapped in the past lol


abh2424 commented on August 21, 2024

I am facing a similar error while running my object detection Python file. I have completed all the steps above given by @IMBurbank, but the error is still the same.

What is the top-level directory of the model you are using:
./models/research
Have I written custom code (as opposed to using a stock example script provided in TensorFlow):
NO
OS Platform and Distribution (e.g., Linux Ubuntu 16.04):
Raspbian OS - linux
TensorFlow installed from (source or binary):
pip3
TensorFlow version (use command below):
1.13.1
Bazel version (if compiling from source):
0.8.0
CUDA/cuDNN version:
no
GPU model and memory:
cpu only

Please help me @IMBurbank


cutemuggle commented on August 21, 2024

I am facing a similar error while running my object detection Python file. Could you please tell me how to solve it? Thanks a lot! @abh2424 @tomas-wood

