wikimovies/expt.py theano xc generates segfault on GPU machines #6

Open
krivard opened this issue Aug 28, 2017 · 0 comments
krivard commented Aug 28, 2017

Documenting here for future reference. Sample output from running only the xc section, and only the DenseMatDenseMsg version:

I tensorflow/stream_executor/dso_loader.cc:135] successfully opened CUDA library libcublas.so.8.0 locally
I tensorflow/stream_executor/dso_loader.cc:135] successfully opened CUDA library libcudnn.so.5 locally
I tensorflow/stream_executor/dso_loader.cc:135] successfully opened CUDA library libcufft.so.8.0 locally
I tensorflow/stream_executor/dso_loader.cc:135] successfully opened CUDA library libcuda.so.1 locally
I tensorflow/stream_executor/dso_loader.cc:135] successfully opened CUDA library libcurand.so.8.0 locally
INFO:root:deserializing db file tmp-cache/train-250.db
INFO:root:deserializing database from tmp-cache/train-250.db
INFO:root:deserialized database has 12 relations and 425618 non-zeros
INFO:root:matrixDB relation has_tags/2 argument 1 type entity_t
INFO:root:matrixDB relation has_tags/2 argument 2 type entity_t
INFO:root:matrixDB relation has_feature/2 argument 1 type question_t
[...]
INFO:root:matrixDB relation in_language/2 argument 2 type entity_t
INFO:root:deserializing dataset file tmp-cache/train-250.dset
INFO:root:deserialized dataset has 1 modes and 697 non-zeros
INFO:root:deserializing dataset file tmp-cache/test-250.dset
INFO:root:deserialized dataset has 1 modes and 707 non-zeros
INFO:root:pool initialized with 5 processes
INFO:root:created pool of 5 workers
INFO:root:compiling answer/io time 0.000 sec mem 3.102 Gb
INFO:root:tensorlog compilation complete; cross-compiling answer/io time 0.002 sec mem 3.102 Gb
INFO:root:tensorlog->target language compilation complete time 26.641 sec mem 65.936 Gb
Expt configuration: {'targetMode': 'answer/io', 'learner':     <tensorlog.theanoxcomp.FixedRateGDLearner object at 0x7f42d30b3290>, 'trainData': <tensorlog.dataset.Dataset object at 0x7f42e68a9d90>, 'savedModel': 'learned-model.tensorlog.theanoxcomp.DenseMatDenseMsgCrossCompiler.db', 'prog': <tensorlog.program.ProPPRProgram object at 0x7f42e690a3d0>, 'testData': <tensorlog.dataset.Dataset object at 0x7f42e68a9f10>}
running untrained theory on train data ...
Segmentation fault (core dumped)

The Sparse version seems to run fine on a GPU machine, and both versions run (though quite slowly) on a CPU-only machine. Could the segfault be related to this bug in Theano integer addressing?
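One quick way to test the integer-addressing hypothesis is to check whether any dense matrix in the compiled program is large enough to overflow a signed 32-bit index. The sketch below is only an illustration: it assumes the suspected bug is a 32-bit overflow (the issue does not confirm this), `dense_index_report` is a hypothetical helper that is not part of tensorlog or Theano, and the dimensions in the usage lines are made up rather than taken from `train-250.db`.

```python
# Hypothetical helper: not part of tensorlog or Theano.
# Assumes "integer addressing" means indexing with a signed 32-bit integer.
INT32_MAX = 2**31 - 1


def dense_index_report(rows, cols, itemsize=4):
    """Report whether a dense rows x cols matrix exceeds int32 indexing.

    itemsize is the size of one element in bytes (4 for float32).
    """
    n_elements = rows * cols
    n_bytes = n_elements * itemsize
    return {
        "elements": n_elements,
        "bytes": n_bytes,
        # True if a flat element index would not fit in a signed int32
        "elements_overflow": n_elements > INT32_MAX,
        # True if a byte offset would not fit in a signed int32
        "bytes_overflow": n_bytes > INT32_MAX,
    }


if __name__ == "__main__":
    # Made-up dimensions for illustration only.
    for rows, cols in [(1000, 1000), (50000, 50000)]:
        print(rows, cols, dense_index_report(rows, cols))
```

If the real entity-by-entity dense matrices turn out to be above either threshold, that would be consistent with the Sparse version running while the Dense version crashes, though it would not prove the cause.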

Theano version is '0.9.0dev2.dev-RELEASE'
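Since the log ends at `Segmentation fault (core dumped)` with no indication of where the crash happened, it may help to rerun with Python's `faulthandler` enabled, which prints the Python-level traceback when the process receives a fatal signal. This is a minimal sketch, not something the experiment script is known to do: `enable_crash_traceback` is a hypothetical helper, and `faulthandler` is in the standard library only from Python 3.3 onward (on Python 2 it is a separate package that would have to be installed).

```python
import faulthandler
import sys


def enable_crash_traceback(stream=None):
    """Dump the Python traceback of all threads on SIGSEGV and similar signals.

    stream must be a file object with a real file descriptor;
    it defaults to sys.stderr and must stay open while the handler is enabled.
    """
    if stream is None:
        stream = sys.stderr
    faulthandler.enable(file=stream, all_threads=True)
    return faulthandler.is_enabled()


if __name__ == "__main__":
    # Call this once, before the code that crashes is reached.
    enable_crash_traceback()
```

The traceback only shows which Python call was active when the fault occurred; locating the fault inside the compiled GPU code would still require the core dump.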
