0.6.1 exiting error #106

RGlabs84 · 2019-12-01T03:41:04Z

New in version 061; see screenshot.

todxx · 2019-12-01T04:09:07Z

Hmm, it seems we've introduced an issue with the shutdown sequencing.
This only happens when trying to shut down the miner, right?

RGlabs84 · 2019-12-01T05:52:33Z

Correct. After it happens, it triggers an instability that causes further thread/GPU crashing. For example, if I shut down the miner and hit this crash error, move on to something else, and then relaunch the miner to resume mining, the miner will have a dead-GPU crash after a random amount of time. Before this update, 059 and 060 were rock-solid stable.

RGlabs84 · 2019-12-01T05:59:05Z

I can also confirm that this bug is affecting my Polaris-based mini rig.

todxx · 2019-12-01T06:00:36Z

Got it, thanks for the additional info. I'll try to reproduce this and figure out what's causing it.

RGlabs84 · 2019-12-01T06:00:57Z

Okay, I have even more information:

RGlabs84 · 2019-12-01T06:02:16Z

I checked another algo, lyra2r3 in this case, and it works perfectly.

todxx · 2019-12-01T11:31:26Z

I believe I've found what's causing the issue, and it should hopefully be fixed in the next release.
Thanks for the bug report :)

RGlabs84 · 2019-12-01T16:02:46Z

This is also the random crash I get. There are no real signs of a hang, and the GPU resets faster than it usually does after a full GPU crash: normally when the GPU crashes, the screen is dead for a good 30-50 seconds while the GPU and Windows driver reset, but with this crash it resets in about 8 seconds. I'm going to step back down to 060 until a fixed release is out.

I'm more than willing to test some builds if you want.

todxx · 2019-12-01T16:46:10Z

I don't know of any recent changes that would cause a dead GPU like that. Are you sure it's not an overclock/undervolt issue?

RGlabs84 · 2019-12-01T16:50:35Z

Very sure. When I switch back to 060, it runs without issue.


todxx · 2019-12-01T17:02:50Z

Can you reproduce the dead GPU scenario again with --debug and post the result? The watchdog messages might help debug the issue.

RGlabs84 · 2019-12-01T17:11:20Z

Also, there has to be some kind of difference between 060 and 061, just from a hashrate standpoint. X16v2 has never passed 28 MH/s on my single Vega 56. My clock speeds are all the same and stable, but on 061 I see hashrates over 35 MH/s.


RGlabs84 · 2019-12-01T17:12:09Z

Sure, I'll add the --debug option and post the results when the crash occurs.


todxx · 2019-12-01T17:14:23Z

That bogus high hashrate is likely a sign of GPU crashing. We've seen it for other algos where a GPU will start reporting ~50% more hashrate (it's not real hashrate) and hang shortly after.

RGlabs84 · 2019-12-01T22:17:55Z

I don't know what's changed, but I have it running with --debug and text-file logging as well. Nothing yet; I'll post results if the random crash occurs again.

RGlabs84 · 2019-12-02T05:22:02Z

No crash yet, but I have seen at least 2-3 spikes over 40 MH/s on a single Vega in a 7-hour run. Still waiting to catch the actual crash.

todxx · 2019-12-02T06:45:03Z

I wasn't thinking in my previous post. Since this is an x16* algo, big swings in hashrate are actually expected. The x16* class of algos randomly selects a sequence of 16 hash functions to run for each block on the network, so if you get a lucky sequence where it mostly runs fast hash functions, the hashrate can indeed spike very high.

This also makes tuning the hardware difficult because you never know when you'll get a particularly tough sequence of algos that puts a lot more strain on the GPU hardware. So things can run stable for days, and when a super tough sequence shows up, it crashes the GPUs.
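To make the "lucky sequence" effect concrete, here is a minimal Python sketch. It assumes 16 hypothetical hash functions (`hash_00` through `hash_15`) with made-up relative costs and picks each block's chain with Python's `random` module; the real x16* algos use real hash functions and their own selection rule, so the numbers are illustrative only, not a model of any miner.

```python
import random

# Hypothetical relative cost (arbitrary time units per call) for 16 made-up
# hash functions. These are illustrative assumptions, not measured numbers.
HYPOTHETICAL_COSTS = {
    f"hash_{i:02d}": cost
    for i, cost in enumerate(
        [1.0, 1.0, 3.0, 2.0, 1.0, 1.5, 1.5, 1.5,
         2.0, 3.0, 3.0, 2.5, 2.5, 1.0, 2.5, 1.0]
    )
}


def pick_sequence(rng: random.Random, length: int = 16) -> list[str]:
    """Pick one block's chain of hash functions (repeats allowed in this sketch)."""
    names = sorted(HYPOTHETICAL_COSTS)
    return [rng.choice(names) for _ in range(length)]


def relative_hashrate(sequence: list[str]) -> float:
    """Hashes per time unit: the inverse of the chain's total cost."""
    return 1.0 / sum(HYPOTHETICAL_COSTS[name] for name in sequence)


def hashrate_spread(blocks: int = 1000, seed: int = 42) -> float:
    """Ratio between the fastest and slowest simulated block."""
    rng = random.Random(seed)
    rates = [relative_hashrate(pick_sequence(rng)) for _ in range(blocks)]
    return max(rates) / min(rates)


if __name__ == "__main__":
    print(f"fastest block ran {hashrate_spread():.2f}x faster than the slowest")
```

Nothing about the simulated "hardware" changes between blocks, yet the printed ratio is noticeably above 1: a chain of cheap functions reports a much higher hashrate, while a chain of expensive ones is the kind of tough sequence that loads the GPU hardest.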

RGlabs84 · 2019-12-02T10:07:46Z

Yes, I definitely agree with that; sometimes you get a lucky hash order. I haven't had it crash yet, but I think it's come close. Going by your previous message about hashrate spikes before a crash, a Vega 56 should be within the 22-24 MH/s range.

Yesterday and early this morning, while observing the current data (pids) and watching the console, I noticed the following: eventually my hotspot hits 80°C, and when it does, the driver overrides my fan setting with some "hey, let's cool that off faster" fan setting. I believe that's a key factor leading to the random crash: two things line up at random times and trigger a third problem, which leads to the crash.

When mining x algos I have a specific OC profile that keeps the GPU right at 1400 MHz core, with a fan profile that keeps the GPU at around 56-62°C. The trick is the fan override when the hotspot reaches 80°C. So three things take place:

1. I'm mining along at 0.95 V, which drops to a solid 0.9 V under load.
2. When a hard sequence hits, it pulls harder at the core.
3. If the hotspot temperature creeps to 80°C during a hard sequence, the fan ramp to 97% is enough to drop my core under the 0.9 V mark.

At that point the trifecta of the random crash, in theory, presents with a huge hash spike before the watchdog declares the GPU dead.

So we've figured out the random crash: I bumped vcore a smidge and it's all good. However, we still have the originally posted issue of crashing on exit for Vega and Polaris.

