0.6.1 exiting error #106
Hmm, it seems we've introduced an issue with the shutdown sequencing.
Correct. Once it happens, it triggers an instability that causes further thread/GPU crashes.
For example, if I shut down the miner and hit this crash error, move on to something else, and then relaunch the miner to resume mining, the GPU will crash out dead after a random amount of time.
Before this update, 059 and 060 were rock solid.
Got it, thanks for the additional info. I'll try to reproduce this and figure out what's causing it.
Okay, I have even more information.
I believe I've found what's causing the issue, and it should hopefully be fixed in the next release.
This is also the random crash I get. There are no real signs of a hang, and the GPU resets faster than it usually does after a full GPU crash: normally when the GPU crashes, the screen stays dead for a good 30-50 seconds while the GPU and the Windows driver reset, but with this crash it resets in about 8 seconds. I'm going to step back down to 060 until a release with a fix is out.
I don't know of any recent changes that would cause a dead GPU like that. Are you sure it's not an overclock/undervolt issue?
Very sure. When I switch back to 060, it runs without issue.
On Sun, Dec 1, 2019, 11:46 AM todxx wrote:
> I don't know of any recent changes that would cause a dead GPU like that.
> Are you sure it's not an overclock/undervolt issue?
Can you reproduce the dead GPU scenario again with --debug and post the result? The watchdog messages might help debug the issue.
Also, there has to be some kind of difference between 060 and 061, if only from a hashrate standpoint. X16v2 has never passed 28 MH/s on my single Vega 56; my clock speeds are all the same and stable, but on 061 I see hashrates over 35 MH/s.
Sure, I'll add the --debug option and post the results when the crash occurs.
That bogus high hashrate is likely a sign of GPU crashing. We've seen it for other algos where a GPU will start reporting ~50% more hashrate (it's not real hashrate) and hang shortly after.
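The rule of thumb above (reported hashrate jumping about 50% above normal shortly before a hang) can be sketched as a check against a rolling median of recent samples. This is a hypothetical monitoring helper, not part of the miner: `flag_spikes`, the 10-sample window and the 1.5x threshold are assumptions, and the readings are made-up illustrative values. On x16* algos, where hashrate legitimately swings with the selected hash order, a check like this will also flag harmless spikes.

```python
from statistics import median


def flag_spikes(samples, window=10, ratio=1.5):
    """Return indices of hashrate samples that are at least `ratio` times
    the median of the preceding `window` samples.

    Hypothetical monitoring helper: the window size and 1.5x threshold are
    illustrative assumptions, not values used by the miner's watchdog.
    """
    flagged = []
    for i in range(window, len(samples)):
        baseline = median(samples[i - window:i])  # recent "normal" hashrate
        if baseline > 0 and samples[i] >= ratio * baseline:
            flagged.append(i)
    return flagged


# Hypothetical readings in MH/s: steady around 23, then one jump to 35.5.
readings = [23.0] * 10 + [35.5, 23.1]
print(flag_spikes(readings))  # [10]
```

Using the median rather than the mean keeps a single earlier spike from inflating the baseline for the samples that follow it.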
I don't know what's changed, but I have it running in debug mode with text-file logging as well. Nothing yet; I'll post the results if the random crash occurs again.
No crash yet, but I have seen at least 2-3 spikes over 40 MH/s on a single Vega during a 7-hour run. Still waiting to catch the actual crash.
I wasn't thinking in my previous post. Since this is an x16* algo, big swings in hashrate are actually expected. The x16* class of algos randomly selects a sequence of 16 hash functions to run for each block on the network, so if you get a lucky sequence that mostly runs fast hash functions, the hashrate can indeed spike very high. This also makes tuning the hardware difficult, because you never know when you'll get a particularly tough sequence of algos that puts a lot more strain on the GPU. So things can run stable for days, and then when a very tough sequence shows up, it crashes the GPUs.
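To illustrate why the per-block workload varies, here is a simplified sketch of X16R-style selection. It assumes the original X16R design used by Ravencoin, where each of the last 16 hex digits of the previous block hash (as normally displayed) picks one of 16 hash functions, with repeats allowed. `hash_order` is a hypothetical helper, the example hashes are made up, and the v2 variants add extra steps that this sketch does not model.

```python
# Hypothetical sketch of X16R-style algorithm selection. Assumption: as in
# Ravencoin's original X16R, each of the last 16 hex digits of the previous
# block hash picks one of 16 hash functions. The v2 variants add extra
# steps that are not modelled here.
ALGOS = [
    "blake", "bmw", "groestl", "jh", "keccak", "skein", "luffa", "cubehash",
    "shavite", "simd", "echo", "hamsi", "fugue", "shabal", "whirlpool", "sha512",
]


def hash_order(prev_block_hash_hex):
    """Return the 16-step hash-function sequence selected by a block hash."""
    if len(prev_block_hash_hex) != 64:
        raise ValueError("expected a 64-character hex string")
    # Each hex digit (0-f) indexes into ALGOS; the same function may repeat.
    return [ALGOS[int(d, 16)] for d in prev_block_hash_hex[-16:]]


# Two hypothetical previous-block hashes give very different workloads.
print(hash_order("0" * 48 + "0123456789abcdef"))  # each function once
print(hash_order("0" * 48 + "f" * 16))            # sha512 sixteen times
```

Because the 16 functions differ in cost on a GPU, a block whose sequence leans on the cheaper ones produces a higher reported hashrate, and one that leans on the expensive ones loads the card harder.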
Yes, I definitely agree with that; sometimes you get a lucky hash order. I haven't had it crash yet, but I think it's come close.
Going by your previous message about hashrate spiking before a crash: a Vega 56 should be in the 22-24 MH/s range.
Yesterday and early this morning, while watching the current data (PIDs) and the console, I noticed the following:
Eventually my hot spot hits 80 °C, and when it does, the driver overrides my fan setting with its own "hey, let's cool that off faster" fan setting.
I believe that's a key factor in the random crash: two things line up at random times and trigger a third problem, which leads to the crash.
When mining X algos, I use a specific OC profile that keeps the GPU right at a 1400 MHz core clock, with a fan profile that keeps the GPU at around 56-62 °C.
The catch is the driver's fan override when the hot spot reaches 80 °C.
So three things happen:
1. Mining runs at 0.95 V, which drops to a solid 0.9 V under load.
2. However, when a hard sequence hits, it pulls harder on the core.
3. If the hot spot creeps to 80 °C during a hard sequence, the fan ramp to 97% is enough to drop my core voltage below the 0.9 V mark. At that point the trifecta behind the random crash, in theory, shows up as a huge hashrate spike before the watchdog declares the GPU dead.
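The theory above can be turned into a check to run over logged telemetry, to see whether recorded crashes line up with it. This is a minimal sketch under stated assumptions: `Sample`, its fields and `matches_crash_theory` are hypothetical (the readings would have to come from whatever monitoring tool you already log with), the 80 °C and 0.9 V thresholds are taken from this comment rather than from any driver specification, and step 2 (a hard hash sequence) is not modelled because it is not visible in temperature or voltage readings alone.

```python
from dataclasses import dataclass


@dataclass
class Sample:
    """One telemetry reading (hypothetical fields)."""
    hotspot_c: float  # GPU hot-spot temperature, degrees C
    core_v: float     # core voltage under load, volts


def matches_crash_theory(s, hotspot_limit=80.0, min_core_v=0.9):
    """True when both observable conditions coincide: the hot spot has
    reached the driver's fan-override point and the core voltage has sagged
    below the 0.9 V floor. Thresholds come from the comment, not a spec."""
    return s.hotspot_c >= hotspot_limit and s.core_v < min_core_v


# Hypothetical log: normal, hot but voltage holding, hot with voltage sag.
log = [Sample(62.0, 0.90), Sample(80.5, 0.91), Sample(81.0, 0.88)]
print([i for i, s in enumerate(log) if matches_crash_theory(s)])  # [2]
```

If crashes cluster right after samples that match, that supports the theory; raising the core voltage slightly, as done here, should make matches disappear.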
So we figured out the random crash: I bumped vcore a smidge and it's all good.
However, we still have the originally reported issue of crashing on exit on Vega and Polaris.
This is new in version 061; see screenshot.