Replies: 28 comments 1 reply
-
Could you check whether the zfs-import-scan service is disabled?
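The command in the original comment was lost in extraction; a likely way to check, assuming a systemd-based distribution with the zfs package installed, is:

```shell
# Check whether the zfs-import-scan unit is enabled
# (hypothetical reconstruction of the truncated command).
systemctl is-enabled zfs-import-scan.service

# Show its current run state and recent log output as well.
systemctl status zfs-import-scan.service
```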
-
I disabled all zfs startup services to get the server to the multi-user target.
-
You do not need to disable ZFS to get into multi-user.target.
-
Altogether, I tried it both with the zfs-import-scan service and without it.
-
The following information can be viewed before the import.
-
tail -f /proc/spl/kstat/zfs/dbgmsg
-
Following #835.
-
It started to do something.
-
Try running the following. ZFS is trying to import the pool, but there seem to be issues with your vdevs.
-
I'm not using cryptsetup, and no encryption is in place. No events so far.
-
There are two methods of importing the pool: with the cachefile or with the device IDs. The cachefile method is the one used to import a root pool.
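The concrete commands were truncated above; a sketch of both import methods, using a hypothetical pool name `datapool`, might look like:

```shell
# Method 1: import using the cachefile (the mechanism used for root pools).
zpool import -c /etc/zfs/zpool.cache datapool

# Method 2: scan devices by their persistent IDs instead of the cachefile.
zpool import -d /dev/disk/by-id datapool
```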
-
Hi johnnyjacq16, it seems the problem I have is not that ZFS can't properly identify devices; the re-created cache is fully aware of all devices.
I believe the issue here is that ZFS is doing extra checks and transactions in the background that delay the import. I assume ZFS is trying to finish transaction 31064817 while I'm importing with a rewind to transaction 31064811, which seemed to be the last transaction correctly written to disk. Now I see:
It seems the problem now is the dsl_scan_sync mentioned below; the DSL (Dataset and Snapshot Layer) is, I believe, trying to do some "repair".
Long story short, I believe the problem is the extensive checks ZFS is doing, but at a scale of 200 TB this would take ages (months) to finish.
-
Hi @behlendorf, is there any way to skip this scan sync?
-
I believe the uncompleted scrub started on 31.5 is running as a "foreground" process that blocks the actual zpool import.
-
@omfdzg importing the pool read-only will be the easiest way to prevent the scan from running, and it should let you verify that the rewind will succeed. Can you grab the stack trace for the txg sync thread? That should give us a better idea about why the import isn't progressing.
One other thing I noticed is that you specified a negative txg when importing the pool, which I'm assuming was a typo. This value should be positive: the txg you want to import at.
It's possible the scrub is causing an issue for this extreme rewind case. There is some logic which delays starting the scrub for 5 txgs after the pool is imported, so I'm surprised to see that it was started at all.
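A sketch of the suggested read-only rewind import and of grabbing the txg sync thread's stack, assuming the pool name `datapool` and the rewind txg mentioned earlier in the thread:

```shell
# Import read-only with a rewind to a specific (positive) txg; read-only
# prevents the scan from starting and lets the rewind be verified safely.
zpool import -o readonly=on -T 31064811 datapool

# While an import hangs, dump the kernel stack of the txg_sync thread
# to see where it is blocked (run as root).
for pid in $(pgrep txg_sync); do
    echo "=== txg_sync pid $pid ==="
    cat /proc/$pid/stack
done
```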
-
NB, rewinds can take much longer than latest-txg imports because it is possible metadata has been overwritten, so it cannot be trusted and must be checked. Scrubs start much later, so they aren't an issue.
-
Hi, thanks for checking. Here is the stack trace for the current import.
-
Today, after a week of trying all the import options, the pool successfully imported with "zpool import -T 31064811 datapool"; the command ran for approximately 2 days.
-
I consider this a real issue with ZFS: when a scrub is started, write I/O is blocked. It resumes when "zpool scrub -s" is issued, but only after ~2 days. This is a show-stopper for production environments and large pools! Please take a look at the scrub functionality; there is no way around it, and once a scrub has started the pool will be hung.
-
HW RAID is killing ZFS
-
It's not. This pool has been working fine under heavy load (15 servers making backups, tons of small files, and 100 TB of zvol-ext4 deduplicated by Veritas NetBackup software) for a year already; I had issues only when a scrub was started. For example, right now it is doing 2000 IOPS with a latency of 1 ms. But if I start a scrub, goodbye for 2 days or more.
-
@omfdzg it sounds as if the default scrub behavior may require some tuning for your environment. There are several scrub-related module options described in the man page which you may find useful; I'd suggest starting by reducing them incrementally. If you do determine a better set of tunings for your environment, please let us know so we can potentially revisit the default values. One other option you may want to consider: it is possible to revert to the previous scrub behavior by setting the corresponding module option.
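The option name was lost from the comment above; on OpenZFS the tunable that reverts to the older sequential scrub behavior is `zfs_scan_legacy`, which is likely, though not certainly, what is being referred to. A sketch:

```shell
# Revert to the legacy (pre-0.8, non-sorted) scrub behavior; assumption:
# this is the module option the comment refers to. Takes effect for
# newly started scans.
echo 1 > /sys/module/zfs/parameters/zfs_scan_legacy

# To make the setting persistent across reboots:
echo "options zfs zfs_scan_legacy=1" >> /etc/modprobe.d/zfs.conf
```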
-
Thank you @omfdzg! This helped me resolve my own import issues over the last weekend.

System details: I have a simple two-disk raid-z setup (Seagate Constellation ES drives), and my system was stalling at boot (which ended up being bad memory, but I digress). After a wipe and reload of Ubuntu 20.04, I installed the zfs package and tried to import this simple pool, but it would stall out. I tried waiting different periods of time for the import, but ultimately, after waiting 13+ hours, it was not importing. zpool status reported the pool was online, so after reading through this and #835, I decided to try the approach above. I searched through zdb, found the last transaction on each disk, and performed the following:

The pool imported almost immediately, and I ran a scrub afterwards with no problems found in the data. Unfortunately I didn't grab any logs afterwards, but I did capture the zdb data from both disks before the import, if that could be of any help. If there is any more info I can pull to help with this, let me know.

As an aside, and somewhat related: I did have another issue during boot on the fresh Ubuntu 20.04 install. I had to perform a couple of reboots due to display issues, and once again boot stalled. This time I dug into it, and zfs-import-cache.service was stalling out (possibly a similar issue). I rebooted into recovery mode, disabled the service, and was able to boot. After boot none of the zfs services were running, so I re-enabled the import-cache service, started all zfs services, and everything was back. I recreated the zfs cache just to be safe, and everything seems fine right now. Not sure what to make of it, but if there are any logs I can pull to help out, or specific logs to grab if/when this happens again, let me know.
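The exact commands were truncated from the comment above; based on the description, the procedure was likely along these lines (the device path and pool name are placeholders):

```shell
# Read the vdev labels to find the txg recorded on each disk; the label
# dump from zdb includes a "txg:" field.
zdb -l /dev/disk/by-id/<disk>-part1 | grep txg

# Import with a rewind to the chosen txg (the last txg known to be
# consistent across the disks).
zpool import -T <txg> mypool
```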
-
So based on my experience today, setting spa_load_verify_metadata to 0 after the fact doesn't stop the loop.
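For reference, the tunable mentioned (with the typo corrected to `spa_load_verify_metadata`) is a module parameter and is set like this; as the comment observes, changing it after the import has already begun does not interrupt the verification that is in progress:

```shell
# Disable metadata (and optionally data) traversal during pool-load
# verification. Must be set BEFORE running zpool import to have any
# effect on that import.
echo 0 > /sys/module/zfs/parameters/spa_load_verify_metadata
echo 0 > /sys/module/zfs/parameters/spa_load_verify_data
```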
-
Providing some way to bail out of that loop would help.
-
It sounds like the initial problem report was not using that approach.
-
I was looking at something related (a resilver running before the import is done), and the other idea I considered was making SCAN_IMPORT_WAIT_TXGS tunable at run time instead of it being hardcoded to 5.
-
System information
Describe the problem you're observing
zpool import hung; I tried all options to import, including -N and -fFX. It seems zpool hung for hours on the end-of-month scheduled zpool scrub operation, the server got rebooted, and afterwards the import was not possible.
Describe how to reproduce the problem
zpool import
Include any warning/errors/backtraces from the system logs