From 212b40b524928fbc97f76803bce09f9a66685a5e Mon Sep 17 00:00:00 2001 From: Kerney666 <44686606+Kerney666@users.noreply.github.com> Date: Fri, 31 Jul 2020 14:12:37 +0200 Subject: [PATCH] Added miner documentation files. --- doc/CN_AUTOTUNING_WITH_TRM.txt | 129 +++++++++++++++++ doc/CN_GENERAL_TUNING.txt | 68 +++++++++ doc/CN_MAX_YOUR_VEGA.txt | 177 ++++++++++++++++++++++++ doc/ETHASH_GENERAL_TUNING.txt | 245 +++++++++++++++++++++++++++++++++ doc/KAWPOW_TUNING.txt | 178 ++++++++++++++++++++++++ doc/MAP_YOUR_GPUS.txt | 54 ++++++++ doc/MTP_MINING.txt | 101 ++++++++++++++ doc/NIMIQ_MINING.txt | 48 +++++++ doc/TRTL_CHUKWA_MINING.txt | 71 ++++++++++ doc/USAGE.txt | 229 ++++++++++++++++++++++++++++++ 10 files changed, 1300 insertions(+) create mode 100644 doc/CN_AUTOTUNING_WITH_TRM.txt create mode 100644 doc/CN_GENERAL_TUNING.txt create mode 100644 doc/CN_MAX_YOUR_VEGA.txt create mode 100644 doc/ETHASH_GENERAL_TUNING.txt create mode 100644 doc/KAWPOW_TUNING.txt create mode 100644 doc/MAP_YOUR_GPUS.txt create mode 100644 doc/MTP_MINING.txt create mode 100644 doc/NIMIQ_MINING.txt create mode 100644 doc/TRTL_CHUKWA_MINING.txt create mode 100644 doc/USAGE.txt diff --git a/doc/CN_AUTOTUNING_WITH_TRM.txt b/doc/CN_AUTOTUNING_WITH_TRM.txt new file mode 100644 index 0000000..8fb01d8 --- /dev/null +++ b/doc/CN_AUTOTUNING_WITH_TRM.txt @@ -0,0 +1,129 @@ +Team Red Miner CryptoNight Auto-Tuning Support +============================================== + +TL;DR +----- +TRM now has auto-tuning capabilities and the recommended way is to either always run the miner without specific configurations or use a script to deduce the best configurations. The former means deleting any --cn_config arguments from existing start scripts. For the latter, edit the included run_autotune_quick.bat/.sh with your algo, pool, wallet and password. Run the script and wait until it completes. Expected runtime is 5-15 mins. 
Open the logfile autotune_quick.txt and copy the command line arg at the end of the file with the CN configs to your start script, replacing any existing --cn_config argument. If you change the rig setup (e.g. add/remove gpus), repeat this process. + + +Introduction +------------ +In TRM 0.5, we had to add a set of additional tuning parameters for CN variants. The CN config format now contains an optional fine-tuning suffix consisting of a colon and three letters. This meant that the scan range became much larger than before, turning tuning a rig into a tedious project. Moreover, it's not trivial to explain when to try different combinations of the fine-tuning parameters. + +We also noticed that many of the previous variants could benefit from small tweaks to the CN configs as well, but in highly random ways. There is no simple default formula that provides the optimal CN config for TRM for both Windows and Linux, across all driver versions and sets of chosen clocks and timings. + +Given the above, we felt it was necessary to include auto-tuning support in the miner rather than ask the users to spend hours on end and still not be sure they are running TRM in the best possible configuration for their rig(s). In some ways, this can be described as a much better default mode where we don't have to guess but can take 5 mins on startup and benchmark our way to the best found config, without restarting the miner and still mining at close to maximum speed while benchmarking. + +We have also added a manual tuning mode using key menus in the miner. This means you can switch freely between CN configs without restarting the miner. The only limitation is that intensities can't be increased. + + +The CN Configuration +-------------------- +The full CN config now consists of the following parts. Please note that this is mostly for reference; the idea with the auto-tuning is rather that the user should have to care _less_ about the parameters involved. + +o Prefix: an optional L. 
This enables a more compressed deployment on the gpu + where each work unit handles more pads. Mostly used for small pad + algos (Turtle, UPX2) or on small gpus like 550s. + +o Thread 1 intensity X: typically a number between 1-16 for 8GB gpus and + standard 2MB pad size, but it depends on the algo and pad size. + +o Tweak setting: one of the following chars: + o - No tweak enabled. + o + First tweak option (only option for Lexa/Baffin/Polaris cards). + o * Second tweak option (for Vegas, especially with timing mods enabled). + o . I don't care, let the miner choose. + +o Thread 2 intensity Y: the second thread's intensity. Can be zero to disable + thread 2. + +o Finetune: three chars that are loosely defined from the user's perspective + since we might change these at any time. The standard finetune set + is always AAA. Not all algos have all combinations available. + o Finetune 1: one of ABCDE + o Finetune 2: one of AB + o Finetune 3: one of AB + +Please note that it's _always_ possible to specify the CN config without the added finetune part. + +Example configs: + + 16*14:CAA Common Vega config for 2MB pad algos. + L28+28:AAA Common Vega config for small pad algos (Turtle, UPX2). + 8+7:AAA Common 470-580 config for 2MB pad algos. + 16+14:CBB Common 470-580 and Vega config for 4MB pad algos. + + +Types of Auto-Tuning available +------------------------------ +The miner supports the following types of auto-tuning: + +o AUTO: A (default) quick scan on start-up for all GPUs, only checking a small + set of known good configs. If the user has specified specific config + units (intensities), only a single configuration per gpu is scanned, + otherwise a sensible default range is used. + +o QUICK: The same quick scan as AUTO, but with better reporting. + +o SCAN: A thorough exhaustive search of all configurations. This can take a + long time to execute (> 1-2h) but guarantees that all configs have + been tested. It can also scan across multiple intensities. 
+ +o NONE: Disable all forms of auto-tuning, just run with the provided or + default configs. This is not recommended. + + +The Auto-Tuning Process +----------------------- +Note 1: the quality of the auto-tuning output can vary, and in rare cases there might even be false best modes chosen. In general you will end up with a config that is either your best choice under the chosen clocks and timings, or at least very close to it. That said, verify that you're getting the expected hashrates from the chosen configs, otherwise revert to the 2nd or 3rd best config in the final list printed in step (7) below. + +Note 2: during the auto-tuning, there is a small probability that false hw errors will occur when switching between configs. Therefore, please disregard any small amount of hw errors that occur during the auto-tuning process. Errors that occur when the process has completed for a gpu and it's listed as 100% done are real errors and indicate that your clocks and/or timings are too aggressive. + +1. Set the clocks and timings you aim to run in production, but be a little + more generous. This primarily means lowering memclk and raising voltages + just a little. We want the clock regime to be compatible with the final + clocks from an optimization perspective, but we also want to minimize the + risk of crashing during the auto-tuning process. + +2. Edit the run_autotune_quick.bat/.sh: + o Set your pool, user and password. + o If you only want to work with a subset of your gpus, set the DEVS + variable to a -d x,y,z argument. + +3. Execute run_autotune_quick.bat/.sh. The miner will shut down when the + auto-tuning process has completed. + +4. Open the autotune_quick_log.txt file and scroll to the bottom. Copy the + command line argument with the CN configs printed by the miner to + your start script for the miner, either adding it or replacing any + existing --cn_config argument. + +5. 
Optional continuation: if you _really_ want to make sure you are running + the best possible configuration for your gpu(s), open the + run_autotune_full.bat/.sh file and insert the same variables as for the + quick script. + +6. You can also enter CN configurations for all gpus in the full scan. The + only reason for doing so would be to force the start intensities to + specific values for a gpu. The scan will try all possible configurations + at each intensity level, decreasing the intensity one step at a time. + The miner will normally choose high start intensities when it knows it is + going to scan a full range, making this step unnecessary. If you still + want to add it, set the CN_CFG variable to e.g. + --cn_config=16+15,8+8 for a two-gpu system. Do NOT configure more than the + two intensities (i.e. 16+15 rather than 16+15:AAA) or you will disable the + auto-tuning. + +7. Start run_autotune_full.bat/.sh and go grab lunch, coffee or dinner. This + will take a while. + +8. Open the autotune_full_log.txt file and scroll to the bottom. Compare the + final output values to the values from the quick scan and use any mix of + the two. + + +Manual menu-based tuning +------------------------ +For manual tuning, we've also added a key-driven menu subsystem in the miner that reuses the same mechanisms as the auto-tuning mode. You enter it by pressing 't' and then one of 0-9 (or a-f for 10-15). + +The mode itself should hopefully be self-explanatory. You can cycle all available options per finetune parameter, tweak mode and L prefix enabled or not. You can also move freely between intensities <= the ones chosen at startup. This is a great power-user mode and for playing around with the configs found by the auto-tune process. 
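
The CN config format described in this document (optional L prefix, thread 1 intensity, tweak char, thread 2 intensity, optional finetune suffix) can be checked before it goes into a start script. The following is only a minimal sketch in Python based on the description above; it is not part of TRM, and the names CnConfig, parse_cn_config and parse_cn_config_list are hypothetical. It does not know which combinations a given algo actually supports, and the miner itself may accept or reject configs differently.

```python
import re
from typing import List, NamedTuple, Optional


class CnConfig(NamedTuple):
    """One per-gpu CN config, split into the parts listed in this document."""
    l_prefix: bool            # True if the optional 'L' prefix is present
    intensity1: int           # thread 1 intensity (X)
    tweak: str                # one of '-', '+', '*', '.'
    intensity2: int           # thread 2 intensity (Y), 0 disables thread 2
    finetune: Optional[str]   # e.g. 'AAA', or None if the suffix is omitted


# Assumption: finetune 1 is one of ABCDE, finetune 2 and 3 are one of AB,
# exactly as listed in the text. Intensity ranges are not validated here.
_CN_CONFIG_RE = re.compile(r"(L)?(\d+)([-+*.])(\d+)(?::([A-E][AB][AB]))?")


def parse_cn_config(text: str) -> CnConfig:
    """Parse a single config such as '16*14:CAA' or 'L28+28'."""
    match = _CN_CONFIG_RE.fullmatch(text.strip())
    if match is None:
        raise ValueError(f"not a valid CN config: {text!r}")
    prefix, x, tweak, y, finetune = match.groups()
    return CnConfig(prefix == "L", int(x), tweak, int(y), finetune)


def parse_cn_config_list(value: str) -> List[CnConfig]:
    """Parse the comma-separated value of --cn_config, one entry per gpu."""
    return [parse_cn_config(part) for part in value.split(",")]


if __name__ == "__main__":
    for cfg in parse_cn_config_list("16*14:CAA,L28+28:AAA,8+7"):
        print(cfg)
```

A config without a finetune suffix parses with finetune set to None, which mirrors the rule that the suffix must be left out for the full scan to run.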
diff --git a/doc/CN_GENERAL_TUNING.txt b/doc/CN_GENERAL_TUNING.txt new file mode 100644 index 0000000..d3a6920 --- /dev/null +++ b/doc/CN_GENERAL_TUNING.txt @@ -0,0 +1,68 @@ +Team Red Miner CryptoNight Tuning +================================= + +IMPORTANT IMPORTANT IMPORTANT: this document precedes the new tuning document included in the release, CN_AUTOTUNING_WITH_TRM.txt. While the information below is still accurate, it is also somewhat outdated. It is very much recommended to read the auto-tuning guide and use the new auto-tuning support to trim your rigs. + +Note: the hashrates mentioned in this document are for the main Monero PoW variants, CNv8 and CN/r. + +Introduction +------------ +This miner is leaner than other CN miners. This can translate into either increased hashrate, lower power draw, or both, or none of the above. Your mileage may vary and is highly dependent on mem straps, modded timings and clocks. For most CN variants, if this miner is running at the same hashrate as other CN miners, you can expect your power draw per GPU to decrease by 5-20W depending on gpu model and clocks. + +There are fewer controls in this miner than the standard CN miner. You specify a config for one or two threads, and a mode. You provide one intensity value per thread in the range 0-16. Moreover, you can choose between three modes, +, - and *. The + and - modes will most often not have any effect on the end result, but it never hurts to try them. The effect of the + and - modes varies with your gpu model and clocks, making it difficult to make general recommendations on when to use one or the other. + +The * mode is different. It's designed specifically to take advantage of modded timings on Vega cards. Whenever you use modded timings with tightened latency, you should use the * mode. The AMD Memory Tweak tool released by ElioVP is truly an amazing addition to the Vega toolset. 
We recommend all Vega owners to read up on the tool and the current latest and greatest mem timings provided by the community. The tool is available at https://github.com/Eliovp/amdmemorytweak, and the Bitcointalk ANN thread can be found here: https://bitcointalk.org/index.php?topic=5123724.0 + +Standard Tuning Guide +--------------------- + +[Windows drivers] +For some Polaris cards, the good ol' blockchain driver works fine. However, the one driver that seems to be a good fit across the board is 18.6.1, and that's the driver we have used in all our Windows tests. + +[Windows swap space] +You should set up your swap space to be at least the sum of all GPU memory you intend to use when mining. Typically, for a 4GB card this is 3.5GB and for an 8GB it's 7.5GB. Playing it safe is recommended, i.e. rather add the full memory size of all your GPUs and set the swap to the total sum or more. + +[Linux drivers] +For your Vegas to reach max possible hashrate under linux, you need amdgpu-pro drivers >= 18.30. Polaris/Baffins/Lexa Pros are not as sensitive to the driver version. Also, please note that this release does not include ROCm support for CN variants, it will be included in an upcoming release instead. + +[Polaris cards (470-580)] +The standard configuration is 8+8 for all of these cards. 8+6 or 7+7 might give the same optimal hashrate, and 9+9 can, especially under linux, give a better result for 480/580. For some cards, 16+14 is the best choice but also increases the probability of stale shares. You must have good mem straps to reach a good hashrate. Normally e.g. the Pimp My Straps function in SRB Polaris Bios Editor is sufficient. For mem clk, boosting it as much as possible while avoiding mem errors is a good thing. The core clk should generally end up between 1230-1270. For 580s, a boosted core clk to 1300 can push the hashrate to 1100 h/s while still staying at a reasonable power draw. 
We have seen few Polaris cards not being able to reach 1020-1030 h/s with this miner when the proper mem straps are in place. With the introduction of CN/r, which is more power hungry than CNv8, the core clks mentioned above might be too high and skew your efficiency. Either lower them somewhat or make sure your temps and power draw at-the-wall are under control. + +[Baffin and Lexa Pro cards (550-560)] +From v0.3.8, this miner has been better optimized for these smaller cards. The major additions are that the '+' mode has been optimized and a 'L' prefix mode designed for the smallest Lexa Pro cards has been added. Some rules of thumb when you optimize your rigs: + +o We'd expect 4+4 and 4+3 to be the only interesting configs for 4GB cards. +o For Lexa Pro cards with 8 CUs, prefix your config with 'L', i.e. L4+3. +o The 'L' prefix is designed for Lexa Pro, but can also work well for Baffin with 10 CUs. +o Many 2GB Lexa Pro can't do L4+3 under Win, only L3+3. For max performance you should try Linux and L4+3. +o For an overkill full range test, you should try all of 4+4,4+3,3+3,3+2,2+2 in four modes: X+Y,LX+Y,X-Y,LX-Y + +[Vega cards] +The Vegas can end up anywhere from 1900-2450 depending on if it's a 56 or 64, a reference card or not and your choice of clocks and modded mem timings. With the AMD Memory Tweak tool, we have an additional dimension to play with, and therefore this section has been expanded into a separate Vega tweaking document with full examples of how to bring different Vegas to their maximum potential with this miner. If you only want a quick overview of the interesting configurations for a Vega: try 14+14, 14-14, 14*14, 15+15, 15*15, 16+14, 16*14. You can also try 16+15, 16*15, 15+14, 15*14, etc. The mem clk is very important, and you should aim for as high as possible while keeping your rig stable. 
If you have a Vega 64 and don't mod your timings, a higher core clk will have a significant effect on the hashrate, but tweaking timings is much more efficient. The 16+14 configuration will often not show its true capability before hitting 1500 core clk. Your power draw should still stay reasonable (as in lower than other miners at more standard clocks). For a lower core clk around 1408, some cards do best with 16+14, others with 15+15, some with 14+14, YMMV. Again, with modded timings you can keep your core clk around 1408 and still hit a very high hashrate. + +[Older cards] +We're sorry, we only support 470-580, 550/560 and Vega cards. There are reports of people successfully running the miner on Fiji and Tonga cards (R9 290X etc), but we do not test on said devices. + +Benchmark results (CNv8 results, somewhat outdated but still indicative) +------------------------------------------------------------------------ +For most Polaris cards below, one-click Pimp My Straps in SRB Polaris Bios Editor has been used for mem straps. + +6 x Rx 470 8GB (Samsung mem) rig +8+8, 1250/900 cclk, 2000/900 mclk, 6105 h/s, total rig 685W + +Rx 560 4GB (Samsung mem) +4+4, 1230/900 cclk, 2050/900 mclk, 540 h/s, unknown power draw + +Rx 570 8GB (Samsung mem) +8+8, 1270/900 cclk, 2100/900 mclk, 1030 h/s @ ~100W at wall + +Rx 580 8GB (Hynix mem) +8+8, 1250/900 cclk, 2000/900 mclk, 1029 h/s @ ~105W at wall + +Vega 56 reference card (56 bios, ppt mod) +16+14, 1413/880 cclk, 935/880 mclk, ~2000 h/s @ ~197W at wall + +Vega 64 liquid cooling +15+15, 1408/880 cclk, 1100/880 mclk, ~2100 h/s @ ~190W(?) at wall +16+14, 1560/925 cclk, 1100/880 mclk, ~2270 h/s @ ~210W(?) 
at wall + diff --git a/doc/CN_MAX_YOUR_VEGA.txt b/doc/CN_MAX_YOUR_VEGA.txt new file mode 100644 index 0000000..e01b1b6 --- /dev/null +++ b/doc/CN_MAX_YOUR_VEGA.txt @@ -0,0 +1,177 @@ +Team Red Miner CryptoNight Tuning For Vega GPUs +=============================================== + +Introduction +============ +With the addition of ElioVP's awesome AMD Memory Tweak tool to the Vega toolset, the game has changed drastically. In addition to the old tools like soft powerplay mods, we can now tweak the memory timings on the Vega GPUs much in the same way we've been able to fine-tune Polaris GPUs with bios mods for a long time. We're now able to achieve higher hashrates than ever before while still keeping the core clk at standard or low CN levels. + +At the time of writing, the community is producing new modded memory timings on a daily basis. This document only presents a few of the available timings that at this point seem stable over time. They might not work for all GPUs. Silicon lottery will definitely be a factor when trying to squeeze out as much performance as possible. I do hope automated tuning tools become available that e.g. can tune the memory timings automatically and run benchmarks jobs with known result sets to automatically check for memory/hw errors. + +This guide only contains two recipes, one for Samsung and one for Hynix HBM memory. It's not intended to be a complete guide, rather focused on presenting one path to achieving a very competitive hashrate on each type of Vega GPU, then leaving it up to the community to do further tweaking and present their results. It will most definitely not be the most efficient setup, we will rather keep clocks and voltages at stable old-school CN levels. + +The hashrates we hope to achieve by following this guide are: + +Vega 64 (Samsung) 2440 h/s @ 185-195W +Vega 56 flashed 64 (Samsung) 2250 h/s @ 180-190W +Vega 56 (Hynix) 2110 h/s @ 180-185W + + +Step 1 - which HBM2 memory do I have? 
+===================================== +There are three manufacturers of HBM2 memory, the type of memory used in all Vega GPUs: Samsung, Hynix and Micron. Not surprisingly, Samsung is best-in-class and afaik used in all Vega 64s and reference Vega 56s. Hynix is typically found in newer (non-ref designs) Vega 56s, although there might be instances that use Samsung memory as well. I believe Micron is only used in the Radeon VII which isn't covered in this guide. + +The type of memory decides how aggressive we can be with our modded timings. If you're not sure about what memory your Vega has, you can run GPU-Z under Windows, it will in almost all cases list the manufacturer under "Memory Type". + + +Step 2 - flash reference Vega 56 with Vega 64 bios +================================================== +NOTE: this step might not be necessary, but at the time of writing, we've only tested the Samsung timing mods on V56@64. In practice, this step means that you grab the corresponding Vega 64 bios for your particular type of Vega 56 and flash that using ATIFlash. There are a number of videos available on youtube to help you out if necessary. + + +Step 3 - which driver should I use? +=================================== +For Windows, we've tested extensively on Adrenaline 18.6.1. Newer drivers do provide the nice feature of being able to edit all P-states without soft powerplay tables (see next step). However, we have gotten reports that the hashrates are not as competitive as with 18.6.1, and sometimes power draw seems to be higher. + +For Linux, any amdgpu-pro >= 18.30 should be fine. + + +Step 4 - set up soft powerplay tables (or use 19.4) +=================================================== +We will only briefly mention this step here - there are many links and tutorials out there to produce soft powerplay mods for both Windows and Linux. If you're not familiar with powerplay tables, please google or youtube your way to the necessary knowledge. 
After that you should be able to parse the information below. + +We are targeting an effective core clk of approx 1350 MHz on all GPUs. By "effective", we mean the value displayed by HWinfo64 (Windows) or /sys/kernel/debug/dri/*/amdgpu_pm_info (Linux). + +The results mentioned in the introduction section above were achieved under Windows using: + +Type CoreClk P7 MemClk P3 +Vega 64 or flashed 64 (Samsung) 1407 MHz, 905mV 1075 MHz, 905 mV +Vega 56 (Hynix) 1407 MHz, 905mV 930 MHz, 905 mV + +If you don't want to deal with messy powerplay tables, you can install Adrenaline 19.4.1, the latest driver available when writing this. You should be able to edit the full P-state information directly using OverdriveNTool 0.2.8. This approach has not been tested when writing this guide. + + +Step 5 - make sure you manage your temps +======================================== +The HBM2 memory performance will degrade with rising temperatures. It is _imperative_ that you keep your gpu core temps under control. A rule of thumb is that the HBM2 temp will be around +10C from the gpu core temp, and we do _not_ want the HBM2 temp > 75C. Some people advocate keeping the HBM2 <= 60C at all times, meaning you should target 50C for your core temp. Personally, I target 55C for my core temps. + +For Windows and drivers < 18.12.2, you can simply enter a target temp in Wattman, OverdriveNTool or your PPT, and the driver will adjust the fan speed over time to keep the temp stable around your entered target. For newer drivers >= 18.12.2, this mechanism has been replaced with a very annoying fan curve that doesn't use interpolation of fan speeds between defined states. Unless you want to use Wattman, make sure you use OverdriveNTool >= 0.2.8 for full support. + +For Linux, there are a few tools available to set the fan speed, and there is normally good support in custom mining OSs. 
+ +For monitoring, it's usually enough to just monitor the gpu core temp as displayed by the miner and assume +10C for the HBM2 temp. If you want to dig deeper, you can also run HWinfo64 or GPU-Z under Windows to see the true HBM2 temp at runtime instead of deriving it from the core temp. Under Linux, the HBM2 temp sensor isn't available in standard setups, although you might find it in custom mining OSs. + + +Step 6 - download ElioVP's AMD Memory Tweak Tool and apply timings +================================================================== +The Bitcointalk ANN thread is here: https://bitcointalk.org/index.php?topic=5123724.0 +The tool is available here: https://github.com/Eliovp/amdmemorytweak/releases + +Unfortunately, there is no bundled single file per platform to download. For Linux, you only download the single binary file "amdmemtweak". For Windows, download all the other files one-by-one and place them in the same directory. Note: the latest version when writing this was 0.1.7. Later releases might have a proper bundling in e.g. a .zip or .tgz file. + +NOTE: the memory tweak tool must always be run with Administrator/root permissions. + +The memory tweak tool lists GPUs in bus order. This is the same order that e.g. OverDriveNTool uses. TeamRedMiner does not use this order by default, but you can enable it by adding --bus_reorder as a command line argument for TRM. This is highly recommended. + +If you execute "teamredminer.exe --bus_reorder --list_devices", you will list all your available GPUs in the same order as the memory tweak tool uses. The goal is now to apply the appropriate timings to all your Vega GPUs. The approach followed below is to apply the timings one GPU at the time in a .bat or shell script. + +These are the timings used in our results presented above. We do NOT assume responsibility for applying these. We have used them extensively on many different GPUs, and the Vegas in general are quite fine being manhandled with weird settings. 
+ +Lucky Vega 64 or flashed 64 (Samsung): +--CL 19 --RAS 28 --RCDRD 12 --RCDWR 5 --RC 44 --RP 12 --RRDS 3 --RRDL 3 --RTP 4 --FAW 18 --CWL 6 --WTRS 4 --WTRL 9 --WR 15 --WRRD 1 --RDWR 18 --REF 17000 --RFC 248 + +Weaker Vega 64 or flashed 64 (Samsung) - use if lucky timings aren't stable: +--CL 19 --RAS 30 --RCDRD 12 --RCDWR 6 --RC 44 --RP 13 --RRDS 5 --RRDL 5 --RTP 4 --FAW 18 --CWL 6 --WTRS 4 --WTRL 9 --WR 15 --WRRD 1 --RDWR 18 --REF 17000 --RFC 248 + +Conceal Vega 64 or flashed 64 (Samsung) - use for CN Conceal or try for any algo if Weaker straps still didn't work: +--CL 19 --RAS 32 --RCDRD 15 --RCDWR 8 --RC 46 --RP 13 --RRDS 4 --RRDL 5 --RTP 5 --FAW 16 --CWL 7 --WTRS 4 --WTRL 9 --WR 16 --WRRD 1 --RDWR 19 --REF 17000 --RFC 248 + +Lucky Vega 56 (Hynix): +--RAS 22 --RCDRD 17 --RCDWR 4 --RC 35 --RP 13 --RRDS 4 --RRDL 4 --RFC 148 --REF 15600 + +Weaker Vega 56 (Hynix): +--RAS 24 --RCDRD 19 --RCDWR 4 --RC 35 --RP 13 --RRDS 4 --RRDL 5 --RFC 148 --REF 15600 + +Note: the Vega 56 timings are based on user anwil's post on bitcointalk. + +Example mixed rig with three Vegas, two flashed 64, one Hynix. The last GPU is the Hynix. + +# Fetch the TeamRedMiner GPU order (using --bus_reorder) +%> ./teamredminer --bus_reorder --list_devices + +[2019-04-18 15:48:53] Auto-detected AMD OpenCL platform 0 +[2019-04-18 15:48:53] Auto-detected AMD OpenCL platform 1 +[2019-04-18 15:48:54] Detected 5 devices, listed in pcie bus id order: +[2019-04-18 15:48:54] Miner Platform OpenCL BusId Name Model Nr CUs +[2019-04-18 15:48:54] ----- -------- ------ -------- ------------- ------------------------- ------ +[2019-04-18 15:48:54] 0 0 2 03:00.0 gfx900 Radeon RX Vega 56 +[2019-04-18 15:48:54] 1 1 0 05:00.0 Ellesmere Radeon RX 570 Series 32 +[2019-04-18 15:48:54] 2 1 1 09:00.0 Baffin Radeon RX 560 Series 16 +[2019-04-18 15:48:54] 3 0 1 0e:00.0 gfx900 Radeon RX Vega 56 +[2019-04-18 15:48:54] 4 0 0 11:00.0 gfx900 Radeon RX Vega 56 +[2019-04-18 15:48:54] Successful clean shutdown. 
+ +# Verify that the order matches the tool (note: output filtered) +%> ./amdmemtweak --bus_reorder --list_devices + +GPU 0: Vega [Radeon RX Vega] pci:0000:03:00.0 +... +GPU 1: Ellesmere [Radeon RX 470/480] pci:0000:05:00.0 +... +GPU 3: Vega [Radeon RX Vega] pci:0000:0e:00.0 +... +GPU 4: Vega [Radeon RX Vega] pci:0000:11:00.0 +... + +# This would be the content of a Windows .bat file with different straps selections: +-------- +@echo off + +rem Vega gpu 0 - lucky Samsung straps +WinAMDTweak.exe --gpu 0 --CL 19 --RAS 28 --RCDRD 12 --RCDWR 5 --RC 44 --RP 12 --RRDS 3 --RRDL 3 --RTP 4 --FAW 18 --CWL 6 --WTRS 4 --WTRL 9 --WR 15 --WRRD 1 --RDWR 18 --REF 17000 --RFC 248 + +rem Vega gpu 3 - weaker Samsung straps +WinAMDTweak.exe --gpu 3 --CL 19 --RAS 30 --RCDRD 12 --RCDWR 6 --RC 44 --RP 13 --RRDS 5 --RRDL 5 --RTP 4 --FAW 18 --CWL 6 --WTRS 4 --WTRL 9 --WR 15 --WRRD 1 --RDWR 18 --REF 17000 --RFC 248 + +rem Vega gpu 4 - lucky Hynix straps +WinAMDTweak.exe --gpu 4 --RAS 22 --RCDRD 17 --RCDWR 4 --RC 35 --RP 13 --RRDS 4 --RRDL 4 --RFC 148 --REF 15600 +-------- + +As you can see, you simply execute the command with " --gpu " and then append the full timings line copied from our list above. + +After applying the memory timings, you can run the tool with "WinAMDTweak.exe --current" and verify that the presented values for the GPUs now match the straps you wanted to apply. + +Step 7 - configure and run miner +================================ +After making sure both your powerplay table (clocks/voltages) and your memory timings are applied, it's time to start the miner. The most important part is to make sure the new *-mode is used for all Vegas. If this mode isn't used, you will not see any significant improvements with the memory timings. The mode will be used by default. + +In practice, this means that any old start script for the miner that configures each gpu by passing e.g. 
--cn_config=16+14,14+14 for a Vega 64+Vega 56 now should be modified to --cn_config=15*15,14*14, these are the expected best modes per Vega type. Flashing a Vega 56 to 64 does not increase the nr of compute units, so you should try 14*14 first. In some cases, 15*15 has been a little better. For true Vega 64s, we expect 15*15 to be optimal, but it can also be interesting to try 16*14. + +Other than the above, I will not repeat how you configure the miner in this document. For the simplest form of testing, run the start_cnr.bat or ./start_cnr.sh script included in the release. It will run the miner with decent configurations on all AMD GPUs found. + + +Step 8 - monitor and evaluate results +===================================== +This step is much more important now that we've introduced an additional form of optimization with the modded memory timings. We need to make sure our clocks and timings do indeed produce correct results over time. The best way of assessing this is by running the miner for a longer period, then inspect the nr of hw errs and the poolside hashrate as reported by TRM. + +If you see too many hw errors for your GPU(s), you simply need to dial down on the memory timings, lower your core and/or mem clk, and potentially also increase voltage. You need to tune your way to stability. If none of the timings above work, look for better memory timings suggested in the Bitcointalk thread for the tool. However, if you see a low amount of hw errors, maybe 0.10-0.20% and your poolside hashrate still matches the expected, you can definitely choose to keep the settings, but keep an eye out going forward. + +Now, I believe most miners don't know how to evaluate their poolside hashrate properly. It's never about how long you are running the miner in your test(s), it's _only_ about the nr of shares found. Moreover, we actually need a very(!) large amount of shares found before we can determine if a miner is doing its job or not. 
Running 10h against Nicehash is absolutely useless(!) as an assessment test of a miner's poolside hashrate. + +Instead, to produce a large nr of found shares we can do two things: run for a very long time, or run with a low static difficulty, or both. However, it is NOT nice to pool ops to use a low static diff just to test your miner. Instead, download xmrig-proxy and configure it with your pool of choice. Then, also configure the "custom-diff" value and set it to a very low value and shares will be flying, which is what we want. Xmrig-proxy will forward only the shares that match the true pool difficulty to the pool, and everyone is happy. + +This is the outline for a good test: + +1) We need to run until we have a count of 50,000 accepted shares. +2) We want to run for max 5h. This means ~3 found shares/sec. +3) If the hashrate of your rig is X h/s, you need to set a custom diff of X/3 in xmrig-proxy. +4) Run for 5h. +5) Check the poolside hashrate in TRM and verify against the avg hashrate * 0.975. + +Using this approach, being more than 2% from the expected value is highly unlikely. In a few cases, you will still be between 1-2% off the mark. If you're further off, continue to run until you can see which GPU(s) are too far off from their respective mark. Again, take the avg hashrate * 0.975, this time per GPU. Many times it's obvious which GPU is being problematic, especially if hw errors are present. Other times it can be more difficult when one or more cards simply don't deliver the expected nr of shares over time. + + +Last Step +========= + +Happy Mining, and as per ElioVP's wishes: share any improved memory straps you may come up with in the Bitcointalk ANN thread! 
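
The arithmetic behind the test outline in Step 8 can be written down as a small calculation. This is only a sketch in Python under the assumptions stated in that step (50,000 accepted shares, max 5h, expected poolside hashrate = avg hashrate * 0.975); the function names are hypothetical and none of this is part of TRM or xmrig-proxy. It assumes that a difficulty of D means D hashes per share on average, which is what makes a custom diff of X/3 give roughly 3 shares/sec at X h/s.

```python
def plan_share_test(rig_hashrate_hs: float,
                    target_shares: int = 50_000,
                    max_hours: float = 5.0):
    """Return (shares_per_sec, custom_diff) for the low-difficulty test.

    Assumption: difficulty D means one share per D hashes on average,
    so shares/sec = hashrate / D.
    """
    shares_per_sec = target_shares / (max_hours * 3600.0)
    custom_diff = rig_hashrate_hs / shares_per_sec
    return shares_per_sec, custom_diff


def expected_poolside(avg_hashrate_hs: float, factor: float = 0.975) -> float:
    """Expected poolside hashrate, using the 0.975 factor from the outline."""
    return avg_hashrate_hs * factor


def deviation_percent(poolside_hs: float, avg_hashrate_hs: float) -> float:
    """How far the observed poolside hashrate is from the expected value, in %."""
    expected = expected_poolside(avg_hashrate_hs)
    return (poolside_hs - expected) / expected * 100.0


if __name__ == "__main__":
    rate, diff = plan_share_test(6000.0)
    # The exact rate is a bit below the ~3 shares/sec used in the outline,
    # so the exact diff comes out slightly above X/3.
    print(f"shares/sec: {rate:.2f}, custom diff: {diff:.0f}")
    print(f"expected poolside: {expected_poolside(6000.0):.0f} h/s")
```

The same deviation check can be applied per GPU by passing that GPU's own average hashrate and poolside hashrate.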
+ diff --git a/doc/ETHASH_GENERAL_TUNING.txt b/doc/ETHASH_GENERAL_TUNING.txt new file mode 100644 index 0000000..5077eca --- /dev/null +++ b/doc/ETHASH_GENERAL_TUNING.txt @@ -0,0 +1,245 @@ +TeamRedMiner Ethash Tuning Guide +================================ + +Summary +------- +- Do not trust the displayed hashrates of other closed source + miners. We will release a separate report on this topic and tools + for everyone to verify our claims shortly. For the time being, run + longer tests, observe your accepted poolside hashrate and draw your + own conclusions. + +- Polaris cards are only briefly covered by this guide, plenty of + guides available already. + +- For Vega performance, you must establish modded mem timings using + the AMD mem tweak tool yourself, or use a mining OS that does it for + you. TeamRedMiner does not modify timings automatically. + +- Vega 56 Hynix can do > 50 MH/s which is a significant boost vs other + miners. + +- Vega 64 Samsung can do > 50 MH/s. + +- Vega 56 Samsung should be flashed with a Vega 64 bios for max + hashrate, 49-50 MH/s. + +- Vega 56 Samsung with 56 bios should be tuned for efficiency instead + targeting 46.5-47.5 MH/s. + +- Radeon VII gets a big boost and can do > 86-87 MH/s. + +- Dev fees: Polaris 0.75%, Vega 1.0% + +- TRM discord: https://discord.gg/RGykKqB + +General Overview +---------------- +In general, TeamRedMiner behaves like other ethash miners. If you have +a tuned configuration for another miner, it should generally work well +although might not be the absolute optimum for your rig(s). The main +exception is for cards driving monitors or doing other simultaneous +work. For those, you often need to specify a lower manual --eth_config +value or the miner will collide with rendering tasks, having the +driver reset the GPU during mining. + +For more help, and for issues not mentioned in this document, please +join the TRM discord and ping us there. 
+ +Note 1: the reader is assumed to be skilled in Vega tuning and know how +to use PPTs, set clocks using e.g. OverdriveNTool on Win, modify +timings using the AMD mem tweak tool, etc. + +Note 2: we will shortly be releasing an ETH miner testing tool that +helps out in these tuning tests by acting as a pool with very low +difficulty, making it much easier to spot excessive hw errors +quickly. Stay tuned for more info! + +Note 3: a huge shout-out to our beta testers for helping out with +timings and stability testing in general (ddobreff, BDF, NCarter84, +gkumaran, pbfarmer, heavyarms1912)! + + +Mining Modes: A and B +--------------------- +TeamRedMiner introduces two mining modes for ethash: A and B mode. +The B mode allocates a lot more memory for the DAG, meaning only >= +8GB GPUs can be used. The B mode normally adds 0.2-0.5% for Polaris +8GB cards, nothing for Vega 56/64 and a significant boost for Radeon +VIIs. The miner automatically chooses the best mode for you. + +NOTE: for Windows miners using the B mode - you should roughly make +sure you have at least 8GB per GPU as available swap space, or you +risk running out of swap and need to manually force the A mode for one +or more GPUs in the rig using the --eth_config command line argument. + +In addition to the A/B mode, the miner contains a single tuning +number. The miner will auto-tune the best setup for you unless a +manual override is provided using the --eth_config parameter. Please +see the miner help (teamredminer --help) for more details. The tuning +number ranges for various GPUs are: + +470/570: 0-512 +480/580: 0-576 +Vega 56: 0-896 +Vega 64: 0-1024 +Radeon VII: 0-960 + + +Polaris Cards (470-580) +----------------------- +There are a multitude of tuning guides, straps, ref boost guides etc +available for Polaris GPUs already. For these cards, TeamRedMiner +performs similarly to available miners. 
At the time of writing, our +tests indicate that TRM outperforms the other miners in the majority +of cases (0.75% dev fee taken into account). Sometimes it can be +0.5-1.0%, other times it's less, a few times it's such a close call +it's impossible to tell. Our B mode normally adds a 0.3-0.5% boost on +8GB cards. + +For power draw, and adjusting for the true poolside hashrate, TRM is +currently the leanest miner, although the normal difference isn't +more than 0.5-1.5W per Polaris GPU. + +Tuning process in short: use your current ethash straps, or if you're +a CN miner please switch to ethash-centric straps if possible. Then, +don't forget to enable --REF boost using the AMD mem tweak tool. Last, +start with a more generous voltage, max your mem clock, then lower +your core clk, and drop the voltage as much as possible while +remaining stable. + +If you need more help or like to discuss tuning, join the TRM discord. + +Vega 56 Hynix +------------- +The Vega 56 GPUs with Hynix HBM2 memory are generally known to be +underperforming their Samsung cousins. With this TRM release, they +have suddenly become strong contenders for the best Vega for ethash +mining instead. + +The tuning setup we have come to like in our tests is the following: + +1) Start with your core clk at 1100 MHz while pushing the mem clock to + 950 MHz. If you know that your mem can't handle 950 MHz, lower it + from the start. Use 875mV for voltage. + +2) Start with the following modded mem timings as a baseline: + + --cl 18 --ras 23 --rcdrd 23 --rcdwr 11 --rc 34 --rp 13 + --rrds 3 --rrdl 4 --rtp 6 --faw 12 --cwl 7 --wtrs 4 --wtrl 4 + --wr 11 --rfc 164 --REF 17000 + +3) The guess is that this setup will hit 46-46.5 MH/s for you. The key + to improving performance is --rcdrd. Proceed to lower that value + one step at a time, stopping and restarting the miner between + each change. You must be on the lookout for hw errors which means + you've reached your GPU's limit. 
+ +4) If you're part of the lucky crowd and your GPU can handle a rcdrd + value as low as 15-16, you should now be seeing a 50-50.2 MH/s + hashrate. If your card starts producing hw errors, you need to + increase rcdrd until you're stable. NOTE: you must _blast_ your + fans to make sure the HBM temp is kept in check. On Windows, we + recommend using HWiNFO64 to monitor the HBM temp. It should + preferably be kept < 60C. + +5) Lower the core clk as much as possible without losing hashrate. If + you ended up with a rcdrd value > 16, there should be room to lower + it from 1100 MHz. For a 50 MH/s hashrate, you probably need the + 1100 MHz core clk to sustain it. + +6) Lower voltage to 850/840/830/820 mV, as low as possible while + remaining stable. + +7) For better efficiency (but lower hashrate), tune down your core clk + and try to further lower the voltage. + +Using this setup, we have been running Gigabyte Vega 56 Hynix cards +for > 50 MH/s with no hw errors. Other miners running at the same +clocks ended up around 44 MH/s, meaning +13-14% for TRM, although we +can't guarantee there aren't better configurations for said miners. + +Vega 64 Samsung +--------------- +In short, try the following for a 50-51 MH/s setup: + +1) Set clocks to core clk 1075 MHz, mem clk 1107 MHz, voltage at 850 + mV. Your card may need to clock down the mem clk to 1050-1080 MHz + to be stable with no hw errs. + +2) Use the following modded timings: + + --CL 20 --RAS 30 --RCDRD 14 --RCDWR 12 --RC 44 --RP 14 --RRDS 3 + --RRDL 6 --RTP 5 --FAW 12 --CWL 8 --WTRS 4 --WTRL 9 --WR 14 + --REF 17000 --RFC 249 + +3) Hopefully it runs stable and you should observe a hashrate around + 50.5-50.9 MH/s. If your GPU can't handle the timings, you need to + relax them to less aggressive variants available in the AMD mem + tweak thread on Bitcointalk, or come join the TRM discord for + further suggestions. 
+ +4) For better efficiency (but lower hashrate), tune down your core clk + and try to further lower voltage while remaining stable. + +5) We reiterate: _blast_ your fans and monitor your HBM temperature. + + +Vega 56 Samsung flashed 64 bios +------------------------------- +From a tuning perspective, we treated these cards like regular Vega 64 +GPUs. You might need to increase core clk somewhat compared to true +Vega 64s to compensate for the fewer compute units, otherwise follow +the guide for Vega 64 Samsung. + +In general, the flashed cards couldn't handle a maxed out mem clk at +1107 MHz, rather needed to clock down to 1060-1075 MHz. Start stable, +and add a step where you slowly increase the mem clk again. + + +Vega 56 Samsung (56 bios) +------------------------- +If you want to target a max hashrate setup for 49-50 MH/s, we advise +you to flash a Vega 64 bios. We have found it very difficult to +achieve the same speeds as for Vega 56 Hynix using the stock bios for +Vega 56 Samsung GPUs. If you insist on using the Vega 56 bios, we +rather recommend a more efficient setup doing 47-47.5 MH/s: + +1) Clock down your core clk significantly to 940 MHz. We ran our tests + in power states core P3+mem P3. Set mem clk to 950 MHz. Start with + 850 mV for voltage. + +2) Apply small timing modifications: + + --RCDRD 12 --REF 12000 + + If your card can't handle RCDRD 12, leave it at the stock value (13). + +3) Lower voltage as much as possible. We ended up at 815 mV. You + should now have a nice efficient setup for 47.0-47.5 MH/s. + + +Radeon VII +---------- +The VIIs are simple to tune. Just start the miner at maybe 1600 MHz +core clk, 1000 MHz mem clk, it should give you 86-87 MH/s. The miner +will auto-tune to some Bxxx value for you and give a significant boost +over other miners. + +IMPORTANT: we have seen some VIIs getting close to zero boost between +A and B modes. The B mode should see a significant hashrate +increase. 
Many times those GPUs have been running the very original +bios (v105) and need to flash to v106 that was released by AMD shortly +after the Radeon VII release date. That bios can be tricky to find at +this point. Please ping us in the TRM discord and we can provide it. + +The only remaining tuning is to first lower the core clk until you +start losing hashrate, which often is immediately. After that, try to +lower voltage for a more efficient setup. If you can't keep your cards +cool, you need to clock down in general (both core and mem clk) for a +lower hashrate. + +And again: blast those fans, especially on the VIIs. + + +Happy mining! diff --git a/doc/KAWPOW_TUNING.txt b/doc/KAWPOW_TUNING.txt new file mode 100644 index 0000000..43f3fb5 --- /dev/null +++ b/doc/KAWPOW_TUNING.txt @@ -0,0 +1,178 @@ +Team Red Miner Kawpow Mining +============================ +This document provides some pointers on how to best test and tune for +the Kawpow algo used by Ravencoin. + + +General background +------------------ +Kawpow (progpow) is designed to fully utilize the resources on a gpu: +compute, local memory and global memory. This means the algorithm +falls into the most resource-intensive category of pow algorithms to +date. In turn, this leads to a high power draw and hot gpus. + +The algorithm also contains random elements that vary with each block +height. The load on the gpu will therefore vary accordingly. The +hashrate difference between lean and mean blocks with easy vs heavy +random math operations can vary as much as +-10%. Therefore, tuning +for the algo without locking down your tests to a specific block +height means you'll get random results over time. This is very +important, and the miner therefore provides a mechanism for doing so. + +DISCLAIMER: please note that this algo runs MUCH HOTTER than +e.g. ethash. You should be prepared to lower your clocks significantly +to avoid overheating your gpus. 
+ +Tuning Clocks +------------- +The most important controls are (as usual) the core and memory clock +of your gpu. This miner does not set clocks; you need to control them +using your mining os or an external tool like OverdriveNTool on Windows. + +The goal is to find a balance between the core and memory clock so +that your gpu temperatures are under control and neither core nor mem +is a clear bottleneck. We choose to do so on a specific block height +that we know represents an average load on the gpu. Then, we make sure +our tuning runs fine on two other example heights that are lean and +mean, respectively. If the rig can run for a sustained test period on +the mean block height, the configuration should hopefully survive the +hardest heights during regular mining. + + +Tuning Intensity and Kernel Mode +-------------------------------- +This miner very aggressively consumes a gpu's resources to achieve the +highest possible hashrate. In some cases, you need to control the +"intensity" used to allow for e.g. rendering tasks to execute in +parallel with mining. + +Unless you specify the intensity manually, the miner will select an +intensity that is 16 below the max (see below for max values). Most of +the time a higher intensity results in a higher hashrate, but can +cause problems with responsiveness on GPUs used for monitors. + +For Vegas and Navi cards, the miner also provides two different mining +modes: A and B mode. Due to choices made by the AMD driver teams, the +B mode is not available for Vegas on Windows or older amdgpu-pro +drivers (<= 18.50 or so). For Navi cards, both modes are available on +all platforms. Unless specified, the miner will choose what should be +the optimal mode for each gpu automatically. The B mode is the obvious +choice whenever you can run it, it is faster. However, for gpus +running a monitor, the A mode is sometimes the better choice. + +The maximum intensity a gpu can run is 16 x ComputeUnits. 
This means: +
+GPU  470/570  480/580  Vega56  Vega64  VII  5700xt  5700/5600xt
+Max  512      576      896     1024    960  640     576
+ +The miner will automatically cap a higher configured value than the +max intensity for a gpu. + +The kernel mode and intensity are specified together using the +--prog_config command line parameter. Please see the USAGE.txt file or +run the miner with --help to see examples of how to use the parameter. + + +Tuning for monitor gpus +----------------------- +As mentioned, monitor gpus need to be limited manually with a lower +intensity to provide access to the gpu for rendering tasks, or your +system might become overly sluggish and unresponsive. + +Typically, take the max intensity possible for your gpu, reduce it +by 64 and pass it in a --prog_config argument. If the system still +feels sluggish, decrease the value further in steps of 8 until you're +satisfied. You can also try increasing it if the system feels +responsive enough. For Vegas and Navi cards, you can also try to +switch between the A and B modes to see what effect it has on your +system. + +Example for a three gpu system of Vega 64s, the first running a +monitor, choosing the A mode for the monitor gpu: + +--prog_config=A960,B1024,B1024 + + +Tuning Workflow +--------------- +This is an example workflow that can be used to tune a rig for kawpow +mining: + +- We have selected three block heights representing lean, average, and + mean random math selections: + + Lean: 937863 + Average: 1234567 + Mean: 1006647 + +- Create a script or command line for running the miner against any + pool of your choice but also adding the argument + --prog_height=1234567. This means the miner will simulate mining on + our average random math height and also not submit any shares to the + pool. 
Using the example script provided with TRM + (start_kawpow.sh/bat), mining on Linux, only on device 0 and + hashrate log time shortened to 15 secs, it would look as follows: + + ./teamredminer -a kawpow -o stratum+tcp://us.rvn.minermore.com:4501 \ + -u RDpPHx43bhrmdyd8L6BcpkHtjuc1vMpNSk.trmtest -p x \ + -d 0 -l 15 --prog_height=1234567 + +- Crank up your fans as much as possible to keep temps in check + throughout the tuning process. + +- Start tuning by tweaking your clocks, always rerunning the same + script with the average block height 1234567. You can use the clock + examples below as your starting point, or any other clocks of your + choice. Use a low-ish voltage for your clocks but rather a little + above what you're hoping to achieve as the final configuration. + +- Running the script over and over again, or using live clock + adjustments, observe how changes to higher/lower core clk and + higher/lower mem clk affect the hashrate. Unless they are balanced + fairly well, changes to one of core/mem clk will have a more + significant impact on the hashrate than the other. Lower that clock + until you reach the point where lowering either of the two clocks + affects the hashrate slightly. + +- Modify your script or command line to use our mean block height, + 1067241, instead of 1234567. + +- Run the script repeatedly again, and try to lower voltage as much as + possible while still being able to mine continuously for a while on + the mean block height. + +- Lower your fans to a more comfortable level. Run mining on the mean + block height again and verify that gpu temps are still under control. + +- If you feel you ended up with a too low/high core clock, or your + temps are not under control, restart the process from tuning your + clocks with a different core clock as starting point, and find the + correct mem clock for a balanced core/mem clock configuration. 
+ +- Last, run the lean mining level as a final test, modifying your + script to run block height 937863. It should give you the highest + hashrate seen so far, and hopefully have no issues running. + + +Memory Timings +-------------- +Timings do not have the same significant impact on kawpow as on +e.g. ethash or CN variants. There might be findings in the future that +can be worth mentioning, for now we skip the subject in this guide. + + +Clocks and hashrates examples +----------------------------- +These clocks are reported by TRM testers and can be used as a starting +point for your own tuning. They have not been tested at controlled +heights, so your results may vary from the specified hashrates below. +
+Type           CoreClk   MemClk    mV      prog_config     Hashrate
+-------------  --------  --------  ------  --------------  ---------
+5700XT         1310 MHz  1800 MHz  750 mV  B608            23.0 MH/s
+5700           1310 MHz  1800 MHz  775 mV  A400 (monitor)  19.7 MH/s
+5600XT         1310 MHz  1800 MHz  750 mV  B608            20.0 MH/s
+Vega 64        1045 MHz   925 MHz  800 mV  B1024           25.0 MH/s
+Vega 56 Hynix   957 MHz   940 MHz  812 mV  B896            24.4 MH/s
+RX580 8GB      1175 MHz  2100 MHz  820 mV  A576            15.0 MH/s
+RX480 4GB      1100 MHz  2000 MHz  850 mV  A400            14.3 MH/s
diff --git a/doc/MAP_YOUR_GPUS.txt b/doc/MAP_YOUR_GPUS.txt new file mode 100644 index 0000000..2957a94 --- /dev/null +++ b/doc/MAP_YOUR_GPUS.txt @@ -0,0 +1,54 @@ +Team Red Miner GPU Mapping Tutorial +=================================== + +Introduction +============ +In a multi-gpu rig, a recurring problem is to understand which gpu is which when moving between the Windows registry, miners, and tools like OverdriveNTool. + +Basically, there are three enumerations of interest to the user on each platform: + +1) PCI bus id order. Each PCI device/slot is assigned a bus id. It stays the same across reboots. +2) OpenCL order. This is the order chosen by the device driver. +3) (Windows) Registry order. For e.g. assigning soft powerplay tables, you need the registry key: 0000, 0001, etc. +4) (Linux) Kernel sysfs order. 
+ +The most common order chosen by mining software has been the OpenCL enumeration order, since that's the interface you use to communicate with the driver. That is also the default enumeration order in TeamRedMiner. That said, with things like multiple OpenCL platforms now being used by the Linux driver, this is getting annoyingly complex. The PCI bus id order makes more sense, and if we had built TeamRedMiner from scratch today we probably would have opted to use that order instead. + +The OpenCL order is produced by the device driver. As long as you don't add/remove GPUs the order will stay constant. You can always enumerate this order yourself by running the command "clinfo", available on both Windows and Linux. It will dump a boatload of data in your face and isn't trivial to follow, but if you only look at device names you should find all your devices. + + +Windows +======= +On Windows, the best way to map gpus is to use OverdriveNTool: + +1) Download and install OverdriveNTool. +2) Open the tool, then right-click the top title bar and choose "Settings". +3) In the menu, enable "Bus Number", "Registry Key", and "Friendly Name" in the upper-left section. + +Each gpu is now listed as e.g. "1: Radeon RX Vega (Bus: 6:0:0 | RegKey: 0001)". Note that OverdriveNTool itself uses the PCI bus id order. Next, in your .bat file that starts TeamRedMiner, add the command line arg "--bus_reorder". This instructs TRM to also use the PCI bus id order. + +With these changes, and assuming you mine on all AMD GPUs in TRM, you will: + +1) Have the same set of gpus in OverdriveNTool and TRM, listed in the same order. +2) Be able to use OverdriveNTool to map a gpu to the registry. + +You can also sanity check the mappings by checking the PCI bus id printed by TRM at startup, then verify that the same nrs are indeed listed per gpu in the same order in OverdriveNTool. + + +Linux +===== +Linux doesn't really have the same recurring mapping issues as Windows. 
Under Linux, the PCI bus id is available in a number of places, for example in /sys/class/drm/card*/device/uevent. + +The most common issue for Linux is rather if you have hacked your own clock/fan scripts using sysfs that have hardcoded cardX values. The order under /sys/class/drm/card* is chosen by the kernel and is not guaranteed to stay constant if you e.g. add/remove a gpu. For mixed rigs, it's VERY annoying if one gpu dies, you have some watchdog reboot mechanism, and then you apply clocks and voltages to the wrong cards because cardX is now cardY instead. + +Therefore, it's much better to scan all /sys/class/drm/card* entries for a specific PCI bus id, and not use hardcoded cardX values at all. Naturally, all of the custom Linux mining OSs will solve this for you. It's also a good way to make sure you apply clocks/PPTs to the correct card in the TRM device order that you use. For finding the sysfs path for a gpu with PCI bus id 0000:11:00.0 you can use this one-liner: + +%> egrep PCI_SLOT_NAME /sys/class/drm/card*/device/uevent | egrep "0000:11:" | cut -f 1-5 -d / + + +API PCI bus id support +====================== +This is a somewhat undocumented feature to our sgminer-compatible API that can be helpful for 3rd parties using the API: + +1) Execute the sgminer "DEVDETAILS" API call. +2) Each gpu will have the PCI bus id in hex format in the "Device Path" field. diff --git a/doc/MTP_MINING.txt b/doc/MTP_MINING.txt new file mode 100644 index 0000000..9a59c09 --- /dev/null +++ b/doc/MTP_MINING.txt @@ -0,0 +1,101 @@ +Team Red Miner MTP Mining +========================= +The MTP algo is primarily used by Zcoin (XZC). We're writing this +document since the algo is a little different from the other +algorithms available in TRM. + +4GB Cards +========= +4GB cards are NOT supported as the MTP scratchpad will not fit in the +available vram. The miner will automatically exclude GPUs where the +driver reports less than 4.4GB available vram. 
If you still want to +force the miner to run you can add the --allow_all_devices argument. + +Characteristics +=============== +MTP runs HOT on your gpus and is truly a power hog algo. On most gpus +it will consume close to 100% of the compute resources available while +also activating the mem controller to 75-85% of full load. Hence, +MTP's profile is both that of a compute bound and a mem bound +algo. For example, if you limit the compute resources too much with +e.g. a lower core clk you can end up running the mem clk unnecessarily +high and vice versa. In practice, this means you need to find the +correct balance between core and mem clk for max efficiency. + +Due to the power hog profile where e.g. a Vega64 LC at 1407@900 core +clk, 1107@900 mem clk easily can pull 250W, we cannot stress this +enough: DO NOT START THE MINER ON A FULL RIG AT "NORMAL" CLOCKS unless +you know your PSU can handle a very high load. Run it on 1-2 gpus +first using -d 0,1 and observe, or lower your clocks and work your way +upwards instead when tuning. + +Tuning +====== +For Polaris (470-580s) cards, a typical configuration would be a core +clk around 1200 MHz. Then, start with a mem clk around 1900-2000 MHz +and lower it in steps of 50 MHz and measure the hashrate for each +change. The first steps should have a very small impact on the +hashrate, but at some point the mem clk will have a more significant +impact. This point varies a lot between gpus, you have to find your +own. After finding your hard limit for the mem clk, adjust it upwards +a little bit to a hashrate of your liking given the power draw. Last, +dial down both core and mem voltage as much as possible to save power. + +For Vegas, as stated above this algo runs HOT. To improve efficiency, +the key is to lower your mem clk significantly, to 800 MHz or even 500 +MHz. You should also try to force the mem power state to P2 or P1 +since this will lower the soc clk, resulting in lower power draw as +well. 
+ +In a full throttle configuration on my Vega 64 LC using e.g. 1400 cclk +at 860 mV, 800 memclk, I can push it to 3.5 MH/s. This will be a 250W+ +power draw. This can absolutely be a net win in the end, but you need +to check your numbers and account for your power costs. + +For a full Vega rig, a more reasonable approach is to clock down mem +to 500 MHz in mem p-state 1 and the core clk to anything between +1180-1250 MHz. Start with the core clk in the low range and set +voltages to 800 mV (for both core and mem), then increase core clk to +see how far you can push it while remaining at 800mV. This will lead +to 2.8-2.9 MH/s for a Vega 64 and 2.6-2.7 (maybe) MH/s for Vega 56s, +still pulling around 200W per gpu. + +Nicehash Mining +=============== +TRM supports Nicehash mining for MTP out of the box. However, there is +a confirmed bug with Nicehash for ntime rolling which they are aware +of. Meanwhile TRM will automatically apply the option +"--no_ntime_roll" whenever Nicehash mining is detected. + +Dev Fee Switching +================= +In all other TRM algos, we run the dev fee hashes with a very fine +granularity, many times truly concurrently with the user's work. This +means that there are no interruptions for the user, a smooth poolside +hashrate and zero downtime for pool switching (which can have a very +negative impact with pools that penalize pool hoppers). + +For MTP, this approach is not possible. The user's and dev's work will +never use the same 4GB scratchpad. Therefore, MTP must be implemented +using a more traditional switching mechanism. We've worked actively to +reduce the impact for the user with this less sophisticated approach: + +- Heavily optimized code for the pad rebuild, fully done on gpu + (1.6-1.9 secs on both Polaris and Vega GPUs). + +- We're flexible with when we switch, always trying to switch when a + new user job arrives which means that the pad would have to be + rebuilt anyway. + +- We remove 0.1% from the listed dev fee, i.e. 
devs take the hit for + the extra pad rebuild, not the user. + +Last, we don't log explicitly when we switch to dev hashing. That +said, we do not switch often, and the user should indeed see a +poolside hashrate of 97.5% of the average hashrate over time. Another +difference compared to other miners is that different GPUs are not +necessarily switched at the same time. However, please note that there +will always be a single early dev fee switch on all GPUs to avoid easy +circumvention of the dev fee by restarting the miner repeatedly. This +means that the (very) early poolside hashrate for the user might be +lower than expected. diff --git a/doc/NIMIQ_MINING.txt b/doc/NIMIQ_MINING.txt new file mode 100644 index 0000000..b8f1699 --- /dev/null +++ b/doc/NIMIQ_MINING.txt @@ -0,0 +1,48 @@ +Team Red Miner Nimiq Mining +=========================== +This document provides some quick pointers on how to best test and +tune for the Nimiq argon2 variant. + + +WALLETS IN MINING OS CONFIGS +---------------------------- +IMPORTANT: in both Hive and SMOS, the Nimiq wallet needs to be passed +without spaces, i.e. "NQ69ABCD...." instead of the user-friendly form +"NQ69 ABCD ....". + + +Pools +----- +TRM only supports Nimiq dumb mode mining, and not over secure +websockets but plain tcp. This is not a mode supported by +pools. Therefore, TRM runs a separate process as a network proxy +converting the plain tcp traffic into secure websockets messages. + +The proxy is now integrated into the miner and executed automatically +unless you disable it using --nimiq_no_proxy as a command line +argument. There are also arguments for controlling the local port and +the path to the proxy binary. See --help or USAGE.txt for more info. + +The bundled binaries are built from the source found here: + +https://github.com/Kerney666/trm_nimiq_proxy + +You can also run the proxy outside of the miner. 
Follow the +instructions in the proxy github, pass --nimiq_no_proxy as argument to +the miner and pass the IP+port where the proxy is listening for the +stratum pool argument. + +Use the --nimiq_worker=myrig argument to set a device name for your +workers that is passed to the pool(s). + + +Tuning +------ +Nimiq tuning is more or less identical to Turtlecoin Chukwa +tuning. See our separate tuning guide TRTL_CHUKWA_MINING.txt. + + +Nimiq Tuning Parameters +----------------------- +There are no tuning parameters for Nimiq, it just runs. Play with your +clocks to adjust behavior and power draw. diff --git a/doc/TRTL_CHUKWA_MINING.txt b/doc/TRTL_CHUKWA_MINING.txt new file mode 100644 index 0000000..c11557d --- /dev/null +++ b/doc/TRTL_CHUKWA_MINING.txt @@ -0,0 +1,71 @@ +Team Red Miner Turtlecoin Chukwa-512 Mining +=========================================== +This document describes how to tune for the Argon2id-512 algo used by +Turtlecoin. + + +2GB/4GB Cards +============= +The algo allocates ~1.2GB of ram and should run on 550 2GB gpus and +up. At the time of writing, we have not done any tests on <= 4GB gpus +though. + + +Characteristics +=============== +Chukwa-512 runs quite lean. Mem clock and timings are _everything_. For +max hashrate you actually _need_ to downclock the core clk to find +your sweet spot. + + +Polaris Tuning +============== +For Polaris (470-580s) cards, crank up your mem clock as much as +possible and use optimized straps. We have mostly tested using CN +optimized straps which seem to work fine. + +After maxing the mem clock, proceed by lowering your core clock. You +should see your hashrate fluctuate slightly. My sweet spot during +testing was around 1050 MHz, yours might be different. + +Next, proceed to undervolt as much as possible while keeping your rig +stable. + +Last, using the rx boost (--ref) tweak with the AMD Mem Tweak Tool by +eliovp can give you a boost as well. 
+ + +Vega Tuning (56/64/VII) +======================= +For Vega 56/64, we recommend flashing ref V56s with the corresponding +V64 bioses. Mem latency is important for the Vegas for Chukwa. Modding +mem timings using the AMD Mem Tweak Tool by eliovp will help a lot for +Vegas. + +The typical configuration for a Vega 56/64 is simple: max your mem +clock, pull down your core clock to e.g. 1137 MHz, undervolt as much +as possible. Also try to find your sweet spot for the core clk, it +could boost the hashrate somewhat. + +Next, if you want to maximize your hashrate, you need to apply mem +timings. We recommend running with stock timings to make sure the rig +is stable first though. Then, try either our recommended timings in +CN_MAX_YOUR_VEGA.txt in order Lucky, Weaker, Conceal (for V64s). Or, +use your own timings. Personally, I run with the leaner Conceal on my +V64s. + +For Radeon VII, this is one of the few algos where a maxed out mem +clock at 1200 MHz actually provides a better hashrate. Cooling is +often the bottleneck for VIIs. Play with your clocks until you find a +reasonable hashrate vs power draw vs cooling setup. Modding timings +can also help for the VIIs, though that is not very common. + +Vega clocks and hashrates examples +
+Type        CoreClk   MemClk    mV      Stock Mem  Modded Mem
+----------  --------  --------  ------  ---------  ------------------------
+V56 Hynix   1137 MHz   945 MHz  875 mV  101 kh/s   110 kh/s (Lucky)
+V56@64 Ref  1137 MHz  1075 MHz  900 mV  110 kh/s   120 kh/s (Conceal)
+V64 LC      1137 MHz  1107 MHz  900 mV  114 kh/s   123 kh/s (Conceal)
+VII         1351 MHz  1150 MHz  831 mV  222 kh/s   225 kh/s (small patches)
+VII         1400 MHz  1200 MHz  831 mV  231 kh/s   234 kh/s (small patches)
diff --git a/doc/USAGE.txt b/doc/USAGE.txt new file mode 100644 index 0000000..01f81ba --- /dev/null +++ b/doc/USAGE.txt @@ -0,0 +1,229 @@ + Team Red Miner version 0.7.9 +Usage: teamredminer [OPTIONS] +Options: + -a, --algo=ALGORITHM Selects the mining algorithm. 
+                              Currently available:
+                                  ethash           (eth, etc, etp, others)
+                                  kawpow           (ravencoin)
+                                  lyra2z
+                                  phi2             (lux, argoneum)
+                                  lyra2rev3        (vtc)
+                                  x16r             (rvn original)
+                                  x16rv2           (rvn)
+                                  x16s             (pgn, xsh)
+                                  x16rt            (veil, gin)
+                                  mtp              (zcoin)
+                                  cuckatoo31_grin  (grin)
+                                  cuckarood29_grin (grin)
+                                  cnv8
+                                  cnr              (monero)
+                                  cnv8_half        (stellite, masari)
+                                  cnv8_dbl         (x-cash)
+                                  cnv8_rwz         (graft)
+                                  cnv8_trtl        (old turtlecoin, loki)
+                                  cnv8_upx2        (uplexa)
+                                  cn_heavy         (classic CN heavy)
+                                  cn_haven         (haven)
+                                  cn_saber         (bittube)
+                                  cn_conceal       (conceal)
+                                  trtl_chukwa      (turtlecoin)
+                                  nimiq            (nimiq)
+
+  -h, --help                  Display this help message and exit.
+      --debug                 Enables debug log output.
+      --disable_colors        Disables console output colors.
+      --force_colors          Forces console color output even if the terminal does not seem to support it.
+      --api_listen=IP:PORT    Enables the api. IP:PORT is optional. If present, the IP:PORT combo decides
+                              the interface(s) and port to listen to. Default is 127.0.0.1:4028. For
+                              external access, use e.g. 0.0.0.0:4028. It's also valid to only specify the
+                              port, e.g. 4029.
+      --log_file=FILENAME     Enables logging of miner output into the file specified by FILENAME.
+  -l, --log_interval=SEC      Set the time interval in seconds for averaging and printing GPU hashrates.
+                              SEC sets the interval in seconds, and must be > 0.
+
+Pool config options:
+  -o, --url=URL               Sets the pool URL. Currently stratum+tcp and stratum+ssl URLs are supported.
+                              Each additional use of this option starts a new pool config.
+                              Per-pool options (such as -u, -p) will need to be explicitly specified
+                              again for each new pool. (See the example start_multipool.sh/bat file)
+                              The multi-pool strategy for the miner is set with the --pool_strategy option.
+  -u, --user=USERNAME         Sets the username for pool authorization.
+  -p, --pass=PASSWORD         Sets the password for pool authorization.
+      --pool_force_ensub      Forces an extranonce subscribe request for supported pools unknown to the miner.
+      --pool_no_ensub         Prevents the miner from sending an extranonce subscribe request to the pool.
+      --pool_broken_rpc       Tells the miner to only allow a single outstanding rpc request on the pool
+                              connection. This is a work-around for pools that violate the json rpc
+                              specification regarding rpc IDs.
+
+Global pool options:
+      --pool_connect_TO=SEC   Set the time-out for attempting to connect to a pool. SEC is the time to wait in
+                              seconds. Default is 10.
+      --pool_rpc_TO=SEC       Set how long the miner will wait for an unanswered RPC to the pool. After this
+                              time, the miner will reconnect to the pool. SEC is the time to wait in seconds.
+                              Default is 60.
+      --pool_max_rejects=N    If a pool rejects N shares in a row, the pool connection is reset. This is to protect
+                              against pools that invalidate mining sessions without disconnecting the user.
+                              Default value is 5.
+      --pool_strategy=STRAT   Sets the strategy for selecting pools when running with multiple pools. The available
+                              values are: priority, load_balance, and quota. The default is priority.
+                                  priority: The miner will use pools in the order they are listed, only moving on
+                                      to the next pool if the previous cannot establish a connection.
+                                  load_balance: The miner will evenly balance the hashrate across all currently
+                                      connected pools.
+                                  quota: The miner will evenly balance the total hashes completed across
+                                      all pools. If a pool disconnects and later reconnects, the miner will move
+                                      hashrate to the pool until the total hashes for each pool are balanced.
+      --no_ntime_roll         Prevents the miner from rolling ntime in the block header, only using the value
+                              provided by the pool. This needs to be enabled for some pools when mining x16rt.
+
+GPU options:
+      --platform=INDEX        Sets the index of the OpenCL platform to use. If not specified, platform will
+                              be auto-detected. On Linux, multiple platforms are only supported by omitting
+                              the --platform arg and adding --bus_reorder instead.
+  -d, --devices=DEVLIST       Sets gpu devices to use from detected list.
+                              DEVLIST should be a comma-separated list of device indices, e.g. -d 0,1,2,4. If not
+                              specified, all devices on the platform(s) will be used. NOTE: by default the devices
+                              are ordered by pcie bus ordering. Use --list_devices to show indices.
+      --init_style=1/2/3      Specifies the init style (1 is default):
+                                  1: One gpu at a time, complete all before mining.
+                                  2: Three gpus at a time, complete all before mining.
+                                  3: All gpus in parallel, start mining immediately.
+      --pcie_fmt=FORMAT       Sets print format for pcie bus numbers. The accepted values for FORMAT are
+                              either 'hex' or 'dec'. The default is dec for windows and hex for linux.
+      --bus_reorder           Reorders the detected or specified devices by their pcie bus id. If no
+                              platform is specified, devices will be collected from all detected AMD OpenCL
+                              platforms. Note: As of version v0.7.0 this is the default behavior.
+      --opencl_order          Orders the detected or specified devices in the order OpenCL presents them.
+      --list_devices          Lists the available devices for the detected or specified platform and exits
+                              immediately. Bus reordering is applied to the displayed order.
+
+Watchdog options:
+      --no_gpu_monitor        Disables the ADL (Windows) or sysfs (Linux) GPU monitor for temperature and
+                              fan speed.
+      --temp_limit=TEMP       Sets the temperature at which the miner will stop GPUs that are too hot.
+                              Default is 85C.
+      --temp_resume=TEMP      Sets the temperature below which the miner will resume GPUs that were previously
+                              stopped due to temperature exceeding limit. Default is 60C.
+      --watchdog_script(=X)   Configures the gpu watchdog to shut down the miner and run the specified script
+                              when a dead gpu is detected. The default script is watchdog.bat/watchdog.sh in the
+                              current directory, but a different script can be provided as an optional argument,
+                              potentially with an absolute or relative path as well.
+      --watchdog_test         Tests the configured watchdog script by triggering the same action as a dead gpu
+                              after ~20 secs of mining.
+      --watchdog_disabled     Forces the watchdog to not execute. Can be used to disable the watchdog in mining OSes
+                              that always run with the watchdog enabled.
+
+Ethash options:
+      --eth_config=CONFIG     Manual ethash configuration for the miner. CONFIG must be in the form [M][L].
+                              The [M] value selects the mode which can be either 'A' or 'B'.
+                              The 'B' mode uses additional memory and will only work on 8+GB cards.
+                              The [L] value selects the intensity and its range will depend on the GPU architecture.
+                              Both values are optional, but if [L] is specified, [M] must also be specified.
+                              Example configs: --eth_config=A
+                                               --eth_config=B750
+                              CONFIG can also be a comma separated list of config values where each value is
+                              applied to the corresponding GPU. For example: --eth_config=A,B750,,A288
+                              Any gpu that does not have a specific config in the list will use the first
+                              config in the list.
+      --eth_aggr_mode         Enables automatic use of the 'B' mode for all Polaris 8GB cards, unless they have a
+                              different config provided by the --eth_config argument. This is the same thing as
+                              manually setting all Polaris 8GB gpus in the rig to 'B' mode using --eth_config.
+                              For most gpus, this adds 0.1-0.2 MH/s of hashrate. NOTE: 20-25% of rigs become less
+                              stable in this mode which is the reason it isn't the default mode. If you experience
+                              dead gpus, you should remove this argument and run the gpus in the 'A' mode. Moreover,
+                              this option will stop working when the DAG approaches 4GB.
+      --eth_stratum_mode=MODE Sets a fixed stratum mode for ethash pools. By default the miner will attempt
+                              to automatically determine the type of stratum the pool supports and use that mode.
+                              This automatic detection can be overridden by specifying this option. The MODE can be
+                              set to one of the following options: stratum, nicehash, ethproxy.
+      --eth_worker            Sets the worker id that will be sent to the pool. This only applies to pools with
+                              ethproxy stratum mode.
+      --eth_epoch             Tests a specific ethash epoch.
+                              NOTE: you still need to provide a pool as if you were mining, but no shares will be submitted. Simulated mining only.
+
+Progpow options:
+      --prog_config=CONFIG    Manual progpow configuration for the miner. CONFIG must be in the form [M][L].
+                              The [M] value selects the mode which can be either 'A' or 'B'.
+                              The 'B' mode typically results in better performance but is only available for
+                              Vega on linux and Navi (linux or windows).
+                              The [L] value selects the intensity and its range will depend on the GPU architecture.
+                              Both values are optional, but if [L] is specified, [M] must also be specified.
+                              Example configs: --prog_config=A
+                                               --prog_config=B750
+                              CONFIG can also be a comma separated list of config values where each value is
+                              applied to the corresponding GPU. For example: --prog_config=A,B750,,A288
+                              Any gpu that does not have a specific config in the list will use the first config.
+      --prog_height=VALUE     Sets a fixed block height for progpow algorithms for benchmarking purposes.
+                              Note that using this option needs a pool connection but will not submit shares.
+      --prog_strict           Forces the miner to always generate strictly accurate kernels. By default the miner
+                              will generate relaxed kernels that use less computation power but can result in
+                              occasional invalid shares.
+
+Cryptonight options:
+      --rig_id                Set the rig identifier that will be sent to the pool. This is only used for
+                              cryptonight pools.
+      --cn_config=CONFIG      Manual cryptonight configuration for the miner. CONFIG must be in the form
+                              [P][I0][M][I1][:xyz], where [P] is an optional prefix and [:xyz] is an
+                              optional suffix. For [P], only the value of 'L' is supported for low-end
+                              GPUs like Lexa/Baffin. [I0] and [I1] are the thread intensity values normally
+                              ranging from 1 to 16, but larger values are possible for 16GB gpus. [M] is the
+                              mode which can be either '.', '-', '+' or '*'. Mode '.' means that the miner
+                              should choose or scan for the best mode.
+                              Mode '*' is both a good default mode and _should_ be used if you mine on a Vega 56/64
+                              with modded mem timings. The exceptions to this rule are the small pad variants
+                              (cnv8_trtl and cnv8_upx2); they should still use '+'. For Polaris gpus, only the
+                              '-' and '+' modes are available.
+
+                              NOTE: in TRM 0.5.0 auto-tuning functionality was added, making manual configuration
+                              of the CN config modes unnecessary except for rare corner cases. For more info,
+                              see the tuning docs and how-to documents bundled with the release.
+
+                              Example configs: --cn_config=15*15:AAA
+                                               --cn_config=14-14
+                                               --cn_config=L4+3
+                              CONFIG can also be a comma separated list of config values where each value is
+                              applied to the corresponding GPU. For example: --cn_config=8-8,16+14:CBB,15*15,14-14
+                              Any gpu that does not have a specific config in the list will use the first
+                              config in the list.
+      --no_cpu_check          Disables cpu verification of found shares before they are submitted to the pool.
+                              Note: only CN algos currently support cpu verification.
+      --no_lean               Disables the CN lean mode where threads ramp up slowly on start or restart after
+                              network issues or gpu temp throttling.
+      --no_interleave=DEVS    Lists gpu devices where CN thread interleave logic should not be used.
+                              The argument is a comma-separated list of devices like for the -d option.
+                              Use this argument if some device(s) get a worse hashrate together with a lot
+                              of interleave adjust log messages.
+      --alloc_patch=DEVS      Lists gpu devices that lose hashrate between TRM v0.4.5 and later versions. With this
+                              argument a simpler mem allocation strategy is used, and the old (higher) hashrate should
+                              be restored. Auto-tuning mode can still be used.
+      --auto_tune=MODE        Enable the auto-tune mode upon startup. Only available for CN variants. MODE must
+                              be either NONE, QUICK or SCAN. The QUICK mode checks a few known good configurations
+                              and completes within 1 min. The SCAN mode will check all possible combos and will
+                              run for 20-30 mins.
+                              Setting MODE to NONE disables the auto-tune feature. The default mode is QUICK.
+      --auto_tune_runs(=N)    Executes multiple runs for the auto tune, each time decreasing the number of pads used
+                              by 1 in one of the threads (15+15 -> 15+14 -> 14+14 -> 14+13 -> ...). You can specify the
+                              explicit nr of runs or let the miner choose a default value per gpu type (typically 3-4).
+      --auto_tune_exit        If present, the miner will exit after completing the auto-tuning process. This is helpful
+                              when you want to scan for optimal settings and then use the resulting command line arg
+                              printed by the miner.
+      --allow_large_alloc     If present, and when the driver indicates there is enough GPU vram available, the miner
+                              will be more aggressive with the initial memory allocation. In practice, this option
+                              means that Vega GPUs under Linux will start the auto-tuning process at 16*15 rather
+                              than 16*14 or 15*15.
+
+MTP options:
+      --allow_all_devices     Some algos can't be mined on e.g. 4GB gpus. Those gpus will be disabled automatically
+                              by the miner. This argument overrides this logic and allows mining on all specified
+                              or detected devices.
+
+X16* options:
+      --hash_order=VALUE      Sets a fixed hash order for algorithms like x16r for benchmarking purposes.
+                              Note that using this option needs a pool connection but will not submit shares.
+                              The standard benchmark hash order for x16r is --hash_order=0123456789abcdef.
+Nimiq options:
+      --nimiq_worker=VALUE    Sets the worker/device name for nimiq to pass to the pool(s).
+      --nimiq_no_proxy        Disables the automatic Nimiq proxy executed as a separate process. This means that the
+                              host and port passed to the miner must be pointing to a proxy.
+      --nimiq_proxy=VALUE     Overrides the default path to the Nimiq proxy. The defaults are trm_nimiq_proxy-win.exe
+                              and trm_nimiq_proxy-linux in the current miner directory.
+      --nimiq_port=VALUE      Overrides the default local port (4444) used for the Nimiq proxy.
+                              This can be used if your system is already using port 4444 for some other tcp/ip service.