scraper, project, superblock: Implement automated greylisting #2778
base: development
Conversation
Force-pushed from 715dba6 to 11cbd3b
Makes sense for automatically greylisted projects pending de-greylisting, but why are statistics collected for manually greylisted projects? The greylister could then operate only on projects with either ACTIVE or AUTO_GREYLISTED status. Also, considering the WAS and ZCD calculation is done once per day, I am not sure that bothering with caching is worth it; it might be worthwhile to make a dumb implementation first and benchmark it. Could adding a separate -projectnotify parameter help here?
Excellent question. Depending on the reasons for the manual greylist, statistics may still be available for a project. If so, they should continue to be collected, because the ZCD and WAS rules would then apply if the manual greylist status were removed. What are you thinking in terms of functionality for the -projectnotify parameter?
Good point.
Manual greylisting should instantly take effect, but should ungreylisting do so too? It would be simpler to make manual ungreylisting put the project in an auto greylist state. It will take until the next superblock for the project to become active again, but that is OK in my opinion. I'm imagining an FSM like this:
Similar to other notify commands. It should be triggered on project status changes (added to the whitelist, removed, greylisted, etc.) and should call a script with the contract hash (or the superblock hash in the case of an automatic greylist).
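A minimal sketch of how such a notify hook could look, loosely modeled on Bitcoin-style notify parameters; the function name, the %s substitution token, and the fire-and-forget threading are all assumptions for illustration, not the actual implementation:

```cpp
#include <cstdlib>
#include <string>
#include <thread>

// Hypothetical hook: called whenever a project's status changes, with the
// contract hash (or superblock hash for an automatic greylist) as argument.
void RunProjectNotify(std::string command, const std::string& hash)
{
    // Substitute the hash for the conventional %s token, if present.
    const auto pos = command.find("%s");
    if (pos != std::string::npos) {
        command.replace(pos, 2, hash);
    }

    // Run detached so block processing is not stalled by a slow script.
    std::thread([command] { std::system(command.c_str()); }).detach();
}
```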
Force-pushed from 11cbd3b to de00a19
That is a good simplification, actually.
Force-pushed from 6dbd145 to 23e6b12
Force-pushed from 23e6b12 to c254f43
Force-pushed from c501783 to 3fe15e7
Well... back to the drawing board. With "Initial implementation of AutoGreylist" I have created an auto greylist class that executes the ZCD and WAS rules. It is not wired into the whitelist or the superblock yet, but you can see the results using the getautogreylist RPC.

The idea is to use the AutoGreylist as an override in the Whitelist class, marking projects that meet the automatic greylisting criteria with the status AUTO_GREYLISTED. This would take precedence over ACTIVE or even MANUALLY_GREYLISTED. No project entry (registry) status updates are made for the automatic greylist; it is maintained as a global singleton cache that is intended to be refreshed when a new superblock is staked. So when a project comes off the auto greylist, it will revert automatically to the original status dictated by the last valid project entry status.

There is a big problem with the input to the AutoGreylist class, however. The scrapers filter the projects to select only the statistics of actively beaconed crunchers, to save processing time and network bandwidth/node memory. This means that the TC reported in the project files provided by the scrapers, for greylisted projects and indeed all projects, sums only across the active beacon holders for that project, not ALL crunchers. The AutoGreylist class uses the information in the historical superblocks, which is a reduced form of the scraper statistics, so it telescopes the same problem. Unfortunately, this causes serious issues with the current rules. For example, here is an output after the SB at 3473369 was posted on mainnet:
I haven't fully traced the WAS, but for the ZCD on asteroids I traced void AutoGreylist::RefreshWithSuperblock(SuperblockPtr superblock_ptr_in) as it updated the greylist. What became immediately apparent is that the asteroids TC for the latest superblock at the time of the run (3473369) was actually LESS than the TC for the previous SB (3472381). How can this be? It is absolutely possible: beacon holders that contributed to the TC for that project may expire between SBs and actually cause the TC summed across all active beacon holders to decline. This is clearly not what is intended by the rules. The rules in the script-based greylist checker operate on the total TC for the entire project across ALL CPIDs, whether registered beacon holders or not, and as such are much more stable. I was hoping to get away with minimal changes to the scraper plumbing and the project stats objects that are put in the manifests of the scrapers, to avoid additional points of possible problems, but it looks like I am going to have to bite the bullet and compute the total credit across ALL CPIDs in the scrapers.
This is going to cause the scraper processing load to go up some. Ugh.
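To make the rule being traced here concrete, below is a minimal sketch of the ZCD half of the check, using the "7 out of 20" reading finalized later in this PR. The types are illustrative stand-ins rather than the actual AutoGreylist interface, and the WAS rule is omitted:

```cpp
#include <cstddef>
#include <cstdint>
#include <deque>

struct ProjectSample {
    int64_t total_credit; // project TC recorded at one superblock
};

// history holds the baseline plus the lookback superblocks, oldest first.
bool MeetsZcdRule(const std::deque<ProjectSample>& history)
{
    if (history.size() < 2) return false;

    unsigned zero_credit_days = 0;
    for (std::size_t i = 1; i < history.size(); ++i) {
        const int64_t delta = history[i].total_credit - history[i - 1].total_credit;

        // A zero delta means no credit was granted between superblocks.
        // As described above, the delta can even go NEGATIVE when TC is
        // summed over active beacon holders only and beacons expire, which
        // breaks the rule's implicit assumption that project TC is monotonic.
        if (delta <= 0) ++zero_credit_days;
    }
    return zero_credit_days >= 7;
}
```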
Force-pushed from df06937 to 9331288
Also implement a corresponding -projectnotify startup parameter that provides the ability to run a command on a change to the whitelist.
This adds a boolean argument to the listprojects RPC function. The boolean defaults to false, which lists only active projects; when true, projects of every status are listed.
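Purely to illustrate the filtering this implies (hypothetical stand-in types, not the real RPC code):

```cpp
#include <string>
#include <vector>

enum class ProjectStatus { ACTIVE, MAN_GREYLISTED, AUTO_GREYLISTED };

struct Project {
    std::string name;
    ProjectStatus status;
};

// show_all defaults to false: only ACTIVE projects; true: every status.
std::vector<Project> ListProjects(const std::vector<Project>& all,
                                  bool show_all = false)
{
    std::vector<Project> out;
    for (const auto& p : all) {
        if (show_all || p.status == ProjectStatus::ACTIVE) out.push_back(p);
    }
    return out;
}
```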
This supports project status in the superblock.
This is the initial implementation of the AutoGreylist class. It also includes a diagnostic getautogreylist RPC to see the current results of the algorithm.
…(part 1) This commit adds the internal machinery to the scraper to compute the total credit across ALL cpids for each project, regardless of whether they are active beaconholders or not. This is to support auto greylisting. This commit does NOT include the network manifest changes.
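To illustrate the distinction this commit is after, a hypothetical sketch (not the actual scraper structures) of summing a project's TC over active beacon holders only versus over ALL CPIDs:

```cpp
#include <cstdint>
#include <map>
#include <set>
#include <string>

using Cpid = std::string;

// Passing nullptr for active_beacons sums over ALL CPIDs in the project
// stats (the new behavior); passing a set restricts the sum to active
// beacon holders (the old, unstable behavior).
uint64_t SumProjectTc(const std::map<Cpid, uint64_t>& tc_by_cpid,
                      const std::set<Cpid>* active_beacons)
{
    uint64_t total = 0;
    for (const auto& [cpid, tc] : tc_by_cpid) {
        if (active_beacons != nullptr && active_beacons->count(cpid) == 0) {
            continue;
        }
        total += tc;
    }
    return total;
}
```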
Force-pushed from 7acd87d to 3783458
This allows the automatic greylist to be overridden for a project if for some reason it is not functioning correctly.
Force-pushed from 380f0c3 to e9c0e11
Force-pushed from e9c0e11 to 7960dfa
Alright. I am nearing the end of the work on the automated greylisting functionality. There are two important issues that have come out of my testing that I would like some opinions on:
Thoughts?
Force-pushed from 0ec7cef to 8350062
Made small adjustments as a result of unit testing. Note that a tough-to-track-down error was occurring due to an incorrect action from the MarkAsSuperblock() call on the CBlockIndex object. It turned out to be an enum confusion caused by SUPERBLOCK being used in two different enums in the same namespace. This has been corrected by putting the MinedType enum in the GRC namespace. The rules have also been finalized to mean that 7 out of 20 ZCDs implies a lookback of 20 SBs from the current one, which is actually 21 SBs including the baseline, and correspondingly for the other intervals. Also, a grace period of 7 SBs from the baseline is applied before the rules are enforced, to allow enough statistics to be collected to be meaningful.
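For anyone who has not hit this class of bug before, an illustrative reduction (enumerator names and values are hypothetical, not the actual definitions): unscoped enums inject their enumerators into the enclosing scope, so a bare SUPERBLOCK can carry the wrong integer value into int-typed code paths, and namespacing the enum forces explicit qualification:

```cpp
// One enum claims the bare name SUPERBLOCK in the enclosing scope...
enum ContractType { UNKNOWN_CONTRACT = 0, SUPERBLOCK = 1 };

namespace GRC {
    // ...while MinedType, wrapped in its own namespace, can safely define
    // its own SUPERBLOCK with a different value.
    enum MinedType { UNKNOWN = 0, POS = 1, SUPERBLOCK = 2 };
}

int main()
{
    int a = SUPERBLOCK;      // resolves to ContractType::SUPERBLOCK == 1
    int b = GRC::SUPERBLOCK; // explicitly MinedType::SUPERBLOCK == 2
    return a == b;           // different values: the confusion fixed above
}
```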
Force-pushed from 8350062 to 8a751ab
This PR is for tracking the progress of implementing automated greylisting.
Please see the following set of notes for design considerations that need to be discussed.
Scenario 1 (project no longer qualifies for the auto greylist before the manual greylist is removed):
- block → MAN_GREYLISTED
- superblock → AUTO_GREYLISTED → status still MAN_GREYLISTED
- superblock → removed from AUTO_GREYLISTED → status still MAN_GREYLISTED
- block → removal from MAN_GREYLISTED → ACTIVE

Scenario 2 (project still qualifies for the auto greylist when the manual greylist is removed):
- block → MAN_GREYLISTED
- superblock → AUTO_GREYLISTED → status still MAN_GREYLISTED
- block → removed from MAN_GREYLISTED → AUTO_GREYLISTED
- superblock → removed from AUTO_GREYLISTED → ACTIVE
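A tiny sketch (assumed names, not the actual implementation) of the status resolution these scenarios imply: the manual greylist takes precedence while it is in force, and once lifted the project lands in AUTO_GREYLISTED if it still qualifies, returning to ACTIVE only via a superblock:

```cpp
enum class ProjectStatus { ACTIVE, MAN_GREYLISTED, AUTO_GREYLISTED };

ProjectStatus EffectiveStatus(ProjectStatus registry_status, bool auto_qualified)
{
    // The manual greylist wins for display purposes while it is in force.
    if (registry_status == ProjectStatus::MAN_GREYLISTED)
        return ProjectStatus::MAN_GREYLISTED;

    // Otherwise the auto greylist acts as an overlay on the registry status.
    return auto_qualified ? ProjectStatus::AUTO_GREYLISTED
                          : ProjectStatus::ACTIVE;
}
```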
- Have a status in the cache of something like AUTO_GREYLIST_QUAL, meaning the project has met the conditions for AUTO_GREYLIST by the rules but was already MAN_GREYLISTED
- This would be checked on each contract injection that changes MAN_GREYLIST status, to decide whether the new status is AUTO_GREYLIST or ACTIVE
- The AUTO_GREYLIST_QUAL is a flag on the project at the current (head) state
- Maybe this really belongs in the in-memory superblock structure? It does not need to be in the on-disk (chain) superblock structure, at the cost of some extra computation.
- There is an existing superblock cache (SuperblockIndex g_superblock_index) that currently stores the last three superblocks; it could be expanded to 40 superblocks as an easy way to implement the cache. This means more computation on top of the cache, but it would be much faster because it operates on in-memory structures rather than reading from the superblock registry (disk I/O). It also means more memory usage.
- Maybe it is best to make the cache a hybrid and store more limited information for superblocks 4–40 (see the sketch after these notes), but this makes the cache updating more complicated.
- The memory usage of the additional superblocks is minimal compared to the current size of other data structures, given the current active beacon count and chain length; however, when the benefactor contracts are implemented, this will no longer be true.
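A rough shape for the hybrid cache idea above, with all names and sizes assumed (the real SuperblockIndex and superblock type are not shown):

```cpp
#include <cstdint>
#include <deque>
#include <map>
#include <string>
#include <utility>

struct Superblock { /* full superblock contents, elided */ };
using ProjectTcMap = std::map<std::string, uint64_t>; // project -> summed TC

struct FullEntry {
    Superblock sb;
    ProjectTcMap reduced; // kept alongside so demotion to the tail is cheap
};

class HybridSuperblockCache {
public:
    // Called when a new superblock is staked.
    void PushNew(Superblock sb, ProjectTcMap reduced)
    {
        m_recent.push_front({std::move(sb), std::move(reduced)});
        if (m_recent.size() > 3) {
            // Demote the oldest full entry to its reduced form.
            m_tail.push_front(std::move(m_recent.back().reduced));
            m_recent.pop_back();
            if (m_tail.size() > 37) m_tail.pop_back(); // 3 full + 37 reduced = 40
        }
    }

private:
    std::deque<FullEntry> m_recent;  // last 3 superblocks, full data
    std::deque<ProjectTcMap> m_tail; // superblocks 4..40, reduced data only
};
```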