About removal of classical evaluation. #4678
Replies: 9 comments 14 replies
-
I am curious about the implications for the whole information loop in training the NNUE master network, because it seems to depart from the SF12 blog post (the only one I know of; I would be glad to be pointed elsewhere with a link), which made clear that the classical evaluation of SF at that version was the "oracle" the network was trained to approximate over a large position set. I do not think I have missed a blog post, and I have not seen the SF12 explanation amended, yet with SF16 some new language minimally referring to Leela's data appeared. My working hypothesis is that, at minimum, the positions of the training games are being used. But there are other ways to use Leela's data that would amount to a structural change in the information flow of SF's training design, and telling them apart requires a bit more information than how fast the implementations have become. It is a matter of user interpretability of the tool, and probably easy to state for those who know. It need not be the prison that documentation-beyond-source-code seems to be in the developer imagination of open-source collaborative projects (if that is the resistance). I know chess likes to trade in secrets and auras of expertise, but we are talking about programmable things; someone ought to have the perspective to answer. If more than the position database ground out of the training games is used, and whole-game outcome information is kept as part of training the master network, then: what kind of machine-learning procedure is used? How are the game outcomes integrated? Could the standalone repository on WDL conversions from one of the SF developers have something to do with it, and how?
But the simplest question is: does the SF12 blog explanation still stand, that the information for network training flows out of classical searches? I am writing here because I would like the OP to please provide links to wherever this decision was discussed, since it does not seem compatible with the SF12 blog, and there has been no blog announcement modifying it (apologies if I missed a paragraph). Source code is not user documentation, and it may be exasperation about that tendency that makes me sound a bit tense. But really, I am deeply interested in how training is done, more than in how fast the executable performs, and I think the documentation could be more transparent about it; it does not take developer inside knowledge to follow such a description. This has nothing directly to do with quantization or with feature reduction at the input; it could be shared at the same minimal disclosure level as the SF12 blog. I would applaud such an effort to document something so important for interpretation by serious users, and by all Lichess users, for example.
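On the question of how game outcomes could be integrated with search evaluations, a common pattern described for NNUE-style trainers is to convert the search score to an expected score and interpolate it with the game result. The sketch below is only that pattern in miniature: the logistic `scale` constant and the interpolation weight `lam` are illustrative assumptions, not the actual values or model Stockfish uses.

```python
import math

def cp_to_expected_score(cp, scale=400.0):
    """Map a centipawn score to an expected score in [0, 1] via a
    logistic curve. `scale` is an illustrative constant, not the
    exact WDL model any engine uses."""
    return 1.0 / (1.0 + math.exp(-cp / scale))

def training_target(search_cp, game_result, lam=0.7):
    """Blend the search evaluation with the game outcome.
    game_result: 1.0 win, 0.5 draw, 0.0 loss, from the side to move's view.
    lam is a hypothetical interpolation weight: lam=1.0 would train purely
    on the oracle search score, lam=0.0 purely on the game outcome."""
    return lam * cp_to_expected_score(search_cp) + (1.0 - lam) * game_result

# A drawn-looking position (eval 0) from a game that was eventually won:
target = training_target(0.0, 1.0, lam=0.7)  # pulled above 0.5 by the win
```

This is one concrete way a "WDL conversion" repository could plug into a training pipeline: it supplies the map between score space and outcome space so the two target sources become commensurable.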
-
Although I am a newcomer here, from a chess programmer's point of view we can't remove the classical eval until at least 90% of chess positions have been trained on. Maybe it would be better if this removal were done in Stockfish 20.
-
First, I have to see whether this problem will be solved by structural changes in network training or not. Recently, I opened a discussion asking what the difference is between a network that is trained with @cj5716 Can you help me in this context?
-
90%, measured in distinct positions? Someone already mentioned the exponential combinatorial "radiation", the compounded divergence over game depths that are still within tournament-constraint range. It does not even take depth to get that growth; it can go "dense" in breadth, though that raises the question of whether it is only a branching-degree matter. Since no engine uses any position-based "extrinsic" metric, maybe there is room here for more enthusiasm about what "90% of positions" would even mean, and about poking at the conceptual bubble of the Elo pool. Not that the Elo pool would disappear in a puff of new logic: it could still be characterized. As long as all the games over history remain retrievable, and the engines that produced them are kept available in well-curated pool data, their behavior could be characterized in position-based, external ways, other than through competitive pairing odds. I know, I know: pairing odds have been the optimization objective for ages and ages. But what if we could characterize behavior without evaluating game odds? Light tasks might show the way toward measuring not from win data, but from basic response-function characteristics over many positions. We might not even need engine pairings: just plenty of single-root searches, and plenty of leaf-evaluation calls over arbitrary positions.
Such positions would not even need to have occurred in real games from existing engine pools (the evolutionary environment's dynamic constraint), provided there is good coverage of all possible pairings. Do we need such a measure already packaged before we can wonder what it would need to be able to do? Should the experimental design use Leela's data, or engine evals from some stage of the training feedback loop? I never got a full picture of that data-flow scheme, and I did not keep looking hard (see my previous posts); at some point, previous struggles leave a preemptive reluctance to spend more energy for the same expected result. That can become its own loop of ignorance, but one has a limited digging budget. So do not find me flippant in my apparent laziness to go read something that might be refreshing and finally informative about the basic machine learning, which is likely explainable in basic linear algebra with chess-land interpretations of the variables, rather than in a programming language or some pseudocode version. One could even skip the weight-update details and lump that step into whole-function-space (weight-space) updates driven by data vectors (vectors of vectors: the training step adds an index on top of the already multidimensional position and leaf-evaluation problem). OK, off topic, but not by much: the evaluation function's domain, even with a reduced feature set, is where the "90%" notion lives. Do not dismiss that question; it is the beginning of chess-land asking dev-land for something that makes chess sense, I hope. Also, if I am babbling in vain and someone has already answered the above, could I get some not-too-vague links (some kind anchors)? Feel free to chop my poetry into chunks and pair each with a one-liner of the appropriate doc link. That was a huge parenthesis. I like the word "agnostic" to mean: not obsessed by evaluation too prematurely.
That way we do not get trapped in the representativeness relation between training set and test set, when the actual chess world is expected to be bigger than the still-uncharacterized maximal position space visited by tournament games (I would love to be shown this is old, tired ignorance). For scale, some hope: the number of stars in the universe is the usual astronomical order-of-magnitude analogy, but look closer, at fluids; maybe the number of molecules in the ocean. Think about it: volume, measure, distance-based metrics all mean the same thing to me here. We would need to be many, and to drink some humility potion, which is not normal behavior in orbit of chess culture (or of any stressed-resources competition system among many people). I might try again in a year; call me the deluded hopeful. By sticking to my assessments above I am hoping to jinx the worst-case scenarios of my ignorance (not for lack of previous trying), and I would like to be knocked off those assessments. My current working hypothesis: we do not need dynamics to characterize position space. The best starting point might be the work from A0 and LC0, whose basic input-layer vectors (possibly still undisclosed as mathematics rather than as a programming language) could be translated directly back into chess-ruleset layers with a little linear algebra, sets, and relations; function spaces might come in handy for the equation intricacies. Anyone touch-typing LaTeX around?
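The post's suggestion of characterizing engines through position-based response functions rather than pairing odds can be made concrete with a toy sketch. Everything here is hypothetical: `eval_a` and `eval_b` are stand-ins for two engines' static evaluations, the "positions" are random material vectors, and the 25-centipawn agreement threshold is arbitrary. The point is only the shape of the idea: a pairing-free, per-position behavioral comparison.

```python
import random

random.seed(0)

def eval_a(pos):
    # Toy stand-in for one engine's static evaluation, in centipawns.
    return 10.0 * sum(pos)

def eval_b(pos):
    # Toy stand-in for a second engine: same signal plus disagreement noise.
    return 10.0 * sum(pos) + random.uniform(-50, 50)

def agreement_rate(positions, f, g, threshold=25.0):
    """Fraction of positions where two evaluations differ by less than
    `threshold` centipawns: a behavioral metric that needs no games,
    only many single evaluations over a shared position set."""
    close = sum(1 for p in positions if abs(f(p) - g(p)) < threshold)
    return close / len(positions)

# Hypothetical "positions": random 8-component material vectors.
sample = [[random.randint(-3, 3) for _ in range(8)] for _ in range(1000)]
rate = agreement_rate(sample, eval_a, eval_b)
```

The same scheme extends to single-root searches at fixed depth instead of static evals, which is exactly the "plenty of single-root searches" measurement proposed above.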
-
SF would also gain from being clear about its information loop of training from itself (being concise and prudent here, since apparently that job has not been done, or has not been put forward ahead of the chess-priority questions above). Some users may still be relying on past engine-culture communication about leaf evaluations transferring chess theory (from which school? not important anymore; just bear with my hopeful rambling). I have only caught wind of the real picture through sparse fly-by readings in issues, really tangential ones, as I do not have the energy to dig, and I thought I would not try again here. It takes a lot to get the answers I am looking for out of the obscurity of source code, which seems to have its developers convinced, by some perception magic spell, that it can serve as user documentation. (I know there is a wiki; I dare not dig into its NNUE part, for fear of more implementation-cost-saving feats and of perpetuated loud silence on the big picture, which does not need that part.) ML is modular at that level: one can leave the implementation feats black-boxed, as long as the chess-in and black-box-out are explicit, together with the maximal data world that the training and test portions should together represent. See, I can get back to the OP here. It may sound like rambling, but about the OP's question (or maybe it was another comment) on "90% of positions": I get the intent, but it is not a well-posed machine-learning problem. Meanwhile SF devs have clinked, clanked, and clunked for the past years making NNUE tango with the good old exhaustive-search austerity-budget philosophy, lean and mean, and maybe something about the intent of the 90% question. I do not really mind the aggressive pruning; I have noticed that SF has been seriously investing in leaf-evaluation design, and who would have known that stronger leaf evaluations would end up saving on search nodes?
Some might still think it is the aggressive pruning, enabled by the better eval; but there is no way to tell whether higher or lower leaf-eval confidence is what has reduced the search breadth (I think I read that it has gone down). That is the hybrid program. In all the past discussion, though, a complete formulation of the "90% of positions" question has been absent; no wonder we do not even know how to make one (no offense, I share the concern; I have been thinking hard about this for a few years as the resident Lichess musing clown, the author of the weird ramblings there, in case anyone wondered; same rambler, I come here maybe once a year, or was it less this time). One needs to look a bit outside the pools, which are the maximal conceptual cavity of any chess, human or engine, and one needs to untangle position-set hypervolume measures from evaluation "entanglement"; that is my current muse, or delusion. I do not think going back to classical would be wise: from a global-optimization and function-space point of view, it was a mess. I do hope SF could spend some of its time helping those who seem talented at extracting higher-level concepts from the source-code depths. I noticed in some passage, maybe a month or two ago (I am bad with time), TODO questions inviting other collaborators to stop by and lift the fog. The prevailing attitude can read as: "if it is working, why do you care? We make racing machines; nobody told us we would have to babysit human chess users overusing these machines on positions in vain. They are meant to win games in engine pools that compete in each tournament pool." Does that last paragraph feel outrageous, caricatural, exaggerated, a tad theatrical, or does it sum up some behind-the-scenes common sense?
I do not wish to disparage all the valiant efforts on the only clear optimization parameter that has been there for a long time, and that therefore should or will stay, on the principle of not breaking something that "works" (OK, I am getting old; that is not actually thinking). But I see no need to mystify machine-learning basics that could be shared without the confusing terminology of how they happen to have been implemented. I also understand that computers are not math-friendly, and that nothing on the web is direct mathematical communication: web tech is not continuous penmanship, and it does not encode the spatial efficiency of mathematical notation that keeps such communication in sync with its natural-language narration, motivation, illustration, and scaffolding. Hammering a keyboard for math is a real pain. But if I knew the big picture from having worked deep in the code, I would take the time to contact back those who have rambled hard for such a modular chess-world bridge view of the major new direction SF has been taking. I would stop the minimalist drip of information about how the training works, and just work with those who can build the models that are missing from the documentation although implemented in the code itself. The code is the land of exhaustive-tree-search recursive magic; but the basic concepts of machine learning come from standalone, more legible, non-code-entangled mathematics, which is cleaner to communicate to chess users, who are likely visual-spatial thinkers, not string-encoding virtuosi. OK, I might have gotten on some nerves, or made a fool of myself; but I speak from long brewing, usually too Sisyphean to share on the internet. How many times now, coming to the land of the too busy? I think spaced iterations over months, rather than direct conversation, might still let some sharing percolate, even if in rejection: to reject or dismiss, one must have looked at least a bit.
Till next storm, if I did not break anything.
-
Q: is the NNUE feature set evolving across versions? And what is the thinking behind that evolution? Answering probably needs the previously requested modular, high-level, chess-world-shareable mathematical model of the data vectors (matrices) and of the loops through which chess information goes in and comes out, and through which some progress of learning AND generalization happens. If there were a serious effort to take the missing backward step from implementation to mathematics, reaching the chess-interpretable world and its users, maybe I would find the energy to go LaTeX again; I cannot be the crazy one forever. This is the last missing idea I had intended for the OP's title question. I am still pending a serious communication effort on the basic machine-learning information loop (and I can say it would help development not to be a fully exhaustive global parameter-space search, though I may be preaching to the choir by now; just let me know if I am babbling in vain and everything I am asking for is already linkable). I have seen past confusing README files use the phrase "reinforcement learning" as a natural-language intuition, probably a naive discovery-by-implementation way of thinking about some feedback loop: a simple evaluation driving the core search engine at moderate depth to populate the target data vectors used in training on "Leela's" data. That is what blog-level reading crumbs give me; anything better? So, about the question of classical: it would not hurt future development if SF extracted the higher-level understanding, expressed in generic mathematical formulation above programming jargon and with fewer computery symbols, to reach back to chess-land. That needs cooperation across expertise, which means agreeing on the questions and the needs. And do not think the mess in classical eval was total: it was a mess in its use of a single scalar score.
Classical did not have to be strong, since the fully exhaustive program had not yet encountered the other species of engine, with the opposite bet: a small search tree and strong, costly leaf evals. Classical was constructed one isolated "feature" at a time, a bit like chess is taught to humans; but humans can tweak with intuition and repair the spurious exaggerations that one-feature-at-a-time hindsight, spread over a long duration and over single lines of plan ideas (with deep mainlines too), tends to produce, whereas engines cannot: they do not reprogram themselves when a position is new and has never been visited in any dev engine pool or any tournament pool (the 90% question again). So the hope here, for human chess concepts and engine interpretability, lies in the feature input space of the master network being trained in some feedback loop from simple-eval search trees of moderate depth back onto the master network (if the SF12 model from GitHub is still what is in play as the training model). That feedback loop, from SF eval onto the NN leaf evaluation as a training target vector, is not reinforcement learning in the machine-learning sense; the learning is not in that open loop, it is in the definition of the chess data-vector SET. Sorry, but that big fundamental stone kept being neglected, I guess because only crazies would ramble about it. If the SF web guide were not just a web guise but part of the official competition entry requirements, made UCI-visible (and if your favorite website wrapper did not worry about your engine-belief stability), then we could already guess which features an improving development program might consider. It would still need to recognize that there is a floating question of generalization right now, rather than just hoping that new engines will happen to provide generalization challenges.
The training/testing split is only valid with respect to the maximal set that the combined set was representative of. I am giving a crash course here; I could go on and on, but I would first need to know whether there is a link to the missing nugget.
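The last point, that a train/test split only certifies generalization over the distribution both sets were drawn from, can be shown with a deliberately tiny experiment. Everything is illustrative: a linear model is fit to a quadratic relationship on one interval, then scored on a held-out sample from the same interval and on a shifted interval the split never covered.

```python
import random

random.seed(1)

def fit_line(xs, ys):
    """Ordinary least-squares fit of y = a*x + b."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    a = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
         / sum((x - mx) ** 2 for x in xs))
    return a, my - a * mx

def mse(xs, ys, a, b):
    """Mean squared error of the fitted line on (xs, ys)."""
    return sum((a * x + b - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

true_fn = lambda x: x * x  # the "real" relationship being approximated

train    = [random.uniform(0, 1) for _ in range(500)]  # training region
held_out = [random.uniform(0, 1) for _ in range(500)]  # same distribution
shifted  = [random.uniform(2, 3) for _ in range(500)]  # never represented

a, b = fit_line(train, [true_fn(x) for x in train])
err_in  = mse(held_out, [true_fn(x) for x in held_out], a, b)
err_out = mse(shifted,  [true_fn(x) for x in shifted],  a, b)
# err_in is tiny; err_out is orders of magnitude larger, even though the
# held-out "test set" gave the model a clean bill of health.
```

Substitute "positions from tournament-range games" for the interval [0, 1] and "the rest of chess's position space" for [2, 3], and this is the representativeness trap the post describes.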
-
In any case, we have already gotten a second, smaller net, something that effectively replaces the job of classical while being worth more Elo.
-
I am still looking for documentation of the training setup in terms that do not require coding skills. The "classical eval as NNUE trainer" narrative from the SF12 announcement, which split from SF11, seems buried in issue text and has not been made visible at wiki level with the same interested-user readability as, say, the search part of the code. Can someone make the outer scheme of the training clear in the documentation: the input data "vector" definition, the supervising oracle training data vector, and, if there are stages in the training that loop on previous stages of that supervising target vector to fit with generalization, please make that visible too, to user minds that are not buying the blind-oracle mysticism.
-
I think this is a big step forward to make in a rush. I'm not saying it is wrong, but I would say it could have waited a little longer, until it was proved that the classical evaluation brought no real Elo to the engine. Its maintenance was not really "harming" the engine's progress, and I have no "nostalgic" feelings about it. It seems this code would not have been removed if it had been treated as a normal "simplification test", as it should have been: I think it pragmatically failed as a simplification test by SF's standard rules and was carried out by "the Director's will" instead. (Do you understand that about 3 Elo at this level may represent +15 or even more in other times? This is due to shrinking gains at high levels of play. Did you consider this?)
Anyway, as I said, a big step forward; maybe it is reasonable, and I hope it will bring us a stronger engine in the long term. I miss the time when the focus of the SF project was +Elo. Let's hope for the best! Good luck, and good job.