About removal of classical evaluation. #4678
Replies: 9 comments 14 replies
-
I am curious about the implications for the whole information loop in training the NNUE master network, because it seems to depart from the SF12 blog post (the only one I know of; I would be glad to be pointed elsewhere with a link), which made clear that the classical evaluation of SF at that version was the "oracle" the network was trained to approximate over a large position set. I do not think I have missed a blog post, and I have not seen the SF12 explanation amended, yet with SF16 some new language minimally referring to Leela's data appeared. My working hypothesis is that, at minimum, the positions of the training games are being used. But there are other ways to use Leela's data that would amount to a structural change in the information flow of SF's training design, and telling them apart requires a bit more information than how fast the implementations have become. It is a matter of user interpretability of the tool, and probably easy to state for those who know. It need not be the prison that documentation-beyond-source-code seems to be in the developer imagination of open-source collaborative projects (if that is the resistance). I know chess likes to trade in secrets and auras of expertise, but we are talking about programmable things; someone ought to have the perspective to answer. If more than the position database ground out of the training games is used, and whole-game outcome information is kept as part of training the master network, then: what kind of machine-learning procedure is used? How are the game outcomes integrated? Could the standalone repository on WDL conversions from one of the SF developers have something to do with it, and how?
But the simplest question is: does the SF12 blog explanation still stand, that the information for network training flows out of classical searches? I am writing here because I would like the OP to please provide links to wherever this decision was discussed, since it does not seem compatible with the SF12 blog, and there has been no blog announcement modifying it (apologies if I missed a paragraph). Source code is not user documentation, and it may be exasperation about that tendency that makes me sound a bit tense. But really, I am deeply interested in how training is done, more than in how fast the executable performs, and I think the documentation could be more transparent about it; it does not take developer inside knowledge to follow such a description. This has nothing directly to do with quantization or with feature reduction at the input; it could be shared at the same minimal disclosure level as the SF12 blog. I would applaud such an effort to document something so important for interpretation by serious users, and by all Lichess users, for example.
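On the question of how game outcomes could be integrated with search evaluations, a common pattern described for NNUE-style trainers is to convert the search score to an expected score and interpolate it with the game result. The sketch below is only that pattern in miniature: the logistic `scale` constant and the interpolation weight `lam` are illustrative assumptions, not the actual values or model Stockfish uses.

```python
import math

def cp_to_expected_score(cp, scale=400.0):
    """Map a centipawn score to an expected score in [0, 1] via a
    logistic curve. `scale` is an illustrative constant, not the
    exact WDL model any engine uses."""
    return 1.0 / (1.0 + math.exp(-cp / scale))

def training_target(search_cp, game_result, lam=0.7):
    """Blend the search evaluation with the game outcome.
    game_result: 1.0 win, 0.5 draw, 0.0 loss, from the side to move's view.
    lam is a hypothetical interpolation weight: lam=1.0 would train purely
    on the oracle search score, lam=0.0 purely on the game outcome."""
    return lam * cp_to_expected_score(search_cp) + (1.0 - lam) * game_result

# A drawn-looking position (eval 0) from a game that was eventually won:
target = training_target(0.0, 1.0, lam=0.7)  # pulled above 0.5 by the win
```

This is one concrete way a "WDL conversion" repository could plug into a training pipeline: it supplies the map between score space and outcome space so the two target sources become commensurable.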
-
Although I am a newcomer here, from a chess programmer's point of view we can't remove the classical eval until at least 90% of chess positions have been trained on. Maybe it would be better if this removal were done in Stockfish 20.
-
First, I have to see whether this problem will be solved by structural changes in network training or not. Recently, I opened a discussion asking what the difference is between a network that is trained with @cj5716 Can you help me in this context?
-
90%, measured in distinct positions? Someone already mentioned the exponential combinatorial "radiation", the compounded divergence over game depths that are still within tournament-constraint range. It does not even take depth to get that growth; it can go "dense" in breadth, though that raises the question of whether it is only a branching-degree matter. Since no engine uses any position-based "extrinsic" metric, maybe there is room here for more enthusiasm about what "90% of positions" would even mean, and about poking at the conceptual bubble of the Elo pool. Not that the Elo pool would disappear in a puff of new logic: it could still be characterized. As long as all the games over history remain retrievable, and the engines that produced them are kept available in well-curated pool data, their behavior could be characterized in position-based, external ways, other than through competitive pairing odds. I know, I know: pairing odds have been the optimization objective for ages and ages. But what if we could characterize behavior without evaluating game odds? Light tasks might show the way toward measuring not from win data, but from basic response-function characteristics over many positions. We might not even need engine pairings: just plenty of single-root searches, and plenty of leaf-evaluation calls over arbitrary positions.
Such positions would not even need to have occurred in real games from existing engine pools (the evolutionary environment's dynamic constraint), provided there is good coverage of all possible pairings. Do we need such a measure already packaged before we can wonder what it would need to be able to do? Should the experimental design use Leela's data, or engine evals from some stage of the training feedback loop? I never got a full picture of that data-flow scheme, and I did not keep looking hard (see my previous posts); at some point, previous struggles leave a preemptive reluctance to spend more energy for the same expected result. That can become its own loop of ignorance, but one has a limited digging budget. So do not find me flippant in my apparent laziness to go read something that might be refreshing and finally informative about the basic machine learning, which is likely explainable in basic linear algebra with chess-land interpretations of the variables, rather than in a programming language or some pseudocode version. One could even skip the weight-update details and lump that step into whole-function-space (weight-space) updates driven by data vectors (vectors of vectors: the training step adds an index on top of the already multidimensional position and leaf-evaluation problem). OK, off topic, but not by much: the evaluation function's domain, even with a reduced feature set, is where the "90%" notion lives. Do not dismiss that question; it is the beginning of chess-land asking dev-land for something that makes chess sense, I hope. Also, if I am babbling in vain and someone has already answered the above, could I get some not-too-vague links (some kind anchors)? Feel free to chop my poetry into chunks and pair each with a one-liner of the appropriate doc link. That was a huge parenthesis. I like the word "agnostic" to mean: not obsessed by evaluation too prematurely.
That way we do not get trapped in the representativeness relation between training set and test set, when the actual chess world is expected to be bigger than the still-uncharacterized maximal position space visited by tournament games (I would love to be shown this is old, tired ignorance). For scale, some hope: the number of stars in the universe is the usual astronomical order-of-magnitude analogy, but look closer, at fluids; maybe the number of molecules in the ocean. Think about it: volume, measure, distance-based metrics all mean the same thing to me here. We would need to be many, and to drink some humility potion, which is not normal behavior in orbit of chess culture (or of any stressed-resources competition system among many people). I might try again in a year; call me the deluded hopeful. By sticking to my assessments above I am hoping to jinx the worst-case scenarios of my ignorance (not for lack of previous trying), and I would like to be knocked off those assessments. My current working hypothesis: we do not need dynamics to characterize position space. The best starting point might be the work from A0 and LC0, whose basic input-layer vectors (possibly still undisclosed as mathematics rather than as a programming language) could be translated directly back into chess-ruleset layers with a little linear algebra, sets, and relations; function spaces might come in handy for the equation intricacies. Anyone touch-typing LaTeX around?
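The post's suggestion of characterizing engines through position-based response functions rather than pairing odds can be made concrete with a toy sketch. Everything here is hypothetical: `eval_a` and `eval_b` are stand-ins for two engines' static evaluations, the "positions" are random material vectors, and the 25-centipawn agreement threshold is arbitrary. The point is only the shape of the idea: a pairing-free, per-position behavioral comparison.

```python
import random

random.seed(0)

def eval_a(pos):
    # Toy stand-in for one engine's static evaluation, in centipawns.
    return 10.0 * sum(pos)

def eval_b(pos):
    # Toy stand-in for a second engine: same signal plus disagreement noise.
    return 10.0 * sum(pos) + random.uniform(-50, 50)

def agreement_rate(positions, f, g, threshold=25.0):
    """Fraction of positions where two evaluations differ by less than
    `threshold` centipawns: a behavioral metric that needs no games,
    only many single evaluations over a shared position set."""
    close = sum(1 for p in positions if abs(f(p) - g(p)) < threshold)
    return close / len(positions)

# Hypothetical "positions": random 8-component material vectors.
sample = [[random.randint(-3, 3) for _ in range(8)] for _ in range(1000)]
rate = agreement_rate(sample, eval_a, eval_b)
```

The same scheme extends to single-root searches at fixed depth instead of static evals, which is exactly the "plenty of single-root searches" measurement proposed above.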
-
SF would also gain from being clear about its information loop of training from itself (being concise and prudent here, since apparently that job has not been done, or has not been put forward ahead of the chess-priority questions above). Some users may still be relying on past engine-culture communication about leaf evaluations transferring chess theory (from which school? not important anymore; just bear with my hopeful rambling). I have only caught wind of the real picture through sparse fly-by readings in issues, really tangential ones, as I do not have the energy to dig, and I thought I would not try again here. It takes a lot to get the answers I am looking for out of the obscurity of source code, which seems to have its developers convinced, by some perception magic spell, that it can serve as user documentation. (I know there is a wiki; I dare not dig into its NNUE part, for fear of more implementation-cost-saving feats and of perpetuated loud silence on the big picture, which does not need that part.) ML is modular at that level: one can leave the implementation feats black-boxed, as long as the chess-in and black-box-out are explicit, together with the maximal data world that the training and test portions should together represent. See, I can get back to the OP here. It may sound like rambling, but about the OP's question (or maybe it was another comment) on "90% of positions": I get the intent, but it is not a well-posed machine-learning problem. Meanwhile SF devs have clinked, clanked, and clunked for the past years making NNUE tango with the good old exhaustive-search austerity-budget philosophy, lean and mean, and maybe something about the intent of the 90% question. I do not really mind the aggressive pruning; I have noticed that SF has been seriously investing in leaf-evaluation design, and who would have known that stronger leaf evaluations would end up saving on search nodes?
Some might still think it is the aggressive pruning, enabled by the better eval; but there is no way to tell whether higher or lower leaf-eval confidence is what has reduced the search breadth (I think I read that it has gone down). That is the hybrid program. In all the past discussion, though, a complete formulation of the "90% of positions" question has been absent; no wonder we do not even know how to make one (no offense, I share the concern; I have been thinking hard about this for a few years as the resident Lichess musing clown, the author of the weird ramblings there, in case anyone wondered; same rambler, I come here maybe once a year, or was it less this time). One needs to look a bit outside the pools, which are the maximal conceptual cavity of any chess, human or engine, and one needs to untangle position-set hypervolume measures from evaluation "entanglement"; that is my current muse, or delusion. I do not think going back to classical would be wise: from a global-optimization and function-space point of view, it was a mess. I do hope SF could spend some of its time helping those who seem talented at extracting higher-level concepts from the source-code depths. I noticed in some passage, maybe a month or two ago (I am bad with time), TODO questions inviting other collaborators to stop by and lift the fog. The prevailing attitude can read as: "if it is working, why do you care? We make racing machines; nobody told us we would have to babysit human chess users overusing these machines on positions in vain. They are meant to win games in engine pools that compete in each tournament pool." Does that last paragraph feel outrageous, caricatural, exaggerated, a tad theatrical, or does it sum up some behind-the-scenes common sense?
I do not wish to disparage all the valiant efforts on the only clear optimization parameter that has been there for a long time, and that therefore should or will stay, on the principle of not breaking something that "works" (OK, I am getting old; that is not actually thinking). But I see no need to mystify machine-learning basics that could be shared without the confusing terminology of how they happen to have been implemented. I also understand that computers are not math-friendly, and that nothing on the web is direct mathematical communication: web tech is not continuous penmanship, and it does not encode the spatial efficiency of mathematical notation that keeps such communication in sync with its natural-language narration, motivation, illustration, and scaffolding. Hammering a keyboard for math is a real pain. But if I knew the big picture from having worked deep in the code, I would take the time to contact back those who have rambled hard for such a modular chess-world bridge view of the major new direction SF has been taking. I would stop the minimalist drip of information about how the training works, and just work with those who can build the models that are missing from the documentation although implemented in the code itself. The code is the land of exhaustive-tree-search recursive magic; but the basic concepts of machine learning come from standalone, more legible, non-code-entangled mathematics, which is cleaner to communicate to chess users, who are likely visual-spatial thinkers, not string-encoding virtuosi. OK, I might have gotten on some nerves, or made a fool of myself; but I speak from long brewing, usually too Sisyphean to share on the internet. How many times now, coming to the land of the too busy? I think spaced iterations over months, rather than direct conversation, might still let some sharing percolate, even if in rejection: to reject or dismiss, one must have looked at least a bit.
Till next storm, if I did not break anything.
-
Q: is the NNUE feature set evolving across versions? And what is the thinking behind that evolution? Answering probably needs the previously requested modular, high-level, chess-world-shareable mathematical model of the data vectors (matrices) and of the loops through which chess information goes in and comes out, and through which some progress of learning AND generalization happens. If there were a serious effort to take the missing backward step from implementation to mathematics, reaching the chess-interpretable world and its users, maybe I would find the energy to go LaTeX again; I cannot be the crazy one forever. This is the last missing idea I had intended for the OP's title question. I am still pending a serious communication effort on the basic machine-learning information loop (and I can say it would help development not to be a fully exhaustive global parameter-space search, though I may be preaching to the choir by now; just let me know if I am babbling in vain and everything I am asking for is already linkable). I have seen past confusing README files use the phrase "reinforcement learning" as a natural-language intuition, probably a naive discovery-by-implementation way of thinking about some feedback loop: a simple evaluation driving the core search engine at moderate depth to populate the target data vectors used in training on "Leela's" data. That is what blog-level reading crumbs give me; anything better? So, about the question of classical: it would not hurt future development if SF extracted the higher-level understanding, expressed in generic mathematical formulation above programming jargon and with fewer computery symbols, to reach back to chess-land. That needs cooperation across expertise, which means agreeing on the questions and the needs. And do not think the mess in classical eval was total: it was a mess in its use of a single scalar score.
Classical did not have to be strong, since the fully exhaustive program had not yet encountered the other species of engine, with the opposite bet: a small search tree and strong, costly leaf evals. Classical was constructed one isolated "feature" at a time, a bit like chess is taught to humans; but humans can tweak with intuition and repair the spurious exaggerations that one-feature-at-a-time hindsight, spread over a long duration and over single lines of plan ideas (with deep mainlines too), tends to produce, whereas engines cannot: they do not reprogram themselves when a position is new and has never been visited in any dev engine pool or any tournament pool (the 90% question again). So the hope here, for human chess concepts and engine interpretability, lies in the feature input space of the master network being trained in some feedback loop from simple-eval search trees of moderate depth back onto the master network (if the SF12 model from GitHub is still what is in play as the training model). That feedback loop, from SF eval onto the NN leaf evaluation as a training target vector, is not reinforcement learning in the machine-learning sense; the learning is not in that open loop, it is in the definition of the chess data-vector SET. Sorry, but that big fundamental stone kept being neglected, I guess because only crazies would ramble about it. If the SF web guide were not just a web guise but part of the official competition entry requirements, made UCI-visible (and if your favorite website wrapper did not worry about your engine-belief stability), then we could already guess which features an improving development program might consider. It would still need to recognize that there is a floating question of generalization right now, rather than just hoping that new engines will happen to provide generalization challenges.
The training/testing split is only valid with respect to the maximal set that the combined set was representative of. I am giving a crash course here; I could go on and on, but I would first need to know whether there is a link to the missing nugget.
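The last point, that a train/test split only certifies generalization over the distribution both sets were drawn from, can be shown with a deliberately tiny experiment. Everything is illustrative: a linear model is fit to a quadratic relationship on one interval, then scored on a held-out sample from the same interval and on a shifted interval the split never covered.

```python
import random

random.seed(1)

def fit_line(xs, ys):
    """Ordinary least-squares fit of y = a*x + b."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    a = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
         / sum((x - mx) ** 2 for x in xs))
    return a, my - a * mx

def mse(xs, ys, a, b):
    """Mean squared error of the fitted line on (xs, ys)."""
    return sum((a * x + b - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

true_fn = lambda x: x * x  # the "real" relationship being approximated

train    = [random.uniform(0, 1) for _ in range(500)]  # training region
held_out = [random.uniform(0, 1) for _ in range(500)]  # same distribution
shifted  = [random.uniform(2, 3) for _ in range(500)]  # never represented

a, b = fit_line(train, [true_fn(x) for x in train])
err_in  = mse(held_out, [true_fn(x) for x in held_out], a, b)
err_out = mse(shifted,  [true_fn(x) for x in shifted],  a, b)
# err_in is tiny; err_out is orders of magnitude larger, even though the
# held-out "test set" gave the model a clean bill of health.
```

Substitute "positions from tournament-range games" for the interval [0, 1] and "the rest of chess's position space" for [2, 3], and this is the representativeness trap the post describes.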
-
In any case, we have already gotten a second, smaller net, something that effectively replaces the job of classical while being worth more Elo.
-
I am still looking for documentation of the training setup in terms that do not require coding skills. The "classical eval as NNUE trainer" narrative from the SF12 announcement, which split from SF11, seems buried in issue text and has not been made visible at wiki level with the same interested-user readability as, say, the search part of the code. Can someone make the outer scheme of the training clear in the documentation: the input data "vector" definition, the supervising oracle training data vector, and, if there are stages in the training that loop on previous stages of that supervising target vector to fit with generalization, please make that visible too, to user minds that are not buying the blind-oracle mysticism.
-
I think this is a big step forward to make in a rush. I'm not saying it is wrong, but I would say it could have waited a little longer, until it was proved that the classical evaluation brought no real Elo to the engine. Its maintenance was not really "harming" the engine's progress, and I have no "nostalgic" feelings about it. It seems this code would not have been removed if it had been treated as a normal "simplification test", as it should have been: I think it pragmatically failed as a simplification test by SF's standard rules and was carried out by "the Director's will" instead. (Do you understand that about 3 Elo at this level may represent +15 or even more in other times? This is due to shrinking gains at high levels of play. Did you consider this?)
Anyway, as I said, a big step forward; maybe it is reasonable, and I hope it will bring us a stronger engine in the long term. I miss the time when the focus of the SF project was +Elo. Let's hope for the best! Good luck, and good job.