
Experiment to find the relative value of variant pieces

evert823

Here's an experiment in which we try to establish which of two variant pieces, the Guard and the Time Thief, is stronger. This procedure could work with any two variant pieces, so the Guard and Time Thief are just examples.

Let's first create a random position in which we put two almost equal armies, except that Black will get a Guard and White a Time Thief. This is our random position, with Black to move (the side to move is also chosen at random):

We have verified that this random position is legal and reasonably balanced. For this we have checked with a minimax search of depth 3, and seen that there was no strongly unbalanced outcome.

Next, we will simply count the attacks on any square from each piece in each of the two armies. For this position, the outcome was -3, which indicates that Black is slightly better.

But now we do one more thing: search moves from this position with a minimax search of depth 6. The evaluation function will still just count attacks, but the outcome could now also be mate, which earns a higher value. For the above position, the outcome was -5.5. So Black was better, and that is in favor of the Guard.

Now realize that the Time Thief does not have normal attacks. We're not counting its attacking score directly. But it can still work together with allied pieces, and help to attack more squares.

Of course, this is just one random position; it could have contained some tactic by pure coincidence. But we can generate any number of random positions, repeat this procedure for all of them, and take the average outcome.
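The loop just described can be sketched in Python like this. This is a hedged sketch, not the author's actual code: `random_position` and `minimax` are hypothetical stand-ins for whatever the engine provides; only the averaging logic is shown.

```python
def run_experiment(n_positions, depth, random_position, minimax):
    """Average a fixed-depth search score over many random positions."""
    total = 0.0
    for _ in range(n_positions):
        pos = random_position()        # a fresh legal, roughly balanced setup
        total += minimax(pos, depth)   # attack-count evaluation at the leaves
    return total, total / n_positions  # total score and average per position
```

A positive total then favours one army's extra piece, a negative total the other's, with the average smoothing out coincidental tactics in individual positions.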

When I did this with 100 random positions, and minmax search depth 6, I saw a total result of +89. So during this experiment, the Time Thief was stronger than the Guard.

evert823

I did this procedure to test the Hunter against the Rook, with depth 7 instead of 6 and 500 random positions. Result: total score +604, which means that a Hunter is slightly stronger than a Rook on an 8x8 board.

One of the positions:

HGMuller

I don't quite understand your procedure. Minimax is just a score-propagation scheme for establishing which of the evaluations in the tree leaves is the relevant one. So in the end your judgements about the positions are based on this evaluation function. And the most important contribution to evaluation in chess-like games is the material balance in terms of piece values. And the random positions you show are not tactically quiet at all; both sides have hanging pieces. So the principal variation will involve a lot of captures.

How can you avoid that what you get out in these conditions is simply what you put in as piece value in the evaluation function?

evert823

I will elaborate later. But I did mention that my evaluation function here does not know or put in any piece values; it counts the attacked squares of each army.

evert823

To explain how it works:

The evaluation function counts the squares attacked by each of the armies, and takes the difference between these numbers. It's that simple (*).
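As a minimal sketch (in Python, and not the author's actual engine code), such an evaluation could look like the following; `attacks` is a hypothetical move generator passed in as a callback so the sketch stays engine-agnostic.

```python
def evaluate(board, attacks):
    """board: {square: piece}; upper-case pieces are White.
    Returns (#squares attacked by White) - (#squares attacked by Black)."""
    score = 0
    for square, piece in board.items():
        n = len(attacks(board, piece, square))  # squares this piece attacks
        score += n if piece.isupper() else -n
    return score

# Toy demonstration with a made-up attack table:
toy_board = {"a1": "R", "b2": "P", "a8": "r"}
toy_attacks = lambda b, p, s: {"R": ["a2", "a3"], "P": ["b3"], "r": ["a7"]}[p]
print(evaluate(toy_board, toy_attacks))  # (2 + 1) - 1 = 2
```

Note that no piece values appear anywhere; a piece only contributes through the squares it attacks.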

For a magical piece without direct attacking power, the function will not count any attacking power. But minimax will make sure that these pieces still count for something: the magical pieces will help the other (allied) pieces maximize their attacks, and that should be observable in the leaves, as minimax is trying to find clever ways to deploy these pieces. (The same goes for a Queen that may be blocked in one position but finds much more freedom in a leaf.)

(*) OK, it also does a few additional steps to guarantee that the outcome ends up within a particular range that does not mingle with the values representing an absolute win for White or Black. And where we actually find an absolute win in this procedure, it will be detected and counted.
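One plausible way to implement such a guarantee (an assumption about those "additional steps", not the author's actual code) is to clamp positional scores into a band that the reserved mate values can never enter:

```python
MATE = 100_000    # value representing an absolute win (assumed constant)
EVAL_CAP = 9_999  # positional scores confined to [-EVAL_CAP, +EVAL_CAP]

def bounded_eval(raw_attack_diff):
    """Clamp an attack-count difference so it can never look like mate."""
    return max(-EVAL_CAP, min(EVAL_CAP, raw_attack_diff))
```

Any score at or beyond `EVAL_CAP` can then unambiguously be interpreted as a detected forced win.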

I agree with you that the random positions, on top of which we run minimax, contain many trivial captures. That may be something that should be tuned further in order to improve this approach.

HGMuller

OK, so it is purely based on the number of attacked squares. That is a bit suspect, as for many pieces this number is quite dependent on board population. On a crowded board it is usually easier to manoeuvre a Bishop on a (half-)open diagonal than it is to get a Rook on an open file. Yet we know that even in this phase of the game a Rook is worth significantly more than a Bishop.

The first thing you should do is test whether the method works for the orthodox pieces. If one side has a Bishop where the other has a Knight, does it think they are equal?

In the Interactive Diagram I also guesstimate the value of a piece by the number of squares it attacks and non-capture moves it has in a few hundred random positions. But rather than doing a minimax search from those positions to allow some optimization of the piece, I just assume that it can be placed one standard deviation above the average.
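Under the assumption that `counts` holds a piece's attacked-square counts over a few hundred random positions, that heuristic is simply:

```python
import statistics

def optimistic_mobility(counts):
    """Mean mobility plus one sample standard deviation: the piece is
    assumed to reach a better-than-average square, without searching."""
    return statistics.mean(counts) + statistics.stdev(counts)
```

This replaces the minimax "optimization" of the piece's placement with a cheap statistical proxy for it.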

Superplayer7472

How do the Guard, Time Thief and Hunter move?

evert823
HGMuller wrote:

If one side has a Bishop where the other has a Knight, does it think they are equal?

I've started this calculation with 7 plies and 500 positions and it will take several hours.

You have mentioned the idea of testing the value of pieces, by letting your engine play a significant amount of entire games against itself. Have you actually done that? Or what is your opinion on the achievability of this?

Superplayer7472 wrote:

How do the Guard, Time thief and Hunter move?

I have added links to the posts above.

HGMuller

I just tried a few games with Fairy-Max, where I replaced white's Rooks with Hunters. (mKcNcD; White was still allowed to castle with those.) This was working nicely, and after 14 games there was no obvious advantage for either side (7+ 4= 7-). To determine value difference to within a quarter of a Pawn needs at least 100 games, though. And I was doing this at 40 moves per minute, where a game lasts about 3 min. This is nice for watching, but for doing 100 games it is more convenient to make it play faster.
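The "at least 100 games" estimate follows from simple statistics: the standard error of a match score shrinks only with the square root of the number of games. A sketch, assuming a per-game standard deviation of about 0.4 (a typical figure for self-play with wins/draws/losses scored 1/0.5/0; both numbers are assumptions, not from the post):

```python
import math

def score_standard_error(n_games, per_game_sd=0.4):
    """Standard error of the average match score after n_games."""
    return per_game_sd / math.sqrt(n_games)
```

With 100 games this gives roughly a 4% margin on the score; how many centipawns that corresponds to depends on whatever score-to-pawn conversion one assumes.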

For the Guard (non-royal K) I already did such tests long ago, and it was the same strength as a Knight.

To do Time Thief would require a modification of the engine. The Time-Thief captures cannot be handled by the standard e.p. code, because you might have to put back a capture victim.

evert823
HGMuller wrote:

If one side has a Bishop where the other has a Knight, does it think they are equal?

With 500 positions at depth 7 it took 5 hours and 20 minutes. Outcome: +97.5, a very marginal preference for White, where White has an extra Knight against an extra Bishop for Black. One such position:

HGMuller

I have no feeling for what is large or small in this case, but it sounds like you consider this as passing the test. I am surprised it takes that long, though. You are not using alpha-beta pruning?

Fairy-Max could play 150 games at 40 moves/min in 5 hours, and very likely a faster time control could be used as well. I believe Stockfish is developed using 10-second games, and would do 1800 games in that time. Fairy-Stockfish should be able to handle pieces like Guard and Hunter.

The advantage of playing games is that you measure the performance of the pieces averaged over a representative sample of positions, something you cannot be sure of when you just randomly place pieces on a board. The Interactive Diagram's heuristic for determining piece values uses random positions as well, but it never plays any moves from them. So the only effect is really whether a square is empty or has a white or black occupant. (And to make it more realistic I let the density of pieces of a certain color increase linearly towards their own base rank.) So the randomized pieces just act as potential victims or blockers (or hopper mounts), and the piece under test is put on all empty squares to average its mobility.

Does the minimax you perform really add anything? I.e. are the results significantly different depending on whether you search 3 or 7 ply (or not at all)?

evert823
HGMuller wrote:

I am surprised it takes that long, though. You are not using alpha-beta pruning?

The person who wrote this is a really bad programmer (LOL). Yes, there is alpha-beta pruning; without it, it was even slower.
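For readers following along, a minimal negamax search with alpha-beta pruning over an abstract game tree looks like this. It is a generic sketch, not either engine's actual code; `children` and `evaluate` are hypothetical callbacks.

```python
def alphabeta(node, depth, alpha, beta, children, evaluate):
    """Negamax with alpha-beta pruning; scores are always from the
    point of view of the side to move at `node`."""
    kids = children(node)
    if depth == 0 or not kids:
        return evaluate(node)
    best = float("-inf")
    for child in kids:
        score = -alphabeta(child, depth - 1, -beta, -alpha, children, evaluate)
        best = max(best, score)
        alpha = max(alpha, score)
        if alpha >= beta:   # opponent already has a better option: prune
            break
    return best
```

The pruning never changes the result of a fixed-depth search; it only skips branches that provably cannot affect it, which is why turning it off just makes the same search slower.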

evert823
HGMuller wrote:

I am surprised it takes that long, though. You are not using alpha-beta pruning?

Out of curiosity, how much time does Fairy-Max need to find the forced mate in this position?

https://www.chess.com/forum/view/more-puzzles/hardest-mate-in-4-puzzle-61176239

HGMuller

Interesting, black has 9 Pawns.

Fairy-Max finds mate in 4 in 0.32 sec:

10 #4 371502 0:00.42 e2f3 c6a5 h4f6 e7f6 f3f4 e5d5 d3e4
9 #4 312899 0:00.35 e2f3 c6a5 h4f6 e7f6 f3f4 e5d5 d3e4
8 #4 283245 0:00.32 e2f3 c6a5 h4f6 e7f6 f3f4 e5d5 d3e4
7 #4 272298 0:00.32 e2f3 c6a5 h4f6 e7f6 f3f4 e5d5 d3e4
7 +0.95 110522 0:00.14 f2f4 e5f5 f1g3 f5g6 g3e4 f6e4 e2g4 g6h7 d3e4 f7f5 h4e7
6 +0.81 21219 0:00.03 f2f4 e5f5 f1g3 f5g6 g3e4 f6e4 e2g4 e7g5 f4g5
5 +0.63 13516 0:00.03 f2f4 e5f5 f1g3 f5g6 g3e4 f6e4 d3e4
5 +0.29 6066 0:00.01 h4g3 e5f5 e2f3 f5g6 d3e4 f6e4 f3e4
4 +0.29 3037 0:00.01 h4g3 e5f5 e2f3 f5g6 d3e4 f6e4
3 +0.08 1311 0:00.01 h4g3 e5f5 e2f3 f5g5 d3e4
2 -0.51 541 0:00.01 h4g3 e5f5 e2f3
2 -0.57 260 0:00.01 f2f4 e4f3 e2f3
1 -0.68 50 0:00.00 f2f4

evert823

My engine took somewhere above 12 minutes, and that's after it had already received many improvements. The funny thing is that the golden move (Qf3!) is one of the worst moves at lower depths, so it's almost the last move the engine starts seriously looking at; it spends at least 10 of those 12 minutes on the non-winning moves. Maybe I should start a topic on talkchess.com.

HGMuller

Well, it is an unfair comparison, because (if I understand correctly) your engine uses a fixed-depth search, and Fairy-Max uses all kinds of reductions in non-PV lines to get faster time-to-depth. While a check extension helps it to see checkmates at lower depth. This is why it already sees the mate at d=7, while the capture of the King occurs only in the 9th ply. There are two checks in the main line.

Qf3 will indeed look very poor until you see it ends in checkmate. I don't think there is any way to avoid that. Unless you would have a very sophisticated evaluation of King Safety, which scores an extra attack on f5 as more than a Queen. (If you sort the moves strictly by evaluation score, that is.) Then the Pe4 would be 'pinned' by Bd3. In an engine especially designed for mate solving you would probably weigh King freedom much heavier than material, and when Qf3 cannot be taken it gives you the score for attacking f5, and be considered best move from the start.

evert823

What I don't understand now is that my engine finds this mate

https://talkchess.com/viewtopic.php?t=84227

and the original mate in 7 (!) - in far under a minute.

evert823

The Joker imitates movement and capture of the piece which the opponent had moved last.
Obviously, the Joker is never stronger than the strongest piece in the opponent's army, and probably weaker.
So I decided to do a measurement of extra Joker against extra Archbishop (B+N-component), 200 positions depth 7.
Example of such a position:


Outcome: +1501.5
1501.5 / 200 = 7.51 in favour of the Archbishop on average.
This indicates that a Joker is much weaker than an Archbishop.

HGMuller

Indeed, a Joker is pretty weak. I once made a Fairy-Max derivative that could play with a Joker, ( http://hgm.nubati.net/jokermax.zip ). And, like you say, the value of a Joker is likely dependent on what the opponent has, and thus won't be constant over the game (while Fairy-Max assumes such constancy). So it might not play optimally with the Joker.

The main problem is that a Joker is very easily lost for a Pawn in the middle-game, when it gets on or near the opponent's half. And in the end-game there is a good chance that only Pawns and minors are around, and then it won't be worth more than a minor. In fact it is worth significantly less, as the opponent is partly in control of its moves. My impression was that the opening value of a Joker is about the same as a Knight.

Aserew12phone

The opening value is mostly a Pawn, unless you keep attacking your opponent's Queen.