I spent the past week building an AI for Solitaire Dice. William created this game for the touch table, based on a game from the Sid Sackson book “A Gamut of Games”. Each round you roll five dice, pick one die to be the reject, and then make two scoring pairs from the remaining four dice. So each round you reject a number (1-6) and then score two pair totals (2-12). Once you have rejected three different numbers, you always have to reject one of those three if possible. If the five dice don’t contain any of your rejects, you don’t pick a reject and make two scoring pairs from the five dice. Once you have rejected a number 8 times, the game is over. Your score is based on how many times you have taken each scoring pair during the game. You lose 200 points if you have scored a number 1-4 times. Scoring it 0 or 5 times is worth zero points. You get points for each score >5 and <10. So the first time you pick a number it costs you 200 points, and once you have taken the number five times you are back to zero. Each additional score gives you points based on the number: 2 or 12 = 100, 3 or 11 = 70, 4 or 10 = 60, 5 or 9 = 50, 6 or 8 = 40, and 7 = 30. It is a fun game that you get better at as you play.
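To make the scoring rule concrete, here is a minimal Python sketch of it. It is not the code from the touch table game; the names `PAIR_VALUE`, `column_score` and `game_score` are made up for illustration, and the sketch does not model any upper limit on how many times a number can be scored.

```python
# Points for each score beyond the fifth, keyed by pair total (from the rules above).
PAIR_VALUE = {2: 100, 3: 70, 4: 60, 5: 50, 6: 40, 7: 30,
              8: 40, 9: 50, 10: 60, 11: 70, 12: 100}

def column_score(total, times_taken):
    """Score for one pair total that has been taken `times_taken` times."""
    if times_taken == 0 or times_taken == 5:
        return 0                      # never taken, or exactly back to zero
    if times_taken < 5:
        return -200                   # taken 1-4 times: flat penalty
    return (times_taken - 5) * PAIR_VALUE[total]

def game_score(counts):
    """Total score for a dict mapping pair total -> times taken."""
    return sum(column_score(total, n) for total, n in counts.items())
```

For example, `game_score({7: 6, 2: 1, 8: 5})` adds one paid 7 (30), one penalised 2 (-200) and a break-even 8 (0).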
It is not possible for the computer to look through all the possible games to decide what to do, since the number of possible games is staggeringly large. Instead, I decided to make the AI score each possible option after each throw. The tricky part is scoring the possibilities. The first step was to build a table of all the possible combinations of the five dice and how likely each one is to come up. For example, there is only one way to roll all ones, but four ones and one two can happen five ways. For a given roll, the choices that you have depend on the rejects that you have taken so far. So, for each set of three rejects, I looked at each dice combination and built all the choices that you would have. For example, if you are rejecting 1, 2 and 3 and you roll 1,3,4,4,5 you can: reject 1, take 7,9; reject 1, take 8,8; reject 3, take 5,9; or reject 3, take 6,8. So this roll gives you one way to score 7, three ways to score 8, two ways to score 9, etc. I then add the number of times you could take each number, times the number of ways to roll that combination, times that number’s score, to the weighting for that number. In the previous example that is 3 ways to score 8 * 60 ways to roll 1,3,4,4,5 * 40 points for an eight = 7,200. After going through all the possible rolls I had a weighting for each number. Repeating this for each set of three rejects gave me a strategy of which dice to take based on your rejects.
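The table and the weightings described above can be sketched in a few lines of Python. This is my own reconstruction under stated assumptions, not the original code: all function names are hypothetical, choices are deduplicated by reject value and pair totals (which matches the worked example), and a roll with none of your rejects is treated as "leave any one die unused, record no reject".

```python
from collections import Counter
from itertools import product

# Points for each score beyond the fifth, as listed in the rules.
PAIR_VALUE = {2: 100, 3: 70, 4: 60, 5: 50, 6: 40, 7: 30,
              8: 40, 9: 50, 10: 60, 11: 70, 12: 100}

def roll_table():
    """Map each sorted five-dice combination to the number of ordered rolls producing it."""
    return Counter(tuple(sorted(r)) for r in product(range(1, 7), repeat=5))

def pairings(four):
    """The distinct ways to split four dice into two pair totals."""
    a, b, c, d = four
    splits = ((a + b, c + d), (a + c, b + d), (a + d, b + c))
    return {tuple(sorted(s)) for s in splits}

def choices(roll, rejects):
    """Distinct (reject, (total1, total2)) options for a sorted roll."""
    forced = any(die in rejects for die in roll)
    options = set()
    for i, die in enumerate(roll):
        if forced and die not in rejects:
            continue                         # must reject one of the three
        rest = roll[:i] + roll[i + 1:]
        label = die if forced else None      # None: no reject recorded (assumption)
        options |= {(label, p) for p in pairings(rest)}
    return options

def roll_contribution(roll, ways, rejects):
    """What one dice combination adds to each pair total's weighting."""
    out = Counter()
    for _, pair in choices(roll, rejects):
        for total in pair:
            out[total] += ways * PAIR_VALUE[total]
    return out

def weightings(rejects):
    """Weighting of every pair total for one set of three rejects."""
    weights = Counter()
    for roll, ways in roll_table().items():
        weights.update(roll_contribution(roll, ways, rejects))
    return weights
```

Calling `roll_contribution((1, 3, 4, 4, 5), 60, {1, 2, 3})[8]` reproduces the example's product of three ways, 60 rolls and 40 points, and `weightings({1, 2, 3})` sums that over all 252 distinct combinations.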
The next step was to write the actual game AI based on this strategy. The AI plays by scoring each possible choice and taking the highest-scoring one. A choice’s score is the number of points you would gain or lose for scoring those dice, plus a factor for getting the set of three rejects that the strategy picked, plus a factor for getting the three rejects picked quickly, plus a factor for keeping the rejects taken a similar number of times, plus a factor for picking scoring pairs that have a high weighting. I picked the relative importance of each of the five factors by hand.
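One way to picture this is as a weighted sum over five per-choice measurements. The sketch below is only an illustration of that shape: the feature names, the dictionary layout and the idea that each measurement is already computed are all my assumptions, and how each measurement is actually calculated is left out.

```python
# Hypothetical names for the five factors described above.
FEATURES = ("points", "target_reject", "reject_speed",
            "reject_balance", "pair_weighting")

def choice_score(features, factors):
    """Weighted sum of a choice's five measurements."""
    return sum(factors[name] * features[name] for name in FEATURES)

def best_choice(candidates, factors):
    """Pick the candidate (a dict with a 'features' entry) that scores highest."""
    return max(candidates, key=lambda c: choice_score(c["features"], factors))
```

Here `factors` holds the relative importance of each measurement, which is the part that was first set by hand.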
The next step was to make some improvements to the game logic. I wrote some code to handle the fact that scoring two of the same number in one round is not the same as scoring each number separately. I also wrote code to handle the symmetry of the rejects: 1,2,6 is the same as 1,5,6, so if your strategy is to go for 1,2,6 and you have the option to reject a 5, that is just as good. And finally, if you are forced to reject a number outside your strategy, I update the weightings of each scoring number. Once all the play logic was written, the AI was averaging about 0 points for the best set of rejects and -100 to -50 for the other sets of rejects.
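The reject symmetry comes from flipping every die face d to 7 - d: a pair total t becomes 14 - t, and the scoring table gives t and 14 - t the same value. A small sketch of that mapping, with hypothetical helper names `mirror` and `canonical`:

```python
# Points for each score beyond the fifth, as listed in the rules.
PAIR_VALUE = {2: 100, 3: 70, 4: 60, 5: 50, 6: 40, 7: 30,
              8: 40, 9: 50, 10: 60, 11: 70, 12: 100}

def mirror(rejects):
    """Flip each die face d to 7 - d, e.g. {1, 2, 6} -> {1, 5, 6}."""
    return frozenset(7 - d for d in rejects)

def canonical(rejects):
    """One representative for a reject set and its mirror image."""
    return min(tuple(sorted(rejects)), tuple(sorted(mirror(rejects))))

# The flip sends pair total t to 14 - t, which is worth the same points.
assert all(PAIR_VALUE[t] == PAIR_VALUE[14 - t] for t in PAIR_VALUE)
```

With this, `canonical({1, 5, 6})` and `canonical({1, 2, 6})` both give `(1, 2, 6)`, so a strategy table only needs one entry per mirrored pair.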
The next step was to write code to run the game many times with different values for the five scoring factors to find the best set. To do this, I create a population of 20 different sets of values and play 50K games. The top-scoring set of values is then used to make 20 new strategies. Repeating this 100 times results in a good set of scoring factors. This step improved the AI to an average score of 65 for the best reject set and 10 for the other reject sets. A typical human player averages -50 and a good player can average +100.
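The search loop has roughly the following shape. This is a sketch under assumptions, not the actual tuning code: `evaluate` stands in for "play many games and return the average score", new candidates are made by adding Gaussian noise to the current best (the mutation method is my guess), and the current best is kept in each new population.

```python
import random

def evolve(evaluate, start, population=20, generations=100, sigma=0.1, seed=0):
    """Repeatedly keep the top-scoring factor set and breed a new population from it."""
    rng = random.Random(seed)
    best = list(start)
    best_score = evaluate(best)
    for _ in range(generations):
        # Keep the current best and add mutated copies of it (assumed mutation scheme).
        candidates = [best] + [
            [v + rng.gauss(0, sigma) for v in best]
            for _ in range(population - 1)
        ]
        scored = [(evaluate(c), c) for c in candidates]
        best_score, best = max(scored, key=lambda sc: sc[0])
    return best, best_score

# Toy stand-in for "average score over many games": closeness to a made-up target.
def toy_evaluate(factors):
    target = [1.0, 0.5, 0.2, 0.3, 0.8]
    return -sum((f - t) ** 2 for f, t in zip(factors, target))
```

Running `evolve(toy_evaluate, [0.0] * 5)` walks the five values toward the toy target. With real games the evaluation is noisy, which is presumably why so many games are needed per generation.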