Neural Network Black Jack

For a final project in my Neural Networks course, my partner and I created a Black Jack game in C++ that uses a reinforcement learning neural network. The game has three players: the dealer, neural network 1, and either a human player or neural network 2.

For those unfamiliar with it, Black Jack (or Twenty-One) is a card game in which the player tries to beat the dealer by obtaining a sum of card values higher than the dealer's without exceeding 21. Each number card is worth its face value; the ace can be counted as 1 or 11, at the player's choice, and the face cards are counted as 10. At the beginning of the game, each player is dealt two cards, one face up and one face down. After looking at these first two cards, the player chooses to draw another card (hit) or to stop drawing cards (stand). The player may take as many hits as desired without busting, i.e., as long as the sum of card values in the hand does not exceed 21. Once all the players have finished their hands, the dealer reveals the face-down card and draws cards until reaching a total of 17 or above (the standard dealer rule in casino Blackjack).
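The card-value rules above can be sketched as a small C++ helper. This is a minimal illustration, not the code from our game: the rank encoding (1 = ace, 11 to 13 = face cards) and the names `cardValue`, `handValue`, and `isBust` are assumptions made for the example.

```cpp
#include <vector>

// Assumed encoding: rank 1 = ace, 2..10 = number cards, 11..13 = jack, queen, king.
int cardValue(int rank) {
    if (rank >= 11) return 10;  // face cards count as 10
    return rank;                // ace counts as 1 here; handValue may promote it
}

// Best total for a hand: count every ace as 1, then promote a single ace
// to 11 if that does not push the total past 21.
int handValue(const std::vector<int>& ranks) {
    int total = 0;
    bool hasAce = false;
    for (int rank : ranks) {
        total += cardValue(rank);
        if (rank == 1) hasAce = true;
    }
    if (hasAce && total + 10 <= 21) total += 10;
    return total;
}

// A hand busts when even its best total exceeds 21.
bool isBust(const std::vector<int>& ranks) {
    return handValue(ranks) > 21;
}
```

Only one ace can ever be promoted to 11, since two aces at 11 already total 22.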

In our implementation of the game, the dealer plays according to a flat rule. If the dealer's total is below 17, the dealer hits. If the dealer's total is 17 or more, the dealer stands, unless a player has a higher total, in which case the dealer hits. This flat rule served as our benchmark. Techniques such as "splitting pairs" and "doubling down" were left out for the sake of simplicity. Our goal was to use a neural network to play Black Jack, not to create a complex Black Jack simulation.
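As a rough sketch, the flat dealer rule could be written as a single function. The name `dealerShouldHit` and its parameters are hypothetical, and the code assumes that "a player has it beat" means some player who has not busted holds a strictly higher total than the dealer.

```cpp
// dealerTotal:     the dealer's current hand total.
// bestPlayerTotal: highest total among players who have not busted,
//                  or 0 if every player busted (an assumed convention).
bool dealerShouldHit(int dealerTotal, int bestPlayerTotal) {
    if (dealerTotal > 21) return false;    // dealer has already busted
    if (dealerTotal < 17) return true;     // below 17: always hit
    return bestPlayerTotal > dealerTotal;  // 17 or more: hit only when behind
}
```

With a dealer total of 18 and a player standing on 20, the function returns true, so the dealer keeps drawing instead of accepting the loss.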

The learning equation for our neural network was:

Q(s,a) = Q(s,a) + α(r + γQ(s',a') - Q(s,a)), where
- γ is the discount factor
- α is the learning rate
- s is the state
- s' is the next state
- a is the action
- a' is the next action
- and r is the reinforcement signal

To clear up any confusion with the equation (or possibly to confuse you even more), I'll take you through an example of our neural network learning. Our reinforcement signal r takes one of three values: -1 if the NN busts, +1 if the NN wins, and 0 otherwise. Our learning rate α is 0.1, and our discount factor γ is 0.9. We also introduced two terminal states with fixed values: s=21 (Q=1) for a perfect score, and s=-1 (Q=-1) for a bust.
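These settings could be written down in C++ roughly as follows. The numeric values come from the paragraph above; every identifier (`kLearningRate`, `Outcome`, `stateFor`, and so on) is hypothetical and chosen only for this sketch.

```cpp
constexpr double kLearningRate = 0.1;  // alpha
constexpr double kDiscount = 0.9;      // gamma
constexpr int kBustState = -1;         // terminal state for a bust
constexpr int kPerfectState = 21;      // terminal state for a perfect score

enum class Outcome { Bust, Win, Other };

// Reinforcement signal r: -1 on a bust, +1 on a win, 0 otherwise.
double reinforcement(Outcome outcome) {
    switch (outcome) {
        case Outcome::Bust: return -1.0;
        case Outcome::Win:  return 1.0;
        default:            return 0.0;
    }
}

// Map a hand total to a state: any total above 21 collapses to the bust state.
int stateFor(int handTotal) {
    return handTotal > 21 ? kBustState : handTotal;
}

bool isTerminal(int state) {
    return state == kBustState || state == kPerfectState;
}

// Fixed Q value of a terminal state (only meaningful when isTerminal(state)).
double terminalQ(int state) {
    return state == kBustState ? -1.0 : 1.0;
}
```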

1. Say the NN is initially dealt a 9 and a 4. The initial state of the system corresponds to the value of the hand, so s=13.
2. The NN chooses the action with the highest Q value. For example, if Q(13,hit) equals -0.3 and Q(13,stand) equals -0.5, the NN takes action 'hit.'
3. The NN receives a new card and modifies its state. Say it receives an ace, counted as 1: its state is now s=14.
4. The state-action value Q(13,hit) now needs to be updated according to the learning equation: Q(13,hit) = Q(13,hit) + 0.1(r + 0.9Q(s',a') - Q(13,hit)). This requires the selection of a next action. The NN chooses the next action as it did before, by picking the higher of Q(14,hit) and Q(14,stand). Assuming the next action is 'hit,' Q(13,hit) is updated using Q(14,hit), the estimated value of the next state-action pair. No win or bust has occurred yet, so r=0. If Q(14,hit) equals -0.4, Q(13,hit) is updated as follows:
   - Q(13,hit) = -0.3+0.1(0+(0.9*-0.4)-(-0.3)) = -0.306
5. The NN decided to hit, so say it receives a 2 as the next card. The NN modifies its state to s=16.
6. Similar to step 4, the system updates Q(14,hit) as a function of Q(16,a'), where a' is the action chosen from state s=16. Say Q(16,hit) is larger than Q(16,stand), so the NN decides to hit.
7. Say the NN receives a 10 as its new card, which results in a bust. The NN lands in the terminal state (s=-1) with r=-1, and updates Q(16,hit) as follows (assuming Q(16,hit) was previously -0.45):
   - Q(16,hit) = -0.45+0.1(-1+(0.9*-1)-(-0.45)) = -0.595

NN Black Jack Files:

Presentation - the final presentation that we gave to the class. Most of our information was either stated verbally or demonstrated, so the slides themselves contain little detail.
Game (Self-Extracting Archive) - Install it and see how well you fare against our neural network!