Neural Network Sort

Introduction

Neural networks can be applied to some remarkably flexible problems. Typical examples include bitwise logic (AND, OR, XOR), classification of data, image recognition, and more recently, self-driving cars. You’ve probably heard about many of these examples already. But have you ever heard of neural network sorting? Well then, read on!

This tutorial demonstrates the (perhaps not so) surprising result of training a neural network to sort numbers. We’ll create a simple example of a neural network in R and feed it a training set of 3 numbers, from 1 to 100, in random order. The program will learn to output the numbers in ascending order.

A Simple Neural Network Example in R

I’ve written a couple of articles on neural networks in the past, with examples in both C# and JavaScript. For the past year or so, I’ve been working with R for the majority of my machine learning projects. I’ve found R to be one of the easiest and fastest ways to get a machine learning project up and running. Using the NeuronDotNet C# library or the brain node.js module usually required a bit of wrangling to get the data into the correct format for training. With the help of R’s caret library, it’s much faster to get a project running. Not to mention, R has a huge list of built-in machine learning models ready for use.

Let’s start with the basics. Below is the code for a very simple neural network in R that learns to calculate the square root of a number.

library(caret)

# Generate 50 random numbers uniformly distributed between 0 and 100.
training <- data.frame(x = runif(50, min = 0, max = 100))
training$y <- sqrt(training$x)

# Generate some squared numbers for cross-validation.
cv <- data.frame(x = (1:10)^2)
cv$y <- sqrt(cv$x)

# Run the neural network training: 1 input node, 6 hidden nodes, 1 output node.
fit <- train(y ~ ., data = training, method = 'neuralnet',
             tuneGrid = expand.grid(layer1 = 6, layer2 = 0, layer3 = 0))

# Evaluate the results against the cross-validation set.
results <- predict(fit, newdata = cv)
cbind(cv, predict = round(results))

In the above code, we simply generate some random numbers and compute their square roots. We have both a training set and a cross-validation set. We then run the “train” method, using a neural network model. In this example, we’re using 1 input node (the number to find the square root of), 6 hidden nodes, and 1 output node (the square root). The train method figures out the number of input and output nodes based upon the data, so that’s pretty handy.

The final lines of code simply run the trained neural network on our cross-validation data. The output looks like this:

x y predict
1 1 1 1.556684143
2 4 2 2.114010068
3 9 3 2.975641028
4 16 4 3.987989475
5 25 5 5.004697727
6 36 6 5.998886923
7 49 7 6.997019411
8 64 8 8.000357771
9 81 9 9.003102803
10 100 10 9.992746337

Since the neural network outputs float values, we’ll simply round them. The rounded values are shown in the “predict” column below.

x y predict
1 1 1 2
2 4 2 2
3 9 3 3
4 16 4 4
5 25 5 5
6 36 6 6
7 49 7 7
8 64 8 8
9 81 9 9
10 100 10 10

As you can see, the neural network has successfully learned to approximate the square root, with 90% accuracy on the cross-validation set (the first value, 1.56, rounds to 2 instead of 1).

That’s a pretty simple example of a neural network in R. And it’s less than 10 lines of code. Now, back to our main project!

Neural Network Sort

It’s time to sort some numbers. The input to our neural network will be 3 numbers from 1 to 100. The output will be the same numbers in ascending order. We’ll use 2 hidden layers of 40 neurons each, trained to a threshold of 0.001, with a learning rate of 0.6 and a training set size of 750 examples. Here’s the code:

library(neuralnet)

# Helper method to generate a training set of size random numbers (a, b, c)
# along with their sorted order (x, y, z).
generateSet <- function(size = 100, max = 100) {
  # Generate size random numbers between 1 and max.
  training <- data.frame(a = sample(1:max, size, replace = TRUE),
                         b = sample(1:max, size, replace = TRUE),
                         c = sample(1:max, size, replace = TRUE))

  # Generate output examples by sorting the numbers in each row.
  output <- data.frame()
  x <- sapply(1:nrow(training), function(i) {
    row <- training[i, ]
    sorted <- row[order(row)]
    output <<- rbind(output, unlist(sorted))
  })

  # Append the sorted output to the training set.
  names(output) <- c('x', 'y', 'z')
  cbind(training, output)
}

# Helper method to restore the original values after scaling.
# x is the object to unscale, orig is the originally scaled data.
unscale <- function(x, orig) {
  t(apply(x, 1, function(r) {
    r * attr(orig, 'scaled:scale') + attr(orig, 'scaled:center')
  }))
}

# Generate training and cv data.
data <- generateSet(1500)

# Normalize the data.
data <- scale(data)

# Split into a training and cv set.
half <- nrow(data) / 2
training <- data[1:half, ]
cv <- data[(half + 1):nrow(data), ]

# Train the neural network.
fit <- neuralnet(x + y + z ~ a + b + c,
                 data = training,
                 hidden = c(40, 40),
                 threshold = 0.001,
                 rep = 1,
                 learningrate = 0.6,
                 stepmax = 9999999,
                 lifesign = 'full')

# Check the results.
results1 <- round(unscale(compute(fit, training[, 1:3])$net.result, data))
results2 <- round(unscale(compute(fit, cv[, 1:3])$net.result, data))

# Count rows that are entirely correct. Note, we use round(i, 10) when
# comparing floating-point equality: http://stackoverflow.com/a/18668681.
correct1 <- length(which(rowSums(round(unscale(training[, 4:6], data), 10) == results1) == 3))
correct2 <- length(which(rowSums(round(unscale(cv[, 4:6], data), 10) == results2) == 3))

# Calculate accuracy.
print(paste('Training:', correct1 / nrow(training) * 100, '% / CV:', correct2 / nrow(cv) * 100, '%'))

Aside from the two helper methods for generating and de-normalizing data, the code is fairly straightforward. We cut the generated data in half for a training and a cross-validation set, then train the neural network. Once complete, we count the number of correct results on both the training and cv sets, which gives us our final accuracy. You can then run the neural network on new data, and it should successfully sort any 3 numbers within the range of 1-100.
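The unscale helper works because R’s scale() function stores the centering and scaling values it used as attributes on its result. Here is a minimal base-R sketch of that round trip (the matrix here is just illustrative data, not from the project):

```r
# scale() normalizes each column and records how it did so.
m <- matrix(c(1, 2, 3, 10, 20, 30), ncol = 2)
s <- scale(m)  # stores 'scaled:center' and 'scaled:scale' attributes

# Undo the normalization row by row, exactly as unscale() does.
restored <- t(apply(s, 1, function(r) {
  r * attr(s, 'scaled:scale') + attr(s, 'scaled:center')
}))

# The restored matrix matches the original values.
all.equal(as.numeric(restored), as.numeric(m))
```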

Choosing the Neural Network Parameters

The method I used to identify the number of neurons and the various configuration parameters was learning curves.

You can create a learning curve by first choosing specific parameters for the network, then running the training over gradually increasing training set sizes.

For example, run the network on 25, 50, 75, 100 examples. At the lowest number, the training accuracy should be near 100%. After all, it’s usually easy for a neural network to learn to match the output from a couple of training examples. The cross-validation set, which the network has not trained on, should have a very low accuracy. As the examples increase, the training accuracy should decrease slightly (more examples take more effort to train) and the cross-validation accuracy should increase (more examples lead to more learning). Eventually, the two accuracy lines tend to converge.
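The loop behind a learning curve can be sketched in a few lines of base R. To keep the sketch fast and self-contained, it uses a cubic lm() fit on the earlier square-root task as a stand-in for the neural network, and measures error (RMSE) rather than accuracy; the idea is the same: train on growing subsets and score both the training and cross-validation sets each time.

```r
set.seed(1)

# Cross-validation set: the same squared numbers used earlier.
cv <- data.frame(x = (1:10)^2)
cv$y <- sqrt(cv$x)

# Train on increasing numbers of examples, recording both errors.
sizes <- c(25, 50, 75, 100)
curve <- t(sapply(sizes, function(n) {
  training <- data.frame(x = runif(n, 0, 100))
  training$y <- sqrt(training$x)
  fit <- lm(y ~ poly(x, 3), data = training)
  trainErr <- sqrt(mean((predict(fit) - training$y)^2))
  cvErr <- sqrt(mean((predict(fit, newdata = cv) - cv$y)^2))
  c(size = n, train = trainErr, cv = cvErr)
}))

curve  # plot(curve[, 'size'], curve[, 'cv'], type = 'l') to chart it
```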

The R code for generating the learning curve for this project draws a chart like this:

Neural Network Sort Learning Curve - sorting 3 numbers

You can see how the two curves for the training set and cross-validation set converge towards a common accuracy. Now, keep in mind, since we’re generating sets of 3 numbers from 1-100, there is a finite number of possible arrangements. If we trained the network on all of them, we’d defeat the purpose! Luckily, there are 100^3 = 1,000,000 possible ordered triples, and we’re only training on 750 of them (or 0.075%). Looking at the graph, we could probably train on just 300 examples and still achieve an accuracy around 80%. Let’s take a look at the results.

Results

Sorting 3 Numbers

After training Neural Network Sort on 750 examples of sorting 3 numbers, we see an accuracy of 100% training / 98% cross-validation.

Here is Neural Network Sort running (note, the variable named “fit” is our neural network model):

Training: 100 % / CV: 97.9 %
> test <- generateSet(10)
> testScaled <- scale(test, attr(data, 'scaled:center'), attr(data, 'scaled:scale'))
> testResults <- round(unscale(compute(fit, testScaled[,1:3])$net.result, data))
> test[,1:3]
a b c
1 8 4 61
2 94 39 97
3 55 1 63
4 61 63 52
5 17 15 39
6 79 18 91
7 73 38 89
8 79 53 17
9 20 31 12
10 87 35 26
> testResults[,4:6]
x y z
1 4 8 61
2 39 94 97
3 1 55 63
4 52 61 63
5 15 17 39
6 18 79 91
7 38 73 89
8 17 53 79
9 12 20 31
10 26 35 87
> paste('Test Results:', length(which(rowSums(round(test[,4:6], 10) == testResults[,4:6]) == 3)) / nrow(test) * 100, '%')
Test Results: 100 %

OK, that was running on just 10 samples. Let’s try it on something bigger: 5,000 examples.

> test <- generateSet(5000)
> testScaled <- scale(test, attr(data, 'scaled:center'), attr(data, 'scaled:scale'))
> testResults <- round(unscale(compute(fit, testScaled[,1:3])$net.result, data))
> head(test[,1:3])
a b c
1 59 28 52
2 100 40 72
3 22 58 75
4 35 21 7
5 94 83 39
6 68 71 32
> head(testResults[,4:6])
x y z
1 28 52 59
2 40 72 100
3 22 58 75
4 7 21 35
5 39 83 94
6 32 68 71
> paste('Test Results:', length(which(rowSums(round(test[,4:6], 10) == testResults[,4:6]) == 3)) / nrow(test) * 100, '%')
Test Results: 98.06 %

Sorting 4 Numbers

Increasing the number of values to sort increases the complexity. Using the same training set size of 750 examples (this time, sorting 4 numbers), we see an accuracy of 87% training / 50% cross-validation. The code was modified simply to include an extra number in the generated data and an extra input/output node in the neural network.
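For reference, a hedged sketch of what that four-number modification might look like; the helper name generateSet4 and the output column names (w, x, y, z) are illustrative choices, not taken from the original source:

```r
# Generate size random 4-tuples (a, b, c, d) plus their sorted order (w, x, y, z).
generateSet4 <- function(size = 100, max = 100) {
  training <- data.frame(a = sample(1:max, size, replace = TRUE),
                         b = sample(1:max, size, replace = TRUE),
                         c = sample(1:max, size, replace = TRUE),
                         d = sample(1:max, size, replace = TRUE))

  # Sort each row to produce the target outputs.
  sorted <- as.data.frame(t(apply(training, 1, sort)))
  names(sorted) <- c('w', 'x', 'y', 'z')
  cbind(training, sorted)
}

# The neuralnet formula gains one term on each side:
# fit <- neuralnet(w + x + y + z ~ a + b + c + d, data = training, ...)
```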

In the learning curve for this new training set, the increased complexity shows up as reduced accuracy. Longer training, along with a larger training set, would do much better. You can tell this from the learning curve below, since the cross-validation accuracy continues to increase with more training examples. Based on this chart, we might predict convergence around 80%.

Neural Network Sort Learning Curve - sorting 4 numbers

Let’s prove it. If we bump the training up to 1,500 examples (which took 4 hours to train on a quad-core i7), we get an accuracy of 93% training / 77% cross-validation.

Neural Network Sort Learning Curve - sorting 4 numbers, trained on 1000 examples

Neural Network Sort Wrapper

We can wrap the manual sorting code into a helper method named nnsort, as follows:

nnsort <- function(fit, scaleVal, a, b, c) {
  numbers <- data.frame(a = a, b = b, c = c, x = 0, y = 0, z = 0)
  numbersScaled <- as.data.frame(scale(numbers, attr(scaleVal, 'scaled:center'), attr(scaleVal, 'scaled:scale')))
  round(unscale(compute(fit, numbersScaled[, 1:3])$net.result, scaleVal))[, 4:6]
}

Then run it:

> nnsort(fit, data, 100, 20, 32)
x y z
20 32 100
> nnsort(fit, data, 37, 6, 4)
x y z
4 6 37
> nnsort(fit, data, 78, 1, 12)
x y z
1 12 78

Saving and Loading a Trained Neural Network

A previously trained neural network can be saved for future use, with the following commands:

# Save the neural network to disk.
save(fit, file='C:\\Users\\username\\Desktop\\fit.RData')
# Save the scaled data to disk.
save(data, file='C:\\Users\\username\\Desktop\\data.RData')

You can then load the trained neural network to run it. Here are the download links for the trained neural networks to sort 3 numbers and 4 numbers.

After downloading, you can run the trained network as shown below.

library(neuralnet)
# Load the neural network from disk.
load('trained.RData')
# Run Neural Network Sort (this call uses the 4-number network, with nnsort
# extended to accept four inputs).
nnsort(fit, data, 6, 4, 8, 21)

Conclusion

Neural networks can do a lot of things. As we’ve just demonstrated, sorting is one of them. Source code for this project is included above. A version in node.js is available as well; the JavaScript model is slightly different, but follows the same general pattern.

Neural Network Sort for node.js

About the Author

This article was written by Kory Becker, software developer and architect, skilled in a range of technologies, including web application development, machine learning, artificial intelligence, and data science.
