Is it possible for a computer program to write its own programs? Could human software developers be replaced one day by the very computers that they master? Just like the farmer, the assembly line worker, and the telephone operator, could software developers be next? While this kind of idea seems far-fetched, it may actually be closer than we think. This article describes an experiment to produce an AI program, capable of developing its own programs, using a genetic algorithm implementation with self-modifying and self-improving code.
The above programming code was created by an artificial intelligence program, designed to write programs with self-modifying and self-improving code. The program created the above result in 29 minutes. The programming language is brainfuck. Why this programming language? Read on.
All code for the AI program is available at GitHub.
Artificial intelligence has been progressing steadily over the years, along with advances in computer technology, hardware, memory, and CPU speeds. As computers get faster, more computations can be performed, allowing increasing power for the computation-intensive processing required by many AI algorithms.
It’s been somewhat of a hobby for me, dabbling with artificial intelligence programs in an attempt to write a program that can, itself, write programs. Of course, I’m not referring to programs that take subsets of program instructions or code blocks and combine them together or otherwise optimize to produce an end result. I’m referring to starting from complete scratch, with the AI having absolutely no knowledge whatsoever about how to program in the target language. The AI must learn, on its own, how to create a fully functioning program for a specific purpose.
I initially began this venture in the late 1990’s by attempting to create programs with simple if/then/else statements to output programs in the BASIC programming language. This was a difficult task for a multitude of reasons. First, using if/then/else conditionals to write a random program doesn’t seem very intelligent at all. Second, the number of computer instructions available in BASIC is far too great. Even more troublesome, some of the instructions are downright dangerous (Shell(“format c:”))! I also attempted to generate programs using C, C++, and a few other languages. However, this naive approach never produced a working child program. Although this was due not just from using simple if/then/else statements, but also because the selected programming language was intended to be usable by humans - not computers, and thus, far more complicated for an AI to automate.
While the ultimate goal would be to produce a computer program capable of writing its own word processing software, image editing tool, web browser, or disk defragmenter, I was more-so interested in a simple proof-of-concept that demonstrated the idea was possible.
My initial motivation behind the idea came from the infinite monkey theorem, which states if you have 1,000+ monkeys banging away on a typewriter long enough, they’ll eventually re-produce a play written by Shakespeare. This sounds ridiculous, but given enough time, surely the monkeys would end up hitting “some” random sequence of characters that end up producing the written works. Simplifying this idea, surely one of the monkeys would at least hit the first letter of a Shakespeare play somewhere in the banging of the keyboard; that’s certainly possible.
What if you could guide the monkeys? Each time one of the monkeys hits a correct key in the right sequence you reward him with a banana? After a long enough time, maybe the monkey would begin to pick up a pattern? If he’s really sharp, maybe he’d even begin to pick up which letters typically go together to form words in the English language and thus, capitalize on many more bananas than his peers.
This is the general idea behind a genetic algorithm. A genetic algorithm is a type of artificial intelligence, modeled after biological evolution, that begins with no knowledge of the subject, aside from available tools and valid instructions. The AI picks a series of instructions at random (to serve as a piece of DNA) and checks the fitness of the result. It does this with a large population size, of say 100 programs. Surely, some of the programs are better than others. Those that have the best fitness are mated together to produce offspring. Each generation gets a bit of extra diversity from evolutionary techniques such as roulette selection, crossover, and mutation. The process is repeated with each child generation, hopefully producing better and better results, until a target solution is found. Genetic algorithms are programmatic implementations of survival of the fittest. They can also be classified as artificially intelligent search algorithms, with regard to how they search an immense problem space for a specific solution.
While the original experiments using BASIC, C, C++, and other languages failed to produce results, I was able to successfully generate AI written programs by combining a home-grown programming language (consisting of add, subtract, loop, etc) with genetic algorithms and neural networks. While this was interesting, the end result was simple math calculation and the programming language itself, was unknown and had severe limitations with what it could ultimately produce.
I began searching for a simple programming language, with a limited number of instructions, that I could train an AI program to use. Assembly (ASM) is close, but still contains far too many permutations. As humorous as it sounds, I eventually experimented with brainf-ck and was finally able to successfully generate the code shown above.
While brainf-ck is intended to be a joke programming language, due to how difficult it is for humans to work with, it actually has several distinct advantages for computers.
A Turing complete programming language means that it’s theoretically capable of solving any computational problem in the universe. A programming language with this capability opens up a vast array of possibilities. After all, most, if not all computer programs are designed to perform some kind of computation and output the result in some manner.
The simplified instruction set reduces the search space in which a target program code can be found. As computers get faster, larger problem spaces can be searched. However, on a personal computer, the search space needs to be constrained. By limiting the programming instruction set to just 8 different characters, the AI can run much faster and obtain an optimal fitness score within a reasonable amount of time (ie., minutes, hours, or possibly even a day).
The instruction set is well documented and easy to understand. Therefore, it’s quite straightforward to create a simple interpreter that can execute a program. By including the interpreter within the AI program + genetic algorithm itself, the code can be optimized to run much faster than if it had to call an outside compiler to execute each child program. This also provides security constraints, since the child programs are running within a controlled environment, within the AI program. The AI also gains access to the internal components of the interpreter, such as memory, instructions, and output. This is useful in calculating a fitness score. Whereas, with a 3rd-party compiler, these components would be far more difficult to access.
The AI program used in this article is designed in C# .NET and uses an array of doubles to serve as a genome. Each double (gene) in the genome corresponds to an instruction in the programming language. Since each instruction is just 1 byte, it’s easy to map each gene to a programming code (note, 1 double = 8 bytes; still equivalent to one slot in the array).
Most interpreters for the programming language simply execute the code, maintain memory values, and include support for console input/output. However, it’s possible to expand the interpreter to include support for producing graphics, network capability, file-system access, and much more. Just think of the power you could give the AI to develop its own programs! ;)
The AI program works, as follows:
- A genome consists of an array of doubles.
- Each gene corresponds to an instruction in the brainf-ck programming language.
- Start with a population of random genomes.
- Decode each genome into a resulting program by converting each double into its corresponding instruction and execute the program.
- Get each program’s fitness score, based upon the output it writes to the console (if any), and rank them.
- Mate the best genomes together using roulette selection, crossover, and mutation to produce a new generation.
- Repeat the process with the new generation until the target fitness score is achieved.
Since the fitness method is the most computationally expensive part (it has to execute the program code for each member in the population, which probably includes infinite loops and other nasty stuff), the AI program uses the Parallel.ForEach method, found in .NET 4.5. In this manner, it can execute multiple fitness algorithms for multiple genomes in the population, upon each generation. This allows the program to utilize maximal CPU resources and take advantage of multiple CPU cores. The program also saves its state every 10,000 generations, in case the program or PC is shutdown, and it can continue searching from where it left off.
The fitness method works by scoring the output of the generated program. The score is calculated by looking at each character output by the program (if any output was produced at all) and subtracting its value from the desired character:
fitness += 256 - Math.Abs(console[i] - targetString[i]);
Of course, initially most generated programs won’t even compile, let alone output text to the console. These are simply discarded, favoring programs that at least output something; and further guided and evolved until the output result is closer and closer to the desired solution.
Brainf-ck consists of the following instruction set:
> Increment the pointer.
The AI successfully wrote a program to output “hi” after 5,700 generations in about 1 minute. It produced the following code:
While the above code contains parsing errors, such as non-matching brackets, our simulation interpreter computes the result up until the program fails, so in the above case, the syntax error (which is later on in the code, after a solution is found) doesn’t impact the fitness.
You can try pasting the above code into a brainf-ck interpreter. Click “Start Debugger”, ignore the warnings, then click Run To Breakpoint. Note the output.
If we trim off the excess code, we see the following syntactically-valid code:
You can view the screenshots below, taken while the program was running:
This is the history graph, plotting the fitness score over time. You can see how the AI learned how to program in the target language and achieve the desired solution.
The AI successfully wrote a program to output “hello” after 252,0000 generations in about 29 minutes. It produced the following code:
During the generation process, the AI came pretty close to a solution, but a couple letters were bound to each other, within a loop. The AI was able to overcome this by creating an inner-loop, within the problematic one, that successfully output the correct character, and continued processing.
The AI successfully wrote a program to output “Hi!” after 1,219,400 generations in about 2 hours and 7 minutes. It produced the following code:
This is actually one of my favorites. Run it and you can see why (click Start Debugger and Run to Breakpoint). It’s almost as if the computer knows what it’s doing. It’s interesting to note how generating this program took quite a bit longer than the prior two. This is likely due to the characters used, which include a capital letter and a symbol. The other two examples used characters that are much closer in value in the ASCII system, which would be easier for the AI to find.
The AI successfully wrote a program to output “reddit” after 195,000 generations in about 22 minutes. It produced the following code:
This one was a challenge. It may have been tricky due to its length, or possibly due to the location of the d’s. The AI would repeatedly get stuck within a local maximum. A local maximum is when a genetic algorithm finds the best fitness that it can see within its current parameters, even though a better fitness may exist. The AI is unable to get out of its hole and achieve the better fitness because doing so would require the fitness to drop before increasing again, which is generally against the rules of a genetic algorithm.
I was able to resolve this issue by including additional diversity in the mutation function. Previously, the mutation worked by simply altering a single instruction in the genome. Mutation was enhanced to include not just mutating a single bit (replacement mutation), but also shifting the bits up (insertion mutation), and shifting down (deletion mutation). This extra diversity allowed the AI to keep moving.
This was produced after 580,900 generations in about 2 hours. It produced the following code:
If you trim off the excess, the actual code that prints the text is much shorter:
I love all humans
This was produced after 6,057,200 generations in about 10 hours. It produced the following code:
If you trim off the excess, the actual code that prints the text is shorter:
In the above run, the AI was supplied with a starting program array size of 300 instructions (ie., 300 bytes, or rather 2,400 bytes since 1 double = 8 bytes). The full length of the program code was not needed by the AI. It was able to write the program with just 209 instructions.
Note, this solution took 10 hours to complete. However, keep in mind, with an AI program doing the programming, rather than a human, the amount of time required to complete a program is of less concern. The AI can simply be left running in the background, while the human works on other tasks. I also expect the computation time to be dramatically reduced as computers get faster in the coming years.
This experiment was a proof-of-concept that an AI program could develop its own computer programs that perform a specific task. In that regard, it was a success. The AI was able to start with absolutely no knowledge of the target programming language and successfully learn how to generate a valid computer program, which when executed, solved a particular task.
As with all genetic algorithms, there was work involved with designing the fitness function. The fitness function is equivalent to describing to the AI what you’re looking for. In this way, creating the fitness function itself, is a bit like programming (on behalf of the human). If it were possible for the AI to develop its own fitness function, this would be a step forward. In the meantime, it may still be possible to grow this project to create more complex child programs, such as those that take user input and compute results.
Ten years ago, this program would not have succeeded within any reasonable amount of time. Five years ago, this program would likely have taken days to weeks, possibly even longer. Today, the execution took a matter of minutes. Tomorrow, the program might run in milliseconds. As computers grow faster and more powerful, larger and larger search spaces can be computed. I can’t wait.
If you’ve found this interesting and want to learn more, download the full source code at GitHub or contact Kory Becker. Read my tutorial on using genetic algorithms and neural networks in C# .NET. Program executables in this article are compiled with Brainfuck.NET Compilator.
Want to see what else the AI can do? Me too! Read more in the follow-up articles: Pushing the Limits of Self-Programming Artificial Intelligence and Self-Programming Artificial Intelligence Learns to Use Functions.
This article was written by Kory Becker, software developer and architect, skilled in a range of technologies, including web application development, machine learning, artificial intelligence, and data science.Sponsor Me