A graphical representation of the increasing strength of computer Go programs
Graph, 1989-2017
Graph, 2008-2017
This page uses graphs to show the strength of the leading computer Go programs,
plotted against time.
The source of all the data is the page
Human-Computer Go Challenges,
which lists all the "official" human-computer Go games that I am aware of.
These are games and matches that were well publicised, or that were played,
as part of the event, between a human and the winner or winners at the end of
a computer Go tournament. Inclusion criteria are listed below.
Criteria for inclusion in the graph
A game is treated as a data point for the graph if all of these are true:
- The game was sufficiently public that I am aware of it and it is
listed here.
- It was played by a leading program.
- The human's rating is known.
- It was played on a 19×19 board.
- The handicap is known, and was less than 18 stones.
The data used to create the graphs is in this text file.
Weaknesses in the data
There are many weaknesses in the data.
- The programs use a very wide range of hardware. An event in November
2014 used 2-core laptops. AlphaGo's match in October 2015 used 1202
CPUs and 176 GPUs.
- While I have only included humans of known strength, their strengths
are given according to a variety of national and Go-server-based rating
systems. These differ considerably. However, I have treated them all in the
same way (except for Korean ratings, where I have treated "1-gup" as
professional 0-dan, and otherwise ignored amateur Korean ratings).
- The games involve "leading programs", which may be a grade or so weaker
than the strongest program of their day.
Assumptions
To make all the data comparable so that it can be included in one graph, I
have made some assumptions:
- A one-stone difference has the same meaning at all playing strengths.
- One amateur (dan or kyu) grade difference corresponds to one handicap stone.
- One professional grade difference corresponds to ⅓ of a handicap stone.
- Professional 1-dan corresponds to amateur 6⅓ dan, and so professional
9-dan corresponds to amateur 9 dan.
- An "n-stone" handicap is effectively worth n-½ stones.
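Taken together, these assumptions place every rating on a single numeric scale. As a rough illustration, here is a small Python sketch of that scale; the function names and the choice of amateur 1-kyu as the zero point are mine, not the page's.

```python
# A sketch of the strength scale implied by the assumptions above.
# Conventions here are my own: amateur 1-kyu is the zero point, and
# each unit of the scale is one handicap stone.

def amateur_dan(n):
    """Amateur n-dan: one grade per stone, 1-dan = 1.0."""
    return float(n)

def amateur_kyu(n):
    """Amateur n-kyu: 1-kyu = 0.0, 2-kyu = -1.0, and so on."""
    return 1.0 - n

def pro_dan(n):
    """Professional n-dan: pro 1-dan = amateur 6 1/3 dan, and each
    further professional grade is worth 1/3 of a stone."""
    return 6 + 1 / 3 + (n - 1) / 3

def adjusted_strength(rating, handicap_stones):
    """Effective strength of a player giving an "n-stone" handicap,
    treated as worth n - 1/2 stones."""
    return rating - (handicap_stones - 0.5)

# Consistency check from the assumptions: professional 9-dan
# comes out at amateur 9-dan.
assert abs(pro_dan(9) - amateur_dan(9)) < 1e-9
```

Note that, because of the half-stone assumption, adjusted strengths generally fall midway between whole grades: `adjusted_strength(amateur_dan(6), 11)` gives -4.5 on this scale.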
Details of the graphs
Wherever a human of known strength (taking account of the handicap used) lost to a
leading program, there is a black ribbon extending downwards from one stone weaker
than the human's adjusted strength to the bottom of the graph. The ribbon indicates
that it is slightly improbable that the program was weak enough to fall in the
strength range shown by the ribbon.
Likewise, wherever a human beat a leading program, there is a white ribbon extending
upwards from one stone stronger than the human's adjusted strength to the top of the
graph, indicating that it is slightly improbable that the program was strong enough
to fall in the strength range shown by the ribbon.
The ribbons are all partly transparent, allowing the combined effect of several
overlapping ribbons to be seen for games played on the same or close dates.
For example, at the end of 1997, three Taiwanese inseis (whom I have treated as
amateur 6-dan) played against Handtalk, then the world's leading program, all giving
11-stone handicaps. One insei won his game, the other two lost. The adjusted rating
of a 6-dan giving 11 stones is 6-kyu, so the graph shows, at the end of 1997, a black
ribbon extending down from 7-kyu and a white ribbon extending up from 5-kyu. The
black ribbon is actually two overlaid black ribbons for the two games lost by the
inseis, so is somewhat denser than the white ribbon.
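The ribbon rule described above can be sketched as a small function. The names, the axis limits, and the numeric scale (amateur 1-kyu = 0, one unit per handicap stone, so 6-kyu = -5.0) are my own illustrative choices, not the page's.

```python
# Illustrative sketch of the ribbon rule. Scale: amateur 1-kyu = 0.0,
# one unit per handicap stone (so 6-kyu = -5.0, 7-kyu = -6.0).
# GRAPH_BOTTOM and GRAPH_TOP are arbitrary axis limits for the sketch.

GRAPH_BOTTOM = -20.0
GRAPH_TOP = 10.0

def ribbon(human_adjusted, human_won):
    """Return (low, high, colour) for one game's semi-transparent ribbon.

    A human loss gives a black ribbon from one stone below the human's
    adjusted strength down to the bottom of the graph; a human win gives
    a white ribbon from one stone above it up to the top."""
    if human_won:
        return (human_adjusted + 1.0, GRAPH_TOP, "white")
    return (GRAPH_BOTTOM, human_adjusted - 1.0, "black")

# The Handtalk example: a human adjusted to 6-kyu (-5.0) who lost
# gives a black ribbon reaching up to 7-kyu (-6.0), and one who won
# gives a white ribbon starting at 5-kyu (-4.0).
assert ribbon(-5.0, False) == (-20.0, -6.0, "black")
assert ribbon(-5.0, True) == (-4.0, 10.0, "white")
```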
Graph, 1989-2017
Graph, 2008-2017
Reasons for the continuing improvement in strength
- Better programs. In the early days of computer Go, the programs were the work
of a few amateurs. But since about 1990, increasing numbers of programmers have been
working full time to make improvements. Increasingly, programs are the work of teams.
- Moore's Law. Processor power continues to double every two years. However, for
a period from about 2000 to 2006, increasing processor power had little effect on the
performance of Go-playing programs. Chess programs could use it to read ever deeper with
alpha-beta search, but Go programs could not make effective use of more power.
- Parallelisation. After the introduction of
UCT in 2006, more
processor power could once more be used effectively: a UCT search can readily be
parallelised.
- DCNN. Late in 2014, programmers started to consider the use of
Deep Convolutional Neural Nets. These have
proved very effective.
- Increased availability of massive processor power. Maybe this should just be
regarded as a consequence of Moore's Law. Graphics cards are now widely used by computer
Go programs, as they can support multiple parallel processes. The "Cloud" also helps:
it is now possible for an individual to hire a thousand processors for two hours,
or to borrow them from his employer for a weekend.
- Increased sponsorship. From early in 2016, and probably earlier but privately,
large corporations have been devoting significant resources to improving computer Go
software. Most notably, DeepMind, acquired by Google in 2014, created AlphaGo; also
Facebook created DarkForest, and Dwango supported DeepZen.
Last updated: 2016-02-08