Crawlers
April 2, 2025 / code, cuda, gol
As a second foray into CUDA, I’ve written a simple program to hunt for
crawlers in the Game of Life. The kernel itself is too revolting to
make public, but the supporting LifeStateCU
code may be useful for
others. (This could be seen as the first step in a GPU version of
CatForce
/LightCone
…)
The method by which each state is stored is taken from apg’s
lifelib
. Briefly, each warp stores a 64×64 torus using one uint4
per thread. This uint4
is a 2×2 array of uint32_t
values,
representing a 64×2 strip of the state.Annoyingly, the decade-old
CatForce
arranges its state in column-major rather than row-major
order, so there is a mismatch here.
Operations involving information
from multiple rows can be done using the efficient warp-level
intrinsics, avoiding any use of shared memory.
For example, fetching the state of a single cell can be achieved
having the thread that contains the relevant row shuffle the value of
that row to all other threads:Of course you should try desperately
to avoid writing code that works one cell at a time! This is just an
example.
uint64_t LifeStateCU::row(int y) const {
__device__ int src = (y & 63) >> 1;
if (y & 1) {
uint32_t lo = __shfl_sync(0xffffffffu, state.z, src);
uint32_t hi = __shfl_sync(0xffffffffu, state.w, src);
return (uint64_t)hi << 32 | lo;
} else {
uint32_t lo = __shfl_sync(0xffffffffu, state.x, src);
uint32_t hi = __shfl_sync(0xffffffffu, state.y, src);
return (uint64_t)hi << 32 | lo;
}
}
bool LifeStateCU::get(int x, int y) const {
__device__ uint64_t r = row(y);
return (r >> x) & 1;
}
The particular bit twiddles used in the advance()
step were
discovered by Julia “Vlamonster” using a SAT solver. I’ve added many
of the functions from LifeAPI
to make it easier to work with.
Given an active pattern, the program runs an almost-brute-force search on a list of reagent placements, running them a fixed duration and checking whether the active pattern reoccurs at any time, at any location. Trying to match the active pattern is fairly expensive, more expensive than advancing the state itself. We place reagents in the reaction envelope of the pattern, rather than in some region fixed in advance.
The list of reagents to try was determined by a one-off script that ran some soups and catalogued the most common ash. Here’s what it found:
That is, block, hive, blinker, traffic light, loaf, ship, honeyfarm, blockade, pond, tub, half traffic light, 3/4 traffic light, teardrop, ship, followed by
- This specific pair of block and hive:
b2o$o2bo2b2o$b2o3b2o!
- Biblock:
2o$2o2$2o$2o!
- Half blockade:
2o$2o2b2o$4b2o!
- Wider half blockade:
b2o$b2o4$2o$2o!
- Long boat:
2bo$bobo$obo$2o!
- This specific pair of blinkers:
o$o$o7$4b3o!
Results
Crawlers that can react more than once are surprisingly rare. Here are the only remotely feasible ones that I found.