As a second foray into CUDA, I’ve written a simple program to hunt for crawlers in the Game of Life. The kernel itself is too revolting to make public, but the supporting LifeStateCU code may be useful for others. (This could be seen as the first step in a GPU version of CatForce/LightCone…)

The method by which each state is stored is taken from apg’s lifelib. Briefly, each warp stores a 64×64 torus using one uint4 per thread. This uint4 is a 2×2 array of uint32_t values, representing a 64×2 strip of the state.Annoyingly, the decade-old CatForce arranges its state in column-major rather than row-major order, so there is a mismatch here.

Operations involving information from multiple rows can be done using the efficient warp-level intrinsics, avoiding any use of shared memory.

For example, fetching the state of a single cell can be achieved having the thread that contains the relevant row shuffle the value of that row to all other threads:Of course you should try desperately to avoid writing code that works one cell at a time! This is just an example.

__device__ uint64_t LifeStateCU::row(int y) const {
  int src = (y & 63) >> 1;
  
  if (y & 1) {
    uint32_t lo = __shfl_sync(0xffffffffu, state.z, src);
    uint32_t hi = __shfl_sync(0xffffffffu, state.w, src);
    return (uint64_t)hi << 32 | lo;
  } else {
    uint32_t lo = __shfl_sync(0xffffffffu, state.x, src);
    uint32_t hi = __shfl_sync(0xffffffffu, state.y, src);
    return (uint64_t)hi << 32 | lo;
  }
}

__device__ bool LifeStateCU::get(int x, int y) const {
  uint64_t r = row(y);
  return (r >> x) & 1;
}

The particular bit twiddles used in the advance() step were discovered by Julia “Vlamonster” using a SAT solver. I’ve added many of the functions from LifeAPI to make it easier to work with.

Given an active pattern, the program runs an almost-brute-force search on a list of reagent placements, running them a fixed duration and checking whether the active pattern reoccurs at any time, at any location. Trying to match the active pattern is fairly expensive, more expensive than advancing the state itself. We place reagents in the reaction envelope of the pattern, rather than in some region fixed in advance.

The list of reagents to try was determined by a one-off script that ran some soups and catalogued the most common ash. Here’s what it found:

That is, block, hive, blinker, traffic light, loaf, ship, honeyfarm, blockade, pond, tub, half traffic light, 3/4 traffic light, teardrop, ship, followed by

Results

Crawlers that can react more than once are surprisingly rare. Here are the only remotely feasible ones that I found.