Cellsprings Help: The CAR File Format

Rules in Cellsprings' built-in rulespaces are encoded using the CAR (for 'CA Rule') file format. (Note that seed states are encoded separately as standard GIF files.) CAR is a simple binary format I developed several years ago for my own use. Owing to its humble and distant origins, the current spec is not without its anachronistic adhocisms, but there is one thing it offers to the CA programming community of which I've yet to see the equivalent, viz., a flexible yet standard means of encoding exhaustive rule tables. Such tables are now in use in popular programs, and specialized search/editor programs are likely to appear soon to navigate such data. Thus there is an increasingly acute need for a common file format, and CAR might at least provide a starting point for one.

Cutting to the chase, here is the order of blocks in the file, linked to descriptions. The bracketed blocks are optional:

  1. Header
  2. [Name]
  3. [Description]
  4. Neighborhood Definition
  5. Rule Table

Header Block

Structure

The structure of this block is best described with a snippet of C++ code:
typedef unsigned char  uchar;
typedef unsigned short ushort;

struct SCARHeader {
  uchar  id[6];          // "CARule"
  ushort version;        // CAR version number in hundredths: 50 | 51
  uchar  ndim;           // number of CA dimensions
  ushort type;           // rule type code: 0 | 1 | 2 | 3
  ushort hoodsize;       // number of cells in hood, inc. center (symbol: H)
  ushort cellsize;       // number of values a cell may take on (symbol: C)
  uchar  bits_per_entry; // rule table entry packing size
  uchar  sname;          // size >= 0 of name block in bytes
  ushort sdesc;          // size >= 0 of description block in bytes
};
The multi-byte fields are encoded in little endian order (i.e., low byte first), because the original codec was implemented in a DOS/Intel environment.

Notice that the parameters permit considerable generality. However, as you'll see below, the present version of Cellsprings doesn't really take advantage of this flexibility. In particular, Cellsprings doesn't run non-2D CAs or generalized cell neighborhoods.

Encoding Tips

The version field should be written as 51 (CAR version 0.51). The older version serves to flag some earlier files that decoders may occasionally encounter which differ only in the neighborhood definition format. In writing other fields, be aware that the Cellsprings CAR decoder is currently fairly limited. Here's a list of restrictions it imposes on header field values:
  1. ndim: must be 2
  2. hoodsize: must be 5 or 9
  3. cellsize: maximum of 256
  4. bits_per_entry: must be 8
But just because Cellsprings doesn't support a particular rulespace is no reason not to take advantage of the flexibility of the format for use in other software. Cellsprings might even catch up someday.

Name Block

Structure

Not much to this. Zero to 256 bytes of character data to serve as the rule name.

Encoding Tips

Encoders must store the actual size in bytes of this block in the sname header field.

The Cellsprings decoder has another limitation you should know about. It reads this block, stores its content, but then precedes to do nothing with it, using the filename for the name instead. (CAR was first implemented in a 16 bit DOS program - that should tell you why this block is in the spec :)

Be that as it may, it might be a good idea to use this field, since even the 32-bit world has platform-dependent limitations on file naming, and there are also network restrictions. What I myself have been doing in practice, though, is lazily saving such "freer text" names as part of the description block.

Description Block

Structure

Same idea as the Name Block, but since the sdesc field in the Header Block is wider than the sname field, the description can contain more text. (In theory one could have 64K of description but that would be pretty dumb :)

Encoding Tips

Be aware that Cellsprings will not read this block if it is larger than 10K. Indeed, it aborts the entire decode. (The funny part is, I'm the one who tends to write insufferably long rule descriptions. Hahaha.)

Another Cellsprings limitation is that it makes the assumption that the data are ASCII - or, more precisely, that each byte represents a different character.

Neighborhood Definition Block

Structure

The hood definition is an array of byte-sized offset values specifying the relative-to-center ndim-dimensional coordinates for each of the hoodsize homies, where one set of coordinates follows another in the sequence. For 2D CAs, the column coordinate proceeds the row coordinate, and, as you'd expect in matrix-style indexing, the positive directions are "right" (East) and "down" (South). The total number of values is of course given by hoodsize x ndim.

Some examples follow. (In the notation, besides the standard abbreviations for compass directions, we have C=center, R=right, and L=left.)

NeighborhoodDefinition (signed bytes)
1D, Radius 1 (in R L C order) 1, -1, 0
1D, Radius 2 (in R2 L2 order) 2, -2, [Radius 1 definition]
2D, von Neumann (in E W S N C order) 1, 0, -1, 0, 0, 1, 0, -1, 0, 0
2D, Moore (in SE SW NE NW order) 1, 1, -1, 1, 1, -1, -1, -1, [von Neumann definition]

The neighborhood definition plays a central role in the interpretation of the subsequent rule table data. For exhaustive rules the order of homie locations is as important as the locations themselves, whereas for totalistic tables only the locations matter.

Decoding Tips

Ideally, the decoder will be able to flexibly decode the neighborhood and set up the requisite correspondence with table-entry ordering in a soft-coded manner. (Unfortunately, the present implementation of Cellsprings doesn't do this, as you'll see below.)

Note on reading version 0.50. In files of this "version", the definitions presume the opposite polarity of the y axis. If you encounter such a file, don't use its definition. Proceed as though the block consisted of either the von Neumann or the Moore definition listed above (depending on whether the hoodsize is 5 or 9). This will ensure that you correctly interpret the rule-table entries. Alternatively, you could just reject 0.50 files, which are likely to be quite rare (so let's stamp 'em out while we have the chance :) Instead Cellsprings/DT can be used to convert them - it reads both versions but writes only 0.51.

Encoding Tips

The encoder programmer should be aware that the Cellsprings decoder at present only reads von Neumann and Moore definitions, and their homie definitions are assumed to be ordered as listed above. (The decoder actually skips over the hood definition, using the hoodsize to discriminate between them. Odd that I would set up an elegant data generalization only to ignore it. Must have been "doing it for the children" :)

Though future decoders can and should be more flexible, it is suggested that you stick with my ordering for these two popular neighborhoods. Mine is as good as any and better than some. For one thing the center cell is the least significant entry in the neighborhood vector, which lets one stack neighborhood "layers" as illustrated above.

Rule Table Block

The format of this block depends on the type code in the Header Block, so I've broken up the discussion accordingly, but there is something that bears mentioning at the global level. Types 1 and 2 are "1-bit" types because their tables have 1-bit wide inputs and outputs. However, such a rule's cellsize may actually exceed 2 - the extra cell states are interpreted as refractory decay states. So with these rules, cellsize varies independently from the table. This is not the case with type 0, which is exhaustive in every sense - such a rule's cellsize is also the size of the table inputs.

Type 0 - Exhaustive

This type calls for an exhaustive lookup table containing C^H cell-values, one for each possible hood-state.

The cell-values are ordered in terms of a walk-through of the corresponding hood-states. The walk-through is a base-C ascension from 0 to C^H - 1, where particular sites in the hood correspond to particular significance positions in the digit string. This correspondence is given by the order of cell-sites in the hood definition, where the first homie defined corresponds to the most significant digit.

For a C3 von Neumann rule, for example, there are 243 table entries that might go like so:

EWSNC  Entry
00000  0
00001  0
00002  2
00010  1
00011  0
00012  2
.
.
.
22222  0
The digit strings on the left I call hood vectors, but, as I've indicated, you can think of them as single integers, that is, table indices in base 3 representation. If cellsize were 4 we'd want to count in base 4 instead, and go up to 1023 decimal.

Encoders should use this type for exhaustive tables whose cell inputs exceed 1 in value. When cellsize is 2, exhaustive tables should instead be encoded as type 2.

Type 1 - Outer Totalistic

The table tells what to do with a 1-bit cell as a function of the number of 1-valued neighbors. Thus the table is only H entries long, that is, it contains one entry for each possible neighbor-count from 0 to H - 1, and in that order.

Each entry consists of a 2-bit value, since there's a 1-bit output for each possible present value of center. The high bit of the entry is the "survival" bit and the low bit is the "birth" bit. Thus the entry can be seen as a code telling us how to determine the new cell value, case-statement style, as follows:

Table Entry Operation Name Output Center (X = Input Center)
0Clear0
1Toggle1 - X
2CopyX
3Set1

I used this encoding format because I had been using an ASCII case statement code. If I had it to do over again, I'd make it consistent with the exhaustive encoding by having H*2 1bit entries going like so:

Count   Center  Entry
0       0       ?
0       1       ?
1       0       ?
1       1       ?
.
.
.
H-1     0       ?
H-1     1       ?  
Notice that if we packed the entry bits into a single value, we wouldn't get the same value we'd get if we packed the current encoding. The difference is that the birth and survival places for a given count are switched. Center goes 101010... in the present encoding because the high bit is "survival". The mismatch with the ascending walkthrough of the exhaustive types is inelegant to say the least.

Ideally this would be changed in future CAR versions (assuming it is advisable to keep this type at all). As the type is used for other dimensions and neighborhoods, one would want to be able to just collapse and expand the neighbor vectors to convert between totalistic and exhaustive encodings without switching bits.

As mentioned in the section intro, a value for the header's cellsize field in excess of 2 defines a decay rule. The extra states are interpreted as refractory decay states.

Type 2 - General 1Bit

The table is an exhaustive table with single-bit inputs and outputs. The same encoding procedure is used as that listed for type 0 if cellsize were to equal 2. However, as with type 1 rules, cellsize may not actually be 2, and indeed that's the reason for splitting this type off from type 0. The cellsize - 2 extra cell states are interpreted as refractory decay states.

Type 3 - Cyclic

The "rule table" in this case consists of only one byte, the CCA threshold value. Use of this type is not recommended due to its ad hoc and limited nature. A more general cyclic rulespace would be better, but perhaps such types - and any number of them (well, up to 64K of them :) could be added to the spec - are best not implemented in CAR at all, but rather using springlets or another "serializable class" style mechanism.


Copyright © 1998-2000 J. M. G. Elliott.