Direct Construction of Random Latin Squares

by the Method of Circuits

Larry Trammell (unaffiliated)
September 12, 2021

(draft version 1.4)

 

Abstract

A method is presented for direct construction of "random" Latin squares, essentially indistinguishable from those sampled uniformly from the set of all possible Latin squares of the specified size. While it is still not known "exactly how random" the results are, many applications are likely to consider the statistical deficiencies acceptable, given the degree of computational efficiency. Construction proceeds cell by cell, using internal graph-theoretic sub-structures to enforce the invariant that at the completion of each step no Latin square properties are violated — short of incomplete symbol assignments. Termination is guaranteed with probability 1. Empirical tests indicate O(N3) computational complexity, but the cubic effects have little impact until the Latin squares are very large.

Keywords: Latin square, random sampling, algorithm, generator, method of circuits

 

Background

A Latin square is an N x N square layout of cells in which each cell is assigned a symbol from a set of N distinct symbols (arbitrarily indicted by labels "1" through "N" though any symbols may be used) such that each symbol appears exactly once in each row and exactly once in each column [11]. Figure 1 shows an example.


Figure 1 - An example Latin square of size 6

Some Latin squares display what seems a highly non-random structure. For example, the following Latin square has an obvious banded structure.


Figure 2 - A Latin square with regular internal structure

Predictable structures such as this one compromise the effectiveness of Latin square applications that rely on apparent randomness.

A highly-structured thing like a Latin square might not seem very random. The "randomness" is actually in the "sampling" process used to obtain it, not in the square itself. One way to "randomly" obtain a Latin square would be to draw symbols randomly out of a pool containing N copies of each of the symbols "1" through "N," without replacement, and place those symbols into randomly-selected empty cells of a Latin square array. If by chance the resulting layout satisfies the properties of a Latin square, then this is clearly a "randomly sampled" Latin square. Another approach is to pre-calculate all possible Latin squares and then randomly select one of them. The problem with both of these ideas is the impossible amount of required computation. The challenge is to obtain a result that is "sufficiently close to random" using a reasonable amount of computation.

 

Prior Art

The "Latin square completion" problem has a long history. A layout has the "Latin property" when no symbol appears twice in any row or in any column. [8] A Latin square is "partially populated" — for example, as rectangular sub-blocks — when the layout has a Latin property, part of the cells have assigned values, and the rest of the cells remain empty. A subsequent process can then attempt to fill the empty cells to "complete" the Latin square layout.

Colbourn [3] found that in general the complexity of the Latin square completion problem was NP-complete. Kuhl and Shroeder [9] show that even in the case of one empty row and one missing symbol, the completion problem remains nontrivial. A "divide and conquer" strategy proposed by DeSalvo [5] is also based on the idea of completion, partitioning the problem into an "easy" part and a "hard" part, where the scale of the "hard" part might be small enough to succumb to a brute force attack. However, this approach does not promise to scale up well for large Latin squares.

Given the challenges presented by a completion strategy, that idea is abandoned here. The new approach has a goal of "extending" a partial Latin square layout so that, at the end of each step, one additional cell is filled and the layout retains the Latin property. Repeating this for every unassigned cell results in a complete Latin square. Unlike the classical "completion problem," all prior symbol assignments are treated as advisory. Any prior symbol assignment is allowed to be altered in any manner that is expedient.

An established and practical way to obtain an "almost random" Latin square has been the method of Jacobson and Matthews [7]. In this approach, a systematically-constructed Latin square generated from a pattern is expanded to a 3-dimension sparse grid structure. Elements of this structure are "scrambled," in a manner that uses properties of the higher-order grid to preserve Latin square properties. The structure is then flattened back to a two-dimension layout to produce a Latin square that is "sufficiently random." Though very useful for small Latin squares, this approach has the significant drawback that it is iterative. The shuffling processing must visit each cell a sufficiently large number of times to thoroughly break up all traces of the initial patterns. What constitutes a "sufficiently large number of times" in this context remains unclear.

The method described here attempts to insert symbols into empty cells, as randomly as possible, without violating the properties of a Latin square. If a temporarily violation of Latin square properties occurs, a corrective action is applied to repair the integrity of internal sub-structures discussed in the next section, thereby restoring Latin properties.

There is some similarity to a path following algorithm such as Lemke's Method for solving a complementarity problem [4].

In applications such as secure hashing [10], a well-known practice is "tracing a path" through a Latin square by seeking a sequence of symbols specified by a given text, and maintaining a row-column position as "state." The internal sub-structures used by the new generator method are closely related to sequences generated by the special case of a text stream restricted to two distinct symbols appearing in an alternating sequence.

 

Internal Structures of Latin Squares

The Latin square construction method takes advantage of certain internal structures of Latin squares. This section characterizes these structures. There is nothing new about these; only their application to the problem of constructing randomized Latin squares is novel.

The internal structures of Latin squares are a subject of ongoing study. Cells in a Latin square can be interpreted as "places" or nodes as defined by Graph Theory. [6] The relationship of two symbols appearing in the same row or column of a Latin square can be interpreted as a "connection" or link as defined by Graph Theory. This leads to a representation of a Latin square as a mathematical graph structure. Note that in a Latin square, every symbol is related to every other symbol by being in the same row, and every node also is related to every other symbol by being in the same column. In addition, every node can be considered to have an implicit connection to itself. This kind of graph, where every node has a connection to every other node, is called a completely connected graph.

A relationship that results by appearing in the same row can be denoted by assigning a color property to the connecting edge of the graph. A relationship that results by appearing in the same column can be denoted similarly, using a second color property. For the case of a single pair of distinct symbols, and disregarding all other cells and links that do not involve these two labels, this identifies a subgraph in which every node has exactly two links, an "incoming" link of one color, and an "outgoing" link of the other color. A graph of this sort, where every node has one link of each color, is said to have rainbow coloration. [2]. Repeating this for all possible pair of symbols establishes an indexed collection of subgraphs — a factorization of the completely connected graph.

A path can be followed through this kind of two-colored graph by tracing edges with alternating colors from node to node. Because the number of rows and columns is finite, eventually the path must return to a node previously visited. The subset of nodes that are reached along a path in this manner is called a cycle. [2]

There are other ways to view these cyclic path structures. One way is in terms of transversals. In its basic form, a transversal is a set of N Latin square cells, each containing a different symbol, such that every row and every column contains one of the N transversal cells. A reduced partial transversal can also be defined using a subset of the N symbols [2]. In a Latin square, the locations bearing the same symbol form a 1-symbol transversal set. Given two of these 1-symbol transversals, the links going from nodes bearing the first label to nodes bearing the second label have one coloration, and the links going from nodes of the second set to the nodes of the first have the opposite coloration. In this view, the links can be considered to define a matching between the two sets. Every node has exactly one link with each coloration, hence the matching is a proper rainbow matching between the two sets.

Another useful view comes from combinatorics. Take two rows of length N, each containing the symbols 1 through N so that the columns (of two symbols each) satisfy the Latin property. That means that a symbol cannot appear at the same column location in both rows. Suppose that an element with symbol "A" appears at an arbitrary position within the first row. Some other symbol "B" appears in the second row immediately below it. This is interpreted as "the A symbol is transposed to the position where the B symbol was originally." This means that the "B" symbol was displaced. Looking for the "B" symbol in the first row, the element below it indicates some symbol "C" that the "B" symbol displaced. And so forth. Because there are only a finite number of symbols, eventually a location is reached where the "A" symbol started; the transposition sequence then forms a cycle.

In this view, the second row can be viewed as a kind of shuffling of the elements of the first row where no element remains in its original position — that is, the mapping has no fixed point. This special kind of permutation is known as a derangement. The combinatorial properties of derangements provide insights about properties of the Latin square and cycle sub-structures within it.

It is easily observed that N permutation steps are possible in a cycle within a row of N elements — but not 1 or N-1, as this would involve a permutation with a degenerate "fixed point." When the number of cycle steps is less than N, the symbols not encountered along the cycle will form their own isolated cycles, thus estabishing a "partition" of the columns within the two rows. The statistical properties of these cycles in randomly-selected derangements is discussed by Cavenagh, Greenhill, and Wanless. [1].

Any two rows and columns of a random Latin square exhibit the properties of random derangements. However, additional constraints come into play when more rows or columns are involved. For the case of any two rows, the number of possible mutual derangements is a well-defined number; but for more than two rows, it is easily demonstrated that the number of ways a row can be a derangement of all the others varies, depending on the cyclic sub-structures present in those other rows. An important special case, if there are N-1 mutual-derangement rows, there is only one way to construct an Nth row that is a derangement of the other N-1 rows. This property does not follow transparently from the properties of two-row derangements.

While all of these points of view are helpful, the terminology of graph factorizations, cycles, colorings, matchings, and derangements is not particularly conducive to the purposes here. Consequently, the cycle structures apparent in all of these various points of view are denoted as circuits, and the process of "tracing a path" is described as "tracing a circuit", analogous to following current flow through an electric circuit. These "circuits" correspond exactly to the "cycles" as described by Gyárfás and Sárközy [2].

The theoretical framework aside, circuit tracing is actually very easy. For example, Figure 3 shows a circuit within a Latin square, starting (arbitrarily) at the cell at the upper left corner. It carries the symbol "2". With "6" selected arbitrarily as the second symbol, follow a transition path across the first row from the cell with the "2" symbol to the cell with the "6" symbol. Having found this, transition along the column containing the "6" symbol to locate the row containing a "2" symbol, and so forth.


Figure 3 - A circuit in a Latin square

In this particular case, the choice of the symbol pair "2" and "6" generated a circuit with 2·N connected cells. Since this reaches all of the cells in the Latin square that have labels "2" or "6", this can be called a circuit of maximal length. This demonstrates that maximal circuits can exist. However, within the same Latin square, the symbols "2" and "1" can be observed to factor the rows and columns into two disjoint sets of less than maximal length.


Figure 4 - Two disjoint circuits partition a Latin square

To summarize the Latin square properties implied by the circuit paths that are particularly relevant here:

  1. For any two given symbols, there exists a transition between those two symbols in every row [column]. This arises from the Latin square property that every symbol appears in every row and in every column.
  2. The presence of each symbol exactly once in every row and column guarantees that a transition between any two symbols is unique in every row, unique in every column.
  3. Uniqueness of transitions in a circuit implies uniqueness of the circuits that contain them.
  4. Uniqueness of circuits implies that if more than one circuit is generated by a given number pair, these circuits are disjoint. It is not possible to trace along one circuit and encounter a transition leading to a different circuit for the same number pair. There are no "sub-circuit" paths that can bypass any of the transitions along any circuit.
  5. A circuit always has an even number of transitions. For every transition that departs along a row or column from one symbol, another transition arrives seeking the other symbol of the pair.
  6. Because there is a finite number of rows or columns, eventually a transition must reach the original row or column, thus arriving back at the (arbitrary) initial cell for the circuit. Because there is an even number of transitions for a cycle between two symbols, it is not possible to arrive back at the starting row or column "at the wrong symbol" of the pair.
  7. From finiteness of the square, the number of transitions along the path of any given circuit is firmly bounded to 2·N or less.

THEOREM: Assigning symbols to cells of a square layout in a manner that satisfies the stated circuit properties at every cell is necessary and sufficient to construct a Latin square.

PROOF: As discussed above, the Latin square properties imply the existence of the circuits. The uniqueness/disjointedness of the circuits implies that a symbol can appear at most once in any row or column, while the presence of N filled cells in each row and each column implies that every symbol must be present, thus forcing the Latin properties.

 

Partial Latin Squares and Circuits

A partial Latin square of size N has cells that are either empty or assigned some subset of the N eligible symbols in a manner such that a symbol does not appear more than once in any row or any column. That is, the layout satisfies the Latin property. A special trivial case is a layout with all of its cells empty. Simple accounting tells us that the presence of an empty cell means that neither the row nor the column containing that empty cell has all N symbols assigned. The presence of an empty cell does not violate the Latin properties.

A feasible symbol assignment is a placement of a symbol into a cell, such that the assigned symbol was not previously assigned to any of the other cells in either the row or the column that contains the cell. Feasible symbol assignments "extend" a partial Latin square. To formalize this, for any selected empty cell at a given row and column position in a partial Latin square, a set of row-feasible symbols [column-feasible symbols] is defined to be the set of symbols that have not previously be assigned to any cells in that row [column]. Observe that assignment of a symbol into a cell implies removal of that symbol from the row-feasibility and column-feasibility sets for that cell. If it becomes necessary to remove a symbol from a cell, this implies restoration of that symbol into the row-feasibility and column-feasibility sets for that location.

A partial circuit is a sequence of cells, starting with a selected pair of symbols, such that tracing the circuit from cell to cell leads to a row or column in which the symbol required to continue is not yet present. In this situation, it is not possible to continue tracing the circuit to return to the initial cell.

A partial Latin square can contain a mix of partial and complete circuits. The more unassigned cells in the partial Latin square layout, the more likely that paths traced along circuits will be partial. The cells along a partial circuit do not violate any circuit or Latin square properties.

Given a partial Latin square, any empty cell can be selected arbitrarily for a feasible symbol assignment. Given the goal of constructing a random Latin square, a symbol is selected randomly from the set of feasible symbols available to fill that cell.

 

Conflict Resolution

It is possible that past feasible assignments of randomly-selected symbols resulted in a partial Latin square layout having an empty cell such that the intersection of the row-feasibility set and the column-feasibility set for that cell is empty. In other words, "the Latin square cannot be completed" because any symbol assignment would duplicate a previous assignment in its row or in its column. We will call this situation a conflict. Systematic construction of the Latin square cannot continue while any conflict exists.

A resolution process is required to remove any conflict encountered. Since the cell is empty, a symbol must exist that is row-feasible, and another (different because of the conflict) symbol must exist that is column-feasible for that cell position. Without loss of generality, select a symbol from the row-feasible set and assign it to the empty cell, updating the row-feasibiliy set. This is not a column-feasible symbol assignment, since the selected symbol cannot also be not present in the column-feasible set while the conflict exists. Therefore, that first selected symbol is present somewhere else in the column.

Locate the conflicting symbol within that column. Remove that symbol from its cell. Return the symbol to the row-feasibility set for that cell location (but not the column feasibility set, because the preceding symbol assignment replaced it). These adjustments cure the problem of the conflicting symbol assignment in the column, but at the expense of creating an empty cell at a new location within the layout.

At the new empty cell, check whether there is now any symbol that is both row- and column-feasible there. Two cases are possible.

  1. There exists a symbol that is both row- and column-feasible. This will be a highly likely occurrence when there are many empty cells present in the layout. Assign the selected symbol to the cell. Remove this symbol from the row- and column-feasibility sets. The conflict is resolved and the resolution sequence terminates.
  2. There does not exist a symbol that is both row- and column-feasible at that location. In this case, a conflict persists, but it is now located at the recently-vacated cell, rather than the original one.

Resolving a new conflict at the second cell proceeds very much the same, but this time, select a symbol from the column-feasible set, and remove that symbol from the set. While this assignment is column-feasible, it is not row-feasible, again because of the conflict. Look through the row for the matching symbol. Remove that symbol, and restore it to the column-feasible set for that location. The layout is again restored to a partial Latin square, but a vacant cell was created at yet another new location.

It can be noted that the resolution process is now exactly as it was when the first conflict was detected — except now the empty cell with the conflict problem is at a completely different row and column location. Continue the resolution process as before, as many times as required until the conflict is resolved.

Which raises the question: is the conflict resolved? Or can the process spin around indefinitely? Before considering the general question of termination, first consider a special case which makes the termination issue clear.

 

Conflict Resolution using Circuits

The discussion of conflict resolution does not place any restrictions on the selection of a replacement symbol other than row- or column-feasibility. Consider now a special case for this selection.

When beginning a conflict resolution process, call the row-feasible symbol selected for assignment into the initial empty row cell "symbol 1". When another cell is vacated and some new column-feasible symbol is selected to refill that cell, call the symbol selected for this assignment "symbol 2". Thus, the resolution process assigns "symbol 1" to the initial cell, and assigns a replacement "symbol 2" somewhere in the column of the initial cell.

Note that replacement of "symbol 1" by "symbol 2" always restores "symbol 1" to the row-feasibility set of the last filled cell. Similarly, replacement of "symbol 2" by "symbol 1" always restores "symbol 2" to the column-feasibility set for the last filled cell. It is therefore possible to apply a restricted policy that alternately picks "symbol 1" or "symbol 2" for assignment at each step of the resolution process. In other words, using this policy, the resolution method traces a circuit induced by "symbol 1" and "symbol 2", alternately replacing "symbol 1" with "symbol 2", or "symbol 2" with "symbol 1" along the way.

With the observation that this process traces a circuit path, it is clear that one of two things must happen:

  1. reach the original cell where the resolution process started, at which point the conflict resolution process is finished;
  2. reach some other cell where the symbol assignment happens to result in no new conflict, and the conflict resolution process is finished.

In all cases, the tracing the circuit path leads to conflict resolution. No Latin square properties are violated, and no harm was done to any of the rows or columns not touched.

Note that each construction step starts by filling a cell that was previously empty, and every cell that was filled at the beginning of a conflict resolution process remains filled when the resolution process finishes, leaving no violations of Latin square properties. Thus forward progress is guaranteed. When the last cell is filled, the layout is a complete Latin square. This approach, of tracing along circuits, and swapping symbols until the partial or completed circuit structures are correctly structured, is called the "Simplified Method of Circuits."

A bound on circuit sizes within a Latin square is the length of a maximal circuit, 2·N steps, which means that every resolution process is bounded. Since there are only N2 locations, this bounds the number of conflicts. Thus the total resolution effort cannot exceed 2· N3 steps.

Listing 1 summarizes the Simplified Method of Circuits in a pseudo-code style. An executable implementation coded in Python is available for download [12] under the Creative Commons License.

 

Listing 1
Building a random Latin square using the Simplified Method of Circuits

MAIN PROGRAM

  # Construct required data structures
  DEFINE  Nrc = number of rows = number of columns
  DEFINE  A[Nrc,Nrc] = empty Latin square layout of dimension Nrc x Nrc
  DEFINE  rowavail[Nrc] = empty array of Nrc row-feasible sets
  DEFINE  colavail[Nrc] = empty array of Nrc column-feasible sets

  # Initialize the row and column feasibility sets
  FOR irc = 1 to Nrc
      FOR symbol = 1 to Nrc
          Add symbol into set rowavail[irc] 
          Add symbol into set colavail[irc]
      ENDFOR
  ENDFOR

  # Randomize the order of empty row-column locations
  shuffle(pool)

  # Assign value to Latin square layout cell by cell
  FOR irow in 1 to Nrc
      FOR jcol in 1 to Nrc
          avail = intersection of sets rowavail[irow] && colavail[icol] 
          IF avail is not empty  
              # No conflict
              symbol1 = randomly selected member of avail
              Assign A[irow,jcol] = symbol1 
              Remove symbol1 from set rowavail[irow]
              Remove symbol1 from set colavail[icol]
          ELSE                   
              # Conflict
              symbol1 = randomly selected member of rowavail[irow]
              symbol2 = randomly selected member of colavail[jcol] 
              Assign A[irow,jcol] = symbol1
              Remove symbol1 from set rowavail[irow]
              Resolve Conflict ( A, irow, jcol, symbol1, symbol2 )
          ENDIF
      ENDFOR
  ENDFOR
END MAIN PROGRAM

SUBPROGRAM Resolve Conflict ( A, iviol, jviol, sym1, sym2 )  
  LOOP
      Let i = index not equal to iviol where A[i,jviol] matches sym1
      IF i is empty
          EXIT LOOP

      iviol = i
      A[iviol,jviol] = sym2
      Add sym1 to set rowavail[iviol]
      IF sym2 is in set colavail[jviol]
          Remove sym2 from set colavail[jviol]
      
      Let j = index not equal to jviol where A[iviol,j] matches sym2
      IF j is empty
          EXIT LOOP

      jviol = j
      A[iviol,jviol] = sym1        
      Add sym2 to set colavail[jviol]
      IF sym1 is in set rowavail[iviol]
          Remove sym1 from set rowavail[iviol]
  ENDLOOP
END subprogram        

 

This is encapsulated by the following.

THEOREM:     The "Simplified Method of Circuits" terminates in a finite number of steps with a layout that is a complete Latin square.

This is efficient, and the results might be good enough for many applications. However, empirical results from statistical tests indicate that the results are not completely free of bias. If randomness is critical important, this is not a preferred method.

 

Randomized Conflict Resolution

Two policies appear to produce the observed statistical biases in the Simplified Method:

  1. The Method of Circuits in general does not require empty cells to be filled in any given order. Instead of processing empty cells in a rigid lexical order, the selection of locations to fill can be randomized.
  2. The Method of Circuits in general does not require staying with a single initial choice of row- or column-feasible symbols during circuit tracing. Symbols from the row or column feasibility sets can be selected randomly at every possible opportunity.

Applying these two changes in the resolution search policies leads to the Randomized Method.

The second policy can be interpreted as randomly making a jump (or not!) from one circuit to another one at any opportunity to do so. There is always a chance that continued tracing along the any circuit will eliminate the conflict in its next step, thus finishing the conflict resolution. Given enough resolution steps, the random selections will eventually find one of these outcomes leading to resolution, so the resolution process finishes. Thus, the algorithm terminates with probability 1. This is a weaker condition for termination than the finite bound on computational effort than exists for the Simplified Method.

From a programming standpoint, the randomized variant does not carry around as much state information, allowing a simpler program structure. However, many more randomized choices are made. Generating high-quality pseudo-random numbers for all of these choices means significantly higher computational cost. An executable implementation coded in Python is available for download [13] under the Creative Commons License.

Listing 2
Building a random Latin square using the Randomized Method of Circuits

MAIN PROGRAM

  # Construct required data structures
  DEFINE  Nrc = number of rows = number of columns
  DEFINE  A[Nrc,Nrc] = empty Latin square layout of dimension Nrc x Nrc
  DEFINE  rowavail[Nrc] = empty array of Nrc row-feasible sets
  DEFINE  colavail[Nrc] = empty array of Nrc column-feasible sets
  DEFINE  pool[Nrc] = empty array of randomly shuffled row-col locations

  # Initialize the row and column feasibility sets.  Establish a pool 
  # of completely randomized row-column locations.
  FOR irc = 1 to Nrc
      FOR symbol = 1 to Nrc
          Add symbol into set rowavail[irc] 
          Add symbol into set colavail[irc] 
          Append row-column index pair into pool 
      ENDFOR
  ENDFOR

  # Process the empty Latin cells in randomized order
  WHILE row-column pool is nonempty
      Select (irow,jcol) = next random location to process

      REPEAT
          # If there are any feasible choices, take one of them randomly 
          avail = intersection of sets rowavail[irow] && colavail[icol]
          IF avail is not empty
              # No conflict, accept symbol selection  
              symbol1 = randomly selected member of avail
              Assign A[irow,jcol] = symbol1 
              Remove symbol1 from set rowavail[irow]
              Remove symbol1 from set colavail[icol]
              Continue from the start of the main WHILE loop

          # Otherwise, perform conflict resolution in column
          ELSE
              # Conflict
              symbol1 = randomly selected member of current row availability set
              Assign A[irow,jcol] = symbol1 
              Remove symbol1 from set rowavail[irow]

              # Remove conflicting symbol assignment in column
              iviol = row index of conflicting assignment in column jcol
              Add symbol1 to the set rowavail[iviol]

              # Advance to the emptied location of previous conflict             
              irow = iviol
          ENDIF

          # If there are any feasible choices now, take one of them randomly 
          avail = intersection of sets rowavail[irow] && colavail[icol]
          IF avail is not empty
              # No conflict, accept symbol selection 
              symbol2 = randomly selected member of avail
              Assign A[irow,jcol] = symbol2 
              Remove symbol1 from set rowavail[irow]
              Remove symbol1 from set colavail[icol]
              Continue from the start of the main WHILE loop
          ELSE
              # Conflict
              symbol2 = randomly selected member of current column availability set
              Assign A[irow,jcol] = symbol2 
              Remove symbol2 from set colavail[icol]

              # Remove conflicting symbol assignment in row
              jviol = column index of conflicting assignment in row irow
              Add symbol2 to the set colavail[jviol]

              # Advance to the emptied location of previous conflict             
              jcol = jviol
          ENDIF

      ENDREPEAT

  ENDWHILE 

END MAIN PROGRAM
     

 

Randomness

Every cell starts with a conditionally random symbol assignment — conditioned only by the restriction that the symbol must not be previously used in the row/column where located. For the case of the Randomized resolution policy, both the contents and the location of the next empty cell to be initialized are selected at random. Thus, any possible Latin square could be produced by the initialization process alone — though that is highly unlikely.

For the Randomized resolution policy, the symbols used during the conflict resolution can be switched randomly at at any time when more than one candidate symbol is available.

The fact that there are so many opportunities for random selections suggests that statistical properties of the results should be pretty good, but there is no reason to expect perfectly random and unbiased results. The Method of Circuits reaches a Latin square by way of numerous "conflicted" square layouts that temporarily violate the Latin property. It is conjectured that "positions" of Latin squares within the set of all square layouts might not be uniformly distributed; perhaps then, some of the Latin squares are "easier to reach" when starting from a layout that has a conflict, and this would result in bias. Also the Method of Circuits goes systematically from few filled cells to many; this might tend to favor "extending" existing circuit patterns and thus favor longer cycles.

A possibility to consider is a hybrid algorithm.[6] Use the Method of Circuits to establish an initial layout, and then apply several rounds of an iterative "scrambling" algorithm. Even if imperfect, a starting layout produced by the Method of Circuits will be vastly better than any "pattern" layout of the sort typically used to initialize iterative Latin square methods.

Other generation methods finish by applying isotopic transformations [8] such as a random permutation of rows and columns. These transformations do not change transversal properties of the results, or the distributions of cycle sizes within sub-structures, so no benefit would be expected from this extra effort.

 

Complexity - Empirical

Each of the N2 cells must be processed at least once to provide an initial symbol assignment. Thus complexity can be no less than O(N2).

When any conflict occurs, it will require tracing some number of cells along a circuit. Circuit lengths grow in proportion to N. With this additional effort repeated at a large number of cells, a complexity of O(N3) is anticipated. The following plot shows the expectation of observed runtime for the Randomized method, where sample sets of 1000 generated random Latin square layouts are tested.



Figure 4 - Mean runtime for Latin square calculation.

As is clear from the plot, a cubic function provides an excellent characterization, supporting the idea of O(N3) complexity. However, execution time contains a mix of contributions from the implementation environment as well as the algorithm itself.

To better isolate behaviors that are independent of the implementation, a "unit of effort" is defined to be a visit to a layout cell that supplies a new value to that cell. The effort to supply the N2 initial values already accounts for some of these visits. The rest depends on the conflict resolution processing.

Measured in this way, sample sets of 1000 Latin squares were again generated, this time counting the number of conflict events, and also the number of follow-up steps to reach each resolution. The number of resolution processes initiated, times the effort per resolution, measures the total effort expected for conflict resolution processing. Once again, the O(N3) character of this growth is apparent.



Figure 5 - Conflict resolution effort growth O(N3).

Even though the third-order growth term must eventually dominate the lower order terms, that does not mean that the third order effects make much difference until the Latin square size is very large. Another way to look at the previous data is to scale it by N2, to observe expected effort per cell.



Figure 6 - Expected number of visits per cell.

If the complexity were O(N2), effort per cell would be constant, and the right side of the curve would approach a horizontal asymptote. Clearly, that does not happen. The gradual slope of the upper part of the curve reflects the third-order growth that becomes significant for larger layout sizes.

Another way to summarize this is that the Randomized method visits each cell less than 2.5 times, on the average, and this rule-of-thumb holds true until the Latin square size N is into the thousands.

 

Complexity - Analytical

The empirical demonstrations show that there is no combinatorial growth in effort as the Latin square size N increases. The fundamental reasons for this are not yet fully understood. This section summarizes some of observations that a likely to contribute to a rigorous explanation in the future.

It might seem that as Latin squares become larger, there should be many more opportunities for conflicts to arise, so the number of conflicts should explode. However, the empirical tests show that conflicts become proportionately less likely as the Latin squares become larger.



Figure 7 - Percentage of cells that require conflict resolution.

The actual growth rate for the number of cells requiring conflict processing is seen to grow proportionally to N3/2, which is slower than the number of cells in the layout.



Figure 8 - Dependency of conflict occurrences on N.

The reason for this will be related to the fact that maintaining the Latin property among all of the filled cells in the partial layout imposes rigorous restrictions on future choices. Even though the choices become more and more constrained, fewer of the remaining choices are bad ones.

Given that a conflict occurs, its resolution requires some number of steps to trace along a circuit. Circuit lengths and derangement cycles for random derangements show the same kind of length distribution, so properties of derangements can tell us a lot about the cycle length distributions for circuits. The following plot, obtained experimentally using derangements of size 64, illustrates the distribution of circuit sizes.



Figure 9 - Distribution of cycle lengths in derangements.

Disregarding the seemingly anomalous (but mostly understood) values for very long cycles, the distribution is almost indistinguishable from a uniform distribution. For all practical purposes, the expected number of steps to trace a cycle and reach resolution is approximately the median cycle length, which is proporional to N.

A more important effect on the length of the resolution process is the number of unfilled cells present in the layout. Recall that resolution is obtained in one of two ways, either by successfully traversing an entire circuit, or by obtaining a partial circuit in which the conflict no longer exists. When lots of layout cells are empty, it is much more likely that resolution processing terminates early, with a partial circuit. As a result, the number of expected resolution steps is significantly less than N, as shown in the following plot.



Figure 10 - The number of resolution steps grows more slowly than N.

Presenting this same data in a different way, it can be observed that the growth in the number of steps to achieve resolution is approximately proportional to N1/2.



Figure 11 - Growth in the number of resolution steps.

 

Addendum: Latin Square Augmentation

A small variation of the Method of Circuits proves to be remarkably useful for the related problem of augmenting a Latin Square. Suppose that a suitably random Latin square of size N-1 is known, but it is necessary to expand it to size N.

To do this, append to an initial N-1 x N-1 Latin square an additional row of empty cells to the bottom, and an additional column of empty cells to the side, expanding it to a N x N layout. Construct row and column feasibility sets for this modified layout. For the first N-1 rows and columns, the only feasible symbol available for assignment is "N". For the new empty row and column, all symbols from "1" to "N" are available for assignment.

The Method of Circuits is now applied to fill the empty cells of the last row. Random row-feasible assignment operations will produce an immediate column conflict unless the symbol selected is "N". Find the conflicting symbol in the column above and replace this with the new symbol "N". Continue as in the regular Method of Circuits if a new conflict is introduced, until the conflict is resolved.

After all of the symbols have been assigned to fill the once-empty row, begin processing the cells in the empty column. The row- and column-feasibility sets will indicate which symbols are needed to complete the final column. This completes the augmented Latin square.

This process seems efficient enough to raise the question: why not build a Latin square of size N entirely in this manner, recursively, starting with a trivial size 1 Latin square, and "growing" it to size 2, size 3, size 4 and so forth, until size N is obtained. Unfortunately, this has O(N2) overhead for maintaining the succession of row and column feasibility sets for sizes 1, 2, 3, etc.. Therefore, it is unclear whether the recursive approach offers a performance advantage.

 

Conclusion

This is surprising — a method for constructing "randomized" Latin squares that is straightforward to implement and not an iterative "mixing" process approximating randomness. There is reason to be optimistic but cautious. Casual inspection suggests that the method should be amenable to a functional programming style, which would enhance the range of possible applications. Proving complexity order rigorously, quantifying the degree of "randomness" in the final result, and theoretical questions about relationships to classical topics in combinatorics and graph theory, remain for future study. The potential for immediate application benefits motivates making this approach known expeditiously, while the theoretical support is awaited.

 


References:

[1] Cavenagh, Nicholas J.; Greenhill, Catherine; Wanless, Ian M.; "The Cycle Structure of Two Rows in a Random Latin Square"; Wiley InterScience (2007), DO1 10.1002/rsa.20216.

[2] Gyárfás, András; Sárközy, Gábor N.; "Rainbow matchings and cycle-free partial transversals of Latin squares"; Discrete Mathematics 327 (2014) 96–102.

[3] Colbourn, C. J., "The complexity of completing partial Latin squares" (1984), Discrete Applied Mathematics, 8(1):25–30.

[4] Cottle, R. W.; Pang, J-S; Stone, R. E., "The Linear Complementarity Problem" (1992), Academic Press, Chapter 4.4 "Lemke's Method," pp 265-288

[5] DeSalvo, Stephen, "Exact sampling algorithms for Latin squares and Sudoku matrices via probabilistic divide-and-conquer" (2016), arXiv preprint arXiv:1502.00235.

[6] Diestel, Reinhard, "Graph Theory", 5th Edition (2016), Graduate Texts in Mathematics, Volume 173, Springer-Verlag, Heidelberg Graduate Texts in Mathematics, Volume 173

[7] Jacobson, M. T., Matthews, P., "Generating uniformly distributed random Latin squares", Journal of Combinatorial Designs 4 (6)" (1996), pp 405–437

[8] Keedwell, Donald, and Dénes, József, "Latin Squares and Their Applications", Elsevier, (2016), ISBN 9780444635556.

[9] Kuhl J, Schroeder M.W., Completing partial Latin squares with one nonempty row, column, and symbol, The Electronic Journal of Combinatorics 23(2) (2016), #P2.23

[10] Schmidt, Nathan O., Latin Squares and Their Applications to Cryptography (2016), Boise State University Theses and Dissertations, 1223.

[11] Tang, Jason, "Latin Squares and Their Applications", (2009) self-hosted essay.

[12] rlatin2.py, a demo implementation of the Simplified Method of Circuits, in Python, available under the Creative Commons license.

[13] rlatin4.py, a demo implementation of the Randomized Method of Circuits, in Python, available under the Creative Commons license.

[14] See: "Testing Latin Square Randomness using Matchings", posted at: "http://www.ridgerat-tech.us/latinsq/BiasTesting.html" .


 

This page and its contents are licensed under a Creative Commons Attribution 4.0 International License. For complete information about the terms of this license, see http://creativecommons.org/licenses/by/4.0/. The license allows copying, usage, and derivative works for individual or commercial purposes under few restrictions.

Document history:

Version 1.1 draft prepared in HTML format by Larry Trammell, copyright holder, who released the document irrevocably 2020-05-02 under the terms of the CC-BY-4.0 license, and encourages authors to build on this foundation.

Version 1.2 contributed by Larry Trammell, adds observations about the relationships to classical and random graph theory, and also about the efficiency of algorithm execution.

Version 1.3 contributed by Larry Trammell Sept. 20, 2021, expands coverage of graph-theoretical basis of internal structures, provides new information about complexity with emphasis on the fully-randomized method, expands the reference section, and updates graphics presentations.

Version 1.4 contributed by Larry Trammell Nov. 11, 2021, adds linkage to results from statistical tests.