Direct Construction of Random Latin Squares

by the Method of Circuits

Larry Trammell (unaffiliated)
May 26, 2020

(draft)

 

Abstract

A method is presented for direct construction of "random" latin squares, essentially indistinguishable from those sampled uniformly from the set of all possible latin squares of the specified size. Construction proceeds cell by cell, using internal sub-structures called circuits to preserve the invariant that at the completion of each step no latin square properties are violated — short of the incomplete symbol assignments. Termination in finite steps is guaranteed. An empirical examination demonstrates bounded O(N3) complexity.

Keywords: latin square, random sampling, algorithm, generator, direct method, method of circuits

 

Background

A latin square is an N x N square layout of cells in which each cell is assigned a symbol from a set of N distinct symbols (arbitrarily indicted by "1" through "N" though any symbols may be used) such that each symbol appears exactly once in each row and exactly once in each column [9]. Figure 1 shows an example.


Figure 1 - An example latin square of size 6

Some latin squares display a structure that appears decidedly non-random. For example, the following latin square has an obvious banded structure.


Figure 2 - A latin square with regular internal structure

Such predictable structures diminish the effectiveness of latin square applications that rely on apparent randomness.

The purpose of this paper is to stimulate interest and study for a new, systematic process that constructs "uniformly random" latin squares directly and efficiently. The statistical properties of the generated squares should be sufficient to make them indistinguishable (in a practical sense) from latin squares selected at random from the set of all possible latin squares of the stated size.

 

Prior Art

The "latin square completion" problem has a long history. [6] A latin square is "partially populated" — for example, as rectangular sub-blocks — when part of the cells have assigned values and the rest of the cells remain empty. A subsequent process can then attempt to fill in the remaining cells to complete the latin square layout. Colbourn [1] found that in general the complexity of the latin square completion problem was NP-complete. Kuhl and Shroeder [7] show that even in the case of one empty row and one missing symbol, the completion problem remains nontrivial. A "divide and conquor" strategy proposed by DeSalvo [3] is also based on the idea of completion, partitioning the problem into an "easy" part and a "hard" part, where the scale of the "hard" part might be small enough to succumb to a brute force attack. However, complexity is not in general reduced, so this approach does not promise to scale up well for large latin squares.

Given the challenges presented by a completion strategy, that idea is abandoned here. The new approach is similar to a completion strategy with the goal of filling one additional cell with a valid symbol assignment, and repeating. Unlike the classical "completion problem," all prior symbol assignments are treated as advisory. Any prior symbol assignment is allowed to be altered in any manner that is expedient.

An established and practical way to obtain an "almost random" latin square has been the method of Jacobson and Matthews [5]. In this approach, a systematically-constructed latin square generated from a pattern is expanded to a 3-dimension sparse grid structure. Elements of this structure are "scrambled" in a manner something like a random annealing process, using properties of the higher-order grid to preserve latin square properties. The structure is then flattened back to a two-dimension layout to produce a latin square that is "sufficiently random." Though quite useful, a drawback of this method is that the process is O(N3) and iterative, so a large number of repetitions can be required to be confident of a "sufficiently random" result.

The method described here attempts to insert symbols into empty cells, as randomly as possible, without violating the properties of a latin square. If a temporarily violation of latin square properties occurs, a corrective action is applied, indirectly, by maintaining the integrity of internal sub-structures called circuits, to be described in the next section. A relationship might exist between the circuit sub-structures and transversals within a latin square [6]. For now, this remains a matter of conjecture.

In applications such as secure hashing [8], a well-known practice is "tracing a path" through a latin square by seeking a sequence of symbols from a given text, and maintaining a row-column position as "state." Circuit sub-structures are closely related to the special case of a symbol stream that is restricted to two distinct symbols appearing in an alternating sequence.

There is some similarity to a path following algorithm such as Lemke's Method for solving a complementarity problem [2].

 

Latin Squares and Circuits

A circuit is an elementary sub-structure, somewhat analogous to a cyclic path in graph theory [4], consisting of a sequence of cells induced by a selected pair of symbols connected by directed transitions. The first transition is along a row where the first symbol is located, arriving at the column where the second symbol is located. The next transition is then along that column to locate the row where the first symbol is located. The next transition is then along that new row to where the second symbol is located, and so forth in an alternating sequence, eventually arriving back at the initial cell.

For example, Figure 3 shows a circuit within a latin square, starting (arbitrarily) at the cell at the upper left corner. It has the symbol "2". With "6" selected arbitrarily as the second symbol, follow a transition path across the first row from the cell with the "2" symbol to the cell with the "6" symbol. Having found this, transition along the column containing the "6" symbol to locate the row containing a "2" symbol, and so forth.


Figure 3 - A circuit in a latin square

In this particular case, the choice of the symbol pair "2" and "6" generated a circuit with 2·N connected cells. Since this reaches all of the cells in the latin square that have labels "2" or "6", this is called a circuit of maximal length. The example demonstrates that maximal circuits can exist. However, within the same example, the symbols "2" and "1" can be observed to lie on two disjoint circuits of less than maximal length.


Figure 4 - Two disjoint circuits partition a latin square

Latin square properties imply some properties of these circuit sub-structures. We only need a few of these.

  1. For any two given symbols, there exists a transition between those two symbols in every row [column]. This arises from the latin square property that every symbol appears in every row and in every column.
  2. The presence of each symbol exactly once in every row and column guarantees that a transition between any two symbols is unique in every row, unique in every column.
  3. Uniqueness of transitions in a circuit inductively implies uniqueness of the circuits that contain them.
  4. Uniqueness of circuits implies that if more than one circuit is generated by a given number pair, these circuits are disjoint. It is not possible to trace along one circuit and encounter a transition leading to a different circuit for the same number pair. There are no "sub-circuit" paths that can bypass any of the transitions along any circuit.
  5. A circuit always has an even number of transitions. For every transition that departs along a row or column from one symbol, another transition arrives seeking the other symbol of the pair.
  6. Because there is a finite number of rows or columns, eventually a transition must reach the original row or column, thus arriving back at the (arbitrary) initial cell for the circuit. Because there is an even number of transitions, with two symbols, it is not possible to arrive back at the starting row or column "at the wrong symbol."
  7. From finiteness of the square, the number of transitions along the path of any given circuit is firmly bounded to 2·N or less.

THEOREM: Assigning symbols to cells of a square layout in a manner that satisfies the stated circuit properties at every cell is necessary and sufficient to construct a latin square.

As discussed above, the latin square properties imply the existence of the circuits. The uniqueness/disjointness of the circuits implies that a symbol can appear at most once in any row or column, while the presence of N filled cells in each row and each column implies that every symbol must be present, thus the latin square properties.

 

Partial Latin Squares and Circuits

A partial latin square of size N is a square layout with cells that are either empty or assigned some subset of the N eligible symbols in a manner that no symbol appears more than once in any row or any column. A special trivial case is a layout with all of its cells empty. Simple accounting tells us that the presence of an empty cell means that neither the row nor the column containing that empty cell has all N symbols assigned. The presence of an empty cell does not violate any properties of a latin square, other than the incompleteness of the list of assigned symbols.

A feasible symbol assignment is a placement of a symbol into a cell, such that the assigned symbol was not previously assigned to any of the other cells in either the row or the column that contains the cell. Feasible symbol assignments produce a partial latin square. To formalize this, for any selected empty cell at a given row and column position in a partial latin square, a set of row-feasible symbols [column-feasible symbols] is defined to be the set of symbols that have not previously be assigned to any cells in that row [column]. Observe that assignment of a symbol into a cell implies removal of that symbol from the row-feasibility and column-feasibility sets for that cell. If it becomes necessary to remove a symbol from a cell, this implies restoration of that symbol into the row-feasibility and column-feasibility sets for that location.

A partial circuit is a sequence of cells, starting with a selected pair of symbols, such that tracing the circuit from cell to cell eventually leads to a row or column in which the symbol required to continue tracing is not yet assigned. In this situation, it is not possible to continue tracking the circuit to return to the initial cell.

A partial latin square can contain a mix of partial or complete circuits. A circuit can be complete, as it would be in a finished latin square, when it happens that the two symbols are present in all the rows and columns reached as the circuit is traced. Informally, the more unassigned cells in the partial latin square layout, the more likely that a symbol needed to continue along the circuit is not yet present, thus the more likely that the circuit is partial. The cells along a partial circuit do not violate any circuit or latin square properties, other than the incompleteness of the circuit.

Given a partial latin square, any empty cell can be selected arbitrarily for a feasible symbol assignment. Empty cells can be selected in a lexical indexing order (by rows, then by columns within rows) without loss of generality. The intersection of the row- and column-feasibility sets for each empty cell defines the set of symbols available for a feasible assignment into that cell. Given the goal of constructing a random latin square, a symbol is selected randomly from the set of feasible symbols available to fill that cell.

 

Conflict Resolution

It is possible that past feasible assignments of randomly-selected symbols resulted in a partial latin square having an empty cell such that the intersection of the row-feasibility set and the column-feasibility set for that cell is empty. In other words, "the latin square cannot be completed" because any symbol assignment would duplicate a previous assignment in the row or column. We will call this situation a conflict. Construction of the latin square cannot continue while any conflict exists.

A resolution process is required to remove any conflict encountered. Since the cell is empty, a symbol must exist that is row-feasible, and another (different because of the conflict) symbol must exist that is column-feasible. Without loss of generality, select a symbol from the row-feasible set and assign it to the empty cell, updating the row-feasibiliy set. This is not a column-feasible symbol assignment, since the selected symbol cannot also be not present in the column-feasible set while the conflict exists. Therefore, that first symbol is present somewhere else in the column.

Locate the conflicting symbol within that column. Remove that symbol from the cell there. Return the symbol to the row-feasibility set for that cell location — but not the column feasibility set, because another copy of that symbol is still present. These adjustments cure the problem of the conflicting symbol assignment in the column, but at the expense of creating an empty cell at a new location within the layout.

At the new empty cell, check whether there is now any symbol that is both row- and column-feasible there. Two cases are possible.

  1. There exists a symbol that is both row- and column-feasible. Then select this symbol for assignment into the cell. Remove this symbol from the row- and column-feasibility sets. The conflict is resolved and the resolution sequence terminates. The layout has been restored to a partial latin square.
  2. There does not exist a symbol that is both row- and column-feasible. In this case, a conflict persists, but it is now located at the recently-vacated cell, rather than the original one.

Resolving a new conflict at the second cell proceeds very much the same, but this time, select a symbol from the column-feasible set, and remove that symbol from the set. While this assignment is column-feasible, it is not row-feasible, again because of the conflict. Look through the row for the matching symbol. Remove that symbol, and restore it to the column-feasible set for that location. The layout is again restored to a partial latin square, but a vacant cell was created at yet another new location.

It can be noted that the resolution process is now exactly as it was when the first conflict was detected — except now the empty cell is at a completely different row and column location. Continue the resolution process as before, as many times as required until the conflict is resolved.

Which raises the question: is the conflict resolved? Or can the process wander indefinitely? Before considering the general question of termination, first consider a special case which makes the termination issue clear.

 

Conflict Resolution using Circuits

The discussion of conflict resolution does not place any restrictions on the selection of a replacement symbol other than row- or column-feasibility. Consider now a special case for this selection.

When beginning a conflict resolution process, call the row-feasible symbol selected for assignment into the initial empty cell "symbol 1". When another cell is vacated and some new column-feasible symbol is selected to refill that cell, call the symbol selected for this assignment "symbol 2". Thus, the resolution process assigns "symbol 1" to the initial cell, and assigns a replacement "symbol 2" somewhere in the column of the initial cell.

Note that replacement of "symbol 1" by "symbol 2" always restores "symbol 1" to the row-feasibility set of the last filled cell. Similarly, replacement of "symbol 2" by "symbol 1" always restores "symbol 2" to the column-feasibility set for the last filled cell. It is therefore possible to apply a restricted policy that always picks "symbol 1" or "symbol 2" for assignment at each step of the resolution process. In other words, using this policy, the resolution method traces a circuit induced by "symbol 1" and "symbol 2", alternately replacing "symbol 1" with "symbol 2", or "symbol 2" with "symbol 1" along the way.

With the observation that this process consists of a circuit traversal, it is clear that the process must either reach the original cell as the circuit completes, and the conflict is resolved; or the process reaches a row or column with empty cells so that the conflict is resolved but the circuit is partial and cannot be traced further. In either case, the conflict resolution process finishes, no latin square properties are violated, and no harm was done to any of the rows or columns not touched by the traversal.

This approach, of tracing around a circuit and swapping symbols until the partial or completed circuit structures are correctly structured, is called the "Method of Circuits." The bound on the length of circuits forces a bound on the number of steps for each resolution process.

Note that each resolution process starts by filling a cell that was previously empty, and ends with all previously-filled cells remaining filled, leaving no violations of latin square properties. Thus forward progress is guaranteed. When the last cell is filled, the layout is a complete latin square.

Listing 1 summarizes the Method of Circuits in a pseudo-code style. An executable implementation coded in Python is available for download [10] under the Creative Commons License.

Listing 1
Building a random latin square using the Method of Circuits


MAIN PROGRAM

  # Construct required data structures
  DEFINE  Nrc = number of rows = number of columns
  DEFINE  A[Nrc,Nrc] = empty latin square layout of dimension Nrc x Nrc
  DEFINE  rowavail[Nrc] = empty array of Nrc row-feasible sets
  DEFINE  colavail[Nrc] = empty array of Nrc column-feasible sets

  # Initialize the row and column feasibility sets
  FOR irc = 1 to Nrc
      FOR symbol = 1 to Nrc
          Add symbol into set rowavail[irc] 
          Add symbol into set colavail[irc] 
      ENDFOR
  ENDFOR

  # Assign value to latin square layout cell by cell
  FOR irow in 1 to Nrc
      FOR jcol in 1 to Nrc
          avail = intersection of sets rowavail[irow] && colavail[icol] 
          IF avail is not empty  
              # No conflict
              symbol1 = randomly selected member of avail
              Assign A[irow,jcol] = symbol1 
              Remove symbol1 from set rowavail[irow]
              Remove symbol1 from set colavail[icol]
          ELSE                   
              # Conflict
              symbol1 = randomly selected member of rowavail[irow]
              symbol2 = randomly selected member of colavail[jcol] 
              Assign A[irow,jcol] = symbol1
              Remove symbol1 from set rowavail[irow]
              Resolve Conflict ( A, irow, jcol, symbol1, symbol2 )
          ENDIF
      ENDFOR
  ENDFOR
END main program

SUBPROGRAM Resolve Conflict ( A, iviol, jviol, sym1, sym2 )  
  LOOP
      Let i = index not equal to iviol where A[i,jviol] matches sym1
      IF i is empty
          EXIT LOOP

      iviol = i
      A[iviol,jviol] = sym2
      Add sym1 to set rowavail[iviol]
      IF sym2 is in set colavail[jviol]
          Remove sym2 from set colavail[jviol]
      
      Let j = index not equal to jviol where A[iviol,j] matches sym2
      IF j is empty
          EXIT LOOP

      jviol = j
      A[iviol,jviol] = sym1        
      Add sym2 to set colavail[jviol]
      IF sym1 is in set rowavail[iviol]
          Remove sym1 from set rowavail[iviol]
  ENDLOOP
END subprogram        

 

This is encapsulated by the following.

THEOREM:     The "Method of Circuits" terminates with a layout that is a complete latin square.

 

Complexity

The number of steps required by the Method of Circuits can be tallied as follows.

  • It is clear that N2 symbol selections must be made for assigning symbols to the N2 cells in the layout.
  • Some portion of the initial cell assignments will result in conflicts. that a fixed proportion of the cells, independent of the layout Each conflict can be cleared by a circuit traversal that requires at most 2·N steps.
  • Assume that conflicting symbols can be located within each row or column using an indexing method with O(1) complexity at each step of the conflict resolution process. (However, if a row or column search is used instead by the implementation, this could result in higher complexity.)

This leads directly to O(N3) steps necessary to generate each latin square. This corresponds well to actual implementation behavior.

This corresponds well to actual implementation behavior. To illustrate this, the following plot displays a third-order model curve superimposed on the actual runtimes measured for an actual implementation. Of course, this statistical demonstration is highly dependent on implementation details and the run-time environment. Nevertheless, it can be seen that the polynomial approximation provides a very useful predictor of performance, for square sizes 8 through 1024. The 6-σ range for the variations in run-time are also shown, and is barely visible, which means that the distribution of run times is very tight, a very desirable application property.



Figure 5 - A simple 3rd order polynomial model predicts run-time.
Expected values and ±6σ bands lie along this curve.

Two executable demo implementations coded in Python are available for download under the Creative Commons License. The first uses the indexed search as described above. [10]. The second [11] uses a simple linear search to locate conflicting symbols, at the expense of added run time, but with the benefit of requiring 1/3 as much working storage.

A careful study of the distributions of circuit lengths and the probability of encountering conflicts would be valuable to establish a theoretical basis for the observed complexity estimates. However, the empirical evidence seems strong enough for experimentation to proceed while awaiting rigorous theory.

Two empirical observations that are likely to contribute to further study:

  1. The expected number of conflicts detected during the construction process grows more slowly than the number of cells in the layout. This is plausible, because as the layout becomes increasingly large, the construction process becomes more and more like simply suppling random numbers into the squares. We can see in the following plot that the growth is proportional to N4/3.

  2. The expected number of steps along a circuit required to attain resolution grows more slowly than the size of maximal circuits. This indicates that there is not a uniform distribution of circuit sizes, and as N grows larger there is a greater liklihood of encountering circuits with a smaller size. We can see in the following plot (with each of the estimates evaluated twice as an indication of variance) that the growth is proportional to N2/3.

 

Randomness

The Method of Circuits always alternates between the same two symbols at every adjustment step. Conflict resolution requires at each step only that selected symbols are row-feasible or column-feasible. This suggests an alternate plausible policy for conflict resolution, that of choosing replacement symbols for assignment at random from all feasible symbols at every opportunity during a conflict resolution.

There are various ways to consider the potential costs and benefits:

  1. perhaps systematic adjustments of the Method of Circuits somehow introduce a bias — unclear, since circuit structures always exist in any latin square;
  2. perhaps the Method of Circuits introduces the least non-randomness because it touches the fewest cells during conflict resolution;
  3. perhaps the randomized resolution strategy is better because it it makes more choices randomly.

It is unclear whether any of these speculations have merit. However, under either resolution strategy, the initial symbol assignment to every cell, and also the symbol values to use during conflict resolution, are either completely random or conditioned on prior random choices. Whether this leaves the final result "completely random" or "sufficiently random" remains as a theoretical question, but future experience can show whether there is any practical issue.

The randomized resolution policy could be interpreted as applying the Method of Circuits, but from time to time shifting randomly to a different circuit. One might get lucky and jump to a circuit where resolution arrives quickly; but on the other hand, the resolution process might spin around for a long time before landing upon a circuit where resolution completes. Experiments suggests that the additional randomization does incur additional cost, and that the order of complexity is affected. Table 1 shows the expected number of resolution steps calculated over samples of 256 generated latin squares to compare the two policies.

 

 
   TABLE 1  -  Expected number of cells visited during resolution
     (Sample populations of 256 squares)

square size        Method of Circuits     Randomized Resolution
    16x16                  81                    101                 
    32x32                 420                    540
    40x40                 693                    902
  100x100                5080                   6180
  160x160               13849                  20509
  200x200               22887                  29768

 

CONJECTURE:     There is no difference in the statistical properties of latin squares generated by the Method of Circuits and those generated by the randomized resolution method.

Symbols assigned early in the construction process rarely encounter feasibility restrictions, because so few other symbol were previously assigned. Symbols inserted later are conditioned on previous choices, so encountering a conflict is more likely. Does this produce some kind of structural bias late in the process? It is unclear, since each resolution sequence scatters its effects along rows and columns of the circuits traced. Other methods sometimes finish by applying isotopic transformations [6] such as a randomized permutation of rows and columns. Such transformations could be employed, if deemed beneficial to disperse any locality artifacts.

 

Addendum: Latin Square Augmentation

A small variation of the Method of Circuits proves to be remarkably useful for the related problem of augmenting a Latin Square. Suppose that a suitably random latin square of size N-1 is known, but it is necessary to expand it to size N.

To do this, append to an initial N-1 x N-1 latin square an additional row of empty cells to the bottom, and an additional column of empty cells to the side, expanding it to a N x N layout. Construct row and column feasibility sets for this modified layout. For the first N-1 rows and columns, the only feasible symbol available for assignment is "N". For the new empty row and column, all symbols from "1" to "N" are available for assignment.

The Method of Circuits is now applied to fill the empty cells of the last row. Random row-feasible assignment operations will produce an immediate column conflict unless the symbol selected is "N". Find the conflicting symbol in the column above and replace this with the new symbol "N". Continue as in the regular Method of Circuits if a new conflict is introduced, until the conflict is resolved.

After all of the symbols have been assigned to fill the once-empty row, begin processing the cells in the empty column. The row- and column-feasibility sets will indicate which symbols are needed to complete the final column. This completes the augmented latin square.

This process seems efficient enough to raise the question: why not build a latin square of size N entirely in this manner, recursively, starting with a trivial size 1 latin square, and "growing" it to size 2, size 3, size 4 and so forth, until size N is obtained. Unfortunately, this has O(N2) overhead for maintaining the succession of row and column feasibility sets for sizes 1, 2, 3, etc.. Therefore, it is unclear whether the recursive approach offers a performance advantage.

 

Conclusion

This is surprising — a method for constructing "randomized" latin squares that is straightforward to implement and not an iterative process approximating randomness. There is reason to be optimistic but cautious. Casual inspection suggests that the method should be amenable to a functional programming style, which would enhance the range of possible applications. Proving complexity order rigorously, quantifying the degree of "randomness" in the final result, and theoretical questions about relationships to classical topics in combinatorics and graph theory, remain for future study. The potential for immediate application benefit motivates making this approach known expeditiously.

 


References:

[1] Colbourn, C. J., "The complexity of completing partial latin squares" (1984), Discrete Applied Mathematics, 8(1):25–30.

[2] Cottle, R. W.; Pang, J-S; Stone, R. E., "The Linear Complementarity Problem" (1992), Academic Press, Chapter 4.4 "Lemke's Method," pp 265-288

[3] DeSalvo, Stephen, "Exact sampling algorithms for Latin squares and Sudoku matrices via probabilistic divide-and-conquer" (2016), arXiv preprint arXiv:1502.00235.

[4] Diestel, Reinhard, "Graph Theory", 5th Edition (2016), Graduate Texts in Mathematics, Volume 173, Springer-Verlag, Heidelberg Graduate Texts in Mathematics, Volume 173

[5] Jacobson, M. T., Matthews, P., "Generating uniformly distributed random latin squares", Journal of Combinatorial Designs 4 (6)" (1996), pp 405–437

[6] Keedwell, Donald, and Dénes, József, "Latin Squares and Their Applications", Elsevier, (2016), ISBN 9780444635556.

[7] Kuhl J, Schroeder M.W., Completing partial latin squares with one nonempty row, column, and symbol, The Electronic Journal of Combinatorics 23(2) (2016), #P2.23

[8] Schmidt, Nathan O., Latin Squares and Their Applications to Cryptography (2016), Boise State University Theses and Dissertations, 1223.

[9] Tang, Jason, "Latin Squares and Their Applications", (2009) self-hosted essay.

[10] randls2.py, a simple demo implementation in Python using indexing, available under the Creative Commons license.

[11] randls1.py, a simple demo implementation in Python using symbol search, available under the Creative Commons license.


 

This page and its contents are licensed under a Creative Commons Attribution 4.0 International License. For complete information about the terms of this license, see http://creativecommons.org/licenses/by/4.0/. The license allows copying, usage, and derivative works for individual or commercial purposes under few restrictions.

Document history:

First draft prepared in HTML format by Larry Trammell, copyright holder, who released the document irrevocably 2020-05-02 under the terms of the CC-BY-4.0 license, and encourages authors to build on this foundation.