Wednesday, January 8, 2020

You Can Sort Dancing Links

From Knuth, Fig 4
I've long been fascinated by the Dancing Links algorithm, popularized by Donald Knuth (who's careful as always to attribute it to Hitotumatu and Noshita). It organizes a search problem into a 0/1 matrix with the goal of selecting rows that cover each column exactly once. The key insight is to represent each 1 in the matrix by a node that's part of two circular doubly linked lists – one a list of all the 1 nodes in its row, the other of all the 1 nodes in its column. To add a row to a potential solution, remove the row and all of the rows which share a column with a 1 with that row. This just requires splicing the nodes out of the horizontal and vertical doubly linked lists. When backtracking, splice the nodes back in again in the reverse order they were removed. If you leave the next and prev pointers in a spliced-out node alone, it's easy and fast to splice it back exactly where it came from.

I'm interested in a slight variation of the problem, where we have a set of n candidates that we want to assign to m roles and n ≤ m. We assign a goodness score to each role for each candidate. We'd like to find an assignment of candidates to roles that maximizes the sum of the goodness scores.

We can use a simplified variant of the dancing links representation, where rows are roles, columns are candidates, and we assign a candidate to a role by splicing out the candidate column and the role row. We label each node with its goodness score. We also label each row and column with its maximum goodness, updating this as we splice out nodes. This gives us an instantaneous upper bound on the possible goodness score of a solution being tried. We discard a solution if this bound falls below the best solution seen so far.

How do we decide what row and column to choose, and how do we quickly update goodness scores? This is the key idea:
        Sort the doubly linked lists for each row and each column.
It turns out we don't actually care what order the list elements occur in when we dance the links – for example, we only care that entry i,j belongs to row list i and column list j, but we don't care (for example) if its predecessors in the row list belong to entries <j.

We can start our search with a greedy approach to assigning candidates to roles, by simply assigning each candidate in turn to the first (highest scoring) role remaining in its column. Search can then backtrack and try to improve on this solution. Note that we should always assign the last candidate to the best available role, but otherwise we can choose to assign candidates to gradually less ideal roles in the hopes of finding a better overall match. We can keep the candidates in a doubly linked list ordered by their best role score, and update this over time if their best role is taken by someone else.

Some candidates are not suited to a given role. We can imagine their goodness score is -∞, but it's easier just to leave nodes out of the matrix. Per Knuth, we should attempt to assign a candidate with the minimal number of available roles at each step to keep the search tree from getting too bushy —so really we should be ordering candidates first by number of available roles, then by maximum role value.

If n = m (which can happen along the way even if it's not initially true) the problem becomes symmetrical and we can instead choose to assign a role to a candidate. In particular, if a role has fewer candidates available than any candidate has roles, we should assign a candidate to that role rather than vice versa. Again we can sort the roles by available candidates and maximum score.

Finally, we can prune entries from the matrix early. Every time the upper bound shrinks, we can remove entries that, if chosen, would drive the upper bound below the best solution found so far. We can do this by walking the sorted doubly linked lists in reverse, removing entries until we reach an entry that's still viable.

We're done as soon as every candidate has a role. If we ever find that a candidate has no available roles, we immediately backtrack. Similarly if n > m we know that some candidate will eventually run out of roles and we can immediately backtrack.

It's not hard to generalize this to a version where roles can accept between p and q candidates. It's trickier for me to see how to do this in the original dancing links without redundancy, though I'm sure it's equally easy in practice.