From Knuth, Fig 4 |
I'm interested in a slight variation of the problem, where we have a set of n candidates that we want to assign to m roles and n ≤ m. We assign a goodness score to each role for each candidate. We'd like to find an assignment of candidates to roles that maximizes the sum of the goodness scores.
We can use a simplified variant of the dancing links representation, where rows are roles, columns are candidates, and we assign a candidate to a role by splicing out the candidate column and the role row. We label each node with its goodness score. We also label each row and column with its maximum goodness, updating this as we splice out nodes. This gives us an instantaneous upper bound on the possible goodness score of a solution being tried. We discard a solution if this bound falls below the best solution seen so far.
How do we decide what row and column to choose, and how do we quickly update goodness scores? This is the key idea:
Sort the doubly linked lists for each row and each column.
It turns out we don't actually care what order the list elements occur in when we dance the links – for example, we only care that entry i,j belongs to row list i and column list j, but we don't care (for example) if its predecessors in the row list belong to entries <j.We can start our search with a greedy approach to assigning candidates to roles, by simply assigning each candidate in turn to the first (highest scoring) role remaining in its column. Search can then backtrack and try to improve on this solution. Note that we should always assign the last candidate to the best available role, but otherwise we can choose to assign candidates to gradually less ideal roles in the hopes of finding a better overall match. We can keep the candidates in a doubly linked list ordered by their best role score, and update this over time if their best role is taken by someone else.
Some candidates are not suited to a given role. We can imagine their goodness score is -∞, but it's easier just to leave nodes out of the matrix. Per Knuth, we should attempt to assign a candidate with the minimal number of available roles at each step to keep the search tree from getting too bushy —so really we should be ordering candidates first by number of available roles, then by maximum role value.
If n = m (which can happen along the way even if it's not initially true) the problem becomes symmetrical and we can instead choose to assign a role to a candidate. In particular, if a role has fewer candidates available than any candidate has roles, we should assign a candidate to that role rather than vice versa. Again we can sort the roles by available candidates and maximum score.
Finally, we can prune entries from the matrix early. Every time the upper bound shrinks, we can remove entries that, if chosen, would drive the upper bound below the best solution found so far. We can do this by walking the sorted doubly linked lists in reverse, removing entries until we reach an entry that's still viable.
We're done as soon as every candidate has a role. If we ever find that a candidate has no available roles, we immediately backtrack. Similarly if n > m we know that some candidate will eventually run out of roles and we can immediately backtrack.
It's not hard to generalize this to a version where roles can accept between p and q candidates. It's trickier for me to see how to do this in the original dancing links without redundancy, though I'm sure it's equally easy in practice.