The Hungarian algorithm
The Hungarian algorithm (also called Kuhn-Munkres algorithm) is a powerful algorithm, yet mostly unknown out of the computer science community.
Linear sum assignment problem
The problem is actually quite simple, and is called the linear sum assignment problem: Let us say that you have N agents and N tasks, and you can estimate the benefice that you can gain from all possible individual assignment.
As an exemple, let us consider the following situation where we have estimated the individual assignment scores:
| Task→ Worker↓ |
Make coffee | Use machine | Fix machine |
|---|---|---|---|
| Stakhanovite worker | 8/10 | 10/10 | 9/10 |
| Specialized worker | 5/10 | 8/10 | 2/10 |
| Newbie | 2/10 | 2/10 | -2/10 |
The Hungarian algorithm is an optimisation method that finds the assignment maximizing the total gain (the sum of all N individual gain). Note: obviously, this assignment is also the one maximizing the average gain per agent/task.
An example in theoretical chemistry would be the assignment of normal modes from the ground state to an excited state, or optimizing the distribution of desks among the PhD students...
More generally, this applies to any problem where you search a complete one-to-one matching and you can evaluate a cost/gain for each individual assignment, usually in the form of a cost/gain matrix. Note: If M is a gain matrix, then -M is trivially an acceptable corresponding cost matrix.
Practical solving
The Hungarian algorithm's details are definitely not trivial, and are therefore out of scope for this geek news. The implementation is also technical and let us be honest, not very interesting...
Fortunately, the Hungarian algorithm has already been implemented in Python3, and is easily accessible from the well-known Scipy module, through the scipy.optimize.linear_sum_assignment function.
import numpy as np
from scipy.optimize import linear_sum_assignment
This fonction requires a cost matrix as input, so we create it using the example above (using the opposite of the above gain matrix).
gain_matrix = np.array([[8, 10, 9], [5, 8, 2], [2, 2, -2]])
cost_matrix = - gain_matrix
At this point, applying the Hungarian algorithm is as easy as:
row_ind, col_ind = linear_sum_assignment(cost_matrix) # `row_ind`->[0, 1, 2], `col_ind`->[2, 1, 0]
The function return two lists: row_ind and row_ind, such that the i-th component of the computed assignment matches worker row_ind[i] to task col_ind[i].
One can now vizualize the results in a more meaningful way, and display some quality estimator of the assignment.
for worker_ind, task_ind in zip(row_ind, col_ind): # Iterate over `row_ind` and `col_ind` simultaneously
print('worker {} -> task {}'.format(worker_ind, task_ind))
# Pretty print
workers_names = ['Stakhanovite worker', 'Specialized worker', 'Newbie']
tasks_names = ['Make coffee', 'Use machine', 'Fix machine']
for i, j in zip(row_ind, col_ind):
print('{} -> {}'.format(workers_names[i], tasks_names[j]))
# Bonus : retrieve total sum
print('Total cost:', cost_matrix[row_ind, col_ind].sum()) # Print total cost
print('Average overall gain', -(1/3)*cost_matrix[row_ind, col_ind].sum()) # Print average score
Full script
import numpy as np
from scipy.optimize import linear_sum_assignment
# Definition of cost matrix
gain_matrix = np.array([[8, 10, 9], [5, 8, 2], [2, 2, -2]])
cost_matrix = - gain_matrix
# Apply Hungarian algorithm
row_ind, col_ind = linear_sum_assignment(cost_matrix) # `row_ind`->[0, 1, 2], `col_ind`->[2, 1, 0]
# Print raw matching results
print('Optimal assignment:')
for worker_ind, task_ind in zip(row_ind, col_ind): # Iterate over `row_ind` and `col_ind` simultaneously
print('worker {} -> task {}'.format(worker_ind, task_ind))
# Pretty print
print('\nPretty printed assignment:')
workers_names = ['Stakhanovite worker', 'Specialized worker', 'Newbie']
tasks_names = ['Make coffee', 'Use machine', 'Fix machine']
for i, j in zip(row_ind, col_ind):
print('{} -> {}'.format(workers_names[i], tasks_names[j]))
# Bonus : retrieve total sum
print('\nTotal cost:', cost_matrix[row_ind, col_ind].sum()) # Print total cost
print('Average overall gain', -(1/3)*cost_matrix[row_ind, col_ind].sum()) # Print average score
Enjoy!