My favorite problems are those which seem simple but exhibit unexpected depth. A prime example is the Traveling Salesperson Problem: It is simple to understand that the garbage truck needs to collect every garbage container, while trying to take the shortest route.

But here, I want to talk about the problem of the longest increasing subsequence (LIS): For a given sequence of numbers, find the subsequence consisting of increasing numbers, which is longest.

A longest increasing subsequence is marked in a sequence

This problem is so simple that it was first studied almost as a placeholder by Stanisław Ulam in a book chapter describing the Monte Carlo method. And judging by the google results, it seems to be a common problem posed to university students. I am wondering how many job applicants were distressed when trying to solve it in front of a whiteboard.

The Surprising Mathematics of Longest Increasing Subsequences -- Dan Romik

However, apparently one can write whole books about this problem. It turns out that there are surprising connections to seemingly independent problems. For example, the length \(L\) of a LIS of a permutation fluctuates the same way as the distance from the center to the border of a coffee stain or the largest eigenvalues of random matrices.

The solution of this problem is not unique: A Sequence can contain multiple longest increasing subsequences. Indeed, their number grows exponentially with the length of the original sequence.

Different longest increasing subsequences within the same sequence

But up to now, there did not exist any results about the precise number of different LIS. A common sentiment is that counting all LIS was infeasible, since there are exponentially many. And that would be true if we would want to enumerate them. But since we only want to now the number, we can use dynamic programming to determine it efficiently. The basic idea is that we calculate for each element that can appear at position \(x\) in a LIS of how many increasing subsequences of length \(L-x\) it is the first element.

This becomes easy thanks to a datastructure encoding which elements can be subsequent in a LIS. For this we extend Patience Sort. Since the algorithm is called after a game of cards, it is adequate to describe it with cards: We write each element of our sequence on a card and sort the deck according to the sequence such that the first element is on top. Then we take cards from the top of the deck. We put the topmost card on the table opening a stack. We put the next card on the first stack whose top card is larger than it or open a new stack right of the currently rightmost stack. Each time we put a card on the table, we also add pointers to all cards of the stack left of the placed card which are smaller. These are the cards which could be its predecessor in a LIS.

Animation of Patience Sort

In the end there are \(L\) stacks, where \(L\) is the length of the LIS. We can start from the rightmost stack, select an arbitrary element and follow the pointers to build a LIS. If we were only interested in the length, we could disregard all but the top card of every stack and could simply the algorithm:

fn lis_len<T: Ord>(seq: &[T]) -> usize {
    let mut stacks = Vec::new();
    for i in seq {
        let pos = stacks.binary_search(&i)
            .err()
            .expect("Encountered non-unique element in sequence!");
        if pos == stacks.len() {
            stacks.push(i);
        } else {
            stacks[pos] = i;
        }
    }
    stacks.len()
}

But we want more, therefore we annotate each card of the rightmost stack with the number of increasing subsequences of length \(x=1\) of which they are the first element, which is trivially 1 for each card. Then we continue with the stack left of it and annotate how many increasing subsequences of length 2 start with them. We can calculate this easily by following the pointers backwards and add up the annotations of all predecessor cards. After repeating this and annotating the leftmost stack, we can sum all annotations of the leftmost stack to get the total number of distinct LIS: here \(7\).

Example of the datastructure to count LIS

About the behavior for longer sequences from different random ensembles we published an article.