前言

H-Index，这是一道 mid 难度的题目，需要我们对给定的作者 paper 引用次数citations计算 h-index，h-index 就是：

The h-index is defined as the maximum value of h such that the given researcher has published at least h papers that have each been cited at least h times.

就是说对于一个 $h$ ，它满足作者至少有 $h$ 篇文章引用数大于等于 $h$ 。 $\textit{h-index}$ 是 $h$ 的最大值。 $\textit{h-index} = 5$ ，则说明作者有至少 5 篇文章引用数大于等于 5，而且并不存在 6 篇引用数大于等于 6 的文章。

动态规划（×）枚举（√）

因为做题的习惯，首先想到的是动态规划，即根据截止到第 $i - 1$ 篇文章的 $\textit{h-index}$ 计算到第 $i$ 篇文章的 $\textit{h-index}$ 。事实上计算到第 $i$ 篇文章的 $\textit{h-index}$ （记为 $h_i$ ）需要用到前面所有文章的引用数：（用伪代码写一下）

delta = sum(citations > h_{i-1}) > h_{i-1}\\ h_i = h_{i-1} + delta

这种动态规划（？）的思路复杂度应该是时间 $O(n^2)$ ，空间 $O(1)$ ， $n$ 是citations的长度。

也可以换个方向，不枚举下标，而是枚举结果 $\textit{h-index}$ ，或者说 $h$ ，很显然：

0 \leq h \leq citations.length

从大到小枚举 $h$ ，也就是找到符合题目条件最大的 $h$ 值，有：


ts
function hIndex(citations: number[]): number {
    const len = citations.length
    const calc = function (target: number) {
        let count = 0
        for (let i = 0; i < len; i++) {
            if (citations[i] >= target) count++
        }
        return count >= target
    }
    let ans = 0
    for (let i = len; i >= 0; i--) {
        if (calc(i)) {
            ans = i
            break
        }
    }
    return ans
};

复杂度应该是时间 $O(n^2)$ ，空间 $O(1)$ 。

二分搜索

我们可以用二分搜索优化一下上面的枚举，时间复杂度 $O(nlogn)$ ，空间复杂度是 $O(1)$ 。


ts
function hIndex(citations: number[]): number {
    const len = citations.length
    const calc = function (target: number) {
        let count = 0
        for (let i = 0; i < len; i++) {
            if (citations[i] >= target) count++
        }
        return target - count
    }
    let left = 0, right = len + 1
    while (left < right) {
        const mid = (left + right) >> 1
        const val = calc(mid)
        if (val <= 0) {
            left = mid + 1
        } else {
            right = mid
        }
    }
    return left - 1
};

构造一个 $h$ 到citations离满足 $\textit{h-index}=h$ 文章数差额的函数：

f(h) = h - sum(citations \geq h)

$sum(citations \geq h)$ 也就是citations引用大于等于 $h$ 的文章数，随 $h$ 递减，因此上面的 $f(h)$ 是单调递增的，因此可以对其进行从大到小地枚举和二分搜索。

上面的代码就是需要查找这个函数小于 0 的最大值，即满足题目citations的最大 $h$ ，也就是二分查找中说的右侧边界。

空间换时间的优化

我们可以在 $O(n)$ 时间内把 $h\rightarrow f(h)$ 的结果算出来，这样子就可以时间复杂度 $O(n)$ ，空间复杂度 $O(n)$ 了。


ts
function hIndex(citations: number[]): number {
    const len = citations.length
    const cache = new Array(len + 1).fill(0)
    for (let i = 0; i < len; i++) {
        cache[Math.min(citations[i], len)]++
    }
    for (let i = len - 1; i >= 0; i--) {
        cache[i] += cache[i + 1]
    }
    let left = 0, right = len + 1
    while (left < right) {
        const mid = (left + right) >> 1
        const val = mid - cache[mid]
        if (val <= 0) {
            left = mid + 1
        } else {
            right = mid
        }
    }
    return left - 1
};

参考这个题解，其实在上面累加cache数组的时候，累加值大于等于i的时候就相当于枚举到了最大的 $h$ 值，程序可以结束了。


ts
function hIndex(citations: number[]): number {
    const len = citations.length
    const cache = new Array(len + 1).fill(0)
    for (let i = 0; i < len; i++) {
        cache[Math.min(citations[i], len)]++
    }
    let total = 0, ans = 0
    for (let i = len; i >= 0; i--) {
        total += cache[i]
        if (total >= i) {
            ans = i
            break
        }
    }
    return ans
};

同样地，时间复杂度 $O(n)$ ，空间复杂度 $O(n)$ 。

总结

本文讨论了基于枚举和二分搜索的 H-Index 解法及其优化。感觉遇到枚举的题目可以考虑进行二分搜索，进一步地，对于搜索中耗时计算可以考虑进行空间换时间的优化。

Leetcode 274 H-Index

前言

动态规划（×）枚举（√）

二分搜索

空间换时间的优化

总结