H-Index [Code + Interactive Visualization]

Nov 12

Array Algorithm Medium


def hIndex(citations):
    # Get length of citations array
    n = len(citations)
    # Create buckets array with size n+1 to count citations
    buckets = [0] * (n + 1)
    
    # Count frequencies of citations
    for c in citations:
        # If citation count >= n, put in last bucket
        if c >= n:
            buckets[n] += 1
        else:
            buckets[c] += 1
    
    # Track total papers with enough citations
    count = 0
    
    # Check each possible h-index value from highest to lowest
    for i in range(n, -1, -1):
        count += buckets[i]
        if count >= i:
            return i
            
    return 0


class Solution {
    // Optimal Bucket Sort solution
    public int hIndex(int[] citations) {
        // Get length of citations array 
        int n = citations.length;
        // Create buckets array with size n+1 to count citations
        int[] buckets = new int[n + 1];
        
        // Count frequencies of citations
        for (int c : citations)
            if (c >= n) buckets[n]++; else buckets[c]++;
        
        // Track total papers with enough citations
        int count = 0;
        
        // Check each possible h-index value from highest to lowest
        for (int i = n; i >= 0; i--) {
            count += buckets[i];
            if (count >= i)
                return i;
        }
        
        return 0;
    }
}


class Solution {
public:
    int hIndex(vector& citations) {
        // Get length of citations array
        int n = citations.size();
        // Create buckets array with size n+1 to count citations
        vector buckets(n + 1, 0);
        
        // Count frequencies of citations
        for (int c : citations)
            if (c >= n) buckets[n]++; else buckets[c]++;
        
        // Track total papers with enough citations
        int count = 0;
        
        // Check each possible h-index value from highest to lowest
        for (int i = n; i >= 0; i--) {
            count += buckets[i];
            if (count >= i)
                return i;
        }
        
        return 0;
    }
};

Problem Statement

Given an array of integers citations where citations[i] is the number of citations a researcher received for their ith paper, return the researcher's h-index. According to the definition of h-index on Wikipedia: The h-index is defined as the maximum value of h such that the given researcher has published at least h papers that have each been cited at least h times.

Detailed Explanation

Approach

This solution uses bucket sort to efficiently count citation frequencies. Since the h-index cannot be larger than the number of papers, we can use buckets to count citations and find the optimal h-index in a single pass.

Key Concepts

Bucket Sort Principle: Count frequencies in buckets, where each index represents citation count
Capped Counts: Citations higher than n can be counted in bucket n
Running Count: Track papers with sufficient citations while scanning buckets

Algorithm Steps

Create buckets array of size n+1:
- Index i represents i citations
- Last bucket (n) holds count of all papers with ≥n citations
Count citations in appropriate buckets:
- If citations ≥ n, increment bucket[n]
- Otherwise, increment bucket[citations]
Scan buckets from n to 0:
- Add count from current bucket to running total
- If running total ≥ current index, found h-index

Time and Space Complexity

Time Complexity: O(n) - One pass to count citations, one pass through buckets
Space Complexity: O(n) - Buckets array of size n+1

Why Bucket Sort Works

Bucket sort is optimal here because:

H-index cannot exceed number of papers (n)
Citations higher than n don't need precise counting
We only need frequency counts, not sorted order
Can process frequencies in optimal order (high to low)

Edge Cases

Empty array: Return 0
All zeros: Return 0
All citations > n: Return n
Citations not in sorted order: Works correctly

Common Mistakes

Not capping citations at n in bucket counting
Scanning buckets in wrong direction
Not maintaining running count correctly
Trying to sort citations first (unnecessary)

H-Index - Interactive Visualization

Current Citation

Running Count

H-Index

Click "Next Step" to begin the visualization

Citations Array

Buckets Array

Tal Nabulsi