H-Index [Code + Interactive Visualization]


def hIndex(citations):
    # Get length of citations array
    n = len(citations)
    # Create buckets array with size n+1 to count citations
    buckets = [0] * (n + 1)
    
    # Count frequencies of citations
    for c in citations:
        # If citation count >= n, put in last bucket
        if c >= n:
            buckets[n] += 1
        else:
            buckets[c] += 1
    
    # Track total papers with enough citations
    count = 0
    
    # Check each possible h-index value from highest to lowest
    for i in range(n, -1, -1):
        count += buckets[i]
        if count >= i:
            return i
            
    return 0

class Solution {
    // Optimal Bucket Sort solution
    public int hIndex(int[] citations) {
        // Get length of citations array 
        int n = citations.length;
        // Create buckets array with size n+1 to count citations
        int[] buckets = new int[n + 1];
        
        // Count frequencies of citations
        for (int c : citations)
            if (c >= n) buckets[n]++; else buckets[c]++;
        
        // Track total papers with enough citations
        int count = 0;
        
        // Check each possible h-index value from highest to lowest
        for (int i = n; i >= 0; i--) {
            count += buckets[i];
            if (count >= i)
                return i;
        }
        
        return 0;
    }
}

class Solution {
public:
    int hIndex(vector& citations) {
        // Get length of citations array
        int n = citations.size();
        // Create buckets array with size n+1 to count citations
        vector buckets(n + 1, 0);
        
        // Count frequencies of citations
        for (int c : citations)
            if (c >= n) buckets[n]++; else buckets[c]++;
        
        // Track total papers with enough citations
        int count = 0;
        
        // Check each possible h-index value from highest to lowest
        for (int i = n; i >= 0; i--) {
            count += buckets[i];
            if (count >= i)
                return i;
        }
        
        return 0;
    }
};

Problem Statement

Given an array of integers citations where citations[i] is the number of citations a researcher received for their ith paper, return the researcher's h-index. According to the definition of h-index on Wikipedia: The h-index is defined as the maximum value of h such that the given researcher has published at least h papers that have each been cited at least h times.

Detailed Explanation

Approach

This solution uses bucket sort to efficiently count citation frequencies. Since the h-index cannot be larger than the number of papers, we can use buckets to count citations and find the optimal h-index in a single pass.

Key Concepts

  1. Bucket Sort Principle: Count frequencies in buckets, where each index represents citation count
  2. Capped Counts: Citations higher than n can be counted in bucket n
  3. Running Count: Track papers with sufficient citations while scanning buckets

Algorithm Steps

  1. Create buckets array of size n+1:
    • Index i represents i citations
    • Last bucket (n) holds count of all papers with ≥n citations
  2. Count citations in appropriate buckets:
    • If citations ≥ n, increment bucket[n]
    • Otherwise, increment bucket[citations]
  3. Scan buckets from n to 0:
    • Add count from current bucket to running total
    • If running total ≥ current index, found h-index

Time and Space Complexity

  • Time Complexity: O(n) - One pass to count citations, one pass through buckets
  • Space Complexity: O(n) - Buckets array of size n+1

Why Bucket Sort Works

Bucket sort is optimal here because:

  • H-index cannot exceed number of papers (n)
  • Citations higher than n don't need precise counting
  • We only need frequency counts, not sorted order
  • Can process frequencies in optimal order (high to low)

Edge Cases

  • Empty array: Return 0
  • All zeros: Return 0
  • All citations > n: Return n
  • Citations not in sorted order: Works correctly

Common Mistakes

  • Not capping citations at n in bucket counting
  • Scanning buckets in wrong direction
  • Not maintaining running count correctly
  • Trying to sort citations first (unnecessary)
H-Index - Interactive Visualization
Current Citation
-
Running Count
0
H-Index
0
Starting with citations array and empty buckets
Citations Array
3
0
6
1
5
0
1
2
3
4
Buckets Array
0
0
0
0
0
0
0
1
2
3
4
5
Previous
Previous

Integer to Roman [Code + Interactive visualization]

Next
Next

Roman To Integer [Code + Interactive Visualization]