H-Index [Code + Interactive Visualization]
def hIndex(citations):
# Get length of citations array
n = len(citations)
# Create buckets array with size n+1 to count citations
buckets = [0] * (n + 1)
# Count frequencies of citations
for c in citations:
# If citation count >= n, put in last bucket
if c >= n:
buckets[n] += 1
else:
buckets[c] += 1
# Track total papers with enough citations
count = 0
# Check each possible h-index value from highest to lowest
for i in range(n, -1, -1):
count += buckets[i]
if count >= i:
return i
return 0
class Solution {
// Optimal Bucket Sort solution
public int hIndex(int[] citations) {
// Get length of citations array
int n = citations.length;
// Create buckets array with size n+1 to count citations
int[] buckets = new int[n + 1];
// Count frequencies of citations
for (int c : citations)
if (c >= n) buckets[n]++; else buckets[c]++;
// Track total papers with enough citations
int count = 0;
// Check each possible h-index value from highest to lowest
for (int i = n; i >= 0; i--) {
count += buckets[i];
if (count >= i)
return i;
}
return 0;
}
}
class Solution {
public:
int hIndex(vector& citations) {
// Get length of citations array
int n = citations.size();
// Create buckets array with size n+1 to count citations
vector buckets(n + 1, 0);
// Count frequencies of citations
for (int c : citations)
if (c >= n) buckets[n]++; else buckets[c]++;
// Track total papers with enough citations
int count = 0;
// Check each possible h-index value from highest to lowest
for (int i = n; i >= 0; i--) {
count += buckets[i];
if (count >= i)
return i;
}
return 0;
}
};
Problem Statement
Given an array of integers citations where citations[i] is the number of citations a researcher received for their ith paper, return the researcher's h-index. According to the definition of h-index on Wikipedia: The h-index is defined as the maximum value of h such that the given researcher has published at least h papers that have each been cited at least h times.
Detailed Explanation
Approach
This solution uses bucket sort to efficiently count citation frequencies. Since the h-index cannot be larger than the number of papers, we can use buckets to count citations and find the optimal h-index in a single pass.
Key Concepts
- Bucket Sort Principle: Count frequencies in buckets, where each index represents citation count
- Capped Counts: Citations higher than n can be counted in bucket n
- Running Count: Track papers with sufficient citations while scanning buckets
Algorithm Steps
- Create buckets array of size n+1:
- Index i represents i citations
- Last bucket (n) holds count of all papers with ≥n citations
- Count citations in appropriate buckets:
- If citations ≥ n, increment bucket[n]
- Otherwise, increment bucket[citations]
- Scan buckets from n to 0:
- Add count from current bucket to running total
- If running total ≥ current index, found h-index
Time and Space Complexity
- Time Complexity: O(n) - One pass to count citations, one pass through buckets
- Space Complexity: O(n) - Buckets array of size n+1
Why Bucket Sort Works
Bucket sort is optimal here because:
- H-index cannot exceed number of papers (n)
- Citations higher than n don't need precise counting
- We only need frequency counts, not sorted order
- Can process frequencies in optimal order (high to low)
Edge Cases
- Empty array: Return 0
- All zeros: Return 0
- All citations > n: Return n
- Citations not in sorted order: Works correctly
Common Mistakes
- Not capping citations at n in bucket counting
- Scanning buckets in wrong direction
- Not maintaining running count correctly
- Trying to sort citations first (unnecessary)
Current Citation
-
Running Count
0
H-Index
0
Starting with citations array and empty buckets
Citations Array
3
0
6
1
5
0
1
2
3
4
Buckets Array
0
0
0
0
0
0
0
1
2
3
4
5