Hard-Bench

A Challenging Benchmark for Low-Resource Learning

Paper      code      data

About

We are delighted to introduce Hard-Bench, a benchmark for low-resource learning that curates its training data from existing datasets by selecting hard examples, which makes it more challenging than randomly sampling a few-shot training set. The benchmark covers both NLP and CV: the eight tasks from the GLUE benchmark, CIFAR-10, CIFAR-100, and ImageNet.

Hard-Bench (GradNorm)

In Hard-Bench (GradNorm), we build a k-shot dataset by selecting, for each label, the k data points with the largest gradient norms.
Specifically, for each data point in the original dataset, we compute the gradient vector of the predictor being trained and take its Euclidean norm. We then sort the data points within each label by gradient norm and keep the top k to form the k-shot dataset. The result is a deliberately difficult training set for evaluating models in the low-resource setting.
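The selection step above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the benchmark's actual pipeline: the predictor is assumed to be a linear softmax classifier without bias, trained with cross-entropy, for which the per-example gradient with respect to the weights is the outer product of (p - onehot) and x, so its norm factorizes as ||p - onehot|| * ||x||. The names `grad_norm` and `select_top_k_by_grad_norm` are hypothetical.

```python
import math
from collections import defaultdict


def softmax(logits):
    # Subtract the max logit for numerical stability.
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def l2_norm(vec):
    return math.sqrt(sum(v * v for v in vec))


def grad_norm(weights, x, label):
    """Euclidean norm of the cross-entropy gradient w.r.t. the weights.

    Assumption: a bias-free linear softmax classifier, where the gradient
    for one example is outer(p - onehot, x).
    """
    logits = [sum(w * xi for w, xi in zip(row, x)) for row in weights]
    probs = softmax(logits)
    residual = [p - (1.0 if c == label else 0.0) for c, p in enumerate(probs)]
    return l2_norm(residual) * l2_norm(x)


def select_top_k_by_grad_norm(weights, features, labels, k):
    """Return {label: indices of the k examples with the largest gradient norm}."""
    by_label = defaultdict(list)
    for idx, (x, y) in enumerate(zip(features, labels)):
        by_label[y].append((grad_norm(weights, x, y), idx))
    selected = {}
    for y, scored in by_label.items():
        # Sort within each label, largest gradient norm first.
        scored.sort(key=lambda pair: pair[0], reverse=True)
        selected[y] = [idx for _, idx in scored[:k]]
    return selected
```

For a deep network the same idea applies, but the per-example gradient would normally come from an autodiff framework rather than a closed form.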

Hard-Bench (Loss)

In Hard-Bench (Loss), we build a k-shot dataset by selecting, for each label, the k data points with the highest loss values.
Specifically, for each data point in the original dataset, we compute the loss of the predictor being trained. We then sort the data points within each label by loss and keep the top k to form the k-shot dataset. As with the GradNorm variant, the result is a deliberately difficult training set for evaluating models in the low-resource setting.
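A minimal sketch of the loss-based selection follows. It assumes the loss is cross-entropy computed from the predictor's class probabilities, which the text above does not specify; the function names and the `eps` clamp are hypothetical choices made for this illustration.

```python
import heapq
import math
from collections import defaultdict


def cross_entropy(probs, label, eps=1e-12):
    # Clamp the probability to avoid log(0). Assumed loss: cross-entropy.
    return -math.log(max(probs[label], eps))


def select_top_k_by_loss(predicted_probs, labels, k):
    """Return {label: indices of the k examples with the highest loss}."""
    by_label = defaultdict(list)
    for idx, (probs, y) in enumerate(zip(predicted_probs, labels)):
        by_label[y].append((cross_entropy(probs, y), idx))
    return {
        # nlargest returns the k highest-loss entries, highest first.
        y: [idx for _, idx in heapq.nlargest(k, scored, key=lambda pair: pair[0])]
        for y, scored in by_label.items()
    }
```

Using `heapq.nlargest` avoids a full sort per label, which matters little at this scale but is convenient when k is much smaller than the number of examples per label.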

The content on this website was proofread by ChatGPT.

Available Leaderboards

Leaderboard: NLP, 500-shot, Hard-Bench (GradNorm)
Leaderboard: GLUE, 500-shot, Hard-Bench (Loss)
Leaderboard: CV, Hard-Bench (GradNorm)
500-shot for CIFAR-10, 50-shot for CIFAR-100, 100-shot for ImageNet
Leaderboard: CV, Hard-Bench (Loss)
500-shot for CIFAR-10, 50-shot for CIFAR-100, 100-shot for ImageNet
Leaderboard: GLUE, 100-shot, Hard-Bench (GradNorm)
Leaderboard: GLUE, 100-shot, Hard-Bench (Loss)
Leaderboard: GLUE, 32-shot, Hard-Bench (GradNorm)
Leaderboard: GLUE, 32-shot, Hard-Bench (Loss)
Leaderboard: GLUE, 16-shot, Hard-Bench (GradNorm)
Leaderboard: GLUE, 16-shot, Hard-Bench (Loss)