Question & Answer with Sparse BERT using the SQuAD dataset
This prediction model answers a question about a given input text, a task known as extractive reading comprehension. It does not answer questions in general; it only extracts answers from the text that you provide. Even with that constraint, automated reading comprehension is a valuable capability.
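As a concrete illustration, an extractive QA model like this one can be queried with the Hugging Face `transformers` pipeline API. The sketch below is minimal; the checkpoint name is an assumption (a sparse Prune Once for All SQuAD model published by Intel) and should be replaced with the model linked in the next paragraph if it differs.

```python
from transformers import pipeline

# Assumed checkpoint name (a sparse Prune OFA SQuAD model);
# substitute the model linked below if it differs.
model_id = "Intel/bert-base-uncased-squadv1.1-sparse-80-1x4-block-pruneofa"

# Build an extractive question-answering pipeline.
qa = pipeline("question-answering", model=model_id, tokenizer=model_id)

context = (
    "Intel Neural Compressor is an open-source Python library for model "
    "compression techniques such as quantization, pruning, and distillation."
)
result = qa(question="What is Intel Neural Compressor?", context=context)
print(result["answer"], round(result["score"], 3))
```

Note that the answer is always a span extracted from the supplied context; if the answer is not present in the context, the model still returns its best-scoring span, just with a low confidence score.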
The model is based on the Zafrir et al. (2021) paper Prune Once for All: Sparse Pre-Trained Language Models. The model can be found here. Weight pruning and model distillation were applied to create a sparse weight pattern that is maintained even after fine-tuning. According to Zafrir et al. (2021), their "results show the best compression-to-accuracy ratio for BERT-Base". The model weights are still in FP32, but they can be quantized to INT8 with Intel® Neural Compressor.
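A rough sketch of that quantization step is shown below, assuming Intel Neural Compressor 2.x's post-training quantization API; the dynamic approach is used here because it requires no calibration data. This is an illustrative example, not the exact recipe used for this Space.

```python
from transformers import AutoModelForQuestionAnswering
from neural_compressor import PostTrainingQuantConfig, quantization

# Assumed FP32 checkpoint name; use the sparse model linked above.
model_id = "Intel/bert-base-uncased-squadv1.1-sparse-80-1x4-block-pruneofa"
model = AutoModelForQuestionAnswering.from_pretrained(model_id)

# Post-training dynamic quantization: weights are converted to INT8,
# and activations are quantized on the fly at inference time.
config = PostTrainingQuantConfig(approach="dynamic")
int8_model = quantization.fit(model=model, conf=config)

# Save the quantized model for later inference.
int8_model.save("./int8-squad-model")
```

Static post-training quantization (with a calibration dataloader of tokenized SQuAD examples) would typically recover more of the INT8 speedup, at the cost of a slightly more involved setup.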
The model was pre-trained on the English Wikipedia dataset (2,500M words) and then fine-tuned on the SQuAD v1.1 dataset, which contains 89K training examples and was compiled by Rajpurkar et al. (2016): SQuAD: 100,000+ Questions for Machine Comprehension of Text.
Author of Hugging Face Space: Benjamin Consolvo, AI Solutions Engineer Manager at Intel
Date last updated: 03/28/2023
The app takes two inputs: a Context passage and a Question about it.