distilbert-base-uncased-finetuned-imdb
This model is a fine-tuned version of distilbert-base-uncased on the IMDb dataset.
Model description
This model uses DistilBERT (a smaller, faster, and cheaper version of BERT) and adapts it to the movie-review domain.
While the original DistilBERT was pre-trained on English Wikipedia and BookCorpus (factual and literary data), this version is fine-tuned on the IMDb dataset to better understand the specific vocabulary, sentiment nuances, and context of movie reviews. This process allows the model to predict masked tokens that are contextually relevant to the film industry and subjective opinions.
Intended uses & limitations
Intended Uses:
- Masked Language Modeling: The model can be used to fill in the blank ([MASK]) in sentences related to movies or reviews (see the sketch after this list).
- Domain Adaptation Base: This model can serve as a better starting point (backbone) for training a downstream classifier (e.g., sentiment analysis) on movie reviews compared to the vanilla DistilBERT.
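As an illustration of the masked language modeling use case, here is a minimal sketch using the Transformers fill-mask pipeline (the checkpoint id is this repository's; the example sentence is arbitrary):

```python
from transformers import pipeline

# Load the fine-tuned checkpoint as a fill-mask pipeline.
mask_filler = pipeline(
    "fill-mask",
    model="rajaykumar12959/distilbert-base-uncased-finetuned-imdb",
)

# Top predictions for the masked token in a movie-review style sentence.
for pred in mask_filler("This is a great [MASK]."):
    print(f"{pred['token_str']}: {pred['score']:.3f}")
```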
Limitations:
- The model is trained on a downsampled version of the IMDb dataset (10,000 samples) for demonstration purposes, so it may not be as robust as a model trained on the full corpus.
- It is biased towards the specific vocabulary found in internet movie reviews (which can be highly polarized).
Training and evaluation data
The model was trained on the Large Movie Review Dataset (IMDb).
Preprocessing (a code sketch of these steps follows the list):
- Tokenization: Used the DistilBERT tokenizer with a chunk size of 128 tokens.
- Masking: Random masking with a probability of 0.15 (15%).
- Sampling: To speed up training for the tutorial, the dataset was downsampled:
  - Training set: 10,000 examples.
  - Test set: 1,000 examples (10% of the training set).
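These steps follow the standard Datasets/Transformers masked-LM workflow; the sketch below is a reconstruction, not the exact script used (helper names like tokenize_function and group_texts are illustrative):

```python
from datasets import load_dataset
from transformers import AutoTokenizer, DataCollatorForLanguageModeling

chunk_size = 128
tokenizer = AutoTokenizer.from_pretrained("distilbert-base-uncased")
imdb = load_dataset("imdb")

def tokenize_function(examples):
    # Only the raw text is needed; sentiment labels are dropped for MLM.
    return tokenizer(examples["text"])

def group_texts(examples):
    # Concatenate all reviews, then slice into fixed chunks of 128 tokens.
    concatenated = {k: sum(examples[k], []) for k in examples.keys()}
    total_length = (len(concatenated["input_ids"]) // chunk_size) * chunk_size
    result = {
        k: [t[i : i + chunk_size] for i in range(0, total_length, chunk_size)]
        for k, t in concatenated.items()
    }
    # Labels start as a copy of the inputs; the collator applies the masking.
    result["labels"] = result["input_ids"].copy()
    return result

tokenized = imdb.map(tokenize_function, batched=True, remove_columns=["text", "label"])
lm_datasets = tokenized.map(group_texts, batched=True)

# Downsample for the tutorial: 10,000 training chunks, 1,000 (10%) for evaluation.
downsampled = lm_datasets["train"].train_test_split(
    train_size=10_000, test_size=1_000, seed=42
)

# Randomly mask 15% of tokens on the fly at each training step.
data_collator = DataCollatorForLanguageModeling(tokenizer=tokenizer, mlm_probability=0.15)
```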
Training procedure
Training hyperparameters
The following hyperparameters were used during training (a Trainer setup sketch follows the list):
- learning_rate: 2e-05
- train_batch_size: 64
- eval_batch_size: 64
- seed: 42
- optimizer: AdamW
- weight_decay: 0.01
- lr_scheduler_type: linear
- num_epochs: 3.0
- mixed_precision_training: Native AMP (fp16=True)
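These hyperparameters map onto a Trainer setup roughly as follows. This is a sketch, not the exact training script: it reuses the downsampled, tokenizer, and data_collator objects from the preprocessing sketch above, and the output directory name is an assumption.

```python
from transformers import AutoModelForMaskedLM, Trainer, TrainingArguments

model = AutoModelForMaskedLM.from_pretrained("distilbert-base-uncased")

training_args = TrainingArguments(
    output_dir="distilbert-base-uncased-finetuned-imdb",  # assumed name
    learning_rate=2e-5,
    per_device_train_batch_size=64,
    per_device_eval_batch_size=64,
    weight_decay=0.01,
    num_train_epochs=3.0,
    lr_scheduler_type="linear",
    seed=42,
    fp16=True,  # Native AMP mixed precision
    eval_strategy="epoch",
)

trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=downsampled["train"],
    eval_dataset=downsampled["test"],
    data_collator=data_collator,
    processing_class=tokenizer,
)

trainer.train()
```

The default Trainer optimizer (AdamW) matches the optimizer listed above, so it does not need to be set explicitly.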
Training results
The model achieved a significant reduction in perplexity compared to the pre-trained base model, indicating successful adaptation to the movie review domain.
| Epoch | Perplexity |
|---|---|
| 0 | 11.40 |
| 1 | 10.90 |
| 2 | 10.73 |
Note: The base model started with a perplexity of ~21.75 on this dataset before fine-tuning.
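Perplexity here is the exponential of the evaluation cross-entropy loss. A short sketch, assuming the trainer object from the previous section:

```python
import math

# Perplexity = exp(cross-entropy loss) on the held-out split.
eval_results = trainer.evaluate()
print(f"Perplexity: {math.exp(eval_results['eval_loss']):.2f}")
```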
Framework versions
- Transformers 4.57.2
- Pytorch 2.9.0+cu126
- Datasets 4.0.0
- Tokenizers 0.22.1