Dragoș Alexandru Bălan
greenw0lf
AI & ML interests
Automatic speech recognition for low-resource languages
Organizations
None yet
Whisper - 22 hours of training data
Collection comparing different data selection methods, with the amount of training data held constant (22 hours)
Data selection - early experiments
Collection of fine-tuned Whisper models using different data selection approaches (metadata, classifier, full corpus)
Whisper child-adult training data ratios for child ASR
Models all trained on 30 hours of speech, but with different ratios of child to adult speech