Abstract
Despite much success, the effectiveness of deep learning models largely relies on the availability of large amounts of labeled data. Labeled data, however, are costly to acquire in many applications of interest, which hinders the applicability of these models, especially in resource-poor settings. On the other hand, with the growth of the internet, an enormous amount of user-generated data has accumulated and is readily and freely available. Although these data may lack annotations for the structured outputs required by target downstream tasks, they can provide relevant information and background knowledge that can be turned into auxiliary learning signals to enhance the target application. Hence, computational approaches that leverage open-source data, as well as resource-rich corpora, in low-resource applications can enable us to build models for a broad spectrum of languages, domains, and modalities regardless of their training data size.
| Original language | English |
|---|---|
| Publication status | Published - 2022 |
| Externally published | Yes |