update_21
rucliuzenghao committed Sep 18, 2024
1 parent ef73cb9 commit f03f8c5
Showing 1 changed file with 1 addition and 7 deletions.
@@ -26,13 +26,7 @@ publication_types: ['journal-article']
publication: in *JCST*
publication_short: ""

-abstract: Deep learning has shown significant improvements on various machine learning tasks by introducing a wide
-spectrum of neural network models. Yet, for these neural network models, it is necessary to label a tremendous amount
-of training data, which is prohibitively expensive in reality. In this paper, we propose OnLine Machine Learning (OLML)
-database which stores trained models and reuses these models in a new training task to achieve a better training effect
-with a small amount of training data. An efficient model reuse algorithm AdaReuse is developed in the OLML database.
-Specifically, AdaReuse firstly estimates the reuse potential of trained models from domain relatedness and model quality,
-through which a group of trained models with high reuse potential for the training task could be selected efficiently. Then, multiple selected models will be trained iteratively to encourage diverse models, with which a better training effect could be achieved by ensemble. We evaluate AdaReuse on two types of natural language processing (NLP) tasks, and the results show AdaReuse could improve the training effect significantly compared with models trained from scratch when the training data is limited. Based on AdaReuse, we implement an OLML database prototype system which could accept a training task as an SQL-like query and automatically generate a training plan by selecting and reusing trained models. Usability studies are conducted to illustrate the OLML database could properly store the trained models, and reuse the trained models efficiently in new training tasks.
+abstract: Deep learning has shown significant improvements on various machine learning tasks by introducing a wide spectrum of neural network models. Yet, for these neural network models, it is necessary to label a tremendous amount of training data, which is prohibitively expensive in reality. In this paper, we propose the OnLine Machine Learning (OLML) database, which stores trained models and reuses these models in a new training task to achieve a better training effect with a small amount of training data. An efficient model reuse algorithm, AdaReuse, is developed in the OLML database. Specifically, AdaReuse firstly estimates the reuse potential of trained models from domain relatedness and model quality, through which a group of trained models with high reuse potential for the training task could be selected efficiently. Then, multiple selected models will be trained iteratively to encourage diverse models, with which a better training effect could be achieved by ensemble. We evaluate AdaReuse on two types of natural language processing (NLP) tasks, and the results show AdaReuse could improve the training effect significantly compared with models trained from scratch when the training data is limited. Based on AdaReuse, we implement an OLML database prototype system which could accept a training task as an SQL-like query and automatically generate a training plan by selecting and reusing trained models. Usability studies are conducted to illustrate that the OLML database could properly store the trained models, and reuse the trained models efficiently in new training tasks.

# Summary. An optional shortened abstract.
summary: ""
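The abstract describes AdaReuse as first scoring stored models by domain relatedness and model quality, then selecting the most promising candidates for reuse. A minimal sketch of that selection step, in which every name, the scoring rule (a simple product of the two factors), and the example numbers are illustrative assumptions rather than the paper's actual implementation, might look like this:

```python
# Illustrative sketch of AdaReuse-style model selection.
# The scoring rule and all values below are assumptions, not the paper's method.
from dataclasses import dataclass


@dataclass
class TrainedModel:
    name: str
    domain_relatedness: float  # similarity of source and target domains, in [0, 1]
    quality: float             # e.g. held-out accuracy on the source task, in [0, 1]


def reuse_potential(m: TrainedModel) -> float:
    # Assumed scoring rule: combine domain relatedness with model quality.
    return m.domain_relatedness * m.quality


def select_models(candidates: list[TrainedModel], k: int) -> list[TrainedModel]:
    # Keep the k stored models with the highest estimated reuse potential;
    # these would then be fine-tuned iteratively and combined by ensemble.
    return sorted(candidates, key=reuse_potential, reverse=True)[:k]


stored = [
    TrainedModel("news-classifier", domain_relatedness=0.9, quality=0.8),
    TrainedModel("review-sentiment", domain_relatedness=0.6, quality=0.9),
    TrainedModel("code-tagger", domain_relatedness=0.2, quality=0.95),
]
picked = select_models(stored, k=2)
print([m.name for m in picked])  # prints ['news-classifier', 'review-sentiment']
```

In this sketch the two highest-scoring models are chosen, matching the abstract's idea that only models with high reuse potential enter the iterative training and ensemble stage.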
