Requirements: TensorFlow 2.2+
Please report bugs via reviews.
- Implementation logic of the algorithm in the paper (see also the comments on _CLPM in CLPM/CLPM.py)
- Alternation in CLPM_MLM.py: line 123
a. Use UNIVERSAL/tokenizer to generate BPE and the entries of each language's domain. **NOTE** that, for your convenience, you can use FastBPE to collect the language domains, e.g., apply the codes to the corpus with ./fast applybpe after-bpe-En-corpus before-bpe-En-corpus bpe-codes, and then collect the vocabulary with ./fast getvocab after-bpe-En-corpus > vocab.En.
b. Set up the entries of the language domains for CLPM (CLPM.py, line 66). A sketch of loading the FastBPE vocabularies into language domains is given after step e.
c. Select a masking method and an MLM instance.
d. Suppose we use an XLM encoder and its masking method.
1. The input is [x1, [MASK], x3, [MASK], x5, [MASK], x6, [MASK]].
2. In line 90 of CLPM_MLM.py, we randomly select the [C] positions: clpm_position = [0,1,0,0,0,0,0,1].
3. We wrap the inference mode in line 105 of CLPM_MLM.py and invoke it in line 152 of CLPM.py.
4. We pass the input [x1, [MASK], x3, [MASK], x5, [MASK], x6, [MASK]] through the encoder for inference.
5. Then, we use the last hidden state to compute candidates (line 144, def _cos, in CLPM.py) and get [C]: [0, [C2], 0, 0, 0, 0, 0, [C7]], where C2 and C7 are candidate representations. Please also see the comments in the files.
e. Now we can build the perturbed input for training (a TensorFlow sketch of this perturbation is given below):
Input Embedding * (1 - clpm_position) + [C]
e.g., [x1, [MASK], x3, [MASK], x5, [MASK], x6, [MASK]] * (1 - [0,1,0,0,0,0,0,1]) = [x1, 0, x3, [MASK], x5, [MASK], x6, 0], and [x1, 0, x3, [MASK], x5, [MASK], x6, 0] + [0, [C2], 0, 0, 0, 0, 0, [C7]] = [x1, [C2], x3, [MASK], x5, [MASK], x6, [C7]], which is the training sample.
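For steps a and b, the snippet below is a minimal sketch (not code from this repository) of loading the FastBPE getvocab output into per-language domain sets of vocabulary ids; the file names and the token_to_id mapping are hypothetical placeholders.

```python
# Hypothetical helper, not part of this repository: build one language's domain
# from the output of "./fast getvocab corpus > vocab.XX" (one "token count"
# pair per line). `token_to_id` is an assumed mapping from BPE token to the
# vocabulary index used by the MLM instance.
def load_language_domain(vocab_path, token_to_id):
    domain = set()
    with open(vocab_path, encoding="utf-8") as f:
        for line in f:
            parts = line.split()             # each line: "<token> <count>"
            if parts and parts[0] in token_to_id:
                domain.add(token_to_id[parts[0]])
    return domain

# e.g., language_domains = {"en": load_language_domain("vocab.En", token_to_id),
#                           "fr": load_language_domain("vocab.Fr", token_to_id)}
```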
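Steps d and e can be pictured with the following minimal, self-contained TensorFlow 2.x sketch. It is an illustration under assumptions, not the repository's actual interface: encoder, embedding_matrix, mask_id, target_domain_ids, and clpm_prob are hypothetical placeholders for the objects CLPM.py and CLPM_MLM.py actually work with.

```python
import tensorflow as tf

def clpm_perturb(input_embeds, input_ids, encoder, embedding_matrix,
                 target_domain_ids, mask_id, clpm_prob=0.25):
    """Replace a random subset of [MASK] embeddings with candidate representations [C].

    input_embeds:      [batch, seq_len, dim] embeddings of the masked input
    input_ids:         [batch, seq_len] token ids of the masked input
    target_domain_ids: 1-D tensor of vocabulary ids in the target language's domain
    """
    # d.2: randomly pick the [C] positions among the [MASK] positions.
    is_masked = tf.cast(tf.equal(input_ids, mask_id), tf.float32)
    clpm_position = is_masked * tf.cast(
        tf.random.uniform(tf.shape(is_masked)) < clpm_prob, tf.float32)   # [batch, seq_len]

    # d.3 / d.4: run the encoder in inference mode on the masked input.
    last_hidden = encoder(input_embeds, training=False)                   # [batch, seq_len, dim]

    # d.5: cosine similarity between hidden states and the target-language
    # domain embeddings; the best match provides the candidate representation.
    domain_embeds = tf.gather(embedding_matrix, target_domain_ids)        # [domain, dim]
    h = tf.math.l2_normalize(last_hidden, axis=-1)
    e = tf.math.l2_normalize(domain_embeds, axis=-1)
    sims = tf.einsum("bld,vd->blv", h, e)                                 # [batch, seq_len, domain]
    candidates = tf.gather(domain_embeds, tf.argmax(sims, axis=-1))       # [batch, seq_len, dim]

    # e: Input Embedding * (1 - clpm_position) + [C]; positions with
    # clpm_position == 0 keep their original (possibly [MASK]) embedding.
    clpm_position = clpm_position[..., tf.newaxis]                        # [batch, seq_len, 1]
    return input_embeds * (1.0 - clpm_position) + candidates * clpm_position
```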