Add models from Hugging Face/transformers from MLAgility (#615)
* popular_on_huggingface/bert-base-uncased.py

Signed-off-by: jcwchen <[email protected]>

* add transformers models

Signed-off-by: jcwchen <[email protected]>

* remove gpt1 and gpt2 for now

Signed-off-by: jcwchen <[email protected]>

* config

Signed-off-by: jcwchen <[email protected]>

* get model name from build_dir

Signed-off-by: jcwchen <[email protected]>

* find_model_hash_name

Signed-off-by: jcwchen <[email protected]>

* subprocess.PIPE

Signed-off-by: jcwchen <[email protected]>

* new models

Signed-off-by: jcwchen <[email protected]>

* 7 models

Signed-off-by: jcwchen <[email protected]>

* only keep 4

Signed-off-by: jcwchen <[email protected]>

* remove 4

Signed-off-by: jcwchen <[email protected]>

* remove albert-base-v2

Signed-off-by: jcwchen <[email protected]>

* del model and sess

Signed-off-by: jcwchen <[email protected]>

* check_path

Signed-off-by: jcwchen <[email protected]>

* drop models in CI

Signed-off-by: jcwchen <[email protected]>

* add bert_generation

Signed-off-by: jcwchen <[email protected]>

* --binary

Signed-off-by: jcwchen <[email protected]>

* disable bert_generation.py

Signed-off-by: jcwchen <[email protected]>

* no binary

Signed-off-by: jcwchen <[email protected]>

* cancel in progress

Signed-off-by: jcwchen <[email protected]>

* binary

Signed-off-by: jcwchen <[email protected]>

* minimal

Signed-off-by: jcwchen <[email protected]>

* --mini

Signed-off-by: jcwchen <[email protected]>

* manually check

Signed-off-by: jcwchen <[email protected]>

* only keep

Signed-off-by: jcwchen <[email protected]>

* run_test_dir

Signed-off-by: jcwchen <[email protected]>

* comma

Signed-off-by: jcwchen <[email protected]>

* cache_converted_dir = "~/.cache"

Signed-off-by: jcwchen <[email protected]>

* delete and clean cache

Signed-off-by: jcwchen <[email protected]>

* clean

Signed-off-by: jcwchen <[email protected]>

* clean all

Signed-off-by: jcwchen <[email protected]>

* only clean

Signed-off-by: jcwchen <[email protected]>

* --cache-dir", cache_converted_dir

Signed-off-by: jcwchen <[email protected]>

* disable openai_clip-vit-large-patch14

Signed-off-by: jcwchen <[email protected]>

* disable

Signed-off-by: jcwchen <[email protected]>

* only keep 4

Signed-off-by: jcwchen <[email protected]>

* comma

Signed-off-by: jcwchen <[email protected]>

* runs-on: macos-latest

Signed-off-by: jcwchen <[email protected]>

* not using conda

Signed-off-by: jcwchen <[email protected]>

* final_model_path

Signed-off-by: jcwchen <[email protected]>

* git-lfs pull dir

Signed-off-by: jcwchen <[email protected]>

* git diff

Signed-off-by: jcwchen <[email protected]>

* use onnx.load to compare

Signed-off-by: jcwchen <[email protected]>

* test_utils.pull_lfs_file(final_model_path)

Signed-off-by: jcwchen <[email protected]>

* only test changed models

Signed-off-by: jcwchen <[email protected]>

* test_utils

Signed-off-by: jcwchen <[email protected]>

* get_cpu_info

Signed-off-by: jcwchen <[email protected]>

* ext names

Signed-off-by: jcwchen <[email protected]>

* test_utils.get_changed_models()

Signed-off-by: jcwchen <[email protected]>

* compare 2

Signed-off-by: jcwchen <[email protected]>

* fix init

Signed-off-by: jcwchen <[email protected]>

* transformers==4.29.2

Signed-off-by: jcwchen <[email protected]>

* test

Signed-off-by: jcwchen <[email protected]>

* initializer

Signed-off-by: jcwchen <[email protected]>

* update bert-generation

Signed-off-by: jcwchen <[email protected]>

* fixed numpy

Signed-off-by: jcwchen <[email protected]>

* print(f"initializer {k}")

Signed-off-by: jcwchen <[email protected]>

* update bert from mac

Signed-off-by: Chun-Wei Chen <[email protected]>

* remove bert-generation

Signed-off-by: jcwchen <[email protected]>

* mlagility_subdir_count number

Signed-off-by: jcwchen <[email protected]>

* remove unused onnx

Signed-off-by: jcwchen <[email protected]>

---------

Signed-off-by: jcwchen <[email protected]>
Signed-off-by: Chun-Wei Chen <[email protected]>
jcwchen authored Jul 26, 2023
1 parent c5612a4 commit c021460
Showing 31 changed files with 170 additions and 56 deletions.
4 changes: 4 additions & 0 deletions .github/workflows/codeql.yml
@@ -19,6 +19,10 @@ on:
schedule:
- cron: '31 11 * * 4'

concurrency:
group: ${{ github.workflow }}-${{ github.head_ref || github.run_id }}
cancel-in-progress: true

jobs:
analyze:
name: Analyze
4 changes: 4 additions & 0 deletions .github/workflows/linux_ci.yml
@@ -9,6 +9,10 @@ on:
pull_request:
branches: [ main, new-models]

concurrency:
group: ${{ github.workflow }}-${{ github.head_ref || github.run_id }}
cancel-in-progress: true

jobs:
# This workflow contains a single job called "build"
build:
17 changes: 10 additions & 7 deletions .github/workflows/mlagility_validation.yml
@@ -6,20 +6,23 @@ on:
pull_request:
branches: [ main, new-models]

concurrency:
group: ${{ github.workflow }}-${{ github.head_ref || github.run_id }}
cancel-in-progress: true

jobs:
build:
runs-on: ubuntu-latest
runs-on: macos-latest
strategy:
matrix:
python-version: ['3.8']
python-version: ["3.8"]

steps:
- uses: actions/checkout@v3
- uses: actions/checkout@c85c95e3d7251135ab7dc9ce3241c5835cc595a9 # v3.5.3
name: Checkout repo
- uses: conda-incubator/setup-miniconda@v2
- name: Set up Python ${{ matrix.python-version }}
uses: actions/setup-python@61a6322f88396a6271a6ee3565807d608ecaddd1 # v4.7.0
with:
miniconda-version: "latest"
activate-environment: mla
python-version: ${{ matrix.python-version }}

- name: Install dependencies and mlagility
@@ -34,4 +37,4 @@ jobs:
run: |
# TODO: remove the following after mlagility has resolved the version conflict issue
pip install -r models/mlagility/requirements.txt
python workflow_scripts/run_mlagility.py
python workflow_scripts/run_mlagility.py --drop
4 changes: 4 additions & 0 deletions .github/workflows/windows_ci.yml
@@ -9,6 +9,10 @@ on:
pull_request:
branches: [ main, new-models]

concurrency:
group: ${{ github.workflow }}-${{ github.head_ref || github.run_id }}
cancel-in-progress: true

jobs:
# This workflow contains a single job called "build"
build:
Git LFS file not shown (20 files)
2 changes: 2 additions & 0 deletions models/mlagility/requirements.txt
@@ -1,2 +1,4 @@
numpy==1.24.4
torch==2.0.1
torchvision==0.15.2
transformers==4.29.2
3 changes: 2 additions & 1 deletion workflow_scripts/check_model.py
@@ -16,7 +16,8 @@ def has_vnni_support():

def run_onnx_checker(model_path):
model = onnx.load(model_path)
onnx.checker.check_model(model, full_check=True)
del model
onnx.checker.check_model(model_path, full_check=True)


def ort_skip_reason(model_path):
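The rewritten run_onnx_checker no longer hands a loaded ModelProto to the checker: passing the file path lets onnx.checker resolve external data and models beyond the 2 GB protobuf limit, and the del frees the proto that was presumably loaded only to confirm the file parses. A minimal standalone sketch of the same pattern (the model path below is illustrative):

import onnx


def run_onnx_checker(model_path):
    # Load once (presumably just to confirm the protobuf parses), then free it
    # so the full check below does not keep a second copy in memory.
    model = onnx.load(model_path)
    del model
    # Checking by path lets onnx.checker handle external data and >2GB models.
    onnx.checker.check_model(model_path, full_check=True)


if __name__ == "__main__":
    # Illustrative path following the naming scheme used in this commit.
    run_onnx_checker("models/mlagility/distilbert-base-uncased/distilbert-base-uncased-16.onnx")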
3 changes: 1 addition & 2 deletions workflow_scripts/generate_onnx_hub_manifest.py
@@ -14,8 +14,7 @@
import onnx
from onnx import shape_inference
import argparse
from test_models import get_changed_models
from test_utils import pull_lfs_file
from test_utils import get_changed_models, pull_lfs_file


# Acknowledgments to pytablereader codebase for this function
11 changes: 11 additions & 0 deletions workflow_scripts/mlagility_config.py
@@ -15,4 +15,15 @@
"torch_hub/densenet121.py",
"torch_hub/inception_v3.py",
"torch_hub/googlenet.py",
#"transformers/bert_generation.py", # non consistent created model from mlagility
#"popular_on_huggingface/bert-base-uncased.py",
#"popular_on_huggingface/xlm-roberta-large.py",
#"popular_on_huggingface/bert-large-uncased.py",
"popular_on_huggingface/openai_clip-vit-large-patch14.py",
#"popular_on_huggingface/xlm-roberta-base.py", # output nan
#"popular_on_huggingface/roberta-base.py", # output nan
"popular_on_huggingface/distilbert-base-uncased.py",
#"popular_on_huggingface/distilroberta-base.py", # output nan
"popular_on_huggingface/distilbert-base-multilingual-cased.py",
#"popular_on_huggingface/albert-base-v2", # Status Message: indices element out of data bounds, idx=8 must be within the inclusive range [-2,1]
]
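Each uncommented entry in models_info above is a script path under the mlagility models tree; run_mlagility.py (the next file in this diff) derives both the target directory and the final ONNX file name from it by dropping the subdirectory and the .py suffix. A small illustration of that mapping, reusing ZOO_OPSET_VERSION = "16" and mlagility_models_dir from the script:

import os.path as osp

ZOO_OPSET_VERSION = "16"
mlagility_models_dir = "models/mlagility"


def final_onnx_path(model_info):
    # "popular_on_huggingface/distilbert-base-uncased.py" -> "distilbert-base-uncased"
    _, model_name = model_info.split("/")
    model_name = model_name.replace(".py", "")
    # -> "models/mlagility/distilbert-base-uncased/distilbert-base-uncased-16.onnx"
    return osp.join(mlagility_models_dir, model_name, f"{model_name}-{ZOO_OPSET_VERSION}.onnx")


print(final_onnx_path("popular_on_huggingface/distilbert-base-uncased.py"))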
49 changes: 34 additions & 15 deletions workflow_scripts/run_mlagility.py
@@ -7,6 +7,7 @@
import subprocess
import sys
import ort_test_dir_utils
import test_utils


def get_immediate_subdirectories_count(dir_name):
@@ -21,7 +22,7 @@ def find_model_hash_name(stdout):
line = line.replace("\\", "/")
# last part of the path is the model hash name
return line.split("/")[-1]
raise Exception(f"Cannot find Build dir in {stdout}.")
raise Exception(f"Cannot find Build dir in {stdout}.")


ZOO_OPSET_VERSION = "16"
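find_model_hash_name scans the stdout captured from benchit for the line reporting the build directory and returns its last path component, the hash-named cache subfolder that holds the exported ONNX file. The hunk above only shows the tail of the helper; a hedged reconstruction of the whole function (the exact "Build dir" wording and the UTF-8 decode are assumptions inferred from the exception message and the subprocess.PIPE capture):

def find_model_hash_name(stdout):
    # stdout is the raw bytes captured from benchit via subprocess.PIPE.
    for line in stdout.decode("utf-8").splitlines():
        if "Build dir" in line:
            # Normalize Windows separators so the split below works everywhere.
            line = line.replace("\\", "/")
            # Last part of the path is the model hash name.
            return line.split("/")[-1]
    raise Exception(f"Cannot find Build dir in {stdout}.")


# e.g. b"Build dir: /home/runner/.cache/abc123" -> "abc123"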
@@ -33,34 +34,45 @@ def find_model_hash_name(stdout):


def main():
# calculate first; otherwise the directories might be deleted by shutil.rmtree
mlagility_subdir_count = get_immediate_subdirectories_count(mlagility_models_dir)

parser = argparse.ArgumentParser(description="Test settings")

parser.add_argument("--all_models", required=False, default=False, action="store_true",
help="Test all ONNX Model Zoo models instead of only chnaged models")
parser.add_argument("--create", required=False, default=False, action="store_true",
help="Create new models from mlagility if not exist.")
parser.add_argument("--drop", required=False, default=False, action="store_true",
help="Drop downloaded models after verification. (For space limitation in CIs)")
parser.add_argument("--skip", required=False, default=False, action="store_true",
help="Skip checking models if already exist.")


args = parser.parse_args()
errors = 0

changed_models_set = set(test_utils.get_changed_models())
print(f"Changed models: {changed_models_set}")
for model_info in models_info:
directory_name, model_name = model_info.split("/")
_, model_name = model_info.split("/")
model_name = model_name.replace(".py", "")
model_zoo_dir = model_name
print(f"----------------Checking {model_zoo_dir}----------------")
final_model_dir = osp.join(mlagility_models_dir, model_zoo_dir)
final_model_name = f"{model_zoo_dir}-{ZOO_OPSET_VERSION}.onnx"
final_model_path = osp.join(final_model_dir, final_model_name)
if not args.all_models and final_model_path not in changed_models_set:
print(f"Skip checking {final_model_path} because it is not changed.")
continue
if osp.exists(final_model_path) and args.skip:
print(f"Skip checking {model_zoo_dir} because {final_model_path} already exists.")
continue
try:
print(f"----------------Checking {model_zoo_dir}----------------")
final_model_dir = osp.join(mlagility_models_dir, model_zoo_dir)
final_model_name = f"{model_zoo_dir}-{ZOO_OPSET_VERSION}.onnx"
final_model_path = osp.join(final_model_dir, final_model_name)
if osp.exists(final_model_path) and args.skip:
print(f"Skip checking {model_zoo_dir} because {final_model_path} already exists.")
continue
cmd = subprocess.run(["benchit", osp.join(mlagility_root, model_info), "--cache-dir", cache_converted_dir,
"--onnx-opset", ZOO_OPSET_VERSION, "--export-only"],
cwd=cwd_path, stdout=subprocess.PIPE,
stderr=sys.stderr, check=True)
model_hash_name = find_model_hash_name(cmd.stdout)
print(model_hash_name)
mlagility_created_onnx = osp.join(cache_converted_dir, model_hash_name, "onnx", model_hash_name + base_name)
if args.create:
ort_test_dir_utils.create_test_dir(mlagility_created_onnx, "./", final_model_dir)
@@ -75,14 +87,21 @@ def main():
except Exception as e:
errors += 1
print(f"Failed to check {model_zoo_dir} because of {e}.")

if args.drop:
subprocess.run(["benchit", "cache", "delete", "--all", "--cache-dir", cache_converted_dir],
cwd=cwd_path, stdout=sys.stdout, stderr=sys.stderr, check=True)
subprocess.run(["benchit", "cache", "clean", "--all", "--cache-dir", cache_converted_dir],
cwd=cwd_path, stdout=sys.stdout, stderr=sys.stderr, check=True)
shutil.rmtree(final_model_dir, ignore_errors=True)
shutil.rmtree(cache_converted_dir, ignore_errors=True)
total_count = len(models_info) if args.all_models else len(changed_models_set)
if errors > 0:
print(f"All {len(models_info)} model(s) have been checked, but {errors} model(s) failed.")
print(f"All {total_count} model(s) have been checked, but {errors} model(s) failed.")
sys.exit(1)
else:
print(f"All {len(models_info)} model(s) have been checked.")
print(f"All {total_count} model(s) have been checked.")


mlagility_subdir_count = get_immediate_subdirectories_count(mlagility_models_dir)
if mlagility_subdir_count != len(models_info):
print(f"Expected {len(models_info)} model(s) in {mlagility_models_dir}, but got {mlagility_subdir_count} model(s) under models/mlagility."
f"Please check if you have added new model(s) to models_info in mlagility_config.py.")
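Two behaviors added in run_mlagility.py above are easy to miss in the hunk: unless --all_models is given, a model is only checked when its final ONNX path shows up in test_utils.get_changed_models(), and --drop wipes the benchit cache plus the downloaded model directory after verification to stay within CI disk limits. A condensed, self-contained sketch of the skip logic (the changed-file set below is illustrative):

import os.path as osp

ZOO_OPSET_VERSION = "16"
mlagility_models_dir = "models/mlagility"
models_info = [
    "torch_hub/densenet121.py",
    "popular_on_huggingface/distilbert-base-uncased.py",
]
# Illustrative result of test_utils.get_changed_models() for a PR that only
# touched the distilbert model.
changed_models_set = {
    "models/mlagility/distilbert-base-uncased/distilbert-base-uncased-16.onnx",
}
all_models = False  # corresponds to the --all_models flag

for model_info in models_info:
    _, model_name = model_info.split("/")
    model_name = model_name.replace(".py", "")
    final_model_path = osp.join(mlagility_models_dir, model_name,
                                f"{model_name}-{ZOO_OPSET_VERSION}.onnx")
    if not all_models and final_model_path not in changed_models_set:
        print(f"Skip checking {final_model_path} because it is not changed.")
        continue
    print(f"Would run benchit on {model_info} and compare against {final_model_path}")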
34 changes: 3 additions & 31 deletions workflow_scripts/test_models.py
@@ -23,25 +23,6 @@ def get_all_models():
return model_list


def get_changed_models():
model_list = []
cwd_path = Path.cwd()
# git fetch first for git diff on GitHub Action
subprocess.run(["git", "fetch", "origin", "main:main"],
cwd=cwd_path, stdout=subprocess.PIPE,
stderr=subprocess.PIPE)
# obtain list of added or modified files in this PR
obtain_diff = subprocess.Popen(["git", "diff", "--name-only", "--diff-filter=AM", "origin/main", "HEAD"],
cwd=cwd_path, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
stdoutput, _ = obtain_diff.communicate()
diff_list = stdoutput.split()

# identify list of changed ONNX models in ONXX Model Zoo
model_list = [str(model).replace("b'", "").replace("'", "")
for model in diff_list if onnx_ext_name in str(model) or tar_ext_name in str(model)]
return model_list


def main():
parser = argparse.ArgumentParser(description="Test settings")
# default all: test by both onnx and onnxruntime
@@ -53,12 +34,12 @@ def main():
parser.add_argument("--create", required=False, default=False, action="store_true",
help="Create new test data by ORT if it fails with existing test data")
parser.add_argument("--all_models", required=False, default=False, action="store_true",
help="Test all ONNX Model Zoo models instead of only chnaged models")
help="Test all ONNX Model Zoo models instead of only changed models")
parser.add_argument("--drop", required=False, default=False, action="store_true",
help="Drop downloaded models after verification. (For space limitation in CIs)")
args = parser.parse_args()

model_list = get_all_models() if args.all_models else get_changed_models()
model_list = get_all_models() if args.all_models else test_utils.get_changed_models()
# run lfs install before starting the tests
test_utils.run_lfs_install()

@@ -106,16 +87,7 @@ def main():
print("[PASS] {} is checked by onnx. ".format(model_name))
if args.target == "onnxruntime" or args.target == "all":
try:
# git lfs pull those test_data_set_* folders
root_dir = Path(model_path).parent
for _, dirs, _ in os.walk(root_dir):
for dir in dirs:
if "test_data_set_" in dir:
test_data_set_dir = os.path.join(root_dir, dir)
for _, _, files in os.walk(test_data_set_dir):
for file in files:
if file.endswith(".pb"):
test_utils.pull_lfs_file(os.path.join(test_data_set_dir, file))
test_utils.pull_lfs_directory(Path(model_path).parent)
check_model.run_backend_ort_with_data(model_path)
print("[PASS] {} is checked by onnxruntime. ".format(model_name))
except Exception as e:
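On the onnxruntime side, the per-file walk has been folded into test_utils.pull_lfs_directory, so the ORT leg of test_models.py now reduces to pulling the model's test_data_set_* folders and running the backend check. A hedged outline of that path, using the helper names visible in the diff (error handling and counters trimmed):

from pathlib import Path

import check_model
import test_utils


def check_with_ort(model_path, model_name):
    # Fetch any LFS-stored .pb input/output protobufs that sit next to the model.
    test_utils.pull_lfs_directory(Path(model_path).parent)
    # Run the model against onnxruntime with the pulled test data.
    check_model.run_backend_ort_with_data(model_path)
    print("[PASS] {} is checked by onnxruntime. ".format(model_name))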
35 changes: 35 additions & 0 deletions workflow_scripts/test_utils.py
@@ -25,6 +25,18 @@ def pull_lfs_file(file_name):
print(f'LFS pull completed for {file_name} with return code= {result.returncode}')


def pull_lfs_directory(directory_name):
# git lfs pull those test_data_set_* folders
for _, dirs, _ in os.walk(directory_name):
for dir in dirs:
if "test_data_set_" in dir:
test_data_set_dir = os.path.join(directory_name, dir)
for _, _, files in os.walk(test_data_set_dir):
for file in files:
if file.endswith(".pb"):
pull_lfs_file(os.path.join(test_data_set_dir, file))


def run_lfs_prune():
result = subprocess.run(['git', 'lfs', 'prune'], cwd=cwd_path, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
print(f'LFS prune completed with return code= {result.returncode}')
@@ -62,3 +74,26 @@ def remove_onnxruntime_test_dir():
def remove_onnxruntime_test_dir():
if os.path.exists(TEST_ORT_DIR) and os.path.isdir(TEST_ORT_DIR):
rmtree(TEST_ORT_DIR)


def get_changed_models():
tar_ext_name = ".tar.gz"
onnx_ext_name = ".onnx"
model_list = []
cwd_path = Path.cwd()
# TODO: use the main branch instead of new-models
branch_name = "new-models" # "main"
# git fetch first for git diff on GitHub Action
subprocess.run(["git", "fetch", "origin", f"{branch_name}:{branch_name}"],
cwd=cwd_path, stdout=subprocess.PIPE,
stderr=subprocess.PIPE)
# obtain list of added or modified files in this PR
obtain_diff = subprocess.Popen(["git", "diff", "--name-only", "--diff-filter=AM", "origin/" + branch_name, "HEAD"],
cwd=cwd_path, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
stdoutput, _ = obtain_diff.communicate()
diff_list = stdoutput.split()

# identify list of changed ONNX models in ONNX Model Zoo
model_list = [str(model).replace("b'", "").replace("'", "")
for model in diff_list if onnx_ext_name in str(model) or tar_ext_name in str(model)]
return model_list
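get_changed_models now lives in test_utils and is shared by test_models.py, generate_onnx_hub_manifest.py, and run_mlagility.py: it fetches the tracked branch, diffs it against HEAD, and keeps only added or modified .onnx and .tar.gz paths. A short usage sketch, assuming it is called from the repository root on a PR branch:

import test_utils

# e.g. ["models/mlagility/distilbert-base-uncased/distilbert-base-uncased-16.onnx"]
changed = test_utils.get_changed_models()
for path in changed:
    print(f"Changed model: {path}")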
