Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Adding /get_tokenizer to api_server for lm-evaluation-harness ease integration. #2643

Open
wants to merge 5 commits into
base: main
Choose a base branch
from

Conversation

AguirreNicolas
Copy link
Contributor

OpenAI is already supporting logprobs for chat models in most model cases.
In paralell, lm-evaluation-harness is in dev to add support to this feature.

In order to run a evaluation of an endpoint backended by vLLM, and similarly as in the openai native case using tiktoken, the tokenizer will be needed.

The simple way would be to return the HF <organization>/<repo> string, and then relay on the use of AutoTokenizer.from_pretrained.
But given that some usecase like:

  • models/tokenizer can be private, or
  • lm-evaluation-harness and vllm can be in the same network but without internet access.

I propose to add the get_tokenizer endpoit the api_server so then it can be used like:

# Get tokenizer
response = requests.get("http://0.0.0.0:8000/get_tokenizer")
tokenizer_objects = json.loads(response.content)
# save into files
for key, value in tokenizer_objects.items():
    # create folder if not exists
    if not os.path.exists(TOKENIZER_EPHIMERAL_PATH):
        os.mkdir(TOKENIZER_EPHIMERAL_PATH)
    with open(os.path.join(TOKENIZER_EPHIMERAL_PATH, key + ".json"), "w") as f:
        json.dump(value, f)
        f.close()
try:
    shutil.rmtree(TOKENIZER_EPHIMERAL_PATH)
    #logger.info(f"Ephimeral '{TOKENIZER_EPHIMERAL_PATH.name}' directory removed successfully.")
    #logger.info(f"Tokenizer objects availables: {str(self.tokenizer_objects.keys())}")
except OSError as e:
    raise RuntimeError(f"Error removing '{TOKENIZER_EPHIMERAL_PATH.name}' directory: {e}")

# load tokenizer
vllm_tokenizer = AutoTokenizer.from_pretrained(TOKENIZER_EPHIMERAL_PATH)

The next step would be to have logprobs supported in vLLM for chat cases in the OAI entryppoint, and then a PR on lm-evaluation-harness to catch the vLLM tokenizer feature (like using the code above).

@AguirreNicolas AguirreNicolas mentioned this pull request Feb 1, 2024
30 tasks
@simon-mo
Copy link
Collaborator

In these cases

models/tokenizer can be private, or
lm-evaluation-harness and vllm can be in the same network but without internet access.

wouldn't the user of lm-evaluation-harness be able to figure out the tokenizer ahead of time?

@simon-mo
Copy link
Collaborator

I would like to wait a bit on this PR to see how popular this request it. Thank you for posting! @AguirreNicolas

@AguirreNicolas AguirreNicolas force-pushed the get_remote_tokenizers branch from cfa4b85 to f89a2b1 Compare April 26, 2024 08:13
@AguirreNicolas
Copy link
Contributor Author

AguirreNicolas commented Apr 30, 2024

Hi @simon-mo ,

My team and I want to provide support for lm-eval-harness so that tests can be run on a vLLM endpoint without known the model a priori.

That is, to use lm-eval-harness as a client which wants to evaluate the capability of a model served through vLLM.

The issue we are encountering is in OpenaiCompletionsLM

Under the current lm-eval-harness concept, the tokenizer is available via huggingface or using the OpenAI tokenizer.

For this reason, we believe it would be valuable to move forward with the current PR. Moreover, I was thinking of updating the current PR adding a boolean option on the vLLM side to enable or disable this option.

Finally, open another PR in lm-eval-harness updating the OpenaiCompletionsLM by adding:

        elif self.tokenizer_backend == "vllm":
            import transformers  # noqa: E401

            # Get tokenizer
            response = requests.get(base_url+"/get_tokenizer")
            tokenizer_objects = json.loads(response.content)    
            # save into files
            for key, value in tokenizer_objects.items():
                # create folder if not exists
                if not os.path.exists(self.tokenizer_ephimeral_path):
                    os.mkdir(self.tokenizer_ephimeral_path)
                with open(os.path.join(self.tokenizer_ephimeral_path, key + ".json"), "w") as f:
                    json.dump(value, f)
                    f.close()
            try:
                shutil.rmtree(self.tokenizer_ephimeral_path)
                eval_logger.debug(f"Ephimeral '{self.tokenizer_ephimeral_path.name}' directory removed successfully.")
                eval_logger.debug(f"Tokenizer objects availables: {str(self.tokenizer_objects.keys())}")
            except OSError as e:
                raise RuntimeError(f"Error removing '{self.tokenizer_ephimeral_path.name}' directory: {e}")
            # load tokenizer
            self.tokenizer = transformers.AutoTokenizer.from_pretrained(self.tokenizer_ephimeral_path)            

            self.vocab_size = self.tokenizer.vocab
            self.end_of_text_token_id = self.tokenizer.eos_token                

What do you think? Would you consider in any way valuable our time into vLLM on this issue ?

@AguirreNicolas AguirreNicolas force-pushed the get_remote_tokenizers branch from 8fe1af7 to 3f3d7fc Compare April 30, 2024 15:56
Copy link

This pull request has been automatically marked as stale because it has not had any activity within 90 days. It will be automatically closed if no further activity occurs within 30 days. Leave a comment if you feel this pull request should remain open. Thank you!

@github-actions github-actions bot added the stale label Oct 30, 2024
Copy link

mergify bot commented Oct 30, 2024

This pull request has merge conflicts that must be resolved before it can be
merged. @AguirreNicolas please rebase it. https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants