
/tokenize - Optionally Apply Chat Template before Tokenization #1706

Closed
elsell opened this issue Apr 4, 2024 · 6 comments


elsell commented Apr 4, 2024

Feature request

On the /tokenize endpoint of TGI, add an option to apply the chat template from the model's tokenizer, if one exists, before tokenizing.

Motivation

The /tokenize endpoint of TGI is very useful in situations where an application requires information about the tokenization of a string, but doesn't have direct access to a model/tokenizer that can be loaded with AutoTokenizer.

Specifically, I have instances where I need to know the token count of a prompt before sending it to /v1/chat/completions so that I can appropriately truncate the input to be <= max_input_tokens.

/tokenize, however, does not adequately serve this purpose when calling /v1/chat/completions, because the tokenization returned is of the prompt without the chat template applied.

Since the chat template may differ by model, there is no generic way via a TGI endpoint to get the token count of a prompt after a chat template has been applied, meaning that preventing inputs from exceeding max_input_tokens is very difficult.

Your contribution

[Caveat - I do not know Rust, nor am I familiar with the inner workings of TGI]

I have done my best to read through the existing /tokenize implementation, and will attempt to provide a high-level overview of what might need to be changed, and where.

Adding an optional boolean parameter apply_chat_template to the /tokenize endpoint would suffice for my purposes.

It appears that one could mirror the existing return_full_text boolean parameter of GenerateParameters.

Furthermore, I imagine the /tokenize [with chat template] implementation would be very similar to what's happening at the /v1/chat/completions endpoint:

`/v1/chat/completions` chat templating implementation
    // apply chat template to flatten the request into a single input
    let mut inputs = match infer.apply_chat_template(req.messages) {
        Ok(inputs) => inputs,
        Err(err) => {
            metrics::increment_counter!("tgi_request_failure", "err" => "validation");
            tracing::error!("{err}");
            return Err((
                StatusCode::UNPROCESSABLE_ENTITY,
                Json(ErrorResponse {
                    error: err.to_string(),
                    error_type: err.error_type().to_string(),
                }),
            ));
        }
    };

Example API calls with proposed parameter:

Without Chat Template Applied (current behavior)

curl -X 'POST' \
  'http://my.tgi.host/tokenize' \
  -H 'accept: application/json' \
  -H 'Content-Type: application/json' \
  -d '{
  "inputs": "My name is Olivier and I"
}'

With Chat Template Applied (proposed behavior)

curl -X 'POST' \
  'http://localhost:8083/tokenize' \
  -H 'accept: application/json' \
  -H 'Content-Type: application/json' \
  -d '{
  "inputs": "My name is Olivier and I",
  "parameters": {
     "apply_chat_template": true
  }
}'
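
To make the goal concrete, below is a rough client-side sketch of how the proposed parameter could be used to keep a prompt within max_input_tokens before calling /v1/chat/completions. The host URL and token limit are placeholders, the apply_chat_template parameter does not exist yet, and the sketch assumes /tokenize returns a JSON list of tokens.

# Hypothetical client flow, assuming the proposed apply_chat_template parameter existed.
import requests

TGI_HOST = "http://my.tgi.host"  # placeholder host
MAX_INPUT_TOKENS = 4096          # placeholder limit

prompt = "My name is Olivier and I"

# Count tokens as the model would actually see them, i.e. after templating.
resp = requests.post(
    f"{TGI_HOST}/tokenize",
    json={
        "inputs": prompt,
        "parameters": {"apply_chat_template": True},  # proposed parameter
    },
)
resp.raise_for_status()
token_count = len(resp.json())  # assumes the response is a list of tokens

if token_count > MAX_INPUT_TOKENS:
    # truncate or reject the prompt before calling /v1/chat/completions
    ...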
elsell changed the title from "Apply Chat Template before Tokenization" to "/tokenize - Optionally Apply Chat Template before Tokenization" on Apr 4, 2024
Narsil (Collaborator) commented Apr 10, 2024

Chat templates are used through an OpenAI compatibility layer, meaning the payloads do not look so simple.

Does OpenAI provide any means to do tokenization? If yes, we can try to mimic that; if not, we're not going to do it (there are pretty much infinite kinds of endpoints/payloads users could send; for now the lowest common denominator is sending raw text, and it works fine in most use cases).

It's also a way to leak the system prompts, which might not be something model authors actually want.

elsell (Author) commented Apr 10, 2024

@Narsil Thanks for looking into this.

For posterity, one of the major drivers behind this request is that I'm using TGI in an offline environment.

Because I am loading a model locally, the model_id key returned from TGI's /info endpoint has a local filepath instead of a HuggingFace repo name.

I'm trying to minimize how much my client application has to know about what model is being served from TGI, and currently the client has to know a HuggingFace repo to load a tokenizer.

So, for folks not constrained by an offline environment, I suppose the /info endpoint would provide sufficient information for the client to dynamically load a tokenizer and apply a chat template, making my request unnecessary.
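
For reference, a minimal sketch of that online workaround, assuming the model_id in the /info response is a Hugging Face Hub repo name (in an offline deployment it is a local path on the server, which is exactly the problem described above); the host URL and messages are placeholders.

# Online workaround: the client loads the tokenizer itself based on /info.
import requests
from transformers import AutoTokenizer

TGI_HOST = "http://my.tgi.host"  # placeholder host

model_id = requests.get(f"{TGI_HOST}/info").json()["model_id"]
tokenizer = AutoTokenizer.from_pretrained(model_id)  # fails if model_id is a local path on the server

messages = [{"role": "user", "content": "My name is Olivier and I"}]
token_ids = tokenizer.apply_chat_template(messages, add_generation_prompt=True, tokenize=True)
print(f"{len(token_ids)} tokens after applying the chat template")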


ZQ-Dev8 commented Apr 10, 2024

+1 for the offline functionality @elsell is requesting.

@AguirreNicolas

What do you think about adding an endpoint like the one I have in mind for vLLM?

The idea is to share the tokenizer via a /get_tokenizer endpoint, if whoever runs the server enables it.

My goal is to be able to run lm-eval-harness as if it were a client. Specifically, I plan to add an option to OpenaiCompletionsLM that allows using get_tokenizer, receiving a JSON payload, and then instantiating the tokenizer locally.

@elsell
This solution could probably work for your issue.
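
As a rough sketch, a client might consume such an endpoint as shown below; the /get_tokenizer route, its response body (a serialized tokenizer.json), and the host URL are all assumptions, since no such endpoint exists in TGI today.

# Hypothetical client for a /get_tokenizer-style endpoint.
import requests
from tokenizers import Tokenizer

TGI_HOST = "http://my.tgi.host"  # placeholder host

resp = requests.get(f"{TGI_HOST}/get_tokenizer")  # hypothetical endpoint
resp.raise_for_status()

# Rebuild the tokenizer locally from the serialized tokenizer.json contents.
tokenizer = Tokenizer.from_str(resp.text)
print(tokenizer.encode("My name is Olivier and I").ids)

Note that tokenizer.json alone does not carry the chat template (that lives in tokenizer_config.json), so template handling would still need to be solved separately.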

@cringelord000222

Hi folks, I was looking for a similar solution too.

I'm hosting TGI (Docker) internally to play around with Llama 3 (the model is cloned locally). But to use the apply_prompt_template() method, I had to initialize the tokenizer somewhere, so for now it's either adding a middle Python preprocessing layer to my pipeline or wrapping TGI with a custom API that does pre-processing, inference, and post-processing.

For my use case, it's best if the client side, or components before TGI, don't have to hold the tokenizer, so we don't need to add another layer just to call apply_prompt_template().

TL;DR:
Request to add a param that applies the prompt template when TGI receives a request. Something like:

# perform inference with the proposed parameter
import requests

llm_prompt = "My name is Olivier and I"  # example raw (untemplated) prompt

headers = {
    "Content-Type": "application/json",
}

data = {
    "inputs": llm_prompt,
    "parameters": {
        "max_new_tokens": 2000,
        "temperature": 0.1,
        "apply_prompt_template": True,  # proposed parameter
    },
}

# placeholder host; /generate is TGI's text generation endpoint
response = requests.post("http://my.tgi.host/generate", headers=headers, json=data)


This issue is stale because it has been open 30 days with no activity. Remove stale label or comment or this will be closed in 5 days.

github-actions bot added the Stale label on Jun 24, 2024
github-actions bot closed this as not planned on Jul 9, 2024