How to perform batch inference to accelerate the process - coca #781

Closed · Raychanan opened this issue Jan 4, 2024 · 1 comment

Raychanan commented Jan 4, 2024

Hi, thanks for your amazing work!

I'm currently following the official tutorial available on the project's homepage through Colab, and it's working out great for me.

I now have a GPU with ample memory. Since I have numerous images to process, is there a way to handle all the images in a folder in batches to accelerate processing? I've only used batching with text before and am quite new to doing this with images. If it's possible, could you provide some guidance or resources on this? Thanks!

Here is my existing code:

import open_clip
import torch
from PIL import Image
import os

# Create model and transforms
model, _, transform = open_clip.create_model_and_transforms(
    model_name="coca_ViT-L-14",
    pretrained="mscoco_finetuned_laion2B-s13B-b90k",
    device='cuda',
    cache_dir="/global/scratch/users/USERNAME/huggingface_cache/"
)

folder_path = './'

# List all files in the directory
files = os.listdir(folder_path)

for file in files:
    # Check if the file is an image (you might want to check for specific extensions)
    if file.lower().endswith(('.png', '.jpg', '.jpeg')):
        image_path = os.path.join(folder_path, file)

        # Load and transform the image
        im = Image.open(image_path).convert("RGB")
        im = transform(im).unsqueeze(0)

        # Transfer the image tensor to CUDA
        im = im.to('cuda')

        with torch.no_grad(), torch.cuda.amp.autocast():
            generated = model.generate(im)

        # Print the generated text
        print(f"Text for {file}:")
        print(open_clip.decode(generated[0]).split("<end_of_text>")[0].replace("<start_of_text>", ""))

Raychanan (Author) commented:

I poked around, and this pull request resolves my question:

#498

https://github.com/mlfoundations/open_clip/pull/498/files

Here is my code after updating the generate function in open_clip's coca_model.py as in that PR:

import open_clip
import torch
from PIL import Image
import os
print(open_clip.__file__)  # confirm the locally patched open_clip is the one being imported
import time

# Create model and transforms
model, _, transform = open_clip.create_model_and_transforms(
    model_name="coca_ViT-L-14",
    pretrained="mscoco_finetuned_laion2B-s13B-b90k",
    device='cuda',
    cache_dir="/global/scratch/users/USERNAME/huggingface_cache/"
)

# help(model.generate)

folder_path = './'

# List all files in the directory
files = os.listdir(folder_path)

BATCH_SIZE = 4  # You can adjust this size
batch = []        # transformed image tensors waiting to be captioned
batch_files = []  # filenames corresponding to the tensors in `batch`

start_time = time.time()

for file in files:
    if file.lower().endswith(('.png', '.jpg', '.jpeg')):
        image_path = os.path.join(folder_path, file)

        # Load and transform the image
        im = Image.open(image_path).convert("RGB")
        im = transform(im).unsqueeze(0)

        batch.append(im)
        batch_files.append(file)

        # Check if batch size is reached
        if len(batch) == BATCH_SIZE:
            # Process the batch
            batch_tensor = torch.cat(batch, dim=0)
            with torch.no_grad(), torch.cuda.amp.autocast():
                generated = model.generate(batch_tensor, device='cuda', batch_size=BATCH_SIZE)

            for idx, gen in enumerate(generated):
                # Print the generated text for each image in this batch
                print(f"Text for {batch_files[idx]}:")
                print(open_clip.decode(gen).split("<end_of_text>")[0].replace("<start_of_text>", ""))

            # Clear the batch
            batch = []
            batch_files = []

# Process any remaining images in the batch
if batch:
    batch_tensor = torch.cat(batch, dim=0)
    with torch.no_grad(), torch.cuda.amp.autocast():
        generated = model.generate(batch_tensor, device='cuda')

    for idx, gen in enumerate(generated):
        print(f"Text for {batch_files[idx]}:")
        print(open_clip.decode(gen).split("<end_of_text>")[0].replace("<start_of_text>", ""))

end_time = time.time()

elapsed_time = end_time - start_time
print(f"Total time for processing: {elapsed_time} seconds")
