Training models on GPU in a K8s Pod (K8s Device Plugin + PyTorch) takes 6% longer than on bare metal #1101

lyon-v commented Dec 16, 2024

Here is the code:

import time

import torch
import torch.nn as nn
import torch.optim
import torch.utils.data
import torchvision.datasets as datasets
import torchvision.transforms as transforms
from torchvision import models

device = torch.device("cuda:0")
model = models.resnet50()
model.cuda(device)

train_dataset = datasets.FakeData(51246, (3, 224, 224), 1000, transforms.ToTensor())

train_loader = torch.utils.data.DataLoader(
    train_dataset, batch_size=256, shuffle=False,  # FakeData is already random, so no shuffling needed
    num_workers=4, pin_memory=True)

criterion = nn.CrossEntropyLoss().cuda(device)
optimizer = torch.optim.SGD(model.parameters(), lr=0.001, momentum=0.9)
model.train()
total_epochs = 2

# Profiler scheduling parameters (used only by the commented-out CUDA profiler hooks below)
wait = 1
warmup = 1
active = 4
skip = wait + warmup

# Training loop

for i in range(total_epochs):
    totaltime = 0.0
    step_start_time = time.time()  # start timing the first iteration
    for step, data in enumerate(train_loader, 0):

        # 1. Move the batch to the GPU and measure data-loading time
        inputs, labels = data[0].to(device=device), data[1].to(device=device)
        data_loading_time = time.time() - step_start_time

        # Optional CUDA profiler hooks:
        # if step > skip + active:
        #     torch.cuda.cudart().cudaProfilerStop()
        #     print("break out")
        #     break
        # if step == skip:
        #     torch.cuda.cudart().cudaProfilerStart()

        # 2. Forward pass
        outputs = model(inputs)
        loss = criterion(outputs, labels)

        # 3. Backward pass and optimizer step
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

        iteration_time = time.time() - step_start_time
        totaltime += iteration_time
        print(f"Step {step}: Iteration Time: {iteration_time:.4f}s, Data Loading Time: {data_loading_time:.4f}s")

        # Start timing the next iteration (includes the next batch's data-loading time)
        step_start_time = time.time()

    print(f"epoch {i}: Total Time: {totaltime:.4f}s, Avg Iteration Time: {totaltime/step:.4f}s")

print("Training Finished")

Under the same configuration (same CPU, memory, and GPU), training one epoch inside a K8s Pod takes about 6% longer than running the same workload on bare metal (in a Docker container):

k8s-pod: epoch 1: Total Time: 79.8202s, Avg Iteration Time: 0.3991s
bare metal: epoch 1: Total Time: 73.0051s, Avg Iteration Time: 0.3650s
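
For reference, the numbers above are wall-clock times taken around asynchronous CUDA calls, so individual per-step figures can be skewed even though the epoch total is meaningful. Below is a minimal sketch of a synchronized per-step timer, assuming the same model, criterion, and optimizer as in the script above (timed_step is a hypothetical helper, not part of the original code):

import time
import torch

def timed_step(model, criterion, optimizer, inputs, labels, device):
    # Drain any previously queued kernels so the timer starts from an idle GPU
    torch.cuda.synchronize(device)
    start = time.time()

    outputs = model(inputs)
    loss = criterion(outputs, labels)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

    # Wait for this step's kernels to finish before reading the clock
    torch.cuda.synchronize(device)
    return time.time() - start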

So I need your help figuring out where this overhead comes from.

chipzoller (Contributor) commented:

I don't think this would have anything to do with the device plugin.
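
Since the device plugin only handles GPU allocation, one thing worth verifying is whether the Pod really sees the same CPU resources as the bare-metal run, since CFS throttling or a reduced CPU set would slow down the DataLoader workers. A rough sketch for comparing the two environments, assuming a Linux host and that the cgroup files below exist at these paths (they differ between cgroup v1 and v2):

import os
from pathlib import Path

def describe_cpu_limits():
    # CPUs the current process is actually allowed to run on
    info = {"visible_cpus": len(os.sched_getaffinity(0))}

    cgv2 = Path("/sys/fs/cgroup/cpu.max")                      # cgroup v2
    cgv1_quota = Path("/sys/fs/cgroup/cpu/cpu.cfs_quota_us")   # cgroup v1
    cgv1_period = Path("/sys/fs/cgroup/cpu/cpu.cfs_period_us")

    if cgv2.exists():
        info["cpu.max"] = cgv2.read_text().strip()             # "max <period>" means unthrottled
    elif cgv1_quota.exists() and cgv1_period.exists():
        info["cfs_quota_us"] = cgv1_quota.read_text().strip()  # -1 means unthrottled
        info["cfs_period_us"] = cgv1_period.read_text().strip()

    return info

if __name__ == "__main__":
    # Run once inside the Pod and once on bare metal, then compare the output
    print(describe_cpu_limits())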
