[PyTorch] GPU elapsed time / torch.cuda.Event()

BKM 2024. 8. 12. 15:15

To check how long a deep learning model takes to run, one would typically write

import time

start = time.time()
output = model(input_tensor)
elapsed_time = time.time()-start

or

import time

start = time.perf_counter()
output = model(input_tensor)
end = time.perf_counter()

Timing the model this way is not accurate.

The reason is that deep learning computation usually runs on the GPU.

The overall flow of a model computation is as follows.

1. Prepare the input tensor for training/inference on the CPU
2. Run the prepared input tensor through the model

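We use the GPU here so that large amounts of data can be processed efficiently in parallel.

If the CPU had to wait for step 2 to finish before preparing the next batch of data for the GPU (step 1), too much time would be wasted.

So, to keep the GPU busy, steps 1 and 2 are executed asynchronously (asynchronous execution): the CPU queues work for the GPU and moves on without waiting for it to finish.

The `time` library measures time on the CPU.

With the code above, `model(input_tensor)` can therefore return before the GPU computation has actually finished,

and the elapsed time reflects only what the CPU saw, so it may not capture the time until the GPU computation really completes.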
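
The effect is easy to reproduce without a GPU. The sketch below is only an analogy, not real CUDA: `FakeDevice` is a made-up class that runs queued work on a background thread, and `time.sleep` stands in for a kernel. It shows why a host-side timer sees almost nothing unless it waits for the queued work.

```python
import queue
import threading
import time


class FakeDevice:
    """Toy stand-in for a GPU stream (hypothetical, for illustration only)."""

    def __init__(self):
        self._tasks = queue.Queue()
        worker = threading.Thread(target=self._run, daemon=True)
        worker.start()

    def _run(self):
        # Background loop: take queued work and "execute" it.
        while True:
            seconds = self._tasks.get()
            time.sleep(seconds)  # pretend to run a kernel
            self._tasks.task_done()

    def launch(self, seconds):
        """Enqueue work and return immediately, like an asynchronous launch."""
        self._tasks.put(seconds)

    def synchronize(self):
        """Block the caller until all queued work has finished."""
        self._tasks.join()


device = FakeDevice()

# Host-side timing without synchronization: only the enqueue cost is measured.
start = time.perf_counter()
device.launch(0.2)
naive = time.perf_counter() - start

device.synchronize()  # drain the queue before the next measurement

# Host-side timing with synchronization: the wait for the work is included.
start = time.perf_counter()
device.launch(0.2)
device.synchronize()
synced = time.perf_counter() - start

print(f"without synchronize: {naive * 1000:.1f} ms")
print(f"with synchronize:    {synced * 1000:.1f} ms")
```

The first measurement covers only the cost of putting work on the queue; the second includes the roughly 0.2 s the work actually takes. Real GPU code behaves in a similar way, with `torch.cuda.synchronize()` playing the role of `device.synchronize()`.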

Measuring GPU time accurately

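So the measurement has to be tied to the moment the GPU computation actually finishes, not to the moment the CPU call returns.

`torch.cuda.Event` makes this possible: events are recorded in the GPU's own work queue, and the time between two events is measured on the GPU.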

import torch

# Assumes `model` and `input_tensor` are already on the GPU.
start_event = torch.cuda.Event(enable_timing=True)
end_event = torch.cuda.Event(enable_timing=True)

with torch.no_grad():
    start_event.record()
    output = model(input_tensor)
    end_event.record()

torch.cuda.synchronize()  # wait until the GPU has finished all queued work

elapsed_time = start_event.elapsed_time(end_event)  # in milliseconds
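
For repeated measurements, the pattern above can be wrapped in a small helper. This is a minimal sketch, not an official PyTorch utility: the name `time_on_gpu`, its parameters, the warm-up runs, and the `torch.nn.Linear` stand-in model are all assumptions added for illustration. It returns `None` when no CUDA device is available.

```python
import torch


def time_on_gpu(fn, n_runs=10, n_warmup=3):
    """Return the mean run time of fn() in milliseconds, measured with CUDA events.

    Hypothetical helper; returns None when no CUDA device is available.
    """
    if not torch.cuda.is_available():
        return None

    # Warm-up runs (an addition, not part of the original snippet):
    # the first calls often include one-time setup costs.
    for _ in range(n_warmup):
        fn()
    torch.cuda.synchronize()

    start_event = torch.cuda.Event(enable_timing=True)
    end_event = torch.cuda.Event(enable_timing=True)

    start_event.record()
    for _ in range(n_runs):
        fn()
    end_event.record()

    torch.cuda.synchronize()  # wait until end_event has actually been reached
    return start_event.elapsed_time(end_event) / n_runs


# Stand-in model and input, chosen only so the example runs on its own.
device = "cuda" if torch.cuda.is_available() else "cpu"
model = torch.nn.Linear(512, 512).to(device).eval()
input_tensor = torch.randn(64, 512, device=device)

with torch.no_grad():
    mean_ms = time_on_gpu(lambda: model(input_tensor))

if mean_ms is None:
    print("no CUDA device found")
else:
    print(f"{mean_ms:.3f} ms per forward pass")
```

Averaging over several runs smooths out run-to-run noise; the single-run version above is enough when a rough number will do.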

Reference

https://seungseop.tistory.com/41

How to Accurately Measure Deep Learning Model Inference Time (in Korean)

https://www.speechmatics.com/company/articles-and-news/timing-operations-in-pytorch

How to Accurately Time CUDA Kernels in Pytorch
