I would like to make the results of a text classification model (finBERT, a PyTorch model) available through an endpoint deployed on Kubernetes.
The whole pipeline works, but it is very slow when deployed: about 30 seconds to process one sentence. If I time the same endpoint locally, I get results in 1 or 2 seconds. Running the Docker image locally, the endpoint also takes about 2 seconds to return a result.
When I check the CPU usage of my Kubernetes instance while the request is running, it doesn't go above 35%, so I'm not sure the slowness is related to a lack of compute power.
Has anyone seen such performance issues when running a forward pass through a PyTorch model? Any clues on what I should investigate?
Any help is greatly appreciated, thank you!
I am currently using:

limits:
  cpu: "2"
requests:
  cpu: "1"

Python: 3.7
PyTorch: 1.8.1
I had the same issue. Locally, my PyTorch model would return a prediction in 25 ms, but on Kubernetes it would take 5 seconds. The problem had to do with how many threads torch had available to use. I'm not 100% sure why this works, but reducing the number of threads sped up inference significantly.
Set the following environment variable on your Kubernetes pod.
OMP_NUM_THREADS=1
After doing that, it performed on Kubernetes like it did locally: ~30 ms per call.
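In case it helps, here is a minimal sketch of how that variable could be set in a Deployment manifest (the names, image, and resource values are placeholders, not from the original setup):

apiVersion: apps/v1
kind: Deployment
metadata:
  name: finbert-api              # placeholder name
spec:
  replicas: 1
  selector:
    matchLabels:
      app: finbert-api
  template:
    metadata:
      labels:
        app: finbert-api
    spec:
      containers:
        - name: finbert-api
          image: registry.example.com/finbert-api:latest   # placeholder image
          env:
            - name: OMP_NUM_THREADS   # cap the OpenMP threads used by PyTorch's intra-op parallelism
              value: "1"
          resources:                  # placeholder values; use your own requests/limits
            requests:
              cpu: "1"
            limits:
              cpu: "2"

If you prefer setting this in code rather than in the pod spec, calling torch.set_num_threads(1) at startup should have a similar effect on intra-op thread usage.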
These are my pod limits:

11500m

I was led to discover this from this blog post: https://www.chunyangwen.com/blog/python/pytorch-slow-inference.html