
Surprising BERT training performance on MS Azure vs Google Colab, not sure how to explain it

I'm not sure whether this is BERT-related; I haven't had a chance to test other models, only BERT.

I recently noticed that training code and data I used to run in Google Colab for free seem to run significantly slower in an Azure ML workspace, which we pay for.

I ran a comparison: the same data file (a classification problem, sentiment analysis of 10K reviews), exactly the same notebook code (copy and paste), and the same latest version of the ktrain library installed on both. Both should be on Python 3.8, but the GPU is somewhat faster on the Colab side.
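Since the comparison depends on the two environments really matching, a check like the one below, run in both notebooks, can confirm the Python and library versions instead of assuming them. This is a minimal sketch using only the standard library; the package list (`ktrain`, `tensorflow`, `transformers`) is an assumption about what the notebook depends on, and `environment_report` is a hypothetical helper name.

```python
import platform
from importlib import metadata


def environment_report(packages=("ktrain", "tensorflow", "transformers")):
    """Return the Python version plus the installed version of each package."""
    report = {"python": platform.python_version()}
    for name in packages:
        try:
            report[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            # The package is not installed in this environment.
            report[name] = "not installed"
    return report


if __name__ == "__main__":
    for key, value in environment_report().items():
        print(f"{key}: {value}")
```

Comparing the two printouts side by side should show quickly whether a version mismatch is worth investigating.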

The results surprised me, to say the least: Google Colab finished the job 10 times faster, 17 min vs 170 min, and this is reproducible. The Tesla T4 (Colab) is indeed faster than the K80 (Azure), but not by that much according to known benchmarks. So I wonder what else could matter. Is the virtual environment created in Azure ML really that slow? If you have any idea what it could be, or what else I can check on both sides to find out, please share.

BTW, Google gives you a T4 in Colab for your experiments for free, while you have to pay for the slower K80 on Azure.

Google Colab execution time = 17 min (screenshot)
Google Colab hardware: CPU model: Intel(R) Xeon(R) CPU @ 2.20GHz, memory 13 GB, GPU: (screenshot)

Azure execution time = 2h50m = 170 min (10x Colab) (screenshot)
Azure hardware information: (screenshot)

K80 and T4 comparison: https://technical.city/en/video/Tesla-K80-vs-Tesla-T4

YMC asked Oct 25 '25 11:10


1 Answer

First, I think a comparison that isn't apples-to-apples in terms of hardware will make it hard to get to the root of the issue.

That said, on Azure the Standard_NC6 compute target gives you only half of a K80 card. I'm not sure how that 'half' divides all the specs, but I do know that you get only half the GPU memory. From this, I'd assume you also get only half the CUDA cores, though maybe not half the memory bus bandwidth.

Lastly, the T4 has nearly double the boost clock speed of the K80, which might not account for the 10x discrepancy you're seeing but will definitely have a substantial impact on performance.
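To put rough numbers on that: theoretical peak FP32 throughput scales with CUDA cores times clock speed. The sketch below uses figures from the published spec sheets as I recall them (2496 cores and 875 MHz boost for one GPU of a K80, 2560 cores and 1590 MHz boost for a T4); treat them as assumptions and check them against the comparison link in the question. It ignores memory bandwidth and reduced-precision math, so it is only a back-of-the-envelope estimate, and `peak_fp32_tflops` is a hypothetical helper name.

```python
def peak_fp32_tflops(cuda_cores, boost_mhz):
    """Theoretical peak: 2 FLOPs (one fused multiply-add) per core per cycle."""
    return 2 * cuda_cores * boost_mhz * 1e6 / 1e12


# Assumed spec-sheet values, not measurements.
half_k80 = peak_fp32_tflops(cuda_cores=2496, boost_mhz=875)   # one GPU of the card
t4 = peak_fp32_tflops(cuda_cores=2560, boost_mhz=1590)
ratio = t4 / half_k80

print(f"Half K80: {half_k80:.2f} TFLOPS")
print(f"T4:       {t4:.2f} TFLOPS")
print(f"T4 / half K80: {ratio:.2f}x")
```

Under these assumptions the raw compute gap comes out at a bit under 2x, well short of the observed 10x, so something besides core count and clock speed is probably contributing.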

I'd suggest provisioning a K80 in the Colab notebook, or any other GPU-enabled compute that is available on both platforms, to test your theories about the performance difference.
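Before comparing timings, it may help to record exactly which GPU each platform hands you. Here is a minimal sketch that shells out to `nvidia-smi`, assuming it is on the PATH in both notebooks (it returns an empty list if it is not). The helper name `query_gpus` and the choice of fields are mine, not something from the question.

```python
import shutil
import subprocess


def query_gpus(fields=("name", "memory.total", "driver_version")):
    """Return one dict per visible GPU, or [] if nvidia-smi is unavailable."""
    if shutil.which("nvidia-smi") is None:
        return []
    cmd = [
        "nvidia-smi",
        f"--query-gpu={','.join(fields)}",
        "--format=csv,noheader",
    ]
    try:
        result = subprocess.run(
            cmd, capture_output=True, text=True, check=True, timeout=30
        )
    except (subprocess.SubprocessError, OSError):
        # nvidia-smi exists but failed (for example, no driver loaded).
        return []
    gpus = []
    for line in result.stdout.strip().splitlines():
        values = [value.strip() for value in line.split(",")]
        gpus.append(dict(zip(fields, values)))
    return gpus


if __name__ == "__main__":
    for gpu in query_gpus() or [{"name": "no NVIDIA GPU visible"}]:
        print(gpu)
```

If the reported memory on Azure is about half of what the full card advertises, that would be consistent with the 'half a K80' point above.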

byronV999 answered Oct 28 '25 03:10


