Surprising results of Ms Azure vs Google Colab BERT training performance, not sure how to explain

Question

I'm not sure if it's BERT related or not, had no chance to test other models, but did it for BERT.

So what I noticed recently that training algorithms and data that I used to work with in google colab for free, are seemed to work significantly slower in Azure ML workspace which we pay for.

I made the comparison - same data file (classification problem, sentiment analysis of 10K reviews), totally same notebook code (copy+paste), same latest ver of ktrain lib installed on both, both must be on Python 3.8, but GPU is a bit more performant on a colab side.

Results surprised me to say the least: google lab made its job 10 times faster: 17 min vs 170 min, and it's reproducible. Tesla T4 (colab) is faster than K80 (azure) indeed, but not that much as per known benchmarks. So I wonder what else could matter. Is it virt. environment created in Azure ML performing so slow? If you have any idea what it could be, or what else I can check on both sides to reveal it, please share

BTW google gives you T4 in colab for your experimentations for free, while you have to pay for slower K80 at Azure.

Google colab execution time = 17 min enter image description here Google colab hardware: cpu model: Intel(R) Xeon(R) CPU @ 2.20GHz, memory 13Gb, GPU:

Azure execution time = 2h50m = 170min (10x of colab) enter image description here Azure hardware information

K80 and T4 comparison: https://technical.city/en/video/Tesla-K80-vs-Tesla-T4

byronV999 · Accepted Answer

So I think firstly, to do a comparison that isn't apples-to-apples in terms of hardware, you'll struggle to get to the root of the issue.

That being said, on Azure, the Standard_NC6 compute target only gives half of a K80 card. I'm not sure how that 'half' divides all the specs, but I do know that it only gives half the GPU memory. From this, I'd assume that it also only gives half the CUDA cores, but maybe not half the bandwidth of the memory bus.

Lastly, the T4 has nearly double the boost clock speed of the K80, which might not give the 10x discrepancy you're seeing but will definitely have a substantial impact on performance.

I'd suggest maybe provisioning a K80 on the colab notebook, or any other gpu enabled compute that is available on both to test any theories you have about performance on the two platforms.

Surprising results of Ms Azure vs Google Colab BERT training performance, not sure how to explain

Tags:

google-colaboratory

azure

azure-machine-learning-service

YMC

1 Answers

byronV999

Recent Activity

Donate For Us

Surprising results of Ms Azure vs Google Colab BERT training performance, not sure how to explain

Tags:

google-colaboratory

azure

azure-machine-learning-service

YMC

1 Answers

byronV999

Related questions

Recent Activity

Donate For Us