I am using Vertex AI endpoints to serve a deep learning service.
My service takes approximately 30 seconds to 2 minutes to respond on CPU, depending on the size of the input. I noticed that whenever a request takes more than one minute to complete, the API fails with this error:
Error 502 (Server Error)
502. That's an error.
The server encountered a temporary error and could not complete your request. Please try again in 30 seconds. That's all we know.
When I retry, I keep getting the same error. Once I decrease the input size, the API starts working again. For these reasons, I believe this is a timeout issue.
So my question is: how can I change the timeout value in Vertex AI endpoints? I read through all the documentation, and it doesn't seem to be mentioned anywhere.
Thank you.
There is an upper limit on the prediction timeout of about 60 seconds plus a small amount of overhead, and it isn't configurable. So requests approaching 2 minutes are almost certainly why you're getting that error.
Are there ways to speed up model serving? For example, deploying on faster hardware or applying other model optimizations. If you're running a custom container, perhaps take advantage of more cores or reduce any external dependencies.
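If hardware is the bottleneck, one option is to redeploy the model on a machine type with more vCPUs. Here is a minimal sketch using the google-cloud-aiplatform Python SDK; the project, region, model ID, machine type, and instance payload below are placeholders you would replace with your own values:

from google.cloud import aiplatform

# Placeholder project/region values - replace with your own.
aiplatform.init(project="my-project", location="us-central1")

# Look up the model already uploaded to the Vertex AI Model Registry
# (placeholder model ID).
model = aiplatform.Model(model_name="1234567890")

# Redeploy on a machine type with more vCPUs so each prediction
# finishes well under the ~60 s online-prediction limit.
endpoint = model.deploy(
    machine_type="n1-highcpu-16",  # more cores than a small default machine
    min_replica_count=1,
    max_replica_count=1,
)

# The endpoint is then called the same way as before.
prediction = endpoint.predict(instances=[{"input": "..."}])

If faster hardware still isn't enough, reducing the input size or splitting it across smaller requests (which you've already observed works) is the other lever.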