
Elasticsearch: delete_by_query takes a long time to finish and causes the HTTP request to time out

The deletion itself still completes, but the server throws exceptions because the request takes too long. What's the best way to handle this on the server side?

The delete_by_query API documentation says it can return a task that I can use to track the deletion progress:

If the request contains wait_for_completion=false then Elasticsearch will perform some preflight checks, launch the request, and then return a task which can be used with Tasks APIs to cancel or get the status of the task. Elasticsearch will also create a record of this task as a document at .tasks/task/${taskId}. This is yours to keep or remove as you see fit. When you are done with it, delete it so Elasticsearch can reclaim the space it uses.

How do I get this task id? It's not in the HTTP response, and in the timeout scenario there may not even be an HTTP response.

GET _tasks?detailed=true&actions=*/delete/byquery returns a list of deletion tasks, but I only want the one task. If two tasks are running, how would I know which one is mine?

Thanks.

Jinggang asked Oct 18 '25 10:10


1 Answer

Elasticsearch 6

Create the task by passing wait_for_completion=false. The task id comes back in the response body:

$ curl -X POST "es-prices-ape:9200/prices/_delete_by_query?wait_for_completion=false" -H 'Content-Type: application/json' -d'
{
  "query": {
    "term": {
      "cella_id": "58259"
    }
  }
}
'

{"task":"GChf5jO9Q2Sti-Qi1G-oAw:12221137"}
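If you want to capture the id in a script rather than copy it by hand, something like the following may work. It is only a sketch: extract_task_id and start_delete_by_query are hypothetical helper names, and the sed expression assumes the response has the single-field shape shown above (a real JSON parser would be more robust).

```shell
# Hypothetical helper: read a response such as
# {"task":"GChf5jO9Q2Sti-Qi1G-oAw:12221137"} from stdin and print the task id.
extract_task_id() {
  sed -n 's/.*"task"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p'
}

# Hypothetical wrapper: launch the delete without waiting for it to finish
# and print the task id.
# $1 = host:port, $2 = index name, $3 = JSON request body
start_delete_by_query() {
  curl -s -X POST "$1/$2/_delete_by_query?wait_for_completion=false" \
    -H 'Content-Type: application/json' -d "$3" | extract_task_id
}
```

With the host and index from this answer, usage would look like: task_id=$(start_delete_by_query es-prices-ape:9200 prices '{"query":{"term":{"cella_id":"58259"}}}'). Because the request returns as soon as the task is launched, the HTTP timeout from the question should no longer be hit.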

Get task info:

$ curl -X GET "es-prices-ape:9200/_tasks/GChf5jO9Q2Sti-Qi1G-oAw:12221137"

{"completed":true,"task":{"node":"GChf5jO9Q2Sti-Qi1G-oAw","id":12221137,"type":"transport","action":"indices:data/write/delete/byquery","status":{
"total" : 0,
"updated" : 0,
"created" : 0,
"deleted" : 0,
"batches" : 0,
"version_conflicts" : 0,
"noops" : 0,
....
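To wait for the deletion in a script, the "completed" field of the response above can be polled. A rough sketch, assuming the response keeps the shape shown here; is_completed and wait_for_task are hypothetical names, and the 5 second interval and 120 attempt cap are arbitrary defaults, not recommendations.

```shell
# Hypothetical helper: succeed if the Tasks API response on stdin
# contains "completed":true.
is_completed() {
  grep -q '"completed"[[:space:]]*:[[:space:]]*true'
}

# Hypothetical polling loop; returns 1 if the task never reports completion.
# $1 = host:port, $2 = task id, $3 = seconds between polls (default 5),
# $4 = maximum number of polls (default 120)
wait_for_task() {
  attempts=0
  until curl -s -X GET "$1/_tasks/$2" | is_completed; do
    attempts=$((attempts + 1))
    [ "$attempts" -ge "${4:-120}" ] && return 1
    sleep "${3:-5}"
  done
}
```

For example: wait_for_task es-prices-ape:9200 "GChf5jO9Q2Sti-Qi1G-oAw:12221137" && echo done. Once the task is finished, the record under .tasks/task/ mentioned in the quoted documentation can be deleted to reclaim the space.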
nickyat answered Oct 21 '25 06:10