
Why does a node's label affect query performance so significantly in Neo4j?

Tags:

neo4j

To simplify my question: if all nodes in a Neo4j database have the same label Science, what is the difference between MATCH n WHERE n.ID="UUID-0001" RETURN n and MATCH (n:Science) WHERE n.ID="UUID-0001" RETURN n? Why is the performance not the same?

My Neo4j database contains about 70000 nodes and 100 relationships.

The nodes are of two types, Paper and Author, and both have an ID field.

I created each node with the corresponding label, and I also created an index on ID.

However, one of my functions needs to query nodes by ID regardless of label. The query looks like this: MATCH n WHERE n.ID="UUID-0001" RETURN n. It takes about 4000~5000 ms!

But after adding the label Science to every node and using MATCH (n:Science) WHERE n.ID="UUID-0001" RETURN n, the query time dropped to about 1000~1100 ms. Does anyone know what the difference is between these two cases?

PS. Count(n:Science) = Count(n:Paper) + Count(n:Author), which means each node has two labels.

LoveTW asked Dec 03 '25 14:12



1 Answer

Because Neo4j automatically maintains an extra index for every label. The Cypher language can be broadly thought of as piping plus filtering, so MATCH n WHERE ... first gets every node and then filters on the WHERE part, whereas MATCH (n:Science) WHERE ... gets only the nodes with the label Science (using that label index) and then tries to match the WHERE. From your query timings, it looks as if about a fifth of your nodes were marked Science, so the query runs in about a fifth of the time because it did about a fifth as many comparisons.
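The "piping plus filtering" picture can be sketched with a toy in-memory model. This is not Neo4j code and not how Neo4j is implemented. The names `nodes`, `by_label`, `match_all` and `match_label` are made up for illustration, and the 1000-node data set is an assumption. It only counts how many nodes each strategy has to compare against the WHERE condition.

```python
from collections import defaultdict

# Toy store (assumed data): each node is a (labels, properties) pair.
# Even-numbered nodes are Paper, odd-numbered are Author, all are Science.
nodes = []
for i in range(1000):
    kind = "Paper" if i % 2 == 0 else "Author"
    nodes.append(({kind, "Science"}, {"ID": f"UUID-{i:04d}"}))

# Stand-in for a per-label lookup structure: label -> nodes carrying it.
by_label = defaultdict(list)
for node in nodes:
    for label in node[0]:
        by_label[label].append(node)


def match_all(target_id):
    """Model of MATCH n WHERE n.ID = ...: pipe every node into the filter."""
    hits = [props for _, props in nodes if props.get("ID") == target_id]
    return hits, len(nodes)  # (results, number of nodes compared)


def match_label(label, target_id):
    """Model of MATCH (n:Label) WHERE n.ID = ...: pipe only labeled nodes."""
    candidates = by_label.get(label, [])
    hits = [props for _, props in candidates if props.get("ID") == target_id]
    return hits, len(candidates)


if __name__ == "__main__":
    for name, (hits, compared) in {
        "all nodes": match_all("UUID-0002"),
        "Paper": match_label("Paper", "UUID-0002"),
        "Science": match_label("Science", "UUID-0002"),
    }.items():
        print(f"{name}: {len(hits)} hit(s), {compared} comparisons")
```

In this model a label reduces the number of comparisons only in proportion to how selective it is: Paper halves the work, while a label carried by every node leaves the count unchanged. To see what a real database does, prefixing the Cypher query with PROFILE shows the plan Neo4j actually chose.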

phil_20686 answered Dec 05 '25 11:12



