Why does a node's label significantly affect query performance in Neo4j?

Question

I'll try to simplify my question. If all nodes in a Neo4j database have the same label, Science, what is the difference between MATCH n WHERE n.ID="UUID-0001" RETURN n and MATCH (n:Science) WHERE n.ID="UUID-0001" RETURN n? Why is the performance not the same?

My Neo4j database contains about 70,000 nodes and 100 relationships.

There are two types of nodes, Paper and Author, and both have an ID property.

I created each node with the corresponding label, and I also created an index on ID.

However, one of my functions needs to query nodes by ID regardless of label. The query looks like this: MATCH n WHERE n.ID="UUID-0001" RETURN n. It takes about 4000~5000 ms!

But after adding the label Science to every node and using MATCH (n:Science) WHERE n.ID="UUID-0001" RETURN n, the query time dropped to about 1000~1100 ms. Does anyone know what the difference between these two cases is?

PS. Count(n:Science) = Count(n:Paper) + Count(n:Author), which means each node has two labels.

phil_20686 · Accepted Answer

Because Neo4j automatically maintains an extra index for every label. The Cypher language can be broadly thought of as piping plus filtering, so MATCH n WHERE ... first fetches every node and then filters on the WHERE part, whereas MATCH (n:Science) WHERE ... fetches only the nodes labelled Science (using that index) and then applies the WHERE part. Judging by your timings, about a fifth of your nodes were labelled Science, so the query runs in about a fifth of the time, because it does a fifth as many comparisons.
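To make the "pipe, then filter" picture concrete, here is a minimal Python sketch of the idea. It is only a toy model, not how Neo4j actually stores or scans nodes: the ToyGraph class, its method names and the "Other" label are made up for illustration, and the one-in-five share of Science nodes is the assumption from the paragraph above.

```python
from collections import defaultdict


class ToyGraph:
    """Toy in-memory graph (hypothetical; not Neo4j's storage engine)."""

    def __init__(self):
        self.nodes = []                        # position -> (labels, properties)
        self.label_index = defaultdict(list)   # label -> positions of nodes with it

    def add_node(self, labels, **props):
        pos = len(self.nodes)
        self.nodes.append((frozenset(labels), props))
        for label in labels:
            self.label_index[label].append(pos)
        return pos

    def match_all(self, key, value):
        """Rough analogue of MATCH n WHERE n.key = value: test every node."""
        hits, comparisons = [], 0
        for pos, (_, props) in enumerate(self.nodes):
            comparisons += 1
            if props.get(key) == value:
                hits.append(pos)
        return hits, comparisons

    def match_label(self, label, key, value):
        """Rough analogue of MATCH (n:Label) WHERE n.key = value:
        only nodes listed under the label are tested."""
        hits, comparisons = [], 0
        for pos in self.label_index.get(label, []):
            comparisons += 1
            if self.nodes[pos][1].get(key) == value:
                hits.append(pos)
        return hits, comparisons


# Assumed data set: 1000 nodes, every fifth one labelled Science.
graph = ToyGraph()
for i in range(1000):
    label = "Science" if i % 5 == 0 else "Other"
    graph.add_node([label], ID=f"UUID-{i:04d}")

all_hits, all_cmp = graph.match_all("ID", "UUID-0005")
sci_hits, sci_cmp = graph.match_label("Science", "ID", "UUID-0005")

print("unlabelled match:", all_hits, "comparisons:", all_cmp)
print("labelled match:  ", sci_hits, "comparisons:", sci_cmp)
```

Both calls find the same node, but the unlabelled match tests all 1000 nodes while the labelled one tests only the 200 listed under Science. In a real database you can check what the planner does by prefixing the query with EXPLAIN or PROFILE.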


Tags:

neo4j

LoveTW
