I want to test Neo4j performance with a large number of nodes. I am thinking of creating billions of nodes and then seeing how long it takes to fetch a node that meets some criteria, for example 1 billion nodes labeled Person, each with an SSN property:
match (p:Person) where p.SSN=4255556656425 return p;
But how can I create 1 billion nodes? Is there a way to generate them?
What you would be measuring then is the performance of the Lucene index, not a graph-database operation.
There are a number of options:
Neo4j 2.2.0-M03 comes with neo4j-import, a tool that can import a 1-billion-node CSV into Neo4j quickly and scalably.
This is very new in Neo4j 2.2.
I created a node-only graph with 1,000,000,000 nodes in 5 min 13 s (53 GB db) with the new ParallelBatchImporter, which makes it about 3.2M nodes/second.
Code is here: https://gist.github.com/jexp/0ff850ab2ce41c9ca5e6
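If you go the neo4j-import route, you first need a node CSV. Here is a minimal Ruby sketch of a generator, not the code from the gist. The method name write_person_nodes and the header "ssn:ID,:LABEL" are my own assumptions for illustration; check the header format against the neo4j-import documentation for your version.

```ruby
require "stringio"

# Writes a header line plus `count` rows of the form "<id>,Person" to `io`.
# The default header is an assumption about the neo4j-import CSV format.
def write_person_nodes(io, count, header = "ssn:ID,:LABEL")
  io.puts(header)
  1.upto(count) { |i| io.puts("#{i},Person") }
  count
end

# Small in-memory demonstration with 3 rows.
out = StringIO.new
write_person_nodes(out, 3)
puts out.string
```

For the real file you would pass a File instead of the StringIO, e.g. File.open("persons.csv", "w") { |f| write_person_nodes(f, 1_000_000_000) }, and then run something along the lines of bin/neo4j-import --into graph.db --nodes persons.csv.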
You could use the Neo4j Batch-Inserter API to create that data without creating the CSV first.
See this example, which you would have to adapt so that it generates the data directly in a for loop instead of reading CSV: http://jexp.de/blog/2014/10/flexible-neo4j-batch-import-with-groovy/
If you want to use Cypher, I'd recommend running something like this in a shell started with JAVA_OPTS="-Xmx4G -Xms4G" bin/neo4j-shell -path billion.db:
Here are the code and timings for 10M and 100M nodes that I took on my MacBook:
Create a CSV file with 1M lines:
ruby -e 'File.open("million.csv","w") { |f| (1..1000000).each { |i| f.write(i.to_s + "\n") } }'
The experiment ran on a MacBook Pro. Cypher execution is single-threaded. Estimated size: (15+42) bytes * node count.
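To see what that size estimate predicts for the three runs, here is a small Ruby sketch of the arithmetic. The method name estimated_store_bytes is made up for illustration, and I am only applying the (15+42) bytes per node figure as quoted, without assuming what each term covers.

```ruby
# (15 + 42) bytes per node, the estimate quoted above.
BYTES_PER_NODE = 15 + 42

# Estimated on-disk size in bytes for a node-only store.
def estimated_store_bytes(node_count)
  BYTES_PER_NODE * node_count
end

[10_000_000, 100_000_000, 1_000_000_000].each do |n|
  gb = estimated_store_bytes(n) / 1e9
  puts format("%13d nodes -> about %.2f GB", n, gb)
end
```

That works out to roughly 570 MB, 5.7 GB and 57 GB, which is in the same range as the 580 MB, 6 GB and 63 GB measured on disk before the index was created.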
// on my laptop
// 10M nodes, 1 property, 1 label each in 98228 ms (98s) taking 580 MB on disk
using periodic commit 10000
load csv from "file:million.csv" as row
//with row limit 5000
foreach (x in range(0,9) | create (:Person {id:toInt(row[0])*10+x}));
// on my laptop
// 100M nodes, 1 property, 1 label each in 1684411 ms (28 mins) taking 6 GB on disk
using periodic commit 1000
load csv from "file:million.csv" as row
foreach (x in range(0,99) | create (:Person {id:toInt(row[0])*100+x}));
// on my linux server
// 1B nodes, 1 property, 1 label each in 10588883 ms (176 min) taking 63 GB on disk
using periodic commit 1000
load csv from "file:million.csv" as row
foreach (x in range(0,999) | create (:Person {id:toInt(row[0])*1000+x}));
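One thing worth checking before a multi-hour run: these statements build ids as toInt(row[0]) * k + x, and that only yields unique ids if the multiplier k is at least the number of values x takes. A rough Ruby sketch that checks this on a small scale (the method names generated_ids and unique_ids? are hypothetical helpers, and 50 rows stand in for the 1M CSV lines):

```ruby
# Ids produced when each of `rows` CSV lines creates `per_row` nodes
# with id = row * multiplier + x, for x in 0...per_row.
def generated_ids(rows, per_row, multiplier)
  (1..rows).flat_map { |row| (0...per_row).map { |x| row * multiplier + x } }
end

# True when no id is generated twice.
def unique_ids?(rows, per_row, multiplier)
  ids = generated_ids(rows, per_row, multiplier)
  ids.uniq.size == ids.size
end

puts unique_ids?(50, 10, 10)      # range(0,9) with *10
puts unique_ids?(50, 100, 100)    # range(0,99) with *100
puts unique_ids?(50, 1000, 100)   # range(0,999) with *100
puts unique_ids?(50, 1000, 1000)  # range(0,999) with *1000
```

Only the third combination produces collisions, because consecutive rows then overlap in their id ranges.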
creating indexes
create index on :Person(id);
schema await
// took about 40 mins and increased the database size to 85 GB
then I can run
match (:Person {id:8005300}) return count(*);
+----------+
| count(*) |
+----------+
| 1        |
+----------+
1 row
2 ms
The other simple answer is a good one. If you want something a bit more involved, Michael Hunger posted a good blog entry on this. He recommends something which is basically very similar, but you can loop with some sample data as well, and use random numbers to establish linkages.
Here's how he created 100,000 users and products and linked them, customize as you see fit:
WITH ["Andres","Wes","Rik","Mark","Peter","Kenny","Michael","Stefan","Max","Chris"] AS names
FOREACH (r IN range(0,100000) | CREATE (:User {id:r, name:names[r % size(names)]+" "+r}));
with ["Mac","iPhone","Das Keyboard","Kymera Wand","HyperJuice Battery",
"Peachy Printer","HexaAirBot",
"AR-Drone","Sonic Screwdriver",
"Zentable","PowerUp"] as names
    foreach (r in range(0,50) | create (:Product {id:r, name:names[r % size(names)]+" "+r}));
Let's not forget sweet random linkage:
match (u:User),(p:Product)
where rand() < 0.1
with u,p
limit 50000
merge (u)-[:OWN]->(p);
Go nuts.