I'm trying to write code that computes Twitter network properties, but it fails with an error and I can't figure out why.
The error is this:
Traceback (most recent call last):
  File "Network_property.py", line 14, in <module>
    followee = line.strip().split('\t')[1]
IndexError: list index out of range
The code is this:
import os, sys
import time
import networkx as nx

DG = nx.DiGraph()
ptime = time.time()
j = 1
#for line in open("./US_Health_Links.txt", 'r'):
for line in open("./test_network.txt", 'r'):
    follower = line.strip().split('\t')[0]
    followee = line.strip().split('\t')[1]
    DG.add_edge(follower, followee)
    if j%1000000 == 0:
        print j*1.0/1000000, "million lines done", time.time() - ptime
        ptime = time.time()
    j += 1
print nx.number_connected_components(DG)
I gathered some links data like this:
1000 1001
1000 1020191
1000 10267352
1000 10957902
1000 11039092
1000 1118691
1000 11882
1000 1228281
1000 1247041
1000 12965332
1000 13027572
1000 13075072
1000 13183162
1000 13250162
1000 13326292
1000 13452672
1000 13844892
1000 14061830
1000 1406481
1000 14134703
1000 14216951
1000 14254402
1000 14258044
1000 14270791
1000 14278978
1000 14313332
1000 14392970
1000 14441172
1000 14497568
1000 14502775
1000 14595635
1000 14620544
1000 14632615
1000 14680596
1000 14956164
1000 14998341
1000 15132211
1000 15145450
1000 15285998
1000 15288974
1000 15300187
1000 1532061
1000 15326300
Here "1000" is the follower and the other IDs are its followees.
I want to compute (1) the number of connected components, (2) the fraction of nodes in the largest connected component, (3) the average and median in-degree, (4) the average and median out-degree, (5) the diameter, and (6) the clustering coefficient.
But the site "networkx.lanl.gov" doesn't work.
Can anybody help me out?
The error has nothing specifically to do with networkx. For at least one line of your file, line.strip().split('\t') returns only a single field, so indexing it with [1] fails. I'd guess that the problem is blank lines in your file. Compare:
>>> ''.split("\t")
['']
>>> ''.split("\t")[1]
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
IndexError: list index out of range
>>>
Thus, an empty line could cause the problem. A line read from a file still carries its trailing newline, so test the stripped line; for example, add
    if not line.strip():
        continue
at the beginning of your for loop.
Also take a look at networkx.read_edgelist, which should be simplest if you don't need to have the print statement showing progress.
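As a rough sketch of the read_edgelist approach: the snippet below assumes Python 3 and a reasonably recent networkx release, and writes a small temporary file as a stand-in for test_network.txt (the sample contents are made up for illustration). With the default delimiter, read_edgelist splits on any whitespace and skips lines with fewer than two fields. Note that, as far as I know, current networkx releases do not define number_connected_components for directed graphs, so the sketch uses the weakly connected variant on the DiGraph instead; whether weak or strong connectivity is the right notion for your analysis is your call.

```python
import os
import tempfile

import networkx as nx

# Hypothetical stand-in for test_network.txt, including one blank line.
sample = "1000 1001\n1000 1020191\n\n1001 1000\n"
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as f:
    f.write(sample)
    path = f.name

# Default delimiter (None) splits on any whitespace: spaces or tabs.
DG = nx.read_edgelist(path, create_using=nx.DiGraph(), nodetype=str)
os.remove(path)

# Connectivity on a directed graph: use the weakly connected variants.
n_components = nx.number_weakly_connected_components(DG)
largest = max(nx.weakly_connected_components(DG), key=len)
fraction_in_largest = len(largest) / DG.number_of_nodes()

print(DG.number_of_nodes(), "nodes,", DG.number_of_edges(), "edges")
print("weakly connected components:", n_components)
print("fraction of nodes in largest component:", fraction_in_largest)
```

For a file with many millions of lines this gives no progress output, which is the trade-off mentioned above.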
The network.txt file example that you provided does not have tabs; it has spaces. If you change your instances of split('\t') to split(), it will split on any whitespace, so it will handle your files whether they have spaces or tabs.
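A minimal sketch of that parsing change, written for Python 3 and using only the standard library. The helper name parse_edges and the sample lines are made up for illustration; the idea is simply to call split() with no argument and skip any line that does not yield two fields, which also covers the blank-line case.

```python
def parse_edges(lines):
    """Yield (follower, followee) pairs, skipping blank or malformed lines."""
    for lineno, line in enumerate(lines, start=1):
        fields = line.split()  # no argument: split on any run of spaces or tabs
        if not fields:
            continue  # blank line
        if len(fields) < 2:
            print("skipping malformed line %d: %r" % (lineno, line))
            continue
        yield fields[0], fields[1]


# Illustrative input: space-separated, tab-separated, blank, and one-field lines.
sample = ["1000 1001\n", "1000\t1020191\n", "\n", "1000\n"]
edges = list(parse_edges(sample))
print(edges)
```

In the original loop, each pair from parse_edges(open("./test_network.txt")) could be passed straight to DG.add_edge(follower, followee).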