
Using Python networkx for exploring network properties

I'm trying to write code that computes properties of a Twitter network.

But my code raises an error, and I don't know why.

The error is this:

Traceback (most recent call last):
  File "Network_property.py", line 14, in <module>
    followee = line.strip().split('\t')[1]
IndexError: list index out of range

The code is this:

import os, sys
import time
import networkx as nx


DG = nx.DiGraph()

ptime = time.time()
j = 1

#for line in open("./US_Health_Links.txt", 'r'):
for line in open("./test_network.txt", 'r'):
    follower = line.strip().split('\t')[0]
    followee = line.strip().split('\t')[1]

    DG.add_edge(follower, followee)

    if j%1000000 == 0:
        print j*1.0/1000000, "million lines done", time.time() - ptime
        ptime = time.time()
    j += 1

print nx.number_connected_components(DG)

I gathered some links data like this:

1000    1001
1000    1020191
1000    10267352
1000    10957902
1000    11039092
1000    1118691
1000    11882
1000    1228281
1000    1247041
1000    12965332
1000    13027572
1000    13075072
1000    13183162
1000    13250162
1000    13326292
1000    13452672
1000    13844892
1000    14061830
1000    1406481
1000    14134703
1000    14216951
1000    14254402
1000    14258044
1000    14270791
1000    14278978
1000    14313332
1000    14392970
1000    14441172
1000    14497568
1000    14502775
1000    14595635
1000    14620544
1000    14632615
1000    14680596
1000    14956164
1000    14998341
1000    15132211
1000    15145450
1000    15285998
1000    15288974
1000    15300187
1000    1532061
1000    15326300

In each line, "1000" is the follower and the second number is a followee.

In addition, I want to compute (1) the number of connected components, (2) the fraction of nodes in the largest connected component, (3) the average and median in-degree, (4) the average and median out-degree, (5) the diameter, and (6) the clustering coefficient.

However, the site "networkx.lanl.gov" isn't working for me.

Can anybody help me out?

ooozooo asked Feb 20 '26 13:02


2 Answers

The error has nothing specifically to do with networkx. What is happening is that, for some line, line.strip().split('\t') returns only a single field. I'd guess that the problem is blank lines in your file. Compare:

>>> ''.split("\t")
['']
>>> ''.split("\t")[1]
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
IndexError: list index out of range
>>>

Thus, an empty line could cause the problem. You could check this explicitly, for example, by adding

if not line.strip():
    continue

at the beginning of your for loop.
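Here is a minimal sketch of that check in context, assuming the same two-column, tab-separated layout as in the question. The helper name `parse_edges` and the sample lines are made up for illustration. A blank line read from a file still carries its newline character, so the test is applied to the stripped line. Skipping single-field lines silently is one possible policy; you may prefer to log or raise instead.

```python
def parse_edges(lines):
    """Yield (follower, followee) pairs, skipping blank or short lines."""
    for line in lines:
        line = line.strip()
        if not line:            # blank line: nothing to parse
            continue
        fields = line.split('\t')
        if len(fields) < 2:     # malformed line with only one field
            continue
        yield fields[0], fields[1]


# Stand-in data; in the real script this would be the open file object.
sample = ["1000\t1001\n", "\n", "1000\t1020191\n"]
print(list(parse_edges(sample)))
```

Each pair it yields could then be passed to `DG.add_edge(follower, followee)` as in the original loop.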

Also take a look at networkx.read_edgelist, which should be simplest if you don't need to have the print statement showing progress.
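As a rough sketch of the `read_edgelist` route: the temporary file and its contents below are stand-in data, not your real file. With the default delimiter, fields are split on any whitespace, and lines with fewer than two fields, including blank ones, are skipped. The last line uses weak connectivity, which is only one possible reading of "connected component" for a directed graph.

```python
import os
import tempfile

import networkx as nx

# Write a small edge list (with one blank line) to a temporary file.
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write("1000\t1001\n1000\t1020191\n\n1000\t10267352\n")
    path = tmp.name

# create_using=nx.DiGraph() keeps the follower -> followee direction.
DG = nx.read_edgelist(path, create_using=nx.DiGraph(), nodetype=str)
os.remove(path)

n_nodes = DG.number_of_nodes()
n_edges = DG.number_of_edges()
# Components of the graph when edge direction is ignored.
n_weak = nx.number_weakly_connected_components(DG)
print(n_nodes)
print(n_edges)
print(n_weak)
```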

Michael J. Barber answered Feb 23 '26 01:02


The network.txt file example that you provided does not have tabs; it has spaces. If you change your instances of split('\t') to split(), it will split on any whitespace, so it will handle your files whether they have spaces or tabs.
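To see the difference, here is a small comparison on two made-up lines, one tab-separated and one space-separated like the sample in the question:

```python
tab_line = "1000\t1001\n"
space_line = "1000    1001\n"

# Splitting on '\t' only works when the separator really is a tab.
tab_on_tab = tab_line.strip().split('\t')      # two fields
tab_on_space = space_line.strip().split('\t')  # one field, so [1] fails

# split() with no argument splits on any run of whitespace.
any_on_tab = tab_line.split()
any_on_space = space_line.split()

print(tab_on_tab)
print(tab_on_space)
print(any_on_tab)
print(any_on_space)
```

Only the second result has a single element, which is exactly the case that raises the IndexError when indexed with [1].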

David Alber answered Feb 23 '26 03:02