
How to make GraphFrame from Edge DataFrame only

The documentation says: "A GraphFrame can also be constructed from a single DataFrame containing edge information. The vertices will be inferred from the sources and destinations of the edges."

However, when I look at the API docs, there seems to be no way to create one like that.

Has anyone created a GraphFrame from an edge DataFrame only? How?

Muhammad Mohib Khan asked Sep 03 '25 14:09


2 Answers

To avoid duplicates in the vertex list, I would add a distinct():

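from graphframes import GraphFrame

# Collect every id that appears as a source or a destination.
# union() matches columns by position, so the result column is still named 'src'.
verticesDf = edgesDf \
    .select("src") \
    .union(edgesDf.select("dst")) \
    .distinct() \
    .withColumnRenamed("src", "id")

verticesDf.show()

# GraphFrame expects an 'id' column in the vertices and 'src'/'dst' in the edges.
graph = GraphFrame(verticesDf, edgesDf)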
Alex Ortner answered Sep 05 '25 10:09


The GraphFrames Scala API has a function called fromEdges, which generates a GraphFrame from an edge DataFrame. As far as I can tell, this function isn't available in PySpark, but you can do something like:

##something

verticesDf = edgesDF.select('src').union(edgesDF.select('dst'))
verticesDf = verticesDf.withColumnRenamed('src', 'id')

##more something

to achieve the same.
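Both snippets can be folded into a small helper. The following is only a sketch: the names vertices_from_edges and graph_from_edges are hypothetical (they are not part of the GraphFrames API), and it assumes the edge DataFrame uses the column names src and dst, as in the answers here. Each side is renamed to id before the union so the result does not depend on which column name the union happens to keep.

```python
def vertices_from_edges(edges_df, src_col="src", dst_col="dst", id_col="id"):
    """Hypothetical helper: derive a one-column vertex DataFrame from edges.

    Only standard DataFrame methods are used
    (select, withColumnRenamed, union, distinct).
    """
    sources = edges_df.select(src_col).withColumnRenamed(src_col, id_col)
    destinations = edges_df.select(dst_col).withColumnRenamed(dst_col, id_col)
    # union() keeps duplicates, so distinct() leaves one row per vertex id.
    return sources.union(destinations).distinct()


def graph_from_edges(edges_df):
    """Hypothetical helper: build a GraphFrame from an edge DataFrame alone."""
    # Imported here so the vertex helper can be used without graphframes installed.
    from graphframes import GraphFrame

    return GraphFrame(vertices_from_edges(edges_df), edges_df)
```

With an existing edge DataFrame, graph = graph_from_edges(edgesDF) should then give a result equivalent to the two-step versions in these answers.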

cronoik answered Sep 05 '25 10:09