According to the GraphFrames documentation, "A GraphFrame can also be constructed from a single DataFrame containing edge information. The vertices will be inferred from the sources and destinations of the edges."
However, when I look at the API docs, there seems to be no way to create one like this.
Has anyone created a GraphFrame from an edge DataFrame only? If so, how?
To avoid duplicates in the vertices list, I would add a distinct():
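from graphframes import GraphFrame

# Collect every source and destination id, then drop duplicates.
# union() keeps the column name of the left side ("src"), so rename it to "id".
verticesDf = edgesDf \
    .select("src") \
    .union(edgesDf.select("dst")) \
    .distinct() \
    .withColumnRenamed("src", "id")
verticesDf.show()
graph = GraphFrame(verticesDf, edgesDf)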
The GraphFrames Scala API has a function called fromEdges, which generates a GraphFrame from an edge DataFrame. As far as I can tell, this function isn't available in PySpark, but you can do something like:
##something
verticesDf = edgesDF.select('src').union(edgesDF.select('dst'))
verticesDf = verticesDf.withColumnRenamed('src', 'id')
##more something
to achieve the same.
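Putting the two answers together, here is a minimal sketch of a reusable helper. The function names vertices_from_edges and graph_from_edges are made up for illustration and are not part of the graphframes package. The sketch assumes the edge DataFrame uses the conventional src and dst column names and that the graphframes package is installed.

```python
def vertices_from_edges(edges_df, src_col="src", dst_col="dst"):
    """Build a single-column ("id") vertex DataFrame from an edge DataFrame."""
    # Rename each endpoint column to "id" before the union, so the result
    # does not rely on union() keeping the left-hand column name.
    sources = edges_df.select(src_col).withColumnRenamed(src_col, "id")
    destinations = edges_df.select(dst_col).withColumnRenamed(dst_col, "id")
    # union() keeps duplicate rows (like SQL UNION ALL), hence distinct().
    return sources.union(destinations).distinct()


def graph_from_edges(edges_df):
    """Hypothetical stand-in for the Scala-side fromEdges."""
    # Imported here so the vertex helper above works without graphframes.
    from graphframes import GraphFrame
    return GraphFrame(vertices_from_edges(edges_df), edges_df)
```

With an existing edge DataFrame edgesDf that has src and dst columns, calling graph_from_edges(edgesDf) should give the same graph as the snippets above, with the vertex ids already de-duplicated.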