Network Analysis-Part II (Gephi)

This tutorial was written by Katherine Walden, Digital Liberal Arts Specialist at Grinnell College.

This tutorial was reviewed by Sarah Purcell (L.F. Parker Professor of History) and Gina Donovan (Instructional Technologist) at Grinnell College, and edited by Papa Ampim-Darko, a student research assistant at Grinnell College.

Creative Commons License
Network Analysis-Part II (Gephi) is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.


While NetworkX is a powerful tool for analyzing networks and calculating network metrics, the kind of interactive visualization we generated in Palladio is sometimes more useful for understanding a network or communicating the results of our analysis.

Gephi is open-source software for network analysis and visualization, first created in 2008 by students at the University of Technology of Compiègne. The Gephi Consortium, a non-profit corporation that supports Gephi’s ongoing development and documentation, is backed by members that include SciencesPo, Linfluence, WebAtlas, and Quid. Gephi runs on Linux, Windows, and macOS and is available in nine languages.


Installing Gephi

Gephi is already installed on the Library lab computers.

To download Gephi on your own computer, go to Gephi’s download page and select the correct version for your operating system.


Data

1-In this tutorial, we will be working with data about Grinnell faculty, the subject of their terminal degree, where they received their terminal degree, and when they were hired by the College.

2-Navigate to http://vivero.sites.grinnell.edu/files/ in a browser and save the quakers_nodelist and quakers_edgelist CSV files to your Desktop. Open the files in Microsoft Excel to explore the data structure.

  • What types of institutions and College instructors are represented?

  • How are they described in the nodelist data?

  • What additional questions do you have about the individuals and degree-granting institutions that will be represented as nodes?

  • How is the edgelist data structured?

  • Based on a preliminary scan of the nodelist and edgelist CSV data, what types of networks do you think this data might illuminate?

  • Are there gaps, silences, or alternative networks that are not accounted for in the data?

3-Save these files to your desktop as “Grinnell_nodelist” and “Grinnell_edgelist” or another descriptive file name.
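If you would rather inspect the files programmatically than in Excel, the following is a minimal Python sketch of the same exploration. It assumes the nodelist has an Id column and the edgelist has Source and Target columns; those headers are an assumption about your files, not something this tutorial confirms, so adjust the column arguments to match what you see. The function names are hypothetical helpers written for this illustration.

```python
import csv

def summarize(lines):
    """Return (column names, number of data rows) for CSV text lines."""
    reader = csv.DictReader(lines)
    rows = list(reader)
    return reader.fieldnames, len(rows)

def dangling_endpoints(node_lines, edge_lines,
                       id_col="Id", source_col="Source", target_col="Target"):
    """Return sorted edge endpoints that never appear in the nodelist.

    The default column names are assumptions; pass your own if the
    headers in your CSV files differ.
    """
    node_ids = {row[id_col] for row in csv.DictReader(node_lines)}
    missing = set()
    for row in csv.DictReader(edge_lines):
        for endpoint in (row[source_col], row[target_col]):
            if endpoint not in node_ids:
                missing.add(endpoint)
    return sorted(missing)

# Usage with files on disk (file names follow step 3 and are assumed):
# with open("Grinnell_nodelist.csv", newline="", encoding="utf-8") as f:
#     print(summarize(f))
```

An empty result from dangling_endpoints suggests every edge connects two nodes that are described in the nodelist; any names it does return are worth investigating before you import the data.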


Loading Data into Gephi

4-Open Gephi by selecting it from Start->All Programs->Gephi or the Desktop icon.

5-Click the X in the top right-hand corner of the Welcome popup window.

6-Click File->Import Spreadsheet and navigate to the Desktop where the nodelist and edgelist CSV files are saved on your computer.

7-Select the nodelist file, make sure Comma is selected as Separator, Nodes table is selected under Import as, and UTF-8 under Charset.

8-Click Next. Leave the default settings on the Import settings (2 of 2) popup, and click Finish.

9-Select Undirected for Graph Type, and switch the default selection from New workspace to Append to existing workspace.

10-Click OK.

11-Your node data is now loaded in the Data Laboratory tab in Gephi.

12-Click Import Spreadsheet, and select the CSV with your edgelist data.

13-Make sure Comma is selected as Separator, Edges table under Import as, and UTF-8 under Charset.

14-Click Next.

15-Leave the default settings on the Import settings (2 of 2) window and click Finish.

16-Make sure Undirected is selected as Graph Type, and switch the default selection from New workspace to Append to existing workspace.

17-Click OK.

18-Our node and edge data have now been imported into Gephi.

19-Click File->Save to save your project. Label the Gephi file “Network_Tutorial” or another descriptive name.



20-Click on the Overview tab to see Gephi’s default visualization of your network data.

21-As you probably noticed, the default visualization is an interesting collection of nodes and edges, but it doesn’t do much to help us understand and analyze our data.

22-Click on Choose a layout under the Layout panel to select how Gephi displays your nodes and edges.

23-Gephi uses a variety of layout algorithms to determine the shape of network graphs. Different layout algorithms emphasize different aspects or features of your data.

Emphasis: Layout algorithm(s)
Divisions: OpenOrd
Complementarities: ForceAtlas, Yifan Hu, Fruchterman-Reingold
Ranking: Circular, Radial Axis
Geographic Repartition: GeoLayout

24-Label Adjust, Noverlap, Expansion, and Contraction make graphic adjustments to how your data displays, rather than using an underlying algorithm to change the structure of the network visualization.

25-Select different Layout options and click the Run icon to see how the different algorithms and settings change the visualization of your data. Click the Stop icon to stop the layout operation.

26-What Layout option(s) do you prefer, and why? How do different Layout options impact the visualization of your data? How could different Layout options be useful to answer different types of research questions?
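To make the idea of a layout algorithm more concrete, here is a minimal Python sketch of a ranking-style circular layout: nodes are ordered by degree and spaced evenly around a circle. This is an illustration only, not Gephi’s implementation, and the function names and the sample edge list are hypothetical.

```python
import math

def circular_layout(nodes, radius=1.0):
    """Place nodes evenly around a circle, in the order given.

    The first node sits at angle 0 (on the positive x axis) and the
    rest follow counter-clockwise.
    """
    count = len(nodes)
    positions = {}
    for index, node in enumerate(nodes):
        angle = 2 * math.pi * index / count
        positions[node] = (radius * math.cos(angle), radius * math.sin(angle))
    return positions

def degree_ranked_circle(edges, radius=1.0):
    """Order nodes from highest to lowest degree, then lay them on a circle."""
    degree = {}
    for source, target in edges:
        degree[source] = degree.get(source, 0) + 1
        degree[target] = degree.get(target, 0) + 1
    # Ties are broken alphabetically so the result is reproducible.
    ordered = sorted(degree, key=lambda node: (-degree[node], node))
    return circular_layout(ordered, radius)

# Hypothetical sample data: one well-connected node and three others.
sample_edges = [("a", "b"), ("a", "c"), ("a", "d")]
for node, (x, y) in degree_ranked_circle(sample_edges).items():
    print(node, round(x, 2), round(y, 2))
```

Notice that the layout changes only where nodes are drawn; the underlying nodes and edges are the same whichever algorithm you choose.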


Calculating Network Metrics in Gephi

27-In the NetworkX tutorial, we used Python to calculate a variety of metrics for our Quaker network data. Because Gephi has a graphical user interface (GUI), it allows us to calculate those statistics without writing code or using a library like NetworkX.


28-The Statistics panel gives you the option to run a number of different calculations on your network data.

29-While Gephi allows us to easily perform these calculations, the program doesn’t automatically tell us what these measures mean or how they are calculated.

30-The HTML report in the pop-up window that displays after you run a Statistics calculation gives you a graph of the data calculation, and sometimes the source for the algorithm used to calculate the statistic.

31-Consult Gephi’s GitHub repository for more information on these statistics.

32-Click Run for each of the options under Statistics.

33-If you click on the Data Laboratory tab, you will see these calculated statistics have been added to your network data.

  • Using what you learned about network analysis in the NetworkX tutorial and the additional GitHub documentation for Gephi, what do these statistics tell you about your network?
  • How do these statistics help you understand your network data more deeply?
  • What questions do you have about the network data or these calculations?
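As a reminder of what a few of these measures mean, the sketch below calculates degree, average degree, and density for a small undirected network using only the Python standard library. It is a simplified illustration under stated assumptions, not the code Gephi runs: self-loops are ignored, duplicate edges are counted once, and nodes with no edges are not represented because the input is an edge list. The function names and sample edges are hypothetical.

```python
from collections import defaultdict

def build_adjacency(edges):
    """Map each node to the set of its neighbors (undirected)."""
    adjacency = defaultdict(set)
    for source, target in edges:
        if source == target:
            continue  # ignore self-loops in this sketch
        adjacency[source].add(target)
        adjacency[target].add(source)
    return adjacency

def degrees(adjacency):
    """Degree of a node = how many distinct neighbors it has."""
    return {node: len(neighbors) for node, neighbors in adjacency.items()}

def average_degree(adjacency):
    """Mean number of connections per node."""
    if not adjacency:
        return 0.0
    return sum(degrees(adjacency).values()) / len(adjacency)

def density(adjacency):
    """Share of possible undirected edges that exist: 2m / (n(n-1))."""
    node_count = len(adjacency)
    if node_count < 2:
        return 0.0
    edge_count = sum(degrees(adjacency).values()) / 2
    return 2 * edge_count / (node_count * (node_count - 1))

# Hypothetical sample: a triangle (a, b, c) with one extra node d.
sample = build_adjacency([("a", "b"), ("b", "c"), ("a", "c"), ("c", "d")])
print(degrees(sample), average_degree(sample), round(density(sample), 3))
```

Comparing a hand calculation like this with the values Gephi adds to the Data Laboratory tab for a small test network can be a useful check on your understanding of each statistic.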

Customizing a Network Visualization in Gephi

34-Select Noverlap for your Layout so your nodes and edges don’t overlap as you are exploring display customization options.

35-The border icons in the Graph panel allow you to customize the display of your network visualization.

36-Click on the Show Node Labels icon to display the node labels.

37-You can change the size of text for your labels by using the slider to the right of “Arial Bold, 32.”


38-You can also click on the Attributes icon to customize what data fields display as part of your labels.

39-While your nodes now have labels, the large number of nodes and edges makes it difficult to differentiate or discern various attributes about our network data.


40-The Appearance panel allows you to customize the color, size or weight, and labels for your nodes and edges. Changing the coloring or sizing of nodes in the Appearance panel can help make our network visualization more meaningful.

41-Select the Size icon to change the default size of your nodes. You can explore different sizes, but 3 works well with this dataset. Click the Apply icon to change the setting for your network visualization.

42-You can also rank the size of nodes based on one of the network metrics you calculated in the Statistics panel.

43-Click the Ranking icon in the Appearance panel, and select Degree from the dropdown menu. Click Apply.

44-Ranking node size by degree sets the size of each node according to its degree centrality (the number of connections it has). Nodes with more connections appear larger, and nodes with fewer connections appear smaller.

45-What changes did you notice in your network visualization after sizing nodes by degree? How does changing the size of nodes impact your understanding of the network data? What do you see in the network data after this change that you weren’t able to see before?

46-You can change the minimum and maximum node size, and also select different statistics to use for determining node size.

47-Explore these different settings and options to see how different node size calculations shape your understanding of the network data.

48-Save your project.
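The degree-based sizing described in this section can be sketched in a few lines of Python. The example assumes a simple linear mapping from the smallest to the largest value; Gephi may interpolate differently depending on your settings, so treat this as an illustration of the idea rather than a description of Gephi’s behavior. The function name, default sizes, and sample values are hypothetical.

```python
def scale_sizes(values, min_size=10.0, max_size=40.0):
    """Map each node's value (for example, its degree) to a display size.

    The smallest value gets min_size, the largest gets max_size, and
    everything in between is placed proportionally (linear scaling).
    """
    if not values:
        return {}
    lowest = min(values.values())
    highest = max(values.values())
    if highest == lowest:
        # Every node has the same value, so every node gets the same size.
        return {node: min_size for node in values}
    span = highest - lowest
    return {
        node: min_size + (value - lowest) / span * (max_size - min_size)
        for node, value in values.items()
    }

# Hypothetical degrees for three nodes.
print(scale_sizes({"a": 1, "b": 3, "c": 5}, min_size=10, max_size=50))
```

Raising the maximum size or lowering the minimum exaggerates the visual difference between well-connected and poorly connected nodes without changing the underlying statistic.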


Exporting Networks


49-Click File->Export to save your project as a static image (SVG, PNG, PDF) or network file (CSV, GDF, GEXF, GraphML, Pajek Net).

50-You can learn more about the different export options in the documentation on Gephi’s website.

51-Step 47 in this tutorial had you experiment with display customization settings.

52-Return to the visualizations you explored in that step, and export those you find useful or interesting as PNG files.


Tutorial reflection questions:

In this two-part tutorial on network analysis, we explored tools digital scholars can use in network analysis. What similarities did you notice across the different digital tools? What differences stood out to you?

  • How can different approaches to network analysis help us understand different aspects or features of network data?
  • What did you find challenging or confusing about network analysis? What questions do you still have about the network data?

Network analysis reflection questions:

Explore a blog post from Kieran Healy that uses network analysis to analyze the social networks around Paul Revere. In another blog post, Emily Kugler uses network analysis to explore “Imagined Empires in Jane Austen’s Mansfield Park.” As a third example, the Kindred Britain project based out of Stanford University explores the different relationship networks that connect 30,000 British figures.

  • How do you think about or understand network analysis differently after seeing how scholars deploy it in their research?
  • What types of data do you think would work well for network analysis? What types of data might not be well-suited for network analysis?
  • What types of research questions do you think network analysis would be useful to help answer or address? What types of questions do you think might not be the best fit for network analysis?