Dune — A Hidden Network
by Milan Janosov, with Patrik Szigeti
Following the success of Dune (2021) both at the box office and with the critics, Dune: Part Two was one of the most anticipated movies of 2024, and it didn't disappoint. On track to earn more, and holding higher ratings on both Rotten Tomatoes and IMDb than its predecessor at the time of writing, and with its ever-changing political landscape, Dune is the perfect franchise to dive into through network science. In this short piece, we explore the connections between the different Houses and people of the Imperium based on Frank Herbert's first three books — Dune (1965), Dune Messiah (1969) and Children of Dune (1976).
In the first part of this article, we present a Python-based approach to collecting character profile data from the Dune Wiki and turning those profiles into a catchy network graph. Then, in the second — rather spoiler-heavy — section, we dive into the depths of the network and extract all the stories it has to tell about the first Dune trilogy.
All images were created by the authors.
First, we use Python to collect the full list of Dune characters. Then, we download each character's biography from their fan wiki page and count the number of times each character's story mentions any other character, assuming these mentions encode interactions between the two. Finally, we use network science to turn these relationships into a complex graph.
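Before building the real pipeline, here is the core counting idea in a nutshell, on made-up toy data (the three mini-profiles below are purely illustrative, not from the wiki):

# toy example: count how often each profile text mentions another character,
# and aggregate the counts into symmetric pair weights
profiles = {
    'A': 'A is allied with B, and A fears C.',
    'B': 'B serves A.',
    'C': 'C opposes A.',
}

weights = {}
for source, text in profiles.items():
    for target in profiles:
        if target != source:
            w = text.count(target)  # mentions of target in source's profile
            if w > 0:
                pair = tuple(sorted([source, target]))
                weights[pair] = weights.get(pair, 0) + w

print(weights)  # {('A', 'B'): 2, ('A', 'C'): 2}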
1.1 Collecting the list of characters
First off, we collected the list of all relevant characters from the Dune fan wiki site. Namely, we used urllib and bs4 to extract the name and fan wiki ID of every character who is mentioned and has their own wiki page, encoded by its ID. We did this for the first three books: Dune, Dune Messiah and Children of Dune. These three books cover the rise of House Atreides.
First, we download the HTML of each book's character listing page:
from urllib.request import urlopen
import bs4 as bs

# the character listing page of each book
dune_meta = {
    'Dune': {'url': 'https://dune.fandom.com/wiki/Dune_(novel)'},
    'Dune Messiah': {'url': 'https://dune.fandom.com/wiki/Dune_Messiah'},
    'Children of Dune': {'url': 'https://dune.fandom.com/wiki/Children_of_Dune_(novel)'}
}

# download each listing page and collect all of its list items
for book, meta in dune_meta.items():
    sauce = urlopen(meta['url']).read()
    soup = bs.BeautifulSoup(sauce, 'lxml')
    dune_meta[book]['chars'] = soup.find_all('li')
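As a quick sanity check (our addition, not strictly necessary), we can print how many list items were scraped per book; a suspiciously low count would signal a parsing problem:

# quick check: the number of <li> tags scraped per book
for book in dune_meta:
    print(book, len(dune_meta[book]['chars']))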
A little manual help to mark where the character names start and end on each page:
dune_meta['Dune']['char_start'] = 'Abulurd'
dune_meta['Dune']['char_end'] = 'Arrakis'
dune_meta['Dune Messiah']['char_start'] = 'Abumojandis'
dune_meta['Dune Messiah']['char_end'] = 'Arrakis'
dune_meta['Children of Dune']['char_start'] = '2018 Edition'
dune_meta['Children of Dune']['char_end'] = 'Categories'
Then, we extracted all the potentially relevant names and the corresponding profile URLs. Here, we manually checked at which tags the character names start (as opposed to, e.g., the outline elements of the listing page). Additionally, we decided to drop the characters marked by 'XD' and 'DE', corresponding to the extended series, as well as characters that were "Mentioned only" in a certain book:
for k, v in dune_meta.items():
    names_urls = {}
    keep_row = False
    print(f'----- {k} -----')
    for char in v['chars']:
        # only keep the list items between the first and last character entries
        if v['char_start'] in char.text.strip():
            keep_row = True
        if v['char_end'] in char.text.strip():
            keep_row = False
        if keep_row and 'Video' not in char.text:
            try:
                url = 'https://dune.fandom.com' + str(char).split('href="')[1].split('" title')[0]
                name = char.text.strip()
                # drop extended-series ('XD', 'DE') and mentioned-only characters
                if 'wiki' in url and 'XD' not in name and 'DE' not in name and '(Mentioned only)' not in name:
                    names_urls[name] = url
                    print(name)
            except IndexError:
                # list items without a profile link are skipped
                pass
    dune_meta[k]['names_urls'] = names_urls
This code block then prints the list of extracted characters for each book.
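If you do not want to scroll through the full printout, a compact alternative (our addition) is to print only the per-book counts:

# number of characters kept per book after filtering
for book in dune_meta:
    print(book, len(dune_meta[book]['names_urls']))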
Finally, we check the number of characters we collected and save their profile URLs and identifiers for the next subchapter.
# merge the character lists of the three books
dune_names_urls = {}
for k, v in dune_meta.items():
    dune_names_urls.update(v['names_urls'])

# map each character's name to their wiki identifier
names_ids = {n: u.split('/')[-1] for n, u in dune_names_urls.items()}
print(len(dune_names_urls))
The output of this cell shows that we collected 119 characters with profile URLs.
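If you want to reuse the collected URLs and identifiers in a later session, here is a minimal sketch using the standard json module (our addition; the article otherwise keeps everything in memory):

import json

# save the name -> URL and name -> wiki ID mappings for later reuse
with open('dune_names_urls.json', 'w') as fout:
    json.dump(dune_names_urls, fout, indent=2)
with open('dune_names_ids.json', 'w') as fout:
    json.dump(names_ids, fout, indent=2)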
1.2 Downloading character profiles
Our goal is to map out the social network of the Dune characters — which means we need to figure out who interacted with whom. In the previous subchapter, we got the list of all the 'whoms'; now we will get the info about their personal stories. We obtain those stories using, again, simple web scraping techniques, and save the source of each character's personal page in a separate local file:
import os

# output folder for the profile htmls
folderout = 'fandom_profiles'
if not os.path.exists(folderout):
    os.makedirs(folderout)

# crawl and save the profile htmls
for name, url in dune_names_urls.items():
    filename = folderout + '/' + name + '.html'
    if not os.path.exists(filename):
        try:
            with open(filename, 'w') as fout:
                fout.write(str(urlopen(url).read()))
        except Exception:
            # skip profiles that fail to download
            pass
The result of running this code will be a folder in our local directory with all the fan wiki site profiles belonging to every single selected character.
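One practical note: when crawling dozens of pages, it is polite to pace the requests. Here is a variant of the loop above with a short pause between downloads (our addition, assuming a one-second delay is acceptable):

import time

# the same crawl as above, but pausing between requests
for name, url in dune_names_urls.items():
    filename = folderout + '/' + name + '.html'
    if not os.path.exists(filename):
        try:
            with open(filename, 'w') as fout:
                fout.write(str(urlopen(url).read()))
            time.sleep(1)  # be gentle with the fandom servers
        except Exception:
            pass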
1.3 Building the network
To build the network between characters, we count the number of times each character's wiki page source references any other character's wiki identifier, using the following logic. Here, we build up the edge list — the list of connections, where each connection contains a source and a target node (character), as well as a weight (the co-reference frequency between the two characters' pages).
# extract the name mentions from the html sources
# and build the list of edges in a dictionary
edges = {}

for fn in [fn for fn in os.listdir(folderout) if '.html' in fn]:
    name = fn.split('.html')[0]
    with open(folderout + '/' + fn) as myfile:
        text = myfile.read()
    # keep only the body paragraphs of the profile, skipping the first two tags
    soup = bs.BeautifulSoup(text, 'lxml')
    text = ' '.join([str(a) for a in soup.find_all('p')[2:]])
    # count how many times each other character's wiki ID is linked,
    # ignoring everything after the Image Gallery section
    for n, i in names_ids.items():
        w = text.split('Image Gallery')[0].count('/' + i)
        if w > 0:
            edge = '\t'.join(sorted([name, n]))
            if edge not in edges:
                edges[edge] = w
            else:
                edges[edge] += w

len(edges)
Running this block of code returns 307, the number of edges connecting the 119 Dune characters.
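Before turning this into a graph, it is worth peeking at the strongest co-references as a sanity check (a quick sketch; the exact pairs depend on the current state of the wiki):

# show the ten most frequently co-referenced character pairs
for edge, w in sorted(edges.items(), key=lambda x: -x[1])[:10]:
    print(edge.replace('\t', ' -- '), w)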
Next, we use the NetworkX graph analytics library to turn the edge list into a graph object and output the number of nodes and edges the graph has:
# create the networkx graph from the dict of edges
import networkx as nx

G = nx.Graph()
for e, w in edges.items():
    if w > 0:
        e1, e2 = e.split('\t')
        G.add_edge(e1, e2, weight=w)

# drop self-loops, i.e., profiles referencing their own wiki ID
G.remove_edges_from(nx.selfloop_edges(G))

print('Number of nodes: ', G.number_of_nodes())
print('Number of edges: ', G.number_of_edges())
The result of this code block: 72 nodes and 303 edges. The number of nodes is only 72, meaning that 47 characters were not linked to any other character in their — probably rather brief — wiki profiles. Additionally, the number of edges decreased by four because a few self-loops were removed.
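To get a first sense of who anchors this network, we can also rank the characters by their weighted degree, i.e., the total number of co-references they take part in (a quick sketch; the exact ranking depends on the scraped data):

# rank characters by weighted degree (total co-reference count)
top_degrees = sorted(G.degree(weight='weight'), key=lambda x: -x[1])[:10]
for character, strength in top_degrees:
    print(character, strength)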
Let's take a brief look at the network using the built-in Matplotlib plotter:
# take a very brief look at the network
import matplotlib.pyplot as plt
f, ax = plt.subplots(1,1,figsize=(15,15))
nx.draw(G, ax=ax, with_labels=True)
The output of this cell:
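For a somewhat more readable quick view, before any Gephi polishing, one can switch to a spring layout and scale node sizes by degree (a sketch of one possible styling, not the design of the final figure):

# quick styling: spring layout, node sizes proportional to degree
pos = nx.spring_layout(G, weight='weight', seed=42)
node_sizes = [20 * G.degree(n) for n in G.nodes()]
f, ax = plt.subplots(1, 1, figsize=(15, 15))
nx.draw(G, pos=pos, ax=ax, node_size=node_sizes, with_labels=True, font_size=8)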
While this visual already shows some network structure, we exported the graph into a Gephi file using the following line of code, and designed the network visualization shown in the figure below (the how-to of such network visuals will be the topic of an upcoming tutorial article):
nx.write_gexf(G, 'dune_network.gexf')
The full Dune network: