Tutorial 1.3 -- Social Network Analysis with Flickr

This tutorial will give you some hands-on experience parsing data from live API feeds. We’ll then apply network analysis concepts to the data we obtain.

Problem Statement

We will examine data from a small public group on the social photo-sharing site Flickr. The group is called ANU Photography Club and is used by students and alumni to share memories about Beihang University.

The group URL is http://www.flickr.com/groups/anuphotographyclub/

The web page shows that this group had 29 members as of 16 August 2013. Viewing one user’s profile (user name “delayedflight”) shows that he has a few dozen contacts.

One may wonder how closely knit this group is. That is, how many of these group members are each other’s friends (or friends of friends)? And when members are friends with each other, are their friends also friends with each other (triadic closure)?

Getting data from Flickr

Sign up for an API key from Flickr, or borrow mine: ‘c690eb01b1bf91e0965cfa4b61b9a23f’. Note that this key will be deleted after this class; you cannot use it for other purposes.

Start Python in the same directory by typing “python” or “ipython --pylab”.

First, type the following to load the Python libraries and tell Python about the Flickr API (its documentation is here).

import json
import urllib2

flickr_json_api = 'http://api.flickr.com/services/rest/?format=json&%s'
api_key = 'YOUR_API_KEY_HERE'

Find this group by sending a first request to the Flickr API and parsing the resulting JSON.

req = {'api_key':api_key, 'method':'flickr.groups.search', 'text':'ANU+Photography+Club'}

argstr = "&".join( map(lambda t:'='.join(t), req.iteritems()) )
req_url = flickr_json_api % argstr
print req_url

This code block composes your API request and prints the request URL. It looks like http://api.flickr.com/services/rest/?format=json&method=flickr.groups.search&api_key=...&text=... (the order of the parameters after format=json may vary, because they come from a dict).
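The hand-rolled join above does not escape special characters in parameter values. As a hedged alternative, here is a minimal sketch that builds the same kind of URL with the standard library's urlencode, which percent-encodes values and turns spaces into '+'. The helper name build_request_url is hypothetical (it is not part of the tutorial), and the import fallback is only there so the snippet runs on both Python 2 and Python 3.

```python
from __future__ import print_function

try:
    from urllib import urlencode          # Python 2
except ImportError:
    from urllib.parse import urlencode    # Python 3

FLICKR_JSON_API = 'http://api.flickr.com/services/rest/?format=json&%s'

def build_request_url(params):
    """Return a Flickr REST URL for a dict of query parameters (hypothetical helper)."""
    # Sort the items so the URL is identical on every run.
    return FLICKR_JSON_API % urlencode(sorted(params.items()))

req = {'api_key': 'YOUR_API_KEY_HERE',
       'method': 'flickr.groups.search',
       'text': 'ANU Photography Club'}   # plain spaces; urlencode escapes them
print(build_request_url(req))
```

Note that the search text is written with ordinary spaces here, since urlencode does the escaping; passing 'ANU+Photography+Club' would escape the plus signs themselves.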

We send this HTTP request to Flickr, and get the response back as a string.

# read json response
jstr = urllib2.urlopen(req_url).read()
# examine what we've got 
jstr

ToDo for you: copy-paste the request URL above to your browser address bar, and compare the content of jstr above with what the browser displays.

""" We find that flickr response has an extra function wrapper 'jsonFlickrApi(...)'
    this is for parsing the return with a javascript function (often in a browser), we don't need it
    Let's first get rid of this extra wrapper before parsing the remaining string as json
"""
jstr = jstr.replace('jsonFlickrApi', '').strip('()')
# parse json info 
jinfo = json.loads( jstr )
# look at the json we've got back
print json.dumps(jinfo, indent=2)

# look at the first group and its data fields
target_group = jinfo['groups']['group'][0]
target_group

Check that the target_group you get above is indeed the ANU Photography Club group.
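The replace-and-strip step can be tried offline. The sketch below is one possible, slightly stricter way to remove the wrapper: it only cuts the text when it really starts with the callback name and ends with a closing parenthesis. The function name strip_jsonp and the sample response are made up for illustration; the sample is not real Flickr output.

```python
from __future__ import print_function
import json

def strip_jsonp(text, callback='jsonFlickrApi'):
    """Return the JSON payload inside callback(...), or text unchanged."""
    text = text.strip()
    if text.startswith(callback + '(') and text.endswith(')'):
        # Drop 'callback(' at the front and the final ')' at the end.
        return text[len(callback) + 1:-1]
    return text

# Made-up response for illustration only.
sample = ('jsonFlickrApi({"stat": "ok", "groups": {"group": '
          '[{"nsid": "0@N00", "name": "demo (test)"}]}})')

info = json.loads(strip_jsonp(sample))
print(info['groups']['group'][0]['name'])
```

Flickr's JSON response format documentation also describes a nojsoncallback=1 request parameter that asks for bare JSON without the wrapper; if you add it to the request, no stripping should be needed.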

Now use the group ID, target_group['nsid'], to get the list of members of this group. Getting this information requires the authenticated API and is beyond the scope of this tutorial. The information has been gathered and parsed for you using this script and is stored in usr_ANUphoto.pkl. (Optional) Take a look at the script to see how usr_ANUphoto.pkl was generated.

Download this pkl file to your local directory, and then read it with the following code.

import pickle
usr_dict = pickle.load(open('usr_ANUphoto.pkl', 'rb'))

Examine usr_dict with the following print command. You will see that it maps each Flickr user ID (an internal string) to a user-selected screen name.

print "\n".join( map(lambda t: "%s\t: %s" % (t[0], t[1]), usr_dict.iteritems()) )

""" The output of printing reads like
62719492@N08    : pingu2011
50959051@N02    : delayedflight
86265891@N04    : Anuditya RATHOUD
23733887@N05    : daniele martinie
11652060@N02    : evandently
57829351@N04    : waynetsai
... ... 
"""
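If you want to see the pickle round trip without the downloaded file, here is a small sketch. It writes a toy dictionary (two entries copied from the sample output above) to a throwaway temporary file and reads it back; the names toy_usr_dict and toy_usr.pkl are assumptions for illustration. The "with" blocks close the files automatically.

```python
from __future__ import print_function
import os
import pickle
import tempfile

# Two entries copied from the sample output; not the full group.
toy_usr_dict = {'62719492@N08': 'pingu2011', '50959051@N02': 'delayedflight'}

# Throwaway location so the real usr_ANUphoto.pkl is never touched.
path = os.path.join(tempfile.mkdtemp(), 'toy_usr.pkl')

with open(path, 'wb') as f:
    pickle.dump(toy_usr_dict, f)

with open(path, 'rb') as f:
    loaded = pickle.load(f)

for user_id, name in sorted(loaded.items()):
    print("%s\t: %s" % (user_id, name))
```

Keep in mind that unpickling can execute arbitrary code, so only load pkl files from a source you trust.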

Next, get the list of Flickr contacts of each user. Here we make use of Flickr API again.

Tip: To execute a block of code in IPython, save it as a script, say “get_contacts.py”, and then type “%run get_contacts.py”, or try the magic command “%paste” once the code is on the clipboard.

contact_dict = dict.fromkeys(usr_dict.keys())
unique_contacts = []
for u, v in usr_dict.iteritems():
    req = {'api_key':api_key, 'method':'flickr.contacts.getPublicList', 'user_id':u}
    req_url = flickr_json_api % "&".join( map(lambda t:'='.join(t), req.iteritems()) )
    
    jstr = urllib2.urlopen(req_url).read()
    jinfo = json.loads( jstr.replace('jsonFlickrApi', '').strip('()') )

    if int(jinfo['contacts']['total']) > 0:  # int() in case 'total' arrives as a string
        contact_dict[u] = dict( map(lambda s: (s['nsid'], s['username']), jinfo['contacts']['contact']) )
        unique_contacts += contact_dict[u].keys()
        print "usr: %s, name: %s, %d contacts" % (u, v, len(contact_dict[u]) ) 
        
    else:
        contact_dict[u] = {}
        print "usr: %s, name: %s, 0 contacts" % (u, v ) 

print len(unique_contacts), len(set(unique_contacts))

This block of code should print out user information like this:

usr: 62719492@N08, name: pingu2011, 0 contacts
usr: 50959051@N02, name: delayedflight, 64 contacts
usr: 86265891@N04, name: Anuditya RATHOUD, 4 contacts
usr: 23733887@N05, name: daniele martinie, 1000 contacts
usr: 11652060@N02, name: evandently, 33 contacts
usr: 57829351@N04, name: waynetsai, 7 contacts
usr: 73197277@N03, name: GBYZH, 0 contacts
usr: 94058237@N07, name: mingbook, 0 contacts
usr: 93369377@N06, name: little_penguin, 3 contacts
... ...
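The last line of the script prints two numbers: the total number of contact entries and the number of distinct contacts. The difference is easiest to see on a toy example. The sketch below uses made-up user IDs and names (u1, x9 and so on are assumptions, not real Flickr data).

```python
from __future__ import print_function

# Hypothetical contact lists: user ID -> {contact ID: contact name}
toy_contact_dict = {
    'u1': {'u2': 'bob', 'x9': 'zed'},
    'u2': {'u1': 'alice', 'x9': 'zed', 'x7': 'yan'},
    'u3': {},
}

all_contacts = []
for user, contacts in toy_contact_dict.items():
    all_contacts.extend(contacts.keys())

total = len(all_contacts)        # counts x9 twice, once per user who lists it
unique = len(set(all_contacts))  # counts every distinct ID once
print(total, unique)
```

Here x9 appears in two contact lists, so the total is one larger than the number of unique contacts.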

Question 1: Copy and paste the output from the code above. How many contacts does each group member have? How many unique people are these group members collectively connected to?

Constructing a Graph

We construct an undirected graph containing the members of this Flickr group and their contact relationships.

import networkx as nx
G = nx.Graph()

for k in usr_dict:
    for x in contact_dict[k].keys():
        if x in usr_dict:  # we only consider a network among group members
            # nodes are screen names; add_edge creates any endpoint not yet in G
            G.add_edge(usr_dict[k], usr_dict[x])

print "there're %d within-group edges involving %d members" % ( G.number_of_edges(), len(G) ) 
print "edge list"
G.edges()
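To check the edge count by hand, the same filtering can be sketched without networkx. The toy data below is made up (the IDs and names are assumptions). Each edge is stored as a frozenset, so a contact listed in both directions still counts as one undirected edge, and a contact outside the group is dropped.

```python
from __future__ import print_function

# Hypothetical group members and their contact lists.
toy_usr_dict = {'u1': 'alice', 'u2': 'bob', 'u3': 'carol'}
toy_contact_dict = {
    'u1': {'u2': 'bob', 'x9': 'zed'},      # x9 is not a group member
    'u2': {'u1': 'alice', 'u3': 'carol'},
    'u3': {},
}

edges = set()
for k, contacts in toy_contact_dict.items():
    for x in contacts:
        if x in toy_usr_dict and x != k:   # keep within-group pairs only
            edges.add(frozenset((toy_usr_dict[k], toy_usr_dict[x])))

members = set()
for e in edges:
    members.update(e)

print(sorted(sorted(e) for e in edges))
print(len(edges), len(members))
```

One design consequence worth noticing: Flickr contact lists are one-directional, so in an undirected graph an edge appears as soon as either member lists the other.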

Question 2: How many members of the ANU Photography Club are connected to each other? How many edges exist among them? Please answer this question by either (1) copying and pasting the output from the code snippet above into the submission sheet or (2) using Gephi to visualize the graph from this graphml file.

Computing Graph Metrics

Clustering coefficient measures the local cohesion of a graph. For an unweighted graph, the clustering coefficient of each node u is the fraction of possible triangles that exist among u’s neighbors. The transitivity for graph G is the fraction of all possible triangles present in G.
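These two definitions are commonly written as follows. The symbols T(u) (the number of triangles through node u) and deg(u) (the degree of u) are notation introduced here for illustration, and a common convention is to set the coefficient to 0 for nodes with fewer than two neighbors.

```latex
% clustering coefficient of node u, for deg(u) >= 2
C_u = \frac{2\,T(u)}{\deg(u)\,\bigl(\deg(u)-1\bigr)}

% worked example: u has 3 neighbours and exactly 1 edge among them,
% so 1 of the 3 possible triangles through u exists
C_u = \frac{2 \cdot 1}{3 \cdot 2} = \frac{1}{3}

% transitivity of the whole graph; a connected triple is a pair of
% edges that share a node
\mathrm{transitivity}(G) = \frac{3 \times \#\text{triangles}}{\#\text{connected triples}}
```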

You are now asked to implement part of the computation of these metrics.

import networkx as nx

def count_triangles(G):
    # initialize
    num_triangles = {}
    for x in G.nodes():
        num_triangles[x] = 0
        """
            Write some code to count the total number of triangles
            that node x is part of
        """

    return num_triangles

def clustering(G):
    num_triangles = count_triangles(G)
    clustering_coef = dict.fromkeys(num_triangles.keys())
    for x in G.nodes():
        dx = G.degree(x)
        """
            Write a few lines of code to compute the clustering coefficient for node x
        """

    return clustering_coef

Question 3: (a) Copy and paste your code above into the submission box. (b) What is the clustering coefficient of each node? (c) What is the total number of triangles in graph G? Is it sum(num_triangles.values())? Why or why not?

(Optional) Compare your answer for (c) with what you see visually from nx.draw(G), and compare your answer for (b) with results from the corresponding networkx function.
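For the optional comparison, networkx provides nx.triangles, nx.clustering and nx.transitivity. The sketch below calls them on a small made-up graph (a triangle a-b-c with one extra node d attached to c), assuming networkx is installed; swap in your own G to check your answers.

```python
from __future__ import print_function
import networkx as nx

# Toy graph for illustration: one triangle plus a pendant node.
T = nx.Graph()
T.add_edges_from([('a', 'b'), ('b', 'c'), ('a', 'c'), ('c', 'd')])

print(nx.triangles(T))      # triangles each node takes part in
print(nx.clustering(T))     # clustering coefficient per node
print(nx.transitivity(T))   # one number for the whole graph
```

If count_triangles and clustering are implemented as described, their dictionaries should match the first two results when run on the same graph.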

CSS2013 01 May 2013 Canberra, Australia