Step 2. Install IPython. You don’t have to do this; you could use IDLE, PyCharm, or PyDev instead, but for my money IPython is hands down the best way to write Python code. It’s also pretty easy to install (assuming you don’t go on mass file-deleting sprees like I do...).

Now we can start writing code. Import pandas, and create a dataframe using the ‘read_csv’ function.

    import pandas as pd

    cities = pd.read_csv("/Users/alexwoods/Desktop/PeurtoRico.csv")  # data - cities of Puerto Rico
    cities.head()  # shows all columns, and the first 5 rows

We can use the head() function, as shown above, to get a feel for the dataset. We can also use the count() function, which reports how many non-null values each column holds.

    cities.count()
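To see what these two calls return without needing the CSV file, here is a small sketch on a made-up DataFrame. The `toy` frame and its values are invented for illustration and are not taken from the Puerto Rico dataset. Because count() only counts non-null values, the column with a missing entry reports a smaller number.

```python
import pandas as pd

# hypothetical stand-in data, NOT the real dataset
toy = pd.DataFrame({
    "city": ["A", "B", "C"],
    "zip_code": [601.0, None, 603.0],  # one missing value on purpose
})

first_rows = toy.head(2)  # first 2 rows, all columns
counts = toy.count()      # non-null values per column

print(first_rows)
print(counts)
```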

Below you can see how we index the column and row. This is really useful.

    cities['city']          # to access the whole 'city' column
    cities['city'].iloc[0]  # to access just the first row of the city column - Adjuntas
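Here is a minimal sketch of the same column-then-row indexing on an invented three-row frame (the `toy` frame and its index labels are hypothetical). It uses `.iloc` for position-based access and `.loc` for label-based access; the older `.ix` indexer was removed in pandas 1.0.

```python
import pandas as pd

# sample values for illustration only
toy = pd.DataFrame(
    {"city": ["Adjuntas", "Aguada", "Aguadilla"]},
    index=[10, 20, 30],  # non-default labels, to show the difference
)

column = toy["city"]          # whole column, a pandas Series
by_position = column.iloc[0]  # first row by position
by_label = column.loc[10]     # row whose index label is 10

print(by_position, by_label)
```

With the default 0, 1, 2, ... index that read_csv produces, position and label coincide, so the distinction only matters once the frame has been filtered, sorted, or re-indexed.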

Below is something I like to do if I’m planning on running more computer-science-style algorithms on the data (perhaps a greedy algorithm, or something else that the dataset lends itself to). The idea is to make an object for each row (only if appropriate!).

So I create a standard Python class and pass the row number to the constructor, because that’s how I’m going to build the array of these objects. I’m using ‘getters’ instead of accessing the data members directly, just because that’s a good practice.

    class City():
        def __init__(self, rowNum):
            self.name = cities['city'].iloc[rowNum]  # the most important attribute!
            self.zipCode = cities['zip_code'].iloc[rowNum]
            self.latitude = cities['latitude'].iloc[rowNum]
            self.longitude = cities['longitude'].iloc[rowNum]
            self.county = cities['county'].iloc[rowNum]

        def getName(self):
            return self.name

        def getZipCode(self):
            return self.zipCode

        def getLat(self):
            return self.latitude

        def getLong(self):
            return self.longitude

        def getCounty(self):
            return self.county

        # we should always have a string representation of the object
        def show(self):
            string = ("City = " + self.getName() + "\n"
                      + "Latitude = " + str(self.getLat()) + "\n"
                      + "Longitude = " + str(self.getLong()) + "\n")
            print(string)

The **show()** function prints a nice string representation of the city object. Below I’m going to create an array of City objects and fill it with the whole dataset, so it will be easier to run an algorithm through it.

    # now I'm going to make an array of cities. The point of all this is to make running
    # algorithms on the dataset easier on myself.
    places = []  # already used the name 'cities'...

    # count() returns the number of rows in the 'city' column
    for i in range(cities['city'].count()):
        temp = City(i)
        places.append(temp)

    for j in range(5):  # the data in its new array-of-objects format!
        places[j].show()
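If a full class with getters feels heavy for a given job, one alternative (my suggestion, not part of the original code) is to let pandas hand over each row directly. `DataFrame.itertuples()` yields one namedtuple per row, with the column names as attributes. The `toy` frame and the `CityRow` name below are hypothetical.

```python
import pandas as pd

# hypothetical stand-in for the real CSV
toy = pd.DataFrame({
    "city": ["A", "B"],
    "latitude": [18.1, 18.4],
    "longitude": [-66.7, -67.1],
})

# one lightweight record per row; index=False leaves out the row label
records = list(toy.itertuples(index=False, name="CityRow"))

for rec in records:
    print(rec.city, rec.latitude, rec.longitude)
```

The trade-off is that namedtuples are read-only and carry no methods, so the class approach is still the better fit once you want behaviour such as show() attached to each city.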

Now we’re going to get a little more complicated. I want to calculate the distance between two cities (nodes) and then write a function that finds the closest city for any given city.

note – You can ignore the Haversine formula below. It’s the mathematical way to calculate the distance between two points given their longitude and latitude. It’s one of those things you google when you need it, then never use or remember again. It is, however, critical to our distance function.
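For readers who do want to see it, this is the formula the haversine code evaluates, written out as a sketch. The symbols are my notation: φ is latitude and λ is longitude (both in radians), r is the Earth's radius (6371 km in the code), and d is the resulting distance.

```latex
\begin{aligned}
a &= \sin^2\!\left(\frac{\varphi_2 - \varphi_1}{2}\right)
   + \cos\varphi_1 \,\cos\varphi_2 \,\sin^2\!\left(\frac{\lambda_2 - \lambda_1}{2}\right) \\
d &= 2\,r\,\arcsin\!\left(\sqrt{a}\right)
\end{aligned}
```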

Notice that when I write the function to find the closest city, I’m extremely careful to make sure that it **doesn’t ever compare to itself**. This is because if it did, it would pick itself every time (its distance to itself is zero), making the algorithm useless. This is important to take into account if you are writing any route planning algorithms, like the one I want you to try in the challenge part.

    # now some other methods that might be useful for analysis on these cities!

    # the haversine formula is a way to calculate the distance between two
    # longitude/latitude points.
    # this code is via - http://bit.ly/1bKauqS
    # don't look too far into it unless you love geography...
    from math import radians, cos, sin, asin, sqrt

    def haversine(lon1, lat1, lon2, lat2):
        """
        Calculate the great circle distance between two points
        on the earth (specified in decimal degrees)
        """
        # convert decimal degrees to radians
        lon1, lat1, lon2, lat2 = map(radians, [lon1, lat1, lon2, lat2])

        # haversine formula
        dlon = lon2 - lon1
        dlat = lat2 - lat1
        a = sin(dlat/2)**2 + cos(lat1) * cos(lat2) * sin(dlon/2)**2
        c = 2 * asin(sqrt(a))
        r = 6371  # Radius of earth in kilometers. Use 3956 for miles
        return c * r

    # a distance function to make my life easier
    def distance(a, b):  # 'a', and 'b' will just be City objects!
        return haversine(a.getLong(), a.getLat(), b.getLong(), b.getLat())

    # what if I want to know the closest city?
    import random

    def findClosestCity(a):
        # pick a random starting "champion"; use len(places) rather than a
        # hard-coded row count so this works for any dataset size
        start = places[random.randrange(len(places))]
        # if I don't make sure it can't be itself, it will pick itself every time.
        while start == a:
            start = places[random.randrange(len(places))]

        champDistance = distance(a, start)  # the distance we will "challenge"
        closest = start
        for i in places:
            testDistance = distance(a, i)
            if testDistance < champDistance and not a == i:
                closest = i
                champDistance = testDistance  # now it will be the thing to challenge.
        return closest
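A quick way to convince yourself that the distance code behaves: a quarter of the way around the equator should come out as a quarter of the Earth's circumference, and a point's distance to itself should be zero. The sketch below repeats the function in compact form so it runs on its own; the test coordinates are my own choices, not values from the dataset.

```python
from math import radians, cos, sin, asin, sqrt, pi

def haversine(lon1, lat1, lon2, lat2):
    """Great-circle distance in km between two points given in decimal degrees."""
    lon1, lat1, lon2, lat2 = map(radians, [lon1, lat1, lon2, lat2])
    dlon = lon2 - lon1
    dlat = lat2 - lat1
    a = sin(dlat / 2) ** 2 + cos(lat1) * cos(lat2) * sin(dlon / 2) ** 2
    return 2 * asin(sqrt(a)) * 6371  # 6371 km, the radius used in the post

quarter = haversine(0, 0, 90, 0)            # 90 degrees along the equator
same = haversine(-66.7, 18.2, -66.7, 18.2)  # a point compared with itself
expected = pi / 2 * 6371                    # a quarter of the circumference

print(round(quarter, 1), round(expected, 1), same)
```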

    # now let's test some of the functionality of what we just coded!!!
    # let's find a location to start at. 35 is a randomish number..
    places[35].show()

    # ok, Mayaguez it is!
    Mayaguez = places[35]
    closeToMaya = findClosestCity(Mayaguez)
    closeToMaya.show()
    # here's google directions to check - http://bit.ly/1BTTyks
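The same search can be expressed more compactly by filtering out the city itself first and letting `min()` pick the winner, which also removes the need for a random starting "champion". This is a sketch of that idea on invented flat-plane points, not the code used in this post; `points`, `flat_distance`, and `find_closest` are hypothetical names.

```python
from math import hypot

# hypothetical points on a flat plane, just to show the search pattern
points = {"A": (0, 0), "B": (1, 0), "C": (5, 5)}

def flat_distance(p, q):
    """Straight-line distance; a stand-in for the haversine-based distance()."""
    return hypot(p[0] - q[0], p[1] - q[1])

def find_closest(name):
    """Return the name of the nearest point, never the point itself."""
    others = (n for n in points if n != name)  # self-exclusion happens here
    return min(others, key=lambda n: flat_distance(points[name], points[n]))

print(find_closest("A"))
print(find_closest("C"))
```

Swapping `flat_distance` for the haversine-based distance and `points` for the list of City objects would give the same behaviour as findClosestCity, under the assumption that the list holds at least two cities.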

Challenge – write a greedy algorithm to try and find the minimum travel time to ten locations.