Overview
Teaching: 25 min
Exercises: 20 min
Questions
How can I plot my data?
Objectives
Create a time series plot showing a single data set.
Create a scatter plot showing the relationship between two data sets.
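matplotlib is the most widely used scientific plotting library in Python.
It is commonly used through its sub-library matplotlib.pyplot.
In a Jupyter notebook, the %matplotlib inline magic command renders plots inline.
%matplotlib inline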
import matplotlib.pyplot as plt
x = [1, 2, 3, 4, 5]
y = [2, 4, 6, 8, 10]
plt.plot(x, y)
plt.xlabel('Numbers')
plt.ylabel('Doubles')
Plot data directly from a Pandas dataframe.
This implicitly uses matplotlib.pyplot.
import pandas
data = pandas.read_csv('data/gapminder_gdp_oceania.csv', index_col='country')
data.loc['Australia'].plot()
plt.xticks(rotation=90)
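Select and transform data, then plot it.
By default, DataFrame.plot plots with the rows as the X axis. Transposing the data with .T puts the years on the X axis and draws one line per country.
data.T.plot()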
plt.ylabel('GDP per capita')
plt.xticks(rotation=90)
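The effect of transposing can be checked on a small table. The sketch below is illustrative only: the `toy` dataframe and its values are made up, and the `Agg` backend is selected so that it also runs outside a notebook.

```python
import matplotlib
matplotlib.use('Agg')  # non-interactive backend (assumption: not running in a notebook)
import matplotlib.pyplot as plt
import pandas

# Hypothetical stand-in for the Oceania table: one row per country,
# one column per year. The numbers are invented for illustration.
toy = pandas.DataFrame(
    {'gdpPercap_1952': [10.0, 11.0],
     'gdpPercap_1977': [18.0, 16.0],
     'gdpPercap_2007': [30.0, 25.0]},
    index=['Australia', 'New Zealand'],
)

# Default: rows (countries) on the X axis, one line per column (year).
ax_default = toy.plot()
default_labels = [line.get_label() for line in ax_default.get_lines()]

# Transposed: columns (years) on the X axis, one line per row (country).
ax_transposed = toy.T.plot()
transposed_labels = [line.get_label() for line in ax_transposed.get_lines()]

print(default_labels)
print(transposed_labels)
```

The untransposed plot draws three lines (one per year column), while the transposed plot draws two (one per country), which is usually what a time series plot needs.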
Many styles of plot are available. For example, a bar plot using the 'ggplot' style:
plt.style.use('ggplot')
data.T.plot(kind='bar')
plt.xticks(rotation=90)
plt.ylabel('GDP per capita')
# Accumulator pattern to collect years (as character strings).
years = []
for col in data.columns:
    year = col[-4:]
    years.append(year)
# Australia data as list.
gdp_australia = data.loc['Australia'].tolist()
# Plot: 'b-' sets the line style.
plt.plot(years, gdp_australia, 'b-')
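The slice `col[-4:]` keeps the last four characters of each column name. A minimal sketch, assuming column names of the form `gdpPercap_1952` (the `columns` list here is a hypothetical stand-in for `data.columns`), shows the string version alongside an integer version:

```python
# Hypothetical column names in the same format as the Gapminder files.
columns = ['gdpPercap_1952', 'gdpPercap_1957', 'gdpPercap_1962']

# Accumulator pattern, as in the lesson: years kept as character strings.
years_as_text = []
for col in columns:
    years_as_text.append(col[-4:])

# Variant: convert each year to an integer so the axis is numeric.
years_as_int = []
for col in columns:
    years_as_int.append(int(col[-4:]))

print(years_as_text)
print(years_as_int)
```

When the X values are strings, matplotlib treats them as categories and spaces them evenly. With integers, the spacing reflects the actual gaps between years, which matters if the years are not evenly spaced.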
# Accumulator pattern to collect years (as character strings).
years = []
for col in data.columns:
    year = col[-4:]
    years.append(year)
# Select two countries' worth of data.
gdp_australia = data.loc['Australia']
gdp_nz = data.loc['New Zealand']
# Plot with differently-colored lines.
plt.plot(years, gdp_australia, 'b-', label='Australia')
plt.plot(years, gdp_nz, 'g-', label='New Zealand')
# Create legend.
plt.legend(loc='upper left')
plt.xlabel('Year')
plt.ylabel('GDP per capita ($)')
To create a scatter plot showing the relationship between two data sets, use either plt.scatter or DataFrame.plot.scatter.
plt.scatter(gdp_australia, gdp_nz)
data.T.plot.scatter(x='Australia', y='New Zealand')
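Both routes draw the same points. The sketch below uses a made-up `toy` dataframe that is already in the transposed layout (years as rows, countries as columns), and selects the `Agg` backend so it runs outside a notebook.

```python
import matplotlib
matplotlib.use('Agg')  # non-interactive backend (assumption: not running in a notebook)
import matplotlib.pyplot as plt
import pandas

# Hypothetical values, laid out like data.T: one row per year, one column per country.
toy = pandas.DataFrame(
    {'Australia': [10.0, 20.0, 30.0],
     'New Zealand': [11.0, 18.0, 25.0]},
    index=['1952', '1977', '2007'],
)

# Route 1: call matplotlib directly with two sequences of values.
fig, ax = plt.subplots()
points = ax.scatter(toy['Australia'], toy['New Zealand'])
ax.set_xlabel('Australia')
ax.set_ylabel('New Zealand')
xy = points.get_offsets()  # the (x, y) pairs that were drawn

# Route 2: let pandas pick the columns by name; it labels the axes itself.
ax2 = toy.plot.scatter(x='Australia', y='New Zealand')
```

The matplotlib route needs explicit axis labels, while the pandas route takes them from the column names.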
Minima and Maxima
Fill in the blanks below to plot the minimum GDP per capita over time for all the countries in Europe. Modify it again to plot the maximum GDP per capita over time for Europe.
data_europe = pandas.read_csv('data/gapminder_gdp_europe.csv', index_col='country')
data_europe.____.plot(label='min')
data_europe.____
plt.legend(loc='best')
Correlations
Modify the example in the notes to create a scatter plot showing the relationship between the minimum and maximum GDP per capita among the countries in Asia for each year in the data set. What relationship do you see (if any)?
data_asia = pandas.read_csv('gapminder_gdp_asia.csv')
data_asia.describe().T.plot(kind='scatter', x='min', y='max')
You might note that the variability in the maximum is much higher than that of the minimum. Take a look at the maximum for each year, and at which country holds the maximum and the minimum in each year:
data_asia = pandas.read_csv('gapminder_gdp_asia.csv', index_col='country')
data_asia.max().plot()
print(data_asia.idxmax())
print(data_asia.idxmin())
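The difference between `max` and `idxmax` is easiest to see on a small table. This is a sketch with invented values and placeholder country names `A`, `B`, `C`; it assumes the countries form the index, as they do when the file is read with `index_col='country'`.

```python
import pandas

# Hypothetical table: countries as the index, one column per year.
toy = pandas.DataFrame(
    {'gdpPercap_1952': [5.0, 9.0, 2.0],
     'gdpPercap_2007': [40.0, 12.0, 3.0]},
    index=['A', 'B', 'C'],
)

col_max = toy.max()        # largest value in each column
which_max = toy.idxmax()   # index label (country) holding that largest value
which_min = toy.idxmin()   # index label (country) holding the smallest value

print(col_max)
print(which_max)
print(which_min)
```

In this toy table the country holding the maximum changes between the two years while the minimum stays with the same country, which is the kind of pattern the exercise asks you to look for.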
More Correlations
This short program creates a plot showing the correlation between GDP and life expectancy for 2007, normalizing marker size by population:
data_all = pandas.read_csv('gapminder_all.csv')
data_all.plot(kind='scatter', x='gdpPercap_2007', y='lifeExp_2007',
              s=data_all['pop_2007']/1e6)
Using online help and other resources, explain what each argument to plot does.
Key Points
matplotlib is the most widely used scientific plotting library in Python.
Plot data directly from a Pandas dataframe.
Select and transform data, then plot it.
Many styles of plot are available.
Can plot many sets of data together.