Pandas Recap#

See, that was a ton of syntax! Don’t worry though, you’ll get plenty of practice with pandas over time, this is just your first day! For example, once I started getting better with pandas , I stopped using spreadsheets to calculate grades for courses because I got so much better at writing short little programs to compute the values I wanted!

On this page, we have a “cheat-sheet” version of everything on the last slide. You’ll likely find it to be a good reference! However, we still recommend trying to turn this into something of your own making to help solidify the concepts and help you build up a stronger mental model!

import pandas as pd

# Read a file
df = pd.read_csv('some_file.csv')

# Access a column
df['col']

# Summary statistic of column
df['col'].mean()

# Lots of summary functions to use
#   mean: Calculates the average value of the Series
#   min: Calculates the minimum value of the Series
#   max: Calculates the maximum value of the Series
#   idxmin: Calculates the index of the minimum value of the Series
#   idxmax: Calculates the index of the maximum value of the Series
#   count: Calculates the number values in the Series
#   unique: Returns a new Series with all the unique values from the Series

# Element-wise operations
df['col1'] + df['col2']

# Also works with single values
df['col'] // 2
df['col'] > 2

# Filter a DataFrame (& for and, | for or, ~ for not)
mask1 = df['col'] > 2
df[mask1]

mask2 = df['col2'] == 2
df[mask1 & mask2]
df[mask2 | ~mask1]

# Location: df.loc[row_indexer, column_indexer] (column_indexer is optional, default all)
# Indexers can be many types (can mix and match for row/col!):
#   * List of values or a slice
#   * Mask
#   * : (for everything)
#   * Single value

# Single value
df.loc[0, 'col'] # Returns value

# List or slice of values
df.loc[[0, 2, 1], ['col1', 'col2']]  # Returns DataFrame
df.loc[0:4, 'col1':'col5']  # Returns DataFrame

# Everything
df.loc[:, :]  # DataFrame

# Other examples
df.loc[0]  # Series, default for column indexer is :
df.loc[0:5, 'col']  # Series
df.loc[1, 'col':'col' ] # Series
df.loc[0:5, ['col1', 'col2']] # DataFrame