Welcome and Introduction

Welcome! 🎉

This website contains the materials for the University of Washington’s CSE 163: Intermediate Data Programming. It is the interactive textbook for the class, including all the lesson readings, videos, and practice problems. While this book is designed for a particular class, we hope that by providing the materials in a publicly accessible format, you will find it useful in your data programming journey.

What is Intermediate Data Programming?

The world has become data-driven. Domain scientists and industry increasingly rely on data analysis to drive innovation and discovery. This reliance on data is not restricted to science or business; it is also crucial to those in government and public policy, and to anyone who wants to be an informed citizen. As the size of data continues to grow, everyone will need powerful tools to work with that data.

This course teaches intermediate data programming. It assumes the level of programming experience that you would gain in a first programming course. However, it does not assume that you already know Python. If you know how to program in another language, this course will be a great introduction to Python and data programming.

We broadly define the term data programming as “the programming required to be an effective data scientist”. We say broadly here because we don’t mean to give a precise, academic definition of the concept or of the terms used in our definition. Generally, we think of data science classes as covering statistical and algorithmic concepts applied to a domain or context. Our course focuses on “data programming”, which means we focus more on the algorithmic side of things and on the programming required to even get your data into whatever data science tool you might use. We introduce you to some common methods data scientists use to analyze data (data visualization, machine learning, etc.), but the focus is on the programming that supports using those methods.

Course Level Learning Goals

There are five high-level learning goals in this course. Below, we outline the topics covered under each of these learning goals.

1) Programming

Modules: 1, 2, 5

More advanced programming concepts than those in a first programming course, including how to write bigger programs with multiple classes and modules.

Topics:

  • Testing code

  • Using data structures: Lists, sets, dictionaries, tuples

  • Classes and Objects

  • Modules and Packages in Python

  • Anonymous functions (lambdas) and some ideas of functional programming

2) Data

Modules: 2, 3, 5, 7, 8

How to work with different types of data: tabular, text, images, geo-spatial.

Topics:

  • Tabular (CSV) data

  • Unstructured text data

  • Geo-spatial data

  • Images

  • If schedule allows: Time series and web data

3) Data Science Tools

Modules: 3, 4, 7, 8

The ecosystem of data science tools, including Jupyter Notebook and data science libraries such as scikit-image, scikit-learn, and pandas (for data frames).

Topics:

  • Jupyter Notebooks

  • Visual Studio Code + Anaconda

  • Libraries

    • pandas

    • seaborn and matplotlib

    • sklearn

    • geopandas

    • numpy

    • requests

    • skimage

4) Computer Science

Modules: 6, 7, 9

Basic concepts related to code complexity, efficiency of different types of data structures, and memory management.

Topics:

  • Algorithmic efficiency

  • Code profiling

  • Computer memory

  • Hashing

  • Joins (databases)

  • Spatial Indices

  • Why Python is slow (and why that’s okay)

5) Data and Society

Modules: 3, 9, 10

Thinking more broadly about the impact of data science on society and the role we, as programmers, have in making software that processes data in a way that is just and fair.

Topics:

  • Effective data visualizations

  • Impacts of machine learning

  • Ethics and Data Science

  • Algorithmic fairness

  • Privacy

How this book works

This book is broken up into Modules (1-10), and each Module contains 2 or 3 Lessons. At UW, each Module maps to about one week of class, with students completing one Lesson each day. Each Lesson contains a series of readings and some practice problems in the form of quiz questions or programming problems. Each Lesson also has a reflective component that asks you to write down what you learned, to summarize your learning. There are also videos that go along with each Lesson, in which Hunter talks through the reading slides. The videos don’t contain any information beyond what is in the reading slides, so there is no need to watch them. Note that these videos are specific to the UW course (the tools we use and discussion of homework assignments), but you may still find them useful.

We recommend doing the practice problems for each lesson before moving on to the next. Learning is an active process! Doing the problems makes you actively participate in your learning, and can give you feedback on whether you understand the concepts well enough to apply them to new situations. The coding problems come with unit tests that will let you know whether or not your solution exhibits the correct behavior.
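If you haven’t seen unit tests before, the sketch below shows the general idea in Python. It is only an illustration: the function `count_vowels` and the test `test_count_vowels` are hypothetical names made up for this example, and the tests that ship with the practice problems may be structured differently.

```python
def count_vowels(text):
    """Return the number of vowels (a, e, i, o, u) in text, ignoring case."""
    return sum(1 for ch in text.lower() if ch in "aeiou")


def test_count_vowels():
    # Each assert compares the function's result with the value we expect.
    # If any comparison is False, Python raises an AssertionError.
    assert count_vowels("data") == 2
    assert count_vowels("PYTHON") == 1
    assert count_vowels("") == 0


# Run the test; the message prints only if every assert succeeded.
test_count_vowels()
print("All tests passed")
```

A test like this checks behavior on a few chosen inputs, including an edge case such as the empty string. Passing it is evidence that a solution works, not proof that it is correct for every input.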

You may also find the course website useful for finding extra resources.

Note: This book was originally written in EdStem, a system we use internally. While EdStem has many fantastic features for student practice, it cannot be accessed publicly, which is why we made this website! While public access is really important, it does mean the current version of the book has some limitations. Namely, the quiz questions aren’t interactive and can’t reveal solutions. We hope to iterate on this experience and make it more interactive, but more complex behaviors take time. Since this website is so new, there might be leftover mentions of EdStem or other UW-specific things. If you find anything in the book that doesn’t seem to make sense, please file an issue on GitHub letting us know where the problem is, and we will fix it as soon as we can!

Once you are ready to read, you can navigate the book using the sidebar on the left!