Lesson 30: Wrap Up and Next Steps!

You did it! You made it all the way through the course! 🎉

We hope you learned some valuable skills and some interesting ideas in the time you spent with us! We want to wrap up the course in this last lesson with two main objectives:

Take a look back at the course as a whole. With the additional perspective of hindsight, we might have a better appreciation for how the course concepts relate to each other.
Take a look at some potential next steps in your journey!

Victory Lap

In the introduction to the book, we told you there were 5 main learning goals in the course. They were:

Programming - More advanced programming concepts than a first programming course including how to write bigger programs with multiple classes and modules.
Data - How to work with different types of data: tabular, text, images, geo-spatial.
Data Science Tools - Ecosystem of data science tools including Jupyter Notebook and various data science libraries including scikit-image, scikit-learn, and pandas data frames.
Computer Science - Basic concepts related to code complexity, efficiency of different types of data structures, and memory management.
Data and Society - Thinking more broadly about the impact of data science in society and the role we, as programmers, have in making software that processes data in a way that is just and fair.

When you first read these, they probably were pretty abstract and didn’t have much meaning for you. Now that we look back, you should hopefully be able to appreciate what each of these learning areas means to you as a data programmer.

1) Programming

We learned some pretty complex programming skills in this course! From writing code that reads from a file all the way to building a search engine from scratch! The experience of solving these challenging programming problems will help you solve novel challenges you face in the future.

Topics:

Testing code
Using data structures: Lists, sets, dictionaries, tuples
Classes and Objects
Modules and Packages in Python
Anonymous functions (lambdas) and some ideas of functional programming
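
As a quick refresher on a few of these topics, here is a minimal sketch that combines a dictionary, tuples, a set, and a lambda used as a sort key. The `word_counts` data is made up for illustration and does not come from any course assignment.

```python
# Hypothetical word-count data, invented for this recap.
word_counts = {"data": 5, "python": 3, "pandas": 5, "set": 1}

# Each item is a (word, count) tuple. The lambda builds a sort key so that
# higher counts come first and ties are broken alphabetically by word.
ranked = sorted(word_counts.items(), key=lambda pair: (-pair[1], pair[0]))

# A set keeps only the distinct count values.
distinct_counts = set(word_counts.values())

print(ranked)
print(distinct_counts)
```

Passing a small function as `key` instead of writing a custom sorting loop is one of the functional programming ideas we touched on.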

2) Data

Writing programs that interact with and analyze data is one of the most important programming skills you can learn. You experienced working with many types of data that you will commonly run into in the future. While we can’t possibly cover every single data type or format you will face, seeing how to process these types can provide solid reference points for tackling new data types as they arise.

Tabular (CSV) data
Unstructured text data
Geo-spatial data
Images
If schedule allows: Time series and web data

3) Data Science Tools

You learned how to use many pre-written tools to help you process and analyze data. These tools are just the tip of the iceberg of a data science ecosystem. Using these as a basis, you can now go learn any tool that might be more specialized for your work (we have a few recommendations below!).

Jupyter Notebooks
Visual Studio Code + Anaconda
Libraries
- pandas
- seaborn and matplotlib
- sklearn
- geopandas
- numpy
- requests
- skimage

4) Computer Science

As this is a computer science course, we also wanted to introduce fundamental concepts in computer science to help you understand how to write programs that can process large amounts of data efficiently. Having an understanding of the theory and techniques to efficiently process data is an incredibly valuable skill when it comes to the world of ever-growing datasets.

Algorithmic efficiency
Code profiling
Computer memory
Hashing
Joins (databases)
Spatial Indices
Why Python is slow (and why that’s okay)
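
To see how algorithmic efficiency, hashing, and joins fit together, here is a small sketch that computes the same join in two ways. The function names and the `students` and `grades` data are hypothetical and exist only for this illustration. The first version compares every pair of rows, while the second builds a dictionary (a hash table) on one side first, so each lookup takes roughly constant time on average.

```python
def nested_loop_join(left, right):
    """Join on the first tuple element by comparing every pair of rows."""
    return [(key, l_val, r_val)
            for key, l_val in left
            for r_key, r_val in right
            if key == r_key]


def hash_join(left, right):
    """Same join, but index the right side in a dict before matching."""
    index = {}
    for r_key, r_val in right:
        index.setdefault(r_key, []).append(r_val)
    return [(key, l_val, r_val)
            for key, l_val in left
            for r_val in index.get(key, [])]


# Hypothetical data: (id, name) rows and (id, grade) rows.
students = [(1, "Ada"), (2, "Grace"), (3, "Alan")]
grades = [(1, "A"), (3, "B"), (3, "A"), (4, "C")]

slow_result = nested_loop_join(students, grades)
fast_result = hash_join(students, grades)
print(fast_result)
```

Both functions return the same rows. With n rows on the left and m rows on the right, the nested loop does about n * m comparisons, while the dictionary version does about n + m steps plus the size of the output, which matters a lot as datasets grow.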

5) Data and Society

It’s often the case that programming courses are taught “in a vacuum”, where the choices made by a programmer are not critically reflected on. However, the choices of programmers can have huge ramifications in the society their programs operate in, and if we aren’t careful about our choices, we can cause great harm to people. So throughout many of our data science topics, we also discussed these ramifications and posed questions to prompt reflection on our choices.

Our discussion in this course is nowhere near comprehensive, and it likely never will be. The situations you’ll have in your future are too varied for a single course to cover everything you could possibly run into. But hopefully we provided a starting point for you to reflect on your work and realize the importance of building tools that help everyone.

Effective data visualizations
Impacts of machine learning
Ethics and Data Science
Algorithmic fairness
Privacy

Next Steps

Now that you are wrapping up the course, we can look towards what might come next. Before we list some further resources, we want to point out that no list we come up with can cover all of your options. You should follow any path you are interested in, even if it isn’t listed here! We can only list a small handful of resources, and a path being omitted from this list by no means indicates that we think it is invalid or unimportant.

Online Learning Resources

UW Courses

The University of Washington has many courses related to data science and its applications.

The courses Hunter knows best are in the Paul G. Allen School of Computer Science & Engineering, so he highlights a few courses that are good follow-ons to CSE 163. Many CSE courses have open-access lecture slides and/or projects which you can follow along with.

Other Resources

UC Berkeley’s Data 8 and Data 100 have some overlap with CSE 163, but go much further in some topics.
Coursera has many great full courses for data science applications.
DataCamp Community Tutorials has many great notebook tutorials on many topics of interest.
Real Python has many written tutorials on many topics related to Python and data science.
Towards Data Science is a popular blog on Medium that has many interesting blog posts. Note that this blog is contributed to by community members, so the quality of the articles varies.

Projects

One great way to deepen your learning is to work on personal or group projects. Like we mentioned earlier, we can’t possibly list out every idea for a project here, but we try to provide some concrete suggestions.

Learn a New Library

We learned many Python libraries for processing data in this course, but this is just the tip of the iceberg! Others have written far more libraries than we could cover, and you can use them in your own projects. We highlight some other popular libraries that we did not cover this quarter, but again, there are many more out there that you might find interesting or useful!

Data Visualization: Bokeh or Altair
Natural Language Processing: NLTK or spaCy
Machine Learning: PyTorch, TensorFlow, or Keras
Images: OpenCV

Learn a New Language

Python is a very prominent language for data science, but it’s not the only useful language to know! Here are some other programming languages commonly used by data scientists:

R for numerical processing and statistics.
Julia is a new and up-and-coming language that’s gaining traction in computational biology applications.
Scala is compatible with Java and its libraries, and has much nicer syntax than Java.
JavaScript is the language of the world-wide web. We saw a bit about how the web works with HTML, and learning JavaScript will take that much further. It is also very common to use JavaScript in data visualizations you share with people on the web!

Contribute to Open-Source Projects

Almost all of the libraries we learned this quarter (or linked above) are open-source software. This means that all of the code for the libraries is freely available online and, in most cases, is under active development by community members! So you can probably contribute to them too!

Most of these libraries have projects on GitHub, which is where you can find, read, and contribute to these projects (example: pandas). Every community has its own rules and processes for contributing, so make sure to read their guidelines, but you can find a general description of how to contribute here.

Fin

Thanks so much for engaging with our course materials, and we wish you the best of luck in the future!
