Initializing & Cleaning a Dataset Part II

This is part of the “Intro to Data Analysis with Python” series of posts, with content from the Enki app. If you stumbled upon this, you could start from the beginning.

Previously, we determined what our dataset is about and how many rows and columns it has.

We also checked which cells are empty using the .isnull() method[1].
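As a quick reminder of what that check looks like, here is a minimal sketch. The small DataFrame and its column names are made up for illustration and are not the dataset used in this series.

```python
import pandas as pd

# Hypothetical stand-in data; the column names are assumptions for this sketch.
df = pd.DataFrame({
    "title": ["A", "B", None],
    "year": [2001, None, 2010],
})

# .isnull() returns a DataFrame of the same shape:
# True marks a missing cell, False marks a filled one.
missing_mask = df.isnull()
print(missing_mask)
```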

However, reading that table cell by cell would take a lot of time. Luckily, adding .sum() at the end counts the missing cells in each column for us:
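The original output is not reproduced here, so the following is only a sketch of the idea on a small made-up DataFrame (the column names are assumptions). Since True counts as 1 and False as 0, summing the True/False table gives one missing-value count per column.

```python
import pandas as pd

# Hypothetical stand-in data; not the dataset from this series.
df = pd.DataFrame({
    "title": ["A", "B", None],
    "year": [2001, None, None],
    "rating": [7.5, 8.1, 6.3],
})

# Chain .sum() onto .isnull() to count missing cells per column.
missing_per_column = df.isnull().sum()
print(missing_per_column)
```

A column with a count of 0 has no missing cells, while a high count can be a hint that the column is a candidate for removal.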

Now that we know what is and isn't missing, we can decide what we want to do with the information we have.

First, let's remove the columns we won't use.

To remove columns from a dataset, we use the .drop(columns=["column1", "column2", ...]) method, which returns a new DataFrame without those columns.
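Here is a minimal sketch of how this might look. The DataFrame and the column names "notes" and "internal_id" are hypothetical and stand in for whichever columns you decide not to keep.

```python
import pandas as pd

# Hypothetical stand-in data; the column names are assumptions for this sketch.
df = pd.DataFrame({
    "title": ["A", "B"],
    "year": [2001, 2005],
    "notes": [None, None],
    "internal_id": [101, 102],
})

# .drop() returns a new DataFrame; df itself keeps all four columns
# unless we assign the result back to it.
cleaned = df.drop(columns=["notes", "internal_id"])
print(cleaned.head())
```

Storing the result under a new name keeps the original DataFrame around in case we need a dropped column later. Assigning back to the same variable works too if we don't.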

Here are the columns we will remove:

If we ran .head() or .tail() again, we would get a cleaner output that shows only the columns we kept.

The next step is to decide what we want to analyze and then start analyzing it. We will do that in the next workout.

Here is all the work we have done in this blog series within a Google Colab notebook.

If you want to continue reading, download our Enki App and subscribe to the Python Data Analysis topic.

Footnotes

[1:Previous Dataset]

Calling the .isnull() method on a DataFrame gives us a table of True/False values, where True marks a missing cell.
