Initializing & Cleaning a Dataset


This is part of the “Intro to Data Analysis with Python” series of posts, with content from the Enki app. If you stumbled upon this, you could start from the beginning.

Using the dataset from the previous insight[1], we will show you how to clean it up before we start the analysis.

First, after importing a dataset, we can use the head() or tail() methods to view its first or last 5 rows, respectively.

You can also pass a number to head() or tail() to override the default of 5.
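The snippet below illustrates both defaults and the optional row count. The data is made up: a tiny stand-in for importedRawData, because this section doesn't show how the real file is loaded.

```python
import pandas as pd

# Made-up 8-row DataFrame standing in for importedRawData (not the real dataset)
importedRawData = pd.DataFrame({
    "title": [f"Title {i}" for i in range(8)],
    "release_year": [2010 + i for i in range(8)],
})

top_rows = importedRawData.head()      # first 5 rows (default)
bottom_rows = importedRawData.tail()   # last 5 rows (default)
first_three = importedRawData.head(3)  # pass a number to override the default of 5
last_two = importedRawData.tail(2)     # last 2 rows

print(first_three)
print(last_two)
```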

Using importedRawData.head() we get:

(Image: the first five rows returned by importedRawData.head())

Using importedRawData.tail() we get:

(Image: the last five rows returned by importedRawData.tail())

This is a quick way to confirm right away that your dataset loaded correctly.

As you can see, there are a lot of columns in this dataset.

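To check the total number of rows and columns in your dataset, use the DataFrame's .shape attribute (no parentheses). It returns a (rows, columns) tuple.

This dataset has 6234 rows and 12 columns, so importedRawData.shape returns (6234, 12).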
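The example below shows .shape on a small, made-up DataFrame. Because the result is a plain tuple, it can be unpacked into separate row and column counts.

```python
import pandas as pd

# Made-up DataFrame with 4 rows and 3 columns (illustrative only)
df = pd.DataFrame({
    "title": ["A", "B", "C", "D"],
    "type": ["Movie", "TV Show", "Movie", "Movie"],
    "release_year": [2019, 2017, 2020, 2018],
})

# .shape is an attribute, not a method: it holds a (rows, columns) tuple
n_rows, n_cols = df.shape
print(df.shape)  # (4, 3)
```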

Row index labels start from 0 instead of 1. This is why the last row's index is 6233 instead of 6234.
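The snippet below illustrates zero-based indexing on a made-up DataFrame. A DataFrame created without an explicit index gets labels 0, 1, 2, …, so the last label is always one less than the row count. The show_id values here are invented. They are ordinary column data and are separate from the index labels that pandas prints on the left.

```python
import pandas as pd

# Made-up DataFrame with 5 rows and the default index (0, 1, 2, 3, 4)
df = pd.DataFrame({"show_id": [101, 102, 103, 104, 105]})

first_label = df.index[0]  # 0
last_label = df.index[-1]  # 4, which is len(df) - 1
print(first_label, last_label, len(df))
```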

We will remove the columns we don't need for our analysis and keep only the ones we will use in this workout.

To determine which columns we will remove, let's first check which cells have missing data.

To check which values are missing, call the .isnull() method:

(Image: the result of calling .isnull() on the raw data)

This returns a table with the same shape as the original DataFrame, filled with True/False values, where True means the cell is empty.
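The sketch below applies .isnull() to a small, made-up DataFrame. It then chains .sum() to count the missing cells in each column, which works because True counts as 1. This summary step is not part of the original post. It is one common way to see at a glance which columns have many gaps when deciding what to drop.

```python
import numpy as np
import pandas as pd

# Made-up DataFrame with a few empty cells (None and np.nan both count as missing)
df = pd.DataFrame({
    "title": ["A", "B", "C"],
    "director": ["X", None, np.nan],
    "country": ["US", "UK", None],
})

missing = df.isnull()               # same shape as df; True where a cell is empty
print(missing)

missing_per_column = missing.sum()  # True counts as 1, so this tallies empty cells per column
print(missing_per_column)
```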

Footnotes

[1] Previous Dataset
