How to Remove Rows With NaN in Pandas
Pandas Drop Rows With NaN
Created: January-16, 2021 | Updated: November-26, 2021
- Pandas Drop Rows With NaN Using the DataFrame.notna() Method
- Pandas Drop Rows Only With NaN Values for All Columns Using DataFrame.dropna() Method
- Pandas Drop Rows Only With NaN Values for a Particular Column Using DataFrame.dropna() Method
- Pandas Drop Rows With NaN Values for Any Column Using DataFrame.dropna() Method
This tutorial explains how to drop rows with NaN values from a DataFrame using the DataFrame.notna() and DataFrame.dropna() methods.
We will use the following DataFrame in the examples below.
import pandas as pd

# None entries are stored as NaN in the numeric columns
data = pd.DataFrame({
    'Name': ['Alice', 'Steven', 'Neesham', 'Chris', 'Alice'],
    'Age': [19, None, 18, 21, None],
    'Income($)': [4000, 5000, None, 3500, None],
    'Expense($)': [3000, 2000, 2500, 25000, None]
})

print(data)
Output:
      Name   Age  Income($)  Expense($)
0    Alice  19.0     4000.0      3000.0
1   Steven   NaN     5000.0      2000.0
2  Neesham  18.0        NaN      2500.0
3    Chris  21.0     3500.0     25000.0
4    Alice   NaN        NaN         NaN
Pandas Drop Rows With NaN Using the DataFrame.notna() Method
The DataFrame.notna() method returns a boolean object with the same number of rows and columns as the caller DataFrame. Each element that is not NaN is mapped to True, and each element that is NaN is mapped to False.
import pandas as pd

data = pd.DataFrame({
    'Name': ['Alice', 'Steven', 'Neesham', 'Chris', 'Alice'],
    'Age': [19, None, 18, 21, None],
    'Income($)': [4000, 5000, None, 3500, None],
    'Expense($)': [3000, 2000, 2500, 25000, None]
})

print("Initial DataFrame:")
print(data)
print("")

# Keep only the rows whose Income($) value is not NaN
data = data[data['Income($)'].notna()]

print("DataFrame after removing rows with NaN value in Income Field:")
print(data)
Output:
Initial DataFrame:
      Name   Age  Income($)  Expense($)
0    Alice  19.0     4000.0      3000.0
1   Steven   NaN     5000.0      2000.0
2  Neesham  18.0        NaN      2500.0
3    Chris  21.0     3500.0     25000.0
4    Alice   NaN        NaN         NaN

DataFrame after removing rows with NaN value in Income Field:
     Name   Age  Income($)  Expense($)
0   Alice  19.0     4000.0      3000.0
1  Steven   NaN     5000.0      2000.0
3   Chris  21.0     3500.0     25000.0
Here, we apply the notna() method to the Income($) column, which returns a Series of True and False values indicating whether each entry in the column is present. When we pass this boolean Series as an index to the original DataFrame, we get only the rows without NaN values in the Income($) column. Rows with NaN in other columns, such as Age in row 1, are kept.
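To make the intermediate step visible, here is a small sketch that stores the boolean Series in a variable before indexing with it. The names `income_mask` and `filtered` are introduced only for this illustration, and the frame is a trimmed copy of the example data.

```python
import pandas as pd

data = pd.DataFrame({
    'Name': ['Alice', 'Steven', 'Neesham', 'Chris', 'Alice'],
    'Income($)': [4000, 5000, None, 3500, None],
})

# Boolean Series: True where Income($) holds a value, False where it is NaN
income_mask = data['Income($)'].notna()
print(income_mask.tolist())  # [True, True, False, True, False]

# Indexing with the mask keeps only the rows marked True
filtered = data[income_mask]
print(filtered.index.tolist())  # [0, 1, 3]
```

Note that the surviving rows keep their original index labels.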
Pandas Drop Rows Only With NaN Values for All Columns Using DataFrame.dropna() Method
import pandas as pd

data = pd.DataFrame({
    'Id': [621, 645, 210, 345, None],
    'Age': [19, None, 18, 21, None],
    'Income($)': [4000, 5000, None, 3500, None],
    'Expense($)': [3000, 2000, 2500, 25000, None]
})

print("Initial DataFrame:")
print(data)
print("")

# Drop a row only when every value in it is NaN
data = data.dropna(how='all')

print("DataFrame after removing rows with NaN value in All Columns:")
print(data)
Output:
Initial DataFrame:
      Id   Age  Income($)  Expense($)
0  621.0  19.0     4000.0      3000.0
1  645.0   NaN     5000.0      2000.0
2  210.0  18.0        NaN      2500.0
3  345.0  21.0     3500.0     25000.0
4    NaN   NaN        NaN         NaN

DataFrame after removing rows with NaN value in All Columns:
      Id   Age  Income($)  Expense($)
0  621.0  19.0     4000.0      3000.0
1  645.0   NaN     5000.0      2000.0
2  210.0  18.0        NaN      2500.0
3  345.0  21.0     3500.0     25000.0
It removes only the rows that have NaN values in every column of the DataFrame. We set how='all' in the dropna() method so that a row is dropped only if all of its column values are NaN. Here, only row 4 meets that condition.
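One way to read how='all' is as the inverse of "keep a row if at least one value is present". The sketch below, using the same example data, compares dropna(how='all') with an equivalent boolean filter built from notna(); the variable names `dropped` and `masked` are illustrative only.

```python
import pandas as pd

data = pd.DataFrame({
    'Id': [621, 645, 210, 345, None],
    'Age': [19, None, 18, 21, None],
    'Income($)': [4000, 5000, None, 3500, None],
    'Expense($)': [3000, 2000, 2500, 25000, None],
})

# Drop rows in which every value is NaN
dropped = data.dropna(how='all')

# Equivalent filter: keep a row if at least one of its values is not NaN
masked = data[data.notna().any(axis=1)]

print(dropped.equals(masked))  # True
print(dropped.index.tolist())  # [0, 1, 2, 3]
```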
Pandas Drop Rows Only With NaN Values for a Particular Column Using DataFrame.dropna() Method
import pandas as pd

data = pd.DataFrame({
    'Id': [621, 645, 210, 345, None],
    'Age': [19, None, 18, 21, None],
    'Income($)': [4000, 5000, None, 3500, None],
    'Expense($)': [3000, 2000, 2500, 25000, None]
})

print("Initial DataFrame:")
print(data)
print("")

# Only the Id column is checked for NaN values
data = data.dropna(subset=["Id"])

print("DataFrame after removing rows with NaN value in Id Column:")
print(data)
Output:
Initial DataFrame:
      Id   Age  Income($)  Expense($)
0  621.0  19.0     4000.0      3000.0
1  645.0   NaN     5000.0      2000.0
2  210.0  18.0        NaN      2500.0
3  345.0  21.0     3500.0     25000.0
4    NaN   NaN        NaN         NaN

DataFrame after removing rows with NaN value in Id Column:
      Id   Age  Income($)  Expense($)
0  621.0  19.0     4000.0      3000.0
1  645.0   NaN     5000.0      2000.0
2  210.0  18.0        NaN      2500.0
3  345.0  21.0     3500.0     25000.0
It drops every row that has a NaN value in the Id column. NaN values in other columns are ignored, so rows 1 and 2 are kept even though they contain NaN in Age and Income($).
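The subset parameter also accepts more than one column label. The following sketch, which goes slightly beyond the single-column case above, shows how subset combines with the how parameter; the column choice and the variable names `either` and `both` are only for illustration.

```python
import pandas as pd

data = pd.DataFrame({
    'Id': [621, 645, 210, 345, None],
    'Age': [19, None, 18, 21, None],
    'Income($)': [4000, 5000, None, 3500, None],
    'Expense($)': [3000, 2000, 2500, 25000, None],
})

# Drop rows with NaN in Age or Income($) (how='any' is the default)
either = data.dropna(subset=['Age', 'Income($)'])
print(either.index.tolist())  # [0, 3]

# Drop rows only when both Age and Income($) are NaN
both = data.dropna(subset=['Age', 'Income($)'], how='all')
print(both.index.tolist())  # [0, 1, 2, 3]
```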
Pandas Drop Rows With NaN Values for Any Column Using DataFrame.dropna() Method
import pandas as pd

data = pd.DataFrame({
    'Id': [621, 645, 210, 345, None],
    'Age': [19, None, 18, 21, None],
    'Income($)': [4000, 5000, None, 3500, None],
    'Expense($)': [3000, 2000, 2500, 25000, None]
})

print("Initial DataFrame:")
print(data)
print("")

# With no arguments, dropna() drops every row containing at least one NaN
data = data.dropna()

print("DataFrame after removing rows with NaN value in any column:")
print(data)
Output:
Initial DataFrame:
      Id   Age  Income($)  Expense($)
0  621.0  19.0     4000.0      3000.0
1  645.0   NaN     5000.0      2000.0
2  210.0  18.0        NaN      2500.0
3  345.0  21.0     3500.0     25000.0
4    NaN   NaN        NaN         NaN

DataFrame after removing rows with NaN value in any column:
      Id   Age  Income($)  Expense($)
0  621.0  19.0     4000.0      3000.0
3  345.0  21.0     3500.0     25000.0
By default, the dropna() method removes every row that has at least one NaN value.
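As a rough sketch of what the default means in practice, the snippet below spells the defaults out explicitly (axis=0, how='any') and checks that the result matches a bare dropna() call. It also renumbers the surviving rows with reset_index(), which is an optional extra step not used in the examples above.

```python
import pandas as pd

data = pd.DataFrame({
    'Id': [621, 645, 210, 345, None],
    'Age': [19, None, 18, 21, None],
    'Income($)': [4000, 5000, None, 3500, None],
    'Expense($)': [3000, 2000, 2500, 25000, None],
})

default = data.dropna()
explicit = data.dropna(axis=0, how='any')  # same behaviour, written out

print(default.equals(explicit))  # True
print(default.index.tolist())    # [0, 3]

# dropna() returns a new DataFrame; the original still has all 5 rows
print(len(data))  # 5

# Optional: renumber the remaining rows from 0
renumbered = default.reset_index(drop=True)
print(renumbered.index.tolist())  # [0, 1]
```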
Source: https://www.delftstack.com/howto/python-pandas/pandas-drop-rows-with-nan/