How to Remove Rows With Nan in Pandas
Pandas Drop Rows With NaN
Created: January-16, 2021 | Updated: November-26, 2021
- Pandas Drop Rows With NaN Using the DataFrame.notna() Method
- Pandas Drop Rows Only With NaN Values for All Columns Using DataFrame.dropna() Method
- Pandas Drop Rows Only With NaN Values for a Particular Column Using DataFrame.dropna() Method
- Pandas Drop Rows With NaN Values for Any Column Using DataFrame.dropna() Method
This tutorial explains how to drop rows with NaN values using the DataFrame.notna() and DataFrame.dropna() methods.
We will use the DataFrame in the example code below.
    import pandas as pd

    roll_no = [501, 502, 503, 504, 505]

    data = pd.DataFrame({
        'Name': ['Alice', 'Steven', 'Neesham', 'Chris', 'Alice'],
        'Age': [19, None, 18, 21, None],
        'Income($)': [4000, 5000, None, 3500, None],
        'Expense($)': [3000, 2000, 2500, 25000, None]
    })

    print(data)

Output:
          Name   Age  Income($)  Expense($)
    0    Alice  19.0     4000.0      3000.0
    1   Steven   NaN     5000.0      2000.0
    2  Neesham  18.0        NaN      2500.0
    3    Chris  21.0     3500.0     25000.0
    4    Alice   NaN        NaN         NaN

Pandas Drop Rows With NaN Using the DataFrame.notna() Method
The DataFrame.notna() method returns a boolean object with the same number of rows and columns as the caller DataFrame. If an element is not NaN, it gets mapped to the True value in the boolean object, and if an element is a NaN, it gets mapped to the False value.
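As a minimal sketch of that mapping, the snippet below calls notna() on a tiny two-column frame. The frame `df` and its column names are made up for illustration and are not part of the tutorial's data.

```python
import pandas as pd

# Hypothetical two-row frame, used only to illustrate the mapping.
df = pd.DataFrame({"a": [1.0, None], "b": ["x", None]})

# notna() returns a frame of booleans with the same shape as df:
# True where a value is present, False where it is missing.
mask = df.notna()
print(mask)
```

The result has one boolean per cell, so it can be inspected directly or reduced per row or per column before being used as a filter.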
    import pandas as pd

    roll_no = [501, 502, 503, 504, 505]

    data = pd.DataFrame({
        'Name': ['Alice', 'Steven', 'Neesham', 'Chris', 'Alice'],
        'Age': [19, None, 18, 21, None],
        'Income($)': [4000, 5000, None, 3500, None],
        'Expense($)': [3000, 2000, 2500, 25000, None]
    })

    print("Initial DataFrame:")
    print(data)
    print("")

    data = data[data['Income($)'].notna()]

    print("DataFrame after removing rows with NaN value in Income Field:")
    print(data)

Output:
    Initial DataFrame:
          Name   Age  Income($)  Expense($)
    0    Alice  19.0     4000.0      3000.0
    1   Steven   NaN     5000.0      2000.0
    2  Neesham  18.0        NaN      2500.0
    3    Chris  21.0     3500.0     25000.0
    4    Alice   NaN        NaN         NaN

    DataFrame after removing rows with NaN value in Income Field:
         Name   Age  Income($)  Expense($)
    0   Alice  19.0     4000.0      3000.0
    1  Steven   NaN     5000.0      2000.0
    3   Chris  21.0     3500.0     25000.0

Here, we apply the notna() method to the column Income($), which returns a Series object with True or False values depending upon the column's values. When we pass the boolean object as an index to the original DataFrame, we only get rows without NaN values for the Income($) column.
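The same masking idea can be extended to more than one column. The sketch below is one possible way to do it, using a shortened three-row version of the data; the variable names `income_mask`, `both_mask`, `income_rows`, and `complete_rows` are illustrative choices, not names from the tutorial.

```python
import pandas as pd

# Shortened, illustrative version of the tutorial's data.
data = pd.DataFrame({
    "Name": ["Alice", "Steven", "Neesham"],
    "Age": [19, None, 18],
    "Income($)": [4000, 5000, None],
})

# Boolean Series: True for rows where Income($) is present.
income_mask = data["Income($)"].notna()

# Combine two columns: True only where both Age and Income($) are present.
both_mask = data[["Age", "Income($)"]].notna().all(axis=1)

income_rows = data[income_mask]    # keeps rows 0 and 1
complete_rows = data[both_mask]    # keeps row 0 only
print(income_rows)
print(complete_rows)
```

Reducing the mask with all(axis=1) keeps a row only when every selected column has a value; any(axis=1) would instead keep rows where at least one of them does.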
Pandas Drop Rows Only With NaN Values for All Columns Using DataFrame.dropna() Method
    import pandas as pd

    roll_no = [501, 502, 503, 504, 505]

    data = pd.DataFrame({
        'Id': [621, 645, 210, 345, None],
        'Age': [19, None, 18, 21, None],
        'Income($)': [4000, 5000, None, 3500, None],
        'Expense($)': [3000, 2000, 2500, 25000, None]
    })

    print("Initial DataFrame:")
    print(data)
    print("")

    data = data.dropna(how='all')

    print("DataFrame after removing rows with NaN value in All Columns:")
    print(data)

Output:
    Initial DataFrame:
          Id   Age  Income($)  Expense($)
    0  621.0  19.0     4000.0      3000.0
    1  645.0   NaN     5000.0      2000.0
    2  210.0  18.0        NaN      2500.0
    3  345.0  21.0     3500.0     25000.0
    4    NaN   NaN        NaN         NaN

    DataFrame after removing rows with NaN value in All Columns:
          Id   Age  Income($)  Expense($)
    0  621.0  19.0     4000.0      3000.0
    1  645.0   NaN     5000.0      2000.0
    2  210.0  18.0        NaN      2500.0
    3  345.0  21.0     3500.0     25000.0

This removes only the rows whose values are NaN in every column. Setting how='all' in the dropna() method tells it to drop a row only if all of that row's column values are NaN.
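To make the effect of the how parameter easier to see, the sketch below runs dropna() with how='all' and how='any' on the same small frame. The frame `df` and its columns `x` and `y` are hypothetical and chosen only for this comparison.

```python
import pandas as pd

# Hypothetical frame: row 1 is partly missing, row 2 is entirely missing.
df = pd.DataFrame({
    "x": [1.0, None, None],
    "y": [2.0, 5.0, None],
})

# how="all": drop a row only when every value in it is missing.
all_dropped = df.dropna(how="all")   # keeps rows 0 and 1

# how="any": drop a row as soon as one value in it is missing.
any_dropped = df.dropna(how="any")   # keeps row 0 only

print(all_dropped)
print(any_dropped)
```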
Pandas Drop Rows Only With NaN Values for a Particular Column Using DataFrame.dropna() Method
    import pandas as pd

    roll_no = [501, 502, 503, 504, 505]

    data = pd.DataFrame({
        'Id': [621, 645, 210, 345, None],
        'Age': [19, None, 18, 21, None],
        'Income($)': [4000, 5000, None, 3500, None],
        'Expense($)': [3000, 2000, 2500, 25000, None]
    })

    print("Initial DataFrame:")
    print(data)
    print("")

    data = data.dropna(subset=["Id"])

    print("DataFrame after removing rows with NaN value in Id Column:")
    print(data)

Output:
    Initial DataFrame:
          Id   Age  Income($)  Expense($)
    0  621.0  19.0     4000.0      3000.0
    1  645.0   NaN     5000.0      2000.0
    2  210.0  18.0        NaN      2500.0
    3  345.0  21.0     3500.0     25000.0
    4    NaN   NaN        NaN         NaN

    DataFrame after removing rows with NaN value in Id Column:
          Id   Age  Income($)  Expense($)
    0  621.0  19.0     4000.0      3000.0
    1  645.0   NaN     5000.0      2000.0
    2  210.0  18.0        NaN      2500.0
    3  345.0  21.0     3500.0     25000.0

This drops every row that has a NaN value in the Id column, regardless of the values in the other columns.
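For a single column, dropna(subset=...) should give the same rows as the notna() mask shown earlier in this tutorial. The sketch below checks this on a small made-up frame; `by_dropna`, `by_mask`, and `same` are illustrative names.

```python
import pandas as pd

# Hypothetical frame: Id is missing in row 1, Age is missing in row 2.
df = pd.DataFrame({
    "Id": [621, None, 210],
    "Age": [19, 20, None],
})

# Only the Id column is checked; the NaN in Age (row 2) is ignored.
by_dropna = df.dropna(subset=["Id"])

# The same filter written as a boolean mask.
by_mask = df[df["Id"].notna()]

# equals() treats NaN values in matching positions as equal.
same = by_dropna.equals(by_mask)
print(same)
```

The subset argument takes a list, so several columns can be named at once; with the default how='any', a row is then dropped if any of the listed columns is NaN.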
Pandas Drop Rows With NaN Values for Any Column Using DataFrame.dropna() Method
    import pandas as pd

    roll_no = [501, 502, 503, 504, 505]

    data = pd.DataFrame({
        'Id': [621, 645, 210, 345, None],
        'Age': [19, None, 18, 21, None],
        'Income($)': [4000, 5000, None, 3500, None],
        'Expense($)': [3000, 2000, 2500, 25000, None]
    })

    print("Initial DataFrame:")
    print(data)
    print("")

    data = data.dropna()

    print("DataFrame after removing rows with NaN value in any column:")
    print(data)

Output:
    Initial DataFrame:
          Id   Age  Income($)  Expense($)
    0  621.0  19.0     4000.0      3000.0
    1  645.0   NaN     5000.0      2000.0
    2  210.0  18.0        NaN      2500.0
    3  345.0  21.0     3500.0     25000.0
    4    NaN   NaN        NaN         NaN

    DataFrame after removing rows with NaN value in any column:
          Id   Age  Income($)  Expense($)
    0  621.0  19.0     4000.0      3000.0
    3  345.0  21.0     3500.0     25000.0

By default, the dropna() method removes every row that has at least one NaN value.
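Note that the output above keeps the original index labels (0 and 3) after the rows are dropped. If consecutive labels are needed, one option is to follow dropna() with reset_index(drop=True). The sketch below uses a small hypothetical frame, and `cleaned` and `renumbered` are illustrative names.

```python
import pandas as pd

# Hypothetical frame with one missing Id in the middle row.
df = pd.DataFrame({"Id": [621, None, 345], "Age": [19, 18, 21]})

# dropna() returns a new DataFrame; df itself is left unchanged.
cleaned = df.dropna()

# The surviving rows keep their old labels (0 and 2), so renumber them from 0.
renumbered = cleaned.reset_index(drop=True)

print(cleaned)
print(renumbered)
```

Passing drop=True discards the old labels instead of adding them back as a new column.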
Source: https://www.delftstack.com/howto/python-pandas/pandas-drop-rows-with-nan/