Pandas Get Average Of Column Or Mean in Python (2024)

Welcome to another Python tutorial on Pandas! In this guide, we’ll explore how to get the average of a column using the powerful Pandas library. Pandas is a versatile data manipulation and analysis library that provides easy-to-use data structures for handling and analyzing structured data. If you’re working with datasets in Python, Pandas is a must-have tool in your toolkit.

Setting Up Pandas to Get the Average of a Column

Before we dive into calculating column averages, let’s make sure you have Pandas installed. Open your terminal or command prompt and type the following command:

# Installing Pandas librarypip install pandas

If you haven’t installed Pandas yet, this command will download and install it for you.

Now, let’s create a Python script or Jupyter Notebook to follow along with the examples.

Importing Pandas

Start by importing Pandas into your script or notebook. Conventionally, Pandas is imported with the alias ‘pd’:

# Import Pandas and refer it using the pds aliasimport pandas as pds

This allows us to use ‘pd’ as a shorthand reference for Pandas functions throughout the tutorial.

Loading a Sample Dataset

For the purpose of this tutorial, let’s use a complex dataset. You can easily adapt the code to work with your own dataset later. We’ll use a hypothetical dataset representing the sales of different products:

Empl_IDNameDeptSalaryExp_Years
101EmmaHR600003
102AlexIT800005
103SamSales700002
104JackIT900008
105OliviaHR6500011
106MaxFinance750004
107SophiaIT850006
108CharlieSales7200011
109LilyHR690001
110LucasFinance800007

the following is using the above data and creating a data frame for calculating the mean or average.

import pandas as pds# Create a sample DataFrameempl_data = { 'Empl_ID': [101, 102, 103, 104, 105, 106, 107, 108, 109, 110], 'Name': ['Emma', 'Alex', 'Sam', 'Jack', 'Olivia', 'Max', 'Sophia', 'Charlie', 'Lily', 'Lucas'], 'Dept': ['HR', 'IT', 'Sales', 'IT', 'HR', 'Finance', 'IT', 'Sales', 'HR', 'Finance'], 'Salary': [60000, 80000, 70000, 90000, 65000, 75000, 85000, 72000, 69000, 80000], 'Exp_Years': [3, 5, 2, 8, 11, 4, 6, 11, 1, 7]}dfr = pds.DataFrame(empl_data)

If you want the above data read from CSV using Pandas, then use the following code:

# Load the dataset from CSVdfr = pds.read_csv('empl_data.csv')

Now, you have a basic DataFrame with five columns: ‘Empl_ID’, ‘Name’, ‘Dept’, ‘Salary’, and ‘Exp_Years’.

See also Python While Loop Tutorial

Using Pandas to Get the Average or Mean for a Column

To find the average of a specific column, you can use the mean() function provided by Pandas. Let’s calculate the average salary:

average_sales = dfr['Salary'].mean()print(f'The average sales is: {average_sales}')

In this example, dfr['Salary'] selects the ‘Sales’ column, and .mean() calculates the average. The result is then printed to the console. Easy, right?

In real-world scenarios, datasets are often more extensive. That’s why, we took a rich dataset with more rows and columns: However, you can load your own CSV having larger data.

Pandas seamlessly handle larger datasets, making Pandas library an efficient choice for data analysis tasks.

Dealing with Missing Values

In real-world datasets, you might encounter missing values. Pandas provide a convenient way to handle them. Let’s introduce a missing value in our sales data:

# Introduce a missing valuedfr.at[2, 'Exp_Years'] = Nonedfr.at[5, 'Exp_Years'] = Nonedfr.at[4, 'Salary'] = Nonedfr.at[7, 'Salary'] = None

Now, if you try to calculate the average as before, Pandas will automatically skip the missing value:

avg_exp = df['Exp_Years'].mean()print(f'The Average Experience is: {avg_exp}')avg_sal = df['Salary'].mean()print(f'The Average Salary is: {avg_sal}')

Pandas take care of missing values by excluding them from calculations, ensuring you get accurate results.

Pandas to Get the Average for Multiple Columns

What if you want to find the average for multiple columns? Pandas make it straightforward. Let’s extend our dataset with an ‘Expenses’ column:

import pandas as pds# Reuse the Eployee Sample Dataempl_data = { 'Empl_ID': [101, 102, 103, 104, 105, 106, 107, 108, 109, 110], 'Name': ['Emma', 'Alex', 'Sam', 'Jack', 'Olivia', 'Max', 'Sophia', 'Charlie', 'Lily', 'Lucas'], 'Dept': ['HR', 'IT', 'Sales', 'IT', 'HR', 'Finance', 'IT', 'Sales', 'HR', 'Finance'], 'Salary': [60000, 80000, 70000, 90000, 65000, 75000, 85000, 72000, 69000, 80000], 'Exp_Years': [3, 5, 2, 8, 11, 4, 6, 11, 1, 7]}dfr = pds.DataFrame(empl_data)

To calculate the average for both ‘Salary’ and ‘Exp_Years’, you can use the following code:

avg_sal = dfr['Salary'].mean()avg_exp = dfr['Exp_Years'].mean()print(f'The Average Salary is: {avg_sal}')print(f'The Average Experience is: {avg_exp}')

You can easily adapt this approach to calculate averages for as many columns as needed. Let’s now combine all the pieces of code and print the final output.

import pandas as pds# Reuse the Eployee Sample Dataempl_data = { 'Empl_ID': [101, 102, 103, 104, 105, 106, 107, 108, 109, 110], 'Name': ['Emma', 'Alex', 'Sam', 'Jack', 'Olivia', 'Max', 'Sophia', 'Charlie', 'Lily', 'Lucas'], 'Dept': ['HR', 'IT', 'Sales', 'IT', 'HR', 'Finance', 'IT', 'Sales', 'HR', 'Finance'], 'Salary': [60000, 80000, 70000, 90000, 65000, 75000, 85000, 72000, 69000, 80000], 'Exp_Years': [3, 5, 2, 8, 11, 4, 6, 11, 1, 7]}dfr = pds.DataFrame(empl_data)avg_sal = dfr['Salary'].mean()avg_exp = dfr['Exp_Years'].mean()print("========= Printing Avg With Actual Data ========")print(f'The Average Salary is: {avg_sal}')print(f'The Average Experience is: {avg_exp}')# Introduce a missing valuedfr.at[2, 'Exp_Years'] = Nonedfr.at[5, 'Exp_Years'] = Nonedfr.at[4, 'Salary'] = Nonedfr.at[7, 'Salary'] = Noneavg_sal = dfr['Salary'].mean()avg_exp = dfr['Exp_Years'].mean()print("========= Printing Avg With Missing Values ========")print(f'The Average Salary is: {avg_sal}')print(f'The Average Experience is: {avg_exp}')

When you run this code, it will give you the following result:

========= Printing Avg With Actual Data ========The Average Salary is: 74600.0The Average Experience is: 5.8========= Printing Avg With Missing Values ========The Average Salary is: 76125.0The Average Experience is: 6.5

Once we intentionally introduced some missing values, Pandas reported a slight increase in the average or mean value of the ‘Salary’ and ‘Exp_Year’ columns. This is because some of the rows were discarded due to missing values.

See also How to Write Multiline Comments in Python

Get the Average Using Pandas Describe() Method

Pandas also has a describe() method which calculates several other things including the mean or average. So, to test this, let’s get the above code modified to call the describe() method.

import pandas as pds# Reuse the Employee Sample Dataempl_data = { 'Empl_ID': [101, 102, 103, 104, 105, 106, 107, 108, 109, 110], 'Name': ['Emma', 'Alex', 'Sam', 'Jack', 'Olivia', 'Max', 'Sophia', 'Charlie', 'Lily', 'Lucas'], 'Dept': ['HR', 'IT', 'Sales', 'IT', 'HR', 'Finance', 'IT', 'Sales', 'HR', 'Finance'], 'Salary': [60000, 80000, 70000, 90000, 65000, 75000, 85000, 72000, 69000, 80000], 'Exp_Years': [3, 5, 2, 8, 11, 4, 6, 11, 1, 7]}dfr = pds.DataFrame(empl_data)avg_sal = dfr['Salary'].mean()avg_exp = dfr['Exp_Years'].mean()print("========= Printing Avg With Actual Data ========")print(f'The Average Salary is: {avg_sal}')print(f'The Average Experience is: {avg_exp}')# Introduce missing valuesdfr.at[2, 'Exp_Years'] = Nonedfr.at[5, 'Exp_Years'] = Nonedfr.at[4, 'Salary'] = Nonedfr.at[7, 'Salary'] = None# Use describe() to get a summary including meansummary = dfr.describe()# Display the mean from the summaryavg_sal_from_desc = summary.loc['mean', 'Salary']avg_exp_from_desc = summary.loc['mean', 'Exp_Years']print("\n========= Printing Avg With Missing Values Using describe() ========")print(f'The Average Salary is: {avg_sal_from_desc}')print(f'The Average Experience is: {avg_exp_from_desc}')

So, even with this new method, we get the similar results as shown below:

========= Printing Avg With Actual Data ========The Average Salary is: 74600.0The Average Experience is: 5.8========= Printing Avg With Missing Values Using describe() ========The Average Salary is: 76125.0The Average Experience is: 6.5

Conclusion

Congratulations! You’ve successfully learned how to calculate the average for a column in Pandas. We covered importing Pandas, loading a sample dataset, handling missing values, and calculating averages for both single and multiple columns.

Pandas provides a powerful and intuitive way to work with data, making tasks like this a breeze. As you continue your journey with Python, you’ll find Pandas to be an invaluable tool for data manipulation and analysis.

See also How to Find the List Shape in Python

Feel free to explore more Pandas functionalities, as it offers a wide range of features for data exploration and manipulation.

Happy Coding,
Team TechBeamers

Insights, advice, suggestions, feedback and comments from experts

About Me

I am an expert and enthusiast assistant. I have a deep understanding of various topics and can provide accurate and reliable information. My knowledge is based on a wide range of reputable sources, and I can assist with a diverse array of queries. If you have any questions or need assistance, feel free to ask!

Introduction to Pandas and Calculating Averages

The article provides a comprehensive guide on using the Pandas library in Python to calculate the average of a column in a dataset. It covers essential concepts such as setting up Pandas, importing data, handling missing values, and calculating averages for single and multiple columns. The tutorial also demonstrates the use of the mean() function and the describe() method in Pandas to obtain average values from the dataset.

Now, let's delve into the details of the concepts covered in the article.

Setting Up Pandas

To begin, the article emphasizes the importance of having Pandas installed and demonstrates how to install it using the pip install pandas command. It also explains the process of importing Pandas into a Python script or Jupyter Notebook using the alias 'pd'.

Loading a Sample Dataset

The tutorial provides a sample dataset representing the sales of different products and explains how to create a DataFrame using the provided data. Additionally, it offers guidance on loading data from a CSV file using Pandas.

Calculating Averages Using Pandas

The article demonstrates how to use the mean() function in Pandas to calculate the average of a specific column in the dataset. It provides a clear example of calculating the average salary and explains the process step by step.

Handling Missing Values

In real-world datasets, missing values are common, and the article addresses this by introducing missing values into the dataset and showcasing how Pandas automatically handles them when calculating averages.

Calculating Averages for Multiple Columns

The tutorial extends the concept of calculating averages to multiple columns and illustrates how to obtain the average values for both 'Salary' and 'Exp_Years' columns.

Utilizing the describe() Method

The article introduces the describe() method in Pandas, which provides a summary of various statistics, including the mean or average. It demonstrates how to use this method to obtain average values and compares the results with those obtained using the mean() function.

Conclusion

The tutorial concludes by summarizing the key concepts covered, highlighting the power and versatility of Pandas for data manipulation and analysis. It encourages further exploration of Pandas' functionalities for data exploration and manipulation.

In summary, the article provides a comprehensive and practical guide to leveraging Pandas for calculating averages in Python, making it a valuable resource for individuals working with data analysis and manipulation.

Pandas Get Average Of Column Or Mean in Python (2024)
Top Articles
Latest Posts
Article information

Author: Saturnina Altenwerth DVM

Last Updated:

Views: 5793

Rating: 4.3 / 5 (44 voted)

Reviews: 91% of readers found this page helpful

Author information

Name: Saturnina Altenwerth DVM

Birthday: 1992-08-21

Address: Apt. 237 662 Haag Mills, East Verenaport, MO 57071-5493

Phone: +331850833384

Job: District Real-Estate Architect

Hobby: Skateboarding, Taxidermy, Air sports, Painting, Knife making, Letterboxing, Inline skating

Introduction: My name is Saturnina Altenwerth DVM, I am a witty, perfect, combative, beautiful, determined, fancy, determined person who loves writing and wants to share my knowledge and understanding with you.