Pandas Basics in Python
Pandas is an open-source library for Python, used for data manipulation, analysis, and cleaning. It provides a simple and efficient way to work with data in the form of Series (1-dimensional) and DataFrame (2-dimensional) objects.
Pandas Data Structures:
Pandas provides two primary data structures:
- Series: A one-dimensional labeled array capable of holding any data type.
- DataFrame: A two-dimensional labeled data structure with columns of potentially different types.
Creating a Pandas Series:
A Pandas Series can be created from a list, a dictionary, or a NumPy ndarray.
import pandas as pd
data = [10, 20, 30, 40, 50]
s = pd.Series(data)
print(s)
# output
# 0 10
# 1 20
# 2 30
# 3 40
# 4 50
# dtype: int64
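The list example covers only one of the three inputs mentioned. As a minimal sketch, a Series could also be built from a dictionary or a NumPy array as follows; the names `prices` and `scores` and their values are made up for illustration.

```python
import numpy as np
import pandas as pd

# From a dictionary: the keys become the index labels.
prices = pd.Series({'apple': 1.5, 'banana': 0.5, 'cherry': 3.0})

# From a NumPy array, with index labels that we supply ourselves.
scores = pd.Series(np.array([10, 20, 30]), index=['a', 'b', 'c'])

print(prices['banana'])  # look up a value by its label
print(scores['c'])
```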
Creating a Pandas DataFrame:
A Pandas DataFrame can be created from a [dictionary](https://learngolangonline.com/python/dictionaries), a list of dictionaries, or an ndarray.
import pandas as pd
data = {'Name': ['John', 'Steve', 'Sarah', 'Mike'], 'Age': [25, 30, 28, 35], 'Salary': [50000, 60000, 55000, 70000]}
df = pd.DataFrame(data)
print(df)
Output:
Name Age Salary
0 John 25 50000
1 Steve 30 60000
2 Sarah 28 55000
3 Mike 35 70000
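For the other two inputs, a sketch along these lines may help. It assumes a small made-up list of dictionaries and a 2-D NumPy array; the names `rows`, `df_rows`, `arr`, and `df_arr` are illustrative only.

```python
import numpy as np
import pandas as pd

# From a list of dictionaries: each dictionary becomes one row.
rows = [
    {'Name': 'John', 'Age': 25},
    {'Name': 'Steve', 'Age': 30},
]
df_rows = pd.DataFrame(rows)

# From a 2-D NumPy array: column names are passed separately.
arr = np.array([[1, 2], [3, 4], [5, 6]])
df_arr = pd.DataFrame(arr, columns=['x', 'y'])

print(df_rows)
print(df_arr)
```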
Pandas Basic Operations:
Indexing:
Pandas provides different ways to index and select data. The loc indexer is used for label-based indexing, and the iloc indexer is used for integer position-based indexing.
import pandas as pd
data = {'Name': ['John', 'Steve', 'Sarah', 'Mike'], 'Age': [25, 30, 28, 35], 'Salary': [50000, 60000, 55000, 70000]}
df = pd.DataFrame(data)
print(df.loc[0])   # select first row (by label 0)
print(df.iloc[0])  # select first row (by position 0)
print(df['Age'])   # select 'Age' column
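With the default integer index, loc[0] and iloc[0] return the same row, which hides the difference between them. The following sketch uses hypothetical string labels ('john', 'steve', 'sarah') so that label-based and position-based lookups are easier to tell apart.

```python
import pandas as pd

df = pd.DataFrame(
    {'Age': [25, 30, 28]},
    index=['john', 'steve', 'sarah'],  # hypothetical string labels
)

by_label = df.loc['steve', 'Age']  # row labelled 'steve', column 'Age'
by_position = df.iloc[1, 0]        # second row, first column

# loc slices include the end label; iloc slices exclude the end position.
label_slice = df.loc['john':'steve']
position_slice = df.iloc[0:1]

print(by_label, by_position)
print(len(label_slice), len(position_slice))
```

Note that the label slice returns two rows while the position slice returns one, because only loc includes the end of the range.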
Filtering:
We can filter rows based on certain conditions.
import pandas as pd
data = {'Name': ['John', 'Steve', 'Sarah', 'Mike'], 'Age': [25, 30, 28, 35], 'Salary': [50000, 60000, 55000, 70000]}
df = pd.DataFrame(data)
print(df[df['Age'] > 28])  # filter rows where Age is greater than 28
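Filters often need more than one condition. As a sketch using the same sample data, conditions can be combined with & (and) and | (or), with each condition wrapped in parentheses; the variable names and thresholds here are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['John', 'Steve', 'Sarah', 'Mike'],
    'Age': [25, 30, 28, 35],
    'Salary': [50000, 60000, 55000, 70000],
})

# Both conditions must hold: Age above 26 and Salary below 65000.
older_and_paid = df[(df['Age'] > 26) & (df['Salary'] < 65000)]

# Either condition may hold: Age below 26 or Age above 32.
young_or_senior = df[(df['Age'] < 26) | (df['Age'] > 32)]

print(older_and_paid['Name'].tolist())   # ['Steve', 'Sarah']
print(young_or_senior['Name'].tolist())  # ['John', 'Mike']
```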
Adding and Removing Rows/Columns:
We can add and remove rows and columns from a DataFrame.
import pandas as pd
# create a sample dataframe
data = {'name': ['John', 'Emily', 'Kate'], 'age': [25, 30, 35], 'city': ['New York', 'Paris', 'London']}
df = pd.DataFrame(data)
# add a new row
df.loc[3] = ['David', 28, 'Tokyo']
print(df)
# add a new column
df['country'] = ['USA', 'France', 'UK', 'Japan']
print(df)
# remove a row
df = df.drop(2)
print(df)
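The example above removes a row but not a column. One way to remove a column, sketched here on the same sample data, is to pass the columns argument to drop; the name `without_city` is illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'name': ['John', 'Emily', 'Kate'],
    'age': [25, 30, 35],
    'city': ['New York', 'Paris', 'London'],
})

# drop returns a new DataFrame; the original is left unchanged.
without_city = df.drop(columns=['city'])

print(list(without_city.columns))  # ['name', 'age']
print(list(df.columns))            # ['name', 'age', 'city']
```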
Loading data:
Pandas can load data from various sources including CSV, Excel, SQL, and more. The read_csv() function is commonly used to load data from CSV files into a Pandas DataFrame. For example, to load a CSV file named "data.csv" into a DataFrame, we can use the following code:
import pandas as pd
df = pd.read_csv('data.csv')
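To try read_csv without a file on disk, a sketch like the following can be used. It relies on read_csv accepting a file-like object as well as a path; the CSV text is made up and stands in for the contents of a file such as data.csv.

```python
import io
import pandas as pd

# Made-up CSV text standing in for the contents of a real file.
csv_text = "name,age\nJohn,25\nEmily,30\n"

# read_csv accepts a file path or any file-like object.
df = pd.read_csv(io.StringIO(csv_text))

print(df.shape)   # (2, 2)
print(df.dtypes)  # column types inferred from the text
```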
Viewing data:
To view the data in a DataFrame, we can use the head() or tail() methods to view the first or last rows, respectively. Both return 5 rows by default. For example, to view the first 5 rows of a DataFrame named "df", we can use the following code:
print(df.head())
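Both methods also accept the number of rows to return. A small sketch, assuming a made-up ten-row DataFrame with a single column `n`:

```python
import pandas as pd

df = pd.DataFrame({'n': range(10)})

first_three = df.head(3)  # rows 0, 1, 2
last_two = df.tail(2)     # rows 8, 9
default_head = df.head()  # no argument: 5 rows

print(first_three)
print(last_two)
```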
Data selection:
Pandas provides various methods for selecting data from a DataFrame. We can select columns using the column name, select rows using boolean indexing, and select subsets of rows and columns using the loc[] and iloc[] indexers. For example, to select a column named "column_name" from a DataFrame named "df", we can use the following code:
column = df['column_name']
To select rows based on a condition, we can use boolean indexing. For example, to select rows where a column named "column_name" equals a certain value, we can use the following code:
subset = df[df['column_name'] == value]
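Selecting a subset of rows and columns in one step can be sketched with loc and iloc as follows. The sample data and the names `subset` and `corner` are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['John', 'Steve', 'Sarah', 'Mike'],
    'Age': [25, 30, 28, 35],
    'Salary': [50000, 60000, 55000, 70000],
})

# Rows where Age equals 30, restricted to two columns, using loc.
subset = df.loc[df['Age'] == 30, ['Name', 'Salary']]

# First two rows and first two columns, by integer position, using iloc.
corner = df.iloc[0:2, 0:2]

print(subset)
print(corner)
```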
Data manipulation:
Pandas provides various methods for manipulating data in a DataFrame. We can add, remove, or modify columns, and perform mathematical operations on the data. For example, to add a new column named "new_column" to a DataFrame named "df" that is the sum of two other columns, we can use the following code:
df['new_column'] = df['column1'] + df['column2']
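A runnable version of this idea, with made-up values for column1 and column2, might look like the sketch below. It also shows an existing column being modified by assigning back to the same name.

```python
import pandas as pd

df = pd.DataFrame({'column1': [1, 2, 3], 'column2': [10, 20, 30]})

# Element-wise sum of two columns, stored as a new column.
df['new_column'] = df['column1'] + df['column2']

# Modify an existing column: a scalar is applied to every row.
df['column1'] = df['column1'] * 2

print(df)
```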
Data aggregation:
Pandas provides various methods for aggregating data in a DataFrame. We can group data by a column and calculate statistics on the groups, or use pivot tables to summarize the data. For example, to group a DataFrame named "df" by a column named "column_name" and calculate the mean value of another column named "column2", we can use the following code:
grouped_data = df.groupby('column_name')['column2'].mean()
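As a sketch of both approaches on made-up sales data, agg can compute several statistics per group, and pivot_table can summarize one column across two others. The column names `city`, `year`, and `sales` are assumptions for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    'city': ['Paris', 'Paris', 'Tokyo', 'Tokyo'],
    'year': [2020, 2021, 2020, 2021],
    'sales': [10, 20, 30, 50],
})

# Several statistics per group at once.
stats = df.groupby('city')['sales'].agg(['mean', 'sum'])

# A pivot table: one row per city, one column per year.
table = df.pivot_table(index='city', columns='year', values='sales', aggfunc='sum')

print(stats)
print(table)
```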