Pandas Basics in Python


Pandas is an open-source library for Python, used for data manipulation, analysis, and cleaning. It provides a simple and efficient way to work with data through its Series (1-dimensional) and DataFrame (2-dimensional) objects.

Pandas Data Structures:

Pandas provides two primary data structures:

Series: A one-dimensional labeled array capable of holding any data type.
DataFrame: A two-dimensional labeled data structure with columns of potentially different types.

Creating a Pandas Series:

A Pandas Series can be created from a list, a dictionary, or a NumPy ndarray.

import pandas as pd
data = [10, 20, 30, 40, 50]
s = pd.Series(data)
print(s)

# output
# 0    10
# 1    20
# 2    30
# 3    40
# 4    50
# dtype: int64
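
The example above starts from a list. As a minimal sketch of the other two inputs, the code below builds one Series from a dictionary and one from an ndarray. The variable names and values (prices, labeled, the fruit labels) are made up for illustration.

```python
import numpy as np
import pandas as pd

# From a dictionary: the keys become the index labels.
prices = pd.Series({'apple': 1.5, 'banana': 0.25, 'cherry': 3.0})
print(prices['banana'])  # look up a value by its label

# From a NumPy ndarray, with explicit index labels (illustrative values).
arr = np.array([10, 20, 30])
labeled = pd.Series(arr, index=['a', 'b', 'c'])
print(labeled)
```

When no index is passed, as in the list example, Pandas assigns the default integer labels 0, 1, 2, and so on.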

Creating a Pandas DataFrame:

A Pandas DataFrame can be created from a [dictionary](https://learngolangonline.com/python/dictionaries), a list of dictionaries, or an ndarray.

import pandas as pd
data = {'Name': ['John', 'Steve', 'Sarah', 'Mike'], 'Age': [25, 30, 28, 35], 'Salary': [50000, 60000, 55000, 70000]}
df = pd.DataFrame(data)
print(df)

Output:

    Name  Age  Salary
0   John   25   50000
1  Steve   30   60000
2  Sarah   28   55000
3   Mike   35   70000
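
The dictionary form is shown above. The other two inputs might look like the following sketch, where the names df_rows and df_matrix and the column labels x and y are assumptions chosen for illustration.

```python
import numpy as np
import pandas as pd

# From a list of dictionaries: each dictionary is one row.
rows = [
    {'Name': 'John', 'Age': 25},
    {'Name': 'Steve', 'Age': 30},
]
df_rows = pd.DataFrame(rows)
print(df_rows)

# From a 2-D ndarray: column names are supplied separately.
matrix = np.array([[1, 2], [3, 4], [5, 6]])
df_matrix = pd.DataFrame(matrix, columns=['x', 'y'])
print(df_matrix)
```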

Pandas Basic Operations:

Indexing:

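Pandas provides different ways to index and select data. The loc indexer is used for label-based indexing, and the iloc indexer is used for integer position-based indexing. With the default index (0, 1, 2, ...), labels and positions coincide, so both return the same row.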

import pandas as pd
data = {'Name': ['John', 'Steve', 'Sarah', 'Mike'], 'Age': [25, 30, 28, 35], 'Salary': [50000, 60000, 55000, 70000]}
df = pd.DataFrame(data)
print(df.loc[0])  # select the row whose index label is 0
print(df.iloc[0])  # select the first row by position
print(df['Age'])  # select 'Age' column
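
The difference between loc and iloc only becomes visible when the index is not the default 0, 1, 2. A small sketch, assuming a DataFrame with made-up string labels:

```python
import pandas as pd

df = pd.DataFrame(
    {'Age': [25, 30, 28]},
    index=['john', 'steve', 'sarah'],  # hypothetical string labels
)

by_label = df.loc['steve']   # row whose index label is 'steve'
by_position = df.iloc[1]     # second row, counting from 0
print(by_label['Age'], by_position['Age'])

# loc slices include the end label; iloc slices exclude the end position.
print(df.loc['john':'steve'])  # two rows
print(df.iloc[0:1])            # one row
```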

Filtering:

We can filter rows based on certain conditions.

import pandas as pd
data = {'Name': ['John', 'Steve', 'Sarah', 'Mike'], 'Age': [25, 30, 28, 35], 'Salary': [50000, 60000, 55000, 70000]}
df = pd.DataFrame(data)
print(df[df['Age'] > 28])  # filter rows where Age is greater than 28
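
Conditions can also be combined. The sketch below reuses the same sample data. It relies on the element-wise operators & (and) and | (or), and each condition needs its own parentheses. The thresholds are arbitrary.

```python
import pandas as pd

data = {'Name': ['John', 'Steve', 'Sarah', 'Mike'], 'Age': [25, 30, 28, 35], 'Salary': [50000, 60000, 55000, 70000]}
df = pd.DataFrame(data)

# Both conditions must hold: older than 25 AND salary below 70000.
both = df[(df['Age'] > 25) & (df['Salary'] < 70000)]
print(both)

# Either condition may hold: younger than 26 OR salary of at least 70000.
either = df[(df['Age'] < 26) | (df['Salary'] >= 70000)]
print(either)
```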

Adding and Removing Rows/Columns:

We can add and remove rows and columns from a DataFrame.

import pandas as pd

# create a sample dataframe
data = {'name': ['John', 'Emily', 'Kate'], 'age': [25, 30, 35], 'city': ['New York', 'Paris', 'London']}
df = pd.DataFrame(data)

# add a new row
df.loc[3] = ['David', 28, 'Tokyo']
print(df)

# add a new column
df['country'] = ['USA', 'France', 'UK', 'Japan']
print(df)

# remove a row
df = df.drop(2)
print(df)
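
The example above does not remove a column. One possible way to do that, along with an alternative way to append a row using pd.concat, is sketched here. The names without_city, new_row, and extended are illustrative.

```python
import pandas as pd

df = pd.DataFrame({'name': ['John', 'Emily'], 'age': [25, 30], 'city': ['New York', 'Paris']})

# remove a column; drop returns a new DataFrame and leaves df unchanged
without_city = df.drop(columns=['city'])
print(without_city)

# append a row by concatenating a one-row DataFrame
new_row = pd.DataFrame([{'name': 'David', 'age': 28, 'city': 'Tokyo'}])
extended = pd.concat([df, new_row], ignore_index=True)
print(extended)
```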

Loading data:
Pandas can load data from various sources including CSV, Excel, SQL, and more. The read_csv() function is commonly used to load data from CSV files into a Pandas DataFrame. For example, to load a CSV file named "data.csv" into a DataFrame, we can use the following code:
```
import pandas as pd
df = pd.read_csv('data.csv')
```
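
To try read_csv() without a file on disk, we can pass it a file-like object. This sketch assumes a tiny made-up CSV held in a string. read_csv() infers the column names from the first line.

```python
import io

import pandas as pd

# A small CSV document held in memory (illustrative data).
csv_text = "name,age\nJohn,25\nEmily,30\n"

df = pd.read_csv(io.StringIO(csv_text))
print(df)
print(df.dtypes)  # the age column is parsed as integers
```
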
Viewing data:
To view the data in a DataFrame, we can use the head() or tail() methods to view the first or last rows, respectively. Both return five rows by default. For example, to view the first 5 rows of a DataFrame named "df", we can use the following code:
```
print(df.head())
```
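
Both methods accept the number of rows to return. A short sketch on a made-up ten-row DataFrame:

```python
import pandas as pd

df = pd.DataFrame({'n': range(1, 11)})  # ten rows, values 1 to 10

first_three = df.head(3)  # first three rows
last_two = df.tail(2)     # last two rows
print(first_three)
print(last_two)
```
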
Data selection:
Pandas provides various ways to select data from a DataFrame. We can select columns using the column name, select rows using boolean indexing, and select subsets of rows and columns using the loc[] and iloc[] indexers. For example, to select a column named "column_name" from a DataFrame named "df", we can use the following code:
```
column = df['column_name']
```
To select rows based on a condition, we can use boolean indexing. For example, to select rows where a column named "column_name" equals a certain value, we can use the following code:
```
subset = df[df['column_name'] == value]
```
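
Selecting rows and columns in a single step with loc[] and iloc[] might look like the sketch below. The data and the names older and corner are assumptions for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['John', 'Steve', 'Sarah', 'Mike'],
    'Age': [25, 30, 28, 35],
    'Salary': [50000, 60000, 55000, 70000],
})

# loc: a boolean row mask plus a list of column names
older = df.loc[df['Age'] > 28, ['Name', 'Salary']]
print(older)

# iloc: first two rows and first two columns, by position
corner = df.iloc[:2, :2]
print(corner)
```
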
Data manipulation:
Pandas provides various methods for manipulating data in a DataFrame. We can add, remove, or modify columns, and perform mathematical operations on the data. For example, to add a new column named "new_column" to a DataFrame named "df" that is the sum of two other columns, we can use the following code:
```
df['new_column'] = df['column1'] + df['column2']
```
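
Modifying and removing columns follow the same pattern. A minimal sketch, assuming hypothetical price and quantity columns:

```python
import pandas as pd

df = pd.DataFrame({'price': [10.0, 20.0], 'quantity': [3, 2]})

df['total'] = df['price'] * df['quantity']  # new column from element-wise arithmetic
df['price'] = df['price'] * 2               # overwrite an existing column
df = df.drop(columns=['quantity'])          # remove a column
print(df)
```
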
Data aggregation:
Pandas provides various methods for aggregating data in a DataFrame. We can group data by a column and calculate statistics on the groups, or use pivot tables to summarize the data. For example, to group a DataFrame named "df" by a column named "column_name" and calculate the mean value of another column named "column2", we can use the following code:
```
grouped_data = df.groupby('column_name')['column2'].mean()
```
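
The pivot tables mentioned above, and a group-by with more than one statistic, could be sketched as follows. The sales data and the column names city, year, and sales are made up for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    'city': ['Paris', 'Paris', 'London', 'London'],
    'year': [2022, 2023, 2022, 2023],
    'sales': [100, 150, 200, 250],
})

# several statistics per group in one call
stats = df.groupby('city')['sales'].agg(['mean', 'max'])
print(stats)

# pivot table: one row per city, one column per year
table = df.pivot_table(index='city', columns='year', values='sales', aggfunc='sum')
print(table)
```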
