Python group conditional sum

qweasd911aca 注册会员
2023-02-28 02:15

the reference answer GPT ᴼ ᴾ ᴱ ᴺ ᴬ ᴵ < br / > you can use np in after the groupby agg. The where condition function to for eligible to sum the data, Here is a sample code:

import numpy as np

# 假设你的数据框名为 df
index_list = list(df.columns[2:])  # 获取所有的指标列名

# 根据 Group 和 Year 进行分组
grouped = df.groupby(['Group', 'Year'])

# 对每组进行条件求和
for group_name, group_df in grouped:
    # 按照你的条件进行求和,下面是以 2015 年为例
    sum_index1 = np.sum(np.where(group_df['Year'] <= 2015, group_df['Index1'], 0))
    sum_index2 = np.sum(np.where(group_df['Year'] <= 2015, group_df['Index2'], 0))
    # 其他指标也可以按照相同的方式求和,这里只是示例
    print(f"Group: {group_name[0]}, Year: {group_name[1]}, Sum of Index1 before 2015: {sum_index1}, Sum of Index2 before 2015: {sum_index2}")

You can sum indices for other years in a similar way, just by modifying the condition in np.where.

davishao 注册会员
2023-02-28 02:15

This answer quotes ChatGPT

You can use condition filtering in conjunction with the groupby and agg functions for condition summation. Here is a sample code:

# 创建一个字典,用于存储每个分组在不同时间段内各个指标的和
grouped_sum = {'Group': [], 'Before_2010': [], 'Before_2015': [], 'Before_2020': []}
# 遍历每个分组
for group_name, group_data in df.groupby('Group'):
    # 将分组名称添加到字典中
    # 在 2010 年之前的指标求和
    before_2010_sum = group_data[group_data['Year'] < 2010][index_list[2:]].sum().tolist()
    # 在 2015 年之前的指标求和
    before_2015_sum = group_data[group_data['Year'] < 2015][index_list[2:]].sum().tolist()
    # 在 2020 年之前的指标求和
    before_2020_sum = group_data[group_data['Year'] < 2020][index_list[2:]].sum().tolist()
# 将结果转换成 DataFrame 格式
result_df = pd.DataFrame(grouped_sum)

In the above code, we first create an empty dictionary, grouped_sum, to store the sum of the metrics for each group over different time periods. We then use the groupby function to Group the data into group columns, and then iterate over each group. For each group, we use conditional filtering and the sum function to find the sum of the metrics over different time periods and add the results to the grouped_sum dictionary. Finally, we convert the grouped_sum dictionary into DataFrame format for subsequent operations.

Thus, you have a DataFrame where each row represents a group, each column represents a time period and a metric.

About the Author

Question Info

Publish Time
2023-02-28 02:15
Update Time
2023-02-28 02:15