Pandas Resample Guide for Time Series

The resample method in pandas regroups time series data into bins of a specified frequency, for example turning daily observations into weekly ones. It returns a Resampler object, and an aggregation such as sum() or mean() is then applied to produce the result.
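
As a minimal sketch of that idea, the snippet below downsamples six daily values into 2-day bins. The series `s` and its values are made up for illustration and are not part of the guide's later example.

```python
import pandas as pd

# Six daily observations, 0..5, indexed by date (illustrative data).
s = pd.Series(range(6), index=pd.date_range("2021-01-01", periods=6, freq="D"))

# resample() alone only defines the bins; it returns a Resampler object.
resampler = s.resample("2D")

# Applying an aggregation collapses each 2-day bin to a single value.
two_day_sum = resampler.sum()    # 0+1, 2+3, 4+5
two_day_mean = resampler.mean()  # 0.5, 2.5, 4.5

print(two_day_sum)
print(two_day_mean)
```

Each result has three rows, labeled 2021-01-01, 2021-01-03 and 2021-01-05, because fixed-length frequencies such as '2D' label each bin by its left edge by default.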

The general syntax for the resample method, as found in pandas 1.x releases, is as follows:

dataframe.resample(rule, axis=0, closed=None, label=None, convention='start', kind=None, loffset=None, base=None, on=None, level=None, origin='start_day', offset=None)

Explanation of parameters:

  1. rule: the target frequency, given either as a frequency string (e.g. 'D' for daily, 'W' for weekly) or as an offset object (e.g. pd.DateOffset(days=1) for daily).
  2. axis: the axis to resample along; the default, 0, resamples along the rows. This parameter is deprecated in recent pandas releases (2.1 and later).
  3. closed: which side of each bin interval is closed, 'left' or 'right'. With the default of None, pandas uses 'left' for most frequencies and 'right' for week-, month-, quarter- and year-end frequencies such as 'W' and 'M'.
  4. label: which bin edge, 'left' or 'right', is used to label each bin in the result. The default of None follows the same rule as closed.
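  5. convention: 'start' or 'end'; applies only to data with a PeriodIndex, where it controls whether the start or the end of each period is used. The default is 'start'.
  6. kind: 'timestamp' or 'period'; converts the resulting index to a DatetimeIndex or a PeriodIndex. With the default of None, the index type of the input is retained. This parameter does not select an interpolation method; resample does not interpolate unless that is requested on the result, for example with .interpolate().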
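  7. loffset: an offset applied to the resampled time labels. Deprecated since pandas 1.1 and removed in 2.0; shift the index of the result after resampling instead.
  8. base: for frequencies that evenly subdivide one day, the origin of the aggregated intervals (for example, 0 through 4 for a 5-minute frequency). Deprecated since pandas 1.1 and removed in 2.0 in favor of origin and offset.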
  9. on: for a DataFrame, the name of a datetime-like column to resample on instead of the index.
  10. level: for a MultiIndex, the name or number of the datetime-like index level to resample on.
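  11. origin: the timestamp on which the bin edges are anchored, such as 'epoch', 'start', 'start_day' (the default, midnight of the first day in the data) or an explicit timestamp.
  12. offset: a timedelta added to origin, shifting every bin edge by that amount.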
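
The effect of closed, label and offset is easiest to see on a small series. The following is a sketch rather than a reference: the nine one-minute values are invented for illustration, and the offset argument assumes pandas 1.1 or later.

```python
import pandas as pd

# Nine one-minute observations, 0..8, starting at midnight (illustrative data).
idx = pd.date_range("2021-01-01 00:00", periods=9, freq="min")
s = pd.Series(range(9), index=idx)

# Defaults for a minute frequency: closed='left', label='left'.
# Bins: [00:00, 00:03), [00:03, 00:06), [00:06, 00:09)
left = s.resample("3min").sum()

# Close and label each bin on its right edge instead.
# Bins: (23:57, 00:00], (00:00, 00:03], (00:03, 00:06], (00:06, 00:09]
right = s.resample("3min", closed="right", label="right").sum()

# Shift every bin edge by one minute relative to the default origin.
# Bins: [23:58, 00:01), [00:01, 00:04), [00:04, 00:07), [00:07, 00:10)
shifted = s.resample("3min", offset="1min").sum()

print(left)
print(right)
print(shifted)
```

With the defaults the sums are 3, 12 and 21. Closing on the right moves the 00:00 observation into a bin of its own, giving 0, 6, 15 and 15, and the one-minute offset yields the same four sums with labels starting at 23:58 on the previous day.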

Here is an example using the resample method:

import pandas as pd

# Create a daily time series for January 2021
data = {'date': pd.date_range(start='2021-01-01', end='2021-01-31'),
        'value': range(31)}
df = pd.DataFrame(data)

# Resample the data by week, using the 'date' column, and sum each week
df_resampled = df.resample('W', on='date').sum()

print(df_resampled)

Output:

            value
date             
2021-01-03      3
2021-01-10     42
2021-01-17     91
2021-01-24    140
2021-01-31    189

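The code above creates a DataFrame of daily dates and values, resamples it by week on the date column, and sums the values in each week. The 'W' frequency produces weeks that end on Sunday and labels each bin with that closing date, so the first bin holds only January 1 to January 3 (values 0, 1 and 2), while every later bin holds a full seven days.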
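
To check how the days are distributed across the weekly bins, several aggregations can be requested at once. This is a sketch that rebuilds the same DataFrame; the variable name `weekly` and the choice of aggregations are illustrative.

```python
import pandas as pd

# Same data as in the example: 31 daily values, 0..30, for January 2021.
df = pd.DataFrame({
    "date": pd.date_range(start="2021-01-01", end="2021-01-31"),
    "value": range(31),
})

# For each Sunday-ending week, report how many rows fell into the bin,
# the smallest and largest value it contains, and the sum.
weekly = df.resample("W", on="date")["value"].agg(["count", "min", "max", "sum"])

print(weekly)
```

The count column shows 3 rows in the first bin and 7 in each of the others, which accounts for the much smaller first sum.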
