Create PyTorch Dataset: Step-by-Step Guide

2 years ago

Noah Thompson

PyTorch provides the Dataset class (torch.utils.data.Dataset) as the base class for custom datasets. To create a dataset, subclass Dataset and implement the __len__ and __getitem__ methods.

The __len__ method returns the size of the dataset, which is the number of data samples.

The __getitem__ method returns the data sample at the given index. This is where you typically read data files, preprocess the data, and return the input and target that the model needs.

Here is a simple example demonstrating how to create a custom dataset class.

import torch
from torch.utils.data import Dataset

class CustomDataset(Dataset):
    def __init__(self, data):
        self.data = data

    def __len__(self):
        return len(self.data)

    def __getitem__(self, index):
        sample = self.data[index]
        # The data can be preprocessed here
        input_data = sample[:-1]
        target = sample[-1]
        return torch.tensor(input_data), torch.tensor(target)

In the example above, the CustomDataset class takes a list of samples as a parameter and implements the __len__ and __getitem__ methods. Each sample is assumed to be a sequence whose last element is the target. The __getitem__ method splits the sample into input data and target and returns both as tensors.
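To see how the two methods are called, here is a minimal sketch that builds the dataset from a small list and indexes it. The toy values in `rows` are made up for illustration, and the class is repeated only so the snippet runs on its own.

```python
import torch
from torch.utils.data import Dataset


class CustomDataset(Dataset):
    # Same class as in the example above, repeated so this snippet is self-contained.
    def __init__(self, data):
        self.data = data

    def __len__(self):
        return len(self.data)

    def __getitem__(self, index):
        sample = self.data[index]
        return torch.tensor(sample[:-1]), torch.tensor(sample[-1])


# Hypothetical toy data: two features followed by a target in each row.
rows = [
    [1.0, 2.0, 0.0],
    [3.0, 4.0, 1.0],
    [5.0, 6.0, 0.0],
]

dataset = CustomDataset(rows)
num_samples = len(dataset)      # calls __len__
features, target = dataset[1]   # calls __getitem__(1)

print(num_samples)     # 3
print(features.shape)  # torch.Size([2])
print(target.item())   # 1.0
```

Because the rows hold Python floats, torch.tensor infers float32 here. For integer class labels you would likely want to pass an explicit dtype.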

Once a custom dataset class has been created, the DataLoader class can be used to batch and shuffle the data and to iterate over it while training the model.
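As a rough sketch of that last step, the snippet below wraps the dataset in a DataLoader and loops over the batches. The toy data, batch_size=4, and shuffle=True are illustrative assumptions, and the training step itself is left as a comment.

```python
import torch
from torch.utils.data import DataLoader, Dataset


class CustomDataset(Dataset):
    # Repeated from the example above so this snippet is self-contained.
    def __init__(self, data):
        self.data = data

    def __len__(self):
        return len(self.data)

    def __getitem__(self, index):
        sample = self.data[index]
        return torch.tensor(sample[:-1]), torch.tensor(sample[-1])


# Hypothetical toy data: 10 rows of [feature_1, feature_2, target].
rows = [[float(i), float(i) * 2.0, float(i % 2)] for i in range(10)]
dataset = CustomDataset(rows)

# batch_size and shuffle are illustrative choices, not requirements.
loader = DataLoader(dataset, batch_size=4, shuffle=True)

batch_shapes = []
for inputs, targets in loader:
    # The default collate function stacks samples along a new first dimension.
    batch_shapes.append((tuple(inputs.shape), tuple(targets.shape)))
    # A training step (forward pass, loss, backward pass, optimizer step) would go here.

print(batch_shapes)
```

With 10 samples and a batch size of 4, the loader yields two full batches and a final batch of 2. Passing drop_last=True to DataLoader would discard that incomplete batch.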
