Python train_test_split Guide

2 years ago

Isabella Edwards


The train_test_split function splits a dataset into a training set and a test set. It lives in the model_selection module of scikit-learn (imported as sklearn). Here is a simple example:

from sklearn.model_selection import train_test_split
import numpy as np

# Generate some sample data
X = np.array([[1, 2], [3, 4], [5, 6], [7, 8]])
y = np.array([0, 1, 0, 1])

# Split the dataset into training and test sets; test_size sets the test-set proportion, random_state sets the random seed
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Print the resulting training and test sets
print("Training X:", X_train)
print("Test X:", X_test)
print("Training y:", y_train)
print("Test y:", y_test)

In this example, X holds 4 samples and y holds their corresponding labels. We then call train_test_split to divide the dataset into a training set and a test set. The test_size parameter gives the proportion of samples assigned to the test set (here 0.2, or 20%). With only 4 samples, scikit-learn rounds the test count up to 1, so the actual split is 3 training samples and 1 test sample. The random_state parameter fixes the random seed so that the split is the same on every run. Finally, we print the training and test sets.
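To make the effect of test_size and random_state easier to see, here is a minimal sketch on a slightly larger dataset. The 10-sample arrays and the variable names first, second, and same_split are made up for illustration and are not part of the example above. It calls train_test_split twice with the same seed and checks that both calls return the same split.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Hypothetical data for illustration: 10 samples, 2 features each
X = np.arange(20).reshape(10, 2)
y = np.array([0, 1] * 5)

# Two independent calls with the same random_state
first = train_test_split(X, y, test_size=0.2, random_state=42)
second = train_test_split(X, y, test_size=0.2, random_state=42)

# Compare the four returned arrays pairwise (X_train, X_test, y_train, y_test)
same_split = all(np.array_equal(a, b) for a, b in zip(first, second))

X_train, X_test, y_train, y_test = first
print("Identical splits:", same_split)
print("Training samples:", len(X_train), "Test samples:", len(X_test))
```

With 10 samples and test_size=0.2, the split should come out as 8 training samples and 2 test samples. Leaving random_state unset would typically give a different shuffle on each run.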

#data splitting #machine learning #Python #sklearn #train_test_split