How to perform data filtering operations in Pig?
The FILTER operator is used in Pig to select rows from a relation. It takes a conditional expression after BY and keeps only the rows for which that expression evaluates to true; all other rows are discarded.
For example, suppose we have a dataset of student information containing each student's name and score, and we want to keep only the students whose score is greater than or equal to 60. We can use the following statements:
student_data = LOAD 'input/student_data' USING PigStorage(',') AS (name:chararray, score:int);
filtered_data = FILTER student_data BY score >= 60;
DUMP filtered_data;
The code above first loads the student data into a relation named student_data. It then uses FILTER to keep only the rows where the score is greater than or equal to 60, and stores the result in a relation named filtered_data. Finally, the DUMP statement prints the filtered data to the console.
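The conditional expression is not limited to a single comparison. As a minimal sketch, assuming the same student_data file and schema as above, the statements below combine conditions with AND and NOT, test for nulls, and match a pattern with MATCHES. The relation names mid_range, a_names and failing, the 60 to 90 range, and the 'A.*' pattern are illustrative assumptions, not part of the original example.

```pig
-- Sketch only: relation names, the 60-90 range and the 'A.*' pattern are illustrative.
student_data = LOAD 'input/student_data' USING PigStorage(',') AS (name:chararray, score:int);

-- Keep rows whose score is present and lies between 60 and 90 inclusive.
mid_range = FILTER student_data BY (score IS NOT NULL) AND (score >= 60) AND (score <= 90);

-- Keep rows whose name starts with 'A'. MATCHES takes a Java regular
-- expression that must match the whole string.
a_names = FILTER student_data BY name MATCHES 'A.*';

-- Keep rows that do not satisfy the passing condition, i.e. scores below 60.
failing = FILTER student_data BY NOT (score >= 60);

DUMP mid_range;
```

Note that a comparison against a null score evaluates to null rather than true, so rows with a missing score are dropped by FILTER unless the condition tests for null explicitly.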