Fetching Data from FTP with Apache NiFi
Created by:

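Goals
Periodically monitor the files in a specific folder on an FTP server.
As soon as a new file appears, fetch it over FTP and delete it from the original FTP folder.
Save the files fetched from FTP to the local file system (on the HDF host), to HDFS, and to S3.
Group the processors together.
Final result (flow diagram)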

Creating the NiFi data flow
Add the GetFTP processor
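- Double-click the GetFTP processor to open its configuration.
- On the "Scheduling" tab, change "Run schedule" from 0 sec to 10 sec.
- On the "Properties" tab, enter Hostname, Username, Password, and the other connection values, and set Remote Path to the path of the FTP folder to monitor.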
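
To make the effect of these settings concrete, here is a minimal Python sketch of the same list, download, and delete cycle. It is not how NiFi implements GetFTP; it only mirrors the properties named above using the standard-library ftplib. The function names (connect, select_files, fetch_and_delete) are hypothetical, the regex is assumed to be applied as a full match on the file name, and the sketch assumes the monitored folder contains only regular files.

```python
import re
from ftplib import FTP
from pathlib import Path


def connect(hostname, username, password, port=21, timeout=30):
    """Open a passive-mode FTP session (hypothetical helper, not a NiFi API)."""
    ftp = FTP(timeout=timeout)      # roughly "Connection Timeout"
    ftp.connect(hostname, port)
    ftp.login(username, password)
    ftp.set_pasv(True)              # "Connection Mode": Passive
    return ftp


def select_files(names, file_filter_regex=None, ignore_dotted=True):
    """Filter a listing by "Ignore Dotted Files" and "File Filter Regex"."""
    selected = []
    for name in names:
        base = name.rsplit("/", 1)[-1]   # some servers return full paths
        if not base:
            continue
        if ignore_dotted and base.startswith("."):
            continue
        if file_filter_regex and not re.fullmatch(file_filter_regex, base):
            continue
        selected.append(base)
    return selected


def fetch_and_delete(ftp, remote_path, local_dir,
                     file_filter_regex=None, delete_original=True):
    """Download each selected file, then remove it from the server."""
    local_dir = Path(local_dir)
    local_dir.mkdir(parents=True, exist_ok=True)
    ftp.cwd(remote_path)                 # "Remote Path"
    fetched = []
    for name in select_files(ftp.nlst(), file_filter_regex):
        target = local_dir / name
        with open(target, "wb") as out:
            ftp.retrbinary(f"RETR {name}", out.write)   # "Transfer Mode": Binary
        if delete_original:              # "Delete Original": true
            ftp.delete(name)
        fetched.append(target)
    return fetched
```

Calling fetch_and_delete on a logged-in session every 10 seconds would approximate the Run schedule set above; in NiFi the scheduler does this for you.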



Add the PutFile processor: save the files fetched from FTP locally
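
As a rough illustration of what this step does, the Python sketch below writes one file's content into a target directory. The conflict argument is modeled on PutFile's "Conflict Resolution Strategy" property (fail, ignore, replace); the function name put_file and the temp-file-then-rename step are choices made for this sketch, not a description of NiFi internals.

```python
from pathlib import Path


def put_file(content: bytes, filename: str, directory: str,
             conflict: str = "fail") -> bool:
    """Write content to directory/filename; return True if a file was written."""
    if conflict not in ("fail", "ignore", "replace"):
        raise ValueError(f"unknown conflict strategy: {conflict}")
    target_dir = Path(directory)
    target_dir.mkdir(parents=True, exist_ok=True)
    target = target_dir / Path(filename).name    # drop any path components
    if target.exists():
        if conflict == "fail":
            raise FileExistsError(str(target))
        if conflict == "ignore":
            return False                         # keep the existing file
    # Write to a temporary name first so readers never see a partial file.
    tmp = target.with_name("." + target.name + ".tmp")
    tmp.write_bytes(content)
    tmp.replace(target)
    return True
```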

Save to HDFS: use the PutHDFS processor
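
For comparison, the same copy can be done outside NiFi with the hdfs command-line client. The Python sketch below only builds and runs an "hdfs dfs -put" command; it is a stand-in for what PutHDFS achieves, not its implementation, and it assumes the hdfs client is installed and configured on the host. The function names and the example directory /data/ftp are hypothetical.

```python
import posixpath
import subprocess


def hdfs_put_command(local_path, hdfs_dir, overwrite=True):
    """Build the argument list for copying one local file into hdfs_dir."""
    filename = posixpath.basename(local_path.replace("\\", "/"))
    destination = posixpath.join(hdfs_dir, filename)
    cmd = ["hdfs", "dfs", "-put"]
    if overwrite:
        cmd.append("-f")                 # overwrite an existing destination
    cmd += [local_path, destination]
    return cmd


def put_hdfs(local_path, hdfs_dir, overwrite=True):
    """Run the copy; raises CalledProcessError if the hdfs client fails."""
    subprocess.run(hdfs_put_command(local_path, hdfs_dir, overwrite), check=True)
```

For example, hdfs_put_command("/tmp/in/a.csv", "/data/ftp") yields the arguments for copying a.csv to /data/ftp/a.csv.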
Save to S3: use the PutS3Object processor

Enter the destination S3 location in the Bucket property.
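
PutS3Object has separate Bucket and Object Key properties, so a full S3 path usually has to be split: the bucket name goes into Bucket and any folder prefix becomes part of the object key. The Python sketch below shows that split; the function name split_s3_path and the example bucket are hypothetical.

```python
from urllib.parse import urlparse


def split_s3_path(s3_path, filename):
    """Split 's3://bucket/prefix' into (bucket, object_key) for one file."""
    if "://" not in s3_path:
        s3_path = "s3://" + s3_path      # accept 'bucket/prefix' as well
    parsed = urlparse(s3_path)
    bucket = parsed.netloc
    if not bucket:
        raise ValueError("no bucket name in S3 path")
    prefix = parsed.path.strip("/")
    key = f"{prefix}/{filename}" if prefix else filename
    return bucket, key
```

With boto3 installed, the resulting pair could be passed to boto3.client("s3").upload_file(local_path, bucket, key) to perform the equivalent upload outside NiFi.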
Template source code
groupId: 50e163f2-0166-1000-0000-00006d5794e2
name: 01_FTP
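Connection 1
id: 429a0e9c-6d51-3297-0000-000000000000
parentGroupId: 8aeae414-8380-347f-0000-000000000000
backPressureDataSizeThreshold: 1 GB
backPressureObjectThreshold: 10000
destination: groupId 8aeae414-8380-347f-0000-000000000000, id b4ee730b-349e-3c9a-0000-000000000000, type PROCESSOR
flowFileExpiration: 0 sec
labelIndex: 1
selectedRelationships: success
source: groupId 8aeae414-8380-347f-0000-000000000000, id 117f64d6-4e2a-354e-0000-000000000000, type PROCESSOR
zIndex: 0

Connection 2
id: 60859d11-efdf-3783-0000-000000000000
parentGroupId: 8aeae414-8380-347f-0000-000000000000
backPressureDataSizeThreshold: 1 GB
backPressureObjectThreshold: 10000
destination: groupId 8aeae414-8380-347f-0000-000000000000, id cdf13c7b-f08f-35d5-0000-000000000000, type PROCESSOR
flowFileExpiration: 0 sec
labelIndex: 1
selectedRelationships: success
source: groupId 8aeae414-8380-347f-0000-000000000000, id 117f64d6-4e2a-354e-0000-000000000000, type PROCESSOR
zIndex: 0

Connection 3
id: 6b0f6a19-96ca-3756-0000-000000000000
parentGroupId: 8aeae414-8380-347f-0000-000000000000
backPressureDataSizeThreshold: 1 GB
backPressureObjectThreshold: 10000
destination: groupId 8aeae414-8380-347f-0000-000000000000, id 42baf105-4593-3345-0000-000000000000, type PROCESSOR
flowFileExpiration: 0 sec
labelIndex: 1
selectedRelationships: success
source: groupId 8aeae414-8380-347f-0000-000000000000, id 117f64d6-4e2a-354e-0000-000000000000, type PROCESSOR
zIndex: 0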

Processor
id: 117f64d6-4e2a-354e-0000-000000000000
parentGroupId: 8aeae414-8380-347f-0000-000000000000
position: x 0.0, y 0.0
bundle: artifact nifi-standard-nar, group org.apache.nifi, version 1.7.0.3.2.0.0-520
bulletinLevel: WARN
concurrentlySchedulableTaskCount: 1
descriptors: Hostname, Port, Username, Password, Connection Mode, Transfer Mode, Remote Path, File Filter Regex, Path Filter Regex, Polling Interval, Search Recursively, Ignore Dotted Files, Delete Original, Connection Timeout, Data Timeout, Max Selects, Remote Poll Batch Size, Use Natural Ordering, proxy-configuration-service (controller service: org.apache.nifi.proxy.ProxyConfigurationService), Proxy Type, Proxy Host, Proxy Port, Http Proxy Username, Http Proxy Password, Internal Buffer Size, ftp-use-utf8
executionNode: PRIMARY
lossTolerant: false
penaltyDuration: 30 sec
properties:
Hostname: ftp.gmoserver.jp
Port: 21
Username: sd0068552.09@gmoserver.jp
Password: (empty)
Connection Mode: Passive
Transfer Mode: Binary
Remote Path: /test01/
File Filter Regex: (empty)
Path Filter Regex: (empty)
Polling Interval: 60 sec
Search Recursively: false
Ignore Dotted Files: true
Delete Original: true
Connection Timeout: 30 sec
Data Timeout: 30 sec
Max Selects: 100
Remote Poll Batch Size: 5000
Use Natural Ordering: false
proxy-configuration-service: (empty)
Proxy Type: DIRECT
Proxy Host: (empty)
Proxy Port: (empty)
Http Proxy Username: (empty)
Http Proxy Password: (empty)
Internal Buffer Size: 16KB
ftp-use-utf8: false
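runDurationMillis: 0
schedulingPeriod: 5 sec
schedulingStrategy: TIMER_DRIVEN
yieldDuration: 1 sec
executionNodeRestricted: false
name: GetFTP
relationships: autoTerminate false, name success
state: STOPPED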