Fetching data from FTP with Apache NiFi

Create a process group:

image.png

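Goal

Periodically monitor the files in a specific FTP folder.
When a new file appears, fetch it over FTP and delete it from the source FTP folder.
Save the files fetched from FTP to the local HDF host, to HDFS, and to S3.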
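For readers who want to see the fetch-and-delete behaviour outside NiFi, here is a minimal Python sketch using the standard `ftplib` module. It is not how GetFTP is implemented; the function names, the 10-second default interval, and the choice to skip dot files are assumptions made for illustration only.

```python
import os
import time
from ftplib import FTP, error_perm


def fetch_and_delete(ftp, remote_dir, local_dir):
    """Download every file in remote_dir into local_dir, then delete it remotely.

    `ftp` is a connected, logged-in ftplib.FTP (or any object offering the
    same cwd/nlst/retrbinary/delete methods). Returns the fetched names.
    Sub-directories are not handled in this sketch.
    """
    os.makedirs(local_dir, exist_ok=True)
    ftp.cwd(remote_dir)
    try:
        names = ftp.nlst()
    except error_perm:
        # Some servers answer with a 550 error for an empty directory.
        names = []
    fetched = []
    for name in names:
        if name.startswith("."):
            continue  # skip dot files (illustrative choice)
        target = os.path.join(local_dir, os.path.basename(name))
        with open(target, "wb") as fh:
            ftp.retrbinary(f"RETR {name}", fh.write)
        ftp.delete(name)  # reached only if the download did not raise
        fetched.append(name)
    return fetched


def poll(host, user, password, remote_dir, local_dir, interval_sec=10):
    """Hypothetical polling loop: reconnect, fetch, sleep, repeat."""
    while True:
        with FTP(host) as ftp:  # ftplib uses passive mode by default
            ftp.login(user, password)
            fetch_and_delete(ftp, remote_dir, local_dir)
        time.sleep(interval_sec)
```

Deleting only after `retrbinary` returns means a failed download leaves the remote file in place for the next polling round.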

Select the processors to use inside the group.

Final result

image.png

Creating the NiFi data flow

Add the GetFTP processor

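    • Double-click the GetFTP processor and open its detailed settings.

    • On the "Scheduling" tab, change "Run Schedule" from 0 sec to 10 sec.

    • On the "Properties" tab, enter values such as Hostname, Username, and Password, and enter the path of the FTP folder to monitor in Remote Path.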
image.png
image.png
image.png
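The same GetFTP settings can also be written down as data. The sketch below builds a request body in the shape I understand the NiFi REST API to expect when creating a processor (a POST to `/nifi-api/process-groups/{id}/processors`). Treat that shape as an assumption and check it against the REST API documentation for your NiFi version. The property names and the values for Port, Connection Mode, Transfer Mode, and Delete Original come from the template in this article. The function name and the example hostname and credentials are hypothetical.

```python
import json


def build_getftp_request(hostname, username, password, remote_path,
                         run_schedule="10 sec"):
    """Return a JSON-serialisable body describing a GetFTP processor."""
    return {
        # Assumption: version 0 is used for a component that does not exist yet.
        "revision": {"version": 0},
        "component": {
            "type": "org.apache.nifi.processors.standard.GetFTP",
            "name": "GetFTP",
            "config": {
                "schedulingStrategy": "TIMER_DRIVEN",
                "schedulingPeriod": run_schedule,  # "Run Schedule" in the UI
                "properties": {
                    "Hostname": hostname,
                    "Port": "21",
                    "Username": username,
                    "Password": password,
                    "Connection Mode": "Passive",
                    "Transfer Mode": "Binary",
                    "Remote Path": remote_path,
                    "Delete Original": "true",
                },
            },
        },
    }


# Placeholder values, not a real server.
body = build_getftp_request("ftp.example.com", "user", "secret", "/test01/")
print(json.dumps(body, indent=2))
```

The sketch only builds and prints the body; sending it, and handling authentication, is left out on purpose.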

Add the PutFile processor: save the files fetched from FTP to the local file system

image.png

Save to HDFS: use PutHDFS
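As a rough picture of what this step does, the sketch below copies a local file into an HDFS directory by calling the `hdfs dfs -put` command. It assumes the `hdfs` client is installed and configured on the machine running the script. The function names and example paths are hypothetical, and PutHDFS itself does not shell out like this.

```python
import subprocess


def build_hdfs_put_command(local_path, hdfs_dir, overwrite=False):
    """Build the argument list for `hdfs dfs -put`."""
    cmd = ["hdfs", "dfs", "-put"]
    if overwrite:
        cmd.append("-f")  # -f overwrites an existing destination file
    cmd += [local_path, hdfs_dir]
    return cmd


def put_to_hdfs(local_path, hdfs_dir, overwrite=False):
    """Run the command; raises CalledProcessError if the copy fails."""
    subprocess.run(build_hdfs_put_command(local_path, hdfs_dir, overwrite),
                   check=True)
```

For example, `put_to_hdfs("/tmp/in/a.csv", "/data/ftp")` would copy one fetched file into the hypothetical HDFS directory `/data/ftp`.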

Use the PutS3Object processor to save the files to S3.

image.png

Specify the S3 destination in the Bucket property.
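A comparable upload outside NiFi might look like the sketch below. It assumes a boto3-style S3 client is passed in (for example one created with `boto3.client("s3")`). The helper names, the bucket name, and the key prefix are hypothetical and not taken from the flow above.

```python
import os


def object_key(local_path, prefix=""):
    """Derive the S3 object key from the file name and an optional prefix."""
    name = os.path.basename(local_path)
    prefix = prefix.strip("/")
    return f"{prefix}/{name}" if prefix else name


def upload_to_s3(client, bucket, local_path, prefix=""):
    """Upload one local file with the client's put_object call.

    `client` is expected to behave like a boto3 S3 client (an assumption).
    Returns the object key that was written.
    """
    key = object_key(local_path, prefix)
    with open(local_path, "rb") as fh:
        client.put_object(Bucket=bucket, Key=key, Body=fh)
    return key
```

Passing the client in as a parameter keeps the sketch free of credentials and makes it easy to try with a stand-in object before pointing it at a real bucket.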

Template source code

Template name: 01_FTP
Group ID: 50e163f2-0166-1000-0000-00006d5794e2

Connections

All three connections belong to parent group 8aeae414-8380-347f-0000-000000000000, start at the GetFTP processor (117f64d6-4e2a-354e-0000-000000000000), and carry its success relationship. Each has a back pressure data size threshold of 1 GB, a back pressure object threshold of 10000, a FlowFile expiration of 0 sec, label index 1, and z-index 0.

    • 429a0e9c-6d51-3297-0000-000000000000: destination processor b4ee730b-349e-3c9a-0000-000000000000
    • 60859d11-efdf-3783-0000-000000000000: destination processor cdf13c7b-f08f-35d5-0000-000000000000
    • 6b0f6a19-96ca-3756-0000-000000000000: destination processor 42baf105-4593-3345-0000-000000000000

GetFTP processor

ID: 117f64d6-4e2a-354e-0000-000000000000
Parent group ID: 8aeae414-8380-347f-0000-000000000000
Position: 0.0, 0.0
Bundle: nifi-standard-nar (group org.apache.nifi, version 1.7.0.3.2.0.0-520)
Bulletin level: WARN
Concurrent tasks: 1
Execution node: PRIMARY
Loss tolerant: false
Penalty duration: 30 sec
Run duration: 0
Scheduling period: 5 sec
Scheduling strategy: TIMER_DRIVEN
Yield duration: 1 sec
Execution node restricted: false
Relationship: success (auto-terminate: false)
State: STOPPED

GetFTP properties

Hostname: ftp.gmoserver.jp
Port: 21
Username: sd0068552.09@gmoserver.jp
Password: (not set)
Connection Mode: Passive
Transfer Mode: Binary
Remote Path: /test01/
File Filter Regex: (not set)
Path Filter Regex: (not set)
Polling Interval: 60 sec
Search Recursively: false
Ignore Dotted Files: true
Delete Original: true
Connection Timeout: 30 sec
Data Timeout: 30 sec
Max Selects: 100
Remote Poll Batch Size: 5000
Use Natural Ordering: false
proxy-configuration-service: (not set; controller service type org.apache.nifi.proxy.ProxyConfigurationService)
Proxy Type: DIRECT
Proxy Host: (not set)
Proxy Port: (not set)
Http Proxy Username: (not set)
Http Proxy Password: (not set)
Internal Buffer Size: 16KB
ftp-use-utf8: false
