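Trying out SwarmKit (Docker's orchestration tool)
Addendum (2016/6/16)
Docker itself has now gained integration with SwarmKit.
Pull request: GH-23361
It looks like the direction is to integrate it into the Docker CLI rather than to use SwarmKit on its own.
The following commands have been added to the docker CLI.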
- swarm init
- swarm join
- swarm leave
- swarm update
Commands for managing the nodes of a swarm cluster were also added.
- node_accept
- node_reject
- node_promote
- node_demote
- node_inspect
- node_update
- node_tasks
- node_ls
- node_rm
Commands for running services on the swarm cluster were also added.
- service create
- service inspect
- service update
- service remove
- service tasks
Finally, for applications made up of multiple services, a Docker Stacks feature (command) was added as well.
- stack
- deploy
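Details and documentation for all of these commands are here.
The documentation for Docker Stacks is here.
For reference, the help for swarm init and swarm join apparently looks like this:
swarm init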
Usage: docker swarm init [OPTIONS]
Initialize a Swarm.
Options:
--auto-accept value Acceptance policy (default [worker,manager])
--force-new-cluster Force create a new cluster from current state.
--help Print usage
--listen-addr value Listen address (default 0.0.0.0:2377)
--secret string Set secret value needed to accept nodes into cluster
Initializes a Swarm. The Docker Engine targeted by this command becomes the manager of the newly created single-node Swarm.
swarm join
Usage: docker swarm join [OPTIONS] HOST:PORT
Join a Swarm as a node and/or manager.
Options:
--help Print usage
--listen-addr value Listen address (default 0.0.0.0:2377)
--manager Try joining as a manager.
--secret string Secret for node acceptance
Joins a node to a Swarm. If the --manager flag is given, the Docker Engine targeted by this command becomes a manager; otherwise it becomes a worker node.
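To see how the two commands fit together, here is a rough sketch that bootstraps a swarm using only the pre-release flags quoted above (`--listen-addr`, `--secret`, `--manager`). The manager address `192.168.99.100:2377` and the secret `changeme` are hypothetical placeholders, and since this CLI had not been released yet, the flags may well change. With `DRY_RUN=1` the functions only print the commands they would run.

```shell
#!/bin/sh
# Sketch: bootstrap a swarm with the pre-release docker CLI flags quoted above.
# MANAGER_ADDR and SWARM_SECRET are hypothetical placeholders; override them via the environment.
MANAGER_ADDR=${MANAGER_ADDR:-192.168.99.100:2377}
SWARM_SECRET=${SWARM_SECRET:-changeme}

# Print each command; execute it only when DRY_RUN is unset or empty.
run() {
  echo "+ $*"
  [ -n "${DRY_RUN:-}" ] || "$@"
}

# On the first host: become the manager of a new single-node swarm.
init_manager() {
  run docker swarm init --listen-addr 0.0.0.0:2377 --secret "$SWARM_SECRET"
}

# On every other host: join as a worker, or as a manager when called as "join_node manager".
join_node() {
  if [ "${1:-}" = "manager" ]; then
    run docker swarm join --secret "$SWARM_SECRET" --manager "$MANAGER_ADDR"
  else
    run docker swarm join --secret "$SWARM_SECRET" "$MANAGER_ADDR"
  fi
}
```

Run `init_manager` on the first host, then `join_node` (or `join_node manager`) on each additional host.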
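--- As of 2016/6/16 ---
While casually browsing Docker's repositories, I came across something called SwarmKit.
Looking into it, it seems to have been released only just now, and there isn't much information to go on yet.
Introductory article: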
- First Look at Docker SwarmKit
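It caught my interest, so I decided to try it out right away.
What is SwarmKit?
In short, it is a tool for orchestrating Docker containers in a distributed, multi-host environment. Similar products include HashiCorp's Nomad and Mesos + Marathon.
Since my own translation might be misleading, here is a passage quoted from the README; see the README for details.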
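SwarmKit is a toolkit for orchestrating distributed systems at any scale. It includes primitives for node discovery, raft-based consensus, task scheduling and more.
Its main benefits are:
Distributed: SwarmKit uses the Raft Consensus Algorithm in order to coordinate and does not rely on a single point of failure to perform decisions.
Secure: Node communication and membership within a Swarm are secure out of the box. SwarmKit uses mutual TLS for node authentication, role authorization and transport encryption, automating both certificate issuance and rotation.
Simple: SwarmKit is operationally simple and minimizes infrastructure dependencies. It does not need an external database to operate.
Like Consul and Nomad, it appears to use a Raft implementation.
The full list of features is in the Features section.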
Setup
Environment
I'm working on a Mac (OS X) with Docker for Mac (beta).
If you're on a different environment, adjust accordingly.
Installation
SwarmKit does not seem to be distributed as prebuilt binaries, so I fetched the source from the GitHub repository and built it.
# Fetch the source
go get -d github.com/docker/swarmkit/...
cd $GOPATH/src/github.com/docker/swarmkit
# Verify: run the tests, excluding vendored packages
go test $(go list ./... | grep -v vendor)
# Build
make binaries
This creates the following files in the bin directory.
bin/
├── protoc-gen-gogoswarm
├── swarm-bench
├── swarmctl
└── swarmd
From here on, I assume swarmd and swarmctl are on your PATH.
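One way to do that without installing anything is a small helper like the one below. It is a sketch, and the function name `use_swarmkit_bin` is my own: it checks that `make binaries` produced `swarmd` and `swarmctl` in the given directory and then puts that directory at the front of `PATH`.

```shell
#!/bin/sh
# Sketch: put freshly built SwarmKit binaries on PATH.
# $1: directory containing swarmd and swarmctl (e.g. the bin/ created by "make binaries").
use_swarmkit_bin() {
  dir=$1
  for b in swarmd swarmctl; do
    # Refuse to touch PATH if either binary is missing or not executable.
    if [ ! -x "$dir/$b" ]; then
      echo "missing $dir/$b; run 'make binaries' first" >&2
      return 1
    fi
  done
  PATH="$dir:$PATH"
  export PATH
}
```

Source the file (`. ./swarmkit-env.sh`, a hypothetical file name) so the PATH change persists, then call `use_swarmkit_bin "$GOPATH/src/github.com/docker/swarmkit/bin"`; this assumes GOPATH holds a single directory.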
Setting up a swarm cluster
Let's try swarmd. Its help output is as follows.
$ swarmd --help
Run a swarm control process
Usage:
bin/swarmd [flags]
Flags:
-c, --ca-hash string Specifies the remote CA root certificate hash, necessary to join the cluster securely
--election-tick value Defines the amount of ticks (in seconds) needed without a Leader to trigger a new election (default 3)
--engine-addr string Address of engine instance of agent. (default "unix:///var/run/docker.sock")
--force-new-cluster Force the creation of a new cluster from data directory
--heartbeat-tick value Defines the heartbeat interval (in seconds) for raft member health-check (default 1)
--hostname string Override reported agent hostname
--join-addr string Join cluster with a node at this address
--listen-control-api string Listen socket for control API (default "/var/run/docker/cluster/docker-swarmd.sock")
--listen-debug string Bind the Go debug server on the provided address
--listen-remote-api string Listen address for remote API (default "0.0.0.0:4242")
-l, --log-level string Log level (options "debug", "info", "warn", "error", "fatal", "panic") (default "info")
--manager Request initial CSR in a manager role
-s, --secret string Specifies the secret token required to join the cluster
-d, --state-dir string State directory (default "/var/lib/docker/cluster")
-v, --version Display the version and exit
Creating the first node
Create the first node with the following command.
$ swarmd -d /tmp/node-1 --listen-control-api /tmp/manager1/swarm.sock --hostname node-1

Next, let's add two more nodes.
$ swarmd -d /tmp/node-2 --hostname node-2 --join-addr 127.0.0.1:4242
$ swarmd -d /tmp/node-3 --hostname node-3 --join-addr 127.0.0.1:4242
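Each swarmd above keeps running in the foreground, so the three commands need separate terminals. As an alternative, here is a sketch that starts the same three nodes in the background from one script, using exactly the flags shown above. The log file location (`$BASE/<node>.log`) and the 3-second wait for node-1 are my own arbitrary choices, and `DRY_RUN=1` prints the commands instead of starting anything.

```shell
#!/bin/sh
# Sketch: start the three-node cluster above in the background on one host.
# Uses only the swarmd flags shown in its --help output.
BASE=${BASE:-/tmp}
JOIN_ADDR=${JOIN_ADDR:-127.0.0.1:4242}   # node-1's default --listen-remote-api port

# launch NAME CMD...: run CMD in the background, logging to $BASE/NAME.log.
launch() {
  name=$1; shift
  if [ -n "${DRY_RUN:-}" ]; then
    echo "$*"
  else
    "$@" > "$BASE/$name.log" 2>&1 &
  fi
}

start_cluster() {
  # node-1 bootstraps the cluster and exposes the control socket swarmctl talks to.
  launch node-1 swarmd -d "$BASE/node-1" \
    --listen-control-api "$BASE/manager1/swarm.sock" --hostname node-1
  # Give node-1 a moment to come up before the others join (arbitrary delay).
  [ -n "${DRY_RUN:-}" ] || sleep 3
  for n in 2 3; do
    launch "node-$n" swarmd -d "$BASE/node-$n" --hostname "node-$n" --join-addr "$JOIN_ADDR"
  done
}
```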

Management operations are done with the swarmctl command.
Let's list the nodes in the swarm cluster.
# Socket that swarmctl connects to
$ export SWARM_SOCKET=/tmp/manager1/swarm.sock
# List nodes (ls)
$ swarmctl node ls

Three nodes have joined the swarm cluster.
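When scripting the setup, it helps to wait until all nodes have joined before creating services. Here is a sketch that counts READY nodes in `swarmctl node ls` output and polls until the expected number is reached. It assumes the column layout that appears in the drain section below (two header lines, Status in the fourth column); that layout may differ in other SwarmKit versions, and the 30-second limit is arbitrary.

```shell
#!/bin/sh
# Sketch: count nodes that "swarmctl node ls" reports as READY (read from stdin).
# Assumes two header lines followed by rows of: ID Name Membership Status ...
count_ready_nodes() {
  awk 'NR > 2 && $4 == "READY" { n++ } END { print n + 0 }'
}

# Poll until at least $1 nodes are READY, giving up after about 30 seconds.
wait_for_nodes() {
  i=0
  while [ "$(swarmctl node ls | count_ready_nodes)" -lt "$1" ]; do
    i=$((i + 1))
    [ "$i" -ge 30 ] && return 1
    sleep 1
  done
}
```

For the cluster above, `wait_for_nodes 3` should return once all three nodes are up.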
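Running services on the swarm cluster
To run a service on the cluster, it looks like you use the swarmctl service subcommand. Its help output is as follows: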
$ swarmctl help service
Service management
Usage:
bin/swarmctl service [command]
Aliases:
service, svc
Available Commands:
inspect Inspect a service
ls List services
create Create a service
update Update a service
remove Remove a service
Global Flags:
-n, --no-resolve Do not try to map IDs to Names when displaying them
-s, --socket string Socket to connect to the Swarm manager (default "/tmp/manager1/swarm.sock")
Use "swarmctl service [command] --help" for more information about a command.
Let's start something right away. Here, I'll try starting redis.
Starting a service
$ swarmctl service create --name redis --image redis:3.0.5
Did it start?
Checking the service
You can check with the swarmctl service ls command.
$ swarmctl service ls
ID Name Image Instances
-- ---- ----- ---------
2v5ima52ubn1szmrw5qxwiy00 redis redis:3.0.5 1
It looks like it started.
Checking service details
It seems you can inspect the details with swarmctl service inspect [service name].
$ swarmctl service inspect redis
ID : 2v5ima52ubn1szmrw5qxwiy00
Name : redis
Instances : 1
Template
Container
Image : redis:3.0.5
Task ID Service Instance Image Desired State Last State Node
------- ------- -------- ----- ------------- ---------- ----
f1r7p3b5m3hxvi7ggd14yppxe redis 1 redis:3.0.5 RUNNING RUNNING 3 minutes ago node-1
While we're at it, let's also check from the docker command side.
$ docker ps -a
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
e57a1aca66f3 redis:3.0.5 "/entrypoint.sh redis" About a minute ago Up About a minute 6379/tcp redis.1.f1r7p3b5m3hxvi7ggd14yppxe
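You can see that the Redis container is running.
Updating a service
It looks like you can update attributes such as the number of instances. Let's try increasing the instance count to 6.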
# Increase the instance count
$ swarmctl service update redis --instances 6
2v5ima52ubn1szmrw5qxwiy00
# Check the result
$ swarmctl service inspect redis
ID : 2v5ima52ubn1szmrw5qxwiy00
Name : redis
Instances : 6
Template
Container
Image : redis:3.0.5
Task ID Service Instance Image Desired State Last State Node
------- ------- -------- ----- ------------- ---------- ----
f1r7p3b5m3hxvi7ggd14yppxe redis 1 redis:3.0.5 RUNNING RUNNING 7 minutes ago node-1
37xq1iluq89q9ln140way2wuj redis 2 redis:3.0.5 RUNNING RUNNING 18 seconds ago node-3
chvth7r9ol2nhll0o6basebap redis 3 redis:3.0.5 RUNNING RUNNING 18 seconds ago node-2
6d79qrshr74d1x7x0y57hercd redis 4 redis:3.0.5 RUNNING RUNNING 18 seconds ago node-2
etfpywpqz9zi0x90mhbzly5zs redis 5 redis:3.0.5 RUNNING RUNNING 18 seconds ago node-1
43fackr1v05f6w7yuwc2uf2px redis 6 redis:3.0.5 RUNNING RUNNING 18 seconds ago node-3
The instance count went up, and the instances appear to be spread across the nodes (two per node).
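To check the spread without reading the table by eye, a sketch like this tallies tasks per node from `swarmctl service inspect` output. It assumes the layout shown above, where the task rows follow the dashed separator line and the last column is the node name.

```shell
#!/bin/sh
# Sketch: tally how many tasks of a service run on each node.
# Reads "swarmctl service inspect" output on stdin; task rows are assumed to
# follow the dashed separator line and to end with the node name.
tasks_per_node() {
  awk '/^-------/ { in_tasks = 1; next }
       in_tasks && NF { count[$NF]++ }
       END { for (node in count) print node, count[node] }' | sort
}
```

With the output above, `swarmctl service inspect redis | tasks_per_node` would print `node-1 2`, `node-2 2` and `node-3 2`.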
Let's look at it from the docker command side as well.
$ docker ps -a
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
44747f6bf347 redis:3.0.5 "/entrypoint.sh redis" About a minute ago Up About a minute 6379/tcp redis.6.43fackr1v05f6w7yuwc2uf2px
655a2bf55185 redis:3.0.5 "/entrypoint.sh redis" About a minute ago Up About a minute 6379/tcp redis.4.6d79qrshr74d1x7x0y57hercd
e1a1e2e993bc redis:3.0.5 "/entrypoint.sh redis" About a minute ago Up About a minute 6379/tcp redis.3.chvth7r9ol2nhll0o6basebap
43bc0449a374 redis:3.0.5 "/entrypoint.sh redis" About a minute ago Up About a minute 6379/tcp redis.2.37xq1iluq89q9ln140way2wuj
9cbb0c1a419a redis:3.0.5 "/entrypoint.sh redis" About a minute ago Up About a minute 6379/tcp redis.5.etfpywpqz9zi0x90mhbzly5zs
e57a1aca66f3 redis:3.0.5 "/entrypoint.sh redis" 9 minutes ago Up 9 minutes 6379/tcp redis.1.f1r7p3b5m3hxvi7ggd14yppxe
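Because all of the nodes run on the same host this time, all six instances are visible. If each node ran on its own host, I expect docker ps would show only the instances running on that particular node. (I haven't tried this.)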
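The container names in the docker ps output appear to follow the pattern `<service>.<instance>.<task ID>`; for example, `redis.6.43fackr1v05f6w7yuwc2uf2px` matches instance 6, task `43fackr1v05f6w7yuwc2uf2px` in the inspect output. Assuming that pattern holds, this sketch maps the containers on a host back to instance numbers and task IDs, which would be handy on a multi-host setup where each host only sees its own containers.

```shell
#!/bin/sh
# Sketch: split container names of the form <service>.<instance>.<task ID>
# (the pattern seen in the docker ps output above) into "instance task-ID" pairs
# for service $1, sorted by instance number. Reads one name per line on stdin.
split_task_names() {
  awk -F. -v svc="$1" '$1 == svc { print $2, $3 }' | sort -n
}
```

Usage: `docker ps --format '{{.Names}}' | split_task_names redis`.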
It seems swarmctl service update can also change the image being run and perform rolling updates.
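A rolling image update might look like the sketch below. This rests on an assumption: `--image`, `--update-parallelism` and `--update-delay` appear in the `service create` help in the appendix, but I have not confirmed that `service update` accepts them, so check `swarmctl service update --help` on your build first. The `redis:3.0.6` tag is only an example.

```shell
#!/bin/sh
# Sketch of a rolling image update with swarmctl.
# ASSUMPTION: "service update" accepts the same --image/--update-* flags that
# "service create" lists in its help. Set DRY_RUN=1 to only print the command.
run() {
  echo "+ $*"
  [ -n "${DRY_RUN:-}" ] || "$@"
}

# rolling_update SERVICE IMAGE
rolling_update() {
  service=$1 image=$2
  # Update 2 tasks at a time, waiting 10s between batches.
  run swarmctl service update "$service" --image "$image" \
    --update-parallelism 2 --update-delay 10s
}
```

Usage: `rolling_update redis redis:3.0.6`, then watch the task list with `swarmctl service inspect redis`.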
Draining a node (drain)
Let's take a node out of service and watch how the service instances get rescheduled.
# Drain node-1
$ swarmctl node drain node-1
# Check
$ swarmctl node ls
ID Name Membership Status Availability Manager status
-- ---- ---------- ------ ------------ --------------
08ru842rs9yam node-1 ACCEPTED READY DRAIN REACHABLE *
185o4l4u6nnqh node-3 ACCEPTED READY ACTIVE
3ebkf6w3m1dnd node-2 ACCEPTED READY ACTIVE
Checking the rescheduling
Since node-1 has been drained, all six instances should now be placed on node-2 and node-3. Let's check.
$ swarmctl service inspect redis
ID : 2v5ima52ubn1szmrw5qxwiy00
Name : redis
Instances : 6
Template
Container
Image : redis:3.0.5
Task ID Service Instance Image Desired State Last State Node
------- ------- -------- ----- ------------- ---------- ----
6dfvxo1hx0ml7wvuly76qzox7 redis 1 redis:3.0.5 RUNNING RUNNING 2 minutes ago node-2
37xq1iluq89q9ln140way2wuj redis 2 redis:3.0.5 RUNNING RUNNING 9 minutes ago node-3
chvth7r9ol2nhll0o6basebap redis 3 redis:3.0.5 RUNNING RUNNING 9 minutes ago node-2
6d79qrshr74d1x7x0y57hercd redis 4 redis:3.0.5 RUNNING RUNNING 9 minutes ago node-2
93vr14ppi1quw5iekzuct8r5i redis 5 redis:3.0.5 RUNNING RUNNING 2 minutes ago node-3
43fackr1v05f6w7yuwc2uf2px redis 6 redis:3.0.5 RUNNING RUNNING 9 minutes ago node-3
The instances that were running on node-1 (instances 1 and 5) are gone from it; they were replaced by new tasks on node-2 and node-3.
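To confirm a drain in a script rather than by reading the table, a check like the following could be used. It is a sketch that assumes the same `service inspect` layout as above, and it succeeds only when no task row ends with the given node name.

```shell
#!/bin/sh
# Sketch: succeed (exit 0) only if no task in "swarmctl service inspect" output
# (read from stdin) is placed on node $1. Task rows are assumed to follow the
# dashed separator line and to end with the node name.
node_is_empty() {
  awk -v node="$1" '/^-------/ { in_tasks = 1; next }
                    in_tasks && $NF == node { found = 1 }
                    END { exit (found ? 1 : 0) }'
}
```

Usage: `swarmctl service inspect redis | node_is_empty node-1 && echo "node-1 is drained"`.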
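Appendix: help output for each swarmctl subcommand
For anyone who doesn't want to bother building it, here is the help output. For subcommands and sub-subcommands, I've listed the ones that take options. (If there's something you'd like to see that isn't listed, let @yamamoto-febc know and I'll add it.)
swarmctl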
$ swarmctl --help
Control a swarm cluster
Usage:
bin/swarmctl [command]
Available Commands:
node Node management
service Service management
task Task management
version Print version number of swarm.
network Network management
cluster Cluster management
Flags:
-n, --no-resolve Do not try to map IDs to Names when displaying them
-s, --socket string Socket to connect to the Swarm manager (default "/var/run/docker/cluster/docker-swarmd.sock")
Use "swarmctl [command] --help" for more information about a command.
swarmctl node
$ swarmctl node --help
Node management
Usage:
swarmctl node [command]
Available Commands:
accept Accept a node into the cluster
remove Remove a node
inspect Inspect a node
ls List nodes
activate Activate a node
pause Pause a node
drain Drain a node
promote Promote a node to a manager
demote Demote a node from a manager to a worker
Global Flags:
-n, --no-resolve Do not try to map IDs to Names when displaying them
-s, --socket string Socket to connect to the Swarm manager (default "/var/run/docker/cluster/docker-swarmd.sock")
Use "swarmctl node [command] --help" for more information about a command.
swarmctl service
$ swarmctl service --help
Service management
Usage:
swarmctl service [command]
Aliases:
service, svc
Available Commands:
inspect Inspect a service
ls List services
create Create a service
update Update a service
remove Remove a service
Global Flags:
-n, --no-resolve Do not try to map IDs to Names when displaying them
-s, --socket string Socket to connect to the Swarm manager (default "/var/run/docker/cluster/docker-swarmd.sock")
Use "swarmctl service [command] --help" for more information about a command.
swarmctl service create
$ swarmctl service create --help
Create a service
Usage:
bin/swarmctl service create [flags]
Flags:
--args value container args (default [])
--constraint value Placement constraint (node.labels.key==value) (default [])
--cpu-limit string CPU cores limit (e.g. 0.5)
--cpu-reservation string number of CPU cores reserved (e.g. 0.5)
--env value container env (default [])
--image string container image
--instances uint number of instances for the service (default 1)
--label value service label (key=value) (default [])
--memory-limit string memory limit (e.g. 512m)
--memory-reservation string amount of reserved memory (e.g. 512m)
--mode string one of replicated, global (default "replicated")
--name string service name
--network string network name
--ports value ports (default [])
--restart-condition string condition to restart the task (any, failure, none) (default "any")
--restart-delay string delay between task restarts (default "5s")
--restart-max-attempts uint maximum number of restart attempts (0 = unlimited)
--restart-window string time window to evaluate restart attempts (0 = unbound) (default "0s")
--update-delay string delay between task updates (0s = none) (default "0s")
--update-parallelism uint task update parallelism (0 = all at once)
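As an example of combining several of these flags, here is a sketch that creates a replicated service with resource limits and a restart policy. The service name `redis-limited` and all of the values are arbitrary; the flags themselves and the `failure` restart condition come from the help output above.

```shell
#!/bin/sh
# Sketch: create a service using several "swarmctl service create" flags.
# The name and values are arbitrary examples. Set DRY_RUN=1 to only print the command.
run() {
  echo "+ $*"
  [ -n "${DRY_RUN:-}" ] || "$@"
}

create_limited_redis() {
  # 3 replicas, each capped at 512 MB and half a CPU core,
  # restarted only on failure and at most 3 times.
  run swarmctl service create --name redis-limited --image redis:3.0.5 \
    --instances 3 --memory-limit 512m --cpu-limit 0.5 \
    --restart-condition failure --restart-max-attempts 3
}
```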
Global Flags:
-n, --no-resolve Do not try to map IDs to Names when displaying them
-s, --socket string Socket to connect to the Swarm manager (default "/tmp/manager1/swarm.sock")
swarmctl task
$ swarmctl task --help
Task management
Usage:
swarmctl task [command]
Available Commands:
ls List tasks
inspect Inspect a task
remove Remove a task
Global Flags:
-n, --no-resolve Do not try to map IDs to Names when displaying them
-s, --socket string Socket to connect to the Swarm manager (default "/var/run/docker/cluster/docker-swarmd.sock")
Use "swarmctl task [command] --help" for more information about a command.
swarmctl network
$ swarmctl network --help
Network management
Usage:
swarmctl network [command]
Available Commands:
inspect Inspect a network
ls List networks
create Create a network
remove Remove a network
Global Flags:
-n, --no-resolve Do not try to map IDs to Names when displaying them
-s, --socket string Socket to connect to the Swarm manager (default "/var/run/docker/cluster/docker-swarmd.sock")
Use "swarmctl network [command] --help" for more information about a command.
swarmctl network create
$ swarmctl network create --help
Create a network
Usage:
bin/swarmctl network create [flags]
Flags:
--driver string Network driver
--gateway value Gateway IP addresses for network segments (default [])
--ip-range value IP ranges to allocate from within the subnets (default [])
--ipam-driver string IPAM driver
--name string Network name
--opts value Network driver options (default [])
--subnet value Subnets in CIDR format that represents a network segments (default [])
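A minimal sketch of creating a network and attaching a service to it with the flags above. The network name `app-net`, the service name `redis-net`, the `overlay` driver name and the `10.0.9.0/24` subnet are illustrative assumptions; I haven't verified which drivers this build supports.

```shell
#!/bin/sh
# Sketch: create a network, then start a service attached to it.
# ASSUMPTIONS: the "overlay" driver name, the subnet and both names are illustrative.
# Set DRY_RUN=1 to only print the commands.
run() {
  echo "+ $*"
  [ -n "${DRY_RUN:-}" ] || "$@"
}

create_app_network() {
  run swarmctl network create --name app-net --driver overlay --subnet 10.0.9.0/24 || return 1
  # --network takes the network name, per the "service create" help.
  run swarmctl service create --name redis-net --image redis:3.0.5 --network app-net
}
```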
Global Flags:
-n, --no-resolve Do not try to map IDs to Names when displaying them
-s, --socket string Socket to connect to the Swarm manager (default "/tmp/manager1/swarm.sock")
swarmctl cluster
$ swarmctl cluster --help
Cluster management
Usage:
swarmctl cluster [command]
Available Commands:
inspect Inspect a cluster
ls List clusters
update Update a cluster
Global Flags:
-n, --no-resolve Do not try to map IDs to Names when displaying them
-s, --socket string Socket to connect to the Swarm manager (default "/var/run/docker/cluster/docker-swarmd.sock")
Use "swarmctl cluster [command] --help" for more information about a command.
swarmctl cluster update
$ swarmctl cluster update
Error: cluster name missing
Usage:
bin/swarmctl cluster update <cluster name> [flags]
Flags:
--autoaccept value Roles to automatically issue certificates for (default [])
--certexpiry duration Duration node certificates will be valid for (default 2160h0m0s)
--heartbeatperiod duration Period when heartbeat is expected to receive from agent
-h, --help help for update
--secret value Secret required to join the cluster (default [])
--taskhistory int Number of historic task entries to retain per instance or node
Global Flags:
-n, --no-resolve Do not try to map IDs to Names when displaying them
-s, --socket string Socket to connect to the Swarm manager (default "/tmp/manager1/swarm.sock")
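Summary
Reading the README gave me a rough picture of SwarmKit. Until now, building a Swarm cluster required setting up a separate key-value store (such as etcd or Consul), but swarmd seems to take care of that part internally.
Building a multi-host cluster involves many other considerations, such as networking and data volumes, but SwarmKit is still under development and information is limited.
I still don't quite see its advantages over Nomad, Mesos+Marathon, and the like. (Probably because I don't fully understand it yet...)
I'd like to keep following its development.
That's all.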