Trying out "Seldon", a Kubernetes-powered machine learning platform

Introduction

What is Seldon?

Seldon is an open-source machine learning platform developed in the UK, first released as OSS in December 2015.

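Its defining feature is that Kubernetes sits at the core of its architecture. Kubernetes makes it easy to stand up a complex machine learning platform, for example one built from Kafka + Spark + Zookeeper (for machine learning) or Grafana + InfluxDB + Fluentd (for log visualization).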

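The design also takes full advantage of containers: trained models are packaged into containers and served as microservices, and batch training jobs (based on luigi) are packaged via Dockerfiles and submitted as Kubernetes Jobs.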
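To make the "batch training submitted as a Kubernetes Job" idea concrete, here is a minimal sketch, assuming Python 3, that builds a generic batch/v1 Job manifest of the same kind as Seldon's example files (ml100k-import.json, for instance, is created as a job later in this article). The helper name make_training_job, the image example/train-model:0.1 and the command are hypothetical placeholders, not Seldon's actual job definitions.

```python
import json


def make_training_job(name, image, command):
    """Build a minimal Kubernetes batch/v1 Job manifest as a plain dict.

    Hypothetical helper: name, image and command are placeholders,
    not Seldon's real job definitions.
    """
    return {
        "apiVersion": "batch/v1",
        "kind": "Job",
        "metadata": {"name": name},
        "spec": {
            "template": {
                # Label the pods so they can be selected with `-l name=...`.
                "metadata": {"labels": {"name": name}},
                "spec": {
                    "containers": [
                        {"name": name, "image": image, "command": command}
                    ],
                    # Jobs only allow Never or OnFailure; a finished
                    # training run should not be restarted.
                    "restartPolicy": "Never",
                },
            }
        },
    }


if __name__ == "__main__":
    job = make_training_job(
        "example-train", "example/train-model:0.1", ["python", "train.py"]
    )
    # Save this output to a file and submit it with `kubectl create -f <file>`.
    print(json.dumps(job, indent=2))
```

The container image is where the training logic (and, in Seldon's case, the luigi pipeline) would live; the Job object itself only says "run this image once to completion".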

The team is small, but they keep shipping lively products, which I find very interesting. So I'm glad to introduce it here, and I'd be delighted if it sparks your interest.

Comparison with similar software

Since it is an "OSS machine learning platform", I compared it with the similar PredictionIO and found that they have a lot in common.

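| | PredictionIO | Seldon |
|---|---|---|
| License | Apache License 2.0 | Apache License 2.0 |
| Main development language | Scala | Python |
| Supported ML domains | General-purpose (Template Engine) | Recommendation / Prediction |
| Models supported out of the box | MLlib | TensorFlow, Keras, Vowpal Wabbit, XGBoost, Gensim |
| Container support | Unofficial | Kubernetes |
| First release | 2013/02 | 2015/12 |
| GitHub stars (as of 2017/06/29) | 10,252 | 1,078 |
| Official documentation | Reasonably thorough | Bare minimum |
| Other | Has a Japanese community | Little information in either English or Japanese |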

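Laid side by side like this, Seldon may look a bit underwhelming, but when you actually use it, its structure is simple and easy to understand, and it turns out to be surprisingly manageable.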

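If you feel that "PredictionIO's mechanics are too complicated...", or you basically want to implement everything in Python, I strongly recommend giving it a try. It's a good way to broaden your knowledge and experience.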

Trying Seldon hands-on

I verified its operation in the following environment.

    • Self-built PC (i7-4790 @ 3.60GHz, SSD, 32 GB RAM)
        • Windows 10 Pro 64bit
        • Hyper-V
    • Console environment
        • Cygwin
        • Python 2.7.13
        • make, htpasswd (httpd-tools), jq (optional)
    • Kubernetes
        • Minikube v0.20.0
        • Kubectl v1.6.3

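The documentation states that "running on Minikube requires at least 12 GB of main memory" [1], so rather than my 16 GB MacBook Pro I used the Windows machine above, but I frequently ran into problems with minikube, kubectl and so on.
Honestly, I'd recommend using macOS or Linux, or going the GCP route.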

If you still want to try it on Windows, see the Windows + Cygwin notes near the end of this article.

Starting Seldon

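First, start the Kubernetes cluster.
Following the official documentation, I set the memory to 12 GB, and allocated a generous disk size to leave room to spare. Hyper-V was chosen as the driver.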

c:\>minikube start --vm-driver="hyperv" --memory=12000 --disk-size=40g
Starting local Kubernetes v1.6.4 cluster...
Starting VM...
Downloading Minikube ISO
 90.95 MB / 90.95 MB [==============================================] 100.00% 0s
Moving files into cluster...
Setting up certs...
Starting cluster components...
Connecting to cluster...
Setting up kubeconfig...
Kubectl is now configured to use the cluster.
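minikube's --memory flag takes a value in megabytes, so --memory=12000 is slightly less than 12 GiB (12 × 1024 = 12288). Below is a small sketch, assuming Python 3, that builds the same minikube start command with the exact value; the helper name minikube_start_args is hypothetical, and the flags are the ones used in the command above.

```python
def minikube_start_args(memory_gib=12, disk_gb=40, driver="hyperv"):
    """Build the `minikube start` argument list used in this article.

    --memory is given in MB, so 12 GiB becomes 12 * 1024 = 12288.
    """
    return [
        "minikube", "start",
        "--vm-driver=%s" % driver,
        "--memory=%d" % (memory_gib * 1024),
        "--disk-size=%dg" % disk_gb,
    ]


if __name__ == "__main__":
    # Print the command; it could also be run with subprocess.call(...)
    # from an administrator Command Prompt.
    print(" ".join(minikube_start_args()))
```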

Check that the cluster started correctly.

c:\>kubectl cluster-info
Kubernetes master is running at https://192.168.11.224:8443

To further debug and diagnose cluster problems, use 'kubectl cluster-info dump'.

Check the latest Seldon release and fetch the source code. This article uses v1.4.6, placed under C:\seldon-server.

$ cd /cygdrive/c/
$ git clone https://github.com/seldonio/seldon-server
$ cd seldon-server/
$ git checkout -b v1.4.6 refs/tags/v1.4.6

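Next, generate the Kubernetes configuration. The make and htpasswd commands are required, so install them beforehand.
Note that the official installation guide includes examples of changing parameters, but with Minikube the defaults work fine.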
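Since make conf calls htpasswd and kubectl along the way (both appear in its output), a quick pre-check for the required tools can save a failed run. A minimal sketch, assuming Python 3 (shutil.which does not exist in the Python 2.7 that Seldon's own scripts use here); the helper name missing_tools is hypothetical.

```python
import shutil


def missing_tools(tools=("make", "htpasswd", "kubectl")):
    """Return the tools from `tools` that cannot be found on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


if __name__ == "__main__":
    missing = missing_tools()
    if missing:
        print("Missing required tools: " + ", ".join(missing))
    else:
        print("All required tools found.")
```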

$ cd /cygdrive/c/seldon-server/kubernetes/conf
$ make clean conf
rm -f /home/teru/seldon-server/build/build_versions_generated
rm -f mysql.json
rm -f memcache.json
rm -f control.json
rm -f td-agent-server.json
rm -f spark-master.json
rm -f spark-workers.json
rm -f server.json
rm -f influxdb-grafana.json
rm -f examples/reuters/import-data-job.json
rm -f examples/ml100k/ml100k-import.json
rm -f examples/ml10m/ml10m-import-item-similarity.json
rm -f examples/ml10m/ml10m-import-matrix-factorization.json
rm -f examples/finefoods/train-finefoods.json
rm -f examples/tensorflow_deep_mnist/train-tensorflow-deep-mnist.json
rm -f examples/tensorflow_deep_mnist/load-model-tensorflow-deep-mnist.json
rm -f microservice-.json
rm -f glusterfs.json
rm -f zookeeper.json
rm -f kafka.json
rm -f dev/server.json
rm -f dev/control.json
rm -f dev/iago.json
rm -f dev/locust-slave.json.template
rm -f dev/locust-master.json.template
rm -f analytics/*.json
rm -f microservice_pipeline.rest.template
rm -f microservice_pipeline.rpc.template
rm -f models/stream-itemsim-create.json
rm -f models/stream-itemsim-dbupload.json
rm -f rpc/create-proto-jar-job.template.json
rm -f mysql-google-cloudsql.json
rm -f spark-ui.json
rm -f proxy-server.json
rm -f spark.htpasswd
kubectl delete secret grafana-admin-password  > /dev/null 2>&1 || : && \
kubectl create secret generic grafana-admin-password --from-literal=grafana-admin-password.txt="admin"
secret "grafana-admin-password" created
htpasswd -bc spark.htpasswd spark spark
Adding password for user spark
kubectl delete secret sparkui-secret  > /dev/null 2>&1 || : && \
kubectl create secret generic sparkui-secret --from-file=./spark.htpasswd
secret "sparkui-secret" created
created control.json
created mysql.json
created memcache.json
created td-agent-server.json
created spark-master.json
created spark-workers.json
created examples/reuters/import-data-job.json
created glusterfs.json
created server.json
created examples/ml100k/ml100k-import.json
created dev/server.json
created dev/control.json
created influxdb-grafana.json
created zookeeper.json
created kafka.json
created examples/finefoods/train-finefoods.json
created examples/tensorflow_deep_mnist/train-tensorflow-deep-mnist.json
created examples/tensorflow_deep_mnist/load-model-tensorflow-deep-mnist.json
created microservice_pipeline.rest.template
created microservice_pipeline.rpc.template
created examples/US_stocks/train-US-stocks.json
created mysql-google-cloudsql.json
created spark-ui.json
created proxy-server.json

Add seldon-server/kubernetes/bin to PATH and start Seldon with the seldon-up script.
In my environment startup took about 5 to 10 minutes, so be patient.

$ ls /cygdrive/c/seldon-server/kubernetes/bin
create-proto-jar  launch-locust-load-test  seldon-cli  seldon-down  seldon-up  start-microservice

$ export PATH="$PATH:/cygdrive/c/seldon-server/kubernetes/bin"

$ seldon-up
Starting seldon version [1.4.6]
Creating hostpath persistent volume
persistentvolume "host-volume" created
persistentvolumeclaim "seldon-claim" created
Starting core servces
deployment "mysql" created
service "mysql" created
deployment "memcached1" created
service "memcached1" created
deployment "memcached2" created
service "memcached2" created
deployment "redis" created
service "redis" created
service "zookeeper-1" created
service "zookeeper-2" created
service "zookeeper-3" created
deployment "zookeeper1" created
deployment "zookeeper2" created
deployment "zookeeper3" created
deployment "seldon-control" created
deployment "influxdb-grafana" created
service "monitoring-influxdb" created
service "monitoring-grafana" created
NAME                               READY     STATUS              RESTARTS   AGE
influxdb-grafana-842592602-h0m2t   0/2       Pending             0          1s
memcached1-2136693305-lzmmj        0/1       ContainerCreating   0          3s
memcached2-2533120572-9wckh        0/1       ContainerCreating   0          3s
mysql-2529449154-21g6z             0/1       ContainerCreating   0          4s
redis-1963070708-sp2h1             0/1       ContainerCreating   0          3s
seldon-control-2582542290-dl869    0/1       ContainerCreating   0          1s
zookeeper1-467704625-8crjx         0/1       ContainerCreating   0          2s
zookeeper2-1006738229-rqhkt        0/1       ContainerCreating   0          2s
zookeeper3-1545771833-dwr97        0/1       ContainerCreating   0          2s
Waiting for pods to be running as found 9 in non-running state
Sleeping for 5 seconds...
NAME                               READY     STATUS              RESTARTS   AGE
influxdb-grafana-842592602-h0m2t   0/2       ContainerCreating   0          7s
memcached1-2136693305-lzmmj        0/1       ContainerCreating   0          9s
memcached2-2533120572-9wckh        0/1       ContainerCreating   0          9s
mysql-2529449154-21g6z             0/1       ContainerCreating   0          10s
redis-1963070708-sp2h1             0/1       ContainerCreating   0          9s
seldon-control-2582542290-dl869    0/1       ContainerCreating   0          7s
zookeeper1-467704625-8crjx         0/1       ContainerCreating   0          8s
zookeeper2-1006738229-rqhkt        0/1       ContainerCreating   0          8s
zookeeper3-1545771833-dwr97        0/1       ContainerCreating   0          8s
Waiting for pods to be running as found 9 in non-running state
Sleeping for 5 seconds...

... (omitted)

Waiting for pods to be running as found 1 in non-running state
Sleeping for 3 seconds...
deployment "spark-worker-controller" created
deployment "spark-ui-proxy-controller" created
service "spark-ui-proxy" created
connecting to zookeeper-1:2181,zookeeper-2:2181,zookeeper-3:2181 [SUCCEEDED]
connecting to zookeeper-1:2181,zookeeper-2:2181,zookeeper-3:2181 [SUCCEEDED]
Setting up memcached
Writing data to file[/seldon-data/conf/zkroot/config/memcached/_data_]
Writing data to file[/seldon-data/conf/zkroot/config/memcached/_data_]
connecting to zookeeper-1:2181,zookeeper-2:2181,zookeeper-3:2181 [SUCCEEDED]
updated zk node[/config/memcached]
connecting to zookeeper-1:2181,zookeeper-2:2181,zookeeper-3:2181 [SUCCEEDED]
Writing data to file[/seldon-data/conf/zkroot/config/dbcp/_data_]
Setting up Databases
Writing data to file[/seldon-data/conf/zkroot/config/dbcp/_data_]
connecting to zookeeper-1:2181,zookeeper-2:2181,zookeeper-3:2181 [SUCCEEDED]
updated zk node[/config/dbcp]
connecting to zookeeper-1:2181,zookeeper-2:2181,zookeeper-3:2181 [SUCCEEDED]
Trying to create the client
Adding api DB to MySQL DB 'ClientDB'
Adding JS consumer key for client 'test' : 'CX3W4JJIWFMGJ6E30F6T'
Adding REST API key for client 'test' : consumer_key='EZ137JB4KP1MZFBDXWHY' consumer_secret='MTPIBH760LE2XOJBB50W'
Writing data to file[/seldon-data/conf/zkroot/all_clients/test/_data_]
updated zk node[/all_clients/test]
Adding grafana dashboard, response code 200
Starting Seldon API server
deployment "seldon-server" created
service "seldon-server" created
Defaulting container name to influxdb.
Use 'kubectl describe pod/influxdb-grafana-842592602-h0m2t' to see all of the containers in this pod.
deployment "kafka-stream-impressions" created
deployment "kafka-stream-predictions" created

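If it still hasn't finished after 30 minutes, it may have terminated abnormally; check the state with commands such as kubectl get all or kubectl get events. Below is the output of kubectl get all after a successful startup.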

$ kubectl get all
NAME                                            READY     STATUS    RESTARTS   AGE
po/influxdb-grafana-842592602-n9xl6             2/2       Running   0          12m
po/kafka-controller-1424591021-q76xh            1/1       Running   0          5m
po/kafka-stream-impressions-169212079-4s9m7     1/1       Running   0          3m
po/kafka-stream-predictions-140764527-vqqs4     1/1       Running   0          3m
po/memcached1-2136693305-lzg9l                  1/1       Running   0          12m
po/memcached2-2533120572-djxbh                  1/1       Running   0          12m
po/mysql-2529449154-v0j1d                       1/1       Running   0          12m
po/redis-1963070708-qtqk9                       1/1       Running   0          12m
po/seldon-control-2582542290-rn0q2              1/1       Running   0          12m
po/seldon-server-3173692685-sxpcr               3/3       Running   0          3m
po/spark-master-controller-3720462731-pr84p     1/1       Running   0          4m
po/spark-ui-proxy-controller-1688034969-vrtwj   2/2       Running   0          3m
po/spark-worker-controller-3381690000-62g1s     1/1       Running   0          3m
po/spark-worker-controller-3381690000-6dkpz     1/1       Running   0          3m
po/td-agent-server-3988194731-xll8z             1/1       Running   0          5m
po/zookeeper1-467704625-2smxw                   1/1       Running   0          12m
po/zookeeper2-1006738229-870qd                  1/1       Running   0          12m
po/zookeeper3-1545771833-lrwxh                  1/1       Running   0          12m

NAME                      CLUSTER-IP   EXTERNAL-IP   PORT(S)                       AGE
svc/kafka-service         10.0.0.167   <nodes>       9092:30010/TCP                5m
svc/kubernetes            10.0.0.1     <none>        443/TCP                       13m
svc/memcached1            10.0.0.237   <none>        11211/TCP                     12m
svc/memcached2            10.0.0.103   <none>        11211/TCP                     12m
svc/monitoring-grafana    10.0.0.185   <pending>     80:30002/TCP                  12m
svc/monitoring-influxdb   10.0.0.198   <none>        8083/TCP,8086/TCP             12m
svc/mysql                 10.0.0.253   <none>        3306/TCP                      12m
svc/redis                 10.0.0.192   <none>        6379/TCP                      12m
svc/seldon-server         10.0.0.221   <nodes>       80:30015/TCP,5000:30017/TCP   3m
svc/spark-master          10.0.0.211   <none>        7077/TCP                      4m
svc/spark-ui-proxy        10.0.0.145   <pending>     8000:30005/TCP                3m
svc/spark-webui           10.0.0.162   <none>        8080/TCP                      4m
svc/td-agent-server       10.0.0.222   <none>        24224/TCP,24224/UDP           5m
svc/zookeeper-1           10.0.0.219   <none>        2181/TCP,2888/TCP,3888/TCP    12m
svc/zookeeper-2           10.0.0.225   <none>        2181/TCP,2888/TCP,3888/TCP    12m
svc/zookeeper-3           10.0.0.151   <none>        2181/TCP,2888/TCP,3888/TCP    12m

NAME                               DESIRED   CURRENT   UP-TO-DATE   AVAILABLE   AGE
deploy/influxdb-grafana            1         1         1            1           12m
deploy/kafka-controller            1         1         1            1           5m
deploy/kafka-stream-impressions    1         1         1            1           3m
deploy/kafka-stream-predictions    1         1         1            1           3m
deploy/memcached1                  1         1         1            1           12m
deploy/memcached2                  1         1         1            1           12m
deploy/mysql                       1         1         1            1           12m
deploy/redis                       1         1         1            1           12m
deploy/seldon-control              1         1         1            1           12m
deploy/seldon-server               1         1         1            1           3m
deploy/spark-master-controller     1         1         1            1           4m
deploy/spark-ui-proxy-controller   1         1         1            1           3m
deploy/spark-worker-controller     2         2         2            2           3m
deploy/td-agent-server             1         1         1            1           5m
deploy/zookeeper1                  1         1         1            1           12m
deploy/zookeeper2                  1         1         1            1           12m
deploy/zookeeper3                  1         1         1            1           12m

NAME                                      DESIRED   CURRENT   READY     AGE
rs/influxdb-grafana-842592602             1         1         1         12m
rs/kafka-controller-1424591021            1         1         1         5m
rs/kafka-stream-impressions-169212079     1         1         1         3m
rs/kafka-stream-predictions-140764527     1         1         1         3m
rs/memcached1-2136693305                  1         1         1         12m
rs/memcached2-2533120572                  1         1         1         12m
rs/mysql-2529449154                       1         1         1         12m
rs/redis-1963070708                       1         1         1         12m
rs/seldon-control-2582542290              1         1         1         12m
rs/seldon-server-3173692685               1         1         1         3m
rs/spark-master-controller-3720462731     1         1         1         4m
rs/spark-ui-proxy-controller-1688034969   1         1         1         3m
rs/spark-worker-controller-3381690000     2         2         2         3m
rs/td-agent-server-3988194731             1         1         1         5m
rs/zookeeper1-467704625                   1         1         1         12m
rs/zookeeper2-1006738229                  1         1         1         12m
rs/zookeeper3-1545771833                  1         1         1         12m
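Judging from its log ("Waiting for pods to be running as found 9 in non-running state"), seldon-up keeps polling until no pods are left in a non-running state. To run a similar check by hand, here is a minimal sketch, assuming Python 3, that parses the plain-text output of kubectl get pods (NAME, READY, STATUS, RESTARTS, AGE columns, as shown above) and lists pods that are not yet Running with all containers ready. The function name pods_not_ready is hypothetical, and this is not how seldon-up itself implements the check.

```python
def pods_not_ready(kubectl_output):
    """Return names of pods that are not Running with all containers ready.

    Expects the table printed by `kubectl get pods`
    (columns: NAME READY STATUS RESTARTS AGE).
    """
    not_ready = []
    for line in kubectl_output.strip().splitlines()[1:]:  # skip header row
        fields = line.split()
        if len(fields) < 3 or "/" not in fields[1]:
            continue
        name, ready, status = fields[0], fields[1], fields[2]
        ready_now, ready_total = ready.split("/")
        if status != "Running" or ready_now != ready_total:
            not_ready.append(name)
    return not_ready
```

For example, pods_not_ready(subprocess.check_output(["kubectl", "get", "pods"]).decode()) returns an empty list once everything is up.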

Verifying operation

Once everything is up, you can verify operation using the samples.

MovieLens 100K sample

A machine-learning-based recommendation example using the classic MovieLens dataset (100K records).

    Content Recommendation Guide | Seldon Documentation

Load the data and build the model.

$ cd /cygdrive/c/seldon-server/kubernetes/conf/examples/ml100k

$ kubectl create -f ml100k-import.json
job "ml100k-import" created

Run kubectl get jobs and wait for the job to finish (until the DESIRED and SUCCESSFUL values match).

$ kubectl get jobs -l name=ml100k-import
NAME            DESIRED   SUCCESSFUL   AGE
ml100k-import   1         0            23s

(2 to 3 minutes later)

$ kubectl get jobs -l name=ml100k-import
NAME            DESIRED   SUCCESSFUL   AGE
ml100k-import   1         1            2m

Once the model is built, request recommendations.

$ seldon-cli api --client-name ml100k --endpoint /js/recommendations --item 50 --limit 4
connecting to zookeeper-1:2181,zookeeper-2:2181,zookeeper-3:2181 [SUCCEEDED]
response code 200
{"size":4,"requested":4,"list":[{"id":"181","name":"","type":1,"first_action":1498640323000,"last_action":1498640323000,"popular":false,"demographics":[],"attributes":{},"attributesName":{"recommendationUuid":"1","release":"14-Mar-1997","title":"Return of the Jedi (1983)","url":"http://us.imdb.com/M/title-exact?Return%20of%20the%20Jedi%20(1983)"}},{"id":"127","name":"","type":1,"first_action":1498640323000,"last_action":1498640323000,"popular":false,"demographics":[],"attributes":{},"attributesName":{"recommendationUuid":"1","release":"01-Jan-1972","title":"Godfather, The (1972)","url":"http://us.imdb.com/M/title-exact?Godfather,%20The%20(1972)"}},{"id":"1","name":"","type":1,"first_action":1498640323000,"last_action":1498640323000,"popular":false,"demographics":[],"attributes":{},"attributesName":{"recommendationUuid":"1","release":"01-Jan-1995","title":"Toy Story (1995)","url":"http://us.imdb.com/M/title-exact?Toy%20Story%20(1995)"}},{"id":"100","name":"","type":1,"first_action":1498640323000,"last_action":1498640323000,"popular":false,"demographics":[],"attributes":{},"attributesName":{"recommendationUuid":"1","release":"14-Feb-1997","title":"Fargo (1996)","url":"http://us.imdb.com/M/title-exact?Fargo%20(1996)"}}]}

Something plausible came back. (The returned JSON raised a parse error, so it is shown unformatted.)
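One plausible reason the output does not parse as-is is that, without --quiet, seldon-cli prints status lines ("connecting to zookeeper-1:2181,... [SUCCEEDED]" and "response code 200") before the JSON body; the later examples add --quiet before piping to jq. The sketch below, assuming Python 3, skips to the first line that starts a JSON document and then pulls out the recommended item IDs and titles. The function names are hypothetical. It matches on line starts rather than the first "[" character, because the status line itself contains "[SUCCEEDED]".

```python
import json


def extract_json(cli_output):
    """Parse the JSON document that follows seldon-cli's status lines."""
    lines = cli_output.splitlines()
    for i, line in enumerate(lines):
        # Status lines start with text; the JSON body starts with { or [.
        if line.lstrip().startswith(("{", "[")):
            return json.loads("\n".join(lines[i:]))
    raise ValueError("no JSON document found in seldon-cli output")


def recommended_titles(cli_output):
    """Return (id, title) pairs from a /js/recommendations response."""
    doc = extract_json(cli_output)
    return [
        (item["id"], item.get("attributesName", {}).get("title", ""))
        for item in doc["list"]
    ]
```

Applied to the ml100k response above, recommended_titles would yield pairs such as ("181", "Return of the Jedi (1983)").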

Reuters news recommendation

A similar-text recommendation example based on the Reuters-21578 dataset.

    Recomendation Example | Seldon Documentation

Load the data and build the model.

$ cd /cygdrive/c/seldon-server/kubernetes/conf/examples/reuters

$ kubectl create -f import-data-job.json
job "reuters-import-data" created

(2 to 3 minutes)

$ kubectl get jobs -l job-name=reuters-import-data
NAME                  DESIRED   SUCCESSFUL   AGE
reuters-import-data   1         1            2m

Start the model as a microservice.

$ start-microservice --type recommendation --client reuters -i reuters-example seldonio/reuters-example:2.0.7 rest 1.0
[Microservice(reuters-example,seldonio/reuters-example:2.0.7,rest,1.000000)]
Replicas is  1
kubectl apply -f c:/seldon-server/kubernetes/bin/../conf/microservices/microservice-reuters-example.json
deployment "reuters-example" configured
service "reuters-example" configured
connecting to zookeeper-1:2181,zookeeper-2:2181,zookeeper-3:2181 [SUCCEEDED]
Writing data to file[/seldon-data/conf/zkroot/all_clients/reuters/alg_rectags/_data_]
connecting to zookeeper-1:2181,zookeeper-2:2181,zookeeper-3:2181 [SUCCEEDED]
updated zk node[/all_clients/reuters/alg_rectags]

$ kubectl get pods -l name=reuters-example
NAME                              READY     STATUS    RESTARTS   AGE
reuters-example-327623259-s9rgg   1/1       Running   0          3s

Request recommendations.

$ seldon-cli --quiet api --client-name reuters --endpoint  /js/recommendations --item 6020 --limit 3 | jq .
{
  "size": 3,
  "requested": 3,
  "list": [
    {
      "id": "11348",
      "name": "",
      "type": 1,
      "first_action": 1498640773000,
      "last_action": 1498640773000,
      "popular": false,
      "demographics": [],
      "attributes": {},
      "attributesName": {
        "recommendationUuid": "2",
        "title": "GANDALF <GANDF> ACQUIRES STAKE IN DATA/VOICE",
        "body": "Gandalf Technologies Inc said it\nacquired a significant minority equity interest in privately\nheld Data/Voice Solutions Corp, of Newport Beach, Calif., for\nundisclosed terms.\n    Gandalf did not specify the size of the interest.\n    Data/Voice is a three-year-old designer and manufacturer of\na multiprocessor, multiuser MS-DOS computing system that\nGandalf plans to integrate with its private automatic computer\nexchange information system, Gandalf said.\n Reuter\n\u0003"
      }
    },
    {
      "id": "7816",
      "name": "",
      "type": 1,
      "first_action": 1498640772000,
      "last_action": 1498640772000,
      "popular": false,
      "demographics": [],
      "attributes": {},
      "attributesName": {
        "recommendationUuid": "2",
        "title": "CELINA <CELNA> SHAREHOLDERS APPROVE SALE",
        "body": "Celina Financial Corp said\nshareholders at a special meeting approved a transaction in\nwhich the company transferred its interest in three insurance\ncompanies to a wholly owned subsidiary which then sold the\nthree companies to an affiliated subsidiary.\n    It said the company's interests in West Virginia Fire and\nCasualty Co, Congregation Insurance co and National Term Life\nInsurance Co had been transferred to First National Indemnity\nCo, which sold the three to Celina Mutual for cash, an office\nbuilding and related real estate.\n Reuter\n\u0003"
      }
    },
    {
      "id": "8571",
      "name": "",
      "type": 1,
      "first_action": 1498640772000,
      "last_action": 1498640772000,
      "popular": false,
      "demographics": [],
      "attributes": {},
      "attributesName": {
        "recommendationUuid": "2",
        "title": "AVALON <AVL> STAKE SOLD BY DELTEC",
        "body": "Avalon Corp said that <Deltec\nPanamerica SA> has arranged to sell its 23 pct stake in Avalon\nand that Deltec's three representatives on Avalon's board had\nresigned.\n    An Avalon spokeswoman declined to indentify the buyer of\nDeltec's stake or give terms of the sale.\n    In addition, Avalon said three other directors resigned. It\nsaid Benjamin W. Macdonald, a director of <TMOC Resources Ltd>,\nthe principal holder of Avalon stock, and Hardwick Simmons, a\nvice chairman of Shearson Lehman Bros Inc, were then named to\nthe board.\n Reuter\n\u0003"
      }
    }
  ]
}

Iris classification

A classification (prediction) example using the Iris dataset.

    Prediction Example | Seldon Documentation

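This example is not run the same way as the others: data loading and training are not separate steps. It appears that running start-microservice also performs the data import and training, as defined in the image's Dockerfile.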

$ start-microservice --type prediction --client test -i iris-xgboost seldonio/iris_xgboost:2.1 rest 1.0
[Microservice(iris-xgboost,seldonio/iris_xgboost:2.1,rest,1.000000)]
Replicas is  1
kubectl apply -f c:/seldon-server/kubernetes/bin/../conf/microservices/microservice-iris-xgboost.json
deployment "iris-xgboost" created
service "iris-xgboost" created
connecting to zookeeper-1:2181,zookeeper-2:2181,zookeeper-3:2181 [SUCCEEDED]
Writing data to file[/seldon-data/conf/zkroot/all_clients/test/predict_algs/_data_]
Added prediction algs for test
connecting to zookeeper-1:2181,zookeeper-2:2181,zookeeper-3:2181 [SUCCEEDED]
updated zk node[/all_clients/test/predict_algs]

(1 to 2 minutes)

$ kubectl get pods -l name=iris-xgboost
NAME                            READY     STATUS    RESTARTS   AGE
iris-xgboost-1750015301-v3lxz   1/1       Running   0          1m

Run a classification.

$ seldon-cli --quiet api --client-name test --endpoint /js/predict --json '{"data":{"f1":1,"f2":2.7,"f3":5.3,"f4":1.9}}' | jq .
{
  "meta": {
    "puid": "2e43f2625611c7d3317acb33e5537a8fdfcf01dd",
    "modelName": "model_xgb",
    "variation": "iris-xgboost"
  },
  "predictions": [
    {
      "prediction": 0.00252304,
      "predictedClass": "Iris-setosa",
      "confidence": 0.00252304
    },
    {
      "prediction": 0.00350009,
      "predictedClass": "Iris-versicolor",
      "confidence": 0.00350009
    },
    {
      "prediction": 0.993977,
      "predictedClass": "Iris-virginica",
      "confidence": 0.993977
    }
  ],
  "custom": null
}
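To pick the winning class out of a /js/predict response like the one above, here is a minimal sketch, assuming Python 3 and the response fields shown (predictions, predictedClass, confidence); the function name top_prediction is hypothetical.

```python
import json


def top_prediction(response_text):
    """Return (predictedClass, confidence) for the most confident prediction."""
    doc = json.loads(response_text)
    best = max(doc["predictions"], key=lambda p: p["confidence"])
    return best["predictedClass"], best["confidence"]
```

For the response above, this picks Iris-virginica with confidence 0.993977.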

TensorFlow Deep MNIST

Finally, let's run the TensorFlow handwritten-digit recognition demo on Seldon.

    TensorFlow Deep MNIST Demo | Seldon Documentation

$ cd /cygdrive/c/seldon-server/kubernetes/conf/examples/tensorflow_deep_mnist

$ kubectl create -f load-model-tensorflow-deep-mnist.json
job "load-model-tensorflow-deep-mnist" created

(2 to 3 minutes)

$ kubectl get jobs | grep load-model
load-model-tensorflow-deep-mnist   1         1            3m

If you want to start from the TensorFlow training step instead of using the pre-trained model (load-model-tensorflow-deep-mnist.json), use train-tensorflow-deep-mnist.json instead.

Once the model is ready, create a client and start the model as a microservice (REST API).

$ seldon-cli client --action setup --db-name ClientDB --client-name deep_mnist_client
connecting to zookeeper-1:2181,zookeeper-2:2181,zookeeper-3:2181 [SUCCEEDED]
Trying to create the client
API DB has already been added to the MySQL DB 'ClientDB'
Adding JS consumer key for client 'deep_mnist_client' : '2M5B4Z6PRM032QDK0AGS'
Adding REST API key for client 'deep_mnist_client' : consumer_key='6WAKZ1UNIOOJ98F1W62B' consumer_secret='Y1Q12S9YLAW3PQ82A336'
Writing data to file[/seldon-data/conf/zkroot/all_clients/deep_mnist_client/_data_]
updated zk node[/all_clients/deep_mnist_client]
Adding grafana dashboard, response code 200

$ start-microservice --type prediction --client deep_mnist_client -p tensorflow-deep-mnist /seldon-data/seldon-models/tensorflow_deep_mnist/1/ rest 1.0
[Pipeline(tensorflow-deep-mnist,/seldon-data/seldon-models/tensorflow_deep_mnist/1/,rest,1.000000)]
Replicas is  1
kubectl apply -f c:/seldon-server/kubernetes/bin/../conf/microservices/microservice-tensorflow-deep-mnist.json
deployment "tensorflow-deep-mnist" created
service "tensorflow-deep-mnist" created
connecting to zookeeper-1:2181,zookeeper-2:2181,zookeeper-3:2181 [SUCCEEDED]
Writing data to file[/seldon-data/conf/zkroot/all_clients/deep_mnist_client/predict_algs/_data_]
Added prediction algs for deep_mnist_client
connecting to zookeeper-1:2181,zookeeper-2:2181,zookeeper-3:2181 [SUCCEEDED]
updated zk node[/all_clients/deep_mnist_client/predict_algs]

Next, deploy a web application that provides a GUI for handwriting input.

    Tensorflow Deep MNIST Webapp | Seldon Documentation

Get the Seldon parameters (IP, key, secret) to pass to the web app.

$ seldon-cli keys  --client-name deep_mnist_client --scope all
connecting to zookeeper-1:2181,zookeeper-2:2181,zookeeper-3:2181 [SUCCEEDED]
[{"scope": "all", "secret": "Y1Q12S9YLAW3PQ82A336", "client": "deep_mnist_client", "db": "ClientDB", "key": "6WAKZ1UNIOOJ98F1W62B"}]

$ kubectl get services seldon-server
NAME            CLUSTER-IP   EXTERNAL-IP   PORT(S)                       AGE
seldon-server   10.0.0.187   <nodes>       80:30015/TCP,5000:30017/TCP   1h

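Replace <seldon-server-ip> with the CLUSTER-IP of seldon-server shown above, and <key> and <secret> with the values from seldon-cli keys, then run: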

kubectl run deep-mnist-webapp --image=seldonio/deep_mnist_webapp:1.2 --port=80 --command -- "/run_webapp.sh" "<seldon-server-ip>" "<key>" "<secret>"
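The three placeholders can also be filled in programmatically. A minimal sketch, assuming Python 3, that takes the JSON line printed by seldon-cli keys (without the preceding "connecting to zookeeper..." line) plus the seldon-server CLUSTER-IP, and builds the same kubectl run argument list; the function name webapp_run_command is hypothetical.

```python
import json


def webapp_run_command(keys_line, seldon_server_ip,
                       image="seldonio/deep_mnist_webapp:1.2"):
    """Build the `kubectl run` argument list for the Deep MNIST web app.

    keys_line is the JSON list printed by
    `seldon-cli keys --client-name deep_mnist_client --scope all`.
    """
    entry = json.loads(keys_line)[0]  # first (and here only) key entry
    return [
        "kubectl", "run", "deep-mnist-webapp",
        "--image=" + image, "--port=80", "--command", "--",
        "/run_webapp.sh", seldon_server_ip, entry["key"], entry["secret"],
    ]
```

Passing the result to subprocess.check_call would run the same command as the example execution that follows.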

Below is an example run. In my environment it took about 2 to 3 minutes to go from "ContainerCreating" to "Running".

$ kubectl run deep-mnist-webapp --image=seldonio/deep_mnist_webapp:1.2 --port=80 --command -- "/run_webapp.sh" "10.0.0.187" "6WAKZ1UNIOOJ98F1W62B" "Y1Q12S9YLAW3PQ82A336"

(2 to 3 minutes)

$ kubectl get pod | grep deep-mnist-webapp
deep-mnist-webapp-1912876608-l4wbf           1/1       Running   0          2m

$ kubectl expose deployment/deep-mnist-webapp --type="LoadBalancer"
service "deep-mnist-webapp" exposed

Get the URL of the deployed web app with the following command and open it in a browser.

c:\> minikube service deep-mnist-webapp --url
http://192.168.11.209:32244

If the app appears and you can operate it like this, everything is working.

(Animation: the Deep MNIST web app in use)

Bonus: Grafana

Finally, let's look at the statistics dashboards provided by Grafana.
If you haven't changed kubernetes/conf/Makefile, you can log in with admin/admin.

c:\> minikube service monitoring-grafana --url
http://192.168.11.201:30002
(Screenshot: Grafana)

From "Dashboards", select the sample whose information you want to display.

    • ml100k (Content Recommendation API)
      (Screenshot: ml100k dashboard)
    • deep_mnist_client (Prediction API)
      (Screenshot: deep_mnist_client dashboard)

For the Prediction API in particular, the graphs are at the bottom of the screen, so you need to scroll down to see them.

Tips

If you want to verify operation on Windows + Cygwin, note the following points:

Setting up Minikube on Windows + Hyper-V

I've summarized this in a separate article; please refer to it if you want to try a similar environment.

Run minikube from the Command Prompt

Running it from the Cygwin terminal fails with the error "Error starting host: Error creating host: Error executing step: Creating VM.: exit status 1.". Open the Command Prompt (or PowerShell) as administrator and run the minikube commands from there.

kubectl, on the other hand, works fine from the terminal (shell) as well.

Editing the startup scripts

Seldon's scripts are written in bash (seldon-up, seldon-cli) and Python (start-microservice), so they need to be run from a terminal environment.

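With Cygwin, kubectl cannot interpret paths such as /cygdrive/..., so I worked around this by overwriting the directory paths directly in the Seldon startup scripts (e.g. /cygdrive/c/seldon-server → c:/seldon-server).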

I'm not sure whether other terminal environments need the same treatment, but if things don't work, use this as a reference.


$ git diff seldon-up
diff --git a/kubernetes/bin/seldon-up b/kubernetes/bin/seldon-up
index 7b68c3c..b7267e2 100755
--- a/kubernetes/bin/seldon-up
+++ b/kubernetes/bin/seldon-up
@@ -4,6 +4,7 @@ set -o nounset
 set -o errexit

 STARTUP_DIR="$( cd "$( dirname "$0" )" && pwd )"
+STARTUP_DIR="c:\\seldon-server\\kubernetes\\bin\\"

 SELDON_HOME=${STARTUP_DIR}/../..
 SELDON_WITH_SPARK=${SELDON_WITH_SPARK:-true}

$ git diff start-microservice
diff --git a/kubernetes/bin/start-microservice b/kubernetes/bin/start-microservice
index ea1006a..776fe27 100755
--- a/kubernetes/bin/start-microservice
+++ b/kubernetes/bin/start-microservice
@@ -140,6 +140,7 @@ class MicroserviceRunner(object):

     def __init__(self,replicas=1):
         self.script_folder = os.path.dirname(os.path.realpath(__file__))
+        self.script_folder = 'c:/seldon-server/kubernetes/bin'
         self.replicas = replicas
         print "Replicas is ",self.replicas
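Rather than hard-coding c:/seldon-server, the Cygwin-to-Windows path conversion could be done by a small helper. This is a sketch under the assumption that the paths look like /cygdrive/<drive>/...; cygwin_to_windows is a hypothetical name, not part of Seldon. It uses only syntax that also runs on Python 2.7, so in principle it could be applied inside start-microservice, but I have not tested that. From the shell, Cygwin's own cygpath -m performs a similar conversion.

```python
import re


def cygwin_to_windows(path):
    """Convert a Cygwin path like /cygdrive/c/seldon-server to c:/seldon-server.

    Paths that are not under /cygdrive/<drive> are returned unchanged.
    """
    m = re.match(r"^/cygdrive/([a-zA-Z])(/.*)?$", path)
    if not m:
        return path
    drive, rest = m.group(1), m.group(2) or "/"
    return drive.lower() + ":" + rest
```

A hypothetical use in start-microservice would be self.script_folder = cygwin_to_windows(os.path.dirname(os.path.realpath(__file__))).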

The following error appears when starting minikube

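Over several attempts, it sometimes didn't occur, and sometimes things worked fine even when it did; reproducibility is very hit-and-miss, and I don't really understand the cause.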

c:\> minikube start --vm-driver="hyperv" --memory=14000 --disk-size=40g
Starting local Kubernetes v1.6.4 cluster...
Starting VM...
Downloading Minikube ISO
 90.95 MB / 90.95 MB [==============================================] 100.00% 0s
E0628 17:06:57.323398    4632 start.go:127] Error starting host: Error creating host: Error executing step: Provisioning VM.
: ssh command error:
command : sudo hostname minikube && echo "minikube" | sudo tee /etc/hostname
err     : exit status 255
output  : .

 Retrying.
E0628 17:06:57.327396    4632 start.go:133] Error starting host:  Error creating host: Error executing step: Provisioning VM.
: ssh command error:
command : sudo hostname minikube && echo "minikube" | sudo tee /etc/hostname
err     : exit status 255
output  :
================================================================================
An error has occurred. Would you like to opt in to sending anonymized crash
information to minikube to help prevent future errors?
To opt out of these messages, run the command:
        minikube config set WantReportErrorPrompt false
================================================================================

Processing never finishes, or nothing works despite repeated trial and error

    • kubectl get all
    • kubectl get events
    • kubectl describe nodes

Use these commands to check whether an error has occurred.

If errors occur, a full restart with "seldon-down && seldon-up" sometimes gets things working.

However, deleting the entire VM and starting over is probably the most reliable option.

c:\> minikube stop
c:\> minikube delete

For extra peace of mind, you may also want to delete minikube's configuration and cache files.

$ rm -rf ~/.minikube

Monitoring job completion

There are many ways to do this, but here is one example that doesn't need the watch command. Stop it once the DESIRED and SUCCESSFUL values are equal.

$ yes 'kubectl get jobs | grep ml100k; sleep 5' | sh
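The same wait can be written as a small polling loop that stops by itself. A minimal sketch, assuming Python 3 and the kubectl v1.6 column layout shown earlier (NAME, DESIRED, SUCCESSFUL, AGE); newer kubectl versions print a COMPLETIONS column instead and also offer kubectl wait --for=condition=complete. The function names job_done and wait_for_job are hypothetical.

```python
import subprocess
import time


def job_done(jobs_output, job_name):
    """True if `job_name` shows DESIRED == SUCCESSFUL in `kubectl get jobs` output."""
    for line in jobs_output.splitlines()[1:]:  # skip header row
        fields = line.split()
        if len(fields) >= 3 and fields[0] == job_name:
            return fields[1] == fields[2]
    return False  # job not listed (yet)


def wait_for_job(job_name, interval=5):
    """Poll `kubectl get jobs` every `interval` seconds until the job completes.

    There is no timeout; interrupt with Ctrl+C if the job never finishes.
    """
    while True:
        out = subprocess.check_output(["kubectl", "get", "jobs"]).decode()
        if job_done(out, job_name):
            return
        time.sleep(interval)
```

For the MovieLens example, wait_for_job("ml100k-import") returns once the job reports 1 of 1 successful.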

Understanding how Seldon works

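Of the samples provided, I found the TensorFlow Deep MNIST Demo the easiest to understand. After running through it once, looking at the following in this order may make it easier to get a feel for how things fit together:

    • The job submitted to Kubernetes
    • The Dockerfile
    • The training logic
    • The official explanation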

Conclusion

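Seldon gives a rough-around-the-edges impression, but overall I found it surprisingly simple and easy to get started with. Kubernetes confused me a lot at first, but once I understood its mechanisms and philosophy, I got used to it quickly.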

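There are many interesting open questions: "Can we use it in production?", "Can we easily plug our own machine learning models into it?", "Will Seldon's prescribed data interfaces be hard to adapt to?", "How is data retrained? (Is it persisted in MySQL?)" and so on. I'll keep investigating them as I explore further, and I look forward to working with it more.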

If you feel the romance in the name, do give it a try!

That's all.


[1] Even trimmed down as far as possible, it apparently still needs 6 GB of memory. Quite a luxury...

Strictly speaking, 12 GB would be --memory=12288 rather than 12000; if that bothers you, specify the exact value.

The default disk size is 20 GB, and the disk filled up while I was experimenting. hostpath.json specifies "50Gi", so it may be best to provision that much.

Seldon works even without adding kubernetes/bin to PATH, but some scripts such as start-microservice assume it is set, so it's better to set it.

In my environment, it took about an hour.

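Judging from a video, the product name is pronounced close to the Japanese "Serudan" (セルダン). The origin of the name hasn't been officially announced, but could it come from Hari Seldon, the genius mathematician in Isaac Asimov's Foundation novels...?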
