We can pick a machine learning task template that suits our application scenario from the engine template gallery.
Classification Engine Template (Scala)
PredictionIO's Classification Engine Template integrates Spark MLlib's Naive Bayes algorithm by default.
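The template itself is written in Scala, but the underlying call is simply MLlib's Naive Bayes trainer. Purely as an illustration (this is not the template's code), the following standalone pyspark sketch shows that algorithm on three-attribute records like the ones used later in this section; the sample vectors and smoothing value are made up for the example.
from pyspark import SparkContext
from pyspark.mllib.classification import NaiveBayes
from pyspark.mllib.regression import LabeledPoint

sc = SparkContext(appName="naive-bayes-sketch")

# Each record: a label (the "plan") and three features (attr0, attr1, attr2).
# The values here are illustrative only.
training = sc.parallelize([
    LabeledPoint(1.0, [0.0, 1.0, 0.0]),
    LabeledPoint(0.0, [1.0, 0.0, 0.0]),
    LabeledPoint(1.0, [0.0, 1.0, 1.0]),
])

# 1.0 is the additive-smoothing (lambda) parameter.
model = NaiveBayes.train(training, 1.0)
print(model.predict([0.0, 1.0, 0.0]))  # predicted label for a new record

sc.stop()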
Creating an Engine from an Engine Template
Download the source code from GitHub
$ git clone https://github.com/apache/incubator-predictionio-template-attribute-based-classifier.git MyClassification
Create an App ID and Access Key
$ pio app new MyApp1
The output looks like this:
[INFO] [HBLEvents] The table pio_event:events_1 doesn't exist yet. Creating now...
[INFO] [App$] Initialized Event Store for this app ID: 1.
[INFO] [Pio$] Created a new app:
[INFO] [Pio$] Name: MyApp1
[INFO] [Pio$] ID: 1
[INFO] [Pio$] Access Key: ytxhxxg7rLgqS8ZabTASRs74B8ba2_o8XxWR_U0GGH7EJun30N8RMcx7Q8UkI-nt
List all apps
$ pio app list
Collecting training data
The engine template reads four properties of each user record: attr0, attr1, attr2, and plan. You can send events to the PredictionIO Event Server over HTTP or through one of the SDKs. Below we use curl; for convenience, first set the environment variable ACCESS_KEY to the access key shown above.
$ curl -i -X POST http://localhost:7070/events.json?accessKey=$ACCESS_KEY \
-H "Content-Type: application/json" \
-d '{
  "event" : "$set",
  "entityType" : "user",
  "entityId" : "u0",
  "properties" : {
    "attr0" : 0,
    "attr1" : 1,
    "attr2" : 0,
    "plan" : 1
  },
  "eventTime" : "2014-11-02T09:39:45.618-08:00"
}'
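For reference, here is a sketch of sending the same $set event through the Python SDK (installed later in this section) instead of curl; it assumes the Event Server is at localhost:7070 and that ACCESS_KEY is set in the environment.
import os
import predictionio

# Connect to the Event Server with the app's access key.
client = predictionio.EventClient(
    access_key=os.environ["ACCESS_KEY"],
    url="http://localhost:7070")

# Same $set event as the curl example above.
client.create_event(
    event="$set",
    entity_type="user",
    entity_id="u0",
    properties={"attr0": 0, "attr1": 1, "attr2": 0, "plan": 1})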
Query the Event Server
$ curl -i -X GET "http://localhost:7070/events.json?accessKey=$ACCESS_KEY"
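The same check can be done from Python; a minimal sketch against the same endpoint, using the third-party requests package (an assumption, not part of the template):
import os
import requests

# List the events stored for this app.
resp = requests.get(
    "http://localhost:7070/events.json",
    params={"accessKey": os.environ["ACCESS_KEY"]})
print(resp.json())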
Importing more data
To import more training data, we use the Python script data/import_eventserver.py that ships with the template.
First, install the Python SDK:
$ sudo pip install predictionio
Then import the training data required by the project:
$ cd MyClassification
$ python data/import_eventserver.py --access_key $ACCESS_KEY
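The script itself ships with the template, so there is no need to write it, but a minimal sketch of what such an import script does may help: it reads labelled records from a file and sends each one to the Event Server as a $set event. The file name and record format below are assumptions for illustration.
import argparse
import predictionio

def import_events(client, path):
    count = 0
    with open(path) as f:
        for line in f:
            # Assumed format: "<plan>,<attr0> <attr1> <attr2>"
            label, attrs = line.strip().split(",")
            attr0, attr1, attr2 = attrs.split()
            client.create_event(
                event="$set",
                entity_type="user",
                entity_id=str(count),
                properties={
                    "attr0": int(attr0),
                    "attr1": int(attr1),
                    "attr2": int(attr2),
                    "plan": int(label)})
            count += 1
    print("Imported %d events." % count)

if __name__ == "__main__":
    parser = argparse.ArgumentParser()
    parser.add_argument("--access_key", required=True)
    parser.add_argument("--url", default="http://localhost:7070")
    parser.add_argument("--file", default="./data/data.txt")
    args = parser.parse_args()
    client = predictionio.EventClient(access_key=args.access_key, url=args.url)
    import_events(client, args.file)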
Deploying the Engine as a service
Edit engine.json so that appName under datasource matches the app created earlier (MyApp1):
...
"datasource": {
  "params" : {
    "appName": "MyApp1"
  }
},
...
Build the MyClassification engine
$ pio build --verbose
During the build, sbt downloads dependencies from the Maven Central repository by default. To speed up the sbt build, you can switch to Aliyun's Maven mirror by configuring ~/.sbt/repositories as follows:
[repositories]
local
aliyun-nexus: http://maven.aliyun.com/nexus/content/groups/public/
typesafe: http://repo.typesafe.com/typesafe/ivy-releases/, [organization]/[module]/(scala_[scalaVersion]/)(sbt_[sbtVersion]/)[revision]/[type]s/[artifact](-[classifier]).[ext], bootOnly
sonatype-oss-releases
maven-central
sonatype-oss-snapshots
Train the model
$ pio train
Deploy the service
$ pio deploy
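After pio deploy, the engine listens on port 8000 by default and can be queried over HTTP. A small Python sketch using the SDK's EngineClient, with a query whose fields mirror the three training attributes (the attribute values are made up):
import predictionio

# Query the deployed engine (default deploy address).
engine_client = predictionio.EngineClient(url="http://localhost:8000")
result = engine_client.send_query({"attr0": 2, "attr1": 0, "attr2": 0})
print(result)  # a dict containing the predicted label (the plan)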