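The dating dataset datingTestSet.txt has three main sample features:

  • Frequent flyer miles earned per year

  • Percentage of time spent playing video games

  • Liters of ice cream consumed per week

The raw data must be formatted before use. The file2matrix function in KNN.py converts the text records into a NumPy matrix of features and a list of class labels: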

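from numpy import zeros

def file2matrix(filename):
    # Read the file once; the with-block closes it even if parsing fails
    with open(filename) as fr:
        lines = fr.readlines()
    numberOfLines = len(lines)                  # number of records in the file
    returnMat = zeros((numberOfLines, 3))       # one row per record, three feature columns
    classLabelVector = []                       # class label of each record
    for index, line in enumerate(lines):
        listFromLine = line.strip().split('\t')
        # The first three tab-separated fields are the features
        returnMat[index, :] = [float(value) for value in listFromLine[0:3]]
        # The last field is the class label; int() assumes it is stored as an integer
        classLabelVector.append(int(listFromLine[-1]))
    return returnMat, classLabelVector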
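
To see the parsing step in isolation, here is a minimal sketch that does the same conversion on an in-memory sample rather than the real file. The helper name load_records and the two sample records are made up for illustration. It also swaps in numpy.loadtxt instead of the manual loop, and assumes every field, including the label, is numeric.

```python
import io

import numpy as np

# Made-up sample records: three tab-separated features followed by an integer label.
SAMPLE = "1000\t5.0\t0.5\t1\n2000\t10.0\t1.5\t2\n"


def load_records(stream):
    """Parse tab-separated records into a feature matrix and a label list.

    Hypothetical helper: assumes every field is numeric.
    """
    # ndmin=2 keeps the result two-dimensional even for a single record
    data = np.loadtxt(stream, delimiter="\t", ndmin=2)
    features = data[:, :3]                     # first three columns are the features
    labels = data[:, -1].astype(int).tolist()  # last column is the class label
    return features, labels


features, labels = load_records(io.StringIO(SAMPLE))
print(features.shape)
print(labels)
```

The feature matrix has one row per record and three columns, and the label list has one entry per record, which matches the pair returned by file2matrix.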
