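The dating dataset datingTestSet.txt has three main sample features:
Frequent flyer miles earned per year
Percentage of time spent playing video games
Liters of ice cream consumed per week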
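To put the raw data into a usable format, use the file2matrix function in KNN.py, which converts the text records into a NumPy matrix of features and a list of class labels: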
from numpy import zeros

def file2matrix(filename):
    # Read the file once; the with-block closes it afterwards
    with open(filename) as fr:
        lines = fr.readlines()
    numberOfLines = len(lines)             # number of records in the file
    returnMat = zeros((numberOfLines, 3))  # feature matrix to return
    classLabelVector = []                  # class labels to return
    for index, line in enumerate(lines):
        listFromLine = line.strip().split('\t')
        # The first three tab-separated columns are the features
        returnMat[index, :] = listFromLine[0:3]
        # The last column is parsed as an integer class label
        classLabelVector.append(int(listFromLine[-1]))
    return returnMat, classLabelVector
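
To make the expected input format concrete, here is a minimal sketch that does not call file2matrix. It writes a few made-up rows to a temporary file and loads them with numpy.loadtxt, a different technique that should produce the same kind of result when the last column is numeric. The sample values and the names sample, features and labels are assumptions chosen for illustration; they are not taken from datingTestSet.txt.

```python
import os
import tempfile

import numpy as np

# Hypothetical rows (made-up values): three tab-separated features
# followed by an integer class label, one record per line.
sample = "40000\t8.5\t0.9\t3\n15000\t7.0\t1.5\t2\n26000\t1.5\t0.8\t1\n"

# Write the sample to a temporary file so it can be parsed like a real data file.
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write(sample)
    path = tmp.name

try:
    # loadtxt parses every column as float; ndmin=2 keeps a 2-D result
    # even if the file holds a single record.
    data = np.loadtxt(path, delimiter="\t", ndmin=2)
finally:
    os.remove(path)

features = data[:, :3]                     # one row per sample, three feature columns
labels = data[:, -1].astype(int).tolist()  # last column as integer class labels

print(features.shape)  # (3, 3)
print(labels)          # [3, 2, 1]
```

Like file2matrix, this sketch requires the label column to be numeric. If the labels in a file are stored as text, both approaches would need a mapping from label strings to integers first.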