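Matrix factorization is a common technique in recommender systems, and the idea behind it is simple. The user-item rating matrix R can become very large when there are many users and items. We factor it into two smaller matrices and then use them to "reconstruct" the original matrix with as little error as possible. Concretely, R is factored into a user-feature matrix and a feature-item matrix. This has two benefits:
1. We obtain the users' preferences and the items' characteristics, both expressed in terms of the same features.
2. The two factors are much smaller than the original matrix, as long as the number of features is small relative to the number of users and items.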
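The objective behind this reconstruction can be written out explicitly. The block below is a sketch that follows the names used in the code in this post (R, P, Q, K, alpha, beta). The sizes N (users) and M (items) and the convention that only entries with R_ij > 0 count as observed are taken from that code, not from a separate derivation.

```latex
% Factorization: N users, M items, K features
R \approx P\,Q, \qquad P \in \mathbb{R}^{N \times K},\quad Q \in \mathbb{R}^{K \times M}

% Error on one observed rating
e_{ij} = R_{ij} - \sum_{k=1}^{K} P_{ik} Q_{kj}

% Regularized squared error, summed over observed entries only
E = \sum_{(i,j):\, R_{ij} > 0} \left[ e_{ij}^{2} + \frac{\beta}{2} \sum_{k=1}^{K} \left( P_{ik}^{2} + Q_{kj}^{2} \right) \right]

% Gradient-descent updates with learning rate \alpha
P_{ik} \leftarrow P_{ik} + \alpha \left( 2\, e_{ij}\, Q_{kj} - \beta\, P_{ik} \right)
Q_{kj} \leftarrow Q_{kj} + \alpha \left( 2\, e_{ij}\, P_{ik} - \beta\, Q_{kj} \right)
```

Each update is the negative gradient of the bracketed term with respect to P_ik or Q_kj, scaled by alpha. The beta term keeps the factors from growing too large.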
Code:

import numpy


def matrix_factorization(R, P, Q, K, steps=10, alpha=0.0002, beta=0.02):
    # R: N x M rating matrix (0 means "not rated"); P: N x K; Q: M x K.
    # alpha is the learning rate, beta the regularization weight.
    Q = Q.T  # work with the K x M feature-item matrix
    for step in range(steps):
        # One gradient-descent pass over every observed rating.
        for i in range(len(R)):
            for j in range(len(R[i])):
                if R[i][j] > 0:
                    eij = R[i][j] - numpy.dot(P[i, :], Q[:, j])
                    for k in range(K):
                        p_ik = P[i][k]  # keep the pre-update value for the Q update
                        P[i][k] = p_ik + alpha * (2 * eij * Q[k][j] - beta * p_ik)
                        Q[k][j] = Q[k][j] + alpha * (2 * eij * p_ik - beta * Q[k][j])
        # Regularized squared error over the observed ratings.
        e = 0
        for i in range(len(R)):
            for j in range(len(R[i])):
                if R[i][j] > 0:
                    e = e + pow(R[i][j] - numpy.dot(P[i, :], Q[:, j]), 2)
                    for k in range(K):
                        e = e + (beta / 2) * (pow(P[i][k], 2) + pow(Q[k][j], 2))
        if e < 0.001:
            break
    return P, Q  # Q is returned as a K x M matrix


if __name__ == "__main__":
    R = [
        [1, 0, 0, 0, 0, 0, 0],
        [0, 0, 0, 1, 0, 1, 0],
        [1, 1, 0, 0, 0, 0, 0],
        [1, 0, 0, 0, 0, 0, 1],
        [0, 1, 0, 0, 0, 1, 0],
    ]
    R = numpy.array(R)
    N = len(R)      # number of users
    M = len(R[0])   # number of items
    K = 10          # number of features
    P = numpy.random.rand(N, K)
    Q = numpy.random.rand(M, K)
    print('generate target matrix P and Q finished')
    nP, nQ = matrix_factorization(R, P, Q, K)
    print('matrix factorization finished!')
    print(R)
    T = numpy.dot(nP, nQ)  # reconstructed N x M score matrix
    rank = dict()
    for i in range(len(T)):
        # Item indices for user i, sorted from highest to lowest predicted score.
        rank[i] = numpy.argsort(-T[i]).tolist()
    print(rank)
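In practice one usually recommends only items a user has not rated yet, and checks how well the factors reproduce the known ratings. The following is a minimal sketch of both steps. It assumes, as in the code above, that a 0 in R means "not rated". The helpers `observed_rmse` and `recommend_top_n` are hypothetical names introduced here for illustration.

```python
import numpy as np


def observed_rmse(R, scores):
    """Root-mean-square error between R and scores, over observed entries (R > 0) only."""
    R = np.asarray(R, dtype=float)
    scores = np.asarray(scores, dtype=float)
    mask = R > 0
    if not mask.any():
        return 0.0
    diff = R[mask] - scores[mask]
    return float(np.sqrt(np.mean(diff ** 2)))


def recommend_top_n(R, scores, n=3):
    """For each user, return up to n unrated item indices, best predicted score first."""
    R = np.asarray(R)
    scores = np.asarray(scores, dtype=float)
    # Push already-rated items to the end of the ranking.
    masked = np.where(R > 0, -np.inf, scores)
    order = np.argsort(-masked, axis=1, kind="stable")
    result = {}
    for u in range(R.shape[0]):
        unrated = int((R[u] <= 0).sum())
        result[u] = order[u, :min(n, unrated)].tolist()
    return result
```

With the variables from the script above, `recommend_top_n(R, T, n=3)` would give each user's top three unrated items, and `observed_rmse(R, T)` would report the reconstruction error on the known ratings.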