by Shiyu Chen
With the continuous development of network technology and the ever-expanding scale of e-commerce, the number and variety of goods grow rapidly and users need to spend a lot of time to find the goods they want to buy. This is information overload. In order to solve this problem, recommendation system came into being.
The recommendation system is a subset of the Information Filtering System, which can be used in a range of areas such as movies, music, e-commerce, and Feed stream recommendations. The recommendation system discovers the user’s personalized needs and interests by analyzing and mining user behaviors, and recommends information or products that may be of interest to the user. Unlike search engines, recommendation system do not require users to accurately describe their needs, but model their historical behavior to proactively provide information that meets user interests and needs.
In this article we use PaddlePaddle, a deep learning platform from Baidu, to build a model and combine Milvus, a vector similarity search engine, to build a personalized recommendation system that can quickly and accurately provide users with interesting information.
We take MovieLens Million Dataset (ml-1m)  as an example. The ml-1m dataset contains 1,000,000 reviews of 4,000 movies by 6,000 users, collected by the GroupLens Research lab. The original data includes feature data of the movie, user feature, and user rating of the movie, you can refer to ml-1m-README  .
ml-1m dataset includes 3 .dat articles: movies.dat、users.dat and ratings.dat.movies.dat includes movie’s features, see example below:
This means that the movie id is 1, and the title is 《Toy Story》, which is divided into three categories. These three categories are animation, children, and comedy.
users.dat includes user’s features, see example below:
This means that the user ID is 1, female, and younger than 18 years old. The occupation ID is 10.
ratings.dat includes the feature of movie rating, see example below:
That is, the user 1 evaluates the movie 1193 as 5 points.
In the film personalized recommendation system we used the Fusion Recommendation Model  which PaddlePaddle has implemented. This model is created from its industrial practice.
First, take user features and movie features as input to the neural network, where:
For the user feature, map the user ID to a vector representation with a dimension size of 256, enter the fully connected layer, and do similar processing for the other three attributes. Then the feature representations of the four attributes are fully connected and added separately.
For movie features, the movie ID is processed in a manner similar to the user ID. The movie type ID is directly input into the fully connected layer in the form of a vector, and the movie name is represented by a fixed-length vector using a text convolutional neural network. The feature representations of the three attributes are then fully connected and added separately.
After obtaining the vector representation of the user and the movie, calculate the cosine similarity of them as the score of the personalized recommendation system. Finally, the square of the difference between the similarity score and the user’s true score is used as the loss function of the regression model.
Combined with PaddlePaddle’s fusion recommendation model, the movie feature vector generated by the model is stored in the Milvus vector similarity search engine, and the user feature is used as the target vector to be searched. Similarity search is performed in Milvus to obtain the query result as the recommended movies for the user.
The inner product (IP) method is provided in Milvus to calculate the vector distance. After normalizing the data, the inner product similarity is consistent with the cosine similarity result in the fusion recommendation model.
There are three steps in building a personalized recommendation system with Milvus, details on how to operate please refer to Mivus Bootcamp .
# run train.py $ python train.py
Running this command will generate a model recommender_system.inference.model in the directory, which can convert movie data and user data into feature vectors, and generate application data for Milvus to store and retrieve.
# Data preprocessing, -f followed by the parameter raw movie data file name $ python get_movies_data.py -f movies_origin.txt
Running this command will generate test data movies_data.txt in the directory to achieve pre-processing of movie data.
# Implementing personal recommender system based on user conditions $ python infer_milvus.py -a <age>-g <gender>-j <job>[-i]
Running this command will implement personalized recommendations for specified users.
The main process is:
Prediction of the top five movies that the user is interested in:
TopIdsTitleScore 03030Yojimbo2.9444923996925354 13871Shane2.8583481907844543 23467Hud2.849525213241577 31809Hana-bi2.826111316680908 43184Montana2.8119677305221558
By inputing user information and movie information to the fusion recommendation model we can get matching scores, and then sort the scores of all movies based on the user to recommend movies that may be of interest to the user. This article combines Milvus and PaddlePaddle to build a personalized recommendation system. Milvus, a vector search engine, is used to store all movie feature data, and then similarity retrieval is performed on user features in Milvus. The search result is the movie ranking recommended by the system to the user.
Milvus  vector similarity search engine is compatible with various deep learning platforms, searching billions of vectors with only millisecond response. You can explore more possibilities of AI applications with Milvus with ease!
MovieLens Million Dataset (ml-1m): http://files.grouplens.org/datasets/movielens/ml-1m.zip
Fusion Recommendation Model by PaddlePaddle: https://www.paddlepaddle.org.cn/documentation/docs/zh/beginners_guide/basics/recommender_system/index.html#id7