
Looking for recommended datasets for Chinese news recommendation

wangxianchun Registered member
2023-02-27 16:53

For Chinese news recommendation datasets, take a look at the following.
There are several publicly available Chinese news datasets, e.g.
1. THUCNews
2. Sina News
3. Tencent RecSys
4. iFLYTEK RecSys
These datasets contain news articles in different categories as well as user behavior data on news, such as clicks, comments, and shares.
They can be used to train and evaluate news recommendation systems, such as content-based, collaborative filtering, and deep learning recommenders. You can choose the dataset and algorithm that fit your needs and goals.
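
To make the collaborative filtering idea concrete, here is a minimal sketch of an item-based recommender built on click co-occurrence. The toy click log, the IDs, and the function names are all made up for illustration; a real dataset from the list above would need its own loading code, and this is only one simple way to do it.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical toy click log: (user_id, news_id) pairs.
clicks = [
    ("u1", "n1"), ("u1", "n2"),
    ("u2", "n1"), ("u2", "n2"), ("u2", "n3"),
    ("u3", "n2"), ("u3", "n3"),
]

def build_cooccurrence(clicks):
    """Count how often two articles were clicked by the same user."""
    by_user = defaultdict(set)
    for user, item in clicks:
        by_user[user].add(item)
    co = defaultdict(lambda: defaultdict(int))
    for items in by_user.values():
        for a, b in combinations(sorted(items), 2):
            co[a][b] += 1
            co[b][a] += 1
    return by_user, co

def recommend(user, by_user, co, k=3):
    """Score unseen articles by co-occurrence with the user's clicked articles."""
    seen = by_user.get(user, set())
    scores = defaultdict(int)
    for item in seen:
        for other, count in co[item].items():
            if other not in seen:
                scores[other] += count
    # Sort by descending score, then by id so the order is deterministic.
    ranked = sorted(scores.items(), key=lambda pair: (-pair[1], pair[0]))
    return [item for item, _ in ranked[:k]]

by_user, co = build_cooccurrence(clicks)
print(recommend("u1", by_user, co))
```

Users who have already clicked every article, or who do not appear in the log at all, simply get an empty list here; a real system would fall back to something like popular articles.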

cupl513 Registered member
2023-02-27 16:53

Where can the last two dataset series be found?

duellist Registered member
2023-02-27 16:53

You can try the following Chinese news recommendation datasets:

The news recommendation dataset from the Tencent RecSys Challenge 2020. It contains user behavior data extracted from Tencent's news app, including browsing, clicking, commenting, liking, and other actions. With about 400,000 users, 30 million articles, and 120 million user actions, it is a very large dataset.

The MIND large-scale news recommendation dataset. Published by Microsoft, it contains behavioral data covering more than 100,000 news publishers, 70,000 topics, 500,000 articles, and 2 million users, including users' historical clicking, browsing, collecting, and sharing behavior.

The iFLYTEK-THUCNews dataset. Jointly published by iFLYTEK and Tsinghua University, it contains 10 news categories and 600,000 news articles. Each article includes a title, body, keywords, and tags, so it can be used for text classification and news recommendation tasks.

Note that these datasets may need to be cleaned and preprocessed before they can be used for Chinese news recommendation. They may also involve privacy issues, so their use must comply with the relevant laws and regulations.
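
As a rough idea of what the cleaning step might look like, here is a small sketch using only the Python standard library. The column names (`user_id`, `news_id`, `action`, `timestamp`) and the sample rows are assumptions made up for illustration; none of the datasets above is guaranteed to use this layout.

```python
import csv
import io
from datetime import datetime

# Hypothetical raw click log; the column names are assumptions for illustration.
RAW = """user_id,news_id,action,timestamp
u1,n1,click,2023-02-25 10:00:00
u1,n1,click,2023-02-25 10:00:00
u2,,click,2023-02-25 11:30:00
u3,n2,share,not-a-time
u4,n3,comment,2023-02-26 09:15:00
"""

def clean_log(text):
    """Drop rows with missing ids, unparseable timestamps, or exact duplicates."""
    seen = set()
    cleaned = []
    for row in csv.DictReader(io.StringIO(text)):
        user = row["user_id"].strip()
        news = row["news_id"].strip()
        if not user or not news:
            continue  # missing identifier
        try:
            ts = datetime.strptime(row["timestamp"], "%Y-%m-%d %H:%M:%S")
        except ValueError:
            continue  # malformed timestamp
        key = (user, news, row["action"], ts)
        if key in seen:
            continue  # exact duplicate of an earlier row
        seen.add(key)
        cleaned.append(
            {"user_id": user, "news_id": news, "action": row["action"], "timestamp": ts}
        )
    return cleaned

rows = clean_log(RAW)
print(len(rows))
```

Only the first `u1` row and the `u4` row survive in this toy input. Which rows count as invalid is a judgment call that depends on the dataset.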

dgheee1122 Registered member
2023-02-27 16:53

The following answer is based on ChatGPT and GISer Liu:

There are many datasets that can be used to study historical user behavior. Here are some that might work for you:

  1. News Recommendations Dataset: provided by the Recommendations Systems Research Group at Stanford University, it includes user data from news sites. The data includes the user ID, the story ID, the click time, and other relevant information. It can be used to build and evaluate news recommendation systems.
  2. Kaggle Click-Through Rate Prediction Dataset: provided through contests on Kaggle, it includes user click data from the Yahoo! Front Page Today Module. The data includes the user ID, the ad ID, the click time, and other relevant information. It can be used to build and evaluate ad recommendation systems.
  3. MovieLens: provided by the GroupLens research team, it includes user ratings and tag data from a movie rating site. The data includes the user ID, the movie ID, the rating, and other relevant information. It can be used to build and evaluate movie recommendation systems.
  4. Aliyun Tianchi has many publicly available language datasets for download.
  5. Baidu's AI Studio also has many datasets available for download.

All of the above datasets are publicly available and can be downloaded from their official websites or other sources.

daisyby Registered member
2023-02-27 16:53

The following are some commonly used Chinese news recommendation datasets:
(1) Toutiao Dataset: provided by Toutiao, it contains information such as users' clicks on news articles and their reading history. The dataset is 20GB and contains 3.82 million users and 350,000 news articles. Available at https://github.com/THUzhz/ToutiaoDataset_v2.
(2) MIND Dataset: provided by Microsoft, it contains users' click and reading history, search history, collection history, etc. The dataset is 25GB and contains 10 million users and 5.5 million news articles. Available at https://www.microsoft.com/en-us/research/project/mind-large-scale-ai-for-computational-advertising/.
(3) Sina News Dataset: provided by Sina, it contains information such as users' clicks on news articles and their reading history. The dataset is 4GB and contains 20 million users and 50 million news articles. Available at http://ir.sina.com.cn/data.html.

davee20000 Registered member
2023-02-27 16:53

This is a mechanism by which news sites record users' visit history, similar to browser history.

yt0316 Registered member
2023-02-27 16:53

There are some datasets available that suit your needs. For example, the Yahoo! News dataset is a typical user history behavior dataset; it contains users' history of clicking article links on the Yahoo! News website. There are also other commonly used user behavior datasets, such as MovieLens, Netflix, and Last.fm.

Several open datasets of user history behavior are available. For example, Kaggle has a dataset called the News Clicks Dataset, which contains 74,809 news stories from one week and the click records of more than 2 million active users of a news site. There are other similar datasets as well, such as Yahoo News Click and User Click Data From Shopping Website.

Some public datasets can be found online, for example in the UCI Machine Learning Repository (https://archive.ics.uci.edu/ml/index.php) and on Kaggle (https://www.kaggle.com/). Some of them are user behavior history datasets, such as:

    MovieLens (https://grouplens.org/datasets/movielens/): contains users' movie rating data, which can be used to analyze users' movie viewing history.

    Last.fm (https://grouplens.org/datasets/hetrec-2011/): contains users' music playback history, which can be used to analyze music listening behavior.

    HetRec (https://grouplens.org/datasets/hetrec-2011/): contains users' browsing history for movies, music, books, and so on, which can be used to analyze user behavior across several kinds of entertainment.

dengjiaqing2012 Registered member
2023-02-27 16:53

This dataset selects a batch of users (candidate.txt) and a batch of candidate content items (news_info.csv) to recommend to those users. It also provides several kinds of behavior data for this group of users on the content over 3 days (recorded as day N-2, day N-1, and day N), including clicks, complete reads, comments, favorites, and shares, as training data.
https://aistudio.baidu.com/aistudio/datasetdetail/2352/0
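
One common way to work with three days of behavior data like this is to hold out the last day for validation and build user histories from the earlier days. The sketch below illustrates that idea only; the record layout, the IDs, and the function names are hypothetical and are not taken from the actual files in the dataset.

```python
from collections import defaultdict

# Hypothetical behavior records: (user_id, news_id, action, day),
# where day is one of "N-2", "N-1", "N" as in the description above.
records = [
    ("u1", "a1", "click", "N-2"),
    ("u1", "a2", "share", "N-1"),
    ("u1", "a3", "click", "N"),
    ("u2", "a1", "click", "N-1"),
    ("u2", "a2", "comment", "N"),
]

def split_by_day(records, holdout_day="N"):
    """Use the holdout day for validation and all other days for training."""
    train = [r for r in records if r[3] != holdout_day]
    valid = [r for r in records if r[3] == holdout_day]
    return train, valid

def user_histories(train):
    """Collect, per user, the list of articles interacted with in training."""
    hist = defaultdict(list)
    for user, news, _action, _day in train:
        hist[user].append(news)
    return dict(hist)

train, valid = split_by_day(records)
histories = user_histories(train)
print(histories)
```

Splitting by day rather than at random keeps the validation behavior strictly later than the training behavior, which is closer to how a recommender is used in practice.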

csy19920701 Registered member
2023-02-27 16:53

There are many Chinese datasets that can be used for news recommendation, some of which include user history behavior, such as:

1. MIND (Microsoft News Dataset): a large-scale news dataset provided by Microsoft that contains real news data and user interaction data and can be used for news recommendation research. It contains news articles, users' news click and impression behavior, and other relevant information.

2. Alibaba News Recommendation Dataset: the news recommendation dataset provided by Alibaba contains real news data and user interaction data. It contains users' news click, impression, and search behavior, as well as other relevant information.

3. Toutiao User Behavior Dataset: the user behavior dataset provided by Toutiao contains real news data and user interaction data. It contains users' news clicks, impressions, comments, and favorites, as well as other relevant information.

If this helps, please accept the answer. Thank you.

Question Info

Publish Time
2023-02-27 16:53
Update Time
2023-02-27 16:53