  • Created: 2019-07-08 Last updated: 2019-07-08
Data collection method and cost: This dataset is collected by surveying every traveler who has visited a POI (point of interest). Cost: investigators' salaries and transportation expenses.
Data sharing: May generally be shared
Data sharing (for those who selected "Other"): Everyone can use it.


Data type: Time series
Variable (parameter) names: review_id|longitude|latitude|altitude|review_date|temperature|rating|user_id|user_birthday|user_nationality|category user_career|double user_income
Data overview: The dataset contains travelers' personal information. It has 2,000,000 instances and 12 attributes. The data are collected to find out which factors affect travelers' ratings of POIs.
Intended data analysis / simulation process: The inputs are attributes No. 2 to No. 12. (The first attribute is an ID, which should not be used for training.) The outputs are the factors most likely to be related to rating. Regression methods can be used to find the specific relationship between each factor and rating. In addition, some values of user_income are missing. Career and nationality information can be used to predict and fill in the missing values.
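The two steps described above, filling missing user_income from career and nationality and then regressing rating on the inputs, could look roughly like the following. This is only a minimal sketch under assumptions: group-mean imputation and ordinary least squares are illustrative choices, not methods stated in this entry, the function names `impute_income` and `fit_ols` are hypothetical, and rows are assumed to be dictionaries keyed by the variable names listed above.

```python
from collections import defaultdict
from statistics import mean


def impute_income(rows):
    """Fill missing user_income (None) with the mean income of rows that share
    the same (user_career, user_nationality); fall back to the overall mean.
    Assumption: group-mean imputation is one simple way to do this."""
    groups = defaultdict(list)
    for r in rows:
        if r["user_income"] is not None:
            groups[(r["user_career"], r["user_nationality"])].append(r["user_income"])
    overall = mean(v for vals in groups.values() for v in vals)
    filled = []
    for r in rows:
        r = dict(r)  # copy so the input rows are left untouched
        if r["user_income"] is None:
            vals = groups.get((r["user_career"], r["user_nationality"]))
            r["user_income"] = mean(vals) if vals else overall
        filled.append(r)
    return filled


def fit_ols(X, y):
    """Ordinary least squares via the normal equations (X^T X) b = X^T y,
    solved by Gauss-Jordan elimination with partial pivoting.
    Returns [intercept, coef_1, ..., coef_k]."""
    A = [[1.0] + [float(v) for v in row] for row in X]  # prepend intercept column
    n = len(A[0])
    # Augmented matrix [X^T X | X^T y]
    M = [
        [sum(a[i] * a[j] for a in A) for j in range(n)]
        + [sum(a[i] * yi for a, yi in zip(A, y))]
        for i in range(n)
    ]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(n):
            if r != col:
                f = M[r][col] / M[col][col]
                M[r] = [a - f * b for a, b in zip(M[r], M[col])]
    return [M[i][n] / M[i][i] for i in range(n)]


# Tiny made-up rows, for illustration only (not taken from the dataset).
sample = [
    {"user_career": "engineer", "user_nationality": "JP", "user_income": 100},
    {"user_career": "engineer", "user_nationality": "JP", "user_income": 300},
    {"user_career": "engineer", "user_nationality": "JP", "user_income": None},
    {"user_career": "teacher", "user_nationality": "FR", "user_income": 50},
    {"user_career": "artist", "user_nationality": "US", "user_income": None},
]
filled = impute_income(sample)
```

After imputation, numeric inputs such as temperature and user_income can be passed to `fit_ols` as the columns of `X`, with rating as `y`. Categorical inputs such as user_career or user_nationality would first need a numeric encoding (for example one-hot), which this sketch does not cover.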
Expected results of the analysis / simulation process (data analysis results, tool output, typical examples, etc.): Using this dataset, I can identify the important factors that affect rating and the relationships among these factors.
Analyses expected beyond the process above: Use more suitable regression methods.


Data / tools I would like to obtain: It would be better if I could get these travelers' interests, because I want to analyze the relationship between rating and hobbies. If that is too difficult, interest data of people who live near them could be used as a substitute.