Type of data: Check all that apply. Use "Other" to specify other types so that we can include them in further updates. |
series
table
|
Variable labels of dataset (the names of the variables) |
review_id|longitude|latitude|altitude|review_date|temperature|rating|user_id|user_birthday|user_nationality|user_career (category)|user_income (double) |
Outline of data |
The dataset contains travelers' personal information. It has 2,000,000 instances and 12 attributes. The data were collected to identify which factors affect travelers when they rate a point of interest (POI). |
Simulation process |
The input set consists of attributes No. 2 to No. 12. (The first attribute is an ID, so it should not be used for training.) The output set consists of the factors most likely to be related to rating.
We can use a regression method to find the specific relationship between these factors and rating.
In addition, some values of user_income are missing. We can use the career and nationality information to predict and fill in the missing values. |
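The two steps of the simulation process could be sketched roughly as follows. This is only an illustration under stated assumptions: the rows are invented toy values (not taken from the dataset), the helper names `impute_income` and `simple_slope` are hypothetical, group-median filling stands in for whatever forecasting method is eventually chosen for user_income, and a single-predictor least-squares fit stands in for "some regression method".

```python
from collections import defaultdict
from statistics import mean, median


def impute_income(rows):
    """Fill missing user_income with the median income of rows that share
    the same (user_career, user_nationality); fall back to the overall median."""
    groups = defaultdict(list)
    for r in rows:
        if r["user_income"] is not None:
            groups[(r["user_career"], r["user_nationality"])].append(r["user_income"])
    overall = median(v for vals in groups.values() for v in vals)
    filled = []
    for r in rows:
        r = dict(r)  # copy so the input rows are left untouched
        if r["user_income"] is None:
            vals = groups.get((r["user_career"], r["user_nationality"]))
            r["user_income"] = median(vals) if vals else overall
        filled.append(r)
    return filled


def simple_slope(xs, ys):
    """Ordinary least-squares slope and intercept of ys on one predictor xs."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, my - slope * mx


# Toy rows, invented for illustration only (assumption: None marks a missing income).
rows = [
    {"user_career": "engineer", "user_nationality": "JP", "user_income": 50000.0, "temperature": 10.0, "rating": 3.0},
    {"user_career": "engineer", "user_nationality": "JP", "user_income": 70000.0, "temperature": 20.0, "rating": 4.0},
    {"user_career": "engineer", "user_nationality": "JP", "user_income": None, "temperature": 30.0, "rating": 5.0},
    {"user_career": "teacher", "user_nationality": "JP", "user_income": 30000.0, "temperature": 15.0, "rating": 3.5},
    {"user_career": "teacher", "user_nationality": "FR", "user_income": None, "temperature": 25.0, "rating": 4.5},
]

filled = impute_income(rows)
slope, intercept = simple_slope(
    [r["temperature"] for r in filled],
    [r["rating"] for r in filled],
)
```

On the real data, the same fit would be repeated for each candidate attribute (or replaced by a multiple regression), and categorical attributes such as user_career would need to be encoded numerically first.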
Expected outcome of the process (obtained knowledge, analysis results, output of tools) |
Using this dataset, I can identify the important factors that affect rating and the relationships among these factors. |
Anticipation for analyses/simulations other than the typical ones provided above |
Use more suitable regression methods. |