WEBVTT 1 00:00:14.380 --> 00:00:19.100 So the second part is about our own research. 2 00:00:21.110 --> 00:00:24.500 The data we use is mobile phone signaling data. 3 00:00:26.090 --> 00:00:27.720 It is data for Shanghai. 4 00:00:27.720 --> 00:00:30.550 So all of our results are about Shanghai. 5 00:00:30.770 --> 00:00:33.330 Spatial analysis of Shanghai. 6 00:00:35.600 --> 00:00:42.730 Signaling data was right at the top of Mr. Yang Junyan's classification list just now. 7 00:00:44.020 --> 00:00:45.050 Lowest. 8 00:00:46.240 --> 00:00:47.850 What are its advantages? 9 00:00:49.170 --> 00:00:51.830 First, it has a very high sampling rate. 10 00:00:52.140 --> 00:00:54.810 No matter which of the three operators you take, 11 00:00:55.630 --> 00:00:57.570 they all have users in the millions. 12 00:00:59.530 --> 00:01:04.700 The data we were given, the data we are allowed to use, is from China Mobile. 13 00:01:05.100 --> 00:01:10.620 Its sample size is close to 20 million. 14 00:01:10.810 --> 00:01:13.850 Around 16 to 17 million, a sample of that size. 15 00:01:17.240 --> 00:01:18.100 The second advantage: 16 00:01:18.320 --> 00:01:19.730 it covers 24 hours. 17 00:01:20.710 --> 00:01:22.490 Very complete time coverage. 18 00:01:23.570 --> 00:01:26.140 Your position over all 24 hours is in the records. 19 00:01:30.420 --> 00:01:32.310 Then, on the accuracy of its records: 20 00:01:33.170 --> 00:01:34.970 we use signaling data. 21 00:01:36.380 --> 00:01:37.670 So-called signaling data 22 00:01:39.450 --> 00:01:43.860 is the regular communication between a mobile phone and the base station, 23 00:01:44.340 --> 00:01:45.770 a mutual confirmation. 24 00:01:48.060 --> 00:01:50.200 It is not the same as call data. 25 00:01:50.770 --> 00:01:53.930 Call data is recorded only when you make a call. 26 00:01:53.930 --> 00:01:56.150 With signaling data, even if you do not make a call, 27 00:01:56.150 --> 00:01:59.760 the network still checks at regular intervals where you are, 28 00:02:00.150 --> 00:02:03.500 which base station you are at.
29 00:02:06.460 --> 00:02:08.720 So it is recorded 30 00:02:09.040 --> 00:02:09.930 not on the user's initiative. 31 00:02:09.930 --> 00:02:11.360 When you use WeChat 32 00:02:11.360 --> 00:02:13.320 or post on Weibo, that is an active record. 33 00:02:13.600 --> 00:02:15.110 This is passive. 34 00:02:15.110 --> 00:02:18.160 Everyone gets recorded. 35 00:02:19.000 --> 00:02:23.860 So the result is spatial information with full time coverage. 36 00:02:24.510 --> 00:02:27.820 It is the most direct record of residents' spatial activities. 37 00:02:29.430 --> 00:02:33.760 I used to study residents' behavior, 38 00:02:33.920 --> 00:02:36.480 using quantitative methods to study residents' behavior. 39 00:02:36.680 --> 00:02:38.700 So when I got this data, 40 00:02:39.070 --> 00:02:40.130 I was very excited. 41 00:02:41.740 --> 00:02:44.260 I even thought I would never have to do surveys again, 42 00:02:44.260 --> 00:02:45.260 questionnaire surveys. 43 00:02:45.380 --> 00:02:46.460 Of course, that is impossible. 44 00:02:46.920 --> 00:02:51.800 But the questionnaire is a difficult step in survey work, 45 00:02:52.460 --> 00:02:54.780 a rather nerve-racking step. 46 00:02:54.940 --> 00:02:56.120 With this much data, 47 00:02:56.330 --> 00:02:58.190 almost all activity is in it. 48 00:02:58.350 --> 00:03:02.000 I thought that from now on I could just sit at home and analyze it. 49 00:03:02.700 --> 00:03:04.560 That was of course naive. 50 00:03:04.560 --> 00:03:06.680 But it is true that, to some extent, 51 00:03:06.820 --> 00:03:08.440 since I started working with this data, 52 00:03:08.880 --> 00:03:11.670 I have spent much less time on questionnaires. 53 00:03:14.600 --> 00:03:16.760 This data has four pieces of information: 54 00:03:17.690 --> 00:03:18.180 A, B, C 55 00:03:18.180 --> 00:03:19.330 and a fourth. 56 00:03:20.130 --> 00:03:23.230 There is an ID. Aren't we all very worried 57 00:03:23.870 --> 00:03:26.350 that our information is being leaked?
58 00:03:26.660 --> 00:03:27.310 It is not. 59 00:03:27.580 --> 00:03:28.260 You can see, 60 00:03:28.260 --> 00:03:29.420 this is one signaling record. 61 00:03:29.810 --> 00:03:36.170 This is the ID, the mobile phone ID, a string like this that carries no personal information. 62 00:03:36.650 --> 00:03:38.100 This information is encrypted. 63 00:03:38.370 --> 00:03:42.940 It is encrypted to produce a unique identification number. 64 00:03:43.310 --> 00:03:44.840 This is for a mobile phone, a SIM card. 65 00:03:44.840 --> 00:03:46.740 Each SIM card has an identification number. 66 00:03:47.460 --> 00:03:53.770 Next comes the date, June 26, 2009, 67 00:03:54.160 --> 00:03:55.110 and a time. 68 00:03:56.090 --> 00:03:58.070 After that is the number of the base station. 69 00:03:59.040 --> 00:04:00.740 When your phone's location is recorded, 70 00:04:00.740 --> 00:04:02.550 it cannot be pinpointed to an exact location, 71 00:04:03.280 --> 00:04:06.350 to your coordinates. 72 00:04:06.680 --> 00:04:07.070 It cannot be fixed that precisely. 73 00:04:07.070 --> 00:04:08.900 It can only be fixed to the base station. 74 00:04:09.390 --> 00:04:11.380 The base station is identified by a pair of numbers. 75 00:04:11.380 --> 00:04:15.000 The two together give one specific identifier. 76 00:04:15.350 --> 00:04:17.120 Then there are some status indicators: 77 00:04:17.360 --> 00:04:19.440 what state is your mobile phone in? 78 00:04:20.350 --> 00:04:28.550 These four items can be said to be the minimum amount of information in signaling data. 79 00:04:30.200 --> 00:04:31.390 There can be more than that, 80 00:04:31.630 --> 00:04:32.720 not only this information. 81 00:04:33.210 --> 00:04:36.520 But in general, if the operator gives you data to analyze, 82 00:04:36.690 --> 00:04:40.470 it will be processed like this into information that carries no privacy at all 83 00:04:41.180 --> 00:04:43.110 before it can be handed over to someone else. 84 00:04:48.280 --> 00:04:53.810 Of course, this data is not ready to use as it comes.
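The four-field record described above (encrypted ID, date and time, a two-part base station number, and a status indicator) can be sketched as a small data structure. This is only an illustration: the field names `lac` and `cell_id`, the comma-separated layout, and the sample values are assumptions, since the lecture does not show the actual file format.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class SignalingRecord:
    user_id: str         # encrypted SIM identifier, carries no personal information
    timestamp: datetime  # date and time of the record
    lac: str             # first half of the base station number (assumed name)
    cell_id: str         # second half of the base station number (assumed name)
    status: str          # phone status indicator

    @property
    def station(self):
        # Only the two numbers together identify one base station.
        return (self.lac, self.cell_id)


def parse_record(line):
    """Parse one comma-separated line (hypothetical layout) into a record."""
    user_id, stamp, lac, cell_id, status = line.strip().split(",")
    when = datetime.strptime(stamp, "%Y-%m-%d %H:%M:%S")
    return SignalingRecord(user_id, when, lac, cell_id, status)


# Invented sample line; real identifiers would look different.
record = parse_record("a3f9c2e1,2009-06-26 08:15:00,6145,20311,1")
print(record.station, record.timestamp)
```

Note that the record locates a phone only at a base station, never at coordinates, which is why the station pair is the spatial key in everything that follows.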
85 00:04:55.470 --> 00:04:56.850 Signaling data is collected 86 00:04:56.850 --> 00:04:59.150 for the operator's own management. 87 00:04:59.930 --> 00:05:01.520 Is a base station full? 88 00:05:02.450 --> 00:05:03.570 If it has been at full load for a while, 89 00:05:03.570 --> 00:05:05.710 the operator adds another one to increase capacity. 90 00:05:07.160 --> 00:05:10.200 Or if hardly anyone uses a base station, it will be removed. 91 00:05:10.200 --> 00:05:12.040 It is for internal management. 92 00:05:12.040 --> 00:05:16.530 So this data was not collected for us planners. 93 00:05:16.760 --> 00:05:22.160 So of course, having been processed like this, 94 00:05:22.350 --> 00:05:23.540 it has no trip purpose 95 00:05:23.960 --> 00:05:25.070 and no travel mode. 96 00:05:25.340 --> 00:05:30.640 And these are the most important things for our transport planning. 97 00:05:30.870 --> 00:05:32.220 There are also errors 98 00:05:32.220 --> 00:05:33.670 and missing records. 99 00:05:34.080 --> 00:05:35.250 It has these errors. 100 00:05:36.030 --> 00:05:37.130 Even so, 101 00:05:37.130 --> 00:05:41.380 this data is still very valuable for urban spatial analysis. 102 00:05:43.760 --> 00:05:45.420 So we said the spatial precision 103 00:05:45.560 --> 00:05:48.160 is based on the base station. 104 00:05:48.160 --> 00:05:52.260 So how precise is a single base station spatially? 105 00:05:52.740 --> 00:05:56.670 There are more than 36,000 base stations in Shanghai. 106 00:05:57.540 --> 00:05:59.690 In the city center they can be said to be dense, 107 00:06:01.320 --> 00:06:04.000 about 100 to 300 meters apart. 108 00:06:04.850 --> 00:06:06.520 The signal our phones have right now, 109 00:06:06.850 --> 00:06:09.800 which base station is it connected to? 110 00:06:10.100 --> 00:06:13.100 The base station at Tongji University is on top of the library. 111 00:06:15.210 --> 00:06:17.620 You can all see it on top of the library.
112 00:06:17.850 --> 00:06:19.610 It is almost a full ring facing all directions. 113 00:06:19.900 --> 00:06:22.600 All three operators are up there: China Mobile, 114 00:06:23.030 --> 00:06:24.550 China Telecom and China Unicom. 115 00:06:24.940 --> 00:06:27.910 So we are all connected up there right now. 116 00:06:28.940 --> 00:06:30.640 Then precision has to be sacrificed a little when you get to the suburbs, 117 00:06:30.640 --> 00:06:33.180 to about one to three kilometers. 118 00:06:35.830 --> 00:06:37.470 Then the total number of records. 119 00:06:37.670 --> 00:06:39.700 What we saw just now was one record. 120 00:06:39.700 --> 00:06:42.480 It is four to eight [unclear] a day, 121 00:06:42.480 --> 00:06:43.540 fewer on weekends 122 00:06:43.540 --> 00:06:44.570 and a little more on weekdays. 123 00:06:44.810 --> 00:06:47.130 So from these records we can count 124 00:06:47.430 --> 00:06:48.650 how many users there are. 125 00:06:48.850 --> 00:06:50.950 It is 23 million over two weeks. 126 00:06:51.940 --> 00:06:55.050 The daily figure is about 16 to 18 million. 127 00:06:55.670 --> 00:06:58.880 That is about 70% of the permanent population. 128 00:06:59.300 --> 00:07:01.470 You might ask why the total reaches 23 million. 129 00:07:02.480 --> 00:07:04.270 There is a large floating population. 130 00:07:04.660 --> 00:07:06.920 Say you pass through Shanghai at the railway station. 131 00:07:07.130 --> 00:07:09.880 Even if you just pass through on a train, you are still recorded in Shanghai. 132 00:07:10.340 --> 00:07:11.790 People like that are included too. 133 00:07:11.790 --> 00:07:13.600 So the total is 23 million. 134 00:07:17.000 --> 00:07:18.960 What do we do with this data? 135 00:07:19.830 --> 00:07:21.470 It can be put to a lot of uses. 136 00:07:24.620 --> 00:07:26.750 First of all, of course, when the data arrives, 137 00:07:26.750 --> 00:07:28.660 we have to turn it into usable data. 138 00:07:28.660 --> 00:07:29.550 A preprocessing step. 139 00:07:30.320 --> 00:07:31.860 It is called the cleaning process.
140 00:07:33.430 --> 00:07:34.950 It includes the following: 141 00:07:37.380 --> 00:07:40.910 techniques for eliminating anomalies such as failures and drift, 142 00:07:41.410 --> 00:07:43.710 and for eliminating spatial deviations. 143 00:07:43.710 --> 00:07:47.230 Of course, we are not doing all of this work yet. 144 00:07:47.230 --> 00:07:51.310 We just get rid of some obvious errors. 145 00:07:53.650 --> 00:07:57.110 Even after this cleaning, 146 00:07:57.340 --> 00:08:00.010 we still have to do data training. 147 00:08:00.770 --> 00:08:04.010 Data training here means label identification. 148 00:08:05.180 --> 00:08:08.100 This requires data covering a certain period of time. 149 00:08:08.900 --> 00:08:10.240 If you only have one day's data, 150 00:08:10.240 --> 00:08:11.280 you cannot identify anything. 151 00:08:12.970 --> 00:08:14.980 Now we have two weeks of data. 152 00:08:14.980 --> 00:08:18.150 So after training on two weeks, 153 00:08:18.390 --> 00:08:22.680 you can know where a user lives and where he works, 154 00:08:23.080 --> 00:08:24.800 his regular shopping places, 155 00:08:24.990 --> 00:08:26.910 places for leisure, and so on. 156 00:08:27.420 --> 00:08:30.330 By now we can even produce dozens of attributes, 157 00:08:31.450 --> 00:08:34.880 such as distance and frequency. 158 00:08:38.800 --> 00:08:44.940 Once these things are labeled, the data becomes usable. 159 00:08:45.910 --> 00:08:49.820 This job needs to be done by students with somewhat more specialized skills. 160 00:08:49.820 --> 00:08:51.620 After it is done, the data looks like this. 161 00:08:52.020 --> 00:08:54.520 With a little bit of background, 162 00:08:55.020 --> 00:08:56.820 some knowledge of databases, 163 00:08:56.920 --> 00:08:59.550 you can do what you want with it. 164 00:09:02.100 --> 00:09:04.860 How accurate is mobile phone data? 165 00:09:08.240 --> 00:09:10.440 People often come and ask this question.
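The label identification step described above, finding where a user lives and works from several days of records, can be illustrated with a simple heuristic. This is a sketch under stated assumptions, not the method used in the lecture: home is taken as the base station seen on the most distinct days during assumed night hours, work as the one seen on the most distinct weekdays during assumed office hours. The function name, hour ranges, `min_days` threshold, and sample data are all hypothetical.

```python
from collections import defaultdict
from datetime import datetime

NIGHT_HOURS = set(range(0, 6)) | {21, 22, 23}  # assumed "at home" hours
WORK_HOURS = set(range(9, 17))                 # assumed "at work" hours


def infer_anchor(records, hours, weekdays_only=False, min_days=2):
    """Return {user: station} for the station seen on the most distinct days.

    records: iterable of (user_id, datetime, station) tuples.
    A station counts only if seen on at least min_days distinct days,
    which is why a single day of data cannot identify anything.
    """
    days_seen = defaultdict(set)  # (user, station) -> set of dates
    for user, ts, station in records:
        if ts.hour not in hours:
            continue
        if weekdays_only and ts.weekday() >= 5:  # 5, 6 = Saturday, Sunday
            continue
        days_seen[(user, station)].add(ts.date())

    best = {}  # user -> (station, number of days)
    for (user, station), days in days_seen.items():
        if len(days) < min_days:
            continue
        if user not in best or len(days) > best[user][1]:
            best[user] = (station, len(days))
    return {user: station for user, (station, _) in best.items()}


# Invented sample: three weekdays of records for one user.
sample = []
for day in (8, 9, 10):  # 8 to 10 January 2024, Monday to Wednesday
    sample.append(("u1", datetime(2024, 1, day, 2, 0), "A"))   # night at A
    sample.append(("u1", datetime(2024, 1, day, 10, 0), "B"))  # daytime at B
sample.append(("u1", datetime(2024, 1, 11, 23, 0), "C"))       # one-off night at C

home = infer_anchor(sample, NIGHT_HOURS)
work = infer_anchor(sample, WORK_HOURS, weekdays_only=True)
print(home, work)
```

Counting distinct days rather than raw records keeps one long stay from outweighing a regular pattern, which matches the point that a certain period of data is required.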
166 00:09:10.630 --> 00:09:11.870 Is it accurate or not? 167 00:09:15.350 --> 00:09:17.570 We can answer: you have no more accurate data, 168 00:09:17.570 --> 00:09:18.800 so this is the most accurate. 169 00:09:19.380 --> 00:09:21.110 With nothing more accurate, this is the most accurate. 170 00:09:21.110 --> 00:09:22.670 If you have more accurate data, 171 00:09:22.670 --> 00:09:23.590 we will use it. 172 00:09:23.870 --> 00:09:24.870 Can you produce it? 173 00:09:24.870 --> 00:09:25.780 You cannot. 174 00:09:25.980 --> 00:09:27.760 Of course, we should still try our best to answer properly. 175 00:09:27.760 --> 00:09:30.120 We ourselves also need to be convinced that the data is accurate. 176 00:09:30.120 --> 00:09:31.740 So we took the fifth census, 177 00:09:31.740 --> 00:09:36.530 rather, the sixth census data, and compared it with what we identified from the mobile phone data. 178 00:09:36.980 --> 00:09:38.230 This is the resident population. 179 00:09:38.870 --> 00:09:41.990 One is the resident population recorded by the census; the other is from mobile phone data, 180 00:09:41.990 --> 00:09:43.760 the number of people we identified as living there. 181 00:09:44.220 --> 00:09:45.630 Let's look at these two maps. 182 00:09:45.630 --> 00:09:47.040 Then run a correlation analysis: 183 00:09:47.040 --> 00:09:50.100 0.9, which is still quite high. 184 00:09:50.100 --> 00:09:53.490 And there is already a gap of four years between them. 185 00:09:54.530 --> 00:09:59.390 So we can say we remain firmly convinced that results produced from mobile phone data, 186 00:09:59.390 --> 00:10:01.930 at the large scale, are very accurate. 187 00:10:03.540 --> 00:10:06.370 So with this data, you can do a lot of things. 188 00:10:06.370 --> 00:10:11.040 So far, we have grouped the work into these areas. 189 00:10:11.040 --> 00:10:12.890 The first is the dynamic assessment of population.
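The census comparison described above comes down to a correlation coefficient between two counts per spatial unit. A minimal sketch of that calculation follows, assuming the Pearson coefficient is meant (the lecture says only "correlation"). The district figures are invented for illustration and are not the Shanghai data.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length, at least 2 items")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Hypothetical residents per district: census count vs. users whose
# identified home falls in the same district.
census_residents = [120_000, 80_000, 45_000, 200_000, 60_000]
phone_residents = [85_000, 52_000, 35_000, 150_000, 38_000]

r = pearson_r(census_residents, phone_residents)
print(round(r, 3))
```

Because the phone sample covers only part of the population, the absolute counts differ; the coefficient checks whether the spatial pattern agrees, which is what the comparison of the two maps is testing.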
190 00:10:14.110 --> 00:10:15.370 Population dynamics. 191 00:10:16.720 --> 00:10:18.510 When we used to talk about a city's population, 192 00:10:18.510 --> 00:10:19.760 there was no concept of dynamics. 193 00:10:20.440 --> 00:10:21.630 Or if it was dynamic, 194 00:10:21.630 --> 00:10:24.370 it meant year-to-year change, last year versus the year before. 195 00:10:24.640 --> 00:10:26.320 Now that we have mobile phone data, 196 00:10:26.550 --> 00:10:27.440 when you state a population, 197 00:10:27.440 --> 00:10:29.370 you must say at what time of day. 198 00:10:30.070 --> 00:10:31.440 Because at different times of the day 199 00:10:31.440 --> 00:10:32.650 the population is different. 200 00:10:33.990 --> 00:10:35.110 So we make a dynamic assessment. 201 00:10:35.110 --> 00:10:37.600 The second is assessment of the built environment, 202 00:10:37.770 --> 00:10:39.950 or assessment of plan implementation. 203 00:10:41.280 --> 00:10:44.310 Mobile phone data can only capture activities, 204 00:10:44.550 --> 00:10:45.720 the state of activity. 205 00:10:46.910 --> 00:10:48.390 So where these activities take place 206 00:10:48.390 --> 00:10:52.300 reflects a range of performance of the built environment. 207 00:10:52.530 --> 00:10:57.090 So we use this data to look back at the built environment. 208 00:10:59.540 --> 00:11:02.750 The third is building a platform for planning monitoring. 209 00:11:03.170 --> 00:11:05.080 The city is changing. 210 00:11:05.520 --> 00:11:07.140 Plans are being implemented. 211 00:11:07.760 --> 00:11:11.150 So how do we know the current situation? 212 00:11:13.180 --> 00:11:14.000 By collecting data. 213 00:11:16.870 --> 00:11:19.800 The fourth is analysis of specific facilities and activities. 214 00:11:21.080 --> 00:11:25.740 People may assume that the advantage of big data is doing big things. 215 00:11:26.140 --> 00:11:29.370 In fact, the advantage of big data is doing small things.
216 00:11:30.900 --> 00:11:32.100 All kinds of small things. 217 00:11:32.100 --> 00:11:34.390 It can do big things, of course. 218 00:11:34.390 --> 00:11:35.380 But besides the big, 219 00:11:35.560 --> 00:11:36.270 beyond that, 220 00:11:36.270 --> 00:11:39.690 you can enlarge this picture indefinitely and still see the detail in it. 221 00:11:41.150 --> 00:11:43.070 This is something small data cannot do. 222 00:11:45.650 --> 00:11:47.170 So from things like that, 223 00:11:47.320 --> 00:11:49.120 we produced a set of results. 224 00:11:49.540 --> 00:11:51.580 We call it a set of current-situation maps, 225 00:11:52.170 --> 00:11:59.390 an index system, and a sustainable planning management system. 226 00:12:00.680 --> 00:12:01.610 The set of current-situation maps: 227 00:12:01.610 --> 00:12:06.290 here we assess the current built environment as reflected in the maps. 228 00:12:06.760 --> 00:12:09.580 These maps can be produced regularly every year, 229 00:12:09.870 --> 00:12:12.150 updated according to mobile phone data. 230 00:12:12.410 --> 00:12:12.630 Right. 231 00:12:12.790 --> 00:12:14.220 The current situation keeps being updated too. 232 00:12:14.700 --> 00:12:15.890 The index system: 233 00:12:15.890 --> 00:12:17.170 based on this current situation, 234 00:12:17.170 --> 00:12:19.100 we worked out a series of indicators. 235 00:12:19.100 --> 00:12:21.620 These indicators reflect different aspects of the city. 236 00:12:23.210 --> 00:12:26.990 These can also be released on a regular schedule. 237 00:12:28.110 --> 00:12:35.100 At the same time, this data can be used to build a dynamic monitoring system for planning and management. 238 00:12:38.080 --> 00:12:44.980 Then, the spatial unit for such maps or indicators can be the whole city of Shanghai. 239 00:12:45.450 --> 00:12:46.760 It can also be 240 00:12:46.880 --> 00:12:48.800 the urban area or a district or county, 241 00:12:49.260 --> 00:12:51.540 or even down to a control-plan unit or subdistrict.
242 00:12:52.670 --> 00:12:56.010 So the room for application should be said to be very large. 243 00:12:56.590 --> 00:12:58.650 It is definitely not something you do just once. 244 00:12:59.550 --> 00:13:01.030 It can be done over and over. 245 00:13:01.570 --> 00:13:03.180 Then there are specific facilities: 246 00:13:04.020 --> 00:13:06.930 football stadiums, dance halls. 247 00:13:07.080 --> 00:13:08.530 Analysis can be done for specific facilities. 248 00:13:09.980 --> 00:13:11.090 What is more important: 249 00:13:11.250 --> 00:13:12.190 I wrote a sentence here. 250 00:13:12.190 --> 00:13:15.050 Mobile phone signaling data is itself also being upgraded. 251 00:13:15.310 --> 00:13:20.260 The mobile phone data we have now is not the same as the data we received a few years ago, 252 00:13:20.540 --> 00:13:21.230 including in accuracy, 253 00:13:21.410 --> 00:13:22.110 including in what it contains. 254 00:13:22.730 --> 00:13:24.510 It is itself evolving. 255 00:13:25.520 --> 00:13:28.510 The precision of 4G technology is very high. 256 00:13:28.690 --> 00:13:32.740 We have not yet had the opportunity to use 4G signaling data. 257 00:13:33.430 --> 00:13:37.260 And we can only imagine what kind of data 5G, 6G, 7G will bring. 258 00:13:37.320 --> 00:13:38.650 We cannot know yet. 259 00:13:41.650 --> 00:13:42.130 There is a need. 260 00:13:42.130 --> 00:13:43.470 So if anyone is interested, 261 00:13:43.790 --> 00:13:46.490 it is really worth considering developing in this direction.
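The lecture's point about population dynamics, that a population figure must be stated together with a time of day, can be sketched as a count of distinct users per base station and hour. This is a minimal illustration; the function name, the tuple layout, and the sample records are assumptions, and a real analysis would aggregate stations into planning units.

```python
from collections import defaultdict
from datetime import datetime


def population_by_hour(records):
    """Count distinct users per (station, hour of day).

    records: iterable of (user_id, datetime, station) tuples.
    A user seen several times at one station within the same hour
    is counted once.
    """
    users = defaultdict(set)  # (station, hour) -> set of user ids
    for user, ts, station in records:
        users[(station, ts.hour)].add(user)
    return {key: len(ids) for key, ids in users.items()}


# Invented sample records for two users on one day.
sample_day = [
    ("u1", datetime(2024, 1, 8, 8, 10), "A"),
    ("u1", datetime(2024, 1, 8, 8, 40), "A"),  # same user, same hour: counted once
    ("u2", datetime(2024, 1, 8, 8, 20), "A"),
    ("u1", datetime(2024, 1, 8, 14, 0), "B"),
]

counts = population_by_hour(sample_day)
print(counts)
```

The same station can show very different counts at 08:00 and at 14:00, which is the difference between a daytime and a nighttime population that annual statistics cannot show.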