Data Scientist Thinking: How to Transform a City with Data
Lead
We continue with the "Optimization" module of the Mental Models course. The question for this lesson is: how do you solve a problem through data awareness?
When solving problems, we often know the general direction, but exactly how to proceed is not always clear.
If we consciously use data to locate the problem, it is often easier to find a handle for solving it.
The experts who are best at solving problems with data are called data scientists. The biggest difference between them and traditional statisticians is that the goal of traditional statistics is to record and organize, while data scientists are committed to solving real-world problems.
In this lesson, we invited Mao Mingrui, a data scientist who is particularly good at solving problems with data. He is the founder of the data company Urban Quadrant and a planner at the Beijing Urban Planning Institute. His strength is using data analysis to diagnose and treat "big-city diseases".
He and his data team proposed a data-driven plan for transforming Beijing's Huilongguan area. This plan, which came from the private sector, was later adopted by the Changping District Government of Beijing. In this lesson, we will ask him to talk about how he used data to diagnose and transform a city.
The instructor of this lesson is Mao Mingrui, the researcher is Luo Yan, and the narrator is Huaisha.
Okay, let's get started.
Throughout today's lecture, I ask you to keep one distinction in mind: the difference between making decisions by intuition and making decisions with data.
In 2016, I started a project: how can we increase the vitality of the Huilongguan community?
Students in Beijing may smile when they hear this. What exactly is Huilongguan?
It is a satellite urban area outside Beijing's North Fifth Ring Road, and a well-known "sleeping city", that is, a place people go only to sleep. It was developed in 1998 to absorb residents relocated by demolition in central Beijing, and it was home to Beijing's first affordable housing and relocation housing. More than 300,000 people currently live there, and it is said to be the largest residential area in Asia.
Although Huilongguan has so many people, it has always lacked vitality. After almost 20 years, a healthy community ecology has still not formed. A large number of residents flock to other urban areas to work during the day and come back at night to sleep. The commute is crowded and time-consuming, and everyone is miserable. After living here for 20 years, people still use this place only as a bed.
This has also been a headache for the government of Beijing's Changping District, to which Huilongguan belongs. So how can the vitality of the area be improved?
Suppose you are a smart government official. Off the top of your head, you can think of two solutions:
First, there is no one during the day, so increase the job opportunities and let people come back to work during the day;
Second, it is difficult to commute, so build more roads.
But once these two tasks are laid out, you may sigh, because there is nowhere to start. Take the first option, increasing job opportunities: how? What types of jobs should be promoted? Open a mall or a factory? Then the second option, building more roads: where? Bus lanes or subways? How should resources be invested?
There are no clues for any of these questions, and that is the limit of making decisions by intuition.
How would data-driven decision-making handle the same question? Let us switch to the perspective of a data scientist and see how we would solve the problem of Huilongguan's vitality.
New data is the new oil
People have tried many approaches to transforming old urban areas in the past. The government also looked to data as a basis for decisions, for example: how many libraries and activity centers for the elderly does an area need? How many jobs and how many residents does the area have? And how much land can be developed there?
Behind these data, the government's real focus is either raising land value or improving infrastructure. The subject of these questions is actually the government itself.
Faced with the question of "how to improve residents' quality of life and enhance the vitality of the city", these earlier approaches struggle to give accurate answers.
The good news is that people who work with data in this era enjoy a big dividend. With the popularity of the mobile Internet and the widespread use of all kinds of sensors, the cost of data collection has fallen, and we now have a large amount of underlying data.
These new data are the new oil of our time. With more dimensions and greater accuracy, the new data gives us a deeper understanding of the problem and can guide us to more precise solutions.
And what new data can be used to transform cities? There is too much to list, so let me give you a few examples:
The first is the swipe data of IC bus cards, whose value was neglected in the past. Analyzed in depth, however, card swipe data reflects the commuting trajectories of Huilongguan residents very well. When do people board? Where do they go? How long do they ride? With this data, we can reconstruct the real commuting behavior of Huilongguan residents.
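As a minimal sketch of how swipe records might be turned into trips: the record layout, card IDs and timestamps below are invented for illustration and are not the project's real data, and pairing each tap-in with the same card's next tap-out is only one plausible approach.

```python
from datetime import datetime

# Hypothetical swipe records: (card_id, station, timestamp, direction).
# The layout and the values are assumptions made up for this sketch.
swipes = [
    ("card_A", "Huilongguan", "2018-05-07 07:45", "in"),
    ("card_A", "Xi'erqi", "2018-05-07 08:20", "out"),
    ("card_B", "Huoying", "2018-05-07 07:50", "in"),
    ("card_B", "Wudaokou", "2018-05-07 08:40", "out"),
]

def reconstruct_trips(records):
    """Pair each tap-in with the same card's next tap-out."""
    open_entries = {}  # card_id -> (station, time) of the unmatched tap-in
    trips = []
    # This timestamp format sorts chronologically as plain text.
    for card, station, ts, direction in sorted(records, key=lambda r: r[2]):
        when = datetime.strptime(ts, "%Y-%m-%d %H:%M")
        if direction == "in":
            open_entries[card] = (station, when)
        elif card in open_entries:
            origin, start = open_entries.pop(card)
            minutes = (when - start).total_seconds() / 60
            trips.append({"card": card, "from": origin, "to": station,
                          "boarded": start, "minutes": minutes})
    return trips

trips = reconstruct_trips(swipes)
for trip in trips:
    print(trip["card"], trip["from"], "->", trip["to"], trip["minutes"], "min")
```

Each reconstructed trip answers the three questions above: when the rider boarded, where they went, and how long they rode.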
The second is mobile phone positioning data, which holds a great deal of information to be mined. For example, how many people actually live in Huilongguan? At what times are they in Huilongguan? And what phones do they use? There are so many dimensions that I won't list them all here.
There are many other interesting dimensions of data, and Internet products also record a large amount of urban data. You can use reviews on Dianping to analyze a city's catering situation. For example, the high-frequency words for Huilongguan's restaurants are "fast food" and "snacks". Comparing these with the high-frequency words in other urban areas, you can judge that the consumption level of this area is not high.
Mobile phones, Internet products, and sensors such as cameras have accumulated a large amount of raw data, a rich mine for re-understanding the world. Other industries face the same opportunity; the key is whether you can take advantage of this data.
Next, I will tell you how I used the Huilongguan data once I had it.
Find the frame of reference, look for points of difference
As we said, the first step is to find data. The second step is to find a suitable frame of reference for comparison.
How should we study the problem of Huilongguan? We found another satellite area, Wangjing, in the northeast of Beijing, and compared it with Huilongguan. Wangjing also has a permanent population of 300,000 and was also criticized as a sleeping city in its early years, but its vitality has improved markedly in recent years. So what is the difference between these two communities?
Let's first compare one headline number. Both areas have a permanent population of 300,000, so how different is their subway passenger flow during peak periods?
In 2018, among the ten Beijing subway stations with the highest morning-peak passenger flow, four were in the Huilongguan area: Huilongguan, Huoying, Longze and Huilongguan East Street. Wangjing did not have a single station in the top ten.
The question worth asking is: if the permanent populations of the two places are similar, why is the subway commuting flow so different? In urban research there is a term for this phenomenon: job-housing separation, where "job" refers to the workplace and "housing" to the place of residence. It means that people's places of work and residence are far apart. So is job-housing separation more serious in Huilongguan than in Wangjing?
We collected mobile phone location data from residents of both places:
We found that the proportion of internal commuting, that is, the proportion of people working locally, was only 9.4% in Huilongguan, compared with 23.7% in Wangjing. Wangjing provides more jobs, with nearly a quarter of its residents working locally, while more than 90% of Huilongguan residents have to go to other urban areas to work.
Looking at commuting distance, the average for Huilongguan residents is 10.9 kilometers, while for Wangjing it is only 8.6 kilometers. Huilongguan residents work farther from home.
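A rough sketch of how these two indicators could be computed once each resident's home and workplace have been inferred from location data. The sample rows, field names and the `commuting_profile` helper are hypothetical; they illustrate the calculation only and do not reproduce the 9.4% / 23.7% or 10.9 km / 8.6 km figures.

```python
from statistics import mean

# Hypothetical, hand-made sample: one row per resident, with home and work
# areas already inferred from phone location data (an assumption here).
residents = [
    {"home": "Huilongguan", "work": "Huilongguan", "distance_km": 1.5},
    {"home": "Huilongguan", "work": "Xi'erqi", "distance_km": 5.0},
    {"home": "Huilongguan", "work": "Zhongguancun", "distance_km": 12.0},
    {"home": "Huilongguan", "work": "Shangdi", "distance_km": 6.5},
    {"home": "Wangjing", "work": "Wangjing", "distance_km": 2.0},
    {"home": "Wangjing", "work": "Wangjing", "distance_km": 1.0},
    {"home": "Wangjing", "work": "Other", "distance_km": 9.0},
    {"home": "Wangjing", "work": "Other", "distance_km": 8.0},
]

def commuting_profile(rows, area):
    """Share of an area's residents who work locally, and their mean commute."""
    local = [r for r in rows if r["home"] == area]
    internal = [r for r in local if r["work"] == area]
    return {
        "internal_share": len(internal) / len(local),
        "mean_distance_km": mean(r["distance_km"] for r in local),
    }

for area in ("Huilongguan", "Wangjing"):
    profile = commuting_profile(residents, area)
    print(area, f"{profile['internal_share']:.0%}", profile["mean_distance_km"], "km")
```

Computing the same two numbers for both areas is what makes the comparison meaningful: a single figure for Huilongguan says little until it is set against a reference.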
Now some more detailed data: when do residents of the two places take the subway?
Monitoring showed that Huilongguan's morning peak of station entries came 15 minutes earlier than Wangjing's: it started at 7:45 a.m., while Wangjing's started at 8:00. What about going home at night? Huilongguan residents return even later. Wangjing's evening peak of station exits is from 6:00 to 6:30, while Huilongguan residents do not swipe out in concentrated numbers until around 7:00, about 45 minutes later on average.
Residents who live in Huilongguan and commute by subway leave home 15 minutes earlier than people in Wangjing and arrive home 45 minutes later, which adds up to an extra hour of commuting time. And that does not count the time they spend queuing outside Huilongguan station.
On commuting alone, Huilongguan residents spend more than 1 hour more than Wangjing residents, which means more than 1 hour less of living time. No wonder they feel drained.
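To make the arithmetic concrete, here is a small sketch that finds the busiest 15-minute entry window for each area and adds the evening delay quoted in the lecture. The tap-in times are invented and the 15-minute bin size is an assumption; only the 7:45, 8:00 and 45-minute figures come from the text.

```python
from collections import Counter

# Hypothetical tap-in times in minutes after midnight (460 = 07:40).
tap_ins = {
    "Huilongguan": [460, 465, 466, 470, 472, 480, 495],
    "Wangjing": [465, 480, 481, 484, 490, 492, 500],
}

def peak_bin_start(minutes, bin_size=15):
    """Return the start (minutes after midnight) of the busiest time bin."""
    counts = Counter((m // bin_size) * bin_size for m in minutes)
    return max(counts, key=counts.get)

def as_clock(minutes):
    return f"{minutes // 60:02d}:{minutes % 60:02d}"

peaks = {area: peak_bin_start(times) for area, times in tap_ins.items()}
morning_gap = peaks["Wangjing"] - peaks["Huilongguan"]  # minutes earlier
evening_gap = 45  # average evening delay, as stated in the lecture
extra_minutes = morning_gap + evening_gap

for area, start in peaks.items():
    print(area, "morning peak starts at", as_clock(start))
print("Extra commuting time per day:", extra_minutes, "minutes")
```

With these made-up inputs the peaks fall at 07:45 and 08:00, so the morning gap is 15 minutes and the total comes to the extra hour described above.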
Hearing this, you may still think that data scientists are nothing remarkable, that they merely turn people's intuitive feelings into quantitative data. But listen further, and you'll see what data can do.
Give specific solutions that can be implemented
By collecting data and comparing it with other regions, I identified two key issues:
First, there are too few jobs in the Huilongguan urban area;
Second, the subway, the main mode of travel, costs residents too much time and energy.
In the past, urban planning might have made some general suggestions, such as increasing employment and building more roads. However, there was no real basis for how to implement them, and ill-chosen solutions bring new problems.
At this point, we need data to help us find out which specific place and which group of people has the most prominent problem that needs solving.
On the first issue: what types of jobs should be provided in Huilongguan?
I analyzed the occupations of Huilongguan residents, and two groups stand out:
The first group works in public service industries such as culture and commerce. They account for 27.5% of the total, and this type of occupation has the highest proportion of women. Their workplaces are also widely scattered across Beijing.
The second group is programmers, the so-called "code farmers". Huilongguan is very close to places where programmers gather, such as the famous Zhongguancun and Shangdi to its north. Therefore, the cost of living there is also relatively low for programmers.
The local government has two choices: provide more public service jobs, or provide more jobs for programmers. Don't forget that the government's starting point is to increase the vitality of the area. Which solution is more effective?
Let me start with the answer: it is more effective to focus on women's employment.
First of all, adding programmer jobs will not help the vitality of the area, and may even make the quality of life there worse. Objectively speaking, programmers have no time to spend money; they work overtime every day and contribute little directly to the vitality of the area.
But addressing women's employment is very different.
If women can find more local employment, their commute will be shortened, and what will they do with the extra time? Shop, shop, shop. This, in turn, will create more local public service jobs, forming a positive cycle.
In this way, the entire Huilongguan community is revitalized. A female-friendly community is a vibrant community.
So what kind of job opportunities should we design for Huilongguan?
Let's again compare with Wangjing and look at the supply and demand of commerce in the two places.
We can use mobile phone positioning data to compare the commercial supply of Wangjing and Huilongguan. Huilongguan has 3 shopping malls, and 60% of their customers are locals. Wangjing has 7, with only 30% local customers; the remaining 70% are residents of other urban areas. This shows that Wangjing's commerce is of a relatively high level and can attract outside consumers. That is the supply side. On the demand side, let's examine the weekend movements of residents in Huilongguan and Wangjing. Huilongguan residents head elsewhere on weekends, while Wangjing residents mostly stay local. Clearly, the local commercial facilities in Huilongguan do not meet the needs of local residents.
So what needs are not being met? Let's find out where they go on weekends.
About 5 kilometers south of Huilongguan is a shopping mall called Wucai City (Colorful City), the commercial center most visited by Huilongguan residents. Of Wucai City's passenger flow, 20% comes from Huilongguan. So why go to Wucai City? Because it has consumption venues that Huilongguan lacks, such as well-known chain restaurants, trendy consumer brands, parent-child activities and home-experience stores.
These are therefore the commercial formats that Huilongguan should introduce. They can keep more women in the area, both as consumers and as employees.
Let's look at the second problem, the problem of long commuting times.
By analyzing subway card swipe data, we found that passengers boarding in Huilongguan during the morning peak hour exited mainly at Xi'erqi, Wudaokou, Zhichun Road and Shangdi, which are the places where Internet companies are concentrated. As the data shows, programmers are the main group of subway travelers.
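A sketch of that destination count, assuming trips have already been reconstructed from swipe records. The station names come from the lecture, but the trips themselves are invented, and treating hours 7 and 8 as the morning peak is an assumption.

```python
from collections import Counter

# The four Huilongguan-area stations named in the lecture.
HUILONGGUAN_STATIONS = {"Huilongguan", "Huoying", "Longze", "Huilongguan East Street"}

# Hypothetical trips: (boarding station, exit station, boarding hour).
trips = [
    ("Huilongguan", "Xi'erqi", 7),
    ("Huoying", "Xi'erqi", 8),
    ("Longze", "Shangdi", 7),
    ("Huilongguan East Street", "Wudaokou", 8),
    ("Huilongguan", "Zhichun Road", 7),
    ("Longze", "Xi'erqi", 7),
    ("Huilongguan", "Shangdi", 13),  # off-peak, filtered out
    ("Wangjing", "Xi'erqi", 8),      # boards elsewhere, filtered out
]

def top_exits(rows, origins, peak_hours, n=3):
    """Count exit stations for peak-hour trips that start in `origins`."""
    exits = Counter(dest for origin, dest, hour in rows
                    if origin in origins and hour in peak_hours)
    return exits.most_common(n)

top = top_exits(trips, HUILONGGUAN_STATIONS, peak_hours={7, 8})
print(top)
```

Ranking exit stations this way is what links the crowded trains to a specific group of commuters and a specific set of workplaces.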
These places are about 10 kilometers from Huilongguan, a normal commuting distance. Xi'erqi, where the largest number of residents work, is only about 5 kilometers away.
So the straight-line commuting distance for programmers is not far: about 5 kilometers at the near end and about 10 kilometers at the far end. Yet their commute takes a particularly long time, because the subway is too crowded, and the experience is very poor. The problem is clear. What should we do?
The traditional solution is to build more roads and subways, which is costly, takes a long time to implement, and may still fail to relieve the problem.
In fact, for a short commute of about 5 kilometers there is another option: cycling. Could we build a dedicated high-speed bike lane? Programmers would both cut their commuting time and get exercise, killing two birds with one stone.
High-speed cycling lanes have long been in use in European countries such as Germany, the Netherlands and Denmark. The bicycle lane is built as a closed, elevated road, separated from motor traffic so that cyclists are not disturbed by motor vehicles. There are no traffic lights along the way, only some cycling stations for resting and bike maintenance.
Doesn't that sound creative?
This idea is no longer just an idea; it has been adopted by the Beijing Municipal Government. The first phase of the planned route runs from Huilongguan to Shangdi, a total distance of 6.5 kilometers. At a normal riding speed of 15 kilometers per hour, cycling will become the fastest way for Huilongguan residents to travel during the morning rush hour, with a full commute time of about half an hour.
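The "about half an hour" figure can be checked with a short calculation. The function name is hypothetical; the 6.5 km distance and 15 km/h speed come from the text, and the result covers riding time only.

```python
def travel_minutes(distance_km, speed_kmh):
    """Riding time in minutes at a constant speed (no stops or waiting)."""
    return distance_km * 60 / speed_kmh

# Planned first-phase route: Huilongguan to Shangdi.
route_minutes = travel_minutes(6.5, 15)
# Shorter trip of about 5 km, the approximate distance to Xi'erqi.
short_minutes = travel_minutes(5, 15)

print(f"6.5 km at 15 km/h: {route_minutes:.0f} minutes")
print(f"5 km at 15 km/h: {short_minutes:.0f} minutes")
```

The ride itself takes 26 minutes, which leaves a few minutes for getting on and off the lane within the half hour.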
So let's review the specific solutions we proposed through data analysis:
The first is to upgrade Huilongguan's public services. On the commercial side, the proposal is to build a commercial center that brings in chain brands; on the public service side, it is to use algorithms to add new public service facilities and optimize their layout and accessibility. Both are already part of the overall project planning of "Huitian counts".
The second is to build a bicycle path from Huilongguan to Shangdi.
Through this case, I think you can see that most solutions and decisions made without data come off the top of the head, while data can give our decisions a specific direction and basis.
Summary
Finally, let's summarize. You are probably not a professional data scientist. As a layperson, how can you use the data scientist's mental model?
First, pay attention to new data that has only recently emerged and can now be recorded. It is the new oil of our day, and if you use it before others do, you gain an edge.
For example, in the past, when parts in a factory had a problem, workers had to check them one by one, and it took a long time to find the fault. Now, in a digital factory with sensors installed in every key part, problems can be identified immediately, which greatly improves operational efficiency.
Second, find a frame of reference for comparison. How do you improve Huilongguan? The answer is hard to find directly, but if you find a place that used to be like Huilongguan, you are likely to find the way. In this case, Wangjing was the reference.
Those who have "been there before" are easy to dismiss, but their defining trait is that they have already lived through the full cycle of development that you may have to go through in the future. For example, when studying social issues in China, we should pay attention to Japan: its current problems of aging and post-industrialization may be what China will face in a few decades.
Third, data analysis has to go deep enough to become actionable, for example a specific plan for where to build a bicycle lane. Otherwise it is no different from past data reports. This is the unique value that data scientists provide.
The data reports we usually see are often a string of narrative, full of pie charts, histograms and line charts. They give the impression of telling me something I already understood in a different way. That has no value; data analysis must turn into concrete actions that can be executed.
Well, that concludes the Huilongguan transformation case. In your own work and life, do you have a valuable case of using data to solve a problem?
You are welcome to share it with me.