ChinaFile’s research for “State of Surveillance” relied primarily on a dataset comprising some 76,000 procurement notices that central and local government offices posted for goods and services related to surveillance. This article explains how we arrived at that dataset—how we selected notices for inclusion, as well as how we processed and sifted through them in order to draw analytical conclusions.

ChinaFile undertook this type of data-driven research project specifically because of the difficulties in reporting on the ground in China right now. We based our analysis on procurement notices, discussions with experts, and additional research, but we were not able to conduct first-hand reporting in China.

Central and local governing bodies, such as state government departments and Communist Party offices, publish public procurement notices on the Chinese Government Procurement Network (CGPN) website. (We use the terms “officials,” “authorities,” and “government” to refer to either state or Party entities.) The website includes procurement notices from as early as 2001. These notices typically include basic information about the type of equipment or services local governments are hoping to buy, how much they plan to spend, and how and when companies can submit a tender. Local government purchasers do not directly manage the bidding processes. Companies act as middlemen, posting procurement notices on the CGPN website, creating the sometimes lengthy addenda included in these notices, and otherwise handling related administrative tasks. The CGPN website has separate webpages for each phase of the procurement process, from the initial notice to the final announcement of the winning bid.

Some notices include supplemental documents that elaborate on the basic terms of the notice. These can include contract templates, rules for bidding procedures, and information about the specific products or work requested. At times, these addenda explain the reasons and rationale for the requested purchases in rich detail; in others, the rationale is implicit in the kinds of technology officials seek.

Generating and Analyzing Our Dataset

To create this dataset of approximately 76,000 procurement notices, we generated a list of keywords related to surveillance of public spaces in China. Some of these keywords related to specific hardware or software (such as “facial recognition camera”), while others were more abstract or referenced overall surveillance-related programs (such as the national-level campaign “Project Sharp Eyes”).

We also created a list of keywords to narrow down the range of potential purchasers. We did not include every possible government purchase in our dataset; instead, we focused on purchases made by those parts of the government specifically tasked with monitoring citizen behavior in public spaces: entities, such as local Public Security Bureaus, designated in central-level planning documents as responsible for implementing surveillance projects. We added several agencies or offices whose mission is explicitly linked to citizen monitoring, such as local Politics and Law Commissions or Petitioning Offices, as well as the local state and Party offices that oversee these entities. As a result, our data does not speak to the surveillance activities of other government organs whose main missions are not directly tied to monitoring and controlling the population. Such government organs may, of course, be surveilling people.

Any procurement notice included in the dataset had to contain both a surveillance-related keyword and a purchaser-related keyword. The purchaser keyword had to appear in the field of the notice designated for the purchaser name, to ensure it was purchaser-related and not referring to something else.
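The inclusion rule above can be sketched in a few lines of Python. This is only an illustration, not the program we ran: the keyword lists are short hypothetical samples, the field names are invented, and we assume here that a surveillance keyword may appear in either the title or the body text of a notice.

```python
from dataclasses import dataclass

# Hypothetical, abbreviated keyword lists for illustration only.
SURVEILLANCE_KEYWORDS = ["人脸识别", "雪亮工程", "平安城市"]
PURCHASER_KEYWORDS = ["公安局", "政法委", "信访"]


@dataclass
class Notice:
    title: str      # notice title
    purchaser: str  # contents of the purchaser-name field
    body: str       # full text of the notice


def matches(notice: Notice) -> bool:
    """Return True only if both kinds of keyword are present."""
    text = notice.title + "\n" + notice.body
    has_surveillance = any(k in text for k in SURVEILLANCE_KEYWORDS)
    # The purchaser keyword is checked only against the purchaser field,
    # so a mention elsewhere in the notice does not count.
    has_purchaser = any(k in notice.purchaser for k in PURCHASER_KEYWORDS)
    return has_surveillance and has_purchaser
```

Restricting the purchaser check to a single field is what keeps out a notice that merely mentions a Public Security Bureau in passing.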

The 76,375 procurement notices that include these keywords were posted to the CGPN website between June 29, 2004 and May 19, 2020. Among them we found approximately 7,000 notices containing supplemental documents, at times with page lengths running into the hundreds. These included Microsoft Word or PDF documents, as well as other file types, such as JPEGs. We assume that we failed to collect some supplemental documents, meaning that figures or estimates based on this information are likely undercounts.

In some cases, we looked specifically at “awarded bid notices,” that is, notices announcing that a government contract had been awarded to a winning company. The initial dataset of 76,375 included a number of procurement notices for which we could not determine whether a contract had been awarded; these notices may not have made it through the full bidding process, the purchasing agency may have neglected to post the final announcement stating which company won the bid, or authorities may have removed that final announcement from the website. To account for this, we created a second dataset containing only notices labeled as having been awarded to a contractor. This narrower dataset comprised 22,061 notices.
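Building the narrower dataset amounts to a filter on each notice's stage label. The sketch below assumes each notice is a dictionary with a hypothetical "stage" key, and the label string is likewise an assumption; the labels actually used on the CGPN website may differ.

```python
# Hypothetical stage label; the real labels on the CGPN website may differ.
AWARDED_LABEL = "中标公告"


def awarded_only(notices):
    """Keep only notices whose stage label marks them as awarded."""
    return [n for n in notices if n.get("stage") == AWARDED_LABEL]
```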

One key way we wanted to be able to sort and analyze these notices was by geographic distribution. To avoid reading each notice title and assigning locations manually, we created a computer program to extract the location names automatically. Using information we collected from China’s National Bureau of Statistics website, we derived a list of China’s official place names nationwide as of 2019, down to the village level. The computer program compared notice titles and purchaser names to the official place names and assigned more specific location names to notices where possible. (We use the term “province” to refer to all province-level administrative units, including province-level autonomous regions and province-level municipalities. We use a similar shorthand for “counties” and “townships,” which refer to county-level administrative units and township-level administrative units, respectively.)
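One simple way to approach this kind of matching is to look for every official place name in the notice title and purchaser name, then keep the most specific administrative level found. The sketch below is an assumption about how such a program could work, not a description of ours; the three place names are a hypothetical sample standing in for the full National Bureau of Statistics list.

```python
# Hypothetical sample of (place name, administrative level) pairs.
PLACES = [
    ("浙江省", "province"),
    ("杭州市", "prefecture"),
    ("余杭区", "county"),
]

# Higher rank means a more specific administrative level.
LEVEL_RANK = {"province": 0, "prefecture": 1, "county": 2, "township": 3, "village": 4}


def most_specific_place(title, purchaser):
    """Return the most specific place name found in either field, or None."""
    text = title + " " + purchaser
    hits = [(LEVEL_RANK[level], name) for name, level in PLACES if name in text]
    if not hits:
        return None
    # max() compares the rank first, so the most specific level wins.
    return max(hits)[1]
```

A plain substring match like this cannot tell apart places in different provinces that share a name, so a fuller version would need to check each match against its parent units.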

Similarly, we attempted to extract information about the amount of money budgeted for each notice by using a custom computer program. However, we were not able to successfully parse each notice; for some notices our software was not able to extract that price information.
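Budget extraction of this kind is typically done with pattern matching. The sketch below assumes a budget is introduced by the phrase 预算金额 and followed by a number and a unit of either 元 or 万元; both the phrase and the function name are illustrative assumptions, and real notices use many more formats, which is one reason a parser can fail on some of them.

```python
import re

# Assumed pattern: "预算金额", optional colon, a number, then 万元 or 元.
BUDGET_RE = re.compile(r"预算金额[:：]?\s*([0-9][0-9,]*(?:\.[0-9]+)?)\s*(万元|元)")


def extract_budget_yuan(text):
    """Return the budget in yuan as a float, or None if no match is found."""
    m = BUDGET_RE.search(text)
    if m is None:
        return None
    amount = float(m.group(1).replace(",", ""))
    if m.group(2) == "万元":
        amount *= 10_000  # 1 万元 = 10,000 yuan
    return amount
```

Returning None rather than zero keeps unparsed notices distinguishable from notices with a genuinely small budget.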

It appears that fewer procurement notices overall were posted to the CGPN website at the end of 2019 and the beginning of 2020, likely due in part to COVID-19. This dip reduces the number of surveillance-related notices in our dataset for that period, but it tracks the overall decline in notices posted to the CGPN website and is unlikely to be related to surveillance trends specifically.

Possible Flaws in the Data

This dataset only contains procurement notices that government organs publicly posted and that were still available when we retrieved them from the CGPN website. Therefore, we believe that the figures we cite in the article are most likely undercounts. Further, it is clear that some provinces are assiduous in posting these notices online, while others are less so. We don’t always have a way of knowing whether certain localities are making fewer purchases, whether they are publicizing notices or procuring goods and services outside of the Chinese Government Procurement Network, whether previous notices have been removed from the website because they were retroactively deemed “sensitive,” or whether notices are missing from the site for some other reason. This means not only that some notices are certainly missing, but also that the missing notices are not a random sample. For example, our dataset of roughly 76,000 entries only contains 176 entries from the Xinjiang Uighur Autonomous Region, during a time period when the surveillance build-up there was known to be particularly extensive. It thus seems likely that officials deleted some number of notices after they were written about in news reports.

There are, however, a few ways in which the data, and our methods of searching, might have led us to over-count. As described above, many purchases may appear multiple times in the dataset; for example, one purchase might appear first as an initial announcement, second as a modification announcement, and finally in an announcement of who won the bid. We have done our best to eliminate these in our smaller set of roughly 22,000 entries, but we may have unintentionally included a small percentage of them. Additionally, some of the keywords we used to generate the initial dataset can both refer to a specific surveillance project and appear in other contexts. The term 平安 (ping’an), for instance, is both part of the name of “Safe Cities” and a general term for “peaceful,” and appears in many place names. Still, we opted to keep some such terms in our list of keywords so as not to exclude a large number of relevant notices. When generating numbers to cite in the article, we were often able to minimize these potential over-counts by using narrower, more targeted searches with more specific keywords (as we did to minimize issues with the term “ping’an,” described below).

For the budget case studies referenced in “Budgeting for Surveillance,” we went through additional steps, some with the help of computer programs and some done manually, to exclude any entries that were not relevant. First, we only included entries that contained our keywords in their titles (this ensured that our keywords were of high importance, which they may not have been if they were only found toward the bottom of the full text of a procurement notice). Second, to address the wide usage of “ping’an” in particular, we created more specific keywords (平安城市, 平安乡镇, 平安校园, 平安景区, and 平安建设) to replace the ping’an keyword. Using this more narrowly defined set of data, we selected localities that represented a wide geographic distribution, appeared 50 or more times, and whose notices already contained relatively complete price information. In cases where our computer program wasn’t able to extract price information, we found it by reading individual documents.
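The first two steps, plus the 50-notice threshold, can be sketched as follows. The five refined keywords come from the paragraph above; the function names, the (title, locality) pair format, and the other_keywords parameter are hypothetical, and the checks for geographic spread and price completeness are not modeled here.

```python
from collections import Counter

# The five refined keywords that replaced the bare term 平安.
REFINED_PINGAN = ["平安城市", "平安乡镇", "平安校园", "平安景区", "平安建设"]


def title_is_relevant(title, other_keywords):
    """Check the title only, never the full text of the notice."""
    return any(k in title for k in REFINED_PINGAN + list(other_keywords))


def frequent_localities(notices, other_keywords, min_count=50):
    """notices is an iterable of (title, locality) pairs.

    Returns the set of localities with at least min_count relevant titles.
    """
    counts = Counter(
        locality
        for title, locality in notices
        if locality is not None and title_is_relevant(title, other_keywords)
    )
    return {loc for loc, n in counts.items() if n >= min_count}
```

Under this filter, a title that contains 平安 only as part of a street or place name no longer counts toward a locality's total.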

Several times in the article, we mention search terms appearing in notices in what may seem like unimpressive numbers in comparison to the full set of 76,375 procurement notices. However, it is important to note that many of the notices in our dataset, and even many of their supplemental materials, are not particularly detailed. Even notices containing a relatively thorough list of goods and services (and many do not contain even this level of detail) often do not discuss why or for what purpose they are needed. The case studies we selected are among several hundred notices that include these more extensive explanations of the rationale behind the purchases. We believe that the larger themes we address in the article hold true, even though not every notice for cameras in the CGPN system includes a lengthy rationale we can search and quantify.

Jessica Batke and Mareike Ohlberg

Translation of Selected Terms

Alert cameras 警戒摄像机

Boulder of Wisdom 智慧磐石

Central Leading Small Group for Stability Maintenance 中央维护稳定工作领导小组

Central Political and Legal Affairs Commission 政法委员会

Central Public Security Comprehensive Management Commission 中央社会治安综合治理委员会

Circles, blocks, grids, lines, spots 圈、块、格、线、点

Comprehensive Management Platform 综合管理平台

Eye in the Sky 天眼工程

Floating Population Information System 流动人口信息系统

Fugitive System 在逃人员系统

Fund analysis module 资金分析模块

Integrated intelligent tracking cameras 智能跟踪一体机

Integrated Joint Operations Platform 一体化联合作战平台

Key persons 重点人员

Key person control module 重点人管控模块

Key populations 重点人口

Ministry of Public Security 公安部

Multivariate analysis module 多元分析模块

Panoramic cameras 全景一体化摄像机

Population density camera 人员密度摄像机

Population Information System 人口信息系统

Project Sharp Eyes 雪亮工程

Project Skynet 天网工程

Safe Borders Platform 安全边界平台

Safe Cities 平安城市

Smart Cities 智慧城市

Smart comprehensive management 智慧综治

Smart monitoring 智慧监控

Smart policing 智慧警务

Societal resource integration platform 社会资源整合平台

Terrorist and violent person prediction module 涉恐涉暴人员预测模块

Thermal imaging monitoring points 高空瞭望热成像监控

Trajectory analysis module 轨迹分析模块

Urban Management (chengguan) 城市管理