European Chinese Law Research Hub

Big Data Cannot Answer All Questions About Chinese Courts

By Benjamin Minhao Chen

This is contribution #3 in our series SMART COURTS AND SMART GOVERNANCE IN CHINA, outcome of our workshop in July 2025 at Cologne University.

Imagine a researcher seeks to answer a fundamental question of legal fairness: Do better-resourced parties (the “haves”) achieve more favorable outcomes in Chinese courts simply because of their socio-economic status?

A naïve approach is to compare the win rates of well-resourced and less-resourced parties in litigation. But the researcher might quickly realize that well-resourced parties are likely to be represented by counsel whereas less-resourced parties are likely to be self-represented. Perhaps it is the quality of legal representation that influences how judges rule, not the status of the parties. On this account, the “haves” do better than the “have nots” because they have, among other things, superior legal representation.

A solution might be to control (or adjust) for legal representation. Statistical studies commonly control for other explanatory variables to isolate the effect of the variable of interest on outcomes. Controlling for legal representation here means, essentially, that the “haves” who are represented by counsel are compared to the “have nots” who are represented by counsel. These comparisons seem to make sense: if the researcher finds that represented, well-resourced parties still win more often than represented, less-resourced parties, that would seem to indicate that status is driving the observed pattern of outcomes.

However, this approach can lead to inaccurate inferences because of collider bias. In causal modeling, a collider is a variable that is a common effect of two other factors. A classic, non-legal example is the car that fails to start. This can be caused by a dead battery or an empty gas tank.[1] These two causes are independent; one doesn’t cause the other. Suppose you only look at cars that have broken down, i.e. you condition on the collider “car fails to start”. Then a correlation appears between the two causes: if you know a broken-down car has gas, you can be confident the battery is dead. But of course, the car having gas is not itself a cause of the battery being dead.
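The car example can be made concrete with a small exact calculation. The sketch below is only an illustration: the two probabilities are hypothetical, and it assumes that a dead battery and an empty tank are the only reasons a car fails to start.

```python
from fractions import Fraction
from itertools import product

# Hypothetical probabilities, chosen only for illustration.
P_DEAD_BATTERY = Fraction(1, 10)
P_EMPTY_TANK = Fraction(1, 10)


def joint():
    """Yield (battery_dead, tank_empty, probability) for two independent causes."""
    for dead, empty in product([True, False], repeat=2):
        p_dead = P_DEAD_BATTERY if dead else 1 - P_DEAD_BATTERY
        p_empty = P_EMPTY_TANK if empty else 1 - P_EMPTY_TANK
        yield dead, empty, p_dead * p_empty


def fails_to_start(dead, empty):
    # Simplifying assumption: these are the only two possible causes.
    return dead or empty


def p_dead_battery(condition):
    """Exact P(battery dead | condition) over the joint distribution."""
    total = sum(p for d, e, p in joint() if condition(d, e))
    hits = sum(p for d, e, p in joint() if condition(d, e) and d)
    return hits / total


# Across all cars, knowing the tank status says nothing about the battery.
all_with_gas = p_dead_battery(lambda d, e: not e)
all_empty = p_dead_battery(lambda d, e: e)

# Among cars that fail to start, the two causes become correlated.
broken_with_gas = p_dead_battery(lambda d, e: fails_to_start(d, e) and not e)
broken_empty = p_dead_battery(lambda d, e: fails_to_start(d, e) and e)

print("all cars:   ", all_with_gas, all_empty)
print("broken cars:", broken_with_gas, broken_empty)
```

Under these assumptions, the probability of a dead battery is the same whether or not the tank is empty when all cars are considered. Once the sample is restricted to cars that fail to start, a car with gas is certain to have a dead battery, while a car with an empty tank is no more likely than average to have one.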

So how does collider bias apply to our example of the “haves” and “have nots” in Chinese courts? There is suggestive evidence that Chinese litigants are more likely to seek professional advice in harder, more doubtful, cases.[2] So, if a well-resourced party has a lawyer, it could be because they can easily afford one or because the merits of their claim are doubtful. If a less-resourced party has a lawyer, it is probably because the merits of their claim are doubtful. Legal representation is thus a collider: it is a common effect of a party’s resources and the merits of its case. If a researcher controls for legal representation, she is comparing the “haves” who are represented by counsel with the “have nots” who are represented by counsel. But the “haves” who are represented by counsel have, on average, stronger cases on the merits than the “have nots” who have counsel, and they might therefore prevail in more cases. The researcher may conclude from her controlled comparison that the “haves” come out ahead more often in Chinese courts because of who they are. But this inference is not necessarily warranted by the evidence.
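The same mechanism can be shown with a toy model of litigation. This is a sketch under stated assumptions, not an estimate from any real data: every number is hypothetical, case merit is assumed to be independent of party status, and the chance of winning is assumed to depend on merit alone, so status and representation have no effect on outcomes by construction.

```python
from fractions import Fraction as F
from itertools import product

# All numbers below are hypothetical, chosen only to illustrate the mechanism.
P_STRONG = F(1, 2)  # share of strong cases, the same for both kinds of party

# P(hires counsel | status, merit): the "haves" usually hire counsel anyway;
# the "have nots" mostly do so when their case is doubtful.
P_LAWYER = {
    ("have", "strong"): F(8, 10),
    ("have", "weak"): F(9, 10),
    ("have_not", "strong"): F(1, 10),
    ("have_not", "weak"): F(5, 10),
}

# P(win | merit): by construction, neither status nor counsel matters.
P_WIN = {"strong": F(8, 10), "weak": F(3, 10)}


def win_rate(status, represented=None):
    """Exact P(win | status), optionally also conditioning on representation."""
    total = wins = F(0)
    for merit, lawyer in product(["strong", "weak"], [True, False]):
        if represented is not None and lawyer != represented:
            continue
        p_merit = P_STRONG if merit == "strong" else 1 - P_STRONG
        p_lawyer = P_LAWYER[(status, merit)]
        p = p_merit * (p_lawyer if lawyer else 1 - p_lawyer)
        total += p
        wins += p * P_WIN[merit]
    return wins / total


overall_have = win_rate("have")
overall_have_not = win_rate("have_not")
rep_have = win_rate("have", represented=True)
rep_have_not = win_rate("have_not", represented=True)

print("overall:    ", float(overall_have), float(overall_have_not))
print("represented:", float(rep_have), float(rep_have_not))
```

In this toy world, the “haves” and the “have nots” win at exactly the same overall rate. Yet among represented parties the “haves” win noticeably more often, purely because represented “have nots” are disproportionately those with weak cases. A researcher who controlled for representation would see a status gap that the model never contained.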

This example demonstrates that big data does not absolve legal scholars from thinking through the causal relationships between the variables in their analyses. Indeed, qualitative and sociological methods can produce valuable domain knowledge for distinguishing plausible relationships from spurious ones. A multi-disciplinary paradigm remains critical to studying the Chinese legal system, even in the era of artificial intelligence.

The full article, titled Data Still Needs Theory: Collider Bias in Empirical Legal Research, co-authored by Xiaohan Yin, is available here. Benjamin Minhao Chen is Associate Professor and Director of the Law and Technology Centre at the University of Hong Kong Faculty of Law. You can contact him via email.


[1] Judea Pearl, Probabilistic Reasoning in Intelligent Systems: Networks of Plausible Inference (San Francisco: Morgan Kaufmann 1988).

[2] Wenzheng Mao and Shitong Qiao, ‘Legal Doctrine and Judicial Review of Eminent Domain in China’ (2021) 46 Law & Social Inquiry 826; Yali Peng and Jinhua Cheng, ‘Ethnic Disparity in Chinese Theft Sentencing’ (2022) 22 China Review 47.
