Paper

This paper is still a working draft and has not yet been peer reviewed.

abstract

Lextract is a Python pipeline that automatically extracts relevant market definitions from the European Commission’s merger and antitrust decision PDFs. Relevant market definitions establish the scope of competition legislation and identify the specific set of products in a given area, which makes them indispensable to economists, lawyers, and regulators when determining the effects of mergers and evaluating anticompetitive behavior. The pipeline is designed for researchers and competition law experts who need a quick and accurate way to extract relevant market definitions from many cases at once. Lextract achieves this accuracy by using strict natural language processing and rule-based pattern recognition to identify market definitions while excluding irrelevant information. By automating this process, Lextract enables merger and antitrust research at scale and contributes to more efficient competition policy analysis.

full paper

You can view the full paper here.

acknowledgements

Lextract was built by Shriyan Yamali. I am grateful to Professor Thibault Schrepel of Stanford Law School for his invaluable advice and guidance throughout this project. This research received no funding from any government agency, university, company, or non-profit organization.