The Webis Patent Retrieval Corpus 2012 (Webis-PRA-12) is a corpus for studying the impact of misspelled company names on patent retrieval.
To download the corpus, use the following link:
(881.1 KB, MD5 sum: 490e583f4746c661796705b344c1afa9)
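The MD5 sum above can be used to check that the download arrived intact. The following is a minimal sketch in Python, not part of the corpus distribution: the function names are illustrative, and the archive file name in the usage comment is a hypothetical placeholder, since the actual file name is not given here.

```python
import hashlib

# MD5 sum published on this page for the corpus archive.
EXPECTED_MD5 = "490e583f4746c661796705b344c1afa9"


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hexadecimal MD5 digest of the file at `path`.

    The file is read in chunks so that memory use stays bounded.
    """
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_md5(path, expected=EXPECTED_MD5):
    """Return True if the file's MD5 digest equals `expected` (case-insensitive)."""
    return md5_of_file(path) == expected.lower()


# Usage (the file name is an assumption; substitute the downloaded archive):
# print(matches_published_md5("webis-pra-12.zip"))
```

A mismatch usually points to an incomplete or corrupted download, in which case downloading the archive again is the simplest remedy.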
If you use the dataset in your research, please send us a copy of your publication. We kindly ask you to cite the corpus via [bib].
The corpus contains 14,189 distinct company names extracted from 2,132,825 patents granted by the United States Patent and Trademark Office (USPTO) between 2001 and 2010.