Scaling Address Parsing Sequence Models through Active Learning

Craig, Helen; Yankov, Dragomir; Wang, Renzhong; Berkhin, Pavel; Wu, Wei

doi:10.1145/3347146.3359070

Cited by 8 publications

(2 citation statements)

References 10 publications

“…This training dataset may be subtly different from the actual data the resulting algorithms are applied to, which can lead to weaknesses. For example, the most well-established parser ‘libpostal’ (Barratine, 2017) has been shown to have issues parsing incomplete addresses (Yassine et al, 2020), and it also struggles with addresses in formats it is not used to (Craig et al, 2019). A more fundamental issue may be that the algorithms are designed on the assumption that they are parsing a single address, but the OCOD dataset contains large numbers of nested addresses in which a single free text line may contain tens or even hundreds of properties.…”

Section: Introduction (mentioning)

confidence: 99%

What’s in the laundromat? Mapping and characterising offshore-owned residential property in London

Bourne

Ingianni

McKenzie

2023

Environment and Planning B: Urban Analytics and City Science

The UK, particularly London, is a global hub for money laundering, a significant portion of which takes place through residential property. However, understanding the distribution and characteristics of offshore residential property in the UK is a challenge. This paper attempts to remedy that situation by enhancing a publicly available dataset of UK property owned by offshore companies. We create a data-processing pipeline which draws on several datasets and on machine learning techniques to create a parsed set of addresses classified into six use classes. The enhanced dataset contains 138,000 properties – 44,000 more than the original dataset. The majority are residential (95k), with a disproportionate number of those in London (42k). The average offshore residential property in London is worth 1.33 million GBP, and collectively this amounts to approximately 56 billion GBP. We perform an in-depth analysis of offshore residential property in London, comparing the price, distribution and entropy/concentration with Airbnb property, low-use/empty property and conventional residential property. We estimate that the total number of offshore, low-use and Airbnb properties in London is between 144,000 and 164,000, collectively worth between 145–174 billion GBP. Furthermore, offshore residential property is more expensive and has higher entropy/concentration than all other property types. In addition, we identify two different types of offshore property – nested and individual – which have different price and distribution characteristics. Finally, we release the enhanced offshore property dataset, the complete low-use London dataset and the pipeline for creating the enhanced dataset to encourage further research into this topic.

“…This training dataset may be subtly different from the actual data the resulting algorithms are applied to. This can lead to weaknesses, for example the most well established parser libpostal [23] has been shown to have issues parsing incomplete addresses [22], it also struggles with addresses in formats it is not used to [26]. A more fundamental issue may be that the algorithms are designed on the assumption that they are parsing a single address, the OCOD dataset contains large numbers of nested addresses where a single free text line may contain tens or even hundreds of properties.…”

Section: Introduction (mentioning)

confidence: 99%

What's in the laundromat? Mapping and characterising offshore owned domestic property in London

Bourne

Ingianni

McKenzie

2022

Preprint

The UK, particularly London, is a global hub for money laundering, a significant portion of which uses domestic property. However, understanding the distribution and characteristics of offshore domestic property in the UK is challenging due to data availability. This paper attempts to remedy that situation by enhancing a publicly available dataset of UK property owned by offshore companies. We create a data processing pipeline which draws on several datasets and machine learning techniques to create a parsed set of addresses classified into six use classes. The enhanced dataset contains 138,000 properties, 44,000 more than the original dataset. The majority are domestic (95k), with a disproportionate number of those in London (42k). The average offshore domestic property in London is worth 1.33 million GBP; collectively this amounts to approximately 56 billion GBP. We perform an in-depth analysis of the offshore domestic property in London, comparing the price, distribution and entropy/concentration with Airbnb property, low-use/empty property and conventional domestic property. We estimate that the total number of offshore, low-use and Airbnb properties in London is between 144,000 and 164,000 and that they are collectively worth between 145-174 billion GBP. Furthermore, offshore domestic property is more expensive and has higher entropy/concentration than all other property types. In addition, we identify two different types of offshore property, nested and individual, which have different price and distribution characteristics. Finally, we release the enhanced offshore property dataset, the complete low-use London dataset and the pipeline for creating the enhanced dataset to reduce the barriers to studying this topic.
