Writing on Medium, Alessandro Paticchio, a data scientist at Casavo, provides some insight into the iBuying company’s use of network science to enhance its data analysis.
“Every day thousands and thousands of listings are published online by real estate agents and private sellers on various listings platforms. At Casavo, we ingest a lot of this data, buying them from multiple players. Every day … [we get] the latest update and analyze [it],” he writes.
Duplicates are a major issue (“We found properties that have been listed more than 100 times in a couple of months,” he notes), and Casavo uses entity resolution (the process of identifying and merging duplicate records in a dataset — also known as record linkage or data deduplication) to resolve this.
This process compares identifying attributes (data elements such as names or addresses) to determine whether two records refer to the same entity.
Until a few months ago, entity resolution involved no more than a SQL (database) query that merged listings that referred to properties with common characteristics, but this resulted in a lot of false negatives (i.e. listings for the same property that were not merged), according to Paticchio.
He adds: “By inspecting our data, we found out that the textual description of duplicated listings is quite often the same … bingo! We had another criterion to leverage!”
By combining these two approaches, Casavo created what it has dubbed Doduo (named after a two-headed creature from the video game Pokémon). Doduo retrieves both metadata-edges (links based on shared data attributes) and description-edges (links based on matching text), and uses them to build a graph in which each listing is a node. It then finds all connected components, i.e., clusters of listings selling the same property. These clusters are then used to eliminate duplicates.
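The graph step described above can be illustrated with a minimal sketch. This is not Casavo's code: the listing fields (`address`, `sqm`, `description`), the exact-match rules, and the function names are all assumptions made for illustration, and a real system would likely use fuzzier comparisons. It links listings that share metadata or an identical description, then groups them with a union-find pass.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical listings; the field names are assumptions for illustration.
LISTINGS = [
    {"id": "a", "address": "Via Roma 1", "sqm": 80, "description": "Bright flat near the park"},
    {"id": "b", "address": "Via Roma 1", "sqm": 80, "description": "Sunny two-room apartment"},
    {"id": "c", "address": "V. Roma 1", "sqm": 82, "description": "Sunny two-room apartment"},
    {"id": "d", "address": "Corso Italia 9", "sqm": 120, "description": "Large family home"},
]

def build_edges(listings):
    """Link two listings if they share metadata or an identical description."""
    edges = []
    for x, y in combinations(listings, 2):
        same_metadata = (x["address"], x["sqm"]) == (y["address"], y["sqm"])
        same_description = x["description"] == y["description"]
        if same_metadata or same_description:
            edges.append((x["id"], y["id"]))
    return edges

def connected_components(node_ids, edges):
    """Group nodes into clusters using union-find."""
    parent = {n: n for n in node_ids}

    def find(n):
        while parent[n] != n:
            parent[n] = parent[parent[n]]  # path halving keeps lookups short
            n = parent[n]
        return n

    for u, v in edges:
        root_u, root_v = find(u), find(v)
        if root_u != root_v:
            parent[root_v] = root_u

    clusters = defaultdict(set)
    for n in node_ids:
        clusters[find(n)].add(n)
    return sorted(sorted(c) for c in clusters.values())

clusters = connected_components([l["id"] for l in LISTINGS], build_edges(LISTINGS))
print(clusters)  # [['a', 'b', 'c'], ['d']]
```

In this toy data, listings "a" and "c" share neither metadata nor description, yet they end up in the same cluster because each is linked to "b". That transitive grouping is what a pairwise merge on common characteristics alone would miss.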
According to Paticchio, the deduplicated data can then be used to locate “where in Madrid houses are becoming more expensive,” identify the Parisian neighborhoods where apartments sell most rapidly or train an automatic valuation model to forecast the price of a property, among many other things.
Paticchio also notes two possible improvements to the model: incorporating visual information, which would likely be extremely useful for spotting identical properties, and assigning a probability score to each clustering, which would show how confident Doduo was in its result.
Based in Italy, Casavo also operates in France, Spain and Portugal. In July 2022, it raised €400 million ($427 million U.S.) in debt and equity funding.