Note: This is my first attempt at creating architecture diagrams as GIFs. Please bear with the page while it loads, and share your thoughts in the comments.
What Happened?
Early in June, as we were navigating the chaos of COVID-19, I was notified by the realtor that I had to vacate my apartment because the owner wanted to sell it.
In the first hour after receiving the news, I was hit with a barrage of questions. Apartment hunting in pandemic times: how would that work? How risky would all those in-person interactions be? Is there a better way to do this?
So instead of browsing through the listings, I challenged myself to solve this problem with the technology I know and have worked with over the years. Let me walk you through the solution as a Q&A in my head.
Where should I start my search?
I was looking for a one-bedroom with its own bathroom, in a house or apartment, within reasonable walking and commuting distance of work, groceries, a shopping mall, etc.
NSW Fair Trading and NSW Family & Community Services both publish statistics on rental agreements lodged/registered with the Department of Fair Trading. The published data is provided by landlords and real estate managers, with details on the number of bedrooms, postcode and date registered. From this data and my needs, I was able to derive a list of postcodes with the highest probability of containing my next home.
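If the bond lodgement data were exported as a CSV, the postcode shortlist could come from a few lines of pandas. The column names here (postcode, bedrooms, weekly_rent) and the budget figure are my assumptions for illustration, not the department's actual schema:

```python
import pandas as pd

# Hypothetical export of the NSW rental bond lodgement statistics;
# column names and the budget cut-off are assumptions.
bonds = pd.read_csv("bond_lodgements.csv")

# Keep one-bedroom lodgements within budget.
candidates = bonds[(bonds["bedrooms"] == 1) & (bonds["weekly_rent"] <= 450)]

# Rank postcodes by how many matching rentals were lodged:
# more lodgements means more stock, hence a higher chance of a listing.
shortlist = (
    candidates.groupby("postcode")
    .size()
    .sort_values(ascending=False)
    .head(10)
)
print(shortlist)
```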
How would I clean data & store the listings?
Rental listings are maintained by people, and people make mistakes. That means the data needs to be cleaned into a structure suitable for further mining, and duplicates have to be removed. Let's make the most of the ETL paradigm.
Rental listings from the Domain API and other sources would arrive in JSON format, saved to Azure Blob storage daily. Snowflake can process all of this semi-structured JSON into a schema that can be transformed later. Compute power would be dedicated to loading and cleaning the data before it is stored in the database. As I aim to prevent duplicates, Snowflake allows duplicate tables (aka virtual tables) to be created without extra storage charges.
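A rough sketch of what that load could look like with the Snowflake Python connector, assuming an external stage already points at the Azure Blob container and the JSON fields are named as below (both assumptions on my part):

```python
import snowflake.connector

# Credentials and stage names are placeholders; the stage is assumed
# to point at the Azure Blob container holding the daily JSON dumps.
conn = snowflake.connector.connect(
    user="...", password="...", account="...",
    warehouse="LOAD_WH", database="RENTALS", schema="PUBLIC",
)
cur = conn.cursor()

# Land the raw JSON as-is into a single VARIANT column.
cur.execute("CREATE TABLE IF NOT EXISTS raw_listings (v VARIANT)")
cur.execute("""
    COPY INTO raw_listings
    FROM @listings_stage
    FILE_FORMAT = (TYPE = 'JSON')
""")

# Flatten into a typed table, keeping one row per listing id so
# duplicates across daily loads are dropped.
cur.execute("""
    CREATE OR REPLACE TABLE listings AS
    SELECT
        v:id::STRING        AS listing_id,
        v:address::STRING   AS address,
        v:price::NUMBER     AS weekly_rent,
        v:bedrooms::NUMBER  AS bedrooms
    FROM raw_listings
    QUALIFY ROW_NUMBER() OVER (PARTITION BY v:id ORDER BY v:id) = 1
""")
```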
Is that a good place to stay?
To understand more about each location, every address would be scored using the Google Maps API, based on the commute time to work and the walking time to the nearest groceries/parks/gym. I know that Sydney suburbs are designed to have shopping malls and places to eat around the train station.
Again using the power of Snowflake's virtual tables, each data load would create a separate table of addresses to be extracted and sent to the Google Maps API. After processing, the results would be written back to the address table, which would later update the central database with the actual travel times. To reduce Google Maps costs, only new addresses found in the extract would be sent to the API; duplicates would be updated with the existing information.
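Here is roughly how the scoring could work with the googlemaps Python client. The work address, the minutes-based score, and using the nearest supermarket as a proxy for groceries are all my own illustrative choices:

```python
import googlemaps

gmaps = googlemaps.Client(key="YOUR_API_KEY")

WORK = "Work address, Sydney NSW"  # placeholder

def score_address(address: str) -> int:
    """Lower is better: transit minutes to work plus walking minutes
    to the nearest supermarket. The weighting is an assumption."""
    commute = gmaps.distance_matrix(
        [address], [WORK], mode="transit"
    )["rows"][0]["elements"][0]["duration"]["value"]

    # Find the nearest supermarket, then the walking time to it.
    nearest = gmaps.places_nearby(
        location=gmaps.geocode(address)[0]["geometry"]["location"],
        keyword="supermarket", rank_by="distance",
    )["results"][0]["vicinity"]
    walk = gmaps.distance_matrix(
        [address], [nearest], mode="walking"
    )["rows"][0]["elements"][0]["duration"]["value"]

    return (commute + walk) // 60  # total minutes
```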
How can I optimize my Saturday to inspect potential homes?
I was deeply involved in a digital transformation project, so inspections during weekdays were impossible, which left me with eight hours on Saturday. Also, I had yet to buy a car here, so I was limited to trains and buses.
As I browsed through my emails, I entered each listing URL and the offered appointment times into a Snowflake table named Selects. Queries then retrieved the addresses and sorted them by location score. Every Friday, I would go through the listings in the Appointments table and finalise the ones I wanted to inspect on Saturday, as sketched below.
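The Friday query could be as simple as this, assuming the hypothetical selects/addresses schema from the load sketch above:

```python
import snowflake.connector

conn = snowflake.connector.connect(
    user="...", password="...", account="...",
    database="RENTALS", schema="PUBLIC",
)
cur = conn.cursor()

# Saturday's candidates, best-scored locations first; table and column
# names mirror my hypothetical schema, not a prescribed one.
cur.execute("""
    SELECT s.listing_url, s.appointment_time, a.location_score
    FROM selects s
    JOIN addresses a ON a.address = s.address
    ORDER BY a.location_score
""")
for url, appointment, score in cur.fetchall():
    print(score, appointment, url)
```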
Then, using the Travelling Salesman paradigm and the Google Maps API, my Azure Functions app would try to create an itinerary. It provided specific timings for when I should leave a property to catch the train/bus to the next one, with the assumption that I can only spend 10 minutes viewing each property. Above is an example of the Travelling Salesman in action.
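For a Saturday's handful of properties, a brute-force Travelling Salesman search is perfectly adequate. A minimal sketch, with made-up transit times standing in for the Distance Matrix results:

```python
from itertools import permutations

# Pairwise transit minutes between the day's properties, as would be
# returned by the Distance Matrix calls above (values are made up).
travel = {
    ("A", "B"): 18, ("A", "C"): 25, ("A", "D"): 30,
    ("B", "C"): 12, ("B", "D"): 22, ("C", "D"): 15,
}

def minutes(a, b):
    return travel.get((a, b)) or travel.get((b, a))

VIEWING_MINUTES = 10  # fixed time spent at each property

def best_route(properties):
    """Brute-force Travelling Salesman: fine for a Saturday's worth
    of inspections, hopeless beyond ~10 stops."""
    def total(route):
        legs = sum(minutes(a, b) for a, b in zip(route, route[1:]))
        return legs + VIEWING_MINUTES * len(route)
    route = min(permutations(properties), key=total)
    return route, total(route)

route, total_minutes = best_route(["A", "B", "C", "D"])
print(" -> ".join(route), f"({total_minutes} min)")
```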
At the end of three weeks, I found a shared house with more space and more savings, at 20% less rent.
Is there anything I could have done better? I'd appreciate your comments here about opportunities for improvement.