RealStaq
Big data, small cost: Crunching millions of records in a subsecond Time To Live using AWS Infinite Scale and Cost Efficient Microservice Architecture
Industry
RealEstate, Big Data, AI
Technologies Used
- Java
- Python
- AWS
- Serverless
- Elasticsearch
- DynamoDB
- Redis
- Amazon Aurora
- SQL
- Angular
- PHP
Introduction
RealStaq, a data analytics company in the real estate sector, faced challenges with its monolithic architecture, struggling to efficiently process and correlate large volumes of diverse data, including real estate MLS data and national public records from multiple providers.
Apoddo’s Assistance:
Apoddo was engaged to lead an extensive transformation project, focusing on shifting to a serverless architecture in AWS and building a high-capacity data pipeline capable of processing millions of records swiftly and correlating billions of data entries.
Apoddo’s approach to transforming RealStaq’s infrastructure involved a meticulous process. The team initiated the project by transitioning RealStaq from a monolithic architecture to a serverless framework on AWS, employing services like Lambda, S3, DynamoDB, and API Gateway. A critical component of this transformation was the development of an advanced serverless data pipeline. As can be shown in a diagram below, you can see the multi-layered approach to processing data in parallel that scales to millions of processed records per minute.
This pipeline was meticulously designed to process millions of records within minutes, addressing the need for high-speed data processing. To ensure comprehensive data analysis, Apoddo implemented complex algorithms capable of correlating billions of records from diverse data sources, including real estate MLS data and national public records. This was instrumental in maintaining data integrity and consistency across the system. Further enhancing RealStaq’s data capabilities, a data warehouse using AWS Redshift and a data lake based on S3 were established. These systems were geared to handle both structured and unstructured data analytics effectively. Lastly, high-performance APIs were developed to facilitate rapid data access and manipulation, with response times consistently kept below the 50ms mark, managed and monitored through AWS API Gateway.
The website screenshot displayed here is a powerful demonstration of the capabilities of the underlying platform. It offers more than just a glimpse of a well-designed user interface; it reveals the advanced technology and seamless integration that make the website functional and user-friendly. This platform is versatile enough to power real estate search platforms as well as the mortgage industry, showcasing its adaptability and robustness in handling diverse and complex market needs.
Rapid Data Processing: The new data pipeline processed millions of records in minutes, significantly improving data throughput.
Efficient Data Correlation: Successfully correlated billions of records from various data sources, providing comprehensive insights into the real estate market.
Scalable Infrastructure: Achieved dynamic scalability and high availability, suitable for handling large-scale data operations.
Cost Efficiency: The serverless model drastically reduced operational costs and provided a pay-as-you-go pricing structure.