Design Proximity Service/Yelp

Studied by 1 person

0.0(0)

LearnA personalized and smart learning plan

Practice TestTake a test on your terms and definitions

Spaced RepetitionScientifically backed study method

Matching GameHow quick can you match all your cards?

FlashcardsStudy terms and definitions

1 / 13

There's no tags or description

Looks like no one added any tags here yet for you.

14 Terms

Functional and Non-functional requirements

-User enters address, gets 100 businesses back
- Can filter for types
- Gets reviews, contact info for business
- Owners can add/remove businesses

-low/moderate latency
- scalable
- available
- data privacy

New cards

BOE

100 million DAU, 200 million businesses, average of 5 searches a day
2000 QPS

New cards

DB Design and APIs

APIs
GET api/location/businesses
POST api/businesses
PUT api/businesses/id
DELETE api/businesses

DB can be relational. Key tables are business table and geospatial index

New cards

High Level Design Diagram

Location service finds nearby businesses for a given radius and location. Is read heavy with no write requests, high QPS, and stateless.

Business Service is for owners to create, update, and delete businesses. Also, for customers to view detailed business data.

Databases replicated because services are stateless.

<p>Location service finds nearby businesses for a given radius and location. Is read heavy with no write requests, high QPS, and stateless. </p><p>Business Service is for owners to create, update, and delete businesses. Also, for customers to view detailed business data.</p><p></p><p>Databases replicated because services are stateless. </p>

New cards

Algorithms for nearby businesses

2D search- draw circle with pre defined radius, do a bunch of joins on lats and lons between certain values. Can only improve speed in one direction though. SOOO many values

Evenly Divided Grid- Divide grid into even sections. Not even in number of businesses tho

Geohash- Recursively divide map into smaller and smaller areas. Use 0s and 1s. Convert this to base32 and that is hash. Has 12 precisions, only levels 4-6 really used. Boundary issues by meridian/equator. If not enough businesses, remove end of hash .

QuadTree- Memory solution. Keep recursively dividing until only 100 businesses on node. that is now leaf node.

Explain all

New cards

Memory/Aspects of QuadTree

200 million businesses, each leaf node has 100 businesses, so 200 mil/100 = 2 million leaf nodes. Internal nodes are 1/3 the size of leaf nodes, so 670,000.

Data on leaf node has 16 bytes for top left coordinates and 16 for bottom right, plus 100 × 8 bytes for list of business ids. That is 832 bytes. Internal node has same 32 bytes for coordinates, plus 8 bytes for each child node so another 32. so 2 mil 832 bytes + 670,000 * 64 bytes if about 1.6 to 1.7 GB. Easily can fit on one server, but replicate to handle all the requests

Quadtree will take a little while to build with 200 million businesses, if we want to update/remove businesses we will need to rebuild. Maybe do Blue/Green deployment.

New cards

Geohash vs Quadtree

Geohash is easy to implement and use, easy to update businesses.

Quadtree is harder to implement bc tree has to build, supports fetching k-nearest businesses though. Can dynamically resize grid based on population density. Updating is more complicated (must rebuild tree, OR traverse to delete but this can get complicated with multi-threading).

New cards

Deep Dive

Scale the Database

Caching

Region and Availability Zones

Filter results by time or business type

New cards

Scale the database

Do we scale the business table or the geospatial index?

Business table has 200 million businesses with all their info. Shard this! Do it by business id.

Geospatial index needs to be replicated. The full dataset is not that big (similar to quad tree) so no need to shard. (this is the geohash option). Just make read replicas!

New cards

Cache

Should we cache location coordinates of user?? NO, not that accurate and people can move.

Instead, cache business ids for a given geohash! Use a Redis cache for key-value store. Storage- 8 bytes 200 million 3 precisions is about 5GB. Call this Geohash cache

Also, cache Business info (images, name, address) in Business Info cache.

New cards

Region and availability zones

Deploy LBS to multiple regions, makes us physcially closer to service, gives flexibility to spread traffic evenly across population, and different privacy laws met with storage of data locally.

New cards

Filtering??

When world divided into small grids, number of businesses returned from search result is relatively small. Acceptable to return business IDs first, hydrate business objects, then filter them.

New cards

Final Diagram

New cards

Google S2

In memory solution that uses a 1d index based on Hilbert curve. Two points close to each other on curve are close in 1D space.

Complicated library, widely used though.

Great for geofencing because it can cover arbitrary areas with varying levels. Allows us to define perimeters that surround areas of interest and to send notifications to users who are out of areas.

New cards