Design Proximity Service/Yelp

studied byStudied by 1 person
0.0(0)
learn
LearnA personalized and smart learning plan
exam
Practice TestTake a test on your terms and definitions
spaced repetition
Spaced RepetitionScientifically backed study method
heart puzzle
Matching GameHow quick can you match all your cards?
flashcards
FlashcardsStudy terms and definitions

1 / 13

encourage image

There's no tags or description

Looks like no one added any tags here yet for you.

14 Terms

1

Functional and Non-functional requirements

-User enters address, gets 100 businesses back
- Can filter for types
- Gets reviews, contact info for business
- Owners can add/remove businesses

-low/moderate latency
- scalable
- available
- data privacy

New cards
2

BOE

100 million DAU, 200 million businesses, average of 5 searches a day
2000 QPS

New cards
3

DB Design and APIs

APIs
GET api/location/businesses
POST api/businesses
PUT api/businesses/id
DELETE api/businesses


DB can be relational. Key tables are business table and geospatial index

New cards
4

High Level Design Diagram

Location service finds nearby businesses for a given radius and location. Is read heavy with no write requests, high QPS, and stateless.

Business Service is for owners to create, update, and delete businesses. Also, for customers to view detailed business data.

Databases replicated because services are stateless.

<p>Location service finds nearby businesses for a given radius and location. Is read heavy with no write requests, high QPS, and stateless. </p><p>Business Service is for owners to create, update, and delete businesses. Also, for customers to view detailed business data.</p><p></p><p>Databases replicated because services are stateless. </p>
New cards
5

Algorithms for nearby businesses

2D search- draw circle with pre defined radius, do a bunch of joins on lats and lons between certain values. Can only improve speed in one direction though. SOOO many values

Evenly Divided Grid- Divide grid into even sections. Not even in number of businesses tho

Geohash- Recursively divide map into smaller and smaller areas. Use 0s and 1s. Convert this to base32 and that is hash. Has 12 precisions, only levels 4-6 really used. Boundary issues by meridian/equator. If not enough businesses, remove end of hash .

QuadTree- Memory solution. Keep recursively dividing until only 100 businesses on node. that is now leaf node.


Explain all

New cards
6

Memory/Aspects of QuadTree

200 million businesses, each leaf node has 100 businesses, so 200 mil/100 = 2 million leaf nodes. Internal nodes are 1/3 the size of leaf nodes, so 670,000.

Data on leaf node has 16 bytes for top left coordinates and 16 for bottom right, plus 100 × 8 bytes for list of business ids. That is 832 bytes. Internal node has same 32 bytes for coordinates, plus 8 bytes for each child node so another 32. so 2 mil 832 bytes + 670,000 * 64 bytes if about 1.6 to 1.7 GB. Easily can fit on one server, but replicate to handle all the requests

Quadtree will take a little while to build with 200 million businesses, if we want to update/remove businesses we will need to rebuild. Maybe do Blue/Green deployment.

New cards
7

Geohash vs Quadtree

Geohash is easy to implement and use, easy to update businesses.

Quadtree is harder to implement bc tree has to build, supports fetching k-nearest businesses though. Can dynamically resize grid based on population density. Updating is more complicated (must rebuild tree, OR traverse to delete but this can get complicated with multi-threading).

New cards
8

Deep Dive

Scale the Database

Caching

Region and Availability Zones

Filter results by time or business type

New cards
9

Scale the database

Do we scale the business table or the geospatial index?

Business table has 200 million businesses with all their info. Shard this! Do it by business id.

Geospatial index needs to be replicated. The full dataset is not that big (similar to quad tree) so no need to shard. (this is the geohash option). Just make read replicas!

New cards
10

Cache

Should we cache location coordinates of user?? NO, not that accurate and people can move.

Instead, cache business ids for a given geohash! Use a Redis cache for key-value store. Storage- 8 bytes 200 million 3 precisions is about 5GB. Call this Geohash cache

Also, cache Business info (images, name, address) in Business Info cache.

New cards
11

Region and availability zones

Deploy LBS to multiple regions, makes us physcially closer to service, gives flexibility to spread traffic evenly across population, and different privacy laws met with storage of data locally.

New cards
12

Filtering??

When world divided into small grids, number of businesses returned from search result is relatively small. Acceptable to return business IDs first, hydrate business objects, then filter them.

New cards
13

Final Diagram

knowt flashcard image
New cards
14

Google S2

In memory solution that uses a 1d index based on Hilbert curve. Two points close to each other on curve are close in 1D space.

Complicated library, widely used though.

Great for geofencing because it can cover arbitrary areas with varying levels. Allows us to define perimeters that surround areas of interest and to send notifications to users who are out of areas.

New cards
robot