CAP: consistency, availability, partition tolerance
Eventual consistency is used in multi-node databases
ETL (extract, transform, load) is typically used in batch data processing
Online multiplayer gaming platform: most important properties are AP
Public traffic management system: CP
Remote health monitoring service: AP
Automated supply chain management system: CP
Load balancers in distributed web applications: AP
· In a row-oriented DBMS, the data would be stored as
· 1001,Green,Rachel,20.12 1002,Geller,Ross,12.25 1003,Bing,Chandler,45.25
· Row vs column oriented
· Row-oriented: rows are stored sequentially in a file
· Column-oriented: each column is stored in a separate file
· The value for a given row sits at the same offset in every column file
· Column-oriented storage is typically more effective when you have millions of rows
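The difference between the two layouts can be sketched in a few lines of Python. This is only an in-memory illustration, not how a DBMS writes files. The column names (`id`, `last_name`, `first_name`, `amount`) are assumptions, since the notes give only the values.

```python
# Three records from the row-oriented example: (id, last name, first name, amount).
rows = [
    (1001, "Green", "Rachel", 20.12),
    (1002, "Geller", "Ross", 12.25),
    (1003, "Bing", "Chandler", 45.25),
]

# Row-oriented layout: all fields of one record are stored next to each other.
row_layout = [value for record in rows for value in record]

# Column-oriented layout: all values of one column are stored together.
# The column names are hypothetical labels chosen for this sketch.
columns = {
    "id": [r[0] for r in rows],
    "last_name": [r[1] for r in rows],
    "first_name": [r[2] for r in rows],
    "amount": [r[3] for r in rows],
}

# An aggregate over one column only has to read that column's values.
total = sum(columns["amount"])

# The same offset (index 1) identifies the second record in every column.
second_record = tuple(column[1] for column in columns.values())
```

Summing `amount` touches one list in the column layout, whereas the row layout would require stepping over every field of every record.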
· CSV (Comma Separated Values)
· TSV (Tab Separated Values)
· JSON: hierarchical structure
· Parquet: binary, column-oriented format that supports typed and nested data
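CSV and TSV differ only in the delimiter, which a short sketch with Python's standard `csv` module can show. The sample records reuse the values from the row-oriented example; the helper name `serialize` is made up for this illustration.

```python
import csv
import io

records = [
    ["1001", "Green", "Rachel", "20.12"],
    ["1002", "Geller", "Ross", "12.25"],
]

def serialize(rows, delimiter):
    """Write rows as delimited text and return the result as a string."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, delimiter=delimiter, lineterminator="\n")
    writer.writerows(rows)
    return buffer.getvalue()

csv_text = serialize(records, ",")   # comma separated
tsv_text = serialize(records, "\t")  # tab separated

# Reading the TSV text back yields the original records.
parsed = list(csv.reader(io.StringIO(tsv_text), delimiter="\t"))
```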
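· JSON stands for JavaScript Object Notation
· Data is written as key: value pairs
· JSON can be read and written from any programming language
· Values can be combined freely: objects and arrays can nest inside each other
· Use camelCase for keys in JSON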
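A small round trip with Python's standard `json` module illustrates these points. The field names (`userId`, `firstName`, and so on) are invented for the example and follow the camelCase convention noted here.

```python
import json

# A nested structure: an object containing an array of objects.
user = {
    "userId": 1001,
    "firstName": "Rachel",
    "isActive": True,
    "middleName": None,
    "orders": [{"orderId": 1, "amount": 20.12}],
}

# Serialize the Python dict to JSON text.
text = json.dumps(user, indent=2)

# Parse the JSON text back into Python objects.
restored = json.loads(text)
```

In the JSON text, Python's `True` appears as `true` and `None` as `null`; parsing converts them back.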
· HTTP: Hypertext Transfer Protocol
· Used to get data and to submit data
· Popular HTTP Status Codes
· 200 Series (Success): 200 OK, 201 Created.
· 300 Series (Redirection): 301 Moved Permanently, 302 Found.
· 400 Series (Client Error): 400 Bad Request, 401 Unauthorized, 404 Not Found.
· 500 Series (Server Error): 500 Internal Server Error, 503 Service Unavailable.
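The grouping of status codes by their first digit can be sketched with the standard library's `http.HTTPStatus` enum. The helper `status_class` is a made-up name for this illustration.

```python
from http import HTTPStatus

def status_class(code: int) -> str:
    """Map a status code to its series using the first digit."""
    classes = {
        2: "Success",
        3: "Redirection",
        4: "Client Error",
        5: "Server Error",
    }
    return classes.get(code // 100, "Other")

# Print each code from the list with its standard reason phrase and series.
for code in (200, 201, 301, 302, 400, 401, 404, 500, 503):
    print(code, HTTPStatus(code).phrase, "->", status_class(code))
```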
REST API
· A monolithic architecture is a software design pattern in which an application is built as a unified unit.
· One big application
· This architecture is simple to develop, test, deploy, and scale vertically.
· Simple in development and deployment
· Changes are difficult to implement
· Microservices
· Microservices architecture is a method of developing software applications as a suite of small, independently deployable services
· Characteristics: Modularity, Independence, Decentralized Control, Technology Diversity, Resilience, Scalability
· Statefulness
· Session Memory: The server remembers past interactions and may store session data like user authentication, preferences, and other activities.
· Server Dependency: Since the server holds session data, the same server usually handles subsequent requests from the same client. This is important for consistency.
· Resource Intensive: Maintaining state can be resource-intensive, as the server needs to manage and store session data for each client.
· Example: A web application where a user logs in, and the server keeps track of their authentication status and interactions until they log out.
· The server stores information about the client's current session in a stateful system.
· Stickiness or sticky sessions are used in stateful systems, particularly in load-balanced environments. It ensures that requests from a particular client are directed to the same server instance.
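One way to picture sticky sessions is a router that always maps the same client to the same server. The sketch below uses a hash of the client ID; this is only one possible approach and an assumption of the example (load balancers often use cookies instead). The server names are hypothetical.

```python
import hashlib

SERVERS = ["app-1", "app-2", "app-3"]  # hypothetical server instance names

def pick_server(client_id: str, servers=SERVERS) -> str:
    """Return the same server for the same client ID on every call."""
    digest = hashlib.sha256(client_id.encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % len(servers)
    return servers[index]

# Repeated requests from one client land on one server instance.
first = pick_server("client-alice")
second = pick_server("client-alice")
```

A drawback of this simple scheme is that changing the number of servers remaps most clients, so their stored session data would no longer be on the server they reach.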
· Statelessness
· In a stateless system, each request from the client must contain all the information the server needs to fulfill that request. The server does not store any state of the client's session. This is a crucial principle of RESTful APIs
· Characteristics: No session memory, Scalability, Simplicity, and Reliability
· Make a request, get the data
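A minimal sketch of a stateless request follows, assuming every server instance shares a secret key. The client sends a signed token with each request, so the server can verify who is calling without storing any session. The names `issue_token`, `handle_request`, and `SECRET_KEY` are invented for the example, and the scheme is simplified (no expiry, for instance), so it is not production-ready.

```python
import hashlib
import hmac

SECRET_KEY = b"demo-secret"  # assumption: a key shared by all server instances

def sign(user_id: str) -> str:
    """Compute an HMAC-SHA256 signature for the user ID."""
    return hmac.new(SECRET_KEY, user_id.encode("utf-8"), hashlib.sha256).hexdigest()

def issue_token(user_id: str) -> str:
    """Create a token the client must include in every request."""
    return f"{user_id}.{sign(user_id)}"

def handle_request(token: str) -> str:
    """Verify the token using only the request contents and the shared key."""
    user_id, _, signature = token.rpartition(".")
    expected = sign(user_id)
    if not hmac.compare_digest(signature.encode("utf-8"), expected.encode("utf-8")):
        return "401 Unauthorized"
    return f"200 OK: hello {user_id}"
```

Because `handle_request` keeps nothing between calls, any server instance holding the key can serve any request.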
· Idempotency: An operation is idempotent if it can be applied multiple times without changing the result beyond the initial application. It's an essential concept in stateless architectures, especially for APIs.
· Used with APIs
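The contrast between an idempotent and a non-idempotent operation can be sketched with two in-memory functions. The names `put_item` and `append_item` are made up for this illustration.

```python
store = {}  # key -> value, like a resource addressed by its ID
log = []    # a list that grows with every call

def put_item(item_id, value):
    """Idempotent: repeating the call leaves the store in the same state."""
    store[item_id] = value

def append_item(value):
    """Not idempotent: every repeated call adds another entry."""
    log.append(value)

# Apply each operation twice with the same input.
put_item("42", "blue")
put_item("42", "blue")
append_item("blue")
append_item("blue")
```

After the repeated calls, `store` holds a single entry while `log` holds two. This is why a client can safely retry an idempotent request after a timeout.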
· poetry run python to_parquet.py
· curl
· REST API: REpresentational State Transfer is a software architectural style developers apply to web APIs.
· REST APIs provide simple, uniform interfaces for making data, content, algorithms, media, and other digital resources available through web URLs. REST APIs are the most common APIs used across the web today.
· HTTP METHODS: GET, POST, PUT, DELETE
· GET: Asks the server to find the data you requested and send it back to you.
· POST: Asks the server to create a new entry in the database.
· PUT: Asks the server to update an existing entry in the database.
· DELETE: Asks the server to delete an entry from the database.
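The four methods can be sketched as operations on an in-memory "database". This is an illustration only: there is no real HTTP server here, and the `handle` function and its return shape `(status code, body)` are assumptions made for the example.

```python
import itertools

database = {}              # item ID -> stored entry
ids = itertools.count(1)   # generates 1, 2, 3, ... for new entries

def handle(method, item_id=None, body=None):
    """Dispatch on the HTTP method and return (status code, response body)."""
    if method == "GET":
        if item_id in database:
            return 200, database[item_id]
        return 404, None
    if method == "POST":
        new_id = next(ids)
        database[new_id] = body
        return 201, {"id": new_id}
    if method == "PUT":
        if item_id not in database:
            return 404, None
        database[item_id] = body
        return 200, body
    if method == "DELETE":
        if item_id not in database:
            return 404, None
        del database[item_id]
        return 204, None
    return 405, None  # method not allowed
```

For example, `handle("POST", body={"name": "Ross"})` creates an entry and returns its new ID, and a later `handle("GET", item_id=1)` reads it back.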
· API Performance
· Caching: Store frequently accessed data in a cache so you can access it faster.
· Load Balancing: Distribute requests across multiple servers. It not only helps with performance but also makes your application more reliable.
· Async Processing: With async processing, you let clients know that their requests are registered and being processed. You then process the requests individually and communicate the results to the client later. This relieves pressure on your application server so it can perform at its best. Async processing may not be possible for every requirement.
· Pagination: Limit the number of records per request. This improves the response time of your API for the consumer.
· Connection Pooling: Set up a pool of database connections that can be reused across requests.
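Two of these techniques, pagination and caching, are easy to sketch in Python. The data (`RECORDS`), the function names, and the page size of 10 are assumptions for this illustration. `functools.lru_cache` stands in for a real cache such as a separate cache server.

```python
from functools import lru_cache

RECORDS = list(range(1, 101))  # stand-in for 100 database rows

def paginate(items, page, page_size=10):
    """Return one page of items; pages are numbered from 1."""
    start = (page - 1) * page_size
    return items[start:start + page_size]

calls = []  # records every time the slow lookup actually runs

@lru_cache(maxsize=128)
def expensive_lookup(key):
    """Pretend to be a slow query; results are cached per key."""
    calls.append(key)
    return key * 2

# The second call with the same key is served from the cache.
expensive_lookup(7)
expensive_lookup(7)
```

Here `paginate(RECORDS, 2)` returns records 11 through 20, and `calls` shows that the slow function body ran only once for key 7.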
· API In Big Data
· Data Ingestion: REST APIs can ingest data from various sources into big data platforms.
· Data Access: REST APIs provide a convenient way for applications to query big data stores and receive responses in a usable format.
· Microservices Architecture: In a microservices architecture, each microservice can handle some data processing and expose results through REST APIs.
· Real-time Processing: REST APIs can serve real-time processed data from big data platforms to end-users or other systems.
· Monitoring and Management: Big Data clusters and systems often come with management interfaces that expose REST APIs for monitoring, scaling, and managing resources.
· Tool Ecosystem: Many Big Data tools and platforms, such as Hadoop, Spark, Kafka, and Elasticsearch, offer RESTful interfaces for managing and interacting with their services. Understanding these APIs is essential for working effectively with these tools.
HTTP (HyperText Transfer Protocol) is the foundation of data communication on the web, used to transfer data (such as HTML files and images).
GET: In everyday use, this is navigating to a URL or clicking a link.
POST: In everyday use, this is submitting a form on a website, such as a username and password.
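The two everyday cases can be sketched with Python's standard `urllib`. The requests are only constructed here, not sent, and the `example.com` URLs and form fields are placeholders for the illustration.

```python
from urllib.parse import urlencode
from urllib.request import Request

# GET: like navigating to a URL; any parameters travel in the URL itself.
get_request = Request("https://example.com/articles?page=2")

# POST: like submitting a form; the data travels in the request body.
form = urlencode({"username": "rachel", "password": "secret"}).encode("utf-8")
post_request = Request("https://example.com/login", data=form)

print(get_request.get_method())   # method urllib would use for the first request
print(post_request.get_method())  # method urllib would use when a body is attached
```

Passing either object to `urllib.request.urlopen` would actually send it. `urllib` selects POST automatically when `data` is supplied.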