Teradata - on prem
Vertica - on prem
Oracle - now cloud, was on prem
Netezza - on prem
Snowflake - cloud
Amazon Redshift - cloud
Microsoft Synapse - cloud
Data Warehouse - Proprietary way of storing and managing data, built for analytics, massive costs, getting data in takes time
Cloudera - Impala (assume Hadoop)
MapR - Drill (assume Hadoop)
Hortonworks - Hive (assume Hadoop)
Presto - built by Facebook - open source (not sold) - connects to all data lakes like we do
Amazon Athena - only works with S3
Starburst - most like us - they connect to all data lakes and relational and noSQL sources
Trino - built by the people who originally created Presto
Databricks SQL
Dremio
SQL Engines - Query in data lake (built for analytics in the data lake)
Informatica
Microsoft SSIS
IBM Datastage
ETL/ELT - Extract Transform Load / Extract Load Transform - for moving data from one repository to another (moving it for storage, or moving it because the source isn't fast enough)
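To make the difference in ordering concrete, here is a minimal sketch using Python's built-in `sqlite3` as a stand-in for the target repository. The table names, the sample rows, and the "drop non-numeric amounts" rule are all made up for illustration; real ETL/ELT tools do far more than this.

```python
import sqlite3

# Hypothetical source rows (name, amount as text); values are illustrative only.
SOURCE_ROWS = [("alice", "10.50"), ("bob", "n/a"), ("carol", "7.25")]


def is_number(text):
    """Return True if the text parses as a float."""
    try:
        float(text)
        return True
    except ValueError:
        return False


def etl(rows):
    """ETL: transform in application code first, then load only clean rows."""
    target = sqlite3.connect(":memory:")
    target.execute("CREATE TABLE sales (name TEXT, amount REAL)")
    cleaned = [(name, float(amount)) for name, amount in rows if is_number(amount)]
    target.executemany("INSERT INTO sales VALUES (?, ?)", cleaned)
    return target.execute("SELECT name, amount FROM sales ORDER BY name").fetchall()


def elt(rows):
    """ELT: load the raw rows as-is, then transform with SQL inside the target."""
    target = sqlite3.connect(":memory:")
    target.execute("CREATE TABLE raw_sales (name TEXT, amount TEXT)")
    target.executemany("INSERT INTO raw_sales VALUES (?, ?)", rows)
    # Keep only amounts that start with a digit, and cast them to numbers.
    target.execute(
        "CREATE TABLE sales AS "
        "SELECT name, CAST(amount AS REAL) AS amount FROM raw_sales "
        "WHERE amount GLOB '[0-9]*'"
    )
    return target.execute("SELECT name, amount FROM sales ORDER BY name").fetchall()
```

Both functions end with the same cleaned `sales` table; the only difference is whether the transform step runs before the load (ETL) or inside the target after the load (ELT).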
Alteryx
Paxata
Trifacta
Data Prep - adding calculated fields, dropping rows and columns
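As a rough illustration of those three operations, the sketch below works on plain Python dictionaries. The field names (`price`, `qty`, `notes`, `total`) and the sample orders are hypothetical and not tied to any of the tools listed above.

```python
# Hypothetical order records; field names and values are illustrative only.
SAMPLE_ORDERS = [
    {"order_id": 1, "price": 4.0, "qty": 3, "notes": "gift"},
    {"order_id": 2, "price": None, "qty": 1, "notes": ""},
    {"order_id": 3, "price": 2.5, "qty": 2, "notes": "rush"},
]


def prep(rows):
    """Drop incomplete rows, drop the 'notes' column, add a calculated 'total'."""
    prepared = []
    for row in rows:
        if row["price"] is None:
            continue  # dropping a row: skip records with a missing price
        # dropping a column: copy every field except 'notes'
        clean = {key: value for key, value in row.items() if key != "notes"}
        # calculated field: derive 'total' from two existing fields
        clean["total"] = clean["price"] * clean["qty"]
        prepared.append(clean)
    return prepared
```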
Tableau
Power BI
Cognos
MicroStrategy
Looker
Qlik - won’t work quite as well with us
SAP Business Objects
BI (Analyzing - Visual Analytics) - Used By Analysts
Jupyter
Spark (Databricks)
R
SAS
Python
Anaconda
Machine Learning - Predictive Analytics (Used By Data Scientists)
MinIO
Dell Isilon
Dell ECS
Pure Storage
Qumulo
NetApp
Scality
Vast
StorageGRID
Ceph
On-prem object storage - (data lake on prem) for customers who want a modern experience and want to migrate off of Hadoop, or who are getting a new data lake and don’t want to go to the cloud
Hadoop - on prem, open source
Cloudera
MapR - RIP
Hortonworks - RIP - merged with Cloudera
Amazon S3 - cloud
Microsoft Azure/ADLS - cloud
Google - GCP/GCS - Google Cloud Platform/Google Cloud Storage - cloud
Data Lakes - unstructured data, less expensive, easier to scale, easy to get data into, tough to get analytics out
Microsoft SQL Server
MySQL
Oracle
Greenplum - we don’t integrate
Postgres
IBM DB2 - we don’t integrate
Azure
Databases:
SQL/RDBMS/Relational - traditional database, operates in SQL, operational system
Cassandra - we don’t integrate
MongoDB
Elasticsearch
NoSQL - "Not Only SQL" databases (SQL plus other, non-relational schemas)
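One way to picture the relational versus NoSQL split is schema flexibility. The sketch below is only an assumption-laden illustration: it uses Python's built-in `sqlite3` for the relational side and a list of JSON strings as a stand-in for a document store (it does not use MongoDB, Elasticsearch, or Cassandra APIs). The `customers` table and the `twitter` field are hypothetical.

```python
import json
import sqlite3

# Relational side: every row must fit the columns declared up front.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT)")
db.execute("INSERT INTO customers VALUES (1, 'Ada')")


def insert_with_extra_column():
    """Try to insert a row with a column the table was never given."""
    try:
        db.execute(
            "INSERT INTO customers (id, name, twitter) VALUES (2, 'Bob', '@bob')"
        )
        return "accepted"
    except sqlite3.OperationalError:
        # SQLite rejects the statement because 'twitter' is not in the schema.
        return "rejected"


# Document side (stand-in): each record carries its own fields.
documents = [
    json.dumps({"id": 1, "name": "Ada"}),
    json.dumps({"id": 2, "name": "Bob", "twitter": "@bob"}),
]


def find(field, value):
    """Return every document whose given field equals the given value."""
    return [doc for doc in map(json.loads, documents) if doc.get(field) == value]
```

The relational table refuses the unexpected `twitter` column, while the document list simply stores records with different fields side by side.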