Dremio - First Week

10 Terms

1

Teradata - on prem

Vertica - on prem

Oracle - now cloud, was on prem

Netezza - on prem 

Snowflake - cloud

Amazon Redshift - cloud

Microsoft Synapse - cloud

Data Warehouse - Proprietary way of storing and managing data, built for analytics, massive costs, getting data in takes time

2

Cloudera - Impala (assume Hadoop)

MapR - Drill (assume Hadoop)

Hortonworks - Hive (assume Hadoop)

Presto - built by Facebook - open source (not sold) - connects to all data lakes like we do

  • Amazon Athena - only works with S3

  • Starburst - most like us - they connect to all data lakes and relational and NoSQL sources

  • Trino (from the people who originally created Presto)

Databricks SQL

DREMIO

SQL Engines - Query data in the data lake (built for analytics in the data lake)

3

Informatica

Microsoft SSIS 

IBM DataStage

ETL/ELT - Extract Transform Load, Extract Load Transform - for moving data from one repository to another (moving it for storage, or moving it because it's not fast enough)
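
A minimal sketch of the difference in ordering, assuming an in-memory SQLite database as the target repository. The rows, table names, and function names are hypothetical and chosen only for illustration; none of the tools on this card work this way internally.

```python
import sqlite3

# Hypothetical source rows (customer, amount as text), standing in for a source repository.
SOURCE_ROWS = [("alice", "10.50"), ("bob", "3.25"), ("alice", "4.50")]


def etl(rows):
    """Extract -> Transform (outside the target) -> Load."""
    target = sqlite3.connect(":memory:")
    target.execute("CREATE TABLE sales (customer TEXT, amount REAL)")
    # Transform happens before the data reaches the target.
    transformed = [(name.upper(), float(amount)) for name, amount in rows]
    target.executemany("INSERT INTO sales VALUES (?, ?)", transformed)
    return target.execute(
        "SELECT customer, amount FROM sales ORDER BY rowid"
    ).fetchall()


def elt(rows):
    """Extract -> Load raw -> Transform (inside the target, with SQL)."""
    target = sqlite3.connect(":memory:")
    target.execute("CREATE TABLE raw_sales (customer TEXT, amount TEXT)")
    # Load the raw data untouched.
    target.executemany("INSERT INTO raw_sales VALUES (?, ?)", rows)
    # Transform afterwards, using the target's own SQL engine.
    target.execute(
        "CREATE TABLE sales AS "
        "SELECT UPPER(customer) AS customer, CAST(amount AS REAL) AS amount "
        "FROM raw_sales"
    )
    return target.execute(
        "SELECT customer, amount FROM sales ORDER BY rowid"
    ).fetchall()
```

Both functions end with the same cleaned table. The only difference is whether the transform runs before or after the load.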

4

Alteryx

Paxata 

Trifacta

Data Prep - Calculated fields, dropping rows and columns
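
A small sketch of the three operations named on this card, in plain Python. The sample rows and the `prepare` function are hypothetical; the tools listed above do this through their own interfaces.

```python
# Hypothetical order rows used only for illustration.
ROWS = [
    {"id": 1, "price": 2.0, "qty": 3, "notes": "ok"},
    {"id": 2, "price": 5.0, "qty": 0, "notes": "empty order"},
    {"id": 3, "price": 1.5, "qty": 4, "notes": "ok"},
]


def prepare(rows):
    prepared = []
    for row in rows:
        # Drop a row: skip orders with zero quantity.
        if row["qty"] == 0:
            continue
        # Drop a column: keep every field except "notes".
        clean = {key: value for key, value in row.items() if key != "notes"}
        # Calculated field: derive "total" from two existing fields.
        clean["total"] = clean["price"] * clean["qty"]
        prepared.append(clean)
    return prepared
```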

5

Tableau 

Power BI

Cognos

MicroStrategy

Looker

Qlik - won’t work quite as well with us 

SAP BusinessObjects

BI (Analyzing - Visual Analytics) - Used By Analysts 

6

Jupyter

Spark (Databricks) 

R

SAS

Python

Anaconda

Machine Learning - Predictive Analytics (Used By Data Scientists) 

7

MinIO

Dell Isilon

Dell ECS

Pure Storage

Qumulo

NetApp

Scality

VAST

StorageGRID

Ceph

On-prem object storage - (data lake on prem) for customers who want a modern experience and want to migrate off Hadoop, or who are getting a new data lake and don’t want to go to the cloud

8

Hadoop (on prem) open source

Cloudera 

MapR - RIP 

Hortonworks - RIP - merged with Cloudera

Amazon S3 - cloud 

Microsoft Azure/ADLS - cloud 

Google - GCP/GCS - Google Cloud Platform/Google Cloud Storage - cloud

Data Lakes - unstructured data, less expensive, easier to scale, easy to get data into, tough to get analytics out

9

Microsoft SQL Server

MySQL

Oracle

Greenplum - we don’t integrate

Postgres

IBM DB2 - we don’t integrate

Azure

Databases:

SQL/RDBMS/Relational - traditional database, queried with SQL, operational system

10

Cassandra - we don’t integrate

MongoDB

Elasticsearch

NoSQL - Not Only SQL (SQL plus other schemas) databases