Dremio - First Week


10 Terms

1

Teradata - on prem

Vertica - on prem

Oracle - now cloud, was on prem

Netezza - on prem 

Snowflake - cloud

Amazon Redshift - cloud

Microsoft Synapse - cloud

Data Warehouse - proprietary way of storing and managing data; built for analytics, very expensive, and getting data in takes time

2

Cloudera - Impala (assume Hadoop)

MapR - Drill (assume Hadoop)

Hortonworks - Hive (assume Hadoop)

Presto - built by Facebook - open source (not sold) - connects to all data lakes like we do

  • Amazon Athena - only works with S3

  • Starburst - most like us - they connect to all data lakes and to relational and NoSQL sources

  • Trino (from the people who originally created Presto)

Databricks SQL


DREMIO

SQL Engines - query data directly in the data lake (built for analytics in the data lake)

3

Informatica

Microsoft SSIS 

IBM DataStage

ETL/ELT - Extract Transform Load / Extract Load Transform - for moving data from one repository to another (moving it for storage, or moving it because it's not fast enough)
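
The difference between ETL and ELT is only where the transform step runs. The sketch below is a minimal illustration, not how any of the tools above work internally: it uses Python's built-in sqlite3 as a stand-in target repository, and the sample rows, table names, and function names are all hypothetical.

```python
import sqlite3

# Hypothetical source records (made up for illustration): customer, amount as text.
SOURCE_ROWS = [("alice", "12.50"), ("bob", "7.25"), ("alice", "3.00")]


def etl(rows):
    """ETL: transform in the pipeline first, then load the finished rows."""
    target = sqlite3.connect(":memory:")
    target.execute("CREATE TABLE sales (customer TEXT, amount REAL)")
    # Transform step happens outside the target repository.
    transformed = [(name.upper(), float(amount)) for name, amount in rows]
    # Load step writes already-clean rows.
    target.executemany("INSERT INTO sales VALUES (?, ?)", transformed)
    return target


def elt(rows):
    """ELT: load the raw rows first, then transform inside the target with SQL."""
    target = sqlite3.connect(":memory:")
    target.execute("CREATE TABLE raw_sales (customer TEXT, amount TEXT)")
    # Load step writes the raw rows untouched.
    target.executemany("INSERT INTO raw_sales VALUES (?, ?)", rows)
    # Transform step runs inside the target repository.
    target.execute(
        "CREATE TABLE sales AS "
        "SELECT UPPER(customer) AS customer, CAST(amount AS REAL) AS amount "
        "FROM raw_sales"
    )
    return target


def totals(conn):
    """Sum the amounts per customer from the final sales table."""
    return conn.execute(
        "SELECT customer, SUM(amount) FROM sales GROUP BY customer ORDER BY customer"
    ).fetchall()


print(totals(etl(SOURCE_ROWS)))
print(totals(elt(SOURCE_ROWS)))
```

Both paths end with the same sales table; ELT simply keeps the raw copy in the target and leaves the transform work to the target's own engine.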

4

Alteryx

Paxata 

Trifacta

Data Prep - adding calculated fields, dropping rows and columns
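
As a rough illustration of those three data prep operations (a calculated field, a dropped row, a dropped column), here is a small plain-Python sketch. The records, field names, and the `prepare` function are hypothetical and are not taken from any of the tools listed above.

```python
# Hypothetical records; the field names are assumptions for illustration only.
rows = [
    {"id": 1, "price": 10.0, "qty": 3, "notes": "ok"},
    {"id": 2, "price": 4.0, "qty": 0, "notes": "empty order"},
    {"id": 3, "price": 2.5, "qty": 4, "notes": "ok"},
]


def prepare(records):
    """Return cleaned copies of the records without changing the input."""
    prepared = []
    for rec in records:
        # Drop a row: skip orders with nothing in them.
        if rec["qty"] == 0:
            continue
        # Drop a column: copy every field except the free-text notes.
        out = {key: value for key, value in rec.items() if key != "notes"}
        # Calculated field: derive a total from two existing fields.
        out["total"] = rec["price"] * rec["qty"]
        prepared.append(out)
    return prepared


for row in prepare(rows):
    print(row)
```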

5

Tableau 

Power BI

Cognos

MicroStrategy

Looker

Qlik - won’t work quite as well with us 

SAP BusinessObjects

BI (Analyzing - Visual Analytics) - Used By Analysts 

6

Jupyter

Spark (Databricks) 

R

SAS

Python

Anaconda

Machine Learning - Predictive Analytics (Used By Data Scientists) 

7

MinIO

Dell Isilon

Dell ECS

Pure Storage

Qumulo

NetApp 

Scality

Vast

StorageGRID

Ceph

On-prem object storage - (data lake on prem) for customers who want a modern experience and want to migrate off of Hadoop, or who are getting a new data lake and don’t want to go to the cloud

8

Hadoop (on prem) open source

Cloudera 

MapR - RIP 

Hortonworks - RIP - merged with Cloudera

Amazon S3 - cloud 

Microsoft Azure/ADLS - cloud 

Google - GCP/GCS - Google Cloud Platform/Google Cloud Storage - cloud

Data Lakes - unstructured data, less expensive, easier to scale, easy to get data into, tough to get analytics out

9

Microsoft SQL Server

MySQL

Oracle

Greenplum - we don’t integrate

Postgres

IBM DB2 - we don’t integrate

Azure

Databases:

SQL/RDBMS/Relational - traditional database, queried with SQL, operational system

10

Cassandra - we don’t integrate

MongoDB

Elasticsearch

NoSQL - Not Only SQL (SQL plus other schema) databases