DATABRICKS

BOOK: Spark: The Definitive Guide

SECTIONS: I, II, and IV

Data Engineer (or Big Data Engineer)

In general, companies require experience in:

  1. Relational databases (Oracle, PostgreSQL, and SQL Server) and non-relational databases (Cassandra, MongoDB);
  2. Hadoop ecosystem: HDFS, Hive, Sqoop, Kafka, and Spark;
  3. Design and implementation of Data Lakes and data warehouses (DW);
  4. Spark (batch & streaming) using the Python and/or Scala APIs;
  5. Agile methodologies: Scrum, CI/CD, and DevOps practices;
  6. Design and implementation of data pipelines (batch & streaming);
  7. Azure Data Stack: Cosmos DB, Synapse Analytics, Data Lake Storage, Azure Data Factory, Databricks, Event Hubs, Stream Analytics;
  8. Monitoring data pipelines.

Spark Content for the Certification

SECTION CHAPTER ID DETAIL STATUS
I. Gentle Overview of Big Data and Spark 1. What Is Apache Spark? 1.1 Apache Spark’s Philosophy OK
I. Gentle Overview of Big Data and Spark 1. What Is Apache Spark? 1.2 Context: The Big Data Problem OK
I. Gentle Overview of Big Data and Spark 1. What Is Apache Spark? 1.3 History of Spark OK
I. Gentle Overview of Big Data and Spark 1. What Is Apache Spark? 1.4 The Present and Future of Spark OK
I. Gentle Overview of Big Data and Spark 1. What Is Apache Spark? 1.5 Running Spark OK
I. Gentle Overview of Big Data and Spark 1. What Is Apache Spark? 1.5.1 -- Downloading Spark Locally OK
I. Gentle Overview of Big Data and Spark 1. What Is Apache Spark? 1.5.2 -- Launching Spark’s Interactive Consoles OK
I. Gentle Overview of Big Data and Spark 1. What Is Apache Spark? 1.5.3 -- Running Spark in the Cloud OK
I. Gentle Overview of Big Data and Spark 1. What Is Apache Spark? 1.5.4 -- Data Used in This Book OK
I. Gentle Overview of Big Data and Spark 2. A Gentle Introduction to Spark 2.1 Spark’s Basic Architecture OK
I. Gentle Overview of Big Data and Spark 2. A Gentle Introduction to Spark 2.1.1 -- Spark Applications OK
I. Gentle Overview of Big Data and Spark 2. A Gentle Introduction to Spark 2.2 Spark’s Language APIs OK
I. Gentle Overview of Big Data and Spark 2. A Gentle Introduction to Spark 2.3 Spark’s APIs OK
I. Gentle Overview of Big Data and Spark 2. A Gentle Introduction to Spark 2.4 Starting Spark OK
I. Gentle Overview of Big Data and Spark 2. A Gentle Introduction to Spark 2.5 The SparkSession OK
I. Gentle Overview of Big Data and Spark 2. A Gentle Introduction to Spark 2.6 DataFrames OK
I. Gentle Overview of Big Data and Spark 2. A Gentle Introduction to Spark 2.6.1 -- Partitions OK
I. Gentle Overview of Big Data and Spark 2. A Gentle Introduction to Spark 2.7 Transformations OK
I. Gentle Overview of Big Data and Spark 2. A Gentle Introduction to Spark 2.7.1 -- Lazy Evaluation OK
I. Gentle Overview of Big Data and Spark 2. A Gentle Introduction to Spark 2.8 Actions OK
I. Gentle Overview of Big Data and Spark 2. A Gentle Introduction to Spark 2.9 Spark UI OK
I. Gentle Overview of Big Data and Spark 2. A Gentle Introduction to Spark 2.10 An End-to-End Example OK
I. Gentle Overview of Big Data and Spark 2. A Gentle Introduction to Spark 2.10.1 -- DataFrames and SQL
I. Gentle Overview of Big Data and Spark 2. A Gentle Introduction to Spark 2.11 Conclusion
I. Gentle Overview of Big Data and Spark 3. A Tour of Spark’s Toolset 3.1 Running Production Applications
I. Gentle Overview of Big Data and Spark 3. A Tour of Spark’s Toolset 3.2 Datasets: Type-Safe Structured APIs
I. Gentle Overview of Big Data and Spark 3. A Tour of Spark’s Toolset 3.3 Structured Streaming
I. Gentle Overview of Big Data and Spark 3. A Tour of Spark’s Toolset 3.4 Machine Learning and Advanced Analytics
I. Gentle Overview of Big Data and Spark 3. A Tour of Spark’s Toolset 3.5 Lower-Level APIs
I. Gentle Overview of Big Data and Spark 3. A Tour of Spark’s Toolset 3.6 SparkR
I. Gentle Overview of Big Data and Spark 3. A Tour of Spark’s Toolset 3.7 Spark’s Ecosystem and Packages
I. Gentle Overview of Big Data and Spark 3. A Tour of Spark’s Toolset 3.8 Conclusion
II. Structured APIs—DataFrames, SQL, and Datasets 4. Structured API Overview 4.1 DataFrames and Datasets
II. Structured APIs—DataFrames, SQL, and Datasets 4. Structured API Overview 4.2 Schemas
II. Structured APIs—DataFrames, SQL, and Datasets 4. Structured API Overview 4.3 Overview of Structured Spark Types
II. Structured APIs—DataFrames, SQL, and Datasets 4. Structured API Overview 4.3.1 -- DataFrames Versus Datasets
II. Structured APIs—DataFrames, SQL, and Datasets 4. Structured API Overview 4.3.2 -- Columns
II. Structured APIs—DataFrames, SQL, and Datasets 4. Structured API Overview 4.3.3 -- Rows
II. Structured APIs—DataFrames, SQL, and Datasets 4. Structured API Overview 4.3.4 -- Spark Types
II. Structured APIs—DataFrames, SQL, and Datasets 4. Structured API Overview 4.4 Overview of Structured API Execution
II. Structured APIs—DataFrames, SQL, and Datasets 4. Structured API Overview 4.4.1 -- Logical Planning
II. Structured APIs—DataFrames, SQL, and Datasets 4. Structured API Overview 4.4.2 -- Physical Planning
II. Structured APIs—DataFrames, SQL, and Datasets 4. Structured API Overview 4.4.3 -- Execution
II. Structured APIs—DataFrames, SQL, and Datasets 4. Structured API Overview 4.5 Conclusion
II. Structured APIs—DataFrames, SQL, and Datasets 5. Basic Structured Operations 5.1 Schemas
II. Structured APIs—DataFrames, SQL, and Datasets 5. Basic Structured Operations 5.2 Columns and Expressions
II. Structured APIs—DataFrames, SQL, and Datasets 5. Basic Structured Operations 5.2.1 -- Columns
II. Structured APIs—DataFrames, SQL, and Datasets 5. Basic Structured Operations 5.2.2 -- Expressions
II. Structured APIs—DataFrames, SQL, and Datasets 5. Basic Structured Operations 5.3 Records and Rows
II. Structured APIs—DataFrames, SQL, and Datasets 5. Basic Structured Operations 5.3.1 -- Creating Rows
II. Structured APIs—DataFrames, SQL, and Datasets 5. Basic Structured Operations 5.4 DataFrame Transformations
II. Structured APIs—DataFrames, SQL, and Datasets 5. Basic Structured Operations 5.4.1 -- Creating DataFrames
II. Structured APIs—DataFrames, SQL, and Datasets 5. Basic Structured Operations 5.4.2 -- select and selectExpr
II. Structured APIs—DataFrames, SQL, and Datasets 5. Basic Structured Operations 5.4.3 -- Converting to Spark Types (Literals)
II. Structured APIs—DataFrames, SQL, and Datasets 5. Basic Structured Operations 5.4.4 -- Adding Columns
II. Structured APIs—DataFrames, SQL, and Datasets 5. Basic Structured Operations 5.4.5 -- Renaming Columns
II. Structured APIs—DataFrames, SQL, and Datasets 5. Basic Structured Operations 5.4.6 -- Reserved Characters and Keywords
II. Structured APIs—DataFrames, SQL, and Datasets 5. Basic Structured Operations 5.4.7 -- Case Sensitivity
II. Structured APIs—DataFrames, SQL, and Datasets 5. Basic Structured Operations 5.4.8 -- Removing Columns
II. Structured APIs—DataFrames, SQL, and Datasets 5. Basic Structured Operations 5.4.9 -- Changing a Column’s Type (cast)
II. Structured APIs—DataFrames, SQL, and Datasets 5. Basic Structured Operations 5.4.10 -- Filtering Rows
II. Structured APIs—DataFrames, SQL, and Datasets 5. Basic Structured Operations 5.4.11 -- Getting Unique Rows
II. Structured APIs—DataFrames, SQL, and Datasets 5. Basic Structured Operations 5.4.12 -- Random Samples
II. Structured APIs—DataFrames, SQL, and Datasets 5. Basic Structured Operations 5.4.13 -- Random Splits
II. Structured APIs—DataFrames, SQL, and Datasets 5. Basic Structured Operations 5.4.14 -- Concatenating and Appending Rows (Union)
II. Structured APIs—DataFrames, SQL, and Datasets 5. Basic Structured Operations 5.4.15 -- Sorting Rows
II. Structured APIs—DataFrames, SQL, and Datasets 5. Basic Structured Operations 5.4.16 -- Limit
II. Structured APIs—DataFrames, SQL, and Datasets 5. Basic Structured Operations 5.4.17 -- Repartition and Coalesce
II. Structured APIs—DataFrames, SQL, and Datasets 5. Basic Structured Operations 5.4.18 -- Collecting Rows to the Driver
II. Structured APIs—DataFrames, SQL, and Datasets 5. Basic Structured Operations 5.5 Conclusion
II. Structured APIs—DataFrames, SQL, and Datasets 6. Working with Different Types of Data 6.1 Where to Look for APIs
II. Structured APIs—DataFrames, SQL, and Datasets 6. Working with Different Types of Data 6.2 Converting to Spark Types
II. Structured APIs—DataFrames, SQL, and Datasets 6. Working with Different Types of Data 6.3 Working with Booleans
II. Structured APIs—DataFrames, SQL, and Datasets 6. Working with Different Types of Data 6.4 Working with Numbers
II. Structured APIs—DataFrames, SQL, and Datasets 6. Working with Different Types of Data 6.5 Working with Strings
II. Structured APIs—DataFrames, SQL, and Datasets 6. Working with Different Types of Data 6.5.1 -- Regular Expressions
II. Structured APIs—DataFrames, SQL, and Datasets 6. Working with Different Types of Data 6.6 Working with Dates and Timestamps
II. Structured APIs—DataFrames, SQL, and Datasets 6. Working with Different Types of Data 6.7 Working with Nulls in Data
II. Structured APIs—DataFrames, SQL, and Datasets 6. Working with Different Types of Data 6.7.1 -- Coalesce
II. Structured APIs—DataFrames, SQL, and Datasets 6. Working with Different Types of Data 6.7.2 -- ifnull, nullIf, nvl, and nvl2
II. Structured APIs—DataFrames, SQL, and Datasets 6. Working with Different Types of Data 6.7.3 -- drop
II. Structured APIs—DataFrames, SQL, and Datasets 6. Working with Different Types of Data 6.7.4 -- fill
II. Structured APIs—DataFrames, SQL, and Datasets 6. Working with Different Types of Data 6.7.5 -- replace
II. Structured APIs—DataFrames, SQL, and Datasets 6. Working with Different Types of Data 6.8 Ordering
II. Structured APIs—DataFrames, SQL, and Datasets 6. Working with Different Types of Data 6.9 Working with Complex Types
II. Structured APIs—DataFrames, SQL, and Datasets 6. Working with Different Types of Data 6.9.1 -- Structs
II. Structured APIs—DataFrames, SQL, and Datasets 6. Working with Different Types of Data 6.9.2 -- Arrays
II. Structured APIs—DataFrames, SQL, and Datasets 6. Working with Different Types of Data 6.9.3 -- split
II. Structured APIs—DataFrames, SQL, and Datasets 6. Working with Different Types of Data 6.9.4 -- Array Length
II. Structured APIs—DataFrames, SQL, and Datasets 6. Working with Different Types of Data 6.9.5 -- array_contains
II. Structured APIs—DataFrames, SQL, and Datasets 6. Working with Different Types of Data 6.9.6 -- explode
II. Structured APIs—DataFrames, SQL, and Datasets 6. Working with Different Types of Data 6.9.7 -- Maps
II. Structured APIs—DataFrames, SQL, and Datasets 6. Working with Different Types of Data 6.10 Working with JSON
II. Structured APIs—DataFrames, SQL, and Datasets 6. Working with Different Types of Data 6.11 User-Defined Functions
II. Structured APIs—DataFrames, SQL, and Datasets 6. Working with Different Types of Data 6.12 Conclusion
II. Structured APIs—DataFrames, SQL, and Datasets 7. Aggregations 7.1 Aggregation Functions
II. Structured APIs—DataFrames, SQL, and Datasets 7. Aggregations 7.1.1 -- count
II. Structured APIs—DataFrames, SQL, and Datasets 7. Aggregations 7.1.2 -- countDistinct
II. Structured APIs—DataFrames, SQL, and Datasets 7. Aggregations 7.1.3 -- approx_count_distinct
II. Structured APIs—DataFrames, SQL, and Datasets 7. Aggregations 7.1.4 -- first and last
II. Structured APIs—DataFrames, SQL, and Datasets 7. Aggregations 7.1.5 -- min and max
II. Structured APIs—DataFrames, SQL, and Datasets 7. Aggregations 7.1.6 -- sum
II. Structured APIs—DataFrames, SQL, and Datasets 7. Aggregations 7.1.7 -- sumDistinct
II. Structured APIs—DataFrames, SQL, and Datasets 7. Aggregations 7.1.8 -- avg
II. Structured APIs—DataFrames, SQL, and Datasets 7. Aggregations 7.1.9 -- Variance and Standard Deviation
II. Structured APIs—DataFrames, SQL, and Datasets 7. Aggregations 7.1.10 -- skewness and kurtosis
II. Structured APIs—DataFrames, SQL, and Datasets 7. Aggregations 7.1.11 -- Covariance and Correlation
II. Structured APIs—DataFrames, SQL, and Datasets 7. Aggregations 7.1.12 -- Aggregating to Complex Types
II. Structured APIs—DataFrames, SQL, and Datasets 7. Aggregations 7.2 Grouping
II. Structured APIs—DataFrames, SQL, and Datasets 7. Aggregations 7.2.1 -- Grouping with Expressions
II. Structured APIs—DataFrames, SQL, and Datasets 7. Aggregations 7.2.2 -- Grouping with Maps
II. Structured APIs—DataFrames, SQL, and Datasets 7. Aggregations 7.3 Window Functions
II. Structured APIs—DataFrames, SQL, and Datasets 7. Aggregations 7.4 Grouping Sets
II. Structured APIs—DataFrames, SQL, and Datasets 7. Aggregations 7.4.1 -- Rollups
II. Structured APIs—DataFrames, SQL, and Datasets 7. Aggregations 7.4.2 -- Cube
II. Structured APIs—DataFrames, SQL, and Datasets 7. Aggregations 7.4.3 -- Grouping Metadata
II. Structured APIs—DataFrames, SQL, and Datasets 7. Aggregations 7.4.4 -- Pivot
II. Structured APIs—DataFrames, SQL, and Datasets 7. Aggregations 7.5 User-Defined Aggregation Functions
II. Structured APIs—DataFrames, SQL, and Datasets 7. Aggregations 7.6 Conclusion
II. Structured APIs—DataFrames, SQL, and Datasets 8. Joins 8.1 Join Expressions
II. Structured APIs—DataFrames, SQL, and Datasets 8. Joins 8.2 Join Types
II. Structured APIs—DataFrames, SQL, and Datasets 8. Joins 8.3 Inner Joins
II. Structured APIs—DataFrames, SQL, and Datasets 8. Joins 8.4 Outer Joins
II. Structured APIs—DataFrames, SQL, and Datasets 8. Joins 8.5 Left Outer Joins
II. Structured APIs—DataFrames, SQL, and Datasets 8. Joins 8.6 Right Outer Joins
II. Structured APIs—DataFrames, SQL, and Datasets 8. Joins 8.7 Left Semi Joins
II. Structured APIs—DataFrames, SQL, and Datasets 8. Joins 8.8 Left Anti Joins
II. Structured APIs—DataFrames, SQL, and Datasets 8. Joins 8.9 Natural Joins
II. Structured APIs—DataFrames, SQL, and Datasets 8. Joins 8.10 Cross (Cartesian) Joins
II. Structured APIs—DataFrames, SQL, and Datasets 8. Joins 8.11 Challenges When Using Joins
II. Structured APIs—DataFrames, SQL, and Datasets 8. Joins 8.11.1 -- Joins on Complex Types
II. Structured APIs—DataFrames, SQL, and Datasets 8. Joins 8.11.2 -- Handling Duplicate Column Names
II. Structured APIs—DataFrames, SQL, and Datasets 8. Joins 8.12 How Spark Performs Joins
II. Structured APIs—DataFrames, SQL, and Datasets 8. Joins 8.12.1 -- Communication Strategies
II. Structured APIs—DataFrames, SQL, and Datasets 8. Joins 8.13 Conclusion
II. Structured APIs—DataFrames, SQL, and Datasets 9. Data Sources 9.1 The Structure of the Data Sources API
II. Structured APIs—DataFrames, SQL, and Datasets 9. Data Sources 9.1.1 -- Read API Structure
II. Structured APIs—DataFrames, SQL, and Datasets 9. Data Sources 9.1.2 -- Basics of Reading Data
II. Structured APIs—DataFrames, SQL, and Datasets 9. Data Sources 9.1.3 -- Write API Structure
II. Structured APIs—DataFrames, SQL, and Datasets 9. Data Sources 9.1.4 -- Basics of Writing Data
II. Structured APIs—DataFrames, SQL, and Datasets 9. Data Sources 9.2 CSV Files
II. Structured APIs—DataFrames, SQL, and Datasets 9. Data Sources 9.2.1 -- CSV Options
II. Structured APIs—DataFrames, SQL, and Datasets 9. Data Sources 9.2.2 -- Reading CSV Files
II. Structured APIs—DataFrames, SQL, and Datasets 9. Data Sources 9.2.3 -- Writing CSV Files
II. Structured APIs—DataFrames, SQL, and Datasets 9. Data Sources 9.3 JSON Files
II. Structured APIs—DataFrames, SQL, and Datasets 9. Data Sources 9.3.1 -- JSON Options
II. Structured APIs—DataFrames, SQL, and Datasets 9. Data Sources 9.3.2 -- Reading JSON Files
II. Structured APIs—DataFrames, SQL, and Datasets 9. Data Sources 9.3.3 -- Writing JSON Files
II. Structured APIs—DataFrames, SQL, and Datasets 9. Data Sources 9.4 Parquet Files
II. Structured APIs—DataFrames, SQL, and Datasets 9. Data Sources 9.4.1 -- Reading Parquet Files
II. Structured APIs—DataFrames, SQL, and Datasets 9. Data Sources 9.4.2 -- Writing Parquet Files
II. Structured APIs—DataFrames, SQL, and Datasets 9. Data Sources 9.5 ORC Files
II. Structured APIs—DataFrames, SQL, and Datasets 9. Data Sources 9.5.1 -- Reading Orc Files
II. Structured APIs—DataFrames, SQL, and Datasets 9. Data Sources 9.5.2 -- Writing Orc Files
II. Structured APIs—DataFrames, SQL, and Datasets 9. Data Sources 9.6 SQL Databases
II. Structured APIs—DataFrames, SQL, and Datasets 9. Data Sources 9.6.1 -- Reading from SQL Databases
II. Structured APIs—DataFrames, SQL, and Datasets 9. Data Sources 9.6.2 -- Query Pushdown
II. Structured APIs—DataFrames, SQL, and Datasets 9. Data Sources 9.6.3 -- Writing to SQL Databases
II. Structured APIs—DataFrames, SQL, and Datasets 9. Data Sources 9.7 Text Files
II. Structured APIs—DataFrames, SQL, and Datasets 9. Data Sources 9.7.1 -- Reading Text Files
II. Structured APIs—DataFrames, SQL, and Datasets 9. Data Sources 9.7.2 -- Writing Text Files
II. Structured APIs—DataFrames, SQL, and Datasets 9. Data Sources 9.8 Advanced I/O Concepts
II. Structured APIs—DataFrames, SQL, and Datasets 9. Data Sources 9.8.1 -- Splittable File Types and Compression
II. Structured APIs—DataFrames, SQL, and Datasets 9. Data Sources 9.8.2 -- Reading Data in Parallel
II. Structured APIs—DataFrames, SQL, and Datasets 9. Data Sources 9.8.3 -- Writing Data in Parallel
II. Structured APIs—DataFrames, SQL, and Datasets 9. Data Sources 9.8.4 -- Writing Complex Types
II. Structured APIs—DataFrames, SQL, and Datasets 9. Data Sources 9.8.5 -- Managing File Size
II. Structured APIs—DataFrames, SQL, and Datasets 9. Data Sources 9.9 Conclusion
II. Structured APIs—DataFrames, SQL, and Datasets 10. Spark SQL 10.1 What Is SQL?
II. Structured APIs—DataFrames, SQL, and Datasets 10. Spark SQL 10.2 Big Data and SQL: Apache Hive
II. Structured APIs—DataFrames, SQL, and Datasets 10. Spark SQL 10.3 Big Data and SQL: Spark SQL
II. Structured APIs—DataFrames, SQL, and Datasets 10. Spark SQL 10.4 Spark’s Relationship to Hive
II. Structured APIs—DataFrames, SQL, and Datasets 10. Spark SQL 10.5 How to Run Spark SQL Queries
II. Structured APIs—DataFrames, SQL, and Datasets 10. Spark SQL 10.5.1 -- Spark SQL CLI
II. Structured APIs—DataFrames, SQL, and Datasets 10. Spark SQL 10.5.2 -- Spark’s Programmatic SQL Interface
II. Structured APIs—DataFrames, SQL, and Datasets 10. Spark SQL 10.5.3 -- SparkSQL Thrift JDBC/ODBC Server
II. Structured APIs—DataFrames, SQL, and Datasets 10. Spark SQL 10.6 Catalog
II. Structured APIs—DataFrames, SQL, and Datasets 10. Spark SQL 10.7 Tables
II. Structured APIs—DataFrames, SQL, and Datasets 10. Spark SQL 10.7.1 -- Spark-Managed Tables
II. Structured APIs—DataFrames, SQL, and Datasets 10. Spark SQL 10.7.2 -- Creating Tables
II. Structured APIs—DataFrames, SQL, and Datasets 10. Spark SQL 10.7.3 -- Creating External Tables
II. Structured APIs—DataFrames, SQL, and Datasets 10. Spark SQL 10.7.4 -- Inserting into Tables
II. Structured APIs—DataFrames, SQL, and Datasets 10. Spark SQL 10.7.5 -- Describing Table Metadata
II. Structured APIs—DataFrames, SQL, and Datasets 10. Spark SQL 10.7.6 -- Refreshing Table Metadata
II. Structured APIs—DataFrames, SQL, and Datasets 10. Spark SQL 10.7.7 -- Dropping Tables
II. Structured APIs—DataFrames, SQL, and Datasets 10. Spark SQL 10.7.8 -- Caching Tables
II. Structured APIs—DataFrames, SQL, and Datasets 10. Spark SQL 10.8 Views
II. Structured APIs—DataFrames, SQL, and Datasets 10. Spark SQL 10.8.1 -- Creating Views
II. Structured APIs—DataFrames, SQL, and Datasets 10. Spark SQL 10.8.2 -- Dropping Views
II. Structured APIs—DataFrames, SQL, and Datasets 10. Spark SQL 10.9 Databases
II. Structured APIs—DataFrames, SQL, and Datasets 10. Spark SQL 10.9.1 -- Creating Databases
II. Structured APIs—DataFrames, SQL, and Datasets 10. Spark SQL 10.9.2 -- Setting the Database
II. Structured APIs—DataFrames, SQL, and Datasets 10. Spark SQL 10.9.3 -- Dropping Databases
II. Structured APIs—DataFrames, SQL, and Datasets 10. Spark SQL 10.10 Select Statements
II. Structured APIs—DataFrames, SQL, and Datasets 10. Spark SQL 10.10.1 -- case…when…then Statements
II. Structured APIs—DataFrames, SQL, and Datasets 10. Spark SQL 10.11 Advanced Topics
II. Structured APIs—DataFrames, SQL, and Datasets 10. Spark SQL 10.11.1 -- Complex Types
II. Structured APIs—DataFrames, SQL, and Datasets 10. Spark SQL 10.11.2 -- Functions
II. Structured APIs—DataFrames, SQL, and Datasets 10. Spark SQL 10.11.3 -- Subqueries
II. Structured APIs—DataFrames, SQL, and Datasets 10. Spark SQL 10.12 Miscellaneous Features
II. Structured APIs—DataFrames, SQL, and Datasets 10. Spark SQL 10.12.1 -- Configurations
II. Structured APIs—DataFrames, SQL, and Datasets 10. Spark SQL 10.12.2 -- Setting Configuration Values in SQL
II. Structured APIs—DataFrames, SQL, and Datasets 10. Spark SQL 10.13 Conclusion
II. Structured APIs—DataFrames, SQL, and Datasets 11. Datasets 11.1 When to Use Datasets
II. Structured APIs—DataFrames, SQL, and Datasets 11. Datasets 11.2 Creating Datasets
II. Structured APIs—DataFrames, SQL, and Datasets 11. Datasets 11.2.1 -- In Java: Encoders
II. Structured APIs—DataFrames, SQL, and Datasets 11. Datasets 11.2.2 -- In Scala: Case Classes
II. Structured APIs—DataFrames, SQL, and Datasets 11. Datasets 11.3 Actions
II. Structured APIs—DataFrames, SQL, and Datasets 11. Datasets 11.4 Transformations
II. Structured APIs—DataFrames, SQL, and Datasets 11. Datasets 11.4.1 -- Filtering
II. Structured APIs—DataFrames, SQL, and Datasets 11. Datasets 11.4.2 -- Mapping
II. Structured APIs—DataFrames, SQL, and Datasets 11. Datasets 11.5 Joins
II. Structured APIs—DataFrames, SQL, and Datasets 11. Datasets 11.6 Grouping and Aggregations
II. Structured APIs—DataFrames, SQL, and Datasets 11. Datasets 11.7 Conclusion
IV. Production Applications 15. How Spark Runs on a Cluster 15.1 The Architecture of a Spark Application
IV. Production Applications 15. How Spark Runs on a Cluster 15.1.1 -- Execution Modes
IV. Production Applications 15. How Spark Runs on a Cluster 15.2 The Life Cycle of a Spark Application (Outside Spark)
IV. Production Applications 15. How Spark Runs on a Cluster 15.2.1 -- Client Request
IV. Production Applications 15. How Spark Runs on a Cluster 15.2.2 -- Launch
IV. Production Applications 15. How Spark Runs on a Cluster 15.2.3 -- Execution
IV. Production Applications 15. How Spark Runs on a Cluster 15.2.4 -- Completion
IV. Production Applications 15. How Spark Runs on a Cluster 15.3 The Life Cycle of a Spark Application (Inside Spark)
IV. Production Applications 15. How Spark Runs on a Cluster 15.3.1 -- The SparkSession
IV. Production Applications 15. How Spark Runs on a Cluster 15.3.2 -- Logical Instructions
IV. Production Applications 15. How Spark Runs on a Cluster 15.3.3 -- A Spark Job
IV. Production Applications 15. How Spark Runs on a Cluster 15.3.4 -- Stages
IV. Production Applications 15. How Spark Runs on a Cluster 15.3.5 -- Tasks
IV. Production Applications 15. How Spark Runs on a Cluster 15.4 Execution Details
IV. Production Applications 15. How Spark Runs on a Cluster 15.4.1 -- Pipelining
IV. Production Applications 15. How Spark Runs on a Cluster 15.4.2 -- Shuffle Persistence
IV. Production Applications 15. How Spark Runs on a Cluster 15.5 Conclusion
IV. Production Applications 16. Developing Spark Applications 16.1 Writing Spark Applications
IV. Production Applications 16. Developing Spark Applications 16.1.1 -- A Simple Scala-Based App
IV. Production Applications 16. Developing Spark Applications 16.1.2 -- Writing Python Applications
IV. Production Applications 16. Developing Spark Applications 16.1.3 -- Writing Java Applications
IV. Production Applications 16. Developing Spark Applications 16.2 Testing Spark Applications
IV. Production Applications 16. Developing Spark Applications 16.2.1 -- Strategic Principles
IV. Production Applications 16. Developing Spark Applications 16.2.2 -- Tactical Takeaways
IV. Production Applications 16. Developing Spark Applications 16.2.3 -- Connecting to Unit Testing Frameworks
IV. Production Applications 16. Developing Spark Applications 16.2.4 -- Connecting to Data Sources
IV. Production Applications 16. Developing Spark Applications 16.3 The Development Process
IV. Production Applications 16. Developing Spark Applications 16.4 Launching Applications
IV. Production Applications 16. Developing Spark Applications 16.4.1 -- Application Launch Examples
IV. Production Applications 16. Developing Spark Applications 16.5 Configuring Applications
IV. Production Applications 16. Developing Spark Applications 16.5.1 -- The SparkConf
IV. Production Applications 16. Developing Spark Applications 16.5.2 -- Application Properties
IV. Production Applications 16. Developing Spark Applications 16.5.3 -- Runtime Properties
IV. Production Applications 16. Developing Spark Applications 16.5.4 -- Execution Properties
IV. Production Applications 16. Developing Spark Applications 16.5.5 -- Configuring Memory Management
IV. Production Applications 16. Developing Spark Applications 16.5.6 -- Configuring Shuffle Behavior
IV. Production Applications 16. Developing Spark Applications 16.5.7 -- Environmental Variables
IV. Production Applications 16. Developing Spark Applications 16.5.8 -- Job Scheduling Within an Application
IV. Production Applications 16. Developing Spark Applications 16.6 Conclusion
IV. Production Applications 17. Deploying Spark 17.1 Where to Deploy Your Cluster to Run Spark Applications
IV. Production Applications 17. Deploying Spark 17.1.1 -- On-Premises Cluster Deployments
IV. Production Applications 17. Deploying Spark 17.1.2 -- Spark in the Cloud
IV. Production Applications 17. Deploying Spark 17.2 Cluster Managers
IV. Production Applications 17. Deploying Spark 17.2.1 -- Standalone Mode
IV. Production Applications 17. Deploying Spark 17.2.2 -- Spark on YARN
IV. Production Applications 17. Deploying Spark 17.2.3 -- Configuring Spark on YARN Applications
IV. Production Applications 17. Deploying Spark 17.2.4 -- Spark on Mesos
IV. Production Applications 17. Deploying Spark 17.2.5 -- Secure Deployment Configurations
IV. Production Applications 17. Deploying Spark 17.2.6 -- Cluster Networking Configurations
IV. Production Applications 17. Deploying Spark 17.2.7 -- Application Scheduling
IV. Production Applications 17. Deploying Spark 17.3 Miscellaneous Considerations
IV. Production Applications 17. Deploying Spark 17.4 Conclusion
IV. Production Applications 18. Monitoring and Debugging 18.1 The Monitoring Landscape
IV. Production Applications 18. Monitoring and Debugging 18.2 What to Monitor
IV. Production Applications 18. Monitoring and Debugging 18.2.1 -- Driver and Executor Processes
IV. Production Applications 18. Monitoring and Debugging 18.2.2 -- Queries, Jobs, Stages, and Tasks
IV. Production Applications 18. Monitoring and Debugging 18.3 Spark Logs
IV. Production Applications 18. Monitoring and Debugging 18.4 The Spark UI
IV. Production Applications 18. Monitoring and Debugging 18.4.1 -- Spark REST API
IV. Production Applications 18. Monitoring and Debugging 18.4.2 -- Spark UI History Server
IV. Production Applications 18. Monitoring and Debugging 18.5 Debugging and Spark First Aid
IV. Production Applications 18. Monitoring and Debugging 18.5.1 -- Spark Jobs Not Starting
IV. Production Applications 18. Monitoring and Debugging 18.5.2 -- Errors Before Execution
IV. Production Applications 18. Monitoring and Debugging 18.5.3 -- Errors During Execution
IV. Production Applications 18. Monitoring and Debugging 18.5.4 -- Slow Tasks or Stragglers
IV. Production Applications 18. Monitoring and Debugging 18.5.5 -- Slow Aggregations
IV. Production Applications 18. Monitoring and Debugging 18.5.6 -- Slow Joins
IV. Production Applications 18. Monitoring and Debugging 18.5.7 -- Slow Reads and Writes
IV. Production Applications 18. Monitoring and Debugging 18.5.8 -- Driver OutOfMemoryError or Driver Unresponsive
IV. Production Applications 18. Monitoring and Debugging 18.5.9 -- Executor OutOfMemoryError or Executor Unresponsive
IV. Production Applications 18. Monitoring and Debugging 18.5.10 -- Unexpected Nulls in Results
IV. Production Applications 18. Monitoring and Debugging 18.5.11 -- No Space Left on Disk Errors
IV. Production Applications 18. Monitoring and Debugging 18.5.12 -- Serialization Errors
IV. Production Applications 18. Monitoring and Debugging 18.6 Conclusion
IV. Production Applications 19. Performance Tuning 19.1 Indirect Performance Enhancements
IV. Production Applications 19. Performance Tuning 19.1.1 -- Design Choices
IV. Production Applications 19. Performance Tuning 19.1.2 -- Object Serialization in RDDs
IV. Production Applications 19. Performance Tuning 19.1.3 -- Cluster Configurations
IV. Production Applications 19. Performance Tuning 19.1.4 -- Scheduling
IV. Production Applications 19. Performance Tuning 19.1.5 -- Data at Rest
IV. Production Applications 19. Performance Tuning 19.1.6 -- Shuffle Configurations
IV. Production Applications 19. Performance Tuning 19.1.7 -- Memory Pressure and Garbage Collection
IV. Production Applications 19. Performance Tuning 19.2 Direct Performance Enhancements
IV. Production Applications 19. Performance Tuning 19.2.1 -- Parallelism
IV. Production Applications 19. Performance Tuning 19.2.2 -- Improved Filtering
IV. Production Applications 19. Performance Tuning 19.2.3 -- Repartitioning and Coalescing
IV. Production Applications 19. Performance Tuning 19.2.4 -- User-Defined Functions (UDFs)
IV. Production Applications 19. Performance Tuning 19.2.5 -- Temporary Data Storage (Caching)
IV. Production Applications 19. Performance Tuning 19.2.6 -- Joins
IV. Production Applications 19. Performance Tuning 19.2.7 -- Aggregations
IV. Production Applications 19. Performance Tuning 19.2.8 -- Broadcast Variables
IV. Production Applications 19. Performance Tuning 19.3 Conclusion

Basic Databricks Commands

Contributors

mathmachado
