Data Engineering - Big Data Trunk

Quality Corporate and Classroom Training in Bay Area CA (https://project.bigdatatrunk.com)

Byte Sized Series – AI First Mindset
https://project.bigdatatrunk.com/courses/byte-sized-series-ai-first-mindset/
Published: 7 April 2025

The post Byte Sized Series – AI First Mindset first appeared on Big Data Trunk.

Description:

This introductory session offers professionals a clear, engaging entry point into the world of Generative AI: what it is, how it has evolved, and why it is transforming today’s workplaces. Through real-world use cases and guided discussions, participants will explore GenAI’s strengths and challenges and see how an ‘AI-First Mindset’ fosters creativity, accelerates tasks, and improves collaborative decision-making across functions. No technical experience is needed, just curiosity and a willingness to rethink how we approach work.

Duration: 90 min

Course Code: BDT473

Learning Objectives:

After completing this course, participants will be able to:

Understand the core concepts of Generative AI and its evolution from traditional AI.
Recognize key strengths, risks, and current limitations of GenAI tools.
Identify practical opportunities to adopt GenAI tools in their professional roles.

Prerequisites:

No technical background required
A general curiosity about AI and openness to new ways of working

Audience:

Business professionals looking to understand how AI can enhance productivity
Team leads and managers exploring AI integration in workflows
Non-technical professionals interested in AI’s impact on the workplace
Tech-savvy individuals seeking a strategic perspective before diving into tools

Course Outline:

1. Introduction to Generative AI and Its Evolution

Key milestones in AI development
Real-world examples of GenAI tools (text, image, audio, code generation)

2. Understanding the AI-First Mindset

What it means to think “AI-First” (mindset vs. toolset)
How companies are reimagining workflows using GenAI
Shifting perspectives: AI as a co-pilot, not a replacement

3. GenAI Strengths and Challenges

4. Human + AI: Practical Use Cases in the Workplace

Cross-functional examples: marketing, HR, legal, sales, data analysis
Live demo and Q&A

Generative AI for UX Designers
https://project.bigdatatrunk.com/courses/generative-ai-for-ux-designers/
Published: 3 April 2025

Description:

This 1-day course teaches UX designers how to use Generative AI (Gen AI) to enhance their workflows, speed up design tasks, and improve user experiences. Participants will learn to integrate AI into various stages of the UX design process—from ideation and prototyping to user research and usability testing. The course will focus on practical AI tools that directly impact design quality, user insights, and creativity.

Duration: 1 Day

Course Code: BDT424

Learning Objectives:

By the end of the training, participants will be able to:

Understand how Generative AI can improve UX design:
Speed up the ideation phase with AI
Create efficient prototypes with AI assistance
Enhance user research with AI tools
Improve usability testing with AI
Personalize user experiences with AI

Prerequisites:

Basic understanding of UX design principles.
Familiarity with tools like Figma, Sketch, or Adobe XD.
Interest in incorporating AI tools into the design process.

Audience:

The target audience for this course includes UX designers, UI designers, product designers, design researchers, and UX researchers.

Course Outline:

1. Introduction to Generative AI in UX Design

What is Generative AI?
- Brief overview of AI, machine learning, and generative models, with an emphasis on their relevance for UX design.
Why Should UX Designers Embrace AI?
- How AI enhances creativity, workflow, and decision-making within the UX design process.
- Key areas where AI can be most effective: ideation, prototyping, research, testing, and user experience personalization.

2. AI for Ideation and Concept Design

Leveraging AI for Design Inspiration
- Using tools like DALL·E, Midjourney, and Runway to generate design ideas, user interface concepts, and creative assets.
AI-Driven Design Recommendations
- How AI can assist in generating color schemes, layouts, typography, and other visual components based on best practices.

3. AI-Powered Prototyping and Wireframing

AI-Assisted Prototype Generation
- Tools like Uizard and Figma AI for auto-generating wireframes and responsive designs.
- How AI can speed up the prototyping process by suggesting UI elements and layouts based on user needs.
AI for Layouts and Component Variations
- Using AI to quickly iterate on design elements like buttons, navigation menus, and page layouts.

4. AI in User Research and Data Analysis

AI-Driven User Research Insights
- How AI tools (e.g., Lookback, Hotjar) help collect, analyze, and interpret user behavior data.
- Using AI to identify trends, patterns, and common pain points in user feedback and analytics.
AI for Sentiment Analysis
- Leveraging AI to analyze open-ended responses from user interviews, surveys, and feedback sessions.

5. AI for Usability Testing and Feedback Analysis

AI-Enhanced Usability Testing
- How AI tools can simulate user interactions and provide predictive insights about potential usability issues.
- Tools like UsabilityHub and PlaybookUX for AI-assisted usability testing and heatmap generation.
Using AI to Optimize User Interfaces
- Analyzing session recordings, heatmaps, and interaction data to identify areas for design improvement.

6. AI for Personalizing User Experiences

Creating Dynamic, Personalized User Interfaces
- How AI can help deliver personalized experiences based on user behavior, preferences, and historical data.
- Implementing personalized content, adaptive layouts, and dynamic UI elements with AI tools.

Training material provided: Yes (Digital format)

Data Science for Finance
https://project.bigdatatrunk.com/courses/data-science-for-finance/
Published: 1 April 2025

Description:

This 3-day hands-on training bridges the gap between financial domain expertise and data science techniques. Participants will explore the foundational concepts of data science and their application in finance, including risk modeling, credit scoring, fraud detection, and algorithmic trading. Through a mix of real-world datasets and use cases, learners will use Python and free libraries such as pandas, scikit-learn, and TensorFlow to implement machine learning and deep learning models. Each module includes both conceptual understanding and practical implementation, empowering participants to apply data-driven insights to real-world financial challenges.

Duration: 3 Days

Course Code: BDT48

Learning Objectives:

After this training, participants will be able to:

Describe the role of data science in solving financial problems.
Apply data preprocessing and feature engineering techniques on financial datasets.
Implement machine learning models for classification, regression, and anomaly detection.
Analyze model performance using appropriate metrics and improve predictive accuracy.
Develop and evaluate deep learning models for financial forecasting and risk modeling.

Prerequisites:

Basic knowledge of finance and statistics
Familiarity with Python programming
No prior machine learning experience required

Audience:

Finance professionals and analysts exploring data science
Aspiring data scientists seeking finance-specific applications
Business analysts looking to apply machine learning in financial decision-making

Course Outline:

Module 1: Introduction to Data Science in Finance

Overview of the data science lifecycle
Key challenges in financial data analysis
Exploratory data analysis using pandas and matplotlib
Financial datasets: market data, transactions, credit data
Hands-on: Cleaning and visualizing financial time-series and tabular data
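To give a flavor of the Module 1 hands-on work, here is a minimal standard-library sketch of two common time-series steps: turning closing prices into simple returns and measuring trailing volatility. The course itself uses pandas, where `Series.pct_change()` and `Series.rolling(window).std()` cover the same ground; the function names and prices below are hypothetical.

```python
from statistics import stdev


def clean_prices(prices):
    """Drop missing observations (None) from a list of closing prices."""
    return [p for p in prices if p is not None]


def simple_returns(prices):
    """Simple period-over-period returns: r_t = p_t / p_(t-1) - 1."""
    return [curr / prev - 1 for prev, curr in zip(prices, prices[1:])]


def rolling_volatility(returns, window):
    """Sample standard deviation of returns over each trailing window."""
    return [
        stdev(returns[i - window + 1 : i + 1])
        for i in range(window - 1, len(returns))
    ]


closes = [100.0, 102.0, None, 99.96, 101.0]  # hypothetical daily closes
prices = clean_prices(closes)
returns = simple_returns(prices)
volatility = rolling_volatility(returns, window=2)
```

Dropping the missing close means one return spans two trading days; forward-filling is a common alternative, and the right choice depends on the dataset.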

Module 2: Machine Learning Applications in Finance

Supervised vs unsupervised learning in finance
Classification use cases: credit scoring, fraud detection
Regression use cases: stock price prediction, financial forecasting
Clustering use cases: customer segmentation and profiling
Hands-on: Building and evaluating models using scikit-learn
Hands-on: ROC curves, confusion matrix, R-squared, and error metrics
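The evaluation metrics named in the Module 2 lab can be computed by hand, which is a useful check on what scikit-learn reports. The standard-library sketch below derives a binary confusion matrix, precision and recall, ROC points from a threshold sweep, and R-squared; the labels and scores are made up for illustration. In scikit-learn the corresponding helpers live in `sklearn.metrics` (`confusion_matrix`, `precision_score`, `recall_score`, `roc_curve`, `r2_score`), though `confusion_matrix` lays its output out as `[[tn, fp], [fn, tp]]` rather than the tuple used here.

```python
def confusion_counts(y_true, y_pred):
    """Return (tp, fp, fn, tn) for binary labels where 1 is the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    return tp, fp, fn, tn


def precision_recall(tp, fp, fn):
    """Precision = tp / (tp + fp); recall = tp / (tp + fn); 0.0 when undefined."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


def roc_points(y_true, scores):
    """(FPR, TPR) pairs from sweeping the threshold down through each distinct score."""
    positives = sum(y_true)
    negatives = len(y_true) - positives
    points = [(0.0, 0.0)]
    for threshold in sorted(set(scores), reverse=True):
        preds = [1 if s >= threshold else 0 for s in scores]
        tp, fp, _, _ = confusion_counts(y_true, preds)
        points.append((fp / negatives, tp / positives))
    return points


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot


# Hypothetical fraud labels (1 = fraud) and model predictions
tp, fp, fn, tn = confusion_counts([1, 0, 1, 1, 0, 0], [1, 0, 0, 1, 1, 0])
precision, recall = precision_recall(tp, fp, fn)
```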

Module 3: Deep Learning and Advanced Use Cases

Introduction to neural networks for financial data
Time-series modeling with LSTM for forecasting
Anomaly detection using autoencoders
Use case: Loan default prediction using TensorFlow/Keras
Hands-on: Deep learning model implementation and evaluation
Best practices for model deployment and explainability (LIME, SHAP)

Training material provided: Yes (Digital format)

Data Engineering and Analytics on GCP (Google Cloud Platform)
https://project.bigdatatrunk.com/courses/data-engineering-and-analytics-on-gcp-google-cloud-platform/
Published: 26 November 2024

Description:

This training provides an in-depth introduction to data engineering and analytics on Google Cloud Platform (GCP). Participants will explore key GCP services such as BigQuery, Dataflow, and Cloud Storage while learning to build scalable data pipelines and analyze datasets effectively. The training focuses on hands-on application of GCP tools to address real-world data challenges. By the end of the day, attendees will be equipped to design and implement efficient data workflows and analytics solutions on GCP.

Duration: 1 Day

Course Code: BDT34

Learning Objectives:

By the end of this training, participants will be able to:

Identify the core data engineering and analytics tools on GCP.
Build data pipelines using Cloud Storage, Dataflow, and Pub/Sub.
Analyze large datasets with BigQuery.
Design workflows to integrate real-time and batch processing.
Optimize data solutions for cost and performance on GCP.

Prerequisites:

Basic familiarity with data concepts and cloud computing is recommended. Knowledge of SQL is helpful but not required.

Audience:

Data engineers and analysts exploring GCP for data solutions.
IT professionals interested in building scalable data workflows on GCP.
Business leaders seeking to understand GCP analytics capabilities.

Course Outline:

Module 1: Introduction to GCP for Data Engineering and Analytics

Overview of GCP’s Data Ecosystem
Key Services: BigQuery, Dataflow, Cloud Storage, and Pub/Sub

Module 2: Data Storage and ETL Pipelines on GCP

Storing and Managing Data with Cloud Storage
Creating ETL Pipelines with Dataflow
Hands-On: Building a Data Pipeline

Module 3: Analytics with BigQuery

Introduction to BigQuery: Architecture and Features
Querying and Analyzing Datasets
Hands-On: Writing and Executing BigQuery SQL Queries

Module 4: Real-Time Data Processing with Pub/Sub

Introduction to Pub/Sub for Streaming Data
Designing Real-Time Data Workflows
Hands-On: Processing Streaming Data

Module 5: Use Cases, Best Practices, and Wrap-Up

Real-World Applications of GCP in Data Engineering
Best Practices for Performance and Cost Optimization
Q&A and Additional Resources

Training material provided: Yes (Digital format)

Data Engineering and Analytics on AWS (Amazon Web Services)
https://project.bigdatatrunk.com/courses/data-engineering-and-analytics-on-aws-amazon-web-services/
Published: 26 November 2024

Description:

This training provides a hands-on introduction to data engineering and analytics capabilities on AWS. Participants will learn how to build scalable data pipelines, process and analyze data, and use key AWS services such as AWS Glue, Redshift, and Athena. The training emphasizes practical applications of AWS tools to manage and analyze large datasets efficiently. By the end of the session, attendees will have the foundational skills to design and implement data workflows and analytics solutions on AWS.

Duration: 1 Day

Course Code: BDT33

Learning Objectives:

By the end of this training, participants will be able to:

Identify the key data engineering and analytics services on AWS.
Build data pipelines using AWS Glue and S3.
Analyze large datasets using Redshift and Athena.
Integrate real-time and batch processing workflows.
Evaluate AWS-based solutions for analytics in business scenarios.

Prerequisites:

Basic knowledge of cloud computing and data concepts is recommended. Familiarity with SQL is beneficial but not mandatory.

Audience:

Data engineers and analysts exploring AWS for data solutions.
IT professionals seeking to implement data pipelines and analytics workflows.
Business managers interested in AWS-based analytics solutions.

Course Outline:

Module 1: Introduction to AWS Data Engineering and Analytics

Overview of Data Engineering and Analytics Concepts
AWS Data Ecosystem: S3, Glue, Redshift, Athena, Kinesis

Module 2: Data Storage and ETL Pipelines with AWS Glue

Introduction to AWS Glue for Data Integration
Building ETL Pipelines and Cataloging Data
Hands-On: Creating an ETL Workflow

Module 3: Analytics with Redshift and Athena

Overview of Amazon Redshift for Data Warehousing
Serverless Analytics with Amazon Athena
Hands-On: Querying and Analyzing Data

Module 4: Real-Time Data Processing with Amazon Kinesis

Introduction to Streaming Data Processing
Designing Real-Time Workflows with Kinesis Data Streams
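Designing a Kinesis workflow starts with choosing partition keys, because the key decides which shard receives a record. Kinesis Data Streams hashes each partition key with MD5 into a 128-bit integer and routes the record to the shard whose hash key range contains that value. The sketch below illustrates this mapping, assuming a stream whose shards split the hash key space evenly (as a freshly created stream typically does) and reading the digest as a big-endian unsigned integer; `shard_index` is a hypothetical helper, not part of the AWS SDK.

```python
import hashlib

HASH_KEY_SPACE = 2 ** 128  # partition keys map to integers in [0, 2**128 - 1]


def hash_key(partition_key: str) -> int:
    """MD5 digest of the partition key, read as a big-endian unsigned integer."""
    digest = hashlib.md5(partition_key.encode("utf-8")).digest()
    return int.from_bytes(digest, "big")


def shard_index(partition_key: str, shard_count: int) -> int:
    """Index of the shard that owns this key, assuming evenly split hash ranges.

    Real streams expose each shard's actual HashKeyRange (e.g. via ListShards),
    which can differ from an even split after resharding.
    """
    range_width = HASH_KEY_SPACE // shard_count
    # The last shard absorbs any remainder left by the integer division.
    return min(hash_key(partition_key) // range_width, shard_count - 1)


# Records that share a partition key always land on the same shard,
# which is what preserves per-key ordering.
for key in ["customer-17", "customer-42", "customer-17"]:
    print(key, "->", shard_index(key, shard_count=4))
```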

Module 5: Real-World Use Cases and Best Practices

Applications of Data Engineering on AWS
Best Practices for Scalability and Cost Optimization
Q&A and Additional Resources

Training material provided: Yes (Digital format)

Data Engineering and Analytics on Microsoft Cloud – Azure
https://project.bigdatatrunk.com/courses/data-engineering-and-analytics-on-microsoft-cloud-azure/
Published: 26 November 2024

Description:

This training focuses on the powerful data engineering and analytics capabilities provided by Microsoft Azure. Participants will learn how to build robust data pipelines, process large datasets, and perform analytics using Azure services. The training includes an introduction to key Azure tools like Data Factory, Synapse Analytics, and Databricks, complemented by hands-on exercises to apply concepts in real-world scenarios. By the end of the day, participants will have the confidence to implement scalable data engineering and analytics workflows on Azure.

Duration: 1 Day

Course Code: BDT32

Learning Objectives:

By the end of this training, participants will be able to:

Describe the data engineering and analytics services available on Azure.
Build data pipelines using Azure Data Factory.
Process and analyze data with Azure Synapse Analytics and Azure Databricks.
Design scalable workflows for ETL and data integration.
Evaluate use cases for applying Azure solutions in analytics.

Prerequisites:

Basic understanding of data concepts, including ETL and analytics, is recommended. Familiarity with cloud platforms is helpful but not required.

Audience:

Data engineers and analysts exploring Azure solutions.
IT professionals seeking to integrate data workflows on Azure.
Business professionals interested in leveraging data analytics on the cloud.

Course Outline:

Module 1: Introduction to Data Engineering and Analytics on Azure

Overview of Data Engineering and Analytics Concepts
Introduction to Azure’s Data Ecosystem
Key Services: Azure Data Factory, Synapse Analytics, Databricks

Module 2: Building Data Pipelines with Azure Data Factory

Introduction to Azure Data Factory (ADF)
Data Integration and ETL Workflow Design
Hands-On: Creating and Managing Pipelines

Module 3: Processing and Analyzing Data with Azure Synapse Analytics

Overview of Azure Synapse: Features and Architecture
Performing Analytics with SQL and Serverless Pools
Hands-On: Analyzing Data in Synapse

Module 4: Advanced Data Processing with Azure Databricks

Introduction to Azure Databricks and Apache Spark Integration
Processing Large Datasets in Real-Time
Hands-On: Implementing Analytics with Databricks

Module 5: Real-World Use Cases and Wrap-Up

Real-World Applications of Data Engineering on Azure
Best Practices for Performance and Cost Optimization
Q&A and Additional Resources

Training material provided: Yes (Digital format)

Kickstart DBT for Snowflake in a Day
https://project.bigdatatrunk.com/courses/kickstart-dbt-for-snowflake-in-a-day/
Published: 7 September 2023

Description:

In this one-day course, students will dive into the core concepts of the Data Build Tool (DBT) and learn how to streamline data engineering pipelines for Snowflake. From models and materializations to source freshness and advanced techniques such as macros and hooks, students will gain a solid foundation for using DBT effectively, with practical hands-on experience running DBT against Snowflake.

Duration: 1 Day

Course Code: BDT302

Learning Objectives:

After this course, you will be able to:

Get started with the Data Build Tool (DBT)
Understand DBT models
Use DBT tests to ensure the quality of DBT models
Explore DBT materializations to optimize the performance and scalability of DBT models
Integrate seeds and sources into a DBT project during data ingestion
Dive into DBT hooks to integrate external scripts and actions into your DBT workflows

Prerequisites:

Basic understanding of Snowflake and SQL

Audience:

This course is designed for Analytics Engineers, Data Analysts, BI Professionals, Data Scientists, Data Engineers, DevOps Engineers, and Architects.

Course Outline:

1. Introduction to Data Build Tool (DBT)

Introduction to Data Warehouse (Snowflake)
ETL vs. ELT
DBT introduction
DBT installation
DBT Cloud introduction
Lab: Getting started with DBT

2. Understanding DBT Models

What are DBT models?
Creating a DBT table
Using DBT schema
DBT project organization
Lab: Project organization

3. Using DBT Tests

What is a DBT schema?
What is a DBT Macro?
Understanding DBT test types
Lab: Generic and Singular Tests

4. Exploring DBT Materialization

What are materializations in DBT?
Default materialization in DBT
Using Config Block for materialization
Lab: Setting materialization

5. Integrating DBT Seeds and Sources

Seeds and Sources overview
Adding sources in DBT
What is source freshness?
Lab: Adding a source freshness check in DBT
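In DBT, freshness rules are declared on a source in YAML (`loaded_at_field` plus `warn_after` and `error_after` thresholds) and checked with the `dbt source freshness` command. The Python sketch below is not DBT code; it only mirrors the comparison such a check makes, so the lab's pass/warn/error results are easier to reason about. The 12- and 24-hour thresholds and the timestamps are illustrative, and treating the boundary as strictly greater-than is an assumption.

```python
from datetime import datetime, timedelta, timezone


def freshness_status(loaded_at_values, now, warn_after, error_after):
    """Classify a source as 'pass', 'warn', or 'error' from its newest load time.

    loaded_at_values: timestamps from the source's loaded_at column.
    warn_after / error_after: timedelta thresholds (None disables a level).
    """
    age = now - max(loaded_at_values)  # age of the most recently loaded row
    if error_after is not None and age > error_after:
        return "error"
    if warn_after is not None and age > warn_after:
        return "warn"
    return "pass"


now = datetime(2023, 9, 7, 18, 0, tzinfo=timezone.utc)
loads = [
    datetime(2023, 9, 6, 23, 0, tzinfo=timezone.utc),
    datetime(2023, 9, 7, 0, 0, tzinfo=timezone.utc),  # newest load: 18 hours old
]
status = freshness_status(loads, now, timedelta(hours=12), timedelta(hours=24))
```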

6. DBT Hooks

What are DBT Hooks?
Understanding pre-hook, post-hook, on-run-start, on-run-end hooks
Implementing DBT hooks

Training material provided: Yes (Digital format)

Hands-on Lab: Instructions will be provided for setting up a free-tier Snowflake account, and students will be shown how to install DBT on Windows or Mac.

Mastering Data Build Tool (DBT) for Snowflake
https://project.bigdatatrunk.com/courses/mastering-data-build-tool-dbt-for-snowflake/
Published: 7 September 2023

Description:

In the data-driven landscape, the ability to efficiently manage, transform, and materialize data is crucial. In this comprehensive hands-on course, students will dive deep into DBT and learn how to use it to build robust data transformation pipelines. Moving from foundational concepts to advanced techniques, students will gain hands-on experience with key components: models, materializations, seeds, snapshots, source freshness, macros, and hooks. By the end of this course, students will be equipped to build efficient data pipelines with DBT.

Duration: 2 days

Course Code: BDT301

Learning Objectives:

After this course, you will be able to:

Get started with the Data Build Tool (DBT)
Understand DBT models
Use DBT tests to ensure the quality of DBT models
Explore DBT materializations to optimize the performance and scalability of DBT models
Integrate seeds and sources into a DBT project during data ingestion
Enhance data loading capabilities and create custom macros
Use DBT snapshots to capture historical versions of data for auditing and analysis
Dive into DBT hooks to integrate external scripts and actions into your DBT workflows

Prerequisites:

Basic understanding of Snowflake and SQL

Audience:

This course is designed for Analytics Engineers, Data Analysts, BI Professionals, Data Scientists, Data Engineers, DevOps Engineers, and Architects.

Course Outline:

1. Introduction to Data Build Tool (DBT)

Introduction to Data Warehouse (Snowflake)
ETL vs. ELT
DBT introduction
DBT installation
DBT Cloud introduction
Lab: Getting started with DBT

2. Understanding DBT Models

What are DBT models?
Creating a DBT table
Using DBT schema
DBT project organization
Lab: Project organization

3. Using DBT Tests

What is a DBT schema?
What is a DBT Macro?
Understanding DBT test types
Lab: Generic and Singular Tests

4. Exploring DBT Materialization

What are materializations in DBT?
Default materialization in DBT
Using Config Block for materialization
Lab: Setting materialization

5. Integrating DBT Seeds and Sources

Seeds and Sources overview
Adding sources in DBT
What is source freshness?
Lab: Adding a source freshness check in DBT

6. DBT Custom Macros

Implementing table, view, and ephemeral models
Create custom macro
Understanding DBT package
Lab: Building an incremental load
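An incremental model processes only rows that arrived since the last run and merges them into the existing table. In DBT this is typically configured with `materialized='incremental'` and a `unique_key`, with the new-rows filter wrapped in an `{% if is_incremental() %}` block. The Python sketch below is illustrative and not how DBT executes the model; it shows the underlying high-water-mark and upsert logic, and the column names `id` and `updated_at` are hypothetical.

```python
def incremental_merge(target, source_rows, unique_key, updated_at):
    """Upsert source rows newer than the target's high-water mark.

    target: dict mapping unique_key value -> row (the existing table).
    source_rows: list of row dicts from the upstream source.
    """
    # High-water mark: the newest updated_at already in the target table.
    high_water = max((row[updated_at] for row in target.values()), default=None)
    new_rows = [
        row for row in source_rows
        if high_water is None or row[updated_at] > high_water
    ]
    merged = dict(target)
    for row in new_rows:
        merged[row[unique_key]] = row  # insert new keys, overwrite changed ones
    return merged


existing = {
    1: {"id": 1, "status": "new", "updated_at": 1},
    2: {"id": 2, "status": "new", "updated_at": 2},
}
incoming = [
    {"id": 1, "status": "new", "updated_at": 1},      # already loaded, skipped
    {"id": 2, "status": "shipped", "updated_at": 3},  # changed, overwritten
    {"id": 3, "status": "new", "updated_at": 4},      # new key, inserted
]
table = incremental_merge(existing, incoming, "id", "updated_at")
```

On a first run the target is empty, so every source row is loaded, much like DBT building the full table before switching to incremental runs.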

7. Working with DBT Snapshots

Snapshots overview
Creating a snapshot
Lab: Creating a snapshot
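DBT snapshots implement what is often called a Type 2 slowly changing dimension: when a tracked row changes, the current version is closed and a new version is added, with validity recorded in columns such as `dbt_valid_from` and `dbt_valid_to`. The Python sketch below approximates the timestamp strategy (changes detected via an `updated_at` column) on in-memory rows. It is an illustration only: it omits columns such as `dbt_scd_id` and does not handle rows deleted from the source.

```python
def apply_snapshot(history, current_rows, key, updated_at):
    """Timestamp-strategy SCD Type 2 update over lists of row dicts.

    history: previous snapshot rows, each with dbt_valid_from / dbt_valid_to.
    current_rows: the latest state of the source table.
    """
    result = [dict(row) for row in history]
    # Only the open version of each key (dbt_valid_to is None) can change.
    open_versions = {r[key]: r for r in result if r["dbt_valid_to"] is None}
    for row in current_rows:
        current = open_versions.get(row[key])
        if current is not None and row[updated_at] <= current[updated_at]:
            continue  # unchanged since the last snapshot
        if current is not None:
            current["dbt_valid_to"] = row[updated_at]  # close the old version
        result.append({**row, "dbt_valid_from": row[updated_at], "dbt_valid_to": None})
    return result


snap = apply_snapshot([], [{"id": 1, "status": "pending", "updated_at": 1}], "id", "updated_at")
snap = apply_snapshot(snap, [{"id": 1, "status": "shipped", "updated_at": 5}], "id", "updated_at")
```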

8. DBT Hooks

What are DBT Hooks?
Understanding pre-hook, post-hook, on-run-start, on-run-end hooks
Implementing DBT hooks
Lab: Implementing a DBT hook

Training material provided: Yes (Digital format)

Hands-on Lab: Instructions will be provided for setting up a free-tier Snowflake account, and students will be shown how to install DBT on Windows or Mac.

Kickstart Snowpark with Python
https://project.bigdatatrunk.com/courses/kickstart-snowpark-with-python/
Published: 7 September 2023

Description:

Unlock the full potential of Snowpark, the developer experience for Snowflake that lets you write code in your preferred language (Scala, Java, or Python) alongside the SQL interface. This course shows you how to use the Snowpark API to build a customized development environment in Python. Instead of exporting data to external environments, you process it inside Snowflake using Snowflake's compute. Topics include reading and writing data, transformations, queries, and creating Python UDFs (user-defined functions) with Snowpark.

Duration: 1 day

Course Code: BDT300

Learning Objectives:

By the end of this course, you will:

Get Started with Snowpark and Python Integration in Snowflake
Leverage Snowpark for Efficient Structured Data Reading and Writing in Snowflake
Master the Art of Handling Semi-Structured Data Using Snowpark
Perform Real-Time Data Transformations While Loading with Snowpark
Seamlessly Integrate Third-Party Python Libraries to Create User-Defined Functions (UDFs) in Snowpark

Prerequisites:

Basic knowledge of Snowflake and Python.

Audience:

This course is designed for anyone interested in using the Snowpark API with Python. It is geared toward data engineers, architects, QA engineers, BI professionals, and data analysts who want to use Python to handle data processing in Snowflake.

Course Outline:

1. Introduction to Snowpark and Python Integration

Overview of Snowpark and its importance in Snowflake data processing
Brief introduction to Python’s role in Snowpark development
Setting up Snowpark development environment with Python
Hands-on: Execute a basic Python script to use Snowpark API

2. Use Snowpark to read and write structured data in Snowflake

Create a Snowpark DataFrame
Apply a schema to a DataFrame
Read CSV and JSON files from S3
Write data from S3 to a Snowflake table, CSV, or JSON
Hands-on lab with these topics

3. Handling semi-structured data with Snowpark

Create a DataFrame from JSON files in S3
Copy data into a Snowflake DataFrame
Create a DataFrame from Parquet files
Copy data from S3 Parquet files into a Snowflake table
Handle error records
Hands-on labs with these topics

4. Perform transformations while loading

Using Snowpark’s aggregation framework
Perform grouping of data
Using window functions
Using joins and the “using” clause
Hands-on labs with these topics
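Grouping and window functions are easy to confuse: a group-by collapses rows into one per group, while a window function keeps every row and adds a value computed over its partition. Snowpark expresses both through its DataFrame API (for example `group_by`, and window specifications built with `Window.partition_by(...).order_by(...)`). The plain-Python sketch below is independent of Snowpark and only illustrates the difference, using a running total per region that corresponds to `SUM(amount) OVER (PARTITION BY region ORDER BY day ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW)`; the column names are hypothetical.

```python
from collections import defaultdict


def group_totals(rows, key, value):
    """GROUP BY key: one output value per group."""
    totals = defaultdict(float)
    for row in rows:
        totals[row[key]] += row[value]
    return dict(totals)


def running_totals(rows, partition, order, value):
    """Window function: every input row kept, plus a running sum within its partition."""
    sums = defaultdict(float)
    output = []
    for row in sorted(rows, key=lambda r: (r[partition], r[order])):
        sums[row[partition]] += row[value]
        output.append({**row, "running_total": sums[row[partition]]})
    return output


sales = [
    {"region": "east", "day": 1, "amount": 10.0},
    {"region": "west", "day": 1, "amount": 5.0},
    {"region": "east", "day": 2, "amount": 7.0},
]
print(group_totals(sales, "region", "amount"))  # {'east': 17.0, 'west': 5.0}
for row in running_totals(sales, "region", "day", "amount"):
    print(row["region"], row["day"], row["running_total"])
```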

5. Integrating Third-Party Python Libraries to Create UDFs

Build a library of generic, reusable components in Python
Create a Snowpark UDF
Using vectorized UDFs
Integrating external packages
Hands-on lab with these topics

Training material provided: Yes (Digital format)

Hands-on Lab: Students will receive instructions for creating a Snowflake trial account. Instructions for installing and using Snowpark will be provided in class.

Getting Started with Apache Spark using Databricks
https://project.bigdatatrunk.com/courses/getting-started-with-apache-spark-using-databricks/
Published: 7 September 2023

Description:

Jumpstart your data journey with our 'Getting Started With Apache Spark Using Databricks' training. This course empowers participants to tackle complex data challenges, harnessing the potential of Apache Hadoop and Apache Spark to uncover valuable insights across various domains.

In today's data-driven world, Big Data has become the driving force behind intelligent enterprise software. Companies worldwide are adopting Big Data solutions to manage vast, high-velocity data streams efficiently.

For software architects and engineers, this course offers a practical, hands-on experience with a blend of lectures, demonstrations, and interactive labs, ensuring a comprehensive understanding of Big Data and Apache Spark's advanced applications. Start your data transformation journey today.

Duration: 4 Days

Course Code: BDT97

Learning Objectives:

After this course, you will be able to:

Have a broad understanding of the Big Data ecosystem.
Understand the industry's Big Data offerings, both in the cloud and on premises, such as Cloudera, Hortonworks, MapR, Amazon EMR, and Microsoft Azure HDInsight.
Understand the impact and value of Apache Spark in the Big Data ecosystem.
Understand the Apache Spark architecture and its libraries for use cases such as SQL, Streaming, Machine Learning, and GraphX/GraphFrames.
Set up an account on Apache Spark Databricks Cloud.
Perform hands-on activities in the Big Data ecosystem.

Prerequisites:

Experience with a programming language such as Python is required.
SQL and data knowledge
Familiarity with Big Data is a plus

Audience:

This course is designed for Data Analysts, Software Engineers, Data Engineers, Data Professionals, Business Intelligence Developers, Data Architects, and DevOps Engineers.

Course Outline:

Day 1:

Big Data overview

A brief history of Big Data
History and background of Big Data and Hadoop
5 V’s of Big Data
Secret Sauce of Big Data Hadoop
Big Data Distributions in Industry
End-to-end Big Data lifecycle overview
Industry Use cases

Big Data Ecosystem before Spark

Big Data Ecosystem before Apache Spark
Storage options – HDFS and NoSQL
Processing options – MapReduce, Hive, etc.
Administrative tools – ZooKeeper, Oozie, etc.
Ingestion tools – Sqoop, Flume

Big Data Ecosystem after Spark

Big Data Ecosystem after Apache Spark
Compare MapReduce vs. Apache Spark
Apache Spark Architecture
Understand the Apache Spark architecture and libraries such as Streaming, Machine Learning with Spark ML, and GraphX/GraphFrames
Understanding Spark RDD
Set up an account on Apache Spark Databricks Cloud
Introduction to Notebooks concept on Databricks
Demos and Labs

Day 2:

Getting Started with Apache Spark

Introduction to Spark RDD
Spark RDD Transformation and Actions
Spark Lifecycle
Spark Caching
Lab - Spark RDD Transformation & Actions
Lab - Spark RDD Advanced Transformation & Actions
Demos and Labs
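A central idea in the RDD labs is that transformations such as `map` and `filter` are lazy: they return a new RDD describing the computation, and nothing runs until an action such as `collect` or `count` is called. The toy class below is not Spark, just plain Python modeling that split; the class and helper names are hypothetical. Like an uncached RDD, it recomputes its whole chain on every action, which is the cost Spark caching exists to avoid.

```python
class LazyDataset:
    """Toy model of lazy transformations and eager actions (not a Spark API)."""

    def __init__(self, source, steps=()):
        self._source = source       # any re-iterable input, e.g. a list or range
        self._steps = tuple(steps)  # recorded transformations, not yet executed

    # Transformations: record the step and return a new dataset.
    def map(self, fn):
        return LazyDataset(self._source, self._steps + (("map", fn),))

    def filter(self, predicate):
        return LazyDataset(self._source, self._steps + (("filter", predicate),))

    # Actions: actually run the recorded steps.
    def collect(self):
        items = iter(self._source)
        for kind, fn in self._steps:
            items = map(fn, items) if kind == "map" else filter(fn, items)
        return list(items)

    def count(self):
        return len(self.collect())


calls = []


def double(x):
    calls.append(x)  # side effect shows when work really happens
    return x * 2


doubled_over_four = LazyDataset(range(5)).map(double).filter(lambda x: x > 4)
print(calls)  # [] : only transformations have been defined so far
result = doubled_over_four.collect()
```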

Apache Spark SQL, DataFrames, Datasets

Introduction to Spark SQL
SQL, DataFrames and Datasets Spark Library
Compare the various APIs - RDD, DataFrames and Datasets
Lab - Spark DataFrames Transformation & Actions
Lab - Spark DataFrames Advanced Transformation & Actions
Demos and Labs

Day 3:

Data Science Overview

Data Science Process Overview
Structured and Unstructured Data
Data Acquisition and Transformation
Data Analysis and Machine Learning
Machine Learning Concepts

Machine Learning Overview using Apache Spark

Introduction to Machine Learning and Data Science
Machine Learning Spark Library
Spark Machine Learning – Classification, Regression
Machine Learning Model building with Spark ML Library
Demos and Labs

Day 4:

Structured Streaming Overview using Apache Spark

The need for real-time processing
Streaming Spark Library
Streaming Query
Processing and Aggregating Streams
Data Lake concept
Spark Streaming examples
Demos and Labs
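Much of stream aggregation comes down to bucketing events by time. In Structured Streaming this is usually written as a `groupBy` over `window(timestampColumn, "1 minute")` from `pyspark.sql.functions`. The plain-Python sketch below shows what a tumbling (fixed, non-overlapping) window computes on a finite batch of events. It ignores watermarks and late data, which Spark handles for unbounded streams, and the event timestamps (seconds since the epoch) and keys are hypothetical.

```python
from collections import Counter


def tumbling_window_counts(events, window_seconds):
    """Count events per (window_start, key) using fixed, non-overlapping windows.

    events: iterable of (timestamp_seconds, key) pairs.
    Windows are aligned to multiples of window_seconds, e.g. [0, 60), [60, 120).
    """
    counts = Counter()
    for timestamp, key in events:
        window_start = timestamp - timestamp % window_seconds
        counts[(window_start, key)] += 1
    return dict(counts)


clicks = [(0, "home"), (30, "home"), (65, "cart"), (70, "home"), (119, "home")]
print(tumbling_window_counts(clicks, window_seconds=60))
```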

GraphX/GraphFrames Overview using Apache Spark

The need for GraphX/GraphFrames
Spark GraphX & GraphFrames Library
Spark GraphX & GraphFrames examples
Demos and Labs
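PageRank is the usual first algorithm in GraphX and GraphFrames examples (GraphFrames, for instance, exposes a `pageRank` method). The plain-Python power-iteration sketch below shows what such a call computes on a tiny directed graph. The damping factor of 0.85 (a reset probability of 0.15), the iteration count, and the even redistribution of rank from nodes with no outgoing edges are conventional choices assumed here; the Spark libraries may scale or normalize scores differently.

```python
def pagerank(edges, damping=0.85, iterations=50):
    """Power-iteration PageRank over a list of (src, dst) edges; ranks sum to 1."""
    nodes = sorted({node for edge in edges for node in edge})
    out_links = {node: [] for node in nodes}
    for src, dst in edges:
        out_links[src].append(dst)

    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        # Every node starts with the "random jump" share of rank.
        new_rank = {node: (1 - damping) / n for node in nodes}
        for src in nodes:
            targets = out_links[src]
            if targets:
                share = damping * rank[src] / len(targets)
                for dst in targets:
                    new_rank[dst] += share
            else:
                # Dangling node: spread its rank evenly across all nodes.
                for dst in nodes:
                    new_rank[dst] += damping * rank[src] / n
        rank = new_rank
    return rank


ranks = pagerank([("a", "b"), ("a", "c"), ("b", "c"), ("c", "a")])
for node, score in sorted(ranks.items(), key=lambda item: -item[1]):
    print(node, round(score, 3))
```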

Training material provided: Yes (Digital format)
