Data Engineering with AWS

Author: Gareth Eagar
Publisher: Packt Publishing Ltd
ISBN: 1800569041
Category : Computers
Languages : en
Pages : 482

Book Description
The missing expert-led manual for the AWS ecosystem: go from foundations to building data engineering pipelines effortlessly. Purchase of the print or Kindle book includes a free eBook in PDF format.

Key Features:
- Learn about common data architectures and modern approaches to generating value from big data
- Explore AWS tools for ingesting, transforming, and consuming data, and for orchestrating pipelines
- Learn how to architect and implement data lakes and data lakehouses for big data analytics from a data lakes expert

Book Description:
Written by a Senior Data Architect with over twenty-five years of experience in the business, Data Engineering with AWS aims to make you proficient in using the AWS ecosystem. Using a thorough and hands-on approach to data, this book will give aspiring and new data engineers a solid theoretical and practical foundation to succeed with AWS. As you progress, you'll be taken through the services and the skills you need to architect and implement data pipelines on AWS. You'll begin by reviewing important data engineering concepts and some of the core AWS services that form a part of the data engineer's toolkit. You'll then architect a data pipeline, review raw data sources, transform the data, and learn how the transformed data is used by various data consumers. You'll also learn about populating data marts and data warehouses, along with how a data lakehouse fits into the picture. Later, you'll be introduced to AWS tools for analyzing data, including those for ad-hoc SQL queries and creating visualizations. In the final chapters, you'll understand how the power of machine learning and artificial intelligence can be used to draw new insights from data.
By the end of this AWS book, you'll be able to carry out data engineering tasks and implement a data pipeline on AWS independently.

What you will learn:
- Understand data engineering concepts and emerging technologies
- Ingest streaming data with Amazon Kinesis Data Firehose
- Optimize, denormalize, and join datasets with AWS Glue Studio
- Use Amazon S3 events to trigger a Lambda process to transform a file
- Run complex SQL queries on data lake data using Amazon Athena
- Load data into a Redshift data warehouse and run queries
- Create a visualization of your data using Amazon QuickSight
- Extract sentiment data from a dataset using Amazon Comprehend

Who this book is for:
This book is for data engineers, data analysts, and data architects who are new to AWS and looking to extend their skills to the AWS cloud. Anyone new to data engineering who wants to learn about the foundational concepts while gaining practical experience with common data engineering services on AWS will also find this book useful. A basic understanding of big data-related topics and Python coding will help you get the most out of this book, but it's not a prerequisite. Familiarity with the AWS console and core services will also help you follow along.

Data Engineering with AWS

Author: Gareth Eagar
Publisher: Packt Publishing Ltd
ISBN: 1804613134
Category : Computers
Languages : en
Pages : 637

Book Description
Looking to revolutionize your data transformation game with AWS? Look no further! From strong foundations to hands-on building of data engineering pipelines, our expert-led manual has got you covered.

Key Features:
- Delve into robust AWS tools for ingesting, transforming, and consuming data, and for orchestrating pipelines
- Stay up to date with a comprehensive revised chapter on Data Governance
- Build modern data platforms with a new section covering transactional data lakes and data mesh

Book Description:
This book, authored by a seasoned Senior Data Architect with 25 years of experience, aims to help you achieve proficiency in using the AWS ecosystem for data engineering. This revised edition provides updates in every chapter to cover the latest AWS services and features, takes a refreshed look at data governance, and includes a brand-new section on building modern data platforms, which covers implementing a data mesh approach, open-table formats (such as Apache Iceberg), and using DataOps for automation and observability. You'll begin by reviewing the key concepts and essential AWS tools in a data engineer's toolkit and getting acquainted with modern data management approaches. You'll then architect a data pipeline, review raw data sources, transform the data, and learn how that transformed data is used by various data consumers. You'll learn how to ensure strong data governance, and about populating data marts and data warehouses, along with how a data lakehouse fits into the picture. After that, you'll be introduced to AWS tools for analyzing data, including those for ad-hoc SQL queries and creating visualizations. Then, you'll explore how the power of machine learning and artificial intelligence can be used to draw new insights from data. In the final chapters, you'll discover transactional data lakes, data meshes, and how to build a cutting-edge data platform on AWS.
By the end of this AWS book, you'll be able to execute data engineering tasks and implement a data pipeline on AWS like a pro!

What you will learn:
- Seamlessly ingest streaming data with Amazon Kinesis Data Firehose
- Optimize, denormalize, and join datasets with AWS Glue Studio
- Use Amazon S3 events to trigger a Lambda process to transform a file
- Load data into a Redshift data warehouse and run queries with ease
- Visualize and explore data using Amazon QuickSight
- Extract sentiment data from a dataset using Amazon Comprehend
- Build transactional data lakes using Apache Iceberg with Amazon Athena
- Learn how a data mesh approach can be implemented on AWS

Who this book is for:
This book is for data engineers, data analysts, and data architects who are new to AWS and looking to extend their skills to the AWS cloud. Anyone new to data engineering who wants to learn about the foundational concepts, while gaining practical experience with common data engineering services on AWS, will also find this book useful. A basic understanding of big data-related topics and Python coding will help you get the most out of this book, but it's not a prerequisite. Familiarity with the AWS console and core services will also help you follow along.

Data Engineering with AWS

Author: Gareth Eagar
Publisher: Packt Publishing
ISBN: 9781800560413
Category :
Languages : en
Pages : 426

Book Description
Start your AWS data engineering journey with this easy-to-follow, hands-on guide and get to grips with everything from foundational concepts through to building data engineering pipelines using AWS.

Key Features:
- Learn about common data architectures and modern approaches to generating value from big data
- Explore AWS tools for ingesting, transforming, and consuming data, and for orchestrating pipelines
- Learn how to architect and implement data lakes and data lakehouses for big data analytics

Book Description:
Knowing how to architect and implement complex data pipelines is a highly sought-after skill. Data engineers are responsible for building these pipelines that ingest, transform, and join raw datasets, creating new value from the data in the process. Amazon Web Services (AWS) offers a range of tools to simplify a data engineer's job, making it the preferred platform for performing data engineering tasks. This book will take you through the services and the skills you need to architect and implement data pipelines on AWS. You'll begin by reviewing important data engineering concepts and some of the core AWS services that form a part of the data engineer's toolkit. You'll then architect a data pipeline, review raw data sources, transform the data, and learn how the transformed data is used by various data consumers. The book also teaches you about populating data marts and data warehouses, along with how a data lakehouse fits into the picture. Later, you'll be introduced to AWS tools for analyzing data, including those for ad-hoc SQL queries and creating visualizations. In the final chapters, you'll understand how the power of machine learning and artificial intelligence can be used to draw new insights from data. By the end of this AWS book, you'll be able to carry out data engineering tasks and implement a data pipeline on AWS independently.

What You Will Learn:
- Understand data engineering concepts and emerging technologies
- Ingest streaming data with Amazon Kinesis Data Firehose
- Optimize, denormalize, and join datasets with AWS Glue Studio
- Use Amazon S3 events to trigger a Lambda process to transform a file
- Run complex SQL queries on data lake data using Amazon Athena
- Load data into a Redshift data warehouse and run queries
- Create a visualization of your data using Amazon QuickSight
- Extract sentiment data from a dataset using Amazon Comprehend

Who this book is for:
This book is for data engineers, data analysts, and data architects who are new to AWS and looking to extend their skills to the AWS cloud. Anyone who is new to data engineering and wants to learn about the foundational concepts while gaining practical experience with common data engineering services on AWS will also find this book useful. A basic understanding of big data-related topics and Python coding will help you get the most out of this book but is not required. Familiarity with the AWS console and core services is also useful but not necessary.

Data Science on AWS

Author: Chris Fregly
Publisher: "O'Reilly Media, Inc."
ISBN: 1492079367
Category : Computers
Languages : en
Pages : 524

Book Description
With this practical book, AI and machine learning practitioners will learn how to successfully build and deploy data science projects on Amazon Web Services. The Amazon AI and machine learning stack unifies data science, data engineering, and application development to help level up your skills. This guide shows you how to build and run pipelines in the cloud, then integrate the results into applications in minutes instead of days. Throughout the book, authors Chris Fregly and Antje Barth demonstrate how to reduce cost and improve performance.
- Apply the Amazon AI and ML stack to real-world use cases for natural language processing, computer vision, fraud detection, conversational devices, and more
- Use automated machine learning to implement a specific subset of use cases with SageMaker Autopilot
- Dive deep into the complete model development lifecycle for a BERT-based NLP use case, including data ingestion, analysis, model training, and deployment
- Tie everything together into a repeatable machine learning operations pipeline
- Explore real-time ML, anomaly detection, and streaming analytics on data streams with Amazon Kinesis and Managed Streaming for Apache Kafka
- Learn security best practices for data science projects and workflows, including identity and access management, authentication, authorization, and more

Data Analytics in the AWS Cloud

Author: Joe Minichino
Publisher: John Wiley & Sons
ISBN: 1119909252
Category : Computers
Languages : en
Pages : 426

Book Description
A comprehensive and accessible roadmap to performing data analytics in the AWS cloud.
In Data Analytics in the AWS Cloud: Building a Data Platform for BI and Predictive Analytics on AWS, accomplished software engineer and data architect Joe Minichino delivers an expert blueprint for storing, processing, and analyzing data on the Amazon Web Services cloud platform. In the book, you'll explore every relevant aspect of data analytics (from data engineering to analysis, business intelligence, DevOps, and MLOps) as you discover how to integrate machine learning predictions with analytics engines and visualization tools. You'll also find:
- Real-world use cases of AWS architectures that demystify the applications of data analytics
- Accessible introductions to data acquisition, importation, storage, visualization, and reporting
- Expert insights into serverless data engineering and how to use it to reduce overhead and costs, improve stability, and simplify maintenance
A can't-miss for data architects, analysts, engineers, and technical professionals, Data Analytics in the AWS Cloud will also earn a place on the bookshelves of business leaders seeking a better understanding of data analytics on the AWS cloud platform.

Data Engineering with AWS - Second Edition

Author: Gareth Eagar
Publisher:
ISBN: 9781804614426
Category :
Languages : en
Pages : 0

Fundamentals of Data Engineering

Author: Joe Reis
Publisher: "O'Reilly Media, Inc."
ISBN: 1098108256
Category : Computers
Languages : en
Pages : 454

Book Description
Data engineering has grown rapidly in the past decade, leaving many software engineers, data scientists, and analysts looking for a comprehensive view of this practice. With this practical book, you'll learn how to plan and build systems to serve the needs of your organization and customers by evaluating the best technologies available through the framework of the data engineering lifecycle. Authors Joe Reis and Matt Housley walk you through the data engineering lifecycle and show you how to stitch together a variety of cloud technologies to serve the needs of downstream data consumers. You'll understand how to apply the concepts of data generation, ingestion, orchestration, transformation, storage, and governance that are critical in any data environment regardless of the underlying technology. This book will help you:
- Get a concise overview of the entire data engineering landscape
- Assess data engineering problems using an end-to-end framework of best practices
- Cut through marketing hype when choosing data technologies, architecture, and processes
- Use the data engineering lifecycle to design and build a robust architecture
- Incorporate data governance and security across the data engineering lifecycle

Data Wrangling on AWS

Author: Navnit Shukla
Publisher: Packt Publishing Ltd
ISBN: 1801817669
Category : Computers
Languages : en
Pages : 420

Book Description
Revamp your data landscape and implement highly effective data pipelines in AWS with this hands-on guide. Purchase of the print or Kindle book includes a free PDF eBook.

Key Features:
- Execute extract, transform, and load (ETL) tasks on data lakes, data warehouses, and databases
- Implement effective Pandas data operations with AWS Data Wrangler
- Integrate pipelines with AWS data services

Book Description:
Data wrangling is the process of cleaning, transforming, and organizing raw, messy, or unstructured data into a structured format. It involves processes such as data cleaning, data integration, data transformation, and data enrichment to ensure that the data is accurate, consistent, and suitable for analysis. Data Wrangling on AWS equips you with the knowledge to realize the full potential of AWS data wrangling tools. First, you'll be introduced to data wrangling on AWS and familiarized with the data wrangling services available in AWS. You'll understand how to work with AWS Glue DataBrew, AWS Data Wrangler, and Amazon SageMaker. Next, you'll discover other AWS services such as Amazon S3, Redshift, Athena, and QuickSight. Additionally, you'll explore advanced topics such as performing Pandas data operations with AWS Data Wrangler, optimizing ML data with Amazon SageMaker, and building a data warehouse with Glue DataBrew, along with security and monitoring aspects.
By the end of this book, you'll be well-equipped to perform data wrangling using AWS services.

What you will learn:
- Explore how to write simple to complex transformations using AWS Data Wrangler
- Use abstracted functions to extract and load data from and into AWS datastores
- Configure AWS Glue DataBrew for data wrangling
- Develop data pipelines using AWS Data Wrangler
- Integrate AWS security features into Data Wrangler using identity and access management (IAM)
- Optimize your data with Amazon SageMaker

Who this book is for:
This book is for data engineers, data scientists, and business data analysts looking to explore the capabilities, tools, and services of data wrangling on AWS for their ETL tasks. Basic knowledge of Python and Pandas, and familiarity with AWS tools such as AWS Glue and Amazon Athena, are required to get the most out of this book.

Data Engineering with Google Cloud Platform

Author: Adi Wijaya
Publisher: Packt Publishing Ltd
ISBN: 1800565062
Category : Computers
Languages : en
Pages : 440

Book Description
Build and deploy your own data pipelines on GCP, make key architectural decisions, and gain the confidence to boost your career as a data engineer.

Key Features:
- Understand data engineering concepts, the role of a data engineer, and the benefits of using GCP for building your solution
- Learn how to use the various GCP products to ingest, consume, and transform data and orchestrate pipelines
- Discover tips to prepare for and pass the Professional Data Engineer exam

Book Description:
With this book, you'll understand how the highly scalable Google Cloud Platform (GCP) enables data engineers to create end-to-end data pipelines, from storing and processing data and workflow orchestration to presenting data through visualization dashboards. Starting with a quick overview of the fundamental concepts of data engineering, you'll learn the various responsibilities of a data engineer and how GCP plays a vital role in fulfilling those responsibilities. As you progress through the chapters, you'll be able to leverage GCP products to build a sample data warehouse using Cloud Storage and BigQuery and a data lake using Dataproc. The book gradually takes you through operations such as data ingestion, data cleansing, transformation, and integrating data with other sources. You'll learn how to design IAM for data governance, deploy ML pipelines with Vertex AI, leverage pre-built GCP models as a service, and visualize data with Google Data Studio to build compelling reports. Finally, you'll find tips on how to boost your career as a data engineer, take the Professional Data Engineer certification exam, and get ready to become an expert in data engineering with GCP.
By the end of this data engineering book, you'll have developed the skills to perform core data engineering tasks and build efficient ETL data pipelines with GCP.

What you will learn:
- Load data into BigQuery and materialize its output for downstream consumption
- Build data pipeline orchestration using Cloud Composer
- Develop Airflow jobs to orchestrate and automate a data warehouse
- Build a Hadoop data lake, create ephemeral clusters, and run jobs on the Dataproc cluster
- Leverage Pub/Sub for messaging and ingestion for event-driven systems
- Use Dataflow to perform ETL on streaming data
- Unlock the power of your data with Data Studio
- Estimate the GCP cost of your end-to-end data solutions

Who this book is for:
This book is for data engineers, data analysts, and anyone looking to design and manage data processing pipelines using GCP. You'll find this book useful if you are preparing to take Google's Professional Data Engineer exam. Beginner-level understanding of data science, the Python programming language, and Linux commands is necessary. A basic understanding of data processing and cloud computing in general will help you make the most of this book.