@@ -14,14 +14,15 @@ Harnessing the complexity of large amounts of data is a challenge in itself.
 
 But Big Data processing is more than that: originally characterized by the 3 Vs of Volume, Velocity and Variety,
 the concepts popularized by Hadoop and Google require dedicated computing solutions (both software and infrastructure),
-which will be explored in this module. We'll also take a dive in new programming and infrastructure technologies
-that emerged from these concepts.
+which will be explored in this module.
+
+We'll also take a dive into the new programming and infrastructure technologies that emerged from these concepts.
 
 ## Objectives
 
 By the end of this module, participants will be able to:
 
-- Understand the differences and usage between main distributed computing architectures (HPC, Big Data, Cloud, CPU vs GPGPU)
+- Understand the differences between, and uses of, the main distributed computing architectures (HPC, Big Data, Cloud, CPU vs GPGPU)
 - Implement the distribution of simple operations via the Map/Reduce principle in PySpark and Dask
 - Understand the principle of Kubernetes
 - Deploy a Big Data Processing Platform on the Cloud
@@ -77,6 +78,7 @@ What is this course module main subject?
 
 ## Big Data & Distributed Computing (3h)
 
+- [Course introduction (30min)](00_SDD_DE_Course_Introduction.html)
 - [Introduction to Big Data and its ecosystem (1h)](01_Introduction_Big_Data.html)
   - What is Big Data?
   - Legacy “Big Data” ecosystem
@@ -92,14 +94,15 @@ What is this course module main subject?
 
 ## Deployment & Intro to Kubernetes (3h)
 
-- MLOps: deploying your model as a Web App
+MLOps: deploying your model as a Web App
+
 - [Introduction to Orchestration](https://supaerodatascience.github.io/DE/slides/2_2b_orchestration.html)
 - [Introduction to Kubernetes](12_OrchestrationKubernetes.html)
 
 ## Kubernetes hands on (3h)
 
 - Zero to Jupyterhub: deploy a Jupyterhub on Kubernetes
-- Deploy a Daskhub: a Dask enables Jupyterhub (for later use)
+- Deploy a Daskhub: a Dask-enabled Jupyterhub (for later use)
 
 [Slides](13_Dask_On_Cloud.html)
 
@@ -113,11 +116,11 @@ What is this course module main subject?
   - Machine and Deep Learning (Scikit-Learn, TensorFlow, PyTorch)
   - Jupyter notebooks, Binder, Google Colab
 - [Spark Introduction (30m)](03_Spark_Introduction.html)
-- Play with MapReduce through Spark (Notebook on small datasets) (1.5h)
+- Play with MapReduce using Spark (Notebook on small datasets) (1.5h)
 
 ## Distributed Processing and Dask hands on (3h)
 
-- [Manage large datasets(30m)](24_Large_Datasets.html)
+- [Manage large datasets (30m)](24_Large_Datasets.html)
 - [Dask Introduction (30m)](22_Dask_Pangeo.html)
 - Includes [Dask tutorial (2h)](https://github.com/dask/dask-tutorial).
 
@@ -127,7 +130,7 @@ What is this course module main subject?
   - Subject presentation
   - Everyone should have a Daskhub cloud platform set up, or Dask on a local computer
   - Get the data
-  - Notebook with cell codes to fill or answers to give
+  - Notebook with code cells to fill in and questions to answer
   - Clean large amounts of data using Dask in the cloud or on a big computer
   - Train machine learning models in parallel (hyperparameter search)
   - Complete with your own efforts!
@@ -145,4 +148,3 @@ What will we do today?
 ![Answer](https://cdn.strawpoll.com/images/polls/qr/xVg71DedQyr.png)
 
 [Answer link](https://strawpoll.com/xVg71DedQyr)
-