Commit cd80249

Correct pictures and content
1 parent e716266 commit cd80249

8 files changed

Lines changed: 29 additions & 28 deletions

images/CERNVolumes.png

13.3 KB

images/google-dc-map.png

411 KB

images/jean-zay-hpc.png

529 KB

src/00_SDD_DE_Course_Introduction.md

Lines changed: 11 additions & 9 deletions
Original file line number | Diff line number | Diff line change
@@ -14,14 +14,15 @@ Harnessing the complexity of large amounts of data is a challenge in itself.
1414

1515
But Big Data processing is more than that: originally characterized by the 3 Vs of Volume, Velocity and Variety,
1616
the concepts popularized by Hadoop and Google require dedicated computing solutions (both software and infrastructure),
17-
which will be explored in this module. We'll also take a dive in new programming and infrastructure technologies
18-
that emerged from these concepts.
17+
which will be explored in this module.
18+
19+
We'll also dive into new programming and infrastructure technologies that emerged from these concepts.
1920

2021
## Objectives
2122

2223
By the end of this module, participants will be able to:
2324

24-
- Understand the differences and usage between main distributed computing architectures (HPC, Big Data, Cloud, CPU vs GPGPU)
25+
- Understand the differences between, and the uses of, the main distributed computing architectures (HPC, Big Data, Cloud, CPU vs GPGPU)
2526
- Implement the distribution of simple operations via the Map/Reduce principle in PySpark and Dask
2627
- Understand the principle of Kubernetes
2728
- Deploy a Big Data Processing Platform on the Cloud
@@ -77,6 +78,7 @@ What is this course module main subject?
7778

7879
## Big Data & Distributed Computing (3h)
7980

81+
- [Current introduction (30min)](00_SDD_DE_Course_Introduction.html)
8082
- [Introduction to Big Data and its ecosystem (1h)](01_Introduction_Big_Data.html)
8183
- What is Big Data?
8284
- Legacy “Big Data” ecosystem
@@ -92,14 +94,15 @@ What is this course module main subject?
9294

9395
## Deployment & Intro to Kubernetes (3h)
9496

95-
- MLOps: deploying your model as a Web App
97+
MLOps: deploying your model as a Web App
98+
9699
- [Introduction to Orchestration](https://supaerodatascience.github.io/DE/slides/2_2b_orchestration.html)
97100
- [Introduction to Kubernetes](12_OrchestrationKubernetes.html)
98101

99102
## Kubernetes hands on (3h)
100103

101104
- Zero to Jupyterhub: deploy a Jupyterhub on Kubernetes
102-
- Deploy a Daskhub: a Dask enables Jupyterhub (for later use)
105+
- Deploy a Daskhub: a Dask-enabled Jupyterhub (for later use)
103106

104107
[Slides](13_Dask_On_Cloud.html)
105108

@@ -113,11 +116,11 @@ What is this course module main subject?
113116
- Machine and Deep Learning (Scikit-learn, TensorFlow, PyTorch)
114117
- Jupyter notebooks, Binder, Google Colab
115118
- [Spark Introduction (30m)](03_Spark_Introduction.html)
116-
- Play with MapReduce through Spark (Notebook on small datasets) (1.5h)
119+
- Play with MapReduce using Spark (Notebook on small datasets) (1.5h)
117120

118121
## Distributed Processing and Dask hands on (3h)
119122

120-
- [Manage large datasets(30m)](24_Large_Datasets.html)
123+
- [Manage large datasets (30m)](24_Large_Datasets.html)
121124
- [Dask Introduction (30m)](22_Dask_Pangeo.html)
122125
- Includes [Dask tutorial (2h)](https://github.com/dask/dask-tutorial).
123126

@@ -127,7 +130,7 @@ What is this course module main subject?
127130
- Subject presentation
128131
- Everyone should have a Daskhub cloud platform setup or Dask on local computer
129132
- Get the data
130-
- Notebook with cell codes to fill or answers to give
133+
- Notebook with code cells to fill in and answers to give
131134
- Clean big amounts of data using Dask in the cloud or on a big computer
132135
- Train machine learning models in parallel (hyper parameter search)
133136
- Complete with your own efforts!
@@ -145,4 +148,3 @@ What will we do today?
145148
![Answer](https://cdn.strawpoll.com/images/polls/qr/xVg71DedQyr.png)
146149

147150
[Answer link](https://strawpoll.com/xVg71DedQyr)
148-

src/01_Introduction_Big_Data.md

Lines changed: 4 additions & 3 deletions
@@ -17,7 +17,8 @@ date: 2026
1717

1818
## Some figures
1919

20-
![Volume of data produced in a day in 2019 (source www.visualcapitalist.com)](images/a-day-in-data.jpg){width="50%"}
20+
![](images/a-day-in-data.jpg){width="50%"}
21+
![](https://www.digitalsilk.com/wp-content/uploads/2024/12/how-much-data-is-generated-per-day-hero-image.jpg)
2122

2223
## Some figures in sciences
2324

@@ -78,7 +79,7 @@ Not a technology.
7879

7980
## Quizz
8081

81-
What is the estimated size of the global data sphere?
82+
What is the estimated size of the global data sphere in 2025?
8283

8384
- Answer A: 175 Petabytes
8485
- Answer B: 175 Exabytes
@@ -253,7 +254,7 @@ Data production or scientific exploration:
253254

254255
## Quizz
255256

256-
What is the typical volumes of scientific Datasets (multiple choices)?
257+
What are the typical volumes of scientific Datasets (multiple choices)?
257258

258259
- Answer A: MBs
259260
- Answer B: GBs

src/02_Big_Data_Platforms.md

Lines changed: 9 additions & 10 deletions
@@ -532,23 +532,22 @@ python /data/training/SLURM/plot_template.py
532532
:::
533533
::: {.column width="50%"}
534534

535-
![Jean-Zay supercomputer](http://www.idris.fr/media/images/jean-zay-annonce-01.jpg?id=web%3Aeng%3Ajean-zay%3Acpu%3Ajean-zay-cpu-hw-eng)
535+
![Jean-Zay supercomputer](images/jean-zay-hpc.png)
536536

537537
:::
538538
::::::::::::::
539539

540540
## TOP500
541541

542-
| Rank | System | Cores | Rmax (TFlop/s) | Rpeak (PFlop/s) | Power (kW) |
542+
| Rank | System | Cores | Rmax (PFlop/s) | Rpeak (PFlop/s) | Power (kW) |
543543
|------| -------|-------|----------------|-----------------|------------|
544-
| 1 | Frontier - United States | 8,699,904 | 1,194.00 | 1,679.82 | 22,703 |
545-
| 2 | Aurora - United States | 4,742,808 | 585.34 | 1,059.33 | 24,687 |
546-
| 4 | Supercomputer Fugaku - Japan | 7,630,848 | 442.01 | 537.21 | 29,899 |
547-
| 5 | LUMI - Finland | 2,752,704 | 2379.70 | 531.51 | 7,107 |
548-
| 17 | Adastra - France | 319,072 | 46.10 | 61.61 | 921 |
549-
| 167 | Jean Zay - France | 93,960 | 4.48 | 7.35 | |
544+
| 1 | El Capitan - United States | 11,340,000 | 1,809.00 | 2,821.10 | 29,685 |
545+
| 4 | JUPITER Booster - Germany | 4,801,344 | 1,000.00 | 1,226.28 | 15,794 |
546+
| 7 | Supercomputer Fugaku - Japan | 7,630,848 | 442.01 | 537.21 | 29,899 |
547+
| 26 | CEA-HE - France | 548,352 | 90.79 | 171.26 | 1,770 |
548+
| 290 | Jean Zay - France | 93,960 | 4.48 | 7.35 | |
550549

551-
[Top 500 (november 2023)](https://top500.org/lists/top500/2023/11/)
550+
[Top 500 (November 2025)](https://top500.org/lists/top500/2025/11/)
552551

553552
## Big Data and Hadoop
554553

@@ -628,7 +627,7 @@ Hence the cloud computing model...
628627
### GPGPU
629628

630629
- Specific hardware (expensive)
631-
- Really efficient for Deep Learning algorithms
630+
- Really efficient for Deep Learning algorithms (training and inference)
632631
- Image processing, Language processing
633632

634633
## Quizz

src/10_Cloud_Computing.md

Lines changed: 4 additions & 5 deletions
@@ -35,8 +35,7 @@ I took most of the content from theirs:
3535
:::
3636
::: {.column width="70%"}
3737

38-
![](https://www.datacenterknowledge.com/sites/datacenterknowledge.com/files/wp-content/uploads/2013/06/lulea-rows.jpg){width="35%"}
39-
![](https://www.datacenterknowledge.com/sites/datacenterknowledge.com/files/wp-content/uploads/2013/06/fb-lulea-external-fans.jpg){width="35%"}
38+
![](https://www.akita.co.uk/wp-content/uploads/2023/09/cloud-storage-facilities-1.jpg)
4039

4140
(Facebook's data center & server racks)
4241

@@ -45,7 +44,7 @@ I took most of the content from theirs:
4544

4645
## Google Cloud Data Center locations
4746

48-
![Data Centers](https://cloud.google.com/images/locations/regions.png)
47+
![Data Centers](images/google-dc-map.png)
4948

5049
## Cloud Definition
5150

@@ -226,13 +225,13 @@ What means IaaS?
226225
## Public (European)
227226

228227
![](https://www.comptoir-hardware.com/images/stories/_logos/ovhcloud.png){width=20%}
229-
![](https://cloud.orange.com/ui/app/static/assets/brand/logo_header_login.png){width=20%}
228+
![](https://www.orange-business.com/sites/default/files/illustration-obs---cloud---infrastructures.png){width=20%}
230229
![](images/open_telekom_cloud.png){width=20%}
231230

232231
Academic, publicly funded:
233232

234233
![gaiax](https://gaia-x.eu/wp-content/uploads/2022/12/Gaia-X_Logo_Inverted_White_Transparent_210401-3-1000x687.png){width=20%}
235-
![EOSC](https://eosc-portal.eu/sites/all/themes/theme1/logo.png){width=20%}
234+
![EOSC](https://eosc.eu/wp-content/uploads/2023/08/EOSCA_logo.svg){width=20%}
236235

237236
## Private/on premise
238237

src/14_ObjectStorage.md

Lines changed: 1 addition & 1 deletion
@@ -146,7 +146,7 @@ What is Cloud Optimized?
146146
:::
147147
::: {.column width="50%"}
148148

149-
![](https://staging.dev.element84.com/wp-content/uploads/2019/04/smiley_tiled.png)
149+
![](https://guide.cloudnativegeo.org/images/cog-diagram-2.png)
150150

151151
:::
152152
::::::::::::::
