CNES
diff --git a/images/CERNVolumes.png b/images/CERNVolumes.png
13.3 KB
diff --git a/images/google-dc-map.png b/images/google-dc-map.png
411 KB
diff --git a/images/jean-zay-hpc.png b/images/jean-zay-hpc.png
529 KB
diff --git a/src/00_SDD_DE_Course_Introduction.md b/src/00_SDD_DE_Course_Introduction.md
Lines changed: 11 additions & 9 deletions
diff --git a/src/01_Introduction_Big_Data.md b/src/01_Introduction_Big_Data.md
Lines changed: 4 additions & 3 deletions
diff --git a/src/02_Big_Data_Platforms.md b/src/02_Big_Data_Platforms.md
Lines changed: 9 additions & 10 deletions
diff --git a/src/10_Cloud_Computing.md b/src/10_Cloud_Computing.md
Lines changed: 4 additions & 5 deletions
diff --git a/src/14_ObjectStorage.md b/src/14_ObjectStorage.md
Lines changed: 1 addition & 1 deletion
@@ -14,14 +14,15 @@ Harnessing the complexity of large amounts of data is a challenge in itself.
 
 But Big Data processing is more than that: originally characterized by the 3 Vs of Volume, Velocity and Variety, 
 the concepts popularized by Hadoop and Google require dedicated computing solutions (both software and infrastructure), 
-which will be explored in this module. We'll also take a dive in new programming and infrastructure technologies
-that emerged from these concepts.
+which will be explored in this module. 
+
+We'll also dive into the new programming and infrastructure technologies that emerged from these concepts.
 
 ## Objectives
 
 By the end of this module, participants will be able to:
 
-- Understand the differences and usage between main distributed computing architectures (HPC, Big Data, Cloud, CPU vs GPGPU)
+- Understand the differences and usages of main distributed computing architectures (HPC, Big Data, Cloud, CPU vs GPGPU)
 - Implement the distribution of simple operations via the Map/Reduce principle in PySpark and Dask
 - Understand the principle of Kubernetes
 - Deploy a Big Data Processing Platform on the Cloud
@@ -77,6 +78,7 @@ What is this course module main subject?
 
 ## Big Data & Distributed Computing (3h)
 
+- [Current introduction (30min)](00_SDD_DE_Course_Introduction.html)
 - [Introduction to Big Data and its ecosystem (1h)](01_Introduction_Big_Data.html)
   - What is Big Data?
   - Legacy “Big Data” ecosystem
@@ -92,14 +94,15 @@ What is this course module main subject?
 
 ## Deployment & Intro to Kubernetes (3h)
 
-- MLOps: deploying your model as a Web App 
+MLOps: deploying your model as a Web App 
+
 - [Introduction to Orchestration](https://supaerodatascience.github.io/DE/slides/2_2b_orchestration.html)
 - [Introduction to Kubernetes](12_OrchestrationKubernetes.html)
 
 ## Kubernetes hands on (3h)
 
 - Zero to Jupyterhub: deploy a Jupyterhub on Kubernetes
-- Deploy a Daskhub: a Dask enables Jupyterhub (for later use)
+- Deploy a Daskhub: a Dask-enabled Jupyterhub (for later use)
 
 [Slides](13_Dask_On_Cloud.html)
 
@@ -113,11 +116,11 @@ What is this course module main subject?
   - Machine and Deep Learning (Sickit Learn, TensorFlow, Pytorch)
   - Jupyter notebooks, Binder, Google Colab
 - [Spark Introduction (30m)](03_Spark_Introduction.html)
-- Play with MapReduce through Spark (Notebook on small datasets) (1.5h)
+- Play with MapReduce using Spark (Notebook on small datasets) (1.5h)
 
 ## Distributed Processing and Dask hands on (3h)
 
-- [Manage large datasets(30m)](24_Large_Datasets.html)
+- [Manage large datasets (30m)](24_Large_Datasets.html)
 - [Dask Introduction (30m)](22_Dask_Pangeo.html)
 - Includes [Dask tutorial(2h)](https://github.com/dask/dask-tutorial).
 
@@ -127,7 +130,7 @@ What is this course module main subject?
   - Subject presentation
   - Everyone should have a Daskhub cloud platform setup or Dask on local computer
   - Get the data
-- Notebook with cell codes to fill or answers to give
+- Notebook with code cells to fill and answers to give
   - Clean big amounts of data using Dask in the cloud or on a big computer
   - Train machine learning models in parallel (hyper parameter search)
   - Complete with yor own efforts!
@@ -145,4 +148,3 @@ What will we do today?
 ![Answer](https://cdn.strawpoll.com/images/polls/qr/xVg71DedQyr.png)
 
 [Answer link](https://strawpoll.com/xVg71DedQyr)
-
 
@@ -17,7 +17,8 @@ date: 2026
 
 ## Some figures
 
-![Volume of data produced in a day in 2019 (source www.visualcapitalist.com)](images/a-day-in-data.jpg){width="50%"}
+![](images/a-day-in-data.jpg){width="50%"}
+![](https://www.digitalsilk.com/wp-content/uploads/2024/12/how-much-data-is-generated-per-day-hero-image.jpg)
 
 ## Some figures in sciences
 
@@ -78,7 +79,7 @@ Not a technology.
 
 ## Quizz
 
-What is the estimated size of the global data sphere?
+What is the estimated size of the global data sphere in 2025?
 
 - Answer A: 175 Petabytes
 - Answer B: 175 Exabytes
@@ -253,7 +254,7 @@ Data production or scientific exploration:
 
 ## Quizz
 
-What is the typical volumes of scientific Datasets (multiple choices)?
+What are the typical volumes of scientific Datasets (multiple choices)?
 
 - Answer A: MBs
 - Answer B: GBs
 
@@ -532,23 +532,22 @@ python /data/training/SLURM/plot_template.py
 :::
 ::: {.column width="50%"}
 
-![Jean-Zay supercomputer](http://www.idris.fr/media/images/jean-zay-annonce-01.jpg?id=web%3Aeng%3Ajean-zay%3Acpu%3Ajean-zay-cpu-hw-eng)
+![Jean-Zay supercomputer](images/jean-zay-hpc.png)
 
 :::
 ::::::::::::::
 
 ## TOP500
 
-| Rank | System | Cores | Rmax (TFlop/s) | Rpeak (PFlop/s) | Power (kW) |
+| Rank | System | Cores | Rmax (PFlop/s) | Rpeak (PFlop/s) | Power (kW) |
 |------| -------|-------|----------------|-----------------|------------|
-| 1 | Frontier - United States  | 8,699,904 | 1,194.00 | 1,679.82 | 22,703 |
-| 2 | Aurora - United States | 4,742,808 	 | 585.34 | 1,059.33 | 24,687 |
-| 4 | Supercomputer Fugaku - Japan | 7,630,848 | 442.01 | 537.21 | 29,899 |
-| 5 | LUMI - Finland | 2,752,704 | 2379.70 | 531.51 | 7,107 |
-| 17 | Adastra - France | 319,072 | 46.10 | 61.61 | 921 |
-| 167 | Jean Zay - France | 93,960 | 4.48 | 7.35 | |
+| 1 | El Capitan - United States | 11,340,000 | 1,809.00 | 2,821.10 | 29,685 |
+| 4 | JUPITER Booster - Germany  | 4,801,344 | 1,000.00 | 1,226.28 | 15,794 |
+| 7 | Supercomputer Fugaku - Japan | 7,630,848 | 442.01 | 537.21 | 29,899 |
+| 26 | CEA-HE - France | 548,352 | 90.79 | 171.26 | 1,770 |
+| 290 | Jean Zay - France | 93,960 | 4.48 | 7.35 | |
 
-[Top 500 (november 2023)](https://top500.org/lists/top500/2023/11/)
+[Top 500 (November 2025)](https://top500.org/lists/top500/2025/11/)
 
 ## Big Data and Hadoop
 
@@ -628,7 +627,7 @@ Hence the cloud computing model...
 ### GPGPU
 
 - Specific hardware (expensive)
-- Really efficient for Deep Learning algorithms
+- Really efficient for Deep Learning algorithms (learning and inference)
 - Image processing, Language processing
 
 ## Quizz
 
@@ -35,8 +35,7 @@ I took most of the content from theirs:
 :::
 ::: {.column width="70%"}
 
-![](https://www.datacenterknowledge.com/sites/datacenterknowledge.com/files/wp-content/uploads/2013/06/lulea-rows.jpg){width="35%"}
-![](https://www.datacenterknowledge.com/sites/datacenterknowledge.com/files/wp-content/uploads/2013/06/fb-lulea-external-fans.jpg){width="35%"}
+![](https://www.akita.co.uk/wp-content/uploads/2023/09/cloud-storage-facilities-1.jpg)
 
 (Facebook's data center & server racks)
 
@@ -45,7 +44,7 @@ I took most of the content from theirs:
 
 ## Google Cloud Data Center locations
 
-![Data Centers](https://cloud.google.com/images/locations/regions.png)
+![Data Centers](images/google-dc-map.png)
 
 ## Cloud Definition
 
@@ -226,13 +225,13 @@ What means IaaS?
 ## Public (European)
 
 ![](https://www.comptoir-hardware.com/images/stories/_logos/ovhcloud.png){width=20%}
-![](https://cloud.orange.com/ui/app/static/assets/brand/logo_header_login.png){width=20%}
+![](https://www.orange-business.com/sites/default/files/illustration-obs---cloud---infrastructures.png){width=20%}
 ![](images/open_telekom_cloud.png){width=20%}
 
 Academic, public founded:
 
 ![gaiax](https://gaia-x.eu/wp-content/uploads/2022/12/Gaia-X_Logo_Inverted_White_Transparent_210401-3-1000x687.png){width=20%}
-![EOSC](https://eosc-portal.eu/sites/all/themes/theme1/logo.png){width=20%}
+![EOSC](https://eosc.eu/wp-content/uploads/2023/08/EOSCA_logo.svg){width=20%}
 
 ## Private/on premise
 
 
@@ -146,7 +146,7 @@ What is Cloud Optimized?
 :::
 ::: {.column width="50%"}
 
-![](https://staging.dev.element84.com/wp-content/uploads/2019/04/smiley_tiled.png)
+![](https://guide.cloudnativegeo.org/images/cog-diagram-2.png)
 
 :::
 ::::::::::::::