diff --git a/docs/cloud/04_dataproc/02_data_management.md b/docs/cloud/04_dataproc/02_data_management.md index 4997858f77..94f7a23929 100644 --- a/docs/cloud/04_dataproc/02_data_management.md +++ b/docs/cloud/04_dataproc/02_data_management.md @@ -4,7 +4,7 @@ HDFS stands for Hadoop Distributed File System. HDFS is a highly fault-tolerant ### File Permissions and Access Control Lists -You can share files with others using [access control lists (ACLs)](../../hpc/03_storage/09_sharing_data_on_hpc.md). An ACL gives you per-file, per-directory and per-user control over who has permission to access files. You can see the ACL for a file or directory with the getfacl command: +You can share files with others using [access control lists (ACLs)](../../hpc/03_storage/08_sharing_data_on_hpc.md). An ACL gives you per-file, per-directory and per-user control over who has permission to access files. You can see the ACL for a file or directory with the getfacl command: ```sh hdfs dfs -getfacl /user/_nyu_edu/testdir ``` diff --git a/docs/hpc/03_storage/01_intro_and_data_management.mdx b/docs/hpc/03_storage/01_intro_and_data_management.mdx index bc718ecc7f..836f04f426 100644 --- a/docs/hpc/03_storage/01_intro_and_data_management.mdx +++ b/docs/hpc/03_storage/01_intro_and_data_management.mdx @@ -1,131 +1,69 @@ # HPC Storage -The NYU HPC clusters are served by a General Parallel File System (GPFS) cluster and an all Flash VAST storage cluster. - -The NYU HPC team supports data storage, transfer, and archival needs on the HPC clusters, as well as collaborative research services like the [Research Project Space (RPS)](./05_research_project_space.mdx). - -## Highlights -- 9.5 PB Total GPFS Storage - - Up to 78 GB per second read speeds - - Up to 650k input/output operations per second (IOPS) -- Research Project Space (RPS): RPS volumes provide working spaces for sharing data and code amongst project or lab members - -## Introduction to HPC Data Management -The NYU HPC Environment provides access to a number of ***file systems*** to better serve the needs of researchers managing data during the various stages of the research data lifecycle (data capture, analysis, archiving, etc.). Each HPC file system comes with different features, policies, and availability. - -In addition, a number of ***data management tools*** are available that enable data transfers and data sharing, recommended best practices, and various scenarios and use cases of managing data in the HPC Environment. - -Multiple ***public data sets*** are available to all users of the HPC environment, such as a subset of The Cancer Genome Atlas (TCGA), the Million Song Database, ImageNet, and Reference Genomes. - -Below is a list of file systems with their characteristics and a summary table. Reviewing the list of available file systems and the various Scenarios/Use cases that are presented below, can help select the right file systems for a research project. As always, if you have any questions about data storage in the HPC environment, you can request a consultation with the HPC team by sending email to [hpc@nyu.edu](mailto:hpc@nyu.edu). - -### Data Security Warning -::::warning -#### Moderate Risk Data - HPC Approved -- The HPC Environment has been approved for storing and analyzing **Moderate Risk research data**, as defined in the [NYU Electronic Data and System Risk Classification Policy](https://www.nyu.edu/about/policies-guidelines-compliance/policies-and-guidelines/electronic-data-and-system-risk-classification.html). 
-- **High Risk** research data, such as those that include Personal Identifiable Information (**PII**) or electronic Protected Health Information (**ePHI**) or Controlled Unclassified Information (**CUI**) **should NOT be stored in the HPC Environment**.
-:::note
-only the Office of Sponsored Projects (OSP) and Global Office of Information Security (GOIS) are empowered to classify the risk categories of data.
+:::warning[Only approved for Moderate Risk Data]
+- High Risk data, such as those that include Personally Identifiable Information (PII), electronic Protected Health Information (ePHI), or Controlled Unclassified Information (CUI), **should NOT be stored in the HPC Environment**. We recommend using the [Secure Research Data Environments (SRDE)](../../srde/01_getting_started/01_intro.md) for such data instead.
+- The Office of Sponsored Projects (OSP) and Global Office of Information Security (GOIS) are exclusively empowered to classify the risk categories for a dataset, as listed in the [NYU Electronic Data and System Risk Classification Policy](https://www.nyu.edu/about/policies-guidelines-compliance/policies-and-guidelines/electronic-data-and-system-risk-classification.html).
 :::
-:::tip
-#### High Risk Data - Secure Research Data Environments (SRDE) Approved
-Because the HPC system is not approved for High Risk data, we recommend using an approved system like the [Secure Research Data Environments (SRDE)](../../srde/01_getting_started/01_intro.md).
-:::
-::::
-
-### Data Storage options in the HPC Environment
-#### User Home Directories
-Every individual user has a home directory (under **`/home/$USER`**, environment variable **`$HOME`**) for permanently storing code and important configuration files. Home Directories provide limited storage space (**50 GB**) and inodes (files) **30,000** per user. Users can check their quota utilization using the [myquota](http://www.info-ren.org/projects/ckp/tech/software/version/myquota.html) command.
-User home directories are backed up daily and old files under **`$HOME`** are not purged.
+The HPC environment provides access to the file-systems listed below to better serve your needs for managing research data during all stages of the [research data life cycle](https://guides.nyu.edu/dataservices#s-lg-box-33756318). Reviewing the list of available file-systems and their intended uses can help you select the right file-system for your tasks. Please note that there are strict limits on the size and number of files you are allowed to have on each file-system. To find out your current disk space and inode quota utilization, refer to the section on [understanding user quota limits](./05_best_practices.mdx#user-quota-limits-and-the-myquota-command).

-The User home directories are available on all HPC clusters (Torch) and on every cluster node (login nodes, compute nodes) as well as and Data Transfer Node (gDTN).
+## User Home Directories
+You have access to a home directory at `/home/$USER` (accessible via the environment variable `$HOME`) for permanently storing code and important configuration files. Home directories provide limited storage space (**50 GB**) and a limited inode (file) count of **30,000**. You can check your quota utilization using the `myquota` command as [described here](./05_best_practices.mdx#user-quota-limits-and-the-myquota-command). Home directories are backed up daily and old files under `$HOME` are not purged. Home directories are available on every cluster node (login nodes, compute nodes) and the Data Transfer Node (gDTN).
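+For example, you can run the command below from any login node to see your current usage against these limits (the best-practices page linked above shows a full sample report):
+```sh
+# report disk and inode quota limits and current usage for each mounted file system
+myquota
+```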
 :::warning
 Avoid changing file and directory permissions in your home directory to allow other users to access files.
 :::
-User Home Directories are not ideal for sharing files and folders with other users. HPC Scratch or [Research Project Space (RPS)](./05_research_project_space.mdx) are better file systems for sharing data.
-:::warning
-**One of the common issues that users report regarding their home directories is running out of inodes,** i.e. the number of files stored under their home exceeds the inode limit, which by default is set to 30,000 files. This typically occurs when users install software under their home directories, for example, when working with Conda and Julia environments, that involve many small files.
-:::
+User Home Directories are not ideal for sharing files and folders with other users. HPC Scratch or [Research Project Space (RPS)](./04_research_project_space.mdx) are better file-systems for sharing data.

-:::tip
-- To find out the current space and inode quota utilization and the distribution of files under your home directory, please see: [Understanding user quota limits and the myquota command.](./06_best_practices.md#user-quota-limits-and-the-myquota-command)
-- **Working with Conda environments:** To avoid running out of inode limits in home directories, the HPC team recommends **setting up conda environments with Singularity overlay images**
+:::tip[`inode` limits]
+- One of the common issues that users report regarding their home directories is running out of inodes (i.e. the number of files stored under their home exceeds the inode limit), which by default is set to 30,000 files.
+- To find out the current space and inode quota utilization and the distribution of files under your home directory, please see: [Understanding user quota limits and the myquota command](./05_best_practices.mdx#user-quota-limits-and-the-myquota-command).
+- Working with `conda` environments: To avoid running out of inode limits in home directories, the HPC team recommends **setting up `conda` environments with Singularity overlay images** as [described here](../07_containers/03_singularity_with_conda.md). Avoid creating `conda` environments in your `$HOME` directory.
 :::

-#### HPC Scratch
-The HPC scratch file system is the HPC file system where most of the users store research data needed during the analysis phase of their research projects. The scratch file system provides ***temporary*** storage for datasets needed for running jobs.
-
-Files stored in the HPC scratch file system are subject to the **HPC Scratch old file purging policy:** Files on the /scratch file system that have not been accessed for 60 or more days will be purged.
-
-Every user has a dedicated scratch directory (**/scratch/$USER**) with **5 TB** disk quota and **1,000,000 inodes** (files) limit per user.
-
-The scratch file system is available on all nodes (compute, login, etc.) on Torch as well as Data Transfer Node (gDTN).
+## HPC Scratch
+The HPC scratch is an all-flash (VAST) file-system where you can store research data needed during the analysis phase of your research projects. It provides ***temporary*** storage for datasets needed for running jobs. Your scratch directory (`/scratch/$USER`) has a **5 TB** disk quota and a limit of **5,000,000 inodes** (files). The scratch file-system is available on all nodes (compute, login, etc.) on Torch as well as the Data Transfer Node (gDTN).
+There are no backups for this file-system, and files that are deleted accidentally or removed due to storage system failures cannot be recovered.
-:::warning
-There are **No Back ups of the scratch file system.** ***Files that were deleted accidentally or removed due to storage system failures CAN NOT be recovered.***
+:::warning[Scratch Purging Policy]
+- Files on the `/scratch` file-system that have not been accessed for 60 or more days will be purged.
+- It is a policy violation to use scripts to change the file access time. Any user found to be violating this policy will have their HPC account locked. A second violation may result in their HPC access being revoked.
 :::
-:::tip
-
-- Since there are ***no back ups of HPC Scratch file system***, users should not put important source code, scripts, libraries, executables in `/scratch`. These important files should be stored in file systems that are backed up, such as `/home` or [Research Project Space (RPS)](./05_research_project_space.mdx). Code can also be stored in a ***git*** repository.
-- ***Old file purging policy on HPC Scratch:*** All files on the HPC Scratch file system that have not been accessed ***for more than 60 days*** will be removed. It is a policy violation to use scripts to change the file access time. Any user found to be violating this policy will have their HPC account locked. A second violation may result in your HPC account being turned off.
-- To find out the user's current disk space and inode quota utilization and the distribution of files under your scratch directory, please see: [Understanding user quota Limits and the myquota command.](./06_best_practices.md#user-quota-limits-and-the-myquota-command)
-- Once a research project completes, users should archive their important files in the [HPC Archive file system](./01_intro_and_data_management.mdx#hpc-archive).
+:::tip[Avoiding data loss from purging]
+- There are no backups of the HPC Scratch file-system, so you should not put important source code, scripts, libraries, or executables in `/scratch`. These files should instead be stored in file-systems that are backed up, such as `/home` or [Research Project Space (RPS)](./04_research_project_space.mdx). Code can also be stored in a `git` repository.
+- Upon the completion of your research study, you are encouraged to archive your data in the [HPC Archive file-system](./01_intro_and_data_management.mdx#hpc-archive).
 :::

-#### HPC Vast
-The HPC Vast all-flash file system is the HPC file system where users store research data needed during the analysis phase of their research projects, particularly for high I/O data that can bottleneck on the scratch file system. The Vast file system provides ***temporary*** storage for datasets needed for running jobs.
-
-Files stored in the HPC vast file system are subject to the ***HPC Vast old file purging policy:*** Files on the `/vast` file system that have not been accessed for **60 or more days** will be purged.
-
-Every user has a dedicated vast directory (**`/vast/$USER`**) with **2 TB** disk quota and **5,000,000 inodes** (files) limit per user.
-
-The vast file system is available on all nodes (compute, login, etc.) on Torch as well as Data Transfer Node (gDTN).
-
-:::warning
-There are **No Back ups** of the vastsc file system. ***Files that were deleted accidentally or removed due to storage system failures CAN NOT be recovered.***
-:::
-
-:::tip
-- Since there are ***no back ups of HPC Vast file system***, users should not put important source code, scripts, libraries, executables in `/vast`. These important files should be stored in file systems that are backed up, such as `/home` or [Research Project Space (RPS)](./05_research_project_space.mdx). Code can also be stored in a ***git*** repository.
-- ***Old file purging policy on HPC Vast:*** All files on the HPC Vast file system that have not been accessed ***for more than 60 days will be removed.*** It is a policy violation to use scripts to change the file access time. Any user found to be violating this policy will have their HPC account locked. A second violation may result in your HPC account being turned off.
-- To find out the user's current disk space and inode quota utilization and the distribution of files under your vast directory, please see: [Understanding user quota Limits and the myquota command.](./06_best_practices.md#user-quota-limits-and-the-myquota-command)
-- Once a research project completes, users should archive their important files in the [HPC Archive file system](./01_intro_and_data_management.mdx#hpc-archive).
-:::
-
-#### HPC Research Project Space
-The HPC Research Project Space (RPS) provides data storage space for research projects that is easily shared amongst collaborators, ***backed up***, and ***not subject to the old file purging policy***. HPC RPS was introduced to ease data management in the HPC environment and eliminate the need of having to frequently copying files between Scratch and Archive file systems by having all projects files under one area. ***These benefits of the HPC RPS come at a cost***. The cost is determined by the allocated disk space and the number of files (inodes).
-- For detailed information about RPS see: [HPC Research Project Space](./05_research_project_space.mdx)
-
-#### HPC Work
-The HPC team makes available a number of public datasets that are commonly used in analysis jobs. The data sets are available Read-Only under **`/scratch/work/public`**.
-
-For some of the datasets users must provide a signed usage agreement before accessing.
+## HPC Research Project Space
+The HPC Research Project Space (RPS) provides data storage space for research projects that is easily shared amongst collaborators, ***backed up***, and ***not subject to the old file purging policy***. HPC RPS was introduced to ease data management in the HPC environment and to eliminate the need to frequently copy files between the Scratch and Archive file-systems by keeping all project files under one area. ***These benefits of the HPC RPS come at a cost***. The cost is determined by the allocated disk space and the number of files (inodes). For detailed information about RPS, see [HPC Research Project Space](./04_research_project_space.mdx).
-Public datasets available on the HPC clusters can be viewed on the [Datasets page](../04_datasets/01_intro.md).
+## HPC Work
+The HPC team makes available a number of public datasets that are commonly used in analysis jobs. The datasets are available Read-Only under `/scratch/work/public`. For some of the datasets, users must provide a signed usage agreement before accessing them. Public datasets available on the HPC clusters can be viewed on the [Datasets page](../04_datasets/01_intro.md).
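+For example, you can browse the available datasets directly from a login node (some datasets may require a signed usage agreement before you can read them):
+```sh
+# list the read-only public datasets shared by the HPC team
+ls /scratch/work/public
+```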
-#### HPC Archive
-Once the Analysis stage of the research data lifecycle has completed, _HPC users should **tar** their data and code into a single tar.gz file and then copy the file to their archive directory (**`/archive/$USER`**_). The HPC Archive file system is not accessible by running jobs; it is suitable for long-term data storage. Each user has access to a default disk quota of **2TB** and ***20,000 inode (files) limit***. The rather low limit on the number of inodes per user is intentional. The archive file system is available only ***on login nodes*** of Torch. The archive file system is backed up daily.
+## HPC Archive
+Once the Analysis stage of the [research data life cycle](https://guides.nyu.edu/dataservices#s-lg-box-33756318) has completed, you should bundle your data into a single archive file before moving it onto the archive (`/archive/$USER`). For instance, you can use the `tar` command to combine all your data into a single `tar` file (add the `z` flag to compress it into a `tar.gz`). The HPC Archive file-system is not accessible by running jobs; it is suitable for long-term data storage. Each user has access to a default disk quota of **2TB** and is limited to **20,000 inodes (files)**. The rather low limit on the number of inodes per user is intentional. The archive file-system is available only ***on login nodes*** of Torch. The archive file-system is backed up daily.

-- Here is an example ***tar*** command that combines the data in a directory named ***my_run_dir*** under ***`$SCRATCH`*** and outputs the tar file in the user's ***`$ARCHIVE`***:
+Here is an example `tar` command that combines the data in a directory named `my_run_dir` under `$SCRATCH` and outputs the tar file in the user's `$ARCHIVE`:
 ```sh
 # to archive `$SCRATCH/my_run_dir`
 tar cvf $ARCHIVE/simulation_01.tar -C $SCRATCH my_run_dir
 ```

-#### NYU (Google) Drive
+## NYU (Google) Drive
 Google Drive ([NYU Drive](https://www.nyu.edu/life/information-technology/communication-and-collaboration/document-collaboration-and-sharing/nyu-drive.html)) is accessible from the NYU HPC environment and provides an option to users who wish to archive data or share data with external collaborators who do not have access to the NYU HPC environment.

-As of December 2023, storage limits were applied to all faculty, staff, and sutdent NYU Google accounts. Please see [Google Workspace Storage](https://www.nyu.edu/life/information-technology/about-nyu-it/key-projects-and-initiatives/google-workspace-storage.html) for details
+As of December 2023, storage limits were applied to all faculty, staff, and student NYU Google accounts. Please see [Google Workspace Storage](https://www.nyu.edu/life/information-technology/about-nyu-it/key-projects-and-initiatives/google-workspace-storage.html) for details.

-There are also limits to the data transfer rate in moving to/from Google Drive. Thus, moving many small files to Google Drive is not going to be efficient.
+There are also limits to the data transfer rate in moving to/from Google Drive. Thus, moving many small files to Google Drive is not efficient. Please read the [Instructions on how to use cloud storage within the NYU HPC Environment](./07_transferring_cloud_storage_data_with_rclone.md).

-Please read the [Instructions on how to use cloud storage within the NYU HPC Environment](./08_transferring_cloud_storage_data_with_rclone.md).
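+For example, once an rclone remote for your Google Drive has been configured on a data transfer node (the remote name `mygdrive` below is only an illustration), copying a single compressed archive is far more efficient than copying many small files:
+```sh
+# copy one compressed archive from scratch to a configured Google Drive remote
+rclone copy $SCRATCH/results.tar.gz mygdrive:hpc-archive/
+```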
+## HPC Storage Comparison Table -#### HPC Storage Mounts Comparison Table - +| Space | Environment Variable | Purpose | Backed Up / Flushed | Quota Disk Space / # of Files | +|-----------------------------|----------------------|-------------------------------------------------------|-------------------------------------|------------------------------------| +| /home | $HOME | Personal user home space that is best for small files | YES / NO | 50 GB / 30 K | +| /scratch | $SCRATCH | Best for large files | NO / Files not accessed for 60 days | 5 TB / 5 M | +| /archive | $ARCHIVE | Long-term storage | YES / NO | 2 TB / 20 K | +| HPC Research Project Space | NA | Shared disk space for research projects | YES / NO | Payment based TB-year/inodes-year | Please see the next page for best practices for data management on NYU HPC systems. diff --git a/docs/hpc/03_storage/02_available_storage_systems.md b/docs/hpc/03_storage/02_available_storage_systems.md deleted file mode 100644 index afe6ef4667..0000000000 --- a/docs/hpc/03_storage/02_available_storage_systems.md +++ /dev/null @@ -1,39 +0,0 @@ -# Available storage systems - -The NYU HPC clusters are served by the following storage systems: - -## GPFS -General Parallel File System (GPFS) storage cluster is a high-performance clustered file system developed by IBM that provides concurrent high-speed file access to applications executing on multiple nodes of clusters. - -### Configuration -The NYU HPC cluster storage runs on Lenovo Distributed Storage Solution DSS-G hardware: -- 2x DSS-G 202 - - 116 Solid State Drives (SSDs) - - 464TB raw storage -- 2x DSS-G 240 - - 668 Hard Disk Drives (HDDs) - - 9.1PB raw storage - -### Performance -- Read Speed: 78 GB per second read speeds -- Write Speed: 42 GB per second write speeds -- I/O Performance: up to 650k input/output operations per second (IOPS) - -## Flash Tier Storage (VAST) -An all flash file system, using [VAST Flash storage](https://www.vastdata.com/), is now available on Torch. Flash storage is optimal for computational workloads with high I/O rates. For example, If you have jobs to run with huge amount of tiny files, VAST may be a good candidate. If you and your lab members are interested, please reach out to [hpc@nyu.edu](mailto:hpc@nyu.edu) for more information. -- NVMe interface -- Total size: 778 TB -:::note -/vast is available for all users to read and available to approved users to write data. -::: - -## Research Project Space (RPS) -[Research Project Space (RPS)](./05_research_project_space.mdx) volumes provide working spaces for sharing data and code amongst project or lab members. -- RPS directories are available on the Torch HPC cluster. -- There is no old-file purging policy on RPS. -- RPS is backed up. -- There is a cost per TB per year and inodes per year for RPS volumes. - -Please see [Research Project Space](./05_research_project_space.mdx) for more information. - - diff --git a/docs/hpc/03_storage/03_data_transfers.md b/docs/hpc/03_storage/02_data_transfers.md similarity index 95% rename from docs/hpc/03_storage/03_data_transfers.md rename to docs/hpc/03_storage/02_data_transfers.md index 9989801659..af71d5cc8b 100644 --- a/docs/hpc/03_storage/03_data_transfers.md +++ b/docs/hpc/03_storage/02_data_transfers.md @@ -1,14 +1,14 @@ # Data Transfers :::tip Globus -Globus is the recommended tool to use for large-volume data transfers due to the efficiency, reliability, security and ease of use. Use other tools only if you really need to. 
Detailed instructions available at [Globus](./04_globus.md)
+Globus is the recommended tool to use for large-volume data transfers due to its efficiency, reliability, security, and ease of use. Use other tools only if you really need to. Detailed instructions are available at [Globus](./03_globus.md).
 :::

 ## Data-Transfer nodes
 Attached to the NYU HPC cluster Torch, the Torch Data Transfer Node (gDTN) are nodes optimized for transferring data between cluster file systems (e.g. scratch) and other endpoints outside the NYU HPC clusters, including user laptops and desktops. The gDTNs have 100-Gb/s Ethernet connections to the High Speed Research Network (HSRN) and are connected to the HDR Infiniband fabric of the HPC clusters. More information on the hardware characteristics is available at [Torch spec sheet](../10_spec_sheet.md).

 ### Data Transfer Node Access
-The HPC cluster filesystems include `/home`, `/scratch`, `/archive` and the [HPC Research Project Space](./05_research_project_space.mdx) are available on the gDTN. The Data-Transfer Node (DTN) can be accessed in a variety of ways
+The HPC cluster filesystems, including `/home`, `/scratch`, `/archive`, and the [HPC Research Project Space](./04_research_project_space.mdx), are available on the gDTN. The Data-Transfer Node (DTN) can be accessed in a variety of ways:
 - From NYU-net and the High Speed Research Network: use SSH to the DTN hostname `dtn011.hpc.nyu.edu` or `dtn012.hpc.nyu.edu`

 :::info
@@ -42,12 +42,12 @@ where username would be your user name, project1 a directory to be copied to the
 ### Windows Tools
 #### File Transfer Clients
-Windows 10 machines may have the Linux Subsystem installed, which will allow for the use of Linux tools, as listed above, but generally it is recommended to use a client such as [WinSCP](https://winscp.net/eng/docs/tunneling) or [FileZilla](https://filezilla-project.org/) to transfer data. Additionally, Windows users may also take advantage of [Globus](./04_globus.md) to transfer files.
+Windows 10 machines may have the Linux Subsystem installed, which will allow for the use of Linux tools, as listed above, but generally it is recommended to use a client such as [WinSCP](https://winscp.net/eng/docs/tunneling) or [FileZilla](https://filezilla-project.org/) to transfer data. Additionally, Windows users may also take advantage of [Globus](./03_globus.md) to transfer files.

 ### Globus
 Globus is the recommended tool to use for large-volume data transfers. It features automatic performance tuning and automatic retries in cases of file-transfer failures. Data-transfer tasks can be submitted via a web portal. The Globus service will take care of the rest, to make sure files are copied efficiently, reliably, and securely. Globus is also a tool for you to share data with collaborators, for whom you only need to provide the email addresses.

-The Globus endpoint for Torch is available at `nyu#torch`. Detailed instructions available at [Globus](./04_globus.md)
+The Globus endpoint for Torch is available at `nyu#torch`. Detailed instructions are available at [Globus](./03_globus.md).

 ### rclone
 rclone - rsync for cloud storage, is a command line program to sync files and directories to and from cloud storage systems such as Google Drive, Amazon Drive, S3, B2 etc. rclone is available on DTNs.
 [Please see the documentation for how to use it.](https://rclone.org/)
diff --git a/docs/hpc/03_storage/04_globus.md b/docs/hpc/03_storage/03_globus.md
similarity index 100%
rename from docs/hpc/03_storage/04_globus.md
rename to docs/hpc/03_storage/03_globus.md
diff --git a/docs/hpc/03_storage/05_research_project_space.mdx b/docs/hpc/03_storage/04_research_project_space.mdx
similarity index 96%
rename from docs/hpc/03_storage/05_research_project_space.mdx
rename to docs/hpc/03_storage/04_research_project_space.mdx
index f5e3e1c362..38a4cd6977 100644
--- a/docs/hpc/03_storage/05_research_project_space.mdx
+++ b/docs/hpc/03_storage/04_research_project_space.mdx
@@ -1,10 +1,10 @@
 # Research Project Space (RPS)

 ## Description
-Research Project Space (RPS) volumes provide working space for sharing data and code amongst project or lab members. RPS directories are built on the same parallel file system (GPFS) like HPC Scratch. They are mounted on the cluster Compute Nodes, and thus they can be accessed by running jobs. RPS directories are backed up and there is no old file purging policy. These features of RPS simplify the management of data in the HPC environment as users of the HPC Cluster can store their data and code on RPS directories and they do not need to move data between the HPC Scratch and the HPC Archive file systems.
+Research Project Space (RPS) volumes provide working space for sharing data and code amongst project or lab members. RPS directories are built on the same parallel file system (VAST) as HPC Scratch. They are mounted on the cluster Compute Nodes, and thus they can be accessed by running jobs. RPS directories are backed up and there is no old file purging policy. These features of RPS simplify the management of data in the HPC environment, as users of the HPC Cluster can store their data and code on RPS directories and do not need to move data between the HPC Scratch and the HPC Archive file systems.

-:::note
-- Due to limitations of the underlying parallel file system, ***the total number of RPS volumes that can be created is limited***.
+:::info
+- Due to limitations of the underlying parallel file system, the total number of RPS volumes that can be created is limited.
 - There is an annual cost associated with RPS.
 - The disk space and inode usage in RPS directories do not count towards quota limits in other HPC file systems (Home, Scratch, and Archive).
 :::
diff --git a/docs/hpc/03_storage/05_best_practices.mdx b/docs/hpc/03_storage/05_best_practices.mdx
new file mode 100644
index 0000000000..b020990b9c
--- /dev/null
+++ b/docs/hpc/03_storage/05_best_practices.mdx
@@ -0,0 +1,66 @@
+# Best Practices on HPC Storage
+## User Quota Limits and the myquota command
+All users have quota limits set on HPC file systems. There are several types of quota limits, such as limits on the amount of disk space (disk quota), number of files (inode quota), etc. The default user quota limits on HPC file systems are listed [on our Data Management page](./01_intro_and_data_management.mdx#hpc-storage-comparison-table).
+
+:::warning[Home directory inode quotas]
+_One of the common issues users report is running out of inodes in their home directory._ This usually occurs during software installation, for example when installing a conda environment under their home directory. Running out of quota causes a variety of issues, such as running jobs being interrupted or users being unable to finish the installation of packages under their home directory.
+::: + +Users can check their current utilization of quota using the myquota command. The myquota command provides a report of the current quota limits on mounted file systems, the user's quota utilization, as well as the percentage of quota utilization. + +In the following example the user who executes the `myquota` command is out of inodes in their home directory. The user inode quota limit on the `/home` file system **30.0K inodes** and the user has **33000 inodes**, thus **110%** of the inode quota limit. +```sh +$ myquota +Quota Information for NetID +Hostname: torch-login-2 at 2025-12-09 17:18:24 + +Filesystem Environment Backed up? Allocation Current Usage +Space Variable /Flushed? Space / Files Space(%) / Files(%) + +/home $HOME YES/NO 0.05TB/0.03M 0.0TB(0.0%)/54(0%) +/scratch $SCRATCH NO/YES 5.0TB/5.0M 0.0TB(0.0%)/1(0%) +/archive $ARCHIVE YES/NO 2.0TB/0.02M 0.0TB(0.0%)/1(0%) +``` +You can use the following command to print the list of files within each sub-folder for a given directory: +```sh +$cd $HOME +$du --inodes -h --max-depth=1 +6 ./.ssh +88 ./.config +2 ./.vnc +2 ./.aws +3 ./.lmod.d +5.3K ./.local +3 ./.dbus +408 ./ondemand +2 ./.virtual_documents +6 ./.nv +6.7K ./.pixi +33 ./workshop_scripts +5 ./.cupy +6 ./.gnupg +1 ./.emacs.d +194 ./.nextflow +6 ./.terminfo +2 ./.conda +2 ./.singularity +3 ./.vast-dev +1 ./custom +185 ./genai-workshop +6 ./.atuin +1 ./.apptainer +9 ./.subversion +4 ./packages +1.4K ./.cache +15K . +``` + +## Large number of small files +In case your dataset or workflow requires to use large number of small files, this can create a bottleneck due to read/write rates. Please refer to [our page on working with a large number of files](./06_large_number_of_small_files.md) to learn about some of the options we recommend to consider. + +## Installing Python packages +:::warning +Your home directory is limited to a relatively small number of inodes (30,000). Creating conda/python environments in you home directory, this can eat easily exhaust your inode quota. +::: + +Please review the [Package Management section](../06_tools_and_software/01_intro.md#package-management-for-r-python--julia-and-conda-in-general) of the [Torch Software Page](../06_tools_and_software/01_intro.md). diff --git a/docs/hpc/03_storage/06_best_practices.md b/docs/hpc/03_storage/06_best_practices.md deleted file mode 100644 index 65705284b0..0000000000 --- a/docs/hpc/03_storage/06_best_practices.md +++ /dev/null @@ -1,51 +0,0 @@ -# Best Practices on HPC Storage -## User Quota Limits and the myquota command -All users have quote limits set on HPC fie systems. There are several types of quota limits, such as limits on the amount of disk space (disk quota), number of files (inode quota) etc. The default user quota limits on HPC file systems are listed [on our Data Management page](./01_intro_and_data_management.mdx#hpc-storage-mounts-comparison-table). - -:::warning -_One of the common issues users report is running out of inodes in their home directory._ This usually occurs during software installation, for example installing conda environment under their home directory. Running out of quota causes a variety of issues such as running user jobs being interrupted or users being unable to finish the installation of packages under their home directory. -::: - -Users can check their current utilization of quota using the myquota command. 
The myquota command provides a report of the current quota limits on mounted file systems, the user's quota utilization, as well as the percentage of quota utilization. - -In the following example the user who executes the `myquota` command is out of inodes in their home directory. The user inode quota limit on the `/home` file system **30.0K inodes** and the user has **33000 inodes**, thus **110%** of the inode quota limit. -```sh -$ myquota -Hostname: log-1 at Sun Mar 21 21:59:08 EDT 2021 -Filesystem Environment Backed up? Allocation Current Usage -Space Variable /Flushed? Space / Files Space(%) / Files(%) -/home $HOME Yes/No 50.0GB/30.0K 8.96GB(17.91%)/33000(110.00%) -/scratch $SCRATCH No/Yes 5.0TB/1.0M 811.09GB(15.84%)/2437(0.24%) -/archive $ARCHIVE Yes/No 2.0TB/20.0K 0.00GB(0.00%)/1(0.00%) -/vast $VAST No/Yes 2.0TB/5.0M 0.00GB(0.00%)/1(0.00%) -``` -Users can find out the number of inodes (files) used per subdirectory under their home directory (`$HOME`), by running the following commands: -```sh -$cd $HOME -$ for d in $(find $(pwd) -maxdepth 1 -mindepth 1 -type d | sort -u); do n_files=$(find $d | wc -l); echo $d $n_files; done -/home/netid/.cache 1507 -/home/netid/.conda 2 -/home/netid/.config 2 -/home/netid/.ipython 11 -/home/netid/.jupyter 2 -/home/netid/.keras 2 -/home/netid/.local 24185 -/home/netid/.nv 2 -/home/netid/.sacrebleu 46 -/home/netid/.singularity 1 -/home/netid/.ssh 5 -/home/netid/.vscode-server 7216 -``` - -## Large number of small files -In case your dataset or workflow requires to use large number of small files, this can create a bottleneck due to read/write rates. - -Please refer to [our page on working with a large number of files](./07_large_number_of_small_files.md) to learn about some of the options we recommend to consider. - -## Installing Python packages -:::warning -Your home directory has a relatively small number of inodes. -If you create a conda or python environment in you home directory, this can eat up all the inodes. -::: - -Please review the [Package Management section](../06_tools_and_software/01_intro.md#package-management-for-r-python--julia-and-conda-in-general) of the [Torch Software Page](../06_tools_and_software/01_intro.md). diff --git a/docs/hpc/03_storage/07_large_number_of_small_files.md b/docs/hpc/03_storage/06_large_number_of_small_files.md similarity index 100% rename from docs/hpc/03_storage/07_large_number_of_small_files.md rename to docs/hpc/03_storage/06_large_number_of_small_files.md diff --git a/docs/hpc/03_storage/08_transferring_cloud_storage_data_with_rclone.md b/docs/hpc/03_storage/07_transferring_cloud_storage_data_with_rclone.md similarity index 96% rename from docs/hpc/03_storage/08_transferring_cloud_storage_data_with_rclone.md rename to docs/hpc/03_storage/07_transferring_cloud_storage_data_with_rclone.md index 10316b7868..1040b38b59 100644 --- a/docs/hpc/03_storage/08_transferring_cloud_storage_data_with_rclone.md +++ b/docs/hpc/03_storage/07_transferring_cloud_storage_data_with_rclone.md @@ -1,7 +1,11 @@ # Transferring Cloud Storage Data with rclone +:::tip Globus +Globus is the recommended tool to use for large-volume data transfers due to the efficiency, reliability, security and ease of use. Use other tools only if you really need to. 
Detailed instructions available at [Globus](./03_globus.md) +::: + ## Transferring files to and from Google Drive with RCLONE -Having access to Google Drive from the HPC environment provides an option to archive data and even share data with collaborators who have no access to the NYU HPC environment. Other options to archiving data include the HPC Archive file system and using [Globus](./04_globus.md) to share data with collaborators. +Having access to Google Drive from the HPC environment provides an option to archive data and even share data with collaborators who have no access to the NYU HPC environment. Other options to archiving data include the HPC Archive file system and using [Globus](./03_globus.md) to share data with collaborators. Access to Google Drive is provided by [rclone](https://rclone.org/drive/) - rsync for cloud storage - a command line program to sync files and directories to and from cloud storage systems such as Google Drive, Amazon Drive, S3, B2 etc. [rclone](https://rclone.org/drive/) is available on Torch cluster as a module, the module versions currently available (March 2025) are: - **rclone/1.68.2** @@ -344,7 +348,7 @@ Please enter 'q' and we're done with configuration. ### Step 4: Transfer :::warning -Please be sure to perform data transfers on a data transfer node (DTN). It can degrade performance for other users to perform transfers on other types of nodes. For more information please see [Data Transfers](./03_data_transfers.md) +Please be sure to perform data transfers on a data transfer node (DTN). It can degrade performance for other users to perform transfers on other types of nodes. For more information please see [Data Transfers](./02_data_transfers.md) ::: Sample commands: diff --git a/docs/hpc/03_storage/09_sharing_data_on_hpc.md b/docs/hpc/03_storage/08_sharing_data_on_hpc.md similarity index 100% rename from docs/hpc/03_storage/09_sharing_data_on_hpc.md rename to docs/hpc/03_storage/08_sharing_data_on_hpc.md diff --git a/docs/hpc/06_tools_and_software/04_python_packages_with_virtual_environments.mdx b/docs/hpc/06_tools_and_software/04_python_packages_with_virtual_environments.mdx index aa29be619c..b8959a0857 100644 --- a/docs/hpc/06_tools_and_software/04_python_packages_with_virtual_environments.mdx +++ b/docs/hpc/06_tools_and_software/04_python_packages_with_virtual_environments.mdx @@ -33,7 +33,7 @@ Thus you can consider the following options: - Reinstall your packages if some of the files get deleted - You can do this manually - You can do this automatically. For example, within a workflow of a pipeline software like [Nextflow](https://www.nextflow.io/) -- Pay for "Research Project Space" - for details see [Research Project Space](../03_storage/05_research_project_space.mdx) +- Pay for "Research Project Space" - for details see [Research Project Space](../03_storage/04_research_project_space.mdx) ::: diff --git a/docs/hpc/06_tools_and_software/05_r_packages_with_renv.mdx b/docs/hpc/06_tools_and_software/05_r_packages_with_renv.mdx index 97d2582be4..87e5917519 100644 --- a/docs/hpc/06_tools_and_software/05_r_packages_with_renv.mdx +++ b/docs/hpc/06_tools_and_software/05_r_packages_with_renv.mdx @@ -36,7 +36,7 @@ Thus you can consider the following options: - Reinstall your packages if some of the files get deleted - You can do this manually - You can do this automatically. 
For example, within a workflow of a pipeline software like [Nextflow](https://www.nextflow.io/) -- Pay for "Research Project Space" - for details see [Research Project Space](../03_storage/05_research_project_space.mdx) +- Pay for "Research Project Space" - for details see [Research Project Space](../03_storage/04_research_project_space.mdx) - Use Singularity and install packages within a corresponding overlay file - Details available at [Squash File System and Singularity](../07_containers/04_squash_file_system_and_singularity.md) ::: diff --git a/docs/hpc/06_tools_and_software/06_conda_environments.mdx b/docs/hpc/06_tools_and_software/06_conda_environments.mdx index 589e56d60f..00290589f6 100644 --- a/docs/hpc/06_tools_and_software/06_conda_environments.mdx +++ b/docs/hpc/06_tools_and_software/06_conda_environments.mdx @@ -41,7 +41,7 @@ Thus you can consider the following options: - Reinstall your packages if some of the files get deleted - You can do this manually - You can do this automatically. For example, within a workflow of a pipeline software like [Nextflow](https://www.nextflow.io/) -- Pay for "Research Project Space" - for details see [Research Project Space](../03_storage/05_research_project_space.mdx) +- Pay for "Research Project Space" - for details see [Research Project Space](../03_storage/04_research_project_space.mdx) - Use Singularity and install packages within a corresponding overlay file - Details available at [Squash File System and Singularity](../07_containers/04_squash_file_system_and_singularity.md) ::: diff --git a/docs/hpc/12_tutorial_intro_shell_hpc/03_moving_looking.mdx b/docs/hpc/12_tutorial_intro_shell_hpc/03_moving_looking.mdx index a8ab10fb37..c35ac0046c 100644 --- a/docs/hpc/12_tutorial_intro_shell_hpc/03_moving_looking.mdx +++ b/docs/hpc/12_tutorial_intro_shell_hpc/03_moving_looking.mdx @@ -24,7 +24,6 @@ The NYU HPC clusters have multiple file systems for user’s files. Each file sy | /home | $HOME | Program development space; storing small files you want to keep long term, e.g. source code, scripts. | NO | 20 GB | | /scratch | $SCRATCH | Computational workspace. Best suited to large, infrequent reads and writes. | YES. Files not accessed for 60 days are deleted. | 5 TB | | /archive | $ARCHIVE | Long-term storage | NO | 2 TB | -| /vast | $VAST | Flash memory for high I/O workflows | YES. Files not accessed for 60 days are deleted. | 2 TB | Please see [HPC Storage](../03_storage/01_intro_and_data_management.mdx) for more details. @@ -374,4 +373,4 @@ The directories are listed alphabetical at each level, the files/directories in - To view files, use `ls`. - You can view help for a command with `man command` or `command --help`. - Hit `tab` to autocomplete whatever you’re currently typing. -::: \ No newline at end of file +::: diff --git a/docs/hpc/13_tutorial_intro_hpc/10_using_resources_responsibly.mdx b/docs/hpc/13_tutorial_intro_hpc/10_using_resources_responsibly.mdx index 278f077e41..4355fa78a1 100644 --- a/docs/hpc/13_tutorial_intro_hpc/10_using_resources_responsibly.mdx +++ b/docs/hpc/13_tutorial_intro_hpc/10_using_resources_responsibly.mdx @@ -20,7 +20,7 @@ The widespread usage of scheduling systems where users submit jobs on HPC resour ## Be Kind to the Login Nodes The login node is often busy managing all of the logged in users, creating and editing files and compiling software. If the machine runs out of memory or processing capacity, it will become very slow and unusable for everyone. 
While the machine is meant to be used, be sure to do so responsibly – in ways that will not adversely impact other users’ experience. -Login nodes are always the right place to launch jobs, but data transfers should be done on the Torch Data Transfer Nodes (gDTNs). Please see more about gDTNs at [Data Transfers](../03_storage/03_data_transfers.md). Similarly, computationally intensive tasks should all be done on compute nodes. This refers to not just computational analysis/research tasks, but also to processor intensive software installations and similar tasks. +Login nodes are always the right place to launch jobs, but data transfers should be done on the Torch Data Transfer Nodes (gDTNs). Please see more about gDTNs at [Data Transfers](../03_storage/02_data_transfers.md). Similarly, computationally intensive tasks should all be done on compute nodes. This refers to not just computational analysis/research tasks, but also to processor intensive software installations and similar tasks. :::warning[Login Nodes Are a Shared Resource] Remember, the login node is shared with all other users and your actions could cause issues for other people. Think carefully about the potential implications of issuing commands that may use large amounts of resource. @@ -77,7 +77,7 @@ Make sure you understand what the backup policy is on the system you are using a ::: ## Transferring Data -The most important point about transferring data responsibly on Green is to be sure to use Torch Data Transfer Nodes (gDTNs) or other options like [Globus](../03_storage/04_globus.md). Please see [Data Transfers](../03_storage/03_data_transfers.md) for details. By doing this you'll help to keep the login nodes responsive for all users. +The most important point about transferring data responsibly on Green is to be sure to use Torch Data Transfer Nodes (gDTNs) or other options like [Globus](../03_storage/03_globus.md). Please see [Data Transfers](../03_storage/02_data_transfers.md) for details. By doing this you'll help to keep the login nodes responsive for all users. Being efficient in *how* you transfer data on the gDTNs is also important. It will not only reduce the load on the gDTNs, but also save your time. Be sure to archive and compress you files if possible with `tar` and `gzip`. This will remove the overhead of trying to transfer many files and shrink the size of transfer. Please see [Transferring Files with Remote Computers](./07_transferring_files_remote.mdx) for details.
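+A minimal sketch of this workflow (`<NetID>` and the directory names below are placeholders):
+```sh
+# on the cluster: bundle and compress a results directory into a single archive
+tar -czf $SCRATCH/results.tar.gz -C $SCRATCH my_results
+
+# on your own machine: pull the single archive through a data transfer node
+scp <NetID>@dtn011.hpc.nyu.edu:/scratch/<NetID>/results.tar.gz .
+```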