docs/concepts/content-addressing.md
+41 lines changed: 41 additions & 0 deletions
@@ -87,6 +87,47 @@ shasum: WARNING: 1 computed checksum did NOT match
87
87
88
88
As we can see, the hash included in the CID does NOT match the hash of the input file `ubuntu-20.04.1-desktop-amd64.iso`.
89
89
90
+
### Why the hashes differ
91
+
92
+
The example above shows that the [Multihash](glossary.md#multihash) inside a CID does not match a simple file checksum. This is because the Multihash is the hash of the [root block](glossary.md#root), not a direct hash of the file's bytes.
93
+
94
+
When you add a file to IPFS, the data goes through several transformations:
95
+
96
+
1. **Chunking**: Large files are split into smaller [blocks](glossary.md#block) (typically 256KiB-1MiB each)
97
+
2. **Structuring**: These blocks are organized into a [DAG](glossary.md#dag) (directed acyclic graph)
98
+
3. **Encoding**: A [codec](glossary.md#codec) wraps the data with metadata describing its structure
99
+
100
+
The root block contains links to all the other blocks, and it's this root block that gets hashed to produce the Multihash in your CID.
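As a rough illustration of the steps above, the following Python sketch builds a toy two-level hash tree and compares its root hash with a plain checksum of the same bytes. The `toy_root_digest` helper and its block layout (child hashes simply concatenated) are assumptions made for illustration only; real UnixFS root blocks are dag-pb encoded, so the actual hashes produced by an IPFS implementation will differ from these.

```python
import hashlib


def sha256(data: bytes) -> bytes:
    """Return the raw sha2-256 digest of data."""
    return hashlib.sha256(data).digest()


def toy_root_digest(data: bytes, chunk_size: int) -> bytes:
    """Hash a toy 'root block' that only holds the hashes of its chunks.

    Hypothetical layout: this is NOT the dag-pb/UnixFS encoding, just a
    stand-in that shows the root hash covers links, not the file bytes.
    """
    # 1. Chunking: split the input into fixed-size pieces.
    chunks = [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]
    # 2. Structuring: the root links to each chunk by its hash.
    links = [sha256(chunk) for chunk in chunks]
    # 3. Encoding: here the "encoding" is plain concatenation of the links.
    root_block = b"".join(links)
    return sha256(root_block)


data = bytes(range(256)) * 4096  # 1 MiB of sample data

file_digest = sha256(data)
root_digest = toy_root_digest(data, 256 * 1024)

print("file hash:", file_digest.hex())
print("root hash:", root_digest.hex())
print("equal:", file_digest == root_digest)
```

Calling `toy_root_digest` with a different `chunk_size` changes the links stored in the root block, and therefore the root hash, even though the input bytes are the same.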
101
+
102
+
#### When CID hash equals file hash
103
+
104
+
There is one case where the Multihash does equal the file's hash: when the CID uses the `raw` [codec](glossary.md#codec) and the file fits in a single block. The `raw` codec stores bytes without any wrapper, so for small files added with `--raw-leaves`, the Multihash is a direct hash of the file contents.
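A minimal sketch of this case, assuming CIDv1, the `raw` codec (multicodec code `0x55`), sha2-256 (multihash code `0x12`), and base32 text encoding. The `raw_cid_v1` helper is hypothetical and not part of any IPFS library:

```python
import base64
import hashlib


def raw_cid_v1(data: bytes) -> str:
    """Build a CIDv1 string for a single raw block (illustrative helper)."""
    digest = hashlib.sha256(data).digest()
    # Multihash: <hash function code><digest length><digest>
    multihash = bytes([0x12, len(digest)]) + digest
    # CIDv1: <version><codec><multihash>. All values here are below 128,
    # so each varint fits in one byte.
    cid_bytes = bytes([0x01, 0x55]) + multihash
    # Multibase: the prefix "b" marks lowercase base32 without padding.
    encoded = base64.b32encode(cid_bytes).decode("ascii").lower().rstrip("=")
    return "b" + encoded


data = b"hello world\n"
cid = raw_cid_v1(data)

print(cid)
print(hashlib.sha256(data).hexdigest())
```

The 32 bytes that follow the four-byte prefix inside this CID are exactly the sha2-256 digest printed on the second line, which is what a plain file checksum would report for the same bytes.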
105
+
106
+
#### Same file, different CIDs
107
+
108
+
Two identical files can produce different CIDs. The CID depends on both the content *and* how that content is structured:
109
+
110
+
- **Chunk size**: Different chunking strategies produce different block trees
111
+
- **DAG layout**: Balanced trees vs. trickle DAGs organize blocks differently
112
+
- **Codec**: [UnixFS](glossary.md#unixfs) ([dag-pb](glossary.md#dag-pb)), [dag-cbor](glossary.md#dag-cbor), `raw`, and others each encode data differently
113
+
- **CID version**: [CIDv0](glossary.md#cid-v0) vs. [CIDv1](glossary.md#cid-v1) use different formats
114
+
- **Hash algorithm**: sha2-256, blake3, and others produce different hashes
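The last two items in this list can be sketched in Python. The example below takes one block and renders its identifier three ways: as CIDv0, as CIDv1 with sha2-256, and as CIDv1 with sha2-512 (used here in place of blake3 because it ships with the standard library). The `block` value is only a stand-in for the encoded bytes of a dag-pb root block, and the helper names are hypothetical.

```python
import base64
import hashlib

B58_ALPHABET = "123456789ABCDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnopqrstuvwxyz"
HASH_CODES = {"sha256": 0x12, "sha512": 0x13}  # multihash function codes
DAG_PB = 0x70  # multicodec code for dag-pb


def base58btc(data: bytes) -> str:
    """Encode bytes as base58btc, the text form used by CIDv0."""
    n = int.from_bytes(data, "big")
    out = ""
    while n > 0:
        n, rem = divmod(n, 58)
        out = B58_ALPHABET[rem] + out
    leading_zeros = len(data) - len(data.lstrip(b"\x00"))
    return "1" * leading_zeros + out


def base32_lower(data: bytes) -> str:
    """Encode bytes as lowercase base32 without padding."""
    return base64.b32encode(data).decode("ascii").lower().rstrip("=")


def multihash(block: bytes, algo: str) -> bytes:
    """Return <hash code><digest length><digest> for the given block."""
    digest = hashlib.new(algo, block).digest()
    return bytes([HASH_CODES[algo], len(digest)]) + digest


def cid_v1(block: bytes, algo: str) -> str:
    """CIDv1 string: multibase prefix 'b' + base32(<version><codec><multihash>)."""
    return "b" + base32_lower(bytes([0x01, DAG_PB]) + multihash(block, algo))


# Assumption: pretend these are the encoded bytes of a dag-pb root block.
block = b"stand-in for the encoded bytes of a dag-pb root block"

cid_v0 = base58btc(multihash(block, "sha256"))  # CIDv0 is a bare multihash
cid_v1_sha256 = cid_v1(block, "sha256")
cid_v1_sha512 = cid_v1(block, "sha512")

for label, cid in [("CIDv0", cid_v0),
                   ("CIDv1 sha2-256", cid_v1_sha256),
                   ("CIDv1 sha2-512", cid_v1_sha512)]:
    print(f"{label:15} {cid}")
```

The first two strings wrap the same sha2-256 digest in different formats, while the third identifies the same block with a different hash function. All three look unrelated as text.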
115
+
116
+
#### Why this flexibility matters
117
+
118
+
This is a feature, not a limitation. Different structures optimize for different use cases:
119
+
120
+
- **DAG layout** trades off seeking against appending: balanced DAGs enable fast random access in large files like videos, while trickle DAGs optimize for sequential, append-only data like logs
121
+
- **Chunking strategy** balances retrieval overhead against sync efficiency: large chunks mean fewer blocks for bulk downloads, while small chunks mean less data to transfer when syncing deltas. Strategies range from simple fixed-size chunking to content-defined algorithms like Rabin or Buzhash that fine-tune deduplication based on dataset characteristics
122
+
- **Hash function** varies by system: legacy decisions, regulatory requirements, or interoperability needs may dictate which algorithm to use
123
+
- **Directory sharding** threshold, in systems like [UnixFS](glossary.md#unixfs), determines when directories switch from flat listings to [HAMT](glossary.md#hamt-sharding) to seamlessly support huge directories with millions of files. This threshold also affects how much of the DAG needs to be recreated when a single file in the directory is modified
124
+
125
+
[UnixFS](glossary.md#unixfs) is the default format for files and directories, but you can use other codecs or create custom ones for specialized needs.
126
+
127
+
When you need reproducible CIDs across different tools, the community documents common parameter sets called [CID profiles](https://github.com/ipfs/specs/pull/499). These define standard combinations of chunking, DAG layout, and codec settings.
128
+
129
+
To explore how a CID is structured, use the [CID Inspector](https://cid.ipfs.tech/#bafybeicn7i3soqdgr7dwnrwytgq4zxy7a5jpkizrvhm5mv6bgjd32wm3q4). To see the DAG behind a CID, use the [DAG Explorer](https://explore.ipld.io/#/explore/bafybeicn7i3soqdgr7dwnrwytgq4zxy7a5jpkizrvhm5mv6bgjd32wm3q4).
docs/concepts/how-ipfs-works.md
+4 -4 lines changed: 4 additions & 4 deletions
@@ -45,13 +45,13 @@ IPFS represents data as content-addressed <VueCustomTooltip label="The term for
45
45
46
46
In IPFS, data is chunked into <VueCustomTooltip label="The term for a single unit of data in IPFS." underlined multiline is-medium>blocks</VueCustomTooltip>, which are assigned a unique identifier called a <VueCustomTooltip label="An address used to point to data in IPFS, based on the content itself, as opposed to the location." underlined multiline is-medium>Content Identifier (CID)</VueCustomTooltip>. In general, the CID is computed by combining the hash of the data with its <VueCustomTooltip label="Software capable of encoding and/or decoding data." underlined multiline is-medium>codec</VueCustomTooltip>. The codec is generated using <VueCustomTooltip label="A collection of interoperable, extensible protocols for making data self-describable." underlined multiline is-medium>Multiformats</VueCustomTooltip>.
47
47
48
-
CIDs are unique to the data from which they were computed, which provides IPFS with the following benefits:
49
-
- Data can be fetched based on its content, rather than its location.
50
-
- The CID of the data received can be computed and compared to the CID requested, to verify that the data is what was requested.
48
+
Because CIDs are based on content, not location:
49
+
- You can fetch data by *what it is*, not where it's stored.
50
+
- You can verify data by recomputing the CID and comparing it to what you requested.
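The verification step can be sketched in Python for the simplest case only: a single block addressed by a base32 CIDv1 with the `raw` codec and sha2-256. The `raw_cid_matches` helper is hypothetical; multi-block files need the full DAG to be walked, which this sketch does not attempt.

```python
import base64
import hashlib


def raw_cid_matches(cid: str, data: bytes) -> bool:
    """Check received bytes against a base32 CIDv1 (raw codec, sha2-256)."""
    if not cid.startswith("b"):
        raise ValueError("this sketch only handles base32 CIDv1 strings")
    body = cid[1:].upper()
    body += "=" * (-len(body) % 8)  # restore the padding base32 decoding expects
    cid_bytes = base64.b32decode(body)
    # Expected prefix: version 1, raw codec (0x55), sha2-256 (0x12), 32-byte digest.
    if cid_bytes[:4] != bytes([0x01, 0x55, 0x12, 0x20]):
        raise ValueError("expected CIDv1 + raw + sha2-256")
    # Recompute the hash of what was received and compare it to the CID's digest.
    return cid_bytes[4:] == hashlib.sha256(data).digest()


# Build a matching CID for demonstration purposes.
data = b"hello world\n"
prefix = bytes([0x01, 0x55, 0x12, 0x20])
cid = "b" + base64.b32encode(prefix + hashlib.sha256(data).digest()).decode("ascii").lower().rstrip("=")

print(raw_cid_matches(cid, data))
print(raw_cid_matches(cid, b"tampered bytes"))
```

Because the check depends only on the requested CID and the received bytes, it works the same way regardless of which peer or gateway supplied the data.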
51
51
52
52
:::callout
53
53
**Learn more**
54
-
Learn more about the concepts behind CIDs described here with the [the CID deep dive](../concepts/content-addressing.md#cid-versions).
54
+
Learn more about CIDs in the [CID deep dive](../concepts/content-addressing.md#cid-versions).
docs/quickstart/pin-cli.md
+9 -7 lines changed: 9 additions & 7 deletions
@@ -122,22 +122,22 @@ Each method will return a **CID** (Content Identifier) for your uploaded file. S
122
122
123
123
## CIDs explained
124
124
125
-
In IPFS, every file and directory is identified with a Content Identifier ([CID](../concepts/content-addressing.md)). The CID serves as the **permanent address** of the file and can be used by anyone to find it on the IPFS network.
125
+
In IPFS, every file and directory is identified with a Content Identifier ([CID](../concepts/content-addressing.md)), a unique identifier derived from the file's contents. The CID serves as the **permanent address** of the file and can be used by anyone to find it on any IPFS network or system.
126
126
127
-
When a file is first added to an IPFS node (like the image used in this guide), it's first transformed into a content-addressable representation in which the file is split into smaller chunks (if above ~1MB) which are linked together and hashed to produce the CID.
127
+
When you add a file to IPFS, the system generates its CID by hashing the contents. Larger files (above ~1MB) are split into smaller chunks, linked together, and hashed.
You can now share the CID with anyone and they can fetch the file using IPFS.
135
+
Once you have a CID, you can share it with anyone and they can fetch the file using IPFS.
136
136
137
-
To dive deeper into the anatomy of the CID, check out the [CID inspector](https://cid.ipfs.tech/#bafybeicn7i3soqdgr7dwnrwytgq4zxy7a5jpkizrvhm5mv6bgjd32wm3q4).
137
+
To explore the anatomy of a CID, check out the [CID Inspector](https://cid.ipfs.tech/#bafybeicn7i3soqdgr7dwnrwytgq4zxy7a5jpkizrvhm5mv6bgjd32wm3q4). To explore the anatomy of the DAG behind a CID, check out the [DAG Explorer](https://explore.ipld.io/#/explore/bafybeicn7i3soqdgr7dwnrwytgq4zxy7a5jpkizrvhm5mv6bgjd32wm3q4).
138
138
139
139
:::callout
140
-
The transformation into a content-addressable representation is a local operation that doesn't require any network connectivity. Many CLI tools perform this transformation locally before uploading.
140
+
**Important caveat:** Two identical files can produce different CIDs. The CID reflects the contents *and* how the file is processed: chunk size, DAG layout, hash algorithm, CID version, and other [UnixFS](https://specs.ipfs.tech/unixfs/) parameters. The same file processed with different parameters will produce different CIDs. See [CIDs are not file hashes](../concepts/content-addressing.md#cids-are-not-file-hashes) for details.