Skip to content

Latest commit

ย 

History

History
72 lines (54 loc) ยท 6.34 KB

File metadata and controls

72 lines (54 loc) ยท 6.34 KB

Hash Table

key์— ๋Œ€ํ•œ hash ๊ฐ’์„ ์ธ๋ฑ์Šค๋กœ ์‚ฌ์šฉํ•˜์—ฌ key-value ์Œ์˜ ๋ฐ์ดํ„ฐ๋ฅผ ์ €์žฅํ•˜๊ณ  ์กฐํšŒํ•˜๋ฉฐ, key-value ์Œ์˜ ๋ฐ์ดํ„ฐ์˜ ๊ฐœ์ˆ˜์— ๋”ฐ๋ผ ๋™์ ์œผ๋กœ ํฌ๊ธฐ๊ฐ€ ์ฆ๊ฐ€ํ•˜๋Š” ์ž๋ฃŒ๊ตฌ์กฐ์ด๋‹ค.

hash table์€ array๋กœ ์ด๋ฃจ์–ด์ ธ์žˆ๊ณ  array ๊ฐ๊ฐ์˜ ์ฃผ์†Œ๋ฅผ bucket ์ด๋ผ ๋ถ€๋ฅธ๋‹ค. hash function ์€ key-value์˜ key๊ฐ’์„ array์˜ index๋กœ ๋งคํ•‘์‹œํ‚ค๋Š” ํ•จ์ˆ˜์ด๋‹ค.

Hash Functin

ํ•ด์‹œ ํ•จ์ˆ˜๋Š” ์ž„์˜ ํฌ๊ธฐ์˜ ๋ฐ์ดํ„ฐ๋ฅผ ๊ณ ์ • ํฌ๊ธฐ์˜ ๊ฐ’์œผ๋กœ ๋งคํ•‘ํ•˜๋Š”๋ฐ ์‚ฌ์šฉ๋˜๋Š” ํ•จ์ˆ˜์ด๋‹ค.

  • hash function๋กœ hash table์ด๋ผ๊ณ  ๋ถˆ๋ฆฌ๋Š” array์— ์ €์žฅ๋  ๋ฐ์ดํ„ฐ์˜ ์œ„์น˜(index)๋ฅผ ๊ตฌํ•  ์ˆ˜ ์žˆ๋‹ค.
    • index๋ฅผ ๊ตฌํ•  ๋•Œ, ๋ณดํ†ต mod ์—ฐ์‚ฐ์„ ์‚ฌ์šฉํ•œ๋‹ค.
    • hash function์„ ํ†ตํ•ด index๋ฅผ ๊ตฌํ•˜๋Š” ๊ณผ์ •์„ hashing ์ด๋ผ๊ณ  ํ•œ๋‹ค.

Hash Collision

1/m ํ™•๋ฅ ์˜ ํ•ด์‹œ ์ถฉ๋Œ (hash table์˜ ํฌ๊ธฐ: m)

hash function์˜ ํ‘œํ˜„๋ฒ”์œ„๋ฅผ m์œผ๋กœ ์ขํž˜์œผ๋กœ์จ ์„œ๋กœ ๋‹ค๋ฅธ hashCode๊ฐ’์„ ๊ฐ€์ง„ ๊ฐ์ฒด๊ฐ€ 1/m ์˜ ํ™•๋ฅ ๋กœ ๋™์ผํ•œ ํ•ด์‹œ ๋ฒ„ํ‚ท ์ธ๋ฑ์Šค๋ฅผ ๊ฐ€์งˆ ์ˆ˜ ์žˆ๋Š” ํ•ด์‹œ ์ถฉ๋Œ(hash collision) ์ด ๋ฐœ์ƒํ•œ๋‹ค. ํ•ด์‹œ ์ถฉ๋Œ์€ hash function์„ ์•„๋ฌด๋ฆฌ ์ž˜ ๊ตฌํ˜„ํ•˜์—ฌ๋„ ์ƒ๊ด€์—†์ด ๋ฐœ์ƒํ•œ๋‹ค.

ํ•ด์‹œ ์ถฉ๋Œ ํ•ด๊ฒฐ๋ฐฉ๋ฒ•

Open Addressing (๊ฐœ๋ฐฉ ์ฃผ์†Œ๋ฒ•)

๋ฐ์ดํ„ฐ๋ฅผ ์‚ฝ์ž…ํ•˜๋ ค๋Š” hash bucket์ด ์ด๋ฏธ ์‚ฌ์šฉ์ค‘์ด๋ผ๋ฉด ๋น„์–ด์žˆ๋Š” bucket์„ ์ฐพ์•„ ๋ฐ์ดํ„ฐ๋ฅผ ์‚ฝ์ž…ํ•˜๋Š” ๋ฐฉ์‹์ด๋‹ค.

  • Worst Case์˜ ๊ฒฝ์šฐ ๋น„์–ด์žˆ๋Š” bucket์„ ์ฐพ์ง€ ๋ชปํ•˜๊ณ  ํƒ์ƒ‰์„ ์‹œ์ž‘ํ•œ ์œ„์น˜๋กœ ๋˜๋Œ์•„์˜ฌ ์ˆ˜ ์žˆ๋‹ค.
    • ๋ฐ์ดํ„ฐ๊ฐ€ ์กด์žฌํ•˜๋Š” bucket์ด ๋ชจ์—ฌ์žˆ์œผ๋ฉด Worse Case ๋ฐœ์ƒ ๋นˆ๋„๊ฐ€ ๋†’์•„์ง„๋‹ค.
  • ๋น„์–ด์žˆ๋Š” bucket์„ ํƒ์ƒ‰ํ•˜๋Š” ๋ฐฉ์‹์—๋Š” Linear Probing, Quadratic Probing, Double Hashing ์ด ์žˆ๋‹ค.
    • Linear Probing(์„ ํ˜• ํƒ์ƒ‰)
      • ํ•ด์‹œ ์ถฉ๋Œ์ด ๋ฐœ์ƒํ•œ ํ˜„์žฌ bucket index๋ถ€ํ„ฐ ๊ณ ์ •ํญ n๋งŒํผ ์ด๋™ํ•˜๋ฉด์„œ ๋น„์–ด์žˆ๋Š” bucket์„ ์ฐพ์•„ ๋ฐ์ดํ„ฐ๋ฅผ ์ €์žฅํ•˜๋Š” ๋ฐฉ์‹์ด๋‹ค.
      • Primary Clustering ๋ฌธ์ œ๊ฐ€ ๋ฐœ์ƒํ•œ๋‹ค.
      • ๋ฐ์ดํ„ฐ๊ฐ€ ์กด์žฌํ•˜๋Š” filled bucket๋“ค์ด ๋ชจ์—ฌ์žˆ๋‹ค๋ฉด ๋น„์–ด์žˆ๋Š” bucket์„ ์ฐพ๊ธฐ ๊นŒ์ง€์˜ ํƒ์ƒ‰ ์‹œ๊ฐ„์ด ๋Š˜์–ด๋‚œ๋‹ค. Primary Cluster๊ฐ€ ํ˜•์„ฑ๋˜๋ฉด ๋น ๋ฅด๊ฒŒ ์ปค์ง€๊ณ  ๊ฒฐ๊ตญ ์„ฑ๋Šฅ ์ €ํ•˜๋ฅผ ์•ผ๊ธฐํ•œ๋‹ค.
    • Quadratic Probing(์ œ๊ณฑ ํƒ์ƒ‰)
      • ํ•ด์‹œ ์ถฉ๋Œ์ด ๋ฐœ์ƒํ•œ ํ˜„์žฌ bucket index๋ถ€ํ„ฐ n^2 ๋งŒํผ ์ด๋™ํ•˜๋ฉด์„œ ๋น„์–ด์žˆ๋Š” bucket์„ ์ฐพ์•„ ๋ฐ์ดํ„ฐ๋ฅผ ์ €์žฅํ•˜๋Š” ๋ฐฉ์‹์œผ๋กœ Linear Probing ๋ฐฉ์‹์˜ Primary Clustering ๋ฐœ์ƒ ๊ฐ€๋Šฅ์„ฑ์„ ์ค„์ผ ์ˆ˜ ์žˆ๋‹ค.
      • Secondary Clustering ๋ฌธ์ œ๊ฐ€ ๋ฐœ์ƒํ•œ๋‹ค.
      • n^2 ๊ฐ„๊ฒฉ์œผ๋กœ filled bucket์ด ์กด์žฌํ•˜์—ฌ key์˜ hash๊ฐ’(index) ๋ณด๋‹ค ํ›จ์”ฌ ๋–จ์–ด์ง„ ๊ณณ์— ๋ฐ์ดํ„ฐ๊ฐ€ ์‚ฝ์ž…๋˜๋Š” ํ˜„์ƒ์ด๋‹ค. ์ด๋Š” filled bucket ๊ตฐ์ง‘์„ ํฌ๊ฒŒ๋งŒ๋“ค์ง€ ์•Š๊ธฐ ๋•Œ๋ฌธ์— Primary Clustering ๋ณด๋‹ค ๋œ ์‹ฌ๊ฐํ•œ ๋ฌธ์ œ์ด๋‹ค.
    • Double Hashing(์ด์ค‘ ํ•ด์‹ฑ)
      • hash๊ฐ’์„ ๋‹ค๋ฅธ hash function์œผ๋กœ ํ•œ๋ฒˆ ๋” ํ•ด์‹ฑํ•˜์—ฌ hash์˜ ๊ทœ์น™์„ฑ์„ ์—†์• ๋Š” ๋ฐฉ์‹์œผ๋กœ Secondary Clustering ๋ฐœ์ƒ ๊ฐ€๋Šฅ์„ฑ์„ ์ค„์ผ ์ˆ˜ ์žˆ๋‹ค.

Open Addressing์˜ ๋ฐ์ดํ„ฐ ํƒ์ƒ‰ ๋ฐ ์‚ญ์ œ

  • ํƒ์ƒ‰
    • target ๋ฐ์ดํ„ฐ๋ฅผ ์ฐพ๊ฑฐ๋‚˜ empty bucket์— ๋„๋‹ฌํ•˜๊ธฐ ์ „๊นŒ์ง€ ํƒ์ƒ‰(probing)์„ ์ง„ํ–‰ํ•œ๋‹ค. target ๋ฐ์ดํ„ฐ๋ฅผ ์ฐพ๋Š” ๊ณผ์ •์—์„œ empty bucket์— ๋„๋‹ฌํ•˜๋ฉด ํƒ์ƒ‰(probing)์„ ์ข…๋ฃŒํ•˜๋Š” ๋ฌธ์ œ์ ์ด ์žˆ๋‹ค.
  • ์‚ญ์ œ
    • ์ค‘๊ฐ„ ์š”์†Œ๋ฅผ ์‚ญ์ œํ•˜๊ฒŒ ๋˜๋ฉด ํƒ์ƒ‰ ์ค‘ empty bucket์„ ๋งŒ๋‚˜ ํƒ์ƒ‰์„ ์ข…๋ฃŒํ•˜๊ฒŒ ๋˜์–ด ์›ํ•˜๋Š” ์š”์†Œ๋ฅผ ํƒ์ƒ‰ํ•  ์ˆ˜ ์—†๋Š” ๋ฌธ์ œ๊ฐ€ ๋ฐœ์ƒํ•œ๋‹ค.
    • ์‚ญ์ œํ•œ ๋ฐ์ดํ„ฐ์˜ bucket์— dummy node ๋ฅผ ๋„ฃ๊ฑฐ๋‚˜ flag (Occupied, Empty, Deleted) ๋ฅผ ํ™œ์šฉํ•˜์—ฌ ํƒ์ƒ‰์ด ์˜ฌ๋ฐ”๋ฅด๊ฒŒ ์ง„ํ–‰๋˜๋„๋ก ํ•  ์ˆ˜ ์žˆ๋‹ค.

Separate Chaining (๋ถ„๋ฆฌ ์—ฐ๊ฒฐ๋ฒ•)

๊ฐ๊ฐ์˜ hash bucket์„ LinkedList๋ฅผ ๊ฐ€๋ฆฌํ‚ค๋Š” ํฌ์ธํ„ฐ๋กœ ๊ตฌ์„ฑํ•˜์—ฌ ํ•ด์‹œ ์ถฉ๋Œ์‹œ ํ•ด๋‹น bucket์˜ LinkedList์— ์ถ”๊ฐ€ํ•˜๋Š” ๋ฐฉ์‹์ด๋‹ค.

  • ์ผ๋ฐ˜์ ์œผ๋กœ Seperate Chaining ๋ฐฉ์‹์€ Open Addressing ๋ฐฉ์‹๋ณด๋‹ค ๋น ๋ฅด๋‹ค.

    • Open Addressing์€ ๋ฐ์ดํ„ฐ๊ฐ€ ์กด์žฌํ•˜๋Š” bucket์ด ๋ชจ์—ฌ์žˆ์œผ๋ฉด Worst Case ๋ฐœ์ƒ ๋นˆ๋„๊ฐ€ ๋†’์•„์ง„๋‹ค
    • Seperate Chaining์€ ํ•ด์‹œ ์ถœ๋Œ์„ ํ”ผํ•˜๋„๋ก ๋ณด์กฐ ํ•ด์‹œ ํ•จ์ˆ˜๋ฅผ ์กฐ์ •ํ•˜๋ฉด Worse Case์— ๊ฐ€๊นŒ์›Œ์ง€๋Š” ๋นˆ๋„๋ฅผ ์ค„์ผ ์ˆ˜ ์žˆ๋‹ค.
  • Java 7์—์„œ HashMap์€ Seperate Chaining ๋ฐฉ์‹์œผ๋กœ ๊ตฌํ˜„๋˜์–ด์žˆ๋‹ค.

  • Seperate Chaining ๋ฐฉ์‹์€ ๋ณด์กฐ ํ•ด์‹œ ํ•จ์ˆ˜๋ฅผ ์‚ฌ์šฉํ•˜์—ฌ ํ•ด์‹œ ์ถฉ๋Œ ๊ฐ€๋Šฅ์„ฑ์„ ์ค„์ธ๋‹ค.

  • Red Black Tree๋กœ ๊ตฌํ˜„ํ•œ Separate Chaining

    • ํ•˜๋‚˜์˜ hash bucket์— ํ•ด๋‹นํ•˜๋Š” LinkedList์˜ ๋ฐ์ดํ„ฐ ๊ฐœ์ˆ˜๊ฐ€ 8๊ฐœ๊ฐ€ ๋˜๋ฉด ๊ตฌ์กฐ๋ฅผ LinkedList -> Red-Black Tree ๋กœ ๋ณ€ํ™˜ํ•œ๋‹ค. LinkedList ํƒ์ƒ‰ ์„ฑ๋Šฅ์˜ Worst Case๋Š” O(N) ์ด์ง€๋งŒ Red-Black Tree์˜ ๊ฒฝ์šฐ O(logN)์ด๊ธฐ ๋•Œ๋ฌธ์ด๋‹ค.

    • ํ•˜๋‚˜์˜ hash bucket์— ํ• ๋‹น๋œ ๋ฐ์ดํ„ฐ ๊ฐœ์ˆ˜๊ฐ€ 6๊ฐœ์ด๋ฉด ๊ตฌ์กฐ๋ฅผ Red-Black Tree -> LinkedList๋กœ ๋ณ€ํ™˜ํ•œ๋‹ค. ๋ฐ์ดํ„ฐ ๊ฐœ์ˆ˜๊ฐ€ ์ ์œผ๋ฉด Red-Black Tree์™€ LinkedList๊ฐ„์˜ ํƒ์ƒ‰ ์„ฑ๋Šฅ ์ฐจ์ด๊ฐ€ ๊ฑฐ์˜ ์—†๊ณ , Red-Black Tree๋Š” LinkedList๋ณด๋‹ค ๋ฉ”๋ชจ๋ฆฌ ์‚ฌ์šฉ๋Ÿ‰์ด ๋งŽ๊ธฐ ๋•Œ๋ฌธ์ด๋‹ค.

    • ๋ณ€ํ™˜ ๊ธฐ์ค€์„ 6๊ฐœ, 8๊ฐœ ์ฒ˜๋Ÿผ 2๊ฐœ๋ผ๋Š” ์—ฌ์œ ๋ฅผ ๋‘์–ด ๊ณผ๋„ํ•œ ๊ตฌ์กฐ ๋ณ€ํ™˜์„ ๋ง‰์„ ์ˆ˜ ์žˆ๋‹ค. ๋งŒ์•ฝ ๋ณ€ํ™˜ ๊ธฐ์ค€์ด 6๊ฐœ , 7๊ฐœ ์ด๋ผ๋ฉด ๋ฐ์ดํ„ฐ๊ฐ€ ๋ฐ˜๋ณต์ ์œผ๋กœ ์‚ฝ์ž…, ์‚ญ์ œ๋˜๋Š” ๊ฒฝ์šฐ ๋ถˆํ•„์š”ํ•œ LinkedList โ†” Tree ๋ณ€ํ™˜์ด ์ผ์–ด๋‚˜ ์„ฑ๋Šฅ ์ €ํ•˜๊ฐ€ ๋ฐœ์ƒํ•  ์ˆ˜ ์žˆ๋‹ค.

Open Addressing vs Seperate Chaining

Open Addressing Seperate Chaining
Worst Case O(M) O(M)
์บ์‹œ ํšจ์œจ ์ข‹๋‹ค (์—ฐ์†๋œ ๊ณต๊ฐ„์— ๋ฐ์ดํ„ฐ๋ฅผ ์ €์žฅํ•˜๊ธฐ ๋•Œ๋ฌธ์ด๋‹ค) Open Addressing ๋ณด๋‹ค ์ข‹์ง€ ์•Š๋‹ค (ํ•ด์‹œ ์ถฉ๋Œ์‹œ LinkedList์— ๋ฐ์ดํ„ฐ๋ฅผ ์ €์žฅํ•˜๊ธฐ ๋•Œ๋ฌธ์ด๋‹ค)
๊ณต๊ฐ„ ํšจ์œจ ์ข‹๋‹ค (ํ•ด์‹œ ์ถฉ๋Œ์ด ๋ฐœ์ƒํ•œ ๊ฒฝ์šฐ์—๋„ probing์„ ํ†ตํ•ด ๋นˆ bucket์— ์ €์žฅ๋˜๊ธฐ ๋•Œ๋ฌธ์ด๋‹ค) Open Addressing ๋ณด๋‹ค ์ข‹์ง€ ์•Š๋‹ค (ํ•ด์‹œ ์ถฉ๋Œ์ด ๋ฐœ์ƒํ•˜๋ฉด LinkedList์— ์ถ”๊ฐ€๋˜๊ธฐ ๋•Œ๋ฌธ์— ์‚ฌ์šฉ๋˜์ง€ ์•Š๋Š” bucket์ด ์กด์žฌํ•œ๋‹ค)
Resizing ๋นˆ๋„ ๋†’๋‹ค (bucket ์‚ฌ์šฉ๋ฅ ์ด ๋†’์•„ load factor์˜ ์ž„๊ณ„์ ์— ์‰ฝ๊ฒŒ ๋„๋‹ฌํ•˜๊ธฐ ๋•Œ๋ฌธ์ด๋‹ค) Open Addressing ๋ณด๋‹ค ๋‚ฎ๋‹ค (bucket ์‚ฌ์šฉ๋ฅ ์ด ๋‚ฎ๊ธฐ ๋•Œ๋ฌธ์ด๋‹ค)

Dynamic Resizing

hash bucket์˜ ๊ฐœ์ˆ˜๊ฐ€ ์ ๋‹ค๋ฉด ๋ฉ”๋ชจ๋ฆฌ ์‚ฌ์šฉ์„ ์•„๋‚„ ์ˆ˜ ์žˆ์ง€๋งŒ ํ•ด์‹œ ์ถฉ๋Œ๋กœ ์ธํ•ด ์„ฑ๋Šฅ ์ €ํ•˜๊ฐ€ ๋ฐœ์ƒํ•œ๋‹ค. ๊ทธ๋Ÿฌ๋ฏ€๋กœ ์ž๋ฐ”์˜ HashMap์€ load factor๊ฐ€ ์ž„๊ณ„์ (0.75)์„ ๋„˜์–ด๊ฐˆ ๊ฒฝ์šฐ์— hash bucket ๊ฐœ์ˆ˜๋ฅผ 2๋ฐฐ๋กœ ๋Š˜๋ฆฐ๋‹ค.