Commit 4d6e764

1 parent a128927 commit 4d6e764

1 file changed

+187
-2
lines changed

SoC-2026-Ideas.md

Lines changed: 187 additions & 2 deletions
@@ -143,6 +143,140 @@ _Possible mentors_:
143143
* Siddharth Asthana < <siddharthasthana31@gmail.com> >
144144
* Lucas Seiki Oshiro < <lucasseikioshiro@gmail.com> >
145145

146+
### Improve disk space recovery for partial clones
147+
148+
Git's partial clone feature allows users to clone repositories without downloading
149+
all objects immediately, which is particularly useful for very large repositories.
150+
Objects are fetched on demand from "promisor remotes". However, over time,
151+
clients may accumulate large local blobs that are no longer needed but remain on disk,
152+
and currently there's no easy way to reclaim this space.
153+
154+
This project aims to improve `git-backfill` (or create a new command) to allow
155+
clients to remove large local blobs when they are available on a promisor remote.
156+
This would help users who want to reclaim disk space while maintaining the ability
157+
to re-fetch objects when needed.
158+
159+
The project involves:
160+
- Designing a safe mechanism to identify which blobs can be removed
161+
- Implementing the removal process while maintaining repository integrity
162+
- Ensuring removed objects can be transparently re-fetched when needed
163+
- Adding appropriate safeguards and user controls
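
As a rough illustration of the first bullet, the selection rule could look something like the sketch below. It is only a model of the decision, not Git code: the `Blob` record, its fields, and the `removable_blobs` helper are hypothetical names, and it assumes that "arrived in a pack fetched from a promisor remote" is an acceptable proxy for "can be re-fetched later".

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Blob:
    # Hypothetical record describing one local blob.
    oid: str                        # object ID (hex string)
    size: int                       # size in bytes
    in_promisor_pack: bool          # assumed proxy: blob came from a promisor remote
    reachable_from_worktree: bool   # True if the current checkout or index uses it


def removable_blobs(blobs, min_size):
    """Return blobs that could be dropped locally and re-fetched later."""
    return [
        b for b in blobs
        if b.size >= min_size               # only large blobs are worth removing
        and b.in_promisor_pack              # must still be obtainable from a promisor remote
        and not b.reachable_from_worktree   # never remove what the checkout needs
    ]
```

A real implementation would also have to decide how to treat blobs referenced by unpushed local history, which this sketch leaves out.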
164+
165+
**Getting started:** Build Git from source, set up a partial clone and experiment
166+
with promisor remotes, study the existing `git-backfill` command (if available)
167+
or related functionality, understand how Git tracks and fetches objects from
168+
promisor remotes, review documentation on partial clones in
169+
`Documentation/technical/partial-clone.adoc`, and submit a micro-patch to
170+
demonstrate familiarity with the codebase.
171+
172+
**Resources:**
173+
- [Partial clone documentation](https://git-scm.com/docs/partial-clone)
174+
- [Git Protocol v2 documentation](https://git-scm.com/docs/gitprotocol-v2)
175+
176+
_Expected Project Size_: 175 hours or 350 hours
177+
178+
_Difficulty_: Medium to Hard
179+
180+
_Languages_: C, shell (bash)
181+
182+
_Possible mentors_:
183+
184+
* Christian Couder < <christian.couder@gmail.com> >
185+
* Karthik Nayak < <karthik.188@gmail.com> >
186+
* Justin Tobler < <jltobler@gmail.com> >
187+
* Siddharth Asthana < <siddharthasthana31@gmail.com> >
188+
* Ayush Chandekar < <ayu.chandekar@gmail.com> >
189+
* Lucas Seiki Oshiro < <lucasseikioshiro@gmail.com> >
190+
191+
### Implement promisor remote fetch ordering
192+
193+
When a Git repository is configured with multiple promisor remotes, there's
194+
currently no mechanism to specify or optimize the order in which these remotes
195+
should be queried when fetching missing objects. Different remotes may have
196+
different performance characteristics, costs, or reliability, making fetch
197+
order an important consideration.
198+
199+
This project aims to implement a fetch ordering mechanism for multiple promisor
200+
remotes. The order could be:
201+
- Configured locally by the client
202+
- Advertised by servers through the promisor-remote protocol
203+
- Determined dynamically based on network conditions or other heuristics
204+
205+
The key challenge is designing a flexible system that allows servers to
206+
communicate their preferred fetch order to clients (to ensure optimal
207+
performance and cost management) while still allowing client-side overrides
208+
when appropriate.
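
One way to picture the precedence described above is the sketch below: an explicit client override wins, server hints come next, and the local configuration order is the fallback. The function name `resolve_fetch_order` and its parameters are hypothetical; Git has no such API today, and the actual precedence rules are exactly what the project would need to design.

```python
def resolve_fetch_order(configured, advertised=None, client_override=None):
    """Pick the order in which promisor remotes are tried.

    configured: remote names in local configuration order.
    advertised: optional order suggested by the server (hypothetical input).
    client_override: optional explicit order set by the user (hypothetical input).
    """
    known = set(configured)
    order = []
    # Highest-priority source first: client override, then server hints,
    # then whatever is left in configuration order.
    for source in (client_override or [], advertised or [], configured):
        for name in source:
            # Ignore remotes that are not configured locally, and duplicates.
            if name in known and name not in order:
                order.append(name)
    return order
```

Ignoring advertised names that are not configured locally is a deliberate choice in this sketch: a server hint can reorder remotes but cannot add new ones.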
209+
210+
**Getting started:** Build Git from source, set up a repository with multiple
211+
promisor remotes and experiment with object fetching, study how Git currently
212+
handles multiple remotes, review the promisor-remote protocol in
213+
`Documentation/gitprotocol-v2.adoc`, understand partial clone implementation,
214+
and submit a micro-patch to demonstrate familiarity with the codebase.
215+
216+
**Resources:**
217+
- [Partial clone documentation](https://git-scm.com/docs/partial-clone)
218+
- [Git Protocol v2 documentation](https://git-scm.com/docs/gitprotocol-v2)
219+
220+
_Expected Project Size_: 175 hours or 350 hours
221+
222+
_Difficulty_: Medium to Hard
223+
224+
_Languages_: C, shell (bash)
225+
226+
_Possible mentors_:
227+
228+
* Christian Couder < <christian.couder@gmail.com> >
229+
* Karthik Nayak < <karthik.188@gmail.com> >
230+
* Justin Tobler < <jltobler@gmail.com> >
231+
* Siddharth Asthana < <siddharthasthana31@gmail.com> >
232+
* Ayush Chandekar < <ayu.chandekar@gmail.com> >
233+
* Lucas Seiki Oshiro < <lucasseikioshiro@gmail.com> >
234+
235+
### Enhance promisor-remote protocol for better-connected remotes
236+
237+
Currently, the promisor-remote protocol allows servers to advertise remotes
238+
that the server itself uses as promisor remotes. However, as suggested by
239+
Junio Hamano, it would be more useful if servers could advertise
240+
"better-connected" remotes, that is, remotes that might not be promisor remotes
241+
for the server but would be good choices for the client.
242+
243+
This enhancement would allow servers to guide clients toward optimal remote
244+
configurations, potentially improving performance and reducing load on
245+
individual servers by distributing requests across a network of remotes.
246+
247+
This project involves:
248+
- Extending the promisor-remote protocol to support advertising
249+
better-connected remotes
250+
- Implementing server-side logic to determine and advertise appropriate remotes
251+
- Implementing client-side handling of these advertisements
252+
- Designing the protocol extension with backward compatibility in mind
253+
- Testing with various network topologies
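
To make the backward-compatibility bullet more concrete, here is a small sketch of a client-side parser. It assumes the advertisement keeps its current general shape (entries separated by `;`, each made of comma-separated `field=value` pairs with URL-encoded values). The `kind` field mentioned in the usage note is purely hypothetical, as are the helper names; nothing here is part of the protocol today.

```python
from urllib.parse import unquote

# Fields an older client is assumed to understand.
KNOWN_FIELDS = {"name", "url"}


def parse_promisor_advertisement(value):
    """Parse 'name=a,url=b;name=c,url=d' into a list of dicts."""
    remotes = []
    for entry in value.split(";"):
        if not entry:
            continue
        fields = {}
        for item in entry.split(","):
            key, sep, raw = item.partition("=")
            if sep:  # skip malformed items without '='
                fields[key] = unquote(raw)
        if "name" in fields:  # an entry without a name is not usable
            remotes.append(fields)
    return remotes


def known_fields_only(remote):
    """Drop fields this client does not understand (older-client behaviour)."""
    return {k: v for k, v in remote.items() if k in KNOWN_FIELDS}
```

With an entry such as `name=mirror,url=...,kind=better-connected`, a newer client could act on `kind` while an older one, modeled by `known_fields_only`, would silently ignore it.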
254+
255+
**Getting started:** Build Git from source, study the current promisor-remote
256+
protocol implementation, read Junio's suggestion in `Documentation/gitprotocol-v2.adoc`,
257+
understand how Git currently advertises and uses promisor remotes, set up test
258+
scenarios with multiple interconnected remotes, and submit a micro-patch to
259+
demonstrate familiarity with the codebase.
260+
261+
**Resources:**
262+
- [Partial clone documentation](https://git-scm.com/docs/partial-clone)
263+
- [Git Protocol v2 documentation - promisor remote section](https://git-scm.com/docs/gitprotocol-v2#_promisor_remotepr_info)
264+
265+
_Expected Project Size_: 175 hours or 350 hours
266+
267+
_Difficulty_: Hard
268+
269+
_Languages_: C, shell (bash)
270+
271+
_Possible mentors_:
272+
273+
* Christian Couder < <christian.couder@gmail.com> >
274+
* Karthik Nayak < <karthik.188@gmail.com> >
275+
* Justin Tobler < <jltobler@gmail.com> >
276+
* Siddharth Asthana < <siddharthasthana31@gmail.com> >
277+
* Ayush Chandekar < <ayu.chandekar@gmail.com> >
278+
* Lucas Seiki Oshiro < <lucasseikioshiro@gmail.com> >
279+
146280
### Complete and extend the `remote-object-info` command for `git cat-file`
147281

148282
From around June 2024 to March 2025, work was undertaken by Eric Ju to add a
@@ -188,10 +322,61 @@ _Languages_: C, shell(bash)
188322

189323
_Possible mentors_:
190324

325+
* Christian Couder < christian.couder@gmail.com >
326+
* Karthik Nayak < karthik.188@gmail.com >
327+
* Justin Tobler < jltobler@gmail.com >
328+
* Ayush Chandekar < ayu.chandekar@gmail.com >
329+
* Siddharth Asthana < siddharthasthana31@gmail.com >
330+
* Lucas Seiki Oshiro < lucasseikioshiro@gmail.com >
331+
* Chandra Pratap < chandrapratap3519@gmail.com >
332+
333+
### Improve signature handling in fast-export/fast-import and git-filter-repo
334+
335+
Git's `fast-export` and `fast-import` commands are powerful tools for
336+
repository manipulation and migration, and `git-filter-repo` builds on
337+
these to provide advanced repository filtering capabilities. However,
338+
handling of commit and tag signatures during these operations could
339+
be significantly improved.
340+
341+
Currently, signatures may be lost or become invalid when objects are
342+
exported and imported, which can be problematic for repositories that
343+
rely on signed commits or tags for security and verification purposes.
344+
345+
This project aims to improve how these tools handle signatures by:
346+
- Preserving signature information during export/import operations
347+
- Providing options for signature handling (preserve, strip, re-sign, etc.)
348+
- Ensuring signature validity is maintained or appropriately flagged
349+
- Extending `git-filter-repo` to handle signatures correctly
350+
- Adding tests and documentation for signature-related workflows
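
The "options for signature handling" bullet can be pictured as a small policy function. The sketch below is an illustration only: the mode names are loosely modeled on the values accepted by `git fast-export --signed-tags`, and the `handle_signature` helper is hypothetical rather than anything in Git or `git-filter-repo`. Re-signing is left out because it needs access to a signing key.

```python
import warnings

# Illustrative mode names; an actual option set is up to the project design.
MODES = {"verbatim", "warn-verbatim", "strip", "warn-strip", "abort"}


def handle_signature(signature, mode):
    """Return the signature to emit (or None) for an exported object."""
    if mode not in MODES:
        raise ValueError(f"unknown signature mode: {mode}")
    if signature is None:
        return None  # unsigned objects pass through untouched
    if mode == "abort":
        raise RuntimeError("encountered a signed object")
    if mode.startswith("warn-"):
        warnings.warn(f"signed object handled with mode {mode}")
    # Keep the signature for the verbatim modes, drop it otherwise.
    return signature if mode.endswith("verbatim") else None
```

Keeping a signature verbatim does not make it valid: if the object's content or ancestry changed during filtering, the preserved signature will no longer verify, which is why flagging matters as much as preserving.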
351+
352+
**Note:** This project may conflict with ongoing work by GitLab
353+
developers (including Christian Couder) on signature handling. Applicants
354+
should coordinate with mentors before proposing this project to ensure the
355+
work would not duplicate ongoing efforts.
356+
357+
**Getting started:** Build Git from source, experiment with `git fast-export`
358+
and `git fast-import` on repositories with signed commits and tags, study
359+
the current signature handling code, review `git-filter-repo` functionality,
360+
understand GPG signature verification in Git, and submit a micro-patch to
361+
demonstrate familiarity with the codebase.
362+
363+
**Resources:**
364+
- [git-fast-export documentation](https://git-scm.com/docs/git-fast-export)
365+
- [git-fast-import documentation](https://git-scm.com/docs/git-fast-import)
366+
- [git-filter-repo project](https://github.com/newren/git-filter-repo)
367+
- [Git signature verification documentation](https://git-scm.com/docs/git-verify-commit)
368+
369+
_Expected Project Size_: 175 hours or 350 hours
370+
371+
_Difficulty_: Medium to Hard
372+
373+
_Languages_: C, Python (for git-filter-repo), shell (bash)
374+
375+
_Possible mentors_:
376+
191377
* Christian Couder < <christian.couder@gmail.com> >
192378
* Karthik Nayak < <karthik.188@gmail.com> >
193379
* Justin Tobler < <jltobler@gmail.com> >
194-
* Ayush Chandekar < <ayu.chandekar@gmail.com> >
195380
* Siddharth Asthana < <siddharthasthana31@gmail.com> >
381+
* Ayush Chandekar < <ayu.chandekar@gmail.com> >
196382
* Lucas Seiki Oshiro < <lucasseikioshiro@gmail.com> >
197-
* Chandra Pratap < <chandrapratap3519@gmail.com> >
