Commit a49895b

xdl_change_compact(): introduce the concept of a change group

The idea of xdl_change_compact() is fairly simple:

* Proceed through groups of changed lines in the file to be compacted,
  keeping track of the corresponding location in the "other" file.

* If possible, slide the group up and down to try to give the most
  aesthetically pleasing diff. Whenever it is slid, the current location
  in the other file needs to be adjusted.

But these simple concepts are obfuscated by a lot of index handling that
is written in terse, subtle, and varied patterns. I found it very hard
to convince myself that the function was correct.

So introduce a "struct group" that represents a group of changed lines
in a file. Add some functions that perform elementary operations on
groups:

* Initialize a group to the first group in a file
* Move to the next or previous group in a file
* Slide a group up or down

Even though the resulting code is longer, I think it is easier to
understand and review. Its performance is not changed appreciably
(though it would be if `group_next()` and `group_previous()` were not
inlined).

...and in fact, the rewriting helped me discover another bug in the
--compaction-heuristic code: The update of blank_lines was never done
for the highest possible position of the group. This means that it could
fail to slide the group to its highest possible position, even if that
position had a blank line as its last line. So for example, it yielded
the following diff:

    $ git diff --no-index --compaction-heuristic a.txt b.txt
    diff --git a/a.txt b/b.txt
    index e53969f..0d60c5fe 100644
    --- a/a.txt
    +++ b/b.txt
    @@ -1,3 +1,7 @@
     1
     A
    +
    +B
    +
    +A
     2

when in fact the following diff is better (according to the rules of
--compaction-heuristic):

    $ git diff --no-index --compaction-heuristic a.txt b.txt
    diff --git a/a.txt b/b.txt
    index e53969f..0d60c5fe 100644
    --- a/a.txt
    +++ b/b.txt
    @@ -1,3 +1,7 @@
     1
    +A
    +
    +B
    +
     A
     2

The new code gives the bottom answer.

Original Git commit: e8adf23d1ee97b57c8aea32ee8365203b77c0e42
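To make the group operations concrete, here is a minimal, self-contained sketch that replays the b.txt example from the message above on a toy model. It is not the xdiff code: `struct toy_file`, the `toy_*` functions, and the use of `strcmp()` in place of `recs_match()` are assumptions made purely for illustration, and the sketch ignores the "other" file and the --compaction-heuristic rules entirely. It only shows that the added group in b.txt can occupy the two positions shown in the two diffs.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical stand-in for xdfile_t: lines are plain C strings. */
struct toy_file {
	long nrec;          /* number of lines */
	const char **recs;  /* the lines themselves */
	char *rchg;         /* rchg[i] != 0 if line i is changed;
	                     * rchg[-1] and rchg[nrec] must be 0 */
};

struct group {
	long start; /* first changed line of the group */
	long end;   /* first unchanged line after the group */
};

/* Point g at the first (possibly empty) group in f. */
static void toy_group_init(struct toy_file *f, struct group *g)
{
	g->start = g->end = 0;
	while (f->rchg[g->end])
		g->end++;
}

/* Advance g to the next group; return -1 if already at the end. */
static int toy_group_next(struct toy_file *f, struct group *g)
{
	if (g->end == f->nrec)
		return -1;
	g->start = g->end + 1;
	for (g->end = g->start; f->rchg[g->end]; g->end++)
		;
	return 0;
}

/* Slide a non-empty group one line toward the start of the file. */
static int toy_slide_up(struct toy_file *f, struct group *g)
{
	assert(g->start < g->end); /* callers skip empty groups */
	if (g->start > 0 &&
	    !strcmp(f->recs[g->start - 1], f->recs[g->end - 1])) {
		f->rchg[--g->start] = 1;
		f->rchg[--g->end] = 0;
		while (f->rchg[g->start - 1]) /* merge with a previous group */
			g->start--;
		return 0;
	}
	return -1;
}

/* Slide a non-empty group one line toward the end of the file. */
static int toy_slide_down(struct toy_file *f, struct group *g)
{
	assert(g->start < g->end); /* callers skip empty groups */
	if (g->end < f->nrec &&
	    !strcmp(f->recs[g->start], f->recs[g->end])) {
		f->rchg[g->start++] = 0;
		f->rchg[g->end++] = 1;
		while (f->rchg[g->end]) /* merge with a following group */
			g->end++;
		return 0;
	}
	return -1;
}

/* b.txt from the commit message; lines 2..5 start out marked as added. */
static const char *toy_lines[] = { "1", "A", "", "B", "", "A", "2" };
static char toy_flags[] = { 0, /* sentinel at -1 */
			    0, 0, 1, 1, 1, 1, 0,
			    0  /* sentinel at nrec */ };
static struct toy_file toy_b = { 7, toy_lines, toy_flags + 1 };
```

Walking `toy_b` with `toy_group_init()` and two calls to `toy_group_next()` reaches the group covering lines 2 to 5 (the first diff above). One call to `toy_slide_up()` moves it to lines 1 to 4 (the second diff), after which no further slide in either direction is possible.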
1 parent 09fb5b2 commit a49895b

1 file changed: +185 -78 lines changed

src/xdiff/xdiffi.c

Lines changed: 185 additions & 78 deletions
@@ -412,104 +412,211 @@ static int recs_match(xrecord_t *rec1, xrecord_t *rec2, long flags)
 			     flags));
 }
 
-int xdl_change_compact(xdfile_t *xdf, xdfile_t *xdfo, long flags) {
-	long ix, ixo, ixs, ixref, grpsiz, nrec = xdf->nrec;
-	char *rchg = xdf->rchg, *rchgo = xdfo->rchg;
-	xrecord_t **recs = xdf->recs;
+/*
+ * Represent a group of changed lines in an xdfile_t (i.e., a contiguous group
+ * of lines that was inserted or deleted from the corresponding version of the
+ * file). We consider there to be such a group at the beginning of the file, at
+ * the end of the file, and between any two unchanged lines, though most such
+ * groups will usually be empty.
+ *
+ * If the first line in a group is equal to the line following the group, then
+ * the group can be slid down. Similarly, if the last line in a group is equal
+ * to the line preceding the group, then the group can be slid up. See
+ * group_slide_down() and group_slide_up().
+ *
+ * Note that loops that are testing for changed lines in xdf->rchg do not need
+ * index bounding since the array is prepared with a zero at position -1 and N.
+ */
+struct group {
+	/*
+	 * The index of the first changed line in the group, or the index of
+	 * the unchanged line above which the (empty) group is located.
+	 */
+	long start;
 
 	/*
-	 * This is the same of what GNU diff does. Move back and forward
-	 * change groups for a consistent and pretty diff output. This also
-	 * helps in finding joinable change groups and reduce the diff size.
+	 * The index of the first unchanged line after the group. For an empty
+	 * group, end is equal to start.
 	 */
-	for (ix = ixo = 0;;) {
-		/*
-		 * Find the first changed line in the to-be-compacted file.
-		 * We need to keep track of both indexes, so if we find a
-		 * changed lines group on the other file, while scanning the
-		 * to-be-compacted file, we need to skip it properly. Note
-		 * that loops that are testing for changed lines on rchg* do
-		 * not need index bounding since the array is prepared with
-		 * a zero at position -1 and N.
-		 */
-		for (; ix < nrec && !rchg[ix]; ix++)
-			while (rchgo[ixo++]);
-		if (ix == nrec)
-			break;
+	long end;
+};
+
+/*
+ * Initialize g to point at the first group in xdf.
+ */
+static void group_init(xdfile_t *xdf, struct group *g)
+{
+	g->start = g->end = 0;
+	while (xdf->rchg[g->end])
+		g->end++;
+}
+
+/*
+ * Move g to describe the next (possibly empty) group in xdf and return 0. If g
+ * is already at the end of the file, do nothing and return -1.
+ */
+static inline int group_next(xdfile_t *xdf, struct group *g)
+{
+	if (g->end == xdf->nrec)
+		return -1;
+
+	g->start = g->end + 1;
+	for (g->end = g->start; xdf->rchg[g->end]; g->end++)
+		;
+
+	return 0;
+}
+
+/*
+ * Move g to describe the previous (possibly empty) group in xdf and return 0.
+ * If g is already at the beginning of the file, do nothing and return -1.
+ */
+static inline int group_previous(xdfile_t *xdf, struct group *g)
+{
+	if (g->start == 0)
+		return -1;
+
+	g->end = g->start - 1;
+	for (g->start = g->end; xdf->rchg[g->start - 1]; g->start--)
+		;
+
+	return 0;
+}
+
+/*
+ * If g can be slid toward the end of the file, do so, and if it bumps into a
+ * following group, expand this group to include it. Return 0 on success or -1
+ * if g cannot be slid down.
+ */
+static int group_slide_down(xdfile_t *xdf, struct group *g, long flags)
+{
+	if (g->end < xdf->nrec &&
+	    recs_match(xdf->recs[g->start], xdf->recs[g->end], flags)) {
+		xdf->rchg[g->start++] = 0;
+		xdf->rchg[g->end++] = 1;
+
+		while (xdf->rchg[g->end])
+			g->end++;
+
+		return 0;
+	} else {
+		return -1;
+	}
+}
+
+/*
+ * If g can be slid toward the beginning of the file, do so, and if it bumps
+ * into a previous group, expand this group to include it. Return 0 on success
+ * or -1 if g cannot be slid up.
+ */
+static int group_slide_up(xdfile_t *xdf, struct group *g, long flags)
+{
+	if (g->start > 0 &&
+	    recs_match(xdf->recs[g->start - 1], xdf->recs[g->end - 1], flags)) {
+		xdf->rchg[--g->start] = 1;
+		xdf->rchg[--g->end] = 0;
+
+		while (xdf->rchg[g->start - 1])
+			g->start--;
+
+		return 0;
+	} else {
+		return -1;
+	}
+}
+
+static void xdl_bug(const char *msg)
+{
+	fprintf(stderr, "BUG: %s\n", msg);
+	exit(1);
+}
+
+/*
+ * Move back and forward change groups for a consistent and pretty diff output.
+ * This also helps in finding joinable change groups and reducing the diff
+ * size.
+ */
+int xdl_change_compact(xdfile_t *xdf, xdfile_t *xdfo, long flags) {
+	struct group g, go;
+	long earliest_end, end_matching_other;
+	long groupsize;
+
+	group_init(xdf, &g);
+	group_init(xdfo, &go);
+
+	while (1) {
+		/* If the group is empty in the to-be-compacted file, skip it: */
+		if (g.end == g.start)
+			goto next;
 
 		/*
-		 * Record the start of a changed-group in the to-be-compacted file
-		 * and find the end of it, on both to-be-compacted and other file
-		 * indexes (ix and ixo).
+		 * Now shift the change up and then down as far as possible in
+		 * each direction. If it bumps into any other changes, merge them.
 		 */
-		ixs = ix;
-		for (ix++; rchg[ix]; ix++);
-		for (; rchgo[ixo]; ixo++);
-
 		do {
-			grpsiz = ix - ixs;
+			groupsize = g.end - g.start;
 
 			/*
-			 * If the line before the current change group, is equal to
-			 * the last line of the current change group, shift backward
-			 * the group.
+			 * Keep track of the last "end" index that causes this
+			 * group to align with a group of changed lines in the
+			 * other file. -1 indicates that we haven't found such
+			 * a match yet:
 			 */
-			while (ixs > 0 && recs_match(recs[ixs - 1], recs[ix - 1], flags)) {
-				rchg[--ixs] = 1;
-				rchg[--ix] = 0;
-
-				/*
-				 * This change might have joined two change groups,
-				 * so we try to take this scenario in account by moving
-				 * the start index accordingly (and so the other-file
-				 * end-of-group index).
-				 */
-				for (; rchg[ixs - 1]; ixs--);
-				while (rchgo[--ixo]);
-			}
+			end_matching_other = -1;
+
+			/* Shift the group backward as much as possible: */
+			while (!group_slide_up(xdf, &g, flags))
+				if (group_previous(xdfo, &go))
+					xdl_bug("group sync broken sliding up");
 
 			/*
-			 * Record the end-of-group position in case we are matched
-			 * with a group of changes in the other file (that is, the
-			 * change record before the end-of-group index in the other
-			 * file is set).
+			 * This is this highest that this group can be shifted.
+			 * Record its end index:
 			 */
-			ixref = rchgo[ixo - 1] ? ix: nrec;
+			earliest_end = g.end;
+
+			if (go.end > go.start)
+				end_matching_other = g.end;
 
+			/* Now shift the group forward as far as possible: */
+			while (1) {
+				if (group_slide_down(xdf, &g, flags))
+					break;
+				if (group_next(xdfo, &go))
+					xdl_bug("group sync broken sliding down");
+
+				if (go.end > go.start)
+					end_matching_other = g.end;
+			}
+		} while (groupsize != g.end - g.start);
+
+		if (g.end == earliest_end) {
+			/* no shifting was possible */
+		} else if (end_matching_other != -1) {
 			/*
-			 * If the first line of the current change group, is equal to
-			 * the line next of the current change group, shift forward
-			 * the group.
+			 * Move the possibly merged group of changes back to line
+			 * up with the last group of changes from the other file
+			 * that it can align with.
 			 */
-			while (ix < nrec && recs_match(recs[ixs], recs[ix], flags)) {
-				rchg[ixs++] = 0;
-				rchg[ix++] = 1;
-
-				/*
-				 * This change might have joined two change groups,
-				 * so we try to take this scenario in account by moving
-				 * the start index accordingly (and so the other-file
-				 * end-of-group index). Keep tracking the reference
-				 * index in case we are shifting together with a
-				 * corresponding group of changes in the other file.
-				 */
-				for (; rchg[ix]; ix++);
-				while (rchgo[++ixo])
-					ixref = ix;
+			while (go.end == go.start) {
+				if (group_slide_up(xdf, &g, flags))
+					xdl_bug("match disappeared");
+				if (group_previous(xdfo, &go))
+					xdl_bug("group sync broken sliding to match");
 			}
-		} while (grpsiz != ix - ixs);
-
-		/*
-		 * Try to move back the possibly merged group of changes, to match
-		 * the recorded position in the other file.
-		 */
-		while (ixref < ix) {
-			rchg[--ixs] = 1;
-			rchg[--ix] = 0;
-			while (rchgo[--ixo]);
 		}
+
+	next:
+		/* Move past the just-processed group: */
+		if (group_next(xdf, &g))
+			break;
+		if (group_next(xdfo, &go))
+			xdl_bug("group sync broken moving to next group");
 	}
 
+	if (!group_next(xdfo, &go))
+		xdl_bug("group sync broken at end of file");
+
 	return 0;
 }
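The "group sync broken" checks in the new xdl_change_compact() rely on both files having the same number of groups. That follows from the definition in the `struct group` comment: there is one group before the first line, one after the last line, and one between any two unchanged lines, so a file with U unchanged lines has U + 1 groups, and both sides of a diff share the same unchanged lines. The sketch below illustrates this counting on the a.txt/b.txt example from the commit message. `toy_count_groups()` and the two flag arrays are hypothetical names introduced here, not part of xdiff.

```c
#include <assert.h>

/*
 * Count the (possibly empty) groups described by rchg[0..nrec-1], walking
 * them the same way group_init() and group_next() do. As in xdiff, rchg[-1]
 * and rchg[nrec] are assumed to be zero sentinels.
 */
static long toy_count_groups(const char *rchg, long nrec)
{
	long start, end = 0, count = 1;

	assert(nrec >= 0);

	/* First group: extends over any changed lines at the top. */
	while (rchg[end])
		end++;

	/* Each unchanged line is followed by exactly one more group. */
	while (end != nrec) {
		start = end + 1;
		for (end = start; rchg[end]; end++)
			;
		count++;
	}
	return count;
}

/* a.txt: "1", "A", "2" with nothing changed (sentinels at both ends). */
static const char toy_a_flags[] = { 0, 0, 0, 0, 0 };

/* b.txt: "1", "A", "", "B", "", "A", "2" with lines 2..5 added. */
static const char toy_b_flags[] = { 0, 0, 0, 1, 1, 1, 1, 0, 0 };
```

Calling `toy_count_groups(toy_a_flags + 1, 3)` and `toy_count_groups(toy_b_flags + 1, 7)` yields the same count for both files, which is why stepping `g` and `go` forward together can be expected to reach the end of both files at the same time.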
