Demonstrate use of Git filters to format source files during checkout, staging, and diffing.
The C source files use 6-space banner-style indentation, which was the standard used in MuseScore Studio's source code prior to version 4. This isn't a popular style, but the point is to demonstrate how you can use a different coding style on your machine without affecting how the code looks for other developers.
You don't actually need to compile the C code in order to understand and use this demo.
You need these in PATH:
- Git
- Bash (version 4.2 or higher, to enable shopt -s lastpipe)
- Uncrustify (ideally version 0.73, see instructions)
Uncrustify's output can change between versions, so it's best to use a fixed version.
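If you want to check these prerequisites before going further, a small Bash sketch like the one below can help. The helper names missing_tools and bash_has_lastpipe are made up for this example. To see which Uncrustify version you have, run uncrustify --version.

```bash
#!/usr/bin/env bash
# Sketch: check the prerequisites listed above.
# missing_tools and bash_has_lastpipe are hypothetical helper names.

# Print each named command that cannot be found in PATH (one per line).
missing_tools() {
  local tool
  for tool in "$@"; do
    command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
  done
}

# Succeed if the running Bash is 4.2 or newer (needed for `shopt -s lastpipe`).
bash_has_lastpipe() {
  (( BASH_VERSINFO[0] > 4 || (BASH_VERSINFO[0] == 4 && BASH_VERSINFO[1] >= 2) ))
}

# Usage:
#   missing_tools git bash uncrustify   # prints nothing if all three are in PATH
#   bash_has_lastpipe && echo "Bash is new enough"
```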
git clone https://github.com/shoogle/git-filter-demo.git
cd git-filter-demo
Or fork the project on GitHub and clone your fork.
Let's define two Git aliases to list files and their attributes as declared in .gitattributes.
git config --global alias.ls-attrs '!f() { git ls-files "$@" | git check-attr --all --stdin ;}; f'
git config --global alias.ls-attr '!f() {
a="${1%%=*}" p=""
case "$a" in
--*) a="${a#--}" v=unspecified;;
-*) a="${a#-}" v=unset;;
"$1") v=set;;
*) v="${1#*=}" p="/ (set|unset|unspecified)$/! ";;
esac
shift
git ls-files "$@" | git check-attr "$a" --stdin | sed -nE "${p}s :\ $a:\ $v$  p"
}; f'
Usage
# List all attributes for:
git ls-attrs # All files
git ls-attrs [globs...] # All matching files
# Example globs:
git ls-attrs '*.h' # All C header files
git ls-attrs '*.c' '*.h' # All C source and header files
git ls-attrs src/demo.c # Just one file
# Note: Special characters in globs must be 'quoted' or \escaped to avoid expansion by the shell.
# List files for which attribute 'ATTR' is:
git ls-attr ATTR [paths...] [globs...] # True (i.e. set but not to a value)
git ls-attr -ATTR [paths...] [globs...] # False (i.e. explicitly unset)
git ls-attr --ATTR [paths...] [globs...] # Unspecified (i.e. neither set nor unset)
git ls-attr ATTR=PATTERN [paths...] [globs...] # Set to a matching value
# Pattern can be any POSIX Extended Regular Expression (ERE), such as:
git ls-attr ATTR=bar # Exactly 'bar'
git ls-attr ATTR='(bar|baz)' # Exactly 'bar' or 'baz'
git ls-attr ATTR='[0-9]+' # Just one or more digits
git ls-attr ATTR='.*' # Any string value
git ls-attr ATTR='ba.*' # Begins 'ba'
# Note: Pattern is always matched against an attribute's entire value. Don't try to include a
# leading ^ (match start-of-line) or trailing $ (match end-of-line) in the pattern. Any special
# characters in pattern must be 'quoted' or \escaped to protect them from the shell.
See also:
Learn more about aliases and how Git runs shell commands
By default, Git treats the text of an alias as arguments for Git itself. For example, you could make
git changed an alias for git diff --name-only:
git config --global alias.changed 'diff --name-only'
# Usage:
git changed # List files with unstaged changes (i.e. changed files)
git changed --cached # List files with staged changes (i.e. added files)
git changed --cached '*.h' # List C header files with staged changes
However, when an aliased command starts with !, Git assumes it's a shell command, and runs it
inside a POSIX shell (/bin/sh) along with any arguments given after the alias.
git config --global alias.print-args '!printf "%s\n"'
# Usage:
git print-args # Print an empty line
git print-args foo # Print 'foo'
git print-args foo bar # Print 'foo' and 'bar' on different lines
# Note: If you want to suppress the empty line for zero arguments, set the alias like this instead:
git config --global alias.print-args '![ $# -eq 0 ] || printf "%s\n"'
You can use this Bash function to simulate Git's handling of alias commands:
function run_as_alias() (
set -euo pipefail
local cmd="$1" i=1
shift
if [[ "${cmd:0:1}" == '!' ]]; then
cmd="${cmd:1}"
dash -c "${cmd} \"\$@\"" "${cmd}" "$@" # change 'dash' to 'sh' if you don't have dash
else
while [[ "${cmd: -${i}:1}" == '\' ]]; do
((++i))
done
if ((i % 2 == 0)); then
echo >&2 'fatal: bad alias.run_as_alias string: cmdline ends with \'
return 128
fi
cmd="$(printf '%s' "${cmd}" | xargs printf '%q ')" # split on unquoted spaces
dash -c "git ${cmd} \"\$@\"" git "$@"
fi
)
# Usage:
run_as_alias '[!]YOUR_COMMAND [ARGS...]' [args...]
# Examples:
run_as_alias 'diff --name-only' # Test a Git command
run_as_alias 'diff --name-only' --cached '*.h' # Test a Git command with extra arguments
run_as_alias '!echo one' two three # Test a shell command with extra arguments
run_as_alias "$(git config alias.print-args)" 'foo bar' # Test an alias you defined earlier
Whenever Git runs a command inside a shell, it does so in the standard shell /bin/sh, and it sets
the shell variable $0 to be the command itself (hence ${cmd} appears twice on one line in the
function above). If extra arguments are passed in after the command, these are set as the shell
parameter variables $1, $2, $3, etc.
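This is one reason aliases like ls-attrs and ls-attr wrap their body in f() { ...; }; f. Git appends "$@" to the end of the command text, so without a function the arguments get tacked onto the last command (they are still available as $1, $2, etc. everywhere). With the function idiom, the appended "$@" becomes the arguments to f. The sketch below reproduces this; simulate_bang_alias is a hypothetical name and follows the same idea as run_as_alias, but calls sh.

```bash
#!/usr/bin/env bash
# Sketch: Git runs "!CMD ARGS..." roughly as: sh -c 'CMD "$@"' 'CMD' ARGS...
# simulate_bang_alias is a hypothetical helper that mimics that invocation.
simulate_bang_alias() {
  local cmd="$1"
  shift
  sh -c "${cmd} \"\$@\"" "${cmd}" "$@"
}

# Without a function, the appended "$@" lands on the last command:
simulate_bang_alias 'echo "first=$1"; echo done' one two
# prints:
#   first=one
#   done one two

# With the f() idiom, the appended "$@" is passed to f instead:
simulate_bang_alias 'f() { echo "first=$1"; echo done; }; f' one two
# prints:
#   first=one
#   done
```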
On many systems, /bin/sh is a symlink to a simple POSIX shell like dash, or an outdated version
of bash. This means it may not support all the modern "bashisms" (i.e. advanced features) that
you may be accustomed to using in your normal interactive shell.
If you want to use advanced shell features (e.g. arrays), your command needs to specify a more
advanced shell, or load a script that specifies a more advanced shell on the shebang (#!) line.
run_as_alias '!bash -c "YOUR_ADVANCED_COMMAND"' [args...]
run_as_alias '!bash path/to/script.sh' [args...]
run_as_alias '!path/to/script.sh' [args...] # if script.sh begins `#!/usr/bin/env bash`
Details
A filter called tidy_c is declared in .gitattributes.
Let's define smudge and clean commands for this filter. Git will run these commands during
checkout and staging respectively.
# Optional: Check out K&R style (or whatever style you prefer to edit in):
git config filter.tidy_c.smudge "lint/uncrustify/wrapper.sh -l C -c lint/uncrustify/kr.cfg"
git ls-attrs | sed -n "s|: filter: tidy_c$||p" | xargs touch # mark affected files dirty
git ls-attrs | sed -n "s|: filter: tidy_c$||p" | xargs git checkout -- # apply smudge
# Required: Check in MuseScore legacy style:
git config filter.tidy_c.clean "lint/uncrustify/wrapper.sh -l C -c lint/uncrustify/musescore.cfg"
git ls-attrs | sed -n "s|: filter: tidy_c$||p" | xargs git add --renormalize -- # apply clean
You can define the smudge command to be whatever you want, or you can leave it undefined if you
prefer to use the internal style shown in the online preview (i.e. when viewing files on GitHub).
You must define the clean command exactly as specified above. This ensures the internal
style remains consistent, which keeps code and diffs clean in the online preview.
Learn more about the smudge and clean commands
The command defined for smudge or clean is run by Git inside a POSIX shell (/bin/sh). The
command must read data from STDIN, process it somehow, and then write data to STDOUT. Git will
not supply any arguments to the command besides those given in the definition.
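A minimal sketch of such a command, written as a Bash function: it reads everything from STDIN and writes the result to STDOUT, nothing else. The function name, the strip_ws filter name, and the whitespace-stripping behaviour are hypothetical stand-ins for the demo's real Uncrustify wrapper.

```bash
#!/usr/bin/env bash
# Hypothetical stand-in for a smudge/clean command: a pure STDIN-to-STDOUT
# transformation that strips trailing whitespace from every line.
# The demo's real filter runs lint/uncrustify/wrapper.sh instead.
strip_trailing_ws() {
  sed 's/[[:space:]]*$//'
}

# The same transformation registered as the clean command of a hypothetical
# "strip_ws" filter (Git runs this string in /bin/sh, feeding the file on STDIN):
#   git config filter.strip_ws.clean "sed 's/[[:space:]]*\$//'"
```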
If the command includes %f, Git will replace this with the 'quoted' path to the file currently
being processed. The command could display this path in a status message sent to STDERR, or use
it to determine how to process the file, e.g. via:
- uncrustify --assume=%f (see uncrustify --help)
- prettier --stdin-filepath=%f (see docs)
- Etc.
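Uncrustify's --assume option already infers the language from the path, but the same idea can be written by hand for tools that lack such an option. In this sketch (uncrustify_lang_for and format_by_path are hypothetical names), the %f path is only inspected as a string to pick a language; the content always comes from STDIN, and the file at that path is never opened.

```bash
#!/usr/bin/env bash
# Hypothetical helper: choose Uncrustify's -l value from the path Git passes
# via %f. Only the file name is examined; the file itself is never read.
uncrustify_lang_for() {
  case "$1" in
    *.c|*.h) echo C ;;
    *.cc|*.cpp|*.hpp) echo CPP ;;
    *) return 1 ;;
  esac
}

# Hypothetical filter body, e.g. defined as: 'path/to/script.sh %f'
# Formats STDIN for known file types, otherwise passes it through unchanged.
format_by_path() {
  local lang
  if lang="$(uncrustify_lang_for "$1")"; then
    uncrustify -q -l "$lang" -c lint/uncrustify/musescore.cfg
  else
    cat
  fi
}
```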
However, the command must not attempt to read from the %f file, because this file may not
exist, or its contents may differ from STDIN. This happens if the filter is processing the
staged version of the file, and the file also has unstaged changes.
You can simulate this behavior as follows:
function git_filter_staged() {
local file="$1" cmd="${2//%f/\'${1//\'/\'\\\'\'}\'}"
# Change 'dash' to 'sh' if you don't have dash.
git show ":${file}" | dash -c "${cmd}" "${cmd}"
}
# Test an arbitrary command:
git_filter_staged src/demo.c 'YOUR_COMMAND' # e.g. 'echo >&2 "Processing:" %f; cat' or just 'cat'
# Test the commands you defined earlier:
git_filter_staged src/demo.c "$(git config filter.tidy_c.smudge)"
git_filter_staged src/demo.c "$(git config filter.tidy_c.clean)"
As always, when Git runs a command in a shell, it does so in the standard shell (/bin/sh), and
sets the shell variable $0 to be the command itself (hence the repeated "${cmd}" in the
function above).
On many systems, /bin/sh is a symlink to a simple POSIX shell like dash, or an outdated version
of bash. This means it may not support all the modern "bashisms" (i.e. advanced features) that
you're used to using in your normal, interactive shell.
If you want to use advanced shell features (e.g. arrays), your command needs to specify a more
advanced shell, or load a script that specifies a more advanced shell on the shebang (#!) line.
git_filter_staged src/demo.c 'bash -c "YOUR_ADVANCED_COMMAND"'
git_filter_staged src/demo.c 'bash path/to/script.sh'
git_filter_staged src/demo.c 'path/to/script.sh' # if script.sh begins `#!/usr/bin/env bash`
See also:
- Git Attributes: filter
- Git Attributes: Keyword expansion (look for indent)
Make some changes to the .c or .h source files in the repository and see how your changes are
reported by git status and git diff.
Try making some whitespace changes (e.g. move curly braces {} to a new line), and also try making
some semantic changes (e.g. add another puts(), or change some words in a C string).
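Notice that git diff ignores whitespace changes. That's because Git runs your working copy through the clean filter before comparing it with the staged version, and formatting-only changes don't survive the filter.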
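To see this without Uncrustify, here's a self-contained sketch that builds a throwaway repository with a toy clean filter and reports which files git diff considers changed after an edit. The strip_ws filter (which only strips trailing whitespace) and the diff_after_edit helper are hypothetical stand-ins for tidy_c; the sketch assumes git is in PATH.

```bash
#!/usr/bin/env bash
# Sketch: whitespace-only edits that the clean filter normalizes away are
# invisible to `git diff`; real content changes are not.
# diff_after_edit NEW_LINE: commit "hello", overwrite a.txt with NEW_LINE,
# then print the files that `git diff` reports as changed.
diff_after_edit() (
  set -euo pipefail
  repo="$(mktemp -d)"
  trap 'rm -rf "$repo"' EXIT
  cd "$repo"
  git init -q
  # Toy clean filter (hypothetical): strip trailing whitespace.
  git config filter.strip_ws.clean "sed 's/[[:space:]]*\$//'"
  echo '*.txt filter=strip_ws' >.gitattributes
  printf 'hello\n' >a.txt
  git add .gitattributes a.txt
  git -c user.name=demo -c user.email=demo@example.com commit -q -m init
  printf '%s\n' "$1" >a.txt
  git diff --name-only
)

# Usage:
#   diff_after_edit 'hello   '   # prints nothing (only trailing spaces added)
#   diff_after_edit 'goodbye'    # prints a.txt
```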
If you defined a smudge command earlier, you may have noticed that git diff displays C code in
the internal style rather than in your checked-out style. To remedy this, .gitattributes also
declares a diff filter called tidy_c.
Let's define the textconv command for this filter. Git will run this command when you diff files
with this attribute.
# Set the diff filter to match the smudge filter (may not work with all smudge filters):
git config diff.tidy_c.textconv "$(git config filter.tidy_c.smudge) <"
# Alternatively, set the diff filter explicitly:
git config diff.tidy_c.textconv "uncrustify -l C -c lint/uncrustify/kr.cfg -f"
Now diffs will use your preferred style. Note that this is purely a visual change. It doesn't
affect what happens with git add or git commit.
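The trailing < in the first option works because Git appends the file path to the textconv command as "$@", so CMD < becomes CMD < "$@", which redirects the file into the command's STDIN. Here's a sketch that mimics this invocation; run_textconv is a hypothetical helper, and tr stands in for the real formatter.

```bash
#!/usr/bin/env bash
# Sketch: mimic how Git runs a textconv command with the path appended as "$@".
# run_textconv is a hypothetical helper, not part of Git.
run_textconv() {
  local cmd="$1" path="$2"
  sh -c "${cmd} \"\$@\"" "${cmd}" "$path"
}

# A STDIN-only command becomes a textconv command by ending it with "<":
#   run_textconv 'tr a-z A-Z <' 'some file.txt'   # runs: tr a-z A-Z < "some file.txt"
```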
Learn more about the textconv command
Unlike the smudge and clean commands, the command defined for textconv doesn't receive data
from STDIN. Instead, Git provides the path to a single file, which the textconv command must
read, process somehow, and then write to STDOUT. This path is provided as an extra argument after
all arguments in the command definition, and is also exposed to the command as the shell variable
$1. Although the path is 'quoted' to preserve space characters, these quotes are stripped by the
shell so they are not visible to your command.
You can simulate this with:
file='src/demo.c'
sh -c 'YOUR_COMMAND_DEFINITION '"'${file}'" '' "${file}" | less
sh -c "$(git config diff.tidy_c.textconv) '${file}'" '' "${file}" | less
Try substituting echo >&2 "Diffing: <$1>" as YOUR_COMMAND_DEFINITION and see what happens!
When you perform a diff (e.g. git diff src/demo.c), Git runs the staged and unstaged versions of
the file through your filter and compares them using the ordinary git diff algorithm.
git show :src/demo.c >/tmp/staged
diff -u --color=always <(sh -c 'YOUR_COMMAND_DEFINITION /tmp/staged' '' /tmp/staged) <(sh -c 'YOUR_COMMAND_DEFINITION src/demo.c' '' src/demo.c) | less -R
diff -u --color=always <(sh -c "$(git config diff.tidy_c.textconv) /tmp/staged" '' /tmp/staged) <(sh -c "$(git config diff.tidy_c.textconv) src/demo.c" '' src/demo.c) | less -R
The diff filter is only used for the visual diff. When committing changes, Git calculates deltas
based on the output of the clean filter. If no clean filter is defined then it uses the actual
file contents.
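One way to check this is git hash-object, which applies the clean filter chosen by the path's attributes unless you pass --no-filters (content read with --stdin and no --path is never filtered). The sketch below uses a throwaway repository and a hypothetical strip_ws filter as a stand-in for tidy_c, and succeeds if the hash Git would store matches the hash of the cleaned text.

```bash
#!/usr/bin/env bash
# Sketch: what Git stores for a filtered file is the clean filter's output.
# clean_output_is_stored and the strip_ws filter are hypothetical.
clean_output_is_stored() (
  set -euo pipefail
  repo="$(mktemp -d)"
  trap 'rm -rf "$repo"' EXIT
  cd "$repo"
  git init -q
  git config filter.strip_ws.clean "sed 's/[[:space:]]*\$//'"
  echo '*.txt filter=strip_ws' >.gitattributes
  printf 'hello   \n' >a.txt
  filtered="$(git hash-object a.txt)"                      # runs the clean filter
  raw="$(git hash-object --no-filters a.txt)"              # hashes the bytes as-is
  expected="$(printf 'hello\n' | git hash-object --stdin)" # no path, so no filter
  echo "filtered=$filtered raw=$raw expected=$expected"
  [ "$filtered" = "$expected" ] && [ "$raw" != "$expected" ]
)
```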
See also:
You could define auto-format rules for other types of files, such as Markdown README.md files or
build scripts like CMakeLists.txt.
Rules declared in .gitattributes will affect all developers, whereas rules declared in
.git/info/attributes are personal to you.
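For example, a personal rule could be added like this. The tidy_md filter name and the add_personal_attr helper are hypothetical (you'd still need to define filter.tidy_md.clean and friends with git config, as for tidy_c). The sketch asks git rev-parse --git-path for the location instead of hard-coding .git/info/attributes.

```bash
#!/usr/bin/env bash
# Sketch: append a personal attribute rule that is never committed or shared.
# add_personal_attr is a hypothetical helper; run it inside a repository.
add_personal_attr() {
  local attrs_file
  attrs_file="$(git rev-parse --git-path info/attributes)"
  mkdir -p "$(dirname "$attrs_file")"
  printf '%s\n' "$*" >>"$attrs_file"
}

# Usage:
#   add_personal_attr 'README.md filter=tidy_md'
#   git check-attr filter -- README.md   # README.md: filter: tidy_md
```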
My personal view is that it's definitely worth defining a clean filter for source projects. Doing so
ensures the internal code style remains consistent, which makes for easy code review on GitHub. It
also unlocks the possibility of developers defining smudge filters on their local machines,
because you can't smudge code unless there's a consistent target to clean back to.
I would define smudge and diff filters for binary files because it's difficult to inspect these
files otherwise.
I would declare smudge and diff filters for text files; however, I personally would not bother
to define commands for them on my local machine. I prefer to work with content directly rather than
with a representation of the content. This means getting used to a different coding style in each
project I contribute to, but at least this way the local files on my machine, as well as the output
of git diff and git show, always match what you see online in the GitHub preview.
Peter.