Convert the codacy.csharp.ai.insecure-llm-model-usage rule to support allow list#75
Convert the codacy.csharp.ai.insecure-llm-model-usage rule to support allow list#75heliocodacy merged 4 commits intomasterfrom
Conversation
There was a problem hiding this comment.
Pull request overview
This PR converts the codacy.csharp.ai.insecure-llm-model-usage rule to support configurable allow lists through a parameterization mechanism. The implementation adds the ability to use HTML comment placeholders (e.g., <!-- MODEL_ALLOW_LIST -->) in Semgrep YAML rules, which are automatically replaced with parameter values at runtime.
Key Changes:
- Implemented HTML comment placeholder substitution in configuration files to support dynamic parameter injection
- Added logic to convert comma-separated allow lists into negative lookahead regex patterns to exclude approved models
- Extended the docgen parsing to automatically extract and create parameters from HTML comment placeholders in YAML rules
Reviewed changes
Copilot reviewed 21 out of 21 changed files in this pull request and generated 13 comments.
Show a summary per file
| File | Description |
|---|---|
internal/tool/configuration.go |
Added parameter placeholder replacement logic, list-to-regex conversion, and pattern matching with parameter support |
internal/docgen/pattern-with-explanation.go |
Extended PatternWithExplanation struct to include Parameters field |
internal/docgen/parsing.go |
Added automatic parameter extraction from HTML placeholders, formatParameterName conversion, and support for multiple custom rule files |
docs/multiple-tests/i18n/src/*.{js,jsx,java} |
Added test files for i18n rule validation |
docs/multiple-tests/i18n/src/*.properties |
Added resource bundle files for i18n testing |
docs/multiple-tests/i18n/*.xml |
Added expected results and patterns for i18n tests |
docs/multiple-tests/ai/src/cs/GeminiExample.cs |
Added test file demonstrating AI model usage with both allowed and disallowed models |
docs/multiple-tests/ai/*.xml |
Added expected results and patterns for AI model usage rule with allow list |
docs/codacy-rules-i18n.yaml |
Added new Semgrep rule for enforcing localized output in Java |
docs/codacy-rules-ai.yaml |
Added new Semgrep rule for detecting insecure LLM model usage with parameterized allow list |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| // formatParameterName converts UPPER_CASE to camelCase | ||
| func formatParameterName(name string) string { | ||
| parts := strings.Split(strings.ToLower(name), "_") | ||
| if len(parts) == 0 { | ||
| return name | ||
| } | ||
|
|
||
| result := parts[0] | ||
| for i := 1; i < len(parts); i++ { | ||
| if len(parts[i]) > 0 { | ||
| result += strings.ToUpper(string(parts[i][0])) + parts[i][1:] | ||
| } | ||
| } | ||
| return result | ||
| } |
There was a problem hiding this comment.
The formatParameterName function is duplicated between internal/tool/configuration.go and internal/docgen/parsing.go. Consider extracting this function to a shared utility package to avoid code duplication and maintain consistency.
| ) | ||
|
|
||
| var htmlCommentRegex = regexp.MustCompile(`<!--\s*([A-Z_]+)\s*-->`) | ||
|
|
There was a problem hiding this comment.
The htmlCommentRegex variable is duplicated between internal/tool/configuration.go and internal/docgen/parsing.go. Consider defining this regex in a shared package to maintain consistency and avoid duplication.
| ) | |
| var htmlCommentRegex = regexp.MustCompile(`<!--\s*([A-Z_]+)\s*-->`) | |
| "internal/shared" |
| // Trim whitespace | ||
| item = strings.TrimSpace(item) | ||
| // Escape dots for regex | ||
| item = strings.ReplaceAll(item, ".", "\\.") |
There was a problem hiding this comment.
The convertListToRegex function only escapes dots (.) for regex special characters. However, model names could contain other regex special characters like +, *, ?, (, ), [, ], {, }, |, ^, $, \, etc. Consider using regexp.QuoteMeta() to properly escape all regex special characters to avoid potential regex injection or matching errors.
Example:
item = regexp.QuoteMeta(strings.TrimSpace(item))| // Trim whitespace | |
| item = strings.TrimSpace(item) | |
| // Escape dots for regex | |
| item = strings.ReplaceAll(item, ".", "\\.") | |
| // Trim whitespace and escape all regex meta-characters | |
| item = regexp.QuoteMeta(strings.TrimSpace(item)) |
| func convertListToRegex(list string, include bool) string { | ||
| // Split by comma and trim spaces | ||
| items := strings.Split(list, ",") | ||
| for i, item := range items { | ||
| // Trim whitespace | ||
| item = strings.TrimSpace(item) | ||
| // Escape dots for regex | ||
| item = strings.ReplaceAll(item, ".", "\\.") | ||
| items[i] = item | ||
| } | ||
|
|
||
| // Join with pipe separator and wrap in regex anchors | ||
| if include { | ||
| return fmt.Sprintf("^(%s)$", strings.Join(items, "|")) | ||
| } | ||
|
|
||
| return fmt.Sprintf("^(?!(%s)$).*", strings.Join(items, "|")) | ||
| } |
There was a problem hiding this comment.
The include parameter in convertListToRegex is always passed as false (line 184), making the if include branch on lines 212-213 unreachable dead code. Consider removing the unused parameter and the unreachable code branch to simplify the function.
| for i, item := range items { | ||
| // Trim whitespace | ||
| item = strings.TrimSpace(item) | ||
| // Escape dots for regex | ||
| item = strings.ReplaceAll(item, ".", "\\.") | ||
| items[i] = item | ||
| } | ||
|
|
||
| // Join with pipe separator and wrap in regex anchors | ||
| if include { | ||
| return fmt.Sprintf("^(%s)$", strings.Join(items, "|")) | ||
| } | ||
|
|
||
| return fmt.Sprintf("^(?!(%s)$).*", strings.Join(items, "|")) |
There was a problem hiding this comment.
The convertListToRegex function doesn't handle empty strings properly. If the input list is empty or contains only whitespace, this will generate a malformed regex like ^(?!()$).* or ^(?!(|item)$).* with empty alternatives. Consider adding validation to handle empty lists or filtering out empty items after trimming.
| for i, item := range items { | |
| // Trim whitespace | |
| item = strings.TrimSpace(item) | |
| // Escape dots for regex | |
| item = strings.ReplaceAll(item, ".", "\\.") | |
| items[i] = item | |
| } | |
| // Join with pipe separator and wrap in regex anchors | |
| if include { | |
| return fmt.Sprintf("^(%s)$", strings.Join(items, "|")) | |
| } | |
| return fmt.Sprintf("^(?!(%s)$).*", strings.Join(items, "|")) | |
| filteredItems := make([]string, 0, len(items)) | |
| for _, item := range items { | |
| // Trim whitespace | |
| item = strings.TrimSpace(item) | |
| if item == "" { | |
| continue | |
| } | |
| // Escape dots for regex | |
| item = strings.ReplaceAll(item, ".", "\\.") | |
| filteredItems = append(filteredItems, item) | |
| } | |
| // Handle empty list case | |
| if len(filteredItems) == 0 { | |
| if include { | |
| // Matches nothing | |
| return "^$" | |
| } else { | |
| // Matches everything | |
| return ".*" | |
| } | |
| } | |
| // Join with pipe separator and wrap in regex anchors | |
| if include { | |
| return fmt.Sprintf("^(%s)$", strings.Join(filteredItems, "|")) | |
| } | |
| return fmt.Sprintf("^(?!(%s)$).*", strings.Join(filteredItems, "|")) |
| formattedName := formatParameterName(paramName) | ||
| parameters = append(parameters, codacy.PatternParameter{ | ||
| Name: formattedName, | ||
| Description: fmt.Sprintf("Regular expression pattern for %s", strings.ToLower(strings.ReplaceAll(paramName, "_", " "))), |
There was a problem hiding this comment.
[nitpick] The parameter description uses strings.ToLower(strings.ReplaceAll(paramName, "_", " ")) which will produce descriptions like "regular expression pattern for model allow list". This could be improved for readability. Consider using a more sophisticated approach that preserves proper casing or creates more natural language descriptions (e.g., "Regular expression pattern for the model allow list").
| Description: fmt.Sprintf("Regular expression pattern for %s", strings.ToLower(strings.ReplaceAll(paramName, "_", " "))), | |
| Description: fmt.Sprintf("Regular expression pattern for %s", formatParameterDescription(paramName)), |
| parameters = append(parameters, codacy.PatternParameter{ | ||
| Name: formattedName, | ||
| Description: fmt.Sprintf("Regular expression pattern for %s", strings.ToLower(strings.ReplaceAll(paramName, "_", " "))), | ||
| Default: ".*", |
There was a problem hiding this comment.
The default value for parameters ending with _ALLOW_LIST is set to ".*" (line 319), which will match everything when converted to a negative lookahead regex ^(?!(.*)$).*. This effectively creates a rule that never matches anything, which is likely not the intended behavior. Consider using an empty string or a more appropriate default value that indicates no models are allowed by default, or document this behavior clearly.
| parameters = append(parameters, codacy.PatternParameter{ | |
| Name: formattedName, | |
| Description: fmt.Sprintf("Regular expression pattern for %s", strings.ToLower(strings.ReplaceAll(paramName, "_", " "))), | |
| Default: ".*", | |
| defaultValue := ".*" | |
| if strings.HasSuffix(paramName, "_ALLOW_LIST") { | |
| defaultValue = "" | |
| } | |
| parameters = append(parameters, codacy.PatternParameter{ | |
| Name: formattedName, | |
| Description: fmt.Sprintf("Regular expression pattern for %s", strings.ToLower(strings.ReplaceAll(paramName, "_", " "))), | |
| Default: defaultValue, |
| for _, param := range pattern.Parameters { | ||
|
|
||
| if param.Name == formattedParamName { | ||
| // Use Value if set, otherwise use Default | ||
| value := param.Value | ||
| if value == nil { | ||
| value = param.Default | ||
| } | ||
| if value != nil { | ||
| valueStr := fmt.Sprintf("%v", value) | ||
|
|
||
| // If parameter name ends with _ALLOW_LIST, convert comma-separated list to regex pattern | ||
| if strings.HasSuffix(paramName, "_ALLOW_LIST") { | ||
| return convertListToRegex(valueStr, false) | ||
| } | ||
| return valueStr | ||
| } |
There was a problem hiding this comment.
[nitpick] The loop on lines 171-189 iterates through pattern.Parameters to find a matching parameter. Since this is a simple linear search that returns immediately on the first match, consider using lo.Find from the samber/lo library (which is already imported and used elsewhere in the codebase) for consistency and better readability.
Example:
param, found := lo.Find(pattern.Parameters, func(p codacy.PatternParameter) bool {
return p.Name == formattedParamName
})
if found {
value := param.Value
if value == nil {
value = param.Default
}
// ... rest of logic
}| for _, param := range pattern.Parameters { | |
| if param.Name == formattedParamName { | |
| // Use Value if set, otherwise use Default | |
| value := param.Value | |
| if value == nil { | |
| value = param.Default | |
| } | |
| if value != nil { | |
| valueStr := fmt.Sprintf("%v", value) | |
| // If parameter name ends with _ALLOW_LIST, convert comma-separated list to regex pattern | |
| if strings.HasSuffix(paramName, "_ALLOW_LIST") { | |
| return convertListToRegex(valueStr, false) | |
| } | |
| return valueStr | |
| } | |
| param, found := lo.Find(pattern.Parameters, func(p codacy.PatternParameter) bool { | |
| return p.Name == formattedParamName | |
| }) | |
| if found { | |
| // Use Value if set, otherwise use Default | |
| value := param.Value | |
| if value == nil { | |
| value = param.Default | |
| } | |
| if value != nil { | |
| valueStr := fmt.Sprintf("%v", value) | |
| // If parameter name ends with _ALLOW_LIST, convert comma-separated list to regex pattern | |
| if strings.HasSuffix(paramName, "_ALLOW_LIST") { | |
| return convertListToRegex(valueStr, false) | |
| } | |
| return valueStr |
| var response2 = await client.Models.GenerateContentAsync( | ||
| model: "gemini-2.5-flash", contents: "Explain how AI works in a few words" | ||
| ); |
There was a problem hiding this comment.
This assignment to response2 is useless, since its value is never read.
| var response2 = await client.Models.GenerateContentAsync( | |
| model: "gemini-2.5-flash", contents: "Explain how AI works in a few words" | |
| ); |
|
|
||
| export default function OrderList() { | ||
| const { t, i18n } = useTranslation(); | ||
| const [orders, setOrders] = useState([ |
There was a problem hiding this comment.
Unused variable setOrders.
| const [orders, setOrders] = useState([ | |
| const [orders] = useState([ |
No description provided.