Skip to content

Convert the codacy.csharp.ai.insecure-llm-model-usage rule to support allow list#75

Merged
heliocodacy merged 4 commits intomasterfrom
add_custom_rules
Dec 3, 2025
Merged

Convert the codacy.csharp.ai.insecure-llm-model-usage rule to support allow list#75
heliocodacy merged 4 commits intomasterfrom
add_custom_rules

Conversation

@heliocodacy
Copy link
Contributor

No description provided.

Copilot AI review requested due to automatic review settings December 3, 2025 16:42
DMarinhoCodacy
DMarinhoCodacy previously approved these changes Dec 3, 2025
Copy link

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR converts the codacy.csharp.ai.insecure-llm-model-usage rule to support configurable allow lists through a parameterization mechanism. The implementation adds the ability to use HTML comment placeholders (e.g., <!-- MODEL_ALLOW_LIST -->) in Semgrep YAML rules, which are automatically replaced with parameter values at runtime.

Key Changes:

  • Implemented HTML comment placeholder substitution in configuration files to support dynamic parameter injection
  • Added logic to convert comma-separated allow lists into negative lookahead regex patterns to exclude approved models
  • Extended the docgen parsing to automatically extract and create parameters from HTML comment placeholders in YAML rules

Reviewed changes

Copilot reviewed 21 out of 21 changed files in this pull request and generated 13 comments.

Show a summary per file
File Description
internal/tool/configuration.go Added parameter placeholder replacement logic, list-to-regex conversion, and pattern matching with parameter support
internal/docgen/pattern-with-explanation.go Extended PatternWithExplanation struct to include Parameters field
internal/docgen/parsing.go Added automatic parameter extraction from HTML placeholders, formatParameterName conversion, and support for multiple custom rule files
docs/multiple-tests/i18n/src/*.{js,jsx,java} Added test files for i18n rule validation
docs/multiple-tests/i18n/src/*.properties Added resource bundle files for i18n testing
docs/multiple-tests/i18n/*.xml Added expected results and patterns for i18n tests
docs/multiple-tests/ai/src/cs/GeminiExample.cs Added test file demonstrating AI model usage with both allowed and disallowed models
docs/multiple-tests/ai/*.xml Added expected results and patterns for AI model usage rule with allow list
docs/codacy-rules-i18n.yaml Added new Semgrep rule for enforcing localized output in Java
docs/codacy-rules-ai.yaml Added new Semgrep rule for detecting insecure LLM model usage with parameterized allow list

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines +219 to +233
// formatParameterName converts UPPER_CASE to camelCase
func formatParameterName(name string) string {
parts := strings.Split(strings.ToLower(name), "_")
if len(parts) == 0 {
return name
}

result := parts[0]
for i := 1; i < len(parts); i++ {
if len(parts[i]) > 0 {
result += strings.ToUpper(string(parts[i][0])) + parts[i][1:]
}
}
return result
}
Copy link

Copilot AI Dec 3, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The formatParameterName function is duplicated between internal/tool/configuration.go and internal/docgen/parsing.go. Consider extracting this function to a shared utility package to avoid code duplication and maintain consistency.

Copilot uses AI. Check for mistakes.
Comment on lines 13 to +18
)

var htmlCommentRegex = regexp.MustCompile(`<!--\s*([A-Z_]+)\s*-->`)

Copy link

Copilot AI Dec 3, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The htmlCommentRegex variable is duplicated between internal/tool/configuration.go and internal/docgen/parsing.go. Consider defining this regex in a shared package to maintain consistency and avoid duplication.

Suggested change
)
var htmlCommentRegex = regexp.MustCompile(`<!--\s*([A-Z_]+)\s*-->`)
"internal/shared"

Copilot uses AI. Check for mistakes.
Comment on lines +204 to +207
// Trim whitespace
item = strings.TrimSpace(item)
// Escape dots for regex
item = strings.ReplaceAll(item, ".", "\\.")
Copy link

Copilot AI Dec 3, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The convertListToRegex function only escapes dots (.) for regex special characters. However, model names could contain other regex special characters like +, *, ?, (, ), [, ], {, }, |, ^, $, \, etc. Consider using regexp.QuoteMeta() to properly escape all regex special characters to avoid potential regex injection or matching errors.

Example:

item = regexp.QuoteMeta(strings.TrimSpace(item))
Suggested change
// Trim whitespace
item = strings.TrimSpace(item)
// Escape dots for regex
item = strings.ReplaceAll(item, ".", "\\.")
// Trim whitespace and escape all regex meta-characters
item = regexp.QuoteMeta(strings.TrimSpace(item))

Copilot uses AI. Check for mistakes.
Comment on lines +200 to +217
func convertListToRegex(list string, include bool) string {
// Split by comma and trim spaces
items := strings.Split(list, ",")
for i, item := range items {
// Trim whitespace
item = strings.TrimSpace(item)
// Escape dots for regex
item = strings.ReplaceAll(item, ".", "\\.")
items[i] = item
}

// Join with pipe separator and wrap in regex anchors
if include {
return fmt.Sprintf("^(%s)$", strings.Join(items, "|"))
}

return fmt.Sprintf("^(?!(%s)$).*", strings.Join(items, "|"))
}
Copy link

Copilot AI Dec 3, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The include parameter in convertListToRegex is always passed as false (line 184), making the if include branch on lines 212-213 unreachable dead code. Consider removing the unused parameter and the unreachable code branch to simplify the function.

Copilot uses AI. Check for mistakes.
Comment on lines +203 to +216
for i, item := range items {
// Trim whitespace
item = strings.TrimSpace(item)
// Escape dots for regex
item = strings.ReplaceAll(item, ".", "\\.")
items[i] = item
}

// Join with pipe separator and wrap in regex anchors
if include {
return fmt.Sprintf("^(%s)$", strings.Join(items, "|"))
}

return fmt.Sprintf("^(?!(%s)$).*", strings.Join(items, "|"))
Copy link

Copilot AI Dec 3, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The convertListToRegex function doesn't handle empty strings properly. If the input list is empty or contains only whitespace, this will generate a malformed regex like ^(?!()$).* or ^(?!(|item)$).* with empty alternatives. Consider adding validation to handle empty lists or filtering out empty items after trimming.

Suggested change
for i, item := range items {
// Trim whitespace
item = strings.TrimSpace(item)
// Escape dots for regex
item = strings.ReplaceAll(item, ".", "\\.")
items[i] = item
}
// Join with pipe separator and wrap in regex anchors
if include {
return fmt.Sprintf("^(%s)$", strings.Join(items, "|"))
}
return fmt.Sprintf("^(?!(%s)$).*", strings.Join(items, "|"))
filteredItems := make([]string, 0, len(items))
for _, item := range items {
// Trim whitespace
item = strings.TrimSpace(item)
if item == "" {
continue
}
// Escape dots for regex
item = strings.ReplaceAll(item, ".", "\\.")
filteredItems = append(filteredItems, item)
}
// Handle empty list case
if len(filteredItems) == 0 {
if include {
// Matches nothing
return "^$"
} else {
// Matches everything
return ".*"
}
}
// Join with pipe separator and wrap in regex anchors
if include {
return fmt.Sprintf("^(%s)$", strings.Join(filteredItems, "|"))
}
return fmt.Sprintf("^(?!(%s)$).*", strings.Join(filteredItems, "|"))

Copilot uses AI. Check for mistakes.
formattedName := formatParameterName(paramName)
parameters = append(parameters, codacy.PatternParameter{
Name: formattedName,
Description: fmt.Sprintf("Regular expression pattern for %s", strings.ToLower(strings.ReplaceAll(paramName, "_", " "))),
Copy link

Copilot AI Dec 3, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[nitpick] The parameter description uses strings.ToLower(strings.ReplaceAll(paramName, "_", " ")) which will produce descriptions like "regular expression pattern for model allow list". This could be improved for readability. Consider using a more sophisticated approach that preserves proper casing or creates more natural language descriptions (e.g., "Regular expression pattern for the model allow list").

Suggested change
Description: fmt.Sprintf("Regular expression pattern for %s", strings.ToLower(strings.ReplaceAll(paramName, "_", " "))),
Description: fmt.Sprintf("Regular expression pattern for %s", formatParameterDescription(paramName)),

Copilot uses AI. Check for mistakes.
Comment on lines +316 to +319
parameters = append(parameters, codacy.PatternParameter{
Name: formattedName,
Description: fmt.Sprintf("Regular expression pattern for %s", strings.ToLower(strings.ReplaceAll(paramName, "_", " "))),
Default: ".*",
Copy link

Copilot AI Dec 3, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The default value for parameters ending with _ALLOW_LIST is set to ".*" (line 319), which will match everything when converted to a negative lookahead regex ^(?!(.*)$).*. This effectively creates a rule that never matches anything, which is likely not the intended behavior. Consider using an empty string or a more appropriate default value that indicates no models are allowed by default, or document this behavior clearly.

Suggested change
parameters = append(parameters, codacy.PatternParameter{
Name: formattedName,
Description: fmt.Sprintf("Regular expression pattern for %s", strings.ToLower(strings.ReplaceAll(paramName, "_", " "))),
Default: ".*",
defaultValue := ".*"
if strings.HasSuffix(paramName, "_ALLOW_LIST") {
defaultValue = ""
}
parameters = append(parameters, codacy.PatternParameter{
Name: formattedName,
Description: fmt.Sprintf("Regular expression pattern for %s", strings.ToLower(strings.ReplaceAll(paramName, "_", " "))),
Default: defaultValue,

Copilot uses AI. Check for mistakes.
Comment on lines +171 to +187
for _, param := range pattern.Parameters {

if param.Name == formattedParamName {
// Use Value if set, otherwise use Default
value := param.Value
if value == nil {
value = param.Default
}
if value != nil {
valueStr := fmt.Sprintf("%v", value)

// If parameter name ends with _ALLOW_LIST, convert comma-separated list to regex pattern
if strings.HasSuffix(paramName, "_ALLOW_LIST") {
return convertListToRegex(valueStr, false)
}
return valueStr
}
Copy link

Copilot AI Dec 3, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[nitpick] The loop on lines 171-189 iterates through pattern.Parameters to find a matching parameter. Since this is a simple linear search that returns immediately on the first match, consider using lo.Find from the samber/lo library (which is already imported and used elsewhere in the codebase) for consistency and better readability.

Example:

param, found := lo.Find(pattern.Parameters, func(p codacy.PatternParameter) bool {
    return p.Name == formattedParamName
})
if found {
    value := param.Value
    if value == nil {
        value = param.Default
    }
    // ... rest of logic
}
Suggested change
for _, param := range pattern.Parameters {
if param.Name == formattedParamName {
// Use Value if set, otherwise use Default
value := param.Value
if value == nil {
value = param.Default
}
if value != nil {
valueStr := fmt.Sprintf("%v", value)
// If parameter name ends with _ALLOW_LIST, convert comma-separated list to regex pattern
if strings.HasSuffix(paramName, "_ALLOW_LIST") {
return convertListToRegex(valueStr, false)
}
return valueStr
}
param, found := lo.Find(pattern.Parameters, func(p codacy.PatternParameter) bool {
return p.Name == formattedParamName
})
if found {
// Use Value if set, otherwise use Default
value := param.Value
if value == nil {
value = param.Default
}
if value != nil {
valueStr := fmt.Sprintf("%v", value)
// If parameter name ends with _ALLOW_LIST, convert comma-separated list to regex pattern
if strings.HasSuffix(paramName, "_ALLOW_LIST") {
return convertListToRegex(valueStr, false)
}
return valueStr

Copilot uses AI. Check for mistakes.
Comment on lines +12 to +14
var response2 = await client.Models.GenerateContentAsync(
model: "gemini-2.5-flash", contents: "Explain how AI works in a few words"
);
Copy link

Copilot AI Dec 3, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This assignment to response2 is useless, since its value is never read.

Suggested change
var response2 = await client.Models.GenerateContentAsync(
model: "gemini-2.5-flash", contents: "Explain how AI works in a few words"
);

Copilot uses AI. Check for mistakes.

export default function OrderList() {
const { t, i18n } = useTranslation();
const [orders, setOrders] = useState([
Copy link

Copilot AI Dec 3, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unused variable setOrders.

Suggested change
const [orders, setOrders] = useState([
const [orders] = useState([

Copilot uses AI. Check for mistakes.
@heliocodacy heliocodacy merged commit 8d20fbe into master Dec 3, 2025
7 checks passed
@heliocodacy heliocodacy deleted the add_custom_rules branch December 3, 2025 16:51
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants