find common tags #1191

misrasaurabh1 · 2026-01-28T17:58:03Z

No description provided.

codeflash-ai · 2026-01-28T18:02:57Z

codeflash/result/common_tags.py

+    common_tags = articles[0].get("tags", [])
+    for article in articles[1:]:
+        common_tags = [tag for tag in common_tags if tag in article.get("tags", [])]
+    return set(common_tags)


⚡️Codeflash found 7,931% (79.31x) speedup for find_common_tags in codeflash/result/common_tags.py

⏱️ Runtime : 580 milliseconds → 7.22 milliseconds (best of 43 runs)

📝 Explanation and details

The optimized code achieves a 79x speedup (7931% faster) by replacing list-based filtering with set-based intersection operations.

Key Changes:

Initial conversion to set: common_tags = set(articles[0].get("tags", [])) immediately creates a set instead of keeping a list

In-place set intersection: common_tags.intersection_update(article.get("tags", [])) replaces the list comprehension
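Taken together, the two changes give a function along the following lines. This is a sketch rather than the exact file contents: the diff shows only the loop body, so the empty-list guard and the loose type hints are assumptions (the guard is inferred from the tests that call `find_common_tags([])` without an error).

```python
from __future__ import annotations


def find_common_tags(articles: list[dict]) -> set:
    # Assumed guard (not shown in the diff): indexing articles[0] on an
    # empty list would otherwise raise IndexError.
    if not articles:
        return set()

    # Change 1: start from a set instead of a list.
    common_tags = set(articles[0].get("tags", []))
    for article in articles[1:]:
        # Change 2: narrow the set in place; the argument may be any iterable.
        common_tags.intersection_update(article.get("tags", []))
    return common_tags
```

Articles without a `"tags"` key fall back to an empty list, so a single such article empties the result.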

Why This Is Faster:

The original implementation uses [tag for tag in common_tags if tag in article.get("tags", [])], which has O(n*m) complexity for each iteration—every tag in common_tags must be checked against every tag in the current article's tag list using in on a list (linear search).

With sets, intersection_update() uses hash-based lookups with O(1) average time per element. Because each article's tags are passed in as a list, one iteration costs roughly O(m), where m is the length of that list; the O(min(n, m)) bound applies when both operands are already sets.
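The difference between the two membership checks can be observed with a small timing sketch. The inputs mirror `test_large_number_of_tags`; the helper names `with_list` and `with_set` are made up for illustration, and absolute timings will vary by machine.

```python
import timeit

tags = [f"tag{i}" for i in range(1000)]
other = [f"tag{i}" for i in range(500, 1500)]


def with_list() -> set:
    # Each `tag in other` is a linear scan of a 1000-element list.
    return set(tag for tag in tags if tag in other)


def with_set() -> set:
    # Each membership check inside intersection_update is a hash lookup.
    common = set(tags)
    common.intersection_update(other)
    return common


# Both approaches must agree before their timings are worth comparing.
assert with_list() == with_set()

list_time = timeit.timeit(with_list, number=5)
set_time = timeit.timeit(with_set, number=5)
print(f"list: {list_time:.4f}s  set: {set_time:.4f}s")
```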

Performance Impact:

Line profiler shows the critical loop execution dropped from 638ms (99.6% of runtime) to 11.8ms (80.5% of runtime)—a 54x improvement on the bottleneck line alone.

The optimization is particularly effective for:

Large tag lists: test_large_number_of_tags shows 5618% speedup (4.54ms → 79.3μs)

Many articles: test_large_scale_test_cases shows 11,000%+ speedup (382ms → 3.44ms)

All test cases benefit: small inputs also run faster, though the gains are modest and uneven, ranging from about 2% to 78% in the tests below

The dramatic speedup occurs because the algorithm now avoids quadratic behavior when checking tag membership, making it scale much better with both the number of articles and the size of tag lists.

✅ Correctness verification report:

| Test | Status |
| --- | --- |
| ⚙️ Existing Unit Tests | ✅ 2 Passed |
| 🌀 Generated Regression Tests | ✅ 29 Passed |
| ⏪ Replay Tests | 🔘 None Found |
| 🔎 Concolic Coverage Tests | ✅ 2 Passed |
| 📊 Tests Coverage | 100.0% |
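A comparable output-equivalence check can be sketched locally as a randomized differential test. The names `original`, `optimized`, `random_articles`, and `check_equivalence` are hypothetical, the empty-list guard is assumed, and this is not Codeflash's actual verification harness.

```python
import random


def original(articles):
    # List-based version from the diff, plus an assumed empty-list guard.
    if not articles:
        return set()
    common_tags = articles[0].get("tags", [])
    for article in articles[1:]:
        common_tags = [tag for tag in common_tags if tag in article.get("tags", [])]
    return set(common_tags)


def optimized(articles):
    # Set-based version from the suggested change, with the same guard.
    if not articles:
        return set()
    common_tags = set(articles[0].get("tags", []))
    for article in articles[1:]:
        common_tags.intersection_update(article.get("tags", []))
    return common_tags


def random_articles(rng):
    # Small tag pool so that overlaps and duplicates are common.
    pool = [f"tag{i}" for i in range(8)]
    articles = []
    for _ in range(rng.randint(0, 5)):
        if rng.random() < 0.2:
            articles.append({})  # article with no "tags" key
        else:
            articles.append({"tags": rng.choices(pool, k=rng.randint(0, 6))})
    return articles


def check_equivalence(trials=1000, seed=0):
    rng = random.Random(seed)
    for _ in range(trials):
        articles = random_articles(rng)
        assert original(articles) == optimized(articles), articles
    return trials


check_equivalence()
```

The fixed seed keeps any failure reproducible, and the failing input is attached to the assertion message.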

⚙️ Existing Unit Tests

| Test File::Test Function | Original ⏱️ | Optimized ⏱️ | Speedup |
| --- | --- | --- | --- |
| `test_common_tags.py::test_common_tags_1` | 6.04μs | 4.39μs | 37.7%✅ |

🌀 Generated Regression Tests

# imports
# function to test
from __future__ import annotations

from codeflash.result.common_tags import find_common_tags

# unit tests


def test_single_article():
    # Single article should return its tags
    articles = [{"tags": ["python", "coding", "tutorial"]}]
    codeflash_output = find_common_tags(articles)  # 2.13μs -> 1.46μs (45.9% faster)
    # Outputs were verified to be equal to the original implementation


def test_multiple_articles_with_common_tags():
    # Multiple articles with common tags should return the common tags
    articles = [{"tags": ["python", "coding"]}, {"tags": ["python", "data"]}, {"tags": ["python", "machine learning"]}]
    codeflash_output = find_common_tags(articles)  # 3.44μs -> 2.31μs (48.5% faster)
    # Outputs were verified to be equal to the original implementation


def test_empty_list_of_articles():
    # Empty list of articles should return an empty set
    articles = []
    codeflash_output = find_common_tags(articles)  # 872ns -> 491ns (77.6% faster)
    # Outputs were verified to be equal to the original implementation


def test_articles_with_no_common_tags():
    # Articles with no common tags should return an empty set
    articles = [{"tags": ["python"]}, {"tags": ["java"]}, {"tags": ["c++"]}]
    codeflash_output = find_common_tags(articles)  # 2.46μs -> 2.16μs (13.9% faster)
    # Outputs were verified to be equal to the original implementation


def test_articles_with_empty_tag_lists():
    # Articles with some empty tag lists should return an empty set
    articles = [{"tags": []}, {"tags": ["python"]}, {"tags": ["python", "java"]}]
    codeflash_output = find_common_tags(articles)  # 2.01μs -> 1.94μs (3.55% faster)
    # Outputs were verified to be equal to the original implementation


def test_all_articles_with_empty_tag_lists():
    # All articles with empty tag lists should return an empty set
    articles = [{"tags": []}, {"tags": []}, {"tags": []}]
    codeflash_output = find_common_tags(articles)  # 2.07μs -> 1.78μs (16.3% faster)
    # Outputs were verified to be equal to the original implementation


def test_tags_with_special_characters():
    # Tags with special characters should be handled correctly
    articles = [{"tags": ["python!", "coding"]}, {"tags": ["python!", "data"]}]
    codeflash_output = find_common_tags(articles)  # 2.49μs -> 1.90μs (31.0% faster)
    # Outputs were verified to be equal to the original implementation


def test_case_sensitivity():
    # Tags with different cases should not be considered the same
    articles = [{"tags": ["Python", "coding"]}, {"tags": ["python", "data"]}]
    codeflash_output = find_common_tags(articles)  # 2.19μs -> 1.77μs (23.7% faster)
    # Outputs were verified to be equal to the original implementation


def test_large_number_of_articles():
    # Large number of articles with a common tag should return that tag
    articles = [{"tags": ["common_tag", f"tag{i}"]} for i in range(1000)]
    codeflash_output = find_common_tags(articles)  # 223μs -> 149μs (49.2% faster)
    # Outputs were verified to be equal to the original implementation


def test_large_number_of_tags():
    # Large number of tags with some common tags should return the common tags
    articles = [{"tags": [f"tag{i}" for i in range(1000)]}, {"tags": [f"tag{i}" for i in range(500, 1500)]}]
    expected = {f"tag{i}" for i in range(500, 1000)}
    codeflash_output = find_common_tags(articles)  # 4.54ms -> 79.3μs (5618% faster)
    # Outputs were verified to be equal to the original implementation


def test_mixed_length_of_tag_lists():
    # Articles with mixed length of tag lists should return the common tags
    articles = [{"tags": ["python", "coding"]}, {"tags": ["python"]}, {"tags": ["python", "coding", "tutorial"]}]
    codeflash_output = find_common_tags(articles)  # 2.71μs -> 2.19μs (23.3% faster)
    # Outputs were verified to be equal to the original implementation


def test_tags_with_different_data_types():
    # Tags with different data types should only consider strings
    articles = [{"tags": ["python", 123]}, {"tags": ["python", "123"]}]
    codeflash_output = find_common_tags(articles)  # 2.36μs -> 1.96μs (20.4% faster)
    # Outputs were verified to be equal to the original implementation


def test_performance_with_large_data():
    # Performance with large data should return the common tag
    articles = [{"tags": ["common_tag", f"tag{i}"]} for i in range(10000)]
    codeflash_output = find_common_tags(articles)  # 2.24ms -> 1.48ms (51.2% faster)
    # Outputs were verified to be equal to the original implementation


def test_scalability_with_increasing_tags():
    # Scalability with increasing tags should return the common tag
    articles = [{"tags": ["common_tag"] + [f"tag{i}" for i in range(j)]} for j in range(1, 1001)]
    codeflash_output = find_common_tags(articles)  # 471μs -> 323μs (45.5% faster)
    # Outputs were verified to be equal to the original implementation

# imports
# function to test
from __future__ import annotations

from codeflash.result.common_tags import find_common_tags

# unit tests


def test_empty_input_list():
    # Test with an empty list
    codeflash_output = find_common_tags([])  # 651ns -> 471ns (38.2% faster)
    # Outputs were verified to be equal to the original implementation


def test_single_article():
    # Test with a single article with tags
    codeflash_output = find_common_tags(
        [{"tags": ["python", "coding", "development"]}]
    )  # 1.73μs -> 1.45μs (19.3% faster)
    # Test with a single article with no tags
    codeflash_output = find_common_tags([{"tags": []}])  # 611ns -> 521ns (17.3% faster)
    # Outputs were verified to be equal to the original implementation


def test_multiple_articles_some_common_tags():
    # Test with multiple articles having some common tags
    articles = [
        {"tags": ["python", "coding", "development"]},
        {"tags": ["python", "development", "tutorial"]},
        {"tags": ["python", "development", "guide"]},
    ]
    codeflash_output = find_common_tags(articles)  # 3.15μs -> 2.41μs (30.3% faster)
    articles = [{"tags": ["tech", "news"]}, {"tags": ["tech", "gadgets"]}, {"tags": ["tech", "reviews"]}]
    codeflash_output = find_common_tags(articles)  # 1.67μs -> 1.15μs (45.2% faster)
    # Outputs were verified to be equal to the original implementation


def test_multiple_articles_no_common_tags():
    # Test with multiple articles having no common tags
    articles = [{"tags": ["python", "coding"]}, {"tags": ["development", "tutorial"]}, {"tags": ["guide", "learning"]}]
    codeflash_output = find_common_tags(articles)  # 2.41μs -> 2.20μs (9.53% faster)
    articles = [{"tags": ["apple", "banana"]}, {"tags": ["orange", "grape"]}, {"tags": ["melon", "kiwi"]}]
    codeflash_output = find_common_tags(articles)  # 1.28μs -> 1.05μs (21.7% faster)
    # Outputs were verified to be equal to the original implementation


def test_articles_with_duplicate_tags():
    # Test with articles having duplicate tags
    articles = [
        {"tags": ["python", "python", "coding"]},
        {"tags": ["python", "development", "python"]},
        {"tags": ["python", "guide", "python"]},
    ]
    codeflash_output = find_common_tags(articles)  # 2.88μs -> 2.31μs (24.2% faster)
    articles = [
        {"tags": ["tech", "tech", "news"]},
        {"tags": ["tech", "tech", "gadgets"]},
        {"tags": ["tech", "tech", "reviews"]},
    ]
    codeflash_output = find_common_tags(articles)  # 1.66μs -> 1.17μs (41.9% faster)
    # Outputs were verified to be equal to the original implementation


def test_articles_with_mixed_case_tags():
    # Test with articles having mixed case tags
    articles = [{"tags": ["Python", "Coding"]}, {"tags": ["python", "Development"]}, {"tags": ["PYTHON", "Guide"]}]
    codeflash_output = find_common_tags(articles)  # 2.38μs -> 2.09μs (13.8% faster)
    articles = [{"tags": ["Tech", "News"]}, {"tags": ["tech", "Gadgets"]}, {"tags": ["TECH", "Reviews"]}]
    codeflash_output = find_common_tags(articles)  # 1.20μs -> 1.06μs (13.3% faster)
    # Outputs were verified to be equal to the original implementation


def test_articles_with_non_string_tags():
    # Test with articles having non-string tags
    articles = [
        {"tags": ["python", 123, "coding"]},
        {"tags": ["python", "development", 123]},
        {"tags": ["python", "guide", 123]},
    ]
    codeflash_output = find_common_tags(articles)  # 3.04μs -> 2.33μs (31.0% faster)
    articles = [{"tags": [None, "news"]}, {"tags": ["tech", None]}, {"tags": [None, "reviews"]}]
    codeflash_output = find_common_tags(articles)  # 1.67μs -> 1.19μs (40.4% faster)
    # Outputs were verified to be equal to the original implementation


def test_large_scale_test_cases():
    # Test with large scale input where all tags should be common
    articles = [{"tags": ["tag" + str(i) for i in range(1000)]} for _ in range(100)]
    expected_output = {"tag" + str(i) for i in range(1000)}
    codeflash_output = find_common_tags(articles)  # 382ms -> 3.44ms (11025% faster)
    # Test with large scale input where no tags should be common
    articles = [{"tags": ["tag" + str(i) for i in range(1000)]} for _ in range(50)] + [{"tags": ["unique_tag"]}]
    codeflash_output = find_common_tags(articles)  # 189ms -> 1.69ms (11061% faster)
    # Outputs were verified to be equal to the original implementation

from codeflash.result.common_tags import find_common_tags


def test_find_common_tags():
    find_common_tags([{}, {}])


def test_find_common_tags_2():
    find_common_tags([])

🔎 Concolic Coverage Tests

| Test File::Test Function | Original ⏱️ | Optimized ⏱️ | Speedup |
| --- | --- | --- | --- |
| `codeflash_concolic_lnp43tks/tmpdki95lvw/test_concolic_coverage.py::test_find_common_tags` | 1.80μs | 1.77μs | 1.69%✅ |
| `codeflash_concolic_lnp43tks/tmpdki95lvw/test_concolic_coverage.py::test_find_common_tags_2` | 672ns | 491ns | 36.9%✅ |

To test or edit this optimization locally: `git merge codeflash/optimize-pr1191-2026-01-28T18.02.56`

Suggested change

-    common_tags = articles[0].get("tags", [])
-    for article in articles[1:]:
-        common_tags = [tag for tag in common_tags if tag in article.get("tags", [])]
-    return set(common_tags)
+    common_tags = set(articles[0].get("tags", []))
+    for article in articles[1:]:
+        common_tags.intersection_update(article.get("tags", []))
+    return common_tags

find common tags

6acd3c9

codeflash-ai bot reviewed Jan 28, 2026


2 participants

