Modify apache_kafka.py and related tests for migration#1042
Modify apache_kafka.py and related tests for migration#1042pombredanne merged 10 commits intomainfrom
Conversation
842799b to
a5f7248
Compare
pombredanne
left a comment
There was a problem hiding this comment.
Thank you! Please see feedback mostly to cleanup the code from comments so we can merge.
a5f7248 to
bc141b6
Compare
|
@johnmhoran please rebase your branch and also provide importer improver logs for this importer. |
|
@TG1999 After prep steps I ran into an error trying to run the apache_kafka importer. I don't understand why the |
|
@TG1999 I've been unable to track down the source of the error. All tests pass. Perhaps we can discuss tomorrow? |
639979b to
e016650
Compare
|
@TG1999 All 10 GH checks have passed and both the Please note, however, that the improver success message included a warning: |
1fb0eef to
0b26584
Compare
pombredanne
left a comment
There was a problem hiding this comment.
Thank you++
See my comments for your consideration.
| "affected_packages": [ | ||
| { | ||
| "package": { | ||
| "type": "maven", |
There was a problem hiding this comment.
There is no such thing as an "apache_kafka" maven package. This is a 404: https://repo1.maven.org/maven2/apache_kafka
There are instead two possible package URLs to return:
1. pkg:apache/kafka (let's use this ONLY for now)
2. and many different Maven PURLs because there are many JARS in kafka. That's a job for an improver later.
To get an idea of what maven PURls would be, in this https://downloads.apache.org/kafka/3.3.2/kafka_2.12-3.3.2.tgz we have all these JARS:
- kafka_2.13-3.3.2.jar
- kafka-clients-3.3.2.jar
- kafka-log4j-appender-3.3.2.jar
- kafka-metadata-3.3.2.jar
- kafka-raft-3.3.2.jar
- kafka-server-common-3.3.2.jar
- kafka-shell-3.3.2.jar
- kafka-storage-3.3.2.jar
- kafka-storage-api-3.3.2.jar
- kafka-streams-3.3.2.jar
- kafka-streams-examples-3.3.2.jar
- kafka-streams-scala_2.13-3.3.2.jar
- kafka-streams-test-utils-3.3.2.jar
- kafka-tools-3.3.2.jar
That's only for a kafka build for Scale 2.12.
There is another for 2.13 with a second set of similar JARs.
All these 14 JARs would exist on Maven. For instance with version 3.3.2, we would have:
- https://repo1.maven.org/maven2/org/apache/kafka/kafka_2.12/3.3.2/ as
pkg:maven/org.apache.kafka/kafka_2.12@3.3.2 - https://repo1.maven.org/maven2/org/apache/kafka/kafka_2.13/3.3.2/ as
pkg:maven/org.apache.kafka/kafka_2.13@3.3.2
and many variation for each JARs above.
THEREFORE, let's use only a series of pkg:apache/kafka PURLs for now.
@johnmhoran please create an issue to later create an improver that will handle the maven creation.
There was a problem hiding this comment.
Thank you @pombredanne for your detailed comments. 👍 This test file was an earlier draft and I should have deleted it, doing so now; and I've opened a new issue re a maven importer as requested.
vulnerabilities/tests/test_data/apache_kafka/test-advisories.json
Outdated
Show resolved
Hide resolved
| "qualifiers": null, | ||
| "subpath": null | ||
| }, | ||
| "affected_version_range": "vers:maven/2.0.0|2.0.1|2.1.0|2.1.1|2.2.0|2.2.1|2.2.2|2.3.0|2.3.1|2.4.0|2.4.1|2.5.0|2.5.1|2.6.0|2.6.1|2.6.2|!=2.6.3|2.7.0|2.7.1|!=2.7.2|2.8.0.|!=2.8.1|<3.0.0", |
There was a problem hiding this comment.
Are you sure the range is using maven syntax for all their versions? including some older ones as found in http://archive.apache.org/dist/kafka/ ?
There was a problem hiding this comment.
@pombredanne I don't understand your question, but I think this is superseded by your decision above to use only apache PURLs for now and to add an issue for a maven improver.
vulnerabilities/tests/test_data/apache_kafka/test-advisories.json
Outdated
Show resolved
Hide resolved
vulnerabilities/tests/test_data/apache_kafka/to-advisory-apache_kafka-expected.json
Outdated
Show resolved
Hide resolved
| fixed_versions_string = fixed_versions_row.find_all("td")[1].text | ||
|
|
||
| # Remove leading white space after initial comma | ||
| affected_versions_string_split_SPLIT = [ |
There was a problem hiding this comment.
This is a very variable name.... Consider this code instead:
affected_versions = [v.strip() for v in affected_versions.split(",")]
affected_versions = [v for v in affected_versions if v]
There was a problem hiding this comment.
@pombredanne After making your suggested changes I get an error re my hard-coded mapping:
TypeError: unhashable type: 'list'
Replacing the awkwardly-named
affected_versions_string_split_SPLIT = [
substring.strip()
for substring in affected_versions.split(",")
if not substring.isspace()
]
with
affected_versions = [v.strip() for v in affected_versions.split(",")]
affected_versions = [v for v in affected_versions if v]
has converted the previously-defined affected_versions = affected_versions_row.find_all("td")[1].text to a list, which won't work as a mapping/dict key. There's probably a simple fix for this, giving it some thought now....
There was a problem hiding this comment.
@pombredanne This is where the conversion of affected_versions to a list throws an error. It also applies to your subsequent comment re fixed_versions_string_split_SPLIT.
# This throws a KeyError if the opening h2 tag `id` data changes or is not in the
# hard-coded affected_version_range_mapping dictionary.
if affected_version_range_mapping[cve_id]["action"] == "include":
# These 2 variables (not used elsewhere) trigger the KeyError for changed/missing data.
check_affected_versions_key = affected_version_range_mapping[cve_id][
affected_versions
]
check_fixed_versions_key = affected_version_range_mapping[cve_id][
fixed_versions_string
]
I don't see how to modify the excerpted code to deal with your conversion of affected_versions to a list.
There was a problem hiding this comment.
Hmm, by simply changing the variable name ever so slightly to avoid using affected_versions, your approach works. Am I missing something? ;-) This way we use your cleaner code AND avoid converting affected_versions itself to an unwanted list.
affected_versions_clean = [v.strip() for v in affected_versions.split(",")]
affected_versions_clean = [v for v in affected_versions if v]
| for substring in affected_versions_string.split(",") | ||
| if not substring.isspace() | ||
| ] | ||
| fixed_versions_string_split_SPLIT = [ |
There was a problem hiding this comment.
My above comment seems to apply with equal vigor to your suggested fixed_versions_string_split_SPLIT improvement. All tests once again pass.
0ddb6fe to
a284994
Compare
|
@pombredanne I've addressed all of your PR comments (thank you ;-), committed and pushed my updated code, and all GH checks have passed. Ready for you to review at your convenience. |
Reference: #972 Signed-off-by: John M. Horan <johnmhoran@gmail.com>
Signed-off-by: John M. Horan <johnmhoran@gmail.com>
Reference: #972 Signed-off-by: John M. Horan <johnmhoran@gmail.com>
Reference: #972 Signed-off-by: John M. Horan <johnmhoran@gmail.com>
Reference: #972 Signed-off-by: John M. Horan <johnmhoran@gmail.com>
Reference: #972 Signed-off-by: John M. Horan <johnmhoran@gmail.com>
b973738 to
e33e18d
Compare
|
@pombredanne I've replaced the rather long mapping reference with a variable as you suggested and all tests and GH checks pass. 👍 |
|
|
||
| # This throws a KeyError if the opening h2 tag `id` data changes or is not in the | ||
| # hard-coded affected_version_range_mapping dictionary. | ||
| cve_version_mapping = affected_version_range_mapping[cve_id] |
There was a problem hiding this comment.
Thanks for the update. I would prefer not having a _mapping suffix or similar type suffixes when this is not absolutely needed, but this is cosmetic we can fix this later!
No description provided.