Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add feature to define findings manually in the model, similar to over… #180

Open
wants to merge 8 commits into
base: master
Choose a base branch
from

Conversation

nozmore
Copy link
Collaborator

@nozmore nozmore commented Oct 2, 2021

…rides

The idea here is after working with a dev team and threats are identified which does not exist in the pytm threatlib they can be manually added to the model. Similar to using overrides to modify a threat/finding. Alternatively someone could use pytm only for model definition, dfd/seq generation and reporting using manual findings instead of using any of the threatlib.

I originally called these "custom findings" and named variables accordingly. I was thinking of renaming these to "manual findings", thoughts?

I would like to update tm.py and the README.md with a real world example that makes sense given the base tm model. I haven't spent any mental energy to come up with one that good and doesn't exist in the threatlib. If anyone has any thoughts let me know.

@ghost
Copy link

ghost commented Oct 2, 2021

CodeSee Review Map:

Review these changes using an interactive CodeSee Map

Review in an interactive map

View more CodeSee Maps

Legend

CodeSee Map Legend

@raphaelahrens
Copy link
Contributor

Hi,

I think the notation is a little bit redundant

web.custom_findings = [
    Finding(web,
        id = "foo_id_3",
        description = "foo_description",
        details = "foo_details",
        severity = "HIGH",
        mitigations = "foo_mitigations",
        example = "foo_example",
        threat_id = "foo_threat_id_3",
        references = "foo_references",
        response = "accepted",
    ),
]

could just be something like

f(
  web,
  "foo_description",
  details = "foo_details",
  severity = "HIGH",
  mitigations = "foo_mitigations",
  example = "foo_example",
  threat_id = "foo_threat_id_3",
  references = "foo_references",
  response = "accepted",
)

So a function or class constructor could take care of adding the new Finding to the custom/manual_finding of web.
This removes the double use web and the list notation.
It can also create an id for us if we don't want to specify one.
It can also have required fields, like the description, directly as parameters.

If you prefer to have the element in the beginning of the line maybe

web.add_finding( ...

Regarding the name I tend towards manual, but I'm not sure if there should be a separation between manual and automatic findinds.

@nozmore
Copy link
Collaborator Author

nozmore commented Oct 2, 2021

Yes, I meant to mention that in the description, the redundancy is another reason I had this as draft. We haven't used methods for assignment in tm.py before so I just went with the same approach and obviously this doesn't work for json import. At minimum I was going to add logic to remove the dependency requirement.

I think the required fields are up to the report format, not sure if we should impose this. For an internal threat maybe from a custom threat library it may just be a pointer or reference to a ticket.

I was thinking instead of treating them as separate findings in the report I could add them to the main findings list and the findings instance variables but add a 'source' field to the Finding (default would be 'pytm', source could be 'manual' for these findings or a custom value set in the model. To the earlier point when I copy them from the custom_findings instance variable to findings I could assign the target element to remove the redundancy.

@raphaelahrens
Copy link
Contributor

Source could also be an author, so that you have someone you can contact?

@colesmj
Copy link
Collaborator

colesmj commented Oct 2, 2021

A couple of comments on this thread:

It would potentially be useful to know the source of a finding as automatic (from pytm detection) or manual (human added) or possibly others like external or inherited.

Findings should have common fields such as summary, description (long or short), etc, and should probably also contain references or some field to support a contact url or email or something (maybe a link to a support or security team, etc)

To @nozmore 's comment about tying fields to the report, I'm a big fan of delinking. The findings should have metadata or data to be meaningful, the report can use what it wants/needs from findings, as long as there is a common set of fields and logic to detect when something is empty or missing.

@nozmore
Copy link
Collaborator Author

nozmore commented Oct 2, 2021

Source could also be an author, so that you have someone you can contact?

It would potentially be useful to know the source of a finding as automatic (from pytm detection) or manual (human added) or possibly others like external or inherited.

Exactly 'pytm' when found using the threatlib, 'manual' if the finding is defined in the model but 'source' is not set. any string (except for 'pytm') if 'source' is defined in the model.

@colesmj
Copy link
Collaborator

colesmj commented Oct 2, 2021

Source could also be an author, so that you have someone you can contact?

It would potentially be useful to know the source of a finding as automatic (from pytm detection) or manual (human added) or possibly others like external or inherited.

Exactly 'pytm' when found using the threatlib, 'manual' if the finding is defined in the model but 'source' is not set. any string (except for 'pytm') if 'source' is defined in the model.

Question: why should the finding(s) be defined in the model, and not with the model?

@nozmore
Copy link
Collaborator Author

nozmore commented Oct 2, 2021

Findings should have common fields such as summary, description (long or short), etc, and should probably also contain references or some field to support a contact url or email or something (maybe a link to a support or security team, etc)

To @nozmore 's comment about tying fields to the report, I'm a big fan of delinking. The findings should have metadata or data to be meaningful, the report can use what it wants/needs from findings, as long as there is a common set of fields and logic to detect when something is empty or missing.

I re-used the Finding class so they do have a common set of fields however for these manually identified findings I don't believe requiring fields is for us to say. For threatlib findings, yes, fully agree. For these I think the organization maintaining their report template would need to decide what fields are required. Currently target is required and should be, adding description is reasonable so there is a way to describe the finding or point to something.

I'm open to making whatever fields as required that we all agree to, I just don't think its necessary and will lead to people putting junk data like I did to satisfy the requirements of the tool.

@colesmj
Copy link
Collaborator

colesmj commented Oct 2, 2021

I'm open to making whatever fields as required that we all agree to, I just don't think its necessary and will lead to people putting junk data like I did to satisfy the requirements of the tool.

I'm not suggesting making fields mandatory unless they make sense - summary, description, source (how finding got into the list) certainly, and an identifier (auto generated or manually supplied); all other fields should be optional, and if possible not fixed. The report should not rely on fields existing otherwise those fields should always exist and have at least some form of default (even if an empty string), or the report generator should know how to handle missing/empty fields in findings.

@nozmore nozmore marked this pull request as ready for review October 5, 2021 05:00
@nozmore nozmore requested a review from izar as a code owner October 5, 2021 05:00
@nozmore
Copy link
Collaborator Author

nozmore commented Oct 5, 2021

Ok. I think I incorporated all of the comments. Only thing I didn't was @colesmj comment about do the findings need to be stored in the model. No its code so they dont have to be, for now I like that it is part of the model. For me teams would manage 1 tm file per app (or per a given scope if a larger app) run this file thru the tool and produce output. Also there is a link to the model since the findings are added to elements. When I take a look at json support I would like to separate the input (model, manual findings) from the output (all findings)

Below are the comment I addressed..

  • Renamed to manual findings instead of custom
  • manual findings are now incorporated with all other findings with a source field, (if separation is required the if template feature can be used)
  • similar numerical finding 'id' counter is used
  • the duplicate element reference in tm.py is removed (web.manual_findings = [Finding(web........
  • removed required fields from Finding.

The only outstanding change is to finalize on the custom findings in tm.py.... or decide to remove those from tm.py and just document them in the README.md.

pytm/pytm.py Outdated Show resolved Hide resolved
@raphaelahrens
Copy link
Contributor

Ok. I think I incorporated all of the comments. Only thing I didn't was @colesmj comment about do the findings need to be stored in the model.

@colesmj I also have to ask what you ment by "in" vs "with" the model.

When I take a look at json support I would like to separate the input (model, manual findings) from the output (all findings)

Why is this separation helpful? Could you give an example where this separation would make a difference?

I also wanted to state my preference for an extra function for manual findings.
So the first thing I would do is add a function like this to my threat model.

def finding(element, description, **kwargs):
    element.manual_finding.append(Finding(.....

This function allows me to just create a finding without carrying about the structure of pytm.
The model would then at the end have a bunch of lines like

tm = TM()
dB = Datastore()
broker = Process()
Dataflow(broker, dB, " messages")

finding(dB, "cyclic references" source="RA")
finding(broker, "missing topic authZ", source="RA")

The advantage is that the internal data model and the DSL is more decoupled. So as a user I also don't have to care what kind of finding this is and how it is stored.

The json representation is currently depending on the internal data structure and it might not be a good idea to have a user interface which is the datastructure.

pytm/pytm.py Outdated
"cannot overwrite {}.{} value with {}, already set to {}".format(
instance, self.__class__.__name__, value, self.data[instance]
if (not isinstance(instance, Finding)
or (isinstance(instance, Finding) and isinstance(self,varString) and self.data[instance] != "invalid")
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm against using special values like "invalid". Looks like we should not use this var* type for Findings? What's exactly the purpose of these conditions?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

invalid was already there, I didn't remove it. It set when a Finding is created without and element, I dont think it ever happened before because the element was required but actually gets used now. The conditions are used to workaround the 'overwrite' protection. I wanted to narrow down the specific occurances where overwrite would be allowed and is only for Finding's element and target attributes.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe we can avoid setting the findings two times? Can you locate the place where it's being set the second time?

Copy link
Collaborator Author

@nozmore nozmore Oct 5, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok, cleaned things up. Instead of using setting the value to 'Element("invalid")' I set it to None and avoided the double set on the findings element and target attributes.

@@ -626,6 +632,7 @@ class Finding:
""",
)
cvss = varString("", required=False, doc="The CVSS score and/or vector")
source = varString("manual", required=False, doc="The source of the Finding.")
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The name source on its own is too vague. Can the doc mention that it's automatically populated with pytm for findings matched from the threat library? It'll be the second possible value next to the default value, and it would help users to figure out the meaning of this.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure. I'll add some better text. I could remove the logic around source altogether and we just add it to every element in the threatlib.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's fine to have that logic, it's better to have this field only in Finding than both in here and in Threat

@colesmj
Copy link
Collaborator

colesmj commented Oct 5, 2021

Ok. I think I incorporated all of the comments. Only thing I didn't was @colesmj comment about do the findings need to be stored in the model.

@colesmj I also have to ask what you ment by "in" vs "with" the model.

By writing findings directly into the model definition, changes may invalidate the findings or result in conflicting findings. Better IMHO to separate model definition from outcomes. But maybe we're not at that point in pytm structure to support separation.

…nt's findings, added Finding added to the model with element as arg.
@nozmore
Copy link
Collaborator Author

nozmore commented Oct 5, 2021

When I take a look at json support I would like to separate the input (model, manual findings) from the output (all findings)
Why is this separation helpful? Could you give an example where this separation would make a difference?

I would like to have this running in the pipeline to create the report, dfd and findings. This way I can monitor changes to the model (or lack there of) and have them separate from the findings which could update due to pytm or threatlib changes while the model remains the same.

I also wanted to state my preference for an extra function for manual findings. So the first thing I would do is add a function like this to my threat model.

With moving away from the manual_findings attribute to use findings I was 90% there, just made some changes. It now supports findings added to the element's findings attribute or ones created using a lone Finding object with the element as an arg.

The json representation is currently depending on the internal data structure and it might not be a good idea to have a user interface which is the data structure.

I agree, I have some json change locally that I started, might start over but I do want to decouple the data models, definitely for json, but could be done for python stuff as well. Might fit well with @izar 's branch for modularizing pytm.

@nineinchnick
Copy link
Collaborator

nineinchnick commented Oct 6, 2021

I would like to have this running in the pipeline to create the report, dfd and findings. This way I can monitor changes to the model (or lack there of) and have them separate from the findings which could update due to pytm or threatlib changes while the model remains the same.

To avoid unexpected changes in threats, you should lock the version of pytm used or use your own threat library (a copy).

@@ -801,4 +801,4 @@
"onDuplicates": "Action.NO_ACTION",
"threatsExcluded": [],
"threatsFile": "pytm/threatlib/threats.json"
}
}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please don't remove end of line at end of file. Configure your text editor not to do that automatically.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I replaced the json using the produced json during the tests. I'm editing in vi so nothing is removed automatically on my end. If the space is important we could look to add it to the generated file.

pytm/pytm.py Outdated Show resolved Hide resolved
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants