Add option to use local LLMs and filter sensitive information #20

Open
wants to merge 1 commit into main
Conversation

vishwamartur
Contributor

Related to #18

Add measures to prevent sensitive information leakage and provide an option to use local LLMs. Rough sketches of both ideas follow the list below.

  • create_har.py

    • Add a filter to exclude sensitive information such as auth tokens and login credentials from the recorded network requests and cookies.
    • Update the `record_har_path` and `record_har_content` parameters to use the filtered data.
    • Add a function to filter sensitive information from requests.

  • integuru/__main__.py

    • Add an option to use local LLMs instead of sending data to OpenAI.
    • Update the `call_agent` function to handle the new option for local LLMs.

  • integuru/util/LLM.py

    • Add a method to set and use local LLMs.
    • Update the `get_instance` method to handle the new option for local LLMs.

  • integuru/util/har_processing.py

    • Add measures to filter out sensitive information from HAR files.
    • Update the `parse_har_file` function to use the filtered data.
    • Add a function to filter sensitive information from request headers and bodies.
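The diff itself isn't reproduced here, so the following is an illustration only: a HAR filter along the lines described above could redact sensitive header and cookie values before the file is parsed further. The key list, the `[REDACTED]` placeholder, and the `load_filtered_har` helper are assumptions made for this sketch; only the `filter_sensitive_info` name appears in the discussion below.

```python
import json

# Header/cookie names treated as sensitive; the exact list is an assumption for this sketch.
SENSITIVE_KEYS = {"authorization", "proxy-authorization", "cookie", "set-cookie", "x-api-key"}
REDACTED = "[REDACTED]"


def filter_sensitive_info(har: dict) -> dict:
    """Redact sensitive header and cookie values in a parsed HAR structure."""
    for entry in har.get("log", {}).get("entries", []):
        for section in ("request", "response"):
            part = entry.get(section, {})
            for header in part.get("headers", []):
                if header.get("name", "").lower() in SENSITIVE_KEYS:
                    header["value"] = REDACTED
            for cookie in part.get("cookies", []):
                cookie["value"] = REDACTED
    return har


def load_filtered_har(path: str) -> dict:
    """Load a HAR file and return it with sensitive values redacted."""
    with open(path, encoding="utf-8") as f:
        return filter_sensitive_info(json.load(f))
```

For the local-LLM option, the summary doesn't show how `get_instance` is changed; one plausible shape, assuming the project wraps LangChain's `ChatOpenAI`, is to point the same client at a local OpenAI-compatible endpoint. The `use_local_llm` flag, the environment variable names, and the default model and URL below are all assumptions:

```python
import os
from langchain_openai import ChatOpenAI


class LLMManager:
    """Hypothetical stand-in for the get_instance logic in integuru/util/LLM.py."""

    _instance = None

    @classmethod
    def get_instance(cls, use_local_llm: bool = False, model: str = "gpt-4o"):
        if cls._instance is None:
            if use_local_llm:
                # Reuse the OpenAI-compatible client against a local server
                # (e.g. Ollama or llama.cpp), so request data stays on the machine.
                cls._instance = ChatOpenAI(
                    model=os.getenv("LOCAL_LLM_MODEL", "llama3"),
                    base_url=os.getenv("LOCAL_LLM_BASE_URL", "http://localhost:11434/v1"),
                    api_key="not-needed",
                )
            else:
                cls._instance = ChatOpenAI(model=model)
        return cls._instance
```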


@PredictiveManish left a comment

Great work!

@alanalanlu
Contributor

Appreciate the help, but the agent needs to see the full request, including the Authorization header, to know whether there's a "dynamic part" in the auth header, right? If you completely remove the Authorization header, what happens if the graph requires another request to obtain that header?

@vishwamartur
Contributor Author

Thank you, @PredictiveManish, for the feedback!

@alanalanlu , good point regarding the Authorization headers. To address this, instead of completely removing the header, we could implement a partial redaction approach. This way, sensitive information like tokens can be partially masked rather than fully removed. This would allow the agent to identify dynamic parts without exposing sensitive information in full.

I'll update the `filter_sensitive_info` function to handle this scenario and ensure that the components required for dynamic-part detection remain intact. Let me know if this sounds good, or if there are any other concerns!
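A minimal sketch of the partial-redaction idea, assuming secrets are long enough to keep a short visible prefix and suffix; the four-character window and the mask character are arbitrary choices for illustration:

```python
def partially_redact(value: str, keep: int = 4, mask_char: str = "*") -> str:
    """Keep the first and last `keep` characters of a secret and mask the middle."""
    if len(value) <= keep * 2:
        return mask_char * len(value)
    return value[:keep] + mask_char * (len(value) - keep * 2) + value[-keep:]


print(partially_redact("Bearer abc123xyz789"))
# keeps "Bear" and "z789" and masks the 11 characters in between
```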

@alanalanlu
Contributor

I don't think that would work, still. The LLM will identify the masked auth token instead of the actual token and try to find that masked token in the response of a previous request, but there will be no matches, since we created the masked token ourselves. I think the best way to handle this is either to support a local LLM (not smart enough for code gen) or to keep a mapping from the masked token to the real token. For example: you mask the token while keeping a mapping of the masked token to the actual token, pass the request with the masked token into the LLM, the LLM spits out the masked token, and then you map it back. There can be issues with this approach too, since the token can appear anywhere, such as in the path, and you can't hard-code that.
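A rough sketch of the mapping idea described above, assuming the sensitive values are known up front so they can be swapped for stable placeholders before the prompt is built and swapped back afterwards; the placeholder format and class name are made up for illustration. Because the replacement is a plain string substitution over the whole request text, it also catches a token that appears in the path rather than a header, though it still can't cover values nobody flagged as sensitive.

```python
import secrets


class TokenMapper:
    """Swap sensitive values for placeholders and restore them in LLM output."""

    def __init__(self) -> None:
        self._placeholder_to_real: dict[str, str] = {}

    def mask(self, text: str, sensitive_values: list[str]) -> str:
        """Replace each sensitive value wherever it appears (header, body, or path)."""
        for value in sensitive_values:
            placeholder = f"__TOKEN_{secrets.token_hex(4)}__"
            self._placeholder_to_real[placeholder] = value
            text = text.replace(value, placeholder)
        return text

    def unmask(self, text: str) -> str:
        """Put the real values back into text generated by the LLM."""
        for placeholder, value in self._placeholder_to_real.items():
            text = text.replace(placeholder, value)
        return text


mapper = TokenMapper()
prompt = mapper.mask("GET /api/v1/data?token=abc123xyz789 HTTP/1.1", ["abc123xyz789"])
# ... send `prompt` to the LLM, then map placeholders in its output back:
restored = mapper.unmask(prompt)
```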
