
Updated to Phi-3.5 #93

Open · wants to merge 3 commits into main
Conversation

@neoOpus commented Sep 11, 2024

I have a question if you don't mind.

Do you think that using uncensored models would be better for reverse-engineering purposes?

@0xdevalias

This is the error from the PR's CI run. I thought it would be a drop-in change from 3.1 to 3.5, but I guess I have to learn more about the differences between the two models' tokenization.

# [2024-09-12 04:24:09]  Loading model with options {
#   modelPath: '/Users/runner/.humanifyjs/models/Phi-3.5-mini-instruct-Q4_K_M.gguf',
#   gpuLayers: 0
# }
# [node-llama-cpp] Using this model ("~/.humanifyjs/models/Phi-3.5-mini-instruct-Q4_K_M.gguf") to tokenize text with special tokens and then detokenize it resulted in a different text. There might be an issue with the model or the tokenizer implementation. Using this model may not work as intended
# Subtest: /Users/runner/work/humanify/humanify/src/test/e2e.geminitest.ts
not ok 1 - /Users/runner/work/humanify/humanify/src/test/e2e.geminitest.ts
  ---

Originally posted by @neoOpus in #58 (comment)
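For context, that [node-llama-cpp] warning describes a tokenizer sanity check: the text is tokenized with special tokens enabled, detokenized again, and the result compared against the original. Roughly, the failing condition can be reproduced like this (a simplified sketch based on my understanding of the node-llama-cpp v3 API; the exact method signatures and the Phi-3.5 prompt markers below are assumptions, not Humanify's or the library's actual code):

```ts
// Sketch of the round-trip check behind the warning above. Assumes the
// node-llama-cpp v3 API (getLlama / loadModel / tokenize / detokenize);
// exact signatures may differ between versions.
import { getLlama } from "node-llama-cpp";

const llama = await getLlama();
const model = await llama.loadModel({
  // Substitute the GGUF file you are testing.
  modelPath: "./models/Phi-3.5-mini-instruct-Q4_K_M.gguf",
  gpuLayers: 0
});

// A prompt that exercises Phi-3.5's special chat-marker tokens.
const sample = "<|user|>\nRename the variable `a`.<|end|>\n<|assistant|>";

const tokens = model.tokenize(sample, true);      // parse special tokens
const roundTrip = model.detokenize(tokens, true); // render them back

if (roundTrip !== sample) {
  // This is the mismatch that triggers "resulted in a different text":
  // the model's tokenizer metadata does not round-trip cleanly.
  console.warn("Tokenizer round-trip mismatch", { sample, roundTrip });
}
```

If the mismatch only shows up around special tokens, that matches the log's own suggestion that the issue lies with the model file or the tokenizer implementation rather than with Humanify itself.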

@Acters commented Sep 18, 2024

> I have a question if you don't mind.
>
> Do you think that using uncensored models would be better for reverse-engineering purposes?

Let's think critically: unless the AI is being asked to create malicious content, censorship isn't really a pressure point. Humanify will usually query the AI model simply to assess/summarize the code internally, and the model then returns variable or function names that closely resemble their function/usage in the code. A censored model is not inherently discarding info; it just will not respond with the censored topics/phrases. It is unlikely the AI model will reject proper names or throw an error unless the backend censorship system is flawed in the first place.

I do think there is merit in supporting a diverse set of models, including uncensored ones. The statement "uncensored is better" is a fallacy: it is subjective, and it is both unverifiable and unfalsifiable. On the other hand, a model that is fine-tuned on, or trained on a dataset relevant to, the code's domain will do better, as it can help reduce hallucinations from the AI.

@neoOpus (Author) commented Sep 18, 2024

> I have a question if you don't mind.
> Do you think that using uncensored models would be better for reverse-engineering purposes?

> Let's think critically: unless the AI is being asked to create malicious content, censorship isn't really a pressure point. Humanify will usually query the AI model simply to assess/summarize the code internally, and the model then returns variable or function names that closely resemble their function/usage in the code. A censored model is not inherently discarding info; it just will not respond with the censored topics/phrases. It is unlikely the AI model will reject proper names or throw an error unless the backend censorship system is flawed in the first place.
>
> I do think there is merit in supporting a diverse set of models, including uncensored ones. The statement "uncensored is better" is a fallacy: it is subjective, and it is both unverifiable and unfalsifiable. On the other hand, a model that is fine-tuned on, or trained on a dataset relevant to, the code's domain will do better, as it can help reduce hallucinations from the AI.

My suggestion stemmed from the observation that when asking ChatGPT, Gemini, or similar tools to reverse engineer something, they often respond with restrictions. While I know some techniques to bypass this (jailbreaking), I proposed using an uncensored option to conserve pre-instruction tokens.

I respectfully disagree with the initial lack of support for multiple models. Currently, there isn't an optimal free API or local option that works for everyone.

Sure, sticking to a single reliable model would minimize bug reports and issues, and it would facilitate building a shared database that enhances deobfuscation by avoiding inconsistencies in variable names across models. But allowing experimentation until the project reaches a certain level of maturity, without pushing everyone to create their own fork, would benefit everyone involved.

@Acters commented Sep 19, 2024

> My suggestion stemmed from the observation that when asking ChatGPT, Gemini, or similar tools to reverse engineer something, they often respond with restrictions. While I know some techniques to bypass this (jailbreaking), I proposed using an uncensored option to conserve pre-instruction tokens.

Asking for unethical actions would encounter this type of restriction. However, Humanify only asks the AI to analyze the code and return new names that fit the usage of each variable or function. Clearly reverse engineering has gotten a bad reputation; I believe it should be allowed, but it is not.

Instead of brazenly asking it to reverse the code (which would also work worse, since that is a complex subject), it is better to break down the tasks required for reversing: ask the AI to clarify how the code works, to refactor the code, or to help make the code easier to read. This taps into its coding-assistant behaviors instead of whatever the censor classifies reverse engineering as. Do note: Humanify does NOT do this, as it only asks for help renaming things.

> Sure, sticking to a single reliable model would minimize bug reports and issues, and it would facilitate building a shared database that enhances deobfuscation by avoiding inconsistencies in variable names across models. But allowing experimentation until the project reaches a certain level of maturity, without pushing everyone to create their own fork, would benefit everyone involved.

I never said to stick to one model. What I alluded to is that whoever wants to use an AI model for a specific purpose would want to fine-tune it to produce fewer hallucinations.

As it stands, Humanify does not need a fine-tuned model, and the support for other models exists to stay agnostic/independent of any single source.

It also seems that you have the wrong notion of what the AI model is used for. The AI model is NOT used for de-obfuscation. It is used to help create human-readable names for variables and functions for un-minification purposes. In no way does the AI touch the code. Read the section titled "Don't let AI touch the code" in the project's blog post: https://thejunkland.com/blog/using-llms-to-reverse-javascript-minification
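To make that separation concrete, here is a minimal sketch of the split described in that blog post: the LLM only proposes a string, and a deterministic, scope-aware AST transform (Babel in this sketch) performs the actual rename. The helper name suggestNameWithLLM and the overall shape are illustrative assumptions, not Humanify's real code:

```ts
// Minimal sketch of the "don't let AI touch the code" split: the model only
// suggests a name; Babel rewrites the binding and its references.
import { parse } from "@babel/parser";
import traverse from "@babel/traverse";
import generate from "@babel/generator";

// Hypothetical helper: ask whichever backend (OpenAI, Gemini, a local GGUF)
// for a descriptive identifier. Only a plain string ever comes back.
declare function suggestNameWithLLM(identifier: string, code: string): Promise<string>;

export async function renameIdentifier(code: string, oldName: string): Promise<string> {
  const newName = await suggestNameWithLLM(oldName, code);

  const ast = parse(code);
  traverse(ast, {
    Identifier(path) {
      // Scope-aware rename: every reference to the binding is updated
      // deterministically, so the AI never edits source text directly.
      if (path.node.name === oldName && path.scope.hasBinding(oldName)) {
        path.scope.rename(oldName, newName);
      }
    }
  });

  return generate(ast).code;
}
```

Whatever the model hallucinates, the worst case is a bad name, never broken code, which is the point that blog section makes.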

> I respectfully disagree with the initial lack of support for multiple models. Currently, there isn't an optimal free API or local option that works for everyone.

This statement confuses me, as Humanify does support multiple models, including a free local model.
