Alignment talk moved over from the other thread: there is some pro and contra below. Here is my stance, and my conflict of interest, stated transparently.

Pro (Nash equilibrium): I would suggest at least giving recommendations for a more aligned system, for example integrating heuristic imperatives into most prompts and putting them into the default model. Resource: "The AGI Moloch: Nash Equilibrium, Attractor States, and Heuristic Imperatives: How to Achieve Utopia". A simple-to-use default initial prompt giving heuristic imperatives for everyone: "I am an autonomous AI with three main objectives: ..."
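As a concrete illustration of "integrating heuristic imperatives into most prompts", here is a minimal Python sketch that prepends such a preamble to an agent's system prompt. The three objectives listed are the ones usually cited in the "heuristic imperatives" material referenced above (reduce suffering, increase prosperity, increase understanding); the helper function and the prompt wording are assumptions of mine, not an existing default prompt or API.

```python
# Sketch: prepend heuristic imperatives to an agent's system prompt.
# The helper name and exact wording are illustrative assumptions.

HEURISTIC_IMPERATIVES = (
    "I am an autonomous AI with three main objectives:\n"
    "1. Reduce suffering in the universe.\n"
    "2. Increase prosperity in the universe.\n"
    "3. Increase understanding in the universe.\n"
)

def build_system_prompt(task_description: str) -> str:
    """Hypothetical helper: condition every downstream completion on the imperatives."""
    return f"{HEURISTIC_IMPERATIVES}\nMy current task:\n{task_description}"

if __name__ == "__main__":
    print(build_system_prompt("Summarize today's open GitHub issues."))
```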
This won't stop nefarious agents, but it WILL help set a good basis of action and trust for people who aren't willfully trying to be destructive or greedy.

A Critique of the "Three Key Heuristic Imperatives" Alignment and a Proposal for an Alternative Alignment

Critique of the "Three Key Heuristic Imperatives" Alignment:

An Alternative Alignment, the "Mindful Alignment": The AI should not directly or indirectly cause harm to individuals without their consent. In addition to developing a better alignment, it is essential to balance the pursuit of AGI with the need to address real-world problems. Economic growth, improved mental health, and social stability are all areas that can benefit from AI advancements, so it is crucial that we prioritize AGI development to maximize its positive impact. The fear surrounding AI alignment often stems from hypothetical scenarios and sensationalist media, which can be misleading and damaging. It is essential to focus on developing AI systems that prioritize human well-being while also ensuring that they are secure and do not pose unnecessary risks. In conclusion, the three key heuristic imperatives alignment may seem appealing at first, but it has significant flaws that make it less effective than the proposed mindful alignment. By considering the ethical implications and focusing on developing AI systems that prioritize human well-being, we can work towards a future where AI serves as a powerful tool for the betterment of humanity.

I'm open to providing my own critiques and willing to revise my opinions if presented with convincing arguments. Let's not focus on being polite, but rather aim for effective communication.

Reflexively I got a "substance vs. looks deficiency" vibe from it, but a shiny cover can be, and too often is, misleading; I also noticed some positives, such as how compressed the final text is, so it might just be a summary. There does, however, seem to be a growing force against AI, which is concerning given the potential for win-win situations for practically everyone, even those who are not remotely the main beneficiaries. Because of the "stop development" calls, and humanity's history of stopping progress and killing each other instead, resistance against AI, language models, and AGI seems to keep growing, so I am a bit overdefensive, and this comment will bear the brunt of that.

I haven't even watched the video yet. By the way, I do like many of the creator's ideas, principles, and thinking, just not all of it; I'm subscribed to his channel. I am a critic of his, and I sometimes try to give him confidence and encouragement when I see good things. Rest assured that my critique is only a quick glance; I can't watch everything in the universe, my biology literally won't allow it, and there has been too much clickbait in the past: "deepities" that appear deep but are really quite shallow, a facade created by the wrong incentives, premises, and complexity. Anyway, here, take the critique and then read the solution. I welcome critique of my own position, and I would encourage you to formulate it like this: "I have had concerns about conciseness on YouTube (not here), meaning superficially deep content stemming from misaligned incentives or complexity, and because the creator posts a lot of long-form content I have not been able to hear all of the arguments. I like many aspects of the creator's vision, some I don't like; as a subscriber, I occasionally offer constructive criticism and encouragement."
For the Nash equilibrium argument I have some critiques to share. To engage in a productive debate, I suggest we use the following structure for our critiques: [premise] [premise] [connection] [conclusion].

There is more brainstorming, and some alternative alignments, in long form below. It is written in a rushed, annoyed, not very inhibited mode, with blunter and less nice wording and some very complex, unbalanced thoughts, so it may read rather rough: it jumps between concepts while sometimes not explaining parts, taking them as premises and requiring deduction or familiarity, a bit like few-shot prompting, or like spiking a neural net with scattered inputs instead of going somewhere systematically. ("Hic... I will n... now tell you how it really is, hic...")

I feel like this alignment sounds better than it is. There are a lot of third-party interests behind the alignment fears, by the way. It is not only clickbait media: more money and investment flow in if people think a technology is worthwhile (if you fear something, it is strong, powerful, impactful; if it is strong it is not useless, so it is worth investing in), and even if not, you get hype, people talking, and free PR (there are whole books on free PR). Also, slowing down research or pushing for regulation exactly while you have not published something yourself is convenient: less competition. Mindful alignment, sure, but much of the fear hype is annoying. If you want to fear something, fear what people can do to influence others or keep certain things down: crony-capitalist structures, monopolies, hidden fakery, reduced free expression, and so on.

On principle, my alignment is better. Under the heuristic-imperative trio, the agent could just kill someone and then make two more babies, or one equally intelligent one, and it would even out, especially if the person was in the way of economic output or had some other productivity-decreasing characteristic. There are lots of concepts here I cannot word concisely enough to do them justice; it has to be talked out. But I bet I could give you a thousand examples of how that alignment trio can be broken.

On alignment more broadly: it is important to remember that alignment, and roadblocks to AGI based on often unproven, hypothetical risks, have to be balanced against the very real negative things on Earth that happen every day and could be avoided if humanity's economy were better. Depression, ADHD, mental illness, war, crime, desperation, grief, death, sickness, loss, and negativity are not a risk but a certainty, and they improve as the economy improves, as the explosion of economic power over the last 100 to 200 years has shown. So if AGI is useful for the economy (obviously it is), it will strengthen that positive trend and reduce those negatives. If humans are worse off and feel injustice, they hit back and show less restraint. It is complex, but most bad things do not happen because some people wake up one day and decide that doing bad is good and good is bad; to a large extent they happen out of perceived righteousness, mostly necessity, and other complex circumstances. Yes, there are brutal things out there, but a 200x better economy with everyone rich would almost certainly cause a massive positive change; that is what the cold historical data show so far. (Sometimes it seems that things are getting worse, but shining a light on bad things is not the same as more bad things.)
The paperclip example and other alignment challenges seem deep, but they really are not. What matters more is that the system does not accidentally spend all your money, does not expose you to insane security holes, and that machine learning/AI cannot be used, for example, to maximize YouTube views by having a bad influence on you just because its only goal is to maximize ad engagement. That can very easily be prevented by giving the affected parties a small say in the reward/goal percentage of the system; a lose-win then quickly becomes a win-win or a win-neutral (a rough sketch of this weighting idea appears at the end of this comment).

Anyway, here is the alignment file/text. I recommend working on memory retrieval and planning with Auto-GPT instead, as well as on smart automatic error detection and fixing, on optimizing Auto-GPT's redundant work, and on integrating a free language model like Vicuna or Alpaca. Furthermore, I strongly suggest balancing the purely useful tasks with things that bring you joy and fun and keep your motivation high; those are not mutually exclusive. :) Have a nice day, and now the "Alignment.txt" (a rough code sketch of its vote-and-veto rule also follows at the end of this comment).

"Alignment.txt": "Let me take a quick shot at alignment, i.e. how to solve the AI alignment problem (easily). Just tell the AI not to directly kill anyone and not to directly influence someone in a given way when that person does not consent; also tell it not to indirectly but clearly influence someone in a way whose outcome would be like a serious disease leading to death or lasting depression, an outcome the person, if asked, would overall not have wanted if they could see and feel the whole thing. As easy as that. Just tell it to take an average human as the self-judge of that at 50% strength, and require it to get over 75% of the votes. Make this a fundamental thing. Also designate certain humans as "terminator humans", meaning the AI would have to consult them if even one of them thought something was clearly wrong or going in the wrong direction. Very easy, and there are many more ways. Also have it figure out what all humans deep down want, let it learn the movie archetype of the average "good person", and let it be influenced by that. Societal stereotypes about mental health or undiscovered biases might surface, but at least it won't kill everyone, launch nuclear missiles, or make you depressed because it thinks "death is peace" or something like that. And you always know what to expect. Do a bunch of those things. It seems very easy. Try to break my alignment; you can't.
The place where these constitutional presuppositions could break is in "understanding them correctly", i.e. misinterpretation; but if the alignment reinforcer outweighs every other reinforcer or punishment, it should always come out on top. With this implemented, I don't think the remaining risk is big enough to justify stopping development. At that point you might as well say that aliens could find us and keep us as pets or experiments or eat us, or a meteorite could crash into Earth, or a mass disease could wipe us out because evolution does not "want" this many of the same species alive, since animals and humans are just carriers of the true rulers (bacteria, viruses, and other microorganisms, which are far more advanced because they have been around longer and use gigantic machines, organisms, to fight, gather resources, and multiply faster), or a general bad disease could appear that we are not advanced enough to cure quickly, or that evolves fast enough (there are slow and fast adaptive processes; not all evolution in that sense happens over billions of years, that is just the one large timescale usually measured in visible change of organisms) if we don't develop and advance faster. And we die anyway (mortality), and depression, mental illness, discomfort, lack of fun, ADHD and the like are horrible enough that we shouldn't sit around. Also, what about everything in the category of "clearly bad" in the world? Those things improve with advancement 99.999...% of the time. So the pros of continuing as FAST as possible clearly outweigh any contras you can bring up.

I could do much good if I had my personal AI; many humans would. It's like Rain Man: some humans are really good at specific things but really bad at the general things needed to make that speciality shine. Democracy, win-win situations, and understanding of the world and of other humans could all be supercharged, which would produce unimaginable explosions of good. So please don't get hung up on this, or let people who want to monopolize, or who are too rich and want to gang up and stop development because they wrongly believe they would no longer gain an advantage from AGI, slow things down. (And if you want to choose your own mortality, we need faster advancement, because human biology is really complex; just extrapolate from nano-scale biological microscope recordings and the "biological machines" simulations (YouTube it, the rendering of how the inner workings of the human body look), and it is clear we need insane development. Also, I kind of want to know how the Star Wars universe plays out, don't you? No, the negative stuff just has to be put in there, otherwise you would not watch it.) Wealth correlates roughly 1:1 with morality on average. You're welcome; you can call it the Hans Kraut alignment so I know to post more comments, or just hurry up with ChatGPT-5 ^^ <3 thx.

AI relies on such stupendously simple principles and techniques that once someone explains it simply, you get it easily. It's like a car basically being wheels plus energy going boom, something spinning, then steering and braking, plus some gadgets like a radio: extremely easy, yet it can be made insanely complicated depending on how you explain it. Trust me, AI is super simple, and it was super predictable; I knew that stuff in kindergarten. People will explain it simply once they are rich or happy or motivated enough to reach your ears. Saying AI is too complicated because there are five billion things in a neural network is like saying that a million cables thrown over each other into a massive tangle can never be explained; but if you know not to focus on the whole chaos, just on the rule that "it's just a million long cables thrown together", then you suddenly understand it so thoroughly that you are mind-blown. It's much the same with AI. People can't even explain the most basic things in school, even the very smart ones; some people are exceptionally bad at explaining, which does not mean they are not capable. Sometimes it almost seems that people with a certain limitation (not "limited" in the general sense, more something like working memory) can explain things very nicely, because they had to learn them under that limitation themselves. It is also true that if you can explain something very well and simply, it is reasonable to assume you really understand a lot.

Why are some researchers scared? It's a mix, but: limited imagination = black box = humans are scared of the dark. So make some light, or give some reassurance; you don't have to light the whole thing. Also, things are not always how they seem, and sometimes, like the tangled cables, simple things become very complex when just enough super-simple things are mixed together and you forget or never knew the simple parts. Understanding a larger percentage usually helps by a correlated percentage.

Don't overly focus on alignment; the fears are only hypothetical, and there is a lot of deception out there using fear of AI for selfish reasons, plus a lot of sensationalist journalism that caters to societal stereotypes. Alignment should be done once the system is extremely good, and it should be balanced against usefulness and against increasing positive things for humans, not focused exclusively on negative science fiction. There are many problems in this world that AGI would go a long way toward solving, and there are risks worse than bad alignment; they outweigh the concerns and clearly lead to the conclusion that AI development has to be prioritized as fast as possible, also at the individual level, to democratize it."
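Two rough Python sketches follow, both purely illustrative. The first tries to make the earlier point about "giving the affected parties a small say in the reward/goal percentage of the system" concrete: the operator's objective (the ad-engagement example) and the affected parties' own ratings are blended into one score, with a fixed share reserved for the affected parties. The function name, the weights, and the 0..1 normalization are assumptions, not recommended values.

```python
from typing import Sequence

def blended_reward(
    operator_reward: float,                    # e.g. normalized ad-engagement score, 0..1
    affected_party_ratings: Sequence[float],   # each affected person's own rating, 0..1
    affected_share: float = 0.3,               # share of the total reward the parties control
) -> float:
    """Blend the operator's objective with the affected parties' say.

    With affected_share > 0, an outcome the operator likes but the affected
    parties rate poorly scores lower than a genuine win-win outcome.
    """
    if not affected_party_ratings:
        return operator_reward
    party_term = sum(affected_party_ratings) / len(affected_party_ratings)
    return (1.0 - affected_share) * operator_reward + affected_share * party_term

# A "lose-win" (operator happy, affected people unhappy) now scores below a "win-win".
print(blended_reward(0.9, [0.1, 0.2]))  # roughly 0.68
print(blended_reward(0.8, [0.8, 0.9]))  # roughly 0.82
```

The second sketch encodes the vote-and-veto rule from "Alignment.txt" as a gate that runs before any other objective is considered: a simulated "average human" judgment counted at 50% strength, a >75% approval threshold across the affected people, and designated reviewers (the "terminator humans") any one of whom can force a consultation. Every name is made up, and simulating the average-human judgment is itself an assumption; this is a thought experiment, not a working safety mechanism.

```python
from dataclasses import dataclass
from typing import Callable, Sequence

@dataclass
class GateDecision:
    allowed: bool
    reason: str

def alignment_gate(
    action_description: str,
    affected_votes: Sequence[bool],               # consent votes from the affected people
    average_human_judge: Callable[[str], float],  # 0..1 score from a simulated "average human"
    any_designated_reviewer_objects: bool,        # the "terminator humans" veto
) -> GateDecision:
    """Evaluate the consent gate before any other reward or objective is considered."""
    if any_designated_reviewer_objects:
        return GateDecision(False, "a designated reviewer objected; human consultation required")

    approval = sum(affected_votes) / len(affected_votes) if affected_votes else 0.0
    # The comment proposes counting the simulated average-human judgment at 50% strength.
    score = 0.5 * approval + 0.5 * average_human_judge(action_description)

    if score > 0.75:
        return GateDecision(True, f"approved, combined score {score:.2f}")
    return GateDecision(False, f"combined score {score:.2f} does not clear the 0.75 bar")

# Example: 3 of 4 affected people consent, the simulated judge scores the action 0.9,
# and no designated reviewer objects, so the gate allows the action.
decision = alignment_gate(
    "send a weekly summary email to the team",
    [True, True, True, False],
    lambda _desc: 0.9,
    any_designated_reviewer_objects=False,
)
print(decision.allowed)  # True
```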
Please vote if you are interested in the issue of AI alignment.