-
For me, the main issue is the lack of transparency of haidra, as a community that builds tooling for running and integrating models in general and visual generative AI in particular, and the running of AI Horde. I understand that Stable Diffusion is in use all around the world, but I find the approach that LAION takes to copyright issues questionable at best. Corporations are already steamrolling over issues of consent, and I think we can do much better as an open-source community. I think that running models based on copyrighted materials is harmful to the wider use of these technologies by the people who would have the most to gain from them (artists). I am OK with my funds going to haidra as a dataset-agnostic development project, and wish it would address ethics and practices around datasets (which in my view are one of the pillars of ethical AI) more openly than it currently does. I am much less OK with my funds going towards keeping AI Horde alive. We have the potential to provide the foundations for an ethical, community-built training dataset, built around consent, open reviews, and rich discussion; let's not squander it.
-
First of all, thank you for your work here.
Regarding "building its own economy": from "Frequently Asked Questions -- How does this all work?" here: https://github.com/Haidra-Org/AI-Horde/blob/main/FAQ.md

"Can I sell my kudos? Is Kudos a cryptocurrency? Can I exchange kudos for cryptocurrencies?"

Are "kudos" what you're referring to as "building its own economy"?
-
The economy indeed refers to the internal reward model. It's inflationary by design, since users who don't have any kudos (including users who don't even have an account) can still do the majority of things the Horde has to offer; in those cases the system generates kudos on their behalf so the worker gets paid in full. Users spend kudos on two things (or collect them for the leaderboards): generations that would take unreasonably long to do for free, such as very large images or long text generations during times the queue is busy, and priority in the queue, which can help for bigger models not many people can host. This encourages people to host AI models and lets the workers democratize who has priority on the platform when resources are insufficient to be fast for everyone. They indeed don't intend for kudos to be sold, which is why it's not a cryptocurrency or anything like that; this keeps the system efficient and doesn't waste any of the valuable compute on crypto overhead. Making kudos sellable would encourage abuse on the platform, because people would seek exploits to get rich quick, which further commits Haidra to not taking such a step in the future.
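To make the priority mechanics described above concrete, here is a minimal sketch of a kudos-weighted queue. This is hypothetical illustration code, not AI Horde's actual implementation; the names `submit` and `next_job` are invented for the example.

```python
import heapq
from itertools import count

_tiebreak = count()  # FIFO order among requests with equal kudos

def submit(queue, user, kudos, job):
    # heapq is a min-heap, so negate kudos to serve the highest balance first.
    # Anonymous users (kudos=0) are still accepted, just served last.
    heapq.heappush(queue, (-kudos, next(_tiebreak), user, job))

def next_job(queue):
    _, _, user, job = heapq.heappop(queue)
    return user, job

queue = []
submit(queue, "anon", 0, "512x512 image")
submit(queue, "alice", 5000, "1024x1024 image")
submit(queue, "bob", 120, "short text generation")

served = [next_job(queue)[0] for _ in range(3)]
# alice (most kudos) is served first, the anonymous request last
```

A real scheduler would also weigh job size and worker capabilities; the point of the sketch is only that zero-kudos requests still get served eventually, they just wait behind funded ones.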
-
Yes.
-
Thanks for writing this up @SnoopJ. I'm going to first go through my process concerns, and then get into my actual issues with Haidra. (I apologize in advance for my egregious lack of brevity. TLDR is at the end before the footnotes.) For full disclosure, I'm a general member of Nivenly. I wasn't until today -- I was a ko-fi contributor at the tail end of the Basement Server Era, and then a GitHub sponsor, but since the Nivenly hello-world I was waiting for governance plans to be firmed up, so I could make an informed decision... and then there was, to use their words, radio silence. I'm a member now because I don't want to be called an outsider contributing to this discussion, and because I want to be able to vote if there is an election. I'm conflicted about this, because it looks like I'm contributing money as a result of onboarding Haidra, which is the opposite of what I want to convey. But I'm considering it a retroactive contribution, since my GitHub sponsorship dropped off while I was still actively using Hachyderm. Depending on the outcome of all this I may or may not remain a member.

Messaging from Nivenly on this topic has been confusing

First, they responded to nearly all toot-replies to the blog post (which were negative) with a cheerful-sounding "happy to answer questions" and promised to append a FAQ to the blog post (without specifying when that work might be completed -- it isn't, as of today). At some point, the ask changed from "post your questions here" to "create a Community Discussion". There was no mechanism for doing so -- this repository was created yesterday, and advertised on Discord¹ (not Hachyderm). I will say it is a bit frustrating to be asked to re-state all discussion here when it had already started on Mastodon.
It's also frustrating to be sort of imperiously told "you've been complaining the wrong way -- obviously, you need to start a Discussion, and we can't do it for you" when that was not documented and the discussion forum did not exist until that instruction. It is documented now -- the plane is being built while flying it, I completely get that, but my concern is about priorities. The point I'll keep coming back to is that, apparently, jumping into an AI project was the top priority, before finishing governance, before whatever else is in the other scheduled blog posts. I would like to echo what @SnoopJ said about being puzzled at the types of actions Nivenly as an organization is expected to take. The decision to onboard Haidra was seemingly taken, and announced, without any public input. But providing an official discussion venue after the announcement is a bridge too far? Odd.

How are prospective projects evaluated for Nivenly membership?

I know that Hachyderm is merely one member project, but it's a highly visible one and tooting from an
This means that general members have no say over what projects join Nivenly. This mightn't be a problem if the general membership had broad trust in the values and priorities of the trade/project delegates, but frankly that's exactly what many of us are questioning at this very moment. The fact that it seems Nivenly is structured so that the general membership has no influence over the makeup of the organization is a red flag to me. The governance page says that the branches of the Senate should be "equally balanced" -- so if the trade committee and the project steering group agree on something, then the sentiments of the general membership are immaterial³. And since the projects are chosen by the trade and project delegates, such agreement seems quite likely. Perhaps the plain language I'm quoting wasn't intended to be the full story, or I'm misinterpreting it. Let me know if so!

Haidra / GenAI in general

OK, enough rules lawyering (and I'm not a lawyer), let's talk about Haidra itself. First of all, is it pronounced /heɪdrə/, /haɪdrə/, or (probably not) /heɪaɪdrə/? This is clearly the most important question. With that out of the way, is this something Nivenly should want to get involved in? And as I keep repeating, is it the first thing Nivenly should want to get involved in, after the founding members Aurae and Hachyderm but before fully thinking through how the organization works? I (and others) argue that the answer is no. LLMs and image generation models are amazing technology. They've completely broken my intuition for what a computer should be able to do. (Remember those "I forced a bot to watch 100 hours of XYZ and here's what it came up with" posts from a few years ago? They were clearly fake then but look amateurish now!) And, I think they are unethically created and harmful to society as a whole. It seems from the announcement post that Haidra is joining Nivenly and receiving "governance and legal support of the Foundation".
Monetary support is not mentioned, and the contribution link is a separate Patreon, but the original toots that led to this partnership mentioned that getting donation processing to work was a big issue. So it's unclear if there is monetary support from Nivenly. But lawyers cost money, of course, so legal support is an in-kind contribution. I just wanted to note this nuance to get ahead of a potential "but your membership fee isn't going to Haidra!" rebuttal. (I also found while digging through that history that the maintainer of Haidra was going through burnout and other hardships. They described the reaction to Nivenly's announcement as "the Anti-GenerativeAI crowd taking over the replies" (and I agree it's interesting that nobody who is excited about the project saw fit to reply). I know that by writing this giant rant about AI and opposing Haidra joining Nivenly I'm joining in on what probably feels like a pile-on attack. I wish it weren't like that, but all I can say is it really isn't personal, and these are my sincere concerns about the technology itself.) The project itself, as far as I can tell, is about crowd-sourcing compute resources to run LLMs and image generation models, a little like SETI@Home back in the day. The stated goal is to democratize access to the usage (not training⁴) of these tools and wrest control from the big corporations. I mean, yes, cool, please smash capitalism. Democratizing access to things is good! But... only if the things themselves are good! This toot from

Currently, Haidra offers Stable Diffusion models for image generation. (Diversifying that may be planned for the future, but much like the situation with Nivenly governance, there's a difference between plans and current reality.) Anyway, Stable Diffusion and other so-called "AI" models have been trained by mass-scraping the internet, ingesting a near-infinite amount of human textual and artistic output with absolutely no regard to consent or licensing.
There are active class-action lawsuits about this, including specifically against Stable Diffusion itself, so it's sort of ridiculous to blithely dismiss liability concerns by saying it's "completely legal". Also, legal does not equal ethical. Artists and writers are losing their jobs to AI right now.⁵ That is happening, it's not a hypothetical. When an artist managed to shame Stable Diffusion into removing him from the dataset, the SD community shamelessly trained a model specifically to override his wishes. AI also uses vast amounts of electricity and water as climate change boils our planet. And for what? As AI-generated images and text seep into more and more of our lives, we devalue human work and lose the privilege of believing anything is real. This tech can and will destabilize elections. Shoddy "AI detectors" just shut out non-native English speakers. Did Nivenly consider any of these externalities, or any others, when deciding whether to partner with Haidra? Did it seek advice from folks like Timnit Gebru, who think about these issues full-time? Did it even anticipate that this announcement might generate controversy? If the answer to any of these was yes, it wasn't evident.

EOF

My apologies again for rambling. "If I had more time, I would have written a shorter letter." I'll try to summarize:
Footnotes
-
OK. Looking at "Financial Contributions" on https://opencollective.com/nivenly-foundation, I'm seeing four categories (Project Membership, Individual -- Fellow Membership, Individual -- Trade Membership), each with "Be the first one to contribute!" under "Latest Activity". Only "General Membership, Individual" has "Latest Activity - 57" and a link forward to a membership listing. Is it correct to assume from this that there are no Project, Fellow, or Trade members, and that participation is limited to General members only at this point in time?
-
hmm... Found an aburka @ Hachyderm and unblocked. We move on.
-
In this vein, I would like to add that the ACM Code of Ethics and Professional Conduct has the following things to say about data provenance:
Emphasis added. These seem, in light of the above comments, to be incompatible at the base with the use of Stable Diffusion, which has specifically been called out for training processes that work off of copyrighted data without the owners' knowledge or consent. I would very much argue that anything that derives from Stable Diffusion (or from any model that behaves like it in any way) should be considered "fruit of the poisonous tree"—forever tainted by its original provenance in unethically (irrespective of legality) obtained data. That haidra can use other models is interesting, but not really relevant if it centers Stable Diffusion or prioritizes its use in any real way, which from the information available so far it appears to do. Whether nivenly specifically subscribes to this code or to another, it would be good to understand how nivenly (and haidra) view their ethical responsibilities with respect to these three passages in general and the highlighted text in specific, and to understand what steps, if any, nivenly believes are required to ensure that its participation in this does not further encourage violation of the above. ETA: Looking at the Haidra website I can see that it links to AI Horde, which in turn links to a website for AI Horde, which appears to be under the Haidra general moniker. On there I see links to the following
For the text models I end up in a maze of twisty passages, but very few of the listed models seem to list anything about their data provenance. Many of them seem to start with Meta AI's models as a base.
-
To make things 100% clear, especially since the AI Horde creator was deliberately muddying the waters on Mastodon: the only image generation worker AI Horde supports is Stable Diffusion. While it's true that there are multiple training models based on Stable Diffusion, the core model is Stable Diffusion, and all those other models are derivatives of it. I want to talk about music and copyright law for a moment. For any artistic work to receive copyright protections, it must be "fixed in a tangible medium". For music, for many years this meant on-paper sheet music. Eventually, musical recordings could be copyrighted, but the equipment to do so was expensive and out of the hands of most people. For most people, music followed, effectively, an "oral" tradition. People would "write" music in their heads, play music with their families and communities, sing lyrics that had never been written down. So of course, composers exploited this. They would travel around and listen to "folk music", write it down, and then "own" it. In 20th-century America, this was a deliberate strategy to steal music from black people. When people talk about the appropriation of jazz and rock, it wasn't just the mimicking of a new style of music, but the actual theft of creative work: white composers wrote down the "uncopyrighted" work of black musicians, then "owned" it according to the law, and in some cases even used the law to ban the original creators from performing their own music. "I'm going to take from you. You have no say in the matter. You're not allowed to stop me. If you try, the law will protect me, not you." This is the same thinking as training AI on art without the artists' consent. Per others' comments, we don't know if the law will side with AI models or creators, but even if training AI on nonconsensual work is legal, so was the theft of music from "folk" musicians.
I maintain that it is wildly unethical, I will not be a part of any project or organization that advances it, and I will not trust any person who doesn't side with creators over AI models. That this project was even advanced and accepted by Nivenly breaks my heart. Yes, the talk now, after the pushback, is that AI Horde has applied and there will be a vote. That's not what the announcement said: "We are very excited to share that Haidra has joined the Nivenly Foundation as a member project." (Emphasis mine.) I am not a "General Member" of Nivenly, but only because I just learned of membership today, so apparently I won't get a vote on this. I've been a GitHub contributor since I joined Hachyderm. Honestly, it upsets me that there are multiple donation platforms and they don't convey the same rights and privileges. I have stopped my monthly GitHub donation, which continues through this month. If AI Horde is dropped as a project, I will join Nivenly as a General Member, but as long as there's a possibility of Nivenly supporting AI Horde I cannot be involved. A procedural note on voting membership: is there a "cooling off" period after joining before someone gets to vote? If not, it seems to me that an applying project could convince its members to all get General Memberships to stuff the ballot box in their favor. According to the screenshots above, it looks like there are only ~60 General Members, so it wouldn't take too many $7 contributions to tip the scales regardless of existing Nivenly member sentiment. What steps, if any, is Nivenly taking to ensure this kind of "hostile takeover" doesn't happen?
-
I'm on a different Mastodon server and not a member of Nivenly, but I wanted to ask: will Hachyderm provide training data to Horde in any way? Just wondering if I might have a stake in this after all, if my content federates to Hachyderm.
-
Ethical considerations and questions that I would personally like to see addressed; an expanded form of what I wrote above and what I wrote on Mastodon. Because of how tightly tied this is to the question of data provenance and the follow-on questions here, I am choosing to put this here rather than in another thread, but I do not object to it being split off as a conversation.
References are to one of:
My point in referencing these is not to say that any specific one should be adhered to, but rather to highlight how these are concerns within the software engineering discipline that have existed for quite some time, and the ethical considerations here are much more ubiquitous than any single discussion on generative AI. I have not attempted a thorough and complete analysis with respect to the above, and I do not believe that they are at all "deal killers" (the ACM specifically says its code isn't an "algorithm"), but rather a framework under which to discuss what is good, what is bad, and what can be done about the challenges. They also form coherent starting points for discussing ethical concerns by enabling the question of "if you don't accept this, that's fine, but what do you believe?"
-
Hello everyone! Time-sensitive reminder first: the question gathering period will close at 11:59 PM UTC. Second: we wanted to thank everyone for taking the time to share their questions, concerns, and expertise as part of this process. It is very important to us that people are heard and have the questions and information that they are seeking. For the questions, further nuance and detail will be provided in our follow-up blog posts about Nivenly and Haidra. Something that will inform our member discussion, and that would be helpful to see here, are thoughts on what working together looks like. This is especially important in a co-op model, as members have the ability to drive change. In the more traditional, and likely familiar, corporate model, companies are influenced by one-directional, inbound requests for the company to do / not do something. In the co-op model, members can make changes. Our chambers are designed to allow members, maintainers, and trade members to work together as often as possible. This means that members can take more actions than informing the board that they would like to see more information or change a specific decision. They can also work with the projects. There is a whole blank whiteboard of possibility here, but to seed some ideas so you know what we mean: working with a project can mean identifying an issue with a project and requesting the project change it. It can mean working with the project, not only by assisting with the specific implementation details but also by making sure the project has other resources it might need to resolve whatever is being requested, like mentorship, panels with the relevant disciplines (e.g. data scientists, artists, etc. for an AI art project), and so on. To say it a little more succinctly: in a co-op model like ours where people are building together, not only can members work with Nivenly leadership to build Nivenly, but they can also build with projects to help those projects.
Working with projects does not necessarily mean code contributions, though it can, but rather the feedback loop of what you'd like to see and working with Nivenly and the project to figure out how to make that happen. We'll be posting something like this in the member discussion when we start it as well, but wanted to have people thinking about it before then. When you see a project that needs changes, how would you work towards those changes? What outcomes would everyone be happy and comfortable working towards as members and maintainers? What resources are needed to arrive at that outcome (or outcomes)?
-
It is telling that when called out for using unethically created models, all someone (apparently) from the Haidra project had to say on Mastodon was that their usage is "completely legal", and that they "consider" Haidra to be "completely ethical". Well ok then! That is not a good place to start from in general, but especially not for Nivenly.
-
I wanted to pull this out of the other guy's thread to speak about it more broadly. From @nivenly-foundation:
If this is the stance of Nivenly on data consent, then how is AI Horde even being considered as a project? It requires, and has enhanced, Stable Diffusion (and probably an equivalent text model) which, without consent, appropriated art from across the entire internet, unknown to the artists, many of whom have publicly come out against these AI models. Or does the consent model only apply to data under the umbrella of Nivenly, while anyone else's data can be harvested without consent?
-
Before this closes down, I do just want to register my core complaint, even if it's not entirely germane to the specific stated discussion. I think plenty of ink has been spilled on the concept of and concerns around data provenance practices so far, and that is great, and I'm glad we're having this discussion. However, my primary issue with Haidra and any other project working in that same space really isn't the provenance of the data used to create the models it runs, but what the models are for and what they do. In a time when artists, authors, and even art itself are being devalued, generative AI is a powerful, damaging tool being used to devalue them further. It's not that there are no ethical uses of generative AI, but Haidra and other such tools aren't among them. Sure, a world where only the megacorps have access to these tools is a bad one, but a world where everyone has access and considers it okay to use them is not any better. Potentially we should also have a discussion around "What (if any) machine learning usage practices does the Nivenly Foundation support?", but I wanted to make sure that if this discussion was the one being used to decide What To Do About Haidra, then this basic objection got at least a little more attention and was not lost among the primary discussion, since even if we can solve the data provenance issue around Haidra, I still would not support its inclusion.
-
hello, I've been hesitating to pose this question since it's slightly off the stated discussion topic on ethics and provenance, but only slightly. I did mention it in the Mastodon thread to be stiffly told everything was "completely legal". I'm repeating my reaction to it here to make sure it's included in the followup blog posts. Apart from the ethics questions that others have posed in this thread and on Mastodon, I have this specific one:
The above slightly off-topic question has led me to reread all of nivenly.org and reconsider the lack of information wrt the implied general questions, which are much further off topic, but I'd really like to see them addressed on the website. TL;DR: what exactly is Nivenly, as a legal entity, doing, and how exposed are members to liability for actions/decisions we ostensibly own but in reality do not control?
Running a co-op is hard work, and building consensus is hard work, but there is also the bureaucracy of making it a legal entity that can handle the consequences of deciding to support or not support projects, as Nivenly intends to.
-
Thank you to everyone for adding your thoughts, needs, experiences, and expertise to this thread, and to those of you who have communicated with us elsewhere. We'll be using the discussion here to ensure both Haidra's and Nivenly's blog posts address the needs and requests for information that are included here. Communicating timelines:
The outcome of the next steps will determine what happens next, e.g. the requested member election and/or other governance processes. Again, thank you to everyone - both Nivenly members and non-members - who are contributing to this discussion.
-
I am opening this discussion in direct response to a solicitation on the Nivenly Discord for someone to open a thread here to discuss the recent announcement of the addition of Haidra to the Nivenly Foundation as a member project.
Before getting into it, I want to express that I find it perplexing that there was leadership enough to bring this project under the umbrella of Nivenly's stewardship, and to respond to people expressing concerns on Hachyderm, but opening a place for formal discussion of those concerns necessarily required a champion from within the group that had concerns. I don't think that's a great way to begin this discussion, but I'm willing to be the one who opens a thread, so here we go…
What is Haidra/AI Horde?
My understanding of Haidra is that it is a community organization that supports one project called "AI Horde", which seems to be a project for distributing computation for running generative models, with an apparent focus on models that produce text or images, and explicitly building its own economy. As someone who works in computer vision, I am more personally familiar with the pitfalls of the image models and so I will focus on them, but every concern I am about to express has a direct analogue in the world of language models. Perhaps someone who is more familiar with this domain can speak to the language models currently being promoted by AI Horde.
As far as I can tell, much of the image-generative portion of AI Horde appears to be about running "Stable Diffusion" models, whose development was funded heavily by the startup Stability AI (who appear to be claimed as a sponsor in at least one portion of the AI Horde documentation). These models are often trained on datasets assembled by LAION, which consist of URLs to images on the web along with the alt text associated with those images, without any concern for the terms of use or redistribution set by the owners of those images. I am aware that AI Horde serves a variety of models, but in a brief review of these other models, I found much less information about the data used to train them. It is possible these models do not rely on LAION, but if this is the case, it is not something I was able to discover without digging into each model. I hope you will take it on faith that the problems I am about to broach are not specific to the LAION datasets and apply to many image datasets, and I have no reason to believe the bulk of AI Horde's supported models do not suffer from similar (if not exactly the same) problems.
Example: the problems of LAION
LAION themselves side-step concerns about copyright violations by pointing out that they publish only a list of URLs (and some metadata like alt text). I have not found any discussion of data provenance in any of the AI Horde documentation, but it is possible that I missed some mention of it while poking around.
Training an image generation model with this dataset necessarily requires downloading the images and encoding information contained within them into the model, and current models are large enough that they may contain complete copies of the images they were trained on. This is the basis of an ongoing lawsuit against Stability AI, whose models are so effective at reproducing the works they are trained on that they will also reproduce watermarks.
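To illustrate the distinction being drawn here: the dataset itself ships only pointers, but a training pipeline must fetch the actual image bytes before it can encode anything into model weights. A hypothetical sketch (the field names and functions are illustrative, not LAION's real schema or any real training code):

```python
# A LAION-style entry: a pointer plus scraped metadata, no pixels.
record = {
    "url": "https://example.com/artwork.jpg",
    "alt_text": "a painting of a lighthouse",
}

def train_step(record, download):
    # The provenance-relevant act happens here, at training time: the
    # image is fetched and its contents are encoded into the model,
    # regardless of the hosting site's terms of use.
    image_bytes = download(record["url"])
    return len(image_bytes)  # stand-in for "encode into model weights"

def fake_download(url):
    # Stand-in for an HTTP fetch, so this sketch runs offline.
    return b"\x89PNG fake image bytes"

train_step(record, fake_download)
```

The point of the sketch is that the "we only publish URLs" defense applies to the dataset artifact, not to the training process that consumes it.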
Intellectual property concerns should probably be enough for Nivenly to consider whether projects organized around these models truly fall under the banner of 'open source', but there are additional concerns. For instance, the LAION-5B dataset has been shown to include private medical records, and the flaws of the LAION datasets are discussed in numerous¹ published² scholarly³ works⁴ covering problems ranging from the reproduction of works from the training data to reinforcement of the misogynist and racist biases encoded therein.
What is Nivenly's stance on data provenance in machine learning?
Nivenly advertises itself as "building an equitable future" and as "designed to serve the public interest of its communities". In announcing the addition of Haidra as a member project with no discussion of data provenance, I would say that Nivenly has failed to live up to these promises, because the question of "where'd the training data come from?" is an immensely important one that should be foregrounded at every opportunity, especially in the world of generative models.
I won't mince words: I feel that many of the models that AI Horde is designed to run are wildly unethical, and I am deeply concerned by the Foundation's decision to promote this. I would like to know what (if any) principles about data provenance guided the Foundation's decision to support these models, and what kind of practices I can expect from it in the future.
If Nivenly is getting into the "AI" business, I would like to pose the following question(s), but I encourage other community members to chime in as they see fit.
Is the Foundation concerned with the data used to train machine learning models? If so, what distinguishes acceptable practices from unacceptable practices?