-
For me, the main issue is the lack of transparency of haidra, as a community that builds tooling for running and integrating models in general and visual generative AI in particular, and the running of AI Horde. I understand that Stable Diffusion is in use all around the world, but I find the approach that LAION takes to copyright issues questionable at best. Corporations are already steamrolling over issues of consent, and I think we can do much better as an open-source community. I think that running models based on copyrighted materials is harmful to the wider use of these technologies by the people who would have the most to gain from them (artists). I am OK with my funds going to haidra as a dataset-agnostic development project, and wish it would address ethics and practices around datasets (which in my view are one of the pillars of ethical AI) more openly than it currently does. I am much less OK with my funds going towards keeping AI Horde alive. We have the potential to provide the foundations for an ethical, community-built training dataset, built around consent, open reviews, and rich discussion; let's not squander it.
-
First of all, thank you for your work here.
Regarding "building its own economy": from "Frequently Asked Questions -- How does this all work?" here: https://github.com/Haidra-Org/AI-Horde/blob/main/FAQ.md

"Can I sell my kudos? Is Kudos a cryptocurrency? Can I exchange kudos for cryptocurrencies?"

Are "kudos" what you're referring to as "building its own economy"?
-
The economy indeed refers to the internal reward model. It's inflationary by design, since users who don't have any kudos (including users who don't even have an account) can still do the majority of things the Horde has to offer; in those cases the system generates kudos on their behalf so the worker gets paid in full. Users spend kudos on two things (or collect them for the leaderboards): generations that would take unreasonably long to do for free, such as very large images or long text generations during times the queue is busy, and priority in the queue, which can help for bigger models not many people can host. This encourages people to host AI models and lets the workers democratize who has priority on the platform when resources are insufficient to be fast for everyone. They indeed don't intend for kudos to be sold, which is why it's not a cryptocurrency or anything like that; this keeps the system efficient and doesn't waste any of the valuable compute on crypto overhead. Making kudos sellable would encourage abuse on the platform, because people would seek exploits to get rich quick, which further commits Haidra to not taking such a step in the future.
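To make the priority mechanics described above concrete, here is a minimal sketch of a kudos-weighted queue. This is hypothetical illustration code, not AI Horde's actual implementation; the names `submit` and `next_job` are invented for the example.

```python
import heapq
from itertools import count

_tiebreak = count()  # FIFO order among requests with equal kudos

def submit(queue, user, kudos, job):
    # heapq is a min-heap, so negate kudos to serve the highest balance first.
    # Anonymous users (kudos=0) are still accepted, just served last.
    heapq.heappush(queue, (-kudos, next(_tiebreak), user, job))

def next_job(queue):
    _, _, user, job = heapq.heappop(queue)
    return user, job

queue = []
submit(queue, "anon", 0, "512x512 image")
submit(queue, "alice", 5000, "1024x1024 image")
submit(queue, "bob", 120, "short text generation")

served = [next_job(queue)[0] for _ in range(3)]
# alice (most kudos) is served first, the anonymous request last
```

A real scheduler would also weigh job size and worker capabilities; the point of the sketch is only that zero-kudos requests still get served eventually, they just wait behind funded ones.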
-
Yes.
-
Thanks for writing this up @SnoopJ. I'm going to first go through my process concerns, and then get into my actual issues with Haidra. (I apologize in advance for my egregious lack of brevity. TLDR is at the end before the footnotes.) For full disclosure, I'm a general member of Nivenly. I wasn't until today -- I was a ko-fi contributor at the tail end of the Basement Server Era, and then a GitHub sponsor, but since the Nivenly hello-world I was waiting for governance plans to be firmed up, so I could make an informed decision... and then there was, to use their words, radio silence. I'm a member now because I don't want to be called an outsider contributing to this discussion, and because I want to be able to vote if there is an election. I'm conflicted about this, because it looks like I'm contributing money as a result of onboarding Haidra, which is the opposite of what I want to convey. But I'm considering it a retroactive contribution, since my GitHub sponsorship dropped off while I was still actively using Hachyderm. Depending on the outcome of all this I may or may not remain a member.

Messaging from Nivenly on this topic has been confusing

First, they responded to nearly all toot-replies to the blog post (which were negative) with a cheerful-sounding "happy to answer questions" and promised to append a FAQ to the blog post (without specifying when that work might be completed -- it isn't, as of today). At some point, the ask changed from "post your questions here" to "create a Community Discussion". There was no mechanism for doing so -- this repository was created yesterday, and advertised on Discord¹ (not Hachyderm). I will say it is a bit frustrating to be asked to re-state all discussion here when it had already started on Mastodon.
It's also frustrating to be sort of imperiously told "you've been complaining the wrong way -- obviously, you need to start a Discussion, and we can't do it for you" when that was not documented and the discussion forum did not exist until that instruction. It is documented now -- the plane is being built while flying it, I completely get that, but my concern is about priorities. The point I'll keep coming back to is that, apparently, jumping into an AI project was the top priority, before finishing governance, before whatever else is in the other scheduled blog posts. I would like to echo what @SnoopJ said about being puzzled at the types of actions Nivenly as an organization is expected to take. The decision to onboard Haidra was seemingly taken, and announced, without any public input. But providing an official discussion venue after the announcement is a bridge too far? Odd.

How are prospective projects evaluated for Nivenly membership?

I know that Hachyderm is merely one member project, but it's a highly visible one and tooting from an
This means that general members have no say over what projects join Nivenly. This mightn't be a problem if the general membership had broad trust in the values and priorities of the trade/project delegates, but frankly that's exactly what many of us are questioning at this very moment. The fact that it seems Nivenly is structured so that the general membership has no influence over the makeup of the organization is a red flag to me. The governance page says that the branches of the Senate should be "equally balanced" -- so if the trade committee and the project steering group agree on something, then the sentiments of the general membership are immaterial³. And since the projects are chosen by the trade and project delegates, such agreement seems quite likely. Perhaps the plain language I'm quoting wasn't intended to be the full story, or I'm misinterpreting it. Let me know if so!

Haidra / GenAI in general

OK, enough rules lawyering (and I'm not a lawyer), let's talk about Haidra itself. First of all, is it pronounced /heɪdrə/, /haɪdrə/, or (probably not) /heɪaɪdrə/? This is clearly the most important question. With that out of the way, is this something Nivenly should want to get involved in? And as I keep repeating, is it the first thing Nivenly should want to get involved in, after the founding members Aurae and Hachyderm but before fully thinking through how the organization works? I (and others) argue that the answer is no. LLMs and image generation models are amazing technology. They've completely broken my intuition for what a computer should be able to do. (Remember those "I forced a bot to watch 100 hours of XYZ and here's what it came up with" posts from a few years ago? They were clearly fake then but look amateurish now!) And, I think they are unethically created and harmful to society as a whole. It seems from the announcement post that Haidra is joining Nivenly and receiving "governance and legal support of the Foundation".
Monetary support is not mentioned, and the contribution link is a separate Patreon, but the original toots that led to this partnership mentioned that getting donation processing to work was a big issue. So it's unclear if there is monetary support from Nivenly. But lawyers cost money, of course, so legal support is an in-kind contribution. I just wanted to note this nuance to get ahead of a potential "but your membership fee isn't going to Haidra!" rebuttal. (I also found while digging through that history that the maintainer of Haidra was going through burnout and other hardships. They described the reaction to Nivenly's announcement as "the Anti-GenerativeAI crowd taking over the replies" (and I agree it's interesting that nobody who is excited about the project saw fit to reply). I know that by writing this giant rant about AI and opposing Haidra joining Nivenly I'm joining in on what probably feels like a pile-on attack. I wish it weren't like that, but all I can say is it really isn't personal, and these are my sincere concerns about the technology itself.) The project itself, as far as I can tell, is about crowd-sourcing compute resources to run LLMs and image generation models, a little like SETI@Home back in the day. The stated goal is to democratize access to the usage (not training⁴) of these tools and wrest control from the big corporations. I mean, yes, cool, please smash capitalism. Democratizing access to things is good! But... only if the things themselves are good! This toot from

Currently, Haidra offers Stable Diffusion models for image generation. (Diversifying that may be planned for the future, but much like the situation with Nivenly governance, there's a difference between plans and current reality.) Anyway, Stable Diffusion and other so-called "AI" models have been trained by mass-scraping the internet, ingesting a near-infinite amount of human textual and artistic output with absolutely no regard to consent or licensing.
There are active class-action lawsuits about this, including specifically against Stable Diffusion itself, so it's sort of ridiculous to blithely dismiss liability concerns by saying it's "completely legal". Also, legal does not equal ethical. Artists and writers are losing their jobs to AI right now.⁵ That is happening, it's not a hypothetical. When an artist managed to shame Stable Diffusion into removing him from the dataset, the SD community shamelessly trained a model specifically to override his wishes. AI also uses vast amounts of electricity and water as climate change boils our planet. And for what? As AI-generated images and text seep into more and more of our lives, we devalue human work and lose the privilege of believing anything is real. This tech can and will destabilize elections. Shoddy "AI detectors" just shut out non-native English speakers. Did Nivenly consider any of these externalities, or any others, when deciding whether to partner with Haidra? Did it seek advice from folks like Timnit Gebru, who think about these issues full-time? Did it even anticipate that this announcement might generate controversy? If the answer to any of these was yes, it wasn't evident.

EOF

My apologies again for rambling. "If I had more time, I would have written a shorter letter." I'll try to summarize:
Footnotes
-
OK. Looking at "Financial Contributions" on https://opencollective.com/nivenly-foundation, I'm seeing four categories (Project Membership, Individual -- Fellow Membership, Individual -- Trade Membership), each with "Be the first one to contribute!" under "Latest Activity". Only "General Membership, Individual" has "Latest Activity - 57" and a link forward to a membership listing. Is it correct to assume from this that there are no Project, Fellow, or Trade members, and that participation is limited to General members only at this point in time?
-
hmm... Found an aburka @ Hachyderm and unblocked. We move on.
-
In this vein, I would like to add that the ACM Code of Ethics and Professional Conduct has the following things to say about data provenance:
Emphasis added. These seem, in light of the above comments, to be incompatible at the base with the use of Stable Diffusion, which has specifically been called out for training processes that work off of copyrighted data without the owners' knowledge or consent. I would very much argue that anything that derives from Stable Diffusion (or from any model that behaves like it in any way) should be considered "fruit of the poisonous tree"—forever tainted by its original provenance in unethically (irrespective of legality) obtained data. That haidra can use other models is interesting, but not really relevant if it centers Stable Diffusion or prioritizes its use in any real way, which from the information available so far it appears to do. Whether nivenly specifically subscribes to this code or to another, it would be good to understand how nivenly (and haidra) view their ethical responsibilities with respect to these three passages in general and the highlighted text in specific, and to understand what steps, if any, nivenly believes are required to ensure that its participation in this does not further encourage violation of the above. ETA: Looking at the Haidra website I can see that it links to AI Horde, which in turn links to a website for AI Horde, which appears to be under the Haidra general moniker. On there I see links to the following
For the text models I end up in a maze of twisty passages, but very few of the listed models seem to list anything about their data provenance. Many of them seem to start with Meta AI's models as a base.
-
To make things 100% clear, especially since the AI Horde creator was deliberately muddying the waters on Mastodon: the only image generation worker AI Horde supports is Stable Diffusion. While it's true that there are multiple training models based on Stable Diffusion, the core model is Stable Diffusion, and all those other models are derivatives of it. I want to talk about music and copyright law for a moment. For any artistic work to receive copyright protections, it must be "fixed in a tangible medium". For music, for many years this meant on-paper sheet music. Eventually, musical recordings could be copyrighted, but the equipment to do so was expensive and out of the hands of most people. For most people, music followed, effectively, an "oral" tradition. People would "write" music in their heads, play music with their families and communities, sing lyrics that had never been written down. So of course, composers exploited this. They would travel around and listen to "folk music", write it down, and then "own" it. In 20th-century America, this was a deliberate strategy to steal music from black people. When people talk about the appropriation of jazz and rock, it wasn't just the mimicking of a new style of music, but the actual theft of creative work: white composers wrote down the "uncopyrighted" work of black musicians, then "owned" it according to the law, and in some cases even used the law to ban the original creators from performing their own music. "I'm going to take from you. You have no say in the matter. You're not allowed to stop me. If you try, the law will protect me, not you." This is the same thinking as training AI on art without the artists' consent. Per others' comments, we don't know if the law will side with AI models or creators, but even if training AI on nonconsensual work is legal, so was the theft of music from "folk" musicians.
I maintain that it is wildly unethical, I will not be a part of any project or organization that advances it, and I will not trust any person who doesn't side with creators over AI models. That this project was even advanced and accepted by Nivenly breaks my heart. Yes, the talk now, after the pushback, is that AI Horde has applied and there will be a vote. That's not what the announcement said: "We are very excited to share that Haidra has joined the Nivenly Foundation as a member project." (Emphasis mine.) I am not a "General Member" of Nivenly, but only because I just learned of membership today, so apparently I won't get a vote on this. I've been a GitHub contributor since I joined Hachyderm. Honestly, it upsets me that there are multiple donation platforms and they don't convey the same rights and privileges. I have stopped my monthly GitHub donation, which continues through this month. If AI Horde is dropped as a project, I will join Nivenly as a General Member, but as long as there's a possibility of Nivenly supporting AI Horde I cannot be involved. A procedural note on voting membership: is there a "cooling off" period after joining before someone gets to vote? If not, it seems to me that an applying project could convince its members to all get General Memberships to stuff the ballot box in their favor. According to the screenshots above, it looks like there are only ~60 General Members, so it wouldn't take too many $7 contributions to tip the scales regardless of existing Nivenly member sentiment. What steps, if any, is Nivenly taking to ensure this kind of "hostile takeover" doesn't happen?
-
I'm on a different Mastodon server and not a member of Nivenly, but I wanted to ask: will Hachyderm provide training data to Horde in any way? Just wondering if I might have a stake in this after all, if my content federates to Hachyderm.
-
Ethical considerations and questions that I would personally like to see addressed; an expanded form of what I wrote above and what I wrote on Mastodon. Because of how tightly tied this is to the question of data provenance and the follow-on questions here, I am choosing to put this here rather than in another thread, but I do not object to it being split off as a conversation.
References are to one of:
My point in referencing these is not to say that any specific one should be adhered to, but rather to highlight how these are concerns within the software engineering discipline that have existed for quite some time, and the ethical considerations here are much more ubiquitous than any single discussion on generative AI. I have not attempted a thorough and complete analysis with respect to the above, and I do not believe that they are at all "deal killers" (the ACM specifically says its code isn't an "algorithm"), but rather a framework under which to discuss what is good, what is bad, and what can be done about the challenges. They also form coherent starting points for discussing ethical concerns by enabling the question of "if you don't accept this, that's fine, but what do you believe?"
-
Hello everyone! Time-sensitive reminder first: the question gathering period will close at 11:59 PM UTC. Second: we wanted to thank everyone for taking the time to share their questions, concerns, and expertise as part of this process. It is very important to us that people are heard and have the questions and information that they are seeking. For the questions, further nuance and detail will be provided in our follow-up blog posts about Nivenly and Haidra. Something that will inform our member discussion, and that would be helpful to see here, are thoughts on what working together looks like. This is especially important in a co-op model, as members have the ability to drive change. In the more traditional, and likely familiar, corporate model, companies are influenced by one-directional, inbound requests for the company to do / not do something. In the co-op model, members can make changes. Our chambers are designed to allow members, maintainers, and trade members to work together as often as possible. This means that members can take more actions than informing the board that they would like to see more information or change a specific decision. They can also work with the projects. There is a whole blank whiteboard of possibility here, but to seed some ideas so you know what we mean: working with a project can mean identifying an issue with a project and requesting the project change it. It can mean working with the project, not only by assisting with the specific implementation details but also by making sure the project has other resources it might need to resolve whatever is being requested, like mentorship, panels with the relevant disciplines (e.g. data scientists, artists, etc. for an AI art project), and so on. To say it a little more succinctly: in a co-op model like ours where people are building together, not only can members work with Nivenly leadership to build Nivenly, but they can also build with projects to help those projects.
Working with projects does not necessarily mean code contributions, though it can, but rather the feedback loop of what you'd like to see and working with Nivenly and the project to figure out how to make that happen. We'll be posting something like this in the member discussion when we start it as well, but wanted to have people thinking about it before then. When you see a project that needs changes, how would you work towards those changes? What outcomes would everyone be happy and comfortable working towards as members and maintainers? What resources are needed to arrive at that outcome (or outcomes)?
-
It is telling that when called out for using unethically created models, all someone (apparently) from the Haidra project had to say on Mastodon was that their usage is "completely legal", and that they "consider" Haidra to be "completely ethical". Well ok then! That is not a good place to start from in general, but especially not for Nivenly.
-
I wanted to pull this out of the other guy's thread to speak about it more broadly. From @nivenly-foundation:
If this is the stance of Nivenly on data consent, then how is AI Horde even being considered as a project? It requires, and has enhanced, Stable Diffusion (and probably an equivalent text model) which, without consent, appropriated art from across the entire internet, unknown to the artists, many of whom have publicly come out against these AI models. Or does the consent model only apply to data under the umbrella of Nivenly, while anyone else's data can be harvested without consent?
-
Before this closes down, I do just want to register my core complaint, even if it's not entirely germane to the specific stated discussion. I think plenty of ink has been spilled on the concept of and concerns around data provenance practices so far, and that is great, and I'm glad we're having this discussion. However, my primary issue with Haidra and any other project working in that same space really isn't the provenance of the data used to create the models it runs, but what the models are for and what they do. In a time when artists, authors, and even art itself are being devalued, generative AI is a powerful, damaging tool being used to devalue them further. It's not that there are no ethical uses of generative AI, but Haidra and other such tools aren't among them. Sure, a world where only the megacorps have access to these tools is a bad one, but a world where everyone has access and considers it okay to use them is not any better. Potentially we should also have a discussion around "What (if any) machine learning usage practices does the Nivenly Foundation support?", but I wanted to make sure that if this discussion was the one being used to decide What To Do About Haidra, then this basic objection got at least a little more attention and was not lost among the primary discussion, since even if we can solve the data provenance issue around Haidra, I still would not support its inclusion.
-
hello, I've been hesitating to pose this question since it's slightly off the stated discussion topic on ethics and provenance, but only slightly. I did mention it in the Mastodon thread to be stiffly told everything was "completely legal". I'm repeating my reaction to it here to make sure it's included in the followup blog posts. Apart from the ethics questions that others have posed in this thread and on Mastodon, I have this specific one:
The above slightly off-topic question has led me to reread all of nivenly.org and reconsider the lack of information wrt the implied general questions, which are much further off topic, but I'd really like to see them addressed on the website. TL;DR: what exactly is Nivenly, as a legal entity, doing, and how exposed are members to liability for actions/decisions we ostensibly own but in reality do not control?
Running a co-op is hard work, and building consensus is hard work, but there is also the bureaucracy of making it a legal entity that can handle the consequences of deciding to support or not support projects, as Nivenly intends to.
-
Thank you to everyone for adding your thoughts, needs, experiences, and expertise to this thread, and to those of you who have communicated with us elsewhere. We'll be using the discussion here to ensure both Haidra's and Nivenly's blog posts address the needs and requests for information that are included here. Communicating timelines:
The outcome of the next steps will determine what happens next, e.g. the requested member election and/or other governance processes. Again, thank you to everyone - both Nivenly members and non-members - who are contributing to this discussion.
-
I am opening this discussion in direct response to a solicitation on the Nivenly Discord for someone to open a thread here to discuss the recent announcement of the addition of Haidra to the Nivenly Foundation as a member project.
Before getting into it, I want to express that I find it perplexing that there was leadership enough to bring this project under the umbrella of Nivenly's stewardship, and to respond to people expressing concerns on Hachyderm, but opening a place for formal discussion of those concerns necessarily required a champion from within the group that had concerns. I don't think that's a great way to begin this discussion, but I'm willing to be the one who opens a thread, so here we go…
What is Haidra/AI Horde?
My understanding of Haidra is that it is a community organization that supports one project called "AI Horde", which seems to be a project for distributing computation for running generative models, with an apparent focus on models that produce text or images, and explicitly building its own economy. As someone who works in computer vision, I am more personally familiar with the pitfalls of the image models and so I will focus on them, but every concern I am about to express has a direct analogue in the world of language models. Perhaps someone who is more familiar with this domain can speak to the language models currently being promoted by AI Horde.
As far as I can tell, much of the image-generative portion of AI Horde appears to be about running "Stable Diffusion" models, whose development was funded heavily by the startup Stability AI (who appear to be claimed as a sponsor in at least one portion of the AI Horde documentation). These models are often trained on datasets assembled by LAION, which consist of URLs to images on the web along with the alt text associated with those images, without any concern for the terms of use or redistribution set by the owners of those images. I am aware that AI Horde serves a variety of models, but in a brief review of these other models, I found much less information about the data used to train them. It is possible these models do not rely on LAION, but if this is the case, it is not something I was able to discover without digging into each model. I hope you will take it on faith that the problems I am about to broach are not specific to the LAION datasets and apply to many image datasets, and I have no reason to believe the bulk of AI Horde's supported models do not suffer from similar (if not exactly the same) problems.
Example: the problems of LAION
LAION themselves side-step concerns about copyright violations by pointing out that they publish only a list of URLs (and some metadata like alt text). I have not found any discussion of data provenance in any of the AI Horde documentation, but it is possible that I missed some mention of it while poking around.
Training an image generation model with this dataset necessarily requires downloading the images and encoding information contained within them into the model, and current models are large enough that they may contain complete copies of the images they were trained on. This is the basis of an ongoing lawsuit against Stability AI, whose models are so effective at reproducing the works they are trained on that they will also reproduce watermarks.
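To illustrate the distinction being drawn here: the dataset itself ships only pointers, but a training pipeline must fetch the actual image bytes before it can encode anything into model weights. A hypothetical sketch (the field names and functions are illustrative, not LAION's real schema or any real training code):

```python
# A LAION-style entry: a pointer plus scraped metadata, no pixels.
record = {
    "url": "https://example.com/artwork.jpg",
    "alt_text": "a painting of a lighthouse",
}

def train_step(record, download):
    # The provenance-relevant act happens here, at training time: the
    # image is fetched and its contents are encoded into the model,
    # regardless of the hosting site's terms of use.
    image_bytes = download(record["url"])
    return len(image_bytes)  # stand-in for "encode into model weights"

def fake_download(url):
    # Stand-in for an HTTP fetch, so this sketch runs offline.
    return b"\x89PNG fake image bytes"

train_step(record, fake_download)
```

The point of the sketch is that the "we only publish URLs" defense applies to the dataset artifact, not to the training process that consumes it.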
Intellectual property concerns should probably be enough for Nivenly to consider whether projects organized around these models truly fall under the banner of 'open source', but there are additional concerns. For instance, the LAION-5B dataset has been shown to include private medical records, and the flaws of the LAION datasets are discussed in numerous¹ published² scholarly³ works⁴ covering problems ranging from the reproduction of works from the training data to reinforcement of the misogynist and racist biases encoded therein.
What is Nivenly's stance on data provenance in machine learning?
Nivenly advertises itself as "building an equitable future" and as "designed to serve the public interest of its communities". In announcing the addition of Haidra as a member project with no discussion of data provenance, I would say that Nivenly has failed to live up to these promises, because the question of "where'd the training data come from?" is an immensely important one that should be foregrounded at every opportunity, especially in the world of generative models.
I won't mince words: I feel that many of the models that AI Horde is designed to run are wildly unethical, and I am deeply concerned by the Foundation's decision to promote this. I would like to know what (if any) principles about data provenance guided the Foundation's decision to support these models, and what kind of practices I can expect from it in the future.
If Nivenly is getting into the "AI" business, I would like to pose the following question(s), but I encourage other community members to chime in as they see fit.
Is the Foundation concerned with the data used to train machine learning models? If so, what distinguishes acceptable practices from unacceptable practices?