Collect spam data in a smart way #73

schlessera · 2017-01-29T11:17:39Z

Anonymously collect non-detected spam comments.

What data to collect:

Comments that were not detected as spam and for which the site user manually clicked the "Spam" button.

When to collect:

When the site user first clicks this "Spam" button, we should ask for permission to anonymously send the comment data to a centralized database in order to improve Antispam Bee.

How to collect:

At first, send the data to an HTTPS endpoint that stores everything in a simple database (probably NoSQL). We may need to evaluate a more scalable solution in the future. The collected data must not contain any mention of the sender or information about their user or system. It should contain as much information as possible about the actual spam content and where it originated.
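One way to read the "How to collect" rule is as an allowlist: only fields that describe the spam and its origin leave the site, everything else is dropped. The following is a minimal sketch under that assumption. The function name `asb_build_spam_report` is hypothetical, the field names are the standard WordPress comment fields, and the sample data is made up.

```php
<?php
/**
 * Hypothetical helper: keep only the fields that describe the spam itself.
 *
 * @param array $comment Comment data keyed by WordPress comment field names.
 * @return array Reduced data intended for submission.
 */
function asb_build_spam_report( array $comment ): array {
	// Allowlist: content of the spam and where it originated.
	$allowed = array(
		'comment_author',
		'comment_author_email',
		'comment_author_url',
		'comment_author_IP',
		'comment_agent',
		'comment_content',
	);

	// Anything not listed (internal IDs, site-specific data) is dropped.
	return array_intersect_key( $comment, array_flip( $allowed ) );
}

// Usage with made-up sample data.
$report = asb_build_spam_report(
	array(
		'comment_ID'        => 42,
		'comment_post_ID'   => 7,
		'user_id'           => 1,
		'comment_author'    => 'Cheap Pills',
		'comment_author_IP' => '203.0.113.5',
		'comment_content'   => 'Buy now!',
	)
);
echo json_encode( $report ), PHP_EOL;
```

An allowlist fails closed: a field added to the comment data later is not sent unless someone decides it should be.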

timse201 · 2017-01-29T15:16:39Z

Are we or the user allowed to do that?
Are there any copyright or legal issues?
And if we do it every time, there could be some false positives, because someone dislikes a user and marks them as spam, or simply because someone posted a comment several times by misclicking, caching issues, etc.

But I agree.
If we are allowed to (no legal issues), then we should make it simpler to submit spam.

websupporter · 2017-01-29T15:28:25Z

I think it's a great idea. We should also include false positives.

I do not really see legal issues. In my understanding, if someone posts a comment, he gives the website owner the right to publish it. But honestly, I do not know how far this right can be stretched.

"there could be some false positives"

Yes, but right now we have the same issue with our Google document. I think it's worth a shot.

There should be an option in the settings (always send, never send), maybe not instead of but in addition to the question "Do you want to send this specific comment?", to guarantee a quicker workflow.

schlessera · 2017-01-30T08:38:38Z

An alternative would be to add a separate button beside the "Spam" and "Trash" buttons, something like "Send for Analysis". If they just want to get rid of their uninteresting newsletters, they will probably not click "Send for Analysis" for these...

schlessera · 2017-01-30T08:39:47Z

And, yes, the original idea was to ask for permission once on clicking Spam and then have this be the new default.

Zodiac1978 · 2020-04-12T10:41:57Z

We could use the status transition action hooks comment_unapproved_to_spam and comment_approved_to_spam, or we could provide a button / action link for this.

Possible problems: privacy concerns, because personal data from comments (IP, email, content, etc.) is submitted to us (or to a third-party service like Google Forms).

This feature needs consent from the user:
https://developer.wordpress.org/plugins/wordpress-org/detailed-plugin-guidelines/#7-plugins-may-not-track-users-without-their-consent
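A rough sketch of the hook variant mentioned above. The two hook names are the ones from this comment. The callback name `asb_remember_manual_spam` and the global list are made up for illustration. Note the assumption: these hooks fire on any status change to spam, not only on a manual click, so a real implementation would still need the consent step before anything is sent.

```php
<?php
/**
 * Hypothetical callback: remember comments whose status changed to spam.
 *
 * @param object $comment Comment object passed by the transition hook.
 */
function asb_remember_manual_spam( $comment ) {
	global $asb_manual_spam_ids;

	if ( ! is_array( $asb_manual_spam_ids ) ) {
		$asb_manual_spam_ids = array();
	}

	// Only the ID is stored here; nothing is sent anywhere.
	$asb_manual_spam_ids[] = (int) $comment->comment_ID;
}

// Register only when WordPress is loaded, so the file also runs standalone.
if ( function_exists( 'add_action' ) ) {
	add_action( 'comment_unapproved_to_spam', 'asb_remember_manual_spam' );
	add_action( 'comment_approved_to_spam', 'asb_remember_manual_spam' );
}
```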

krafit · 2020-04-12T10:46:03Z

In my opinion, the best way to collect non-detected spam would be to add a link alongside "Mark as spam", something like "Report to Antispam Bee". When a user clicks that link, they'll have to confirm that they are about to disclose the comment and its metadata to the ASB team for further investigation and to improve ASB's filters before it's sent.

Zodiac1978 · 2020-04-12T10:52:44Z

To get even more data, we could use the action hooks when someone marks a comment as spam and then ask for the data (like PoEdit does), with an opportunity to opt in to have this as the default.

krafit · 2020-04-12T11:02:42Z

I thought about an opt-in, but I didn't like the privacy implications of having this as a default for everyone after someone opted in.
But we could handle the opt-in the way PoEdit does, by handling it on a per-user basis. This way every user has the opportunity to give informed consent before sharing data (for the first time).
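A small sketch of how a per-user preference could drive the flow, assuming three stored states: 'always', 'never' and "not asked yet". The function name `asb_report_decision` and the meta key `asb_report_preference` are hypothetical. `get_user_meta()` and `get_current_user_id()` are regular WordPress functions and are only called when WordPress is loaded.

```php
<?php
/**
 * Hypothetical decision helper for a per-user reporting preference.
 *
 * @param string $preference Stored value: 'always', 'never' or '' (not asked yet).
 * @return string 'send', 'skip' or 'ask'.
 */
function asb_report_decision( $preference ) {
	switch ( (string) $preference ) {
		case 'always':
			return 'send';
		case 'never':
			return 'skip';
		default:
			// No stored choice yet: ask this user for informed consent first.
			return 'ask';
	}
}

// Inside WordPress the preference could be read per user. The meta key is made up.
if ( function_exists( 'get_user_meta' ) ) {
	$preference = get_user_meta( get_current_user_id(), 'asb_report_preference', true );
	$decision   = asb_report_decision( $preference );
}
```

Because the value is stored per user, one administrator opting in does not change the default for anyone else on the site.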

Zodiac1978 · 2020-07-21T12:01:17Z

If we stay with our workflow (using the Google Form) we could pre-fill the form like this:

https://docs.google.com/forms/d/e/1FAIpQLSeQlKVZZYsF1qkKz7U78B2wy_6s6I7aNSdQc-DGpjeqWx70-A/viewform?c=0&w=1&entry.437446945=name%20of%20the%20commenter&entry.462884433=IP&entry.1346967038=Host&entry.121560485=email%20of%20the%20commenter&entry.1210529682=website%20of%20the%20commenter&entry.1837399577=content%20of%20the%20comment

The data is URL-encoded.

The user just needs to hit the "Send" button at the end of the page.

Zodiac1978 · 2020-07-22T21:20:58Z

If someone wants to test this feature: Here is a working addon plugin:

<?php
/**
 * Plugin Name: Report Spam
 * Description: Addon for Antispam Bee to report spam.
 * Plugin URI:  https://torstenlandsiedel.de
 * Version:     1.0
 * Author:      Torsten Landsiedel
 * Author URI:  http://torstenlandsiedel.de
 * License:     GPL 2
 * License URI: http://opensource.org/licenses/GPL-2.0
 */

if ( ! defined( 'ABSPATH' ) ) {
	exit; // Exit if accessed directly.
}

/**
 * Add comment action link to report spam to ASB
 *
 * @param array      $actions Array of actions.
 * @param WP_Comment $comment Comment object.
 * @return array Actions including the report link.
 */
function add_report_comment_action_link( $actions, $comment ) {

	// Resolve the host from the raw IP first; gethostbyaddr() needs an unencoded address.
	$raw_ip = $comment->comment_author_IP;
	$host   = filter_var( $raw_ip, FILTER_VALIDATE_IP ) ? (string) gethostbyaddr( $raw_ip ) : '';

	// URL-encode comment data.
	$name    = rawurlencode( $comment->comment_author );
	$email   = rawurlencode( $comment->comment_author_email );
	$ip      = rawurlencode( $raw_ip );
	$host    = rawurlencode( $host );
	$url     = rawurlencode( $comment->comment_author_url );
	$content = rawurlencode( $comment->comment_content );
	$agent   = rawurlencode( $comment->comment_agent );

	// Build action link.
	$target = ' target="_blank" ';
	$rel    = ' rel="noopener noreferrer" ';
	$href   = 'href="https://docs.google.com/forms/d/e/1FAIpQLSeQlKVZZYsF1qkKz7U78B2wy_6s6I7aNSdQc-DGpjeqWx70-A/viewform?c=0&w=1&entry.437446945=' . $name . '&entry.462884433=' . $ip . '&entry.1346967038=' . $host . '&entry.121560485=' . $email . '&entry.1210529682=' . $url . '&entry.1837399577=' . $content . '&entry.372858475=' . $agent . '" ';

	$action  = '';
	$action .= "<a $target $href $rel>";
	$action .= __( 'Report to Antispam Bee', 'antispam-bee' );
	$action .= '</a>';

	$actions['report_spam trash'] = $action;

	return $actions;
}
add_filter( 'comment_row_actions', 'add_report_comment_action_link', 10, 2 );

Zodiac1978 · 2020-07-22T21:21:46Z

Zodiac1978 · 2020-07-22T21:23:02Z

It includes the comment's user agent as a new item (the form has already been extended for this), and it resolves the host from the IP.

Zodiac1978 · 2020-07-23T09:00:09Z

"there could be some false positives"

We could add a checkbox at the end of the form, "This is a false positive and not spam", which could be checked before sending the form. Although I don't think many people would use it ...

stkjj · 2021-02-01T07:17:22Z

With regard to https://torstenlandsiedel.de/2021/01/31/antispam-bee-braucht-eure-juristische-hilfe/:

a) Self-hosted instead of Google, for sure (or at least a SaaS based within the EU with a proper data processing contract).
b) If consent is given by the submitter, everything is fine. Can the consent be withdrawn? Legally yes, factually no: once the data has been worked with, we could of course remove it from the list of submissions, yet the findings drawn from the case remain. At least as long as the submission is taken care of in a timely manner ;-).
c) Regarding the receiving entity: this is indeed the biggest flaw, as we are acting as a GbR, which includes the chance that any random member of the GbR could be sued, fined, … This is the point where a discussion about changing the legal framework for the entity should take place. To stay focused on the matter, I'd suggest separating this from this issue. Happy to start this (indeed internal) discussion on our Slack channel.

To get hands-on: the link "Report to Antispam Bee" should ideally open a modal with all necessary information*, e.g. which data is submitted, where it will be stored and for how long, who will have access to it and how it will be purged, as well as a note that the data is provided on a consensual basis. Finally, a confirm and a decline button, the former of which then submits the data to a GDPR-compliant server for further processing.

*let me draft something later this week

stkjj · 2021-02-01T13:19:17Z

For further discussion, a text for the modal:

Thank you for helping us to improve Antispam Bee.

You are about to report the comment by [commenter name] with the content [content of the comment] to us, because you believe it is unrecognized spam. We also found the following data in the comment, which we will use for Antispam Bee's evaluation and heuristics:

[IP address]
[Host]
[UserAgent]
[email address of the commenter]
[website of the commenter]

We evaluate this data [automatically|manually] to improve the spam detection of Antispam Bee. If we receive multiple identical reports about a spammer, we also use this data to update Blacklist Updater. The data will be processed by us within the next x [hours|days] and then automatically deleted. For the period of processing, the data is stored exclusively on servers located in Germany. Only the Antispam Bee developer team has access to it. To keep the process lean, you will not receive any further feedback from us about the processing, storage or deletion, but please accept our thanks for your help.

If you agree to submit this data, you can send it using the button below.
Button: Discard / Button: Submit
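To show how the bracketed placeholders in the draft could be filled, here is a minimal sketch. It assumes the modal text is kept as a template string and that comment data is escaped before display, since spam often contains markup. The function name `asb_render_consent_text` and the sample values are made up.

```php
<?php
/**
 * Hypothetical helper: fill the bracketed placeholders of the consent text.
 *
 * @param string $template Text containing placeholders such as "[commenter name]".
 * @param array  $values   Placeholder name (without brackets) => raw value.
 * @return string Text with escaped values inserted.
 */
function asb_render_consent_text( string $template, array $values ): string {
	$replacements = array();
	foreach ( $values as $placeholder => $value ) {
		// Escape comment data before it is shown in the modal.
		$replacements[ '[' . $placeholder . ']' ] = htmlspecialchars( (string) $value, ENT_QUOTES, 'UTF-8' );
	}

	// strtr() replaces all placeholders in one pass, so inserted values are not scanned again.
	return strtr( $template, $replacements );
}

// Usage with made-up sample data.
$text = asb_render_consent_text(
	'You are about to report the comment by [commenter name] with the content [content of the comment] to us.',
	array(
		'commenter name'         => 'Cheap Pills',
		'content of the comment' => '<a href="http://example.com">Buy</a> [commenter name]',
	)
);
echo $text, PHP_EOL;
```

The single-pass replacement matters here: a spam comment that itself contains "[commenter name]" is shown literally instead of being expanded a second time.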

krafit added this to the 2.10 milestone Feb 6, 2019

krafit added the Hacktoberfest label Sep 29, 2019

Zodiac1978 added enhancement and removed Hacktoberfest labels Nov 14, 2019

krafit mentioned this issue Apr 12, 2020

Report spam in ASB itself #330

Closed

Zodiac1978 added the feedback wanted label Apr 12, 2020

Zodiac1978 added a commit that referenced this issue Aug 13, 2020

Add report spam link

eceb64e

Add report spam action link to spam list (#73)

Zodiac1978 linked a pull request Aug 13, 2020 that will close this issue

Add report spam link #344

Draft

Zodiac1978 modified the milestones: 2.10, Future Release Jun 24, 2021

florianbrinkmann modified the milestones: Future Release, 2.11 Aug 12, 2021

florianbrinkmann self-assigned this Aug 12, 2021
