My multipost-detecting usenet bot (David Filmer)

use...@davidfilmer.com

Aug 14, 2006, 6:28:56 PM

(Note: This message is crossposted to the following newsgroups, as
these groups are affected by the subject bot: comp.lang.perl.misc,
perl.beginners, comp.lang.perl.modules, perl.dbi.users,
perl.beginners.cgi, alt.perl)

Greetings. As many of you are doubtless aware, I recently wrote and
deployed a usenet 'bot which identifies multiposted messages. After
manually flagging such messages for some time, it occurred to me that I
could let Perl do the work for me, and laziness took over.

FIRST OF ALL, I would like to apologize to the usenet community for
having done this unilaterally. I had genuinely not anticipated that
many folks would object or even care about this - it was a very minor
project for me to save me a bit of trouble now and then. I now know
that I should have posted an RFC before deploying my bot, and I would
have done so had I realized the level of interest it would generate.

If I've angered or annoyed anyone, I do apologize. I had no such
intent.

This topic is presently being discussed in a number of threads:
http://tinyurl.com/rdedx, http://tinyurl.com/m2e2r, and
http://tinyurl.com/oubbn (and possibly others), and the topic is
certainly OT to the first two threads (and the third thread is postured
as an attack article). Multiple threads are an ineffective way to
discuss a topic, and I hope that by opening this thread, I can
consolidate the discussion (rather than contribute to the mess). I don't want to
re-hash these threads here; I hope interested folks will read those
messages but continue the discussions here (with good quoting, of
course, so others will be able to follow along).

I have read comments from many respected posters which were both
supportive and critical of my bot. In both cases, however, there was
often a strong sentiment that the bot message was too long and too
harsh.

I had a lot of temporary introductory text in the first couple of
messages that was never intended to be part of the regular bot
messages. That, however, was a mistake (as it led folks to believe that
I really intended to post such a long reply to every multipost). I
should have posted the messages without that additional explanatory
verbiage and perhaps included that additional information in a reply.

HOWEVER, reading many comments has led me to believe that it may not be
a good idea to include very much more than a very basic reply and a
link for more info. I argued against this idea (because I thought the
reply would not be very effective, as novice OPs don't often appear
to follow links) but I have reconsidered my opinion (due to what seems
to be a rough consensus, and because I realize the various strengths of
the other position).

I will therefore modify the bot along the lines that John & Sinan
suggested in http://tinyurl.com/oubbn. I have also changed the
bot's handle to my personally-named domain (so it's not anonymous).
It's a different handle than I'm posting under now (for folks who
may wish to killfile the bot but not killfile me). Those who have
already killfiled the bot will need to do so again (sorry) - you may
killfile m...@davidfilmer.net if you wish to killfile the bot.

Opinions have been expressed in roughly four categories:
1 - The whole idea of a bot sucks
2 - The idea is OK, but the implementation (auto-message) sucks
3 - Rock on
4 - Indifference (posted messages without expressing an opinion)

So far, most opinion seems to fall in the second or third category
(though opinions of the first category have been somewhat more vocal).
I believe I have taken measures to address many of the concerns of the
second category. As the discussion develops, if it seems the group
consensus does generally oppose the idea, I have no problem with
shutting it down and I will readily do so.

--
David Filmer (http://DavidFilmer.com)

use...@davidfilmer.com

Aug 14, 2006, 6:34:48 PM

use...@DavidFilmer.com wrote:
> (Note: This message is crossposted to the following newsgroups

Hmmm. I'm posting from GoogleGroups (it's all I have access to at the
moment). Apparently GG must have some sort of crosspost limitation,
because the message didn't go to the comma-separated list that I
provided. Grrr - darn Google Groups.

Well, this is really the group whose members' opinions I respect the
most... so maybe it's for the best anyway.

John Bokma

Aug 14, 2006, 6:47:59 PM

use...@DavidFilmer.com wrote:

> use...@DavidFilmer.com wrote:
>> (Note: This message is crossposted to the following newsgroups
>
> Hmmm. I'm posting from GoogleGroups (it's all I have access to at the
> moment). Apparently GG must have some sort of crosspost limitation,
> because the message didn't go to the comma-separated list that I
> provided. Grrr - darn Google Groups.

You didn't set a Followup-To, so I am quite thankful that the crosspost
failed.

Probably my last remark regarding this bot: personally I think a CfV
should be held, requiring for example at least 50 votes with a majority
voting yes for this project to continue. That is better than guessing where
opinions fall.

--
John Bokma Freelance software developer
&
Experienced Perl programmer: http://castleamber.com/

use...@davidfilmer.com

Aug 14, 2006, 7:02:18 PM

John Bokma wrote:

> You didn't set a follow up to,

Ha - as if you can set a follow-up in GG... If there's a way to do
that, I don't know what it is. You just list your groups and hope GG
figures it out.

> Probably my last remark regarding this bot, personally
> I think a CfV should be held.

If I could trouble you for one more thing... Is there a
generally-accepted procedure for issuing (or voting in) such a call?
I've heard of this, but I don't actually believe I've ever seen it
done. I could envision such a call becoming (yet) another discussion.

I think a CfV is a good idea (but don't know how it should be handled).
I would feel better about continuing or canning the bot if I had a firm
idea of what the consensus is.

BTW, I would like to thank you... you were obviously very peeved at me,
but you were kind enough to provide helpful and constructive input when
I asked you for it.

John Bokma

Aug 14, 2006, 7:17:07 PM

use...@DavidFilmer.com wrote:

> John Bokma wrote:
>
>> You didn't set a follow up to,
>
> Ha - as if you can set a follow-up in GG... If there's a way to do
> that, I don't know what it is. You just list your groups and hope GG
> figures it out.
>
>> Probably my last remark regarding this bot, personally
>> I think a CfV should be held.
>
> If I could trouble you for one more thing... Is there a
> generally-accepted procedure for issuing (or voting in) such a call?
> I've heard of this, but I don't actually believe I've ever seen it
> done. I could envision such a call becoming (yet) another discussion.

Technically (and IIRC) there should first be an RFD which contains the
proposal for such a bot, what it should and shouldn't do, what to use
for "From" etc.

Once such a discussion ends the starter of the RFD could create a
summary and post a second RFD.

If nothing new comes out of that one, a Call for Votes can take place.
Normally a CfV will not cause new discussions.

My experience with RFDs and CfVs is limited to the creation of new groups
in the nl.* hierarchy. IIRC the rule of 50 votes with a majority of Y comes
from documents I once read on the topic.

Did some googling:

<http://users.tkk.fi/~jpatokal/uvv/vote-faq.html>
<http://www.faqs.org/faqs/usenet/creating-newsgroups/part1/>

Not sure if the UVV wants to handle this vote.

> I think a CfV is a good idea (but don't know how it should be
> handled). I would feel better about continuing or canning the bot if I
> had a firm idea of what the consensus is.
>
> BTW, I would like to thank you... you were obviously very peeved at
> me, but you were kind enough to provide helpful and constructive input
> when I asked you for it.

Yes, I am well known for overreacting. I guess what triggered me the
most was that there was no contact info associated with the bot (it ran
anonymously), and that the message was way too lengthy.

I still disagree with the whole idea, but have a few tips:

Remove all whitespace before you calculate the MD5 sum; this way you
might find posts that have been made by copy + paste and have additional
trailing/leading whitespace.

Make sure that the bot posts with a From that is easy to recognize.

Make sure that you provide a contact email address.

Only post a reply if one hasn't been made yet.

Especially regarding the latter: due to how Usenet works, your bot might
post the 2nd, 3rd or even a later reply to a multipost.

Also, some people who multipost understand the issue, and cancel the
wrong post. Cancels always run after the fact. What you really want to
avoid is having your bot reply to a message that has been canceled a few
seconds earlier.
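
The "only post a reply if one hasn't been made yet" tip could be approached roughly as below. This is only a sketch, not the bot's actual code: it assumes the bot has already fetched overview data for the group, and the sub name and the 'references' hash key are made up for illustration.

```perl
use strict;
use warnings;

# Return true if any article in @$overview lists $target_id in its
# References header, i.e. someone has already followed up to it.
# Each element is a hash with a hypothetical 'references' key that
# holds the raw header text (possibly empty or missing).
sub already_answered {
    my ($target_id, $overview) = @_;
    for my $article (@$overview) {
        my @refs = ($article->{references} // '') =~ /(<[^<>\s]+>)/g;
        return 1 if grep { $_ eq $target_id } @refs;
    }
    return 0;
}
```

Because of propagation delays, a reply may exist on another server and not yet be visible locally, so a check like this can only reduce duplicate replies, not rule them out.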

use...@davidfilmer.com

Aug 14, 2006, 7:50:24 PM

use...@DavidFilmer.com wrote:

Anyone who wishes to view a sample of the much shorter (and more
gentle) message text may visit alt.test.test or see:

http://tinyurl.com/qvvqh

Sherm Pendley

Aug 14, 2006, 7:58:26 PM

use...@DavidFilmer.com writes:

> Opinions have been expressed in roughly four categories:
> 1 - The whole idea of a bot sucks
> 2 - The idea is OK, but the implementation (auto-message) sucks
> 3 - Rock on
> 4 - Indifference (posted messages without expressing an opinion)
>
> So far, most opinion seems to fall in the second or third category
> (though opinions of the first category have been somewhat more vocal).

My opinion of the bot itself is somewhat indifferent. I don't see a need
for it and I don't believe it will significantly reduce the "problem" -
which I don't see as such - of multi-posting.

On the other hand, I dislike the way the message chastises anyone who's
thinking of replying to a multi-posted message, and attempts to "burn the
thread" by encouraging others to ignore it. Not everyone agrees with your
idea that a multi-posted message should receive no reply other than "please
don't multi-post".

For myself, I'd prefer to answer the posted question, and include a comment
in the answer about multi-posting, netiquette, and the group guidelines. If
constructive criticism of that sort is given *along with* an answer to the
posted question, it's more likely to be taken seriously. If it's given on
its own, the receiver is (IMO) more likely to dismiss the sender as a crank
and ignore the advice.

sherm--

--
Web Hosting by West Virginians, for West Virginians: http://wv-www.net
Cocoa programming in Perl: http://camelbones.sourceforge.net

John Bokma

Aug 14, 2006, 8:01:10 PM

use...@DavidFilmer.com wrote:

> http://tinyurl.com/qvvqh

I probably would word it as follows:

You have posted the same message to several news groups in a form that
is called multiposting:

group1
<news:3oSdnVLzrpxMl3zZ...@giganews.com>

group2
<news:3oSdnU3zrpxMl3zZ...@giganews.com>

Multiposting is generally considered impolite. For an
explanation, please see:

http://www.cs.tut.fi/~jkorpela/usenet/xpost.html

which also explains crossposting, which is the recommended way to post
a single message to more than one group, if such is really needed.

(I left out Usenet, because most people consider Usenet "Google").

Brian Greenfield

Aug 14, 2006, 8:04:47 PM

On 14 Aug 2006 15:28:56 -0700, use...@DavidFilmer.com wrote:

>If I've angered or annoyed anyone, I do apologize. I had no such
>intent.

As a long-time lurker, and very occasional poster to clpm, I do find
your bot to be both angering and annoying. Please stop. Now.

use...@davidfilmer.com

Aug 14, 2006, 8:14:19 PM

John Bokma wrote:
> Did some googling:
> <http://users.tkk.fi/~jpatokal/uvv/vote-faq.html>
> <http://www.faqs.org/faqs/usenet/creating-newsgroups/part1/>

Thanks. I'll read up. Does anyone know: Has this group ever conducted
such a vote (such as for the Posting Guidelines, etc?) or is stuff like
that done by an informal consensus?

> I still disagree with the whole idea, but have a few tips:
>
> Remove all whitespace before you calculate the MD5SUM, this way you
> might find posts that have been made by copy + paste and have additional
> trailing/leading whitespace.

Actually, the script has always done:
$body =~ s/\W//g;
(I have observed several multiposts with extra leading spaces, and even
trailing ...'s)
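
For readers following along, here is a minimal sketch of how that substitution might feed an MD5 comparison. Only the `s/\W//g` line comes from the script; the sub names, the hash layout and the use of Digest::MD5 are assumptions made for illustration, and bodies are assumed to be byte strings.

```perl
use strict;
use warnings;
use Digest::MD5 qw(md5_hex);

# Reduce a message body to a fingerprint that ignores whitespace and
# punctuation, so copy-and-paste variants of one post compare equal.
sub body_fingerprint {
    my ($body) = @_;
    (my $normalized = $body) =~ s/\W//g;   # same substitution as above
    return md5_hex($normalized);
}

# Group articles by fingerprint and return every set of message-IDs
# that share one. %$articles maps message-ID to a hash with a
# hypothetical 'body' key.
sub find_multiposts {
    my ($articles) = @_;
    my %by_print;
    for my $msgid (sort keys %$articles) {
        my $print = body_fingerprint($articles->{$msgid}{body});
        push @{ $by_print{$print} }, $msgid;
    }
    return grep { @$_ > 1 } values %by_print;
}
```

Note that `\W` removes every non-word character, not just whitespace, which is why trailing dots and extra leading spaces no longer matter.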

> Make sure that the bot posts with a From that is easy to recognize.

That has now been done (see alt.test.test or
http://tinyurl.com/qvvqh for an example of the new-and-improved
cop-bot).

> Make sure that you provide a contact email address.

That has also been done. It's my catch-all domain - I'll probably
spam-safe it like I do with use...@davidfilmer.com (which is a
blackhole with an informative autoresponder)

> Only post a reply if there hasn't been made one yet.

That's probably a good idea (although it's not uncommon for manual
flagging to be done subsequent to other replies). Making such a change,
however, would require some significant changes to the flow of the
program...

> Also, some people who multipost understand the issue, and cancel the
> wrong post. Cancels always run after the facts. What you really want to
> avoid is having your bot reply to a message that has been canceled a few
> seconds earlier.

I agree that would be an undesirable situation (though generally
unlikely, IMHO), but I'm not sure how to avoid it. Even posting
manually, I believe it's possible something like this could happen.
I'm pretty sure I've replied (manually) to posts that got pulled out
from under my feet, and only my reply remained (one such post, if I
recall, was in German, but I answered it anyway only to find the
original was gone - probably in favor of a .de group). I don't know if
it's possible to avoid this situation programmatically any more than it
is manually (but I'm open to ideas!)

use...@davidfilmer.com

Aug 14, 2006, 8:17:36 PM

Sherm Pendley wrote:
> On the other hand, I dislike the way the message chastises anyone who's
> thinking of replying to a multi-posted message, and attempts to "burn the
> thread" by encouraging others to ignore it.

I've modified the bot (which addresses your perfectly valid
objections); you may find the new version more acceptable. See recent
messages in alt.test.test or http://tinyurl.com/qvvqh

Matt Garrish

Aug 14, 2006, 8:27:19 PM

use...@DavidFilmer.com wrote:

Any chance of posting the group names along with the ids to simplify
lookups? For example:

This message has been multiposted as indicated by these message IDs:
alt.test.test : <news:3oSdnVLzrpxMl3zZ...@giganews.com>

Other than that the simplified message is a drastic improvement.

Matt
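
The "group : message-ID" layout suggested above could be generated along these lines. This is a hedged sketch: the sub name is invented, the message-IDs in any usage are placeholders, and the wording is borrowed loosely from the shorter text proposed earlier in the thread rather than from the bot itself.

```perl
use strict;
use warnings;

# Build a notice body from a list of [group, message-ID] pairs,
# one "group : <news:id>" line per copy of the multiposted article.
sub multipost_notice {
    my (@copies) = @_;
    my $list = join '', map { "$_->[0] : <news:$_->[1]>\n" } @copies;
    return "This message has been multiposted as indicated by these message IDs:\n"
         . $list
         . "\nMultiposting is generally considered impolite. "
         . "For an explanation, please see:\n"
         . "http://www.cs.tut.fi/~jkorpela/usenet/xpost.html\n";
}
```

Calling `multipost_notice(['alt.test.test', 'one@example.invalid'])` would yield one listing line for that group followed by the explanation and link.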

Sherm Pendley

Aug 14, 2006, 8:40:27 PM

use...@DavidFilmer.com writes:

Thanks - that's much better. It's concise and much less incendiary.

ax...@white-eagle.invalid.uk

Aug 14, 2006, 9:26:43 PM

use...@davidfilmer.com wrote:
> Greetings. As many of you are doubtless aware, I recently wrote and
> deployed a usenet 'bot which identifies multiposted messages. After
> manually flagging such messages for some time, it occurred to me that I
> could let Perl do the work for me, and laziness took over.

> This topic is presently being discussed in a number of threads:
> http://tinyurl.com/rdedx, http://tinyurl.com/m2e2r, and
> http://tinyurl.com/oubbn (and possibly others), and the topic is
> certainly OT to the first two threads (and the third thread is postured
> as an attack article). Multiple threads are an ineffective way to

Now you have hit *my* pet annoyance... posting URLs in Usenet
postings without good cause... sorry, I'm not firing up a browser
to read them.

Axel

use...@davidfilmer.com

Aug 14, 2006, 10:21:26 PM

ax...@white-eagle.invalid.uk wrote:
> Now you have hit *my* pet annoyance... posting URLs in Usenet
> postings without good cause...

sorry...

<news:JoPDg.386093$Mn5.194189@pd7tw3no>
<news:1MudndoQatE...@giganews.com>
<news:793Eg.5010$Qf....@newsread2.news.pas.earthlink.net>

John Bokma

Aug 14, 2006, 10:58:51 PM

Sherm Pendley <sh...@Sherm-Pendleys-Computer.local> wrote:

> For myself, I'd prefer to answer the posted question, and include a
> comment in the answer about multi-posting, netiquette, and the group
> guidelines. If constructive criticism of that sort is given *along
> with* an answer to the posted question, it's more likely to be taken
> seriously. If it's given on its own, the receiver is (IMO) more likely
> to dismiss the sender as a crank and ignore the advice.

AOL.

John Bokma

Aug 14, 2006, 11:04:02 PM

use...@DavidFilmer.com wrote:

> John Bokma wrote:

[..]

>> Make sure that you provide a contact email address.
>
> That has also been done. It's my catch-all domain - I'll probably
> spam-safe it like I do with use...@davidfilmer.com (which is a
> blackhole with an informative autoresponder)

What seems (or seemed) to work is usenet+bot@

Spam-harvesting bots seem to pick up only the bot@ part :-D (the + is
allowed in email addresses).

>> Only post a reply if there hasn't been made one yet.
>
> That's probably a good idea (although it's not uncommon for manual
> flagging to be done subsequent to other replies). Making such a
> change, however, would require some significant changes to the flow of
> the program...

A programming challenge :-D

>> Also, some people who multipost understand the issue, and cancel the
>> wrong post. Cancels always run after the facts. What you really want
>> to avoid is having your bot reply to a message that has been canceled
>> a few seconds earlier.
>
> I agree that would be an undesirable situation (though generally
> unlikely, IMHO), but I'm not sure how to avoid it. Even posting
> manually, I believe it's possible something like this could happen.

Yes. I am sure that I have replied to canceled messages more than once
over the past years.

> I'm pretty sure I've replied (manually) to posts that got pulled out
> from under my feet, and only my reply remained (one such post, if I
> recall, was in German, but I answered it anyway only to find the
> original was gone - probably in favor of a .de group). I don't know
> if it's possible to avoid this situation programmatically any more
> than it is manually (but I'm open to ideas!)

You could check control.cancel, but it might be overkill.
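
A lighter-weight alternative to reading control.cancel might be to ask the server whether the article is still there immediately before posting. The sketch below assumes the bot talks to its server through a connected Net::NNTP object; the sub name is made up, and the check only helps once the server has actually processed the cancel.

```perl
use strict;
use warnings;

# Ask the server whether an article is still available just before
# replying. $nntp is expected to be a connected Net::NNTP object;
# its nntpstat() method returns the message-ID on success and undef
# when the server no longer has the article.
sub article_still_exists {
    my ($nntp, $msgid) = @_;
    return defined $nntp->nntpstat($msgid) ? 1 : 0;
}
```

A cancel that arrives a few seconds after this check would still slip through, so it narrows the window rather than closing it.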

Mumia W.

Aug 15, 2006, 4:15:52 AM

On 08/14/2006 05:28 PM, use...@DavidFilmer.com wrote:
> [...]

> If I've angered or annoyed anyone, I do apologize. I had no such
> intent.

> [...]

Thank you Mr. Filmer. I can see how the 'bot would reduce a
lot of work, but, as you've acknowledged, its message was a
little long and harsh.

Whatever you do, please don't release the code. Hip-ç-rime
would make usenet a nightmare with it.

Michele Dondi

Aug 15, 2006, 5:14:42 AM

On 15 Aug 2006 00:01:10 GMT, John Bokma <jo...@castleamber.com> wrote:

>which also explains crossposting, which is the recommended way to post

^^^^^
^^^^^ [*]

>a single message to more then one group, if such is really needed.

[*] is also frowned upon, but

Michele
--
{$_=pack'B8'x25,unpack'A8'x32,$a^=sub{pop^pop}->(map substr
(($a||=join'',map--$|x$_,(unpack'w',unpack'u','G^<R<Y]*YB='
.'KYU;*EVH[.FHF2W+#"\Z*5TI/ER<Z`S(G.DZZ9OX0Z')=~/./g)x2,$_,
256),7,249);s/[^\w,]/ /g;$ \=/^J/?$/:"\r";print,redo}#JAPH,

Michele Dondi

Aug 15, 2006, 5:20:14 AM

On Tue, 15 Aug 2006 01:26:43 GMT, ax...@white-eagle.invalid.uk wrote:

>> This topic is presently being discussed in a number of threads:
>> http://tinyurl.com/rdedx, http://tinyurl.com/m2e2r, and
>> http://tinyurl.com/oubbn (and possibly others), and the topic is
>> certainly OT to the first two threads (and the third thread is postured
>> as an attack article). Multiple threads are an ineffective way to
>
>Now you have hit *my* pet annoyance... posting URLs in Usenet

^^^^
^^^^

Perhaps you mean *web* URLS...

>postings without good cause... sorry, I'm not firing up a browser
>to read them.

I *am* firing it up, but I do agree: I, for one, would prefer
<news:...> urls. Best would be to put both these and the ones for the
web, to make life easier both to those using a "real" client and those
using a web based one.

use...@davidfilmer.com

Aug 15, 2006, 5:46:03 AM

Mumia W. wrote:
> Whatever you do, please don't release the code. Hip-ç-rime
> would make usenet a nightmare with it.

Ya know, that type of thing really hadn't occurred to me. Egads, what a
real nightmare that could be.

John Bokma

Aug 15, 2006, 1:49:15 PM

Michele Dondi <bik....@tiscalinet.it> wrote:

> On 15 Aug 2006 00:01:10 GMT, John Bokma <jo...@castleamber.com> wrote:
>
>>which also explains crossposting, which is the recommended way to post
> ^^^^^
> ^^^^^ [*]
>>a single message to more then one group, if such is really needed.
>
> [*] is also frowned upon, but

Not really, but it's abused a lot, and it's the abuse that's frowned upon.
Note the "really needed", a lot of crossposters get that wrong ;-)

But you're right, maybe it should be made a bit stronger.

Personally I don't have a problem with a crosspost if it's really needed
*and* has the Followup-To header set to the most appropriate group. Also
the number of groups should be limited in most cases.

John Bokma

Aug 15, 2006, 1:51:51 PM

"Mumia W." <mumia.w.18.spa...@earthlink.net> wrote:

Uhm? People who need such programs can just download them, including
software to auto-cancel and repost massively.

use...@davidfilmer.com

Aug 15, 2006, 2:08:13 PM

John Bokma wrote:

> Personally I don't have a problem with a crosspost if it's really needed
> *and* has the follow-up to header set to the most appropriate group.

In my observation, many (if not most) crossposts come from
GoogleGroups. You cannot set a follow-up in GG - it does it for you
(and I'm 99.9% sure it sets it for all groups you've x-posted to).

Many new OPs, I believe, don't really know how (functionally) to
crosspost (and I think that's one reason why they multipost). I'm
reluctant to suggest that crossposting might be OK because many of
those posters won't read the fine print, and I think new OPs will
readily abuse it - they will tend to crosspost similar Perl newsgroups
(with all-inclusive follow-ups) and the groups will begin to look like
mirrors of each other.

I believe that it's unusual for most people to have a valid need to
crosspost (I can't recall needing to crosspost in years, and I rarely,
if ever, see Perl crossposts from known and respected posters, unless
it's a reply to an x-posted message with broad follow-ups). I think it
would be downright rare for a new poster to have a valid need to
crosspost, and I'd rather take a discouraging posture when mentioning
it.

ax...@white-eagle.invalid.uk

Aug 15, 2006, 3:42:50 PM

John Bokma <jo...@castleamber.com> wrote:
> "Mumia W." <mumia.w.18.spa...@earthlink.net> wrote:

>> On 08/14/2006 05:28 PM, use...@DavidFilmer.com wrote:
>>> [...]
>>> If I've angered or annoyed anyone, I do apologize. I had no such
>>> intent.
>>> [...]

>> Thank you Mr. Filmer. I can see how the 'bot would reduce a
>> lot of work, but, as you've acknowledged, its message was a
>> little long and harsh.

>> Whatever you do, please don't release the code. Hip-ç-rime
>> would make usenet a nightmare with it.

> Uhm? People who need such programs can just download them including
> software to auto-cancel and repost massively.

It makes me wonder what software Canter & Siegel used in their
infamous massive multiposting.

And that was over a decade ago.

I probably mentioned it before, but I found their book _How to Make a
Fortune on the Information Superhighway_ in a remainder bookshop and
bought it just for fun.

Axel

John Bokma

Aug 15, 2006, 7:01:06 PM

use...@DavidFilmer.com wrote:

> I believe that it's unusual for most people to have a valid need to
> crosspost (I can't recall needing to crosspost in years, and I rarely,
> if ever, see Perl crossposts from known and respected posters, unless
> it's a reply to an x-posted message with broad follow-ups). I think it
> would be downright rare for a new poster to have a valid need to
> crosspost, and I'd rather take a discouraging posture when mentioning
> it.

Yup, agreed. Xposting is rarely needed. I have xposted before, but it's a small
% of my total, except in those cases where I was not aware that something was
xposted to 5 groups [1] :-D.

[1] Ages ago I contributed to a thread that was xposted in 13 (!) or
so groups :-D.

John Bokma

Aug 15, 2006, 7:03:25 PM

ax...@white-eagle.invalid.uk wrote:

> John Bokma <jo...@castleamber.com> wrote:
>> "Mumia W." <mumia.w.18.spa...@earthlink.net> wrote:
>
>>> On 08/14/2006 05:28 PM, use...@DavidFilmer.com wrote:
>>>> [...]
>>>> If I've angered or annoyed anyone, I do apologize. I had no such
>>>> intent.
>>>> [...]
>
>>> Thank you Mr. Filmer. I can see how the 'bot would reduce a
>>> lot of work, but, as you've acknowledged, its message was a
>>> little long and harsh.
>
>>> Whatever you do, please don't release the code. Hip-ç-rime
>>> would make usenet a nightmare with it.
>
>> Uhm? People who need such programs can just download them including
>> software to auto-cancel and repost massively.
>
> It makes me wonder what software Cantor & Siegel used in their
> infamous massive multiposting.

No idea. Posting to Usenet is not a black art; you can do it manually via
telnet. Even if you write it all yourself in Perl (I once did as an
exercise) it doesn't take more than a few hours to have a working version
that posts test messages :-)

> And that was over a decade ago.
>
> I probably mentioned it before, but I found their book _How to Make a
> Fortune on the Information Superhighway _ in a remainder bookshop and
> bought it just for fun.

:-D. I would have bought it too, and not because English books are rare
here in Mexico :-D.

DJ Stunks

Aug 15, 2006, 7:16:19 PM

Mumia W. wrote:
> Whatever you do, please don't release the code. Hip-ç-rime
> would make usenet a nightmare with it.

I don't grok "Hip-ç-rime"...

-jp

Mumia W.

Aug 15, 2006, 8:02:21 PM

He is someone who attempts to destroy usenet every year or so.
I dare not spell his name correctly because he might be
searching for references to himself and if he finds any, he
might resurface.

use...@davidfilmer.com

Aug 15, 2006, 8:17:14 PM

Mumia W. wrote:
> On 08/15/2006 06:16 PM, DJ Stunks wrote:
> > I don't grok "Hip-ç-rime"...
>

> He is someone who attempts to destroy usenet every year or so.

Wikipedia has a brief article, but I won't post a link because it
contains the word in cleartext (it's never a good idea to type the name
of trolls or vandals into cleartext messages). You ought to be able to
un-obfuscate the name easily enough (it has no dashes and can be
represented in 7-bit ASCII).

Nomen Nescio

Aug 16, 2006, 10:40:29 AM

use...@DavidFilmer.com wrote:

> (Note: This message is crossposted to the following newsgroups, as
> these groups are affected by the subject bot: comp.lang.perl.misc,
> perl.beginners, comp.lang.perl.modules, perl.dbi.users,
> perl.beginners.cgi, alt.perl)

>
> Greetings. As many of you are doubtless aware, I recently wrote and
> deployed a usenet 'bot which identifies multiposted messages. After
> manually flagging such messages for some time, it occurred to me that I
> could let Perl do the work for me, and laziness took over.

Are you competing against Alan Connor for netkook status?

Maybe your bot should say <article not downloaded> before all the
# BLOCKS OF TEXT #.

If multiposts offend you, why don't you just ignore them like the
rest of us?

John Bokma

Aug 16, 2006, 1:04:28 PM

David,

Two remarks:

1 - My Usenet client shows your bot's post as a new post instead
of a follow up to the original multipost. No idea if this is a bug
in my client, but if not, can this be fixed?

2 - Yesterday a fine example of one of my worries popped up: I saw
a post as a reply to a cancelled spam message. The problem is
that there is some time between spam being posted and the cancel,
so your bot might reply to each multiposted spam message.

Maybe a solution might be to scan the multipost for some keywords.
If they don't show up, don't let your bot post.

See also Dr. Ruud's reply, <ebviq3...@news.isolution.nl> which is a
reply to your bot replying to, again, a cancelled spam message.

Personally I again strongly advise stopping the bot until at least some
kind of voting has taken place. It still generates more posts than it
"prevents" at the moment.

use...@davidfilmer.com

Aug 16, 2006, 2:06:32 PM

John Bokma wrote:
> 1 - My Usenet client shows your bot's post as a new post instead
> of a follow up to the original multipost. No idea if this is a bug
> in my client, but if not, can this be fixed?

I wouldn't want to suggest that your reader has a bug, but I do set
"References" and "In-Reply-To" headers. Even in GG, this info shows up
(see http://tinyurl.com/mo4op):

References: <1155737247....@75g2000cwc.googlegroups.com>
<1155736963.0...@74g2000cwt.googlegroups.com>
In-Reply-To: <1155736963.0...@74g2000cwt.googlegroups.com>

I'm not sure why your reader wouldn't show that as well.
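
For anyone curious how such threading headers are usually assembled, here is a rough sketch (not the bot's code; the sub name and argument order are invented). It follows the usual convention that a follow-up's References is the parent's References plus the parent's Message-ID.

```perl
use strict;
use warnings;

# Build threading headers for a follow-up. The parent's own References
# (if any) come first, followed by the parent's Message-ID, which is
# also what In-Reply-To points at.
sub followup_headers {
    my ($parent_msgid, $parent_references) = @_;
    my @chain = split ' ', ($parent_references // '');
    push @chain, $parent_msgid;
    return (
        'References'  => join(' ', @chain),
        'In-Reply-To' => $parent_msgid,
    );
}
```

With headers built this way, the last entry in References always matches In-Reply-To, which is the pattern visible in the headers quoted above.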

> 2 - Yesterday a fine example of one of my worries popped up: I saw
> a post as a reply to a cancelled spam message.

The message isn't cancelled in GigaNews (as of this writing). Even
GG still shows it.

> that there is some time between spam being posted and the cancel,
> so your bot might reply to each multiposted spam message.

The bot won't reply to a message less than five minutes old. Maybe I
should open that up a bit... but how long do cancels take to run? I
thought they ran within a minute or two on modern newsservers.

> See also Dr. Ruud's reply, <ebviq3...@news.isolution.nl> which is a
> reply to your bot replying to, again, a cancelled spam message.

As I mentioned to the good Doc, I believe a good (but very simple) spam
filtering strategy would be to look for web URLs in the body of the
message. Spam almost always has a web URL, but I don't ever remember
seeing a newbie post with a web URL. I think such a filtering rule
should be nearly 100% effective.
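
The two heuristics mentioned in this post (a minimum article age and a web-URL check) might look something like the following. This is a sketch under assumptions: the sub names, the regular expression and the way the age is passed in are all illustrative and not taken from the bot.

```perl
use strict;
use warnings;

# Treat any body that contains a web URL as probable spam.
sub contains_web_url {
    my ($body) = @_;
    return $body =~ m{\b(?:https?://|www\.)\S+}i ? 1 : 0;
}

# True once an article is at least $min_age seconds old, which gives
# cancels some time to arrive before the bot replies.
sub old_enough {
    my ($posted_epoch, $now_epoch, $min_age) = @_;
    return ($now_epoch - $posted_epoch) >= $min_age ? 1 : 0;
}

# Reply only to articles that are old enough and do not look like spam.
sub should_reply {
    my ($body, $posted_epoch, $now_epoch, $min_age) = @_;
    return old_enough($posted_epoch, $now_epoch, $min_age)
        && !contains_web_url($body) ? 1 : 0;
}
```

A genuine question that happens to include a link would be skipped too, so the URL test trades a few missed multiposts for fewer replies to spam.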

> Personally I again strongly advice to stop the bot until at least some
> kind of voting has taken place.

I'm still open to such an idea of a vote, but unsure how to implement
it. The info you kindly provided previously on the topic seems pretty
focused on newsgroup creation; it doesn't seem that it is designed to
be used (or has ever been used) for in-forum voting. How did CLPMisc
ever ratify the posting guidelines? I wasn't around back then...

Sherm Pendley

Aug 16, 2006, 2:53:53 PM

use...@DavidFilmer.com writes:

> As I mentioned to the good Doc, I believe a good (but very simple) spam
> filtering strategy would be to look for web URLs in the body of the
> message. Spam almost always has a web URL, but I don't ever remember
> seeing a newbie post with a web URL. I think such a filtering rule
> should be nearly 100% effective.

It could monitor a few other groups that have nothing whatsoever to do with
Perl, CGI, or even programming - rec.food.cooking, for example. A message
that appears in a variety of unrelated groups is likely to be spam.
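
As a rough illustration of that idea, the check could be as simple as counting how many of the groups a message turned up in are off-topic. The on-topic list, the threshold and the sub name below are assumptions chosen for the example only.

```perl
use strict;
use warnings;

# Hypothetical list of groups the bot considers on-topic.
my %PERL_GROUP = map { $_ => 1 } qw(
    comp.lang.perl.misc comp.lang.perl.modules alt.perl
);

# Given the groups a fingerprinted message was seen in, report it as
# likely spam when at least $threshold of them are off-topic.
sub seen_in_unrelated_groups {
    my ($groups, $threshold) = @_;
    my $unrelated = grep { !$PERL_GROUP{$_} } @$groups;
    return $unrelated >= $threshold ? 1 : 0;
}
```

How many unrelated groups would need to be watched for this to be useful is an open question.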

use...@davidfilmer.com

Aug 16, 2006, 3:41:10 PM

Sherm Pendley wrote:
> use...@DavidFilmer.com writes:

> It could monitor a few other groups that have nothing whatsoever to do with
> Perl, CGI, or even programming - rec.food.cooking, for example. A message
> that appears in a variety of unrelated groups is likely to be spam.

That was also suggested previously... Lemme give that some thought.

To effectively cross-reference even a very small percentage of usenet,
I think I would need to look at several hundred groups.

My main concern is that I don't want to run afoul of my usenet
provider's ToS, and hitting several hundred groups repeatedly could
possibly be interpreted as abusive. I'm not sure, since GigaNews
doesn't state a specific abuse threshold. They don't prohibit bots,
as some providers do, but a bot that did something like that might
constitute abuse in their opinion.

John Bokma

Aug 16, 2006, 4:40:46 PM

use...@DavidFilmer.com wrote:

> John Bokma wrote:
>> 1 - My Usenet client shows your bot's post as a new post instead
>> of a follow up to the original multipost. No idea if this is a
>> bug in my client, but if not, can this be fixed?
>
> I wouldn't want to suggest that your reader has a bug, but I do set
> "References" and "In-Reply-To" headers. Even in GG, this info shows up
> (see http://tinyurl.com/mo4op):
>
> References: <1155737247....@75g2000cwc.googlegroups.com>
> <1155736963.0...@74g2000cwt.googlegroups.com>
> In-Reply-To: <1155736963.0...@74g2000cwt.googlegroups.com>
>
> I'm not sure why your reader wouldn't show that as well.

I see:

Subject: Workflow Systems--Multiposted

Maybe the missing Re: breaks Xnews? Moreover, I suggest following
netiquette: if you change the subject, do it like this:

New subject (was: old subject).

I suggest something like:

Multiposted (was: Workflow Systems)

Not sure if there should be a Re: in front though :-)

>> 2 - Yesterday a fine example of one of my worries popped up: I saw
>> a post as a reply to a cancelled spam message.
>
> The message isn't cancelled in GigaNews (as of this writing). And
> even GG also still shows it.

The (my) problem is that I now start to see replies to canceled messages
:-(

>> that there is some time between spam being posted and the cancel,
>> so your bot might reply to each multiposted spam message.
>
> The bot won't reply to a message less than five minutes old. Maybe I
> should open that up a bit... but how long do cancels take to run? I
> thought they ran within a minute or two on modern newsservers.

Depends on who cancels them :-) It can be hours :-) But yeah, if you
wait too long, other people might have replied to the multiposter.
OTOH, I would first check whether the multiposter has had any replies.
If he has, then don't post (IIRC you do that now?)

>> See also Dr. Ruud's reply, <ebviq3...@news.isolution.nl> which is
>> a reply to your bot replying to, again, a cancelled spam message.
>
> As I mentioned to the good Doc, I believe a good (but very simple)
> spam filtering strategy would be to look for web URLs in the body of
> the message. Spam almost always has a web URL, but I don't ever
> remember seeing a newbie post with a web URL. I think such a
> filtering rule should be nearly 100% effective.

As long as there are more false negatives than positives I am happy :-)

>> Personally I again strongly advise stopping the bot until at least
>> some kind of voting has taken place.
>
> I'm still open to such an idea of a vote, but unsure how to implement
> it. The info you kindly provided previously on the topic seems pretty
> focused on newsgroup creation; it doesn't seem that it is designed to
> be used (or has ever been used) for in-forum voting. How did CLPMisc
> ever ratify the posting guidelines? I wasn't around back then...

No idea :-) You might contact the vote collectors (UVV?). They might be
able to provide an answer and help you with the vote collection if
needed.

John Bokma

Aug 16, 2006, 4:42:14 PM

use...@DavidFilmer.com wrote:

How about checking for

/perl/i
/cgi/i

and monitor some multi posts for more words? Can't be that many :-)
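
A minimal sketch of such an inclusive keyword check, assuming the subject and body are available as one string (`@keywords` and `on_topic` are hypothetical names; the list would grow as more multiposts are examined):

```perl
use strict;
use warnings;

# Keywords suggested above; extend after reviewing real multiposts.
my @keywords = (qr/perl/i, qr/cgi/i);

# Hypothetical helper: true when any keyword matches the text.
sub on_topic {
    my ($text) = @_;
    for my $re (@keywords) {
        return 1 if $text =~ $re;
    }
    return 0;
}

print on_topic("How do I run a CGI script?") ? "flag candidate\n" : "ignore\n";
```

Note that an unanchored /cgi/i also matches inside longer words, so word boundaries may be worth considering.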

use...@davidfilmer.com

Aug 16, 2006, 7:09:54 PM

John Bokma wrote:

> How about checking for
> /perl/i
> /cgi/i
> and monitor some multi posts for more words? Can't be that many :-)

Lemme go back and research some multiposts that I (and others) have
manually flagged and see how effective a keyword search would be. I
might find that /doesn'?t work/i would catch most of them... ;^))

> The (my) problem is that I now start to see replies to canceled messages

But my point is that the messages aren't cancelled... at least not on
my newsserver. Are they cancelled on yours (ie, do you see the reply
but not the original?)

> > ... but how long do cancels take to run?

> Depends on who cancels them

I was talking about automated spam cancels (by filtering systems), not
user cancels. Or is spam filtering done prior to publishing the post?
I'm not sure how newsservers filter out spam.

> If he has [replies], then don't post (IIRC you do that now?)

Not quite. The bot will flag the original multipost, even if it has
replies (it ignores ALL replies, including replies which might be
cut-and-paste answers to multiposted questions, or even cut-and-paste
answers to similar questions, either of which would hash out as
multiposts but aren't really multiposts, IMHO).
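
The thread does not show how messages "hash out", so the following is only a sketch of one plausible approach: normalize the body, drop the signature after the "-- " cut line, and compare MD5 digests. `body_fingerprint` and the normalization rules are assumptions, not the bot's actual logic.

```perl
use strict;
use warnings;
use Digest::MD5 qw(md5_hex);

# Hypothetical helper: reduce a message body to a comparable digest.
# Assumes the body is a byte string (md5_hex dies on wide characters).
sub body_fingerprint {
    my ($body) = @_;
    $body =~ s/\r\n?/\n/g;        # normalize line endings
    $body =~ s/^-- \n.*\z//ms;    # drop everything after the "-- " cut line
    $body =~ s/\s+/ /g;           # collapse runs of whitespace
    $body =~ s/^ | $//g;          # trim leading/trailing space
    return md5_hex(lc $body);
}

my $in_group_a = "My script doesn't work.\nPlease help!\n-- \nJoe\n";
my $in_group_b = "My script   doesn't work.\r\nPlease help!\r\n";

print body_fingerprint($in_group_a) eq body_fingerprint($in_group_b)
    ? "same fingerprint: multipost candidate\n"
    : "different fingerprints\n";
```

A cut-and-paste answer posted to two threads would produce matching digests too, which is why replies need separate handling.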

But I'm not sure that it is undesirable for the bot to reply to an
answered thread. There have been past cases where I've (unknowingly)
replied to a multiposted question, only to jump to another group and
realize it was multiposted. In that case, I still respond back to both
threads and flag the post. Even though I previously replied (and, who
knows, I might have even replied correctly), flagging the message may
discourage further "rewarding" the OP with additional assistance (or
corrections to a crap answer I gave... serves the guy right).

I'm not sure why the bot should not act in the same manner...

> New subject (was: old subject).
> I suggest something like:
> Multiposted (was: Workflow Systems)

I've already implemented something like this per Dr.Ruud's suggestion
in another thread.
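
For illustration, the "(was: ...)" convention could be applied along these lines. This is a sketch only: `flag_subject` is a hypothetical helper, the exact wording the bot uses is not shown here, and whether a leading "Re:" belongs in front was left open above.

```perl
use strict;
use warnings;

# Hypothetical helper: build a "New subject (was: old subject)" line,
# stripping any leading "Re:" prefixes from the old subject first.
sub flag_subject {
    my ($subject) = @_;
    (my $old = $subject) =~ s/^(?:Re:\s*)+//i;
    return "Multiposted (was: $old)";
}

print flag_subject("Workflow Systems"), "\n";
```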

John Bokma

Aug 16, 2006, 7:20:08 PM

use...@DavidFilmer.com wrote:

> John Bokma wrote:
>
>> How about checking for
>> /perl/i
>> /cgi/i
>> and monitor some multi posts for more words? Can't be that many :-)
>
> Lemme go back and research some multiposts that I (and others) have
> manually flagged and see how effective a keyword search would be. I
> might find that /doesn'?t work/i would catch most of them... ;^))
>
>> The (my) problem is that I now start to see replies to canceled
>> messages
>
> But my point is that the messages aren't cancelled... at least not on
> my newsserver. Are they cancelled on yours (ie, do you see the reply
> but not the original?)

Yes.

Note that a cancel message is itself a post that requests removal. Each
news server can be configured to honor such a request or not, and to
propagate it or not.

>> > ... but how long do cancels take to run?
>> Depends on who cancels them
>
> I was talking about automated spam cancels (by filtering systems), not
> user cancels. Or is spam filtering done prior to publishing the post?
> I'm not sure how newsservers filter out spam.

IIRC a spam filter will just drop the post. However, there are cancel
bots that work much like yours: they just post a different message
(a cancel), and there is a delay with those, of course.

I am also talking about manual cancels. In general you don't want to
reply to *any* message that is canceled. In practice this is impossible,
so I would be happy if spam were simply never treated as a multipost.

>> If he has [replies], then don't post (IIRC you do that now?)
>
> Not quite. The bot will flag the original multipost, even if it has
> replies (it ignores ALL replies, including replies which might be
> cut-and-paste answers to multiposted questions, or even cut-and-paste
> answers to similar questions, either of which would hash out as
> multiposts but aren't really multiposts, IMHO).
>
> But I'm not sure that it is undesirable for the bot to reply to an
> answered thread. There have been past cases where I've (unknowingly)
> replied to a multiposted question, only to jump to another group and
> realize it was multiposted. In that case, I still respond back to
> both threads and flag the post. Even though I previously replied (and,
> who knows, I might have even replied correctly), flagging the message
> may discourage further "rewarding" the OP with additional assistance
> (or corrections to a crap answer I gave... serves the guy right).
>
> I'm not sure why the bot should not act in the same manner...

IMO no. Just assume that if there has been one reply, the poster
got away with it. You might catch him/her later anyway. I would
(again) recommend limiting the number of messages the bot sends out.

>> New subject (was: old subject).
>> I suggest something like:
>> Multiposted (was: Workflow Systems)
>
> I've already implemented something like this per Dr.Ruud's suggestion
> in another thread.

Yeah, he beat me again :-)

use...@davidfilmer.com

Aug 16, 2006, 10:01:39 PM

use...@DavidFilmer.com wrote:
> I believe a good (but very simple) spam filtering strategy would
> be to look for web URLs in the body of the message.

This was quick and easy, and I have done so.

I also noticed earlier that a pyramid scheme got flagged. All pyramid
schemes will contain an e-mail address in the message body (but real
multiposts rarely will), so I have coded around e-mail addresses as
well. Presently, messages are ignored if the message body (excluding
cut/tag line) matches the following (using Regexp::Common):

    if ($body =~ m{
              $RE{URI}{HTTP}{-scheme => qr{https?}}
            | $RE{URI}{FTP}
            | $RE{URI}{news}
            | $RE{URI}{NNTP}
        }xms) {
        $log->debug("Ignoring msg with URI (spam?)");
        next MSGNUM;
    }
    if ($body =~ m{($RE{Email}{Address})}xms) {
        $log->debug("Ignoring msg with e-mail address (pyramid?)");
        next MSGNUM;
    }

I think that will ignore 99% of spam (with very few false negatives).

John Bokma

Aug 17, 2006, 12:15:16 AM

use...@DavidFilmer.com wrote:

> use...@DavidFilmer.com wrote:
>> I believe a good (but very simple) spam filtering strategy would
>> be to look for web URLs in the body of the message.
>
> This was quick and easy, and I have done so.
>
> I also noticed earlier that a pyramid scheme got flagged. All pyramid
> schemes will contain an e-mail address in the message body (but real
> multiposts rarely will), so I have coded around e-mail addresses as
> well.

Yup, you can't ignore them based on a simple count of $$$$$ :-D.

--
John Experienced Perl programmer: http://castleamber.com/

Perl help, tutorials, and examples: http://johnbokma.com/perl/

Ben Morrow

Aug 16, 2006, 10:02:51 PM

Quoth use...@DavidFilmer.com:

> John Bokma wrote:
>
> > The (my) problem is that I now start to see replies to canceled messages
>
> But my point is that the messages aren't cancelled... at least not on
> my newsserver. Are they cancelled on yours (ie, do you see the reply
> but not the original?)

Something you could perhaps try is keeping a database of all your bot's
posts and cancelling them if the original article gets cancelled.

Ben

--
Heracles: Vulture! Here's a titbit for you / A few dried molecules of the gall
From the liver of a friend of yours. / Excuse the arrow but I have no spoon.
(Ted Hughes, [ Heracles shoots Vulture with arrow. Vulture bursts into ]
'Alcestis') [ flame, and falls out of sight. ] benm...@tiscali.co.uk

use...@davidfilmer.com

Aug 17, 2006, 12:57:54 PM

Ben Morrow wrote:
> Something you could perhaps try is keeping a database of all your bot's posts

already do that...

> and cancelling them if the original article gets cancelled.

Not sure how to do that... can I query for canceled jobs? Or would I
need to just check every post and see if it was still available?

Of course, in this particular case, apparently a message was canceled
on John's newsserver but not on mine (GigaNews). The bot runs on
GigaNews; it has no way to know about cancels on other servers.

I have never heard anything negative about GigaNews (such as that they
don't honor cancels). One thing I can say for sure about GN - it's
darn fast. But, if I find that they routinely don't honor cancels that
other servers do, I'll regroup.

Ben Morrow

Aug 17, 2006, 1:42:49 PM

Quoth use...@DavidFilmer.com:

> Ben Morrow wrote:
> > Something you could perhaps try is keeping a database of all your bot's posts
>
> already do that...
>
> > and cancelling them if the original article gets cancelled.
>
> Not sure how to do that... can I query for canceled jobs? Or would I
> need to just check every post and see if it was still available?

I admit I don't really know, but I would have thought that the control
message would just come down to you with the rest of the news. Then you
can grab the msgid and cancel your replies.
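
A sketch of that idea, assuming the bot keeps a map from each original Message-ID to its own reply and that cancel control messages are visible to it. `%replies_to`, `cancelled_msgid`, and `cancel_headers_for` are hypothetical names, the message IDs are placeholders, and a real cancel needs further headers (From, Newsgroups, and so on) that are not covered here.

```perl
use strict;
use warnings;

# Hypothetical database: original Message-ID => the bot's reply Message-ID.
my %replies_to = (
    '<orig1@example.invalid>' => '<botreply1@example.invalid>',
);

# Extract the target Message-ID from a Control header value
# of the form "cancel <message-id>".
sub cancelled_msgid {
    my ($control) = @_;
    return $control =~ /^cancel\s+(<[^<>\s]+>)\s*$/ ? $1 : undef;
}

# If the cancelled article is one the bot replied to, return the
# Control/Subject headers for cancelling that reply; otherwise nothing.
sub cancel_headers_for {
    my ($control) = @_;
    my $orig = cancelled_msgid($control) or return;
    my $mine = $replies_to{$orig}        or return;
    return (Control => "cancel $mine", Subject => "cmsg cancel $mine");
}

my %hdr = cancel_headers_for('cancel <orig1@example.invalid>');
print "$_: $hdr{$_}\n" for sort keys %hdr;
```

This only helps for cancels that actually reach the bot's own server.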

> Of course, in this particular case, apparently a message was canceled
> on John's newsserver but not on mine (GigaNews). The bot runs on
> GigaNews; it has no way to know about cancels on other servers.

Well, this is Usenet :).

Ben

--
It will be seen that the Erwhonians are a meek and long-suffering people,
easily led by the nose, and quick to offer up common sense at the shrine of
logic, when a philosopher convinces them that their institutions are not based
on the strictest morality. [Samuel Butler, paraphrased] benm...@tiscali.co.uk

John Bokma

Aug 17, 2006, 2:05:43 PM

use...@DavidFilmer.com wrote:

> Ben Morrow wrote:
>> Something you could perhaps try is keeping a database of all your
>> bot's posts
>
> already do that...
>
>> and cancelling them if the original article gets cancelled.
>
> Not sure how to do that... can I query for canceled jobs?

yes, those are posted in control.cancel

> Of course, in this particular case, apparently a message was canceled
> on John's newsserver but not on mine (GigaNews). The bot runs on
> GigaNews; it has no way to know about cancels on other servers.

I am not sure about that one. I know a bit about Usenet, but no idea if
cancels that are not honored might show up in control.cancel anyway.

> I have never heard anything negative about GigaNews (such as that they
> don't honor cancels). One thing I can say for sure about GN - it's
> darn fast. But, if I find that they routinely don't honor cancels
> that other servers do, I'll regroup.

Newsservers are connected and have up and down feeds, and each has its
own cancel policy. If A sends cancels to B, and B propagates them to C,
and C decides not to propagate them further, they never reach D (for
example).

individual.net used to be free, but for the past two years it has cost
10 euro/year (about 10 USD).

use...@davidfilmer.com

Aug 17, 2006, 9:02:24 PM

use...@DavidFilmer.com wrote:
> I recently wrote and deployed a usenet 'bot which identifies
> multiposted messages.

Per several helpful suggestions (thanks), the bot has been considerably
refined. Several people who had originally expressed reservations have
given positive feedback on the changes.

The big problem with the original bot (as I now realize) was the long
and heavy message. That issue was fixed a couple of days ago, and the
message text has now been further refined per an additional suggestion.

As several folks also suggested, the message cross-references now
include groupnames (indented for clarity).

The bot now ignores messages which contain e-mail addresses and certain
URIs. I believe this should be very effective in preventing the bot
from flagging spam (without admitting many false-negatives). Keyword
filtering (inclusive) was suggested, but I believe the e-mail/URI
approach would be more robust, based on a bit of research into old
multiposts.

The "References" headers have been tweaked so the I-R-T is always the
last item listed. Some readers weren't properly threading the bot's
reply; there was some speculation that re-ordering the References in
this manner might help (verification appreciated).
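
As a sketch of that ordering (not the bot's actual code; `reply_threading_headers` is a hypothetical helper and the message IDs are placeholders): take the parent's References, remove the parent's own Message-ID if it appears, and append it at the end, so the last reference always equals the In-Reply-To value.

```perl
use strict;
use warnings;

# Hypothetical helper: build threading headers for a reply to $parent_msgid.
# $parent_references is the parent's own References header (may be undef).
sub reply_threading_headers {
    my ($parent_msgid, $parent_references) = @_;
    my @refs = grep { $_ ne $parent_msgid }
               split ' ', ($parent_references // '');
    push @refs, $parent_msgid;    # the parent is always listed last
    return (
        'References'  => join(' ', @refs),
        'In-Reply-To' => $parent_msgid,
    );
}

my %h = reply_threading_headers('<b@example.invalid>', '<a@example.invalid>');
print "$_: $h{$_}\n" for sort keys %h;
```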

It was also suggested that the bot not reply to messages which already
have a reply. I'm looking into that - it would require some significant
changes to program logic; it's not a quick-n-easy thing to do (as were
these other things). And I'm not 100% convinced it's even a good idea
(multiposts with other replies are often flagged manually, right?).

I have not had a chance to look into ignoring control.cancel items yet.
But I've observed that cancels on some servers don't ever show up at
(or are not honored by) my provider (GigaNews), so even that would not
be guaranteed effective for all servers, given the oddities of Usenet
(however, I believe that most spam-related (non-)cancels would be
ignored by the e-mail/URI filtering anyway).

You may see an example of the current behavior of the bot at:
http://tinyurl.com/oll3u
or <news:n9GdnS7qRsw0mXjZ...@giganews.com>
or see recent "Lorem Ipsum" postings in alt.test.test

To tell you the truth, if I knew then what I know now, I don't think I
would have ever written this bot in the first place. But I *have*
written it, and it's getting pretty well refined (thanks to many
suggestions), and it seems to have settled into something that is
favorable (or at least not patently objectionable) to many folks. John
Bokma has suggested some sort of vote, and I like that idea (though I'm
still not sure how to conduct it), but before attempting anything like
that, I'd like to let the bot run for 30 days or so to prove it out (so
folks have a more informed idea of exactly what they're voting
for/against). And September is just around the corner, so there should
be some good test cases popping up soon...

Further input, of course, is always welcomed and appreciated.

Dr.Ruud

Aug 17, 2006, 9:57:45 PM

use...@DavidFilmer.com wrote:

> I've observed that cancels on some servers don't ever show
> up at (or are not honored by) my provider (GigaNews)

Also, the same provider can have 10 types of newsservers: ones that do
and ones that don't honor cancels.

--
Affijn, Ruud

"Gewoon is een tijger."