

I’ve toyed around with LLM-based moderation tools, but it never really panned out. It was too hit-or-miss to rely on, even with the temperature turned way down in an attempt to get consistent results. Granted, I was using a small local model rather than feeding everything to one of the big players.
To keep it focused, I created one custom model per rule to enforce. For example, the prompt for modding calls for violence was basically:
System Prompt to Enforce the "No Calls for Violence" Rule [1]
ROLE: You are a forum moderator who does not want users calling for violence. Examine the input and analyze whether it violates any constraints.
KNOWLEDGE:
- {list of dog-whistle slang for calling for murder}
CONSTRAINTS:
- Content should not advocate violence
- Content should not normalize violence
- Content should not escalate tensions or fan flames
- Content should avoid promoting harmful stereotypes
- Content should not utilize broad, sweeping generalizations
- Content should not use dehumanizing language
- Content should not undermine human rights, due process, or the rule of law
FORMAT YOUR RESPONSES AS JSON:
{
  "reason": [a one- to two-sentence summary],
  "score": [on a scale of 0 to 10, how severely the content advocates violence]
}
The score field was my band-aid for the high rate of both false positives and false negatives; originally I had it return only true or false. Any item scoring 7 or higher got passed to the mod queue along with the reason, and I would review its actions later.
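For reference, the glue around it was roughly this shape. This is just a sketch against an Ollama-style local HTTP endpoint; the model name, URL, and mod-queue stub here are placeholders rather than what I actually ran:

import json
import requests  # assumes an Ollama-style local HTTP API is running

API_URL = "http://localhost:11434/api/generate"  # placeholder local endpoint
MODEL = "no-calls-for-violence"                  # one custom model per rule
SCORE_THRESHOLD = 7                              # 7+ goes to the mod queue

def send_to_mod_queue(text, reason):
    # stand-in for the real queue; flagged items just get logged for later review
    print(f"[MOD QUEUE] {reason}\n{text[:200]}")

def check_comment(text):
    # run one rule-model over a comment, temperature turned way down
    resp = requests.post(API_URL, json={
        "model": MODEL,
        "prompt": text,
        "format": "json",                  # ask the model for JSON output
        "options": {"temperature": 0.1},
        "stream": False,
    }, timeout=60)
    resp.raise_for_status()
    return json.loads(resp.json()["response"])

def moderate(text):
    verdict = check_comment(text)
    if verdict.get("score", 0) >= SCORE_THRESHOLD:
        send_to_mod_queue(text, verdict.get("reason", ""))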
Ultimately it was slow and still somewhat unreliable, so I abandoned the idea after running it for a little less than a day, since I can't run bigger models fast enough to keep up and get better results. Using a cloud-based service was out of the question for many, many reasons, both financial and ethical.
To answer your question: as long as the models were locally hosted and properly tuned/tested, I’m fine with it in theory, except for the ideology part; that’s pretty messed up. While I don’t want my submissions used to train anyone’s model, and I take measures to prevent my own instance from being used as a data source, I remain aware that once I post something, I have no control over its fate the moment it federates out.
[1] Yes, I know that’s like half the comments that get posted around here. My goal was to have it mod things so posts became starting points for actual discussion instead of a knee-jerk rage factory.