Let’s say we have Lemmy instances A, B and C.
Alice from A makes a post “Hello, world” to B. What happens? How is it processed on servers A, B and C, and how do users from A, B and C receive her post?
Alice can’t make a post to B, but I assume you mean a community on B; let’s call it foo. When Alice makes a post, it first goes through A’s local API, which creates the local (and canonical) version of Alice’s post. Once A has finished processing Alice’s post, it will create an ActivityPub representation of Alice’s post to send to B.
ActivityPub is basically a bunch of assumptions laid on top of JSON. An ActivityPub ‘file’ can be divided into broadly 3 types: `Object`, `Activity` and actors.[1] These types then have subtypes; for example, both Alice and foo are actors, but Alice is a `Person` while foo is a `Group`.
A second important assumption of ActivityPub is the concept of inboxes and outboxes, but, for Lemmy, only inboxes matter. An inbox is just a URL where Lemmy can send activities, and it’s something all actors have.
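A minimal Python sketch of that three-way split, assuming the document has already been parsed into a dict. The type lists are a partial selection from the ActivityStreams 2.0 vocabulary, and the actor documents and inbox URLs below are made up for illustration; real actors advertise their inbox in the `inbox` field.

```python
# Partial selection of ActivityStreams 2.0 type names, grouped into the three
# broad buckets described above (not an exhaustive vocabulary).
ACTOR_TYPES = {"Application", "Group", "Organization", "Person", "Service"}
ACTIVITY_TYPES = {
    "Create", "Update", "Delete", "Follow", "Accept", "Reject", "Undo",
    "Announce", "Like", "Dislike", "Flag", "Block", "Add", "Remove",
}


def broad_kind(doc: dict) -> str:
    """Classify a parsed ActivityPub document as 'actor', 'activity' or 'object'.

    Assumes "type" is a single string; JSON-LD also allows a list of types,
    which would need extra handling.
    """
    kind = doc.get("type")
    if kind in ACTOR_TYPES:
        return "actor"
    if kind in ACTIVITY_TYPES:
        return "activity"
    return "object"  # e.g. Page, Note, Image


def inbox_of(actor: dict) -> str:
    """Every actor publishes an inbox URL; this is where activities get POSTed."""
    return actor["inbox"]


# Hypothetical actor documents for Alice (on A) and the community foo (on B).
alice = {"type": "Person", "id": "https://a/u/alice", "inbox": "https://a/u/alice/inbox"}
foo = {"type": "Group", "id": "https://b/c/foo", "inbox": "https://b/c/foo/inbox"}

print(broad_kind(alice), broad_kind(foo), broad_kind({"type": "Page"}))
print(inbox_of(foo))
```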
So when instance A is finished processing Alice’s post, it will turn it into a `Page` object, wrap that in a `Create` activity and send it to foo’s inbox. Roughly, the JSON would look like this:
```json
{
  "@context": [
    "https://join-lemmy.org/context.json",
    "https://www.w3.org/ns/activitystreams"
  ],
  "actor": "https://a/u/alice",
  "type": "Create",
  "to": ["https://www.w3.org/ns/activitystreams#Public"],
  "cc": ["https://b/c/foo"],
  "id": "https://a/activities/create/19199919009100",
  "object": {
    "type": "Page",
    "id": "https://a/post/1",
    "attributedTo": "https://a/u/alice",
    "to": [
      "https://b/c/foo",
      "https://www.w3.org/ns/activitystreams#Public"
    ],
    "audience": "https://b/c/foo",
    "name": "Hello, world",
    "attachment": [],
    "sensitive": false,
    "language": {
      "identifier": "en",
      "name": "English"
    },
    "published": "2024-12-29T15:10:51.557399Z"
  }
}
```
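The wrapping step can be sketched in Python as plain dict construction. This is an illustration, not Lemmy's code (Lemmy is written in Rust): the field set is trimmed from the example above, the activity ID is made up, and the actual delivery, an HTTP POST to foo's inbox signed with HTTP Signatures, is left out.

```python
import json

PUBLIC = "https://www.w3.org/ns/activitystreams#Public"


def make_page(post_id: str, author: str, community: str, title: str) -> dict:
    """Build the Page object for a post in a community (fields trimmed)."""
    return {
        "type": "Page",
        "id": post_id,
        "attributedTo": author,
        "to": [community, PUBLIC],
        "audience": community,
        "name": title,
    }


def wrap_in_create(page: dict, activity_id: str) -> dict:
    """Wrap a Page in a Create activity whose actor is the Page's author."""
    return {
        "@context": [
            "https://join-lemmy.org/context.json",
            "https://www.w3.org/ns/activitystreams",
        ],
        "type": "Create",
        "id": activity_id,
        "actor": page["attributedTo"],
        "to": [PUBLIC],
        "cc": [page["audience"]],  # the community, so its inbox is a delivery target
        "object": page,
    }


page = make_page("https://a/post/1", "https://a/u/alice", "https://b/c/foo", "Hello, world")
create = wrap_in_create(page, "https://a/activities/create/1")
print(json.dumps(create, indent=2))
```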
Now instance B will receive this and do the same kind of processing A did when Alice created the post via the API. Once it has finished, it will turn the post back into a `Page`, but this time wrap it in an `Announce` activity. B will then look at all the actors that follow foo (i.e. are subscribed to it) and send this `Announce` to all of their inboxes. Assuming a user on instance C follows foo, C will receive this `Announce` and process it like A and B before it, creating the local version of Alice’s post.
Edit: I made a small mistake. I said that foo wrapped the `Page` in an `Announce`, when it actually wraps the `Create` in an `Announce`.
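A Python sketch of B's side, following the corrected description (the `Announce` wraps the whole `Create`). The follower data, the `/followers` URL and the activity IDs are hypothetical. Deduplicating by `sharedInbox` (an optional endpoint defined in the ActivityPub spec) is one common way a server avoids sending the same activity once per subscriber; Lemmy's exact delivery logic may differ.

```python
PUBLIC = "https://www.w3.org/ns/activitystreams#Public"


def make_announce(community_id: str, create: dict, activity_id: str) -> dict:
    """Wrap a received Create (not the bare Page) in an Announce from the community."""
    return {
        "type": "Announce",
        "id": activity_id,
        "actor": community_id,
        "to": [PUBLIC],
        "cc": [community_id + "/followers"],  # hypothetical followers-collection URL
        "object": create,
    }


def delivery_targets(followers: list) -> list:
    """One inbox per destination, preferring an instance's shared inbox.

    Several followers on the same instance usually share one sharedInbox, so
    that instance receives a single copy instead of one per subscriber.
    """
    targets, seen = [], set()
    for actor in followers:
        inbox = actor.get("endpoints", {}).get("sharedInbox") or actor["inbox"]
        if inbox not in seen:
            seen.add(inbox)
            targets.append(inbox)
    return targets


followers = [
    {"id": "https://c/u/carol", "inbox": "https://c/u/carol/inbox",
     "endpoints": {"sharedInbox": "https://c/inbox"}},
    {"id": "https://c/u/dave", "inbox": "https://c/u/dave/inbox",
     "endpoints": {"sharedInbox": "https://c/inbox"}},
    {"id": "https://a/u/erin", "inbox": "https://a/u/erin/inbox"},
]
create = {"type": "Create", "id": "https://a/activities/create/1",
          "object": {"type": "Page", "id": "https://a/post/1"}}
announce = make_announce("https://b/c/foo", create, "https://b/activities/announce/1")
print(delivery_targets(followers))
```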
[1] Technically, `Activity` and actors are themselves objects, but they’re treated differently. There are also `Collection`s, which are their own type, but Lemmy doesn’t really utilise them.
Thank you, very clear.
So B will list all users subscribed to foo, look at their instances, and send the update to them.
I assume that if someone from a new instance (D) subscribes to foo, then D will need to request all the old posts from foo, since they weren’t pushed to D?
Lemmy is pretty bad about backfilling content. Communities do have outboxes, but these only list the last 50 posts, and you can’t get the votes or comments on any of them. See GitHub issues #5283, #3448 and #2004.
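To make the limitation concrete, here is a hypothetical backfill helper in Python that walks a community outbox that has already been fetched and parsed. The item shapes are assumed for illustration; the point is that even a complete walk yields at most the latest posts, with no votes or comments attached.

```python
def posts_from_outbox(outbox: dict, limit: int = 50) -> list:
    """Return the Page IDs reachable from an outbox's orderedItems.

    Unwraps Announce -> Create -> Page. Items whose object is only a URL
    string would need another fetch, so they are skipped here.
    """
    page_ids = []
    for item in outbox.get("orderedItems", [])[:limit]:
        obj = item
        while isinstance(obj, dict) and obj.get("type") in ("Announce", "Create"):
            obj = obj.get("object")
        if isinstance(obj, dict) and obj.get("type") == "Page":
            page_ids.append(obj["id"])
    return page_ids


outbox = {
    "type": "OrderedCollection",
    "orderedItems": [
        {"type": "Announce", "object": {"type": "Create",
                                        "object": {"type": "Page", "id": "https://b/post/2"}}},
        {"type": "Create", "object": {"type": "Page", "id": "https://a/post/1"}},
        {"type": "Announce", "object": "https://b/activities/create/3"},
    ],
}
print(posts_from_outbox(outbox))
# ['https://b/post/2', 'https://a/post/1']
```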
ActivityPub works like a magazine subscription. They don’t send you back issues for subscribing.
Why does a Mastodon user get completely different profiles and histories when viewed from different Lemmy instances? When compared, they look like two completely different users, except for having the same @address. In fact, this makes them immune from moderation if they comment from a different instance than the mod is on.
Mastodon doesn’t have `Group` support (FEP-1b12), so when Mastodon users reply to a post, they don’t send it to the community’s inbox (only to the inbox of the `Person` they’re replying to), thus breaking Lemmy’s model of federation.
Okay, thanks.
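Since an ActivityPub server delivers an activity to the inboxes of the actors it addresses (via `to`, `cc`, `audience` and related fields), the difference boils down to addressing. A simplified, hypothetical Python check with made-up IDs, showing why a reply that only addresses the post's author never reaches foo:

```python
def addressed_to(doc: dict) -> set:
    """Collect the actor IDs a document addresses in to, cc and audience."""
    recipients = set()
    for key in ("to", "cc", "audience"):
        value = doc.get(key, [])
        recipients.update(value if isinstance(value, list) else [value])
    return recipients


def reaches_community(activity: dict, community_id: str) -> bool:
    """True if the community is addressed by the activity or by its object."""
    obj = activity.get("object")
    found = community_id in addressed_to(activity)
    if isinstance(obj, dict):
        found = found or community_id in addressed_to(obj)
    return found


mastodon_reply = {
    "type": "Create",
    "to": ["https://a/u/alice"],
    "cc": ["https://m/users/zoe/followers"],
    "object": {"type": "Note", "inReplyTo": "https://a/post/1"},
}
lemmy_reply = {
    "type": "Create",
    "to": ["https://www.w3.org/ns/activitystreams#Public"],
    "cc": ["https://b/c/foo", "https://a/u/alice"],
    "object": {"type": "Note", "inReplyTo": "https://a/post/1"},
}
print(reaches_community(mastodon_reply, "https://b/c/foo"))  # False
print(reaches_community(lemmy_reply, "https://b/c/foo"))     # True
```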
Does ActivityPub really send copies of all activities to www.w3.org?
No, `https://www.w3.org/ns/activitystreams#Public` is just there to indicate that it’s OK for receiving instances to display this publicly; nothing actually gets sent to it. See the spec for more details.
Why not a binary flag or something? Is it just to avoid making it a formal part of the protocol?
Because it is JSON-LD and that’s how JSON-LD works. It’s an extensible format. Similar to XML namespaces.
So overengineered bullshit
I don’t understand the comment. It’s like calling the fact that `firstName` is in the JSON `{"firstName": "Bob"}` “over-engineered bullshit” and saying they should’ve made some application-specific protocol instead of using JSON. ActivityStreams and ActivityPub are built on top of JSON-LD to utilize existing libraries to represent linked data (that’s what the LD stands for). To specify which schemas are used, there is an `@context` field. There are other schemas as well; take a look at https://schema.org/ to see them.
If it feels over-engineered, it’s because it’s meant to be able to represent a wide variety of types of social media and typical interactions with them. I seriously doubt Mastodon (microblogging) and Lemmy (link aggregation forum) would be able to interact easily if they weren’t “over-engineered”.
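The core idea of the context can be shown with a deliberately tiny Python sketch: the context maps short terms (and prefixes such as `as:`) to full IRIs, so `firstName` can unambiguously mean schema.org's `givenName`. This is a toy stand-in for JSON-LD expansion, and the context map is hypothetical; a real JSON-LD processor also handles nested objects, `@vocab`, remote contexts and value coercion.

```python
def expand_iri(term: str, context: dict) -> str:
    """Expand a term or compact IRI ("prefix:suffix") using a flat context map."""
    if term in context:
        return context[term]
    prefix, sep, suffix = term.partition(":")
    if sep and prefix in context and not suffix.startswith("//"):
        return context[prefix] + suffix
    return term  # already an absolute IRI, or unknown


def expand_keys(doc: dict, context: dict) -> dict:
    """Rename a flat document's keys to full IRIs (no nesting, no @vocab, no coercion)."""
    return {expand_iri(key, context): value for key, value in doc.items()}


context = {
    "firstName": "https://schema.org/givenName",
    "as": "https://www.w3.org/ns/activitystreams#",
}
print(expand_keys({"firstName": "Bob"}, context))
# {'https://schema.org/givenName': 'Bob'}
print(expand_iri("as:Public", context))
# https://www.w3.org/ns/activitystreams#Public
```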
I don’t care; JSON-LD is itself overengineered, i.e. bloating every JSON that you send with 300 useless http:// links without an actual purpose (instead of a boolean flag or whatever). This bloated protocol doesn’t even… work properly.
I actually don’t know, you’d need to ask someone privy to design decisions made with ActivityPub, like Prodromou or Lemmer-Webber. It’s definitely not to avoid making it part of the protocol, because it already is (see the link in the last comment).
It’s because it’s JSON-LD.
What about JSON-LD makes it so they have to include the “this is public” declaration in the `to` field instead of having an `as:public` property on the object? (I don’t know a whole lot about JSON-LD or RDF more broadly.)
Thanks. I meant “formal” as in “formal grammar”, not that it wasn’t described in the published protocol. As in, there’s nothing in the protocol’s explicit form that distinguishes between this implied meaning and a real extra recipient, so it simplifies the parsing but adds an extra post-parsing step.
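That extra post-parsing step is small in practice. A Python sketch with a hypothetical function name, assuming the activity is handled as plain JSON rather than run through JSON-LD tooling: the ActivityPub spec notes that in that case the Public collection may also appear in compacted form as `Public` or `as:Public`, so all three spellings are treated alike here.

```python
# Equivalent spellings of the special Public collection that JSON-only
# consumers should be prepared to accept.
PUBLIC_FORMS = {
    "https://www.w3.org/ns/activitystreams#Public",
    "as:Public",
    "Public",
}


def split_addressing(activity: dict) -> tuple:
    """Post-parsing step: separate the 'is public' marker from real recipients."""
    recipients = []
    for key in ("to", "cc"):
        value = activity.get(key, [])
        recipients.extend(value if isinstance(value, list) else [value])
    is_public = any(r in PUBLIC_FORMS for r in recipients)
    deliver_to = [r for r in recipients if r not in PUBLIC_FORMS]
    return is_public, deliver_to


activity = {
    "to": ["https://www.w3.org/ns/activitystreams#Public"],
    "cc": ["https://b/c/foo"],
}
print(split_addressing(activity))
# (True, ['https://b/c/foo'])
```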
Think of it this way: when you make a post, that post will be automatically distributed by your server to everyone who is a subscriber. Depending on the type of platform, that could mean subscribers to the community or, in the case of things like Mastodon, subscribers to your user account. When the post is received, it will be copied and re-hosted on all the servers that have subscribers.
The exceptions are when a user is banned or a server is defederated: in that case the request is denied, and the post isn’t re-hosted by the instance that banned the user or defederated the server that made the post. Bans and defederation typically only happen in extreme cases, such as defending against spam, hate speech or abusive users.
This might be an overly simple explanation, but I’m trying to keep it simple since that helps people better understand the process.
It helps when you understand that you only ever directly interact with your instance.
- Alice posts to A (in some community hosted on B)
- B is federated with A so will eventually receive the post
- C is federated with B so will eventually get the post
The easiest way to explain it is that the instances have no native ability to crawl other instances for communities or content. For all intents and purposes, a fresh Lemmy server is on an island and all other instances are their own island until someone builds a bridge to them.
The ability of an instance to receive content depends on the subscriptions users add to the database. Once the instance is aware of these other places, it will begin receiving their updates, and you’ll see them regularly whether you interact with them or not.
This goes completely against what the average person is expecting and causes a lot of confusion.
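A toy Python model of that behaviour, with hypothetical class and function names: content is pushed only to instances where someone has subscribed, so a fresh instance like D never sees the post. It ignores bans, defederation and delivery retries.

```python
from dataclasses import dataclass, field


@dataclass
class Instance:
    name: str
    subscriptions: set = field(default_factory=set)  # community IDs local users follow
    posts: list = field(default_factory=list)        # locally stored copies


def store(instance: Instance, post: str) -> None:
    if post not in instance.posts:  # the same post can arrive more than once
        instance.posts.append(post)


def publish(post: str, community: str, author_home: Instance,
            community_home: Instance, network: list) -> None:
    """Push model: nobody crawls; only subscribed instances receive a copy."""
    store(author_home, post)     # canonical copy on the author's instance
    store(community_home, post)  # the community's instance
    for inst in network:
        if community in inst.subscriptions:
            store(inst, post)


a, b = Instance("A"), Instance("B")
c = Instance("C", subscriptions={"https://b/c/foo"})
d = Instance("D")  # fresh instance, no local subscriptions yet
publish("https://a/post/1", "https://b/c/foo", a, b, [a, b, c, d])
print([(i.name, i.posts) for i in (a, b, c, d)])
```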
PieFed instances now do have a form of this for instance admins to populate new instances.
Admins can:
- pull the lemmyverse data and subscribe to a bunch of communities at once, or
- target a single Lemmy or Mbin instance, get the list of communities that instance hosts, and subscribe to a bunch of communities on that instance.
Both have some tunable settings to allow admins control over how many communities are followed.
It’s not an end-user thing, but it should help with setting up new instances and them not being so ‘empty’.
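PieFed's actual implementation and setting names aren't described here, so the following is only a hypothetical Python sketch of the kind of knob such a tool might expose: cap the number of follows and skip tiny communities. The data format and thresholds are made up.

```python
def pick_communities(communities: list, max_follow: int,
                     min_subscribers: int = 0) -> list:
    """Choose which remote communities to auto-follow when seeding an instance.

    communities: dicts like {"id": ..., "subscribers": int}, e.g. built from a
    community directory or from one instance's community listing.
    """
    eligible = [c for c in communities if c["subscribers"] >= min_subscribers]
    eligible.sort(key=lambda c: c["subscribers"], reverse=True)
    return [c["id"] for c in eligible[:max_follow]]


listing = [
    {"id": "https://b/c/foo", "subscribers": 1200},
    {"id": "https://b/c/bar", "subscribers": 40},
    {"id": "https://e/c/sportball", "subscribers": 3000},
]
print(pick_communities(listing, max_follow=2, min_subscribers=100))
# ['https://e/c/sportball', 'https://b/c/foo']
```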
That sounds like a much better implementation of community discovery.
This goes completely against what the average person is expecting and causes a lot of confusion.
But this is only true if the user looks at the All feed, correct?
It impacts what content is available to users at all. The All feed is just the visual representation of what’s actively federating.
Let’s say you join a new instance for whatever reason with no outside awareness of how the fediverse works. If you try to search the instance for “sportball” and get zero results the natural assumption is going to be that there are no communities and no interest in that topic. The user has no idea that lemmyserver5000.com has a sportball community with thousands of users because no one with those interests ever did the work to get the content flowing in a way that they could access it intuitively. It’s a poor design IMO.
The reason I brought it up has more to do with starting a new instance or using a smaller instance. Communities that the instance isn’t aware of (via someone previously subscribing) won’t show up at all which causes places to appear non-existent or dead by default. Someone trying a federating website for the first time isn’t going to know this, so to them, that’s all the fediverse has to offer.
OK, I see that problem. In fact I remember having the same issue myself. (Presumably this will create a secondary confusion problem for “All” subscribers, who will see the content of their feed gradually expand without explanation as other users subscribe to other foreign servers, correct? Whatever, I don’t care much about them, someone who subscribes to “All” apparently doesn’t know what they want anyway!)
So the optimal solution here would be for each instance to preemptively connect to a whitelist of known foreign communities, perhaps? Or maybe each instance could regularly ping other servers in order to update its search database with popular communities.
It’s a poor design if what you want to do is emulate a centralized social media service.
But maybe we should stop trying to do that.
Maybe.
But I’d counter that it’s prohibitive to growth. People aren’t used to turning up at a domain name only to find out 90% of the content can’t be accessed without jumping through a bunch of hoops.
instances have no native ability to crawl other instances for communities or content
That’s not quite true. They don’t do it automatically or routinely, but a user can cause a server to read a post from another server by putting its URL into the search box. This can be useful for an end user to manually address a federation glitch.
Here’s a concrete example. I was trying to post a comment via lemmy.world, but lemmy.world sits behind Cloudflare, and Cloudflare flagged its content as potentially malicious. I then posted that comment via my own Mastodon server, but push federation to lemmy.world also failed, for the same reason. I could, however, cause lemmy.world to pull the comment using the search.
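Pasting a URL into the search box makes the server fetch the object from its origin instead of waiting for a push. The fetch itself is an ordinary HTTP GET with an ActivityPub `Accept` header; a minimal Python sketch of building one (the URL is hypothetical, and this is not Lemmy's own code):

```python
import urllib.request

# Media type the ActivityPub spec tells clients to request when fetching
# objects; many servers also accept "application/activity+json".
AP_ACCEPT = 'application/ld+json; profile="https://www.w3.org/ns/activitystreams"'


def build_fetch(object_url: str) -> urllib.request.Request:
    """Build a GET for the ActivityPub representation of a post or comment."""
    return urllib.request.Request(object_url, headers={"Accept": AP_ACCEPT})


req = build_fetch("https://a/post/1")
# Sending it needs network access, and some servers additionally require
# the GET to carry an HTTP Signature:
# with urllib.request.urlopen(req) as resp:
#     document = resp.read()
print(req.full_url, req.get_header("Accept"))
```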
Does that mean that an “All” view is only all of the subscriptions/places people from my server have?
That’s quite interesting.
And thanks!
Does that mean that an “All” view is only all of the subscriptions/places people from my server have?
Correct.
Note that many instances either have a bot subscribed to other communities to force federation, or use something like https://lemmy-federate.com/
FWIW this approach can be helpful but is flawed in its own ways.
Firstly, since not all instances participate you still aren’t getting the “complete” fediverse so to speak. This becomes less of an issue as more instances join the bot program, but it’s another step that roadblocks what should be an easy and organic process.
Secondly, the bot can pose a potential security risk depending on how it’s configured. If you use it to federate in both directions, you’re subject to malicious actors spinning up tons of new communities on instances that don’t restrict user registration. This will in turn hammer the database an instance uses for EVERYTHING and eventually cause slowdowns, crashes, etc. The solution to this is to only seed your communities outwardly, but if everyone does only that, the bot is rather useless…
I don’t have a solution for any of this, I’m just pointing out some rather frustrating problems this platform has in its current state.
Well, you can always defederate if an instance starts abusing it. Not that much different to the normal flow, really.
you can always defederate if an instance starts abusing it
Sure, but potentially after at least one of the instances subscribed to the bot goes down and someone realizes what’s happening. It’s incredibly easy to overwhelm a small server’s database just by subscribing to a lot of communities the normal way. The difference here is potentially any instance federating the bot in both directions is susceptible to this.
Not that much different to the normal flow, really.
The impact across the fediverse vs just one instance would be the main difference. Plenty of people are using that bot having no real idea of what it’s doing.
That’s just a part of the learning process, IMO. My instance crashed many times, I’ve fixed it every time and now it’s better than before. And I don’t think I’ve had my last fuck up with the instance.
And that’s fine for you, I’m not knocking the experimenting and learning process. That was the whole reason I spun up an instance myself.
What I’m saying is that to the other users that would be impacted by these things, it sucks. People are patient to a point but the fediverse has a lot of odd quirks that make it more difficult than it should be to use for a lot of people. Things have gotten better in the last year or so but it still feels like we’re asking people to know more than they should have to just to figure out that Lemmy isn’t empty. Many people will get frustrated and leave long before they start making excuses for a site they don’t know anything about.
It’s easy to sit around proclaiming that reddit sucks but the fact of the matter is that it’s easy to use and everything they have to offer is covered under one domain. Again, I don’t have the solution to these things for Lemmy, but we can’t deny that this platform is harder to use than most and a lot of people aren’t going to handle that well.
- A makes a post to B
- B federates that post to all instances that have at least 1 user subbed to the community of the post
All users from all instances get the post from their home instance.
Thanks but this is quite high-level.
Okay, so Alice makes a request to A. A makes a request to B. B makes requests to all other instances.
If you get posts from your home instance, does it mean that all instances duplicate the same database?
They don’t duplicate the database in a technical sense, but when things go right, they each have a copy of the same post and comment text, and the same votes.
Do you mean that the database is not identical, but still duplicates all data, basically? (you said “they each have a copy”, I assume it’s persistent on disk). So if we have 100 lemmy instances, they all save the same post.
Correct. Each server that shows the post to its users stores a copy of the post. It does not necessarily store attached media (IIRC Mastodon usually does and Lemmy usually hotlinks media).
Your home instance only has posts in its database from communities that at least 1 user has subscribed to.