Handle remote federated users in the Gitea database #2

Closed
opened 2022-04-25 21:40:04 +00:00 by Ta180m · 6 comments
Ta180m commented 2022-04-25 21:40:04 +00:00 (Migrated from github.com)

Feature Description

We have to store federated users in the database somehow. One logical way to do this is to reuse the user table. This is what lunny said about this in the Gitea chat:

> Since user table already has two types column, one is LoginType which could be NoType, Plain, LDAP, SMTP, PAM, DLDAP, OAuth2, SSPI. Another column is UserType which could be Individual or Organization.
> We can have a new value for LoginType if reuse user table.
> Federate, but we need another table to store extra information. Maybe external_login_user table.
> And every external Gitea instance could be added as a record in login_source
> A login_source could also be disabled from Admin UI.
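A minimal sketch of what the quoted proposal could look like, assuming a hypothetical `Federated` login type and a simplified `LoginSource` record. The type names, their ordering, and the `CanFederate` helper are illustrative assumptions, not Gitea's actual definitions.

```go
package main

import "fmt"

// Type mirrors the login types listed in the quote above. The numeric
// ordering and the Federated value are assumptions made for this sketch.
type Type int

const (
	NoType Type = iota
	Plain
	LDAP
	SMTP
	PAM
	DLDAP
	OAuth2
	SSPI
	Federated // hypothetical new value for remote federated users
)

// LoginSource is a simplified stand-in for a row in login_source:
// one record per remote instance, which an admin can disable.
type LoginSource struct {
	ID       int64
	Type     Type
	Name     string // e.g. the remote instance host
	IsActive bool
}

// CanFederate reports whether users from this source should be accepted.
func CanFederate(s LoginSource) bool {
	return s.Type == Federated && s.IsActive
}

func main() {
	src := LoginSource{ID: 1, Type: Federated, Name: "gitea.example.org", IsActive: true}
	fmt.Println(CanFederate(src))

	src.IsActive = false // disabled from the admin UI
	fmt.Println(CanFederate(src))
}
```

Keeping one record per remote instance means that disabling a single row is enough to stop accepting users from that instance.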

@zeripath Any thoughts? Are we going to go ahead and try implementing it that way?

Screenshots

No response

xy referenced this issue from a commit 2022-05-10 00:13:30 +00:00

OK, I've finally had some time to think about this.

First thoughts:

  • I've never really liked this UserType thing in the User table - but I guess if it's here we should use it.
  • The User table is too large - we need to move the login and preferences stuff out from it completely.
User table
// User represents the object of individual and member of organization.
type User struct {
	ID        int64  `xorm:"pk autoincr"`
	LowerName string `xorm:"UNIQUE NOT NULL"`
	Name      string `xorm:"UNIQUE NOT NULL"`
	FullName  string
	// Email is the primary email address (to be used for communication)
	Email                        string `xorm:"NOT NULL"`
	KeepEmailPrivate             bool
	EmailNotificationsPreference string `xorm:"VARCHAR(20) NOT NULL DEFAULT 'enabled'"`
	Passwd                       string `xorm:"NOT NULL"`
	PasswdHashAlgo               string `xorm:"NOT NULL DEFAULT 'argon2'"`

	// MustChangePassword is an attribute that determines if a user
	// is to change his/her password after registration.
	MustChangePassword bool `xorm:"NOT NULL DEFAULT false"`

	LoginType   auth.Type
	LoginSource int64 `xorm:"NOT NULL DEFAULT 0"`
	LoginName   string
	Type        UserType
	Location    string
	Website     string
	Rands       string `xorm:"VARCHAR(32)"`
	Salt        string `xorm:"VARCHAR(32)"`
	Language    string `xorm:"VARCHAR(5)"`
	Description string

	CreatedUnix   timeutil.TimeStamp `xorm:"INDEX created"`
	UpdatedUnix   timeutil.TimeStamp `xorm:"INDEX updated"`
	LastLoginUnix timeutil.TimeStamp `xorm:"INDEX"`

	// Remember visibility choice for convenience, true for private
	LastRepoVisibility bool
	// Maximum repository creation limit, -1 means use global default
	MaxRepoCreation int `xorm:"NOT NULL DEFAULT -1"`

	// IsActive true: primary email is activated, user can access Web UI and Git SSH.
	// false: an inactive user can only log in Web UI for account operations (ex: activate the account by email), no other access.
	IsActive bool `xorm:"INDEX"`
	// the user is a Gitea admin, who can access all repositories and the admin pages.
	IsAdmin bool
	// true: the user is only allowed to see organizations/repositories that they have explicit rights to.
	// (ex: in private Gitea instances user won't be allowed to see even organizations/repositories that are set as public)
	IsRestricted bool `xorm:"NOT NULL DEFAULT false"`

	AllowGitHook            bool
	AllowImportLocal        bool // Allow migrate repository by local path
	AllowCreateOrganization bool `xorm:"DEFAULT true"`

	// true: the user is not allowed to log in Web UI. Git/SSH access could still be allowed (please refer to Git/SSH access related code/documents)
	ProhibitLogin bool `xorm:"NOT NULL DEFAULT false"`

	// Avatar
	Avatar          string `xorm:"VARCHAR(2048) NOT NULL"`
	AvatarEmail     string `xorm:"NOT NULL"`
	UseCustomAvatar bool

	// Counters
	NumFollowers int
	NumFollowing int `xorm:"NOT NULL DEFAULT 0"`
	NumStars     int
	NumRepos     int

	// For organization
	NumTeams                  int
	NumMembers                int
	Visibility                structs.VisibleType `xorm:"NOT NULL DEFAULT 0"`
	RepoAdminChangeTeamAccess bool                `xorm:"NOT NULL DEFAULT false"`

	// Preferences
	DiffViewStyle       string `xorm:"NOT NULL DEFAULT ''"`
	Theme               string `xorm:"NOT NULL DEFAULT ''"`
	KeepActivityPrivate bool   `xorm:"NOT NULL DEFAULT false"`
}

As you can see there are a huge number of fields in here which are not associated with Orgs or Federated users.

However, using the User table is probably the only option.

  • We would need to overload the username/lower-case username (Name/LowerName):
    • As we discussed in the federation meetings, this means adding an initial illegal character to prevent login. Fortunately, we've done a lot of work to make sure that this shouldn't cause any security issues, but the choice of character will be important.
    • My suspicion is that the rest of the username will probably need to be a GUID. (Whilst that GUID could be shared, we probably can't just trust it implicitly, so we'd always need some kind of mapping and hence have to use our own internal GUID.)
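The naming scheme described in the bullets above could be sketched roughly as follows. The prefix character `!` is only a placeholder assumption (the thread leaves the choice open), and `NewFederatedName` and `IsFederatedName` are hypothetical helpers, not existing Gitea functions.

```go
package main

import (
	"crypto/rand"
	"encoding/hex"
	"fmt"
	"strings"
)

// federatedPrefix is a placeholder for the "illegal" leading character.
// Which character to use is an open question in the discussion.
const federatedPrefix = "!"

// NewFederatedName returns the prefix followed by 32 hex characters taken
// from 16 random bytes, used here as a locally generated GUID.
func NewFederatedName() (string, error) {
	var b [16]byte
	if _, err := rand.Read(b[:]); err != nil {
		return "", err
	}
	return federatedPrefix + hex.EncodeToString(b[:]), nil
}

// IsFederatedName reports whether a username uses the reserved prefix and
// therefore can never be typed in at a normal login form.
func IsFederatedName(name string) bool {
	return strings.HasPrefix(name, federatedPrefix)
}

func main() {
	name, err := NewFederatedName()
	if err != nil {
		panic(err)
	}
	fmt.Println(name, IsFederatedName(name))
	fmt.Println("alice", IsFederatedName("alice"))
}
```

Because the identifier is generated locally, it does not depend on trusting any ID supplied by the remote instance; the remote ID would live in a separate mapping.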

So we need to think about the mapping:

  • Now the ExternalLoginUser can be used to provide extra data:
ExternalLoginUser
// ExternalLoginUser makes the connection between an existing user and additional external login sources
type ExternalLoginUser struct {
	ExternalID        string                 `xorm:"pk NOT NULL"`
	UserID            int64                  `xorm:"INDEX NOT NULL"`
	LoginSourceID     int64                  `xorm:"pk NOT NULL"`
	RawData           map[string]interface{} `xorm:"TEXT JSON"`
	Provider          string                 `xorm:"index VARCHAR(25)"`
	Email             string
	Name              string
	FirstName         string
	LastName          string
	NickName          string
	Description       string
	AvatarURL         string `xorm:"TEXT"`
	Location          string
	AccessToken       string `xorm:"TEXT"`
	AccessTokenSecret string `xorm:"TEXT"`
	RefreshToken      string `xorm:"TEXT"`
	ExpiresAt         time.Time
}

But we need to remember that relying heavily on RawData here may be a bad idea, as it needs to be unmarshalled. I guess the question is: what kind of data do we need?

  • LoginType: I suspect we can simply add another login type meaning "no login".
  • There is a kind of overloading of LoginSourceID, which is actually both a RegistrationSource and a LoginSource. I guess we need to think a bit more about this.
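To make the mapping question concrete, here is a minimal in-memory sketch keyed on the same pair that forms the primary key of ExternalLoginUser (LoginSourceID and ExternalID). The `ExternalUserMap` type and its methods are assumptions for illustration; a real implementation would query the table instead.

```go
package main

import "fmt"

// externalKey mirrors the composite primary key of ExternalLoginUser.
type externalKey struct {
	LoginSourceID int64
	ExternalID    string
}

// ExternalUserMap is a hypothetical in-memory stand-in for the table,
// mapping a remote identity to a local user ID.
type ExternalUserMap struct {
	byKey map[externalKey]int64
}

// NewExternalUserMap returns an empty mapping.
func NewExternalUserMap() *ExternalUserMap {
	return &ExternalUserMap{byKey: make(map[externalKey]int64)}
}

// Link records that the remote identity belongs to the local user.
func (m *ExternalUserMap) Link(sourceID int64, externalID string, userID int64) {
	m.byKey[externalKey{sourceID, externalID}] = userID
}

// Lookup resolves a remote identity to a local user ID, if one is linked.
func (m *ExternalUserMap) Lookup(sourceID int64, externalID string) (int64, bool) {
	id, ok := m.byKey[externalKey{sourceID, externalID}]
	return id, ok
}

func main() {
	m := NewExternalUserMap()
	m.Link(1, "remote-guid-123", 42)

	fmt.Println(m.Lookup(1, "remote-guid-123"))
	fmt.Println(m.Lookup(2, "remote-guid-123")) // same ID, different source
}
```

Scoping the external ID by source means two instances can hand out the same ID without colliding locally.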
Collaborator

The simpler option would be to create "staged" users when needed, as Discourse does when importing mails authored by people who do not (yet) have an account in the forum. A staged user in Gitea could have a special UserType that is used to display it differently or to handle it in a special way at login / password recovery. The minimum information it has is either:

  • an email
  • a URL for third party authentication

so that the account can be retrieved at a later time by the legitimate user. In Discourse, most of the code associated with those special users revolves around the logic that allows them to retrieve their account in a safe way.

The less complicated problem is to pick a username when staging a user. If the username is already taken, adding a trailing number until there is no conflict works. When the user reclaims their account, they can then pick a username that is available and that is more to their liking.
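The trailing-number approach from the paragraph above can be sketched in a few lines. `PickStagedName` and the `taken` callback are hypothetical names; in practice the check would be a database lookup.

```go
package main

import "fmt"

// PickStagedName returns base if it is free, otherwise base followed by the
// smallest positive integer that makes the name unused.
func PickStagedName(base string, taken func(string) bool) string {
	if !taken(base) {
		return base
	}
	for i := 1; ; i++ {
		candidate := fmt.Sprintf("%s%d", base, i)
		if !taken(candidate) {
			return candidate
		}
	}
}

func main() {
	existing := map[string]bool{"alice": true, "alice1": true}
	taken := func(name string) bool { return existing[name] }

	fmt.Println(PickStagedName("alice", taken))
	fmt.Println(PickStagedName("bob", taken))
}
```

With a real database, the check and the insert would need to happen in one transaction (or rely on the unique constraint) so that two concurrent stagings cannot pick the same name.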

The other nice thing Discourse does is to transparently turn a "staged" user into a normal user when someone creates an account using the same email. Most people are not aware that data is waiting for them in the forum, and they will not think to recover their password on a service that they never used.
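A rough sketch of that transparent promotion, assuming a hypothetical `Staged` user type and an in-memory `Store` indexed by email. None of these names come from Gitea or Discourse; they only illustrate the flow.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// UserType distinguishes normal users from staged placeholders.
type UserType int

const (
	Individual UserType = iota
	Staged // hypothetical type for users created on someone's behalf
)

// User holds only the fields this sketch needs.
type User struct {
	Name  string
	Email string
	Type  UserType
}

// Store is a hypothetical in-memory user store indexed by lower-cased email.
type Store struct {
	byEmail map[string]*User
}

// NewStore returns an empty store.
func NewStore() *Store {
	return &Store{byEmail: make(map[string]*User)}
}

// Stage creates a placeholder user for an email seen in imported data.
func (s *Store) Stage(name, email string) *User {
	u := &User{Name: name, Email: email, Type: Staged}
	s.byEmail[strings.ToLower(email)] = u
	return u
}

// Register creates a user, or promotes an existing staged user that has the
// same email so the data already attached to it is kept.
func (s *Store) Register(name, email string) (*User, error) {
	key := strings.ToLower(email)
	if u, ok := s.byEmail[key]; ok {
		if u.Type != Staged {
			return nil, errors.New("email already in use")
		}
		u.Name = name // the user picks the name they prefer
		u.Type = Individual
		return u, nil
	}
	u := &User{Name: name, Email: email, Type: Individual}
	s.byEmail[key] = u
	return u, nil
}

func main() {
	s := NewStore()
	staged := s.Stage("alice1", "alice@example.org")

	u, err := s.Register("alice", "Alice@example.org")
	fmt.Println(u == staged, u.Name, u.Type == Individual, err)
}
```

A real implementation would have to verify ownership of the email before promoting the account, which is the safety logic the comment above refers to.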


> The simpler option would be to create "staged" users when needed, as Discourse does when importing mails authored by people who do not (yet) have an account in the forum. A staged user in Gitea could have a special UserType that is used to display it differently or to handle it in a special way at login / password recovery. The minimum information it has is either:
>
>   • an email
>   • a URL for third party authentication
>
> so that the account can be retrieved at a later time by the legitimate user. In Discourse, most of the code associated with those special users revolves around the logic that allows them to retrieve their account in a safe way.
>
> The other nice thing Discourse does is to transparently turn a "staged" user into a normal user when someone creates an account using the same email. Most people are not aware that data is waiting for them in the forum, and they will not think to recover their password on a service that they never used.

Is there a reason why a user would want to retrieve an account on a different instance? I understand why Discourse does this, because Discourse is not federated, but ideally Gitea users will just have one account on a single instance that they use across the federation.

> The less complicated problem is to pick a username when staging a user. If the username is already taken, adding a trailing number until there is no conflict works. When the user reclaims their account, they can then pick a username that is available and that is more to their liking.

One way to guarantee that the username is unique could be to include the instance as part of the username.
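As a small illustration of that idea, the sketch below builds a `username@host` name from a username and the remote instance URL. `QualifiedName` is a hypothetical helper, and the exact format of the combined name is an assumption.

```go
package main

import (
	"fmt"
	"net/url"
	"strings"
)

// QualifiedName combines a remote username with the host of its instance,
// so that identical usernames on different instances never collide.
func QualifiedName(username, instanceURL string) (string, error) {
	u, err := url.Parse(instanceURL)
	if err != nil {
		return "", err
	}
	if u.Host == "" {
		return "", fmt.Errorf("no host in instance URL %q", instanceURL)
	}
	return username + "@" + strings.ToLower(u.Host), nil
}

func main() {
	a, _ := QualifiedName("alice", "https://gitea.hostea.org")
	b, _ := QualifiedName("alice", "https://gitea.com")
	fmt.Println(a)
	fmt.Println(b)
}
```

Since `@` would not be accepted in a locally chosen username, such a name could also double as the "illegal character" marker discussed earlier in the thread, although that is only one possible choice.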

Collaborator

> Is there a reason why a user would want to retrieve an account on a different instance?

When the user decides that they want to move from one forge to another. Say I'm a happy user of https://gitea.hostea.org which is federated with https://gitea.com. But then, for some reason, I'm no longer interested and I want to use https://gitea.com instead: I retrieve my account there and that's all I have to do.

Does that use case answer your question?


> > Is there a reason why a user would want to retrieve an account on a different instance?
>
> When the user decides that they want to move from one forge to another. Say I'm a happy user of https://gitea.hostea.org which is federated with https://gitea.com. But then, for some reason, I'm no longer interested and I want to use https://gitea.com instead: I retrieve my account there and that's all I have to do.
>
> Does that use case answer your question?

Yes, that sounds like a valid use case. However, the ActivityPub protocol does not allow users to easily move instances, since users are referenced in federation using their username@instance.com. If you move instances, all the other instances will still think you are located at the original instance.

I believe Mastodon implements moving instances using a special Move activity that adds a key on the user's Actor object that points to the account on the new instance so other instances know where to look for the moved user.
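For reference, a Move activity of the kind mentioned above might be built like this. This is a sketch based on my understanding of Mastodon's behavior, where the old account is both `actor` and `object` and the new account is `target`; the field layout should be checked against Mastodon's documentation, and the actor URLs are made-up examples.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// MoveActivity models the JSON shape of an ActivityStreams Move activity.
// The field choice follows what Mastodon is believed to send and is an
// assumption of this sketch.
type MoveActivity struct {
	Context string `json:"@context"`
	Type    string `json:"type"`
	Actor   string `json:"actor"`
	Object  string `json:"object"`
	Target  string `json:"target"`
}

// NewMove announces that oldActor has moved to newActor.
func NewMove(oldActor, newActor string) MoveActivity {
	return MoveActivity{
		Context: "https://www.w3.org/ns/activitystreams",
		Type:    "Move",
		Actor:   oldActor,
		Object:  oldActor,
		Target:  newActor,
	}
}

func main() {
	// Hypothetical actor URLs on the two instances named in the thread.
	m := NewMove(
		"https://gitea.hostea.org/users/alice",
		"https://gitea.com/users/alice",
	)
	out, err := json.MarshalIndent(m, "", "  ")
	if err != nil {
		panic(err)
	}
	fmt.Println(string(out))
}
```

Receiving instances would still need to confirm that the new account acknowledges the old one before following the move; otherwise anyone could claim to be the destination of someone else's account.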

Collaborator

Even if there is no way to move a user, as implemented by Mastodon, it is still very handy for a user to just stop using an account on a given instance and start using another account on another instance. The account on the other instance has been updated continuously, it already has everything and work can resume. There is a lot to do before that actually works as I describe it, but that's what I'd like and it is also a natural side effect of federation.

xy closed this issue 2022-06-26 19:18:12 +00:00