Development/Docs/Cluster Admin
A quick reference of various cluster admin tasks
Platform overview
The build system platform runs on multiple machines and consists of two main layers:
- the logical "cooker" layer
- the physical layer (technical system infrastructure)
Troubleshooting the cooker layer
The logical cooker layer is made of chroots running on multiple nodes (n1..n5 for x86, seggie and deborah for x86_64). To access these machines, log in as usual, for example with "ssh n5". From there, you can use sudo to perform the administration tasks described below. If you would like to help with daily platform administration, ask on the distrib-admin mailing list. To get help, log in to IRC and contact a distrib-admin member with sudo privileges on this layer.
Troubleshooting the physical layer
This layer is formed by the real systems running on the nodes; some of them run 2007, others a mix of 2006 and cooker. Normally, as a build engineer or a contributor, you do not need to access this part.
If a problem cannot be solved from the logical cooker layer, you (as a registered member of distrib-admin) can contact the IS Team to get help. The current SLA for the physical layer is that hardware problems are handled during the week as part of the IS Team's normal duties. Hardware problems that occur during the weekend are only fixed on a best-effort basis.
Access to the root shell
If you want to become root, just use sudo bash.
Cluster configuration
Some configuration files are managed by cfengine; the configuration is in /var/lib/config/ on kenobi.
A cron task updates each node of the cluster, so managed files are refreshed every hour or so. To expedite or force an update, run the /etc/cron.hourly/config script on each managed machine.
These are managed via LDAP:
- user and group accounts (service accounts are local for each machine, i.e., id < 500). Group accounts follow RFC2307bis, that is, group membership is via the member attribute using a full DN.
- sudo rights
- automount maps
- password policies (not in effect yet)
- svn login to user+email mapping ([users] section in repsys.conf)
- administration privileges (ou=System Accounts and ou=System Groups)
- svnperms groups for ACLs (same as posix groups)
- mandriva.org email aliases for contributors
- ssh public keys are also stored in the user's entry in LDAP
The tree layout is based on Projects/OpenLDAP_DIT
The "producer" LDAP server (i.e., "master" and R/W) is at svn.mandriva.com, and kenobi.mandriva.com hosts one "consumer" (i.e., "slave" and R/O). So direct any write operations to the producer or else you will get back an error and a referral to the right server.
LDAP
There are two LDAP servers: the one on svn.mandriva.com is the producer, i.e., where all writes go. The other one is at kenobi.mandriva.com and is a read-only consumer. All cluster machines point to both servers, using the closest one first, so if one goes down the other is used.
This configuration means that all write operations have to be directed to svn.mandriva.com. Read operations can go to either server: replication is fast, so both normally return the same data.
So, how does one deal with this change? User account and group creation is handled by scripts. Everything else is, for now, handled via LDAP commands. So, dear admin, fire up your LDAP client of choice. Some suggestions: luma, gq, openldap-clients.
One last thing: use your own account to perform admin tasks. Because of the use of group memberships, members of the Account Admins system group (see cn=Account Admins,ou=System Groups,dc=mandriva,dc=com) have such rights. If you are not a member of that group and think this is an error, please contact distrib-admin@mandrivalinux.org.
Sudoers in LDAP
Sudoers configuration is stored under the ou=sudoers,dc=mandriva,dc=com branch. Any change to this branch is immediately visible in all cluster nodes, so be careful.
Sudo defaults are managed under the cn=defaults entry. All other entries represent sudo roles, which are similar to a line in /etc/sudoers.
Here is an example of a sudo role called youri-submit:
    dn: cn=youri-submit,ou=sudoers,dc=mandriva,dc=com
    objectClass: sudoRole
    cn: youri-submit
    sudoUser: %packager
    sudoRunAs: mandrake
    sudoCommand: /usr/local/bin/mdv-youri-submit.wrapper
    sudoOption: !authenticate
    sudoHost: n1.mandriva.com
    sudoHost: n2.mandriva.com
    sudoHost: n3.mandriva.com
    sudoHost: n4.mandriva.com
    sudoHost: n5.mandriva.com
    sudoHost: seggie.mandriva.com
    sudoHost: deborah.mandriva.com
    sudoHost: kenobi.mandriva.com
This role has these characteristics:
- applies to the command /usr/local/bin/mdv-youri-submit.wrapper
- can be run only on the listed hosts (n1 through n5, seggie, deborah and kenobi)
- has to be run as the mandrake user (i.e., sudo -u mandrake)
- can be run by any member of the packager group
- no authentication needed (equivalent of NOPASSWD in sudoers)
If we want to add another command to this role, just add it via another sudoCommand attribute. Like this:
    dn: cn=youri-submit,ou=sudoers,dc=mandriva,dc=com
    (...)
    sudoCommand: /usr/local/bin/mdv-youri-submit.wrapper
    sudoCommand: /usr/local/bin/youri-submit
    (...)
Want another user or group in the list of authorized entities? Just add another sudoUser attribute. Another option? Add sudoOption. Host? sudoHost. And so on.
User and group accounts in LDAP
User and group accounts are stored in LDAP with the following characteristics:
- using RFC2307bis schema (i.e., groups use groupOfNames as structural object class)
- using cn=unixIdPool entry to store free user and group global numeric identifiers
- group membership is automatically handled by the OpenLDAP server whenever a user is removed
- primary group for all user accounts is users (gidNumber=100)
The svn/git machine uses a little trick to prevent users from using interactive shells: on that machine only, loginShell from LDAP is overridden locally to point to a wrapper script which only allows git and svn commands. This is done in /etc/ldap.conf.
For now there are two scripts: one to add users and another to add groups. Both have similar syntax and behaviour; see their usage text for more details.
NOTE: anything that touches the userPassword attribute has to be protected by encryption. This means that any non-anonymous authentication has to use ldaps:// or ldap:// + START_TLS. For the OpenLDAP client command-line tools, add -ZZ as a parameter. For other LDAP clients, check the respective documentation.
WARNING: there is a race condition in adding users and groups because the increment+modify operation (RFC 4525 and RFC 4527 for pre/post-read) in LDAP is not being used at the moment. This means that if more than one admin is using the add user/group script at the same time, the new entries may get the same uidNumber or gidNumber. uidNumber is protected from duplication on the server side and the script would fail if it happened, but not gidNumber. So, if such a problem arises, please check these attributes for duplication and, if necessary, increment the values in the cn=unixIdPool,dc=mandriva,dc=com entry.
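The duplicate check suggested above could be sketched as follows. This is only an illustration: it assumes the (name, number) pairs have already been read from LDAP by some other means, and `find_duplicate_ids` is a hypothetical helper, not part of the cluster scripts.

```python
from collections import defaultdict


def find_duplicate_ids(entries):
    """Return {number: [names]} for every numeric id used more than once.

    `entries` is an iterable of (name, id_number) pairs, for example
    group names with their gidNumber values already read from LDAP.
    """
    by_id = defaultdict(list)
    for name, id_number in entries:
        by_id[int(id_number)].append(name)
    return {num: sorted(names) for num, names in by_id.items() if len(names) > 1}


if __name__ == "__main__":
    # Hypothetical sample data: two groups created at the same time.
    groups = [("packager", 5001), ("drakx", 5002), ("newgroup", 5002)]
    for num, names in find_duplicate_ids(groups).items():
        print(f"gidNumber {num} is shared by: {', '.join(names)}")
```

If the result is not empty, one of the colliding entries needs a new number, and the values in cn=unixIdPool may need to be incremented as described above.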
Adding a user
Use this script: cluster-adduser.sh
Currently these are the roles available for user accounts. A role is represented by a set of groups:
| Role | Privileges | Groups | Use for... |
|---|---|---|---|
| apprentice | shell,iurt | apprentice | people who may become maintainers in the future |
| svn-only | commit,shell | svn | basic svn access |
| translator | commit,shell | svn,po | only translator related work |
| packager | commit,shell,iurt,upload | svn,packager | maintainers who commit and upload packages |
(nss_ldap has support for nested groups, i.e., groups within groups, but I don't trust it yet)
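The role table above can be read as a simple mapping from role name to groups. The sketch below only restates that data in code form; it is not taken from cluster-adduser.sh, and `ROLE_GROUPS` and `groups_for_role` are hypothetical names.

```python
# Role name -> groups, copied from the "Groups" column of the table.
ROLE_GROUPS = {
    "apprentice": ("apprentice",),
    "svn-only": ("svn",),
    "translator": ("svn", "po"),
    "packager": ("svn", "packager"),
}


def groups_for_role(role):
    """Return the groups a new account with `role` should join."""
    try:
        return ROLE_GROUPS[role]
    except KeyError:
        raise ValueError(f"unknown role: {role!r}") from None
```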
To add a new user account, please use the above mentioned cluster-adduser.sh script. WARNING: this script is still lacking some features:
- subscribing the user to the maintainers@ mailing list in the case of a packager role
- creating the home directory on the specified cluster node and on the svn machine
So, after creating the user in LDAP, these tasks need to be performed manually:
- for packagers, ask the user to subscribe to the maintainers list by sending an email to sympa@mandrivalinux.org with the body "subscribe maintainers"
- ssh into the node and run /var/lib/config/bin/new-account <loginname> users
Adding a group
Use this script: cluster-addgroup.sh
This script also has the option of specifying an owner for the group. Owners are like group admins: they can manage the group membership at will by adding or removing members.
Note it's currently not possible to add initial members to the group with the script. This has to be performed later with standard LDAP operations.
Modifying a group
Here is a script to add user(s) to a group: http://svn.mandriva.com/svn/soft/build_system/account_management/cluster-adduser2group.sh
Alternatively, you can use standard LDAP operations on the group entry. For example, to make jsmith part of the packager group, run this:
    $ ldapmodify -x -ZZ -D uid=<your-login>,ou=people,dc=mandriva,dc=com -W -h svn.mandriva.com
    Enter LDAP Password: secret
    dn: cn=packager,ou=Group,dc=mandriva,dc=com
    changetype: modify
    add: member
    member: uid=jsmith,ou=People,dc=mandriva,dc=com
    modifying entry "cn=packager,ou=Group,dc=mandriva,dc=com"
    ^D
Or just use your LDAP client of choice to add that new member attribute.
To remove a user from a group, the operation is almost the same. Just be careful to specify which member you are removing: if you leave the member value out, all members are removed!
    $ ldapmodify -x -ZZ -D uid=<your-login>,ou=people,dc=mandriva,dc=com -W -h svn.mandriva.com
    Enter LDAP Password: secret
    dn: cn=packager,ou=Group,dc=mandriva,dc=com
    changetype: modify
    delete: member
    member: uid=jsmith,ou=People,dc=mandriva,dc=com
    modifying entry "cn=packager,ou=Group,dc=mandriva,dc=com"
    ^D
The command above removed jsmith from the packager group.
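To avoid the "all members removed" pitfall when scripting such changes, the LDIF can be generated by a small helper that refuses an empty member. This is a sketch under assumptions: `member_change_ldif` is a hypothetical name, and the DN layout (ou=Group, ou=People, dc=mandriva,dc=com) is simply copied from the examples above.

```python
def member_change_ldif(group, uid, action, base="dc=mandriva,dc=com"):
    """Build the LDIF text that adds or deletes one member of a group.

    Refuses an empty uid: a "delete: member" with no value would remove
    every member of the group.
    """
    if action not in ("add", "delete"):
        raise ValueError("action must be 'add' or 'delete'")
    if not uid or not uid.strip():
        raise ValueError("refusing to build LDIF without an explicit member")
    return "\n".join([
        f"dn: cn={group},ou=Group,{base}",
        "changetype: modify",
        f"{action}: member",
        f"member: uid={uid.strip()},ou=People,{base}",
        "",  # trailing newline ends the record
    ])


if __name__ == "__main__":
    print(member_change_ldif("packager", "jsmith", "delete"), end="")
```

The printed text can be fed to ldapmodify on standard input, using the same options as in the sessions above.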
Modifying a user
Again, standard LDAP operations should be used to modify a user entry. For example, to change the alias email of jsmith to js22@gmail.com, run this:
    $ ldapmodify -x -ZZ -D uid=<your-login>,ou=people,dc=mandriva,dc=com -W -h svn.mandriva.com
    Enter LDAP Password: secret
    dn: uid=jsmith,ou=People,dc=mandriva,dc=com
    changetype: modify
    replace: mailForwardingAddress
    mailForwardingAddress: js22@gmail.com
    modifying entry "uid=jsmith,ou=People,dc=mandriva,dc=com"
    ^D
The manager of a user, defined by the manager attribute if present, has some additional permissions over that entry compared to the user him/herself. For example, the manager can upload a photo (jpegPhoto), change the mailForwardingAddress value and edit some other attributes, which are not listed here because the list may change as this feature is used more (or less).
For example, if jsmith had this entry:
    dn: uid=jsmith,ou=people,dc=mandriva,dc=com
    (...)
    manager: uid=peter,ou=people,dc=mandriva,dc=com
    mailForwardingAddress: peter21@gmail.com
This means that uid=peter,ou=people,dc=mandriva,dc=com could change, among other things, the mailForwardingAddress of this jsmith entry.
In the case of an apprentice, the manager attribute also shows who the user's tutor is, so you know who to contact if needed.
Removing a group
To remove a group, just delete its entry in LDAP with a standard LDAP operation. Like this:
    $ ldapdelete -x -ZZ -D uid=yourname,ou=People,dc=mandriva,dc=com -W -h svn.mandriva.com cn=<group>,ou=Group,dc=mandriva,dc=com
    Enter LDAP Password: secret
Notice that any filesystem objects which had this group in their set of permissions will stop recognizing the group name, showing the numeric identifier instead.
Removing/Disabling a user
In most cases, it's better to disable a user account instead of removing it.
To disable an account, set its shadowExpire attribute to 1. For example, to disable the account of the jsmith user:
    $ ldapmodify -x -ZZ -D uid=<yourlogin>,ou=People,dc=mandriva,dc=com -W -h svn.mandriva.com
    Enter LDAP Password: secret
    dn: uid=jsmith,ou=People,dc=mandriva,dc=com
    changetype: modify
    replace: shadowExpire
    shadowExpire: 1
    modifying entry "uid=jsmith,ou=People,dc=mandriva,dc=com"
    ^D
This will prevent that user from logging in to the cluster and from using authenticated svn sessions. To re-enable an account, just remove that attribute. For example, in the case of the same jsmith user:
    $ ldapmodify -x -ZZ -D uid=<yourlogin>,ou=People,dc=mandriva,dc=com -W -h svn.mandriva.com
    Enter LDAP Password: secret
    dn: uid=jsmith,ou=People,dc=mandriva,dc=com
    changetype: modify
    delete: shadowExpire
    modifying entry "uid=jsmith,ou=People,dc=mandriva,dc=com"
    ^D
Alternatively, you can use these scripts: cluster-enableuser.sh and cluster-disableuser.sh.
Removing a user is a bit trickier, because the user is referenced in many places. Some are handled automatically, but others are not:
- group membership: automatically handled by the OpenLDAP server. The user is removed from all groups to which he/she belongs.
- automount maps: have to be manually removed from ou=Mounts,dc=mandriva,dc=com
- sudo rules: have to be manually updated in ou=sudoers,dc=mandriva,dc=com unless groups are used (search for sudoUser=name)
- bugzilla: not affected (bugzilla's database is independent of LDAP)
- email alias: automatically dropped, because it's in the user entry itself
- maintainers@ ml subscription: has to be dealt with manually
- home directory: has to be dealt with manually
We will try to come up with a script to do all of this.
Automount maps in LDAP
Automount maps are stored in LDAP under ou=Mounts,dc=mandriva,dc=com. These maps are automatically created by the cluster-adduser.sh script. Here is an example:
    dn: cn=andreas,ou=auto.home,ou=Mounts,dc=mandriva,dc=com
    objectClass: automount
    automountInformation: -rw,nfs,soft,intr,nosuid,rsize=8192,wsize=8192 n5.mandriva.com:/export/home/&
    cn: andreas
Password policies in LDAP
These are currently ignored. The documentation is here only for completeness.
It is possible now to store password policies in LDAP, and use different policies for different users. For example, here is a fictional policy called "cluster":
    dn: cn=cluster,ou=Password Policies,dc=mandriva,dc=com
    pwdExpireWarning: 604800
    cn: cluster
    objectClass: pwdPolicy
    objectClass: namedObject
    pwdMinLength: 6
    pwdCheckQuality: 1
    pwdAttribute: userPassword
    pwdMaxAge: 5184000
    pwdMustChange: TRUE
    pwdInHistory: 2
Policies will only be used after some more testing with ssh interaction, especially the password change feature, unless we decide to use ssh-key authentication only (which would be better).
Contributor email aliases in LDAP
Contributor email aliases are also handled in LDAP. Periodically a script scans the user entries and generates a postfix-compatible virtual alias file that is sent to the mandriva.org MTA. These two attributes are used to construct the alias: mail and mailForwardingAddress (can also be called mailAlternateAddress).
For example, this user entry:
    dn: uid=jsmith,ou=People,dc=mandriva,dc=com
    (...)
    mail: jsmith@mandriva.org
    mail: jsmith@mandriva.com
    mailForwardingAddress: jsmith26@gmail.com
    (...)
would generate this line for a postfix virtual alias (note how @mandriva.com was ignored):
jsmith@mandriva.org jsmith26@gmail.com
So, the rules are:
- mail: has to be a @mandriva.org address
- mailForwardingAddress: if mail is a @mandriva.org address, then this attribute contains the aliased address, i.e., the final destination
- any mail other than @mandriva.org is ignored by the script
The current script is here: generate-aliases.py
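The rules above can be illustrated with a short sketch. This is not the actual generate-aliases.py: `virtual_alias_lines` is a hypothetical function, the entries are assumed to be already fetched from LDAP as dictionaries of lists, and the handling of several forwarding addresses (joined with commas) is an assumption.

```python
def virtual_alias_lines(entries, domain="mandriva.org"):
    """Yield postfix virtual alias lines built from user entries.

    Each entry is a dict with optional "mail" and "mailForwardingAddress"
    keys holding lists of strings (LDAP attributes are multi-valued).
    """
    suffix = "@" + domain
    for entry in entries:
        targets = entry.get("mailForwardingAddress", [])
        if not targets:
            continue  # nothing to forward to, so no alias
        for mail in entry.get("mail", []):
            # Any mail value outside the contributor domain is ignored.
            if mail.lower().endswith(suffix):
                # Assumption: several forwarding addresses become one
                # comma-separated right-hand side.
                yield f"{mail} {', '.join(targets)}"


if __name__ == "__main__":
    jsmith = {
        "mail": ["jsmith@mandriva.org", "jsmith@mandriva.com"],
        "mailForwardingAddress": ["jsmith26@gmail.com"],
    }
    for line in virtual_alias_lines([jsmith]):
        print(line)
```

Run on the jsmith entry from the example, it prints the single line shown above and skips the @mandriva.com address.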
svnperms.conf groups in LDAP
The svnperms.conf file has fine-grained ACLs for write access to different paths of a repository. One of its sections is the [group] one, which defines a group and its members. This definition was moved to LDAP and shares the posix groups, i.e., svn groups are the same as posix groups and all posix groups are available to be used as svn groups.
So, for example, to add a user to a group mentioned in an svn ACL, just add this user to that posix group.
To add permissions for a new project, a new entry has to be added in the matching repository section (for example [projects]) of svnperms.conf:
myproject/.* = *() @mygroup(add,remove,update)
ssh public keys in LDAP
The SSH public keys are centralized in LDAP, inside each user's entry. We are not yet, however, using openssh with the LPK patch, which means that SSH still looks for these keys in a file on disk.
So there is a cron job which periodically fetches the keys from LDAP and stores them in a local file, outside the user's home directory. This has the added bonus that the user will still be able to log in even if his/her homedir is not available due to NFS problems.
The following scripts are available to deal with SSH keys in LDAP:
- send-sshkey-ldap.py: this script is used to add or replace keys in the user's entry
- ldap-sshkey2file.py: this is the script that runs via cron and stores the keys in /var/lib/pubkeys/loginname/authorized_keys
The cluster-adduser.sh script already adds the ssh public key to the user it is creating in LDAP. In fact, supplying the ssh key is mandatory now.
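The file-writing half of that cron job could look roughly like this. It is a sketch, not the real ldap-sshkey2file.py: `write_authorized_keys` is a hypothetical name, the keys are assumed to be already fetched from LDAP, and the file modes and the write-then-rename step are assumptions.

```python
import os


def write_authorized_keys(login, keys, base_dir="/var/lib/pubkeys"):
    """Write `keys` to <base_dir>/<login>/authorized_keys and return the path."""
    if not login or "/" in login or login in (".", ".."):
        raise ValueError(f"unsafe login name: {login!r}")
    user_dir = os.path.join(base_dir, login)
    os.makedirs(user_dir, mode=0o755, exist_ok=True)
    path = os.path.join(user_dir, "authorized_keys")
    tmp_path = path + ".tmp"
    # Write to a temporary file first, then rename it into place, so a
    # reader never sees a half-written file.
    with open(tmp_path, "w", encoding="utf-8") as handle:
        for key in keys:
            handle.write(key.strip() + "\n")
    os.chmod(tmp_path, 0o644)  # assumed mode; public keys are not secret
    os.replace(tmp_path, path)
    return path
```

Because the file lives outside the home directory, it stays readable even when the NFS home is unavailable, which is the benefit described above.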
There are ACLs protecting access to these keys:
- the user can update his/her keys at will using LDAP commands or the send-sshkey-ldap.py script as long as he/she has a password in LDAP;
- admins can update keys;
- keys can only be read by the user or admins in authenticated sessions. So, an anonymous search of the LDAP tree won't display the SSH keys. This may need to be changed if we start to use the LPK patch.
Granting svn commit rights to a user who already has ssh access
There is more than one SVN repository available. The example below is for the packages one:
- check that the user is part of at least the svn group (run id <user>)
- add the user to the group that already has the needed rights
- if some ACL change is needed (usually it's not), do the following:
  - check out svn+ssh://svn.mandriva.com/svn/config/svn/packages/conf
  - edit the svnperms.conf file if needed, and commit. Note that groups are defined in LDAP.
  - update the checkout on svn.mandriva.com with "cd /svn/packages/conf; svn up" (a checkout-on-commit feature would be nice, but there are some privilege problems at the moment, and no clean way to handle them)
Groups used in ACLs in the svnperms.conf file are defined in LDAP as regular posixGroup/groupOfNames. So, if all that is needed is to add a user to a group, do it in LDAP.
For example, to add user jsmith to the drakx group, this would do it if you are an admin:
    $ ldapmodify -x -ZZ -D uid=<your-login>,ou=people,dc=mandriva,dc=com -W -h svn.mandriva.com
    Enter LDAP Password: secret
    dn: cn=drakx,ou=Group,dc=mandriva,dc=com
    changetype: modify
    add: member
    member: uid=jsmith,ou=People,dc=mandriva,dc=com
    modifying entry "cn=drakx,ou=Group,dc=mandriva,dc=com"
    ^D
The change is effective immediately for all repositories.
Adding a buildhost to the upload system ACL, based on architecture
To declare another buildhost to the upload system, edit the file /etc/youri/hosts.conf, i.e. the copy under /var/lib/config on kenobi. The format is simple:
host-regexp arch-regexp
The check comes from Youri::Upload::Check::Host (/usr/local/lib/perl/Youri/Upload/Check/Host.pm)
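To make the file format concrete, here is a sketch of how such "host-regexp arch-regexp" lines could be evaluated. It is not a port of Youri::Upload::Check::Host: the function names are hypothetical, and whole-string matching, '#' comments and the sample rules in the usage part are all assumptions. Perl and Python regular expressions also differ in some details.

```python
import re


def parse_hosts_conf(text):
    """Parse 'host-regexp arch-regexp' lines into compiled pattern pairs."""
    rules = []
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # assumption: '#' starts a comment
        if not line:
            continue
        parts = line.split()
        if len(parts) != 2:
            raise ValueError(f"expected two fields, got: {line!r}")
        rules.append((re.compile(parts[0]), re.compile(parts[1])))
    return rules


def host_allowed(rules, host, arch):
    """True if some rule matches both the build host and the architecture."""
    return any(h.fullmatch(host) and a.fullmatch(arch) for h, a in rules)


if __name__ == "__main__":
    # Hypothetical rules, only to show the shape of the file.
    sample = "n[1-5]\\.mandriva\\.com i586|noarch\nseggie\\.mandriva\\.com x86_64\n"
    rules = parse_hosts_conf(sample)
    print(host_allowed(rules, "n3.mandriva.com", "i586"))
    print(host_allowed(rules, "n3.mandriva.com", "x86_64"))
```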
Adding full privileges to someone on bugzilla
Ask vdanen@mandriva.com
Access to the real system outside of chroot
To allow recovery from serious problems, each cluster node runs the cooker system in a chroot. The real system can be accessed on port 12, like this:
ssh n5 -p 12
You can also mount the real partition (/dev/hda5) and use chroot to go outside of the first chroot.
Cleaning an iurt process that does not respond
If a build has been running for too long (i.e. more than one day) and is doing nothing, iurt should be killed. Please take a look at the log file first (shown by ps aux) in case another method can be tried. rpm/urpmi locking problems are a known cause, and they do require a kill.
RPM build failing in weird ways
Builds with bm work, but with rpm don't. Check if nscd is running. If not, start it with:
sudo /sbin/service nscd restart (allowed for everybody in the packagers group)
This is a bug in nss_ldap (http://bugzilla.padl.com/show_bug.cgi?id=273) that can be worked around by using nscd.
This was fixed with nss_ldap-257, which should be installed in all cluster nodes by now.
System doesn't show changes in accounts/groups
We use nscd on cluster machines, which is a cache of user and group information. So when one changes some group membership, for example, it can take some time to show up for tools like getent and id. To speed it up, just invalidate the cache. For example, to invalidate the group cache, run:
sudo nscd -i group
To invalidate the passwd cache, the command is:
sudo nscd -i passwd
Problem with autofs?
In case autofs is not working, here is a quick summary of how things are set up: autofs is the one from 2006.0, because we run a 2006.0 kernel outside of the cooker chroot (there are various incompatibilities with the cooker version at the moment).
To reinstall, just run:
rpm -Uvh /mnt/BIG/dis/community/2006.0/i586/media/main/autofs-4.1.4-4.2.20060mdk.i586.rpm
Autofs is run from inside the cooker chroot. The config files are /etc/auto.home and /etc/auto.master, managed by cfengine.
Kenobi uses autofs5, because it has a newer kernel. The config files are stored in /etc/autofs instead of /etc directly.
Upload is stopped on kenobi
Sometimes, a mail is sent to signal a problem on kenobi or ken:
    Subject: [Maintainers] kenobi.mandriva.com filesystem is full
    Only 4878812 bytes available. Stopping upload and mirroring processes.
On kenobi, this mail is sent by the script /etc/cron.hourly/stop_if_full, which runs /home/mandrake/bin/stop_if_full. The script checks the available disk space on /export/home/ and /mnt/BIG/ and stops crond when the filesystem is full.
If this happens, it usually means that something is taking too much space. Among the usual suspects is /mnt/BIG/dis/uploads/failure/cooker/{contrib,main}/release/, which can fill up pretty quickly. Using find and rm to remove the old logs is the usual way to clean it.
Once this is done, crond must be started again.
service crond start
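Before removing anything during the cleanup step above, it can help to list what would be deleted. The sketch below does the "find" half in Python; `old_files` is a hypothetical helper and the 30-day threshold in the usage part is only an example, not a documented policy.

```python
import os
import time


def old_files(root, max_age_days, now=None):
    """Return sorted paths of files under `root` last modified more than
    `max_age_days` days ago."""
    now = time.time() if now is None else now
    cutoff = now - max_age_days * 86400
    found = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                if os.lstat(path).st_mtime < cutoff:
                    found.append(path)
            except FileNotFoundError:
                continue  # file vanished while we were scanning
    return sorted(found)


if __name__ == "__main__":
    # Only prints candidates; review the list before deleting anything.
    for path in old_files("/mnt/BIG/dis/uploads/failure/cooker/main/release", 30):
        print(path)
```

Printing first and deleting in a second step keeps the operation easy to review on a machine the rest of the build system depends on.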
(Re)move a package
The package repository reference machine is ken. If you have the right access to it, whatever is done there in terms of package moves, removals, etc. is reflected in the rest of the world. For example, to move a package from 2007 main/backports to 2007 contrib/backports one could do this:
for m in /mnt/BIG/dis/2007.0/{SRPMS,*/media}; do mv "$m"/main/backports/*warzone* "$m"/contrib/backports; done
Checking the logs of the build system
Since the build system mainly uses cron, you can read the job logs by running "mutt" on kenobi as the user mandrake.
Cluster is broken, what should be checked?
- First, check disk space on every node.
- Check if the time is correct. Beware, ntpdate will refuse to sync if the gap between kenobi and the cluster is too wide.
I modified a group/user but the changes don't show up!
All cluster nodes use nscd, which means that the posix data from LDAP is cached. To see the real data, stop nscd and check again. If you are in a hurry, you can remove the cache in /var/db/nscd/* and restart the daemon.
rpmlint configuration tips for kenobi
Kenobi runs rpmlint on each submitted package.
- RPM groups: /etc/rpmlint/config, managed via cfengine. So, edit /var/lib/config/etc/rpmlint/config instead
- extra checks are loaded from /usr/local/bin/rpmlint/
Todo
- script to create user accounts. Mostly done:
  - support rfc2307bis groups
  - support cn, sn, givenName and email domain
  - better support for being called in a script (i.e., needs command line parameters for some stuff)
  - support for uid/gid pool instead of enumerating all users/groups in order to find out what number to use
  - possibly not depend on nss_ldap configured
  - create home dir on node and svn.mandriva.com
  - upload ssh key to hosts (home host + svn.mandriva.com (and patch it with command in the svn host))
  - subscribe to maintainers list
  - use employeeType and manager attributes. For example, for the apprentice role we could have:
    - employeeType: apprentice
    - manager: the DN of the user who is tutoring this apprentice (for example, uid=peroyvind,ou=People,dc=mandriva,dc=com)
  - give bugzilla permissions (I think this could be done automatically by bugzilla via an email regexp)
- change OpenLDAP ACLs to allow the manager of a user to write to some of his/her attributes (still need to define which ones). For example, fix email, reset password, add photo, etc.
- autofs: test autofs with these maps, come up with a configuration
- fix emails: contributor vs employee (.org vs .com) (based on repsys.conf)
- fix aliases: need a script to dump forwarding email from ldap into aliases format and rsync to postfix server
- replication: setup slave/consumer, decide on which machine(s)
- nested groups: patch nss_ldap to disable nested group support (patch done, but not applied by default, it's not "upstream quality")
- svnperms: patch svnperms.py to support groups in LDAP instead of svnperms.conf
- script to remove and/or disable users
- to help admins not familiar with LDAP, script to:
  - manage group membership (add users to groups, remove users from groups)

