Useful little bit of code – List all domains on cPanel server

Need to list all domains on the server (no subdomains, etc)?

Here’s the magical code:

awk -F ':' '{ print $1 }' /etc/userdomains | sed 's/.*\.\(.*\.\)/\1/' | sort -u
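To see what the sed expression actually does, here is a small sketch that wraps the same pipeline in a function and feeds it a few "domain: user" lines. The function name list_base_domains and the sample data are made up for illustration. Note that the expression keeps only the last two labels of each name, so something like example.co.uk would be reduced to co.uk.

```shell
#!/bin/bash

# Hypothetical helper: reads "domain: user" lines on stdin and prints
# the unique base domains (last two labels only).
list_base_domains() {
    awk -F ':' '{ print $1 }' | sed 's/.*\.\(.*\.\)/\1/' | sort -u
}

# Sample input, made up for this example.
# Prints example.com and example.org, one per line.
printf '%s\n' \
    'example.com: alice' \
    'blog.example.com: alice' \
    'shop.example.org: bob' | list_base_domains
```

On a real server the function would be fed from the file itself: list_base_domains < /etc/userdomains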

Citrix XenServer + VLAN Trunking setup (Cisco switch)

I have been involved in several recent Citrix virtualization setups at our company and I have not seen any clear tutorials on how to work with VLAN trunking and Citrix XenServer. The advantage of trunking is that you can use VLANs to separate your network and still do live migrations without losing connectivity. This will be split into a couple of parts; this part focuses on setting up the trunking on the switch itself. Trunking gives the XenServer host a presence on every VLAN, so when you migrate a virtual machine, its network connection does not drop because the VLAN on the target server is different. Hope that’s clear.

First of all, telnet to your switch. These instructions are for Cisco switches running IOS, which is all I’ve ever worked with anyway, heh.

Next, pick the port that you’ll be working on, and make sure you have physical access to the server you’re going to be working with. I’m not responsible for some settings going poof and you having to drive down to your data center to fix the issue. :P

The interface that I will be working with is FastEthernet1/8. Enter configuration mode and go to the port:

conf t
int FastEthernet1/8

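Set 802.1Q encapsulation and enable trunking (the encapsulation has to be set first, otherwise switches that also support ISL reject the trunk mode command)

switchport trunk encapsulation dot1q
switchport mode trunk
switchport trunk native vlan 1000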

Make sure the native VLAN is the same VLAN that the network was running on previously. Here is a good explanation of why:

To establish 802.1q trunking both sides must be in the same native VLAN this is because the encapsulation is not setup yet and the 2 switches must talk over an un-encapsulated link (native VLAN) to setup the encapsulation in the first place. Why this works is because neither side is encapsulating packets with its VLAN tag since they are both talking over their native VLAN, basicly neither side knows that the other side is in a different VLAN to begin with and they are just sending unencapsulated packets back and forth. So if you set a port on the core switch as native vlan 5 for example and connected a dumb switch to it vlan 5 traffic would go un encapsulated to the dumb switch and it can understand it but it will put it in its VLAN 1 ports there is no actual trunking going on. No 802.1q or ISL!
juniperr @ DevShed

cPanel humor: Uninstalling cPanel

It’s always been known that once you install cPanel, you cannot uninstall it; the only way to get rid of it is to reinstall the server completely. However, I was doing some work on a server and ran into this file:

/usr/local/cpanel/uninstall

And the content:
#!/bin/sh
# Uninstall script for Cpanel
rm -f "/usr/local/cpanel/uninstall"
#### END OF UNINSTALL
echo "Cpanel has been uninstalled."

Nice to know that developers still have a sense of humor. cPanel’s developers have always been known as down-to-earth guys and I can assure you that’s true! (As if that little bit of code in a production program didn’t prove it! :P )

HyperVM login error: not_in_list_of_allowed_ip

We had a client who was not able to log in to his HyperVM control panel to make changes to his virtual private server. The error he was getting was the following:

Alert: not_in_list_of_allowed_ip [xx.xx.xx.xx]

The client’s IP appears in place of “xx.xx.xx.xx”. This is easily fixed by clearing the block list with the command below, run on the main node. Replace user.vm with the client’s HyperVM username (most of the time, something.vm).

/script/clearallowedblockedip --class=client --name=user.vm

It should return something like the following:

AllowedIp Sucessfully cleared for client:user.vm

Afterwards, the client (or you) should be able to log in with no problem at all.

Horde broken or not working correctly on cPanel server

Simple, short & sweet post. This should fix most Horde problems you’re having:

/usr/local/cpanel/bin/update-horde --force

Also, this might be useful to run hourly. I leave this on our servers “just in case”.

(mysqlcheck --auto-repair eximstats ; mysqlcheck --auto-repair horde) >/dev/null 2>&1

/tmp clean-up script modification, sessions dying with PHP

It seems there was a little flaw in the /tmp clean-up script that I wrote a while ago: if you run it as an hourly cronjob, any PHP session on the server will time out/die within an hour. I have made a small modification to the script.

The only modification is that it now deletes only the sess_* files that have not been accessed for 5 days, which are probably just sitting there and never going to be used again. The rest are still deleted right away because they are failed uploads, etc. that will never be used again.


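#!/bin/bash
# Uses GNU find. Globs are matched against the whole file name, so only
# files that really start with the given prefix are removed.

# Clean up trash left by Gallery2
find /tmp -maxdepth 1 -type f -name '*[0-9].inc*' -print -delete

# Clean up PHP temp. session files not accessed for more than 5 days
find /tmp -type f -name 'sess_*' -atime +5 -print -delete

# Clean up dead vBulletin uploads
find /tmp -maxdepth 1 -type f -name 'vbupload*' -print -delete

# Clean up failed php uploads
find /tmp -maxdepth 1 -type f -name 'php*' -print -delete

# Clean up failed ImageMagick conversions.
find /tmp -maxdepth 1 -type f -name 'magick*' -print -delete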
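Before putting the script into cron, it may be worth previewing which session files the 5-day rule would remove. The following is a minimal sketch, assuming GNU find and GNU touch; the function name list_stale_sessions is made up for illustration, and the demonstration runs in a scratch directory so nothing in /tmp is touched.

```shell
#!/bin/bash

# Hypothetical helper: list (without deleting) the PHP session files in a
# directory that have not been accessed for more than the given number of days.
list_stale_sessions() {
    local dir="$1" days="$2"
    find "$dir" -maxdepth 1 -type f -name 'sess_*' -atime "+$days" -print
}

# Demonstration in a scratch directory: one fresh session file and one whose
# access time is pushed 10 days into the past. Only the stale one is listed.
scratch="$(mktemp -d)"
touch "$scratch/sess_fresh"
touch -a -d '10 days ago' "$scratch/sess_stale"
list_stale_sessions "$scratch" 5
rm -rf "$scratch"
```

On the server, list_stale_sessions /tmp 5 shows the candidates without removing anything.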

Thanks!

Cleaning up /tmp directory on busy cPanel web hosting servers

Usually, the /tmp directory is one of the most frequently accessed directories. Temp files of MySQL, PHP and other applications are placed there and often left over when processes die. PHP uploads are always written to /tmp until they are complete, so if you have some dying processes, you will end up with a full /tmp directory, which is hell.

Why? Because MySQL keeps its temporary files in /tmp, and if there is no space in there, queries will start failing. Uploads from PHP or Perl are placed in there until the upload is over, and they cannot be written when there is no space left. So far, we have failing MySQL and an inability to complete PHP uploads: system administrator hell.

Easy fix, you might say? Just a simple rm -rf /tmp/* should take care of it? Nope. Try that and have fun trying to fix the sockets you deleted, specifically for applications that depend on the mysql.sock placed in your /tmp directory; things just got worse. In case you actually did delete everything, just restart the services and the sockets should re-appear. If they don’t, they are probably located somewhere else and you have to create a symbolic link using ln; MySQL’s socket is usually located at /var/lib/mysql/mysql.sock.

The best way is to either have a script that cleans it up hourly, if you know what usually fills it up, or to manually run ls -alhS /tmp | head and look at what’s causing the problem and how to avoid it in the future. I have developed a script that I run on multiple servers with no problems at the moment. It takes care of most of the trash left behind on a cPanel server:
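As a sketch of the symbolic-link step: the helper below assumes the real socket lives at /var/lib/mysql/mysql.sock and that applications look for /tmp/mysql.sock, which is worth checking against your my.cnf first. The function name relink_socket is made up for illustration.

```shell
#!/bin/bash

# Hypothetical helper: make "link" point at the real socket, but only if the
# real socket exists and no working socket or link is already in place.
relink_socket() {
    local real="$1" link="$2"
    if [ ! -e "$real" ]; then
        echo "missing: $real (is the service running?)" >&2
        return 1
    fi
    if [ -e "$link" ]; then
        return 0                  # something usable is already there
    fi
    ln -sfn "$real" "$link"       # also replaces a dangling symlink
}

# Typical call on the server (both paths are assumptions):
# relink_socket /var/lib/mysql/mysql.sock /tmp/mysql.sock
```

Refusing to link when the real socket is missing avoids leaving a dangling link behind while the service is still down.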

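#!/bin/bash
# Uses GNU find. Globs are matched against the whole file name, so only
# files that really start with the given prefix are removed.

# Clean up trash left by Gallery2
find /tmp -maxdepth 1 -type f -name '*[0-9].inc*' -print -delete

# Clean up PHP temp. session files (all of them, including live ones)
find /tmp -maxdepth 1 -type f -name 'sess_*' -print -delete

# Clean up dead vBulletin uploads
find /tmp -maxdepth 1 -type f -name 'vbupload*' -print -delete

# Clean up failed php uploads
find /tmp -maxdepth 1 -type f -name 'php*' -print -delete

# Clean up failed ImageMagick conversions.
find /tmp -maxdepth 1 -type f -name 'magick*' -print -delete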

That is usually enough. My suggestion is to have it run as a cronjob every hour, but I’m not going into detail on how to do that, because if you don’t know how to set up a cronjob, perhaps you shouldn’t be messing around in /tmp directories and deleting stuff in the first place!

Update: This script is faulty and will cause you a lot of problems with PHP sessions. Please read the follow-up post, “/tmp clean-up script modification, sessions dying with PHP”, for more information and the new version.

AACRAID based controllers timing out / aborting / SCSI hang

Lately we’ve been starting to use more Adaptec RAID controllers rather than 3ware RAID controllers. 3ware has been nothing but trouble for us: dropped hard drives, and even RAID5 arrays running slower than a regular hard drive with no RAID. Our latest issue was a server simply having a kernel panic under high I/O. Our experience with 3ware RAID controllers and Linux has been terrible.

On the other side, Adaptec has been great. We’ve been using them for a while now and have seen no problems at all. However, there is a small catch: Linux usually has a SCSI subsystem timeout of 30 seconds, which is shorter than the controller timeout of 35 seconds. This mismatch usually brings a server to a halt for a couple of seconds (minutes in some cases) until the server recovers, and errors like this are thrown in the console:

aacraid: Host adapter abort request (0,1,3,0)
aacraid: Host adapter abort request (0,1,1,0)
aacraid: Host adapter abort request (0,1,2,0)
aacraid: Host adapter abort request (0,1,1,0)
aacraid: Host adapter abort request (0,1,2,0)
aacraid: Host adapter reset request. SCSI hang ?

The method that usually works best is to raise the Linux timeout above the controller’s 35 seconds (45 here) so that the Linux timeout does not occur before the RAID controller timeout. This is done per device / array.

echo '45' > /sys/block/sda/device/timeout
echo '45' > /sys/block/sdb/device/timeout
echo '45' > /sys/block/sdc/device/timeout
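Rather than typing one line per disk, the same write can be looped over every sd* device. This is a minimal sketch; the function name set_scsi_timeouts is made up for illustration, and it takes the sysfs block directory as an argument so it can be tried against a scratch directory first. Values written to sysfs do not survive a reboot, so something like this would also have to run at boot (for example from rc.local) to stick.

```shell
#!/bin/bash

# Hypothetical helper: write the same timeout (in seconds) to every sd*
# device found under a sysfs-style block directory.
set_scsi_timeouts() {
    local block_dir="$1" seconds="$2" f
    for f in "$block_dir"/sd*/device/timeout; do
        [ -e "$f" ] || continue     # the glob matched nothing
        echo "$seconds" > "$f"
    done
}

# On the server, as root:
# set_scsi_timeouts /sys/block 45
```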

This should be done for every device. 45 is a good number, but you can use whatever you’d like as long as it’s over 35. If you’re experiencing loads going sky-high for no apparent reason, this might very well be the cause. To check, you can run the following:

dmesg | grep aacraid

If you see errors like the ones above, I suggest using that small workaround. If you’re still facing these problems even after the workaround, here is the checklist that Adaptec suggests:

  • Check for any updated firmware for the motherboard, controller, targets and enclosure on the respective manufacturer’s web sites.
  • Check per-device queue depth in SYSFS to make sure it is reasonable.
  • Engage disk drive manufacturer’s technical support department to check through compatibility or drive class issues.
  • Engage enclosure manufacturer’s technical support department to check through compatibility issues.
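For the queue depth item in the checklist, the current values can be read from each device’s queue_depth attribute in sysfs. A minimal sketch follows; the function name show_queue_depths is made up for illustration, and what counts as a "reasonable" depth depends on your controller and drives, so this only prints the numbers.

```shell
#!/bin/bash

# Hypothetical helper: print "<device> <queue depth>" for every sd* device
# found under a sysfs-style block directory.
show_queue_depths() {
    local block_dir="$1" f dev
    for f in "$block_dir"/sd*/device/queue_depth; do
        [ -e "$f" ] || continue       # the glob matched nothing
        dev="${f#"$block_dir"/}"      # e.g. sda/device/queue_depth
        dev="${dev%%/*}"              # e.g. sda
        echo "$dev $(cat "$f")"
    done
}

# On the server:
# show_queue_depths /sys/block
```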

Anyhow, just like with every Linux issue, your mileage may vary, so if you know of any other fixes or have figured out another way to fix this, feel free to post it as a comment to help others.

Migrating LVM volumes over network (using snapshots)

We run a big share of Xen virtual servers spread over multiple servers, and if you want to use the full capability of Xen, I would suggest LVM (Logical Volume Manager). It makes life a lot easier, especially for those who do not run a RAID setup (we run RAID10 on all VM nodes), as you can span a volume over multiple hard drives. I’m not going to cover setting up LVM as there are loads of tutorials on how to do that; instead I will cover the best way to migrate an LVM volume.

First, we need to create a snapshot of the LVM volume, as we cannot create a consistent image of the live version. We run the following line:
lvcreate -L20G -s -n storageLV_s /dev/vGroup/storageLV
The 20G part is the size of the snapshot LV. I would suggest looking up the size of the original LV and making it the same; you can find the size with this command: lvdisplay /dev/vGroup/storageLV. There will be an “LV Size” field; take the value from there and put it in the command. The -n switch is for the name; I usually give it the same name as the LV with a trailing _s for snapshot. The last argument is simply the real LV that we want to make a snapshot of.

Afterwards, we will use dd in a different way. If you use dd in a single process, it is either reading or writing at any given moment, which makes it crawl. To get around this, we read the LV with one dd and pipe it to a second one that writes, so the overall speed is limited only by the slower of the two drives. To speed it up a bit more, we use a block size of 64K.
dd if=/dev/vGroup/storageLV_s conv=noerror,sync bs=64k | dd of=/migrate/storageLV_s.dd bs=64k

I won’t cover the file transfer process as there are multiple methods. If you want to use SCP, I would suggest using the lightest cipher you can, as encryption really slows it down. Our nodes usually have httpd installed, so I simply changed the configuration to listen on a different port (for security) and changed the DocumentRoot to /migrate.

Once you’ve got your file on the target server, you’ll need to re-create the LV there by running this:
lvcreate -L20G -n storageLV vGroup
You’ll have to keep the same size and use the same name (this time without a trailing _s, as it won’t be a snapshot), with the volume group at the end.

The last step is to actually restore the image using dd, again using our block-size & pipe tweak for better performance.
dd if=/migrate/storageLV_s.dd conv=noerror,sync bs=64k | dd of=/dev/vGroup/storageLV bs=64k
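Since dd will happily write a truncated or damaged image, it may be worth comparing checksums before removing the snapshot on the source node. The following is a minimal sketch, assuming md5sum is available; the function name same_contents is made up for illustration. It only reports a match when the target LV is exactly the same size as the image, which is the case if you re-created it with the same -L value.

```shell
#!/bin/bash

# Hypothetical helper: succeed only if both paths have identical contents,
# judged by their MD5 checksums. Works on regular files and block devices.
same_contents() {
    local a b
    a="$(md5sum < "$1" | awk '{ print $1 }')"
    b="$(md5sum < "$2" | awk '{ print $1 }')"
    # An unreadable path leaves its checksum empty, which counts as a mismatch.
    [ -n "$a" ] && [ -n "$b" ] && [ "$a" = "$b" ]
}

# Example on the target server (paths from the steps above):
# same_contents /migrate/storageLV_s.dd /dev/vGroup/storageLV && echo "image restored intact"
```

The same check can be run on the source node against the snapshot and the image file before the transfer.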

I have migrated around 16 LVs with this method without any problems: 13 of them were 20G each, 2 were 40G and 1 was 75G. Every step is fast, though I have to admit that the slowest part was the file transfer. I would suggest using a Gbit crossover cable or, even better, a Gbit switch if you have one. If you don’t but you’re right next to the server, you might consider using a spare USB 2.0 HDD, as it is much faster than 100 Mbps Ethernet (USB 2.0 is around 480 Mbps).