Unix and Linux Help, Resources and information for Unix/Linux, Mac OS X. Articles on blogging, web site mechanics, and self employment. Mostly techy, Unix/Linux related, but we don't really try to stay tightly focused. If you've never been here before, there's a lot to explore.
Kerio is renaming its mailserver to "Kerio Connect". My initial reaction is that changing "Mailserver" to "Connect" is a bad idea. They feel that because the product is so much more than a mailserver, changing the name is appropriate.
I do partly agree. Many of my customers only use the email features and ignore all of the collaboration aspects. Sometimes that's because they see no need for scheduling, etc. and sometimes it's because they just never noticed those capabilities (someone only using POP or IMAP, for example).
Well, regardless of how I feel about it, the next release will be Kerio Connect. It's in beta now and looks solid enough that its release is likely not very far away, so let's take a look at what it offers.
A few releases back Kerio introduced scheduled administrative deletion of Junk and Deleted Items messages. With Kerio Connect 7, this has been extended to include all folders (except Contacts, for obvious reasons). You can set a policy that automatically deletes all messages domain-wide after so many years.
I don't see this as particularly useful for most companies. First, it's only domain-wide - at least right now, this hasn't been brought to individual users as the control of Junk and Deleted is now. Some users may need to keep some email (or particularly specific email folders) forever. I think this needs more fine-grained control to be useful.
One of the first features I noticed is distributed domains. From the manual:
If your company uses more Kerio Connect servers physically scattered (located in different cities, countries, continents), you can now add them to a cluster and move all users across all servers involved into a single email domain (distributed domain).
Note that this isn't load balancing. One server is the master point where all incoming email arrives; it is responsible for relaying any that belong at a satellite server. The "slave" servers should have the master set as their relay also if you want single-point archiving and backup.
The "Master/Slave" designation is arbitrary. All servers are really peer to peer and use the same directory service. You determine which mailboxes belong on which server and which server is the master. Obviously the master needs to be the server that is set as the MX for the domain.
You'll find the Message Submission service in Services, set to run on port 587. Kerio suggests using this to get around the problem of outgoing port 25 being blocked at hotels and public access points. The user sets his outgoing SMTP port to 587 and the Kerio server listens on that. As this service requires authentication, it can't be used by spammers - unless they've hacked the user's account, of course, but at least we do then absolutely know the source of the spam!
The Message Submission service was originally defined in RFC 2476 (since superseded by RFC 4409 and later RFC 6409) and has much more to do with mail architecture than just bypassing blocked ports. The FAQ "SMTP Message Submission to Proposed Standard" describes the reasoning behind the RFC.
The release notes say:
Kerio Connect uses a more efficient file access method to the message store data. This includes the properties.fld database access and listing mailbox folders.
That doesn't tell us much, does it?
The "properties.fld" file is apparently IMAP annotation data. It's interesting to look at what these metadata files are:
But that still doesn't tell us how any of this helps speed up file access.
Kerio has a problem with very large mailbox folders. They store individual mail messages rather than packing them all into one database as Exchange does. I think that's the right approach, but it can cause performance issues. These files are designed to help that by providing a mix: database files pointing to individual messages. Apparently this release has made some changes in this area; we'll see if it helps very large folders.
I'd much prefer to see domain or user controlled folder archiving. I'm not referring to the archiving found in Archiving and Backup but rather moving older messages into another folder at regular intervals. For example, if this were set for monthly archiving, everything in your Inbox from last month would be moved to Inbox-2009-09. But that's not a feature of this release and may never be.
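To make the naming idea concrete, here is a minimal shell sketch of how a monthly archive folder name like "Inbox-2009-09" could be derived. The `archive_folder` helper is purely hypothetical - nothing like it exists in Kerio Connect - and it only computes a name; it doesn't move any mail.

```shell
# archive_folder FOLDER YEAR MONTH
# Prints the name of the folder that would receive last month's mail,
# given the current year and month. Hypothetical helper, illustration only.
archive_folder() (
    year=$2
    month=${3#0}            # strip a leading zero so 08/09 are not read as octal
    month=$((month - 1))
    if [ "$month" -eq 0 ]; then
        month=12
        year=$((year - 1))
    fi
    printf '%s-%04d-%02d\n' "$1" "$year" "$month"
)

archive_folder Inbox 2009 10    # prints Inbox-2009-09
archive_folder Inbox 2010 01    # prints Inbox-2009-12
```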
With Kerio Connect 7, you can now do all administrative functions through your web browser. For example, on the server itself, I can connect to http://localhost/admin.
This actually runs on port 4040 and there's no control over that in Administration. However, it is listed in mailserver.cfg, so you could adjust this as necessary.
Speaking of config files, there's a new cluster.cfg file. I assume that is for the distributed domains mentioned earlier, but there is also an undocumented Cluster section in the mailserver.cfg, so bigger plans may be afoot. That's pure speculation, of course.
The web administration is very useful - it's not that it's at all difficult to download the free administrative console, but having this available from any web browser is handy.
The release notes mention over-the-air synchronization of HTC Hero mobile devices and that the IMAP server has been improved for better support of multi-session IMAP connections. You'll be able to rename a domain also. That seems to be about it.
/Kerio/connect-beta.html copyright and reprint notice
I wrote this up after a forum discussion in which several posters didn't really understand why ">" can free disk space when "rm" cannot. The basic problem is that if another process has a file open (for reading or writing, it doesn't matter), the disk blocks are not freed by an "rm" until the process or processes using the file quit (or stop using the file, at least). That part seems to be well understood.
What is perhaps more difficult to understand is why a simple ">" CAN free up the bytes that "rm" cannot.
For those very new to Unix/Linux: if you are using almost any shell but "csh", a ">" followed by a file name will empty that file. That is, it will be truncated to zero bytes without changing the ownership or permissions. On more recent Linux machines, you may have a "truncate" command that will do the same thing.
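If you want to see that for yourself, here's a minimal sketch using a scratch file from "mktemp" (the file and its contents are made up for the demonstration). It uses ": >" rather than a bare ">", because ":" is the shell's no-op command and that form behaves the same in any Bourne-style shell.

```shell
# Hypothetical scratch file; nothing here touches real data.
f=$(mktemp)
printf 'some log data\n' > "$f"
chmod 640 "$f"

perms_before=$(ls -l "$f" | cut -c1-10)
size_before=$(wc -c < "$f" | tr -d ' ')

# Redirecting the (empty) output of the no-op ":" truncates the file
# in place. The inode, owner and mode are left alone.
: > "$f"

perms_after=$(ls -l "$f" | cut -c1-10)
size_after=$(wc -c < "$f" | tr -d ' ')

echo "before: $perms_before $size_before bytes"
echo "after:  $perms_after $size_after bytes"
rm -f "$f"
```

The size drops to zero while the permission string stays the same.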
For those NOT new to Unix or Linux, this article isn't meant for you. Unfortunately, it was linked to from some places frequented by more advanced users. Feel free to read it, of course, but it's not going to tell you anything you do not already know.
To show that, we need to write a little code. I'll use Perl for that, but if you don't grok Perl, don't worry - I'll explain it as we go along. I did this on a Mac, but you'd see the same thing on Linux or BSD.
Let's start with the "rm" issue. Our Perl code will just open a file and loop. We'll run that in one Terminal window and do everything else in another.
The script just opens "t" and then loops. It never reads or writes anything, but it does have "t" open while running.
The file "t" already exists before running this and is large enough to notice its absence in "df". Here it is before the script runs and while it runs:
If we now remove "t", nothing will change:
When we interrupt the script, disk space is reclaimed:
However, if we do the same thing with ">", disk space will be reclaimed instantly:
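The same contrast can be sketched in plain shell without watching "df". This is not the Perl script described above; it's a stand-in that holds a scratch file open on file descriptor 3 (the file names come from "mktemp" and are hypothetical). After "rm", the data is still reachable through the open descriptor, which is why the blocks can't be freed. After a truncate, the open descriptor sees an empty file.

```shell
f=$(mktemp)
printf 'still here\n' > "$f"

# Hold the file open on descriptor 3, as a long-running process would.
exec 3< "$f"

rm "$f"
# The name is gone, but the open descriptor still reaches the data,
# so the kernel cannot free the blocks yet.
if [ -e "$f" ]; then name_exists=yes; else name_exists=no; fi
data_after_rm=$(cat <&3)

exec 3<&-          # closing the descriptor is what finally frees the space

# Same setup again, but truncate instead of removing.
g=$(mktemp)
printf 'still here\n' > "$g"
exec 3< "$g"
: > "$g"
data_after_trunc=$(cat <&3)
size_after_trunc=$(wc -c < "$g" | tr -d ' ')
exec 3<&-
rm -f "$g"

echo "after rm: name exists? $name_exists, data via fd: '$data_after_rm'"
echo "after > : size $size_after_trunc, data via fd: '$data_after_trunc'"
```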
I demonstrated similar code at the forum and quickly got back this comment:
The > trick will simply remove the data in the file and have no need for the os to clear up unused data so this might work unless the process remembers where in the file it is appending too and always does a seek.
Let's see if that's true. We'll need a different script:
This time, the script writes 4096 bytes (4096 "x"'s) on every loop. I'm not going to bother to show the listings and df's; the behavior is exactly the same: the bytes are freed as soon as you do "> t".
But our doubting poster mentioned "seek". For those who do not know, a seek moves the writing or reading position of the file to a specific place. We can do that with Perl:
This doesn't do anything different than the previous script; it just does it another way. Instead of just writing bytes, it specifically positions itself before writing. If nothing else is happening with "t", no different outcome is expected.
What happens when we do "> t" while that puppy is running?
Instant reclaim. But on the next write from the script, the file's size is right back up:
But - and this is the important part - notice that the available disk space did NOT change!
What happens here is that the file goes to zero and available space increases, but then when the writer writes again, it's back to a large size instantly. That's because of the "seek" - the bytes were written at a specific position. But the available space is NOT back to what it was, and "od" shows why:
If you had looked at "t" before the ">", it would have looked like this:
"od" shows repeated bytes by an "*" - since we are writing nothing but "x", there's no need to show more. After the ">", "od" shows "0" in all of the bytes up to the subsequent write by the script. Those nul bytes aren't really there - this is a "sparse" file. It was created by the absolute seeks of the Perl script after "> t" had emptied the file.
If you didn't understand this, I encourage you to play with these scripts on your own system. To avoid confusion, the system should be relatively "quiet" while you do this - I couldn't control that absolutely here so some figures shown by "df" are different than you might expect - that's because I had a few other things going on while doing this. You still should be able to see that ">" really does reclaim space.
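Here's a rough shell sketch of how such a sparse file comes about, assuming a filesystem that supports sparse files (not all do). It stands in for the seeking writer with "dd": the 1 MiB starting size, the 4096-byte write and the scratch file are all arbitrary choices for the demonstration, not figures from the listings above.

```shell
f=$(mktemp)

# Stand-in for the seeking writer: it has already written 1 MiB of "x".
head -c 1048576 /dev/zero | tr '\0' 'x' > "$f"

# Someone truncates the file out from under it.
: > "$f"

# The writer's next write goes to the absolute offset it remembers
# (block 256 of 4096 bytes = 1 MiB). conv=notrunc stops dd from
# truncating the file itself.
head -c 4096 /dev/zero | tr '\0' 'x' |
    dd of="$f" bs=4096 seek=256 conv=notrunc 2>/dev/null

apparent_bytes=$(wc -c < "$f" | tr -d ' ')
disk_kb=$(du -k "$f" | cut -f1)
first_byte=$(head -c 1 "$f" | od -An -tu1 | tr -d ' \n')

echo "ls -l would report: $apparent_bytes bytes"
echo "du reports:         $disk_kb KB actually allocated"
echo "first byte value:   $first_byte"
rm -f "$f"
```

The apparent size is back over 1 MiB and the leading bytes read as nul, but where sparse files are supported "du" should report only a few KB in use.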
The lesson is this: if another process has a file open, use ">" to reclaim the disk space. If the process is doing absolute seeks, you may not be able to tell with "ls -l" that the space has been reclaimed, but it will be.
/Unixart/freeing-disk-space.html copyright and reprint notice
Did I just read a complaint about the article content being "too complicated?"
Also, who's that Tiony guy that's passing himself off as A.P. Lawrence? <Grin>
Fri Nov 20 16:41:03 2009 TonyLawrence
I saw that Tiony guy earlier today in the bathroom when I was shaving. He looks dangerous.
Fri Nov 20 16:52:59 2009 TonyLawrence
But seriously:
The guy complaining thinks I should have used "truncate". I disagree because that's relatively new and is Linux centric. But he's right that I should have fully explained that ">" truncates a file. I didn't think about that while writing this because of the context - I was writing for people who knew what ">" does but didn't think it would free up disk space if the file were in use.
I'm going to add a paragraph up there to correct that oversight. I already removed the gratuitous "}" that snuck itself into one Perl script earlier (which that same reader also noticed).
As to being overly long and using unnecessary examples, I make no apology. The people at that forum needed specific examples.
Fri Nov 20 20:19:33 2009 Open with O_TRUNC frees blocks anonymous
I really doubt that either bash or csh uses ftruncate. Instead, they are probably using the O_TRUNC flag to open, which is part of POSIX and thus quite portable. From man 2 open:
O_TRUNC: If the file already exists and is a regular file and the open mode allows writing (i.e., is O_RDWR or O_WRONLY) it will be truncated to length 0. If the file is a FIFO or terminal device file, the O_TRUNC flag is ignored. Otherwise the effect of O_TRUNC is unspecified.
Fri Nov 20 20:23:14 2009 TonyLawrence
OK.
And your point is what?
Sat Nov 21 17:32:43 2009 anonymous
Some files should not be truncated though, but just removed, as they may still be in use (and I mean something reading from it, like a kernel mount on a loopback-mounted ISO file).
Sat Nov 21 17:46:54 2009 TonyLawrence
Some files should not be truncated though
Yes, good point - if mounted, you definitely would not want to truncate without unmounting.
I hope this book helps put an end to ridiculously priced SEO "courses". It could also help eliminate some of the shadier practitioners of SEO, but I'd be happy if it just helps a few people avoid wasting money on nonsense.
This book isn't nonsense. It's a complete exposition of the state of SEO as it stands today. More than 500 pages cover everything you could ever hope to know about optimizing your website for search engines.
I've said before that it's unnecessary to pay for expensive courses to learn SEO because all of it, every single thing you'd ever need to know, is available on the Internet for free. That's still true, though of course the problem is that you have to find it and learn to discern good advice from bad, outdated concepts from those that reflect current reality. This book pulls everything together in 500 pages or so.
Of course it's not for everyone. It assumes some technical knowledge or at least access to someone with that knowledge. As an example, at several points 301 redirects are used as the solution to certain SEO issues. Although some minor direction is given about implementing these, you'd need more than is provided here if you were a neophyte - someone using Blogger or Wordpress.com isn't going to become proficient with Apache from reading this.
The only small complaint I can make is that sometimes the authors use too many examples, especially for the more basic concepts at the beginning of the book. However, too many is far better than too few, so I won't complain too much. I definitely can't complain that they left anything out: this could serve as a course book for an SEO class.
So, once more: don't waste your money on expensive "courses" from self-styled Internet Gurus. Buy this instead. If you are already reasonably proficient in this area, this book can serve as a reference and refresher. Either way, I strongly recommend it.
Tony Lawrence 2009-05-03 Rating:
Order (or just read more about) The Art of SEO from Amazon.com. Yes, I earn a small referral fee if you use that link to purchase the book.
I have sometimes seen people use a pipeline that includes "sort | uniq". The result of that is no different than just adding a -u flag to sort and absolutely requires more time and processing power - not that it usually matters; unless the input is humongously long, you'd need to run them through "time" to spot any difference. So why use "uniq"?
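A quick sketch to confirm the equivalence, using a small made-up input list:

```shell
# Hypothetical sample input with a few duplicate lines.
input='pear
apple
pear
fig
apple'

with_flag=$(printf '%s\n' "$input" | sort -u)
with_pipe=$(printf '%s\n' "$input" | sort | uniq)

printf '%s\n' "$with_flag"
[ "$with_flag" = "$with_pipe" ] && echo "identical output"
```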
For cases like that, where there is no difference in the output, it's probably just habit - you may be accustomed to using "uniq" for other jobs and just reach for it automatically. I'll argue that it's a good habit to have: if you are in the habit of using "sort -u", you may tend to forget about "uniq" and that could cause you to do something much more difficult and clumsy when a job needs something that "uniq" does well.
However, it's also true that "sort" has tricks that "uniq" lacks, so if you only know about "uniq", you again could make your life more difficult.
One of the helpful abilities that "sort" has is the ability to specify the field separator. Let's take a sample file:
If all we cared about was removing duplicate lines, we could use "sort -u file" or "sort file | uniq". But what if we want to sort by the second field?
We can do that directly with "sort -t: -k 2 -u", but it's much harder to do with "uniq" because you can't tell it a separator character. You can get around that partially with "tr" or "sed", translating ":"'s to spaces or tabs, but that's clumsy. Even after translating, "uniq" only lets you skip fields, so you don't get quite the same output:
We could argue about which output truly represents unique lines when sorted on field 2, but the point to understand is that skipping fields isn't the same as what "sort" does.
You can also lock down fields with "sort":
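Here's a minimal sketch of the difference, using a made-up colon-separated sample (the names and fields are hypothetical). With "-k 2" the key runs from field 2 to the end of the line; with "-k 2,2" the key is field 2 alone, so "-u" collapses every line that shares that one field.

```shell
sample='carol:sales:3
alice:sales:1
bob:ops:2
dave:ops:2'

# Key = field 2 through end of line ("ops:2", "sales:1", "sales:3").
from_field2=$(printf '%s\n' "$sample" | sort -t: -k 2 -u)

# Key = field 2 only ("ops", "sales").
only_field2=$(printf '%s\n' "$sample" | sort -t: -k 2,2 -u)

count_from=$(printf '%s\n' "$from_field2" | wc -l | tr -d ' ')
count_only=$(printf '%s\n' "$only_field2" | wc -l | tr -d ' ')

echo "-k 2   keeps $count_from lines"
echo "-k 2,2 keeps $count_only lines"
```

Which of several lines with equal keys survives "-u" can vary between "sort" implementations, so the sketch counts lines rather than relying on a particular survivor.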
As "uniq" can only skip fields and can't anchor to one field only, it's much harder to get these results. However, "uniq" again has tricks that "sort" can't do: it can skip a specific number of characters in addition to skipping fields. It can also give you only the unique lines or only the lines that were repeated:
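A short sketch of those three abilities, using made-up input that is already sorted (as "uniq" requires): "-u" for lines that occur only once, "-d" for lines that are repeated, and "-s" to skip leading characters before comparing.

```shell
sorted='apple
apple
banana
cherry
cherry
cherry
date'

only_once=$(printf '%s\n' "$sorted" | uniq -u)   # lines that appear exactly once
repeated=$(printf '%s\n' "$sorted" | uniq -d)    # one copy of each repeated line

# -s N skips the first N characters before comparing, here a
# three-character "NN " prefix.
numbered='01 foo
02 foo
03 bar'
skip_prefix=$(printf '%s\n' "$numbered" | uniq -s 3)

echo "only once:";   printf '  %s\n' $only_once
echo "repeated:";    printf '  %s\n' $repeated
echo "skip prefix:"; printf '%s\n' "$skip_prefix"
```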
Either of those is extremely convoluted without "uniq", and the need for one or the other does come up surprisingly often.
Somebody thought that we could use "sort" and "uniq" in one program: Sortu is the result.
The sortu program is a replacement for the sort and uniq programs. It is common for Unix script writers to want to count how many separate patterns are in a file. For example, if you have a list of addresses, you may want to see how many are from each state. So you cut out the state part, sort these, and then pass them through uniq -c. Sortu does all this for you in a fraction of the time.
I think by the time I figured out how to use "sortu" I could have already done the job another way, but you might find it interesting anyway.
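For comparison, the conventional pipeline that description refers to is short enough. This sketch uses a made-up name,state list; the "awk" at the end only tidies the leading spaces that "uniq -c" puts in front of its counts.

```shell
# Hypothetical address list: name,state
addresses='Ann,MA
Bob,NH
Cid,MA
Dee,MA
Eve,NH
Fay,VT'

# Cut out the state, sort so duplicates are adjacent, count each run.
counts=$(printf '%s\n' "$addresses" |
    cut -d, -f2 | sort | uniq -c | awk '{print $2, $1}')

printf '%s\n' "$counts"
```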
I think the important thing is to realize that "sort" and "uniq" have both conflicting and complementary abilities. Don't tie yourself in pipeline knots with either of them; learn to use each of them appropriately and your scripts will be easier.
/Unixart/sort-vs-uniq.html copyright and reprint notice
As for uniq, I know about it but for some reason have never found a good reason to use it. I guess sort and pipes are too entrenched in my mind.
Mon Nov 16 20:04:39 2009 TonyLawrence
But "uniq" can do things sort -u cannot...
Girish Venkatachalam is a UNIX hacker with more than a decade of networking and crypto programming experience. His hobbies include yoga, cycling and cooking, and he runs his own business. Details here:
http://gayatri-hitech.com
http://spam-cheetah.com
It is really easy to create a USB bootable OpenBSD LiveUSB image. With that you can do just about anything you want. Don't believe me?
Then head to http://liveusb-openbsd.sourceforge.net and download the USB image. Boot it and find out!
You can watch videos with mplayer in full screen, you can bask in the glory of mplayer's sexy OSD menu, you can read manual pages in color, you can look up English words using the dictionary client, you can chat with pidgin, you can browse with Mozilla Firefox, you can use sox to convert audio, you can play any video or audio file with mplayer, you can stream audio from the Internet, you can do whatever you want!
Moreover you can also use the rich repertoire of tiny but incredibly powerful tools like netcat, socat, nmh, mutt, vim, randtype and figlet. After all, the man pages tell you how to use these tools and you have examples too. You also have ready access to the perl, python and lua interpreters, you have all the spam control daemons and routing protocols like BGP or OSPF, you have an FTP server and an HTTP server, or you could do image processing with ImageMagick.
There is one detail however.
You have to use DHCP to connect to the Internet if your ADSL modem dishes out dynamic IPs; otherwise you can configure the IP using the ifconfig command. Usually this will do:
# dhclient vr0
(Your ethernet interface could be fxp0, rl0 or something else; find out with ifconfig.)
Give it a whirl and get in touch with me should you have any issues using this. After all it is free and open source.
And oh, by the way, the fixed write-cycle limit of USB memory drives is largely a myth.
/Girish/LiveUSB.html copyright and reprint notice