JavaTechie

It’s all about Technology

How to filter spam with Spamassassin and Postfix in Debian June 30, 2008

Filed under: Mail — javatechie @ 8:54 am

Install Spamassassin in Debian

#apt-get install spamassassin spamc

The spamassassin package can also be integrated into a Mail Transfer Agent such as Postfix.

Preparation

By default, Spamassassin installed from the Debian repository is configured to run as root, and the daemon is not started automatically. To avoid running it as root, we are going to create a dedicated user and group for spamassassin.

#groupadd -g 5001 spamd

#useradd -u 5001 -g spamd -s /usr/sbin/nologin -d /var/lib/spamassassin spamd

#mkdir /var/lib/spamassassin

#chown spamd:spamd /var/lib/spamassassin

We need to change some settings in /etc/default/spamassassin. Make sure it contains the following values:

ENABLED=1
SAHOME="/var/lib/spamassassin/"
OPTIONS="--create-prefs --max-children 5 --username spamd --helper-home-dir ${SAHOME} -s ${SAHOME}spamd.log"
PIDFILE="${SAHOME}spamd.pid"

With these settings the spamd daemon runs as user spamd, uses its own home directory (/var/lib/spamassassin/) and writes its log to /var/lib/spamassassin/spamd.log

spamassassin Configuration

Next we need to give spamassassin some rules. The default settings are quite fine, but you might want to tweak them a bit. So let’s edit /etc/spamassassin/local.cf

#vi /etc/spamassassin/local.cf

Make the file look like this:

rewrite_header Subject [***** SPAM _SCORE_ *****]
required_score 2.0
# To be able to use _SCORE_ we need report_safe set to 0.
# With report_safe 0, incoming spam is only modified by adding some "X-Spam-" headers; the body is left unchanged.
report_safe 0

# Enable the Bayes system
use_bayes 1
use_bayes_rules 1
# Enable Bayes auto-learning
bayes_auto_learn 1

# Enable or disable network checks
skip_rbl_checks 0
use_razor2 0
use_dcc 0
use_pyzor 0

This sets spamd’s defaults to rewrite the email subject to [***** SPAM _SCORE_ *****], where _SCORE_ is the score spamassassin assigns to the email after running its tests, but only if that score is greater than or equal to 2.0. Email with a score lower than 2.0 keeps its original subject.

SpamAssassin’s documentation advises using _SCORE_ in the rewrite_header directive only when report_safe is 0, so we set report_safe to 0.
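To make the threshold behaviour concrete, here is a minimal Python sketch of the tagging decision described above. It only illustrates the logic and is not SpamAssassin code: the function name tag_subject is hypothetical, and formatting the score with one decimal place is an assumption.

```python
REQUIRED_SCORE = 2.0                      # mirrors required_score in local.cf
TEMPLATE = "[***** SPAM _SCORE_ *****]"   # mirrors rewrite_header Subject


def tag_subject(subject: str, score: float, required: float = REQUIRED_SCORE) -> str:
    """Return the subject unchanged for ham, or prefixed with the spam tag."""
    if score < required:
        return subject  # below the threshold: subject is left alone
    # Assumption: the score is rendered with one decimal place.
    tag = TEMPLATE.replace("_SCORE_", f"{score:.1f}")
    return f"{tag} {subject}"


print(tag_subject("Cheap watches", 7.3))
print(tag_subject("Meeting notes", 0.4))
```

A message scoring exactly 2.0 is tagged as well, because the comparison is "greater than or equal to".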

In the next section, we tell spamassassin to use bayes classifier and to improve itself by auto-learning from the messages it will analyse.

In the last section, we leave the DNS blocklist checks enabled (skip_rbl_checks 0) and disable collaborative networks such as pyzor, razor2 and dcc. Those collaborative networks keep an up-to-date catalogue of checksums of mail known to be spam. They might be interesting to use, but I’m not going to use them here, as I found that spamassassin already takes long enough to deal with spam using only its own rules.

Start spamassassin (or restart it if it is already running) using the following command

#/etc/init.d/spamassassin restart

Configuring Postfix to call Spamassassin

Spamassassin will be invoked only after Postfix has received and queued the email.

To tell postfix to use spamassassin, we are going to edit /etc/postfix/master.cf

#vi /etc/postfix/master.cf

Change the following line

smtp inet n - - - - smtpd

to

smtp inet n - - - - smtpd
  -o content_filter=spamassassin

and then add the following lines at the end of master.cf (the continuation lines must start with whitespace)

spamassassin unix - n n - - pipe
  user=spamd argv=/usr/bin/spamc -f -e
  /usr/sbin/sendmail -oi -f ${sender} ${recipient}

Save and exit the file

That’s it, our spam filter is set up. We just need to reload the Postfix settings and everything should be ready.

#/etc/init.d/postfix reload
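Once mail is flowing, one way to check that the filter is active is to look at the X-Spam- headers of a delivered message. The following Python sketch does this with the standard email module. The sample message, the addresses and the rule name EXAMPLE_RULE are invented for illustration, and the code assumes the usual "X-Spam-Status: Yes, score=... required=..." layout.

```python
import re
from email import message_from_string

# Invented sample message; real headers carry more fields.
SAMPLE = """\
From: sender@example.com
To: user@example.org
Subject: [***** SPAM 7.3 *****] Cheap watches
X-Spam-Status: Yes, score=7.3 required=2.0 tests=EXAMPLE_RULE autolearn=no

Body text.
"""


def spam_verdict(raw: str):
    """Return (is_spam, score, required), or None if the header is missing."""
    msg = message_from_string(raw)
    status = msg.get("X-Spam-Status")
    if status is None:
        return None  # the message did not pass through the filter
    m = re.match(r"\s*(Yes|No),\s*score=(-?[\d.]+)\s+required=(-?[\d.]+)", status)
    if m is None:
        return None  # unexpected header layout
    return (m.group(1) == "Yes", float(m.group(2)), float(m.group(3)))


print(spam_verdict(SAMPLE))
```

A message without any X-Spam-Status header is a hint that Postfix is not handing mail to the content filter.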

 

URL REWRITE MOD_REWRITE IN PHP June 14, 2008

Filed under: PHP — javatechie @ 6:47 am

What is mod_rewrite?

Mod_rewrite is an Apache module which allows you to “rewrite” the URLs of your web pages.

If your server supports this technology (most Linux web hosts do nowadays) you are able to rewrite virtually any URL into anything you like. Most often it is used to rewrite the URLs of dynamically generated web pages such as www.mywebsite.com/index.php?par1=1&par2=2&par3=2… This can easily be ‘translated’ into www.mywebsite.com/par1/par2/par3

Why mod_rewrite?

- Search engine optimization: there is a lot of debate on this topic, but many webmasters still find that static-looking links rank better than dynamic ones.

Here is a confirmation from Google on that topic:

“Your pages are dynamically generated. We’re able to index dynamically generated pages. However, because our web crawler could overwhelm and crash sites that serve dynamic content, we limit the number of dynamic pages we index. In addition, our crawlers may suspect that a URL with many dynamic parameters might be the same page as another URL with different parameters. For that reason, we recommend using fewer parameters if possible. Typically, URLs with 1-2 parameters are more easily crawlable than those with many parameters.”

- User-friendliness: some users remember URLs visually. Even if they bookmark, they can more easily recognize a link like www.mywebsite.com/services.html than www.mywebsite.com/index.php?task=12, for example.

- Security: mod_rewrite helps you hide the parameters passed to the application. Basically your dynamic pages should be secure enough even without mod_rewrite, since hiding the parameters only obscures them and is no substitute for validating input. Still, it makes the structure of the application less obvious to an attacker.

How to use it?

Mod_rewrite is really powerful if you are familiar with the regular expressions it uses.

But learning the whole pattern syntax can be quite complicated, especially for the non-technical user. That’s why I’ll teach you several simple patterns which are enough to get your website URLs rewritten.

Let’s start:

First you need to create a file called .htaccess and place it in the folder where you want the rewriting to take effect (it will also take effect in all subfolders). In case you already have a .htaccess file you can simply add the lines to it (if it already contains mod_rewrite directives, take care not to break them).

Open it in a simple text editor and start with:

Options +FollowSymLinks
RewriteEngine on

Now the rewrite engine is switched on. You can now start adding as many rewrite rules as you want. The format is simple:

RewriteRule rewrite_from rewrite_to

Here “RewriteRule” is static text, i.e. you should not change it. “rewrite_from” is a pattern matched against the address typed in the browser, and “rewrite_to” is the page the server will actually serve. Both of these can contain “masks”, but in “rewrite_to” we will only use $ references, so we will mostly discuss the “rewrite_from” part. Let me introduce the few masks you’ll need and show you some samples. You’ll see how easy it is.

Let’s stop talking theory and see an example. Let’s imagine your server runs an e-shop, which uses URLs like index.php?task=categories to list the categories, index.php?task=category&id=5 to show the contents of a category, and other values of ‘task’ to do other things.

RewriteRule ^(.*)\.html$ index.php?task=$1

What does all that mean? This is a rewrite rule which makes your URLs look “static”. In this example categories.html will be “translated” to index.php?task=categories.

So you no longer need a dynamic URL to list the categories, but can write categories.html

But what do all these strange characters mean?

- ^ marks the beginning of the requested path, i.e. you tell the server that it should not expect anything before it.

- (.*) is the most often used combination and it means literally “everything”: the dot matches any single character and the asterisk repeats it any number of times. Whatever appears before “.html” (i.e. your fake file name) is captured by the parentheses and passed on as $1.

- \. is a literal dot. Without the backslash the dot would match any character, so the rule would also fire for a name such as notes_html.

- $ at the end of the pattern marks the end of the path, so nothing may follow “.html”.

- $1 in the target says where the text captured by the first mask should be put. If you have more than one mask (masks are the parenthesised parts which stand for dynamic text or file names) you can use $2, $3 etc. You’ll see more in the following examples.

So categories.html will be translated into index.php?task=categories, services.html into index.php?task=services, etc.
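If you want to try a pattern before putting it on the server, the matching step can be imitated in a few lines of Python. This is only a sketch: apply_rule is a hypothetical helper, not part of Apache, and it covers just the pattern match and the $1 substitution, not flags or conditions. It uses the stricter pattern with an escaped dot and a closing $.

```python
import re


def apply_rule(pattern: str, substitution: str, path: str):
    """Imitate one RewriteRule: return the rewritten URL, or None if no match."""
    match = re.search(pattern, path)  # unanchored unless the pattern uses ^ or $
    if match is None:
        return None
    # Replace $1, $2, ... with the text captured by the corresponding group.
    return re.sub(r"\$(\d)", lambda m: match.group(int(m.group(1))), substitution)


RULE = (r"^(.*)\.html$", "index.php?task=$1")

print(apply_rule(*RULE, "categories.html"))
print(apply_rule(*RULE, "logo.png"))

# An unescaped dot matches any character, so this looser pattern also
# fires for a name that merely ends in "html".
print(apply_rule(r"^(.*).html", "index.php?task=$1", "notes_html"))
```

The last call shows why escaping the dot matters: the looser pattern rewrites notes_html, while the stricter one leaves it alone.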

What if you have more than one parameter? First, you should use some character as a delimiter:

RewriteRule ^(.*)-(.*)\.html$ index.php?task=$1&language=$2

This is how you can pass both task and language. For example: categories-english.html will be translated into index.php?task=categories&language=english.

IMPORTANT: If you write RewriteRule ^(.*)\.html$ index.php?task=$1 first, the second rule will never be reached, because the general rule also matches categories-english.html and rewrites it to index.php?task=categories-english. Always order your rules from the most specific to the most general.
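The effect of rule order can be checked with a small Python sketch. It is a simplification: first_match is a hypothetical helper that stops at the first matching rule, whereas Apache keeps evaluating later rules unless told otherwise. For these two rules the outcome is the same, because the rewritten index.php no longer ends in .html.

```python
import re


def first_match(rules, path):
    """Try the rules in file order and return the first rewrite that matches."""
    for pattern, substitution in rules:
        match = re.search(pattern, path)
        if match:
            return re.sub(r"\$(\d)", lambda m: match.group(int(m.group(1))), substitution)
    return None


GENERAL = (r"^(.*)\.html$", "index.php?task=$1")
SPECIFIC = (r"^(.*)-(.*)\.html$", "index.php?task=$1&language=$2")

# General rule first: the language is swallowed into the task name.
print(first_match([GENERAL, SPECIFIC], "categories-english.html"))

# Specific rule first: task and language are separated as intended.
print(first_match([SPECIFIC, GENERAL], "categories-english.html"))
```

With the specific rule first, a plain services.html still falls through to the general rule, so nothing is lost by this ordering.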

Make it Better:

The rule (.*) is too general and may often prevent you from writing more complicated rewriting rules. So it is recommended that you “limit” the rules to something more concrete. Here are a couple of tips:

- Use the “OR” operator. In our e-shop example we have only a few possible “tasks” passed to index.php. Let’s say:

index.php?task=categories
index.php?task=category
index.php?task=product
index.php?task=services

What will happen if you want to use your static file about.html? It will be rewritten into index.php?task=about and won’t work. So you can use the OR operator and limit the rewriting only to the cases you need:

RewriteRule ^(categories|category|product|services)\.html$ index.php?task=$1

This tells the server to rewrite only if the file name is categories.html OR category.html OR product.html OR services.html

- Using “numbers”. You can easily limit a rule so that it rewrites only if it finds digits at a certain place:

RewriteRule ^category-([0-9]+)\.html$ index.php?task=category&id=$1

With the ([0-9]+) mask you tell the rewrite engine that it should expect only digits (at least one) at that place. So if it sees category-english.html it won’t rewrite to index.php?task=category&id=english, but to index.php?task=category&language=english (because of the rule we have shown above: RewriteRule ^(.*)-(.*)\.html$ index.php?task=$1&language=$2). For this to work the numeric rule has to come before the more general language rule, otherwise category-5.html would be caught by the language rule first.

Complete example: Here is how the final .htaccess file for our imaginary e-shop will look:

--------
Options +FollowSymLinks
RewriteEngine on

# Most specific rule first, most general last
RewriteRule ^category-([0-9]+)\.html$ index.php?task=category&id=$1
RewriteRule ^(.*)-(.*)\.html$ index.php?task=$1&language=$2
RewriteRule ^(categories|category|product|services)\.html$ index.php?task=$1
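As a quick sanity check of such a rule set, the three patterns can be run against a few sample URLs in Python. This is a hedged sketch, not Apache itself: the rewrite helper is hypothetical, it stops at the first match, and the rules are listed with the numeric rule first, escaped dots and a closing $.

```python
import re

# Most specific pattern first, most general last.
RULES = [
    (r"^category-([0-9]+)\.html$", "index.php?task=category&id=$1"),
    (r"^(.*)-(.*)\.html$", "index.php?task=$1&language=$2"),
    (r"^(categories|category|product|services)\.html$", "index.php?task=$1"),
]


def rewrite(path: str) -> str:
    """Return the rewritten URL, or the path itself if no rule matches."""
    for pattern, substitution in RULES:
        match = re.search(pattern, path)
        if match:
            return re.sub(r"\$(\d)", lambda m: match.group(int(m.group(1))), substitution)
    return path  # no rule matched: the static file is served as-is


for url in ["category-5.html", "category-english.html", "services.html", "about.html"]:
    print(url, "->", rewrite(url))
```

Note that about.html is left alone, which is exactly what the “OR” rule was introduced for.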

 

HttpSessionListener in Servlets June 12, 2008

Filed under: Java — javatechie @ 5:16 am
public interface HttpSessionListener
extends java.util.EventListener

Implementations of this interface are notified of changes to the list of active sessions in a web application. To receive notification events, the implementation class must be configured in the deployment descriptor for the web application.

sessionCreated

public void sessionCreated(HttpSessionEvent se)

Notification that a session was created.
Parameters:
se – the notification event

sessionDestroyed

public void sessionDestroyed(HttpSessionEvent se)

Notification that a session is about to be invalidated.
Parameters:
se – the notification event

 

Cookie in Java June 12, 2008

Filed under: Java — javatechie @ 5:12 am
public class Cookie
extends java.lang.Object
implements java.lang.Cloneable

Creates a cookie, a small amount of information sent by a servlet to a Web browser, saved by the browser, and later sent back to the server. A cookie’s value can uniquely identify a client, so cookies are commonly used for session management.

A cookie has a name, a single value, and optional attributes such as a comment, path and domain qualifiers, a maximum age, and a version number. Some Web browsers have bugs in how they handle the optional attributes, so use them sparingly to improve the interoperability of your servlets.

The servlet sends cookies to the browser by using the HttpServletResponse.addCookie(javax.servlet.http.Cookie) method, which adds fields to HTTP response headers to send cookies to the browser, one at a time. The browser is expected to support 20 cookies for each Web server, 300 cookies total, and may limit cookie size to 4 KB each.

The browser returns cookies to the servlet by adding fields to HTTP request headers. Cookies can be retrieved from a request by using the HttpServletRequest.getCookies() method. Several cookies might have the same name but different path attributes.
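To see what these header fields look like on the wire, here is a small sketch that uses Python’s standard http.cookies module instead of the servlet API. The cookie names and values (sessionid, theme) are made up for illustration; in a servlet the same exchange would go through addCookie() and getCookies().

```python
from http.cookies import SimpleCookie

# Server side: build the Set-Cookie response header for one cookie.
outgoing = SimpleCookie()
outgoing["sessionid"] = "abc123"          # hypothetical name and value
outgoing["sessionid"]["path"] = "/shop"   # optional path attribute
outgoing["sessionid"]["max-age"] = 3600   # optional maximum age in seconds
print(outgoing.output())

# Later request: the browser sends its cookies back in a Cookie header.
incoming = SimpleCookie()
incoming.load("sessionid=abc123; theme=dark")
print({name: morsel.value for name, morsel in incoming.items()})
```

The first print shows the response header line; the second shows the name and value pairs recovered from the request.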

public Cookie(java.lang.String name,
              java.lang.String value)

Constructs a cookie with a specified name and value.

The name must conform to RFC 2109. That means it can contain only ASCII alphanumeric characters and cannot contain commas, semicolons, or white space or begin with a $ character. The cookie’s name cannot be changed after creation.

 

Transparent PNGs in Internet Explorer 6 June 4, 2008

Filed under: CSS — javatechie @ 3:52 am

Transparent PNGs in Internet Explorer 6 by Drew McLellan

Newer breeds of browser such as Firefox and Safari have offered support for PNG images with full alpha channel transparency for a few years. With the use of hacks, support has been available in Internet Explorer 5.5 and 6, but the hacks are non-ideal and have been tricky to use. With IE7 winning masses of users from earlier versions over the last year, full PNG alpha-channel transparency is becoming more of a reality for day-to-day use.

However, there are still numbers of IE6 users out there who we can’t leave out in the cold this Christmas, so in this article I’m going to look at what we can do to support IE6 users whilst taking full advantage of transparency for the majority of a site’s visitors.

So what’s alpha channel transparency?

Cast your minds back to the Ghost of Christmas Past, the humble GIF. Images in GIF format offer transparency, but that transparency is either on or off for any given pixel. Each pixel’s either fully transparent, or a solid colour. In GIF, transparency is effectively just a special colour you can choose for a pixel.

The PNG format tackles the problem rather differently. As well as having any colour you choose, each pixel also carries a separate channel of information detailing how transparent it is. This alpha channel enables a pixel to be fully transparent, fully opaque, or critically, any step in between.

This enables designers to produce images that can have, for example, soft edges without any of the ‘halo effect’ traditionally associated with GIF transparency. If you’ve ever worked on a site that has different colour schemes and therefore requires multiple versions of each graphic against a different colour, you’ll immediately see the benefit.

What’s perhaps more interesting than that, however, is the extra creative freedom this gives designers in creating beautiful sites that can remain web-like in their ability to adjust, scale and reflow.

The Internet Explorer problem

Up until IE7, there has been no fully native support for PNG alpha channel transparency in Internet Explorer. However, since IE5.5 there has been some support in the form of a proprietary filter called the AlphaImageLoader. Internet Explorer filters can be applied directly in your CSS (for both inline and background images), or by setting the same CSS property with JavaScript.

CSS:

img {
  filter: progid:DXImageTransform.Microsoft.AlphaImageLoader(...);
}

JavaScript:

img.style.filter = "progid:DXImageTransform.Microsoft.AlphaImageLoader(...)";

That may sound like a problem solved, but all is not as it may appear. Firstly, as you may realise, there’s no CSS property called filter in the W3C CSS spec. It’s a proprietary extension added by Microsoft that could potentially cause other browsers to reject your entire CSS rule.

Secondly, AlphaImageLoader does not magically add full PNG transparency support so that a PNG in the page will just start working. Instead, when applied to an element in the page, it draws a new rendering surface in the same space that element occupies and loads a PNG into it. If that sounds weird, it’s because that’s precisely what it is. However, by and large the result is that PNGs with an alpha channel can be accommodated.

The pitfalls

So, whilst support for PNG transparency in IE5.5 and 6 is possible, it’s not without its problems.

Background images cannot be positioned or repeated

The AlphaImageLoader does work for background images, but only for the simplest of cases. If your design requires the image to be tiled (background-repeat) or positioned (background-position) you’re out of luck. The AlphaImageLoader allows you to set a sizingMethod to either crop the image (if necessary) or to scale it to fit. Not massively useful, but something at least.

Delayed loading and resource use

The AlphaImageLoader can be quite slow to load, and appears to consume more resources than a standard image when applied. Typically, you’d need to add thousands of GIFs or JPEGs to a page before you saw any noticeable impact on the browser, but with the AlphaImageLoader filter applied Internet Explorer can become sluggish after just a handful of alpha channel PNGs.

The other noticeable effect is that as more instances of the AlphaImageLoader are applied, the longer it takes to render the PNGs with their transparency. The user sees the PNG load in its original non-supported state (with black or grey areas where transparency should be) before one by one the filter kicks in and makes them properly transparent.

Both the issue of sluggish behaviour and delayed load only really manifest themselves with volume and size of image. Use just a couple of instances and it’s fine, but be careful adding more than five or six. As ever, test, test, test.

Links become unclickable, forms unfocusable

This is a big one. There’s a bug/weirdness with AlphaImageLoader that sometimes prevents interaction with links and forms when a PNG background image is used. This is sometimes reported as a z-index issue, but I don’t believe it is. Rather, it’s an artefact of that weird way the filter gets applied to the document almost outside of the normal render process.

Often this can be solved by giving the links or form elements hasLayout using position: relative; where possible. However, this doesn’t always work and the non-interaction problem cannot always be solved. You may find yourself having to go back to the drawing board.

Sidestepping the danger zones

Frankly, it’s pretty bad news if you design a site, have that design signed off by your client, build it and then find out only at the end (because you don’t know what might trigger a problem) that your search field can’t be focused in IE6. That’s an absolute nightmare, and whilst it’s not likely to happen, it’s possible that it might. It’s happened to me. So what can you do?

The best approach I’ve found to this scenario is

  1. Isolate the PNG or PNGs that are causing the problem. Step through the PNGs in your page, commenting them out one by one and retesting. Typically it’ll be the nearest PNG to the problem, so try there first. Keep going until you can click your links or focus your form fields.
  2. This is where you really need luck on your side, because you’re going to have to fake it. This will depend on the design of the site, but some way or other create a replacement GIF or JPEG image that will give you an acceptable result. Then use conditional comments to serve that image to only users of IE older than version 7.

A hack, you say? Well, you started it chum.

Applying AlphaImageLoader

Because the filter property is invalid CSS, the safest pragmatic approach is to apply it selectively with JavaScript for only Internet Explorer versions 5.5 and 6. This helps ensure that by default you’re serving standard CSS to browsers that support both the CSS and PNG standards correctly, and then selectively patching up only the browsers that need it.

Several years ago, Aaron Boodman wrote and released a script called sleight for doing just that. However, sleight dealt only with images in the page, and not background images applied with CSS. Building on top of Aaron’s work, I hacked sleight and came up with bgsleight for applying the filter to background images instead. That was in 2003, and over the years I’ve made a couple of improvements here and there to keep it ticking over and to resolve conflicts between sleight and bgsleight when used together. However, with alpha channel PNGs becoming much more widespread, it’s time for a new version.

Introducing SuperSleight

SuperSleight adds a number of new and useful features that have come from the day-to-day needs of working with PNGs.

  • Works with both inline and background images, replacing the need for both sleight and bgsleight
  • Will automatically apply position: relative to links and form fields if they don’t already have position set. (Can be disabled.)
  • Can be run on the entire document, or just a selected part where you know the PNGs are. This is better for performance.
  • Detects background images set to no-repeat and sets the sizingMethod to crop rather than scale.
  • Can be re-applied by any other JavaScript in the page – useful if new content has been loaded by an Ajax request.

Download SuperSleight

Implementation

Getting SuperSleight running on a page is quite straightforward: you just need to link the supplied JavaScript file (or the minified version if you prefer) into your document inside conditional comments so that it is delivered to only Internet Explorer 6 or older.

<!--[if lte IE 6]>
<script type="text/javascript" src="supersleight-min.js"></script>
<![endif]-->

Supplied with the JavaScript is a simple transparent GIF file. The script replaces the existing PNG with this before re-layering the PNG over the top using AlphaImageLoader. You can change the name or path of the image in the top of the JavaScript file, where you’ll also find the option to turn off the adding of position: relative to links and fields if you don’t want that.

The script is kicked off with a call to supersleight.init() at the bottom. The scope of the script can be limited to just one part of the page by passing an ID of an element to supersleight.limitTo(). And that’s all there is to it.