Article number: 301017

Sitekit Search options

The following is a list of search options that can be integrated into the Sitekit CMS. The Google descriptions were accurate when written, but Google may change its products without notice.

Google co-op http://www.google.co.uk/cse/
This is a free customised search engine that can be added to your site. It can be branded in a rudimentary way by adding a header and footer that match your own site. It shows ads on the right-hand side of the result set. It displays all content reachable by Google’s external spiders. Generally the search results don’t differ much from what you’d get with a standard Google site-specific search via ‘site:www.simsl.co.uk’. Results are ordered by relevance using the ever-changing Google algorithm. It searches inside PDFs but cannot search behind password-protected areas.

Google site search http://www.google.com/work/search/products/gss.html
Same as above, but with the ads removed. Price is based on the number of documents crawled.

Google Search Appliance http://www.google.com/work/search/products/
This is a rack-mounted box that can be operated at the Sitekit end, or you can purchase and configure one yourself. More output customisation is possible than with the options above, plus the ability to emphasise crawling of specific areas. Depending on the authentication method used, it can provide sub-searches of protected areas. Sitekit have a rack-mounted 'mini' in our shared hosting environment, so configuration is fast. Price is based on setup plus the number of documents crawled.

Google Search for Work http://www.google.com/work/search/
Completely configurable but expensive. Not yet tested by Sitekit. You can promote the pages you want to be found for specific keywords by ensuring that their metadata or title is seeded with the vocabulary you want. Take care with this approach so as not to overpopulate the description. Current industry thinking is that 120 characters is the most that search engines will pay attention to in a description; anything longer could be ignored.

Sitekit search
Included as a reminder, for comparison with the options listed above. Sitekit Search results are ordered by last-changed date, descending, and grouped by source, with editorial first, then news, then events, directory, etc. There is no searching within PDFs unless the PDF content has been copied to the correct field in the admin system. Searches can be localised to a specific branch but will still return downloads from all areas.
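The ordering described above can be sketched in a few lines of Python. This is only an illustration, not Sitekit's actual code: the result dictionaries, the field names `source` and `last_changed`, and the function `order_results` are all hypothetical.

```python
from datetime import date

# Assumed group order, taken from the description above; unknown sources go last.
SOURCE_ORDER = ["editorial", "news", "events", "directory"]


def order_results(results):
    """Group results by source, then sort each group by last-changed date, newest first."""
    def sort_key(result):
        source = result["source"]
        group = SOURCE_ORDER.index(source) if source in SOURCE_ORDER else len(SOURCE_ORDER)
        # Negating the ordinal day number gives a descending date order within a group.
        return (group, -result["last_changed"].toordinal())

    return sorted(results, key=sort_key)


# Hypothetical sample data for illustration only.
sample = [
    {"title": "Old editorial", "source": "editorial", "last_changed": date(2014, 1, 5)},
    {"title": "Recent news", "source": "news", "last_changed": date(2015, 3, 1)},
    {"title": "New editorial", "source": "editorial", "last_changed": date(2015, 2, 1)},
]
ordered_titles = [r["title"] for r in order_results(sample)]
```

Note that a recently changed news item still appears below every editorial result, because the grouping by source takes priority over the date.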

Sitekit index search (NN) - updated for 10.4
Search scope is governed by asset class, so it's possible to configure searches that are limited to specific branches or specific downloads folders. The appearance of results is governed by the relevant CSS.

Results are weighted so that the result at the top is the one that contains the most terms, closest to the start of the indexed text. The weighting is essentially the sum of the positions of the first instance of each term within the indexed text, so a lower total ranks higher. An absent term is given a weight equal to the length of the indexed text to penalise it.
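As a rough illustration of that calculation, here is a minimal Python sketch. It is not the Sitekit implementation: the function names `weight` and `rank` are hypothetical, and the case-insensitive, plain substring matching is an assumption, since the article does not say how terms are matched.

```python
def weight(indexed_text, terms):
    """Sum the position of the first occurrence of each term; lower is better.

    Assumption: matching is case-insensitive and substring-based.
    """
    text = indexed_text.lower()
    total = 0
    for term in terms:
        position = text.find(term.lower())
        # An absent term is penalised with the full length of the indexed text.
        total += position if position >= 0 else len(text)
    return total


def rank(documents, query):
    """Return document names ordered by ascending weight (best match first)."""
    terms = query.split()
    return sorted(documents, key=lambda name: weight(documents[name], terms))


# Hypothetical indexed texts for illustration only.
documents = {
    "late": "an introduction that mentions search options",
    "early": "search options explained",
}
ranking = rank(documents, "search options")
```

One consequence of this scheme is that the penalty for a missing term scales with the length of the indexed text, so a very short page that lacks a term is penalised less than a long one.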

The indexed text is made up of a concatenation of the following:

  1. Editorial Title
  2. Meta description
  3. Meta keywords
  4. Content text
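The concatenation order can be sketched as follows. This is an assumption-laden illustration rather than Sitekit's code: the `Page` class, its field names, the function `build_indexed_text` and the single-space separator are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Page:
    # Hypothetical field names standing in for the four indexed sources.
    title: str
    meta_description: str
    meta_keywords: str
    content_text: str


def build_indexed_text(page):
    """Join the four fields in the documented order, skipping empty ones."""
    parts = [page.title, page.meta_description, page.meta_keywords, page.content_text]
    # Assumption: fields are separated by a single space.
    return " ".join(part.strip() for part in parts if part and part.strip())


# Hypothetical sample page for illustration only.
page = Page(
    title="Contact us",
    meta_description="How to reach the team",
    meta_keywords="contact, phone",
    content_text="Call the office.",
)
indexed = build_indexed_text(page)
```

Because the title comes first, a term that appears in the title has an earlier position than the same term in the content text, which favours it under a position-based weighting.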

The relevant indexes are populated by uploading new content, by saving changes to existing content, or by triggering a re-index via the admin interface. Some dynamically generated pages cannot yet be included in indexes. These include event details, FAQs, directory details and normal news details, though pages as news are included. Further details on search indexing are here.

 
