Overview

When you add a new domain to SwiftQuery, we’ll ask you to choose a URL discovery method. We support two methods of URL discovery:
  • Sitemap detection
  • Bulk upload

Sitemap detection

When you choose Sitemap detection, we automatically scan your domain for a sitemap.xml file and parse it to build the crawl list. We identify every URL in the sitemap (including any associated sitemap files, if you have more than one) and add them to the crawl list. We re-discover your sitemap on each crawl to pick up changes to your website’s content. The New Domain wizard displays the discovered URLs for you to review.
If you need to exclude URLs from crawling, untick Start crawling immediately in Step 4. You can then exclude URLs from the Domains page.
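
To make the discovery step concrete, here is a minimal Python sketch of how a sitemap, including a sitemap index that points to further sitemap files, can be flattened into a single URL list. It is an illustration only, not SwiftQuery's implementation: the function name discover_urls, the fetch callback, and the example.com URLs are all assumptions, and the in-memory FAKE_SITE dictionary stands in for real HTTP requests.

```python
import xml.etree.ElementTree as ET
from typing import Callable, List, Optional, Set

# XML namespace defined by the sitemaps.org protocol.
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
NS = {"sm": SITEMAP_NS}


def discover_urls(sitemap_url: str, fetch: Callable[[str], str],
                  seen: Optional[Set[str]] = None) -> List[str]:
    """Return page URLs from a sitemap, following sitemap index entries.

    `fetch` is a hypothetical callback that returns the XML text for a URL.
    """
    seen = set() if seen is None else seen
    if sitemap_url in seen:  # guard against sitemap files that reference each other
        return []
    seen.add(sitemap_url)

    root = ET.fromstring(fetch(sitemap_url))
    urls: List[str] = []
    if root.tag.endswith("sitemapindex"):
        # A sitemap index lists further sitemap files; recurse into each one.
        for loc in root.findall("sm:sitemap/sm:loc", NS):
            child = (loc.text or "").strip()
            if child:
                urls.extend(discover_urls(child, fetch, seen))
    else:
        # A regular sitemap (urlset) lists page URLs directly.
        for loc in root.findall("sm:url/sm:loc", NS):
            page = (loc.text or "").strip()
            if page:
                urls.append(page)
    return urls


# In-memory stand-in for a website with one index and two sitemap files.
FAKE_SITE = {
    "https://example.com/sitemap.xml": (
        f'<sitemapindex xmlns="{SITEMAP_NS}">'
        "<sitemap><loc>https://example.com/sitemap-docs.xml</loc></sitemap>"
        "<sitemap><loc>https://example.com/sitemap-blog.xml</loc></sitemap>"
        "</sitemapindex>"
    ),
    "https://example.com/sitemap-docs.xml": (
        f'<urlset xmlns="{SITEMAP_NS}">'
        "<url><loc>https://example.com/docs/intro</loc></url>"
        "</urlset>"
    ),
    "https://example.com/sitemap-blog.xml": (
        f'<urlset xmlns="{SITEMAP_NS}">'
        "<url><loc>https://example.com/blog/post-1</loc></url>"
        "</urlset>"
    ),
}

pages = discover_urls("https://example.com/sitemap.xml", FAKE_SITE.__getitem__)
print(pages)
```

Running discovery again on a later crawl with updated sitemap files would return the updated list, which mirrors why sitemap-based discovery tracks content changes.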

Bulk upload

This method lets you add URLs manually by pasting them in, one per line. Subsequent crawls do not modify the URL list, so if your site’s content changes, the change will not be reflected until you manually update the list. To change the URLs after they have been added, click the Bulk Import button again (on the Domain Settings page). We strongly advise you to use the Sitemap detection method where possible.
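
If you assemble the URL list with a script before pasting it in, a small clean-up pass can help. The Python sketch below shows one way to turn a block of text into a one-per-line list. The function name parse_bulk_urls is hypothetical, and the choices to skip blank lines, drop duplicates, and reject anything that is not an absolute http(s) URL are assumptions for illustration, not documented SwiftQuery behaviour.

```python
from typing import List
from urllib.parse import urlsplit


def parse_bulk_urls(pasted: str) -> List[str]:
    """Split pasted text into a de-duplicated list of absolute http(s) URLs."""
    urls: List[str] = []
    seen = set()
    for line in pasted.splitlines():
        candidate = line.strip()
        if not candidate:
            continue  # ignore blank lines
        parts = urlsplit(candidate)
        if parts.scheme not in ("http", "https") or not parts.netloc:
            raise ValueError(f"Not an absolute http(s) URL: {candidate!r}")
        if candidate not in seen:  # keep the first occurrence, preserve order
            seen.add(candidate)
            urls.append(candidate)
    return urls


pasted_text = """
https://example.com/a

  https://example.com/b
https://example.com/a
"""
cleaned = parse_bulk_urls(pasted_text)
print("\n".join(cleaned))
```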

Excluding URLs

You may wish to exclude specific URLs from being crawled so that their content is not used in answers to your users. Once you’ve added a domain, you can browse the list of discovered URLs in the dashboard and exclude URLs by selecting them and clicking “Exclude Selected”. Exclusions persist across crawls.

Exclusion patterns

You can exclude multiple URLs at once using exclusion patterns. To configure them, click Manage URL Exclusions in the Discovered URLs section of the Domain Settings page. From there you can add an exclusion pattern. We support the following pattern types:
  • Exact Match: URL must exactly match the pattern
  • URL Prefix: URL must start with the pattern
  • Contains: URL must contain the pattern anywhere
  • Regex: URL must match the regular expression
Exclusion patterns also persist across crawls.
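
The four pattern types can be sketched in Python as follows, which may help when testing a pattern before adding it. This is an illustration under stated assumptions, not SwiftQuery's matching code: the ExclusionPattern class, the kind identifiers, and the example URLs are hypothetical, and the sketch assumes a Regex pattern may match anywhere in the URL (re.search), which the documentation above does not specify.

```python
import re
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class ExclusionPattern:
    kind: str     # "exact", "prefix", "contains", or "regex" (hypothetical identifiers)
    pattern: str

    def matches(self, url: str) -> bool:
        if self.kind == "exact":
            return url == self.pattern
        if self.kind == "prefix":
            return url.startswith(self.pattern)
        if self.kind == "contains":
            return self.pattern in url
        if self.kind == "regex":
            # Assumption: unanchored search; the real service may anchor the match.
            return re.search(self.pattern, url) is not None
        raise ValueError(f"Unknown pattern kind: {self.kind!r}")


def apply_exclusions(urls: Iterable[str],
                     patterns: List[ExclusionPattern]) -> List[str]:
    """Return the URLs that no exclusion pattern matches."""
    return [u for u in urls if not any(p.matches(u) for p in patterns)]


discovered = [
    "https://example.com/",
    "https://example.com/blog/post-1",
    "https://example.com/admin/login",
    "https://example.com/docs/v1/intro",
    "https://example.com/docs/v2/intro",
    "https://example.com/pricing?ref=ad",
]
patterns = [
    ExclusionPattern("exact", "https://example.com/pricing?ref=ad"),
    ExclusionPattern("prefix", "https://example.com/admin/"),
    ExclusionPattern("contains", "/blog/"),
    ExclusionPattern("regex", r"/docs/v[0-1]/"),
]
crawlable = apply_exclusions(discovered, patterns)
print(crawlable)
```

Because the patterns are stored separately from the URL list, the same filter can be reapplied to whatever list a later crawl discovers, which is one way to picture exclusions persisting across crawls.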