Synopsis
ScanWeb

ScanWeb is a program for searching WWW pages online for regular expressions. It scans pages up to N links deep from a base URL that you enter, and you set N yourself. You can select a validity period for pages. The program can save the pages that contain the sought expression.

The search is persistent: it can be resumed after the program or the computer has been shut down. You can run several searches concurrently.

The output is an HTML page listing the expressions found.

The program can be localized to other languages, in the same way as Windows Commander.

Definition of a search plan

A search plan (project) holds the most important settings for finding and saving expressions. A project consists of the settings below. An example is shown in Figure 1.

  • Name identifies the search operation. The default name is Find. Names follow the same conventions as file names.
  • Directory is the destination for the found information. The directory must exist before the search starts, otherwise you will get no results. Files in the directory will be deleted and added to without confirmation!
  • Base URL is the starting point from which further pages are reached for scanning.
  • Depth level means: 0 - search the base URL only, 1 - search the base URL and directly linked pages, ...
  • Find string can be a new regular expression, a predefined one, or a combination of both.
  • Continue option resumes an interrupted search. If it is off when a search starts, you will lose the data from the previous run!
  • Save pages option saves the pages that contain the sought expression.
  • Unique option removes duplicates from the result and prevents saving pages that contain only duplicates.
  • Base domain only option restricts the search to pages whose URL has the same domain part as the base URL.
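
The interplay of Depth level, Unique and Base domain only can be illustrated with a small breadth-first scan. This is a minimal sketch, not ScanWeb's actual implementation: the in-memory PAGES dictionary, the scan function and the EMAIL pattern are all hypothetical, the regular expression uses Python syntax rather than ScanWeb's, and it assumes that Base domain only keeps the scan on the domain of the base URL.

```python
import re
from collections import deque
from urllib.parse import urlparse

# Hypothetical in-memory "web": URL -> (page text, linked URLs).
PAGES = {
    "http://a.example/": ("contact: info@a.example",
                          ["http://a.example/team", "http://b.example/"]),
    "http://a.example/team": ("bob@a.example and info@a.example",
                              ["http://a.example/deep"]),
    "http://b.example/": ("sales@b.example", []),
    "http://a.example/deep": ("deep@a.example", []),
}

EMAIL = r"[a-z]+@[a-z.]+"  # illustrative pattern in Python syntax


def scan(base_url, depth, pattern, unique=True, base_domain_only=False):
    """Collect matches from pages at most `depth` links from base_url."""
    regex = re.compile(pattern)
    base_domain = urlparse(base_url).netloc
    seen = {base_url}
    queue = deque([(base_url, 0)])
    found = []
    while queue:
        url, level = queue.popleft()
        text, links = PAGES.get(url, ("", []))
        for match in regex.findall(text):
            if not unique or match not in found:
                found.append(match)
        if level == depth:
            continue  # depth limit reached: do not follow links further
        for link in links:
            if link in seen:
                continue
            # Assumption: "Base domain only" skips pages on other domains.
            if base_domain_only and urlparse(link).netloc != base_domain:
                continue
            seen.add(link)
            queue.append((link, level + 1))
    return found


print(scan("http://a.example/", 0, EMAIL))
print(scan("http://a.example/", 1, EMAIL, base_domain_only=True))
```

With depth 0 only the base page is searched. With depth 1 the directly linked pages are searched as well, and the page two links away is never visited.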

Regular expressions table

The program package includes a predefined set of regular expressions. The definition set is shown in Figure 2. You can add new definitions to the set. Write every row as $variable = expression. Table controls:

  • A double click or the Insert key adds a row.
  • The Delete key removes a row.
  • The arrow keys, PageUp, PageDown, Home and End move the current definition in the table.
  • A left mouse button click selects a row. If the row was already selected, you can edit it.

You can define a regular expression using a previous definition. The order of the definitions is important.
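
Why the order matters can be seen in a small sketch that resolves rows from top to bottom, so a row can only use names defined above it. This is an illustration under assumptions, not ScanWeb's own code: expand_definitions and REFERENCE are hypothetical names, the sample rows are invented, and the simplified reference pattern does not handle a $ followed by a letter inside a character set.

```python
import re

# A reference is a $ (not escaped by a backslash) followed by an identifier.
# Simplification: a "$" followed by a letter inside [...] is not handled.
REFERENCE = re.compile(r"(?<!\\)\$([A-Za-z][A-Za-z0-9_]*)")


def expand_definitions(rows):
    """Resolve `$name = expression` rows in order, top to bottom."""
    defined = {}

    def substitute(match):
        ref = match.group(1)
        if ref not in defined:
            raise KeyError(f"${ref} is used before it is defined")
        # A named expression behaves like the expression in parentheses.
        return "(" + defined[ref] + ")"

    for row in rows:
        name, expression = (part.strip() for part in row.split("=", 1))
        defined[name.lstrip("$")] = REFERENCE.sub(substitute, expression)
    return defined


rows = [
    "$digit = [0-9]",
    "$number = $digit+",
    r"$price = $number\.$number",
]
definitions = expand_definitions(rows)
print(definitions["price"])
```

Swapping the first two rows makes the sketch raise a KeyError, because $digit would be used before it is defined.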

 

Additional conditions

You can set obligatory content, forbidden content and a maximum page size. A page that does not meet these conditions is not searched.

Obligatory and forbidden content are given as regular expressions.
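
The three conditions can be pictured as a filter that runs before a page is searched. The sketch below is only an illustration: page_qualifies is a hypothetical name, the patterns use Python syntax, and measuring the size in UTF-8 bytes is an assumption, since the unit is not stated here.

```python
import re


def page_qualifies(text, obligatory=None, forbidden=None, max_size=None):
    """Return True when the page meets every condition that is set."""
    # Assumption: the size limit is counted in bytes of the UTF-8 text.
    if max_size is not None and len(text.encode("utf-8")) > max_size:
        return False
    if obligatory and not re.search(obligatory, text):
        return False
    if forbidden and re.search(forbidden, text):
        return False
    return True


page = "price list 2002"
print(page_qualifies(page, obligatory=r"price"))
print(page_qualifies(page, forbidden=r"200[0-9]"))
```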

 

Setting the language

Select the language on the third tab page. You can create new labels for another language. The labels must be valid and in the right place for the program to work. Create new labels as follows:

  • Copy a *.lng file in the Language directory to a new *.lng file.
  • The first line is the name of the language and has no further significance.
  • Translate the labels inside the " characters. Retain the numbers. Do not change the order of the lines, and do not add or delete lines.
  • Start the program, select the new language and check the placement and size of the labels on all tab pages.
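
The rules for a translated file (same number of lines, same order, numbers retained, only the text inside the quotes changed) can be checked mechanically before starting the program. This is a hedged sketch: check_translation and skeleton are hypothetical helpers, and the sample lines assume a layout of a number followed by a quoted label, which may differ from the real *.lng format.

```python
import re

QUOTED = re.compile(r'"[^"]*"')


def skeleton(line):
    """Blank out every quoted label, keeping everything else."""
    return QUOTED.sub('""', line)


def check_translation(original_lines, translated_lines):
    """Return a list of problems; an empty list means the structure matches."""
    if len(original_lines) != len(translated_lines):
        return ["line count differs"]
    problems = []
    pairs = zip(original_lines, translated_lines)
    for number, (old, new) in enumerate(pairs, start=1):
        if number == 1:
            continue  # line 1 is the language name and may differ freely
        if skeleton(old) != skeleton(new):
            problems.append(f"line {number}: text outside the quotes changed")
    return problems


# Hypothetical file contents, for illustration only.
english = ['English', '10 "Name"', '11 "Directory"']
german = ['Deutsch', '10 "Name"', '11 "Verzeichnis"']
print(check_translation(english, german))
```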

Regular expressions

  • Individual characters, e.g. 'h' is a regular expression. For non-printable characters, use either the notation \xhh, where h is a hexadecimal digit, or one of the escape sequences \n \r \t \v known from the C language. Because the characters * + ? . | [ ] ( ) - $ ^ have a special meaning in regular expressions, escape sequences must also be used to specify these characters literally, for example '\*'. The same applies to national special characters and the space ('\á', '\Ž', '\ ', ...).
  • Character sets enclosed in square brackets [ ]. E.g. [A-Za-z_$] matches any alphabetic character, the underscore and the dollar sign, such as 'B', 'b', '_' and '$'. The dash (-) indicates a range. A ^ immediately following the [ of a character set means 'form the inverse character set', e.g. [^0-9A-Za-z] matches non-alphanumeric characters.
  • Expressions enclosed in round parentheses ( ). Any regular expression can be used on the lowest level by enclosing it in round parentheses.
  • The dot . matches any character.
  • An identifier prefixed by a $. It refers to an already defined regular expression, e.g. $Ident stands for a previously defined user regular expression. Think of it as a named regular expression enclosed in round parentheses.
  • Character * means repetition (possibly zero times), e.g. [0-9]* matches not only 8 but also 87576 and even the empty string ''.
  • Character + means at least one occurrence, e.g. [0-9]+ matches 8 and 9185278 but not the empty string.
  • Character ? means at most one occurrence, e.g. [$_A-Z]? matches '_', 'U', '$', ... and ''.
  • Sequence \i means ignore case for the basic set of characters. It applies only to capital letters in the regular expression.
  • Character | is an alternative between two regular expressions, e.g. A | $number.

A concatenation of regular expressions matches the corresponding pieces of text in sequence. E.g. '(Hallo (,\ how\ are\ you\?)?)' can match 'Hallo' or 'Hallo, how are you?'.
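
Several of the examples above can be tried out with Python's re module, whose syntax for sets, *, + and ? is close to the one described here. This is only an approximation of ScanWeb's dialect: in Python a space needs no backslash, $name references and \i do not exist, and the matches helper is a hypothetical name.

```python
import re


def matches(pattern, text):
    """True when the whole text matches the pattern."""
    return re.fullmatch(pattern, text) is not None


# (pattern in Python syntax, text, expected result)
examples = [
    (r"[0-9]*", "87576", True),
    (r"[0-9]*", "", True),
    (r"[0-9]+", "9185278", True),
    (r"[0-9]+", "", False),
    (r"[$_A-Z]?", "U", True),
    (r"[$_A-Z]?", "", True),
    (r"[^0-9A-Za-z]", "#", True),
    (r"Hallo(, how are you\?)?", "Hallo", True),
    (r"Hallo(, how are you\?)?", "Hallo, how are you?", True),
]

for pattern, text, expected in examples:
    print(pattern, repr(text), matches(pattern, text) == expected)
```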



Figure 1

Figure 2


Download

  0.2 MB   ScanWeb 1.2 32 bit (9.5.2002)

ScanWeb is freeware. All files are in a ZIP archive. No installation is needed: extract the archive and you can use the program.


Language Versions

No language is supported by the EServer better than the others. Each language version has its own directory, named with the same two-character abbreviation as the one used in the WWW header. For example, it is cz for Czech, sk for Slovak, en for English, etc. The advantage of this system is that it allows a specific menu, index page and advertisements for each language.

Should you wish to create a new language version for visitors of your web pages, proceed simply as follows:

  • Copy the content of any current language version into the new directory.
    • The new directory should be named by a two-character abbreviation of the language.
  • Translate the index page, the info page and menu.
  • Add the specific part of the header for the language into the content.txt file.
    • The header should be located under the key #headers and it should begin with the new directory name.
  • Finally, add the new language choice to the title.html file.
    • Add the respective flag.
    • The link should point to the eserver.php script, with the name of the new directory as its parameter.
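
The first step, copying an existing language version into a new two-character directory, could be scripted roughly as follows. This is a sketch under assumptions: create_language_version is a hypothetical helper, and it assumes that the language directories sit side by side under one root directory. The translation and the edits to content.txt and title.html remain manual.

```python
import shutil
from pathlib import Path


def create_language_version(site_root, source_lang, new_lang):
    """Copy an existing language directory to a new two-character one."""
    if len(new_lang) != 2 or not new_lang.isalpha():
        raise ValueError("directory name should be a two-character abbreviation")
    source = Path(site_root) / source_lang
    target = Path(site_root) / new_lang
    # copytree raises FileExistsError when the target already exists,
    # so an existing language version is never overwritten.
    shutil.copytree(source, target)
    return target
```

A call such as create_language_version("/path/to/site", "en", "de") would then leave a copy of the English version ready for translation. The path is a placeholder.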

Because the language version was copied, the new one has also inherited the advertisements of the parent version. You can now adapt them to the new language.

The above procedure is sufficient to make the new language version suitable for an ordinary visitor. Should you wish to make it complete, you must also translate the manuals for the EServer, which represent a larger volume of text. If you decide to do that, please contact the administrator.