Newsstar

Tony Houghton

<h@realh.co.uk>

Copyright © 2003 Tony Houghton

━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━

Table of Contents

Introduction
    Quick start guide
News
Obtaining newsstar
Requirements
Directories used
Building
Setting up
    The main.cf file
    Setting up for download
    Setting up for upload
    Command-line options
    Upgrading from versions earlier than 0.7.0
How it works
    Preparing articles for upload
    Running the binary, processing downloaded articles, clearing up
Migrating from suck
Score-based killfiling
Interrupting while running
Contact details

Introduction

Newsstar fetches news and posts it to a local server; INN, s-news and sn are
supported, and it should be easy to adapt for other servers with some
configuration and extra scripts. It's designed for Unix-like systems, and all
the development was done on Linux.

There are already plenty of other programs to do this, but what makes newsstar
special is that it can make multiple simultaneous connections, not only to one
server, but to several, supporting up to 10 threads. Before fetching each
article it checks that it hasn't already been downloaded by another thread or
in a previous session. It can also pipeline article requests to make better use
of available bandwidth.

I wrote it because a number of ISPs I have used suffer from unreliable
newsfeeds. There is an excellent free server made available by
news.individual.net, but it can be a bit slow at times, and using external
servers uses more bandwidth. Therefore I wanted a program which could fetch
whatever articles my ISP has available, but use the foreign server to avoid
missing posts or getting them very late, and to do it as fast as possible.

Newsstar is distributed under the GPL. If you are reading this file offline you
should have a file called COPYING with details. Due to a long-term illness I am
unable to earn a living, either by programming or by any other means, so any
gifts will be much appreciated. You can use the Sourceforge/Paypal donation
scheme or see below to find out where to send cheques or cash etc.

Quick start guide

A quick start guide is provided, with brief instructions on how to configure
newsstar once it is installed. It starts off in the source distribution as
docs/QuickStart.in; this is readable, but some directory names are abstracted,
enclosed in @ symbols. After configure has been run, docs/QuickStart will be
available, containing the actual directory names.

If you are installing from source, and you want to use the quick start guide,
read on to the building section, follow the instructions there, then read the
generated guide.

If you are using a binary package, the guide should be available in
/usr/share/doc/newsstar.

News

See news.html for the latest news or the project's ChangeLog file for more
details.

News summaries for older versions are available in the file OLDNEWS in the
distribution's docs directory.

Obtaining newsstar

The latest version of the file you are now reading should be available at
http://newsstar.sourceforge.net. The Sourceforge project site, where you can
download newsstar, report bugs and use the forums etc, is at
http://sourceforge.net/projects/newsstar/.

A Debian package is also available from the project's Sourceforge file list.

Requirements

You'll need a Unix-like system, eg Linux, to run and compile newsstar. The
curses library (preferably GNU ncurses) gives extra functionality, but isn't
essential. The gdbm library and headers are needed if newsstar is to maintain
its own history database.

Obviously you'll need the details of at least one news server to download news
from. The easiest way to feed articles to most servers is via an rnews batch,
but direct injection via NNTP is supported for INN and s-news. The remote
server(s) must support the XHDR command. Newsstar used to request articles by
their Message-ID, but in 0.11.0 switched to fetching by article number (group
index) which should improve performance, especially on poorer servers.

Newsstar optionally uses the NNTP CHECK command to avoid downloading articles
the local server already has. If your server/spool doesn't support that,
newsstar can maintain its own database of Message-IDs with gdbm. INN supports
CHECK, but I don't know in which version it became available; sn and s-news
don't, but newsstar can use s-news' database.

Directories used

Some important directories used by newsstar are configurable, so their final
locations cannot be shown in this file. Here is a summary:

┌────────────┬────────────────────────┬───────────────────────────────────────┐
│  Abstract  │    Typical location    │              Description              │
│    name    │                        │                                       │
├────────────┼────────────────────────┼───────────────────────────────────────┤
│CONF_DIR    │/etc/newsstar           │Holds main configuration files         │
├────────────┼────────────────────────┼───────────────────────────────────────┤
│RC_DIR      │/var/lib/newsstar       │Writable data files, especially newsrc │
│            │                        │files                                  │
├────────────┼────────────────────────┼───────────────────────────────────────┤
│INCOMING_DIR│/var/spool/             │Temporary holding place for downloaded │
│            │newsstar-incoming       │articles                               │
├────────────┼────────────────────────┼───────────────────────────────────────┤
│SPOOL_DIR   │/var/spool/news         │Local news server's spool directory    │
├────────────┼────────────────────────┼───────────────────────────────────────┤
│OUTGOING_DIR│/var/spool/news/outgoing│Where local news server lists outgoing │
│            │                        │articles                               │
├────────────┼────────────────────────┼───────────────────────────────────────┤
│ACTIVE_FILE │/var/lib/news/active    │INN's or s-news' list of active        │
│            │                        │newsgroups                             │
├────────────┼────────────────────────┼───────────────────────────────────────┤
│ARTICLES_DIR│/var/spool/news/articles│Where local server stores newsgroup    │
│            │                        │articles                               │
└────────────┴────────────────────────┴───────────────────────────────────────┘

Building

First I recommend you run ./configure --help and check whether any of the
options will be useful to you.

Then it's just a matter of running the usual:

./configure
make
make install

Tip

make install should normally be performed as root, but it's recommended to do
the first two stages as a normal user in a directory you own.

Tip

It's recommended that you include INN's binaries in your PATH before running
configure, if they're not in one of the standard branches or in one of
/usr/lib/news/bin, /usr/local/news/bin or /var/lib/news/bin. This will help
newsstar decide the PATH variable to use at run-time.

Setting up

Newsstar installs a binary called newsstar.bin in the configured libexec
directory, which isn't intended to be run independently. Instead, you should
always call the perl script, also called newsstar, which acts as a front-end
and performs a lot of support work for the binary. Prior to version 0.11.0, the
script was called newsstar.pl. A symbolic link used to be installed for
backwards compatibility, but the name newsstar.pl was deprecated and is no
longer used at all as of version 1.0.

Important

Newsstar must be run with write access to the news spool. Most systems have a
user called news, which is ideal for this use.

The main.cf file

When newsstar starts it first looks in its config directory for a file called
main.cf. This has one option per line in the form:

keyword     value

Any amount of whitespace is permitted between the key and value. If you want
leading whitespace in a value it should be enclosed in double quotes ("). Lines
beginning with # are comments, and blank lines are also ignored. A sample file
called main.cf.sample is provided in the sample_config^[1] directory, which you
should copy to use as the basis for your own file. Each option is documented
there, with the default value for each option shown. Commented options have no
default value.
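
The format described above can be modelled in a few lines. The following is
only a Python sketch of the rules as stated here, not newsstar's own parser;
the function name parse_cf and the option names in the usage note are
hypothetical, and stripping the surrounding quotes from a quoted value is an
assumption.

```python
def parse_cf(text):
    """Parse 'keyword value' lines into a dict (illustrative model only)."""
    options = {}
    for line in text.splitlines():
        stripped = line.strip()
        # Blank lines and comment lines are skipped.
        if not stripped or stripped.startswith("#"):
            continue
        # Any amount of whitespace may separate the keyword from the value.
        parts = stripped.split(None, 1)
        key = parts[0]
        value = parts[1] if len(parts) > 1 else ""
        # Assumption: double quotes protect leading whitespace and are
        # themselves removed from the stored value.
        if len(value) >= 2 and value[0] == '"' and value[-1] == '"':
            value = value[1:-1]
        options[key] = value
    return options
```

For instance, parse_cf('some_option    "  padded"') would give a dictionary
with the single (made-up) key some_option mapped to the value "  padded",
leading spaces intact.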

A main.cf file is not compulsory.

See also server-specific option files.

Setting up for download

newsrc files

Once newsstar has read its main.cf file it scans the newsrc directory (RC_DIR)
for one or more files named newsrc.*, where the * is the name of each server.
This name can either be its address, or a nickname, in which case you must
provide its address in its config file (see below).

Each newsrc file contains one newsgroup entry per line: the name of each
newsgroup, optionally separated from a number by whitespace. If there is no
number, newsstar will try to fetch all available articles from the group. If
the number is negative, -n, it will try to fetch the group's n most recent
articles. A positive number means that was the last article downloaded from the
group, and the next fetch will try to fetch all articles newer than that.
Usually you will only use blanks or negative numbers when creating the file.
When newsstar has run, it automatically updates each newsrc file with the
number of the last article downloaded in each group.
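
As an illustration of the three cases, here is a small Python sketch that
classifies one newsrc line. It models the description above rather than
newsstar's actual code; the function name and the returned labels are
hypothetical, and treating 0 like a positive number is an assumption.

```python
def parse_newsrc_line(line):
    """Return (group, mode, number) for one newsrc line, or None if blank."""
    parts = line.split()
    if not parts:
        return None
    group = parts[0]
    if len(parts) == 1:
        # No number: fetch all available articles.
        return (group, "all", None)
    n = int(parts[1])
    if n < 0:
        # -n: fetch the n most recent articles.
        return (group, "recent", -n)
    # Otherwise n is the last article downloaded; fetch everything newer.
    return (group, "newer_than", n)
```

So a hand-written line such as "comp.os.linux.misc -50" asks for the 50 most
recent articles, and after a run the same line would carry the number of the
last article fetched instead.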

A sample newsrc file, newsrc.sample, is provided in the sample_config^[1]
directory. You should delete or rename this file; it will be ignored, so you
can't have a server called sample. It is compulsory to have at least one newsrc
file, because this is how newsstar identifies which servers to connect to.

You may create newsrc files that start off empty and use newsstar's -A option
to have it automatically download groups to match those found on the
destination server.

As you will usually be creating these files manually you should also make sure
the user newsstar runs as (usually news) has write access to these files.

Server-specific config files

For each server it finds a newsrc file for, newsstar looks for an optional file
in its config directory called cf.* where the * is the name of each server, as
used in the newsrc file names.

The underlying format of these files is the same as for main.cf, and a sample
with comments is again provided in the sample_config^[1] directory. Some of the
options correspond to those in main.cf, in which case the value from main.cf is
used by default, but can be overridden on a per-server basis.

Ignore files

newsstar has an option to generate newsrc files automatically based on the
local server's list of active groups. Usually you will not want to download all
of these groups from all remote servers, so ignore files are used.

Ignore files are optional and are stored in newsstar's config directory. One
called master.ignore (new to newsstar 0.17.0) is common to all servers.
Additionally, each server may have its own ignore file called
ignore.server_name. Server-specific ignore files take priority over
master.ignore.

Each ignore file consists of one perl regular expression per line. Perl regexps
are too major a subject to cover here, but they are widely documented. The
section on migrating from suck contains some tips, and a sample is provided in
the sample_config^[1] directory. An ignore file may contain blank lines and
comment lines beginning with #; these lines are skipped.

When newsstar considers a group for inclusion in a newsrc file, it checks it
against each regexp in turn from the ignore files – server-specific first,
followed by the master file – taking action when it finds the first matching
pattern. A matching group is excluded from the newsrc file unless the pattern
is negated, in which case the group is specifically included.

A pattern is negated (introduced in newsstar 0.13.0) by preceding it with an
exclamation mark (!). In the unlikely event you want to use a pattern starting
with a !, precede it with a backslash (\). This means you can use negative
patterns to allow specific groups or hierarchies, followed by more general
patterns to exclude. For example, say you connected to a specialist news server
carrying groups in the loki.* hierarchy. You could use the following ignore
file to make it fetch all loki groups found in your active file, and ignore all
others:

!loki\.
.*
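
The first-match rule can be sketched as follows. This is a Python model of the
behaviour described above, not the code in the newsstar script; it uses
Python's re module, which treats these simple patterns the same way perl does.
The function names are hypothetical, and keeping a group that matches no
pattern at all is an assumption.

```python
import re

def load_patterns(text):
    """Turn ignore-file text into a list of (negated, compiled_regex)."""
    patterns = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        if line.startswith("!"):
            # Negated pattern: a match means "include this group".
            patterns.append((True, re.compile(line[1:])))
        else:
            # A leading "\!" simply compiles to a literal "!" match.
            patterns.append((False, re.compile(line)))
    return patterns

def group_wanted(group, server_patterns, master_patterns=()):
    """Server-specific patterns are tried first, then the master ones."""
    for negated, regex in list(server_patterns) + list(master_patterns):
        # Patterns may match any part of the group name.
        if regex.search(group):
            return negated
    # Assumption: a group matching nothing is kept.
    return True
```

With the two-line loki example above, group_wanted("loki.announce", ...) is
true because the negated pattern matches first, while any other group falls
through to .* and is excluded.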

Setting up for upload

For each server with an outgoing feed enabled, newsstar looks for a
subdirectory named after the server in its outgoing directory, eg
OUTGOING_DIR/newsstar/my.news.server, where OUTGOING_DIR is usually
/var/spool/news/outgoing. It reads article files from the directory and tries
to upload each one to the remote server.

From newsstar 1.3.0 onwards you can use the extra_feed directive in the
server's config file to make it use a named feed as well as or instead of the
feed named after the server's identifier. More than one server may use the same
feed file simultaneously, in which case each article will be uploaded
exclusively to whichever server's thread gets to it first. This feature enables
you to set up your feeds so that a server can have a feed to itself for its own
specialist groups and also be fed mainstream groups shared with another server,
as a form of redundancy. If an article is rejected by one server for a reason
other than the server already having a copy it will still be offered to other
servers sharing that feed. The process involves race conditions which are
harmless except that the reported number of articles uploaded can become
inconsistent.

Files that are successfully uploaded, or rejected because the server already
carries them, are deleted from the directory. Those that are rejected or cannot
be uploaded for any other reason are moved to OUTGOING_DIR/newsstar/failed. If
you use newsstar's -m option, it will delete these files after mailing them
back to the sender.

The above uploading strategy was chosen for flexibility, although INN doesn't
set up its outgoing feeds in that way. The newsstar perl script takes a typical
INN outgoing feed file for each server and produces the directories full of
individual files that the binary requires.

Filtering uploaded articles

It is sometimes necessary to manipulate the contents of outgoing articles
before they are uploaded to a remote server eg to remove unwanted headers. You
may provide a perl script called filter.pl in CONF_DIR, containing a function
called filter to perform this filtering. Each time an article is about to be
uploaded, the function will be called with two arguments. The first is a
reference to an array, each element containing one line (including terminator)
of the message. The second is the name of the server.

The recommended way to delete a header is to replace it with a null string,
without a line terminator. To prevent a message being uploaded at all, replace
every line with blanks.

A sample script is provided. If you do not provide a script, newsstar's own
perl script has a built-in function which removes the same headers as the
sample.

Command-line options

Some of the options below are used only by newsstar.bin ^[2] (the script
recognises them and passes them on), some only by the perl script ^[3], and
some by both ^[4]. Most users can ignore which is which, but the information is
provided for advanced tinkerers.

Select local server type ^[4], -sn, -snews

    Use the -sn option if your local server is sn or the -snews option for
    s-news. If neither option is given, the server is assumed to be INN.

Location of important news server executables ^[4] , --snstore, --overview

    If sn's snstore executable is in an unusual location you may specify its
    full path with the --snstore option, eg --snstore==/home/me/bin/snstore.

    Similarly, use --overview for the location of s-news' overview executable.

    If either of these options is omitted when the server type is sn or snews,
    the script will attempt to find the binary itself and pass it on to the
    main program.

Verbosity ^[4] , -v, -vv (note that's two v's, not a w)

    The two verbosity level options cause newsstar to output extra information,
    mainly for debugging, with -vv being considerably more verbose than -v.

    All standard messages, including warnings and errors, are sent to stdout,
    while the extra information enabled by -v and -vv is sent to stderr. The
    reason for this policy is to make it possible to capture the extra
    information without it drowning out the standard information (normally)
    sent to the console.

    In non-full-screen mode (without the -f option), each message is prefixed
    with the index number of the child process it originates from, or <M> for
    the master process.

Disable running stats display ^[2] , -q

    Usually newsstar shows a continuous display of the number of messages
    transferred etc. If not in full-screen mode, this can be disabled with the
    -q option.

Merge stdout and stderr ^[2] , -s

    The -s option is used to let newsstar know whether stdout and stderr output
    to separate terminals or files. Normally newsstar assumes that both output
    to the same place, but the -s option tells it they are separate. This
    distinction is useful because of the way progress output overwrites itself
    on the same line where possible.

    As of newsstar 0.6.0, the use of -s also causes messages that are sent to
    stdout (or to the screen in full-screen mode) to be duplicated on the
    stderr stream, making logs generated from stderr easier to follow.

    Using the full-screen (-f) option modifies the meaning of the -s option;
    see below.

Full-screen mode ^[2] , -f

    The -f option causes newsstar to take over the whole terminal, using the
    curses library. It divides the screen up into a number of sections,
    including one for each thread, which makes it easier to keep track of
    progress on a per-thread basis.

    You can configure the colours and other attributes used for different types
    of information in this mode, using a file called curses.cf in the config
    directory. This has a similar underlying format to the other config files,
    and a sample is provided showing examples of every available option.

    In full-screen mode, the -s option has a different meaning. If used, it
    means that the extra messages enabled by -v and -vv are sent to stderr, but
    not to the terminal via its full-screen interface. Other levels of message
    are sent to both. Be sure to redirect stderr away from the terminal if you
    use the full-screen (-f) option, otherwise the display will be messed up.

    You will also see some messages printed to the console before and after the
    full-screen display is active, especially if using the -v or -vv options.

Brief mode ^[4] , -b

    The -b (for brief) option minimises the number of messages printed to
    stdout. It has very little effect in full-screen mode but otherwise makes
    it easier to get an overview of the progress of a fetch. All the other
    messages are still printed to stderr so you should use it in conjunction
    with the -s option and redirect stderr.

    In this mode newsstar prints a single figure showing the sum total number
    of messages available in each group from all servers. This may look a
    little strange, especially when one server is faster than another, because
    it will start downloading messages from one server without waiting for this
    information to become available from a slower one, so it sometimes appears
    to complete downloading before knowing how many messages to download.

Synchronise and sort newsrc files with local server's active groups ^[3] , -A, -a

    Newsstar's -A is equivalent to suck's -A option. It takes an optional
    argument, immediately after the -A with no space (eg -A-50). If present,
    its value is used for any groups added to the newsrc file(s). You should
    only consider using 0, negative values, or omitting the argument. Groups
    already in the newsrc file have their value left unchanged of course.

    With the -A option enabled, the script reads the server's active file, or
    scans sn's article directory for newsgroup directories, and checks that
    newsrc files contain the same set of groups. Each server may also have an
    ignore file in CONF_DIR, named ignore.server_identifier. See the section on
    ignore files.

    If the -a option is given as well as -A, each newsrc file will be sorted so
    that its groups appear in alphabetical order.

Connect to only one remote server ^[4] , -o

    If you have multiple newsrc files, but want to connect to only one server,
    you can use the -o option. The syntax is -oserver where server is the name
    newsstar uses to identify the server. Note the lack of space between the
    option flag and the name.

Bounce failed postings ^[3] , -m

    Use the -m option to have failed postings removed from the failed directory
    and mailed back to the sender.

Compact the history database ^[2] , -r

    Newsstar has the option of maintaining its own history file (see main.cf).
    If this is in use, the -r command-line option causes it to reorganize the
    database whenever items have been deleted, to free up unused space.

Specify a wrapper program ^[3] , -wWRAPPER

    You can specify a wrapper program to run the binary in. This is intended
    for debugging. For example -w/usr/bin/valgrind. Note there is no space
    between -w and WRAPPER.

Debugging without running the binary ^[3] , --preprocess, --postprocess

    As an aid to debugging, the --preprocess and --postprocess options instruct
    the perl script just to perform its functions prior to or after running the
    binary respectively, without actually running the binary.

Upgrading from versions earlier than 0.7.0

As SPOOL_DIR is dependent on the news server, rather than owned by newsstar,
newsstar's RC_DIR is now separate. SPOOL_DIR and other directories/files owned
by the news server are now configurable in main.cf, to prevent the need for
separately compiled versions of newsstar to work with the different types of
server.

If upgrading from an earlier version, you should use a directory such as
/var/lib/newsstar for RC_DIR and move your old RC_DIR there.

As ignore files are only written by the user, and not updated by newsstar, they
have been relocated to CONF_DIR. RC_DIR is still checked for their presence for
backwards compatibility, but you should move these too.

The newsstar.bin binary is now located in libexecdir, because it is not usually
called directly.

How it works

The newsstar perl script performs these functions:

Preparing articles for upload

For each newsrc file, the script looks for a corresponding newsfeeds file (for
INN or s-news) or outgoing directory (for sn) in OUTGOING_DIR.

Newsfeeds files are flushed, and the referenced articles copied into the
appropriate directory for the newsstar binary, with the upload filter applied.
The feed file, usually generated by INN, contains one article reference per
line. Newsstar is only interested in the first field in each line, which can
either be a partial path to the article, or a storage token. The script
automatically distinguishes between partial paths and tokens, taking
appropriate action to locate the referenced file.

For sn, the process is slightly simpler: the files just have to be moved,
processed from sn's "wire format" into plain text, and the upload filter
applied.

Running the binary, processing downloaded articles, clearing up

Finally, the script runs the binary, posts the downloaded batch (if present) to
the local server, mails failed postings back to their sender and deletes them,
removes any remaining temporary files etc, and exits.

Migrating from suck

This is quite an easy process, because suck's sucknewsrc files can be used as
newsstar newsrc files without alteration, just by moving/renaming them.
Converting suck active-ignore files to newsstar ignore files requires some more
work, but very little. The main things to watch out for are that in perl
regexps, . (period) means match any character, so if you want to exactly match
a . in a group name, you should precede it with a backslash. Also, * means
match any number of the preceding character, so * in active-ignore should be
replaced with .*. There is no need to use trailing or leading .*s because
patterns match any part of a group name.
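
The two substitutions can be applied mechanically. The Python sketch below
does only what is described above, escaping . and turning * into .*; it
assumes the active-ignore pattern contains nothing else that is special in a
perl regexp, and the function name is hypothetical.

```python
def suck_to_perl_regex(pattern):
    """Convert a suck active-ignore pattern using the two rules above."""
    converted = []
    for ch in pattern:
        if ch == ".":
            # A literal period must be escaped in a perl regexp.
            converted.append("\\.")
        elif ch == "*":
            # suck's * ("anything") becomes .* in a perl regexp.
            converted.append(".*")
        else:
            converted.append(ch)
    return "".join(converted)
```

For example, alt.binaries.* becomes alt\.binaries\..* (and, as noted above,
the trailing .* could then be dropped by hand).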

Score-based killfiling

Newsstar can decide not to download certain articles based on the contents of
their headers (killfiling). It only reads the headers available in XOVER, to
help keep the code simple. Enabling killfiling forces the xover option to be
used for a server (see the section on server-specific config files).

Those familiar with NNTP may be aware that XOVER only lists the contents of
certain headers in a fixed order, without the header name. Newsstar
reconstructs the full header as it would appear in a message by prepending it
with its name, a colon and a space.

The headers available are: Newsgroups, Subject, From, Date, Message-ID,
References, Size, Lines, in that order.

Newsgroups may not be the real Newsgroups header as it appears in the message,
but is constructed by newsstar based on the Xref header, if present in the
overview, or will simply contain the group it's examining when it finds the
message. In the latter case it will not be aware of any other groups the
article is crossposted to.

Size is a pseudo-header that doesn't appear in actual messages, but the data is
available in the overview and may be useful for scoring.

The headers for each message are tested against each regular expression in one
block, each header separated by newlines, so you may take advantage of the
order and put multiple headers in one expression, but this is not recommended,
in case the overview order changes.
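
To make the shape of that block concrete, here is a Python sketch that builds
it from a dictionary of overview values. It is an illustration of the
description above, not newsstar's code; the function name, the dictionary
input and the sample values are all hypothetical.

```python
# Header order as listed above.
HEADER_ORDER = ["Newsgroups", "Subject", "From", "Date",
                "Message-ID", "References", "Size", "Lines"]

def build_header_block(fields):
    """Rebuild "Name: value" lines, one per header, joined by newlines."""
    lines = []
    for name in HEADER_ORDER:
        # Each value is prefixed with its name, a colon and a space.
        lines.append(name + ": " + fields.get(name, ""))
    return "\n".join(lines)
```

A score pattern that names its header, such as "Subject: .*test", is then
matched against this whole block and does not depend on where the Subject line
falls within it.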

To enable scoring create a file called master.score in the config directory. As
usual, a sample is provided. The structure of this file is similar to the other
config files, but a score value takes the place of a keyword, and an extended
regular expression takes the place of a value. Regular expressions are
case-insensitive by default but may be made case-sensitive by using the
regex_ic option in main.cf.

To enable extra scoring specific to a particular server create a file called
score.* in the config directory where * is the server name, in a similar way to
server config files. Any scores from this file will be added to any master
scores.

Each article starts off with a score of 100, but this value may be changed in a
server's config file. The headers are tested against all the lines in the score
file(s), and each time a regular expression matches, the corresponding score is
added to the article's total. If the final total is less than 0, the article is
not downloaded. Therefore many of the score values you'll be using will be
negative.

If the score is between 0 and the kill_score value in the server's config file
a pseudo-article may be posted to the local server, containing only the headers
newsstar generated by scanning the XOVER data, plus any path configured in
main.cf or cf.server. Configuring a path is quite important here because the
receiving server may object to articles with no Path header. The default value
for kill_score is 0, effectively disabling this feature.
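
The arithmetic can be sketched in Python as follows. This models the rules as
described here and is not newsstar's implementation: the function names and
result labels are hypothetical, each rule is assumed to count at most once per
article, and the exact treatment of a total equal to kill_score is an
assumption. Python's re module stands in for the extended regular expressions
newsstar uses, so details such as how . treats newlines may differ.

```python
import re

def article_score(header_block, rules, initial=100):
    """Add the score of every rule whose regexp matches the header block."""
    total = initial
    for score, pattern in rules:
        # Case-insensitive by default, as described above.
        if re.search(pattern, header_block, re.IGNORECASE):
            total += score
    return total

def decide(total, kill_score=0):
    """Map a final score to the action taken for the article."""
    if total < 0:
        return "skip"
    if total < kill_score:
        # Only reachable when kill_score is greater than 0.
        return "pseudo-article"
    return "download"
```

With rules of -150 for "Subject: .*spam" and +50 for "From: .*friend", an
article matching only the first ends on 100 - 150 = -50 and is skipped; one
matching both ends on 0 and is downloaded unless kill_score has been raised
above 0.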

Interrupting while running

If newsstar is sent an interrupt signal (SIGINT), usually by pressing Ctrl-C in
the terminal it's running on, it will not exit straight away. It first waits
until it has received all the articles it has already requested (those held in
the "pipeline" etc), then logs off cleanly from all servers.

If it receives a second SIGINT at this stage, or another non-drastic signal at
any time, it will discard any partially downloaded articles then try to log off
from any servers but not wait for acknowledgement.

Contact details

Newsstar was written by Tony Houghton. The best way to get support for newsstar
is via the Sourceforge project site where you can use the tracker to raise a
support request or join the mailing list for support and general discussion. If
you'd rather email me directly, please visit my email address page. If you're
not online you can find an email address for newsstar in the AUTHORS file.

My home page is http://www.realh.co.uk.

Please send material gifts to:

271 Upper Weston Lane
Woolston
Southampton
SO19 9HY
UK


━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━

^[1] In the Debian package this directory is /usr/share/doc/examples/. In
previous versions these samples were installed in CONF_DIR and RC_DIR, but as
of version 1.0 they remain in the samples directory. The Debian package will,
however, install main.cf, curses.cf and master.ignore as conffiles in
/etc/newsstar.

^[2] Binary only

^[3] Script only

^[4] Binary and script

