Usenet News HOWTO
1. What is the Usenet?

1.1. Discussion groups

The Usenet is a huge worldwide collection of discussion groups. Each discussion group has a name, e.g. comp.os.linux.announce, and a collection of messages. These messages, usually called articles, are posted by readers like you and me who have access to Usenet servers, and are then stored on the Usenet servers.

This ability to both read and write into a Usenet newsgroup makes the Usenet very different from the bulk of what people today call ``the Internet.'' The Internet has become a colloquial term to refer to the World Wide Web, and the Web is (largely) read-only. There are online discussion groups with Web interfaces, and there are mailing lists, but Usenet is probably more convenient than either of these for most large discussion communities. This is because the articles get replicated to your local Usenet server, thus allowing you to read and post articles without accessing the global Internet, something which is of great value for those with slow Internet links.

Usenet articles also conserve bandwidth because they do not come and sit in each member's mailbox, unlike email based mailing lists. With a mailing list, twenty members in one office will have twenty copies of each message copied to their mailboxes. However, with a Usenet discussion group and a local Usenet server, there's just one copy of each article, and it does not fill up anyone's mailbox.

Another nice feature of having your own local Usenet server is that articles stay on the server even after you've read them. You can't accidentally delete a Usenet article the way you can delete a message from your mailbox. This way, a Usenet server is an excellent way to archive articles of a group discussion on a local server without placing the onus of archiving on any group member.
This makes local Usenet servers very valuable as archives of internal discussion messages within corporate Intranets, provided the article expiry configuration of the Usenet server software has been set up for sufficiently long expiry periods.

1.2. How it works, loosely speaking

Usenet news works by the reader first firing up a Usenet news program, which in today's GUI world will very likely be something like Netscape Messenger or Microsoft's Outlook Express. There are a lot of proven, well-designed character-based Usenet news readers, but a proper review of the user agent software is outside the scope of this HOWTO, so we will just assume that you are using whatever software you like. The reader then selects a Usenet newsgroup from the hundreds or thousands of newsgroups which are hosted by her local server, and accesses all unread articles. These articles are displayed to her. She can then decide to respond to some of them.

When the reader writes an article, either in response to an existing one or as a start of a brand-new thread of discussion, her software posts this article to the Usenet server. The article contains a list of newsgroups into which it is to be posted. Once it is accepted by the server, it becomes available for other users to read and respond to. The article is automatically expired or deleted by the server from its internal archives based on expiry policies set in its software; the author of the article usually can do little or nothing to control the expiry of her articles.

A Usenet server rarely works on its own. It forms a part of a collection of servers, which automatically exchange articles with each other. The flow of articles from one server to another is called a newsfeed. In a simplistic case, one can imagine a worldwide network of servers, all configured to replicate articles with each other, busily passing along copies across the network as soon as one of them receives a new article posted by a human reader.
This replication is done by powerful and fault-tolerant processes, and gives the Usenet network its power. Your local Usenet server literally has a copy of all current articles in all relevant newsgroups.

1.3. About sizes, volumes, and so on

Any would-be Usenet server administrator or creator must read the "Periodic Posting about the basic steps involved in configuring a machine to store Usenet news," also known as the Site Setup FAQ, available from ftp://rtfm.mit.edu/pub/usenet/news.answers/usenet/site-setup or ftp://ftp.uu.net/usenet/news.answers/news/site-setup.Z. It was last updated in 1997, but trends haven't changed much since then, though absolute volume figures have.

If you want your Usenet server to be a repository for all articles in all newsgroups, you will probably not be reading this HOWTO, or even if you do, you will rapidly realise that anyone who needs to read this HOWTO may not be ready to set up such a server. This is because the volumes of articles on the Usenet have reached a point where very specialised networks, very high end servers, and large disk arrays are required for handling such Usenet volumes. Those setups are called ``carrier-class'' Usenet servers, and will be discussed a bit later on in this HOWTO. Administering such an array of hardware may not be the job of the new Usenet administrator, for whom this HOWTO (and most Linux HOWTOs) is written.

Nevertheless, it may be interesting to understand what volumes we are talking about. Usenet news article volumes have been doubling every fourteen months or so, going by what we hear in comments from carrier class Usenet administrators. In the beginning of 1997, this volume was 1.2 GBytes of articles a day. Thus, the volumes should have gone through roughly five doublings, or grown 32 times, by the time we reach mid-2002, at the time of this writing. This gives us a volume of 38.4 GBytes per day.
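The growth estimate above can be checked with a few lines of Python. This is only a sketch of the arithmetic: the 66-month span (start of 1997 to mid-2002) is our reading of the dates in the text, and the function and variable names are illustrative.

```python
START_GB_PER_DAY = 1.2   # daily volume at the beginning of 1997 (from the text)
DOUBLING_MONTHS = 14     # doubling period quoted in the text
ELAPSED_MONTHS = 66      # assumption: start of 1997 to mid-2002

def projected_volume(start_gb, doublings):
    """Daily volume after the given number of doublings."""
    return start_gb * 2 ** doublings

exact_doublings = ELAPSED_MONTHS / DOUBLING_MONTHS   # about 4.7
rounded_doublings = round(exact_doublings)           # the text rounds this to 5

print(f"doublings: {exact_doublings:.2f}, rounded to {rounded_doublings}")
print(f"growth factor: {2 ** rounded_doublings}x")
print(f"estimated volume: "
      f"{projected_volume(START_GB_PER_DAY, rounded_doublings):.1f} GBytes/day")
```

Without the rounding, about 4.7 doublings would give a growth factor of roughly 26 and a volume near 31.5 GBytes per day, so the 38.4 figure is best read as an order-of-magnitude estimate.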
Assume that this transfer happens using uncompressed NNTP (the norm), and add 50% extra for the overheads of NNTP, TCP, and IP. This gives you a raw data transfer volume of 57.6 GBytes/day or about 460 Gbits/day. If you have to transfer such volumes of data in 24 hours (86400 seconds), you'll need raw bandwidth of about 5.3 Mbits per second just to receive all these articles. You'll need more bandwidth to send out feeds to other neighbouring Usenet servers, and then you'll need bandwidth to allow your readers to access your servers and read and post articles in retail quantities. Clearly, these volume figures are outside the network bandwidths of most corporate organisations or educational institutions, and therefore only those who are in the business of offering Usenet news can afford it. At the other end of the scale, it is perfectly feasible for a small office to subscribe to a well-trimmed subset of Usenet newsgroups, and exclude most of the high-volume newsgroups. Starcom Software, where the authors of this HOWTO work, has worked with a fairly large subset of 600 newsgroups, which is still a tiny fraction of the 15,000+ newsgroups that the carrier class services offer. Your office or college may not even need 600 groups. And our company had excluded specific high-volume but low-usefulness newsgroups like the talk, comp.binaries, and alt hierarchies. With the pruned subset, the total volume of articles per day may amount to barely a hundred MBytes a day or so, and can be easily handled by most small offices and educational institutions. And in such situations, a single Intel Linux server can deliver excellent performance as a Usenet server. Then there's the internal Usenet service. By internal here, we mean a private set of Usenet newsgroups, not a private computer network. 
Every company or university which runs a Usenet news service creates its own hierarchy of internal newsgroups, whose articles never leave the campus or office, and which therefore do not consume Internet bandwidth. These newsgroups are often the ones most hotly accessed, and will carry more internally generated traffic than all the ``public'' newsgroups you may subscribe to, within your organisation. After all, how often does a guy have something to say which is relevant to the world at large, unless he's discussing a globally relevant topic like ``Unix rules!''? If such internal newsgroups are the focus of your Usenet servers, then you may find that fairly modest hardware and Internet bandwidth will suffice, depending on the size of your organisation.

The new Usenet server administrator has to undertake a sizing exercise to ensure that he does not bite off more than he, or his network resources, can chew. We hope we have provided sufficient information for him to get started with the right questions.

2. Principles of Operation

Here we discuss the basic concepts behind the operation of a Usenet news system.

2.1. Newsgroups and articles

A Usenet news article sits in a file or in some other on-disk data structure on the disks of a Usenet server, and its contents look like this:
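The sketch below is illustrative only. The article text in it is a hypothetical stand-in whose header values are chosen to match the description in this section (two newsgroups, article numbers 211 and 452, a Path through purva); the address, date, subject, and message ID are invented. It uses Python's standard email parser, which copes with Usenet articles because they share the header syntax of email messages, and the helper name parse_xref is our own.

```python
from email import message_from_string

# Hypothetical article: every value here is an assumption for illustration.
SAMPLE_ARTICLE = """\
Path: news.starcomsoftware.com!purva!shuvam
From: shuvam@example.invalid (Shuvam)
Newsgroups: starcom.tech.misc,starcom.tech.security
Subject: An illustrative subject line
Date: Mon, 01 Jul 2002 10:00:00 +0530
Message-ID: <hypothetical-id-0001@purva.example.invalid>
Distribution: starcom
Xref: news.starcomsoftware.com starcom.tech.misc:211 starcom.tech.security:452

This is the body of the article.
"""

def parse_xref(value):
    """Split an Xref header into (server, {newsgroup: article_number})."""
    server, *locations = value.split()
    numbers = {}
    for location in locations:
        group, _, number = location.rpartition(":")
        numbers[group] = int(number)
    return server, numbers

article = message_from_string(SAMPLE_ARTICLE)
newsgroups = [g.strip() for g in article["Newsgroups"].split(",")]
path_hops = article["Path"].split("!")      # most recent hop first
server, article_numbers = parse_xref(article["Xref"])

print(newsgroups)
print(path_hops)
print(server, article_numbers)
```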
A Usenet article's header is very interesting if you want to learn about the functioning of the Usenet. The From, Subject, and Date headers are familiar to anyone who has used email. The Message-ID header contains a unique ID for each message, and is present in each email message, though not many non-technical email users know about it. The Content-Type and Mime-Version headers are used for MIME encoding of articles, attaching files and other attachments, and so on, just like in email messages. The Organization header is an informational header which is supposed to carry some information identifying the organisation to which the author of the article belongs.

What remains now are the Newsgroups, Xref, Path and Distribution headers. These are special to Usenet articles and are very important.

The Newsgroups header specifies which newsgroups this article should belong to.

The Distribution header, sadly under-utilised in today's globalised Internet world, allows the author of an article to specify how far the article will be re-transmitted. The author of an article, working in conjunction with well-configured networks of Usenet servers, can control the ``radius'' of replication of his article, thus posting an article of local significance into a newsgroup but setting the Distribution header to some suitable setting, e.g. local or starcom, to prevent the article from being relayed to servers outside the specified domain.

The Xref header specifies the precise article number of this article in each of the newsgroups in which it is inserted, for the current server. When an article is copied from one server to another as part of a newsfeed, the receiving server throws away the old Xref header and inserts its own, with its own article numbers. This indicates an interesting feature of the Usenet system: each article in a Usenet server has a unique number (an integer) for each newsgroup it is a part of.
Our sample above has been added to two newsgroups on our server, and has the article numbers 211 and 452 in those groups. Therefore, any Usenet client software can query our server and ask for article number 211 in the newsgroup starcom.tech.misc and get this article. Asking for article number 452 in starcom.tech.security will fetch the article too. On another server, the numbers may be very different.

The Path header specifies the list of machines through which this article has travelled before it has reached the current server. UUCP-style syntax is used for this string. The current example indicates that a user called shuvam first wrote this article and posted it onto a computer which calls itself purva, and this computer then transferred this article by a newsfeed to news.starcomsoftware.com. The Path header is critical for breaking loops in newsfeeds, and will be discussed in detail later.

Our sample article will sit in the two newsgroups listed above forever, unless expired. The Usenet software on a server is usually configured to expire articles based on certain conditions, e.g. after an article is older than a certain number of days. The C-News software we use allows expiry control based on the newsgroup hierarchy and the type of newsgroup, i.e. moderated or unmoderated. Against each class of newsgroups, it allows the administrator to specify a number of days after which the article will be expired. It is possible for an article to control its own expiry, by carrying an Expires header specifying a date and time. Unless overridden in the Usenet server software, the article will be expired only after its explicit expiry time is reached.

2.2. Of readers and servers

Computers which access Usenet articles are broadly of two classes: the readers and the servers. A Usenet server carries a repository of articles, manages them, handles newsfeeds, and offers its repository to authorised readers to read.
A Usenet reader is merely a computer with the appropriate software to allow a user to access a server, fetch articles, post new articles, and keep track of which articles the user has read in each newsgroup. In terms of functionality, Usenet reading software is less interesting to a Usenet administrator than Usenet server software. However, in terms of lines of code, the Usenet reader software can often be much larger than Usenet server software, primarily because of the complexities of modern GUI code.

Most modern computers almost exclusively access Usenet servers using the NNTP (Network News Transfer Protocol) for reading and posting. This protocol can also be used for inter-server communication, but those aspects will be discussed later. The NNTP protocol, like any other well-designed TCP-based Internet protocol, carries ASCII commands and responses terminated with CR-LF, and comprises a sequence of commands, somewhat reminiscent of the POP3 protocol for email.

Using NNTP, a Usenet reader program connects to a Usenet server, asks for a list of active newsgroups, and receives this (often huge) list. It then sets the ``current newsgroup'' to one of these, depending on what the user wants to browse through. Having done this, it gets the meta-data of all current articles in the group, including the author, subject line, date, and size of each article, and displays an index of articles to the user.

The user then scans through this list, selects an article, and asks the reader to fetch it. The reader gives the article number of this article to the server, and fetches the full article for the user to read through. Once the user finishes his NNTP session, he exits, and the reader program closes the NNTP socket. It then (usually) updates a local file in the user's home area, keeping track of which news articles the user has read. These articles are typically not shown to the user next time, thus allowing the user to progress rapidly to new articles in each session.
The reader software is helped along in this endeavour by the Xref header, using which it knows all the different identities by which a single article is identified in the server. Thus, if you read the sample article given above by accessing starcom.tech.misc, you'll never be shown this article again when you access starcom.tech.misc or starcom.tech.security; your reader software will do this by tracking the Xref header and mapping article numbers.

When a user posts an article, he first composes his message using the user interface of his reader software. When he finally gives the command to send the article, the reader software contacts the Usenet server using the pre-existing NNTP connection and sends the article to it. The article carries a Newsgroups header with the list of newsgroups to post to, often a Distribution header with a distribution specification, and other headers like From, Subject etc. These headers are used by the server software to do the right thing. Special and rare headers like Expires and Approved are acted upon when present. The server assigns a new article number to the article for each newsgroup it is posted to, and creates a new Xref header for the article.

Transfer of articles between servers is done in various ways, and is discussed in quite a bit of detail in the section titled ``Newsfeeds'' below.

2.3. Newsfeeds

2.3.1. Fundamental concepts

When we try to analyse newsfeeds in real life, we begin to see that, for most sites, traffic flow is not symmetrical in both directions. We usually find that one server will feed the bulk of the world's articles to one or more secondary servers every day, and receive a few articles written by the users of those secondary servers in exchange. Thus, we usually find that articles flow down from the stem to the branches to the leaves of the worldwide Usenet server network, and not exactly in a totally balanced mesh flow pattern.
Therefore, we use the term ``upstream server'' to refer to the server from which we receive the bulk of our daily dose of articles, and ``downstream server'' to refer to those servers which receive the bulk dose of articles from us. Newsfeeds relay articles from one server to their ``next door neighbour'' servers, metaphorically speaking. Therefore, articles move around the globe, not by a massive number of single-hop transfers from the originating server to every other server in the world, but in a sequence of hops, like passing the baton in a relay race. This increases the latency time for an article to reach a remote tertiary server after, say, ten hops, but it allows tighter control of what gets relayed at every hop, and helps in redundancy, decentralisation of server loads, and conservation of network bandwidth. In this respect, Usenet newsfeeds are more complex than HTTP data flows, which typically use single-hop techniques. Each Usenet news server therefore has to worry about newsfeeds each time it receives an article, either by a fresh post or from an incoming newsfeed. When the Usenet server digests this article and files it away in its repository, it simultaneously looks through its database to see which other server it should feed the article to. In order to do this, it carries out a sequence of checks, described below. Each server knows which other servers are its ``next door neighbours;'' this information is kept in its newsfeed configuration information. Against each of its ``next door neighbours,'' there will be a list of newsgroups which it wants, and a list of distributions. The new article's list of newsgroups will be matched against the newsgroup list of the ``next door neighbour'' to see whether there's even a single common newsgroup which makes it necessary to feed the article to it. If there's a matching newsgroup, and the server's distribution list matches the article's distribution, then the article is marked for feeding to this neighbour. 
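The matching step described above can be sketched as follows. This is a simplified illustration, not the logic of any particular news server: the Neighbour class, the prefix-style group matching, and the rule that an empty distribution list means "no restriction" are all assumptions made for the example. Real feed configuration files support richer patterns, such as exclusions, which are not modelled here.

```python
from dataclasses import dataclass, field

@dataclass
class Neighbour:
    """One entry in a hypothetical newsfeed configuration."""
    name: str
    wanted_groups: list                       # hierarchy prefixes, e.g. ["comp"]
    wanted_distributions: list = field(default_factory=list)  # empty: any

def group_matches(newsgroup, pattern):
    """True if newsgroup equals the pattern or lies below it in the hierarchy."""
    return newsgroup == pattern or newsgroup.startswith(pattern + ".")

def should_feed(neighbour, newsgroups, distributions):
    """Decide whether an article should be marked for feeding to a neighbour."""
    wants_group = any(
        group_matches(group, pattern)
        for group in newsgroups
        for pattern in neighbour.wanted_groups
    )
    if not wants_group:
        return False
    if not neighbour.wanted_distributions or not distributions:
        # Assumption: no restriction on either side means no filtering.
        return True
    return any(d in neighbour.wanted_distributions for d in distributions)

branch = Neighbour("branch-office", ["starcom", "comp.os.linux"],
                   ["starcom", "world"])
print(should_feed(branch, ["starcom.tech.misc"], ["starcom"]))      # True
print(should_feed(branch, ["alt.test"], ["world"]))                 # False
print(should_feed(branch, ["comp.os.linux.announce"], ["local"]))   # False
```

A single common newsgroup is enough to pass the first test, which is why a cross-posted article reaches a neighbour that carries only one of its groups.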
When the neighbour receives the article as part of the feed, it performs some sanity checks of its own.

The first check it performs is on the Newsgroups header of the new article. If none of the newsgroups listed there are part of the active newsgroups list of this server, then the article can be rejected. An article rejected thus may even be queued for outgoing feeds to other servers, but will not be digested for incorporation into the local article repository.

The next check performed is against the Path header of the incoming article. If this header lists the name of the current Usenet server anywhere, it indicates that the article has already passed through this server at least once before, and is now re-appearing here erroneously because of a newsfeed loop. Such loops are quite often configured into newsfeed topologies for redundancy: ``I'll get the articles from Server X if not Server Y, and may the first one in win.'' The Usenet server software automatically detects a duplicate feed of an article and rejects it.

The next check is against what is called the server's history database. Every Usenet server has a history database, which is a list of the message IDs of all current articles in the local repository. Oftentimes the history database also carries the message IDs of all messages recently expired. If the incoming article's message ID matches any of the entries in the database, then again it is rejected without being filed in the local repository. This is a second loop detection method. Sometimes, merely checking the article's Path header does not detect all potential problems, because the problem may be a re-insertion instead of a loop. A re-insertion happens when the same incoming batch of news articles is re-fed into the local server, perhaps after recovering the system's data from tapes after a system crash. In such cases, there's no newsfeed loop, but there's still the risk that one article may be digested into the local server twice.
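The three receiving-side checks can be sketched in a few lines. This is a minimal illustration under stated assumptions: the function name, the use of a plain dict for headers, and the in-memory sets standing in for the active file and the history database are all hypothetical, and the host names are placeholders.

```python
def accept_article(headers, this_server, active_groups, history):
    """Return (accepted, reason) for an incoming article.

    headers: dict holding "Newsgroups", "Path" and "Message-ID" values
    active_groups: set of newsgroups carried by this server
    history: set of message IDs already seen (current or recently expired)
    """
    groups = [g.strip() for g in headers["Newsgroups"].split(",")]
    if not any(g in active_groups for g in groups):
        return False, "no active newsgroup"

    if this_server in headers["Path"].split("!"):
        return False, "loop: already passed through this server"

    if headers["Message-ID"] in history:
        return False, "duplicate message ID"

    history.add(headers["Message-ID"])   # remember it for later checks
    return True, "accepted"

active = {"starcom.tech.misc"}
history = set()
incoming = {
    "Newsgroups": "starcom.tech.misc",
    "Path": "purva!shuvam",
    "Message-ID": "<1@purva.example.invalid>",
}
first = accept_article(incoming, "news.example.invalid", active, history)
second = accept_article(incoming, "news.example.invalid", active, history)
print(first)    # (True, 'accepted')
print(second)   # (False, 'duplicate message ID')
```

The second call stands in for a re-inserted batch: the Path check passes, because the article never looped through this server, and only the message ID lookup stops it from being digested twice.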
The history database prevents this.

All these simple checks are very effective, and work across server and software types, as per the Internet standards. Together, they allow robust and fail-safe Usenet article flow across the world.

2.3.2. Types of newsfeeds

This section explains the basics of newsfeeds, without getting into details of software and configuration files.

2.3.2.1. Queued feeds

This is the commonest method of sending articles from one server to another, and is followed whenever large volumes of articles are to be transferred per day. This approach needs a one-time modification to the upstream server's configuration for each outgoing feed, to define a new queue.

In essence all queued feeds work in the following way. When the sending server receives an article, it processes it for inclusion into its local repository, and also checks through all its outgoing feed definitions to see whether the article needs to be queued for any of the feeds. If yes, it is added to a queue file for each outgoing feed. The precise details of the queue file can change depending on the software implementation, but the basic processes remain the same. A queue file is a list of queued articles, but does not contain the article contents. Typical queue files are ASCII text files with one line per article giving the path to a copy of the article in the local spool area.

Later, a separate process picks up each queue file and creates one or more batches for each outgoing feed. A batch is a large file containing multiple Usenet news articles. Once the batches are created, various transport mechanisms can be used to move the files from sending server to receiving server. You can even use scripted FTP. You only need to ensure that the batch is picked up from the upstream server and somehow copied into a designated incoming batch directory in the downstream server.
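The step from queue file to batch can be sketched as below. This is an illustration under assumptions, not the batcher shipped with any news software: the function name and the max_batch_bytes limit are hypothetical, and the queue file is taken to hold exactly one article path per line, as described above. The "#! rnews" line written before each article is the separator used by the traditional rnews batch format.

```python
from pathlib import Path

def build_batches(queue_file, max_batch_bytes=500_000):
    """Yield batches (as bytes) built from the articles listed in queue_file.

    Each line of queue_file holds the path of one article in the spool.
    Every article in a batch is preceded by a "#! rnews <size>" line.
    """
    batch = bytearray()
    for line in Path(queue_file).read_text().splitlines():
        article_path = line.strip()
        if not article_path:
            continue
        try:
            body = Path(article_path).read_bytes()
        except FileNotFoundError:
            continue  # article was expired or cancelled after being queued
        entry = b"#! rnews %d\n" % len(body) + body
        if batch and len(batch) + len(entry) > max_batch_bytes:
            yield bytes(batch)      # current batch is full, start a new one
            batch = bytearray()
        batch += entry
    if batch:
        yield bytes(batch)
```

Each yielded batch could then be written to a file and handed to whatever transport is in use. Compression, which batchers commonly apply before transmission, is left out of the sketch.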
UUCP has traditionally been the mechanism of choice for batch movement, because it predates the Internet and wide availability of fast packet-switched data networks. Today, with TCP/IP everywhere, UUCP once again emerges as the most logical choice of batch movement, because it too has moved with the times: it can work over TCP.

NNTP is the de facto mechanism of choice for moving queued newsfeeds for carrier-class Usenet servers on the Internet, and unfortunately, for a lot of other Usenet servers as well. The reason why we find this choice unfortunate is discussed in Section 12.1 below. But in NNTP feeds, an intermediate step of building batches out of queue files can be eliminated --- this is both its strength and its weakness.

In the case of queued NNTP feeds, articles get added to queue files as described above. An NNTP transmit process periodically wakes up, picks up a queue file, and makes an NNTP connection to the downstream server. It then begins a processing loop where, for each queued article, it uses the NNTP IHAVE command to inform the downstream server of the article's message ID. The downstream server checks its local repository to see whether it already has the message. If not, it responds asking for the article to be sent (a SENDME response, NNTP reply code 335). The transmitting server then pumps out the article contents in plaintext form. When all articles in the queue have been thus processed, the sending server closes the connection.

If the NNTP connection breaks in between due to any reason, the sending server truncates the queue file and retains only those articles which are yet to be transmitted, thus minimising repeat transmissions.

A queued NNTP feed works with the sending server making an NNTP connection to the receiving server. This implies that the receiving server must have an IP address which is known to the sending server or can be looked up in the DNS.
If the receiving server connects to the Internet periodically using a dialup connection and works with a dynamically assigned IP address, this can get tricky. UUCP feeds suffer no such problems because the sending server for the newsfeed can be the UUCP server, i.e. passive. The receiving server for the feed can be the UUCP master, i.e. the active party. So the receiving server can then initiate the UUCP connection and connect to the sending server. Thus, if even one of the two parties has a static IP address, UUCP queued feeds can work fine.

NNTP feeds can be sent out a little faster than the batched transmission processes used for UUCP and other older methods, because no batches need to be constructed. However, NNTP is often used in newsfeeds where it is not necessary and it results in colossal waste of bandwidth.

Before we study efficiency issues of NNTP versus batched feeds, we will cover another way feeds can be organised using NNTP: the pull feeds.

2.3.2.2. Pull feeds

This method of transferring a set of articles works only over NNTP, and requires absolutely no configuration on the transmitting, or upstream, server. In fact, the upstream server cannot even easily detect that the downstream server is pulling out a feed --- it appears to be just a heavy and thorough newsreader, that's all.

This pull feed works by the downstream server pulling out articles one by one, just like any NNTP newsreader, using the NNTP ARTICLE command with the Message-ID as parameter. The interesting detail is how it gets the message IDs to begin with. For this, it uses an NNTP command, specially designed for pull feeds, called NEWNEWS. This command takes a hierarchy and a date as parameters.
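As a rough illustration, the pieces of a pull feed can be sketched as pure functions that build the request and process the reply. The command format follows our reading of the NNTP specification (RFC 977), where the date is written as YYMMDD and the time as HHMMSS; the function names and the sample message IDs are hypothetical, and no network connection is made.

```python
from datetime import datetime

def newnews_command(hierarchy, since):
    """Format a NEWNEWS request for one hierarchy and a cut-off time."""
    return f"NEWNEWS {hierarchy}.* {since:%y%m%d %H%M%S} GMT"

def parse_id_list(response_lines):
    """Collect message IDs from a multi-line reply ending in a lone '.'."""
    ids = []
    for line in response_lines:
        if line == ".":
            break
        ids.append(line)
    return ids

def ids_to_fetch(offered_ids, local_ids):
    """Keep only the message IDs that the local article database lacks."""
    return [mid for mid in offered_ids if mid not in local_ids]

cmd = newnews_command("comp", datetime(1997, 8, 15))
print(cmd)   # NEWNEWS comp.* 970815 000000 GMT

# Lines that would follow the status line of the upstream server's reply.
offered = parse_id_list(["<a@x.invalid>", "<b@x.invalid>", "."])
wanted = ids_to_fetch(offered, {"<a@x.invalid>"})
for mid in wanted:
    print(f"ARTICLE {mid}")   # ARTICLE <b@x.invalid>
```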
When the downstream server sends this command over NNTP with, say, the comp hierarchy and the date 15 August 1997, it in effect asks the upstream server to list all news articles in the comp hierarchy which are newer than that date. The upstream server responds with a (often huge) list of message IDs, one per line, ending with a period on a line by itself. The pulling server then compares each newly received message ID with its own article database and makes a (possibly shorter) list of all articles which it does not have, thus eliminating duplicate fetches. That done, it begins fetching articles one by one, using the NNTP ARTICLE command as mentioned above.

In addition, there is another NNTP command, NEWGROUPS, which allows the NNTP client --- i.e. the downstream server in this case --- to ask its upstream server which new newsgroups have been created since a given date. This allows the downstream server to add the new groups to its active file.

The NEWNEWS based approach is usually one of the most inefficient methods of pulling out a large Usenet feed. By inefficiency, here we refer to the CPU loads and RAM utilisation on the upstream server, not to bandwidth usage. This inefficiency is because most Usenet news servers do not keep their article databases indexed by hierarchy and date; C-News certainly does not. This means that a NEWNEWS command issued to an upstream server will put that server into a sequential search of its article database, to see which articles fit into the hierarchy given and are newer than the given date.

If pull feeds were to become the most common way of sending out articles, then all upstream servers would badly need an efficient way of sorting their article databases to allow each NEWNEWS command to rapidly generate its list of matching articles. A slow upstream server today might take minutes to begin responding to a NEWNEWS command, and the downstream server may time out and close its NNTP connection in the meanwhile.
We have often seen this happen, until we tweaked the timeouts.

There are basic efficiency issues of bandwidth utilisation involved in NNTP for news feeds, which are applicable for both queued and pull feeds. But the problem with NEWNEWS is unique to pull feeds, and relates to server loads, not bandwidth wastage.

2.4. Control messages

The Usenet is a massive dispersed collection of servers which operate almost without any supervision, provided they have adequate disk space and do not suffer disk corruption due to power failures, etc. (It is indeed surprising how self-managing a good Usenet server is, provided these two pre-requisites are met.) These servers are each under the control of human administrators, but it is preferable that certain routine actions be performed across all these servers remotely from one location, without the manual intervention of these humans.

One common need for centralised operations is the creation of new groups in the standard eight hierarchies. The Usenet follows a fairly formal process which asks for votes from readers worldwide before deciding on the restructuring of its newsgroups list, including merging of low-volume groups, splitting of high-volume groups into many specialised groups, creating new groups, and even deleting groups. Once the voting process for a change concludes and the change action is to be carried out, it would be extremely tedious to send email to the hundreds of thousands of Usenet administrators and hope that they make the changes right, and answer their doubts if they get confused. It would be much better to have an automatic way to make the changes across all servers, of course with proper authorisation.

The solution to this does not lie in giving some central authority the ability to run an OS-level command of his choice on all the world's Usenet servers, because OS commands differ from OS to OS, and because few Usenet administrators would trust a stranger from another part of the world with OS level access.
Therefore, the solution lay in defining a small set of common Usenet maintenance actions, and permitting only these actions to be triggered on all servers through the passing of special command messages, called control messages. Control messages look like ordinary Usenet articles, more or less. They have an extra header line, with its value in a specific format, but they usually carry body text which looks like a normal human-written article. Here is a control message (a spurious one at that, but it'll do for now):
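The sketch below is a stand-in illustration, not a real control message: its header values are assumptions chosen to match the newgroup example discussed in this section, and the addresses and message ID are placeholders. It shows how the Control header can be split into a command and its parameters using Python's standard email parser; the function names are our own.

```python
from email import message_from_string

# Hypothetical control message: every value is an assumption for illustration.
CONTROL_MESSAGE = """\
From: someone@example.invalid
Newsgroups: humanities.hipcrime
Subject: cmsg newgroup humanities.hipcrime
Control: newgroup humanities.hipcrime
Approved: someone@example.invalid
Message-ID: <hypothetical-ctl-0001@example.invalid>

Free-form text explaining the request goes here.
"""

def parse_control(article_text):
    """Return (command, parameters) from the Control header, or None."""
    control = message_from_string(article_text)["Control"]
    if control is None:
        return None
    parts = control.split()
    if not parts:
        return None
    return parts[0], parts[1:]

def has_approval(article_text):
    """True if the article carries an Approved header."""
    return message_from_string(article_text)["Approved"] is not None

command, parameters = parse_control(CONTROL_MESSAGE)
print(command, parameters)            # newgroup ['humanities.hipcrime']
print(has_approval(CONTROL_MESSAGE))  # True
```

Parsing the header is only the first step. Whether the command is actually carried out is a separate authorisation decision made by the server.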
A control message must have a Control header. Besides, all control messages will have an Approved header, like messages posted to moderated newsgroups. The Control header actually specifies a command to run on the local server, and the parameter(s) to supply to it. The local Usenet server software is supposed to figure out its own way to get the task done. In this example, the command in the Control header is newgroup, which creates a new newsgroup. And its parameter is humanities.hipcrime, which gives the name of the newsgroup to create. In C-News, the control message implementation works through separate shellscripts kept in a fixed directory, $NEWSBIN/ctl/, as a security measure; if the executable script isn't present there, the control message command will be ignored. The control message types supported are:
The Usenet news software maintains a pseudo-newsgroup called control, where it files all control messages it receives. If you have an incoming newsfeed from the public Usenet, your server's control group will usually be full of thousands of cancel messages from trigger-happy fingers all over the world. Usenet news server software like C-News allows you to filter the incoming feed based on newsgroups, and will discard articles for groups your server does not subscribe to. But since all servers have to receive and process control messages, they will all accept these cancel messages, though many of them may apply to articles which are not part of your highly-pruned subset of groups. C'est la vie. Remember to set expiry for the control group to one day or even shorter, so that the junk can be cleaned out as rapidly as possible, just like the junk newsgroup.

The beauty of the control message architecture is that it integrates seamlessly into the newsfeed mechanism for automatic control of the network of servers. No separate channel of connection is needed for the control actions. And article replication automatically propagates control messages with human-readable articles, thus guaranteeing reach across heterogeneous network technologies.

What your Usenet server does on receiving a control message is governed by an authorisation file: $NEWSCTL/controlperms in the case of C-News and control.ctl in the case of INN, for instance. The security measures implemented by this module are further enhanced by the pgpcontrol package with its pgpverify script. Using pgpverify, your server can check that all control messages (except for article cancellation messages) are digitally signed by a trusted party using strong public key cryptography (PGP). Our integrated Usenet news software distribution includes integration with pgpverify.

3. Usenet news software

3.1. A brief history of Usenet systems

Towards the end of this HOWTO, we have added some information about the history of Usenet server software by quoting sections from an earlier Usenet Periodic Posting. We consider this historical perspective, and the Usenix papers and other documents referred to in it, essential reading for any Usenet server administrator. Please see the section titled "Usenet software: a historical perspective".

3.2. C-News and NNTPd

C-News was written by Henry Spencer and Geoff Collyer of the Department of Zoology, University of Toronto, almost entirely in shell and awk, as a replacement for an earlier system called B-News. The focus was on adding some extra features and a lot of performance. The first release was called Shellscript Release, which was deployed by a very large number of servers worldwide, as a natural upgrade to B-News. This version of C-News had upward compatibility with B-News meta-data, e.g. history files. This was the version of C-News which was initially rolled out in 1991 or so at the National Centre for Software Technology (NCST, http://www.ncst.ernet.in) and the Indian Institutes of Technology in India as part of the Indian educational and research network (ERNET). We received guidance from the NCST about Usenet news installation and management.

The Shellscript Release was soon followed by a re-write with a lot more C code, called Performance Release, and then a set of cleanup and component integration steps leading to the last release called the Cleanup Release. This Cleanup Release was patched many times by the authors, and the last one was CR.G (Cleanup Release revision G). The version of C-News discussed in this HOWTO is a set of small bug fixes on CR.G.

Since C-News came from shellscript-based antecedents, its architecture followed the set-of-programs style so typical of Unix, rather than the large monolithic software systems traditional to some other OSs.
All pieces had well-defined roles, and therefore could be easily replaced with other pieces as needed. This allowed easy adaptation and upgrades. This never affected performance, because key components which did a lot of work at high speed, e.g. newsrun, had been rewritten in C by that time. Even within the shellscripts, crucial components which handled binary data, e.g. a component called dbz to manipulate efficient on-disk hash arrays, were C programs with command-line interfaces, called from scripts.

C-News was born in a world with widely varying network line speeds, where bandwidth utilisation was a big issue and dialup links with UUCP file transfers were common. Therefore, it has strong support for batched feeds, especially with a variety of compression techniques and over a variety of fast and slow transport channels. And C-News virtually does not know the existence of TCP/IP, other than one or two tiny batch transport programs like viarsh. However, its design was so modular that there was absolutely no problem in plugging in NNTP functionality using a separate set of C programs without modifying a single line of C-News. This was done by a program suite called NNTP Reference Implementation, which we call NNTPd.

This software suite could work with B-News and C-News article repositories, and provided the full NNTP functionality. Since B-News died a gradual death, the combination of C-News and NNTPd became a freely redistributable, portable, modern, extensible, and high-performance software suite for Unix Usenet servers. Further refinements were added later, e.g. nov, the News Overview package, and pgpverify, a public-key-based digital signature module to protect Usenet news servers against fraudulent control messages.

3.3. INN

INN is one of the two most widely used Usenet news server solutions. It was written by Rich Salz for Unix systems which have a socket API (probably all Unix systems do, today). INN has an architecture diametrically opposite to C-News.
It is a monolithic program, which is started at bootup time, and keeps running till your server OS is shut down. This is like the way high performance HTTP servers are run in most cases, and allows INN to cache a lot of things in its memory, including message-IDs of recently posted messages, etc. This architecture has been discussed in an interesting paper by the author, where he explains the problems of the older B-News and C-News systems that he tried to address. Anyone interested in Usenet software in general and INN in particular should study this paper.

INN addresses a Usenet news world which revolves around NNTP, though it has support for UUCP batches, a fact that not many INN administrators seem to talk about. INN works faster than the C-News-NNTPd combination when processing multiple parallel incoming NNTP feeds. For multiple readers reading and posting news over NNTP, there is no difference between the efficiency of INN and NNTPd. Section 5.7 discusses the efficiency issues of INN over the earlier C-News architecture, based on Rich Salz' paper and our analyses of usage patterns.

INN's architecture has inspired a lot of high-performance Usenet news software, including a lot of commercial systems which address the ``carrier class'' market. That is the market for which the INN architecture has clear advantages over C-News.

3.4. Leafnode

This is an interesting software system, to set up a ``small'' Usenet news server on one computer which only receives newsfeeds but does not have the headache of sending out bulk feeds to other sites, i.e. it is a ``leaf node'' in the newsfeed flow diagram. According to its homepage (www.leafnode.org), ``Leafnode is a USENET software package designed for small sites running any flavour of Unix, with a few tens of readers and only a slow link to the net. [...]
The current version is 1.9.24.'' This software is a sort of combination of article repository and NNTP news server, and receives articles, digests and stores them on the local hard disks, expires them periodically, and serves them to an NNTP reader. It is claimed that it is simple to manage and is ideal for installation on a desktop-class Unix or Linux box, since it does not take up many resources.

Leafnode is based on an appealing idea, but we find no problem using C-News and NNTPd on a desktop-class box. The resource consumption of C-News is roughly proportional to the volume of articles you want it to process, and the number of groups you'll want to retain for a small team of users will be easily handled by C-News on a desktop-class computer. An office of a hundred users can easily use C-News and NNTPd on a desktop computer running Linux, with 64 MBytes of RAM, IDE drives, and sufficient disk space. Of course, ease of configuration and management is dependent on familiarity, and we are more familiar with C-News than with Leafnode. We hope this HOWTO will help you in that direction.

There is, however, one area in which Leafnode is far easier to administer than INN or C-News. Leafnode constantly monitors the actual usage of the newsgroups it carries, based on readership statistics of its NNTP readers. If a particular newsgroup is not read at all by any user for a week, then Leafnode will delete all articles in that newsgroup, free up disk space, and stop fetching new articles for it. If it finds that a previously abandoned newsgroup is now again receiving attention, even from one user, then it'll fetch all articles for that group from its upstream server the next time it connects. This self-tuning feature of Leafnode is really an excellent advantage which makes a Leafnode site easier to manage, especially for small setups with bandwidth and disk space constraints.

The Leafnode Website gives a lot of details in an easily understood format.

TO BE EXTENDED AND CORRECTED.

3.5. Suck

Suck is a program which lets you pull out an NNTP feed from an NNTP server and file it locally. It does not contain any article repository management software, expecting you to do it using some other software system, e.g. C-News or INN. It can create batchfiles which can be fed to C-News, for instance. (Well, to be fair, Suck does have an option to store the fetched articles in a spool directory tree very much like what is used by C-News or INN in their article area, with one file per article. You can later read this raw message spool area using a mail client which supports the msgdir file layout for mail folders, like MH, perhaps. We don't find this option useful if you're running Suck on a Usenet server.) Suck finally boils down to a single command-line program which is invoked periodically, typically from cron. It has a zillion command-line options which are confusing at first, but later show how mature and finely tunable the software is.

If you need an NNTP pull feed, then we know of no better program than Suck for the job. The nntpxfer program which forms part of the NNTPd package also implements an NNTP pull feed, for instance, but does not have one-tenth of the flexibility and fine-tuning of Suck. One of the banes of the NNTP pull feed is connection timeouts; Suck allows a lot of special tuning to handle this problem. If we had to set up a Usenet server with an NNTP pull feed, we'd use Suck right away.

TO BE EXTENDED AND CORRECTED.

3.6. Carrier class software

Carrier-class servers are expected to handle a complete feed of all articles in all newsgroups, including a lot of groups which have what we call a ``high noise-to-signal ratio.'' They do not have the luxury of choosing a ``useful'' subset like administrators of internal corporate Usenet servers do. Secondly, carrier-class servers are expected to turn articles around very fast, i.e.
they are expected to have very low latency from the moment they receive an article to the time they retransmit it by NNTP to downstream servers. Third, they are supposed to provide very high availability, like other ``carrier class'' services. This usually means that they have parallel arrays of computers in load sharing configurations. And fourth, they usually do not cater to retail connections for reading and posting articles by human users. Usenet news carriers usually reserve separate computers to handle retail connections.

Thus, carrier-class servers do not need to maintain a repository of articles; they only need to focus on super-efficient real-time re-transmission. These highly specialised servers have software which receives an article over NNTP, parses it, and immediately re-queues it for outward transmission to dozens or hundreds of other servers. And since they work at these high throughputs, their downstream servers are also expected to be live on the Internet round the clock to receive incoming NNTP connections, or be prepared to lose articles. Therefore, there's no batching or long queueing needed, and C-News-style batching in fact is totally inapplicable.

Therefore, these carrier-class Usenet servers are more like packet routers than servers with repositories. They are referred to nowadays as NNTP routers or news routers. It can be seen why batch-oriented repository management software like C-News is a total anachronism here, and why they need an NNTP-oriented, online, real-time design. The INN antecedents of some of these systems are therefore natural.

We would love to hear from any Linux HOWTO reader whose Usenet server requirements include carrier-class behaviour. We are aware of only one freely redistributable NNTP router: NNTPRelay (see http://nntprelay.maxwell.syr.edu/); this software runs on NT. There is no reason why such services cannot run off Linux servers, even Intel Linux, provided you have fast network links and arrays of servers.
Linux as an OS platform is not an issue here.

TO BE EXTENDED AND CORRECTED.

4. Setting up CNews + NNTPd

4.1. Getting the sources and stuff

4.1.1. The sources

C-News software can be obtained from ftp://ftp.uu.net/networking/news/transport/cnews/cnews.tar.Z and will need to be uncompressed using the BSD uncompress utility or a compatible program. The tarball is about 650 KBytes in size. It has its own highly intelligent configuration and installation processes, which are very well documented. The version that is available is Cleanup Release revision G, on which our own version is based.

NNTPd (the NNTP Reference Implementation) is available from ftp://ftp.uu.net/networking/news/nntp/nntp.1.5.12.1.tar.Z. It has no automatic scripts and processes to configure itself. After fetching the sources, you will have to follow a set of directions given in the documentation and configure some C header files. These configuration settings must be done keeping in mind what you have specified when you build the C-News sources, because NNTPd and C-News must work together. Therefore, some key file formats, directory paths, etc., will have to be specified identically in both software systems.

The third software system we use is Nestor. This too is to be found in the same place where the NNTPd software is kept, at ftp://ftp.uu.net/networking/news/nntp/nestor.tar.Z. This software compiles to one binary program, which must be run periodically to process the logs of nntpd, the NNTP server which is part of NNTPd, and report usage statistics to the administrator. We have integrated Nestor into our source base.

The fourth piece of the system, without which no Usenet server administrator dares venture out into the wild world of public Internet newsfeeds, is pgpverify.

We have been working with C-News and NNTPd for many years now, and have fixed a few bugs in both packages.
We have also integrated the four software systems listed above, and added a few features here and there to make things work more smoothly. We offer our entire source base to anyone for free download from http://www.starcomsoftware.com/proj/usenet/src/news.tar.gz. There are no licensing restrictions on our sources; they are as freely redistributable as the original components we started with. When you download our software distribution, you will extract it to find a directory tree with the following subdirectories and files:
Needless to say, we believe that our source tree is a better place to start with than the original components, especially if you are installing a Usenet server on a Linux box and for the first time. We will be available on email to provide technical assistance should you run into trouble.

4.1.2. The key configuration files

Once you get the sources, you will need some key configuration files to seed your C-News system. These configuration files are actually database tables, and change frequently, whenever newsgroups are created, modified or deleted. These files specify the list of active newsgroups in the ``public'' Usenet. You can, and should, add your organisation's internal newsgroups to this list when you set up your own server, but you will need to know the list of public standard newsgroups to begin with. This list can be obtained from the same FTP server by downloading the files active.gz and newsgroups.gz from ftp://ftp.uu.net/networking/news/config/. You can create your own active and newsgroups files by retaining a subset of the entries in these two files. Both these are ASCII text files.

Getting the sources from our server will not obviate the need to get the latest versions of these files from ftp.uu.net. We do not (yet) maintain an up-to-date copy of these files on our server, and we will add no value to the original by just mirroring them.

4.2. Compiling and installing

Before installing, first make sure you have an entry for a user called news in your /etc/passwd file. This user will be the owner of the news database. Now download the source from us and untar it in the home directory of news. This creates two main directories, viz. c-news and nntp. To compile and install, run the script build.sh as root in the directory that contains the script. It is important that the script run as root, as it sets ownerships, and installs and compiles the source as user news.
This is a one-step process that puts in place both the C-News and the NNTP software, setting correct permissions and paths. Following is a brief description of what build.sh does:
4.3. Configuring the system: What and how to configure files?

Once installed, you now have to configure the system to accept feeds and batch them for your neighbours. You will have to do the following:
4.4. Testing the system

To locally test the system, follow the steps given below:
4.5. pgpverify and controlperms

As mentioned in Section 2.4, it becomes necessary to authenticate control messages to protect yourself from being attacked by pranksters. For this, you will have to configure the $NEWSCTL/controlperms file to declare whose control messages you are willing to honour and for what newsgroups, along with their public key ID. The controlperms manpage will give you details on the format. This will work only in association with pgpverify, which verifies the Usenet control messages that have been signed using the signcontrol process. The script can be found at ftp://ftp.isc.org/pub/pgpcontrol/pgpverify. pgpverify internally uses the PGP binary, which will have to be made available in the default executables directory. If you wish to send control messages for your local news system, you will have to digitally sign them using the above mentioned signcontrol program, which is available at ftp://ftp.isc.org/pub/pgpcontrol/signcontrol. You will also have to configure the signcontrol program accordingly.

4.6. Feeding off an upstream neighbour

For external feeds, commercial customers will have to buy them from a regular News Provider like dejanews.com or newsfeeds.com. You will have to specify to them what hierarchies you want and decide on the mode of transmission, i.e. UUCP or NNTP, based on your requirements. Once that is done, you will have to ask them to initiate feeds, and check the $NEWSARTS/in.coming directory to see if feeds are coming in.

If your organisation belongs to the academic community or is otherwise lucky enough to have an NDN (next door neighbour) server somewhere which is willing to provide you a free newsfeed, then the payment issue goes out of the picture, but the rest of the technical requirements remain the same.

One problem with incoming NNTP feeds is that it is far easier to use (relatively) efficient NNTP inflows if you have a server with a permanent Internet connection and a fixed IP address.
If you are a small office with a dialup Internet connection, this may not be possible. In that case, the only way to get incoming newsfeeds by NNTP may be by using a highly inefficient pull feed.

4.7. Configuring outgoing feeds

If you are a leaf node, you will only have to send feeds back to your news provider for your postings in public newsgroups to propagate to the outside world. To enable this, you need one line in the sys and batchparms files and one directory in $NEWSARTS/out.going. If you are willing to transmit articles to your neighbouring sites, you will have to configure sys and batchparms with more entries. The number of directories in $NEWSARTS/out.going shall increase, too. Refer to the first two sections of the chapter titled "Components of a running system" for a better understanding of outgoing feeds. Again, you will have to determine how you wish to transmit the feed: UUCP or NNTP.

4.7.1. By UUCP

For outgoing feeds by UUCP, we recommend that you start with Taylor UUCP. In fact, this is the UUCP version which forms part of the GNU Project and is the default UUCP on Linux systems.

A full treatment of UUCP configuration is beyond the scope of this document. However, the basic steps will be as follows. First, you will have to define a "system" in your Usenet server for the NDN (next door neighbour) host. This definition will include various parameters, including the manner in which your server will call the remote server, the protocol it will use, etc. Then an identical process will have to be followed on the NDN server's UUCP configuration, for your server, so that that server can recognize your Usenet server.

Finally, you will need to set up appropriate cron jobs for the user uucp to run uucico periodically. Taylor UUCP comes with a script called uusched which may be modified to your requirements; this script calls uucico. One uucico connection will both upload and download news batches. Smaller sites can run uusched even once or twice a day.
Later versions of this document will include the uusched scripts that we use in Starcom. We use UUCP over TCP/IP, and we run the uucico connection through an SSH tunnel, to prevent transmission of UUCP passwords in plain text over the Internet. Our SSH tunnel is established using public-key cryptography, without passwords being used anywhere.

4.7.2. By NNTP

For NNTP feeds, you will have to decide whether your server will be the connection initiator or connection recipient. If you are the connection initiator, you can send outgoing NNTP feeds more easily. If you are the connection recipient, then outgoing feeds will have to be pulled out of your server using the NNTP NEWNEWS command, which will place heavy loads on your server. This is not recommended.

Connecting to your NDN server for pushing out outgoing feeds will require the use of the nntpsend.sh script, which is part of the NNTPd source tree. This script will perform some housekeeping, and internally call the nntpxmit binary to actually send the queued set of articles out. You may have to provide authentication information like usernames and passwords to nntpxmit to allow it to connect to your NDN server, in case that server insists on checking the identity of incoming connections. (You can't be too careful in today's world.) nntpsend.sh will clean up after an nntpxmit connection finishes, and will requeue any unsent articles for the next session. Thus, even if there is a network problem, typically nothing is lost and all pending articles are transmitted next time.

Thus, pushing feeds out via NNTP may mean setting up nntpsend.sh properly, and then invoking it periodically from cron. If your Usenet server connects to the Internet only intermittently, then the process which sets up the Internet connection should be extended or modified to fire nntpsend.sh whenever the Internet link is established.
For instance, if you are using the Linux pppd, you can add statements to the /etc/ppp/ip-up script to change user to news and run nntpsend.sh.

5. Setting up INN

5.1. Getting the source

INN has been maintained and archived by the ISC (Internet Software Consortium, www.isc.org) since 1996, and the INN homepage is at http://www.isc.org/products/INN/. The latest release of INN as of the time of this writing is INN v2.3.3, released 7 May 2002. The full sources can be downloaded from ftp://ftp.isc.org/isc/inn/inn-2.3.3.tar.gz

6. Connecting email with Usenet news

Usenet news and mailing lists constantly remind us of each other. And the parallels are so strong that many mailing lists are gatewayed two-way with corresponding Usenet newsgroups, in the bit hierarchy which maps onto the old BITNET, and elsewhere.

There are probably ten different situations where a mailing list is better, and ten others where the newsgroup approach works better. The point to recognise is that the system administrator needs a choice of gatewaying one with the other, whenever tradeoffs justify it. Instead of getting into the tradeoffs themselves, this chapter will focus on the mechanisms of gatewaying the two worlds.

One clear and recurring use we find for this gatewaying is for mailing lists which are of general use to many employees in a corporate network. For instance, in a stockbroking company, many employees may like to subscribe to a business news mailing list. If each employee had to subscribe to the mailing list independently, it would waste mail spool area and perhaps bandwidth. In such situations, we receive the mailing list into an internal newsgroup, so that individual mailboxes are not overloaded. Everyone can then read the newsgroup, and messages are also archived till expired.

6.1. Feeding Usenet news to email

In C-News, this is trivially done by adding one line to the sys file, defining a new outgoing feed listing all the relevant groups and distributions, and specifying the command line to be executed to send each outgoing message to that ``feed.'' This command, in our case, should be a mail-sending program, e.g. /bin/mail user@somewhere.com.

This is often adequate to get the job done. We are sure almost every Usenet news software system will have an equally easy way of piping the feed of a newsgroup to an email address.

6.2. Feeding email to news: the mail2news gateway

Our Usenet software sources include an integrated set of scripts which we have been using for at least five years internally. This set of scripts is called mail2news. It contains one shellscript called mail2news, which takes an email message from stdin, processes it, and feeds the processed version to inews, the stdin-based news article injection utility of C-News. The inews utility accepts a new article post on its stdin and queues it for digestion by newsrun whenever it runs next.

To use mail2news, we assume you are using Sendmail to process incoming email. Our instructions can easily be modified to adapt to any Mail Transport Agent (MTA) of your choice. You will have to configure Sendmail or any other MTA to redirect incoming mails for the gateway to a program called m2nmailer, a Perlscript which accepts the incoming message on its standard input and a list of newsgroup names, space separated, on its command line. Sendmail can be easily configured to trigger m2nmailer this way by defining a new mailer in sendmail.cf, and directing all incoming emails meant for the Usenet news system to this mailer. Once you set up the appropriate rulesets for Sendmail, it automatically triggers m2nmailer each time an incoming email comes for the mail2news gateway.
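The kind of processing a mail-to-news gateway performs can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the code of our mail2news or m2nmailer scripts: the function name mail_to_article and the list of mail-only headers are hypothetical, and a real gateway would pipe the result to inews rather than print it.

```python
from email import message_from_string

# Headers that only make sense for mail transport. This list is an
# assumption made for the sketch; the real mail2news scripts may differ.
MAIL_ONLY_HEADERS = ("Received", "Return-Path", "Delivered-To", "To", "Cc", "Bcc")


def mail_to_article(raw_mail, newsgroups):
    """Turn one mail message (a string) into text shaped like a news article."""
    msg = message_from_string(raw_mail)
    for name in MAIL_ONLY_HEADERS:
        del msg[name]  # deleting a header that is absent is a no-op
    # Replace any existing Newsgroups header with the gateway's group list.
    del msg["Newsgroups"]
    msg["Newsgroups"] = ",".join(newsgroups)
    if msg["Subject"] is None:
        msg["Subject"] = "(no subject)"
    return msg.as_string()


# Example with a made-up message and a made-up internal newsgroup.
sample = "From: a@example.com\nTo: list@example.com\nSubject: Hi\n\nBody text\n"
article = mail_to_article(sample, ["local.test"])
print(article)
```

The newsgroup names are passed in as a list here, mirroring the way m2nmailer receives them on its command line.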
The precise configuration changes to Sendmail have already been specified in the chapter titled ``Setting up C-News + NNTPd.''

7. Security issues

It almost seems strange that we are discussing security issues in the context of Usenet news servers. Usenet news has been one of the most open and free-for-all network services traditionally. However, with the exponential growth of the Internet, all services are becoming aware of potential threats. The community of Internet intruders too has acquired new profiles: a lot of Internet intrusion attempts are program-driven, and exploit a set of ``well known'' vulnerabilities, i.e. vulnerabilities which have been identified by the computer security and intrusion community and published in their reports and advisories. Thus, the question of ``Why will someone attack my harmless Usenet server?'' is no longer valid. It will be attacked if it can be attacked, merely because its IP address falls in a range of addresses being targeted, perhaps.

Security issues for Usenet news servers fall into two categories. First come vulnerabilities which will allow an attacker to bring down your server or run code of his choice on it. Second come vulnerabilities which can distort or corrupt your Usenet article hierarchy, either by junk postings, unsolicited commercial messages, or forged control messages. The second category of threats is specific to Usenet news and needs Usenet-specific protection mechanisms, some of which require tapping into defence mechanisms designed by the Usenet administrator community.

7.1. Intrusion threats

Here we discuss the vulnerabilities which will allow an intruder to ``gain control'' of your Usenet server, or ``bring it down,'' either of which may be irritating, embarrassing, or downright disastrous for your business or occupation.

7.1.1. Generic server vulnerabilities

Foremost among these vulnerabilities are those which render any server vulnerable to intrusion attempts.
Most of these vulnerabilities are unrelated to Usenet news itself. For instance, if you have the Telnet service active on a server exposed to the Internet, then it is likely that systematic attempts by intruders to acquire usernames and passwords will bear fruit, using methods we will best leave to specialised texts on the subject. Once this is done, the intruder will merely ``walk into'' your server by Telnetting into it.

We will not discuss this class of vulnerabilities here any further; they belong in documents dedicated to general security issues. For further reading, check the ``Security HOWTO'', the ``Security Quickstart HOWTO'', the ``User Authentication HOWTO'', the ``VPN HOWTO'', and the ``VPN Masquerade HOWTO'' ... and that's just from the Linux HOWTO collection. As one can see, there is, if anything, a surfeit of material on this and related subjects.

There are vulnerabilities which allow an intruder to mount the so-called DoS attacks, which make your service inaccessible to legitimate users, even though they do not let the intruder in. The most publicised of these attacks were the SYNFlood and the Ping of Death attacks, both quite old and well-understood by now. A Linux server running a recent version of the kernel and properly configured should be immune to both these attack methods. But network protocols being what they are, there are always new DoS methods being thought up, which can temporarily overload or slow down a server. Once again, the texts discussing generic security issues are the best place to study these vulnerabilities.

7.1.2. Vulnerabilities in Usenet software

Then come server vulnerabilities, if any, which are caused specifically by Usenet news software. For instance, if it was possible for an intruder to issue some string of bytes to your server's NNTP server and cause it to execute a command of the intruder's choice, then this vulnerability would be in this category.
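One general defence against hostile strings of bytes is for the server never to read more from a client than it is prepared to handle. The following Python sketch illustrates the principle only; it is not code from NNTPd or INN, and the names read_command and LineTooLong are hypothetical. The 512-octet figure is the command line limit, including the terminating CRLF, given in the NNTP specification (RFC 3977).

```python
import io

MAX_LINE = 512  # octets, including the terminating CRLF


class LineTooLong(Exception):
    """Raised when a client sends a command line longer than MAX_LINE."""


def read_command(stream):
    """Read one command line from a binary stream, with a hard size limit.

    At most MAX_LINE + 1 bytes are ever read for a single command, so an
    oversized line is detected without buffering all of it.
    """
    line = stream.readline(MAX_LINE + 1)
    if len(line) > MAX_LINE:
        raise LineTooLong("command line exceeds %d octets" % MAX_LINE)
    return line.rstrip(b"\r\n").decode("utf-8", errors="replace")


# Example: a well-formed command read from an in-memory stream.
print(read_command(io.BytesIO(b"GROUP comp.os.linux.announce\r\n")))
```

A server built this way would typically answer an oversized line with an error response and close the connection, rather than try to interpret what it has read.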
Any server which accepts a text string as input from a client is open to the buffer overrun class of attacks, if the gets() C library function has been used in its code instead of fgets() with a buffer size limit. This was a vulnerability made famous by the 1988 Morris Internet Worm, discussions on which can be found elsewhere. (Go Google for it if you're keen.) As far as we know, the INN NNTP server and the nntpd which forms part of the NNTP Reference Implementation both have no known buffer overrun vulnerabilities.

This class of vulnerabilities is less significant in the case of NNTPd or INN because these daemons do not run as root. In fact, they would begin to cause malfunctioning of the underlying Usenet software if they ran as root. Therefore, even if an intrepid intruder could find some way of gaining control of these daemons, she would only be able to get into the server as user news, which means that she can play havoc with the Usenet installation, but no further. A daemon which runs as root, if compromised, can allow an intruder to take control of the operating system itself.

UUCP is generally believed to be insecure. We believe a careful configuration of Taylor UUCP plugs a lot of these vulnerabilities. One vulnerability with UUCP over TCP is that the username and password travel in plaintext form in TCP data streams, much like with Telnet or FTP. We therefore do not advise using UUCP over TCP in this manner if security is a concern at all. We recommend the use of UUCP through an SSH tunnel, with the SSH setup working only with a pre-installed public key. This way, there is no need for usernames and passwords for the SSH tunnel setup, and passwords cannot be leaked even intentionally. And the UUCP username and password then pass through this encrypted tunnel and are therefore totally superfluous for security; the preceding SSH tunnel provides a much stronger connection authentication than the UUCP username and password.
And since we set up our SSH tunnels to demand key-based authentication only, they reject any attempt to connect using usernames and passwords when the tunnel is being set up.

A third possible vulnerability is related to the back-end software which processes incoming Usenet articles. It is conceivable that an NNTP server will receive an incoming POST command, receive an article, and queue it for processing on the local spool; the NNTP server often does not perform any real-time processing on the incoming post. The post-processing software which periodically processes the incoming spool (the in.coming directory in C-News) will read this article and somehow be forced to run a command of the intruder's choice, either by buffer overrun vulnerabilities or any other means.

While this possibility exists, it appears that neither the C-News newsrun and family nor INN are vulnerable to this class of attempts. We base our comment on the solid evidence that both these systems have been around in an intrusion-prone world of public Usenet servers for more than a decade. INN, the newer of the two, completed one decade of life on 20 August 2002. And both these software systems had their source freely available to all, including intruders. We can be fairly certain that if vulnerabilities of this class have not been seen, it is not for want of intrusion attempts.

7.2. Vulnerabilities unique to the Usenet service

There are certain security precautions that a Usenet server administrator has to take to ensure that her servers are not swamped by irritating junk or configured out of shape by spurious control messages. These vulnerabilities do not allow an intruder to run her software on your servers, but allow her to mess up your server, causing you to lose a precious weekend (or week) straightening out the mess.

7.2.1. Unsolicited commercial messages

Unsolicited commercial messages are called SPAM. There is a war against SPAM being fought in the Internet community.
The biggest battlefront is in the world of email. Usenet newsgroups come second.

There are many tools that Usenet administrators use in their battle against SPAM. The most important of these is the NoCeM suite. See http://www.cm.org/ for details of NoCeM, and the newsgroup alt.nocem.misc for the SPAM cancel messages which NoCeM reads to identify which articles to discard. Your server will need a feed of alt.nocem.misc to use the NoCeM facility. These special messages are signed by NoCeM volunteers whose job is to identify SPAM articles, list their message-IDs, and then issue these deletion instructions, digitally signed with special private keys, which tell all Usenet servers to delete the SPAM messages. Your server's NoCeM software will need public key software (typically PGP) and a keyring with the public key of each NoCeM volunteer you want to accept instructions from.

Other anti-spam tools for Usenet services are listed in the Anti-SPAM Software Web page (http://www.exit109.com/~jeremy/news/antispam.html). The Cleanfeed software will clean out articles identified as SPAM. There are many others.

SPAM is such a nuisance and a drain on organisational expense pockets (by wasting bandwidth you pay for) that it is almost imperative today that every Usenet server protects itself against it. We will integrate some selected anti-SPAM measures into our integrated source distribution soon.

7.2.2. Spurious control messages

Control messages, discussed in detail earlier in Section 2.4, instruct a Usenet server to take certain actions, such as deleting a message or creating a newsgroup. If this facility is ``open to the public'', anyone with half a brain can forge control messages to create twenty new newsgroups, and then post thousands of articles into those groups. In the mid-nineties, we were hit by a storm of over 2,000 (two thousand) newgroup control messages, which rapidly taught us the danger of unprotected control messages and the protection against them.
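To make the risk concrete, here is a minimal Python sketch of the kind of gate a server needs in front of control messages. It is an illustration only, not code from C-News or INN. The function names parse_control and filter_control are hypothetical, and is_trusted is a stand-in for whatever real signature check the administrator uses. The only thing taken as given is that a control message carries a Control header whose first word is the command (newgroup, rmgroup, cancel and so on).

```python
from email import message_from_string


def parse_control(article_text):
    """Return (command, args) from the Control header, or None if absent."""
    msg = message_from_string(article_text)
    control = msg.get("Control")
    if not control or not control.strip():
        return None
    parts = control.split()
    return parts[0].lower(), parts[1:]


def filter_control(articles, is_trusted):
    """Split control messages into those to act upon and those to discard.

    is_trusted is a hypothetical callable taking a parsed message and
    returning True only when the sender has been verified.
    """
    accepted, discarded = [], []
    for text in articles:
        parsed = parse_control(text)
        if parsed is None:
            continue  # ordinary article, not a control message
        if is_trusted(message_from_string(text)):
            accepted.append(parsed)
        else:
            discarded.append(parsed)
    return accepted, discarded


def make_article(msgid, control):
    """Build a tiny demonstration article (hypothetical helper)."""
    return f"Message-ID: {msgid}\nControl: {control}\n\nbody\n"


if __name__ == "__main__":
    articles = [
        make_article("<ok@example.org>", "newgroup comp.example.real"),
        make_article("<f1@example.net>", "newgroup alt.junk.one"),
        make_article("<f2@example.net>", "rmgroup comp.example.real"),
    ]
    # Stand-in trust check for the demo only: a real server must verify a
    # cryptographic signature, never a header that anyone can forge.
    trusted_ids = {"<ok@example.org>"}
    accepted, discarded = filter_control(
        articles, lambda msg: msg["Message-ID"] in trusted_ids
    )
    print("act on:", accepted)
    print("discard:", discarded)
```

With no trust check at all, every forged newgroup or rmgroup in a storm like the one described above would be acted upon. With the gate in place, the forgeries simply land in the discard list.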
The standard protection mechanism against this vulnerability is pgpverify, which can be downloaded from multiple Websites and FTP mirror sites by searching for pgpverify (the program) or pgpcontrol (the total software package). We have integrated this into our source distribution, so that our C-News works in a tightly coupled manner with pgpverify. pgpverify works using public key cryptography, much like NoCeM, and all the official maintainers of respective Usenet group hierarchies sign control messages using their private keys. Your server will carry their public keys, and pgpverify will check the signature on each control message to ensure that it's from the official maintainer of the hierarchy. It will then act upon legit control messages and discard the spurious ones.

In today's nuisance-ridden Usenet environment, no sane Usenet server administrator receiving a feed of ``public'' hierarchies and control messages will even dream of running her server without pgpverify protection.

8. Access control in NNTPd

The original NNTPd had host-based authentication which allowed clients connecting from a particular IP address to read only certain newsgroups. This was very clearly inadequate for enterprise deployment on an Intranet, where each desktop computer has a different IP address, often DHCP-assigned, and the mapping between person and desktop is not static.

What was needed was a user-based authentication, where a username and password could be used to authenticate the user. Even this was provided as an extension to NNTPd, but more was needed. The corporate IS manager needs to ensure that certain Usenet discussion groups remain visible only to certain people. This authorisation layer was not available in NNTPd. Once authenticated, all users could read all newsgroups.

We have extended the user-based authentication facility in NNTPd in some (we hope!)
useful ways, and we have added an entire authorisation layer which lets the administrator specify which newsgroups each user can read. With this infrastructure, we feel NNTPd is fit for enterprise deployment and can be used to handle corporate document repositories, messages, and discussion archives. Details are given below.

9. Components of a running system

This chapter reviews the components of a running CNews+NNTPd server. Analogous components will be found in an INN-based system too. We invite readers familiar with INN to add their pieces to this chapter.

9.1. /var/lib/news: the CNews control area

This directory is more popularly known as $NEWSCTL. It contains configuration, log and status files. There are no articles or binaries kept here. Let's see what some of the files are meant for. Control files are dealt with in slightly greater detail in Section 4.3.
9.2. /var/spool/news: the article repository

This is also known as the $NEWSARTS or $NEWSSPOOL directory. This is where the articles reside on your disk. No binaries or control files belong here. Enough space should be allocated to this directory, as the number of articles keeps increasing with each batch that is digested. An explanation of the following sub-directories will give you an overview of this directory:
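As a rough illustration of how newsgroups map onto this directory tree, here is a minimal Python sketch. It assumes the traditional spool layout, in which the dots in a newsgroup name become directory separators and each article is a file named by its article number. The function names are hypothetical, and falling back to /var/spool/news when $NEWSARTS is unset is an assumption based on the path given above. Check your own installation before relying on any of this.

```python
import os
import shutil

# Default spool location taken from the text; $NEWSARTS overrides it if set.
DEFAULT_SPOOL = "/var/spool/news"


def spool_root():
    """Return the article spool directory ($NEWSARTS, else the default)."""
    return os.environ.get("NEWSARTS", DEFAULT_SPOOL)


def group_to_dir(group, root=None):
    """Map a newsgroup name to its spool directory.

    Assumes the traditional layout: dots in the group name become
    directory separators under the spool root.
    """
    if root is None:
        root = spool_root()
    return os.path.join(root, *group.split("."))


def count_articles(group, root=None):
    """Count regular files with all-digit names in a group's directory.

    Sub-directories (child groups) and non-numeric files are skipped.
    Returns 0 if the group has no directory on disk.
    """
    path = group_to_dir(group, root)
    if not os.path.isdir(path):
        return 0
    return sum(
        1
        for name in os.listdir(path)
        if name.isdigit() and os.path.isfile(os.path.join(path, name))
    )


def free_fraction(root=None):
    """Return the fraction of the spool filesystem that is still free."""
    usage = shutil.disk_usage(root or spool_root())
    return usage.free / usage.total


if __name__ == "__main__":
    root = spool_root()
    group = "comp.os.linux.announce"
    print(group_to_dir(group, root))
    if os.path.isdir(root):
        print(count_articles(group, root), "articles on disk")
        print(f"{free_fraction(root):.0%} of the spool filesystem is free")
```

Running a check like this from cron is one simple way to notice a filling spool before incoming batches start to fail.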