Version 1.0, Last modified 18th May, 1999
LINX Content Regulation Committee
Contributors: Richard Clayton (Demon Internet), Grahame Davies (Easynet), Chris Hall (Demon Internet), Andrew Hilborne (Hilborne Consulting), Keiran Hartnett (Frontier), Deri Jones (NTA Monitor), Paul Mansfield (PSI), Keith Mitchell (LINX), Rick Payne (Netcom UK), Nigel Titley (Level 3), Dave Williams (Demon Internet).
- IP Addresses
- Locating the machine
- Identifying the machine
- Identifying dial up accounts
- Ownership of accounts
- Identifying users
- Calling line identification
- Logging and audit trails
- Domain Name System
- Email traceability
- Usenet traceability
- Chat Services
This document deals with the issues that arise when an action has occurred on the Internet and one wishes to answer the question "Who did this?"
Determining who was responsible for an action is of particular interest when networks have been misused. This misuse may be at a technical level, preventing the networks from operating correctly, or it may be actual criminal, civil or social abuse. It is impossible to prevent all misuse, so it must be possible to identify the users whose misuse is a problem, after the fact, so that appropriate action can be taken. Of course, the ability to trace actions back to their source will, in itself, discourage unreasonable behavior.
To date, one of the most effective means for dealing with the problem of illegal content on the Internet has been the model embodied in the UK's Internet Watch Foundation (IWF). An important element within the "SafetyNet/R3" principles upon which IWF is founded is that of the individual responsibility of end users for any material they place on the Internet. By attributing responsibility in this way, ISPs are given a defence against liability for their customers' actions, which they may be unaware of and not consciously involved in.
Current Internet technology can be used to ensure that activity on the Internet is attributable to an individual person, provided that "Best Practice" procedures are followed. Such "traceability" is a general-purpose tool going far beyond its usage, in the IWF context, of determining the source of illegal material. It can be applied to finding the sources of the rapidly growing problem of Unsolicited Bulk Messaging (UBM), to tracing and preventing denial-of-service attacks, and in helping to deal with hacking, scams, frauds etc.
There are, of course, situations where it is desirable for anonymity to be preserved. For example, there is a genuine need for the use of anonymous re-mailers that hide the identity of email authors. These systems are in regular use by participants in discussion and support groups for persecuted minorities, abuse victims etc. It is not intended that applying Best Practice procedures should remove or erode the right to anonymity, but rather to ensure that anonymity is available to be used where and when appropriate. Anonymity should be explicitly supported by relevant tools, rather than being present as a blanket status quo, open to use and misuse.
People have a right to privacy when using the Internet. Ensuring that activity on the Internet can be traced back to the person responsible emphatically does not mean that people's activity should be routinely monitored. The only purpose of traceability is to allow misuse, once detected, to be rooted out.
This document describes a range of "Best Practice" operational and configuration measures that can be taken by ISPs to minimize unattributable activity by their own customers. Applying Best Practice at individual ISPs will not necessarily help to trace the activities of the customers at other ISPs, but it is "good citizenship" to contribute to the raising of standards.
Widespread adoption of the Best Current Practice outlined in this document will lead towards global accountability of users for their actions, and thus to a more secure and stable Internet.
- This document should not be read or understood to constitute legal advice. It is simply indicating what "Best Practice" might be.
- This document does not go into much detail on the uses to which "traceability" can be put, but concentrates upon describing "Best Practice" in achieving it.
- This document does not discuss whether messages sent over the Internet or other actions performed using the Internet may or may not be illegal. For a discussion of the issues that arise in this area, the LINX "Best Current Practice on Handling Illegal Material" should be consulted.
- The LINX is a membership organisation formed to provide efficient Internet connectivity within the UK. Whilst individual members may endorse this document and adopt the Best Practices within it, it does not reflect an official position of the LINX itself.
The Internet is formed from many interconnected packet switching networks. The packets of information which flow across the Internet contain source and destination addresses. Routeing protocols use these IP (Internet Protocol) addresses to direct packets to their intended destination and then route return packets back, with results, to their source.
Every address on the Internet is unique. Three regional Registries allocate IP addresses and maintain the databases that record their current ownership. Tracing the machine from which packets are sent is essentially a matter of looking up the source address in these databases. However, there are some details that need to be attended to if practice is to match up with theory. Sections 4 through 7 of this document cover these details.
Section 4 discusses IP addresses and shows why it is usually possible to determine which machine originated the packets of interest, despite the fact that it is possible to forge the source address in a packet. It also discusses how Best Practice can ensure that such forged packets fail to propagate.
Section 5 considers the techniques for converting an IP address into a host machine identity and discusses the pitfalls and their avoidance.
Section 6 looks at the cases where there is not a one-to-one mapping from IP addresses to host machines and explains how Best Practice can deal with this.
Section 7 examines the particular problems which arise with dialup accounts and discusses the methods of determining which dialup account was using an IP address at a particular time.
Section 8 considers the relationship between using an account and being the owner of it. Section 9 considers the various ways in which a "real world" owner can be identified when an account is created. Section 10 looks at the use of Calling Line Identification (CLI), which specifies where an incoming call came from.
Section 11 deals with the issues that arise when storing the logging information required to provide traceability. Section 12 is a short review of the Domain Name System (DNS), and Section 13 is a more general discussion of the security procedures which must be in place to ensure that traceability information can be trusted.
The rest of the document is concerned with some particular issues that arise with common services. Section 14 deals with email, Section 15 with Usenet news and Section 16 with chat services.
Determining the origin of a packet on the Internet is usually as simple as inspecting the packet and extracting the source address. The ownership of the address can then be determined by using the "whois" program to interrogate the regional registry databases that describe IP address allocations.
The destination address in a packet is inherently "correct" as that specifies where the packet is to be delivered. On the other hand, the source address need not specify the true source of the packet, as the delivery mechanisms do not use this address. However, much Internet traffic is two-way and the source address is used as the destination address for responses. So if the source address has been forged then there can be little in the way of an ongoing conversation. In practice, therefore, the source address is usually available for traceability purposes.
It is somewhat like receiving a letter through conventional mail. The envelope will contain your name and address but there may be some doubt about any return address that the sender provides. However, to strike up a two-way correspondence with a stranger the return address will have to be valid. Most interesting activity on the Internet involves such two-way traffic.
Nevertheless, some one-way traffic is of interest. Denial-of-service attacks aim to overwhelm the resources of a host or router so it cannot perform its normal functions. Many of these attacks will achieve their aim even if packets are not returned to their source. Thus miscreants can hide their identity by sending these packets with a deliberately forged (or "spoofed") source address. The spoofed addresses may be other valid Internet addresses, which can cause secondary problems as response packets arrive at innocent third parties, or they may be chosen from the generic address ranges described in RFC1918, which are not assigned to anyone in particular.
Even where a two-way conversation is required, there are "hacking" techniques for predicting the responses made by the destination of TCP packets (which underlie almost all protocols) and thus fooling a machine into believing that the source of a connection is elsewhere. These techniques are defeated by the use of good random number generators within TCP/IP stacks and by filtering at network borders to discard traffic from outside that network which has a source address that can only occur within that network. Further information can be found in the CERT advisories CA-95:01 and CA-96:21 (see Appendix B for these and other references).
Source addresses are trustworthy if the routeing infrastructure enforces validity at all points along the path that the packet has travelled. However, the further away from the origin of the packet, the less likely it is that forged source addresses can be distinguished from valid ones. It is therefore Best Practice to check source addresses at the point at which packets enter the network. IP filters should be implemented on the Network Access System (NAS) equipment that handles dialup access to the Internet. Similar filters should be applied to the routers that handle leased lines customers.
Most NASs can install IP filters under the control of authentication and authorisation systems such as RADIUS or TACACS+. These filters can be used to restrict packets coming in from the remote host, discarding those that do not have the correct source address. In the absence of such filters in the dialup equipment itself, it is possible to employ filtering on other routers within the local network to achieve a similar result. However, filtering at a later stage is less desirable. It can only ensure that packets contain a source address that is used in the local network. It cannot ensure that particular packets contain exactly the address that they should.
RFC2267 provides further discussion and detail about "source address ingress filtering" and why it should be employed on all access ports.
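The ingress check described above can be expressed as a simple rule: accept a packet on an access port only if its source address falls within the prefix assigned to that port, and never accept RFC1918 addresses from outside. The following is a minimal sketch of that logic in Python; the addresses and prefixes are hypothetical examples, and a real NAS or router would apply the equivalent filter in its forwarding path.

```python
import ipaddress

# RFC1918 private ranges, which should never appear as the source
# address of traffic arriving from a customer's public access port.
RFC1918 = [ipaddress.ip_network(n) for n in
           ("10.0.0.0/8", "172.16.0.0/12", "192.168.0.0/16")]

def ingress_ok(source_addr, customer_prefix):
    """Return True if a packet with this source address should be
    accepted on an access port assigned the given prefix."""
    src = ipaddress.ip_address(source_addr)
    # Drop packets claiming to come from unassigned private space.
    if any(src in net for net in RFC1918):
        return False
    # Accept only sources inside the customer's own allocation.
    return src in ipaddress.ip_network(customer_prefix)
```

Filtering this precisely is only possible at the edge, where the port-to-prefix mapping is known; further into the network the best that can be done is the coarser "belongs to this ISP at all" test described above.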
Source routing is a standard part of the Internet Protocol that allows the originator of an IP packet to specify the exact route that packets will take to reach their destination. This route can be "strict" and fully specified or "loose" and partly given. These protocol facilities were provided for network administrators to allow them to diagnose faults in the network using such tools as "third party traceroute".
However, the technique can be exploited to allow traffic to take paths through the Internet which avoid some filters or firewalls. It can also be combined with source address spoofing (see 4.2) to make forged traffic look quite genuine.
Most routers have the ability to block source routed packets, for example on Cisco routers the appropriate global command is:
no ip source-route
There is seldom any reason to allow ISP customers to send source routed packets and Best Practice is to configure appropriate routers to block them. If the ISP's staff are not using the types of diagnostic tools that use source routeing then the blocking can, and should, be done on all possible routers.
It should be noted that the application of the techniques in 4.1, 4.2 and 4.3 will not in themselves prevent denial of service attacks such as "TCP SYN flooding" or "fraggle" where the idea is to overwhelm the resources of a server or router. However, Best Practice in providing traceability will improve the chances of locating the origin of such attacks.
Once the IP address of a remote system is known with some certainty, it is then possible to try to identify the machine name, its location and owner.
When logging IP addresses it is often a good idea to do a "reverse lookup" in the DNS and then log the "user-friendly" machine name for the use of human readers. It is Best Practice to always log the IP address, because this is the only fact that is known with certainty.
Besides traceability "after the fact", an ISP may wish to ensure that the source of traffic is traceable before allowing use of some facility or service whose misuse the ISP wishes to minimize.
The delegation of IP addresses can be checked in the appropriate registry, RIPE, ARIN or APNIC (see Appendix B for references).
The "whois" tools can report the name of the organisation registered to be using the IP address along with postal and telephone contact details. It is also possible to get the same information for any intermediate organisations that have delegated the address assignment. These intermediaries are usually the "upstream" connectivity providers.
The simplest technique of determining which machine corresponds to an IP address is to attempt a reverse DNS lookup.
The Domain Name System (DNS) is usually used to translate human friendly names of machines into IP addresses, but it also supports a system to do the reverse, and translate an IP address into a name. If a machine has the dotted IP address of a.b.c.d then there should also be an entry for d.c.b.a.in-addr.arpa within the DNS, with a PTR record to the machine name.
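The construction of the reverse lookup name is purely mechanical: the octets of the address are reversed and the in-addr.arpa suffix appended. As an illustrative sketch (the example address is arbitrary):

```python
def reverse_name(ip):
    """Build the in-addr.arpa name whose PTR record should hold the
    machine name for a dotted IPv4 address a.b.c.d."""
    octets = ip.split(".")
    return ".".join(reversed(octets)) + ".in-addr.arpa"
```

For example, the machine at 192.0.2.17 would be named by a PTR record at 17.2.0.192.in-addr.arpa.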
It is Best Practice to ensure that there are reverse DNS entries for all addresses that are in use on the Internet. Where address space is delegated to customers an ISP should take active steps to ensure that this requirement is understood and acted upon. It is also important to ensure that the in-addr.arpa domains are properly delegated so that DNS lookups will find the right records.
Unfortunately, Best Practice is not always followed, and reverse DNS records do not always exist. Even when they do exist, it is necessary to assume that the owner of that part of in-addr.arpa namespace has been truthful when entering the PTR records. Reliance is also being placed upon the basic correctness of answers obtained from the DNS. This reliance may be ill-founded if the databases being interrogated are insecure, perhaps because the Best Practice outlined in Sections 12 and 13 of this document has not been followed.
So, where a reverse lookup returns a machine name, it is Best Practice to do a forward lookup of that name to check that it returns the original numeric IP address. It is also wise to check that the reverse lookup result is consistent with any other available information pointing to the machine's identity, such as a registry database entry or a traceroute result.
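The "forward confirms reverse" check can be sketched as follows. To keep the sketch self-contained the two DNS lookups are passed in as functions (in practice they would be actual queries, e.g. via the resolver library of your platform); each returns None on failure.

```python
def forward_confirmed(ip, reverse_lookup, forward_lookup):
    """Check that the name returned by a reverse lookup of `ip`
    resolves, in a forward lookup, back to that same address.
    `reverse_lookup(ip)` returns a hostname or None; `forward_lookup(name)`
    returns a list of addresses or None."""
    name = reverse_lookup(ip)
    if name is None:
        return False  # no reverse entry at all
    # The forward zone may list several addresses for the name;
    # the original address must be among them.
    return ip in (forward_lookup(name) or [])
```

A mismatch does not prove bad faith, since the forward and reverse zones are often run by different parties, but a confirmed match gives considerably more confidence in the identity.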
A traceroute program can be used to determine a path across the Internet to the machine of interest. Special note should be made of the names of the machines that appear over the last few hops. Traceroute results will always lend credence to an identity and location that has been established by another method.
Traceroute can also assist in providing the names of "upstream" connectivity providers who it may be useful to contact if approaching the owner of the actual remote machine is impossible or undesirable.
To improve traceability, applications may refuse incoming connections where there is no reverse DNS entry, or where the forward lookup does not match the reverse lookup.
Although most legitimate usage will be unaffected by such arrangements, there can be problems when attempting to interwork with poorly configured sites. For this reason, it is not considered proper to configure email systems to act upon this type of check and RFC1123 (section 5.2.5) prohibits the rejection of email on the basis of apparently incorrect machine names within a HELO command.
With Best Practice procedures in place it should be possible to determine ownership of the IP address from which packets originated.
Unfortunately, although the IP address is unique, it may not always identify a physical machine in an unambiguous manner.
There may be difficulties where a dynamic sharing protocol, such as DHCP, is used to allocate from a pool of IP addresses to a group of machines. Local logs and other auditing information will have to be consulted to determine which particular machine was using a given IP address at a relevant time. Similar issues arise with Network Address Translation (NAT) which can hide many machines behind one IP address.
Many ISPs provide dialup access to the Internet over telephone lines.
A few ISPs provide each dialup account with its own IP address. When the user connects to the Internet that "static" (or fixed) IP address is used for the connection. Thus the packets using that source IP address can be traced back to the account which created them.
The use of static IP addresses for dialup accounts is relatively unusual and most ISPs dynamically allocate an IP address to a dialup connection from a pool of addresses when the connection is made. The source address in a packet does not, therefore, directly identify an account. Mechanisms to link a dynamically allocated IP address to the dialup account that used it are discussed in the next section.
Previous sections have shown that where dialup access accounts are using pools of IP addresses the traceability problem "Who did this?" requires an answer to the question: "Who was using the Internet from this IP address at this particular time?"
Most ISPs use RADIUS, or similar systems, for authenticating dialup accounts. These systems generate login and logout records that give details of users and the IP addresses they were allocated. It is possible to interrogate a database containing these records to determine which account was allocated a given IP address at a particular time. Unfortunately, there are difficulties:
Mundanely, the syslog records produced by some of the most popular dialup routers are difficult to parse for login information. Each new firmware release seems to change the semantics and the syntax of these logs. Fortunately, the trend is towards more and better-integrated information.
Using RADIUS logout records does not scale well in current implementations. The size of the records is substantial and it is necessary to recombine flat files that have been delivered to multiple servers. In order to generate useful data it is necessary to do a substantial amount of post-processing.
The login/logout information can be fallible:
When inspecting login records it should be noted that it is usual to employ the "unreliable" UDP protocol for transferring the information, so packets may be dropped when the network is congested. This may mean that an action is ascribed to a previous user of the same NAS port.
Finally, timing is everything:
In all cases, for database records to be of any use, their timestamps must be accurate. The obvious mechanism to use is NTP (Network Time Protocol). Where the NAS does not support this protocol timestamps will have been applied by the syslog or RADIUS host, which introduces uncertainties caused by network and scheduling delays.
The machine where the action to be traced took place will have supplied the time for which IP address usage is to be checked. If that machine does not have an accurate idea of time, or supplies uncorrected values from other timezones, then the query results will be completely misleading.
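Once the login/logout records have been post-processed into a usable form, the core query is simply to find the session that brackets the timestamp of interest. A minimal sketch, assuming records have already been reduced to (account, address, login time, logout time) tuples with times as plain numbers; real RADIUS accounting data would first need the recombination and parsing described above.

```python
def account_for(records, ip, when):
    """Return the account that was allocated `ip` at time `when`,
    or None if no recorded session brackets that moment.
    Each record is (account, address, login_time, logout_time)."""
    for account, addr, login, logout in records:
        # A session covers the instant of login but not of logout,
        # so consecutive sessions on one port do not overlap.
        if addr == ip and login <= when < logout:
            return account
    return None
```

The half-open interval matters: when one user logs out and another is immediately allocated the same address, a timestamp exactly on the boundary must match only one session.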
As section 7.1 has indicated, the most common technique for determining which account was using a NAS port at a particular time has a number of significant problems associated with it. Other techniques are possible, but none is widely deployed enough, or technically good enough, to be currently described as being "Best Practice". They may, nevertheless, be extremely useful to ISPs that choose to deploy them.
For example, it is possible to arrange that logins and logouts change the hostname associated in the DNS with a dialup IP address to reflect the identity of the account that is currently using the NAS port. If the machine wishing to record traceability information is aware that this is occurring, then it can record contemporaneous account identification information.
Within a particular ISP's system it may be possible to modify particular applications to record traceability information obtained using local knowledge (and permissions). For example, when a news article or email message posted via NNTP or SMTP is processed, a script could check the login id of the poster by accessing the NAS (by SNMP or Telnet) and then record the information in an appropriate header field like "NNTP-Posting-Host:".
It would not be Best Practice to ignore the problem of identifying dialup accounts merely because no perfect system currently exists. If a series of events has occurred over a few hours or days, it would be unusual for the problems described above to make all such events untraceable.
Furthermore, it is often possible to narrow the search down to a relatively small number of accounts. If multiple incidents have occurred, there may only be one dialup account which is suspect in every case, thus achieving the goal of traceability.
As has been explained, from a source IP address it should be possible to trace back as far as a machine or dialup account. The final step, to identify the actual user who was "at the keyboard" when the traffic of interest was exchanged, poses its own difficulties.
There is a protocol called "ident" (see RFC1413), whereby an ident server running on a remote machine can be interrogated to determine which local user owns the process at the remote end of a TCP connection. However, people who have appropriate local access to the remote machine will be able to subvert the ident mechanism. Only the administrators of the remote machine can interpret or assess the value of ident protocol results, and thus it is of limited practical assistance in providing traceability.
Although the actual user may turn out to be impossible to determine, for many purposes it is possible to treat a machine as having an owner who takes responsibility for any actions taken by people they permit to use the machine. Similarly a dialup account can be considered to have an owner who is able to control who uses that account.
Thus the traceability problem has now become that of determining the ownership of a dialup account, or of a particular machine. Best Practice consists of ensuring first, that all accounts or machines do indeed have such "owners", second that it is possible to identify these owners in the "real world" and finally that owners understand their responsibilities for all use of their accounts or machines.
The first two interrelated issues, of assigning an owner and knowing who that owner might be, have a number of possible solutions as set out in Section 9 below. The last issue, of taking responsibilities seriously, is a matter of education, backed up with suitable contractual arrangements, acceptable usage policies and so on.
Before looking at the detail of user identification, it is appropriate to advise caution in interpreting this identification.
It is usual to think of accounts as being equivalent to individuals, who have ownership of the account and who are responsible, morally or literally, for all activity performed using the account. This simplistic view is not always accurate. There is usually a password that allows an individual to access the account and it is hoped that the individual will not let this password become widely known. If the password does become compromised then anyone who knows it can use the account without the owner necessarily being aware of this. So it is knowledge of the password that really matters in practice, not the formal ownership of the account it grants access to.
It is Best Practice for ISPs to ensure that the significance of passwords is impressed upon customers. It is also Best Practice for passwords to be chosen by the customers and for them never to be made known to ISP staff. Help should be given in choosing passwords of an appropriate nature for the use they will be put to.
Gullible customers may reveal passwords to someone impersonating ISP staff. Best Practice is to emphasise that there are never any circumstances in which a real member of staff will ask a customer to reveal their password.
Sometimes a password is deliberately published: perhaps a sales promotion will allow trial use, and it is felt to be simpler to allow multiple use of one account rather than go to the expense of issuing individual accounts. Sometimes the account and password are just "well-known", which is why modern security practice calls for the removal or reconfiguration of manufacturer-supplied accounts and passwords.
It is Best Practice to avoid truly anonymous accounts and to ensure that, even for trial use, there is some sort of real-world information known about users and that each user has an identifiable account with its own password.
It is not uncommon for anonymous accounts to be justified on the basis that they have limited access to the Internet. It is certainly true that misuse of the ISP's own machines is impossible if these accounts are given "read only" access to services such as email and Usenet news, but users of such accounts may be able to misuse other facilities, either at the ISP or elsewhere.
In particular, it is a mistake to treat Web access as being a "read only" service, as in practice there are many gateways and services on the Web that allow creation of email or news messages. These sites are usually responsible enough to validate their users. However, close inspection of their security model usually reveals an expectation that the last links in the traceability chain will be provided by the ISP, who let the user access the Internet in the first place. Thus, if the Web can be used at all, all the issues of traceability will have to be considered.
This section discusses the ways in which it may be possible to determine the "real world" identity of a user (or owner) of an account when that account is created. None of the individual solutions is a panacea and so Best Practice is to adopt as many of these solutions as is appropriate to the type of Internet access supplied and the equipment available.
The simplest way of identifying a user is to ask them for their name and address. The vast majority of users will provide valid details.
Those who attempt to mislead may be detectable by data validity checks. It is possible, for example, to use standard databases to check that the house number and street corresponds to the postcode that has been provided.
It is common to insist that account holders provide a credit card number before their account is activated. If the account is to be paid for, there are obvious reasons for taking this action - but it has the desirable side effect of providing a link from the Internet account into the real world.
The mechanism is not infallible since lists of stolen cards circulate and the formulas for creating apparently valid numbers are widely published. A check with the credit card company will demonstrate whether the card has been issued and, for a fee, a bureau will handle this type of validation.
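The widely published formulas referred to are, in essence, the Luhn check-digit scheme: any number satisfying it looks superficially plausible, whether or not a card with that number was ever issued. The sketch below shows why a purely local check is necessary but not sufficient evidence.

```python
def luhn_valid(number):
    """Luhn check-digit test. Passing this shows only that a card
    number is well-formed, not that it was ever issued."""
    digits = [int(d) for d in str(number)]
    total = 0
    # Working from the rightmost digit, double every second digit,
    # subtracting 9 whenever the doubled value exceeds 9.
    for i, d in enumerate(reversed(digits)):
        if i % 2 == 1:
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0
```

Since forged numbers can be generated to pass this test trivially, only a check with the card issuer (or a validation bureau) provides a genuine real-world link.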
If the user is called back by telephone before the account is activated, this will provide some assurance that they can be contacted again. The technique is, however, labour-intensive.
A similar scheme is to require the user to provide a fax number for contact. This can be useful in business transactions, although more formal methods of credit checking may be more appropriate. In practice, most traceability problems arise from residential use.
The availability of cryptographic digital certificates provides a possible future solution to the problem of user identification. The ISP could accept account applications that were digitally signed by the user and rely on the promise by the issuing authority that the certificate belonged to a traceable individual in the real world.
This is not a viable scheme today, with many issues such as cost, reliability and the lack of any legislative framework still unresolved.
Free trials have been provided by the ISP industry for some time, but a recent development is to provide accounts for free in perpetuity.
The necessity to "pay up" has previously meant that account users must eventually reveal themselves to hand over their dues, and even if the name and address turn out to be incorrect, the credit card or bank account details provide traceability information. In the past, ISPs have avoided offering accounts to unidentifiable people since this has been seen to affect whether or not they were likely to be paid.
"Free" services have no such problems, but must still manage to achieve traceability or at the very least be able to detect the return of a user who has previously misused their system. Best Practice at present is to require that CLI be available, as discussed in the next section, or that there is some existing relationship with the user that will in itself provide traceability.
Calling Line Identification (CLI) provides the number from which a telephone call is made to the destination equipment. A telephone company (a Carrier) will usually be able to map this number to an actual physical location. CLI may not be available over analogue lines, but this is of decreasing importance since the ISP end of connections is now almost exclusively digital.
In an ideal world, from a traceability point of view, all telephone calls would accurately carry the CLI of the calling party. The real world is somewhat different.
In the UK most Carriers have access to two CLIs. One is the user presented CLI and the other contains an "engineering" CLI plus some extra information stating what may be done with the user CLI. The engineering CLI is important to the Carriers because this is the only method of calculating charges between them.
ISPs are not usually Carriers and so they will only be able to access the user CLI. However, the user CLI may not be presented because CLI was either unavailable, or it was purposely withheld by the caller. If it is present then it will be a valid number or a valid partial number - some PABX systems generate partial numbers and there are a few old UXD5 BT exchanges that only have a partial CLI, but these are being phased out.
As of early 1999 ISPs get a usable CLI on over 80% of calls. There have been occurrences where the CLI passed on by the Carrier is one of their internal exchange numbers and it is possible that this error may still be present or will recur. This will invalidate some data.
The UK is moving towards a more comprehensive system for CLI whereby a user can elect to:
- withhold CLI on a call by call basis,
- withhold CLI on all calls,
- provide CLI on a call by call basis,
- or have a Presentation Number.
The Presentation Number (PN) will be a number that the user has the right to use and wishes Called Parties to return calls on. It is for use in scenarios such as doctors withholding home CLIs but wishing to present a surgery CLI for return calls.
There are a number of permutations of CLI/PN, but these need not concern this document. With the co-operation of the telephone providers, CLI is an appropriate method of determining the real-world location of a user. Even without CLI, an approximate time for the call can assist a telephone company, which may have records of its own, in determining identity.
The problems of missing data and scalability that were rehearsed when discussing login information in section 7 can recur, but they are not as severe:
The CLI is delivered (reliably) via the RADIUS request packet or it can be collected via SNMP. CLI is not normally available by telnet access to the NAS. In the case of 3Com, CLI is available via syslog but may not be available via RADIUS accounting.
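As an illustration, the CLI arrives in a RADIUS packet as the Calling-Station-Id attribute (type 31, per RFC 2138). The following sketch shows how it might be extracted from a raw accounting packet; it is illustrative only, not a substitute for a full RADIUS implementation.

```python
import struct

CALLING_STATION_ID = 31  # RADIUS attribute carrying the caller's CLI (RFC 2138)

def extract_cli(packet):
    """Pull the Calling-Station-Id attribute out of a raw RADIUS packet.

    Returns the CLI string, or None if the attribute is absent (for
    example because the caller withheld CLI or the Carrier did not
    supply it).
    """
    if len(packet) < 20:          # header: code, id, length, authenticator
        return None
    (length,) = struct.unpack("!H", packet[2:4])
    pos = 20                      # attributes start after the 20-byte header
    while pos + 2 <= min(length, len(packet)):
        attr_type, attr_len = packet[pos], packet[pos + 1]
        if attr_len < 2:          # malformed attribute - stop parsing
            break
        if attr_type == CALLING_STATION_ID:
            return packet[pos + 2:pos + attr_len].decode("ascii", "replace")
        pos += attr_len
    return None
```

Where CLI arrives via syslog or SNMP instead, the same principle applies: capture it at the moment of connection and store it with the session record.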
Ascend and 3Com NAS equipment does not differentiate between unavailable and withheld CLI although Cisco kit may do so. Further pressure by ISPs is needed to ensure that manufacturers supply comprehensive information in the future.
At the current time, insistence upon the availability of CLI may not be a commercial option because users may not be able to provide it. Obviously, withheld CLI is the user's choice, but 15-20% of calls have unavailable CLI because of the way in which they have been handled by the telecommunications Carriers. It would be desirable to see regulatory pressure to reduce this percentage towards zero. Also, enforcement, presumably by OFTEL, may be needed to ensure the maximum possible accuracy of the information that is presented.
Even where CLI is present, this does not of itself guarantee that access is not anonymous. Calls from phone boxes or public phone-points present obvious traceability issues. Similarly, calls from hotels or made over international connections may suffer from limited practical traceability. If the connection is from a Cybercafe then it is unlikely that there will be a record of the actual user.
Some telephone chargecards may generate sufficient traceability information on calls that are otherwise a problem, but this will not necessarily be reflected in the CLI presented to the ISP on which access decisions are to be made.
ISPs that rely on CLI to provide their traceability information will have to consider all of the issues presented above. They should consider using specific IP address ranges for calls which they consider less traceable, to allow other ISPs to be aware of the lack of accountability. The risk of not doing this will be the growth of lists of all possible dialup port IP addresses. This will then lead to perfectly genuine, easily traceable, users encountering limitations to their access to services and facilities throughout the Internet.
Misuse of the Internet and its systems cannot always be detected immediately. Traceability may be required some time after an event has occurred. This means that it is necessary to keep the logging information in case it is wanted.
Best Practice is to keep the logs required to provide traceability for at least three months. This can involve a significant commitment of online storage space, so it may be necessary to rotate older information onto magnetic tape or other offline storage media. Fortunately, such information will be rarely consulted. ISPs will be able to determine their own tradeoffs between speed of access and the likely cost of making such access.
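A minimal sketch of such a rotation job is shown below; the directory layout is hypothetical, and in practice the compressed copies would be written on to tape or other offline media rather than simply another directory.

```python
import gzip
import os
import shutil
import time

RETENTION_DAYS = 90  # Best Practice: at least three months of logs

def archive_old_logs(log_dir, archive_dir, now=None):
    """Compress logs older than RETENTION_DAYS into an archive directory.

    Nothing is deleted until the compressed copy has been written, so a
    failed run never loses traceability data.  Returns the names moved.
    """
    now = time.time() if now is None else now
    cutoff = now - RETENTION_DAYS * 86400
    os.makedirs(archive_dir, exist_ok=True)
    moved = []
    for name in sorted(os.listdir(log_dir)):
        path = os.path.join(log_dir, name)
        if os.path.isfile(path) and os.path.getmtime(path) < cutoff:
            dest = os.path.join(archive_dir, name + ".gz")
            with open(path, "rb") as src, gzip.open(dest, "wb") as dst:
                shutil.copyfileobj(src, dst)
            os.remove(path)           # only after the archive copy exists
            moved.append(name)
    return moved
```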
The type of data which is logged for traceability is almost certain to be personal data within the meaning of the Data Protection Act (1984 and its 1998 replacement). Even if the data does not identify individuals directly, it can be combined with other data so as to do so; that, after all, is the raison d'être for logging it in the first place.
This means that all ISPs must register as a "data user" under the Act. They will need to describe the purposes for which data is being held and will be subject to all the restrictions of the Act in terms of keeping the data secure, avoiding disclosing it, etc. Best Practice is to secure all of the data against casual examination so that only authorised personnel can extract information about users.
One of the Data Protection Principles is that data should not be kept for longer than is necessary for the purposes that have been declared, and naturally it is important to conform to this. However, in the special case of "traffic data" (which in this instance means the records of phone calls made to an ISP) the European Directive 97/66/EC requires (paraphrasing considerably) that the data must be destroyed or anonymised once it is no longer required for billing purposes. It may however be kept for the prevention of fraud, and since an ISP will have Conditions of Use which ban misuse, and it is fraud to infringe these, it will be lawful to keep the data for the purpose of traceability. However, it cannot be kept forever. The LINX has secured the opinion of the Data Protection Registrar (soon the Data Protection Commissioner) that an upper limit of six months should apply. After that period traffic data must be discarded or processed so that it cannot be related back to identifiable individuals.
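One way to honour the six-month limit is a periodic job that strips the personal fields from old traffic data while keeping the bare fact that calls occurred. The sketch below assumes records held as (timestamp, CLI, account) tuples; real systems will differ.

```python
SIX_MONTHS = 183 * 86400  # upper limit indicated by the Data Protection Registrar

def anonymise_records(rows, now):
    """Blank the personal fields of traffic-data rows older than six months.

    `rows` is an iterable of (timestamp, cli, account) tuples.  The time
    of the call is kept, but the CLI and account name are removed so the
    row can no longer be related back to an identifiable individual.
    """
    out = []
    for ts, cli, account in rows:
        if now - ts > SIX_MONTHS:
            out.append((ts, "", ""))          # anonymised
        else:
            out.append((ts, cli, account))    # still within the retention window
    return out
```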
Besides the obvious traceability requirements of such information as login times, IP address allocations and caller line identification, traceability also depends upon keeping supporting records such as customer contact information.
It is useless knowing an account name if the records of a short-term triallist have been destroyed when they terminated their account.
Best Practice will ensure that any information known about users is preserved for at least three months after they cease to be customers.
As has been apparent throughout this document, one of the requirements for traceability is to accurately identify the originating machine. Many logging and recording systems avoid "unfriendly" IP addresses and use hostnames instead.
Hostname logging relies upon the availability, accuracy, completeness, stability and security of the Domain Name System (DNS). This can make hostnames unreliable and that is why it is Best Practice to always record the IP address. However, inadequate logging systems containing only hostnames will continue to exist for some time to come, so in the meantime traceability is improved by ensuring that the DNS is kept as secure as possible.
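A simple way to follow this Best Practice is to make the IP address the authoritative field of every log line, with the hostname (if one was resolved at all) recorded only as an annotation. A sketch:

```python
import time

def access_log_line(ip, hostname=None, when=None):
    """Format a log line in which the IP address is the authoritative datum.

    The hostname, if one was resolved, is recorded only as a parenthetical
    annotation; if the DNS data later proves unreliable, the IP still stands.
    """
    stamp = time.strftime("%Y-%m-%dT%H:%M:%SZ", time.gmtime(when))
    note = " ({})".format(hostname) if hostname else ""
    return "{} {}{}".format(stamp, ip, note)
```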
Traceability is not, of course, the only reason for ensuring that DNS systems are secure against unexpected failure or unauthorised tampering, but the issue can be overlooked. Even if there has been no actual breach, it may be necessary to demonstrate that there is sensible protection for DNS systems. In cases where logging information does rely upon DNS data, defence lawyers are likely to cast aspersions on the validity of these logs.
It is usual to provide resilience against network or machine failures by using multiple name servers that are on different networks and rely upon different infrastructure such as power supplies. It is not uncommon for secondary servers to be on a different continent. All these DNS servers must be kept secure so that their data may be relied upon. Best Practice is to provide internal name servers for use within a network and external name servers for providing DNS to the rest of the Internet, but the details of this are well beyond the scope of this document.
All traceability information depends upon being able to trust the systems that generate, transmit, store and process the raw data. This was briefly discussed in the previous section in the context of DNS, but in fact it applies to all systems involved in a particular incident. If servers are compromised then it is no longer possible to trust the software running on them to report valid information or to trust any historic logging information to remain unchanged.
This section discusses some general security Best Practice procedures which should be applied to machines in general and to those servers involved in providing traceability in particular.
The operating system of the machines should be "hardened", with all relevant security patches applied to ensure the machine remains secure. Recent versions of the service software should always be used, again with all relevant security patches applied.
Unnecessary services and applications should be removed. In particular, public systems such as email or web servers should not be run on the same hardware as another service. Running such services introduces too many potential security vulnerabilities, a failure in any one of which would also compromise the service that is actually being provided on the machine.
Access to service machines should be restricted as far as is possible. Firewalls or packet filter routers should ensure that all traffic is blocked except the particular service being run upon the machine and any necessary management or monitoring protocol access. The following specific points are important:
- Only internal staff, ISP employees, should access the machines for management purposes.
- Access should be restricted to as small a number of physical hosts as possible.
- Login passwords should be of the "one time" variety and the protocol chosen should never send them in the clear. This reduces the damage that can be done by "sniffing" traffic if another machine on the same network segment were to be compromised.
- Low security telnet and FTP protocols should not be used to manage machines. Encrypted and authenticated systems such as Secure Shell should be used to ensure that passwords and other sensitive data never pass across the network in clear text.
- Service machines should, ideally, be connected to the rest of the network in such a way that traffic in and out of these servers cannot be "sniffed" by a machine that may become compromised. Switched Ethernet can improve security, especially if the switch's MAC table is pre-filled with the MAC addresses in use. However, some switches can be abused so that their MAC "routing" tables overflow and they can then behave partially like a hub (broadcasting all packets to all hosts). Thus although a switch can improve security, it should not be relied on.
- Accounts on service machines should be limited to staff who actually require them, keeping the number of accounts on the systems to a minimum.
- Where services require day to day updating, as would be the case for DNS for example, then limited "shell accounts" should be used to make these changes. The capabilities of these accounts should be as restricted as possible.
- There should be a documented method for staff to manage the machines - how to make changes and so on.
- Staff activity on the systems should be logged - which may help in spotting malicious activity should it arise. The logs should be archived onto tape or other offline media at defined, regular, intervals.
Once the service is deployed and configured, it must be thoroughly tested to ensure that it functions correctly and is secure. Security is different to many other areas of ISP activity in that users do not act as "testers" - they will not notice if the security is flawed.
It cannot be assumed that services are secure "because they are designed that way". They must be actively tested.
Earlier sections of this document describe how it is possible to identify a user from an IP address. In this section specific attention is paid to the traceability issues which arise with email.
As might be expected, the key issue is to capture the time an action took place and the IP address that was responsible.
When email systems are used inappropriately, perhaps for Unsolicited Bulk Messaging (UBM), there is often an attempt to disguise the true origin of the email. The sender may give a false hostname or report an incorrect creation time.
As email is passed from machine to machine, extra "Received:" header lines are added. These lines were originally invented to allow problems with the email system to be tracked down, but they serve equally well to record the path that an email has taken, and hence they can lead to the original source, whether it has attempted to disguise itself or not.
Since the unscrupulous may generate fake "Received:" headers before injecting their email into the system, it is very important that every email handling system adds its own header. The boundary between the valid and invalid headers will then become clear, and hence the source of the email can be identified with certainty.
Besides the issues which arise around UBM (the interested reader is referred to the Best Current Practice document on UBM also published by the LINX), traceability is also important in dealing with incorrectly delivered or undeliverable email. Email generated by broken clients can be traced back to a particular machine, so that it need not simply go astray. Where there is misuse, once the source of unwanted email is clear, complaints can be directed to the originator and not to some innocent party.
Email is made traceable by virtue of headers placed into the email as it passes through each system. These headers are placed at the top of the email and previous headers are left alone. Reading downwards, it is possible to work back in time towards the original source.
An example of a typical header line is:
Received: from faked.and.invalid (really [192.168.0.1]) by mailer.example.net with SMTP id ABC001 for <user@example.net>; Sat, 25 Dec 1999 03:00:00 +0000
Here, the email was sent from [192.168.0.1], which declared itself in the SMTP protocol HELO message to be the host "faked.and.invalid". The email was received by the machine "mailer.example.net" and any local logs on that machine will refer to the message under the identifier "ABC001". The email was timed to have arrived at 3 o'clock (GMT) on Christmas morning and it was addressed to <user@example.net>.
Note that it is most important for the Received header to record the IP address of the sender. It should not rely upon the HELO message to identify a machine unless it has performed its own DNS check to ensure that it is correct. In any event, Best Practice is to always record the IP address in case the DNS records later turn out to be less than reliable.
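The process of walking the Received headers back towards the source can be sketched as follows. The trusted_hosts parameter is an assumption for illustration: it names the mail servers whose headers the investigator is prepared to believe.

```python
import re

IP_IN_BRACKETS = re.compile(r"\[(\d{1,3}(?:\.\d{1,3}){3})\]")
ADDED_BY = re.compile(r"\bby\s+(\S+)")

def trace_received(headers, trusted_hosts):
    """Walk Received header values from newest to oldest.

    `headers` is the list of Received values in the order they appear in
    the message (newest first).  Headers added by hosts in `trusted_hosts`
    are believed; the sending IP reported by the last trusted header is
    the best candidate for the true origin - anything below it may have
    been forged by the sender before injection.
    """
    origin = None
    for value in headers:
        by = ADDED_BY.search(value)
        if by is None or by.group(1) not in trusted_hosts:
            break                  # boundary: header not added by our systems
        ip = IP_IN_BRACKETS.search(value)
        if ip:
            origin = ip.group(1)   # sending IP as seen by a trusted host
    return origin
```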
Implementations of email systems developed within the last two or three years are generally much better than older systems at providing trace information, because unsolicited email has become such a problem. However, it is frequently possible to improve older systems by reconfiguration.
As we have seen, it is vital to record the IP address of the sender. This is the only piece of identifying information that cannot be trivially forged. It is also necessary to clearly identify the email handling machine that is adding the Received line. That means placing a fully qualified domain name into the header.
The domain name is probably fetched from the underlying operating system and TCP/IP stack. So, these must be correctly configured with, for example, "mailer.example.net" specified rather than just "mailer". Users of Microsoft Exchange should pay special attention to avoiding spaces in machine names, because these are converted to the (invalid) underscore character when communicating with other machines. The hostname should also exist in the DNS for the domain, it should have a valid reverse DNS lookup, and of course the name should be unique.
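A quick sanity check for names destined for Received headers might look like this; it simply enforces the points above (fully qualified, no spaces or underscores) and does not replace a proper DNS lookup.

```python
import re

# Fully qualified: two or more labels of letters, digits and hyphens,
# with no leading or trailing hyphen - so no spaces and no underscores.
FQDN = re.compile(r"^(?!-)[A-Za-z0-9-]{1,63}(?<!-)(\.(?!-)[A-Za-z0-9-]{1,63}(?<!-))+$")

def valid_header_name(name):
    """Check that a hostname is fit for use in a Received header."""
    return bool(FQDN.match(name))
```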
The final configuration issue is to record an accurate time and date. Best Practice is to use NTP for synchronisation. Care must also be taken to configure the system so that the correct timezone is reported, and to check that it will remain correct during summer months when daylight saving is in force.
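For background, NTP timestamps count seconds from 1900 rather than from the Unix epoch of 1970, so conversion involves subtracting a fixed 70-year offset. The sketch below parses the transmit timestamp out of a 48-byte (S)NTP response; a production system would of course use a full NTP implementation rather than this.

```python
import struct

NTP_UNIX_OFFSET = 2208988800  # seconds between 1900-01-01 (NTP) and 1970-01-01 (Unix)

def sntp_transmit_time(packet):
    """Extract the transmit timestamp from a 48-byte (S)NTP response.

    Bytes 40-47 hold a 32.32 fixed-point count of seconds since 1900;
    subtracting the 70-year offset yields Unix time.
    """
    if len(packet) < 48:
        raise ValueError("short NTP packet")
    seconds, fraction = struct.unpack("!II", packet[40:48])
    return seconds - NTP_UNIX_OFFSET + fraction / 2 ** 32
```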
There are many email systems in use, so it is not possible to comment upon them all. Most modern systems like Exim will work correctly "out of the box", but older systems may have legacy configurations even if the latest code is in use. In particular, sendmail is widely available and often produces insufficient information in Received lines, in particular by failing to log the IP address of the email sender. This is easy to correct. Replace the lines:
HReceived: $?sfrom $s $.by $j ($v/$V) id $i; $b
by the lines:
HReceived: $?sfrom $s $.$?_($?s$|from $.$_) $.by $j ($v/$Z)$?r with $r$. id $i$?u for $u$.; $b
Usenet distributes news articles all over the planet by means of a "floodfill" system whereby the articles are passed from machine to machine until they have achieved universal distribution. In early 1999 a "full feed" of all possible Usenet articles was about 700,000 articles per day (averaging 25 to 30 gigabytes of data), but the volumes are constantly growing.
News articles enter into the newsfeed directly at news servers, or indirectly via mail-to-news gateways that accept suitably formatted email messages. Traceability is achieved by means of an "NNTP-Posting-Host" (or the older "X-NNTP-Posting-Host") header, which the news server adds to articles to record the machine which originally injected the article.
"X-Trace" is an additional header now commonly added by news servers. It will usually include the name of the news server where the article was injected, the time of injection and the IP address of the machine the article came from.
There is also a "Path:" header to which news servers add themselves as articles pass through them. This was designed to make the floodfill algorithm efficient, but it also provides further traceability information by showing where the article originated. Just as with email Received headers, the unscrupulous may preload this header and careful inspection may be needed to detect this type of forgery.
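Inspecting a Path: header can be partially automated. The sketch below splits the header into its hops and flags everything beyond the last server the ISP can vouch for; the set of known servers is an assumption supplied by the operator.

```python
def path_hops(path_header):
    """Split a Usenet Path: header into the hops it claims, newest first."""
    return [hop for hop in path_header.split("!") if hop]

def suspicious_tail(hops, known_servers):
    """Return the hops that follow the last server we can vouch for.

    A poster who preloaded the Path: header appears here, after the
    genuine entries added by real servers.
    """
    last = -1
    for i, hop in enumerate(hops):
        if hop in known_servers:
            last = i
    return hops[last + 1:]
```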
To control Usenet abuse it is essential to ensure that the originators of articles can be traced. At present there are no formal standards for these tracing headers (most notably for the X-Trace header), but clearly the key information that such headers must provide is an IP address and timestamp.
Where timestamps are required to disambiguate IP addresses, then Best Practice is to use NTP to ensure that these timestamps are accurate.
As with email, it is important to ensure that any domain names that are used are valid, and the recording of IP addresses is to be encouraged.
It is Best Practice to remove any pre-existing NNTP-Posting-Host headers from articles being posted to Usenet and to add a new NNTP-Posting-Host header to identify the posting machine. It is also Best Practice to add an X-Trace header containing the IP address and timing information. Modern versions of server software such as INN will include these headers in articles by default.
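Since there is no formal standard for X-Trace, the exact format varies between servers. A sketch of generating such a header, carrying just the facts that matter for traceability (injecting server, timestamp and client IP address), might be:

```python
import time

def make_x_trace(server, client_ip, when=None):
    """Build an illustrative X-Trace header value.

    There is no formal standard for this header; the format here simply
    records the injecting server, a timestamp and the client's IP address.
    """
    when = int(time.time() if when is None else when)
    return "X-Trace: {} {} {}".format(server, when, client_ip)
```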
Most ISPs do not need to accept articles for posting from anyone except their own customers. This eliminates abuse from hard to trace "foreign" users. In the case of INN a simple change to the nnrpd.access file to limit which source IP addresses may post is all that is needed.
Providing traceability for articles that pass through mail-to-news gateways is rather more problematic.
Since email may pass through several systems before arriving at the gateway it is of limited use to record the immediate originator of the message. Parsing of "Received:" lines to identify the originator is a complex task and not really suitable for automation. Large numbers of X-Received lines generated from the original message's Received lines would be unwelcome on Usenet. So it is hard to provide reliable traceability on the articles themselves, though full records should be kept at the gateway itself.
It may be possible to handle locally generated email correctly, because the Received lines will be of a known format and will be trustworthy. Therefore usage of mail-to-news gateways by customers of the ISP running the news system is far more likely to provide traceability within the articles, than when the gateway is used by arbitrary remote hosts.
Most ISPs do not need to run a mail-to-news gateway at all, and most of those that do have no requirement to run a gateway that is open to the world.
Chat services allow individual users to converse with each other, usually by typing in text, though systems such as Microsoft's NetMeeting will also handle graphical information.
Some chat services run as "CGI"s on web servers where extensive logging of connections and activity is very much the norm. More traditionally they are run over Internet Relay Chat (IRC).
Chat services may be public, with open forums where many users participate, or they may be private, with a restricted number of users. Even where conversations start on a chat service, most software has facilities for setting up further one-to-one conversations that are no longer reliant upon the chat service and are invisible to that service. Some chat systems, such as ICQ, use servers solely for determining whether users are connected and what IP address they are currently using.
IRC allows global participation in conversations by the provision of "relay" servers maintained by individual ISPs and institutions. These servers are combined into "IRC networks" and the text of the conversations is automatically transferred between the servers so that it is available as required.
The IRC networks have evolved with cultures that place the privacy of users above all other matters. If this privacy is negated by actions on a particular "relay" then it is likely that its future participation in the network will be impossible.
The server software often provides users with the ability to render themselves largely invisible to server operators, and restricts the ability of those operators and other users to join forums in which users have granted themselves "operator" privileges.
So there are both social and technical reasons why institutions that are operating a server within an IRC network will be unable to monitor the individuals using their relays. Nevertheless, it is considered acceptable to generate some logging information about users (strictly, about traffic coming from particular IP addresses):
- The machine from which the user connected to the server.
- The time a user connected to the server.
- The length of time the user remained connected to the server.
The problem with this level of logging is that when using IRC the users adopt "nicknames" by which they are known to other users of the network. The users will also move from forum to forum, known as "channels" within the IRC universe. When misuse occurs the reports of this will generally be in terms of the user's nickname, or will recount events on a particular channel at a specific time.
In order to provide Best Practice traceability the following extra information needs to be logged:
- The time at which the user joins and leaves each forum.
- Any times at which the user changes nickname, and to what nickname.
As of early 1999, the majority of IRC server software does not, by default, log all the required information to provide traceability. There are various patches to the release server software that may provide some or all of this information. To achieve Best Practice standards of traceability it is necessary to apply these patches, bearing in mind the sensibilities of the particular network(s) that are being connected to.
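The extra logging described above amounts to recording NICK, JOIN and PART events against a timestamp. A sketch of turning raw IRC protocol lines (the prefix form defined in RFC 1459) into such records:

```python
import re

# ":nick!user@host COMMAND args" - the prefixed message form of RFC 1459
IRC_LINE = re.compile(
    r"^:(?P<nick>[^!\s]+)(?:!(?P<user>[^@\s]+)@(?P<host>\S+))?"
    r"\s+(?P<cmd>\S+)\s*(?P<args>.*)$")

def log_event(line, when):
    """Turn one IRC protocol line into a traceability record, or None.

    Only the events needed for Best Practice traceability are kept:
    nickname changes, and channel joins and parts.
    """
    m = IRC_LINE.match(line)
    if not m:
        return None
    cmd = m.group("cmd").upper()
    args = m.group("args").lstrip(":")
    if cmd == "NICK" and args:
        return (when, m.group("nick"), "NICK", args)
    if cmd in ("JOIN", "PART") and args:
        return (when, m.group("nick"), cmd, args.split()[0])
    return None
```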
Asia Pacific Network Information Centre
The non-profit Internet Registry organisation for the Asia-Pacific region.
American Registry for Internet Numbers
The nonprofit organisation established for the purpose of administration and registration of Internet Protocol (IP) numbers to the geographical areas previously managed by Network Solutions, Inc. (InterNIC). Those areas include, but are not limited to North and South America, the Caribbean and sub-Saharan Africa.
Berkeley Internet Name Domain
A well-known program that provides DNS services for a machine.
Computer Emergency Response Team (now the CERT Coordination Center)
The CERT Coordination Center is based at Carnegie Mellon University in Pennsylvania, USA. CERT provides technical assistance for responding to computer security incidents, product vulnerability assistance, technical documents, and seminars. In particular the team creates the well-known CERT advisories that document important security issues.
Common Gateway Interface
The system for transferring data back and forth between web input and application programs running on the web server.
Calling Line Identification
The signalling of the telephone number of the device which initiates a telephone call.
Dynamic Host Configuration Protocol
A protocol for allowing networked machines to fetch configuration information, including the IP address that they are to use, from a central server. It is described in RFC2131.
Domain Name System
The distributed system that provides a translation service between names and IP addresses. It is described in RFC1035.
File Transfer Protocol
A protocol for the bulk movement of files of data. It is described in RFC959.
HELO
A command within the SMTP email protocol, used to announce the name of a remote machine.
InterNetNews
Widely used Usenet news server software.
"I seek you"
A popular person to person messaging program.
Internet Protocol
A basic protocol for exchanging packets between machines on the Internet. Other protocols are layered upon this to provide services for users. It is described in RFC791 and RFC1122.
Internet Relay Chat
A protocol for providing multi-way text based conversations. It is documented in RFC1459.
Internet Service Provider
ISP is used in this document as a generic term to describe companies and organisations that provide Internet access to others.
Internet Watch Foundation
The IWF is an independent organisation launched in 1996 to address the problem of illegal material on the Internet, with particular reference to child pornography.
London Internet Exchange
The LINX is a totally neutral, not for profit partnership between ISPs. It operates the major UK Internet exchange point. As well as its core activity of facilitating the efficient movement of Internet traffic it is involved in non-core activities of general interest to its members. One such activity on "content regulation" has, as part of its work, generated this document.
Media Access Control
A MAC address is the unique hardware address of a network interface.
Network Access System
The equipment at an ISP that handles leased lines or dialup access to the Internet. Often thought of as the ISP's modems.
Network Address Translation
A scheme for translating IP addresses as they cross network boundaries.
Network News Transfer Protocol
The predominant protocol used to transfer Usenet articles between servers. It is described in RFC977.
Network Time Protocol
A protocol for obtaining an accurate measurement of the current time described in RFC1119 and RFC1305.
Office of Telecommunications
OFTEL is the regulator or "watchdog" for the UK telecommunications industry.
Presentation Number
An alternative number that can be presented to the destination of a telephone call in place of the number of the true originator of the call.
PTR record
A type of DNS record that provides a name to correspond with an IP address.
Remote Authentication Dial-In User Service
A protocol that provides authentication and authorisation for dialup accounts. Originally created by Livingston, it is now a de facto industry standard used by many network product companies. It is described in RFC2138.
Request for Comments
The RFCs are a series of notes, started in 1969, about the Internet (originally the ARPANET). The notes discuss many aspects of computing and computer communication, focusing on networking protocols, procedures, programs, and concepts, but also including meeting notes, opinion, and sometimes humour. The Internet standards are documented within the RFC documents.
Réseaux IP Européens
The RIPE Network Coordination Centre acts as the Regional Internet Registry for Europe and surrounding areas.
Simple Mail Transfer Protocol
The email transfer protocol. It is documented in RFC821 and RFC1123.
Simple Network Management Protocol
A protocol used for network management and monitoring. It is documented by more RFCs than it is helpful to list here.
Secure Shell
A widely used cryptographically secure login program.
TCP/IP stack
A collection of software units operating in a layered manner to handle protocols such as TCP, UDP and IP.
SYN
A flag carried within TCP packets, used to initiate a connection.
syslog
A recording system for the logging of realtime events.
Terminal Access Controller Access System
A TCP based protocol for user authentication and authorisation.
Transmission Control Protocol
TCP is a reliable connection-oriented data transfer protocol built on top of the unreliable connectionless IP protocol. Most Internet services encountered by users are layered on top of TCP. It is described in RFC793 and RFC1122.
Transmission Control Protocol / Internet Protocol
TCP/IP is usually used as an adjective to describe the suite of IP based protocols. It is often applied to software that is capable of handling these Internet protocols.
Telnet
A general-purpose interactive text based protocol for communicating with remote machines. It is documented in a number of RFCs.
Unsolicited Bulk Messaging
Email or other messages which are sent in large numbers without any explicit requests being made. They are sometimes called "junk email" or "spam". At present they usually contain advertising material for commercial ventures of dubious propriety.
User Datagram Protocol
An unreliable connectionless data transfer protocol layered on top of the IP protocol and described in RFC768. UDP is usually used for short, single packet, messages or where data is time sensitive and high level decisions must be made about retransmissions.
Usenet is a worldwide conferencing system.
World Wide Web
The universe of network-accessible information.
- SafetyNet/R3 Principles
- LINX Best Current Practice Unsolicited Bulk Email
- LINX Best Current Practice Handling Illegal Material
- All published RFCs are available from:
- CERT advisories are available from:
- The main allocation registry "whois" tools are available at:
- and further "whois" databases for particular areas at: