Last updated: 887 day(s) ago (Sun Feb 24 15:01:34 2008) Fri Jul 30 18:10:44 2010

Note: This content is mirrored from http://www.sun-microsystems.org/Tutorials/soltune/soltune.html

The site was often down when I grabbed this; however, I saved the Google cache of the original content.

Credit for this content goes to them. I just didn't want to lose it. :-)

Solaris™ 2.x - Tuning Your TCP/IP Stack and More


Important Notice!

SUN managed to publish a Solaris Tunable Parameters Reference Manual, applying to Solaris 8, HW 2/02, and Solaris Tunable Parameters Reference Manual, applying to Solaris 9, HW 9/02. You might want to check there for anything you miss here. Another good read is Solaris Operating Environment Network Settings for Security, if you are concerned about security and denial-of-service attacks.

Table of contents

  1. Introduction
    1.1 History
    1.2 Quick intro into ndd
    1.3 How to read this document
  2. TCP connection initiation
  3. Retransmission related parameters
  4. Path MTU discovery
  5. Further advice, hints and remarks
    5.1 Common TCP timers
    5.2 Erratic IPX behaviors
    5.3 Common IP parameters
    5.4 TCP and UDP port related parameters
  6. Windows, buffers and watermarks
  7. Tuning your system
    7.1 Things to watch
    7.2 General entries in the file /etc/system
    7.3 System V IPC related entries
    7.4 How to find further entries
  8. 100 Mbit ethernet and related entries
    8.1 The hme interface
    8.2 Other problems
  9. Recommended patches
  10. Literature
    10.1 Books
    10.2 Internet resources
    10.3 RFC, mentioned and otherwise
    10.4 Further material
  11. Solaris' Future
    11.1 Solaris 7
    11.2 Solaris 8
    11.3 Solaris 9
  12. Uncovered material
  13. Scripts
  14. List of things to do

Appendices are separate documents. They are quoted from within the text, but you might be interested in them when downloading the current document. If you say "print" for this document, the appendices will not be printed. You have to download and print them separately.

  1. Simple transactions using TCP
  2. System V IPC parameter
  3. Retransmission behavior
  4. Slow start implications
  5. The change log
  6. Glossary (first attempt)
  7. Index (first attempt)

1. Introduction

Use at your own risk!

If your system behaves erratically after applying some tweaks, please don't blame me. Remember to have a backup handy before starting to tune. Always make backup copies of the files you are changing. I tried carefully to assemble the information you are seeing here, aimed at improved system performance. As usual, there are no guarantees that what worked for me will work for you. Please don't take my recommendations literally: they are starting points, not absolutes. Always read my reasoning, don't use them blindly.

Before you start, you ought to grab a copy of the TCP state transition diagram as specified in RFC 793 on page 23. The drawback is the missing error correction supplied by later RFCs. There is an easier way to obtain blowup printouts to staple to your office walls. Grab a copy of the PostScript file pocket guide, page 2, accompanying Stevens' TCP/IP Illustrated Volume 1 [4]. Or simply open the book at figure 18.12.

Please share your knowledge

The set of documents may look a trifle colorful, or just odd, if your browser supports cascading stylesheets. Care was taken to select the formatting tags in a way that the printed output still resembles the intentions of the author, and that the set of documents is still viewable with browsers like Mosaic or Lynx. Stylesheets were used as an optical enhancement. Most notable is the different color of interior and external links. Interior links are shown in greenish colors, and will be rendered within the same frame. External links on the other hand are shown in bluish colors, and all will be shown in the same new frame. If you leave it open, a new external link will be shown within the same window. Literature references within the text are often interior links, pointing to the literature section, where the external links are located.

1.1 History

This page and the related work have a long history in gathering. I started out peeking wide eyed over the shoulders of two people from a search engine provider when they were installing the German server of a customer of my former employer. My only alternative resource of tuning information was the brilliant book TCP/IP Illustrated 1 [4] by Stevens. I started gathering all information about tuning I was able to get my hands upon. The cumulation of these you are experiencing on these pages.

1.2 Quick intro into ndd

Solaris allows you to tune, tweak, set and reset various parameters related to the TCP/IP stack while the system is running. Back in the SunOS 4.x days, one had to change various C files in the kernel source tree, generate a new kernel, reboot the machine and try out the changes. The Solaris feature of changing the important parameters on the fly is very convenient.

Many of the parameters I mention in the rest of the document you are reading are time intervals. All intervals are measured in milliseconds. Other parameters are usually byte counts, but a few times different units of measurement are used and documented. A few items appear totally unrelated to TCP/IP, but due to the lack of a better framework, they materialized on this page.

Most tunings can be achieved using the program ndd. Any user may execute this program to read the current settings, depending on the readability of the respective device files. But only the super user is allowed to execute ndd -set to change values. This makes sense considering the sensitive parameters you are tuning. Details on the use of ndd can be obtained from the respective manual page.

ndd will become your friend, as it is the major tool to tweak most of the parameters described in this document. Therefore you had better make yourself familiar with it. A quick overview will be given in this section, too. ndd is not limited to tweaking TCP/IP related parameters. Many other devices which have a device file underneath /dev and a kernel module can be configured with the help of ndd. For instance, any networking driver which supports the Data Link Provider Interface (DLPI) can be configured.

The parameters supplied to ndd are symbolic keys indexing either a single, usually numerical, value or a table. Please note that the keys usually (but not always) start out with the module or device name. For instance, when changing values of the IP driver, you have to use the device file /dev/ip, and all parameters start out with ip_. The question mark is the most notable exception to this rule.

1.2.1 Interactive mode

The interactive mode allows you to inspect and modify a device, driver or module interactively. In order to inspect the available keyword names associated with a parameter, just type the question mark. The next item explains the output format of the parameter list.

# ndd /dev/tcp
name to get/set ? tcp_slow_start_initial
value ?
length ?
2
name to get/set ? ^D

The example above queries the TCP driver for the value of the slow start feature in an interactive fashion. The typed input is shown boldface.

1.2.2 Show all available parameters

If you are interested in the parameters you can tweak for a given module, query for the question mark. This special parameter name is part of all ndd configurable material. It tells the names of all parameters available - including itself - and the access mode of each parameter.

# ndd /dev/icmp \?
? (read only)
icmp_wroff_extra (read and write)
icmp_def_ttl (read and write)
icmp_bsd_compat (read and write)
icmp_xmit_hiwat (read and write)
icmp_xmit_lowat (read and write)
icmp_recv_hiwat (read and write)
icmp_max_buf (read and write)

Please mind that you have to escape the question mark with a backslash from the shell, if you are querying in the non-interactive fashion as shown above.

1.2.3 Query the value of one or more parameters (read access)

At the command line, you often need to check on settings of your TCP/IP stack or other parameters. By supplying the parameter name, you can examine the current setting. It is permissible to mention several parameters to check on at once.

# ndd /dev/udp udp_smallest_anon_port
32768
# ndd /dev/hme link_status link_speed link_mode
1

1

1

The first example checks on the smallest anonymous port UDP may use when sending a PDU. Please refer to the appropriate section later in this document on the recommended settings for this parameter.

The second example checks the three important link report values of a 100 Mbit ethernet interface. The results are separated by an empty line, because some parameters may refer to tabular values instead of a single number.

1.2.4 Modify the value of one parameter (write access)

This mode of interaction with ndd will frequently be found in scripts or when changing values at the command line in a non-interactive fashion. Please note that you may only set one value at a time. The scripts section below contains examples of how to make changes permanent using a startup script.

# ndd -set /dev/ip ip_forwarding 0

The example will stop the forwarding of IP PDUs, even if more than one non-local interface is active and up. Of course, you can only change parameters which are marked for both reading and writing.

1.2.5 Further remarks

Andres Kroonmaa kindly supplied a nifty script to check all existing values for a network component (tcp, udp, ip, icmp, etc.). Usually I do the same thing using a small Perl script.
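The idea behind such a dump can be sketched in a few lines of shell. This is neither Andres Kroonmaa's script nor the Perl one, just a minimal illustration. It assumes a POSIX-style shell (ksh on Solaris 2.x), and that the listing printed for the question mark has one parameter per line with the access mode in parentheses, as in the example of section 1.2.2. The function names and the NDD override variable are made up for this sketch.

```shell
# Print the names of all readable parameters from an "ndd <device> \?"
# listing read on standard input. The "?" entry itself and entries
# marked "(write only)", if any, are skipped.
ndd_readable_names() {
    awk '$1 != "?" && $0 !~ /\(write only\)/ && NF > 0 { print $1 }'
}

# Dump "name = value" for every readable parameter of one device.
# NDD may name a replacement command, so the function can be tried
# on a machine without ndd (assumption: the real tool is called ndd).
ndd_dump() {
    dev=$1
    ${NDD:-ndd} "$dev" '?' | ndd_readable_names | while read name; do
        printf '%s = %s\n' "$name" "$(${NDD:-ndd} "$dev" "$name")"
    done
}
```

A call like ndd_dump /dev/tcp would then list every readable TCP parameter. Tabular parameters print several lines after the equals sign.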

1.3 How to read this document

This document is separated into several chapters with little inter-relation. It is still advisable to loosely follow the order outlined in the table of contents.

The first chapter entirely focuses on the TCP connection queues. It is quite long for such a small topic, but it is also meant to introduce you to my style of writing. The next chapter deals with TCP retransmission related parameters that you can adjust to your needs. The chapter is more concise. One chapter deals with path MTU discovery, as there used to be problems with older versions of Solaris. Recent versions usually do not need any adjustments.

The fifth chapter is a kind of catch-all. Some TCP, some UDP and some IP related parameters are explained (forwarding, port ranges, timers), and a quick detour into bug 1226653 explains that some versions were capable of sending packets larger than the MTU. The following chapter deals in depth with windows, buffers and related issues.

Chapter seven detours from the ndd interface, and focuses on variables you can set in your /etc/system file, as some things can only be thus managed. Another part of that chapter deals with the hme interface and appropriate tunables. The chapter may be split in future, and parts of it are already found in the appendices.

The chapter dealing with patches, an important topic with any OS, just points you to various sources, and only mentions some essential things for older versions of Solaris.

Literature exists in abundance. The literature section is more a loose collection of links and some books that I consider essential when working with TCP/IP, not limited to Solaris. The RFC section is kind of hard to keep up-to-date, but then, I reckon you know how to read the rfc-index file.

The final chapters quickly glance at new or at one time new versions of Solaris - time makes them obsolete. The chapter is there for historical reasons, more or less. The scripts section deals with the nettune script used by YaSSP. It finishes with some TODO material.

2. TCP connection initiation

This section is dedicated exclusively to the various queues and tunable variable(s) used during connection instantiation. The socket API maintains some control over the queues. But in order to tune anything, you have to understand how listen and accept interact with the queues. For details, see the various Stevens books mentioned in the literature section.

When the server calls listen, the kernel moves the socket from the TCP state CLOSED into the state LISTEN, thus doing a passive open. All TCP servers work like this. Also, the kernel creates and initializes various data structures, among them the socket buffers and two queues:
incomplete connection queue

This queue contains an entry for every SYN that has arrived. BSD sources assign so_q0len entries to this queue. The server sends off the ACK of the client's SYN and the server side SYN. The connection gets queued and the kernel now awaits the completion of the TCP three way handshake to open a connection. The socket is in the SYN_RCVD state. On the reception of the client's ACK to the server's SYN, the connection stays one round trip time (RTT) in this queue before the kernel moves the entry into the

completed connection queue

This queue contains an entry for each connection for which the three way handshake is completed. The socket is in the ESTABLISHED state. Each call to accept() removes the front entry of the queue. If there are no entries in the queue, the call to accept usually blocks. BSD sources assign a length of so_qlen to this queue.

Both queues are limited regarding their number of entries. By calling listen(), the server is allowed to specify the size of the second queue for completed connections. If the server is for whatever reason unable to remove entries from the completed connection queue, the kernel is not supposed to queue any more connections. A timeout is associated with each received and queued SYN segment. If the server never receives an acknowledgment for a queued SYN segment, TCP state SYN_RCVD, the time will run out and the connection will be thrown away. The timeout is an important resistance against SYN flood attacks.

Figure 1: Queues maintained for listening sockets. (Image: a model of TCP listening queues.)
Figure 2: TCP three way handshake, connection initiation. (Image: TCP connection initiation timing diagram.)

Historically, the argument to the listen function specified the maximum number of entries for the sum of both queues. Many BSD derived implementations multiply the argument with a fudge factor of 3/2. Solaris <= 2.5.1 does not use the fudge factor, but adds 1, while Solaris 2.6 does use the fudge factor, though with a slightly different rounding mechanism than the one BSD uses. With a backlog argument of 14, Solaris 2.5.1 servers can queue 15 connections. Solaris 2.6 servers can queue 22 connections.
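The arithmetic can be checked with a small shell sketch (POSIX-style shell assumed, ksh on Solaris 2.x). The rule for Solaris 2.5.1 is taken from the text. The exact rounding used by Solaris 2.6 is not given here, so floor(3n/2) + 1 is only an assumption chosen because it reproduces the quoted figure of 22. The function names are made up.

```shell
# Queue capacity for a given listen() backlog argument.
backlog_solaris_251() { echo $(( $1 + 1 )); }          # no fudge factor, adds 1
backlog_solaris_26()  { echo $(( $1 * 3 / 2 + 1 )); }  # ASSUMED rounding: floor(3n/2) + 1

backlog_solaris_251 14   # prints 15
backlog_solaris_26 14    # prints 22
```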

Stevens shows that the incomplete connection queue does need more entries for busy servers than the completed connection queue. The only reason for specifying a large backlog value is to enable the incomplete connection queue to grow as SYNs arrive from clients. Stevens shows that a moderately busy webserver has an empty completed connection queue during 99 % of the time, but the incomplete connection queue needed 15 or fewer entries in 98 % of the time! Just try to imagine what this would mean for a really busy webcache like Squid.

Data for an established connection which arrives before the connection is accept()ed should be stored into the socket buffer. If the queues are full when a SYN arrives, it is dropped in the hope that the client will resend it, hopefully finding room in the queues then.

According to Cockroft [2], there was only one listen queue for unpatched Solari <= 2.5.1. Solari >= 2.6 or an applied TCP patch 103582-12 or above splits the single queue into the two shown in figure 1. The system administrator is allowed to tweak and tune the various maxima of the queue or queues with Solaris. Depending on whether there are one or two queues, there are different sets of tweakable parameters.

The old semantics contained just one tunable parameter tcp_conn_req_max which specified the maximum argument for the listen(). The patched versions and Solaris 2.6 replaced this parameter with the two new parameters tcp_conn_req_max_q0 and tcp_conn_req_max_q. A SunWorld article on 2.6 by Adrian Cockroft tells the following about the new parameters:

tcp_conn_req_max [is] replaced. This value is well-known as it normally needs to be increased for Web servers in older releases of Solaris 2. It no longer exists in Solaris 2.6, and patch 103582-12 adds this feature to Solaris 2.5.1. The change is part of a fix that prevents denial of service from SYN flood attacks. There are now two separate queues of partially complete connections instead of one.

tcp_conn_req_max_q0 is the maximum number of connections with handshake incomplete. A SYN flood attack could only affect this queue, and a special algorithm makes sure that valid connections can still get through.

tcp_conn_req_max_q is the maximum number of completed connections waiting to return from an accept call as soon as the right process gets some CPU time.

In other words, the first specifies the size of the incomplete connection queue while the second parameter assigns the maximum length of the completed connection queue. All three parameters are covered below.

You can determine if you need to tweak this set of parameters by watching the output of netstat -sP tcp. Look for the value of tcpListenDrop, if available on your version of Solaris. Older versions don't have this counter. Any value showing up might indicate something wrong with your server, but then, killing a busy server (like squid) shuts down its listening socket, and might increase this counter (and others). If you get many drops, you might need to increase the appropriate parameter. Since connections can also be dropped because listen() specifies too small an argument, you have to be careful interpreting the counter value. On old versions, a SYN flood attack might also increase this counter.

Newer or patched versions of Solaris, with both queues available, will also have the additional counters tcpListenDropQ0 and tcpHalfOpenDrop. Now the original counter tcpListenDrop counts only connections dropped from the completed connection queue, and the counter ending in Q0 the drops from the incomplete connection queue. Killing a busy server application might increase either or both counters. If the tcpHalfOpenDrop shows up values, your server was likely to be the victim of a SYN flood. The counter is only incremented for dropping noxious connection attempts. I have no idea if those will also show up in the Q0 counter.
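Picking a single counter out of the netstat output is easy to script. The following is a minimal sketch, not a tested monitoring tool. It assumes a POSIX-style shell and a POSIX awk (nawk on Solaris), and that the statistics are printed as "name = value" pairs, possibly several per line. The function name tcp_counter is made up.

```shell
# Print the value of one named counter from "netstat -s" style output
# read on standard input.
tcp_counter() {
    awk -v want="$1" '{
        gsub(/=/, " = ")     # tolerate "name =12345" without a space
        for (i = 1; i + 2 <= NF; i++)
            if ($i == want && $(i + 1) == "=") { print $(i + 2); exit }
    }'
}
```

Usage would look like netstat -sP tcp | tcp_counter tcpListenDrop. Running it twice some minutes apart and comparing the values tells you more than a single reading, since the counters only ever grow.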

tcp_conn_req_max
default 8 (max. 32), since 2.5 32 (max. 1024), recommended 128 <= x <= 1024
since 2.6 or 2.5.1 with patches 103630-09 and 103582-12 or above applied:
see tcp_conn_req_max_q and tcp_conn_req_max_q0

The current parameter describes the maximum number of pending connection requests queued for a listening endpoint in the completed connection queue. The queue can only save the specified finite number of requests. If a queue overflows, nothing is sent back. The client will time out and (hopefully) retransmit.

The size of the completed connection queue does not influence the maximum number of simultaneous established connections after they were accepted, nor does it have any influence on the maximum number of clients a server can serve. With Solaris, the maximum number of file descriptors is the limiting factor for simultaneous connections, which just happened to coincide with the maximum backlog queue size.

From the viewpoint of TCP those connections placed in the completed connection queue are in the TCP state ESTABLISHED, even though the application has not reaped the connection with a call to accept. That is the number limited by the size of the queue, which you tune with this parameter. If the application, for some reason, does not release entries from the queue by calling accept, the queue might overflow, and the connection is dropped. The client's TCP will hopefully retransmit, and might find a place in the queue.

Solaris offers the possibility to place connections into the backlog queue as soon as the first SYN arrives, called eager listening. The three way handshake will be completed as soon as the application accept()s the connection. The use of eager listening is not recommended for production systems.

Solari < 2.5 have a maximum queue length of 32 pending connections. The length of the completed connection queue can also be used to decrease the load on an overloaded server: If the queue is completely filled, remote clients will be denied further connections. Sometimes this will lead to a connection timed out error message.

Naively, I assumed that a very huge length might lead to a long service time on a loaded server. Stevens showed that the incomplete connection queue needs much more attention than the completed connection queue. But with tcp_conn_req_max you have no option to tweak that particular length.

Earlier versions of this document suggested to tune tcp_conn_req_max with regards to the values of rlim_fd_max and rlim_fd_cur, but the interdependencies are more complex than any rule of thumb. You have to find your own ideal. When a connection is still in the queue, only the queue length limits the number of entries. Connections taken from the queue are put into a file descriptor each.

There is a trick to overcome the hardcoded limit of 1024 with a patch. SunSolve shows this trick in connection with SYN flood attacks. A greatly increased listen backlog queue may offer some small increased protection against this vulnerability. On this topic also look at the tcp_ip_abort_cinterval parameter. Better, use the mentioned TCP patches, and increase the q0 length.

echo "tcp_param_arr+14/W 0t10240" | adb -kw /dev/ksyms /dev/mem

This patch is only effective on the currently active kernel, so it lasts only until the next boot. Usually you want to append the line above to the startup script /etc/init.d/inetinit. The shown patch increases the hard limit of the listen backlog queue to 10240. Only after applying this patch may you use values above 1024 for the tcp_conn_req_max parameter.

A further warning: Changes to the value of the tcp_conn_req_max parameter in a running system will not take effect until each listening application is restarted. The backlog queue length is evaluated whenever an application calls listen(3N), usually once during startup. Sending a HUP signal may or may not work; personally I prefer to TERM the applications and restart them manually or, even better, use a startup script.

tcp_conn_req_max_q0
since 2.5.1 with patches 103630-09 and 103582-12 or above applied: default 1024;
since 2.6: default 1024, recommended 1024 <= x <= 10240

After installing the mentioned TCP patches, alternatively after installing Solaris 2.6, the parameter tcp_conn_req_max is no longer available. In its stead the new parameters tcp_conn_req_max_q and tcp_conn_req_max_q0 emerged. tcp_conn_req_max_q0 is the maximum number of connections with handshake incomplete, basically the length of the incomplete connection queue.

In other words, the connections in this queue are just being instantiated. A SYN was just received from the client, thus the connection is in the TCP SYN_RCVD state. The connection cannot be accept()ed until the handshake is complete, even if eager listening is active.

To protect against SYN flooding, you can increase this parameter. Also refer to the parameter tcp_conn_req_max_q. I believe that changes won't take effect unless the applications are restarted.

tcp_conn_req_max_q
since 2.5.1 with patches 103630-09 and 103582-12 or above applied: default 128;
since 2.6: default 128, recommended 128 <= x <= tcp_conn_req_max_q0

After installing the mentioned TCP patches, alternatively after installing Solaris 2.6, the parameter tcp_conn_req_max is no longer available. In its stead the new parameters tcp_conn_req_max_q and tcp_conn_req_max_q0 emerged. tcp_conn_req_max_q is the length of the completed connection queue.

In other words, connections in this queue of length tcp_conn_req_max_q have completed the three way handshake of a TCP open. The connection is in the state ESTABLISHED. Connections in this queue have not been accept()ed by the server process (yet).

Also refer to the parameter tcp_conn_req_max_q0. Remember that changes won't take effect unless the applications are restarted.

tcp_conn_req_min
Since 2.6: default 1, recommended: don't touch

This parameter specifies the minimum number of available connections in the completed connection queue for select() or poll() to return "readable" for a listening (server) socket descriptor.

Programmers should note that Stevens [7] describes a timing problem, if the connection is RST between the select() or poll() call and the subsequent accept() call. If the listening socket is blocking, the default for sockets, it will block in accept() until a valid connection is received. While this seems no tragedy with a webserver or cache receiving several connection requests per second, the application is not free to do other things in the meantime, which might constitute a problem.

3. Retransmission related parameters

The retransmission timeout values used by Solaris are way too aggressive for wide area networks, although they can be considered appropriate for local area networks. SUN thus did not follow the suggestions mentioned in RFC 1122. Newer releases of the Solaris kernel are correcting the values in question:

The recommended upper and lower bounds on the RTO are known to be inadequate on large internets. The lower bound SHOULD be measured in fractions of a second (to accommodate high speed LANs) and the upper bound should be 2*MSL, i.e., 240 seconds.

Besides the retransmit timeout (RTO) value two further parameters R1 and R2 may be of interest. These don't seem to be tunable via any interface offered by Solaris that I know of.

The value of R1 SHOULD correspond to at least 3 retransmissions, at the current RTO. The value of R2 SHOULD correspond to at least 100 seconds.

[...]

However, the values of R1 and R2 may be different for SYN and data segments. In particular, R2 for a SYN segment MUST be set large enough to provide retransmission of the segment for at least 3 minutes. The application can close the connection (i.e., give up on the open attempt) sooner, of course.

A great many internet servers which are running Solaris retransmit segments unnecessarily often. The current condition of European networks indicates that a connection to the US may take up to 2 seconds. All parameters mentioned in the first part of this section relate to each other!

As a starter take this little example. Consider a picture, size 1440 byte, LZW compressed, which is to be transferred over a serial link with 14400 bps and using an MTU of 1500. In the ideal case only one PDU gets transmitted. The ACK segment can only be sent after the complete PDU is received. The transmission takes about 1 second. These values seem low, but they are meant as 'food for thought'. Now consider something going awry...
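The "about 1 second" can be reproduced with a rough calculation, sketched here in shell (POSIX-style shell assumed). Two things are assumptions not stated in the text: 40 bytes of TCP and IP headers without options, and the choice between 8 bits per byte on the wire or 10 bits for an asynchronous line with start and stop bit. The function name is made up.

```shell
# Milliseconds needed to clock one PDU onto a serial line.
# $1 = payload bytes, $2 = header bytes (ASSUMED 40: TCP + IP, no options),
# $3 = bits on the wire per byte, $4 = line rate in bps
serial_tx_ms() { echo $(( ($1 + $2) * $3 * 1000 / $4 )); }

serial_tx_ms 1440 40 8 14400    # prints 822
serial_tx_ms 1440 40 10 14400   # prints 1027 (with start and stop bits)
```

Either way the PDU of 1480 bytes fits into the MTU of 1500, and the ACK cannot leave before roughly one second has passed. Compare that to a retransmission interval of 200 or 500 ms.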

Solaris 2.5.1 behaves strangely if the initial SYN segment from the host doing the active open is lost. The initial SYN gets retransmitted only after a period of 4 * tcp_rexmit_interval_initial plus a constant C. The time is 12 seconds with the default settings. More information is being prepared on the retransmission test page.

The initial lost SYN may or may not be of importance in your environment. For instance, if you are connected via ATM SVCs, the initial PDU might initiate a logical connection (ATM works point to point) in less than 0.3 seconds, but will still be lost in the process. It is rather annoying for a user of 2.5.1 to wait 12 seconds until something happens.

tcp_rexmit_interval_initial
default 500, since 2.5.1 3000, recommended >= 2000 (500 for special purposes)

This interval is waited before the last data sent is retransmitted due to a missing acknowledgment. Mind that this interval is used only for the first retransmission. The more international your server is, the larger you should choose this interval.

Special laboratory environments working in LAN-only environments might be better off with 500 ms or even less. If you are doing measurements involving TCP (which is almost always a bad idea), you should consider lowering this parameter.

Why do I consider TCP measurements a bad idea? If ad-hoc approaches are used, or there is no deeper knowledge of the mechanics of TCP, you are bound to arrive at wrong conclusions. Unless there are TCP dumps to document that indeed what you expect is actually happening, results may lead to wrong conclusions. If done properly, there is nothing wrong with TCP measurements. The same rules apply, if you are measuring protocols on top of TCP.

There are lots of knobs and dials to be fiddled with - all of which need to be documented along with the results. Scientific experiments need to be repeatable by others in order to verify your findings.

tcp_rexmit_interval_min
default 200, recommended >= 1000 (200 for special purposes)
Since 8: default 400

After the initial retransmission further retransmissions will start after the tcp_rexmit_interval_min interval. BSD usually specifies 1500 milliseconds. This interval should be tuned to the value of tcp_rexmit_interval_initial, e.g. some value between 50 % and 200 %. The parameter has no effect on retransmissions during an active open, see my accompanying document on retransmissions.

The tcp_rexmit_interval_min doesn't display any influence on connection establishment with Solaris 2.5.1. It does with 2.6, though. The influence on regular data retransmissions, or FIN retransmissions, I have yet to research.

tcp_ip_abort_interval
default 120000, since 2.5 480000, recommended 600000

This interval specifies how long retransmissions for a connection in the ESTABLISHED state should be tried before a RESET segment is sent. BSD systems default to 9 minutes.

You don't want your connections to fail too quickly once they are in the ESTABLISHED state. A reader reported that Veritas backup clients might fail with "socket write failed". Veritas recommends not to set the above parameter below 8 minutes.

tcp_ip_abort_linterval
default ?, recommended ?

According to an unconfirmed user report, the parameter is the abort interval for passive connections, i.e. those received on ports in the LISTEN state. Refer to the tcp_ip_abort_cinterval for details, as there is some confusion between what SunSolve says and what can be read in Stevens.

tcp_ip_abort_cinterval
default 240000, since 2.5 180000, recommended ?

This interval specifies how long retransmissions for a remote host are repeated until the RESET segment is sent. The difference to the tcp_ip_abort_interval parameter is that this connection is about to be established - it has not yet reached the state ESTABLISHED. This value is interesting considering SYN flood attacks on your server. Proxy servers are doubly handicapped because of their Janus behavior (like a server towards the downstream cache, like a client towards the upstream server).

According to Stevens this interval is connected to the active open, e.g. the connect(3N) call. But according to SunSolve the interval has an impact on both directions. A remote client can refuse to acknowledge an opening connection up to this interval. After the interval a RESET is sent. The other way around works out, too. If the three-way handshake to open a connection is not finished within this interval, the RESET segment will be sent. This can only happen if the final ACK went astray, which is a difficult test case to simulate.

To improve your SYN flood resistance, SUN suggests to use an interval as small as 10000 milliseconds. This value has only been tested for the "fast" networks of SUN. The more international your connection is, the slower it will be, and the more time you should grant in this interval. Proxy servers should never lower this value (and should let Squid terminate the connection). Webservers are usually not affected, as they seldom actively open connections beyond the LAN.

tcp_rexmit_interval_max
default 60000, RFC 1122 recommends 240000 (2MSL), recommended 1...2 * tcp_close_wait_interval or tcp_time_wait_interval
Since 2.6: default 240000
Since 8: default 60000

All previously mentioned retransmission related intervals use an exponential backoff algorithm. The wait interval between two consecutive retransmissions for the same PDU is doubled starting with the minimum.

The tcp_rexmit_interval_max interval specifies the maximum wait interval between two retransmissions. If changing this value, you should also give the abort interval an inspection. The maximum wait interval should only be reached shortly before the abort interval timer expires. Additionally, you should coordinate your interval with the value of tcp_close_wait_interval or tcp_time_wait_interval.
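To see how the minimum, the maximum and the abort interval play together, the doubling can be written down as a short shell sketch (POSIX-style shell assumed). This is a simplification and not the Solaris algorithm: a real stack derives its timeout from the measured round trip time. The function name and the sample values are made up.

```shell
# Print the wait before each retransmission, in milliseconds:
# start at $1, double each time, never exceed $2, and stop once the
# accumulated waiting time would pass the abort interval $3.
rexmit_schedule() {
    rto=$1 max=$2 abort=$3 total=0
    while [ $(( total + rto )) -le "$abort" ]; do
        total=$(( total + rto ))
        echo "$rto"
        rto=$(( rto * 2 ))
        if [ "$rto" -gt "$max" ]; then rto=$max; fi
    done
}
```

Calling rexmit_schedule 1000 8000 30000 lists the waits 1000, 2000, 4000, 8000 and 8000. If the capped value shows up many times before the list ends, the maximum is small compared to the abort interval, which is the situation the paragraph above advises against.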

tcp_deferred_ack_interval
default 50, BSD 200, recommended 200 (regular), 50 (benchmarking), or 500 (WAN server)
Since 8: default 100

This parameter specifies the timeout before sending a delayed ACK. The value should not be increased above 500, as required by RFC 1122. This value is of great interest for interactive services. A small number will increase the "responsiveness" of a remote service (telnet, X11), while a larger value can decrease the number of segments exchanged.

The parameter might also be of interest to HTTP servers which transmit small amounts of data after a very short retrieval time. With heavy-duty servers or in a laboratory banging environment, you might encounter service times answering a request which are well above 50 ms. An increase to 500 might lead to fewer PDUs transferred over the network, because TCP is able to merge the ACK with data. Increases beyond 500 should not even be considered.

SUN claims that Solaris recognizes the initial data phase of a connection. An initial ACK (not SYN) is not delayed. As opposed to the simplistic approach mentioned in the SUN paper, a request for a web service (both server or proxy) which does not fit into a single PDU can be transmitted faster. Also check the tcp_slow_start_initial parameter.

The tcp_deferred_ack_interval also seems to be used to distinguish full-sized segments between interactive traffic and bulk data transfer. If a sender uses MSS sized segments, but sends each segment further apart than approximately 0.9 times the interval, the traffic will be rated interactive, and thus every segment seems to get ACKed.

tcp_deferred_acks_max
Since 2.6: default 8, recommended ?, maximum 16

This parameter features the maximum number of segments received after which an ACK just has to be sent. Previously I thought this parameter solely related to interactive data transfer, but I was mistaken. This parameter specifies the number of outstanding ACKs. You can give it a look when tuning for high speed traffic and bulk transfer, but the parameter is controversial. For instance, unless you employ selective acknowledgments (SACK) like Solaris 7, you can only ACK the number of segments correctly received. With the parameter at a larger value, statistically the amount of data to retransmit is larger.

Good values for retransmission tuning don't beam into existence from a white source. Rather you should carefully plan an experiment to get decent values. Intervals from another site can not be carried over to another Solaris system without change. But they might give you an idea where to start when choosing your own values.

The next part looks at a few parameters having to do with retransmissions, as well.

tcp_slow_start_initial
Since 2.5.1 with patch 103582-15 applied: default 1
Since 2.6: default 1, recommended 2 or 4 for servers
Since 8: default 4, no recommendations

This parameter provides the slow-start bug discovered in BSD and Windows TCP/IP implementations for Solaris. More information on the topic can be found on the servers of SUN and in Stevens [6]. To summarize the effect, a server starts sending two PDUs at once without waiting for an ACK due to wrong ACK counts: the ACK from connection initiation is counted as a data ACK - compare with figure 2. Network congestion avoidance algorithms are being undermined. The slow start algorithm does not allow the buggy behavior, compare with RFC 2001.

Setting the parameter to 2 allows a Solaris machine to behave like it has the slow start bug, too. Well, IETF is said to be amending the slow start algorithm, and the bug is now actively turned into a feature. SUN also warns:

It's still conceivable, although rare, that on a configuration that supports many clients on very slow-links, the change might induce more network congestions. Therefore the change of tcp_slow_start_initial should be made with caution.

[...]

Future Solaris releases are likely to default to 2.

You can also gain performance if many of your clients are running old BSD or derived TCP/IP stacks (like MS). I expect new BSD OS releases not to feature this bug, but then I am not familiar with the BSD OS family. A reader of this page told me about cutting the latency of his server in half, just by using the value of 2.

If you want to know more about this feature and its behavior, you can have a look at some experiments I have conducted concerning that particular feature. The summary is that I agree with the reader: A BSDish client like Windows definitely profits from using a value of 2.

tcp_slow_start_after_idle
Since 2.6: default 2, no recommendations
Since 8: default 4, no recommendations

I reckon that this parameter deals with the slow start for an already established connection which was idle for some time (however the term idle is defined here).

tcp_dupack_fast_retransmit
default 3, no recommendations

Something to do with the number of duplicate ACKs. If we do fast retransmit and fast recovery algorithms, this many duplicate ACKs must be received until we assume that a segment has really been lost. A simple reordering of segments usually causes no more than two duplicate ACKs.

There are a couple of parameters which require some elementary familiarity with RFC 2001, which covers TCP Slow Start, Congestion Avoidance, Fast Retransmit, and Fast Recovery Algorithms, as well as ssthresh and cwnd.

tcp_rtt_updates
default 0, BSD 16, recommended: (see text)
Since 8: 20, no recommendations

This parameter controls when things like rtt_sa (the smoothed RTT), rtt_sd (the smoothed mean deviation), and ssthresh (the slow start threshold) are cached in the routing table. By default, Solaris does not cache any of the parameters. It is claimed that you can set it to a value you like, but to be the same as BSD, use 16.

The value of this parameter is the number of RTT samples that have to be taken so that an accurate enough value can be stored in the routing table. If you choose to use this feature, use a value of 16 or above. Using 16 allows the smoothed RTT filter to converge within 5 % of the correct value, compare Stevens [4], chapter 21.9.

ip_ire_cleanup_interval
default 30000, no recommendations
Since 8: the parameter has a new name: ip_ire_arp_interval

The parameter may do more than described here. If a routing table entry is not directly connected and not being used, the cache for things like rtt_sa, rtt_sd and ssthresh associated with the entry will be flushed after 30 seconds. The parameter tcp_rtt_updates must be greater than zero to enable the cache.

I could imagine that external helper programs invoked by MRTG on a regular basis connecting to a far-away host might benefit from increasing this value slightly above the invocation interval.

4. Path MTU discovery

Whenever a connection is about to be established, during the three-way handshake open negotiation, the segment size used will be set to the minimum of (a) the smallest MTU of an outgoing interface, and (b) the MSS announced by the peer. If the remote peer does not announce an MSS, usually the value 536 will be assumed. If path MTU discovery is active, all outgoing PDUs have the DF (don't fragment) bit set in the IP header.

If the ICMP error message fragmentation needed is received, a router on the way to the destination needed to fragment the PDU, but was not allowed to do so. Therefore the router discarded the PDU and sent back the ICMP error. Newer router implementations enclose the needed MSS in the error message. If the needed MSS is not included, the correct MSS must be determined by a trial and error algorithm.

Due to the internet being a packet switching network, the route a PDU travels along a TCP virtual circuit may change with time. For this reason RFC 1191 recommends rediscovering the path MTU of an active connection after 10 minutes. Improvements of the route can only be noticed by repeated rediscoveries. Unfortunately, Solaris aggressively tries to rediscover the path MTU every 30 seconds. While this is o.k. for LAN environments, it is a grossly impolite behavior in WANs. Since routes may not change that often, aggressive repetitions of path MTU discoveries lead to unnecessary consumption of channel capacity and elongated service times.

Path MTU discovery is a far reaching and controversial topic when discussing it with local ISPs. Still, pMTU discovery is at the foundation of IPv6. The PSC tuning page argues pro path MTU discovery, especially if you maintain a high-speed or long-delay (e.g. satellite) link.

The recommendation I can give you is not to use the defaults of Solaris < 2.5. Please use path MTU discovery, but tune your system RFC conformant. You may alternatively want to switch off the path MTU discovery altogether, though there are few situations where this is necessary.

I was made aware of the fact that in certain circumstances bridges connecting data link layers of differing MTU sizes defeat pMTU discovery. I have to put some more investigation into this matter. If a frame with maximum MTU size is to be transported into the network with the smaller MTU size, it is truncated silently. A bridge does not know anything about the upper protocol levels: A bridge neither fragments IP nor sends an ICMP error.

There may be work-arounds, and the tcp_mss_def is one of them. Setting all interfaces to the minimum shared MTU might help, at the cost of losing performance on the larger MTU network. Using what RFC 1122 calls an IP gateway is a possible, yet expensive solution.

ip_ire_pathmtu_interval
default 30000, recommended 600000
Since 2.5: default 600000, no recommendations

This timer determines the interval at which Solaris rediscovers the path MTU. An extremely large value will only evaluate the path MTU once at connection establishment.

ip_path_mtu_discovery
default 1, recommended 1

This parameter switches path MTU discovery on or off. If you enter a 0 here, Solaris will never try to set the DF bit in the IP header - unless your application explicitly requests it.

tcp_ignore_path_mtu
default 0, recommended 0

This is a debug switch! When activated, this switch will have the IP or TCP layer ignore all ICMP error messages fragmentation needed. By this, you will achieve the opposite of what you intended.
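The three path MTU settings recommended above could be applied as in the following sketch. The NDD variable and the function name pmtu_settings are helpers made up for this example; by default the commands are only printed (dry run), and the device names /dev/ip and /dev/tcp are assumed to be where the respective parameters live on your release.

```sh
#!/bin/sh
# Dry run by default: NDD expands to "echo ndd", so commands are only printed.
# Call the script with NDD=ndd (as root) to really apply the values.
NDD=${NDD:-"echo ndd"}

pmtu_settings() {
    $NDD -set /dev/ip ip_path_mtu_discovery 1         # keep path MTU discovery on
    $NDD -set /dev/ip ip_ire_pathmtu_interval 600000  # rediscover every 10 minutes
    $NDD -set /dev/tcp tcp_ignore_path_mtu 0          # debug switch stays off
}

pmtu_settings
```

On Solaris 2.5 and later the interval is already 600000, so the second line only matters for older releases.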

tcp_mss_def
default 536, recommended >= 536
Since 8: split into tcp_mss_def_ipv4 and tcp_mss_def_ipv6

This parameter determines the default MSS (maximum segment size) for non-local destinations. For path MTU discovery to work effectively, this value can be set to the MTU of the most-used outgoing interface decreased by the 20 byte IP header and the 20 byte TCP header - if and only if the value is bigger than 536.
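The arithmetic is simple enough for a shell one-liner. The following sketch computes a candidate value for tcp_mss_def from an interface MTU; the function name mss_for_mtu is made up for this example, and the fallback to 536 reflects the "if and only if" condition above.

```sh
#!/bin/sh
# Candidate for tcp_mss_def: MTU minus 20 byte IP header and 20 byte TCP
# header, but never below the default of 536.
mss_for_mtu() {
    mss=`expr $1 - 40`
    if [ "$mss" -gt 536 ]; then
        echo "$mss"
    else
        echo 536
    fi
}

mss_for_mtu 1500    # ethernet MTU, prints 1460
mss_for_mtu 576     # prints 536
```

The result could then be set with a command of the form ndd -set /dev/tcp tcp_mss_def 1460.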

tcp_mss_def_ipv4
Since 8: default 536
tcp_mss_def_ipv6
Since 8: default 1460

Solaris 8 supports IPv6. Since IPv6 uses different defaults for the maximum segment size, one has to distinguish between IPv4 and IPv6. The default for IPv6 is close to what is said for tcp_mss_def.

tcp_mss_min
default 1, see text, (88 in Linux 2.4 and Win2k)
Since 8 with patch 108528-14 or above: 108

This parameter defines the minimum for the maximum segment size. While I still ponder the implications of increasing this parameter, some people tell me that a minimum of 108, especially in an NFS environment, is favorable. It might also have security implications on the TCP MSS handshake during connection initiation: A malicious host can announce an MSS that is below any sensible value, and this parameter might protect your host by enforcing a sensible minimum.

Unfortunately, currently I don't have a host to test this.

5. Further advice, hints and remarks

This section covers a variety of topics, starting with various TCP timers which do not relate to previously mentioned issues. The next subsection throws a quick glance at some erratic behavior. The final section looks at a variety of parameters which deal with the reservation of resources.

Additionally, I strongly suggest the use of a file /etc/init.d/nettune (always called first script) which changes the tunable parameters. /etc/rcS.d/S31nettune is a hardlink to this file. The script will be executed during bootup when the system is in single user mode. A kill script is not necessary. The section about startup scripts below reiterates this topic in greater depth.

5.1 Common TCP timers

The current subsection covers three important TCP timers. First I will have a look at the keepalive timer. The timer is rather controversial, and some Solari implement it incorrectly. The next parameter limits the twice maximum segment lifetime (2MSL) value, which is connected to the time a socket spends in the TCP state TIME_WAIT. The final entry looks at the time spent in the TCP state FIN_WAIT_2.

tcp_keepalive_interval
default 7200000, minimum 10000, recommended 10000 <= x <= oo

This value is one of the most controversial ones when talking with other people about appropriate values. The interval specified with this key must expire before a keep-alive probe can be sent. Keep-alive probes are described in the host requirements RFC 1122: If a host chooses to implement keep-alive probes, it must enable the application to switch them on or off for a connection, and keep-alive probes must be switched off by default.

Keep-alives can terminate a perfectly good connection (as far as TCP/IP is concerned), cost you money and use up transmission capacity (commonly called bandwidth, which is, actually, something completely different). Determining whether a peer is alive should be a task of the application and thus kept on the application layer. Only if you run into the danger of keeping a server in the ESTABLISHED state forever, and thus using up precious server resources, should you switch on keep-alive probes.

Example for a webserver response

Figure 3: A typical handshake during a transaction.

Figure 3 shows the typical handshake during an HTTP connection. It is of no importance for the argumentation if the server is threaded, preforked or just plain forked. Webservers work transaction oriented, as is shown in the following simplified description - the numbers do not relate to the figure:

  1. The client (browser) initiates a connection (active open).
  2. The client forwards its query (request).
  3. The server (daemon) answers (response).
  4. The server terminates the connection (active close).

Common implementations need to exchange 9..10 TCP segments per HTTP connection. The keep-alive option as an HTTP/1.0 protocol extension can be regarded as a hack. Persistent connections are a different matter, and not shown here. Most people still use HTTP/1.0, especially the Squid users.

The keep-alive timer becomes significant for webservers, if in step 1 the client crashes or terminates without the server knowing about it. This condition can sometimes be forced by quickly pressing the stop button of Netscape or the logo of Mosaic. Thus the keep-alive probes do make sense for webservers. HTTP proxies look like a server to the browser, but look like a client to the server they are querying. Due to their server-like interface, the conditions for webservers are true for proxies, as well.

With an implementation of keep-alive probes working correctly, a very small value can make sense when trying to improve webservers. In this case you have to make sure that the probes stop after a finite time, if a peer does not answer. Solari <= 2.5 have a bug and send keep-alive probes forever. They seem to want to elicit some response, like a RST or some ICMP error message from an intermediate router, but never counted on the destination simply being down. Is this fixed with 2.5.1? Is there a patch available against this misbehavior? I don't know, maybe you can help me.

I am quite sure that this bug is fixed in 2.6 and that it is safe to use a small value like ten minutes. Squid users should synchronize their cache configuration accordingly. There are some Squid timeouts dealing with an idle connection.

tcp_close_wait_interval
default 240000 (according to RFC 1122, 2MSL), recommended 60000, possibly lower
Since 7: obsoleted parameter, use tcp_time_wait_interval instead
Since 8: no more access, use tcp_time_wait_interval

Even though the parameter key contains "close_wait" in its name, the value specifies the TIME_WAIT interval! In order to fix this kind of confusion, starting with Solaris 7, the parameter tcp_close_wait_interval was renamed to the correct name tcp_time_wait_interval. The old key tcp_close_wait_interval still exists for backward compatibility reasons. Users of Solari below 7 must use the old name tcp_close_wait_interval. Still, refer to tcp_time_wait_interval for an in-depth explanation.

tcp_time_wait_interval
Since 7: default 240000 (2MSL according to RFC 1122), recommended 60000, possibly lower

As Stevens repeatedly states in his books, the TIME_WAIT state is your friend. You should not desperately try to avoid it, rather try to understand it. The maximum segment lifetime (MSL) is the maximum interval a TCP segment may live in the net. Thus waiting twice this interval ensures that there are no leftover segments coming to haunt you. This is what the 2MSL is about. Afterwards it is safe to reuse the socket resource.

The parameter specifies the 2MSL according to the four minute limit specified in RFC 1122. With the knowledge about current network topologies and the strategies to reserve ephemeral ports you should consider a shorter interval. The shorter the interval, the faster precious resources like ephemeral ports are available again.

A toplevel search engine implementor recommends a value of 1000 milliseconds to its customers. Personally I believe this is too low for a regular server. A loaded search engine is a different matter altogether, but now you see where some people start tweaking their systems. I rather tend to use a multiple of the tcp_rexmit_interval_initial interval. The current value of tcp_rexmit_interval_max should also be considered in this case - even though retransmissions are unconnected to the 2MSL time. A good starting point might be double the RTT to a very remote system (e.g. Australia for European sites). Alternatively, a German commercial provider of my acquaintance uses 30000, the smallest interval recommended by BSD.

tcp_fin_wait_2_flush_interval
BSD 675000, default 675000, recommended 67500 (one zero less)

This value seems to describe the (BSD) timer interval which prohibits a connection from staying in the FIN_WAIT_2 state forever. FIN_WAIT_2 is reached if a connection closes actively. The FIN is acknowledged, but the FIN from the passive side didn't arrive yet - and maybe never will.

Usually webservers and proxies actively close connections - as long as you don't use persistent connections, and even those are closed from time to time. Apart from that, HTTP/1.0 compliant servers and proxies close connections after each transaction. A crashed or misbehaving browser may cause a server to use up a precious resource for a long time.

You should consider decreasing this interval, if netstat -f inet shows many connections in the state FIN_WAIT_2. The timer is only used if the connection is really idle. Mind that after a TCP half close a simplex data transmission is still available towards the actively closing end. TCP half closes are not yet supported by Squid, though many web servers do support them (certain HTTP drafts suggest an independent use of TCP connections). Nevertheless, as long as the client sends data after the server actively half closed an established connection, the timer is not active.
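To judge whether "many" connections linger in FIN_WAIT_2, it helps to count them. The following sketch assumes that the TCP state is the last column of each netstat line; the function name count_state and the sample lines are made up for this example, and on old Solaris releases the -v option may require nawk instead of awk.

```sh
#!/bin/sh
# Count the lines on standard input whose last column equals the given state.
count_state() {
    awk -v state="$1" '$NF == state { n++ } END { print n + 0 }'
}

# On a live system:  netstat -f inet | count_state FIN_WAIT_2
# Demonstration with invented sample lines:
count_state FIN_WAIT_2 <<'EOF'
192.0.2.1.80   198.51.100.7.1034   8760  0  8760  0 FIN_WAIT_2
192.0.2.1.80   198.51.100.8.1207   8760  0  8760  0 ESTABLISHED
192.0.2.1.80   198.51.100.9.1561   8760  0  8760  0 FIN_WAIT_2
EOF
```

Watching this number over time, rather than reading it once, shows whether lowering the interval actually has an effect.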

Sometimes, a Squid running on Solaris (2.5.1) confuses the system utterly. A great number of connections are, to a varying degree, in CLOSE_WAIT for reasons beyond me. During this phase the proxy is virtually unreachable for HTTP requests though, obnoxiously, it still answers ICP requests. Although lowering the value for tcp_close_wait_interval is only fixing symptoms indirectly, not the cause, it may help overcoming those periods of erratic behavior faster than the default. The thing needed would be some means to influence the CLOSE_WAIT interval directly.

5.2 Erratic IPX behavior

I noticed that Solari < 2.6 behave erratically under some conditions, if the IPX ethernet MTU of 1500 is used. Maybe there is an error in the frame assembly algorithm. If you limit yourself to the IEEE 802.3 MTU of 1492 byte, the problem does not seem to appear. A sample startup script with a link in /etc/rc2.d can be used to change the MTU of ethernet interfaces after their initialization. Remember to set the MTU for every virtual interface, too!

Note, with a patched Solaris 2.5.1 or Solaris 2.6, the problem does not seem to appear. Limiting your MTU to a non-standard value might introduce problems with truncated PDUs in certain (admittedly very special) environments. Thus you may want to refrain from using the above mentioned script (always called second script in this document).

Since I observed the erratic behavior only in Solaris 2.5, I believe it has been fixed with patch 103169-10, or above. The error description reads "1226653 IP can send packets larger than MTU size to the driver."

5.3 Common IP parameters

The following parameters have little impact on performance, nevertheless I reckon them worth noting here. Please note that parameters starting with the ip6 prefix apply to IPv6 while their twins with the ip prefix apply to IPv4:

ip6_forwarding
Since 8: default 1, recommended 0 for pure server hosts or security
ip_forwarding
default 2, recommended 0 for pure server hosts or security
Since 8: default 1, recommended 0 for security reasons

If you intend to disable the routing abilities of your host altogether, because you know you don't need them, you can set this switch to 0. The default value of 2 was only available in older versions of Solaris. It activates IP forwarding, if two or more real interfaces are up. The value of 1 in Solari < 8 activates IP forwarding regardless of the number of interfaces. With the possible exception of MBone routers and firewalling, you should leave routing to the dedicated routing hardware.

Starting with Solaris 8, the parameter set is split. You use ip_forwarding and ip6_forwarding to switch on overall forwarding of IPv4 and IPv6 PDUs respectively between interfaces. The interfaces participating in forwarding can be activated separately, see if:ip_forwarding. Unless your host is acting as a router, it is still recommended for security reasons to switch off any forwarding between interfaces.

if:ip_forwarding
Since 8: default 0, maximum 1, recommended 0

Please replace the if part of the parameter name with the appropriate interface available on your system, e.g. hme0. Look into the available /dev/ip parameters, if unsure what interfaces are known to the IP stack.

Starting with Solaris 8, a subset of interfaces participating in IP forwarding can be selected by setting the appropriate parameter to 1. You also need to set the ip6_forwarding and ip_forwarding parameters, if you want to forward IPv6 or IPv4 respectively.

For security reasons, and in many environments, forwarding is not recommended.

ip6_forward_src_routed
Since 8: default 1, recommended 0 for security reasons
ip_forward_src_routed
default 1, recommended 0 for security reasons

This parameter determines if IP datagrams can be forwarded which have the source routing option activated. The parameter has little meaning for performance but is rather of security relevance. Solaris may forward such datagrams, if the host route option is activated, bypassing certain security constructs - possibly undermining your firewall. Thus you should always disable it, unless the host functions as a regular router (and offers no other services).

If you enabled IPv6 forwarding or IPv4 forwarding, the *_forward_src_routed parameters may relate to forwarding.

ip_forward_directed_broadcasts
default 1, recommended 0 for pure server hosts or security

This switch decides whether datagrams directed to any of your direct broadcast addresses can be forwarded as link-layer broadcasts. If the switch is on (default), such datagrams are forwarded. If set to zero, pings or other broadcasts to the broadcast address(es) of your installed interface(s) are silently discarded. Setting the switch to zero is recommended for any host, but can break "expected" behavior.

ip6_respond_to_echo_multicast
Since 8: default 1, recommended 0 for security reasons
ip_respond_to_echo_broadcast
default 1, recommended 0 for security reasons

If you don't want to respond to an ICMP echo request (usually generated by the ping program) to any of your IPv4 broadcast or IPv6 multicast addresses, set the matching parameter to 0. On one hand, responding to broadcast pings is rumored to have caused panics, or at least partial network meltdowns. On the other hand, it is a valid behavior, and often used to determine the number of alive hosts on a particular network. If you are dead sure that neither you nor your network admin will need this feature, you can switch it off by using the value of 0.

If you do not want to respond to any IPv4-broadcast or IPv6-multicast probes for security reasons, it is recommended to set the matching parameter to 0.

ip_icmp_err_burst
Since 8: default 10, min 1, maximum 99999, see text
ip_icmp_err_interval
default 500, recommended: see text

Solaris IP only generates ip_icmp_err_burst ICMP error messages in any ip_icmp_err_interval, regardless of IPv4 or IPv6. In order to protect from denial of service (DOS) attacks, the parameters do not need to be changed. Some administrators may need a higher error generation rate, and thus may want to decrease the interval or increase the number of generated messages.

In versions of Solaris prior to 8, ip_icmp_err_interval used to define the minimum time between two consecutive ICMP error responses - as if in older versions the (by then not existing) ip_icmp_err_burst parameter had a value of 1. The generated ICMP responses include the time exceeded message as evoked by the traceroute command. If your current setting here is above the RTT of a traceroute probe, usually the second probe you see will time out.

If you set ip_icmp_err_burst to exactly 0, traceroute will not give away your host as running Solaris. Also, you switched off the rate limitation of ICMP messages, and are thus open to DOS attacks. Of course, there are other ways to determine which TCP/IP implementation a networked host is running.
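The resulting upper limit of ICMP error messages per second follows directly from the two parameters. The following sketch only does the arithmetic; the function name icmp_err_rate is made up for this example, and the interval is assumed to be given in milliseconds.

```sh
#!/bin/sh
# Maximum ICMP error messages per second:
#   burst messages per interval, interval in milliseconds.
icmp_err_rate() {
    expr $1 \* 1000 / $2
}

icmp_err_rate 10 500   # Solaris 8 defaults, prints 20
icmp_err_rate 1 500    # behavior before Solaris 8, prints 2
```

With only one error message per 500 ms, a traceroute that sends its probes faster than that will see some of them unanswered, as described above.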

ip6_icmp_return_data_bytes
Since 8: default 64, minimum 8, maximum 65520, no recommendations
ip_icmp_return_data_bytes
default 64, minimum 8, maximum 65520, no recommendations

The parameters control the number of bytes returned by any ICMP error message generated on this Solaris host. The default value 64 is sufficient for most cases. Some laboratory environments may want to temporarily increase the value in order to figure out problems with some network services.

ip6_send_redirects
Since 8: default 1, recommendation 0 for security reasons
ip_send_redirects
default 1, recommendation 0 for security reasons

These parameters control whether the IPv4 or IPv6 part of the IP stack sends ICMP redirect messages. For security reasons, it is recommended to disable sending out such messages, unless your host is acting as a router.

If you enabled IPv6 forwarding or IPv4 forwarding, the *_send_redirects parameters may relate to forwarding.

ip6_ignore_redirect
Since 8: default 0, recommendation 1 for security reasons
ip_ignore_redirect
default 0, recommendation 1 for security reasons

This flag controls whether your routing table can be updated by ICMP redirect messages. Unless you run your host as a router, it is recommended to ignore redirects (value 1) for security reasons. Otherwise, malicious external hosts may confuse your routing table.

If you enabled IPv6 forwarding or IPv4 forwarding, the *_ignore_redirect parameters may relate to forwarding.
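For a pure server host that does not route, the security related IPv4 recommendations of this subsection can be collected as in the following sketch. The NDD variable and the function name ip_security_settings are helpers made up for this example; by default the commands are only printed (dry run). The ip6 twins on Solaris 8 would be set analogously.

```sh
#!/bin/sh
# Dry run by default: NDD expands to "echo ndd", so commands are only printed.
# Call the script with NDD=ndd (as root) to really apply the values.
NDD=${NDD:-"echo ndd"}

ip_security_settings() {
    $NDD -set /dev/ip ip_forwarding 0                   # no routing between interfaces
    $NDD -set /dev/ip ip_forward_src_routed 0           # drop source routed datagrams
    $NDD -set /dev/ip ip_forward_directed_broadcasts 0  # no directed broadcasts
    $NDD -set /dev/ip ip_respond_to_echo_broadcast 0    # ignore broadcast pings
    $NDD -set /dev/ip ip_send_redirects 0               # do not send ICMP redirects
    $NDD -set /dev/ip ip_ignore_redirect 1              # do not accept ICMP redirects
}

ip_security_settings
```

A host that acts as router or firewall needs a different set of values, so review each line before applying it.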

ip_addrs_per_if
default 256, minimum 1, maximum 8192, no recommendations

This parameter limits the number of virtual interfaces you can declare per physical interface. Especially if you run Web Polygraph, you will need to increase the number of virtual interfaces available on your system.

ip6_strict_dst_multihoming
Since 8: default 0, recommended: see text
ip_strict_dst_multihoming
default 0, recommended: see text

According to RFC 1122, a host is said to be multihomed, if it has more than one IP address. Each IP address is assumed to be a logical interface. Different logical interfaces may map to the same physical interface. Physical interfaces may be connected to the same or different networks.

The strong end system model aka strict multihoming requires a host not to accept datagrams on physical interfaces to which the logical one is not bound. Outgoing datagrams are restricted to the interface which corresponds with the source IP address.

The weak end system model aka loose multihoming lets a host accept any of its IP addresses on any of its interfaces. Outgoing datagrams may be sent on any interface.

For security reasons, it is recommended to require strict multihoming, that is, setting the parameter to value 1. In certain circumstances, though, it may be necessary to disable strict multihoming, e.g. if the host is connected to a virtual private network (VPN) or sometimes when acting as a firewall.

For instance, I once maintained a setup where a pair of related caching proxies were talking exclusively to each other via a crossover cable on one interface using private addresses while the other interface was connected to the public internet. In order to have them actually use the behind-the-scenes link, I had to manually set routes and disable strict multihoming.

5.4 TCP and UDP port related parameters

There are some parameters related to the ranges of ports associated with reserved access and non-privileged access. This section deals with the majority of useful parameters when selecting port ranges different from the defaults.

udp_smallest_anon_port
tcp_smallest_anon_port
default 32768, recommended 8192

This value has the same size for UDP and TCP. Solaris allocates ephemeral ports above 32768. Busy servers or hosts using a large 2MSL, see tcp_close_wait_interval, may want to lower this limit to 8192. This yields more precious resources, especially for proxy servers.

A contra-indication may be servers and services running on well known ports above 8192. This parameter should be set very early during system bootup, especially before the portmapper is started.

The IANA port numbers document requires the dynamic and/or private ports to start at 49152. For busy servers, severely limiting their ephemeral port supply in such a manner is not an option.
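How much the lower bound matters can be estimated with a back-of-the-envelope calculation. The sketch below is a rough model resting on one assumption: every actively opened or closed connection occupies one ephemeral port for the full 2MSL interval. The function names anon_ports and conn_rate are made up for this example; real limits also depend on the remote address and port of each connection.

```sh
#!/bin/sh
# Number of ephemeral ports in the range smallest..largest.
anon_ports() {
    expr $2 - $1 + 1
}

# Rough upper bound of new connections per second if each port stays
# blocked for the given 2MSL interval (milliseconds).
conn_rate() {
    ports=`anon_ports $1 $2`
    expr $ports \* 1000 / $3
}

echo "default 32768: `anon_ports 32768 65535` ports, `conn_rate 32768 65535 240000` conn/s"
echo "lowered  8192: `anon_ports 8192 65535` ports, `conn_rate 8192 65535 240000` conn/s"
echo "IANA    49152: `anon_ports 49152 65535` ports, `conn_rate 49152 65535 240000` conn/s"
```

Under this model the default range sustains about 136 connections per second at a 2MSL of 240000, the lowered range about 238, and the IANA range only about 68. Reducing the 2MSL to 60000 multiplies each figure by four.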

udp_largest_anon_port
default 65535, recommended: see text

This parameter has to be seen in combination with udp_smallest_anon_port. The traceroute program tries to reach a random UDP port above 32768 - or rather tries not to reach such a port - in order to provoke an ICMP error message from the host.

Paranoid system administrators may want to lower the value for this reason down to 32767, after the corresponding value for udp_smallest_anon_port has been lowered. On the other hand, datagram application protocols should be able to cope with foreign protocol datagrams.

If an ICP caching proxy or other UDP hyper-active applications are used, the lowering of this value can not be recommended. The respective TCP parameter tcp_largest_anon_port does not suffer this problem.

tcp_largest_anon_port
default 65535, no recommendations

The largest anonymous port for TCP should be the largest possible port number. There is no need to change this parameter.

udp_smallest_nonpriv_port
default 1024, no recommendations
tcp_smallest_nonpriv_port
default 1024, no recommendations

Privileged ports can only be bound to by the superuser. The smallest non-privileged port is the first port that a regular user can have his or her application bind to.

tcp_extra_priv_ports_add
udp_extra_priv_ports_add
write-only action
tcp_extra_priv_ports_del
udp_extra_priv_ports_del
write-only action
tcp_extra_priv_ports
udp_extra_priv_ports
default (depends on active services)

The extra privileged ports are those privileged ports outside the scope of the reserved ports. Reserved port numbers are usually below 1024, see tcp_smallest_nonpriv_port for TCP and udp_smallest_nonpriv_port for UDP, and require superuser privileges in order to bind to. For instance, if NFS is activated, the NFS server port 2049 is marked as privileged.

You can examine the extra privileged TCP ports by looking at the read-only parameter tcp_extra_priv_ports. If you need to add an extra privileged port, use the tcp_extra_priv_ports_add action with the port number as argument. If you need to remove an extra privileged port, use the tcp_extra_priv_ports_del action with the port number to remove as parameter. You can only add or remove one port at a time.

# ndd /dev/tcp tcp_extra_priv_ports
2049
4045
# ndd -set /dev/tcp tcp_extra_priv_ports_add 4444 5555
# ndd /dev/tcp tcp_extra_priv_ports
2049
4045
4444
# ndd -set /dev/tcp tcp_extra_priv_ports_del 4444
# ndd /dev/tcp tcp_extra_priv_ports
2049
4045

Analogous procedures apply to UDP extra privileged ports.

6. Windows, buffers and watermarks

This section is about windows, buffers and watermarks. It is still work in progress. The explanations available to me were very confusing (sigh), though Stevens [7] helped to clear up a few things. If you have corrections to this section, please let me know and contribute to an update of the page. Many readers will thank you!

Figure 4: buffers and related issues (buffers and fragmentation while descending the protocol layers).

Here is just a short trip through the network layers in order to explain what happens where. Your application is able to send almost any size of data to the transport layer. The transport layer is either UDP or TCP. The socket buffers are implemented on the transport layer. Depending on your choice of transport protocol, different actions are taken on this level.

TCP
All application data is copied into the socket buffer. If there is insufficient space, the application will be put to sleep. From the socket buffer, TCP will create segments. No chunk exceeds the MSS.

Only when the data has been acknowledged by the peer instance can it be removed from the socket buffer! For slow connections or a slowly working peer, this implies that some data occupies the buffer for a very long time.

UDP
The socket buffer size of UDP is simply the maximum size of datagram UDP is able to transmit. Larger datagrams ought to elicit the EMSGSIZE error response from the socket layer. With UDP implementing an unreliable service, there is no need to keep the datagram in the socket buffer.

Please assume that there is not really a socket buffer for sending UDP. This really depends on the operating system, but many systems copy the user data to some kernel storage area, whereas others try to eliminate all copy operations for the sake of performance.

Please note that for the reverse direction, that is receiving datagrams, UDP does indeed employ real buffering.

The IP layer needs to fragment chunks which are too large. Among the reasons TCP prechunks its segments is the need to avoid fragmentation. IP searches the routing tables for the appropriate interface in order to determine the fragment size and interface.

If the output queue of the datalink layer interface is full, the datagram will be discarded and an error will be returned to IP and back to the transport layer. If the transport protocol was TCP, TCP will try to resend the segment at a later time. UDP should return the ENOBUFS error, but some implementations don't.

To determine the MTU sizes, use the ifconfig -a command. The MTUs are needed for some calculations to be done later in this section. With IPv4 you can determine the MSS from the interface MTU by subtracting 20 bytes for the TCP header and 20 bytes for the IP header. Keep this in mind, as the calculation will be repeatedly necessary in the text following below.

$ ifconfig -a
lo0: flags=849<UP,LOOPBACK,RUNNING,MULTICAST> mtu 8232
inet 127.0.0.1 netmask ff000000
hme0: flags=863<UP,BROADCAST,NOTRAILERS,RUNNING,MULTICAST> mtu 1500
inet 130.75.3.xxx netmask ffffff80 broadcast 130.75.3.255
ci0: flags=843<UP,BROADCAST,RUNNING,MULTICAST> mtu 9180
inet 130.75.214.xxx netmask ffffff00 broadcast 130.75.214.255
ether xx:xx:xx:xx:xx:xx
fa0: flags=842<BROADCAST,RUNNING,MULTICAST> mtu 9188
inet 0.0.0.0 netmask 0
ether xx:xx:xx:xx:xx:xx
el0: flags=843<UP,BROADCAST,RUNNING,MULTICAST> mtu 1500
inet 130.75.215.xxx netmask ffffff00 broadcast 130.75.215.255
ether xx:xx:xx:xx:xx:xx

I removed the uninteresting things. hme0 is the regular 100 Mbps ethernet interface. The 10 Mbps ethernet interface is called le0. The el0 interface is an ATM LAN emulation (lane) interface. ci0 is the ATM classical IP (clip) interface. fa0 is the interface that supports Fore's proprietary implementation of native ATM. Fore is the vendor of the installed ATM card. AFAIK you can use this interface to build PVCs or, if you are also using Fore switches, SVCs. You see an unconfigured interface there.
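The MTU-to-MSS arithmetic described before the ifconfig listing can be written down as a small sketch. It assumes IPv4 with neither IP nor TCP options, so both headers are 20 bytes; the function name is made up for this illustration.

```python
IPV4_HEADER = 20  # bytes, assuming no IP options
TCP_HEADER = 20   # bytes, assuming no TCP options


def mss_from_mtu(mtu):
    """Return the IPv4 TCP MSS for a given interface MTU."""
    return mtu - IPV4_HEADER - TCP_HEADER


# MTU values as shown in the ifconfig listing
for name, mtu in [("lo0", 8232), ("hme0", 1500), ("ci0", 9180)]:
    print(name, "MTU", mtu, "MSS", mss_from_mtu(mtu))
```

The MSS values 8192, 1460 and 9140 reappear in the window size discussions further down.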

The buffer sizes for sending and receiving TCP segments and UDP datagrams can be tuned with Solaris. With the help of the netstat command you can obtain output similar to the following. The data was obtained on a server which runs a Squid with five dnsserver children. Since the interprocess communication is accomplished via localhost sockets, you see both the client side and the server side of each dnsserver child socket.

$ netstat -f inet

TCP
Local Address        Remote Address       Swind Send-Q Rwind Recv-Q State
-------------------- -------------------- ----- ------ ----- ------ -------
blau-clip.ssh        challenger-clip.1023 57344     19 63980      0 ESTABLISHED
localhost.38437      localhost.38436      57344      0 57344      0 ESTABLISHED
localhost.38436      localhost.38437      57344      0 57344      0 ESTABLISHED
localhost.38439      localhost.38438      57344      0 57344      0 ESTABLISHED
localhost.38438      localhost.38439      57344      0 57344      0 ESTABLISHED
localhost.38441      localhost.38440      57344      0 57344      0 ESTABLISHED
localhost.38440      localhost.38441      57344      0 57344      0 ESTABLISHED
localhost.38443      localhost.38442      57344      0 57344      0 ESTABLISHED
localhost.38442      localhost.38443      57344      0 57344      0 ESTABLISHED
localhost.38445      localhost.38444      57344      0 57344      0 ESTABLISHED
localhost.38444      localhost.38445      57344      0 57344      0 ESTABLISHED

The columns titled Swind and Rwind contain values for the size of the respective send and reception windows, based on the free space available in the receive buffer at each peer. The Swind column contains the offered window size as reported by the remote peer. The Rwind column displays the advertised window size being transmitted to the remote peer.

An application can change the size of the socket layer buffers with calls to setsockopt with the parameter SO_SNDBUF or SO_RCVBUF. Windows and buffers are not interchangeable. Just remember: The buffers have a fixed size - unless you use setsockopt to change it. Windows on the other hand depend on the free space available in the input buffer. The minimum and maximum requirements for buffer sizes are tunable watermarks.

Figure 5: buffers, watermarks and window sizes.

Figure 5 shows the relation of the different buffers, windows and watermarks. I decided to let the send buffer grow from the maximum towards zero, which is just a way of showing things, and probably does not represent the real implementation. I left out the different socket options as the picture is confusing enough.

Squid users should note the following behavior seen with Solaris 2.6. The default socket buffer sizes which are detected during the configuration phase are representative of the values for tcp_recv_hiwat, udp_recv_hiwat, tcp_xmit_hiwat and udp_xmit_hiwat. Also note that enabling the hit object feature still limits hit object size to 16384 bytes, regardless of what your system is able to achieve.

Output from the Squid 1.1.19 configuration script on a Solaris 2.6 host with the previously mentioned parameters all set to 64000. Please mind that these parameters do not constitute optimal sizes in most environments:

checking Default UDP send buffer size... 64000
checking Default UDP receive buffer size... 64000
checking Default TCP send buffer size... 64000
checking Default TCP receive buffer size... 64000

Buffers and windows are very important if you link via satellite. Due to the data rate possible combined with the extremely high round-trip delays of a satellite link, you will need very large TCP windows and possibly the TCP timestamp option. Only RFC 1323 conformant systems will achieve these ends. In other words, get a Solaris 2.6. For 2.5 systems, RFC 1323 compliance can be purchased as a Sun Consulting Special.

Window sizes are important for maximum throughput calculations, too. As Stevens [4] shows, you cannot go faster than the window size offered by your peer, divided by the round-trip time (RTT). The lower your RTT, the faster you can transmit. The larger your window, the faster you can transmit. If you intend to employ maximum window sizes, you might want to give tcp_deferred_acks_max another look.

The network research laboratory of the German research network did measurements on satellite links. The RTT for a 10 Mbps link (if I remember correctly) was about 500 ms. A regular system was able to transmit 600 kbps whereas an RFC 1323 conformant system was able to transmit about 7 Mbps. Only bulk data transfer will do that for you.

  (1)  10 Mbps * 0.5 s =   5 Mbit = 625 KB
  (2)  512 KB  / 0.5 s =   1 MBps =   8 Mbps
  (3)   64 KB  / 0.5 s = 128 KBps =   1 Mbps

The bandwidth-delay product can be used to estimate the initial value when tweaking buffer sizes. The buffers then represent the capacity of the link. If we apply the bandwidth-delay product calculations to the satellite link above, we get the following results: Equation 1 estimates the buffer sizes necessary to completely fill the 10 Mbps link. Equation 2 assumes that the buffer sizes were set to 512 KB, which would yield 8 Mbps. The slight deviation in the experiment may have been caused by retransmissions. Finally, equation 3 estimates the maximum data rate we can use on the satellite link if limited to 64 KB buffers, e.g. Solaris <= 2.5.1. The 1 Mbps constitutes an upper limit, as can be seen from the measured 600 kbps.
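The three equations can be replayed with a minimal sketch. The function names are made up for this illustration, and I assume that equation 1 uses decimal units while equations 2 and 3 use 1 KB = 1024 bytes, which is why the results come out slightly above the rounded 8 Mbps and 1 Mbps.

```python
def bdp_bytes(bandwidth_bps, rtt_s):
    """Bandwidth-delay product: bytes that must be in flight to fill the link."""
    return bandwidth_bps * rtt_s / 8


def max_throughput_bps(window_bytes, rtt_s):
    """Upper bound on throughput when one window is sent per round trip."""
    return window_bytes * 8 / rtt_s


RTT = 0.5  # seconds, the satellite round-trip time from the text

print("equation 1:", bdp_bytes(10_000_000, RTT), "bytes")
print("equation 2:", max_throughput_bps(512 * 1024, RTT), "bit/s")
print("equation 3:", max_throughput_bps(64 * 1024, RTT), "bit/s")
```

Both functions are the same relation solved for a different unknown, so a measured RTT and a target rate give you a first guess for the buffer size.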

Application developers, especially those for web-based applications, should be aware of the implications of persistent connections. As long as the HTTP/1.0 connection-per-transaction style is used by your application, depending on the size of the transaction data, you will not get any decent transmissions via satellite. For instance, the average web object is about 13 KByte in size, thus transmitting such an object on a connection-per-transaction basis will never get past TCP slow start. While this may or may not be a big deal with terrestrial links, you will never be able to fill a satellite pipe to a satisfactory degree. Doing things in parallel might help. Only when reaching TCP congestion avoidance will you see any filling of the pipe. You might also want to check out the unrelated tcp_slow_start_initial parameter.

A word of caution seems to be in order when tuning the Solaris TCP high watermarks: Starting with Solaris 2.6, setting tcp_xmit_hiwat or tcp_recv_hiwat near 65535 may have the side effect of turning on the wscale option, because these values are rounded up to multiples of the MTU for each connection. In some cases you may not want to accidentally use wscale, because it may break something else in your setup such as IP-Filter. To avoid accidentally using wscale, you need to make sure that tcp_xmit_hiwat and tcp_recv_hiwat are both at least 1 MTU below 65535. For ethernet interfaces, 64000 is a good choice.
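The rounding effect can be checked with a short sketch before you commit a value. It only models the rule as described in the paragraph above (round up to the next multiple of a unit, compare with 65535); the function names are made up, and the real kernel heuristic may differ in detail.

```python
MAX_UNSCALED_WINDOW = 65535  # largest window that fits the 16-bit header field


def rounded_hiwat(hiwat, unit):
    """Round a high watermark up to the next multiple of unit (e.g. the MTU)."""
    return -(-hiwat // unit) * unit  # integer ceiling division


def may_trigger_wscale(hiwat, unit):
    """True if the rounded-up watermark no longer fits into 16 bits."""
    return rounded_hiwat(hiwat, unit) > MAX_UNSCALED_WINDOW


for hiwat in (64000, 65000):
    print(hiwat, rounded_hiwat(hiwat, 1500), may_trigger_wscale(hiwat, 1500))
```

With an ethernet MTU of 1500, 64000 rounds up to 64500 and stays below the limit, whereas 65000 rounds up to 66000 and crosses it.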

tcp_cwnd_max
default 32768, since 2.? 65535, recommended 65535 for Solaris <= 2.5.1
since 2.6: 262144 (finally!), no recommendations
Since 8: 1048576, no recommendations

This parameter describes the maximum size to which the congestion window can be opened. The congestion window is opened as large as possible with any Solaris up to 2.5.1. A change to this value is only necessary for older Solaris systems, which defaulted to 32768. The Solaris 2.6 default looks reasonable, but you might need to increase this further for satellite or long, fast links.

Though window sizes beyond 64k are possible, mind that the window scale option is only announced during connection creation and your maximum window size is 1 GByte (1,073,725,440 bytes). Also, the window scale option is only employed during the connection if both sides support it.
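The odd-looking maximum follows from the 16-bit window field and the largest shift count of 14 which RFC 1323 permits. A tiny sketch, with a made-up function name:

```python
MAX_WINDOW_FIELD = 65535  # largest value of the 16-bit TCP window field
MAX_WSCALE_SHIFT = 14     # largest shift count allowed by RFC 1323


def scaled_window(window_field, shift):
    """Return the effective window for a header value and a scale shift."""
    if not 0 <= shift <= MAX_WSCALE_SHIFT:
        raise ValueError("shift count must be between 0 and 14")
    return window_field << shift


print(scaled_window(MAX_WINDOW_FIELD, MAX_WSCALE_SHIFT))
```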

tcp_recv_hiwat
default 8192, recommended 16384 (see text), Cockroft 32768, maximum 65535
Solaris 2.6 LFN bulk data transfer 131071 or above (see text)
Since 8: 24576 (see text)

This parameter determines the maximum size of the initial TCP reception buffer. The specified value will be rounded up to the next multiple of the MSS. The advertised window size, that is, the size of the reception window advertised to the remote peer, is determined from the free space within the buffer. Squid users will be interested in this value with regard to the socket buffer size the Squid auto configuration program finds.

The previous table shows an Rwind value of 63980 = 7 * 9140. 9140 is the MSS of the ATM classical IP interface (clip) in host blau. The interface itself uses an MTU of 9180. For the standard built-in 10 Mbps or 100 Mbps IPX ethernet, you get an MTU of 1500 on the outgoing interface, which yields an MSS of 1460. The value of 57344 in the next Rwind line points to the lo0 (loopback) interface, MTU 8232, MSS 8192 and 57344 = 7 * 8192.

Starting with Solaris 2.6, values above 65535 are possible, see the window scale option from RFC 1323. Only if the peer host also implements RFC 1323 will you benefit from buffer sizes above 65535. If one host does not implement the window scale option, the window is still limited to 64K. The option is only activated if buffer sizes above 64K are used.

For HTTP, I don't see the need to increase the buffer above 64k. Imagine servicing 1024 simultaneous connections. If both TCP high watermarks of your system are tuned to 64k and your application uses the system's defaults, you would need 128M just for your TCP buffers!
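The memory estimate is simple enough to sketch. The function below is a made-up helper which assumes the worst case, namely that every connection fills both its send and its receive buffer completely.

```python
K = 1024


def tcp_buffer_memory(connections, xmit_hiwat, recv_hiwat):
    """Worst-case bytes needed if every connection fills both socket buffers."""
    return connections * (xmit_hiwat + recv_hiwat)


# 1024 connections with both high watermarks at 64K, then at 32K
print(tcp_buffer_memory(1024, 64 * K, 64 * K) // (K * K), "MB")
print(tcp_buffer_memory(1024, 32 * K, 32 * K) // (K * K), "MB")
```

The second line corresponds to the 32K case discussed for tcp_xmit_hiwat.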

Squid's configuration option tcp_recv_bufsize lets you select a TCP receive buffer size, but if set to 0 (default) the kernel value will be taken, which is configurable with the tcp_recv_hiwat parameter. A buffer size of 16K is large enough to cover over 70 % of all received web objects on our caches.

Refer to tcp_host_param for a way to configure special defaults for a set of hosts and networks.

tcp_recv_hiwat_minmss
default 4, no recommendations

This parameter influences the minimum size of the input buffer. The reception buffer is at least as large as this value multiplied by the MSS. The real value is the maximum of tcp_recv_hiwat rounded up to the next MSS and tcp_recv_hiwat_minmss multiplied by the MSS, in other words, something akin to:

  hiwat_tmp ~= ceil( tcp_recv_hiwat / MSS )
  real_size := MAX( hiwat_tmp, tcp_recv_hiwat_minmss ) * MSS

That way, however badly you misconfigure the buffers, there is guaranteed space for tcp_recv_hiwat_minmss full segments in your input buffer.
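The pseudo formula translates directly into a small sketch. The function name is made up, and the code only mirrors the formula given here, not the actual kernel source.

```python
def effective_recv_buffer(tcp_recv_hiwat, mss, tcp_recv_hiwat_minmss=4):
    """Round the high watermark up to full segments, but never below minmss."""
    segments = -(-tcp_recv_hiwat // mss)  # integer ceiling division
    return max(segments, tcp_recv_hiwat_minmss) * mss


print(effective_recv_buffer(8192, 1460))   # default hiwat on ethernet
print(effective_recv_buffer(1000, 1460))   # badly misconfigured hiwat
print(effective_recv_buffer(57344, 9140))  # clip interface
print(effective_recv_buffer(57344, 8192))  # loopback interface
```

A high watermark of 57344 would reproduce both Rwind values seen in the netstat listing (63980 on the clip interface, 57344 on loopback); that the host was really configured that way is my assumption.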

udp_recv_hiwat
default 8192, recommended 16384 (see text), maximum 65535

The highwater mark for the UDP reception buffer size. This value may be of interest for Squid proxies which use ICP extensively. Please read the explanations for tcp_recv_hiwat. Squid users will want at least 16384, especially if you are planning on using the (obsolete) hit object feature of Squid. A larger value lets your computer receive more seemingly simultaneous ICP PDUs.

If you see many dead parent detections in your cache.log file without cause, you might want to increase the receive buffer. In most environments an increase to 64000 will have a negligible effect on the memory consumption, as most applications, including Squid, use only one or very few UDP sockets, and often in an iterative way.

Remember that if you don't set your socket buffer explicitly with a call to setsockopt(), your default reception buffer will have about the mentioned size. Arriving datagrams of a larger size might be truncated or completely rejected. Some systems don't even notify your receiving application.

tcp_xmit_hiwat
default 8192, recommended 16384 (see text), Cockroft 32768, maximum 65535
Solaris 2.6 LFN bulk data transfer 131071 or above (see text)
Since 8: 16384 (see text)

This parameter influences a heuristic which determines the size of the initial send window. The actual value will be rounded up to the next multiple of the MSS, e.g. 8760 = 6 * 1460. Also do read the section on tcp_recv_hiwat.

The table further up shows a Swind of 57344 = 7 * 8192. For the standard built-in 10 Mbps or 100 Mbps IPX ethernet, you get an MTU of 1500 on the outgoing interface, which yields an MSS of 1460.

Starting with Solaris 2.6, values above 65535 are possible, see the window scale option from RFC 1323. Only if the peer host also implements RFC 1323 will you benefit from buffer sizes above 65535. If one host does not implement the window scale option, the window is still limited to 64K.

I don't see the need to increase the buffer above 32K for HTTP applications. Imagine servicing 1024 simultaneous connections. If both TCP high watermarks of your system are tuned to 32K, you would need 64M just for your TCP buffers. Mind that the send buffer has to keep a copy of all unacknowledged segments. Therefore it is justifiable to give it a greater size than the receive buffer. Again, 16K covers over 70 % of all transferred web objects on our caches, and 32K should cover 90 %.

Refer to tcp_host_param for a way to configure special defaults for a set of hosts and networks.

udp_xmit_hiwat
default 8192, recommended 16384, maximum 65535

This refers to the highwater mark for send buffers. It may be of interest for proxies using ICP extensively. Please refer to the explanations for tcp_xmit_hiwat. Squid users will want 16384, especially if you are planning on using the hit object feature of Squid. Selecting a higher value for the transmission is not feasible.

Please remember that there exists no real send buffer for UDP on the socket layer. Thus, trying to send a larger amount of data than udp_xmit_hiwat will truncate the excess, unless the SO_SNDBUF socket option was used to extend the allowed size.

tcp_xmit_lowat
default 2048, no recommendations
Since 8: 4096, no recommendations

This parameter refers to the amount of free space which must be available in the TCP socket send buffer before select or poll return writable for the connected file descriptor.

Usually there is no need to tune this parameter. Applications can use the socket option SO_SNDLOWAT to change this parameter on a process-local basis.

udp_xmit_lowat
default 1024, no recommendations

This parameter refers to the amount of free space which must be available before select or poll return writable for the connected file descriptor. Since UDP does not need to keep datagrams and thus needs no outgoing socket buffer, the socket will always be writable as long as the socket send buffer size value is greater than the low watermark. Thus it does not really make much sense to wait for a datagram socket to become writable unless you constantly adjust the send buffer size.

Usually there is no need to tune this parameter, especially not on a system-wide basis.

tcp_max_buf
default 262144, minimum 65536, no immediate recommendations
since 2.6 1048576, minimum 65536, no immediate recommendations
udp_max_buf
default 262144 (since 2.5), minimum 65536, no immediate recommendations

I finally found the explanations in the SUN TCP/IP Admin Guide. This parameter refers to the maximum buffer size an application is allowed to specify with the SO_SNDBUF and SO_RCVBUF socket option calls. Attempts to use larger buffers will fail with an EINVAL return code from the socket option call. SUN recommends using only the largest buffer necessary for any of your applications - that is, the supremum function, not the sum. Specifying a greater size does not seem to have much impact if all your applications are well-behaved. If not, they may consume quite an amount of kernel memory, thus this parameter is also a kind of safety line.
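From the application side, requesting buffers looks roughly like the following sketch. It uses the standard socket module; the helper name and the sizes are arbitrary choices for illustration. How an oversized request is treated depends on the operating system: it may be refused with an error, as described above, or silently adjusted, which is why the sketch reads the values back.

```python
import socket


def request_buffers(sock, send_bytes, recv_bytes):
    """Ask for socket buffer sizes and report what the kernel actually granted."""
    try:
        sock.setsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF, send_bytes)
        sock.setsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF, recv_bytes)
    except OSError as err:
        # A request beyond the system-wide maximum may be refused outright
        print("buffer request refused:", err)
    granted_send = sock.getsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF)
    granted_recv = sock.getsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF)
    return granted_send, granted_recv


with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
    # send buffer twice the size of the receive buffer
    print(request_buffers(s, 32768, 16384))
```

The granted values need not equal the requested ones, so do not rely on getting exactly what you asked for.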

A few odd remarks at this point, concerning the recommendations given for the transmission buffer sizes. I decreased the recommendations of Adrian Cockroft in favor of a more conservative memory consumption. Also, with an average HTTP object size of 13 KByte, you can expect to fit over 50 % of all objects into the transmission buffer. On the other hand, larger objects which are to be transmitted by a cache or web server may suffer in certain circumstances. Furthermore, I should recommend a generic transmission buffer size which is double the reception buffer size. This recommendation is based on the fact that unacknowledged segments occupy the send buffer until they are acknowledged.

Here is some more material from the SUN TCP/IP Admin Guide, kindly pointed out by Mr. Murphy. Refer to the SUN guide for a more detailed description of these parameters and their respective applicability. Most noteworthy is tcp_host_param, which allows per host/network defaults regarding RFC 1323 TCP options.

tcp_wscale_always
Since 2.6: default 0

If the parameter is set (non-zero), then the TCP window scale option will always be negotiated during connection initiation. Otherwise, the scale option will only be used if the buffer size is above 64K. To take effect, both hosts have to support RFC 1323.

tcp_tstamp_always
Since 2.6: default 0

If the parameter is set (non-zero), then the TCP timestamp option will always be negotiated during connection initiation. The timestamp option will always be used if the remote system sent a timestamp option during connection initiation. To use the timestamp, both hosts have to support RFC 1323.

tcp_tstamp_if_wscale
Since 2.6: default 0

If the option is set (non-zero), the TCP timestamp option will be used in addition to the TCP window scale option if the user has requested a buffer size above 64K, that is, if window scaling is active.

tcp_host_param_ipv6
Since 8: default is empty (this is a tabular value)

Refer to tcp_host_param for instructions on handling the table. The same rules apply except that the ipv6 table is meant for IPv6, of course.

tcp_host_param
Since 2.6: default is empty (this is a tabular value)

This parameter represents a table which contains special TCP options to be used with a remote host or network. The table is configurable with the help of ndd, and empty by default. The following piece of code displays the contents of the table at various points, sets an entry and removes it again:

# ndd /dev/tcp tcp_host_param
Hash HSP Address Subnet Mask Send Receive TStamp

# ndd -set /dev/tcp tcp_host_param '192.168.4.17 sendspace 262144 recvspace 262144'
# ndd /dev/tcp tcp_host_param
Hash HSP Address Subnet Mask Send Receive TStamp
125 62bae844 192.168.004.017 000.000.000.000 0000262144 0000262144 0

# ndd -set /dev/tcp tcp_host_param '192.168.4.17 delete'
# ndd /dev/tcp tcp_host_param
Hash HSP Address Subnet Mask Send Receive TStamp

Use the mask command to supply a netmask for a network, and the timestamp command to supply the timestamp option. Fill this table from a startup script if you want large default windows only for certain links (e.g. those which go via satellite), but small windows for anything else. The content of this table takes precedence over the generic global values, if certain criteria are met:

7. Tuning your system

This section evolved around tuning items which are not directly related to the TCP/IP stack, but nevertheless play an important role in the tuning of any system. Refer to SUN's "Solaris Tunable Parameters Reference Manual" for more in-depth information (links at top and bottom).

7.1 Things to watch

Did you reserve enough swap space? You should have at least as much swap as you have main memory. If you have little main memory, even double your swap. Do not be fooled by the result of the vmstat command - read the manpage and realize that the small value for free memory shown there is (usually) correct.

With Solaris there seems to exist a difference between virtually generated processes and real processes. The latter is extremely dependent on the amount of virtual memory. To test the amount of both kinds of processes, try a small program of mine. Do start it at the console, without X and not as a privileged user. The first value is the hard limit of processes, and the second value the number of processes you can really create given your virtual memory configuration. Tweaking your ulimit values may or may not help.

7.2 General entries in the file /etc/system

The file /etc/system contains various very important configurable resource parameters for your system. You use these tunings to give a heavily loaded system more resources of a certain kind. Unfortunately a reboot is necessary after changing anything. Though one could schedule reboots after midnight, I advise against it. You should always check whether your changes have the desired effect and won't tear down the system.

Adrian Cockroft severely warns against transporting an /etc/system from one system onto another, or even worse, onto another hardware platform:

Clean out your /etc/system when you upgrade.

The most frequent changes are limited to the number of file descriptors, because the socket API uses file descriptors for handling internet connectivity. You may want to look at the hard limit of file handles available to you. Proxies like Squid have to count twice to thrice for each request: an open request descriptor, an open file and/or (depending on what Squid you are using) an open forwarding request descriptor. Similar calculations are true for other caches.

You are able to influence the tuning with the reserved word set. Use whitespace to separate the key from the keyword. Use an equals sign to separate the value from its key. There are a few examples in the comments of the file.

Please, before you start, make a backup copy of your initial /etc/system. The backup should be located on your root filesystem. Thus, if some parameters fail, you can always supply the alternative, original system file at the boot prompt. The following shows two typically entered parameters:

* these are the defaults of Solaris < 8
set rlim_fd_max=1024
set rlim_fd_cur=64

WARNING! SUN does not make any guarantees for the correct working of your system if you use more than 4096 file descriptors. Personally, my old fvwm window manager quit working altogether. In my case, I had compiled it on a Solaris 2.3 or 2.4 system and always carried it onwards to a 2.5 system. After re-compiling it on the new OS, it worked to my satisfaction.

If you experience SEGV core dumps from your select(3c) system call after increasing your file descriptors above 4096, you have to recompile the affected programs. Especially the select(3c) call is known to Squid users for its bad temper concerning the maximum number of file descriptors. SUN remarks on this topic:

The default value for FD_SETSIZE (currently 1024) is larger than the default limit on the number of open files. In order to accommodate programs that may use a larger number of open files with select(), it is possible to increase this size within a program by providing a larger definition of FD_SETSIZE before the inclusion of <sys/types.h>.

Note: This does not work as expected! See text below.

I did test this suggestion by SUN, and a friend of mine tried it with Squid caches. The result was a complete success or disaster both times, depending on your point of view: If you can live with supplying naked women to your customers instead of bouncing logos of companies, go ahead and try it. If you really need to access file descriptors above 1024, don't use select(), use poll() instead! poll() is supposed to be faster with Solaris, anyway. A different source mentions that the redefinition workaround mentioned above works satisfactorily; not for me, my personal experiences warn against such an action.
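To illustrate the poll() style of waiting, here is a minimal sketch. poll() takes a list of descriptors instead of a fixed-size bit set, so FD_SETSIZE plays no role. The helper name is made up, and select.poll is only available on Unix-like systems.

```python
import select
import socket


def writable_fds(socks, timeout_ms=0):
    """Return the file descriptors which poll() reports as writable."""
    poller = select.poll()
    for s in socks:
        poller.register(s.fileno(), select.POLLOUT)
    ready = poller.poll(timeout_ms)  # list of (fd, event mask) pairs
    return sorted(fd for fd, event in ready if event & select.POLLOUT)


# A fresh socket pair has empty send buffers, so both ends are writable
a, b = socket.socketpair()
print(writable_fds([a, b]) == sorted([a.fileno(), b.fileno()]))
a.close()
b.close()
```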

The pages of VJ contain some tricks which I incorporated into this paper, too. Personally I am of the opinion that the VJ pages are not as up to date as they could be.

Many parameters of interest can be determined using the sysdef -i command. Please keep in mind that many values are in hexadecimal notation without the 0x prefix. Another very good program to see your system's configuration is the sysinfo program. Refer to the manpages for how to invoke this program.

There is also the possibility to use a small helper script kindly supplied by Mr. Kroonma to have a look into some kernel variables with the help of the absolute debugger (adb). You can extend the script to suit your own needs, but you should know what you are doing. Refer to the manual page of the absolute debugger for details of displaying non-ulong datatype variables. If you don't know what adb can do for you, hands off.

rlim_fd_cur
default 64, recommended 64 or 256
Since 8: default 256, no recommendations

This parameter defines the soft limit of open files you can have. The currently active soft limit can be determined from a shell with something like

ulimit -Sn

Use values above 256 at your own risk, especially if you are running old binaries. A value of 4096 may look harmless enough, but may still break old binaries.

Another source mentions that using more than 8192 file descriptors is discouraged. It mentions that you ought to use more processes if you need more than 4096 file descriptors. On the other hand, an ISP of my acquaintance is using 16384 descriptors to his satisfaction.

The predicate rlim_fd_cur <= rlim_fd_max must be fulfilled.

Please note that Squid only cares about the hard limit (next item). With respect to the standard IO library, you should not raise the soft limit above 256. Stdio can only use <= 256 FDs. You can either use AT&T's sfio library, or use Solaris 64-bit mode applications which fix the stdio weakness.

Also note that RPC prior to Solaris 2.6 may break if more than 1024 FDs are available to it. Also, setting the soft limit to or above 1024 implies that your license server queries break (first hand experience - thanks Jens). Using 256 is really a strong recommendation.

rlim_fd_max
default 1024, recommended >=4096

This parameter defines the hard limit of open files you can have. For a Squid and most other servers, regardless of TCP or UDP, the number of open file descriptors per user process is among the most important parameters. The number of file descriptors is one limit on the number of connections you can have in parallel. You can find out the value of your hard limit on a shell with something like

ulimit -Hn

You should consider a value of at least 2 * tcp_conn_req_max and you should provide at least 2 * rlim_fd_cur. The predicate rlim_fd_cur <= rlim_fd_max must be fulfilled.

Use values above 1024 at your own risk. SUN does not make any warranty for the workability of your system if you increase this above 1024. Squid users of busy proxies will have to increase this value, though. A good starting point seems to be 16384 <= x <= 32768. Remember to change the Makefile for Squid to use poll() instead of select(). Also remember that each call of configure will change the Makefile back if you didn't change Makefile.in.

Any decent application will incorporate code to increase its soft limit to a possibly higher hard limit. Please note (again) that Squid, as such an application, only cares about the hard limit.
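What such code might look like is sketched below with Python's resource module, which wraps getrlimit() and setrlimit(). The function name and the cap parameter are made up for this sketch; the cap is merely a safety ceiling so the example never asks for more than 4096 descriptors.

```python
import resource


def raise_fd_soft_limit(cap=4096):
    """Raise the soft descriptor limit towards the hard limit, at most to cap.

    Returns the (soft, hard) pair in effect afterwards.
    """
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    target = cap if hard == resource.RLIM_INFINITY else min(hard, cap)
    if soft == resource.RLIM_INFINITY or soft >= target:
        return soft, hard  # nothing to gain, never lower the limit
    try:
        resource.setrlimit(resource.RLIMIT_NOFILE, (target, hard))
    except (ValueError, OSError):
        pass  # the system refused the request, keep the old limit
    return resource.getrlimit(resource.RLIMIT_NOFILE)


print(raise_fd_soft_limit())
```

The hard limit is passed through unchanged, since only root may raise it.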

maxphys
default 126976 (sun4m and sun4d), 131072 (sun4u), 57344 (Intel),
1048576 (sd driver with wide-SCSI), 1048576 (SPARC storage array driver), no recommendations

A working copy of this value is often stored in the mount structure or driver structure at the time it is attached. If a driver sees IO requests larger than this parameter, the requests will be broken down into appropriately sized chunks. The file system may further fragment the chunks. A change might be conceivable if your database server uses raw devices and issues large requests - mind that many of today's database usage paradigms result in many small chunked requests and will not speed up by increasing this value.

If working with large chunked IO on UFS, you can additionally increase the number of cylinder groups and decrease the number of inodes per group (as there will be a few large files).

maxusers
default 249 ~= Megs RAM (Ultra-2/2 CPUs/256 MB), min 8, max 2048,no recommendations

This parameter determines the size of certain kernel datastructureswhich are initialized at startup. Recent versions of Solaris derivemosttable sizes now from the amount of memory available, but there arestillsome dependent variables on this parameter, see max_nprocs, maxuprc, ufs_ninode, ncsizeand ndquot. There is strong indication that thedefault for maxusers itself is being determined fromthe mainmemoryin megs. It might also be a function of the available memory and/orarchitecture.

The greater you chose the number for maxusers,thegreater the number of the mentioned resources. The relation in strictlyproportional: A doubling of maxusers will (more orless)double the other resources.

Adrian Cockroft advises against setting maxusers. The kernel uses a lot of space while keeping track of the RAM usage within the system, therefore maxusers might need to be reduced on systems with gigabytes of main memory. The point to change this parameter is whenever the automagically determined number of user processes is way too high, e.g. file servers, database servers, compute servers with few processes, or way too low.

pidmax
Since 8: default 30000, minimum 266, maximum 999999, no recommendations

Starting with Solaris 8, you can determine the largest possible value for a pid_t the system can set. From this parameter, the kernel variable maxpid will be set once during startup. maxpid, on the other hand, cannot be set via /etc/system.

reserved_procs
Since 8: default 5, minimum 5, maximum MAXINT, no recommendations

This parameter is the mysterious difference between the number of all processes max_nprocs and the number of user processes maxuprc, and affects the number of system process table slots reserved for uid 0, e.g. sched, pageout and fsflush.

Though a change is not immediately recommended, increasing the number of root slots to 10 plus the number of root processes might be considered, in order to provide root with a shell at times the system is incapable of creating a user-level shell, e.g. run-away user processes, fork-of-death, etc.

max_nprocs
default 10+maxusers*16, minimum 266, maximum MIN(maxpid,65534), no recommendations

This is the systemwide number of processes available, user and system processes. You should leave sufficient space to the parameter maxuprc. The value of this parameter is influenced by the setting of maxusers.

The number is used to compute various further parameters (see below), including the DNLC cache, the quota structures, System V semaphore limits, address translation table resources for sun4m, sun4d and Intel Solaris versions.

maxuprc
default max_nprocs-reserved_procs, minimum 1, maximum=default, no recommendations

This parameter describes the number of processes available to users. The actual value is determined from max_nprocs which is itself determined by maxusers. Adjustments to this parameter should be implemented by changing max_nprocs and/or reserved_procs instead.
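The chain of defaults described above (maxusers to max_nprocs to maxuprc) can be sketched as a little shell arithmetic. This is only an illustration of the formulas quoted in this document, assuming a POSIX shell with $(( )) arithmetic (ksh or /usr/xpg4/bin/sh on Solaris); the function names are made up for the example and are not Solaris commands.

```shell
# Default process limits, using the formulas quoted in the text.
# Function names are hypothetical helpers, not part of Solaris.
max_nprocs() {                 # $1 = maxusers
    echo $(( 10 + 16 * $1 ))
}

maxuprc() {                    # $1 = max_nprocs, $2 = reserved_procs
    echo $(( $1 - $2 ))
}

nprocs=$(max_nprocs 249)       # maxusers 249 as on the 256 MB Ultra-2 above
echo "max_nprocs=$nprocs maxuprc=$(maxuprc "$nprocs" 5)"
```

With maxusers 249 and the Solaris 8 default of 5 reserved slots this reproduces the max_nprocs value of 3994 used elsewhere in this section.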

npty
default 48, no recommendations

The parameter defines the maximum number of BSD ttys (/dev/ptty??) available. A few BSD networking things might need these devices. If you run into a limit, you may want to increase the number of available ttys, but usually the size is sufficient.

pt_cnt
default 48, min 48, max 3000, no recommendations
Since 8: remove from /etc/system

Solaris only allocates 48 SYSV pseudo tty devices (slave devices in /dev/pts/*) by default. On a server with many remote logins, or with many open xterm windows, you may reach this limit. It is of little interest to webservers or proxies, but of greater interest for personal workstations.

Starting with Solaris 8, the pseudo terminals are allocated dynamically, see docs.sun.com. Presetting the variable to some value disables the dynamic allocation.

vac_size
default 16384 (with maxusers 249), recommended: don't set

This parameter specifies the size of the virtual address cache. If a personal workstation with many open xterms and sufficient tty devices has a very degraded performance, this parameter might be too small. My recommendation is to let the system choose the correct value. The current value is determined by the size of maxusers.

ufs_ninode
default 4323 = 17*maxusers+90 (with maxusers 249), min 226, max: see below,
Or 2.5.1: 4323 = max_nprocs+16+maxusers+64 (with max_nprocs 3994 and maxusers 249)
Since 2.6: 4323 = 4*(max_nprocs+maxusers)+320 (with max_nprocs 3994 and maxusers 249)
no immediate recommendations
ncsize
default 4323 = 17*maxusers+90 (with maxusers 249), min 226, max: see below,
Or 2.5.1: 4323 = max_nprocs+16+maxusers+64 (with max_nprocs 3994 and maxusers 249)
Since 2.6: 4323 = 4*(max_nprocs+maxusers)+320 (with max_nprocs 3994 and maxusers 249)
no immediate recommendations

The first formula is taken from the NFS Server Performance and Tuning Guide for SUN Hardware, the second formula is taken from the System Administration Guide, Volume II and the third from an email on squid-users. I guess, in the end, after substituting all variables and interdependencies, they turn out more or less the same. Mind, though, that with max_nprocs 3994 and maxusers 249 only the first two evaluate to 4323; the third formula, taken literally, yields 17292.
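To compare the three formulas side by side, here is a small sketch that simply evaluates them as written above. It assumes a POSIX shell with $(( )) arithmetic; the function names are invented for this illustration.

```shell
# The three default formulas for ufs_ninode/ncsize quoted above.
# Function names are hypothetical, chosen after the source of each formula.
ncsize_nfs_guide() {           # $1 = maxusers
    echo $(( 17 * $1 + 90 ))
}

ncsize_251() {                 # $1 = max_nprocs, $2 = maxusers
    echo $(( $1 + 16 + $2 + 64 ))
}

ncsize_26() {                  # $1 = max_nprocs, $2 = maxusers
    echo $(( 4 * ($1 + $2) + 320 ))
}

echo "NFS guide: $(ncsize_nfs_guide 249)"
echo "2.5.1:     $(ncsize_251 3994 249)"
echo "since 2.6: $(ncsize_26 3994 249)"
```

Evaluated this way the first two agree, while the third comes out roughly four times larger, so treat the third as a different default rather than an equivalent rewriting.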

The ufs_ninode parameter specifies the size of the inode table. The actual value will be determined by the value of maxusers. A memory-resident inode is used whenever an operation is performed on an entity in the file system (e.g. files, directories, FIFOs, devices, Unix sockets, etc.). The inode read from disk is cached in case it is needed again. ufs_ninode is the size at which the Unix file system attempts to keep the list of idle inodes. As active inodes become idle, if the number of idle inodes increases above the limit of the cache, the memory is reclaimed by tossing out idle inodes.

The ncsize parameter specifies the size of the directory name lookup cache (DNLC). The DNLC caches recently accessed directory names and their associated vnodes. Since UFS directory entries are stored in a linear fashion on the disk, locating a file name requires searching the complete directory for each entry. Also, adding or creating a file needs to ensure the uniqueness of a name for the directory, also needing to search the complete directory. Therefore, entire directories are cached in memory. For instance, a large directory name lookup cache size significantly helps NFS servers that have a lot of clients. On other systems the default is adequate. The default value is determined by maxusers.

Every entry in the directory name lookup cache (DNLC) points to an entry in the inode cache, so both caches should be sized together. The inode cache should be at least as big as the DNLC cache. For best performance, it should be the same size in the Solaris 2.4 through Solaris 8 operating environments.

The upper bound for the inode cache is set by the amount of kernel memory used for inodes. The largest tested value was 34906. Starting with Solaris 2.5.1, each inode uses 320 bytes of kernel memory. I was able to set my inode cache to 54688 on an 80 MB sun4m, and there are reports of even larger settings of 128000 entries in the inode cache on a 1 GB machine.
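To get a feeling for what such a setting costs, the 320 bytes per inode quoted above can be multiplied out. The sketch below does only that; it assumes a POSIX shell, the function name is made up, and it ignores any further per-inode overhead the kernel may have.

```shell
# Rough kernel memory used by a full inode cache, at 320 bytes per inode
# (the figure quoted in the text for Solaris 2.5.1 and later).
inode_cache_kbytes() {         # $1 = ufs_ninode
    echo $(( $1 * 320 / 1024 ))
}

echo "54688 inodes:  $(inode_cache_kbytes 54688) kByte"
echo "128000 inodes: $(inode_cache_kbytes 128000) kByte"
```

On the 80 MB sun4m mentioned above, the 54688-entry cache would thus tie up roughly 17 MB if it ever filled completely, which is a noticeable slice of main memory.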

The kernel will decrease the inode cache based on the main memory available, if too large, but it will not perform any magic for ridiculously large values. Your application could suffer from inode starvation, if the value is too large, and the inodes are not sufficiently recycled. You can check the current settings with the help of the netstat -k inode_cache command. The example shows a maximum size of 54688 entries:

$ netstat -k inode_cache
inode_cache:
size 947 maxsize 54688 hits 74 misses 1214 kmem allocs 947 kmem frees 0

Warning: Do not set ufs_ninode less than ncsize. The ufs_ninode parameter limits the number of inactive inodes, rather than the total number of active and inactive inodes. With the Solaris 2.5.1 to Solaris 8 software environments, ufs_ninode is automatically adjusted to be at least ncsize. Tune ncsize to get the hit rate up and let the system pick the default ufs_ninode.

I have heard from a few people who increase ncsize to 30000 when using the Squid webcache. Imagine a Squid that uses 16 top-level directories and 256 second-level directories. Thus you'd need over 4096 entries just for the directories. It looks as if webcaches and news servers which store data in files generated from a hash need to increase this value for efficient access.

You can check the performance of your DNLC - its hit rate - with the help of the vmstat -s command. Please note that Solaris 7 re-implemented the algorithm, and thus doesn't have the toolong entry any more:

 
$ vmstat -s
...
1743348604 total name lookups (cache hits 95%)
32512 toolong

In releases before Solaris 7, only names less than 30 characters are cached. Also, names too long to be cached are reported. A cache miss means that a disk I/O may be needed to read the directory (though it might still be in the kernel buffer cache) when traversing the path name components to get to a file. A hit rate of less than 90 percent requires attention. Since only short names are cached in Solaris versions prior to 7, such a behavior would call for putting Squid cache disks or News spool disks onto partitions of their own (always a recommended feature for various reasons), and, more importantly, using a mount point in the root directory with a short name, e.g. /disk1. /var/spool/cache just might be short enough for Squid.
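If you want to watch the hit rate from a script, the percentage can be pulled out of the vmstat -s output with sed. The following is a minimal sketch, assuming the "total name lookups (cache hits NN%)" line looks like the sample above; the function names are made up, and the 90 percent threshold is the one from the text.

```shell
# Extract the DNLC hit rate (in percent) from "vmstat -s" output on stdin.
dnlc_hit_rate() {
    sed -n 's/.*total name lookups (cache hits \([0-9][0-9]*\)%).*/\1/p'
}

# Classify a hit rate using the 90 percent rule of thumb from the text.
dnlc_check() {                 # $1 = hit rate in percent
    if [ "$1" -lt 90 ]; then
        echo "attention"
    else
        echo "ok"
    fi
}

# Sample line taken from the example output above.
rate=$(echo "1743348604 total name lookups (cache hits 95%)" | dnlc_hit_rate)
echo "DNLC hit rate: $rate% ($(dnlc_check "$rate"))"
```

On a live system you would feed it with vmstat -s | dnlc_hit_rate instead of the sample line.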

Solaris 7 re-implemented the DNLC algorithm. Now, memory is allocated dynamically, and path names with more than 30 characters are cached, too. Mr. Storm pointed to Adrian Cockroft answering a reader's question in the Sun World Online Letters Section:

You can set the DNLC to be as big as you like. You should benchmark Solaris 7 as it has a new, faster DNLC implementation that has the extra feature of knowing that a directory is totally cached in the DNLC, so it doesn't need to scan the disk to ascertain that a new filename isn't already in use.

Solaris 8 6/00 further enhances the DNLC, see the System Administration Supplement for enlightenment. The improved DNLC is now capable of caching negative hits, that is, to verify the non-existence of a file. I reckon that there are cache coherence protocols employed, so an application polling for the existence of a lock file will be notified as soon as possible.

dnlc_dir_enable
Since 8 6/00: default 1, recommended: don't touch

The switch enables the DNLC for large directories. There is no need to touch it, but if problems occur, then set this variable to 0 to turn off the caching of large directories.

dnlc_dir_min_size
Since 8 6/00: default 40, minimum 0, maximum MAXUINT, recommended: don't touch
dnlc_dir_max_size
Since 8 6/00: default MAXUINT, minimum 0, maximum MAXUINT, recommended: don't touch

MAXUINT may have different concrete values, depending on the kernel running in 32 bit or 64 bit mode.

The dnlc_dir_min_size places a minimum size limit on the directories which will eventually be cached. It looks as if the default value is a balance between the overhead of setting up the cache for the directory, and by-passing the cache. It is one of the usual problems that caching does not come for free. For this reason, it is strongly suggested not to decrease the default. If performance problems occur when caching small directories, increase the minimum default. From the System Administration Supplement:

Note that individual file systems might have their own range limits for caching directories. For instance, UFS limits directories to a minimum of ufs_min_dir_cache bytes (approximately 1024 entries), assuming 16 bytes per entry.

If performance problems occur with large directories, then enforce a limit on the cachable directory size using dnlc_dir_max_size. The dnlc_dir_enable parameter might be another switch to disable the new DNLC for (overly) large directories.

bufhwm
default 2 % of main memory, no immediate recommendations, maximum 20 %

Now, considering the SVR3 buffer cache described by Maurice Bach [11], this parameter specifies the maximum memory size allowed for the kernel buffer cache. The 0 value reported by sysinfo says to take 2 % of the main memory for buffer caches. sysdef -i shows the size in bytes taken for the buffer cache.

Refer to the NFS Server Performance and Tuning Guide for SUN HW for further documentation on this parameter. I have seen Squid admins increasing this value up to 10 %, also a recommendation for dedicated NFS servers with a relatively small memory system. On a larger system, the bufhwm variable may need to be limited to prevent the system from running out of the operating system kernel virtual address space.

The buffer cache is used to cache inode, indirect block, and cylinder group related disk I/O only. If you change this value, you have to enter the number of kByte you want for the buffer cache. Please keep in mind that you are effectively 'double buffering', if you increase this value in conjunction with a proxy-cache like Squid.

If you have your system accounting up and running, you can check and monitor your buffer cache with the sar -b command - check the manual page on how to run sar. The numbers in the columns titled %rcache and %wcache report the read hit rate and write hit rate respectively. You need to tune your system, if your read hit rate falls below 90 % and/or your write hit rate falls below 65 %.

physmem
default number of available pages excluding kernel core and data, recommendation: don't touch

This is more or less a debug or test value to simulate a system with less memory than actually available.

lwp_default_stacksize
default 8192, 16384 for sun4u in 64 bit mode, recommendation: don't touch

This parameter affects the size of the stack of a kernel thread (lightweight process, LWP) at the time of its creation. Increasing this value will result in almost every kernel thread using a larger stack, most of the time eating memory resources without using them. If your system panics due to running out of stack space, chances are that there is something wrong with your application.

ndquot
default 6484, no recommendations

This parameter specifies the size of the quota table. Many standalone webservers or proxies don't use quotas.

nstrpush
default 9, no recommendations

This parameter determines how many STREAMS modules you are allowed to push into the Solaris kernel - I guess this is a per user or per process count. The only application of widespread use which may need such a kernel module is xntp. Even with other modules pushed, you usually have sufficient room and no need to tweak this parameter.

strmsgsz
default 65536, no recommendations

This parameter determines the maximum size of a message which isto be piped through the SYSV STREAMS.

strctlsz
default 1024, no recommendations

The maximum size of the control part of a STREAMS message.

autoup
default 30, no (immediate) recommendations
tune_t_fsflushr
default 5, no (immediate) recommendations

The autoup value determines the maximum age of a modified memory page. The fsflush kernel daemon wakes up every five seconds as determined by the tune_t_fsflushr interval. At each wakeup, it checks a portion of the main memory - the fraction tune_t_fsflushr divided by autoup, that is one sixth with the defaults, so that all of memory is covered once per autoup interval. The pages are queued to the pageout kernel daemon, which forms them into clusters for faster write access. Furthermore, the fsflush daemon flushes modified entries from the inode caches to disk!

Some squid admins recommend lowering this value, because at high disk loads, the fsflush effectively kills the I/O subsystem with its updates, unless the stuff is flushed out fairly often. Steward Forster notes that this is justifiable, because squid writes disjoint data sets and rarely does multiple writes to the same disk block. If

 /usr/proc/bin/ptime sync

reports that the time spent updating the disks is above five seconds on several occasions, you can consider lowering autoup among several options. Please note that a larger bufhwm will take longer to flush. Also, the settings of ufs_ninode and ncsize have an impact on the time spent updating the disks. Setting the value too low has a harmful impact on your performance, too.

There are also instances where increasing autoup makes sense. Whenever you are using synchronous writes like NFS or raw database partitions, fsflush has little to do, and the overhead of frequent memory scans is a hindrance. Refer to Adrian Cockroft [2] for a more detailed enlightenment on the subject. I never claimed that tweaking your kernel is easy or foolproof.
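The interplay of autoup and tune_t_fsflushr is easier to see with numbers. The sketch below only works out how many fsflush wakeups it takes to cover all of memory and roughly how much memory is examined per wakeup; it assumes a POSIX shell, integer arithmetic (results are truncated), and function names invented for this example.

```shell
# Number of fsflush wakeups needed to scan all of memory once.
fsflush_wakeups_per_scan() {   # $1 = autoup, $2 = tune_t_fsflushr
    echo $(( $1 / $2 ))
}

# Approximate memory examined per wakeup, in MB (integer, truncated).
fsflush_mb_per_wakeup() {      # $1 = memory in MB, $2 = autoup, $3 = tune_t_fsflushr
    echo $(( $1 * $3 / $2 ))
}

echo "defaults: $(fsflush_wakeups_per_scan 30 5) wakeups per full scan"
echo "256 MB machine: about $(fsflush_mb_per_wakeup 256 30 5) MB per wakeup"
```

Lowering autoup while leaving tune_t_fsflushr alone means fewer wakeups per scan and therefore a larger share of memory to examine each time, which is one way to see why too low a value hurts.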

use_mxcc_prefetch
default 0 (sun4d) or 1 (sun4m), recommended: see text

Adrian Cockroft explains this parameter in "What are the tunable kernel parameters for Solaris 2?". The parameter determines the external cache controller prefetches. You have to know your workload. Applications with extensive floating point arithmetic will benefit from prefetches, thus the parameter is turned on on personal workstations. On random access databases with little or no need for floating point arithmetic the prefetch will likely get in the way, therefore it is turned off on server machines. It looks as if it should be turned off on dedicated squid servers.

noexec_user_stack_log
Since 2.6: default ?, recommended: don't touch
noexec_user_stack
Since 2.6: default 0, recommended: see CERT CA-98.06, or DE-CERT. Limited to sun4[mud] platforms!
Warning: This option might crash some of your application software, and endanger your system's stability!

By default, the Solaris 32 bit application stack memory areas are set with permissions to read, write and execute, as specified in the SPARC and Intel ABI. Though many hacks prefer to modify the program counter saved during a subroutine call, a program snippet in the stack area can be used to gain root access to a system.

If the variable is set to a non-zero value, the stack defaults to read and write, but not executable permissions. Most programs, but not all, will function correctly, if the default stack permissions exclude executable rights. Attempts to execute code on the stack will kill the process with a SIGSEGV signal and log a message in kern:notice. Programs which rely on an executable stack must use the mprotect(2) function to explicitly mark executable memory areas.

Refer to the System Administration Guide for more information on this topic. Admins who don't want the report about executable stacks can set the noexec_user_stack_log variable explicitly to 0.

Also note that the 64 bit V9 ABI defaults to stacks without execute permissions.

priority_paging
Since 7 or 2.6 with patch >= 105181-09 applied: default 0, recommended 1
Since 8: remove from /etc/system

Priority paging is an advanced memory paging technique which enhances the responsiveness of the system. If the file system is used heavily, Solaris may suffer from the file system cache stealing pages from applications. High performance clusters almost always benefit from priority paging. The more memory you have, the better it is to actively avoid swapping.

Please refer to the Priority Paging page by Richard McDougall, Triet Vo, and Tom Pothier. The paper hints at an appropriate kernel patch for Solaris 2.5.1.

There is one drawback, though, or a feature for some of us: If your data has the executable bit set, it can fool the virtual memory management into believing it is treating a real executable, and thus will not engage priority paging for that data.

The Solaris 8 operating environment introduces a new file system caching architecture, which subsumes the Solaris 7 Priority Paging functionality. The system variable priority_paging should not be set in the Solaris 8 operating environment, and should be removed from the file /etc/system when systems are upgraded to the Solaris 8 operating environment.

tcp:tcp_conn_hash_size
default 256, recommended: increase on busy servers
Since 8: 512, still increase on busy servers

The tcp connection hash size determines the size of the table where Solaris keeps all interesting information like RTO, MSS, windows and states on any TCP connection. You can check the current content of the table with the ndd command:

$ ndd /dev/tcp tcp_conn_hash
tcp_conn_hash_size = 256
TCP dest snxt suna swnd rnxt rack rwnd
rto mss w sw rw t recent [lport,fport] state
251 f5bcf2a8 130.075.003.xxx 204a5e77 204a5e77 0000032120 e6255721 e6255721 0000034752
02000 01448 1 00 00 1 002a16c0 [22, 1022] TCP_ESTABLISHED

The default size is printed when investigating the table. If you have a busy server, you might want to consider increasing the table's size. Mr. Storm reports that SUN increases the hash size up to 262144 for web server benchmarks.

nfssrv:nfs_portmon
default 0, recommended: read text, NFS only

If the value is set to 1, the NFS service daemon places the restriction on the client to use a privileged port, see nfsd(1m). It is said to make it a little more difficult to abuse Leendert's NFS shell, if the server is thus set up.

Some services, like Squid or some news servers, use a multitude of cache files where names (URLs or articles) are mapped by a hash function to a shallow directory tree, helping the buffer cache and inode caches of the host file system (compared to using unlimited subdirectories like the CERN cache). As is well known in software engineering, the speedup by using the right algorithm usually far exceeds anything you can achieve by fiddling with the hardware or tweaking system parameters. Still, the services can be helped by proper tuning of ncsize and ufs_ninode.

7.3 System V IPC related entries

Many applications still use the (old) SYSV IPCs. The System V IPC can be ordered into three separate areas: message queues, shared memory and semaphores. With Solaris you have an easier and faster API to achieve the same ends with Unix sockets or FIFOs, shared memory through memory maps, see mmap(2), and file locks instead of semaphores. Due to the reduced need for System V IPC, Solaris has decreased the resources for System V IPC drastically. This is o.k. for standalone servers, but personal workstations may need increased resources.

In some cases large database applications or VRML viewers use System V IPC. Thus you should consider increasing a few resources. The active resources can be determined with the sysdef -i command. Relevant for your inspection are the parts rather at the end, all having IPC in their names.

At first glance, the System V IPC resources for message queues and semaphores seem to be disabled by default. This is not true, because the necessary modules are loaded dynamically into the kernel as soon as they are referenced. The default System V shared memory uses 1 MB main memory. Proxies and webservers may even want to decrease this value, but database servers may need up to 25 % of the main memory as System V shared memory.
* personal workstations using mpeg_play, or vic
set shmsys:shminfo_shmmax=16777216

The entries in /etc/system for all System V IPC related information contain the prefix msgsys:msginfo_ for message queues, the prefix semsys:seminfo_ for semaphores, and the prefix shmsys:shminfo_ for shared memory. After the prefix comes the resource identifier, all lower case letters, for the corresponding value displayed by the sysdef command, e.g. shmmax for the value of SHMMAX. The meaning of the parameters can be obtained from any programming resource on System V IPC, e.g. Stevens' [3]. If anything, you only need to change the value for SHMMAX.
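The naming rule just described (prefix by IPC area, then the sysdef name in lower case) can be sketched as a small helper that prints the matching /etc/system line. It assumes a POSIX shell; the function name sysv_ipc_entry is made up for this illustration, and it only prints the line, it does not touch /etc/system.

```shell
# Build an /etc/system line for a System V IPC parameter as named by sysdef.
# sysv_ipc_entry is a hypothetical helper, not a Solaris command.
sysv_ipc_entry() {             # $1 = sysdef name, e.g. SHMMAX; $2 = value
    name=$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')
    case "$name" in
        msg*) prefix="msgsys:msginfo_" ;;
        sem*) prefix="semsys:seminfo_" ;;
        shm*) prefix="shmsys:shminfo_" ;;
        *)    echo "unknown System V IPC parameter: $1" >&2
              return 1 ;;
    esac
    echo "set ${prefix}${name}=$2"
}

sysv_ipc_entry SHMMAX 16777216
```

For SHMMAX and 16 MB this prints the same entry as the workstation example above.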

7.4 How to find further entries

There are thousands of further items you can adjust. Every module which has a device in the /dev directory and a module file somewhere in the kernel tree underneath /kernel can be configured with the help of ndd. Whether you have to have superuser privileges depends on the access mode of the device file.

There is a way to get your hands on the names of keys to tweak even without ndd. For instance, the System V IPC modules don't have a related device file. This implies that you cannot tweak things with the help of ndd. Nevertheless, you can obtain all clear text strings from the module file in the kernel.

strings -a /kernel/sys/shmsys # possible
nm /kernel/sys/shmsys # recommended

There are a number of strings you will see. Most of the strings are either names of functions within the module or clear text string passages defined within. Strings starting with shminfo are the names of user tunable parameters, though. Now, how do you separate tunable parameters from the other stuff? I really don't know. If you have some knowledge about Sun DDI, you may be able to help me to find a recommendable way, e.g. using _info(9E) and mod_info.

The interesting part, though, is to configure devices and modules with the SUN supported way to do things, and that means using ndd. Please refer to the ndd section on how to use ndd for changing values non-permanently. Remember, if you want to know what names there are to tweak, use the question mark special parameter.

Of course, you can only change entries marked for read and write. If you are satisfied with your settings, and want to store the configuration as a default at boot time, you can enter your preferred values into the /etc/system file. Just prefix the key with the module name and separate both with a colon. You saw this earlier in the System V IPC section, and the same will be shown for 100 Mbit ethernet.

8. 100 Mbit ethernet and related entries

This section focuses on the hme fast ethernet interface, but some ways to do things may be applicable to other interface cards, too. Please refer to the SUN Platform Notes: The hme Fast Ethernet Driver for a detailed introduction of the handling of the fast ethernet device. Refer to that document for the use of lance_mode, pace_size and ipg0 through ipg2.

8.1 The hme interface

The current section can only be regarded as a quick introduction into a more complex theme. The focus is on the selection of the best performing data mode when inter-operating with switches and employing auto-negotiation. Refer to "Which network cards support full duplex" (SUN FAQ) for an overview of which interfaces support what data mode. Users of Solaris 2.5.1 have to use the "Patch for 2.5.1 and autonegotiation" in order to use auto negotiation successfully. The release notes of the patch state:

NOTE2: For devices that do not advertise auto-negotiation and advertise 10-full-duplex and 10-half-duplex, hme will first select the 10-half-duplex. However, one can force it to 10-full-duplex (if desired).

In order to check the current setting of your 100 Mbit interfaces, you have to use ndd. If your system is a 2.5.1, and unpatched, only rely on the data the switch, hub or router is giving you. You should make a special issue of cross-checking the values obtained from your Solaris system with whatever kind of link partner you are connected to.

instance
default: 0, see text

If there is just one hme interface installed in your system, ndd will auto-magically select the correct one. If there is more than one 100 Mbit interface card installed in your system, you have to select the appropriate card you want to inspect or modify. First check the file /etc/path_to_inst in order to identify the interface. Use that instance number, and set the instance parameter of the hme driver. Now all further modifications or inspections will apply to just that particular interface.

Please note the importance of the instance parameter, if you have more than one card of the same kind installed. All inspections or modifications to parameters described below depend on the setting of instance. The following set of read-only parameters allows you to inspect the behavior of the interface:

link_status (read-only)
default: 0 or 1

With the help of the link_status parameter you can determine whether your link is up or down. A value of 0 means that the link is down, a value of 1 that the link is up.

link_speed (read-only)
default: 0 or 1

This parameter lets you determine the speed which has been selected for the interface. The content is only valid, if the link is up. A value of 0 implies 10 Mbps, a value of 1 means 100 Mbps.

link_mode (read-only)
default: 0 or 1

The link_mode shows the duplex mode the link employs. The content is only valid, if the link is up. A value of 0 means that half-duplex is used, a value of 1 implies full-duplex. If you are detecting half-duplex mode, and you are sure that this is unwanted, you will need to take some of the steps described below.

transceiver_inuse (read-only)
default: 0 or 1

A value of 0 translates to "internal transceiver" and a value of 1 to the "external transceiver".

Check the content of the link_* values carefully. If you got all 1 values there, everything is working at optimum performance for an hme interface, and you might want to skip to the next section. On the other hand, if either Solaris or your link partner is telling you about sub-optimal performance like 10 Mbps and/or half-duplex mode, and you are absolutely sure that both partners, the Solaris host and its link partner, are able to perform better, you might need to tweak your setup. It is a well-known problem that auto negotiation of the link setup may fail.
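The meaning of the three read-only link_* values can be summed up in a few lines of shell. This is just a sketch of the decoding rules given above, assuming a POSIX shell; describe_link is a made-up helper, and the ndd calls in the comment are how I would expect to feed it on a host with an hme card (as root).

```shell
# Translate the hme link_* values into plain words.
# describe_link is a hypothetical helper for illustration only.
describe_link() {              # $1 = link_status, $2 = link_speed, $3 = link_mode
    if [ "$1" -ne 1 ]; then
        echo "link down"       # speed and mode are not valid while the link is down
        return 0
    fi
    if [ "$2" -eq 1 ]; then speed="100 Mbps"; else speed="10 Mbps"; fi
    if [ "$3" -eq 1 ]; then mode="full-duplex"; else mode="half-duplex"; fi
    echo "link up, $speed, $mode"
}

# On a Solaris host the values could come from ndd, for instance:
#   ndd -set /dev/hme instance 0
#   describe_link "$(ndd /dev/hme link_status)" \
#                 "$(ndd /dev/hme link_speed)" "$(ndd /dev/hme link_mode)"
describe_link 1 1 0
```

The sample call shows the typical trouble case: the link is up at 100 Mbps, but only in half-duplex mode.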

You might first want to look whether your hardware thinks it is capable of supporting the modes you intend to select. Also, you might want to check what your interface thinks the link partner supports. There is a set of six values, repeated in two sets to check and one set of data to set. The asterisk * has the meaning of a wild card (like from a shell) in the names *autoneg_cap, *100T4_cap, *100fdx_cap, *100hdx_cap, *10fdx_cap and *10hdx_cap.

If you replace the asterisk with the prefix lp_ (including the underscore), you get a set of six read-only variables, which describe the notion your interface has about its link partner. That is, the abilities advertised by your link partner, as seen from Solaris. Check the lp_autoneg_cap value first, because if it is 0, all the other lp_* values have an undefined meaning.

If you replace the asterisk with no prefix (just remove it), you get another set of six read-only variables. These variables describe the local transceiver abilities of the hardware. Please do not be too alarmed, if the transceiver reports to be able to support only half-duplex mode. According to SUN, the internal transceiver can support all capabilities. Thus you might still be able to configure full-duplex mode with the hme interface.

Finally, if you replace the asterisk with the prefix adv_ (including the underscore), you get yet another set of six variables, this time writable ones, which describe the capabilities the interface is to advertise to its link partner. After changing any values in this set, you have to shut the interface down with the ifconfig command, and start it up again, or temporarily disconnect the link cable. If more than one speed capability to advertise is activated, the items are prioritized, highest priority first:

  1. adv_100fdx_cap
  2. adv_100T4_cap
  3. adv_100hdx_cap
  4. adv_10fdx_cap
  5. adv_10hdx_cap

Table 1 shows the default values for the un-prefixed and adv_ prefixed sets. The table does not show the values for the lp_ set, as those are determined from the link partner capabilities. Please note that Solaris 2.5.1 and below default to half-duplex operations. In order to use auto negotiation, you have to use the patch mentioned above.

ability        *=""                                   *="adv_"
*autoneg_cap   1                                      1
*100T4_cap     0                                      0
*100fdx_cap    0 (Solaris < 2.6), 1 (Solaris >= 2.6)  0 (Solaris < 2.6), 1 (Solaris >= 2.6)
*100hdx_cap    1                                      1
*10fdx_cap     0 (Solaris < 2.6), 1 (Solaris >= 2.6)  0 (Solaris < 2.6), ? (Solaris >= 2.6)
*10hdx_cap     1                                      1

Table 1: Default values for the internal transceiver perceived abilities and advertisable abilities.

If you are experiencing trouble with auto negotiation, you will have to explicitly set the values supported for your interface card. For instance, if your link partner is not capable of auto negotiation, the correct speed for the link, and half-duplex mode (!) will be selected. But there are three ways to force your own choice:

  1. setting values with the help of ndd
  2. setting values in /etc/system
  3. setting values in hme.conf

Please note that setting options with ndd only works until the next reboot. Also, you have to disconnect the link cable temporarily for a few seconds to initiate auto negotiation of the newly set capabilities. You should use ndd to test out a working set of capabilities, which you can make permanent later in either but not both of the files mentioned above.

use_int_xcvr
default 0

If the default is active, the external transceiver will be used, if connected to the link. Otherwise the internal transceiver will be used. If you want to override an external transceiver, you can set this option to 1, and force the use of the internal transceiver.

adv_autoneg_cap
default 1, recommended 1, if possible

If you experienced severe problems with auto negotiation, you might want to try setting this value to 0. By using the zero value, you can force your preferred mode onto the hardware, but if your link partner does not support the chosen mode/speed combination, you might end up with nothing at all.

Usually link partners like switches do auto negotiation, as well. For instance, if you want to force the use of 100 Mbps full-duplex, it may be necessary to set this parameter to 0 and configure your link partner hardware manually to 100 FDX. Also, only set one of the following adv_*_cap parameters to 1, and all the others to 0. This is a last resort which always used to work for me.
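The last-resort recipe (auto negotiation off, exactly one capability on) can be sketched as a dry-run helper that only prints the ndd calls for review. It assumes a POSIX shell; force_hme_mode is a made-up name, and the parameter names are the adv_* names used in this section. If you have more than one hme card, select the instance first as described above.

```shell
# Print the ndd commands that would force one speed/duplex mode on hme.
# Dry run only: nothing is changed until you run the printed lines as root.
force_hme_mode() {             # $1 = 100fdx | 100hdx | 10fdx | 10hdx
    case "$1" in
        100fdx|100hdx|10fdx|10hdx) ;;
        *) echo "unsupported mode: $1" >&2
           return 1 ;;
    esac
    echo "ndd -set /dev/hme adv_autoneg_cap 0"
    for cap in 100fdx 100T4 100hdx 10fdx 10hdx; do
        if [ "$cap" = "$1" ]; then val=1; else val=0; fi
        echo "ndd -set /dev/hme adv_${cap}_cap $val"
    done
}

force_hme_mode 100fdx
```

After running the printed commands, the interface still has to be taken down and up again (or the cable pulled briefly) for the new settings to take effect, and the link partner must be set to the same mode by hand.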

adv_100T4_cap
default 0, no recommendations

The 100Base-T4 mode is only supported by an external transceiver, and usually not relevant for most of the hosts I know of.

adv_100fdx_cap
default 0, recommended 1, if possible
Since 2.6: 1
adv_100hdx_cap
default 1, recommended 1, if possible

The fdx parameter switches the advertising of the full-duplex mode capability, the hdx parameter that of the half-duplex mode. If you experienced problems forcing your preferred mode, you can try to set the full-duplex parameter to the opposite of the half-duplex value.

adv_10fdx_cap
default 0, recommended 1, if possible
Since 2.6: ?
adv_10hdx_cap
default 1, recommended 1, if possible

The latter parameters concern the 10 Mbps speed capabilities to be advertised to a link partner. You'd probably prefer your server to work at a degraded performance, if your link partner and you happened to disagree on auto-negotiation, rather than not being able to reach it at all.

A few conditions on incorrectly working 100 Mbit interfaces result in a downgrade to 10 Mbit ethernet and/or half-duplex mode. Thus check at all available ends whether you are really getting the data rate you are expecting. The first hints about a misconfigured interface can be obtained from the input errors reported by netstat -ni. Of course, good information can only be obtained at the link partner, if it happens to be a switch or router.
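Checking the input error counters can be sketched as a filter over the netstat -ni output. The function name iface_input_errors is hypothetical, and the sketch assumes the usual Solaris column order (Name, Mtu, Net/Dest, Address, Ipkts, Ierrs, ...); verify the header on your own system first.

```shell
# Read `netstat -ni` output on stdin and print every interface whose
# input error counter (6th column, Ierrs) is greater than zero.
# Assumes columns: Name Mtu Net/Dest Address Ipkts Ierrs Opkts Oerrs Collis Queue
iface_input_errors() {
    awk '$1 != "Name" && $6 + 0 > 0 { print $1, "Ierrs=" $6, "Ipkts=" $5 }'
}
```

A typical call would be netstat -ni | iface_input_errors. An empty result only means that this host saw no input errors; it says nothing about what the switch or router at the other end of the link observed.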

You have to be super-user to be able to tweak the hme device. If you are able to see any value of the hme interface with ndd as a mere mortal user, you are suffering from a severe security hole. In that case check the access rights and ownership of the tools, devices and module files.

After you have determined a working set of special configurations, you can make the selection permanent by writing them into the /etc/system file. If you have more than one hme interface installed, you have to select the instance first. Otherwise, all modifications are reflected on all interfaces, which is sometimes the preferred way to initialize things.

In order to insert the values into /etc/system, you will have to prefix the adv_* values with hme:hme_. A typical entry in the /etc/system of a patched 2.5.1 host sets all capability advertisements. If the auto negotiation with the link partner works out, 100 Mbps full-duplex will be selected:

set hme:hme_adv_100fdx_cap=1
set hme:hme_adv_100hdx_cap=1
set hme:hme_adv_10fdx_cap=1
set hme:hme_adv_10hdx_cap=1
set hme:hme_adv_autoneg_cap=1

On the other hand, a Solaris 2.5 host must force the 100 Mbps mode in full-duplex. Additionally, the link partner has to disable its auto negotiation capability, and you have to manually instruct it to use 100 Mbps in full-duplex mode:

set hme:hme_adv_100fdx_cap=1
set hme:hme_adv_100hdx_cap=0
set hme:hme_adv_10fdx_cap=0
set hme:hme_adv_10hdx_cap=0
set hme:hme_adv_autoneg_cap=0

A 2.6 host should work correctly at optimum performance with its defaults, but it does not hurt to set the parameters like the patched 2.5.1 host. If you have more than one hme interface installed, and you need to configure them differently, first you have to select the instance as described above. Then you configure the parameters for that interface. Afterwards you can select a different instance and modify its configuration differently.
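The hme:hme_ prefix rule for /etc/system can be sketched as a tiny converter that turns a tested set of ndd values into set lines. The function name hme_system_lines and its input format (one "parameter value" pair per line) are assumptions made for illustration.

```shell
# Read "parameter value" pairs (ndd names) on stdin and print the
# matching /etc/system lines, prefixing each adv_* name with hme:hme_.
# The function name and the input format are hypothetical.
hme_system_lines() {
    while read name value; do
        case "$name" in
            "") ;;                                   # ignore empty lines
            adv_*) echo "set hme:hme_${name}=${value}" ;;
            *) echo "skipping $name: not an adv_* parameter" >&2 ;;
        esac
    done
}
```

The output is meant to be reviewed and then appended to /etc/system by hand; the sketch deliberately does not write to the file itself.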

The other method to set the selected options permanently is to create an hme.conf file in the /kernel/drv directory. The contents of that file are not trivial, none of the kernel device configuration files are! Refer to the SUN Platform Notes: The hme Fast Ethernet Driver for the step by step guide on how to set up hme.conf.

If the third way, permanently setting your options with the help of ndd and a startup script, looks tempting to you, you might want to consider appending to the first startup script mentioned below. But please keep in mind that you have to shut down and restart any ndd configured interfaces in order to have the options take effect.

8.2 Other problems

8.2.1 Multicast problems

If you are using the multicast backbone (MBone) of the internet, your ethernet interface, e.g. be0, is probably the primary choice of the multicast interface. The interface will speak to your router as detailed in RFC 1112. Prior to Solaris 2.6, the ethernet driver allowed only for simultaneous participation in 64 multicast groups.

In previous versions of Solaris, when your multicast support ran out of space for group participation, IP concluded from the error condition that the interface doesn't support any multicast whatsoever. Hence, it switched to link-level broadcasts for all multicast traffic - which does not interoperate with other hosts still using regular multicast traffic. Upgrading to the hme interface and Solaris >= 2.6 is said to have solved this particular problem.

8.2.2 Number of virtual interfaces

The number of virtual interfaces supported by any interface is finite, of course. The tunable parameter is ip_addrs_per_if.

8.2.3 Distinct MAC addresses for multiple physical interfaces

If you have more than one ethernet interface installed into your Solaris box, you will notice that SUNs by default use the MAC address of the first interface for all interfaces. Actually, it will use the MAC address burnt into the EEPROM of the motherboard. I cannot think of good reasons to do this, except for certain high-availability environments, so, if you want each interface to use its own MAC address, type as super-user:

eeprom local-mac-address\?=true

9. Recommended patches

It is utterly necessary to patch your Solaris system, if you didn't already do so! Have a look at the DFN CERT patch mirror or the original source from SUN. There may be a mirror closer to you, e.g. EUNet and FUNET have their own mirrors, if I am informed correctly.

In order to increase your TCP performance, improve the security of websites and fix several severe bugs, do patch! Whoever still runs a Solaris below 2.5 should upgrade to 2.6 at least. Each new version of Solaris incorporates more new TCP features than the previous one, and bug fixes, too.

Please remember to press the Shift button on your Netscape Navigator while selecting a link. If the patch is not loadable, probably a new release appeared in the meantime. To determine the latter case, have a look at the directories of DFN CERT or SUN. The README file on the DFN CERT server is kept without a version number and thus always up to date.

ip and ifconfig patch
103630-15 for Solaris 2.5.1 (README)
103169-15 for Solaris 2.5 (README)

tcp patch (only with ip patches)
103582-24 for Solaris 2.5.1 (README)
103447-10 for Solaris 2.5 (README)

hme patch
104212-13 for Solaris 2.5.1 (README)

Any system administrator should know the contents of SUN's patch page. Besides the previously mentioned patches for a good TCP/IP performance, you should always consider the security related patches. Also, SUN recommends a set of further patches to complete the support for large IP addresses. You should really include any DNS related patch.

The SUN supplied patches to fix multicast problems with 2.5.1 are incompatible with the TCP patch. Unfortunately, you have to decide between an unbroken multicast and a fixed TCP module. Yes, I am aware that multicast is only possible via UDP; nevertheless the multicast patch replaces the installed TCP module. If you have problems here, ask your SUN partner for a workaround - he will probably suggest upgrading to 2.6.

10. Literature

The current section features a set of related material containing books, requests for comments (RFCs), software and a multitude of links.

10.1 Books

[1]
Adrian Cockcroft; Sun Performance and Tuning; SUN Microsystems Inc.; 1995; ISBN 0-13-149642-5. Regrettably only up to Solaris 2.4, but most information is still valid for current Solaris systems.
The Heise Verlag offers a German translation.
[2]
[must read] Adrian Cockcroft; Sun Performance and Tuning; 2nd edition; SUN Microsystems Inc.; 04'1998; ISBN 0-13-095249-4. The improved version on performance and tuning, covers quick tips and Solaris 2.6 as well as Java server technologies.
[3]
W. Richard Stevens; Advanced Programming in the UNIX Environment; Addison-Wesley Publishing Company; Reading, MA; 1992; ISBN 0-201-56317-7.
A German translation is available as: Programmieren in der UNIX-Umgebung; ISBN 3-9319-814-8, 1995.
[4]
[must read] W. Richard Stevens; TCP/IP Illustrated, Volume 1 - The Protocols; Addison-Wesley Publishing Company; Reading, MA; 1994; ISBN 0-201-63346-9.
A German translation is available.
[5]
W. Richard Stevens; TCP/IP Illustrated, Volume 2 - The Implementation; Addison-Wesley Publishing Company; Reading, MA; 1995; ISBN 0-201-63354-X.
A German translation is available.
[6]
W. Richard Stevens; TCP/IP Illustrated, Volume 3 - T/TCP, HTTP, NNTP, Unix Domain Sockets; Addison-Wesley Publishing Company; Reading, MA; 1994; ISBN 0-201-63495-3.
A German translation is available.
[7]
W. Richard Stevens; Unix Network Programming, Network APIs: Sockets and XTI; Prentice-Hall Inc.; Upper Saddle River, NJ; 1998; ISBN 0-13-081081-9.
A German translation is not yet available.
[7b]
W. Richard Stevens; Unix Network Programming, Network APIs: Sockets and XTI; Prentice-Hall Inc.; Upper Saddle River, NJ; 1998; ISBN 0-13-490012-X.
A German translation is not yet available.
[8]
Brian Wong; Configuration and Capacity Planning for Solaris Servers; SUN Microsystems Inc.; 199?; ISBN 0-13-349952-9.
A book showing host and peripheral technologies. Contains hints on tuning. Eases the detection of hardware errors, because it explains the workings of the hardware (you are then able to determine whether a misbehavior is a bug or a feature).
[9]
Andrew S. Tanenbaum; Computer Networks; I still use the 2nd edition; Prentice Hall Inc., 1989, ISBN 0-13162959-X (2nd) and 1996, ISBN 0-13349945-6 (3rd).
A German translation of the 2nd edition is available: Computer Netzwerke; Wolfram's Fachverlag, 1990, ISBN 3-925328-79-3.
[10]
Andrew S. Tanenbaum; Modern Operating Systems; Prentice Hall Inc., 1992, ISBN 0-13588187-0.
[11]
Maurice Bach; Design of the Unix Operating System; Prentice Hall, 1986, ISBN 0-13201799-7.
A German translation is available.

10.2 Internet resources

10.3 RFC, mentioned and otherwise

RFCs mentioned in the text:

Unmentioned, but important Internet resources, for Web services.
Compare with Duane Wessels' required reading list for Squid developers, and
W3C's change history of HTTP, or the HTTP protocol homepage. Links which are considered essential for the topic are marked dark green.

Also of interest with regard to web services may be a bunch of related drafts, partially expired, still sprouting with ideas. Compare with the IETF - Hypertext Transfer Protocol (HTTP) Working Group published documents and W3C's change history of HTTP:

More recently, the IETF WREC (web replication and caching) working group was created. Its first effort deals with a taxonomy for terms related to replication services and cache services, including proxy services.

10.4 Further material

Software by SUN - no support offered!

11. Solaris' Future

This section will deal with forthcoming releases of the Solaris operating system.

11.1 Solaris 7

Solaris 7 was announced at the end of October 1998. There are many exciting new things in Solaris 7, and I am not talking about the 64 bit capabilities. The readers of this page are probably just as interested in which bugs are fixed, which extensions are implemented and which features are available. Note, even though it is called Solaris 7, the uname command will still return 5.7, thus staying compatible with many scripts which are in circulation.

The interested reader will, of course, want to skim through the pages mentioned above. For the impatient, here are the most interesting features for speeding up your web related services:

11.2 Solaris 8

Solaris 8 did a lot of work in the TCP tunable section, aimed at further performance improvements for Internet servers and services.

Also, Solaris 8 offers IPv6 without needing to obtain extra packages. If enabled, IPv6 support is integrated into many regular places. Still, for a secure Solaris installation, it is recommended to install neither Kerberos nor IPv6.

If you are installing Solaris 8 for Intel, and you would like to use the stand-alone installation, boot from the 2nd CD-ROM. My installation with the web-installer failed frequently on different machines, and the stand-alone installation (my favourite, anyway) was the only way to get going.

The Solaris 8 media kit now comes with many highly usable open source programs like gcc 2.95.2 (the SPARC optimizer might still be brain-dead, if using -mcpu=ultrasparc), perl 5.005_03, ghostscript 5.10, olvwm, rxvt, tiff, XPM, flex, bison, automake and many more.

If you upgrade to Solaris 8, clean out your /etc/system. Several of the parameters changed meaning, and should no longer be set in /etc/system. Look at pt_cnt and priority_paging.

I am still collecting more material about Solaris 8!

11.3 Solaris 9

TBD

12. Uncovered material

There are a bunch of parameters which I didn't cover in the sections above, but some of which may be worth looking at, among these tcp_ip_abort_linterval (the correct version), tcp_ip_notify_cinterval, tcp_ip_notify_interval, tcp_rexmit_interval_extra, tcp_sth_rcv_lowat.

13. Scripts

Startup scripts for Solaris exist for the important tweakable parameters. Only the first script is really necessary.

  1. The first script changes all parameters deemed necessary and described in the previous sections. The file should be called something like /etc/init.d/nettune and you must link (hardlinks preferred, symbolic links are o.k.) /etc/rcS.d/S31nettune to the init.d file.

    SUN recommends running control scripts like nettune between S69inet and S72inetsvr.

    Please read the script carefully before installing. It is a rather straight-forward shell script. The piping and awking isn't as bad as it looks.

    Always tune the parameters to your needs, not mine. Thus, examine the values closely.

  2. The second script just changes the MTU of le0 from the IPX to the IEEE 802.3 size. The meaning is shown further up. The script is not strictly necessary, and reports about odd behavior may have ceased with a patched 2.5.1 or a 2.6.

    Since I observed the erratic behavior only in a Solaris 2.5, I believe it has been fixed with patch 103169-10, or above. The error description reads "1226653 IP can send packets larger than MTU size to the driver."

    If you intend to go ahead with this script, the file is called /etc/init.d/nettune2 and you need to create a link to it (hard or soft, as above) as /etc/rc2.d/S90nettune2. Please mind that GNU awk is used in the script, normal awk does not seem to work satisfactorily.

  3. As this is the scripts section, I should mention again the nifty script kindly supplied by Mr. Kroonmaa. It allows the user to check on all existing values for a network component (tcp, udp, ip, icmp, etc.). Previously, I did something similar in Perl, but nothing as sophisticated until I saw Mr. Kroonmaa's script. He is really talented with scripts.

  4. There is a little helper, which lets you inspect a bunch of kernel related parameters. The script was supplied by Mr. Kroonmaa, and minimally modified by myself. It displays the contents of physmem, minfree, desfree, lotsfree, fastscan, slowscan, maxpgio, tune_t_gpgslo, tune_t_fsflushr, autoup, ncsize, ufs_ninode, maxusers, max_nprocs, maxuprc, ndquot, nbuf, bufhwm, rlim_fd_cur, rlim_fd_max, nrnode, coredefault, and is extensible to your needs. For the settings of these parameters, refer to Adrian Cockcroft [2]. You should remember to set the access rights of the script to only allow root and admins to use it, i.e. change it to group adm, and use mode 0750.

    Another word of warning about this script: if you intend to use the absolute debugger (adb), you'd better know what you are doing.
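The idea behind such a kernel parameter helper can be sketched as follows. This is not Mr. Kroonmaa's script: the function name adb_queries is hypothetical, and the /D format (32-bit decimal) is an assumption that fits a 32-bit Solaris 2.x kernel; on a 64-bit kernel some variables are wider and need a different format.

```shell
# Print one read-only adb query per kernel variable named on the
# command line. "name/D" asks adb to display the value in decimal.
# Nothing is written to the kernel; the function only builds the input.
adb_queries() {
    for var in "$@"; do
        echo "${var}/D"
    done
}
```

As super-user, the generated queries could be fed to the debugger with something like adb_queries physmem maxusers ncsize | adb -k /dev/ksyms /dev/mem. Keep the queries read-only unless you are sure about what a write would do.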

14. List of things to do

This section is not about things you have to do, but rather about items which I think are in need of being reworked. Thus it is more a kind of meta-section.



Sun, Sun Microsystems, the Sun Logo and Solaris are trademarks or registered trademarks of Sun Microsystems, Inc. in the United States and other countries.



Copyright © 1995-2005 Kevin P. Inscoe

This website and all original artwork and material is © copyright 2009 Kevin P. Inscoe. Other material is used under the "Fair Use" provisions of United States of America Copyright law, and all rights remain with the original copyright holders.