The io package

Introduction

This page gives an overview of the io package, which provides Object Icon’s object-oriented replacement for Icon’s builtin File type.

Details

The central class in the io package is io.Stream. This class models a file-like object which can have some or all of the following operations :-

input
output
seek and tell
truncation

The io package also provides some procedures which emulate the old Icon builtin functions which the package replaces (write, open, etc).

File I/O

The class io.FileStream provides conventional filesystem I/O. To open a new FileStream, use the constructor :-

   f := FileStream("test.txt", ior(FileOpt.WRONLY, 
                                   FileOpt.TRUNC, 
                                   FileOpt.CREAT)) | stop(&why)
   f.write("some text")
   f.close()

The second parameter is made up by taking some constants from the FileOpt class and or-ing them together. These constants match the constants used in the underlying POSIX open system call, except that their names lack the “O_” prefix. So in the above example the open call is logically equivalent to the C call

   f = open("test.txt", O_WRONLY | O_TRUNC | O_CREAT);
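By the same naming rule, appending to an existing file would presumably use the counterpart of O_APPEND. The following is only a sketch: the constant FileOpt.APPEND is an assumption derived from that rule rather than something stated on this page, and the file name test.log is purely illustrative.

```icon
import io

procedure main()
   local f
   # FileOpt.APPEND is assumed to mirror the POSIX O_APPEND flag
   # (the name is inferred from the naming rule, not confirmed).
   f := FileStream("test.log", ior(FileOpt.WRONLY,
                                   FileOpt.APPEND,
                                   FileOpt.CREAT)) | stop(&why)
   f.write("another line")
   f.close()
end
```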

As an alternative to using FileStream directly, the open procedure can be used to open a file, as follows :-

   f := open("test.txt", "w") | stop(&why)
   f.write("some text")
   f.close()

open emulates the traditional Icon builtin function of the same name. It also returns a different type of Stream - rather than a FileStream, it returns an io.BufferStream wrapping a FileStream. This provides better performance than a raw FileStream because it tends to read data from, or write data to, the underlying filesystem in larger chunks.
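To make the relationship between the two approaches concrete, the following sketch builds the same kind of wrapped stream by hand, using only the FileStream and BufferStream constructors. It assumes that closing the BufferStream also flushes its buffer and closes the wrapped FileStream; that behaviour is not stated on this page.

```icon
import io

procedure main()
   local raw, f
   # Open the raw, unbuffered file stream first.
   raw := FileStream("test.txt", ior(FileOpt.WRONLY,
                                     FileOpt.TRUNC,
                                     FileOpt.CREAT)) | stop(&why)
   # Wrap it so that small writes are collected into larger chunks.
   f := BufferStream(raw)
   f.write("some text")
   # Assumption: closing the wrapper flushes it and closes raw as well.
   f.close()
end
```

In everyday code the open procedure is the shorter way to get this; wrapping by hand is mainly useful when the exact FileOpt flags matter.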

Standard input and output

Standard input, standard output and standard error output are available via the following static constants :-

FileStream.stdin - standard input
FileStream.stdout - standard output
FileStream.stderr - standard error output

These are all FileStream instances, and are hence unbuffered. If you intend to read or write a large amount of data to one of these streams, performance will be improved by using a BufferStream wrapper. For example, consider the following program to count lines :-

import io

procedure main()
   local n
   n := 0
   while read() do
      n +:= 1
   write("n=", n)
end

On my system this takes about 2.4 seconds to read a 47000-line file. The following simple change reduces the run time to only 0.45 seconds :-

import io

procedure main()
   local n, f
   n := 0
   f := BufferStream(FileStream.stdin)
   while f.read() do
      n +:= 1
   write("n=", n)
end

Socket I/O

The io.SocketStream class represents a stream based on an underlying socket. There are several static methods in that class to create sockets either for a client or a server. Both local and network sockets are supported.

Here is a simple example which creates a client to perform an HTTP HEAD request.

import io

procedure main()
   local f
   f := SocketStream()
   f.connect("inet:www.google.com:80") | stop(&why)
   f.writes("HEAD / HTTP/1.1\r\nHost: www.google.com\r\nConnection: close\r\n\r\n")
   while write(f.read())
   f.close()
end

More robust socket I/O

One problem with the socket code shown above is that it is vulnerable to network problems. Any of the connect, writes or read calls could hang and never return. The io package provides the necessary interface to the underlying system calls which can be used to deal with this problem.

Non-blocking mode

This mode is used to ensure an i/o request won’t block. To enable it, use the flag method as follows :-

   f.flag(FileOpt.NONBLOCK)

flag accepts a second parameter, which gives the flags to turn off, so you could turn non-blocking mode off again as follows :-

   f.flag(, FileOpt.NONBLOCK)

When this mode is set, a call to a stream’s in or out method which would otherwise block will instead fail, setting &why to a message like “Resource temporarily unavailable (errno=11)”. The errno value can be extracted from &why with the errno() procedure; in this case it would return 11, which is equal to the constant Errno.EAGAIN. errno and Errno can be found in the util package and the posix package respectively (they aren’t in the io package because the errno feature applies more generally than just to i/o).
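As a rough illustration of the check just described, the sketch below makes a single non-blocking read attempt on standard input and distinguishes "no data yet" from a genuine error. It assumes that flag() can be applied to FileStream.stdin in the same way as to the stream f above, and the imports simply follow the package locations given in the previous paragraph.

```icon
import io, posix, util

procedure main()
   local f, s
   f := FileStream.stdin
   # Assumption: flag() works on this FileStream as on any other stream.
   f.flag(FileOpt.NONBLOCK) | stop(&why)
   if s := f.in(1024) then {
      # in() succeeded; a &null result indicates end-of-file.
      if /s then write("end of file") else write("read ", *s, " bytes")
   } else if errno() = Errno.EAGAIN then
      # Nothing was available and the call would have blocked.
      write("no data yet; try again later")
   else
      stop(&why)
end
```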

The poll() method

poll() is a static method in the io.DescStream class (the common parent of FileStream and SocketStream). It provides an interface to the POSIX function of the same name, and is used to wait, with an optional timeout, for one or more files to become ready for i/o. To use it, pass a list of one or more pairs, followed by an optional timeout in milliseconds. Each pair consists of a stream followed by an integer flag, made up from the constants in io.Poll, which indicates what event(s) we are polling for. For example :-

r := DescStream.poll([f1, Poll.OUT, f2, ior(Poll.IN,Poll.OUT)], 5000)

This will wait for up to 5 seconds, polling f1 for output and f2 for input or output. The return value is a list with one integer element for each pair, giving the flags which poll detected (or zero if no flags matched for that particular stream). The call fails on timeout or error and sets &why appropriately.
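Because each element of the result is a set of flags, it is usually tested with a bitwise "and" rather than compared for equality. The following sketch polls the standard streams (chosen only so that the program is self-contained) and decodes the result; iand is the bitwise-and counterpart of the ior function used above.

```icon
import io

procedure main()
   local r
   # Poll standard output for writability and standard input for
   # readability, waiting at most 2 seconds.
   r := DescStream.poll([FileStream.stdout, Poll.OUT,
                         FileStream.stdin, Poll.IN], 2000) | stop(&why)
   # r[1] belongs to the first pair and r[2] to the second.
   if iand(r[1], Poll.OUT) ~= 0 then
      write("stdout is ready for output")
   if iand(r[2], Poll.IN) ~= 0 then
      write("stdin has input waiting")
end
```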

Here is a longer example showing how to do the HTTP HEAD request of the previous example using poll on a non-blocking socket.

import io, posix, util

procedure main()
   local f, r, s, i

   #
   # Create a socket and switch on non-blocking mode
   #
   f := SocketStream()
   f.flag(FileOpt.NONBLOCK) | stop(&why)

   #
   # Connect, which will probably fail with an EINPROGRESS error - that
   # isn't really an error.
   #
   f.connect("inet:www.google.com:80") | {
      errno() = Errno.EINPROGRESS | stop("Couldn't connect: " || &why)
      # Wait up to 5s for the connection to complete
      r := DescStream.poll([f, Poll.OUT], 5000) | stop(&why)
      r[1] = Poll.OUT | stop("Socket error")
   }

   #
   # Send the request, in more than one piece if necessary.
   #
   s := "HEAD / HTTP/1.1\r\nHost: www.google.com\r\nConnection: close\r\n\r\n"
   while *s > 0 do {
      r := DescStream.poll([f, Poll.OUT], 1000) | stop(&why)
      r[1] = Poll.OUT | stop("Socket error")
      i := f.out(s) | stop(&why)
      s := s[i+1 : 0]
   }

   # 
   # Read the data sent back, until end-of-file
   #
   repeat {
      r := DescStream.poll([f, Poll.IN], 1000) | stop(&why)
      r[1] = Poll.IN | stop("Socket error")
      s := f.in(1024) | stop(&why)
      # &null indicates EOF
      if /s then
         break
      writes(s)
   }

   f.close()
end

Note that this code uses the methods in and out to read and write data, rather than read, reads, write and writes. This is because the latter methods are implemented in terms of in and out and may call them several times in one invocation. This makes them unsuitable for use with a non-blocking stream, because we need to call poll or select immediately before each i/o call to ensure that the call has something to read, or is able to write. For similar reasons, it is not possible to use BufferStream to wrap a non-blocking stream.

In-memory I/O

There are two types of stream which operate entirely on data in memory. The first is io.StringStream which represents the stream’s data as an Icon string. The second is io.RamStream which uses memory allocated independently from Icon’s allocation system. StringStream is useful when a constant string needs to be represented as a stream, but RamStream is much faster when the stream will be written to and changed.
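As a small illustration of the first case, the sketch below presents a constant string as a stream and counts its lines with the ordinary read method, just as the earlier line-counting program did for standard input. The sample text is arbitrary.

```icon
import io

procedure main()
   local f, n
   # Wrap a constant string so it can be read like any other stream.
   f := StringStream("alpha\nbeta\ngamma\n")
   n := 0
   # read() returns one line at a time and fails at end-of-file.
   while f.read() do
      n +:= 1
   write("n=", n)
   f.close()
end
```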

RamStream can be used to efficiently concatenate long strings together. Repeated concatenation can be unexpectedly slow in Icon if you are not careful, because of the way strings are allocated. For example, consider the following program.

import io

procedure main()
   local s, i
   s := ""
   every i := 1 to 20000 do {
      if i % 200 = 0 then
         write(i)
      s ||:= repl(i, 5)
   }
   write("Done, *s=", *s)
end

This program builds up a long string s consisting of each integer replicated 5 times. If you run this program you will see that the longer s becomes, the longer each iteration of the loop takes. This is because the implicit conversion of i to a string in the repl call creates a temporary string which separates the result of repl from s, meaning that the concatenation operation has to copy s in its entirety - an operation that becomes slower as s becomes longer.

One way to dramatically speed this program up is to use a RamStream, to build up the string independently of Icon’s string allocation, as follows :-

import io

procedure main()
   local s, i
   s := RamStream()
   every i := 1 to 20000 do {
      if i % 200 = 0 then
         write(i)
      s.writes(repl(i, 5))
   }
   write("Done, *s=", *s.str())
   s.close()
end

The RamStream’s str() method returns its content as an Icon string.

Note that it is most important to remember to close a RamStream in order to free its internal memory - Icon’s garbage collector won’t do it for us.

FilterInputStream and FilterOutputStream

These two classes make use of non-blocking I/O to allow useful filter programs (such as gzip) to be utilized. Normally it is not possible to use filter programs with blocking I/O because of the risk of deadlock. With non-blocking I/O it becomes possible, but intricate, and these two classes provide an easy-to-use interface to the complexities involved.

The idea behind io.FilterOutputStream is that we specify the command to be run, and its parameters. A process is forked and the command begins to run. Then we write data to the stream, and this becomes the input of the command. But what of the output of the command? That is captured in another Stream (called the “sink” Stream), specified in the FilterOutputStream constructor. This is typically a RamStream, or a StringStream if the amount of data is small. After the FilterOutputStream is closed, we can examine the sink Stream and get the results.

io.FilterInputStream works just the same way, but the order of things is switched around. Instead of a sink Stream we provide a “source” Stream. This provides the input to the command. And we read the results of the command simply by calling in() or reads() or one of the other usual Stream methods.

Of the two options, FilterOutputStream seems to be the more natural to use.

Both classes allow standard error output to be captured too, and can test whether the command succeeded or not. The method succeeded() is particularly helpful for testing for success, and getting a helpful diagnostic message from the standard error output.

To illustrate these classes, here is a program which uses gzip to compress some data, and then gunzip to uncompress it again.

import io, util

procedure main()
   local f, data, ram, gdata, data2

   # Create some data to compress
   data := repl("The quick brown fox jumps over the lazy dog", 1000)

   # ram will be the "sink" for the output of gzip.
   ram := RamStream()
   use { 
      f := FilterOutputStream(ram, "gzip", ["-c"]),
      f.writes(data)
   }
   f.succeeded() | stop("gzip problem: ", &why)

   # Get the output as a string.
   gdata := ram.done()
   write("Compressed ", *data, " to ", *gdata, " bytes")

   # Now pass the compressed data (as a StringStream source) into
   # gunzip and read the output to end of file.
   data2 := use {
      f := FilterInputStream(StringStream(gdata), "gunzip", ["-c"]),
      f.read_all()
   }
   f.succeeded() | stop("gunzip problem: ", &why)

   # Check the results.
   if data == data2 then
      write("Recovered original data OK")
end

An example run may produce the following output :-

Compressed 43000 to 203 bytes
Recovered original data OK

Note the use of succeeded() to check the result of the command. If we introduce a deliberate error, such as an invalid flag to gunzip, the output might be :-

Compressed 43000 to 203 bytes
gunzip problem: gunzip failed: exited with status 1: gzip: invalid option -- 'X' Try `gzip --help' for more information.
