Thursday, September 12, 2013

Pipes, Pipes and More Pipes


No, this post is not about plumbing or smoking. It is about the pipes used in GNU/Linux and other operating systems.

A Couple of Notes:

This article is about using redirection, pipes and named pipes in a command shell on a system running GNU/Linux. If you are not interested in command line use, or if you are not interested in using GNU/Linux, please do not bother to make comments about your lack of interest in such things. Comments about how much better other operating systems are, or how outdated command line mode is, will not be posted. I am familiar with, and use, many operating systems. This will not become a forum for arguing the benefits of other operating systems. If you want to post a comment that describes a similar method of piping input or output for other operating systems, feel free. Keep it civil and on topic.

Many of the everyday electronic items we now use are running some form of open source software. This allows the system to be modified as desired, or needed, by the user. Most often I find that these items are running a version of GNU/Linux. Any time I get a new electronic device, I check to see what operating system is running and what access methods are available.

Back to the Pipes:

There are three primary methods commonly used in GNU/Linux operating systems to change the standard input, named stdin, and the standard output, named stdout. These are redirection, pipes and named pipes. Is redirection a pipe or is a pipe redirection? Most articles consider a pipe to be a form of redirection. I believe that redirection using the < or > symbols is a one way pipe. You are piping the input or output of the command to something other than stdin or stdout. However, in most cases, you will see the use of < or > referred to as redirection and the use of | or a named pipe FIFO referred to as a pipe. I follow that convention most of the time. However, I call redirection a pipe more often than I call a pipe redirection.

Many text mode commands have default output and input devices. Often the default output device is stdout and this is usually directed to the screen, or a window on the screen. The default input device is usually stdin and is most often the keyboard. Both of these, as well as another device, called standard error or stderr, can be redirected to other devices including files or printers.
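
As a minimal sketch of redirecting all three streams, the commands below use a scratch directory and file names that are made up purely for illustration:

```shell
# Scratch directory so the demo does not touch real files (hypothetical names).
demo_dir=$(mktemp -d)

# > sends stdout to a file instead of the screen.
echo "hello pipes" > "$demo_dir/out.txt"

# < feeds a file to stdin instead of the keyboard.
tr 'a-z' 'A-Z' < "$demo_dir/out.txt" > "$demo_dir/upper.txt"

# 2> sends stderr to a file. Listing a missing path produces an error message.
ls "$demo_dir/no-such-file" 2> "$demo_dir/err.txt" || true

cat "$demo_dir/upper.txt"
```

The error message from ls ends up in err.txt rather than on the screen, while the upper-cased text is the only thing printed.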

In our example below, the file /tmp/pipe1 is a named pipe FIFO. Output to it comes from the cat command using a redirection with the > symbol. The default output device for the cat command is stdout, so our use of the redirection simply changes the output device to the file, which happens to be a FIFO. This redirection is the first pipe in the example command line.

The second pipe used in the example below is referred to as, simply enough, a pipe and is designated by the vertical bar symbol. The normal function of a pipe is to connect the stdout of the command to the left to the stdin of the command to the right. It also causes both commands to run at the same time. However, not all commands, by default, accept their input from stdin or send their output to stdout. Ftpput, as used in the example, is such a command. It does not have a default input device; it must be given an input file or device. More on that in a minute. We still want the ftpput command to run at the same time as the cat command, and the pipe is being used to make that happen.
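
A small sketch of the normal use of a pipe, with sample words chosen only for illustration: each command's stdout becomes the stdin of the command to its right, and all of them are started together.

```shell
# printf, sort and head are all started by the shell at the same time.
# sort reads printf's stdout as its stdin; head reads sort's stdout.
first_name=$(printf 'pear\napple\nfig\n' | sort | head -n 1)
echo "$first_name"
```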

As mentioned above, not all programs, or commands, have a default input or output. Some, such as ftpput, must be given a file or device to get input from or send output to. In some cases stdin and stdout can still be used. For example, the ftpput command will accept input from stdin in two different ways. First, the - symbol can be used in place of the input file name. The - symbol is used by many programs to represent stdin or stdout, depending on where it is used in the command line. In the example, we could use this command line instead of the one given:

#cat (20130908_1916_R*)|ftpput -v -u(user name) -p(password) (192.168.1.203) ("/Home Public/Shows/Actual Show Name and Episode.mpg") -

And it will work fine. In this case, the output from the cat command is being allowed to go to stdout and the input to ftpput is from stdin. The pipe is creating the link to redirect stdout and stdin. In other cases, stdout can be given as a named file with /dev/stdout and stdin can be given as a named file with /dev/stdin. So the command line:

#cat (20130908_1916_R*)>/dev/stdout|ftpput -v -u(user name) -p(password) (192.168.1.203) ("/Home Public/Shows/Actual Show Name and Episode.mpg") /dev/stdin

Will also work.

With either of these alternate command lines, the shell simply has the operating system create an unnamed pipe, which is a FIFO held in a memory buffer, and it is used in the same manner as a named pipe FIFO is used.
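
The two ways of naming stdin can be tried safely with cat standing in for ftpput. This is only a sketch; it assumes a system that provides /dev/stdin, as GNU/Linux does, and the sample text is made up.

```shell
# The - symbol tells cat to read from stdin instead of a named file.
via_dash=$(echo "sample data" | cat -)

# /dev/stdin gives stdin a file name that any command can open.
via_dev=$(echo "sample data" | cat /dev/stdin)

echo "$via_dash"
echo "$via_dev"
```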

The third pipe in the example is the FIFO, /tmp/pipe1, which is known as a named pipe. This is a special type of file. It is shown in the directory listing with a p, for pipe, in the first position of the attribute field, the same place that shows a d for a sub-directory. On some operating systems, the directory listing will have the vertical bar symbol at the end of the file name to indicate that the file is a pipe. Using this file, all data that must, or can, be written to or read from a named file may be piped from one command to another command. This type of pipe has some other unique properties besides the name. The file really only exists in the physical realm as a memory buffer and a name in a directory. A directory listing will always show the file as having zero bytes. This is because the data passes through the memory buffer and is never stored in the file itself. The file also does not accept input from the command doing the writing until there is a command running that is reading the file. Each byte written to the file is expected to be read right away by another process. The FIFO will cause blocking on one side if the other side is not running. So the cat command will wait for the ftpput command to start looking for data before it starts writing that data.
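
Here is a minimal sketch of that behavior. It assumes the mkfifo command is available and uses a scratch directory and sample text that are made up for the demonstration.

```shell
# Create a named pipe in a scratch directory (hypothetical location).
fifo_dir=$(mktemp -d)
fifo="$fifo_dir/pipe1"
mkfifo "$fifo"

# The listing starts with p and shows a size of zero bytes.
ls -l "$fifo"

# The writer runs in the background. It blocks until a reader opens the FIFO.
printf 'one\ntwo\n' > "$fifo" &

# Opening the FIFO for reading releases the writer and the data flows across.
received=$(cat "$fifo")
wait

echo "$received"
```

Without the & on the writer line, the shell would sit at that line waiting for a reader that never arrives.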

You may wonder why a named pipe would be used if the same thing can be accomplished using one of the alternate methods of redirection of stdin and stdout. That brings us to another unique property of named pipes. The process, or command, that is writing to the FIFO and the process that is reading from the FIFO do not have to be running in the same shell. The example could be broken up so that the cat command is running in one shell terminal and the ftpput command is running in another shell terminal. Both processes do have to be running on the same machine, though. The data passes through a memory buffer on that machine, so even if the named pipe file is visible from a second machine through a network share, it will not carry data between the two machines.

Named pipes are persistent, so they remain after the processes that are using them end. So the /tmp/pipe1 file in the example remains even after the Cat and Ftpput commands complete and the Telnet session is closed. This is another feature that can be used by programs. It is possible to have a command watching the FIFO that will perform a function when the FIFO is opened by another program or command for data to be written. For example, I have a script that runs all the time on one of my NAS units. It watches a FIFO and sends the data it receives to a file whose name increments each time the command runs. That program receives data from an automatic backup in a program that could only be given a single fixed name for the backup file. Before I started using the FIFO, I had to run a script that renamed all the prior backup files so the fixed name could be used without overwriting the past backup.
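
The following is a rough sketch of that kind of watcher, not my actual script. The file names, the two-pass limit and the done_ marker files are all invented for the demonstration, and echo stands in for the backup program.

```shell
work=$(mktemp -d)
fifo="$work/backup.fifo"
mkfifo "$fifo"

# Watcher: each pass blocks until a writer opens the FIFO, copies everything
# until the writer closes it, then moves on to the next numbered file.
# A real watcher would loop forever; two passes keep this demo finite.
(
  n=1
  while [ "$n" -le 2 ]; do
    cat "$fifo" > "$work/backup_$n.dat"
    touch "$work/done_$n"   # marker used only by this demo
    n=$((n + 1))
  done
) &

# Stand-in for the backup program, which only knows one fixed file name.
echo "first backup" > "$fifo"

# Wait for the first pass to finish so the two backups do not run together.
while [ ! -e "$work/done_1" ]; do sleep 1; done

echo "second backup" > "$fifo"
wait

ls "$work"
```

The writer always uses the same name, yet each run lands in its own numbered file.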

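There are some other interesting features provided by named pipes. For example, more than one program can have the same FIFO open for reading or writing at the same time. Keep in mind that each byte written is delivered to only one reader, so multiple readers share the data rather than each getting a full copy. When a full copy is needed in more than one place, the Tee command can be used to write the output to the screen and a file, or to multiple files, at the same time.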
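
A short sketch of tee making full copies of its input, using made-up file names in a scratch directory:

```shell
tee_dir=$(mktemp -d)

# tee copies its stdin to every file named and also passes it on to stdout.
shown=$(printf 'log line\n' | tee "$tee_dir/copy_a.txt" "$tee_dir/copy_b.txt")

echo "$shown"
cmp "$tee_dir/copy_a.txt" "$tee_dir/copy_b.txt" && echo "both copies match"
```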

This is not intended to be a full Howto for redirection and pipes. I strongly suggest you explore some of the excellent information available on the Internet to learn more about them and how to use them in scripts and programs. This was just an intro to show how I used all three types to accomplish the goal for this case.

The Details and Example:

Here is a little background on what I needed to do and how I did it for this case. Note that I do not use this command line anymore. I have written a shell script that looks for new files on the FAT32 partition and does the cat and ftpput automatically when a new file appears. The script uses a database I created that contains the date and time the file was copied to the partition and the name of the show as I want it stored on the NAS drive.

One of my recent purchases was a new, old stock, DVR device. It was made in 2008 and connects to a network using a wired or wireless Ethernet connection. I found it was capable of a Telnet connection and was running a Linux Venus operating system, a common system for media devices.

This DVR has the capability to copy recorded programs, which are on a UDF file system, to a hard drive. However, it will only copy to a FAT32 partition and it breaks the program up into multiple files, each 2GB or less. The files are broken up using a Split command, so to use them on other devices they must be put back together. I wanted to save these recordings on network attached storage units, which are formatted as NTFS or EXT3, so they could be viewed from other attached media devices on the network. Both of those file systems handle files much larger than the combined file size of any recording. So I just needed to combine the smaller files into one file on the NAS unit.
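
To show why cat is enough to undo a split, here is a tiny sketch using the standard split command. The 4 byte piece size and the file names are made up, and the DVR firmware may split its files somewhat differently.

```shell
work=$(mktemp -d)
printf 'abcdefghij' > "$work/original.dat"

# Cut the file into 4 byte pieces named piece_aa, piece_ab, piece_ac.
split -b 4 "$work/original.dat" "$work/piece_"

# The shell expands piece_* in sorted order, so cat reads the pieces in order.
cat "$work"/piece_* > "$work/rejoined.dat"

cmp "$work/original.dat" "$work/rejoined.dat" && echo "files match"
```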

The other problem was that the DVR gave the files names based on the date and time they were copied to the FAT32 partition. For example, a 3 hour show would have a name, on the FAT32 drive, like 20130908_1917_Rnn.mpg. The Rnn would be R00, R01 and R02 for the three files the program was split into. I wanted the actual name of the show to be the final name on the NAS.

Checking the available commands on the DVR through a Telnet connection showed that the only method to get a file to a NAS drive was using a Ftpput command. There was no Mount command that could be used to mount a remote share and there was no standard FTP command that would allow wildcard copies. So the copy to the NAS would need to be a single file at a time. I would then need to combine the files on the NAS drive and rename the resulting file. Without using redirection and pipes, this would require dealing with each bit of data at least three times on the network.

Here was the solution I decided to use instead:

I opened a Telnet session to the DVR. After logging in and changing to the directory where the DVR saved the copied files, I issued the following command in the shell.

Note: This is the example referred to above.

#mknod /tmp/pipe1 p;cat (20130908_1916_R)*>/tmp/pipe1|ftpput -v -u(user name) -p(password) (192.168.1.203) ("/Home Public/Shows/Actual Show Name and Episode.mpg") /tmp/pipe1

The ( and ) symbols in the command line shown are not typed into the actual command line. They are used here to indicate items that would need to be changed to match the actual items on your system, for example, your user name and password on the NAS unit you are copying to. However, the " symbols were needed since the path and file name have spaces in them.

The Mknod command was used to create the named pipe FIFO file /tmp/pipe1. On newer GNU/Linux systems a Mkfifo command is frequently available for this. This system did not offer that command. The p after the file name tells Mknod that the file is going to be a named pipe FIFO. As noted earlier, this file is persistent, so that command does not really have to be run each time. However, if the file exists, the command simply presents a message to that effect and the shell continues with the next command.
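
A quick sketch of both commands on a GNU/Linux system that has them. The scratch path is made up, and the guarded form on the last lines is just one possible way to avoid the message, not something the DVR required.

```shell
pipe_path="$(mktemp -d)/pipe1"

# Create the named pipe. The p asks mknod for a FIFO.
mknod "$pipe_path" p

# Running it again fails because the file exists, and the shell carries on.
mknod "$pipe_path" p 2>/dev/null || echo "pipe already exists, continuing"

# Where mkfifo is available, a test with -p avoids the message entirely.
[ -p "$pipe_path" ] || mkfifo "$pipe_path"

ls -l "$pipe_path"
```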

For those who are not familiar with this, you will notice a semicolon after the Mknod command. The semicolon tells the shell to run the Mknod command and, when it finishes, to run the Cat command. This is in contrast to the vertical bar symbol used between the Cat command and the Ftpput command which tells the shell to run both commands at the same time.

Next, a Cat command is used to concatenate the multiple files into a single file using redirection to the FIFO /tmp/pipe1. The Cat command simply reads the file(s), in order, and creates an output which is normally directed to stdout. Redirection of stdout to a file is done using the > symbol. It can also be redirected using a standard pipe with the | symbol. For example "cat sometext.txt|more" would cause Cat to read the file sometext.txt and send it to the More command so it is displayed one page at a time in the console.

The Cat command and the Ftpput command are tied together with a standard pipe so they will run at the same time. Doing this causes the Cat command to open the FIFO for writing and the Ftpput command to open the FIFO for reading at the same time. As data is written to the FIFO by Cat, it is read by Ftpput and sent to the remote file system as a single file. When the Cat command completes, it closes the FIFO for writing. The Ftpput command will see that the FIFO has been closed and, once all data has been processed, will close the file for reading and exit.
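
The whole pattern can be tried on one machine without a DVR or a NAS. In this sketch a second cat stands in for ftpput, since it also needs to be given a file name to read from. The file names and contents are made up for the demonstration.

```shell
d=$(mktemp -d)

# Three small files standing in for the split recording.
printf 'part0-' > "$d/show_R00.mpg"
printf 'part1-' > "$d/show_R01.mpg"
printf 'part2'  > "$d/show_R02.mpg"

mkfifo "$d/pipe1"

# Same shape as the real command line: the first cat writes to the FIFO,
# the | only makes both commands run together, and the second cat (the
# stand-in for ftpput) reads from the FIFO by name.
cat "$d"/show_R*.mpg > "$d/pipe1" | cat "$d/pipe1" > "$d/combined.mpg"

cat "$d/combined.mpg"
```

Nothing actually travels through the | here. All of the data goes through the named pipe, and the reader exits once the writer closes it.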


