Randomly Generated Input Stream

When writing tests, programmers often need to provide test files for their code to work with. This is typically done by committing the files to version control or by exposing them over the network so they can be downloaded at runtime. The reasons for using a particular test file differ greatly, but these are the main reasons to include test files in an automated testing process:

  • Configuration
  • Data transfer
  • Test data

However, we don’t always need data that is logically structured or meaningful. Recently, I have been working on a project that involved the development of an enterprise-grade archiving system. We needed a way to provide a flexible number of test files for our unit and integration tests. These files were submitted via a REST API, and the tests validated system-wide processes and their footprint. We searched the well-known libraries for a randomly generated stream but found nothing. This got us thinking, and we decided to implement the following stream to generate data for us, with a little twist.

The RandomGeneratedInputStream class has two constructors, taking either one or three arguments. The one-argument constructor creates fully random data for test files (pseudo-random, since the bytes come from java.util.Random). The three-argument constructor lets the programmer select the strategy used to generate the stream content and control its degree of randomness. Thanks to this little tweak we are able to generate not only random test files, but also files that compress well, for example into a ZIP archive. Such files are cheap to transport over the network thanks to their small compressed size, and they put an even load on a compression handler. See the following listing for details on how the stream works and how to configure it:

import java.io.IOException;
import java.io.InputStream;
import java.util.Random;

/**
 * Input stream that generates random data with three possible configurations:
 * <ul>
 *     <li>Fully random stream created by calling one argument constructor.</li>
 *     <li>Stream that starts with fixed uniform block of bytes followed by fully random data.</li>
 *     <li>Stream that creates fixed blocks and populates each with randomly generated new byte.</li>
 * </ul>
 * 
 * The first option is quite universal and may be used for any type of tests. The second option might be
 * useful when testing logic dependent on compressed data size and archived data are transferred via
 * network. However, the second configuration is not efficient in terms of complexity of compression and
 * its speed. In order to remove this bias the third configuration was introduced to provide evenly 
 * distributed data that provide steady load for compression and decompression of generated files.
 * 
 * @author Petr Fiala, Jakub Stas
 */
public class RandomGeneratedInputStream extends InputStream {

    private final Random random = new Random();

    /** Type of randomization strategy. */
    private final Type type;

    /** Target size of the stream. */
    private final long size;

    /** Size of the block populated by same byte. */
    private final long blockSize;

    /** Size of expandable block being populated by same byte. */
    private long currentBlockSize;

    /** Value of last generated byte. */
    private int lastUsedByte;

    /** Internal counter. */
    private long index;

    /**
     * @param size target size of the stream [byte]
     */
    public RandomGeneratedInputStream(long size) {
        this(size, 1, Type.FIXED);
    }

    /**
     * @param size target size of the stream [byte]
     * @param blockSize size of the block populated by same byte [byte]
     * @param type randomization strategy 
     */
    public RandomGeneratedInputStream(long size, long blockSize, Type type) {
        super();

        if (blockSize < 1) {
            throw new IllegalArgumentException("Block size must be at least one byte!");
        }

        this.size = size;
        this.type = type;
        this.blockSize = blockSize;
        this.currentBlockSize = blockSize;
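        // nextInt(256) covers the full byte range 0..255 (the bound is exclusive).
        this.lastUsedByte = random.nextInt(256);
    }

    @Override
    public int read() throws IOException {
        if (index == size) {
            return -1;
        }

        switch (type) {
            case ITERATIVE:
                // Start a new block with a freshly generated byte.
                if (index == currentBlockSize) {
                    lastUsedByte = random.nextInt(256);
                    currentBlockSize += blockSize;
                }
                break;
            case FIXED:
                // Past the leading uniform block every byte is random.
                if (index >= blockSize) {
                    lastUsedByte = random.nextInt(256);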
                }
                break;
            default:
                break;
        }

        index++;

        return lastUsedByte;
    }

    /**
     * Type of randomization strategy used to populate the stream.
     */
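    public enum Type {
        /** Consecutive blocks of blockSize bytes, each filled with its own random byte. */
        ITERATIVE,
        /** One leading uniform block of blockSize bytes followed by fully random data. */
        FIXED
    }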
}
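
The three configurations can be tried out with a small sketch like the one below. This is an illustration rather than part of the original code: the wrapper class RandomStreamUsageSketch and the helper drain are hypothetical names, and a compact, comment-free copy of the stream is nested inside only so that the snippet compiles on its own. In a real project you would use the class listed above instead.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.util.Arrays;
import java.util.Random;

class RandomStreamUsageSketch {

    /** Compact stand-in for the RandomGeneratedInputStream listed above. */
    static class RandomGeneratedInputStream extends InputStream {
        enum Type { ITERATIVE, FIXED }

        private final Random random = new Random();
        private final Type type;
        private final long size;
        private final long blockSize;
        private long currentBlockSize;
        private int lastUsedByte;
        private long index;

        RandomGeneratedInputStream(long size) {
            this(size, 1, Type.FIXED);
        }

        RandomGeneratedInputStream(long size, long blockSize, Type type) {
            if (blockSize < 1) {
                throw new IllegalArgumentException("Block size must be at least one byte!");
            }
            this.size = size;
            this.type = type;
            this.blockSize = blockSize;
            this.currentBlockSize = blockSize;
            this.lastUsedByte = random.nextInt(256);
        }

        @Override
        public int read() {
            if (index == size) {
                return -1;
            }
            if (type == Type.ITERATIVE && index == currentBlockSize) {
                lastUsedByte = random.nextInt(256);
                currentBlockSize += blockSize;
            } else if (type == Type.FIXED && index >= blockSize) {
                lastUsedByte = random.nextInt(256);
            }
            index++;
            return lastUsedByte;
        }
    }

    /** Hypothetical helper: reads a whole stream into memory. Only sensible for small demo sizes. */
    static byte[] drain(InputStream in) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try {
            int b;
            while ((b = in.read()) != -1) {
                out.write(b);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toByteArray();
    }

    public static void main(String[] args) {
        // 1) Fully random: 16 bytes.
        byte[] random = drain(new RandomGeneratedInputStream(16));
        // 2) FIXED: the first 8 bytes share one value, the remaining 8 are random.
        byte[] fixed = drain(new RandomGeneratedInputStream(
                16, 8, RandomGeneratedInputStream.Type.FIXED));
        // 3) ITERATIVE: four blocks of 4 bytes, each block filled with one random value.
        byte[] iterative = drain(new RandomGeneratedInputStream(
                16, 4, RandomGeneratedInputStream.Type.ITERATIVE));

        // The printed values differ on every run because the stream is unseeded.
        System.out.println("random:    " + Arrays.toString(random));
        System.out.println("fixed:     " + Arrays.toString(fixed));
        System.out.println("iterative: " + Arrays.toString(iterative));
    }
}
```

For large test files you would not drain the stream into memory like this; passing it straight to the consumer (an HTTP client, a file copy, a ZIP writer) keeps memory usage constant regardless of the stream size.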

To back my claim I wrote two simple unit tests that use RandomGeneratedInputStream to create 20 and 25 test files of size 20 480 bytes. The code then archives these files into ZIPs that can be injected into the relevant entities and used for testing the business logic. The tables below contain the sizes and compression ratios from a single run of my tests; the compression ratio is the space saved relative to the stream size, i.e. 1 − archive size / stream size. As expected, for the FIXED strategy the archive sizes and compression ratios scale almost linearly with the block size. Interestingly, the second table, with the results for the ITERATIVE strategy, behaves differently: the ratio rises steeply for the first few block sizes and then flattens out. Both tests were executed under JDK 1.7.0_21, and compression was done using the default NIO.2 mechanism for creating ZIP archives (to be described in the NIO.2 series soon).

Results of testing: FIXED strategy

| Stream size [byte] | Block size [byte] | Archive size [byte] | Compression ratio |
|--------------------|-------------------|---------------------|-------------------|
| 20 480 | 20 480 | 183 | 99.11 % |
| 20 480 | 19 456 | 1 297 | 93.67 % |
| 20 480 | 18 432 | 2 318 | 88.68 % |
| 20 480 | 17 408 | 3 345 | 83.67 % |
| 20 480 | 16 384 | 4 363 | 78.70 % |
| 20 480 | 15 360 | 5 385 | 73.71 % |
| 20 480 | 14 336 | 6 399 | 68.75 % |
| 20 480 | 13 312 | 7 423 | 63.75 % |
| 20 480 | 12 288 | 8 442 | 58.78 % |
| 20 480 | 11 264 | 9 461 | 53.80 % |
| 20 480 | 10 240 | 10 476 | 48.85 % |
| 20 480 | 9 216 | 11 494 | 43.88 % |
| 20 480 | 8 192 | 12 517 | 38.88 % |
| 20 480 | 7 168 | 13 535 | 33.91 % |
| 20 480 | 6 144 | 14 554 | 28.94 % |
| 20 480 | 5 120 | 15 570 | 23.97 % |
| 20 480 | 4 096 | 16 592 | 18.98 % |
| 20 480 | 3 072 | 17 613 | 14.00 % |
| 20 480 | 2 048 | 18 631 | 9.03 % |
| 20 480 | 1 024 | 19 647 | 4.07 % |
| 20 480 | 0 | 20 640 | -0.78 % |

Results of testing: ITERATIVE strategy

| Stream size [byte] | Block size [byte] | Archive size [byte] | Compression ratio |
|--------------------|-------------------|---------------------|-------------------|
| 20 480 | 1 | 20 618 | -0.67 % |
| 20 480 | 2 | 19 662 | 3.99 % |
| 20 480 | 3 | 10 825 | 47.14 % |
| 20 480 | 4 | 8 241 | 59.76 % |
| 20 480 | 5 | 6 794 | 66.83 % |
| 20 480 | 6 | 5 824 | 71.56 % |
| 20 480 | 7 | 5 110 | 75.05 % |
| 20 480 | 8 | 4 547 | 77.80 % |
| 20 480 | 9 | 4 109 | 79.94 % |
| 20 480 | 10 | 3 748 | 81.70 % |
| 20 480 | 11 | 3 645 | 82.20 % |
| 20 480 | 12 | 3 335 | 83.72 % |
| 20 480 | 13 | 3 190 | 84.42 % |
| 20 480 | 14 | 2 909 | 85.80 % |
| 20 480 | 15 | 2 799 | 86.33 % |
| 20 480 | 16 | 2 567 | 87.47 % |
| 20 480 | 17 | 2 502 | 87.78 % |
| 20 480 | 18 | 2 315 | 88.70 % |
| 20 480 | 19 | 2 363 | 88.46 % |
| 20 480 | 20 | 2 231 | 89.11 % |
| 20 480 | 21 | 2 126 | 89.62 % |
| 20 480 | 22 | 2 033 | 90.07 % |
| 20 480 | 23 | 2 011 | 90.18 % |
| 20 480 | 24 | 1 888 | 90.78 % |
| 20 480 | 25 | 1 818 | 91.12 % |
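
A measurement of this kind can be sketched roughly as follows. Note the assumptions: the original tests used the NIO.2 ZIP mechanism, whereas this sketch swaps in java.util.zip.ZipOutputStream writing to memory, so exact archive sizes may differ from the tables. The class ZipRatioSketch, its two methods and the entry name are hypothetical, and a block of zero bytes stands in for the generated stream so that the snippet is self-contained.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.util.zip.ZipEntry;
import java.util.zip.ZipOutputStream;

class ZipRatioSketch {

    /** Streams the input into an in-memory ZIP with a single entry and returns the archive size in bytes. */
    static long zippedSize(InputStream in) {
        ByteArrayOutputStream archive = new ByteArrayOutputStream();
        try (ZipOutputStream zip = new ZipOutputStream(archive)) {
            // Hypothetical entry name; any name works for a size measurement.
            zip.putNextEntry(new ZipEntry("test-file.bin"));
            byte[] buffer = new byte[8192];
            int read;
            while ((read = in.read(buffer)) != -1) {
                zip.write(buffer, 0, read);
            }
            zip.closeEntry();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        // The ZIP stream is closed at this point, so the central directory is included.
        return archive.size();
    }

    /** Space saved in percent: (1 - archive size / stream size) * 100, as in the last table column. */
    static double compressionRatio(long streamSize, long archiveSize) {
        return (1.0 - (double) archiveSize / streamSize) * 100.0;
    }

    public static void main(String[] args) {
        int streamSize = 20480;
        // Stand-in input: 20 480 zero bytes. To reproduce a table row, pass e.g.
        // new RandomGeneratedInputStream(streamSize, blockSize, Type.ITERATIVE) instead.
        InputStream source = new ByteArrayInputStream(new byte[streamSize]);
        long archiveSize = zippedSize(source);
        System.out.printf("%d -> %d bytes, compression ratio %.2f %%%n",
                streamSize, archiveSize, compressionRatio(streamSize, archiveSize));
    }
}
```

The ratio formula can be checked against the tables: a 183-byte archive for a 20 480-byte stream gives 99.11 %, and an archive larger than the stream (fully random data plus ZIP overhead) gives a negative ratio.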

5 thoughts on “Randomly Generated Input Stream”

    1. Hi Raja, the use of this stream is quite universal, but it becomes especially useful when testing systems that work with files. As my team has often seen in production code left to us by our predecessors, it is really common, even for a senior developer, to convert an output stream to an input stream using a byte array (hence everything gets stored in memory). This works for files of a few MB, but try it on a 5 GB video file and you will run into trouble soon. So you can use this stream, for example, for performance or smoke testing to see how your application handles large files, what the performance is and whether there are any memory leaks. To counter the problem with in-memory conversion of streams there is a sweet library, and I’ll post on that one in the following weeks.

      1. Hey there, I would really like to use your code, but there is no mention of a license anywhere.
        Could you add a single line for this so I can use your code without the fear of getting sued? 🙂
