Randomly Generated Input Stream

When writing tests, programmers often need to provide test files for their code to work with. This is typically done by committing them to version control or by exposing them over the network to be downloaded at runtime. However, the reasons for using a particular test file may differ greatly. The main reasons to include test files in your automated testing process are usually:

• Configuration
• Data transfer
• Test data

However, we don’t always need data that is logically structured or meaningful. Recently, I have been working on a project that involved the development of an enterprise-grade archiving system. We needed a way to provide a flexible number of test files for our unit and integration tests. These files were to be submitted via a REST API, and the tests validated system-wide processes and their footprint. To accomplish this, we searched the well-known libraries for a randomly generated stream but found nothing. This got us thinking, and we decided to implement the following stream to generate data for us, with a little twist.

As you can see, the RandomGeneratedInputStream constructor takes either one or three arguments. The one-argument constructor creates fully random data for test files. The three-argument constructor, however, lets the programmer select the strategy used to generate the stream content and control its degree of randomness. Thanks to this little tweak we are able to generate not only random test files, but also files that compress very efficiently, for example into a ZIP archive. Their small compressed size makes them cheap to transport over the network, and they put an even load on the compression handler. See the following snippet for details on how to use this stream and how to configure it properly:
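To make the idea concrete, here is a minimal sketch of such a stream. It is not the original RandomGeneratedInputStream: the class name `RandomStreamSketch`, the parameter names, and above all the meaning of the two strategies are assumptions, guessed from the result tables in this post. FIXED is assumed to emit a constant prefix of `blockSize` bytes followed by random bytes, and ITERATIVE is assumed to repeat each random byte `blockSize` times.

```java
import java.io.InputStream;
import java.util.Random;

/** Hypothetical sketch of a size-limited random stream; not the original class. */
class RandomStreamSketch extends InputStream {

    enum Strategy { FIXED, ITERATIVE }

    private final long size;         // total number of bytes the stream yields
    private final Strategy strategy; // how the content is generated
    private final int blockSize;     // the "degree of randomness" knob
    private final Random random = new Random();
    private long position = 0;       // bytes produced so far
    private int current = 0;         // byte currently being repeated (ITERATIVE)

    /** One-argument form: fully random content of the given size. */
    RandomStreamSketch(long size) {
        this(size, Strategy.FIXED, 0);
    }

    /** Three-argument form: size, generation strategy and block size. */
    RandomStreamSketch(long size, Strategy strategy, int blockSize) {
        if (size < 0 || blockSize < 0) {
            throw new IllegalArgumentException("size and blockSize must not be negative");
        }
        if (strategy == Strategy.ITERATIVE && blockSize < 1) {
            throw new IllegalArgumentException("ITERATIVE needs a blockSize of at least 1");
        }
        this.size = size;
        this.strategy = strategy;
        this.blockSize = blockSize;
    }

    @Override
    public int read() {
        if (position >= size) {
            return -1; // end of stream reached
        }
        int value;
        if (strategy == Strategy.FIXED) {
            // Assumption: the first blockSize bytes are constant, the rest is random.
            value = position < blockSize ? 0 : random.nextInt(256);
        } else {
            // Assumption: a new random byte is drawn every blockSize positions.
            if (position % blockSize == 0) {
                current = random.nextInt(256);
            }
            value = current;
        }
        position++;
        return value;
    }
}
```

Under these assumptions, `new RandomStreamSketch(20480)` would yield 20 480 random bytes, while `new RandomStreamSketch(20480, RandomStreamSketch.Strategy.FIXED, 10240)` would yield a stream whose first half is trivially compressible. Because the data is produced on demand, the stream never holds more than a few fields in memory, whatever the requested size.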

To back my claim, I wrote two simple unit tests that use RandomGeneratedInputStream to create 20 and 25 test files of 20 480 bytes each. The code then archives these files into ZIPs that can be injected into the relevant entities and used for testing the business logic. The tables below show the sizes and compression ratios from a single run of my tests. As expected, for the FIXED strategy both the sizes and the compression ratios scale evenly. Interestingly, the second table, with the results of the ITERATIVE strategy, shows ratios that behave differently from the previous case. Both tests were executed under JDK 1.7.0_21, and compression was done using the default NIO.2 mechanism for creating ZIP archives (to be described in the NIO.2 series soon).

Results of testing: FIXED strategy

| Stream size (B) | Block size (B) | Archive size (B) | Compression ratio (%) |
|---|---|---|---|
| 20 480 | 20 480 | 183 | 99.11 |
| 20 480 | 19 456 | 1 297 | 93.67 |
| 20 480 | 18 432 | 2 318 | 88.68 |
| 20 480 | 17 408 | 3 345 | 83.67 |
| 20 480 | 16 384 | 4 363 | 78.70 |
| 20 480 | 15 360 | 5 385 | 73.71 |
| 20 480 | 14 336 | 6 399 | 68.75 |
| 20 480 | 13 312 | 7 423 | 63.75 |
| 20 480 | 12 288 | 8 442 | 58.78 |
| 20 480 | 11 264 | 9 461 | 53.80 |
| 20 480 | 10 240 | 10 476 | 48.85 |
| 20 480 | 9 216 | 11 494 | 43.88 |
| 20 480 | 8 192 | 12 517 | 38.88 |
| 20 480 | 7 168 | 13 535 | 33.91 |
| 20 480 | 6 144 | 14 554 | 28.94 |
| 20 480 | 5 120 | 15 570 | 23.97 |
| 20 480 | 4 096 | 16 592 | 18.98 |
| 20 480 | 3 072 | 17 613 | 14.00 |
| 20 480 | 2 048 | 18 631 | 9.03 |
| 20 480 | 1 024 | 19 647 | 4.07 |
| 20 480 | 0 | 20 640 | -0.78 |

Results of testing: ITERATIVE strategy

| Stream size (B) | Block size (B) | Archive size (B) | Compression ratio (%) |
|---|---|---|---|
| 20 480 | 1 | 20 618 | -0.67 |
| 20 480 | 2 | 19 662 | 3.99 |
| 20 480 | 3 | 10 825 | 47.14 |
| 20 480 | 4 | 8 241 | 59.76 |
| 20 480 | 5 | 6 794 | 66.83 |
| 20 480 | 6 | 5 824 | 71.56 |
| 20 480 | 7 | 5 110 | 75.05 |
| 20 480 | 8 | 4 547 | 77.80 |
| 20 480 | 9 | 4 109 | 79.94 |
| 20 480 | 10 | 3 748 | 81.70 |
| 20 480 | 11 | 3 645 | 82.20 |
| 20 480 | 12 | 3 335 | 83.72 |
| 20 480 | 13 | 3 190 | 84.42 |
| 20 480 | 14 | 2 909 | 85.80 |
| 20 480 | 15 | 2 799 | 86.33 |
| 20 480 | 16 | 2 567 | 87.47 |
| 20 480 | 17 | 2 502 | 87.78 |
| 20 480 | 18 | 2 315 | 88.70 |
| 20 480 | 19 | 2 363 | 88.46 |
| 20 480 | 20 | 2 231 | 89.11 |
| 20 480 | 21 | 2 126 | 89.62 |
| 20 480 | 22 | 2 033 | 90.07 |
| 20 480 | 23 | 2 011 | 90.18 |
| 20 480 | 24 | 1 888 | 90.78 |
| 20 480 | 25 | 1 818 | 91.12 |
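The compression ratio in both tables is the space saved relative to the original stream size, that is (1 - archive size / stream size) × 100. The sketch below shows one way such a measurement could be taken. It deliberately swaps the NIO.2 ZIP mechanism used in the tests for `java.util.zip.ZipOutputStream` writing to memory, so the archive sizes it reports will differ slightly from the tables. The class name `ZipRatioSketch`, the method names, and the entry name are all hypothetical.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.util.zip.ZipEntry;
import java.util.zip.ZipOutputStream;

/** Hypothetical helper for measuring how well test data compresses into a ZIP. */
class ZipRatioSketch {

    /** Returns the size in bytes of a ZIP archive holding the data as a single entry. */
    static int zippedSize(byte[] data) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (ZipOutputStream zip = new ZipOutputStream(buffer)) {
            zip.putNextEntry(new ZipEntry("test-file.bin")); // entry name is arbitrary
            zip.write(data);
            zip.closeEntry();
        } catch (IOException e) {
            // Writing to an in-memory buffer is not expected to fail.
            throw new IllegalStateException(e);
        }
        return buffer.size();
    }

    /** Space saved relative to the original size, in percent (negative if the archive grew). */
    static double compressionRatio(long streamSize, long archiveSize) {
        return (1.0 - (double) archiveSize / streamSize) * 100.0;
    }
}
```

For example, `ZipRatioSketch.compressionRatio(20480, 183)` evaluates to roughly 99.11, which matches the first row of the FIXED table, and `compressionRatio(20480, 20640)` gives roughly -0.78, matching the last one. The negative values come from the ZIP headers and the fact that random data cannot be compressed, so the archive ends up slightly larger than its content.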

5 thoughts on “Randomly Generated Input Stream”

1. Raja says:

could you please give a real life scenario where we can use this java program

1. Jakub Staš says:

Hi Raja, the use of this stream is quite universal, but it becomes especially useful when testing systems that work with files. As my team has often seen in production code left to us by our predecessors, it is really common, even for a senior developer, to convert an output stream to an input stream using a byte array (so everything gets stored in memory). This works for files of a few MB, but try it on a 5 GB video file and you will soon run into trouble. So you can use this stream, for example, for performance or smoke testing to see how your application handles large files, what the performance is, and whether there are any memory leaks. To counter the problem of in-memory stream conversion there is a sweet library, and I’ll post about it in the following weeks.

1. Raja says:

Many thanks for the explanation

2. Markus Ressel says:

Hey there, I would really like to use your code, but there is no mention of a license anywhere.
Could you add a single line for this so I can use your code without the fear of getting sued? 🙂

1. jakub says:

No need to fear anything Markus. Feel free to use the code. Hope it helps you!