GDAL C++ API: How to create PNG or JPEG from scratch

I'm new to GIS and GDAL. My question is probably very basic, but I couldn't find an answer. Maybe I don't understand the GDAL ideology.

I need to create raster images from scratch, for example, JPEG or PNG. Their drivers don't support Create function - only CreateCopy. What is the common technique of new files creation in this case?

In principle, I can try to create a TIFF because its driver supports Create(). Next, I can use CreateCopy() for PNG or JPEG using this TIFF. But such a method looks indirect and unnatural to me. I also suppose that this procedure can be too memory hungry if the rasters are large.

I dealt with some image libraries before, they usually provide direct and simple way of bitmaps creation. Could somebody show me right direction for GDAL?

As user30184 said, in Python, the process would be creating a memory raster of the same dimensions (layers and layer extension), and executing the CreateCopy after that:

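from osgeo import gdal

mem_driver = gdal.GetDriverByName('MEM')
png_driver = gdal.GetDriverByName('PNG')
# PNG only accepts Byte or UInt16 data, so the memory raster uses GDT_Byte
ds = mem_driver.Create('', 255, 255, 1, gdal.GDT_Byte)
ds2 = png_driver.CreateCopy('/tmp/out.png', ds)
# release the copy so the PNG is flushed to disk
ds2 = None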

I'm using GDAL Version 1.11 in C++, that said my answer may not be relevant depending on the version you are using.

The Create() method is not supported by the PNG driver (I'm not familiar with the JPEG driver), so you must first create a GDALDataset with a different driver and then use the CreateCopy method from the PNG driver to get your PNG image. Also, the PNG driver only supports the GDALDataType values GDT_UInt16 and GDT_Byte, so the source dataset you are copying must be one of those types.

Here's a little example code…

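#include "gdal_priv.h"

GDALAllRegister();
// call GetGDALDriverManager() to set up the two different drivers
GDALDriver *pDriverTiff = GetGDALDriverManager()->GetDriverByName("GTiff");
GDALDriver *pDriverPng = GetGDALDriverManager()->GetDriverByName("PNG");
// create (or open) the source dataset
GDALDataset *pSourceDS = pDriverTiff->Create(datasetPath, 300, 300, 1, GDT_Byte, NULL);
// use the PNG driver to copy the source dataset
GDALDataset *pPngDS = pDriverPng->CreateCopy(pngPath, pSourceDS, FALSE, NULL, NULL, NULL);
// close both datasets so the PNG is written out
GDALClose(pPngDS);
GDALClose(pSourceDS);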

Using my toojpeg library

  1. include the toojpeg.h header file
  2. define a callback that accepts a single compressed byte
    • will be called for every byte
    • basically similar to fputc
    • ... but can be anything you like, too, e.g. send that byte via network, add it to an std::vector, ...
  3. call TooJpeg::writeJpeg()
#include "toojpeg.h"

// create your RGB image
auto pixels = new unsigned char[1024 * 768 * 3];
// set pixels as you like
// ...

// define a callback that accepts a single byte
void writeByte(unsigned char oneByte) { fputc(oneByte, output); }
// ...

// and start compression !
bool ok = TooJpeg::writeJpeg(writeByte, pixels, width, height);
// actually there are some optional parameters, too
//bool ok = TooJpeg::writeJpeg(writeByte, pixels, width, height, isRGB, quality, downSample, comment);

example.cpp:

// //////////////////////////////////////////////////////////
// how to use TooJpeg: creating a JPEG file
// see
// compile: g++ example.cpp toojpeg.cpp -o example -std=c++11

#include "toojpeg.h"

// //////////////////////////////////////////////////////////
// use a C++ file stream
#include <fstream>

// output file
std::ofstream myFile("example.jpg", std::ios_base::out | std::ios_base::binary);

// write a single byte compressed by TooJpeg
void myOutput(unsigned char byte)
{
  myFile << byte;
}

// //////////////////////////////////////////////////////////
int main()
{
  // 800x600 image
  const auto width  = 800;
  const auto height = 600;
  // RGB: one byte each for red, green, blue
  const auto bytesPerPixel = 3;

  // allocate memory
  auto image = new unsigned char[width * height * bytesPerPixel];

  // create a nice color transition (replace with your code)
  for (auto y = 0; y < height; y++)
    for (auto x = 0; x < width; x++)
    {
      // memory location of current pixel
      auto offset = (y * width + x) * bytesPerPixel;

      // red and green fade from 0 to 255, blue is always 127
      image[offset    ] = 255 * x / width;
      image[offset + 1] = 255 * y / height;
      image[offset + 2] = 127;
    }

  // start JPEG compression
  // note: myOutput is the function defined above, it saves the output in example.jpg
  // optional parameters:
  const bool isRGB      = true;  // true = RGB image, else false = grayscale
  const auto quality    = 90;    // compression quality: 0 = worst, 100 = best, 80 to 90 are most often used
  const bool downsample = false; // false = save as YCbCr444 JPEG (better quality), true = YCbCr420 (smaller file)
  const char* comment = "TooJpeg example image"; // arbitrary JPEG comment
  auto ok = TooJpeg::writeJpeg(myOutput, image, width, height, isRGB, quality, downsample, comment);

  delete[] image;

  // error => exit code 1
  return ok ? 0 : 1;
}

The same example, but this time for grayscale images:

example-gray.cpp:

// //////////////////////////////////////////////////////////
// how to use TooJpeg: creating a JPEG file
// see
// compile: g++ example.cpp toojpeg.cpp -o example -std=c++11

#include "toojpeg.h"

// //////////////////////////////////////////////////////////
// use a C++ file stream
#include <fstream>

// output file
const char* filename = "example-gray.jpg";
std::ofstream myFile(filename, std::ios_base::out | std::ios_base::binary);

// write a single byte compressed by TooJpeg
void myOutput(unsigned char byte)
{
  myFile << byte;
}

// //////////////////////////////////////////////////////////
int main()
{
  // 800x600 image
  const auto width  = 800;
  const auto height = 600;
  // Grayscale: one byte per pixel
  const auto bytesPerPixel = 1;

  // allocate memory
  auto image = new unsigned char[width * height * bytesPerPixel];

  // create a nice color transition (replace with your code)
  for (auto y = 0; y < height; y++)
    for (auto x = 0; x < width; x++)
    {
      // memory location of current pixel
      auto offset = (y * width + x) * bytesPerPixel;

      // red and green fade from 0 to 255, the gray value is their average
      auto red   = 255 * x / width;
      auto green = 255 * y / height;
      image[offset] = (red + green) / 2;
    }

  // start JPEG compression
  // note: myOutput is the function defined above, it saves the output in example-gray.jpg
  // optional parameters:
  const bool isRGB      = false; // true = RGB image, else false = grayscale
  const auto quality    = 90;    // compression quality: 0 = worst, 100 = best, 80 to 90 are most often used
  const bool downsample = false; // false = save as YCbCr444 JPEG (better quality), true = YCbCr420 (smaller file)
  const char* comment = "TooJpeg example image"; // arbitrary JPEG comment
  auto ok = TooJpeg::writeJpeg(myOutput, image, width, height, isRGB, quality, downsample, comment);

  delete[] image;

  // error => exit code 1
  return ok ? 0 : 1;
}


Meta Raster Format (MRF) User Guide

MRF: Definition, Context and Introduction

MRF, short for Meta Raster Format, is a technology that combines raster storage with tile web services and cloud computing. While the main target domain is cloud GIS, the MRF technology can also be used in other areas such as medical imaging and scientific data processing. An MRF can act as:

  • A raster storage format
  • A tile cache format for web services
  • A dynamic tile cache for another raster

For the purpose of this document, a raster is defined as a two dimensional array of values. In information technology, a raster most commonly represents an image, in which case the array elements are known as pixels, short for picture elements. An image can be grayscale or color. In the latter case multiple values are associated with each pixel, usually one for each of the red, green and blue components. In scientific applications, rasters are commonly used to represent sampled scalar fields or matrices, where each array value is the numeric value of the scalar field at a specific point. In geographic information science (GIS), a raster can be either a map image or an array of values. A raster is a very compact and efficient way of storing uniformly sampled data, since the set of coordinates does not have to be stored with each of the data values. Instead, the coordinates are calculated by knowing the raster projection, one or more reference points and resolution.

There are many raster formats in use today. Most of them have started as image formats intended for disk storage or archiving. JPEG, PNG and TIFF are some of the best known examples. As image formats, they usually support grayscale and color images of reasonable size, and employ various compression algorithms for reducing the amount of storage needed. Most of these formats have been designed before the advent of the internet, and have continued to be used since they serve their purpose very well. Yet these formats have significant limitations, for example when used to store extremely large images, or non-image data.

MRF was designed to leverage existing image formats while addressing some of their shortcomings, without adding unnecessary complexities. In the simplest form, MRF explicitly provides tiling, spatial indexing and multiple resolutions (aka overviews, pyramid, or resolution-sets) support. This is an extremely common approach in GIS, allowing data for a specific area to be read without having to read the complete raster. It also allows for raster sizes well beyond what is feasible with traditional image formats. Since the tiles in an MRF file may themselves be stored in a raster format, the MRF is suitable as the tile storage format for web services. MRF can also be used as a cloud raster cache format, to improve the performance of web applications. MRF segregates the data, index and metadata in different files, which allows different classes of storage to be used for the different components as needed, enhancing efficiency, even on a single system.

There are of course other technologies that try to address the same areas. One example is the naive approach of leaving the image tiles in folders and imposing a known folder and file naming strategy. This has the advantage that no special tools or applications are needed to explore and curate the larger dataset. However, this approach is somewhat fragile and does not scale as well as it seems at first glance, since the file and operating system overhead increases significantly with dataset size. A slightly better approach is to use a database for storing the tiles. In general a database has less overhead than a file system and thus scales slightly better than the files in folders approach. The disadvantage is that a full database engine is needed, while most of the database functionality (tables, queries, transactions) is not useful or applicable to raster tile storage. In addition, databases expect and are optimized for smaller records than the normal raster tile size. The two dimensional grid intrinsic to a raster is not a common database construct, and tools for populating a database of rasters are scarce, non-standard or have to be written from scratch.

MRF takes the middle road between these two approaches. It provides excellent scalability by providing only the needed parts of the database functionality. It does not rely heavily on the file system for tile management. It acts as a raster itself, so it can be read and written using raster aware applications. Performance and scalability have been the main design goals for the MRF, closely followed by simplicity, usability and flexibility. The MRF is implemented as a GDAL (Geospatial Data Abstraction Library) driver, which allows the MRF to be immediately leveraged in many GIS applications and provides access to well documented tools and workflows. As with most technologies, understanding the features and limitations of MRF is important if good results are to be expected. This document contains the detailed MRF documentation.

An MRF dataset has three components, metadata, index and data. While normally each component is stored in a separate file, alternative configurations exist.

  • The metadata contains high level information about the raster itself. It is stored as an XML formatted file, which improves readability and extensibility. The metadata file is the starting point in any operation on an MRF dataset. In GDAL, the XML content of this file can be used instead of a file name, so in some cases the metadata component can be just a text string. The metadata file uses the .mrf extension by convention, but any other file extension can also be used.
  • The index is concerned with the two dimensional organization of the raster tiles on a grid. It contains one or more two dimensional arrays of records, where each record holds the size and offset of a raster tile. The index file size is proportional to the number of tiles that may reside in an MRF. The organization of the index file depends on which MRF features are being used, but for a single raster the records are stored in a top-left aligned, row major array. The index file name is by default the same as the metadata file name with the .idx extension.
  • The data component contains the raster tiles forming the MRF, which themselves contain data values for each pixel. As opposed to the index, there is no guaranteed order of the data tiles within the data file. The data file is modified only by appending at the end of the file, so all existing content will continue to take space on disk, even if it was replaced and is no longer accessible via the MRF driver.

Note that there is no redundancy of information: none of the components contains any information which exists in a different component of the same MRF. All three components are required for accessing the MRF content. It is not usually possible to fully recover the dataset from only one or two of the components.
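The row major index layout described above can be sketched in a few lines of Python. This is only an illustration: the 16 byte record (a big-endian 64 bit offset followed by a 64 bit size) is an assumption that the text above does not state, and the function names are made up for this example.

```python
import struct

# Assumption (not stated in the text above): each index record is 16 bytes,
# a big-endian 64-bit offset followed by a big-endian 64-bit size.
RECORD = struct.Struct(">QQ")

def tile_grid(width, height, tile_w, tile_h):
    """Number of tile columns and rows needed to cover the raster."""
    cols = -(-width // tile_w)    # ceiling division
    rows = -(-height // tile_h)
    return cols, rows

def record_position(col, row, cols):
    """Byte position of a tile record in a top-left aligned, row major index."""
    return (row * cols + col) * RECORD.size

cols, rows = tile_grid(2000, 1000, 512, 512)
index_bytes = cols * rows * RECORD.size
second_row_second_tile = record_position(1, 1, cols)
```

Under these assumptions the index size grows with the tile count only, independent of how much data has actually been written.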

The normal way to reference a specific MRF raster is to use the metadata file name. The metadata file can have any extension, because the file format detection in GDAL is done by matching the first ten characters in the file, which have the value "<MRF_META>". For example, a GDAL command that is given test.mrf as its input dataset will work if test.mrf is an MRF metadata file.

Another way to reference an MRF is to use the XML content of the metadata as a string. In this case the data and index file names have to be explicitly identified in the metadata string itself, since it is not possible to derive them based on the metadata file name. When this method is used from the command line, shell special characters have to be escaped, so that a correctly formed XML string is passed to the GDAL open command.
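The signature based detection mentioned above can be imitated outside of GDAL with a simple check of the first ten bytes. The sketch below is not GDAL code; the helper name and the throwaway file are hypothetical and only show that the extension plays no role.

```python
import os
import tempfile

MRF_SIGNATURE = b"<MRF_META>"  # the ten characters named in the text

def looks_like_mrf(path):
    """Return True if the file starts with the MRF metadata signature."""
    with open(path, "rb") as f:
        return f.read(len(MRF_SIGNATURE)) == MRF_SIGNATURE

# Demonstration with a throwaway file that uses an unrelated extension
with tempfile.NamedTemporaryFile(suffix=".xml", delete=False) as tmp:
    tmp.write(b"<MRF_META><Raster/></MRF_META>")
    demo_path = tmp.name

is_mrf = looks_like_mrf(demo_path)
os.remove(demo_path)
```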

When an MRF is created, all three component files are usually created on disk in the same folder. One of the MRF format features is that the dataset can be read as soon as the files are created, even before any data is actually written. It is also possible to read from an MRF file while it is being written to. Regions of an MRF that have not been written to automatically return the NoData value if the NoData value is defined for the dataset, or zero otherwise. This is also true for the overviews: data can be read as soon as the MRF file is flagged as containing overviews.

An MRF can have either no overviews or all the overviews for a specific overview scale progression, until the whole raster fits into a single tile. There is no MRF support for individual overviews, so it is not possible to only have a selected few! If the overviews have not been populated with data, they will return the NoData value or zero.
The MRF driver contains optimized code to generate overviews using averaging or nearest value interpolation, for overview scales that are powers of 2. The generic GDAL overview generation code can also be used, in which case overviews with various resampling methods or at other scale factors (3, 4 …) can be generated. The same rule applies: all levels have to exist until the whole raster fits in one tile. If the MRF is not already marked as having overviews, the scale between overviews will be the first value passed to the gdaladdo utility. The first overview to be generated and populated has to be the largest one. It is also mandatory to generate all necessary overviews in sequence, since they are generated recursively, each from the previous one. Usually the list of levels passed to gdaladdo should be all the needed powers of the scale factor, like this:

gdaladdo -r avg Test.mrf 2 4 8 16 32 64 128
Or for powers of 3:
gdaladdo Test.mrf 3 9 27 81

For convenience, unnecessary levels (the large values) generate a warning but not an error. It is not possible to change MRF overviews from one scale factor to another. It is however possible to generate the overviews multiple times, which will not reclaim the space used by the older overviews.
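The list of levels to pass to gdaladdo follows directly from the rule that overviews must continue until the whole raster fits in one tile. Here is a minimal sketch, assuming square tiles and a level size that is rounded up at each step; the raster size and the 512 pixel tile size are made-up example values.

```python
def overview_factors(width, height, tile_size, scale=2):
    """Scale factors to pass to gdaladdo so the last level fits in one tile."""
    factors = []
    factor = 1
    w, h = width, height
    while w > tile_size or h > tile_size:
        factor *= scale
        # each level is the previous one reduced by `scale`, rounded up
        w = -(-w // scale)
        h = -(-h // scale)
        factors.append(factor)
    return factors

levels = overview_factors(40000, 20000, 512)
command = "gdaladdo -r avg Test.mrf " + " ".join(map(str, levels))
levels_scale_3 = overview_factors(40000, 20000, 512, scale=3)
```

Since unnecessary large levels only produce a warning, passing a slightly longer list than this function returns is harmless.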

While not recommended, it is possible to generate MRF external overviews in GDAL, which are usually not in MRF format and are not subject to the MRF limitations.

The MRF driver contains its own optimized resampling code, using either averaging or nearest neighbor algorithms. The internal code has less overhead than the GDAL averaging and is usually faster. It is also optimized for MRFs with large areas of NoData. Use -r avg or -r nnb as the sampling option to gdaladdo with 2 as a scale factor to trigger the use of the MRF specific overview generation. Only scale 2 works for the internal sampler! The MRF sampler pads to the right and bottom of the image when needed, keeping the scale factor an exact 2. In contrast, the GDAL sampler stretches the input when needed by repeating rows and/or columns, which keeps the bounding box for all the overviews identical but the ratio between two successive overviews may not be exactly 2. Since for the internal resampler the scale factor is exactly 2, the avg algorithm can also be considered a bilinear interpolation. Both the avg and nnb samplers take the NoData value into account.
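The idea of a NoData aware 2x2 average with padding on the right and bottom can be illustrated as follows. This is not the driver's code, only a sketch of the concept; treating padded positions like NoData pixels is an assumption made for this example.

```python
def average_overview(grid, nodata):
    """Halve a 2D list by averaging the valid pixels of each 2x2 block."""
    rows, cols = len(grid), len(grid[0])
    out = []
    for r in range(0, rows, 2):
        out_row = []
        for c in range(0, cols, 2):
            block = [
                grid[rr][cc]
                for rr in (r, r + 1)
                for cc in (c, c + 1)
                # positions past the right/bottom edge count as padding
                if rr < rows and cc < cols and grid[rr][cc] != nodata
            ]
            out_row.append(sum(block) / len(block) if block else nodata)
        out.append(out_row)
    return out

source = [
    [10, 20, 30],
    [30, 40, 0],
    [0, 0, 50],
]
overview = average_overview(source, nodata=0)
```

A 3x3 input produces a 2x2 output, so the scale factor stays exactly 2 and blocks that hold only NoData remain NoData.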

Note that GDAL up to version 1.11 used an incorrect step when generating overviews. This bug results in inefficient execution, larger than necessary file sizes and sometimes visible artifacts. This problem has been addressed and should not affect future versions of GDAL. Also, use -r average to use the GDAL average interpolation and -r near to select the GDAL nearest neighbor one. As described above, the results will differ slightly from the MRF internal sampler, due to the different padding. For the internal sampler, the progress indicator is per generated level.

GDAL resampling takes into consideration both the NoData value and the alpha band when it exists, setting to zero pixels where the alpha band is zero. To force GDAL to preserve the data values even for pixels where the alpha value is zero, set the MRF create option PHOTOMETRIC=MULTISPECTRAL. The downside of this workaround is that it will set the photometric interpretation of all bands to unknown, which may create other problems. The MRF avg and nnb resampling methods are not subject to this behavior; they keep the data values even if the alpha band is zero.

Reading a single overview

In case of an MRF file with overviews, it is possible to open a single specific overview level, usually to check the overview in isolation. The overviews are identified by their numeral and not by the relative scale, with 0 being the largest overview. The syntax used for this is <filename>:MRF:L<n>

For example, using the dataset name Test.mrf:MRF:L0 in a GDAL command will explicitly open the first overview level.

Inserting data in an existing MRF

Using an MRF specific utility mrf_insert , it is possible to replace or modify a part of an MRF and generate only the affected portions of the overviews. This facility makes it possible to build very large datasets efficiently, operating on small areas at a time. This functionality relies on the internal MRF resampling, so it will only work with avg or nnb resampling mode and powers of two between levels.
Set the create option APPEND_SUBDATASET to true to avoid deleting the MRF header file.
Since a Caching or Cloning MRF may be used at the same time by different processes, the MRF driver contains code that allows it to be written by multiple processes on the same machine safely, as long as the MRF resides on a local disk. This feature might be useful for other types of MRF, for example when mrf_insert is used to update different areas of the same file, or when multiple third dimension MRF Z-Slices can be written to at the same time. To turn on this feature, manually add a boolean attribute called mp_safe with the value on to the Raster node of the MRF metadata. This feature is not on by default since it slows down the write operations somewhat. This feature has been tested on Windows and Linux, and it may fail on specific operating and file system implementations. It does not work on shared, network file systems like CIFS and NFS, because these file systems do not implement the file append mode correctly.
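Adding the mp_safe attribute by hand amounts to a small XML edit of the metadata file. The sketch below uses the Python standard library on a heavily trimmed, hypothetical metadata string; a real .mrf file contains more nodes, and the helper name is made up for this example.

```python
import xml.etree.ElementTree as ET

# Hypothetical, heavily trimmed metadata; a real .mrf file holds more nodes.
metadata = '<MRF_META><Raster><Size x="4096" y="4096" c="1"/></Raster></MRF_META>'

def enable_mp_safe(xml_text):
    """Return the metadata with mp_safe="on" set on the Raster node."""
    root = ET.fromstring(xml_text)
    raster = root.find("Raster")
    if raster is None:
        raise ValueError("no Raster node in MRF metadata")
    raster.set("mp_safe", "on")
    return ET.tostring(root, encoding="unicode")

updated = enable_mp_safe(metadata)
```

The output still starts with the "<MRF_META>" signature, so it remains recognizable as MRF metadata.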

Types of tile compressions supported by MRF

Tiles in an MRF are stored using one of the multiple supported packing or compression formats. Some of the formats are themselves standard raster formats like JPEG, TIFF or PNG, while others are only compression formats. The choice of the tile format is passed to the MRF driver using the GDAL create option COMPRESS .

As the name suggests, the NONE format directly stores the tile array, in a row major pixel order. PIXEL and BAND interleave modes are supported, as well as all the GDAL supported data types. The NONE format has no other options or features, and all the common MRF functionality applies. If a NoData value is defined per band, tiles consisting only of NoData values are not stored on disk. If the NoData value is not defined, tiles which only contain zeros are not stored. As with any other tile format, the MRF does not guarantee any specific order of the tiles in the data file.

DEFLATE is a well known generic compression algorithm, implemented in the open source zlib library. In MRF it can be used in two ways, as a stand-alone tile packing mechanism and also as a second compression step applied on top of other compression formats. The second use is activated by adding DEFLATE:on to the free form list OPTIONS. NONE compression with the DEFLATE:on option is equivalent to DEFLATE as the compression format, even though the content of the metadata file is different. An MRF created with COMPRESS=DEFLATE and one created with COMPRESS=NONE plus OPTIONS="DEFLATE:on" should therefore have data files of identical size, although the tile order may differ.

The zlib compression level is calculated from the QUALITY setting as level = floor(QUALITY/10). The default level is 8, which gives very good compression, albeit slowly. A QUALITY setting of 60 is recommended as a tradeoff between compression speed and size. A level of zero, corresponding to QUALITY values under 10, means no compression.
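
The mapping from QUALITY to zlib level can be tried out with Python's zlib module. This is a sketch rather than the driver's code: the clamp to zlib's 0 to 9 range for a QUALITY of 100 is an assumption, and the tile content is a made-up stand-in.

```python
import zlib

def zlib_level(quality):
    """level = floor(QUALITY / 10), as described in the text."""
    # Clamping to zlib's 0..9 range is an assumption made for QUALITY=100.
    return max(0, min(9, quality // 10))

tile = bytes(range(256)) * 64  # stand-in for a 16 KiB raw tile
sizes = {q: len(zlib.compress(tile, zlib_level(q))) for q in (5, 60, 85)}
```

With a QUALITY under 10 the tile is stored uncompressed, so the output is slightly larger than the input because of the stream framing.
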
The DEFLATE compression can use different tile headers. The default should be used in general, since the speed and size difference between these options is insignificant. By default, zlib compatible tile headers are generated. Gzip or no tile headers can be used instead, by setting the boolean free form OPTIONS GZ and RAWZ. If both are set, the headers will be gzip. The zlib header is 6 bytes and includes a checksum calculated with the zlib specific ADLER32 algorithm. The gzip header is slightly larger and uses a CRC32 as a checksum, which is very slightly slower than the zlib one. Raw deflate has neither a checksum nor a header and is slightly faster than either the gzip or zlib headers.

The following command will generate an MRF in which every tile is a gzip stream:

gdal_translate -of MRF -co COMPRESS=DEFLATE -co OPTIONS="GZ:on" input.tif gzipped.mrf
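The three kinds of tile framing can be compared with Python's zlib module, which selects zlib, gzip or raw deflate framing through the wbits argument. This sketch only mirrors the framing choices; the pairing of wbits values with the GZ and RAWZ options is an analogy and not taken from the driver.

```python
import zlib

def deflate(data, wbits, level=6):
    """Compress data with the framing selected by wbits."""
    comp = zlib.compressobj(level, zlib.DEFLATED, wbits)
    return comp.compress(data) + comp.flush()

tile = b"MRF tile payload " * 200
zlib_stream = deflate(tile, 15)   # zlib framing with ADLER32 (the default)
gzip_stream = deflate(tile, 31)   # gzip framing with CRC32, as with GZ:on
raw_stream = deflate(tile, -15)   # raw deflate, as with the RAWZ option

overhead_zlib = len(zlib_stream) - len(raw_stream)
overhead_gzip = len(gzip_stream) - len(raw_stream)
```

The compressed payload is the same in all three cases, so the size difference is only the fixed framing overhead.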

Zlib also supports slightly different compression strategies, and MRF can control these strategies. The compression speed and the size of the output will change significantly if these options are used. These options only affect the compression algorithm, so the generated tiles can always be decompressed. For exact details on the strategy flags refer to the zlib documentation. The free form option to use is Z_STRATEGY, and the valid values are:

  • Z_FILTERED : Skips the optional filtering of the input stream
  • Z_HUFFMAN_ONLY : Only the Huffman encoding part of DEFLATE is performed
  • Z_RLE : Somewhat like an RLE, within the limits of DEFLATE
  • Z_FIXED : Fixed Huffman tables

Example which will generate an RLE compressed tile with gzip style headers:
gdal_translate -of MRF -co COMPRESS=DEFLATE -co OPTIONS="GZ:on Z_STRATEGY:Z_RLE" input.tif gzipped.mrf
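That the strategy only changes the compressor, and never the ability to decompress, can be checked with Python's zlib module, which exposes the same four strategy constants. The tile content below is a made-up stand-in and the helper name is hypothetical.

```python
import zlib

STRATEGIES = {
    "Z_FILTERED": zlib.Z_FILTERED,
    "Z_HUFFMAN_ONLY": zlib.Z_HUFFMAN_ONLY,
    "Z_RLE": zlib.Z_RLE,
    "Z_FIXED": zlib.Z_FIXED,
}

def pack(data, strategy, level=6):
    """Compress data with zlib framing and the given strategy."""
    comp = zlib.compressobj(level=level, strategy=strategy)
    return comp.compress(data) + comp.flush()

# stand-in tile: long runs of zeros followed by a short ramp
tile = bytes([0] * 900 + list(range(100))) * 16
packed = {name: pack(tile, flag) for name, flag in STRATEGIES.items()}
restored = {name: zlib.decompress(blob) for name, blob in packed.items()}
```

Every stream decompresses with the plain zlib.decompress call, regardless of the strategy used to create it; only the compressed sizes differ.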

PNG is a lossless compression image format which uses a filter plus the DEFLATE algorithm internally. PNG is currently the default compression mechanism for MRF. PNG generation is slower than DEFLATE, but results in smaller data files which are also suitable as tiled web services. PPNG is an MRF specific compression name, it stands for Palette PNG. While both types can have an MRF level palette, PPNG also stores the palette inside each and every PNG tile. This mode should only be used if the individual tiles are to be served over the web as colorized images, otherwise the regular PNG compression results in smaller data files. The PNG format itself supports up to sixteen bit unsigned integer data types. However, the MRF driver can treat a 16 bit PNG as containing either unsigned or signed data type (Int16), in which case the values stored in the PNG are interpreted as signed. The QUALITY setting controls the DEFLATE stage of the PNG, with the same behavior as the ones described in the DEFLATE compression. Similarly, the Z_STRATEGY band option controls the DEFLATE stage of PNG. Choosing Z_RLE or Z_HUFFMAN_ONLY as strategies will result in much faster compression at the expense of size, Z_HUFFMAN_ONLY being the fastest. Z_FIXED and Z_FILTERED have much less effect. The effect of the strategy setting is much stronger than the QUALITY value setting.
Example of gdal_translate to MRF/PNG:
gdal_translate -of MRF -co COMPRESS=PNG -co OPTIONS="Z_STRATEGY:Z_RLE" -co QUALITY=50 input.tif output.mrf

The JPEG compression is a well known lossy image compression, tuned for good visual quality combined with good compression. Since JPEG is a well known format, the MRF tiles compressed as JPEG are suitable for serving as web tiles. Depending on how the GDAL MRF was built, the MRF/JPEG format can handle 8 and sometimes 12 bit data. The 12 bit option is only available when the GDAL internal libJPEG is used and the GDAL 12 bit JPEG is enabled. MRF/JPEG can handle up to 10 bands in pixel interleave mode. Note that only 8 bit JPEGs with 1 or 3 bands are suitable for web tile services in most cases. The MRF QUALITY output option value is directly passed to the JPEG library as the Q factor, with the default value being 85. Note that the JPEG Q value does control the output quality and size, but it is not linear. For the exact interpretation of Q, please consult the JPEG documentation. Values between 0 and 100 are supported, the reasonable range being between 60 and 85, with larger values producing visually better results at the cost of increased size. For three bands interleaved, a couple of encoding options are available, controlled via the PHOTOMETRIC setting. The default setting should be used most of the time.

The valid choices for the PHOTOMETRIC setting are:

  • DEFAULT: JPEG uses YCbCr, 4:2:0 sampling internally. This provides good compression and visual quality. The color space has significantly lower quality than the brightness, which roughly matches the human vision characteristics.
  • YCC : Compressed as YCbCr with 4:4:4 sampling, i.e. the color space is not spatially resampled. This setting produces tiles about a third larger than the default, tiles which have fewer color artifacts. The color conversion itself still results in a loss of information, as does the quantization.
  • RGB : Compressed as RGB, not color converted and not spatially resampled. This setting produces much larger JPEG files, usually twice as large or more. Files are about three times larger than with the default setting. MRF with this setting can be decoded and re-encoded multiple times at the same quality without any data quality degradation.
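
The loss caused by the color conversion alone, before any JPEG quantization, can be seen with a short sketch. It uses the commonly published JFIF full range YCbCr coefficients with 8 bit rounding; this is an assumption about the conversion and is not taken from the MRF or libjpeg code.

```python
def clamp(v):
    """Round to the nearest integer and clamp to the 8 bit range."""
    return max(0, min(255, int(v + 0.5)))

def rgb_to_ycbcr(r, g, b):
    y = 0.299 * r + 0.587 * g + 0.114 * b
    cb = 128 - 0.168736 * r - 0.331264 * g + 0.5 * b
    cr = 128 + 0.5 * r - 0.418688 * g - 0.081312 * b
    return clamp(y), clamp(cb), clamp(cr)

def ycbcr_to_rgb(y, cb, cr):
    r = y + 1.402 * (cr - 128)
    g = y - 0.344136 * (cb - 128) - 0.714136 * (cr - 128)
    b = y + 1.772 * (cb - 128)
    return clamp(r), clamp(g), clamp(b)

red = (255, 0, 0)
round_trip = ycbcr_to_rgb(*rgb_to_ycbcr(*red))
worst = max(abs(a - b) for a, b in zip(red, round_trip))
```

Pure red does not survive the round trip exactly, which is why only the RGB setting avoids this particular source of degradation.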

Optimizing the Huffman encoding tables for each tile, as opposed to using the default value can be enabled by having the "OPTIMIZE=ON" in the OPTIONS list. Choosing this will increase encoding time and reduce the tile size slightly, both are relatively small changes in most cases.
To use the 12 bit JPEG, when available, set the data type to Int16 or UInt16.

While commonly referred to as a JPEG file, the format normally used to store JPEG compressed images is actually JFIF. A newer format named brunsli exists, which can be losslessly converted back and forth to JFIF. Brunsli has the advantage that it can store the same information as the JFIF in a smaller package, usually around 22% smaller. Since brunsli is just a better packaging of a JPEG, the result is still JPEG compressed and the raster has exactly the same characteristics and limitations. Brunsli supports all the standard JFIF/JPEG features, with the notable exception of 12 bit per sample JPEGs.
Using the brunsli format does have a small negative effect on the speed of reading and writing the data when compared with the JFIF format, because brunsli adds a codec stage. Both the read and write are still fast compared with algorithms like DEFLATE or PNG. When GDAL and MRF are compiled with brunsli support and JPEG compression is selected, the extra compression is very beneficial, so MRF will store the data in the brunsli format when possible. In some cases it is useful to force the older format, JFIF, to be used, for example when the tiles are to be directly served over the web to a browser or when a legacy GDAL application compiled without brunsli support may be used to read the data. The OPTIONS flag JFIF can be set in those cases, forcing MRF to only generate JFIF compatible tiles:
gdal_translate -of MRF -co COMPRESS=JPEG -co OPTIONS=JFIF:1 input.tif output.mrf

JPEG Zero Enhanced (Zen) Extension

The JPEG tiles generated by MRF contain a mask of zero value pixels stored in a JPEG Zen chunk, using the APP3 "Zen" tag. If the size of the Zen chunk is zero, all pixels within the respective tile are known to be non-zero. When reading a JPEG that contains a Zen chunk, the MRF driver will ensure that the pixel positions that contain zero match the mask. In essence, the pixels that contain zero are stored in a lossless way and can be used as a data mask, when read with the MRF driver. This eliminates the JPEG edge artifacts when the background is black, enabling a Zen JPEG encoded MRF to be used as an overlay on top of other data, as long as black is made transparent. Using MRF/JPEG for storing visual data can produce significant space savings over the next best option, which would generally be lossless PNG or LERC. Since the Zen chunk is built in accordance with the JFIF standard, the mask will be ignored by legacy applications, which will only decode the JPEG image content. Since the mask is generated and consumed at the MRF level, it is not visible to GDAL. This feature works with either 8 or 12 bit JPEG tiles, and works even when the brunsli tile format is used.

The Zen bitmask is organized in an 8x8 2D bitmask, which is then compressed by run length encoding (RLE). For most inputs, the size of the Zen chunk containing the mask is negligible. The potential benefit of being able to treat black as transparent outweighs this size increase, thus this feature cannot be turned off.

The JPNG compression is a combination of PNG and JPEG tiles, depending on the presence of non-opaque pixels. If all the pixels within a tile are opaque the tile is stored as JPEG, otherwise it is stored as PNG with an Alpha channel. It is presented to GDAL as either a Luma-Alpha or RGBA image, so it will always have 2 or 4 bands and is always PIXEL interleaved. Most of the MRF options from PNG and from JPEG compression still apply, including the JFIF flag. The data file will be smaller than when using only PNG, if there are tiles that are fully opaque and can be stored as JPEG. Note that depending on the options used and the input data, the transition from PNG to JPEG might be visible. The normal JPEG with Zen mask should be used in most cases, except if 0 is not to be transparent or when gradual transparency is needed. Another advantage over MRF/JPEG/Zen is that legacy clients such as web browser applications do not usually need modification to be able to display the tiles as intended.
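The per tile decision described above is simple enough to state as a function. This sketch assumes an 8 bit alpha band in which 255 means fully opaque; the function name is made up for this example.

```python
def jpng_tile_format(alpha_values):
    """Pick the tile encoding from the alpha band values of one tile."""
    # Assumption: 8 bit alpha, 255 = fully opaque
    return "JPEG" if all(a == 255 for a in alpha_values) else "PNG"

opaque_tile = [255] * 16
edge_tile = [255] * 12 + [0] * 4
formats = [jpng_tile_format(opaque_tile), jpng_tile_format(edge_tile)]
```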

In the MRF with TIFF compression, every tile is a TIFF raster which uses the lossless LZW internal compression. Most data types are supported. Note that the tiles are not GeoTiffs, they do not contain geotags. This compression is mostly useful for web services for certain clients which support decoding TIFF.

Limited Error Raster Compression (LERC) is an original Esri raster compression format. The main benefit of using LERC is extremely fast operation when compared with PNG, DEFLATE and even with JPEG, as well as excellent compression for data types larger than eight bit. The LERC compression can be either lossy or lossless. The lossy part is due to an initial quantization stage, which is controlled by the LERC maximum error value (LERC_PREC), which is a floating point number. LERC may alter the values stored, but the change is always less than or equal to this LERC maximum error value. The quanta or precision of the output data values will thus be twice the LERC_PREC value. If the LERC maximum error is zero or too small for any space savings to be obtained by quantization, the input data values are not modified, and LERC becomes a lossless format. LERC supports an explicit data mask, which in MRF is enabled when the NoData value is defined. The NoData values are not stored in the compressed tile, which makes LERC a good choice for storing sparse data. In MRF, for integer types the default LERC_PREC value is 0.5, corresponding to lossless compression. For floating point types the LERC_PREC defaults to 0.001 (.002 data resolution). The compression achieved by LERC heavily depends on the LERC_PREC value, which should be carefully selected for each particular dataset.
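The relation between the maximum error and the output resolution can be illustrated with a small sketch. It is not the LERC algorithm itself (which works per block and packs the quantized values); the function quantize and the sample values are assumptions made for the example. It only shows that snapping values to a grid with a step of twice the maximum error keeps every change within that maximum error.

```python
def quantize(value, max_error):
    """Snap value to a grid with step 2 * max_error (illustrative only)."""
    if max_error <= 0:
        return value  # zero maximum error: the value is kept unchanged
    step = 2.0 * max_error
    return round(value / step) * step


samples = [0.0, 0.0004, 1.2345, -3.14159, 1000.0009]
max_error = 0.001  # plays the role of LERC_PREC for floating point data
quantized = [quantize(v, max_error) for v in samples]
worst = max(abs(q - v) for q, v in zip(quantized, samples))
print(worst <= max_error + 1e-9)
```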

To set a custom LERC precision value, use the free form MRF OPTIONS mechanism, the option name being "OPTIONS". To set the LERC precision for a new MRF, use the create option like this: -co OPTIONS="LERC_PREC:0.005"
When set, the LERC_PREC value will be used for all subsequent writes into the respective MRF.

There are two different versions of LERC compression supported in MRF, LERC (default, V2) and LERC1.

LERC supports more data types and higher precision than LERC1. While in most cases LERC achieves very similar compression to LERC1, it also includes different compression methods that may result in significantly better compression. For byte input data for example, a Huffman compression algorithm is used instead of the LERC algorithm. LERC also supports pixel interleaved data, which usually results in better compression. Note that pixel interleaved LERC compression was introduced later, so MRF files using this feature will be unreadable by older versions of the MRF driver.

LERC1 is the original LERC algorithm, implemented as a single band compression from floating point. MRF can make use of it for integer type data by conversion to floating point before invoking the LERC1 algorithm. This means that LERC1 integer precision is limited to 24 bits. MRF also simulates pixel interleaved LERC1 compression by concatenating the results for each individual band. While there is no size advantage to using LERC1 pixel interleaved in MRF, there might still be a performance advantage in a cloud environment since data for all bands is read in a single operation.
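The 24 bit limit is what one would expect if the conversion goes through 32-bit IEEE floats, which carry a 24 bit significand; that the conversion uses single precision is an assumption here, the text only says "floating point". A short sketch of the effect, using a hypothetical helper through_float32:

```python
import struct


def through_float32(n):
    """Round-trip an integer through a 32-bit IEEE float."""
    return int(struct.unpack("<f", struct.pack("<f", float(n)))[0])


exact = through_float32(2**24)        # 24 bits: survives unchanged
inexact = through_float32(2**24 + 1)  # needs 25 bits: gets rounded
print(exact, inexact)
```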

To choose LERC1 instead of the default LERC, add V1=ON to the options string, like this: -co OPTIONS="LERC_PREC=0.01 V1=ON"

MRF tiles compressed by LERC can be further compressed with zlib (DEFLATE) which in some cases can improve the compression at the expense of speed. DEFLATE speed is asymmetric, with decompression being faster than compression, so it does not affect read speeds as much as it does writes. However, DEFLATE decompression is still significantly slower than LERC so it should be used only when the size is critical or when the decompression speed is not the main source of delays, for example when reading tiles from cloud storage. To add DEFLATE to LERC, add "DEFLATE:ON" to the list of free form options. This example sets both the LERC precision and the extra DEFLATE option: -co OPTIONS="LERC_PREC=0.01 DEFLATE=ON"

The static MRF is the basic MRF storage format, where all three components physically sit in the same folder. In use it is similar to a TIFF or many other raster formats.

In a split MRF, the three component files (metadata, index and data) are distributed across different storage systems. This is accomplished by having two XML nodes in the MRF metadata, containing GDAL accessible file names for the index and the data file respectively, similar to hyperlinks. These XML nodes are not usually created by the GDAL MRF driver; they need to be added by modifying the metadata file once the location of the component files is known. The two nodes to be added are <IndexFile> and <DataFile>. They are added as sub-nodes of the <Raster> node. The content is simply a GDAL readable path to where the data or the index file can be found. The split MRF can be used for example to accelerate access to data on slow storage, by keeping the metadata files and possibly the index file on fast storage (local SSD) while having the large data files on an HDD, a NAS or even in cloud storage by using the GDAL VSI (virtual storage interface). Other than the file location, there is no difference between the static and the split MRF. The IndexFile and DataFile nodes can also contain an optional attribute called offset, with a numerical value. This value will be added to the normal, calculated file offsets for all access to the respective files.
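A minimal sketch of that metadata edit, using Python's standard XML module. The sample metadata string, its MRF_META root node, the function name make_split and both paths are assumptions for illustration; a real metadata file contains more nodes and should be edited with care.

```python
import xml.etree.ElementTree as ET

# Minimal stand-in for an MRF metadata file (assumed layout).
SAMPLE_META = "<MRF_META><Raster><Size x='1024' y='1024' c='1'/></Raster></MRF_META>"


def make_split(meta_xml, index_path, data_path, data_offset=None):
    """Return metadata XML with <IndexFile> and <DataFile> added under <Raster>."""
    root = ET.fromstring(meta_xml)
    raster = root.find("Raster")
    if raster is None:
        raise ValueError("metadata has no <Raster> node")
    index_node = ET.SubElement(raster, "IndexFile")
    index_node.text = index_path
    data_node = ET.SubElement(raster, "DataFile")
    data_node.text = data_path
    if data_offset is not None:
        # Optional attribute: added to every calculated offset for this file.
        data_node.set("offset", str(data_offset))
    return ET.tostring(root, encoding="unicode")


split_xml = make_split(
    SAMPLE_META,
    "/fast/ssd/test.idx",                     # hypothetical local index file
    "/vsicurl/https://example.com/test.dat",  # hypothetical remote data file
    data_offset=4096,
)
print(split_xml)
```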

A caching MRF is used as a cache format for another raster file. The original raster is called the source raster, while the MRF used to cache becomes the caching MRF. Only reading from a caching MRF is possible in GDAL; the update of the caching MRF content occurs automatically. Opening a caching MRF for update is not supported. It is also not possible to write to the parent dataset through a caching MRF. Some of the GDAL functionality of the parent raster might not be available when accessing the data through an MRF. Only static rasters, including static/split MRFs, should be cached. Chaining caching MRFs is possible but cache coherency may become an issue. When the location of the caching MRF data file is on a local disk, the caching MRF can be used in parallel by multiple processes on the same machine. For example, multiple GDAL-based GIS applications can be active at the same time, reading and sharing the same caching MRF data.

CACHEDSOURCE create option

The basic way to create a caching MRF is using the gdal_translate command. In addition to the normal MRF create options, the creation of a caching MRF dataset requires the presence of the CACHEDSOURCE option, whose value is the file name of the raster dataset to be cached. Any raster format readable by GDAL can be used as the source, including properly quoted string GDAL specifiers. The file name should be absolute, except for the case where the parent raster file is located in the same exact folder as the caching MRF metadata file.

An example of creating a caching MRF:
gdal_translate -of MRF -co NOCOPY=True -co CACHEDSOURCE=H12003_MB_1m_MLLW_14of16.tif H12003_MB_1m_MLLW_14of16.tif tst.mrf

In the command above, the presence of the CACHEDSOURCE option flags the file as a caching MRF and the value of the option gets stored in the MRF metadata file. Since the value used is a file name without an absolute path, the caching MRF metadata file has to reside in the same location as the parent dataset file. An absolute source path is also supported, and is the right choice in most cases.

The command above will create the caching MRF metadata, data and index files but will not copy the source data. The caching MRF has the same structure as a normal, static MRF, except that in the metadata it is flagged as a caching MRF. It is possible to erase the index and data files and then use the MRF for caching; the index and data files of a caching MRF dataset are created as empty when needed.

WARNING: Always remove the index and data files of a caching MRF together, otherwise errors will occur.

As seen above, to initialize a caching MRF but not copy any data in it, use the boolean create option NOCOPY=True. For example:
gdal_translate -of MRF -co COMPRESS=LERC -co BLOCKSIZE=512 -co OPTIONS="LERC_PREC=0.01" -co NOCOPY=True -co CACHEDSOURCE=/data/LERC_test/H12003_MB_1m_MLLW_14of16.tif H12003_MB_1m_MLLW_14of16.tif caching.mrf

The example above, in addition to the preceding one, sets the caching MRF compression to LERC, sets the blocksize to be used, sets the LERC max error via the free-form option and sets NOCOPY to true. This will leave the caching MRF initialized but empty. When raster blocks are then read from the MRF, data is read from the CACHEDSOURCE raster and stored in the caching MRF. On subsequent reads, if the data already exists in the caching MRF the parent dataset is no longer accessed. The caching MRF can be used to transcode data from any raster file format supported by GDAL.

The combined use of the CACHEDSOURCE and NOCOPY options should be the most common use pattern. Normally, the source raster as used on the gdal_translate command line and the value of the CACHEDSOURCE are the same. The difference is that the source raster is used as the source of metadata during the gdal_translate execution, while the CACHEDSOURCE raster is used for reading, when attempting to read from the caching MRF, if the data is not present in the MRF itself. This syntax is required due to the structure of gdal_translate, and it also offers the possibility to initialize a caching MRF using a local file while caching a different, possibly remote raster. Since opening and reading the metadata from a remote raster can take a while, this option can greatly speed up setting up multiple caching MRFs without having to open each and every remote raster.
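When many caching MRFs have to be initialized, the command line can be assembled by a small script. The sketch below only builds the argument list from the options already described in this text and does not run GDAL; the helper name caching_mrf_command and all file names and URLs are hypothetical.

```python
import shlex


def caching_mrf_command(metadata_source, cached_source, output, extra_options=()):
    """Build the gdal_translate argument list for an empty caching MRF."""
    args = ["gdal_translate", "-of", "MRF"]
    for option in extra_options:
        args += ["-co", option]
    args += ["-co", "NOCOPY=True", "-co", f"CACHEDSOURCE={cached_source}"]
    # The first raster only provides metadata; CACHEDSOURCE is what gets read later.
    args += [metadata_source, output]
    return args


cmd = caching_mrf_command(
    "local_copy.tif",                        # hypothetical local file, metadata only
    "/vsicurl/https://example.com/big.tif",  # hypothetical remote raster to cache
    "cache.mrf",
    extra_options=["COMPRESS=LERC", "BLOCKSIZE=512"],
)
print(shlex.join(cmd))
```

The resulting list can be passed to subprocess.run on a machine where gdal_translate is installed.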


The MRF (caching or static) can be flagged at creation time as already having the full set of internal overviews. This is useful when creating a caching MRF, since it will then cache and offer access to overview tiles.
This command creates a caching MRF flagged as having the normal, factor 2 overviews:
gdal_translate -of MRF -co COMPRESS=LERC -co BLOCKSIZE=512 -co OPTIONS="LERC_PREC:0.01" -co UNIFORM_SCALE=2 -co NOCOPY=True -co CACHEDSOURCE=/data/LERC_test/H12003_MB_1m_MLLW_14of16.tif H12003_MB_1m_MLLW_14of16.tif /data/LERC_test/test.mrf

Note that data for the overviews of a caching MRF will be read at the corresponding scale from the parent dataset, thus they might be different from the ones created on a static MRF. Do not use gdaladdo on a caching MRF.

This is the easy part: simply use the caching MRF for reading data just as any other raster format in GDAL. When opened, the MRF driver will not open the source dataset. When reading, if the tile already exists in the caching MRF, then it will be read from it. Otherwise, the source file will be opened and the tile will be requested from the source. Then the MRF tile will be created and stored in the caching MRF before returning the data. Thus, the first time a tile is requested it will have the source performance, while any subsequent reads will have local performance. The delayed source dataset open provides additional performance. The performance of the caching MRF depends on a multitude of factors, including the page sizes of both the caching MRF and the remote files. Good performance is usually achieved when the caching MRF and the remote file have the same page size and alignment. A particular case is when the remote is pixel interleaved but the caching MRF is band interleaved. In this case, the remote page may be read and decompressed multiple times, once for each and every output band. However, if the GDAL block cache is large enough to hold all the remote blocks this will not happen and the blocks will be reused. If the source page size is not efficient for the user application, it is recommended that the source data be reformatted ahead of time with a suitable page size, possibly as MRF.
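The read path described above is a read-through cache with a lazily opened source. The toy class below sketches that logic only; it is not the MRF driver implementation, and the class name, the dictionary standing in for the local data and index files, and the callable source are all assumptions.

```python
class CachingTileStore:
    """Toy read-through tile cache; not the MRF driver implementation."""

    def __init__(self, open_source):
        self._open_source = open_source  # callable returning a tile reader
        self._source = None              # opened lazily, on the first miss
        self._tiles = {}                 # stands in for the local data and index files
        self.source_reads = 0

    def read_tile(self, key):
        if key in self._tiles:
            return self._tiles[key]      # local hit, the source is not touched
        if self._source is None:
            self._source = self._open_source()
        tile = self._source(key)
        self.source_reads += 1
        self._tiles[key] = tile          # store locally before returning
        return tile


store = CachingTileStore(lambda: (lambda key: f"tile{key}".encode()))
first = store.read_tile((0, 0))   # miss: fetched from the source
second = store.read_tile((0, 0))  # hit: served from the local copy
print(store.source_reads)
```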

Advanced use of caching MRF

The two extra features of a caching MRF over a static one, fetching content from a different source and storing content locally, can be individually turned off. Turning them both off will transform the caching MRF into a static MRF, where only the content that already exists within the cache is accessible. The ability to turn these features off and then turn them on again is controlled via file access rights. The state of these features is set when the data and index files are opened, and it will persist for that process as long as those files are kept open.

Turning off the local cache writes while still reading data from the source still allows reading the cached content as well as source content. This feature is useful for example when the local cache should not be allowed to increase in size. To turn local cache writing off, make the existing MRF data file read only.

Turning off the new content fetch is useful for reading only the local cache, or when the source is not available. It avoids the latency and penalty of trying and failing to access the source. To turn off the source fetching, make the existing MRF index file read only. Turning off the new content fetch will implicitly turn off local cache writes, since there is no new content to be written. If a caching MRF uses the same file for both data and index, this will be the behavior. Data which does not exist in the local caching MRF will be returned as NoData or black.
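Changing the access rights can be done with any file manager or shell; a minimal Python sketch is shown here. The helper names and the file name cache.dat are hypothetical, and the demonstration runs on a temporary file rather than a real MRF. Which file to protect (data or index) follows the two paragraphs above.

```python
import os
import stat
import tempfile


def set_read_only(path):
    """Clear all write permission bits on path."""
    mode = os.stat(path).st_mode
    os.chmod(path, mode & ~(stat.S_IWUSR | stat.S_IWGRP | stat.S_IWOTH))


def set_writable(path):
    """Give the owner write permission back."""
    os.chmod(path, os.stat(path).st_mode | stat.S_IWUSR)


with tempfile.TemporaryDirectory() as folder:
    data_file = os.path.join(folder, "cache.dat")  # hypothetical file name
    open(data_file, "wb").close()
    set_read_only(data_file)
    locked = not (os.stat(data_file).st_mode & stat.S_IWUSR)
    set_writable(data_file)
    unlocked = bool(os.stat(data_file).st_mode & stat.S_IWUSR)

print(locked, unlocked)
```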

Sometimes it is useful to temporarily stop the caching MRF from storing data locally while preserving data access to the remote data source, without modifying the file access flags. This can be achieved by setting the environment variable MRF_BYPASSCACHING to TRUE. This variable can also be set as a GDAL configuration option. All caching and cloning MRF files opened while this variable is set to true are affected; it is not possible to selectively choose which caching MRFs are affected.

As a further optimization of a caching MRF, if the source dataset of a caching MRF is itself an MRF, and the caching MRF has the identical structure as the source one (image size, projection, page size, compression …), the page transcoding is eliminated, and the already compressed pages are copied from the source MRF into the cache. This type of MRF is a Cloning MRF, since its tiles are an identical copy of the source MRF ones, possibly in a different order. Creating a clone MRF cannot be done using gdal_translate, since it is not possible to ensure that the source has the same properties as the caching MRF. Instead, a cloning MRF has to be created by copying the cloned MRF metadata file to where the cloning MRF should reside and adding the following lines to the top level node:

The MRF driver recognizes and reads a LERC/LERC1 compressed file. This type of file behaves as a read-only single tile MRF with LERC compression, without geo-reference. This feature is mostly intended to be used by the GDAL WMS driver. An open option, DATATYPE, can be used to set the data type when reading from LERC1 compressed data, since that information is not available in the LERC1 data itself. The default data type for LERC1 is byte. LERC (V2) can only be read as the same data type it was encoded with, so the DATATYPE open option is ignored.

When overwriting an MRF, GDAL normally tries to erase the files if they exist. To avoid having the data or the index file erased unintentionally, the MRF driver does not do this. This means that if a file exists and is used repeatedly as a destination for gdal_translate, the data file will keep growing and the index file will keep its old content, which is the desired behavior. This can create problems in certain cases, for example when the same file name is reused for images of different size or structure, or when the MRF itself is corrupt. Crashes may occur in some of these situations. In these cases, the index and data file should be erased by hand, outside of the GDAL infrastructure.

APPENDIX A, MRF Metadata Schema

APPENDIX B, Index file format

The MRF index is a vector of tile records. A tile index record is sixteen bytes long and contains the tile offset and size, each stored as an eight byte unsigned integer. In C, a tile index record is defined as

Example values in this document will be using the notation [Offset, Size] for a tile index. By convention the index for a tile which has no data written into it has the size of zero [R, 0]. This index record will generate a tile filled with zeros or NoData on read. The value for the offset is reserved and should be written as zero. An offset value of 1 and a size of 0 [1, 0] is used by the caching/cloning MRFs as a flag that a tile is zero or NoData in the source file. It will be read as zero without triggering a read from the source.

The order of the tile records in the index file depends on the type of MRF:

Tile records are usually stored in top-left orientation, in row major order (Y, X).

If the MRF contains multiple channels (Bands) and they are stored as band interleaved data, the band index changes first, then the spatial index (Y, X, C).

If the MRF Z dimension is more than 1, each Z slice index is stored consecutively, thus Z varies after Y (Z, Y, X, C).

If the MRF contains versions, the current version is stored at the start of the index file, immediately followed by version one and so on (V, Y, X, C) or (V, Z, Y, X, C).

If there are overviews, the tile index vectors for the overviews immediately follow the index vector for the full resolution, in the decreasing order of resolution, (lX,Y,C) or (lZ,X,Y,C). Note that the vector for an overview level is smaller than the vector for the previous overview or base resolution.

For cloning MRFs, the index of the local cache data is followed immediately by a copy of the cloned MRF index. The content of both may be updated during reads.
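For the simplest case, the full resolution level of a single version, the ordering rules above reduce to a nested index computation. The sketch below is an interpretation of those rules, not code from the MRF driver; the function and parameter names are assumptions, and overviews, versions and cloning MRFs are deliberately left out.

```python
RECORD_SIZE = 16  # bytes per tile index record, as stated above


def tile_record_number(x, y, c=0, z=0, *, tiles_x, tiles_y, channels=1):
    """Record number of a tile at full resolution, for a single version.

    tiles_x and tiles_y are the tile counts per row and per column;
    channels is the number of separately stored bands
    (1 when the data is pixel interleaved).
    """
    # C changes fastest, then X, then Y, then Z.
    return ((z * tiles_y + y) * tiles_x + x) * channels + c


def tile_record_byte_offset(x, y, c=0, z=0, **layout):
    """Byte position of the record inside the index file."""
    return tile_record_number(x, y, c, z, **layout) * RECORD_SIZE


print(tile_record_number(1, 2, tiles_x=4, tiles_y=3))
print(tile_record_byte_offset(1, 2, c=1, tiles_x=4, tiles_y=3, channels=3))
```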

To print the content of the index in a human readable form, the following command can be used on UNIX. The first number is the offset, the second one the size of each tile
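A portable way to get the same kind of listing is a short Python script, sketched here. It relies on the record layout given above (two eight byte unsigned integers per record); the big-endian byte order is an assumption that should be checked against a known index file, and the function names are hypothetical.

```python
import struct

# offset, size: two 8-byte unsigned integers; big-endian is an assumption
RECORD = struct.Struct(">QQ")


def read_index(raw):
    """Yield (offset, size) pairs from the bytes of an MRF index file."""
    usable = len(raw) - len(raw) % RECORD.size  # ignore a trailing partial record
    for offset, size in RECORD.iter_unpack(raw[:usable]):
        yield offset, size


def print_index(path):
    """Print one 'offset size' line per tile record."""
    with open(path, "rb") as f:
        for offset, size in read_index(f.read()):
            print(offset, size)


# Two records built in memory: a 100 byte tile at offset 0, then a [1, 0] flag record.
demo = struct.pack(">QQQQ", 0, 100, 1, 0)
records = list(read_index(demo))
print(records)
```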

APPENDIX C, Create Options

In GDAL, a list of key-value string pairs can be used to pass various options to the target driver. Using the gdal_translate utility, these options are passed using the -co Key=Value syntax. Most of the names of the options supported by MRF have been chosen to match the ones used by TIFF. The create options supported by MRF are:

| Key | Default Value | Description |
|---|---|---|
| BLOCKSIZE | 512 | The tile size, in both X and Y |
| BLOCKXSIZE | 512 | Horizontal tile size |
| BLOCKYSIZE | 512 | Vertical tile size |
| COMPRESS | PNG | Chooses the tile packing algorithm |
| ZSIZE | 1 | Specifies the third dimension size |
| INTERLEAVE | PIXEL or BAND, format dependent | PIXEL or BAND interleave |
| NETBYTEORDER | FALSE | If true, for some packings, forces endianness dependent input data to big endian when writing, and back to native when reading |
| QUALITY | 85 | An integer, 0 to 100, used to control the compression |
| PHOTOMETRIC | | Sets the interpretation of the bands and controls some of the compression algorithms |
| SPACING | 0 | Reserve this many bytes before each tile data |
| NOCOPY | False | Create an empty MRF, do not copy input |
| UNIFORM_SCALE | | Flags the MRF as containing overviews, with a given numerical scale factor between successive overviews |
| CACHEDSOURCE | | GDAL raster reference to be cached in the caching MRF being created |

APPENDIX D, Free-form Create Options

In addition to the normal create options which are applicable to all supported tile packings, MRF compressions also accept a set of options that control features of only specific packing formats, or can be used to modify default behaviors. The main difference between the create options and free-form options is that the latter are saved in the MRF metadata file and may apply when a file is read, not only when it is written. The free-form options are not part of the GDAL interface, and as such they are not checked for correctness when passed to the driver. If a free-form option doesn't seem to have the expected effect, the exact spelling should be checked, since the options are case sensitive.

The free-form option list is passed as a single string value for the create option called OPTIONS. The value is a free-form string containing white space separated key value pairs. GDAL list parsing is used when reading, so either the equal sign = or the colon : can be used as the separator between key and value. Boolean flags are false by default; they are treated as true only if the value is one of Yes, True or 1.

When using gdal_translate utility, the free form option syntax will be:

-co OPTIONS="Key1:Value1 Key2:Value2 …"
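The parsing rules stated above can be sketched in a few lines. This is an illustration of the described behavior, not GDAL's parser: the function names are hypothetical, and treating the boolean values case-insensitively is an assumption (the real list parsing may accept other spellings as well).

```python
import re

TRUE_VALUES = {"yes", "true", "1"}  # per the text; case-insensitivity is assumed


def parse_free_form(options):
    """Split 'Key1:Value1 Key2=Value2' into a dict of strings."""
    result = {}
    for token in options.split():
        # Either '=' or ':' separates the key from the value.
        parts = re.split(r"[=:]", token, maxsplit=1)
        if len(parts) == 2:
            result[parts[0]] = parts[1]
    return result


def flag(parsed, key):
    """Boolean flags are false unless set to one of the true values."""
    return parsed.get(key, "").lower() in TRUE_VALUES


opts = parse_free_form("LERC_PREC:0.005 V1=Yes")
print(opts, flag(opts, "V1"), flag(opts, "DEFLATE"))
```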

| Key | Default | Packing | Description |
|---|---|---|---|
| DEFLATE | False | Most | Apply zlib DEFLATE as a final packing stage |
| GZ | False | DEFLATE | Generate gzip header style |
| RAWZ | False | DEFLATE | No zlib or gzip headers |
| V1 | False | LERC | Uses LERC1 compression instead of LERC (V2) |
| LERC_PREC | 0.5 for integer types, 0.001 for floating point | LERC | Maximum value change allowed |
| OPTIMIZE | False | JPEG, JPNG | Optimize the Huffman tables for each tile. Always true for JPEG12 |
| JFIF | False | JPEG, JPNG | When set, write JPEG tiles in JFIF format. By default, brunsli format is preferred |

Starting with GDAL 2.x, a list of key-value string pairs can be used to pass various options to the target driver when reading. Using the gdal_translate utility, these options are passed using the -oo Key=Value syntax.

◆ ImwritePNGFlags

Imwrite PNG specific flags used to tune the compression algorithm.

These flags modify the way PNG images are compressed and are passed to the underlying zlib processing stage.

  • The effect of IMWRITE_PNG_STRATEGY_FILTERED is to force more Huffman coding and less string matching; it is somewhat intermediate between IMWRITE_PNG_STRATEGY_DEFAULT and IMWRITE_PNG_STRATEGY_HUFFMAN_ONLY.
  • IMWRITE_PNG_STRATEGY_RLE is designed to be almost as fast as IMWRITE_PNG_STRATEGY_HUFFMAN_ONLY, but give better compression for PNG image data.
  • The strategy parameter only affects the compression ratio but not the correctness of the compressed output even if it is not set appropriately.
  • IMWRITE_PNG_STRATEGY_FIXED prevents the use of dynamic Huffman codes, allowing for a simpler decoder for special applications.

  • IMWRITE_PNG_STRATEGY_DEFAULT: Use this value for normal data.
  • IMWRITE_PNG_STRATEGY_FILTERED: Use this value for data produced by a filter (or predictor). Filtered data consists mostly of small values with a somewhat random distribution. In this case, the compression algorithm is tuned to compress them better.
  • IMWRITE_PNG_STRATEGY_HUFFMAN_ONLY: Use this value to force Huffman encoding only (no string match).
  • IMWRITE_PNG_STRATEGY_RLE: Use this value to limit match distances to one (run-length encoding).
  • IMWRITE_PNG_STRATEGY_FIXED: Using this value prevents the use of dynamic Huffman codes, allowing for a simpler decoder for special applications.
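The claim that the strategy changes only the compression ratio and never the correctness of the output can be checked directly against zlib, here through Python's standard zlib module rather than through OpenCV. The sample payload is arbitrary, and only the three strategy constants that the module is known to expose everywhere are used.

```python
import zlib

data = bytes(range(256)) * 64 + b"\x00" * 4096  # arbitrary sample payload

strategies = {
    "default": zlib.Z_DEFAULT_STRATEGY,
    "filtered": zlib.Z_FILTERED,
    "huffman_only": zlib.Z_HUFFMAN_ONLY,
}

sizes = {}
for name, strategy in strategies.items():
    compressor = zlib.compressobj(
        6, zlib.DEFLATED, zlib.MAX_WBITS, zlib.DEF_MEM_LEVEL, strategy
    )
    packed = compressor.compress(data) + compressor.flush()
    # Whatever the strategy, decompression restores the input exactly.
    assert zlib.decompress(packed) == data
    sizes[name] = len(packed)

print(sizes)
```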

◆ imwrite()

bool cv::imwrite(const String& filename, InputArray img, const std::vector<int>& params = std::vector<int>())
Python: cv.imwrite(filename, img[, params]) -> retval

Saves an image to a specified file.

The function imwrite saves the image to the specified file. The image format is chosen based on the filename extension (see cv::imread for the list of extensions). In general, only 8-bit single-channel or 3-channel (with 'BGR' channel order) images can be saved using this function, with these exceptions:

  • 16-bit unsigned (CV_16U) images can be saved in the case of PNG, JPEG 2000, and TIFF formats
  • 32-bit float (CV_32F) images can be saved in TIFF, OpenEXR, and Radiance HDR formats; 3-channel (CV_32FC3) TIFF images will be saved using the LogLuv high dynamic range encoding (4 bytes per pixel)
  • PNG images with an alpha channel can be saved using this function. To do this, create 8-bit (or 16-bit) 4-channel image BGRA, where the alpha channel goes last. Fully transparent pixels should have alpha set to 0, fully opaque pixels should have alpha set to 255/65535 (see the code sample below).
  • Multiple images (vector of Mat) can be saved in TIFF format (see the code sample below).

If the image format is not supported, the image will be converted to 8-bit unsigned (CV_8U) and saved that way.

If the format, depth or channel order is different, use Mat::convertTo and cv::cvtColor to convert it before saving. Or, use the universal FileStorage I/O functions to save the image to XML or YAML format.

The sample below shows how to create a BGRA image, how to set custom compression parameters and save it to a PNG file. It also demonstrates how to save multiple images in a TIFF file:

Adding support for a new language is pretty straightforward: you just need to follow the documentation. You also need some knowledge of a scripting language, which will help you cut manual work on some steps. Unix command line experience is a big plus, though you can work on Windows too.

1) Read Introduction to become familiar with concepts of speech recognition - features, acoustic models, language models, etc.

2) Try CMUSphinx with US English model to understand how things work. Try to train with sample US English AN4 database following acoustic model training tutorial.

3) Read about your language in Wikipedia.

4) Collect a set of transcribed recordings for your language - podcasts, radio shows, audiobooks. You can also record some initial amount yourself. You need about 20 hours of transcribed data to start, 100 hours to create a good model.

5) Based on the data you collected, create a list of words and a phonetic dictionary. Most phonetic dictionaries can be created from simple rules with a small script in your favorite scripting language, like Python. See Generating a dictionary for details.

6) Segment the audio to short sentences manually or with sphinx4 aligner, create a database with required files as described in training tutorial.

7) Integrate new model into your application and design a data collection to improve your model.
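As an illustration of step 5, a rule-based dictionary generator can be very small. The rule table below is invented for a made-up, perfectly regular orthography and the function names are hypothetical; a real language needs its own letter-to-phone rules and usually a list of exceptions.

```python
# Hypothetical letter-to-phone rules; replace with rules for the target language.
RULES = {"a": "AA", "b": "B", "k": "K", "o": "OW", "s": "S", "t": "T", "sh": "SH"}


def to_phones(word):
    """Greedy longest-match conversion of a word into a phone sequence."""
    phones, i = [], 0
    longest = max(len(key) for key in RULES)
    while i < len(word):
        for length in range(longest, 0, -1):
            chunk = word[i:i + length]
            if chunk in RULES:
                phones.append(RULES[chunk])
                i += length
                break
        else:
            raise ValueError(f"no rule for {word[i]!r} in {word!r}")
    return phones


def make_dictionary(words):
    """Return dictionary lines in 'word PHONE PHONE ...' form."""
    return [f"{w} {' '.join(to_phones(w))}" for w in sorted(set(words))]


lines = make_dictionary(["bat", "shot", "task"])
print("\n".join(lines))
```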

Compile MapServer

Once you have compiled the supporting libraries successfully, you are ready to take the final compilation step. If you have not already done so, open a command prompt and set the VC++ environment variables by running vcvars32.bat, usually located at C:\Program Files\Microsoft Visual Studio\VC98\bin\vcvars32.bat.

Now issue the command: nmake /f and wait for it to finish compiling. If it compiles successfully, you should get mapserver.lib, libmap.dll, mapserv.exe, and other .EXE files. That’s it for the compilation process. If you run into problems, read section 4 about compiling errors. You can also ask for help from the helpful folks in the MapServer-dev e-mail list.

This version of the C-language interface reference is broken down into small pages for easy viewing.

C/C++ Exported Functions

CNN/RNN Training Interfaces

RealNets Training Interfaces

Image/Frame Processing Interfaces

Raw Pixels Access & Conversion Interfaces

Image Creation & Blob Loading/Saving Interfaces

OpenCV Integration Interfaces

C/C++ Exported Objects

This is a list of all abstract objects and datatypes exported by the SOD library. There are few objects, but most applications only use a handful.

• sod_cnn

An instance of the opaque sod_cnn structure holds all layers of a standard Convolutional Neural Network (CNN) or Recurrent Neural Network (RNN). The syntax is based on the darknet framework and the task varies depending on the specified architecture (i.e. object detection & classification for CNN, text generation for RNN). The life of a sod_cnn instance goes something like this:

  1. Obtain a new sod_cnn handle via sod_cnn_create(). This routine expects you to specify the desired network architecture and a SOD weight file downloadable here. Built-in magic words such as :face, :tiny, :full, :voc, :rnn, etc. are aliases for ready to use CNN or RNN architectures for object detection and text generation purposes. These magic words and their expected SOD weight files are documented here. This routine is often the first API call that an application makes and is a prerequisite in order to work with the CNN/RNN layer.
  2. Optionally, configure the network via sod_cnn_config() where you can tune its parameters, specify an RNN consumer callback, detection threshold values, temperatures and so forth.
  3. Prepare the input data for prediction. This step is only mandatory for object detection tasks and is performed via sod_cnn_prepare_image().
  4. Perform the network prediction via sod_cnn_predict().
  5. Consume the network output, which varies depending on the specified architecture (i.e. bounding boxes, generated text/code, etc.). You are invited to take a look at the introduction course for additional information.
  6. Destroy the sod_cnn instance via sod_cnn_destroy().

• sod_realnet

RealNets are a simple container for multiple neural network architectures and were introduced to the public in the first release of SOD. Since most network architectures are specialized (i.e. CNNs for object classification, ANNs for pattern extraction and so forth), taking advantage of each one and stacking one network on top of another can achieve amazing results. For example, detecting & extracting facial shapes has never been easier or quicker (a few milliseconds) than by stacking together in a RealNets container a set of decision trees and a simple Artificial Neural Network. The life of a sod_realnet instance goes something like this:

  1. Obtain a new sod_realnet handle via sod_realnet_create(). This routine is often the first API call that an application makes and is a prerequisite in order to work with the RealNets layer.
  2. Register one or more RealNets models via sod_realnet_load_model_from_disk() or sod_realnet_load_model_from_mem(). You can already rely on pre-trained models available to download from or train your own RealNets models on your CPU via the training interfaces.
  3. Optionally, configure your models via sod_realnet_model_config() where you can tune parameters like detection threshold values, minimum & maximum window size, scale & stride factors and so on.
  4. Optionally, prepare the input grayscale sod_img object holding the target image or video frame for detection via sod_image_to_blob().
  5. Perform real-time detection via sod_realnet_detect().
  6. Consume the network output, which is returned as an array of bounding boxes via an instance of sod_box. You are invited to take a look at the introduction course for additional information.
  7. Finally, release the sod_realnet handle via sod_realnet_destroy().

Note that RealNets are designed to analyze & extract useful information from video streams rather than static images, thanks to their fast processing speed (less than 10 milliseconds on a 1920*1080 HD stream) and low memory footprint, which makes them suitable for use on mobile devices. You are encouraged to connect the RealNets APIs with the OpenCV Video capture interfaces or any proprietary Video capture API to see them in action.

• sod_img

Internally, each in-memory representation of an input image or video frame is kept in an instance of the sod_img structure. Basically, a sod_img is just a record of the width, height and number of color channels in an image, together with the value of every pixel. Image pixels are arranged in CHW format. This means that in a 3 channel image with width 400 and height 300, the first 400 values are the 1st row of the 1st channel of the image. The second 400 values are the 2nd row. After 120,000 values (400 * 300) we get to the pixels of the 2nd channel, and so forth.
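The CHW arithmetic above can be sketched as a small helper. This is only an illustration of the layout described in the text: the function name chw_index and its parameters are made up for this example and are not part of the SOD API.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Offset of pixel (x, y) of channel c inside a CHW-ordered buffer.
 * w and h are the image width and height. Hypothetical helper,
 * not a SOD interface.
 */
size_t chw_index(int w, int h, int x, int y, int c)
{
    /* Programmer errors: the caller must pass an in-range location. */
    assert(w > 0 && h > 0);
    assert(x >= 0 && x < w && y >= 0 && y < h && c >= 0);

    /* Skip c whole channel planes, then y whole rows, then x values. */
    return (size_t)c * (size_t)w * (size_t)h
         + (size_t)y * (size_t)w
         + (size_t)x;
}
```

With the 400 by 300 image from the text, the second row of the first channel starts at offset 400 and the second channel starts at offset 120,000.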

Practically, all the exported interfaces deal with a sod_img instance and over 70 interfaces provide advanced image/frame processing routines. These include sod_canny_edge_image() , sod_hilditch_thin_image() , sod_hough_lines_detect() , sod_image_find_blobs() , sod_otsu_binarize_image() , sod_crop_image() , sod_resize_image() , sod_dilate_image() , sod_image_draw_bbox() and so forth. You are invited to take a look at the list of image processing interfaces for their complete documentation.

A sod_img can be loaded from disk via sod_img_load_from_file() , from memory (i.e. network socket) using sod_img_load_from_mem() or dynamically created via sod_make_image() . An OpenCV compile-time directive is provided to help you integrate OpenCV with SOD. When enabled, primitives such as sod_img_load_cv_ipl() , sod_img_load_from_cv_stream() and so on are available to call. This lets you record video frames from external sources such as your Webcam, CCTV, etc. and convert them back to a working sod_img instance.

Raw pixel values can be manipulated via a set of public interfaces such as sod_img_add_pixel() , sod_img_get_pixel() , sod_img_set_pixel() , etc. or directly via the data pointer member of this structure. In the latter case, you must make sure yourself that the target pixel location is in range, whereas the public interfaces perform the range check for you.
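As a rough sketch of the kind of range check you need when going through the data pointer directly, consider the following. The demo_img structure and demo_get_pixel() are stand-ins written for this example. They assume float pixel values stored in CHW order and are not the actual sod_img definition or a SOD interface.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for an image record (not the sod_img definition). */
typedef struct {
    int w, h, c;   /* width, height, number of channels */
    float *data;   /* pixel values, assumed to be in CHW order */
} demo_img;

/*
 * Read pixel (x, y) of channel ch into *out.
 * Returns 1 on success and 0 when the location is out of range,
 * in which case *out is left untouched.
 */
int demo_get_pixel(const demo_img *im, int x, int y, int ch, float *out)
{
    /* Null arguments are treated as programmer errors. */
    assert(im != NULL && im->data != NULL && out != NULL);

    if (x < 0 || x >= im->w || y < 0 || y >= im->h || ch < 0 || ch >= im->c) {
        return 0; /* out of range: never touch the buffer */
    }
    *out = im->data[(size_t)ch * (size_t)im->w * (size_t)im->h
                    + (size_t)y * (size_t)im->w
                    + (size_t)x];
    return 1;
}
```

Rejecting the location before computing the offset keeps a bad coordinate from ever reaching the buffer, which is the check you take on yourself when bypassing the public interfaces.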

• sod_box

A bounding box, or bbox for short, is represented by an instance of the sod_box structure. A sod_box instance always stores the coordinates of a rectangle obtained from a prior successful call to one of the object detection routines of a sod_cnn or sod_realnet handle such as sod_cnn_predict() or from the connected component labeling interface sod_image_find_blobs() . Besides the rectangle coordinates, the zName and score fields of this structure hold useful information about the object it surrounds.

Finally, the drawing interfaces sod_image_draw_bbox() , sod_image_draw_bbox_width() , sod_image_draw_box() , sod_image_draw_box_grayscale() or sod_crop_image() let you draw/extract a rectangle on/from an input image using the sod_box coordinates.

• sod_pts

An instance of the sod_pts structure describes a 2D point in space with integer coordinates (usually zero-based). This structure is rarely manipulated by SOD and is used mostly by the Hough line detection interface sod_hough_lines_detect() and the line drawing routine sod_image_draw_line() .

• sod_realnet_trainer

  • Prepare your dataset (positive, negative and test samples) for example for a standard object detection purpose.
  • Allocate and initialize a new sod_realnet_trainer instance via sod_realnet_train_init() .
  • Configure this instance via sod_realnet_train_config() by specifying a log consumer callback and the path where the RealNet output model should be stored.
  • Start the training phase via sod_realnet_train_start() and pass your training instructions (i.e. where the dataset is located, minimal tree depth and so on).
  • Finally, depending on how big your dataset is, wait some days until training is complete and release this instance via sod_realnet_train_release() .

• sod_realnet_model_handle

Since RealNets are just a container for various neural network architectures, each loaded & registered network is given a unique ID returned via calls to sod_realnet_load_model_from_mem() or sod_realnet_load_model_from_disk() . With this in hand, you can configure your network via sod_realnet_model_config() by passing the handle ID, which uniquely identifies the target network to configure.

Compile-Time Directives

For most purposes, SOD can be built just fine using the default compilation options. However, if required, the compile-time options documented below can be used to omit SOD features such as the CNN/RNN layer, which consumes a lot of memory, or to integrate SOD with other libraries such as OpenCV. Every effort has been made to ensure that the various combinations of compilation options work harmoniously and produce a working library.


If this directive is defined, built-in OpenCV integration interfaces such as sod_img_load_cv_ipl() , sod_img_load_from_cv() , sod_img_load_from_cv_stream() , sod_img_fill_from_cv_stream() , sod_img_save_to_cv_jpg() are included in the build. In that case, you need to link your code against OpenCV and adjust the OpenCV include paths accordingly.


If this directive is defined, all built-in image reading interfaces such as sod_img_load_from_file() , sod_img_load_from_mem() , sod_img_set_load_from_directory() , etc. are omitted from the build. In that case, you have to rely on your own code to read images from external sources and convert them back to a working sod_img object.


If this directive is defined, all built-in image writing interfaces such as sod_img_save_as_png() or sod_img_blob_save_as_png() are omitted from the build. In that case, you have to rely on your own code (take a look at the OpenCV integration interfaces) to write a sod_img object back to an external source.


If this directive is defined, the entire CNN/RNN layer and its exported interfaces such as sod_cnn_create() , sod_cnn_predict() , etc. are completely omitted from the build. Be aware that a CNN is a memory & CPU hog. The smallest CNN detection model consumes about 75 to 160MB of RAM, unlike the RealNet architecture, which is memory efficient. In cases where resources are limited, this directive is of particular interest if you want a small library footprint.


If this directive is defined, the RealNet training interfaces are included in the build. You'll be able to train your own RealNet models on the CPU using just a few calls and your training dataset. RealNets training interfaces are documented here.
