ID3v2 is a general tagging format for audio, which makes it possible
to store meta data about the audio inside the audio file itself. The
ID3 tag described in this document is mainly targeted at files
encoded with MPEG-1/2 layer I, MPEG-1/2 layer II, MPEG-1/2 layer III
and MPEG-2.5, but may work with other types of encoded audio or as a
stand alone format for audio meta data.
ID3v2 is designed to be as flexible and expandable as possible to
meet new meta information needs that might arise. To achieve that
ID3v2 is constructed as a container for several information blocks,
called frames, whose format need not be known to the software that
encounters them. At the start of every frame is an unique and
predefined identifier, a size descriptor that allows software to skip
unknown frames and a flags field. The flags describes encoding
details and if the frame should remain in the tag, should it be
unknown to the software, if the file is altered.
The bitorder in ID3v2 is most significant bit first (MSB). The
byteorder in multibyte numbers is most significant byte first (e.g.
$12345678 would be encoded $12 34 56 78), also known as big endian
and network byte order.
Overall tag structure:
Header (10 bytes) |
Extended Header (variable length, OPTIONAL) |
Frames (variable length) |
Padding (variable length, OPTIONAL) |
Footer (10 bytes, OPTIONAL) |
In general, padding and footer are mutually exclusive. See details in
sections 3.3, 3.4 and 5.
The first part of the ID3v2 tag is the 10 byte tag header, laid out
as follows:
ID3v2/file identifier "ID3"
ID3v2 version $04 00
ID3v2 flags %abcd0000
ID3v2 size 4 * %0xxxxxxx
The first three bytes of the tag are always "ID3", to indicate that
this is an ID3v2 tag, directly followed by the two version bytes. The
first byte of ID3v2 version is its major version, while the second
byte is its revision number. In this case this is ID3v2.4.0. All
revisions are backwards compatible while major versions are not. If
software with ID3v2.4.0 and below support should encounter version
five or higher it should simply ignore the whole tag. Version or
revision will never be $FF.
The version is followed by the ID3v2 flags field, of which currently
four flags are used.
a - Unsynchronisation
Bit 7 in the 'ID3v2 flags' indicates whether or not
unsynchronisation is applied on all frames (see section 6.1 for
details); a set bit indicates usage.
b - Extended header
The second bit (bit 6) indicates whether or not the header is
followed by an extended header. The extended header is described in
section 3.2. A set bit indicates the presence of an extended
header.
c - Experimental indicator
The third bit (bit 5) is used as an 'experimental indicator'. This
flag SHALL always be set when the tag is in an experimental stage.
d - Footer present
Bit 4 indicates that a footer (section 3.4) is present at the very
end of the tag. A set bit indicates the presence of a footer.
All the other flags MUST be cleared. If one of these undefined flags
are set, the tag might not be readable for a parser that does not
know the flags function.
The ID3v2 tag size is stored as a 32 bit synchsafe integer (section
6.2), making a total of 28 effective bits (representing up to 256MB).
The ID3v2 tag size is the sum of the byte length of the extended
header, the padding and the frames after unsynchronisation. If a
footer is present this equals to ('total size' - 20) bytes, otherwise
('total size' - 10) bytes.
An ID3v2 tag can be detected with the following pattern:
$49 44 33 yy yy xx zz zz zz zz
Where yy is less than $FF, xx is the 'flags' byte and zz is less than
$80.
The extended header contains information that can provide further
insight in the structure of the tag, but is not vital to the correct
parsing of the tag information; hence the extended header is
optional.
Extended header size 4 * %0xxxxxxx
Number of flag bytes $01
Extended Flags $xx
Where the 'Extended header size' is the size of the whole extended
header, stored as a 32 bit synchsafe integer. An extended header can
thus never have a size of fewer than six bytes.
The extended flags field, with its size described by 'number of flag
bytes', is defined as:
%0bcd0000
Each flag that is set in the extended header has data attached, which
comes in the order in which the flags are encountered (i.e. the data
for flag 'b' comes before the data for flag 'c'). Unset flags cannot
have any attached data. All unknown flags MUST be unset and their
corresponding data removed when a tag is modified.
Every set flag's data starts with a length byte, which contains a
value between 0 and 127 ($00 - $7f), followed by data that has the
field length indicated by the length byte. If a flag has no attached
data, the value $00 is used as length byte.
b - Tag is an update
If this flag is set, the present tag is an update of a tag found
earlier in the present file or stream. If frames defined as unique
are found in the present tag, they are to override any
corresponding ones found in the earlier tag. This flag has no
corresponding data.
Flag data length $00
c - CRC data present
If this flag is set, a CRC-32 [ISO-3309] data is included in the
extended header. The CRC is calculated on all the data between the
header and footer as indicated by the header's tag length field,
minus the extended header. Note that this includes the padding (if
there is any), but excludes the footer. The CRC-32 is stored as an
35 bit synchsafe integer, leaving the upper four bits always
zeroed.
Flag data length $05
Total frame CRC 5 * %0xxxxxxx
d - Tag restrictions
For some applications it might be desired to restrict a tag in more
ways than imposed by the ID3v2 specification. Note that the
presence of these restrictions does not affect how the tag is
decoded, merely how it was restricted before encoding. If this flag
is set the tag is restricted as follows:
Flag data length $01
Restrictions %ppqrrstt
p - Tag size restrictions
00 No more than 128 frames and 1 MB total tag size.
01 No more than 64 frames and 128 KB total tag size.
10 No more than 32 frames and 40 KB total tag size.
11 No more than 32 frames and 4 KB total tag size.
q - Text encoding restrictions
0 No restrictions
1 Strings are only encoded with ISO-8859-1 [ISO-8859-1] or
UTF-8 [UTF-8].
r - Text fields size restrictions
00 No restrictions
01 No string is longer than 1024 characters.
10 No string is longer than 128 characters.
11 No string is longer than 30 characters.
Note that nothing is said about how many bytes is used to
represent those characters, since it is encoding dependent. If a
text frame consists of more than one string, the sum of the
strungs is restricted as stated.
s - Image encoding restrictions
0 No restrictions
1 Images are encoded only with PNG [PNG] or JPEG [JFIF].
t - Image size restrictions
00 No restrictions
01 All images are 256x256 pixels or smaller.
10 All images are 64x64 pixels or smaller.
11 All images are exactly 64x64 pixels, unless required
otherwise.
It is OPTIONAL to include padding after the final frame (at the end
of the ID3 tag), making the size of all the frames together smaller
than the size given in the tag header. A possible purpose of this
padding is to allow for adding a few additional frames or enlarge
existing frames within the tag without having to rewrite the entire
file. The value of the padding bytes must be $00. A tag MUST NOT have
any padding between the frames or between the tag header and the
frames. Furthermore it MUST NOT have any padding when a tag footer is
added to the tag.
To speed up the process of locating an ID3v2 tag when searching from
the end of a file, a footer can be added to the tag. It is REQUIRED
to add a footer to an appended tag, i.e. a tag located after all
audio data. The footer is a copy of the header, but with a different
identifier.
ID3v2 identifier "3DI"
ID3v2 version $04 00
ID3v2 flags %abcd0000
ID3v2 size 4 * %0xxxxxxx
The default location of an ID3v2 tag is prepended to the audio so
that players can benefit from the information when the data is
streamed. It is however possible to append the tag, or make a
prepend/append combination. When deciding upon where an unembedded
tag should be located, the following order of preference SHOULD be
considered.
1. Prepend the tag.
2. Prepend a tag with all vital information and add a second tag at
the end of the file, before tags from other tagging systems. The
first tag is required to have a SEEK frame.
3. Add a tag at the end of the file, before tags from other tagging
systems.
In case 2 and 3 the tag can simply be appended if no other known tags
are present. The suggested method to find ID3v2 tags are:
1. Look for a prepended tag using the pattern found in section 3.1.
2. If a SEEK frame was found, use its values to guide further
searching.
3. Look for a tag footer, scanning from the back of the file.
For every new tag that is found, the old tag should be discarded
unless the update flag in the extended header (section 3.2) is set.
The only purpose of unsynchronisation is to make the ID3v2 tag as
compatible as possible with existing software and hardware. There is
no use in 'unsynchronising' tags if the file is only to be processed
only by ID3v2 aware software and hardware. Unsynchronisation is only
useful with tags in MPEG 1/2 layer I, II and III, MPEG 2.5 and AAC
files.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|