Supervised Tag Discovery User Docs
Note: This page is targeted to end users. Detailed information for developers can be found here.
This process takes an existing set of tags and scans media files looking for tag examples that best match the source tags.
Search Within TagClasses
This feature restricts the search space to the time spans covered by an existing set of Tags.
For example, say we want to search a library of music only during vocals. We can first tag all of the vocal sections as "Vocals". When we run the Supervised search, we can choose the "Vocals" TagClass for the restriction, and the search will only run within these existing Tags.
Note: The Tags used to restrict the search are similarly limited by the Source TagSets, so Tags within other TagSets will not be used within the search.
Note: Be sure that the TagClass you are restricting within ("Vocals" above) is NOT one of the Discovery Classes. Otherwise, the overlap-prevention check will stop any Tags from being created.
Note: This restricts the search only on time; it ignores any Min/Max Frequency information from the Tags.
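The time-only restriction described above can be sketched in a few lines of Python. This is a minimal illustration, not the system's actual implementation: the names `merge_intervals` and `candidate_allowed` are hypothetical, Tags are reduced to (start, end) pairs in seconds, and the rule that a candidate must fall entirely inside the restricting Tags is an assumption.

```python
from typing import List, Tuple

# A Tag reduced to its time span in seconds. Frequency bounds are left out
# on purpose, because the restriction ignores Min/Max Frequency.
Interval = Tuple[float, float]


def merge_intervals(intervals: List[Interval]) -> List[Interval]:
    """Merge overlapping or touching intervals into a sorted, disjoint list."""
    merged: List[Interval] = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            # Extend the previous interval instead of starting a new one.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def candidate_allowed(candidate: Interval, restriction: List[Interval]) -> bool:
    """Return True if the candidate lies entirely inside the restricting Tags.

    Assumption: full containment is required. The real system may use a
    different rule, such as any overlap.
    """
    c_start, c_end = candidate
    return any(
        r_start <= c_start and c_end <= r_end
        for r_start, r_end in merge_intervals(restriction)
    )


# Hypothetical "Vocals" Tags for one file, in seconds.
vocals = [(10.0, 20.0), (18.0, 30.0), (45.0, 50.0)]
inside = candidate_allowed((12.0, 25.0), vocals)   # within the merged 10-30 span
outside = candidate_allowed((28.0, 46.0), vocals)  # crosses an untagged gap
```

Merging the restricting Tags first means that a candidate spanning two overlapping "Vocals" Tags is still treated as being inside the tagged time.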
Some text copied from the HiPSTAS Wiki.
|Name||Yes||Not used internally. This is meant as a bookkeeping item for later reference by the user. For example, a researcher could name each task after the experiment number written in their notes. This makes it easy to find the task and review the settings later.|
|Discovery Classes||Yes||The TagClasses to be used for discovery. To search multiple classes at once, select more than one TagClass.|
|Source TagSets||Yes||Which TagSets are used for the source tags.
For example, suppose we have 3 TagClasses: Singing, Speaking, and Silence. We also have 3 TagSets, one per person who tagged: Tony, David, and Michael.
You can select "Singing" and "Michael" to run the search with only the Tags that Michael tagged as Singing. Alternatively, you can select "Singing" and all TagSets to use examples tagged "Singing" by anyone.|
|Destination TagSet||Yes||This parameter designates the TagSet destination for the resulting created machine tags.|
|Number of Tags to Discover||Yes||How many machine tags to discover per file, per source tag. Choosing "1" tells the system to select only one top match per tag, per file. The total number of new tags will be up to (Number of Tags to Discover * Number of Files * Number of Source Tags)|
|Spectra Weight||Yes||This parameter in part defines the similarity function used for comparing the spectra. This weight and the Pitch Weight, Pitch Energy Weight, and Average Energy Weight parameters must add up to 1. Each pixel of a spectrum shows how much acoustic energy is present in one frequency band during one Time Frame. The spectra, therefore, are two-dimensional matrices of numbers (without processing) mapped to colors for display. This parameter designates how much weight the comparison of these two-dimensional arrays should have in the similarity function. Users should experiment with different combinations of these settings to see which gives the highest accuracy for the problem at hand. More weight on this parameter is useful for general matches on both pitch and rhythm. A good starting point is to set this parameter to "1" and the other three to "0".|
|Average Energy Weight||Yes||This parameter in part defines the similarity function used for comparing the spectra. This weight and the Spectra Weight, Pitch Weight, and Pitch Energy Weight parameters must add up to 1. This parameter designates how much weight the average energy changes over time should have in the similarity function. More weight on this parameter is useful for identifying spectra with similar rhythms.|
|Pitch Weight||Yes||This parameter in part defines the similarity function used for comparing the spectra. This weight and the Spectra Weight, Pitch Energy Weight, and Average Energy Weight parameters must add up to 1. This parameter designates how much weight the pitch trace should have in the similarity function. The pitch trace is how the frequency band with maximum energy changes over time. More weight on this parameter is useful for identifying differences in higher or lower notes on the spectra, or the melody of a person's speech in a lower or higher voice.|
|Pitch Energy Weight||Yes||This parameter in part defines the similarity function used for comparing the spectra. This weight and the Spectra Weight, Pitch Weight, and Average Energy Weight parameters must add up to 1. For each Time Frame, the system finds the band with the highest energy. Pitch energy measures how that maximum energy value changes over time. This parameter designates how much weight that measurement should have in the similarity function. More weight on this parameter is useful for identifying changes in audio volume.|
|Min Match Performance||Yes||This specifies the minimum Tag Strength allowed to save new Tags. If the best candidates are below this threshold, they will not be saved. Therefore, it is possible that fewer than Number of Tags to Discover will actually be saved. 0 practically disables this (all Tags saved) and 1 would indicate a perfect match. Practical values tend to be around 0.5 - 0.75 but depend upon the specific project.|
|Number of Frequency Bands||Yes||This parameter designates the number of divisions given to a spectrum between Min. Frequency and Max. Frequency. Each band can be thought of as a tuning fork or a human inner ear hair. Reasonable values are between 1 and 3500, because 3500 is the approximate number of hairs in a human inner ear and represents what a human being could possibly hear. The more bands you use, the higher the matching resolution, but the more computing power it takes, resulting in a slower response.|
|Number Time Frames Per Second||Yes||This parameter designates how often to sample the energy of each hair or tuning fork. The sum of potential energy and kinetic energy for each band is always a constant. For example, if you are trying to identify a quickly changing audio event such as syllables, a reasonable parameter would be approximately 100 times per second or more. If you are trying to identify a sustained audio event such as applause or background noise, a reasonable parameter could be 1 sample per second.|
|Damping Ratio||Yes||This parameter represents how quickly the band responds to changes in the audio event. This is equivalent to designating how much "drag" is on the tuning fork or human hair, or how quickly the tuning fork will stop ringing. It ranges between 0 and 1. A high damping ratio (0.9) is best for examining quick rhythmic features. A low damping ratio (0.001) is better for detecting a faint, steady sound such as a mechanical hum or a fan in the background that does not change pitch.|
|Min. Frequency||Yes||Of the Frequency Bands you have chosen, this parameter represents the lowest value of vibrations per second (Hz) on the range of bands. In other words, this is the lowest audio frequency to which a tuning fork or hair would respond. E.g., the lowest key on a piano is 27.5 Hz. See http://en.wikipedia.org/wiki/Piano_key_frequencies.|
|Max. Frequency||Yes||Of the Frequency Bands you have chosen, this parameter represents the highest value of vibrations per second on the range of bands. In other words, this is the highest audio frequency to which a tuning fork or hair would respond. E.g., the highest key on a piano is 4,186 Hz. See http://en.wikipedia.org/wiki/Piano_key_frequencies.|
|Search Within TagClasses||Optional||Restricts the search to times within all Tags of this TagClass.|
|Save Best 'N' Tags Per File||Optional||(0 or absent parameter disables this feature) Save only the best 'N' Tags per file, regardless of TagClass.|
|Save Best 'N' Tags Per Class Per File||Optional||(0 or absent parameter disables this feature) Save only the best 'N' Tags per TagClass per file.|
|Num Random Probes||Deprecated||This parameter designates how many samples per file the system will search. A large value, yielding many matches, would be 1,000,000. Please note that more probes will take the system more time to process.|
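To make the interaction between the four weights, Min Match Performance, and Number of Tags to Discover more concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not the system's actual code: the function names and dictionary keys are hypothetical, each per-feature similarity is assumed to be a score between 0 and 1, and the similarity function is assumed to be a simple weighted sum.

```python
from typing import Dict, List

# Hypothetical keys standing in for Spectra Weight, Pitch Weight,
# Pitch Energy Weight, and Average Energy Weight.
WEIGHT_NAMES = ("spectra", "pitch", "pitch_energy", "average_energy")


def combined_similarity(weights: Dict[str, float], scores: Dict[str, float]) -> float:
    """Combine four per-feature scores into one value.

    Assumption: the similarity function is a weighted sum. The docs only
    state that the four weights add up to 1.
    """
    total = sum(weights[name] for name in WEIGHT_NAMES)
    if abs(total - 1.0) > 1e-9:
        raise ValueError(f"weights must add up to 1, got {total}")
    return sum(weights[name] * scores[name] for name in WEIGHT_NAMES)


def select_tags(strengths: List[float], number_to_discover: int,
                min_match_performance: float) -> List[float]:
    """Keep the strongest candidates, then drop those below the threshold."""
    best = sorted(strengths, reverse=True)[:number_to_discover]
    return [s for s in best if s >= min_match_performance]


def max_new_tags(number_to_discover: int, number_of_files: int,
                 number_of_source_tags: int) -> int:
    """Upper bound on the number of new tags created by one task."""
    return number_to_discover * number_of_files * number_of_source_tags


# Example values chosen only for illustration.
weights = {"spectra": 0.5, "pitch": 0.25, "pitch_energy": 0.25, "average_energy": 0.0}
scores = {"spectra": 0.8, "pitch": 0.6, "pitch_energy": 0.4, "average_energy": 0.9}
strength = combined_similarity(weights, scores)

# Three tags requested, but only two candidates reach the 0.65 threshold.
saved = select_tags([0.9, 0.4, 0.7, 0.6], number_to_discover=3,
                    min_match_performance=0.65)
upper_bound = max_new_tags(1, 10, 5)
```

The `select_tags` call shows why fewer tags than Number of Tags to Discover may be saved: candidates are ranked first, and any that fall below Min Match Performance are then discarded.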