Publisher-based Assets
Overview
This dataset contains assets (images, text, and video) collected from a publisher’s page (URL). It is useful for building cookieless, contextual targeting solutions and for extracting raw assets to use in classification and categorization solutions.
Dataset
| Field | Type | Description |
|---|---|---|
| publisher_id | integer | ID for the publisher that the asset payload is associated with. |
| page_id | integer | ID for the page that the asset was detected on. |
| type | string | Asset type, indicating whether the asset is text or an image (image assets use the value “PublisherImageAsset”). |
| asset | string | For text assets, the text extracted from the page, scrubbed of HTML and designed to be machine readable. For image assets, a URL pointing to the source of the asset. |
| asset_alt_text | string | The descriptive alt text of the image, if found. Populated only when type == “PublisherImageAsset”. |
| asset_dimensions | string | The actual dimensions of the asset (width x height). Populated only when type == “PublisherImageAsset”. |
| asset_dimensions_on_page | string | Dimensions of the asset as it appears on the publisher’s page, which may differ from asset_dimensions (image assets only). Measured at a default resolution of 1280x1024. |
| asset_percentage_of_viewport | float | Percentage of the active viewport that the asset occupies (image assets only). |
| asset_viewability_by_resolution | json | JSON object containing the asset’s viewability scores for each client device resolution. |
| article_body | string | If the page type is “article”, the refined and processed text of the article body only (text assets only). |
| article_body_chars | integer | Number of characters in article_body (text assets only). |
| article_title | string | If the page type is “article”, the title of the article (text assets only). |
| article_excerpt | string | If the page type is “article”, an excerpt of the article (text assets only). |
| asset_collection_date | DateTime | Timestamp for when the asset was collected. |
| classification | json | If the asset has been classified via machine learning, the classification output. For both image and text assets, this includes labels, confidence scores, and brand safety data. |
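As a minimal sketch of consuming the image fields above, the hypothetical Python helpers below turn one row into plain values. They assume each row arrives as a dict keyed by the field names in the table, that dimension strings take the form 1280x1024 (optionally with spaces around the x), and that json columns may arrive either as strings or already decoded. The function names and the render_scale ratio are illustrative only and are not part of the dataset.

```python
import json
from typing import Any, Optional

# Type value the table documents for image assets.
IMAGE_TYPE = "PublisherImageAsset"


def parse_dimensions(value: Optional[str]) -> Optional[tuple[int, int]]:
    """Parse a 'width x height' string such as '1280x1024' or '300 x 250'.

    Assumption: the separator is the letter x; returns None for empty values.
    """
    if not value:
        return None
    parts = value.lower().replace(" ", "").split("x")
    if len(parts) != 2:
        raise ValueError(f"unexpected dimension format: {value!r}")
    return int(parts[0]), int(parts[1])


def load_json_field(value: Any) -> Any:
    """Decode a json column that may be a JSON string or an already-parsed value."""
    if isinstance(value, str):
        return json.loads(value) if value else None
    return value


def summarize_image_asset(row: dict[str, Any]) -> dict[str, Any]:
    """Collect the image-only fields of one row (hypothetical helper)."""
    if row.get("type") != IMAGE_TYPE:
        raise ValueError("not an image asset")
    natural = parse_dimensions(row.get("asset_dimensions"))
    on_page = parse_dimensions(row.get("asset_dimensions_on_page"))
    # Rendered width divided by actual width; None if either size is missing.
    scale = on_page[0] / natural[0] if natural and on_page and natural[0] else None
    return {
        "url": row.get("asset"),
        "alt_text": row.get("asset_alt_text"),
        "actual_size": natural,
        "on_page_size": on_page,
        "render_scale": scale,
        "viewport_pct": row.get("asset_percentage_of_viewport"),
        # JSON shapes are not specified here, so they are decoded but not interpreted.
        "viewability": load_json_field(row.get("asset_viewability_by_resolution")),
        "classification": load_json_field(row.get("classification")),
    }
```

For example, a hypothetical row with asset_dimensions "1600x900" and asset_dimensions_on_page "800x450" yields a render_scale of 0.5. Because the structure of asset_viewability_by_resolution and classification is not described in this section, the helper only decodes those fields and leaves their interpretation to the caller.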