Pages

Overview

The pages dataset is the URL-level dataset for publishers within Sincera. It contains a significant amount of metadata about a given URL, including classification, categorization, and sentiment data for the most popular pages scanned on the platform.

Dataset

| Field | Type | Description |
| --- | --- | --- |
| id | integer | Identifier for the page object. |
| publisher_id | integer | ID of the publisher associated with the page object. |
| url | string | URL of the page that was scanned. Note that this is the full URL, unlike the domain object used in other datasets, which is a top-level domain. |
| last_slot_scan | date | Date when the page was last scanned for ad slots. |
| last_pbjs_scan | date | Date when the page was last scanned for pbjs objects. |
| scan_count | integer | Count of how many times the page has been individually scanned. |
| publisher_assets_count (new) | integer | Count of publisher assets (text, image, video) that Sincera has logged. |
| layout | string | Layout of the page, when it can be determined (article or nil). Useful for discovering content-rich pages. |
| valid_image_count | integer | Count of valid images that can be used for contextual classification. |
| invalid_image_count | integer | Count of invalid images that cannot be used for contextual classification. |
| total_images | integer | Total count of images that Sincera has found on this page / URL. |
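
As a rough illustration of how these fields might be consumed, the sketch below loads one page record into a typed structure, derives the hostname from the full URL, and flags content-rich pages using the layout field. It makes several assumptions that are not confirmed by this page: records arrive as dictionaries keyed by field name, dates are ISO 8601 strings (YYYY-MM-DD), and a nil layout arrives as None. The sample values, the `Page` class, and the helper names are hypothetical, and only a subset of the fields is modeled.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional
from urllib.parse import urlsplit


@dataclass
class Page:
    """Subset of the pages dataset fields (illustrative, not exhaustive)."""
    id: int
    publisher_id: int
    url: str
    last_slot_scan: Optional[date]
    last_pbjs_scan: Optional[date]
    scan_count: int
    layout: Optional[str]
    valid_image_count: int
    invalid_image_count: int
    total_images: int


def parse_date(value: Optional[str]) -> Optional[date]:
    # Assumption: dates are ISO 8601 strings, or missing/empty.
    return date.fromisoformat(value) if value else None


def page_from_row(row: dict) -> Page:
    """Build a Page from a dictionary keyed by dataset field name."""
    return Page(
        id=int(row["id"]),
        publisher_id=int(row["publisher_id"]),
        url=row["url"],
        last_slot_scan=parse_date(row.get("last_slot_scan")),
        last_pbjs_scan=parse_date(row.get("last_pbjs_scan")),
        scan_count=int(row["scan_count"]),
        layout=row.get("layout") or None,  # nil/empty becomes None
        valid_image_count=int(row["valid_image_count"]),
        invalid_image_count=int(row["invalid_image_count"]),
        total_images=int(row["total_images"]),
    )


def hostname(page: Page) -> Optional[str]:
    # url is the full URL, so the host has to be extracted explicitly.
    return urlsplit(page.url).hostname


def is_content_rich(page: Page, min_valid_images: int = 1) -> bool:
    # Hypothetical heuristic: an article layout with at least
    # min_valid_images images usable for contextual classification.
    return page.layout == "article" and page.valid_image_count >= min_valid_images


# Hypothetical sample record; the values are made up for illustration.
sample_row = {
    "id": 1,
    "publisher_id": 42,
    "url": "https://www.example.com/news/some-article?ref=home",
    "last_slot_scan": "2024-05-01",
    "last_pbjs_scan": "2024-04-28",
    "scan_count": 3,
    "layout": "article",
    "valid_image_count": 4,
    "invalid_image_count": 2,
    "total_images": 6,
}

page = page_from_row(sample_row)
print(hostname(page), is_content_rich(page))
```

The image threshold is an arbitrary choice for this sketch; a stricter or looser cutoff may suit a given classification workflow better.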