First of all, see the plugin's documentation about the XPath format.
A typical XPath generated for a tag looks like the following:
/HTML/BODY/DIV[@class='wp-site-blocks']/*[1][self::HEADER]/*[1][self::DIV]/*[2][self::IMG]
Note that /HTML and /HTML/BODY lack any predicates because there is no possibility for ambiguity in an HTML document at these levels. When it comes to descendants of the BODY, there is the possibility of ambiguity since there can be any number of children of any type. In the case of direct descendants of the BODY, the predicate consists of the id, role, or class attribute since at wp_body_open any number of tags may be output which causes the node index to change, resulting in /HTML/BODY/DIV[@class='wp-site-blocks']. Below this level, however, there is more consistency. So to ensure that an IMG captured during detection remains the same IMG and isn't moved around in the document due to a paragraph being added before, for example, nodes further below are selected specifying the node index with the tag name being another predicate, like *[2][self::IMG]. This XPath will stop matching a given IMG if a preceding tag is added or removed. This helps ensure that optimizations don't get incorrectly applied.
Nevertheless, there are a couple cases I'm aware of where this breaks down.
- What if someone adds a huge image
IMG to a page and it is the LCP element, and then detection occurs. If afterward, this IMG is replaced with a tiny image, the result can be that the IMG may erroneously get fetchpriority=high added to it. (Aside: One option would be to not do any optimization if there are stale URL Metrics according to the ETag, since a post update to change the image would cause the ETag to change.)
- The Gallery block has a "Randomize Order" option. While I'm not sure how popular this option is, it can cause the wrong optimizations to occur. For example, consider if the portrait image (the white stag) is the LCP element since it is the largest, in the following three page loads the order changes:
This means that an XPath which identified one IMG as LCP for one page load may incorrectly try to optimize the IMG in the same document position as LCP for the next page load.
This issue is something I also encountered as a possibility while working on the Intrinsic Dimensions plugin. If there is a Gallery of images which lack dimensions, for example, and all the images are presented in a random order, then the intrinsic dimensions captured for one IMG may incorrectly be applied to a different IMG which appears in the same document position with the next page load. To address this, I explored introducing a "srcHash" which takes the src and srcset for an image and turns it into an MD5 hash. This hash is then stored in the URL Metric with the element along with the intrinsic width and height. When the tag visitor optimizes an IMG without dimensions, it checks to see if the current srcHash matches the srcHash stored in the URL Metric. If not, it aborts.
One way to deal with all of these issues would be to include the src (and srcset?) of the IMG in its XPath. This would prevent optimizations from erroneously applying to wrong element. (This is also relevant to VIDEO which can also have a src, but which also can have SOURCE elements instead.) For images, an alternative would be to include the class name in the XPath instead.
My suspicion is that the above scenarios are edge cases that most of the time wouldn't cause an issue. I'm not sure they are significant enough to warrant safeguards, but I wanted to raise the scenario for consideration.
First of all, see the plugin's documentation about the XPath format.
A typical XPath generated for a tag looks like the following:
Note that
/HTMLand/HTML/BODYlack any predicates because there is no possibility for ambiguity in an HTML document at these levels. When it comes to descendants of theBODY, there is the possibility of ambiguity since there can be any number of children of any type. In the case of direct descendants of theBODY, the predicate consists of theid,role, orclassattribute since atwp_body_openany number of tags may be output which causes the node index to change, resulting in/HTML/BODY/DIV[@class='wp-site-blocks']. Below this level, however, there is more consistency. So to ensure that anIMGcaptured during detection remains the sameIMGand isn't moved around in the document due to a paragraph being added before, for example, nodes further below are selected specifying the node index with the tag name being another predicate, like*[2][self::IMG]. This XPath will stop matching a givenIMGif a preceding tag is added or removed. This helps ensure that optimizations don't get incorrectly applied.Nevertheless, there are a couple cases I'm aware of where this breaks down.
IMGto a page and it is the LCP element, and then detection occurs. If afterward, thisIMGis replaced with a tiny image, the result can be that theIMGmay erroneously getfetchpriority=highadded to it. (Aside: One option would be to not do any optimization if there are stale URL Metrics according to the ETag, since a post update to change the image would cause the ETag to change.)This means that an XPath which identified one
IMGas LCP for one page load may incorrectly try to optimize theIMGin the same document position as LCP for the next page load.This issue is something I also encountered as a possibility while working on the Intrinsic Dimensions plugin. If there is a Gallery of images which lack dimensions, for example, and all the images are presented in a random order, then the intrinsic dimensions captured for one
IMGmay incorrectly be applied to a differentIMGwhich appears in the same document position with the next page load. To address this, I explored introducing a "srcHash" which takes thesrcandsrcsetfor an image and turns it into an MD5 hash. This hash is then stored in the URL Metric with the element along with the intrinsicwidthandheight. When the tag visitor optimizes anIMGwithout dimensions, it checks to see if the currentsrcHashmatches thesrcHashstored in the URL Metric. If not, it aborts.One way to deal with all of these issues would be to include the
src(andsrcset?) of theIMGin its XPath. This would prevent optimizations from erroneously applying to wrong element. (This is also relevant toVIDEOwhich can also have asrc, but which also can haveSOURCEelements instead.) For images, an alternative would be to include theclassname in the XPath instead.My suspicion is that the above scenarios are edge cases that most of the time wouldn't cause an issue. I'm not sure they are significant enough to warrant safeguards, but I wanted to raise the scenario for consideration.