Monday, August 19, 2013

Google Referrer Query Strings Debunked Part 4

In Part 3, we learned that the ved query parameter is actually protobuf-encoded and therefore represents a message. I also provided a little script that decodes the ved structure. What's left to do now is to deduce proper names for the variables.

As a reminder, here is the output from the script from Part 3:

…
v1: 58
v2: 22
v6: 3

v1: 67
v2: 22
v6: 4
v7: 10

v1: 67
v2: 22
v6: 4
v7: 10

v1: 6
v2: 22
v5: 2
…

Parameters v6 and v7 are quite easy to deduce: they correspond to the parameters r and s of the plain-text variant. My guess is that r and s stand for [r]esult_position and [s]tart. Both are zero-based indices: start marks the page on which the result appeared, and result_position is the position of the visited result on that page. You can calculate the absolute position simply via absolute_position = start + result_position.
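To make the arithmetic concrete, here is a minimal sketch using the values of the second sample message above (v6 = 4, v7 = 10). The function name is only illustrative, and it assumes my guess about r and s is right.

```python
def absolute_position(start: int, result_position: int) -> int:
    """Zero-based rank of the visited result across all result pages."""
    return start + result_position


# Second sample message from the Part 3 output: v6 (result_position) = 4,
# v7 (start) = 10.
print(absolute_position(start=10, result_position=4))  # 14
```

Because both indices are zero-based, a value of 14 would mean the fifteenth result overall.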

v5 could be called sub_link_position. It is also a zero-based index and denotes the position of a related link.

Here are some screenshots to make the matter a little clearer:
sub_link_position in a search result


sub_link_position in related searches

Then we have two parameters that are called i and t in the plain form. My guess is that t stands for type. On normal web searches this parameter is always 22; on image search it is 429.

The parameter i is an interesting one. It is monotonically increasing on each page of search results, and it changes depending on whether you're logged in or not. When searching for "grumpy cats" while logged into a Google account, i was 42 for the first result. When not logged in, it was 57 (at least in my test case).

Sure enough, the results are ordered differently depending on whether you are logged in. So maybe the i parameter is somehow related to personalised search? Let's say i is a value that denotes the relevance of the search result to the user. I'll call it index_boost. More research is needed here too; please comment if you find out more.

That makes five parameters: index_boost, type, sub_link_position, result_position and start. As mentioned earlier, protobuf messages are key-value pairs. The key in this case is a positive (non-zero) integer. The parameters I've found so far have the keys 1, 2, 5, 6 and 7, and none of the veds I've encountered had any other key set. It is possible that Google deprecated earlier parameters of the ved message, which would explain the gap at 3 and 4. Again, more research is needed here. It would be pretty awesome if some of you could provide huge dumps of veds for more digging.
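The mapping from keys to names can be sketched as a small decoder for the raw protobuf bytes. This is not the script from Part 3, just a minimal illustration. It assumes that all five fields use the protobuf varint wire type (wire type 0), and the sample bytes are hand-built to match the second sample message (v1 = 67, v2 = 22, v6 = 4, v7 = 10), not taken from a real ved.

```python
from typing import Dict, Tuple

# Field names as guessed in this post; the numbers are the protobuf keys.
FIELD_NAMES = {
    1: "index_boost",
    2: "type",
    5: "sub_link_position",
    6: "result_position",
    7: "start",
}


def read_varint(data: bytes, pos: int) -> Tuple[int, int]:
    """Read one base-128 varint starting at pos; return (value, next_pos)."""
    result = 0
    shift = 0
    while True:
        byte = data[pos]
        pos += 1
        result |= (byte & 0x7F) << shift  # lower 7 bits carry the payload
        if not byte & 0x80:               # high bit clear: last byte
            return result, pos
        shift += 7


def decode_ved_message(data: bytes) -> Dict[str, int]:
    """Decode a message that consists only of varint fields."""
    fields = {}
    pos = 0
    while pos < len(data):
        key, pos = read_varint(data, pos)
        field_number, wire_type = key >> 3, key & 0x07
        if wire_type != 0:
            raise ValueError("unexpected wire type %d" % wire_type)
        value, pos = read_varint(data, pos)
        # Unknown keys keep the generic vN name used in the Part 3 output.
        name = FIELD_NAMES.get(field_number, "v%d" % field_number)
        fields[name] = value
    return fields


# Hand-encoded sample: v1 = 67, v2 = 22, v6 = 4, v7 = 10
sample = bytes([0x08, 0x43, 0x10, 0x16, 0x30, 0x04, 0x38, 0x0A])
print(decode_ved_message(sample))
```

Any key that is not in the table falls back to its vN name, so a dump of veds decoded this way would immediately show whether keys such as 3 or 4 ever occur.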

I am now also able to generate valid veds. Because I've compared the generated ones with existing ones and found no difference, I am quite confident that I've captured all parameters (in my dataset).
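Generating works the other way round. The following is a minimal sketch and not the actual generator: it again assumes that every field is a protobuf varint, writes the fields in ascending key order, and produces only the raw protobuf bytes. How those bytes are wrapped into the final ved string is left out here.

```python
from typing import Dict

# Guessed names mapped to their protobuf keys (see above).
FIELD_NUMBERS = {
    "index_boost": 1,
    "type": 2,
    "sub_link_position": 5,
    "result_position": 6,
    "start": 7,
}


def encode_varint(value: int) -> bytes:
    """Encode a non-negative integer as a base-128 varint."""
    if value < 0:
        raise ValueError("only non-negative integers are supported")
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7
        if value:
            out.append(byte | 0x80)  # more bytes follow
        else:
            out.append(byte)
            return bytes(out)


def encode_ved_message(params: Dict[str, int]) -> bytes:
    """Serialize the given parameters as varint fields in key order."""
    unknown = set(params) - set(FIELD_NUMBERS)
    if unknown:
        raise KeyError("unknown parameters: %s" % sorted(unknown))
    out = bytearray()
    for name in sorted(params, key=FIELD_NUMBERS.get):
        out += encode_varint(FIELD_NUMBERS[name] << 3)  # wire type 0
        out += encode_varint(params[name])
    return bytes(out)


message = encode_ved_message(
    {"index_boost": 67, "type": 22, "result_position": 4, "start": 10}
)
print(message.hex())  # 084310163004380a
```

Comparing such output byte for byte with the payload of a captured ved is the kind of check described above: if both match, no parameter is missing.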

You're invited to try out the online demo of the decoder or pull the source from GitHub. Comments and pull requests are highly welcome. Please keep me posted if you find out something new.

That's all for now
-- Benjamin

Comments:

  1. Hi Benjamin, very interesting read. I am very curious whether we will one day find the search terms hidden in the variables.

  2. Hi Benjamin,

    Great article - thanks for sharing. I've been looking at the parameters too and wonder why no one has used the protobuf method to decode the usg parameter. Do you have any ideas why this parameter's been largely ignored?

    1. Hi,

      I took a look at the other parameters this way too, but nothing fruitful came out of it. My current assumption is that the usg parameter is some kind of hash to prevent link spoofing.

      best
      -- Benjamin

  3. If you're right, this is brilliant. I knew simple text parsing wasn't the answer (and it was failing on complex queries), but I doubted we'd be able to reverse engineer Google's encoding. I can pull a ton of VED examples from complex queries, but I'm having trouble getting your web version to work (it seems to be off the left-hand side of the screen for me).

  4. [SOLVED] Google parameters finally revealed.
    http://revealing-google-parameters.blogspot.com

  5. Hi Ben, have you done any further research into the VED? The only way I've been able to generate them so far is on image search, and they're a lot longer than the ones in this post. Would be great to see what information they hold now.
