<h1>Plug-based authorisation for Elixir and Phoenix</h1>
<p>Fernando Mendes, the bloggest of mendes (https://blog.mendes.codes), 2020-04-17</p>
<p><em>Note: this post was originally published on the <a href="https://subvisual.com/blog">Subvisual blog</a>.</em></p>
<p>Some years ago, most of us here at Subvisual got <em>really-perhaps-a-bit-too-much</em> into Elixir. Ever since then, whenever we are free to choose the technology to work with, we’ve pretty much been going Elixir all the way.</p>
<p>We learned a lot. We laughed a lot. And I copied and pasted some code between different projects a lot. Don’t tell the rest of the development team. <em>Aaaaanyway</em>, I finally got around to open sourcing the copy/pasted code and releasing it as a package.</p>
<p>I called this thingy <a href="https://github.com/subvisual/dictator">Dictator</a>. It implements a plug-based authorisation system and allows you to <em>dictate (get it??)</em> what your users can access, by defining <em>policies (hah! get it??)</em>. You can be as granular as you want and override pretty much everything. <strong>The philosophy behind it is to implement sane defaults but be easily overridable as well. You might even call it <em>convention over configuration</em>.</strong> Enough chit-chat, let’s showcase it.</p>
<h2 id="how-to-use-the-thing_2">How to use the thing <a class="head_anchor" href="#how-to-use-the-thing_2">#</a>
</h2>
<p><em>very important pre-condition: it assumes you have a <code class="prettyprint">current_user</code> or <code class="prettyprint">current_resource</code> or similar in your <code class="prettyprint">conn.assigns</code></em></p>
<p><strong>Dictator uses the concept of <em>policy</em>, which is a set of rules you implement to determine what actions your users can take.</strong> To do that, you just define a <code class="prettyprint">can?/3</code> function, which receives the current user as the first argument, the action (<code class="prettyprint">:new</code>, <code class="prettyprint">:index</code>, and so on) as the second and finally the resource being accessed as the third. Loading of all those is automagically handled for you.</p>
<p>Let’s assume you want to define a <code class="prettyprint">Post</code> policy:</p>
<pre><code class="prettyprint lang-elixir"># lib/client_web/policies/post.ex
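defmodule ClientWeb.Policies.Post do
  # assumed location of the User schema, adjust to wherever yours lives
  alias Client.Accounts.User
  alias Client.Content.Post

  use Dictator.Policy, for: Post

  # owners can show, edit, update and delete their own posts
  def can?(%User{id: user_id}, action, %Post{user_id: user_id})
      when action in [:edit, :update, :delete, :show], do: true

  # anyone can list posts and create new ones
  def can?(_, action, _) when action in [:index, :new, :create], do: true

  # everything else is denied
  def can?(_, _, _), do: false
end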
</code></pre>
<p>In this scenario our users can view, edit, update and delete their own things. But anyone can index and create things, even if they don’t belong to them. The last <code class="prettyprint">can?/3</code> function branch prevents users from viewing, editing, updating or deleting posts that don’t belong to them.</p>
<p>This scenario is <em>so common</em> across different resources and projects I had, that I extracted it to a <code class="prettyprint">Standard</code> policy. To do the above, you can just do the following:</p>
<pre><code class="prettyprint lang-elixir"># lib/client_web/policies/post.ex
defmodule ClientWeb.Policies.Post do
  alias Client.Content.Post
  use Dictator.Policies.Standard, for: Post
end
</code></pre>
<p>This is a prime example of what I had in mind when building and extracting the code from previous projects: implement the most common use cases and allow edge cases to be overridden.</p>
<p>Once you have defined a policy, <strong>simply <code class="prettyprint">plug</code> in <code class="prettyprint">Dictator.Plug.Authorize</code> and it will even infer the policy</strong> to use (provided some details explained below, but we’ll get to that).</p>
<pre><code class="prettyprint lang-elixir"># lib/client_web/controllers/post_controller.ex
defmodule ClientWeb.PostController do
  use ClientWeb, :controller
  plug Dictator.Plug.Authorize
  # ...
end
</code></pre>
<p>Tadaaaaaaa. Half-a-dozen lines of code and you’re already bossing around your users. Screw <em>the user is always right</em>, we dictatin’ everything ‘round 'ere.</p>
<p>Well, it seems that so far Dictator does a lot of magic behind the scenes, but fear not. We’ll go through how it loads resources, how it figures out the correct policy, how it determines which action the user is attempting and how we can override the stuff it uses. You and me, on a magic trip across the Land Of Code as if we were building <code class="prettyprint">dictator</code> from scratch.</p>
<h2 id="how-the-thing-loads-resources_2">How the thing loads resources <a class="head_anchor" href="#how-the-thing-loads-resources_2">#</a>
</h2>
<p>The first thing we need to do when enforcing a policy is to <strong>figure out what the hell we are dealing with</strong>. This means figuring out what resource the user wants to access, what action they want to take and what specific policy decides if they can or cannot perform said action.</p>
<p>So let’s start with getting the correct resource. The first piece of the puzzle we need is the module that defines the resource being accessed. Well, that’s easy, <strong>when defining the policy the developer needs to specify what resource it is referring to</strong>:</p>
<pre><code class="prettyprint lang-elixir">defmodule ClientWeb.Policies.Post do
  alias Client.Content.Post
  use Dictator.Policy, for: Post
  # ...
end
</code></pre>
<p>Nice work! What a team you and me are! So now we need to <strong>get the correct repo</strong>. If you dive into <a href="https://github.com/subvisual/dictator/blob/4e1050a66718bda73aa2d510950240f5b68c4feb/lib/dictator/policy.ex"><code class="prettyprint">policy.ex</code></a>, you’ll find out what a pair of lazy cheaters you and me are. We try two things and then give up.</p>
<p>First we try to use the namespace and see if that module exists (<code class="prettyprint">get_repo_from_namespace/1</code>). If you are defining a policy for <code class="prettyprint">Client.Content.Post</code> most of the time you’ll have a <code class="prettyprint">Client.Repo</code>. So let’s just check if that exists and hope for the best. If that doesn’t work, well, we can just use the <code class="prettyprint">:ecto_repo</code> config that we are required to have when using <code class="prettyprint">Ecto</code> and hope there is only one <code class="prettyprint">Repo</code> defined (<code class="prettyprint">get_repo_from_application/1</code>).</p>
<p>Sometimes this isn’t the case. Sometimes our web apps need multiple repos or we even accidentally choose the wrong one (e.g. in the first scenario, if the developer has defined multiple repos we may end up with the wrong one). We really can’t figure out what the developer wants in those cases. Instead let’s just be lazy, raise an error and <strong>ask the developer to specify the repo via the <code class="prettyprint">:repo</code> key</strong>:</p>
<pre><code class="prettyprint lang-elixir">defmodule ClientWeb.Policies.Post do
  alias Client.Content.Post
  use Dictator.Policy, for: Post, repo: Client.MyFunkyWeirdRepo
  # ...
end
</code></pre>
<p>At this point we have the repo and module for the resource the user is trying to access. We also know the params of the HTTP call. So now we just need to call <code class="prettyprint">repo.get(module, params["id"])</code>. Now, <strong>this assumes the resource has a primary key named <code class="prettyprint">id</code></strong>. For the large majority of the resources we code, this happens to be true and we can default to that. However, developers like to get picky and use different primary keys. We’ll need to <strong>accept a <code class="prettyprint">:key</code> option</strong>:</p>
<pre><code class="prettyprint lang-elixir">defmodule ClientWeb.Policies.Post do
  alias Client.Content.Post
  use Dictator.Policy, for: Post, key: :uuid
  # ...
end
</code></pre>
<p>Note that this assumes the key has the same name in the HTTP call params hash. If we have <code class="prettyprint">id</code> as the primary key, we expect the params hash to be <code class="prettyprint">%{"id" => id}</code>. If it’s <code class="prettyprint">uuid</code>, we expect it to be <code class="prettyprint">%{"uuid" => uuid}</code>. This logic is defined in the <a href="https://github.com/subvisual/dictator/blob/4e1050a66718bda73aa2d510950240f5b68c4feb/lib/dictator/policy.ex#L47-L57"><code class="prettyprint">load_resource/1</code> function</a>.</p>
<p>But, we, developers, like to complicate things. Sometimes the primary key might be <code class="prettyprint">uuid</code> but the HTTP param might be named something different. Sometimes we like to feel smart and have composite primary keys. Well, that’s too much of a hassle to handle and there are way too many edge cases. <strong>Let’s just allow the <code class="prettyprint">load_resource/1</code> function to be overridable and say “heh, developers can handle it”:</strong></p>
<pre><code class="prettyprint lang-elixir">defmodule ClientWeb.Policies.Post do
  alias Client.Content.Post
  alias Client.Repo

  use Dictator.Policy, for: Post

  def load_resource(params) do
    Repo.get_by(Post, uuid: params["uuid"], id: params["id"])
  end
  # ...
end
</code></pre>
<p>You can see that we allow the function to be overridden in the same <a href="https://github.com/subvisual/dictator/blob/4e1050a66718bda73aa2d510950240f5b68c4feb/lib/dictator/policy.ex"><code class="prettyprint">policy.ex</code></a> file, through the <code class="prettyprint">defoverridable</code> call.</p>
<p>Let’s recap. At this point we know how to find repos, load resources and we’ve allowed developers that use our library to have a bunch of options when <code class="prettyprint">use</code>-ing <code class="prettyprint">Dictator.Policy</code>:</p>
<ul>
<li><strong><code class="prettyprint">:repo</code> allows them to specify which repo to use to load resources.</strong></li>
<li><strong><code class="prettyprint">:key</code> allows them to specify a different primary key for the resource.</strong></li>
<li><strong><code class="prettyprint">load_resource/1</code> is overridable to allow complex queries.</strong></li>
</ul>
<p>Time to move along to how this Dictator thingy calls the police.</p>
<h2 id="how-the-thing-calls-the-delpolicedel-policy_2">How the thing calls the <del>police</del> policy <a class="head_anchor" href="#how-the-thing-calls-the-delpolicedel-policy_2">#</a>
</h2>
<p>The next step on our tour is a detour (<em>get it?? I’m on fire today</em>) to <a href="https://github.com/subvisual/dictator/blob/5c870ec348b9d18dc14770613dda5c9581763e36/lib/dictator/plug/authorize.ex"><code class="prettyprint">plug/authorize.ex</code></a>, specifically the <a href="https://github.com/subvisual/dictator/blob/5c870ec348b9d18dc14770613dda5c9581763e36/lib/dictator/plug/authorize.ex#L36-L50"><code class="prettyprint">extract_policy_module/1</code></a> function. The trick to inferring the correct policy is very obvious: use private Phoenix stuff that may or may not be in the documentation and get the controller from that. Obviously. We then use that to generate the policy module. If the controller is <code class="prettyprint">ClientWeb.PostController</code>, we’ll transform it to <code class="prettyprint">ClientWeb.Policies.Post</code>.</p>
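<p>The naming convention itself is nothing more than a string rewrite. Dictator does this in Elixir inside the plug; purely as an illustration, and assuming the controller sits directly under the web namespace, the transformation can be sketched in shell (<code class="prettyprint">policy_for_controller</code> is a made-up helper name, not part of the library):</p>
<pre><code class="prettyprint lang-shell"># Hypothetical helper: derive the policy module name from a controller name.
# It mirrors the convention described in the post, not Dictator's actual code.
policy_for_controller() {
  controller="$1"
  namespace="${controller%.*}"     # everything before the last dot, e.g. ClientWeb
  name="${controller##*.}"         # last segment, e.g. PostController
  resource="${name%Controller}"    # drop the Controller suffix, e.g. Post
  printf '%s.Policies.%s\n' "$namespace" "$resource"
}

policy_for_controller "ClientWeb.PostController"
</code></pre>
<p>For the controller used throughout this post, that prints <code class="prettyprint">ClientWeb.Policies.Post</code>.</p>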
<p>With that in mind, we can again rely on the developers to be picky and define shared policies or to want to reuse them or do weird developer stuff. Which means that they’ll need an override option. Luckily we can easily arrange it. When we are plugging the policy into the controller, <strong>developers can provide a <code class="prettyprint">:policy</code> key</strong> and we’ll only call <code class="prettyprint">load_policy/1</code> if the key isn’t present:</p>
<pre><code class="prettyprint lang-elixir"># lib/client_web/controllers/post_controller.ex
defmodule ClientWeb.PostController do
  use ClientWeb, :controller
  plug Dictator.Plug.Authorize, policy: ClientWeb.Policies.Content
  # ...
end
</code></pre>
<p>We’ve covered how to load resources and how to select the policy. But we’re missing a couple of things: how to get the current user and how to get the action.</p>
<h2 id="how-the-thing-interacts-with-phoenix_2">How the thing interacts with Phoenix <a class="head_anchor" href="#how-the-thing-interacts-with-phoenix_2">#</a>
</h2>
<p>Starting with the current user, let’s once again be lazy: <strong>we assume there’s a <code class="prettyprint">current_user</code> in the <code class="prettyprint">conn.assigns</code></strong>. Most of the time there will be. Of course, them developers will not always call it that, so we can - guess what? - give them an <strong>overridable <code class="prettyprint">:resource_key</code> option</strong> when they’re <code class="prettyprint">plug</code>-ing the policy in the controller. If your current user in <code class="prettyprint">conn.assigns</code> is called <code class="prettyprint">current_resource</code>, you can do:</p>
<pre><code class="prettyprint lang-elixir"># lib/client_web/controllers/post_controller.ex
defmodule ClientWeb.PostController do
  use ClientWeb, :controller
  plug Dictator.Plug.Authorize, resource_key: :current_resource
  # ...
end
</code></pre>
<p>All we need now is the action. <a href="https://github.com/subvisual/dictator/blob/5c870ec348b9d18dc14770613dda5c9581763e36/lib/dictator/plug/authorize.ex"><code class="prettyprint">authorize.ex</code></a> has the answer for that: use private Phoenix stuff, again. <code class="prettyprint">conn.private.phoenix_action</code>, ez-pz.</p>
<p>For the sake of sanity, let’s add one final option: <strong><code class="prettyprint">:only</code>, which specifies the actions for which to enforce the policy</strong>. By default, we enforce the policy on all them actions. But a developer might want to only call a policy for the <code class="prettyprint">new</code> action:</p>
<pre><code class="prettyprint lang-elixir"># lib/client_web/controllers/post_controller.ex
defmodule ClientWeb.PostController do
  use ClientWeb, :controller
  plug Dictator.Plug.Authorize, only: [:new]
  # ...
end
</code></pre>
<p>We finally have the current user, the action they want to take, the policy to be enforced. All we have to do in our <code class="prettyprint">Authorize</code> plug is to <a href="https://github.com/subvisual/dictator/blob/5c870ec348b9d18dc14770613dda5c9581763e36/lib/dictator/plug/authorize.ex#L29-L34">call <code class="prettyprint">policy.can?(user, action, resource)</code></a> and if they can, return an unchanged <code class="prettyprint">conn</code>. If not, well, 401 it and halt everything.</p>
<p>The logic for all these tricks is straightforward and the whole project boils down to two relevant modules (<code class="prettyprint">Dictator.Plug.Authorize</code> and <code class="prettyprint">Dictator.Policy</code>) with a staggering total of 141 lines of code. Isn’t Elixir awesome?</p>
<h2 id="overrides-for-the-standard-policy_2">Overrides for the Standard Policy <a class="head_anchor" href="#overrides-for-the-standard-policy_2">#</a>
</h2>
<p>I mentioned in the beginning of this post that there’s a very common scenario: when the developer wants to allow users to edit, update and delete their own resources and everyone to read or create new posts.</p>
<p>For that, Dictator comes bundled with the <code class="prettyprint">Dictator.Policies.Standard</code> policy. However, this policy makes two assumptions:</p>
<ol>
<li>the primary key of the user trying to access is <code class="prettyprint">id</code>
</li>
<li>the foreign key of the resource being accessed is <code class="prettyprint">user_id</code>
</li>
</ol>
<p>Of course, this doesn’t happen all the time. So when <code class="prettyprint">use</code>-ing the <code class="prettyprint">Standard</code> policy, developers have these corresponding override options:</p>
<ol>
<li>
<strong><code class="prettyprint">owner_key</code> (e.g. if your user has a <code class="prettyprint">uuid</code> field as primary key instead of <code class="prettyprint">id</code>)</strong>.</li>
<li>
<strong><code class="prettyprint">foreign_key</code> (e.g. if your resource has a <code class="prettyprint">manager_id</code> instead of <code class="prettyprint">user_id</code> as the foreign key in the relation)</strong>.</li>
</ol>
<h2 id="in-summary_2">In Summary <a class="head_anchor" href="#in-summary_2">#</a>
</h2>
<p>Lots of stuff happening, small number of codes. Elixir awesome. Demo <a href="https://github.com/subvisual/dictator_demo">here</a>. Please contribute to project: <a href="https://github.com/subvisual/dictator">subvisual/dictator</a>.</p>
<p><a href="https://www.youtube.com/watch?v=VvPaEsuz-tY">Have a nice.</a></p>
<p>Mendes</p>
<h1>jobs and timers in neovim: how to watch your builds fail</h1>
<p>2019-06-07</p>
<p><em>Note: this blog post was originally written for the <a href="https://medium.com/subvisual">Subvisual blog</a>. You can find the original <a href="https://medium.com/subvisual/jobs-and-timers-in-neovim-how-to-watch-your-builds-fail-f18931f2ffb6">here</a></em>.</p>
<p>If you’re like me (and for your own sake, I truly hope you are not), you<br>
probably tend to have a lot of builds fail. Even worse, if you <strong>really</strong> are like<br>
me, you spend most of your time in vim.</p>
<p>If that is not the case, you’re in the clear, there’s nothing wrong with you,<br>
feel free to go, end this blog post now, be free, happy, enjoy the sunlight and<br>
the birds and the trees. Life is good.</p>
<p>.</p>
<p>.</p>
<p>.</p>
<p>.</p>
<p>.</p>
<p>.</p>
<p>.</p>
<p>.</p>
<p>.</p>
<p>.</p>
<p>.</p>
<p>.</p>
<p>… Are we, the sadists, all alone now? Cool. Ok, so you use vim a lot and you make builds<br>
fail. Chances are you would like to know when that happens without ever leaving vim. It’s alright. I got you, mate.</p>
<p>Here’s an asciicast of my nvim. Notice how the status bar includes, on the bottom<br>
right, the status of the CI. Notice how it updates. Damn, that’s neat. You want<br>
that.</p>
<p><a href="https://asciinema.org/a/5ynHiyckpQmQP7oWYI6HsVKKI"><img src="https://asciinema.org/a/5ynHiyckpQmQP7oWYI6HsVKKI.svg" alt="asciicast"></a></p>
<p>First things first, either make an API wrapper, preferably in Rust or Go,<br>
something compiled and fancy, that allows you to check the <a href="https://developer.github.com/v3/checks/">GitHub checks<br>
API</a>. Got it? Good. Now stop being a muppet and use <a href="https://hub.github.com/">hub</a><br>
instead.</p>
<p>Now that you have <code class="prettyprint">hub</code>, you can make use of the <code class="prettyprint">hub ci-status</code> command.</p>
<pre><code class="prettyprint lang-shell">$ hub ci-status
success
</code></pre>
<p>Coolio.</p>
<p>Now let’s change our custom status bar.</p>
<p>First, we want to check if we’re in a git project:</p>
<pre><code class="prettyprint lang-vim">" system() returns the command's output; the exit status is in v:shell_error
call system("git rev-parse --git-dir 2> /dev/null")
if v:shell_error == 0
  " call hub
endif
</code></pre>
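<p>Worth knowing: what tells you whether you are inside a git project is the exit status of <code class="prettyprint">git rev-parse</code>, not the text it prints. A minimal shell sketch of the same check, assuming <code class="prettyprint">git</code> is on your path (<code class="prettyprint">in_git_repo</code> is a hypothetical helper name):</p>
<pre><code class="prettyprint lang-shell"># Succeeds (exit status 0) only when the given directory is inside a git project.
in_git_repo() {
  ( cd "$1" 2>/dev/null && git rev-parse --git-dir >/dev/null 2>&1 )
}

if in_git_repo "."; then
  echo "in a git project"
else
  echo "not a git project"
fi
</code></pre>
<p>The subshell keeps the <code class="prettyprint">cd</code> from leaking into your current shell.</p>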
<p>So now we need to call <code class="prettyprint">hub</code>. However just doing a <code class="prettyprint">system</code> call to <code class="prettyprint">hub</code> would<br>
be a blocking operation and we don’t want our vim to block every few<br>
moments for like 5 seconds. So let’s use <code class="prettyprint">jobstart</code>.</p>
<p>Start by calling <code class="prettyprint">:h jobstart</code> from your (n)vim. You can see that it runs an<br>
asynchronous job and it supports shell commands.</p>
<p>So let’s create a <code class="prettyprint">CiStatus</code> function that looks like this:</p>
<pre><code class="prettyprint lang-vim">function! CiStatus()
  let l:callbacks = {
        \ 'on_stdout': function('OnCiStatus'),
        \ }
  call jobstart('hub ci-status', l:callbacks)
endfunction
</code></pre>
<p>We define a map of callbacks for <code class="prettyprint">stdout</code> and delegate that to a new function<br>
called <code class="prettyprint">OnCiStatus</code>. This is a very simple function that gets the output from<br>
<code class="prettyprint">hub</code> and converts it to whatever we want, storing it in a <code class="prettyprint">g:ci_status</code><br>
variable. We will later use this variable in our statusline.</p>
<pre><code class="prettyprint lang-vim">function! OnCiStatus(job_id, data, event) dict
  if a:event == "stdout" && a:data[0] != ''
    let g:ci_status = ParseCiStatus(a:data[0])
  endif
endfunction

function! ParseCiStatus(out)
  let l:states = {
        \ 'success': "ci passed",
        \ 'failure': "ci failed",
        \ 'neutral': "ci yet to run",
        \ 'error': "ci errored",
        \ 'cancelled': "ci cancelled",
        \ 'action_required': "ci requires action",
        \ 'pending': "ci running",
        \ 'timed_out': "ci timed out",
        \ 'no status': "no ci",
        \ }
  return l:states[a:out] . ", "
endfunction
</code></pre>
<p>There are a couple of things missing though. This runs the <code class="prettyprint">hub ci-status</code> job<br>
only once. We want to have it perform constant checks. If we do <code class="prettyprint">:h timers</code>, we<br>
can see the new <code class="prettyprint">timers</code> API in neovim. There’s a <code class="prettyprint">timer_start</code> that takes a period<br>
and a callback to run after that period.</p>
<p>We can then change our <code class="prettyprint">OnCiStatus</code> function to call <code class="prettyprint">timer_start</code> with that<br>
first <code class="prettyprint">CiStatus</code> function again:</p>
<pre><code class="prettyprint lang-vim">function! OnCiStatus(job_id, data, event) dict
  if a:event == "stdout" && a:data[0] != ''
    let g:ci_status = ParseCiStatus(a:data[0])
    call timer_start(30000, 'CiStatus') " relevant new part
  endif
endfunction
</code></pre>
<p>Now <code class="prettyprint">CiStatus</code> gets called by <code class="prettyprint">timer_start</code> every 30 seconds. <code class="prettyprint">timer_start</code>,<br>
however, passes the <code class="prettyprint">timer_id</code> as an argument to the callback. So we will need<br>
to modify <code class="prettyprint">CiStatus</code> to accept an argument (that we can safely ignore):</p>
<pre><code class="prettyprint lang-vim">function! CiStatus(timer_id)
  let l:callbacks = {
        \ 'on_stdout': function('OnCiStatus'),
        \ }
  call jobstart('hub ci-status', l:callbacks)
endfunction
" We also need to change the first CiStatus call to receive an int
" Since we don't care about it, let's just use 0
" system() returns the command's output; the exit status is in v:shell_error
call system("git rev-parse --git-dir 2> /dev/null")
if v:shell_error == 0
  call CiStatus(0)
endif
</code></pre>
<p>All that’s missing now is to take the value of <code class="prettyprint">g:ci_status</code> and put it into the<br>
statusline. That’s pretty simple, using some code borrowed from <a href="https://kadekillary.work/post/statusline-vim/">Kade<br>
Killary</a>.</p>
<pre><code class="prettyprint lang-vim">set statusline=
set statusline+=\ \ \ " Empty space
set statusline+=%< " Where to truncate line
set statusline+=%f " Path to the file in the buffer, as typed or relative to current directory
set statusline+=%{&modified?'\ +':''}
set statusline+=%{&readonly?'\ ':''}
set statusline+=%= " Separation point between left and right aligned items
set statusline+=\ %{get(g:,'ci_status','')} " Our custom CI status check, empty until the first job reports back
set statusline+=col:\ %c
set statusline+=\ \ \ " Empty space
</code></pre>
<p>And that’s that. Cheerios. Hugs n kisses and all that.</p>
<h1>you must return here with a shrubbery: the pixels camp quizshow qualifier treasure hunt</h1>
<p>2019-03-05</p>
<p><em>This is the story of how I locked myself inside my room for 29 hours and only left after finishing one of the craziest tech wargames/treasure hunts I have ever taken part in.</em></p>
<p><a href="https://svbtleusercontent.com/nYtZjvFczpxu28oGuaa4qR0xspap.png"><img src="https://svbtleusercontent.com/nYtZjvFczpxu28oGuaa4qR0xspap_small.png" alt="Screenshot 2019-03-04 at 23.56.16.png"></a></p>
<h2 id="prelude_2">Prelude <a class="head_anchor" href="#prelude_2">#</a>
</h2>
<p>For those of you who don’t know, <a href="https://pixels.camp/">Pixels Camp</a> has a quiz show.</p>
<p><strong>To get to the quiz show, you have to <del>be tortured</del> get through a very good and awesome and oh so fun, so amazingly fun qualifiers.</strong></p>
<p>The qualifiers run for 4 weeks. Then there’s a week off. Then the 16 top players get to find a partner to <del>violently murder the quizmaster</del> go on stage and make fools of themselves. In order to do that, you get asked questions and <strong>then you fail. Invariably.</strong> Eventually consistently.</p>
<p>2 editions ago, me and <a href="https://twitter.com/naps62">@naps62</a> failed so hard we won. The next year, we felt a bit more confident and got swept in the first round. As far as logic goes, leave it by the door of the quiz finals and pick it up afterwards. Think you know the answer? You don’t. Been feeling like a champ? <strong>You’re gonna get chewed up and then made fun of by the quizmaster and probably your own coworkers who, incidentally, had been not only watching it live but also recording everything for reaction gifs</strong> (yes, oddly specific, I know).</p>
<p>Every year the qualifiers have a treasure hunt. That is one of my favorite things in the world. First because you get mentally challenged. Second because if you fail, you can just blame it on the lack of time and how popular you are.</p>
<blockquote>
<p><em>yeah, I couldn’t do the qualifier because it starts on a Friday night and I’m out at the pub unlike YOU LOSERS AHAHAHAH also does someone have any tips for step 3?</em></p>
<p>– stage one of doing the annual treasure hunt</p>
</blockquote>
<p>Third because it gives you a reason to hate on another human being. And, let’s face it, we all love blaming our <u>beloved</u> quizmaster, <a href="https://twitter.com/carlosefr">@carlosefr</a>, for our own shortcomings.</p>
<blockquote>
<p><em>I hate you quiz master, I haven’t slept properly for 7 weeks, I’ve been putting on weight since 2016, don’t feel like going to the gym and it’s all because of YOU. YOU and YOUR STUPID TREASURE HUNT.</em></p>
<p>– stage two of the treasure hunt, commonly found on the Pixels Camp Slack</p>
</blockquote>
<p>Enough chit chat, let’s go through the solution.</p>
<hr>
<h2 id="step-1-the-cipher-and-the-dial_2">Step 1: The Cipher and The Dial <a class="head_anchor" href="#step-1-the-cipher-and-the-dial_2">#</a>
</h2>
<p>How does the treasure hunt work? Simple, two steps:</p>
<ol>
<li>You start on step 1 and get to the next step</li>
<li>You repeat.</li>
</ol>
<p>There are no rules, but there are patterns. It’s never too complicated. It’s usually a single solution step. If you have an image, you have all it takes to get to the next level. You won’t need to do the hex dump of the image, convert it to decimal, factor out the primes, convert to ascii and that will give you a riddle to solve. If you have an image, everything you need is in there (sometimes literally <u>in</u> the image). And, again, it’s usually a single step. The multitude of steps like the one I showed you, will give you <u>uncertainty</u>. You won’t know if you are on the right track. But with the treasure hunt you always know and you should keep this in mind. <strong>If you find you are uncertain in your solution, it’s probably wrong.</strong></p>
<p>Right then, how did it start? We went to the <a href="https://quiz.pixels.camp/challenge/2019-2-not-mojibake-09c6e098-43518c9be33e65de/">challenge page</a> (sidenote: there’s a small chance this link might not be available due to the challenge closing). And we had the explanation for the treasure hunt, a form for submitting the final solution (given in the last step of the hunt) and an image. This image:</p>
<p><a href="https://svbtleusercontent.com/bBnbNrNHbe9oMAAcovEAi0xspap.png"><img src="https://svbtleusercontent.com/bBnbNrNHbe9oMAAcovEAi0xspap_small.png" alt="start_altered.png"></a></p>
<p>Ah, a weird dialect. An unknown alphabet. A ciphertext! I have a background in cryptography, so I knew <u>exactly</u> what to do in this case…</p>
<p>Yep, that’s right. <strong>Reverse google search.</strong></p>
<p>The reverse google search led nowhere. So what’s next? My cryptographer brain was prepared. Years of study and long nights reading through complex mathematical formulas all built up to this moment.</p>
<p>You see, there’s this little trick known as <a href="https://learncryptography.com/attack-vectors/frequency-analysis">frequency analysis</a>. Basically it consists in taking every symbol of the ciphertext, drawing it carefully in a notebook and ignoring it because the right thing to do is to google “weird alien fonts”.</p>
<p>After spending some time in questionable websites which had every possible font, the solution: <strong>this is the Aurebesh alphabet!</strong></p>
<p>I found this quite amusing. It’s literally the image for <a href="https://en.wikipedia.org/wiki/Languages_in_Star_Wars">“Languages in Star Wars”</a> in Wikipedia.</p>
<p>Translating it, we got:</p>
<blockquote class="short">
<p>TO DIAL ANOTHER PLANET, USE THE QM PREFIX. OVER</p>
</blockquote>
<p>Perfect, we need to use <a href="https://ipfs.io/">IPFS</a>. All the IPFS hashes start with <code class="prettyprint">Qm</code>. But where do we find the hash? Like I said, all we need is in the picture.</p>
<p><code class="prettyprint">curl</code>‘ing the image, we got the following output:</p>
<pre><code class="prettyprint lang-shell">$ curl https://quiz.pixels.camp/challenge/2019-2-not-mojibake-09c6e098-43518c9be33e65de/start.png
# [TRUNCATED]
��������������������������������������������������#�Ř��w�ֶYtEXtcommentBright Pixel Mars Research Facility: QmfXPu3fBiPt6x6F8bsoXyuur1KPbJRqxqC72whtJRuxG���IEND�B`�%
</code></pre>
<p><em>Important sidenote: the original image had transparency, so it wouldn’t show on this page. I uploaded an altered version which might not include the hash. So if you try it yourself with the uploaded image, beware of different results.</em></p>
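<p>If you would rather not eyeball raw PNG bytes in your terminal, you can fish the string out with <code class="prettyprint">grep</code>. A small sketch, assuming a GNU or BSD <code class="prettyprint">grep</code> (for the <code class="prettyprint">-a</code> and <code class="prettyprint">-o</code> flags) and that the image was saved locally as <code class="prettyprint">start.png</code>; <code class="prettyprint">find_qm_strings</code> is a made-up helper name:</p>
<pre><code class="prettyprint lang-shell"># Print every Qm-prefixed alphanumeric run found in a (possibly binary) file.
# -a treats the binary file as text, -o prints only the matching part.
find_qm_strings() {
  LC_ALL=C grep -a -o 'Qm[A-Za-z0-9]*' "$1"
}

# Only runs when the image is actually there.
if [ -f start.png ]; then
  find_qm_strings start.png
fi
</code></pre>
<p>The pattern is deliberately loose: it grabs anything that looks like a <code class="prettyprint">Qm</code> string and leaves the judging to you.</p>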
<p>There we go! We got a hash! Let’s put it into the IPFS gateway: <code class="prettyprint">https://gateway.ipfs.io/ipfs/QmfXPu3fBiPt6x6F8bsoXyuur1KPbJRqxqC72whtJRuxG/</code></p>
<p>Huh, it doesn’t work… Finds nothing. Let’s install IPFS and use the CLI… Nope. Just the same. Wait whaaat? How is this… Is the hash ok? Let’s compare it:</p>
<pre><code class="prettyprint lang-shell"># IPFS webpage hash, 46 bytes
QmTeW79w7QQ6Npa3b1d5tANreCDxF2iDaAPsDvW6KtLmfB
# our hash, 45 bytes
QmfXPu3fBiPt6x6F8bsoXyuur1KPbJRqxqC72whtJRuxG
</code></pre>
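<p>Those lengths are easy to double check without counting characters by hand. A quick sketch using the two hashes from the comparison (the variable names are mine):</p>
<pre><code class="prettyprint lang-shell">reference="QmTeW79w7QQ6Npa3b1d5tANreCDxF2iDaAPsDvW6KtLmfB"   # hash from the IPFS webpage
ours="QmfXPu3fBiPt6x6F8bsoXyuur1KPbJRqxqC72whtJRuxG"         # hash found in the image

# ${#var} expands to the length of the string
echo "reference: ${#reference} characters"
echo "ours:      ${#ours} characters"
</code></pre>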
<p>Ok, we’re missing quite a byte. At this point I wrote a program to brute force the remaining byte. But nothing worked. What do we add?</p>
<p>The treasure hunt is supposed to be a brain challenge. Having you guess random stuff isn’t the MO. What do we have then? A hash with a byte missing. A message saying to use IPFS. <strong>What wrongful assumption are we making?</strong></p>
<p>Turns out the quizmaster is really cheeky. Our assumption that he is telling us to use IPFS is not wrong but incomplete. He’s literally telling us to use the <code class="prettyprint">Qm</code> prefix. So you <strong>add <code class="prettyprint">Qm</code> to your hash</strong> and now you have an extra byte. You remove the final <code class="prettyprint">G</code> and you go to the IPFS gateway.</p>
<p><a href="https://gateway.ipfs.io/ipfs/QmQmfXPu3fBiPt6x6F8bsoXyuur1KPbJRqxqC72whtJRux">It works.</a> A double <code class="prettyprint">Qm</code> hash! Oh, how devilish.</p>
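<p>Spelled out as a shell sketch (again, the variable names are mine), the fix is a prefix plus a trim:</p>
<pre><code class="prettyprint lang-shell">found="QmfXPu3fBiPt6x6F8bsoXyuur1KPbJRqxqC72whtJRuxG"   # hash from the image comment
fixed="Qm${found%G}"                                     # prepend Qm, drop the trailing G

echo "$fixed (${#fixed} characters)"
echo "https://gateway.ipfs.io/ipfs/$fixed"
</code></pre>
<p>That brings the string back to the 46 characters of a regular hash.</p>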
<p>Fun fact: later on, after finishing the challenge, discussing this with the quizmaster he told me it was indirectly because of me and another contestant. While discussing past treasure hunts in the #quizshow Slack channel, 3 days before the start of the treasure hunt, this exchange happened:</p>
<p><a href="https://svbtleusercontent.com/qgHjyJ3aY5UvKRa6Cy4pK0xspap.png"><img src="https://svbtleusercontent.com/qgHjyJ3aY5UvKRa6Cy4pK0xspap_small.png" alt="Screenshot 2019-03-03 at 17.46.47.png"></a></p>
<p>This forced our quizmaster to make that step harder, which in turn led to the double <code class="prettyprint">Qm</code> hash. Oops.</p>
<hr>
<h2 id="step-2-a-message-in-hebrew_2">Step 2: A Message in Hebrew <a class="head_anchor" href="#step-2-a-message-in-hebrew_2">#</a>
</h2>
<p>Accessing the file on IPFS we had this:</p>
<pre><code class="prettyprint lang-shell">66 1
66 1
4 14#6 2#8 4#2 6#2 14#4 1
4 2#10 2#2 6#4 10#8 2#10 2#4 1
4 2#2 6#2 2#2 2#8 4#2 6#2 2#2 2#2 6#2 2#4 1
4 2#2 6#2 2#4 2#6 2#2 2#4 6#2 2#2 6#2 2#4 1
4 2#2 6#2 2#2 6#2 2#2 4#2 4#6 2#2 6#2 2#4 1
4 2#10 2#2 2#2 2#4 2#2 4#4 2#4 2#10 2#4 1
4 14#2 2#2 2#2 2#2 2#2 2#2 2#2 2#2 14#4 1
20 2#2 4#2 2#2 8#2 2#20 1
4 4#2 2#4 4#6 10#8 2#2 6#2 4#6 1
8 8#2 2#2 2#2 8#2 6#2 8#2 8#4 1
4 6#2 6#10 6#8 4#4 4#4 2#6 1
4 12#2 2#2 4#2 2#2 2#2 2#6 4#8 2#2 2#4 1
8 2#6 4#4 6#4 8#2 6#4 2#2 2#6 1
6 8#4 4#4 2#10 2#2 6#10 2#6 1
6 4#6 2#12 4#16 12#4 1
4 2#2 4#2 2#10 2#2 2#2 2#2 6#2 2#2 2#2 2#4 2#4 1
6 2#8 2#2 4#2 2#6 2#2 2#2 6#6 2#2 4#4 1
8 6#4 4#2 2#2 2#2 2#2 2#2 2#2 6#4 8#4 1
4 2#2 2#6 4#2 2#2 4#6 6#2 2#2 6#2 6#4 1
8 2#4 2#8 6#2 4#2 4#2 2#2 2#4 2#2 4#4 1
4 2#4 4#2 2#2 2#2 2#2 2#10 14#2 4#6 1
20 2#4 6#4 4#2 4#6 2#2 4#6 1
4 14#2 2#6 6#6 2#2 2#2 2#2 8#6 1
4 2#10 2#4 8#4 2#6 4#6 2#2 2#2 2#4 1
4 2#2 6#2 2#4 2#2 4#2 2#2 4#4 10#6 2#4 1
4 2#2 6#2 2#2 2#4 2#2 4#2 6#6 6#2 4#6 1
4 2#2 6#2 2#8 2#12 2#2 4#4 4#4 2#4 1
4 2#10 2#2 4#2 4#2 2#2 12#4 2#4 2#6 1
4 14#2 16#2 2#4 6#4 2#2 2#6 1
66 1
66 1
</code></pre>
<p>My intuition told me to inspect the file carefully. There was no hidden data, no metadata of any kind, no extra whitespaces or invisible characters. Nothing. WYSIWYG.</p>
<p>This was one of the hardest steps, personally. What could it be? The title of the challenge was <code class="prettyprint">Mojibake?</code>. I started with esoteric languages. When that didn’t work, I tried to think of encodings that could fit. I tried different text encodings. Then, I focused on the <code class="prettyprint">#</code> sign. Googled every cipher I could, to see if one converted text to numbers and if any used the <code class="prettyprint">#</code> sign as a delimiter.</p>
<p>When that failed, my attention shifted to <a href="https://v2.cryptii.com/text/htmlentities">HTML entities</a> and later even <a href="http://www.i18nguy.com/unicode/hebrew-numbers.html">Hebrew</a>.</p>
<p>@naps62 noticed the sum of each line was always 67. I noticed that every line was a different way of adding up to 67. I tried to search for ciphers around the number 67.</p>
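<p>That observation is quick to check in Ruby. A small sketch over three rows copied from the file above (the <code class="prettyprint">rows</code> name is just for illustration):</p>
<pre><code class="prettyprint lang-ruby">rows = ["66 1", "4 14#6 2#8 4#2 6#2 14#4 1", "20 2#2 4#2 2#2 8#2 2#20 1"]
# Pull out every number in a row and add them up.
sums = rows.map { |row| row.scan(/\d+/).sum { |n| n.to_i } }
p sums  # [67, 67, 67]
</code></pre>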
<p>I went back to <a href="https://esolangs.org/wiki/Main_Page">esolang</a> and went through every language on the list.</p>
<p>Nothing would fit. I went to bed 7 hours after the challenge started.</p>
<p><strong>I couldn’t stop thinking about it.</strong> I barely slept.</p>
<p><strong>I woke up in the middle of the night with the idea of Run-Length Encoding.</strong> I had to code an encoder & decoder in C in my first year of university and that memory came back to me suddenly. I dismissed it, it wouldn’t make sense. RLE transforms <code class="prettyprint">aaabbbbcc</code> into <code class="prettyprint">a3b4c2</code>. Looking at the text, we were missing characters between some numbers. The <code class="prettyprint">1</code> at the end didn’t have any character.</p>
<p>I woke up again thinking about prime factors. Then again thinking about the LCM of 67. It wouldn’t fit, this was a sum, not a multiplication.</p>
<p>I don’t know how long I slept in total, but <strong>I was very much sleep-deprived going into the second day of the challenge.</strong> I grabbed my computer and started reading about number based ciphers, once more. <a href="https://twitter.com/iampfac">@pfac</a>, the remaining member of our triple trouble team for every Pixels Camp quiz, suggested chess moves the night before. Wouldn’t fit.</p>
<p>I was getting frustrated. At this point, I hadn’t left my room for 16 hours.</p>
<p>There were some whispers between other participants of thinking of it like a grid. Then it hit me.</p>
<p><strong><code class="prettyprint">4 2#4</code>: 4 spaces, 2 <code class="prettyprint">#</code>, 4 spaces. The 1s at the end were <code class="prettyprint">1</code> newline.</strong></p>
<p>It’s pretty simple. You can solve it with one line of Ruby. One line that ended the whole ordeal that had haunted me for the last 15 hours. Ready? Here it goes:</p>
<pre><code class="prettyprint lang-ruby">puts LINES.gsub(/(\d+)(\D)/m) { $2 * $1.to_i }
</code></pre>
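<p>To see what the regex does, here is the same substitution on two made-up rows. <code class="prettyprint">lines</code> is a stand-in for the file contents (the one-liner above calls it <code class="prettyprint">LINES</code>) and the sample data is mine:</p>
<pre><code class="prettyprint lang-ruby">lines = "4 2#4 1\n2 6#2 1\n"
# Each "count + character" pair becomes that character repeated count times;
# the 1 followed by a line break turns into a single newline.
decoded = lines.gsub(/(\d+)(\D)/) { $2 * $1.to_i }
puts decoded
</code></pre>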
<p>So simple it hurts. And here is the result:</p>
<pre><code class="prettyprint">
    ##############      ##        ####  ######  ##############
    ##          ##  ######    ##########        ##          ##
    ##  ######  ##  ##        ####  ######  ##  ##  ######  ##
    ##  ######  ##    ##      ##  ##    ######  ##  ######  ##
    ##  ######  ##  ######  ##  ####  ####      ##  ######  ##
    ##          ##  ##  ##    ##  ####    ##    ##          ##
    ##############  ##  ##  ##  ##  ##  ##  ##  ##############
                    ##  ####  ##  ########  ##
    ####  ##    ####      ##########        ##  ######  ####
        ########  ##  ##  ########  ######  ########  ########
    ######  ######          ######        ####    ####    ##
    ############  ##  ####  ##  ##  ##      ####        ##  ##
        ##      ####    ######    ########  ######    ##  ##
      ########    ####    ##          ##  ######          ##
      ####      ##            ####                ############
    ##  ####  ##          ##  ##  ##  ######  ##  ##  ##    ##
      ##        ##  ####  ##      ##  ##  ######      ##  ####
        ######    ####  ##  ##  ##  ##  ##  ######    ########
    ##  ##      ####  ##  ####      ######  ##  ######  ######
        ##    ##        ######  ####  ####  ##  ##    ##  ####
    ##    ####  ##  ##  ##  ##          ##############  ####
                    ##    ######    ####  ####      ##  ####
    ##############  ##      ######      ##  ##  ##  ########
    ##          ##    ########    ##      ####      ##  ##  ##
    ##  ######  ##    ##  ####  ##  ####    ##########      ##
    ##  ######  ##  ##    ##  ####  ######      ######  ####
    ##  ######  ##        ##            ##  ####    ####    ##
    ##          ##  ####  ####  ##  ############    ##    ##
    ##############  ################  ##    ######    ##  ##
</code></pre>
<p><strong>You know what this is called? Run-Length Encoding. Yeah… I’m not particularly brilliant…</strong></p>
<p>Well, how nice. A QR Code. And here I was searching for ciphers and text in it. Turns out most QR Code readers don’t really like the <code class="prettyprint">#</code> sign, so we can change it for the <a href="https://www.fileformat.info/info/unicode/char/2588/index.htm"><code class="prettyprint">FULL BLOCK</code></a> character.</p>
<pre><code class="prettyprint lang-ruby">puts LINES.gsub(/(\d+)(\D)/m) { ($2 == "#" ? "█" : $2) * $1.to_i }
    ██████████████      ██        ████  ██████  ██████████████
    ██          ██  ██████    ██████████        ██          ██
    ██  ██████  ██  ██        ████  ██████  ██  ██  ██████  ██
    ██  ██████  ██    ██      ██  ██    ██████  ██  ██████  ██
    ██  ██████  ██  ██████  ██  ████  ████      ██  ██████  ██
    ██          ██  ██  ██    ██  ████    ██    ██          ██
    ██████████████  ██  ██  ██  ██  ██  ██  ██  ██████████████
                    ██  ████  ██  ████████  ██
    ████  ██    ████      ██████████        ██  ██████  ████
        ████████  ██  ██  ████████  ██████  ████████  ████████
    ██████  ██████          ██████        ████    ████    ██
    ████████████  ██  ████  ██  ██  ██      ████        ██  ██
        ██      ████    ██████    ████████  ██████    ██  ██
      ████████    ████    ██          ██  ██████          ██
      ████      ██            ████                ████████████
    ██  ████  ██          ██  ██  ██  ██████  ██  ██  ██    ██
      ██        ██  ████  ██      ██  ██  ██████      ██  ████
        ██████    ████  ██  ██  ██  ██  ██  ██████    ████████
    ██  ██      ████  ██  ████      ██████  ██  ██████  ██████
        ██    ██        ██████  ████  ████  ██  ██    ██  ████
    ██    ████  ██  ██  ██  ██          ██████████████  ████
                    ██    ██████    ████  ████      ██  ████
    ██████████████  ██      ██████      ██  ██  ██  ████████
    ██          ██    ████████    ██      ████      ██  ██  ██
    ██  ██████  ██    ██  ████  ██  ████    ██████████      ██
    ██  ██████  ██  ██    ██  ████  ██████      ██████  ████
    ██  ██████  ██        ██            ██  ████    ████    ██
    ██          ██  ████  ████  ██  ████████████    ██    ██
    ██████████████  ████████████████  ██    ██████    ██  ██
</code></pre>
<p>And the content?</p>
<blockquote class="short">
<p>Looks like a hash, but it’s so tiny… 96f493f1</p>
</blockquote>
<p>My experience from the previous treasure hunts immediately told me what this was. It’s a nice taunt. It’s a hash, but it’s tiny. What do you do with it? <a href="http://tinyurl.com/96f493f1">tinyurl.com/96f493f1</a>.</p>
<hr>
<h2 id="step-3-then-shalt-thou-count-to-three_2">Step 3: Then Shalt Thou Count To Three <a class="head_anchor" href="#step-3-then-shalt-thou-count-to-three_2">#</a>
</h2>
<p>The URL led to a Google Drive folder with this image:</p>
<p><a href="https://svbtleusercontent.com/r1gsCfFynNJJ5y6fQQfjsE0xspap.jpg"><img src="https://svbtleusercontent.com/r1gsCfFynNJJ5y6fQQfjsE0xspap_small.jpg" alt="grail.jpg"></a></p>
<p>My first step was again to analyse the image. No metadata. No extra content inside it (unlike the previous one). Nothing.</p>
<p>One thing caught our attention. The letters in <code class="prettyprint">aaaarrrrggghhhh</code> have different fonts. Some are composed of <code class="prettyprint">+</code>, others of <code class="prettyprint">-</code>.</p>
<p>I split these into two words.</p>
<pre><code class="prettyprint lang-shell">..a...aaaa..rr.r.rr.gggg.gg.hh.. # letters with +
aa.aaa....rr..r.r..r....g..h..hh # letters with -
</code></pre>
<p>Obviously there was no sense to make of this. <strong>I was so sleep deprived it took me quite a while to not think like a complete idiot.</strong> The following line of thought is one of my… uh… most brilliant ever, let’s call it that:</p>
<p>I noted the order of the <code class="prettyprint">+</code> and <code class="prettyprint">-</code>: <code class="prettyprint">--+---++++--++-+-++-++++-++-++--</code></p>
<p><strong>It was obviously a sort of binary code.</strong> What binary codes do we know of? I tried tap code and morse code. Nothing.</p>
<p>I was so disheartened. I was chatting with @luisfcorreia and vented:</p>
<blockquote class="short">
<p>Could be morse, could be tap code. An IP maybe?</p>
</blockquote>
<p>And then he saw it. <strong>Yep, it’s a binary code.</strong> Literally binary. I wasted away an hour before thinking of the obvious. Was I in a bad mental state? Yes. Was this a sign I needed coffee? Yes, a lot.</p>
<p><em>32 bits of binary. It’s an IP</em>, he told me.</p>
<pre><code class="prettyprint lang-ruby">luis.beers += 1
</code></pre>
<p>Let’s look at it:</p>
<pre><code class="prettyprint lang-shell">00100011 11001101 01101111 01101100
35 .205 .111 .108
</code></pre>
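<p>The same conversion as a Ruby sketch, assuming <code class="prettyprint">-</code> is 0 and <code class="prettyprint">+</code> is 1 as in the table above:</p>
<pre><code class="prettyprint lang-ruby">signs = "--+---++++--++-+-++-++++-++-++--"
# Map the signs to bits, cut them into 4 octets and read each one as base 2.
bits = signs.tr("+-", "10")
ip = bits.scan(/.{8}/).map { |octet| octet.to_i(2) }.join(".")
puts ip  # 35.205.111.108
</code></pre>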
<p>We have cracked the grail.</p>
<hr>
<h2 id="step-4-the-sound-of-silence_2">Step 4: The Sound of Silence <a class="head_anchor" href="#step-4-the-sound-of-silence_2">#</a>
</h2>
<p>The IP address had a single file in it. <code class="prettyprint">synesthesia.png</code>.</p>
<p><a href="https://svbtleusercontent.com/pKEkCLsT3PDo3MBjFyxLbL0xspap.png"><img src="https://svbtleusercontent.com/pKEkCLsT3PDo3MBjFyxLbL0xspap_small.png" alt="synesthesia.png"></a></p>
<p>Synesthesia is basically your brain going belly up and you start hearing colours or seeing sounds. Your senses get all mixed up. What senses can you tickle with a computer? Vision and hearing.</p>
<p>It was obvious to me, <strong>the image had a sound file in it</strong>. The first qualifier I went through, a few years ago, had content hidden in one of the channels of the image.</p>
<p>I isolated all the channels separately. Every permutation. Red, Green, Blue, Alpha, Red & Green, Red & Blue, Green & Blue, RGB (no alpha). Two things caught my eye. The “no alpha” version and the “alpha” version.</p>
<pre><code class="prettyprint lang-shell"># install imagemagick before this
convert synesthesia.png -alpha off no_alpha.png
convert synesthesia.png -channel RBG -fx 0 alpha.png
</code></pre>
<p>Let’s start by analysing <code class="prettyprint">alpha.png</code>.</p>
<p><a href="https://svbtleusercontent.com/rDihTYGHokz1wwWMv51iET0xspap.png"><img src="https://svbtleusercontent.com/rDihTYGHokz1wwWMv51iET0xspap_small.png" alt="alpha.png"></a></p>
<p>I had seen this type of pattern before in a different wargame. Sets of squares in grayscale, almost randomly, an <strong>indicator of a binary file with the PNG headers around it</strong>. They seem like random noise, but they really are not. You can clearly see some lines with similar “gray” values. Usually this indicates that there is information concealed.</p>
<p>I suspected the audio file was in the alpha channel and the confirmation came with <code class="prettyprint">no_alpha.png</code>:</p>
<p><a href="https://svbtleusercontent.com/48YHSTAfGkdVQ4MPfgmnQx0xspap.png"><img src="https://svbtleusercontent.com/48YHSTAfGkdVQ4MPfgmnQx0xspap_small.png" alt="no_alpha.png"></a></p>
<p>There’s nothing interesting in this image, except that it does look like an actual image instead of the spaghetti mess that <code class="prettyprint">synesthesia.png</code> is. I reverse google searched the image and found <a href="https://paintingvalley.com/abstract-river-painting#abstract-river-painting-22.jpg">the original</a>. The evil quizmaster stole this artist’s intellectual property and compressed the image almost beyond recognition, all in the name of producing a mastercrime of a challenge for us. How thoughtful and sweet.</p>
<p><strong>Having seen the original, I took it as a confirmation I was on the right track.</strong> Before going back to the <code class="prettyprint">alpha.png</code> image, we need to understand how PNGs work.</p>
<p>Images, in general, are grids of pixels. Each pixel has 3 values ranging from 0 to 255, one for each of red, green and blue. The combined intensity of each colour determines the final colour of the pixel. Each channel needs 8 bits, so 24 in total for a single pixel. PNGs in particular can have 32 bits per pixel. The final 8 bits are for transparency, AKA the alpha channel.</p>
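<p>Put differently, if you had the raw, uncompressed pixel data, the alpha channel would simply be every fourth byte. A minimal Ruby sketch with three made-up RGBA pixels (a real PNG is compressed, so it would need decoding first):</p>
<pre><code class="prettyprint lang-ruby"># Three invented RGBA pixels: 4 bytes each, alpha last.
pixels = [255, 0, 0, 73, 0, 255, 0, 68, 0, 0, 255, 51].pack("C*")
# Keep only the fourth byte of every pixel and glue those bytes together.
alpha = pixels.bytes.each_slice(4).map { |_r, _g, _b, a| a }.pack("C*")
puts alpha  # ID3
</code></pre>
<p>The colours still make up a perfectly normal picture, while the alpha bytes on their own can spell out something else entirely.</p>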
<p>My reasoning was set on the audio file being in there, everything pointed to it. However, <strong>when I extracted the alpha channel it was still an image</strong>. The problem was that during the extraction, <strong>I was converting back to a PNG</strong> and stretching out that information again, adding in the headers and in general screwing everything up.</p>
<p>I needed a way to pull out those 8-bit sequences and place them into a single file without converting it into an image. I was almost starting to implement my own code to do so when <a href="https://twitter.com/luisfcorreia">a dear friend</a> (to whom I owe a beer for preventing me from doing this) pointed me to the correct command.</p>
<pre><code class="prettyprint lang-shell"># wrong command
$ convert synesthesia.png -alpha off no_alpha.png
# right command
$ convert -alpha extract synesthesia.png synesthesia.gray
$ file synesthesia.gray
synesthesia.gray: Audio file with ID3 version 2.3.0, contains:MPEG ADTS, layer III, v2.5, 32 kbps, 8 kHz, Monaural
</code></pre>
<p><strong>He googled for more <code class="prettyprint">imagemagick</code> options. I googled for C vim plugins and libs</strong>, ready to get my hands dirty. Let’s ponder about that for a second and <em>never</em> speak of it again.</p>
<hr>
<h2 id="step-5-bacon-i-really-can39t-think-of-a-cleve_2">Step 5: bacon. I really can’t think of a clever name for this section, I just hate the quizmaster so much. <a class="head_anchor" href="#step-5-bacon-i-really-can39t-think-of-a-cleve_2">#</a>
</h2>
<p>So now we have our audio file. Let’s open our minds for what we are about to hear, for we have been graced with - never mind, close it, <strong>it’s morse</strong>, let’s just plug it into an <a href="https://morsecode.scphillips.com/labs/audio-decoder-adaptive/">online decoder</a>.</p>
<blockquote class="short">
<p>TO AVOID CREW SUSPICION REQUEST INFO WITH SMALL BACON. ONE BACON ONLY.</p>
</blockquote>
<p>And what do you get out of this?</p>
<p>…</p>
<p>…</p>
<p>…</p>
<p>It’s obvious isn’t it?</p>
<p>You have to make a request…</p>
<p>…</p>
<p>… maybe?</p>
<p>To… some… server? I think?</p>
<p>Seriously, what. the. hell.</p>
<p>Ok, let’s break it down. We have to make a request for info. We have the <strong>previous server IP address, so intuition tells us it’s that.</strong></p>
<p><code class="prettyprint">nmap</code> tells us… nothing.</p>
<pre><code class="prettyprint lang-shell">PORT STATE SERVICE
22/tcp open ssh
80/tcp open http
3389/tcp closed ms-wbt-server
</code></pre>
<p>Maybe an HTTP request?</p>
<pre><code class="prettyprint">$ curl 35.205.111.108/bacon
<html>
<head><title>404 Not Found</title></head>
<body>
<center><h1>404 Not Found</h1></center>
<hr><center>nginx</center>
</body>
</html>
$ curl 35.205.111.108/.bacon
# same response
$ curl -I 35.205.111.108
HTTP/1.1 200 OK
Server: nginx
Date: Mon, 04 Mar 2019 22:49:33 GMT
Content-Type: text/html; charset=utf-8
Connection: keep-alive
Vary: Accept-Encoding
# aka: nothing in the headers
</code></pre>
<p>At this point I tried some custom headers. <code class="prettyprint">Content-Type: Bacon</code>, <code class="prettyprint">Authorization: Bacon</code>, <code class="prettyprint">Footloose: Kevin Bacon</code>, <code class="prettyprint">Accepts: Bacon</code>. Nothing. Not even <code class="prettyprint">Francis: Bacon</code>.</p>
<p>At this point, the quizmaster, ever so helpful, said something that… threw me and @pfac off as apparently we both thought the same.</p>
<p><a href="https://svbtleusercontent.com/4jjQGSyJ3Br2s8fmDdTdNE0xspap.png"><img src="https://svbtleusercontent.com/4jjQGSyJ3Br2s8fmDdTdNE0xspap_small.png" alt="Screenshot 2019-03-04 at 22.52.54.png"></a></p>
<p><em>Hurry… Be quick… QUIC! That’s it!</em></p>
<p>Aaaaaaaaaand we went in circles for the next hour.</p>
<p>The right tip came with this:</p>
<p><a href="https://svbtleusercontent.com/k9Dy7n75Q3Qjnw6pFwehFp0xspap.png"><img src="https://svbtleusercontent.com/k9Dy7n75Q3Qjnw6pFwehFp0xspap_small.png" alt="Screenshot 2019-03-04 at 22.57.24.png"></a></p>
<p>What an oddly specific thing to say, I wond- oh it’s a quote from <a href="https://en.wikipedia.org/wiki/The_Hunt_for_Red_October_(film)">The Hunt for Red October</a>. I’m sure it’s related to the movie. Turns out <a href="https://twitter.com/luisfcorreia">someone I now owe three beers to</a> pointed me to the following line:</p>
<blockquote class="short">
<p>Give me a ping, Vasily. One ping only, please.</p>
</blockquote>
<p>Elementary, now! <strong>We need to ping the server.</strong> How did I not <em>immediately</em> think of that?</p>
<pre><code class="prettyprint lang-shell">$ ping -c 1 35.205.111.108
PING 35.205.111.108 (35.205.111.108): 56 data bytes
--- 35.205.111.108 ping statistics ---
1 packets transmitted, 0 packets received, 100.0% packet loss
</code></pre>
<p>Of course <strong>it’s not replying to pings</strong>. This is actually a good indicator that we are on the right track.</p>
<p>After googling the <a href="https://en.wikipedia.org/wiki/Ping_(networking_utility)#Echo_request">ICMP echo request message format</a>, we can see that it has a payload.</p>
<p>At this point <strong>I was fairly certain we needed to send <code class="prettyprint">bacon</code></strong> (lowercase. Remember, “small bacon”) as the payload.</p>
<p><code class="prettyprint">ping</code> in macOS supports the <code class="prettyprint">-p</code> flag for the payload. We can send 16 bytes of hex. I was sure of this. This was it. We got it. After so many hours hitting the wall. Here we go:</p>
<pre><code class="prettyprint lang-shell">$ ping -c 1 -p "6261636f6e" 35.205.111.108
PATTERN: 0x6261636f6e
PING 35.205.111.108 (35.205.111.108): 56 data bytes
--- 35.205.111.108 ping statistics ---
1 packets transmitted, 0 packets received, 100.0% packet loss
$ flip
(ノಠ益ಠ)ノ彡┻━┻
</code></pre>
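<p>For reference, the pattern is nothing more than the ASCII bytes of <code class="prettyprint">bacon</code> written as hex, which is easy to reproduce in Ruby. The zero-padded variant is only a guess at filling all 16 pad bytes:</p>
<pre><code class="prettyprint lang-ruby">payload = "bacon".unpack1("H*")
puts payload             # 6261636f6e
puts payload.length / 2  # 5 bytes, within the 16 that -p accepts
# Left-padded with zeros to the full 16 bytes (32 hex digits).
padded = payload.rjust(32, "0")
puts padded
</code></pre>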
<p>How? Maybe we padded it wrong. Maybe we should use <code class="prettyprint">00000000000000000000006261636f6e</code> as the string…</p>
<p>But still nothing.</p>
<p>How? I was so certain of it.</p>
<p>I’m not even sure how this happened, but @pfac said something like <strong>“I’m gonna try this on Linux. Can’t trust these Macs”</strong> while I just stood in disbelief.</p>
<p>Aaaaaand of course it worked, how could it not?</p>
<p>To this day, I don’t know why. Running the same command on macOS yielded nothing. After the quiz I asked the quizmaster about this and he went through the server logs. Nothing was coming through in the payload of my requests so, yeah. Thanks for that, Apple.</p>
<p>Regardless, there was a different solution for this:</p>
<p><a href="https://svbtleusercontent.com/n896SFSfyCTRLUsFRJ5DAq0xspap.png"><img src="https://svbtleusercontent.com/n896SFSfyCTRLUsFRJ5DAq0xspap_small.png" alt="Screenshot 2019-03-04 at 23.12.36.png"></a></p>
<p>And you know what? This was much better! Want to know why!? It gave you the reply payload! Normal ping didn’t! Poor @pfac had to install Wireshark! And <em>JAVA! <strong>EWWWWW</strong></em>.</p>
<hr>
<h2 id="step-6-bear-with-me-man-i-lost-my-train-of-th_2">Step 6: Bear with me man, I lost my train of thought… <a class="head_anchor" href="#step-6-bear-with-me-man-i-lost-my-train-of-th_2">#</a>
</h2>
<p>Finally we had it.</p>
<blockquote class="short">
<p>I.have.leaked.a.data.breach.in.the.usual.place:.DYS4VbmB</p>
</blockquote>
<p>This was fairly obvious. <a href="https://pastebin.com/DYS4VbmB">Where else would you leak a data breach?</a></p>
<pre><code class="prettyprint">I've read this and, frankly, it's really embarrassing.
2018-02-11 03:38:23.040: DEBUG: PDU: 079153210100001001000C9153210100002000008A437498CD2EBBCF6510F9ED2E8740C332BB2C9687E96590380F0ABBE7F7B23CED3E83EE693A1A4447A7E7A0F9DB7DD62914C377985E2E83C6E830885E079DDF6F16285B1783DC757148317C87E9E532688C0E83E87510F9FD7681B261F43D8C0E2986EF30BD5C068DD16110BD0EA2BFDF2E50360C1AA3C3E110
</code></pre>
<p>Converting to ASCII or decimal gave us nothing. Alright, think. <code class="prettyprint">PDU</code>. <strong>What is that? Google says it’s a Power Distribution Unit.</strong> Ok, this is maybe a kernel panic debug log? How do we decode this? Are there similar examples online? I can’t find any with that format. Maybe if we-</p>
<blockquote>
<p><em>Nah, mate. I just googled “PDU Decoder”. Stick it in <a href="https://www.diafaan.com/sms-tutorials/gsm-modem-tutorial/online-sms-pdu-decoder/">here</a>. Apparently that’s a Protocol Data Unit? Anyway, it outputs the final text.</em></p>
<p>– @pfac</p>
</blockquote>
<p>Ah… That’s… fair…</p>
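<p>For the curious, the text of an SMS PDU like this one is typically packed as 7-bit characters, 8 characters into every 7 bytes. Below is a rough Ruby sketch of just that unpacking step. It assumes the GSM 7-bit default alphabet can be read as ASCII (true for letters, digits and most punctuation, not for everything), it skips the PDU header fields entirely, and <code class="prettyprint">unpack_gsm7</code> is a name I made up. The sample is a commonly used <code class="prettyprint">hellohello</code> test vector, not the challenge message:</p>
<pre><code class="prettyprint lang-ruby"># Unpack GSM 7-bit packed user data (a hex string) into text.
# Assumption: every septet is treated as a plain ASCII code.
def unpack_gsm7(hex, septet_count)
  octets = [hex].pack("H*").bytes
  # Flatten every octet into its bits, least significant bit first.
  bits = octets.flat_map { |octet| (0..7).map { |i| octet[i] } }
  # Regroup the bit stream into 7-bit characters.
  septets = bits.each_slice(7).first(septet_count)
  septets.map { |group| group.reverse.join.to_i(2).chr }.join
end

puts unpack_gsm7("E8329BFD4697D9EC37", 10)  # hellohello
</code></pre>
<p>An online decoder also parses the header in front of the message for you, which is why it was the quicker route here.</p>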
tag:blog.mendes.codes,2014:Post/refresh-from-your-editor2018-05-26T05:37:40-07:002018-05-26T05:37:40-07:00stop the overhead, refresh from your editor<p><em>tl;dr: I am lazy and I made a script for when I don’t have webpack to refresh my browser for me. I can now refresh it from my editor. It’s available <a href="https://github.com/fribmendes/dotfiles/blob/master/bin/browser.refresh">here</a>. This is specific for macOS. Script is explained ahead.</em></p>
<p>I’m a lazy programmer. If anything requires me to get off my terminal or my vim, I will probably automate it. <strong><a href="https://github.com/fribmendes/dotfiles/blob/master/bin/xkcd">Like checking the most recent xkcd</a></strong>.</p>
<p>Sometimes I don’t have webpack to refresh my browser for me. This is an issue because it requires me to change focus from the terminal to the browser and <em>then</em> refresh. You may think that automating this is a huge overkill. However, I found that having a shortcut in my editor to do that has <em>significantly</em> decreased the time it takes for me to process everything that happened.</p>
<p>Let’s go through this one step at a time.</p>
<p>First of all, you need to do cmd+tab. You are very likely to code in full screen and in macOS which means <strong>you have a cute little animation that takes a few hundred milliseconds to change screens</strong>. Since this is a mechanical process, your muscles get used to doing cmd+tab once. <strong>This doesn’t take you necessarily to your browser</strong>. It takes you to the last window you had open. Oops. Now you instinctively do cmd+tab again. It takes you back to your last open window. Your terminal. Now you’re back where you started. This is a pattern I’ve seen happen recurrently in most developers.</p>
<p>But let’s assume the best case scenario where you effectively went to your browser. <strong>Your brain still has to process everything</strong> that happened and make sure you opened the right window. Only now will you hit cmd+r which requires a different hand movement and a couple more hundred milliseconds.</p>
<p>There is one final overhead: <strong>context switching</strong>. I found this happens not only to me but to other developers as well. The act of changing windows and refreshing has made your brain <strong>switch context to process the visual changes</strong>. This is mostly to make sure you are in the window you wanted to be. It takes a while for your brain to go back to the previous context and figure out which task you wanted to check. If you’re tired, this can mean up to a couple more seconds of overhead.</p>
<p>All this cognitive overhead can be solved by <strong>muscle memory</strong>. Like I said, that first cmd+tab is usually the result of muscle memory. The problem is everything that comes afterwards. We can leverage that and have a shortcut that finds the correct browser window, focuses on it if needed and refreshes in one go.</p>
<p>This is the script:</p>
<pre><code class="prettyprint lang-applescript">#!/usr/bin/osascript
tell application "System Events"
  set processList to get the name of every process whose background only is false
  set applicationNameList to {}
  repeat with processName in processList
    set applicationList to file of (application processes where name is processName)
    repeat with applicationAlias in applicationList
      set applicationName to (name of applicationAlias) as string
      set applicationNameList to applicationNameList & applicationName
    end repeat
  end repeat
end tell
set browserLaunched to true
if applicationNameList contains "Firefox Developer Edition.app" then
  set browser to "Firefox Developer Edition"
else if applicationNameList contains "Google Chrome.app" then
  set browser to "Google Chrome"
else if applicationNameList contains "Firefox.app" then
  set browser to "Firefox"
else
  set browser to "Firefox Developer Edition"
  set browserLaunched to false
end if
set numberOfDisplays to (do shell script "system_profiler SPDisplaysDataType -detailLevel | grep -e 'Resolution:' | wc -l | tr -d '[:space:]'") as integer
if browserLaunched and numberOfDisplays > 1 then
  set browserShouldActivate to false
else
  set browserShouldActivate to true
end if
tell application browser
  if browserShouldActivate then activate
end tell
tell application "System Events"
  tell process browser
    keystroke "r" using {command down}
    delay 0.1
  end tell
end tell
</code></pre>
<p>It works the following way:</p>
<ol>
<li>
<strong>Get the current application list.</strong> For those of you familiar with AppleScript, you will wonder why I go through the extra process of getting the application name, instead of using the process name. That’s because “Firefox Developer Edition” and “Firefox” are different applications but use the same “firefox” process.</li>
<li>
<strong>Get the correct browser.</strong> I usually use FF Dev when working, so if it’s open, it’s going to be that one. Otherwise, I’m probably working in Chrome, so that’s the next check. If neither is open, try regular Firefox. Finally, just assume no browser is launched and set the flag to launch it. Those of you who use a different browser or browser stack should just change the order of the <code class="prettyprint">if</code> statements.</li>
<li>
<strong>Check the number of displays.</strong> If I’m using two displays, I’ll have the terminal on one and the browser on the other so I don’t need to change focus to it. If I’m using only one display, then I need to put the focus on the browser.</li>
<li>
<strong>Activate if needed</strong>. If no browser is launched, this will launch it. Otherwise, it will just change the window focus.</li>
<li><strong>Refresh the browser.</strong></li>
</ol>
<p>Save this into a <code class="prettyprint">browser.refresh</code> file and put it somewhere in <code class="prettyprint">$PATH</code>. The next step is to call it from the editor. I use nvim, so it is a simple one-liner: <code class="prettyprint">nnoremap <localleader>r :silent !browser.refresh<CR></code>.</p>
<p>This automation allows me to resolve all that hassle by pressing <code class="prettyprint">,r</code>.</p>
<p>I found that by using this, by the time the screen finishes switching, the refresh is almost always done. It also saves me from the annoying cmd+tab dance I do when the browser isn’t the last window I opened.</p>
<p>Take the script. Put it in a file in your <code class="prettyprint">$PATH</code>. Add a shortcut to your editor. Stop the overhead.</p>
<p><strong>UPDATE</strong>: Some people told me they mostly use Chrome and the browser lookup thing is a bit of a hassle for them. This should work for you, change Chrome to your browser as needed:</p>
<pre><code class="prettyprint lang-applescript">#!/usr/bin/osascript
set numberOfDisplays to (do shell script "system_profiler SPDisplaysDataType -detailLevel | grep -e 'Resolution:' | wc -l | tr -d '[:space:]'") as integer
if numberOfDisplays > 1 then
  set browserShouldActivate to false
else
  set browserShouldActivate to true
end if
tell application "Google Chrome"
  if browserShouldActivate then activate
end tell
tell application "System Events"
  tell process "Google Chrome"
    delay 0.1
    keystroke "r" using command down
  end tell
end tell
</code></pre>
tag:blog.mendes.codes,2014:Post/i-always-wanted-to-do-a-screencast2018-05-18T10:38:13-07:002018-05-18T10:38:13-07:00I always wanted to do a screencast<p>I always wanted to do a screencast. I was also always afraid to do it.</p>
<p>That being said, it’s called Beware of the Software and the first episode is <a href="https://www.youtube.com/watch?v=LELH5ohH2Vo">here</a>:</p>
<iframe width="560" height="315" src="https://www.youtube-nocookie.com/embed/LELH5ohH2Vo?rel=0"></iframe>
<p>You can <a href="https://github.com/fribmendes/beware_of_the_software">find the code for it here</a>.</p>
<p>The screencast is going to be about… uh… computer stuff, let’s call it that. I can’t promise you it will always be about Elixir or distributed systems. But at least the next batch of episodes will be precisely on that. Distributed systems with Elixir. From there, I’m thinking about going through some CS papers. I also take suggestions if you’re willing to give them.</p>
<p>I’m still learning and experimenting with the format so it is far from perfect… Or maybe even “good”. But you can help me make it good.</p>
<p>I’m asking everyone for feedback. Drastically reducing the number of minutes, showing myself while coding, increasing the font size, tips for the mic and voice. Help has been invaluable. <a href="https://twitter.com/fribmendes">Hit me on Twitter</a> with all you have (<a href="https://svbtle.com">Svbtle</a> doesn’t allow comments). Rip me a new one if you must, but all feedback is appreciated.</p>
<p>Hope you enjoy!<br>
hack the gibson and all that.</p>
tag:blog.mendes.codes,2014:Post/mix-format-in-vim-from-anywhere-or-just-in-umbrella-apps2018-04-23T03:31:59-07:002018-04-23T03:31:59-07:00
mix format in vim from anywhere (or just in umbrella apps)<p><em>Spoiler alert: This post is about setting up vim so that you run mix format automatically when you save a file and have it detect the nearest <code class="prettyprint">.formatter.exs</code></em></p>
<p>I have this weird issue with mix format and umbrella apps.</p>
<p>The issue is that <code class="prettyprint">mix format</code> assumes you have a <code class="prettyprint">.formatter.exs</code> file in the current directory. If you don’t, it doesn’t look upwards in the file tree. It simply assumes you want to run it with the default config. You can change this behaviour by using the <code class="prettyprint">--dot-formatter</code> flag to explicitly point to the formatter file you want to use.</p>
<p>Now, in vim you can also use <a href="https://github.com/w0rp/ale"><code class="prettyprint">ale</code></a> to run <code class="prettyprint">mix format</code> on save. If you don’t know <code class="prettyprint">ale</code>, take the time to get to know it. I set up <code class="prettyprint">ale</code> to do precisely this by adding the following line to my (n)vim config:</p>
<pre><code class="prettyprint lang-vim">let g:ale_fixers['elixir'] = ['mix_format']
</code></pre>
<p>Most of the time you will be good to go with this and you won’t find any more issues.</p>
<p>But when I’m working with Elixir umbrella apps, I sometimes <code class="prettyprint">cd</code> to the <code class="prettyprint">apps/&lt;app&gt;</code> directory so that my <a href="https://github.com/kien/ctrlp.vim">Ctrl+P</a> doesn’t get all cluttered by similarly named files from different applications. If you have a <code class="prettyprint">.formatter.exs</code> file with custom rules and <code class="prettyprint">ale</code> set up to run <code class="prettyprint">mix format</code> on file save, things get tricky. <code class="prettyprint">mix format</code> won’t detect your config unless you explicitly set the <code class="prettyprint">--dot-formatter</code> flag.</p>
<p>As of a few weeks ago, <a href="https://github.com/w0rp/ale/pull/1410">my PR to allow custom mix format options</a> in <code class="prettyprint">ale</code> has been accepted.</p>
<p>With this, we can attempt to find the nearest <code class="prettyprint">.formatter.exs</code> and dynamically pass the location to <code class="prettyprint">ale</code>. Here’s a bit of vimscript to do so (ignore my n00bness writing vimscript):</p>
<pre><code class="prettyprint lang-vim">function! LoadNearestFormatter()
  let l:formatters = []
  let l:directory = fnameescape(expand("%:p:h"))

  " Collect every .formatter.exs from the current file's directory upwards
  for l:fmt in findfile(".formatter.exs", l:directory . ";", -1)
    call insert(l:formatters, l:fmt)
  endfor

  " insert() prepends, so reverse() restores findfile()'s original order
  call reverse(l:formatters)

  let g:ale_fixers['elixir'] = ['mix_format']

  if len(l:formatters) > 0
    let g:ale_elixir_mix_format_options = "--dot-formatter " . l:formatters[0]
  endif
endfunction

call LoadNearestFormatter()
</code></pre>
<p>If you have a better version of this, please let me know.</p>
<p>That bit of code will look for <code class="prettyprint">.formatter.exs</code> files along the file tree and if any is found, it passes that along with the correct option.</p>
<p>You can also define a <code class="prettyprint">.formatter.exs</code> in your home directory and use it as a fallback for when no other <code class="prettyprint">.formatter.exs</code> is present, but I’d advise against this.</p>
<p><a href="https://twitter.com/naps62">@naps62</a> has the following vim code instead:</p>
<pre><code class="prettyprint lang-vim">let l:git_root = system("git rev-parse --show-toplevel")[:-2]
let l:fmt = findfile(".formatter.exs", l:git_root)
let g:ale_elixir_mix_format_options = "--dot-formatter " . l:fmt
</code></pre>
<p>It’s a lot less verbose and it won’t search for <code class="prettyprint">.formatter.exs</code> files in your home directory.</p>
<p>And that’s it! Now you can use <code class="prettyprint">ale</code> to automatically run <code class="prettyprint">mix format</code> inside umbrella apps.</p>
<hr>
<p><em>I tend to write here so you can <a href="http://blog.mendes.codes/feed">subscribe</a>. I also <a href="https://twitter.com/fribmendes">tweet</a> and <a href="https://github.com/fribmendes">do open source</a> sometimes. If you are into that type of thing, hit that follow button.</em></p>
tag:blog.mendes.codes,2014:Post/a-look-into-bloom-filters-with-ruby2018-04-17T06:44:01-07:002018-04-17T06:44:01-07:00a look into bloom filters with ruby<p><em>Disclaimer: this blog post was originally written and published in the <a href="https://subvisual.co/blog/posts/96-a-look-into-bloom-filters-with-ruby/">Subvisual blog</a> in April 2016.</em></p>
<p>I remember one particular class I had. It was late May and, as pretty much every Spring day in Portugal, the sun decided to greet us with a little too much enthusiasm.</p>
<p>The class was about Reliable Distributed Systems, as part of my Distributed Systems & Cryptography master’s program. Distributed Systems students at <a href="https://www.uminho.pt/EN">University of Minho</a> have their classes every Monday in the mythical 0.05 room. A room conveniently located just a couple of meters away from the coffee machine. A room in front of a beautiful, grassy, green patch right in the middle of the campus. A room where the blazing heat caused by 6 straight hours of direct sunlight meets the noisy embrace of <em>dozens</em> of servers in the back. Of course, eager PhD students have millions of tests, queries and transactions to analyse, which doesn’t help our case <em>at all</em>. And of course they all come back from the weekend anxious to run them all at once while those poor, young and ambitious master’s students are having classes.</p>
<p>During that particular class, the professor was introducing P2P networks and <a href="https://en.wikipedia.org/wiki/Gnutella">Gnutella</a>. At this point, everyone was in awe. I remember hearing in the distance <em>“So this is how we piracy™…”</em> I still find the lack of terrible corporation related puns a bit disturbing but maybe that noisy server embrace sucked all our humor away.</p>
<p>When the professor mentioned <em>“bloom filters”</em> my senses started to tingle. Maybe it was the late May heat or the fact that it sounded <em>really</em> fancy. But I was very bored, and I decided to check it out.</p>
<h2 id="bloom-filters_2">Bloom Filters <a class="head_anchor" href="#bloom-filters_2">#</a>
</h2>
<p>I began my research by opening up the promised land of essays for university students: Wikipedia.</p>
<p>According to the Wikipedia entry, a bloom filter is a <em>space-efficient probabilistic data-structure</em>, which at the time I thought was mostly technical jargon for <em>a funky array that uses hash functions to index boolean values and is supposed to be really really small</em>.</p>
<p>It’s used for testing the inclusion of elements in a set (<em>is 6 in the bloom filter?</em>), and some notable adopters include Akamai, Bitcoin, Medium and loads of databases. Apparently, Gnutella uses it to check if a super-peer’s connections are sharing requested content. I probably could’ve learned that earlier if I had actually been listening to the professor…</p>
<p>Before we delve into its internal behaviour, let’s make sure we get the basic definition and overall behaviour right.</p>
<p><em>Think of a Bloom Filter as a small blackbox where you can save values but not remove them. You can also ask it whether it contains a certain value. If the response is negative, it’s <strong>guaranteed</strong> that the value is not in the bloom filter. However, if the response is positive, the value is <strong>probably</strong> in the bloom filter, but occasionally this won’t be true.</em></p>
<p>At this point, like me in that hot late Spring afternoon, you’re probably thinking:</p>
<blockquote class="short">
<p>Why would anyone even like this?</p>
</blockquote>
<p>You put values in but you can’t remove them. You query values but you can’t trust the answer. As far as usefulness goes, you’ve probably labeled them already as the <em>Magikarp of Computer Science</em>.</p>
<p>Well, the thing about bloom filters is that they are very efficient, both in space and time. You can check whether a value is in the bloom filter in near-constant time, and the filter doesn’t even need to store the elements themselves. In fact, most bloom filters use only a few bits per element. As we will see further ahead, if your application requires fast inclusion tests and can handle a few occasional false positives, bloom filters are for you. Let’s dissect our Magikarp.</p>
<h3 id="dissecting-a-bloom-filter_3">Dissecting a Bloom Filter <a class="head_anchor" href="#dissecting-a-bloom-filter_3">#</a>
</h3>
<p>So how does such a peculiar data type work?</p>
<p>Bloom filters implement two operations: <code class="prettyprint">add</code> and <code class="prettyprint">test</code>. Both operations start by hashing the given value multiple times, either by using a different seed or running different hash functions. The output is a set of indexes or keys that will either be checked for inclusion (if we are testing) or marked as <code class="prettyprint">present</code> (if we are adding).</p>
<p>Imagine I give you an empty bloom filter and you want to add <code class="prettyprint">subvisual</code>. The string will be hashed 3 times and the 3 corresponding indexes will be filled up. The result should be something similar to this:</p>
<p><img src="https://subvisual.s3.amazonaws.com/blog/post_image/142/image-1469810899727.png" alt='Bloom Filter containing "subvisual"'></p>
<p>Ok, seems good. However, you are now curious and you begin to wonder if <code class="prettyprint">rubyconfpt</code> is contained in the structure. You decide to <code class="prettyprint">test</code> it.</p>
<p>The string will be hashed the same number of times and the resulting indexes will be checked.</p>
<p><img src="https://subvisual.s3.amazonaws.com/blog/post_image/143/image-1469810920599.png" alt="Bloom Filter testing `rubyconfpt`"></p>
<p>Even though one of the indexes was indeed filled up, the other two weren’t, so you can conclusively state that <code class="prettyprint">rubyconfpt</code> isn’t in the filter. In fact, if even a single index turns out to be empty, you can safely draw this conclusion.</p>
<p>Eager for more values, you try adding <code class="prettyprint">rubyconfpt</code> next. The resulting indexes will also be marked as present. Any repeated index will have no changes since the universe of possible values inside a bloom filter is only <code class="prettyprint">filled</code> or <code class="prettyprint">empty</code>.</p>
<p><img src="https://subvisual.s3.amazonaws.com/blog/post_image/144/image-1469810942917.png" alt="Bloom Filter after adding `rubyconfpt`"></p>
<p>Now, suppose you want to test if <code class="prettyprint">mirrorconf</code> is in the bloom filter. I can assure you it isn’t, but you’re clever and curious. Instead of taking my word for it, you decide to test it anyway.</p>
<p><img src="https://subvisual.s3.amazonaws.com/blog/post_image/145/image-1469810973884.png" alt="Bloom Filter testing `mirrorconf`"></p>
<p>Even though <code class="prettyprint">mirrorconf</code> was never added, the bloom filter is saying it indeed contains it. Well, <em>probably contains</em>. This happens because a bloom filter is a <strong>probabilistic data structure</strong>. The fact that we have a limited number of indexes available to fill, along with the natural properties of hash functions, means that eventually collisions will occur. The use of multiple hashed values attempts to reduce the number of collisions, making them rare but not nonexistent.</p>
<h3 id="diving-into-ruby_3">Diving into Ruby <a class="head_anchor" href="#diving-into-ruby_3">#</a>
</h3>
<p>We can implement a <em>very</em> simple bloom filter as an array or hash table. This will be a very <em>dumbed-down</em>, inefficient implementation. Let’s call it our <em>Dumbfilter</em>.</p>
<p>Everything I said so far mentioned hash functions, but are they really required? We’ll start by implementing it with a simple array. Every element we may want to <code class="prettyprint">add</code> is going to be pushed into it. As a consequence, testing will be done using <code class="prettyprint">Array#include?</code>. The resulting code looks something like this:</p>
<pre><code class="prettyprint lang-ruby">module DumbFilter
  class Array
    def initialize
      @data = []
    end

    def test(str)
      @data.include? str
    end

    def add(str)
      @data << str
    end
  end
end
</code></pre>
<p>If we take some time to think about the issues with this implementation, we can find some very obvious ones. Well, for starters you don’t get to play with hash functions which, at least for me during the symphony of servers orchestrated by my professor, was a big <em>put-off</em>. Besides that, the sequential access that comes with using an array means testing takes <code class="prettyprint">O(n)</code> time, since <code class="prettyprint">Array#include?</code> may have to scan every element (adding is cheap, as it only appends), not to mention <code class="prettyprint">O(n)</code> space complexity.</p>
<p>Let’s try to improve our <em>dumbfilter</em> by reducing the time complexity. If hash functions are required for efficiency, we can achieve constant lookup by using a hash table. In fact, let’s make use of Ruby’s internal hash functions and just use the <code class="prettyprint">Hash#[]=</code> operator to set the accessed value to <code class="prettyprint">true</code>.</p>
<pre><code class="prettyprint lang-ruby">module DumbFilter
  class Hash
    def initialize
      @data = {}
    end

    def test(str)
      @data[str]
    end

    def add(str)
      @data[str] = true
    end
  end
end
</code></pre>
<p>This solution appears to be better since we now have constant access. However, we are saving explicit <code class="prettyprint">(key, value)</code> tuples and Bloom Filters are <em>space-efficient data structures</em>, so the current solution isn’t exactly what we are looking for. Our milestone will be the <em>few bits per element</em> I mentioned earlier. We can start by storing single bits in an array and generating the correct index for each string. To do this, let’s start by adding <a href="https://github.com/peterc/bitarray">@peterc’s bitarray</a> to our project. We’ll also be using the <a href="https://github.com/jakedouglas/fnv-ruby">fnv hash</a>.</p>
<p>In this version we are going to hash a given string, obtaining an integer as a result. That integer has to be limited to the size of our array and we can guarantee that by using <a href="https://en.wikipedia.org/wiki/Modulo_operation">the modulo operation</a>: <code class="prettyprint">index % size</code> results in a value between 0 and <code class="prettyprint">size - 1</code>. After that, adding and testing both become a simple access to the correct index, setting a bit to 1 when adding.</p>
<pre><code class="prettyprint lang-ruby">require "fnv"
require "bitarray"

module BloomFilter
  class V1
    def initialize(size: 1024)
      @bits = BitArray.new(size)
      @fnv = FNV.new
      @size = size
    end

    def add(str)
      @bits[i(str)] = 1
    end

    def test(str)
      @bits[i(str)] == 1
    end

    private

    # Map the string to a single index inside the bit array
    def i(str)
      @fnv.fnv1a_64(str) % @size
    end
  end
end
</code></pre>
<p>The main issue with this version is that, over time, the bloom filter will become clogged with multiple false positives due to recurrent collisions. Since our universe of possible values is limited to the array size, bloom filters in particular tend to suffer from this effect. To handle it, we can either use multiple hash functions or the same hash function with different seeds. Let’s implement the latter.</p>
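<p>To get a rough feel for how quickly a filter clogs up, here is a small sketch based on the standard approximation for the false positive rate, <code class="prettyprint">(1 - e^(-k*n/m))^k</code>, where <code class="prettyprint">m</code> is the number of bits, <code class="prettyprint">n</code> the number of added elements and <code class="prettyprint">k</code> the number of hashes per element. The method name and the numbers are made up for illustration, and the formula assumes the hash functions spread values uniformly, which real ones only approximate.</p>

```ruby
# Approximate false positive rate of a bloom filter.
# bits:   size of the bit array (m)
# items:  number of elements added so far (n)
# hashes: number of hashes computed per element (k)
# Hypothetical helper, not part of any gem mentioned in this post.
def false_positive_rate(bits:, items:, hashes:)
  (1 - Math.exp(-hashes * items.to_f / bits))**hashes
end

one_hash = false_positive_rate(bits: 1024, items: 100, hashes: 1)
three_hashes = false_positive_rate(bits: 1024, items: 100, hashes: 3)

puts format("1 hash:   %.3f", one_hash)
puts format("3 hashes: %.3f", three_hashes)
```

<p>With 100 elements in 1024 bits, three hashes give a lower estimate than one. Keep adding elements, though, and the extra hashes start to hurt, since every element fills up more of the array.</p>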
<p>To guarantee that multiple invocations with the same input produce the exact same output, we’ll need to generate the seeds and save them beforehand.</p>
<pre><code class="prettyprint lang-ruby">require "securerandom"

# Generate `nr` random 24-bit seeds, one per hashing round
def seed(nr)
  (1..nr).each_with_object([]) do |_n, seeds|
    seeds << SecureRandom.hex(3).to_i(16)
  end
end
</code></pre>
<p>After generating and saving the seeds, we need to define how hashing will occur for multiple seeds. In our case, we will simply generate an array containing the hash value for every available seed.</p>
<p>This particular implementation uses the <a href="https://en.wikipedia.org/wiki/MurmurHash">MurmurHash function</a> which is internally used by Ruby. By using it, we can later compare results with the actual Hash implementation.</p>
<pre><code class="prettyprint lang-ruby"># One index per seed, each limited to the size of the bit array
def i(str)
  @seeds.map { |seed| hash(str, seed) % @size }
end

def hash(str, seed)
  MurmurHash3::V32.str_hash(str, seed)
end
</code></pre>
<p>With these three methods, we are now able to generate the same indexes on repeated calls. Adding should be nothing more than marking every index with 1 and testing should be limited to retrieving the index values and checking if they are all 1. The final versions of the code are available <a href="https://gist.github.com/frmendes/67eae3f7792ed812330a344e91e35dfa">here</a>. Feel free to comment if you have any questions or want to add something.</p>
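<p>For reference, here is a minimal, self-contained sketch of how those pieces could fit together. It is not the code from the gist. To keep it runnable with nothing but the standard library, it makes a few assumptions: MD5 with the seed prepended to the string stands in for MurmurHash, a plain Ruby Integer acts as the bit array, and the seeds are fixed instead of random. The class name and the default seeds are made up for this example.</p>

```ruby
require "digest"

module BloomFilter
  # Illustrative sketch only: MD5 replaces MurmurHash and an Integer
  # replaces the bitarray gem so that no extra gems are needed.
  class Sketch
    def initialize(size: 1024, seeds: [1, 2, 3])
      @size = size
      @seeds = seeds
      @bits = 0 # every bit starts out empty
    end

    # Mark every index generated for the string as filled.
    def add(str)
      indexes(str).each { |index| @bits |= (1 << index) }
      self
    end

    # True only if every index generated for the string is filled.
    def test(str)
      indexes(str).all? { |index| @bits[index] == 1 }
    end

    private

    # One index per seed, each limited to the size of the filter.
    def indexes(str)
      @seeds.map do |seed|
        Digest::MD5.hexdigest("#{seed}:#{str}").to_i(16) % @size
      end
    end
  end
end

filter = BloomFilter::Sketch.new
filter.add("subvisual")
puts filter.test("subvisual") # always true: there are no false negatives
```

<p>Testing a string that was never added will usually return <code class="prettyprint">false</code>, but it can return <code class="prettyprint">true</code> whenever all of its indexes happen to be filled already.</p>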
<h3 id="in-summary_3">In Summary <a class="head_anchor" href="#in-summary_3">#</a>
</h3>
<p>By now I hope to have shown you what bloom filters are and how they work.</p>
<p>In the wild, companies like Quora and Medium use them to <a href="https://medium.com/the-story/what-are-bloom-filters-1ec2a50c68ff">help tailor your suggestions</a>. Facebook also uses bloom filters on <a href="https://www.facebook.com/video/video.php?v=432864835468">type-ahead queries</a> and bitly for <a href="http://word.bitly.com/post/28558800777/dablooms-an-open-source-scalable-counting">malicious url checks</a>, among several others.</p>
<p>As for Ruby, there seem to be two alternatives that stand out: <a href="https://github.com/igrigorik/bloomfilter-rb">igrigorik’s bloomfilter-rb</a>, which can work with Redis and act as a counting or non-counting filter, and <a href="https://github.com/deepfryed/bloom-filter">deepfryed’s bloom-filter</a>. Both rely on C extensions.</p>