Commit 89b8b00

author
Exploding Labs Bot
committed
Update site from Jekyll source repo
1 parent 12ba20d commit 89b8b00

89 files changed

+1193
-1675
lines changed

Lines changed: 52 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,52 @@
1+
<div class="wide-logos">
2+
<p><img src="/posts/assets/airflow.png" alt="airflow" /></p>
3+
</div>
4+
5+
<p>Airflow’s context dictionary is built by the <code class="language-plaintext highlighter-rouge">get_template_context</code> method in Airflow’s
<a href="https://github.com/databricks/incubator-airflow/blob/master/airflow/models.py">models.py</a>.</p>
8+
9+
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">{</span>
    <span class="s">'dag'</span><span class="p">:</span> <span class="n">task</span><span class="p">.</span><span class="n">dag</span><span class="p">,</span>
    <span class="s">'ds'</span><span class="p">:</span> <span class="n">ds</span><span class="p">,</span>
    <span class="s">'ds_nodash'</span><span class="p">:</span> <span class="n">ds_nodash</span><span class="p">,</span>
    <span class="s">'ts'</span><span class="p">:</span> <span class="n">ts</span><span class="p">,</span>
    <span class="s">'ts_nodash'</span><span class="p">:</span> <span class="n">ts_nodash</span><span class="p">,</span>
    <span class="s">'yesterday_ds'</span><span class="p">:</span> <span class="n">yesterday_ds</span><span class="p">,</span>
    <span class="s">'yesterday_ds_nodash'</span><span class="p">:</span> <span class="n">yesterday_ds_nodash</span><span class="p">,</span>
    <span class="s">'tomorrow_ds'</span><span class="p">:</span> <span class="n">tomorrow_ds</span><span class="p">,</span>
    <span class="s">'tomorrow_ds_nodash'</span><span class="p">:</span> <span class="n">tomorrow_ds_nodash</span><span class="p">,</span>
    <span class="s">'END_DATE'</span><span class="p">:</span> <span class="n">ds</span><span class="p">,</span>
    <span class="s">'end_date'</span><span class="p">:</span> <span class="n">ds</span><span class="p">,</span>
    <span class="s">'dag_run'</span><span class="p">:</span> <span class="n">dag_run</span><span class="p">,</span>
    <span class="s">'run_id'</span><span class="p">:</span> <span class="n">run_id</span><span class="p">,</span>
    <span class="s">'execution_date'</span><span class="p">:</span> <span class="bp">self</span><span class="p">.</span><span class="n">execution_date</span><span class="p">,</span>
    <span class="s">'prev_execution_date'</span><span class="p">:</span> <span class="n">prev_execution_date</span><span class="p">,</span>
    <span class="s">'next_execution_date'</span><span class="p">:</span> <span class="n">next_execution_date</span><span class="p">,</span>
    <span class="s">'latest_date'</span><span class="p">:</span> <span class="n">ds</span><span class="p">,</span>
    <span class="s">'macros'</span><span class="p">:</span> <span class="n">macros</span><span class="p">,</span>
    <span class="s">'params'</span><span class="p">:</span> <span class="n">params</span><span class="p">,</span>
    <span class="s">'tables'</span><span class="p">:</span> <span class="n">tables</span><span class="p">,</span>
    <span class="s">'task'</span><span class="p">:</span> <span class="n">task</span><span class="p">,</span>
    <span class="s">'task_instance'</span><span class="p">:</span> <span class="bp">self</span><span class="p">,</span>
    <span class="s">'ti'</span><span class="p">:</span> <span class="bp">self</span><span class="p">,</span>
    <span class="s">'task_instance_key_str'</span><span class="p">:</span> <span class="n">ti_key_str</span><span class="p">,</span>
    <span class="s">'conf'</span><span class="p">:</span> <span class="n">configuration</span><span class="p">,</span>
    <span class="s">'test_mode'</span><span class="p">:</span> <span class="bp">self</span><span class="p">.</span><span class="n">test_mode</span><span class="p">,</span>
    <span class="s">'var'</span><span class="p">:</span> <span class="p">{</span>
        <span class="s">'value'</span><span class="p">:</span> <span class="n">VariableAccessor</span><span class="p">(),</span>
        <span class="s">'json'</span><span class="p">:</span> <span class="n">VariableJsonAccessor</span><span class="p">()</span>
    <span class="p">}</span>
<span class="p">}</span>
</code></pre></div></div>
42+
43+
<p>An explanation of each item is found in the documentation under
<a href="https://airflow.apache.org/docs/stable/macros-ref.html">Macros</a>.</p>
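<p>Most of the date strings in the context are derived from the execution date. The following is a rough, standard-library-only sketch of those derivations. <code class="language-plaintext highlighter-rouge">date_macros</code> is a hypothetical helper, not part of Airflow, and the real method may differ in detail between versions.</p>

```python
from datetime import datetime, timedelta


def date_macros(execution_date):
    """Derive the date-string context entries from one datetime (illustrative only)."""
    ds = execution_date.strftime('%Y-%m-%d')
    yesterday = (execution_date - timedelta(days=1)).strftime('%Y-%m-%d')
    tomorrow = (execution_date + timedelta(days=1)).strftime('%Y-%m-%d')
    return {
        'ds': ds,
        'ds_nodash': ds.replace('-', ''),
        'ts': execution_date.isoformat(),
        'yesterday_ds': yesterday,
        'yesterday_ds_nodash': yesterday.replace('-', ''),
        'tomorrow_ds': tomorrow,
        'tomorrow_ds_nodash': tomorrow.replace('-', ''),
    }


values = date_macros(datetime(2018, 3, 6, 9, 59, 10))
print(values['ds'])            # 2018-03-06
print(values['yesterday_ds'])  # 2018-03-05
```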
45+
46+
<p>Incidentally, you can generate the context from a TaskInstance.</p>
47+
48+
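<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kn">from</span> <span class="nn">datetime</span> <span class="kn">import</span> <span class="n">datetime</span>

<span class="kn">from</span> <span class="nn">airflow.models</span> <span class="kn">import</span> <span class="n">TaskInstance</span>

<span class="c1"># `task` is an operator instance that already belongs to a DAG.</span>
<span class="n">context</span> <span class="o">=</span> <span class="n">TaskInstance</span><span class="p">(</span>
    <span class="n">task</span><span class="o">=</span><span class="n">task</span><span class="p">,</span>
    <span class="n">execution_date</span><span class="o">=</span><span class="n">datetime</span><span class="p">.</span><span class="n">now</span><span class="p">()</span>
<span class="p">).</span><span class="n">get_template_context</span><span class="p">()</span>
</code></pre></div></div>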
Lines changed: 34 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,34 @@
1+
<div class="wide-logos">
2+
<p><img src="/posts/assets/airflow.png" alt="airflow" /></p>
3+
</div>
4+
5+
<p>Install cryptography:</p>
6+
7+
<div class="language-shell highlighter-rouge"><div class="highlight"><pre class="highlight"><code>pip <span class="nb">install </span>cryptography
8+
</code></pre></div></div>
9+
10+
<p>Generate a fernet key:</p>
11+
12+
<div class="language-shell highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nv">$ </span>python <span class="nt">-c</span> <span class="s2">"from cryptography.fernet import Fernet; print(Fernet.generate_key().decode())"</span>
13+
<span class="nv">81HqDtbqAywKSOumSha3BhWNOdQ26slT6K0YaZeZyPs</span><span class="o">=</span>
14+
</code></pre></div></div>
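<p>For the curious, a Fernet key is 32 random bytes encoded as URL-safe base64. The sketch below reproduces that shape with the standard library only, purely to show what the key is made of. <code class="language-plaintext highlighter-rouge">generate_fernet_style_key</code> is a hypothetical name; for real keys, stick with the <code class="language-plaintext highlighter-rouge">cryptography</code> command above.</p>

```python
import base64
import os


def generate_fernet_style_key():
    """Return 32 random bytes, URL-safe base64-encoded, as text."""
    return base64.urlsafe_b64encode(os.urandom(32)).decode()


key = generate_fernet_style_key()
print(len(key))  # 44
```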
15+
16+
<h2 id="use-a-fernet-key-with-airflow">Use a fernet key with Airflow</h2>
17+
18+
<p>Paste the key into the <code class="language-plaintext highlighter-rouge">[core]</code> section of your <code class="language-plaintext highlighter-rouge">airflow.cfg</code>.</p>
19+
20+
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>fernet_key = 81HqDtbqAywKSOumSha3BhWNOdQ26slT6K0YaZeZyPs=
21+
</code></pre></div></div>
22+
23+
<p>Alternatively, set the environment variable.</p>
24+
25+
<div class="language-shell highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nb">export </span><span class="nv">AIRFLOW__CORE__FERNET_KEY</span><span class="o">=</span><span class="s1">'81HqDtbqAywKSOumSha3BhWNOdQ26slT6K0YaZeZyPs='</span>
26+
</code></pre></div></div>
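<p>The variable name follows Airflow’s convention for overriding any <code class="language-plaintext highlighter-rouge">airflow.cfg</code> setting from the environment: <code class="language-plaintext highlighter-rouge">AIRFLOW__{SECTION}__{KEY}</code>, with double underscores. A small sketch of that mapping, where <code class="language-plaintext highlighter-rouge">airflow_env_var</code> is a hypothetical helper and not an Airflow function:</p>

```python
def airflow_env_var(section, key):
    """Build the environment variable name that overrides [section] key."""
    return 'AIRFLOW__{}__{}'.format(section.upper(), key.upper())


print(airflow_env_var('core', 'fernet_key'))  # AIRFLOW__CORE__FERNET_KEY
```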
27+
28+
<p>Restart Airflow’s webserver.</p>
29+
30+
<div class="warning">
31+
<p>Existing connections (the ones that were defined before setting the Fernet
key) are not encrypted automatically. Open each of them in the web admin,
re-type the password and save it.</p>
34+
</div>
Lines changed: 83 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,83 @@
1+
<div class="wide-logos">
2+
<p><img src="/posts/assets/airflow.png" alt="airflow" /></p>
3+
</div>
4+
5+
<p>The command to trigger an Airflow dag is simply:</p>
6+
7+
<div class="language-sh highlighter-rouge"><div class="highlight"><pre class="highlight"><code>airflow trigger_dag my-dag
8+
</code></pre></div></div>
9+
10+
<p>But I also want to watch the logs in the terminal. The trouble is that each time a task runs, a new directory and file are created. Something like:</p>
11+
12+
<div class="language-sh highlighter-rouge"><div class="highlight"><pre class="highlight"><code>~/airflow/logs/my-dag/my-task/2018-03-06T09:59:10.427477/1.log
13+
</code></pre></div></div>
14+
15+
<p>This makes it hard to tail-follow the logs. Thankfully, starting from Airflow
16+
1.9, logging can be configured easily, allowing you to put all of a dag’s logs
17+
into one file.</p>
18+
19+
<div class="warning">
20+
<p>If you make this change, you won’t be able to view task logs in the web UI,
21+
because the UI expects log filenames to be in the normal format.</p>
22+
</div>
23+
24+
<div class="warning">
25+
<p>Logging to a single file is useful for development (using the
26+
SequentialExecutor), but it won’t work in production because issues
27+
will arise when multiple tasks attempt to write to the same log file at once.</p>
28+
</div>
29+
30+
<h2 id="easy-solution">Easy Solution</h2>
31+
32+
<div class="warning">
33+
<p>Requires Airflow 1.10+</p>
34+
</div>
35+
36+
<p>Set the <code class="language-plaintext highlighter-rouge">log_filename_template</code> setting, here through its environment variable.</p>
37+
38+
<div class="language-sh highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nb">export </span><span class="nv">AIRFLOW__CORE__LOG_FILENAME_TEMPLATE</span><span class="o">=</span><span class="s2">"{{ ti.dag_id }}.log"</span>
39+
</code></pre></div></div>
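<p>To see why the template matters, here is a rough sketch that contrasts a per-try layout with a per-dag layout. Airflow itself renders the Jinja template against the task instance; this sketch only imitates the idea with <code class="language-plaintext highlighter-rouge">str.format</code>, and <code class="language-plaintext highlighter-rouge">PER_TRY_LAYOUT</code>, <code class="language-plaintext highlighter-rouge">PER_DAG_LAYOUT</code> and <code class="language-plaintext highlighter-rouge">log_path</code> are hypothetical names.</p>

```python
from datetime import datetime

# Hypothetical stand-ins for the two filename layouts discussed above.
PER_TRY_LAYOUT = '{dag_id}/{task_id}/{execution_date}/{try_number}.log'
PER_DAG_LAYOUT = '{dag_id}.log'


def log_path(layout, dag_id, task_id, execution_date, try_number):
    """Fill a layout with the values that identify one task try."""
    return layout.format(
        dag_id=dag_id,
        task_id=task_id,
        execution_date=execution_date.isoformat(),
        try_number=try_number,
    )


first_run = datetime(2018, 3, 6, 9, 59, 10, 427477)
second_run = datetime(2018, 3, 7, 9, 59, 10, 427477)

# Every run lands in a different file with the per-try layout...
print(log_path(PER_TRY_LAYOUT, 'my-dag', 'my-task', first_run, 1))
print(log_path(PER_TRY_LAYOUT, 'my-dag', 'my-task', second_run, 1))

# ...but in the same file with the per-dag layout.
print(log_path(PER_DAG_LAYOUT, 'my-dag', 'my-task', first_run, 1))
print(log_path(PER_DAG_LAYOUT, 'my-dag', 'my-task', second_run, 1))
```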
40+
41+
<h2 id="advanced-solution---recommended">Advanced Solution - Recommended</h2>
42+
43+
<div class="warning">
44+
<p>Requires Airflow 1.9+</p>
45+
</div>
46+
47+
<p>Since Airflow 1.9, logging is configured Pythonically.</p>
48+
49+
<p>Grab Airflow’s default log config, <code class="language-plaintext highlighter-rouge">airflow_local_settings.py</code>, and copy it
50+
somewhere in your <code class="language-plaintext highlighter-rouge">PYTHONPATH</code>.</p>
51+
52+
<div class="language-sh highlighter-rouge"><div class="highlight"><pre class="highlight"><code>curl <span class="nt">-O</span> https://raw.githubusercontent.com/apache/incubator-airflow/master/airflow/config_templates/airflow_local_settings.py
53+
<span class="nb">cp </span>airflow_local_settings.py <span class="nv">$AIRFLOW__CORE__DAGS_FOLDER</span>
54+
</code></pre></div></div>
55+
56+
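<p>Set the <code class="language-plaintext highlighter-rouge">logging_config_class</code> setting, either through the environment
variable below or through the related setting in <code class="language-plaintext highlighter-rouge">airflow.cfg</code>. Make sure it
is set in both the scheduler’s and the workers’ environments.</p>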
59+
60+
<div class="language-sh highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nb">export </span><span class="nv">AIRFLOW__CORE__LOGGING_CONFIG_CLASS</span><span class="o">=</span>airflow_local_settings.DEFAULT_LOGGING_CONFIG
61+
</code></pre></div></div>
62+
63+
<p>Now you can configure logging to your liking.</p>
64+
65+
<p>Edit airflow_local_settings.py, changing <code class="language-plaintext highlighter-rouge">FILENAME_TEMPLATE</code> to:</p>
66+
67+
<div class="language-sh highlighter-rouge"><div class="highlight"><pre class="highlight"><code>FILENAME_TEMPLATE <span class="o">=</span> <span class="s1">'{{ ti.dag_id }}.log'</span>
68+
</code></pre></div></div>
69+
70+
<p>You should now get all of a dag’s log output in a single file.</p>
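<p>For context, the object that <code class="language-plaintext highlighter-rouge">logging_config_class</code> points at is a plain dictionary in the format accepted by Python’s <code class="language-plaintext highlighter-rouge">logging.config.dictConfig</code>. The sketch below is not Airflow’s default config; it is a minimal standard-library example, with hypothetical logger and handler names, showing several log calls ending up in one file.</p>

```python
import logging
import logging.config
import os
import tempfile

# A throwaway location for the single log file (hypothetical path).
log_path = os.path.join(tempfile.mkdtemp(), 'my-dag.log')

SKETCH_LOGGING_CONFIG = {
    'version': 1,
    'disable_existing_loggers': False,
    'formatters': {
        'plain': {'format': '%(levelname)s - %(message)s'},
    },
    'handlers': {
        # One file handler shared by everything that logs to 'sketch.task'.
        'single_file': {
            'class': 'logging.FileHandler',
            'formatter': 'plain',
            'filename': log_path,
        },
    },
    'loggers': {
        'sketch.task': {
            'handlers': ['single_file'],
            'level': 'INFO',
            'propagate': False,
        },
    },
}

logging.config.dictConfig(SKETCH_LOGGING_CONFIG)
log = logging.getLogger('sketch.task')
log.info('task one started')
log.info('task two started')
```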
71+
72+
<h2 id="tailing-the-logs">Tailing the logs</h2>
73+
74+
<p>Start the scheduler, then trigger a dag from a second terminal (the scheduler keeps running in the foreground).</p>
75+
76+
<div class="language-sh highlighter-rouge"><div class="highlight"><pre class="highlight"><code>airflow scheduler
airflow trigger_dag my-dag
78+
</code></pre></div></div>
79+
80+
<p>Watch the output with <code class="language-plaintext highlighter-rouge">tail -f</code>.</p>
81+
82+
<div class="language-sh highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nb">tail</span> <span class="nt">-f</span> ~/airflow/logs/my-dag.log
83+
</code></pre></div></div>
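<p>If you would rather follow the file from Python, for example inside a development script, something like the following works as a rough imitation of <code class="language-plaintext highlighter-rouge">tail -f</code>. <code class="language-plaintext highlighter-rouge">follow</code> and its parameters are hypothetical names, and the loop simply polls the file.</p>

```python
import os
import time


def follow(path, poll_interval=0.5, max_idle_polls=None, from_start=False):
    """Yield lines appended to `path`, roughly like `tail -f`.

    Polls forever unless `max_idle_polls` is set, in which case it stops
    after that many consecutive polls that found nothing new.
    """
    with open(path) as handle:
        if not from_start:
            handle.seek(0, os.SEEK_END)  # skip what is already in the file
        idle_polls = 0
        while max_idle_polls is None or idle_polls < max_idle_polls:
            line = handle.readline()
            if line:
                idle_polls = 0
                yield line
            else:
                idle_polls += 1
                time.sleep(poll_interval)


# Usage (runs until interrupted):
# for line in follow(os.path.expanduser('~/airflow/logs/my-dag.log')):
#     print(line, end='')
```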
Lines changed: 27 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,27 @@
1+
<div class="wide-logos">
2+
<p><img src="/posts/assets/airflow.png" alt="airflow" /></p>
3+
</div>
4+
5+
<p>Airflow has a fairly strange way of registering DAGs and tasks. They’re put
6+
into the global namespace of the DAG definition file.</p>
7+
8+
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">dag</span> <span class="o">=</span> <span class="n">DAG</span><span class="p">(</span><span class="n">dag_id</span><span class="o">=</span><span class="s">'foo'</span><span class="p">,</span> <span class="n">start_date</span><span class="o">=</span><span class="n">start_date</span><span class="p">)</span>
9+
<span class="n">MyOperator</span><span class="p">(</span><span class="n">dag</span><span class="o">=</span><span class="n">dag</span><span class="p">,</span> <span class="n">task_id</span><span class="o">=</span><span class="s">'foo'</span><span class="p">)</span>
10+
</code></pre></div></div>
11+
12+
<p>Airflow then comes along and finds them.</p>
13+
14+
<p>However, when you import that file yourself, as you do when unit testing, you
usually don’t want those global objects to be created.</p>
16+
17+
<p>The solution is to protect that code with an <code class="language-plaintext highlighter-rouge">if</code> statement:</p>
18+
19+
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1"># Airflow imports DAG files under a module name starting with "unusual_prefix".</span>
<span class="k">if</span> <span class="n">__name__</span><span class="p">.</span><span class="n">startswith</span><span class="p">(</span><span class="s">'unusual_prefix'</span><span class="p">):</span>
    <span class="n">dag</span> <span class="o">=</span> <span class="n">DAG</span><span class="p">(</span><span class="n">dag_id</span><span class="o">=</span><span class="s">'foo'</span><span class="p">,</span> <span class="n">start_date</span><span class="o">=</span><span class="n">start_date</span><span class="p">)</span>
    <span class="n">MyOperator</span><span class="p">(</span><span class="n">dag</span><span class="o">=</span><span class="n">dag</span><span class="p">,</span> <span class="n">task_id</span><span class="o">=</span><span class="s">'foo'</span><span class="p">)</span>
22+
</code></pre></div></div>
23+
24+
<p>This is Airflow’s equivalent of <a href="http://effbot.org/pyfaq/tutor-what-is-if-name-main-for.htm"><code class="language-plaintext highlighter-rouge">if __name__ == "__main__"</code></a>.</p>
25+
26+
<p>Airflow will still find your DAG as normal, but the code inside the block
won’t be executed when you import the module yourself.</p>
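<p>The guard works because a module’s <code class="language-plaintext highlighter-rouge">__name__</code> is whatever name it was imported under. The sketch below demonstrates this without Airflow: it loads the same file twice with <code class="language-plaintext highlighter-rouge">importlib</code>, once under a normal name and once under a made-up name with the prefix. The file contents, the <code class="language-plaintext highlighter-rouge">load_as</code> helper and the module names are all hypothetical.</p>

```python
import importlib.util
import os
import tempfile

# Stand-in for a DAG definition file; `created` replaces the real DAG objects.
DAG_FILE_SOURCE = '''
created = False
if __name__.startswith('unusual_prefix'):
    created = True  # stands in for building the DAG and its tasks
'''


def load_as(module_name, path):
    """Import the file at `path` under an explicit module name."""
    spec = importlib.util.spec_from_file_location(module_name, path)
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)
    return module


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'my_dag.py')
    with open(path, 'w') as handle:
        handle.write(DAG_FILE_SOURCE)
    as_test = load_as('my_dag', path)                            # like a unit test import
    as_airflow = load_as('unusual_prefix_0123abcd_my_dag', path)  # like Airflow's import

print(as_test.created)     # False
print(as_airflow.created)  # True
```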
