<?xml version="1.0" encoding="utf-8"?>
<rss version="2.0">
  <channel>
    <title>selective logging</title>
    <description>Maksim Yegorov Blog</description>
    <link>http://presently.me</link>
    <pubDate>2012-02-22 11:37:49.370005</pubDate>
    <generator>http://presently.me</generator>

    <language>en</language>


  <item>
    <title><![CDATA[job search crutch]]></title>
    <link>http://www.presently.me/2010/1/job-search-crutch</link>
    <guid isPermaLink="true">http://www.presently.me/2010/1/job-search-crutch</guid>
  
    <pubDate>2010/1/19 23:3</pubDate>
    <description><![CDATA[
  
    <p>I'm about to embark upon a job search. To keep it in the background, I've written a Python script that crawls Craigslist's job boards nationwide and emails me any updates via <tt>smtp</tt>. This job agent of sorts could easily be adapted to any other crawling and scraping task, so here's the full <a href="/static/craigslistJobs.py">source</a>. I'll go over parts of it in what follows.</p> <br>
    
    <p>To get started, we need to satisfy the dependencies: <tt>sudo easy_install</tt> the <a href="http://wwwsearch.sourceforge.net/mechanize/"><tt>mechanize</tt></a> API to query and traverse Craigslist and the <a href="http://www.crummy.com/software/BeautifulSoup/"><tt>BeautifulSoup</tt></a> library to walk the document tree and select elements. In addition, I strip HTML tags with <a href="http://www.aaronsw.com/2002/html2text/"><tt>html2text</tt></a> and store the results in a <tt>buzhug</tt> table.</p> <br>
    
    <div class="codehilite"><pre><span class="c">#Create/open existing database:</span> <br>
    
    <span class="n">mydb</span> <span class="o">=</span> <span class="n">Base</span><span class="p">(</span><span class="s">&#39;./mydb&#39;</span><span class="p">)</span> <br>
    
    <span class="k">try</span><span class="p">:</span> <br>
    
      <span class="n">mydb</span><span class="o">.</span><span class="n">open</span><span class="p">()</span> <span class="c"># close the db at the end with mydb.close()</span> <br>
    
    <span class="k">except</span> <span class="ne">IOError</span><span class="p">:</span> <br>
    
      <span class="n">mydb</span><span class="o">.</span><span class="n">create</span><span class="p">((</span><span class="s">&#39;PostingID&#39;</span><span class="p">,</span><span class="nb">str</span><span class="p">),</span> <span class="c"># stores Craigslist post id</span> <br>
    
                   <span class="p">(</span><span class="s">&#39;dt&#39;</span><span class="p">,</span><span class="n">datetime</span><span class="p">),</span>   <span class="c"># post date, a datetime object</span> <br>
    
                   <span class="p">(</span><span class="s">&#39;title&#39;</span><span class="p">,</span><span class="nb">str</span><span class="p">),</span>     <span class="c"># post title</span> <br>
    
                   <span class="p">(</span><span class="s">&#39;location&#39;</span><span class="p">,</span><span class="nb">str</span><span class="p">),</span>  <span class="c"># post location</span> <br>
    
                   <span class="p">(</span><span class="s">&#39;url&#39;</span><span class="p">,</span><span class="nb">unicode</span><span class="p">),</span>   <span class="c"># post url</span> <br>
    
                   <span class="p">(</span><span class="s">&#39;body&#39;</span><span class="p">,</span><span class="nb">unicode</span><span class="p">),</span>  <span class="c"># digest of post</span> <br>
    
                   <span class="p">(</span><span class="s">&#39;notified&#39;</span><span class="p">,</span><span class="nb">bool</span><span class="p">)</span>  <span class="c"># True if record emailed</span> <br>
    
                  <span class="p">)</span> <br>
    
    </pre></div> <br>
    
     <br>
    
     <br>
    
    <p>We'll now create a browser instance</p> <br>
    
    <div class="codehilite"><pre><span class="c">#...and set headers: </span> <br>
    
    <span class="n">br</span> <span class="o">=</span> <span class="n">mechanize</span><span class="o">.</span><span class="n">Browser</span><span class="p">()</span> <br>
    
    <span class="n">br</span><span class="o">.</span><span class="n">set_handle_equiv</span><span class="p">(</span><span class="bp">True</span><span class="p">)</span> <br>
    
    <span class="n">br</span><span class="o">.</span><span class="n">set_handle_gzip</span><span class="p">(</span><span class="bp">True</span><span class="p">)</span> <br>
    
    <span class="n">br</span><span class="o">.</span><span class="n">set_handle_redirect</span><span class="p">(</span><span class="bp">True</span><span class="p">)</span> <br>
    
    <span class="n">br</span><span class="o">.</span><span class="n">set_handle_referer</span><span class="p">(</span><span class="bp">True</span><span class="p">)</span> <br>
    
    <span class="n">br</span><span class="o">.</span><span class="n">set_handle_robots</span><span class="p">(</span><span class="bp">False</span><span class="p">)</span> <br>
    
     <br>
    
    <span class="n">br</span><span class="o">.</span><span class="n">addheaders</span> <span class="o">=</span> <span class="p">[(</span><span class="s">&#39;User-agent&#39;</span><span class="p">,</span>  <br>
    
      <span class="s">&#39;Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.9.0.10) Gecko/2009042523 Ubuntu/9.04 (jaunty) Firefox/3.0.10&#39;</span><span class="p">)]</span> <br>
    
     <br>
    
    <span class="c"># optionally set debug levels</span> <br>
    
    <span class="c">#br.set_debug_http(True)</span> <br>
    
    <span class="c">#br.set_debug_redirects(True)</span> <br>
    
    <span class="c">#br.set_debug_responses(True)</span> <br>
    
    </pre></div> <br>
    
     <br>
    
     <br>
    
    <p>We're now ready to start at the root of Craigslist and traverse the links from there:</p> <br>
    
    <div class="codehilite"><pre><span class="c">#...CRAIG_ROOT =&quot;http://geo.craigslist.org/iso/us&quot;</span> <br>
    
    <span class="n">root</span> <span class="o">=</span> <span class="n">br</span><span class="o">.</span><span class="n">open</span><span class="p">(</span><span class="n">CRAIG_ROOT</span><span class="p">)</span> <br>
    
     <br>
    
    <span class="n">root_html</span><span class="o">=</span><span class="n">root</span><span class="o">.</span><span class="n">read</span><span class="p">()</span> <br>
    
    <span class="n">root_soup</span><span class="o">=</span><span class="n">BeautifulSoup</span><span class="p">(</span><span class="n">root_html</span><span class="p">)</span> <br>
    
     <br>
    
    <span class="c"># get a list of site links under &lt;div #list&gt; --&gt; get anchors and the location strings they enclose</span> <br>
    
    <span class="n">divblock</span> <span class="o">=</span> <span class="n">root_soup</span><span class="o">.</span><span class="n">findAll</span><span class="p">(</span><span class="nb">id</span><span class="o">=</span><span class="s">&quot;list&quot;</span><span class="p">)</span> <br>
    
    <span class="n">anchors</span> <span class="o">=</span> <span class="n">divblock</span><span class="p">[</span><span class="mi">0</span><span class="p">]</span><span class="o">.</span><span class="n">findAll</span><span class="p">(</span><span class="s">&#39;a&#39;</span><span class="p">)</span> <br>
    
    <span class="n">locs</span><span class="o">=</span><span class="p">[]</span> <br>
    
    <span class="k">for</span> <span class="n">a</span> <span class="ow">in</span> <span class="n">anchors</span><span class="p">:</span> <br>
    
      <span class="k">try</span><span class="p">:</span> <br>
    
        <span class="n">locs</span><span class="o">.</span><span class="n">append</span><span class="p">(</span><span class="n">a</span><span class="o">.</span><span class="n">findChildren</span><span class="p">()[</span><span class="mi">0</span><span class="p">]</span><span class="o">.</span><span class="n">string</span><span class="o">.</span><span class="n">strip</span><span class="p">())</span> <br>
    
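      <span class="k">except</span> <span class="p">(</span><span class="ne">IndexError</span><span class="p">,</span> <span class="ne">AttributeError</span><span class="p">):</span> <span class="c"># anchor has no child tag, or the child has no string</span> <br>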
    
        <span class="n">locs</span><span class="o">.</span><span class="n">append</span><span class="p">(</span><span class="n">a</span><span class="o">.</span><span class="n">string</span><span class="o">.</span><span class="n">strip</span><span class="p">())</span> <br>
    
    </pre></div> <br>
    
     <br>
    
     <br>
    
    <p>We now have a list of (link, location) tuples:</p> <br>
    
    <div class="codehilite"><pre><span class="n">directory</span><span class="o">=</span> <span class="nb">zip</span><span class="p">([</span><span class="n">l</span><span class="p">[</span><span class="s">&quot;href&quot;</span><span class="p">]</span> <span class="k">for</span> <span class="n">l</span> <span class="ow">in</span> <span class="n">anchors</span><span class="p">],</span><span class="n">locs</span><span class="p">)</span> <br>
    
    </pre></div> <br>
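    <p>To see what that pairing produces, here's a minimal sketch in Python 3 (the snippets in this post are Python 2). The two lists are made-up sample values standing in for the scraped anchors and location names, not real output.</p>

```python
# Hypothetical sample data standing in for the scraped hrefs and location names.
hrefs = ["http://sfbay.craigslist.org/", "http://newyork.craigslist.org/"]
locs = ["SF bay area", "new york city"]

# zip() pairs each link with the location at the same index;
# list() materializes the pairs (on Python 3, zip is lazy).
directory = list(zip(hrefs, locs))

for link, location in directory:
    print(location, link)
```

    <p>The pairing relies on both lists being in the same order, which holds here because the locations are collected by looping over the same anchors.</p>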
    
     <br>
    
     <br>
    
    <p>What we'll do is follow each link in turn, click on the "arch / engineering" section, and submit the search form. My pseudocode is in fact not far from <tt>mechanize</tt> syntax:</p> <br>
    
    <div class="codehilite"><pre><span class="c">#...here l is the root link for given Craigslist site (e.g. &quot;http://sfbay.craigslist.org/&quot;)</span> <br>
    
    <span class="n">r</span><span class="o">=</span><span class="n">br</span><span class="o">.</span><span class="n">open</span><span class="p">(</span><span class="n">l</span><span class="p">)</span>  <br>
    
    <span class="n">req</span><span class="o">=</span><span class="n">br</span><span class="o">.</span><span class="n">click_link</span><span class="p">(</span><span class="n">text</span><span class="o">=</span><span class="s">&quot;arch / engineering&quot;</span><span class="p">)</span>   <br>
    
    <span class="n">br</span><span class="o">.</span><span class="n">open</span><span class="p">(</span><span class="n">req</span><span class="p">)</span> <br>
    
     <br>
    
    <span class="c"># select the first form on the page</span> <br>
    
    <span class="k">try</span><span class="p">:</span>  <br>
    
      <span class="n">br</span><span class="o">.</span><span class="n">select_form</span><span class="p">(</span><span class="n">nr</span><span class="o">=</span><span class="mi">0</span><span class="p">)</span> <br>
    
    <span class="k">except</span> <span class="n">mechanize</span><span class="o">.</span><span class="n">FormNotFoundError</span><span class="p">:</span> <span class="c"># if the form is not found (Craigslist&#39;s &quot;scam alert&quot; page), follow the link to the form</span> <br>
    
      <span class="n">req</span><span class="o">=</span><span class="n">br</span><span class="o">.</span><span class="n">click_link</span><span class="p">(</span><span class="n">text</span><span class="o">=</span><span class="s">&quot;Continue to arch / engineering job postings&quot;</span><span class="p">)</span>   <br>
    
      <span class="n">br</span><span class="o">.</span><span class="n">open</span><span class="p">(</span><span class="n">req</span><span class="p">)</span> <br>
    
      <span class="n">br</span><span class="o">.</span><span class="n">select_form</span><span class="p">(</span><span class="n">nr</span><span class="o">=</span><span class="mi">0</span><span class="p">)</span> <br>
    
     <br>
    
    <span class="c"># perform search</span> <br>
    
    <span class="n">br</span><span class="o">.</span><span class="n">form</span><span class="p">[</span><span class="s">&#39;catAbbreviation&#39;</span><span class="p">]</span><span class="o">=</span><span class="p">[</span><span class="s">&#39;egr&#39;</span><span class="p">]</span> <br>
    
    <span class="n">br</span><span class="o">.</span><span class="n">form</span><span class="p">[</span><span class="s">&#39;query&#39;</span><span class="p">]</span> <span class="o">=</span> <span class="n">QUERY_STR</span>  <span class="c">#QUERY_STR could be &#39;structural -&quot;mechanical engineer&quot; -drafter -draftsman -someSpammerString&#39;</span> <br>
    
    <span class="n">br</span><span class="o">.</span><span class="n">submit</span><span class="p">()</span> <br>
    
    </pre></div> <br>
    
     <br>
    
     <br>
    
    <p>Let's use a regular expression to extract the post links:</p> <br>
    
    <div class="codehilite"><pre><span class="c"># ...a typical link looks like: &quot;*/egr/1560953114.html&quot;</span> <br>
    
    <span class="n">urlre</span> <span class="o">=</span> <span class="n">re</span><span class="o">.</span><span class="n">compile</span><span class="p">(</span><span class="s">r&#39;(/egr/([0-9]+)\.html)&#39;</span><span class="p">)</span> <span class="c"># escape the dot so it matches a literal &quot;.&quot;</span> <br>
    
    <span class="n">all_links</span> <span class="o">=</span> <span class="p">[</span><span class="n">l</span> <span class="k">for</span> <span class="n">l</span> <span class="ow">in</span> <span class="n">br</span><span class="o">.</span><span class="n">links</span><span class="p">(</span><span class="n">url_regex</span><span class="o">=</span><span class="n">urlre</span><span class="p">)]</span> <br>
    
    </pre></div> <br>
    
     <br>
    
     <br>
    
    <p>We can now extract the PostingID (and title) from the link (<tt>postid = link.url.split('/')[-1].split('.')[0]</tt>) and query our table to see if the record exists. </p> <br>
    
    <div class="codehilite"><pre><span class="c">#... If not, we follow the link with:</span> <br>
    
    <span class="n">br</span><span class="o">.</span><span class="n">follow_link</span><span class="p">(</span><span class="n">link</span><span class="p">)</span> <br>
    
    <span class="n">html</span><span class="o">=</span><span class="n">br</span><span class="o">.</span><span class="n">response</span><span class="p">()</span><span class="o">.</span><span class="n">read</span><span class="p">()</span> <br>
    
    <span class="n">soup</span><span class="o">=</span><span class="n">BeautifulSoup</span><span class="p">(</span><span class="n">html</span><span class="p">)</span> <br>
    
     <br>
    
    <span class="c"># Extract the date and url:</span> <br>
    
    <span class="n">dates</span><span class="o">=</span><span class="n">re</span><span class="o">.</span><span class="n">findall</span><span class="p">(</span><span class="s">r&#39;Date:([ 0-9-,:A-Z]+)&#39;</span><span class="p">,</span> <span class="n">html</span><span class="p">)</span> <br>
    
    <span class="k">if</span> <span class="n">dates</span><span class="p">:</span> <br>
    
      <span class="n">dates</span><span class="o">=</span><span class="n">string</span><span class="o">.</span><span class="n">join</span><span class="p">(</span><span class="n">dates</span><span class="p">[</span><span class="mi">0</span><span class="p">]</span><span class="o">.</span><span class="n">strip</span><span class="p">()</span><span class="o">.</span><span class="n">split</span><span class="p">()[:</span><span class="mi">2</span><span class="p">],</span><span class="s">&#39; &#39;</span><span class="p">)</span> <br>
    
      <span class="n">d</span><span class="o">=</span><span class="n">datetime</span><span class="o">.</span><span class="n">strptime</span><span class="p">(</span><span class="n">dates</span><span class="p">,</span> <span class="s">&quot;%Y-%m-</span><span class="si">%d</span><span class="s">, %I:%M%p&quot;</span><span class="p">)</span>  <span class="c"># %I (12-hour clock), so that %p is honored</span> <br>
    
    <span class="k">else</span><span class="p">:</span> <br>
    
      <span class="n">d</span><span class="o">=</span><span class="n">datetime</span><span class="o">.</span><span class="n">now</span><span class="p">()</span>    <span class="c"># if Date couldn&#39;t be found for some reason (a recently expired post?); the dt field expects a datetime</span> <br>
    
     <br>
    
    <span class="c"># fetch post url</span> <br>
    
    <span class="n">post_url</span><span class="o">=</span><span class="n">br</span><span class="o">.</span><span class="n">response</span><span class="p">()</span><span class="o">.</span><span class="n">geturl</span><span class="p">()</span> <br>
    
    </pre></div> <br>
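    <p>The ID and date handling can be tried in isolation. Here's a rough sketch in Python 3 (the snippets in this post are Python 2). The helper names <tt>posting_id</tt> and <tt>parse_post_date</tt> are made up for illustration, and the sample date string is only my assumption of what the timestamp looks like, inferred from the format string used above.</p>

```python
import re
from datetime import datetime

# Based on the pattern above; the dot is escaped so it only matches a literal "."
URL_RE = re.compile(r"/egr/([0-9]+)\.html")

def posting_id(url):
    """Return the numeric posting ID from a post URL, or None if it doesn't match."""
    match = URL_RE.search(url)
    return match.group(1) if match else None

def parse_post_date(text):
    """Parse a 'YYYY-MM-DD, HH:MM' stamp followed by AM or PM.

    %I reads the hour on a 12-hour clock, which is what lets %p take effect.
    """
    return datetime.strptime(text, "%Y-%m-%d, %I:%M%p")

print(posting_id("http://sfbay.craigslist.org/egr/1560953114.html"))
print(parse_post_date("2010-01-19, 11:03PM"))
```

    <p>Using the regex for the ID, rather than splitting on slashes and dots, means a URL that doesn't look like a post simply yields <tt>None</tt>.</p>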
    
     <br>
    
     <br>
    
    <p>Now let's gnaw at the body of the post. The problem here is the amount of redundancy in Craigslist posts. We could write a custom filter, e.g. by comparing posts against each other and rejecting any whose overlap exceeds some threshold. However, some near-verbatim reposts change only the job title or another essential detail, and an overzealous filter would discard them. Accordingly, I chose to rely on the search string instead. With some tweaking, this filters out the more aggressive spammers.</p> <br>
    
    <div class="codehilite"><pre><span class="c"># ...Let&#39;s fetch the first N words (DIGEST_SIZE) from the post body and enter the record into our database:</span> <br>
    
    <span class="n">userbody</span><span class="o">=</span><span class="n">soup</span><span class="o">.</span><span class="n">findAll</span><span class="p">(</span><span class="nb">id</span><span class="o">=</span><span class="s">&#39;userbody&#39;</span><span class="p">)</span> <br>
    
    <span class="k">if</span> <span class="n">userbody</span><span class="p">:</span>    <span class="c">#if no body, don&#39;t bother processing</span> <br>
    
      <span class="n">userbody</span><span class="o">=</span><span class="nb">unicode</span><span class="p">(</span><span class="n">userbody</span><span class="p">[</span><span class="mi">0</span><span class="p">])</span> <br>
    
      <span class="n">userbody</span><span class="o">=</span><span class="n">string</span><span class="o">.</span><span class="n">join</span><span class="p">(</span><span class="n">userbody</span><span class="o">.</span><span class="n">split</span><span class="p">(</span><span class="s">&#39; &#39;</span><span class="p">)[:</span><span class="n">DIGEST_SIZE</span><span class="p">],</span><span class="s">&#39; &#39;</span><span class="p">)</span>  <br>
    
      <span class="n">userbody</span> <span class="o">=</span> <span class="n">html2text</span><span class="p">(</span><span class="n">userbody</span><span class="p">)</span> <br>
    
     <br>
    
      <span class="c"># see if there&#39;s an exact same record in db (a repost?):</span> <br>
    
      <span class="n">recs</span> <span class="o">=</span> <span class="n">mydb</span><span class="o">.</span><span class="n">select</span><span class="p">(</span><span class="bp">None</span><span class="p">,</span> <span class="n">location</span> <span class="o">=</span> <span class="n">post_location</span><span class="p">,</span> <span class="n">body</span> <span class="o">=</span> <span class="n">userbody</span><span class="p">)</span> <br>
    
     <br>
    
      <span class="k">if</span> <span class="nb">len</span><span class="p">(</span><span class="n">recs</span><span class="p">)</span><span class="o">==</span><span class="mi">0</span><span class="p">:</span>  <span class="c"># if no duplicates found, enter the record into table</span> <br>
    
        <span class="n">recid</span><span class="o">=</span><span class="n">mydb</span><span class="o">.</span><span class="n">insert</span><span class="p">(</span><span class="n">PostingID</span><span class="o">=</span><span class="n">postid</span><span class="p">,</span> <br>
    
                          <span class="n">dt</span><span class="o">=</span><span class="n">d</span><span class="p">,</span> <br>
    
                          <span class="n">title</span><span class="o">=</span><span class="n">ti</span><span class="p">,</span>  <br>
    
                          <span class="n">location</span><span class="o">=</span><span class="n">post_location</span><span class="p">,</span> <br>
    
                          <span class="n">url</span><span class="o">=</span><span class="n">post_url</span><span class="p">,</span>  <br>
    
                          <span class="n">body</span><span class="o">=</span><span class="n">userbody</span><span class="p">,</span> <br>
    
                          <span class="n">notified</span><span class="o">=</span><span class="bp">False</span><span class="p">)</span> <br>
    
    </pre></div> <br>
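    <p>The digest-and-compare step doesn't depend on <tt>buzhug</tt>, so here's a simplified sketch of the idea in Python 3 with a plain list of dicts standing in for the table. The value of <tt>DIGEST_SIZE</tt> and the helper names <tt>make_digest</tt> and <tt>is_repost</tt> are assumptions for illustration only. Unlike the snippet above, it splits on any whitespace rather than on single spaces.</p>

```python
DIGEST_SIZE = 50  # assumed value, chosen only for this sketch

def make_digest(text, size=DIGEST_SIZE):
    """Keep only the first `size` whitespace-separated words of `text`."""
    return " ".join(text.split()[:size])

def is_repost(records, location, digest):
    """True if a stored record has the same location and the same digest."""
    return any(r["location"] == location and r["body"] == digest for r in records)

# A plain list of dicts stands in for the buzhug table in this sketch.
records = []
body = "Structural engineer wanted for seismic retrofit projects in the East Bay"
digest = make_digest(body, size=5)
if not is_repost(records, "SF bay area", digest):
    records.append({"location": "SF bay area", "body": digest})

print(digest)
print(is_repost(records, "SF bay area", digest))
```

    <p>Only exact matches on location and digest count as duplicates, so a repost that changes even one of the first N words is stored as a new record.</p>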
    
     <br>
    
     <br>
    
    <p>Finally, we can send an email update:</p> <br>
    
    <div class="codehilite"><pre><span class="c"># ...using Google&#39;s or Yahoo&#39;s SMTP server</span> <br>
    
    <span class="n">recs</span><span class="o">=</span><span class="n">mydb</span><span class="o">.</span><span class="n">select_for_update</span><span class="p">([</span><span class="s">&#39;dt&#39;</span><span class="p">,</span><span class="s">&#39;title&#39;</span><span class="p">,</span><span class="s">&#39;location&#39;</span><span class="p">,</span><span class="s">&#39;url&#39;</span><span class="p">,</span><span class="s">&#39;body&#39;</span><span class="p">,</span><span class="s">&#39;notified&#39;</span><span class="p">],</span><span class="n">notified</span><span class="o">=</span><span class="bp">False</span><span class="p">)</span> <br>
    
    <span class="n">logfile</span><span class="o">.</span><span class="n">write</span><span class="p">(</span><span class="s">&quot;</span><span class="se">\n</span><span class="s">There&#39;s a total of </span><span class="si">%d</span><span class="s"> records to email&quot;</span> <span class="o">%</span><span class="nb">len</span><span class="p">(</span><span class="n">recs</span><span class="p">))</span> <br>
    
    <span class="n">mesg</span><span class="o">=</span><span class="s">&#39;&#39;</span> <br>
    
    <span class="k">for</span> <span class="n">i</span> <span class="ow">in</span> <span class="n">recs</span><span class="p">:</span> <br>
    
      <span class="n">mesg</span> <span class="o">+=</span> <span class="s">&quot;</span><span class="se">\n</span><span class="si">%s</span><span class="s"> </span><span class="se">\n</span><span class="s">Date: </span><span class="si">%s</span><span class="s"> </span><span class="se">\n</span><span class="s">Title: </span><span class="si">%s</span><span class="s"> (</span><span class="si">%s</span><span class="s">) </span><span class="se">\n</span><span class="s">Digest: </span><span class="si">%s</span><span class="s">&quot;</span> <span class="o">%</span><span class="p">(</span><span class="n">i</span><span class="o">.</span><span class="n">url</span><span class="o">.</span><span class="n">encode</span><span class="p">(</span><span class="s">&#39;utf-8&#39;</span><span class="p">),</span> <br>
    
                                                                <span class="n">i</span><span class="o">.</span><span class="n">dt</span><span class="o">.</span><span class="n">strftime</span><span class="p">(</span><span class="s">&quot;%A, </span><span class="si">%d</span><span class="s"> %B %Y, %I:%M%p&quot;</span><span class="p">),</span> <br>
    
                                                                <span class="n">i</span><span class="o">.</span><span class="n">title</span><span class="p">,</span><span class="n">i</span><span class="o">.</span><span class="n">location</span><span class="p">,</span> <br>
    
                                                                <span class="n">i</span><span class="o">.</span><span class="n">body</span><span class="o">.</span><span class="n">encode</span><span class="p">(</span><span class="s">&#39;utf-8&#39;</span><span class="p">))</span> <br>
    
      <span class="n">mesg</span> <span class="o">+=</span><span class="s">&quot;</span><span class="se">\n\n</span><span class="s">----------------------</span><span class="se">\n</span><span class="s">&quot;</span> <br>
    
     <br>
    
      <span class="n">encoding</span> <span class="o">=</span> <span class="s">&#39;utf-8&#39;</span>  <span class="c"># matches the .encode(&#39;utf-8&#39;) calls above</span> <br>
    
      <span class="n">msg</span> <span class="o">=</span> <span class="n">MIMEText</span><span class="p">(</span><span class="n">_text</span><span class="o">=</span><span class="n">mesg</span><span class="p">,</span> <span class="n">_charset</span><span class="o">=</span><span class="n">encoding</span><span class="p">)</span>  <span class="c"># _charset takes the bare charset name</span> <br>
    
      <span class="n">msg</span><span class="p">[</span><span class="s">&#39;subject&#39;</span><span class="p">]</span> <span class="o">=</span> <span class="s">&quot;Craigslist updates&quot;</span> <br>
    
      <span class="n">sender</span> <span class="o">=</span> <span class="s">&#39;craigslistJobs.py &lt;</span><span class="si">%s</span><span class="s">&gt;&#39;</span> <span class="o">%</span><span class="p">(</span><span class="s">&quot;from.me@gmail.com&quot;</span><span class="p">)</span>  <span class="c"># substitute your gmail account</span> <br>
    
     <br>
    
      <span class="n">msg</span><span class="p">[</span><span class="s">&#39;from&#39;</span><span class="p">]</span> <span class="o">=</span> <span class="n">sender</span> <br>
    
     <br>
    
      <span class="n">s</span> <span class="o">=</span> <span class="n">smtplib</span><span class="o">.</span><span class="n">SMTP</span><span class="p">(</span><span class="s">&#39;smtp.gmail.com&#39;</span><span class="p">,</span><span class="mi">587</span><span class="p">)</span>  <span class="c"># gmail uses port 587</span> <br>
    
      <span class="n">s</span><span class="o">.</span><span class="n">ehlo</span><span class="p">()</span> <br>
    
      <span class="n">s</span><span class="o">.</span><span class="n">starttls</span><span class="p">()</span> <br>
    
      <span class="n">s</span><span class="o">.</span><span class="n">ehlo</span><span class="p">()</span> <br>
    
      <span class="n">s</span><span class="o">.</span><span class="n">login</span><span class="p">(</span><span class="s">&quot;from.me@gmail.com&quot;</span><span class="p">,</span><span class="s">&quot;my password&quot;</span><span class="p">)</span> <br>
    
      <span class="n">s</span><span class="o">.</span><span class="n">sendmail</span><span class="p">(</span><span class="n">sender</span><span class="p">,</span> <span class="s">&quot;to.you@gmail.com&quot;</span><span class="p">,</span> <span class="n">msg</span><span class="o">.</span><span class="n">as_string</span><span class="p">())</span> <br>
    
      <span class="n">s</span><span class="o">.</span><span class="n">close</span><span class="p">()</span> <br>
    
     <br>
    
      <span class="c"># if success, set notified=True</span> <br>
    
      <span class="n">mydb</span><span class="o">.</span><span class="n">update</span><span class="p">(</span><span class="n">recs</span><span class="p">,</span> <span class="n">notified</span> <span class="o">=</span> <span class="bp">True</span><span class="p">)</span> <br>
    
    </pre></div> <br>
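    <p>Message construction can be checked without touching an SMTP server. Here's a small sketch in Python 3. The helper name <tt>build_message</tt> is made up for illustration, and the addresses are the same placeholders as above. <tt>MIMEText</tt> expects the charset as a bare name such as <tt>utf-8</tt>.</p>

```python
from email.mime.text import MIMEText

def build_message(body, sender, recipient, subject="Craigslist updates"):
    """Build a plain-text MIME message; the charset is passed as a bare name."""
    msg = MIMEText(body, _subtype="plain", _charset="utf-8")
    msg["Subject"] = subject
    msg["From"] = sender
    msg["To"] = recipient
    return msg

msg = build_message("Title: structural engineer (SF bay area)",
                    "from.me@gmail.com", "to.you@gmail.com")

# Inspect the headers before handing msg.as_string() to smtplib.
print(msg["Content-Type"])
print(msg["Subject"])
```

    <p>Keeping this step separate from the <tt>smtplib</tt> calls makes it easy to print the message and eyeball the headers before sending anything.</p>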
    
     <br>
    
     <br>
    
    <p><a href="/static/craigslistJobs.py">My script</a> adds basic error logging and table maintenance (Craigslist job postings expire after 30 days). All that remains is to schedule a cron job!</p> <br>
    
    ]]></description>
  
  </item>
  <item>
    <title><![CDATA[about]]></title>
    <link>http://www.presently.me/2010/1/about</link>
    <guid isPermaLink="true">http://www.presently.me/2010/1/about</guid>
  
    <pubDate>2010/1/15 16:52</pubDate>
    <description><![CDATA[
  
    <p>My name is Maksim Yegorov. I'm the guy behind this blog. I'm studying structural engineering at UC Berkeley, but happen to have other all-consuming interests on the side. These are mostly passing distractions, but on a couple of occasions they've added to my collection of degrees.</p> <br>
    
    <p>Anyway, over the course of the winter break I hacked together a bare-bones blogging engine in <a href="http://www.python.org"><tt>python</tt></a>. Outside the standard library, I'm using the <a href="http://www.webpy.org"><tt>web.py</tt></a> module to serve HTTP requests and process templates. Flat-file storage is implemented with <a href="http://buzhug.sourceforge.net/"><tt>buzhug</tt></a>.</p> <br>
    
    ]]></description>
  
  </item>
  </channel>
</rss>


