<h1>Getting started with application configuration in Rust</h1>
<p>Matthew T. Perry, 2022-12-03, perrygeo.com</p>
<p>Sooner or later, that application you're writing will need to be configured.
At the very least, you'll need a way to adjust inputs without editing source code. Wouldn't it be nice to have a reasonable configuration system from the start?</p>
<p>The best way to configure your app depends on
the environment in which the software runs
and the requirements of the project, both of which will change over time.
Ideally, we'd start with a system
flexible enough to pull our configuration from a number of input sources:</p>
<ul>
<li><strong>Command Line Interface</strong> for interactive development with standard flags, clear usage and error handling</li>
<li><strong><code>.env</code></strong> files for declarative configuration, either development or production</li>
<li><strong>Environment variables</strong> for containers and many production settings</li>
<li>Reasonable <strong>defaults</strong> if nothing is provided by the user. And if there is no obvious default, mark it clearly as a mandatory argument.</li>
</ul>
<p>For a language that is often referred to
as a low-level "systems" language, Rust allows for some very ergonomic
abstractions. We can implement a type-safe configuration system
with a minimal amount of imperative code, letting third-party crates handle the mechanical details. Let's walk through a new project...</p>
<h2>Project setup</h2>
<p>In this example, we'll create a Rust project using the <code>clap</code> and <code>dotenv</code> crates.</p>
<div class="highlight"><pre><span></span><code>cargo new myapp
<span class="nb">cd</span> myapp
cargo add clap --features derive,env
cargo add dotenv
</code></pre></div>
<p>Your <code>Cargo.toml</code> file should look something like</p>
<div class="highlight"><pre><span></span><code><span class="k">[dependencies]</span><span class="w"></span>
<span class="n">clap</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="p">{</span><span class="w"> </span><span class="n">version</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s">"4.0.29"</span><span class="p">,</span><span class="w"> </span><span class="n">features</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="p">[</span><span class="s">"derive"</span><span class="p">,</span><span class="w"> </span><span class="s">"env"</span><span class="p">]</span><span class="w"> </span><span class="p">}</span><span class="w"></span>
<span class="n">dotenv</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s">"0.15.0"</span><span class="w"></span>
</code></pre></div>
<h2>Creating the configuration struct</h2>
<p>Let's build it up from scratch, starting with a plain <code>struct</code> defining all values we need to configure the app.</p>
<p>In our <code>src/main.rs</code></p>
<div class="highlight"><pre><span></span><code><span class="k">pub</span><span class="w"> </span><span class="k">struct</span> <span class="nc">Config</span><span class="w"> </span><span class="p">{</span><span class="w"></span>
<span class="w"> </span><span class="k">pub</span><span class="w"> </span><span class="n">ipaddr</span>: <span class="nb">String</span><span class="p">,</span><span class="w"></span>
<span class="w"> </span><span class="k">pub</span><span class="w"> </span><span class="n">port</span>: <span class="kt">i32</span><span class="p">,</span><span class="w"></span>
<span class="w"> </span><span class="k">pub</span><span class="w"> </span><span class="n">database_url</span>: <span class="nb">String</span><span class="p">,</span><span class="w"></span>
<span class="p">}</span><span class="w"></span>
</code></pre></div>
<p>Let's pause for a second to consider types. In Rust, types can help us out by providing powerful correctness guarantees.</p>
<p>Is <code>ipaddr</code> really a String? The type system should enforce a valid IPv4 address instead of a free-form string.
Likewise, let's make the <code>port</code> an unsigned 16-bit integer to stay within
the range of valid port numbers.</p>
<div class="highlight"><pre><span></span><code><span class="k">use</span><span class="w"> </span><span class="n">std</span>::<span class="n">net</span>::<span class="n">Ipv4Addr</span><span class="p">;</span><span class="w"></span>
<span class="k">pub</span><span class="w"> </span><span class="k">struct</span> <span class="nc">Config</span><span class="w"> </span><span class="p">{</span><span class="w"></span>
<span class="w"> </span><span class="k">pub</span><span class="w"> </span><span class="n">ipaddr</span>: <span class="nc">Ipv4Addr</span><span class="p">,</span><span class="w"></span>
<span class="w"> </span><span class="k">pub</span><span class="w"> </span><span class="n">port</span>: <span class="kt">u16</span><span class="p">,</span><span class="w"></span>
<span class="w"> </span><span class="k">pub</span><span class="w"> </span><span class="n">database_url</span>: <span class="nb">String</span><span class="p">,</span><span class="w"></span>
<span class="p">}</span><span class="w"></span>
</code></pre></div>
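<p>Why do these types matter? clap converts and validates each field from its raw string input via the field's <code>FromStr</code> implementation. Here is a std-only sketch (no clap involved) of the guarantees these types buy us:</p>

```rust
use std::net::Ipv4Addr;

fn main() {
    // A well-formed address parses into the dedicated type...
    let ok: Result<Ipv4Addr, _> = "0.0.0.0".parse();
    assert!(ok.is_ok());

    // ...while a malformed one is rejected before it reaches our app logic.
    let bad: Result<Ipv4Addr, _> = "255.255.255.999".parse();
    assert!(bad.is_err());

    // u16 bounds-checks the port: 999999 cannot even be represented.
    assert!("3000".parse::<u16>().is_ok());
    assert!("999999".parse::<u16>().is_err());

    println!("type-level validation works");
}
```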
<h2>Clap annotations</h2>
<p>Next, we use the <a href="https://docs.rs/clap/latest/clap/index.html"><code>clap</code></a> crate and add annotations to our struct.</p>
<p>This turns our declarative struct into a powerful command line interface,
with error handling, default values and type conversion.</p>
<div class="highlight"><pre><span></span><code><span class="k">use</span><span class="w"> </span><span class="n">clap</span>::<span class="n">Parser</span><span class="p">;</span><span class="w"></span>
<span class="k">use</span><span class="w"> </span><span class="n">std</span>::<span class="n">net</span>::<span class="n">Ipv4Addr</span><span class="p">;</span><span class="w"></span>
<span class="cp">#[derive(Parser, Debug)]</span><span class="w"></span>
<span class="cp">#[command(author, version, about)]</span><span class="w"></span>
<span class="k">pub</span><span class="w"> </span><span class="k">struct</span> <span class="nc">Config</span><span class="w"> </span><span class="p">{</span><span class="w"></span>
<span class="w"> </span><span class="cp">#[arg(short, long, default_value = </span><span class="s">"0.0.0.0"</span><span class="cp">)]</span><span class="w"></span>
<span class="w"> </span><span class="k">pub</span><span class="w"> </span><span class="n">ipaddr</span>: <span class="nc">Ipv4Addr</span><span class="p">,</span><span class="w"></span>
<span class="w"> </span><span class="cp">#[arg(short, long, default_value_t = 3000)]</span><span class="w"></span>
<span class="w"> </span><span class="k">pub</span><span class="w"> </span><span class="n">port</span>: <span class="kt">u16</span><span class="p">,</span><span class="w"></span>
<span class="w"> </span><span class="cp">#[arg(short, long)]</span><span class="w"></span>
<span class="w"> </span><span class="k">pub</span><span class="w"> </span><span class="n">database_url</span>: <span class="nb">String</span><span class="p">,</span><span class="w"></span>
<span class="p">}</span><span class="w"></span>
</code></pre></div>
<p>The author, version and about text are derived from the contents of our <code>Cargo.toml</code> file.</p>
<p>Note that the <code>database_url</code> does not use a default value.</p>
<h2>Self documentation</h2>
<p>We can add doc comments (<code>///</code>) to the struct and to its members.
This serves the purpose of both documenting the code and exposing
friendly command line usage and error messages.</p>
<div class="highlight"><pre><span></span><code><span class="k">use</span><span class="w"> </span><span class="n">clap</span>::<span class="n">Parser</span><span class="p">;</span><span class="w"></span>
<span class="k">use</span><span class="w"> </span><span class="n">std</span>::<span class="n">net</span>::<span class="n">Ipv4Addr</span><span class="p">;</span><span class="w"></span>
<span class="sd">/// My Awesome Application</span>
<span class="cp">#[derive(Parser, Debug)]</span><span class="w"></span>
<span class="cp">#[command(author, version, about)]</span><span class="w"></span>
<span class="k">pub</span><span class="w"> </span><span class="k">struct</span> <span class="nc">Config</span><span class="w"> </span><span class="p">{</span><span class="w"></span>
<span class="w"> </span><span class="sd">/// IPv4 address</span>
<span class="w"> </span><span class="cp">#[arg(short, long, default_value = </span><span class="s">"0.0.0.0"</span><span class="cp">)]</span><span class="w"></span>
<span class="w"> </span><span class="k">pub</span><span class="w"> </span><span class="n">ipaddr</span>: <span class="nc">Ipv4Addr</span><span class="p">,</span><span class="w"></span>
<span class="w"> </span><span class="sd">/// Port number</span>
<span class="w"> </span><span class="cp">#[arg(short, long, default_value_t = 3000)]</span><span class="w"></span>
<span class="w"> </span><span class="k">pub</span><span class="w"> </span><span class="n">port</span>: <span class="kt">u16</span><span class="p">,</span><span class="w"></span>
<span class="w"> </span><span class="sd">/// Database connection string</span>
<span class="w"> </span><span class="cp">#[arg(short, long)]</span><span class="w"></span>
<span class="w"> </span><span class="k">pub</span><span class="w"> </span><span class="n">database_url</span>: <span class="nb">String</span><span class="p">,</span><span class="w"></span>
<span class="p">}</span><span class="w"></span>
</code></pre></div>
<h2>Environment handling</h2>
<p>Clap can handle env vars explicitly by adding the <code>env(...)</code> annotation
to each configuration item. Here, we explicitly define each variable name
using an <code>APP_*</code> prefix, all upper case, as a convention:</p>
<div class="highlight"><pre><span></span><code><span class="k">use</span><span class="w"> </span><span class="n">clap</span>::<span class="n">Parser</span><span class="p">;</span><span class="w"></span>
<span class="k">use</span><span class="w"> </span><span class="n">dotenv</span>::<span class="n">dotenv</span><span class="p">;</span><span class="w"></span>
<span class="k">use</span><span class="w"> </span><span class="n">std</span>::<span class="n">net</span>::<span class="n">Ipv4Addr</span><span class="p">;</span><span class="w"></span>
<span class="sd">/// My Awesome Application</span>
<span class="cp">#[derive(Parser, Debug)]</span><span class="w"></span>
<span class="cp">#[command(author, version, about)]</span><span class="w"></span>
<span class="k">pub</span><span class="w"> </span><span class="k">struct</span> <span class="nc">Config</span><span class="w"> </span><span class="p">{</span><span class="w"></span>
<span class="w"> </span><span class="sd">/// IPv4 address</span>
<span class="w"> </span><span class="cp">#[arg(short, long, env(</span><span class="s">"APP_IPADDR"</span><span class="cp">), default_value = </span><span class="s">"0.0.0.0"</span><span class="cp">)]</span><span class="w"></span>
<span class="w"> </span><span class="k">pub</span><span class="w"> </span><span class="n">ipaddr</span>: <span class="nc">Ipv4Addr</span><span class="p">,</span><span class="w"></span>
<span class="w"> </span><span class="sd">/// Port number</span>
<span class="w"> </span><span class="cp">#[arg(short, long, env(</span><span class="s">"APP_PORT"</span><span class="cp">), default_value_t = 3000)]</span><span class="w"></span>
<span class="w"> </span><span class="k">pub</span><span class="w"> </span><span class="n">port</span>: <span class="kt">u16</span><span class="p">,</span><span class="w"></span>
<span class="w"> </span><span class="sd">/// Database connection string</span>
<span class="w"> </span><span class="cp">#[arg(short, long, env(</span><span class="s">"APP_DATABASE_URL"</span><span class="cp">))]</span><span class="w"></span>
<span class="w"> </span><span class="k">pub</span><span class="w"> </span><span class="n">database_url</span>: <span class="nb">String</span><span class="p">,</span><span class="w"></span>
<span class="p">}</span><span class="w"></span>
</code></pre></div>
<h2>Constructor</h2>
<p>Since we want to (optionally) populate our environment using a <code>.env</code> file,
we have to set up the environment before invoking the Clap parser. To do this,
we'll implement a <code>from_env_and_args</code> constructor method for our <code>Config</code> struct.</p>
<div class="highlight"><pre><span></span><code><span class="k">impl</span><span class="w"> </span><span class="n">Config</span><span class="w"> </span><span class="p">{</span><span class="w"></span>
<span class="w"> </span><span class="k">pub</span><span class="w"> </span><span class="k">fn</span> <span class="nf">from_env_and_args</span><span class="p">()</span><span class="w"> </span>-> <span class="nc">Self</span><span class="w"> </span><span class="p">{</span><span class="w"></span>
<span class="w"> </span><span class="n">dotenv</span><span class="p">().</span><span class="n">ok</span><span class="p">();</span><span class="w"></span>
<span class="w"> </span><span class="bp">Self</span>::<span class="n">parse</span><span class="p">()</span><span class="w"></span>
<span class="w"> </span><span class="p">}</span><span class="w"></span>
<span class="p">}</span><span class="w"></span>
</code></pre></div>
<p>With four potential inputs, how do we reason about which takes precedence?
To determine the config value, the effective order is as follows; the <em>first one</em> wins:</p>
<ol>
<li>Command line interface argument</li>
<li>Environment variable</li>
<li>File (<code>.env</code>) — <code>dotenv</code> only sets variables that aren't already present in the environment</li>
<li>Default value</li>
</ol>
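<p>The position of the <code>.env</code> file in that order follows from how <code>dotenv</code> works: it sets a variable only if the real environment doesn't already define it. A small std-only sketch that mimics this resolution order (<code>load_dotenv_line</code> and <code>resolve_port</code> are illustrative helpers, not the crates' APIs):</p>

```rust
use std::env;

// Stand-in for what dotenv does with one line of a .env file:
// set the variable only if the environment doesn't already have it.
fn load_dotenv_line(key: &str, value: &str) {
    if env::var_os(key).is_none() {
        env::set_var(key, value);
    }
}

// CLI argument first, then the environment (which dotenv may have
// populated from .env), then the default.
fn resolve_port(cli_arg: Option<u16>) -> u16 {
    cli_arg
        .or_else(|| env::var("APP_PORT").ok()?.parse().ok())
        .unwrap_or(3000)
}

fn main() {
    // The shell exported APP_PORT=5000 and .env says APP_PORT=4000:
    env::set_var("APP_PORT", "5000");
    load_dotenv_line("APP_PORT", "4000");
    assert_eq!(resolve_port(None), 5000); // env var wins over .env
    assert_eq!(resolve_port(Some(6000)), 6000); // CLI argument wins over everything

    // With no real env var, the .env value is used instead of the default.
    env::remove_var("APP_PORT");
    load_dotenv_line("APP_PORT", "4000");
    assert_eq!(resolve_port(None), 4000);
    println!("precedence checks passed");
}
```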
<h2>Main</h2>
<p>Finally, we write our main function to construct the <code>Config</code> at runtime.</p>
<div class="highlight"><pre><span></span><code><span class="k">fn</span> <span class="nf">main</span><span class="p">()</span><span class="w"> </span><span class="p">{</span><span class="w"></span>
<span class="w"> </span><span class="kd">let</span><span class="w"> </span><span class="n">cfg</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">Config</span>::<span class="n">from_env_and_args</span><span class="p">();</span><span class="w"></span>
<span class="w"> </span><span class="fm">println!</span><span class="p">(</span><span class="s">"Starting HTTP server on {}:{}"</span><span class="p">,</span><span class="w"> </span><span class="n">cfg</span><span class="p">.</span><span class="n">ipaddr</span><span class="p">,</span><span class="w"> </span><span class="n">cfg</span><span class="p">.</span><span class="n">port</span><span class="p">);</span><span class="w"></span>
<span class="w"> </span><span class="fm">println!</span><span class="p">(</span><span class="s">"Connecting to {}"</span><span class="p">,</span><span class="w"> </span><span class="n">cfg</span><span class="p">.</span><span class="n">database_url</span><span class="p">);</span><span class="w"></span>
<span class="p">}</span><span class="w"></span>
</code></pre></div>
<p>Presumably, your application will do something more interesting here!</p>
<h2>Result</h2>
<div class="highlight"><pre><span></span><code>$ cargo build
...
$ ./target/debug/myapp --help
My Awesome Application
Usage: myapp <span class="o">[</span>OPTIONS<span class="o">]</span> --database-url <DATABASE_URL>
Options:
-i, --ipaddr <IPADDR> IPv4 address <span class="o">[</span>env: <span class="nv">APP_IPADDR</span><span class="o">=]</span> <span class="o">[</span>default: <span class="m">0</span>.0.0.0<span class="o">]</span>
-p, --port <PORT> Port number <span class="o">[</span>env: <span class="nv">APP_PORT</span><span class="o">=]</span> <span class="o">[</span>default: <span class="m">3000</span><span class="o">]</span>
-d, --database-url <DATABASE_URL> Database connection string <span class="o">[</span>env: <span class="nv">APP_DATABASE_URL</span><span class="o">=]</span>
-h, --help Print <span class="nb">help</span> information
-V, --version Print version information
</code></pre></div>
<p>In this case, we see that <code>database_url</code> is undefined in the environment and has no default, but
is required by the application. If we try to run it now, the app exits with a status
code of <code>2</code> and we get a human-readable message that we are missing the database URL:</p>
<div class="highlight"><pre><span></span><code>$ ./target/debug/myapp
error: The following required arguments were not provided:
--database-url <DATABASE_URL>
Usage: myapp --database-url <DATABASE_URL>
For more information try <span class="s1">'--help'</span>
</code></pre></div>
<p>To provide it, we have three options, depending on our operational needs.</p>
<p>First, we can use the command line for interactive testing:</p>
<div class="highlight"><pre><span></span><code>./target/debug/myapp --database-url postgres://postgres@localhost:5432/postgres
</code></pre></div>
<p>Or, an environment variable for production settings:</p>
<div class="highlight"><pre><span></span><code><span class="nb">export</span> <span class="nv">APP_DATABASE_URL</span><span class="o">=</span><span class="s2">"postgres://postgres@localhost:5432/postgres"</span>
./target/debug/myapp
</code></pre></div>
<p>Or finally, use a <code>.env</code> file for declarative environment setup (in prod or dev):</p>
<div class="highlight"><pre><span></span><code><span class="nb">echo</span> <span class="s1">'APP_DATABASE_URL=postgres://postgres@localhost:5432/postgres'</span> >> .env
./target/debug/myapp
</code></pre></div>
<p>Whichever way we configure the required <code>database_url</code>, we get the same result.</p>
<div class="highlight"><pre><span></span><code>$ ./target/debug/myapp
Starting HTTP server on <span class="m">0</span>.0.0.0:3000
Connecting to postgres://postgres@localhost:5432/postgres
</code></pre></div>
<p>Error handling is intuitive from the command line.
Let's see what happens when we provide an invalid IP address and port number.</p>
<div class="highlight"><pre><span></span><code>$ ./target/debug/myapp --ipaddr <span class="m">255</span>.255.255.999
error: Invalid value <span class="s1">'255.255.255.999'</span> <span class="k">for</span> <span class="s1">'--ipaddr <IPADDR>'</span>: invalid IPv4 address syntax
For more information try <span class="s1">'--help'</span>
$ ./target/debug/myapp --port <span class="m">999999</span>
error: Invalid value <span class="s1">'999999'</span> <span class="k">for</span> <span class="s1">'--port <PORT>'</span>: <span class="m">999999</span> is not <span class="k">in</span> <span class="m">0</span>..<span class="o">=</span><span class="m">65535</span>
For more information try <span class="s1">'--help'</span>
</code></pre></div>
<p>Voilà. A simple, declarative, type-safe abstraction with minimal code.
We get operational flexibility and confidence in the validity of the inputs
without writing imperative code to handle the details of each scenario.</p>
<p>This can serve as a starter template suitable for most backend server or command line applications. Here it is, all in one place:</p>
<div class="highlight"><pre><span></span><code><span class="k">use</span><span class="w"> </span><span class="n">clap</span>::<span class="n">Parser</span><span class="p">;</span><span class="w"></span>
<span class="k">use</span><span class="w"> </span><span class="n">dotenv</span>::<span class="n">dotenv</span><span class="p">;</span><span class="w"></span>
<span class="k">use</span><span class="w"> </span><span class="n">std</span>::<span class="n">net</span>::<span class="n">Ipv4Addr</span><span class="p">;</span><span class="w"></span>
<span class="sd">/// My Awesome Application</span>
<span class="cp">#[derive(Parser, Debug)]</span><span class="w"></span>
<span class="cp">#[command(author, version, about)]</span><span class="w"></span>
<span class="k">pub</span><span class="w"> </span><span class="k">struct</span> <span class="nc">Config</span><span class="w"> </span><span class="p">{</span><span class="w"></span>
<span class="w"> </span><span class="sd">/// IPv4 address</span>
<span class="w"> </span><span class="cp">#[arg(short, long, env(</span><span class="s">"APP_IPADDR"</span><span class="cp">), default_value = </span><span class="s">"0.0.0.0"</span><span class="cp">)]</span><span class="w"></span>
<span class="w"> </span><span class="k">pub</span><span class="w"> </span><span class="n">ipaddr</span>: <span class="nc">Ipv4Addr</span><span class="p">,</span><span class="w"></span>
<span class="w"> </span><span class="sd">/// Port number</span>
<span class="w"> </span><span class="cp">#[arg(short, long, env(</span><span class="s">"APP_PORT"</span><span class="cp">), default_value_t = 3000)]</span><span class="w"></span>
<span class="w"> </span><span class="k">pub</span><span class="w"> </span><span class="n">port</span>: <span class="kt">u16</span><span class="p">,</span><span class="w"></span>
<span class="w"> </span><span class="sd">/// Database connection string</span>
<span class="w"> </span><span class="cp">#[arg(short, long, env(</span><span class="s">"APP_DATABASE_URL"</span><span class="cp">))]</span><span class="w"></span>
<span class="w"> </span><span class="k">pub</span><span class="w"> </span><span class="n">database_url</span>: <span class="nb">String</span><span class="p">,</span><span class="w"></span>
<span class="p">}</span><span class="w"></span>
<span class="k">impl</span><span class="w"> </span><span class="n">Config</span><span class="w"> </span><span class="p">{</span><span class="w"></span>
<span class="w"> </span><span class="k">pub</span><span class="w"> </span><span class="k">fn</span> <span class="nf">from_env_and_args</span><span class="p">()</span><span class="w"> </span>-> <span class="nc">Self</span><span class="w"> </span><span class="p">{</span><span class="w"></span>
<span class="w"> </span><span class="n">dotenv</span><span class="p">().</span><span class="n">ok</span><span class="p">();</span><span class="w"></span>
<span class="w"> </span><span class="bp">Self</span>::<span class="n">parse</span><span class="p">()</span><span class="w"></span>
<span class="w"> </span><span class="p">}</span><span class="w"></span>
<span class="p">}</span><span class="w"></span>
<span class="k">fn</span> <span class="nf">main</span><span class="p">()</span><span class="w"> </span><span class="p">{</span><span class="w"></span>
<span class="w"> </span><span class="kd">let</span><span class="w"> </span><span class="n">cfg</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">Config</span>::<span class="n">from_env_and_args</span><span class="p">();</span><span class="w"></span>
<span class="w"> </span><span class="fm">println!</span><span class="p">(</span><span class="s">"Starting HTTP server on {}:{}"</span><span class="p">,</span><span class="w"> </span><span class="n">cfg</span><span class="p">.</span><span class="n">ipaddr</span><span class="p">,</span><span class="w"> </span><span class="n">cfg</span><span class="p">.</span><span class="n">port</span><span class="p">);</span><span class="w"></span>
<span class="w"> </span><span class="fm">println!</span><span class="p">(</span><span class="s">"Connecting to {}"</span><span class="p">,</span><span class="w"> </span><span class="n">cfg</span><span class="p">.</span><span class="n">database_url</span><span class="p">);</span><span class="w"></span>
<span class="p">}</span><span class="w"></span>
</code></pre></div>
<p>Check out the <a href="https://docs.rs/clap/latest/clap/index.html">clap docs</a> for more examples
of how you can extend this approach.</p>
<p>I think this interface shows that we don't need to compromise between ergonomics and type safety, or between speed and correctness. It's a great example of Rust's potential as a higher-level application language.</p>
<h1>Don't install PostgreSQL - Using containers for local development</h1>
<p>Matthew T. Perry, 2022-02-11</p>
<p>So you need a database for an application you're developing. You've looked around and decided that PostgreSQL fits the bill. Excellent choice! Now it's time to start coding. How do you get postgres running locally to develop and test against it?</p>
<p>The typical suggestion for many web application frameworks is to install PostgreSQL to your system using your chosen package manager (<code>brew install postgresql</code> or <code>apt install postgresql</code>), configure it to work for your application (maybe tweaking some settings in <code>/etc/postgresql/</code> as the root user), start a background process with your system supervisor of choice (<code>sudo systemctl start postgresql</code>), hook it up to your app, and you're off to the races.</p>
<p>But what happens when you're working on a project that needs a different major version of postgresql, with different extensions or entirely different settings? I often found myself in a scenario where my system was full of cruft, having been reworked many times over to swap out different postgresql instances. Additionally, there is only a single data directory per version (<code>/var/lib/postgresql/&lt;version&gt;/main</code> on Debian-based systems), so if you need the data to persist for more than a single project, you have to manage backup and restore each time you switch contexts.</p>
<p>A traditional system install just doesn't cut it. We need a way to run many different postgres instances, independent of each other with isolated data, settings and software versions. We can use Docker containers to run postgresql in a more flexible way that allows for greater experimentation, data stability, and greatly improved ease of use.</p>
<h2>Running postgres in Docker, the naive approach</h2>
<p>There's no real secret to running Docker containers. We know that <a href="https://hub.docker.com/_/postgres/">postgresql docker images</a> exist and we should be able to run them like any other.</p>
<div class="highlight"><pre><span></span><code><span class="o">$</span><span class="w"> </span><span class="n">docker</span><span class="w"> </span><span class="n">run</span><span class="w"> </span><span class="n">postgres</span><span class="p">:</span><span class="mf">14.1</span><span class="w"></span>
<span class="n">Unable</span><span class="w"> </span><span class="n">to</span><span class="w"> </span><span class="n">find</span><span class="w"> </span><span class="n">image</span><span class="w"> </span><span class="s1">'postgres:14.1'</span><span class="w"> </span><span class="n">locally</span><span class="w"></span>
<span class="mf">14.1</span><span class="p">:</span><span class="w"> </span><span class="n">Pulling</span><span class="w"> </span><span class="n">from</span><span class="w"> </span><span class="n">library</span><span class="o">/</span><span class="n">postgres</span><span class="w"></span>
<span class="o">...</span><span class="w"></span>
<span class="n">Status</span><span class="p">:</span><span class="w"> </span><span class="n">Downloaded</span><span class="w"> </span><span class="n">newer</span><span class="w"> </span><span class="n">image</span><span class="w"> </span><span class="k">for</span><span class="w"> </span><span class="n">postgres</span><span class="p">:</span><span class="mf">14.1</span><span class="w"></span>
<span class="n">Error</span><span class="p">:</span><span class="w"> </span><span class="n">Database</span><span class="w"> </span><span class="k">is</span><span class="w"> </span><span class="n">uninitialized</span><span class="w"> </span><span class="ow">and</span><span class="w"> </span><span class="n">superuser</span><span class="w"> </span><span class="n">password</span><span class="w"> </span><span class="k">is</span><span class="w"> </span><span class="ow">not</span><span class="w"> </span><span class="n">specified</span><span class="o">.</span><span class="w"></span>
<span class="w"> </span><span class="n">You</span><span class="w"> </span><span class="n">must</span><span class="w"> </span><span class="n">specify</span><span class="w"> </span><span class="n">POSTGRES_PASSWORD</span><span class="w"> </span><span class="n">to</span><span class="w"> </span><span class="n">a</span><span class="w"> </span><span class="n">non</span><span class="o">-</span><span class="n">empty</span><span class="w"> </span><span class="n">value</span><span class="w"> </span><span class="k">for</span><span class="w"> </span><span class="n">the</span><span class="w"></span>
<span class="w"> </span><span class="n">superuser</span><span class="o">.</span><span class="w"> </span><span class="n">For</span><span class="w"> </span><span class="n">example</span><span class="p">,</span><span class="w"> </span><span class="s2">"-e POSTGRES_PASSWORD=password"</span><span class="w"> </span><span class="n">on</span><span class="w"> </span><span class="s2">"docker run"</span><span class="o">.</span><span class="w"></span>
<span class="w"> </span><span class="n">You</span><span class="w"> </span><span class="n">may</span><span class="w"> </span><span class="n">also</span><span class="w"> </span><span class="n">use</span><span class="w"> </span><span class="s2">"POSTGRES_HOST_AUTH_METHOD=trust"</span><span class="w"> </span><span class="n">to</span><span class="w"> </span><span class="n">allow</span><span class="w"> </span><span class="n">all</span><span class="w"></span>
<span class="w"> </span><span class="n">connections</span><span class="w"> </span><span class="n">without</span><span class="w"> </span><span class="n">a</span><span class="w"> </span><span class="n">password</span><span class="o">.</span><span class="w"> </span><span class="n">This</span><span class="w"> </span><span class="k">is</span><span class="w"> </span><span class="o">*</span><span class="ow">not</span><span class="o">*</span><span class="w"> </span><span class="n">recommended</span><span class="o">.</span><span class="w"></span>
<span class="w"> </span><span class="n">See</span><span class="w"> </span><span class="n">PostgreSQL</span><span class="w"> </span><span class="n">documentation</span><span class="w"> </span><span class="n">about</span><span class="w"> </span><span class="s2">"trust"</span><span class="p">:</span><span class="w"></span>
<span class="w"> </span><span class="n">https</span><span class="p">:</span><span class="o">//</span><span class="n">www</span><span class="o">.</span><span class="n">postgresql</span><span class="o">.</span><span class="n">org</span><span class="o">/</span><span class="n">docs</span><span class="o">/</span><span class="n">current</span><span class="o">/</span><span class="n">auth</span><span class="o">-</span><span class="n">trust</span><span class="o">.</span><span class="n">html</span><span class="w"></span>
</code></pre></div>
<p>Ah, clearly there are a few tricks specific to running postgres in a container. If we set a postgres password, we can get a running postgres instance.</p>
<div class="highlight"><pre><span></span><code>$ docker run -e <span class="nv">POSTGRES_PASSWORD</span><span class="o">=</span>password postgres:14.1
...
<span class="m">2022</span>-02-03 <span class="m">18</span>:23:38.823 UTC <span class="o">[</span><span class="m">1</span><span class="o">]</span> LOG: database system is ready to accept connections
</code></pre></div>
<p>The container startup script will initialize your database, create users and start the process, listening for connections. But where is it listening? We can't yet connect to it. And where is the data? We can't see any data anywhere on our host system. Everything is, well, contained within the running Docker container.</p>
<p>To make this workflow viable for local development, we'd like</p>
<ul>
<li>An open TCP port on the host system so we can connect to it.</li>
<li>The data to live on the host system, not in the container's overlay filesystem.</li>
<li>To give postgres access to files from the host system so that we can import datasets.</li>
<li>Settings to live on the host system so that we can adjust them and optionally check them into source control.</li>
</ul>
<p>Of course, the official <a href="https://hub.docker.com/_/postgres/">PostgreSQL Docker documentation</a> covers these exact scenarios, showing us how we can use <em>port forwarding</em> and <em>volume mounts</em>.</p>
<h2>An alternative to system-wide PostgreSQL installs</h2>
<p>Here is my opinionated take on how to set up an ergonomic postgres environment for local development.</p>
<p>First, create a <code>database</code> directory in your project to hold all things postgres.</p>
<p>Then create <code>database/postgresql.conf</code> to specify the postgres settings. The example below is a subset of the full postgres config: the settings I typically need to adjust when doing any serious performance-sensitive development.</p>
<div class="highlight"><pre><span></span><code><span class="c"># PostgreSQL configuration file</span>
<span class="c"># See https://github</span><span class="nt">.</span><span class="c">com/postgres/postgres/blob/master/src/backend/utils/misc/postgresql</span><span class="nt">.</span><span class="c">conf</span><span class="nt">.</span><span class="c">sample</span>
<span class="c">#</span><span class="nb">------------------------------------------------------------------------------</span><span class="c"></span>
<span class="c"># CONNECTIONS AND AUTHENTICATION</span>
<span class="c">#</span><span class="nb">------------------------------------------------------------------------------</span><span class="c"></span>
<span class="c">listen_addresses = '*'</span>
<span class="c">port = 5432 # (change requires restart)</span>
<span class="c">max_connections = 100 # (change requires restart)</span>
<span class="c">#</span><span class="nb">------------------------------------------------------------------------------</span><span class="c"></span>
<span class="c"># RESOURCE USAGE (except WAL)</span>
<span class="c">#</span><span class="nb">------------------------------------------------------------------------------</span><span class="c"></span>
<span class="c">shared_buffers = 2048MB # min 128kB</span>
<span class="c">work_mem = 40MB # min 64kB</span>
<span class="c">maintenance_work_mem = 640MB # min 1MB</span>
<span class="c">dynamic_shared_memory_type = posix # the default is the first option</span>
<span class="c">max_parallel_workers_per_gather = 6 # taken from max_parallel_workers</span>
<span class="c">max_parallel_workers = 12 # maximum number of max_worker_processes that</span>
<span class="c">#</span><span class="nb">------------------------------------------------------------------------------</span><span class="c"></span>
<span class="c"># WRITE</span><span class="nb">-</span><span class="c">AHEAD LOG</span>
<span class="c">#</span><span class="nb">------------------------------------------------------------------------------</span><span class="c"></span>
<span class="c">checkpoint_timeout = 40min # range 30s</span><span class="nb">-</span><span class="c">1d</span>
<span class="c">max_wal_size = 1GB</span>
<span class="c">min_wal_size = 80MB</span>
<span class="c">checkpoint_completion_target = 0</span><span class="nt">.</span><span class="c">75 # checkpoint target duration</span><span class="nt">,</span><span class="c"> 0</span><span class="nt">.</span><span class="c">0 </span><span class="nb">-</span><span class="c"> 1</span><span class="nt">.</span><span class="c">0</span>
<span class="c">#</span><span class="nb">------------------------------------------------------------------------------</span><span class="c"></span>
<span class="c"># REPORTING AND LOGGING</span>
<span class="c">#</span><span class="nb">------------------------------------------------------------------------------</span><span class="c"></span>
<span class="c">logging_collector = off</span>
<span class="c">log_autovacuum_min_duration = 0</span>
<span class="c">log_checkpoints = on</span>
<span class="c">log_connections = on</span>
<span class="c">log_disconnections = on</span>
<span class="c">log_error_verbosity = default</span>
<span class="c">log_min_duration_statement = 20ms</span>
<span class="c">log_lock_waits = on</span>
<span class="c">log_temp_files = 0</span>
<span class="c">log_timezone = 'UTC'</span>
<span class="c">#</span><span class="nb">------------------------------------------------------------------------------</span><span class="c"></span>
<span class="c"># AUTOVACUUM</span>
<span class="c">#</span><span class="nb">------------------------------------------------------------------------------</span><span class="c"></span>
<span class="c">autovacuum_vacuum_scale_factor = 0</span><span class="nt">.</span><span class="c">02 # fraction of table size before vacuum</span>
<span class="c">autovacuum_analyze_scale_factor = 0</span><span class="nt">.</span><span class="c">01 # fraction of table size before analyze</span>
<span class="c">#</span><span class="nb">------------------------------------------------------------------------------</span><span class="c"></span>
<span class="c"># CLIENT CONNECTION DEFAULTS</span>
<span class="c">#</span><span class="nb">------------------------------------------------------------------------------</span><span class="c"></span>
<span class="c">datestyle = 'iso</span><span class="nt">,</span><span class="c"> mdy'</span>
<span class="c">timezone = 'UTC'</span>
<span class="c">lc_messages = 'C</span><span class="nt">.</span><span class="c">UTF</span><span class="nb">-</span><span class="c">8'</span>
<span class="c">lc_monetary = 'C</span><span class="nt">.</span><span class="c">UTF</span><span class="nb">-</span><span class="c">8'</span>
<span class="c">lc_numeric = 'C</span><span class="nt">.</span><span class="c">UTF</span><span class="nb">-</span><span class="c">8'</span>
<span class="c">lc_time = 'C</span><span class="nt">.</span><span class="c">UTF</span><span class="nb">-</span><span class="c">8'</span>
<span class="c">default_text_search_config = 'pg_catalog</span><span class="nt">.</span><span class="c">english'</span>
<span class="c">shared_preload_libraries = 'pg_stat_statements'</span>
</code></pre></div>
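<p>The memory settings above happen to line up with widely-repeated rules of thumb for an 8GB machine: <code>shared_buffers</code> at roughly 25% of RAM and <code>work_mem</code> at roughly 0.5%. As a hedged sketch (the percentages are conventional advice, not something this config prescribes), you can compute starting values like this:</p>

```python
# Rule-of-thumb starting points for postgres memory settings, given
# total RAM in MB. These are conventional defaults to tune from, not
# values the official docs mandate.
def suggest_memory_settings(ram_mb: int) -> dict:
    return {
        "shared_buffers": f"{ram_mb // 4}MB",      # ~25% of RAM
        "work_mem": f"{max(4, ram_mb // 200)}MB",  # per-sort/hash budget, ~0.5%
    }

print(suggest_memory_settings(8192))
# prints {'shared_buffers': '2048MB', 'work_mem': '40MB'}
```

<p>For 8192MB of RAM this reproduces the <code>shared_buffers = 2048MB</code> and <code>work_mem = 40MB</code> used above.</p>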
<p>Create a <code>database/pg_hba.conf</code> to control access to the database. You might need to adjust this to experiment with different networking setups, different users, etc. Usually the defaults here are fine.</p>
<div class="highlight"><pre><span></span><code>#<span class="w"> </span><span class="nv">PostgreSQL</span><span class="w"> </span><span class="nv">Client</span><span class="w"> </span><span class="nv">Authentication</span><span class="w"> </span><span class="nv">Configuration</span><span class="w"> </span><span class="nv">File</span><span class="w"></span>
#<span class="w"> </span><span class="o">===================================================</span><span class="w"></span>
#<span class="w"> </span><span class="nv">TYPE</span><span class="w"> </span><span class="nv">DATABASE</span><span class="w"> </span><span class="nv">USER</span><span class="w"> </span><span class="nv">CIDR</span><span class="o">-</span><span class="nv">ADDRESS</span><span class="w"> </span><span class="nv">METHOD</span><span class="w"></span>
#<span class="w"> </span><span class="nv">Database</span><span class="w"> </span><span class="nv">administrative</span><span class="w"> </span><span class="nv">login</span><span class="w"> </span><span class="nv">by</span><span class="w"> </span><span class="nv">UNIX</span><span class="w"> </span><span class="nv">sockets</span><span class="w"></span>
#<span class="w"> </span><span class="s2">"local"</span><span class="w"> </span><span class="nv">is</span><span class="w"> </span><span class="k">for</span><span class="w"> </span><span class="nv">Unix</span><span class="w"> </span><span class="nv">domain</span><span class="w"> </span><span class="nv">socket</span><span class="w"> </span><span class="nv">connections</span><span class="w"> </span><span class="nv">only</span><span class="w"></span>
<span class="nv">local</span><span class="w"> </span><span class="nv">all</span><span class="w"> </span><span class="nv">postgres</span><span class="w"> </span><span class="nv">ident</span><span class="w"></span>
<span class="nv">local</span><span class="w"> </span><span class="nv">all</span><span class="w"> </span><span class="nv">all</span><span class="w"> </span><span class="nv">ident</span><span class="w"></span>
#<span class="w"> </span><span class="nv">IPv4</span><span class="w"> </span><span class="nv">local</span><span class="w"> </span><span class="nv">connections</span>:<span class="w"></span>
<span class="nv">host</span><span class="w"> </span><span class="nv">all</span><span class="w"> </span><span class="nv">all</span><span class="w"> </span><span class="mi">172</span>.<span class="mi">17</span>.<span class="mi">0</span>.<span class="mi">0</span><span class="o">/</span><span class="mi">16</span><span class="w"> </span><span class="nv">md5</span><span class="w"></span>
#<span class="w"> </span><span class="nv">IPv6</span><span class="w"> </span><span class="nv">local</span><span class="w"> </span><span class="nv">connections</span>:<span class="w"></span>
<span class="nv">host</span><span class="w"> </span><span class="nv">all</span><span class="w"> </span><span class="nv">all</span><span class="w"> </span>::<span class="mi">1</span><span class="o">/</span><span class="mi">128</span><span class="w"> </span><span class="nv">md5</span><span class="w"></span>
</code></pre></div>
<p>Make two subdirectories to hold the data: <code>database/mnt_data</code> to hold data you intend to import/export and <code>database/pgdata</code> to hold the actual database.</p>
<div class="highlight"><pre><span></span><code>$ mkdir mnt_data
$ mkdir pgdata
</code></pre></div>
<p>You probably don't want to check your datasets or database into source control. Create a <code>database/.gitignore</code> to ignore them:</p>
<div class="highlight"><pre><span></span><code># .gitignore
pgdata
mnt_data
</code></pre></div>
<p>Finally, create a <code>run-postgres.sh</code> script to launch the docker container with everything hooked up.</p>
<div class="highlight"><pre><span></span><code><span class="c1"># run-postgres.sh</span>
<span class="nb">set</span> -e
<span class="nv">HOST_PORT</span><span class="o">=</span><span class="m">5432</span>
<span class="nv">NAME</span><span class="o">=</span>postgres-dev
<span class="nv">DOCKER_REPO</span><span class="o">=</span>postgres
<span class="nv">TAG</span><span class="o">=</span><span class="m">14</span>.1
docker run --rm --name <span class="nv">$NAME</span> <span class="se">\</span>
--volume <span class="sb">`</span><span class="nb">pwd</span><span class="sb">`</span>/pgdata:/var/lib/pgsql/data <span class="se">\</span>
--volume <span class="sb">`</span><span class="nb">pwd</span><span class="sb">`</span>/mnt_data:/mnt/data <span class="se">\</span>
--volume <span class="sb">`</span><span class="nb">pwd</span><span class="sb">`</span>/pg_hba.conf:/etc/postgresql/pg_hba.conf <span class="se">\</span>
--volume <span class="sb">`</span><span class="nb">pwd</span><span class="sb">`</span>/postgresql.conf:/etc/postgresql/postgresql.conf <span class="se">\</span>
-e <span class="nv">POSTGRES_PASSWORD</span><span class="o">=</span>password <span class="se">\</span>
-e <span class="nv">POSTGRES_USER</span><span class="o">=</span>postgres <span class="se">\</span>
-e <span class="nv">PGDATA</span><span class="o">=</span>/var/lib/pgsql/data/pgdata14 <span class="se">\</span>
-e <span class="nv">POSTGRES_INITDB_ARGS</span><span class="o">=</span><span class="s2">"--data-checksums --encoding=UTF8"</span> <span class="se">\</span>
-e <span class="nv">POSTGRES_DB</span><span class="o">=</span>db <span class="se">\</span>
-p <span class="si">${</span><span class="nv">HOST_PORT</span><span class="si">}</span>:5432 <span class="se">\</span>
<span class="si">${</span><span class="nv">DOCKER_REPO</span><span class="si">}</span>:<span class="si">${</span><span class="nv">TAG</span><span class="si">}</span> <span class="se">\</span>
postgres <span class="se">\</span>
-c <span class="s1">'config_file=/etc/postgresql/postgresql.conf'</span> <span class="se">\</span>
-c <span class="s1">'hba_file=/etc/postgresql/pg_hba.conf'</span>
</code></pre></div>
<p>Note the <code>HOST_PORT</code> variable. If you've already got another database running on 5432, this won't work. This is where you need to get a bit creative and tune the process to your needs. What I typically do is use port <strong>6</strong>432 and increment by one for every project so they don't conflict. This lets you run all of your databases at the same time on one machine. The only downside is you need to remember which port maps to which database!</p>
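<p>One small tweak (my suggestion, not part of the original script) is to make <code>HOST_PORT</code> an environment-variable override with a default, so each project can pick its port without editing the script:</p>

```shell
# Use the caller's HOST_PORT if set, otherwise default to 5432.
HOST_PORT="${HOST_PORT:-5432}"
echo "forwarding host port ${HOST_PORT} -> container port 5432"
```

<p>Then <code>HOST_PORT=6432 ./run-postgres.sh</code> runs this project's database on 6432 while the script's default stays 5432.</p>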
<h2>Running it</h2>
<div class="highlight"><pre><span></span><code>$ ./run-postgres.sh
...
<span class="m">2022</span>-02-03 <span class="m">19</span>:13:09.673 UTC <span class="o">[</span><span class="m">1</span><span class="o">]</span> LOG: starting PostgreSQL <span class="m">14</span>.1 <span class="o">(</span>Debian <span class="m">14</span>.1-1.pgdg110+1<span class="o">)</span> on x86_64-pc-linux-gnu, compiled by gcc <span class="o">(</span>Debian <span class="m">10</span>.2.1-6<span class="o">)</span> <span class="m">10</span>.2.1 <span class="m">20210110</span>, <span class="m">64</span>-bit
<span class="m">2022</span>-02-03 <span class="m">19</span>:13:09.673 UTC <span class="o">[</span><span class="m">1</span><span class="o">]</span> LOG: listening on IPv4 address <span class="s2">"0.0.0.0"</span>, port <span class="m">5432</span>
<span class="m">2022</span>-02-03 <span class="m">19</span>:13:09.673 UTC <span class="o">[</span><span class="m">1</span><span class="o">]</span> LOG: listening on IPv6 address <span class="s2">"::"</span>, port <span class="m">5432</span>
<span class="m">2022</span>-02-03 <span class="m">19</span>:13:09.677 UTC <span class="o">[</span><span class="m">1</span><span class="o">]</span> LOG: listening on Unix socket <span class="s2">"/var/run/postgresql/.s.PGSQL.5432"</span>
<span class="m">2022</span>-02-03 <span class="m">19</span>:13:09.685 UTC <span class="o">[</span><span class="m">26</span><span class="o">]</span> LOG: database system was shut down at <span class="m">2021</span>-11-13 <span class="m">21</span>:34:06 UTC
<span class="m">2022</span>-02-03 <span class="m">19</span>:13:09.700 UTC <span class="o">[</span><span class="m">1</span><span class="o">]</span> LOG: database system is ready to accept connections
</code></pre></div>
<p>Using this setup, the logs are sent directly to <code>stdout</code> so you'll see everything in the terminal. The ports and paths in the logs are <em>inside</em> the container, so don't get fooled trying to find them on your host system.</p>
<p>To connect, we use the host port defined in the script (shown here as 6432, following the per-project port convention above):</p>
<div class="highlight"><pre><span></span><code>$ psql postgres://postgres:password@localhost:6432/postgres
</code></pre></div>
<p>You can put data in <code>mnt_data</code> from the host system, which will be exposed to postgresql as the <code>/mnt/data</code> directory inside the container. For example, load it with psql using <code>COPY data FROM '/mnt/data/my.csv' WITH CSV HEADER;</code>. Likewise, any data dumps or exports <em>from</em> postgres can be output to this directory, immediately accessible to the host system.</p>
<p>To stop the server, use Ctrl-C. The data will persist in your <code>pgdata</code> directory. Resist the temptation to touch any files therein, as they are managed internally by postgres. But you can move the directory as a whole around the filesystem or to another machine. It's not quite as convenient as a process-less, single-file SQLite database, but it's close.</p>
<p>Because the <code>pgdata</code> directory is created by postgres, which provides strong guarantees that the on-disk
data format will be consistent within a major version, we can even use a different image altogether to access the same underlying dataset. This can be very handy for switching between vanilla postgres and postgis,
or for testing different versions of extensions, etc. As long as the image follows the basic rules of the postgres container behavior and uses the same major version, it should just work.</p>
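<p>Postgres records the cluster's major version in a <code>PG_VERSION</code> file at the top of the data directory. A small guard in the launch script (a sketch of my own, not part of the original script) can catch a mismatch before the container even starts:</p>

```shell
# Refuse to launch if the on-disk cluster was initialized by a
# different major version than the image tag we're about to run.
TAG=14.1
PGDATA_DIR=pgdata/pgdata14
want="${TAG%%.*}"   # major version from the image tag, e.g. "14"
if [ -f "$PGDATA_DIR/PG_VERSION" ]; then
    have="$(cat "$PGDATA_DIR/PG_VERSION")"
    if [ "$have" != "$want" ]; then
        echo "pgdata is from PostgreSQL $have, but image is $want" >&2
        exit 1
    fi
fi
echo "major version check ok: $want"
```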
<h2>What about in production?</h2>
<p>Installing postgresql on a VM or bare-metal server is still viable, especially if automated with configuration tools like Ansible or Chef. But there are other options.</p>
<p>If your project is all-in on containers in production, consider checking out some of the Kubernetes operators for postgres.
You can use the exact same container image in production that you test on locally,
albeit with some additional operational concerns around availability
and stateful data. Operator software like
<a href="https://www.crunchydata.com/products/crunchy-postgresql-for-kubernetes/">Crunchy PostgreSQL for Kubernetes</a> and <a href="https://www.kubegres.io">Kubegres</a> can be configured for load balancing, high-availability, backups, monitoring, etc. which can ease the operational burden should your database require such things.</p>
<p>Of course, there is always the cloud-hosted option. I've used postgresql on both GCP Cloud SQL and AWS RDS and, while you give up some control of the environment and are no longer able to run the exact same database locally as you do in prod, the ease of administering these hosted databases might be worth it.</p>
<h2>Conclusion</h2>
<p>Docker containers provide a robust way to run postgres in local development, with very few compromises. A container-based workflow makes it easier to maintain multiple parallel databases, and to move data freely between systems. For my money,
there's no need to <code>apt install</code> postgres again.</p>
<h1>Zonal Stats with PostGIS Rasters, part 2</h1>
<p>2020-11-28, Matthew T. Perry</p>
<p>In my <a href="https://www.perrygeo.com/zonal-stats-with-postgis-rasters.html">last post</a> I compared two approaches for calculating zonal statistics:</p>
<ul>
<li>A Python approach using the rasterstats library</li>
<li>A SQL approach using PostGIS rasters.</li>
</ul>
<p>I came away happy that I could express zonal stats in SQL, but wasn't happy with the performance; an 87x slowdown compared to the equivalent Python code. When in doubt though, it's user error! I received some good suggestions from readers of this blog (Thanks Stefan Jäger and Pierre Racine!) who suggested some performance enhancements from <strong>tiling</strong> and <strong>spatial indexes</strong>.</p>
<p>Additionally, I wasn't happy with the setup of the last experiment; while PostGIS and Rasterio both interact with the underlying GDAL C API,
in my experiment they were using GDAL libraries of different origins. And I'm skeptical that my synthetic vector data was representative of all workloads. A common case for zonal statistics is aggregating a raster by (non-overlapping) administrative boundaries. The nature of the datasets can have a significant impact; best to go with something more realistic.</p>
<p>Time for a reboot...</p>
<h2>Reproducible containers</h2>
<p>I used my <a href="https://github.com/perrygeo/docker-postgres"><code>docker-postgres</code> image</a> to easily recreate an environment where everything is built from source against the same shared libraries.</p>
<p>To run a postgresql server from a docker container (no messy install required), with
local data volumes mounted in <code>./pgdata</code>:</p>
<div class="highlight"><pre><span></span><code>git clone https://github.com/perrygeo/docker-postgres.git
<span class="nb">cd</span> docker-postgres
./run-postgres.sh
</code></pre></div>
<p>This will download a pre-built image <a href="https://hub.docker.com/r/perrygeo/postgres/tags?page=1&ordering=last_updated">from Dockerhub</a> so you can try it out without messing with your system. It then launches
the Postgresql server process, with your local <code>pgdata</code>, <code>mnt_data</code> and <code>log</code> directories mounted as container
volumes.</p>
<p>In order to run Python code from the same container, we can <code>exec</code> into it to get shell access:</p>
<div class="highlight"><pre><span></span><code>docker <span class="nb">exec</span> -ti postgres-server /bin/bash
</code></pre></div>
<p>From here we can run our Python-based command line tools (Rasterio)</p>
<div class="highlight"><pre><span></span><code>$ rio --version
<span class="m">1</span>.1.8
$ rio --gdal-version
<span class="m">3</span>.2.0
</code></pre></div>
<p>Connecting to the server with <code>psql</code>, I can use the built-in version commands to show what we're working with</p>
<p><code>SELECT version();</code> </p>
<div class="highlight"><pre><span></span><code>PostgreSQL 13.0 on x86_64-pc-linux-gnu, compiled by gcc (Debian 8.3.0-6) 8.3.0, 64-bit
</code></pre></div>
<p><code>SELECT postgis_full_version();</code></p>
<div class="highlight"><pre><span></span><code>POSTGIS="3.1.0alpha3 b2221ee"
PGSQL="130"
GEOS="3.9.0dev-CAPI-1.14.0"
PROJ="7.2.0"
GDAL="GDAL 3.2.0, released 2020/10/26"
LIBXML="2.9.4"
LIBJSON="0.12.1"
LIBPROTOBUF="1.3.3"
WAGYU="0.5.0 (Internal)"
RASTER
</code></pre></div>
<p>Since the Rasterio library is running in the container, linked to the exact same GDAL, GEOS and PROJ libraries as PostGIS, we can be assured of a more consistent environment.</p>
<h2>Raster dataset</h2>
<p>For our raster dataset, we'll use the historic climate data provided by the <a href="https://worldclim.org/data/worldclim21.html">WorldClim</a> project, specifically the historic average monthly temperature rasters.</p>
<div class="highlight"><pre><span></span><code>wget http://biogeo.ucdavis.edu/data/worldclim/v2.1/base/wc2.1_2.5m_tavg.zip
unzip wc2.1_2.5m_tavg.zip
</code></pre></div>
<p>The result is a dozen monthly GeoTIFF files representing the historic average temperature for the month - we'll use <code>wc2.1_2.5m_tavg_07.tif</code>, the average July temperature. Each raster is a 4320 x 8640 grid with global coverage in WGS84 coordinates.</p>
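<p>As a quick sanity check, that grid shape follows directly from the resolution: a 2.5 arc-minute grid has 60/2.5 = 24 cells per degree, and a global WGS84 extent spans 360 degrees of longitude and 180 of latitude.</p>

```python
# A 2.5 arc-minute global WGS84 grid: 24 cells per degree.
cells_per_degree = 60 / 2.5
width = int(360 * cells_per_degree)   # columns, spanning longitude
height = int(180 * cells_per_degree)  # rows, spanning latitude
print(height, width)  # prints "4320 8640"
```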
<p>We can use Rasterio to inspect the shape of the raster grid:</p>
<div class="highlight"><pre><span></span><code>rio info wc2.1_2.5m_tavg_07.tif <span class="p">|</span> jq -c .shape
</code></pre></div>
<p>which prints to stdout, confirming the raster grid shape:</p>
<div class="highlight"><pre><span></span><code>[4320,8640]
</code></pre></div>
<h2>Vector data</h2>
<p>For our vector dataset, we're using the <a href="https://www.naturalearthdata.com/downloads/50m-cultural-vectors/50m-admin-0-countries-2/">Natural Earth Admin</a>
dataset with 241 multipolygons, one for each nation.</p>
<div class="highlight"><pre><span></span><code>wget https://www.naturalearthdata.com/http//www.naturalearthdata.com/download/50m/cultural/ne_50m_admin_0_countries.zip
unzip ne_50m_admin_0_countries.zip
</code></pre></div>
<p>Check the number of features using Fiona</p>
<div class="highlight"><pre><span></span><code>$ fio info ne_50m_admin_0_countries.shp <span class="p">|</span> jq .count
<span class="m">241</span>
</code></pre></div>
<p>Overlaying the admin polygons on top of the stylized temperature raster gives a good picture of the question we're trying to answer:</p>
<blockquote>
<p>What is the historical average temperature of each country in the month of July?</p>
</blockquote>
<p><img src="assets/img/worldclim-avg-temp-ne-admin.png" width="800px"></p>
<h2>Zonal Stats using <code>python-rasterstats</code></h2>
<div class="highlight"><pre><span></span><code><span class="kn">from</span> <span class="nn">rasterstats</span> <span class="kn">import</span> <span class="n">zonal_stats</span>
<span class="n">stats</span> <span class="o">=</span> <span class="n">zonal_stats</span><span class="p">(</span>
<span class="n">vector</span><span class="o">=</span><span class="s2">"ne_50m_admin_0_countries.shp"</span><span class="p">,</span>
<span class="n">raster</span><span class="o">=</span><span class="s2">"wc2.1_2.5m_tavg_07.tif"</span><span class="p">,</span>
<span class="n">stats</span><span class="o">=</span><span class="p">[</span><span class="s2">"sum"</span><span class="p">,</span> <span class="s2">"mean"</span><span class="p">,</span> <span class="s2">"count"</span><span class="p">,</span> <span class="s2">"std"</span><span class="p">,</span> <span class="s2">"min"</span><span class="p">,</span> <span class="s2">"max"</span><span class="p">]</span>
<span class="p">)</span>
</code></pre></div>
<p>The time to complete this script was <strong>6.67 seconds</strong> (fastest of 3 runs).</p>
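<p>For reference, a "fastest of 3 runs" figure like this can be collected with <code>timeit.repeat</code>. The workload below is a stand-in placeholder; in the real benchmark it would be the <code>zonal_stats</code> call above:</p>

```python
import timeit

def workload():
    # Placeholder for the zonal_stats(...) call being benchmarked.
    return sum(i * i for i in range(100_000))

# number=1 runs the workload once per trial; repeat=3 gives three trials.
# Conventionally you report the minimum: the least-noisy measurement.
trials = timeit.repeat(workload, number=1, repeat=3)
print(f"fastest of 3 runs: {min(trials):.4f} s")
```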
<h2>Zonal Stats using <code>postgis_raster</code></h2>
<p>To test the performance of the database, we need to get the data in:</p>
<h3>Load the raster data</h3>
<p>In part 1, I imported my raster data using a rather naive <code>raster2pgsql</code> command. This
time, we add a few more options to tune performance.</p>
<div class="highlight"><pre><span></span><code>raster2pgsql -Y -d -t 256x256 -N <span class="s1">'-3.4e+38'</span> -I -C -M -n <span class="s2">"path"</span> <span class="se">\</span>
wc2.1_2.5m_tavg_07.tif tavg_07 <span class="p">|</span> psql
</code></pre></div>
<p>The <code>-t 256x256</code> is a key parameter. By cutting the raster into 256-pixel square tiles,
the resulting raster table contains multiple rows, one per tile. With a spatial index on the tiles, and with the SQL rewritten to take advantage of the index and to aggregate across tiles, zonal stats can be made much more efficient inside PostgreSQL.</p>
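<p>The tile count grows quickly as the tile size shrinks. For the 4320 x 8640 grid used here, we can compute how many rows each <code>-t</code> choice produces (edge tiles may be smaller than the requested size, so the count rounds up):</p>

```python
import math

def tile_count(height, width, tile):
    """Rows produced by raster2pgsql -t <tile>x<tile> for a height x width grid."""
    return math.ceil(height / tile) * math.ceil(width / tile)

for t in (64, 256, 1024):
    print(f"{t}x{t}: {tile_count(4320, 8640, t)} tiles")
```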
<p>The <code>-I</code> indicates that a spatial index of the raster tiles should be built after import. The spatial index, along with a spatial query that can take advantage of it, can quickly select the subset of tiles that overlap your features of interest. </p>
<p>The other parameters to note:</p>
<ul>
<li><code>-Y</code> uses COPY for more efficient transfer.</li>
<li><code>-d</code> deletes the table if it already exists (useful for testing, but be careful in production).</li>
<li><code>-N</code> defines a nodata value directly at the CLI.</li>
<li><code>-n</code> creates a <code>path</code> column to store the filename.</li>
<li><code>-C</code> applies constraints to ensure valid raster alignment, etc.</li>
<li><code>-M</code> runs <code>VACUUM ANALYZE</code> on the table as a final step.</li>
</ul>
<h3>Load the vector data</h3>
<p>Use the standard <code>shp2pgsql</code>, with <code>-I</code> to build an index:</p>
<div class="highlight"><pre><span></span><code>shp2pgsql -g geometry -I -s <span class="m">4326</span> ne_50m_admin_0_countries.shp countries <span class="p">|</span> psql
</code></pre></div>
<h3>Run the query</h3>
<p>Now we have two tables loaded, <code>countries</code> and <code>tavg_07</code>, and can ask our question in SQL:</p>
<div class="highlight"><pre><span></span><code><span class="k">SELECT</span><span class="w"></span>
<span class="w"> </span><span class="p">(</span><span class="n">ST_SummaryStatsAgg</span><span class="p">(</span><span class="n">ST_Clip</span><span class="p">(</span><span class="n">raster</span><span class="p">.</span><span class="n">rast</span><span class="p">,</span><span class="w"> </span><span class="n">countries</span><span class="p">.</span><span class="n">geometry</span><span class="p">,</span><span class="w"> </span><span class="k">true</span><span class="p">),</span><span class="w"> </span><span class="mi">1</span><span class="p">,</span><span class="w"> </span><span class="k">true</span><span class="p">)).</span><span class="o">*</span><span class="p">,</span><span class="w"></span>
<span class="w"> </span><span class="n">countries</span><span class="p">.</span><span class="n">name</span><span class="w"> </span><span class="k">AS</span><span class="w"> </span><span class="n">name</span><span class="p">,</span><span class="w"></span>
<span class="w"> </span><span class="n">countries</span><span class="p">.</span><span class="n">geometry</span><span class="w"> </span><span class="k">AS</span><span class="w"> </span><span class="n">geometry</span><span class="p">,</span><span class="w"></span>
<span class="w"> </span><span class="k">count</span><span class="p">(</span><span class="mi">1</span><span class="p">)</span><span class="w"> </span><span class="k">as</span><span class="w"> </span><span class="n">n_tiles</span><span class="w"></span>
<span class="k">FROM</span><span class="w"></span>
<span class="w"> </span><span class="n">tavg_07</span><span class="w"> </span><span class="k">as</span><span class="w"> </span><span class="n">raster</span><span class="w"></span>
<span class="k">INNER</span><span class="w"> </span><span class="k">join</span><span class="w"> </span><span class="n">countries</span><span class="w"> </span><span class="k">on</span><span class="w"></span>
<span class="w"> </span><span class="n">ST_INTERSECTS</span><span class="p">(</span><span class="n">countries</span><span class="p">.</span><span class="n">geometry</span><span class="p">,</span><span class="w"> </span><span class="n">raster</span><span class="p">.</span><span class="n">rast</span><span class="p">)</span><span class="w"></span>
<span class="k">GROUP</span><span class="w"> </span><span class="k">BY</span><span class="w"></span>
<span class="w"> </span><span class="n">name</span><span class="p">,</span><span class="w"> </span><span class="n">geometry</span><span class="p">;</span><span class="w"></span>
</code></pre></div>
<p>I added the <code>GROUP BY</code> to aggregate across tiles; otherwise we'd get multiple rows per country. And on the SELECT side, PostGIS provides the <code>ST_SummaryStatsAgg</code> function (the aggregate variant of <code>ST_SummaryStats</code>) to combine the statistics across tiles.</p>
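<p>The reason an aggregate variant is needed at all: per-tile summaries can be merged into exact global statistics as long as each tile reports its count, sum, and sum of squares. A small sketch of that idea (this mirrors the concept, not the actual PostGIS internals):</p>

```python
import math

def merge_tile_stats(tiles):
    """Merge per-tile (count, sum, sum_of_squares) into global summary stats."""
    n = sum(t[0] for t in tiles)
    s = sum(t[1] for t in tiles)
    ss = sum(t[2] for t in tiles)
    mean = s / n
    # population standard deviation, via E[x^2] - E[x]^2
    stddev = math.sqrt(ss / n - mean * mean)
    return {"count": n, "sum": s, "mean": mean, "stddev": stddev}

# Three hypothetical tiles clipped to one country:
tile_values = [[1.0, 2.0, 3.0], [4.0, 5.0], [6.0]]
tiles = [(len(v), sum(v), sum(x * x for x in v)) for v in tile_values]
print(merge_tile_stats(tiles))
```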
<p>Here's the resulting map data rendered via DBeaver. The <code>count</code> is the number of raster <em>pixels</em> intersecting the feature, while <code>n_tiles</code> is the number of raster <em>tiles</em>. The <code>mean</code> is probably what we're interested in: the average temperature.</p>
<p><img src="assets/img/dbeaver-zonal-results.png"></p>
<p>Here's the bottom line on performance: <strong>PostGIS can perform this query in 6.1s</strong>, even marginally faster than the Python rasterstats version. It could be that the latest improvements in the geospatial stack account for some of this effect, but tiling clearly matters to performance.</p>
<h2>Effect of tile size</h2>
<p>The chosen value of <code>-t</code> determines how much data fits into each tile. There's an unavoidable inverse relationship between the size of a row and the number of rows/tiles. Not surprisingly, we find a tradeoff between those two constraints.</p>
<table>
<thead>
<tr>
<th>tilesize</th>
<th>query (s)</th>
<th>raster2pgsql <br> import (s)</th>
</tr>
</thead>
<tbody>
<tr>
<td>64x64</td>
<td>5.9</td>
<td>58.7</td>
</tr>
<tr>
<td>256x256</td>
<td>6.6</td>
<td>15.8</td>
</tr>
<tr>
<td>1024x1024</td>
<td>8.5</td>
<td>7.3</td>
</tr>
<tr>
<td>untiled</td>
<td>49.2</td>
<td>5.2</td>
</tr>
</tbody>
</table>
<p>Smaller tiles with a spatial index mean more efficient queries, at the expense of pre-chopping the raster into many tiles. Depending on the nature of your analysis, you'll want to adjust accordingly. The optimal tilesize is likely to depend on hardware, the tiling patterns of the original data, and the usage patterns you expect.</p>
<p>For this dataset, somewhere around 256x256 appears to be an optimal size. It would make a good default, providing the benefits of tiling without as much import overhead as smaller tiles.</p>
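<p>The row-count side of this tradeoff is simple arithmetic. Here's a quick sketch; the 3600x3600 raster dimensions and the <code>tile_count</code> helper are hypothetical, for illustration only, not taken from the benchmark above:</p>

```python
import math


def tile_count(width, height, tilesize):
    """Number of tiles (i.e. database rows) produced by chopping a
    width x height raster into tilesize x tilesize blocks."""
    return math.ceil(width / tilesize) * math.ceil(height / tilesize)


# Hypothetical 3600x3600 raster, roughly a 1x1 degree tile at 1 arcsecond
for ts in (64, 256, 1024):
    print(f"{ts}x{ts}: {tile_count(3600, 3600, ts)} rows")
```

<p>Halving the tile edge roughly quadruples the row count, which is consistent with the import times in the table above.</p>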
<p>Surprisingly, the <strong>untiled version</strong> still performs <em>ok</em> relative to the Python code. The query on an untiled raster is "only" 7.5x slower than the Python code, not as bad as the 80x performance hit I found in part 1. While this factor seems highly dependent on the data at hand, the conclusion doesn't change - tiling matters.</p>
<h2>Conclusion</h2>
<p>Use <code>raster2pgsql -t 256x256 -I</code> to tile your PostGIS rasters. Combined with aggregate functions and spatial indexes, you get similar zonal stats query functionality and performance from PostGIS as you would with equivalent single-threaded Python/GDAL approaches.</p>
<p>There's still much to be explored regarding optimal tiling, parallel aggregates, out-of-band rasters, and the impact of source raster data file layout on performance. More to come in part 3... </p>Zonal Stats with PostGIS Rasters2018-12-31T00:00:00-07:002018-12-31T00:00:00-07:00Matthew T. Perrytag:www.perrygeo.com,2018-12-31:/zonal-stats-with-postgis-rasters.html<p><strong>Zonal statistics</strong> is a technique to summarize the values of a raster dataset
overlapped by a set of vector geometries.
The analysis can answer queries such as
"<em>Average elevation of each nation park</em>" or "<em>Maximum temperature by state</em>".</p>
<p>My goal in this article is to demonstrate a PostGIS implementation of …</p><p><strong>Zonal statistics</strong> is a technique to summarize the values of a raster dataset
overlapped by a set of vector geometries.
The analysis can answer queries such as
"<em>Average elevation of each nation park</em>" or "<em>Maximum temperature by state</em>".</p>
<p>My goal in this article is to demonstrate a PostGIS implementation of zonal stats and
compare the results and runtime performance to a reference Python implementation.</p>
<ul>
<li>Python with the <a href="https://github.com/perrygeo/python-rasterstats"><code>rasterstats</code> library</a> using GeoTIFF and GeoJSON files.</li>
<li>SQL queries using <a href="https://postgis.net/docs/RT_reference.html">PostGIS raster</a> and vector tables.</li>
</ul>
<h2>The Dataset</h2>
<p>For the raster data, let's use the <a href="https://www.eorc.jaxa.jp/ALOS/en/aw3d30/index.htm">ALOS Global Digital Surface Model</a> (from the Japan Aerospace Exploration Agency ©JAXA). I picked a 1°x1° tile with 1 arcsecond resolution (roughly 30 meters) in <code>GeoTIFF</code> format.</p>
<p>Next, generate 100 random circular polygon features covering the extent of the raster.
The following Python script shows how to do so with the Rasterio and Shapely libs.</p>
<div class="highlight"><pre><span></span><code><span class="ch">#!/usr/bin/env python</span>
<span class="kn">import</span> <span class="nn">json</span>
<span class="kn">import</span> <span class="nn">random</span>
<span class="kn">import</span> <span class="nn">sys</span>
<span class="kn">import</span> <span class="nn">rasterio</span>
<span class="kn">from</span> <span class="nn">shapely.geometry</span> <span class="kn">import</span> <span class="n">Point</span>
<span class="k">def</span> <span class="nf">random_features_for_raster</span><span class="p">(</span><span class="n">path</span><span class="p">,</span> <span class="n">steps</span><span class="o">=</span><span class="mi">100</span><span class="p">):</span>
<span class="k">with</span> <span class="n">rasterio</span><span class="o">.</span><span class="n">open</span><span class="p">(</span><span class="n">path</span><span class="p">)</span> <span class="k">as</span> <span class="n">src</span><span class="p">:</span>
<span class="n">x1</span><span class="p">,</span> <span class="n">y1</span><span class="p">,</span> <span class="n">x2</span><span class="p">,</span> <span class="n">y2</span> <span class="o">=</span> <span class="n">src</span><span class="o">.</span><span class="n">bounds</span>
<span class="n">xs</span> <span class="o">=</span> <span class="p">[</span><span class="n">random</span><span class="o">.</span><span class="n">uniform</span><span class="p">(</span><span class="n">x1</span><span class="p">,</span> <span class="n">x2</span><span class="p">)</span> <span class="k">for</span> <span class="n">_</span> <span class="ow">in</span> <span class="nb">range</span><span class="p">(</span><span class="n">steps</span><span class="p">)]</span>
<span class="n">ys</span> <span class="o">=</span> <span class="p">[</span><span class="n">random</span><span class="o">.</span><span class="n">uniform</span><span class="p">(</span><span class="n">y1</span><span class="p">,</span> <span class="n">y2</span><span class="p">)</span> <span class="k">for</span> <span class="n">_</span> <span class="ow">in</span> <span class="nb">range</span><span class="p">(</span><span class="n">steps</span><span class="p">)]</span>
<span class="k">for</span> <span class="n">i</span><span class="p">,</span> <span class="p">(</span><span class="n">x</span><span class="p">,</span> <span class="n">y</span><span class="p">)</span> <span class="ow">in</span> <span class="nb">enumerate</span><span class="p">(</span><span class="nb">zip</span><span class="p">(</span><span class="n">xs</span><span class="p">,</span> <span class="n">ys</span><span class="p">)):</span>
<span class="n">buffdist</span> <span class="o">=</span> <span class="n">random</span><span class="o">.</span><span class="n">uniform</span><span class="p">(</span><span class="mf">0.002</span><span class="p">,</span> <span class="mf">0.04</span><span class="p">)</span>
<span class="n">shape</span> <span class="o">=</span> <span class="n">Point</span><span class="p">(</span><span class="n">x</span><span class="p">,</span> <span class="n">y</span><span class="p">)</span><span class="o">.</span><span class="n">buffer</span><span class="p">(</span><span class="n">buffdist</span><span class="p">)</span>
<span class="k">yield</span> <span class="p">{</span>
<span class="s2">"type"</span><span class="p">:</span> <span class="s2">"Feature"</span><span class="p">,</span>
<span class="s2">"properties"</span><span class="p">:</span> <span class="p">{</span><span class="s2">"name"</span><span class="p">:</span> <span class="nb">str</span><span class="p">(</span><span class="n">i</span><span class="p">)},</span>
<span class="s2">"geometry"</span><span class="p">:</span> <span class="n">shape</span><span class="o">.</span><span class="n">__geo_interface__</span><span class="p">,</span>
<span class="p">}</span>
<span class="k">if</span> <span class="vm">__name__</span> <span class="o">==</span> <span class="s2">"__main__"</span><span class="p">:</span>
<span class="k">for</span> <span class="n">feat</span> <span class="ow">in</span> <span class="n">random_features_for_raster</span><span class="p">(</span><span class="n">sys</span><span class="o">.</span><span class="n">argv</span><span class="p">[</span><span class="mi">1</span><span class="p">]):</span>
<span class="nb">print</span><span class="p">(</span><span class="n">json</span><span class="o">.</span><span class="n">dumps</span><span class="p">(</span><span class="n">feat</span><span class="p">))</span>
</code></pre></div>
<p>Piping the features through <code>fio collect</code> gives us a valid GeoJSON collection with 100 polygon features.</p>
<div class="highlight"><pre><span></span><code><span class="n">python</span><span class="w"> </span><span class="n">make</span><span class="o">-</span><span class="n">random</span><span class="o">-</span><span class="n">features</span><span class="p">.</span><span class="n">py</span><span class="w"> </span><span class="n">N035W106_AVE_DSM</span><span class="p">.</span><span class="n">tif</span><span class="w"> </span><span class="o">|</span><span class="w"> </span><span class="n">fio</span><span class="w"> </span><span class="n">collect</span><span class="w"> </span><span class="o">></span><span class="w"> </span><span class="n">regions</span><span class="p">.</span><span class="n">geojson</span><span class="w"></span>
</code></pre></div>
<p>Visualizing the data in QGIS shows what we're working with.
The goal is to find basic summary statistics for elevation in each of the regions:</p>
<p><img width=500 src="assets/img/20181231_data.jpg"></p>
<h2>Python with <code>rasterstats</code></h2>
<p>Using the <code>zonal_stats</code> Python function allows you to express the processing at a high level.</p>
<div class="highlight"><pre><span></span><code><span class="ch">#!/usr/bin/env</span>
<span class="kn">import</span> <span class="nn">json</span>
<span class="kn">from</span> <span class="nn">rasterstats</span> <span class="kn">import</span> <span class="n">zonal_stats</span>
<span class="n">features</span> <span class="o">=</span> <span class="n">zonal_stats</span><span class="p">(</span>
<span class="s2">"regions.geojson"</span>
<span class="s2">"N035W106_AVE_DSM.tif"</span>
<span class="n">stats</span><span class="o">=</span><span class="p">[</span><span class="s2">"count"</span><span class="p">,</span> <span class="s2">"sum"</span><span class="p">,</span> <span class="s2">"mean"</span><span class="p">,</span> <span class="s2">"std"</span><span class="p">,</span> <span class="s2">"min"</span><span class="p">,</span> <span class="s2">"max"</span><span class="p">],</span>
<span class="n">prefix</span><span class="o">=</span><span class="s2">"dem_"</span><span class="p">,</span>
<span class="n">geojson_out</span><span class="o">=</span><span class="kc">True</span><span class="p">)</span>
<span class="k">with</span> <span class="nb">open</span><span class="p">(</span><span class="s2">"regions_with_elevation.geojson"</span><span class="p">,</span> <span class="s2">"w"</span><span class="p">)</span> <span class="k">as</span> <span class="n">dst</span><span class="p">:</span>
<span class="n">collection</span> <span class="o">=</span> <span class="p">{</span>
<span class="s2">"type"</span><span class="p">:</span> <span class="s2">"FeatureCollection"</span><span class="p">,</span>
<span class="s2">"features"</span><span class="p">:</span> <span class="nb">list</span><span class="p">(</span><span class="n">features</span><span class="p">)}</span>
<span class="n">dst</span><span class="o">.</span><span class="n">write</span><span class="p">(</span><span class="n">json</span><span class="o">.</span><span class="n">dumps</span><span class="p">(</span><span class="n">collection</span><span class="p">))</span>
</code></pre></div>
<p>Running this script takes about <strong>2.4 seconds</strong> and
creates a new GeoJSON file, <code>regions_with_elevation.geojson</code>, with the following attributes, as viewed in QGIS:</p>
<p><img width=500 src="assets/img/20181231_attrs.jpg"></p>
<p>And the resulting features can be mapped, in this case using the <code>dem_mean</code> field to show the average elevation of each
region:</p>
<p><img width=500 src="assets/img/20181231_elev.jpg"></p>
<h2>PostGIS</h2>
<p>Instead of working with GeoTIFF rasters and GeoJSON files, we can perform the same analysis on PostGIS tables using SQL.</p>
<h3>Loading the data</h3>
<p>To create a raster table named <code>dem</code> from the GeoTIFF:</p>
<div class="highlight"><pre><span></span><code>raster2pgsql N035W106_AVE_DSM.tif dem | psql <connection info>
</code></pre></div>
<p>For some rasters, it might be necessary to explicitly set the nodata value:</p>
<div class="highlight"><pre><span></span><code><span class="k">UPDATE</span><span class="w"> </span><span class="n">dem</span><span class="w"> </span><span class="k">SET</span><span class="w"> </span><span class="n">rast</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">ST_SetBandNoDataValue</span><span class="p">(</span><span class="n">rast</span><span class="p">,</span><span class="w"> </span><span class="o">-</span><span class="mi">32768</span><span class="p">);</span><span class="w"></span>
</code></pre></div>
<p>To create a vector table named <code>regions</code> from the GeoJSON file (see the <a href="https://www.gdal.org/drv_pg.html"><code>ogr2ogr</code> docs</a> for details on the connection info):</p>
<div class="highlight"><pre><span></span><code><span class="n">ogr2ogr</span><span class="w"> </span><span class="o">-</span><span class="n">f</span><span class="w"> </span><span class="n">PostgreSQL</span><span class="w"> </span><span class="nl">PG:</span><span class="s">"<connection info>"</span><span class="w"> </span><span class="n">regions</span><span class="p">.</span><span class="n">geojson</span><span class="w"></span>
</code></pre></div>
<h3>Zonal Statistics in SQL</h3>
<p>Now we can express our zonal stats analysis as a SQL statement.</p>
<div class="highlight"><pre><span></span><code><span class="k">SELECT</span><span class="w"></span>
<span class="w"> </span><span class="c1">-- provides: count | sum | mean | stddev | min | max</span>
<span class="w"> </span><span class="p">(</span><span class="n">ST_SummaryStats</span><span class="p">(</span><span class="n">ST_Clip</span><span class="p">(</span><span class="n">dem</span><span class="p">.</span><span class="n">rast</span><span class="p">,</span><span class="w"> </span><span class="n">regions</span><span class="p">.</span><span class="n">wkb_geometry</span><span class="p">,</span><span class="w"> </span><span class="k">TRUE</span><span class="p">))).</span><span class="o">*</span><span class="p">,</span><span class="w"></span>
<span class="w"> </span><span class="n">regions</span><span class="p">.</span><span class="n">name</span><span class="w"> </span><span class="k">AS</span><span class="w"> </span><span class="n">name</span><span class="p">,</span><span class="w"></span>
<span class="w"> </span><span class="n">regions</span><span class="p">.</span><span class="n">wkb_geometry</span><span class="w"> </span><span class="k">AS</span><span class="w"> </span><span class="n">geometry</span><span class="w"></span>
<span class="k">INTO</span><span class="w"></span>
<span class="w"> </span><span class="n">regions_with_elevation</span><span class="w"></span>
<span class="k">FROM</span><span class="w"></span>
<span class="w"> </span><span class="n">dem</span><span class="p">,</span><span class="w"> </span><span class="n">regions</span><span class="p">;</span><span class="w"></span>
</code></pre></div>
<p>Let's break that down a bit:</p>
<ul>
<li><code>FROM dem, regions</code> does a full cross join of the 100 regions against the single raster row.</li>
<li>The <code>ST_Clip</code> function clips each raster to the precise geometry of each feature.</li>
<li>The <code>ST_SummaryStats</code> function summarizes each clipped raster, producing count, sum, mean, standard deviation, min and max columns.</li>
<li><code>INTO regions_with_elevation</code> creates a new table with the results.</li>
</ul>
<p>Conceptually, this approach is similar to the internal process used by <code>rasterstats</code>.</p>
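<p>As a rough illustration of that internal process, here is a simplified numpy sketch of the clip-and-summarize step. It assumes the feature geometry has already been rasterized into a boolean mask (the part both tools delegate to GDAL's rasterization), and <code>summary_stats</code> is a hypothetical name, not a real API:</p>

```python
import numpy as np


def summary_stats(raster, mask, nodata=None):
    """Summarize the raster cells covered by a boolean zone mask,
    mirroring what ST_Clip + ST_SummaryStats produce."""
    values = raster[mask]  # "clip": keep only pixels inside the zone
    if nodata is not None:
        values = values[values != nodata]  # drop nodata pixels
    return {
        "count": int(values.size),
        "sum": float(values.sum()),
        "mean": float(values.mean()),
        "stddev": float(values.std()),
        "min": float(values.min()),
        "max": float(values.max()),
    }


# Toy 4x4 "DEM"; the zone mask covers its left half
dem = np.arange(16, dtype=float).reshape(4, 4)
zone = np.zeros((4, 4), dtype=bool)
zone[:, :2] = True
stats = summary_stats(dem, zone)  # count=8, mean=6.5
```

<p>Both implementations boil down to this: rasterize the geometry, mask the pixel values, and reduce.</p>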
<div class="highlight"><pre><span></span><code><span class="n">database</span><span class="o">=</span><span class="p">#</span><span class="w"> </span><span class="n">SELECT</span><span class="w"> </span><span class="n">name</span><span class="p">,</span><span class="w"> </span><span class="n">min</span><span class="p">,</span><span class="w"> </span><span class="n">max</span><span class="p">,</span><span class="w"> </span><span class="n">mean</span><span class="p">,</span><span class="w"> </span><span class="n">count</span><span class="w"> </span><span class="n">from</span><span class="w"> </span><span class="n">regions_with_elevation</span><span class="p">;</span><span class="w"></span>
<span class="n">name</span><span class="w"> </span><span class="o">|</span><span class="w"> </span><span class="n">min</span><span class="w"> </span><span class="o">|</span><span class="w"> </span><span class="n">max</span><span class="w"> </span><span class="o">|</span><span class="w"> </span><span class="n">mean</span><span class="w"> </span><span class="o">|</span><span class="w"> </span><span class="n">count</span><span class="w"></span>
<span class="o">-----+------+------+------------------+-------</span><span class="w"></span>
<span class="p">...</span><span class="w"></span>
<span class="mh">32</span><span class="w"> </span><span class="o">|</span><span class="w"> </span><span class="mh">2104</span><span class="w"> </span><span class="o">|</span><span class="w"> </span><span class="mh">2196</span><span class="w"> </span><span class="o">|</span><span class="w"> </span><span class="mf">2141.13257847212</span><span class="w"> </span><span class="o">|</span><span class="w"> </span><span class="mh">6977</span><span class="w"></span>
<span class="mh">33</span><span class="w"> </span><span class="o">|</span><span class="w"> </span><span class="mh">2296</span><span class="w"> </span><span class="o">|</span><span class="w"> </span><span class="mh">2667</span><span class="w"> </span><span class="o">|</span><span class="w"> </span><span class="mf">2429.01510429154</span><span class="w"> </span><span class="o">|</span><span class="w"> </span><span class="mh">4171</span><span class="w"></span>
<span class="mh">34</span><span class="w"> </span><span class="o">|</span><span class="w"> </span><span class="mh">1784</span><span class="w"> </span><span class="o">|</span><span class="w"> </span><span class="mh">1917</span><span class="w"> </span><span class="o">|</span><span class="w"> </span><span class="mf">1852.97140948564</span><span class="w"> </span><span class="o">|</span><span class="w"> </span><span class="mh">7485</span><span class="w"></span>
<span class="mh">35</span><span class="w"> </span><span class="o">|</span><span class="w"> </span><span class="mh">2033</span><span class="w"> </span><span class="o">|</span><span class="w"> </span><span class="mh">2144</span><span class="w"> </span><span class="o">|</span><span class="w"> </span><span class="mf">2083.38765260393</span><span class="w"> </span><span class="o">|</span><span class="w"> </span><span class="mh">51768</span><span class="w"></span>
<span class="mh">36</span><span class="w"> </span><span class="o">|</span><span class="w"> </span><span class="mh">1796</span><span class="w"> </span><span class="o">|</span><span class="w"> </span><span class="mh">1843</span><span class="w"> </span><span class="o">|</span><span class="w"> </span><span class="mf">1828.69792802617</span><span class="w"> </span><span class="o">|</span><span class="w"> </span><span class="mh">917</span><span class="w"></span>
<span class="mh">37</span><span class="w"> </span><span class="o">|</span><span class="w"> </span><span class="mh">2072</span><span class="w"> </span><span class="o">|</span><span class="w"> </span><span class="mh">2206</span><span class="w"> </span><span class="o">|</span><span class="w"> </span><span class="mf">2122.1204719764</span><span class="w"> </span><span class="o">|</span><span class="w"> </span><span class="mh">8475</span><span class="w"></span>
<span class="mh">38</span><span class="w"> </span><span class="o">|</span><span class="w"> </span><span class="mh">2117</span><span class="w"> </span><span class="o">|</span><span class="w"> </span><span class="mh">2214</span><span class="w"> </span><span class="o">|</span><span class="w"> </span><span class="mf">2152.05270513076</span><span class="w"> </span><span class="o">|</span><span class="w"> </span><span class="mh">5009</span><span class="w"></span>
<span class="mh">39</span><span class="w"> </span><span class="o">|</span><span class="w"> </span><span class="mh">1915</span><span class="w"> </span><span class="o">|</span><span class="w"> </span><span class="mh">2071</span><span class="w"> </span><span class="o">|</span><span class="w"> </span><span class="mf">2040.61622890496</span><span class="w"> </span><span class="o">|</span><span class="w"> </span><span class="mh">15762</span><span class="w"></span>
<span class="p">...</span><span class="w"></span>
</code></pre></div>
<p>Compared to the attribute table screenshot above, the results are identical for all columns.
That isn't too surprising given that both approaches use GDAL's rasterization API under the hood.</p>
<p>Performance is a different story. The zonal stats query took <strong>81.90 seconds</strong>, roughly 34x slower than the Python code for
the equivalent result.</p>
<h2>Thoughts</h2>
<p>In terms of the expressiveness of the two approaches, I can see the appeal of both Python code and SQL queries.
Of course, this comes down to personal preference, depending on your background and familiarity with each environment.
The Python API hides the implementation details and is more flexible, with more statistics options and rasterization strategies.
But the SQL approach covers the common use case in a declarative query; it exposes the implementation
details yet remains very readable.</p>
<p>The performance impact is significant enough to be a deal breaker for PostGIS.
I haven't delved into the issues too closely;
there might be some obvious ways to optimize this query, but I haven't found any as of this writing.
PostGIS experts, please get in touch if you find any speedups that I could consider here!</p>
<p>Performance combined with the additional overhead of managing postgres instances and data imports
tells me that running zonal stats in PostGIS will not be a great option unless you're already running PostgreSQL.
If your application is already committed to postgres and you want to integrate zonal stats tightly into
your data management strategy, it could be a viable approach.
For example, you could create a <code>TRIGGER</code> or an asynchronous worker via <code>LISTEN/NOTIFY</code> to ensure zonal statistics are run each time a new feature is inserted into your vector table.</p>
<p>For most other zonal stats use cases, using <code>rasterstats</code> against local files or in-memory Python data will be faster with less data management overhead.</p>Processing vector features in Python2016-04-16T00:00:00-06:002016-04-16T00:00:00-06:00Matthew T. Perrytag:www.perrygeo.com,2016-04-16:/processing-vector-features-in-python.html<p>Working with geospatial vector data typically involves manipulating collections of <strong>features</strong> - points, lines and polygons with attributes. You might change their geometries, alter their properties or both. This is nothing new. Tools like this have been around since the first days of GIS.
Notice the essential role of many of …</p><p>Working with geospatial vector data typically involves manipulating collections of <strong>features</strong> - points, lines and polygons with attributes. You might change their geometries, alter their properties or both. This is nothing new. Tools like this have been around since the first days of GIS.
Notice the essential role of many of these operations: taking vector data as input, doing some work and producing vector data as output.
While conceptually very simple, this logic often gets siloed, tied too closely to our specific implementations, formats, and systems.</p>
<p>The following is my take on the best practices for designing and building
your own vector processing modules using modern Python. The goals here are not primarily
performance but <strong>interoperability</strong> and <strong>composability</strong>.</p>
<h2>GeoJSON guides the way</h2>
<p>Using <strong>GeoJSON-like Feature mappings</strong> as a representation of simple features buys us a ton of interoperability.
It's not only <em>a</em> standard, but the only one that can be translated to fully represent a feature as a Python data structure.
Other standards specify file formats or data structures for geometries only.
Most Python modules that deal with geospatial data can speak GeoJSON-like data.
And if they don't, the data structure is easy to construct manually.
Let's take a look at our humble Feature:</p>
<div class="highlight"><pre><span></span><code><span class="p">{</span>
<span class="s1">'type'</span><span class="p">:</span> <span class="s1">'Feature'</span><span class="p">,</span>
<span class="s1">'properties'</span><span class="p">:</span> <span class="p">{</span>
<span class="s1">'name'</span><span class="p">:</span> <span class="s1">'Example'</span><span class="p">},</span>
<span class="s1">'geometry'</span><span class="p">:</span> <span class="p">{</span>
<span class="s1">'type'</span><span class="p">:</span> <span class="s1">'Point'</span><span class="p">,</span>
<span class="s1">'coordinates'</span><span class="p">:</span> <span class="p">[</span><span class="o">-</span><span class="mf">120.0</span><span class="p">,</span> <span class="mf">42.0</span><span class="p">]}}</span>
</code></pre></div>
<p>The <code>geometry</code>, the geographic component, is just iterables of lon, lat locations - you can represent points, lines, polygons or multis. The <code>properties</code> dictionary holds non-geographic information about the features, analogous to the "attribute table" in many GIS.</p>
<p>A quick note on the term "GeoJSON-like Feature mapping"... GeoJSON is a text serialization format. When we take GeoJSON and translate it into a Python data structure, it is no longer GeoJSON but a Python dictionary (mapping) that follows the semantics of a GeoJSON Feature. From here on out, I'll just refer to this GeoJSON-like Python data structure as a <strong>feature</strong>. If you're writing functions that work with vector data, they should accept and return features.</p>
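<p>The boundary between the two is just <code>json.loads</code> and <code>json.dumps</code>:</p>

```python
import json

# GeoJSON: text on the wire or on disk
text = '{"type": "Feature", "properties": {"name": "Example"}, "geometry": {"type": "Point", "coordinates": [-120.0, 42.0]}}'

# A GeoJSON-like feature mapping: a plain Python dict
feature = json.loads(text)
assert feature["geometry"]["coordinates"] == [-120.0, 42.0]

# ...and back to GeoJSON text for serialization
round_tripped = json.dumps(feature)
```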
<p><strong>That's the convention</strong>, the golden rule of writing Python vector processing functions</p>
<blockquote>
<p>Functions should take features as inputs and yield or return features.</p>
</blockquote>
<p>In other words, <strong>features in, features out</strong>. That's it. It's really that simple, and the simplicity buys you a great deal of potential.</p>
<h2>The IO Sandwich</h2>
<p>Functions which fit this convention will not read or write to anything outside of locally-scoped variables.
Does your function need to read from a file or write to the network in addition to processing features?
Why should one function be responsible for doing multiple tasks? We're striving for functions that do <em>one</em> thing - process vector features.</p>
<p>All the data your function needs should be passed in as arguments. Note that this is very different than passing in a file <em>path</em> and doing the reading and writing of data within your function:</p>
<div class="highlight"><pre><span></span><code><span class="c1"># BAD</span>
<span class="n">process_features</span><span class="p">(</span><span class="s2">"/path/to/shapefile.shp"</span><span class="p">,</span> <span class="n">output</span><span class="o">=</span><span class="s2">"out.shp"</span><span class="p">)</span>
<span class="c1"># GOOD</span>
<span class="n">features</span> <span class="o">=</span> <span class="n">read_features</span><span class="p">(</span><span class="s2">"/path/to/shapefile.shp"</span><span class="p">)</span>
<span class="n">new_features</span> <span class="o">=</span> <span class="n">process_features</span><span class="p">(</span><span class="n">features</span><span class="p">)</span>
<span class="n">write_features</span><span class="p">(</span><span class="n">new_features</span><span class="p">,</span> <span class="n">output</span><span class="o">=</span><span class="s2">"out.shp"</span><span class="p">)</span>
</code></pre></div>
<p>You might be concerned about memory. But don't worry, well-behaved Python libraries can use <a href="https://wiki.python.org/moin/Generators">generators</a> to load the data as needed.</p>
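<p>For example, a reader for newline-delimited GeoJSON can be a lazy generator that holds only one feature in memory at a time. The <code>read_features</code> name here is a hypothetical sketch, not a library function:</p>

```python
import json


def read_features(path):
    """Lazily yield GeoJSON-like feature mappings from a
    newline-delimited GeoJSON file."""
    with open(path) as src:
        for line in src:
            line = line.strip()
            if line:
                yield json.loads(line)
```

<p>Because it yields instead of returning a list, downstream processing functions can consume features one at a time, no matter how large the file is.</p>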
<p>Another way to picture it is that your application should build an <strong>IO sandwich</strong> with all of the reading and writing happening outside of your processing function.</p>
<div class="highlight"><pre><span></span><code><span class="n">Read</span><span class="w"> </span><span class="n">Shapefile</span><span class="w"> </span><span class="n">into</span><span class="w"> </span><span class="n">Features</span><span class="w"> </span><span class="o">--></span><span class="w"> </span><span class="n">process_features</span><span class="w"> </span><span class="o">--></span><span class="w"> </span><span class="n">Write</span><span class="w"> </span><span class="n">Features</span><span class="w"> </span><span class="n">to</span><span class="w"> </span><span class="n">Shapefile</span><span class="w"></span>
</code></pre></div>
<p>That way anyone can use the same function with different inputs and outputs</p>
<div class="highlight"><pre><span></span><code><span class="n">Read</span><span class="w"> </span><span class="n">Web</span><span class="w"> </span><span class="kr">Service</span><span class="w"> </span><span class="n">into</span><span class="w"> </span><span class="n">Features</span><span class="w"> </span><span class="o">--></span><span class="w"> </span><span class="n">process_features</span><span class="w"> </span><span class="o">--></span><span class="w"> </span><span class="n">Write</span><span class="w"> </span><span class="n">Features</span><span class="w"> </span><span class="n">to</span><span class="w"> </span><span class="n">PostGIS</span><span class="w"></span>
</code></pre></div>
<p>Processing functions should not care where their input features come from or where the output features are going.
As long as <code>process_features</code> takes and returns features, any number of combinations are possible.</p>
<p>This not only decouples IO but allows us to <strong>compose</strong> processes together</p>
<div class="highlight"><pre><span></span><code><span class="n">Read</span><span class="w"> </span><span class="n">Features</span><span class="w"> </span><span class="o">--></span><span class="w"> </span><span class="n">process1</span><span class="w"> </span><span class="o">--></span><span class="w"> </span><span class="n">process2</span><span class="w"> </span><span class="o">--></span><span class="w"> </span><span class="n">process3</span><span class="w"> </span><span class="o">--></span><span class="w"> </span><span class="n">Write</span><span class="w"> </span><span class="n">Features</span><span class="w"></span>
</code></pre></div>
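<p>In code, that composition is just nested generators. A minimal sketch, where <code>tag</code> is a hypothetical stand-in for a real processing stage:</p>

```python
def tag(features, key):
    """A trivial processing stage: features in, features out.
    Returns new dicts rather than mutating the inputs."""
    for feature in features:
        properties = dict(feature["properties"], **{key: True})
        yield dict(feature, properties=properties)


features = [{"type": "Feature", "properties": {}, "geometry": None}]
result = list(tag(tag(features, "stage1"), "stage2"))
# result[0]["properties"] == {"stage1": True, "stage2": True}
```

<p>Each stage is oblivious to the others; you can insert, remove, or reorder stages without touching any IO code.</p>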
<h2>Other guidelines</h2>
<p>When possible, you should strive for <a href="https://en.wikipedia.org/wiki/Pure_function">pure functions</a>; avoid mutating data and return a clean copy.</p>
<p>Unless you have a specific reason, leave the original feature intact except for the thing your function is expected to manipulate. For instance, if your function just alters the geometry, don't drop or change existing properties.</p>
<p>There are some cases where it makes sense to collect your features into a collection and return the entire thing at once. This will generally occur if the features are not independent. In many cases though, your features will largely be independent and can be processed one-by-one. For these situations, it makes sense to use a generator (i.e. <code>yield feature</code> instead of <code>return features</code>).</p>
<p>Finally, you should aim to make your features <em>serializable</em>. You should be able to <code>json.dump()</code> the output features. The <code>properties</code> member should not contain nested dicts, which can confuse GIS formats that require a flat structure. And if possible, avoid extending the JSON with extra elements outside of <code>properties</code>.</p>
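<p>These conventions are easy to enforce with a small validation helper run over your output features. A sketch using only the standard library (the specific checks and the function name are my own, not a formal spec):</p>
<div class="highlight"><pre><span></span><code>import json


def check_feature(feature):
    """Raise if a feature breaks the serializability conventions."""
    # Must round-trip through JSON
    json.dumps(feature)
    # properties must be flat: no nested dicts or lists
    for key, value in feature.get('properties', {}).items():
        if isinstance(value, (dict, list)):
            raise ValueError("property %r is nested, not flat" % key)
    # avoid extra members outside the standard GeoJSON ones
    allowed = {'type', 'geometry', 'properties', 'id', 'bbox'}
    extra = set(feature) - allowed
    if extra:
        raise ValueError("non-standard members: %s" % sorted(extra))


check_feature({'type': 'Feature', 'properties': {'name': 'a'}, 'geometry': None})
</code></pre></div>
<p>Running something like this in a test suite catches a nested dict or stray top-level member long before a GIS format rejects it.</p>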
<h2>An Example</h2>
<p>In this simple example, we'll write a single vector processing function that buffers a geometry by a specified distance.
Taking an input of points, for example:</p>
<p><img src="/assets/img/points.png" ></p>
<p>and buffering them by 10 units.</p>
<p><img src="/assets/img/points_buffered.png"></p>
<p>Here is the core processing function, which follows the <strong>features in, features out</strong> convention:</p>
<div class="highlight"><pre><span></span><code>from shapely.geometry import shape


def buffer(features, buffer=1.0):
    """Buffer a feature by specified units"""
    for feature in features:
        geom = shape(feature['geometry'])  # Convert to shapely geometry to operate on it
        geom = geom.buffer(buffer)  # Buffer
        new_feature = feature.copy()
        new_feature['geometry'] = geom.__geo_interface__
        yield new_feature
</code></pre></div>
<p>Then we could use it in our IO sandwich by reading features from a shapefile and outputting the features as GeoJSON on stdout. Here's what our <strong>Python interface</strong> looks like:</p>
<div class="highlight"><pre><span></span><code>import fiona  # for input
import json  # for output

from process import buffer

with fiona.open("data/points.shp") as src:
    for feature in buffer(src, 10.0):
        print(json.dumps(feature))
</code></pre></div>
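<p>If the consumer wants a single FeatureCollection instead of line-delimited features, the generator output just gets collected at the end. A sketch with an in-memory list standing in for the fiona source (the <code>collect</code> helper is a hypothetical name of my own):</p>
<div class="highlight"><pre><span></span><code>import json


def collect(features):
    """Wrap an iterable of features in a GeoJSON FeatureCollection."""
    return {'type': 'FeatureCollection', 'features': list(features)}


# Stand-in for features read from fiona and run through a process
features = [{'type': 'Feature',
             'properties': {'name': 'a'},
             'geometry': {'type': 'Point', 'coordinates': [0.0, 0.0]}}]
fc = collect(features)
print(json.dumps(fc))
</code></pre></div>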
<p>So the Python interface is looking good. What if we wanted to use it in a <strong>command line interface</strong>? Luckily, click and <a href="https://github.com/mapbox/cligj">cligj</a> have the input covered. The <code>@cligj.features_in_arg</code> decorator reads an iterable of features from a file, a FeatureCollection, or a stream of Features.</p>
<div class="highlight"><pre><span></span><code>import click
import cligj
import json

from process import buffer


@click.command()
@click.argument("distance", type=float)
@cligj.features_in_arg
def buffer_cmd(features, distance):
    for feature in buffer(features, distance):
        click.echo(json.dumps(feature))


if __name__ == "__main__":
    buffer_cmd()
</code></pre></div>
<p>Which we can then use between <code>fio cat</code> and <code>fio collect</code> to process Features in a memory-efficient stream.</p>
<div class="highlight"><pre><span></span><code>$ fio cat data/points.shp <span class="p">|</span> python buffer_cmd.py <span class="m">10</span> <span class="p">|</span> fio collect > points_buffer.geojson
</code></pre></div>
<p>What about an <strong>HTTP interface</strong>? Flask provides us with a lightweight framework to turn our function into a web service:</p>
<div class="highlight"><pre><span></span><code>import json

from flask import Flask, request, Response

from process import buffer

app = Flask(__name__)


@app.route('/buffer/&lt;distance&gt;', methods=['POST'])
def index(distance):
    collection = request.get_json(force=True)
    distance = float(distance)
    new_features = list(
        buffer(collection['features'], distance))
    collection['features'] = new_features
    return Response(
        response=json.dumps(collection),
        status=200, mimetype="application/json")


if __name__ == '__main__':
    app.run(debug=True)
</code></pre></div>
<p>Which gives us a <code>buffer</code> web service to which you can post GeoJSON FeatureCollections and get back a buffered collection:</p>
<div class="highlight"><pre><span></span><code>$ fio dump data/points.shp <span class="p">|</span> <span class="se">\</span>
curl -X POST -d @- http://localhost:5000/buffer/10.0 > points_buffered.geojson
</code></pre></div>
<h2>Conclusion</h2>
<p>Writing your vector processing code to follow these simple conventions
enables great flexibility. You can use your code in a Python application,
a command line interface, an HTTP web service - all based on the same core processing functions.
Assuming you can write some glue code to express input and output as GeoJSON features,
this will work with <em>any</em> vector data source and is not constrained to a single context.
You can use this with any data, anywhere that supports Python. That's a pretty powerful concept,
all made possible by the simple convention of <strong>features in, features out</strong>.</p>Running Python with compiled code on AWS Lambda2015-10-10T00:00:00-06:002015-10-10T00:00:00-06:00Matthew T. Perrytag:www.perrygeo.com,2015-10-10:/running-python-with-compiled-code-on-aws-lambda.html<p>With the recent announcement that AWS Lambda <a href="https://aws.amazon.com/blogs/aws/aws-lambda-update-python-vpc-increased-function-duration-scheduling-and-more/">now supports Python</a>, I decided to take a look at using it for geospatial data processing.</p>
<p>Previously, I had built <a href="https://github.com/Ecotrust/growth-yield-batch">queue-based systems with Celery</a> that allow you to run discrete processing tasks in parallel on AWS infrastructure. Just start up as many workers …</p><p>With the recent announcement that AWS Lambda <a href="https://aws.amazon.com/blogs/aws/aws-lambda-update-python-vpc-increased-function-duration-scheduling-and-more/">now supports Python</a>, I decided to take a look at using it for geospatial data processing.</p>
<p>Previously, I had built <a href="https://github.com/Ecotrust/growth-yield-batch">queue-based systems with Celery</a> that allow you to run discrete processing tasks in parallel on AWS infrastructure. Just start up as many workers on EC2 instances as you need, set up a broker and a results store, add jobs to the queue and collect the results. The problem with this system is that you have to manage all of the infrastructure and services yourself.</p>
<p>Ideally you wouldn't need to worry about infrastructure at all. That is the promise of AWS Lambda. Lambda can respond to events, fire up a worker and run the task without you needing to provision a server. This is especially nice for sporadic workloads in response to events, like user-uploaded data, where you need to scale up or down regularly.</p>
<p>The reality of AWS Lambda is that you <em>do</em> need to worry about infrastructure in a different way. The constraints of the runtime environment mean that you need to get creative if you're doing anything beyond the basics. <strong>If your task relies on compiled code</strong>, either Python C extensions or shared libraries, you have to jump through some hoops. And for any geo data processing, you are going to use a good amount of compiled code to call into C libs (see numpy, rasterio, GDAL, geopandas, Fiona, and so on).</p>
<p>This article describes my approach to solving the problem of running Python with calls to native code on AWS Lambda.</p>
<h2>Outline</h2>
<p>The short version goes like this:</p>
<ol>
<li>Start an <strong>EC2 instance</strong> using the official Amazon Linux AMI (based on Red Hat Enterprise Linux)</li>
<li>On the EC2 instance, build any <strong>shared libraries</strong> from source.</li>
<li>Create a <strong>virtualenv</strong> with all your Python dependencies.</li>
<li>Write a python <strong>handler</strong> function to respond to events and interact with other parts of AWS (e.g. fetch data from S3)</li>
<li>Write a python <strong>worker</strong>, as a command line interface, to process the data</li>
<li><strong>Bundle</strong> the virtualenv, your code and the binary libs into a zip file</li>
<li><strong>Publish</strong> the zip file to AWS Lambda</li>
</ol>
<p>The deployment process is a bit clunky but the benefit is that, once it works, you don't have any servers to manage! A fair tradeoff IMO.</p>
<p>The process will take a raster dataset uploaded to the input s3 bucket</p>
<p><img alt="dem" src="/assets/img/grenada_srtm_raster.png"></p>
<p>and automatically extract the shape of the valid data region, placing the resulting GeoJSON in the output s3 bucket.</p>
<p><img alt="shape" src="/assets/img/grenada_srtm_shape.png"></p>
<h2>Start EC2</h2>
<p>Under the hood, your Lambda functions are running on EC2 with Amazon Linux. You don't have to think about that at runtime but, if you're calling native compiled code, it needs to be compiled on a similar OS. Theoretically you could do this with your own version of RHEL or CentOS but to be safe it's easier to use the official Amazon Linux since we know that's the exact environment our code will be run in.</p>
<p>I'm not going to go over the details of <a href="http://docs.aws.amazon.com/AWSEC2/latest/UserGuide/EC2_GetStarted.html">setting up EC2</a>, so I'll assume we already have our account set up. The AMI ids are listed <a href="https://aws.amazon.com/amazon-linux-ami/">here</a>; pick the appropriate one for your region:</p>
<div class="highlight"><pre><span></span><code>aws ec2 run-instances --image-id ami-9ff7e8af \
--count 1 --instance-type t2.micro \
--key-name your-key --security-groups your-sg
</code></pre></div>
<p>And SSH in:</p>
<div class="highlight"><pre><span></span><code><span class="n">ssh</span><span class="w"> </span><span class="o">-</span><span class="n">i</span><span class="w"> </span><span class="n">your</span><span class="o">-</span><span class="k">key</span><span class="p">.</span><span class="n">pem</span><span class="w"> </span><span class="n">ec2</span><span class="o">-</span><span class="k">user</span><span class="nv">@your</span><span class="p">.</span><span class="k">public</span><span class="p">.</span><span class="n">ip</span><span class="w"></span>
</code></pre></div>
<p>Make sure everything's up to date:</p>
<div class="highlight"><pre><span></span><code>sudo yum -y update
sudo yum -y upgrade
</code></pre></div>
<h2>Build shared libraries from source</h2>
<p>Because your Lambda function will run in a clean AWS linux environment, you can't assume any system libraries will be there. Compiling from source isn't the only option - you could install binaries from the <a href="http://elgis.argeo.org/">Enterprise Linux GIS</a> effort but those tend to be older versions. To get more recent libs, compiling from source is an effective approach.</p>
<p>First, install some compile-time dependencies:</p>
<div class="highlight"><pre><span></span><code>sudo yum install python27-devel python27-pip gcc libjpeg-devel zlib-devel gcc-c++
</code></pre></div>
<p>Then build and install proj4 to a local prefix:</p>
<div class="highlight"><pre><span></span><code>wget https://github.com/OSGeo/proj.4/archive/4.9.2.tar.gz
tar -zvxf 4.9.2.tar.gz
cd proj.4-4.9.2/
./configure --prefix=/home/ec2-user/lambda/local
make
make install
</code></pre></div>
<p>And build GDAL, statically linking proj4:</p>
<div class="highlight"><pre><span></span><code><span class="n">wget</span><span class="w"> </span><span class="n">http</span><span class="p">:</span><span class="o">//</span><span class="n">download</span><span class="o">.</span><span class="n">osgeo</span><span class="o">.</span><span class="n">org</span><span class="o">/</span><span class="n">gdal</span><span class="o">/</span><span class="mf">1.11</span><span class="o">.</span><span class="mi">3</span><span class="o">/</span><span class="n">gdal</span><span class="o">-</span><span class="mf">1.11</span><span class="o">.</span><span class="mf">3.</span><span class="n">tar</span><span class="o">.</span><span class="n">gz</span><span class="w"></span>
<span class="n">tar</span><span class="w"> </span><span class="o">-</span><span class="n">xzvf</span><span class="w"> </span><span class="n">gdal</span><span class="o">-</span><span class="mf">1.11</span><span class="o">.</span><span class="mf">3.</span><span class="n">tar</span><span class="o">.</span><span class="n">gz</span><span class="w"></span>
<span class="n">cd</span><span class="w"> </span><span class="n">gdal</span><span class="o">-</span><span class="mf">1.11</span><span class="o">.</span><span class="mi">3</span><span class="w"></span>
<span class="o">./</span><span class="n">configure</span><span class="w"> </span><span class="o">--</span><span class="n">prefix</span><span class="o">=/</span><span class="n">home</span><span class="o">/</span><span class="n">ec2</span><span class="o">-</span><span class="n">user</span><span class="o">/</span><span class="n">lambda</span><span class="o">/</span><span class="n">local</span><span class="w"> </span>\<span class="w"></span>
<span class="w"> </span><span class="o">--</span><span class="n">with</span><span class="o">-</span><span class="n">geos</span><span class="o">=/</span><span class="n">home</span><span class="o">/</span><span class="n">ec2</span><span class="o">-</span><span class="n">user</span><span class="o">/</span><span class="n">lambda</span><span class="o">/</span><span class="n">local</span><span class="o">/</span><span class="n">bin</span><span class="o">/</span><span class="n">geos</span><span class="o">-</span><span class="n">config</span><span class="w"> </span>\<span class="w"></span>
<span class="w"> </span><span class="o">--</span><span class="n">with</span><span class="o">-</span><span class="k">static</span><span class="o">-</span><span class="n">proj4</span><span class="o">=/</span><span class="n">home</span><span class="o">/</span><span class="n">ec2</span><span class="o">-</span><span class="n">user</span><span class="o">/</span><span class="n">lambda</span><span class="o">/</span><span class="n">local</span><span class="w"></span>
<span class="n">make</span><span class="w"></span>
<span class="n">make</span><span class="w"> </span><span class="n">install</span><span class="w"></span>
</code></pre></div>
<p>This should leave us with a nice shared library at <code>/home/ec2-user/lambda/local/lib/libgdal.so.1</code> that can be safely
moved to another Amazon Linux box.</p>
<h2>Create a virtualenv</h2>
<p>Pretty straightforward, but keep in mind that some of the dependencies here are compiled extensions, so these builds are platform-specific - which is why we need to build them on the target Amazon Linux OS.</p>
<div class="highlight"><pre><span></span><code><span class="n">virtualenv</span><span class="w"> </span><span class="n">env</span><span class="w"></span>
<span class="n">source</span><span class="w"> </span><span class="n">env</span><span class="o">/</span><span class="n">bin</span><span class="o">/</span><span class="n">activate</span><span class="w"></span>
<span class="k">export</span><span class="w"> </span><span class="n">GDAL_CONFIG</span><span class="o">=/</span><span class="n">home</span><span class="o">/</span><span class="n">ec2</span><span class="o">-</span><span class="n">user</span><span class="o">/</span><span class="n">lambda</span><span class="o">/</span><span class="n">local</span><span class="o">/</span><span class="n">bin</span><span class="o">/</span><span class="n">gdal</span><span class="o">-</span><span class="n">config</span><span class="w"></span>
<span class="n">pip</span><span class="w"> </span><span class="n">install</span><span class="w"> </span><span class="n">rasterio</span><span class="w"></span>
</code></pre></div>
<h2>Python handler function</h2>
<p>The handler's job is to respond to the event (e.g. a new file created in an S3 bucket), perform any Amazon-specific tasks (like fetching data from S3) and invoke the worker. Importantly, in the context of this article, the handler
must set <code>LD_LIBRARY_PATH</code> to point to any shared libraries that the worker may need.</p>
<div class="highlight"><pre><span></span><code>import os
import subprocess
import uuid

import boto3

libdir = os.path.join(os.getcwd(), 'local', 'lib')
s3_client = boto3.client('s3')


def handler(event, context):
    results = []
    for record in event['Records']:
        # Find input/output buckets and key names
        bucket = record['s3']['bucket']['name']
        output_bucket = "{}.geojson".format(bucket)
        key = record['s3']['object']['key']
        output_key = "{}.geojson".format(key)

        # Download the raster locally
        download_path = '/tmp/{}{}'.format(uuid.uuid4(), key)
        s3_client.download_file(bucket, key, download_path)

        # Call the worker, setting the environment variables
        command = 'LD_LIBRARY_PATH={} python worker.py "{}"'.format(libdir, download_path)
        output_path = subprocess.check_output(command, shell=True)

        # Upload the output of the worker to S3
        s3_client.upload_file(output_path.strip(), output_bucket, output_key)
        results.append(output_path.strip())
    return results
</code></pre></div>
<p>It's important that the handler function does not import any modules that require
dynamic linking. For example, you cannot <code>import rasterio</code> in the main Python
handler since the dynamic linker doesn't yet know where to look for the GDAL shared library.
You can control the linker paths using the <code>LD_LIBRARY_PATH</code> environment variable,
but only <em>before</em> the process is started, and Lambda doesn't give you any control over the environment variables
of the handler function itself. I
tried hacks like creating new processes within the handler using <code>os.execv</code> or <code>multiprocessing</code> pools, but the user running the lambda function
doesn't have the necessary permissions to do that (both raise <code>OSError</code> - <code>[Errno 13] Permission Denied</code> and <code>[Errno 38] Function not implemented</code> respectively).</p>
<p>Fortunately, Lambda lets you call out to the shell so we can just do our real work through a worker script exposed as a command line interface (details in the next section). While at first this feels clunky, it has the side benefit of forcing separation of your AWS code from your business logic which can be written and tested separately.</p>
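<p>The same shell-out can also be done without string interpolation, by passing an argument list and an explicit environment; this avoids quoting problems with odd file names. A sketch of the pattern (the child command here is a trivial stand-in so the example is self-contained; the real handler would run <code>worker.py</code>):</p>
<div class="highlight"><pre><span></span><code>import os
import subprocess
import sys

libdir = '/tmp/local/lib'  # wherever the bundled shared libraries live

# Copy the current environment and point the dynamic linker at our libs.
# Note that env= replaces the child's environment entirely, hence the copy.
env = dict(os.environ)
env['LD_LIBRARY_PATH'] = libdir

# In the real handler this would be [sys.executable, 'worker.py', download_path];
# here a stand-in child just echoes the variable back.
output = subprocess.check_output(
    [sys.executable, '-c', "import os; print(os.environ['LD_LIBRARY_PATH'])"],
    env=env)
print(output.decode().strip())
</code></pre></div>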
<h2>Worker</h2>
<p>The worker script can be written in any language, compiled or interpreted, so long as it follows the basic rules of command line interfaces. We're using Python in the handler to set up the appropriate environment. For this example, the worker will <em>also</em> be written in Python because of its awesome support for geospatial data processing. But it could be written in Bash or C or just about anything, so long as its runtime environment can be configured with environment variables and arguments.</p>
<p>In this case, the handler is calling <code>worker.py</code> which looks like:</p>
<div class="highlight"><pre><span></span><code>import json
import sys
from tempfile import NamedTemporaryFile

import rasterio
from rasterio import features


def raster_shape(raster_path):
    with rasterio.open(raster_path) as src:
        # read the first band and create a binary mask
        arr = src.read(1)
        ndv = src.nodata
        binarray = (arr == ndv).astype('uint8')

        # extract shapes from raster
        shapes = features.shapes(binarray, transform=src.transform)

        # create geojson feature collection
        fc = {
            'type': 'FeatureCollection',
            'features': []}
        for geom, val in shapes:
            if val == 0:  # not nodata, i.e. valid data
                feature = {
                    'type': 'Feature',
                    'properties': {'name': raster_path},
                    'geometry': geom}
                fc['features'].append(feature)

    # Write to file
    with NamedTemporaryFile(suffix=".geojson", delete=False) as temp:
        temp.file.write(json.dumps(fc))
    return temp.name


if __name__ == "__main__":
    in_path = sys.argv[1]
    out_path = raster_shape(in_path)
    print(out_path)
</code></pre></div>
<p>Notice how the worker itself has no knowledge of AWS events or S3 - it works entirely on the local filesystem and thus can be used in other contexts and tested much more easily.</p>
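<p>Because the worker never touches S3, its core logic can be exercised with plain Python objects. Here's a minimal sketch of how that might look; the <code>shapes_to_fc</code> helper is hypothetical, simply mirroring the loop above so it can be tested without rasterio or a real raster:</p>

```python
import json

def shapes_to_fc(shapes, name):
    # Build a GeoJSON FeatureCollection from (geometry, value) pairs,
    # keeping only the shapes whose mask value marks valid data.
    fc = {'type': 'FeatureCollection', 'features': []}
    for geom, val in shapes:
        if val == 0:  # not nodata, i.e. valid data
            fc['features'].append({
                'type': 'Feature',
                'properties': {'name': name},
                'geometry': geom})
    return fc

# Fake (geometry, value) pairs, shaped like what rasterio's
# features.shapes generator would yield:
shapes = [
    ({'type': 'Polygon', 'coordinates': [[[0, 0], [1, 0], [1, 1], [0, 0]]]}, 0),
    ({'type': 'Polygon', 'coordinates': [[[2, 2], [3, 2], [3, 3], [2, 2]]]}, 1),
]
fc = shapes_to_fc(shapes, 'test.tif')
assert len(fc['features']) == 1          # only the valid-data shape survives
assert json.loads(json.dumps(fc)) == fc  # round-trips as valid JSON
```

<p>The same function could then be shared between the Lambda worker and any local batch script or test suite.</p>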
<h2>Bundle</h2>
<p>In order to deploy to Lambda, you need to package it up in a zip file in a slightly unusual manner. All of your Python packages and your handler script should be at the root, while the shared libraries can be put in a directory (<code>local/lib</code> in this case).</p>
<div class="highlight"><pre><span></span><code>cd ~/lambda
zip -9 bundle.zip handler.py
zip -r9 bundle.zip worker.py
zip -r9 bundle.zip local/lib/libgdal.so.1
cd $VIRTUAL_ENV/lib/python2.7/site-packages
zip -r9 ~/lambda/bundle.zip *
cd $VIRTUAL_ENV/lib64/python2.7/site-packages
zip -r9 ~/lambda/bundle.zip *
</code></pre></div>
<h2>Publish</h2>
<p>The details of setting up a Lambda function are far too verbose for this article - I would suggest running through the <a href="http://docs.aws.amazon.com/lambda/latest/dg/python-walkthrough-s3-events-adminuser.html">AWS S3 walkthrough</a> to get the basic S3 example working first. Then use the AWS CLI to update your existing Lambda function:</p>
<div class="highlight"><pre><span></span><code>aws lambda update-function-code \
--function-name testfunc1 \
--zip-file fileb://bundle.zip
</code></pre></div>
<h1>The end result</h1>
<p>Uploading a raster dataset to your S3 bucket should now trigger the Lambda function which will create a new GeoJSON in the output bucket. All automatically invoked based on the S3 events and completely scalable without having to worry about managing or provisioning servers. Nifty!</p>
<p>The worker and handler code above are intentionally kept short to be more readable. In real usage they would need significantly more error handling and conditionals to handle edge cases, malformed inputs, etc.</p>
<p>It occurred to me after writing this that there really is nothing Python-specific about this approach - the handler could just as easily have been written in Javascript and the worker in some other language. But this should provide a general approach for incorporating native code of any sort in AWS Lambda.</p>
<p>It remains to be seen if this approach is faster or cheaper than a queue-based system with autoscaled EC2 instances. If you're doing a constantly-high workload with lots of data, it's probably safe to say that Lambda is not appropriate. If you're doing sporadic workloads with some discrete processing task based on user-uploaded data, Lambda might be the ticket. The primary advantage is not necessarily speed or cost but reduced infrastructure complexity and hands-off autoscaling.</p>Python affine transforms2015-09-13T00:00:00-06:002015-09-13T00:00:00-06:00Matthew T. Perrytag:www.perrygeo.com,2015-09-13:/python-affine-transforms.html<p><em>Raster data coordinate handling with 6-element geotransforms is a pain. Use the <a href="https://github.com/sgillies/affine">affine</a> Python library instead.</em></p>
<p>The typical geospatial coordinate reference system is defined on a cartesian plane with the 0,0 origin in the bottom left and X and Y increasing as you go up and to the right …</p><p><em>Raster data coordinate handling with 6-element geotransforms is a pain. Use the <a href="https://github.com/sgillies/affine">affine</a> Python library instead.</em></p>
<p>The typical geospatial coordinate reference system is defined on a cartesian plane with the 0,0 origin in the bottom left and X and Y increasing as you go up and to the right. But raster data, coming from its image processing origins, uses a different referencing system to access pixels. We refer to rows and columns with the 0,0 origin in the upper left; rows increase as you move <em>down</em> while columns increase as you go right. Still a cartesian plane but not the same one.
<img alt="xyrowcol" src="/assets/img/xyrowcol.png"></p>
<p>So how do you transform between the two? <a href="https://en.wikipedia.org/wiki/Transformation_matrix#Affine_transformations">Affine transformations</a> provide a simple way to do it through the use of matrix algebra. Geospatial software of all varieties use an affine transform (sometimes referred to as "geotransform") to go from raster rows/columns to the x/y of the coordinate reference system. Converting from x/y back to row/col uses the inverse of the affine transform. Of course the software implementations vary widely.</p>
<p>For the remainder, I'll assume the simple case of a non-rotated "north up" raster as that is by far the most common case. </p>
<p>If you're coming from the matrix algebra perspective, you can ignore the constants in the affine matrix and refer to the six parameters as <code>a, b, c, d, e, f</code>. This is the ordering and notation used by the <a href="https://github.com/sgillies/affine">affine</a> Python library.</p>
<ul>
<li><strong>a</strong> = width of a pixel</li>
<li><strong>b</strong> = row rotation (typically zero)</li>
<li><strong>c</strong> = x-coordinate of the upper-left corner of the upper-left pixel</li>
<li><strong>d</strong> = column rotation (typically zero)</li>
<li><strong>e</strong> = height of a pixel (typically negative)</li>
<li><strong>f</strong> = y-coordinate of the upper-left corner of the upper-left pixel</li>
</ul>
<p>Perhaps the most pervasive implementation of affine transform encoding in the GIS world is the <a href="http://webhelp.esri.com/arcims/9.3/General/topics/author_world_files.htm">ESRI World File</a>. The world file is a simple text file accompanying any raster image which uses six line-separated values in this order:</p>
<ul>
<li><strong>a</strong> = width of a pixel</li>
<li><strong>d</strong> = column rotation (typically zero)</li>
<li><strong>b</strong> = row rotation (typically zero)</li>
<li><strong>e</strong> = height of a pixel (typically negative)</li>
<li><strong>c</strong> = x-coordinate of the <em>center</em> of the upper-left pixel</li>
<li><strong>f</strong> = y-coordinate of the <em>center</em> of the upper-left pixel</li>
</ul>
<p>It's important to note that the <strong>c</strong> and <strong>f</strong> parameters refer to the center of the cell, not the origin!</p>
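<p>Converting between the two conventions is just a half-pixel shift along each axis. A quick sketch with made-up values for a north-up raster:</p>

```python
# Made-up north-up transform: 30m square pixels,
# upper-left *corner* of the upper-left pixel at (100000, 200000).
a, e = 30.0, -30.0               # pixel width, pixel height (negative)
corner_c, corner_f = 100000.0, 200000.0

# A world file stores the *center* of the upper-left pixel instead:
center_c = corner_c + a / 2      # 100015.0
center_f = corner_f + e / 2      # 199985.0

# ...and shifting back recovers the corner-based origin:
assert (center_c - a / 2, center_f - e / 2) == (corner_c, corner_f)
```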
<p>GDAL also uses the 6-parameter transform, in yet another order, with its "Geotransform" array:</p>
<ul>
<li><strong>c</strong> = x-coordinate of the upper-left corner of the upper-left pixel</li>
<li><strong>a</strong> = width of a pixel</li>
<li><strong>b</strong> = row rotation (typically zero)</li>
<li><strong>f</strong> = y-coordinate of the upper-left corner of the upper-left pixel</li>
<li><strong>d</strong> = column rotation (typically zero)</li>
<li><strong>e</strong> = height of a pixel (typically negative)</li>
</ul>
<p>None of those orderings are particularly intuitive but at least the first, as implemented by <code>affine</code>, is "correct" from the matrix algebra perspective. </p>
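<p>To make the three orderings concrete, here is one made-up north-up transform expressed in each convention (plain tuples, purely illustrative):</p>

```python
# Six parameters in the affine library's (matrix algebra) order:
a, b, c = 30.0, 0.0, 100000.0    # pixel width, row rotation, x of UL corner
d, e, f = 0.0, -30.0, 200000.0   # col rotation, pixel height, y of UL corner

affine_order = (a, b, c, d, e, f)
gdal_order = (c, a, b, f, d, e)                  # GDAL Geotransform order
world_file = (a, d, b, e,
              c + a / 2, f + e / 2)              # ESRI world file lines
                                                 # (c, f shifted to pixel center)
assert gdal_order == (100000.0, 30.0, 0.0, 200000.0, 0.0, -30.0)
assert world_file == (30.0, 0.0, 0.0, -30.0, 100015.0, 199985.0)
```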
<p>For python programmers looking to work with raster data, the <code>osgeo.gdal</code> library has existed for quite a while. With it the notion of a 6-tuple geotransform in GDAL ordering has become pervasive. And if ordering were the only issue, it wouldn't necessarily be worth switching to the use of the <code>affine</code> library. The more convincing argument for the use of <code>affine</code> is the ease with which you can transform coordinates. In other words, why should you have to worry about ordering of parameters at all?</p>
<p>When dealing with the geotransform as a simple 6-element tuple, you'll probably end up writing code like this to do the actual conversion: </p>
<div class="highlight"><pre><span></span><code># Using osgeo.gdal and GDAL geotransform 6-tuples
gt = ds.GetGeoTransform()
# col, row to x, y
x = (col * gt[1]) + gt[0]
y = (row * gt[5]) + gt[3]
# x,y to col,row
col = int((x - gt[0]) / gt[1])
row = int((y - gt[3]) / gt[5])
</code></pre></div>
<p>I'd be willing to guess that variations of that formula exist in hundreds of python codebases. Not very complicated math but opaque enough not to commit to memory. It's also very easy to slip up ("Is the y origin element 4 or 5?") and introduce non-obvious bugs. Why should such a basic formulation be reimplemented by every programmer? Again, why rely on element ordering at all? <code>affine</code>, through the use of clever operator overloading, gives you a much simpler interface:</p>
<div class="highlight"><pre><span></span><code># Using rasterio and affine
a = ds.affine
# col, row to x, y
x, y = a * (col, row)
# x, y to col, row
col, row = ~a * (x, y)
</code></pre></div>
<p>Clean, nice looking code that's harder to get wrong, wouldn't you agree? And as @Asgerpetersen <a href="https://twitter.com/perrygeo/status/643156086229331968">pointed out</a>, if there were a non-zero rotation parameter, the affine example would handle it seamlessly while the geotransform formula would fail. </p>
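<p>You can see that failure mode with a little pure-Python arithmetic (made-up values, no libraries required). With non-zero rotation terms, the full affine formula and the naive north-up formula disagree:</p>

```python
import math

# A made-up transform with a 10-degree rotation baked in:
theta = math.radians(10)
a, b, c = 30.0 * math.cos(theta), 30.0 * math.sin(theta), 100000.0
d, e, f = 30.0 * math.sin(theta), -30.0 * math.cos(theta), 200000.0

col, row = 10, 20
# Full affine transform (what the affine library computes):
x = a * col + b * row + c
y = d * col + e * row + f
# Naive north-up formula that ignores the rotation terms b and d:
x_naive = a * col + c
y_naive = e * row + f

assert (x, y) != (x_naive, y_naive)  # silently wrong when rotation is present
```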
<p>Also, interoperability with GDAL-style geotransforms is painless</p>
<div class="highlight"><pre><span></span><code><span class="c1"># construct from our GDAL geotransform</span><span class="w"></span>
<span class="n">a</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">Affine</span><span class="o">.</span><span class="n">from_gdal</span><span class="p">(</span><span class="o">*</span><span class="n">gt</span><span class="p">)</span><span class="w"></span>
<span class="n">gt</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">a</span><span class="o">.</span><span class="n">to_gdal</span><span class="p">()</span><span class="w"></span>
</code></pre></div>
<p>As is the ability to read/write from World Files</p>
<div class="highlight"><pre><span></span><code><span class="kn">from</span> <span class="nn">affine</span> <span class="kn">import</span> <span class="n">loadsw</span><span class="p">,</span> <span class="n">dumpsw</span>
<span class="c1"># Read from World File</span>
<span class="k">with</span> <span class="nb">open</span><span class="p">(</span><span class="s1">'raster.tfw'</span><span class="p">)</span> <span class="k">as</span> <span class="n">tfw</span><span class="p">:</span>
    <span class="n">a</span> <span class="o">=</span> <span class="n">loadsw</span><span class="p">(</span><span class="n">tfw</span><span class="o">.</span><span class="n">read</span><span class="p">())</span>
<span class="c1"># Write to World File</span>
<span class="k">with</span> <span class="nb">open</span><span class="p">(</span><span class="s1">'other.wld'</span><span class="p">,</span> <span class="s1">'w'</span><span class="p">)</span> <span class="k">as</span> <span class="n">dest</span><span class="p">:</span>
    <span class="n">dest</span><span class="o">.</span><span class="n">write</span><span class="p">(</span><span class="n">dumpsw</span><span class="p">(</span><span class="n">a</span><span class="p">))</span>
</code></pre></div>
<p>With <a href="https://github.com/mapbox/rasterio">rasterio</a> planning to deprecate the use of GDAL-style geotransforms in the 1.0 release, it's never too early to start making the switch. Your cleaner raster coordinate code will be well worth the effort. </p>Raspberry Pi: real-time sensor plots with websocketd2015-03-02T00:00:00-07:002015-03-02T00:00:00-07:00Matthew T. Perrytag:www.perrygeo.com,2015-03-02:/raspberry-pi-real-time-sensor-plots-with-websocketd.html<p>This year I'm starting to delve into some electronics projects and hardware hacking.
What follows is an account of my first end-to-end Raspberry Pi project.
In terms of functionality, it doesn't do much at the moment - just reads from
a photoresistor sensor and plots the light levels in the corner …</p><p>This year I'm starting to delve into some electronics projects and hardware hacking.
What follows is an account of my first end-to-end Raspberry Pi project.
In terms of functionality, it doesn't do much at the moment - just reads from
a photoresistor sensor and plots the light levels in the corner of my office. Eventually,
I want to hook up a couple of light, moisture and temperature sensors throughout
my garden to do some science experiments and/or remind myself to water
the tomatoes. This is but the first step
in that larger project...</p>
<ul>
<li>
<p>The Pi is wired up to a 3.3v circuit with a photoresistor.</p>
</li>
<li>
<p>The state of the digital input pins are read by a python program.</p>
</li>
<li>
<p>The readings are streamed to a websocket via log file.</p>
</li>
<li>
<p>The HTML/Javascript interface connects with the websocket and plots the values in real time. </p>
</li>
</ul>
<p><img src="assets/img/rpi_websockets.png"></p>
<p>Although it's all just for fun at this point, I've discovered a lot of great unix networking
tools and javascript libraries
that will come in handy in my day job as well. Here's the details on how it all came together...</p>
<h2>The circuit</h2>
<p>I implemented the design
from <a href="https://learn.adafruit.com/basic-resistor-sensor-reading-on-raspberry-pi/basic-photocell-reading">the adafruit tutorial</a> on the subject. The adafruit
image shows the basic idea:</p>
<p><img src="https://learn.adafruit.com/system/assets/assets/000/001/321/medium800/raspberry_pi_photocell.jpg?1396770994"></p>
<p>The <strong>photoresistor</strong> provides increased resistance to electric current as the visible light becomes dimmer. Conversely, resistance decreases as light becomes brighter. It is an analog sensor but
the Raspberry Pi only has digital inputs (the general purpose input output or GPIO pins).
To solve that, we can employ a capacitor using "RC timing". </p>
<p>A <strong>capacitor</strong> builds up voltage
over time and, when this voltage hits ~1.4V, the digital input pin reads "high". So
instead of taking a direct analog reading, we set a loop and time how long it
takes for the capacitor to "fill up". </p>
<p>If the time interval is small (i.e. the capacitor is charging rapidly), there
is less resistance from our analog sensor which means more light. If the time
interval is large (i.e. the capacitor is taking a long time to charge on each cycle),
there is more resistance and less light.</p>
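<p>The counting loop at the heart of this is very simple. Here's a sketch of the idea; <code>read_pin</code> is a stand-in for an <code>RPi.GPIO</code> read, and the simulated pin below is purely illustrative:</p>

```python
def rc_time(read_pin, max_count=10000):
    # Count loop cycles until the input pin reads high, i.e. until
    # the capacitor has charged past the digital threshold (~1.4V).
    # More cycles => slower charging => more resistance => less light.
    count = 0
    while read_pin() == 0 and count < max_count:
        count += 1
    return count

# Simulate a capacitor that crosses the threshold after 37 cycles:
states = iter([0] * 37 + [1])
print(rc_time(lambda: next(states)))  # 37
```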
<p>Wired up to the photoresistor on my 25 year-old Radio Shack Electronics Learning Lab,
it looks a bit clunkier but still does the trick:</p>
<p><img src="/assets/img/withpi.png" alt="withpi"></p>
<p>Quick side note: The ribbon and connectors between the raspberry pi and the breadboard are called
a <a href="http://www.adafruit.com/product/914">Pi Cobbler</a>. It makes working with the
GPIO pins easier but, as you can tell from the photos, the incoming cable obstructs access a bit.
I might take a look at the <a href="https://www.adafruit.com/products/1105">T-Cobbler</a>
which promises to clear up some vertical space on the breadboard.</p>
<h2>Reading digital input pins from an analog sensor</h2>
<p>In order to read the digital input pins, we can use the <a href="https://pypi.python.org/pypi/RPi.GPIO">RPi.GPIO</a> python library. </p>
<p>There's not much more that I can add to <a href="https://learn.adafruit.com/basic-resistor-sensor-reading-on-raspberry-pi/basic-photocell-reading">the adafruit tutorial</a> which covers the topic well. I made a few modifications:</p>
<ul>
<li>output a unix timestamp along with the reading</li>
<li>flush the output to <code>stdout</code> after every reading to make sure the output isn't buffered.</li>
</ul>
<div class="highlight"><pre><span></span><code><span class="k">if</span> <span class="vm">__name__</span> <span class="o">==</span> <span class="s2">"__main__"</span><span class="p">:</span>
<span class="k">while</span> <span class="kc">True</span><span class="p">:</span>
<span class="c1"># Get sensor timing and unix timestamp</span>
<span class="n">reading</span> <span class="o">=</span> <span class="n">RCtime</span><span class="p">(</span><span class="mi">18</span><span class="p">)</span>
<span class="n">n</span> <span class="o">=</span> <span class="n">datetime</span><span class="o">.</span><span class="n">datetime</span><span class="o">.</span><span class="n">now</span><span class="p">()</span>
<span class="n">timestamp</span> <span class="o">=</span> <span class="n">to_unix_timestamp</span><span class="p">(</span><span class="n">n</span><span class="p">)</span>
<span class="nb">print</span> <span class="s2">"</span><span class="si">{}</span><span class="s2">,</span><span class="si">{}</span><span class="s2">"</span><span class="o">.</span><span class="n">format</span><span class="p">(</span><span class="n">timestamp</span><span class="p">,</span> <span class="n">reading</span><span class="p">)</span>
<span class="n">sys</span><span class="o">.</span><span class="n">stdout</span><span class="o">.</span><span class="n">flush</span><span class="p">()</span>
</code></pre></div>
<p>You can read the complete <a href="https://github.com/perrygeo/pi_sensor_realtime/blob/master/read_sensor.py">read_sensor.py script</a> on github. </p>
<p>With the script in place and the circuit wired up, I can fire up the script </p>
<div class="highlight"><pre><span></span><code>sudo python read_sensor.py
</code></pre></div>
<p>and see the timestamp and sensor reading written to the console as comma-separated values:</p>
<div class="highlight"><pre><span></span><code><span class="mf">1425505117.05</span><span class="p">,</span><span class="mf">793</span><span class="w"></span>
<span class="mf">1425505117.16</span><span class="p">,</span><span class="mf">802</span><span class="w"></span>
<span class="mf">1425505117.38</span><span class="p">,</span><span class="mf">768</span><span class="w"></span>
<span class="mf">1425505117.82</span><span class="p">,</span><span class="mf">709</span><span class="w"></span>
<span class="mf">1425505117.93</span><span class="p">,</span><span class="mf">801</span><span class="w"></span>
<span class="mf">1425505118.05</span><span class="p">,</span><span class="mf">798</span><span class="w"></span>
</code></pre></div>
<p>So what do those values mean? They represent a count of the number of cycles it took to
charge the capacitor. Not a meaningful number by itself but it could be calibrated to
use standard units or simply used as relative values (lower value == brighter light)</p>
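<p>Downstream consumers just need to split those comma-separated lines back apart. For example, a tiny (assumed) parser that a plotting script could use:</p>

```python
from datetime import datetime

def parse_reading(line):
    # One "unix_timestamp,cycle_count" log line -> (datetime, int)
    ts, count = line.strip().split(",")
    return datetime.fromtimestamp(float(ts)), int(count)

when, count = parse_reading("1425505117.05,793")
assert count == 793
```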
<p>It's important to note that, on a Linux machine, you can't be guaranteed that your
event loop won't get interrupted by other processes. So you probably shouldn't use Linux
as a real-time sensor platform directly. However, it works well enough for demonstration
provided your Raspberry Pi isn't bogged down by other CPU-intensive processes.</p>
<p>Another caveat with this approach - we can only use a <em>single process</em> to access the GPIO
pins in this manner. Having multiple processes or threads setting/reading GPIO pin states
would cause inaccuracies as each process could reset the pins mid-cycle and interrupt
the timing of other processes. </p>
<h2>Streaming websockets</h2>
<p>Websockets are an extension to HTTP that allow data to be sent <em>from</em> a server <em>to</em> a client
using a persistent connection. Think push notifications. </p>
<p><a href="http://websocketd.com">websocketd</a> allows you to
take the standard output from any unix program and publish it on a
websocket. It can also work with standard input, opening up the doors for some
amazing software workflows: imagine taking any well-behaved Unix command and immediately
wrapping its functionality in a web protocol! </p>
<p>To output the sensor readings using a websocket, I'll first run the <code>read_sensor.py</code> script in the background with high priority (<code>nice -20</code>) and redirect the output to a logfile:</p>
<div class="highlight"><pre><span></span><code>sudo nice -20 python read_sensor.py > log.txt <span class="p">&</span>
</code></pre></div>
<p>Then I will run <code>websocketd</code> on port 8080, serve a few static
files and provide a command to run.
In this case, the command is the basic unix <code>tail -f</code> which streams the contents of the log file.</p>
<div class="highlight"><pre><span></span><code>websocketd --port<span class="o">=</span><span class="m">8080</span> --staticdir<span class="o">=</span>./static tail -f log.txt
</code></pre></div>
<p>Now the sensor readings are being logged and a websocket server is running.
For each client that connects to the websocket, a new process (<code>tail -f log.txt</code>) will be started
and <code>stdout</code> will be streamed to that client via websocket messages.</p>
<p>Note that the <code>tail -f</code> command is <em>not</em> yet running until a websocket client makes a
connection. Because it runs in its own process and simply reads the sensor log file,
we can start as many of them as our hardware can handle.</p>
<p>In summary, the pattern is: run a single process that reads from the GPIO pins and writes to a sensor log, then fire off multiple processes that read the log and stream the output over websockets.</p>
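<p>The reader side of that pattern is essentially <code>tail -f</code>, which is easy to sketch in Python if you'd rather consume the log in-process. The <code>max_polls</code> argument is an addition so the sketch can terminate; a real follower would loop forever:</p>

```python
import time

def follow(path, poll=0.1, max_polls=None):
    # Yield complete lines as they are appended to the log file,
    # mimicking `tail -f`. Stops after `max_polls` consecutive
    # empty reads (None = never stop).
    buf = ""
    polls = 0
    with open(path) as f:
        while max_polls is None or polls < max_polls:
            chunk = f.readline()
            if not chunk:               # nothing new yet; wait and retry
                polls += 1
                time.sleep(poll)
                continue
            buf += chunk
            if buf.endswith("\n"):      # only emit whole lines
                yield buf.rstrip("\n")
                buf = ""
                polls = 0
```

<p>Each consumer opens its own file handle, so - just like the <code>tail -f</code> processes websocketd spawns - you can run as many readers as the hardware can handle without touching the single GPIO-reading writer.</p>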
<p>Now we're ready to test it. </p>
<h2>HTML/Javascript interface</h2>
<p>Working with websockets in Javascript is fairly straightforward. First, create a connection</p>
<div class="highlight"><pre><span></span><code><span class="kd">var</span><span class="w"> </span><span class="nx">ws</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="ow">new</span><span class="w"> </span><span class="nx">WebSocket</span><span class="p">(</span><span class="s1">'ws://example.org:8080/'</span><span class="p">);</span><span class="w"></span>
</code></pre></div>
<p>then set some callbacks to handle incoming messages from the server.</p>
<div class="highlight"><pre><span></span><code><span class="nx">ws</span><span class="p">.</span><span class="nx">onmessage</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="kd">function</span><span class="p">(</span><span class="nx">e</span><span class="p">){</span><span class="w"></span>
<span class="w"> </span><span class="nx">console</span><span class="p">.</span><span class="nx">log</span><span class="p">(</span><span class="s2">"We got something:"</span><span class="p">,</span><span class="w"> </span><span class="nx">e</span><span class="p">.</span><span class="nx">data</span><span class="p">);</span><span class="w"></span>
<span class="p">}</span><span class="w"></span>
</code></pre></div>
<p>Websockets are built into almost every modern browser so this functionality
works out of the box. But if the connection is lost for any reason, the native Websocket
implementations do not automatically reconnect. To solve that problem,
there is <a href="https://github.com/joewalnes/reconnecting-websocket">ReconnectingWebSocket</a>, which does exactly what it sounds like: it attempts to
reconnect automatically when needed.</p>
<p>Then to create an animated real time plot of the streaming data, you'll need a javascript library like <a href="http://smoothiecharts.org/">Smoothie Charts</a>.</p>
<p>I should also note that the server (<a href="http://websocketd.com">websocketd</a>), the javascript plotting library (<a href="http://smoothiecharts.org/">Smoothie Charts</a>), and the javascript networking library (<a href="https://github.com/joewalnes/reconnecting-websocket">ReconnectingWebSocket</a>) were all written by <a href="https://github.com/joewalnes/">joewalnes</a> - this guy is responsible for making the three biggest pieces of this system and deserves mad props! </p>
<p>All of the HTML and js can be found here: <a href="https://github.com/perrygeo/pi_sensor_realtime/blob/master/static/index.html">index.html</a>. </p>
<p>Finally, here is the result. A streaming, real time plot of sensor readings. This clip was recorded as I came into my office, opened a few
windows and turned on a light. As the room gets brighter, you can see the sensor readings drop, and then rise again as I pass my hand over the sensor a few times to block the light.</p>
<iframe id="player" type="text/html" width="640" height="390"
src="https://www.youtube.com/embed/CfwRj3HP3j0?enablejsapi=1&origin=http://example.com"
frameborder="0"></iframe>
<p>Maybe not incredibly useful in its current state but it provided an excellent learning experience to work on the entire stack, integrating electronics and hardware with web software. It opens the doors for all sorts of new projects. All of the code is available on my <a href="https://github.com/perrygeo/pi_sensor_realtime">github repo</a>. Any questions? Shoot me an email or message on twitter. I'm a beginner when it comes to electrical theory so somebody please correct me if I'm way off the mark on something. </p>Zonal statistics: histograms as user-defined aggregate functions2015-02-23T00:00:00-07:002015-02-23T00:00:00-07:00Matthew T. Perrytag:www.perrygeo.com,2015-02-23:/zonal-statistics-histograms-as-user-defined-aggregate-functions.html<h2>Introduction</h2>
<p>Zonal statistics allow you to summarize raster datasets based on vector geometries
by aggregating all pixels associated with each vector feature, typically to a single scalar value. For example, you might want the <em>mean</em> elevation of each country against an SRTM Digital Elevation
Model (DEM). This is easily accomplished in python using <a href="https://github.com/perrygeo/python-raster-stats"><code>rasterstats</code></a>:</p>
<div class="highlight"><pre><span></span><code><span class="kn">from</span> <span class="nn">rasterstats</span> <span class="kn">import</span> <span class="n">zonal_stats</span>
<span class="n">stats</span> <span class="o">=</span> <span class="n">zonal_stats</span><span class="p">(</span><span class="s1">'countries.shp'</span><span class="p">,</span> <span class="s1">'elevation.tif'</span><span class="p">,</span> <span class="n">stats</span><span class="o">=</span><span class="s2">"mean"</span><span class="p">,</span> <span class="n">copy_properties</span><span class="o">=</span><span class="kc">True</span><span class="p">)</span>
<span class="k">for</span> <span class="n">s</span> <span class="ow">in</span> <span class="n">stats</span><span class="p">:</span>
    <span class="nb">print</span> <span class="n">s</span><span class="p">[</span><span class="s1">'name'</span><span class="p">],</span> <span class="n">s</span><span class="p">[</span><span class="s1">'mean'</span><span class="p">]</span>
</code></pre></div>
<p>Which would give us output similar to below, with the mean elevation (meters) for each country:</p>
<div class="highlight"><pre><span></span><code>Afghanistan 1826.38
Netherlands 8.78
Nepal 2142.28
Zimbabwe 980.85
</code></pre></div>
<h2>Zonal Histograms</h2>
<p>Using the built-in aggregate functions in <code>rasterstats</code> can reveal a lot
about the underlying raster dataset (see <a href="https://github.com/perrygeo/python-raster-stats#statistics">statistics</a> for the full list). Most of the time the standard descriptive statistics
like min, max, mean, median, etc. can tell us everything we need to know.</p>
<p>But what if we want to retain more information about the underlying distribution of
values? Instead of simply stating </p>
<blockquote>
<p>Afghanistan is, on average, 1826.38 meters above sea level</p>
</blockquote>
<p>supposed we wanted to see how much of the country is in high vs low elevation areas.
We could bin the elevations into meaningful ranges (say 0-200 meters, 200 to 400 meters, etc) and create a histogram of pixel counts to show the shape of the underlying distribution. In this case, the aggregate function does not return a scalar value but a dictionary with
each bin as a key.</p>
<div class="highlight"><pre><span></span><code>>>> stats['elevation_histogram']
{'0 to 400m': ...,
 '400 to 1000m': ...,
 '1000 to 3000m': ...,
 '3000 to 5000m': ...,
 '5000 to 10000m': ...}
</code></pre></div>
<p>That's the goal, now how do we accomplish this? </p>
<h2>User-defined aggregate functions</h2>
<p>Because a histogram might need to <a href="http://docs.scipy.org/doc/numpy/reference/generated/numpy.histogram.html">specify a number of arguments</a> to customize the results, it's not
feasible for <code>rasterstats</code> to define a generic histogram function. However, as of <a href="https://pypi.python.org/pypi/rasterstats/0.6.1">version 0.6</a>, we
have the ability to create custom, user-defined aggregate functions such as the zonal histogram
idea described above.</p>
<p>First, we have to write our function. Its first and only argument is a masked numpy array
of raster values, typically processed with <code>numpy</code> functions. The function's return value will be added to the stats output for each feature. The returned
value does <em>not</em> need to be a scalar, it can be any valid python value (though it's probably
best to stick with dicts, lists and other simple data structures that are easily
serializable).</p>
<div class="highlight"><pre><span></span><code><span class="kn">import</span> <span class="nn">numpy</span> <span class="k">as</span> <span class="nn">np</span>
<span class="kn">import</span> <span class="nn">itertools</span>
<span class="k">def</span> <span class="nf">elevation_histogram</span><span class="p">(</span><span class="n">x</span><span class="p">):</span>
<span class="n">bin_edges</span> <span class="o">=</span> <span class="p">[</span><span class="mi">0</span><span class="p">,</span> <span class="mi">400</span><span class="p">,</span> <span class="mi">1000</span><span class="p">,</span> <span class="mi">3000</span><span class="p">,</span> <span class="mi">5000</span><span class="p">,</span> <span class="mi">10000</span><span class="p">]</span>
<span class="n">hist</span><span class="p">,</span> <span class="n">_</span> <span class="o">=</span> <span class="n">np</span><span class="o">.</span><span class="kp">histogram</span><span class="p">(</span><span class="n">x</span><span class="p">,</span> <span class="n">bins</span><span class="o">=</span><span class="n">bin_edges</span><span class="p">)</span>
<span class="n">data</span> <span class="o">=</span> <span class="p">{}</span>
<span class="k">for</span> <span class="n">upper</span><span class="p">,</span> <span class="n">lower</span><span class="p">,</span> <span class="n">value</span> <span class="ow">in</span> <span class="n">itertools</span><span class="o">.</span><span class="n">izip</span><span class="p">(</span><span class="n">bin_edges</span><span class="p">,</span> <span class="n">bin_edges</span><span class="p">[</span><span class="mi">1</span><span class="p">:],</span> <span class="n">hist</span><span class="p">):</span>
<span class="n">key</span> <span class="o">=</span> <span class="s2">"</span><span class="si">{}</span><span class="s2"> to </span><span class="si">{}</span><span class="s2">m"</span><span class="o">.</span><span class="n">format</span><span class="p">(</span><span class="n">upper</span><span class="p">,</span> <span class="n">lower</span><span class="p">)</span>
<span class="n">data</span><span class="p">[</span><span class="n">key</span><span class="p">]</span> <span class="o">=</span> <span class="n">value</span>
<span class="k">return</span> <span class="n">data</span>
</code></pre></div>
<p>And then add our custom <code>elevation_histogram</code> function to our <code>zonal_stats</code> call
using the <code>add_stats</code> keyword argument:</p>
<div class="highlight"><pre><span></span><code><span class="nv">stats</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nv">zonal_stats</span><span class="ss">(</span><span class="s1">'countries.shp'</span>,<span class="w"> </span><span class="s1">'elevation.tif'</span>,<span class="w"> </span><span class="nv">copy_properties</span><span class="o">=</span><span class="nv">True</span>,<span class="w"></span>
<span class="w"> </span><span class="nv">add_stats</span><span class="o">=</span>{<span class="s1">'elevation_histogram'</span>:<span class="w"> </span><span class="nv">elevation_histogram</span>}<span class="ss">)</span><span class="w"></span>
<span class="k">for</span><span class="w"> </span><span class="nv">s</span><span class="w"> </span><span class="nv">in</span><span class="w"> </span><span class="nv">stats</span>:<span class="w"></span>
<span class="w"> </span><span class="nv">print</span><span class="w"> </span><span class="nv">s</span>[<span class="s1">'name'</span>],<span class="w"> </span><span class="nv">s</span>[<span class="s1">'mean'</span>],<span class="w"> </span><span class="nv">s</span>[<span class="s1">'elevation_histogram'</span>]<span class="w"></span>
</code></pre></div>
<p>which produces output similar to the following: raw pixel counts
for each of the elevation bins (formatted for readability)</p>
<div class="highlight"><pre><span></span><code>Afghanistan 1826.38 {
'3000 to 5000m': 1099730,
'0 to 400m': 1754317,
'1000 to 3000m': 2884917,
'5000 to 10000m': 83158,
'400 to 1000m': 1907790}
</code></pre></div>
<p>The only caveat with using this technique is that nested dictionaries and other
non-scalar values might cause difficulty when trying to serialize this
data structure to other formats. For example, most GIS formats don't support hierarchical
properties (nested dictionaries) so you might have to flatten the data before
writing to e.g. PostGIS or an ESRI shapefile. </p>
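<p>As a sketch of that flattening step (the helper below is hypothetical, not part of <code>rasterstats</code>), nested histogram counts can be promoted to top-level scalar keys before writing to a flat format:</p>

```python
def flatten_feature(props, nested_key='elevation_histogram'):
    """Promote a nested dict of counts to top-level scalar keys."""
    flat = {k: v for k, v in props.items() if k != nested_key}
    for bin_label, count in props.get(nested_key, {}).items():
        # e.g. 'elevation_histogram_0 to 400m' -> 1754317
        flat['{}_{}'.format(nested_key, bin_label)] = count
    return flat

feature = {'name': 'Afghanistan', 'mean': 1826.38,
           'elevation_histogram': {'0 to 400m': 1754317, '400 to 1000m': 1907790}}
flat = flatten_feature(feature)
```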
<p>With the ability to write user-defined aggregate functions, I can keep the core
of <code>rasterstats</code> light while allowing for the possibility of complex aggregate analysis
that might be needed in the future. Good stuff.</p>Topological simplification of simple features2015-01-11T00:00:00-07:002015-01-11T00:00:00-07:00Matthew T. Perrytag:www.perrygeo.com,2015-01-11:/topological-simplification-of-simple-features.html<h2>The case for topology</h2>
<p><a href="http://en.wikipedia.org/wiki/Simple_Features">Simple feature</a> representations
of polygon geometries are ubiquitous due to their ease of use.
Thinking of spatial features as having a single, independent geometry is easy and fits most use cases.
But that ease of use disappears when we need to represent the topological
relationship between features.</p>
<p>In this article, I'll focus on one particular task with simple features data that
would benefit from topology - namely simplifying a polygon dataset by removing vertices.
Here's the original dataset, a 30+MB shapefile with very dense line work.</p>
<p><img alt="original" src="/images/topo_simplify/original.png"></p>
<p>Geometries <em>can</em> be simplified under the Simple Features model but, since
each geometry is processed independently, the <strong>topological relationships between
features can be disrupted</strong>. For instance, using the <code>Simplify Geometries</code> tool in
QGIS, I can simplify the polygons dramatically but we see gaps between polygons
and other side effects.</p>
<p><img alt="no_topology" src="/images/topo_simplify/no_topology.png"></p>
<h2>The plan</h2>
<p>Because we'll need to <em>build</em> topology before acting on it, the process for simplifying
simple features datasets involves converting the data to topological structure,
simplifying it, then converting it back to a simple features representation.</p>
<p>Many of the big GIS systems (ESRI's ArcInfo "coverages" and their .e00 interchange format, and GRASS vectors)
have their own topological data structures. More recently, we've seen the rise of
the OpenStreetMap (OSM) format and TopoJSON, both of which model topological relationships.</p>
<p>Of these options, I selected <a href="https://github.com/mbostock/topojson/wiki">TopoJSON</a>
because of its robust <a href="https://github.com/mbostock/topojson/wiki/Command-Line-Reference">command-line tool</a>
which handles building topology and simplification in one step. Additionally, it
works with GeoJSON and Shapefile inputs, two of the most common
data formats for simple features.</p>
<p>The workflow goes something like this:</p>
<ol>
<li>Convert data into a shapefile with the EPSG:4326 spatial reference (lonlat, wgs84)</li>
<li>Convert to topojson and simplify</li>
<li>Convert to geojson</li>
<li>Optionally, convert geojson to other formats supported by OGR</li>
</ol>
<p>To follow along, you'll need to have the following software installed:</p>
<ul>
<li>GDAL command line utilities (we'll use <code>ogr2ogr</code> at the command line)<ul>
<li><code>apt-get install gdal-bin</code></li>
</ul>
</li>
<li>The <code>topojson</code> command line utility<ul>
<li><code>npm install -g topojson</code></li>
</ul>
</li>
<li>Python with the <code>shapely</code> package installed.<ul>
<li><code>pip install shapely</code></li>
</ul>
</li>
</ul>
<h2>Step 1: Convert to WGS84 shapefile</h2>
<p>If you're already working with an ESRI Shapefile or GeoJSON format and your data
is already in unprojected WGS84 coordinates (i.e. EPSG:4326), you can skip to step 2.</p>
<p>Otherwise, <code>ogr2ogr</code> makes that conversion simple:</p>
<div class="highlight"><pre><span></span><code>ogr2ogr -t_srs epsg:4326 -f <span class="s2">"ESRI Shapefile"</span> <span class="se">\</span>
ecoregions_original.shp EcoregionSummaries3.gdb.zip EcoRegions
</code></pre></div>
<h2>Step 2: Convert to TopoJSON and simplify</h2>
<p>The simplification, the quantization (more on that later), and the conversion to
a topological data model are all handled by <code>topojson</code>.</p>
<p>You have two options for specifying how aggressively you want to simplify your data.</p>
<ol>
<li>Use a tolerance, specified in <a href="http://en.wikipedia.org/wiki/Steradian#SI_multiples">steradians</a>, with the <code>-s</code> flag</li>
<li>Use a proportion of points, 0 to 1, to retain with the <code>--simplify-proportion</code> flag</li>
</ol>
<p>One quirk of the topojson implementation is that it uses a relatively low quantization factor by default.
Effectively, this snaps coordinates to a grid in order to save space and simplify geometries even further.
This yields nice small coordinates but can result in a "stair step" effect at higher
zoom levels. The default is <code>-q 1E4</code> but I've found good results with <code>-q 1E6</code> as
recommended in the topojson docs.</p>
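<p>To get an intuition for what quantization does (a rough sketch in plain Python, not the actual topojson implementation), coordinates are scaled onto a q-by-q integer grid and back, snapping nearby vertices together:</p>

```python
def quantize(coords, q=1e4, extent=(-180.0, -90.0, 180.0, 90.0)):
    """Snap (lon, lat) pairs to a q x q grid covering the extent."""
    x0, y0, x1, y1 = extent
    kx = (q - 1) / (x1 - x0)  # grid cells per degree of longitude
    ky = (q - 1) / (y1 - y0)  # grid cells per degree of latitude
    return [(round((x - x0) * kx) / kx + x0,
             round((y - y0) * ky) / ky + y0) for x, y in coords]

# At q=1E4 these nearly identical vertices collapse onto the same grid node,
# which is the source of the "stair step" effect at high zoom levels.
a, b = quantize([(12.34567, 45.67891), (12.34570, 45.67895)])
```

Raising <code>-q</code> to <code>1E6</code> makes the grid 100x finer in each dimension, which is why it reduces the stair-step artifacts at the cost of slightly larger files.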
<p>As an example, let's take our <code>ecoregions_original.shp</code> and convert it to topojson
with a tolerance of <code>1E-8</code> steradians. We want to make sure we explicitly declare
that the data is in spherical (unprojected) coordinates and to retain the properties
of the original attribute table:</p>
<div class="highlight"><pre><span></span><code>topojson --spherical <span class="se">\</span>
--properties <span class="se">\</span>
-s 1E-8 <span class="se">\</span>
-q 1E6 <span class="se">\</span>
-o temp.topojson <span class="se">\</span>
ecoregions_original.shp
</code></pre></div>
<h2>Step 3: Convert to GeoJSON</h2>
<p>This part was a bit trickier than I anticipated. Luckily Sean Gillies has written
some preliminary <a href="http://sgillies.net/blog/1159/topojson-with-python">python functions</a>
for converting topojson geometries to standard GeoJSON-like python dictionaries.</p>
<p>In order to make a higher-level conversion utility, I started working on <a href="https://gist.github.com/perrygeo/1e767e42e8bc54ad7262#file-topo2geojson-py">topo2geojson.py</a> which provides a command line
interface to perform TopoJSON to GeoJSON conversions.</p>
<div class="highlight"><pre><span></span><code>python topo2geojson.py temp.topojson ecoregions_simple.geojson
</code></pre></div>
<p>There is some additional logic to ensure validity of polygons though it is very
basic and I'm sure there are ways to make the geometry conversions more robust.
Please note that I've only tested this script on this one dataset and it likely needs
additional work to be considered a full-fledged conversion tool; consider it more of a
starting point than an out-of-the box solution.</p>
<h2>Optional Step 4: Convert to any OGR format</h2>
<p>Once data is in GeoJSON format, we're free to do what we want with it, including
converting it back to a shapefile or any other OGR supported data format.</p>
<div class="highlight"><pre><span></span><code>ogr2ogr -f <span class="s2">"ESRI Shapefile"</span> ecoregions_simple.shp ecoregions_simple.geojson OGRGeoJson
</code></pre></div>
<h1>Case study: evaluating simplification tolerances</h1>
<p>In the remainder of this article, I'll walk through a demonstration of these steps
in order to find an optimal simplification tolerance for my test data. The optimal
tolerance depends on your needs, what scales you will be using your data and how
aggressively you need to reduce file size. Ultimately, it's a <strong>tradeoff between
small file size and accurate line work</strong>.</p>
<p>We can easily script this solution in order to test multiple simplification tolerances.
As a bonus, we can fire off multiple iterations at once to leverage multiple cores.
Since I've got 4 cores on my laptop, I can run 4 processes in nearly the same time
it takes to run 1 using some simple shell tricks (Linux/OSX only; sorry Windows users but I don't know .bat files well enough to demonstrate)</p>
<div class="highlight"><pre><span></span><code><span class="k">for</span> tolerance <span class="k">in</span> 1E-7 1E-8 1E-9 1E-10
<span class="k">do</span>
topojson --spherical <span class="se">\</span>
--properties <span class="se">\</span>
-s <span class="nv">$tolerance</span> <span class="se">\</span>
-q 1E6 <span class="se">\</span>
-o temp_<span class="nv">$tolerance</span>.topojson <span class="se">\</span>
ecoregions_original.shp <span class="o">&&</span>
<span class="c1"># Convert it to GeoJSON</span>
python topo2geojson.py temp_<span class="nv">$tolerance</span>.topojson temp_<span class="nv">$tolerance</span>.geojson <span class="o">&&</span>
<span class="c1"># Optionally, convert GeoJSON to any OGR data source</span>
ogr2ogr -f <span class="s2">"ESRI Shapefile"</span> ecoregions_<span class="nv">$tolerance</span>.shp temp_<span class="nv">$tolerance</span>.geojson OGRGeoJson <span class="p">&</span>
<span class="k">done</span>
<span class="nb">wait</span>
</code></pre></div>
<p>Then we can take a look at the resulting .topojson file sizes:</p>
<div class="highlight"><pre><span></span><code>$ ls -lh *.topojson
-rw-rw-r-- <span class="m">1</span> mperry mperry <span class="m">4</span>.5M Jan <span class="m">11</span> <span class="m">12</span>:25 temp_1E-10.topojson
-rw-rw-r-- <span class="m">1</span> mperry mperry <span class="m">2</span>.1M Jan <span class="m">11</span> <span class="m">12</span>:25 temp_1E-9.topojson
-rw-rw-r-- <span class="m">1</span> mperry mperry 869K Jan <span class="m">11</span> <span class="m">12</span>:25 temp_1E-8.topojson
-rw-rw-r-- <span class="m">1</span> mperry mperry 362K Jan <span class="m">11</span> <span class="m">12</span>:25 temp_1E-7.topojson
</code></pre></div>
<p>OK, so with a simplification tolerance of 1E-10 steradians, we get a 4.5M file.
If we increase it to 1E-7, we get a 362K file - a 12.5x reduction. Is the reduction
in file size worth the reduction in geometric accuracy? The only way to find out is to
render maps of the resulting datasets and visually assess them.</p>
<table>
<tr>
<th> </th>
<th>Original</th>
<th>1E-7</th>
<th>1E-8</th>
<th>1E-9</th>
</tr>
<tr>
<th> </th>
<th><img src="/images/topo_simplify/original.png" width=200></th>
<th><img src="/images/topo_simplify/1E-7.png" width=200></th>
<th><img src="/images/topo_simplify/1E-8.png" width=200></th>
<th><img src="/images/topo_simplify/1E-9.png" width=200></th>
</tr>
</table>
<p>The first thing we notice: all of the results have retained topology, with no gaps or slivers introduced
(the key benefit of this workflow).</p>
<p>Next, we notice that at this scale (roughly 1:500k on my monitor) we can barely
see a difference between the 1E-9 version and the original. And the 1E-7 version
looks a bit too simplified and chunky. So, in this case, we can say that a simplification
tolerance of around 1E-8 steridians is an optimal balance of file size and detail.</p>
<p>Of course other datasets, scales and uses may have completely different results so please try
it out and let me know how it goes. Just don't settle for simple features simplification next time you need to
reduce file sizes!</p>Sensitivity Analysis in Python2014-01-19T00:00:00-07:002014-01-19T00:00:00-07:00Matthew T. Perrytag:www.perrygeo.com,2014-01-19:/sensitivity-analysis-in-python.html<h3>Demonstrates the use of the <code>SALib</code> python module to sample and test the sensitivity of models</h3>
<hr>
<p>As (geo)data scientists, we spend much of our time working with data models that try (with varying degrees of success) to capture some essential truth about the world while still being as simple as possible to provide a useful abstraction. Inevitably, complexity starts to creep into every model and we don't often stop to assess the value added by that complexity. When working with models that require a large number of parameters and a huge domain of potential inputs that are expensive to collect, it becomes difficult to answer the question:</p>
<p><strong>What parameters of the model are the most sensitive?</strong></p>
<p>In other words, if I am going to spend my resources obtaining/refining data for this model, where should I focus
my efforts in order to get the best bang for the buck? If I spend weeks working on deriving a single parameter for the model,
I want some assurance that the parameter is critically important to the model's prediction.
The flip-side, of course, is that if a parameter is <em>not</em> that important to the model's predictive power, I could
save some time by perhaps just using some quick-and-dirty approximation. </p>
<h3>SALib: a python module for testing model sensitivity</h3>
<p>I was thrilled to find <a href="http://jdherman.github.io/SALib/">SALib</a> which implements a number of vetted methods for quantitatively
assessing parameter sensitivity. There are three basic steps to running SALib:</p>
<ol>
<li>Define the parameters to test, define their domain of possible values and generate <em>n</em> sets of randomized input parameters. </li>
<li>Run the model <em>n</em> times and capture the results.</li>
<li>Analyze the results to identify the most/least sensitive parameters.</li>
</ol>
<p>I'll leave the details of these steps to the <a href="http://jdherman.github.io/SALib/">SALib documentation</a>.
The beauty of the SALib approach is that you have the flexibility[1] to run any model in any way you want, so long as you can manipulate the inputs and outputs adequately.</p>
<h3>Case Study: Climate effects on forestry</h3>
<p>I wanted to compare a forest growth and yield model under different climate change scenarios in order to assess what the most sensitive climate-related variables were. I identified 4 variables:</p>
<ul>
<li>Climate model (4 global circulation models)</li>
<li>Representative Concentration Pathways (RCPs; 3 different emission trajectories)</li>
<li>Mortality factor for species viability (0 to 1)</li>
<li>Mortality factor for equivalent elevation change (0 to 1)</li>
</ul>
<p>In this case, I was using the <a href="http://www.fs.fed.us/fmsc/fvs/">Forest Vegetation Simulator</a> (FVS), which requires
a configuration file for every model iteration. So, for Step 2, I had to iterate through each set of input variables and use them to generate an appropriate configuration file. This involved translating the real numbers from the samples into categorical variables in some cases. Finally, in order to get the result of each model iteration, I had to parse the outputs of FVS and do some post-processing to obtain the variable of interest (the average volume of standing timber over 100 years). So the flexibility of SALib comes at a slight cost: unless your model works directly with the file formats SALib expects, the inputs and outputs may require some data manipulation. </p>
<p>After running all the required iterations of the model[2] I was able to analyze the results and assess the sensitivity of the four parameters. </p>
<p>Here's the output of SALib's analysis (formatted slightly for readability):</p>
<div class="highlight"><pre><span></span><code>Parameter First_Order First_Order_Conf Total_Order Total_Order_Conf
circulation 0.193685 0.041254 0.477032 0.034803
rcp 0.517451 0.047054 0.783094 0.049091
mortviab -0.007791 0.006993 0.013050 0.007081
mortelev -0.005971 0.005510 0.007162 0.006693
</code></pre></div>
<p>The <em>first order effects</em> represent the effect of that parameter alone. The <em>total order effects</em> are arguably more
relevant to understanding the overall interaction of that parameter with your model. The "Conf" columns represent confidence and can be interpreted as error bars.</p>
<p>In this case, we interpret the output as follows:</p>
<div class="highlight"><pre><span></span><code>Parameter Total Order Effect
circulation 0.47 +- 0.03 (moderate influence)
rcp 0.78 +- 0.05 (dominant parameter)
mortviab 0.01 +- 0.007 (weak influence)
mortelev 0.007 +- 0.006 (weak influence)
</code></pre></div>
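<p>Translating the raw indices into a readable summary like the one above is easy to script; a sketch (the thresholds are my own rough labels, not part of SALib):</p>

```python
def describe(total_order, conf):
    """Attach a rough qualitative label to a total-order sensitivity index."""
    if total_order >= 0.5:
        label = 'dominant parameter'
    elif total_order >= 0.1:
        label = 'moderate influence'
    else:
        label = 'weak influence'
    return '{:.2f} +- {:.2f} ({})'.format(total_order, conf, label)

results = {'circulation': (0.477032, 0.034803), 'rcp': (0.783094, 0.049091),
           'mortviab': (0.013050, 0.007081), 'mortelev': (0.007162, 0.006693)}
summary = {name: describe(t, c) for name, (t, c) in results.items()}
```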
<p>We can graph each of the input parameters against the results to visualize this:</p>
<p><img alt="sagraph" src="/assets/img/sagraph.png"></p>
<p>Note that the 'mortelev' component is basically flat (as the factor increases, the result stays the same) whereas the choice of 'rcp' has a heavy influence (as emissions increase to the highest level, the resulting prediction for timber volumes are noticeably decreased).</p>
<p>The conclusion is that the climate variables, particularly the RCPs related to human-caused emissions, were the strongest determinants[3] of tree growth <em>for this particular forest stand</em>. This ran counter to our initial intuition that the mortality factors would play a large role in the model. Based on this sensitivity analysis, we may be able to avoid wasting effort on refining parameters that are of minor consequence to the output.</p>
<hr>
<p>Footnotes:</p>
<ol>
<li>Compared to more tightly integrated, model-specific methods of sensitivity analysis</li>
<li>20 thousand iterations took approximately 8 hours; sensitivity analysis generally requires lots of processing</li>
<li>Note that the influence of a parameter says nothing about direct <em>causality</em></li>
</ol>Leaflet SimpleCSV2013-09-30T00:00:00-06:002013-09-30T00:00:00-06:00Matthew T. Perrytag:www.perrygeo.com,2013-09-30:/leaflet-simplecsv.html<h3>Simple leaflet-based template for mapping tabular point data on a slippy map</h3>
<hr>
<p>Anyone who's worked with spatial data and the web has run across the need to take
some simple tabular data and put points on an interactive map.
It's the fundamental "<em>Hello World</em>" of web mapping. Yet I always find myself spending way too much time
solving this seemingly simple problem. When you consider zoom levels, attributes,
interactivity, clustering, querying, etc... it becomes apparent that interactive maps
require a bit more legwork. But that functionality is fairly consistent case-to-case so I've developed a generalized solution that works for the majority of basic use cases out there: </p>
<p><a class="btn btn-primary" href="https://github.com/perrygeo/leaflet-simple-csv">leaftlet-simple-csv on github</a></p>
<p>The idea is pretty generic but useful for most point marker maps:</p>
<ul>
<li>Data is in tabular delimited-text (csv, etc.) with two required columns: <code>lat</code> and <code>lng</code></li>
<li>Points are plotted on a full-screen <a href="https://github.com/Leaflet/Leaflet">Leaflet</a> map</li>
<li>Point markers are clustered dynamically based on zoom level.</li>
<li>Clicking on a point cluster will zoom into the extent of the underlying features.</li>
<li>Hovering on a point will display the name.</li>
<li>Clicking will display a popup with columns/properties displayed as an html table.</li>
<li>Full text filtering with typeahead</li>
<li>Completely client-side javascript with all dependencies included or linked via CDN</li>
</ul>
<p>Of course this is mostly just a packaged version of existing work, namely <a href="https://github.com/Leaflet/Leaflet">Leaflet</a> with the <a href="https://github.com/joker-x/Leaflet.geoCSV">geoCSV</a> and <a href="https://github.com/Leaflet/Leaflet.markercluster">markercluster</a> plugins.</p>
<h2>Usage</h2>
<ol>
<li>Grab the <a href="https://github.com/perrygeo/leaflet-simple-csv/archive/master.zip">leaflet-simple-csv zip file</a> and unzip it to a location accessible through a web server. </li>
<li>Copy the <code>config.js.template</code> to <code>config.js</code></li>
<li>Visit the <a href="assets/leaflet-simple-csv/index.html">index.html</a> page to confirm everything is working with the built-in example.</li>
<li>Customize your <code>config.js</code> for your dataset.</li>
</ol>
<p>An example config:</p>
<div class="highlight"><pre><span></span><code>var dataUrl = 'data/data.csv';
var maxZoom = 9;
var fieldSeparator = '|';
var baseUrl = 'http://otile{s}.mqcdn.com/tiles/1.0.0/osm/{z}/{x}/{y}.jpg';
var baseAttribution = 'Data, imagery and map information provided by <span class="nt"><a</span> <span class="na">href=</span><span class="s">"http://open.mapquest.co.uk"</span> <span class="na">target=</span><span class="s">"_blank"</span><span class="nt">></span>MapQuest<span class="nt"></a></span>, <span class="nt"><a</span> <span class="na">href=</span><span class="s">"http://www.openstreetmap.org/"</span> <span class="na">target=</span><span class="s">"_blank"</span><span class="nt">></span>OpenStreetMap<span class="nt"></a></span> and contributors, <span class="nt"><a</span> <span class="na">href=</span><span class="s">"http://creativecommons.org/licenses/by-sa/2.0/"</span> <span class="na">target=</span><span class="s">"_blank"</span><span class="nt">></span>CC-BY-SA<span class="nt"></a></span>';
var subdomains = '1234';
var clusterOptions = {showCoverageOnHover: false, maxClusterRadius: 50};
var labelColumn = "Name";
var opacity = 1.0;
</code></pre></div>
<p>The example dataset:</p>
<div class="highlight"><pre><span></span><code>Country|Name|lat|lng|Altitude
United States|New York City|40.7142691|-74.0059738|2.0
United States|Los Angeles|34.0522342|-118.2436829|115.0
United States|Chicago|41.8500330|-87.6500549|181.0
United States|Houston|29.7632836|-95.3632736|15.0
...
</code></pre></div>
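<p>The only hard requirement on the data is the <code>lat</code>/<code>lng</code> columns. A quick way to sanity-check a pipe-delimited file before pointing the map at it (plain Python, not part of the project):</p>

```python
import csv
import io

raw = """Country|Name|lat|lng|Altitude
United States|New York City|40.7142691|-74.0059738|2.0
United States|Chicago|41.8500330|-87.6500549|181.0"""

rows = list(csv.DictReader(io.StringIO(raw), delimiter='|'))
# every row needs parseable lat/lng for its marker to plot
points = [(float(r['lat']), float(r['lng'])) for r in rows]
```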
<p>I make no claims that this is the "right" way to do it but leveraging
100% client-side javascript libraries and native delimited-text formats seems like the simplest approach.
Many of the features included here (clustering, filtering) are useful enough
to apply to most situations and hopefully you'll find it useful.</p>
<hr>
<div><iframe src="http://blog.perrygeo.net/assets/leaflet-simple-csv/index.html" height="450" width="740"></iframe></div>Python rasterstats2013-09-24T00:00:00-06:002013-09-24T00:00:00-06:00Matthew T. Perrytag:www.perrygeo.com,2013-09-24:/python-rasterstats.html<h3>This article introduces a python module for summarizing geospatial raster datasets based on vector geometries (i.e. zonal statistics).</h3>
<p>A common task in many of my data workflows involves "zonal statistics"; summarizing raster data based on vector geometries. Despite many
alternatives (starspan, the QGIS Zonal Statistics plugin, ArcPy and R) there
were none that were</p>
<ul>
<li>open source</li>
<li>fast enough</li>
<li>flexible enough</li>
<li>worked with python data structures</li>
</ul>
<p>We'd written a wrapper around starspan for madrona (see <a href="https://github.com/Ecotrust/madrona/blob/master/docs/raster_stats.rst">madrona.raster_stats</a> ) but
relying on shell calls and an aging, unmaintained C++ code base was not cutting
it.</p>
<p>So I set out to create a solution using numpy, GDAL and python. The
<code>rasterstats</code> package was born. </p>
<p><a href="https://github.com/perrygeo/python-raster-stats" class="btn btn-primary">`python-raster-stats` on github</a></p>
<h2>Example</h2>
<p>Let's jump into an example. I've got a polygon shapefile of continental US
<em>state boundaries</em> and a raster dataset of <em>annual precipitation</em> from the
<a href="http://www.cec.org/Page.asp?PageID=924&ContentID=2336">North American Environmental
Atlas</a>.</p>
<p><img alt="states_precip" src="/assets/img/states_precip.jpeg"></p>
<div class="highlight"><pre><span></span><code><span class="n">states</span> <span class="o">=</span> <span class="s2">"data/boundaries_contus.shp"</span>
<span class="n">precip</span> <span class="o">=</span> <span class="s2">"data/precipitation.tif"</span>
</code></pre></div>
<p>The <code>raster_stats</code> function is the main entry point. Provide a vector and a
raster as input and expect a list of dicts, one for each input feature.</p>
<div class="highlight"><pre><span></span><code><span class="kn">from</span> <span class="nn">rasterstats</span> <span class="kn">import</span> <span class="n">raster_stats</span>
<span class="n">rain_stats</span> <span class="o">=</span> <span class="n">raster_stats</span><span class="p">(</span><span class="n">states</span><span class="p">,</span> <span class="n">precip</span><span class="p">,</span> <span class="n">stats</span><span class="o">=</span><span class="s2">"*"</span><span class="p">,</span> <span class="n">copy_properties</span><span class="o">=</span><span class="kc">True</span><span class="p">)</span>
<span class="nb">len</span><span class="p">(</span><span class="n">rain_stats</span><span class="p">)</span> <span class="c1"># continental US; 48 states plus District of Columbia</span>
<span class="mi">49</span>
</code></pre></div>
<p>Print out the stats for a given state:</p>
<div class="highlight"><pre><span></span><code><span class="p">[</span><span class="n">x</span> <span class="k">for</span> <span class="n">x</span> <span class="ow">in</span> <span class="n">rain_stats</span> <span class="k">if</span> <span class="n">x</span><span class="p">[</span><span class="s1">'NAME'</span><span class="p">]</span> <span class="o">==</span> <span class="s2">"Oregon"</span><span class="p">][</span><span class="mi">0</span><span class="p">]</span>
<span class="p">{</span><span class="s1">'COUNTRY'</span><span class="p">:</span> <span class="s1">'USA'</span><span class="p">,</span>
<span class="s1">'EDIT'</span><span class="p">:</span> <span class="s1">'NEW'</span><span class="p">,</span>
<span class="s1">'EDIT_DATE'</span><span class="p">:</span> <span class="s1">'20060803'</span><span class="p">,</span>
<span class="s1">'NAME'</span><span class="p">:</span> <span class="s1">'Oregon'</span><span class="p">,</span>
<span class="s1">'STATEABB'</span><span class="p">:</span> <span class="s1">'US-OR'</span><span class="p">,</span>
<span class="s1">'Shape_Area'</span><span class="p">:</span> <span class="mf">250563567264.0</span><span class="p">,</span>
<span class="s1">'Shape_Leng'</span><span class="p">:</span> <span class="mf">2366783.00361</span><span class="p">,</span>
<span class="s1">'UIDENT'</span><span class="p">:</span> <span class="mi">124704</span><span class="p">,</span>
<span class="s1">'__fid__'</span><span class="p">:</span> <span class="mi">35</span><span class="p">,</span>
<span class="s1">'count'</span><span class="p">:</span> <span class="mi">250510</span><span class="p">,</span>
<span class="s1">'majority'</span><span class="p">:</span> <span class="mi">263</span><span class="p">,</span>
<span class="s1">'max'</span><span class="p">:</span> <span class="mf">3193.0</span><span class="p">,</span>
<span class="s1">'mean'</span><span class="p">:</span> <span class="mf">779.2223903237395</span><span class="p">,</span>
<span class="s1">'median'</span><span class="p">:</span> <span class="mf">461.0</span><span class="p">,</span>
<span class="s1">'min'</span><span class="p">:</span> <span class="mf">205.0</span><span class="p">,</span>
<span class="s1">'minority'</span><span class="p">:</span> <span class="mi">3193</span><span class="p">,</span>
<span class="s1">'range'</span><span class="p">:</span> <span class="mf">2988.0</span><span class="p">,</span>
<span class="s1">'std'</span><span class="p">:</span> <span class="mf">631.539502512283</span><span class="p">,</span>
<span class="s1">'sum'</span><span class="p">:</span> <span class="mf">195203001.0</span><span class="p">,</span>
<span class="s1">'unique'</span><span class="p">:</span> <span class="mi">2865</span><span class="p">}</span>
</code></pre></div>
<p>Find the three driest states:</p>
<div class="highlight"><pre><span></span><code><span class="p">[(</span><span class="n">x</span><span class="p">[</span><span class="s1">'NAME'</span><span class="p">],</span> <span class="n">x</span><span class="p">[</span><span class="s1">'mean'</span><span class="p">])</span> <span class="k">for</span> <span class="n">x</span> <span class="ow">in</span>
<span class="nb">sorted</span><span class="p">(</span><span class="n">rain_stats</span><span class="p">,</span> <span class="n">key</span><span class="o">=</span><span class="k">lambda</span> <span class="n">k</span><span class="p">:</span> <span class="n">k</span><span class="p">[</span><span class="s1">'mean'</span><span class="p">])[:</span><span class="mi">3</span><span class="p">]]</span>
<span class="p">[(</span><span class="s1">'Nevada'</span><span class="p">,</span> <span class="mf">248.23814034118908</span><span class="p">),</span>
<span class="p">(</span><span class="s1">'Utah'</span><span class="p">,</span> <span class="mf">317.668743027571</span><span class="p">),</span>
<span class="p">(</span><span class="s1">'Arizona'</span><span class="p">,</span> <span class="mf">320.6157232064074</span><span class="p">)]</span>
</code></pre></div>
<p>And write the data out to a csv.</p>
<div class="highlight"><pre><span></span><code><span class="kn">from</span> <span class="nn">rasterstats</span> <span class="kn">import</span> <span class="n">stats_to_csv</span>
<span class="k">with</span> <span class="nb">open</span><span class="p">(</span><span class="s1">'out.csv'</span><span class="p">,</span> <span class="s1">'w'</span><span class="p">)</span> <span class="k">as</span> <span class="n">fh</span><span class="p">:</span>
<span class="n">fh</span><span class="o">.</span><span class="n">write</span><span class="p">(</span><span class="n">stats_to_csv</span><span class="p">(</span><span class="n">rain_stats</span><span class="p">))</span>
</code></pre></div>
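<p>Under the hood, a helper like <code>stats_to_csv</code> just has to flatten a list of result dicts into CSV text. A standalone sketch of that idea (not rasterstats' actual implementation; <code>dicts_to_csv</code> is a hypothetical name):</p>

```python
import csv
import io

def dicts_to_csv(records):
    """Flatten a list of result dicts into CSV text, using the union
    of all keys (sorted) as the header row."""
    fields = sorted({key for rec in records for key in rec})
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=fields)
    writer.writeheader()
    writer.writerows(records)
    return buf.getvalue()

rows = [{"NAME": "Nevada", "mean": 248.2},
        {"NAME": "Utah", "mean": 317.7}]
csv_text = dicts_to_csv(rows)
print(csv_text)
```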
<h2>Geo interface</h2>
<p>The basic usage above passes the path of an entire OGR vector layer as the first argument. But <code>rasterstats</code>
also supports several other vector feature/geometry inputs:</p>
<ul>
<li>Well-Known Text/Binary</li>
<li>GeoJSON string and mappings</li>
<li>Any python object that supports the <a href="https://gist.github.com/sgillies/2217756">geo_interface</a></li>
<li>Single objects or iterables</li>
</ul>
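<p>The geo_interface support means any Python object exposing a <code>__geo_interface__</code> property can be passed in directly. A minimal sketch of such an object (the <code>Square</code> class is invented purely for illustration):</p>

```python
# A minimal object implementing the geo_interface protocol: any function
# that accepts geo_interface objects can consume an instance directly.
class Square:
    def __init__(self, x0, y0, size):
        self.x0, self.y0, self.size = x0, y0, size

    @property
    def __geo_interface__(self):
        x0, y0, s = self.x0, self.y0, self.size
        # A closed polygon ring: first and last coordinates are the same.
        return {
            "type": "Polygon",
            "coordinates": [[(x0, y0), (x0 + s, y0), (x0 + s, y0 + s),
                             (x0, y0 + s), (x0, y0)]],
        }

print(Square(0, 0, 10).__geo_interface__["type"])  # Polygon
```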
<p>In this example, I use a GeoJSON-like Python mapping to specify a single geometry:</p>
<div class="highlight"><pre><span></span><code><span class="n">geom</span> <span class="o">=</span> <span class="p">{</span><span class="s1">'coordinates'</span><span class="p">:</span> <span class="p">[[</span>
<span class="p">[</span><span class="o">-</span><span class="mf">594335.108537269</span><span class="p">,</span> <span class="o">-</span><span class="mf">570957.932799394</span><span class="p">],</span>
<span class="p">[</span><span class="o">-</span><span class="mf">422374.54395311</span><span class="p">,</span> <span class="o">-</span><span class="mf">593387.5716581973</span><span class="p">],</span>
<span class="p">[</span><span class="o">-</span><span class="mf">444804.1828119133</span><span class="p">,</span> <span class="o">-</span><span class="mf">765348.1362423564</span><span class="p">],</span>
<span class="p">[</span><span class="o">-</span><span class="mf">631717.839968608</span><span class="p">,</span> <span class="o">-</span><span class="mf">735441.9510972851</span><span class="p">],</span>
<span class="p">[</span><span class="o">-</span><span class="mf">594335.108537269</span><span class="p">,</span> <span class="o">-</span><span class="mf">570957.932799394</span><span class="p">]]],</span>
<span class="s1">'type'</span><span class="p">:</span> <span class="s1">'Polygon'</span><span class="p">}</span>
<span class="n">raster_stats</span><span class="p">(</span><span class="n">geom</span><span class="p">,</span> <span class="n">precip</span><span class="p">,</span> <span class="n">stats</span><span class="o">=</span><span class="s2">"min median max"</span><span class="p">)</span>
<span class="p">[{</span><span class="s1">'__fid__'</span><span class="p">:</span> <span class="mi">0</span><span class="p">,</span> <span class="s1">'max'</span><span class="p">:</span> <span class="mf">1011.0</span><span class="p">,</span> <span class="s1">'median'</span><span class="p">:</span> <span class="mf">451.0</span><span class="p">,</span> <span class="s1">'min'</span><span class="p">:</span> <span class="mf">229.0</span><span class="p">}]</span>
</code></pre></div>
<h2>Categorical</h2>
<p>We're not limited to descriptive statistics for <em>continuous</em> rasters either; we
can get unique pixel counts for <em>categorical</em> rasters as well. In this example,
we've got a raster of 2005 land cover (i.e. general vegetation type). </p>
<p><img alt="states_veg" src="/assets/img/states_veg.jpeg"></p>
<p>Note that we specify only the stats that make sense for categorical data, and that <code>categorical=True</code>
adds a count of each unique pixel value.</p>
<div class="highlight"><pre><span></span><code><span class="o">>>></span> <span class="n">landcover</span> <span class="o">=</span> <span class="s2">"/data/workspace/rasterstats_blog/NA_LandCover_2005.img"</span>
<span class="o">>>></span> <span class="n">veg_stats</span> <span class="o">=</span> <span class="n">raster_stats</span><span class="p">(</span><span class="n">states</span><span class="p">,</span> <span class="n">landcover</span><span class="p">,</span>
<span class="n">stats</span><span class="o">=</span><span class="s2">"count majority minority unique"</span><span class="p">,</span>
<span class="n">copy_properties</span><span class="o">=</span><span class="kc">True</span><span class="p">,</span>
<span class="n">nodata_value</span><span class="o">=</span><span class="mi">0</span><span class="p">,</span>
<span class="n">categorical</span><span class="o">=</span><span class="kc">True</span><span class="p">)</span>
<span class="o">>>></span> <span class="p">[</span><span class="n">x</span> <span class="k">for</span> <span class="n">x</span> <span class="ow">in</span> <span class="n">veg_stats</span> <span class="k">if</span> <span class="n">x</span><span class="p">[</span><span class="s1">'NAME'</span><span class="p">]</span> <span class="o">==</span> <span class="s2">"Oregon"</span><span class="p">][</span><span class="mi">0</span><span class="p">]</span>
<span class="p">{</span><span class="mi">1</span><span class="p">:</span> <span class="mi">999956</span><span class="p">,</span>
<span class="mi">3</span><span class="p">:</span> <span class="mi">6</span><span class="p">,</span>
<span class="mi">5</span><span class="p">:</span> <span class="mi">3005</span><span class="p">,</span>
<span class="mi">6</span><span class="p">:</span> <span class="mi">198535</span><span class="p">,</span>
<span class="mi">8</span><span class="p">:</span> <span class="mi">2270805</span><span class="p">,</span>
<span class="mi">10</span><span class="p">:</span> <span class="mi">126199</span><span class="p">,</span>
<span class="mi">14</span><span class="p">:</span> <span class="mi">20883</span><span class="p">,</span>
<span class="mi">15</span><span class="p">:</span> <span class="mi">301884</span><span class="p">,</span>
<span class="mi">16</span><span class="p">:</span> <span class="mi">17452</span><span class="p">,</span>
<span class="mi">17</span><span class="p">:</span> <span class="mi">39246</span><span class="p">,</span>
<span class="mi">18</span><span class="p">:</span> <span class="mi">28872</span><span class="p">,</span>
<span class="mi">19</span><span class="p">:</span> <span class="mi">2174</span><span class="p">,</span>
<span class="s1">'COUNTRY'</span><span class="p">:</span> <span class="s1">'USA'</span><span class="p">,</span>
<span class="s1">'EDIT'</span><span class="p">:</span> <span class="s1">'NEW'</span><span class="p">,</span>
<span class="s1">'EDIT_DATE'</span><span class="p">:</span> <span class="s1">'20060803'</span><span class="p">,</span>
<span class="s1">'NAME'</span><span class="p">:</span> <span class="s1">'Oregon'</span><span class="p">,</span>
<span class="s1">'STATEABB'</span><span class="p">:</span> <span class="s1">'US-OR'</span><span class="p">,</span>
<span class="s1">'Shape_Area'</span><span class="p">:</span> <span class="mf">250563567264.0</span><span class="p">,</span>
<span class="s1">'Shape_Leng'</span><span class="p">:</span> <span class="mf">2366783.00361</span><span class="p">,</span>
<span class="s1">'UIDENT'</span><span class="p">:</span> <span class="mi">124704</span><span class="p">,</span>
<span class="s1">'__fid__'</span><span class="p">:</span> <span class="mi">35</span><span class="p">,</span>
<span class="s1">'count'</span><span class="p">:</span> <span class="mi">4009017</span><span class="p">,</span>
<span class="s1">'majority'</span><span class="p">:</span> <span class="mi">8</span><span class="p">,</span>
<span class="s1">'minority'</span><span class="p">:</span> <span class="mi">3</span><span class="p">,</span>
<span class="s1">'unique'</span><span class="p">:</span> <span class="mi">12</span><span class="p">}</span>
</code></pre></div>
<p>Of course the raw pixel values alone don't make much sense. We need to interpret
them as land cover classes:</p>
<div class="highlight"><pre><span></span><code>Value  Class_name
1 Temperate or sub-polar needleleaf forest
2 Sub-polar taiga needleleaf forest
3 Tropical or sub-tropical broadleaf evergreen
4 Tropical or sub-tropical broadleaf deciduous
5 Temperate or sub-polar broadleaf deciduous
6 Mixed Forest
7 Tropical or sub-tropical shrubland
8 Temperate or sub-polar shrubland
9 Tropical or sub-tropical grassland
10 Temperate or sub-polar grassland
11 Sub-polar or polar shrubland-lichen-moss
12 Sub-polar or polar grassland-lichen-moss
13 Sub-polar or polar barren-lichen-moss
14 Wetland
15 Cropland
16 Barren Lands
17 Urban and Built-up
18 Water
19 Snow and Ice
</code></pre></div>
<p>So, for our Oregon example above we can see that, despite Oregon's reputation as
a lush green landscape, the majority land cover class (#8) is "Temperate or
sub-polar shrubland", at 2.27 million pixels out of roughly 4 million total.</p>
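<p>Doing that interpretation in Python is just a dict lookup over the integer keys. A sketch, using a hypothetical subset of the class table and a trimmed-down copy of the Oregon record above:</p>

```python
# Hypothetical subset of the Value -> Class_name table above.
CLASS_NAMES = {
    1: "Temperate or sub-polar needleleaf forest",
    6: "Mixed Forest",
    8: "Temperate or sub-polar shrubland",
    10: "Temperate or sub-polar grassland",
}

# Trimmed-down stand-in for one record of the categorical output.
oregon = {1: 999956, 6: 198535, 8: 2270805, 10: 126199,
          'NAME': 'Oregon', 'count': 4009017}

# Keep only the integer (pixel-value) keys and relabel them.
named = {CLASS_NAMES.get(k, k): v for k, v in oregon.items()
         if isinstance(k, int)}
majority = max(named, key=named.get)
print(majority)
```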
<p>There's a lot more functionality that isn't covered in this post but you get the
picture... please check it out and let me know what you think.</p>
<h1>Creating UTFGrids directly from a polygon datasource</h1>
<p><em>2012-08-20 · Matthew T. Perry</em></p>
<p>We've begun to rely on the interactivity provided by <a href="http://mapbox.com/mbtiles-spec/utfgrid/">UTFGrids</a> in many of our recent web maps. (Quick recap: UTFGrids are "invisible" map tiles that allow direct interactivity with feature attributes without querying the server.) Earlier this year, I created the <a href="/2012/02/24/utfgrids-with-openlayers-and-tilestache/">initial OpenLayers UTFGrid support</a> and was glad to see it accepted into OpenLayers 2.12 (with some enhancements).</p>
<p>With the client-side javascript support in place, the only missing piece in the workflow was to create the UTFGrid .json files.
We had experimented with several alternate <a href="https://github.com/springmeyer/utfgrid-example-writers">UTFGrid renderers</a>, but Mapnik's rendering was by far the fastest and produced the best results.
Using Tilemill was a convenient way to leverage the Mapnik UTFGrid renderer, but it came at the cost of a somewhat circuitous and manual workflow: </p>
<ol>
<li>Load the data into <a href="http://mapbox.com/tilemill/">Tilemill</a></li>
<li>Configure interactivity fields</li>
<li>Export to .mbtiles</li>
<li><a href="http://blog.perrygeo.net/2012/03/25/working-with-mbtiles-in-python/">Convert to .json files</a></li>
</ol>
<p>What we really needed was a <strong>script to take a polygon shapefile and render the UTFGrids directly to files</strong>. <a href="http://mapnik.org">Mapnik</a> would provide the rendering while the <a href="http://www.maptiler.org/google-maps-coordinates-tile-bounds-projection/globalmaptiles.py">Global Map Tiles</a> python module would provide the logic for going back and forth between geographic coordinates and tile grid coordinates. From there it's just a matter of determining the extent of the data set and, for a specified set of zoom levels, looping through and using Mapnik to render the UTFGrid to a .json file in <code>Z/X/Y.json</code> directory structure. </p>
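<p>The coordinate math at the heart of that loop can be sketched in a few lines. This is a toy stand-in for the Global Map Tiles module's logic (the function and constant names are mine), converting spherical-mercator meters to tile indices in a top-origin scheme:</p>

```python
TILE_SIZE = 256
ORIGIN = 20037508.342789244  # half the web-mercator world width, in meters

def meters_to_tile(mx, my, zoom):
    """Spherical-mercator meters -> (tx, ty) in a top-origin (OSM/Google)
    tile scheme. The real GlobalMercator module also handles pixel coords,
    tile bounds, and TMS (bottom-origin) flipping."""
    res = 2 * ORIGIN / (TILE_SIZE * 2 ** zoom)  # meters per pixel
    px = (mx + ORIGIN) / res
    py = (ORIGIN - my) / res                    # top-origin: y grows southward
    return int(px // TILE_SIZE), int(py // TILE_SIZE)

print(meters_to_tile(0.0, 0.0, 1))  # the origin falls in tile (1, 1) at zoom 1
```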
<p><a href="https://github.com/Ecotrust/create-utfgrids" class="btn btn-primary">Get `create-utfgrids` on github</a></p>
<p>If we have a mercator polygon shapefile of ecoregions and want to render UTFGrids for zoom levels 3 through 5 using the <code>dom_desc</code> and <code>div_desc</code> attributes, we could use a command like</p>
<div class="highlight"><pre><span></span><code>$ ./create_utfgrids.py test_data/bailey_merc.shp <span class="m">3</span> <span class="m">5</span> ecoregions -f dom_desc,div_desc
WARNING:
This script assumes a polygon shapefile <span class="k">in</span> spherical mercator projection.
If any of these assumptions are not true, don<span class="err">'</span>t count on the results!
* Processing Zoom Level <span class="m">3</span>
* Processing Zoom Level <span class="m">4</span>
* Processing Zoom Level <span class="m">5</span>
</code></pre></div>
<p>and inspect the output (e.g. zoom level 5, X=20, Y=18)</p>
<div class="highlight"><pre><span></span><code>$ cat ecoregions/5/20/18.json <span class="p">|</span> python -mjson.tool
<span class="o">{</span>
<span class="s2">"data"</span>: <span class="o">{</span>
<span class="s2">"192"</span>: <span class="o">{</span>
<span class="s2">"div_desc"</span>: <span class="s2">"RAINFOREST REGIME MOUNTAINS"</span>,
<span class="s2">"dom_desc"</span>: <span class="s2">"HUMID TROPICAL DOMAIN"</span>
<span class="o">}</span>,
...
<span class="s2">"grid"</span>: <span class="o">[</span>
<span class="s2">" !!!!!!!!!#####</span>$<span class="s2">%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%"</span>,
...
</code></pre></div>
<p>Some caveats:</p>
<ul>
<li>This currently only works for polygon datasets in a Web Mercator projection.</li>
<li>It's only tested with shapefiles as it assumes a single-layer datasource at the moment. Full OGR Datasource support would not be too difficult to add for PostGIS, etc.</li>
<li>It assumes a top-origin tile scheme (as do OSM and Google Maps). Supporting TMS bottom-origin schemes in the future should be straightforward. </li>
<li>Requires OGR and Mapnik >= 2.0 with python bindings. Finding windows binaries for the required version of Mapnik may be difficult so using OSX/Linux is recommended at this time. </li>
</ul>
<p>Many thanks to Dane Springmeyer for his help on UTFGrid related matters
and to Klokan Petr Přidal for his <a href="http://www.maptiler.org/google-maps-coordinates-tile-bounds-projection/">MapTiler docs</a>.</p>
<h1>Introducing the Madrona framework</h1>
<p><em>2012-07-11 · Matthew T. Perry</em></p>
<h3><a href="http://madrona.ecotrust.org">Madrona</a>: A software framework for effective place-based decision making</h3>
<p><img alt="Madrona" src="http://madrona.ecotrust.org/assets/img/madrona-logo.png"></p>
<p>My work at <a href="http://www.ecotrust.org/">Ecotrust</a> mainly revolves around creating web-based spatial analysis tools - software to bring data-driven science to the place-based decision making process. This began several years ago when I joined the MarineMap team. Since working with Ecotrust, we've taken the MarineMap software far beyond its original niche. What was once a specific tool for marine protected area planning has now become a powerful framework for <a href="http://madrona.ecotrust.org/experience/">all sorts of web-based spatial tools</a> in the realms of marine, forestry, conservation planning, aquatic habitat restoration, etc. So, in a sense, <a href="http://madrona.ecotrust.org">Madrona</a> is a recognition of that evolution.</p>
<p>From the official <a href="http://madrona.ecotrust.org">Madrona</a> release announcement from the <a href="http://blog.ecotrust.org/software-for-21st-century-decisions-2/">Ecotrust blog post</a>:</p>
<blockquote>
<p>Over the last year we’ve distilled the best ideas from our most successful tools into a suite of software building blocks that can be mixed and matched to create cutting-edge software for decision support and spatial planning at any scale. These building blocks are already at the heart of our work and now we’re ready to share them with you.</p>
</blockquote>
<p>So what is <a href="http://madrona.ecotrust.org">Madrona</a> from a developer's perspective? </p>
<ul>
<li>A set of <em>python</em> <em>django</em> apps that provide models, views and templates for representing spatial features and solving problems specific to spatial decision tools.</li>
<li>A RESTful <em>API</em> for accessing spatial features</li>
<li>A collection of <em>javascript</em> libraries (based on JQuery) to provide a web-based interface to the API.</li>
</ul>
<p>In short, we think it's a great platform for spatial tools and we want to open it up to the wider developer audience. Ecotrust already has many <a href="http://madrona.ecotrust.org/experience/">madrona-based apps</a> in the wild (with many more in development) but we're hoping to get other folks using (and contributing to) the Madrona framework in the future.</p>
<p>I know this post is short on technical details but there will be more to come ... for now, check out the <a href="http://madrona.ecotrust.org/technology/">technology page</a> for an overview or the <a href="http://madrona.ecotrust.org/developer/">developer's page</a> to dive in.</p>
<h1>Migrating from Wordpress to Jekyll</h1>
<p><em>2012-04-28 · Matthew T. Perry</em></p>
<p>I just switched this blog from an ancient version of wordpress running on a VPS
to a static-file <a href="http://jekyllbootstrap.com/">jekyll bootstrap</a> site
(hosted by <a href="http://github.com/perrygeo/perrygeo.github.com">github</a>).
Let me know if you experience any weirdness on the site or feeds. I've taken good measures to make sure links don't break (old URLs should get a 301 permanent redirect to blog.perrygeo.net) but let me know if you get any 404s.</p>
<h3>So why do it?</h3>
<ol>
<li>Having a PHP-MySQL app running on a VPS just to serve up a bunch of blog posts seemed excessive. I don't have the desire to maintain that sort of infrastructure for a simple blog!</li>
<li>Wordpress' editing and admin interface suck. I prefer vim and bash.</li>
<li>Markdown is a great language for quickly banging out blog posts.</li>
<li>Static files just make sense for what is basically static content.</li>
<li>Github pages provides the hosting for me and even handles CNAMEs for DNS.</li>
<li>Managing revisions with <code>git</code>.</li>
</ol>
<h3>The conversion process</h3>
<p>It was not an entirely smooth transition, most of which can be traced directly to dumb decisions on my part. I won't recount the entire process (there are plenty of guides on the internet) but I'll outline the major steps here:</p>
<ol>
<li>Export the wordpress blog to an xml file. I had to use <code>xmllint</code> to clean it up a bit.</li>
<li>Set up a <a href="http://disqus.com">disqus</a> account and import my wordpress file. Disqus will handle all the comments which are the only dynamic content on the page. </li>
<li>Use <a href="https://github.com/thomasf/exitwp">exitwp.py</a> to convert the xml to jekyll markdown files. This worked OK. Not great. Tags and formatting did not come through as expected and I had to wrestle the script a bit. Tables were destroyed and some iframes (youtube links) were lost. </li>
<li>Forked Jekyll Bootstrap and brought in my posts. </li>
<li>Started tweaking the CSS and markdown to get the formatting right. Still have a ways to go on this front - let me know if there is any content you'd like me to restore faster than others.</li>
<li>Had to write a little web service to redirect posts; the old blog stupidly used the default wordpress URLs like <code>/wordpress/?p=4</code> which needed to go to <code>/2010/01/01/blah</code></li>
<li>My images were all over the place; some I had in wordpress uploads, others on various servers, some were absolute links, others relative. Gathering them all in one place and using some sed-fu to get the paths right was essential.</li>
<li>Retagged some posts - still working on tags.</li>
<li>Set up Google Analytics to track usage. </li>
</ol>
<p>I think that's about it. There are still some big formatting problems on older posts (mostly due to the fact that I used blockquotes for code). And tables are still destroyed. I'll be working on cleaning these up as I go along. </p>
<p>Overall impression of Jekyll-Bootstrap and hosting with Github pages? <strong>Awesome</strong>. I would highly recommend it to anyone starting a new blog or converting a smaller/better-behaved wordpress site.
It is so much better than having to deal with PHP and MySQL (hopefully the last time I'll ever see them!). But the conversion was a bit tricky and took way more of my Friday and Saturday than I'd like to admit. I would not want to do that again... but I'm glad I did.</p>
<p>What do you think of the new digs?</p>
<h1>Working with mbtiles in python</h1>
<p><em>2012-03-25 · Matthew T. Perry</em></p>
<p><a href="https://github.com/perrygeo/python-mbtiles">python-mbtiles</a>. Check it out.</p>
<p>I've been working a bit with Tilemill lately and love the Carto CSS styling, interactivity through UTFGrids and being able to export the whole deal as a single <a href="http://mapbox.com/mbtiles-spec/">mbtiles</a> sqlite database. But when it comes to working with the mbtiles databases, I've found both Tilestache and Tilestream to be fairly limiting:</p>
<p>Tilestache serves images but does not (yet) serve up UTFGrids <em>directly from mbtiles</em>, while Tilestream hardcodes a "grid()" JSONP callback around the returned json data, making it fairly specific to the Wax client libraries.</p>
<p>So I went down two paths, first trying to export all the tiles out of mbtiles to json and png files (for those times when you just want to serve static files), then trying to write a simple server that would do dynamic jsonp callbacks. Turns out that in the process, I was able to abstract a lot of the Python&lt;-&gt;SQLite interaction into some generic python classes.</p>
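<p>The core of those generic classes is just parameterized SQL against the <code>tiles</code> table defined by the mbtiles spec. A minimal reader sketch (not python-mbtiles' actual API; names are illustrative), exercised against a throwaway database:</p>

```python
import os
import sqlite3
import tempfile

class MBTiles:
    """Minimal reader for an mbtiles 'tiles' table.

    A sketch of the generic-class idea, not python-mbtiles' real API.
    The table schema comes from the mbtiles spec.
    """
    def __init__(self, path):
        self.db = sqlite3.connect(path)

    def tile(self, z, x, y):
        """Return the raw tile blob for (z, x, y), or None if absent."""
        row = self.db.execute(
            "SELECT tile_data FROM tiles"
            " WHERE zoom_level=? AND tile_column=? AND tile_row=?",
            (z, x, y),
        ).fetchone()
        return row[0] if row else None

# Build a throwaway database just to exercise the reader.
path = os.path.join(tempfile.mkdtemp(), "demo.mbtiles")
con = sqlite3.connect(path)
con.execute("CREATE TABLE tiles (zoom_level INTEGER, tile_column INTEGER,"
            " tile_row INTEGER, tile_data BLOB)")
con.execute("INSERT INTO tiles VALUES (0, 0, 0, ?)", (b"fake-png-bytes",))
con.commit()

mb = MBTiles(path)
print(mb.tile(0, 0, 0))
```

Note that mbtiles stores rows in TMS (bottom-origin) order, so a real reader also has to flip the y coordinate for top-origin schemes.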
<p>Thus <a href="https://github.com/perrygeo/python-mbtiles">python-mbtiles</a> was born. It provides a simple mbtiles web server, a conversion script, and some python classes to work with. No frills, no anything really at this point. More an experiment gone right that might be useful to someone out there in GeoPython land. Enjoy and let me know if you have any ideas!</p>
<h1>Average Aspect</h1>
<p><em>2012-03-18 · Matthew T. Perry</em></p>
<p>Ever try to figure out what the average aspect of an area is? i.e.</p>
<blockquote>
<p>What direction does this hillside face?</p>
</blockquote>
<p>Let's say we want to determine the average elevation of an area based on a raster DEM. Just take the arithmetic mean of all the elevation cells contained in the area - a simple zonal statistics problem.</p>
<p>Turns out that aspect is not quite as straightforward. True, we can easily use <a href="http://www.gdal.org/gdaldem.html">gdaldem</a> to create an aspect map.</p>
<p><code>gdaldem aspect elevation.tif aspect.tif</code></p>
<p>This gives a raster with values in degrees: 0 is north, 90 is east, 180 is south, etc... but note that 360 is north as well. We're dealing with angular units, not linear units. </p>
<p>For example, take a nearly north-facing hillside; the left edge faces slightly NW (350 degrees) while the right edge faces slightly NE (10 degrees).</p>
<p>The arithmetic mean of the aspect values = <code>(350+350+10+10)/4 = 180°</code>. Due south? That's entirely wrong! It doesn't take into account the angular units. For that we need to create grids representing the <em>sin</em> and <em>cos</em> of the aspect. </p>
<p>Luckily you can use the handy <a href="http://svn.osgeo.org/gdal/trunk/gdal/swig/python/scripts/gdal_calc.py">gdal_calc.py</a> utility that comes with recent versions of gdal. This allows you to apply numpy's trigonometric functions to a raster...</p>
<div class="highlight"><pre><span></span><code>gdal_calc.py -A aspect.tif --calc "cos(radians(A))" --format "GTiff" --outfile cos_aspect.tif
gdal_calc.py -A aspect.tif --calc "sin(radians(A))" --format "GTiff" --outfile sin_aspect.tif
</code></pre></div>
<p>Now we can look at the sum of the cos/sin grid cells for our area and take the arctangent according to this python code</p>
<div class="highlight"><pre><span></span><code><span class="kn">import</span> <span class="nn">math</span>
<span class="c1"># atan2(east, north) gives the compass bearing (0 = north) of the summed vectors</span>
<span class="n">avg_aspect_rad</span> <span class="o">=</span> <span class="n">math</span><span class="o">.</span><span class="n">atan2</span><span class="p">(</span><span class="nb">sum</span><span class="p">(</span><span class="n">sin_cells</span><span class="p">),</span> <span class="nb">sum</span><span class="p">(</span><span class="n">cos_cells</span><span class="p">))</span>
<span class="n">avg_aspect_deg</span> <span class="o">=</span> <span class="n">math</span><span class="o">.</span><span class="n">degrees</span><span class="p">(</span><span class="n">avg_aspect_rad</span><span class="p">)</span> <span class="o">%</span> <span class="mi">360</span>
<span class="nb">print</span><span class="p">(</span><span class="n">avg_aspect_deg</span><span class="p">)</span>
</code></pre></div>
<p>In our example avg_aspect_deg comes out to an aspect of 0 degrees (due north) which is exactly what we'd expect. </p>
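<p>The whole circular-mean trick fits in a few self-contained lines. A sketch (the <code>mean_aspect</code> helper name is mine, not part of any library), using the hillside example from above:</p>

```python
import math

def mean_aspect(bearings):
    """Circular mean of compass bearings (0 = north), in degrees.

    Decompose each bearing into east (sin) and north (cos) components,
    sum them, and take the bearing of the resulting vector.
    """
    east = sum(math.sin(math.radians(b)) for b in bearings)
    north = sum(math.cos(math.radians(b)) for b in bearings)
    return math.degrees(math.atan2(east, north)) % 360

hillside = [350, 350, 10, 10]
naive = sum(hillside) / len(hillside)   # 180.0 -- due south, nonsense
circular = mean_aspect(hillside)        # ~0 -- due north, as expected
print(naive, circular)
```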
<p>Thanks to Dan Patterson for his <a href="http://forums.esri.com/Thread.asp?c=3&f=40&t=119358&mc=8#343468">forum post</a> which clued me into this approach.</p>
<h1>UTFGrids with OpenLayers and Tilestache</h1>
<p><em>2012-02-24 · Matthew T. Perry</em></p>
<p>A while back, the Development Seed team developed the <a href="http://mapbox.com/mbtiles-spec/utfgrid/">UTFGrid spec</a> to provide</p>
<blockquote>
<p>a standard, scalable way of encoding data for hundreds or thousands of features alongside your map tiles.</p>
</blockquote>
<h3>The basics</h3>
<p>In more detail, UTFGrids are invisible "ASCII art" and attribute data embedded in json. They sit "behind" your map tiles (they are not rendered visually) and allow quick attribute lookups <em>without</em> going back to the server. This enables a high degree of real-time map interactivity in an HTML web map - something that has typically been the strong point of plugin-based maps.</p>
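<p>The "ASCII art" characters aren't arbitrary: per the UTFGrid spec, each one encodes an index into the tile's <code>keys</code> array, with the codepoints for <code>"</code> and <code>\</code> skipped so the grid remains valid JSON. A small decoder sketch:</p>

```python
def grid_index(ch):
    """UTFGrid character -> index into the tile's 'keys' array,
    following the decoding rules in the UTFGrid spec."""
    code = ord(ch)
    if code >= 93:   # account for skipped backslash (92)
        code -= 1
    if code >= 35:   # account for skipped double quote (34)
        code -= 1
    return code - 32

# ' ' is key 0 (typically "no feature"); '!' is key 1, '#' key 2, and so on.
print([grid_index(c) for c in " !#$%"])
```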
<p>So take this tile image...</p>
<p><img alt="" src="http://vmap0.tiles.osgeo.org/wms/vmap0?LAYERS=basic&SERVICE=WMS&VERSION=1.1.1&REQUEST=GetMap&STYLES=&FORMAT=image%2Fjpeg&SRS=EPSG%3A900913&BBOX=-0.0007999986410141,5009377.084,5009377.084,10018754.1688&WIDTH=256&HEIGHT=256"> </p>
<p>and its corresponding "utfgrid" ...</p>
<div class="highlight"><pre><span></span><code><span class="w"> </span><span class="o">!</span><span class="err">######</span><span class="o">$$$$%%%</span><span class="w"> </span><span class="o">%%%%</span><span class="w"> </span><span class="o">%</span><span class="w"> </span>
<span class="w"> </span><span class="o">!</span><span class="err">#######</span><span class="o">$$$$%%%</span><span class="w"> </span><span class="o">%%%</span><span class="w"></span>
<span class="w"> </span><span class="o">!!</span><span class="err">#####</span><span class="w"> </span><span class="o">$$$%%%</span><span class="w"> </span><span class="o">%%%</span><span class="w"></span>
<span class="w"> </span><span class="o">!</span><span class="err">######</span><span class="w"> </span><span class="o">$$$$%%%</span><span class="w"> </span><span class="o">%%</span><span class="w"> </span><span class="o">%%%</span><span class="w"></span>
<span class="w"> </span><span class="o">!!!</span><span class="err">####</span><span class="w"> </span><span class="o">$$$$$%%%%</span><span class="w"> </span><span class="o">%%%%</span><span class="w"></span>
<span class="w"> </span><span class="o">!</span><span class="w"> </span><span class="o">!</span><span class="err">######</span><span class="w"> </span><span class="o">$$$$$$%%%%%%%%%%</span><span class="w"></span>
<span class="w"> </span><span class="o">!</span><span class="w"> </span><span class="o">!!</span><span class="err">#####</span><span class="w"> </span><span class="o">$$$$$$$%%%%%%%%%</span><span class="w"></span>
<span class="w"> </span><span class="o">!!!!!</span><span class="err">####</span><span class="w"> </span><span class="o">$$$$$$%%%%%%%%%%</span><span class="w"></span>
<span class="w"> </span><span class="o">!!!!!</span><span class="err">####</span><span class="w"> </span><span class="o">$$$$$$%%%%%%%%%%</span><span class="w"></span>
<span class="w"> </span><span class="o">!!!!!</span><span class="err">####</span><span class="w"> </span><span class="o">$$$$$%%%%%%%%%%%</span><span class="w"></span>
<span class="w"> </span><span class="o">!!!!!</span><span class="err">#####</span><span class="o">%</span><span class="w"> </span><span class="o">$$</span><span class="w"> </span><span class="o">%%%%%%%%%%%</span><span class="w"></span>
<span class="w"> </span><span class="o">!!!!!</span><span class="err">###</span><span class="w"> </span><span class="err">#</span><span class="w"> </span><span class="o">%%%%%%%%%%%%</span><span class="w"></span>
<span class="w"> </span><span class="o">!!!</span><span class="w"> </span><span class="err">#####</span><span class="w"> </span><span class="s1">''''</span><span class="o">%%%%%%%%%%%%</span><span class="w"></span>
<span class="w"> </span><span class="o">!</span><span class="w"> </span><span class="err">###</span><span class="w"> </span><span class="o">(</span><span class="err">'</span><span class="o">%%%%%%%%%%%%</span><span class="w"></span>
<span class="w"> </span><span class="o">)</span><span class="w"> </span><span class="err">###</span><span class="w"> </span><span class="err">#</span><span class="w"> </span><span class="o">(</span><span class="w"> </span><span class="o">((%%%%%%%%%%%%</span><span class="w"></span>
<span class="w"> </span><span class="o">))</span><span class="w"> </span><span class="err">##</span><span class="w"> </span><span class="o">(((((%%%%%%%%%%%%</span><span class="w"></span>
<span class="w"> </span><span class="o">))</span><span class="w"> </span><span class="err">#</span><span class="w"> </span><span class="o">****(+%%%%%%%%%%%</span><span class="w"></span>
<span class="w"> </span><span class="o">)</span><span class="w"> </span><span class="o">%**++++%%%%%%%%%</span><span class="w"></span>
<span class="w"> </span><span class="o">,</span><span class="w"> </span><span class="o">,</span><span class="w"> </span><span class="nt">------</span><span class="o">*+++++%%%%%%%%%</span><span class="w"></span>
<span class="o">.</span><span class="w"> </span><span class="o">,,,,,</span><span class="nt">------</span><span class="o">+++++++%%%%%%%%</span><span class="w"></span>
<span class="o">..</span><span class="w"> </span><span class="o">/,,,,,,</span><span class="nt">------</span><span class="o">++++++%%%%%%%%%</span><span class="w"></span>
<span class="o">.</span><span class="w"> </span><span class="o">//,,,,,,</span><span class="nt">------000</span><span class="o">++</span><span class="nt">000</span><span class="o">%%%%%%%</span><span class="w"></span>
<span class="w"> </span><span class="nt">211</span><span class="o">,,,,,</span><span class="nt">33------00000000</span><span class="o">%%%%%%</span><span class="w"></span>
<span class="w"> </span><span class="nt">2221</span><span class="o">,,,,</span><span class="nt">33333---00000000000</span><span class="o">%%%%</span><span class="w"></span>
<span class="nt">222222</span><span class="o">,,,,</span><span class="nt">3635550000000000000</span><span class="o">%%%</span><span class="w"></span>
<span class="nt">222222</span><span class="o">,,,,</span><span class="nt">6665777008900000000</span><span class="o">%%%</span><span class="w"></span>
<span class="nt">22222</span><span class="p">::</span><span class="nd">66666777788889900000</span><span class="w"> </span><span class="o">%%%%</span><span class="w"></span>
<span class="nt">22222</span><span class="o">:;;;;%%=</span><span class="nt">7</span><span class="o">%</span><span class="nt">8888890</span><span class="w"> </span><span class="nt">0</span><span class="w"> </span><span class="o">%%%%</span><span class="w"></span>
<span class="nt">22222</span><span class="o">;;;;</span><span class="w"> </span><span class="o">==??%%</span><span class="nt">888888</span><span class="w"> </span><span class="nt">00</span><span class="w"> </span><span class="o">%%%%%</span><span class="w"></span>
<span class="nt">222222</span><span class="w"> </span><span class="o">;;</span><span class="w"> </span><span class="o">=??%%%</span><span class="nt">8888</span><span class="w"> </span><span class="o">%%%%</span><span class="w"></span>
<span class="nt">222</span><span class="w"> </span><span class="o">;;</span><span class="w"> </span><span class="o">?</span><span class="nt">A</span><span class="o">>>@@@</span><span class="w"> </span><span class="nt">B</span><span class="o">%</span><span class="w"></span>
<span class="nt">CCC</span><span class="w"> </span><span class="o">;;</span><span class="w"> </span><span class="nt">DEE</span><span class="o">@@@</span><span class="w"> </span><span class="nt">BB</span><span class="w"></span>
</code></pre></div>
<p>You can see how each character corresponds with a country. The character's code is used as a lookup key to retrieve the data associated with that feature (which is also included in the json tile).</p>
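<p>To make that lookup concrete, here is a small Python sketch of how a client resolves one grid cell. The character decoding follows the UTFGrid spec (ids are shifted by 32 and skip the two codepoints that would break JSON strings); the tiny tile object is a made-up example, not real tile output:</p>

```python
def decode_id(ch):
    """Map one grid character back to its index in the tile's 'keys' array.

    Per the UTFGrid spec, ids are encoded as codepoint = id + 32, skipping
    34 (") and 92 (backslash) so the grid stays valid inside JSON strings.
    """
    code = ord(ch)
    if code >= 93:
        code -= 1
    if code >= 35:
        code -= 1
    return code - 32

def lookup(tile, row, col):
    """Resolve the attributes for one cell, or None for an empty cell."""
    key = tile["keys"][decode_id(tile["grid"][row][col])]
    return tile["data"].get(key)

# A hypothetical, minimal 2x2 tile (real grids are much larger,
# e.g. 64x64 cells for a 256px tile at scale 4)
tile = {
    "grid": ["!!", " #"],
    "keys": ["", "77", "40"],
    "data": {"77": {"NAME": "Canada"}, "40": {"NAME": "Brazil"}},
}
```

<p>Here <code>lookup(tile, 0, 0)</code> returns the attributes keyed by "77" (the key index encoded by <code>!</code>), while a space character maps to key index 0 — the "no feature" cell.</p>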
<p>If you want to dig in, check out the <a href="http://mapbox.com/demo/visiblemap/">mapbox demo</a>. </p>
<h3>The Server side</h3>
<p>I'm going to assume you have <a href="http://tilestache.org/">Tilestache</a> and <a href="https://github.com/mapnik/mapnik">Mapnik 2+</a> already installed (if not, you should!). The steps to configure your server for UTFGrids are fairly simple. </p>
<p><strong>First</strong>, set up a Mapnik XML file pointing to your data source.</p>
<div class="highlight"><pre><span></span><code><span class="cp"><?xml version="1.0"?></span>
<span class="cm"><!-- An ultra simple Mapnik stylesheet --></span>
<span class="cp"><!DOCTYPE Map [</span>
<span class="cp"><!ENTITY google_mercator "+proj=merc +a=6378137 +b=6378137 +lat_ts=0.0 +lon_0=0.0 +x_0=0.0 +y_0=0 +k=1.0 +units=m +nadgrids=@null +wktext +no_defs +over"></span>
]>
<span class="nt"><Map</span> <span class="na">srs=</span><span class="s">"&google_mercator;"</span><span class="nt">></span>
<span class="nt"><Style</span> <span class="na">name=</span><span class="s">"style"</span><span class="nt">></span>
<span class="nt"><Rule></span>
<span class="nt"><PolygonSymbolizer></span>
<span class="nt"><CssParameter</span> <span class="na">name=</span><span class="s">"gamma"</span><span class="nt">></span>.65<span class="nt"></CssParameter></span>
<span class="nt"><CssParameter</span> <span class="na">name=</span><span class="s">"fill"</span><span class="nt">></span>green<span class="nt"></CssParameter></span>
<span class="nt"><CssParameter</span> <span class="na">name=</span><span class="s">"fill-opacity"</span><span class="nt">></span>0.5<span class="nt"></CssParameter></span>
<span class="nt"></PolygonSymbolizer></span>
<span class="nt"><LineSymbolizer></span>
<span class="nt"><CssParameter</span> <span class="na">name=</span><span class="s">"stroke"</span><span class="nt">></span>#666<span class="nt"></CssParameter></span>
<span class="nt"><CssParameter</span> <span class="na">name=</span><span class="s">"stroke-width"</span><span class="nt">></span>0.3<span class="nt"></CssParameter></span>
<span class="nt"></LineSymbolizer></span>
<span class="nt"></Rule></span>
<span class="nt"></Style></span>
<span class="nt"><Layer</span> <span class="na">name=</span><span class="s">"layer"</span> <span class="na">srs=</span><span class="s">"&google_mercator;"</span><span class="nt">></span>
<span class="nt"><StyleName></span>style<span class="nt"></StyleName></span>
<span class="nt"><Datasource></span>
<span class="nt"><Parameter</span> <span class="na">name=</span><span class="s">"type"</span><span class="nt">></span>shape<span class="nt"></Parameter></span>
<span class="nt"><Parameter</span> <span class="na">name=</span><span class="s">"file"</span><span class="nt">></span>sample_data/world_merc.shp<span class="nt"></Parameter></span>
<span class="nt"></Datasource></span>
<span class="nt"></Layer></span>
<span class="nt"></Map></span>
</code></pre></div>
<p><strong>Next</strong>, set up the TileStache configuration file.</p>
<div class="highlight"><pre><span></span><code>{
  "cache": {
    "name": "Disk",
    "path": "/tmp/stache"
  },
  "layers": {
    "world":
    {
      "provider": {"name": "mapnik", "mapfile": "style.xml"}
    },
    "world_utfgrid":
    {
      "provider":
      {
        "class": "TileStache.Goodies.Providers.MapnikGrid:Provider",
        "kwargs":
        {
          "mapfile": "style.xml",
          "fields": ["NAME", "POP2005"],
          "layer_index": 0,
          "scale": 4
        }
      }
    }
  }
}
</code></pre></div>
<p>Finally, you're ready to run the TileStache server...</p>
<div class="highlight"><pre><span></span><code>tilestache-server.py -c your.cfg -i localhost -p 7890
</code></pre></div>
<p>Now you should be serving UTFGrids at <code>http://localhost:7890/world_utfgrid/</code></p>
<h3>The Client side</h3>
<p>Now we need something to consume the UTFGrid tiles and interact with them in an HTML/JS environment. The original client implementation of UTFGrid support is provided by <a href="http://mapbox.com/wax/">Wax</a> which sits atop mapping clients like Modest Maps and Leaflet. Wax is very slick and easy to use but doesn't work so well for more complex arrangements or with OpenLayers-based maps. </p>
<p>Rather than clog up Wax with the complex UTFGrid use cases that we envisioned, we decided to implement a UTFGrid client in native OpenLayers. Hence my project for the <a href="http://wiki.osgeo.org/wiki/IslandWood_Code_Sprint_2012">OSGEO code sprint</a> was born.</p>
<p><img alt="olexample.PNG" src="/assets/img/uploads/2012/02/olexample.PNG"></p>
<p>The result was a new OpenLayers Layer which loads up the json "tiles" behind the scenes...</p>
<div class="highlight"><pre><span></span><code><span class="w"> </span><span class="k">var</span><span class="w"> </span><span class="n">grid_layer</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">new</span><span class="w"> </span><span class="n">OpenLayers</span><span class="o">.</span><span class="n">Layer</span><span class="o">.</span><span class="n">UTFGrid</span><span class="p">(</span><span class="w"> </span>
<span class="w"> </span><span class="s1">'Invisible UTFGrid Layer'</span><span class="p">,</span><span class="w"> </span>
<span class="w"> </span><span class="s2">"./utfgrid/world_utfgrid/${z}/${x}/${y}.json"</span><span class="w"></span>
<span class="w"> </span><span class="p">);</span><span class="w"></span>
<span class="w"> </span><span class="n">map</span><span class="o">.</span><span class="n">addLayer</span><span class="p">(</span><span class="n">grid_layer</span><span class="p">);</span><span class="w"></span>
</code></pre></div>
<p>and an OpenLayers Control that handles how the mouse events interact with the grid. In this example, as the mouse moves over the map, a custom callback is fired off which updates a div with some attribute information.</p>
<div class="highlight"><pre><span></span><code>var callback = function(attributes) {
    if (attributes) {
        var msg = "<span class="nt"><strong></span>In 2005, " + attributes.NAME;
        msg += " had a population of " + attributes.POP2005 + " people.<span class="nt"></strong></span>";
        var element = OpenLayers.Util.getElement('attrsdiv');
        element.innerHTML = msg;
        return true;
    } else {
        this.element.innerHTML = '';
        return false;
    }
};
var control = new OpenLayers.Control.UTFGrid({
    'handlerMode': 'move',
    'callback': callback
});
map.addControl(control);
</code></pre></div>
<p>Overall the design goal was to decouple the loading/tiling of the UTFGrids from the interactivity/control. I think this works out nicely and, while a bit more cumbersome than the method used by Wax, it is more flexible and integrates well with existing OpenLayers apps. </p>
<p>You can see them in action on the examples pages:</p>
<ul>
<li>
<p>Demonstrating the use of <a href="http://labs.ecotrust.org/utfgrid/events.html">different event handlers</a> (click, hover, move)</p>
</li>
<li>
<p>Demonstrating <a href="http://labs.ecotrust.org/utfgrid/multi.html">multiple interactivity layers</a> (the interactivity layer need not be visible in the map tiles!)</p>
</li>
</ul>
<p>And feel free to check out <a href="https://github.com/perrygeo/openlayers/tree/utfgrid">my github fork</a> for the code. </p>
<p>What do you think? Let me know...</p>Optimizing KML for hierarchical polygon data2011-05-18T00:00:00-06:002011-05-18T00:00:00-06:00Matthew T. Perrytag:www.perrygeo.com,2011-05-18:/optimizing-kml-for-hierarchical-polygon-data.html<p>For all the benefits of KML, it is decidedly a step backwards for handling large vector datasets. Most KML clients, including the canonical Google Earth application, experience debilitating slow-down when viewing a couple dozen MB of vector data - datasets that I could easily open on a Pentium 4 in ArcView …</p><p>For all the benefits of KML, it is decidedly a step backwards for handling large vector datasets. Most KML clients, including the canonical Google Earth application, experience debilitating slow-down when viewing a couple dozen MB of vector data - datasets that I could easily open on a Pentium 4 in ArcView 3.2 10 years ago! </p>
<p>The unfortunate reality is that optimizing the performance of KML datasets is conflated with the structure of the data and is thus the responsibility of the data publisher. The wisdom of combining styling, performance-related structure, organizational structure, geometry and attributes into a single file format may be questionable, but KML has become the de facto geographic markup language due to its other benefits. </p>
<p>Anyway, back to performance enhancements on big vector datasets... The concept of "regionation" is used by several KML tools to improve performance. From the <a href="http://google-latlong.blogspot.com/2010/09/faster-larger-closer-regionation-in.html">Google LatLong Blog</a>:</p>
<blockquote>
<p>You can think of Regionation as a <strong>hierarchical subdivision of points or tiles</strong>, which shows less detail from afar, and more detail as you zoom in to the globe. This dynamic loading creates clearer visualizations by minimizing clutter, while simultaneously speeding up the rendering process.</p>
</blockquote>
<p>In most implementations, there is a generic strategy for determining this hierarchy based on attributes or geometry size (in the case of vectors) or by a tile system. Neither is ideal when you want to preserve the vector nature of the data, split it into small, easily-loadable files and determine its view based on the <strong>natural hierarchy that is built into the data structure</strong>.</p>
<p>Specifically, I am thinking about watersheds here - the US <a href="http://nwis.waterdata.usgs.gov/tutorial/huc_def.html">Hydrologic Units</a>. Hydrologic units are watershed boundaries that are organized in a nested hierarchy; higher levels contain smaller watersheds that are contained within a single watershed from a "parent" level. The unique identifiers (hydrologic unit codes or HUCs) are rather ingenious as well; each level is represented by two digits, which are concatenated to form a single identifier that can be used to determine its "parent". For example:</p>
<p><img alt="Level 4 HUCs" src="/assets/img/uploads/2011/05/huc8.png"></p>
<p><img alt="Level 5 HUCs" src="/assets/img/uploads/2011/05/huc10.png"></p>
<p><img alt="Level 6 HUCs" src="/assets/img/uploads/2011/05/huc12.png"></p>
<p>Level 4 HUCs <br>
e.g. 170900<strong>11</strong></p>
<p>Level 5 HUCs <br>
e.g. 17090011<strong>04</strong></p>
<p>Level 6 HUCs <br>
e.g. 1709001104<strong>03</strong></p>
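<p>Given that structure, walking from any HUC up through its enclosing watersheds is just string slicing — a minimal sketch (helper names are mine, not from any library):</p>

```python
def huc_parent(huc):
    """Drop the trailing two digits to get the enclosing watershed's code."""
    return huc[:-2] if len(huc) > 2 else None

def huc_ancestors(huc):
    """All enclosing HUCs, from immediate parent up to the top level."""
    out = []
    parent = huc_parent(huc)
    while parent:
        out.append(parent)
        parent = huc_parent(parent)
    return out
```

<p>For example, <code>huc_ancestors("170900110403")</code> yields <code>["1709001104", "17090011", "170900", "1709", "17"]</code>.</p>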
<p>Instead of fabricating a hierarchy of features, why not just use this natural hierarchy to structure the KML documents?</p>
<p><img alt="hucs-1.png" src="/assets/img/uploads/2011/05/hucs-1.png"></p>
<p>Or as KML markup:</p>
<div class="highlight"><pre><span></span><code><span class="nt"><Placemark></span>
  <span class="nt"><name></span>17090009<span class="nt"></name></span>
  <span class="nt"><styleUrl></span>#HUC_8-default<span class="nt"></styleUrl></span>
  <span class="nt"><Polygon><outerBoundaryIs><LinearRing><coordinates></span>...
  <span class="nt"></coordinates></LinearRing></outerBoundaryIs></Polygon></span>
<span class="nt"></Placemark></span>
<span class="nt"><NetworkLink></span>
  <span class="nt"><name></span>17090009_children<span class="nt"></name></span>
  <span class="nt"><Region></span>
    <span class="nt"><LatLonAltBox></span>
      <span class="nt"><west></span>-123.001645628<span class="nt"></west></span>
      <span class="nt"><south></span>44.8300083641<span class="nt"></south></span>
      <span class="nt"><east></span>-122.203351254<span class="nt"></east></span>
      <span class="nt"><north></span>45.298653051<span class="nt"></north></span>
    <span class="nt"></LatLonAltBox></span>
    <span class="nt"><Lod></span>
      <span class="nt"><minLodPixels></span>256<span class="nt"></minLodPixels></span>
      <span class="nt"><maxLodPixels></span>1600<span class="nt"></maxLodPixels></span>
    <span class="nt"></Lod></span>
  <span class="nt"></Region></span>
  <span class="nt"><Link></span>
    <span class="nt"><href></span>./17090009_children.kml<span class="nt"></href></span>
    <span class="nt"><viewRefreshMode></span>onRegion<span class="nt"></viewRefreshMode></span>
  <span class="nt"></Link></span>
<span class="nt"></NetworkLink></span>
</code></pre></div>
<p>The advantages to this design are that you don't have to break the geometries up to fit into a square tiling pattern, data loads and renders in a logical pattern and there will always be 100 or fewer (usually far fewer) placemarks per file due to the design of the HUC data structure. File sizes stay low, network links load quickly and request/rendering occurs only when they come into view. For this example dataset totaling 300M of shapefiles, there are several hundred resulting kmz files without any repeated features and all less than ~ 150K each. In essence, it achieves optimal performance by its very design. </p>
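<p>The grouping step at the heart of that guarantee can be sketched in a few lines — this is a simplification of the approach, with hypothetical names, not code from the actual script:</p>

```python
from collections import defaultdict

def group_children(hucs):
    """Bucket each HUC code under its parent's code.

    Each bucket becomes one KML file. Since a level adds only two digits,
    a parent can have at most 100 children, so no file ever holds more
    than 100 placemarks.
    """
    groups = defaultdict(list)
    for huc in hucs:
        groups[huc[:-2]].append(huc)
    return dict(groups)
```

<p>Writing one <code>*_children.kml</code> per bucket, each wrapped in a NetworkLink with a Region, gives the lazy-loading behavior shown above.</p>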
<p>Here's a video of it in action:</p>
<iframe width="420" height="315" src="http://www.youtube.com/embed/5FgOfLEVX8M" frameborder="0"></iframe>
<p>This was all done with <a href="http://watershed-priorities.googlecode.com/hg/util/kml_regionate_heirarchy.py">a fairly "hackish" python script</a>. I'll continue to refine it as needed for this particular application but, at this time, it's not intended to be a reusable tool - if you want to use it, be prepared to dig through the source code and get your hands dirty. The same concept could theoretically be applied to any spatially-hierarchical vector data (think geographic boundaries ... country > state > county > city).</p>Um - nice “review” of QGIS2010-12-20T00:00:00-07:002010-12-20T00:00:00-07:00Matthew T. Perrytag:www.perrygeo.com,2010-12-20:/um-nice-review-of-qgis.html<p>RJ Zimmer at American Surveyor magazine did what he described as a comparison of several free GIS applications entitled "<a href="http://www.amerisurv.com/PDF/TheAmericanSurveyor_Zimmer-SomethingForNothing_Vol7No8.pdf">Something for Nothing</a>"</p>
<p>First of all, the title bugs me. The idea that the sole benefit of free software is simply cost savings is pretty naive. It disregards openness, community support …</p><p>RJ Zimmer at American Surveyor magazine did what he described as a comparison of several free GIS applications entitled "<a href="http://www.amerisurv.com/PDF/TheAmericanSurveyor_Zimmer-SomethingForNothing_Vol7No8.pdf">Something for Nothing</a>"</p>
<p>First of all, the title bugs me. The idea that the sole benefit of free software is simply cost savings is pretty naive. It disregards openness, community support, ability to transfer knowledge, freedom from restrictive licensing, etc. But I can live with the title.</p>
<p>I can also live with his decision to include only a single open-source GIS application alongside 3 closed-but-gratis applications. He doesn't claim that it's a comprehensive review despite the fact that the ecosystem of Free GIS is far more diverse.</p>
<p>But I can't accept his treatment of Quantum GIS:</p>
<blockquote>
<p>I did not fully test Quantum GIS. I did download and install it but the software was too complicated to use "right out of the box", and I did not have time to learn to use it.</p>
</blockquote>
<p>The feature comparison chart includes mainly "?" in the QGIS column. </p>
<p>OK we get it - your deadline hit before you could bother to learn one of the applications you were supposedly reviewing. One even wonders why he included QGIS in the review at all. This is nothing short of irresponsible reporting. When people post stuff like this, it really rubs me the wrong way - now a whole audience of users has an inaccurate view of QGIS and the entire free GIS ecosystem thanks to his slacker journalism.</p>kmltree2010-06-09T00:00:00-06:002010-06-09T00:00:00-06:00Matthew T. Perrytag:www.perrygeo.com,2010-06-09:/kmltree.html<p>When the <a href="http://marinemap.org">MarineMap</a> team started delving into the <a href="http://earth.google.com/plugin/">Google Earth plugin</a>, it was apparent that it supported the display and rendering of KML files <em>almost</em> as well as the Google Earth desktop application. The missing piece of functionality was the nice tree-style legend that is provided with the desktop app …</p><p>When the <a href="http://marinemap.org">MarineMap</a> team started delving into the <a href="http://earth.google.com/plugin/">Google Earth plugin</a>, it was apparent that it supported the display and rendering of KML files <em>almost</em> as well as the Google Earth desktop application. The missing piece of functionality was the nice tree-style legend that is provided with the desktop app. The plugin lets you add KML for display but gives you no HTML interface to work with it. For simple apps, you can just roll your own html/js form. But that quickly becomes unmanageable if you're adding KML dynamically and need to create a tree-style legend for any arbitrary KML document. </p>
<p>Enter <a href="http://code.google.com/p/kmltree/">kmltree</a>. </p>
<blockquote>
<p>kmltree is a javascript tree widget that can be used in conjunction with the Google Earth API. It replicates the functionality of the Google Earth desktop client, and is fast, extensible, and stable for use in advanced web applications. It's built utilizing the earth-api-utility-library and jQuery. </p>
</blockquote>
<p><a href="/assets/img/uploads/2010/06/screen-shot-2010-06-09-at-81707-am.png"><img alt="kmltree" src="/assets/img/uploads/2010/06/screen-shot-2010-06-09-at-81707-am.png"></a></p>
<p>Any arbitrary KML can be parsed and represented in a tree-style legend right in the web browser. <a href="http://kmltree.googlecode.com/hg/examples/refresh.html">Try it out</a>.</p>
<p>Kmltree is the brainchild of <a href="http://www.google.com/profiles/underbluewaters">Chad Burt</a> who developed it as part of the marinemap codebase but had the foresight to realize that this would be useful to a much wider audience and abstracted it into its own javascript library. If you're building a web mapping application with the Google Earth API, give it a shot!</p>MarineMap wins award for Environmental Conflict Resolution2010-05-27T00:00:00-06:002010-05-27T00:00:00-06:00Matthew T. Perrytag:www.perrygeo.com,2010-05-27:/marinemap-wins-award-for-environmental-conflict-resolution.html<p>For the last year or so, I've had the pleasure of working with the <a href="http://www.marinemap.org">MarineMap Consortium</a>. We just learned yesterday that the U.S. Institute for Environmental Conflict Resolution <a href="http://eon.businesswire.com/portal/site/eon/permalink/?ndmViewId=news_view&newsId=20100526007072&newsLang=en">awarded</a> MarineMap the “Innovation in Technology and Environmental Conflict Resolution” award.</p>
<iframe width="560" height="315" src="http://www.youtube.com/embed/GCUxpnUSiUg" frameborder="0"></iframe>
<p>I joined the team after the launch of the <a href="http://southcoast.marinemap.org/marinemap/">South …</a></p><p>For the last year or so, I've had the pleasure of working with the <a href="http://www.marinemap.org">MarineMap Consortium</a>. We just learned yesterday that the U.S. Institute for Environmental Conflict Resolution <a href="http://eon.businesswire.com/portal/site/eon/permalink/?ndmViewId=news_view&newsId=20100526007072&newsLang=en">awarded</a> MarineMap the “Innovation in Technology and Environmental Conflict Resolution” award.</p>
<iframe width="560" height="315" src="http://www.youtube.com/embed/GCUxpnUSiUg" frameborder="0"></iframe>
<p>I joined the team after the launch of the <a href="http://southcoast.marinemap.org/marinemap/">South Coast of California</a> site which was already widely recognized as a successful decision-support tool for marine spatial planning. We've since been working on version 2 of the MarineMap tool which is deployed currently for the <a href="http://northcoast.marinemap.org/marinemap">North Coast of California</a> in support of their Marine Life Protection Act (MLPA) process. </p>
<p>It's been a tremendous challenge to bring a <a href="http://code.google.com/p/marinemap/">new version of the software</a> to life and have it meet and exceed the standards set by its predecessor. It has also been tremendously rewarding and having our work recognized at this level is a great honor. It's nice to know that the tools we've developed have been so helpful and instrumental in the marine planning process along the coast of California. Looking forward, I see MarineMap growing beyond a tool for a specific purpose (supporting the MLPA Initiative) to a robust framework for developing web-based spatial planning tools for all sorts of environmental applications, both marine and terrestrial. And this award confirms that we are already heading in the right direction. Very exciting news!</p>Exploring Geometry2010-05-06T00:00:00-06:002010-05-06T00:00:00-06:00Matthew T. Perrytag:www.perrygeo.com,2010-05-06:/exploring-geometry.html<p>I don't know how I let this gem slip past my radar for so long. It was only via <a href="http://lin-ear-th-inking.blogspot.com/2010/05/random-points-in-polygon-in-jts.html">a post by Dr. JTS</a> himself (aka Martin Davis) that I saw a screenshot of JTS TestBuilder and decided to check it out. </p>
<p>I was actually just talking with someone about …</p><p>I don't know how I let this gem slip past my radar for so long. It was only via <a href="http://lin-ear-th-inking.blogspot.com/2010/05/random-points-in-polygon-in-jts.html">a post by Dr. JTS</a> himself (aka Martin Davis) that I saw a screenshot of JTS TestBuilder and decided to check it out. </p>
<p>I was actually just talking with someone about a tool that could provide simple visualization of WKT geometries; JTS Test Builder does that and much more. </p>
<p>You can input geometries (graphically or by well-known text) and compare two geometries based on spatial predicates:</p>
<p><a href="/assets/img/uploads/2010/05/screen-shot-2010-05-06-at-81418-pm.png"><img alt="spatial predicates" src="/assets/img/uploads/2010/05/screen-shot-2010-05-06-at-81418-pm.png"></a></p>
<p>Do overlay analyses with the two geometries. Note that you can see the result as WKT below.</p>
<p><a href="/assets/img/uploads/2010/05/screen-shot-2010-05-06-at-81502-pm.png"><img alt="overlay" src="/assets/img/uploads/2010/05/screen-shot-2010-05-06-at-81502-pm.png"></a></p>
<p>And there are a host of other spatial operations to generate geometries using buffers...
<a href="/assets/img/uploads/2010/05/screen-shot-2010-05-06-at-81602-pm.png"><img alt="buffers" src="/assets/img/uploads/2010/05/screen-shot-2010-05-06-at-81602-pm.png"></a></p>
<p>... convex hulls ...
<a href="/assets/img/uploads/2010/05/screen-shot-2010-05-06-at-81716-pm.png"><img alt="convex hull" src="/assets/img/uploads/2010/05/screen-shot-2010-05-06-at-81716-pm.png"></a></p>
<p>This app provides a user-friendly way to quickly explore and test geometric operations. To try it out, <a href="http://sourceforge.net/projects/jts-topo-suite/">download JTS</a> and unzip the contents somewhere. If you're on Windows, the .bat file is provided. If you're running anything else, you have to cook up a shell script that will set up the environment and run JTS TestBuilder:</p>
<blockquote>
<div class="highlight"><pre><span></span><code>JTS_HOME=/usr/share/java/jts-1.11
CP=$CLASSPATH
for i in $JTS_HOME/lib/*.jar; do CP=$i:$CP; done
java -Xmx256m -cp $CP com.vividsolutions.jtstest.testbuilder.JTSTestBuilder $*
</code></pre></div>
</blockquote>Distributed2010-03-31T00:00:00-06:002010-03-31T00:00:00-06:00Matthew T. Perrytag:www.perrygeo.com,2010-03-31:/distributed.html<p>I've been playing around with some distributed version control systems (DVCS) to replace svn. </p>
<p>First, the <em>why</em>: I'll leave the details up to Joel in his excellent <a href="http://hginit.com/">HgInit tutorial</a>. It's mercurial-specific, but the general concepts apply to any DVCS. The takeaway message for any project with > 1 developer is this …</p><p>I've been playing around with some distributed version control systems (DVCS) to replace svn. </p>
<p>First, the <em>why</em>: I'll leave the details up to Joel in his excellent <a href="http://hginit.com/">HgInit tutorial</a>. It's mercurial-specific, but the general concepts apply to any DVCS. The takeaway message for any project with > 1 developer is this:</p>
<blockquote>
<p>Mercurial [ed: DVCS] separates the act of committing new code from the act of inflicting it on everybody else.</p>
</blockquote>
<p>Next, the <em>implementation</em>: I'm using <strong>git</strong> to work on another project (<a href="http://goldencheetah.org/">Golden Cheetah</a>) and it's been a tough learning curve. Git is no doubt the most powerful DVCS out there. You can do magical things with it like combine commits and mess with history trees. And you can also screw things up pretty badly if you misinterpret the esoteric docs for some non-intuitive piece of the workflow. </p>
<p>I just tried <strong>mercurial</strong> this morning - hg seems to fit my mind well. There is less power but the workflow is very clear and intuitive. And there are docs written for people who don't want to do an in-depth study of their version control software. It stays out of the way. </p>
<p>Long story short, I'm going to use mercurial/hg for my new projects. Ah what the heck, my old/ongoing projects as well. My <a href="http://code.google.com/p/perrygeo/">googlecode repository</a> has been converted over to Mercurial. Svn will stick around but won't be updated.</p>Lazy raster processing with GDAL VRTs2010-02-18T00:00:00-07:002010-02-18T00:00:00-07:00Matthew T. Perrytag:www.perrygeo.com,2010-02-18:/lazy-raster-processing-with-gdal-vrts.html<p>No, not lazy as in REST :-) ... Lazy as in "<a href="http://en.wikipedia.org/wiki/Lazy_evaluation">Lazy evaluation</a>":</p>
<blockquote>
<p>In computer programming, lazy evaluation is the technique of delaying a computation until the result is required.</p>
</blockquote>
<p>Take an <strong>example raster processing workflow</strong> to go from a bunch of tiled, latlong, GeoTiff digital elevation models to a single shaded …</p><p>No, not lazy as in REST :-) ... Lazy as in "<a href="http://en.wikipedia.org/wiki/Lazy_evaluation">Lazy evaluation</a>":</p>
<blockquote>
<p>In computer programming, lazy evaluation is the technique of delaying a computation until the result is required.</p>
</blockquote>
<p>Take an <strong>example raster processing workflow</strong> to go from a bunch of tiled, latlong, GeoTiff digital elevation models to a single shaded relief GeoTiff in projected space:</p>
<ol>
<li>Merge the tiles together </li>
<li>Reproject the merged DEM (using bilinear or cubic interpolation) </li>
<li>Generate the hillshade from the merged DEM </li>
</ol>
<p>Simple enough to do with GDAL tools on the command line. Here's the typical, <strong>process-as-you-go</strong> implementation:</p>
<div class="highlight"><pre><span></span><code>gdal_merge.py -of GTiff -o srtm_merged.tif srtm_12_*.tif
gdalwarp -t_srs epsg:3310 -r bilinear -of GTiff srtm_merged.tif srtm_merged_3310.tif
gdaldem hillshade srtm_merged_3310.tif srtm_merged_3310_shade.tif -of GTiff
</code></pre></div>
<p>Alternatively, we can simulate <strong>lazy evaluation</strong> by using <a href="http://www.gdal.org/gdal_vrttut.html">GDAL Virtual Rasters</a> (VRT) to perform the intermediate steps, only outputting the GeoTiff as the final step. </p>
<div class="highlight"><pre><span></span><code>gdalbuildvrt srtm_merged.vrt srtm_12_0*.tif
gdalwarp -t_srs epsg:3310 -r bilinear -of VRT srtm_merged.vrt srtm_merged_3310.vrt
gdaldem hillshade srtm_merged_3310.vrt srtm_merged_3310_shade2.tif -of GTiff
</code></pre></div>
<p>So what's the advantage of doing it the VRT way? They both produce <em>exactly</em> the same output raster. Let's compare:</p>
<table class="table table-striped table-bordered table-condensed">
<thead>
<tr>
<th> </th>
<th>Process-As-You-Go</th>
<th>"Lazy" VRTs</th>
</tr>
</thead>
<tbody>
<tr>
<th>Merge (#1) time</th>
<td>3.1 sec</td>
<td>0.05 sec</td>
</tr>
<tr>
<th>Warp (#2) time </th>
<td>7.3 sec </td>
<td>0.10 sec </td>
</tr>
<tr>
<th>Hillshade (#3) time</th>
<td>10.5 sec </td>
<td>19.75 sec</td>
</tr>
<tr>
<th>Total processing time</th>
<td>20.9 sec</td>
<td>19.9 sec </td>
</tr>
<tr>
<th>Intermediate files</th>
<td>2 tifs</td>
<td>2 vrts</td>
</tr>
<tr>
<th>Intermediate file size</th>
<td>261 MB</td>
<td>0.005 MB</td>
</tr>
</tbody>
</table>
<p>The Lazy VRT method <strong>delays all the computationally-intensive processing until it is actually required</strong>. The intermediate files, instead of containing the raw raster output of the actual computation, are XML files which contain the <em>instructions</em> to get the desired output. This allows GDAL to do all the processing in one pass (the final step #3). The <em>total</em> processing time is not significantly different between the two methods, but in terms of the productivity of the GIS analyst, the VRT method is superior. Imagine working with datasets 1000x this size with many more steps: typing a command, waiting 2 hours, typing the next, and so on would be a waste of human time. With VRTs, you can assemble the instructions up front and kick off the final processing step as you leave the office for a long weekend.</p>
<p>Additionally, the VRT method produces only <strong>small intermediate xml files</strong> instead of a potentially huge data management nightmare of shuffling around GB (or TB) of intermediate outputs! Plus, those xml files serve as an excellent piece of metadata describing the exact processing steps, which you can refer to later or adapt to different datasets. </p>
<p>So next time you have a multi-step raster workflow, use GDAL VRTs to your full advantage - you'll save yourself time and disk space by being lazy. </p>Peaksware licensing revisited …2009-12-16T00:00:00-07:002009-12-16T00:00:00-07:00Matthew T. Perrytag:www.perrygeo.com,2009-12-16:/peaksware-licensing-revisted.html<p>I had previously <a href="http://www.perrygeo.net/wordpress/?p=138">bitched and moaned</a> about the licensing restrictions on the <a href="http://www.trainingpeaks.com/WKO">TrainingPeaks WKO+</a> software. Truth be told, the reason I was so put off by their crappy licensing scheme was that my cycling training relied so heavily on their software. It was not perfect but it was the best …</p><p>I had previously <a href="http://www.perrygeo.net/wordpress/?p=138">bitched and moaned</a> about the licensing restrictions on the <a href="http://www.trainingpeaks.com/WKO">TrainingPeaks WKO+</a> software. Truth be told, the reason I was so put off by their crappy licensing scheme was that my cycling training relied so heavily on their software. It was not perfect but it was the best tool available. I've since discovered <a href="http://goldencheetah.org/">Golden Cheetah</a>, which is a viable open-source alternative, but it still lags behind WKO+ in many critical features.</p>
<p>Now, fresh in time for the 2010 training season, Peaksware has released a new version 3.0 of WKO+ which, amongst many UI and functionality improvements, has made considerable progress on the licensing front.</p>
<blockquote>
<p>We know, our licensing has been a challenge to deal with for our customers in the past, but we’ve always tried to be as helpful as possible getting you back up and running after a hard drive crash or new computer. To remedy this, we’re pleased to announce an all new flexible licensing system. First, with every purchase we now allow you to install WKO+ 3.0 on up to two computers; second, we’ve built an online activation/deactivation system so you are free to move your active licenses from machine to machine. Are you leaving on a 2 week trip? Just de-activate your home computer, activate your laptop, and you’re on your way. When you get home, de-activate your laptop, re-activate your desktop and you’re all set.</p>
</blockquote>
<p>It ain't open source (there is still a place in this world for proprietary software if they can push the boundaries and innovate) but the sensitivity to the licensing issue just may have restored my faith in their company. </p>Nice examples of ESRIs geoprocessing python module (9.3)2009-08-10T00:00:00-06:002009-08-10T00:00:00-06:00Matthew T. Perrytag:www.perrygeo.com,2009-08-10:/nice-examples-of-esris-geoprocessing-python-module-93.html<p>Just thought I'd point out a great presentation about the "new" 9.3 geoprocessing (gp) python module from ESRI. </p>
<p>Ghislain Prince and Elizabeth Flanary do a great job of introduction by examples. The latest gp module is much more pythonic and these examples show how to leverage that to its …</p><p>Just thought I'd point out a great presentation about the "new" 9.3 geoprocessing (gp) python module from ESRI. </p>
<p>Ghislain Prince and Elizabeth Flanary do a great job of introducing it by example. The latest gp module is much more pythonic and these examples show how to leverage that to its full advantage. If you try to do this with older gp versions, the code would make most pythonistas cringe. This latest version returns objects and lists, uses real booleans, and uses true objects instead of funky string parameters. Basic OO stuff for most python libraries but a big improvement for gp. </p>
<p>Here's the <a href="http://arcscripts.esri.com/details.asp?dbid=16509">powerpoint presentation</a>. Thanks to Jamey Rosen for the tip!</p>Peaksware licensing hell2009-06-23T00:00:00-06:002009-06-23T00:00:00-06:00Matthew T. Perrytag:www.perrygeo.com,2009-06-23:/peaksware-licensing-hell.html<p>I've been using Peaksware's WKO+, a cycling and running training tool to manage data from heart rate monitors, GPS units, power meters, etc. Its a powerful tool with a clunky UI but I've gotten used to it. </p>
<p>You pay $100 for a "personal" license. Not a big deal to me …</p><p>I've been using Peaksware's WKO+, a cycling and running training tool to manage data from heart rate monitors, GPS units, power meters, etc. It's a powerful tool with a clunky UI but I've gotten used to it. </p>
<p>You pay $100 for a "personal" license. Not a big deal to me since they basically have a monopoly on this software niche. I first installed it on my work computer to test the data from my daily bike commute. Cool it works. Then I went to install it at home since that's where I'll be using it. Works ok. I proceed to gather all my fitness data into their proprietary binary format. </p>
<p>Fast forward a few months. I'm reformatting the hard drive on the laptop and want to move all my data and software to my desktop. But installing WKO+ is giving me a headache ("Error: Too many installations"). The registration process takes a hardware fingerprint and you must activate it via the web to get a registration code. However, hidden within their EULA is a term which <strong>disallows the transfer of the license</strong> to any computer other than the one on which it was originally installed. The second installation is just an allowance they make for "hard drive crashes" and such.</p>
<p>Since neither of those machines would be available to me, certainly there would be a way to transfer it? After several progressively more desperate communications with Matt Allen at peaksware support, he informed me that there was no way they would transfer the license (the non-transfer clause IS in the EULA after all). <strong>I would need to purchase another license simply because I switched computers</strong>!</p>
<p>Here is my response:</p>
<blockquote>
<p>Basically what you are telling me is that I can no longer use WKO+
without paying again. I get to use the software for a few months and
you revoke my right to use it because I buy a new computer! I am a
paying customer, trying to be totally legit here, willing to support
your business in exchange for a license to use your software and you
insist on screwing me over. Brilliant.</p>
</blockquote>
<p>This is one of the most unprofessional and idiotic stances I have ever
seen from a software company. Your intention appears to be to screw
over your paying customers and milk as much cash from them as possible
- you might want to rethink that business model unless you want to
lose customers! I will never endorse, recommend or purchase another
product or service from peaksware nor will any of my family, friends,
teammates or readers once the word gets out about your disrespectful
policies.</p>
<p>There are numerous typical situations where a new copy of the software
would need to be installed including:</p>
<ul>
<li>Hard drive failure</li>
<li>Operating system upgrades</li>
<li>New computer purchases</li>
<li>Extended traveling and touring (installing onto a laptop or netbook)</li>
</ul>
<p>Now I fully understand why your policy is one license per computer. It
makes perfect sense. I have seen plenty of other software with a
similar licensing model. But they also allow you to uninstall the software
and re-register it on another computer due to these circumstances.
There is simply no technological reason why you could not implement a
licensing structure that allowed the user more freedom to transfer
licenses while still preventing piracy. As it stands, your licensing
model treats paying customers like criminals if they happen to run
across any one of the above situations.</p>
<p>So, to sum it up - your foolish license policy has lost you one
customer and many future ones.</p>
<p>Good riddance.</p>
<p>So if you want to support a company that treats its paying customers like criminals because they get a new computer, go right ahead and support Peaksware. But anyone who expects to use software that they pay for even if they happen to buy a new computer should steer clear.</p>
<p>The real kicker is that all that work is locked away in their proprietary file format simply because of their draconian licensing. This is the real take home lesson to all software users (not just fitness geeks): <strong>If you lock your data away in a proprietary format and are beholden to a single company in order to access it, they can and will screw you. Always insist on open data formats, even if using proprietary software</strong>. Oh and always read the EULA carefully before clicking OK!</p>Reading XFS partition from Windows2009-06-21T00:00:00-06:002009-06-21T00:00:00-06:00Matthew T. Perrytag:www.perrygeo.com,2009-06-21:/reading-xfs-partition-from-windows.html<p>When I was setting up my linux system a few years ago, I did some research into filesystems and determined that the <a href="http://en.wikipedia.org/wiki/XFS">XFS file system</a>, being particularly proficient in dealing with large files, would be ideal for my home directory. And it was. But the one factor I didn't consider …</p><p>When I was setting up my linux system a few years ago, I did some research into filesystems and determined that the <a href="http://en.wikipedia.org/wiki/XFS">XFS file system</a>, being particularly proficient in dealing with large files, would be ideal for my home directory. And it was. But the one factor I didn't consider was portability. Turns out that there is basically no support for XFS in windows. </p>
<p>So how do you access your files from Windows if they are on an XFS partition? I had just shy of 1 TB of data to transfer so using my other linux box and transferring across the network would have taken forever. The solution I came up with is a bit convoluted but it has some real advantages:</p>
<ol>
<li>Install Sun's VirtualBox.</li>
<li>Download an iso for your favorite linux distribution (mine being Ubuntu 9.04).</li>
<li>Create a virtual machine from the linux iso.</li>
<li>Install the VBoxGuestAdditions in the linux virtual machine.</li>
<li>Create a shared folder on the windows host and register it with the virtual machine. This will allow you to transfer files from the guest (linux) to the host (windows). You may have to manually mount the drive in the linux guest:
<div class="highlight"><pre><span></span><code>mount -t vboxsf share_name /mnt/share_name
</code></pre></div>
</li>
<li>Using the windows host cmd line, create a vmdk from the physical drive that your XFS partition resides on. In this case, PhysicalDrive1 corresponds to the second SATA connector. This will allow your guest OS to talk directly with the drive:
<div class="highlight"><pre><span></span><code><span class="n">cd</span><span class="w"> </span><span class="nl">C:</span><span class="n">\Program</span><span class="w"> </span><span class="n">Files\Sun\xVM</span><span class="w"> </span><span class="n">VirtualBox</span><span class="w"></span>
<span class="n">VBoxManage</span><span class="p">.</span><span class="n">exe</span><span class="w"> </span><span class="n">internalcommands</span><span class="w"> </span><span class="n">createrawvmdk</span><span class="w"> </span>
<span class="w">    </span><span class="o">-</span><span class="n">filename</span><span class="w"> </span><span class="s">"C:\Documents and Settings\perry\.VirtualBox\HardDisks\Physical1.vmdk"</span><span class="w"> </span>
<span class="w">    </span><span class="o">-</span><span class="n">rawdisk</span><span class="w"> </span><span class="n">\\.\PhysicalDrive1</span><span class="w"> </span><span class="o">-</span><span class="n">register</span><span class="w"></span>
</code></pre></div>
Once completed, you should see:
<div class="highlight"><pre><span></span><code>RAW host disk access VMDK file
C:\Documents and Settings\perry\.VirtualBox\HardDisks\Physical1.vmdk created successfully.
</code></pre></div>
</li>
<li>Add the physical drive to your list of hard drives in the linux guest options. Restart the linux guest virtual machine and your XFS partition should already be mounted. Now you can begin transferring files between your XFS partition and the shared folder on the windows host.</li>
</ol>
<p>Whew. Lots of hassle for a simple file transfer, right! But the side benefit is that now you have a fully functional linux virtual machine with a shared folder set up to the windows host. Very useful - even when you must run windows, it helps to have a linux VM standing by!</p>IronPython (2.6) and ArcGIS - ready for prime time!!2009-06-16T00:00:00-06:002009-06-16T00:00:00-06:00Matthew T. Perrytag:www.perrygeo.com,2009-06-16:/ironpython-26-and-arcgis-ready-for-prime-time.html<p>Not sure why this didn't occur to me <em>before</em> I wrote <a href="http://www.perrygeo.net/wordpress/?p=135">that last post</a> but I tried the "pythonic" version of the code under the <a href="http://ironpython.codeplex.com/Release/ProjectReleases.aspx?ReleaseId=25126">IronPython 2.6 Beta 1</a> release and it works!</p>
<div class="highlight"><pre><span></span><code>lyr = Carto.LayerFileClass()
lyr.Open('C:\\test.lyr')
print lyr.Filename
</code></pre></div>
<p>Works perfectly now. So IronPython …</p><p>Not sure why this didn't occur to me <em>before</em> I wrote <a href="http://www.perrygeo.net/wordpress/?p=135">that last post</a> but I tried the "pythonic" version of the code under the <a href="http://ironpython.codeplex.com/Release/ProjectReleases.aspx?ReleaseId=25126">IronPython 2.6 Beta 1</a> release and it works!</p>
<div class="highlight"><pre><span></span><code>lyr = Carto.LayerFileClass()
lyr.Open('C:\\test.lyr')
print lyr.Filename
</code></pre></div>
<p>Works perfectly now. So IronPython <strong>2.6</strong> promises to be a viable option for extending ArcGIS. My enthusiasm has been renewed.</p>IronPython and ArcGIS - not quite ready for prime time2009-06-16T00:00:00-06:002009-06-16T00:00:00-06:00Matthew T. Perrytag:www.perrygeo.com,2009-06-16:/ironpython-and-arcgis-not-quite-ready-for-prime-time.html<p>Occasionally I find myself in the C#/.NET world in order to write code using ESRI ArcObjects. Today I was toying with the idea of automating the creation of ESRI Layer files (a file which defines the cartographic styling of a dataset). Of course they are in an undocumented binary …</p><p>Occasionally I find myself in the C#/.NET world in order to write code using ESRI ArcObjects. Today I was toying with the idea of automating the creation of ESRI Layer files (a file which defines the cartographic styling of a dataset). Of course they are in an undocumented binary file format, <a href="http://blog.cleverelephant.ca/2009/04/esri-formats-back-to-future.html">inaccessible to anything but ESRI software</a>. So I pop open Visual Studio .... </p>
<p>I feel a nagging unease every time I type a set of curly braces. And VB just makes me insane. I prefer, of course, to use python. Luckily there is <a href="http://www.codeplex.com/Wiki/View.aspx?ProjectName=IronPython">IronPython</a> which runs on .NET - which means I could theoretically use it to interact with ArcGIS. </p>
<p>I only found a <a href="http://moreati.org.uk/blog/2009/01/27/from-esriarcgis-import-geodatabase/">single working example</a> of using ArcObjects through IronPython. But it looked promising enough to close Visual Studio and give it a go. </p>
<p>The first nagging problem is an IronPython-specific one. Relatively minor annoyance but you have to add the reference to a .NET assembly (library) before you can load it. </p>
<div class="highlight"><pre><span></span><code><span class="kn">import</span> <span class="nn">clr</span>
<span class="n">clr</span><span class="o">.</span><span class="n">AddReference</span><span class="p">(</span><span class="s1">'ESRI.ArcGIS.System'</span><span class="p">)</span>
<span class="n">clr</span><span class="o">.</span><span class="n">AddReference</span><span class="p">(</span><span class="s1">'ESRI.ArcGIS.Carto'</span><span class="p">)</span>
<span class="kn">from</span> <span class="nn">ESRI.ArcGIS</span> <span class="kn">import</span> <span class="n">esriSystem</span>
<span class="kn">from</span> <span class="nn">ESRI.ArcGIS</span> <span class="kn">import</span> <span class="n">Carto</span>
</code></pre></div>
<p>Now there is the issue of grabbing an ESRI license. A little verbose IMO but it could easily be encapsulated in a helper function to clean things up. </p>
<div class="highlight"><pre><span></span><code><span class="nv">aoc</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nv">esriSystem</span>.<span class="nv">AoInitializeClass</span><span class="ss">()</span><span class="w"></span>
<span class="nv">res</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nv">esriSystem</span>.<span class="nv">IAoInitialize</span>.<span class="nv">IsProductCodeAvailable</span><span class="ss">(</span><span class="nv">aoc</span>,<span class="w"> </span>
<span class="w"> </span><span class="nv">esriSystem</span>.<span class="nv">esriLicenseProductCode</span>.<span class="nv">esriLicenseProductCodeArcView</span><span class="ss">)</span><span class="w"></span>
<span class="k">if</span><span class="w"> </span><span class="nv">res</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="nv">esriSystem</span>.<span class="nv">esriLicenseStatus</span>.<span class="nv">esriLicenseAvailable</span>:<span class="w"></span>
<span class="w"> </span><span class="nv">esriSystem</span>.<span class="nv">IAoInitialize</span>.<span class="nv">Initialize</span><span class="ss">(</span><span class="nv">aoc</span>,<span class="w"> </span>
<span class="w"> </span><span class="nv">esriSystem</span>.<span class="nv">esriLicenseProductCode</span>.<span class="nv">esriLicenseProductCodeArcView</span><span class="ss">)</span><span class="w"></span>
</code></pre></div>
<p>Now that we've satisfied the demands of our proprietary license overlords, we can proceed with the real work ... in this case I just want to open an existing Layer file and see if the resulting object knows its own file path. Really simple, right?</p>
<div class="highlight"><pre><span></span><code>lyr = Carto.LayerFileClass()
if "Open" in dir(lyr): print "The Layer object has an Open method but...."
lyr.Open('C:\\test.lyr')
print lyr.Filename
The Layer object has an Open method but....
Traceback (most recent call last):
File "&lt;stdin&gt;", line 1, in &lt;module&gt;
AttributeError: 'GenericComObject' object has no attribute 'Open'
</code></pre></div>
<p>Hrm. Looks like we've run across <a href="http://www.codeplex.com/IronPython/WorkItem/View.aspx?WorkItemId=1506">bug 1506</a> which doesn't allow access to the properties and methods of a given instance - instead you have to work through the functions provided by the implementation. Grr...</p>
<div class="highlight"><pre><span></span><code>Carto.ILayerFile.Open(lyr, 'C:\\test.lyr')
print Carto.ILayerFile.Filename.GetValue(lyr)
</code></pre></div>
<p>That is unwieldy, ugly and <a href="http://shalabh.infogami.com/Be_Pythonic2">unpythonic</a>. What's the point of object oriented programming if you can't access the methods and properties of an object directly? Since all ArcObjects applications are based on extending COM interfaces, this would be a major pain in any non-trivial application. Basically, until these .NET-accessible COM objects can be treated in a pythonic way, I don't see any compelling reason to pursue IronPython and ArcGIS integration. Looks like its back to C# for the moment ... (/me take a deep sigh and opens Visual Studio) ... unless of course anyone has some brilliant solution to share!!</p>The GPS told me to do it2009-06-12T00:00:00-06:002009-06-12T00:00:00-06:00Matthew T. Perrytag:www.perrygeo.com,2009-06-12:/the-gps-told-me-to-do-it.html<p>Another disastrous consequence of inaccurate spatial information... Not only can you accidentally <a href="http://www.perrygeo.net/wordpress/?p=75">tag your neighbor as a criminal</a>, now it appears that sloppy spatial data has lead to <a href="http://www.wsbtv.com/news/19715994/detail.html">the wrong house getting demolished</a>. </p>
<p>I've asked it before but its worth repeating ... with all the recent advances in spatial data publishing …</p><p>Another disastrous consequence of inaccurate spatial information... Not only can you accidentally <a href="http://www.perrygeo.net/wordpress/?p=75">tag your neighbor as a criminal</a>, now it appears that sloppy spatial data has lead to <a href="http://www.wsbtv.com/news/19715994/detail.html">the wrong house getting demolished</a>. </p>
<p>I've asked it before but it's worth repeating ... with all the recent advances in spatial data publishing, where are the advances in metadata and data quality assurance? How do you know where the data comes from, what's been done to it and by whom? What is the intended use of the data? For the vast majority of the data being shoved out onto the web, these bits of metadata are sorely lacking.</p>
<p>Of course this case is more a matter of one person's sheer stupidity; I'm not sure any caveats in the metadata would have stopped the wrecking ball!</p>The magic bullet2009-03-25T00:00:00-06:002009-03-25T00:00:00-06:00Matthew T. Perrytag:www.perrygeo.com,2009-03-25:/the-magic-bullet.html<p>Dealing with corrupted shapefiles can be a painful experience: programs crash for seemingly no reason, attribute tables get screwy, features get lost, queries results don't look right and ArcGIS processing tools fail with mysterious error codes:</p>
<p><img alt="Dissolve error" src="/assets/img/uploads/2009/03/dissolve_error.jpg"></p>
<p>Never fear, OGR is here. The magic bullet for fixing corrupted shapefiles is, 90 …</p><p>Dealing with corrupted shapefiles can be a painful experience: programs crash for seemingly no reason, attribute tables get screwy, features get lost, query results don't look right and ArcGIS processing tools fail with mysterious error codes:</p>
<p><img alt="Dissolve error" src="/assets/img/uploads/2009/03/dissolve_error.jpg"></p>
<p>Never fear, OGR is here. The magic bullet for fixing corrupted shapefiles is, 90% of the time, simply using ogr2ogr to convert the shapefile to another shapefile. </p>
<div class="highlight"><pre><span></span><code>ogr2ogr -f "ESRI Shapefile" shiny_new_clean_dataset.shp corrupted_dataset.shp corrupted_dataset
</code></pre></div>
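<p>To apply the same fix across a whole directory of shapefiles, you can script it. The sketch below (a hypothetical helper, not part of OGR itself) just builds the ogr2ogr argument list for each file and runs it; it assumes ogr2ogr is on your PATH and writes the cleaned copies into a <code>clean/</code> subdirectory:</p>

```python
import glob
import os
import subprocess

def clean_command(shp_path):
    """Build the ogr2ogr argv that rewrites a shapefile into clean/."""
    out_shp = os.path.join("clean", os.path.basename(shp_path))
    return ["ogr2ogr", "-f", "ESRI Shapefile", out_shp, shp_path]

# Run the conversion for every shapefile in the current directory.
for shp in glob.glob("*.shp"):
    subprocess.run(clean_command(shp), check=True)
```

Since ogr2ogr rewrites each shapefile through OGR's internal data model, every output comes back clean without touching the originals.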
<p>OGR's internal data model cleans it up and the output is a fresh shiny new shapefile that works without hassle. </p>TV cycling coverage is dead2009-02-19T00:00:00-07:002009-02-19T00:00:00-07:00Matthew T. Perrytag:www.perrygeo.com,2009-02-19:/tv-cycling-coverage-is-dead.html<p>Real-time spatial application developers take note...</p>
<p>I've been following the Tour of California this week (looking forward to the Solvang Time Trial this Friday) and have been disappointed with the TV coverage on Versus. Its not that the coverage is bad, its just that long-distance endurance sports don't lend themselves …</p><p>Real-time spatial application developers take note...</p>
<p>I've been following the Tour of California this week (looking forward to the Solvang Time Trial this Friday) and have been disappointed with the TV coverage on Versus. It's not that the coverage is bad, it's just that long-distance endurance sports don't lend themselves to the traditional 2 announcers and 1 camera format. There are multiple groups of riders and so much spatial information to keep track of if one really wants to understand the dynamics of a cycling event.</p>
<p>Maybe I've just been spoiled by the <a href="http://tracker.amgentourofcalifornia.com/">Amgen Tour Tracker</a>. It is a crowning example of a spatially-aware real-time web application.</p>
<p><a href="/assets/img/tour_tracker.png"><img alt="" src="/assets/img/tour_tracker_thumb.jpg"></a></p>
<p>It provides two cameras of live coverage, live commentary with interviews, chat, summary updates, gps tracking of riders shown on both an elevation profile and a yahoo-based aerial map, "gps+" location prediction, race standings, time checks, etc. Far more information than any TV coverage without resorting to information overload. </p>Stimulus watch2009-02-12T00:00:00-07:002009-02-12T00:00:00-07:00Matthew T. Perrytag:www.perrygeo.com,2009-02-12:/stimulus-watch.html<p>Last time I posted on this blog, Hillary and Obama were still battling it out for the Democratic nomination. Now Barack Obama is our president with an uphill battle to save the economy. So yeah, it's been a while. I haven't been doing too much innovative Geo-related stuff lately, hence …</p><p>Last time I posted on this blog, Hillary and Obama were still battling it out for the Democratic nomination. Now Barack Obama is our president with an uphill battle to save the economy. So yeah, it's been a while. I haven't been doing too much innovative Geo-related stuff lately, hence the lack of blog posts. I'll try to pick up the pace a bit, even if I have to resort to fluff pieces like this one...</p>
<p>Well, it looks like the economic stimulus bill is going to pass. The bill doesn’t actually specify the projects that will be funded; the money will be allocated to cities and some federal grant agencies. The mayors have already proposed thousands of “shovel-ready” projects that might get a green light depending on how much funding the city gets.</p>
<p>There’s a great site, <a href="http://www.stimuluswatch.org">stimuluswatch.org</a>, that allows the public to review these proposals. Good to know where our tax dollars are headed!</p>
<p>There are several <a href="http://www.stimuluswatch.org/project/search/GIS">GIS proposals</a> ranging from projects with specific, well-defined (and measurable) objectives to the nebulous "Give us $500,000 to upgrade our cities' GIS program". It will be interesting to see which ones pan out, which ones produce results and which ones are just a pure waste of taxpayer dollars. </p>
<p>P.S. If you'd like to see where most of my time and energy is going these days, it's training for the US National Cup mountain bike race series. My <a href="http://viedevelo.wordpress.com/">cycling exploits are available for all</a> who are inclined to read them.</p>R is for Radiohead2008-07-15T00:00:00-06:002008-07-15T00:00:00-06:00Matthew T. Perrytag:www.perrygeo.com,2008-07-15:/r-is-for-radiohead.html<p>Radiohead realeased their video for <a href="http://code.google.com/creative/radiohead/">House of Cards</a> yesterday. Besides being a big radiohead fan, I was also loving the <a href="http://www.velodyne.com/lidar/">LIDAR </a><a href="http://www.geometricinformatics.com/">technology </a>behind the video. </p>
<p>If you want to check it out yourself, there are code samples on the site as well as access to the raw data. The csv …</p><p>Radiohead released their video for <a href="http://code.google.com/creative/radiohead/">House of Cards</a> yesterday. Besides being a big radiohead fan, I was also loving the <a href="http://www.velodyne.com/lidar/">LIDAR </a><a href="http://www.geometricinformatics.com/">technology </a>behind the video. </p>
<p>If you want to check it out yourself, there are code samples on the site as well as access to the raw data. The csv files have four columns (x, y, z, and intensity). For me the quickest way to visualize the data was through R and its OpenGL interface called rgl (which is a wonderful high-level 3D data visualization environment). </p>
<p>Assuming you have R installed, rgl is a simple add-on through the CRAN repositories:</p>
<div class="highlight"><pre><span></span><code>install.packages("rgl")
</code></pre></div>
<p>Then you need to load the library, read the csv, and scale the intensity values from 0 to 1. Then it's a simple rgl.points command to get an interactive 3D rendering:</p>
<div class="highlight"><pre><span></span><code><span class="nf">library</span><span class="p">(</span><span class="n">rgl</span><span class="p">)</span>
<span class="n">d</span> <span class="o"><-</span> <span class="nf">read.csv</span><span class="p">(</span><span class="s">"C:/temp/radiohead/22.csv"</span><span class="p">,</span> <span class="n">header</span><span class="o">=</span><span class="kc">FALSE</span><span class="p">)</span>
<span class="c1"># scale intensity values from 0 to 1</span>
<span class="n">d</span><span class="o">$</span><span class="n">int</span> <span class="o"><-</span> <span class="n">d</span><span class="p">[,</span><span class="m">4</span><span class="p">]</span> <span class="o">/</span> <span class="m">255</span>
<span class="c1"># rgl.points(x,y,z,size=__,color=__)</span>
<span class="c1"># note y value is inverted</span>
<span class="c1"># color is a grayscale rgb based on intensity</span>
<span class="nf">rgl.points</span><span class="p">(</span><span class="n">d</span><span class="p">[,</span><span class="m">1</span><span class="p">],</span><span class="n">d</span><span class="p">[,</span><span class="m">2</span><span class="p">]</span><span class="o">*</span><span class="m">-1</span><span class="p">,</span><span class="n">d</span><span class="p">[,</span><span class="m">3</span><span class="p">],</span> <span class="n">size</span><span class="o">=</span><span class="m">3</span><span class="p">,</span> <span class="n">color</span><span class="o">=</span><span class="nf">rgb</span><span class="p">(</span><span class="n">d</span><span class="o">$</span><span class="n">int</span><span class="p">,</span><span class="n">d</span><span class="o">$</span><span class="n">int</span><span class="p">,</span><span class="n">d</span><span class="o">$</span><span class="n">int</span><span class="p">))</span>
</code></pre></div>
<p>That's all it takes to render Thom Yorke in all his 3D digital glory:</p>
<p><img alt="" src="/assets/img/radiohead2.jpg"></p>Geospatial Reddit - 2 weeks later2008-06-12T00:00:00-06:002008-06-12T00:00:00-06:00Matthew T. Perrytag:www.perrygeo.com,2008-06-12:/geospatial-reddit-2-weeks-later.html<p>So, despite frustrations with getting submitted URLs to appear, <a href="http://www.reddit.com/r/geospatial/">Geospatial Reddit</a> is still puttering along. Not exactly a vibrant community <em>yet</em> but there are currently 133 subscribers. If you're subscribed, take a minute to submit your favorite URLs. If you haven't subscribed, <a href="http://www.reddit.com/r/geospatial/">check it out</a>.</p>
<p>I thought 133 subscribers was …</p><p>So, despite frustrations with getting submitted URLs to appear, <a href="http://www.reddit.com/r/geospatial/">Geospatial Reddit</a> is still puttering along. Not exactly a vibrant community <em>yet</em> but there are currently 133 subscribers. If you're subscribed, take a minute to submit your favorite URLs. If you haven't subscribed, <a href="http://www.reddit.com/r/geospatial/">check it out</a>.</p>
<p>I thought 133 subscribers was a decent number until I found that the <a href="http://www.reddit.com/r/bacon/">Bacon subreddit</a> has over 500. Apparently the world would rather discuss their greasy breakfast food than maps. </p>Jabref - Open Source Alternative to EndNote2008-06-08T00:00:00-06:002008-06-08T00:00:00-06:00Matthew T. Perrytag:www.perrygeo.com,2008-06-08:/jabref-open-source-alternative-to-endnote.html<p>For those of you that use EndNote to keep track of your bibliographies/references, there is an alternative: <a href="http://jabref.sourceforge.net">JabRef</a>. I find the <a href="http://jabref.sourceforge.net/images/Jabref-ScreenShot-MainWindow.png">UI</a> to be very intuitive and it has a range of customizable import/export formats. JabRef uses the <a href="http://en.wikipedia.org/wiki/BibTeX">BibTex</a> format as its native file format so, of course …</p><p>For those of you that use EndNote to keep track of your bibliographies/references, there is an alternative: <a href="http://jabref.sourceforge.net">JabRef</a>. I find the <a href="http://jabref.sourceforge.net/images/Jabref-ScreenShot-MainWindow.png">UI</a> to be very intuitive and it has a range of customizable import/export formats. JabRef uses the <a href="http://en.wikipedia.org/wiki/BibTeX">BibTex</a> format as its native file format so, of course, it integrates very well with <a href="http://en.wikipedia.org/wiki/LaTeX">LaTeX</a>.</p>
<p>One of the neat features is the ability to create custom bibliographies in HTML, complete with javascript-based search capabilities. Here's <a href="http://perrygeo.net/references.html">my reference list</a> which I'll be slowly adding to as I convert all my old text-based and EndNote reference lists over. </p>Geospatial Reddit - A democratic solution to geo blog overload?2008-05-28T00:00:00-06:002008-05-28T00:00:00-06:00Matthew T. Perrytag:www.perrygeo.com,2008-05-28:/geospatial-reddit-a-democratic-solution-to-geo-blog-overload.html<p>All the great GIS news/blog aggregators out there (planetgs, slashgeo, etc) are moderator driven - a few people act as the gatekeepers and inevitably <a href="http://www.spatiallyadjusted.com/2008/05/23/planet-geospatial-reboot-coming/">have to decide what information is useful</a>. This is <a href="http://zcologia.com/news/762/planet-geospatial/">not the ideal way</a> to do things. </p>
<p>There's a more democratic and distributed way to spread the …</p><p>All the great GIS news/blog aggregators out there (planetgs, slashgeo, etc) are moderator driven - a few people act as the gatekeepers and inevitably <a href="http://www.spatiallyadjusted.com/2008/05/23/planet-geospatial-reboot-coming/">have to decide what information is useful</a>. This is <a href="http://zcologia.com/news/762/planet-geospatial/">not the ideal way</a> to do things. </p>
<p>There's a more democratic and distributed way to spread the role - it's called <em>reddit</em>. <img alt="" src="http://reallystatic.reddit.com/static/create-a-reddit.png"> More specifically, <a href="http://reddit.com/r/geospatial">Geospatial Reddit</a>. For those unfamiliar with reddit (or similar sites like digg), the idea is simple: users submit stories and users vote on stories. The most popular ones rise to the top and, theoretically, the best articles magically appear on the front page. Much like democracy itself, there are flaws in the theory but it's the best thing we've got.</p>
<p>Geospatial Reddit is public so sign up, submit your favorite stories and vote. Let's see if we can make this work.</p>Posting to Geospatial Reddit2008-05-28T00:00:00-06:002008-05-28T00:00:00-06:00Matthew T. Perrytag:www.perrygeo.com,2008-05-28:/posting-to-geospatial-reddit.html<p>Some folks have had trouble submitting links so I figured I should post a bit more detail on that. To get articles to show up on the <em>geospatial</em> reddit (not the main reddit), go to <a href="http://reddit.com/r/geospatial/submit">http://reddit.com/r/geospatial/submit</a> or click the "Submit a Link" button on the …</p><p>Some folks have had trouble submitting links so I figured I should post a bit more detail on that. To get articles to show up on the <em>geospatial</em> reddit (not the main reddit), go to <a href="http://reddit.com/r/geospatial/submit">http://reddit.com/r/geospatial/submit</a> or click the "Submit a Link" button on the right - from the geospatial page. When you're submitting the URL, you should see "submit to geospatial" as the page header. </p>
<p>I know at least 2 of us have been successful at posting. If this doesn't work for you, please let me know and I'll try and figure it out. </p>So you want to learn to learn about kriging …2008-05-25T00:00:00-06:002008-05-25T00:00:00-06:00Matthew T. Perrytag:www.perrygeo.com,2008-05-25:/so-you-want-to-learn-to-learn-about-kriging.html<p>Guides like <a href="http://spatial-analyst.net/">Tomislav Hengl's</a> <a href="http://eusoils.jrc.it/ESDB_Archive/eusoils_docs/other/EUR22904en.pdf">Practical Guide to Geostatistical Mapping of Environmental Variables</a> and Rossiter's <a href="http://www.itc.nl/~rossiter/teach/stats/ssi_short.pdf">Introduction to applied geostatistics</a> do an excellent job of providing a grounded, relatively easy to understand, introduction to geostatistical prediction and kriging.</p>
<p>But if you're an experiential learner (like me), you don't absorb the mathematics fully …</p><p>Guides like <a href="http://spatial-analyst.net/">Tomislav Hengl's</a> <a href="http://eusoils.jrc.it/ESDB_Archive/eusoils_docs/other/EUR22904en.pdf">Practical Guide to Geostatistical Mapping of Environmental Variables</a> and Rossiter's <a href="http://www.itc.nl/~rossiter/teach/stats/ssi_short.pdf">Introduction to applied geostatistics</a> do an excellent job of providing a grounded, relatively easy to understand, introduction to geostatistical prediction and kriging.</p>
<p>But if you're an experiential learner (like me), you don't absorb the mathematics fully without <em>doing</em> something with the knowledge; seeing it in action brings the concepts to life. Unfortunately most geostats/kriging software is either too complex for exploratory learning (not enough immediate feedback) or too simplistic (making too many assumptions, disallowing access to the nitty-gritty details). Either way, you're bound to produce output with fundamental flaws because you're not aware of the finer details of variogram modelling. I speak from experience!</p>
<p>Luckily Dennis J. J. Walvoort of the Wageningen University & Research Center saw the same problem and created a nifty learning tool to explore variogram models and spatial predictions using ordinary kriging - <a href="http://www.ai-geostats.org/index.php?id=114">EZ-Kriging</a>. No degree in math or statistical theory required. Just drag the points around, play with the parameters, alter the underlying data as a table, and see the results immediately.</p>
<p><a href="/assets/img/ezkriging.jpg"><img alt="" src="/assets/img/ezkriging_thumb.jpg"></a></p>
<p>It's nothing more than a simulation, so don't expect to load your own datasets or produce any meaningful output with it. But it truly excels as a learning tool for understanding the core concepts behind kriging and is a great complement to Hengl and Rossiter's work. With that knowledge you can do the real deal in Surfer, R, ILWIS or your geostats software of choice.</p>
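<p>If you want to peek under the hood of that first modelling step yourself, the empirical semivariogram - half the squared difference between sample values, averaged within bins of separation distance - takes only a few lines of code. Here's a rough, self-contained Python sketch (my own illustration, not part of EZ-Kriging or the guides above; the function name and binning scheme are made up for the example):</p>

```python
# Rough sketch: empirical semivariogram by brute-force pair comparison.
# Bins half the squared value differences by lag (separation) distance.
import math

def semivariogram(points, values, lag_width, n_lags):
    """points: list of (x, y) tuples; values: measurements at those points."""
    sums = [0.0] * n_lags
    counts = [0] * n_lags
    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            dx = points[i][0] - points[j][0]
            dy = points[i][1] - points[j][1]
            h = math.hypot(dx, dy)   # separation distance between the pair
            k = int(h // lag_width)  # which lag bin this pair falls in
            if k < n_lags:
                sums[k] += 0.5 * (values[i] - values[j]) ** 2
                counts[k] += 1
    # average semivariance per bin (None for empty bins)
    return [s / c if c else None for s, c in zip(sums, counts)]

# three samples on a line: nearby pairs differ less than distant ones
print(semivariogram([(0, 0), (1, 0), (2, 0)], [0.0, 1.0, 2.0], 1.5, 2))
# -> [0.5, 2.0]
```

<p>Fitting a model curve (spherical, exponential, and so on) to those binned values is exactly the step EZ-Kriging lets you play with interactively.</p>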
<p>EDIT: One complaint I have about EZ-Kriging: it doesn't show the observed sample variogram cloud overlaid on the variogram model. Oh well, still a nice tool.</p>
<p>EDIT2: It's a Windows .exe but it runs smoothly under Wine in Linux.</p>Ubuntu as a GIS workstation (updated for Hardy Heron)2008-05-14T00:00:00-06:002008-05-14T00:00:00-06:00Matthew T. Perrytag:www.perrygeo.com,2008-05-14:/ubuntu-as-a-gis-workstation-updated-for-hardy-heron.html<p>As a follow-up to my previous post on <a href="http://www.perrygeo.net/wordpress/?p=10">turning Ubuntu Gutsy into a GIS workstation</a>, here are the revised instructions for Ubuntu 8.04 (The Hardy Heron). </p>
<p>Note that there are a few additional apps and changes in here:</p>
<ul>
<li>
<p>Postgis</p>
</li>
<li>
<p>Mapnik</p>
</li>
<li>
<p>New version of QGIS installed via repository</p>
</li>
<li>
<p>OpenStreetMap tools …</p></li></ul><p>As a follow-up to my previous post on <a href="http://www.perrygeo.net/wordpress/?p=10">turning Ubuntu Gutsy into a GIS workstation</a>, here are the revised instructions for Ubuntu 8.04 (The Hardy Heron). </p>
<p>Note that there are a few additional apps and changes in here:</p>
<ul>
<li>
<p>Postgis</p>
</li>
<li>
<p>Mapnik</p>
</li>
<li>
<p>New version of QGIS installed via repository</p>
</li>
<li>
<p>OpenStreetMap tools (JOSM and osm2pgsql)</p>
</li>
<li>
<p>Geotiff utilities</p>
</li>
<li>
<p>Some nice python spatial libs (shapely, owslib, geopy and pyproj) </p>
</li>
</ul>
<p>Run the following as root on your new Hardy installation, answer a few configuration questions and you'll be ready to go.</p>
<div class="highlight"><pre><span></span><code><span class="n">echo</span><span class="w"> </span><span class="s1">'deb http://ppa.launchpad.net/qgis/ubuntu hardy main'</span><span class="w"> </span><span class="o">>></span><span class="w"> </span><span class="o">/</span><span class="n">etc</span><span class="o">/</span><span class="n">apt</span><span class="o">/</span><span class="n">sources</span><span class="o">.</span><span class="n">list</span><span class="w"></span>
<span class="n">apt</span><span class="o">-</span><span class="n">get</span><span class="w"> </span><span class="n">update</span><span class="w"></span>
<span class="n">apt</span><span class="o">-</span><span class="n">get</span><span class="w"> </span><span class="o">-</span><span class="n">y</span><span class="w"> </span><span class="o">--</span><span class="n">force</span><span class="o">-</span><span class="n">yes</span><span class="w"> </span><span class="n">install</span><span class="w"> </span><span class="n">grass</span><span class="w"> </span><span class="n">mapserver</span><span class="o">-</span><span class="n">bin</span><span class="w"> </span>\<span class="w"></span>
<span class="n">gdal</span><span class="o">-</span><span class="n">bin</span><span class="w"> </span><span class="n">cgi</span><span class="o">-</span><span class="n">mapserver</span><span class="w"> </span><span class="n">python</span><span class="o">-</span><span class="n">qt4</span><span class="w"> </span><span class="n">python</span><span class="o">-</span><span class="n">sip4</span><span class="w"> </span><span class="n">python</span><span class="o">-</span><span class="n">gdal</span><span class="w"> </span>\<span class="w"></span>
<span class="n">python</span><span class="o">-</span><span class="n">mapscript</span><span class="w"> </span><span class="n">gmt</span><span class="w"> </span><span class="n">gmt</span><span class="o">-</span><span class="n">coastline</span><span class="o">-</span><span class="n">data</span><span class="w"> </span><span class="n">r</span><span class="o">-</span><span class="n">recommended</span><span class="w"> </span><span class="n">gpsbabel</span><span class="w"> </span>\<span class="w"></span>
<span class="n">shapelib</span><span class="w"> </span><span class="n">qgis</span><span class="w"> </span><span class="n">qgis</span><span class="o">-</span><span class="n">plugin</span><span class="o">-</span><span class="n">grass</span><span class="w"> </span><span class="n">python</span><span class="o">-</span><span class="n">setuptools</span><span class="w"> </span>\<span class="w"></span>
<span class="n">python</span><span class="o">-</span><span class="n">mapnik</span><span class="w"> </span><span class="n">mapnik</span><span class="o">-</span><span class="n">plugins</span><span class="w"> </span><span class="n">mapnik</span><span class="o">-</span><span class="n">utils</span><span class="w"> </span><span class="n">osm2pgsql</span><span class="w"> </span><span class="n">josm</span><span class="w"> </span><span class="n">postgresql</span><span class="o">-</span><span class="mf">8.3</span><span class="o">-</span><span class="n">postgis</span><span class="w"> </span>\<span class="w"></span>
<span class="n">python</span><span class="o">-</span><span class="n">dev</span><span class="w"> </span><span class="n">build</span><span class="o">-</span><span class="n">essential</span><span class="w"> </span><span class="n">libgdal</span><span class="o">-</span><span class="n">dev</span><span class="w"> </span><span class="n">geotiff</span><span class="o">-</span><span class="n">bin</span><span class="w"> </span><span class="n">sun</span><span class="o">-</span><span class="n">java6</span><span class="o">-</span><span class="n">jre</span><span class="w"></span>
<span class="n">easy_install</span><span class="w"> </span><span class="n">shapely</span><span class="w"> </span><span class="n">geopy</span><span class="w"> </span><span class="n">owslib</span><span class="w"> </span><span class="n">pyproj</span><span class="w"></span>
</code></pre></div>
<p>EDIT: If you're looking for more up-to-date packages for GEOS, GDAL, etc., try adding <code>deb http://les-ejk.cz/ubuntu/ hardy multiverse</code> to your /etc/apt/sources.list </p>'Hike of Doom #2- OGC KML'2008-04-21T00:00:00-06:002008-04-21T00:00:00-06:00Matthew T. Perrytag:www.perrygeo.com,2008-04-21:/hike-of-doom-2-ogc-kml.html<p>In commemoration of the <a href="http://www.opengeospatial.org/pressroom/pressreleases/857">OGC approval of KML</a> as an open standard to share geographic content over the web, I'd like to share our recent <a href="/assets/img/hikeofdoom2/hikeofdoom_20080413.kmz">"Hike of Doom #2"</a> (KML provided by Mark Dotson).</p>
<hr>
<p>The first weekend to hit 90 degrees, my friends and I travel inland to dive and …</p><p>In commemoration of the <a href="http://www.opengeospatial.org/pressroom/pressreleases/857">OGC approval of KML</a> as an open standard to share geographic content over the web, I'd like to share our recent <a href="/assets/img/hikeofdoom2/hikeofdoom_20080413.kmz">"Hike of Doom #2"</a> (kml provided by Mark Dotson).</p>
<hr>
<p>The first weekend to hit 90 degrees, my friends and I travel inland to dive and swim in the Santa Ynez river. It is billed as a "30 minute" hike to our favorite watering hole. It becomes much more than that. </p>
<p>Of course the road leading up to the trailhead is closed due to construction, so we have 3 miles of hiking on pavement just to get to the former trailhead - the Red Rocks parking lot.
<img alt="" src="/assets/img/hikeofdoom2/IMGP5144.jpg"> </p>
<p>Then the fun begins. A decent rainy season and some dam releases make for high flows, and we've got a half-dozen major river crossings to contend with. The recent fires added a good deal of organic matter to the river and the algae has bloomed accordingly. It is a wet, hot, rocky and slimy hike. </p>
<p><img alt="" src="/assets/img/hikeofdoom2/IMGP5178.jpg"></p>
<p>We make it to the swimming hole and enjoy the day. We dive, laugh, have a few beers.</p>
<p><img alt="" src="/assets/img/hikeofdoom2/IMGP5199.jpg">
<img alt="" src="/assets/img/hikeofdoom2/IMGP5206.jpg"> </p>
<p>The sun sets and the fun <em>really</em> begins. </p>
<p>Klaus, the Bavarian cyclist whom we'd met at the swimming hole, caught up with us just after my girlfriend, Joselyne, sprained her ankle on a rock. Her ankle hadn't started to swell yet, but I could tell, drawing on my past basketball injuries, that she was not putting weight on it any time soon. We fashioned crutches from some driftwood. We also met some turkey hunters (dressed in camouflage more effective than most military uniforms) who helped us out by providing some ankle wrap.
<img alt="" src="/assets/img/hikeofdoom2/IMGP5254.jpg"> </p>
<p>David and Andy began the trek back to the car to get help. The rest of us could either go back via the river bed, a rocky and treacherous endeavor given the setting sun, or head up to the main road. We decided on the main road and Shaun took off to alert the others to our plans. The main fire road was a trek in the <em>opposite</em> direction - longer, with more elevation changes, but smooth enough for a bike or truck and more accessible to vehicles. </p>
<p>I carried Jos, over my shoulder fireman-style and/or piggy-back, over the river crossings.
<img alt="" src="/assets/img/hikeofdoom2/IMGP5256.jpg">
On the flats, Mark and I pushed Jos on Klaus' bike.
<img alt="" src="/assets/img/hikeofdoom2/IMGP5261.jpg"></p>
<p>We pushed on up the trail until we reached the main road. Klaus, after drinking the last of our beer, biked up to the dam keeper's residence at Gibraltar Dam while Christina, Sarah, Mark, Jos and I continued up the trail. A half-hour later, Klaus and the dam keeper arrived in a pickup and drove the rest of us back to the Red Rock "parking lot".
<img alt="" src="/assets/img/hikeofdoom2/IMGP5268.jpg"> </p>
<p>But the construction and rebar on the causeway meant there was no way to cross with a normal vehicle so we went by foot. Jos got back on Klaus' bike and we pushed.
<img alt="" src="/assets/img/hikeofdoom2/IMGP5274.jpg"></p>
<p>Luckily the slight downhill grade allowed her to glide back for a good portion, graciously sparing Mark and me from permanent back injury.</p>
<p>Meanwhile the away team had gotten some semblance of cellular reception and attempted to call the authorities. The goal was to get a ranger truck to drive out to get us or at least unlock the gate to meet us halfway at the Red Rock parking lot. The authorities' response was fantastic, if a bit overzealous. By the time we had gotten within a 1/4 mile of our car, we spotted helicopters. Then a firetruck. Then an ambulance. Joselyne was coasting by on Klaus' bike and they didn't even stop for her on the first pass! Apparently expecting to rescue a mangled body from the wilderness, the EMTs were somewhat disappointed at the less challenging situation they faced - a girl, coasting down the road on a bike with a sprained ankle.
<img alt="" src="/assets/img/hikeofdoom2/IMGP5279.jpg"></p>
<p>We were back in the car, on the road before dark and got home in time for pizza.</p>
<p>So what did we learn from this? Well as a Boy Scout, I am ashamed to say I wasn't prepared. A well prepped emergency kit would have helped a lot. At least we had an LED headlamp. Some rope would have gone a long way towards making a stretcher. An instant-ice-pack, ankle wrap and some ibuprofen would have been handy. We were wet and the mercury was falling quickly; some emergency shelter and clothing would have assuaged my concerns about the nighttime chill.</p>
<p>But this was offset by the generosity of the many people we met for the first time - the hunters who lent us their medical supplies, the dam keeper who got up from his Sunday dinner to make sure we got back safely, the EMTs who put tremendous resources into organizing a military-scale search party, and Klaus, who so generously stuck with us and shared his bike, his wisdom and his company. Without their help and our group of friends, the story might have a less happy ending. </p>
<p>Never underestimate the power of human kindness, generosity and cooperation! And never believe me when I say it's a short hike.</p>A quick Cython introduction2008-04-19T00:00:00-06:002008-04-19T00:00:00-06:00Matthew T. Perrytag:www.perrygeo.com,2008-04-19:/a-quick-cython-introduction.html<p>I love python for its beautiful code and practicality. But it's not going to win a <a href="http://shootout.alioth.debian.org/debian/benchmark.php?test=all&lang=python&lang2=gcc">pure speed race</a> with most languages. Most people think of speed and ease-of-use as polar opposites - probably because they remember the pain of writing C. <a href="http://www.cython.org/">Cython</a> tries to eliminate that duality and lets you …</p><p>I love python for its beautiful code and practicality. But it's not going to win a <a href="http://shootout.alioth.debian.org/debian/benchmark.php?test=all&lang=python&lang2=gcc">pure speed race</a> with most languages. Most people think of speed and ease-of-use as polar opposites - probably because they remember the pain of writing C. <a href="http://www.cython.org/">Cython</a> tries to eliminate that duality and lets you have python syntax with C data types and functions - the best of both worlds. Keeping in mind that I'm by no means an expert at this, here are my notes based on my first real experiment with Cython:</p>
<p>EDIT: Based on some feedback I've received there seems to be some confusion - Cython is for generating <em>C extensions to Python</em> not standalone programs. The whole point is to speed up an existing python app one function at a time. No rewriting the whole application in C or Lisp. No <a href="http://www.dalkescientific.com/writings/NBN/c_extensions.html">writing C extensions by hand</a>. Just an easy way to get C speed and C data types into your slow python functions. </p>
<hr>
<p>So let's say we want to make this function faster. It is the <a href="http://mathworld.wolfram.com/GreatCircle.html">"great circle" calculation</a>, a quick spherical trig problem to calculate distance along the earth's surface between two points:</p>
<p><em>p1.py</em></p>
<div class="highlight"><pre><span></span><code><span class="kn">import</span> <span class="nn">math</span>
<span class="k">def</span> <span class="nf">great_circle</span><span class="p">(</span><span class="n">lon1</span><span class="p">,</span><span class="n">lat1</span><span class="p">,</span><span class="n">lon2</span><span class="p">,</span><span class="n">lat2</span><span class="p">):</span>
<span class="n">radius</span> <span class="o">=</span> <span class="mi">3956</span> <span class="c1">#miles</span>
<span class="n">x</span> <span class="o">=</span> <span class="n">math</span><span class="o">.</span><span class="n">pi</span><span class="o">/</span><span class="mf">180.0</span>
<span class="n">a</span> <span class="o">=</span> <span class="p">(</span><span class="mf">90.0</span><span class="o">-</span><span class="n">lat1</span><span class="p">)</span><span class="o">*</span><span class="p">(</span><span class="n">x</span><span class="p">)</span>
<span class="n">b</span> <span class="o">=</span> <span class="p">(</span><span class="mf">90.0</span><span class="o">-</span><span class="n">lat2</span><span class="p">)</span><span class="o">*</span><span class="p">(</span><span class="n">x</span><span class="p">)</span>
<span class="n">theta</span> <span class="o">=</span> <span class="p">(</span><span class="n">lon2</span><span class="o">-</span><span class="n">lon1</span><span class="p">)</span><span class="o">*</span><span class="p">(</span><span class="n">x</span><span class="p">)</span>
<span class="n">c</span> <span class="o">=</span> <span class="n">math</span><span class="o">.</span><span class="n">acos</span><span class="p">((</span><span class="n">math</span><span class="o">.</span><span class="n">cos</span><span class="p">(</span><span class="n">a</span><span class="p">)</span><span class="o">*</span><span class="n">math</span><span class="o">.</span><span class="n">cos</span><span class="p">(</span><span class="n">b</span><span class="p">))</span> <span class="o">+</span>
<span class="p">(</span><span class="n">math</span><span class="o">.</span><span class="n">sin</span><span class="p">(</span><span class="n">a</span><span class="p">)</span><span class="o">*</span><span class="n">math</span><span class="o">.</span><span class="n">sin</span><span class="p">(</span><span class="n">b</span><span class="p">)</span><span class="o">*</span><span class="n">math</span><span class="o">.</span><span class="n">cos</span><span class="p">(</span><span class="n">theta</span><span class="p">)))</span>
<span class="k">return</span> <span class="n">radius</span><span class="o">*</span><span class="n">c</span>
</code></pre></div>
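<p>Before timing it, a quick sanity check on the math doesn't hurt: 90 degrees of longitude along the equator should come out to a quarter of the circumference (3956 × π/2, roughly 6214 miles), and antipodal points to half of it. A minimal check (the function is just the p1.py code repeated so the snippet is self-contained):</p>

```python
# check_p1.py - sanity checks for great_circle (same code as p1.py above)
import math

def great_circle(lon1, lat1, lon2, lat2):
    radius = 3956  # miles
    x = math.pi / 180.0
    a = (90.0 - lat1) * x
    b = (90.0 - lat2) * x
    theta = (lon2 - lon1) * x
    c = math.acos((math.cos(a) * math.cos(b)) +
                  (math.sin(a) * math.sin(b) * math.cos(theta)))
    return radius * c

# a quarter of the equator: 3956 * pi / 2 miles
assert abs(great_circle(0.0, 0.0, 90.0, 0.0) - 3956 * math.pi / 2) < 1e-3
# antipodal points: half the circumference
assert abs(great_circle(0.0, 0.0, 180.0, 0.0) - 3956 * math.pi) < 1e-3
```

<p>(One caveat worth knowing: for two identical points, floating-point rounding can push the acos argument just above 1.0 and raise a ValueError, so production code usually clamps that value to [-1, 1].)</p>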
<p>Let's try it out and <a href="http://www.diveintopython.net/performance_tuning/timeit.html">time it</a> over half a million function calls:</p>
<div class="highlight"><pre><span></span><code><span class="kn">import</span> <span class="nn">timeit</span>
<span class="n">lon1</span><span class="p">,</span> <span class="n">lat1</span><span class="p">,</span> <span class="n">lon2</span><span class="p">,</span> <span class="n">lat2</span> <span class="o">=</span> <span class="o">-</span><span class="mf">72.345</span><span class="p">,</span> <span class="mf">34.323</span><span class="p">,</span> <span class="o">-</span><span class="mf">61.823</span><span class="p">,</span> <span class="mf">54.826</span>
<span class="n">num</span> <span class="o">=</span> <span class="mi">500000</span>
<span class="n">t</span> <span class="o">=</span> <span class="n">timeit</span><span class="o">.</span><span class="n">Timer</span><span class="p">(</span><span class="s2">"p1.great_circle(</span><span class="si">%f</span><span class="s2">,</span><span class="si">%f</span><span class="s2">,</span><span class="si">%f</span><span class="s2">,</span><span class="si">%f</span><span class="s2">)"</span> <span class="o">%</span> <span class="p">(</span><span class="n">lon1</span><span class="p">,</span><span class="n">lat1</span><span class="p">,</span><span class="n">lon2</span><span class="p">,</span><span class="n">lat2</span><span class="p">),</span>
<span class="s2">"import p1"</span><span class="p">)</span>
<span class="nb">print</span> <span class="s2">"Pure python function"</span><span class="p">,</span> <span class="n">t</span><span class="o">.</span><span class="n">timeit</span><span class="p">(</span><span class="n">num</span><span class="p">),</span> <span class="s2">"sec"</span>
</code></pre></div>
<p>About <strong>2.2 seconds</strong>. Too slow! </p>
<p>Let's try a quick rewrite in Cython and see if that makes a difference:
<em>c1.pyx</em></p>
<div class="highlight"><pre><span></span><code><span class="kn">import</span> <span class="nn">math</span>
<span class="k">def</span> <span class="nf">great_circle</span><span class="p">(</span><span class="nb">float</span> <span class="n">lon1</span><span class="p">,</span><span class="nb">float</span> <span class="n">lat1</span><span class="p">,</span><span class="nb">float</span> <span class="n">lon2</span><span class="p">,</span><span class="nb">float</span> <span class="n">lat2</span><span class="p">):</span>
<span class="n">cdef</span> <span class="nb">float</span> <span class="n">radius</span> <span class="o">=</span> <span class="mf">3956.0</span>
<span class="n">cdef</span> <span class="nb">float</span> <span class="n">pi</span> <span class="o">=</span> <span class="mf">3.14159265</span>
<span class="n">cdef</span> <span class="nb">float</span> <span class="n">x</span> <span class="o">=</span> <span class="n">pi</span><span class="o">/</span><span class="mf">180.0</span>
<span class="n">cdef</span> <span class="nb">float</span> <span class="n">a</span><span class="p">,</span><span class="n">b</span><span class="p">,</span><span class="n">theta</span><span class="p">,</span><span class="n">c</span>
<span class="n">a</span> <span class="o">=</span> <span class="p">(</span><span class="mf">90.0</span><span class="o">-</span><span class="n">lat1</span><span class="p">)</span><span class="o">*</span><span class="p">(</span><span class="n">x</span><span class="p">)</span>
<span class="n">b</span> <span class="o">=</span> <span class="p">(</span><span class="mf">90.0</span><span class="o">-</span><span class="n">lat2</span><span class="p">)</span><span class="o">*</span><span class="p">(</span><span class="n">x</span><span class="p">)</span>
<span class="n">theta</span> <span class="o">=</span> <span class="p">(</span><span class="n">lon2</span><span class="o">-</span><span class="n">lon1</span><span class="p">)</span><span class="o">*</span><span class="p">(</span><span class="n">x</span><span class="p">)</span>
<span class="n">c</span> <span class="o">=</span> <span class="n">math</span><span class="o">.</span><span class="n">acos</span><span class="p">((</span><span class="n">math</span><span class="o">.</span><span class="n">cos</span><span class="p">(</span><span class="n">a</span><span class="p">)</span><span class="o">*</span><span class="n">math</span><span class="o">.</span><span class="n">cos</span><span class="p">(</span><span class="n">b</span><span class="p">))</span> <span class="o">+</span> <span class="p">(</span><span class="n">math</span><span class="o">.</span><span class="n">sin</span><span class="p">(</span><span class="n">a</span><span class="p">)</span><span class="o">*</span><span class="n">math</span><span class="o">.</span><span class="n">sin</span><span class="p">(</span><span class="n">b</span><span class="p">)</span><span class="o">*</span><span class="n">math</span><span class="o">.</span><span class="n">cos</span><span class="p">(</span><span class="n">theta</span><span class="p">)))</span>
<span class="k">return</span> <span class="n">radius</span><span class="o">*</span><span class="n">c</span>
</code></pre></div>
<p>Notice that we still <em>import math</em> - Cython lets you mix and match python and C data types to some extent. The conversion is handled automatically, though not without cost. In this example all we've done is define a <em>python</em> function, declare its input parameters to be floats, and declare a static C float data type for all the variables. It still uses the python math module to do the calcs. </p>
<p>Now we need to convert this to C code and compile the python extension. The best way to do this is through a <a href="http://ldots.org/pyrex-guide/2-compiling.html#distutils">setup.py distutils script</a>. But we'll do it the <a href="http://ldots.org/pyrex-guide/2-compiling.html#gcc">manual way</a> for now to see what's happening:</p>
<div class="highlight"><pre><span></span><code>#<span class="w"> </span><span class="nv">this</span><span class="w"> </span><span class="nv">will</span><span class="w"> </span><span class="nv">create</span><span class="w"> </span><span class="nv">a</span><span class="w"> </span><span class="nv">c1</span>.<span class="nv">c</span><span class="w"> </span><span class="nv">file</span><span class="w"> </span><span class="o">-</span><span class="w"> </span><span class="nv">the</span><span class="w"> </span><span class="nv">C</span><span class="w"> </span><span class="nv">source</span><span class="w"> </span><span class="nv">code</span><span class="w"> </span><span class="nv">to</span><span class="w"> </span><span class="nv">build</span><span class="w"> </span><span class="nv">a</span><span class="w"> </span><span class="nv">python</span><span class="w"> </span><span class="nv">extension</span><span class="w"></span>
<span class="nv">cython</span><span class="w"> </span><span class="nv">c1</span>.<span class="nv">pyx</span><span class="w"></span>
#<span class="w"> </span><span class="nv">Compile</span><span class="w"> </span><span class="nv">the</span><span class="w"> </span><span class="nv">object</span><span class="w"> </span><span class="nv">file</span><span class="w"> </span>
<span class="nv">gcc</span><span class="w"> </span><span class="o">-</span><span class="nv">c</span><span class="w"> </span><span class="o">-</span><span class="nv">fPIC</span><span class="w"> </span><span class="o">-</span><span class="nv">I</span><span class="o">/</span><span class="nv">usr</span><span class="o">/</span><span class="k">include</span><span class="o">/</span><span class="nv">python2</span>.<span class="mi">5</span><span class="o">/</span><span class="w"> </span><span class="nv">c1</span>.<span class="nv">c</span><span class="w"></span>
#<span class="w"> </span><span class="nv">Link</span><span class="w"> </span><span class="nv">it</span><span class="w"> </span><span class="nv">into</span><span class="w"> </span><span class="nv">a</span><span class="w"> </span><span class="nv">shared</span><span class="w"> </span><span class="nv">library</span><span class="w"></span>
<span class="nv">gcc</span><span class="w"> </span><span class="o">-</span><span class="nv">shared</span><span class="w"> </span><span class="nv">c1</span>.<span class="nv">o</span><span class="w"> </span><span class="o">-</span><span class="nv">o</span><span class="w"> </span><span class="nv">c1</span>.<span class="nv">so</span><span class="w"></span>
</code></pre></div>
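<p>The setup.py route mentioned above boils down to a very small build script. Here is a sketch using Cython's distutils integration (the file and module names match this example, but it assumes Cython is installed and is untested here):</p>

```python
# setup.py - build the c1 extension (a sketch; assumes Cython is installed)
from distutils.core import setup
from distutils.extension import Extension
from Cython.Distutils import build_ext

setup(
    name="c1",
    cmdclass={"build_ext": build_ext},
    ext_modules=[Extension("c1", ["c1.pyx"])],
)
```

<p>Running <code>python setup.py build_ext --inplace</code> should leave an importable c1.so next to your source, replacing the two gcc invocations above.</p>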
<p>Now you should have a c1.so (or .dll) file which can be imported in python. Let's give it a run:</p>
<div class="highlight"><pre><span></span><code> <span class="n">t</span> <span class="o">=</span> <span class="n">timeit</span><span class="o">.</span><span class="n">Timer</span><span class="p">(</span><span class="s2">"c1.great_circle(</span><span class="si">%f</span><span class="s2">,</span><span class="si">%f</span><span class="s2">,</span><span class="si">%f</span><span class="s2">,</span><span class="si">%f</span><span class="s2">)"</span> <span class="o">%</span> <span class="p">(</span><span class="n">lon1</span><span class="p">,</span><span class="n">lat1</span><span class="p">,</span><span class="n">lon2</span><span class="p">,</span><span class="n">lat2</span><span class="p">),</span>
<span class="s2">"import c1"</span><span class="p">)</span>
<span class="nb">print</span> <span class="s2">"Cython function (still using python math)"</span><span class="p">,</span> <span class="n">t</span><span class="o">.</span><span class="n">timeit</span><span class="p">(</span><span class="n">num</span><span class="p">),</span> <span class="s2">"sec"</span>
</code></pre></div>
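<p>For completeness, here is a self-contained version of that timing harness. Since the compiled c1 module may not be present, a pure-python stand-in with the same formula is used; the test coordinates come from this post, while the call count here is an assumption kept small:</p>

```python
import math
import timeit

# Pure-python stand-in for the compiled c1.great_circle
# (same formula as in the post; radius in miles)
def great_circle(lon1, lat1, lon2, lat2):
    radius = 3956.0
    x = math.pi / 180.0
    a = (90.0 - lat1) * x
    b = (90.0 - lat2) * x
    theta = (lon2 - lon1) * x
    c = math.acos(math.cos(a) * math.cos(b)
                  + math.sin(a) * math.sin(b) * math.cos(theta))
    return radius * c

lon1, lat1, lon2, lat2 = -72.345, 34.323, -61.823, 54.826
num = 100000  # the post presumably loops more; kept small here

# Timer also accepts a callable, which avoids the setup-string import dance
elapsed = timeit.Timer(lambda: great_circle(lon1, lat1, lon2, lat2)).timeit(num)
print("Pure python function", elapsed, "sec")
```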
<p>About <strong>1.8 seconds</strong>. Not the kind of speedup we were hoping for, but it's a start. The bottleneck must be the use of the python math module. Let's use the C standard library trig functions instead:</p>
<p><em>c2.pyx</em></p>
<div class="highlight"><pre><span></span><code><span class="nv">cdef</span><span class="w"> </span><span class="nv">extern</span><span class="w"> </span><span class="nv">from</span><span class="w"> </span><span class="s2">"math.h"</span>:<span class="w"></span>
<span class="w"> </span><span class="nv">float</span><span class="w"> </span><span class="nv">cosf</span><span class="ss">(</span><span class="nv">float</span><span class="w"> </span><span class="nv">theta</span><span class="ss">)</span><span class="w"></span>
<span class="w"> </span><span class="nv">float</span><span class="w"> </span><span class="nv">sinf</span><span class="ss">(</span><span class="nv">float</span><span class="w"> </span><span class="nv">theta</span><span class="ss">)</span><span class="w"></span>
<span class="w"> </span><span class="nv">float</span><span class="w"> </span><span class="nv">acosf</span><span class="ss">(</span><span class="nv">float</span><span class="w"> </span><span class="nv">theta</span><span class="ss">)</span><span class="w"></span>
<span class="nv">def</span><span class="w"> </span><span class="nv">great_circle</span><span class="ss">(</span><span class="nv">float</span><span class="w"> </span><span class="nv">lon1</span>,<span class="nv">float</span><span class="w"> </span><span class="nv">lat1</span>,<span class="nv">float</span><span class="w"> </span><span class="nv">lon2</span>,<span class="nv">float</span><span class="w"> </span><span class="nv">lat2</span><span class="ss">)</span>:<span class="w"></span>
<span class="w"> </span><span class="nv">cdef</span><span class="w"> </span><span class="nv">float</span><span class="w"> </span><span class="nv">radius</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="mi">3956</span>.<span class="mi">0</span><span class="w"> </span>
<span class="w"> </span><span class="nv">cdef</span><span class="w"> </span><span class="nv">float</span><span class="w"> </span><span class="nv">pi</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="mi">3</span>.<span class="mi">14159265</span><span class="w"></span>
<span class="w"> </span><span class="nv">cdef</span><span class="w"> </span><span class="nv">float</span><span class="w"> </span><span class="nv">x</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nv">pi</span><span class="o">/</span><span class="mi">180</span>.<span class="mi">0</span><span class="w"></span>
<span class="w"> </span><span class="nv">cdef</span><span class="w"> </span><span class="nv">float</span><span class="w"> </span><span class="nv">a</span>,<span class="nv">b</span>,<span class="nv">theta</span>,<span class="nv">c</span><span class="w"></span>
<span class="w"> </span><span class="nv">a</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="ss">(</span><span class="mi">90</span>.<span class="mi">0</span><span class="o">-</span><span class="nv">lat1</span><span class="ss">)</span><span class="o">*</span><span class="ss">(</span><span class="nv">x</span><span class="ss">)</span><span class="w"></span>
<span class="w"> </span><span class="nv">b</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="ss">(</span><span class="mi">90</span>.<span class="mi">0</span><span class="o">-</span><span class="nv">lat2</span><span class="ss">)</span><span class="o">*</span><span class="ss">(</span><span class="nv">x</span><span class="ss">)</span><span class="w"></span>
<span class="w"> </span><span class="nv">theta</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="ss">(</span><span class="nv">lon2</span><span class="o">-</span><span class="nv">lon1</span><span class="ss">)</span><span class="o">*</span><span class="ss">(</span><span class="nv">x</span><span class="ss">)</span><span class="w"></span>
<span class="w"> </span><span class="nv">c</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nv">acosf</span><span class="ss">((</span><span class="nv">cosf</span><span class="ss">(</span><span class="nv">a</span><span class="ss">)</span><span class="o">*</span><span class="nv">cosf</span><span class="ss">(</span><span class="nv">b</span><span class="ss">))</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="ss">(</span><span class="nv">sinf</span><span class="ss">(</span><span class="nv">a</span><span class="ss">)</span><span class="o">*</span><span class="nv">sinf</span><span class="ss">(</span><span class="nv">b</span><span class="ss">)</span><span class="o">*</span><span class="nv">cosf</span><span class="ss">(</span><span class="nv">theta</span><span class="ss">)))</span><span class="w"></span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="nv">radius</span><span class="o">*</span><span class="nv">c</span><span class="w"></span>
</code></pre></div>
<p>Instead of importing the math module, we use <em>cdef extern</em>, which takes the C function declarations from the specified include header (in this case math.h from the C standard library). We've replaced the calls to the expensive python functions and are ready to build the new shared library and re-test:</p>
<div class="highlight"><pre><span></span><code> <span class="n">t</span> <span class="o">=</span> <span class="n">timeit</span><span class="o">.</span><span class="n">Timer</span><span class="p">(</span><span class="s2">"c2.great_circle(</span><span class="si">%f</span><span class="s2">,</span><span class="si">%f</span><span class="s2">,</span><span class="si">%f</span><span class="s2">,</span><span class="si">%f</span><span class="s2">)"</span> <span class="o">%</span> <span class="p">(</span><span class="n">lon1</span><span class="p">,</span><span class="n">lat1</span><span class="p">,</span><span class="n">lon2</span><span class="p">,</span><span class="n">lat2</span><span class="p">),</span>
<span class="s2">"import c2"</span><span class="p">)</span>
<span class="nb">print</span> <span class="s2">"Cython function (using trig function from math.h)"</span><span class="p">,</span> <span class="n">t</span><span class="o">.</span><span class="n">timeit</span><span class="p">(</span><span class="n">num</span><span class="p">),</span> <span class="s2">"sec"</span>
</code></pre></div>
<p>Now that's a bit more like it: <strong>0.4 seconds</strong> - a 5x speed increase over the pure python function. What else can we do? Well, c2.great_circle() is still a python function, which means that calling it incurs the overhead of the python API, constructing the argument tuple, etc. If we could write it as a pure C function, we might be able to speed things up further. </p>
<p><em>c3.pyx</em></p>
<div class="highlight"><pre><span></span><code><span class="nv">cdef</span><span class="w"> </span><span class="nv">extern</span><span class="w"> </span><span class="nv">from</span><span class="w"> </span><span class="s2">"math.h"</span>:<span class="w"></span>
<span class="w"> </span><span class="nv">float</span><span class="w"> </span><span class="nv">cosf</span><span class="ss">(</span><span class="nv">float</span><span class="w"> </span><span class="nv">theta</span><span class="ss">)</span><span class="w"></span>
<span class="w"> </span><span class="nv">float</span><span class="w"> </span><span class="nv">sinf</span><span class="ss">(</span><span class="nv">float</span><span class="w"> </span><span class="nv">theta</span><span class="ss">)</span><span class="w"></span>
<span class="w"> </span><span class="nv">float</span><span class="w"> </span><span class="nv">acosf</span><span class="ss">(</span><span class="nv">float</span><span class="w"> </span><span class="nv">theta</span><span class="ss">)</span><span class="w"></span>
<span class="nv">cdef</span><span class="w"> </span><span class="nv">float</span><span class="w"> </span><span class="nv">_great_circle</span><span class="ss">(</span><span class="nv">float</span><span class="w"> </span><span class="nv">lon1</span>,<span class="nv">float</span><span class="w"> </span><span class="nv">lat1</span>,<span class="nv">float</span><span class="w"> </span><span class="nv">lon2</span>,<span class="nv">float</span><span class="w"> </span><span class="nv">lat2</span><span class="ss">)</span>:<span class="w"></span>
<span class="w"> </span><span class="nv">cdef</span><span class="w"> </span><span class="nv">float</span><span class="w"> </span><span class="nv">radius</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="mi">3956</span>.<span class="mi">0</span><span class="w"> </span>
<span class="w"> </span><span class="nv">cdef</span><span class="w"> </span><span class="nv">float</span><span class="w"> </span><span class="nv">pi</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="mi">3</span>.<span class="mi">14159265</span><span class="w"></span>
<span class="w"> </span><span class="nv">cdef</span><span class="w"> </span><span class="nv">float</span><span class="w"> </span><span class="nv">x</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nv">pi</span><span class="o">/</span><span class="mi">180</span>.<span class="mi">0</span><span class="w"></span>
<span class="w"> </span><span class="nv">cdef</span><span class="w"> </span><span class="nv">float</span><span class="w"> </span><span class="nv">a</span>,<span class="nv">b</span>,<span class="nv">theta</span>,<span class="nv">c</span><span class="w"></span>
<span class="w"> </span><span class="nv">a</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="ss">(</span><span class="mi">90</span>.<span class="mi">0</span><span class="o">-</span><span class="nv">lat1</span><span class="ss">)</span><span class="o">*</span><span class="ss">(</span><span class="nv">x</span><span class="ss">)</span><span class="w"></span>
<span class="w"> </span><span class="nv">b</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="ss">(</span><span class="mi">90</span>.<span class="mi">0</span><span class="o">-</span><span class="nv">lat2</span><span class="ss">)</span><span class="o">*</span><span class="ss">(</span><span class="nv">x</span><span class="ss">)</span><span class="w"></span>
<span class="w"> </span><span class="nv">theta</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="ss">(</span><span class="nv">lon2</span><span class="o">-</span><span class="nv">lon1</span><span class="ss">)</span><span class="o">*</span><span class="ss">(</span><span class="nv">x</span><span class="ss">)</span><span class="w"></span>
<span class="w"> </span><span class="nv">c</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nv">acosf</span><span class="ss">((</span><span class="nv">cosf</span><span class="ss">(</span><span class="nv">a</span><span class="ss">)</span><span class="o">*</span><span class="nv">cosf</span><span class="ss">(</span><span class="nv">b</span><span class="ss">))</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="ss">(</span><span class="nv">sinf</span><span class="ss">(</span><span class="nv">a</span><span class="ss">)</span><span class="o">*</span><span class="nv">sinf</span><span class="ss">(</span><span class="nv">b</span><span class="ss">)</span><span class="o">*</span><span class="nv">cosf</span><span class="ss">(</span><span class="nv">theta</span><span class="ss">)))</span><span class="w"></span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="nv">radius</span><span class="o">*</span><span class="nv">c</span><span class="w"></span>
<span class="nv">def</span><span class="w"> </span><span class="nv">great_circle</span><span class="ss">(</span><span class="nv">float</span><span class="w"> </span><span class="nv">lon1</span>,<span class="nv">float</span><span class="w"> </span><span class="nv">lat1</span>,<span class="nv">float</span><span class="w"> </span><span class="nv">lon2</span>,<span class="nv">float</span><span class="w"> </span><span class="nv">lat2</span>,<span class="nv">int</span><span class="w"> </span><span class="nv">num</span><span class="ss">)</span>:<span class="w"></span>
<span class="w"> </span><span class="nv">cdef</span><span class="w"> </span><span class="nv">int</span><span class="w"> </span><span class="nv">i</span><span class="w"></span>
<span class="w"> </span><span class="nv">cdef</span><span class="w"> </span><span class="nv">float</span><span class="w"> </span><span class="nv">x</span><span class="w"></span>
<span class="w"> </span><span class="k">for</span><span class="w"> </span><span class="nv">i</span><span class="w"> </span><span class="nv">from</span><span class="w"> </span><span class="mi">0</span><span class="w"> </span><span class="o"><=</span><span class="w"> </span><span class="nv">i</span><span class="w"> </span><span class="o"><</span><span class="w"> </span><span class="nv">num</span>:<span class="w"></span>
<span class="w"> </span><span class="nv">x</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nv">_great_circle</span><span class="ss">(</span><span class="nv">lon1</span>,<span class="nv">lat1</span>,<span class="nv">lon2</span>,<span class="nv">lat2</span><span class="ss">)</span><span class="w"></span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="nv">x</span><span class="w"></span>
</code></pre></div>
<p>Notice that we still have a python function wrapper (<em>def</em>) which takes an extra argument, num. The looping is done inside this function with <code>for i from 0 <= i < num:</code> instead of the more pythonic but slower <code>for i in range(num):</code>. The actual work is done in a C function (<em>cdef</em>) which returns a C float. This runs in <strong>0.2 seconds</strong> - a 10x speed boost over the original python function. </p>
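<p>The wrapper-overhead point can be felt from pure python alone. This toy benchmark (not from the original post) times the same arithmetic done through a python function call on every iteration versus inline in one loop:</p>

```python
import timeit

def mult(a, b):
    return a * b

def loop_calling(num):
    # one python-level function call per iteration
    x = 0.0
    for _ in range(num):
        x = mult(2.0, 3.0)
    return x

def loop_inline(num):
    # same arithmetic with no per-iteration call
    x = 0.0
    for _ in range(num):
        x = 2.0 * 3.0
    return x

n = 200000
t_call = timeit.timeit(lambda: loop_calling(n), number=1)
t_inline = timeit.timeit(lambda: loop_inline(n), number=1)
print("with calls: %.4f sec, inline: %.4f sec" % (t_call, t_inline))
```

<p>The gap between the two timings is roughly the price of the python call machinery - the overhead that moving the work into a <em>cdef</em> function sidesteps.</p>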
<p>Just to confirm that we're doing things optimally, lets write a little app in pure C and time it:</p>
<div class="highlight"><pre><span></span><code><span class="cp">#include</span><span class="w"> </span><span class="cpf"><math .h></span><span class="cp"></span>
<span class="cp">#include</span><span class="w"> </span><span class="cpf"><stdio .h></span><span class="cp"></span>
<span class="cp">#define NUM 500000</span>
<span class="kt">float</span><span class="w"> </span><span class="nf">great_circle</span><span class="p">(</span><span class="kt">float</span><span class="w"> </span><span class="n">lon1</span><span class="p">,</span><span class="w"> </span><span class="kt">float</span><span class="w"> </span><span class="n">lat1</span><span class="p">,</span><span class="w"> </span><span class="kt">float</span><span class="w"> </span><span class="n">lon2</span><span class="p">,</span><span class="w"> </span><span class="kt">float</span><span class="w"> </span><span class="n">lat2</span><span class="p">){</span><span class="w"></span>
<span class="w"> </span><span class="kt">float</span><span class="w"> </span><span class="n">radius</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="mf">3956.0</span><span class="p">;</span><span class="w"></span>
<span class="w"> </span><span class="kt">float</span><span class="w"> </span><span class="n">pi</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="mf">3.14159265</span><span class="p">;</span><span class="w"></span>
<span class="w"> </span><span class="kt">float</span><span class="w"> </span><span class="n">x</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">pi</span><span class="o">/</span><span class="mf">180.0</span><span class="p">;</span><span class="w"></span>
<span class="w"> </span><span class="kt">float</span><span class="w"> </span><span class="n">a</span><span class="p">,</span><span class="n">b</span><span class="p">,</span><span class="n">theta</span><span class="p">,</span><span class="n">c</span><span class="p">;</span><span class="w"></span>
<span class="w"> </span><span class="n">a</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="p">(</span><span class="mf">90.0</span><span class="o">-</span><span class="n">lat1</span><span class="p">)</span><span class="o">*</span><span class="p">(</span><span class="n">x</span><span class="p">);</span><span class="w"></span>
<span class="w"> </span><span class="n">b</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="p">(</span><span class="mf">90.0</span><span class="o">-</span><span class="n">lat2</span><span class="p">)</span><span class="o">*</span><span class="p">(</span><span class="n">x</span><span class="p">);</span><span class="w"></span>
<span class="w"> </span><span class="n">theta</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="p">(</span><span class="n">lon2</span><span class="o">-</span><span class="n">lon1</span><span class="p">)</span><span class="o">*</span><span class="p">(</span><span class="n">x</span><span class="p">);</span><span class="w"></span>
<span class="w"> </span><span class="n">c</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">acos</span><span class="p">((</span><span class="n">cos</span><span class="p">(</span><span class="n">a</span><span class="p">)</span><span class="o">*</span><span class="n">cos</span><span class="p">(</span><span class="n">b</span><span class="p">))</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="p">(</span><span class="n">sin</span><span class="p">(</span><span class="n">a</span><span class="p">)</span><span class="o">*</span><span class="n">sin</span><span class="p">(</span><span class="n">b</span><span class="p">)</span><span class="o">*</span><span class="n">cos</span><span class="p">(</span><span class="n">theta</span><span class="p">)));</span><span class="w"></span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="n">radius</span><span class="o">*</span><span class="n">c</span><span class="p">;</span><span class="w"></span>
<span class="p">}</span><span class="w"></span>
<span class="kt">int</span><span class="w"> </span><span class="nf">main</span><span class="p">()</span><span class="w"> </span><span class="p">{</span><span class="w"></span>
<span class="w"> </span><span class="kt">int</span><span class="w"> </span><span class="n">i</span><span class="p">;</span><span class="w"></span>
<span class="w"> </span><span class="kt">float</span><span class="w"> </span><span class="n">x</span><span class="p">;</span><span class="w"></span>
<span class="w"> </span><span class="k">for</span><span class="w"> </span><span class="p">(</span><span class="n">i</span><span class="o">=</span><span class="mi">0</span><span class="p">;</span><span class="w"> </span><span class="n">i</span><span class="w"> </span><span class="o"><=</span><span class="w"> </span><span class="n">NUM</span><span class="p">;</span><span class="w"> </span><span class="n">i</span><span class="o">++</span><span class="p">)</span><span class="w"> </span>
<span class="w"> </span><span class="n">x</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">great_circle</span><span class="p">(</span><span class="mf">-72.345</span><span class="p">,</span><span class="w"> </span><span class="mf">34.323</span><span class="p">,</span><span class="w"> </span><span class="mf">-61.823</span><span class="p">,</span><span class="w"> </span><span class="mf">54.826</span><span class="p">);</span><span class="w"></span>
<span class="w"> </span><span class="n">printf</span><span class="p">(</span><span class="s">"%f"</span><span class="p">,</span><span class="w"> </span><span class="n">x</span><span class="p">);</span><span class="w"></span>
<span class="p">}</span><span class="w"></span>
</code></pre></div>
<p>Now compile it with <code>gcc -lm -o ctest ctest.c</code> and test it with <code>time ./ctest</code>... about <strong>0.2 seconds as well</strong>. This gives me confidence that my Cython extension is at least as efficient as my C code (which probably isn't saying much as my C skills are weak).</p>
<hr>
<p>How much Cython helps depends on how much looping, number-crunching, and python function calling is slowing you down. In some cases people have reported 100 to 1000x speed boosts; for other tasks it might not be so helpful. Before going crazy rewriting your python code in Cython, keep this in mind:</p>
<blockquote>
<p>"We should forget about small efficiencies, say about 97% of the time: premature optimization is the root of all evil." -- Donald Knuth</p>
</blockquote>
<p>In other words, write your program in python first and see if it works alright. Most of the time it will... sometimes it will bog down. Use a <a href="http://docs.python.org/lib/module-hotshot.html">profiler</a> to find the slow functions, re-implement them in Cython, and you should see a quick return on investment.</p>
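<p>The hotshot profiler linked above has long since been removed from the standard library; cProfile is its replacement. A minimal sketch of the find-the-slow-function workflow (the great_circle body is copied from this post):</p>

```python
import cProfile
import io
import math
import pstats

def great_circle(lon1, lat1, lon2, lat2):
    # same formula as the post's pure-python version
    x = math.pi / 180.0
    a = (90.0 - lat1) * x
    b = (90.0 - lat2) * x
    theta = (lon2 - lon1) * x
    return 3956.0 * math.acos(math.cos(a) * math.cos(b)
                              + math.sin(a) * math.sin(b) * math.cos(theta))

def main():
    for _ in range(10000):
        great_circle(-72.345, 34.323, -61.823, 54.826)

profiler = cProfile.Profile()
profiler.enable()
main()
profiler.disable()

out = io.StringIO()
pstats.Stats(profiler, stream=out).sort_stats("cumulative").print_stats(5)
print(out.getvalue())  # hot functions rise to the top of the listing
```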
<p>Links:
<a href="http://trac.gispython.org/projects/PCL/wiki/WorldMill">WorldMill</a> - a python module by Sean Gillies which uses Cython to provide a fast, clean python interface to the libgdal library for handling vector geospatial data.</p>
<p><a href="http://www.sagemath.org:9001/WritingFastPyrexCode">Writing Fast Pyrex code</a> (Pyrex is the predecessor of Cython, with similar goals and syntax)</p>
<h2>Spatial data in SQLite (2008-04-15, Matthew T. Perry)</h2>
<p>Slashgeo pointed me to a very interesting set of projects - <a href="http://www.gaia-gis.it/spatialite/">SpatiaLite and VirtualShape</a>. They provide a spatial data engine for the <a href="http://www.sqlite.org/index.html">sqlite</a> database. Think of it as the PostGIS of SQLite. It looks like this extends sqlite's spatial capabilities far beyond the <a href="http://www.gdal.org/ogr/drv_sqlite.html">sqlite OGR driver</a>.</p>
<p>SpatiaLite provides many of the basic OGC Simple Features functions - transforming geometries between projections, spatial operations on bounding boxes, and some basic functions to dissect, analyze and export geometries. </p>
<p>VirtualShape provides the really neat ability to access a shapefile through the SpatiaLite/SQLite interface without having to import a copy - it reads directly off the shapefile, exposing it and its attributes as a "virtual table". I can think of a million uses for this. For example, let's say you have a shapefile of US counties with the number of voters in the 2004 election as an attribute in the dbf, and you want to find the total voter count in each state:</p>
<div class="highlight"><pre><span></span><code><span class="o">$</span><span class="w"> </span><span class="n">ls</span><span class="w"> </span><span class="o">-</span><span class="mi">1</span><span class="w"> </span><span class="n">counties</span><span class="o">.*</span><span class="w"></span>
<span class="n">counties</span><span class="o">.</span><span class="n">dbf</span><span class="w"></span>
<span class="n">counties</span><span class="o">.</span><span class="n">prj</span><span class="w"></span>
<span class="n">counties</span><span class="o">.</span><span class="n">shp</span><span class="w"></span>
<span class="n">counties</span><span class="o">.</span><span class="n">shx</span><span class="w"></span>
<span class="o">$</span><span class="w"> </span><span class="n">sqlite3</span><span class="w"> </span><span class="n">test</span><span class="o">.</span><span class="n">db</span><span class="w"></span>
<span class="n">sqlite</span><span class="o">></span><span class="w"> </span><span class="o">.</span><span class="n">load</span><span class="w"> </span><span class="s1">'SpatiaLite.so'</span><span class="w"></span>
<span class="n">sqlite</span><span class="o">></span><span class="w"> </span><span class="o">.</span><span class="n">load</span><span class="w"> </span><span class="s1">'VirtualShape.so'</span><span class="w"></span>
<span class="n">sqlite</span><span class="o">></span><span class="w"> </span><span class="n">CREATE</span><span class="w"> </span><span class="n">virtual</span><span class="w"> </span><span class="n">table</span><span class="w"> </span><span class="n">virtual_counties</span><span class="w"> </span><span class="n">using</span><span class="w"> </span><span class="n">VirtualShape</span><span class="p">(</span><span class="n">counties</span><span class="p">);</span><span class="w"></span>
<span class="n">sqlite</span><span class="o">></span><span class="w"> </span><span class="n">select</span><span class="w"> </span><span class="n">sum</span><span class="p">(</span><span class="n">voters</span><span class="p">)</span><span class="w"> </span><span class="k">as</span><span class="w"> </span><span class="n">total_voters</span><span class="p">,</span><span class="w"> </span><span class="n">state_name</span><span class="w"> </span>
<span class="w"> </span><span class="n">from</span><span class="w"> </span><span class="n">virtual_counties</span><span class="w"> </span>
<span class="w"> </span><span class="n">group</span><span class="w"> </span><span class="n">by</span><span class="w"> </span><span class="n">state_name</span><span class="w"> </span>
<span class="w"> </span><span class="n">order</span><span class="w"> </span><span class="n">by</span><span class="w"> </span><span class="n">total_voters</span><span class="w"> </span><span class="n">desc</span><span class="p">;</span><span class="w"></span>
<span class="mf">9830550.0</span><span class="o">|</span><span class="n">California</span><span class="w"></span>
<span class="mf">7563055.0</span><span class="o">|</span><span class="n">Florida</span><span class="w"></span>
<span class="mf">7346779.0</span><span class="o">|</span><span class="n">Texas</span><span class="w"></span>
<span class="o">...</span><span class="w"></span>
</code></pre></div>
<p>Now this is fairly straightforward non-spatial SQL, but the ability to run it against a shapefile without exporting to an intermediate data format is a very valuable tool. </p>
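<p>The same aggregate is easy to drive from python with the standard sqlite3 module. This sketch uses an ordinary in-memory table with made-up voter counts standing in for the virtual_counties virtual table (actually loading SpatiaLite/VirtualShape from python would additionally need the connection's enable_load_extension):</p>

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# ordinary table standing in for the shapefile-backed virtual table
conn.execute("CREATE TABLE virtual_counties (state_name TEXT, voters REAL)")
conn.executemany(
    "INSERT INTO virtual_counties VALUES (?, ?)",
    [("California", 9000000.0), ("California", 830550.0),  # made-up numbers
     ("Florida", 7563055.0), ("Texas", 7346779.0)],
)
rows = conn.execute(
    """SELECT sum(voters) AS total_voters, state_name
       FROM virtual_counties
       GROUP BY state_name
       ORDER BY total_voters DESC"""
).fetchall()
for total_voters, state_name in rows:
    print("%s|%s" % (total_voters, state_name))
```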
<p>Links:
* <a href="http://www.sqlite.org/whentouse.html">When to use SQLite.</a>
* A <a href="http://video.google.com/videoplay?docid=-5160435487953918649">video presentation</a> by Richard Hipp (the author of sqlite).</p>
<h2>Shell history - Why not? (2008-04-11, Matthew T. Perry)</h2>
<p>What an odd meme. I don't know why, but I expected some more interesting results. I guess the majority of the commands I use are pretty pedestrian.</p>
<div class="highlight"><pre><span></span><code><span class="n">history</span><span class="o">|</span><span class="n">awk</span><span class="w"> </span><span class="s1">'{a[$2]++ } END{for(i in a){print a[i] " " i}}'</span><span class="o">|</span><span class="n">sort</span><span class="w"> </span><span class="o">-</span><span class="n">rn</span><span class="o">|</span><span class="n">head</span><span class="w"></span>
<span class="mi">163</span><span class="w"> </span><span class="n">vi</span><span class="w"></span>
<span class="mi">48</span><span class="w"> </span><span class="n">screen</span><span class="w"></span>
<span class="mi">29</span><span class="w"> </span><span class="n">python</span><span class="w"></span>
<span class="mi">28</span><span class="w"> </span><span class="n">ls</span><span class="w"></span>
<span class="mi">17</span><span class="w"> </span><span class="n">cp</span><span class="w"></span>
<span class="mi">17</span><span class="w"> </span><span class="n">cd</span><span class="w"></span>
<span class="mi">9</span><span class="w"> </span><span class="n">sqlite3</span><span class="w"></span>
<span class="mi">6</span><span class="w"> </span><span class="n">rm</span><span class="w"></span>
<span class="mi">5</span><span class="w"> </span><span class="n">sudo</span><span class="w"></span>
<span class="mi">4</span><span class="w"> </span><span class="n">htop</span><span class="w"></span>
</code></pre></div>Working hard for some REST2008-04-02T00:00:00-06:002008-04-02T00:00:00-06:00Matthew T. Perrytag:www.perrygeo.com,2008-04-02:/working-hard-for-some-rest.html<p>I don't spend much time with web programming these days but I decided to give <a href="http://webpy.org/">web.py</a> (the minimalist python web framework) a shot and, while I was at it, try implementing a simple REST api.</p>
<p>First of all, web.py is truly everything it claims to be - small, light …</p><p>I don't spend much time with web programming these days but I decided to give <a href="http://webpy.org/">web.py</a> (the minimalist python web framework) a shot and, while I was at it, try implementing a simple REST api.</p>
<p>First of all, web.py is truly everything it claims to be - small, light and easy to deploy behind <a href="http://www.lighttpd.net/">lighttpd</a>. It gives you a ton of flexibility to implement anything however you want - which is a plus or a minus depending on how you look at it. I liked the infinite flexibility, but I can see a lot of refactoring taking place and features needing to be implemented just to match the functionality built into a more structured framework like Django.</p>
<p>Back to the REST side of things. So I created a url-mapping to my "resources" or "nouns" and used the HTTP verbs (POST, GET, PUT, DELETE) to supply the interface. web.py made this a joy to implement.</p>
<div class="highlight"><pre><span></span><code>urls = ("/thing/(\d+)", "thing")
...
class thing:
    def GET(self, thingid):
        # select query and render to template
        ....
    def POST(self, thingid):
        # insert query and redirect to /thing/thingid
        ....
    def DELETE(self, thingid):
        # delete query
        ....
    def PUT(self, thingid):
        # use cgi args to run update query on specified thing
        ....
</code></pre></div>
<p>The hard part came when I realized that HTML forms do not implement the DELETE or PUT methods! Two of the four cornerstone HTTP verbs are not implemented in HTML forms? </p>
<p>Surely this can be accomplished with a top-notch AJAX library. I tried Prototype.js, and it appears that the PUT and DELETE methods are simply tunneled over POST with an extra arg attached; the server side has to handle it accordingly. So I ended up just using a straight XMLHttpRequest, which works but has its own problems.</p>
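<p>The tunneling workaround is simple to sketch on the server side. The following is a hypothetical dispatcher, not actual web.py or Prototype.js code, and the hidden <code>_method</code> field name is an assumption for illustration: the form always POSTs, and the handler routes on the override.</p>

```python
# Hypothetical sketch: an HTML form can only GET or POST, so the real verb
# rides along in a hidden "_method" field and the server dispatches on it.
# The field name and handler shape are illustrative, not web.py's actual API.
def dispatch(handler, http_method, params):
    """Route to handler.<VERB>(), honoring a tunneled _method override."""
    verb = params.pop("_method", http_method).upper()
    if verb not in ("GET", "POST", "PUT", "DELETE"):
        raise ValueError("unsupported method: " + verb)
    return getattr(handler, verb)(params)

class Thing:
    def PUT(self, params):
        return "updated thing %s" % params["id"]
    def DELETE(self, params):
        return "deleted thing %s" % params["id"]

# The browser POSTs, but the hidden field says DELETE:
print(dispatch(Thing(), "POST", {"_method": "DELETE", "id": "7"}))  # deleted thing 7
```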
<p>How are you supposed to call PUT or DELETE through a web page? Is XMLHttpRequest the only way? What about browsers without javascript?</p>Upcoming books2008-03-12T00:00:00-06:002008-03-12T00:00:00-06:00Matthew T. Perrytag:www.perrygeo.com,2008-03-12:/upcoming-books.html<p>There are two new books coming out this summer which fill a valuable niche in the open-source GIS bookshelf:</p>
<ul>
<li>
<p><a href="http://www.springer.com/statistics/statistical+theory+and+methods/book/978-0-387-78170-9">Applied Spatial Data Analysis with R</a></p>
</li>
<li>
<p><a href="http://www.pragprog.com/titles/gsdgis">Desktop GIS: Mapping the Planet with Open Source</a></p>
</li>
</ul>
<p>These are both written by some of the top developers within their respective topics and I'm really …</p><p>There are two new books coming out this summer which fill a valuable niche in the open-source GIS bookshelf:</p>
<ul>
<li>
<p><a href="http://www.springer.com/statistics/statistical+theory+and+methods/book/978-0-387-78170-9">Applied Spatial Data Analysis with R</a></p>
</li>
<li>
<p><a href="http://www.pragprog.com/titles/gsdgis">Desktop GIS: Mapping the Planet with Open Source</a></p>
</li>
</ul>
<p>These are both written by some of the top developers within their respective topics and I'm really looking forward to reading them. </p>Google Earth and the tilt sensor joystick on the X61s2008-02-17T00:00:00-07:002008-02-17T00:00:00-07:00Matthew T. Perrytag:www.perrygeo.com,2008-02-17:/google-earth-and-the-tilt-sensor-joystick-on-the-x61s.html<p>The X61s is one bad-ass machine. Besides the great performance, battery life and solid engineering, there are other hidden gems. For example, the tilt sensors that were designed to protect the hard drive in case of a drop can also be used to detect the laptop's motion under more normal circumstances …</p><p>The X61s is one bad-ass machine. Besides the great performance, battery life and solid engineering, there are other hidden gems. For example, the tilt sensors that were designed to protect the hard drive in case of a drop can also be used to detect the laptop's motion under more normal circumstances. </p>
<p>There are <a href="http://www-128.ibm.com/developerworks/linux/library/l-knockage.html">some</a> <a href="http://www.pberndt.com/Programme/Linux/pyhdaps/index.html#">interesting</a> <a href="http://blog.micampe.it/articles/2006/06/04/here-comes-the-smackpad">applications</a> that use some simple statistics to determine when the machine is "tapped" or jolted to the left or right. You can then assign actions to unique combinations of taps.</p>
<p>These applications all use the sysfs interface to the sensors (<code>cat /sys/bus/platform/devices/hdaps/position</code> will show your position in the x and y axes). But the sensors also provide a joystick interface that allows you to tilt the laptop along the two horizontal axes to control any number of applications. Including Google Earth.</p>
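<p>To illustrate how little code it takes to consume that sysfs interface, here's a hedged Python sketch; the <code>(x,y)</code> text format is assumed from the position file described above.</p>

```python
# Hedged sketch: parse the hdaps position file. The "(x,y)" format is an
# assumption based on the sysfs file mentioned in the text.
def parse_position(raw):
    """Turn a string like "(12,-3)" into an (x, y) tuple of ints."""
    x, y = raw.strip().lstrip("(").rstrip(")").split(",")
    return int(x), int(y)

def read_position(path="/sys/bus/platform/devices/hdaps/position"):
    # Requires the tp_smapi/hdaps module to be loaded.
    with open(path) as f:
        return parse_position(f.read())

print(parse_position("(12,-3)"))  # (12, -3)
```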
<ol>
<li>
<p>Install <a href="http://www.thinkwiki.org/wiki/Tp_smapi">tp_smapi</a></p>
</li>
<li>
<p>Test the sensors by running <code>hdaps-gl</code>, a simple OpenGL app showing the real-time tilt of your ThinkPad.</p>
</li>
<li>
<p>Run jscal to calibrate the joystick. You'll need to install the "joystick" package for this. The command is:
<code>jscal -c /dev/input/js0</code>
After which you should keep your laptop level for a few seconds. Then, when prompted, tilt left, center, right, back (towards you), center, then forward.</p>
</li>
<li>
<p>Now fire up Google Earth. Open the Options menu, go to Navigation and select Enable Controller. </p>
</li>
</ol>
<p><img alt="" src="/assets/img/GE_joystick.jpg"></p>
<ol start="5">
<li>You should now be able to zoom around by tilting the laptop. The keyboard shortcuts really help when you're in this mode (Ctrl-Up/Down to zoom, Shift-Up/Down to tilt, Shift-Left/Right to pivot). </li>
</ol>
<p>There's also a neat <a href="http://www.metafilter.com/52312/More-accellerometer-goodness">Perl-script technique to control a web-based Google Map</a>, which has some cool potential for an OpenLayers-based system. </p>
<p>Since most Apple laptops have a similar sensor, you should be able to get the same thing going on your MacBook. Try it out... it's a lot more fun than using the mouse!</p>The shiny new X61s2008-02-16T00:00:00-07:002008-02-16T00:00:00-07:00Matthew T. Perrytag:www.perrygeo.com,2008-02-16:/the-shiny-new-x61s.html<p>My HP laptop was nearing 5 years old. It had held up extremely well but most modern software taxed it to the absolute limits (just having firefox open with a flash ad in one tab was enough to send the system load through the roof). So I decided to try …</p><p>My HP laptop was nearing 5 years old. It had held up extremely well, but most modern software taxed it to the absolute limits (just having Firefox open with a Flash ad in one tab was enough to send the system load through the roof). So I decided to try something new.</p>
<p>I was looking for something in the ultra-portable range. I tried out the OLPC and looked seriously at the Asus Eee PC for a while. But they were far too difficult for me to type on. Ergonomics were extremely important, and the only ultraportables that consistently rated high in that department were the IBM/Lenovo ThinkPads. The X61s was appealing with its low-voltage Core 2 Duo and 2GB of RAM. All that in a small package, about 3 lbs and an inch thick.</p>
<p><img alt="" src="/assets/img/x61s.jpg"></p>
<p>So the X61s arrived and I figured I'd give it a try with the "stock" software. It was my first experience with Vista and I gave it my best shot. After about a half hour of excessive clicking, sluggish performance and pop-up windows, I shrunk the NTFS partition and installed Ubuntu Hardy Heron Alpha 4. </p>
<p>Sound, wireless with WPA, Compiz with 3D; the major things that normally plague a Linux laptop install worked right out of the box. On the other hand, I'm running into a few bugs in Nautilus (this is alpha software after all), I can't get Bluetooth working, suspending to RAM works but is a little buggy (I have to restart some services manually) and I had to edit a few config files and compile a kernel module to utilize all the bells and whistles provided by the hardware. But it is still more fun than using Vista.</p>
<p>One thing that really shines on this machine is the battery. I got the 8-cell extended life battery and used some powertop tweaks to cut my power consumption, getting the wattage down into the 10 to 15 watt range depending on usage patterns. No wonder it is Energy Star compliant! With that kind of wattage and battery capacity, I'm easily getting about 6 to 7 full hours of battery life.</p>
<p>Some tips if you're setting up Linux on your X61s:</p>
<ul>
<li>
<p>First and foremost, read <a href="http://thinkwiki.org">thinkwiki</a>. There you'll find 95% of your answers. But to summarize my experience: </p>
</li>
<li>
<p>Upgrade your BIOS first (this is a good reason to keep your Vista partition around, since Lenovo ships some handy update utils for Windows). </p>
</li>
<li>
<p>Install the <a href="http://www.thinkwiki.org/wiki/Tp_smapi">tp_smapi kernel module</a> with HDAPS support. This will enable Linux to access the hard drive sensors for disk protection, motion sensing and the joystick interface</p>
</li>
<li>
<p>The big blue "Thinkvantage" button doesn't work out of the box. I'm not sure what it <em>should</em> do, but it's a nicely placed button so <a href="http://www.thinkwiki.org/wiki/How_to_get_special_keys_to_work#acpi_fakekey">don't let it go to waste</a>.</p>
</li>
<li>
<p>Tweak the power consumption. For the impatient, just install powertop and follow the instructions .. it will tell you what processes are waking your CPU and how to stop them. Also check out <a href="http://lesswatts.org">Less Watts</a> - a full resource for tweaking linux power consumption.</p>
</li>
<li>
<p>Configure your <a href="http://www.thinkwiki.org/wiki/How_to_configure_the_TrackPoint">trackpoint</a> pointer and buttons. This involves setting up your xorg.conf file to emulate a middle scroll wheel as well as tweaking the speed and sensitivity of the pointer. BTW - if you've never tried a pointer, give it a shot ... I've found it much more comfortable than a touchpad.</p>
</li>
<li>
<p><a href="http://samwel.tk/laptop_mode/">Laptop-mode</a>, a set of kernel and userspace tools to manage hard-drive power consumption, can be handy. It can also be <a href="https://bugs.launchpad.net/mandriva/+source/laptop-mode-tools/+bug/59695">deadly to your disk if configured incorrectly</a>. Basically, it aggressively spins down the disk after short periods of inactivity to save power. Inevitably an application will try to hit the disk again and it will spin right back up. This leads to an unreasonably high number of load cycles (100 per hour), and the drive can only handle a finite number before failure (~600,000). You can configure it for more sane behavior, but do your research before you enable laptop-mode! And check out smartctl to monitor the disk's health. </p>
</li>
<li>
<p>If, after you unsuspend the machine, your screen is way too dark, try Ctl-Alt-F1 followed by Ctl-Alt-F7. There are some other hacks involving acpi configuration or grub kernel options but none of them have worked for me yet.</p>
</li>
</ul>Human Impacts on the Global Marine Ecosystem2008-02-15T00:00:00-07:002008-02-15T00:00:00-07:00Matthew T. Perrytag:www.perrygeo.com,2008-02-15:/human-impacts-on-the-global-marine-ecosystem.html<p><a href="http://sciencenow.sciencemag.org/cgi/content/full/2008/214/2">We did it</a>!</p>
<p>As some of you may know, from 2005 through 2006 I was part of a research team[1], led by Ben Halpern at NCEAS, developing a global model of human impacts on the marine ecosystem. We created or compiled 17 high-resolution global datasets of human-induced threats (land-based …</p><p><a href="http://sciencenow.sciencemag.org/cgi/content/full/2008/214/2">We did it</a>!</p>
<p>As some of you may know, from 2005 through 2006 I was part of a research team[1], led by Ben Halpern at NCEAS, developing a global model of human impacts on the marine ecosystem. We created or compiled 17 high-resolution global datasets of human-induced threats (land-based pollutants, fishing, shipping, climate change, etc.) and 20 ocean habitat datasets. These were combined to create an impact index which models the cumulative level of human-induced stress on our oceans. </p>
<p><a href="http://ebm.nceas.ucsb.edu/GlobalMarine/models/model/jpg/model_high_res.jpg"><img alt="" src="/assets/img/map_400.jpg"></a></p>
<p>The results were published today in <a href="http://www.sciencemag.org/cgi/content/abstract/319/5865/948">Science</a> magazine and presented yesterday at the <a href="http://news.aaas.org/releases/2008_ann_mtg/scientists-track-human-footpri.html">AAAS Annual Meeting</a>. To summarize, we found that the entire ocean is affected and 40% is heavily impacted. It is not all bad news as there are many areas of relatively low impact which could provide examples for ecosystem restoration and opportunities for conservation. The global map is the first of its kind and will help clarify and quantify our cumulative impacts on the ocean and allow us to focus efforts geographically. The model is not perfect and can't really be used to make decisions at a very localized scale but, given the available globally-consistent, reasonably-high-resolution data for all the various ocean threats and habitats, this is the best effort to date. The model itself is relatively simple with a very clear methodology which will allow scientists to tweak the parameters and add better data as it becomes available. For those of you interested in the GIS modeling end, NCEAS has a <a href="http://www.nceas.ucsb.edu/GlobalMarine">great summary</a> of the data used in the model. Most of the data are available as raster data products or KML.</p>
<p>The media has picked up on the story with <a href="http://www.npr.org/templates/story/story.php?storyId=19059595">NPR</a>, <a href="http://www.msnbc.msn.com/id/23155918/">MSNBC</a>, <a href="http://www.washingtonpost.com/wp-dyn/content/article/2008/02/14/AR2008021401992.html?hpid=topnews">The Washington Post</a>, <a href="http://www.usatoday.com/tech/science/environment/2008-02-14-oceans-human-activity_N.htm">USA Today</a> and <a href="http://news.nationalgeographic.com/news/2008/02/080214-oceans.html">National Geographic</a> covering it (to name a few). I especially recommend the NPR site as it has a great animation and an audio segment. </p>
<p>So congratulations to everyone who made this happen! </p>
<p><em>[1] Benjamin S. Halpern, Shaun Walbridge, Kimberly A. Selkoe, Carrie V. Kappel, Fiorenza Micheli, Caterina D'Agrosa, John F. Bruno, Kenneth S. Casey, Colin Ebert, Helen E. Fox, Rod Fujita, Dennis Heinemann, Hunter S. Lenihan, Elizabeth M.P. Madin, Matthew T. Perry, Elizabeth R. Selig, Mark Spalding, Robert Steneck, Reg Watson (2008). A global map of human impact on marine ecosystems. Science, vol. 319</em></p>
<hr>
<p>EDIT:</p>
<p>Some additional articles:</p>
<ul>
<li>
<p><a href="http://www.nytimes.com/interactive/2008/02/25/science/earth/20080225_COAST_GRAPHIC.html">New York Times</a></p>
</li>
<li>
<p><a href="http://youtube.com/watch?v=0qh49Da5A5M">BBC Video</a> on YouTube </p>
</li>
</ul>Why is the command line a dying art?2008-02-02T00:00:00-07:002008-02-02T00:00:00-07:00Matthew T. Perrytag:www.perrygeo.com,2008-02-02:/why-is-the-command-line-a-dying-art.html<p>Sadly, a lot of GIS folks have never come into contact with a command line interface (CLI). I've met even experienced computer users who, when faced with a command-line prompt, experience some autonomic nervous system lock-up that causes their eyes to glaze over and prevents any knowledge from entering …</p><p>Sadly, a lot of GIS folks have never come into contact with a command line interface (CLI). I've met even experienced computer users who, when faced with a command-line prompt, experience some autonomic nervous system lock-up that causes their eyes to glaze over and prevents any knowledge from entering their brain from that moment forward. The all-Windows, all-GUI mentality of the current GIS market leaders just doesn't expose you to it (if you remember working with coverages at the ESRI Arc/Info command line, you officially qualify as an "old-timer"). And the DOS command line is virtually invisible to XP and Vista users. Linux users are more CLI-aware, but even this is becoming less important as distros such as Ubuntu GUI-ify everything.</p>
<p>So why the fear of the command line? Why is it assumed to be more "complicated" than a graphical user interface (GUI)? I have found that, in some cases, the opposite is true ... there is something reassuringly simple about typing something and getting a response back. It feels like you are in direct control of the computer. Which, indeed, you <em>always</em> are. The computers always do exactly what you tell them, whether you are in a GUI or a CLI. But GUIs attempt to abstract away the details so that you <em>don't need to know</em> exactly what you're telling the computer to do. This nice fluffy feeling comes at the cost of many important factors. </p>
<h2>The benefits of the command line interface</h2>
<h3>Automation</h3>
<p>If you had, for instance, monitoring data coming in on an hourly basis and needed to process it, would you want to be on call 24 hours a day to click a few buttons? Of course not. Write a command that performs the job and schedule it to execute at some regular interval. (I wonder if those guys on LOST ever thought to just set up a cron job to enter the numbers in the hatch?) </p>
<h3>Repeatability</h3>
<p>Whenever I show someone a CLI-based method for solving their problem, they almost immediately say (or at least imply) that the typing is too much trouble. Consider this command to convert a .tif image to ERDAS .img (HFA) format:</p>
<div class="highlight"><pre><span></span><code>cd /data/images
gdal_translate -of HFA aerial.tif aerial.tif.img
</code></pre></div>
<p>You might ask, "Why not just use a GUI, click a button or two, and get your output". Sure. Now do that for 2,000 tif images. With a CLI you only have to type a few extra lines. </p>
<div class="highlight"><pre><span></span><code><span class="nv">cd</span><span class="w"> </span><span class="o">/</span><span class="nv">data</span><span class="o">/</span><span class="nv">images</span><span class="w"></span>
<span class="k">for</span><span class="w"> </span><span class="nv">i</span><span class="w"> </span><span class="nv">in</span><span class="w"> </span><span class="o">*</span>.<span class="nv">tif</span><span class="c1">; do </span><span class="w"></span>
<span class="w"> </span><span class="nv">gdal_translate</span><span class="w"> </span><span class="o">-</span><span class="nv">of</span><span class="w"> </span><span class="nv">HFA</span><span class="w"> </span>$<span class="nv">i</span><span class="w"> </span>$<span class="nv">i</span>.<span class="nv">img</span><span class="c1">;</span><span class="w"></span>
<span class="nv">done</span><span class="w"></span>
</code></pre></div>
<h3>Documentability</h3>
<p>There is nothing more important to a GIS Analyst than documenting his/her work! We live by metadata and methods write-ups. Now picture an intense 5 hour work session ... everything needed to get out by 2pm. You're done and now it's time to document your procedure and methods. With the CLI, you copy and paste your commands from the terminal or simply look at your command history which will show <em>exactly</em> what you did and how. You can store this in a text file and come back to it months later and be able to re-run the procedure. </p>
<p>With the GUI, you have to remember and describe every click, every sub-menu, every option, every action taken to arrive at the answer. Often this requires verbose description, screenshots, etc. None of which is recorded in any history file of course. And of course, when the client inevitably comes back the next day with modifications, none of it is repeatable in any automated fashion with a GUI. </p>
<h3>Accessibility</h3>
<p>It's just plain text with a CLI. You can print it out and study it on the bus. You can email the whole process to co-workers. You can use a concurrent versioning system to keep track of changes to scripted procedures. You can transfer massive amounts of knowledge without having to sit down and go through everything step-by-step, click-by-click in a visual interface. </p>
<h3>Accuracy</h3>
<p>Far too often, GUI designers make over-reaching assumptions about how things should work. The idea is often that the user should not need to know anything more than the absolute minimum. To use a car analogy, the driver turns the key, presses the pedal and steers but does not need to know what goes on under the hood. This works most of the time. But the <a href="http://en.wikipedia.org/wiki/Leaky_abstraction">law of leaky abstractions</a> usually takes hold and something inevitably breaks or performs differently than expected. Since the CLI does not hold your hand (it executes the exact command you give it) it more accurately mimics the actual physical interaction with the computer and is much more useful in debugging and investigating complex problems. </p>
<p>So basically, don't make the mistake of thinking that a pretty window will always contain the magic button to get the job done. In many cases, a command line is much more efficient, even essential. If you don't know how to effectively work in a command-line environment, do yourself a huge favor and learn.</p>
<p>Oh and I'd be remiss if I didn't mention <a href="http://www.amazon.com/Beginning-was-Command-Line-Neal-Stephenson/dp/0380815931">Neal Stephenson's book</a> on the subject ... a bit technically outdated but a great quick read on why command lines are still very relevant in the face of increasingly sophisticated graphical interfaces.</p>Impervious surface delineation with GRASS2008-01-26T00:00:00-07:002008-01-26T00:00:00-07:00Matthew T. Perrytag:www.perrygeo.com,2008-01-26:/impervious-surface-deliniation-with-grass.html<p>Watersheds with lots of roads, buildings, parking lots, rock surfaces, compacted dirt, etc. tend to prevent infiltration and cause rapid runoff in response to rainfall. This poses a <a href="http://chesapeake.towson.edu/landscape/impervious/what_imp.asp">number of challenges for managing stormwater</a> and water quality. Not surprisingly, the percentage of hydrologically impervious surface in a given watershed is …</p><p>Watersheds with lots of roads, buildings, parking lots, rock surfaces, compacted dirt, etc. tend to prevent infiltration and cause rapid runoff in response to rainfall. This poses a <a href="http://chesapeake.towson.edu/landscape/impervious/what_imp.asp">number of challenges for managing stormwater</a> and water quality. Not surprisingly, the percentage of hydrologically impervious surface in a given watershed is an important factor in many hydrologic models. Using standard aerial photography and GRASS, it's a relatively simple process to create an impervious surface map using supervised classification.</p>
<p>First, find an aerial photo. I grabbed a NAIP image from <a href="http://new.casil.ucdavis.edu/casil/remote_sensing/naip_2005/county_mosaics/">CASIL</a> but you might want to try <a href="http://crschmidt.net/blog/archives/285/producing-a-large-image-from-openaerialmap/">using OpenAerialMap</a>. The red, green and blue visible bands are usually sufficient for differentiating between impervious and pervious land use types. For distinguishing different types of vegetation you might want to use a multispectral imagery source with non-visible bands (i.e. near-infrared), but this is usually lower resolution (e.g. the 30-meter pixels of Landsat) or much more expensive.</p>
<p>Next we jump into GRASS and import our image into a new location:</p>
<div class="highlight"><pre><span></span><code>r.in.gdal -e input=naip.img output=naip location=impervious
</code></pre></div>
<p>Exit and log back into your new location. If you look at the imported rasters, you'll see three rasters, not one. Each band (R, G and B) gets imported separately.</p>
<div class="highlight"><pre><span></span><code>GRASS 6.3.cvs (impervious):~/> g.list rast
raster files available in mapset permanent:
naip.1 naip.2 naip.3
</code></pre></div>
<p>We need to indicate that these rasters form a logical group</p>
<div class="highlight"><pre><span></span><code><span class="n">i</span><span class="p">.</span><span class="k">group</span><span class="w"> </span><span class="k">group</span><span class="o">=</span><span class="n">naip2</span><span class="w"> </span><span class="n">subgroup</span><span class="o">=</span><span class="n">naip2</span><span class="w"> </span><span class="k">input</span><span class="o">=</span><span class="n">naip</span><span class="mf">.3</span><span class="nv">@PERMANENT</span><span class="p">,</span><span class="n">naip</span><span class="mf">.2</span><span class="nv">@PERMANENT</span><span class="p">,</span><span class="n">naip</span><span class="mf">.1</span><span class="nv">@PERMANENT</span><span class="w"></span>
<span class="n">i</span><span class="p">.</span><span class="n">target</span><span class="w"> </span><span class="o">-</span><span class="n">c</span><span class="w"> </span><span class="k">group</span><span class="o">=</span><span class="n">naip2</span><span class="w"></span>
</code></pre></div>
<p>At any time you can list the rasters in a given group/subgroup to confirm.</p>
<div class="highlight"><pre><span></span><code>i.group -l -g group=naip2 subgroup=naip2
</code></pre></div>
<p>Now the real heart of the process. We need to define "training areas", which are polygons around representative land use types. I used QGIS to load the aerial photo and create a new polygon layer with an integer attribute field called vegnum. I digitized a few rocks, paved areas, rooftops and dirt roads to represent the impervious areas, to which I assigned vegnum=1. Then I selected some grasslands, forests, lakes and chaparral and assigned 2 as the vegnum. The next step is to load the polygon data into GRASS and rasterize it (<em>in retrospect it would have been easier to create the GRASS vector layer from scratch in QGIS to avoid the import step</em>). Note that the vegnum field is specified as the raster value column.</p>
<div class="highlight"><pre><span></span><code>v.in.ogr -o dsn=./training/train1_utm/train1_utm.shp output=train1 layer=train1_utm min_area=0.0001 type=boundary snap=-1
v.to.rast input=train1 output=train1 use=attr column=vegnum type=point,line,area layer=1 value=1 rows=4096
</code></pre></div>
<p>Next we use i.gensig to generate a spectral signature (the statistical profile; mean and covariance matrix of the input pixels) for the training areas. </p>
<div class="highlight"><pre><span></span><code>i.gensig trainingmap=train1 group=naip2 subgroup=naip2 signaturefile=naip2_train1.sig
</code></pre></div>
<p>Now that we have a signature of impervious vs. non-impervious surfaces, we can use the maximum likelihood method to classify each pixel into the highest probability category.</p>
<div class="highlight"><pre><span></span><code>i.maxlik group=naip2 subgroup=naip2 sigfile=naip2_train1.sig class=imperv
</code></pre></div>
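<p>For intuition on what i.maxlik is doing, here is a toy one-band, scalar-variance sketch of maximum-likelihood classification. The real tool works per-band with the full covariance matrices from the .sig file; the class statistics below are made up for illustration.</p>

```python
# Toy maximum-likelihood classifier: assign a pixel to the class whose
# Gaussian signature (mean, variance) gives it the highest log-likelihood.
# One band with scalar variance only; i.maxlik uses full covariance matrices.
import math

def log_likelihood(value, mean, var):
    # Log of the univariate normal density (constant terms included).
    return -0.5 * (math.log(2 * math.pi * var) + (value - mean) ** 2 / var)

def classify(pixel, signatures):
    # signatures: {class_id: (mean, variance)}
    return max(signatures, key=lambda c: log_likelihood(pixel, *signatures[c]))

# Hypothetical signatures: impervious surfaces are bright and uniform,
# pervious cover is darker and more variable.
sigs = {1: (200.0, 100.0),   # impervious
        2: (80.0, 400.0)}    # pervious
print(classify(190.0, sigs))  # 1
print(classify(90.0, sigs))   # 2
```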
<p>You might notice a slight speckled, noisy appearance due to things like shadows, reflections or imperfect training areas. Usually these small one-pixel deviations are not interesting enough to keep, so we can smooth out the image by taking the mode (most common) cell in a 3x3 window.</p>
<div class="highlight"><pre><span></span><code>r.neighbors input=imperv output=imperv_mode method=mode size=3
</code></pre></div>
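<p>The effect of that mode filter is easy to demonstrate in pure Python. This is just a stand-in for <code>r.neighbors method=mode size=3</code>, with edge cells simply left unchanged:</p>

```python
# Pure-Python 3x3 mode filter over a classified grid (list of lists of ints).
# Interior cells take the most common value in their 3x3 neighborhood.
from collections import Counter

def mode_filter(grid):
    rows, cols = len(grid), len(grid[0])
    out = [row[:] for row in grid]  # edges are copied through unchanged
    for r in range(1, rows - 1):
        for c in range(1, cols - 1):
            window = [grid[r + dr][c + dc]
                      for dr in (-1, 0, 1) for dc in (-1, 0, 1)]
            out[r][c] = Counter(window).most_common(1)[0][0]
    return out

# A lone "impervious" (1) speckle surrounded by pervious (2) gets smoothed away:
speckled = [[2, 2, 2],
            [2, 1, 2],
            [2, 2, 2]]
print(mode_filter(speckled)[1][1])  # 2
```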
<p>And here are the results... calculating imperviousness will most likely be an iterative process so be prepared to evaluate the output, tweak the training areas and rerun the process a few times. Once you're happy with the results, you can use zonal statistics with a tool like starspan to find the percent imperviousness of your watersheds or other regions.</p>
<p><img alt="" src="/assets/img/aerial.jpg"></p>
<p><img alt="" src="/assets/img/imperv_smooth.png"></p>A GUI for GDAL and GMT'2008-01-06T00:00:00-07:002008-01-06T00:00:00-07:00Matthew T. Perrytag:www.perrygeo.com,2008-01-06:/a-gui-for-gdal-and-gmt.html<p>In the why-haven't-I-ever-heard-of-this department:</p>
<blockquote>
<p><a href="http://w3.ualg.pt/%7Ejluis/mirone/manual.htm">Mirone</a> is a Windows MATLAB-based framework tool that allows the display and manipulation of a large number of grid formats through its interface with the GDAL library. Its main purpose is to provide users with an easy-to-use graphical interface to the more commonly used programs of …</p></blockquote><p>In the why-haven't-I-ever-heard-of-this department:</p>
<blockquote>
<p><a href="http://w3.ualg.pt/%7Ejluis/mirone/manual.htm">Mirone</a> is a Windows MATLAB-based framework tool that allows the display and manipulation of a large number of grid formats through its interface with the GDAL library. Its main purpose is to provide users with an easy-to-use graphical interface to the more commonly used programs of the GMT package. </p>
</blockquote>
<p>There is also a version that does not depend on MATLAB which is what I decided to try. This is a great package; easy to install, very usable, lots of high-end raster functionality, and a good sense of humor...</p>
<p><img alt="" src="/assets/img/mirone.png"></p>
<p>Considering GMT and GDAL can be a bit challenging and unfamiliar for a typical windows user, Mirone is a huge step forward. </p>
<p>Among some of the functionality that is an absolute pleasure to work with compared to some other software packages: surface profiles, image-flipping, DEM derivatives, color-ramping, contouring, histograms, kernel filtering... And that's just scratching the surface. I highly recommend checking it out.</p>More on Google Charts and a Python interface2007-12-19T00:00:00-07:002007-12-19T00:00:00-07:00Matthew T. Perrytag:www.perrygeo.com,2007-12-19:/more-on-google-charts-and-a-python-interface.html<p>Well, it's been almost two full weeks since the <a href="http://code.google.com/apis/chart/">Google Charts API</a> came out. A really nice service, but it's only going to be useful with a high-level programming API. Enter <a href="http://pygooglechart.slowchop.com/">PyGoogleChart</a>... a Python interface to generate Google Chart URLs. </p>
<p>Taking one of my <a href="http://www.perrygeo.net/wordpress/?p=64">previous example datasets</a>, here's the 10-second …</p><p>Well, it's been almost two full weeks since the <a href="http://code.google.com/apis/chart/">Google Charts API</a> came out. A really nice service, but it's only going to be useful with a high-level programming API. Enter <a href="http://pygooglechart.slowchop.com/">PyGoogleChart</a>... a Python interface to generate Google Chart URLs. </p>
<p>Taking one of my <a href="http://www.perrygeo.net/wordpress/?p=64">previous example datasets</a>, here's the 10-second howto:</p>
<div class="highlight"><pre><code>from pygooglechart import SimpleLineChart
chart = SimpleLineChart(400, 200)
data = [32.5, 35.2, 39.9, 40.8, 43.9, 48.2, 50.5, 51.9, 53.1, 55.9, 60.7, 64.4]
chart.add_data(data)
url = chart.get_url()
print url
</code></pre></div>
<p>which gives us:</p>
<blockquote>
<p>http://chart.apis.google.com/chart?cht=lc&chs=400x200&chd=t:32.5,35.2,39.9,40.8,43.9,48.2,50.5,51.9,53.1,55.9,60.7,64.4</p>
</blockquote>
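<p>Under the hood there's no magic; the library is just assembling query string parameters. Here's a hand-rolled sketch (my own illustration in modern Python, not pygooglechart's code) that shows the URL anatomy: <code>cht</code> for chart type, <code>chs</code> for size, <code>chd</code> for data:</p>

```python
from urllib.parse import urlencode

def simple_line_chart_url(width, height, values):
    """Assemble a Google Chart URL by hand: cht=lc selects a line
    chart, chs gives the pixel size, chd=t: carries the raw data."""
    params = {
        "cht": "lc",
        "chs": "%dx%d" % (width, height),
        "chd": "t:" + ",".join(str(v) for v in values),
    }
    # keep ':' and ',' literal so the URL matches what the API expects
    return "http://chart.apis.google.com/chart?" + urlencode(params, safe=":,")

data = [32.5, 35.2, 39.9, 40.8, 43.9, 48.2, 50.5, 51.9, 53.1, 55.9, 60.7, 64.4]
print(simple_line_chart_url(400, 200, data))
```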
<p>and our chart image:</p>
<p><img alt="" src="http://chart.apis.google.com/chart?cht=lc&chs=400x200&chd=t:32.5,35.2,39.9,40.8,43.9,48.2,50.5,51.9,53.1,55.9,60.7,64.4"></p>Geologist vs. Engineer2007-12-12T00:00:00-07:002007-12-12T00:00:00-07:00Matthew T. Perrytag:www.perrygeo.com,2007-12-12:/geologist-vs-engineer.html<p><a href="http://uncyclopedia.org/wiki/Main_Page">Uncyclopedia</a>, the self-proclaimed encyclopedia "full of misinformation and utter lies", has a hilarious <a href="http://uncyclopedia.org/wiki/Geologist">article about Geologists</a>. I especially like the "<a href="http://uncyclopedia.org/wiki/Geologist#The_Great_Geologist-Engineer_Controversy">Geologist-Engineer Controversy</a>" which, having worked with both geologists and engineers extensively, is a pretty accurate portrayal of their respective approaches.</p>
<blockquote>
<p>Geology, being an art as much as a science, has …</p></blockquote><p><a href="http://uncyclopedia.org/wiki/Main_Page">Uncyclopedia</a>, the self-proclaimed encyclopedia "full of misinformation and utter lies", has a hilarious <a href="http://uncyclopedia.org/wiki/Geologist">article about Geologists</a>. I especially like the "<a href="http://uncyclopedia.org/wiki/Geologist#The_Great_Geologist-Engineer_Controversy">Geologist-Engineer Controversy</a>" which, having worked with both geologists and engineers extensively, is a pretty accurate portrayal of their respective approaches.</p>
<blockquote>
<p>Geology, being an art as much as a science, has always baffled and worried engineers, hence the engineers' defensive weapons of pocket protectors, slide rules, black socks, and eventually computers.</p>
</blockquote>
<p>A related joke:</p>
<blockquote>
<p>A geologist and an engineer walk into a job interview. They are each asked a simple math question: 'What is 2 times 2?'. The engineer replies, 'It's 4.00000'. The geologist replies, 'Ah.. it's about 4'</p>
</blockquote>Quick way to publish a point shapefile to html2007-12-10T00:00:00-07:002007-12-10T00:00:00-07:00Matthew T. Perrytag:www.perrygeo.com,2007-12-10:/quick-way-to-publish-a-point-shapefile-to-html.html<p>There are better ways to put data on the web but my latest little project wasn't about the <em>best</em> way but the quickest way to get some spatial data into the hands of those unfortunate souls who don't have GIS software. The goals were pretty simple:</p>
<ul>
<li>
<p>Take a single point …</p></li></ul><p>There are better ways to put data on the web but my latest little project wasn't about the <em>best</em> way but the quickest way to get some spatial data into the hands of those unfortunate souls who don't have GIS software. The goals were pretty simple:</p>
<ul>
<li>
<p>Take a single point shapefile (or other OGR readable vector data source)</p>
</li>
<li>
<p>Convert it into html/js that would use one of the web mapping APIs to display the points and all their attributes. </p>
</li>
<li>
<p>The output had to be a standalone, self-contained html file that could be emailed. No server side anything required.</p>
</li>
</ul>
<p>I came up with a quick python hack to do the job (<a href="http://perrygeo.googlecode.com/svn/trunk/gis-bin/shp2Mapstraction.py">source code</a>). <a href="http://www.mapstraction.com/">Mapstraction</a>, with its goal of providing a common javascript API for a number of map providers, seemed like an obvious choice. The python portion of the code reads the shapefile using OGR (you will need the python-gdal bindings, see FWTools) and constructs the html/js. All the javascript is sourced to external URLs so there is no software dependency except for a working network connection. </p>
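<p>The html/js generation half of the idea is plain string templating. A stripped-down sketch of that part (the <code>addMarker</code> helper here is hypothetical; the real script emits calls to the Mapstraction API and carries the full attribute table):</p>

```python
def points_to_html(points, title="points"):
    """Emit a standalone HTML page; each (x, y, label) tuple becomes
    a hypothetical addMarker() call wired up on page load."""
    calls = "\n".join(
        '      addMarker(%f, %f, "%s");' % (x, y, label) for x, y, label in points
    )
    return """<html>
<head><title>%s</title></head>
<body onload="init()">
<script type="text/javascript">
    function init() {
%s
    }
</script>
</body>
</html>""" % (title, calls)

# one point, as it might come out of the OGR read loop
page = points_to_html([(-119.55, 37.75, "bear box 1")], title="bearboxes")
```

Because everything ends up in a single self-contained file, the result can be emailed around with no server-side anything, which was the whole point.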
<p>This allows for a single command:</p>
<blockquote>
<p>shp2Mapstraction.py bearboxes.shp bearboxes.html Yahoo</p>
</blockquote>
<p><img alt="" src="/assets/img/bearboxes.jpg"></p>
<p>which produces <a href="/assets/img/bearboxes.html">an html file</a> providing a Yahoo maps interface to the data; in this case the point location of all the bear boxes (food storage lockers to keep your stuff separated from the bears) in the Sierra Nevada. </p>
<p>Currently it just supports Microsoft Virtual Earth and Yahoo. I had to bypass Google because their key system is restricted by URL. And the mapstraction-to-openlayers connection wasn't working too well though I haven't really investigated.</p>
<p>Anyways, it provides a quick and easy way to deliver spatial data to anyone with a browser and internet connection. </p>Google Charts - their latest web service2007-12-06T00:00:00-07:002007-12-06T00:00:00-07:00Matthew T. Perrytag:www.perrygeo.com,2007-12-06:/google-charts-their-latest-web-service.html<p>Google Charts is a <a href="http://code.google.com/apis/chart/">web based API</a> for generating charts/graphs. It supports a lot of the common types of graphics including line, pie, bar, scatter plots and Venn diagrams. I've relied on a bunch of other server-side graph generators (<a href="http://www.maptools.org/owtchart/index.phtml">owtchart</a>, <a href="http://www.aditus.nu/jpgraph/">jpgraph</a>, <a href="http://www.perrygeo.net/wordpress/?p=64">sparklines</a>, <a href="http://matplotlib.sourceforge.net/">matplotlib</a>, etc) but this looks like it might …</p><p>Google Charts is a <a href="http://code.google.com/apis/chart/">web based API</a> for generating charts/graphs. It supports a lot of the common types of graphics including line, pie, bar, scatter plots and Venn diagrams. I've relied on a bunch of other server-side graph generators (<a href="http://www.maptools.org/owtchart/index.phtml">owtchart</a>, <a href="http://www.aditus.nu/jpgraph/">jpgraph</a>, <a href="http://www.perrygeo.net/wordpress/?p=64">sparklines</a>, <a href="http://matplotlib.sourceforge.net/">matplotlib</a>, etc) but this looks like it might be a contender.</p>
<p>Still there is no higher-level programming API yet ... but give it a few days (interface with numpy anyone?). <a href="http://exilejedi.livejournal.com/189606.html">ExileJedi blog lists </a>some other potential disadvantages:</p>
<blockquote>
<ul>
<li>You are limited to 50,000 queries per user per day, which may pose some scalability concerns if you plan to build something big on this.</li>
<li>You have to be careful about the number of data points you submit in your request as you can quickly exceed the allowable URL length, and furthermore you might end up with illegibly smooshed-together data points due to the scale of your output.</li>
<li>There's always the "OMG Google will absorb all our data and become sentient, turn evil, and unleash an army of death robots on us all, run for your lives!" paranoia, but that's really just silly talk.</li>
</ul>
</blockquote>
<p>EDIT: It appears this service only supports GET requests. On one hand you're adding new data so you should be POSTing it, right? On the other hand, you're asking to GET a graphical representation of a set of numerical values. What would a "restful" version of a web graphing API look like? Maybe some of the REST gurus can clear that up.</p>Take the larger view of GIS2007-12-05T00:00:00-07:002007-12-05T00:00:00-07:00Matthew T. Perrytag:www.perrygeo.com,2007-12-05:/take-the-larger-view-of-gis.html<p>It's interesting to see the passionate responses to <a href="http://apb.directionsmag.com/archives/3703-Neogeography-is-not-GIS;-not-LI.html">Joe Francia's article</a> claiming that neogeography is != GIS. On one side there is a small group of folks bashing neogeography and claiming the superiority of "GIS". On the other side there is the attitude claiming that some "revolution" has occurred which has …</p><p>It's interesting to see the passionate responses to <a href="http://apb.directionsmag.com/archives/3703-Neogeography-is-not-GIS;-not-LI.html">Joe Francia's article</a> claiming that neogeography is != GIS. On one side there is a small group of folks bashing neogeography and claiming the superiority of "GIS". On the other side there is the attitude claiming that some "revolution" has occurred which has supplanted traditional geographic techniques. You'd think there was a cold war going on! Both memes are as wrong as they are arrogant. </p>
<p>I have always defined GIS as</p>
<blockquote>
<p>Geographic Information System: The integration of hardware, software, procedures and people to manage the collection, creation, analysis, synthesis, sharing and visualization of spatial information.</p>
</blockquote>
<p>Neogeography easily fits that bill. So does Enterprise IT. So does Desktop mapping. So does Geostatistics. Geodesy. Web Mapping. Remote Sensing. LBS mobile technologies. Cartography. Surveying. Spatial Analysis and Modeling. Database management. Sensor webs. GPS... These disciplines are all a small piece of the larger puzzle that is GIS (whether their staunch adherents will admit to it or not!). </p>
<p>The key word in this controversial acronym is <strong>S</strong>ystem. In order for any organization to implement a successful GIS, they must figure out a) which technologies will work for them and b) how to integrate them into a coherent whole. All of these aspects of GIS have something to offer so it's important not to get stuck in a rut with blinders on. This goes for all "sides" of this ridiculous "neogeo vs GIS" argument. </p>For the cartographers in the house…2007-12-04T00:00:00-07:002007-12-04T00:00:00-07:00Matthew T. Perrytag:www.perrygeo.com,2007-12-04:/for-the-cartographers-in-the-house.html<p>Here's another one for the blogrolls:</p>
<p><a href="http://strangemaps.wordpress.com/">http://strangemaps.wordpress.com/</a></p>Privacy, Location Technology and Bad Journalism2007-11-20T00:00:00-07:002007-11-20T00:00:00-07:00Matthew T. Perrytag:www.perrygeo.com,2007-11-20:/privacy-location-technology-and-bad-journalism.html<p>The Ventura Star has run an article about <a href="http://www.venturacountystar.com/news/2007/nov/18/where-are-you-in-life-if-you-dont-know-others-do/">privacy issues and modern geolocation technology</a>.</p>
<p>As important as this topic is, <a href="http://www.venturacountystar.com/staff/john-moore/contact/">John Moore</a> (the author) is clearly uninformed. This is a <em>horrible</em> piece of journalism. Moore mixes the potential negative effects of various technologies such as RFID, cellular communication, sensor networks …</p><p>The Ventura Star has run an article about <a href="http://www.venturacountystar.com/news/2007/nov/18/where-are-you-in-life-if-you-dont-know-others-do/">privacy issues and modern geolocation technology</a>.</p>
<p>As important as this topic is, <a href="http://www.venturacountystar.com/staff/john-moore/contact/">John Moore</a> (the author) is clearly uninformed. This is a <em>horrible</em> piece of journalism. Moore mixes the potential negative effects of various technologies such as RFID, cellular communication, sensor networks, nanotech, community data collection efforts, navigation systems, and GPS into one chilling, over-simplified and baseless viewpoint. Instead of reporting the details of <a href="http://www.geog.ucsb.edu/~good/">Michael Goodchild's</a> talk at Ventura College, he treated us to his own paranoid, incoherent vision of the future of technology. Moore's entire premise is based on the fact that: </p>
<blockquote>
<p>"GPS is a system that basically allows you to know where you are anywhere in the world within one meter" </p>
</blockquote>
<p>That much is true. He uses this fact to extrapolate the conclusion that GPS allows some nefarious force to monitor your groceries, cell phone calls, and indeed your every movement.</p>
<p>GPS <em>receives</em> satellite signals and translates those signals into a location. It takes an entirely different technology to transmit these locations to some third party. I guarantee you that none of my gps tracks have gotten into anyone's hands without my consent (come on John Moore, prove me otherwise). </p>
<p>The title speaks volumes to his ignorance: </p>
<blockquote>
<p>"Where are you in life? If you don't know, others using GPS devices do" </p>
</blockquote>
<p>Suggesting that other people with GPS can a) track my movements or b) be tracked by me, shows a complete lack of understanding of the technology. Sure there are privacy dangers. But those dangers must be presented clearly and concisely by someone with half a clue, not this paranoid bullshit journalism. This article would not even pass as a high school essay. </p>Looking for LIDAR services2007-11-12T00:00:00-07:002007-11-12T00:00:00-07:00Matthew T. Perrytag:www.perrygeo.com,2007-11-12:/looking-for-lidar-services.html<p>I'm looking for a LIDAR specialist to fly some sensors around San Diego. Ideally we would need someone who could collect both LIDAR data and digital aerial photography (high-res but only visible spectrum), process the data (generate bare-earth DEMs and georeferenced aerials) and deliver it in a GIS-compatible format. This …</p><p>I'm looking for a LIDAR specialist to fly some sensors around San Diego. Ideally we would need someone who could collect both LIDAR data and digital aerial photography (high-res but only visible spectrum), process the data (generate bare-earth DEMs and georeferenced aerials) and deliver it in a GIS-compatible format. This is in response to the recent fires related to erosion control.. with rainy season coming we'd be on a tight schedule.</p>
<p>Does anyone have any suggestions of good companies who could provide this service? Please feel free to recommend your own services if you think it would be a good fit.</p>
<p>You can also contact me directly at perrygeo+lidar at gmail.com</p>Poetics of Cartography2007-10-20T00:00:00-06:002007-10-20T00:00:00-06:00Matthew T. Perrytag:www.perrygeo.com,2007-10-20:/poetics-of-cartography.html<p>In case you missed the fantastic Chicago public radio program last night on <a href="http://www.thisamericanlife.org/Radio_Episode.aspx?episode=110">This American Life</a>, the NPR-syndicated show did an entire program on "mapping". It goes well beyond the idea of simply mapping our physical infrastructure and really opens up the idea of mapping to the widest possible definition …</p><p>In case you missed the fantastic Chicago public radio program last night on <a href="http://www.thisamericanlife.org/Radio_Episode.aspx?episode=110">This American Life</a>, the NPR-syndicated show did an entire program on "mapping". It goes well beyond the idea of simply mapping our physical infrastructure and really opens up the idea of mapping to the widest possible definition; using all our senses to create a multi-dimensional representation of our world. Within the vast experience of life, mapping is described as the abstract process of summarizing and synthesizing a singular slice of that experience.</p>
<p>The show is available<a href="http://www.thisamericanlife.org/Radio_Episode.aspx?episode=110"> as a stream</a> and is really worth a listen this weekend.</p>
<p>P.S. The title of this post comes directly from a quote by Denis Wood, the author of <a href="http://www.amazon.com/Power-Maps-Denis-Wood/dp/0898624932/ref=pd_bbs_2/104-8757092-7919961?ie=UTF8&s=books&qid=1192849090&sr=8-2">The Power Of Maps</a> and geographer who is mapping some non-conventional aspects of his neighborhood in Raleigh, North Carolina. The first and arguably most interesting portion of the show from a geographer's standpoint.</p>Turning Ubuntu into a GIS workstation2007-10-20T00:00:00-06:002007-10-20T00:00:00-06:00Matthew T. Perrytag:www.perrygeo.com,2007-10-20:/turning-ubuntu-into-a-gis-workstation.html<p>It just keeps getting easier and easier to get a fully functional open source GIS workstation up and running thanks to Ubuntu. The following instructions will take your vanilla installation of <a href="http://www.ubuntu.com/getubuntu">Ubuntu 7.10</a> and add the following top-notch desktop GIS applications:</p>
<ul>
<li>
<p>Postgresql/PostGIS : a relational database with vector spatial …</p></li></ul><p>It just keeps getting easier and easier to get a fully functional open source GIS workstation up and running thanks to Ubuntu. The following instructions will take your vanilla installation of <a href="http://www.ubuntu.com/getubuntu">Ubuntu 7.10</a> and add the following top-notch desktop GIS applications:</p>
<ul>
<li>
<p>Postgresql/PostGIS : a relational database with vector spatial data handling </p>
</li>
<li>
<p>GRASS : A full blown GIS analysis toolset </p>
</li>
<li>
<p>Quantum GIS: A user-friendly graphical GIS application </p>
</li>
<li>
<p>GDAL, Proj, Geos : Libraries and utilities for processing spatial data </p>
</li>
<li>
<p>Mapserver : web mapping program and utilites</p>
</li>
<li>
<p>Python bindings for QGIS, mapserver and GDAL </p>
</li>
<li>
<p>GPSBabel : for converting between various GPS formats </p>
</li>
<li>
<p>R : a high-end statistics package with spatial capabilities </p>
</li>
<li>
<p>GMT : the Generic Mapping Tools for automated high-quality map output </p>
</li>
</ul>
<p>While this is not a comprehensive list of open source GIS software, these packages cover most of my needs. If you want to live on the bleeding edge and have to have the absolute latest versions, you'll be better off installing these from source. But for those of us that want a stable and highly functional GIS workstation with minimal fuss, this is the way to go:</p>
<ol>
<li>
<p>Go to <em>System > Administration > Software Sources</em> and make sure the universe and multiverse repositories are turned on. Close the window and the list of available software packages will be refreshed.</p>
</li>
<li>
<p>Open up a terminal (ie the command line) via <em>Applications > Accessories > Terminal</em> and type the following:</p>
</li>
</ol>
<div class="highlight"><pre><code>sudo apt-get -y install qgis grass qgis-plugin-grass mapserver-bin gdal-bin cgi-mapserver \
python-qt4 python-sip4 python-gdal python-mapscript gmt gmt-coastline-data \
r-recommended gpsbabel shapelib libgdal1-1.4.0-grass
</code></pre></div>
<p>The <em>sudo</em> part indicates that the command will be run as the administrator user; <em>apt-get -y install</em> is the command telling it to install the list of packages and answer yes to any questions that pop up. </p>
<ol>
<li>There is one package that is worth upgrading to the latest and greatest - Quantum GIS. The latest version (0.9) is due out very shortly and has the ability to write plugins using the python programming language. A big plus! </li>
</ol>
<p>Download the latest build from <a href="http://qgis.org/uploadfiles/testbuilds/qgis0.9.0.debs_ubuntu_gutsy.tar.gz">http://qgis.org/uploadfiles/testbuilds/qgis0.9.0.debs_ubuntu_gutsy.tar.gz</a> and extract it ( right-click > Extract Here ). In the directory you'll see 4 .deb files, only 3 of which you'll need unless you plan on doing any development work.</p>
<p>Double click libqgis1_0.9.0_i386.deb and you'll get a message saying an older version is available directly from Ubuntu. We already know this so just close and ignore it. Click <em>Install Package</em> and wait for it to complete then close out.</p>
<p>Repeat for qgis_0.9.0_i386.deb and qgis-plugin-grass_0.9.0_i386.deb (in that order).</p>
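<p>A quick sanity check I like to run afterwards (my own habit, not part of the install steps) is to try importing the Python bindings. Note that <code>osgeo.gdal</code> and <code>osgeo.ogr</code> are the newer module names; older python-gdal packages exposed plain <code>gdal</code> and <code>ogr</code>:</p>

```python
# try each binding in turn; record and print the result rather than
# letting one failed import kill the whole check
results = {}
for module in ("osgeo.gdal", "osgeo.ogr", "mapscript"):
    try:
        __import__(module)
        results[module] = "ok"
    except ImportError as err:
        results[module] = "missing (%s)" % err
    print("%s: %s" % (module, results[module]))
```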
<p>And there we have it, about 15 minutes depending on your internet speed and you've installed a high-end GIS workstation built completely on free and open source software.</p>Update to QGIS Geocoding plugin2007-10-19T00:00:00-06:002007-10-19T00:00:00-06:00Matthew T. Perrytag:www.perrygeo.com,2007-10-19:/update-to-qgis-geocoding-plugin.html<p>With the release of QGIS 0.9 imminent, I decided to install it on Windows XP and noticed that <a href="http://www.perrygeo.net/wordpress/?p=60">the geocoding plugin </a>was failing... sure enough I had hardcoded linux temporary directories. So I reworked the python code to determine the temp dir in a more cross-platform way (using tempfile …</p><p>With the release of QGIS 0.9 imminent, I decided to install it on Windows XP and noticed that <a href="http://www.perrygeo.net/wordpress/?p=60">the geocoding plugin </a>was failing... sure enough I had hardcoded linux temporary directories. So I reworked the python code to determine the temp dir in a more cross-platform way (using tempfile.gettempdir() ) and it works fine.</p>
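<p>For the curious, the fix boils down to asking the standard library for the temp directory instead of hardcoding <code>/tmp</code> (the filename below is just for illustration):</p>

```python
import os
import tempfile

# resolves to something like C:\...\Temp on Windows and /tmp on Linux,
# so the same plugin code runs on both platforms
geocode_result = os.path.join(tempfile.gettempdir(), "geocode_result.shp")
print(geocode_result)
```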
<p>The update can be downloaded <a href="http://perrygeo.googlecode.com/svn/trunk/qgis/geocode.zip">here</a>.</p>
<p>Assuming you've installed qgis in the standard location, just unzip this into C:\Program Files\Quantum GIS\python\plugins (windows) or /usr/share/qgis/python/plugins (Linux) and you should be good to go. Note that you'll have to create the "plugins" directory if it doesn't exist.</p>CTech software goes multithreaded2007-10-12T00:00:00-06:002007-10-12T00:00:00-06:00Matthew T. Perrytag:www.perrygeo.com,2007-10-12:/ctech-software-goes-multithreaded.html<p>CTech has announced that the next version of its flagship software package, <a href="http://www.ctech.com/index.php?page=evspro">EVS (Environmental Visualization System)</a>, will take full advantage of multiple processors. </p>
<p><img alt="" src="/assets/img/evs.gif"></p>
<p>My experience with EVS is mostly in the realm of 3-dimensional kriging and geostatistics. Given the amount of data crunching involved, it's always been sluggish when dealing …</p><p>CTech has announced that the next version of its flagship software package, <a href="http://www.ctech.com/index.php?page=evspro">EVS (Environmental Visualization System)</a>, will take full advantage of multiple processors. </p>
<p><img alt="" src="/assets/img/evs.gif"></p>
<p>My experience with EVS is mostly in the realm of 3-dimensional kriging and geostatistics. Given the amount of data crunching involved, it's always been sluggish when dealing with a non-trivial amount of data. Nothing is more frustrating than seeing one of your CPU cores cranking away while the others sit idle! But <a href="http://www.ctech.com/forum/viewtopic.php?pid=213#213">some users are reporting</a> that the new multithreaded modules get nearly linear performance increases when adding more processing cores.</p>
<p>CTech is certainly not the first scientific/geostats application to go parallel. But it is the first program that I personally use on a regular basis that will take advantage of a multi-processor system. I hope this marks the beginning of an industry trend in that direction.</p>Autodesk open sources coordinate system software2007-09-25T00:00:00-06:002007-09-25T00:00:00-06:00Matthew T. Perrytag:www.perrygeo.com,2007-09-25:/autodesk-open-sources-coordinate-system-software.html<p>Not very often do I see open source mentioned on the front page of my Google Finance page (let alone Geospatial Open Source). But here it is.. the announcement was made at FOSS4G2007 that <a href="http://money.cnn.com/news/newsfeeds/articles/prnewswire/AQTU16425092007-1.htm"> autodesk will be open sourcing part of its coordinate system and map projection technology</a>. </p>
<p>So what …</p><p>Not very often do I see open source mentioned on the front page of my Google Finance page (let alone Geospatial Open Source). But here it is.. the announcement was made at FOSS4G2007 that <a href="http://money.cnn.com/news/newsfeeds/articles/prnewswire/AQTU16425092007-1.htm"> autodesk will be open sourcing part of its coordinate system and map projection technology</a>. </p>
<p>So what motivation does Autodesk (or any other company) have to open source its technology? An important line from Lisa Campbell, vice president, Autodesk Geospatial:</p>
<blockquote>
<p>"Our intent to contribute again to the open source community is a reflection of our customers' desire for faster innovation, more frequent product releases, and lower total cost of ownership."</p>
</blockquote>Parallel python and GIS2007-09-18T00:00:00-06:002007-09-18T00:00:00-06:00Matthew T. Perrytag:www.perrygeo.com,2007-09-18:/parallel-python-and-gis.html<p>Let's face it - processing speeds aren't going to be increasing according to Moore's Law anymore; instead of faster CPUs, <a href="http://www.gotw.ca/publications/concurrency-ddj.htm">we'll be getting more of them</a>. The future of programming, it seems to me, lies in the ability to leverage multiple processors. In other words, we have to write parallel code …</p><p>Let's face it - processing speeds aren't going to be increasing according to Moore's Law anymore; instead of faster CPUs, <a href="http://www.gotw.ca/publications/concurrency-ddj.htm">we'll be getting more of them</a>. The future of programming, it seems to me, lies in the ability to leverage multiple processors. In other words, we have to write parallel code. Until I read <a href="http://zcologia.com/news/571/catching-up-with-python/">Sean's post</a>, I was unaware that there was a viable python solution. I had been growing quite disillusioned by python's dreaded <a href="http://www.pyzine.com/Issue001/Section_Articles/article_ThreadingGlobalInterpreter.html">Global Interpreter Lock</a> which confines python to a single processing core. I've even started learning <a href="http://www.erlang.org/">Erlang</a> to leverage SMP processing (until I realized that Erlang and its standard libraries are virtually useless for anything that needs to handle geospatial data).</p>
<p>So I gave <a href="http://www.parallelpython.com/">Parallel Python</a> (pp) a shot. Since Sean also offered up a bounty for the first GIS application that used pp, I thought it might be a good time to try ;-)</p>
<p>A good candidate for parallel processing is any application that has to crunch away on lists/arrays of data and whose individual members can be handled independently (see <a href="http://www.erlang.org/ml-archive/erlang-questions/200606/msg00130.html">pmap in Erlang</a>). I have been working on <a href="http://perrygeo.googlecode.com/svn/trunk/gis-bin/bezier_smooth_pp.py">an application to smooth linework using bezier curves</a>. It's not quite polished yet but the image below shows the before and after</p>
<p><img alt="" src="/assets/img/smoothed.jpg"></p>
<p>... but <a href="http://en.wikipedia.org/wiki/B%C3%A9zier_curve">bezier curves</a> aren't quite the subject of this post. Let's just say the algorithm takes some time to compute (if you're using a high density of vertices) and can be handled one LineString feature at a time. This makes it a prime candidate for parallelization.</p>
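<p>As an aside, the per-vertex work being farmed out boils down to evaluating the standard cubic bezier polynomial. A sketch of that formula (my own, not the script's actual <code>computeBezier</code>):</p>

```python
def cubic_bezier_point(p0, p1, p2, p3, t):
    """B(t) = (1-t)^3*P0 + 3(1-t)^2*t*P1 + 3(1-t)*t^2*P2 + t^3*P3,
    evaluated coordinate-by-coordinate for 0 <= t <= 1."""
    mt = 1.0 - t
    return tuple(
        mt**3 * a + 3 * mt**2 * t * b + 3 * mt * t**2 * c + t**3 * d
        for a, b, c, d in zip(p0, p1, p2, p3)
    )

# the curve starts at the first control point and ends at the last
start = cubic_bezier_point((0, 0), (1, 2), (3, 2), (4, 0), 0.0)
end = cubic_bezier_point((0, 0), (1, 2), (3, 2), (4, 0), 1.0)
```

Sampling t at a handful of evenly spaced values between each pair of original vertices is what produces the extra, smoothed vertices.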
<p>Given a list of input LineStrings, I could process them the sequential way:</p>
<div class="highlight"><pre><span></span><code>smooth_lines = []
for line in lines:
    smooth_lines.append( calcBezierFromLine( line, num_bezpts, beztype, t) )
</code></pre></div>
<p>Or use pp to start up a "job server" which doles tasks out to a pool of "workers". A busy worker utilizes a single processing core, so a good rule of thumb is to start as many workers as you have CPU cores:</p>
<div class="highlight"><pre><span></span><code>numworkers = 2  # dual-core machine
job_server = pp.Server(numworkers, ppservers=ppservers)
smooth_lines = []
jobs = [(line, job_server.submit(calcBezierFromLine, (line, num_bezpts, beztype, t), \
        (computeBezier, getPointOnCubicBezier), ("numpy",) )) for line in lines]
for input, job in jobs:
    smooth_lines.append( job() )
</code></pre></div>
<p>Theoretically, the parallelized version should run twice as fast as the sequential version on my core2 duo machine. And reality was pretty darn close to that:</p>
<div class="highlight"><pre><span></span><code>$ time python bezier_smooth_pp.py 2
Shapefile contains 1114 lines
Starting pp with 2 workers
Completed 1114 new lines with 8 additional verticies for each line segment along a cubic bezier curve
real 0m10.908s
...
$ time python bezier_smooth_pp.py 1
Shapefile contains 1114 lines
Starting pp with 1 workers
Completed 1114 new lines with 8 additional verticies for each line segment along a cubic bezier curve
real 0m20.007s
...
</code></pre></div>
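<p>The same fan-out pattern now lives in Python's standard library. A minimal sketch using <code>multiprocessing</code> (not the post's actual script; <code>smooth</code> is a hypothetical stand-in for <code>calcBezierFromLine</code>):</p>

```python
# Sketch of the pp fan-out pattern with the stdlib multiprocessing module.
# `smooth` is a hypothetical stand-in for the expensive per-feature work.
from multiprocessing import Pool

def smooth(line):
    # placeholder computation: pretend to densify/transform the vertices
    return [(x * 2, y * 2) for x, y in line]

def smooth_all(lines, numworkers=2):
    # one worker per CPU core is a good rule of thumb
    with Pool(numworkers) as pool:
        return pool.map(smooth, lines)

if __name__ == "__main__":
    lines = [[(0, 0), (1, 1)], [(2, 2), (3, 3)]]
    print(smooth_all(lines))
```

<p>As with pp, the only requirement is that each feature can be processed independently; the pool handles doling the work out to the cores.</p>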
<p>Just think of the possibilities. In the foreseeable future, the average computer might have 8+ cores to work with. This could mean that your app will run 8x faster if you parallelize the code (assuming there are no IO or bandwidth bottlenecks). I'd love to test it out on a system with more than 2 processing cores but, unfortunately, I don't have access to any <a href="http://www.calvin.edu/~adams/research/microwulf/">beowulf clusters</a>, <a href="http://www.sun.com/processors/UltraSPARC-T1/"> Sun UltraSparc servers,</a> or <a href="http://www.apple.com/macpro/">8-core Xeon Mac Pros</a>. This is what I <em>really need</em> to complete my research ;-) So if anyone wants to donate to the cause, send me an email! </p>
<p>And to answer Sean's bounty, I don't consider this an actual application (yet) but I hope it can spur some interest and move things in that direction. But if you feel the need to send me some New Belgium swag (or one of the machines listed above), feel free ;-)</p>The world turned right-side up2007-09-05T00:00:00-06:002007-09-05T00:00:00-06:00Matthew T. Perrytag:www.perrygeo.com,2007-09-05:/the-world-turned-right-side-up.html<p>I've been working a lot in <a href="http://www.goldensoftware.com/products/surfer/surfer.shtml">Surfer</a> these days, an excellent geostats and surface mapping package. I was very happy to find that GDAL reads its .grd binary format until I noticed the output from gdalinfo:</p>
<div class="highlight"><pre><span></span><code>> C:\Workspace\Temp\interpolation>gdalinfo svpce_5.grd
Driver: GS7BG/Golden Software 7 Binary Grid (.grd …</code></pre></div><p>I've been working a lot in <a href="http://www.goldensoftware.com/products/surfer/surfer.shtml">Surfer</a> these days, an excellent geostats and surface mapping package. I was very happy to find that GDAL reads its .grd binary format until I noticed the output from gdalinfo:</p>
<div class="highlight"><pre><span></span><code>> C:\Workspace\Temp\interpolation>gdalinfo svpce_5.grd
Driver: GS7BG/Golden Software 7 Binary Grid (.grd)
Files: svpce_5.grd
Size is 555, 339
Coordinate System is `'
Origin = (383371.000000000000000,3764907.000000000000000)
Pixel Size = (0.500000000000000,0.500000000000000)
Corner Coordinates:
Upper Left ( 383371.000, 3764907.000)
Lower Left ( 383371.000, 3765076.500)
Upper Right ( 383648.500, 3764907.000)
Lower Right ( 383648.500, 3765076.500)
Center ( 383509.750, 3764991.750)
Band 1 Block=555x1 Type=Float64, ColorInterp=Undefined
NoData Value=1.70141e+038
</code></pre></div>
<p>Notice that the upper Y value is <em>south</em> of the lower Y value! Basically, the raster's row order is reversed (bottom-to-top instead of the normal raster orientation of top-to-bottom). I've also experienced the same issue with some NetCDF files, so I thought it would be good to have a generic solution to the problem.</p>
<p>So I hacked up the gdal_merge.py script (distributed with gdal, fwtools, etc) and created a raster flip script that will invert the image along the y axis and retain the georeferencing and metadata. The resulting <a href="http://perrygeo.googlecode.com/svn/trunk/gis-bin/flip_raster.py">flip_raster.py</a> script seems to work pretty well though it is far from tested.</p>
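<p>The bookkeeping involved can be sketched in a few lines. This is a pure-Python stand-in, not the flip_raster.py code itself (the real script reads and writes bands through GDAL); the geotransform follows GDAL's six-element convention:</p>

```python
# Flip a bottom-up raster to the usual north-up orientation:
# reverse the row order and move the origin to the true top edge.
# gt follows GDAL's convention: (x0, px_w, rot1, y0, rot2, px_h).

def flip_raster(rows, gt):
    x0, px_w, r1, y0, r2, px_h = gt
    # the new origin sits one full raster height away; pixel height flips sign
    new_gt = (x0, px_w, r1, y0 + px_h * len(rows), r2, -px_h)
    return rows[::-1], new_gt
```

<p>Plugging in the gdalinfo numbers above: a 339-row grid with origin Y 3764907 and pixel height 0.5 comes out with origin Y 3765076.5 and pixel height -0.5, which matches the corner coordinates you'd expect from a north-up file.</p>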
<p>Here's an example:</p>
<p>The standard gdal_translate method (which doesn't account for the inverted coordinate space):</p>
<blockquote>
<p>gdal_translate -of GTiff krig1.grd krig1_translate.tif</p>
</blockquote>
<p><img alt="" src="/assets/img/standard.jpg"></p>
<p>And the flipped raster method:</p>
<blockquote>
<p>flip_raster.py -o krig1_flip.tif -of GTiff krig1.grd </p>
</blockquote>
<p><img alt="" src="/assets/img/flipped.jpg"></p>
<p>And we're good. gdalinfo confirms that we have the same extents, pixel sizes, metadata, etc as the original dataset. </p>Mapserver vs Mapnik revisited2007-09-04T00:00:00-06:002007-09-04T00:00:00-06:00Matthew T. Perrytag:www.perrygeo.com,2007-09-04:/mapserver-vs-mapnik-revisited.html<p>A while ago, I was enamored with mapnik's image quality despite its limitations compared to the vast configurability of the mapserver mapfile. Now that mapserver uses the AGG rendering library, it might not be necessary to compromise configurability in order to get beautiful linework. I just installed the recent beta …</p><p>A while ago, I was enamored with mapnik's image quality despite its limitations compared to the vast configurability of the mapserver mapfile. Now that mapserver uses the AGG rendering library, it might not be necessary to compromise configurability in order to get beautiful linework. I just installed the recent beta of mapserver 5.0 and the image quality is very crisp... but this comes at the expense of rendering speed.</p>
<p>All the times below are the average of ten runs using a full global view of a simplified shapefile of country borders. </p>
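<p>A benchmark like this is easy to script. A minimal sketch (the command list is a placeholder; substitute your shp2img or mapnik invocation):</p>

```python
# Time an external rendering command N times and report the mean.
# The cmd list is a placeholder -- substitute the renderer under test.
import subprocess
import time

def mean_runtime(cmd, runs=10):
    times = []
    for _ in range(runs):
        start = time.perf_counter()
        subprocess.run(cmd, check=True)
        times.append(time.perf_counter() - start)
    return sum(times) / len(times)

if __name__ == "__main__":
    # e.g. mean_runtime(["shp2img", "-m", "test.map", "-o", "out.jpg"])
    import sys
    print(mean_runtime([sys.executable, "-c", "pass"], runs=3))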
<p><img alt="" src="/assets/img/mapserver_gd_test.jpg"></p>
<p><strong>mapserver (gd) : 0.082 sec , 18kb</strong></p>
<blockquote>
<p>OUTPUTFORMAT
NAME "GD_JPEG"
DRIVER "GD/JPEG"
MIMETYPE "image/jpeg"
IMAGEMODE RGB
EXTENSION "jpg"
END</p>
</blockquote>
<p>shp2img -m test.map -o mapserver_gd_test.jpg</p>
<p><img alt="" src="/assets/img/mapserver_agg_test.jpg"></p>
<p><strong>mapserver (agg) : 0.188 sec , 16kb</strong></p>
<blockquote>
<p>IMAGEQUALITY 80
OUTPUTFORMAT
NAME 'AGG_JPEG'
DRIVER AGG/JPEG
IMAGEMODE RGB
END</p>
</blockquote>
<ul>
<li>Note that if we bump IMAGEQUALITY up to 90% to (roughly) match the mapnik image, the rendering time and size increase a bit (.201 sec, 25kb)</li>
</ul>
<p>shp2img -m test.map -o mapserver_agg_test.jpg</p>
<p><img alt="" src="/assets/img/mapnik_output.jpg"></p>
<p><strong>mapnik (agg) : 0.282 sec, 23kb</strong></p>
<p>python test_mapnik.py</p>
<ul>
<li>Running this through the python interpreter likely adds overhead of its own, so these times may not be directly comparable to shp2img.</li>
</ul>
<p>Using these preliminary results, it looks like mapserver 5.0 with AGG rendering is roughly equal to mapnik based on a balance of quality/speed/image size. But since I'd prefer to use mapfiles over the undocumented mapnik xml format any day, I think I'll stick with my beloved mapserver. Kudos to the mapserver developers for raising the bar once again.</p>Performance testing rasters with mapserver2007-09-04T00:00:00-06:002007-09-04T00:00:00-06:00Matthew T. Perrytag:www.perrygeo.com,2007-09-04:/performance-testing-rasters-with-mapserver.html<p>There's been some good talk on the mapserver list (thanks to Gregor's diligent testing) about performance related to serving up raster imagery. </p>
<p>First off, comparisons of <a href="http://lists.umn.edu/cgi-bin/wa?A2=ind0709&L=mapserver-users&T=0&O=D&P=1526">image</a> <a href="http://lists.umn.edu/cgi-bin/wa?A2=ind0709&L=mapserver-users&T=0&O=D&P=1526">formats. </a>Then a look at some TIFF <a href="http://lists.umn.edu/cgi-bin/wa?A2=ind0709&L=mapserver-users&T=0&O=D&P=2214">optimization</a> <a href="http://lists.umn.edu/cgi-bin/wa?A2=ind0709&L=mapserver-users&T=0&O=D&P=4492">techniques</a> like overviews (similar to "pyramids" in ESRI land) and internal tiling to boost rendering …</p><p>There's been some good talk on the mapserver list (thanks to Gregor's diligent testing) about performance related to serving up raster imagery. </p>
<p>First off, comparisons of <a href="http://lists.umn.edu/cgi-bin/wa?A2=ind0709&L=mapserver-users&T=0&O=D&P=1526">image</a> <a href="http://lists.umn.edu/cgi-bin/wa?A2=ind0709&L=mapserver-users&T=0&O=D&P=1526">formats. </a>Then a look at some TIFF <a href="http://lists.umn.edu/cgi-bin/wa?A2=ind0709&L=mapserver-users&T=0&O=D&P=2214">optimization</a> <a href="http://lists.umn.edu/cgi-bin/wa?A2=ind0709&L=mapserver-users&T=0&O=D&P=4492">techniques</a> like overviews (similar to "pyramids" in ESRI land) and internal tiling to boost rendering speed. </p>
<p>Most of the conclusions are not all that staggering: </p>
<ul>
<li>
<p>TIFF is fastest but takes up more space compared to ECW and JPEG2000. </p>
</li>
<li>
<p>Overviews speed up TIFFs tremendously when zoomed out (i.e. when mapserver would otherwise have to perform some heavy downsampling) </p>
</li>
<li>
<p>Internal tiles in GeoTIFF format give a boost when zoomed in (only the necessary tiles are read from disk) </p>
</li>
<li>
<p>The TIFF comparison was run on two setups: a monstrous 8-core, RAID-5 equipped beast and a low-memory virtual machine on low-end PC hardware. The TIFF optimizations are very noticeable on the lesser machine but almost completely negligible on the high-end machine. </p>
</li>
</ul>
<blockquote>
<p>Both tiling and overviews are useful, but only on machines with resource
shortages, such as slow disks or a lack of spare RAM for caching.</p>
</blockquote>
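<p>For reference, both optimizations are applied with stock GDAL command-line tools (the filenames here are placeholders):</p>

```shell
# Build overviews ("pyramids") so zoomed-out requests read downsampled data
gdaladdo -r average ortho.tif 2 4 8 16

# Rewrite with internal tiling so zoomed-in requests read only nearby blocks
gdal_translate -co TILED=YES ortho.tif ortho_tiled.tif
```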
<p>Nothing earth-shattering (these techniques are often mentioned as best practices) but is very nice to see some hard numbers to back it up. Plus the verbose test logs provide a good example for a newbie trying to implement them. Good stuff Gregor!</p>Mapping the Undesirable2007-08-28T00:00:00-06:002007-08-28T00:00:00-06:00Matthew T. Perrytag:www.perrygeo.com,2007-08-28:/mapping-the-undesirable.html<p>While by no means a new phenomenon, <a href="http://www.thevision2020.com/LocateSexOffenders.aspx"> Vision 20/20</a> is offering a service allowing you to see a map of the registered sex offenders in your area. WorldChanging, one of my favorite blogs on emerging technologies, has a great article discussing the issues surrounding <a href="http://www.worldchanging.com/archives/007189.html"> mapping of sex offenders </a>. </p>
<blockquote>
<p>Is …</p></blockquote><p>While by no means a new phenomenon, <a href="http://www.thevision2020.com/LocateSexOffenders.aspx"> Vision 20/20</a> is offering a service allowing you to see a map of the registered sex offenders in your area. WorldChanging, one of my favorite blogs on emerging technologies, has a great article discussing the issues surrounding <a href="http://www.worldchanging.com/archives/007189.html"> mapping of sex offenders </a>. </p>
<blockquote>
<p>Is this sort of service, based on powerful networked technologies -- and one being sold on the basis of fear -- an appropriate use of the technology? Where is the data being sourced from? How are the people inputting it being supervised? And what rights to privacy and presumptions of innocence are the people it tracks entitled to? </p>
</blockquote>
<p>These are good points, but even more disturbing to me as a citizen and a GIS professional is that these maps use geocoding services that are <a href="http://www.ij-healthgeographics.com/content/2/1/10/abstract/"> not nearly accurate enough</a> for the scale at which they are being viewed. Even in suburban areas, linear-referenced geocoding techniques can still yield errors of hundreds of meters! The margin of error in the geocoding engine alone is enough to place the sex offender icon directly on an innocent citizen's home.</p>
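<p>To see why, consider how linear-referenced (address-range) geocoding works: the house number is interpolated linearly along the street segment, so any mismatch between the assumed and actual parcel spacing shifts the point. A toy sketch (the segment coordinates and address ranges are made up):</p>

```python
# Toy linear-referenced geocoder: place house number `n` by linear
# interpolation along a street segment with address range lo..hi.
def interpolate_address(n, lo, hi, start, end):
    t = (n - lo) / float(hi - lo)  # fraction of the way along the segment
    return (start[0] + t * (end[0] - start[0]),
            start[1] + t * (end[1] - start[1]))

# A 196 m block assumed to span numbers 100-198: number 150 lands near the
# middle, regardless of where the actual parcel sits on the block.
print(interpolate_address(150, 100, 198, (0.0, 0.0), (196.0, 0.0)))
```

<p>If the real parcels cluster at one end of the block, the interpolated point can be off by most of the block length, which is exactly the kind of error that drops an icon on the wrong house.</p>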
<p>For instance, which of the homes in the map below is the residence of a sex offender? Does the ambiguity bother you? Would it matter more if <em>you</em> were the innocent person living next door?</p>
<p><img alt="" src="/assets/img/offender.png"></p>
<p>For maps with this much social weight, I think that a bit more diligence is due to ensure that this data is as accurate as it needs to be! </p>Zaca Lake Fire Map2007-08-03T00:00:00-06:002007-08-03T00:00:00-06:00Matthew T. Perrytag:www.perrygeo.com,2007-08-03:/zaca-lake-fire-map.html<p><img alt="" src="/assets/img/viewfromwest027.jpg"></p>
<p>Ah the joys of living in Southern California. The Zaca Lake fire has been burning since July 4th and recently <a href="http://independent.com/news/2007/aug/03/zaca-fire-explodes/">flared up again</a> with a shift in winds which is blowing ash and a very ominous plume of smoke all over downtown Santa Barbara. While it's still burning in the …</p><p><img alt="" src="/assets/img/viewfromwest027.jpg"></p>
<p>Ah the joys of living in Southern California. The Zaca Lake fire has been burning since July 4th and recently <a href="http://independent.com/news/2007/aug/03/zaca-fire-explodes/">flared up again</a> with a shift in winds which is blowing ash and a very ominous plume of smoke all over downtown Santa Barbara. While it's still burning in the wilderness areas north of town, the Paradise Road area along the Santa Ynez river has been evacuated. <a href="http://maps.google.com/maps/ms?ie=UTF8&hl=en&msa=0&msid=105524280382284020010.0004351434f7c4b6bb5eb&ll=34.787162,-120.029583&spn=0.137739,0.144711&t=h&z=13&om=1">Check it out on Google Maps</a>.</p>
<p>The Santa Barbara News Press is reporting the fire has reached 39,000 acres and has cost $43 million thus far to contain. The county supervisors are likely to declare a state of emergency and there is already a health warning in effect. So much for my bike ride this afternoon...</p>Desktop vs Web UI2007-06-11T00:00:00-06:002007-06-11T00:00:00-06:00Matthew T. Perrytag:www.perrygeo.com,2007-06-11:/desktop-vs-web-ui.html<p>This might be a dup story for some but I thought it was interesting enough to post nonetheless:</p>
<p>Jeff Atwood wrote an interesting piece about Desktop vs Web UI that is directly relevant to mapping : <a href="http://www.codinghorror.com/blog/archives/000883.html">Who Killed the Desktop Application?</a>. He compares the usability of Microsoft Streets and Trips with …</p><p>This might be a dup story for some but I thought it was interesting enough to post nonetheless:</p>
<p>Jeff Atwood wrote an interesting piece about Desktop vs Web UI that is directly relevant to mapping : <a href="http://www.codinghorror.com/blog/archives/000883.html">Who Killed the Desktop Application?</a>. He compares the usability of Microsoft Streets and Trips with Google Maps and concludes </p>
<blockquote>
<p>All the innovation in user interface seems to be taking place on the web, and desktop applications just aren't keeping up. </p>
</blockquote>OGR and matplotlib examples2007-06-10T00:00:00-06:002007-06-10T00:00:00-06:00Matthew T. Perrytag:www.perrygeo.com,2007-06-10:/ogr-and-matplotlib-examples.html<p>Jose Gomez-Dans posted a great example of using OGR, Postgis and Matplotlib with Python - <a href="http://jgomezdans.googlepages.com/ogr%2Cpythonymatplotlib">OGR, Python y Matplotlib</a> (Spanish only).</p>FDO, GDAL/OGR and FME ?2007-05-31T00:00:00-06:002007-05-31T00:00:00-06:00Matthew T. Perrytag:www.perrygeo.com,2007-05-31:/fdo-gdalogr-and-fme.html<p><a href="http://fdo.osgeo.org/">FDO</a>, <a href="http://gdal.osgeo.org/">GDAL</a> and <a href="http://safe.com/products/fme/index.php">FME</a> all seem to operate in roughly the same domain - Providing a data model, API and tools to translate between spatial data formats. Does anyone know of any good write-ups comparing/contrasting the features of these three libraries? </p>QGIS Geocoding plugin2007-05-28T00:00:00-06:002007-05-28T00:00:00-06:00Matthew T. Perrytag:www.perrygeo.com,2007-05-28:/qgis-geocoding-plugin.html<p>A few weeks back, I decided to take the plunge and learn the python bindings for QGIS 0.9. My first experiment was to implement a geocoder plugin. What started mostly as a learning experiment turned into something that might actually be useful!</p>
<p>The idea was to use web services …</p><p>A few weeks back, I decided to take the plunge and learn the python bindings for QGIS 0.9. My first experiment was to implement a geocoder plugin. What started mostly as a learning experiment turned into something that might actually be useful!</p>
<p>The idea was to use web services to do all the actual geocoding work (the hard part!) and the delimited text provider to load the results into qgis. Right now it's built on top of the <a href="http://developer.yahoo.com/maps/rest/V1/geocode.html">Yahoo geocoder</a> which is, IMO, the best out there: very flexible about the input format. The <a href="http://exogen.case.edu/projects/geopy/">geopy module</a> is used to interact with the geocoding services so it could potentially support other engines such as geocoder.us, virtual earth, google, etc. </p>
<p>The user interface is very straightforward; enter a list of addresses/placenames separated by line breaks, pick an output file and go. To be legitimate, you should also sign up for a Yahoo API key, though the 'YahooDemo' key will work ok for testing purposes.</p>
<p><a href="/assets/img/dialog.jpg"><img alt="" src="/assets/img/dialog_thumb.jpg"></a></p>
<p><a href="/assets/img/result.jpg"><img alt="" src="/assets/img/result_thumb.jpg"></a></p>
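<p>The overall flow is simple enough to sketch: geocode each input line, then write a delimited text file the QGIS provider can load. In this sketch the geocoding call is a stand-in function (the real plugin goes through geopy and a web service):</p>

```python
# Sketch of the plugin's flow: geocode addresses, emit delimited text.
# geocode_fn is a hypothetical stand-in for the geopy/web-service call;
# it should return a (lon, lat) pair for a free-form address string.
import csv
import io

def geocode_to_csv(addresses, geocode_fn):
    buf = io.StringIO()
    writer = csv.writer(buf)
    writer.writerow(["address", "lon", "lat"])  # header for the provider
    for addr in addresses:
        lon, lat = geocode_fn(addr)
        writer.writerow([addr, lon, lat])
    return buf.getvalue()
```

<p>Swapping geocoding engines then just means swapping the function you pass in, which is exactly the flexibility geopy offers.</p>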
<p>Here's the install process (assuming you already have <a href="http://www.reprojected.com/presentations/Videos/qgis_install_051407/install_qgis.txt">python, pyqt4, qgis 0.9, qgis bindings, etc. set up</a>):</p>
<blockquote>
<p>svn checkout http://perrygeo.googlecode.com/svn/trunk/qgis/geocode
cd geocode
emacs Makefile # change install directory if needed
sudo make install</p>
</blockquote>
<p>This is just a rough cut and it's my first attempt at using the qgis and qt apis so there are probably many things that could be improved upon. Ideally this plugin could:</p>
<ul>
<li>
<p>Parse text files as input </p>
</li>
<li>
<p>Allow for a choice of geocoding engine </p>
</li>
<li>
<p>??? </p>
</li>
</ul>
<p>Feedback (and patches) welcome ;-)</p>Python gpsd bindings2007-05-27T00:00:00-06:002007-05-27T00:00:00-06:00Matthew T. Perrytag:www.perrygeo.com,2007-05-27:/python-gpsd-bindings.html<p>If you want to get a linux/unix machine talking to your GPS unit, most likely you'll be using <a href="http://gpsd.berlios.de/">gpsd</a>. There are many great apps that build off of gpsd such as kismet and gpsdrive. </p>
<p>Installing gpsd on debian/ubuntu systems is as simple as </p>
<div class="highlight"><pre><span></span><code>sudo apt-get install gpsd gpsd-clients …</code></pre></div><p>If you want to get a linux/unix machine talking to your GPS unit, most likely you'll be using <a href="http://gpsd.berlios.de/">gpsd</a>. There are many great apps that build off of gpsd such as kismet and gpsdrive. </p>
<p>Installing gpsd on debian/ubuntu systems is as simple as </p>
<div class="highlight"><pre><span></span><code>sudo apt-get install gpsd gpsd-clients
</code></pre></div>
<p>You should be able to connect your gps via serial port and start a gpsd server </p>
<div class="highlight"><pre><span></span><code>sudo gpsd /dev/ttyS0
</code></pre></div>
<p>The gpsd server reads NMEA sentences from the gps unit and is accessed on port 2947. You can test if everything is working by running a pre-built gpsd client such as xgps.</p>
<p>This is very useful for situations where you need lower-level access to the gps data; for logging your position to a postgres database for example. The debian packages (and most others I'm assuming) come with gps.py, a python interface to gpsd allowing you to pull your lat/long from the gps in real time. This opens the door for all sorts of neat real-time gps apps.</p>
<blockquote>
<div class="highlight"><pre><span></span><code><span class="kn">import</span> <span class="nn">gps</span><span class="o">,</span> <span class="nn">os</span><span class="o">,</span> <span class="nn">time</span>
<span class="n">session</span> <span class="o">=</span> <span class="n">gps</span><span class="o">.</span><span class="n">gps</span><span class="p">()</span>
<span class="k">while</span> <span class="mi">1</span><span class="p">:</span>
    <span class="n">os</span><span class="o">.</span><span class="n">system</span><span class="p">(</span><span class="s1">'clear'</span><span class="p">)</span>
    <span class="n">session</span><span class="o">.</span><span class="n">query</span><span class="p">(</span><span class="s1">'admosy'</span><span class="p">)</span>
    <span class="c1"># a = altitude, d = date/time, m=mode, </span>
    <span class="c1"># o=position/fix, s=status, y=satellites</span>
    <span class="nb">print</span>
    <span class="nb">print</span> <span class="s1">' GPS reading'</span>
    <span class="nb">print</span> <span class="s1">'----------------------------------------'</span>
    <span class="nb">print</span> <span class="s1">'latitude '</span> <span class="p">,</span> <span class="n">session</span><span class="o">.</span><span class="n">fix</span><span class="o">.</span><span class="n">latitude</span>
    <span class="nb">print</span> <span class="s1">'longitude '</span> <span class="p">,</span> <span class="n">session</span><span class="o">.</span><span class="n">fix</span><span class="o">.</span><span class="n">longitude</span>
    <span class="nb">print</span> <span class="s1">'time utc '</span> <span class="p">,</span> <span class="n">session</span><span class="o">.</span><span class="n">utc</span><span class="p">,</span> <span class="n">session</span><span class="o">.</span><span class="n">fix</span><span class="o">.</span><span class="n">time</span>
    <span class="nb">print</span> <span class="s1">'altitude '</span> <span class="p">,</span> <span class="n">session</span><span class="o">.</span><span class="n">fix</span><span class="o">.</span><span class="n">altitude</span>
    <span class="nb">print</span> <span class="s1">'eph '</span> <span class="p">,</span> <span class="n">session</span><span class="o">.</span><span class="n">fix</span><span class="o">.</span><span class="n">eph</span>
    <span class="nb">print</span> <span class="s1">'epv '</span> <span class="p">,</span> <span class="n">session</span><span class="o">.</span><span class="n">fix</span><span class="o">.</span><span class="n">epv</span>
    <span class="nb">print</span> <span class="s1">'ept '</span> <span class="p">,</span> <span class="n">session</span><span class="o">.</span><span class="n">fix</span><span class="o">.</span><span class="n">ept</span>
    <span class="nb">print</span> <span class="s1">'speed '</span> <span class="p">,</span> <span class="n">session</span><span class="o">.</span><span class="n">fix</span><span class="o">.</span><span class="n">speed</span>
    <span class="nb">print</span> <span class="s1">'climb '</span> <span class="p">,</span> <span class="n">session</span><span class="o">.</span><span class="n">fix</span><span class="o">.</span><span class="n">climb</span>
    <span class="nb">print</span>
    <span class="nb">print</span> <span class="s1">' Satellites (total of'</span><span class="p">,</span> <span class="nb">len</span><span class="p">(</span><span class="n">session</span><span class="o">.</span><span class="n">satellites</span><span class="p">)</span> <span class="p">,</span> <span class="s1">' in view)'</span>
    <span class="k">for</span> <span class="n">i</span> <span class="ow">in</span> <span class="n">session</span><span class="o">.</span><span class="n">satellites</span><span class="p">:</span>
        <span class="nb">print</span> <span class="s1">'</span><span class="se">\t</span><span class="s1">'</span><span class="p">,</span> <span class="n">i</span>
    <span class="n">time</span><span class="o">.</span><span class="n">sleep</span><span class="p">(</span><span class="mi">3</span><span class="p">)</span>
</code></pre></div>
</blockquote>
<p>... which gives you a simple readout to the terminal every 3 seconds.</p>
<p><img alt="" src="/assets/img/gpsd_python.jpg"></p>
<p>Obviously there are much more interesting applications for this ( logging data to postgis, displaying real-time tracking data in QGIS via a python plugin, etc). But this is a good start for any python based app.</p>Sparklines in python2007-05-19T00:00:00-06:002007-05-19T00:00:00-06:00Matthew T. Perrytag:www.perrygeo.com,2007-05-19:/sparklines-in-python.html<p>Edward Tufte, the outspoken guru of data visualization, has long been an advocate of clear and concise (almost minimalist) graphical representations of data. He's got a lot of great ideas relevant to cartography (my cartography course at Humboldt State used his book "The Visual Display of Quantitative Information" as our …</p><p>Edward Tufte, the outspoken guru of data visualization, has long been an advocate of clear and concise (almost minimalist) graphical representations of data. He's got a lot of great ideas relevant to cartography (my cartography course at Humboldt State used his book "The Visual Display of Quantitative Information" as our text). </p>
<p>One of the coolest ideas is "sparklines", which he describes as "data-intense, design-simple, word-sized graphics". Instead of standalone charts that are often placed on their own and separate from the text that discusses them, sparklines are meant to be placed in-line with the text and provide memorable, simple and contextually-relevant data to support the surrounding text. For example:</p>
<p><em>The US National Debt as a percentage of GDP increased during the Reagan and Bush presidencies <img alt="" src="/assets/img/reaganbush.GIF"> but dropped off slightly during the Clinton administration <img alt="" src="/assets/img/clinton.GIF">.</em></p>
<p>Now of course I had to figure out how to produce these in python. There's a great <a href="http://bitworking.org/projects/sparklines/#source">cgi application</a>, written in python by Joe Gregorio, that does sparklines. I needed something that was abstracted away from the CGI framework, more of a proper python module. Replacing all the CGI-specific code was straightforward and I came up with a standalone sparkline python module (<a href="http://perrygeo.googlecode.com/svn/trunk/gis-bin/spark.py">view / download the source code</a>). The only dependencies are python and the python imaging library.</p>
<p>In the minimalist spirit of sparklines, the interface was kept simple. First you create a list of data values, then simply pass the list to one of the sparkline generators:</p>
<blockquote>
<p>import spark
a = [32.5,35.2,39.9,40.8,43.9,48.2,50.5,51.9,53.1,55.9,60.7,64.4]
spark.sparkline_smooth(a).show()</p>
</blockquote>
<p>Or if you prefer a more discrete, bar-graph-style <img alt="" src="/assets/img/discrete.GIF"> instead of a smooth line:</p>
<blockquote>
<p>spark.sparkline_discrete(a).show()</p>
</blockquote>
<p>There's plenty of room for configuration. For example, in the national debt example above I wanted to keep the y axis at the same scale (instead of the default min-max scaling) and make each step 6 pixels wide:</p>
<blockquote>
<p>spark.sparkline_smooth(a, dmin=30,dmax=70, step=6).show()</p>
</blockquote>
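<p>The core idea fits in a few lines even without the imaging library. Here's a minimal sketch of the same scaling logic that emits a word-sized SVG polyline instead of a PNG (this is not the spark.py code, just an illustration; it assumes at least two data points):</p>

```python
# Sketch: scale a data series into a tiny inline SVG polyline.
# Assumes len(data) >= 2; the real spark.py module draws with PIL instead.
def sparkline_svg(data, width=100, height=15):
    lo, hi = min(data), max(data)
    span = (hi - lo) or 1  # avoid dividing by zero on a flat series
    step = width / float(len(data) - 1)
    points = " ".join(
        "%.1f,%.1f" % (i * step, height - (v - lo) / span * height)
        for i, v in enumerate(data)
    )
    return ('<svg width="%d" height="%d"><polyline points="%s" '
            'fill="none" stroke="black"/></svg>' % (width, height, points))
```

<p>The min-max scaling here is the default behavior; pinning the y axis (like the dmin/dmax arguments above) just means substituting fixed bounds for min(data)/max(data).</p>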
<p>How does this relate to cartography? GIS typically takes a snapshot representation of earth, frozen in time. Since sparklines seem particularly good at representing change over time, they could be an interesting way to add a time dimension to a 2-D map. For example, instead of just displaying country polygons with labels, you could place a sparkline right under the label showing the population changes over the last century. It seems like an ideal way to embed a lot of useful information into a small map. </p>
<p>Anyone know of any good examples?</p>Blessed Unrest - Paul Hawken’s presentation2007-05-14T00:00:00-06:002007-05-14T00:00:00-06:00Matthew T. Perrytag:www.perrygeo.com,2007-05-14:/blessed-unrest-paul-hawkens-presentation.html<p>I got the chance to see Paul Hawken speak tonight in Santa Barbara. I knew him best as the author of <a href="http://www.natcap.org/">Natural Capitalism</a> which provided a great roadmap for integrating ecologically sustainable practices with the business world. This talk was based on his recent book - <a href="http://blessedunrest.com/">Blessed Unrest - How the Largest …</a></p><p>I got the chance to see Paul Hawken speak tonight in Santa Barbara. I knew him best as the author of <a href="http://www.natcap.org/">Natural Capitalism</a> which provided a great roadmap for integrating ecologically sustainable practices with the business world. This talk was based on his recent book - <a href="http://blessedunrest.com/">Blessed Unrest - How the Largest Movement in the World Came into Being and Why No One Saw It Coming</a>. </p>
<p>The basis of this book is simple: that organically-developed, bottom-up, non-hierarchical organizations (which number in the millions according to his research) are now leading the world in many diverse areas of service. He describes these environmental and social justice organizations as the "immune system" of our societies; our response to destructive and corrupt habits perpetrated by those in power who are willing to compromise our future for short-term gain. </p>
<p>One thing that struck me about the subject was the importance of sharing <em>information</em> and <em>ideas</em> (as opposed to spreading an <em>ideology</em>). I thought one of the most interesting stories of the night was his description of how the meme of non-violent civil disobedience evolved... from Emerson, to Thoreau, to Gandhi, to Rosa Parks to Martin Luther King, Jr. At each turn of the story, there was someone (often unnamed but vitally important) who introduced each of these people to the ideas of those who came before. </p>
<p>Paul was eager to point out the role of technology in this inter-connected mesh of grassroots community organizations. He mentioned open-source software a few times and even gave a shout out to Ruby on Rails (which I gather was the backbone for his <a href="http://wiserearth.org/">WiserEarth.org</a> site focused on connecting these diverse organizations).</p>
<p>It was a careful mix of optimism and pessimism; Paul was careful in noting the many severe challenges we've been handed but was confident that this bottom-up mesh of interconnected citizens can form a community strong enough to withstand anything that comes its way. In the end, his message was about doing what you love, connecting with others and standing up for your values. Sounds like good advice to me.</p>Cleaning up CAD data with postgis2007-05-14T00:00:00-06:002007-05-14T00:00:00-06:00Matthew T. Perrytag:www.perrygeo.com,2007-05-14:/cleaning-up-cad-data-with-postgis.html<p>Don't you just love getting CAD data into GIS! I received a .dwg file with study areas delineated as polylines which we needed as polygons for analysis purposes. And it wasn't just one polyline surrounding each study area ... there were hundreds of little line segments which outlined a couple dozen …</p><p>Don't you just love getting CAD data into GIS! I received a .dwg file with study areas delineated as polylines which we needed as polygons for analysis purposes. And it wasn't just one polyline surrounding each study area ... there were hundreds of little line segments which outlined a couple dozen areas (what was this CAD tech thinking?). Luckily each segment had a name to associate it with the proper area.</p>
<p>I found that ArcMap's tools for doing this are painfully inadequate so I turned to postgis. After converting the dataset to a shapefile, the solution was simple:</p>
<div class="highlight"><pre><span></span><code>shp2pgsql "study_areas.shp" areas | psql -d gisdata
pgsql2shp -f "study_areas_poly.shp" gisdata \
    "SELECT BuildArea(collect(the_geom)) AS the_geom, name
     FROM areas
     GROUP BY name"
</code></pre></div>
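<p>The same merge-and-polygonize idea can be sketched outside the database, too. Here's a minimal pure-Python illustration (the coordinates are hypothetical, not from the actual dataset) of chaining unordered segments into a closed ring and computing its area with the shoelace formula:</p>

```python
# Assemble unordered line segments into a closed ring, then compute
# its area with the shoelace formula. Coordinates are hypothetical.
def build_ring(segments):
    ring = list(segments[0])
    rest = [list(s) for s in segments[1:]]
    while rest:
        for seg in rest:
            if seg[0] == ring[-1]:
                ring.append(seg[1])
                rest.remove(seg)
                break
            if seg[1] == ring[-1]:
                ring.append(seg[0])
                rest.remove(seg)
                break
        else:
            raise ValueError("segments do not form a single ring")
    return ring  # first point == last point once the ring closes

def shoelace_area(ring):
    return abs(sum(x1 * y2 - x2 * y1
                   for (x1, y1), (x2, y2) in zip(ring, ring[1:]))) / 2.0

# Four segments, in no particular order, outlining a unit square
segments = [((0, 0), (1, 0)), ((1, 1), (0, 1)),
            ((1, 0), (1, 1)), ((0, 1), (0, 0))]
ring = build_ring(segments)
print(shoelace_area(ring))  # 1.0
```

<p>This is essentially what BuildArea does for each named group, with the database handling noding and multiple rings for you.</p>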
<p>Voila... a new shapefile with my proper polygons instead of CAD chicken scratch. </p>Back on the train2007-05-13T00:00:00-06:002007-05-13T00:00:00-06:00Matthew T. Perrytag:www.perrygeo.com,2007-05-13:/back-on-the-train.html<p>I'd like to have some interesting excuse as to why I haven't posted since last July. But I don't. </p>
<p>I've since left my position at NCEAS, started a new job at <a href="http://www.geosyntec.com">Geosyntec</a> and have been keeping busy with life, love and the pursuit of happiness. Oh and GIS of course …</p><p>I'd like to have some interesting excuse as to why I haven't posted since last July. But I don't. </p>
<p>I've since left my position at NCEAS, started a new job at <a href="http://www.geosyntec.com">Geosyntec</a> and have been keeping busy with life, love and the pursuit of happiness. Oh and GIS of course.</p>
<p>Anyway, I expect to be posting on a much more regular basis from here on (unless I get distracted again ;-) ).</p>Worldwind Java - Jython example2007-05-13T00:00:00-06:002007-05-13T00:00:00-06:00Matthew T. Perrytag:www.perrygeo.com,2007-05-13:/worldwind-java-jython-example.html<p>The <a href="http://worldwind.arc.nasa.gov/java/index.html"> worldwind java sdk </a> has finally been released. It's a neat SDK, well organized, <a href="http://tleilax.chinoy.com/worldwind/articles/20070510-FirstImpressions.html">easy to bring into Eclipse</a> with some good examples to start hacking away.</p>
<p>The only problem is the examples are written in Java ;-) . If braces make you cringe but you still want to work with all …</p><p>The <a href="http://worldwind.arc.nasa.gov/java/index.html"> worldwind java sdk </a> has finally been released. It's a neat SDK, well organized, <a href="http://tleilax.chinoy.com/worldwind/articles/20070510-FirstImpressions.html">easy to bring into Eclipse</a> with some good examples to start hacking away.</p>
<p>The only problem is the examples are written in Java ;-) . If braces make you cringe but you still want to work with all the excellent Java libraries out there, you'll want to take a look at Jython. Taking the AWT1Up.java code and porting a subset of the functionality to Jython was surprisingly easy and yielded much more readable code in my opinion. And the ability to manipulate objects at the interactive prompt is just so sweet. </p>
<p><a href="/assets/img/wwj_jython.jpg"> <img alt="" src="/assets/img/wwj_jython_thumb.jpg"> </a></p>
<p><a href="http://perrygeo.googlecode.com/svn/trunk/gis-bin/wwj_demo.py"> View the Source Code </a></p>
<p>Setup is not too terrible:</p>
<ol>
<li>
<p>Get a Java JDK (I'm using sun java 6) </p>
</li>
<li>
<p>Download and install Jython 2.2b2 </p>
</li>
<li>
<p>Download and unzip the worldwind java sdk (ex: /opt/wwj )</p>
</li>
<li>
<p>Set your LD_LIBRARY_PATH variable to /opt/wwj</p>
</li>
<li>
<p>Set your CLASSPATH variable to /opt/wwj/worldwind.jar</p>
</li>
<li>
<p>Run <code>jython wwj_demo.py</code></p>
</li>
</ol>
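<p>Assuming the install location from step 3, the environment setup in steps 4 and 5 might look like this in your shell profile (a sketch, with hypothetical paths; adjust to wherever you unzipped the SDK):</p>

```shell
# Hypothetical install location from step 3 -- adjust to taste
export WWJ_HOME=/opt/wwj
# The SDK's native libraries need to be on the loader path
export LD_LIBRARY_PATH=$WWJ_HOME:$LD_LIBRARY_PATH
# worldwind.jar must be on the classpath for Jython to find the classes
export CLASSPATH=$WWJ_HOME/worldwind.jar:$CLASSPATH
```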
<p>One thing that is a bit disappointing with the WorldWind SDK in general is the lack of support for rendering common formats. Maybe I missed something but I couldn't get gpx or georss feeds working properly. It is version 0.2 so I expect support for GeoRSS and GPX to improve and for GML, KML, GeoJSON, Shapefiles, Rasters, WMS, etc to be included eventually.</p>
<p>Anyone else out there started playing with Jython / Worldwind yet?</p>The reliability of web services2006-07-24T00:00:00-06:002006-07-24T00:00:00-06:00Matthew T. Perrytag:www.perrygeo.com,2006-07-24:/the-reliability-of-web-services.html<p>A few months back I posted a link to my <a href="http://www.perrygeo.net/wordpress/?p=35">ten favorite Web Mapping Services</a>. The post included live links directly to the WMS servers. At first I questioned this move as locally hosted images would be far more reliable. But I thought it would be a neat experiment to …</p><p>A few months back I posted a link to my <a href="http://www.perrygeo.net/wordpress/?p=35">ten favorite Web Mapping Services</a>. The post included live links directly to the WMS servers. At first I questioned this move as locally hosted images would be far more reliable. But I thought it would be a neat experiment to see the downtime of each site. So I checked it daily just out of curiosity...</p>
<p>Well, with today's apparent disappearance of the <a href="http://wms.jpl.nasa.gov/wms.cgi?request=GetCapabilities">NASA JPL site</a>, all but one of my WMS layers mentioned have been down for at least a significant portion of a day. (The only one that's been consistently up has been http://mesonet.agron.iastate.edu.)</p>
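<p>An informal daily check like this is easy to automate. A sketch in modern Python (<code>is_up</code> is my own helper, not part of any library, written so the HTTP call can be swapped out for testing):</p>

```python
import urllib.request

def is_up(url, opener=urllib.request.urlopen, timeout=10):
    """Return True if the endpoint answers with a non-error HTTP status."""
    try:
        resp = opener(url, timeout=timeout)
        return 200 <= resp.getcode() < 400
    except Exception:
        return False

# Usage sketch (base URL from the post; append your own GetCapabilities query):
#   for url in ["http://mesonet.agron.iastate.edu"]:
#       print("UP" if is_up(url) else "DOWN", url)
```

<p>Run it from cron once a day and you have a crude uptime log for your favorite servers.</p>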
<p>This echoes back to what I was complaining about with the whole <a href="http://www.perrygeo.net/wordpress/?p=43">USGS National Map debacle</a>. The bottom line is that whenever we rely heavily on a web service to deliver essential data, we are risking the integrity of the end product. The chain is only as strong as its weakest link and, unfortunately, as the USGS and NASA have shown, those links can and will fail completely from time to time.</p>Converting Shapefiles (and more) to KML2006-07-14T00:00:00-06:002006-07-14T00:00:00-06:00Matthew T. Perrytag:www.perrygeo.com,2006-07-14:/converting-shapefiles-and-more-to-kml.html<p>A while back I wrote about converting <a href="http://www.perrygeo.net/wordpress/?p=3">KML files into a shapefile</a> for use with GIS apps other than GoogleEarth. I got a ton of emails and site traffic from people looking to go the opposite direction: getting their GIS data into KML. </p>
<p>There are, of course, a couple of …</p><p>A while back I wrote about converting <a href="http://www.perrygeo.net/wordpress/?p=3">KML files into a shapefile</a> for use with GIS apps other than GoogleEarth. I got a ton of emails and site traffic from people looking to go the opposite direction: getting their GIS data into KML. </p>
<p>There are, of course, several utilities already implemented: ArcMap-based extensions including <a href="http://arcscripts.esri.com/details.asp?dbid=14344">KML Home Companion</a> and <a href="http://www.arc2earth.com/">Arc2Earth</a>, a nice MapWindow app called <a href="http://interactiveearth.blogspot.com/2006/06/download-shape2earth-beta-2.html"> Shape2Earth</a>, and the open source WMS server <a href="http://docs.codehaus.org/display/GEOS/Home">Geoserver</a> all support KML output. </p>
<p>Not to be left behind, GDAL/OGR now supports KML output. Oddly enough it does not yet read KML. But hand it any <a href="http://ogr.maptools.org/ogr_formats.html">OGR-readable vector dataset</a> and it can be converted into KML. It currently doesn't offer as much control over the output as the above options but is quicker to implement, works with a wide variety of input formats and can be easily scripted.</p>
<p>This functionality is in CVS only at the moment but should be included in the next release. If you can't wait and don't feel like compiling from cvs source, try the 1.0.5 version of <a href="http://fwtools.maptools.org/">FWTools</a> (for Windows and Linux).</p>
<p>The conversion process is pretty straightforward. For example, the following will convert a shapefile (sbpoints.shp) to KML (mypoints.kml). </p>
<div class="highlight"><pre><span></span><code>ogr2ogr -f KML mypoints.kml sbpoints.shp sbpoints
</code></pre></div>
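<p>And because it's a plain command line tool, batch conversion is easy to script. A small sketch (the directory layout and the <code>kml_commands</code> helper name are mine, not OGR's):</p>

```python
import glob
import os
import subprocess

def kml_commands(shp_dir):
    """Build one ogr2ogr command per shapefile found in shp_dir."""
    cmds = []
    for shp in sorted(glob.glob(os.path.join(shp_dir, "*.shp"))):
        kml = os.path.splitext(shp)[0] + ".kml"
        cmds.append(["ogr2ogr", "-f", "KML", kml, shp])
    return cmds

# To actually convert (requires ogr2ogr on your PATH):
#   for cmd in kml_commands("shapes/"):
#       subprocess.call(cmd)
```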
<p>The KML format flies in the face of the GIS mantra that content should be separate from styling. Since styling information is purposefully absent from most standard vector formats, it makes for pretty bland KML output. The attributes just get dumped out into one big text block and there is no classification or styling control.
<img alt="" src="/assets/img/ogrkml.jpg"></p>
<p>But in terms of getting your data into Google Earth quickly (esp. point data), the OGR method looks promising.</p>Wardriving with Ubuntu Linux and Google Earth2006-07-03T00:00:00-06:002006-07-03T00:00:00-06:00Matthew T. Perrytag:www.perrygeo.com,2006-07-03:/wardriving-with-ubuntu-linux-and-google-earth.html<p>Wardriving is fun. Going around the neighborhood and mapping all the wireless networks may be nothing more than a geeky hobby but it can sure teach you a lot. And viewing the results in Google Earth is icing on the cake.</p>
<p>I've used NetStumbler on Windows and this works great but …</p><p>Wardriving is fun. Going around the neighborhood and mapping all the wireless networks may be nothing more than a geeky hobby but it can sure teach you a lot. And viewing the results in Google Earth is icing on the cake.</p>
<p>I've used NetStumbler on Windows and this works great but since my computers at home are now nearly Microsoft-free, I had to relearn the process on Linux. It breaks down into a few easy steps:</p>
<ol>
<li>
<p>Install the <strong>drivers</strong> for your wireless card. On my HP laptop with a Broadcom card, I followed the instructions on the <a href="http://ubuntuforums.org/showthread.php?p=1071920&mode=linear"> ubuntu forums </a> which worked great with one exception: the driver link on that page doesn't have a valid md5 sum so you can download it from <a href="http://forums.fedoraforum.org/forum/attachment.php?attachmentid=7759">this url</a> instead.</p>
</li>
<li>
<p>Install <strong>gpsd.</strong> This is the software that talks to your gps unit and is available in the ubuntu packages through apt. The one hitch is that I had to set my Magellan GPS unit up for the correct baud rate and NMEA output. Once installed, I connected the GPS unit via a serial port, turned it on and ran <em>gpsd /dev/ttyS0</em> to start the gpsd server.</p>
</li>
<li>
<p>Install <strong>kismet,</strong> the wireless packet sniffer. The version in the ubuntu repository is not recent enough to support my Broadcom driver so I had to download the latest source and compile it with the standard <em>configure, make, sudo make install</em>. Then I had to edit /usr/local/etc/kismet.conf to reflect my system configuration; I changed the <em>suiduser</em>, <em>source</em> and <em>logtemplate</em> variables. Once configured, you can start it with the command <em>sudo kismet</em>.</p>
</li>
<li>
<p>Now <strong>drive/bike/walk around</strong> for a bit with your laptop and gps unit. When you're done, shutdown kismet and you'll have a bunch of fresh logfiles to work with.</p>
</li>
<li>
<p>The main kismet log is an xml file containing all the info on the available wireless networks including their SSID, their encryption scheme, transfer rate and their geographic position via gpsd. I worked up a small python script, <a href="http://perrygeo.googlecode.com/svn/trunk/gis-bin/kismet2kml.py">kismet2kml.py</a> (based on a blog entry at <a href="http://www.larsen-b.com/Article/204.html">jkx@Home</a>), to <strong>parse the logfile into a KML file</strong> for use with Google Earth. It could certainly use some tweaking but it's a start. To run it, give it the kismet logfile and pipe the output to a kml file: </p>
</li>
</ol>
<div class="highlight"><pre><span></span><code>kismet2kml.py kismet-log-Jul-03-2006-1.xml > wardrive.kml
</code></pre></div>
<ol start="6">
<li>Now fire up <strong>Google Earth</strong> (Linux version now available!) and load your KML file.</li>
</ol>
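<p>The heart of a script like kismet2kml.py is just XML in, KML out. Here's a stripped-down sketch of the idea; note that the element names below are assumptions based on one kismet version's log format, so check your own logs (and real code should XML-escape the SSID):</p>

```python
import xml.etree.ElementTree as ET

# Hypothetical kismet log snippet -- element names are assumptions
SAMPLE = """<detection-run>
  <wireless-network>
    <SSID>coffeeshop</SSID>
    <gps-info><min-lat>34.41</min-lat><min-lon>-119.69</min-lon></gps-info>
  </wireless-network>
</detection-run>"""

def kismet_to_kml(xml_text):
    root = ET.fromstring(xml_text)
    placemarks = []
    for net in root.findall("wireless-network"):
        ssid = net.findtext("SSID", default="(no ssid)")
        lat = net.findtext("gps-info/min-lat")
        lon = net.findtext("gps-info/min-lon")
        if lat is None or lon is None:
            continue  # no GPS fix for this network
        placemarks.append(
            "<Placemark><name>%s</name>"
            "<Point><coordinates>%s,%s</coordinates></Point>"
            "</Placemark>" % (ssid, lon, lat)
        )
    return ('<?xml version="1.0"?><kml xmlns="http://earth.google.com/kml/2.0">'
            "<Document>%s</Document></kml>" % "".join(placemarks))

print(kismet_to_kml(SAMPLE))
```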
<p><img alt="" src="/assets/img/kismetkml.jpg"></p>
<p>Also, as James Fee <a href="http://www.spatiallyadjusted.com/2006/07/03/help-me-think-of-a-good-mashup-to-create/">points out</a>, posting your data as KML files means that the data can be integrated into a growing number of kml-ready apps including google maps (just upload the kml and point your browser to <em>http://maps.google.com/maps?q=http://your.server/wardrive.kml</em>). </p>
<p>Another neat application I've found for dealing with kismet logs is the <a href="http://wiki.openstreetmap.org/index.php/User:Dutch#Converting_Kismet_.gps_files_to_gpx">kismet2gpx script</a> for converting the kismet gps tracklog into gpx. Since most gps units have pretty tight limitations on the length of stored tracks, logging them to your laptop with kismet could be an effective way of creating detailed tracks on very long trips.</p>Mapserver Include2006-06-25T00:00:00-06:002006-06-25T00:00:00-06:00Matthew T. Perrytag:www.perrygeo.com,2006-06-25:/mapserver-include.html<p>If you manage even a small number of Mapserver sites, eventually you notice that you use a number of identical layers in multiple mapfiles. The way this is typically done is to copy and paste the LAYER definition into each mapfile. But inevitably you'll need to change the styling or …</p><p>If you manage even a small number of Mapserver sites, eventually you notice that you use a number of identical layers in multiple mapfiles. The way this is typically done is to copy and paste the LAYER definition into each mapfile. But inevitably you'll need to change the styling or the data source and you have to manually go through each mapfile to sync the changes. Wouldn't it be nice to define the layer in a single file and use it in many mapfiles?</p>
<p>While Mapserver has no concept of an "include", the C preprocessor (cpp) does. This is mentioned on the Mapserver list every time the subject of includes comes up. Still I have yet to find an actual example so I thought I'd share my notes on how I accomplish a mapserver include:</p>
<ol>
<li>Create your mapfile as usual but leave out any LAYER definitions that you wish to share amongst mapfiles. Instead use something like :</li>
</ol>
<blockquote>
<p>#include "landsat.layer"</p>
</blockquote>
<ol start="2">
<li>
<p>The C preprocessor doesn't deal well with "#", which is the mapfile's chosen comment character. Replace it with "##" to indicate a comment. </p>
</li>
<li>
<p>Save this pseudo-mapfile as <em>mymap.template</em></p>
</li>
<li>
<p>Create a file in the same directory called <em>landsat.layer</em> with the LAYER block. </p>
</li>
<li>
<p>Run the template through the preprocessor to generate the real mapfile :</p>
</li>
</ol>
<blockquote>
<p>cpp -P -C -o mymap.map mymap.template </p>
</blockquote>
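<p>If depending on cpp feels heavy, the include step itself is simple enough to sketch in Python. This is a hypothetical stand-in, not a drop-in cpp replacement (no macros, no conditionals, and no "##" comment handling):</p>

```python
import re
from pathlib import Path

# Match lines like: #include "landsat.layer"
INCLUDE = re.compile(r'^\s*#include\s+"(.+)"\s*$')

def expand(template):
    """Recursively splice #include'd files into a mapfile template."""
    template = Path(template)
    out = []
    for line in template.read_text().splitlines():
        m = INCLUDE.match(line)
        if m:
            # Included paths are resolved relative to the template
            out.append(expand(template.parent / m.group(1)))
        else:
            out.append(line)
    return "\n".join(out)

# Usage: Path("mymap.map").write_text(expand("mymap.template"))
```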
<p>The next step would be to script the preprocessing of <em>all</em> your mapfiles so that changing a layer definition in multiple mapfiles is as simple as editing the *.layer file and re-running the script. </p>Some thoughts on Where 2.02006-06-15T00:00:00-06:002006-06-15T00:00:00-06:00Matthew T. Perrytag:www.perrygeo.com,2006-06-15:/some-thoughts-on-where-20.html<p>Oh man, it's a long drive from San Jose back to Santa Barbara! Anyways, just got back from where 2.0 and want to throw out my quick summary of the event.</p>
<ul>
<li>There was a lot of talk about all things <strong>open</strong>; open data, open source and open standards. There was …</li></ul><p>Oh man, it's a long drive from San Jose back to Santa Barbara! Anyways, just got back from where 2.0 and want to throw out my quick summary of the event.</p>
<ul>
<li>There was a lot of talk about all things <strong>open</strong>; open data, open source and open standards. There was lots of buzz around the open street map project, osgeo applications like grass, ossim, gdal, mapbender, etc., and tons of discussion of WMS, WFS and other relevant standards. This is great as I think all three will be the cornerstone of the spatial industry in the near future. </li>
</ul>
<p>But, as I've mentioned before, people throw the word "open" around so much that it begins to lose meaning. From a lot of conversations I had, I found many people were confused about the differences. Some folks seemed to think that the osgeo foundation was a data repository for open data (it may soon be! .. but not quite yet) and also that osgeo was an open standards organization trying to "compete" with the OGC. But that is what an event like this is for: to reach out and communicate, clarify and bridge the gaps between communities.</p>
<p>Of course I had to laugh as I heard a couple dozen people refer to Google Maps as an "open source" application.... it's proprietary source code using proprietary data through a proprietary data transfer mechanism. It may be "free" as in beer but that's about the extent of its openness.</p>
<ul>
<li>
<p><strong>Social Data</strong>: using location technology as the basis for sharing personal experiences and social networking was a powerful theme at Where 2.0. It ran the gamut from tagging locations to writing personal travelogues to mobile location-based games to virtual worlds to mobile apps that could differentiate strangers vs. acquaintances in range of your Bluetooth device. </p>
</li>
<li>
<p><strong>Security and privacy</strong>: The where 2.0 mindset has real implications. Publishing your location and personal information in real time through the web and mobile devices brings up some frightening security and privacy issues. Who owns the data? What licenses are your personal data distributed under? Do you need others' permission to post their photos or locations? Who decides what is acceptable and what gets taken down? How is spam dealt with? Only two speakers were brave enough to fully address these issues head on and the panel had some good discussion on these topics. Kudos to them. </p>
</li>
<li>
<p>Bringing location technology to <strong>the masses</strong>: A few speakers repeated that in order to be successful in spatial technologies you need to bring your service to the masses. Certainly if you're trying to compete in the social networking space, this is true. But in general GIS and spatial tech have applications that are far beyond the interests of the vast majority of people: emergency management, infrastructure, environmental, real estate, etc. </p>
</li>
</ul>
<p>The mantra that spatial data and services must appeal to a wide audience is analogous to saying that family cars are the only successful type of motorized vehicle. In terms of numbers, they may be a majority. But in terms of utility, there is a reason that construction companies pay hundreds of thousands of dollars for heavy industrial machinery... because trying to haul tons of earth and debris with a Toyota Camry just doesn't work. Likewise there is a similar reason most municipalities don't use a Google Mashup to manage their parcel data... it simply doesn't work. So what is appropriate for mass consumption may have little applicability to business/government/industry/research. And vice versa. </p>
<ul>
<li>
<p><strong>Mobile Applications</strong>: So much potential here and some really cool innovations in geotagging content. Really, for the first time, I got a sense that these personal devices could become a means for creating a vast database of socially relevant information. But given the lack of security and privacy safeguards, the domination of the cellular networks and the heterogeneous environment of mobile platforms, I still view most of this as pie-in-the-sky.</p>
</li>
<li>
<p>Some new discoveries: </p>
<ul>
<li>
<p>metacarta: A text parsing engine with a public API to extract geo info from plain text! </p>
</li>
<li>
<p>gutenkarte: An application of the above to classic works of literature.</p>
</li>
<li>
<p>open layers: A javascript application with a slick UI and simple API for displaying WMS and WFS</p>
</li>
<li>
<p>open street map: A fantastic project focusing on collaborative development of a public street database </p>
</li>
<li>
<p>mapstraction: A javascript layer on top of the 'Big 3' Mapping APIs that allows you to switch seamlessly between the service providers.</p>
</li>
<li>
<p>Google Earth &amp; Sketchup: GE for linux!!! Wooo-hooo!! There was also a sweet demo of creating 3D drawings in Sketchup and placing them in GE. Very slick.</p>
</li>
<li>
<p>Google Maps: Now with kml support! Just try http://maps.google.com/?q=http://path.to.your.kml </p>
</li>
<li>
<p>Mapguide: I am embarrassed to say I have never tried out Autodesk's open source offering but the demo was sweet.. a very high powered GIS for a web app. And the Autodesk folks were about the nicest group of guys you could meet.</p>
</li>
<li>
<p>ArcGIS/Server 9.2: Author a map in ArcMap. Save as .mxd. Drop into web server. Instant kml and wms server! </p>
</li>
<li>
<p>And while not new to me, there were a lot of good overviews of some of my favorite software packages like OSSIM, GRASS, GDAL, Geoserver and World Wind (Java version coming this fall!!). </p>
</li>
</ul>
</li>
<li>
<p>Finally, the prize for most interesting talk goes to Chris Spurgeon who spoke about the best geohacks of the last 3000 years. Long before computers, Chris showed how Eratosthenes measured the diameter of the earth, how the Polynesians used the stars as an advanced navigation system, how the post-Renaissance world <em>re</em>discovered the stars as the key to navigation. And in more recent times he showed how Harry Beck reinvented the cartography of transportation with the London subway maps and how the VOR transmitters created highways in the featureless sky. This presentation really put current innovations in location technologies into perspective.</p>
</li>
</ul>
<p>OK sorry about the lack of links but it's too late in the evening for that. Hope you enjoyed my rundown and I'm sure I'll have more to say after I get some sleep!</p>Animating the Blue Marble2006-06-09T00:00:00-06:002006-06-09T00:00:00-06:00Matthew T. Perrytag:www.perrygeo.com,2006-06-09:/animating-the-blue-marble.html<p>A while back I posted my technique for creating an <a href="http://www.perrygeo.net/wordpress/?p=39">animated gif</a> out of a time series of maps. While this may have been the pinnacle of web animation circa 1997, the animated gif just didn't quite seem hip enough for this day and age.</p>
<p>Today I found a more …</p><p>A while back I posted my technique for creating an <a href="http://www.perrygeo.net/wordpress/?p=39">animated gif</a> out of a time series of maps. While this may have been the pinnacle of web animation circa 1997, the animated gif just didn't quite seem hip enough for this day and age.</p>
<p>Today I found a more modern example. This <a href="http://worldkit.org/wmstimenav/">WorldKit interface</a>, built with Flash, shows the seasonal progression of snow and land cover changes courtesy of the next generation Blue marble images. Complete with time slider, image fading and full animation controls, this interface really shines at providing an interactive experience rather than a passive visual display. </p>HostGIS Linux 3.6 Released2006-06-03T00:00:00-06:002006-06-03T00:00:00-06:00Matthew T. Perrytag:www.perrygeo.com,2006-06-03:/hostgis-linux-36-released.html<p>Though probably not as big of a news item as this week's <a href="http://www.ubuntu.com/news/606released">release of Ubuntu Dapper</a>, there's another Linux release that might be of interest to us GIS folk:</p>
<p>Built off of a <a href="http://www.slackware.com/">Slackware</a> base (one of the oldest, most stable linux distros), <a href="http://www.hostgis.com/linux/">HostGIS Linux</a> aims to be a "minimal …</p><p>Though probably not as big of a news item as this week's <a href="http://www.ubuntu.com/news/606released">release of Ubuntu Dapper</a>, there's another Linux release that might be of interest to us GIS folk:</p>
<p>Built off of a <a href="http://www.slackware.com/">Slackware</a> base (one of the oldest, most stable linux distros), <a href="http://www.hostgis.com/linux/">HostGIS Linux</a> aims to be a "minimal yet complete" distribution specifically built with GIS in mind. It is first and foremost a server platform; it does not include any window system at all. If you're looking for desktop GIS applications out-of-box, it might not be the best for you. </p>
<p>But for a GIS server, it comes with most of the open source stack preinstalled and configured. This latest release has <a href="http://www.hostgis.com/linux/manual/changes.html">a few changes</a> and version upgrades for most of the components.</p>
<ul>
<li>
<p>PHP, Python and Perl Mapscript </p>
</li>
<li>
<p>GDAL/OGR with PHP, Python and perl bindings </p>
</li>
<li>
<p>Postgresql 8.1 with PostGIS 1.1 </p>
</li>
<li>
<p>drivers for many extra formats including jpeg2000 and ecw </p>
</li>
<li>
<p>Apache web server with Mapserver CGI </p>
</li>
</ul>
<p>The primary motivation for creating HGL was to speed up the installation of new gis-enabled servers. Gregor Mosheh, the head programmer for HostGIS, has done an excellent job pretty much single-handedly putting this together. (In full disclosure, I do consulting work for HostGIS, though I wasn't really involved in the creation of HostGIS Linux.)</p>
<p>The setup is your standard text-based install and is a piece of cake if you've ever installed Linux before. When you're through, you have the good ole' black and white text console staring at you. Not very interesting... But the really satisfying part is to fire up a web browser after the install and be able to point it to a working webGIS application. Anyone who has spent the time to set up the mapserver stack and its seemingly infinite dependencies can appreciate the amount of work this saves! </p>
<p>If you're not into learning a new distro, there is always the <a href="http://www.maptools.org/fgs/">FGS</a> linux installer which will set up a similar software stack on pretty much any linux.</p>
<p>And for Desktop GIS, many linux distros have a selection of GIS apps in their package repositories (you'll certainly want to grab GRASS, GDAL and QGIS). <a href="http://fwtools.maptools.org/">FWTools</a> can be a good option on both Linux and Windows to get you up and running quickly. Finally there are a number of other more desktop-oriented distros for GIS including <a href="http://www.sourcepole.com/gis-knoppix/"> Knoppix GIS</a> and <a href="http://www.geolivre.org.br/modules/news/">GeoLivre</a>, both of which run as a live-cd so you can check it out before you install.</p>
<p>Anyways, back to sum up HostGIS Linux: </p>
<p>If you need to set up a GIS server with minimal fuss and you have some experience with Linux, you might like to try it out. It will save lots of time. </p>
<p>If you're a GIS user who needs a graphical windowing environment to do GIS work on the Desktop, HostGIS Linux will not really make you happy out-of-the-box. Of course, since HGL is slackware based, you <em>can</em> use the slackware package management system to build an impressive Desktop system. But if you don't need to run a server or really care about having the latest versions, Ubuntu comes with a solid desktop environment and packages for a lot of good GIS apps. </p>More on Mapnik WMS2006-05-18T00:00:00-06:002006-05-18T00:00:00-06:00Matthew T. Perrytag:www.perrygeo.com,2006-05-18:/more-on-mapnik-wms.html<p>One of my initial complaints about the Mapnik WMS server was that it would not accept any parameters that were not in the OGC WMS spec. Some WMS clients will tag on extra parameters for various reasons and the OGC supports this in relation to vendor-specific parameters. The fix was …</p><p>One of my initial complaints about the Mapnik WMS server was that it would not accept any parameters that were not in the OGC WMS spec. Some WMS clients will tag on extra parameters for various reasons and the OGC supports this in relation to vendor-specific parameters. The fix was pretty simple; in <strong>mapnik/ogcserver/common.py</strong> you can simply comment out </p>
<div class="highlight"><pre><span></span><code>        #for paramname in params.keys():
        #    if paramname not in self.SERVICE_PARAMS[requestname].keys():
        #        raise OGCException('Unknown request parameter "%s".' % paramname)
</code></pre></div>
<p>to get the desired effect.</p>
<hr>
<p>There was also the question of speed and how it compared to other WMS servers such as Mapserver. Since I already had both a Mapnik and Mapserver WMS set up using the exact same data source, styled in the same fashion, it was pretty simple to write a quick python script that would smack each WMS server with a given number of back-to-back WMS GetMap requests:</p>
<div class="highlight"><pre><span></span><code>#!/usr/bin/env python
import sys
import urllib

server = sys.argv[1]
hits = int(sys.argv[2])
if server == 'mapnik':
    url = "http://localhost/fcgi-bin/wms?VERSION=1.1.1&amp;REQUEST=GetMap&amp;SERVICE=WMS&amp;LAYERS=world_borders&amp;SRS=EPSG:4326&amp;BBOX=-4.313249999999993,20.803500000000003,59.58675000000002,52.75350000000002&amp;WIDTH=800&amp;HEIGHT=400&amp;FORMAT=image/png&amp;STYLES=&amp;TRANSPARENT=TRUE&amp;UNIQUEID="
elif server == 'mapserver':
    url = "http://localhost/cgi-bin/mapserv?map=/home/perrygeo/mapfiles/world.map&amp;VERSION=1.1.1&amp;REQUEST=GetMap&amp;SERVICE=WMS&amp;LAYERS=worldborders&amp;SRS=EPSG:4326&amp;BBOX=-4.313249999999993,20.803500000000003,59.58675000000002,52.75350000000002&amp;WIDTH=800&amp;HEIGHT=400&amp;FORMAT=image/png&amp;STYLES=&amp;TRANSPARENT=TRUE&amp;UNIQUEID="
for i in range(0, hits):
    urllib.urlretrieve(url)
</code></pre></div>
<p>Then just run the script from the command line, specifying the server and number of hits, and wrap it in the <em>time</em> command. Here are the results:</p>
<p><img alt="" src="/assets/img/manik_vs_mapserv_speed.png"></p>
<p>Pretty close. Mapserver was just slightly faster in every case. This is just a preliminary test, though, and it would be interesting to see a comparison:</p>
<ul>
<li>
<p>With larger datasets and more complex styling including classification and text labelling</p>
</li>
<li>
<p>With data from other sources such as PostGIS, where the connection overhead might be significant</p>
</li>
<li>
<p>With Mapserver running as a FastCGI process</p>
</li>
<li>
<p>With concurrent requests as opposed to back-to-back requests </p>
</li>
</ul>
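<p>To sketch that last point, the back-to-back loop above can be contrasted with a threaded harness. This is only an illustration in modern Python 3 (the 2006 script targets Python 2), and the injectable <code>fetch</code> callable and worker count are my assumptions, not part of the original benchmark:</p>

```python
# Sketch: time N back-to-back requests vs. N concurrent requests.
# `fetch` is any zero-argument callable, e.g. one that retrieves a
# WMS GetMap URL; it is injectable so the harness itself is testable.
import time
from concurrent.futures import ThreadPoolExecutor


def bench_sequential(fetch, hits):
    """Issue `hits` requests one after another; return elapsed seconds."""
    start = time.perf_counter()
    for _ in range(hits):
        fetch()
    return time.perf_counter() - start


def bench_concurrent(fetch, hits, workers=8):
    """Issue `hits` requests from a thread pool; return elapsed seconds."""
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = [pool.submit(fetch) for _ in range(hits)]
        for f in futures:
            f.result()  # re-raise any exception from a worker thread
    return time.perf_counter() - start
```

<p>Against a live server, <code>fetch</code> could be something like a hypothetical <code>lambda: urllib.request.urlopen(getmap_url).read()</code>; the gap between the two timings would show how well each server handles simultaneous clients.</p>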
<p>Overall though, my opinion of Mapnik WMS remains high and I'd love to put it in production use in the near future. Stay tuned...</p>Mapnik WMS Server2006-05-17T00:00:00-06:002006-05-17T00:00:00-06:00Matthew T. Perrytag:www.perrygeo.com,2006-05-17:/mapnik-wms-server.html<p>A few months ago, <a href="http://mapnik.org/"> Mapnik</a> came onto my radar and I was immediately impressed with the <a href="http://mapnik.org/maps/">beautiful</a> <a href="http://static.flickr.com/35/106561736_afcdc30ddb_o.png">cartography</a>. But, until recently, it was just a C++ library with some python bindings that could be used to programmatically build nice map images from shapefiles, geotiffs or postgis layers. There were no …</p><p>A few months ago, <a href="http://mapnik.org/"> Mapnik</a> came onto my radar and I was immediately impressed with the <a href="http://mapnik.org/maps/">beautiful</a> <a href="http://static.flickr.com/35/106561736_afcdc30ddb_o.png">cartography</a>. But, until recently, it was just a C++ library with some python bindings that could be used to programmatically build nice map images from shapefiles, geotiffs or postgis layers. There were no common interfaces such as WMS to access mapnik... until last month. Jean Francois Doyon recently added <a href="http://mapnik.org/news/2006/apr/18/wms/">a prototype WMS interface</a> to Mapnik. It runs as a fastcgi script under apache. It is still a bit rough around the edges, but the result is well worth a little extra setup effort. </p>
<p>I set up Mapnik as a WMS server recently and would like to share my process and results. This tutorial assumes you already have python, postgresql/postgis, proj4, the python imaging library and apache2 running. The examples are for Ubuntu Dapper Drake; they may work well on other versions of Ubuntu and Debian, but for other unixes (and certainly windows) many things may need to be tweaked.</p>
<p>First off, we have to install the base mapnik libs. These depend on the boost python bindings and the whole compile process is very simple (if a bit slow) in Ubuntu:</p>
<blockquote>
<div class="highlight"><pre><span></span><code><span class="n">sudo</span><span class="w"> </span><span class="n">apt</span><span class="o">-</span><span class="n">get</span><span class="w"> </span><span class="n">install</span><span class="w"> </span>\<span class="w"></span>
<span class="w"> </span><span class="n">libboost</span><span class="o">-</span><span class="n">python1</span><span class="o">.</span><span class="mf">33.1</span><span class="w"> </span><span class="n">libboost</span><span class="o">-</span><span class="n">python</span><span class="o">-</span><span class="n">dev</span><span class="w"> </span>\<span class="w"></span>
<span class="w"> </span><span class="n">libboost</span><span class="o">-</span><span class="n">regex1</span><span class="o">.</span><span class="mf">33.1</span><span class="w"> </span><span class="n">libboost</span><span class="o">-</span><span class="n">regex</span><span class="o">-</span><span class="n">dev</span><span class="w"> </span>\<span class="w"></span>
<span class="w"> </span><span class="n">libboost</span><span class="o">-</span><span class="n">serialization</span><span class="o">-</span><span class="n">dev</span><span class="w"> </span>\<span class="w"></span>
<span class="w"> </span><span class="n">libboost</span><span class="o">-</span><span class="n">signals1</span><span class="o">.</span><span class="mf">33.1</span><span class="w"> </span><span class="n">libboost</span><span class="o">-</span><span class="n">signals</span><span class="o">-</span><span class="n">dev</span><span class="w"> </span>\<span class="w"></span>
<span class="w"> </span><span class="n">libboost</span><span class="o">-</span><span class="n">thread1</span><span class="o">.</span><span class="mf">33.1</span><span class="w"> </span><span class="n">libboost</span><span class="o">-</span><span class="n">thread</span><span class="o">-</span><span class="n">dev</span><span class="w"> </span>\<span class="w"></span>
<span class="w"> </span><span class="n">libboost</span><span class="o">-</span><span class="n">program</span><span class="o">-</span><span class="n">options1</span><span class="o">.</span><span class="mf">33.1</span><span class="w"> </span><span class="n">libboost</span><span class="o">-</span><span class="n">program</span><span class="o">-</span><span class="n">options</span><span class="o">-</span><span class="n">dev</span><span class="w"> </span>\<span class="w"></span>
<span class="w"> </span><span class="n">libboost</span><span class="o">-</span><span class="n">filesystem1</span><span class="o">.</span><span class="mf">33.1</span><span class="w"> </span><span class="n">libboost</span><span class="o">-</span><span class="n">filesystem</span><span class="o">-</span><span class="n">dev</span><span class="w"> </span>\<span class="w"></span>
<span class="w"> </span><span class="n">libboost</span><span class="o">-</span><span class="n">iostreams1</span><span class="o">.</span><span class="mf">33.1</span><span class="w"> </span><span class="n">libboost</span><span class="o">-</span><span class="n">iostreams</span><span class="o">-</span><span class="n">dev</span><span class="w"></span>
<span class="n">cd</span><span class="w"> </span><span class="o">~/</span><span class="n">src</span><span class="w"></span>
<span class="n">svn</span><span class="w"> </span><span class="n">checkout</span><span class="w"> </span><span class="n">svn</span><span class="p">:</span><span class="o">//</span><span class="n">svn</span><span class="o">.</span><span class="n">berlios</span><span class="o">.</span><span class="n">de</span><span class="o">/</span><span class="n">mapnik</span><span class="o">/</span><span class="n">trunk</span><span class="w"> </span><span class="n">mapnik</span><span class="w"></span>
<span class="n">cd</span><span class="w"> </span><span class="n">mapnik</span><span class="w"></span>
<span class="n">python</span><span class="w"> </span><span class="n">scons</span><span class="o">/</span><span class="n">scons</span><span class="o">.</span><span class="n">py</span><span class="w"> </span><span class="n">PYTHON</span><span class="o">=/</span><span class="n">usr</span><span class="o">/</span><span class="n">bin</span><span class="o">/</span><span class="n">python</span><span class="w"> </span><span class="n">PGSQL_INCLUDES</span><span class="o">=/</span><span class="n">usr</span><span class="o">/</span><span class="n">local</span><span class="o">/</span><span class="n">include</span><span class="o">/</span><span class="n">postgresql</span><span class="w"> </span>\<span class="w"></span>
<span class="w"> </span><span class="n">PGSQL_LIBS</span><span class="o">=/</span><span class="n">usr</span><span class="o">/</span><span class="n">local</span><span class="o">/</span><span class="n">lib</span><span class="o">/</span><span class="n">postgresql</span><span class="w"> </span><span class="n">BOOST_INCLUDES</span><span class="o">=/</span><span class="n">usr</span><span class="o">/</span><span class="n">include</span><span class="o">/</span><span class="n">boost</span><span class="w"> </span><span class="n">BOOST_LIBS</span><span class="o">=/</span><span class="n">usr</span><span class="o">/</span><span class="n">lib</span><span class="w"></span>
<span class="n">sudo</span><span class="w"> </span><span class="n">python</span><span class="w"> </span><span class="n">scons</span><span class="o">/</span><span class="n">scons</span><span class="o">.</span><span class="n">py</span><span class="w"> </span><span class="n">install</span><span class="w"> </span><span class="n">PYTHON</span><span class="o">=/</span><span class="n">usr</span><span class="o">/</span><span class="n">bin</span><span class="o">/</span><span class="n">python</span><span class="w"> </span><span class="n">PGSQL_INCLUDES</span><span class="o">=/</span><span class="n">usr</span><span class="o">/</span><span class="n">local</span><span class="o">/</span><span class="n">include</span><span class="o">/</span><span class="n">postgresql</span><span class="w"> </span>\<span class="w"></span>
<span class="w"> </span><span class="n">PGSQL_LIBS</span><span class="o">=/</span><span class="n">usr</span><span class="o">/</span><span class="n">local</span><span class="o">/</span><span class="n">lib</span><span class="o">/</span><span class="n">postgresql</span><span class="w"> </span><span class="n">BOOST_INCLUDES</span><span class="o">=/</span><span class="n">usr</span><span class="o">/</span><span class="n">include</span><span class="o">/</span><span class="n">boost</span><span class="w"> </span><span class="n">BOOST_LIBS</span><span class="o">=/</span><span class="n">usr</span><span class="o">/</span><span class="n">lib</span><span class="w"></span>
<span class="n">sudo</span><span class="w"> </span><span class="n">ldconfig</span><span class="w"></span>
</code></pre></div>
</blockquote>
<p>Now we have to set up some additional libs in order to run the WMS:</p>
<blockquote>
<div class="highlight"><pre><span></span><code>cd ~/src
wget http://easynews.dl.sourceforge.net/sourceforge/jonpy/jonpy-0.06.tar.gz
tar -xzvf jonpy-0.06.tar.gz
cd jonpy-0.06/
sudo python setup.py install
# copy the ogcserver stuff into its own dir
mkdir /opt/mapnik; cd /opt/mapnik
cp ~/src/mapnik/utils/ogcserver/* .
</code></pre></div>
</blockquote>
<p>Now you'll want to edit the <strong>ogcserver.conf</strong> file and change the following lines. The <em>module</em> is essentially the name of a python file (minus the .py extension) that we'll create later. The height and width settings cap the maximum image size that can be requested.</p>
<blockquote>
<div class="highlight"><pre><span></span><code> <span class="k">module</span>=<span class="n">worldMapFactory</span>
<span class="n">maxheight</span>=<span class="mi">2048</span>
<span class="n">maxwidth</span>=<span class="mi">2048</span>
</code></pre></div>
</blockquote>
<p>Create our "map factory" module defining data sources, styles, etc. (<strong>worldMapFactory.py</strong>). Most of this configuration is explained in the mapnik docs and well-commented examples. One thing to note is that the shapefile must be specified <em>without</em> the .shp extension:</p>
<blockquote>
<div class="highlight"><pre><span></span><code><span class="kn">from</span> <span class="nn">mapnik.ogcserver.WMS</span> <span class="kn">import</span> <span class="n">BaseWMSFactory</span>
<span class="kn">from</span> <span class="nn">mapnik</span> <span class="kn">import</span> <span class="o">*</span>
<span class="k">class</span> <span class="nc">WMSFactory</span><span class="p">(</span><span class="n">BaseWMSFactory</span><span class="p">):</span>
    <span class="k">def</span> <span class="fm">__init__</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span>
        <span class="n">BaseWMSFactory</span><span class="o">.</span><span class="fm">__init__</span><span class="p">(</span><span class="bp">self</span><span class="p">)</span>
        <span class="n">sty</span> <span class="o">=</span> <span class="n">Style</span><span class="p">()</span>
        <span class="n">rl</span> <span class="o">=</span> <span class="n">Rule</span><span class="p">()</span>
        <span class="n">rl</span><span class="o">.</span><span class="n">symbols</span><span class="o">.</span><span class="n">append</span><span class="p">(</span><span class="n">PolygonSymbolizer</span><span class="p">(</span><span class="n">Color</span><span class="p">(</span><span class="mi">248</span><span class="p">,</span><span class="mi">216</span><span class="p">,</span><span class="mi">136</span><span class="p">)))</span>
        <span class="n">rl</span><span class="o">.</span><span class="n">symbols</span><span class="o">.</span><span class="n">append</span><span class="p">(</span><span class="n">LineSymbolizer</span><span class="p">(</span><span class="n">Color</span><span class="p">(</span><span class="mi">0</span><span class="p">,</span><span class="mi">0</span><span class="p">,</span><span class="mi">0</span><span class="p">),</span><span class="mi">1</span><span class="p">))</span>
        <span class="n">sty</span><span class="o">.</span><span class="n">rules</span><span class="o">.</span><span class="n">append</span><span class="p">(</span> <span class="n">rl</span> <span class="p">)</span>
        <span class="bp">self</span><span class="o">.</span><span class="n">register_style</span><span class="p">(</span><span class="s1">'style1'</span><span class="p">,</span> <span class="n">sty</span><span class="p">)</span>
        <span class="n">lyr</span> <span class="o">=</span> <span class="n">Layer</span><span class="p">(</span><span class="n">name</span><span class="o">=</span><span class="s1">'world_borders'</span><span class="p">,</span> <span class="nb">type</span><span class="o">=</span><span class="s1">'shape'</span><span class="p">,</span> \
            <span class="n">file</span><span class="o">=</span><span class="s1">'/opt/data/world_borders/world_borders'</span><span class="p">)</span>
        <span class="n">lyr</span><span class="o">.</span><span class="n">styles</span><span class="o">.</span><span class="n">append</span><span class="p">(</span><span class="s1">'style1'</span><span class="p">)</span>
        <span class="bp">self</span><span class="o">.</span><span class="n">register_layer</span><span class="p">(</span><span class="n">lyr</span><span class="p">)</span>
        <span class="bp">self</span><span class="o">.</span><span class="n">finalize</span><span class="p">()</span>
</code></pre></div>
</blockquote>
<p>Now we need to set up apache2 to handle fastcgi:</p>
<blockquote>
<div class="highlight"><pre><span></span><code>sudo apt-get install libapache2-mod-fcgid
sudo a2enmod fcgid
</code></pre></div>
</blockquote>
<p>... and add some config lines to the apache config file, usually /etc/apache/httpd.conf but, in the case of this Ubuntu install, <strong>/etc/apache2/sites-enabled/default</strong>:</p>
<blockquote>
<div class="highlight"><pre><span></span><code>ScriptAlias /fcgi-bin/ /usr/lib/fcgi-bin/
&lt;Directory "/usr/lib/fcgi-bin"&gt;
    AllowOverride All
    Options +ExecCGI -MultiViews +SymLinksIfOwnerMatch
    Order allow,deny
    Allow from all
    SetHandler fastcgi-script
&lt;/Directory&gt;
</code></pre></div>
</blockquote>
<p>Create the fastcgi directory referred to by apache:</p>
<blockquote>
<div class="highlight"><pre><span></span><code>sudo mkdir /usr/lib/fcgi-bin
</code></pre></div>
</blockquote>
<p>Now create the actual server script as <strong>/usr/lib/fcgi-bin/wms</strong>:</p>
<blockquote>
<div class="highlight"><table class="highlighttable"><tr><td class="linenos"><div class="linenodiv"><pre><span class="normal"> 1</span>
<span class="normal"> 2</span>
<span class="normal"> 3</span>
<span class="normal"> 4</span>
<span class="normal"> 5</span>
<span class="normal"> 6</span>
<span class="normal"> 7</span>
<span class="normal"> 8</span>
<span class="normal"> 9</span>
<span class="normal">10</span>
<span class="normal">11</span>
<span class="normal">12</span>
<span class="normal">13</span>
<span class="normal">14</span>
<span class="normal">15</span></pre></div></td><td class="code"><div><pre><span></span><code><span class="ch">#!/usr/bin/env python</span>
<span class="c1"># Your mapnik dir containing the map factory </span>
<span class="c1"># must be in the python path!</span>
<span class="kn">import</span> <span class="nn">sys</span>
<span class="n">sys</span><span class="o">.</span><span class="n">path</span><span class="o">.</span><span class="n">append</span><span class="p">(</span><span class="s1">'/opt/mapnik'</span><span class="p">)</span>
<span class="kn">from</span> <span class="nn">mapnik.ogcserver.cgiserver</span> <span class="kn">import</span> <span class="n">Handler</span>
<span class="kn">import</span> <span class="nn">jon.fcgi</span> <span class="k">as</span> <span class="nn">fcgi</span>
<span class="k">class</span> <span class="nc">WMSHandler</span><span class="p">(</span><span class="n">Handler</span><span class="p">):</span>
    <span class="n">configpath</span> <span class="o">=</span> <span class="s1">'/opt/mapnik/ogcserver.conf'</span>
<span class="n">fcgi</span><span class="o">.</span><span class="n">Server</span><span class="p">({</span><span class="n">fcgi</span><span class="o">.</span><span class="n">FCGI_RESPONDER</span><span class="p">:</span> <span class="n">WMSHandler</span><span class="p">})</span><span class="o">.</span><span class="n">run</span><span class="p">()</span>
</code></pre></div></td></tr></table></div>
</blockquote>
<p>Finally, restart the apache server:</p>
<blockquote>
<div class="highlight"><pre><span></span><code><span class="n">sudo</span><span class="w"> </span><span class="o">/</span><span class="n">etc</span><span class="o">/</span><span class="n">init</span><span class="o">.</span><span class="n">d</span><span class="o">/</span><span class="n">apache2</span><span class="w"> </span><span class="n">force</span><span class="o">-</span><span class="n">reload</span><span class="w"></span>
</code></pre></div>
</blockquote>
<p>Now you can access it with a WMS request like so:</p>
<blockquote>
<div class="highlight"><pre><span></span><code>http://localhost/fcgi-bin/wms?VERSION=1.1.1&amp;REQUEST=GetMap&amp;LAYERS=world_borders&amp;
FORMAT=image/png&amp;SRS=EPSG:4326&amp;STYLES=&amp;BBOX=-81.54375,-58.3125,-59.04375,-47.0625&amp;
EXCEPTIONS=application/vnd.ogc.se_inimage&amp;width=600&amp;height=300
</code></pre></div>
</blockquote>
<p><img alt="" src="/assets/img/mapnik.png"></p>
<p>Compare the linework with a comparable WMS service using UMN Mapserver on the backend. I'll let the results speak for themselves...</p>
<p><img alt="" src="/assets/img/mapserv.png"></p>
<p>Even if its map rendering is smooth, Mapnik's WMS server is still a bit rough around the edges:</p>
<ul>
<li>
<p>It does not support GetFeatureInfo requests</p>
</li>
<li>
<p>The server has trouble with extra parameters. For instance, some WMS clients like mapbuilder like to tack an extra 'UNIQUEID' parameter onto the URL, and this causes an unnecessary error with mapnik's WMS server.</p>
</li>
<li>
<p>Mapnik itself does not support reprojection </p>
</li>
<li>
<p>It only supports shapefiles, geotiffs and postgis layers.</p>
</li>
</ul>
<p>The readme.txt file in the docs/ogcserver/ directory of a recent mapnik SVN checkout has a full list of known features and caveats, so refer to it for the complete story.</p>
<p>But, all in all, I am <em>very</em> impressed with the quality of the Mapnik WMS server. I figured that, since Mapnik's goal has been high-quality cartographic output, speed would be sacrificed, but I didn't notice any significant lag; on the contrary, I think it was actually about on par with Mapserver running as a CGI. If it was any slower, I didn't notice it immediately. But then again, it was only working with a relatively small shapefile and I was the only user. I'd like to do more rigorous stress tests on the Mapnik WMS to see how it compares to Mapserver and Geoserver under varying loads with greater volumes of data.</p>Educational ways to waste some time2006-05-12T00:00:00-06:002006-05-12T00:00:00-06:00Matthew T. Perrytag:www.perrygeo.com,2006-05-12:/educational-ways-to-waste-some-time.html<p>It's always great to find fun internet-based games that actually challenge you in "real world" skills. (And no, working on your wizard's Ether Flame spell in EverQuest is NOT a real world skill). After all, if you're going to waste some time, it might as well be educational, right? Can …</p><p>It's always great to find fun internet-based games that actually challenge you in "real world" skills. (And no, working on your wizard's Ether Flame spell in EverQuest is NOT a real world skill). After all, if you're going to waste some time, it might as well be educational, right? Can you tell that my mother is a school teacher? Happy Mother's Day!</p>
<p>Anyways, these might be old news to some folks but I've found two fun games that will keep your brain fresh.</p>
<p>First, there is <a href="http://geosense.net">GeoSense</a>. This is a fantastic interactive game that pits users one-on-one in a timed geography quiz. You're given a city and country and you have 10 seconds to click the map. The player with the best combination of speed and accuracy wins. Given <a href="http://news.nationalgeographic.com/news/2006/05/0502_060502_geography.html">American youth's horrible knowledge of geography</a>, this site could be really helpful. I would recommend it to children of all ages if it weren't for the chatroom being infested with pubescent teen sex fiends. Just go use myspace or something...</p>
<p>Secondly, for you Python programmers out there, there is the <a href="http://www.pythonchallenge.com/">Python Challenge</a>, a surprisingly challenging and mind-boggling course of puzzles that can be solved with Python. Actually, some people have solved them with UNIX shell commands, perl or ruby, but many of the hints are python-specific. They require a good dose of logic, persistence, knowledge of python libraries and a knack for finding patterns. Basically, your goal is, given a minimal set of hints, to find and process the data that will lead you to the next URL. I'm on level 9 right now and, well, I'm not going to admit to anyone how long it took to get there. Addictively challenging...</p>
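<p>For a taste of the flavor (this example is the well-known warm-up level, not something from this post): the first Python Challenge page hints at evaluating <code>2**38</code>, and the resulting number becomes the next page's URL.</p>

```python
# Python Challenge warm-up: the hint boils down to evaluating 2**38;
# the printed number gets pasted into the next level's URL.
print(2 ** 38)  # 274877906944
```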
<p>That's it for now. Have fun.</p>The impact of urban areas on CO2 emissions2006-05-06T00:00:00-06:002006-05-06T00:00:00-06:00Matthew T. Perrytag:www.perrygeo.com,2006-05-06:/the-impact-of-urban-areas-on-co2-emmissions.html<p>Increases in atmospheric carbon dioxide (CO2) due to vehicle emissions are considered one of the most important human-induced factors of climate change. Conventional wisdom would say that urban areas, with their huge populations, dense road networks and congested freeways, are the biggest offenders. This is true to some extent. But …</p><p>Increases in atmospheric carbon dioxide (CO2) due to vehicle emissions are considered one of the most important human-induced factors of climate change. Conventional wisdom would say that urban areas, with their huge populations, dense road networks and congested freeways, are the biggest offenders. This is true to some extent. But, viewed from a different perspective, the <em>per-capita</em> CO2 emissions for these urban areas can be considerably less than surrounding rural and suburban areas.</p>
<p>Travelmatters.org has posted <a href="http://www.travelmatters.org/maps/regional/"> a series of maps</a> comparing these two conflicting views. Here's a sample from Chicago that demonstrates the sharp dichotomy; both views are entirely accurate, but they analyze the same data in different ways:</p>
<p><img alt="" src="/assets/img/co2-map-chi-med.gif"></p>
<p>In every case, the <em>total</em> CO2 emissions are much greater in dense urban areas. But, <em>per-capita</em>, the urban areas have much lower emissions, sometimes dramatically lower. This second view indicates, as <a href="http://www.worldchanging.com/archives/004390.html"> WorldChanging </a> points out, that living in denser neighborhoods can reduce your climate impact. It makes sense that living closer to the places you need to go on a daily basis and having more access to public transportation would reduce the emissions impact. Maybe cities are "greener" than most of us perceive them to be? </p>USGS Seamless is back2006-05-05T00:00:00-06:002006-05-05T00:00:00-06:00Matthew T. Perrytag:www.perrygeo.com,2006-05-05:/usgs-seamless-is-back.html<p>Two weeks after I first noticed something had gone awry with the USGS Seamless site, they appear to have fixed their server issues. As of this morning, the interactive <a href="http://seamless.usgs.gov/website/seamless/viewer.php">data viewer and download interface</a> is fully functional as far as I can tell. </p>
<p>Now be gentle on their server. Rumour …</p><p>Two weeks after I first noticed something had gone awry with the USGS Seamless site, they appear to have fixed their server issues. As of this morning, the interactive <a href="http://seamless.usgs.gov/website/seamless/viewer.php">data viewer and download interface</a> is fully functional as far as I can tell. </p>
<p>Now be gentle on their server. Rumour has it, if you download more than 3 DEMs at a time, the server might go down for another 2 weeks! Just kidding... everything seems to be working fine. Download away....</p>What’s going on with seamless.usgs.gov?2006-04-25T00:00:00-06:002006-04-25T00:00:00-06:00Matthew T. Perrytag:www.perrygeo.com,2006-04-25:/whats-going-on-with-seamlessusgsgov.html<p>Since April 21, I have not been able to view or extract any data from the USGS Seamless site, ostensibly the central distribution center for the US National Map. The site has been changing rapidly from day to day ever since and it seems that changes are underway, so at …</p>
<p>When I develop an internet application, even if it's only used by a few people, I usually seperate the development version from the stable, live version to minimize any downtime. And if you absolutely can't keep the app running, at least put a big banner on the page indicating that the system is down so people (like me) don't waste half an hour trying to figure out what they're doing wrong. Is this too much to ask of the USGS? They are supposed to be the official portal for accessing our nation's spatial data, right? And we're not talking about a small server hiccup here, it has been down since at least April 21st with no public indication that problems are occuring on the site. </p>
<p>I just received an email this morning from the USGS web mapping admin. The emphasis is mine:</p>
<blockquote>
<p>We apologize for any issues you may have experienced lately. The Seamless server, and all related map services will be unavailable for at least the next few days. During this time, <strong>the sites may still appear to be functioning.</strong> Some may ask for a password, and others may not show up at all. Normally our status messages are posted at http://seamless.usgs.gov. However, since this server has been affected by this outage, users are being re-directed to http://gisdata.usgs.net. We are in the process of posting a message here as well, which you will be able to monitor for any updates. We are estimating that the site will be available again by <strong>Monday May 1st 2006.</strong> Our team is working diligently to have this service available as soon as possible. We appreciate you patience during this time. </p>
</blockquote>
<p>I really shouldn't be surprised that a government agency botched it so badly; that seems to be the norm here in the US. But I've really come to rely on the seamless site for a lot of data and it seems that 10 days of downtime for the <em>sole</em> distributor of our seamless national spatial data archive is a bit... amateur. </p>The distinction between open source and open standards2006-04-23T00:00:00-06:002006-04-23T00:00:00-06:00Matthew T. Perrytag:www.perrygeo.com,2006-04-23:/the-distinction-between-open-source-and-open-standards.html<p>Time and time again I see <em>open source</em> and <em>open standards</em> mentioned in the <a href="http://veryspatial.com/?p=802">same</a> <a href="http://www.ced.org/projects/ecom.shtml#open">sentence</a>. While I'm a strong proponent of both, it is a bit disheartening to see how closely intertwined the two concepts are in the eyes of many GIS folks. </p>
<p>Open source refers to <em>software</em> distributed …</p><p>Time and time again I see <em>open source</em> and <em>open standards</em> mentioned in the <a href="http://veryspatial.com/?p=802">same</a> <a href="http://www.ced.org/projects/ecom.shtml#open">sentence</a>. While I'm a strong proponent of both, it is a bit disheartening to see how closely intertwined the two concepts are in the eyes of many GIS folks. </p>
<p>Open source refers to <em>software</em> distributed with a license that allows access to view and modify the source code. There are also some <a href="http://www.opensource.org/index.php">other criteria</a> but unrestricted access to the source code is the key component. </p>
<p>Open standards refers to <em>software-neutral</em> specifications, usually developed collaboratively, to accomplish a technical goal. In the GIS world, this typically means <a href="http://www.opengeospatial.org/specs/?page=specs">OpenGIS specifications</a> for sharing data across a network (WMS/ WFS/ WCS), data formats (GML), or for working with spatial data in a relational database (Simple Features Spec for SQL). We could arguably include pseudo-open specifications for data such as <a href="http://www.esri.com/library/whitepapers/pdfs/shapefile.pdf">shapefiles</a> and <a href="http://earth.google.com/kml/kml_intro.html">KML</a>.</p>
<p>Open source applications do not always conform to open standards. Standards-compliant software does not necessarily have to be open source. So why are the two often mentioned in the same breath as though they were synonymous? Perhaps open source software is perceived as being "ahead" of other types of software in terms of adoption of standards; and maybe that's true. But there are many proprietary software companies that have devoted a lot of effort towards making their software communicate via open standards and their efforts should not go unnoticed (<a href="http://www.esri.com/software/standards/ogc-support.html">ESRI</a> and <a href="http://www.cadcorp.com/">Cadcorp</a> just to name the two I'm familiar with). </p>
<p>The promise of open standards is that anyone can develop and use compliant applications that can easily interoperate regardless of the chosen software package. While that promise is far from being fully realized, associating open standards with a particular type of software will not get us any closer. </p>
<p><strong>Update</strong>: Or maybe we <em>are</em> getting close... check out <a href="http://geospatial.blogs.com/geospatial/2006/04/interoperabilit.html">Geoff Ziess' post</a> on the OGC interoperability demonstration in Tampa. Ten vendors interoperating and sharing data in real time... this is what it's all about.</p>Animating Static Maps - The Geologic Evolution of North America2006-04-11T00:00:00-06:002006-04-11T00:00:00-06:00Matthew T. Perrytag:www.perrygeo.com,2006-04-11:/animating-static-maps-the-geologic-evolution-of-north-america.html<p>The Cartography blog <a href="http://ccablog.blogspot.com/2006/04/paleogeographic-maps.html"> recently talked about </a> a series of <a href="http://jan.ucc.nau.edu/%7Ercb7/nam.html">excellent Paleogeographic maps</a> developed by Dr. Ron Blakey at Northern Arizona University. Ever since I first studied geology, I had dreamed of an atlas that would clearly and visually demonstrate how our current land masses came to be. This time series …</p><p>The Cartography blog <a href="http://ccablog.blogspot.com/2006/04/paleogeographic-maps.html"> recently talked about </a> a series of <a href="http://jan.ucc.nau.edu/%7Ercb7/nam.html">excellent Paleogeographic maps</a> developed by Dr. Ron Blakey at Northern Arizona University. Ever since I first studied geology, I had dreamed of an atlas that would clearly and visually demonstrate how our current land masses came to be. This time series of maps focuses on North America and the geologic events that have shaped it for the last 500 million years. Truly fascinating and excellent work. I encourage everyone to check out the site and read a little about it as well as <a href="http://bldgblog.blogspot.com/2006/04/assembling-north-america_11.html">the narrative by Geoff Manaugh</a>. </p>
<p><img alt="" src="/assets/img/29.gif"></p>
<p>Now it occurred to me that a time series of maps lends itself very well to an animated sequence. While I am no graphic artist, I have done a few projects in the past that required stitching together a time-series of maps into an animated gif. The process is fairly simple:</p>
<ol>
<li>
<p>Download or create each map you want to include in the series. For best results, all maps should have the same size and extents.</p>
</li>
<li>
<p>Rename the images in alpha-numeric order (001.jpg, 002.jpg.... 045.jpg) </p>
</li>
<li>
<p>Install <a href="http://www.imagemagick.org/script/index.php">ImageMagick</a> - a collection of efficient command line tools for image processing. It supports almost every common image format available these days.</p>
</li>
<li>
<p>Run the <em>convert</em> command to create the animated gif:</p>
</li>
</ol>
<div class="highlight"><pre><span></span><code><span class="nv">convert</span><span class="w"> </span><span class="o">-</span><span class="nv">geometry</span><span class="w"> </span><span class="mi">500</span><span class="nv">x483</span><span class="w"> </span><span class="o">-</span><span class="nv">delay</span><span class="w"> </span><span class="mi">200</span><span class="w"> </span><span class="o">-</span><span class="k">loop</span><span class="w"> </span><span class="mi">0</span><span class="w"> </span><span class="o">*</span>.<span class="nv">jpg</span><span class="w"> </span><span class="nv">mymovie</span>.<span class="nv">gif</span><span class="w"></span>
</code></pre></div>
<p>The geometry is simply the WIDTHxHEIGHT dimensions of the output image (it helps if this is proportional to the original image dimensions). </p>
<p>The delay parameter specifies how many hundredths of a second delay occurs between each frame. </p>
<p>The loop parameter, when set to zero, indicates the gif will loop infinitely.</p>
<p>The *.jpg, if your operating environment supports wildcards, will take each of the jpg images in the current directory and stitch them into an animated gif named mymovie.gif.</p>
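<p>The renaming step above can also be scripted rather than done by hand. Here's a minimal Python sketch (the directory name and extension in the comment are assumptions for illustration, not part of my original workflow):</p>

```python
import pathlib

def sequence_names(directory, ext=".jpg"):
    """Pair each image with a zero-padded name (001.jpg, 002.jpg, ...)
    so that wildcard expansion yields the frames in order."""
    files = sorted(pathlib.Path(directory).glob("*" + ext))
    return [(f, f.with_name("%03d%s" % (i, ext)))
            for i, f in enumerate(files, start=1)]

# To apply the renames (assuming a "maps" directory of frames):
# for old, new in sequence_names("maps"):
#     old.rename(new)
```

<p>Sorting first, then numbering, guarantees the frames land in the same alpha-numeric order that the wildcard will expand them in.</p>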
<p>Voilà! An animated movie from a series of static maps. In the case of the Paleogeographic maps, there were 41 maps which produced a sizable animated gif (about 7.5 MB). You can <a href="/assets/img/geo_evolution.gif">check out the results here</a>. I could watch this play for hours!! Really fascinating stuff... many thanks to Dr. Ron Blakey for putting this project together.</p>LIDAR data processing with open source tools2006-04-01T00:00:00-07:002006-04-01T00:00:00-07:00Matthew T. Perrytag:www.perrygeo.com,2006-04-01:/lidar-data-processing-with-open-source-tools.html<p>LIDAR data is certainly a hot technology these days. LIght Detection And Ranging data can be used to create extremely detailed terrain models but there are lots of barriers to using LIDAR data effectively. <a href="http://lidar.cr.usgs.gov/"> USGS Center for LIDAR Information Coordination and Knowledge </a> was put in place to "<em>facilitate data access …</em></p><p>LIDAR data is certainly a hot technology these days. LIght Detection And Ranging data can be used to create extremely detailed terrain models but there are lots of barriers to using LIDAR data effectively. <a href="http://lidar.cr.usgs.gov/"> USGS Center for LIDAR Information Coordination and Knowledge </a> was put in place to "<em>facilitate data access, user coordination and education of lidar remote sensing for scientific needs</em>". </p>
<p>Beyond the sheer size of the datasets and the knowledge and hardware required to process them, software is a big issue. In the realm of open-source GIS tools, there are many applications (GRASS being the most prominent) for dealing with elevation point data and processing it into more meaningful products such as elevation DEMs and contours. </p>
<p>Usually the data comes as simple ASCII text files and the x,y and z values are easily extracted from such a file. But take a look at the USGS data distribution site and you'll notice some of the datasets are distributed as <a href="http://www.lasformat.org/">LAS binary files</a>. It makes sense to store such massive datasets in binary, so I started looking for some LAS conversion tools. After some searching, I found a bunch of proprietary products for working with LAS but no open source tools. Luckily, the format is <a href="http://www.lasformat.org/documents/ASPRS%20LAS%20Format%20Documentation%20-%20V1.1%20-%2003.07.05.pdf">well documented</a> thanks to the efforts by the ASPRS to make it an open specification.</p>
<p>So dusting off my notes about parsing binary files in python, I set out to create a python module for extracting LIDAR data from LAS files. The LAS format contains a header which needs to be parsed first in order to read the point cloud. Once you have the header info, you can scan your way through the dataset to pick out the x,y,z values. </p>
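<p>For a flavor of what that scanning looks like: each point record in the basic LAS point format begins with X, Y, Z stored as little-endian signed 32-bit integers, which are turned into real coordinates using scale factors and offsets read from the header. A simplified sketch using Python's struct module — the scale and offset values below are invented for the example, not read from a real header:</p>

```python
import struct

def unpack_xyz(record, scale, offset):
    """Decode the leading X,Y,Z int32 triple of one LAS point record
    and rescale it into real-world coordinates."""
    ints = struct.unpack_from("<3i", record, 0)  # 3 little-endian int32s
    return tuple(i * s + o for i, s, o in zip(ints, scale, offset))

# A synthetic 12-byte record, as if read straight from the file:
raw = struct.pack("<3i", 123456, 654321, 15000)
x, y, z = unpack_xyz(raw, scale=(0.01, 0.01, 0.01), offset=(0.0, 0.0, 0.0))
```

<p>The integer storage plus per-file scale/offset is what keeps these massive files compact while preserving centimeter-level precision.</p>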
<p>Here's an example of the python interface that will read the first 10,000 points into a 2D shapefile with the elevation as an attribute in the dbf:</p>
<blockquote>
<div class="highlight"><pre><span></span><code><span class="kn">import</span> <span class="nn">pylas</span>
<span class="n">infile</span> <span class="o">=</span> <span class="s1">'sanand000001.las'</span>
<span class="n">outfile</span> <span class="o">=</span> <span class="s1">'lidar.shp'</span>
<span class="n">header</span> <span class="o">=</span> <span class="n">pylas</span><span class="o">.</span><span class="n">parseHeader</span><span class="p">(</span><span class="n">infile</span><span class="p">)</span>
<span class="n">pylas</span><span class="o">.</span><span class="n">createShp</span><span class="p">(</span><span class="n">outfile</span><span class="p">,</span> <span class="n">header</span><span class="p">,</span> <span class="n">numpts</span><span class="o">=</span><span class="mi">10000</span><span class="p">,</span> <span class="n">rand</span><span class="o">=</span><span class="kc">False</span><span class="p">)</span>
</code></pre></div>
</blockquote>
<p>The issue I struggled with is the sheer size of these datasets. A USGS quarter quad can contain 10 million points, which is an excessive number of points to create, say, a 10 meter DEM over such a small area. Clearly there was a need to extract a subset of this dataset, but taking the points sequentially gives you points from only one portion of the total area. So, by default, pylas randomly scans the data to pull the specified number of points so that the point cloud covers the entire area (at a much lower point density). Without numpts specified, it will randomly select 1/2000th of the total number.</p>
<p>So the simplified interface to make a more manageable lidar shapefile would be:</p>
<blockquote>
<div class="highlight"><pre><span></span><code>header = pylas.parseHeader(infile)
pylas.createShp(outfile, header)
</code></pre></div>
</blockquote>
<p>Once the shapefile is created, you can bring it into GRASS to do the processing to generate DEMs, contours and other derived elevation products:</p>
<blockquote>
<div class="highlight"><pre><span></span><code><span class="n">v</span><span class="p">.</span><span class="n">in</span><span class="p">.</span><span class="n">ogr</span><span class="w"> </span><span class="n">dsn</span><span class="o">=</span><span class="n">lidar</span><span class="p">.</span><span class="n">shp</span><span class="w"> </span><span class="n">layer</span><span class="o">=</span><span class="n">lidar</span><span class="w"> </span><span class="k">output</span><span class="o">=</span><span class="n">lidar</span><span class="w"></span>
<span class="n">g</span><span class="p">.</span><span class="n">region</span><span class="w"> </span><span class="n">vect</span><span class="o">=</span><span class="n">lidar</span><span class="w"></span>
<span class="n">g</span><span class="p">.</span><span class="n">region</span><span class="w"> </span><span class="n">res</span><span class="o">=</span><span class="mh">10</span><span class="w"></span>
<span class="n">v</span><span class="p">.</span><span class="n">surf</span><span class="p">.</span><span class="n">rst</span><span class="w"> </span><span class="k">input</span><span class="o">=</span><span class="n">lidar</span><span class="w"> </span><span class="n">elev</span><span class="o">=</span><span class="n">lidar_dem</span><span class="w"> </span><span class="n">zcolumn</span><span class="o">=</span><span class="n">elev</span><span class="w"></span>
<span class="p">#</span><span class="w"> </span><span class="n">Launch</span><span class="w"> </span><span class="n">the</span><span class="w"> </span><span class="n">interactive</span><span class="w"> </span><span class="mh">3</span><span class="n">D</span><span class="w"> </span><span class="n">viewer</span><span class="w"></span>
<span class="n">nviz</span><span class="w"> </span><span class="n">lidar_dem</span><span class="w"></span>
</code></pre></div>
</blockquote>
<p><img alt="" src="/assets/img/nviz_lidar.png"></p>
<p>Of course the method I just described is very simplistic and does not even come close to utilizing the full potential of the LIDAR point cloud, but it's a start.</p>
<p>The pylas.py module can be <a href="http://pylas.googlecode.com/svn/trunk/pylas.py">downloaded here</a>. The code has worked for me on the few datasets I've tested it with but it should certainly be considered a rough-cut, alpha product. There is much room for improvement and, of course, if you have any suggestions or contributions, please get in touch.</p>My Top Ten'2006-03-26T00:00:00-07:002006-03-26T00:00:00-07:00Matthew T. Perrytag:www.perrygeo.com,2006-03-26:/my-top-ten.html<p>Web Mapping Services (WMS) are not always my preferred option for accessing data; relying on a remote server to generate a pretty picture of the data is hardly a substitute for having the raw data in hand. But for many cases, I just need a decent looking basemap image and …</p><p>Web Mapping Services (WMS) are not always my preferred option for accessing data; relying on a remote server to generate a pretty picture of the data is hardly a substitute for having the raw data in hand. But for many cases, I just need a decent looking basemap image and don't want to download gigabytes of data, especially if that data is updated frequently. </p>
<p>Software like GeoServer and Mapserver are making it easier to publish data via WMS and the number of WMS servers is surely growing... but how do you find them? There is no central registry for WMS servers but efforts like the <a href="http://www.refractions.net/white_papers/ogcsurvey/">refractions research ogc survey</a>, <a href="http://www.mapdex.org/wms_list.cfm">mapdex</a> and a few <a href="http://chris.narx.net/2006/01/19/wms-service-mining/">google tricks</a> are making it easier to find data distributed via WMS. After many hours digging through WMS services to find the ones that suit my mapping needs, I've come across a number of gems that I use time and time again. Hopefully this will inspire some others to share their secret stash of WMS servers! </p>
<p>(<strong>Update:</strong> <a href="http://my.opera.com/gisuser/blog/show.dml/199960">Anything Geospatial</a> has a great link to a well-organized <a href="http://www.skylab-mobilesystems.com/en/wms_serverlist.html"> WMS server list</a> for public use. Nice. )</p>
<p>You should be able to provide the online resource URL to your favorite WMS client software (my personal choice is <a href="http://openjump.org/wiki/show/HomePage">openjump</a>) and the client should display the list of layers available from that service. </p>
<p>If you're constructing WMS URLs "by hand" or in a browser, you can do a capabilities request (the online resource URL with <em>service=WMS&request=GetCapabilities</em> appended to it) which will return an XML document describing the available layers, image formats, projections, etc. Take a look at the image src for any of the thumbnails below to see how the map request is constructed.</p>
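<p>Assembling a GetMap request by hand is mostly string plumbing, so it's easy to script. A small Python sketch using only the standard library — the endpoint, layer, and bounding box mirror the TerraServer example below, but any WMS 1.1.1 server should accept the same parameter set:</p>

```python
from urllib.parse import urlencode

def getmap_url(base, layers, bbox, width=150, height=150,
               srs="EPSG:4326", fmt="jpeg"):
    """Build a WMS 1.1.1 GetMap URL from its required parameters."""
    params = {
        "VERSION": "1.1.1", "SERVICE": "WMS", "REQUEST": "GetMap",
        "LAYERS": layers, "STYLES": "", "SRS": srs,
        "BBOX": ",".join(str(c) for c in bbox),  # minx,miny,maxx,maxy
        "WIDTH": width, "HEIGHT": height, "FORMAT": fmt,
    }
    return base + urlencode(params)

url = getmap_url("http://terraservice.net/ogcmap.ashx?", "DRG",
                 (-124.1, 41.2, -123.9, 41.4))
```

<p>Note that urlencode percent-escapes the commas and colons, which servers are required to decode; the hand-written URLs in the thumbnails below just leave them raw.</p>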
<ol>
<li>TerraServer Digital Raster Graphic (DRG): USGS Topo Quads
<strong> Online Resource URL </strong> : <em>http://terraservice.net/ogcmap.ashx?</em> </li>
</ol>
<p><strong> Layer Name </strong> : <em>DRG</em>
<img alt="" src="http://terraservice.net/ogcmap.ashx?VERSION=1.1.1&SERVICE=wms&request=GetMap&LAYERS=DRG&FORMAT=jpeg&styles=&SRS=EPSG:4326&BBOX=-124.1,41.2,-123.9,41.4&WIDTH=150&HEIGHT=150"></p>
<ol start="2">
<li>TerraServer Digital Ortho Photo Quads (DOQ): Black and white aerial photos for the US
<strong> Online Resource URL </strong> : <em>http://terraservice.net/ogcmap.ashx?</em> </li>
</ol>
<p><strong> Layer Name </strong> : <em>DOQ</em>
<img alt="" src="http://terraservice.net/ogcmap.ashx?VERSION=1.1.1&SERVICE=wms&request=GetMap&LAYERS=DOQ&FORMAT=jpeg&styles=&SRS=EPSG:4326&BBOX=-124.1,41.2,-123.9,41.4&WIDTH=150&HEIGHT=150"></p>
<ol start="3">
<li>NASA Landsat Imagery
The Landsat mosaic is available in false color (default) or in natural color (style=visual) as shown below. </li>
</ol>
<p><strong> Online Resource URL </strong> : <em>http://onearth.jpl.nasa.gov/wms.cgi?</em> </p>
<p><strong> Layer Name </strong> : <em>global_mosaic</em>
<img alt="" src="http://onearth.jpl.nasa.gov/wms.cgi?VERSION=1.1.1&SERVICE=wms&request=GetMap&LAYERS=global_mosaic&FORMAT=image/png&styles=visual&SRS=EPSG:4326&BBOX=-124.1,41.2,-123.9,41.4&WIDTH=150&HEIGHT=150"></p>
<ol start="4">
<li>45-minute Weather Radar Images (NEXRAD Base Reflectivity).
Since this is a dynamic data source, the image below may look really boring (i.e. blank) if there are no storms over the Continental US. </li>
</ol>
<p><strong> Online Resource URL </strong> : <em>http://mesonet.agron.iastate.edu/cgi-bin/wms/nexrad/n0r.cgi?</em> </p>
<p><strong> Layer Name </strong> : <em>nexrad-n0r-m45m</em> </p>
<p><img alt="" src="http://mesonet.agron.iastate.edu/cgi-bin/wms/nexrad/n0r.cgi?VERSION=1.1.1&SERVICE=wms&request=GetMap&LAYERS=nexrad-n0r-m45m&FORMAT=jpeg&styles=&SRS=EPSG:4326&BBOX=-125,25,-65,55&WIDTH=300&HEIGHT=150"></p>
<ol start="5">
<li>USGS National Landcover
The 30-meter national landcover dataset. USGS is nice enough to provide a legend, of course. </li>
</ol>
<p><strong> Online Resource URL </strong> : <em>http://gisdata.usgs.net/servlet/com.esri.wms.Esrimap?ServiceName=USGS_WMS_NLCD&</em> </p>
<p><strong> Layer Name </strong> : <em>US_NLCD</em> </p>
<p><img alt="" src="http://gisdata.usgs.net/servlet/com.esri.wms.Esrimap?ServiceName=USGS_WMS_NLCD&request=GetMap&LAYERS=US_NLCD&FORMAT=image/png&SRS=EPSG:4326&BBOX=-124.1,41.2,-123.9,41.4&WIDTH=150&HEIGHT=150"></p>
<p><img alt="" src="http://gisdata.usgs.net/Image_Library/legends/Legend_NLCD5.png"></p>
<ol start="6">
<li>USGS National Elevation - Shaded Relief
<strong> Online Resource URL </strong> : <em>http://gisdata.usgs.net:80/servlet/com.esri.wms.Esrimap?servicename=USGS_WMS_NED&</em> </li>
</ol>
<p><strong> Layer Name </strong> : <em>US_NED_Shaded_Relief</em>
<img alt="" src="http://gisdata.usgs.net:80/servlet/com.esri.wms.Esrimap?servicename=USGS_WMS_NED&request=GetMap&LAYERS=US_NED_Shaded_Relief&FORMAT=image/jpeg&SRS=EPSG:4326&BBOX=-124.1,41.2,-123.9,41.4&WIDTH=150&HEIGHT=150"></p>
<ol start="7">
<li>USGS Reference Maps
<strong> Online Resource URL </strong> : <em>http://gisdata.usgs.net:80/servlet/com.esri.wms.Esrimap?servicename=USGS_WMS_REF&</em> </li>
</ol>
<p><strong> Layer Names </strong> : <em>States,County,Roads,Route_Numbers,Streams,Federal_Lands</em>
<img alt="" src="http://gisdata.usgs.net:80/servlet/com.esri.wms.Esrimap?servicename=USGS_WMS_REF&request=GetMap&LAYERS=States,County,Roads,Route_Numbers,Streams,Federal_Lands&FORMAT=image/png&SRS=EPSG:4326&BBOX=-124.1,41.2,-123.9,41.4&WIDTH=150&HEIGHT=150"></p>
<ol start="8">
<li>Life Mapper
Besides the standard WMS parameters, some services can take extra parameters in order to render a map. In this excellent service, LifeMapper requires that you provide the species name and it will render maps of known species locations and modelled distributions. Here's an example of the distribution of Black Bear (i.e. <em>Ursus americanus</em>) over central California </li>
</ol>
<p><strong> Online Resource URL </strong> : <em>http://www.lifemapper.org/Services/WMS/?ScientificName=Ursus%20americanus&</em> </p>
<p><strong> Layer Names </strong> : <em>Species Distribution Models,Political Boundaries,Species Data Points</em> </p>
<p><img alt="" src="http://www.lifemapper.org/Services/WMS/?Version=1.1.0&Request=GetMap&width=150&height=150&Bbox=-124.1,35.4,-118.1,41.4&Layers=Species%20Distribution%20Models,Political%20Boundaries,Species%20Data%20Points&Styles=&ScientificName=Ursus%20americanus&SRS=EPSG:4326&format=image/gif"></p>
<ol start="9">
<li>MODIS Daily Satellite Imagery
<strong> Online Resource URL </strong> : <em>http://wms.jpl.nasa.gov/wms.cgi?</em> </li>
</ol>
<p><strong> Layer Names </strong> : <em>daily_terra, daily_aqua</em> </p>
<p>Terra
Aqua </p>
<p><img alt="" src="http://wms.jpl.nasa.gov/wms.cgi?request=GetMap&LAYERS=daily_terra&FORMAT=image/png&SRS=EPSG:4326&BBOX=-124.1,35.4,-118.1,41.4&WIDTH=150&HEIGHT=150&styles="></p>
<p><img alt="" src="http://wms.jpl.nasa.gov/wms.cgi?request=GetMap&LAYERS=daily_aqua&FORMAT=image/png&SRS=EPSG:4326&BBOX=-124.1,35.4,-118.1,41.4&WIDTH=150&HEIGHT=150&styles="></p>
<ol start="10">
<li>SRTMPlus 90 Meter DEM
The image below doesn't make for a very good basemap OR a very good DEM for analytical purposes since all the values are scaled to an 8-bit color depth. However, JPL also offers this layer as an integer (16bit) GeoTIFF (Use <em>format=image/geotiff</em> and <em>styles=short_int</em>), so this can be a valuable way to quickly grab a DEM for a given region.
<strong> Online Resource URL </strong> : <em>http://wms.jpl.nasa.gov/wms.cgi?</em> </li>
</ol>
<p><strong> Layer Names </strong> : <em>srtmplus</em>
<img alt="" src="http://wms.jpl.nasa.gov/wms.cgi?request=GetMap&LAYERS=srtmplus&FORMAT=image/png&SRS=EPSG:4326&BBOX=-124.1,35.4,-118.1,41.4&WIDTH=150&HEIGHT=150&styles="></p>
<hr>
<p>If you'd like to view these layers interactively, here's a mapserver application which "cascades" the above WMS layers through a single interface. If you're interested in setting up these layers in a mapserver application, check out the <a href="http://perrygeo.net/download/fav_wms.txt">WMS Mapfile </a> for some examples.</p>StarSpan for vector-on-raster analysis2006-02-17T00:00:00-07:002006-02-17T00:00:00-07:00Matthew T. Perrytag:www.perrygeo.com,2006-02-17:/starspan-for-vector-on-raster-analysis.html<p>It's amazing how many excellent open source GIS applications are out there just waiting to be discovered. I've been working with open source GIS for over 3 years now and I still find new and interesting software on a regular basis. The latest "Why haven't I heard of this before …</p><p>It's amazing how many excellent open source GIS applications are out there just waiting to be discovered. I've been working with open source GIS for over 3 years now and I still find new and interesting software on a regular basis. The latest "Why haven't I heard of this before?" discovery came from the GRASS mailing list discussion on <a href="http://starspan.casil.ucdavis.edu/">StarSpan</a>, a tool developed at University of California at Davis "<em>designed to bridge the raster and vector worlds of spatial analysis using fast algorithms for pixel level extraction from geometry features</em>". </p>
<p>Our research project for the <a href="http://ebm.nceas.ucsb.edu">Ecosystem Based Management group at UCSB</a> is in need of this exact tool in order to extract raster statistics based on a vector watersheds layer. ArcGIS and GRASS both have <em>some</em> of the capabilities we need through the Zonal_Statistics and v.rast.stats functions respectively. However they have their limitations and neither really handles categorical raster summaries by polygon. StarSpan looks like a more efficient option in terms of speed, scriptability and capabilities.</p>
<p>Installation is very smooth. It requires a recent version of GDAL (>= 1.2.6) and GEOS (>= 2.1.2). Once the dependencies are met, compilation on a unix system is as easy as configure, make, make install (There are also Windows binaries available). There is a single command line interface for all the functionality and StarSpan is able to handle all <a href="http://www.gdal.org/formats_list.html">GDAL rasters</a> and <a href="http://www.gdal.org/ogr/ogr_formats.html">OGR vectors</a>.</p>
<p>For classified rasters such as a land cover raster, we'd like to get the number of pixels for each landcover class by watershed. StarSpan creates a nice, normalized csv with three columns; The vector feature id, the raster value, and the number of pixels. There will be up to (number of features X number of classes) rows.</p>
<blockquote>
<p>starspan --vector watershed.shp --raster landcover.tif --count-by-class landcover_by_watershed.csv</p>
</blockquote>
<p>In order to find the percentage of a given raster class for each watershed, you can bring the csv into a relational database and do a quick SQL query. Here's an example of finding the percentage of cropland (class value is 12) for each watershed:</p>
<blockquote>
<div class="highlight"><pre><span></span><code><span class="nt">SELECT</span><span class="w"> </span><span class="nt">t</span><span class="p">.</span><span class="nc">fid</span><span class="w"> </span><span class="nt">AS</span><span class="w"> </span><span class="nt">fid</span><span class="o">,</span><span class="w"> </span><span class="o">(</span><span class="nt">t</span><span class="p">.</span><span class="nc">count</span><span class="p">::</span><span class="nd">numeric</span><span class="w"> </span><span class="o">/</span><span class="w"> </span><span class="nt">s</span><span class="p">.</span><span class="nc">total</span><span class="p">::</span><span class="nd">numeric</span><span class="o">)</span><span class="w"> </span><span class="o">*</span><span class="w"> </span><span class="nt">100</span><span class="w"> </span><span class="nt">AS</span><span class="w"> </span><span class="nt">percentage_cropland</span><span class="w"></span>
<span class="nt">FROM</span><span class="w"> </span><span class="nt">landcover_by_watershed</span><span class="w"> </span><span class="nt">t</span><span class="o">,</span><span class="w"></span>
<span class="w"> </span><span class="o">(</span><span class="nt">SELECT</span><span class="w"> </span><span class="nt">fid</span><span class="o">,</span><span class="w"> </span><span class="nt">sum</span><span class="o">(</span><span class="nt">count</span><span class="o">)</span><span class="w"> </span><span class="nt">AS</span><span class="w"> </span><span class="nt">total</span><span class="w"> </span>
<span class="w"> </span><span class="nt">FROM</span><span class="w"> </span><span class="nt">lancover_by_watershed</span><span class="w"> </span>
<span class="w"> </span><span class="nt">GROUP</span><span class="w"> </span><span class="nt">BY</span><span class="w"> </span><span class="nt">fid</span><span class="o">)</span><span class="w"> </span><span class="nt">as</span><span class="w"> </span><span class="nt">s</span><span class="w"> </span>
<span class="nt">WHERE</span><span class="w"> </span><span class="nt">t</span><span class="p">.</span><span class="nc">fid</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nt">s</span><span class="p">.</span><span class="nc">fid</span><span class="w"></span>
<span class="nt">AND</span><span class="w"> </span><span class="nt">t</span><span class="p">.</span><span class="nc">class</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nt">12</span><span class="o">;</span><span class="w"></span>
</code></pre></div>
</blockquote>
<p>Which gives us...</p>
<blockquote>
<div class="highlight"><pre><span></span><code> fid | percentage_cropland
-----+------------------------------------------------
1 | 28.571428571428571429
2 | 71.428571428571428571
3 | 36.363636363636363636
4 | 63.636363636363636364
</code></pre></div>
</blockquote>
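<p>If you'd rather skip the database round-trip, the same percentage can be computed straight from StarSpan's csv. A Python sketch — note the column names (fid, class, count) are my assumption about the csv header; adjust them to match what StarSpan actually writes:</p>

```python
import csv
from collections import defaultdict

def pct_of_class(csv_path, target_class):
    """Percent of pixels in target_class per feature id, from a
    StarSpan --count-by-class csv with columns fid,class,count."""
    hits, totals = defaultdict(int), defaultdict(int)
    with open(csv_path, newline="") as f:
        for row in csv.DictReader(f):
            n = int(row["count"])
            totals[row["fid"]] += n          # pixels in this watershed
            if int(row["class"]) == target_class:
                hits[row["fid"]] += n        # pixels of the target class
    return {fid: 100.0 * hits[fid] / total for fid, total in totals.items()}
```

<p>This is exactly the join-and-divide the SQL performs, done in one pass over the normalized csv.</p>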
<p>For continuous surfaces such as elevations and slopes, we'll need to get quantitative statistics of those rasters by watershed. StarSpan can easily generate averages, mode, standard deviation, min and max:</p>
<blockquote>
<p>starspan --vector watershed.shp --raster slope.tif --stats slope_stats.csv avg mode stdev min max</p>
</blockquote>
<p>Which outputs a csv with one row per feature identified by feature id and each stat as a column:</p>
<blockquote>
<div class="highlight"><pre><span></span><code>FID,numPixels,avg_Band1,mode_Band1,stdev_Band1,min_Band1,max_Band1
1,25921,34.694822,38.917000,14.491952,0.347465,66.241035
2,21755,7.965552,0.000000,5.484245,0.000000,42.017155
...
</code></pre></div>
</blockquote>
<p>While I can confirm that these small test cases work very quickly and give us pretty much the exact outputs we need, it will be interesting to see how well it stacks up to ArcGIS and GRASS when it comes to cranking out the big datasets. We'll likely try all three methods and I'll make sure to post the results. </p>
<p>Oh and the comparison between StarSpan and GRASS may become a moot point in the future since there is talk about integrating it with the GRASS project. While a GRASS module would be nice, not everyone has GRASS installed so I would hope the stand-alone version is still maintained since it can deal with pretty much any vector or raster data source.</p>Forest Service plans largest land sale in decades2006-02-13T00:00:00-07:002006-02-13T00:00:00-07:00Matthew T. Perrytag:www.perrygeo.com,2006-02-13:/forest-service-plans-largest-land-sale-in-decades.html<p>The <a href="http://archives.seattletimes.nwsource.com/cgi-bin/texis.cgi/web/vortex/display?slug=landsales11m&date=20060211">Seattle Times is reporting</a> some details on President Bush's "Secure Rural Schools Initiative" which involves the largest US Forest Service land sales in decades in order to pay for rural schools and roads. Some 309,421 acres will be up for sale which amounts to only 0.16 % of …</p><p>The <a href="http://archives.seattletimes.nwsource.com/cgi-bin/texis.cgi/web/vortex/display?slug=landsales11m&date=20060211">Seattle Times is reporting</a> some details on President Bush's "Secure Rural Schools Initiative" which involves the largest US Forest Service land sales in decades in order to pay for rural schools and roads. Some 309,421 acres will be up for sale, which amounts to only 0.16 % of the 190 million acres managed by the Forest Service. Most of the parcels are isolated areas bordering private land. </p>
<p>Details and some limited maps of the initiative can be found <a href="http://www.fs.fed.us/land/staff/rural_schools.shtml">here</a> as well as <a href="http://www.fs.fed.us/land/staff/spd.html"> a listing of forest service land</a> that are potentially eligible for sale. </p>
<p>No doubt environmentalists, developers and timber companies will be scrutinizing these pieces of land in the coming months. More details and maps should be available around Feb 28th. Since the Forest Service is required to request public input on the sales, it would be nice if they could provide a GIS version of the maps for download... A public web GIS would inform the process immensely. I'll be keeping an eye out for some detailed GIS data. Let me know if you know of a good source.</p>GDAL-based DEM utilities2006-02-08T00:00:00-07:002006-02-08T00:00:00-07:00Matthew T. Perrytag:www.perrygeo.com,2006-02-08:/gdal-based-dem-utilities.html<div class="alert alert-error">These DEM tools have been incorporated into GDAL. The code referenced on this page is no longer maintained and I'd highly recommend using <a href="http://www.gdal.org/gdaldem.html">gdaldem</a> instead.</div>
<p>A few months ago, I began looking for some efficient command-line tools to analyze and visualize DEMs. I typically use GRASS for such tasks but …</p><div class="alert alert-error">These DEM tools have been incorporated into GDAL. The code referenced on this page is no longer maintained and I'd highly recommend using <a href="http://www.gdal.org/gdaldem.html">gdaldem</a> instead.</div>
<p>A few months ago, I began looking for some efficient command-line tools to analyze and visualize DEMs. I typically use GRASS for such tasks, but GRASS only works with its native raster format. Sure, you can import/export to common formats, but that's not as efficient as a single command-line tool that works with the DEM's native format, runs on systems without GRASS installed, and is easily scriptable. </p>
<p>Not having found anything that fit the bill, I decided to port some of the common GRASS DEM modules to C++ using the GDAL libraries. For someone with very little C++ experience, this was surprisingly manageable, though I learned quite a lot along the way. The result: three command-line utilities to generate hillshades, slope maps, and aspect maps, plus one excellent utility contributed by Paul Surgeon to apply color ramping to a DEM.</p>
<h3>Installation</h3>
<h4>Requirements</h4>
<p>I built these utilities on Ubuntu Linux. I admittedly have no idea how to compile them on Windows, but some folks have confirmed that the hillshade code compiles under VC++. To get these running under Linux (and presumably other Unixes), the requirements are minimal:</p>
<ol>
<li>GDAL shared libraries </li>
<li>GNU C++ Compiler</li>
</ol>
<h4>Download</h4>
<p>Get the <a href="/download/gdaldemtools_20060207.zip">current source</a> and unzip it. <em><strong>EDIT</strong></em>: This code is now available through my SVN repository: <a href="http://perrygeo.googlecode.com/svn/trunk/demtools/">http://perrygeo.googlecode.com/svn/trunk/demtools/</a>.</p>
<h4>Compiling</h4>
<p>Alas, there is no makefile, but installation should be fairly painless. To compile the source code under Linux, the following commands should take care of it:</p>
<div class="highlight"><pre><span></span><code>g++ hillshade.cpp -lgdal -o hillshade
g++ color-relief.cxx -lgdal -o color-relief
g++ aspect.cpp -lgdal -o aspect
g++ slope.cpp -lgdal -o slope
</code></pre></div>
<p>The four binaries can then be placed wherever your local binaries reside (typically /usr/local/bin).</p>
<h3>Examples</h3>
<h4>The original DEM</h4>
<p>In this particular example the input DEM is a GeoTIFF but these utilities can use any <a href="http://gdal.maptools.org/formats_list.html">GDAL-supported raster source</a>.</p>
<p><img alt="" src="/assets/img/dem/dem.jpg"></p>
<h4>Slope</h4>
<p>This command will take a DEM raster and output a 32-bit GeoTIFF of slope values. You have the option of specifying the type of slope value you want: degrees or percent slope. In cases where the horizontal units differ from the vertical units, you can also supply a scaling factor.</p>
<div class="highlight"><pre><span></span><code>slope dem.tif slope.tif
</code></pre></div>
<p><img alt="" src="/assets/img/dem/slope.jpg"></p>
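<p>The computation behind this is a simple 3x3 moving window. Here's a rough pure-Python sketch of the Horn-style gradient math such a tool typically uses (an illustration of the idea, not the actual C++ source):</p>

```python
import math

def slope_degrees(window, cellsize=1.0, scale=1.0):
    """Slope at the center of a 3x3 elevation window using Horn's
    finite-difference gradients; `scale` plays the role of the
    vertical/horizontal scaling factor mentioned above."""
    (z1, z2, z3), (z4, z5, z6), (z7, z8, z9) = window
    dzdx = ((z3 + 2 * z6 + z9) - (z1 + 2 * z4 + z7)) / (8.0 * cellsize)
    dzdy = ((z7 + 2 * z8 + z9) - (z1 + 2 * z2 + z3)) / (8.0 * cellsize)
    return math.degrees(math.atan(scale * math.hypot(dzdx, dzdy)))

# A plane rising one unit of elevation per cell slopes at ~45 degrees:
print(slope_degrees([[0, 1, 2], [0, 1, 2], [0, 1, 2]]))
```

<p>Percent slope, the other output option, is just 100 times the tangent of the same angle.</p>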
<h4>Aspect</h4>
<p>This command outputs a 32-bit GeoTIFF with values between 0 and 360 representing the azimuth of the terrain.</p>
<div class="highlight"><pre><span></span><code>aspect dem.tif aspect.tif
</code></pre></div>
<p><img alt="" src="/assets/img/dem/aspect.jpg"></p>
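<p>Aspect falls out of the same 3x3 window as slope; only the final trigonometry differs. A hypothetical pure-Python version, assuming row 0 of the window is the northern edge and azimuth is measured clockwise from north:</p>

```python
import math

def aspect_degrees(window, cellsize=1.0):
    """Downslope azimuth (0-360, clockwise from north) at the center
    of a 3x3 elevation window (row 0 = north). Flat cells have no
    defined aspect; this sketch simply returns 0.0 for them."""
    (z1, z2, z3), (z4, z5, z6), (z7, z8, z9) = window
    dzdx = ((z3 + 2 * z6 + z9) - (z1 + 2 * z4 + z7)) / (8.0 * cellsize)
    dzdy = ((z7 + 2 * z8 + z9) - (z1 + 2 * z2 + z3)) / (8.0 * cellsize)
    # Downhill direction: east component is -dzdx, north component is dzdy.
    return math.degrees(math.atan2(-dzdx, dzdy)) % 360.0

# A slope falling away to the east faces ~90 degrees:
print(aspect_degrees([[2, 1, 0], [2, 1, 0], [2, 1, 0]]))
```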
<h4>Hillshade</h4>
<p>This command outputs an 8-bit GeoTIFF with a nice shaded-relief effect. It's very useful for visualizing the terrain. You can optionally specify the azimuth and altitude of the light source, a vertical exaggeration factor, and a scaling factor to account for differences between vertical and horizontal units.</p>
<div class="highlight"><pre><span></span><code>hillshade dem.tif shade.tif
</code></pre></div>
<p><img alt="" src="/assets/img/dem/shade.jpg"></p>
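<p>Under the hood, the shading compares each cell's orientation against the light vector. A rough pure-Python sketch of the standard formulation (illustrative only; the real utility's scaling may differ slightly):</p>

```python
import math

def hillshade(window, cellsize=1.0, azimuth=315.0, altitude=45.0, z=1.0):
    """8-bit shaded-relief value for the center of a 3x3 elevation
    window (row 0 = north); `z` is the vertical exaggeration factor."""
    (z1, z2, z3), (z4, z5, z6), (z7, z8, z9) = window
    dzdx = ((z3 + 2 * z6 + z9) - (z1 + 2 * z4 + z7)) / (8.0 * cellsize)
    dzdy = ((z7 + 2 * z8 + z9) - (z1 + 2 * z2 + z3)) / (8.0 * cellsize)
    slope = math.atan(z * math.hypot(dzdx, dzdy))
    aspect = math.atan2(-dzdx, dzdy)  # downslope azimuth, in radians
    alt = math.radians(altitude)
    shade = (math.sin(alt) * math.cos(slope)
             + math.cos(alt) * math.sin(slope)
             * math.cos(math.radians(azimuth) - aspect))
    return max(0, int(round(255 * shade)))

# Flat terrain lit from 45 degrees up reflects sin(45) of full white:
print(hillshade([[10, 10, 10], [10, 10, 10], [10, 10, 10]]))  # → 180
```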
<h4>Color ramps</h4>
<p>After I posted the hillshade utility to the gdal-dev mailing list, there was some discussion about creating color relief maps to supplement the hillshades. Paul Surgeon took up the challenge and created a GDAL-based C++ utility to colorize DEMs (or any other single-band raster data source, for that matter). The technique is simple and powerful; by using a text-based color configuration file, you can create any range of color ramps for your data. </p>
<div class="highlight"><pre><span></span><code>color-relief dem.tif scale.txt colordem.tif
</code></pre></div>
<p>Here scale.txt is a text file containing four columns per line: an elevation value and the corresponding RGB values:</p>
<div class="highlight"><pre><span></span><code><span class="mf">3500</span><span class="w"> </span><span class="mf">255</span><span class="w"> </span><span class="mf">255</span><span class="w"> </span><span class="mf">255</span><span class="w"></span>
<span class="mf">2500</span><span class="w"> </span><span class="mf">235</span><span class="w"> </span><span class="mf">220</span><span class="w"> </span><span class="mf">175</span><span class="w"></span>
<span class="mf">1500</span><span class="w"> </span><span class="mf">190</span><span class="w"> </span><span class="mf">185</span><span class="w"> </span><span class="mf">135</span><span class="w"></span>
<span class="mf">700</span><span class="w"> </span><span class="mf">240</span><span class="w"> </span><span class="mf">250</span><span class="w"> </span><span class="mf">150</span><span class="w"></span>
<span class="mf">0</span><span class="w"> </span><span class="mf">50</span><span class="w"> </span><span class="mf">180</span><span class="w"> </span><span class="mf">50</span><span class="w"></span>
<span class="o">-</span><span class="mf">32768</span><span class="w"> </span><span class="mf">200</span><span class="w"> </span><span class="mf">230</span><span class="w"> </span><span class="mf">255</span><span class="w"></span>
</code></pre></div>
<p>The colors between the given elevation values are blended smoothly and the result is a nice colorized DEM:
<img alt="" src="/assets/img/dem/colordem.jpg"></p>
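<p>The blending between rows of the color file is just linear interpolation on each channel. A hypothetical pure-Python version of the idea (not Paul Surgeon's actual code; the ramp below is the scale.txt above, sorted ascending):</p>

```python
def color_for(elev, ramp):
    """Linearly interpolate an (r, g, b) color from a ramp given as
    [(elevation, r, g, b), ...] sorted by ascending elevation."""
    if elev <= ramp[0][0]:
        return tuple(ramp[0][1:])
    if elev >= ramp[-1][0]:
        return tuple(ramp[-1][1:])
    for (e0, *c0), (e1, *c1) in zip(ramp, ramp[1:]):
        if e0 <= elev <= e1:
            t = (elev - e0) / (e1 - e0)  # position between the two rows
            return tuple(round(a + t * (b - a)) for a, b in zip(c0, c1))

ramp = [(-32768, 200, 230, 255), (0, 50, 180, 50), (700, 240, 250, 150),
        (1500, 190, 185, 135), (2500, 235, 220, 175), (3500, 255, 255, 255)]
print(color_for(350, ramp))  # halfway between the 0 m and 700 m colors
```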
<h4>Color Shaded Relief (blending hillshade and colorized DEM)</h4>
<p>There are two ways I've come up with to blend the hillshade and the colorized DEM:</p>
<ol>
<li>
<p>Using GIMP or Photoshop, open both images, copy the shaded relief, paste on top of the color DEM and adjust the opacity in the layers dialog.</p>
</li>
<li>
<p>If you're publishing to the web with Mapserver, just stack the two images in your mapfile and set the TRANSPARENCY for the hillshade to a value between 30 and 70, depending on your preference.</p>
</li>
</ol>
<p>Though both methods work nicely, neither is really ideal, since they don't generate a georeferenced TIFF. You can get around this in the GIMP method by creating a <a href="http://gdal.maptools.org/frmt_various.html#WLD">world file (.tfw)</a> for the output TIFF. It might be nice to do this step programmatically in the future, but for now...
<img alt="" src="/assets/img/dem/combine.jpg"></p>
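<p>If you did want to script the compositing itself, the per-pixel math is plain alpha blending: weight the hillshade value against each color channel. A hypothetical sketch:</p>

```python
def blend_pixel(shade, rgb, opacity=0.5):
    """Composite an 8-bit hillshade value over an (r, g, b) pixel;
    `opacity` is the weight given to the hillshade layer."""
    return tuple(round(opacity * shade + (1 - opacity) * c) for c in rgb)

# A mid-gray hillshade over the 2500 m color from scale.txt, blended 50/50:
print(blend_pixel(200, (235, 220, 175)))  # → (218, 210, 188)
```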
<p>Let me know if you've got any suggestions or comments. The technique for all of these utilities is a simple 3x3 moving window so this code might serve as a good template to develop other raster processing utilities... let me know what you come up with!</p>First thoughts on the Open Source Geospatial Foundation2006-02-04T00:00:00-07:002006-02-04T00:00:00-07:00Matthew T. Perrytag:www.perrygeo.com,2006-02-04:/first-thoughts-on-the-open-source-geospatial-foundation.html<p>Well after a long and productive day in Chicago, the 25 attendees (and a few dozen more from IRC) were able to establish a solid plan for the foundation. Gary Sherman at Spatial Galaxy has <a href="http://spatialgalaxy.net/?p=8">a good overview of the meeting outcome</a> and has set up a very helpful <a href="http://logs.qgis.org/geofoundation/">IRC …</a></p><p>Well after a long and productive day in Chicago, the 25 attendees (and a few dozen more from IRC) were able to establish a solid plan for the foundation. Gary Sherman at Spatial Galaxy has <a href="http://spatialgalaxy.net/?p=8">a good overview of the meeting outcome</a> and has set up a very helpful <a href="http://logs.qgis.org/geofoundation/">IRC log</a> of the meeting and the focus group discussions (Go Gary!). Tyler Mitchell has posted some <a href="http://www1.mapserverfoundation.org/chicago-pics/images.html">photos of the meeting</a>. I attended via IRC and phone for only a few hours so my understanding of the entire meeting is limited but I'll add a few thoughts on what went down.</p>
<p>First of all, the name was decided early on to be the "<em>Open Source Geospatial Foundation</em>". IMO, this name fits very well. Now that Autodesk's open source contribution has <a href="http://www.oreillynet.com/pub/wlg/9055?wlg=yes"> been rebranded</a> from <em>Mapserver Enterprise</em> to <em>MapGuide Open Source</em>, I am glad to see the final chapter in the whole naming debacle! </p>
<p>I was also very interested in the funding discussion. The general consensus seemed to be that the foundation would generate income through sponsorships. The benefits of being a sponsor/supporter of OSGF include official recognition and the obvious PR value, in addition to being able to direct your funds to a particular project. It would work something like this: 2/3 of your donation could be directed to a particular software project while 1/3 would go to the foundation itself. Of the 2/3 going to the project, the Project Steering Committee (PSC) would decide how best to allocate those funds. There was brief discussion of doing some sort of "bounty" system that would allow sponsors to fund a particular feature, but this was generally thought to be a bad idea since there are so many aspects of software development that are not "sexy" enough to generate income... like cleaning up and optimizing code, bug fixes, etc. By allowing the PSC to allocate the funds, the focus can be on a solid code base and careful feature additions. Of course those who want to fund specific features can still contract directly with the developers.</p>
<p>One of the ironies of the initial foundation's project membership is that Mapserver (the project that was the center of the original Mapserver Foundation) is not yet a member! While this may seem strange at first, the reasoning is that this gives the Mapserver community a chance to vote on the matter. Other community-based efforts such as QGIS are likely waiting to hear from their users as well. Once the official statements from the OSGF are released, I suspect there will be a vote from these communities (and others) to decide whether they should join.</p>
<p>The criteria for projects to join the foundation were not entirely clear, but it appears they will be based on the commonalities of the initial projects. Requirements such as licensing and open standards are still foggy, but will likely be written in such a way that they don't conflict with any of the initial projects... a sort of reverse engineering of the criteria, if you will. </p>
<p>There were many interesting discussions as far as the implementation of the foundation web presence, the legal protections that would be provided by the foundation, the expected costs of running the foundation, promotion and the structure of the governing board. I'll wait until the official announcement to see how these issues were resolved.</p>
<p>Overall it was an exciting and historic day for open source GIS. Many thanks to all the attendees and IRC participants for all the interesting and productive discussions. The future of the foundation is looking very bright and I look forward to seeing where it's heading in the coming months...</p>
<p>One quick update: Schuyler Erle, who deserves an extra round of thanks for his amazing efforts to keep us IRC attendees informed of the meeting, has some great first-hand <a href="http://mappinghacks.com/index.cgi/2006/02/04#osgeo-foundation">insights on the OSGF</a>.</p>
<p>OK another quick update: The official foundation website will <em>eventually</em> reside at <a href="http://www.osgeo.org">www.osgeo.org</a>. </p>Mexico-US Border Crossing Maps2006-01-24T00:00:00-07:002006-01-24T00:00:00-07:00Matthew T. Perrytag:www.perrygeo.com,2006-01-24:/mexico-us-border-crossing-maps.html<p>Thousands of Mexicans come to the United States every year and, besides the legal troubles of border crossing, they face a tough journey across the desert in order to reach their destination. They have very little information to go on and many die from dehydration as they attempt to find …</p><p>Thousands of Mexicans come to the United States every year and, besides the legal troubles of border crossing, they face a tough journey across the desert in order to reach their destination. They have very little information to go on and many die from dehydration as they attempt to find their way through the vast deserts of the american southwest. </p>
<p>A faith-based organization called <a href="http://www.humaneborders.org/about/about_index.html"> Humane Borders</a> is trying to help the situation. They have produced <a href="http://www.humaneborders.org/news/news4.html">a number of maps</a> documenting town locations, roads, water stations, walking distances, cell phone towers and even places where other immigrants have died along the way. This was made possible in part by GIS software donated by ESRI.</p>
<p>I heard today on CNN with Lou Dobbs that the Mexican government is now printing and distributing these maps to citizens. Though the maps will clearly state "Don't Do It! It's Hard! There's Not Enough Water!", critics are saying the maps aid criminals and will encourage illegal aliens to cross the border. Others have pointed out that, from an economic standpoint, this may benefit the US Border Patrol, since so much of its budget is devoted to aiding sick and injured immigrants and properly taking care of the dead. Humane Borders is hoping to make people aware of the risks so that they can either choose not to go or be better prepared should they decide to cross.</p>
<p>In any case, it is an interesting example of how geographic information is still so important (and controversial) in our society.</p>Geocoding an address list to shapefile2006-01-20T00:00:00-07:002006-01-20T00:00:00-07:00Matthew T. Perrytag:www.perrygeo.com,2006-01-20:/geocoding-an-address-list-to-shapefile.html<p>Most commercial software comes with fairly elaborate geocoding engines and there are nice geocoding services on the web that can do one-at-a-time geocoding but the <a href="http://www.spatiallyadjusted.com/2006/01/20/batch-geocode-tabular-address-data-via-your-web-browser/">recent post</a> at Spatially Adjusted pointed out a great free resource for batch geocoding named, conveniently enough, <a href="http://www.batchgeocode.com/">Batch Geocode</a>. Just give it a list of …</p><p>Most commercial software comes with fairly elaborate geocoding engines and there are nice geocoding services on the web that can do one-at-a-time geocoding but the <a href="http://www.spatiallyadjusted.com/2006/01/20/batch-geocode-tabular-address-data-via-your-web-browser/">recent post</a> at Spatially Adjusted pointed out a great free resource for batch geocoding named, conveniently enough, <a href="http://www.batchgeocode.com/">Batch Geocode</a>. Just give it a list of tab or pipe delimited addresses and it outputs a table with your original data plus a lat/long for every row.</p>
<p>I have been working on a python script to convert text files into point shapefiles and thought this would be a great chance to put it to work. The only dependency is a recent version of python with the ogr module (see <a href="http://fwtools.maptools.org">FWTools</a> for an easy to install package for windows or linux).</p>
<p>First, I take a list of cities and feed it to batchgeocode.com (a very nice feature is that the yahoo geocoder, on which batchgeocode is based, does not <em>require</em> street level addresses):</p>
<blockquote>
<p>City|State
Santa Barbara|CA
Arcata|CA
New Milford|CT
Blacksburg|VA</p>
</blockquote>
<p>After running the geocoder, I get back a table with lat/longs:</p>
<blockquote>
<div class="highlight"><pre><span></span><code>City|State|lat|long|precision
Santa Barbara|CA|34.419769|-119.696747|city
Arcata|CA|40.866261|-124.081673|city
New Milford|CT|41.576599|-73.408821|city
Blacksburg|VA|37.229359|-80.413963|city
</code></pre></div>
</blockquote>
<p>Copy and paste that into a text file and add a second header row that defines the data type for each column. It would be possible to autodetect the column types, but there are cases where a string of numeric digits should be kept as a string (for instance, the zipcode <em>06776</em> would become <em>6776</em> if it was read as an integer). The possible column types are <em>string, integer, real, x</em> and <em>y</em>, with x and y representing the coordinates.</p>
<blockquote>
<div class="highlight"><pre><span></span><code>City|State|lat|long|precision
string|string|y|x|string
Santa Barbara|CA|34.419769|-119.696747|city
Arcata|CA|40.866261|-124.081673|city
New Milford|CT|41.576599|-73.408821|city
Blacksburg|VA|37.229359|-80.413963|city
</code></pre></div>
</blockquote>
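<p>That second type row makes the parsing logic trivial. A hypothetical sketch of how such a file can be read into typed records (illustrating the idea behind txt2shp.py, not its actual code):</p>

```python
def parse_typed(text, delimiter="|"):
    """Parse delimited text whose second header row declares column
    types (string/integer/real/x/y). Columns typed `string` are left
    untouched, so zip codes like 06776 keep their leading zero."""
    lines = text.strip().splitlines()
    names = lines[0].split(delimiter)
    types = lines[1].split(delimiter)
    cast = {"integer": int, "real": float, "x": float, "y": float}
    rows = []
    for line in lines[2:]:
        values = line.split(delimiter)
        rows.append({name: cast.get(kind, str)(value)
                     for name, kind, value in zip(names, types, values)})
    return rows

sample = """City|State|lat|long|precision
string|string|y|x|string
Santa Barbara|CA|34.419769|-119.696747|city"""
rows = parse_typed(sample)
print(rows[0]["City"], rows[0]["lat"], rows[0]["long"])
```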
<p>Now run the <em>txt2shp.py</em> utility. The input and output parameters are self-explanatory and the d parameter defines the string used as a delimiter. Notice that the syntax follows the GRASS standard of <em>parameter=value</em>:</p>
<blockquote>
<div class="highlight"><pre><span></span><code>txt2shp.py input=cities.txt output=cities.shp d='|'
</code></pre></div>
</blockquote>
<p>And now you've got a shapefile of the geocoded cities! </p>
<p><img alt="Cities Shapefile" src="/assets/img/cities.png"></p>
<p>The txt2shp.py script can be downloaded <a href="http://perrygeo.net/download/txt2shp.py"> here</a>. Try it out and let me know how it's working for you.</p>
<p><strong>Update:</strong> In order to generate a .prj file for your output shapefile, you can use the epsg_tr.py utility if you know the EPSG code. Batch Geocoder returns everything in lat/long (presumably with a WGS84 datum?) so you can use EPSG code 4326:</p>
<blockquote>
<p>epsg_tr.py -wkt 4326 > cities.prj</p>
</blockquote>KML to Shapefile Scripting2005-12-11T00:00:00-07:002005-12-11T00:00:00-07:00Matthew T. Perrytag:www.perrygeo.com,2005-12-11:/kml-to-shapefile-scripting.html<p>Christian Spanring has been doing some great work with Google Earth's KML data format. The latest offering is a fairly robust <a href="http://spanring.name/blog/2005/12/11/kml2gml/">XSLT stylesheet for transforming KML into GML</a>. </p>
<p>In the article, he mentions ogr2ogr as a method to convert GML to shapefiles so I immediately had to try it out …</p><p>Christian Spanring has been doing some great work with Google Earth's KML data format. The latest offering is a fairly robust <a href="http://spanring.name/blog/2005/12/11/kml2gml/">XSLT stylesheet for transforming KML into GML</a>. </p>
<p>In the article, he mentions ogr2ogr as a method to convert GML to shapefiles so I immediately had to try it out! I came up with a simple bash script, <strong>kml2shp.sh</strong>, that provides a quick command-line interface:</p>
<blockquote>
<p>kml2shp.sh input.kml output.shp</p>
</blockquote>
<p>Here's the step-by-step:</p>
<ol>
<li>
<p>Make sure you have xsltproc (the command-line xslt processor) and OGR installed.</p>
</li>
<li>
<p>Copy the <a href="http://spanring.name/blog/wp-content/files/kml2gml.xsl">xslt stylesheet </a> to /usr/local/share/kml2gml/</p>
</li>
<li>
<p>Create the kml2shp.sh script below (make sure to change the paths to reflect your system, chmod +x it, etc)</p>
</li>
</ol>
<div class="highlight"><pre><span></span><code>#!/bin/bash
if [ $# -ne 2 ]; then
    echo "usage: kml2shp.sh input.kml output.shp"
    exit
fi

echo "Processing KML file"
sed 's/ xmlns=\"http\:\/\/earth.google.com\/kml\/2.0\"//' $1 > /tmp/temp.kml
xsltproc -o /tmp/temp.gml /usr/local/share/kml2gml/kml2gml.xsl /tmp/temp.kml

echo "Creating new Shapefile"
ogr2ogr $2 /tmp/temp.gml myFeature

echo "Cleaning up temp files"
rm /tmp/temp.gml
rm /tmp/temp.kml

echo "New shapefile has been created:"
echo $2
</code></pre></div>
<p>Now as far as I can tell, the XSLT is fairly robust, although I've only tested it on a few datasets. The wrapper script, however, could use a lot of work. Type and error checking would be nice for starters, and a better method of removing the XML namespace might be necessary. This is really meant as a starting point.</p>
<p>One potential problem with this technique is that you will most likely get a 3D shapefile (x, y AND z coordinates). Many applications can handle 3D shapefiles but some (QGIS, others?) cannot at the present time. Once the geometry type is known, one could always specify the ogr2ogr "-nlt" parameter to force 2D output. But that's all for now... let me know if anyone has any suggestions on improving this technique.</p>Tissot Indicatrix - Examining the distortion of map projections2005-12-11T00:00:00-07:002005-12-11T00:00:00-07:00Matthew T. Perrytag:www.perrygeo.com,2005-12-11:/tissot-indicatrix-examining-the-distortion-of-map-projections.html<p>The Tissot Indicatrix is a valuable tool for showing the distortions caused by map projections. It is essentially a series of imaginary polygons that represent perfect circles of equal area on a 3D globe. When projected onto a 2D map, their shape, size and/or angles will be distorted accordingly …</p><p>The Tissot Indicatrix is a valuable tool for showing the distortions caused by map projections. It is essentially a series of imaginary polygons that represent perfect circles of equal area on a 3D globe. When projected onto a 2D map, their shape, size and/or angles will be distorted accordingly allowing you to quickly assess the projection's accuracy for a given part of the globe. </p>
<p>I've seen great Tissot diagrams in textbooks, but I wanted to create the indicatrix as a polygon dataset so that I could project and overlay it with other data in a GIS. To do this I wrote a Python script using the OGR libraries, which I will revisit in a minute. But first the visually interesting part:</p>
<p>Here is a world countries shapefile overlaid with the Tissot circles in geographic (unprojected lat-long) coordinates:</p>
<p><img alt="Latlong tissot" src="/assets/img/latlong.png"></p>
<p>Next I reprojected the datasets to the Mercator projection using ogr2ogr:</p>
<div class="highlight"><pre><span></span><code>ogr2ogr -t_srs "+proj=merc" countries_merc.shp countries_simpl.shp countries_simpl
ogr2ogr -t_srs "+proj=merc" tissot_merc.shp tissot.shp tissot
</code></pre></div>
<p>Note that the angles are perfectly preserved (the trademark feature of the Mercator projection) but the size is badly distorted.</p>
<p><img alt="Mercator tissot" src="/assets/img/mercator.png"></p>
<p>Now lets try Lambert Azimuthal Equal Area (in this case the US National Atlas standard projection - EPSG code 2163). </p>
<div class="highlight"><pre><span></span><code>ogr2ogr -t_srs "epsg:2163" countries_lambert.shp countries_simpl.shp countries_simpl
ogr2ogr -t_srs "epsg:2163" tissot_lambert.shp tissot.shp tissot
</code></pre></div>
<p>This is a great projection for preserving area, but outside the center, shapes become badly distorted:</p>
<p><img alt="LAEA tissot" src="/assets/img/lambert.png"></p>
<p>The best way to experiment with this is to bring the tissot.shp file into ArcMap (or another program that supports on-the-fly projection) and play with it in real time. The distortions of every projection just leap off the screen...</p>
<p>OK, now for the geeky part. Here's the python/OGR script used to create the tissot shapefile. The basic process is to lay out a grid of points across the globe in latlong, loop through the points and reproject each one to an orthographic projection centered directly on the point, buffer it, then reproject to latlong. The end result is a latlong shapefile representing circles of equal area on a globe.</p>
<div class="highlight"><pre><span></span><code><span class="c1">#!/usr/bin/env python</span>
<span class="c1"># Tissot Circles</span>
<span class="c1"># Represent perfect circles of equal area on a globe</span>
<span class="c1"># but will appear distorted in ANY 2d projection.</span>
<span class="c1"># Used to show the size, shape and directional distortion</span>
<span class="c1"># by Matthew T. Perry</span>
<span class="c1"># 12/10/2005</span>
<span class="kn">import</span> <span class="nn">ogr</span>
<span class="kn">import</span> <span class="nn">os</span>
<span class="kn">import</span> <span class="nn">osr</span>
<span class="n">output</span> <span class="o">=</span> <span class="s1">'tissot.shp'</span>
<span class="n">debug</span> <span class="o">=</span> <span class="kc">False</span>
<span class="c1"># Create the Shapefile</span>
<span class="n">driver</span> <span class="o">=</span> <span class="n">ogr</span><span class="o">.</span><span class="n">GetDriverByName</span><span class="p">(</span><span class="s1">'ESRI Shapefile'</span><span class="p">)</span>
<span class="k">if</span> <span class="n">os</span><span class="o">.</span><span class="n">path</span><span class="o">.</span><span class="n">exists</span><span class="p">(</span><span class="n">output</span><span class="p">):</span>
    <span class="n">driver</span><span class="o">.</span><span class="n">DeleteDataSource</span><span class="p">(</span><span class="n">output</span><span class="p">)</span>
<span class="n">ds</span> <span class="o">=</span> <span class="n">driver</span><span class="o">.</span><span class="n">CreateDataSource</span><span class="p">(</span><span class="n">output</span><span class="p">)</span>
<span class="n">layer</span> <span class="o">=</span> <span class="n">ds</span><span class="o">.</span><span class="n">CreateLayer</span><span class="p">(</span><span class="n">output</span><span class="p">,</span> <span class="n">geom_type</span><span class="o">=</span><span class="n">ogr</span><span class="o">.</span><span class="n">wkbPolygon</span><span class="p">)</span>
<span class="c1"># Set up spatial reference systems</span>
<span class="n">latlong</span> <span class="o">=</span> <span class="n">osr</span><span class="o">.</span><span class="n">SpatialReference</span><span class="p">()</span>
<span class="n">ortho</span> <span class="o">=</span> <span class="n">osr</span><span class="o">.</span><span class="n">SpatialReference</span><span class="p">()</span>
<span class="n">latlong</span><span class="o">.</span><span class="n">ImportFromProj4</span><span class="p">(</span><span class="s1">'+proj=latlong'</span><span class="p">)</span>
<span class="c1"># For each grid point, reproject to ortho centered on itself,</span>
<span class="c1"># buffer by 640,000 meters, reproject back to latlong,</span>
<span class="c1"># and output the latlong ellipse to shapefile</span>
<span class="k">for</span> <span class="n">x</span> <span class="ow">in</span> <span class="nb">range</span><span class="p">(</span><span class="o">-</span><span class="mi">165</span><span class="p">,</span><span class="mi">180</span><span class="p">,</span><span class="mi">30</span><span class="p">):</span>
    <span class="k">for</span> <span class="n">y</span> <span class="ow">in</span> <span class="nb">range</span> <span class="p">(</span><span class="o">-</span><span class="mi">60</span><span class="p">,</span><span class="mi">90</span><span class="p">,</span><span class="mi">30</span><span class="p">):</span>
        <span class="n">f</span><span class="o">=</span> <span class="n">ogr</span><span class="o">.</span><span class="n">Feature</span><span class="p">(</span><span class="n">feature_def</span><span class="o">=</span><span class="n">layer</span><span class="o">.</span><span class="n">GetLayerDefn</span><span class="p">())</span>
        <span class="n">wkt</span> <span class="o">=</span> <span class="s1">'POINT(</span><span class="si">%f</span><span class="s1"> </span><span class="si">%f</span><span class="s1">)'</span> <span class="o">%</span> <span class="p">(</span><span class="n">x</span><span class="p">,</span> <span class="n">y</span><span class="p">)</span>
        <span class="n">p</span> <span class="o">=</span> <span class="n">ogr</span><span class="o">.</span><span class="n">CreateGeometryFromWkt</span><span class="p">(</span><span class="n">wkt</span><span class="p">)</span>
        <span class="n">p</span><span class="o">.</span><span class="n">AssignSpatialReference</span><span class="p">(</span><span class="n">latlong</span><span class="p">)</span>
        <span class="n">proj</span> <span class="o">=</span> <span class="s1">'+proj=ortho +lon_0=</span><span class="si">%f</span><span class="s1"> +lat_0=</span><span class="si">%f</span><span class="s1">'</span> <span class="o">%</span> <span class="p">(</span><span class="n">x</span><span class="p">,</span><span class="n">y</span><span class="p">)</span>
        <span class="n">ortho</span><span class="o">.</span><span class="n">ImportFromProj4</span><span class="p">(</span><span class="n">proj</span><span class="p">)</span>
        <span class="n">p</span><span class="o">.</span><span class="n">TransformTo</span><span class="p">(</span><span class="n">ortho</span><span class="p">)</span>
        <span class="n">b</span> <span class="o">=</span> <span class="n">p</span><span class="o">.</span><span class="n">Buffer</span><span class="p">(</span><span class="mi">640000</span><span class="p">)</span>
        <span class="n">b</span><span class="o">.</span><span class="n">AssignSpatialReference</span><span class="p">(</span><span class="n">ortho</span><span class="p">)</span>
        <span class="n">b</span><span class="o">.</span><span class="n">TransformTo</span><span class="p">(</span><span class="n">latlong</span><span class="p">)</span>
        <span class="n">f</span><span class="o">.</span><span class="n">SetGeometryDirectly</span><span class="p">(</span><span class="n">b</span><span class="p">)</span>
        <span class="n">layer</span><span class="o">.</span><span class="n">CreateFeature</span><span class="p">(</span><span class="n">f</span><span class="p">)</span>
        <span class="n">f</span><span class="o">.</span><span class="n">Destroy</span><span class="p">()</span>
<span class="n">ds</span><span class="o">.</span><span class="n">Destroy</span><span class="p">()</span>
</code></pre></div>Processing S57 soundings2005-12-03T00:00:00-07:002005-12-03T00:00:00-07:00Matthew T. Perrytag:www.perrygeo.com,2005-12-03:/processing-s57-soundings.html<p>NOAA Electronic Navigational Charts (ENC) contain (among many other things) depth soundings that can be processed into raster bathymetry grids. The ENC files are available as a huge torrent from geotorrent.org (<a href="http://geotorrent.org/details.php?id=58">http://geotorrent.org/details.php?id=58</a>). </p>
<p>Download this torrent and check readme.txt to find the chart …</p><p>NOAA Electronic Navigational Charts (ENC) contain (among many other things) depth soundings that can be processed into raster bathymetry grids. The ENC files are available as a huge torrent from geotorrent.org (<a href="http://geotorrent.org/details.php?id=58">http://geotorrent.org/details.php?id=58</a>). </p>
<p>Download this torrent and check readme.txt to find the chart of interest:</p>
<blockquote>
<p>Port Hueneme to Santa Barbara|5|2005-10-03|2005-10-03|US5CA65M</p>
</blockquote>
<p>First check out the gdal documentation for s57 files at <a href="http://www.gdal.org/ogr/drv_s57.html">http://www.gdal.org/ogr/drv_s57.html</a>. </p>
<p>Change to the US5CA65M directory and you'll see a .000 file (and maybe .001, .002 etc). Run ogrinfo on the .000 file and you'll see ~ 61 layers, one of which ("SOUNDG") represents the soundings. Let's start by examining the soundings layer:</p>
<div class="highlight"><pre><span></span><code>ogrinfo -summary US5CA65M.000 SOUNDG
</code></pre></div>
<p>We see that there are 43 "features", but since the features are multipoints, there are actually thousands of soundings. The multipoints are 3D, so if we convert to a shapefile with ogr2ogr's default settings we lose the third dimension. To solve this, we need to append "25D" to the layer type. Furthermore, the multipoint geometry confuses some applications, so we want to split it into a layer of simple 3D point geometries. Luckily there is a SPLIT_MULTIPOINT option that must be specified as an environment variable:</p>
<div class="highlight"><pre><span></span><code>export OGR_S57_OPTIONS="RETURN_PRIMITIVES=ON,RETURN_LINKAGES=ON,LNAM_REFS=ON,SPLIT_MULTIPOINT=ON,ADD_SOUNDG_DEPTH=ON"
ogr2ogr -nlt POINT25D test3.shp US5CA65M.000 SOUNDG
</code></pre></div>
<p>Now we get ~3,000 3D points, with the depth added as an attribute for good measure.</p>
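<p>Conceptually, the SPLIT_MULTIPOINT and ADD_SOUNDG_DEPTH options explode each 3D multipoint feature into simple 3D points and copy the z value into a regular attribute. A rough sketch of that transformation in plain Python (the coordinates and depths below are made up for illustration):</p>

```python
# Conceptual sketch of what OGR's SPLIT_MULTIPOINT=ON plus ADD_SOUNDG_DEPTH=ON
# accomplish: each 3D multipoint is exploded into simple 3D points, and the
# z value (the depth) is duplicated into an ordinary attribute field.
def split_multipoints(features):
    """features: list of multipoints, each a list of (x, y, z) tuples."""
    points = []
    for multipoint in features:
        for x, y, z in multipoint:
            # one simple point geometry per vertex, plus a DEPTH attribute
            points.append({"x": x, "y": y, "z": z, "DEPTH": z})
    return points

soundings = split_multipoints([
    [(-119.2, 34.1, 10.5), (-119.3, 34.2, 12.0)],  # one multipoint, 2 vertices
    [(-119.4, 34.3, 8.7)],                         # one multipoint, 1 vertex
])
print(len(soundings))  # 3 simple points from 2 multipoint features
```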
<p>Next, bring these into GRASS and create a raster:</p>
<div class="highlight"><pre><span></span><code>v.in.ogr -zo dsn=test3.shp output=soundg layer=test3
v.info soundg
g.region vect=soundg nsres=0.001 ewres=0.001
v.surf.rst input=soundg elev=bathy layer=0
r.info bathy
</code></pre></div>
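<p>v.surf.rst interpolates the scattered soundings onto a grid using regularized splines with tension; that algorithm is too involved to reproduce here, but the general idea of scattered-points-to-raster interpolation can be illustrated with a much simpler, hypothetical inverse-distance-weighting stand-in (this is <em>not</em> what GRASS actually uses):</p>

```python
# Hypothetical inverse-distance-weighting (IDW) sketch, only to illustrate
# the idea of interpolating a grid cell from scattered sample points.
# NOTE: v.surf.rst actually uses regularized splines with tension, not IDW.
def idw(points, x, y, power=2):
    """Interpolate a value at (x, y) from (px, py, pz) sample points."""
    num = den = 0.0
    for px, py, pz in points:
        d2 = (px - x) ** 2 + (py - y) ** 2
        if d2 == 0:
            return pz  # exactly on a sample point: use its value
        w = 1.0 / d2 ** (power / 2)
        num += w * pz
        den += w
    return num / den

samples = [(0.0, 0.0, 10.0), (1.0, 0.0, 20.0)]
print(idw(samples, 0.5, 0.0))  # midpoint gets equal weights -> 15.0
```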
<p>Since depths show up as positive elevations, we want to multiply the grid by -1:</p>
<div class="highlight"><pre><span></span><code>r.mapcalc sb_bathy=bathy*-1
</code></pre></div>
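<p>The mapcalc expression is just a cell-by-cell negation, turning positive depths into negative elevations. In Python terms (grid values made up):</p>

```python
# r.mapcalc sb_bathy=bathy*-1 negates every cell of the raster so that
# positive depths become negative elevations. Toy 2x2 grid for illustration.
bathy = [
    [10.5, 12.0],
    [8.7, 9.3],
]
sb_bathy = [[-cell for cell in row] for row in bathy]
print(sb_bathy)  # [[-10.5, -12.0], [-8.7, -9.3]]
```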
<p>And of course we want to make some nice shaded relief and contour maps for viewing with QGIS:</p>
<div class="highlight"><pre><span></span><code>r.shaded.relief map=sb_bathy shadedmap=sb_shade altitude=45 azimuth=315
r.contour input=sb_bathy output=sb_contour step=5
qgis &
</code></pre></div>
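<p>Under the hood, r.shaded.relief derives an illumination value for each cell from the sun's altitude and azimuth together with the local slope and aspect. A minimal sketch of the standard hillshade formula, which the GRASS implementation may differ from in details and scaling:</p>

```python
import math

# Standard hillshade formula: illumination of a surface element given sun
# altitude/azimuth and local slope/aspect (all in degrees). This is a
# conceptual sketch; r.shaded.relief's exact implementation may differ.
def hillshade(slope, aspect, altitude=45.0, azimuth=315.0):
    zenith = math.radians(90.0 - altitude)
    slope_r = math.radians(slope)
    shade = (math.cos(zenith) * math.cos(slope_r)
             + math.sin(zenith) * math.sin(slope_r)
             * math.cos(math.radians(azimuth) - math.radians(aspect)))
    return max(0.0, shade)  # clamp cells facing fully away from the sun

# Flat terrain: brightness depends only on sun altitude (cos 45 deg)
print(round(hillshade(slope=0.0, aspect=0.0), 3))  # 0.707
```

Note that a slope facing the sun (aspect near the azimuth) comes out brighter than the same slope facing away, which is what produces the relief effect.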
<p><img alt="s57 results" src="/assets/img/s57.png"></p>
<p>From the screenshot, we can see pits and spikes from potential outliers, so we might want to go back and adjust the tension and smoothing parameters on the raster interpolation (the v.surf.rst command).</p>The new blog2005-12-03T00:00:00-07:002005-12-03T00:00:00-07:00Matthew T. Perrytag:www.perrygeo.com,2005-12-03:/the-new-blog.html<p>Well, I finally got around to installing some real blogging software. SimplePHP Blog was just not cutting it, and WordPress looks like a healthy option. So far I've been really impressed! Let me know if you have any troubles accessing it...</p>