-
Notifications
You must be signed in to change notification settings - Fork 0
/
Copy pathexamining-sqlalchemy-memory-usage.html
150 lines (144 loc) · 13.5 KB
/
examining-sqlalchemy-memory-usage.html
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
<!DOCTYPE html>
<html lang="en">
<head>
<link href='//fonts.googleapis.com/css?family=Source+Sans+Pro:300,400,700,400italic' rel='stylesheet' type='text/css'>
<link rel="stylesheet" type="text/css" href="http://projects.curiousllc.com/theme/css/style.min.css">
<link rel="stylesheet" type="text/css" href="http://projects.curiousllc.com/theme/css/pygments.min.css">
<link rel="stylesheet" type="text/css" href="http://projects.curiousllc.com/theme/css/font-awesome.min.css">
<link href="http://projects.curiousllc.com/feeds/all.atom.xml" type="application/atom+xml" rel="alternate" title="Working with hardware Atom">
<meta charset="utf-8" />
<meta name="viewport" content="width=device-width, initial-scale=1.0" />
<meta name="robots" content="" />
<meta name="author" content="ryanday" />
<meta name="description" content="Locate large memory structure in SQLAlchemy" />
<meta name="keywords" content="python, sqlalchemy">
<meta property="og:site_name" content="Working with hardware"/>
<meta property="og:title" content="Examining SQLAlchemy Memory Usage"/>
<meta property="og:description" content="Locate large memory structure in SQLAlchemy"/>
<meta property="og:locale" content="en_US"/>
<meta property="og:url" content="http://projects.curiousllc.com/examining-sqlalchemy-memory-usage.html"/>
<meta property="og:type" content="article"/>
<meta property="article:published_time" content="2015-03-20 18:22:00-04:00"/>
<meta property="article:modified_time" content=""/>
<meta property="article:author" content="http://projects.curiousllc.com/author/ryanday.html">
<meta property="article:section" content="programming"/>
<meta property="article:tag" content="python"/>
<meta property="article:tag" content="sqlalchemy"/>
<meta property="og:image" content="https://avatars3.githubusercontent.com/u/6845642?v=3&s=200"> <title>Working with hardware – Examining SQLAlchemy Memory Usage</title>
</head>
<body>
<aside>
<div>
<a href="http://projects.curiousllc.com">
<img src="https://avatars3.githubusercontent.com/u/6845642?v=3&s=200" alt="" title="">
</a>
<h1><a href="http://projects.curiousllc.com"></a></h1>
<p></p>
<nav>
<div style="text-align: left; font-size: 0.9em">At Curious, we connect Things to the Cloud. <br/><br/>
If you would like to launch a product, or connect your product to the Cloud, drop us a line!</div>
<ul class="list">
<li><a href="mailto:[email protected]" target="_blank">[email protected]</a></li>
<li><a href="http://curiousllc.com/" target="_blank">Curious LLC</a></li>
<li><a href="http://www.meetup.com/NOVA-Python" target="_blank">Nova Python</a></li>
<li><a href="http://www.meetup.com/Golang-DC/" target="_blank">Golang DC</a></li>
</ul>
</nav>
<ul class="social">
<li><a class="sc-You can add links in your config file" href="#" target="_blank"><i class="fa fa-You can add links in your config file"></i></a></li>
<li><a class="sc-Another social link" href="#" target="_blank"><i class="fa fa-Another social link"></i></a></li>
</ul>
</div>
</aside>
<main>
<article>
<header>
<h1 id="examining-sqlalchemy-memory-usage">Examining SQLAlchemy Memory Usage</h1>
<p>
Posted on Fri 20 March 2015 in <a href="http://projects.curiousllc.com/category/programming.html">programming</a> by <a href="http://projects.curiousllc.com/author/ryanday.html">ryanday</a>
• Tagged with
<a href="http://projects.curiousllc.com/tag/python.html">python</a>, <a href="http://projects.curiousllc.com/tag/sqlalchemy.html">sqlalchemy</a>
</p>
</header>
<div>
<p>I started having segfaults in my celery processes. These were incredibly difficult to debug because they didn’t always happen, and never happened in development. Fortunately they were happening in certain places and I was able to guess where the problem was. However, I didn’t understand just how severe the problem was.</p>
<p>I use SQLAlchemy in my Flask apps. I typically issue queries similar to</p>
<p>query_result = db.session.query(MyModel).first()</p>
<p>I knew I was running out of memory on my entry level Digital Ocean droplets.
That is when the segfaults started. I stumbled across <a class="reference external" href="https://pypi.python.org/pypi/memory_profiler">Memory Profiler</a> and
created a quick test program to profile a single function that was causing problems.</p>
<div class="highlight"><pre><span class="n">Line</span> <span class="c"># Mem usage Increment Line Contents</span>
<span class="o">================================================</span>
<span class="mi">56</span> <span class="mf">46.0</span> <span class="n">MiB</span> <span class="mf">0.0</span> <span class="n">MiB</span> <span class="nd">@profile</span>
<span class="mi">57</span> <span class="k">def</span> <span class="nf">get_bounced_emails</span><span class="p">():</span>
<span class="mi">58</span> <span class="mf">48.2</span> <span class="n">MiB</span> <span class="mf">2.2</span> <span class="n">MiB</span> <span class="n">emails</span> <span class="o">=</span> <span class="n">db</span><span class="o">.</span><span class="n">session</span><span class="o">.</span><span class="n">query</span><span class="p">(</span><span class="n">EmailAddress</span><span class="p">)</span><span class="o">.</span>\
<span class="mi">59</span> <span class="mf">48.3</span> <span class="n">MiB</span> <span class="mf">0.0</span> <span class="n">MiB</span> <span class="n">join</span><span class="p">(</span><span class="n">send_history</span><span class="p">,</span> <span class="n">send_history</span><span class="o">.</span><span class="n">c</span><span class="o">.</span><span class="n">email_id</span> <span class="o">==</span> <span class="n">EmailAddress</span><span class="o">.</span><span class="n">email_id</span><span class="p">)</span><span class="o">.</span>\
<span class="mi">60</span> <span class="mf">372.1</span> <span class="n">MiB</span> <span class="mf">323.9</span> <span class="n">MiB</span> <span class="nb">filter</span><span class="p">(</span><span class="n">send_history</span><span class="o">.</span><span class="n">c</span><span class="o">.</span><span class="n">bounce</span> <span class="o">==</span> <span class="mi">1</span><span class="p">)</span><span class="o">.</span>\
<span class="mi">61</span> <span class="nb">all</span><span class="p">()</span>
<span class="mi">62</span>
<span class="mi">63</span> <span class="mf">372.1</span> <span class="n">MiB</span> <span class="mf">0.0</span> <span class="n">MiB</span> <span class="n">email_dict</span> <span class="o">=</span> <span class="p">{}</span>
<span class="mi">64</span> <span class="mf">374.1</span> <span class="n">MiB</span> <span class="mf">2.0</span> <span class="n">MiB</span> <span class="k">for</span> <span class="n">email</span> <span class="ow">in</span> <span class="n">emails</span><span class="p">:</span>
<span class="mi">65</span> <span class="mf">374.1</span> <span class="n">MiB</span> <span class="mf">0.0</span> <span class="n">MiB</span> <span class="n">email_dict</span><span class="p">[</span><span class="n">email</span><span class="o">.</span><span class="n">email_address</span><span class="p">]</span> <span class="o">=</span> <span class="bp">True</span>
<span class="mi">66</span>
<span class="mi">67</span> <span class="mf">314.9</span> <span class="n">MiB</span> <span class="o">-</span><span class="mf">59.2</span> <span class="n">MiB</span> <span class="k">del</span> <span class="n">emails</span>
<span class="mi">68</span> <span class="mf">314.9</span> <span class="n">MiB</span> <span class="mf">0.0</span> <span class="n">MiB</span> <span class="k">return</span> <span class="n">email_dict</span>
</pre></div>
<p>The memory profiler package makes it real easy to see the memory consumption of your application line by line. In this function, we are just retrieving a ton of records. They are taking up a lot of memory.</p>
<p>This is where I learned a little bit about SQLAlchemy. This function only needs the email_address field of the record, but I’m retrieving the entire record. So I made a change to simply grab the email_address record.</p>
<div class="highlight"><pre><span class="n">Line</span> <span class="c"># Mem usage Increment Line Contents</span>
<span class="o">================================================</span>
<span class="mi">56</span> <span class="mf">46.0</span> <span class="n">MiB</span> <span class="mf">0.0</span> <span class="n">MiB</span> <span class="nd">@profile</span>
<span class="mi">57</span> <span class="k">def</span> <span class="nf">get_bounced_emails</span><span class="p">():</span>
<span class="mi">58</span> <span class="mf">47.1</span> <span class="n">MiB</span> <span class="mf">1.1</span> <span class="n">MiB</span> <span class="n">emails</span> <span class="o">=</span> <span class="n">db</span><span class="o">.</span><span class="n">session</span><span class="o">.</span><span class="n">query</span><span class="p">(</span><span class="n">EmailAddress</span><span class="o">.</span><span class="n">email_address</span><span class="p">)</span><span class="o">.</span>\
<span class="mi">59</span> <span class="mf">47.1</span> <span class="n">MiB</span> <span class="mf">0.0</span> <span class="n">MiB</span> <span class="n">join</span><span class="p">(</span><span class="n">send_history</span><span class="p">,</span> <span class="n">send_history</span><span class="o">.</span><span class="n">c</span><span class="o">.</span><span class="n">email_id</span> <span class="o">==</span> <span class="n">EmailAddress</span><span class="o">.</span><span class="n">email_id</span><span class="p">)</span><span class="o">.</span>\
<span class="mi">60</span> <span class="mf">81.2</span> <span class="n">MiB</span> <span class="mf">34.1</span> <span class="n">MiB</span> <span class="nb">filter</span><span class="p">(</span><span class="n">send_history</span><span class="o">.</span><span class="n">c</span><span class="o">.</span><span class="n">bounce</span> <span class="o">==</span> <span class="mi">1</span><span class="p">)</span><span class="o">.</span>\
<span class="mi">61</span> <span class="nb">all</span><span class="p">()</span>
<span class="mi">62</span>
<span class="mi">63</span> <span class="mf">81.2</span> <span class="n">MiB</span> <span class="mf">0.0</span> <span class="n">MiB</span> <span class="n">email_dict</span> <span class="o">=</span> <span class="p">{}</span>
<span class="mi">64</span> <span class="mf">84.2</span> <span class="n">MiB</span> <span class="mf">3.0</span> <span class="n">MiB</span> <span class="k">for</span> <span class="n">email</span> <span class="ow">in</span> <span class="n">emails</span><span class="p">:</span>
<span class="mi">65</span> <span class="mf">84.2</span> <span class="n">MiB</span> <span class="mf">0.0</span> <span class="n">MiB</span> <span class="n">email_dict</span><span class="p">[</span><span class="n">email</span><span class="p">[</span><span class="mi">0</span><span class="p">]]</span> <span class="o">=</span> <span class="bp">True</span>
<span class="mi">66</span>
<span class="mi">67</span> <span class="mf">65.9</span> <span class="n">MiB</span> <span class="o">-</span><span class="mf">18.2</span> <span class="n">MiB</span> <span class="k">del</span> <span class="n">emails</span>
<span class="mi">68</span> <span class="mf">65.9</span> <span class="n">MiB</span> <span class="mf">0.0</span> <span class="n">MiB</span> <span class="k">return</span> <span class="n">email_dict</span>
</pre></div>
<p>Wow. By just grabbing that one field I seriously reduced the memory consumption of this method. I had under estimated just how much overhead there is in pulling the entire record in a query.</p>
<p>Now I take this into account when building queries. Just what do I need form the DB? Then I only grab that.</p>
</div>
<div class="tag-cloud">
<p>
<a href="http://projects.curiousllc.com/tag/python.html">python</a>
<a href="http://projects.curiousllc.com/tag/sqlalchemy.html">sqlalchemy</a>
</p>
</div>
</article>
<footer>
<p>© Ryan Day </p>
<p>Built using <a href="http://getpelican.com" target="_blank">Pelican</a> - <a href="https://github.com/alexandrevicenzi/flex" target="_blank">Flex</a> theme by <a href="http://alexandrevicenzi.com" target="_blank">Alexandre Vicenzi</a></p> </footer>
</main>
<script type="text/javascript">
(function(i,s,o,g,r,a,m){i['GoogleAnalyticsObject']=r;i[r]=i[r]||function(){
(i[r].q=i[r].q||[]).push(arguments)},i[r].l=1*new Date();a=s.createElement(o),
m=s.getElementsByTagName(o)[0];a.async=1;a.src=g;m.parentNode.insertBefore(a,m)
})(window,document,'script','//www.google-analytics.com/analytics.js','ga');
ga('create', 'UA-2393416-13', 'auto');
ga('send', 'pageview');
</script>
<script type="application/ld+json">
{
"@context": "http://schema.org",
"@type": "BlogPosting",
"name": "Examining SQLAlchemy Memory Usage",
"headline": "Examining SQLAlchemy Memory Usage",
"datePublished": "2015-03-20 18:22:00-04:00",
"dateModified": "",
"author": {
"@type": "Person",
"name": "ryanday",
"url": "http://projects.curiousllc.com/author/ryanday.html"
},
"image": "https://avatars3.githubusercontent.com/u/6845642?v=3&s=200",
"url": "http://projects.curiousllc.com/examining-sqlalchemy-memory-usage.html",
"description": "Locate large memory structure in SQLAlchemy"
}
</script></body>
</html>