Summary

I started out in organic chemistry and ended up making supercomputers at Intel and NVIDIA.

Since people often ask me about how this happened, or to speak about career paths in general, I figure I should write it down.

The beginning

I was fascinated by quantum mechanics from a young age, probably because it was elusive, not because I had a particular need for it. I remember reading books on heavy-element chemistry and string theory in high school, the kind written by popularizers who hide all the math. This was most useful in chemistry class, mostly because my boredom often led to mischief, and a chemistry lab is a very bad place to conduct mischief (my high school was old and had actual uranium in storage that nobody knew how to dispose of without shutting down the school).

My career plan at that time was to become a lawyer, because I was under the foolish impression that this was a good way to get rich arguing with people. However, I did really well in chemistry and math classes, so I kept taking them, because they were often an easy way to get my grades up. It became clear in college, though, that my math skills were more suited for a chemistry major than a math major, which aligned with my disinterest in learning German, French and/or Russian (as was prescribed for PhDs in math).

University of Washington

I became a chemistry major because of a University of Washington Professor named Nic Epiotis. Epiotis was a character: he told off-topic stories during lectures and disparaged organic chemistry traditions like "actually reading the textbook" and "memorizing things". I recall one day he held up a 1000-page organic chemistry textbook, pinched off about 20 pages and declared, "if you read and understand this section, you can get an A in this class - the rest of the book doesn't matter." As an inherently lazy person, this was my kind of class. I fell in love with the quantum mechanical aspects of organic chemistry, which explain, among other things, why carrots are orange (see e.g. this page to understand the details).

Later in my undergraduate years, I was drawn into computational chemistry research thanks to Professors Wes Borden and Bill Reinhardt, along with Eric Brown, who was a rather charismatic T.A. and my research mentor in Wes' group. Eric filled me with all sorts of ideas, essentially all of which have turned out to be useful. Eric discouraged me from taking any classes in computer science because, "why do you want to learn Java?" and encouraged me to go deep into the theoretical and computational aspects of chemistry, which I did in grad school. At the time, I was proficient in running Gaussian, and dabbled in Mathematica and Matlab, but I couldn't have compiled a C or Fortran program if my life depended on it. I really wanted to understand how Gaussian produced results and what sort of math was behind B3LYP and MP2, for example.

University of Chicago

I went to the University of Chicago (UC) for grad school because of (1) the quality of the faculty, (2) its reputation as an intensely academic place, and (3) a cost of living that was a lot lower than at Berkeley, which matters when you're going to make $20K/year. The first half of grad school was devoted to intensely theoretical topics with rudimentary computation, primarily done in Matlab. I attended (sometimes for credit) a lot of classes outside the chemistry department to broaden my horizons, including my first proper computing courses, a course on random number generators, and a few weeks of intro to economics with Steve Levitt (the Freakonomics guy). UC was as erudite as I had been promised and I had a wonderful social life full of profound nerds and geeks. I also met my wife, who was a different type of nerd than I was.

In my first year of grad school, I applied to fellowships, including the Department of Energy (DOE) Computational Science Graduate Fellowship (henceforth "CSGF"). Applying for this fellowship requires non-trivial essays about computational science, which prompted me to think a lot about big computers and software in ways that I hadn't before. Unfortunately, I did not get the fellowship the first time I applied. Fortunately, in those days, it was possible to apply as a second-year grad student, which I did successfully. At the time, I was only a Matlab programmer (and a bad one at that), but I had dreams of running semidefinite programming algorithms on Blue Gene/L, which I assume appealed to the selection committee more than my grades.

DOE-CSGF and PNNL

One of the novel features of the CSGF is that it requires the student to do a "practicum" (internship) at a DOE lab. I focused on Argonne National Lab (Argonne) and Pacific Northwest National Lab (PNNL), for both technical and geographic reasons. I ended up at PNNL working on NWChem because they didn't care that I had no relevant programming experience. As some of the details of my practicum are found in Deixis, I won't repeat them here.

Working at PNNL in the summer of 2006 changed my life and is the reason I've been able to have a career in computing (I elaborate on this here). Tim Carlson patiently taught me how to administer a Linux system, Jochen Autschbach and Bert de Jong got me started with Fortran (e.g. "start typing at column seven"), Dunyou Wang taught me about version control and regression testing, and Karol Kowalski taught me quantum many-body theory and how to put everything together in NWChem. Karol became my research supervisor and mentor -- we've written about a dozen papers together by now, which was an abnormally good outcome for a summer internship.

I worked on NWChem full-time for my last three years of grad school and my dissertation is based on that work. During that time, I became a lot better at programming, although still primarily in old-school Fortran, and I learned about running jobs on supercomputers, first on MPP2, a 1000-node supercomputer made of Itanium2 processors, the Quadrics Elan2 interconnect, and properly provisioned local and global filesystems, but also on Argonne Blue Gene systems.

One of the things that changed for me due to working on NWChem was how I prioritized research, (software) development, and support. Where once I believed that the scientific food chain had academics at the top, and everyone else was merely a failure who couldn't get a faculty position somewhere, I came to appreciate that national labs are at least as good a place to do science, and that publishing papers is relatively low impact compared to building and supporting tools like NWChem that allow others to do science. It is not typical for grad students to do software support, but I found a great deal of satisfaction in this, and it forced me to learn things about NWChem that I never would have learned otherwise. I also started learning how to write documentation and practice empathy, which have proven to be far more useful in my professional life than solving quantum many-body equations in parallel.

In the same way that I wanted to understand the guts of Gaussian and B3LYP as an undergrad, working on NWChem inspired me to wonder about how low-level system programming and supercomputer interconnects worked. While I could write massively parallel code in NWChem, this had more to do with the structured approach to parallelism built into the application and the Global Arrays runtime system than with any skill of mine. Robert Harrison quoted So Hirata, who created the Tensor Contraction Engine (TCE) on which my work was built, as saying, "my code is working in parallel and I don't know why." I had a similar experience to So's, although I decided I cared more about "why?" questions in parallel computing than chemistry itself.

Argonne

Thanks to CSGF, I had a number of connections at DOE labs to help find a job (postdoc). While Argonne had once declared me unsuitable for their purposes because I didn't know how to program when I was a prospective intern, they were my top choice for a postdoc for geographic reasons. I wanted to go to Argonne because they had a really good computer science division and I wanted to become proficient at both C and MPI. They were also standing up rather novel supercomputers in the Argonne Leadership Computing Facility (ALCF) and needed someone with computational chemistry skills who could make those codes run on the Blue Gene/P architecture.

Shortly after the new computing building was finished, Vinod Tipparaju visited Argonne and I ended up in a conference room with him and Pavan Balaji. I knew Vinod from PNNL days and he was one of the lead developers of Global Arrays. Pavan was (and is) an MPI extraordinaire. I didn't know Pavan very well back then, so it was serendipitous that Vinod made this introduction. We spent an hour or two analyzing common patterns in scalable HPC applications, which set into motion a number of research projects that eventually led to ARMCI-MPI, among other things, and was the beginning of a long and very fruitful collaboration between Pavan and me. Working with Pavan and the rest of the MPICH team turned me into a moderately functioning computer scientist, and is the primary reason I'm capable of doing anything useful for Intel.

Because this blog post must be finite, I'm not going to describe all of the wonderful people at Argonne who taught me things, but one essential person in all of this was my postdoc supervisor, Ray Bair, who, like me, did a PhD in computational chemistry and wandered into HPC. In stark contrast to academic postdocs, Ray did not give me a project, but rather mentored me on how to collaborate with people across the lab and taught me how to be successful in both my ALCF support "day job" and my research "side job". Ironically, I have never co-authored a paper with Ray, but his support made possible some of my most cited papers with Larry Curtiss, Eugene DePrince and Pavan.

In the second half of my tenure at Argonne, I became involved in some of the more facility-oriented activities of ALCF, including the acceptance of the Blue Gene/Q system and the specification of DOE's pre-exascale systems as part of the CORAL-1 program. This allowed me to become acquainted with a number of HPC vendors and with the technical and nontechnical aspects of designing and buying a very expensive supercomputer years in advance of its existence. I learned about technical topics like "hardware-software co-design" and nontechnical topics like "using a statement-of-work to hold your vendor accountable."

The other thing that happened to me in my last two years in DOE was a string of rejections to proposals I submitted, many of which were unrelated to merit but rather due to me violating unwritten political criteria such as "we must fund the oldest white men first", "you can't do that research at that lab", and "it doesn't matter how good your proposal is if the program office wants to fund another branch of science." Furthermore, even when I was successful, I was kicked off of projects for being a chemist, even when my contributions were computer science. Finally, my efforts to secure an Early Career grant were impeded because, as someone at a DOE user facility rather than a research division, "I was not a problem in need of solving" (i.e. I was covered by block funding and the lab didn't need me to get any grants).

At the end of this long series of rejections from the program office, I concluded that they were never going to allow me to have a research career and that I might as well focus on the facility work I was doing, which included many things that I enjoyed. In hindsight, this was a blessing, because it freed me from any long-term obligations to DOE and relieved me of the burden of thinking about the politically driven shenanigans associated with research funding. It also set the stage for my departure to industry, which never would have happened if I had been co-PI of an exascale co-design center or SciDAC project, as I had wanted so badly.

Intel

In early 2014, Tim Mattson wrote me with an invitation to join his group at Intel Labs. We had a very long email conversation related to my requirements, which included (1) never using Windows, (2) open-sourcing as much code as possible, and (3) continuing to have no formal working hours or location. The most important factor in all of this was relocation to Portland, Oregon, which was one of the very few places on earth that would motivate me to leave Argonne. There was a time when I thought I'd spend my entire career in DOE, but between the abuse of government by Ted Cruz and the total capitulation of Steve Chu to penny-wise, pound-foolish budget obsession, I was more than happy to take my chances with the private sector.

When I interviewed with the Intel Parallel Computing Lab, they didn't provide much in terms of guidance about what I'd be doing. At one point, I asked whether they wanted me to focus on computational chemistry, parallel programming models, or something else. The answer was something along the lines of, "all of that sounds good to us." I've always been fortunate to have a great deal of freedom in what I do and how I do it, so this was appreciated, although it forced me to take a small leap of faith as to whether or not they meant it. Fortunately, they did mean it, and in my 6+ years at Intel, I have worked under an 80-20 rule, wherein I control about 80% of my time, and somebody else controls about 20% of my time.

I'll write more about my time at Intel in another post. Once I do that, I also need to write about working at NVIDIA.

(c) Copyright Jeff Hammond, 2023. No reuse permitted except by permission from the author.