Error: failed command 'bin/omnibus build td-agent3' #20

Open
prakashsurya opened this issue Feb 27, 2019 · 11 comments

Comments

@prakashsurya
Contributor

We hit a failure in the nightly build here

The error message shows this:

07:35:25 Progress: |    [NetFetcher: ncurses] I | 2019-02-27T15:35:12+00:00 | Retrying failed download due to Net::OpenTimeout (3 retries left)...
07:35:25 
07:36:22 Progress: |    [NetFetcher: ncurses] I | 2019-02-27T15:36:14+00:00 | Retrying failed download due to Net::OpenTimeout (2 retries left)...
07:36:22 
07:37:20 Progress: |    [NetFetcher: ncurses] I | 2019-02-27T15:37:14+00:00 | Retrying failed download due to Net::OpenTimeout (1 retries left)...
07:37:20 
07:38:17 Progress: |    [NetFetcher: ncurses] E | 2019-02-27T15:38:14+00:00 | Download failed - Net::OpenTimeout!
07:38:17 #<Thread:0x00005630f0757a70@/home/ubuntu/.bundle/ruby/2.5.0/omnibus-1cfcf1714d60/lib/omnibus/thread_pool.rb:57 run> terminated with exception (report_on_exception is true):
07:38:17 /usr/lib/ruby/2.5.0/net/http.rb:937:in `initialize': execution expired (Net::OpenTimeout)
07:38:17 	from /usr/lib/ruby/2.5.0/net/http.rb:937:in `open'
07:38:17 	from /usr/lib/ruby/2.5.0/net/http.rb:937:in `block in connect'
07:38:17 	from /usr/lib/ruby/2.5.0/timeout.rb:103:in `timeout'
07:38:17 	from /usr/lib/ruby/2.5.0/net/http.rb:935:in `connect'
07:38:17 	from /usr/lib/ruby/2.5.0/net/http.rb:920:in `do_start'
07:38:17 	from /usr/lib/ruby/2.5.0/net/http.rb:909:in `start'
07:38:17 	from /usr/lib/ruby/2.5.0/open-uri.rb:337:in `open_http'
07:38:17 	from /usr/lib/ruby/2.5.0/open-uri.rb:755:in `buffer_open'
07:38:17 	from /usr/lib/ruby/2.5.0/open-uri.rb:226:in `block in open_loop'
07:38:17 	from /usr/lib/ruby/2.5.0/open-uri.rb:224:in `catch'
07:38:17 	from /usr/lib/ruby/2.5.0/open-uri.rb:224:in `open_loop'
07:38:17 	from /usr/lib/ruby/2.5.0/open-uri.rb:165:in `open_uri'
07:38:17 	from /home/ubuntu/.bundle/ruby/2.5.0/omnibus-1cfcf1714d60/lib/omnibus/core_extensions/open_uri.rb:51:in `open_uri'
07:38:17 	from /usr/lib/ruby/2.5.0/open-uri.rb:735:in `open'
07:38:17 	from /usr/lib/ruby/2.5.0/open-uri.rb:35:in `open'
07:38:17 	from /home/ubuntu/.bundle/ruby/2.5.0/omnibus-1cfcf1714d60/lib/omnibus/download_helpers.rb:80:in `download_file!'
07:38:17 	from /home/ubuntu/.bundle/ruby/2.5.0/omnibus-1cfcf1714d60/lib/omnibus/fetchers/net_fetcher.rb:175:in `download'
07:38:17 	from /home/ubuntu/.bundle/ruby/2.5.0/omnibus-1cfcf1714d60/lib/omnibus/fetchers/net_fetcher.rb:86:in `fetch'
07:38:17 	from /home/ubuntu/.bundle/ruby/2.5.0/omnibus-1cfcf1714d60/lib/omnibus/software.rb:888:in `fetch'
07:38:17 	from /home/ubuntu/.bundle/ruby/2.5.0/omnibus-1cfcf1714d60/lib/omnibus/project.rb:1066:in `block (3 levels) in download'
07:38:17 	from /home/ubuntu/.bundle/ruby/2.5.0/omnibus-1cfcf1714d60/lib/omnibus/thread_pool.rb:64:in `block (4 levels) in initialize'
07:38:17 	from /home/ubuntu/.bundle/ruby/2.5.0/omnibus-1cfcf1714d60/lib/omnibus/thread_pool.rb:62:in `loop'
07:38:17 	from /home/ubuntu/.bundle/ruby/2.5.0/omnibus-1cfcf1714d60/lib/omnibus/thread_pool.rb:62:in `block (3 levels) in initialize'
07:38:17 	from /home/ubuntu/.bundle/ruby/2.5.0/omnibus-1cfcf1714d60/lib/omnibus/thread_pool.rb:61:in `catch'
07:38:17 	from /home/ubuntu/.bundle/ruby/2.5.0/omnibus-1cfcf1714d60/lib/omnibus/thread_pool.rb:61:in `block (2 levels) in initialize'
07:38:17 /usr/lib/ruby/2.5.0/net/http.rb:937:in `initialize': execution expired (Net::OpenTimeout)
07:38:17 	from /usr/lib/ruby/2.5.0/net/http.rb:937:in `open'
07:38:17 	from /usr/lib/ruby/2.5.0/net/http.rb:937:in `block in connect'
07:38:17 	from /usr/lib/ruby/2.5.0/timeout.rb:103:in `timeout'
07:38:17 	from /usr/lib/ruby/2.5.0/net/http.rb:935:in `connect'
07:38:17 	from /usr/lib/ruby/2.5.0/net/http.rb:920:in `do_start'
07:38:17 	from /usr/lib/ruby/2.5.0/net/http.rb:909:in `start'
07:38:17 	from /usr/lib/ruby/2.5.0/open-uri.rb:337:in `open_http'
07:38:17 	from /usr/lib/ruby/2.5.0/open-uri.rb:755:in `buffer_open'
07:38:17 	from /usr/lib/ruby/2.5.0/open-uri.rb:226:in `block in open_loop'
07:38:17 	from /usr/lib/ruby/2.5.0/open-uri.rb:224:in `catch'
07:38:17 	from /usr/lib/ruby/2.5.0/open-uri.rb:224:in `open_loop'
07:38:17 	from /usr/lib/ruby/2.5.0/open-uri.rb:165:in `open_uri'
07:38:17 	from /home/ubuntu/.bundle/ruby/2.5.0/omnibus-1cfcf1714d60/lib/omnibus/core_extensions/open_uri.rb:51:in `open_uri'
07:38:17 	from /usr/lib/ruby/2.5.0/open-uri.rb:735:in `open'
07:38:17 	from /usr/lib/ruby/2.5.0/open-uri.rb:35:in `open'
07:38:17 	from /home/ubuntu/.bundle/ruby/2.5.0/omnibus-1cfcf1714d60/lib/omnibus/download_helpers.rb:80:in `download_file!'
07:38:17 	from /home/ubuntu/.bundle/ruby/2.5.0/omnibus-1cfcf1714d60/lib/omnibus/fetchers/net_fetcher.rb:175:in `download'
07:38:17 	from /home/ubuntu/.bundle/ruby/2.5.0/omnibus-1cfcf1714d60/lib/omnibus/fetchers/net_fetcher.rb:86:in `fetch'
07:38:17 	from /home/ubuntu/.bundle/ruby/2.5.0/omnibus-1cfcf1714d60/lib/omnibus/software.rb:888:in `fetch'
07:38:17 	from /home/ubuntu/.bundle/ruby/2.5.0/omnibus-1cfcf1714d60/lib/omnibus/project.rb:1066:in `block (3 levels) in download'
07:38:17 	from /home/ubuntu/.bundle/ruby/2.5.0/omnibus-1cfcf1714d60/lib/omnibus/thread_pool.rb:64:in `block (4 levels) in initialize'
07:38:17 	from /home/ubuntu/.bundle/ruby/2.5.0/omnibus-1cfcf1714d60/lib/omnibus/thread_pool.rb:62:in `loop'
07:38:17 	from /home/ubuntu/.bundle/ruby/2.5.0/omnibus-1cfcf1714d60/lib/omnibus/thread_pool.rb:62:in `block (3 levels) in initialize'
07:38:17 	from /home/ubuntu/.bundle/ruby/2.5.0/omnibus-1cfcf1714d60/lib/omnibus/thread_pool.rb:61:in `catch'
07:38:17 	from /home/ubuntu/.bundle/ruby/2.5.0/omnibus-1cfcf1714d60/lib/omnibus/thread_pool.rb:61:in `block (2 levels) in initialize'
07:38:17 Error: failed command 'bin/omnibus build td-agent3'
07:38:17 Error: failed command './buildpkg.sh td-agent'
07:38:17 Error: failed command './buildall.sh'
@jgallag88
Contributor

The build of this package seems to be downloading artifacts from a number of different sites:

$ cat consoleText | grep 'Downloading from'
   [NetFetcher: jemalloc] I | 2019-02-27T15:29:33+00:00 | Downloading from `https://github.com/jemalloc/jemalloc/releases/download/4.5.0/jemalloc-4.5.0.tar.bz2'
       [NetFetcher: zlib] I | 2019-02-27T15:29:33+00:00 | Downloading from `https://zlib.net/fossils/zlib-1.2.11.tar.gz'
    [NetFetcher: cacerts] I | 2019-02-27T15:29:33+00:00 | Downloading from `https://curl.haxx.se/ca/cacert-2018-12-05.pem'
     [NetFetcher: xproto] I | 2019-02-27T15:29:33+00:00 | Downloading from `https://www.x.org/releases/individual/proto/xproto-7.0.25.tar.gz'
[NetFetcher: util-macros] I | 2019-02-27T15:29:33+00:00 | Downloading from `https://www.x.org/releases/individual/util/util-macros-1.18.0.tar.gz'
[NetFetcher: pkg-config-lite] I | 2019-02-27T15:29:34+00:00 | Downloading from `http://downloads.sourceforge.net/project/pkgconfiglite/0.28-1/pkg-config-lite-0.28-1.tar.gz'
 [NetFetcher: makedepend] I | 2019-02-27T15:29:34+00:00 | Downloading from `https://www.x.org/releases/individual/util/makedepend-1.0.5.tar.gz'
    [NetFetcher: openssl] I | 2019-02-27T15:29:34+00:00 | Downloading from `https://www.openssl.org/source/openssl-1.0.2q.tar.gz'
    [NetFetcher: ncurses] I | 2019-02-27T15:29:34+00:00 | Downloading from `https://ftp.gnu.org/gnu/ncurses/ncurses-5.9.tar.gz'
    [NetFetcher: libedit] I | 2019-02-27T15:29:34+00:00 | Downloading from `http://www.thrysoee.dk/editline/libedit-20120601-3.0.tar.gz'
    [NetFetcher: libtool] I | 2019-02-27T15:29:35+00:00 | Downloading from `https://ftp.gnu.org/gnu/libtool/libtool-2.4.tar.gz'
     [NetFetcher: libffi] I | 2019-02-27T15:29:37+00:00 | Downloading from `ftp://sourceware.org/pub/libffi/libffi-3.2.1.tar.gz'
    [NetFetcher: libyaml] I | 2019-02-27T15:29:38+00:00 | Downloading from `http://pyyaml.org/download/libyaml/yaml-0.1.7.tar.gz'
   [NetFetcher: libiconv] I | 2019-02-27T15:29:38+00:00 | Downloading from `https://ftp.gnu.org/pub/gnu/libiconv/libiconv-1.15.tar.gz'
       [NetFetcher: ruby] I | 2019-02-27T15:29:38+00:00 | Downloading from `https://cache.ruby-lang.org/pub/ruby/2.4/ruby-2.4.5.tar.gz'
    [NetFetcher: liblzma] I | 2019-02-27T15:29:39+00:00 | Downloading from `http://tukaani.org/xz/xz-5.2.3.tar.gz'
    [NetFetcher: libxml2] I | 2019-02-27T15:29:39+00:00 | Downloading from `ftp://xmlsoft.org/libxml2/libxml2-2.9.8.tar.gz'
    [NetFetcher: libxslt] I | 2019-02-27T15:29:41+00:00 | Downloading from `ftp://xmlsoft.org/libxml2/libxslt-1.1.30.tar.gz'
   [NetFetcher: rubygems] I | 2019-02-27T15:29:43+00:00 | Downloading from `http://production.cf.rubygems.org/rubygems/rubygems-2.6.14.tgz'
 [NetFetcher: postgresql] I | 2019-02-27T15:29:43+00:00 | Downloading from `https://ftp.postgresql.org/pub/source/v9.6.9/postgresql-9.6.9.tar.bz2'
$ cat consoleText | grep 'Fetching from'
==========[GitFetcher: config_guess] I | 2019-02-27T15:29:33+00:00 | Fetching from `https://github.com/chef/config-mirror.git'
    [GitFetcher: fluentd] I | 2019-02-27T15:29:45+00:00 | Fetching from `https://github.com/fluent/fluentd.git'
 [GitFetcher: splunk-hec] I | 2019-02-27T15:29:45+00:00 | Fetching from `https://github.com/delphix/fluent-plugin-splunk-hec.git'

We will need a way to mirror these if we want repeatable builds
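To size the mirroring problem, the same log can be reduced to the distinct hosts it touches. This is a minimal sketch, assuming the log is saved as consoleText as in the commands above; list_download_hosts is a hypothetical helper name and not part of the build scripts.

```shell
# list_download_hosts LOGFILE
# Print each distinct host that appears in the NetFetcher/GitFetcher
# "Downloading from" / "Fetching from" lines of an omnibus build log.
# (Hypothetical helper, for illustration only.)
list_download_hosts() {
    grep -E "(Downloading|Fetching) from" "$1" |
        # Keep only the host part of scheme://host/path; the "." after
        # "from " matches the opening backtick that omnibus prints.
        sed -E "s#.*from .[a-z]+://([^/']+).*#\1#" |
        sort -u
}

# Example: list_download_hosts consoleText
```

Each host printed is one upstream that a mirror (or a cache of the downloaded tarballs) would have to cover.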

@prakashsurya
Contributor Author

Looking at the build script here

It looks like these downloads are intentional; e.g.

	# Ensure all required gems are installed
	logmust bundle install --binstubs
	# Download dependent gems using downloader
	logmust bin/gem_downloader core_gems.rb
	logmust bin/gem_downloader delphix_plugin_gems.rb

There are lots of download statements in these two files: core_gems.rb and delphix_plugin_gems.rb.

@prakashsurya
Contributor Author

The way we rebuild all packages from source, coupled with the fact that building each package can be inherently unreliable (e.g. due to dependencies like this), is concerning. It puts us back in the situation where a build failure in any one of the projects included in the "linux-pkg" framework can cause problems building our appliance.

This was the reason we opted to consume packages in the new appliance-build system, so that no single project could cause problems for the appliance build as a whole, but I think we've regressed on this goal due to the linux-pkg build architecture.

@pzakha
Contributor

pzakha commented Feb 27, 2019

We will need a way to mirror these if we want repeatable builds

@jgallag88 I had raised this issue with @prashks during the original review, and he pointed out that every package is versioned so we should achieve repeatable builds. That doesn't mean that the builds are reliable though.

This was the reason we opted to consume packages in the new appliance-build system, so that no single project could cause problems for the appliance build as a whole, but I think we've regressed on this goal due to the linux-pkg build architecture.

@prakashsurya Since we are now moving towards a larger number of packages, there is a trade-off to be made. The idea behind linux-pkg was to build packages that do not see many changes from the team, meaning that a failure of linux-pkg affects only a very small part of the team. We are seeing a lot of changes to the packages managed by linux-pkg right now, but I predict that this will be greatly reduced eventually. Right now a failure of the linux-pkg build doesn't really impact the rest of the appliance-build (you can still test changes to the app-gate, zfs, masking). That said, I do see 2 issues with the way things currently are:

  1. We are making a lot of changes to delphix-platform right now, and a breakage of linux-pkg affects everyone working on that package.
  2. We are currently very vulnerable to changes in the kernel version, which can cause breakage of the whole product (very bad).

I have some ideas on how to reduce the impact of the first issue. As for the second issue, it is not really related to linux-pkg or how we build our packages; this issue should be fixed by the package mirror, although I have some ideas on how we could fix it even before we have the mirror.

@prashks
Contributor

prashks commented Feb 27, 2019

The downloads done here are necessary to build this package, and they use the popular rubygems.org site. It looks like this network connection issue could well be on our side (according to https://www.isitdownrightnow.com/rubygems.org.html, the site was last down more than a week ago).
So we'll need to have checks on our infrastructure as well for reliability.

In any case, I agree that it's prudent to have a local mirror for the dependencies here. I'll touch base with DevOps on that and eventually point the build of this package to such a local mirror.

@pzakha
Contributor

pzakha commented Feb 27, 2019

Looking at the output John pasted (#20 (comment)), it seems to copy from a bunch of sites, so setting up a mirror for this might prove problematic.

@prashks
Contributor

prashks commented Feb 28, 2019

OK, I see your point @pzakha.
So, speaking for this particular package, we don't need it to be built every time. The only times I can think of for now are:

  • if upstream/master changed
  • if master had new changes (pushes)
  • maybe if we're building for a different kernel

So, thinking out loud, how about we either make the framework do that or provide a flag/tunable that lets each package choose to skip its build?
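One way to picture such a check is to fold the three conditions above into a single fingerprint and rebuild only when it changes. This is just a sketch under stated assumptions: needs_rebuild, record_build and the stamp file are hypothetical names, and nothing like this exists in the framework today.

```shell
# needs_rebuild STAMP_FILE FINGERPRINT
# Succeeds (exit 0) when the package should be rebuilt: either no previous
# build was recorded, or the recorded fingerprint differs from the current one.
# (Hypothetical helper, for illustration only.)
needs_rebuild() {
    stamp_file="$1"
    fingerprint="$2"
    [ ! -f "$stamp_file" ] || [ "$(cat "$stamp_file")" != "$fingerprint" ]
}

# record_build STAMP_FILE FINGERPRINT
# Remember the fingerprint of the build that just succeeded.
record_build() {
    printf '%s\n' "$2" > "$1"
}

# Assumed usage inside a package checkout that has an "upstream" remote:
#   fp="$(git rev-parse upstream/master):$(git rev-parse master):$(uname -r)"
#   if needs_rebuild .last-build "$fp"; then
#       ./buildpkg.sh td-agent && record_build .last-build "$fp"
#   fi
```

The fingerprint changes whenever upstream/master moves, master gets new pushes, or the kernel version differs, which covers the three cases listed.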

@prakashsurya
Contributor Author

@prashks IIRC, that's true for all packages in this framework. Unfortunately, I don't think that'll work due to the framework of this linux-pkg repository, and how it interacts with appliance-build.

@pzakha
Contributor

pzakha commented Feb 28, 2019

So, thinking out loud, how about we either make the framework do that or provide a flag/tunable that lets each package choose to skip its build?

It's definitely doable but that would introduce extra complexity in the build, and extra potential issues. For instance, we can modify the framework so that it pushes the artifacts for each package to its own directory and have linux-pkg fetch the latest artifacts for a given package from S3 if it fails to build that package.

So it's definitely possible, even with the current framework, but we would need to evaluate the pros and the cons of that approach.
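The fallback described above could look roughly like the following. It is only a sketch: BUILD_CMD and FETCH_CMD are hypothetical hooks (the real framework would need its own step to pull the latest artifacts from S3), and ./buildpkg.sh is used as the default builder only because it appears in the failing log.

```shell
# build_with_fallback PACKAGE
# Try to build PACKAGE from source; if that fails, fall back to fetching the
# most recently published artifacts for it. Fails only when both steps fail.
# BUILD_CMD and FETCH_CMD are hypothetical hooks, for illustration only.
build_with_fallback() {
    pkg="$1"
    if "${BUILD_CMD:-./buildpkg.sh}" "$pkg"; then
        echo "built $pkg from source"
    elif "${FETCH_CMD:-false}" "$pkg"; then
        # With no FETCH_CMD configured, "false" makes the fallback fail.
        echo "build of $pkg failed; using last published artifacts"
    else
        echo "build of $pkg failed and no artifacts could be fetched" >&2
        return 1
    fi
}
```

One cost of this design is that a broken package can go unnoticed for a while, because the overall build keeps succeeding on stale artifacts, so the fallback path would need to be reported loudly.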

@prakashsurya
Contributor Author

For instance, we can modify the framework so that it pushes the artifacts for each package to its own directory and have linux-pkg fetch the latest artifacts for a given package from S3 if it fails to build that package.

This sounds awfully similar to how we do it for non-linux-pkg packages.

@prashks
Contributor

prashks commented Feb 28, 2019

For instance, we can modify the framework so that it pushes the artifacts for each package to its own directory and have linux-pkg fetch the latest artifacts for a given package from S3 if it fails to build that package.

Yeah, it's not a bad idea for the framework to have a fallback for artifacts. Instead of pushing artifacts to a package's own directory, how about using our internal artifactory.delphix.com, where other packages already live?

So it's definitely possible, even with the current framework, but we would need to evaluate the pros and the cons of that approach.

Agreed, we definitely need to evaluate all the pros and cons, thanks.
