NGINX Amplify Archives - NGINX
https://www.nginx.com/blog/tag/nginx-amplify/
The High Performance Reverse Proxy, Load Balancer, Edge Cache, Origin Server
Wed, 10 May 2023 20:53:41 +0000

The Future of NGINX: Getting Back to Our Open Source Roots
https://www.nginx.com/blog/future-of-nginx-getting-back-to-our-open-source-roots/
Tue, 23 Aug 2022 15:30:25 +0000

The post The Future of NGINX: Getting Back to Our Open Source Roots appeared first on NGINX.

Time flies when you’re having fun. So it’s hard to believe that NGINX is now 18 years old. Looking back, the community and company have accomplished a lot together. We recently hit a huge milestone – as of this writing 55.6% of all websites are powered by NGINX (either by our own software or by products built atop NGINX). We are also the number one web server by market share. We are very proud of that and grateful that you, the NGINX community, have given us this resounding vote of confidence.

We also recognize, more and more, that open source software continues to change the world. A larger and larger percentage of applications are built using open source code. From Bloomberg terminals and news to the Washington Post to Slack to Airbnb to Instagram and Spotify, thousands of the world’s most recognizable brands and properties rely on NGINX Open Source to power their websites. In my own life – between Zoom for work meetings and Netflix at night – I probably spend 80% of my day using applications built atop NGINX.

Image reading "Open Source Software Changed the World" with logos of prominent open source projects

NGINX is only one element in the success story of open source. We would not be able to build the digital world – and increasingly, to control and manage the physical world – without all the amazing open source projects, from Kubernetes and containers to Python and PyTorch, from WordPress to Postgres to Node.js. Open source has changed the way we work. There are more than 73 million developers on GitHub who have collectively merged more than 170 million pull requests (PRs). A huge percentage of those PRs have been on code repositories with open source licenses.

We are thrilled that NGINX has played such a fundamental role in the rise and success of open source – and we intend to both keep it going and pay it forward. At the same time, we need to reflect on our open source work and adapt to the ongoing evolution of the movement. Business models for companies profiting from open source have become controversial at times. This is why NGINX has always tried to be super clear about what is open source and what is commercial. Above all, this meant never, ever trying to charge for functionality or capabilities that we had included in the open source versions of our software.

Open Source is Evolving Fast. NGINX Is Evolving, Too.

We now realize that we need to think hard about our commitment to open source, provide more value and capabilities in our open source products, and, yes, up our game in the commercial realm as well. We can’t simply keep charging for the same things as in the past, because the world has changed – some features included only in our commercial products are now table stakes for open source developers. We also know that open source security is top of mind for developers. For that reason, our open source projects need to be just as secure as our commercial products.

We also have to acknowledge reality. Internally, we have had a habit of saying that open source was not really production‑ready because it lacked features or scalability. The world has been proving us wrong on that count for some time now: many thousands of organizations are running NGINX open source software in production environments. And that’s a good thing, because it shows how much they believe in our open source versions. We can build on that.

In fact, we are doing that constantly with our core products. To those who say that the original NGINX family of products has grown long in the tooth, I say you have not been watching us closely:

  • For the core NGINX Open Source software, we continue to add new features and functionality and to support more operating system platforms. Two critical capabilities for the security and scalability of web applications and traffic, HTTP/3 and QUIC, are coming in the next version we ship.
  • A quiet but incredibly innovative corner of the NGINX universe is NGINX JavaScript (njs), which enables developers to integrate JavaScript code into the event‑processing model of the NGINX HTTP and TCP/UDP (Stream) modules and extend NGINX configuration syntax to implement sophisticated capabilities. Our users have done some pretty amazing things, everything from innovative cache purging and header manipulations to support for more advanced protocols like MQTTv5.
  • Our universal web application server, NGINX Unit, was conceived by the original author of NGINX Open Source, Igor Sysoev, and it continues to evolve. Unit occupies an important place in our vision for modern applications and a modern application stack that goes well beyond our primary focus on the data plane and security. As we develop Unit, we are rethinking how applications should be architected for the evolving Web, with more capabilities that are cloud‑native and designed for distributed and highly dynamic apps.

The Modern Apps Reference Architecture

We want to continue experimenting and pushing forward on ways to help our core developer constituency more efficiently and easily deploy modern applications. Last year at Sprint 2.0 we announced the NGINX Modern Apps Reference Architecture (MARA), and I am happy to say it recently went into general availability as version 1.0.0. MARA is a curated and opinionated stack of tools, including Kubernetes, that we have wired together to make it easy to deploy infrastructure and application architecture as code. With a few clicks, you can configure and deploy a MARA reference architecture that is integrated with everything you need to create a production‑grade, cloud‑native environment – security, logging, networking, application server, configuration and YAML management, and more.

Diagram showing topology of the NGINX Modern Apps Reference Architecture

MARA is modular by design. You can choose your own adventure, assembling from the existing modules a customized reference architecture that can actually run applications. The community has supported our idea and we have partnered with a number of innovative technology companies on MARA. Sumo Logic has added their logging chops to MARA and Pulumi has contributed modules for automation and workflow orchestration. Our hope is that, with MARA, any developer can get a full Kubernetes environment up and running in a matter of minutes, complete with all the supporting pieces, secure, and ready for app deployment. This is just one example of how I think we can all put our collective energy into advancing a big initiative in the industry.

The Future of NGINX: Modernize, Optimize, Extend

Each year at NGINX Sprint, our virtual user conference, we make new commitments for the coming year. This year is no different. Our promises for the next twelve months can be captured in three words: modernize, optimize, and extend. We intend to make sure these are not just business buzzwords; we have substantial programs for each one and we want you to hold us to our promises.

Promise #1: Modernize Our Approach, Presence, and Community Management

Obviously, we are rapidly modernizing our code and introducing new products and projects. But modernization is not just about code – it encompasses code management, transparency around decision making, and how we show up in the community. While historically the NGINX Open Source code base has run on the Mercurial version control system, we recognize that the open source world now lives on GitHub. Going forward, all NGINX projects will be born and hosted on GitHub because that’s where the developer and open source communities work.

We also are going to modernize how we govern and manage NGINX projects. We pledge to be more open to contributions, more transparent in our stewardship, and more approachable to the community. We will follow all expected conventions for modern open source work and will be rebuilding our GitHub presence, adding Codes of Conduct to all our projects, and paying close attention to community feedback. As part of this commitment to modernize, we are adding an NGINX Community channel on Slack. We will staff the channel with our own experts to answer your questions. And you, the community, will be there to help each other, as well – in the messaging tool you already use for your day jobs.

Promise #2: Optimize Our Developer Experience

Developers are our primary users. They build and create the applications that have made us who we are. Our tenet has always been that NGINX is easy to use. And that’s basically true – NGINX does not take days to install, spin up, and configure. That said, we can do better. We can accelerate the “time to value” that developers experience on our products by making the learning curve shorter and the configuration process easier. By “value” I mean deploying code that does something truly valuable, in production, full stop. We are going to revamp our developer experience by streamlining the installation experience, improving our documentation, and adding coverage and heft to our community forums.

We are also going to release a new SaaS offering that natively integrates with NGINX Open Source and will help you make it useful and valuable in seconds. There will be no registration, no gate, no paywall. This SaaS will be free to use, forever.

In addition, we recognize that many critical features which developers now view as table stakes are on the wrong side of the paywall for NGINX Open Source and NGINX Plus. For example, DNS service discovery is essential for modern apps. Our promise is to make those critical features free by adding them to NGINX Open Source. We haven’t yet decided on all of the features to move and we want your input. Tell us how to optimize your experience as developers. We are listening.

Promise #3: Extend the Power and Capabilities of NGINX

As popular as NGINX is today, we know we need to keep improving if we want to be just as relevant ten years from now. Our ambitious goal is this: we want to create a full stack of NGINX applications and supporting capabilities for managing and operating modern applications at scale.

To date, NGINX has mostly been used as a Layer 7 data plane. But developers have to put up a lot of scaffolding around NGINX to make it work. You have to wire up automation and CI/CD capabilities, set up proper logging, add authentication and certificate management, and more. We want to make a much better extension of NGINX where every major requirement to test and deploy an app is satisfied by one or more high‑quality open source components that seamlessly integrate with NGINX. In short, we want to provide value at every layer of the stack and make it free. For example, if you are using NGINX Open Source or NGINX Plus as an API gateway, we want to make sure you have everything you need to manage and scale that use case – API import, service discovery, firewall, policy rules and security – all available as high‑quality open source options.

To summarize, our dream is to build an ecosystem around NGINX that extends into every facet of application management and deployment. MARA is the first step in building that ecosystem and we want to continue to attract partners. My goal is to see, by the end of 2022, an entire pre‑wired app launch and run in minutes in an NGINX environment, instrumented with a full complement of capabilities – distributed tracing, logging, autoscaling, security, CI/CD hooks – that are all ready to do their jobs.

Introducing Kubernetes API Gateway, a Brand New Amplify, and NGINX Agent

We are committed to all this. And here are three down payments on my three promises.

  1. Earlier this year we launched NGINX Kubernetes Gateway, based on the Kubernetes community’s Gateway API specification. This modernizes our product family and keeps us in line with the ongoing evolution of cloud native. The NGINX Kubernetes Gateway is also something of an olive branch we’re extending to the community. We realize it complicated matters when we created both a commercial and an open source Ingress controller for Kubernetes, both different from the community Ingress solution (also built on NGINX). The range of choices confused the community and put us in a bad position.

    It’s pretty clear that the Gateway API is going to take the place of the Ingress controller in the Kubernetes architecture. So we are changing our approach and will make the NGINX Kubernetes Gateway – which will be offered only as an open source product – the focal point of our Kubernetes networking efforts (in lockstep with the evolving standard). It will both integrate and extend into other NGINX products and optimize the developer experience on Kubernetes.

  2. A few years back, we launched NGINX Amplify, a monitoring and telemetry SaaS offering for NGINX fleets. We didn’t really publicize it much. But thousands of developers found it and are still using it today. Amplify was and remains free. As part of our modernization pledge, we are adding a raft of new capabilities to Amplify. We aim to make it your trusted co‑pilot for standing up, watching over, and managing NGINX products at scale in real time. Amplify will not only monitor your NGINX instances but will help you configure, apply scripts to, and troubleshoot NGINX deployments.
  3. We are launching NGINX Agent, a lightweight app that you deploy alongside NGINX Open Source instances. It will include features previously offered only in commercial products, for example the dynamic configuration API. With NGINX Agent, you’ll be able to use NGINX Open Source in many more use cases and with far greater flexibility. It will also include far more granular controls which you can use to extend your applications and infrastructure. Agent helps you make smarter decisions about managing, deploying, and configuring NGINX. We’re working hard on NGINX Agent – keep an eye out for a blog coming in the next couple of months to announce its availability!

Looking Ahead

In a year, I hope you ask me about these promises. If I can’t report real progress on all three, then hold me to it, please. And please understand – we are engaged and ready to talk with all of you. You are our best product roadmap. Please take our annual survey. Join NGINX Community Slack and tell us what you think. Comment and file PRs on the projects at our GitHub repo.

It’s going to be a great year, the best ever. We look forward to hearing more from you and please count on hearing more from us. Help us help you.

Monitoring MySQL with NGINX Amplify
https://www.nginx.com/blog/monitoring-mysql-nginx-amplify/
Mon, 23 Apr 2018 16:53:15 +0000

The post Monitoring MySQL with NGINX Amplify appeared first on NGINX.


The initial surge of web servers for the Internet tended to run the famous LAMP stack: Linux, Apache, MySQL, and PHP (or Perl). However, for higher‑performance sites, the LAMP stack is often replaced by the LEMP stack: Linux, NGINX (Engine‑x), MySQL, and PHP, Perl, and/or Python. Today, more of the world’s 1 million busiest websites use NGINX than any other server.

The use of NGINX instead of the Apache web server as the frontend to popular PHP applications like WordPress, Drupal, and Joomla enables more efficient utilization of the underlying server and the OS resources, and often manifests itself in the ability to serve at least ten times more users on the same hardware.

For instance, it’s common to see NGINX deployed with a popular PHP application. In this case, NGINX typically works as a local web accelerator and PHP‑FPM serves as the application server. This setup has proven extremely useful for offloading SSL termination, content caching, authentication, and other aspects of HTTP security from the PHP application to NGINX.
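
As an illustrative sketch of that pattern (the socket path, document root, and certificate paths below are assumptions, not taken from this post), an NGINX server block might terminate SSL and serve static content itself while handing PHP requests to PHP‑FPM:

```nginx
# Illustrative only: paths and names are assumptions.
server {
    listen 443 ssl;
    server_name example.com;

    ssl_certificate     /etc/nginx/ssl/example.com.crt;
    ssl_certificate_key /etc/nginx/ssl/example.com.key;

    root /var/www/html;

    # Serve static assets directly from disk, bypassing PHP entirely.
    location ~* \.(css|js|png|jpg|gif|ico)$ {
        expires 30d;
    }

    # Hand PHP requests to the local PHP-FPM application server.
    location ~ \.php$ {
        fastcgi_pass unix:/run/php/php-fpm.sock;
        fastcgi_index index.php;
        include fastcgi_params;
        fastcgi_param SCRIPT_FILENAME $document_root$fastcgi_script_name;
    }
}
```

With this split, PHP‑FPM workers are reserved for dynamic requests, which is a large part of where the efficiency gain comes from.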

In a previous release of NGINX Amplify, we added the ability to collect PHP‑FPM metrics. Now we’re announcing another useful plug‑in for Amplify, which collects and visualizes metrics for MySQL, making LEMP monitoring with Amplify complete. The same plug‑in works for other servers compatible with MySQL, such as MariaDB and Percona.

MySQL and compatible databases are very popular. MySQL is generally considered the #1 relational database, ahead of alternatives such as Microsoft SQL Server, PostgreSQL, and Oracle, as well as NoSQL stores like MongoDB. So the addition of MySQL metrics collection to Amplify will empower a great many users of NGINX.

Configuring the MySQL Plug-in

As when it monitors NGINX or PHP‑FPM, the Amplify agent detects the MySQL master process automatically once the plug‑in is installed and configured, and starts collecting metrics. If everything is set up properly, you immediately see a set of out-of-the-box graphs for MySQL in Amplify, along with a few useful extended metrics such as InnoDB buffer pool utilization.

Let’s proceed to the actual configuration process for the Amplify MySQL plug‑in. In order for the Amplify agent to monitor MySQL, you need to do the following:

  1. Create a new MySQL user for the Amplify agent, called amplify-agent.

    $ mysql -u root -p
    ...
    mysql> CREATE USER 'amplify-agent'@'localhost' IDENTIFIED BY 'YOUR_PASSWORD_HERE';
    Query OK, 0 rows affected (0.01 sec)

    where YOUR_PASSWORD_HERE is a secure password specifically for the amplify-agent user account. (Note this is NOT the password for the MySQL root user!)

  2. Verify that the amplify-agent user can read MySQL metrics.

    $ mysql -u amplify-agent -p
    ...
    mysql> show global status;
    +---------------------------+----------------------------------------+
    | Variable_name             | Value                                  |
    +---------------------------+----------------------------------------+
    | Aborted_clients           | 0                                      |
    ...
    | Uptime_since_flush_status | 1993                                   |
    +---------------------------+----------------------------------------+
    353 rows in set (0.01 sec)
  3. Update the Amplify agent software to the most recent version.

  4. Add the following to /etc/amplify-agent/agent.conf:

    [extensions]
    ..
    mysql = True
    
    [mysql]
    #host =
    #port =
    unix_socket = /var/run/mysqld/mysqld.sock
    user = amplify-agent
    password = YOUR_PASSWORD_HERE

    where YOUR_PASSWORD_HERE is the same as in Step 1.

  5. Restart the Amplify agent.

The agent is now able to detect the MySQL master process and collect the metrics.

Troubleshooting

For MySQL metrics collection to work, the Amplify agent must run in the same process environment as MySQL and be able to find the mysqld processes with ps(1). For example, if the MySQL server runs inside a Docker container on the host system where the Amplify agent is running, you need to add the Amplify agent to that Docker container.

Here’s a list of possible causes if the MySQL metrics aren’t being collected:

  • The MySQL instance isn’t local. At this time, you need to run the agent on the host where the MySQL server is started.
  • The amplify-agent user can’t query the global status metrics. You can easily check this with the mysql(1) client, and fix the permissions if necessary.

If checking the above issues doesn’t help, enable the Amplify agent’s debug log, restart the agent, wait a few minutes, and then create an issue via the Intercom chat button in the bottom‑right corner of the Amplify window. Attach the log to the Intercom chat. We’ll be happy to help.

MySQL Metrics in Amplify

Below is the list of the MySQL metrics that are currently supported in Amplify. The agent retrieves most of the metrics from the MySQL global status variables; metrics marked "derived" are computed by the agent from those counters.

  • mysql.global.connections – Number of connection attempts to the MySQL server (successful or not). Source: SHOW GLOBAL STATUS LIKE "Connections";
  • mysql.global.questions – Number of statements executed by the server; see the MySQL reference manual for details. Source: SHOW GLOBAL STATUS LIKE "Questions";
  • mysql.global.select – Number of SELECT statements executed. Source: SHOW GLOBAL STATUS LIKE "Com_select";
  • mysql.global.insert – Number of INSERT statements executed. Source: SHOW GLOBAL STATUS LIKE "Com_insert";
  • mysql.global.update – Number of UPDATE statements executed. Source: SHOW GLOBAL STATUS LIKE "Com_update";
  • mysql.global.delete – Number of DELETE statements executed. Source: SHOW GLOBAL STATUS LIKE "Com_delete";
  • mysql.global.writes – Sum of the preceding insert, update, and delete counters (derived).
  • mysql.global.commit – Number of COMMIT statements executed. Source: SHOW GLOBAL STATUS LIKE "Com_commit";
  • mysql.global.slow_queries – Number of queries that have taken more than long_query_time seconds. Source: SHOW GLOBAL STATUS LIKE "Slow_queries";
  • mysql.global.uptime – Number of seconds that the server has been up. Source: SHOW GLOBAL STATUS LIKE "Uptime";
  • mysql.global.aborted_connects – Number of failed attempts to connect to the MySQL server. Source: SHOW GLOBAL STATUS LIKE "Aborted_connects";
  • mysql.global.innodb_buffer_pool_read_requests – Number of logical reads that InnoDB could not satisfy from the buffer pool, and had to read directly from disk. Source: SHOW GLOBAL STATUS LIKE "Innodb_buffer_pool_reads";
  • mysql.global.innodb_buffer_pool_hit_ratio – Hit ratio reflecting the efficiency of the InnoDB buffer pool (derived).
  • mysql.global.innodb_buffer_pool_pages_total – Total size of the InnoDB buffer pool, in pages. Source: SHOW GLOBAL STATUS LIKE "Innodb_buffer_pool_pages_total";
  • mysql.global.innodb_buffer_pool_pages_free – Number of free pages in the InnoDB buffer pool. Source: SHOW GLOBAL STATUS LIKE "Innodb_buffer_pool_pages_free";
  • mysql.global.innodb_buffer_pool_util – InnoDB buffer pool utilization (derived).
  • mysql.global.threads_connected – Number of currently open connections. Source: SHOW GLOBAL STATUS LIKE "Threads_connected";
  • mysql.global.threads_running – Number of threads that are not sleeping. Source: SHOW GLOBAL STATUS LIKE "Threads_running";
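
The two derived buffer‑pool metrics (hit ratio and utilization) are conventionally computed from the raw counters. The sketch below shows the standard calculation in Python; the exact formulas the Amplify agent uses are not spelled out in this post, so treat this as illustrative:

```python
# Conventional derived metrics from MySQL's InnoDB buffer pool counters.
# Counter names follow SHOW GLOBAL STATUS; the formulas are the standard
# ones, not necessarily byte-for-byte what the Amplify agent computes.

def buffer_pool_hit_ratio(read_requests: int, disk_reads: int) -> float:
    """Share of logical reads served from the buffer pool.

    read_requests: Innodb_buffer_pool_read_requests (logical reads)
    disk_reads:    Innodb_buffer_pool_reads (reads that went to disk)
    """
    if read_requests == 0:
        return 1.0  # no reads yet; report a perfect ratio
    return 1.0 - disk_reads / read_requests

def buffer_pool_utilization(pages_total: int, pages_free: int) -> float:
    """Fraction of buffer pool pages currently in use."""
    if pages_total == 0:
        return 0.0
    return (pages_total - pages_free) / pages_total

# Example values, as if read from SHOW GLOBAL STATUS:
print(buffer_pool_hit_ratio(1_000_000, 25_000))  # ≈ 0.975
print(buffer_pool_utilization(8192, 1024))       # 0.875
```

A hit ratio that drifts downward, or utilization pinned at 1.0, is the usual signal that the buffer pool is undersized for the working set.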

Among the metrics it makes sense to check periodically are the following:

  • Number of currently open connections
  • Number of executed MySQL statements (for example Com_select)
  • Number of slow queries
  • InnoDB pool efficiency
  • Overall MySQL availability

Conclusion

We hope you find it convenient to see the MySQL metrics in NGINX Amplify and to gain a broader view of application behavior. With the metrics for NGINX, the Linux OS, PHP‑FPM, and now MySQL, monitoring a LEMP stack with Amplify becomes complete.

We're planning to monitor even more application stack components in Amplify soon, so please keep in touch. If you have any suggestions, let us know.

Many thanks for using NGINX Amplify!

Monitoring NGINX
https://www.nginx.com/blog/monitoring-nginx/
Tue, 09 Jan 2018 04:52:46 +0000

The post Monitoring NGINX appeared first on NGINX.

Setting up a monitoring tool for NGINX is an important part of maintaining website operations. Proper NGINX monitoring can reveal a lot of useful information about the underlying application performance. There are quite a few monitoring systems out there suitable for the task; the first step, however, is to enable metric collection in NGINX.

Using the stub_status Module

There’s a module for NGINX Open Source called ngx_http_stub_status_module (or simply stub_status) that exposes a few important metrics about NGINX activity.

To check if your NGINX build has the stub_status module, run nginx -V:

$ nginx -V 2>&1 | grep --color -- --with-http_stub_status_module

All of our NGINX builds include the stub_status module on all supported platforms.

If your NGINX build does not include the stub_status module, you have to rebuild from source, passing the --with-http_stub_status_module parameter to the configure script.

As the next step, enable the module in your NGINX configuration by including the stub_status directive in a location block. You can always add the block to an existing server configuration. Alternatively, add a separate server block, with a single specialized location for the stub_status directive, as here:

server {
    listen 127.0.0.1:80;
    server_name 127.0.0.1;

    location /nginx_status {
        stub_status;
    }
}

Appropriate server blocks for the stub_status directive are sometimes found outside of the main configuration file (nginx.conf). If you don’t see a suitable block in that file, search for additional configuration files which are typically included in nginx.conf.

We also recommend that you allow only authorized users to access the metrics, for example by including the allow and deny directives in the location or server block.
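
A minimal sketch of such a restriction (the internal subnet below is illustrative, not from the original post):

```nginx
location /nginx_status {
    stub_status;
    allow 127.0.0.1;     # local health checks and the Amplify agent
    allow 10.0.0.0/24;   # illustrative internal monitoring subnet
    deny  all;           # everyone else gets 403 Forbidden
}
```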

After the stub_status module is configured, don’t forget to reload the NGINX configuration (with the service nginx reload command, for example). You can read more about NGINX control signals here.

To display the stub_status metrics, make a curl query. The following is appropriate for the configuration shown above:

$ curl http://127.0.0.1/nginx_status
Active connections: 2
server accepts handled requests
 841845 841845 1631067
Reading: 0 Writing: 1 Waiting: 1

If this doesn’t work, check where the requests to /nginx_status are routed. In many cases, another server block can be the reason why you can’t access the stub_status metrics. To read more about these instance‑wide NGINX metrics, see the reference documentation.

With the stub_status module enabled in NGINX and working, you can proceed with the installation and configuration of your monitoring system of choice.
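
If you prefer to script your own collection, the stub_status payload is easy to parse. Here is a minimal Python sketch that mirrors the sample output above (the key names are our own choice, not an official schema):

```python
# Parse the plain-text stub_status payload into a dict of counters.
def parse_stub_status(payload: str) -> dict:
    lines = payload.strip().splitlines()
    stats = {"active_connections": int(lines[0].split(":")[1])}
    # Line 3 holds the three counters announced on line 2:
    # accepts, handled, requests.
    accepts, handled, requests = (int(n) for n in lines[2].split())
    stats.update(accepts=accepts, handled=handled, requests=requests)
    # "Reading: 0 Writing: 1 Waiting: 1" -> reading/writing/waiting keys
    fields = lines[3].replace(":", "").split()
    stats.update({k.lower(): int(v) for k, v in zip(fields[0::2], fields[1::2])})
    return stats

# Sample payload, copied from the curl output shown above.
sample = """Active connections: 2
server accepts handled requests
 841845 841845 1631067
Reading: 0 Writing: 1 Waiting: 1"""

print(parse_stub_status(sample))
```

On a healthy server, accepts and handled should match; a persistent gap means connections are being dropped, typically because a resource limit such as worker_connections has been reached.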

Log Files and syslog

The NGINX access log and error log contain a lot of useful information suitable for metric collection. You can use NGINX variables to fully customize the access log format. Certain monitoring tools can leverage NGINX log files for metric collection.

To meet various performance and security requirements, consider using the NGINX syslog capability. While log files are written to disk, syslog allows NGINX to send log data over a network protocol. For example, you can set up a dedicated Linux system to collect all of your log data from various NGINX instances.
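
For example (the collector address and tags below are assumptions), the access and error logs can be pointed at a remote syslog collector like this:

```nginx
# Illustrative: replace the server address and tags with your own.
error_log  syslog:server=192.168.1.100:514,tag=nginx_errors info;
access_log syslog:server=192.168.1.100:514,tag=nginx_access,severity=info combined;
```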

For more information on logging, please refer to the NGINX Plus Admin Guide.

Monitoring NGINX with Amplify

We have a native tool for NGINX monitoring. It’s called NGINX Amplify, and it’s a SaaS tool that you can use to monitor up to five servers for free (subscriptions are available for larger numbers of servers).

It’s easy to get started with NGINX Amplify. You can get out-of-the-box graphs for all key NGINX metrics in under ten minutes. NGINX Amplify automatically uses metrics from stub_status and from access logs, and can collect various OS information as well.

Using NGINX Amplify, you can visualize your NGINX performance, and monitor the OS, PHP‑FPM, Docker containers, and more. A unique feature in Amplify is a static analyzer for your NGINX configuration that provides recommendations for making the configuration more secure and efficient.

Read more about NGINX Amplify here, and try it out for free.

Amplify screen for monitoring NGINX
NGINX Amplify is a key monitoring tool

Additional API Module in NGINX Plus

NGINX Plus provides a better way to obtain performance metrics via a specialized API module.

The API module offers a detailed set of metrics, with the primary focus on load balancing and virtual server stats. As an example, a breakdown of all HTTP status codes (1xx, 2xx, 3xx, 4xx, 5xx) is presented for server blocks. Health status information is available for both HTTP and TCP/UDP upstream servers. Cache metrics include hits and misses for each cache zone.

Aside from gathering an extended set of metrics, the API also enables you to reconfigure HTTP and TCP/UDP upstream server groups and manage key‑value variables without reloading configuration or restarting NGINX Plus.
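
A sketch of enabling that endpoint (the listen address and /api path are conventional but site‑specific; write=on is what permits the reconfiguration described above):

```nginx
# Expose the NGINX Plus API on a loopback-only endpoint.
server {
    listen 127.0.0.1:8080;

    location /api {
        api write=on;    # read-write: allows upstream and key-value changes
        allow 127.0.0.1;
        deny  all;
    }
}
```

Metrics can then be read with, for example, curl http://127.0.0.1:8080/api/6/http/upstreams (the API version number in the path depends on your NGINX Plus release).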

NGINX Plus also comes with an integrated dashboard that utilizes the additional metrics. The additional metrics are also available for use in NGINX Amplify.

Other Monitoring Solutions

A number of other monitoring products can collect and present NGINX metrics.

Troubleshooting Application Performance and Slow TCP Connections with NGINX Amplify
https://www.nginx.com/blog/troubleshooting-application-performance-and-slow-tcp-connections-with-nginx-amplify/
Thu, 07 Dec 2017 04:37:46 +0000

The post Troubleshooting Application Performance and Slow TCP Connections with NGINX Amplify appeared first on NGINX.


In this article I’m sharing an example of how to use NGINX Amplify as a visualization and reporting tool for benchmarking application performance. The primary focus is measuring the effect on performance of keepalive connections.

As you’ll see at the conclusion, we found that we can double performance in a realistic testing scenario by using TCP keepalive connections. Using the NGINX upstream keepalive mechanism reduces connection overhead by reducing the number of TCP/IP packet round trips, and also gives more consistent response time. Using NGINX Amplify, we can easily visualize the interaction, identify bottlenecks, and troubleshoot excessive TCP connect time, improving application performance.

Introduction

Keepalive connections can significantly boost performance, but only if they’re reused extensively enough to sharply reduce the need to create new connections, which are computationally expensive to set up. The performance boost from using keepalives is increased when SSL/TLS is in use, as SSL/TLS connections are even more expensive than insecure connections. (In fact, because keepalives can so greatly reduce the performance penalty otherwise incurred when using SSL/TLS, the extensive use and reuse of keepalives can make the difference as to whether SSL/TLS use is practical or impractical on a website.)

The HTTP Keepalive Connections and Web Performance blog post covers a variety of topics related to the internals of the TCP protocol, as well as a number of common problems, and how to troubleshoot them – including a useful definition of keepalive connections:

HTTP uses a mechanism called keepalive connections to hold open the TCP connection between the client and the server after an HTTP transaction has completed. If the client needs to conduct another HTTP transaction, it can use the idle keepalive connection rather than creating a new TCP connection. Clients generally open a number of simultaneous TCP connections to a server and conduct keepalive transactions across them all. These connections are held open until either the client or the server decides they are no longer needed, generally as a result of an idle timeout.

The Overcoming Ephemeral Port Exhaustion in NGINX and NGINX Plus blog post describes how to make NGINX reuse previously established TCP connections to upstreams [in this case NGINX or NGINX Plus is the client]:

A keepalive connection is held open after the client reads the response, so it can be reused for subsequent requests. Use the keepalive directive to enable keepalive connections from NGINX Plus to upstream servers, defining the maximum number of idle keepalive connections to upstream servers that are preserved in the cache of each worker process. When this number is exceeded, the least recently used connections are closed. Without keepalives, you’re adding more overhead, and being inefficient with both connections and ephemeral ports.

Also, in the 10 Tips for 10x Application Performance blog post, there’s general information about the usefulness and applicability of the client‑side keepalive and upstream‑side keepalive techniques. Note that you use and manage client‑side keepalives and upstream keepalives differently:

  • Client‑side keepalives – Client‑side keepalive connections reduce overhead, especially when SSL/TLS is in use. For NGINX, you can increase the maximum number of keepalive_requests a client can make over a given connection from the default of 100, and you can increase the keepalive_timeout to allow the keepalive connection to stay open longer, resulting in faster subsequent requests.
  • Upstream keepalives – Upstream connections (connections to application servers, database servers, and so on) benefit from keepalive connections as well. For upstream connections, you can increase the value of the keepalive directive, which sets the number of idle keepalive connections that each worker process keeps open to backend servers. This forms a pool of upstream keepalive connections, allowing for increased connection reuse and cutting down on the need to open new connections. The keepalive blog post mentioned above provides technical insights on how to use the upstream keepalive connection pool to optimize application performance.
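
The client‑side directives from the first bullet can be sketched as follows (the values shown are illustrative, not recommendations; the defaults are keepalive_requests 100 and keepalive_timeout 75s):

```nginx
# Illustrative values only; tune for your own traffic patterns.
http {
    keepalive_requests 1000;   # allow more requests per client connection (default 100)
    keepalive_timeout  120s;   # keep idle client connections open longer (default 75s)

    server {
        listen 80;
        # ...
    }
}
```

Both directives can also be set at the server or location level for finer control.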

There are two common cases where an upstream keepalive pool is especially beneficial:

  1. The application infrastructure has fast application backends that produce responses in a very short time, comparable to the time needed to complete a TCP handshake. The handshake then accounts for a large share of the total cost of serving the response.
  2. The backends in the application infrastructure are remote (from the perspective of NGINX acting as a proxy), so network latency is high and each TCP handshake takes a long time.

Other beneficial side effects of using upstream keepalives include reducing the number of sockets in TIME‑WAIT state, less work for the OS to establish new TCP connections, and fewer packets on the network. However, these are unlikely to result in measurable application performance benefits in a typical setup.

Testing Setup

While using keepalives can be useful, configuring them is usually a bit complex and error‑prone. We’ll use Wireshark to inspect the low‑level network elements, such as TCP streams and their connection states.

To measure the TCP connection time, we’ll configure an NGINX container with the NGINX Amplify agent to collect all necessary data. We’ll then use NGINX Amplify to analyze this data. We’ll use siege as the HTTP benchmarking tool with the following configuration (in ~/.siege/siege.conf):

protocol = HTTP/1.1
cache = false
connection = keep-alive
concurrent = 5
benchmark = true

The connection = keep-alive statement means that the client uses the client‑side HTTP keepalive mechanism. With this configuration, we can demonstrate that there are two distinct TCP connections – two TCP streams, to be precise – when proxying a request. One is the client‑side HTTP keepalive connection to the NGINX proxy, and the other is the upstream connection between NGINX and the upstream backend. The latter is part of the NGINX keepalive connection pool. In the remainder of this blog post, we’ll talk about upstream keepalive connections only.

We’ll begin with showing the benchmark results in the NGINX Amplify dashboard, and then explain them. This will give you some context about the focus of this exercise.

The testing scenario was to run a series of five siege tests initiated from inside the NGINX container, with each test taking 10 minutes to finish. The tests correspond to scenarios A through E discussed below.

siege -b -t 10m http://127.1/upstream-close/proxy_pass_connection-close/server_keepalive_timeout-0
siege -b -t 10m http://127.1/upstream-close/proxy_pass_connection-keepalive/server_keepalive_timeout-0
siege -b -t 10m http://127.1/upstream-close/proxy_pass_connection-keepalive/server_keepalive_timeout-300
siege -b -t 10m http://127.1/upstream-keepalive/proxy_pass_connection-keepalive/server_keepalive_timeout-0
siege -b -t 10m http://127.1/upstream-keepalive/proxy_pass_connection-keepalive/server_keepalive_timeout-300

In this case, we broke a rule of performance testing by running the load generator on the system under test, with the risk of distorting the results. To reduce this effect, we isolated resource use by running and inspecting the traffic in Docker containers. Also, there weren’t many locking operations such as I/O, which both use lots of time and have significant variability unrelated to the system aspects being tested. The fact that the results were pretty consistent seems to indicate that we were able to get relatively “clean” results.

The NGINX Amplify dashboard in the following screenshot includes graphs for the following six metrics. We have omitted the nginx. prefix that NGINX Amplify prepends on each metric name, to match the labels on the graphs:

  • upstream.connect.time – Time to establish a connection to the upstream
  • upstream.response.time – Time to receive the response from the upstream
  • http.request.time – Time to completely process the request
  • upstream.header.time – Time spent reading headers in the upstream response
  • http.request.body_bytes_sent – Bytes sent to the client, not including headers
  • upstream.request.count – Number of requests sent to the upstream

For each metric, three measurements are shown: 50pctl (median), 95pctl, and max. The top four graphs are metrics where lower values represent better performance. The bottom two graphs are traffic‑volume metrics where higher values are better.

The next screenshot zooms in on the two most important metrics for measuring upstream keepalive performance, upstream.connect.time and upstream.header.time (these correspond to the top‑left and middle‑right graphs in the previous screenshot):

The NGINX Amplify agent was downloaded from the official repo and installed in the NGINX container:

$ docker run --net network0 --name nginx -p 80:80 -e API_KEY=$AMPLIFY_API_KEY -d nginx-amplify

Every run of the siege benchmark requires two TCP streams: one siege ↔ NGINX and the other NGINX ↔ upstream backend server.

We analyzed the upstream keepalive TCP streams in each of five configuration scenarios (A through E). For the complete NGINX configuration snippets and Wireshark dumps (which also appear below), see this GitHub gist.

Scenarios A, B, and D were tested against an upstream backend server (also running NGINX) configured as follows to disable keepalive connections:

server {
    location / {
        empty_gif;
        keepalive_timeout 0;
    }
}

Scenarios C and E were tested against an upstream backend NGINX server configured as follows to enable keepalive connections, also increasing their duration and the maximum number of requests serviced over a given connection (the defaults are 75 seconds and 100, respectively):

server {
    location / {
        empty_gif;
        keepalive_timeout 300;
        keepalive_requests 100000;
    }
}

To run Wireshark, we used a tcpdump Docker container and a copy of Wireshark installed on macOS:

$ docker run --net=container:nginx crccheck/tcpdump -i any --immediate-mode -w - | /usr/local/bin/wireshark -k -i -;

Test Results

Let’s take a look at the TCP streams between the NGINX proxy and the upstream server for each of the above‑mentioned scenarios.

Scenario A – upstream‑close / proxy_pass_connection‑close / server_keepalive_timeout‑0

Scenario B – upstream‑close / proxy_pass_connection‑keepalive / server_keepalive_timeout‑0

Scenario C – upstream‑close / proxy_pass_connection‑keepalive / server_keepalive_timeout‑300

Scenario D – upstream‑keepalive / proxy_pass_connection‑keepalive / server_keepalive_timeout‑0

Scenario E – upstream‑keepalive / proxy_pass_connection‑keepalive / server_keepalive_timeout‑300

According to Section 3.5 of RFC‑793: A TCP connection may terminate in two ways: (1) the normal TCP close sequence using a FIN handshake, and (2) an “abort” in which one or more RST segments are sent and the connection state is immediately discarded. If a TCP connection is closed by the remote site, the local application MUST be informed whether it closed normally or was aborted.

Here’s the Wireshark I/O graph, with the TCP congestion window in dark green, and bytes‑in‑flight in light green:

Now we can see that the upstream keepalive configuration must include the keepalive directive in the upstream block, set the Connection header to an empty string (with proxy_set_header), and set the proxy protocol version to HTTP/1.1 (with proxy_http_version). The upstream backend server must support HTTP keepalive connections too, or else TCP connections are closed after every HTTP request.
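
A minimal configuration sketch of those requirements (the upstream name, address, and pool size are illustrative):

```nginx
# Illustrative names and values; the three commented settings are what matters.
upstream backend {
    server 10.0.0.10:8080;
    keepalive 16;                       # pool of idle keepalive connections per worker
}

server {
    listen 80;
    location / {
        proxy_pass http://backend;
        proxy_http_version 1.1;         # upstream keepalives require HTTP/1.1
        proxy_set_header Connection ""; # clear any "Connection: close" from the client
    }
}
```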

Test Results Under Load

The results above are from benchmarking against an idle server (located on AWS, EU West → US West). We then performed the same procedure against a heavily loaded server in the same regions (AWS EU West → US West), running two more test cases.

For the loaded server, the testing scenario was to run two siege tests sequentially, again initiated from inside the NGINX container. This time each test took about 20 minutes to finish. The tests correspond to scenarios A and B discussed below.

siege -b -t 20m http://127.1/upstream-heavy-loaded-close/
siege -b -t 20m http://127.1/upstream-heavy-loaded-keepalive/

This screenshot shows graphs for the same six metrics as above, this time for the two new scenarios with a heavily loaded server:

One key result appears in the bottom right graph: with keepalive connections between the NGINX proxy and a heavily loaded backend located far away (Scenario B), the throughput jumps nearly five times compared to no keepalives (Scenario A): upstream.request.count increased from 370 to 1731 requests per minute.

And again let’s zoom in on the two most important metrics, upstream.connect.time and upstream.header.time.

Note that in Scenario B (keepalive connections), the value of upstream.connect.time (the max value out of the collected statistical distribution) is evenly distributed. However, it’s not zero as it was in the parallel scenario with an idle server (Scenario E above), because the server’s keepalive_requests directive is set to 300, and we see FINs followed by SYN/ACKs; we didn’t see that as often in the previous tests, when keepalive_requests was set to a higher value. But given the number of siege runs, the 95th‑percentile and the median (50th‑percentile) values are 0, which saves client and server resources on the SYN/ACK handshake and the TLS exchange.

In Scenario B, the 95th‑percentile and median (50th‑percentile) values are the same for both upstream.connect.time and upstream.header.time. It’s a very interesting fact, which can possibly be interpreted this way: by eliminating extra round trips we have less connection overhead, and that provides a more consistent response time.

The Wireshark TCP streams analyses are very similar to those with the idle server:

Scenario A (load) – upstream‑close / proxy_pass_connection‑close (gist here)

Scenario B (load) – upstream‑keepalive / proxy_pass_connection‑keepalive (gist here)

The I/O graph looks similar, too:

As an extra benefit, in a situation where you have more than one upstream backend server, you can use the least_time directive to avoid querying the slower origin server. [Editor – The Least Time load‑balancing method is available in NGINX Plus.]
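
As a sketch of that setup (the upstream name and server addresses are illustrative; least_time is an NGINX Plus‑only directive):

```nginx
# Illustrative NGINX Plus upstream; least_time picks the server with the
# lowest average time to receive the response header.
upstream backend {
    least_time header;
    server origin1.example.com;
    server origin2.example.com;
    keepalive 16;    # keep the upstream keepalive pool as well
}
```

The `last_byte` parameter can be used instead of `header` to measure the time to receive the full response.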

Key Findings

To summarize our findings from this testing effort:

  • The TCP congestion window and bytes‑in‑flight can be increased by using TCP keepalives; the factor depends on round‑trip time. For EU West (client) → US West (origin), performance at least doubles.
  • The NGINX upstream keepalive mechanism reduces connection overhead – in a nutshell, there are fewer TCP/IP packet round trips – and also gives consistent response time. Using NGINX Amplify helps visualize this in a no‑hassle manner.
  • NGINX Amplify serves as a powerful tool that helps to identify bottlenecks and to troubleshoot excessive TCP connect time, which would otherwise affect application performance.

NGINX Amplify is free for up to five monitored instances of NGINX or NGINX Plus. Sign up to get started.

The post Troubleshooting Application Performance and Slow TCP Connections with NGINX Amplify appeared first on NGINX.

]]>
NGINX Amplify is Generally Available https://www.nginx.com/blog/nginx-amplify-demo/ Mon, 02 Oct 2017 03:01:11 +0000 https://www.nginx.com/?p=53841 This blog post is one of six keynotes by NGINX team members from nginx.conf 2017. Together, these blog posts introduce the NGINX Application Platform, new products, and product and strategy updates. The blog posts are: Speeding Innovation, Gus Robertson (video here) NGINX Product Roadmap, Owen Garrett (video here) Introducing NGINX Controller, Chris Stetson and Rachael [...]

Read More...

The post NGINX Amplify is Generally Available appeared first on NGINX.

]]>
This blog post is one of six keynotes by NGINX team members from nginx.conf 2017. Together, these blog posts introduce the NGINX Application Platform, new products, and product and strategy updates.

The blog posts are:

Owen Garrett: Thank you again. Wow! It’s been a fantastic conference.

As we close out our final keynote, I think you can be forgiven for wanting to catch your breath and just review what NGINX has been sharing this year at nginx.conf.

We talked to you about an open source project to integrate NGINX within the new Istio service mesh.

[Editor – NGINX is no longer developing or supporting the nginMesh project, which is now sponsored by the community. To learn about our current, free service mesh solution, visit NGINX Service Mesh. To learn about F5’s Istio‑based service mesh solution, visit Aspen Mesh.]

We talked about another open source project to build a fully supported Ingress controller implementation that will allow you to deploy load balancing rules from Kubernetes onto NGINX.

Of course, we took a deep dive and we learned from Nick Shadrin and Igor Sysoev about the vision, and the technology, and the future of a new web application platform: NGINX Unit.

Yesterday, we talked about our vision for the NGINX Application Platform – how this suite of products comes together to build an autonomous, capable platform, managing your applications across a range of different deployment environments.

But, as ever, we like to hold one thing to the end. Today is about open source and about community. This is the most significant announcement that we have for our user community this year.

You may remember that we talked previously about a project: NGINX Amplify. Amplify is NGINX monitoring made easy. It’s out‑of‑the‑box graphs and charts for NGINX operating system metrics. It allows you to delve in and explore and compare NGINX application performance. It has a rich static analyzer that builds on the insights and expertise present in the core NGINX software, and gives that to you in an easy‑to‑understand way. It allows you to measure SLAs [service‑level agreements], monitor how your application is performing, and be alerted with automated notifications.

Amplify has been a long‑term project at NGINX. I remember when we first demonstrated this concept two years ago at our conference in San Francisco. It had been something we’d been considering for some time before then. But the project really only started when one of our founders, Andrew Alexeev, stepped up and made it happen. The Amplify project is a testament to his vision and his resolve.

What are the problems that Amplify solves? NGINX open source doesn’t offer good means for performance monitoring. The SaaS solutions that you can use as an alternative are very comprehensive, but they can be difficult to set up, or they’re not specifically tailored to give you the insight you need into NGINX metrics.

We wanted to share the huge amount of expertise in the NGINX team. We wanted to take you away from the Stack Overflow‑induced problem of doing copy‑and‑paste configuration and re‑implementing bad practices.

We wanted, as part of the monitoring solution, to give you alerting on abnormal behavior so you could identify problems proactively. But most importantly – the key value that goes above everything else in Amplify – was ease of use: the primary factor that defined what Amplify was and how you would use it.

Getting started with Amplify is as easy as 1, 2, 3. Amplify is a SaaS‑based solution. Go and register for an account at amplify.nginx.com. Amplify generates a unique API key for you, which you can find in your Account settings. Then install the Amplify Agent on your NGINX host; you can do that with a single command, as illustrated here.

The Agent is also provided for a range of different operating systems, with configuration scripts. If you were automating and industrializing the way that you deploy NGINX, you could then bring the Amplify package into that process. From that point on, your NGINX host will connect back to our SaaS service, and it will wait for instructions.

What does Amplify give you? Amplify gives you three really, really valuable capabilities:

  • Monitoring and analytics, so you can understand what is happening
  • Configuration insight
  • Proactive alerting, so when an event happens or a particular threshold is met, you’re informed

Let’s look at each of these in turn.

The monitoring and analytics start with the “God’s eye view” page. Internally, we call it the “Gus page” because Gus Robertson, our CEO, was the main driver behind adding it to Amplify. The purpose of this page is to give you a quick, one‑stop summary about the state of your NGINX infrastructure.

There are five overlay graphs for a time period that you can select. They monitor five key metrics for your system: requests, error rates, request time, traffic levels, and CPU utilization.

You can very quickly do regression comparisons, choosing a time period – say, 24 hours – and comparing that to the previous 24 hours to see any anomalies. At the top left, the green block displays what we call the Application Health Score. This is a custom calculation – you can tune it if you wish – that takes a range of different metrics and applies them together to give you a green, orange, or red status for your application. It’s the easiest way to see if the system is running healthily.

At its core is a data‑gathering, monitoring, and graphing engine. When you log in, you’ll be presented with a series of graphs. You get an overview of the key metrics – CPU, memory, disk utilization – drawn from operating system metrics.

We also pull in NGINX metrics. We interface with the HTTP Stub Status module [for NGINX OSS users] or the NGINX Plus Extended Status module, if you’re using Amplify with NGINX Plus. And then we look further into the access log and the error log to derive more metrics and measurements, to dynamically measure deep application‑specific behavior.

But, we want to assure you that Amplify is very, very careful about how data is collected. All of the data collection begins on the Agent, which takes filters and accumulators to generate just the information that you need in order for the Amplify core service to display the charts. We don’t gather unnecessary data; we only take what’s necessary for you to visualize. We respect your privacy.

You can use the [individual] charts in Amplify, but it really comes alive when you look at the Amplify dashboards. Amplify dashboards allow you to create custom charts, and then assemble them together into your own dashboard view.

Some use cases – the kind of custom graphs that you can create:

  • You can monitor NGINX performance by drilling down to a particular application or microservice, maybe looking just at a particular URI path.
  • You can drill down into individual virtual servers, and then you can group those together, so you can look at the performance of a group of NGINX servers, maybe your frontend load balancers or your intermediate caches.
  • You can drill down into detailed breakdowns of metrics, like HTTP status codes per application, and then you can take those filters, and you can fit them together to create really rich ways of filtering data multiple times, multiple metric dimensions. For example, you could look, focused on a particular URI, and then just count and monitor the number of post requests for that URI.

So, hugely rich monitoring and analytics information.

The second capability in Amplify is the deep insight that it gives you into your NGINX status, the systems you’re running on, and your configuration. It begins with an inventory: a great way to check out the status of your nodes and to remind yourself of the size of the hardware, and the operating system parameters – maybe the Amazon EC2 image that you’ve deployed for each one.

For each NGINX node, we give you a range of information. For example, the Amplify Agent has the intelligence to be aware of vulnerabilities or issues that were fixed in older versions of NGINX. It lets you know if you’re running an older version, and enumerates what the potential fixes are. You may then wish to make an informed decision to apply an update.

My favorite feature in Amplify is the configuration analyzer. Amplify can parse your configuration locally, and then it passes back up a secure, depersonalized version to the core Amplify servers. It doesn’t include sensitive information like usernames and passwords or SSL certificates.

Based on our experience of looking at hundreds and thousands of NGINX users’ configurations through community support and the commercial support channels, we’re very aware of the common mistakes that users make when they configure NGINX. The config analyzer looks out for those mistakes.

It picks up typical configuration gotchas, such as missing server_name directives. It gives you advice for proxy configurations. It’s smart enough to look at the rewrite and location directives that you’re using, and it can give you optimizations or hints on how to improve those.

It’s very security‑aware, of course. It will look at things like the stub_status directive, and it will warn you if you’ve forgotten to put an access control around that directive.
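
For example, a locked‑down stub_status endpoint might look roughly like this (the listen address and path are illustrative, not from the talk):

```nginx
# Sketch: expose stub_status only to localhost.
server {
    listen 127.0.0.1:8080;
    location /nginx_status {
        stub_status;      # written as "stub_status on;" in NGINX before 1.7.5
        allow 127.0.0.1;  # restrict who can read the status page
        deny all;
    }
}
```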

And finally, the monitoring capabilities in Amplify are of no use if you have to get some poor soul to sit and watch the dashboards all day. For that reason, you can configure alerts on any of the monitored metrics, define them with thresholds and time periods, and when an alert is triggered, we’ll ping an email out to the nominated admin.

There’s a rich roadmap for Amplify, looking into the future. For example, we’d love to be able to call a webhook as a result of an alert being triggered. Wouldn’t it be fantastic if you could monitor your NGINX devices, call a webhook, and integrate with other services, executing an “if this, then that” rule?

Next: documentation. I think it’s fair to admit that documentation hasn’t consistently been our strong point. We’re getting better. We’re investing heavily. Amplify is exemplifying how good documentation for a SaaS service like this can be written. There’s been a lot of effort, from the detailed instructions for getting started and installing the Agent, through to advanced configuration.

And if the documentation doesn’t tell you what you need, we’re running a chat feature so you can get in touch with the Amplify team. The chat feature, at the moment, is not always available live, but we’ll get back to you about any questions as soon as is practical.

How does Amplify fit into the NGINX ecosystem? Amplify is being integrated everywhere. Of course, NGINX is the primary target. We created Amplify as a way for community users to monitor and track what’s happening with NGINX.

Amplify can also monitor NGINX Plus in exactly the same fashion. When it does so, it’ll also pull out additional data from the Extended Status module. This gives you an alternative view to the NGINX Plus dashboard that you may know and love.

And, finally: NGINX Controller. As we build our Controller, the roadmap includes taking Amplify and embedding that directly inside Controller to create a private, on‑premises installation of Amplify. Controller’s monitoring capabilities are “powered” by Amplify.

[Editor – NGINX Controller is now F5 NGINX Management Suite.]

Amplify is now generally available – no longer beta. There is a free tier, so you can sign up now. You can start monitoring your NGINX instances at no cost, no obligation. For higher usage, you can move on to a paid tier which is designed just to cover the cost that it takes us of hosting and managing and developing the Amplify service. And for paying users, we provide support through our primary support team.

This has been a great conference. There are so many things that we’ve shared with you. As a product team, we wanted to end the second‑day keynote on a high. We’re really excited about what we, as a company, have achieved with Amplify, and I hope that you are, too. As we bring things to a close, thank you again.

The post NGINX Amplify is Generally Available appeared first on NGINX.

]]>
Announcing General Availability of NGINX Amplify https://www.nginx.com/blog/announcing-nginx-amplify-ga/ Thu, 07 Sep 2017 15:55:30 +0000 https://www.nginx.com/?p=53132 Have you been using NGINX, but still don’t have real insights into its performance? Ever wonder whether a configuration snippet copied from somewhere is a valid one? Did you ever have issues with SSL configuration in NGINX? Do you want to make NGINX run even better? Or maybe you’ve just been looking around for the [...]

Read More...

The post Announcing General Availability of NGINX Amplify appeared first on NGINX.

]]>
Have you been using NGINX, but still don’t have real insights into its performance? Ever wonder whether a configuration snippet copied from somewhere is a valid one? Did you ever have issues with SSL configuration in NGINX? Do you want to make NGINX run even better? Or maybe you’ve just been looking around for the simplest and quickest way to set up cloud monitoring for your LEMP stack? (Where the “E” in “LEMP” stems from “eNGINe‑X”?)

A little over a year ago, we announced the public beta of NGINX Amplify – our own specialized, free monitoring tool for NGINX and NGINX Plus.

Today, we’re super excited to be removing the beta status for Amplify – it’s now generally available (GA) for NGINX and NGINX Plus users. Amplify is production‑ready, supported, and has a free‑tier monitoring plan. Existing beta users – anyone who signed up prior to today – will have no interruption, and all data will be preserved. Existing features in the free‑tier plan will continue to be provided at no charge. New users will have up to five free hosts to monitor with Amplify.

It’s 2017, and there are plenty of monitoring options out there, ranging from general‑purpose APM product suites to the most esoteric infrastructure monitoring tools. Some of these solutions are free, open source, and do‑it‑yourself. Some are paid, and a lot of them are heavyweight – and some are expensive. Surprisingly, setting up monitoring for a Linux server running NGINX and PHP‑FPM can still be a tedious task, especially for someone who’s just starting with web development and NGINX.

NGINX gained its popularity by being a lightweight and versatile solution for accelerating and securing modern web stacks. A similarly lightweight and versatile monitoring tool for NGINX seemed to be a good idea, and that’s why we’ve created Amplify.

In fact, Amplify started with the NGINX Open Source user community as its focus. We’re committed to making our open source products useful and powerful at all levels. Our experience with Amplify so far has demonstrated great success in adding to the power of NGINX.

NGINX Amplify screenshot
Analyze and take control of your apps with NGINX Amplify

Amplify is all about “NGINX monitoring made easy” and monitoring NGINX everywhere, whether your deployment is on-prem or in the cloud. Amplify is also an unprecedentedly low‑barrier approach to NGINX monitoring. Start with three simple steps, and in under ten minutes you get all the key NGINX graphs, more than 100 NGINX metrics, plus automated configuration file analysis and recommendations. And it’s all based on many years of first‑hand experience, coming directly from the NGINX core and support teams.

Amplify is a tool made by NGINX for NGINX users that provides near‑real‑time monitoring. It collects hundreds of metrics from NGINX or NGINX Plus, log files, and the operating system, and provides a highly customizable interface to visualize them. The metrics can be aggregated over a cluster of NGINX instances for a high‑level overview, or fine‑tuned to track the performance of individual services, APIs, or applications.

Feature‑wise, here’s what Amplify is today:

  • Overview and SLA – A 30‑second checkup of NGINX health. See aggregate visualizations for key metrics such as response time, the total sum of HTTP 5xx errors over the past 24 hours, and more. Compare trends by switching time periods and instantly notice trends and abnormalities. SLA measurement is based on the Application Health Score index.
  • Graphs – A standard set of graphs for NGINX, PHP‑FPM, and OS metrics. Graphs can be copied to user‑defined dashboards for customization.
  • Analyzer – An overview of your NGINX configuration file, with alerts about common configuration problems. The static analyzer helps to improve performance, reliability, and security, based on recommendations from the NGINX team.
  • Dashboards – A dashboard where you can create your own graphs. Dashboards allow you to visualize more metrics, aggregate graphs across servers, set up custom metrics based on the NGINX log variables, and more. It’s an easy‑to‑use and very flexible metric graphing instrument.
  • Alerts – Notifications about abnormal NGINX behavior. When an alert is triggered, Amplify sends a notification, then monitors the situation until it returns to normal.
  • Inventory – The list of monitored systems. You can check hostnames, OS versions, and IP address information.

During the beta phase, we had thousands of users sign up for Amplify. We’ve been collecting close to a billion updates per day, and we’re extremely grateful to everyone who has tried it out and provided feedback. A lot of what we’ve added or changed in the past 12 months has been based on user feedback.

Here’s some of what Amplify users have said:

  • A very positive experience
  • Very easy to install
  • Instant visibility into NGINX performance/behavior
  • Outstanding NGINX monitoring capabilities, without many alternatives
  • Useful, practical advice from the static analyzer – alleviates common configuration errors
  • Gives insight into application quirks and errors from the NGINX vantage point
  • Automated alerts are very helpful
  • Nice, modern UI/UX

In the future, we’ll be adding even more to Amplify. A few ideas we’ve been working on recently include more plugins for monitoring, extensions to the static analyzer, improved alerting algorithms, availability testing, and more. If you have any ideas, feel free to share them with us!

Amplify is available for all major Linux distributions, with experimental support for other OSs. For more detailed information on supported operating systems, please refer to the Amplify documentation.

Signing up for Amplify is easy – just visit amplify.nginx.com, provide a few basic details about yourself, and hit the Create button.

For more information on Amplify, please refer to the official documentation and GitHub repo.

The post Announcing General Availability of NGINX Amplify appeared first on NGINX.

]]>
Monitoring PHP Applications with NGINX Amplify https://www.nginx.com/blog/monitoring-php-applications-with-nginx-amplify/ Thu, 27 Jul 2017 02:04:55 +0000 https://www.nginx.com/?p=52126 Introduction PHP is consistently in the top five programming languages for the Web, and many websites are built on PHP applications like WordPress, Drupal, or Joomla. Every day, millions of developers create new applications or extensions to the existing ones using PHP. NGINX and PHP is a popular combination, and we’ve created a two‑part blog [...]

Read More...

The post Monitoring PHP Applications with NGINX Amplify appeared first on NGINX.

]]>

Introduction

PHP is consistently among the top five programming languages for the Web, and many websites are built on PHP applications like WordPress, Drupal, or Joomla. Every day, millions of developers create new applications, or extensions to existing ones, using PHP. NGINX and PHP are a popular combination, and we’ve created a two‑part blog series on maximizing PHP performance.

Monitoring a PHP application can be a daunting task. Logs should be checked, specialized settings should be implemented, and related metrics should be collected and well understood. It’s easy to get lost in the process, especially if you’ve just started with something like WordPress, and you’re still learning how to build your website.

It’s common to see NGINX deployed in front of a PHP application. In this setup, NGINX typically works as a local web accelerator, and PHP‑FPM serves as the application server. This arrangement has proven extremely useful for offloading SSL termination, content caching, authentication, and other aspects of HTTP processing from the PHP application to NGINX.
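A minimal sketch of this pattern (the socket path and parameter values here are common defaults, not taken from the post) looks like this in an NGINX server block:

```nginx
# NGINX serves static content and terminates HTTP(S); only PHP requests
# are handed to the PHP-FPM application server over FastCGI.
# The socket path is an assumption -- it varies by distribution.
location ~ \.php$ {
    fastcgi_pass   unix:/var/run/php5-fpm.sock;
    fastcgi_index  index.php;
    fastcgi_param  SCRIPT_FILENAME $document_root$fastcgi_script_name;
    include        fastcgi_params;
}
```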

Recognizing that it would be convenient to also monitor PHP‑FPM metrics together with the NGINX metrics, we’ve recently enabled it in NGINX Amplify. (If you haven’t tried it yet, sign up today for the beta.)

If the Amplify Agent runs next to the PHP‑FPM master process, it can detect the master process automatically, determine its configuration, and start collecting the PHP‑FPM metrics.

If everything is set up properly, you’ll immediately have a set of out-of-the-box graphs for PHP‑FPM in Amplify.

The PHP‑FPM metrics displayed on the Graphs page are cumulative across all automatically detected pools. If you need per-pool graphs, please use the Dashboards to create custom graphs.

Let’s check it all out in more detail.

PHP-FPM Metrics in Amplify

Below is the list of the PHP‑FPM metrics currently supported in Amplify.

Metric in Amplify        Description                                                                           PHP‑FPM Status Metric
php.fpm.conn.accepted    The number of requests accepted by the pool                                           accepted conn
php.fpm.queue.current    The number of requests in the queue of pending connections                            listen queue
php.fpm.queue.max        The maximum number of requests in the queue of pending connections since FPM started  max listen queue
php.fpm.queue.len        The size of the socket queue of pending connections                                   listen queue len
php.fpm.proc.idle        The number of idle processes                                                          idle processes
php.fpm.proc.active      The number of active processes                                                        active processes
php.fpm.proc.total       The number of idle + active processes                                                 total processes
php.fpm.proc.max_active  The maximum number of active processes since FPM started                              max active processes
php.fpm.proc.max_child   The number of times the process limit has been reached                                max children reached
php.fpm.slow_req         The number of requests that exceeded the request_slowlog_timeout value                slow requests
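As an illustration of this mapping (this is a sketch, not the actual agent code), raw `key: value` lines from the PHP‑FPM status page can be translated into the Amplify metric names from the table above:

```python
# Sketch: map raw PHP-FPM status lines to Amplify metric names.
# The mapping follows the table above; this is not the agent's real code.
STATUS_TO_AMPLIFY = {
    "accepted conn":        "php.fpm.conn.accepted",
    "listen queue":         "php.fpm.queue.current",
    "max listen queue":     "php.fpm.queue.max",
    "listen queue len":     "php.fpm.queue.len",
    "idle processes":       "php.fpm.proc.idle",
    "active processes":     "php.fpm.proc.active",
    "total processes":      "php.fpm.proc.total",
    "max active processes": "php.fpm.proc.max_active",
    "max children reached": "php.fpm.proc.max_child",
    "slow requests":        "php.fpm.slow_req",
}

def parse_fpm_status(text):
    """Turn 'key: value' lines from the PHP-FPM status page into a dict
    keyed by Amplify metric name; non-numeric or unknown lines are skipped."""
    metrics = {}
    for line in text.splitlines():
        if ":" not in line:
            continue
        key, _, value = line.partition(":")
        key, value = key.strip(), value.strip()
        if key in STATUS_TO_AMPLIFY and value.isdigit():
            metrics[STATUS_TO_AMPLIFY[key]] = int(value)
    return metrics
```

Feeding it the sample status output shown later in this post yields a dict such as `{"php.fpm.conn.accepted": 757839, "php.fpm.proc.total": 2, ...}`.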

The agent should run in the same process environment as PHP‑FPM and be able to find the PHP‑FPM processes with ps(1); otherwise, the PHP‑FPM metric collection won’t work. For example, if the agent is run on the host system, but the PHP application is run inside a Docker container on that host, you should add the agent to that Docker container.

When the agent finds a PHP‑FPM master process, it automatically detects the path to the PHP‑FPM configuration. When the PHP‑FPM configuration is found, the agent will look up the pool definitions and the corresponding pm.status_path directives.

The agent finds all of the pools and status URIs currently configured, then queries the PHP‑FPM pool status(es) via FastCGI. There’s no need to define an HTTP proxy in your NGINX configuration pointing to the PHP‑FPM status URIs – the agent doesn’t use HTTP, and hence doesn’t depend on NGINX to proxy requests to the PHP‑FPM status URI.

Setting Up the Agent for PHP-FPM Monitoring

To start monitoring PHP‑FPM, follow the steps below:

  1. Make sure that your PHP‑FPM status is enabled for at least one pool. If it’s not, uncomment the pm.status_path directive for the pool, then restart PHP‑FPM.

  2. Update the agent to the most recent version.

  3. Check that the following options are included in /etc/amplify-agent/agent.conf:

    [extensions]
    phpfpm = True
  4. Restart the agent.

The agent should be able to detect the PHP‑FPM master and worker processes, obtain access to the PHP‑FPM status, and then collect the metrics.
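For step 1, the pool directive in question looks like the following (the pool file location varies by distribution, so the path in the comment is an assumption):

```ini
; In the PHP-FPM pool configuration file (e.g. /etc/php5/fpm/pool.d/www.conf),
; uncomment or add the status path, then restart PHP-FPM:
pm.status_path = /status
```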

Using PHP-FPM with a TCP Socket

If your PHP‑FPM is configured to use a TCP socket, first make sure you can query the PHP‑FPM metrics manually with cgi-fcgi(1). Double‑check that your TCP socket configuration is secure – ideally, with the PHP‑FPM pool(s) listening on 127.0.0.1, and listen.allowed_clients enabled as well.

Check that you can query the PHP‑FPM status for the pool from the command line,

$ SCRIPT_NAME=/status SCRIPT_FILENAME=/status QUERY_STRING= REQUEST_METHOD=GET cgi-fcgi -bind -connect 127.0.0.1:9090

and that the command above (or a similar one) returns the proper set of PHP‑FPM metrics:

Expires: Thu, 01 Jan 1970 00:00:00 GMT
Cache-Control: no-cache, no-store, must-revalidate, max-age=0
Content-Type: text/plain

pool:                 www
process manager:      dynamic
start time:           26/Jul/2016:12:51:05 -0400
start since:          29458594
accepted conn:        757839
listen queue:         0
max listen queue:     0
listen queue len:     0
idle processes:       1
active processes:     1
total processes:      2
max active processes: 1
max children reached: 0
slow requests:        0

The cgi-fcgi(1) utility has to be installed separately (usually from the libfcgi-dev package). The agent doesn’t need this tool to collect and report PHP‑FPM metrics, but it can be indispensable for quickly diagnosing possible issues with PHP‑FPM metric collection.

Using PHP-FPM with a Unix Domain Socket

If you have a single PHP‑FPM instance, check that NGINX, the Amplify Agent, and the PHP‑FPM workers are all run under the same user ID (for example, www-data). If there are multiple pools configured with different user IDs, make sure the agent’s user ID is included in the group IDs of the PHP‑FPM workers. This is required in order for the agent to access the PHP‑FPM pool socket(s) when querying for metrics via FastCGI.

First, check that the listen socket for the PHP‑FPM pool you want to monitor, and for which you enabled pm.status_path, is properly configured with listen.owner and listen.group.

listen.owner = www-data
listen.group = www-data
listen.mode = 0660

Next, verify that the PHP‑FPM listen socket for the pool is properly created and has the right permissions.

$ ls -la /var/run/php5-fpm.sock
srw-rw---- 1 www-data www-data 0 Jul 26  2016 /var/run/php5-fpm.sock

Finally, query the PHP‑FPM status for the pool from the command line, and see if it returns the list of PHP‑FPM metrics.

$ SCRIPT_NAME=/status SCRIPT_FILENAME=/status QUERY_STRING= REQUEST_METHOD=GET cgi-fcgi -bind -connect /var/run/php5-fpm.sock

If everything is set up properly, the agent will be collecting the PHP‑FPM metrics, and you’ll see the predefined graphs on the Graphs page momentarily.

What to Monitor in PHP-FPM

If you’re running a PHP‑based application like WordPress, it’s very important to regularly check the following, and adjust the PHP‑FPM settings (and sometimes also the NGINX and the operating system settings as well), to ensure your website works well:

  • Number of PHP‑FPM workers. Amplify metrics like php.fpm.proc.max_child and the other PHP‑FPM process metrics can be helpful here. If PHP‑FPM consistently lacks workers to process incoming requests, it takes longer for the end user to open the website (if it opens at all!).
  • Slow requests. PHP‑FPM has a useful feature that enables logging and monitoring of requests that take too long (the threshold is configurable). This monitoring is off by default, so it makes sense to check your PHP‑FPM pool configuration file and set the slowlog and request_slowlog_timeout directives accordingly. The corresponding metric in Amplify is php.fpm.slow_req.
  • Connection queues. When an incoming request can’t be accepted instantly (for example, because no idle worker is available to process it), it waits in a connection queue. The actual settings and behavior depend on the OS configuration, so it’s worth periodically checking the associated PHP‑FPM metrics in Amplify, such as php.fpm.queue.max.
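The checks above can be illustrated with a naive sketch over two successive metric samples. This is not an Amplify feature – the metric names follow the table earlier in the post, while the comparison logic and messages are purely illustrative:

```python
# Illustration only: compare two successive PHP-FPM metric samples for
# the warning signs discussed above (worker starvation, slow requests,
# queue growth). Not an Amplify feature; messages are examples.
def fpm_warnings(prev, curr):
    """Return human-readable warnings when cumulative PHP-FPM metrics
    have grown between the previous and current sample."""
    warnings = []
    if curr.get("php.fpm.proc.max_child", 0) > prev.get("php.fpm.proc.max_child", 0):
        warnings.append("process limit reached: consider raising pm.max_children")
    if curr.get("php.fpm.slow_req", 0) > prev.get("php.fpm.slow_req", 0):
        warnings.append("new slow requests: inspect the slowlog")
    if curr.get("php.fpm.queue.max", 0) > prev.get("php.fpm.queue.max", 0):
        warnings.append("listen queue grew: workers can't keep up")
    return warnings
```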

Troubleshooting

Here’s a list of caveats to look for if the PHP‑FPM metrics aren’t being collected:

  • No status URI enabled for any of the pools
  • Agent can’t connect to the TCP socket (when using PHP‑FPM with a TCP socket)
  • Agent can’t parse the PHP‑FPM configuration – please report issues like this, and we’ll investigate
  • Different user IDs used by the agent and the PHP‑FPM workers, or lack of a single group (when using PHP‑FPM with a Unix domain socket)
  • Wrong permissions configured for the PHP‑FPM listen socket (when using PHP‑FPM with a Unix domain socket)
  • The agent and the PHP‑FPM instance are not in the same process environment (for example, the agent is installed on the host system, and the PHP‑FPM instance is inside a Docker container)

If checking the above issues didn’t help, please enable the agent debug log, restart the agent, wait a few minutes, and then create an issue via Intercom. Please attach the log to the Intercom chat. We’d be happy to help.

Conclusion

We hope it’ll be convenient for you to see the PHP‑FPM metrics in Amplify, and have a broader view into the application behavior. With metrics for NGINX, the operating system, and now the PHP‑FPM metrics, monitoring a LEMP stack with Amplify becomes easier.

We’re planning to monitor even more application stack components in Amplify soon, so please keep in touch. If you have any suggestions, let us know!

Many thanks for using Amplify! If you haven’t tried it yet, sign up for the beta today!

The post Monitoring PHP Applications with NGINX Amplify appeared first on NGINX.

]]>
NGINX Amplify for Distributed Application Monitoring https://www.nginx.com/blog/nginx-amplify-distributed-application-monitoring/ Fri, 21 Apr 2017 23:34:43 +0000 https://www.nginx.com/?p=49827 Many of today’s applications are getting more distributed in nature. Engineers and architects are choosing to implement different approaches for making the apps more scalable and responsive to infrastructure changes. The popularity of “microservices” is rising, and we have written many overview posts on this topic. A very useful and popular series by Chris Richardson about [...]

Read More...

The post NGINX Amplify for Distributed Application Monitoring appeared first on NGINX.

]]>
Many of today’s applications are getting more distributed in nature. Engineers and architects are choosing to implement different approaches for making the apps more scalable and responsive to infrastructure changes. The popularity of “microservices” is rising, and we have written many overview posts on this topic.

Making your application more distributed does not always come easy, however. New challenges arise in areas that previously didn’t even exist. Let’s take a close look at some of the challenges in monitoring distributed applications.

Larger Number of Objects to Monitor

When you break your application out into multiple services, the sheer number of monitored objects rises significantly. We regularly see examples where hundreds of different services are scaled from tens to thousands of instances each.

With NGINX Amplify as your monitoring system, you can use multiple interface features to select and aggregate the metrics from multiple service instances. To name a few:

  • Image name – Specify the image name with the imagename parameter or AMPLIFY_IMAGE_NAME environment variable in /etc/amplify-agent/agent.conf. NGINX Amplify aggregates the metrics from all instances with the image name into a single instance in the NGINX Amplify Web UI.
  • Hostname – Specify the hostname with the hostname parameter or AMPLIFY_HOSTNAME environment variable in /etc/amplify-agent/agent.conf. Otherwise, NGINX Amplify uses the default hostname generated by your infrastructure, which might be nondescriptive or just plain unusable.
  • Aliases – Define user‑friendly names for your services in the Web UI.
  • Tags – Since March 2017, NGINX Amplify has supported tagging of NGINX instances. After creating a tag, you can use it in the custom dashboards, and create graphs showing the aggregated or average value for all objects that have the tag.

In the figure below we have tagged the dev-nodejs-api01 NGINX instance with backend, ubuntu, and docker.

Screenshot of the SYSTEM PROPERTIES card for a monitored NGINX server in NGINX Amplify, showing its tags which can be used for filtering. NGINX Amplify is a tool for NGINX server monitoring and application monitoring.

In the next figure we have tagged the prod-nginxplus-lb01 instance with frontend, plus, and ubuntu, and also created the alias nginx01 for it.

Screenshot of the SYSTEM PROPERTIES card for a monitored NGINX server in NGINX Amplify, showing its tags which can be used for filtering. A user-friendly alias has also been assigned to the server. NGINX Amplify is a tool for NGINX server monitoring and application monitoring.

Complicated Network Communication

In a distributed application, most of the services are making HTTP calls to other services, and you need to monitor this network traffic appropriately.

Having NGINX as a part of your distributed application delivery system simplifies the networking layer, because it extends the same set of network features across multiple systems. This includes the use of SSL/TLS, keepalive connections, tracing of requests with the $request_id variable, rate limiting, advanced Layer 7 routing to your applications, etc. Using NGINX for multiple purposes (proxying, load balancing, caching, and so on) also means you can monitor just one solution rather than a number of tools that each perform one function.

Installation, configuration, and ongoing use of NGINX Amplify is simpler than other solutions. NGINX Amplify parses your NGINX configuration files, finds the location and formats of all the logs and stats interfaces, then instantly starts monitoring them. It also doubles as a simple system‑level monitor, and you can even use it to monitor network traffic for servers that don’t have NGINX installed.

Short Object Lifetime

The microservices approach promotes shorter lifetimes for objects, which fits well with immutable infrastructure. The lifetime of a service instance in a microservices application might be as low as a few minutes or hours, in contrast to monolithic applications where an instance might live for days or even months.

Some monitoring systems are not useful for objects with short lifetimes, because there is a long delay before data from an object starts appearing in the monitoring system. NGINX Amplify starts parsing logs and collecting metrics as soon as it finishes parsing its configuration file. It sends monitoring data to the cloud every minute.

Multiple Software Stacks

One advantage of distributed applications is that all components do not have to conform to the same software stack guidelines. Many organizations prefer to limit the number of application stacks for reasons of simplified deployment, but few use only one stack throughout a distributed application. Most choose several different application languages and services to best fit specific purposes.

Placing NGINX and NGINX Amplify in front of every application instance provides the same delivery layer across multiple software stacks. NGINX supports application server protocols like FastCGI or uwsgi together with the HTTP protocol. You can use it in front of your application servers regardless of the application language (Go, Node.js, PHP, Python, etc.).

Instance Termination

In many environments the average service instance doesn’t have a permanent filesystem and is terminated after short‑term use. Log collection is usually handled by a distributed syslog‑compatible server or service.

The NGINX Amplify Agent can act as a local syslog server for metric aggregation. This way, your NGINX instance sends logs locally to the agent, and the agent doesn’t try to access /var/log/nginx/access.log in the local filesystem. This doesn’t mean you can disable existing log collectors or syslog servers: NGINX Amplify components cannot store log data; they only store aggregated metrics derived from the logs. Configuration of the syslog feature is simple; refer to the NGINX Amplify documentation for details.
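On the NGINX side, pointing access logs at a local syslog listener is a one‑line change to the access_log directive. The address, port, and tag below are assumptions for illustration; the real listener address comes from the agent’s syslog configuration:

```nginx
# Hypothetical: send access log entries to a local syslog listener
# instead of writing /var/log/nginx/access.log to disk.
# Address, port, and tag are assumptions -- match them to the agent's
# configured syslog listener.
access_log syslog:server=127.0.0.1:12000,tag=nginx,severity=info main;
```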

Summary

NGINX Amplify incorporates many features that make it ideal for distributed application monitoring. Its configuration options for aggregation of statistics, user-friendly naming, and tagging make it easier to monitor multiple service instances. Its quick startup and support of syslog-based logging make it suitable for the short-lived service instances common to microservices applications. And together with NGINX and NGINX Plus, it makes it easier to track complicated network communication and to support a mix of software stacks.

Sign up today to start using NGINX Amplify for free. You can also start a free 30‑day trial of NGINX Plus today or contact us to learn more about how NGINX Plus and NGINX Amplify can improve the performance of your distributed applications.

The post NGINX Amplify for Distributed Application Monitoring appeared first on NGINX.

]]>
The Dreaded 418 Error: Monitoring HTCPCP Microservices in IoT Deployments https://www.nginx.com/blog/the-dreaded-418-error-monitoring-htcpcp-microservices-in-iot-deployments/ Sat, 01 Apr 2017 07:15:17 +0000 https://www.nginx.com/?p=49201 The Internet of Things (IoT) is a growing segment of the technology market today. NGINX and NGINX Plus are often at the core of IoT deployments because of their small footprint, high performance, and ease of embedding into various devices. We see NGINX and NGINX Plus installed both on the server side and on the device side [...]

Read More...

The post The Dreaded 418 Error: Monitoring HTCPCP Microservices in IoT Deployments appeared first on NGINX.

]]>
The Internet of Things (IoT) is a growing segment of the technology market today. NGINX and NGINX Plus are often at the core of IoT deployments because of their small footprint, high performance, and ease of embedding into various devices. We see NGINX and NGINX Plus installed both on the server side and on the device side very often.

IoT installations can use a variety of popular protocols, including HTTP, HTTP/2, MQTT, and CoAP. We’ve covered the use of our products with HTTP protocols in many blog posts, and MQTT use cases in two recent posts:

But NGINX and NGINX Plus are equally suited for special‑purpose protocols such as the Hyper Text Coffee Pot Control Protocol (HTCPCP). This is the first in a series of blog posts where we plan to cover development, deployment, and monitoring of HTCPCP applications.

Installation of HTCPCP‑enabled devices is steadily growing every year, with a seasonal spike at the beginning of April.

Reference Architecture

The HTCPCP architecture in a large‑scale installation is different from that of a standard web application, where we usually see a lightweight client side communicating with a heavier set of backend applications. HTCPCP applications invert this: the HTCPCP server is designed to run on the device (namely, a connected coffee pot), and the clients connect to these machines using either special command‑line clients (modified curl commands) or a proxy infrastructure.

The microservices approach fits HTCPCP deployments very well. Service discovery, load balancing, cloud deployments, and API gateways are all applicable here.

The main parts of a microservices architecture for HTCPCP are DBAs, DPI, and USB.

DBAs

The Distributed Backend Applications (DBAs) are usually deployed on the coffee‑making devices, and can be implemented in various languages and application platforms. There are multiple commercial and open source examples, including:

  • An implementation on Python for Raspberry Pi
    (https://github.com/HyperTextCoffeePot/HyperTextCoffeePot)
  • An implementation in JavaScript for Node.js
    (https://github.com/stephen/node-htcpcp)
  • A high‑performance implementation in C
    (https://github.com/madmaze/HTCPCP)

DPI

The Distributed Proxy Infrastructure (DPI) in our HTCPCP deployment is NGINX or NGINX Plus instances embedded in the IoT devices. DBAs usually don’t require a DPI in a smaller installation. In a large enterprise, however, the following features are crucial for reliable service in the HTCPCP deployment:

  • Rate limiting and overflow protection
  • Health checks
  • Traffic monitoring and logging
  • Traffic encryption and security controls
  • Authentication

NGINX Plus is the most popular DPI in enterprise‑class HTCPCP deployments.

USB

In a development system or small‑scale installation (up to five coffee machines), the clients can connect to a DPI instance or even directly to the DBA. In larger enterprises, however, we usually see a Unified Scalable Backend (USB) developed and deployed either on premises or as a scalable cloud service. USBs usually perform functions like:

  • Coffee machine discovery, which is similar to service discovery in conventional microservices (read more about service discovery on our blog)
  • Protocol conversion from HTTP to HTCPCP
  • Web interface
  • End‑user authentication and access control

Most highly loaded web applications today are using NGINX and NGINX Plus as a load balancer, reverse proxy, and cache, and HTCPCP USB is no exception.

Monitoring HTCPCP Applications

A large‑scale installation of HTCPCP requires a set of monitoring tools that have to be more sophisticated than a generic monitoring system for a simple microservice. In an HTCPCP deployment we see more protocol conversions and more different types of devices where the monitoring agent needs to be deployed. In addition, organizations impose tight SLAs for preparation of a cup of coffee, because this directly impacts productivity of their employees.

The Infrastructure Department for IoT (IDIoT) at every company is constantly looking for ways to effectively monitor HTCPCP installations.

At NGINX, Inc. we’ve developed NGINX Amplify, the monitoring and configuration assistance system that suits an HTCPCP deployment just as well as a complex HTTP deployment.

To start using NGINX Amplify in your infrastructure, you need to install the NGINX Amplify Agent together with your NGINX instance on all DPIs and USBs. The beauty of our solution starts with the fact that it’s absolutely agnostic to the underlying architecture. While USBs might be located on large enterprise servers or containers in the cloud, you usually own the hardware that runs the DBAs and DPIs. NGINX and NGINX Amplify fit equally well in the cloud and on your hardware.

In the screenshot below you can see a small portion of NGINX Inc.’s own HTCPCP installation, with multiple machines connected through various instances of NGINX and NGINX Plus.

NGINX DevOps engineers are dedicated to providing infrastructure that is resilient to any issues with electricity, the network, or grind inconsistency. In our recent blog post we described how the advanced graphs and dashboards help our own infrastructure team to maintain the highest standards of uptime and reliability.

In addition to standard HTTP monitoring, you can use NGINX Amplify to monitor the specific issues that arise only in an HTCPCP deployment.

NGINX and NGINX Plus produce logs which can be configured to report a lot of internal data. The NGINX Amplify Agent can read those logs, or receive them using the syslog interface. As an example, let’s monitor one of the most devastating status codes that can affect your HTCPCP deployment: 418 I'm a Teapot. This status code is returned if the type of your IoT device is mismatched: the DBA produces tea, but the USB requested a coffee drink.

Fortunately, even in a caffeine‑deprived state it’s easy to create the appropriate NGINX Amplify filter:

  1. Create a new graph in your custom dashboard
  2. Select the metric nginx.http.status.4xx
  3. Create a filter for $status ~ 418, optionally with additional criteria

Once the filtered graph is created, the agents start applying this filter to the logs and report the resulting metrics to NGINX Amplify Cloud.
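Conceptually, the `$status ~ 418` filter behaves like the sketch below, which scans combined‑format access log lines for the status code. The log format and sample lines are assumptions for illustration; the real agent works on the parsed log, not a regex like this one:

```python
import re

# Conceptual sketch of a "$status ~ 418" filter over combined-format
# access log lines. The combined log format is assumed: the status code
# is the first 3-digit field after the quoted request.
STATUS_RE = re.compile(r'" (\d{3}) ')

def count_teapots(log_lines):
    """Return how many requests ended in 418 I'm a Teapot."""
    count = 0
    for line in log_lines:
        match = STATUS_RE.search(line)
        if match and match.group(1) == "418":
            count += 1
    return count
```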

The following screenshot shows the daily rate of 418 errors across a large corporate office in Europe.

Learn more about NGINX Amplify filters and dashboards on our blog.

Conclusion

NGINX, NGINX Plus, and NGINX Amplify are the de facto standard for microservices applications and IoT deployments. If your deployment involves the use of HTCPCP protocol at enterprise scale, you need tools that are up to the task. As always, you can count on NGINX, NGINX Plus, and NGINX Amplify.

Get your HTCPCP deployment perking with NGINX Plus – start your free 30-day trial today or contact us to discuss your caffeination schedule. And since you’re wide awake anyway, you’ll want to keep a close eye on your HTCPCP apps with NGINX Amplify – sign up now.

The post The Dreaded 418 Error: Monitoring HTCPCP Microservices in IoT Deployments appeared first on NGINX.

]]>
Inside NGINX Amplify: Insights from Our Engineers https://www.nginx.com/blog/inside-nginx-amplify-insights-from-engineers/ Tue, 17 Jan 2017 14:00:31 +0000 https://www.nginx.com/?p=47875 This blog post is the fourth in a series about how to monitor NGINX and make the most out of your web delivery infrastructure with NGINX Amplify: Setting Up NGINX Amplify in 10 Minutes Improving Server Configuration with NGINX Amplify Reports Using NGINX Amplify Custom Dashboards and Filters for Better NGINX Monitoring Inside NGINX Amplify: Insights from Our Engineers [...]

Read More...

The post Inside NGINX Amplify: Insights from Our Engineers appeared first on NGINX.

]]>
This blog post is the fourth in a series about how to monitor NGINX and make the most out of your web delivery infrastructure with NGINX Amplify:

  1. Setting Up NGINX Amplify in 10 Minutes
  2. Improving Server Configuration with NGINX Amplify Reports
  3. Using NGINX Amplify Custom Dashboards and Filters for Better NGINX Monitoring
  4. Inside NGINX Amplify: Insights from Our Engineers (this post)

The public beta version of NGINX Amplify is now used on thousands of servers. Many sysadmins and DevOps personnel have implemented it in their development, staging, and production environments. Feedback has been very positive, but we are receiving many requests for more examples of how to use it in the real world.

We asked ourselves: who knows NGINX and NGINX Amplify the best? Well, it’s definitely our own DevOps professionals. So we interviewed two of our experienced operations engineers, Andrei Belov and Jason Thigpen. We asked them to describe how they configure NGINX and NGINX Plus for better monitoring and how they use NGINX Amplify.

Andrei Belov

Tell us a bit about your experience and background, before working at NGINX, Inc. and now.

Before NGINX, I worked at a medium‑size Internet service provider which also provided a wide set of hosting services, including a large shared‑hosting farm powered by Apache. I started as a tech support engineer and moved up to become a site reliability engineer [SRE], then a systems engineering manager. Now, at NGINX, I work with some of the largest cloud providers: building their load‑balancing systems, doing release engineering for NGINX Plus, and helping with DevOps tasks for NGINX Amplify infrastructure such as EC2, RDS, and Route 53.

Any interesting projects in your past?

[Open source] NGINX was launched in 2004, and we started to use it at my previous job around 2007. Before NGINX we were using the OOPS caching proxy server (unmaintained since early 2000s) in front of Apache 1.x. For the first step of NGINX adoption, we put NGINX as a reverse proxy in front of Apache, gradually replacing our OOPS accelerators with NGINX. Then, we started to serve static files with NGINX, in order to further speed up the delivery of hosted websites.

The third step, which I led, was porting mod_aclr to support Apache 2.x which we’d just started to use back then, and embedding it in our setup. Apache behind NGINX receives the request, and if the request is trying to get a static file, Apache stops request processing, returning the response header X-Accel-Redirect back to NGINX along with the file location. NGINX then serves the file to the user. That tremendously offloaded our hosting machines’ resources and basically saved us at times of peak load. Another thing that helped us was the support for keepalive to upstreams behind NGINX.
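The NGINX side of the X‑Accel‑Redirect handoff Andrei describes can be sketched as a config fragment (the location and filesystem paths are assumptions for illustration):

```nginx
# Apache (the upstream) replies with, e.g., "X-Accel-Redirect: /protected/img.png";
# NGINX then serves the file itself from this internal-only location,
# taking static-file delivery off the Apache workers.
location /protected/ {
    internal;
    alias /var/www/static/;
}
```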

How did you monitor servers in the past?

Years ago, we used our own set of scripts, custom built for the task. We were taking stats and metrics from the machines, switches, and routers in our own data center using SNMP, then feeding it into a central database with a simple but quite functional web frontend.

What qualities are necessary for a good monitoring system?

Minimum false positives and ease of use.

When did you join NGINX?

Three months after it became a company, five years ago.

You’ve been with NGINX Amplify project since its very beginning. What is the history of its software stack?

That was an interesting development. After a set of whiteboarding sessions we implemented a prototype in an on‑premises virtualization system. Then we quickly moved everything to EC2. Requests now resolve through Route 53 and then go to an NGINX Plus load‑balancing layer. We use NGINX Plus everywhere in the stack. I work most frequently with the backend receiver services and with the application deployments. NGINX does not require a lot of attention; it just works. It only needs to be monitored properly.

How was your monitoring implemented? What did you try to use, what worked and what did not?

At the very beginning of the project, monitoring was not implemented. Once the infrastructure started to grow, several months before public beta, we added different monitoring systems and used NGINX Plus live activity monitoring together with them. All of these tools provide views of NGINX Plus status, with longer retention times.

How did you start using NGINX Amplify for monitoring?

When the NGINX Amplify Agent became available, we installed it immediately in the developer stack. We wanted to start using it ourselves as early as possible. With the release of private beta, we started to use it in the production stack.

Which graphs do you usually look at?

It depends on the situation. Most frequently I look at requests per second, network traffic, connections per second, and NGINX upstream errors.

NGINX Amplify graph showing a steady rate of requests per second

Has NGINX Amplify helped you during an outage?

We recently had an interesting situation. We usually have the NGINX Amplify Agent running together with another monitoring system agent, which monitors the system metrics. Last month, one server started to appear offline in that monitoring system for no apparent reason. The server was working, but the monitoring agent failed.

In order to understand why the agent was not working correctly, we created a custom dashboard in NGINX Amplify. We collected every possible system metric, including IOWait, iostat, and network metrics, and started looking for abnormalities. We found a correlation between metrics: every time the service appeared down, we had a burst in IOWait. That information helped us troubleshoot the cloud storage issues that were occurring.
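The kind of correlation hunting described here can be done by eye on a dashboard, but the underlying idea is just measuring whether two metric series move together. A minimal sketch, with made-up per-minute samples standing in for the IOWait and agent-availability metrics:

```python
# Hypothetical sketch: checking whether two metric series (e.g. IOWait and
# agent-reported availability) move together, as in the incident described above.
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length series."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Illustrative per-minute samples: IOWait percentage vs. agent up (1) / down (0)
iowait   = [2, 3, 2, 45, 50, 3, 2, 48, 2, 3]
agent_up = [1, 1, 1,  0,  0, 1, 1,  0, 1, 1]

r = pearson(iowait, agent_up)
print(f"correlation: {r:.2f}")  # strongly negative: IOWait bursts track agent outages
```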

NGINX Amplify graph showing a burst in IOWait time

Which system tools do you usually run during troubleshooting?

Depends on the situation. Generic system tools: vmstat, iostat, ps, top. Network tools: netstat, ifstat, ss, tcpdump.

What do you find especially useful in NGINX Amplify?

Definitely the custom dashboards and custom graphs. Sometimes I need to monitor something very specific. Let’s say there’s something wrong with the server, and I need to find out the cause of the issue. Now I can make my own dashboard with graphs on anything. This helps me in detecting anomalies and finding the root cause of the issues.

What are you going to research more in NGINX Amplify? How are you going to extend its use for your systems?

I definitely want to research filters further. I think they provide an exceptional level of detail and are likely to become critical for complex troubleshooting.

Jason Thigpen

Tell us a bit about yourself. What’s your background and how did you start using NGINX?

I would describe myself as an “infrastructure automation enthusiast”. Generally my roles have focused on building out infrastructure and automating management of it. Not necessarily “set up 12 servers”, but rather “ensure we can easily and automatically create and destroy infrastructure resources on demand”.

In the early 2000s I was the Systems Engineer for a graduate program at a university where I inherited an infrastructure consisting of beige box PCs stacked on wooden shelves. I quickly acquired an old server rack from another department and built out a stack of redundant rack‑mounted hardware. I virtualized everything on Xen, focusing on wrappers around common administration tasks giving me the flexibility to build out internal web tools primarily in PHP, Python, and Ruby/Rails behind NGINX.

As AWS gained popularity, more people became interested in “the cloud” – AKA “someone else’s computer”. I gained a lot of experience consulting on many on‑site–to–cloud migrations over the years. I then spent some time working for an employer with one of the largest data‑center footprints. Through that combination of experience, I have established a vocabulary for defining network resources that applies to both data‑center and cloud resources. Ideally, I wanted to make those resources as platform‑agnostic as possible.

What makes working on NGINX Amplify project interesting?

I have always had an interest in building a business on top of open source software and the communities behind it. NGINX fits that vision perfectly. In the NGINX Amplify team, we are constantly dealing with the challenge of scaling a production infrastructure from scratch to meet the demands of our growing customer base. This also gives us an opportunity to demonstrate best practices for how to run a software as a service (SaaS) product.

Can you describe the software stack of NGINX Amplify?

We run NGINX at every layer of our stack. That starts with our load‑balancing layer at the edge, routing traffic to frontend and backend services. We also run NGINX in front of each application server as a local reverse proxy. Frontends and backends are distributed across multiple availability zones for redundancy and high availability.

Do you run NGINX Amplify Agent in the stack?

We have run NGINX Amplify Agent on all nodes from day one. NGINX Amplify complements NGINX to the point that I see no reason not to run it next to every instance. You get a central location for visibility into what’s going on with your NGINX deployments and how that correlates to other system metrics.

Can you describe your day‑to‑day monitoring activities? What are you looking at?

Thankfully, I joined after Andrei and others had put together some really great overview dashboards. Among other things, one of our “go to” dashboards covers requests per second, amount of traffic sent and received, and counts of response status codes at every layer of our stack. Using filters, we can get even more granular insight into exactly which response status codes we’re returning.

Any interest in system‑level metrics: CPU, RAM, disk I/O?

I think system metrics are interesting as correlations to other NGINX metrics. We get them with the NGINX Amplify Agent, and it helps us paint the whole picture of what might be happening at any given time.

Do you predict trends with NGINX Amplify?

Today, this largely depends on known trends. Since NGINX Amplify is a new product, we often have limited long‑term data. However, based on historical experience we look at time periods that we consider “good” or “bad” and set thresholds for new NGINX Amplify alerts. That feedback helps inform decisions about new graphs and filters that we can use in more meaningful ways.

What special systems do you use for alerting?

We use NGINX Amplify alerts to send emails as we reach warning level thresholds. We send similar critical level alerts to our on‑call escalation service, PagerDuty.

How critical is a server failure?

A well‑architected infrastructure should handle a single host failure seamlessly, and ours is no exception. Our NGINX Plus load‑balancing tier is key to handling a dynamic infrastructure. For instance, if we see a backend host behaving badly, we can quickly and easily prevent it from accepting production traffic and replace it with a new healthy instance using the upstream_conf interface [Editor – This interface is deprecated and replaced by the NGINX Plus API in later releases].

How do you do that?

We primarily use SaltStack for infrastructure orchestration. At the simplest level, you can target your upstream load balancers with the Salt execution module http.query, with a query to the appropriate NGINX Plus API endpoint. Ideally, you utilize the Salt Reactor system to act on events triggered by beacons.
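The request that Salt’s http.query would send can be sketched as a small helper that builds the method, URL, and JSON body for marking an upstream server down via the NGINX Plus API. The API base URL, version, upstream name, and server ID below are all illustrative, not values from the Amplify stack:

```python
# Hypothetical sketch of the HTTP request Salt's http.query would issue
# to take a server out of an NGINX Plus upstream group.
import json

def drain_request(api_base, upstream, server_id):
    """Build the (method, url, body) triple for marking an upstream server down."""
    url = f"{api_base}/http/upstreams/{upstream}/servers/{server_id}"
    body = json.dumps({"down": True})   # PATCH toggles the server's "down" flag
    return ("PATCH", url, body)

method, url, body = drain_request("http://lb1:8080/api/9", "amplify_backend", 3)
print(method, url, body)
```

Sending the same triple with any HTTP client (or Salt's http.query with `method="PATCH"` and `data=body`) removes the server from rotation without a reload.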

Did you have events where you had a server failure and had to use your monitoring tools? What did you do then?

One memorable issue surfaced through the monitoring of our message‑queueing and background‑jobs clusters. The NGINX Amplify Agent gave us the system‑level metrics that we then used for alerting. The last thing we want is to learn about an issue from customer emails; we expect our infrastructure to alert us before our customers notice anything.

What do you usually look at in case of performance degradation?

We start with the overview dashboards that we have configured in NGINX Amplify, looking for quick and obvious visual correlations. This will usually highlight a host or a service spike in some metric, whether it is requests per second, CPU usage, or errors. Then we take that service or host out of its respective load‑balancing pool for forensics and manual maintenance. As described above, we can also spin up a replacement machine while we continue tracing the issue with the failed server.

Which specific NGINX Amplify metrics and dashboards have been particularly helpful to you?

I remember we began seeing an increase in the nginx.http.status.5xx metric for one of our services. We needed to narrow that down to understand which errors they were; we return a few different application‑level response status codes for different reasons. Using filters to split that aggregate into multiple metrics allowed us to create graphs comparing specific application errors. That showed us the increase was related to failed database connections, and with this new insight we discovered an issue with our database hosting provider that we were able to troubleshoot and resolve with them.
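Conceptually, what those filters do is split one aggregate counter into per-status-code metrics derived from the access log. A small sketch of that idea, using hypothetical log lines in the default "combined" format:

```python
# Illustrative sketch of splitting an aggregate 5xx count into per-status
# metrics, the way Amplify filters do. Log lines and paths are hypothetical.
import re
from collections import Counter

STATUS_RE = re.compile(r'" (\d{3}) ')   # status code follows the quoted request line

log_lines = [
    '10.0.0.1 - - [01/Jan/2017:00:00:01 +0000] "GET /api/v1/agent HTTP/1.1" 200 512 "-" "amplify-agent"',
    '10.0.0.2 - - [01/Jan/2017:00:00:02 +0000] "POST /api/v1/store HTTP/1.1" 502 0 "-" "amplify-agent"',
    '10.0.0.3 - - [01/Jan/2017:00:00:03 +0000] "POST /api/v1/store HTTP/1.1" 503 0 "-" "amplify-agent"',
    '10.0.0.2 - - [01/Jan/2017:00:00:04 +0000] "POST /api/v1/store HTTP/1.1" 502 0 "-" "amplify-agent"',
]

per_status = Counter()
for line in log_lines:
    m = STATUS_RE.search(line)
    if m:
        per_status[m.group(1)] += 1

# The aggregate metric vs. its filtered breakdown
total_5xx = sum(n for code, n in per_status.items() if code.startswith("5"))
print("nginx.http.status.5xx =", total_5xx)
for code, n in sorted(per_status.items()):
    print(f"status {code}: {n}")
```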

What tools do you run in order to troubleshoot issues?

Tracing tools, plus tcpdump and Wireshark. Much of the output of the standard system tools is already presented in the dashboards, so we only need to narrow things down to a particular service on the server and troubleshoot it. The service's logs usually help a lot. Then we may take traffic captures with tcpdump, or even debug the application itself. Sometimes the problem is system‑level, and iostat together with other tools can be helpful.

Which features of NGINX Amplify are you planning to research more and implement later?

I find filters really interesting. I really like the idea of applying filters to some metrics, and I want to understand them better. It is a very powerful tool that we plan to use extensively.

Conclusion

NGINX Amplify is proving its value in our own stack here at NGINX and for thousands of our customers as well. Download the NGINX Amplify public beta and try it yourself today.

The post Inside NGINX Amplify: Insights from Our Engineers appeared first on NGINX.
