Notes from DrupalCon - Keeping the lights on (operations and monitoring best practices)
The following are my notes from the session “Keeping the lights on - operations and monitoring best practices”, held on Wednesday, March 21st, 2012 at DrupalCon Denver.
“Measurement is the link between mathematics and science” - Brian Ellis, Cambridge, 1968
Primary topics
- Platform management, monitoring, and measurement
- Security testing and monitoring
- Monitoring - mean time to recovery is a key metric (how long does it take to fix)
- Ongoing operational security
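Mean time to recovery, called out above as a key metric, is the average gap between detecting an incident and resolving it. A minimal sketch, assuming incidents are recorded as pairs of detection and resolution timestamps (the record format and the sample values are hypothetical, not from the talk):

```python
from datetime import datetime, timedelta

def mean_time_to_recovery(incidents):
    """Average time between detection and resolution.

    `incidents` is a list of (detected_at, resolved_at) datetime pairs.
    Returns a timedelta, or None if there are no incidents.
    """
    if not incidents:
        return None
    total = sum((resolved - detected for detected, resolved in incidents),
                timedelta())
    return total / len(incidents)

# Hypothetical incident log, for illustration only.
incidents = [
    (datetime(2012, 3, 1, 9, 0), datetime(2012, 3, 1, 9, 30)),    # 30 minutes
    (datetime(2012, 3, 8, 14, 0), datetime(2012, 3, 8, 15, 30)),  # 90 minutes
]
print(mean_time_to_recovery(incidents))  # 1:00:00
```

Tracking this number over time shows whether operational changes are actually shortening outages.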
Essential Monitoring Features
- Real-time AND trend monitoring
- Custom plugin system
- Avoid proprietary languages to ensure anyone can contribute
- Runs your functional tests
- Active AND passive monitoring
- Log analysis
- Escalation
- Quality of life - levels, rotations
- Remote command/“job” execution
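Real-time checks catch outright failures, while trend monitoring catches slow drift away from normal behavior. One way this could be sketched is to compare a new measurement against a baseline of earlier ones. The three-standard-deviation threshold and the response times below are assumptions chosen for illustration, not values from the talk:

```python
from statistics import mean, stdev

def is_anomalous(baseline, sample, sigmas=3.0):
    """True if `sample` is more than `sigmas` standard deviations
    above the mean of the baseline measurements."""
    if len(baseline) < 2:
        raise ValueError("need at least two baseline samples")
    threshold = mean(baseline) + sigmas * stdev(baseline)
    return sample > threshold

# Hypothetical page response times in milliseconds.
baseline_ms = [210, 190, 205, 198, 202, 195]

print(is_anomalous(baseline_ms, 215))  # False: within normal variation
print(is_anomalous(baseline_ms, 450))  # True: far above the baseline
```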
Functional tests
Business metrics
- PageRank
- Things that are relative to the business
- Number of users
Technical monitoring
- APC tool
- Service state
- Cron - execute from remote monitoring system like Nagios
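Nagios plugins report state through their exit code: 0 for OK, 1 for WARNING, 2 for CRITICAL, and 3 for UNKNOWN. A minimal sketch of the decision logic for a cron freshness check follows. How the last-run timestamp is obtained (for example, from Drupal's `cron_last` variable) is left out, and the thresholds are hypothetical defaults:

```python
# Standard Nagios plugin exit codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3

def check_cron_age(last_run, now, warn_after=3600, crit_after=10800):
    """Return (exit_code, message) for a cron job's last-run timestamp.

    `last_run` and `now` are Unix timestamps. The thresholds are in
    seconds and are illustrative, not recommendations from the talk.
    """
    if last_run is None:
        return UNKNOWN, "CRON UNKNOWN - no last-run timestamp available"
    age = int(now - last_run)
    if age >= crit_after:
        return CRITICAL, f"CRON CRITICAL - last run {age}s ago"
    if age >= warn_after:
        return WARNING, f"CRON WARNING - last run {age}s ago"
    return OK, f"CRON OK - last run {age}s ago"

now = 1332352800  # arbitrary fixed timestamp so the example is repeatable
code, message = check_cron_age(now - 7200, now)
print(code, message)  # 1 CRON WARNING - last run 7200s ago
```

A real plugin would print the message and exit with the returned code so the monitoring server can pick up the state.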
Nagios Module
Job Automation
- Jenkins is the de facto standard for continuous integration and deployment
- Codify and script all deployment activities
Logging
- Turn on syslog logging - instead of database, write to a text file
- Centralized off-server
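Once logs are plain text on a central host, log analysis can start with simple counting. The sketch below assumes the pipe-delimited field order that Drupal's syslog module uses by default and an identity string of "drupal". Both are assumptions to verify against your own configuration, and the sample lines are made up:

```python
from collections import Counter

# Assumed field order for Drupal's syslog output; check your site's settings.
FIELDS = ["base_url", "timestamp", "type", "ip", "request_uri",
          "referer", "uid", "link", "message"]

def parse_line(line, ident="drupal"):
    """Parse one syslog line into a dict, or return None if it does not match."""
    _, found, payload = line.partition(ident + ": ")
    if not found:
        return None
    # Limiting the split keeps any "|" inside the free-text message intact.
    parts = payload.rstrip("\n").split("|", len(FIELDS) - 1)
    if len(parts) != len(FIELDS):
        return None
    return dict(zip(FIELDS, parts))

def count_by_type(lines):
    """Count log entries per Drupal message type (php, user, cron, ...)."""
    entries = (parse_line(line) for line in lines)
    return Counter(e["type"] for e in entries if e is not None)

sample = [
    "Mar 21 10:00:00 web1 drupal: http://example.com|1332324000|php|10.0.0.1|http://example.com/node/1||0||Notice: undefined index",
    "Mar 21 10:00:05 web1 drupal: http://example.com|1332324005|user|10.0.0.2|http://example.com/user||0||Login attempt failed for admin.",
    "Mar 21 10:00:09 web1 drupal: http://example.com|1332324009|user|10.0.0.2|http://example.com/user||0||Login attempt failed for admin.",
    "Mar 21 10:00:10 web1 sshd[123]: unrelated line",
]
print(count_by_type(sample))
```

A sudden jump in one type, such as failed logins, is the kind of signal a database-only log tends to hide.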
Monitoring Overview
- Ping or HTTP result code alert monitoring || Live user story testing and trend analysis
- Crontabs and poormanscron || centralized cron management
- Logging to database only || Syslog logging to central host
- Logging in to see Drupal errors and available updates || Centralized Drupal monitoring
- Offsite backups || Off-cloud backups
Book recommendation
Security Testing and Monitoring
- Tools and services to detect and respond to vulnerabilities and threats.
Detect
- Finding the problem
Respond
- Mitigate, fix, alert
- Having a response plan before incidents occur
Vulnerabilities
Threats
- Ways to attack, whether or not they are successful
Vulnerabilities (OWASP Top 10)
- Injection
- XSS - biggest problem in Drupal
- Broken auth/session - using core? OK
- Insecure direct object reference - managing access
- CSRF
- Misconfiguration
- Insecure cryptographic storage - site specific, SSH, using a VPN to encrypt traffic
- Exception - password hash, encrypted information within site and database (encryption module)
- Failure to restrict URL access
- Insufficient transport layer protection - https
- Unvalidated redirects and forwards
Detecting Vulnerabilities
- Automated code reviews
- Static: Coder module, Secure Code Review module, Acquia
- Dynamic: Not common
- Automated penetration testing
- Generic tools: Grendelscan (open source), Fortify, Rational
- Drupal Tools: Acquia
- Manual code reviews
- Look for unsafe patterns such as db_query("DELETE FROM {users} WHERE name = '$name'"); where user input is placed directly into the SQL string
- Manual penetration testing
- Be an intelligent robot
- Vuln.module (NEEDS PORT TO DRUPAL 7), Firefox: Tamper Data
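The db_query example in the manual code review item is dangerous because the user-supplied name becomes part of the SQL text. The usual fix is to bind the value as a parameter; in Drupal 7 that means passing a named placeholder such as :name to db_query() along with an array of values. The sketch below shows the same idea using Python's built-in sqlite3 module as a stand-in database. The table and payload are hypothetical:

```python
import sqlite3

def delete_user(conn, name):
    """Delete a user by name using a bound parameter instead of string building."""
    # The driver passes `name` as data, so quotes inside it cannot alter the SQL.
    cur = conn.execute("DELETE FROM users WHERE name = ?", (name,))
    conn.commit()
    return cur.rowcount

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT)")
conn.executemany("INSERT INTO users VALUES (?)", [("alice",), ("bob",)])

# A classic injection payload: harmless when bound, destructive when concatenated.
payload = "x' OR '1'='1"
deleted = delete_user(conn, payload)
remaining = conn.execute("SELECT COUNT(*) FROM users").fetchone()[0]
print(deleted, remaining)  # 0 2
```

Had the payload been concatenated into the statement, the WHERE clause would have matched every row.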
Security review module
Responding to Vulnerabilities
Custom code:
- Fix it
- Test it
- Deploy it
- Contact customers (?)
Contributed Code
- 4 steps above
- Work out a simple, repeatable test case
- Report the issue to the Drupal Security Team
- Compare to http://drupal.org/security-advisory-policy
- Work with the Team and maintainer to get a fix
- something else???
Detecting threats
- Spam
- Can be obvious indicator, but only if you’re actually monitoring
- Defacement (can be hidden)
- Use version control, Hacked! module
- security_review.module
- Watch revisions
- Crowdsource (flag)
- Code injection (XSS, PHP)
- IDS - PHPIDS, TinyIDS
- Web Application Firewall
- Brute force password
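Hidden defacement and dropped files can be detected by comparing the files on disk against a known-good copy, which is the idea behind using version control or the Hacked! module. The sketch below is a generic stand-in for that comparison based on SHA-256 hashes. It is not how the Hacked! module is implemented, and the file names are hypothetical:

```python
import hashlib
import tempfile
from pathlib import Path

def build_manifest(root):
    """Map each file's path (relative to `root`) to its SHA-256 hex digest."""
    root = Path(root)
    return {
        str(p.relative_to(root)): hashlib.sha256(p.read_bytes()).hexdigest()
        for p in sorted(root.rglob("*")) if p.is_file()
    }

def diff_manifests(known_good, current):
    """Return files changed, added, or removed since the known-good snapshot."""
    changed = sorted(f for f in known_good
                     if f in current and known_good[f] != current[f])
    added = sorted(f for f in current if f not in known_good)
    removed = sorted(f for f in known_good if f not in current)
    return {"changed": changed, "added": added, "removed": removed}

# Demonstration in a throwaway directory standing in for a docroot.
with tempfile.TemporaryDirectory() as docroot:
    index = Path(docroot) / "index.php"
    index.write_text("<?php print 'welcome';")
    known_good = build_manifest(docroot)

    index.write_text("<?php print 'defaced';")                        # tampered file
    (Path(docroot) / "shell.php").write_text("<?php /* dropped */")   # new file
    report = diff_manifests(known_good, build_manifest(docroot))

print(report)
```

The known-good manifest needs to live somewhere an attacker on the web server cannot rewrite it.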
Responding to threats
- Spam
- Mollom, Akismet
- Spam, flag_abuse
- Defacement
- Revert to good copies from version control
- Overwrite with new versions
- Node revisions, db backup
- Code injection
- Keep code safe
- Proactively block attackers at the firewall
- Brute force password
- login_security module
- Included in Drupal 7 core
- Help with everything: httpBL
Site monitoring
- Internal/Free
- Views
- Mailmon - brand new
- Quant - charting
- Report - charting
- Chart (system_charts)
- External/Paid
- Acquia network - ~$350/year, includes library, support
- Droptor - $24/month/site, monitoring only
- Drupalmonitor.com - unknown pricing
Three keys to ongoing operational security
- Vigilance
- Strong Chain
- Incident Handling
What are the things that we need to do on an ongoing basis after launch?
- Maintain eternal vigilance
- Automate as much as possible
- Avoiding human error - often “I was too busy to get to it”
- Conduct periodic audits
- Never sleep
Periodic Audit Program
Avoiding weak links in the chain
- Education
- Training
- Awareness
Patching
- PCI DSS requires patching of all critical infrastructure within 30 days
- What:
- Linux or other underlying OS
- Firewall infrastructure
- Switches
- Wireless Access Points
- … more
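The 30-day requirement above lends itself to an automated check that flags anything left unpatched for too long. A minimal sketch, assuming you keep an inventory of pending critical patches with their release dates (the inventory format and entries are hypothetical):

```python
from datetime import date, timedelta

PATCH_WINDOW = timedelta(days=30)  # the 30-day window cited in the notes

def overdue_patches(pending, today):
    """Return names of components whose pending patch is past the window.

    `pending` maps a component name to the release date of its oldest
    unapplied critical patch.
    """
    return sorted(name for name, released in pending.items()
                  if today - released > PATCH_WINDOW)

# Hypothetical inventory, for illustration only.
pending = {
    "os-kernel": date(2012, 2, 1),
    "firewall-firmware": date(2012, 3, 10),
    "drupal-core": date(2012, 2, 10),
}
print(overdue_patches(pending, today=date(2012, 3, 21)))
# ['drupal-core', 'os-kernel']
```

Running a check like this from the monitoring system turns the patching plan into something that alerts instead of something that has to be remembered.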
Incident Management (needs to be written)
- Initial Response
- Notification and Escalation
- Smallest possible group for as long as possible, then figure out communication
- Response Strategy
- Do we need to update? Notify users?
One important take-away
- Don’t use the same password on multiple sites you administer (PlayStation Network)
Secure Site Admin Pledge
- I pledge to take the following steps to be a responsible Drupal site administrator:
- I have set a unique, strong password for any accounts with administrative privileges, and I do not share passwords across sites
- I use multi-factor authentication (e.g., ssh keys) for OS-level access and have password-only access disabled on my systems.
- I have and execute a patching plan that includes the OS, web server, and Drupal layers (including core, modules, and custom code)
- I have and execute at least a minimalist periodic audit plan
- I am aware of and comply with applicable information security requirements for the data that my site handles (HIPAA, PCI DSS, etc.)
- I monitor vulnerability announcement mailing lists for the technologies I use on my site
- I monitor my system regularly such that I know how it behaves under normal conditions
- I have a documented incident handling plan that I am familiar with and can use in an emergency
- I take responsibility for ensuring that any custom code is developed according to secure coding best practices and is evaluated before being put into production
- I will be eternally vigilant and investigate any unusual/suspicious site behavior
- I have a process in place to ensure non-production sites are appropriately protected from external access/crawling
- I am an advocate for practical information security practices and avoid “Security theater” showmanship
Thank You!
Please get in touch to chat about these topics: