Category Archives: cloud

IBM SoftLayer IaaS – notes from 2 day training class in NYC

I attended SoftLayer training in NYC and wrote up a few pages of notes. I really like the idea of building IaaS systems via web control panels and APIs, and SoftLayer delivers on this.

Overview

  • 21k customers in 140 countries
  • 15 data centers, 18 network points of presence (PoPs)
  • Mix and match of virtual (diverse set of hypervisors) and bare metal servers, all managed via web control panel and/or API
  • Deployment in real time with a high degree of automation.
  • Some customers build a hybrid solution using SoftLayer in addition to their own datacenter. Connect via VPN or leased line.

Server Architecture

  • While most cloud providers offer only virtualized resources on shared infrastructure, SoftLayer offers the option of bare metal and/or virtualization, and the option of shared and/or dedicated infrastructure.
  • Redundancy in some cases stops at the rack, not the server. For example, there may be multiple power supplies for the rack, but not for each server in the rack
  • Server options
    • Multi-tenant (you don’t know who/what else is running on the same resources as you)
      • Virtual (public node)
        • Managed Citrix Xen hypervisor
        • Monthly/Hourly billing
        • Up to 16 cores
        • Local storage or SAN
        • Free 5 TB outbound data transfer if you choose monthly billing
        • 15 minute provisioning
      • Single tenant (all resources dedicated to single customer, aka “private cloud”)
        • Bare metal
          • Optional (unmanaged) hypervisor, such as Citrix Xen, VMware, Hyper-V, Parallels
          • Monthly billing; in some instances hourly billing is available
          • Free 20 TB outbound bandwidth per month
          • Optional private network, private rack
          • Options on CPUs, up to 36 internal drives (build your own NAS), NVIDIA Tesla GPU http://www.nvidia.com/object/tesla-servers.html
          • 2-4 hour provisioning. That’s the time it takes for the machine to become visible to the customer. Additional time needed to apply operating system and applications.
        • Virtual (private node)
          • Pretty much the same as Multi-tenant virtual except that you have dedicated hardware.
          • You can install as many virtual machines as you want on your hardware.
    • Customers may deploy their own software appliances, but there is no option to deploy their own hardware
    • Image Templates
      • Software/configuration of a physical or virtual space
      • Apply to a machine to create a runtime environment
      • Two types of image templates
        • Standard
          • Virtual machine only
          • Any operating system
          • Citrix Xen only
        • Flex
          • Both physical and virtual machines
          • Red Hat (RHEL) and Windows only
          • All hypervisors

Networking

  • Three networks
    • Public (2 NICs, both usable rather than one being merely a redundant spare)
      • Bare metal: 20 TB outbound bandwidth per month
      • Virtual: 5 TB outbound bandwidth per month. Can be pooled if some servers aren’t publicly exposed
    • Private (2 NICs, both usable rather than one being merely a redundant spare)
      • No limitations on bandwidth. Great for backups across multiple datacenters
      • Private VLANs can include servers in multiple datacenters. A server can connect (span) to multiple VLANs
    • Management/Admin (1 NIC)
  • SoftLayer SLA: “reasonable efforts to provide 100% service”
  • VPN
    • tunnels: SSL, PPTP, IPSec
    • Recommends managing with FortiGate or Vyatta appliances
  • SoftLayer Looking Glass: Test latency between your datacenter and SoftLayer, or between resources within or across SoftLayer datacenters
  • Content Delivery Network
  • Load Balancing
  •  Firewalls
    • Fortinet FortiGate 3000 series http://www.fortinet.com/products/fortigate/3000series.html
    • Shared hardware
      • Multi-tenant
      • Managed through Customer Portal & APIs. No console access because it’s shared hardware.
      • Configured to protect a single server
    • Dedicated hardware
      • Same as above, but single-tenant, yet still no console access.
      • Configured to protect a single server or an entire VLAN
    • Dedicated appliance
      • Same as dedicated hardware, but provides access to console and native tools. This gives the customer more capabilities.
  • Gateway Appliance
    •  Vyatta
      • Applies to any portion of, or the entire, customer infrastructure at SoftLayer
      • Used for:
        • IPSec VPN tunnels
        • NAT
        • Firewall
        • Router
      • Configured by console or the Vyatta GUI via VPN; not managed through the SoftLayer Customer Portal or API
  • DNS Options
    • Customer uses their own DNS that’s external to SoftLayer
    • Customer uses SoftLayer’s DNS, which is redundant across datacenters (see the sketch after this list)
    • Customer uses a 3rd-party DNS
    • Customer runs their own DNS hosted on their own machines within SoftLayer
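
As an aside on how scriptable the SoftLayer-hosted DNS option is, here is a minimal sketch of listing zones and adding a record through the SoftLayer Python client (`pip install SoftLayer`). The credentials, zone ID, and record values are hypothetical placeholders, and the DNSManager method signatures reflect my reading of the client library, so treat this as a sketch rather than verified usage.

```python
# Minimal sketch: managing SoftLayer-hosted DNS from Python.
# Assumes `pip install SoftLayer`; credentials, zone ID, and record
# values below are hypothetical placeholders.
import SoftLayer

client = SoftLayer.create_client_from_env(username='myuser', api_key='myapikey')
dns = SoftLayer.DNSManager(client)

# List the zones hosted on SoftLayer's redundant DNS.
for zone in dns.list_zones():
    print(zone['id'], zone['name'])

# Add an A record to a (hypothetical) zone.
dns.create_record(12345, 'www', 'a', '203.0.113.10', ttl=3600)
```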

Security

  • Much easier to deploy/configure security via the SoftLayer Customer Portal than in a traditional datacenter. One common source of vulnerabilities is incomplete or incorrect security deployments, so an easier-to-use method suggests it is easier to build a secure system.
  • Offerings
    • McAfee (Windows) anti-virus
    • DDoS – detect and isolate (take offline) machines that are under attack, but there is no service to remediate the threat
      • Cisco Guard DDoS protection
      • Arbor Peakflow traffic analysis
      • Arbor ATLAS Global Traffic Analyzer
    • Servers local to datacenter for Windows and Red Hat updates
    • IDS/IPS protection
      • Nessus vulnerability assessment and reporting
      • McAfee host intrusion protection
    • FortiGate firewalls
    • US Gov’t standards
      • Drive wiping using same tools as Dept of Defense (DoD)
      • NIST SP 800-53 US Gov’t standard
      • Federal Information Security Management Act (FISMA).
      • FedRAMP datacenters
      • Health Insurance Portability and Accountability Act (HIPAA). Will sign agreement with customer.
    • Two factor authentication
      • Symantec identity protection
      • Windows Azure Multi-Factor Authentication
    • VPN
      • Client-to-site SSL or PPTP, and site-to-site IPSec
  • Datacenters are
    • Service Organization Control (SOC) 2 certified
    • Payment Card Industry Data Security Standard (PCI DSS) for bare metal and single-tenant virtual. Not recommended for multi-tenant.
    • Tier 3
      • 99.982% availability (0.018% of 8,760 hours translates to < 1.6 hours of downtime per year)
      • Multiple power/cooling
      • N+1 fault tolerant
      • Can sustain a 72-hour power outage
    • Physical security. All items mentioned are good, but seemed typical of other datacenters I’ve been to or learned about.
    • Cloud Security Alliance (CSA) self-assessment, but not yet certified

Data

Managed services

  • Backup plans
  • Security plans, patching, server hardening
  • Monitoring
  • DBA
  • Change Management

APIs

  • Implemented using SOAP and XML-RPC
  • Available as Representational State Transfer (REST)
  • Supports a wide range of languages, including Python (see the sketch after this list)
  • 264 services (20 of which are high level) comprising a total of 3,421 API calls
  • Can be used to scale an implementation up and down in an automated manner. There’s a new package for this called OnScale. I’m not sure at what level this compares or competes with PureApplications on SoftLayer
  • Can be used to create a custom branded Customer Portal for reselling services
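
To make the API discussion concrete, here is a minimal sketch using the SoftLayer Python bindings. SoftLayer_Account and its getVirtualGuests call are real services from the catalog mentioned above; the credentials are placeholders, and this is a sketch of the pattern rather than production code.

```python
# Minimal sketch: calling the SoftLayer API from Python.
# Assumes `pip install SoftLayer`; credentials are placeholders.
import SoftLayer

client = SoftLayer.create_client_from_env(username='myuser', api_key='myapikey')

# SoftLayer_Account is one of the 264 services; getVirtualGuests is
# one of its calls. The same client could drive automated scaling
# logic (ordering or cancelling virtual guests based on load).
for guest in client.call('Account', 'getVirtualGuests'):
    print(guest['hostname'], guest.get('primaryIpAddress'))
```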

Compared to other cloud providers

  • A lot of marketing hype, although the Gartner Magic Quadrant wasn’t at all kind to SoftLayer
  • A comparison against Amazon AWS showed higher performance and availability at lower cost, but it used bare metal servers. It didn’t show whether SoftLayer’s virtual offering is comparable to AWS, although in theory SoftLayer would cost less.
  • Catalyst: incubator to help small companies with infrastructure costs http://www.softlayer.com/catalyst

IBM PureApplications for Hybrid IaaS Cloud

IBM PureApplications provides an on-premises cloud. #PureApp for SoftLayer provides off-premises cloud solutions. ibm.co/TNzV8m @Prolifics

Video includes clip from my manager @Prolifics, Mike Hastie.

Big Data as a Service provider has free developer account

The founders of Qubole built some of the big data technology at Facebook (scaled to 25 petabytes). Their new company offers a hosted Hadoop infrastructure. Interestingly, their small and free accounts take the IT configuration out of learning Hadoop.

Source:

Summary of Teradata’s big data approach

  • Teradata Aster 6 platform
  • Includes a graph analysis engine (visualization), in addition to traditional rows/columns.
  • Enables execution of SQL across multiple NoSQL repositories
  • Integrates with multiple 3rd parties for solutions such as analytical workflow (Alteryx), advanced analytics algorithms (Fuzzy Logix).
  • Cloud services at comparable cost to on-premises

Source


Using YARN to monitor resources and provision capacity in order to run other applications alongside MapReduce

Hadoop 2.0 enables clusters to grow as large as 4000 nodes within deployments that contain multiple clusters. I think that companies like Google and Facebook each run tens of thousands of nodes.

Using YARN, developers can run additional applications within the cluster: YARN monitors what the applications need, then creates CPU/RAM containers within the cluster (and across clusters?) to run them.
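
The real YARN APIs are Java (an ApplicationMaster negotiates containers with the ResourceManager), but the allocation idea can be sketched as a toy Python model. Everything below is an illustrative simplification of that container-request pattern, not YARN code.

```python
# Toy model of YARN-style container allocation (illustrative only;
# real YARN uses a Java ApplicationMaster/ResourceManager protocol).
from dataclasses import dataclass

@dataclass
class Node:
    name: str
    free_vcores: int
    free_mem_mb: int

def allocate(nodes, vcores, mem_mb):
    """Place a CPU/RAM container request on the first node with capacity."""
    for node in nodes:
        if node.free_vcores >= vcores and node.free_mem_mb >= mem_mb:
            node.free_vcores -= vcores
            node.free_mem_mb -= mem_mb
            return node.name
    return None  # request waits until resources free up

cluster = [Node('n1', 8, 32768), Node('n2', 8, 32768)]
print(allocate(cluster, vcores=2, mem_mb=4096))   # -> 'n1'
print(allocate(cluster, vcores=8, mem_mb=65536))  # -> None (no node fits)
```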

There’s speculation that eventually YARN could provide a PaaS using Hadoop in order to compete with VMware’s Cloud Foundry. I suppose that while with VMware you first need to think in terms of virtualizing hardware components and an operating system, YARN jumps past that to provide an environment that’s abstracted for a specific application.

Source:

HDFS fault tolerance

HDFS is fault tolerant. Each file is broken up into blocks, and each block is written to more than one server. The number of servers is configurable, but three is the common configuration. Just as with RAID, this provides fault tolerance and increases retrieval performance.

When a block is read, its checksum indicates whether the block is valid or corrupted. If corrupted, and depending on the scope of the corruption, the block may be rewritten, or the server may be taken out of the cluster and its blocks redistributed to other existing servers. If the cluster is running within an elastic cloud, then either the server is healed or a new server is added.
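
Here is a toy sketch of the write/read path just described: blocks are replicated to several "servers," and a checksum detects corruption on read. This is an illustrative model of the idea, not actual HDFS code (HDFS uses CRC32 checksums and a far more involved pipeline).

```python
# Toy model of HDFS-style block replication and checksum verification
# (illustrative only, not actual HDFS code).
import hashlib

REPLICATION = 3  # the common HDFS replication factor mentioned above

def write_block(block: bytes, servers: list):
    """Replicate a block (keyed by its checksum) to REPLICATION servers."""
    checksum = hashlib.md5(block).hexdigest()
    replicas = servers[:REPLICATION]
    for server in replicas:
        server[checksum] = block
    return checksum, replicas

def read_block(checksum: str, replicas: list) -> bytes:
    """Return the first replica whose checksum still matches."""
    for server in replicas:
        block = server.get(checksum)
        if block is not None and hashlib.md5(block).hexdigest() == checksum:
            return block
        # A corrupt or missing replica would be re-replicated here.
    raise IOError('all replicas corrupt or missing')

servers = [{}, {}, {}, {}]
checksum, replicas = write_block(b'some file data', servers)
print(read_block(checksum, replicas))  # b'some file data'
```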

Unlike high-end SAN hardware, which is architected to avoid failure, HDFS assumes that its low-end equipment will fail, so self-healing is built into its operating model.

Cheap Hardware

In theory, a big data cluster uses low-cost commodity hardware (2 CPUs, 6–12 drives, 32 GB RAM). By clustering many cheap machines, high performance can be achieved at a low cost, along with high reliability due to decentralization.
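
Some back-of-the-envelope arithmetic for such a cluster follows. The drive size and node count are my assumptions for illustration (only the per-node spec above comes from the source); with 3x replication, usable space is raw capacity divided by three.

```python
# Back-of-the-envelope capacity math for a commodity Hadoop cluster.
# Node spec (6-12 drives) is from the notes above; node count and
# drive size are assumed for illustration.
nodes = 100
drives_per_node = 12
drive_tb = 2            # assumed 2 TB commodity drives
replication = 3         # typical HDFS replication factor

raw_tb = nodes * drives_per_node * drive_tb
usable_tb = raw_tb / replication
print(f'raw: {raw_tb} TB, usable: {usable_tb:.0f} TB')
# raw: 2400 TB, usable: 800 TB
```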

There is little benefit to running Hadoop nodes in a virtualized environment (e.g., VMware), since when a node is active (batch processing) it may be pushing RAM and CPU utilization to their limits. This is in contrast to an application or database server, which has idle periods and bursts but generally runs at a constant, medium utilization. What is of greater benefit is a cloud implementation (e.g., Amazon Elastic Compute Cloud) in which one can scale from a few nodes to hundreds or thousands of nodes in real time as the batch cycles through its process.

Unlike a traditional n-tier architecture, Hadoop combines compute and storage on the same box. In contrast, an Oracle cluster would typically store its databases on a SAN, and application logic would reside on yet another set of application servers, which probably do not utilize their inexpensive internal drives for application-specific tasks.

A Hadoop cluster is linearly scalable, up to 4000 nodes and dozens of petabytes of data.

In a traditional database cluster (such as Oracle RAC), the architecture of the cluster should be designed with knowledge of the schema and the volume (input and retrieval) of the data. With Hadoop, scalability is, at worst, linear. Using a cloud architecture, additional Hadoop nodes can be provisioned on the fly as node utilization increases or decreases.