Learn more  » Push, build, and install  RubyGems npm packages Python packages Maven artifacts PHP packages Go Modules Bower components Debian packages RPM packages NuGet packages

squarecapadmin / botocore   python

Repository URL to install this package:

/ botocore / parsers.py

# Copyright 2014 Amazon.com, Inc. or its affiliates. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License"). You
# may not use this file except in compliance with the License. A copy of
# the License is located at
#
# http://aws.amazon.com/apache2.0/
#
# or in the "license" file accompanying this file. This file is
# distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF
# ANY KIND, either express or implied. See the License for the specific
# language governing permissions and limitations under the License.
"""Response parsers for the various protocol types.

The module contains classes that can take an HTTP response, and given
an output shape, parse the response into a dict according to the
rules in the output shape.

There are many similarities amongst the different protocols with regard
to response parsing, and the code is structured in a way to avoid
code duplication when possible.  The diagram below is a diagram
showing the inheritance hierarchy of the response classes.

::



                                 +--------------+
                                 |ResponseParser|
                                 +--------------+
                                    ^    ^    ^
               +--------------------+    |    +-------------------+
               |                         |                        |
    +----------+----------+       +------+-------+        +-------+------+
    |BaseXMLResponseParser|       |BaseRestParser|        |BaseJSONParser|
    +---------------------+       +--------------+        +--------------+
              ^         ^          ^           ^           ^        ^
              |         |          |           |           |        |
              |         |          |           |           |        |
              |        ++----------+-+       +-+-----------++       |
              |        |RestXMLParser|       |RestJSONParser|       |
        +-----+-----+  +-------------+       +--------------+  +----+-----+
        |QueryParser|                                          |JSONParser|
        +-----------+                                          +----------+


The diagram above shows that there is a base class, ``ResponseParser`` that
contains logic that is similar amongst all the different protocols (``query``,
``json``, ``rest-json``, ``rest-xml``).  Amongst the various services there
is shared logic that can be grouped several ways:

* The ``query`` and ``rest-xml`` both have XML bodies that are parsed in the
  same way.
* The ``json`` and ``rest-json`` protocols both have JSON bodies that are
  parsed in the same way.
* The ``rest-json`` and ``rest-xml`` protocols have additional attributes
  besides body parameters that are parsed the same (headers, query string,
  status code).

This is reflected in the class diagram above.  The ``BaseXMLResponseParser``
and the BaseJSONParser contain logic for parsing the XML/JSON body,
and the BaseRestParser contains logic for parsing out attributes that
come from other parts of the HTTP response.  Classes like the
``RestXMLParser`` inherit from the ``BaseXMLResponseParser`` to get the
XML body parsing logic and the ``BaseRestParser`` to get the HTTP
header/status code/query string parsing.

Return Values
=============

Each call to ``parse()`` returns a dict has this form::

    Standard Response

    {
      "ResponseMetadata": {"RequestId": <requestid>}
      <response keys>
    }

    Error response

    {
      "ResponseMetadata": {"RequestId": <requestid>}
      "Error": {
        "Code": <string>,
        "Message": <string>,
        "Type": <string>,
        <additional keys>
      }
    }

"""
import re
import base64
import json
import xml.etree.cElementTree
import logging

from botocore.compat import six, XMLParseError

from botocore.utils import parse_timestamp, merge_dicts

LOG = logging.getLogger(__name__)

DEFAULT_TIMESTAMP_PARSER = parse_timestamp


class ResponseParserFactory(object):
    def __init__(self):
        self._defaults = {}

    def set_parser_defaults(self, **kwargs):
        """Set default arguments when a parser instance is created.

        You can specify any kwargs that are allowed by a ResponseParser
        class.  There are currently two arguments:

            * timestamp_parser - A callable that can parse a timetsamp string
            * blob_parser - A callable that can parse a blob type

        """
        self._defaults.update(kwargs)

    def create_parser(self, protocol_name):
        parser_cls = PROTOCOL_PARSERS[protocol_name]
        return parser_cls(**self._defaults)


def create_parser(protocol):
    return ResponseParserFactory().create_parser(protocol)


def _text_content(func):
    # This decorator hides the difference between
    # an XML node with text or a plain string.  It's used
    # to ensure that scalar processing operates only on text
    # strings, which allows the same scalar handlers to be used
    # for XML nodes from the body and HTTP headers.
    def _get_text_content(self, shape, node_or_string):
        if hasattr(node_or_string, 'text'):
            text = node_or_string.text
            if text is None:
                # If an XML node is empty <foo></foo>,
                # we want to parse that as an empty string,
                # not as a null/None value.
                text = ''
        else:
            text = node_or_string
        return func(self, shape, text)
    return _get_text_content


class ResponseParserError(Exception):
    pass


class ResponseParser(object):
    """Base class for response parsing.

    This class represents the interface that all ResponseParsers for the
    various protocols must implement.

    This class will take an HTTP response and a model shape and parse the
    HTTP response into a dictionary.

    There is a single public method exposed: ``parse``.  See the ``parse``
    docstring for more info.

    """
    DEFAULT_ENCODING = 'utf-8'

    def __init__(self, timestamp_parser=None, blob_parser=None):
        if timestamp_parser is None:
            timestamp_parser = DEFAULT_TIMESTAMP_PARSER
        self._timestamp_parser = timestamp_parser
        if blob_parser is None:
            blob_parser = self._default_blob_parser
        self._blob_parser = blob_parser

    def _default_blob_parser(self, value):
        # Blobs are always returned as bytes type (this matters on python3).
        # We don't decode this to a str because it's entirely possible that the
        # blob contains binary data that actually can't be decoded.
        return base64.b64decode(value)

    def parse(self, response, shape):
        """Parse the HTTP response given a shape.

        :param response: The HTTP response dictionary.  This is a dictionary
            that represents the HTTP request.  The dictionary must have the
            following keys, ``body``, ``headers``, and ``status_code``.

        :param shape: The model shape describing the expected output.
        :return: Returns a dictionary representing the parsed response
            described by the model.  In addition to the shape described from
            the model, each response will also have a ``ResponseMetadata``
            which contains metadata about the response, which contains at least
            two keys containing ``RequestId`` and ``HTTPStatusCode``.  Some
            responses may populate additional keys, but ``RequestId`` will
            always be present.

        """
        LOG.debug('Response headers: %s', response['headers'])
        LOG.debug('Response body:\n%s', response['body'])
        if response['status_code'] >= 301:
            if self._is_generic_error_response(response):
                parsed = self._do_generic_error_parse(response)
            else:
                parsed = self._do_error_parse(response, shape)
        else:
            parsed = self._do_parse(response, shape)

        # Add ResponseMetadata if it doesn't exist and inject the HTTP
        # status code and headers from the response.
        if isinstance(parsed, dict):
            response_metadata = parsed.get('ResponseMetadata', {})
            response_metadata['HTTPStatusCode'] = response['status_code']
            response_metadata['HTTPHeaders'] = dict(response['headers'])
            parsed['ResponseMetadata'] = response_metadata
        return parsed

    def _is_generic_error_response(self, response):
        # There are times when a service will respond with a generic
        # error response such as:
        # '<html><body><b>Http/1.1 Service Unavailable</b></body></html>'
        #
        # This can also happen if you're going through a proxy.
        # In this case the protocol specific _do_error_parse will either
        # fail to parse the response (in the best case) or silently succeed
        # and treat the HTML above as an XML response and return
        # non sensical parsed data.
        # To prevent this case from happening we first need to check
        # whether or not this response looks like the generic response.
        if response['status_code'] >= 500:
            body = response['body'].strip()
            return body.startswith(b'<html>') or not body

    def _do_generic_error_parse(self, response):
        # There's not really much we can do when we get a generic
        # html response.
        LOG.debug("Received a non protocol specific error response from the "
                  "service, unable to populate error code and message.")
        return {
            'Error': {'Code': str(response['status_code']),
                      'Message': six.moves.http_client.responses.get(
                          response['status_code'], '')},
            'ResponseMetadata': {},
        }

    def _do_parse(self, response, shape):
        raise NotImplementedError("%s._do_parse" % self.__class__.__name__)

    def _do_error_parse(self, response, shape):
        raise NotImplementedError(
            "%s._do_error_parse" % self.__class__.__name__)

    def _parse_shape(self, shape, node):
        handler = getattr(self, '_handle_%s' % shape.type_name,
                          self._default_handle)
        return handler(shape, node)

    def _handle_list(self, shape, node):
        # Enough implementations share list serialization that it's moved
        # up here in the base class.
        parsed = []
        member_shape = shape.member
        for item in node:
            parsed.append(self._parse_shape(member_shape, item))
        return parsed

    def _default_handle(self, shape, value):
        return value


class BaseXMLResponseParser(ResponseParser):
    def __init__(self, timestamp_parser=None, blob_parser=None):
        super(BaseXMLResponseParser, self).__init__(timestamp_parser,
                                                    blob_parser)
        self._namespace_re = re.compile('{.*}')

    def _handle_map(self, shape, node):
        parsed = {}
        key_shape = shape.key
        value_shape = shape.value
        key_location_name = key_shape.serialization.get('name') or 'key'
        value_location_name = value_shape.serialization.get('name') or 'value'
        if shape.serialization.get('flattened') and not isinstance(node, list):
            node = [node]
        for keyval_node in node:
            for single_pair in keyval_node:
                # Within each <entry> there's a <key> and a <value>
                tag_name = self._node_tag(single_pair)
                if tag_name == key_location_name:
                    key_name = self._parse_shape(key_shape, single_pair)
                elif tag_name == value_location_name:
                    val_name = self._parse_shape(value_shape, single_pair)
                else:
                    raise ResponseParserError("Unknown tag: %s" % tag_name)
            parsed[key_name] = val_name
        return parsed

    def _node_tag(self, node):
        return self._namespace_re.sub('', node.tag)

    def _handle_list(self, shape, node):
        # When we use _build_name_to_xml_node, repeated elements are aggregated
        # into a list.  However, we can't tell the difference between a scalar
        # value and a single element flattened list.  So before calling the
        # real _handle_list, we know that "node" should actually be a list if
        # it's flattened, and if it's not, then we make it a one element list.
        if shape.serialization.get('flattened') and not isinstance(node, list):
            node = [node]
        return super(BaseXMLResponseParser, self)._handle_list(shape, node)

    def _handle_structure(self, shape, node):
        parsed = {}
        members = shape.members
        xml_dict = self._build_name_to_xml_node(node)
        for member_name in members:
            member_shape = members[member_name]
            if 'location' in member_shape.serialization:
                # All members with locations have already been handled,
                # so we don't need to parse these members.
                continue
            xml_name = self._member_key_name(member_shape, member_name)
            member_node = xml_dict.get(xml_name)
            if member_node is not None:
                parsed[member_name] = self._parse_shape(
                    member_shape, member_node)
            elif member_shape.serialization.get('xmlAttribute'):
                attribs = {}
                location_name = member_shape.serialization['name']
                for key, value in node.attrib.items():
                    new_key = self._namespace_re.sub(
                        location_name.split(':')[0] + ':', key)
                    attribs[new_key] = value
                if location_name in attribs:
                    parsed[member_name] = attribs[location_name]
        return parsed

    def _member_key_name(self, shape, member_name):
        # This method is needed because we have to special case flattened list
        # with a serialization name.  If this is the case we use the
        # locationName from the list's member shape as the key name for the
        # surrounding structure.
Loading ...