cloudinfo's Introduction

Cloud price and service information

The Banzai Cloud Cloudinfo application is a standalone project in the Pipeline ecosystem. While AWS, Google Cloud, Azure, Alibaba and Oracle all provide some kind of API to query instance type attributes and product pricing information, these APIs often respond with partially inconsistent data, or their responses are cumbersome to parse. The Cloudinfo service uses these cloud provider APIs to asynchronously fetch and parse instance type attributes and prices, stores the results in an in-memory cache, and makes them available as structured data through a REST API.

Quick start

Building the project is as simple as running the make build target, which wraps a go build command. The result is a statically linked executable binary.

make build

The following options can be configured when starting the service (defaults shown):

build/cloudinfo --help
Usage of Banzai Cloud Cloudinfo Service:
      --config-vault string               enable config Vault
      --config-vault-address string       config Vault address
      --config-vault-token string         config Vault token
      --config-vault-secret-path string   config Vault secret path
      --log-level string                  log level (default "info")
      --log-format string                 log format (default "json")
      --metrics-enabled                   internal metrics are exposed if enabled
      --metrics-address string            the address where internal metrics are exposed (default ":9090")
      --listen-address string             application listen address (default ":8000")
      --scrape                            enable cloud info scraping (default true)
      --scrape-interval duration          duration (in go syntax) between renewing information (default 24h0m0s)
      --provider-amazon                   enable amazon provider
      --provider-google                   enable google provider
      --provider-alibaba                  enable alibaba provider
      --provider-oracle                   enable oracle provider
      --provider-azure                    enable azure provider
      --provider-digitalocean             enable digitalocean provider
      --config string                     Configuration file
      --version                           Show version information
      --dump-config                       Dump configuration to the console (and exit)

Create a permanent developer configuration:

cp config.toml.dist config.toml
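
To verify what the service will actually run with, you can dump the effective configuration using the --config and --dump-config flags from the help output above:

build/cloudinfo --config config.toml --dump-config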

Running cloudinfo requires the web/ project to be built first (Node.js must be installed):

cd web/
npm install   # fetch the UI dependencies first (assumed prerequisite for the build)
npm run build-prod
cd ..
build/cloudinfo
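
Once the service is up, a quick smoke test against the documented providers endpoint (assuming the default listen address of :8000) confirms it is serving requests:

curl -s http://localhost:8000/api/v1/providers | jq .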

Cloud credentials

The cloudinfo service queries the cloud provider APIs, so it needs credentials to access them.

AWS

Cloudinfo uses the AWS Price List API, which allows querying product pricing in a fine-grained way. Authentication works through the standard AWS SDK for Go, so credentials can be configured via environment variables, shared credentials files, or AWS instance profiles. To learn more, read the Specifying Credentials section of the SDK docs.

The easiest way is through environment variables:

export AWS_ACCESS_KEY_ID=<access-key-id>
export AWS_SECRET_ACCESS_KEY=<secret-access-key>
cloudinfo --provider-amazon

Create AWS credentials with the aws command-line tool:

aws iam create-user --user-name cloudinfo
aws iam put-user-policy --user-name cloudinfo --policy-name cloudinfo_policy --policy-document file://credentials/amazon_cloudinfo_role.json
aws iam create-access-key --user-name cloudinfo
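
The last command prints the new credentials; instead of copying them by hand, a small sketch for capturing that output and exporting its parts (assuming jq is installed; the response carries the key under AccessKey.AccessKeyId and AccessKey.SecretAccessKey):

key_json=$(aws iam create-access-key --user-name cloudinfo)
export AWS_ACCESS_KEY_ID=$(echo "$key_json" | jq -r .AccessKey.AccessKeyId)
export AWS_SECRET_ACCESS_KEY=$(echo "$key_json" | jq -r .AccessKey.SecretAccessKey)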

Google Cloud

On Google Cloud, the project uses two different APIs to collect the full product information: the Cloud Billing API and the Compute Engine API. Authentication to the Cloud Billing Catalog API is done through an API key that can be generated on the Google Cloud Console. Once you have an API key, billing is enabled for the project, and the Cloud Billing API is also enabled, you can start using the API.

The Compute Engine API authenticates in the standard Google Cloud way, with service accounts instead of API keys. Once you have a service account, download its JSON credentials file from the Google Cloud Console and point to it through an environment variable:

export GOOGLE_CREDENTIALS_FILE=<path-to-my-service-account-file>.json
export GOOGLE_PROJECT=<google-project-id>
cloudinfo --provider-google

Create a service account key with the gcloud command-line tool:

gcloud services enable container.googleapis.com compute.googleapis.com cloudbilling.googleapis.com cloudresourcemanager.googleapis.com
gcloud iam service-accounts create cloudinfoSA --display-name "Service account used for managing Cloudinfo"
gcloud iam roles create cloudinfo --project [PROJECT-ID] --title cloudinfo --description "cloudinfo roles" --permissions compute.machineTypes.list,compute.regions.list,compute.zones.list
gcloud projects add-iam-policy-binding [PROJECT-ID] --member='serviceAccount:cloudinfoSA@[PROJECT-ID].iam.gserviceaccount.com' --role='projects/[PROJECT-ID]/roles/cloudinfo'
gcloud iam service-accounts keys create cloudinfo.gcloud.json --iam-account=cloudinfoSA@[PROJECT-ID].iam.gserviceaccount.com
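
The last command writes the key to cloudinfo.gcloud.json; pointing the environment variables from above at it is then enough to start the provider:

export GOOGLE_CREDENTIALS_FILE=$PWD/cloudinfo.gcloud.json
export GOOGLE_PROJECT=[PROJECT-ID]
cloudinfo --provider-google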

Azure

Two different APIs are used for Azure: pricing information is queried through the Rate Card API, and machine types through the Compute API's list virtual machine sizes request. Authentication is done via standard Azure service principals.

See the Azure documentation on service principals to learn how to generate one with the Azure SDK, then set the following environment variables:

export AZURE_SUBSCRIPTION_ID=<subscription-id>
export AZURE_TENANT_ID=<tenant-id>
export AZURE_CLIENT_ID=<client-id>
export AZURE_CLIENT_SECRET=<client-secret>
cloudinfo --provider-azure

Create a service principal with the az command-line tool:

cd credentials
az provider register --namespace Microsoft.Compute
az provider register --namespace Microsoft.Resources
az provider register --namespace Microsoft.ContainerService
az provider register --namespace Microsoft.Commerce
az role definition create --verbose --role-definition @azure_cloudinfo_role.json
az ad sp create-for-rbac --name "CloudinfoSP" --role "Cloudinfo" --sdk-auth true > azure_cloudinfo.auth
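
The --sdk-auth output saved to azure_cloudinfo.auth is a JSON document carrying the subscription, tenant, client id and client secret; a sketch for exporting them (assuming jq is installed and the standard sdk-auth field names clientId, clientSecret, subscriptionId and tenantId):

export AZURE_SUBSCRIPTION_ID=$(jq -r .subscriptionId azure_cloudinfo.auth)
export AZURE_TENANT_ID=$(jq -r .tenantId azure_cloudinfo.auth)
export AZURE_CLIENT_ID=$(jq -r .clientId azure_cloudinfo.auth)
export AZURE_CLIENT_SECRET=$(jq -r .clientSecret azure_cloudinfo.auth)
cloudinfo --provider-azure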

Oracle

Authentication is done either through explicit environment variables or through an OCI CLI configuration file. See the Oracle documentation on how to create such a file, then set the following environment variables:

export ORACLE_TENANCY_OCID=<tenancy-ocid>
export ORACLE_USER_OCID=<user-ocid>
export ORACLE_REGION=<region>
export ORACLE_FINGERPRINT=<fingerprint>
export ORACLE_PRIVATE_KEY=<private-key>
export ORACLE_PRIVATE_KEY_PASSPHRASE=<private-key-passphrase>
# OR
export ORACLE_CONFIG_FILE_PATH=<config-file-path>
export ORACLE_PROFILE=<profile>

cloudinfo --provider-oracle
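
If you go the config-file route, a minimal file in the standard OCI CLI format (all values below are placeholders) looks like this:

[DEFAULT]
user=<user-ocid>
fingerprint=<fingerprint>
key_file=~/.oci/oci_api_key.pem
tenancy=<tenancy-ocid>
region=<region>

Then point the environment at it:

export ORACLE_CONFIG_FILE_PATH=~/.oci/config
export ORACLE_PROFILE=DEFAULT
cloudinfo --provider-oracle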

Alibaba

The easiest way to authenticate is through environment variables:

export ALIBABA_ACCESS_KEY_ID=<access-key-id>
export ALIBABA_ACCESS_KEY_SECRET=<access-key-secret>
export ALIBABA_REGION_ID=<region-id>
cloudinfo --provider-alibaba

Create Alibaba credentials with the Alibaba Cloud CLI:

aliyun ram CreateUser --UserName CloudInfo --DisplayName CloudInfo
aliyun ram AttachPolicyToUser --UserName CloudInfo --PolicyName AliyunECSReadOnlyAccess --PolicyType System
aliyun ram AttachPolicyToUser --UserName CloudInfo --PolicyName AliyunBSSReadOnlyAccess --PolicyType System
aliyun ram CreateAccessKey --UserName CloudInfo

DigitalOcean

Create a new API access token on the DigitalOcean Console, then set it as an environment variable:

export DIGITALOCEAN_ACCESS_TOKEN=<access-token>
cloudinfo --provider-digitalocean

Configuring multiple providers

Cloud providers are enabled one by one. To configure multiple providers, simply pass the flag for each of them and configure credentials for each. Here's an example of how to configure three providers:

export AWS_SECRET_ACCESS_KEY=<secret-access-key>
export AWS_ACCESS_KEY_ID=<access-key-id>
export ALIBABA_ACCESS_KEY_ID=<access-key-id>
export ALIBABA_ACCESS_KEY_SECRET=<access-key-secret>
export ALIBABA_REGION_ID=<region-id>
export DIGITALOCEAN_ACCESS_TOKEN=<access-token>

cloudinfo --provider-amazon --provider-alibaba --provider-digitalocean

API calls

For the complete OpenAPI 3.0 documentation, see the OpenAPI specification shipped in the repository.

Here are a few cURL examples to get started:

curl -ksL -X GET "http://localhost:8000/api/v1/providers/azure/services/compute/regions/" | jq .
[
  {
    "id": "centralindia",
    "name": "Central India"
  },
  {
    "id": "koreacentral",
    "name": "Korea Central"
  },
  {
    "id": "southindia",
    "name": "South India"
  },
  ...
]
curl -ksL -X GET "http://localhost:8000/api/v1/providers/amazon/services/compute/regions/eu-west-1/products" | jq .
{
  "products": [
    {
      "type": "i3.8xlarge",
      "onDemandPrice": 2.752,
      "cpusPerVm": 32,
      "memPerVm": 244,
      "gpusPerVm": 0,
      "ntwPerf": "10 Gigabit",
      "ntwPerfCategory": "high",
      "spotPrice": [
        {
          "zone": "eu-west-1c",
          "price": 1.6018
        },
        {
          "zone": "eu-west-1b",
          "price": 0.9563
        },
        {
          "zone": "eu-west-1a",
          "price": 2.752
        }
      ]
    },
    ...
  ]
}

FAQ

1. The API responds with status code 500 after starting the cloudinfo app and making a cURL request

After the cloudinfo app starts, it takes a few minutes to cache all the product information from the providers. Until the results are cached, responses may be unreliable; we're planning to address this in the future. After a few minutes it should work fine.
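
If you script against the API, a simple way to wait for the cache to warm up is to poll an endpoint until it responds successfully (assuming the default listen address):

until curl -fsS http://localhost:8000/api/v1/providers > /dev/null; do
  sleep 10
done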

2. Why is it needed to parse the product info asynchronously and periodically instead of relying on static data?

Cloud providers release new instance types and regions quite frequently and also change on-demand pricing from time to time, so this info must be kept up to date without manual modification every time something changes on the provider's side. After the initial query, the cloudinfo app re-scrapes this info from the cloud providers once per day. The frequency of this querying and caching is configurable with the --scrape-interval switch and is set to 24h by default.
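
For example, to renew the information twice a day instead of once (the value uses Go duration syntax, as noted in the help output above):

cloudinfo --provider-amazon --scrape-interval 12h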

3. What happens if the cloudinfo app cannot cache the AWS product info?

If caching fails, the cloudinfo app will try to reach the AWS Price List API on the fly when a request comes in (and it will also cache the resulting information).

4. What kind of AWS permissions do I need to use the project?

The cloudinfo app queries the AWS Price List API to keep up-to-date info about instance types, regions and on-demand pricing. You'll need IAM access as described in example 11 of the AWS IAM docs.

If you don't use Prometheus to track spot instance pricing, your IAM user will also need access to the spot price history from the AWS API. That means granting the ec2:DescribeSpotPriceHistory permission.
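
A minimal policy statement granting that permission might look like the following sketch (the full role used earlier in this README lives in credentials/amazon_cloudinfo_role.json):

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": "ec2:DescribeSpotPriceHistory",
      "Resource": "*"
    }
  ]
}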

5. What is the advantage of using Prometheus to determine spot prices?

Prometheus is becoming the de facto monitoring solution in the cloud native world, and it includes a time series database as well. When using the Banzai Cloud spot price exporter, spot price history is collected as time series data that can be queried for averages, maximums and predictions. This gives a richer picture than relying on the current spot price alone, which may be a momentary spike or sit on a downward or upward trend. You can fine-tune the query (with the -prometheus-query switch) if you want to change the way spot instance prices are scored. By default, the spot price averages of the last week are queried and instance types are sorted based on this score.

6. What happens if my Prometheus server cannot be reached or if it doesn't have the necessary spot price metrics?

If the cloudinfo app fails to reach the Prometheus query API, or cannot find the proper metrics there, it falls back to querying the current spot prices from the AWS API.

License

Copyright (c) 2017-2019 Banzai Cloud, Inc.

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.


cloudinfo's Issues

remove the scraping related code from the caching cloud info code

Scraping logic was moved to the specialized component; some of the "old" code remained in the cloudinfo implementation so as not to break functionality.

Since scraping providers for information is designed to be completely separated from the request-serving code, it is mainly the management component that needs to communicate with the scraping component. An event-based approach probably needs to be added for this purpose - integrate Watermill if appropriate.

add scraping time info / display it on the UI

The product details response has been extended with the scraping time information; it can be found in the root of the response JSON:

"scrapingTime": "1535642080264"

The value is the timestamp of when the data was last successfully refreshed for the provider.

This needs to be displayed on the UI, below the curl command and above the details table.
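
The value appears to be a Unix epoch timestamp in milliseconds; a quick way to sanity-check the sample value from a shell (assuming GNU date):

date -d @$((1535642080264 / 1000))   # prints a date in late August 2018 for the sample above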

use of internal package

go build
main.go:43:2: use of internal package github.com/banzaicloud/productinfo/internal/app/productinfo/api not allowed

Go disallows importing an internal package from outside the subtree rooted at the internal directory's parent, so the importing code must live under the same root (or the package must be moved out of internal).

logs cleanup

design the logging strategy, remove the custom context logger, use logur instead

improve logging

1. We should create a consistent way of initialising loggers, similarly to Pipeline:
banzaicloud/pipeline#798

2. We can consider adding a correlation id to requests for better tracing and error handling:
https://github.com/banzaicloud/pipeline/pull/787/files

3. Configure gin to use logrus:
https://github.com/banzaicloud/pipeline/pull/793/files

4. Add the provider's name as a field to every log message that's related to a specific provider.

5. Add a scrape id to every scrape message to be able to distinguish between scrape runs.

6. The alibaba package shouldn't use the "github.com/aliyun/alibaba-cloud-sdk-go/sdk/log" package for logging, but the logrus logger that's used everywhere else.

Add Redis and Cassandra config support to helm chart

Using Redis:

CLOUDINFO_STORE_REDIS_ENABLED=true
CLOUDINFO_STORE_REDIS_HOST=localhost
CLOUDINFO_STORE_REDIS_PORT=6379

Using Cassandra:

CLOUDINFO_STORE_CASSANDRA_ENABLED=true
CLOUDINFO_STORE_CASSANDRA_HOSTS=localhost
CLOUDINFO_STORE_CASSANDRA_PORT=9042
CLOUDINFO_STORE_CASSANDRA_KEYSPACE=cloudinfo
CLOUDINFO_STORE_CASSANDRA_TABLE=products
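
For a quick local verification of the Redis path, one possible setup (assuming Docker is available; the container name is illustrative):

docker run -d --name cloudinfo-redis -p 6379:6379 redis
export CLOUDINFO_STORE_REDIS_ENABLED=true
export CLOUDINFO_STORE_REDIS_HOST=localhost
export CLOUDINFO_STORE_REDIS_PORT=6379
cloudinfo --provider-amazon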

service management

Service-specific details need to be abstracted and made configurable.

Requirements:

  • administrators should be able to "register" a new service under an existing provider or a completely new provider
  • cloudinfo behaves the same for newly registered services as for the existing ones
  • the behaviour of existing services will not be extended or altered

Design:

  • the initial format for importing cloud information is similar to what CI currently supports
  • service data is treated as configuration, passed to the app in yaml format (see the hypothetical sketch after this list)
  • service data is loaded into the application on startup
    • information shared between services needs to be reflected in the data definition
    • the above introduces an ordering between service information loading: dynamic services (those that get their information from providers) need to be started before static services
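
A purely hypothetical sketch of such a yaml definition (none of the keys below are an implemented format; field names like cpusPerVm and memPerVm are borrowed from the product JSON shown earlier in this document):

providers:
  - name: amazon          # existing provider, scraped dynamically
    services:
      - name: compute
        source: dynamic   # data comes from the provider APIs
  - name: newprovider     # newly registered provider
    services:
      - name: compute
        source: static    # data comes from this configuration
        products:
          - type: example.large
            cpusPerVm: 4
            memPerVm: 16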

Tasks:

  • design the data model for service configuration
  • design the operational model (how the configuration is loaded)
  • implement the configuration loader
  • integrate the loader into the current product / factor out the redundant solutions
  • include / exclude - data from existing services

Add support for services

Services are resources that implement provider- and region-specific logic, mainly exposing dependencies, restrictions and other relationships between products.

This story collects the tasks needed to add support for this new resource.

Decorate products with metainfo - labels

The product representation needs to be decorated with labels containing meta information about the product (labels may contain values already present in the struct, derived values, or other data from the provider), e.g.:

  • purpose (compute, general, etc ...)
  • network category
    TBD

This information needs to be provided to all the clients of productinfo (primarily Telescopes and Pipeline) - upgrade the productinfo client in these apps.

Improve tracing

  • decorate spans with status codes and messages
  • cover error flows (relates to the above)
  • cover gin handlers (trace requests)
  • trace multiple services (telescopes)

Refactor - configuration, instrumentation

Organize the code based on the structure found here

Focus on / prepare for:

  • error / problem handling
  • logging
  • application configuration
  • integrating / instrumenting tools (tracing, prometheus, etc)
  • application configuration to be made more transparent
  • separate metrics-related logic (it should not be interwoven with the other logic: scraping, etc.)

refresh the data on the ui when no results are returned

Problem:
When no results are returned by the product service for a given provider or region, the displayed data doesn't get refreshed and the previously displayed data remains visible.

The API has changed to return an empty JSON collection ([]) instead of nil when no results are returned.

This issue is about handling that case properly in the UI code.

Doesn't honor base path config

Base path has been set to '/' but got the following:

<!doctype html>
<html lang="en" xmlns="http://www.w3.org/1999/html">
<head>
<meta charset="utf-8">
<title>Cloud Products</title>
<base href="/productinfo/">
<meta name="viewport" content="width=device-width, initial-scale=1">

add a persistent cloudinfo storage

Cloud information is currently stored in an in-memory cache.
We need persistent storage in addition to this solution.

The task can basically be reduced to adding a database-backed implementation of the CloudInfoStore interface.

The chosen solution is to store the data in its current format in Redis/Cassandra.

Remove redundant flags

The following two flags seem to be redundant because these values are already present in the auth files required by the providers. We should try to read them from the auth files instead of having to specify them redundantly.

--gce-project-id  
--azure-subscription-id

Adjust the productinfo UI to the changed rest URL scheme

The productinfo REST API has been normalized; the changes need to be reflected in the UI code so that the appropriate calls are executed against the proper endpoints.

The new scheme has the form:

/api/v1/providers
/api/v1/providers/:provider
/api/v1/providers/:provider/services
/api/v1/providers/:provider/services/:service/regions
/api/v1/providers/:provider/services/:service/regions/:region
/api/v1/providers/:provider/services/:service/regions/:region/images
/api/v1/providers/:provider/services/:service/regions/:region/images?product=productID
/api/v1/providers/:provider/services/:service/regions/:region/products
/api/v1/providers/:provider/services/:service/regions/:region/products/:attribute
/api/v1/providers/:provider/services/:service/regions/:region/products/:attribute?image=imageID

On the UI, the "provider" dropdown needs to be populated with data in the following form:

provider1
  - service1
  - service2
provider2
  - service1
  - service2
................

The data in the dropdown may be retrieved with the following call:

curl -L -X GET 'http://localhost:9090/productinfo/api/v1/providers'

Vendoring cleanup

Perform the vendoring cleanup - keep only the dependency descriptors under version control.

separate backend operations

The application periodically retrieves cloud information from providers; this is quite a long process where different subtasks are executed by individual goroutines.

For better traceability this process should be separated from the "serving" part of the application; ideally this could be done by defining an interface for the renewal operations and adding an implementation ...

UPDATE:
What I meant here: the renewal shouldn't be part of the caching cloud info component; the process only writes to the store.
