Selefra: How Providers Pull Data into Selefra

Introduction

In Selefra, providers collect the data that Policy as Code rules are evaluated against. A provider is code written against the official API documentation of a cloud service or Software as a Service (SaaS) vendor. Users write provider code, and Selefra executes it to send requests to the cloud and retrieve data. The returned data is then stored in a PostgreSQL database in a predefined format.

AWS Provider Get Resource

Taking AWS resources as an example, users can choose specific AWS services for data collection. Here's an example that uses the S3 client from the AWS SDK for Go v2 ("github.com/aws/aws-sdk-go-v2/service/s3") for data collection:

func (x *TableAwsS3BucketsGenerator) GetDataSource() *schema.DataSource {
    return &schema.DataSource{
        Pull: func(ctx context.Context, clientMeta *schema.ClientMeta, client any, task *schema.DataSourcePullTask, resultChannel chan<- any) *schema.Diagnostics {

            diagnostics := schema.NewDiagnostics()

            // List every bucket in the account with a single ListBuckets call.
            cl := client.(*aws_client.Client)
            svc := cl.AwsServices().S3
            response, err := svc.ListBuckets(ctx, nil, func(options *s3.Options) {
                options.Region = listBucketRegion(cl)
            })
            if err != nil {
                return schema.NewDiagnosticsErrorPullTable(task.Table, err)
            }

            // Start a fixed-size pool of workers that fetch the details of each bucket.
            var wg sync.WaitGroup
            buckets := make(chan types.Bucket)
            errs := make(chan error)
            for i := 0; i < fetchS3BucketsPoolSize; i++ {
                wg.Add(1)
                go fetchS3BucketsWorker(ctx, client, buckets, errs, resultChannel, &wg)
            }

            // Feed the buckets to the workers, stopping early if the context is cancelled.
            go func() {
                defer close(buckets)
                for _, bucket := range response.Buckets {
                    select {
                    case <-ctx.Done():
                        return
                    case buckets <- bucket:
                    }
                }
            }()

            // Collect worker errors into the diagnostics until errs is closed.
            done := make(chan struct{})
            go func() {
                for err := range errs {
                    diagnostics.AddErrorPullTable(task.Table, err)
                }
                close(done)
            }()

            // Wait for the workers, then close errs so the collector can finish.
            wg.Wait()
            close(errs)
            <-done

            return diagnostics
        },
    }
}

In this example, the provider calls the ListBuckets method of the AWS S3 SDK to retrieve the list of buckets, then starts a pool of goroutines that fetch detailed information for each bucket concurrently. The workers receive resultChannel, through which results are handed back to Selefra for storage in the predefined table structure, and any errors they report are collected into the returned diagnostics. GetDataSource itself returns a pointer to a schema.DataSource whose Pull function performs this work.

AWS Provider creates a local data storage structure

Before collecting data, you need some understanding of the API that exposes it, and you need to define in advance the table structure in which the data will be stored. The AWS provider lists its tables in GenTables:

func GenTables() []*schema.Table {
    return []*schema.Table{
        table_schema_generator.GenTableSchema(&amp.TableAwsAmpWorkspacesGenerator{}),
        table_schema_generator.GenTableSchema(&eks.TableAwsEksClustersGenerator{}),
        // Omitted 430 lines of code
        table_schema_generator.GenTableSchema(&kinesis.TableAwsKinesisStreamsGenerator{}),
        table_schema_generator.GenTableSchema(&cloudfront.TableAwsCloudfrontCachePoliciesGenerator{}),
    }
}

The table definitions can be quite large, but GPT can generate most of them, leaving only a manual review of the result.
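To make the idea of a predefined table structure concrete, here is a minimal sketch of how a table definition could map to a PostgreSQL statement. The Column and Table types, the CreateTableSQL function, and the column names are hypothetical and simplified for illustration; they are not the types in Selefra's schema package.

```go
package main

import (
	"fmt"
	"strings"
)

// Column and Table are hypothetical, simplified types for illustration.
type Column struct {
	Name string
	Type string // PostgreSQL type, for example "text" or "timestamp"
}

type Table struct {
	Name       string
	Columns    []Column
	PrimaryKey string // optional; empty means no primary key clause
}

// CreateTableSQL renders the DDL statement for a table definition.
// Identifiers are not quoted, so names must already be valid in PostgreSQL.
func CreateTableSQL(t Table) string {
	defs := make([]string, 0, len(t.Columns)+1)
	for _, c := range t.Columns {
		defs = append(defs, fmt.Sprintf("%s %s", c.Name, c.Type))
	}
	if t.PrimaryKey != "" {
		defs = append(defs, fmt.Sprintf("PRIMARY KEY (%s)", t.PrimaryKey))
	}
	return fmt.Sprintf("CREATE TABLE IF NOT EXISTS %s (%s)", t.Name, strings.Join(defs, ", "))
}

func main() {
	// Illustrative columns only; the real table has many more.
	buckets := Table{
		Name: "aws_s3_buckets",
		Columns: []Column{
			{Name: "arn", Type: "text"},
			{Name: "name", Type: "text"},
			{Name: "region", Type: "text"},
			{Name: "creation_date", Type: "timestamp"},
		},
		PrimaryKey: "arn",
	}
	fmt.Println(CreateTableSQL(buckets))
}
```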

You can write provider code for whichever AWS resources you need, such as RDS databases or Lambda functions. Refer to the official AWS API documentation to understand the methods and parameters of each service, and write the provider code accordingly.

By using provider code, Selefra sends requests to AWS services and retrieves the required data. This data is then stored in the PostgreSQL database in the predefined format for subsequent policy analysis and decision support.
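As a rough illustration of the storage step, the sketch below turns one collected record into a parameterized INSERT statement. InsertSQL and the row contents are hypothetical; Selefra's own storage layer is not shown in this article and may work differently.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// InsertSQL builds a parameterized INSERT for one row keyed by column name.
// It is a hypothetical helper, not part of Selefra.
func InsertSQL(table string, row map[string]any) (string, []any) {
	cols := make([]string, 0, len(row))
	for c := range row {
		cols = append(cols, c)
	}
	sort.Strings(cols) // map iteration order is random, so fix the column order

	placeholders := make([]string, len(cols))
	args := make([]any, len(cols))
	for i, c := range cols {
		placeholders[i] = fmt.Sprintf("$%d", i+1) // PostgreSQL positional parameters
		args[i] = row[c]
	}
	query := fmt.Sprintf("INSERT INTO %s (%s) VALUES (%s)",
		table, strings.Join(cols, ", "), strings.Join(placeholders, ", "))
	return query, args
}

func main() {
	query, args := InsertSQL("aws_s3_buckets", map[string]any{
		"name":   "logs",
		"region": "us-east-1",
	})
	fmt.Println(query)
	fmt.Println(args...)
}
```

Passing values as parameters rather than formatting them into the query string keeps API-supplied data, such as bucket names or tags, from being interpreted as SQL.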

Please note that the provided example code is for demonstration purposes only, and the actual provider code may vary depending on the AWS service and API used. We recommend referring to the AWS official documentation for writing provider code specific to the desired service.

Conclusion

This article explained how providers retrieve data from cloud resources in Policy as Code products such as Selefra. Many details, such as retry mechanisms and error handling, depend on the business logic and should be handled to suit it. You can also visit our official repository to view the open-source provider code. To follow our team and try our open-source product Selefra:

GitHub: https://github.com/selefra/selefra

Slack: https://selefra.io/community/join