AWS Machine Learning Blog

Connect the Amazon Q Business generative AI coding companion to your GitHub repositories with Amazon Q GitHub (Cloud) connector

November 2024: This post was reviewed and updated for accuracy.

Incorporating generative artificial intelligence (AI) into your development lifecycle can offer several benefits. For example, using an AI-based coding companion such as Amazon Q Developer can boost development productivity by up to 30 percent. Additionally, reducing the developer context switching that stems from frequent interactions with many different development tools can also increase developer productivity. In this post, we show you how development teams can quickly obtain answers based on the knowledge distributed across your development environment using generative AI.

GitHub (Cloud) is a popular development platform, used by more than 100 million developers and over 4 million organizations worldwide, that helps teams build, scale, and deliver software. GitHub helps developers host and manage Git repositories, collaborate on code, track issues, and automate workflows through features such as pull requests, code reviews, and continuous integration and deployment (CI/CD) pipelines.

Amazon Q Business is a fully managed, generative AI–powered assistant designed to enhance enterprise operations. You can tailor it to specific business needs by connecting to company data, information, and systems using over 40 built-in connectors.

You can connect your GitHub (Cloud) instance to Amazon Q Business using an out-of-the-box connector to provide a natural language interface that helps your team analyze the repositories, commits, issues, and pull requests contained in your GitHub (Cloud) organization. After establishing the connection and synchronizing data, your teams can use Amazon Q Business to run natural language queries over the supported GitHub (Cloud) data entities, streamlining access to this information.

Overview of solution

To create an Amazon Q Business application to connect to your GitHub repositories using AWS IAM Identity Center and AWS Secrets Manager, follow these high-level steps:

  1. Create an Amazon Q Business application
  2. Perform sync
  3. Run sample queries to test the solution

The following diagram shows the solution architecture.

Solution architecture, showing the integration of Amazon Q Business with a GitHub Cloud organization and a sample repository structure

In this post, we show how developers and other relevant users can use the Amazon Q Business web experience to perform natural language-based Q&A over the indexed information, with responses filtered according to the associated access control lists (ACLs). For this post, we set up a dedicated GitHub (Cloud) organization with four repositories and two teams: review and development. Two of the repositories are private and are only accessible to the members of the review team. The remaining two repositories are public and are accessible to all members and teams.

Prerequisites

To implement the solution, make sure you have the following prerequisites in place:

  1. Have an AWS account with privileges necessary to administer Amazon Q Business
  2. Have access to an AWS Region in which Amazon Q Business is available (Supported regions)
  3. Enable IAM Identity Center and add a user (Guide to enable IAM Identity Center, Guide to add user)
  4. Have a GitHub account with an organization and repositories (Guide to create organization)
  5. Have a GitHub personal access token (classic) (Guide to create access tokens, Permissions needed for tokens)
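The access token from the last prerequisite is later supplied to the connector through AWS Secrets Manager. If you prefer to create that secret ahead of time from the command line, the following is a minimal sketch rather than a definitive procedure. The function name `store_github_token` and the example secret name are hypothetical, and the `personalToken` key is an assumption about the secret structure the GitHub connector expects; confirm it against the connector documentation before relying on it.

```shell
# Sketch only: wraps `aws secretsmanager create-secret`.
# $1: secret name (hypothetical, for example anycompany-git-token)
# $2: GitHub personal access token (classic)
# Prints the ARN of the newly created secret.
store_github_token() {
  # Assumption: the connector reads the token from a "personalToken" key.
  aws secretsmanager create-secret \
    --name "$1" \
    --secret-string "{\"personalToken\":\"$2\"}" \
    --query 'ARN' \
    --output text
}

# Usage (requires AWS credentials):
#   store_github_token anycompany-git-token "$GITHUB_TOKEN"
```

Reading the token from an environment variable, as in the usage comment, keeps the token value itself out of your shell history.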

Create, sync, and test an Amazon Q Business application with IAM Identity Center

To create the Amazon Q Business application, you need to select the retriever, connect the data sources, and add groups and users.

Create application

  1. On the AWS Management Console, search for Amazon Q Business in the search bar, then select Amazon Q Business.

In the AWS Home Screen, type Amazon Q Business in the search bar to pull up the Q service, and select to open the service.

  1. On the Amazon Q Business landing page, choose Get started.

Amazon Q Business get started via AWS console

  1. On the Amazon Q Business Applications screen, at the bottom, choose Create application.

In the Q Home Screen, select "create application" to initiate the process

  1. Under Create application, provide the required values. For example, in Application name, enter anycompany-git-application. For Service access, select Create and use a new service-linked role (SLR). Under Application connected to IAM Identity Center, note the ARN for the associated IAM Identity Center instance. Choose Create.

Creation of a new Amazon Q Business application
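If you would rather script this step, the console actions above can be approximated with the AWS CLI. The following is a hedged sketch: the function names are hypothetical, and it assumes that omitting `--role-arn` lets Amazon Q Business fall back to its service-linked role, which you should verify for your account.

```shell
# Print the ARN and identity store ID of each IAM Identity Center
# instance visible in the current Region.
list_identity_center_instances() {
  aws sso-admin list-instances \
    --query 'Instances[].[InstanceArn,IdentityStoreId]' \
    --output text
}

# Create the Amazon Q Business application and print its ID.
# $1: application display name (for example anycompany-git-application)
# $2: IAM Identity Center instance ARN
# Assumption: no --role-arn is passed, so the service-linked role is used.
create_q_application() {
  aws qbusiness create-application \
    --display-name "$1" \
    --identity-center-instance-arn "$2" \
    --query 'applicationId' \
    --output text
}

# Usage (requires AWS credentials):
#   list_identity_center_instances
#   APPLICATION_ID=$(create_q_application anycompany-git-application "$INSTANCE_ARN")
```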

Select retriever

Under Select retriever, in Retrievers, select Use native retriever. Under Index provisioning, enter “1.”

Amazon Q Business pricing is based on the chosen document index capacity. You can choose up to 50 capacity units as part of index provisioning. Each unit can contain up to 20,000 documents or 200 MB, whichever comes first. You can adjust this number as needed for your use case.

Choose Next at the bottom of the screen.

Select "Use native retriever" and choose the "Number of units" based on how many documents have to be indexed.
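To pick a starting value for the number of units, you can apply the capacity figures quoted above (20,000 documents or 200 MB per unit, whichever limit is reached first). The sketch below does that arithmetic and then shows how the index and native retriever might be created with the AWS CLI. The function names and display names are hypothetical, and the estimate is only a rough guide, not a pricing calculation.

```shell
# Estimate the number of index units needed.
# $1: number of documents, $2: total size in MB
# Uses ceiling division and takes the larger of the two results (minimum 1).
estimate_index_units() {
  by_docs=$(( ($1 + 19999) / 20000 ))
  by_size=$(( ($2 + 199) / 200 ))
  units=$by_docs
  if [ "$by_size" -gt "$units" ]; then units=$by_size; fi
  if [ "$units" -lt 1 ]; then units=1; fi
  echo "$units"
}

# Create an index with the given capacity and print its ID.
# $1: application ID, $2: number of units
create_native_index() {
  aws qbusiness create-index \
    --application-id "$1" \
    --display-name "anycompany-git-index" \
    --capacity-configuration "units=$2" \
    --query 'indexId' \
    --output text
}

# Create a native retriever that points at the index.
# $1: application ID, $2: index ID
create_native_retriever() {
  aws qbusiness create-retriever \
    --application-id "$1" \
    --type NATIVE_INDEX \
    --display-name "anycompany-git-retriever" \
    --configuration "{\"nativeIndexConfiguration\":{\"indexId\":\"$2\"}}"
}
```

For example, `estimate_index_units 45000 150` prints `3`, because 45,000 documents need three units even though 150 MB would fit in one. Keep the result within the 50-unit limit mentioned above.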

Connect data sources

  1. Under Connect data sources, in the search field under All, enter “GitHub” and select the plus sign to the right of the GitHub selection. Choose Next to configure the data source.

You can use the following examples to create a default configuration with file type exclusions to bypass crawling common image and stylesheet files.

Amazon Q Business already has a connector for GitHub. Type GitHub in the search box and, in the search results, select the plus icon next to GitHub.

  1. Enter anycompany-git-datasource in the Data source name and Description.

From the data source profile, provide the data source name and description, set the GitHub source to "GitHub Enterprise Cloud", and provide the GitHub host URL.

  1. In the GitHub organization name field, enter your GitHub organization name. Under Authentication, provide a new access token or select an existing access token stored in AWS Secrets Manager.

ACLs and identity crawlers are enabled by default for the GitHub connector. Provide the organization name and the token for GitHub authentication. The VPC setting is optional; you can move to the next step without selecting one.

  1. Under IAM role, select Create a new service role and enter the role name under Role name for the data source.

Create a new Service role for Amazon Q Business application

  1. Define Sync scope by selecting the desired repositories and content types to be synced.

Define sync scope

  1. Complete the Additional configuration and Sync mode.

You can use this optional section to specify file names, file types, or file paths using regex patterns to refine the sync scope. The Sync mode setting defines which types of content changes are synced when your data source content changes.

Optional configuration settings

  1. For the purposes of this post, under Sync run schedule, select Run on demand under Frequency so you can manually invoke the sync process. Other options for automated periodic sync runs are also supported. In the Field Mappings section, keep the default settings. After you complete the retriever creation, you can modify field mappings and add custom field attributes. You can access field mapping by editing the data source.

Configure sync scope
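Before entering a regex exclusion pattern in the additional configuration, it can help to try it against a few sample paths locally. The following sketch uses `grep -E` with a hypothetical pattern for common image and stylesheet files. The pattern and the function name `is_excluded` are illustrative assumptions, and the connector may evaluate regex patterns with a different dialect than POSIX extended regular expressions, so treat a local match as a sanity check only.

```shell
# Hypothetical exclusion pattern: common image and stylesheet extensions.
EXCLUDE_PATTERN='\.(png|jpe?g|gif|svg|ico|css|scss)$'

# Succeeds (exit status 0) when the given path matches the exclusion
# pattern, ignoring case. $1: file path
is_excluded() {
  printf '%s\n' "$1" | grep -Eiq "$EXCLUDE_PATTERN"
}

# Usage:
#   for path in README.md docs/logo.PNG src/app.css; do
#     if is_excluded "$path"; then echo "skip $path"; else echo "keep $path"; fi
#   done
```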

Add groups and users

There are two users we will use for testing: one with full permissions on all the repositories in the GitHub (Cloud) organization, and a second user with permission only on one specific repository.

  1. Choose Add groups and users.

Add groups and users

  1. Select Assign existing users and groups. This will show you the option to select the users from the IAM Identity Center and add them to this Amazon Q Business application. Choose Next.

  1. Search for the username or name and select the user from the listed options. Repeat for all of the users you wish to test with.

  1. Assign the desired subscription to the added users.
  1. For Web experience service access, use the default value of Create and use a new service role. Choose Create Application and wait for the application creation process to complete.

Assign subscription and select service role
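If you are unsure whether a test user already exists in IAM Identity Center, you can look the user up from the command line before assigning them in the console. This is a small sketch with a hypothetical function name; the identity store ID is the one associated with your IAM Identity Center instance.

```shell
# Look up an IAM Identity Center user by user name.
# $1: identity store ID, $2: user name
# Prints the user ID and user name of each match (empty if none).
find_identity_center_user() {
  aws identitystore list-users \
    --identity-store-id "$1" \
    --filters "AttributePath=UserName,AttributeValue=$2" \
    --query 'Users[].[UserId,UserName]' \
    --output text
}

# Usage (requires AWS credentials):
#   find_identity_center_user "$IDENTITY_STORE_ID" anycompanyqdemo
```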

Perform sync

To sync your new Amazon Q Business application with your desired data sources, follow these steps:

  1. Select the newly created data source under Data sources and choose Sync now.

Depending on the number of supported data entities in the source GitHub (Cloud) organization, the sync process might take several minutes to complete.

Perform data sync

  1. Once the sync is complete, click on the data source name to show the sync history including number of objects scanned, added, deleted, modified, and failed. You can also access the associated Amazon CloudWatch logs to inspect the sync process and failed objects.

View sync history

  1. To access the Amazon Q Business application, select Web experience settings and choose Deployed URL. A new tab will open and ask you for sign-in details. Provide the details of the user you created earlier and choose Sign in.

Access Amazon Q Business Deployed URL
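The sync steps above can also be driven from the AWS CLI, which is convenient when the data source is set to run on demand. The following sketch wraps three calls: starting a sync job, listing the sync history with its document metrics, and printing the deployed web experience URL. The function names are hypothetical, and the application, index, and data source IDs are placeholders you supply.

```shell
# Start an on-demand sync job and print its execution ID.
# $1: application ID, $2: index ID, $3: data source ID
start_sync() {
  aws qbusiness start-data-source-sync-job \
    --application-id "$1" \
    --index-id "$2" \
    --data-source-id "$3" \
    --query 'executionId' \
    --output text
}

# Show the sync history with per-run document counts.
# $1: application ID, $2: index ID, $3: data source ID
show_sync_history() {
  aws qbusiness list-data-source-sync-jobs \
    --application-id "$1" \
    --index-id "$2" \
    --data-source-id "$3" \
    --query 'history[].{id:executionId,status:status,scanned:metrics.documentsScanned,added:metrics.documentsAdded,modified:metrics.documentsModified,deleted:metrics.documentsDeleted,failed:metrics.documentsFailed}' \
    --output table
}

# Print the deployed URL of each web experience for the application.
# $1: application ID
get_web_experience_urls() {
  aws qbusiness list-web-experiences \
    --application-id "$1" \
    --query 'webExperiences[].defaultEndpoint' \
    --output text
}

# Usage (requires AWS credentials):
#   start_sync "$APPLICATION_ID" "$INDEX_ID" "$DATA_SOURCE_ID"
#   show_sync_history "$APPLICATION_ID" "$INDEX_ID" "$DATA_SOURCE_ID"
```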

Run sample queries to test the solution

You should now see the home screen of Amazon Q Business, including the associated web experience. Now we can ask questions in natural language and Amazon Q Business will provide answers based on the information indexed from your GitHub (Cloud) organization.

  1. To begin, enter a natural language question in the Enter a prompt field.

Access Amazon Q Business application

  1. You can ask questions about the information from the synced GitHub (Cloud) data entities. For example, you can enter, “Tell me how to start a new Serverless application from scratch?” and obtain a response based on the information from the associated repository README.md file.

Amazon Q Business response

  1. Because you are logged in as the first user and mapped to a GitHub (Cloud) user belonging to the review team, you should also be able to ask questions about the contents of private repositories accessible by the members of that team.

As shown in the following screenshot, you can ask questions about the private repository called aws-s3-object-management and obtain the response based on the README.md in that repository.

Amazon Q Business response

However, when you attempt to ask the same question when logged in as the second user, which has no access to the associated GitHub (Cloud) repository, Amazon Q Business will provide an ACL-filtered response.

Filtered Amazon Q Business response

Troubleshooting and frequently asked questions:

1. Why isn’t Amazon Q Business answering any of my questions?

If you are not getting answers to your questions from Amazon Q Business, verify the following:

  1. Permissions – document ACLs indexed by Amazon Q Business may not allow you to query certain data entities as demonstrated in our example. If this is the case, please reach out to your GitHub (Cloud) administrator to verify that your user has access to the restricted documents and repeat the sync process.
  2. Data connector sync – a failed data source sync may prevent the documents from being indexed, meaning that Amazon Q Business would be unable to answer questions about the documents that failed to sync. Please refer to the official documentation to troubleshoot data source connectors.

To troubleshoot ACLs and chat response filtering, verify that GitHub (Cloud) user identities and the corresponding repository groups have been added to the Amazon Q Business user principal store. After a data sync run, use the following CloudWatch Logs Insights query:

fields @ingestionTime, DocumentId, Message, @timestamp
| sort @timestamp desc
| filter @logStream like /^LOG_STREAM_UUID/ and Message like /principal/
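You can run the query above in the CloudWatch console, or submit it with the AWS CLI as sketched here. The function names are hypothetical. The log group name is passed in as a parameter because it depends on your application; the `/aws/qbusiness/APPLICATION_ID` form shown in the usage comment is an assumption to verify in your account.

```shell
# Submit the principal query and print the query ID.
# $1: log group name
# $2: log stream prefix (the LOG_STREAM_UUID placeholder in the query)
# $3: lookback window in seconds
start_principal_query() {
  end_time=$(date +%s)
  start_time=$(( end_time - $3 ))
  query="fields @ingestionTime, DocumentId, Message, @timestamp"
  query="$query | sort @timestamp desc"
  query="$query | filter @logStream like /^$2/ and Message like /principal/"
  aws logs start-query \
    --log-group-name "$1" \
    --start-time "$start_time" \
    --end-time "$end_time" \
    --query-string "$query" \
    --query 'queryId' \
    --output text
}

# Fetch the results once the query has finished running.
# $1: query ID returned by start_principal_query
get_query_results() {
  aws logs get-query-results --query-id "$1"
}

# Usage (requires AWS credentials; log group name is an assumption):
#   QUERY_ID=$(start_principal_query "/aws/qbusiness/$APPLICATION_ID" "$LOG_STREAM_UUID" 3600)
#   get_query_results "$QUERY_ID"
```

Queries run asynchronously, so `get_query_results` may need to be called again until the reported status shows that the query is complete.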

Continue by inspecting document-level sync reports to gain enhanced data sync visibility in Amazon Q Business. For example, you can inspect the crawled ACLs for a particular data sync run using the following CloudWatch Logs Insights query:

fields LogLevel, DocumentId, DocumentTitle, Acl, @timestamp
| filter @logStream like 'SYNC_RUN_HISTORY_REPORT/your-data-source-id/run-id' and not isempty(Acl)
| sort @timestamp desc
| limit 10000

The logs show that the ACL crawler has identified the GitHub (Cloud) repository-specific group anycompanyqdemo-repositories-aws-s3-object-management that contains GitHub (Cloud) users with access to this repository identified via direct assignment or associated GitHub (Cloud) teams membership.

Verify that the user ID mapping of the GitHub (Cloud) user correctly reflects the user email address of the associated user in AWS IAM Identity Center. Given the username of the GitHub (Cloud) user anycompanyqdemo, use the AWS CLI to obtain user details from the Amazon Q Business user store where APPLICATION_ID refers to your Amazon Q Business application ID:

aws qbusiness get-user \
  --application-id APPLICATION_ID \
  --user-id anycompanyqdemo

This response indicates that, because the user email in the associated GitHub (Cloud) profile is not public (which is often the case), the Amazon Q Business user mapping was created based on the GitHub (Cloud) username anycompanyqdemo:

{
    "userAliases": [
        {
            "indexId": "INDEX_ID",
            "dataSourceId": "GITHUB_DATA_SOURCE_ID",
            "userId": "U_**********"
        },
        { 
            "userId": "anycompanyqdemo"
        }
    ]
}

In our example, the associated AWS IAM Identity Center user email address is anycompanyqdemo@anycompany.com. To update the mapping so that ACLs are applied correctly when filtering chat responses, add the alias of the AWS IAM Identity Center user to the original user created as part of ACL crawling:

aws qbusiness update-user \
  --application-id APPLICATION_ID \
  --user-id anycompanyqdemo \
  --user-aliases-to-update \
  userId=anycompanyqdemo@anycompany.com

{
    "userAliasesUpdated": [
        {
            "userId": "anycompanyqdemo@anycompany.com"
        }
    ]
}

If the user ID anycompanyqdemo@anycompany.com does not yet exist in the user store, create it first using the create-user API. Once the mapping has been updated, the user details should look like this:

{
    "userAliases": [
        {
            "indexId": "INDEX_ID",
            "dataSourceId": "GITHUB_DATA_SOURCE_ID",
            "userId": "U_**********"
        },
        { 
            "userId": "anycompanyqdemo"
        },
        { 
            "userId": "anycompanyqdemo@anycompany.com"
        }
    ]
}

The updated mapping will result in appropriate resolution of crawled ACLs to reflect the expected document access permissions in chat responses. Use the available Amazon Q Business user store management APIs to configure the required user mapping for your Amazon Q Business application.
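For the case mentioned earlier where the user ID does not yet exist in the user store, the check-then-create step could be scripted as follows. This is a sketch with a hypothetical function name. It treats any `get-user` failure as "user not found", which is a simplification: a permissions error would also take the create path, so review the error output if the result is unexpected.

```shell
# Create the user in the Amazon Q Business user store if it is missing.
# $1: application ID
# $2: user ID (for example, the IAM Identity Center email address)
# Prints "exists" or "created".
ensure_q_user() {
  if aws qbusiness get-user \
      --application-id "$1" \
      --user-id "$2" >/dev/null 2>&1; then
    echo "exists"
  else
    # Simplification: any get-user failure is treated as "not found".
    aws qbusiness create-user \
      --application-id "$1" \
      --user-id "$2" >/dev/null && echo "created"
  fi
}

# Usage (requires AWS credentials):
#   ensure_q_user "$APPLICATION_ID" anycompanyqdemo@anycompany.com
```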

2. My connector is unable to sync.

Please refer to the official documentation to troubleshoot data source connectors. Please also verify that all of the required prerequisites for connecting Amazon Q Business to GitHub (Cloud) are in place.

3. I updated the contents of my data source but Amazon Q Business answers using old data.

Verifying the sync status and sync schedule frequency for your GitHub (Cloud) data connector should reveal when the last sync ran successfully. It could be that your data connector sync run schedule is set to run on demand or has not yet been triggered for its next periodic run. If the sync is set to run on demand, it will need to be manually triggered.
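One way to check the configured schedule without opening the console is to query the data source with the AWS CLI. The sketch below, with a hypothetical function name, prints the data source status, its sync schedule, and when it was last updated. It assumes an empty schedule value corresponds to run on demand.

```shell
# Show the status and sync schedule of a data source.
# $1: application ID, $2: index ID, $3: data source ID
# Assumption: an empty schedule means the sync runs on demand only.
show_sync_schedule() {
  aws qbusiness get-data-source \
    --application-id "$1" \
    --index-id "$2" \
    --data-source-id "$3" \
    --query '{status:status,schedule:syncSchedule,updated:updatedAt}' \
    --output table
}

# Usage (requires AWS credentials):
#   show_sync_schedule "$APPLICATION_ID" "$INDEX_ID" "$DATA_SOURCE_ID"
```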

4. How can I know if the reason I don’t see answers is due to ACLs?

If different users are getting different answers to the same questions, including differences in source attribution with citation, it is likely that the chat responses are being filtered based on user document access level represented via associated ACLs.

5. How can I sync documents without ACLs?

Access control list (ACL) crawling is enabled by default during data source creation and cannot be disabled on an existing data source. To sync documents without ACLs, delete the data source and re-create it with ACL crawling disabled, which requires specific IAM permissions.

Cleanup

To avoid incurring future charges, clean up any resources you created as part of this solution, including the Amazon Q Business application:

  1. On the Amazon Q Business console, choose Applications in the navigation pane.
  2. Select the application you created.
  3. On the Actions menu, choose Delete.
  4. Delete the AWS Identity and Access Management (IAM) roles created for the application and data retriever. You can identify the IAM roles used by the created Amazon Q Business application and data retriever by inspecting the associated configuration using the AWS console or AWS Command Line Interface (AWS CLI).
  5. If you created an IAM Identity Center instance for this walkthrough, delete it.
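The application deletion in the steps above can also be performed with the AWS CLI. The sketch below additionally covers the GitHub token secret, in case you created one in AWS Secrets Manager for this walkthrough. The function names are hypothetical, and the seven-day recovery window is an arbitrary choice for illustration.

```shell
# Delete the Amazon Q Business application.
# $1: application ID
delete_q_application() {
  aws qbusiness delete-application --application-id "$1"
}

# Schedule the GitHub token secret for deletion after a 7-day window.
# $1: secret name or ARN
delete_github_secret() {
  aws secretsmanager delete-secret \
    --secret-id "$1" \
    --recovery-window-in-days 7
}

# Usage (requires AWS credentials):
#   delete_q_application "$APPLICATION_ID"
#   delete_github_secret anycompany-git-token
```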

Conclusion

In this post, we walked through the steps to connect your GitHub (Cloud) organization to Amazon Q Business using the out-of-the-box GitHub (Cloud) connector. We demonstrated how to create an Amazon Q Business application integrated with AWS IAM Identity Center as the identity provider. We then configured the GitHub (Cloud) connector to crawl and index supported data entities such as repositories, commits, issues, pull requests, and associated metadata from your GitHub (Cloud) organization. We showed how to perform natural language queries over the indexed GitHub (Cloud) data using the AI-powered chat interface provided by Amazon Q Business. Finally, we covered how Amazon Q Business applies ACLs associated with the indexed documents to provide permissions-filtered responses.

Beyond the web-based chat experience, Amazon Q Business offers a Chat API to create custom conversational interfaces tailored to your specific use cases. You can also use the associated API operations using the AWS CLI or AWS SDK to manage Amazon Q Business applications, retriever, sync, and user configurations.

By integrating Amazon Q Business with your GitHub (Cloud) organization, development teams can streamline access to information scattered across repositories, issues, and pull requests. The natural language interface powered by generative AI reduces context switching and can provide timely answers in a conversational manner.

To learn more about the Amazon Q connector for GitHub (Cloud), refer to Connecting GitHub (Cloud) to Amazon Q Business, the Amazon Q User Guide, and the Amazon Q Developer Guide.


About the Authors

Maxim Chernyshev

Maxim Chernyshev is a Senior Solutions Architect working with mining, energy, and industrial customers at AWS. Based in Perth, Western Australia, Maxim helps customers devise solutions to complex and novel problems using a broad range of applicable AWS services and features. Maxim is passionate about industrial Internet of Things (IoT), scalable IT/OT convergence, and cyber security.

Manjunath Arakere

Manjunath Arakere is a Senior Solutions Architect on the Worldwide Public Sector team at AWS, based in Atlanta, Georgia. He works with public sector partners to design and scale well-architected solutions and supports their cloud migrations and modernization initiatives. Manjunath specializes in migration, modernization, and serverless technology.

Mira Andhale

Mira Andhale is a Software Development Engineer on the Amazon Q and Amazon Kendra engineering team. She works on the Amazon Q connector design, development, integration and test operations.