AI Software Testing System (AI QA)

We design and deploy artificial intelligence systems, from prototypes to production-ready solutions. Our team combines expertise in machine learning, data engineering, and MLOps to make AI work in real business operations, not just in the lab.

AI System for Software Testing and QA

80% code coverage looks good until you examine what is actually covered: happy paths and obvious cases, but not boundary conditions, not integrations between components, not edge cases with unexpected data. The AI QA system addresses not the problem of "no tests exist" but of "tests exist, but they don't catch what matters."

Components of AI Testing System

```
[Code Analysis]        [Requirement Analysis]
  AST parsing             NLP from Jira/Confluence
       ↓                        ↓
[Test Generation Engine]
  Unit | Integration | E2E | API
       ↓
[Test Prioritization]
  Change Impact Analysis → run necessary tests, not all
       ↓
[Result Analysis]
  Failure Classification + Root Cause Suggestion
       ↓
[Coverage Intelligence]
  Semantic gaps in coverage
```

AI Coverage Analysis: Finding Semantic Gaps

Traditional coverage tools (Istanbul, JaCoCo) count executed lines. The problem: 100% line coverage doesn't mean all business scenarios are tested.

```python
from langchain_openai import ChatOpenAI
import ast
import json

class SemanticCoverageAnalyzer:
    """Analyzes semantic gaps in test coverage"""

    ANALYSIS_PROMPT = """Analyze the function and existing tests.
Identify which business scenarios and boundary conditions are NOT covered.

Function:
{function_code}

Existing tests:
{existing_tests}

Identify uncovered scenarios:
1. Boundary values (empty string, None, 0, max int, negative)
2. Parameter combinations
3. Error scenarios (exceptions, invalid input)
4. Concurrent access (if applicable)
5. Business rules in conditions

For each: describe scenario + why it matters + possible bug if not tested.
Return JSON: {{"gaps": [{{"scenario": ..., "importance": ..., "potential_bug": ...}}]}}"""

    def __init__(self):
        self.llm = ChatOpenAI(model="gpt-4o", temperature=0.1)

    def analyze_function_coverage(
        self,
        function_source: str,
        test_source: str
    ) -> list[dict]:
        result = self.llm.invoke(
            self.ANALYSIS_PROMPT.format(
                function_code=function_source,
                existing_tests=test_source
            )
        )
        # Strip markdown fences the model may wrap around the JSON
        raw = result.content.strip()
        raw = raw.removeprefix("```json").removeprefix("```").removesuffix("```").strip()
        return json.loads(raw)["gaps"]

    def extract_functions_from_module(self, source: str) -> list[dict]:
        """Extracts functions from a Python module via AST"""
        tree = ast.parse(source)
        functions = []
        for node in ast.walk(tree):
            if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
                functions.append({
                    "name": node.name,
                    "source": ast.get_source_segment(source, node),
                    "complexity": self._calculate_cyclomatic_complexity(node),
                    "line_start": node.lineno
                })
        # Most complex functions first: they need tests most urgently
        return sorted(functions, key=lambda x: x["complexity"], reverse=True)

    def _calculate_cyclomatic_complexity(self, node) -> int:
        """Cyclomatic complexity — test prioritization metric"""
        complexity = 1
        for child in ast.walk(node):
            if isinstance(child, (ast.If, ast.While, ast.For, ast.ExceptHandler,
                                  ast.With, ast.Assert)):
                complexity += 1
            elif isinstance(child, ast.BoolOp):
                complexity += len(child.values) - 1
        return complexity
```
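To see why cyclomatic complexity works as a prioritization signal, here is a standalone check of the same counting rule, runnable without an LLM or API key. The `validate_order` function is a made-up example:

```python
import ast

def cyclomatic_complexity(source: str) -> int:
    """Same rule as the analyzer: 1 + branch points + extra boolean operands."""
    complexity = 1
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, (ast.If, ast.While, ast.For, ast.ExceptHandler,
                             ast.With, ast.Assert)):
            complexity += 1
        elif isinstance(node, ast.BoolOp):
            complexity += len(node.values) - 1
    return complexity

simple = "def add(a, b):\n    return a + b"
branchy = '''
def validate_order(order):
    if order is None or not order.items:
        raise ValueError("empty order")
    for item in order.items:
        if item.qty <= 0:
            raise ValueError("bad quantity")
    return True
'''
print(cyclomatic_complexity(simple))   # 1
print(cyclomatic_complexity(branchy))  # 5
```

Five independent paths means at least five test cases just to touch every branch of `validate_order`, which is why it sorts ahead of `add` in the analysis queue.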

Test Generator with Mutation Testing

```python
class AITestGenerator:
    UNIT_TEST_PROMPT = """Generate pytest unit tests for the function.

Function:
{function_code}

Uncovered scenarios (focus on these):
{gaps}

Requirements:
- Use pytest + pytest-mock
- Parametrize via @pytest.mark.parametrize where applicable
- For each test: Arrange-Act-Assert
- Boundary value tests
- Invalid input tests
- Mocks for external dependencies

Return only code, no explanations."""

    def __init__(self):
        self.llm = ChatOpenAI(model="gpt-4o", temperature=0.1)

    async def generate_unit_tests(
        self,
        function_source: str,
        gaps: list[dict]
    ) -> str:
        gaps_text = "\n".join([
            f"- {g['scenario']}: {g['importance']}"
            for g in gaps[:5]  # top 5 by importance
        ])

        result = await self.llm.ainvoke(
            self.UNIT_TEST_PROMPT.format(
                function_code=function_source,
                gaps=gaps_text
            )
        )
        return result.content

    async def run_mutation_testing(self, source_file: str, test_file: str) -> dict:
        """Runs mutation testing via mutmut"""
        import subprocess
        result = subprocess.run(
            ["mutmut", "run", f"--paths-to-mutate={source_file}",
             f"--tests-dir={test_file}"],
            capture_output=True, text=True
        )

        # Analyze survived mutants (tests didn't catch the change)
        survived = self._parse_survived_mutants(result.stdout)
        if survived:
            additional_tests = await self._generate_for_mutants(survived, source_file)
            return {"survived_count": len(survived), "additional_tests": additional_tests}

        return {"survived_count": 0, "mutation_score": "100%"}
```
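What a "survived mutant" means in practice can be shown with a toy example. This illustrates the idea only, not the mutmut implementation: we inject an off-by-one mutation (`<` becomes `<=`) and check whether the test suite notices.

```python
import ast

class FlipComparison(ast.NodeTransformer):
    """Toy mutant generator: replaces the first `<` with `<=`."""
    def __init__(self):
        self.done = False
    def visit_Compare(self, node):
        if not self.done and isinstance(node.ops[0], ast.Lt):
            node.ops[0] = ast.LtE()
            self.done = True
        return node

source = '''
def is_minor(age):
    return age < 18
'''

# Build the mutant and compile both versions into separate namespaces
mutant_tree = FlipComparison().visit(ast.parse(source))
ast.fix_missing_locations(mutant_tree)
orig_ns, mut_ns = {}, {}
exec(compile(ast.parse(source), "<orig>", "exec"), orig_ns)
exec(compile(mutant_tree, "<mutant>", "exec"), mut_ns)

# A weak suite (no boundary check) passes on the mutant: it SURVIVES
weak_suite = lambda f: f(10) is True and f(30) is False
# Adding the boundary case age == 18 KILLS the mutant
strong_suite = lambda f: weak_suite(f) and f(18) is False

print(weak_suite(mut_ns["is_minor"]))    # True  -> mutant survived
print(strong_suite(mut_ns["is_minor"]))  # False -> mutant killed
```

Every survived mutant is a concrete, reproducible gap; that is what the `_generate_for_mutants` step feeds back into test generation.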

CI/CD Integration

```yaml
# .github/workflows/ai-qa.yml
name: AI QA Analysis

on:
  pull_request:
    types: [opened, synchronize]

jobs:
  ai-test-analysis:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
        with:
          fetch-depth: 0  # needed for diff

      - name: Analyze changed files
        run: |
          git diff origin/main...HEAD --name-only --diff-filter=AM | \
            grep "\.py$" > changed_files.txt

      - name: Run AI coverage analysis
        run: |
          python qa_system/analyze_coverage.py \
            --changed-files changed_files.txt \
            --generate-missing-tests \
            --output coverage_report.json

      - name: Comment PR with AI findings
        uses: actions/github-script@v7
        with:
          script: |
            const report = require('./coverage_report.json')
            const comment = formatReport(report)
            github.rest.issues.createComment({
              issue_number: context.issue.number,
              body: comment
            })
```
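Before any failing result reaches the LLM for root-cause analysis, the Result Analysis stage can triage with a cheap heuristic pass over the traceback text. A minimal sketch (the rules and category labels here are illustrative, not a fixed taxonomy):

```python
import re

# Ordered (pattern, category) rules; first match wins
FAILURE_RULES = [
    (r"AssertionError", "assertion: expected vs actual mismatch"),
    (r"TimeoutError|timed out", "timeout: likely flaky test or environment issue"),
    (r"ConnectionError|ConnectionRefused", "infrastructure: dependency unreachable"),
    (r"fixture '.*' not found", "test setup: broken fixture"),
]

def classify_failure(traceback_text: str) -> str:
    """Map a pytest traceback to a coarse failure category."""
    for pattern, category in FAILURE_RULES:
        if re.search(pattern, traceback_text):
            return category
    return "unclassified: escalate to LLM root-cause analysis"

print(classify_failure("E   AssertionError: assert 90 == 100"))
# assertion: expected vs actual mismatch
print(classify_failure("requests.exceptions.ConnectionError: refused"))
# infrastructure: dependency unreachable
```

Only the unclassified remainder needs a model call, which keeps per-PR analysis cost predictable.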

Case study: a Python backend service (FastAPI), 45,000 lines of code, 380 tests, 74% coverage. AI analysis identified 89 semantic gaps (scenario-level, not line-level), 34 of them high-priority, and generated 67 additional tests. On the first run, 8 of the 67 new tests failed, exposing real bugs in boundary-condition handling: None values in aggregation, negative quantities in orders, and an empty list passed to sorting.

Timeframe

  • Coverage analysis + unit test generation: 3–4 weeks
  • Full QA system with CI/CD integration: 8–10 weeks
  • Mutation testing and E2E: +2–3 weeks