Generate text

Introduction

The narratives are generated with the OpenAI Python library: a prompt customized to our needs is sent to a chat model. Two models were tested: gpt-4o-mini (cost-efficient) and gpt-4o (better results but more expensive).

Each message sent to the model carries a role, which can be system, user, or assistant:

The system role is used to set the overall behavior and guidelines for the assistant, such as specifying that the assistant should be formal or follow specific ethical guidelines.

The user role represents the input from the human user, including queries, requests, or prompts.

The assistant role represents the responses generated by the AI model based on the user's input and any system instructions.

Further testing is required to improve results.

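from openai import OpenAI

client = OpenAI()  # assumption: the API key is read from the OPENAI_API_KEY environment variable

completion = client.chat.completions.create(
    model="gpt-4o-mini",
    store=True,
    messages=[
        {"role": "system", "content": "You are a helpful investment adviser."},
        {"role": "user", "content": prompt},  # prompt: the text built for the current request
    ],
)
narrative_text = completion.choices[0].message.content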
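The messages list passed to the call above can also be assembled by a small helper, which makes the three roles explicit. This is only a sketch: build_messages and its history parameter are hypothetical names, not part of the project code or of the OpenAI library.

```python
from typing import Dict, List, Optional

VALID_ROLES = {"system", "user", "assistant"}


def build_messages(
    system_text: str,
    user_text: str,
    history: Optional[List[Dict[str, str]]] = None,
) -> List[Dict[str, str]]:
    """Return a chat messages list: system first, earlier turns, then the new user prompt."""
    messages = [{"role": "system", "content": system_text}]
    for turn in history or []:
        # Earlier turns alternate between "user" and "assistant"
        if turn["role"] not in VALID_ROLES:
            raise ValueError(f"Unknown role: {turn['role']}")
        messages.append({"role": turn["role"], "content": turn["content"]})
    messages.append({"role": "user", "content": user_text})
    return messages
```

The returned list can be passed directly as the messages argument of client.chat.completions.create.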

Function

The routes /salient, /narrative and /due_diligence all call the same function, fill_narrative_table, which follows these steps:

1. Check whether the table exists; if it does, create a backup and remove the data from the table
2. Select the countries with missing data
3. For each country, feed the prompt to the model

Code extract: fill_narrative_table

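import pandas as pd
from flask import jsonify
from sqlalchemy import text

# app (Flask app), client (OpenAI client), all_fsi_for_prompt and all_ihl are defined elsewhere in the project.

def fill_narrative_table(table_name, engine, prompt):
    # 1. Check whether the table exists; if it does, create a backup and remove the data from the table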
    backup_table_name = f'backup_{table_name}'

    with engine.connect() as conn:
        result = conn.execute(
            text("""
                SELECT EXISTS (
                    SELECT 1
                    FROM information_schema.tables
                    WHERE table_schema = 'draft_schema'
                    AND table_name = :table_name
                );
            """),
            {"table_name": table_name},  # bound parameter instead of string interpolation
        )
        table_exists = result.scalar()
        app.logger.debug("Table exists: %s", table_exists)

        if table_exists:
            app.logger.debug("Table %s exists. Creating a backup and truncating the table.", table_name)
            conn.execute(text(f"DROP TABLE IF EXISTS draft_schema.{backup_table_name};"))
            conn.execute(text(f"CREATE TABLE draft_schema.{backup_table_name} AS TABLE draft_schema.{table_name};"))
            conn.execute(text(f"TRUNCATE TABLE draft_schema.{table_name};"))
            conn.commit()
        else:
            app.logger.debug("Table draft_schema.%s does not exist. No backup needed.",table_name)

    # 2. Select countries with missing data
    query1 = text(f"""select e.iso3,e.country, n.current_text from draft_schema.equivalency_table as e
left join draft_schema.{table_name} as n
on e.iso3=n.iso3
where (e.selection=1 and n.current_text is null)""")
    app.logger.debug("query1:\n%s", query1)
    # The table name is already interpolated into query1, so no bound parameters are needed
    countries_df = pd.read_sql(query1, engine)

    # 3. For each country, feed the prompt to the model
    # Process each country in batches
    batch_size = 10
    app.logger.debug("Number of countries: %s", len(countries_df))
    for i in range(0, len(countries_df), batch_size):
        app.logger.debug("Processing batch %s", i)
        batch_df = countries_df.iloc[i:i + batch_size]  # rows of the current batch
        app.logger.debug("Batch:\n%s", batch_df)
        for index,row in batch_df.iterrows():
            iso3 = row['iso3']
            country = row['country']
            app.logger.debug("Processing country: %s", country)

            if table_name == 'salient':
                indicator_list = all_fsi_for_prompt(iso3)  # alternative: select_high_risk_indicators(iso3)
                app.logger.debug("list:\n%s", indicator_list)
                # The indicator list is appended once, after the "attached list:" label
                prompt2 = prompt + country + """, to provide examples on the thematics mentioned below. 
                Understanding that for categories the prefixes "c_" are cohesion, "e_" are economy, "p_" are political, "s_" and "x_" are social. Never write these prefixes """
                prompt2 = prompt2 + """attached list:""" + indicator_list
            elif table_name == 'narrative':
                prompt2 = """Provide a concise paragraph (maximum 7 lines) summarizing the political and conflict-related developments in """ + country + prompt
                ihl_text = all_ihl(iso3)
                if ihl_text != "no data":
                    prompt2 = prompt2 + "Include in the analysis the evolution of the relation under IHL:"
                    prompt2 = prompt2 + ihl_text  # reuse the value fetched above instead of calling all_ihl again
            else:
                # Assumption: any other table (e.g. due_diligence) receives a fully built prompt
                prompt2 = prompt

            app.logger.debug("prompt2:\n%s", prompt2)
            completion = client.chat.completions.create(
                model="gpt-4o-mini",
                store=True,
                messages=[{"role": "system", "content": prompt2}]
            )
            narrative_text = completion.choices[0].message.content
            app.logger.debug("%s:\n%s",table_name, narrative_text)

            # Create a DataFrame with the data
            data = {
                'iso3': [iso3],
                'current_text': [narrative_text]
            }
            df = pd.DataFrame(data)
            try:
                # engine.begin() commits on success and rolls back if an exception is raised
                with engine.begin() as conn:
                    df.to_sql(table_name, con=conn, schema='draft_schema', if_exists='append', index=False)
                app.logger.debug("Inserted row with iso3: %s, %s: %s", iso3, table_name, narrative_text)

            except Exception as e:
                app.logger.error("Error inserting data: %s", e)
                return jsonify({"error": str(e)}), 500

    return jsonify({"message": f"{table_name} processed and inserted successfully."}), 200
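fill_narrative_table interpolates table_name directly into its DROP, CREATE and TRUNCATE statements, because SQL identifiers cannot be passed as bound parameters. One way to keep this safe is to check the name against a fixed set before running any query. The sketch below assumes that the tables are named after the three routes; ALLOWED_TABLES and validate_table_name are hypothetical names, not part of the project code.

```python
# Assumption: one table per route (/salient, /narrative, /due_diligence)
ALLOWED_TABLES = {"salient", "narrative", "due_diligence"}


def validate_table_name(table_name: str) -> str:
    """Return table_name unchanged if it is whitelisted, otherwise raise ValueError."""
    if table_name not in ALLOWED_TABLES:
        raise ValueError(f"Unexpected table name: {table_name!r}")
    return table_name
```

Calling validate_table_name(table_name) as the first statement of fill_narrative_table would reject any unexpected value before it reaches the database.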

Limits

Further testing is needed for this functionality. The due diligence prompt is long and is fed with many datasets, so it may benefit from more advanced models, although these are more expensive.

A better solution could involve training the model to fit specific needs, which would require time from both PeaceNexus and the developer. Additional information, such as outlier detection from the ACLED dataset (e.g., administrative level 1 regions with more than XX% of events or a sharp increase from the last month), could be fed into the analysis. This approach would be far more resource-efficient than including all ACLED data in the prompt.
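The outlier detection mentioned above could be sketched as follows. This is not existing project code: the function name, the input shape (one (admin1, month) pair per ACLED event) and both thresholds are assumptions, and the thresholds are deliberately left as parameters because the text does not fix their values.

```python
from collections import Counter
from typing import Dict, Iterable, List, Tuple


def flag_admin1_outliers(
    events: Iterable[Tuple[str, str]],
    current_month: str,
    previous_month: str,
    share_threshold: float,
    increase_threshold: float,
) -> Dict[str, List[str]]:
    """Flag admin1 regions with a high share of events or a sharp month-on-month increase.

    events: one (admin1, month) pair per event, e.g. ("Region A", "2024-06").
    share_threshold: fraction of the current month's events (0 to 1).
    increase_threshold: relative increase versus the previous month (1.0 = +100%).
    """
    events = list(events)  # allow generators, since the data is scanned twice
    current = Counter(admin1 for admin1, month in events if month == current_month)
    previous = Counter(admin1 for admin1, month in events if month == previous_month)
    total = sum(current.values())

    flagged: Dict[str, List[str]] = {}
    for admin1, count in current.items():
        reasons = []
        if total and count / total > share_threshold:
            reasons.append("high share of events")
        prev = previous.get(admin1, 0)
        # Regions with no events in the previous month are skipped for the increase test
        if prev and (count - prev) / prev > increase_threshold:
            reasons.append("sharp increase")
        if reasons:
            flagged[admin1] = reasons
    return flagged
```

Only the flagged regions and their reasons would then be added to the prompt, instead of the full ACLED extract.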

Prompts shared

Note: a placeholder in square brackets, such as [COUNTRY], is filled in by the code at run time.
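The code extract in the Function section builds its prompts by string concatenation. As an alternative, the bracketed placeholders could be substituted directly in a prompt template. The sketch below is only an illustration: fill_placeholders is a hypothetical helper, and it assumes that placeholder names consist of upper-case letters and spaces.

```python
import re

# Matches placeholders such as [COUNTRY] or [ACLED INDEX VALUES]
PLACEHOLDER = re.compile(r"\[([A-Z][A-Z ]*)\]")


def fill_placeholders(template: str, values: dict) -> str:
    """Replace every [NAME] in template with values["NAME"]; fail if one is missing."""
    missing = [name for name in PLACEHOLDER.findall(template) if name not in values]
    if missing:
        raise KeyError(f"No value supplied for placeholders: {missing}")
    # A function is used as replacement so that backslashes in values are kept literally
    return PLACEHOLDER.sub(lambda match: str(values[match.group(1)]), template)
```

Failing on a missing value avoids sending a prompt that still contains a literal placeholder to the model.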

Country narrative

Provide a concise paragraph (maximum 7 lines) summarizing the political and conflict-related developments in [COUNTRY], with a focus on conflict and fragility dynamics. Address the following: The current political situation, including any recent governance changes (e.g., constitution, coup). Recent significant political or social events (e.g., protests, constitutional amendments). Major security incidents (e.g., large-scale attacks). Regional dynamics influencing or influenced by the country (e.g., new alliances, disputes, heightened tensions). Key upcoming events (e.g., elections, referendum). Include in the analysis the evolution of the relation under IHL: [EXTRACT RULAC DATA IN DATABASE]

Salient

Provide a short paragraph (5 lines maximum) using the top 3 indicators extracted in the attached list, specifying their names and classification (should be in parenthesis Alert if >7.5, Warning if 5-7.5, Stable 2.5-5 or sustainable <2.5 ) but not their value (from 0 = good to 10 =bad), as well as the trends of these indicators in the last 5 years (evolution). Some indicators may be in the same category. Therefore, tailor your response accordingly to make it smooth. Also, include the top two indicators with the most evolution, meaning the one that has moved negatively demonstrating progress and the one that has moved positively showing deterioration. Evolution should be treated as slight (<5%), moderate (5 to 10%) or large (> 10%), remembering that an increase is a negative outcome, but do not mention the evolution value. Use your knowledge on this country,[COUNTRY], to provide examples on the thematics mentioned below. Understanding that for categories the prefixes "c_" are cohesion, "e_" are economy, "p_" are political, "s_" and "x_" are social. Never write these prefixes attached list: [ALL FSI VALUES IN DATABASE FOR THE COUNTRY]
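The classification rules stated in the Salient prompt are currently left to the model. They could also be pre-computed in code, for instance to check the generated text. The sketch below uses the thresholds from the prompt; the function names are hypothetical, and two points are assumptions: a score of exactly 5 is treated as Warning, and the category is read from the first letter of the indicator code.

```python
def classify_score(score: float) -> str:
    """Map an FSI value (0 = good, 10 = bad) to its classification."""
    if score > 7.5:
        return "Alert"
    if score >= 5:  # assumption: the boundary value 5 counts as Warning
        return "Warning"
    if score >= 2.5:
        return "Stable"
    return "Sustainable"


def classify_evolution(percent_change: float) -> str:
    """Map a 5-year change in percent to slight, moderate or large."""
    magnitude = abs(percent_change)
    if magnitude < 5:
        return "slight"
    if magnitude <= 10:
        return "moderate"
    return "large"


# Assumption: the first letter of an indicator code identifies its category
CATEGORY_BY_LETTER = {
    "c": "cohesion",
    "e": "economy",
    "p": "political",
    "s": "social",
    "x": "social",
}


def category_of(indicator_code: str) -> str:
    """Return the category name for an indicator code, or "unknown"."""
    return CATEGORY_BY_LETTER.get(indicator_code[:1].lower(), "unknown")
```

As in the prompt, an increase is a deterioration, so the sign of the change still has to be reported separately from its magnitude.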

Timeline

The timeline route follows different steps, because it also provides article links, but its prompt is the following:

Up to 10 major events with important economical, ecological, security and political impact in the last 5 years (include possible future elections in the next year) in [COUNTRY],with the mandatory following fields: event, description, category and date(use the first day of the month of the year if unknown) in json format. No need for text outside the json format.

The JSON format is important because the response can then be parsed into a list of events and processed further.
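Parsing the timeline response could look like the sketch below. It is not the project's actual code: parse_timeline is a hypothetical name, and the handling of a Markdown code fence or of a wrapping JSON object is an assumption about how the model may format its answer.

```python
import json
from typing import Dict, List

REQUIRED_FIELDS = ("event", "description", "category", "date")
FENCE = "`" * 3  # Markdown code fence that models sometimes put around JSON


def parse_timeline(response_text: str) -> List[Dict]:
    """Parse the model response into a list of events that carry all mandatory fields."""
    text = response_text.strip()
    if text.startswith(FENCE):
        text = text.strip("`")
        if text.startswith("json"):
            text = text[len("json"):]
    payload = json.loads(text)

    # Accept either a bare list or an object that wraps the list under some key
    if isinstance(payload, dict):
        lists = [value for value in payload.values() if isinstance(value, list)]
        payload = lists[0] if lists else []

    # Keep only well-formed events
    return [
        item for item in payload
        if isinstance(item, dict) and all(field in item for field in REQUIRED_FIELDS)
    ]
```

json.loads raises json.JSONDecodeError when the response is not valid JSON, so the caller can catch it and retry the request or log the raw text.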

Due diligence

In the lists below, there are two sets of data:
- FSI indicators in country, which provide an overview of political, social, and economic fragility. The FSI methodology includes definitions to enhance interpretation, and the FSI interpretation guidance offers additional analysis questions and guidance.
- ACLED index, which analyzes conflict dynamics. These indexes are explained in the ACLED definitions sheet. While FSI indicators assess overall country fragility, ACLED focuses specifically on conflict trends.
- The last sheet provides an overview of key sectoral trends and risks for three sectors.

I am interested in investing in [SECTOR] in [COUNTRY] and require a due diligence assessment that systematically identifies and prioritizes risks based on severity (impact on investment) and likelihood (probability of occurrence) using the provided FSI indicators, ACLED index, and sector-specific trends and risks. The due diligence should focus on issue identification, prioritization, and actionable

Key Components of the Due Diligence:

1. Issue Identification & Risk Prioritization
- Identify the most critical risks using FSI and ACLED rankings (e.g., highest-scoring indicators and conflict intensity).
- Rank risks based on their severity (potential impact on the investment) and likelihood (probability of occurrence based on indicators).

2. Sector-Specific Risks for [SECTOR]
- Analyze sectoral vulnerabilities based on the provided sector trends and risks.
- Identify conflict-sensitive risks such as land disputes, labor risks, environmental concerns, and regulatory challenges.
- Highlight corruption and governance risks in the sector.

3. Conflict & Security Risks
- Use the ACLED conflict data to assess threats to project sites, workforce, and supply chains.
- Identify hotspot regions or escalation trends that may affect project feasibility.
- Provide security mitigation recommendations.

4. Operational & Financial Risks
- Assess infrastructure constraints, supply chain bottlenecks, and currency risks based on economic indicators.
- Identify risks in government policies, contract enforceability, and local business dynamics.

5. Conflict Sensitivity & Responsible Business Conduct
- Identify human rights risks using relevant FSI indicators (e.g., group grievance, public services, state legitimacy, etc.).
- Assess community acceptance and land acquisition challenges.
- Provide actionable engagement strategies for mitigating social tensions.

6. Double Materiality Risk Matrix
- Categorize most important risks (maximum 4) to prioritize mitigation efforts, with a specific focus on how the business can impact the conflict dynamic

7. Action Points (Instead of General Recommendations)
- Provide a list of practical steps investors should take to mitigate high-priority risks.
- Focus on conflict-sensitive strategies, risk mitigation in procurement and security, and local stakeholder engagement.

Expected Output: The assessment should be structured as a due diligence guidance report (not a full due diligence process), with a key quadrants analysis and clear action points rather than general context descriptions, avoid generalities and be specific. The output should be investment-focused and include risk mitigation strategies, as well as looking at how the investment can impact the situation. Use html tags to format the output for readability but keep a text. Remove all reference to indicators coded names (begining such as e1_, e2_,c1_,s1_,x1_,p1_ ...) and replace with the actual indicator names. Don't display values

Attached list : FSI indicators in the country: [ALL FSI VALUES IN DATABASE FOR THE COUNTRY]
FSI indicators definitions and interpretation_guidance:[ALL FSI DEFINITION]
ACLED Indicators value: [ACLED INDEX VALUES]
Attached list : ACLED indicators definitions: [ACLED INDEX DEFINITION]
Attached list : Sectors specific trends and risks: [SECTOR RISKS EXTRACT]