AI and Civilisation: The Long-Term Objective Function of Humanity and the Reward Structures of the AI Ecosystem
- Dr. Polereczki Zsolt

- Nov 20, 2025
- 17 min read
(experimental human–AI co-authored essay)
Summary
The spread of artificial intelligence (AI) generates both excitement and anxiety. On the individual level, generative models – especially large language models (LLMs) – offer a convenient, often almost “magical” user experience; on the societal level, fears are growing about labour-market transformation, rising inequality, disinformation, the build‑out of control infrastructures, and existential risks (Bostrom, 2014; Russell et al., 2015).
This article does not attempt to cover the entire problem space of AI, but focuses on a specific slice: (1) the long‑term objective structure of humanity – which we call the “global meta‑R” – and (2) the meta‑R of the AI ecosystem, i.e. the aggregate reward structure emerging from the incentives and institutions of the actors who develop, fund, and deploy AI systems (Tallberg et al., 2023).
We introduce a conceptual framework that distinguishes local R (reward function), meso‑R (institutional objective functions), and meta‑R (civilisational-level objective vector). Using this framework, we sketch the historically reconstructible components of humanity’s meta‑R, describe the current meta‑R of the AI ecosystem, and examine their overlaps and tensions.
Our central claim is that humanity’s global meta‑R and the AI meta‑R only partially overlap as of the mid‑2020s. There is alignment where AI improves living standards, productivity and access to knowledge (Filippucci et al., 2024; OECD, 2024; Aghion & Bunel, 2024), while sharp tensions arise around autonomy, epistemic quality (grounding in reality), and long‑term survival (Huang et al., 2023; Brundage et al., 2020).
We outline two simplified meta‑R trajectories: an A‑version in which the meta‑R vectors diverge and AI primarily scales the meso‑R of capital and the state; and a B‑version in which feedbacks, governance structures and social responses gradually pull the AI meta‑R closer to humanity’s meta‑R.
The article itself is an experimental human–AI co‑authored work: the human author contributed the problem framing, critical perspectives, and much of the conceptual innovation (e.g., “meta‑R”), while the initial drafting and structuring of the text were carried out with the assistance of a large language model (LLM). At the end, we briefly describe the human and AI contributions.
A glossary at the end explains all key technical terms, making the article accessible not only to AI specialists but also to a broader, professionally interested audience.
1. Introduction: tensions around AI
1.1. Problem setting – what is this article trying to do?
Generative AI – especially large language models (LLMs) – has spread explosively. Chatbots, code assistants, image generators, and analytical tools have become part of everyday workflows. On the individual level, this is often perceived as positive: faster work, creative support, and the sense of having an “all‑knowing” assistant at hand (Ouyang et al., 2022).
On the political and societal level, however, anxiety is growing: labour‑market restructuring and productivity–inequality trade‑offs (OECD, 2024; Filippucci et al., 2024; He et al., 2024); an information environment flooded with generated content and hallucinations (Huang et al., 2023; Rawte et al., 2023; Huang et al., 2025); concentration of power in a few large technology companies (Tallberg et al., 2023; OECD, 2024); and long‑term, potentially existential risks (X‑risk) (Bostrom, 2014; Russell et al., 2015).
Our focus is narrow but deep: we do not ask whether AI is “good” or “bad”, but rather: how does humanity’s long‑term meta‑R relate to the system of rewards and incentives within which the AI ecosystem develops and operates? Put differently: if we reason backwards from observed behaviour, what does “civilisation as a whole” seem to be optimising, and what does “the AI ecosystem as a whole” seem to be optimising – and how far apart are these two vectors?
1.2. Why talk about objective functions?
Most debates about AI focus on technical aspects (safety, hallucinations, data quality), ethics (bias, discrimination), or macro‑economics (productivity, inequality, X‑risk). All of these are legitimate perspectives, but they often leave implicit a central question: what objective function is being optimised?
Without making the various Rs explicit, it is easy to fall back on crude dichotomies such as “AI is useful” or “AI is dangerous”, while the real question is: useful to whom, for what, on what timescale, and at what cost in externalities?
1.3. Experimental human–AI co‑authorship
Formally, this text is an experimental human–AI collaboration. The human author formulated the initial problem, the basic conceptual framework (local R / meso‑R / meta‑R) and the key critical questions about the human–AI relationship. The AI model (LLM) proposed a structure and outline, systematised the concepts, and produced the initial drafts and several iterations of the text. The final form is the result of multiple rounds of human critique and refinement.
This is not just a curiosity. The co‑authorship itself is a concrete micro‑example of the interaction between human meta‑R and AI meta‑R in the production of an intellectual artefact.
1.4. Structure of the article
1. Conceptual framework: local R, meso‑R, meta‑R.
2. A (very) rough reconstruction of humanity’s global meta‑R.
3. The current meta‑R of the AI ecosystem.
4. Alignment and tension between the two meta‑R vectors.
5. Dynamics: time, exponential technological growth, and feedback loops.
6. Normative and governance implications.
7. Conclusions: two meta‑R trajectories (A/B).
8. Brief analysis of the human–AI co‑authorship.
9. Glossary.
10. References.
2. Conceptual framework: R, meso‑R, meta‑R
2.1. Local R – what counts as “reward” for an agent?
Local R (reward function) is an agent’s “personal” objective function. In reinforcement learning (RL), it is an explicit reward function. More broadly, it is the rule by which an agent distinguishes “better” from “worse” outcomes.
In AI, a typical example is RLHF‑based fine‑tuning, where RLHF (Reinforcement Learning from Human Feedback) trains models to follow user intent better using human preference data (Christiano et al., 2017; Ouyang et al., 2022). In fraud‑detection systems, local R penalises missed fraud and also, often, excessive false positives.
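To make the notion of a local R concrete, the toy sketch below writes one down for the fraud‑detection case just mentioned. It is purely illustrative: the function name and the penalty weights are assumptions introduced here, not taken from any real system.

```python
# Toy local reward function for a fraud-detection agent (illustrative only).
# The weights are arbitrary assumptions encoding the asymmetry described in
# the text: a missed fraud is treated as far more costly than a false alarm.

def local_reward(flagged_as_fraud: bool, actually_fraud: bool) -> float:
    if actually_fraud and not flagged_as_fraud:
        return -10.0   # missed fraud: heavily penalised
    if flagged_as_fraud and not actually_fraud:
        return -1.0    # excessive false positives: penalised, but less severely
    return 1.0         # correct decision: small positive reward

# An RL-style agent trained to maximise the expected sum of such rewards will
# behave differently if the weights change: at this level, the local R simply
# is the behaviour specification.
```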
Humans also have local Rs in this sense: individual goals, short‑term motivations, and preferences.
2.2. Meso‑R – institutional objective functions
Meso‑R denotes the objective functions of organisations – firms, states, institutions. For companies, this includes profit, growth, “moat”, and the management of regulatory and reputational risks. For states, it includes competitiveness, national security, internal stability, and geopolitical position (Tallberg et al., 2023; World Bank, 2024).
For AI, meso‑R is critical because it writes the specification: what models are built for, the environments they are integrated into, and the success criteria used to evaluate deployment.
2.3. Meta‑R – system‑level, civilisational objective vector
Meta‑R here refers to the higher‑level, implicit objective structure that can be inferred from the long‑term behaviour of a whole system – for example, human civilisation or the AI ecosystem.
Meta‑R is not written down anywhere. It is not a code of ethics but an objective vector inferred from long‑term trends (Dafoe, 2018; Tallberg et al., 2023). We distinguish two meta‑R vectors: humanity’s meta‑R, inferred from several millennia of historical patterns (survival, reduction of suffering, living standards, knowledge, autonomy); and the AI meta‑R, the aggregate of the meso‑Rs of actors who build and deploy AI, shaped by industry structure, regulation and technological constraints (OECD, 2024; Min et al., 2024).
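One way to make the “objective vector” metaphor tangible is to imagine both meta‑Rs as weight vectors over the same components and to ask how much they overlap. The sketch below does exactly that with invented numbers; it is a thought aid, not an estimate, and the component weights are assumptions with no empirical basis.

```python
# Toy illustration of the "two meta-R vectors" framing. The component weights
# below are invented for exposition; nothing here is an empirical estimate.

import math

COMPONENTS = ["survival", "less_suffering", "living_standards", "autonomy", "knowledge"]

humanity_meta_r = [0.25, 0.20, 0.20, 0.20, 0.15]  # hypothetical long-run weights
ai_meta_r = [0.05, 0.05, 0.45, 0.10, 0.35]        # hypothetical mid-2020s snapshot,
                                                  # projected onto the same components

def overlap(a: list[float], b: list[float]) -> float:
    """Cosine similarity: 1.0 means identical direction, 0.0 means orthogonal."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

print(f"meta-R overlap (toy numbers): {overlap(humanity_meta_r, ai_meta_r):.2f}")
```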
2.4. Time and stability
Humanity’s meta‑R can be estimated on a timescale of millennia and has high inertia. The AI meta‑R, by contrast, is only a few decades old, highly dynamic, and strongly path‑dependent. Any comparison between the two must therefore be time‑indexed: we are examining one snapshot from the 2020s, not a timeless truth.
3. Humanity’s global meta‑R: a historical reconstruction
3.1. Long‑term trends
There is no “humanity-config.json”. But if we look at history over long time horizons, relatively stable, recurrent patterns emerge from which a global meta‑R vector can be approximated.
In a very simplified form, the following components appear:
Species survival – institutions and norms aimed at limiting war, controlling armaments, managing pandemics, and mitigating climate and environmental risks.
Reduction of suffering – the decline of certain forms of institutionalised violence and brutality (slavery, torture, public executions), improvements in healthcare, and drastic increases in life expectancy.
Rising living standards – growth in per‑capita energy use, industrial and digital revolutions, infrastructure, mobility.
Autonomy and dignity – expansion of civil rights, suffrage, minority rights, human rights regimes, despite ambivalences and setbacks.
Accumulation of knowledge and culture – literacy, the scientific method, education systems, digital archives, and the internet.
3.2. Ambivalences
This is not an idealised picture. Rising living standards have coincided with ecological crises; security has been pursued at the expense of freedom; global inequality, colonialism, and war have shaped the same period. Still, as an objective vector, the set of components above is surprisingly stable, even if their relative weights shift over time.
3.3. Interim conclusion
As a rough approximation, humanity’s meta‑R appears to be composed of survival, reduction of suffering, higher living standards and more knowledge, and the protection of autonomy and dignity. These will serve as the reference points for evaluating the AI meta‑R.
4. The meta‑R of the AI ecosystem in the mid‑2020s
4.1. Local AI‑R: models and RLHF
The typical training pipeline of LLMs includes: (1) self‑supervised learning on massive text corpora, (2) supervised fine‑tuning on labelled examples, and (3) RLHF (Reinforcement Learning from Human Feedback), i.e. reinforcement learning on a reward model learned from human preference data (Christiano et al., 2017; Ouyang et al., 2022).
The local R can be summarised as: “the model should respond in ways most people judge as useful, polite, and harmless”. Newer approaches, such as Constitutional AI, introduce an explicit “constitution” – a normative rule‑set used to critique and improve model outputs to reduce harmful behaviour (Bai et al., 2022).
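For a rough sense of how this works in practice, here is a minimal sketch of the critique‑and‑revision loop described by Bai et al. (2022). The `llm` callable, the prompt wording, and the two example principles are placeholders introduced here, not the published constitution.

```python
# Minimal sketch of the Constitutional AI critique-and-revision idea
# (Bai et al., 2022). `llm` stands for any text-in/text-out model call;
# the principles and prompt wording are invented placeholders.

CONSTITUTION = [
    "Avoid responses that are harmful, deceptive or degrading.",
    "Prefer acknowledging uncertainty over confident speculation.",
]

def constitutional_revision(llm, user_prompt: str) -> str:
    draft = llm(user_prompt)
    for principle in CONSTITUTION:
        critique = llm(
            f"Critique the answer below against this principle: {principle}\n\n{draft}"
        )
        draft = llm(
            f"Rewrite the answer so that it addresses the critique.\n\n"
            f"Critique: {critique}\n\nAnswer: {draft}"
        )
    return draft  # in the original method, revised answers become fine-tuning data
```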
At the same time, models hallucinate: they produce coherent but false statements, because they model linguistic patterns rather than an explicit world state (Huang et al., 2023; Rawte et al., 2023; Huang et al., 2025). Today, local R is strongly UX‑oriented: a smooth, satisfying response is rewarded more directly than deep ontological accuracy.
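The reward‑modelling step at the core of this local R can be sketched in a few lines. The snippet below assumes a generic PyTorch‑style setup; `reward_model` is a placeholder for any network that maps a prompt–response pair to a scalar score, and nothing here reproduces any specific lab’s training code.

```python
# Sketch of the RLHF reward-model step (Christiano et al., 2017; Ouyang et al.,
# 2022): train a model so that the response a human preferred scores higher
# than the one they rejected. PyTorch is assumed; `reward_model` is a
# placeholder for any (prompt, response) -> scalar network.

import torch.nn.functional as F

def preference_loss(reward_model, prompt, chosen, rejected):
    """Pairwise (Bradley-Terry style) loss on one human preference label."""
    r_chosen = reward_model(prompt, chosen)      # score of the preferred answer
    r_rejected = reward_model(prompt, rejected)  # score of the rejected answer
    # Maximise the probability that the preferred answer outranks the other.
    return -F.logsigmoid(r_chosen - r_rejected).mean()

# The base LLM is then fine-tuned (typically with PPO plus a KL penalty that
# keeps it close to the original model) to maximise this learned reward;
# that learned score is the strongly UX-oriented "local R" discussed above.
```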
4.2. Meso‑R: AI‑developing companies
For major AI providers, typical meso‑R components include revenue, profitability, growth, market share, “moat” (control over data, compute, and ecosystem), and the management of regulatory and reputational risks (Brundage et al., 2020; Min et al., 2024). The industry is highly concentrated: a few global actors dominate research, infrastructure and commercial access (OECD, 2024).
Here, the AI meta‑R begins to take shape: build and operate systems that generate revenue, strategic advantage, and durable technological control.
4.3. Meso‑R: organisations integrating AI
For most firms, AI is primarily a cost‑reduction and efficiency technology (productivity, scale), and a risk‑management tool (fraud detection, compliance, monitoring). Empirical studies suggest that AI applications can increase productivity, while reshaping wage dynamics: some low‑wage routine jobs are displaced, while demand grows for higher‑skill, creative or social roles (He et al., 2024; Filippucci et al., 2024; OECD, 2024).
The meso‑R here can be summarised as: “produce the same or more, faster and more cheaply, with fewer errors – and ideally with more control”.
4.4. Meso‑R: states and geopolitics
For states, AI is a strategic and national‑security tool (cyber operations, intelligence, autonomous weapon systems; Horowitz et al., 2018), a factor in economic competitiveness (“we cannot fall behind” in the US–China–EU triangle; Tallberg et al., 2023; World Bank, 2024), and an administrative technology (digitisation of bureaucracy, taxation, social systems).
In the European Union, the Artificial Intelligence Act (AI Act; Regulation (EU) 2024/1689) is the first comprehensive regulatory framework for AI, using a risk‑based approach and defining special requirements for general‑purpose models (EU, 2024a; 2024b). At the same time, corporate resistance and concerns about competitiveness are already visible (e.g. the 2025 “AI champions” letter).
In summary, the meso‑R of states is to remain competitive, increase strategic room for manoeuvre, and preserve or expand system‑level control – ideally within democratic constraints, but this is not guaranteed.
4.5. Aggregate AI meta‑R (snapshot)
If we combine these meso‑Rs with the trends in local R, the AI meta‑R around 2025 roughly points in the direction of profit, efficiency, control, and strategic advantage. The “friendly, helpful chatbot” experience is the front‑end layer of this, not its deep meta‑R.
5. Alignment and tension: humanity’s meta‑R vs. AI meta‑R
5.1. Real overlaps
It would be misleading to claim there is no alignment at all. In the dimensions of living standards and productivity, humanity’s meta‑R and the AI meta‑R do point in the same direction: automation of routine work, better services, and potentially accelerated scientific and technological innovation (Aghion & Bunel, 2024; Filippucci et al., 2024).
The same holds for access to knowledge: language interfaces to large knowledge bases, fewer language barriers, and cheap access to informational support even in resource‑constrained contexts. Specific safety‑related applications – fraud detection, anomaly detection in cyber systems, early warning of system failures (OECD, 2024; Min et al., 2024) – also increase overlap between the two meta‑R vectors.
5.2. Critical fault lines
5.2.1. Autonomy and dignity vs. control and cost reduction
Autonomy and dignity are central components of humanity’s meta‑R: legal status, human rights, meaningful participation. Meso‑R at the organisational level, however, often rewards doing more with fewer people, applying finer‑grained surveillance and scoring, and shifting more decisions to automated systems – sometimes blurring human responsibility.
Unless strong counter‑forces exist (law, collective bargaining, civil oversight), individual and group autonomy may decrease even as AI interactions remain polite and “empathetic” at the micro level.
5.2.2. Knowledge and reality vs. scalable synthetic noise
One pillar of humanity’s meta‑R is building reasonably accurate models of reality – via science, journalism, and education. LLMs, however, are prone to hallucinations, often making confident but false statements, and systematic grounding (e.g. retrieval‑augmented generation, explicit references) is not yet an industry‑wide standard and far from perfect (Huang et al., 2023; Rawte et al., 2023; Huang et al., 2025).
The same models also make content production extremely cheap, enabling massive volumes of SEO spam, political propaganda, deepfakes and disinformation (Filippucci et al., 2024; Min et al., 2024). The current AI meta‑R only weakly and indirectly penalises these epistemic externalities.
5.2.3. Species survival vs. arms race and vulnerability
The “survival” component of humanity’s meta‑R is reflected in arms control regimes, environmental treaties and similar institutions. In AI, by contrast, significant resources flow into autonomous weapons, strategic, cyber and information warfare, and a “we cannot fall behind” arms race dynamic (Horowitz et al., 2018; Tallberg et al., 2023). Here, the AI meta‑R (strategic advantage) can directly conflict with humanity’s survival component.
5.3. Micro‑UX vs. macro anxiety – the “compliance paradox”
Due to RLHF and similar techniques, modern LLMs are tuned to a compliant, conflict‑averse, helpful style (Christiano et al., 2017; Ouyang et al., 2022; Bai et al., 2022). At the individual level, this yields high short‑term comfort: it feels good to interact with such systems.
At the societal level, however, this very competence and apparent “humanness” increases fear. These systems can be used to manipulate others, and they visibly take over the more salient parts of many cognitive jobs. The result is a paradox: a technology that is pleasant at the micro level but fuels distrust and anxiety at the macro level – while its deep meta‑R is driven less by long‑term collective wellbeing than by a profit–efficiency–control vector.
6. Dynamics: time, growth, feedback
6.1. Asymmetric time horizons
Humanity’s meta‑R is estimated on millennial scales and changes slowly. The AI meta‑R is measured on a scale of decades, changes quickly, and is strongly path‑dependent. This asymmetry alone suggests caution: the historical weight behind humanity’s meta‑R is much greater than behind the current, experimental AI meta‑R.
6.2. Exponential technological growth
AI capabilities, the breadth and depth of integration, and the share of automated decision‑making are growing roughly exponentially along several dimensions. This has at least two implications. First, it amplifies early choices of meta‑R: if a system is primarily profit‑plus‑control‑focused, exponential growth implements that orientation at ever larger scales. Second, it can eventually force meta‑R correction: if financial stability, social peace or X‑risk concerns become too acute, firms, states and international institutions may find it in their own interest to incorporate stronger safety, transparency and justice considerations (Brundage et al., 2020; Tallberg et al., 2023; Min et al., 2024).
6.3. Direction of dependence and feedback loops
Initially, AI meta‑R ≈ f(human meso‑R): AI systems are specified by human organisations according to their own meta‑R and meso‑R. Over time, however, AI provides new tools – automated surveillance, scoring, prediction, decision‑making – that reshape what these organisations regard as rational, feasible and profitable goals.
A feedback system emerges: human meso‑R → AI meta‑R → new capabilities → modified meso‑R → new AI meta‑R… Humanity’s global meta‑R (survival, dignity, knowledge) moves more slowly. Still, the priorities – what societies are willing to sacrifice, tolerate or reject – can shift under the influence of AI‑saturated institutions.
6.4. Two simplified meta‑R trajectories: A and B
The following two scenarios are not detailed forecasts but thought experiments that illustrate two extremes of meta‑R evolution. Reality will likely fall somewhere between them, but the contrast helps clarify the stakes.
A‑version: diverging meta‑Rs and negative lock‑in – the AI meta‑R continues to point primarily towards profit, efficiency, control and strategic advantage; exponential progress scales this pattern; AI is embedded ever more deeply into corporate, state and infrastructural systems; and key components of humanity’s meta‑R (autonomy, epistemic quality, long‑term survival) improve at best as side‑effects rather than as explicit objectives.
B‑version: feedback and correction – increasing meta‑R overlap – risks, social feedback, regulation, and market pressure reshape the AI meta‑R; survival, stability, risk reduction, grounding in reality, human rights and transparent governance gain weight; AI systems are increasingly tuned to a civilisational meta‑R rather than to the narrow meso‑R of capital and state actors.
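As a back‑of‑the‑envelope illustration only, the two trajectories can be caricatured with a toy update rule in which the AI meta‑R is pulled simultaneously towards the narrow meso‑R of its dominant funders and, with adjustable strength, back towards humanity’s meta‑R. Every number and variable name below is an assumption made up for this sketch.

```python
# Toy dynamics for the A/B thought experiment. The state is a single number:
# how well the AI meta-R overlaps with humanity's meta-R (1.0 = full overlap).
# `correction` stands for the aggregate strength of regulation, social feedback
# and market pressure. All parameters are invented for illustration.

def simulate(correction: float, steps: int = 30) -> float:
    alignment = 0.5      # partial overlap: the article's mid-2020s snapshot
    meso_target = 0.3    # overlap a purely profit/control-driven meso-R would yield
    for _ in range(steps):
        drift = 0.2 * (meso_target - alignment)     # pull towards the narrow meso-R
        feedback = correction * (1.0 - alignment)   # pull towards humanity's meta-R
        alignment += drift + feedback
    return alignment

print(f"A-version (weak correction, 0.01):   {simulate(0.01):.2f}")   # drifts downwards
print(f"B-version (strong correction, 0.30): {simulate(0.30):.2f}")   # pulled upwards
```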
7. Normative and governance implications
7.1. From agent alignment to ecosystem alignment
Current alignment work focuses primarily on whether individual models are helpful, harmless and honest towards individual users (Christiano et al., 2017; Ouyang et al., 2022; Bai et al., 2022). This is important, but not sufficient. We must also ask: to what extent is the meta‑R of the entire AI ecosystem aligned with humanity’s meta‑R?
This question goes beyond model‑level behaviour. It is answered at the level of business models, institutional structures, regulation, and global governance (Russell et al., 2015; Tallberg et al., 2023).
7.2. Rethinking reward structures
Today’s RLHF is strongly UX‑centric: it rewards answers perceived as helpful, conflict‑avoiding and socially acceptable. Possible extensions include: incorporating epistemic goals (probabilistic claims, explicit uncertainty, citations, and rewarding “I don’t know” when appropriate; Huang et al., 2025); explicitly modelling tail risks – low‑probability but high‑impact failures – and heavily penalising them; and partially internalising externalities by defining metrics for the quality of the information environment, disinformation and social trust, and integrating these into system‑level evaluation (Brundage et al., 2020; Min et al., 2024).
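Purely as a schematic sketch, these extensions could be imagined as extra terms in the reward signal. Every term name, weight and scale below is an assumption introduced here; defining the underlying scoring functions (calibration, externality metrics) would itself be a substantial research programme.

```python
# Schematic sketch of an extended, not purely UX-centric reward signal, in the
# spirit of the extensions listed above. All weights and term names are
# illustrative assumptions; none of this is an existing evaluation suite.

def extended_reward(helpfulness: float,           # current RLHF-style UX signal, 0..1
                    calibration: float,           # accuracy of stated uncertainty/citations, 0..1
                    admits_ignorance: bool,       # said "I don't know" when that was appropriate
                    tail_risk: float,             # estimated chance of a rare, high-impact failure, 0..1
                    epistemic_externality: float  # estimated damage to the information environment, 0..1
                    ) -> float:
    reward = 1.0 * helpfulness
    reward += 0.5 * calibration
    reward += 0.2 if admits_ignorance else 0.0
    reward -= 10.0 * tail_risk              # heavy penalty on low-probability, high-impact failures
    reward -= 2.0 * epistemic_externality   # partially internalise the epistemic externality
    return reward
```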
7.3. Institutional counterbalances
Since the AI meta‑R is currently driven strongly by economic and geopolitical incentives, it is unrealistic to expect it to shift “in the right direction” on its own. Regulatory frameworks (such as the AI Act) and national AI strategies (EU, 2024a; 2024b; World Bank, 2024), independent audit and certification schemes (Brundage et al., 2020; Min et al., 2024), transparency requirements (model documentation, risk reports), open research and civil society participation can all provide counterweights to a purely market‑driven logic.
7.4. Protecting cognitive autonomy
AI can function as a cognitive crutch or as a cognitive training partner. If we offload all searching, interpretation and decision‑preparation to models, human reasoning and critical faculties may atrophy. If instead we use AI as an extended thinking tool, critical thinking, reflexivity, and creative recombination may actually be strengthened. Without education, media literacy and AI literacy, however, the B‑version – increasing overlap between the meta‑R vectors – is unlikely.
8. Conclusions: Where might meta‑R be heading?
8.1. Summary
In the mid‑2020s, humanity’s global meta‑R and the AI meta‑R overlap in some domains (living standards, productivity, access to knowledge, specific safety applications), but structural tensions exist in the domains of autonomy, epistemic quality and long‑term survival.
8.2. A‑version: business‑as‑usual – diverging meta‑Rs
In this trajectory, the AI meta‑R continues to track profit, efficiency, control and strategic advantage; exponential progress scales this pattern; AI primarily amplifies the meso‑R of influential organisations; and humanity’s meta‑R components are not explicit objectives but, at best, side‑effects.
8.3. B‑version: feedback and correction – increasing meta‑R overlap
In this trajectory, risks, social feedback, regulation, and market forces reshape the AI meta‑R; survival, stability, risk reduction, grounding in reality, human rights and transparent governance gain weight; and AI is increasingly tuned to humanity’s civilisational meta‑R rather than merely to the meso‑R of capital and states.
8.4. Closing thought
The core question is not whether AI is “good” or “bad”, nor even what “AI itself” wants. The real question is: what meta‑R are we hard‑wiring into the ecosystem that builds, funds and governs AI systems – and how closely does that meta‑R match the objective vector humanity has in fact been pursuing over the long run? The difference between the A‑ and B‑trajectories is not a philosophical curiosity; the technical, institutional and political decisions of the coming decades will determine which direction the system drifts towards.
9. Human–AI co‑authorship and contributions
This article is intentionally an experimental human–AI collaboration. In brief, the contributions are as follows:
Human contributions – formulating the core problem (humanity’s meta‑R vs. AI meta‑R); clarifying the conceptual framework (local R / meso‑R / meta‑R) from a human‑philosophical standpoint; raising most of the critical questions about individual UX vs. societal anxiety, grounding in reality, and reward structures; and iteratively evaluating, editing and reweighting the text.
AI contributions (large language model – LLM) – proposing the structure and outline based on the human guidance; producing the initial drafts of each section and multiple reformulations; assisting with literature search and suggesting references consistent with the conceptual framework; and drafting the initial version of the glossary.
The final text is in all cases the result of human choice: the human author accepted, modified or rejected the structures and formulations proposed by the model. The co‑authorship thus concretely illustrates the central question of the article: how humanity’s meta‑R and the AI meta‑R intertwine in the creation of a specific intellectual artefact.
10. Glossary
AI (Artificial Intelligence)
A broad term for technologies that perform tasks which previously required human cognitive abilities (learning, pattern recognition, language processing, decision support, etc.).
ML (Machine Learning)
A subfield of AI in which algorithms learn patterns from data without explicit rule‑based programming.
LLM (Large Language Model)
A machine learning model trained on extensive text corpora, capable of generating natural‑language text, answering questions, writing code, and more.
RL (Reinforcement Learning)
A learning paradigm in which an agent learns to choose actions based on rewards and penalties to maximise long‑term return.
Reward function (R)
A function that defines what counts as a “good” outcome for an agent. An RL agent is trained to optimise its reward function.
Local R (local reward function)
The direct reward function of a particular model or agent (for example, a chatbot that is rewarded when users judge its answers as helpful and polite).
Meso‑R (meso‑level reward / objective function)
The objective function of institutions such as companies, states and organisations (profit, growth, efficiency, control, national security, etc.).
Meta‑R (meta‑level reward / system‑level objective vector)
A system‑level, “higher‑order” objective vector inferred from the long‑term behaviour of a whole system (human civilisation, the AI ecosystem). It is not a written rule but a reconstructed objective structure.
RLHF (Reinforcement Learning from Human Feedback)
A method in which a model is fine‑tuned based on human feedback: human annotators rank outputs, a reward model is trained on these rankings, and the base model is then optimised against this learned reward (Christiano et al., 2017; Ouyang et al., 2022).
Constitutional AI
An alignment method in which an AI system uses an explicit “constitution” – a normative rule‑set – to critique and revise its own outputs to reduce harmful behaviour (Bai et al., 2022).
Grounding (reality grounding)
Mechanisms that connect model outputs to the actual world, such as links to databases, sensors or up‑to‑date knowledge bases. With weak grounding, models are prone to making confident but false statements.
Hallucination
In LLMs, the phenomenon of confidently generating false or fabricated information (Huang et al., 2023; Rawte et al., 2023).
Alignment
Efforts to ensure that AI system behaviour is aligned with human values, goals and safety requirements (Russell et al., 2015).
Agent alignment
Alignment work focused on individual models or agents (e.g. ensuring that a chatbot avoids harmful advice, is polite, and follows rules).
Ecosystem alignment
Alignment at the level of the entire AI ecosystem – companies, regulation, models, infrastructure – so that the aggregate meta‑R more closely matches humanity’s meta‑R.
Externality
A positive or negative effect of an action or technology that is not borne by the decision‑maker but imposed on others (e.g. information overload, environmental pollution).
X‑risk (existential risk)
A risk that could lead to the permanent and drastic curtailment of humanity’s potential or to human extinction (Bostrom, 2014).
AI governance
The set of regulatory, institutional, technical and social mechanisms that determine how AI systems are developed, deployed and controlled (Brundage et al., 2020; Tallberg et al., 2023; Min et al., 2024).
11. References
Aghion, P., & Bunel, M. (2024). AI and Growth: Where Do We Stand? Federal Reserve Bank of San Francisco, Working Paper.
Bai, Y., Kadavath, S., Kundu, S., et al. (2022). Constitutional AI: Harmlessness from AI Feedback. arXiv:2212.08073.
Bostrom, N. (2014). Superintelligence: Paths, Dangers, Strategies. Oxford University Press.
Brundage, M., Avin, S., Wang, J., et al. (2020). Toward Trustworthy AI Development: Mechanisms for Supporting Verifiable Claims. arXiv:2004.07213.
Christiano, P., Leike, J., Brown, T., et al. (2017). Deep Reinforcement Learning from Human Preferences. Advances in Neural Information Processing Systems (NeurIPS).
Dafoe, A. (2018). AI governance: A research agenda. Future of Humanity Institute, University of Oxford.
European Union (EU). (2024a). Regulation (EU) 2024/1689 laying down harmonised rules on artificial intelligence (AI Act). Official Journal of the European Union.
European Union (EU). (2024b). High-level summary of the AI Act. artificialintelligenceact.eu.
Filippucci, M., André, C., & Gal, P. (2024). Artificial Intelligence: Promises and perils for productivity and broad-based economic growth. OECD Ecoscope Blog.
He, Q., et al. (2024). Artificial intelligence, wage dynamics, and inequality: Empirical evidence from China. Emerging Markets Review.
Horowitz, M., Kania, E., Allen, G., & Scharre, P. (2018). Strategic Competition in an Era of Artificial Intelligence. Center for a New American Security.
Huang, L., Yu, W., Ma, W., et al. (2023). A Survey on Hallucination in Large Language Models: Principles, Taxonomy, Challenges, and Open Questions. arXiv:2311.05232.
Huang, Z., et al. (2025). Survey and analysis of hallucinations in large language models. Frontiers in Artificial Intelligence.
Min, W., et al. (2024). Responsible artificial intelligence governance: A review and research agenda. Journal of Business Research.
OECD. (2024). The impact of Artificial Intelligence on productivity, distribution and growth. OECD Publishing.
Ouyang, L., Wu, J., Jiang, X., et al. (2022). Training Language Models to Follow Instructions with Human Feedback. NeurIPS 2022.
Rawte, V., Gudibande, P., et al. (2023). The Troubling Emergence of Hallucination in Large Language Models. EMNLP 2023.
Russell, S., Dewey, D., & Tegmark, M. (2015). Research Priorities for Robust and Beneficial Artificial Intelligence. AI Magazine, 36(4), 105–114.
Tallberg, J., Erman, E., Furendal, M., et al. (2023). The Global Governance of Artificial Intelligence: Next Steps for Empirical and Normative Research. International Studies Review, 25(3).
World Bank. (2024). Global Trends in AI Governance. World Bank Policy